Transforming Queensland’s geoscience data ecosystem
Background
The Geological Survey of Queensland (GSQ) defined the Geoscience Data Modernisation Program (GDMP) to provide innovative data management products and services, including:
- detailed design, build and implementation of a solution to manage GSQ’s geoscience data
- migration of data and functionality from six existing systems into the new solution
- training of staff in both the use of the solution and data management best practices
- support of the pilot and of the solution as it is progressively implemented.
GSQ holds more than 200 TB of geoscience data, including:
- Aerial geophysical surveys
- Airborne hyperspectral surveys
- Seismic surveys
- Magnetotelluric surveys
- Borehole data
- Geochemistry
- Production
- Resources
- Geological reports
A true Data Modernisation Project
The project leveraged cloud services to modernise the capture, storage and searchability of all geoscience data in Queensland. It created a highly scalable platform that supports both the capture of structured and unstructured data from industry and the publication of data to industry, so that individual applications no longer need to provide these functions.
In the mining sector, the platform can support all regulatory reporting by mining companies, along with the upload, capture and cataloguing of geoscience data. New reporting obligations or data capture requirements can be configured without software development effort.
This project delivered a data catalogue as part of a wider geoscience data management platform.
SRA provided business analysis and metadata definition services as part of the overall solution, which realised the FAIR data principles of:
- Findable: Data and supplementary materials have rich metadata and unique and persistent identifiers.
- Accessible: Metadata and data are understandable to humans and machines. Data is deposited in a trusted repository.
- Interoperable: Metadata uses a formal, accessible, shared and broadly applicable language for knowledge representation.
- Reusable: Data and collections have a clear usage license and provide accurate information on provenance.
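As a rough illustration of how the principles above translate into metadata, the sketch below checks a dataset record for the fields each principle relies on. The `DatasetRecord` class, its field names and the `FAIR_FIELDS` mapping are hypothetical and are not taken from the GDMP schemas.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalogue record; field names are illustrative only."""
    identifier: str = ""    # persistent identifier (Findable)
    title: str = ""         # descriptive metadata (Findable)
    keywords: list = field(default_factory=list)
    access_url: str = ""    # where the data can be retrieved (Accessible)
    schema: str = ""        # shared metadata standard, e.g. "DCAT2" (Interoperable)
    license: str = ""       # usage licence (Reusable)
    provenance: str = ""    # origin of the data (Reusable)


# Which fields count as evidence for each principle (an assumption).
FAIR_FIELDS = {
    "Findable": ["identifier", "title"],
    "Accessible": ["access_url"],
    "Interoperable": ["schema"],
    "Reusable": ["license", "provenance"],
}


def fair_gaps(record):
    """Return {principle: [missing field names]} for principles with gaps."""
    gaps = {}
    for principle, names in FAIR_FIELDS.items():
        missing = [name for name in names if not getattr(record, name)]
        if missing:
            gaps[principle] = missing
    return gaps


# Example record with no provenance statement.
record = DatasetRecord(
    identifier="pid-0001",
    title="Example seismic survey",
    access_url="https://example.org/data/0001",
    schema="DCAT2",
    license="CC-BY-4.0",
)
print(fair_gaps(record))  # {'Reusable': ['provenance']}
```

A check of this kind could run at lodgement time so that incomplete records are flagged before they reach the catalogue.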
Components of the solution include:
- Data Catalogue – a CKAN-based data catalogue with a number of extensions that improve user experience, performance, and integration.
- Data Store – low cost, high volume cloud-based data object storage (AWS S3 buckets).
- Data Schemas – standardised data models representing a dataset: metadata, elements and attributes, and relationships to other datasets and data elements. The data schemas are based on DCAT2, ISO, GGIC, PPDM, and other standards.
- Controlled Vocabularies – agreed sets of terms to enable data to be shared and reused across application, enterprise, and community boundaries.
- Persistent Identifiers (PID) – a long-lasting reference to a digital resource such as a document, file, web page, or another object.
- Linked Data – connecting related data so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other related data.
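To show how a CKAN-based catalogue like the one listed above is typically queried by machines, here is a minimal sketch that builds a request for CKAN's standard `package_search` action and reads titles out of a decoded response. The base URL and the sample response are placeholders, not the GSQ catalogue's actual address or data.

```python
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API package_search URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response):
    """Extract dataset titles from a decoded package_search response."""
    if not response.get("success"):
        return []
    return [pkg.get("title", "") for pkg in response["result"]["results"]]


# Placeholder host; substitute the real catalogue address.
url = package_search_url("https://catalogue.example.org/", "borehole seismic", rows=5)
print(url)

# Hand-written response in the shape CKAN returns (success flag plus result).
sample_response = {
    "success": True,
    "result": {
        "count": 2,
        "results": [
            {"name": "example-a", "title": "Example borehole dataset"},
            {"name": "example-b", "title": "Example seismic dataset"},
        ],
    },
}
print(dataset_titles(sample_response))
```

The URL can be fetched with any HTTP client; the sketch leaves the network call out so that it runs without access to a live catalogue.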
The Geoscience Data Modernisation Program (GDMP) uses an extensive suite of AWS services to provide a cutting-edge, scalable, secure solution. It continues to be updated and evolved in response to changes in business need and technology.
These services include, among many others:
- EC2 instances: hosting software such as CKAN, SOLR, GraphDb, VocPrez, Antivirus, and Drupal
- Application Load Balancers
- Autoscaling Groups
- Elastic File System
- Storage
- RDS Databases
- S3-based data lake
- Glacier archiving
- Redshift data warehouse
- Serverless web application: with React hosted in S3
- Serverless web API: with Lambda and API Gateway
- Messaging: with SQS and SNS
- Analytics: with Kinesis Firehose, Glue, and Athena
- Monitoring: with CloudWatch, CloudTrail, and X-Ray
- Security: follows AWS best practice
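For readers unfamiliar with the serverless web API pattern named in the list above (Lambda behind API Gateway), the following is a minimal sketch of a Python Lambda handler using the API Gateway proxy integration response shape. The `DATASETS` lookup and the `type` query parameter are invented for illustration and do not describe the GDMP API.

```python
import json

# Hypothetical in-memory stand-in for a catalogue lookup.
DATASETS = {
    "borehole": ["Example borehole register"],
    "seismic": ["Example seismic survey A", "Example seismic survey B"],
}


def _response(status, payload):
    """Format a response the way the API Gateway proxy integration expects."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event, context):
    """Lambda entry point: list example datasets for a requested type."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    data_type = params.get("type")
    if data_type not in DATASETS:
        return _response(400, {"error": "unknown or missing 'type' parameter"})
    return _response(200, {"type": data_type, "datasets": DATASETS[data_type]})
```

Because the handler is a plain function, it can be exercised locally by passing a dictionary such as `{"queryStringParameters": {"type": "seismic"}}` as the event.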
SRA designed, developed, implemented and currently supports a comprehensive new software solution for GSQ’s Geoscience Data Modernisation Program (GDMP). The system is the Queensland Government’s new core solution for managing the Geological Survey of Queensland’s geoscience data. SRA provided the end-to-end delivery of this major business system.
By optimising every aspect of geoscience data, we aim to give the industry a strong advantage in an era of ever-increasing challenges to exploration and development.
Qld Government / Geological Survey of Queensland (GSQ)
Technology to be proud of
Via an agile methodology, SRA delivered:
Web Portal(s), including:
- GeoProperties Database based on SOSA/SSN ontology (for: Borehole Register and a Samples, Observations & Results Database)
- Lodgement Portal for industry to submit statutory reports and notices
Web APIs
Data Catalogue
Public and private Data Lake platform and storage
Service Delivery:
ITIL-based operational and software support
ITIL Service Management for ongoing enhancements
Australian-based Service Desk
Tech specs
- AWS cloud
- AWS-based Data Lake
- CKAN Open Data Catalogue
- PowerBI for analytics and reporting
- AWS Security
- AWS microservices
- Serverless components
- Visualisations directly from the Data Lake
- Catalog crawler
- Spark-based ETL
- Queues
- VocPrez web delivery system for SKOS-formulated RDF vocabularies
- GraphDB to store and navigate relationships
- Extra-large data lodgement and download
- ML and AI ready
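The vocabulary and graph items in the list above rest on a simple idea: related terms and records are stored as subject-predicate-object statements that can be followed from one to the next. The sketch below walks `skos:broader` links over a few in-memory triples. The `ex:` identifiers are invented, and a real deployment would query a triple store rather than a Python list.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object); identifiers are invented.
TRIPLES = [
    ("ex:granite", "skos:broader", "ex:igneous-rock"),
    ("ex:igneous-rock", "skos:broader", "ex:rock"),
    ("ex:basalt", "skos:broader", "ex:igneous-rock"),
    ("ex:sample-1", "ex:lithology", "ex:granite"),
]


def broader_terms(concept, triples):
    """Return all transitive skos:broader ancestors, nearest first."""
    found = []
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "skos:broader" and obj not in found:
                found.append(obj)
                queue.append(obj)
    return found


print(broader_terms("ex:granite", TRIPLES))  # ['ex:igneous-rock', 'ex:rock']
```

With links like these, a search for a broad term such as rock can also surface records tagged only with a narrower term such as granite.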
SRA has helped us make Queensland a leading destination for mining and resource investment.
Geological Survey of Queensland (GSQ) / Department of Resources
GDMP by Qld Govt
The Geoscience Data Modernisation Project (GDMP) will help transform Queensland’s geoscience data ecosystem and enable data-driven exploration, discovery and success for industry.
The project is being managed by the Geological Survey of Queensland (GSQ).
New initiatives
The project initiatives are:
- GSQ Open Data Portal
- GSQ Lodgement Portal
- New industry reporting requirements and confidentiality changes
Our approach
By optimising every aspect of geoscience data, we aim to give industry a strong advantage in an era of ever-increasing challenges to exploration and development.
We will unlock the full value of the data we hold via key focus areas:
- data: ensure data quality and usability through systems and processes
- curation: create a secure, advanced and more accessible data repository
- adding value: embed and enable advanced data analytics
- skills and capability: expand data science and management skills in our workforce.