In June this year we upgraded our ALA systems to make them more reliable, more robust, and better equipped to manage growing volumes of biodiversity data. This major upgrade was made possible by our close partnership with GBIF, the Global Biodiversity Information Facility.

The Atlas of Living Australia is the Australian node of GBIF, an international network and data infrastructure funded by the world’s governments and aimed at providing anyone, anywhere, open access to biodiversity data.

Collaboration

Our systems upgrade is the result of more than 12 months of collaborative work between teams based in Australia, the UK and Denmark.

The development teams established a shared software codebase and built a framework for genuinely open-source software development.

“The ALA and GBIF development teams worked so well together. There’s a strong sense of trust and confidence in the skills of each team leading to individual contributions contributing to the larger whole.

Our teams know that accessible, high quality biodiversity data is improving research and environmental outcomes across the globe and that their efforts are integral to making this happen,” said Joe Miller, executive secretary of GBIF and former research group leader at the Centre for Australian National Biodiversity Research, CSIRO.

Major upgrade reaps benefits for users

Efficiency and effectiveness

The new infrastructure, built on the shared GBIF/ALA codebase, increases indexing speeds for the ALA, reduces its running costs, and reduces long-term technical debt. With the ALA's holdings growing by millions of biodiversity occurrence records every year, such efficiencies are critical.

Javier Molina, ALA’s Core Infrastructure Upgrade Project Lead, was pleased to report that users have noticed the system is more responsive, particularly when searching species data, using map-based navigation, and using the Spatial Portal (ALA’s analytical tool).

“Prior to the upgrade, we were experiencing some unscheduled outages every month. Since the project went live in June, we haven’t received any reports of unscheduled outages. The system has also coped well with high peaks of utilisation,” said Javier.

“We recently ingested a dataset of more than 20 million records from eBird Australia; this went smoothly and without issue. A dataset that large would have created lots of headaches for us in the old system.”

Consistency is key

Consistency between the ALA, GBIF and other biodiversity infrastructures is another component of the recent upgrade.

Many users of the ALA also frequently access data through GBIF. For these users, the benefits of a shared infrastructure are clear: shared approaches to processing biodiversity data eliminate potential sources of confusion. For example, when data-quality flags and interpretations are consistent across platforms, users can become comfortable and proficient with all of them.

International Living Atlases will also benefit from the upgrade

Software originally developed by the Atlas of Living Australia team is increasingly used across the GBIF network. There are now 27 installations around the world. Some of these are national biodiversity databases and some are thematic, regional or institutional instances. The International Living Atlases is a collaborative global community that supports developers who implement the open-source platform.

“Our collaborative work with GBIF has built social capital in the global community, and it is an excellent entry point for the International Living Atlases community to jump onboard,” said Javier.

“Completion of this first phase now makes it possible for the international community to co-develop new features and work on other mutual areas of interest and concern, including a shared registry and improved handling of sampling-event datasets.”

Technical details

  • AWS: virtual infrastructure from AWS performs better and is more affordable than our previous infrastructure. By far the biggest reduction in infrastructure costs came from migrating image storage from block storage (a standard file system) to AWS S3.
  • Solr search engine: by optimising our software stack around the Solr search engine, we reduced the server specifications required compared with the previous infrastructure.
  • Cost: on day 1, the project went live on infrastructure that was 43% more affordable than the old infrastructure. This represents a saving of A$79K per annum (comparison based on full on-demand pricing, production systems only).
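As a quick sanity check on the cost figures above, the short sketch below works backwards from the reported 43% reduction and A$79K annual saving to the implied annual running costs before and after the upgrade. It assumes "43% more affordable" means the new annual cost is 43% lower than the old one; the function name is hypothetical, and because both inputs are rounded, the results are only approximate.

```python
def implied_annual_costs(saving: float, fraction_saved: float) -> tuple[float, float]:
    """Return (old_cost, new_cost) implied by an absolute saving and a fractional reduction.

    Assumption: "X% more affordable" means new_cost = old_cost * (1 - X/100),
    so saving = old_cost * X/100.
    """
    old_cost = saving / fraction_saved  # baseline cost implied by the saving
    new_cost = old_cost - saving        # cost after the reported reduction
    return old_cost, new_cost


# Reported figures: A$79K saving per annum, 43% reduction (production-only, on-demand pricing).
old, new = implied_annual_costs(79_000, 0.43)
print(f"old ≈ A${old:,.0f}, new ≈ A${new:,.0f}")  # prints "old ≈ A$183,721, new ≈ A$104,721"
```

On those assumptions, the old production systems would have cost roughly A$184K a year at full on-demand prices, compared with roughly A$105K for the new ones.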

We expect the savings from running a more stable system and co-maintaining an up-to-date codebase to be significant.

Next steps

The roll-out of the upgrade in June was the first of three stages in the project.

Working from a shared codebase allows us to collaborate and innovate together, for example with respect to improving how we handle eDNA data, establishing common vocabularies, and simplifying the data ingestion process.

For more details, visit our Core Infrastructure Upgrade Project page.