Our coding tool {galaxias} makes it easier for researchers to share their biodiversity data for greater open science impact.
The ALA’s Science and Decision Support Team, who developed galaxias: (clockwise from top right) Dax Kellie, Amanda Buyan, Shandiya Balasubramaniam and Martin Westgate.

A fishy new software package will help researchers share data more easily with the Atlas of Living Australia (ALA).

{galaxias}, a new R and Python package, makes it easier than ever for researchers to bundle data and metadata for sharing.

Martin Westgate, ALA Science & Decision Support Team Lead, says a huge amount of biodiversity data is generated by long-term ecological research run out of universities.

“This data is very rich, so it has a lot of structure to tell us how things are changing over time, and it’s expertly derived,” Martin said.

“People are storing this data, publishing it and sometimes providing it to state governments as part of their permit requirements.

“But it’s not making its way into public data fora, so we felt that the story of biodiversity data in Australia wasn’t as complete as it might be.”

Going with the flow: swimming with research workflows

Dax Kellie, ALA’s Science Program Lead and a co-author of galaxias, says the team designed galaxias with a research workflow in mind.

It’s targeted at researchers who have collected and worked with their own data and might want to share it.

galaxias uses code many researchers already know and use for preparing, analysing and visualising their data. The idea is to empower researchers to manage their own data as the experts in their data.

“It gives researchers the agency to standardise their data themselves,” Dax said.

“So it makes data sharing simple and fast, and it does it in a language that most researchers are familiar with.”

When you fish upon a star: formatting data for sharing

galaxias simplifies the process of converting biodiversity data into the Darwin Core Archive (DwCA) format. This format enables sharing of biodiversity data by standardising identifying features, taxonomic classification and data definitions.

The Spotted Galaxias (Galaxias truttaceus) is found across southern Victoria, Tasmania, and south-west Western Australia. Credit: Zacky, CC BY-NC

“I guess the challenge or the technical problem that galaxias solves is that no one knows what our data format is,” Martin said.

“People will be collecting data of the same type – I saw this species on this day in this place – but using their own data structures, their own databases and their own mechanisms.”

galaxias makes it easy for users to convert their data to a universal standard, including ways to:
• standardise observational data
• write metadata
• validate minimum requirements for data sharing
• publish data and metadata.
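
To give a feel for what standardising and validating involve, here is a minimal sketch in plain Python. It does not use the galaxias API. The field records, column names and the list of required terms are assumptions made up for illustration, while the Darwin Core term names (such as scientificName and eventDate) come from the standard itself.

```python
# Illustrative only: plain Python, not the galaxias API.
import uuid

# Hypothetical field records using a researcher's own column names.
field_records = [
    {"species": "Galaxias truttaceus", "date": "2023-11-04", "lat": -42.88, "lon": 147.33},
    {"species": "Galaxias olidus", "date": "2023-11-05", "lat": -36.45, "lon": 148.26},
]

# Mapping from the hypothetical column names to Darwin Core terms.
COLUMN_TO_DWC = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}

# An assumed minimum set of terms for this sketch; real requirements may differ.
REQUIRED_TERMS = {
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
}


def to_darwin_core(record):
    """Rename known columns to Darwin Core terms and add identifying fields."""
    dwc = {COLUMN_TO_DWC[key]: value for key, value in record.items() if key in COLUMN_TO_DWC}
    dwc["occurrenceID"] = str(uuid.uuid4())  # a unique identifier for each record
    dwc["basisOfRecord"] = "HumanObservation"  # assumes every record is a direct observation
    return dwc


def missing_terms(record):
    """Return the required terms that are absent or empty, in sorted order."""
    return sorted(term for term in REQUIRED_TERMS if record.get(term) in (None, ""))


occurrences = [to_darwin_core(record) for record in field_records]
for occurrence in occurrences:
    print(occurrence["scientificName"], occurrence["eventDate"], missing_terms(occurrence))
```

The same idea applies whatever a researcher’s original column names are: map them to shared terms, add a unique identifier, then check that nothing required is missing before sharing.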

Some-fin to celebrate

The name “galaxias” comes from a genus of small native fish found in rivers, lakes and streams in South Australia, Victoria and Tasmania.

They’re small and rarely seen, and two-thirds of Australia’s Galaxias species are vulnerable or endangered, so they could do with some publicity!

“galaxias is the partner package to galah—galaxias gets data in, galah gets data out. So there’s a real circle there in where galaxias fits into the ALA universe,” Dax says.

“And, like ‘galah’, ‘galaxias’ is the name of an Australian native animal containing our acronym ‘ALA’.”

Galaxias are important members of their freshwater ecosystems and great indicators of ecosystem health, but many are quite delicate and easy to overlook.

Similarly, the ALA’s new galaxias software package aims to play an important role in the open research data ecosystem.

To in-fin-ity and beyond

The Mountain Galaxias (Galaxias olidus) is found on both sides of the Great Dividing Range from southern Queensland, south-eastern New South Wales, across Victoria and further west to South Australia. Credit: Tse Chung Yi, CC BY-NC

Sharing this stream of research data makes for a richer data ecosystem for biodiversity information in Australia.

“Sharing data means that researchers get to influence policy and get to influence the conservation of the species they’re working on – because suddenly that information is available more widely to environmental consultants,” Martin says.

“And the data will be available for other researchers, so they may generate insights that hadn’t been considered.”

Sharing data also provides the opportunity for greater insight, because data from many different sources can be combined to give a more comprehensive picture of the environment.

Feedback welcome

Our ultimate goal is to make the process of preparing data in Darwin Core simple and stress-free, encouraging more people to share data. If you choose to try galaxias, we’d appreciate any feedback about your experience so we can make galaxias better!

Feedback can be sent to support@ala.org.au or reported as an issue on the R package GitHub page or Python package GitHub page.

“This is early days and we don’t know exactly what the best package will be,” Dax said.

“We’re very flexible at this stage and happy to amend or adjust, which we’ve done before with galah.

“There’s a lot of potential here if this works to implement it more as a point and click interface too. So the more people who use it, the better; and we can build it together.”

For more information, see the quick start guide in R, the quick start guide in Python, or our short intro video.