Searching for molecular data in the Atlas

  • By Robyn Lawrence
  •  February 23, 2012
  •  Tags:  Blogs & news How to

The steps below apply to searching any data set in the Atlas, using facets to restrict the occurrence search results, querying the Spatial Portal, and downloading records.

We will choose a particular data set, filter (facet) on a particular type of record – in this case those with molecular DNA data – and look for a particular group of organisms.

Of course, you can experiment with your own searches, faceting and downloads …

How to search the Atlas

For general information on how to search the Atlas »

How to search for Data sets

If you are interested in a particular data set, for instance one containing molecular data such as the ‘European Molecular Biology Laboratory Australian Mirror’ or ‘BOLD – Australia’, one way to access its records is to find the data set, view it, and then optionally download the associated occurrence records.

Currently, only one data set in the Atlas holds records with DNA samples under the record type ‘GenomicDNA’: the ‘European Molecular Biology Laboratory Australian Mirror’ (acronym ‘EMBL’). Read how to search for data sets »

Note: The ‘BOLD – Australia’ data set (acronym ‘BOLD’) also holds molecular data, but its record type is currently set to ‘PreservedSpecimen’, not ‘GenomicDNA’. It is hoped this anomaly will be addressed before long.

Or you could facet (filter) the results of an occurrence search for molecular data, and download the records. Read more »

Or better yet, use the Spatial Portal to search for molecular data, visually display the records, and download the results. Read more »

With the downloaded results you can access other external molecular tools. Read more »


Facet the results of an occurrence search

On the occurrence search page, the search text field matches against any attribute of an occurrence record, e.g. scientific name (at any taxonomic level), collector, or data set full name.

Search the occurrence records for Lepidoptera

In the following detailed example, we match against the order, Lepidoptera, by typing it in the auto-complete Search field.

We facet (filter) the records by clicking the Record Type ‘GenomicDNA’ to restrict the selection to only those Lepidopteran records with molecular DNA data. Currently, this data is available only in the EMBL data set, so the ‘European Molecular Biology Laboratory Australian Mirror’ is the only Dataset shown in the available facets list to the left.

Lepidopteran occurrence results faceted for genomic data

The user can further narrow the returned results to their specific area of interest by clicking additional facet values, e.g. a State/Territory. Note that only one value can be chosen per facet.
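To make the idea of faceting concrete, here is a minimal sketch in Python of what the facet list and the one-value-per-facet filter do to a set of records. It runs on made-up sample records, not on the Atlas itself. The field names (`record_type`, `state`) are assumptions chosen for illustration and are not the Atlas's own field names.

```python
from collections import Counter


def facet_counts(records, field):
    """Count how many records fall under each value of one facet field."""
    return Counter(r.get(field, "") for r in records)


def apply_facets(records, selections):
    """Keep records that match every selected facet (one value per facet)."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in selections.items())
    ]


# Made-up sample records; field names and values are illustrative only.
records = [
    {"order": "Lepidoptera", "record_type": "GenomicDNA", "state": "New South Wales"},
    {"order": "Lepidoptera", "record_type": "GenomicDNA", "state": "Queensland"},
    {"order": "Lepidoptera", "record_type": "PreservedSpecimen", "state": "Queensland"},
]

# First facet: record type. Second facet: State/Territory.
genomic = apply_facets(records, {"record_type": "GenomicDNA"})
nsw_genomic = apply_facets(genomic, {"state": "New South Wales"})

print(facet_counts(genomic, "state"))
print(len(nsw_genomic))
```

Each facet narrows the previous result, which is why the facet list on the left only shows values that still occur in the current selection.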

Pressing the ‘Downloads’ button brings up a dialogue window for the user to fill in their details. Pressing ‘Download All Records’ then produces a zip file containing the occurrence records (data.csv) and a list of data providers (citations.csv). See Downloading »
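Once the zip file is on disk, both CSV files can be read without unpacking it by hand. The sketch below uses only the Python standard library. It assumes that data.csv and citations.csv sit at the top level of the archive and are UTF-8 encoded, and the function name `read_download` is made up for this example.

```python
import csv
import io
import zipfile


def read_download(zip_path):
    """Return (records, citations) as lists of dicts from a download zip.

    Assumes data.csv and citations.csv are at the top level of the archive
    and are UTF-8 encoded; adjust if your download differs.
    """
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in ("data.csv", "citations.csv"):
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables["data.csv"], tables["citations.csv"]


# Example use (the file name is hypothetical):
# records, citations = read_download("records.zip")
# print(len(records), "occurrence records from", len(citations), "providers")
```

Each row comes back as a dictionary keyed by the column headings found in the file, so no column names need to be known in advance.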

When viewing a list of occurrence records from the EMBL data set, however the list was reached, navigate to an individual record by clicking the View record link.

Example of an individual Occurrence record from the data set 'EMBL'.

Using an individual occurrence record to access the European Nucleotide Archive (ENA)

For ‘European Molecular Biology Laboratory Australian Mirror’ records, click on the More details link. You will be taken to an individual EMBL record in the European Nucleotide Archive.

EMBL occurrence data record displayed through the European Nucleotide Archive (ENA).

EMBL has a significant tool set to capture and manipulate sequence data.

When on the occurrence record results page, clicking the ‘Map’ tab produces a basic overview map of the points (see image below). Some additional faceting by colours is available on the map under ‘Colour by’ and the ‘Legend’.

Clicking on the map tab displays a basic map of the Lepidopteran points with molecular DNA data.

Clicking View in spatial portal sends the same query to the Spatial Portal, a powerful and interactive mapping and analysis tool. See Spatial Portal Help »

The Spatial portal displaying molecular occurrence data for the Order Lepidoptera, with an additional area added to the map.

Catalogue Numbers can be used to download selected sequences, either individually or in groups, from the EMBL source site. The ability to select just the sequence data from targeted areas, or from any facet division meaningful to the user, provides a powerful way of downloading only the pertinent sequence data.

In the image above, an area around Forbes has been added. The user can choose to selectively download only those records within the area. See Spatial Portal Add Areas to Map and Export Point Sample »
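The Spatial Portal does this selection for you, but the same idea can be sketched on downloaded records. The Python example below keeps only the records whose coordinates fall inside a rectangle. The column names (`latitude`, `longitude`, `catalogue_number`) and the bounding box are assumptions for illustration; the box is only a rough rectangle near Forbes, not the area drawn in the image, and areas in the Spatial Portal need not be rectangular.

```python
def within_area(record, south, west, north, east,
                lat_field="latitude", lon_field="longitude"):
    """Return True if the record's point lies inside the bounding box.

    Records with missing or non-numeric coordinates are treated as outside.
    """
    try:
        lat = float(record[lat_field])
        lon = float(record[lon_field])
    except (KeyError, TypeError, ValueError):
        return False
    return south <= lat <= north and west <= lon <= east


# Illustrative rectangle only; not the area shown in the image.
EXAMPLE_BOX = {"south": -33.6, "west": 147.8, "north": -33.2, "east": 148.2}

# Made-up sample records with hypothetical column names.
records = [
    {"catalogue_number": "X1", "latitude": "-33.38", "longitude": "148.01"},
    {"catalogue_number": "X2", "latitude": "-27.47", "longitude": "153.03"},
    {"catalogue_number": "X3", "latitude": "", "longitude": ""},
]

selected = [r for r in records if within_area(r, **EXAMPLE_BOX)]
print([r["catalogue_number"] for r in selected])
```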
 
The Catalogue Numbers of the occurrence records can be extracted from the download (data.csv) and used to create a simple comma-delimited file, which can then be uploaded into the ENA search screen (this site accepts search queries in several other ways as well).
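The extraction step can be done in a spreadsheet, or with a few lines of Python as in the sketch below. The column heading `Catalogue Number` is an assumption, so check the header row of your own data.csv and pass the real heading if it differs. The function name is made up for this example.

```python
import csv


def write_catalogue_numbers(data_csv, out_path, column="Catalogue Number"):
    """Write the unique, non-empty values of one column as a single
    comma-delimited line, and return them in first-seen order.

    The default column heading is an assumption; check your own file.
    """
    seen = set()
    numbers = []
    with open(data_csv, newline="", encoding="utf-8") as source:
        for row in csv.DictReader(source):
            value = (row.get(column) or "").strip()
            if value and value not in seen:
                seen.add(value)
                numbers.append(value)
    with open(out_path, "w", encoding="utf-8") as target:
        target.write(",".join(numbers))
    return numbers


# Example use (file names as in the text):
# write_catalogue_numbers("data.csv", "test.txt")
```

Duplicates and blank values are dropped so that each sequence is requested only once.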

Test txt file of Catalogue Numbers for loading into other external tools

In the ENA, click on ‘Choose File’ and browse to a file containing Catalogue Numbers (‘test.txt’ in the above image).

Add the text file with Catalogue Numbers in ENA

EMBL occurrence data record displayed through the European Nucleotide Archive (ENA)

Results from the search can then be downloaded in multiple formats, including Text, XML or FASTA. A similar process is also available through the NCBI nucleotide repository.
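A FASTA file is plain text: each record starts with a header line beginning with ‘>’, followed by one or more lines of sequence. If you want to inspect a download without an alignment package, a minimal reader can be sketched in Python as follows. The sample sequences are made up and the function name is hypothetical.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence.

    Sequence lines that belong to one record are joined together.
    """
    parts = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            parts[header] = []
        elif header is not None:
            parts[header].append(line)
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Made-up example content, not real sequence data.
sample = """>seq1 made-up example
ACGTAC
GTTA
>seq2 made-up example
GGCC
"""

sequences = parse_fasta(sample)
for name, sequence in sequences.items():
    print(name, len(sequence))
```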

The image below shows data imported into an alignment package (BioEdit) using the FASTA file download option.

BioEdit Sequence Alignment Editor

Use the Spatial Portal to facet the results for Genomic Lepidopteran data for an area

The process described above, using the Atlas to search for data sets and to view and download occurrence records containing genomic data, can be replicated using the Spatial Portal. For more info »

Useful Links

  • A Blog post about Molecular Data through the ALA’s Data sets and the Spatial Portal. Read more »