Searching for molecular data in the Atlas
- By Robyn Lawrence
- February 23, 2012
- Tags: Blogs & news, How to
The following applies to searching any data set in the Atlas: using facets to restrict the occurrence search results, querying the Spatial Portal and downloading records.
We will choose a particular data set, filter (facet) on a particular type of record – in this case those with molecular DNA data, and look for a particular group of organisms.
Of course, you can experiment with your own searches, faceting and downloads …
How to search the Atlas
For general information on how to search the Atlas »
How to search for Data sets
If you are interested in a particular data set, for instance one containing molecular data such as the ‘European Molecular Biology Laboratory Australian Mirror’ or ‘BOLD – Australia’, one way to access its records is to find the data set, view its records and then optionally download them.
Currently, only one data set in the Atlas holds records of DNA samples with the record type ‘GenomicDNA’: the ‘European Molecular Biology Laboratory Australian Mirror’, acronym ‘EMBL’. Read how to search for data sets »
Note: The ‘BOLD – Australia’ data set (acronym ‘BOLD’) also holds molecular data, but its record type is currently set to ‘PreservedSpecimen’ rather than ‘GenomicDNA’. It is hoped this anomaly will be addressed before long.
Or you could facet (filter) the results of an occurrence search for molecular data, and download the records. Read more »
Or better yet, use the Spatial Portal to search for molecular data, visually display the records, and download the results. Read more »
The downloaded results can then be used with other, external molecular tools. Read more »
Facet the results of an occurrence search
On the occurrence search page, the search text field matches against any attribute of an occurrence record e.g., scientific name (at any taxonomic level), collector, data set full name etc.
In the following detailed example, we match against the order Lepidoptera by typing it into the auto-complete Search field.
We facet (filter) the records by clicking the Record Type ‘GenomicDNA’, which restricts the selection to Lepidopteran records with molecular DNA data. These data are currently available only in the EMBL data set, so the ‘European Molecular Biology Laboratory Australian Mirror’ is the only Dataset shown in the list of available facets to the left.
You can narrow the results further to your specific area of interest by clicking additional facet values, for example a State/Territory. Note that only one value can be selected per facet.
Pressing the ‘Downloads’ button brings up a dialogue window in which you fill in your details. Pressing ‘Download All Records’ then produces a zip file containing the occurrence records (data.csv) and a list of data providers (citations.csv). See Downloading »
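Conceptually, each facet click adds one filter (one value per facet), and the facet list to the left shows counts for the records that remain. The Python sketch below mimics that behaviour on records held locally. The field names `order`, `record_type` and `state`, and the sample records, are hypothetical and chosen only for illustration; this is not how the Atlas itself implements faceting.

```python
from collections import Counter

# Hypothetical records; field names are illustrative, not the Atlas schema.
records = [
    {"order": "Lepidoptera", "record_type": "GenomicDNA", "state": "New South Wales"},
    {"order": "Lepidoptera", "record_type": "GenomicDNA", "state": "Queensland"},
    {"order": "Lepidoptera", "record_type": "PreservedSpecimen", "state": "New South Wales"},
    {"order": "Coleoptera", "record_type": "GenomicDNA", "state": "New South Wales"},
]


def apply_facets(rows, facets):
    """Keep rows that match every selected facet value."""
    return [r for r in rows if all(r.get(field) == value for field, value in facets.items())]


def facet_counts(rows, field):
    """Count the remaining rows per value of one facet, like the list beside the results."""
    return Counter(r.get(field) for r in rows)


# A dict holds the selection, so each facet can only ever have one value.
selected = {"order": "Lepidoptera", "record_type": "GenomicDNA"}
hits = apply_facets(records, selected)
print(facet_counts(hits, "state"))

# Narrow further by choosing a State/Territory.
selected["state"] = "New South Wales"
print(len(apply_facets(records, selected)))  # 1
```

Storing the selection in a dictionary keyed by facet name mirrors the "one value per facet" rule: choosing a new State/Territory replaces the previous one rather than adding to it.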
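Once the zip file has been saved, both of its files can be read with Python's standard library. This sketch assumes the archive holds `data.csv` and `citations.csv` at its top level, as described above. The archive name `records.zip` is hypothetical, and the column headings depend on the download itself.

```python
import csv
import io
import zipfile


def read_download(zip_source):
    """Return (occurrence rows, citation rows) from a download archive.

    zip_source may be a file path or a binary file-like object.
    Assumes data.csv and citations.csv sit at the top level of the zip.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in ("data.csv", "citations.csv"):
            with archive.open(name) as raw:
                # utf-8-sig tolerates a byte-order mark at the start of the file
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                tables[name] = list(csv.DictReader(text))
    return tables["data.csv"], tables["citations.csv"]


# Usage (hypothetical file name):
# occurrences, citations = read_download("records.zip")
# print(len(occurrences), "occurrence records;", len(citations), "citation rows")
```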
Using an individual occurrence record to access the European Nucleotide Archive (ENA)
For ‘European Molecular Biology Laboratory Australian Mirror’ records, click on the More details link. You will be taken to an individual EMBL record in the European Nucleotide Archive.
EMBL provides an extensive set of tools for capturing and manipulating sequence data.
When you are on the occurrence record results page, clicking the ‘Map’ tab produces a basic overview map of the points (see image below). Some additional faceting by colour is available on the map under ‘Colour by’ and in the ‘Legend’.
Catalogue Numbers can be used to download selected sequences from the EMBL source site, either individually or in groups. Because you can select only the sequence data from targeted areas, or from any facet division meaningful to you, this is a powerful way of downloading only the pertinent sequence data.
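As a sketch of fetching sequences by Catalogue Number, the function below builds one FASTA download URL per accession. It assumes that the Catalogue Numbers of EMBL records are ENA accession numbers, and it uses the current ENA Browser URL pattern (`https://www.ebi.ac.uk/ena/browser/api/fasta/<accession>`). That pattern is an assumption on our part and postdates the interface described in this post, so check the ENA documentation before relying on it. The accession numbers in the example are hypothetical.

```python
from urllib.parse import quote

# Assumed URL pattern for the current ENA Browser API; verify against ENA documentation.
ENA_FASTA_URL = "https://www.ebi.ac.uk/ena/browser/api/fasta/{accession}"


def fasta_urls(catalogue_numbers):
    """Build one FASTA URL per distinct, non-empty Catalogue Number, keeping first-seen order."""
    # dict.fromkeys removes duplicates while preserving insertion order
    accessions = dict.fromkeys(n.strip() for n in catalogue_numbers if n.strip())
    return [ENA_FASTA_URL.format(accession=quote(a)) for a in accessions]


for url in fasta_urls(["AB123456", " AB123456", "AB654321"]):
    print(url)
```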
In the image above, an area around Forbes has been added. The user can choose to selectively download only those records within the area. See Spatial Portal Add Areas to Map and Export Point Sample »
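The Spatial Portal handles area selection itself. If you have already downloaded records, a rough local equivalent is to keep only the points that fall inside a rectangular latitude/longitude box. In this sketch the column names `Latitude`, `Longitude` and `Catalogue Number`, and the box limits, are hypothetical placeholders. Substitute the coordinate columns in your own data.csv and the limits of your own area, and note that a rectangle only approximates a drawn area.

```python
def within_box(rows, south, north, west, east, lat_field="Latitude", lon_field="Longitude"):
    """Keep rows whose decimal coordinates fall inside a latitude/longitude rectangle.

    Rows with missing or non-numeric coordinates are skipped.
    """
    kept = []
    for row in rows:
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            continue
        if south <= lat <= north and west <= lon <= east:
            kept.append(row)
    return kept


# Illustrative values only.
rows = [
    {"Catalogue Number": "AB000001", "Latitude": "-33.4", "Longitude": "148.0"},
    {"Catalogue Number": "AB000002", "Latitude": "-27.5", "Longitude": "153.0"},
    {"Catalogue Number": "AB000003", "Latitude": "", "Longitude": "148.0"},
]
for row in within_box(rows, south=-34.0, north=-33.0, west=147.5, east=148.5):
    print(row["Catalogue Number"])  # AB000001
```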
The Catalogue Numbers of the occurrence records can be extracted from the download (data.csv) and used to create a simple comma-delimited file, which can then be uploaded into the ENA search screen. (The site accepts search queries in several other ways too.)
In the ENA, click ‘Choose File’ and browse to a file containing the Catalogue Numbers (‘test.txt’ in the image above).
Results from the search can then be downloaded in multiple formats, including Text, XML and FASTA.
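The extraction step can be sketched in a few lines of Python. The column heading `Catalogue Number` is an assumption about the header row of data.csv and may differ in your download. The output file name `test.txt` simply matches the example used here.

```python
import csv


def catalogue_numbers(csv_path, column="Catalogue Number"):
    """Read distinct, non-empty Catalogue Numbers from data.csv, in file order."""
    numbers = {}  # dict keys keep insertion order and drop duplicates
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(column) or "").strip()
            if value:
                numbers[value] = None
    return list(numbers)


def write_query_file(numbers, out_path="test.txt"):
    """Write the numbers as one comma-delimited line, ready to upload to the ENA search."""
    with open(out_path, "w", encoding="utf-8") as handle:
        handle.write(",".join(numbers) + "\n")


# Usage:
# write_query_file(catalogue_numbers("data.csv"))
```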
A similar process is also available through the NCBI nucleotide repository.
The FASTA download option can be used to import the data into an alignment package such as BioEdit.
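A FASTA download can also be checked programmatically, for example to count the sequences and their lengths. This is a generic sketch of the FASTA format (a `>` header line followed by one or more sequence lines), not a tool provided by the ENA or the Atlas. For serious work, a library such as Biopython's `SeqIO` is the usual choice. The example records are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text.

    Blank lines, and any lines before the first '>' header, are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>AB000001 hypothetical Lepidoptera sequence
ACGTACGT
ACGT
>AB000002 hypothetical Lepidoptera sequence
TTGACC
"""
for name, seq in parse_fasta(example):
    print(name.split()[0], len(seq))  # AB000001 12, then AB000002 6
```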
Use the Spatial Portal to facet the results for Genomic Lepidopteran data for an area
The process described above, using the Atlas to search for data sets and to view and download occurrence records containing genomic data, can be replicated using the Spatial Portal. For more info »
- A Blog post about Molecular Data through the ALA’s Data sets and the Spatial Portal. Read more »