Scatterplot

  • By Dena Paris
  •  February 5, 2012
  •  Tags:  spatial portal help

The scatterplot function links the sampled values of any two environmental variables for a species (or genus, etc.) with the map. Each point on the scatterplot represents the environment found at one occurrence record, as given by the two environmental variables on the axes of the scatterplot.

The scatterplot (environmental space) and the map (geographic space) are linked. Dragging a rectangle over an area of the scatterplot to enclose occurrence points will highlight the corresponding points on the map. You can also define an active area on the map and have all occurrences within that area highlighted on the scatterplot.
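The two-way link described above can be pictured with a minimal sketch. This is not the portal's code: the `Occurrence` type, the function names and the sample records are all invented for illustration, and the active area is assumed to be a simple bounding box (in the portal it may be any defined area).

```python
from typing import NamedTuple, Optional


class Occurrence(NamedTuple):
    """Hypothetical record: a map position plus two sampled environmental values."""
    record_id: str
    lon: float
    lat: float
    x: Optional[float]  # first environmental variable (e.g. mean annual temperature)
    y: Optional[float]  # second environmental variable (e.g. annual precipitation)


def select_in_environment(occs, x_min, x_max, y_min, y_max):
    """Scatterplot rectangle -> ids of the records to highlight on the map."""
    return [o.record_id for o in occs
            if o.x is not None and o.y is not None
            and x_min <= o.x <= x_max and y_min <= o.y <= y_max]


def select_in_area(occs, west, south, east, north):
    """Map bounding box (assumed active area) -> ids to highlight on the scatterplot."""
    return [o.record_id for o in occs
            if west <= o.lon <= east and south <= o.lat <= north]


# Invented sample records, for illustration only.
occurrences = [
    Occurrence("a", 147.1, -41.7, 11.5, 650.0),
    Occurrence("b", 144.6, -37.4, 11.9, 800.0),
    Occurrence("c", 131.0, -13.5, 27.2, 400.0),
]

cool_records = select_in_environment(occurrences, 10.0, 12.5, 500.0, 900.0)
northern_records = select_in_area(occurrences, 120.0, -20.0, 140.0, -10.0)
```

Because both selections return record identifiers, the same identifiers can be highlighted in either space, which is what makes the two views behave as one.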

Scatterplot 1_640
The scatterplot allows you to:

  • Examine the environmental niche of one or two taxa or taxonomic groups. Using a species as the primary taxon and its genus as the secondary taxon can be highly informative. For example, it may help you evaluate survey effort for the species. Two similar species can be plotted to see how they may be partitioning the environment.
  • Identify where species do not seem to occur. Areas in the scatterplot that do not contain occurrences are just as interesting as areas that do. Species may not occur in an area of the scatterplot for at least four reasons:
    1. The species may not be able to survive in that environment.
    2. The species may occur in that environment but there have been no surveys to look for it.
    3. The environment may have been surveyed but the species was not found, even though it was there.
    4. The species may be in that environment, but misidentified. (The scatterplot functionality has been designed to help discern which reason seems most likely.)
  • Identify where a species tolerates a wide range of environmental conditions.
  • Identify sub-populations that may adapt best to various climate change scenarios.
  • Identify what environmental combinations exist within an area.


From the menu, select ‘Tools’ and then ‘Scatterplot’.

Options

The scatterplot requires a minimum of three parameters – a species or taxonomic group and two environmental variables.

  • A primary species or any taxonomic group. The occurrences of this group are the points on the scatterplot (e.g. Eucalyptus camaldulensis).
  • Optionally, a second species or taxonomic group can be selected as a background (e.g. the genus Eucalyptus).
  • Two environmental layers (e.g. Temperature – annual mean (Bio01) and Precipitation – annual (Bio12)).

Faceting on legends

Any legend permits modification of the display of the associated layer. In the scatterplot tool, this means that the display of the points (occurrences) can be modified in both the map (geographic) and scatterplot (environmental) space. To activate the legend in the scatterplot, click on the ‘Species display setting’ button. This will create a floating legend that permits rendering the points in both spaces on the basis of selected legend properties. For example, in the image at the top of the page, “Institution” was selected in the facet dropdown box on the legend and the Apply button pressed. After a short delay (longer when there are many points), the points on the scatterplot and the map are coloured according to the institution facet.
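As a rough illustration of what colouring by a facet involves, the sketch below assigns one colour per distinct facet value and applies it to every record. The palette, the field names (`id`, `institution`) and the sample records are assumptions made for this example, not the portal's actual data model.

```python
from itertools import cycle

# Hypothetical palette; the portal chooses its own colours.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def colours_by_facet(records, facet):
    """Map each distinct facet value to a colour, in first-seen order."""
    palette = cycle(PALETTE)
    colour_of = {}
    for rec in records:
        value = rec.get(facet, "Unknown")
        if value not in colour_of:
            colour_of[value] = next(palette)
    return colour_of


def colour_points(records, facet):
    """Return (record id, colour) pairs usable for both the map and the scatterplot."""
    colour_of = colours_by_facet(records, facet)
    return [(rec["id"], colour_of[rec.get(facet, "Unknown")]) for rec in records]


# Invented sample records, for illustration only.
records = [
    {"id": "r1", "institution": "Herbarium A"},
    {"id": "r2", "institution": "Museum B"},
    {"id": "r3", "institution": "Herbarium A"},
]
coloured = colour_points(records, "institution")
```

Keying the colour on the record rather than on its position is what lets the same colouring appear in both geographic and environmental space.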

For more detailed information on Scatterplot faceting »

Additional Options

  • Highlight active area occurrences. Selecting this option will highlight those occurrences on the scatterplot occurring within the map’s defined active area.
  • Display possible environments in area produces a grey-scale background on the scatterplot that delineates the combination of environments that occur within the full range of possible environmental values. Environmental combinations outside the environmental envelope do not occur in nature and are shown in light blue. The grey-scale represents the geographic (map) area of the environmental combinations. Light colours imply a greater area of the environmental combination than darker colours, thus providing an indication of the spatial extents of the environments.
  • The Select records with missing values check box will highlight those occurrences on the map that have one or more missing environmental values. In most cases, missing values will occur when occurrences are off the extent of the environmental surface, e.g. terrestrial occurrences occurring in marine or limnetic environments or vice versa. The number of occurrences with missing environmental values is also listed.
  • Selecting occurrence points on the scatterplot. A rectangular area can be created on the scatterplot by dragging the mouse between any two points. When the left mouse button is released:
    • the area perimeter will be displayed on the scatterplot with a black border.
    • occurrences of the primary taxon in this area will be highlighted on the map.
    • the number of records selected will be listed.
    • the range of both environmental variables for the rectangle area will be listed.
    • the ‘Add in/out layers to the map’ button adds two new layers to the layers list: an IN-group that includes all the occurrences within the environmental rectangle and an OUT-group that contains all occurrences not within the environmental rectangle. The legend and look of the IN/OUT layers can be altered in the same way as for other species layers. See the Layer Interaction Panel for a screenshot of the species legend.
  • The Download data button creates a CSV (comma-separated values) file of the data used to create the scatterplot. The variables are the:
    • Occurrence record identifier.
    • Status of the scientific name occurrence: ‘Uploaded’ or ‘In Active Area’ (for those records highlighted in the active area of the scatterplot, there will be both an uploaded and an in active area record).
    • X – environmental variable value.
    • Y – environmental variable value.
  • The Download image button will create a PNG image file of the scatterplot as displayed when the button is pressed. This file can be viewed or downloaded.
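The downloaded CSV can be summarised with a few lines of code. The sketch below is only indicative: the column headers (`id`, `status`, `x`, `y`), the sample rows, and the assumption that a missing environmental value appears as an empty field are all hypothetical, so check them against an actual download before relying on them.

```python
import csv
import io
from collections import Counter

# Invented sample with hypothetical column headers; a real file may differ.
SAMPLE = """id,status,x,y
r1,Uploaded,25.6,290.0
r1,In Active Area,25.6,290.0
r2,Uploaded,14.2,
"""


def summarise(csv_text):
    """Count rows per status and collect ids of records with a missing value."""
    status_counts = Counter()
    ids_missing = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        status_counts[row["status"]] += 1
        # Assumption: a missing environmental value is an empty field.
        if row["x"] == "" or row["y"] == "":
            ids_missing.add(row["id"])
    return status_counts, ids_missing


status_counts, ids_missing = summarise(SAMPLE)
```

Note that a record highlighted in the active area appears twice (once per status), so counting rows is not the same as counting records.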

A case study

A case study on using the scatterplot tool to investigate the distribution of Banksia integrifolia in Australia is given by Dr Ben Raymond of the Australian Antarctic Division, Hobart.

Read the Case Study »

A worked example

  • Eucalyptus camaldulensis as the Primary species.
  • Eucalyptus (genus) as the background taxonomic group.
  • Temperature – annual mean (Bio01).
  • Precipitation – annual (Bio12).

Once you have entered the name of the primary taxon (Eucalyptus camaldulensis), the (primary) occurrences are mapped.

Scatterplot 3_640

The background taxonomic group is the genus Eucalyptus. This gives us a good indication of what environments the genus covers and what portion of that environment is covered by E. camaldulensis. The genus occurrences are shown only on the scatterplot, as orange points in the background. E. camaldulensis occurrences are shown as blue points. If the ‘Highlight active area occurrences’ option were selected, those records would be ringed with a red circle.

In the worked example, we will use temperature (Temperature – annual mean (Bio01)) and precipitation (Precipitation – annual (Bio12)) as the two environmental variables to define the environment. Once these two variables have been added, the scatterplot is generated. As there is a large number of occurrences (Eucalyptus has over 240,000 records), processing can take up to a minute or so. The distribution of Eucalyptus (orange dots) covers a significant portion of the scatterplot, indicating that the genus can handle a wide range of temperature and rainfall conditions. The majority of the distribution is below 2,500mm rainfall, with two higher rainfall extensions at low and high temperature. To learn more about the environment used by the genus, make it the primary taxon.

Eucalyptus camaldulensis is located toward the bottom of the scatterplot distribution and clearly follows the outline of the genus ‘envelope’ at the low precipitation end, but over a broad range of temperature. This suggests that E. camaldulensis is typical of eucalypts adapted to low rainfall. However, it covers mean annual temperatures from 12°C to nearly 30°C – a very impressive range!

Let’s look at some of the outliers to see where they occur. First, the low temperature end. Drag a rectangle over the lower end occurrences on the scatterplot. This highlights the corresponding points on the map, near Cressy in Tasmania and Macedon in Victoria. The former is low altitude, but further south than the higher altitude Macedon.
Scatterplot 5_640

Let’s do the same at the high temperature end to see where these occurrences are located. Drag the rectangle on the scatterplot and then examine the highlighted occurrences on the map. Not unexpectedly, the high temperature occurrences are found in the extreme north of Australia.

Scatterplot 6_640

Note that the ranges of temperature and rainfall values for the rectangle are listed above the scatterplot. In this case, they are a mean annual temperature range of 25.6130°C to 28.0974°C and rainfall between 285.996mm and 485.908mm. Also note that there are 20 records selected.

The selected occurrences could be used to create two new mapped layers: an ‘IN-group’ containing only those 20 occurrences and an ‘OUT-group’ containing all the rest. This option can be useful for separating out a subset of occurrences for further analysis in, say, the spatial prediction model. Also note that there are 73 occurrences that have one or two missing environmental values of temperature or rainfall. If IN/OUT groups are created, these occurrences are added to the OUT-group by default. If you tick the ‘Select records with missing values’ checkbox, the corresponding occurrences will be highlighted on the map and added to the IN-group instead. In all cases here, these occurrences are located off the terrestrial temperature and rainfall surfaces; they fall in the ocean. This may be due to the resolution of the surfaces or of the coastline, or simply to inaccurate occurrence locations.
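The IN/OUT rule described above, including the treatment of records with missing values, can be sketched as follows. The function name, the `missing_in` flag and the sample records are invented for illustration; only the rectangle ranges are taken from the worked example.

```python
def split_in_out(records, x_range, y_range, missing_in=False):
    """Partition (id, x, y) records into IN and OUT groups.

    Records with a missing value go to OUT by default, or to IN when
    missing_in is True (mirroring the 'Select records with missing
    values' checkbox as described in the text).
    """
    in_group, out_group = [], []
    for rec_id, x, y in records:
        if x is None or y is None:
            (in_group if missing_in else out_group).append(rec_id)
        elif x_range[0] <= x <= x_range[1] and y_range[0] <= y <= y_range[1]:
            in_group.append(rec_id)
        else:
            out_group.append(rec_id)
    return in_group, out_group


# Rectangle from the worked example; the records themselves are invented.
TEMP_RANGE = (25.6130, 28.0974)
RAIN_RANGE = (285.996, 485.908)
records = [
    ("hot-dry", 27.0, 400.0),
    ("cool-wet", 12.3, 900.0),
    ("offshore", None, None),
]

default_split = split_in_out(records, TEMP_RANGE, RAIN_RANGE)
with_missing = split_in_out(records, TEMP_RANGE, RAIN_RANGE, missing_in=True)
```

Every record lands in exactly one group, so the IN and OUT layers together always account for the full set of occurrences.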

Scatterplot 8_640

Next, let’s consider why E. camaldulensis doesn’t occur in a few environments on the scatterplot.

There is a hole in the distribution of E. camaldulensis at around 25°C and 600mm that is filled by other eucalypt species (shown by the orange Eucalyptus background points), so that environment exists in nature. But why are there no occurrences here? There are at least four possibilities:

  1. E. camaldulensis may not be able to survive in that environment. This is possible here but unlikely.
  2. E. camaldulensis may occur in that environment but there have been no surveys in the area represented by that environment. This is a likely scenario and best addressed by a targeted survey.
    NOTE: You can use the Environmental Envelope option for defining an Active Area (see Active Area Help) to map the locations that conform to this environmental combination.
  3. The environment may have been surveyed but the species was not seen, even though it was there. This is another possible scenario, but E. camaldulensis is a huge tree, so one would hope that this is unlikely. But you never know!
  4. E. camaldulensis may have been seen in that environment, but misidentified. Not an uncommon taxonomic problem! The Atlas is incorporating identification keys using Identify Life (http://www.identifylife.org/). You could examine what eucalypts occur in the area represented by this hole by using the Map All option on the Environmental Envelope noted in (2) above and see if their characters could be confused with E. camaldulensis.

The same situation doesn’t occur with the ‘dent’ in the environment at around 14°C and 2,500mm. That environment apparently doesn’t exist in Australia (at least not as represented by the environmental layers we have chosen), so it is not surprising that no eucalypts are found there. The Eucalyptus background covers much of the potential environmental range, indicating the ubiquity of the genus. The grey-scale of the ‘Display possible environments in area’ option indicates the size of the corresponding mapped areas, with black representing only a small area of Australia with this environment and, conversely, white a large area. For example, there are only small areas of Australia with extreme rainfall (around Tully in northern Queensland), and a large area of very low rainfall. This can be explored further by examining the environmental layers: Temperature – annual mean (Bio01) and Precipitation – annual (Bio12).
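One way to picture how such a grey-scale background could be built is to bin the grid cells of the two environmental layers and shade each bin by how many cells fall in it. The sketch below assumes equal-area grid cells, a linear shading scale and invented sample values; the bin widths and function names are hypothetical and the portal may compute this differently.

```python
from collections import Counter


def environment_area(cells, x_bin, y_bin):
    """Count grid cells per (x, y) environmental bin.

    Assumes every grid cell covers the same area, so the count is a
    stand-in for geographic area.
    """
    counts = Counter()
    for x, y in cells:
        counts[(int(x // x_bin), int(y // y_bin))] += 1
    return counts


def grey_level(count, max_count):
    """Linear shade: near 0 is black (small area), 255 is white (largest area)."""
    return round(255 * count / max_count)


# Invented (temperature, rainfall) values for four grid cells.
cells = [(25.1, 300), (25.4, 310), (25.2, 350), (14.0, 2400)]
area = environment_area(cells, x_bin=1, y_bin=100)
largest = max(area.values())
shades = {bin_key: grey_level(n, largest) for bin_key, n in area.items()}
```

Bins that never appear in `area` correspond to environmental combinations that do not occur in the chosen layers, which is the region the text describes as shown in light blue.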

Mean annual temperature and annual rainfall were chosen because these variables are very likely to constrain the spatial distribution of eucalypts. You may wish to use the Prediction Tool (MaxEnt) to find out which environmental variables seem to best explain the distribution of Eucalyptus camaldulensis.

Demonstration YouTube Video

By Lee Belbin, Geospatial Team Leader