Uncertainty, priors, autocorrelation and disparate data in downscaling of species distributions

Keil P, Wilson AM, Jetz W 2014. Diversity and Distributions 20: 797-812

Abstract

Aim We provide a step-by-step demonstration of how a map of species’ detections at continental extent can be downscaled to a fine-grain map of occurrence probabilities using two-scale hierarchical Bayesian modelling (HBM). The method requires fine-grain environmental data but not fine-grain data on species detections. We demonstrate how it can incorporate spatial autocorrelation (SAC) and informative priors based on known habitat preferences, and how it can provide maps of uncertainty in the downscaled predictions.

Location USA.

Methods We used range map and point record data on the distribution of the American three-toed woodpecker (Picoides dorsalis, Baird 1858) to produce a reliable coarse-grain (160 km × 160 km) map of the species’ presences and absences. We developed an HBM that combines this coarse-grain information with fine-grain (20 km × 20 km) environmental data to predict probabilities of occurrence at the fine grain, together with 95% prediction intervals. The model incorporated SAC in the form of conditional autoregressive (CAR) random effects, and prior knowledge of habitat preferences in the form of prior distributions on model parameters. We evaluated the predictions using 751 well-surveyed fine-grain cells.
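The link between the two grains can be illustrated with a minimal sketch. It assumes, as one common formulation rather than something stated in this abstract, that a coarse cell is occupied whenever at least one of its fine cells is occupied, so that P(coarse) = 1 - Π(1 - p_fine). With the grains given above, each 160 km cell contains 8 × 8 = 64 cells of 20 km. The function names, the logistic link and the covariate matrix `X` are hypothetical, and the CAR random effects and informative priors are deliberately left out.

```python
import numpy as np

def fine_probabilities(X, beta):
    """Assumed logistic link: fine-grain occurrence probability from covariates."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

def coarse_probability(p_fine, coarse_id, n_coarse):
    """P(coarse cell occupied) = 1 - prod(1 - p_fine) over its fine cells.

    coarse_id[i] is the index of the coarse cell that contains fine cell i.
    The product is accumulated on the log scale for numerical stability.
    """
    log_absent = np.zeros(n_coarse)
    np.add.at(log_absent, coarse_id, np.log1p(-p_fine))
    return 1.0 - np.exp(log_absent)

def log_likelihood(beta, X, coarse_id, y_coarse):
    """Bernoulli log-likelihood of coarse presences/absences given fine-grain covariates."""
    p_fine = fine_probabilities(X, beta)
    P = coarse_probability(p_fine, coarse_id, len(y_coarse))
    P = np.clip(P, 1e-12, 1 - 1e-12)  # guard against log(0)
    return np.sum(y_coarse * np.log(P) + (1 - y_coarse) * np.log1p(-P))

# Toy usage: 2 coarse cells, each containing 64 fine cells, 1 covariate plus intercept.
rng = np.random.default_rng(0)
coarse_id = np.repeat(np.arange(2), 64)
X = np.column_stack([np.ones(128), rng.normal(size=128)])
y_coarse = np.array([1, 0])
ll = log_likelihood(np.array([-4.0, 1.0]), X, coarse_id, y_coarse)
```

In a Bayesian treatment this log-likelihood would be combined with priors on `beta` and sampled by MCMC; only the coarse-grain data enter the likelihood, which is why no fine-grain detections are needed for fitting.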

Results Our HBM produced reliable fine-grain probabilities of occurrence that matched the detections and non-detections in the 751 validation cells well (Nagelkerke’s R² = 0.69, AUC = 0.93). By mapping the uncertainty in the downscaled predictions, we identified areas of low uncertainty and high occurrence probability, as well as large areas of high prediction uncertainty. Mapping the autocorrelation term enabled us to identify areas of likely spurious observations.
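For readers unfamiliar with the two validation statistics, the following sketch shows how they can be computed from observed detections `y` (0/1) and predicted probabilities `p`. It is an illustration under assumptions, not the authors' evaluation code: the function names are hypothetical, and the predicted probabilities are treated directly as fitted values when computing Nagelkerke's R², which may differ from the procedure used in the paper.

```python
import numpy as np

def auc(y, p):
    """AUC: probability that a random presence is ranked above a random absence.

    Ties count as one half. Requires at least one presence and one absence.
    """
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]  # all presence/absence pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def nagelkerke_r2(y, p, eps=1e-12):
    """Nagelkerke's R2: Cox-Snell R2 rescaled by its maximum attainable value.

    The null model predicts the overall prevalence for every cell, so y must
    contain both presences and absences.
    """
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    n = len(y)
    ll_model = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    p0 = y.mean()
    ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))
    r2_cox_snell = 1 - np.exp(2 * (ll_null - ll_model) / n)
    r2_max = 1 - np.exp(2 * ll_null / n)
    return r2_cox_snell / r2_max

# Toy usage with four validation cells.
y = np.array([0, 0, 1, 1])
p = np.array([0.1, 0.4, 0.35, 0.8])
print(auc(y, p))            # 0.75: three of the four presence/absence pairs are ordered correctly
print(nagelkerke_r2(y, p))
```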

Main conclusions We demonstrate how hierarchical downscaling enables estimation of species distributions at grains finer than the grain of the original data. The approach can integrate various types of information on distribution and biology in a single statistical framework, and it allows prediction uncertainty to be propagated and mapped. However, the approach remains computationally challenging for large datasets.