Improving the Modeling of Government Surveillance Data: a Case Study on Malaria in the Brazilian Amazon
The study of the effect of the environment (e.g., climate and land use) on disease typically relies on aggregate disease data collected by the government surveillance network. The usual approach to analyzing these data, however, often ignores a) changes in sampling effort (i.e., total number of individuals examined); b) the fact that these data are biased towards symptomatic individuals; and c) the fact that the observations (e.g., individuals diagnosed and treated for the disease) often directly influence disease dynamics by decreasing infection prevalence. Here we highlight the consequences of ignoring the problems listed above and develop a novel modeling framework to circumvent them. We illustrate this modeling framework using simulated and real malaria data from the Western Brazilian Amazon.
Our simulations reveal that trends in the number of disease cases do not necessarily imply similar trends in infection prevalence or incidence, due to the strong influence of concurrent changes in sampling effort. Furthermore, we show that ignoring decreases in the pool of infected individuals due to the treatment of part of these individuals can significantly hinder inference on underlying patterns of infection incidence. We propose an innovative model that avoids the problems listed above. This model can be seen as a compromise between more phenomenological statistical models and more mechanistic disease dynamics models; in particular, a validation exercise reveals that the proposed model has higher out-of-sample predictive performance than either one of these alternative models. Our case study on malaria in the Brazilian Amazon reveals surprising patterns in infection prevalence and incidence, which might be partially attributed to seasonal rainfall variation.
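The first point above, that case counts can trend upward while prevalence stays flat, can be illustrated with a minimal simulation sketch. This is not the model proposed in the thesis: the function name `simulate_cases`, the constant prevalence of 0.10, and the linearly growing sampling effort are all assumptions chosen for illustration, and the sketch ignores both the bias towards symptomatic individuals and the effect of treatment on the infected pool.

```python
import random


def simulate_cases(prevalence, effort, seed=0):
    """Draw the number of positive individuals found in each period.

    prevalence: infection prevalence (a probability) for each period
    effort: number of individuals examined in each period
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    cases = []
    for p, n in zip(prevalence, effort):
        # Each examined individual is positive with probability p,
        # so the period total is a binomial draw.
        cases.append(sum(rng.random() < p for _ in range(n)))
    return cases


periods = 10
prevalence = [0.10] * periods                     # assumed: true prevalence is constant
effort = [200 + 200 * t for t in range(periods)]  # assumed: sampling effort grows over time

cases = simulate_cases(prevalence, effort)
positivity = [c / n for c, n in zip(cases, effort)]

# Print period, individuals examined, cases found, and proportion positive.
for t in range(periods):
    print(t, effort[t], cases[t], round(positivity[t], 3))
```

Under these assumptions the raw case counts rise roughly in step with the number of individuals examined, while the proportion positive fluctuates around the constant prevalence. A trend in cases alone therefore cannot be read as a trend in prevalence unless sampling effort is accounted for.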
We have proposed and applied a novel modeling approach that avoids problems that have plagued several earlier analyses of government surveillance disease data. We illustrate how ignoring these problems can significantly hinder inference on the effect of environmental factors on disease dynamics. This modeling approach is likely to be useful for the modeling of various diseases using government surveillance data.
government surveillance data
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Masters Theses
Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content.