Data mining US-wide county-level data suggests an environmental role in the incidence of PD

Posted to NeuroTalk 29th July 2012
http://neurotalk.psychcentral.com/thread173942.html
-------------------------------------------------------------------------

Summary

Data mining US county-level mortality data suggests that environmental factors play a part in the incidence of PD. About 35% of the variance in county-level Parkinson's rates across the US (measured by age-adjusted mortality, used as a proxy for incidence) can be attributed to spatial differences.

Question addressed in this post

Using publicly available Parkinson's disease epidemiological data, can we throw some light on the following question: do environmental factors play a part in the incidence of PD?

Method

Ideally we would use county-level PD incidence rates, but we don't have access to these. As a proxy, we use age-adjusted mortality rates. We justify this on the grounds that, apart from any changes in prevalence, and since there is no cure, whoever enters the Parkinson's population will die with the disease. (PD is thought to be under-reported as a cause of death [1]. This limits the questions that can be answered using the data but, fortunately, our analysis only requires that reporting is consistent within each county.)

We reduce the possible variance due to race by considering only people whose race is described by the CDC as "white". We divide the remaining population into two groups; in the present study, the division is by gender. (Men are thought to be more likely than women to get PD [2], but again this doesn't affect the analysis.)

We query separately for the deaths of men and of women in each county in the years 1999-2009 where PD was one of the multiple causes of death listed, and obtain the age-adjusted mortality rate for each group. We then compute the correlation, across counties, between the rates of the two groups. We argue that if location plays no role, this correlation should be close to 0, since without any county-level factor common to both groups, a county's male rate would tell us nothing about its female rate.
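The pairing and correlation step can be sketched in Python as below. This is a minimal illustration, not the code used for this post. It assumes the male and female query results have been loaded into two dicts keyed by county code, with the age-adjusted rate as a string; it also assumes that suppressed or unreliable results show up as non-numeric strings such as "Suppressed" or "Unreliable" (the exact WONDER export layout is not reproduced here). The helper names `parse_rate`, `paired_rates` and `pearson_r` are hypothetical.

```python
import math


def parse_rate(value):
    """Return the rate as a float, or None if it is not a usable number.

    Assumption: suppressed or unreliable results arrive as non-numeric
    strings (e.g. "Suppressed", "Unreliable").
    """
    try:
        rate = float(value)
    except (TypeError, ValueError):
        return None
    return rate if math.isfinite(rate) else None


def paired_rates(male, female):
    """Pair male and female rates by county code (hypothetical dict input),
    keeping only counties where both values are usable."""
    xs, ys = [], []
    for county in sorted(male.keys() & female.keys()):
        m, f = parse_rate(male[county]), parse_rate(female[county])
        if m is not None and f is not None:
            xs.append(m)
            ys.append(f)
    return xs, ys


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data for illustration only (not real WONDER output).
male = {"A": "20.0", "B": "Suppressed", "C": "24.0", "D": "18.0", "E": "22.0"}
female = {"A": "10.0", "B": "9.0", "C": "13.0", "D": "9.5", "E": "Unreliable"}
xs, ys = paired_rates(male, female)
print(len(xs), "usable counties, r =", round(pearson_r(xs, ys), 4))
```

Dropping a county whenever either value is unusable keeps the two lists aligned, which is what the correlation needs; it mirrors the "both the male and female results" condition described in the Data section.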
Finally, we need to justify the claim that some of the spatial correlation can be attributed to environmental factors. Spatial factors that are not environmental, and that have not already been taken into account, include:
- the error in the age adjustment;
- the error due to the gender ratio varying from county to county;
- the extent to which genetic effects, not already accounted for by race, are not smoothed out at the county level;
- the error due to intra-county reporting inconsistencies.

These effects mean that the environmental component is smaller than the total spatial component. On the other hand, the environmental component is likely to be larger than suggested here, because some environmental effects (e.g. sick building syndrome) operate at a scale that a county-level analysis is unlikely to detect.

Data

The US Centers for Disease Control and Prevention (CDC) provide an excellent database query tool, called WONDER, for various datasets, mainly related to mortality [3]. You can extract cause-of-death data at county level, subject to the restriction that, to maintain confidentiality, there must be at least 10 deaths in the reporting period; otherwise the results are suppressed. Rates can be age-adjusted, but at least 20 deaths are required; otherwise the rates are marked as unreliable.

The search parameters for the analysis described here were:

Dataset: Multiple Cause of Death, 1999-2009
Query Parameters:
- Title:
- Autopsy: All
- Gender: Female (followed by a new search with Gender: Male)
- Hispanic Origin: All
- MCD - ICD-10 Codes: G20 (Parkinson's disease)
- Place of Death: All
- Race: White
- States: All
- Ten-Year Age Groups: All
- UCD - ICD-10 Codes: All
- Urbanization: All
- Weekday: All
- Year/Month: All
- Group By: County
- Show Totals: False
- Show Zero Values: True
- Show Suppressed: True
- Standard Population: 2000 U.S. Std. Population
- Calculate Rates Per: 100,000

There are 3164 counties in the dataset, of which 1278 had usable data (i.e. neither the male nor the female result was suppressed or marked unreliable).

Results

The correlation between the age-adjusted mortality rates of the two genders is 0.5925 (n = 1278, 99% CI = [0.544, 0.637] [4]). Squaring the correlation (0.5925² ≈ 0.351) suggests that about 35% of the county-level differences can be attributed to spatial differences.

The claim that the spatial differences we find are due to environmental effects is more speculative: the proportion of the correlation that is causal is unknown, and further work is required.

References

[1] http://journals.lww.com/epidem/fullt...erative.2.aspx
[2] Stephen K. Van Den Eeden, Caroline M. Tanner, Allan L. Bernstein, Robin D. Fross, Amethyst Leimpeter, Daniel A. Bloch and Lorene M. Nelson, "Incidence of Parkinson's Disease: Variation by Age, Gender, and Race/Ethnicity", American Journal of Epidemiology, 157:11, pp. 1015-1022, Nov 2002. http://aje.oxfordjournals.org/content/157/11/1015.full
[3] http://wonder.cdc.gov/wonder
[4] http://vassarstats.net/rho.html

John
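For readers who want to check the Results figures without the online calculator in [4], here is a minimal Python sketch. It assumes that the interval comes from Fisher's z-transformation, the standard method for a confidence interval on a Pearson correlation; the function name `fisher_ci` is hypothetical. It transforms r to z = atanh(r), whose standard error is approximately 1/sqrt(n - 3), builds a two-sided interval on the z scale and maps it back with tanh. It also prints r² for the "about 35%" figure.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.99):
    """Confidence interval for a Pearson correlation r from n pairs,
    using Fisher's z-transformation (assumes n > 3 and |r| < 1)."""
    z = math.atanh(r)                                # r -> z, approximately normal
    se = 1.0 / math.sqrt(n - 3)                      # standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # two-sided critical value
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


r, n = 0.5925, 1278
low, high = fisher_ci(r, n)
print(f"99% CI: [{low:.3f}, {high:.3f}]")
print(f"r squared: {r * r:.3f}")
```

For r = 0.5925 and n = 1278 this prints an interval of roughly [0.544, 0.637] and r² ≈ 0.351. `statistics.NormalDist` requires Python 3.8 or later.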