Abstract

This paper presents maps of the inferred relative prevalence of Parkinson's Disease (PD) in England. The maps are based on data from publicly accessible data sets:
The data does not explicitly mention PD, so we adopt a proxy measure. This is based on the observation that some drugs, such as levodopa, are associated with PD. The ratio between the number of "Parkinson's" prescriptions written by a practice and the total number of prescriptions that it writes gives an indication of the prevalence of PD in that practice's catchment area. We go one stage further and derive our proxy measure by dividing a practice's ratio by the national average ratio. This gives the inferred relative prevalence (IRP) for that practice. It is from these values that the maps are drawn.

The need for the work

Finding variations in the prevalence of Parkinson's Disease is of interest for a number of reasons. They may give clues as to:
In short, if we understood the spatial differences in PD's prevalence, we would be much closer to finding a cure.

As far as I'm aware, there are only two papers that give detailed maps of prevalence across a whole country: one by Willis et al. [1] for the US and one by Pedro-Cuesta et al. [2] for Spain. I've found nothing similar for the UK. It is this gap, at least as far as England is concerned, that this paper hopes to go some way towards addressing.

Data

The UK government, through its Open Data initiative [3], has made it easy to at least make a start on filling this gap. Among the many data sets that are open to the public are ones relating to prescriptions. Personal details, such as the names of the people for whom the prescriptions are written, are not disclosed.

The maps are based on NHS prescription data from June 2012 from each of about 10,000 GP surgeries in England. Some of the maps are based on the whole sample. Others, where appropriate to avoid distortion by unrepresentative smaller practices, are based on a reduced sample of 8,400 practices, obtained by including only those that wrote at least 500 prescriptions in the month. The distribution of practices is felt to follow that of the population reasonably well.

For this work I've used three data sets:
A critical feature of the analysis is the identification of "Parkinson's" drugs. These were taken to be those drugs with BNF numbers from 0409010A0 to 0409020S0. See Table 1.

Inferred relative prevalence

The data makes no explicit mention of Parkinson's disease. So the crucial question is: how do we extract PD prevalence data? Strictly speaking, we don't. Instead, we look for a weaker measure: relative prevalence. This allows us, in this context, to answer questions like "Where in England is PD most common?" but not "How many people in England have PD?" So we're looking for a measure of relative PD prevalence. That's not in the database either. A proxy is used:
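As a rough illustration of the proxy described in the abstract (a practice's ratio of "Parkinson's" prescriptions to all prescriptions, divided by the national ratio), here is a minimal Python sketch. The row layout, the function names and the example figures are assumptions made for illustration and are not taken from the data sets themselves. The sketch also assumes that the first nine characters of a BNF code identify the drug, and that the "national average ratio" means the pooled ratio over all practices rather than the mean of the per-practice ratios.

```python
PD_LOW, PD_HIGH = "0409010A0", "0409020S0"  # BNF range quoted in the text


def is_pd_drug(bnf_code):
    # Assumption: the first 9 characters of a BNF code identify the drug,
    # and the quoted range can be applied as a plain string comparison.
    return PD_LOW <= bnf_code[:9] <= PD_HIGH


def practice_counts(rows):
    """Sum (PD items, all items) per practice.

    rows: iterable of (practice_id, bnf_code, items) tuples.
    This layout is hypothetical, chosen only for the sketch.
    """
    counts = {}
    for practice_id, bnf_code, items in rows:
        pd_items, all_items = counts.get(practice_id, (0, 0))
        if is_pd_drug(bnf_code):
            pd_items += items
        counts[practice_id] = (pd_items, all_items + items)
    return counts


def inferred_relative_prevalence(counts, practice_ids):
    """IRP for one practice or a group of practices.

    Raw counts are pooled over the group before dividing, then the
    group ratio is divided by the pooled national ratio (an assumption
    about what "national average ratio" means).
    """
    national_pd = sum(pd for pd, _ in counts.values())
    national_all = sum(total for _, total in counts.values())
    group_pd = sum(counts[p][0] for p in practice_ids)
    group_all = sum(counts[p][1] for p in practice_ids)
    return (group_pd / group_all) / (national_pd / national_all)


# Made-up rows for two hypothetical practices, "A" and "B".
example_rows = [
    ("A", "0409010A0AAAAAA", 10),
    ("A", "0101010G0AAABAB", 90),
    ("B", "0409020S0AAAAAA", 5),
    ("B", "0101010G0AAABAB", 195),
]
example_counts = practice_counts(example_rows)
irp_a = inferred_relative_prevalence(example_counts, ["A"])
irp_b = inferred_relative_prevalence(example_counts, ["B"])
```

A value above 1 would suggest a practice writes proportionally more "Parkinson's" prescriptions than the country as a whole, and a value below 1 proportionally fewer. Passing several practice identifiers pools their raw counts, which is one way to read the aggregation of raw data discussed in this section.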
First, we argue for the use of IRP for an individual practice. Although exceptions can be found to each of the following claims, we argue that, regardless of the prescribing policies and patient mix of a practice, the following statements are stochastically correct (in the sense that each is an unbiased estimator of the true situation):
Secondly, we argue that aggregating the results from a group of practices by aggregating the raw data gives an unbiased estimator of the overall situation.

Limitations of this approach

Does the approach adopted here give results exactly equivalent to the true rate of PD prevalence? Of course not. For instance:
A deeper problem is that with a slowly progressing disease such as Parkinson's, a patient may have lived in many areas during the course of the illness. So, even if the identification of the disease by the prescription method is correct, any attempt to use the IRP statistics, such as to find spatial correlations with environmental toxins, is made harder. But although these problems will certainly add noise to the statistics, I don't think they will invalidate the whole approach.

Analysis

And the results? It is early days yet, but some things can be seen.

Plotting the whole sample does not show large regional differences like those reported in the US. The most telling measure of this, although not sufficient on its own to rule out large variations in distribution, is that the "centres of gravity" of all prescriptions and of the PD prescriptions differ by only a few miles. See Map 1.

However, analyses based on comparisons between the lowest and highest prevalence practices hint at spatial variations. For instance, Map 2, which shows the bottom 5% and the top 5% of practices ordered by inferred rates, hints at variations, with the South West appearing to have higher prevalence rates. This may just reflect demographics.

Data attribution

The main condition for the use of the data sets is that you attribute the source, which in this case is:
Provided you do this, the Open Government Licence gives you the right to use, distribute and adapt the data.

References
[1] "Geographic and Ethnic Variation in Parkinson Disease: A Population-Based Study of US Medicare Beneficiaries"
[2] "Spatial distribution of Parkinson's disease mortality in Spain, 1989-1998, as a guide for focused aetiological research or health-care intervention"
[3] http://data.gov.uk/
[4] http://data.gov.uk/dataset/gp-practice-prescribing-data
[5] http://www.ordnancesurvey.co.uk/oswebsite/products/code-point/index.html
[6] http://neurotalk.psychcentral.com/thread179755.html

John
                                     
Table 1: list of "Parkinson's" drugs, prescriptions, June 2012.