This is part two of Vanessa Murray’s series on Open Data. Read part one here.
At its simplest, open data is information that is available for anyone to use, for any purpose, at no cost. It’s propelling our researchers towards a future where openly shared data can be mined for trends and patterns that result in discoveries beyond the capability of a single researcher or team.
The Federal Government’s Super Science Initiative is investing millions of dollars in world-leading data storage and collaboration infrastructure. Yet while it’s one thing to share observations of the southern oceans or climate data, it’s quite another to share an individual’s personal and health information.
But it’s precisely that — health information — that Sir Tim Berners-Lee, who recently toured Australia, thinks is one of open data’s most promising applications.
"If you’re distributing clean water or if you’re distributing drugs, then there’s just a certain amount of clean water," he told Al Jazeera’s Sophie Sportiche in December 2012. "Information, however, has this explosive quality: if I get information that I download on the Internet, then I still have it when I give it away. That way, important information about health care … can spread if you’ve got the information infrastructure in place. Getting the information infrastructure in place is enabling in a way that nothing else is."
The term for this new approach to the management and distribution of personal health information is "open health". It encompasses both e-health (the storage and provision of identified personal medical information online) and the release of de-identified health information to the public at large (the health side of open data).
Berners-Lee’s latest initiative, the London-based, not-for-profit Open Data Institute (ODI), is currently touting the work of Prescribing Analytics, a group of NHS doctors, academics and London tech start-ups, as an open health data success story.
Prescribing Analytics analysed 37 million rows of publicly available NHS prescriptions data and found that an average of £27 million (AUD$40.7 million) per month was potentially being spent unnecessarily on two proprietary statins, a class of drugs used to prevent cardiovascular problems.
"This project is an example of how open data can be used to help services run more effectively and efficiently," commented Nigel Shadbolt, Chairman of the ODI.
In a thoughtful four-part series published in The Conversation last year, doctoral students Nick Evans (of the Australian National University) and Adam Henschke (of Charles Sturt University) voiced caution.
They’re concerned that personal health information isn’t just going to be shared outwards from government, but that open data projects aim to increase information sharing across government departments. This means that in the future, health data will be linked with education, law enforcement and spending data, and potentially used to target particular groups.
Like the open data movement as a whole, e-Health is already underway in Australia. e-Health records are not intended to be shared, except with an individual’s healthcare providers.
Yet register online for an Australian e-Health record, and you’ll be prompted to link multiple online government services (so far, Centrelink, Child Support, Medicare, Department of Veterans’ Affairs and the eHealth Record System) with a single user ID and password. It seems likely that as the system grows, more will be added.
The facilitation of online access to identified personal data through e-Health is separate from, but will potentially intersect with, the release of de-identified health data in an open research environment.
Evans and Henschke reference the UK’s 2012 Open Data White Paper, which examined the need to maintain anonymity within and between data sets. The "mosaic" or "jigsaw" effect, where separate data sets can be combined to re-identify groups and individuals, was among the issues identified.
"Once linked, data can be very difficult to unlink. Indeed, data protection and securing anonymity lies in tension with a central driver of the open health movement, innovation. Ultimately, protecting anonymity will cost money and time, and this can slow the distribution of data to interested parties who can use it for practical ends."
"An important step in crafting good data oversight will be assessing the trade off between oversight and innovation," they add, pointing out the importance of ensuring that the voice of the innovator, whose motives may be purely financial, doesn’t drown out the reason researchers innovate in the first place: to improve lives.
Berners-Lee and the ODI also caution against going open slather on open data. The ODI recently advised against making the UK’s National Pupil Database (NPD), which contains data about nursery and school pupils in England dating back to 1995, available to businesses without consent from the affected individuals. It considers this to carry an unacceptable risk of individuals being identified, but argues there are potentially great benefits, without the same risks, in providing suitably de-identified aggregate information sourced from the NPD as open data.
As with other aspects of the open data movement, the practicalities of making open health data a reality are complex. Issues of privacy, ownership, intellectual property, liability, patient consent and research ethics all come into play.
"The National Health and Medical Research Council are introducing open access policies, which are fine in theory, but get very blurry very quickly in practice," comments Richard Sinnott, Director of eResearch at the University of Melbourne.
"The researcher has to ask themselves, am I allowed to share this data? What is the data? In many cases data ownership can be non-trivial to establish, especially when dealing with aggregation of data from multiple sources. It’s often an ill-defined problem."
Anne Bell, head librarian at the University of Sydney, points out that medical data can only be shared if certain conditions or restrictions are met, such as those imposed by human research ethics and privacy requirements:
"Whether research data is made openly available will depend on a number of factors including ethics and privacy issues, commercial exploitation opportunities, the source of research funding, any funding body requirements, and so on."
Ross Wilkinson, Director of the Australian National Data Service (ANDS), cites the work of the Population Health Research Network (PHRN) as an example of a project where the pros outweigh the cons. The PHRN is building a nationwide data linkage infrastructure to securely and safely manage Australian health information. Its first project is linking hospital admissions data with hospital-related deaths data to enhance hospital performance and accountability.
"Research data that can unlock important medical discoveries should not be locked behind doors that prevent discovery," he says.
The last word perhaps belongs to Evans and Henschke.
"Careful and honest reflection on the tension between the values that drive open health is required before the projects start. On the one hand, human health is vitally important, but we only have so many resources to go around. On the other, innovation will help with this resource problem. In between are the lives and personal information of whole societies, released into the open on the premise that doing so benefits everyone."