Data Wants To Be Free

Sir Tim Berners-Lee, inventor of the world wide web, is currently on a blink-and-you’ll-miss-it speaking tour of Australia.

He’s here to discuss his current concerns: truly global access to the web; freedom of information and commercial net neutrality; the web’s potential as a conduit for extremism; and the shift towards a research culture where data is openly available to all. Of these, it’s perhaps the last — the open data movement — that is most challenging within the Australian context.

At its simplest, open data is information that is available for anyone to use, for any purpose, at no cost. The concept has existed since the late 1950s, when the International Council for Science (ICSU) formed the World Data Centre system to archive and distribute data.

Berners-Lee believes an open data culture will create economic, environmental, and social value by facilitating the increased creation, dissemination and cross-pollination of knowledge to address local and global issues.

In late 2012 he co-founded the London-based, not-for-profit Open Data Institute (ODI), to catalyse an open data culture. It’s inviting involvement from anyone interested in open data, hosting hackathons, and offering innovation vouchers to startups looking to draw on open data.

But it’s the web’s capacity for fast and ubiquitous networking that is giving the concept some serious traction. The open data movement is demanding that individual researchers and academics get on board with fundamental cultural and operational changes. It’s asking them to make their datasets accessible to other researchers and the wider public, and to publish their raw (unanalysed) and derived (aggregate) data along with their conclusions.

It’s reshaping the highly competitive publishing and funding models that drive and inform the way researchers work, with sometimes hazy yet far-reaching implications for notions of ownership, intellectual property, ethics, standards, security and long-term use.

Professor Richard Sinnott is the Director of eResearch at the University of Melbourne. He and his team are building the $20 million Australian Urban Research Infrastructure Network (AURIN), a kind of one-stop e-infrastructure-shop for Australia-wide urban and built environment research.

"We’ve been tasked with enabling access to a wide range of distributed data with tools for analysing and visualising it. This includes housing data, transport data and health data," he told NM, pointing out that while researchers often know what kind of data they want to draw on for their research, it’s usually scattered in different databases and formats — and therefore obtuse and hard to unify.

AURIN is part of a suite of world-leading data storage and collaboration infrastructure initiatives that have attracted more than $100 million under the Federal Government’s Super Science Initiative.

Others include the Australian National Data Service (ANDS), the National eResearch Collaboration Tools and Resources (NeCTAR) and the Research Data Storage Infrastructure (RDSI) projects. They are taking data that are unmanaged, disconnected, invisible and single-use and modeling them into structured collections that are managed, connected, findable and reusable.

The shift has been coming for a while. In 2004, the science ministers of the Organisation for Economic Co-operation and Development (OECD) signed a declaration stating that all publicly funded archive data should be made publicly available. Many Australian researchers agree. In an open letter to the Australian Research Council (ARC), Alex Holcombe and Matthew Todd of the University of Sydney encouraged the ARC to make data sharing a condition of its funding.

"Science (real science, not the summaries in popular books and the media) is needlessly closed to the outside world," they wrote. "Worse, it is closed within itself, with every lab its own silo, and little sharing of data or materials."

They foresee a future in which openly shared data can be mined by computer algorithms to seek trends and patterns that result in discoveries beyond the capability of any one lab or research team.

The experience of Dutch schoolteacher and "citizen scientist" Hanny van Arkel is a pleasing illustration of open data’s inclusiveness. In 2007, van Arkel discovered a mysterious astronomical "blob" while participating in the Galaxy Zoo project. She was included as an author on a resulting publication, and the blob, which is the subject of further research, was named Hanny’s Voorwerp (Dutch for "Hanny’s object") after her.

There are multiple potential benefits in opening up data, says Anne Bell, head librarian at the University of Sydney. These include increased dissemination and impact of research; increased opportunities for collaboration; higher data citation; reduction of research duplication; increased provision of support services to enable management and sharing of data; improved returns on public investment in research; and enhanced accountability and public confidence in research, as data is available for others to validate or challenge.

On the surface, Australia is right up to speed, but making open data happen in a robust and meaningful way is not without complications. There are a number of challenges, says Dr Ross Wilkinson of ANDS, which is building the Australian Research Data Commons:

"Will researchers get credit for their data, in the same way they get value for a publication? Can they access good data management tools and techniques? How will they work with others in cooperating over data?"

Then there’s the scale and cost. "Open data is one thing, but the realities of the scale of data being produced is quite another. Many data holders are dealing with digital data at an unprecedented scale. How do you harvest that? How do you manage that? Opening up data is not a one off thing; it must be maintained, and it’s very expensive," Sinnott comments.

Researchers will have to adapt to a massive cultural shift; they are more accustomed to keeping their data to themselves in order to retain the competitive advantage demanded by current funding models. The ARC’s Excellence in Research for Australia (ERA) is a case in point.

The ERA measures and ranks the quality of research by assessing publications at 39 Australian universities, then funding them accordingly. It stands to reason that publications about new or untapped data are going to generate more interest — and subsequent financial reward — than those flogging old data horses.

Bell says it’s too early to be clear about the future of the relationship between the ERA and open data, but points out that some journals and publishing houses are driving the shift towards open data. Innovative journals are now requiring authors to make data associated with publications available to readers as a condition of publication.

"The ERA currently measures research publications only," adds Wilkinson. "But openness of data has a secondary yet important effect: publications with associated open data are more cited."

Researchers want layers of openness, cautions Sinnott. "Putting it all out there for everyone to access is not in sync with the research psyche. If a dataset is truly open, most researchers probably don’t care about it anymore. They want data that’s new and untapped; they want to be doing cutting edge research."

Bell is more optimistic: "It needs to be recognised that openness of data or other publications will not interfere with a researcher’s ability to publish and gain reward and recognition for their work. Researchers can continue to publish first, share later."

"There’s no reason research data can’t be included in future research assessment exercises such as the ERA, enabling researchers to gain recognition and reward for published data as well as traditional research outputs such as journal articles."
