Those sly dogs.
Gloved and hunched over scanners, librarians from some of the biggest libraries in the world have spent the last five years digitising each and every page of millions of books — five million of them still under copyright.
They did it to democratise information, they said, to preserve the corpus of human knowledge for generations to come. They did it without permission from the copyright holders.
They did it for Google. Or, more specifically, for the Google Books Library Project, a virtual database containing the scanned pages of millions of the world’s books.
Originally, back in 2004, the partnership between Google and America’s great libraries was conceived to digitise the 15 per cent of library books that were in the public domain: golden oldies like Wuthering Heights and David Copperfield. In America (and Australia, thanks to the Free Trade Agreement), a book enters the public domain 70 years after the author’s death (in Australia it used to be 50), or if it was published prior to 1 January 1923.
That left 85 per cent of library books unscanned — 10 per cent of which are still in print and on bookstore shelves, and the remainder of which are "orphans" (books out of print but still in copyright). But because Google are uppity little nerds who consider the world as theirs to metatag, they decided to scan them all, regardless of legal status.
Arm-in-arm with librarians, Google declared they would have 15 million books digitised in under a decade. In other words, almost half of the 32 million books that humans have published.
Using the Elphel 323 — a digital camera that can scan 1000 pages per hour — librarians and Google began to scan the full texts of every book in five major university and public libraries: Stanford, Harvard, Oxford, the University of Michigan and the New York Public Library. Google archived the entire text of each book, indexing it to be responsive to search requests. Users got a few lines of text as their search result — a "snippet" — which Google claimed was "fair use", the same way a review might quote a few lines of a film or book.
The reaction from authors and publishers was a unanimous "Wtf?". Their outrage was two-fold — that Google would have a virtual copy of these books on its server, and that a bunch of IT nerds could presume to scan first, ask later.
In Australia, Google would have been shut down before they had the chance to turn the power on. Our main copyright exception, "fair dealing", covers only a narrow range of very specific uses. America’s "fair use" exception allows any use, regardless of purpose, as long as it is "fair". It is open-ended, and only the courts can decide what counts as fair. Which means that giants like Google can scan first, and fight later.
After 10 months of tense negotiation with Google, authors and publishers united in their resolve. The Authors Guild kicked things off, launching a class action against Google on behalf of all authors in September 2005, claiming "massive copyright infringement". One month later, five major publishers claimed the same, and launched the McGraw-Hill civil lawsuit.
With typical pluck, Google continued to scan. Librarians were champing at the bit. Mary Sue Coleman, president of the University of Michigan, called the project "legal, ethical and noble", predicting that it would change the world.
The prospect of a universal library is revolutionary, and sometimes revolutions require a little bloodshed. The arguments supporting Google’s flagrant disregard for copyright are lofty. There’s the prospect of storing the world’s books in one place, available to the one billion people on planet Earth with access to the internet. Digitise these works and man’s knowledge is preserved in perpetuity, kept safe from political revolution (like the Khmer Rouge’s burning of Cambodia’s national library) and natural disaster (like the loss of government documents in Louisiana’s Tulane University during Hurricane Katrina). And for authors, many of whose out-of-print books are likely to have sunk into obscurity, Google’s online library would make their masterpieces available to the world again.
Google weren’t the only ones scanning. Beijing-based company Superstar has already scanned every book in 200 of China’s libraries, a total of 1.3 million titles which, according to Superstar, is approximately half the number of books published in China since 1949.
As you’d expect, scanning a book in China is a lot cheaper than doing it at Stanford — a third of the price, actually, $10 instead of $30. In 2004, just as Google was beginning its book project, Raj Reddy, a professor at Carnegie Mellon University, shipped out tens of thousands of volumes from the Carnegie Mellon and Carnegie library to China. Reddy’s scanning enterprise, the Million Book Project, is now being made possible by assembly lines of Chinese and Indian workers, who are cranking out 100,000 pages per day. Most of the books are in the public domain, and permission has been acquired to include over 60,000 copyrighted books. As of November 2007, 1.5 million books had been scanned.
And then there’s Microsoft, which always seems a little slow off the mark these days. It started a copycat Google Books project in 2006 called Live Search Books, which was ditched in May 2008.
In March 2007, Thomas Rubin, associate general counsel for copyright, trademark, and trade secrets at Microsoft, accused Google of violating copyright law with their book search service. Specifically, he criticised Google’s policy of copying work until notified by the copyright holder to stop.
Meanwhile, that March, as negotiations continued in the courts, Google had 20 libraries on board, and according to the New York Times, had scanned one million books at a cost of around US$5 million. Just over 18 months later, in October 2008, Google claimed to have seven million books archived: one million in the public domain, another million scanned by their 20,000 publishing partners, and five million still under copyright.
That October, authors and publishers got what they wanted: an out-of-court settlement valued at US$125 million. This gets split three ways: US$34.5 million for notice and administration costs, and to establish the Book Rights Registry, which authors can search to see if they can make a claim; US$45 million to resolve existing claims by authors and publishers; and the remaining US$45.5 million goes to the publishers’ legal fees. This last figure might be way underestimated. HarperCollins CEO Jane Friedman declared, "I don’t expect this suit to be resolved in my lifetime".
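For anyone who wants to check the sums, here is a minimal sketch of the three-way split in Python. The dollar figures are the ones reported above; the function and variable names are made up for illustration and don't come from the settlement itself.

```python
# Settlement figures as reported in this article, in millions of US dollars.
TOTAL_SETTLEMENT = 125.0
REGISTRY_AND_ADMIN = 34.5  # notice, administration and the Book Rights Registry
EXISTING_CLAIMS = 45.0     # existing claims by authors and publishers


def remainder_after(total, *parts):
    """Return what is left of `total` once the named parts are taken out."""
    return total - sum(parts)


# Hypothetical name: the article only says "the rest" goes to legal fees.
legal_fees = remainder_after(TOTAL_SETTLEMENT, REGISTRY_AND_ADMIN, EXISTING_CLAIMS)
print(f"Left over for legal fees: US${legal_fees} million")
```

On these numbers, the slice left for legal fees is marginally bigger than the slice set aside for existing claims.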
So what did they win?
Copyright holders now have the right to decide whether or not they want to be in Google’s online library. For books out of print, copyright holders can opt in or out; for books in print, the publisher must make this decision with the consent of the author.
The time available to opt out or object is ridiculously short — written notification must be sent to Google by 5 May 2009.
For those who want in, there are a number of compensations. The first is a one-off payment of US$60 for each book. If the book is still in print, this gets split according to profit-share agreements between publishers and authors. Google will pay the full US$60 to the copyright holders of books out of print. If Google runs advertising on a page featuring just the one book, 63 per cent of revenue will go to that book’s copyright holder, and 37 per cent to Google.
The deadline for authors to opt in is not much better — 5 January 2010. Any authors who have not made a claim by that date will get no profit from the digitising of their book, and will have no say over what percentage of the text Google makes available online. Google is no doubt banking on the likelihood that only a fraction of authors and publishers will lodge a claim; the settlement requires that Google pay a paltry minimum of US$45 million. To compensate all five million books in copyright would cost Google US$300 million.
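The arithmetic behind those figures can be sketched in a few lines of Python. The US$60 payment, the 63/37 advertising split, the five million books and the US$45 million minimum all come from the figures above. The function names and the sample advertising amount are invented for illustration, and dividing the minimum by US$60 is only a rough reading of how far that money would stretch, not a term of the settlement.

```python
PAYMENT_PER_BOOK = 60            # one-off payment per book, in US dollars
BOOKS_IN_COPYRIGHT = 5_000_000   # scanned books still under copyright
MINIMUM_PAYOUT = 45_000_000      # the minimum Google must pay, in US dollars
HOLDER_SHARE_PERCENT = 63        # copyright holder's share of ad revenue


def full_compensation(books, per_book=PAYMENT_PER_BOOK):
    """Cost of paying the one-off fee for every book."""
    return books * per_book


def books_covered(amount, per_book=PAYMENT_PER_BOOK):
    """How many one-off payments a given sum would cover."""
    return amount // per_book


def split_ad_revenue(revenue):
    """Split whole-dollar ad revenue into (holder, Google) shares."""
    holder = revenue * HOLDER_SHARE_PERCENT // 100
    return holder, revenue - holder


print(full_compensation(BOOKS_IN_COPYRIGHT))  # cost if every book is claimed
print(books_covered(MINIMUM_PAYOUT))          # payments the minimum would cover
print(split_ad_revenue(1_000))                # hypothetical US$1,000 of ad revenue
```

Read this way, the minimum covers 750,000 payments, or 15 per cent of the five million books in copyright.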
So for the total sum of US$125 million, Google has the right to digitise almost all books published on or before 5 January 2009.
For now, Google can show 20 per cent of a book’s text. However, according to Jeremy Fisher, executive director of the Australian Society of Authors, the devil is in the detail. The settlement vaguely stipulates that Google can make "other specified uses" of texts, suggesting that Google may eventually make the text of all its scanned books available.
The settlement remains subject to a final fairness hearing, which is scheduled for 11 June 2009. Regardless of what the court decides, Google has more than just thrown down the gauntlet to the traditional business model that gets books from authors to readers. The onus now is on creators to rethink the way they make money before companies like Google decide it for them.