Barbara has only a special-purpose blog or two. So, for the sake of continuity with the readers from the Symposium and DP, Barbara writes:
Bill has posted my notes from the symposium. If you read them, you’ll see lots of typos (though I did try to fix most of them), many sentence fragments, a few editorial comments (which I hope you can discern from the speakers’ comments), and probably lots of abbreviations that don’t quite make sense in any context, let alone in the notes of someone who is not really involved in the whole “Mass Digitization” project except as maybe a user.
Let me try to summarize.
First, let’s set the scene. The (notably well-organized) event was held in a beautiful auditorium on the campus of the University of Michigan. UM is one of the participants in the Google digitization project. The 200-400 attendees included librarians, library scientists, administrators, academics, students, academic press publishers, and community members. The overall tone was one of enthusiasm for the project, although it was not all sweetness and light for all of the sessions.
Here’s what I think the threads of discussion were:
- What is the role of the library (and by extension, librarians), if every book catalogued in OCLC were digitized? Consider not only the document delivery aspect, but the curation of a collection. What does the term collection mean? Is a library a collection of content, or of artifacts?
- What risks do we take by having our cultural heritage documented by a commercial service? How can we be sure that what the user gets is what the author wrote? How can we be sure that if that commercial service goes away we can still get at the content?
- What is the real purpose of the Google project in particular? The common view is that Google is scanning all of these books, and then they’ll be available online. Google insists, however, that although they are scanning all of the books, only the public domain ones are available online, one page at a time. They are using their indexing algorithms to index the books, and for non-public domain works will show only enough information for the searcher to determine if that is a book they may be interested in acquiring either from their library or a bookseller. But that’s what they say now. What about ten years from now?
- The economics of producing and distributing written (scholarly) works is going to change and we can’t predict exactly how. Most agree that there is still plenty of money to be made, but business models are likely to require significant readjustment.
- Are we in a rush to digitize all of this content? Shouldn’t there be some planning? Some standards? Are we just going about willy-nilly, wasting resources in duplications of effort?
- The largest issue, the one that is really at the heart of the opposition to the project, is the issue surrounding copyright and fair use. In the US, copyright was originally an “opt-in” system: one had to claim copyright and mark the work as claimed. It was possible to renew one’s rights once. In the 1970s copyright became a default state for new works, whether they were claimed or not. In fact, it is apparently quite difficult to disclaim copyright.
The application of copyright law and the fair use statute is becoming more complex with the easy ability of users to “mix and match” content into new forms. One of the points of fair use is the transforming nature of the end result. At what point does remixing become a new work, and not an infringement of another’s copyright? Some point to Creative Commons, which works within the bounds of copyright to allow for remixes. Related to this is contract law that can affect a copyright holder’s ability to benefit from owning the rights to the work. This seems to be conflated with the idea of using licensing bodies (such as ASCAP or BMI in music) to extract fees from end users. How do we ensure that rightsholders’ prerogatives aren’t violated? [This conflation of ideas was pointed out succinctly by James Hilton (I paraphrase): “Some publishers think of fair use under the terms of license. They view copyright as though it were a license and would like it to be a license, but it isn’t.”]
Through the whole event, I had a difficult time reminding myself that we were talking primarily about academic issues, about scholarship, and how large-scale digitization could affect the nature of scholarly work. Topics kept coming up that seemed more suited to a fiction writers’ conference, because as is fairly well-known, most academic authors do not make money directly from the sale of their books. So the points about ensuring proper remuneration to authors are a bit misleading. On the other hand, a few academic publishers mentioned that they make hard copy books only because authors demand it. An author may not need many, just a few copies “to show to their mother and to their tenure committee.” (This is a topic better suited to a conference on the future of the university.)
I was reminded that there are three types of books: those in the Public Domain, those which are copyrighted and still in print, and those which are copyrighted and out-of-print (the so-called orphan works). Public domain works are ostensibly easy to handle, because, well, they’re in the public domain. Pre-1923 books, as we know, easily fall into this category. However, there are many books, as we also know, that have “risen into the public domain” (possibly) through lack of attention on the copyright holder’s part. Finding these is time-consuming and can be expensive. However, as Clifford Lynch remarked, “just because I have a copy of something in the public domain, doesn’t mean I have to give you a copy of it.”
How pleased I am that the people who support Distributed Proofreaders and Project Gutenberg don’t feel quite the same way. I do get frustrated with reprinters that take PG works and make money off our volunteer efforts. However, I am positive that the greater good is being served by our work.
All in all it was an extremely interesting, though tiring, day and a half. I don’t know that I know any more about how Google Books might affect DP, but I am convinced that there is room for us, even to the point of our Content Providers still providing scans made with our own hands. In fact, I am more enthusiastic about DP and PG than ever. Our role as a source for Project Gutenberg means that our efforts go to providing freely available etexts to anyone at any time. That is not Google’s goal, nor is it the goal of University Libraries. We would do well to remind ourselves of it from time to time.
Now, for the DP-related bits:
- When Adam Smith (the man in charge of the Google Project) gave his remarks Friday afternoon (and my battery was dead), he mentioned that people are already using the information created by the project in new ways. One he highlighted (on a slide) was titled “Bruce Albrecht’s List.” I recognized that name — a veteran DPer.
- Bill asked Mr Smith if it would be possible for individuals (like us) to contribute our scans to the effort. Also mentioned was the idea of errata. As we’ve found, sometimes the scans are, well, lacking, and Google needs a mechanism for reporting and fixing them. Smith seemed amenable to both ideas.

