Barbara and I are semi-live blogging from the Symposium being held at the Rackham Auditorium today; I’m going to drop her notes here, and we’ll edit in links and comments afterwards.
A Dialogue about the Impacts of Mass Digitization
University of Michigan, Rackham Auditorium
10 March 2006
Rackham is a fabulous auditorium. Recently restored, it shows the best of early-20th century decoration. Peachy-brown paint on the walls, gilded harlequin pattern on the ceiling.
And there are two big screens for the video portions (likely PowerPoint presentations), showing two links: Symposium website, and Symposium blog.
Not exactly live-blogging here, though there is wireless access. Power is at a premium, so Bill will be copying this over later.
Beth Fitzsimmons:
Some commissioners of NCLIS (public policy) stand up and get welcomed.
Mass Digitization means people have been “liberated from the constraints of the analog world”
Small humor: As she says “we can be in contact with anyone, anywhere” somebody’s cell phone goes off.
Brenda Johnson & John Wilkin — the emcees (logistics, etc)
- Mary Sue Coleman: “Love of books”
- invokes T Jefferson and Library of Congress
- Google = 800 lb Gorilla
- quoting first president of UM: “Books are our fixed capital”
- before google, 5000-8000 books per year were being scanned at UM
- 4.8 billion artifacts under care in the US “heritage and culture”
- Most are books
- Digitizing doesn’t replace books, it expands the marketplace for books
- Deep respect for IP — “It is our Number one product”
- Protect all copyrighted materials in the archive. Will not ignore the law.
- Paraphrase: Lookit all that information!
- Mentions Making of America project
- Mentions “Bees and Bee-keeping” and how useful it still is for modern beekeepers
- Transcends debates about “snippets” and copyrights.
- March 10th is the anniversary of the telephone. Relates it to the internet.
Not too many questions (one comment, one leading question about copyright). She’s enthusiastic, as expected, about the google thing.
Panel discussion Libraries:
Josie Parker, Barbara Allen, Michael Keller, Karin Wittenborg
Parker: What do we mean by “public libraries”?
Change is not free
(there is an awful ringing echo within the microphone that is very distracting)
Allen: window of opportunity before public policy is set to experiment
trends in changes in user behaviors:
Assc of Research Libraries: circulation/reference transactions dropped below 1991 levels (in 2003). ILL increased
Strong preference for digital, even when paper is available
Expenditures Doubled, staffing dropped, purchasing down — mostly for electronic resources (up to 50% of the collection budget), mainly to journal publishers
Don’t know enough about the print collection to know the best way to develop/use the collection
OCLC has 32mill records: nearly 40% are held uniquely, half are published before 1977
Rethink the space of the library
Consider a new organizing principle for collections
“It makes little sense” to randomly digitize; should be organized so we can capitalize on existing technology and not waste time by rescanning stuff
and then the mike fault makes us unhappy
Keller: All attorneys please raise your hands
Need to change terms of reference.
“Library as an idea” — they are aetherial, not just buildings
Hard to measure digital resource usage
Mentions “Final Encyclopedia” by Dickson
Google is the largest supplier of hits to Highwire (3bill a month)
we’ve been doing digitization for years, google is just an extension
sales of current books increase when you can look at the content (ref: amazon)
stanford also limiting copyrighted materials to their students/faculty for fair use research
Intend: provide new services for stanford users using the “Digital avatars” from google
Taxonomically index the books (by ideas, not just words)
Associative searching
Citation linking (footnotes)
Graphical user interface for linkage
Alerts to users
Recommendation services
Ancillary services (such as definitions, map locations)
Work with other projects: get use cases for public policy, show not harmful to IP owners, and lots of bennies for users
Orphan works: discover which ones are available
Mentioned Internet Archive — 1923-1964 non-renewed works
Also expand Fair Use defense: It’s about intellectual exploration
Library of Alexandria: digitizing Arabic-language works, not available in Middle East because of size of libraries, money, etc. So western libraries would scan the stuff they have so the embattled folks in the ME would be able to see their rich heritage
Wittenborg:
TJ is “really thrilled about mass digitization”
Not an impartial observer. “Google is one of the most important things that has happened to libraries in my career, and UVA isn’t even involved”
Deeply disappointed and also somewhat hurt at the opposition to mass digitization by some librarians, publishers and authors, but understands: “It changes the status quo.”
Assumptions:
Quality will improve — will not be perfect, but it doesn’t have to be
Count on perpetual access
Inexpensive, portable devices
Quickly move from mostly text to all languages, and other formats
Will be affordable
Announcement of digitization gave “nightmares of preservation microfilming” but happy that Google has the money to do it, and the library resources are available for other things
Excited about “a new definition of fair use”
Exploration of use of copyright
Physical libraries will change (will have to reinvent ourselves) — Libraries are sinkholes for space
What to do with all that space generated by not buying books!!
Librarians: less custodial, more information
What value can libraries add? Is it sufficient?
Questions:
Section 108 group recommendations?
Not just to digitization for archive, but use should be available to the organization that made the archive.
What are you going to do with the books after they’re digitized?
- Dunno. Let other people store ‘em.
- Keeping only one copy.
What are you going to do with the space?
- Stanford: planning bookless engineering library, more study/collaboration space, more professionals
MD (specifically the google project) is indexing, not replacing.
- UVA: better serve graduate students
- more collaboration, social
Q: Point about publishers buying from researchers, then selling to libraries: needs to change or there won’t be enough $$. Univs should support their uni presses.
- What about digital repositories? (such as phys archive) — should be interoperable, but needs some work.
- When they become “worthy”, some printed journals may stop printing. But we’re building a tower of babel, with the competing digital initiatives.
Q: What kind of added value are you talking about?
- no real answer
Q: Thanks for taking the legal heat, but why are we in a hurry?
-K: I don’t think there is a rush.
- MD helps in preserving the materials — pressure is coming in a positive sense from google (they got the money)
- driving force is student pressure
Q: Issues about cooperation, duplication of effort, etc. If you take the stand that preserved (copyright) items are only for the folks that make the archive, then the general benefit is lost, because smaller libraries wouldn’t have access. And what about out-of-print but still in copyright?
-K: In this one instance I would love for the strict constructionists to reharmonize copyright with patents.
Q: Authentication: trustable content?
- there are some methods
Q: DMCA?
- “By arrangement”
- Need more pressure from scholars
Q: What happens when the computers stop working?
- Redundancy is important, but don’t let it paralyze you.
Q: Sustainability? (a long comment, actually)
No response.
Q: What happens to digital content when the EMF comes? Is somebody making hard copies?
No response.
Second session:
Reading 2.0
Tim O’Reilly, O’Reilly Media
Three or four talks in one
Thinking about the future of reading (symposium next week, too)
What does a book do? Not just an artifact.
- Compares Harry Potter to World of Warcraft.
- Encyclopedia Britannica to Google, and Wikipedia
Not just paper between covers
O’Reilly: Reference, teaching, entertainment
Make Magazine “Martha Stewart for Geeks”
– potato cannon “Technology can be entertainment too”
1988: worked with Davenport Group: DocBook
later: Safari Bookshelf
recently: Self-organizing maps (similar-looking material)
Safari U: custom textbook engine (in beta)
Books are the content and their interrelations, and not just the stuff between the covers.
What job does a Library do?
Preserving history (Internet archive and the wayback machine)
Future access
Archive.org used more than Library of Congress
Why Google Library matters?
People could easily burn CD, lead to hand-wringing in music industry. But needed the content available to make the iPod possible.
People forget history. Internet was free, now lots of $$ is made.
“Free is replaced by commercial economy, but only if you let it go.”
Apple has made the DRM fairly transparent (more later)
last-fm and pandora as examples of discovery mechanisms (music)
same could happen with books
Books aren’t “easy to rip”
Orphaned works problem: buying rights of orphaned works is a nightmare. Publishers don’t know who has the rights.
DRM should be held loosely, like a cat at the vet.
Near term opportunity: search > demand > increased sales.
O’R Safari: sales of physical vs ebooks — is there a difference in the Long Tail?
Point of sale (B&N, Borders, Amazon) 17,754 distinct SKUs. 2171 also available through Safari.
Safari: bumps of older works
Long tail: accounts for 23% of Safari (out of physical print) books
Correlated with bookscan sales (O’Reilly titles)
>> Don’t know if it is an economic opportunity, yet (seeing out-of-print books available as ebooks for sale online)
Reference better online, “Missing manuals” better hardcopy
Does search help or hurt? Most early evidence: not a big deal either way
Type of access differs depending on type of book
Web 2.0 “The Internet as Platform”
Information Businesses
Harnessing Collective Intelligence
- Users add value
The Perpetual Beta
- ebooks shouldn’t be artifacts, they should be a service
- giving people access to books as they’re being developed
— buy book as a pdf of book in progress
— info about user behavior
— about 1/2 buy just pdf version, more technical, more electronic
— huge proportion of digital only are international
Software above the level of a single device
- Google is the most widely used linux application, so you use linux!
How does it apply to the book?
- needs integration into the ecosystem
Data is the next “Intel inside”
- applications increasingly data driven
- concerned about all the data residing with one provider
- how does content become more mobile?
A platform beats an application every time
- “one ring to rule them all” or “small pieces loosely joined”
i.e. Microsoft vs internet routing network. Avoid monoculture.
- mashups
- wanted to integrate their content directly into applications
- so creating webservices based help system
Q: Wikibooks?
It is possible to build works collaboratively, but as a publisher, I need to figure out the economic model. The other issue is the length: a few pages, sure, but it becomes difficult to curate the user experience. Online creators are more often not really creating books; they’re more assemblages of shorter bits. (Miscellanies)
Q: Catastrophes?
Redundancies, and don’t protect too much. Multiple formats. “We need bookster.”
Session 3, early afternoon:
Research, Teaching & Learning
John King, Jean-Claude Guédon, Ed Tenner, Ann Wolpert
King: welcome
Tenner: Thanks UM for “Making of America” and understands how helpful. “Writer on unintended consequences.”
10 years ago, Bill Gates was optimistic that we would become better educated by having the “information superhighway.” But the recent studies suggest that people are more illiterate than expected.
With increase in internet use and reduction in TV, we should be better at dealing with online information than we are. (pulling info from a report) Needs some attention, but doesn’t seem to be too much.
Google makes people think they’re good at searching, but they aren’t. Google & other search engines do ok, but students may possibly need info from later pages but they don’t bother to find it.
You have to know what you’re looking for to perform a good search.
Wikipedia is good, but doesn’t have it all. (The general audience reaction: “So fix it!”)
We need to bring up literacy, but people’s IQs are getting higher (because of MTV and video games), so what’s up?
He’s suggesting that Schools and Libraries practice “Search Engine Optimization”, so they get presented first.
Q: It’s not the fault of Google if the result isn’t very good. So many things are available only behind walls. We should teach doubt and “information heuristics” and so forth.
Q: Wikipedia entry needed to be fixed. Did you fix it?
I am planning to do so. And possibly write about my experiences.
Q: Are you familiar with projects used to teach people how to search? And is the information there to be found?
Not everything, but in limited time, I only described the problem.
Guédon: What happens when the very nature and very essence of the document changes? And what happens to the relationship between the document and the user?
Comparison to beginning of printing. Pre-printing, authority of document belonged to its lineage. Printers just printed whatever they could. The response: “who are those people anyway?” So they tried to create some reputation. Eventually, the text became a method of presenting the text. Critical editions become difficult, because they don’t necessarily reflect the intent of the author. Shakespeare editions from 1603 to 1621 are different. Couldn’t he have changed his mind in 21 years?
Print == snapshot of the times. Wikipedia is not a thing or a product. It is a process. The relationship between the people and process is different. One can edit wikipedia, one cannot edit britannica.
Distance between people and texts is decreasing. People should be back at the center of the search engine. “I love to Google myself. Much better than a mirror, at least in my case. I Google my friends. I Google my enemies.” But I use it to connect with people.
Every Ph.D. thesis should have a literature review. If the librarians cross-linked this review, then there’s a set of self-updating knowledge. Also, clustering via concordances and SIPs [in Amazon sense]. Create communities of people interested in particular topics.
Q: Citation is threatened if your source is constantly changing. What about the transformation of citation?
Possibly use RFC analogy. [Barbara: Always about wikipedia. But they don't seem to understand that the history is available -- but Bill says, wikipedia history isn't searchable]
Attribution is also an issue (who to reward? who to blame?)
Q: Statistically improbable phrases?
It’ll all work out
Q: Images should be Open Source. Converting to text adds value, but gets one away from the original.
Wolpert: Has Google approached a degree of ubiquity as a tool for scholarly research as it has for general subjects?
MIT deals with “born-digital” so doesn’t have so much to do with mass digitization. Been working with Google Scholar, instead of Book Search.
D-space repositories are searched by Google.
MIT Students/faculty search google, then if MIT subscribes to the journal in question, they get a live link.
Lots of experience on the journal end, but mental model of large scale digitization is shaped by journal experience. So what users want from books is the metadata like they get from journals (abstract, bibliography, salient excerpts, etc.)
Journals easier than books because there’s not “a lot of baggage” like is associated with books.
Ebooks don’t have similar status, because of audience size, markets not well defined, difficult to define business model, etc.
MIT Library survey Late 2005: What services do you want? Result: 46% response rate, thousands of comments, and volunteers for beta testing. Prefer online resources, mostly in the library. Search tools more important than the resources themselves. Course management systems very important part of the experience of finding information. Users go to the MIT catalog first, and Amazon second for finding books. For facts, Google, then Wikipedia (library staff near the bottom). Wants: a single search interface and expanded online availability of older material, and “help sorting through chaos”. People want to help design (or self-design) the way of accessing the information. They don’t want the librarians to go off and do something then bring it back fully done.
Librarians have domain expertise, which they should maintain. And should promote economic models for the academy.
Q: Share your survey?
Yes.
Q: Staff reaction?
Survey showed the users were most satisfied with the staff.
Open questions:
G: maybe we’re trying to force books into a new shape? Why not redefine what the document is?
T: ebooks have the same constraints as printed books, without any of the benefits of new technology.
W: Book production is complicated. Books are not the same as journals; faculty can write articles pretty easily, but getting them to finish a book is difficult. And it is a funny business — and broken. And academic authors require hardcopy “to prove” they’re doing work.
T: Scientific authors don’t write books, because they don’t have the time and it is less prestigious than doing the “real work.” And a few digs at authors who insist on dust jackets. And cloth binding.
K: Books in my office have become a decoration. I get information online. … A regulatory bias against dispensatory nature of the internet.
W: Librarians must select things to keep. Is the web a big free vending machine? The academy is based on focusing and winnowing. Somebody had to select what goes on the web. And the academy depends on credentialling (trusting, creating, and recycling).
Fourth session:
Publishing
Mark Sandler, Suzanne DeBell, Daniel Greenstein, Alicia Wise
S: Everyone can be publishers. What do true publishing activities provide to making works “public”? Libraries and publishers have shared interests in connecting authors to readers.
D: We’ve all heard “libraries are going away.” But we don’t vacation on the moon.
Mass digitization is frightening, because it’s never mentioned in a nuanced way. Simple questions, where good enough is good enough, are the province of general web searches. For “things that matter,” however, commercial publishers are the way to go.
For example, our niche is to “clean up” newspapers, and digitization doesn’t work well. We do “what’s important, not what’s easy.”
Early English Books are available online — to subscribers. Fully available, not just “snippets”. To subscribers. Google isn’t scholarly, because it is only showing bits. And it’s illegal. “Is ProQuest so broken? Or is there another motive?”
“ProQuest is all about collaboration,” but it isn’t willing to release its content to the public, because they can’t control it [or charge for it].
W: “Was there life before Google?” Well, yes. Digitization is old. People (publishers) think Google isn’t going to be a benevolent dictator, and doesn’t understand copyright.
(battery change)
Some issues: Copyright probably needs to be updated to recognize the changes in digital publishing, etc.
More works are born digital, so don’t have to worry too much about getting them online. Who pays? And it’s not just the publishers, etc…. it’s the users! [Big Flash!]
Finding new business models is hard — give us some time to figure out how to make the money.
If we’ve been digitizing for 15 years, where’s the experience captured?
Need more complex internet to assure rights management. Authentication isn’t enough.
People want convenient, affordable, personalized content.
Publishers are socially responsible too!
Must define and agree on guidelines for digitization. Avoid duplication of effort.
Copyright confounded with contract law, and it’s very complex.
G: “I’m not a publisher”
Information is becoming part of the utility service, a public good. Market in scholarly publishing is in adjustment. Pricing is beginning to get adjusted by what libraries are willing to pay.
Faculty (producers of intellectual property) are trying to make their information more Open Access, and learn the economics of scholarly publishing.
“I don’t understand” why publishers are suing Google, since it is providing access to their backlists.
Value-added services require open access to the underlying data.
His users: scholars would have catalog & amazon open at the same time — look up the book, look in the book to see if they want it, order it from the library…
Libraries have poor circulation data to get recommender systems, so could get it from publishers, Google, etc.
Value add for browsers, rather than searchers.
Open Content Alliance: 5000 books a month (mostly out of copyright).
Trusted, third-party preservation (in perpetuity)
Open services definition
Collection support tools
Transparency
OCA file formats (page scans): JPG, PDF. With OCR.
Want: Knowledge which rises to the public domain to stay there.
Q: Different standards across projects — how to work together?
G: They always change. We’re also becoming better at knowing what we want; for instance at CDL don’t need huge TIFFs because we’re better at making and providing digital images.
W: Product metadata is better, but we need more information about what librarians need. Also are developing digital rights management standards.
D: metadata yes, but image files must be minimally acceptable
Q: Publishers resent Google’s commercial success. Authors resent publishers for a similar reason.
W: Publishers are pragmatic and business-like. But Google is possibly a competitor, and could drive down prices. Scholarly authors are different from fiction authors.
Q: Authors feel bullied by publishers.
W: When I was an academic, I made sure I knew what I was signing when I signed publishing contracts.
Q: Orphaned works? We’re after content, not the books themselves. When we get an out-of-print book, we’re buying the artifact for much more than the idea would have cost. What do you think about that?
G: Google Print allows for backlists to be available, often in Print on Demand. I think that’s good.
D: There’s an opportunity to push on the orphan works. But it’s hard to get copyright cleared (even for little bits included in another book).
Q: Google isn’t going to have a monopoly. UM isn’t throwing away the books. As an archivist, I know there are zillions of pages that have never been read before, and it needs to be figured out how to include them in the corpus so that only published works don’t prejudice what’s available.
Q: The function of publication is different from what publishers do.
Q: is republishing different from publishing as far as the authors are concerned?
D: we should be looking at the newer stuff, rather than fighting over the old stuff
W: Authorship is becoming different.
Q: Publishers aren’t the bad people. Consider different types of license for scholarly works (speaker is from the Purdue Univ Press). If this is all a public good, who pays?
G: I will. And the commercial sector is paying for it. Need to recognise the scale and vision.
Q: DRM from music, movies as a model? We’re not buying content the way we used to. Need better, more open DRM.
W: not about locking up content. iTunes is a pain.
Notes from the morning session, Saturday: The Economics side:
Mass Digitization
11 March 2006
Economics Panel
Ron Milne, Paul Courant, Karl Pohrt, Hal Varian
M: Publishers have a leading role to play, but their business model will have to change. Nothing but support from faculty for the project.
Questions from colleagues: prioritizing? Should we continue our own projects? How does the Google project affect acquisition budgets? And what about storage problems. Oxford building an automated building to house 8 mill books. Other libraries may be pressured to cut back on storage space.
What happens to the independent bookseller? Codex is pretty handy, and it’s better than several hundred sheets of A4 printout. We’ve been hearing about “the death of the book” for years now, and we have more than ever.
C: Comments on yesterday: We have a set of technologies that make it easier to do everything we love to do. We should be able to do a better job. We should care about how this is going to affect scholarship, not so much about how it affects libraries or booksellers or publishers or …. The economic problem is how best we can organize ourselves around these disruptive technologies.
Academic libraries: supply and demand — marginal cost of providing a particular book is low if the book is nearby. Libraries were strategic investments. Great libraries and great universities grew up together, part of each others.
Acad libs never pay-per-view, always considered a public good.
Once it’s available online, the cost of adding an additional reader is very low, zero. The functions of Great Libraries will not matter as much anymore. For the great mass of the material, it doesn’t matter where it is.
Tilt to preferring material available online — if it’s not available, it won’t be used. We have to assure “the good stuff” is available.
market competition isn’t going to sustain this — need to have cooperation and organization, etc
Scholarship & scholarly communication: hate the phrase “scholarly communication” — it’s a redundant term. “Publish or Perish” is a moral imperative! Scholarship is a collaboration across time and space. Mechanism we use: we put it in the library so others can get at it. Hundreds of years later, if necessary.
Ability to have a system to reliably get stuff into and out of the library is the key problem. Librarians *are* the trusted third party for ensuring this can happen.
Money: can’t have a market that works well unless you have the rights well-established. That’s an issue for public policy.
Preservation of culture: want to get back at our own history. This worked pretty well with books. Film and video is not so good. Current lengths of copyrights are just an outrage. It has to be available, otherwise we’ll build our stories only on the junk.
Q: Public lending right in UK and Canada, in which publishers are reimbursed for borrowings from libraries. Consider for US policy?
– in a utopian world, you wouldn’t do it (marginal cost = 0). That said, a cheap pay-per-view model may be sustainable. But (use of) libraries should be free.
Q: Isn’t worldwide copyright system a result of economic forces?
– if you mean “human greed”, you’re right. “Invisible Hand” doesn’t work in the case of public goods. How to best get value (defined as public value) out of scarce resources?
V: Google Library Economic Analysis.
What is the project? Two programs: Partner and Library. Partner: everyone is happy with it. Library: publishers say it violates copyright and want opt-in; Google says it is fair use and wants opt-out.
Legal issue: pretty specifically a US issue. Purpose and character, including commercial. Google’s ad model is tied to queries, not content. Nature of work, fact rather than fiction. Amount and substantiality, tiny selection of content. Effect of use on potential market (likely most important), finding the work possibly leads to user buying the work.
Kelly vs Arriba Soft 9th circuit court 2003. Found to be fair use, since the work was transformative, etc.
Opt in vs Opt out: transactions costs. Partner program: opt in — send us your books. Library program: opt-out. transaction costs higher with opt-in, because of search costs to find rights holder, negotiation costs. cost of opt-out: legitimate rights-holder sends email to google.
Finding rights holder: how to find? And how do you know who it is? And what about the heirs — unless there is a specific assignment in the will, very difficult. And contract law sits on top of the copyright. CMU study: 22% of publishers could not be found.
Google collection: about 25M books, and bargaining can only occur after rights holder has been found
Opt out minimizes transactions costs.
Whose behavior changes?
What is economic impact of Google Library?
Are publishers and authors going to publish fewer books? Lower quality? Lower profits?
Readers: Easier to find relevant books? Better search experience?
Broader issue: Who will make the catalogs? Parties themselves have poor incentives and skills. Why we have Books in Print, etc. In future have to minimize human intervention. Computers have to scan or copy works. If you need prior search and negotiation, would place huge transactions costs on the cataloging industry. (if catalogs required rights management)
Q: What about collective licensing? Such as ASCAP/BMI? What are actual transactions cost of opting out to the publishers?
– transactions costs: it’s an email. Difficult to get rights from people who don’t know they have the rights.
Q: If google thinks it’s fair use, then why even bother with opt-out?
– not speaking for google, but they follow the web model: it’s there, ok to index it, etc, unless I tell you not to.
Q: Fundamental distrust of Google’s motivation. Will leak out real text. How to deal with distrust?
– public policy should fit all models. Unrealistic to think there is a huge security problem, because any one with a $50 scanner and an hour’s time could scan a book.
P: Independent Book Retailer. World of books changing very fast, but economies are sluggish. Will new economic models be robust enough to support *my* community?
“Madhyamika Viewpoint” attack all philosophical viewpoints. Liberation from delusion.
Retail book environment much more volatile. In the 1990s, ABA had 4800 member companies, 6000 stores. Now, about 1200 companies. In the ’80s, 1/3 of market share, now about 10%. Now 60% happen outside of bookstores.
Traditional customer base has changed — other entertainment choices, decline in literary reading accelerating.
Independent booksellers are early-warning system for publishers, outperforming market share in 2nd top 150 books.
Nothing works quite right, and what happens if google project doesn’t work as planned?
Textbooks: rental books, or none at all.
Sony reader: expensive to buy unit, but ebooks are 20-25% lower cost than retail, available only on Sony website. Disintermediates the retail bookstore.
Recommends Accelerando by Charles Stross. And mentions that Stross has added info to Wikipedia. And shows it is available for free on the web.
Don’t put all our cultural eggs in one basket. Just as there are different transportation modes, there are different information modes.
General discussion:
Q: Should the government be involved?
V: if google removes a book, could be escrowed by the Library of Congress, etc. Time for enlightened public policy.
Q: Scenario: find book on web, and print (bind) on demand at independent bookseller. Collective licensing for artists useful as a model. Why not for books?
P: POD could be useful, but tech isn’t quite there yet.
V: opt-in/opt-out isn’t legal concept, it is a transactions concept. Fair use for indexing is a question for courts. The frustrating part is once a user finds the work exists, how does a user get it?
Q: What do we do with the books? What about the scholars who care about the physical object? And what about the citation mechanism? How do we get permanence in digital works?
C: Editions are different, and librarians will help remind us of that.
Final round table:
Public Policy
Nancy Davenport, James Hilton, Bruce James, Brian Kahin
D: Over the past panels, speakers have referenced public policy and laws. Items “Rise into the public domain.” Libraries as community as well as cultural crossroads.
We’re looking for a set of policies that can adapt to changing economic conditions.
Speakers here are talking about copyright, as well as other policy issues.
J: Appreciation to the organizers. Issue is the out-of-control cost of higher education. But cutting salaries isn’t the answer. Need to find new ways to deliver higher education to students. If there was only *one* digital library, what would that do to higher education?
Government Printing Office created by James Madison (1813); its documents cannot be copyrighted, as part of the property of the people. Government information is public and widely available. Government library system is unique in the world. 1250 partner institutions. In 1993, Congress ordered GPO to put the Congressional Record online, for free. Could charge for other GPO documents, but found it cost prohibitive to collect the fees. 92% of recent stuff is online (rest is difficult: maps, etc).
Difficulties in version control, etc.
Enthusiastic about Google project, ’cause they just do it, rather than planning a lot and “doing it right.” They are making mistakes, but we’re all learning from it.
GPO needs to assure you that the document is what you think it is, i.e. Authentication. Maybe a watermark? But how to make sure it’s reputable?
His job is to save government documents in perpetuity, i.e. the time the US will exist as a country. How many companies have been around for a hundred years? So we are not going to trust a private company to be the keeper of the government’s documents. Volume of printed publications has dropped by 90%, but people are getting it from the internet (Register: 1M hits/day). “We never had a million users a day 10 years ago.”
GPO really isn’t a bookseller. Example of 9/11 Commission report: sold 10,000 copies, but private publisher sold 200,000 copies. So need to find partnerships.
GPO has authentication, and version control and preservation responsibility. But not necessarily the selling/distribution of the document.
K: Google vs McGraw Hill will depend on Fair Use interpretation.
An “equitable rule of reason”: Stewart v. Abend, 495 US 207 (1990). Lens through which the case will be decided. Idea is to have the parties bring out the “fuzzier” points so they can be clarified.
Courts don’t think in terms of transaction costs.
Remember, the entire copyright system used to be opt-in. Changed in the 1970s, which reversed the default. To make the best protection, one should register, make notice, etc, but it’s not required.
The web is a de facto opt-out: burden on content provider to put limitations on its use (tell spiders to stay out, for example)
With the old system, we could track whether people actually owned rights. Now, it’s much harder.
Conventions for negotiating digital work: such as the bits in the DMCA (notice and takedown).
Widespread agreement that Google is doing a good thing, but questions about who owns the work, and who to go after in infringement cases. Early internet days, it was thought that it was going to be pay-per-view, but it didn’t really happen that way. It’s advertising…
Internet has moved away from linear assembly line model (or canonical model), i.e. a value chain, and towards a continual value cluster, where producers can benefit from parts of the network outside of their own control. Example: software developers drive use of platform.
Alien model to booksellers, but newspapers used to it — so it was easy for newspapers to move to the web.
Original response of Patricia Schroeder (Association of American Publishers): could bump print-on-demand, and doesn’t affect the publishers’ core competence. Later, says that Google hasn’t been very nice.
“I don’t get the Open Content Alliance.” Seems rather like a big playpen, and heterogeneous.
Google’s value is comprehensiveness. Critical mass problem. People don’t search by publisher, or even author.
Fear that Google is next Microsoft, but Google’s business model is different from MS. More open, etc.
Google has mastered the Attention Economy. Unlike Microsoft, it doesn’t have to compete with itself. Unlike publishers, it doesn’t have to compete with its backlist.
H: The emergence of the “pure property” view of the world of ideas and expression undermines the soul of the academy and is perhaps….
Fences make good neighbors, but we shouldn’t be putting our ideas in cattle pens.
General drift of patent and copyright law is towards protection of smaller and smaller ideas.
Some publishers think of fair use under the terms of license. They view copyright as though it were a license and would like it to be a license, but it isn’t.
Analogy: Should publishers be entitled to part of the coffee revenue (from the coffee shop in the bookstore)?
Who owns collaboration?
What to do?
Protest is satisfying, but not particularly effective.
Creative Commons: good, but based on the wrong premise (current copyright)
Participate in Open Source stuff
Examine new ways of scholarship
Don’t let tech transfer be the tail that wags the scholarship dog (it’s a drop in the bucket!)
Google Library goes after the public good
Q: From Open Content Alliance, a clarification: not comparing to Google, more having to do with helping libraries continue to be libraries. Building infrastructure for providing value added services.
Q: Google court cases: the Perfect 10 case, and Field v. Google (fair use defense upheld). Compare/contrast?
K: Don’t know about 2nd case, but re: Perfect 10. Google indexing pictures from third-party sites.
Q: Libertarian book, privacy, digital environment, personally believe that transparency is best. But some people are much more private.
H: Privacy is more or less gone, even in the “real” world. Have to pay attention to who is managing the information. More disturbing is not knowing that information is being kept.
Q: Polarity: Publishers vs Google. Utah Univ press has opted in. But Google is a commercial operation, and their expedients aren’t the same as maybe mine. For example, China. And Amazon — well, small publishers aren’t being supported. And who is the trusted agent for digital repository?
Q: Govdocs: is it going all the way back to 1776?
J: Yes.
Q: Santorum said National Weather Service shouldn’t be available publicly. And what about other government-sponsored stuff: is someone going to try to hide it because it competes with commercial interests?
J: GPO is less affected by this, as commercial outfits aren’t really involved. But the overall policy shifts as policy makers do.
Q: Have been unable to find government documents that used to be there. Why should we trust you?
J: Govt should take responsibility for its own documents. Whole scheme is to make certain that the citizens should be able to watch the govt.
I hear stories about stuff disappearing, but this doesn’t often happen at the GPO (not talking about the National Archives issue).
Concluding speaker
Closing talk
Clifford Lynch
What happens if we succeed?
Digitizing the Public Domain.
It is very important to get our PD materials into digital form. It is mostly non-controversial.
We will see a range of digitization projects, so Large-scale rather than mass.
We’ve focused here so much on books, but PD is so much more than books. Music, art, images — all of these need to find their way into digital form as well.
Policy issues affect these materials, also. For instance, what about 50 year old amateur photographs?
Even if we get the OCR perfect, we will never get the metadata perfect. We need to figure out how to have systems to support the conversation about our cultural heritage.
Just because I have a copy of something in the public domain, doesn’t mean I have to give you a copy of it.
What does it mean to be stewards of large amounts of public domain works? What level does one decide to make a collection of PD material? One page at a time, sure (like Google). One book at a time, ok. What about the entire collection at the push of a button? Are you willing to do that?
Libraries ask “Why would you want to do that? We’ll always be here.” Well, libraries don’t ask that question in general (why do you want that item?).
Some conversations overheard: We know how to digitize PD works, but how do we raise the capital?
We need to be sure we do not allow the reprivatization of large amounts of PD works through the use of contracts and subscriptions.
We don’t really have a legal problem, it is mainly a public policy problem. Terms of copyright extension. (I really wish someone would write a good book about how the Sonny Bono copyright law got enacted, a good investigative journalist report. It looks to me, naively and without data, like it probably made a bigger economic impact on the film and video industries than on books.)
Orphan Works: not just books, and an even bigger problem in many of those areas. If we could come up with a more rational framework around copyright, these issues would become more tractable.
The older the material, the more expensive it would be to find the rightsholders.
Our great collections are important social, cultural and institutional resources. How do we insure them? How do we set values? How meaningful is it to insure a collection of treasures? Is a check really a substitute?
Digitization and large-scale replication of items is insurance. Not the same object, but the idea isn’t lost.
Go read Coleman’s comments to the AAP (link from library website).
Changing use of texts: not just single texts, but masses of text. Individual use of text is deeply engraved in all of our assumptions. However, we have moved beyond individuals having a relationship with a single text. Google itself will perform a very large complex computation on the corpus, so that it can support people’s searches. We say this is an “index”; is this copyrighted?
We do lots of text processing for scientific research — Google can’t show us the text, but can compute on it all it wants. But are others going to be able to compute over the corpus?
In libraries: there are works of scholarship, and evidence (support, documentation) in support of scholarship. Need to be careful about conflating the markets for scholarly work and source materials.
Unintended consequences will be substantial. Need to keep an eye out for them as we forge ahead as fast as possible toward getting our scholarly and cultural heritage digitized.
Q: How can I consume information better, since there is so much?
– Text mining won’t replace scholars, but will help discover unexpected linkages. Increased specialization has led to more collaboration. What does this do to historians? No way one could read every source document.
Q: Digital divide: people without computers are getting left behind. How does large-scale digitization help/affect them?
– Probably a whole other topic. It is a huge problem. People off-net are going to become increasingly disadvantaged. For certain information needs, Google has done well, but librarians will have to help as they always have done.
Q: What might the outputs of the computation be, and how do they feed back into the scholarship?
– Near term: most are going to be heavily mediated through the work of scholars. Such as links that haven’t been explored yet. Sifting data. Identifying network of information flow (and the anomalies). Areas of agreement and disagreement.
Q: What has been obsolesced/reversed by creating this huge corpus?
– Plagiarism, and the fear of plagiarism. Not just active plagiarism, but pointing out to say, high schoolers, that their ideas aren’t really new is not necessarily the best way to train students.