Notes from “A Dialogue about the Impacts of Mass Digitization”

Bar­bara and I are semi-​​live blog­ging from the Sym­po­sium being held at the Rack­ham Audi­to­rium today; I’m going to drop her notes here, and we’ll edit in links and com­ments afterwards.

A Dia­logue about the Impacts of Mass Digitization

Uni­ver­sity of Michi­gan, Rack­ham Auditorium

10 March 2006

Rack­ham is a fab­u­lous audi­to­rium. Recently restored, it shows the best of early-​​20th cen­tury dec­o­ra­tion. Peachy-​​brown paint on the walls, gilded har­le­quin pat­tern on the ceiling.

And there are two big screens showing the video portions (likely PowerPoint presentations); right now they show two links: the Symposium website and the Symposium blog.

Not exactly live-​​blogging here, though there is wire­less access. Power is at a pre­mium, so Bill will be copy­ing this over later.

Beth Fitzsim­mons:
Some com­mis­sion­ers of NCLIS (pub­lic pol­icy) stand up and get welcomed.

Mass Dig­i­ti­za­tion means peo­ple have been “lib­er­ated from the con­straints of the ana­log world”

Small humor: As she says “we can be in con­tact with any­one, any­where” somebody’s cell phone goes off.

Brenda John­son & John Wilkin — the emcees (logis­tics, etc)

  • Mary Sue Cole­man: “Love of books”
  • invokes T Jef­fer­son and Library of Congress
  • Google = 800 lb Gorilla
  • quot­ing first pres­i­dent of UM: “Books are our fixed capital”
  • before Google, 5000–8000 books per year were being scanned at UM
  • 4.8 bil­lion arti­facts under care in the US “her­itage and culture”
  • Most are books
  • Dig­i­tiz­ing doesn’t replace books, it expands the mar­ket­place for books
  • Deep respect for IP — “It is our Num­ber one product”
  • Pro­tect all copy­righted mate­ri­als in the archive. Will not ignore the law.
  • Para­phrase: Lookit all that information!
  • Men­tions Mak­ing of Amer­ica project
  • Men­tions “Bees and Bee-​​keeping” and how use­ful it still is for mod­ern beekeepers
  • Tran­scends debates about “snip­pets” and copyrights.
  • March 10th is the anniver­sary of the tele­phone. Relates it to the internet.

Not too many questions (one comment, one leading question about copyright). She’s enthusiastic, as expected, about the Google thing.

Panel discussion: Libraries
Josie Parker, Bar­bara Allen, Michael Keller, Karin Wittenborg

Parker: What do we mean by “pub­lic libraries”?
Change is not free

(there is an awful ring­ing echo within the micro­phone that is very distracting)

Allen: window of opportunity to experiment before public policy is set
trends in user behavior:
Association of Research Libraries: circulation/reference transactions dropped below 1991 levels (in 2003). ILL increased
Strong preference for digital, even when paper is available
Expenditures doubled while staffing and purchasing dropped; the money goes mostly to electronic resources (up to 50% of the collection budget), mainly to journal publishers
Don’t know enough about the print collection to know the best way to develop/use the collection
OCLC has 32 million records: nearly 40% are held uniquely, and half were published before 1977

Rethink the space of the library
Con­sider a new orga­niz­ing prin­ci­ple for col­lec­tions
“It makes lit­tle sense” to ran­domly dig­i­tize — should be orga­nized so we can cap­i­tal­ize on exist­ing tech­nol­ogy and not waste time by res­can­ning stuff

and then the mike fault makes us unhappy

Keller: All attor­neys please raise your hands
Need to change terms of ref­er­ence.
“Library as an idea”: libraries are ethereal, not just buildings
Hard to mea­sure dig­i­tal resource usage
Men­tions “Final Ency­clo­pe­dia” by Dick­son
Google is the largest supplier of hits to HighWire (3 billion a month)
we’ve been doing digitization for years; Google is just an extension
sales of current books increase when you can look at the content (ref: Amazon)
Stanford is also limiting copyrighted materials to its students/faculty for fair-use research
Intend: provide new services for Stanford users using the “digital avatars” from Google
Tax­o­nom­i­cally index the books (by ideas, not just words)
Asso­cia­tive search­ing
Cita­tion link­ing (foot­notes)
Graph­i­cal user inter­face for link­age
Alerts to users
Rec­om­men­da­tion ser­vices
Ancil­lary ser­vices (such as def­i­n­i­tions, map loca­tions)
Work with other projects: gather use cases for public policy, show it’s not harmful to IP owners, and show lots of benefits for users

Orphan works: dis­cover which ones are avail­able
Men­tioned Inter­net Archive — 1923–1964 non-​​renewed works
Also expand Fair Use defense: It’s about intel­lec­tual exploration

Library of Alexandria: digitizing Arabic-language works, which are not available in the Middle East because of the size of libraries, money, etc. So Western libraries would scan the works they hold, letting the embattled folks in the Middle East see their rich heritage

Wit­ten­borg:
TJ is “really thrilled about mass digitization”
Not an impartial observer. “Google is one of the most important things that have happened to libraries in my career, and UVA isn’t even involved”
Deeply dis­ap­pointed and also some­what hurt at the oppo­si­tion to mass dig­i­ti­za­tion by some librar­i­ans, pub­lish­ers and authors. — under­stand “It changes the sta­tus quo.”

Assump­tions:
Qual­ity will improve — will not be per­fect, but it doesn’t have to be
Count on per­pet­ual access
Inex­pen­sive, portable devices
Quickly move from mostly text to all lan­guages, and other for­mats
Will be affordable

Announcement of digitization gave “nightmares of preservation microfilming,” but happy that Google has the money to do it, freeing library resources for other things

Excited about “a new definition of fair use”
Explo­ration of use of copy­right
Phys­i­cal libraries will change (will have to rein­vent our­selves) — Libraries are sink­holes for space
What to do with all that space gen­er­ated by not buy­ing books!!
Librar­i­ans — less cus­to­dial, more information

What value can libraries add? Is it sufficient?

Ques­tions:
Section 108 group recommendations?
Not just digitization for the archive; use should also be available to the organization that made the archive.

What are you going to do with the books after they’re dig­i­tized?
– Dunno. Let other people store ’em.
– Keep­ing only one copy.

What are you going to do with the space?
– Stan­ford: plan­ning book­less engi­neer­ing library, more study/​collaboration space, more professionals

MD (specifically the Google project) is indexing, not replacing.

– UVA: better serve graduate students
– more col­lab­o­ra­tion, social

Q: Point about pub­lish­ers buy­ing from researchers, then sell­ing to libraries — needs to change or there won’t be enough $$. Univs should sup­port their uni presses.
– What about digital repositories (such as the physics archive)? They should be interoperable, but that needs some work.
– When they become “worthy”, some printed journals may stop printing. But we’re building a Tower of Babel with the competing digital initiatives.

Q: What kind of added value are you talk­ing about?
– no real answer

Q: Thanks for tak­ing the legal heat, but why are we in a hurry?
–K: I don’t think there is a rush.
MD helps in preserving the materials; pressure is coming in a positive sense from Google (they have the money)
– dri­ving force is stu­dent pressure

Q: Issues about coop­er­a­tion, dupli­ca­tion of effort, etc. If you take the stand that pre­served (copy­right) items are only for the folks that make the archive, then the gen­eral ben­e­fit is lost, because smaller libraries wouldn’t have access. And what about out-​​of-​​print but still in copy­right?
–K: In this one instance I would love for the strict con­struc­tion­ists to rehar­mo­nize copy­right with patents.

Q: Authen­ti­ca­tion — trustable con­tent?
– there are some methods

Q: DMCA?
– “By arrangement”
– Need more pres­sure from scholars

Q: What hap­pens when the com­put­ers stop work­ing?
– Redun­dancy is impor­tant, but don’t let it par­a­lyze you.

Q: Sus­tain­abil­ity? (a long com­ment, actu­ally)
No response.

Q: What happens to digital content when the EMP comes? Is somebody making hard copies?
No response.

Sec­ond session:

Read­ing 2.0

Tim O’Reilly, O’Reilly Media

Three or four talks in one

Think­ing about the future of read­ing (sym­po­sium next week, too)

What does a book do? Not just an arti­fact.
– Com­pares Harry Pot­ter to World of War­craft.
– Ency­clo­pe­dia Bri­tan­nica to Google, and Wikipedia
Not just paper between covers

O’Reilly: Ref­er­ence, teach­ing, entertainment

Make Magazine: “Martha Stewart for Geeks”
— potato can­non “Tech­nol­ogy can be enter­tain­ment too”

1988: worked with Dav­en­port Group: Doc­Book
later: Safari Book­shelf
recently: Self-​​organizing maps (similar-​​looking mate­r­ial)
Safari U: cus­tom text­book engine (in beta)
Books are the con­tent and their inter­re­la­tions, and not just the stuff between the covers.

What job does a Library do?
Pre­serv­ing his­tory (Inter­net archive and the way­back machine)
Future access
Archive.org used more than Library of Congress

Why does Google Library matter?
People could easily burn CDs, which led to hand-wringing in the music industry. But the content had to be available to make the iPod possible.
Peo­ple for­get his­tory. Inter­net was free, now lots of $$ is made.
“Free is replaced by commercial economy, but only if you let it go.”
Apple has made the DRM fairly trans­par­ent (more later)
Last.fm and Pandora as examples of discovery mechanisms (music)
same could hap­pen with books
Books aren’t “easy to rip”

Orphaned works problem: buying rights to orphaned works is a nightmare. Publishers don’t know who has the rights.

DRM should be held loosely, like a cat at the vet.

Near term oppor­tu­nity: search > demand > increased sales.

O’R Safari: sales of phys­i­cal vs ebooks — is there a dif­fer­ence in the Long Tail?
Point of sale (B&N, Bor­ders, Ama­zon) 17,754 dis­tinct SKUs. 2171 also avail­able through Safari.
Safari: bumps of older works
Long tail: accounts for 23% of Safari (out of phys­i­cal print) books
Cor­re­lated with bookscan sales (O’Reilly titles)
» Don’t know if it is an eco­nomic oppor­tu­nity, yet (see­ing out-​​of-​​print books avail­able as ebooks for sale online)
Ref­er­ence bet­ter online, “Miss­ing man­u­als” bet­ter hardcopy

Does search help or hurt? Most early evi­dence — not a big deal either way
Type of access dif­fers depend­ing on type of book

Web 2.0: “The Internet as Platform”
Infor­ma­tion Busi­nesses
Harnessing Collective Intelligence
– Users add value
The Per­pet­ual Beta
– ebooks shouldn’t be arti­facts, they should be a ser­vice
– giv­ing peo­ple access to books as they’re being devel­oped
— buy book as a pdf of book in progress
— info about user behav­ior
— about 12 buy just the PDF version: more technical, more electronic
— huge pro­por­tion of dig­i­tal only are inter­na­tional
Soft­ware above the level of a sin­gle device
– Google is the most widely used Linux application, so you use Linux!

How does it apply to the book?
– needs inte­gra­tion into the ecosystem

Data is the next “Intel Inside”
– appli­ca­tions increas­ingly data dri­ven
– con­cerned about all the data resid­ing with one provider
– how does con­tent become more mobile?

A plat­form beats an appli­ca­tion every time
– “one ring to rule them all” or “small pieces loosely joined”
i.e. Microsoft vs inter­net rout­ing net­work. Avoid mono­cul­ture.
– mashups
– want to integrate their content directly into applications
– so they are creating a web-services-based help system

Q: Wik­i­books?
It is pos­si­ble to build works col­lab­o­ra­tively, but as a pub­lisher, I need to fig­ure out the eco­nomic model. The other issue is the length — a few pages, sure, but it becomes dif­fi­cult to curate the user expe­ri­ence. Online cre­ators are more often not really cre­at­ing books — they’re more assem­blages of shorter bits. (Miscellanies)

Q: Cat­a­stro­phes?
Redundancies, and don’t protect too much. Multiple formats. “We need bookster.”

Ses­sion 3, early afternoon:

Research, Teach­ing & Learn­ing
John King, Jean-Claude Guédon, Ed Tenner, Ann Wolpert

King: wel­come

Tenner: Thanks UM for “Making of America” and understands how helpful it has been. “Writer on unintended consequences.”
10 years ago, Bill Gates was optimistic that we would become better educated by having the “information superhighway.” But recent studies suggest that people are more illiterate than expected.
With increase in inter­net use and reduc­tion in TV, we should be bet­ter at deal­ing with online infor­ma­tion than we are. (pulling info from a report) Needs some atten­tion, but doesn’t seem to be too much.

Google makes peo­ple think they’re good at search­ing, but they aren’t. Google & other search engines do ok, but stu­dents may pos­si­bly need info from later pages but they don’t bother to find it.

You have to know what you’re look­ing for to per­form a good search.

Wikipedia is good, but doesn’t have it all. (The gen­eral audi­ence reac­tion: “So fix it!”)

We need to bring up lit­er­acy, but people’s IQs are get­ting higher (because of MTV and video games), so what’s up?

He’s suggesting that schools and libraries practice “search engine optimization,” so they get presented first.

Q: It’s not the fault of Google if the result isn’t very good. So many things are avail­able only behind walls. We should teach doubt and “infor­ma­tion heuris­tics” and so forth.

Q: Wikipedia entry needed to be fixed. Did you fix it?
I am plan­ning to do so. And pos­si­bly write about my experiences.

Q: Are you familiar with projects used to teach people how to search? And is the information there to be found?
Not every­thing, but in lim­ited time, I only described the problem.

Guédon: What happens when the very nature and very essence of the document changes? And what happens to the relationship between the document and the user?

Comparison to the beginning of printing. Pre-printing, the authority of a document belonged to its lineage. Printers just printed whatever they could. The response: “who are those people anyway?” So they tried to create some reputation. Eventually, the text became a method of presenting the text. Critical editions become difficult, because they don’t necessarily reflect the intent of the author. Shakespeare’s editions from 1603 to 1621 differ. Couldn’t he have changed his mind in 21 years?

Print == snap­shot of the times. Wikipedia is not a thing or a prod­uct. It is a process. The rela­tion­ship between the peo­ple and process is dif­fer­ent. One can edit wikipedia, one can­not edit britannica.

Distance between people and texts is decreasing. People should be back at the center of the search engine. “I love to Google myself. Much better than a mirror, at least in my case. I Google my friends. I Google my enemies.” But I use it to connect with people.

Every Ph.D. thesis should have a literature review. If the librarians cross-linked these reviews, there would be a set of self-updating knowledge. Also, clustering via concordances and SIPs [in the Amazon sense]. Create communities of people interested in particular topics.

Q: Cita­tion is threat­ened if your source is con­stantly chang­ing. What about the trans­for­ma­tion of cita­tion?
Pos­si­bly use RFC anal­ogy. [Bar­bara: Always about wikipedia. But they don’t seem to under­stand that the his­tory is avail­able — but Bill says, wikipedia his­tory isn’t search­able]
Attri­bu­tion is also an issue (who to reward? who to blame?)

Q: Sta­tis­ti­cally improb­a­ble phrases?
It’ll all work out

Q: Images should be Open Source. Con­vert­ing to text adds value, but gets one away from the original.

Wolpert: Has Google approached a degree of ubiq­uity as a tool for schol­arly research as it has for gen­eral subjects?

MIT deals with “born-​​digital” so doesn’t have so much to do with mass dig­i­ti­za­tion. Been work­ing with Google Scholar, instead of Book Search.
DSpace repositories are searched by Google.
MIT students/faculty search Google; then, if MIT subscribes to the journal in question, they get a live link.
Lots of experience on the journal end, but the mental model of large-scale digitization is shaped by journal experience. So what users want from books is the metadata they get from journals (abstract, bibliography, salient excerpts, etc.)

Journals are easier than books because there’s not “a lot of baggage” like that associated with books.

Ebooks don’t have similar status, because of audience size, markets not well defined, difficulty defining a business model, etc.

MIT Library sur­vey Late 2005: What ser­vices do you want? Result: 46% response rate, thou­sands of com­ments, and vol­un­teers for beta test­ing. Pre­fer online resources, mostly in the library. Search tools more impor­tant than the resources them­selves. Course man­age­ment sys­tems very impor­tant part of the expe­ri­ence of find­ing infor­ma­tion. Users go to the MIT cat­a­log first, and Ama­zon sec­ond for find­ing books. For facts, Google, then Wikipedia (library staff near the bot­tom). Wants: a sin­gle search inter­face and expanded online avail­abil­ity of older mate­r­ial, and “help sort­ing through chaos”. Peo­ple want to help design (or self-​​design) the way of access­ing the infor­ma­tion. They don’t want the librar­i­ans to go off and do some­thing then bring it back fully done.

Librarians have domain expertise, which they should maintain. And they should promote economic models for the academy.

Q: Share your sur­vey?
Yes.

Q: Staff reac­tion?
Sur­vey showed the users were most sat­is­fied with the staff.

Open ques­tions:

G: maybe we’re try­ing to force books into a new shape? Why not rede­fine what the doc­u­ment is?

T: ebooks have the same con­straints as printed books, with­out any of the ben­e­fits of new technology.

W: Book pro­duc­tion is com­pli­cated. Books are not the same as jour­nals; fac­ulty can write arti­cles pretty eas­ily, but get­ting them to fin­ish a book is dif­fi­cult. And it is a funny busi­ness — and bro­ken. And aca­d­e­mic authors require hard­copy “to prove” they’re doing work.

T: Sci­en­tific authors don’t write books, because they don’t have the time and it is less pres­ti­gious than doing the “real work.” And a few digs at authors who insist on dust jack­ets. And cloth binding.

K: Books in my office have become a dec­o­ra­tion. I get infor­ma­tion online. … A reg­u­la­tory bias against dis­pen­satory nature of the internet.

W: Librarians must select things to keep. Is the web a big free vending machine? The academy is based on focusing and winnowing. Somebody had to select what goes on the web. And the academy depends on credentialing (trusting, creating, and recycling).

Fourth ses­sion:

Publishing

Mark San­dler, Suzanne DeBell, Daniel Green­stein, Ali­cia Wise

S: Everyone can be a publisher. What do true publishing activities contribute to making works “public”? Libraries and publishers have shared interests in connecting authors to readers.

D: We’ve all heard “libraries are going away.” But we don’t vaca­tion on the moon.

Mass digitization is frightening, because it’s never mentioned in a nuanced way. Simple questions, where good enough is good enough, are the province of general web searches. For “things that matter,” however, commercial publishers are the way to go.

For exam­ple, our niche is to “clean up” news­pa­pers, and dig­i­ti­za­tion doesn’t work well. We do “what’s impor­tant, not what’s easy.”

Early Eng­lish Books are avail­able online — to sub­scribers. Fully avail­able, not just “snip­pets”. To sub­scribers. Google isn’t schol­arly, because it is only show­ing bits. And it’s ille­gal. “Is Pro­Quest so bro­ken? Or is there another motive?”

“ProQuest is all about collaboration,” but it isn’t willing to release its content to the public, because they can’t control it [or charge for it].

W: “Was there life before Google?” Well, yes. Digitization is old. People (publishers) think Google isn’t going to be a benevolent dictator, and doesn’t understand copyright.

(bat­tery change)

Some issues: Copy­right prob­a­bly needs to be updated to rec­og­nize the changes in dig­i­tal pub­lish­ing, etc.

More works are born digital, so we don’t have to worry too much about getting them online. Who pays? And it’s not just the publishers, etc.… it’s the users! [Big Flash!]

Find­ing new busi­ness mod­els is hard — give us some time to fig­ure out how to make the money.

If we’ve been dig­i­tiz­ing for 15 years, where’s the expe­ri­ence captured?

Need more com­plex inter­net to assure rights man­age­ment. Authen­ti­ca­tion isn’t enough.

Peo­ple want con­ve­nient, afford­able, per­son­al­ized content.

Pub­lish­ers are socially respon­si­ble too!

Must define and agree on guide­lines for dig­i­ti­za­tion. Avoid dupli­ca­tion of effort.

Copy­right con­founded with con­tract law, and it’s very complex.

G: “I’m not a publisher”
Information is becoming part of the utility service, a public good. The market in scholarly publishing is adjusting. Pricing is beginning to be adjusted by what libraries are willing to pay.

Fac­ulty (pro­duc­ers of intel­lec­tual prop­erty) are try­ing to make their infor­ma­tion more Open Access, and learn the eco­nom­ics of schol­arly publishing.

“I don’t understand” why publishers are suing Google, since it is providing access to their backlists.

Value-​​added ser­vices require open access to the under­ly­ing data.

His users: schol­ars would have cat­a­log & ama­zon open at the same time — look up the book, look in the book to see if they want it, order it from the library…

Libraries have poor circulation data for building recommender systems, so they could get it from publishers, Google, etc.

Value add for browsers, rather than searchers.

Open Con­tent Alliance: 5000 books a month (mostly out of copyright).

Trusted, third-​​party preser­va­tion (in per­pe­tu­ity)
Open services definition
Collection support tools
Transparency

OCA file for­mats — page scans: JPG, PDF. With OCR.

Want: Knowl­edge which rises to the pub­lic domain to stay there.

Q: Dif­fer­ent stan­dards across projects — how to work together?
G: They always change. We’re also becoming better at knowing what we want; for instance, at CDL we don’t need huge TIFFs, because we’re better at making and providing digital images.
W: Product metadata is better, but we need more information about what librarians need. Also developing digital rights management standards.
D: metadata yes, but image files must be minimally acceptable

Q: Publishers resent Google’s commercial success. Authors resent publishers for a similar reason.
W: Pub­lish­ers are prag­matic and business-​​like. But Google is pos­si­bly a com­peti­tor, and could drive down prices. Schol­arly authors are dif­fer­ent from fic­tion authors.

Q: Authors feel bul­lied by pub­lish­ers.
W: When I was an aca­d­e­mic, I made sure I knew what I was sign­ing when I signed pub­lish­ing contracts.

Q: Orphaned works? We’re after con­tent, not the books them­selves. When we get an out-​​of-​​print book, we’re buy­ing the arti­fact for much more than the idea would have cost. What do you think about that?
G: Google Print allows for back­lists to be avail­able, often in Print on Demand. I think that’s good.
D: There’s an oppor­tu­nity to push on the orphan works. But it’s hard to get copy­right cleared (even for lit­tle bits included in another book).

Q: Google isn’t going to have a monopoly. UM isn’t throwing away the books. As an archivist, I know there are zillions of pages that have never been read, and we need to figure out how to include them in the corpus so that published works alone don’t prejudice what’s available.

Q: The func­tion of pub­li­ca­tion is dif­fer­ent from what pub­lish­ers do.
Q: is repub­lish­ing dif­fer­ent from pub­lish­ing as far as the authors are con­cerned?
D: we should be looking at the newer stuff, rather than fighting over the old stuff
W: Author­ship is becom­ing different.

Q: Publishers aren’t the bad people. Consider different types of license for scholarly works (speaker is from the Purdue Univ Press). If this is all a public good, who pays?
G: I will. And the com­mer­cial sec­tor is pay­ing for it. Need to recog­nise the scale and vision.

Q: DRM from music, movies as a model? We’re not buy­ing con­tent the way we used to. Need bet­ter, more open DRM.
W: not about lock­ing up con­tent. iTunes is a pain.

Notes from the morn­ing ses­sion, Sat­ur­day: The Eco­nom­ics side:

Mass Digitization

11 March 2006

Eco­nom­ics Panel

Ron Milne, Paul Courant, Karl Pohrt, Hal Varian

M: Pub­lish­ers have a lead­ing role to play, but their busi­ness model will have to change. Noth­ing but sup­port from fac­ulty for the project.

Ques­tions from col­leagues: pri­or­i­tiz­ing? Should we con­tinue our own projects? How does the Google project affect acqui­si­tion bud­gets? And what about stor­age prob­lems. Oxford build­ing an auto­mated build­ing to house 8 mill books. Other libraries may be pres­sured to cut back on stor­age space.

What hap­pens to the inde­pen­dent book­seller? Codex is pretty handy, and it’s bet­ter than sev­eral hun­dred sheets of A4 print­out. We’ve been hear­ing about “the death of the book” for years now, and we have more than ever.

C: Comments on yesterday: We have a set of technologies that make it easier to do everything we love to do. We should be able to do a better job. We should care about how this is going to affect scholarship, not so much about how it affects libraries or booksellers or publishers or .… The economic problem is how best to organize ourselves around these disruptive technologies.

Academic libraries: supply and demand; the marginal cost of providing a particular book is low if the book is nearby. Libraries were strategic investments. Great libraries and great universities grew up together, each part of the other.

Acad libs never pay-​​per-​​view, always con­sid­ered a pub­lic good.

Once it’s avail­able online, the cost of adding an addi­tional reader is very low, zero. The func­tions of Great Libraries will not mat­ter as much any­more. for the great mass of the mate­r­ial, it doesn’t mat­ter where it is.

Tilt to pre­fer­ring mate­r­ial avail­able online — if it’s not avail­able, it won’t be used. We have to assure “the good stuff” is available.

mar­ket com­pe­ti­tion isn’t going to sus­tain this — need to have coop­er­a­tion and orga­ni­za­tion, etc

Schol­ar­ship & schol­arly com­mu­ni­ca­tion: hate the phrase “schol­arly com­mu­ni­ca­tion” — it’s a redun­dant term. “Pub­lish or Per­ish” is a moral imper­a­tive! Schol­ar­ship is a col­lab­o­ra­tion across time and space. Mech­a­nism we use: we put it in the library so oth­ers can get at it. Hun­dreds of years later, if necessary.

Abil­ity to have a sys­tem to reli­ably get stuff into and out of the library is the key prob­lem. Librar­i­ans *are* the trusted third party for ensur­ing this can happen.

Money: can’t have a mar­ket that works well unless you have the rights well-​​established. That’s an issue for pub­lic policy.

Preservation of culture: we want to get back at our own history. This worked pretty well with books. Film and video, not so well. Current lengths of copyright are just an outrage. It has to be available, otherwise we’ll build our stories only on the junk.

Q: Pub­lic lend­ing right in UK and Canada, in which pub­lish­ers are reim­bursed for bor­row­ings from libraries. Con­sider for US pol­icy?
– in a utopian world, you wouldn’t do it (mar­ginal cost = 0). That said, a cheap pay-​​per-​​view model may be sus­tain­able. But (use of) libraries should be free.

Q: Isn’t world­wide copy­right sys­tem a result of eco­nomic forces?
– if you mean “human greed”, you’re right. “Invis­i­ble Hand” doesn’t work in the case of pub­lic goods. How to best get value (defined as pub­lic value) out of scarce resources?

V: Google Library Eco­nomic Analysis.

What is the project? Two parts: the Partner program and the Library program. Everyone is happy with the Partner program. On the Library program, publishers say it violates copyright and want opt-in; Google says it is fair use and wants opt-out.

Legal issue: pretty specif­i­cally a US issue. Pur­pose and char­ac­ter, includ­ing com­mer­cial. Google’s ad model is tied to queries, not con­tent. Nature of work, fact rather than fic­tion. Amount and sub­stan­tial­ity, tiny selec­tion of con­tent. Effect of use on poten­tial mar­ket (likely most impor­tant), find­ing the work pos­si­bly leads to user buy­ing the work.

Kelly v. Arriba Soft, 9th Circuit, 2003. Found to be fair use, since the use was transformative, etc.

Opt in vs Opt out: trans­ac­tions costs. Part­ner pro­gram: opt in — send us your books. Library pro­gram: opt-​​out. trans­ac­tion costs higher with opt-​​in, because of search costs to find rights holder, nego­ti­a­tion costs. cost of opt-​​out: legit­i­mate rights-​​holder sends email to google.

Find­ing rights holder: how to find? And how do you know who it is? And what about the heirs — unless there is a spe­cific assign­ment in the will, very dif­fi­cult. And con­tract law sits on top of the copy­right. CMU study: 22% of pub­lish­ers could not be found.

Google col­lec­tion: about 25M books, and bar­gain­ing can only occur after rights holder has been found

Opt out min­i­mizes trans­ac­tions costs.
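Varian’s argument is as much arithmetic as law: under opt-in, every book needs a rights-holder search (and a negotiation if the holder is found), while under opt-out only the holders who object pay anything. The back-of-the-envelope sketch below makes that structure explicit. Its inputs are loudly hypothetical: the per-book dollar costs and the 5% objection rate are invented for illustration, applying the CMU study’s 22% unfindable-publisher figure to individual books is our assumption, and only the 25M-book collection size comes from the talk.

```python
"""Back-of-the-envelope comparison of opt-in vs. opt-out transaction costs.

All cost figures and the objection rate are HYPOTHETICAL placeholders,
not data from the talk.
"""
from dataclasses import dataclass


@dataclass
class UnitCosts:
    search: float      # hypothetical cost to look for one rights holder
    negotiate: float   # hypothetical cost to negotiate with one holder once found
    objection: float   # hypothetical cost for one holder to send an opt-out email


def opt_in(n_books: int, unfindable_share: float, c: UnitCosts) -> tuple[float, float]:
    """Every book is searched; only findable holders can be negotiated with.

    Optimistically assumes every holder who is found agrees.
    Returns (total cost, number of books that can end up in the index).
    """
    findable = n_books * (1 - unfindable_share)
    cost = n_books * c.search + findable * c.negotiate
    return cost, findable


def opt_out(n_books: int, objecting_share: float, c: UnitCosts) -> tuple[float, float]:
    """Only objecting holders pay anything; everything else stays indexed.

    Returns (total cost, number of books that remain in the index).
    """
    objectors = n_books * objecting_share
    return objectors * c.objection, n_books - objectors


if __name__ == "__main__":
    costs = UnitCosts(search=10.0, negotiate=50.0, objection=1.0)  # hypothetical
    n = 25_000_000      # "about 25M books" from the talk
    unfindable = 0.22   # CMU figure for publishers, applied to books (assumption)
    objecting = 0.05    # hypothetical share of holders who opt out

    in_cost, in_indexed = opt_in(n, unfindable, costs)
    out_cost, out_indexed = opt_out(n, objecting, costs)
    print(f"opt-in : cost ${in_cost:,.0f}, books indexable {in_indexed:,.0f}")
    print(f"opt-out: cost ${out_cost:,.0f}, books indexed   {out_indexed:,.0f}")
```

With these made-up numbers opt-in costs roughly a thousand times more than opt-out and still leaves the unfindable 22% out of the index. The point is the shape of the costs (search cost scales with every book, objection cost only with objectors), not the specific figures.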

Whose behav­ior changes?
What is eco­nomic impact of Google Library?
Are pub­lish­ers and authors going to pub­lish fewer books? Lower qual­ity? Lower prof­its?
Read­ers: Eas­ier to find rel­e­vant books? Bet­ter search experience?

Broader issue: Who will make the catalogs? Parties themselves have poor incentives and skills; that’s why we have Books in Print, etc. In the future we have to minimize human intervention. Computers have to scan or copy works. If you need prior search and negotiation, that would place huge transaction costs on the cataloging industry (if catalogs required rights management).

Q: What about col­lec­tive licens­ing? Such as ASCAP/​BMI? What are actual trans­ac­tions cost of opt­ing out to the pub­lish­ers?
– trans­ac­tions costs: it’s an email. Dif­fi­cult to get rights from peo­ple who don’t know they have the rights.

Q: If Google thinks it’s fair use, then why even bother with opt-out?
– not speaking for Google, but they follow the web model: it’s there, OK to index it, etc., unless I tell you not to.
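On the open web, the “index it unless I tell you not to” model is usually implemented with the Robots Exclusion Protocol: a site opts out by listing paths in robots.txt, and a well-behaved crawler checks those rules before fetching. Reading Varian’s remark as a reference to robots.txt is our interpretation; the Library program’s own opt-out (an email to Google, per the talk) worked differently. The sketch uses Python’s standard `urllib.robotparser`; the robots.txt rules, the example.org URLs, and the crawler name `ExampleBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site owner opts one directory out of crawling
# and leaves everything else open, the default under an opt-out regime.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, url: str, agent: str = "ExampleBot") -> bool:
    """True unless the site has explicitly opted this URL out."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    for url in ("https://example.org/catalog/book1",
                "https://example.org/private/draft"):
        print(url, "->", "index" if allowed(rp, url) else "skip")
```

Note the asymmetry: a URL is crawlable unless some rule matches it, which is the low-transaction-cost default Varian argues for; the owner acts only when they object.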

Q: Fun­da­men­tal dis­trust of Google’s moti­va­tion. Will leak out real text. How to deal with dis­trust?
– public policy should fit all models. Unrealistic to think there is a huge security problem, because anyone with a $50 scanner and an hour’s time could scan a book.

P: Inde­pen­dent Book Retailer. World of books chang­ing very fast, but economies are slug­gish. Will new eco­nomic mod­els be robust enough to sup­port *my* community?

“Madhyamikan Viewpoint”: attack all philosophical viewpoints. Liberation from delusion.

Retail book environment much more volatile. In the 1990s, ABA had 4800 member companies, 6000 stores. Now, about 1200 companies. In the ’80s, 13 of market share, now about 10%. Now 60% happen outside of bookstores.

The traditional customer base has changed: other entertainment choices, and an accelerating decline in literary reading.

Independent booksellers are an early-warning system for publishers, outperforming their market share in the 2nd top 150 books.

Nothing works quite right, and what happens if the Google project doesn’t work as planned?

Textbooks: rental books, or none at all.
Sony Reader: the unit is expensive to buy, but ebooks are 20–25% cheaper than retail and available only on Sony’s website. Disintermediates the retail bookstore.

Recommends Accelerando by Charles Stross. Mentions that Stross has added info to Wikipedia, and shows that the book is available for free on the web.

Don’t put all our cultural eggs in one basket. Just as there are different transportation modes, there are different information modes.

General discussion:

Q: Should the government be involved?
V: If Google removes a book, it could be escrowed by the Library of Congress, etc. Time for enlightened public policy.

Q: Scenario: find a book on the web, and print (bind) it on demand at an independent bookseller. Collective licensing for artists is useful as a model. Why not for books?

P: POD could be useful, but the tech isn’t quite there yet.
V: Opt-in/opt-out isn’t a legal concept; it is a transactions concept. Fair use for indexing is a question for the courts. The frustrating part is: once users find out that a work exists, how do they get it?

Q: What do we do with the books? What about the scholars who care about the physical object? And what about the citation mechanism? How do we get permanence in digital works?

C: Editions are different, and librarians will help remind us of that.

Final round table:

Public Policy

Nancy Davenport, James Hilton, Bruce James, Brian Kahin

D: Over the past panels, speakers have referenced public policy and laws. Items “rise into the public domain.” Libraries as community as well as cultural crossroads.

We’re looking for a set of policies that can adapt to changing economic conditions.

Speakers here are talking about copyright, as well as other policy issues.

J: Thanks to the organizers. The issue is the out-of-control cost of higher education. But cutting salaries isn’t the answer; we need to find new ways to deliver higher education to students. If there were only *one* digital library, what would that do to higher education?

Government Printing Office created by James Madison (1813). Government works cannot be copyrighted, as they are part of the property of the people. Government information is public and widely available. The government library system is unique in the world: 1,250 partner institutions. In 1993, Congress ordered GPO to put the Congressional Record online, for free. GPO could charge for other documents, but found collecting the fees too cost-prohibitive. 92% of recent material is online (the rest is difficult: maps, etc.).

Difficulties in version control, etc.

Enthusiastic about the Google project, because they just do it, rather than planning a lot and “doing it right.” They are making mistakes, but we’re all learning from them.

GPO needs to assure you that the document is what you think it is, i.e., authentication. Maybe a watermark? But how do we make sure it’s reputable?

His job is to save government documents in perpetuity, i.e., for as long as the US exists as a country. How many companies have been around for a hundred years? So we are not going to trust a private company to be the keeper of the government’s documents. The volume of printed publications has dropped by 90%, but people are getting them from the internet (Register: 1M hits/day). “We never had a million users a day 10 years ago.”

GPO really isn’t a bookseller. Example: the 9/11 Commission Report. GPO sold 10,000 copies, but a private publisher sold 200,000 copies. So GPO needs to find partnerships.

GPO has authentication, version control, and preservation responsibility, but not necessarily the selling/distribution of the document.

K: McGraw-Hill v. Google will depend on the interpretation of fair use, an “equitable rule of reason” (Stewart v. Abend, 495 U.S. 207 (1990)).

That is the lens through which the case will be decided. The idea is to have the parties bring out the “fuzzier” points so they can be clarified.

Courts don’t think in terms of transaction costs.

Remember, the entire copyright system used to be opt-in. That changed in the 1970s, which reversed the default. For the best protection, one should register, give notice, etc., but it’s not required.

The web is a de facto opt-out: the burden is on the content provider to put limitations on its use (telling spiders to stay out, for example).

With the old system, we could track whether people actually owned rights. Now, it’s much harder.

Conventions for negotiating over digital works: for example, the notice-and-takedown provisions of the DMCA.

There is widespread agreement that Google is doing a good thing, but there are questions about who owns the work, and whom to go after in infringement cases. In the early internet days, it was thought that it was going to be pay-per-view, but it didn’t really happen that way. It’s advertising…

The internet has moved away from the linear assembly-line model (or canonical model), i.e., a value chain, and towards a continual value cluster, where producers can benefit from parts of the network outside of their own control. Example: software developers drive use of a platform.

This is an alien model to booksellers, but newspapers are used to it, so it was easy for newspapers to move to the web.

Original response of Patricia Schroeder (Association of American Publishers): it could bump print-on-demand, and doesn’t affect the publishers’ core competence. Later, she says that Google hasn’t been very nice.

“I don’t get the Open Content Alliance”: it seems rather like a big playpen, and heterogeneous.

Google’s value is comprehensiveness; there is a critical-mass problem. People don’t search by publisher, or even by author.

Fear that Google is the next Microsoft, but Google’s business model is different from Microsoft’s: more open, etc.

Google has mastered the Attention Economy. Unlike Microsoft, it doesn’t have to compete with itself. Unlike publishers, it doesn’t have to compete with its backlist.

H: The emergence of the “pure property” view of the world of ideas and expression undermines the soul of the academy and is perhaps…

Fences make good neighbors, but we shouldn’t be putting our ideas in cattle pens.

The general drift of patent and copyright law is towards protection of smaller and smaller ideas.

Some publishers think of fair use in terms of a license. They view copyright as though it were a license and would like it to be a license, but it isn’t.

Analogy: Should publishers be entitled to part of the coffee revenue (from the coffee shop in the bookstore)?

Who owns collaboration?

What to do?

  • Protest is satisfying, but not particularly effective.
  • Creative Commons: good, but based on the wrong premise (current copyright)
  • Participate in Open Source stuff
  • Examine new ways of scholarship
  • Don’t let tech transfer be the tail that wags the scholarship dog (it’s a drop in the bucket!)
  • Google Library goes after the public good

Q: From the Open Content Alliance, a clarification: not comparing it to Google; it has more to do with helping libraries continue to be libraries, building infrastructure for providing value-added services.

Q: Google court cases: Perfect 10 v. Google, and Field v. Google (fair use defense upheld). Compare/contrast?

K: Don’t know about the second case, but re: Perfect 10, Google was indexing pictures from third-party sites…

Q: Libertarian book, privacy, digital environment; personally believe that transparency is best. But some people are much more private.

H: Privacy is more or less gone, even in the “real” world. We have to pay attention to who is managing the information. It is more disturbing not knowing that information is being kept.

Q: Polarity: publishers vs. Google. Utah Univ. Press has opted in. But Google is a commercial operation, and its expedients aren’t necessarily the same as mine. For example, China. And Amazon: well, small publishers aren’t being supported. And who is the trusted agent for the digital repository?

Q: Govdocs: is it going all the way back to 1776?
J: Yes.

Q: Santorum said the National Weather Service shouldn’t be available publicly. And what about other government-sponsored material: is someone going to try to hide it because it competes with commercial interests?
J: GPO is less affected by this, as commercial outfits aren’t really involved. But overall policy shifts as policy makers do.

Q: I have been unable to find government documents that used to be there. Why should we trust you?
J: Government should take responsibility for its own documents. The whole scheme is to make certain that citizens are able to watch the government.
I hear stories about stuff disappearing, but this doesn’t often happen at the GPO (not talking about the National Archives issue).

Concluding speaker

Closing talk

Clifford Lynch

What happens if we succeed?

Digitizing the Public Domain.
It is very important to get our PD materials into digital form. It is mostly non-controversial.
We will see a range of digitization projects, so “large-scale” rather than “mass.”
We’ve focused here so much on books, but PD is so much more than books. Music, art, images: all of these need to find their way into digital form as well.
Policy issues affect these materials, also. For instance, what about 50-year-old amateur photographs?

Even if we get the OCR perfect, we will never get the metadata perfect. We need to figure out how to build systems that support the conversation about our cultural heritage.

Just because I have a copy of something in the public domain doesn’t mean I have to give you a copy of it.

What does it mean to be stewards of large amounts of public domain works? At what level does one decide to make a collection of PD material available? One page at a time, sure (like Google). One book at a time, OK. What about the entire collection at the push of a button? Are you willing to do that?

Libraries say, “Why would you want to do that? We’ll always be here.” Well, libraries don’t ask that question in general (why do you want that item?).

Some conversations overheard: We know how to digitize PD works, but how do we raise the capital?

We need to be sure we do not allow the reprivatization of large amounts of PD works through the use of contracts and subscriptions.

We don’t really have a legal problem; it is mainly a public policy problem. Terms of copyright extension. (I really wish someone would write a good book about how the Sonny Bono copyright law got enacted: a good investigative journalist report. It looks to me, naively and without data, like it probably had a bigger economic impact on the film and video industries than on books.)

Orphan works: not just books, and an even bigger problem in many of those other areas. If we could come up with a more rational framework around copyright, these issues would become more tractable.

The older the material, the more expensive it would be to find the rightsholders.

Our great collections are important social, cultural and institutional resources. How do we insure them? How do we set values? How meaningful is it to insure a collection of treasures? Is a check really a substitute?

Digitization and large-scale replication of items is insurance. Not the same object, but the idea isn’t lost.

Go read Coleman’s comments to the AAP (link from library website).

Changing use of texts: not just single texts, but masses of text. Individual use of text is deeply engraved in all of our assumptions. However, we have moved beyond individuals having a relationship with a single text. Google itself will perform a very large, complex computation on the corpus so that it can support people’s searches. We call this an “index”; is it copyrighted?

We do lots of text processing for scientific research. Google can’t show us the text, but can compute on it all it wants. But are others going to be able to compute over the corpus?

In libraries, there are works of scholarship, and evidence (support, documentation) in support of scholarship. We need to be careful about conflating the markets for scholarly work and source materials.

Unintended consequences will be substantial. We need to keep an eye out for them as we forge ahead as fast as possible to get our scholarly and cultural heritage digitized.

Q: How can I consume information better, since there is so much?
– Text mining won’t replace scholars, but will help discover unexpected linkages. Increased specialization has led to more collaboration. What does this do to historians? There is no way one could read every source document.

Q: Digital divide: people without computers are getting left behind. How does large-scale digitization help/affect them?
– Probably a whole other topic. It is a huge problem. People off-net are going to become increasingly disadvantaged. For certain information needs, Google has done well, but librarians will have to help as they always have done.

Q: What might the outputs of the computation be, and how do they feed back into scholarship?
– Near term: mostly going to be heavily mediated through the work of scholars. Such as links that haven’t been explored yet. Sifting data. Identifying networks of information flow (and the anomalies). Areas of agreement and disagreement.

Q: What has been obsolesced/reversed by creating this huge corpus?
– Plagiarism, and the fear of plagiarism. Not just active plagiarism; pointing out to, say, high schoolers that their ideas aren’t really new is not necessarily the best way to train students.

This entry was posted in Uncategorized by Tozier.
