Every digitized book, no matter how old or new, can become captive to an internet publishing enterprise – with Google a clear front-runner – potentially accumulating millions, perhaps billions, for an online aggregator while doling out pennies to those who one by one wrote this wealth of knowledge, permanently commodifying, commercializing, and monetizing the out-of-print backlist. With this kind of corporate potential at play, how will any more copyright-free material ever see the digital light of day?
Lynn Chu, a writer’s rep, describes the 385-page “mind-numbing” Google settlement as “a vast compulsory licensing scheme … setting in amber” Google’s internet “publisher monopoly power,” exploiting America’s entire publishing output through a copyright-replacing Book Rights Registry, even managing publishers’ and authors’ capitulation as Google “data-entry slaves.” Strong words from a biased source in a WSJ.com opinion piece March 28.
A few years ago I heard an Associate University Librarian from the University of Michigan blandly express disinterest in what Google might do with the digitized copies of that library’s treasure, freely given in exchange for a local digitized instance. Chu points out that “PDF scanning (how Google and everyone else digitizes books) [is] cheap and easy. Books will be digitized without Google” (see my comments from a year ago). Copyright-free? Dead as civic values. Is this the legacy some of America’s best libraries help leave in a drive to digitize on the cheap?
Folks from the Biodiversity Heritage Library gave a presentation to Boston Library Consortium (BLC) members today about how they are using books and serials scanned from their collections into the Internet Archive (as charter participants in the Open Content Alliance) to create a scholarly portal (geek’s haven) for accessing their content in a variety of interesting ways. The natural science collections they are scanning, some of the oldest yet still currently used scientific literature, lend themselves to searching by species and other like names. The most intriguing tool they have developed cross-indexes all the content of the books and journals they have scanned (and are continuing to scan) against the NameBank taxonomic classification system (currently at 10,775,553 records) created by the Marine Biological Laboratory in Woods Hole, Massachusetts, whose library, the MBLWHOI Library, is also a member of the BLC. As they explained it, names of plants, animals, insects, etc. in scientific literature very much depend upon history and precedence – where does this fit in with what has been observed and classified before? – which sounds to me a lot like the ISI principle of citation history – who cites whom – tracking the growth and development of a scholarly body of literature.
There’s no reason these same principles could not be applied to other scholarly schemes. Someone mentioned, for example, that tracking every instance of the words “Tom Sawyer” in fiction not written by Samuel Clemens, using a human “namebank,” would yield some fascinating results. A multi-type academic library consortium such as the BLC could provide fascinating “windows” into its scanned collection(s) this way. It also strikes me that there are a lot of institutional repository-like lessons to be learned here, as well as a striking example of creating a sophisticated web interface using a dazzling variety (“purposeful emerging technology”) of off-the-shelf web tools / software / applications, etc.
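For the curious, here is a minimal sketch of what cross-indexing scanned text against a name list might look like. It is only an illustration of the general idea, not how the Biodiversity Heritage Library or NameBank actually do it; the function name `build_name_index`, the sample names, and the sample pages are all made up for the example.

```python
import re
from collections import defaultdict


def build_name_index(name_bank, documents):
    """Map each known name to the (title, page number) pairs where it appears.

    name_bank: iterable of names to look for (hypothetical stand-in for a
               taxonomic or human "namebank").
    documents: dict of title -> list of page texts (hypothetical stand-in
               for scanned and OCR'd books or journals).
    """
    # One case-insensitive, whole-word pattern per name.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in name_bank
    }
    index = defaultdict(list)
    for title, pages in documents.items():
        for page_number, text in enumerate(pages, start=1):
            for name, pattern in patterns.items():
                if pattern.search(text):
                    index[name].append((title, page_number))
    return dict(index)


# Made-up sample data, for illustration only.
sample_names = ["Homarus americanus", "Mytilus edulis"]
sample_documents = {
    "Book A": ["Notes on the lobster, Homarus americanus.", "A page with no names."],
    "Journal B": ["Beds of Mytilus edulis near traps set for Homarus americanus."],
}
name_index = build_name_index(sample_names, sample_documents)
```

Swap the species list for a list of fictional characters and the same lookup would serve the “Tom Sawyer” idea; a real system would also have to cope with OCR errors, synonyms, and abbreviated names, which this sketch ignores.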
Algorithms are the secret recipe rendering unmanageable amounts of data useful to the naked eye. In fact, according to the Economist, a recipe is an algorithm! A recent article, pages 85-7 in the September 15 issue and dated September 13 online, entitled “Business by numbers,” provides an excellent overview for the lay reader of the various purposes to which algorithms are put: verifying credit card numbers (the Luhn algorithm); projecting shipping logistics at UPS; rerouting delivery trucks out on the road, internet traffic across the globe, and airplanes on the runway using real time optimisation; or detecting fraudulent shopping behavior.
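To make the recipe comparison concrete, here is a minimal sketch of the Luhn check the article mentions for verifying credit card numbers. The function name `luhn_is_valid` is my own, and the numbers in the usage lines are standard test values, not real cards.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep only the digits so spaces or dashes in the input are ignored.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


valid_example = luhn_is_valid("4539 1488 0343 6467")
mistyped_example = luhn_is_valid("4539 1488 0343 6468")
```

The check only catches typing slips, such as a single wrong digit; it says nothing about whether an account exists or has funds.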
Algorithms are also used to make sense of folks’ daily activities – yours and mine – and provide an opportunity “to respond to each customer in a personalised way.” My grocer can compare my inordinate love of orange juice with that of other juice purchasers and soon get a pretty good idea of what else I might like to buy that I am not yet already buying – and pitch me with a coupon. Amazon now tries to get us to buy two books instead of just the one we were looking for. Search engines, like Google, where algorithms underpin every results screen sent us, analyze our every browse and click. The bigger the pile of data, the more precise the response can be – as long as the algorithm’s sorting works. The author describes the fine art of discrimination required in creating effective and useful algorithms. Just like recipes, they require testing. This is how “they” hope to get ever closer to answering differently and well for each one of us that all-important question: “what should I do on my day off tomorrow?” (See previous post.)
Loyalty cards and logins help peel back the curtain shielding us from corporate prying eyes. So while tailored offerings might delight us, one should not forget the title of this article, either. It is commercial consideration ultimately driving business by the numbers.
At last year’s Off-Campus Library Services Conference in Savannah, Georgia, I heard Marshall Keys attempt to disabuse us librarians in attendance of our notions about the privacy concerns of young library patrons in the online world of today. He entertained us with pictures of students with bongs, in heightened states of revelry, and the like – all easily garnered from the web. Many newspaper articles have since appeared about the imprudence of such displays when college graduates job-search the next year.
But this is just the tip of the proverbial iceberg. We give away a little with every search and online post. Carefully collected revelations about ourselves could feed the consumerist machine which will then ceaselessly throw back tailored just-for-us pitches. And who wants to lead the charge? Google seems a likely candidate:
“As CEO Eric Schmidt explained last May, ‘We cannot even answer the most basic questions because we don’t know enough about you. That is the most important aspect of Google’s expansion.’ He said that Google wants to be able to answer when users ask, for example, ‘the question such as “What shall I do tomorrow?” and “What job shall I take?”’”
So Google wants to know enough about us to answer the kind of question we might ask our partner as we finish dinner or ask a friend what direction our life should take after we’ve settled in over a drink. What’s the likelihood, though, if it’s sailing we enjoy, that community sailing will be the top search engine response as to what to do tomorrow, instead of the Sailing Boat Show with a stiff entry fee? Madison Avenue is climbing on board and Google is luring them in – that’s the gist of Jeffrey Chester’s online article Will Google’s Greed Ruin the Internet? Chester is head of the Center for Digital Democracy.
Those dreamy fluid notions from a few years back of an alternative, democratically driven, and pure online world just seem to be slipping through our mouseclicks. A number of the posts commenting on this article suggested (pleaded?): don’t click on the ads! Well, somebody is – just look at Google’s balance sheets. Librarians safeguard the curiosities of folks from the questions asked to the books checked out. Need we, when everyone is already letting it all hang out and Google wants to be able to skip the reference interview entirely?