School of Information Management & Systems
 Previously School of Library & Information Studies

  296a-1 Seminar: Information Access.
 ("The Friday Afternoon Seminar")
 Summaries - Spring 2004.


Fridays 3-5. 107 South Hall.
Summaries will be added below as they become available.

Jan 23: Cliff LYNCH: Welcome and introduction to the Seminar.
    Fredric GEY & Ray LARSON: Report on the HICSS: Hawaii International Conference on System Sciences, Jan 2004. www.hicss.org.
    Michael BUCKLAND: "Intermediate infrastructure".
    In a traditional, paper-based library, if you wanted to find out about an unfamiliar topic, the reference collection provided a rich environment for the early stages of inquiry. Dictionaries, encyclopedias, biographical dictionaries, gazetteers and atlases, and other reference tools help one with Who, What, Where, and When. Bibliographies and the catalog indicate resources which might help with How and Why.
    Digital library services started with bibliographies and catalogs, then moved to texts, images, and other primary resources. How might we reconstruct in a digital library environment the "intermediate infrastructure" provided by the conventional reference library?

Jan 30: Clifford LYNCH: Stewardship in the Digital Age.
    The shift of much of our ongoing intellectual and cultural record into digital format, and the capability to capture much of the past record in digital form, give rise to a vast number of new stewardship issues. My first goal in this talk is to give a survey and overview of the issues as they play out on three levels: the personal, the organizational, and the national. I will then, as time allows, go into more detail on several specific issues: storage models and replication; issues raised by cyberinfrastructure or "e-science" developments; and public policy issues. I'll highlight a number of open research questions. I intend to continue exploring specific issues within this framework in additional seminar presentations during the semester.

Feb 6: Jane WHITE, Director, International Children's Digital Library www.icdlbooks.org
    Background: The International Children's Digital Library is a research project funded by the National Science Foundation. The primary goal of the project is to study how children relate to digital material. Currently, the library is showcasing around 300 books from around the world, with an additional 600 about to be posted. These books represent over 26 different languages. The library was launched in November 2002 at the Library of Congress in Washington, D.C.
    Scope of Project: The International Children's Digital Library is dedicated to showcasing the best in children's literature from around the world. In order to achieve this goal, the ICDL is building relationships around the world that help identify quality books reflective of each country’s outstanding children’s literature, both historic and contemporary. In each country, the ICDL identifies the appropriate organization(s) best suited to accomplish these tasks. The ICDL believes that an important collection is built with a strong selection process and relies on partnerships with national libraries, IBBY representatives, and other Institutes for Children’s Literature. It is hoped that through these collaborative relationships the ICDL collection will continue to grow and reach its goal of 10,000 quality children’s books in over 100 languages.
    Who is using the site: Initial user analysis has shown that the majority of ICDL users are parents or kids using the ICDL in their home. Second are teachers and librarians eager to address the needs of a growing diversity in their classrooms and/or libraries. The majority of ICDL's users are in the United States, Taiwan, Canada, Hong Kong, or Europe. As the collection grows, it is expected that use will increase in other countries.

Feb 13: AnnaLee SAXENIAN, Dean: Discussion.
    Professor Saxenian became Dean of this School on February 1 and will lead a discussion about the opportunities and challenges of developing an interdisciplinary professional school at Berkeley, including the challenges of building a new program in the current economic and intellectual climate.
    As background, note the Information Planning Group report that recommended a new program at http://www.sims.berkeley.edu/about/history/proposal.html.

Feb 20: Clifford LYNCH and Michael BUCKLAND
    Clifford LYNCH: Stewardship in the Digital Age (Continued).
    The shift of much of our ongoing intellectual and cultural record into digital format, and the capability to capture much of the past record in digital form, give rise to a vast number of new stewardship issues. My first goal in this talk is to give a survey and overview of the issues as they play out on three levels: the personal, the organizational, and the national. I will then, as time allows, go into more detail on several specific issues: storage models and replication; issues raised by cyberinfrastructure or "e-science" developments; and public policy issues. I'll highlight a number of open research questions. I intend to continue exploring specific issues within this framework in additional seminar presentations during the semester.
    Michael BUCKLAND: Infrastructure to Support Multimedia Search.
    The broad adoption of digital technology leads to optimistic ideas about "convergent media environments," but the reality is complex. Images and text, for example, remain quite different forms of expression in a digital environment as in a paper environment. Searching across media directly appears impossible in principle, but it can be done indirectly using shared (or interoperable) metadata. The concepts and terminology for analyzing these issues appear inadequate. I will raise some questions and invite answers and suggestions.

Feb 27: Oliver GUENTHER, Institute of Information Systems, Humboldt University, Berlin: Privacy in E-Commerce: Stated Preferences vs. Actual Behavior.
    In this talk, we present results from a large-scale online shopping experiment. They suggest that, given the right circumstances, online users easily forget about their privacy concerns and communicate even the most personal details without any compelling reason to do so. This holds in particular when the online exchange is entertaining and appropriate benefits are offered in return for information revelation -- circumstances easily created by second-generation agent technologies and embodied interface agents. Privacy statements have no impact on most users' behavior. In concluding, we discuss some possible reasons for this discrepancy between stated preferences and actual behavior. We also suggest ways to help users better align their actions with their goals. (Joint work with Bettina Berendt and Sarah Spiekermann).

Mar 5: Czeslaw GRYCZ, Octavo: Octavo Ultra-High Resolution Digital Images of Rare Books: A Survey of What Works and What Needs Work. www.octavo.com.
    Czeslaw Jan Grycz is CEO of Octavo, a company specializing in the ultra-high-resolution digitization (10,500 x 12,700 pixels) of rare books, incunabula titles, and manuscripts. Seeking a niche in the publishing market, Octavo also publishes a variety of materials derived from the images.
    On the one hand, the very high definition images constitute surrogate digital facsimile files for a library wishing to protect its valuable original works against damage, unnecessary circulation, or over-handling. On the other hand, the digital publications Octavo produces provide wider access to books that would otherwise be difficult, if not impossible, to handle and study in person.
    As Octavo's collection grows, issues of file management, metadata, color control, and proper visualization all become more complex.
    Chet came to UC Berkeley from Stanford University Press. He worked at the University of California Press for 14 years as Design and Manufacturing Manager, and for Clifford Lynch in the (then) "Division of Library Automation" for an additional six years before taking early retirement. He will share with us a survey of Octavo's accomplishments during 2003, tell us about some projects underway in 2004, and identify some of the most pressing challenges of 2004. He will hypothesize about the future of digitization and digital publication, given ongoing changes in technology and pedagogy.

Mar 12: John McCARTHY, Steve LUSSIER, Henri POOLE, Dan ROBINSON and Phil WOLFF: Digital Democracy: Report on a recent conference and notes from the trenches.
    We'll recap highlights from presentations as well as informal conversations with participants at the Digital Democracy Teach-In on Feb 9 in San Diego, which brought together "pioneers who are re-inventing democracy for our networked world," including Joe Trippi, political consultant and former Dean Campaign Manager, Wes Boyd, co-founder of MoveOn.org, and Scott Heiferman, founder of Meetup.com.
    We'll also give our personal perspectives on some successes and failures of information technology in this year's election cycle, along with discussion of where these developments may lead in the next few years.
    John McCARTHY has been working on databases at LBL since 1980 (and elsewhere for 15 years before that) and has participated regularly in the Friday Afternoon Seminars over the years. Having returned to volunteer political activities in June after a thirty year break, he has worked with MoveOn.org, helped organize Tech4Dean (volunteer computer professionals working for Howard Dean's Presidential campaign), and is now working to bring together volunteer computer professionals from the various Democratic Presidential campaigns.
    Steve LUSSIER is a recent MIMS graduate.
    Henri POOLE is technical advisor to Dennis Kucinich, Free Software Foundation Board member, and former CEO of Mandrakesoft.
    Dan ROBINSON is co-founder and CTO of the E-Volve Foundation, a strategic think-tank that works with non-profits and public interest groups to create definitive models for technology-enabled organizing; he helped set up and run the East Bay for Dean web site.
    Philip WOLFF hails from Oakland, California. In the last year he presented at the ProjectWorld conference (project blogging), BlogTalk Wien (the future of blogging), and BloggerCon (blogging behind the firewall). He posts regularly to a klog apart and Blogcount, and in moments of apoplexy to Don't Blog: Blogging the Weblog Backlash. Phil has been blogging for 5 years and computing for more than 30 years. He is a marketing and technology veteran of the Naval Supply Systems Command, Gateway, Compaq, Wang Laboratories, Bechtel National, and Adecco SA, where he served as global VP for strategy and technology. When Phil isn't helping companies rethink their employment sites, his Evanwolf Group helps them develop strategies, plans and technologies for workplace blogging. Phil created and organized the eastbaykerry.com web site and the KerryTech email group.

Mar 19: Daniel GREENSTEIN, University Librarian and Executive Director, California Digital Library: Economics of Scholarly Publishing.

Mar 26: No seminar - Spring recess.

Apr 2: Michal FELDMAN: Economic Incentives for Cooperation in Peer-to-Peer Networks.
    Peer-to-peer (p2p) systems enable resource sharing between individual peers, who are expected to voluntarily contribute their own resources to the system. However, contribution consumes their resources and may impair their own welfare. Therefore, many users prefer to “free ride” on the system’s resources, consuming them without providing their own. The inherent tension between individual rationality and collective welfare produces a misalignment of incentives, which threatens to degrade system performance. In this talk, I will give an overview of the problem and discuss several projects I’m involved in, which attempt to develop a framework for understanding the technical and economic characteristics of p2p systems and to design economic mechanisms for incentive-compatible p2p applications.
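    As a rough illustration of the tension described above (not taken from the talk), the sketch below uses a toy public-goods model. The payoff rule, the benefit and cost figures, and the function names are all assumptions chosen for illustration.

```python
# Toy public-goods model of free riding in a p2p system.
# All numbers and names here are illustrative assumptions, not from the talk.

def payoff(contributes: bool, n_contributors: int, n_peers: int,
           benefit: float = 3.0, cost: float = 2.0) -> float:
    """Payoff to one peer: an equal share of the pooled benefit, minus own cost."""
    shared = benefit * n_contributors / n_peers
    return shared - (cost if contributes else 0.0)


def contributing_pays(n_other_contributors: int, n_peers: int) -> bool:
    """True if a peer earns more by contributing than by free riding."""
    give = payoff(True, n_other_contributors + 1, n_peers)
    ride = payoff(False, n_other_contributors, n_peers)
    return give > ride


n_peers = 10
# Individually, free riding wins no matter what the other peers do...
free_riding_always_wins = not any(
    contributing_pays(k, n_peers) for k in range(n_peers)
)
# ...yet everyone is better off if all contribute than if nobody does.
everyone_contributes = payoff(True, n_peers, n_peers)
nobody_contributes = payoff(False, 0, n_peers)
```

    In this toy setting, an incentive mechanism would have to change the payoff rule (for example by rewarding contributors) so that contributing becomes the better individual choice.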

Apr 9: Rob SANDERSON, Liverpool Univ.: A Discussion of the SRW 1.1 (Search and Retrieve for the Web) Protocol.
    SRW 1.1, the first stable version of the ZiNG initiative's Search/Retrieve Web Service, was released in mid-February with significant improvements over version 1.0. It is an XML-oriented protocol designed as a low-barrier-to-entry alternative to Z39.50. This discussion will quickly cover the basics of SRW and how they compare to Z39.50, before going over the new aspects of the protocol in more detail. More at http://srw.cheshire3.org/
    Rob Sanderson is the Senior Editor for SRW and recently graduated with a PhD from the University of Liverpool. He works on the Cheshire project, currently implementing a new distributable version of the server. More at http://www.o-r-g.org/~azaroth/
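    To give a feel for how small a request in this family of protocols can be, here is a minimal sketch (not from the talk). SRW itself carries its requests in SOAP messages; the sketch instead uses the URL-based sibling, SRU, whose searchRetrieve parameters are, as far as we understand the 1.1 specification, `operation`, `version`, `query`, `startRecord` and `maximumRecords`. The base URL and the helper name are hypothetical, and the parameter list should be checked against the specification before use.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(base_url: str, cql_query: str,
                     max_records: int = 10, start_record: int = 1) -> str:
    """Build an SRU-style searchRetrieve URL (hypothetical helper)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,          # the query is written in CQL
        "startRecord": start_record,
        "maximumRecords": max_records,
    }
    return base_url + "?" + urlencode(params)


# The host below is a placeholder, not a real server.
url = build_search_url("http://sru.example.org/catalog",
                       'dc.title = "digital library"')

# Decode the URL again to confirm the parameters survive encoding.
decoded = parse_qs(urlsplit(url).query)
```

    The response would be an XML document containing the matching records; parsing it is left out of this sketch.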

Apr 16: Two topics building on previous discussions:
    Aitao CHEN & Michael BUCKLAND: Infrastructure for Disambiguating Entities and Events.
    Natural language processing is being used to detect and disambiguate named entities and events: The extraction of named entities (person, place, organization, and institution) from texts; the detection of relations between named entities; the disambiguation of place names; and the translation of foreign named entities into English. How might traditional reference tools, such as gazetteers and chronologies, be used to improve performance?
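    One possible reading of the question above, sketched under stated assumptions: a gazetteer entry could carry a few context words per candidate place, and an ambiguous place name could be resolved by overlap with the surrounding text. The tiny gazetteer, its identifiers, and the scoring rule below are invented for illustration and are not the speakers' method.

```python
import re

# Toy gazetteer: one ambiguous name, two candidate places.
# The entries, ids, and context words are illustrative assumptions.
GAZETTEER = {
    "Berkeley": [
        {"id": "berkeley-ca", "context": {"california", "oakland", "bay"}},
        {"id": "berkeley-uk", "context": {"gloucestershire", "castle", "severn"}},
    ],
}


def disambiguate(name: str, text: str):
    """Pick the candidate whose context words overlap most with the text.

    Returns the candidate id, or None if the name is unknown or no
    candidate shares any context word with the text.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    best_id, best_overlap = None, 0
    for candidate in GAZETTEER.get(name, []):
        overlap = len(candidate["context"] & words)
        if overlap > best_overlap:
            best_id, best_overlap = candidate["id"], overlap
    return best_id


uk = disambiguate("Berkeley", "The castle at Berkeley overlooks the Severn.")
ca = disambiguate("Berkeley", "Berkeley lies across the bay from San Francisco.")
unknown = disambiguate("Berkeley", "Berkeley was mentioned without any clues.")
```

    A real system would need far richer evidence (coordinates, dates from a chronology, co-occurring names), but the lookup-then-score shape would be similar.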
    Jeanette ZERNEKE & Michael BUCKLAND: Redesigning Scholarly Publishing - Part 1.
    Dan Greenstein argued persuasively on March 19 that the present system of scholarly publishing is broken and that new methods need to be designed and tested. We will take up this challenge on April 16 and 23. What constitutes good practice in e-publications that go beyond static documents to include dynamic content? How can scholars get credit for the additional work it takes to create good digital projects? How can these efforts be preserved over time? What institutional changes are indicated? Major issues include practices for peer reviews of electronic publications, IT architecture and data formats for electronic publications, how to incorporate distributed internet data, and persistence of dynamic, interactive publications.
    Some years ago the Electronic Cultural Atlas Initiative, in collaboration with the California Digital Library, developed an electronic publications program to provide stable, long-term access to peer reviewed, map-based digital scholarship in history and the humanities. These publications include text components, web resources such as images, web-based maps, and fully interactive downloadable maps. Standards and processes for creating the publications include conducting peer reviews of both their technical architecture and scholarly content. The first ECAI publications are A Sasanian Seal Collection in Context and Mapping the Mainline: Using Historical GIS to Study American Religion. See http://ecai.org/projects/epublications.html.
    What would be the optimal strategy for a renewed e-publication program that would be an effective contribution to the wider problem? Come and help plan such a program.

Apr 23: Vivien PETRAS: The Use of Specialized Vocabulary in Subject Searches
    The language problem in information retrieval is in essence a problem of expressing a vague information need in a query processable by an information retrieval system. Search uncertainty arises because a searcher might not know how to state the information need as a textual query; or might use language that does not match the language used to describe the concept in the document set; or might use terms (words or category codes) that are not the best choice for finding the most relevant documents or even mostly relevant documents. Entry Vocabulary Indexes match searcher language (terms that the searcher thought of for an initial query) with controlled vocabulary terms from the document set in order to improve query formulation. Two main advantages arise. First, the searcher is presented with terms in the system's vocabulary that describe the information need (based on the initial query), which supports query formulation and expansion without additional learning effort by the searcher. Second, the probability of successful retrieval increases because more search-effective controlled vocabulary terms are used. I will describe the idea of using research specialties to specify search spaces within a general document set and the idea of using the specialized technical language of specialties to distinguish between specialties in an information retrieval system. I propose to show how the specialized language used in the searcher's specialty can be used to improve Entry Vocabulary Indexes to make this process even more precise.
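    The matching step described above can be sketched roughly as follows. This is a simplified stand-in, not the speaker's implementation: it associates searcher words with controlled vocabulary terms by plain co-occurrence counts in already-indexed records, where a real Entry Vocabulary Index would use a proper statistical association measure. The training records and function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented training records: (free text, controlled vocabulary terms assigned).
TRAINING = [
    ("heart attack risk in older adults", ["Myocardial Infarction", "Aged"]),
    ("treatment after heart attack", ["Myocardial Infarction", "Therapeutics"]),
    ("memory decline in older adults", ["Aged", "Memory Disorders"]),
]


def build_index(records):
    """Map each free-text word to counts of the controlled terms it co-occurs with."""
    index = defaultdict(Counter)
    for text, terms in records:
        for word in set(text.lower().split()):
            index[word].update(terms)
    return index


def suggest(index, query, k=3):
    """Rank controlled terms by summed co-occurrence with the query words."""
    scores = Counter()
    for word in query.lower().split():
        scores.update(index.get(word, {}))
    return [term for term, _ in scores.most_common(k)]


index = build_index(TRAINING)
top_for_heart_attack = suggest(index, "heart attack", k=1)
top_for_older_adults = suggest(index, "older adults", k=1)
```

    Restricting the training records to one research specialty before building the index is one way the specialty idea in the abstract might be approximated in this sketch.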

Apr 30: Mikhail AVREKH: FreeDB and Other Music Metadata Providers: The Hidden Linchpins of the P2P Phenomenon.
    I will discuss the metadata infrastructure that supports p2p networks, as well as some historical antecedents having to do with providers of metadata in other fields (e.g. the library community).
    Also Merrilee PROFFITT, RLG: RedLightGreen.
    One year ago, RLG was preparing to launch RedLightGreen, a free online service based on the records in the RLG Union Catalog aimed at college undergraduates and optimized to provide access to a wealth of high quality, trusted, print resources through a simple, easy-to-use interface. Last spring, we gave a presentation in this forum that highlighted use of FRBR, MARC in XML, data mining, user studies, and future directions. Now, with a full semester and more of academic trial use, further user studies, and with continued funding from the Andrew W. Mellon Foundation, RLG will report on:
• Who's using the system, and how?
• Further findings from extended user studies, and how user studies have specifically influenced interface design and helped dictate future directions for the service
• How institutions can join an expanded partnership for RedLightGreen - for free
• Planned future directions for the service

May 7: John WIECZOREK and John McCARTHY: Global Biodiversity Information Facility: Information Retrieval from Federated Databases
    The Global Biodiversity Information Facility Network (GBIF) has designed and begun to deploy a network architecture "to enable users throughout the world to discover and put to use vast quantities of global biodiversity data, thereby advancing scientific research in many disciplines, promoting technological and sustainable development, facilitating the equitable sharing of the benefits of biodiversity, and enhancing the quality of life of members of society." A recent review (by John McCarthy) said that "GBIF is well on its way toward becoming one of the premier examples of a successful federated database network. Moreover, they have done so thus far on a remarkably modest budget by using widely used modern software, protocols and standards."
    John WIECZOREK is project programmer for the Mammal Networked Information System (MaNIS) at UC Berkeley's Museum of Vertebrate Zoology (MVZ www.mip.berkeley.edu/mvz/). He is co-architect and implementor of the Distributed Generic Information Retrieval (DiGIR) protocol and software and a member of the GBIF Data Access and Database Interoperability (DADI), Digitisation of Natural History Collections (DIGIT), and Science (SCI) working groups.
    John McCARTHY has been working on databases at LBL since 1980 (and elsewhere for 15 years before that); he has participated regularly in the Friday Afternoon Seminar over the years.
    May 7 is the last Seminar meeting of the semester.



Aug 13: Julian Warner, Belfast: A labor theoretic approach to information retrieval
    In the post-Edenic world, we are condemned to labor and compelled to choose. Labor has often been conceived as physical rather than mental labor and has seldom been connected with choice. Yet in information retrieval systems, mental labor and choice can be seen to converge. Examining their convergence can yield an evaluative model for information retrieval, closely linked to real world forces and practice.
    The essential aim for information retrieval systems is taken as selection power rather than the transformation of a query into a set of relevant records. Selection power is regarded as the product of selection labor. Selection labor is taken to be composed of description labor and search labor.
    A certain quantity of selection labor, associated with the number and variety of objects described within a system, is assumed. Components of selection labor can be transferred to information technologies and the distribution of selection labor between description and searching can be modified, but the overall quantity of labor cannot be reduced. Description labor is distinguished from description processes, and, more sharply, from description products (such as catalog or database records).
    Two significant constraints can work against enhancing selection power. First, the costs of direct human labor in description and searching may lead to a preference for economy in the use of that labor. Secondly, there are epistemological difficulties in representation of objects. These constraints have tended to be considered separately, with more attention given to epistemological issues. Constraints arising from the costs of human labor may have been more influential on practice.
    A decision framework for the consideration, design, and use of information retrieval systems is then constructed: description labor should be increased, and selection power enhanced, until the costs of that description labor approach the anticipated costs of search labor.
    The implications of this decision framework for information retrieval systems are considered. A reduction in human description work and a transfer of labor and expertise to the searcher, for many public systems, are both predicted and exemplified.
    The framework thus established demonstrates the analytic and practical value of a revealing theory.
    Julian Warner is in the School of Management and Economics, The Queen's University, Belfast, and was Visiting Scholar here in 1991/92.

Aug 22: To be announced.
The Seminar will resume in the Fall semester starting Friday September 3.