School of Information
 Previously School of Library & Information Studies

 Friday Afternoon Seminar: Summaries.
  296a-1 Seminar: Information Access, Fall 2012.

Fridays 3-5. 107 South Hall. Schedule. Weekly mailing list.
Summaries will be added as they become available.

Friday, Aug 24: Vivien PETRAS, Merrilee PROFFITT and Max KLEIN.
    Vivien PETRAS, Humboldt University, Berlin: Recent Studies.

    A brief overview of some of her recent and current work.
    Max KLEIN and Merrilee PROFFITT, OCLC Research: Wikipedia and Libraries: What's the Connection?
    We'll talk about synergies between libraries and Wikipedia, and about what OCLC Research has been doing to connect the two so that those who start their research in Wikipedia can find resources in libraries. We'll also discuss our VIAFBot project and a range of potential projects we've identified that should interest both librarians and Wikipedians.
    Max Klein is OCLC Research's Wikipedian in Residence. He is working with OCLC Research as community coordinator to explore and pursue mutually beneficial projects between OCLC, library stakeholders, and the Wikipedia community through a range of activities, including working with OCLC staff and libraries to help foster a broader understanding of Wikipedia's practices.
    Merrilee Proffitt is a Senior Program Officer in OCLC Research. She provides project management skills and expert support to institutions represented within the OCLC Research Library Partnership, with a special focus on increasing visibility of archives and special collections.

Friday, Aug 31: No Seminar meeting.

Friday, Sep 7: Ray R. LARSON: The Social Networks and Archival Context Project: Status Report.
    The SNAC project has been funded for another two years by the Mellon Foundation. This talk will review the SNAC project, and examine the new data being included in this round. This includes over 2 million MARC archival records from OCLC, as well as EAC data from the British Library, the Bibliothèque nationale de France, and EAD records from the UK Archives Hub. More at
    Also: Michael BUCKLAND: Editors' Notes: Update.
    The "Editorial Practices and the Web" project is a collaboration in making available working notes created in the preparation of documentary editions of historically important papers and notes made by curators of library special collections. A shared website for making notes available is now openly accessible without a password. More at and

Friday, Sep 14: Eric KANSA: Sometimes Data is Best Served Cooked, Rather than Raw: Scholarly Publishing and the Web of Data.
    There is a great deal of interest in the sciences and humanities around how to manage "data." By "data," we usually refer to content that has some formal and logical structure needed to meet the requirements of software processing. Data quality issues play an important role in shaping professional incentives to participate in data dissemination and in issues of trust and reliability around the use of shared data. As is the case with other areas of scholarly production, researchers need appropriate workflows to edit, review, and improve the quality of shared data. This presentation will explore how the transactional nature of data helps shape this workflow. Because use of data is heavily mediated by software, datasets can be seen as an integral part of software. This thinking motivated us to experiment with using software debugging and issue tracking tools to help organize collaborative work on editing data. Debugging and issue tracking tools, widely used to improve software quality, can play a similar role in the "debugging" of data.
    Finally, such editorial workflows also need to take into account issues of context. To be more useful, datasets need to be understood and related to other information available on the Web. This is particularly true for archaeology, an inherently multidisciplinary domain with inputs from the humanities, history, and natural sciences. Beyond the research community, much information relevant to archaeology is routinely collected through government administrative processes relating to environmental impact regulations and historical preservation laws. "Linked Open Data" methods can help to better contextualize research data with other datasets, other forms of scholarly communication, and records maintained by government institutions.
    Eric Kansa is Director of the Open Context project, a service for the publication of research data in archaeology. Open Context works closely with the California Digital Library for long-term archiving and curation of digital data. See He is also affiliated with the I School.

Friday, Sep 21: Sara NOFRI, Univ. of Hamburg, Germany; Visiting Scholar, Journalism:
    Google and its Consequences for Information Retrieval.

    Google currently dominates Internet searches, results, information seeking patterns, cognitive patterns, and our ways of looking for things. Google has crawled and catalogued images, videos, sounds, and texts, and has categorised them by location, appearance, dimension, time, and in other ways. Further, string searches have become increasingly important, as Google's algorithms keep improving, guessing for the searcher, adding stuff one does not want to retrieve, and obscuring stuff that one is looking for. So much that is not spread by word-of-mouth now comes from Google. It is difficult to imagine what could counter this influence.
    What are the challenges facing Google and its users? What are the elements that still prevent effective information retrieval or manipulate the search and retrieval processes? My presentation will address two challenges: The language (and cultural) challenge; and, in more depth, the truthfulness challenge. In this situation how can information retrieval be made more successful?
    Sara Nofri has a PhD in Communication Science from the University of Hamburg, an MA in Conference Interpreting and Translation Studies from the University of Bologna, and studied Political Science and Scandinavian Studies at the Ruhr-University Bochum. She has studied, traveled, taught, and conducted research in several different European countries. Her doctoral thesis compared linguistically, quantitatively, and qualitatively the coverage of environmental issues in Sweden, Italy, Germany and the U.K. Since 2006, she has been teaching at several departments at the University of Hamburg. Currently, she is working at a startup project for creating software aggregating Internet data and performing partially automated qualitative analyses of those data.

Friday, Sep 28: Clifford LYNCH: What We Still Don't Know about Institutional Repositories.
    The higher education and library world, in the US and globally, now has about ten years of experience with the deployment of institutional repository services; yet there is still a surprising amount that we don't understand. After a brief review of the evolution of institutional repositories, I'll focus on the status of a series of these open research questions, including: how to measure success; metadata-related strategies and integration challenges; institutional and disciplinary repository relationships; data intensive scholarship and repositories; and provisioning repositories as use environments.

Friday, Oct 5: Brian CARVER and Michael LISSNER: The Court Listener.
    Michael Lissner and Brian Carver created CourtListener, an alert service covering the U.S. federal appellate courts and a database of opinions with more than 750,000 documents, including the Supreme Court corpus since 1754. Rowyn McDonald and Karen Rustad created a legal citator on it. The site is provided at no cost, and its code is under open source licenses. All documents, with citation relationships, can be downloaded in bulk, providing a large corpus of English-language documents for research and collaboration. As time permits, we'll discuss:
    (1) Frequent (often angry) requests that court opinions be removed from the site or blocked from discovery by search engines, and a study of the requests made: who makes such requests and why, and recommendations for a balance between access and privacy.
    (2) There are 100+ court websites, many with poor funding or prioritization, so gaining a higher-level view of the law can be challenging. "Juriscraper" is a new project, spun out of CourtListener, designed to ease problems for those who want court opinions daily. The project is under active development, and we are looking for others to get involved. Michael Lissner will describe the difficulties and complexities of building Juriscraper, and discuss some of our solutions to these problems.
    (3) Finally, we will describe how Rowyn and Karen built an online application which detects citations in court opinions, creates links between cases, and tracks the resulting citation network. We will cover problems encountered and solved, and directions for future work, with a demo of the citator.

Friday, Oct 12: Laurie PEARCE, Patrick SCHMITZ and Davide SEMENZIN:
    Ancient Families, Modern Tools: Berkeley Prosopography Services.

    BPS is a set of services for prosopographic analysis developed at Berkeley in response to historians' needs to mine prosopographic data from text corpora, supporting study of societal relations among documented individuals. BPS overcomes the limitations of traditional pen-and-paper research by providing researchers with a flexible and intuitive corpus-based toolset for data processing, analysis, and visualization. From its inception, BPS was required to be generalizable, scalable, corpus agnostic, extensible, and universally accessible. BPS' innovative and unique contribution as a research tool is in the support for the promulgation and exploration of counterfactual assertions within the context of corpora curated by domain-experts, while preserving domain integrity and tracking intellectual contribution and authority. Challenges encountered in BPS development include: translating the high-level requirements of humanities researchers into technically sound designs, abstract modeling of probabilistic networks, and deployment of the tools as reusable web-services.
    Laurie Pearce is lecturer in Assyriology in the Department of Near Eastern Studies, Berkeley. She specializes in the social and economic history of Mesopotamia (modern Iraq) in the late first millennium BCE. The legal texts from Hellenistic Uruk, which serve as the development and demonstrator corpus for Berkeley Prosopography Services, are the core component of her project Hellenistic Babylonia: Texts, Images and Names, a component of the Open Richly Annotated Cuneiform Corpus.
    Patrick Schmitz is a Semantic Services Architect and Manager of the IST Research Technologies Architecture and Design group at U.C. Berkeley, where he focuses on bringing semantic intelligence to cultural heritage communities. He is the technical lead for the CollectionSpace project and senior architect on Project Bamboo and was previously in research groups at Microsoft, Yahoo!, and CWI in Amsterdam. He has extensive experience as system architect and software developer on multimedia and information management platforms, has co-founded several tech startups, and is active in W3C working groups. He has a BA in Computer Science and a Master's in Information Management and Systems from U.C. Berkeley, and is also a Lecturer on the I School faculty.
    Davide Semenzin is a visiting scholar working on his Master's project for the University of Utrecht. His areas of focus include Social Network Analysis and Graph Visualization.

Friday, Oct 19: Michael BUCKLAND & AnnaLee SAXENIAN.
    Michael BUCKLAND: Lodewyk Bendikson and the Copying of Documents.
    Printing allowed mass-produced copies of documents, but easy, accurate one-off copying was not achieved until the 20th century with specialized photographic techniques: first photostat (developed for humanities scholars) then microfilm (for banks). Photography also allowed image enhancement and document forensics. Explained through the work of a Dutch-born Californian, Lodewyk Bendikson (1875-1953).
    More at
    4:00 p.m. AnnaLee SAXENIAN: Online Education: The I School, Berkeley, and Beyond.
    I'll talk about online initiatives at the I School in the context of current experimentation on the Berkeley campus and beyond. More at

Friday, Oct 26: Laine FARLEY and Patricia MARTIN, California Digital Library: Return On Investment: Is It Possible to Demonstrate Value for Digital Library Services?
    As the financial crisis has deepened in higher education, administrators and others have called for more evidence of value and return on investment, even for such core services as libraries. Others have argued that this approach simply can't apply to library services. The California Digital Library attempted to develop various measures of value and return on investment (ROI) for its services and will examine where these approaches may be credible and where there are gaps.
    Background: "Stop the Madness: The Insanity of ROI and the Need for New Qualitative Measures of Academic Library Success" by James G. Neal
    LibValue database:
    Laine Farley has been the Executive Director of CDL since 2008 and has been with CDL since its inception in 1997.
    Patricia Martin is Director of Discovery and Delivery Services at CDL, which includes the Melvyl Catalog, UC-eLinks, the Request service for ILL, and new metadata management services in support of HathiTrust and the Center for Research Libraries' Print Archiving and Preservation Registry.

Friday, Nov 2: Howard BESSER, New York University:
    Strategies for Copying Out-of-Print Works: NYU's "Video At Risk" Project.

    Howard Besser will discuss the Mellon-sponsored "Video At Risk" (VAR) project, paying particular attention to findings and strategies that might prove useful to other libraries. The topics he will cover, along with the methodology behind them, include: VAR's studies that have shown that a significant number of mass-produced works purchased by academic libraries for their circulating libraries are now both out-of-print and held by few libraries; VAR's Guidelines for interpreting Section 108 Copyright laws, and experiments to support claims around "deterioration"; the severe problems with inconsistent cataloging in OCLC records; and the development of guidelines for digital reformatting of video. He may also discuss the development of an open-source tool for extracting metadata in a manageable form from digital video files.
    Howard Besser is a Professor of Cinema Studies in NYU's Tisch School of the Arts, and founding Director of NYU's Moving Image Archiving & Preservation Program.

Friday, Nov 9: Short Reports.
    Once a semester we devote Seminar time to miscellaneous short reports, especially highlights of recent, distant, expensive conferences. Today's Seminar will include reports on the American Society for Information Science & Technology Annual Meeting and the Pre-conference in the History of Information Science; the first and second National Archival Authorities Cooperative (NAAC) meetings to build a National Archival Authorities Infrastructure; and the International Conference on Theory and Practice of Digital Libraries (successor of the European Conference on Research and Advanced Technology for Digital Libraries (ECDL)) in Cyprus.
    Attendees will be encouraged to contribute additional brief reports on other meetings and on their current, recent or forthcoming projects.

Friday, Nov 16: David ROSENTHAL, Stanford: The Truth Is Out There: Preservation and the Cloud.
    With the recent introductions of DuraCloud, Preservica, Glacier and others, preservation has joined most other applications in being offered as a cloud service, Preservation as a Service (PaaS). Does PaaS make technical, economic or business sense? What characteristics make applications cloud-friendly? If outsourcing to third-party cloud services is such a great idea, why do companies that get big enough all build their own clouds?
    David S. H. Rosenthal has been an engineer in Silicon Valley for more than a quarter-century. He co-founded and is Chief Scientist of the LOCKSS (Lots Of Copies Keep Stuff Safe) Program at the Stanford University Libraries. He was an early employee at Sun Microsystems, and employee #4 at NVIDIA.

Friday, Nov 23: Thanksgiving. No seminar meeting.

Friday, Nov 30: Clifford LYNCH: Research Agendas in Personal Digital Archiving.

    A number of important technical, economic and social developments -- many aspects of commerce and consumer activities, correspondence, personal photography, and some uses of social media -- have been grouped together under shorthands like "digital lives" or "personal digital archiving", and in the past few years there has been at least one annual conference (the next being held in College Park MD in Feb 2013) looking at this area. Recently I've been working on a concluding chapter for a book of essays on personal digital archiving, and after giving a brief overview of the scope of the field and its interconnections to other related areas, I'll share some of the ideas that I've developed about the key research issues and agendas in this area.

The Seminar will resume on January 25, 2013.
