School of Information
 Previously School of Library & Information Studies

 Friday Afternoon Seminar: Summaries.
  296a-1 Seminar: Information Access, Spring 2009.

Fridays 3-5. 107 South Hall. Schedule.
Summaries will be added as they become available.

Friday, Jan 23: Clifford LYNCH: Introduction to the Seminar.
    Michael BUCKLAND: Remaking Access to Reference Resources.

    Understanding depends on knowing the context and background of any topic, so in a print and paper world a reference library is an excellent place to read, to write, and to learn. But non-digital solutions are obsolescent and the special amenity of the reference collection has not yet made an effective transition to the digital library environment. In hindsight, the print-on-paper, codex-based reference collection was less than perfect in several ways. Research on reference service has focused on empowering the reference librarian rather than the even more important task of empowering library users. So how could we combine the convenience of Google and the Wikipedia with the selectivity and trustworthiness of the library reference collection? Our projects supported by IMLS and NEH suggest how we might achieve the best of both worlds. The talk will discuss the issues, demonstrate prototypes being developed, discuss the possibilities, and invite comments.
    Background reading: Library reference service in a digital environment, Library and Information Science Research 30, no 2 (2008): 81-85.

Friday, Jan 30: Clifford LYNCH: Networked Instrumentation for the Humanities and Social Sciences.
    One of the key ideas of e-science cyberinfrastructure is the integration of large scale observational instrumentation into the network, data storage and computation fabric to support scientists; data collected or derived from these instruments might be shared among large numbers of researchers and in fact also repurposed to support teaching and learning at all levels. Recently, I have been thinking about the implications of having large corpora of literature and discourse -- both historic and current -- available on the net, and what this might offer for the creation of new networked instrumentation in the humanities and social sciences, building on enabling technologies like text mining. In this discussion, I'll outline what I believe are some of the possibilities here, and also some of the issues involved in access to data sources.

Friday, Feb 6: Fredrik WALLENBERG: Judging a Book by its Cover: Online Previews and Book Sales.
    There has been a continued debate about the benefits and costs associated with providing free samples of information goods on the Internet. Some argue that the samples lead to increased sales through increased awareness of the good, while others claim that the previews and samples cannibalize sales. In this paper I present a unifying model in which I show that information about the good, specifically samples, trial versions, and previews, has both a sales-promoting and a cannibalizing effect, and that either of the two can be dominant. I then set up an experiment in which I look at the impact on the sales of a specific set of books from the enabling of searching the contents and previewing of pages relevant to the search query. I find no significant impact on sales from the previews. The sample available to me is, however, on the small side and also from a very specific genre, both of which impact the results.

Friday, Feb 13: David GREENBAUM, Steve MASOVER, and Rich MEYER: The Bamboo Planning Project - Developing "Cyberinfrastructure" for the Arts, Humanities, and Interpretive Social Sciences.
    We will give an overview and update on the Bamboo Planning Project. Bamboo is a multi-institutional, interdisciplinary, and inter-organizational effort that brings together researchers in the arts and humanities, computer scientists, information scientists, librarians, and campus information technologists to tackle the question: How can we advance arts and humanities research through the development of shared technology services? The Bamboo Planning Project, funded by the Andrew W. Mellon Foundation and led by UC Berkeley and the University of Chicago, is halfway through its 18-month planning and community design process. Over 100 universities, colleges, and organizations concerned with digital humanities have participated in the planning process to date. In this presentation we will discuss some of the cultural, organizational, and technical opportunities and challenges in attempting to develop cyberinfrastructure across a highly diverse set of scholarly practices and institutions.
    David Greenbaum, Steve Masover, and Rich Meyer are staff members in the campus's Information Services and Technology division.

Friday, Feb 20: Students' Progress Reports and 2009 i-School Conference report.
    Megan FINN: Information Practices after Modern California Earthquakes
    This work will be a chapter of my dissertation which situates the information practices of early internet users after the 1989 earthquake in the context of California earthquakes. My hope is to make an argument that information practices after an earthquake are more enduring than the latest information technology.
    Patrick RILEY: The Metadata of Live Television.
    What if you could search for what is being mentioned on TV? What if you wanted to get an email alert as soon as the Berkeley iSchool is mentioned on live TV? Is there information we can gain from indexing everything ever said on TV at any given time, in terms of linguistics research, media monitoring, fact-checking, social media, and search query trends? Is TV metadata even important compared to the enormous production of social, user-generated media?
    A new search engine created at the Berkeley iSchool for indexing the metadata of live TV will be demonstrated by Patrick Riley, a Ph.D. student at the iSchool, and a discussion on metadata, copyright, media monitoring, and data mining will follow.
    Report and Discussion on the 2009 i-School Conference by Daniela Rosner, Christo Sims and maybe others.
    The Fourth iSchools Conference brought together scholars and professionals who come from diverse backgrounds and share interests in working at the nexus of people, information, and technology. "The conference celebrates and engages our multidisciplinary efforts to understand the scholarly, educational, and engagement dimensions of the iSchool movement." A report from people who attended. See

Friday, Feb 27: Tuukka RUOTSALO and Mikko VILLI, Finland.
    Mikko VILLI, University of Art and Design Helsinki UIAH: Visual mobile communication. Camera phone photographs in mobile messaging.

    [Brief presentation only]: I concentrate on photo messages, i.e. photographs taken with a camera phone and sent to another mobile phone. A salient aspect is the convergence of phone and camera -- the phone being a communication device intended mainly for interpersonal communication, and, on the other hand, the camera being a device devoid of any means to directly communicate with other people. From this disparity rises the central question in my research: How does the convergence of photography and mobile phone communication affect our communicational and photographic practices? More at
    Tuukka RUOTSALO, Finnish CultureSampo, Helsinki University of Technology (TKK): Cultural Heritage on the Semantic Web.
    Cultural Heritage has recently become an important application area for semantic technologies. The current semantic technologies enable powerful ontology-based search and browsing capabilities for digital collections. However, many bottlenecks of semantic systems can be identified: 1) quality of ontologies, 2) mediation of heterogeneous content and 3) information visualization and access. I will present the publication concept and the online semantic portal CultureSampo, a system for creating a collective semantic memory of cultural heritage on a national level:
    The system addresses the challenge of aggregating highly heterogeneous, cross-domain cultural heritage into a semantically rich intelligent system for human and machine users. In addition to the CultureSampo system, I will present methods and tools for collaborative ontology development, search, and natural language processing developed within the National Semantic Web Ontology Project in Finland (2003-2007, 2008-2010), and ongoing work on context-aware mobile search/recommending in the EU FP7 project SmartMuseum.
    More information about the Semantic Computing Research Group at the Helsinki University of Technology and University of Helsinki:

Friday, Mar 6: Group tour of remodeled Bancroft Library.
    This week there will be a guided group tour of the newly remodeled Bancroft Library. Assemble as usual in South Hall 107 promptly at 3:10 p.m. and we will go over as a group. See

Friday, Mar 13: Michael BUCKLAND and Ryan SHAW: Editing Historical Papers in a Digital Environment.
    The editing of historical papers in projects such as the Emma Goldman Papers Project here on campus is a challenging undertaking in several ways. Such projects are hard to fund. The traditional product is a set of printed volumes which constrain the amount of editorial research that can be published. Relatively little use is made of digital technology and with current methods substantial duplication of effort appears to be unavoidable.
    In recent months we have been exploring ways in which some efficiency and return on investment might be improved. Some efficiencies might result from improved search support and editing tools. A new genre of web-published "Editors Notes" might very beneficially complement the print volumes, improve access, and reduce duplicative effort.
    I will lead a discussion of the challenges and how they might be addressed.

Friday, Mar 20: Paul DUGUID: The World According to grep: What Have We Been Searching For?
    In recent years the Internet has increasingly been defined by search, its resources reached primarily through a search box. While the Internet is new, search of course is not. And though modern search may appear to endorse the idea that we have always been foraging for information, and that progress has involved shrugging off old encumbrances in order to make information increasingly "free" and autonomous, this discussion hopes to put the history of search in an alternative light and, in so doing, clarify some of what is and is not new and perhaps what is and is not possible for the developing world of digital search.
    Also Clifford LYNCH: An Update on Institutional Repositories and Development of International Repository Infrastructure
    We have looked several times at the evolving role of institutional repositories. In this discussion, which follows on an international meeting earlier this week in Amsterdam looking at the roadmap for inter-repository infrastructure, I will look at some of the goals that we might hope to achieve in the evolution of inter-repository infrastructure and inter-repository interoperability of various kinds, and some of the technologies and standards that may be helpful in achieving these goals.

Friday, Mar 27: Spring Break. No Seminar meeting.

Friday, Apr 3: Jeanette ZERNEKE: Update and Issues in Digital Humanities.

    This presentation will outline some recent developments in digital humanities infrastructure, humanities computing, visualization tools and techniques, and digital scholarship. Three primary themes will be outlined: infrastructure; scholarly processes; and visualization. Information from several recent conferences will be incorporated, including the Electronic Cultural Atlas Initiative / Pacific Neighborhood Consortium joint fall conference (Hanoi, Dec 2-6, 2008), the Visualizing the Past workshop (University of Richmond, Feb 20-21, 2009), and CAA 2009 (Computing and Computational Methods in Archaeology, Williamsburg, March 22-26, 2009). We hope to spark a lively discussion of each theme.
    Jeanette Zerneke is Director of Information Technology for the Electronic Cultural Atlas Initiative (ECAI) and was until recently Director, Information Systems and Services, for International and Area Studies.

Friday, Apr 10: Interim Progress Reports: Ryan SHAW, Patrick RILEY, and Megan FINN.
    Ryan SHAW: Providing Context for Historical Documents.

    Tremendous resources are being invested in digitizing historical documents. These investments promise to dramatically improve our access to the documents of the past. Yet simply finding and accessing a document does not, in itself, enable understanding. Effective use requires understanding a document's context. Traditionally a library's reference collection has provided various tools for assembling such contextual understanding. How might we repurpose and augment these tools for a networked environment? In this talk I will present my research into representing and providing in a networked environment contextual information of the kind typically found in reference works. In particular, I will discuss my ongoing dissertation research, which focuses on "events" as a topical category and examines the feasibility of developing "event directory" services that aggregate and provide basic information about historical events and their relationships. I will argue that the design of such services should be grounded in a clear understanding of the nature of historical events and how they function as concepts for organizing our understanding of the past, and that work in the critical philosophy of historiography can aid such understanding.
    Megan FINN: History of Post-earthquake Communication in California.
    An update on the archival research I have been doing for my dissertation chapter that examines Californians' communications related to earthquakes. This chapter will help to situate my study of information practices after the 1989 Loma Prieta Earthquake.
    Patrick RILEY: The Necessity and User Expectations of Real-time Indexing.
    With so much live information being shared on the internet through various communication channels, internet users are starting to expect completely up-to-date search results. However, truly current searches require real-time updating of the indexes and document weighting (if used) as well as real-time execution of searches. There currently exist various attempts at real-time indexing, with Twitter providing search over tweets and Google including news feeds in its search results, but how effective are these, and how can we make this better?
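The requirement that indexes be updated in real time can be illustrated with a minimal sketch of an in-memory inverted index whose documents become searchable as soon as they are added. This is only an illustration under stated assumptions, not the system discussed in the talk: the class name `LiveIndex`, its methods, and the use of increasing integer ids as a stand-in for recency are all hypothetical, and no document weighting is attempted.

```python
import re
from collections import defaultdict


class LiveIndex:
    """Hypothetical in-memory inverted index updated on every insert."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}                    # id -> original text

    def add(self, doc_id, text):
        # Index the document immediately; there is no batch rebuild step.
        self.docs[doc_id] = text
        for term in set(re.findall(r"\w+", text.lower())):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return ids of documents containing every query term, newest first.
        # Assumption: larger ids were added more recently.
        terms = re.findall(r"\w+", query.lower())
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits, reverse=True)


index = LiveIndex()
index.add(1, "Earthquake reported near Berkeley")
index.add(2, "Berkeley iSchool seminar on search")
print(index.search("berkeley"))
```

Because each `add` touches only the postings for the terms in the new document, a search issued right after an insert already sees that document. Keeping any document weighting current at the same pace is the harder part and is left out here.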

Friday, Apr 17: Mari MILLER and Avi RAPPOPORT.
    Mari MILLER: The Economics of Open Access.

    Mari Miller, Librarian and Liaison to the I School, will review issues in the economics of open access and resources she has been exploring for a bibliography on the open access movement (books, articles, blogs, websites, etc.). She will show where they are located online and in the Library, and propose a collaborative model for developing it further.
    See updated webliography at
    Avi RAPPOPORT: Metadata, Sex, and Amazon.
    Amazon failed in a big way on Easter weekend, and because it is responsible for about a third of all electronic commerce in the United States, it matters. If Amazon won't sell a book, or will sell it but will "de-list" it, the book practically disappears. The ways Amazon failed are many: it did not (and still doesn't) have a clear policy on adult (sex and sexuality) content, and there is evidence that it deals with adult materials in special ways. It placed too great a reliance on metadata. The technical infrastructure was too flexible, allowing changes without approval. Its communications to its customers, authors and the media were worse than nothing. And it had the bad luck to make a significant mistake regarding people who are highly articulate and communicative, at a moment when there are technology tools to support them, and the bad judgment to stay silent hoping it would go away.
    Avi Rappoport is a metadata and search engine consultant with Search Tools Consulting.

Friday, Apr 24: Clifford LYNCH: The Scholarly Journal and the 4th Paradigm of Science.
    I've been asked to write a short chapter for a book being published in celebration of the work of the late Jim Gray (who has spoken in the Seminar in the past), particularly his vision of a 4th paradigm of scientific inquiry based on data intensive science within which the earlier approaches of theory, experiment and simulation might be unified. My talk will present and test the main theses of this chapter. After very briefly summarizing some of Jim's ideas, I'll explore what this 4th paradigm might mean for the evolution of scholarly communication and the scholarly record, both looking backwards at the evolution of the traditional scholarly journal article and also into more speculative ideas such as open notebook science.

Friday, May 1: Students' Final Progress Reports.
    Ryan SHAW: Mining Events from Wikipedia.

    Last semester I presented progress on mining texts for descriptions of events by looking for statistically significant co-occurrences of dates and names. This semester I will present progress on mining descriptions of events from a rather more structured source: Wikipedia chronologies. Wikipedia has a great many chronology or timeline articles that are rich sources of one- or two-sentence event descriptions. By scraping these articles and parsing the individual chronology entries into event representations, using the Wikipedia links as a high-quality form of named entity detection, I can quickly assemble databases of events. I have been experimenting with making these events available on the web as Linked Data and queryable via SPARQL.
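The step of parsing a chronology entry into an event representation, with wiki links treated as named entities, might be sketched as follows. This is an illustration only, not the code used in the research: the entry layout (a bullet, a linked year, a dash, then a sentence in wiki markup), the `Event` fields, and the function name `parse_entry` are all assumptions.

```python
import re
from dataclasses import dataclass, field

# Assumed wiki markup: [[Target]] or [[Target|label]].
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")
# Assumed entry layout: "* [[1906]] - description" (hyphen or en dash).
ENTRY = re.compile(r"^\*\s*\[\[(\d{1,4})\]\]\s*[\u2013-]\s*(.+)$")


@dataclass
class Event:
    year: int
    description: str                              # plain text with link markup removed
    entities: list = field(default_factory=list)  # link targets, in order of appearance


def parse_entry(line):
    """Parse one chronology line into an Event, or return None if it does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    year, body = int(match.group(1)), match.group(2)
    # Link targets serve as the detected named entities.
    entities = [target.strip() for target, _label in LINK.findall(body)]
    # Replace each link with its visible label (or its target when there is no label).
    description = LINK.sub(lambda m: m.group(2) or m.group(1), body)
    return Event(year, description, entities)


event = parse_entry(
    "* [[1906]] - The [[1906 San Francisco earthquake|San Francisco earthquake]] "
    "strikes [[California]]."
)
print(event.year, event.entities)
print(event.description)
```

Real timeline articles vary considerably in layout, so a parser of this kind would need per-article handling for date ranges, nested lists, and entries without a linked year.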
    Megan FINN: The History of Communication After California Earthquakes.
    I will present a final outline for my chapter on the history of communication after California earthquakes paying particular attention to information practices of ordinary people.
    Patrick RILEY: Giving Relevancy to Twitter: A new approach at Real-time Search.
    Twitter search is an interesting "pulse" of what people are thinking and talking about, and offers a new source of live and, most importantly, public information. However, the search engine purchased by Twitter provides only chronological search results, with no ranking by relevance. A popular and successful way of determining "relevancy" on the web is link analysis with an algorithm such as HITS or the similar PageRank, which depends on web publishers showing their level of support by linking to the most important pages; this isn't possible in a "real-time" source of information. In this presentation, I will present a search algorithm for assigning a factor of relevancy to Twitter search results.
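To make the dependence on links concrete, here is a minimal sketch of PageRank computed by power iteration over a tiny link graph. It illustrates the standard link-analysis idea mentioned in the abstract, not the algorithm to be presented in the talk; the function name, the damping value of 0.85, and the sample graphs are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a dict of PageRank scores for a graph given as {page: [linked pages]}."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A page passes its rank on to the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


web = {"a": ["c"], "b": ["c"], "c": ["a"]}
print(pagerank(web))

# Freshly posted items that nothing links to yet: all scores come out equal.
fresh = {"tweet1": [], "tweet2": [], "tweet3": []}
print(pagerank(fresh))
```

In the first graph the page that attracts the most links ends up with the highest score. In the second, where no links exist yet, every item receives the same score, which is the difficulty the abstract points to for real-time sources.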

Friday, May 8: No Seminar Meeting.

The Seminar will resume on August 28.

Spring 2009 schedule.   Fall 2008 schedule and summaries.