School of Information
 Previously School of Library & Information Studies

 Friday Afternoon Seminar: Summaries.
  296a-1 Seminar: Information Access, Fall 2010.

Fridays 3-5. 107 South Hall. Schedule.
Friday, Aug 27: Clifford LYNCH: Introductions. New Infrastructure and Instruments for the Human Sciences, and Implications for the Locus of Research and the Evolution of Disciplines in the Academy.
    Introduction to the seminar and plans for coming weeks. Participant introductions, with comments invited on interesting things that they have seen or done over the summer.
    This talk, based on a paper currently in draft, examines several developments. I explore a broad view of the prospects for observational and analytic instrumentation and accompanying cyberinfrastructure to support new kinds of inquiry in a wide range of disciplines, including the humanities and the social and behavioral sciences, as practiced both inside and outside the traditional academy. A key point is that, in contrast to the physical and life sciences, where most instrumentation and cyberinfrastructure is purpose-built, here we find that many key systems depend on, or are the result of, commercial activity, the transition of other media and formerly ephemeral human interactions to more persistent and computationally accessible digital forms, government needs, or citizen humanities activities. Finally, I'll speculate about where research using such tools is likely to take place, and how this relates to the evolution of the traditional academic disciplines.

Friday, Sept 3: No seminar meeting.

Friday, Sept 10: Ryan SHAW: Events and Periods as Concepts for Organizing Historical Knowledge.
    Summary of dissertation research. Events and periods are not objectively existing phenomena, but concepts we use to organize our knowledge of history. They make historical change comprehensible and help us orient ourselves with respect to the history of the culture in which we participate. Thus they are indispensable for describing both the content of history scholarship and the context of documents that serve as evidence for that scholarship. As historical discourse shifts its emphases and new aspects of the past come to be considered significant, periods and events are subject to constant change. Despite this change, we can model periods and events in systems of knowledge organization because it is possible to discern and formally describe relatively stable recurrent patterns in their narration.

Friday, Sept 17: Charles van den HEUVEL, Regents' Lecturer: Interface as Thing: Annotation and Visualization of Historical Evidence in e-Research.
    In 1991 Michael Buckland coined the phrase "information as thing" and discussed this concept in relation to evidence. Buckland further explored his notion that information-as-thing can be seen as evidence by analyzing types of information (data, objects, and documents) from a historical perspective. One of the key figures in his historical exploration of the term "document" was the Belgian pioneer of knowledge organization Paul Otlet (1868-1944). Inspired by Buckland's concept of information-as-thing, I will discuss the role of Otlet's multidimensional representations of knowledge in the development of "interface as thing." Some of the hundreds of visualizations of interfaces that Otlet made or commissioned, now kept in the Archives of the Mundaneum in Mons (Belgium), will be analyzed to demonstrate the importance of "materialization of knowledge" in e-research for recent discussions on the provenance and evidence of information in Web 2.0 and Semantic Web solutions. The hypothesis will be put forward that the visibility of Otlet's struggle with tensions of representation, incompatibility, and inoperability of interfaces can lead to new questions that would not come to mind in current representations of interfaces, which at first sight connect data seamlessly into a unified but ultimately problematic homogeneous whole. Differences between e-research in the natural sciences, humanities, and social sciences in the heterogeneity, ambiguity, and interpretative character of data should be acknowledged and made visible rather than reduced or ignored. I will focus particularly on the role of annotation and visualization in the creation of evidence as part of e-research by discussing some digital humanities projects I am involved in.
    Since Buckland stated that "evidence implies passiveness [and] like information-as-thing, does not do actively," several claims have been put forward by Berners-Lee and others about the (semi-)automated refinement of concepts, which can be seen as a more active form of evidence creation. I will refer to some historical antecedents of such claims of automated hypothesis generation in the works of Otlet, Ostwald, Ranganathan, Iverson and Miksa, and explore the implications for information science of making these manually and automatically generated "chunks of evidence" accessible for future e-research.
    Charles van den Heuvel works in the Virtual Knowledge Studio for Humanities and Social Sciences in the Royal Netherlands Academy of Arts and Sciences (KNAW), Amsterdam. He holds a short-term visiting appointment in the School of Information as Regents' Lecturer.

Friday, Sept 24: Lewis LANCASTER, Tim TANGHERLINI (UCLA) and Michael BUCKLAND: Network Pattern Recognition in Large Humanities Corpora.
    A new grant to the Electronic Cultural Atlas Initiative will support the application of techniques developed for the analysis of very large science datasets to newly available very large textual datasets in the humanities. In collaboration with Tina Eliassi-Rad (Rutgers) and Christos Faloutsos (Carnegie Mellon & Google), we will focus on recent developments in network analysis that address complex problems, including visual query systems, topic discovery, anomaly detection, and rapid mining of complex time-stamped data, as a means of extending these approaches to noisy humanities data. The test corpora are Buddhist canonic texts (Chinese and Sanskrit), Irish studies journals (English and Gaelic), and Danish folklore (English and Danish). We propose to begin by tuning the visual query system for large graphs (GRAPHITE).

Friday, Oct 1: Ray R. LARSON, Krishna JANAKIRAMAN, and Brian TINGLE (California Digital Library).
    The Social Networks and Archival Context Project.

    Archivists have a long history of describing the people who, acting individually, in families, or in formally organized groups, create and collect primary sources. Archivists research and describe the artists, political leaders, scientists, government agencies, soldiers, universities, businesses, families, and others who create and are represented in the items that are now part of our shared cultural legacy. Because archivists have traditionally described records and their creators together, this information is tied to specific resources and institutions. Currently there is no system in place that aggregates and interrelates those descriptions.
    Leveraging the new standard Encoded Archival Context -- Corporate Bodies, Persons, and Families (EAC-CPF), the Social Networks and Archival Context Project (SNAC) will use digital technology to "unlock" descriptions of people from descriptions of their records and link them together in exciting new ways. It will create an efficient open-source tool that allows archivists to separate the process of describing people from that of describing records, and it will create an integrated portal to creator descriptions--linked to resource descriptions in archives, libraries and museums, online biographical and historical databases, and other diverse resources--thereby providing more effective access and robust historical context to a broad array of humanities materials. The prototype access system will demonstrate that descriptions of persons, families, and organizations can be used as access points to archive, library, and museum resources.
    The Institute for Advanced Technology in the Humanities at the University of Virginia will lead the SNAC project in partnership with the California Digital Library and Berkeley's School of Information. EAC-CPF records will be derived from existing archival finding aids from the Library of Congress, the Online Archive of California, the Northwest Digital Archive, and Virginia Heritage; and also from name authority files supplied by the Library of Congress, Getty Vocabulary Program, and OCLC Research.

Friday, Oct 8: Catherine MARSHALL, Microsoft Research: Testing the Limits of Social Media Ownership.
    Social media, by its very nature, introduces questions about content ownership. Content ownership comes into play most crucially when we design services and applications to archive, reuse, remix, or remove social media. We have been investigating social media ownership issues using a series of Mechanical Turk surveys that probe respondents' current attitudes and practices; the surveys combine open-ended questions about use with realistic scenarios that test respondents' attitudes in specific situations. (This talk will describe joint work with Frank Shipman at Texas A&M.)

Friday, Oct 15: Tuukka RUOTSALO, Visiting Scholar: Knowledge Management for Digital Cultural Heritage.
    Dr Ruotsalo is a Fulbright Scholar in the School of Information for the current academic year. He has extensive experience in the use of digital techniques in cultural heritage institutions. He will summarize recent experience from large national projects in Finland (FinnONTO) on using ontologies and semantic knowledge management technologies to overcome interoperability and information access problems. He will also discuss the related project he is working on at the iSchool.

Friday, Oct 22: Jeanette ZERNEKE, Electronic Cultural Atlas Initiative: The Early California Cultural Atlas (ECCA).
    The Early California Cultural Atlas (ECCA) is a collaborative research project led by Professor Steven Hackel at UC Riverside in collaboration with Jeanette Zerneke of ECAI. ECCA is developing a digital atlas of historical data related to the colonization and settlement of early California. European settlement in North America and the establishment of missions to Indians initiated dramatic demographic, environmental, religious, and social change. In the first phase of the project we constructed a website of historical change in the region of Monterey, California. Embedded Google Earth visualizations show changes by year and allow the user to interact with the data layers and time bar. The project has chosen to address ambiguity intentionally, has developed an ambiguity characterization methodology, and has experimented with methods to visualize characteristic land use patterns. In the process, we encountered significant new historical questions.
    ECCA integrates multiple types of data such as:
  -   California Mission records from the Early California Population Project based at the Huntington Library in Pasadena;
  -   Historical maps from the Library of Congress and David Rumsey Collection; and
  -   Hand drawn maps, images, and texts from the Online Archive of California.

Friday, Oct 29: Clifford LYNCH: Some Initial Thoughts on Data Retention Lifecycles and Data Lifespans.
    Determining what data to keep and what to discard, and most critically when to discard it, is essential to any sustainable approach to research data curation. Yet we seem to have almost no practical measures to help establish specific lifespans for data. In this talk and discussion, I'll try to outline some initial thinking that may serve as a point of departure and also identify some hypotheses that might be useful in advancing further work on the topic.

Friday, Nov 5: Krishna JANAKIRAMAN and Michael BUCKLAND.
    Krishna JANAKIRAMAN: Report on Matching and Clustering Entities in Large Collections of Encoded Archival Context (Corporate Names, Persons and Families) Records.

    I will report on my progress towards implementing techniques that match and merge entities in collections of Encoded Archival Context (Corporate Names, Persons and Families) records. I will discuss cases where our initial simple techniques, which are based on exact matches using name authority files as a reference, failed to identify matches. I also plan to discuss my experiments on using probabilistic graphical models to cluster entities based on the information present in these records.
    Also Michael BUCKLAND: (Re-)Using Other People's Data.
    Join us for a discussion of the issues and impediments involved in the use of data created by other people, especially when the new use of old data is for a different purpose. How could the relative importance of different barriers and the probable cost-effectiveness of alternative remedies be assessed? If we set aside the difference between digital and non-digital media, what can be learned from our pre-digital experience?

Friday, Nov 12: John WILBANKS, Vice President, Science Commons, Creative Commons.
    The Work of the Science Commons.

Friday, Nov 19: Ryan SHAW & Patrick GOLDEN: Editorial Practices and the Web.
    An initial progress report on the "Editorial Practices and the Web" project. Scholarly editions of historically significant texts are important in the humanities. However, expert editorial work is difficult and funding is scarce. Current Web technology can be used to improve the return on investment by making editors' work available more quickly, more fully, and more widely. Additional objectives are to avoid duplicative effort among different projects and to explore a closer relationship between scholarly editing and library special collections.

Friday, Nov 26: Thanksgiving: No Seminar meeting.

Friday, Dec 3: Last meeting of the semester.
    Krishna JANAKIRAMAN: Matching and Merging, and Megan FINN: Californians and Their Earthquakes.
    Krishna JANAKIRAMAN: Matching and Merging Entities in Collections of Archive Description Records.

    I will present my final report on the progress made towards matching and merging entities in collections of archive description records. I will discuss techniques that use exact and approximate string matching algorithms, and how information from name authority files can be used to improve matching results. Experiments with clustering algorithms and nearest-neighbor algorithms will be reported. I also plan to discuss efforts towards linking data from DBpedia into the existing data and the possibilities such linkages may provide.
    Megan FINN: Californians and Their Earthquakes.
    I will present a chapter of my dissertation research about Californian information practices after earthquakes. In this talk I will discuss the 1868 Hayward Fault earthquake, the last time an earthquake originating on the Hayward Fault shook the Bay Area. The presentation will focus on the circulation of documents about the earthquake, with an eye towards the telegraph and the circulation of reproduced images. Upon the completion of the telegraph, the Sacramento Daily Union presented a view of the telegraph that was not usual for the day: "the lightning has annihilated a continent as an obstacle to intellectual communication." I argue that the relatively new cross-continental telegraph does not alone constitute an infrastructural epistemology, but that what Californians learned about the earthquake can be understood in light of the existing goals of several groups of people. Specifically, I examine the documentary activities of the powerful San Francisco Chamber of Commerce, the accountability of San Francisco's government, the newspapers' analysis of the quality of reports available, and the authority of the California Academy of Sciences.

The Seminar will resume in the Spring Semester on Friday, January 21, 2011.
