Friday, Jan 20: Clifford LYNCH: Surveying the Changing Cultural Record.
    Introduction to the seminar. Introductions of participants. Short reports.
    We have become accustomed to talking rather casually about ideas like "archiving the cultural record", yet we have a very poor understanding of what constitutes this record and how it is changing with the continual evolution of information technology. As a result it's difficult to tell how effective our range of cultural memory institutions is, how to set priorities, and where there are needs to develop new public policies. In this discussion, I'll do some initial, very high-level surveying of what might be considered to constitute the cultural record and briefly discuss some of the stewardship issues.

Friday, Jan 27: Pinar ÖZTÜRK: Classical and Textual Case-based Reasoning.
    Case-based reasoning (CBR) is an artificial intelligence method for problem solving. The underlying idea is that similar problems have similar solutions. When a new problem is encountered, similar past solutions are found and used to solve the new problem. A difficulty with "classical" CBR is the assumption of structured representation of cases in the form of attribute-value pairs. In real life most problem solving experiences are documented as free text, not as structured representations. Manual structuring/engineering of cases is a daunting task, especially when the experience collection is large. In the past two years I have started to explore how to do CBR on textual cases with less knowledge engineering effort. I will discuss the research questions, challenges, and possible directions within CBR.
    Pinar ÖZTÜRK is an associate professor at the Department of Computer and Information Science at the Norwegian University of Science and Technology. Her research focus is in the areas of case-based reasoning, learning by imitation, and multiagent systems. She has been involved in projects using CBR in medical diagnosis and oil drilling in the past. She is currently responsible for applying CBR in a smart power grid project funded by Norwegian utility companies. In another project, again in the power domain, she is employing multiagent systems for e-cars.

Friday, Feb 3: Michael BUCKLAND and Patrick GOLDEN: Scholarly Notes and Digital Humanities.
    Our presentation will have two components: a progress report on the "Editorial Practices and the Web" project (and some closely related work) and a discussion of the implications of this work for the nature and infrastructure of digital humanities.
    The preparation of scholarly editions of historically important texts is important in the humanities, but expensive. A minor change in editorial procedures to allow editors' working notes to be available in a collaborative environment increases the return on investment by reducing duplicative work, increasing editorial productivity, making additional resources available to the public, and preserving resources ordinarily lost when funding ends. Scholarly notes of various kinds are created throughout the humanities, so a focus on scholarly notes provides a basis for convergence and collaboration among quite diverse constituencies across the humanities.
    The project website is at

Friday, Feb 10: Ray LARSON and Brian TINGLE: Update on the Social Networks and Archival Context (SNAC) Project.
    Archivists have a long history of describing the people who -- acting individually, in families, or in formally organized groups -- create and collect primary sources. They research and describe the artists, political leaders, scientists, government agencies, soldiers, universities, businesses, families, and others who create and are represented in the items that are now part of our shared cultural legacy. However, because archivists have traditionally described records and their creators together, this information is tied to specific resources and institutions.
    The SNAC Project is using digital technology to "unlock" descriptions of people from descriptions of their records and link them together in interesting new ways. We are aggregating and interrelating those descriptions using EAC-CPF (the Encoded Archival Context - Corporate Bodies, Persons and Families). Work on the SNAC project is being conducted by a consortium consisting of The Institute for Advanced Technology in the Humanities (University of Virginia), the UC Berkeley I-School, and the California Digital Library.
    To represent the network of relationships between corporate bodies, persons and families, the merged EAC-CPF XML records have been processed into a graph database, which is used to power an interactive network visualization and generate linked data for publication as a SPARQL endpoint. The talk will review the TinkerPop graph database stack and Apache Jena linked data technologies used for this processing. The graph and RDF data sets and APIs published by the project will also be described, including an overview of key sections of source code for the graph processing.
    In this presentation we will describe and present an update on the SNAC project and demonstrate the public access interface for the SNAC database, including social network visualizations of SNAC persons, corporate bodies and families. The SNAC project is currently funded by the National Endowment for the Humanities and by a grant from the Mellon Foundation. We will also discuss future plans for the project.
    Links: Project Site:
    Brian Tingle is Technical Lead for Digital Special Collections at the California Digital Library.

Friday, Feb 17: AnnaLee SAXENIAN and Irene ELETA.
    AnnaLee SAXENIAN: iSchools and the iSchool Conference.
    Dean Saxenian will lead a discussion on the recent iSchool conference.
    Irene ELETA, Univ of Maryland: Multilingual Social Tagging of Art Images: Cultural Bridges and Diversity.
    Brief overview of the "T3 project: Text, Tags, and Trust" at the University of Maryland, which combines text mining and social tags for improving access to digital image collections in museums and libraries. The principal investigators of this project are Drs. Judith Klavans (Computational Linguistics and Information Processing Lab) and Jennifer Golbeck (iSchool Human-Computer Interaction Lab). Within this broader context, the talk will focus on a study of multilingual social tagging, carried out by Irene Eleta with the guidance of Dr. Jennifer Golbeck. This study compares social tagging patterns in two languages (Spanish and English) in image collections of art. It also proposes ways to leverage multilingual tags for enriching the image metadata, adding diversity, and improving access in different languages. Recently, this work was accepted at the International ACM Conference CSCW (Computer-Supported Cooperative Work), to be held in February 2012.
    Irene Eleta is a doctoral candidate at the University of Maryland iSchool, with Fulbright sponsorship. Access to multilingual information is the overarching motivation for her research and professional career; with experience as a professional translator and in the evaluation of machine translation systems, she became interested in multilingual and cross-language search during her master's studies at the University of Sheffield (UK). Her recent work includes multilingual social tagging and multilingual communication on Twitter. Irene comes from Spain, has lived in four countries, and speaks Spanish, English and French.

Friday, Feb 24: No Seminar Meeting: Invitation to Personal Digital Archiving 2012.
    Seminar attendees are invited to attend the final session of the Personal Digital Archiving 2012 conference instead.

Friday, March 2: Adam JATOWT and Michael BUCKLAND.
    Adam JATOWT, Kyoto Univ.: Studies in Collective Memory: Towards Computational History through Large Scale Text Mining.
    Given the huge amount of data about the past, computer science will play an increasingly important role in historical studies, with computational history becoming an emerging interdisciplinary field of research. In this presentation, I will show the results of our recent study on how the past is remembered through large scale text mining. I will demonstrate that analysis of references to the past in news articles allows us to gain a lot of insight into the collective memories and societal views of different countries. At the end of the talk, I will also briefly describe my recent work on the readability of web content and historical documents.
    Adam Jatowt is an Associate Professor in the Department of Social Informatics at Kyoto University. He has been working on computational history, future-related information extraction and summarization, content readability, and information access to web archives. He has been PC co-chair of iPRES 2011 and served as PC member of SIGIR, JCDL, HT and COLING conferences. Prof. Jatowt is in Berkeley for a month. See
    Michael BUCKLAND: Integrative Data Management.
    I will lead a discussion of the problems, best practices, and graduate education associated with the re-use of digital resources created by someone else at some time in the past for some other purpose. What kinds of problems are there? How could we ascertain their relative importance? What kind of educational initiative could lead to better practices among PhD students in all disciplines? What would make academic research data sets more accessible for the off-campus public?

Friday, March 9: Clifford LYNCH: Personal Digital Archiving: Discussion Based on the Personal Digital Archiving 2012 Conference.
    In recent years, there has been growing interest in the implications of the records of personal life moving to digital formats. These developments will change the practice of history and biography, and other scholarly disciplines; the ways in which personal, family, and other group and community memories are created, documented and transmitted; and the assumptions about what is private and what is public. Two weeks ago the Internet Archive in San Francisco hosted the third annual conference on personal digital archiving; several regular seminar participants were able to attend. Today's seminar will be a review of this meeting and of developments in personal archiving more broadly. Participants in the Personal Digital Archiving Conference are particularly invited to join us and share their views and reflections.

Friday, March 16: Catherine MARSHALL, Microsoft Research: Whose Content is it Anyway? A User Perspective on the Ownership and Control of Social Media.
    User-contributed content forms the cornerstone of many popular Web services and resources including Flickr, Facebook, YouTube, iTunes, Twitter, Yelp, and even some MMORPGs. Although specific rights about the ownership and control of this content are spelled out in licensing agreements and by copyright law, most contributors and re-users ignore formal contracts, laws, and policies. In this talk, I propose to report on the results of six surveys that use a series of realistic scenarios and specific questions about recent practice to probe respondents' thoughts and behaviors about the value of user-contributed content and how user-contributed content may be reused, archived, re-purposed, and removed. The surveys solicited 988 valid responses (out of 1060 total) from a broad range of Internet-savvy (but mostly non-technical) people, and covered significant types and genres of Web content including photos, tweets, reviews, videos, podcasts, and educational recordings. This talk describes work done in collaboration with Frank Shipman at Texas A&M University.
    Cathy Marshall is a Principal Researcher in Microsoft Research's Silicon Valley Lab. She is currently working on Community Information Management applications and issues associated with personal digital archiving and social media ownership.

Friday, March 23: Marcia BATES, UCLA: Can You Spell Idiographic? -- Designing Information Systems for Humanities Scholarship.
    The talk will provide a precis of what we know about 1) the nature of humanities scholarship and how it differs in its essence from scientific research, 2) what is distinctive about how humanities scholars do research and seek information, and 3) implications of (1) and (2) for the design of information retrieval systems and interfaces.
    Marcia Bates is one of this School's PhD graduates (1972); Professor Emerita, Department of Information Studies, UCLA; Fellow, American Association for the Advancement of Science; and Editor, Encyclopedia of Library and Information Sciences, 3rd Ed. For more see

*** Friday, Mar 30: Spring break: No Seminar meeting. ***

Friday, Apr 6: Juliane STILLER: Interaction and Collaboration in Cultural Heritage Information Systems.
    Cultural heritage information systems aggregate, search and display cultural heritage objects or their surrogates in an online environment. These objects come from memory institutions such as libraries, museums and archives and cover text, image, speech and video. The goal of these systems is to make this content universally available for a broad audience through search, browse and discovery. How to find and implement user interaction and collaboration patterns for experiencing cultural heritage online is the core of this talk. A set of cultural heritage information systems was analyzed with regard to the prevailing interaction and user collaboration patterns. Additionally, it was found that problems of existing cultural heritage information systems are rooted in the different purposes cultural objects serve in their original context and the failure of the information system to translate these into the online world. This presentation will focus on the challenges information systems need to overcome to offer useful services for enabling purposeful interaction with cultural heritage online.
    Juliane Stiller is a researcher at the Berlin School of Library and Information Science at Humboldt University, Berlin, and is currently a visiting student researcher here at the School of Information. She is working on multilingual information retrieval and evaluation of digital libraries within the EU-funded projects EuropeanaConnect, GALATEAS, and Promise. The research of her doctoral thesis focuses on user interaction and collaboration in cultural heritage information systems. Before taking on this research position, she was employed at Google Ltd. as a Search Quality Analyst for web search. See

Friday, Apr 13: Clifford LYNCH: Memory Organizations and Evidence to Support Scholarship in the 21st Century.
    Memory organizations have two functions with regard to scholarship: they organize and preserve the scholarly record itself, and they try to select, prioritize, and preserve the much larger body of evidence that can be used to support future scholarly work. There has been a great deal of discussion about the changing scholarly record, and the changes in scholarly practice driven by information technology and data intensive scholarship. In the last few years, there has been a great deal of focus on stewardship of certain types of observational and experimental data, most commonly in the sciences, particularly as new technologies (gene sequencing, synoptic sky surveys, the Large Hadron Collider, earth observatories, etc.) allow the construction of new scientific instruments that greatly expand the base of evidence. Less well considered are new evidentiary resources that can drive the human sciences; these are often encumbered by privacy and human subjects issues, secrecy, and proprietary considerations. We see that new instruments have been constructed and deployed mainly outside of the academy, and the evidence being collected here presents enormous challenges -- indeed, rising to the level of public policy issues -- to memory organizations and to future scholarly work.

Friday, Apr 20: MacKenzie SMITH: Data Governance: Another Side of Data Curation.
    Data governance is the system of rights and accountabilities for who can take what actions with what data and when, under what circumstances, using what methods. It includes laws and policies associated with data, as well as strategies for data quality control and management in the context of organizations, be they physical or virtual (such as large, international research collaborations). Data governance ensures that data can be trusted and that people are made accountable for actions affecting it. There are also related technological issues, such as how to implement mandatory attribution on the Web or ensure persistence of cited data. The seminar will provide an overview of the issues and current activities in this emerging aspect of data curation.
    Until 2012 MacKenzie Smith was Research Director at the MIT Libraries, where she oversaw digital library research and development. Her research focused on the Semantic Web for scholarly communication, and digital data curation in support of e-research. From 2002 until 2011 she was the Library's Associate Director for Technology, overseeing the library's use of technology and its technology strategy. MacKenzie is now based in the Bay Area and is consulting on several cutting-edge digital library and related initiatives, including a Science Fellowship at Creative Commons, Special Consultant to the Association of Research Libraries' E-Science Institute, and the Digital Public Library of America. Her interest in data governance stems from working with Creative Commons on their Science program. See

Friday, Apr 27: Victoria STODDEN, Visiting Scholar: Building the Reproducible Computational Science Movement: Catalyzing Action Through Policy, Software Tools, and Ideas.
    The movement toward reproducible computational science -- where the code used to generate the results is made conveniently available along with the published paper -- has accelerated dramatically in the last few years. Fields as diverse as statistics, bioinformatics, geosciences, and applied math are making efforts to publish reproducible findings, and journal publication and funding agency requirements are changing. I will discuss the changing landscape of openness in computational science, motivate the reproducible research movement, and discuss my recent work in enabling code and data release through legal and policy standards, as well as new software tools for sharing and deposit. In particular I will discuss very recent work on changes in scholarly journal publication requirements.
    Victoria Stodden is Assistant Professor of Statistics, Columbia University, and Visiting Scholar in South Hall during the Spring semester. See

