Previously School of Library & Information Studies
Friday Afternoon Seminar: Summaries.
296a-1 Seminar: Information Access, Spring 2012.
Fridays 3-5. 107 South Hall.
Schedule. Weekly mailing list.
Summaries will be added as they become available.
Friday, Jan 20: Clifford LYNCH: Surveying the Changing Cultural Record.
Introduction to the seminar. Introductions of participants.
We have become accustomed to talking rather casually about ideas like
"archiving the cultural record", yet we have a very poor understanding of what constitutes
this record and how it is changing with the continual evolution of information technology.
As a result it's difficult to tell how effective our range of cultural memory institutions
are, how to set priorities, and where there are needs to develop new public policies.
In this discussion, I'll do an initial, very high-level survey of what might be
considered to constitute the cultural record and briefly discuss some of the stewardship issues.
Friday, Jan 27: Pinar ÖZTÜRK: Classical and Textual Case-based Reasoning.
Case-based reasoning (CBR) is an artificial intelligence method for problem
solving. The underlying idea is that similar problems have similar solutions. When a new
problem is encountered, similar past solutions are found and used to solve the new problem.
A difficulty with "classical" CBR is the assumption of structured representation of cases
in the form of attribute-value pairs. In real life most problem solving experiences are
documented as free text, not as structured representations. Manual structuring/engineering
of cases is a daunting task especially when the experience collection is large. In the past
two years I have started to explore how to do CBR with textual cases and less knowledge engineering
effort. I will discuss the research questions, challenges, and possible directions within CBR.
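To make the "classical" setting concrete, here is a minimal sketch of case retrieval over attribute-value pairs. It is only an illustration under stated assumptions, not the speaker's method: the `Case` class, the overlap-based `similarity` measure, and the toy troubleshooting cases are all hypothetical, and the adaptation step that CBR systems usually apply after retrieval is omitted.

```python
from dataclasses import dataclass


@dataclass
class Case:
    problem: dict   # attribute-value pairs describing a past problem
    solution: str   # the solution that was recorded for it


def similarity(a: dict, b: dict) -> float:
    """Fraction of attributes (over the union of keys) on which a and b agree."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return matches / len(keys)


def retrieve(case_base, query, k=1):
    """Return the k stored cases most similar to the query problem."""
    ranked = sorted(case_base, key=lambda c: similarity(c.problem, query), reverse=True)
    return ranked[:k]


def solve(case_base, query):
    """Reuse the solution of the nearest past case (no adaptation step)."""
    best = retrieve(case_base, query, k=1)
    return best[0].solution if best else None


# Hypothetical toy case base, invented purely for illustration.
case_base = [
    Case({"symptom": "no_power", "device": "laptop", "battery": "old"}, "replace battery"),
    Case({"symptom": "no_power", "device": "desktop", "battery": "none"}, "check power supply"),
    Case({"symptom": "overheating", "device": "laptop", "battery": "old"}, "clean fan"),
]
query = {"symptom": "no_power", "device": "laptop", "battery": "new"}
print(solve(case_base, query))  # prints "replace battery"
```

The sketch also shows why free-text cases are hard for this approach: `similarity` only works once every experience has been hand-structured into comparable attributes.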
Pinar ÖZTÜRK is an associate professor in the department of
computer and information science at the Norwegian University of Science and Technology.
Her research focus is in the areas of case-based reasoning, learning by imitation, and
multiagent systems. She has been involved in projects using CBR in medical diagnosis and
oil drilling in the past. She is currently responsible for applying CBR in a smart power
grid project funded by Norwegian utility companies. In another project, again in the
power domain, she is employing multiagent systems for e-cars.
Friday, Feb 3: Michael BUCKLAND and Patrick GOLDEN: Scholarly Notes and Digital Humanities.
Our presentation will have two components: a progress report on the
"Editorial Practices and the Web" project (and some closely related work) and a discussion of the
implications of this work for the nature and infrastructure of digital humanities.
The preparation of scholarly editions of historically important texts is
important in the humanities, but expensive. A minor change in editorial procedures to allow
editors' working notes to be available in a collaborative environment increases the return on
investment by reducing duplicative work, increasing editorial productivity, making additional
resources available to the public, and preserving resources ordinarily lost when funding ends.
Scholarly notes of various kinds are created throughout the Humanities. So a focus on scholarly notes offers
a basis for convergence and collaboration among quite diverse constituencies across the
The project website is at
Friday, Feb 10: Ray LARSON and Brian TINGLE: Update on the Social Networks and
Archival Context (SNAC) Project.
Archivists have a long history of describing the people who -- acting
individually, in families, or in formally organized groups -- create and collect primary
sources. They research and describe the artists, political leaders, scientists, government
agencies, soldiers, universities, businesses, families, and others who create and are
represented in the items that are now part of our shared cultural legacy. However, because
archivists have traditionally described records and their creators together, this information
is tied to specific resources and institutions.
The SNAC Project is using digital technology to "unlock" descriptions of people
from descriptions of their records and link them together in interesting new ways. We are
aggregating and interrelating those descriptions using EAC-CPF (the Encoded Archival Context
- Corporate Bodies, Persons and Families). Work on the SNAC project is being conducted by
a consortium consisting of The Institute for Advanced Technology in the Humanities (University
of Virginia), the UC Berkeley I-School, and the California Digital Library.
To represent the network of relationships between corporate bodies, persons
and families, the merged EAC-CPF XML records have been processed into a graph database,
which is used to power an interactive network visualization and generate linked data for
publication as a SPARQL endpoint. The talk will review the TinkerPop graph database stack
and Apache Jena linked data technologies used for this processing. The graph and RDF data
sets and APIs published by the project will also be described, including an overview of key
sections of source code for the graph processing.
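The idea of turning relations between described entities into a graph can be sketched as follows. This is a plain in-memory stand-in under stated assumptions, not the project's code and not the TinkerPop or Jena APIs: the triples, the relation labels, and the helper functions `build_graph`, `neighbors`, and `within_two_steps` are hypothetical, invented for illustration.

```python
from collections import defaultdict

# Hypothetical (source, relation, target) triples standing in for the
# relations that might be extracted from merged description records.
relations = [
    ("Person A", "associatedWith", "Organization X"),
    ("Person B", "associatedWith", "Organization X"),
    ("Person A", "correspondedWith", "Person C"),
    ("Family F", "associatedWith", "Person C"),
]


def build_graph(triples):
    """Build an undirected adjacency map: name -> set of (relation, other name)."""
    graph = defaultdict(set)
    for source, relation, target in triples:
        graph[source].add((relation, target))
        graph[target].add((relation, source))
    return graph


def neighbors(graph, name):
    """Names directly linked to `name`, sorted for stable output."""
    return sorted({other for _, other in graph.get(name, ())})


def within_two_steps(graph, name):
    """Names reachable from `name` in one or two hops, excluding `name` itself."""
    first = set(neighbors(graph, name))
    second = {n for f in first for n in neighbors(graph, f)}
    return sorted((first | second) - {name})


graph = build_graph(relations)
print(neighbors(graph, "Organization X"))   # ['Person A', 'Person B']
print(within_two_steps(graph, "Person A"))  # ['Family F', 'Organization X', 'Person B', 'Person C']
```

A two-hop query like `within_two_steps` is the kind of traversal a network visualization relies on: it surfaces entities that are connected only through a shared correspondent or organization.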
In this presentation we will describe and present an update on the SNAC project
and demonstrate the public access interface for the SNAC database, including social network
visualizations of SNAC persons, corporate bodies and families. The SNAC project is currently
funded by the National Endowment for the Humanities and by a grant from the Mellon Foundation. We will also discuss future plans for the project.
Project Site: http://socialarchive.iath.virginia.edu/.
Brian Tingle is Technical Lead for Digital Special Collections at the California Digital Library.
Friday, Feb 17: AnnaLee SAXENIAN and Irene ELETA.
AnnaLee SAXENIAN: iSchools and the iSchool Conference.
Dean Saxenian will lead a discussion on the recent iSchool conference.
Irene ELETA, Univ of Maryland: Multilingual Social Tagging of Art Images:
Cultural Bridges and Diversity.
Brief overview of the "T3 project: Text, Tags, and Trust" at the University
of Maryland (umiacs.umd.edu/research/t3/), which combines
text mining and social tags for improving access to digital image collections in museums and
libraries. The principal investigators of this project are Drs. Judith Klavans (Computational
Linguistics and Information Processing Lab) and Jennifer Golbeck (iSchool Human-Computer Interaction Lab).
Within this broader context, the talk will focus on a study of multilingual social tagging,
carried out by Irene Eleta with the guidance of Dr. Jennifer Golbeck. This study compares
social tagging patterns in two languages (Spanish and English) in image collections of art.
Also, it proposes ways to leverage multilingual tags for enriching the images' metadata,
adding diversity, and improving access in different languages. Recently, this work was
accepted at the International ACM Conference CSCW (Computer-Supported Cooperative Work),
to be held in February, 2012.
Irene Eleta is a doctoral candidate at the University of Maryland
iSchool, with Fulbright sponsorship. Access to multilingual information is the overarching
motivation for her research and professional career; with experience as a professional
translator and in the evaluation of machine translation systems, she became interested in
multilingual and cross-language search during her master's studies at the University of Sheffield
(UK). Her recent work includes multilingual social tagging and multilingual communication
on Twitter. Irene comes from Spain, has lived in four countries, and speaks Spanish,
English and French.
Friday, Feb 24: No Seminar Meeting: Invitation to Personal Digital Archiving 2012.
Seminar attendees are invited to attend the final session of the Personal
Digital Archiving 2012 conference instead.
Friday, March 2: Adam JATOWT and Michael BUCKLAND.
Adam JATOWT, Kyoto Univ.: Studies in Collective Memory: Towards Computational
History through Large Scale Text Mining.
Given the huge amount of data about the past, computer science will play an
increasingly important role in historical studies, with computational history becoming an emerging
interdisciplinary field of research. In this presentation, I will show the results of our
recent study on how the past is remembered through large scale text mining. I will
demonstrate that analysis of references to the past in news articles allows us to gain a
lot of insight into the collective memories and societal views of different
countries. At the end of the talk, I will also briefly describe my recent work on the
readability of web content and historical documents.
Adam Jatowt is an Associate Professor at the Department of
Social Informatics at Kyoto University. He has been working on computational history,
future-related information extraction and summarization, content readability, and information
access to web archives. He has been PC co-chair of iPRES2011 and served as PC member
of SIGIR, JCDL, HT and COLING conferences. Prof. Jatowt is in Berkeley for a month.
Michael BUCKLAND: Integrative Data Management.
I will lead a discussion of the problems, best practices, and graduate education
associated with the re-use of digital resources created by someone else at some time in the past
for some other purpose. What kinds of problems are there? How could we ascertain their relative
importance? What kind of educational initiative could lead to better
practices among PhD students in all disciplines? What would make academic research data sets more
accessible for the off-campus public?
Friday, March 9: Clifford LYNCH: Personal Digital Archiving: Discussion Based on the Personal Digital
Archiving 2012 Conference.
In recent years, there has been growing interest in the implications of the
records of personal life moving to digital formats. These developments will change the
practice of history and biography, and other scholarly disciplines; the ways in which
personal, family, and other group and community memories are created, documented and
transmitted; the assumptions about what is private and what is public. Two weeks ago the
Internet Archive in San Francisco hosted the third annual conference on personal digital
archiving; several regular seminar participants were able to attend. Today's seminar will
be a review of this meeting and of developments in personal archiving more broadly.
Participants in the Personal Digital Archiving Conference are particularly invited to
join us and share their views and reflections.
Friday, March 16: Catherine MARSHALL, Microsoft Research: Whose Content is it Anyway?
A User Perspective on the Ownership and Control of Social Media.
User-contributed content forms the cornerstone of many popular Web services
and resources including Flickr, Facebook, YouTube, iTunes, Twitter, Yelp, and even some
MMORPGs. Although specific rights about the ownership and control of this content are spelled
out in licensing agreements and by copyright law, most contributors and re-users ignore
formal contracts, laws, and policies. In this talk, I propose to report on the results of
six surveys that use a series of realistic scenarios and specific questions about recent
practice to probe respondents' thoughts and behaviors about the value of user-contributed
content and how user-contributed content may be reused, archived, re-purposed, and removed.
The surveys solicited 988 valid responses (out of 1060 total) from a broad range of
Internet-savvy (but mostly non-technical) people, and covered significant types and genres
of Web content including photos, tweets, reviews, videos, podcasts, and educational
recordings. This talk describes work done in collaboration with Frank Shipman at
Texas A&M University.
Cathy Marshall is a Principal Researcher in Microsoft Research's Silicon
Valley Lab. She is currently working on Community Information Management applications
and issues associated with personal digital archiving and social media ownership.
Friday, March 23: Marcia BATES, UCLA: Can You Spell Idiographic? --
Designing Information Systems for Humanities Scholarship.
The talk will provide a precis of what we know about 1) the nature of humanities
scholarship and how it differs in its essence from scientific research, 2) what is distinctive
about how humanities scholars do research and seek information, and 3) implications of
(1) and (2) for the design of information retrieval systems and interfaces.
Marcia Bates is one of this School's PhD graduates (1972); Professor
Emerita, Department of Information Studies, UCLA;
Fellow, American Association for the Advancement of Science; and
Editor, Encyclopedia of Library and Information Sciences, 3rd Ed. For more see
*** Friday, Mar 30: Spring break: No Seminar meeting. ***
Friday, Apr 6: Juliane STILLER: Interaction and Collaboration in Cultural Heritage
Cultural heritage information systems aggregate, search and display
cultural heritage objects or their surrogates in an online
environment. These objects are coming from memory institutions such as
libraries, museums and archives and cover text, image, speech and
video. The goal of these systems is to make this content universally
available for a broad audience through search, browse and discovery.
How to find and implement user interaction and collaboration patterns
for experiencing cultural heritage online is the core of this talk.
A set of cultural heritage information systems was analyzed with
regard to the prevailing interaction and user collaboration patterns.
Additionally, it was found that problems of existing cultural heritage
information systems are rooted in the different purposes cultural
objects serve in their original context and the failure of the
information system to translate these into the online world. This
presentation will focus on the challenges information systems need to
overcome to offer useful services for enabling purposeful interaction
with cultural heritage online.
Juliane Stiller is a researcher at the Berlin School of Library and
Information Science at Humboldt University, Berlin, and is currently
a visiting student researcher here at the School of Information. She is working on
multilingual information retrieval and evaluation of digital libraries
within the EU-funded projects EuropeanaConnect, GALATEAS, and Promise.
The research of her doctoral thesis focuses on user interaction and
collaboration in cultural heritage information systems. Before taking
on this research position, she was employed at Google Ltd. as a Search
Quality Analyst for web search. See
Friday, Apr 13: Clifford LYNCH: Memory Organizations and Evidence to Support
Scholarship in the 21st Century.
Memory organizations have two functions with regard to scholarship:
they organize and preserve the scholarly record itself, and they try to select,
prioritize, and preserve the much larger body of evidence that can be used to
support future scholarly work. There has been a great deal of discussion about
the changing scholarly record, and the changes in scholarly practice driven by
information technology and data intensive scholarship. In the last few years,
there has been a great deal of focus on stewardship of certain types of observational
and experimental data, most commonly in the sciences, particularly as new technologies
(gene sequencing, synoptic sky surveys, the Large Hadron Collider, earth observatories,
etc.) allow the construction of new scientific instruments that greatly expand the base
of evidence. Less well considered are new evidentiary resources that can drive the
human sciences; these are often encumbered by privacy and human subjects issues,
secrecy, and proprietary considerations. We see new instruments have been constructed and
deployed mainly outside of the academy, and the evidence being collected here presents
enormous challenges -- indeed, rising to the level of public policy issues -- to
memory organizations and to future scholarly work.
Friday, Apr 20: MacKenzie SMITH: Data Governance: Another Side of Data Curation.
Data governance is the system of rights and accountabilities for who can
take what actions with what data and when, under what circumstances, using what methods.
It includes laws and policies associated with data, as well as strategies for data quality
control and management in the context of organizations, be they physical or virtual
(such as large, international research collaborations). Data governance ensures that data
can be trusted and that people are made accountable for actions affecting it. There are also
related technological issues, such as how to implement mandatory attribution on the Web or
ensure persistence of cited data. The seminar will provide an overview of the issues and
current activities in this emerging aspect of data curation.
Until 2012 MacKenzie Smith was Research Director at the MIT Libraries,
where she oversaw digital library research and development. Her research focused on the
Semantic Web for scholarly communication, and digital data curation in support of e-research.
From 2002 until 2011 she was the Library's Associate Director for Technology, overseeing the
library's use of technology and its technology strategy.
MacKenzie is now based in the Bay Area and is consulting on several cutting-edge digital
library and related initiatives, including a Science Fellowship at Creative Commons,
Special Consultant to the Association of Research Libraries' E-Science Institute, and the
Digital Public Library of America. Her interest in data governance stems from working with
Creative Commons on their Science program.
Friday, Apr 27: Victoria STODDEN, Visiting Scholar: Building the Reproducible Computational
Science Movement: Catalyzing Action Through Policy, Software Tools, and Ideas.
The movement toward reproducible computational science -- where the
code used to generate the results is made conveniently available along with the published
paper -- has accelerated dramatically in the last few years. Fields as diverse as
statistics, bioinformatics, geosciences, and applied math are making efforts to publish
reproducible findings, and journal publication and funding agency requirements are changing.
I will discuss the changing landscape of openness in computational science, motivate the
reproducible research movement, and discuss my recent work in enabling code and data release
through legal and policy standards, as well as new software tools for sharing and deposit. In
particular I will discuss very recent work on changes in scholarly journal publication
Victoria Stodden is Assistant Professor of Statistics, Columbia University,
and Visiting Scholar in South Hall during the Spring semester.
The Seminar will resume on August 24.
Spring 2012 schedule.