School of
Information
Previously School of Library & Information Studies
Friday Afternoon Seminar: Summaries.
296a-1 Seminar: Information Access, Fall 2012.
Fridays 3-5. 107 South Hall.
Schedule. Weekly mailing list.
Summaries will be added as they become available.
Friday, Aug 24: Vivien PETRAS, Merrilee PROFFITT and Max KLEIN.
Vivien PETRAS, Humboldt University, Berlin: Recent Studies.
A brief overview of some of her recent and current work.
Max KLEIN and Merrilee PROFFITT, OCLC Research:
Wikipedia and Libraries: What's the Connection?
We'll talk about synergies between libraries and Wikipedia, and about what OCLC
Research has been doing to connect the two so that those who start
their research in Wikipedia can find resources in libraries. We'll talk about our
VIAFBot project and a range of potential projects that we've identified that will be
interesting to both librarians and Wikipedians.
Max Klein is OCLC Research's Wikipedian in Residence. He is
working with OCLC Research as community coordinator to explore and pursue mutually
beneficial projects between OCLC, library stakeholders, and the Wikipedia community
through a range of activities, including working with OCLC staff and libraries to help
foster a broader understanding of Wikipedia's practices.
Merrilee Proffitt is a Senior Program Officer in OCLC Research.
She provides project management skills and expert support to institutions represented
within the OCLC Research Library Partnership, with a special focus on increasing visibility
of archives and special collections.
Friday, Aug 31: No Seminar meeting.
Friday, Sep 7: Ray R. LARSON: The Social Networks and Archival Context Project:
Status Report.
The SNAC project has been funded for another two years by the Mellon
Foundation. This talk will review the SNAC project, and examine the new data being
included in this round. This includes over 2 million MARC archival records from OCLC,
as well as EAC data from the British Library and the Bibliothèque nationale de France, and
EAD records from the UK Archives Hub. More at
socialarchive.iath.virginia.edu.
Also: Michael BUCKLAND: Editors' Notes: Update.
The "Editorial Practices and the Web" project is a collaboration in making
available working notes created in the preparation of documentary editions of historically
important papers and notes made by curators of library special collections. A shared website
for making notes available is now openly accessible without a password. More at
ecai.org/mellon2010 and
editorsnotes.org.
Friday, Sep 14: Eric KANSA: Sometimes Data is Best Served Cooked, Rather than
Raw: Scholarly Publishing and the Web of Data.
There is a great deal of interest in the sciences and humanities
around how to manage "data." By "data," we usually refer to content that has some formal and
logical structure needed to meet the requirements of software processing. Data quality issues
play an important role in shaping professional incentives to participate in data dissemination
and in issues of trust and reliability around the use of shared data. As is the case with other
areas of scholarly production, researchers need appropriate workflows to edit, review, and
improve the quality of shared data. This presentation will explore how the transactional nature
of data helps shape this workflow. Because use of data is heavily mediated by software,
datasets can be seen as an integral part of software. This thinking motivated us to
experiment with using software debugging and issue tracking tools to help organize
collaborative work on editing data. Debugging and issue tracking tools, widely used to improve
software quality, can play a similar role in the "debugging" of data.
Finally, such editorial workflows need to also take into account
issues of context. To be more useful, datasets need to be understood and related to other
information available on the Web. This is particularly true for archaeology, an inherently
multidisciplinary domain with inputs from the humanities, history, and natural sciences.
Beyond the research community, much information relevant to archaeology is routinely collected
through government administrative processes relating to environmental impact regulations and
historical preservation laws. "Linked Open Data" methods can help to better contextualize research
data with other datasets, other forms of scholarly communications, and records maintained
by government institutions.
Eric Kansa is Director of the Open Context project,
a service for the publication of research data in archaeology. Open Context
works closely with the California Digital Library for long-term archiving
and curation of digital data. See opencontext.org.
He is also affiliated with the I School.
Friday, Sep 21: Sara NOFRI, Univ. of Hamburg, Germany; Visiting Scholar, Journalism:
Google and its Consequences for Information Retrieval.
Google currently dominates Internet searches, results, information
seeking patterns, cognitive patterns, and our ways of looking for things.
Google has crawled and catalogued images, videos, sounds, and texts, and has categorised
them by location, appearance, dimension, time, and in other ways. Further, string searches
have become increasingly important, as Google's algorithms keep improving, guessing
for the searcher, adding stuff one does not want to retrieve, and obscuring stuff that
one is looking for. So much that is not spread by word-of-mouth now comes from
Google. It is difficult to imagine what could counter this influence.
What are the challenges facing Google and its users?
What are the elements that still prevent effective information retrieval or manipulate
the search and retrieval processes? My presentation will address two challenges:
The language (and cultural) challenge; and, in more depth, the truthfulness challenge.
In this situation how can information retrieval be made more successful?
Sara Nofri has a PhD in Communication Science from the University of
Hamburg, an MA in Conference Interpreting and Translation Studies from the University
of Bologna, and studied Political Science and Scandinavian Studies at the Ruhr-University
Bochum. She has studied, traveled, taught, and conducted research in several different
European countries. Her doctoral thesis compared linguistically,
quantitatively, and qualitatively the coverage of environmental issues in Sweden, Italy,
Germany and the U.K. Since 2006, she has been teaching at several departments
at the University of Hamburg. Currently, she is working at a startup project for creating
software aggregating Internet data and performing partially automated qualitative
analyses of those data.
Friday, Sep 28: Clifford LYNCH:
What We Still Don't Know about Institutional Repositories.
The higher education and library world, in the US and globally, now
has about ten years of experience with the deployment of institutional repository
services; yet there is still a surprising amount that we don't understand. After a
brief review of the evolution of institutional repositories, I'll focus on the status
of a series of these open research questions, including: how to measure success;
metadata-related strategies and integration challenges; institutional and disciplinary
repository relationships; data intensive scholarship and repositories; and provisioning
repositories as use environments.
Friday, Oct 5: Brian CARVER and Michael LISSNER: The Court Listener.
Michael Lissner and Brian Carver created CourtListener.com, an alert service
covering the U.S. federal appellate courts, a database of opinions with >750,000 documents,
including the Supreme Court corpus since 1754. Rowyn McDonald and Karen Rustad created a
legal citator on it. The site is provided at no cost. The site's code
is under open source licenses. All documents, with citation relationships, can be downloaded
in bulk, providing a large corpus of English-language documents for research and collaboration.
As time permits, we'll discuss: (1) Frequent (often angry) requests that court opinions be
removed from their site or blocked from discovery by search engines and a study of requests
made. Who makes such requests and why. Recommendations for a balance between access and privacy.
(2) There are 100+ court websites, many with poor funding or prioritization. Gaining a higher-level
view of the law can be challenging. "Juriscraper" is a new project, spun out of CourtListener,
designed to ease problems for those who want court opinions daily. The project is under
active development, and we are looking for others to get involved. Michael Lissner
will describe the difficulties and complexities of building Juriscraper, and discuss some of
our solutions to these problems.
(3) Finally, we will describe how Rowyn and Karen built an online application which detects
citations in court opinions, creates links between cases, and tracks the resulting citation
network. Problems encountered and solved, and directions for future work, with a demo of the
citator. courtlistener.com.
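As a rough illustration of the idea in (3), and not the code used by CourtListener, citation detection and a citation network can be sketched in a few lines of Python. The reporter list, the regular expression, and the function names below are all assumptions made for this example; a real citator has to handle many more reporters, short-form citations, and typographic variants.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny reporter list; a real citator needs far more.
REPORTERS = ["U.S.", "S. Ct.", "F.2d", "F.3d"]

# Matches "volume reporter page", e.g. "410 U.S. 113".
CITATION_RE = re.compile(
    r"\b(\d{1,3})\s+("
    + "|".join(re.escape(r) for r in REPORTERS)
    + r")\s+(\d{1,4})\b"
)


def find_citations(text):
    """Return normalized 'volume reporter page' strings found in text, in order."""
    return [f"{vol} {rep} {page}" for vol, rep, page in CITATION_RE.findall(text)]


def build_citation_graph(opinions):
    """Map each opinion's own citation to the set of citations it cites.

    `opinions` is a dict of {own citation: opinion text}.
    Opinions that cite nothing do not appear as keys.
    """
    graph = defaultdict(set)
    for own_cite, text in opinions.items():
        for cited in find_citations(text):
            if cited != own_cite:  # ignore self-references
                graph[own_cite].add(cited)
    return dict(graph)


def cited_by_counts(graph):
    """Count how many opinions in the graph cite each citation."""
    counts = defaultdict(int)
    for cited_set in graph.values():
        for cited in cited_set:
            counts[cited] += 1
    return dict(counts)
```

Given a dictionary of opinion texts keyed by their own citations, build_citation_graph returns the outgoing links for each opinion, and cited_by_counts gives a simple measure of how often each case is cited within the collection.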
Friday, Oct 12: Laurie PEARCE, Patrick SCHMITZ and Davide SEMENZIN:
Ancient Families, Modern Tools: Berkeley Prosopography Services.
BPS is a set of services for prosopographic analysis developed at Berkeley
in response to historians' needs to mine prosopographic data from text corpora, supporting
study of societal relations among documented individuals. BPS overcomes the limitations
of traditional pen-and-paper research by providing researchers with a flexible and intuitive
corpus-based toolset for data processing, analysis, and visualization. From its inception,
BPS was required to be generalizable, scalable, corpus agnostic, extensible, and universally
accessible. BPS' innovative and unique contribution as a research tool is in the support
for the promulgation and exploration of counterfactual assertions within the context of
corpora curated by domain-experts, while preserving domain integrity and tracking
intellectual contribution and authority. Challenges encountered in BPS development
include: translating the high-level requirements of humanities researchers into
technically-sound designs, abstract modeling of probabilistic networks, and deployment of
the tools as reusable web-services.
Laurie Pearce is lecturer in Assyriology in the Department of
Near Eastern Studies, Berkeley. She specializes in the social and economic history of
Mesopotamia (modern Iraq) in the late first millennium BCE. The legal texts from
Hellenistic Uruk, which serve as the development and demonstrator corpus for Berkeley
Prosopography Services
(berkeleyprosopography.org),
are the core component of her project Hellenistic Babylonia: Texts, Images and Names
(oracc.museum.upenn.edu/hbtin),
a component of the Open Richly Annotated Cuneiform Corpus.
Patrick Schmitz is a Semantic Services Architect and Manager of the IST Research
Technologies Architecture and Design group at U.C. Berkeley, where he focuses on bringing
semantic intelligence to cultural heritage communities. He is the technical lead for the
CollectionSpace project and senior architect on Project Bamboo and was previously in
research groups at Microsoft, Yahoo!, and CWI in Amsterdam. He has extensive experience
as system architect and software developer on multimedia and information management
platforms, has co-founded several tech startups, and is active in W3C working groups. He has a
BA in Computer Science and a Masters in Information Management and Systems from U.C. Berkeley,
and is also a Lecturer in the I School faculty.
Davide Semenzin is a visiting scholar working on his Masters project
for the University of Utrecht. His areas of focus include Social Network Analysis and
Graph Visualization.
Friday, Oct 19: Michael BUCKLAND & AnnaLee SAXENIAN.
Michael BUCKLAND: Lodewyk Bendikson and the Copying of Documents.
Printing allowed mass-produced copies of documents, but easy, accurate
one-off copying was not achieved until the 20th century with specialized photographic
techniques: first photostat (developed for humanities scholars) then microfilm (for banks).
Photography also allowed image enhancement and document forensics. Explained through
the work of a Dutch-born Californian, Lodewyk Bendikson (1875-1953).
More at people.ischool.berkeley.edu/~buckland/bend.html.
4:00 p.m. AnnaLee SAXENIAN:
Online Education: The I School, Berkeley, and Beyond.
I'll talk about online initiatives at the I School in the context of
current experimentation on the Berkeley campus and beyond. More at
http://people.ischool.berkeley.edu/~anno/Papers/Online_Education_at_Berkeley.pdf.
Friday, Oct 26: Laine FARLEY and Patricia MARTIN, California Digital Library:
Return On Investment: Is It Possible to Demonstrate Value for Digital Library Services?
As the financial crisis has deepened in higher education, administrators
and others have called for more evidence of value and return on investment, even for such
core services as libraries. Others have argued that this approach simply can't apply to
library services. The California Digital Library attempted to develop various measures
of value and return on investment (ROI) for its services and will examine where these
approaches may be credible and where there are gaps.
Background:
"Stop the Madness: The Insanity of ROI and the Need for New Qualitative Measures of Academic
Library Success" by James G. Neal: academiccommons.columbia.edu/catalog/ac%3A130659.
LibValue database: libvalue.cci.utk.edu.
Laine Farley has been the Executive Director of CDL since 2008 and has
been with CDL since its inception in 1997.
Patricia Martin is Director of Discovery and Delivery Services at
CDL which includes the Melvyl Catalog, UC-eLinks, the Request service for ILL and new
metadata management services in support of HathiTrust and the Center for Research Libraries'
Print Archiving and Preservation Registry.
Friday, Nov 2: Howard BESSER, New York University:
Strategies for Copying Out-of-Print Works: NYU's "Video At Risk" Project.
Howard Besser will discuss the Mellon-sponsored "Video At Risk" (VAR) project,
paying particular attention to findings and strategies that might prove useful to other
libraries. Topics he will cover, along with the methodology behind them, include: VAR's
studies that have shown that a significant number of mass-produced works purchased by
academic libraries for their circulating libraries are now both out-of-print and held
by few libraries; VAR's Guidelines for interpreting Section 108 Copyright laws, and
experiments to support claims around "deterioration"; the severe problems with inconsistent
cataloging in OCLC records; and the development of guidelines for digital reformatting
of video. He may also discuss the development of an open-source tool for extracting
metadata in a manageable form from digital video files. www.nyu.edu/tisch/preservation/research/video-risk.
Howard Besser is a Professor of Cinema Studies in NYU's Tisch School
of the Arts, and founding Director of NYU's Moving Image Archiving & Preservation Program.
Friday, Nov 9: Short Reports.
Once a semester we devote Seminar time to miscellaneous short reports,
especially highlights of recent, distant, expensive conferences. Today's Seminar will
include reports on the American Society for Information Science & Technology Annual Meeting
and the Pre-conference in the History of Information Science; the first and second
National Archival Authorities Cooperative (NAAC) meetings to build a National Archival
Authorities Infrastructure; and the International Conference on Theory and Practice
of Digital Libraries (successor of the European Conference on Research and Advanced
Technology for Digital Libraries (ECDL)) in Cyprus.
Attendees will be encouraged to contribute additional brief reports on other meetings
and on their current, recent or forthcoming projects.
Friday, Nov 16: David ROSENTHAL, Stanford: The Truth Is Out There: Preservation and the Cloud.
With the recent introductions of DuraCloud, Preservica, Glacier and
others, preservation has joined most other applications in being offered
as a cloud service, Preservation as a Service (PaaS). Does PaaS make
technical, economic or business sense? What characteristics make
applications cloud-friendly? If outsourcing to third-party cloud
services is such a great idea, why do companies that get big enough all
build their own clouds?
David S. H. Rosenthal has been an engineer in Silicon Valley for
more than a quarter-century. He co-founded and is Chief Scientist of the
LOCKSS (Lots Of Copies Keep Stuff Safe) Program at the Stanford
University Libraries. He was an early employee at Sun Microsystems,
and employee #4 at NVIDIA.
Friday, Nov 23: Thanksgiving. No seminar meeting.
Friday, Nov 30: Clifford LYNCH: Research Agendas in Personal Digital Archiving.
A number of important technical, economic and social developments --
many aspects of commerce and consumer activities, correspondence, personal
photography, and some uses of social media -- have been grouped together under shorthands
like "digital lives" or "personal digital archiving", and in the past few years there
has been at least one annual conference (the next being held in College Park MD in Feb
2013) looking at this area. Recently I've been working on a concluding chapter for a
book of essays on personal digital archiving, and after giving a brief overview of the
scope of the field and its interconnections to other related areas, I'll share some
of the ideas that I've developed about the key research issues and agendas in this area.
The Seminar will resume on January 25, 2013.
Fall 2012 schedule.
Spring 2012 schedule and summaries.
Spring 2013 schedule and summaries.