School of
Information
Previously School of Library & Information Studies
296a-1
Seminar: Information Access.
("The Friday Afternoon Seminar")
Summaries - Fall 2008.
Fridays 3-5. 107 South Hall.
Schedule.
Summaries will be added as they become available.
Friday, August 29: First Meeting of the semester.
Clifford LYNCH: Introduction. Summer Developments
Introduction to Seminar: Purpose, History, Plans. Introductions.
Summary of Summer Happenings & Conference Reports (Participation
from all attendees welcome). I'll report on the Report of the NSF Cyberlearning
Task Force, which appeared a few weeks ago; I served on this task force,
and the report can be found at
www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf08204.
I will also report on some developments in cyberinfrastructure,
and provide very brief reports on the WIPO meeting on preservation
and intellectual property in the digital age (July 15), the UK Digital Lives
project and the RLG partners symposium. I hope we will also have brief
reports on the highlights of the Joint Conference on Digital Libraries
(though we will have THE highlight next week, when Cathy Marshall will
present an expanded version of her prize-winning paper at seminar),
and other events or publications of interest.
Fri, Sep 5: Cathy MARSHALL, Microsoft Research:
Taking the Scholars' Perspective on Scholarly Archiving.
About a year ago, I undertook a qualitative field study of
the scholarly writing, collaboration, information management, and long-term
archiving practices of researchers in five related subdisciplines. Fifteen
generous participants allowed me to interview them about the kinds of
artifacts they create in the process of writing a paper, how they exchange
and store materials over the short term, how they handle references and
bibliographic resources, and the strategies they use to guarantee the
long-term safety of their scholarly materials. The
findings revealed some surprising design implications for collaboration
infrastructure and personal scholarly archives in addition to suggesting
some ways to facilitate the deposit of scholarly materials into institutional
and disciplinary repositories.
What did I learn? Come find out!
Fri, Sep 12: Clifford LYNCH: Stewardship and Cultural Memory Organizations
in the Digital Age.
In this presentation, I'll begin a discussion of the changing
nature of cultural memory organizations in the digital world, examine some
of the convergences taking place among libraries, museums and archives,
and raise questions about the nature of good stewardship in the digital age,
and some of the legal and social challenges to this. The approach will
include some historical perspectives, as well as a look at current developments.
I'll also discuss some aspects of the nature of cultural memory in the
digital world (building in part on some of Cathy Marshall's earlier presentation).
I will include a number of open research topics. It's likely that some
topics here will be continued to subsequent seminar sessions, depending
on the specific interests of the group.
Fri, Sep 19: Thomas TUNSCH, National Museums in Berlin:
The Semantic Web and Wikis.
Museums as well as other communities related to cultural
heritage have developed many standards with different scopes and levels
of implementation. The CIDOC CRM is the international standard (ISO 21127:2006)
for the controlled exchange of cultural heritage information.
Although covering the universe of cultural heritage concepts and providing
the formal ontology for archives, libraries and museums, implementations and
utilizations of this model are still considered rare.
While the CIDOC CRM is the result of the efforts of the
specialized CIDOC working group, it seems to be difficult for other members
of the professional community of museum specialists to share the highly
abstract essence of a conceptual reference model. The same is true for
other complex and diversified standards. Wikis with semantic functionality
(Semantic MediaWiki) are able to deal with both the complex and the abstract
features of an ontology as well as multiple pieces of data and information.
Therefore the combination of the model and a wiki can provide new qualities
of accessibility and connectivity for cultural heritage standards.
Fri, Sep 26: Ray LARSON:
Recent Trends in Digital Libraries and Information Retrieval.
This informal talk will discuss trends and highlights
from recent conferences on Digital Libraries and Information Retrieval,
including the Joint Conference on Digital Libraries (JCDL), the ACM SIGIR
Meeting, and the recently ended European Conference on Digital Libraries
and Cross-Language Evaluation Forum.
Fri, Oct 3: Clifford LYNCH: Stewardship and Cultural Memory Organizations
in the Digital Age -- Part II.
This session continues the discussion I started at the
September 12 seminar about the changing nature of cultural memory organizations
in the digital world. While the earlier conversation focused largely on the
implications of converting the existing (retrospective) cultural and
intellectual record into digital form, the focus of this presentation
will be on the changing nature of the now increasingly digital record
as we go forward, and what that means for the roles and practices of
cultural memory organizations, and indeed for the nature of cultural
memory in the digital world.
Fri, Oct 10: M. BUCKLAND, R. LARSON & C. LYNCH: The Evolving Concept of
"Digital Libraries."
  Also guest Niels Windfeld LUND, University of Tromso, Norway.
  Former Visiting Professor Niels Lund will briefly describe
two projects in which he is engaged.
Both are seen as challenges to document theory and practice:
- Opera without an opera house: Using the internet to allow geographically dispersed
singers and musicians to perform together; and
- Telemedicine as a challenge to collaboration among diverse parties centered on
shared documents.
  Clifford Lynch will lead a discussion of the history and evolving
forms of the idea of "Digital Libraries" from the 1960s onwards. The Bay Area and
especially the Berkeley campus
and the UC Office of the President have been heavily engaged in these developments:
the School, with its Institute for Library Research (through 1978) and subsequent
studies;
the UC Office of the President, through
the Division of Library Automation, later the California Digital Library; and
major campus-based projects, including the SEQUOIA project and
the Digital Libraries Initiatives projects.
However, we will try to establish an overall picture, not
simply an account of local contributions.
Fri, Oct 17: Short Reports, including:
Patrick RILEY: The Toll of Privacy.
A report on a survey of attitudes toward and usage of the
FasTrak toll system, with special reference to possible trade-offs between
privacy and cost-savings.
Aurelie BENARD, Univ of Paris:
Creating an event-based analysis of biographical texts.
Entries in biographical dictionaries chronicle
the events in people's lives but not in a structured way.
Can events be used as an analytical device for adding useful
structure to biographical texts? Experience in creating an event
analysis will be presented and the difficulties briefly discussed.
Nick RABINOWITZ: TimeMap: Displaying Geotemporal Data Online.
The TimeMap Javascript library is an Open Source project that ties
together Google Maps and MIT's SIMILE Timeline tool to display data that has both
a geographic and a temporal element. I will present several example implementations
and discuss further work and potential areas of application.
Ryan SHAW:
Designing an Event Directory for Irish History.
I will explain the notion of an "event directory" web
service and the kinds of applications it is intended to enable, and
will present initial steps toward developing an event directory for
use with the Digital Library of Core E-Resources on Ireland.
Fri, Oct 24: Geoffrey BOWKER, U. of Santa Clara: Representing Indigenous Knowledge.
The new tools of the information society have largely been
created by and for the developed world. In this talk, I discuss the
significance of and difficulties with representing other ways of knowing.
I will conclude by describing two new projects in this area.
For Geoffrey Bowker see
http://epl.scu.edu:16080/~gbowker.
Also Michael BUCKLAND: Women Pioneers of Library and Information
Science in France in the 1930s.
Influenced by Paul Otlet and by American librarianship,
a talented group of innovators rapidly and radically transformed librarianship
in France -- across the board from bookmobiles to information science.
The transformation was remarkable for the prominent leadership of several
women who knew each other and collaborated in numerous and complex ways
throughout long and successful careers.
A very brief pictorial introduction.
Fri, Oct 31: Robert PASLEY, Sheffield Univ., U.K.:
The Extent of Geographic Resources Available on the Web.
In this paper, we describe a methodology to estimate the
extent of geographic resources available on the web without the need for secondary
knowledge or complex geo-tagging. This is achieved by
randomly selecting toponyms from the Ordnance Survey 50K
gazetteer to create search queries and thus gather document
counts from various web sources for Great Britain. The same
gazetteer is then used to geo-code the results and enable
mapping. To validate our approach, and demonstrate the effects
of geo/non-geo and geo/geo ambiguity, we mapped the selected
toponyms to Geograph, a community project that contains user
generated geo-tagged photographs of the UK. Although success
varies with resolution, the proposed approach is likely sufficient
to be reliably used by applications exploring the geographic
coverage of the web for cases where references to settlements
are likely to be common. In our case, we applied the method to
produce maps of web coverage for a range of sources at a
resolution of 30km.
This paper is also being presented at the
16th ACM SIGSPATIAL International Conference on
Advances in Geographic Information Systems
(ACM GIS 2008).
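The sampling-and-mapping steps in this abstract can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: the gazetteer rows and their coordinates are made-up stand-ins for Ordnance Survey 50K gazetteer entries, `fake_count` is a hypothetical placeholder for a real web search query, and all function names are assumptions.

```python
import random
from collections import defaultdict

# Illustrative rows only: (toponym, easting_m, northing_m).
# These are NOT real gazetteer records; coordinates are rough placeholders.
GAZETTEER = [
    ("Sheffield", 435000, 387000),
    ("Rotherham", 442000, 389000),
    ("Leeds", 430000, 433000),
    ("York", 460000, 452000),
    ("Bath", 375000, 165000),
]

CELL_SIZE_M = 30_000  # 30 km grid cells, the resolution named in the abstract


def sample_toponyms(gazetteer, k, seed=0):
    """Randomly select k gazetteer entries to use as search queries."""
    rng = random.Random(seed)
    return rng.sample(gazetteer, k)


def cell_of(easting, northing, cell_size=CELL_SIZE_M):
    """Geo-code a point to the grid cell that contains it."""
    return (easting // cell_size, northing // cell_size)


def coverage_by_cell(sampled, count_documents):
    """Sum the document counts of the sampled toponyms per grid cell."""
    totals = defaultdict(int)
    for name, easting, northing in sampled:
        totals[cell_of(easting, northing)] += count_documents(name)
    return dict(totals)


def fake_count(name):
    """Hypothetical stand-in for a document count returned by a web source."""
    return len(name) * 100


if __name__ == "__main__":
    sampled = sample_toponyms(GAZETTEER, 3)
    print(coverage_by_cell(sampled, fake_count))
```

Note that a sketch like this does nothing about the geo/non-geo and geo/geo ambiguity the abstract mentions (a query for "Bath" counts pages about bathtubs too), which is presumably why the authors validate against Geograph.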
Fri, Nov 7: David ROSENTHAL:
"Ensuring the Longevity of Digital Documents" Revisited.
Jeff Rothenberg's seminal Scientific American article "Ensuring the
Longevity of Digital Documents" looks forward 50 years from 1995 to ask
whether the transition to digital media places society's memory at risk.
Now, more than a quarter of the way through Rothenberg's scenario, it is
time to review his contribution. It stands as a monument to both the
value and the risk of this kind of scholarship. His broad vision has
been immensely valuable, drawing public attention to an important
problem and motivating significant, continuing efforts to address it. We
now have practical examples of systems preserving documents in the ways
he identified. Yet, to some extent because of his success, events have
transpired in ways that render all the specific concerns he
identified insignificant, while several concerns he missed completely
appear to pose much greater risks to society's memory than those he
envisaged.
David S. H. Rosenthal
invented the LOCKSS (Lots Of Copies Keep Stuff
Safe) technology and has been Chief Scientist of the LOCKSS program
at the Stanford Libraries since it started a decade ago. The program
develops tools that allow libraries to collect and preserve web
published materials (ejournals, books, blogs, web sites, archival
materials, etc) using low-cost, collaborative, peer-to-peer
technology.
Dr Rosenthal
is a long-time Silicon Valley engineer. He was an early
employee at Sun Microsystems, where he helped develop the X
Window System which has long been the open source standard. He
was employee #4 at Nvidia, now the leading supplier of high-performance
graphics chips.
Fri, Nov 14: Colin BURKE, Univ of Maryland Baltimore County:
Information and Intrigue:
From the Concilium to Noel Field and Alger Hiss.
A Biography of an Information Pioneer, His System, His Family, and His Heritage
1898-2008;
Or, You Can't Take the History of Information Out of History.
In the 1890s a young Quaker graduate of Harvard's
famed zoology
program decided to revolutionize the world's science information systems.
Using his own funds, as he awaited customers and support from the largest
philanthropic and professional organizations, he established the Concilium in
Zurich, Switzerland. Cooperating with Paul Otlet to modify Melvil Dewey's
numeric classifications he began to create a universal indexing and retrieval
system, promising to distribute a "random access, on time" technologically
advanced cumulative file of the world's natural science literature. In 1898
Herbert Field launched what he believed would be his contribution to modernization
and world peace.
The story of the rise, fall, re-birth, and demise of the Concilium
Bibliographicum is more than a near half-century epic of information technology.
The fate of the Zurich system was entwined with the emergence of modern science
and its non-profit institutions (and its first professional entrepreneur-scholars
in America); with espionage in World War I and World War II; with national
competition in science publishing; with the attempt to rebuild world science
after Versailles; with the ramifications of the Russian Revolution and the Great
Depression; and, with the rise and morphing of America's liberal culture.
The biography of the Concilium and its founder and his family
travels into the twenty-first century as his children became central to the
horrible political purges in Eastern Europe; as science, universities, and
science information all became big businesses; as America struggled through
the Cold War and conflicts over "information socialism"; and, as the new
century shows signs that science information will no longer be the domain of
idealists like Herbert Field, the ideologues who ran the great Soviet VINITI
information system, or even an Information Scientist like Eugene
Garfield. Rather, it seems to have become part of the world of global capitalism.
Fri, Nov 21: Timothy TANGHERLINI, UCLA: Mapping Folklore:
hGIS, Machine Learning and the Danish Folklore Archive.
The project has multiple parts to it, and the goal is to get the parts
eventually working together.
The problem is large: folklore, which emerges from the dialectic tension
between individual and tradition, is conditioned by social networks and
reflects individuals' use of the resource of tradition to understand
changes in the physical and manmade environment, and negotiate their
shifting status in the rapidly changing economic and social
environments. The Danish folklore archive material with which I work is
based on 200+ fieldtrips made by Evald Tang Kristensen from 1867-1910,
during which he collected 250,000+ stories, songs, games, rhymes, cures,
observations of daily life, etc, from 6500+ named informants.
The goal of my work is to attach all of these collections back to the
individuals who told them, and to situate these individuals into a data
rich environment, in which their stories--and patterns that emerge in
the storytelling--can be interrogated not only as broad phenomena, but
also in depth (drilling down to the individual story). By using
information from other sources--census data, church records, voting
patterns, parish out/in migration--I hope to be able to present a far
richer interpretive environment for the study of folklore while, at the
same time, making the voices of the generally disenfranchised available
to other researchers. You can play with the current interface (which is
more of a book project than a research interface) at: dev.cdh.ucla.edu/danishfolklore/bin/mainview.html
Some of the patterns are discernible using fairly simple math on
the graphs drawn in ArcGIS, while other machine learning techniques
(particularly unsupervised machine learning) allow for the discovery of
other patterns based on the text(s) themselves rather than simply on
places mentioned in the texts. Projecting the results of machine
learning back into the hGIS environment allows for another set of
secondary evaluations of the "clusters" using standard GIS tools (the
wide circles vs. narrow ellipses for example, based on informant
gender).
The use of machine learners (particularly supervised learners) could
also help us infer social networks either based on individual story
comparisons, or on repertoire comparisons (eg all of the fairy tales
told by a single informant).
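To make the idea of repertoire comparison concrete, here is a small sketch. It is not the project's method: it swaps in a plain Jaccard set-overlap score in place of the supervised learners mentioned above, and the informant names, story-type labels, and threshold are all hypothetical.

```python
from itertools import combinations

# Hypothetical data: informant -> set of story-type labels in their repertoire.
REPERTOIRES = {
    "informant_a": {"ghost", "treasure", "troll"},
    "informant_b": {"ghost", "treasure", "witch"},
    "informant_c": {"cure", "rhyme"},
}


def jaccard(a, b):
    """Share of story types two repertoires have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(repertoires, threshold=0.4):
    """Return (informant, informant, score) for pairs at or above threshold.

    Each edge is a candidate link in an inferred social network; it is
    evidence worth checking against other sources, not proof of contact.
    """
    edges = []
    for (p, rep_p), (q, rep_q) in combinations(sorted(repertoires.items()), 2):
        score = jaccard(rep_p, rep_q)
        if score >= threshold:
            edges.append((p, q, score))
    return edges


if __name__ == "__main__":
    for edge in similarity_edges(REPERTOIRES):
        print(edge)
```

With thousands of informants the number of pairs grows quadratically, which is one reason the volume argument that follows calls for tooling rather than close reading.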
Why the need to use these techniques? Sheer volume: if one were doing
this for one or two, or even ten people, one might be able to do some
excellent close reading; doing it for dozens or hundreds or thousands
requires a different set of tools.
Where does the NLP come in? For Danish, morph analysis is actually
trivial; but for some of the cognate languages, such as New Norwegian,
Icelandic, etc., the problem is quite significant--if one wants to be
able to consider connecting the Danish materials to the Icelandic
materials to the Norwegian to the Swedish--and perhaps to the English
and Irish--one needs good NLP. Morph analysis is one small part of that
equation, but would allow for greater efficiency and accuracy in the
machine learning environment. Named-entity detection for inflected
languages would also work an awful lot better if you had automated
morpho-syntactic markup. So, the Icelandic work is focusing on the
morphological side of this problem. Once we get that running, then (a)
NED and auto-mapping from say the sagas (or the giant db of Icelandic
folklore) would be quite easy, (b) lemmatized searching in the corpus
would be possible and (c) cross language searching would be more
accurate.
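Point (b) can be illustrated with a toy example. This is only a sketch under stated assumptions: the hand-written `LEMMAS` table stands in for the output of a real morphological analyzer, and the three Danish sentences are invented for illustration rather than taken from the archive.

```python
from collections import defaultdict

# Hypothetical lemma table; a real system would derive this from
# automated morphological analysis rather than a hand-written dict.
LEMMAS = {
    "hest": "hest", "hesten": "hest", "heste": "hest", "hestene": "hest",
    "præst": "præst", "præsten": "præst",
}

# Invented example texts, keyed by document id.
DOCS = {
    1: "Præsten red på hesten.",
    2: "Der var tre heste.",
    3: "Manden gik hjem.",
}


def lemmatize(token):
    """Map an inflected form to its lemma; unknown tokens map to themselves."""
    token = token.lower()
    return LEMMAS.get(token, token)


def build_index(docs):
    """Build an inverted index from lemma to the ids of documents using it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[lemmatize(token.strip(".,;:!?"))].add(doc_id)
    return index


def search(index, query):
    """Find documents containing any inflected form of the query word."""
    return sorted(index.get(lemmatize(query), set()))


if __name__ == "__main__":
    index = build_index(DOCS)
    print(search(index, "hest"))   # matches "hesten" and "heste" as well
    print(search(index, "præst"))
```

A plain string search for "hest" would miss neither document here only by luck of substring matching; for languages with stem changes under inflection, indexing by lemma is what makes the match reliable.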
Timothy TANGHERLINI is Chair of the Scandinavian Section
at UCLA. More at
www.humnet.ucla.edu/humnet/scandinavian/tango.html.
Fri, Nov 28: Thanksgiving. No Seminar meeting.
Fri, Dec 5: Last Seminar meeting of the semester: Students' Final
Progress Reports:
-- Patrick RILEY: Berkeley Energy Dashboard.
Along with CS and Urban Planning Ph.D. students, Patrick
Riley has been designing the Berkeley Energy Dashboard this semester,
which aims to make resource usage data available for all buildings on the Berkeley
campus. With both interactive exhibits at the Free Speech Cafe, and
a powerful website, they are intent on raising awareness, facilitating
online discussions on usage data, and even changing behavior
related to energy use.
-- Nick RABINOWITZ: TimeMap and HistoryVis: Displaying Geotemporal Data Online.
The TimeMap Javascript library is an Open Source project that ties
together Google Maps and MIT's SIMILE Timeline. I will present HistoryVis, a
user-friendly system based on the TimeMap library that allows non-technical
users to create and edit maps and timelines that work together to display
content with both geographic and temporal components.
-- Ryan SHAW: Mining historical event references from scanned documents.
As part of an effort to build an "event directory" for
use with the Digital Library of Core E-Resources on Ireland, I am
experimenting with various techniques for mining references to
historical events from scanned texts (books and journals). I will
explain the various techniques I have tried or am considering, as well
as (hopefully) some preliminary results.
The Seminar will resume on January 23.
Fall 2008 schedule.
Spring 2008 schedule and summaries.