296a-1 Seminar Information Access Summaries Fall 2008

School of Information
Previously School of Library & Information Studies

296a-1 Seminar: Information Access.
("The Friday Afternoon Seminar")
Summaries - Fall 2008.

Fridays 3-5. 107 South Hall.

Summaries will be added as they become available.

Fri, Aug 29: First meeting of the semester.
Clifford LYNCH: Introduction. Summer Developments
Introduction to Seminar: Purpose, History, Plans. Introductions.
Summary of Summer Happenings & Conference Reports (Participation from all attendees welcome). I'll report on the Report of the NSF Cyberlearning Task Force, which appeared a few weeks ago; I served on this task force, and the report can be found at www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf08204.
I will also report on some developments in cyberinfrastructure, and provide very brief reports on the WIPO meeting on preservation and intellectual property in the digital age (July 15), the UK Digital Lives project and the RLG partners symposium. I hope we will also have brief reports on the highlights of the Joint Conference on Digital Libraries (though we will have THE highlight next week, when Cathy Marshall will present an expanded version of her prize-winning paper at seminar), and other events or publications of interest.

Fri, Sep 5: Cathy MARSHALL, Microsoft Research: Taking the Scholars' Perspective on Scholarly Archiving.
About a year ago, I undertook a qualitative field study of the scholarly writing, collaboration, information management, and long-term archiving practices of researchers in five related subdisciplines. Fifteen generous participants allowed me to interview them about the kinds of artifacts they create in the process of writing a paper, how they exchange and store materials over the short term, how they handle references and bibliographic resources, and the strategies they use to guarantee the long-term safety of their scholarly materials. The findings revealed some surprising design implications for collaboration infrastructure and personal scholarly archives, and suggested some ways to facilitate the deposit of scholarly materials into institutional and disciplinary repositories.
What did I learn? Come find out!

Fri, Sep 12: Clifford LYNCH: Stewardship and Cultural Memory Organizations in the Digital Age.
In this presentation, I'll begin a discussion of the changing nature of cultural memory organizations in the digital world, examine some of the convergences taking place among libraries, museums and archives, and raise questions about what good stewardship means in the digital age and about the legal and social challenges it faces. The approach will include some historical perspectives as well as a look at current developments. I'll also discuss some aspects of the nature of cultural memory in the digital world (building in part on Cathy Marshall's earlier presentation). I will include a number of open research topics. Some of these topics will likely be continued in subsequent seminar sessions, depending on the specific interests of the group.

Fri, Sep 19: Thomas TUNSCH, National Museums in Berlin: The Semantic Web and Wikis.
Museums, as well as other communities concerned with cultural heritage, have developed many standards with different scopes and levels of implementation. The CIDOC CRM is the international standard (ISO 21127:2006) for the controlled exchange of cultural heritage information. Although the model covers the universe of cultural heritage concepts and provides the formal ontology for archives, libraries and museums, implementations and uses of it are still considered rare.
While the CIDOC CRM is the result of the efforts of the specialized CIDOC working group, other members of the professional community of museum specialists seem to find it difficult to grasp the highly abstract essence of a conceptual reference model. The same is true for other complex and diversified standards. Wikis with semantic functionality (Semantic MediaWiki) are able to deal both with the complex and abstract features of an ontology and with many individual pieces of data and information. Combining the model with a wiki can therefore provide new qualities of accessibility and connectivity for cultural heritage standards.

Fri, Sep 26: Ray LARSON: Recent Trends in Digital Libraries and Information Retrieval.
This informal talk will discuss recent trends and highlights from conferences on Digital Libraries and Information Retrieval, including the Joint Conference on Digital Libraries (JCDL), the ACM SIGIR meeting, and the recently concluded European Conference on Digital Libraries (ECDL) and Cross-Language Evaluation Forum (CLEF).

Fri, Oct 3: Clifford LYNCH: Stewardship and Cultural Memory Organizations in the Digital Age -- Part II.
This session continues the discussion I started at the September 12 seminar about the changing nature of cultural memory organizations in the digital world. While the earlier conversation focused largely on the implications of converting the existing (retrospective) cultural and intellectual record into digital form, the focus of this presentation will be on the changing nature of the now increasingly digital record as we go forward, and what that means for the roles and practices of cultural memory organizations, and indeed for the nature of cultural memory in the digital world.

Fri, Oct 10: M. BUCKLAND, R. LARSON & C. LYNCH: The Evolving Concept of "Digital Libraries."
With guest Niels Windfeld LUND, University of Tromso, Norway.
Former Visiting Professor Niels Lund will briefly describe two projects he is engaged in. Both are seen as challenges to document theory and practice:
- Opera without an opera house: Using the internet to allow geographically dispersed singers and musicians to perform together; and
- Telemedicine as a challenge to collaboration among diverse parties centered on shared documents.
Clifford Lynch will lead a discussion of the history and evolving forms of the idea of "Digital Libraries" from the 1960s onwards. The Bay Area, especially the Berkeley campus and the UC Office of the President, has been heavily engaged in these developments: the School, through its Institute for Library Research (through 1978) and subsequent studies; the UC Office of the President, through its Division of Library Automation, later the California Digital Library; and major campus-based projects, including the SEQUOIA project and the Digital Libraries Initiative projects. However, we will try to establish an overall picture, not simply an account of local contributions.

Fri, Oct 17: Short Reports, including:
Patrick RILEY: The Toll of Privacy.
A report on a survey of attitudes toward, and usage of, the FasTrak toll system, with special reference to possible trade-offs between privacy and cost savings.
Aurelie BENARD, Univ of Paris: Creating an event-based analysis of biographical texts.
Entries in biographical dictionaries chronicle the events in people's lives, but not in a structured way. Can events be used as an analytical device for adding useful structure to biographical texts? Experience in creating an event analysis will be presented and the difficulties briefly discussed.
Nick RABINOWITZ: TimeMap: Displaying Geotemporal Data Online.
The TimeMap Javascript library is an Open Source project that ties together Google Maps and MIT's SIMILE Timeline tool to display data that has both a geographic and a temporal element. I will present several example implementations and discuss further work and potential areas of application.
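One way to picture the link between a map and a timeline is a filter that keeps only the items whose time span overlaps the visible timeline window and whose location falls inside the visible map area. The Python sketch below illustrates that idea only; it is not TimeMap's JavaScript API. The `GeoTemporalItem` record and `visible_items` function are hypothetical names, and the bounding box is assumed not to cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass
class GeoTemporalItem:
    """A record with both a location and a time span (hypothetical schema)."""
    title: str
    lat: float
    lon: float
    start: date
    end: Optional[date] = None  # None means a point event on `start`


def visible_items(
    items: Iterable[GeoTemporalItem],
    window: Tuple[date, date],
    bbox: Tuple[float, float, float, float],  # (south, west, north, east)
) -> List[GeoTemporalItem]:
    """Keep items that overlap the timeline window AND lie inside the map box.

    Assumes the box does not cross the antimeridian (west <= east).
    """
    win_start, win_end = window
    south, west, north, east = bbox
    result = []
    for item in items:
        item_end = item.end or item.start  # treat point events as zero-length spans
        overlaps_time = item.start <= win_end and item_end >= win_start
        inside_map = south <= item.lat <= north and west <= item.lon <= east
        if overlaps_time and inside_map:
            result.append(item)
    return result
```

Scrolling the timeline or panning the map would simply re-run the filter with a new window or box.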
Ryan SHAW: Designing an Event Directory for Irish History.
I will explain the notion of an "event directory" web service and the kinds of applications it is intended to enable, and will present initial steps toward developing an event directory for use with the Digital Library of Core E-Resources on Ireland.

Fri, Oct 24: Geoffrey BOWKER, U. of Santa Clara: Representing Indigenous Knowledge.
The new tools of the information society have largely been created by and for the developed world. In this talk, I discuss the significance of and difficulties with representing other ways of knowing. I will conclude by describing two new projects in this area.
For Geoffrey Bowker see http://epl.scu.edu:16080/~gbowker.
Also Michael BUCKLAND: Women Pioneers of Library and Information Science in France in the 1930s.
Influenced by Paul Otlet and by American librarianship, a talented group of innovators rapidly and radically transformed librarianship in France -- across the board from bookmobiles to information science. The transformation was remarkable for the prominent leadership of several women who knew each other and collaborated in numerous and complex ways throughout long and successful careers. A very brief pictorial introduction.

Fri, Oct 31: Robert PASLEY, Sheffield Univ., U.K.: The Extent of Geographic Resources Available on the Web.
In this paper, we describe a methodology to estimate the extent of geographic resources available on the web without the need for secondary knowledge or complex geo-tagging. We randomly select toponyms from the Ordnance Survey 50K gazetteer to create search queries, and use them to gather document counts for Great Britain from various web sources. The same gazetteer is then used to geo-code the results and enable mapping. To validate our approach, and to demonstrate the effects of geo/non-geo and geo/geo ambiguity, we mapped the selected toponyms to Geograph, a community project that contains user-generated, geo-tagged photographs of the UK. Although success varies with resolution, the proposed approach is likely reliable enough for applications exploring the geographic coverage of the web in cases where references to settlements are likely to be common. In our case, we applied the method to produce maps of web coverage for a range of sources at a resolution of 30 km.
This paper is also being presented at the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008).
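The sampling-and-aggregation step can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: the gazetteer is a hypothetical list of (name, easting, northing) tuples in British National Grid metres, `hit_count` stands in for whatever web source returns a document count for a query, and averaging counts per sampled toponym within each 30 km cell is one plausible normalization chosen here, not necessarily the paper's. The ambiguity effects the abstract discusses are not modelled.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical gazetteer entry: (place name, easting m, northing m).
Entry = Tuple[str, float, float]

CELL_SIZE_M = 30_000  # 30 km grid cells, the resolution quoted in the abstract


def coverage_map(
    gazetteer: List[Entry],
    sample_size: int,
    hit_count: Callable[[str], int],  # hypothetical: document count for a query
    seed: int = 0,
) -> Dict[Tuple[int, int], float]:
    """Randomly sample toponyms, query each one, and return the mean
    document count per sampled toponym for every 30 km grid cell."""
    rng = random.Random(seed)
    sample = rng.sample(gazetteer, min(sample_size, len(gazetteer)))
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for name, easting, northing in sample:
        # Geo-code with the same gazetteer: the entry's own coordinates pick the cell.
        cell = (int(easting // CELL_SIZE_M), int(northing // CELL_SIZE_M))
        totals[cell] += hit_count(name)
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}
```

Cells with no sampled toponym are simply absent from the result, which a map would render as "no data" rather than zero coverage.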

Fri, Nov 7: David ROSENTHAL: "Ensuring the Longevity of Digital Documents" Revisited.
Jeff Rothenberg's seminal Scientific American article "Ensuring the Longevity of Digital Documents" looks forward 50 years from 1995 to ask whether the transition to digital media places society's memory at risk. Now, more than a quarter of the way through Rothenberg's scenario, it is time to review his contribution. It stands as a monument to both the value and the risk of this kind of scholarship. His broad vision has been immensely valuable, drawing public attention to an important problem and motivating significant, continuing efforts to address it. We now have practical examples of systems preserving documents in the ways he identified. Yet, to some extent because of his success, events have transpired in ways that render all the specific concerns he identified insignificant, while several concerns he missed completely appear to pose much greater risks to society's memory than those he envisaged.
David S. H. Rosenthal invented the LOCKSS (Lots Of Copies Keep Stuff Safe) technology and has been Chief Scientist of the LOCKSS program at the Stanford Libraries since it started a decade ago. The program develops tools that allow libraries to collect and preserve web published materials (ejournals, books, blogs, web sites, archival materials, etc) using low-cost, collaborative, peer-to-peer technology.
Dr. Rosenthal is a long-time Silicon Valley engineer. He was an early employee at Sun Microsystems, where he helped develop the X Window System, which has long been the open-source standard windowing system. He was employee #4 at Nvidia, now the leading supplier of high-performance graphics chips.
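The intuition behind "Lots Of Copies Keep Stuff Safe" can be shown with a deliberately simplified audit: hash each library's copy, treat the majority digest as correct, and repair the copies that disagree. This Python sketch is not the LOCKSS polling protocol, which is considerably more elaborate (peers sample one another and the protocol is designed to resist adversaries); `audit` and `repair` are hypothetical names for illustration.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def audit(copies: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Hash every peer's copy and return (majority digest, peers that disagree)."""
    digests = {peer: hashlib.sha256(data).hexdigest() for peer, data in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(digests):
        # Without a strict majority there is no basis for deciding which copy is good.
        raise ValueError("no strict majority among copies")
    damaged = sorted(peer for peer, digest in digests.items() if digest != majority_digest)
    return majority_digest, damaged


def repair(copies: Dict[str, bytes]) -> List[str]:
    """Overwrite disagreeing copies with a majority copy; return the repaired peers."""
    majority_digest, damaged = audit(copies)
    good = next(
        data for data in copies.values()
        if hashlib.sha256(data).hexdigest() == majority_digest
    )
    for peer in damaged:
        copies[peer] = good
    return damaged
```

The more independent copies there are, the less likely it is that damage to any one of them outvotes the rest.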

Fri, Nov 14: Colin BURKE, Univ. of Maryland Baltimore County: Information and Intrigue: From the Concilium to Noel Field and Alger Hiss. A Biography of an Information Pioneer, His System, His Family, and His Heritage 1898-2008; Or, You Can't Take the History of Information Out of History.
In the 1890s a young Quaker graduate of Harvard's famed zoology program decided to revolutionize the world's science information systems. Using his own funds, as he awaited customers and support from the largest philanthropic and professional organizations, he established the Concilium in Zurich, Switzerland. Cooperating with Paul Otlet to modify Melvil Dewey's numeric classifications, he began to create a universal indexing and retrieval system, promising to distribute a "random access, on time" technologically advanced cumulative file of the world's natural science literature. In 1898 Herbert Field launched what he believed would be his contribution to modernization and world peace.
The story of the rise, fall, re-birth, and demise of the Concilium Bibliographicum is more than a near half-century epic of information technology. The fate of the Zurich system was entwined with the emergence of modern science and its non-profit institutions (and its first professional entrepreneur-scholars in America); with espionage in World War I and World War II; with national competition in science publishing; with the attempt to rebuild world science after Versailles; with the ramifications of the Russian Revolution and the Great Depression; and, with the rise and morphing of America's liberal culture.
The biography of the Concilium and its founder and his family travels into the twenty-first century as his children became central to the horrible political purges in Eastern Europe; as science, universities, and science information all became big businesses; as America struggled through the Cold War and conflicts over "information socialism"; and, as the new century shows signs that science information will no longer be the domain of idealists like Herbert Field, the ideologues who ran the great Soviet VINITI information system, or even an Information Scientist like Eugene Garfield. Rather, it seems to have become part of the world of global capitalism.

Fri, Nov 21: Timothy TANGHERLINI, UCLA: Mapping Folklore: hGIS, Machine Learning and the Danish Folklore Archive.
The project has multiple parts, and the goal is eventually to get them working together. The problem is large: folklore, which emerges from the dialectic tension between individual and tradition, is conditioned by social networks and reflects individuals' use of the resource of tradition to understand changes in the physical and man-made environment and to negotiate their shifting status in rapidly changing economic and social environments. The Danish folklore archive material with which I work is based on more than 200 field trips made by Evald Tang Kristensen from 1867 to 1910, during which he collected more than 250,000 stories, songs, games, rhymes, cures, observations of daily life, etc., from more than 6,500 named informants.
The goal of my work is to attach all of these collections back to the individuals who told them, and to situate these individuals in a data-rich environment in which their stories, and the patterns that emerge in the storytelling, can be interrogated not only as broad phenomena but also in depth (drilling down to the individual story). By using information from other sources (census data, church records, voting patterns, parish in- and out-migration), I hope to be able to present a far richer interpretive environment for the study of folklore while, at the same time, making the voices of the generally disenfranchised available to other researchers. You can play with the current interface (which is more of a book project than a research interface) at: dev.cdh.ucla.edu/danishfolklore/bin/mainview.html
Some of the patterns are discernible using fairly simple math on the graphs drawn in ArcGIS, while machine learning techniques (particularly unsupervised machine learning) allow for the discovery of other patterns based on the text(s) themselves rather than simply on places mentioned in the texts. Projecting the results of machine learning back into the hGIS environment allows for another set of secondary evaluations of the "clusters" using standard GIS tools (for example, wide circles vs. narrow ellipses, based on informant gender).
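The "wide circles vs. narrow ellipses" comparison is the kind of summary GIS packages produce as a standard deviational ellipse. A minimal Python sketch of the idea, assuming the points are already in a projected (planar) coordinate system such as metres, derives the ellipse from the eigenvalues of the points' covariance matrix. The function names are hypothetical, and ArcGIS's own Directional Distribution tool applies its own scaling conventions, so its numbers will not match this sketch exactly.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def deviational_ellipse(points: List[Point]) -> Tuple[float, float, float]:
    """Return (major_sd, minor_sd, angle_rad) for a 2-D point cloud.

    The axes are square roots of the eigenvalues of the population
    covariance matrix; angle is the major axis measured from the x-axis.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major = math.sqrt(half_trace + root)
    minor = math.sqrt(max(half_trace - root, 0.0))  # guard against tiny negative rounding
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return major, minor, angle


def elongation(points: List[Point]) -> float:
    """minor/major: near 1 for a roughly circular spread, near 0 for a narrow one."""
    major, minor, _ = deviational_ellipse(points)
    return minor / major if major > 0 else 1.0
```

Comparing `elongation` for the story locations of two groups of informants gives a single number per cluster, which is easier to tabulate than eyeballing ellipses on a map.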
The use of machine learners (particularly supervised learners) could also help us infer social networks, based either on comparisons of individual stories or on comparisons of repertoires (e.g., all of the fairy tales told by a single informant).
Why use these techniques? Sheer volume: for one or two, or even ten, people, one might be able to do some excellent close reading; doing it for dozens, hundreds, or thousands requires a different set of tools.
Where does NLP come in? For Danish, morphological analysis is actually trivial; but for some of the cognate languages, such as New Norwegian and Icelandic, the problem is quite significant. If one wants to connect the Danish materials to the Icelandic, Norwegian and Swedish materials (and perhaps to the English and Irish), one needs good NLP. Morphological analysis is one small part of that equation, but it would allow for greater efficiency and accuracy in the machine learning environment. Named-entity detection (NED) for inflected languages would also work far better with automated morpho-syntactic markup. So the Icelandic work is focusing on the morphological side of this problem. Once that is running, (a) NED and automatic mapping from, say, the sagas (or the giant database of Icelandic folklore) would be quite easy, (b) lemmatized searching in the corpus would be possible, and (c) cross-language searching would be more accurate.
Timothy TANGHERLINI is Chair of the Scandinavian Section at UCLA. More at www.humnet.ucla.edu/humnet/scandinavian/tango.html.

Fri, Nov 28: Thanksgiving. No Seminar meeting.

Fri, Dec 5: Last Seminar meeting of the semester: Students' Final Progress Reports:
-- Patrick RILEY: Berkeley Energy Dashboard.
This semester, along with CS and Urban Planning Ph.D. students, Patrick Riley has been designing the Berkeley Energy Dashboard, which aims to make resource usage data for all buildings on the Berkeley campus available. With both interactive exhibits at the Free Speech Cafe and a powerful website, they are intent on raising awareness, facilitating online discussions of usage data, and even changing behavior related to energy use.
-- Nick RABINOWITZ: TimeMap and HistoryVis: Displaying Geotemporal Data Online.
The TimeMap Javascript library is an Open Source project that ties together Google Maps and MIT's SIMILE Timeline. I will present HistoryVis, a user-friendly system based on the TimeMap library that allows non-technical users to create and edit maps and timelines that work together to display content with both geographic and temporal components.
-- Ryan SHAW: Mining historical event references from scanned documents.
As part of an effort to build an "event directory" for use with the Digital Library of Core E-Resources on Ireland, I am experimenting with various techniques for mining references to historical events from scanned texts (books and journals). I will explain the various techniques I have tried or am considering, as well as (hopefully) some preliminary results.

The Seminar will resume on January 23.
