School of Information Management & Systems
Previously School of Library & Information Studies
296a-1
Seminar: Information Access.
("The Friday Afternoon Seminar")
Summaries - Spring 2004.
Fridays 3-5. 107 South Hall.
Schedule.
Summaries will be added below as they become available.
Jan 23: Cliff LYNCH: Welcome and introduction to the Seminar.
Fredric GEY & Ray LARSON: Report on the HICSS: Hawaii International
Conference on System Sciences, Jan 2004.
www.hicss.org.
Michael BUCKLAND: "Intermediate infrastructure".
In a traditional, paper-based library, if you wanted
to find out about an unfamiliar topic, the reference collection
provided a rich environment for the early stages of inquiry. Dictionaries,
encyclopedias, biographical dictionaries, gazetteers and atlases, and
other reference tools help one with Who, What, Where, and When.
Bibliographies and the catalog indicate resources which might help
with How and Why.
Digital library services started with bibliographies
and catalogs, then moved to texts, images, and other primary resources.
How might we reconstruct in a digital library environment the "intermediate
infrastructure" provided by the conventional reference library?
Jan 30: Clifford LYNCH: Stewardship in the Digital Age.
The shift of much of our ongoing intellectual and cultural record
into digital format, and the capability to capture much of the past
record in digital form, gives rise to a vast number of new stewardship
issues. My first goal in this talk is to give a survey and overview of
the issues as they play out on three levels: the personal, the
organizational, and the national. I will then, as time allows, go into
more detail on several specific issues: storage models and replication;
issues raised by cyberinfrastructure or "e-science" developments; and
public policy issues. I'll highlight a number of open research questions.
I intend to continue exploring specific issues within this framework in
additional seminar presentations during the semester.
Feb 6: Jane WHITE, Director, International Children's Digital Library
www.icdlbooks.org
Background: The International Children's Digital
Library is a research project funded by the National Science Foundation.
The primary goal of the project is to study how children relate to
digital material. Currently, the library is showcasing around 300
books from around the world, with an additional 600 about to be
posted. These books represent over 26 different languages.
The library was launched in November 2002 at the Library of
Congress in Washington, D.C.
Scope of Project: The International Children's
Digital Library is dedicated to showcasing the best in children's
literature from around the world. In order to achieve this goal,
the ICDL is building relationships around the world that help
identify quality books reflective of each country’s outstanding
children’s literature, both historic and contemporary. In each
country, the ICDL identifies the organization(s) best
suited to accomplish these tasks. The ICDL believes that an
important collection is built on a strong selection process, and it
relies on partnerships with national libraries, IBBY representatives,
and other institutes for children’s literature. It is hoped that
through these collaborative relationships the ICDL collection
will continue to grow and reach its goal of 10,000 quality
children’s books in over 100 languages.
Who is using the site: Initial user analysis
has shown that the majority of ICDL users are parents or kids using
the ICDL at home. Second are teachers and librarians eager
to address the needs of increasingly diverse classrooms and
libraries. Most of ICDL's users are in the United States, Taiwan,
Canada, Hong Kong, or Europe. As the collection grows, use is
expected to increase in other countries.
Feb 13: AnnaLee SAXENIAN, Dean: Discussion.
Professor Saxenian became Dean of this School on February 1 and
will lead a discussion about the opportunities and challenges of developing
an interdisciplinary professional school at Berkeley, including
building a new program in the current economic and intellectual climate.
As background, see the Information Planning Group report that
recommended a new program, at
http://www.sims.berkeley.edu/about/history/proposal.html.
Feb 20: Clifford LYNCH and Michael BUCKLAND
Clifford LYNCH: Stewardship in the Digital Age (Continued).
Michael BUCKLAND: Infrastructure to Support Multimedia Search.
The broad adoption of digital technology leads to
optimistic ideas about "convergent media environments," but the
reality is complex. Images and text, for example, remain quite
different forms of expression in a digital environment, just as in a
paper environment. Searching across media directly appears
impossible in principle, but it can be done indirectly
using shared (or interoperable) metadata. The concepts and
terminology for analyzing these issues appear inadequate.
I will raise some questions and invite answers and suggestions.
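The indirect route described above, searching across media through shared or interoperable metadata rather than through the content itself, can be illustrated with a toy catalog. This is a minimal sketch under stated assumptions: the records, subject terms, and the `CROSSWALK` mapping between an image vocabulary and a text vocabulary are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str
    media_type: str      # "image" or "text"
    subjects: frozenset  # subject terms from the record's own vocabulary


# Hypothetical crosswalk mapping an image vocabulary's terms onto a text
# vocabulary's terms, so that the two collections become interoperable.
CROSSWALK = {"Bridges (structures)": "Bridges"}

# Hypothetical catalog: the image and the essay share no content features,
# only subject metadata that the crosswalk can reconcile.
CATALOG = [
    Record("photo-001", "image", frozenset({"Bridges (structures)"})),
    Record("essay-042", "text", frozenset({"Bridges", "Civil engineering"})),
    Record("map-007", "image", frozenset({"Harbors"})),
]


def to_shared_term(term: str) -> str:
    """Translate a term into the shared vocabulary (unchanged if unmapped)."""
    return CROSSWALK.get(term, term)


def search_by_subject(catalog, subject: str) -> list[str]:
    """Find records of any media type whose normalized subjects match."""
    target = to_shared_term(subject)
    return [r.identifier for r in catalog
            if target in {to_shared_term(s) for s in r.subjects}]


print(search_by_subject(CATALOG, "Bridges"))  # ['photo-001', 'essay-042']
```

The image is found by a text-vocabulary query only because its metadata maps onto the same term; nothing about the pixels or the prose is compared.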
Feb 27: Oliver GUENTHER, Institute of Information Systems,
Humboldt University, Berlin: Privacy in E-Commerce: Stated
Preferences vs. Actual Behavior.
In this talk, we present results from a large-scale online shopping
experiment. They suggest that, given the right circumstances, online
users easily forget about their privacy concerns and communicate even
the most personal details without any compelling reason to do so. This
holds in particular when the online exchange is entertaining and
appropriate benefits are offered in return for information revelation --
circumstances easily created by second-generation agent technologies and
embodied interface agents. Privacy statements have no impact on most
users' behavior. In concluding, we discuss some possible reasons for
this discrepancy between stated preferences and actual behavior. We also
suggest ways to help users better align their actions with their
goals. (Joint work with Bettina Berendt and Sarah Spiekermann).
Mar 5: Czeslaw GRYCZ, Octavo:
Octavo Ultra-High Resolution Digital Images of Rare Books:
A Survey of What Works and What Needs Work.
www.octavo.com.
Czeslaw Jan Grycz is CEO of Octavo, a company
specializing in ultra-high-resolution digitization (10,500 x 12,700 pixels)
of rare books, incunabula, and manuscripts. Seeking a niche in
the publishing market, Octavo also publishes a variety of materials
derived from the images.
On the one hand, the very high definition images
constitute surrogate digital facsimile files for a library wishing to
protect its valuable original works against damage, unnecessary
circulation, or over-handling. On the other hand, the digital
publications Octavo produces provide wider access to books that
would otherwise be difficult, if not impossible, to handle and study
in person.
As Octavo's collection grows, issues of file management, metadata,
color control, and proper visualization all become more complex.
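A quick calculation shows why file management grows hard at this resolution. The sketch assumes uncompressed 24-bit RGB (three bytes per pixel); that bit depth is an assumption, and Octavo's actual capture formats may differ.

```python
# Rough storage estimate for one uncompressed page image at the stated
# resolution. Assumption: 24-bit RGB, i.e. 8 bits per channel, 3 channels.
width_px, height_px = 10_500, 12_700
bytes_per_pixel = 3  # assumed bit depth, not taken from the abstract

pixels = width_px * height_px
size_bytes = pixels * bytes_per_pixel
print(f"{pixels:,} pixels, about {size_bytes / 1e6:.0f} MB uncompressed")
# 133,350,000 pixels, about 400 MB uncompressed
```

At roughly 400 MB per uncompressed page under this assumption, a single book of a few hundred pages approaches a hundred gigabytes before any derived publication files are produced.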
Chet came to UC Berkeley from Stanford University
Press. He worked at the University of California Press for 14 years
as Design and Manufacturing Manager, and for Clifford Lynch in
the (then) "Division of Library Automation" for an additional six
years before taking early retirement. He will share with us a
survey of Octavo's accomplishments during 2003, tell us about
some projects underway in 2004, and identify some of the
most pressing challenges of 2004. He will also hypothesize about
the future of digitization and digital publication, given ongoing
changes in technology and pedagogy.
March 12: John McCARTHY, Steve LUSSIER, Henri POOLE, Dan ROBINSON
and Phil WOLFF: Digital Democracy:
Report on a recent conference and notes from the trenches.
We'll recap highlights from presentations, as well as
informal conversations with participants, at the
Digital Democracy Teach-In
on Feb 9 in San Diego, which brought together "pioneers who are re-inventing
democracy for our networked world,"
including Joe Trippi, political consultant and former manager of the Dean
campaign; Wes Boyd, co-founder of MoveOn.org; and Scott Heiferman,
founder of Meetup.com.
We'll also give our personal perspectives on some
successes and failures of information technology in this year's
election cycle, along with discussion of where these developments
may lead in the next few years.
John McCARTHY has been working on databases
at LBL since 1980 (and elsewhere for 15 years before that) and has
participated regularly in the Friday Afternoon Seminars over the years.
Having returned to volunteer political activities in June after a
thirty-year break, he has worked with MoveOn.org, helped organize
Tech4Dean (volunteer computer professionals working for Howard Dean's
Presidential campaign), and is now working to bring together volunteer
computer professionals from the various Democratic Presidential campaigns.
Steve LUSSIER is a recent MIMS graduate.
Henri POOLE is
technical advisor to Dennis Kucinich, Free Software Foundation Board
member, and former CEO of Mandrakesoft.
Dan ROBINSON
is co-founder and CTO of the E-Volve Foundation, a strategic think-tank
that works with non-profits and public interest groups to create
definitive models for technology-enabled organizing; he helped set
up and run the East Bay for Dean web site.
Philip WOLFF hails from Oakland, California. In the past year he
presented at the ProjectWorld conference (project blogging), BlogTalk
Wien (the future of blogging), and BloggerCon (blogging behind the
firewall). He posts regularly to a klog apart and Blogcount, and in
moments of apoplexy to Don't Blog: Blogging the Weblog Backlash.
Phil has been blogging for 5 years and computing for more than 30 years.
He is a marketing and technology veteran of the Naval Supply Systems
Command, Gateway, Compaq, Wang Laboratories, Bechtel National, and
Adecco SA, where he served as global VP for strategy and technology.
When Phil isn't helping companies rethink their employment sites, his
Evanwolf Group helps them develop strategies, plans, and technologies
for workplace blogging.
Phil created and organized the eastbaykerry.com web site and the
KerryTech email group.
Mar 19: Daniel GREENSTEIN, University Librarian and Executive Director,
California Digital Library: Economics of Scholarly Publishing.
Mar 26: No seminar - Spring recess.
Apr 2: Michal FELDMAN: Economic Incentives for Cooperation in
Peer-to-Peer Networks.
Peer-to-peer (p2p) systems enable resource sharing between individual peers, who are expected to voluntarily contribute their own resources to the system. However, contributing consumes a peer's resources and may impair its own welfare. Therefore, many users prefer to “free ride” on the system’s resources: consuming them without providing any of their own. The inherent tension between individual rationality and collective welfare produces a misalignment of incentives, which threatens to degrade system performance. In this talk, I will give an overview of the problem and discuss several projects I’m involved in that attempt to develop a framework for understanding the technical and economic characteristics of p2p systems and to design economic mechanisms for incentive-compatible p2p applications.
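The tension between individual rationality and collective welfare can be made concrete with a textbook linear public-goods game. This is a generic illustration, not a model from the talk; the number of peers `n`, per-contribution benefit `b`, and cost `c` are hypothetical values chosen so that b/n < c < b.

```python
from fractions import Fraction


def payoff(contributes: bool, others_contributing: int, n: int,
           b: Fraction, c: Fraction) -> Fraction:
    """Payoff to one peer in a linear public-goods game.

    Each contributor pays cost c and adds benefit b to a shared pool;
    the pool is split equally among all n peers, free riders included.
    """
    contributors = others_contributing + (1 if contributes else 0)
    share = contributors * b / n
    return share - (c if contributes else 0)


# Hypothetical parameters satisfying b/n < c < b.
n, b, c = 10, Fraction(5), Fraction(1)

# Whatever the other peers do, free riding pays more for the individual...
for k in range(n):
    assert payoff(False, k, n, b, c) > payoff(True, k, n, b, c)

# ...yet universal contribution beats universal free riding.
print(payoff(True, n - 1, n, b, c), payoff(False, 0, n, b, c))  # 4 0
```

Contributing changes a peer's own payoff by b/n - c, which is negative here, so free riding is individually rational even though everyone contributing yields b - c for each peer. Incentive mechanisms aim to change that per-peer arithmetic.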
Apr 9: Rob SANDERSON, Liverpool Univ.:
A Discussion of the SRW 1.1 (Search and Retrieve for the Web) Protocol.
SRW 1.1, the first stable version of the ZiNG initiative's
Search/Retrieve Web Service, was released in mid February with significant
improvements over version 1.0. It is an XML-oriented protocol designed as
a low-barrier-to-entry alternative to Z39.50. This discussion will
quickly cover the basics of SRW and how it compares to Z39.50, before
going over the new aspects of the protocol in more detail.
More at http://srw.cheshire3.org/
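SRW carries its searchRetrieve request inside a SOAP envelope. Its companion binding, SRU, sends the same parameters as a URL query string, which makes the request structure easy to see. The sketch below builds such a URL; the endpoint `http://example.org/sru` is hypothetical, and the `dc` record schema name is an assumption about what a given server supports.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str, maximum_records: int = 10,
                   start_record: int = 1, record_schema: str = "dc") -> str:
    """Build an SRU-style searchRetrieve request URL (GET binding).

    SRW carries these same parameters in a SOAP message; the URL form is
    used here only because it is easier to read.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,             # query expressed in CQL
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,  # assumed schema name; server-dependent
    }
    return f"{base_url}?{urlencode(params)}"


url = sru_search_url("http://example.org/sru",  # hypothetical endpoint
                     'dc.title any "digital library"', maximum_records=5)
print(url)
```

Keeping the request as a handful of named parameters over HTTP, instead of a stateful binary session, is much of what makes the protocol a lower barrier to entry than Z39.50.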
Rob Sanderson is the Senior Editor for SRW and recently graduated with a
PhD from the University of Liverpool. He works on the Cheshire project,
currently implementing a new distributable version of the server. More at
http://www.o-r-g.org/~azaroth/
Apr 16: Two topics building on previous discussions:
Aitao CHEN & Michael BUCKLAND:
Infrastructure for Disambiguating Entities and Events.
Natural language processing is being used to detect and disambiguate
named entities and events: the extraction of named entities (person,
place, organization, and institution) from texts; the detection of
relations between named entities; the disambiguation of place names;
and the translation of foreign named entities into English. How might
traditional reference tools, such as gazetteers and chronologies, be
used to improve performance?
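One way a gazetteer can help with place-name disambiguation is to let co-occurring places vote on which candidate is meant. The sketch below uses a hypothetical four-entry gazetteer and a simple country-overlap heuristic. Both are assumptions for illustration, not the method discussed in the seminar.

```python
from collections import Counter

# Hypothetical mini-gazetteer: place name -> list of (name, country) candidates.
GAZETTEER = {
    "Paris": [("Paris", "France"), ("Paris", "United States")],
    "Lyon": [("Lyon", "France")],
    "Dallas": [("Dallas", "United States")],
}


def disambiguate(place_names: list[str]) -> dict[str, tuple[str, str]]:
    """Resolve each place name to one gazetteer entry using context.

    Heuristic (an assumption): prefer the candidate whose country is most
    often supported by the *other* place names in the same text; ties keep
    the gazetteer's listed order. Names absent from the gazetteer are skipped.
    """
    resolved = {}
    for name in place_names:
        candidates = GAZETTEER.get(name, [])
        if not candidates:
            continue
        context = Counter(
            country
            for other in place_names if other != name
            for _, country in GAZETTEER.get(other, [])
        )
        resolved[name] = max(candidates, key=lambda cand: context[cand[1]])
    return resolved


print(disambiguate(["Paris", "Lyon"])["Paris"])    # ('Paris', 'France')
print(disambiguate(["Paris", "Dallas"])["Paris"])  # ('Paris', 'United States')
```

A real gazetteer adds coordinates, feature types, and population, all of which could feed richer scoring than this country vote.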
Jeanette ZERNEKE & Michael BUCKLAND:
Redesigning Scholarly Publishing - Part 1.
Dan Greenstein argued persuasively on March 19 that the present system
of scholarly publishing is broken and that new methods need to be designed and tested.
We will take up this challenge on April 16 and 23.
What constitutes good practice in e-publications that go beyond
static documents to include dynamic content? How can scholars get credit for the
additional work it takes to create good digital projects? How can these efforts
be preserved over time? What institutional changes are indicated? Major issues
include practices for peer reviews of electronic publications, IT architecture and
data formats for electronic publications, how to incorporate distributed internet
data, and persistence of dynamic, interactive publications.
Some years ago the Electronic Cultural Atlas Initiative, in
collaboration with the California Digital Library, developed an electronic
publications program to provide stable, long-term access to peer reviewed,
map-based digital scholarship in history and the humanities.
These publications include text components, web resources such as images,
web-based maps, and fully interactive downloadable maps. Standards and processes
for creating the publications include conducting peer reviews of both their
technical architecture and scholarly content. The first ECAI publications are
A Sasanian Seal Collection in Context and Mapping the Mainline:
Using Historical GIS to Study American Religion.
See http://ecai.org/projects/epublications.html.
What would be the optimal strategy for a renewed e-publication
program that would be an effective contribution to the wider
problem? Come and help plan such a program.
Apr 23: Vivien PETRAS: The Use of Specialized Vocabulary in Subject Searches
The language problem in information retrieval is in essence a problem of
expressing a vague information need in a query processable by an
information retrieval system. Search uncertainty arises because a searcher
might not know how to state the information need as a textual
query; might use language that does not match the language used to
describe the concept in the document set; or might use terms (words
or category codes) that are poor choices for finding the most
relevant documents, or even documents that are mostly relevant.
Entry Vocabulary Indexes match searcher language (terms that the searcher
thought of for an initial query) with controlled vocabulary terms from the
document set in order to improve query formulation. Two main advantages
arise. Presenting the searcher with terms from the system's vocabulary
that describe the information need (based on the initial query) supports
query formulation and expansion without additional learning effort by the
searcher, and it increases the probability of successful retrieval
because more search-effective controlled vocabulary terms are used.
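The core of an Entry Vocabulary Index, learning which controlled terms tend to accompany the words searchers actually type, can be sketched from a small set of already-indexed documents. The training records are hypothetical, and plain conditional co-occurrence frequency, P(term | word), stands in for the more refined association statistics a production index would likely use.

```python
from collections import Counter, defaultdict

# Hypothetical training records: free-text words from each document paired
# with the controlled vocabulary terms an indexer assigned to it.
TRAINING = [
    ({"car", "engine", "emissions"}, {"Automobiles", "Air pollution"}),
    ({"car", "safety", "crash"}, {"Automobiles", "Traffic accidents"}),
    ({"smog", "emissions", "city"}, {"Air pollution"}),
]


def build_evi(records):
    """Map each free-text word to controlled terms scored by P(term | word)."""
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for words, terms in records:
        for w in words:
            word_counts[w] += 1
            pair_counts[w].update(terms)
    return {w: {t: n / word_counts[w] for t, n in pair_counts[w].items()}
            for w in word_counts}


def suggest(evi, query, k=2):
    """Rank controlled terms for a free-text query by summed association."""
    scores = Counter()
    for w in query.lower().split():
        for term, score in evi.get(w, {}).items():
            scores[term] += score
    return [term for term, _ in scores.most_common(k)]


evi = build_evi(TRAINING)
print(suggest(evi, "smog emissions"))  # ['Air pollution', 'Automobiles']
```

The searcher types ordinary words and is offered the system's own subject terms, which is the query-formulation support the abstract describes. Restricting the training records to one research specialty would be one way to build the specialty-specific indexes proposed below.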
I will describe the idea of using research specialties to specify search
spaces within a general document set and the idea of using the specialized
technical language of specialties to distinguish between specialties in an
information retrieval system. I propose to show how the specialized
language used in the searcher's specialty can be used to improve
Entry Vocabulary Indexes and make this process even more precise.
Apr 30: Mikhail AVREKH: FreeDB and Other Music Metadata Providers: The
Hidden Linchpins of the P2P Phenomenon.
I will discuss the metadata infrastructure that supports p2p networks, as
well as some historical antecedents having to do with providers of
metadata in other fields (e.g. the library community).
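FreeDB, like CDDB before it, keys its records on a disc ID computed from a CD's table of contents, so a client can look up track metadata without any audio analysis. The sketch follows the widely published CDDB disc-ID calculation. Track offsets are in CD frames (75 per second) and include the usual 150-frame lead-in; the sample offsets are invented for illustration.

```python
FRAMES_PER_SECOND = 75


def digit_sum(n: int) -> int:
    """Sum of the decimal digits of n."""
    total = 0
    while n > 0:
        total += n % 10
        n //= 10
    return total


def cddb_disc_id(track_offsets: list[int], leadout_offset: int) -> str:
    """Compute a CDDB/FreeDB-style disc ID as 8 hex digits.

    track_offsets: start of each track in frames (absolute, incl. lead-in).
    leadout_offset: start of the lead-out area in frames.
    """
    starts = [off // FRAMES_PER_SECOND for off in track_offsets]
    checksum = sum(digit_sum(s) for s in starts) % 255
    total_seconds = leadout_offset // FRAMES_PER_SECOND - starts[0]
    disc_id = (checksum << 24) | (total_seconds << 8) | len(track_offsets)
    return f"{disc_id:08x}"


# Invented two-track disc: tracks start at 2 s and 202 s, lead-out at 402 s.
print(cddb_disc_id([150, 15150], 30150))  # 06019002
```

Because the ID depends only on track layout, different discs can collide and one disc can receive duplicate submissions. Those are the kinds of metadata-quality issues the library community also knows well.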
Also Merrilee PROFFITT, RLG: RedLightGreen.
One year ago, RLG was preparing to launch RedLightGreen, a free online
service based on the records in the RLG Union Catalog, aimed at college
undergraduates and optimized to provide access to a wealth of high
quality, trusted, print resources through a simple, easy-to-use interface.
Last spring, we gave a presentation in this forum that highlighted use of
FRBR, MARC in XML, data mining, user studies, and future directions. Now,
with a full semester and more of academic trial use, further user studies,
and with continued funding from the Andrew W. Mellon Foundation, RLG will
report on:
• Who's using the system, and how?
• Further findings from extended user studies, and how user
studies have specifically influenced interface design and helped dictate
future directions for the service
• How institutions can join an expanded partnership for
RedLightGreen - for free
• Planned future directions for the service
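The FRBR idea mentioned above, grouping many catalog records (manifestations) under the single work they embody, can be approximated by clustering on a normalized author/title key. This is a simplified sketch with hypothetical records and normalization rules; it is not RLG's actual clustering algorithm, which the abstract does not describe.

```python
import re
from collections import defaultdict

# Hypothetical catalog records: two editions of one work plus a second work.
RECORDS = [
    {"id": 1, "author": "Austen, Jane", "title": "Pride and Prejudice"},
    {"id": 2, "author": "AUSTEN, JANE.", "title": "Pride & prejudice /"},
    {"id": 3, "author": "Austen, Jane", "title": "Emma"},
]


def normalize_heading(text: str) -> str:
    """Lowercase, unify '&' with 'and', and drop punctuation and extra spaces."""
    text = text.lower().replace("&", "and")
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def cluster_works(records):
    """Group records sharing a normalized author/title key into one 'work'."""
    works = defaultdict(list)
    for rec in records:
        key = (normalize_heading(rec["author"]), normalize_heading(rec["title"]))
        works[key].append(rec["id"])
    return dict(works)


for key, ids in cluster_works(RECORDS).items():
    print(key, ids)
```

For an undergraduate audience, a work-level display lets a search for a title return one entry with its editions grouped beneath it, instead of a long list of near-duplicate records.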
May 7: John WIECZOREK and John McCARTHY:
Global Biodiversity Information Facility: Information Retrieval
from Federated Databases
The Global Biodiversity Information Facility Network
(GBIF) has designed and begun to deploy a
network architecture "to enable users throughout the world to
discover and put to use vast quantities of global biodiversity data,
thereby advancing scientific research in many disciplines, promoting
technological and sustainable development, facilitating the
equitable sharing of the benefits of biodiversity, and enhancing
the quality of life of members of society." A recent review
(by John McCarthy) said that "GBIF is well on its way toward
becoming one of the premier examples of a successful federated
database network. Moreover, they have done so thus far on a
remarkably modest budget by using widely used modern software,
protocols and standards."
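The federated pattern behind a network like GBIF sends one query to many independent data providers and merges whatever comes back, tolerating nodes that are offline. The sketch uses local functions as hypothetical stand-ins for remote providers. It does not reproduce the DiGIR protocol or its message formats.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical data providers; in a real network each would be a remote
# service speaking a shared protocol, not a local function.
def museum_a(taxon):
    return [{"provider": "A", "taxon": taxon, "locality": "Berkeley"}]


def museum_b(taxon):
    return [{"provider": "B", "taxon": taxon, "locality": "Oaxaca"}]


def museum_c(taxon):
    raise TimeoutError("provider offline")


def federated_search(providers, taxon, timeout=5.0):
    """Send one query to every provider concurrently and merge the results.

    Providers that fail or time out are recorded and skipped, so a single
    unavailable node does not block the whole search.
    """
    records, failures = [], []
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        futures = {pool.submit(p, taxon): p.__name__ for p in providers}
        for future, name in futures.items():
            try:
                records.extend(future.result(timeout=timeout))
            except Exception as exc:  # skip the failed provider, note why
                failures.append((name, type(exc).__name__))
    return records, failures


records, failures = federated_search([museum_a, museum_b, museum_c], "Sorex ornatus")
print(len(records), failures)  # 2 [('museum_c', 'TimeoutError')]
```

Querying providers in parallel keeps response time close to that of the slowest responding node, while the failure list lets a portal report partial results honestly.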
John WIECZOREK is project programmer for
the Mammal Networked Information System
(MaNIS) at UC Berkeley's
Museum of Vertebrate Zoology (MVZ
www.mip.berkeley.edu/mvz/).
He is co-architect and
implementor of the Distributed Generic Information Retrieval
(DiGIR) protocol and software
and a member of the GBIF Data Access and Database
Interoperability (DADI), Digitisation of Natural History
Collections (DIGIT), and Science (SCI) working groups.
John McCARTHY has been working on
databases at LBL since 1980 (and elsewhere for 15 years
before that); he has participated regularly in the
Friday Afternoon Seminar over the years.
May 7 is the last Seminar meeting of the semester.
Aug 13: Julian Warner, Belfast:
A labor theoretic approach to information retrieval
In the post-Edenic world, we are condemned to labor
and compelled to choose.
Labor has often been conceived as physical rather than mental labor and has
seldom been connected with choice. Yet in information retrieval systems,
mental labor and choice can be seen to converge. Examining their
convergence can yield an evaluative model for information retrieval, closely
linked to real world forces and practice.
The essential aim for information retrieval systems is taken as selection
power rather than the transformation of a query into a set of relevant
records. Selection power is regarded as the product of selection labor.
Selection labor is taken to be composed of description labor and search
labor.
A certain quantity of selection labor, associated with the number and
variety of objects described within a system, is assumed. Components of
selection labor can be transferred to information technologies and the
distribution of selection labor between description and searching can be
modified, but the overall quantity of labor cannot be reduced. Description
labor is distinguished from description processes, and, more sharply, from
description products (such as catalog or database records).
Two significant constraints can work against enhancing selection power.
First, the costs of direct human labor in description and searching may lead
to a preference for economy in the use of that labor. Secondly, there are
epistemological difficulties in representation of objects. These
constraints have tended to be considered separately, with more attention
given to epistemological issues. Constraints arising from the costs of
human labor may have been more influential on practice.
A decision framework for the consideration, design, and use of information
retrieval systems is then constructed: description labor should be
increased, and selection power enhanced, until the costs of that description
labor approach the anticipated costs of search labor.
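Read literally, the decision rule says: keep adding description labor while its cost stays at or below the anticipated cost of search labor. The sketch below applies that stopping rule to two cost curves. The curves and their units are hypothetical, and the talk does not specify any functional form.

```python
def choose_description_level(description_cost, search_cost, max_level):
    """Raise description effort while its cost stays within search cost.

    A literal reading of the stopping rule: move to the next level of
    description only if its cost does not exceed the anticipated cost of
    search labor at that level.
    """
    level = 0
    while (level < max_level
           and description_cost(level + 1) <= search_cost(level + 1)):
        level += 1
    return level


# Hypothetical cost curves (arbitrary units): description cost grows with
# effort; anticipated search cost falls as records become richer.
def describe(level):
    return 4 * level


def search(level):
    return 40 / (1 + level)


print(choose_description_level(describe, search, max_level=10))  # 2
```

Raising the cost of human description (a steeper `describe` curve) lowers the chosen level. This matches the predicted shift of labor and expertise from describers to searchers in many public systems.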
The implications of this decision framework for information retrieval
systems are considered. A reduction in human description work and a
transfer of labor and expertise to the searcher, for many public systems,
are both predicted and exemplified.
The framework established here demonstrates the analytic and
practical value of a revealing theory.
Julian Warner is in the School of Management and
Economics, The Queen's University, Belfast, and was Visiting Scholar
here in 1991/92.
Aug 22: To be announced.
The Seminar will resume in the Fall semester starting Friday September 3.