School of
Information Management & Systems
Previously School of Library & Information Studies
296a-1
Seminar: Information Access.
("The Friday Afternoon Seminar")
Summaries
Spring 2001. Fridays 3-5. 107 South Hall.
Schedule.
Friday Jan 19:
Clifford LYNCH: Introduction.
1. Introduction to Seminar plans for spring;
2. Updates and comments on recent meetings: HICSS, IETF, and others;
3. Discussion of reputation management.
Jan 26: Niels Windfeld LUND: From Information Science to Documentation.
Abstract:
"The title is inspired by Irene Farkas-Conn's title,
'From Documentation to Information Science.'
Do we need to consider the physical aspects of documents in the Age of Cyberspace?
My preliminary answer is Yes. To me it is more important than ever before to consider the
interrelationship between the physical, social, and cognitive aspects of
documents and the roles they play. For my forthcoming book I am trying
to develop an analytical framework for studies of these issues."
Note:
Niels Windfeld LUND is Visiting Professor in the School for the Spring
semester. He is completing a book on the multiple ways in which
records (documents, texts, images, etc.) are used in contemporary
society. He is conducting a Seminar: Documents in Society
on Tuesday afternoons. Prof. Lund comes from
the Institute for Documentation Studies, University
of Tromsoe, Norway, "The world's northern-most university".
Feb 2: Heath O'CONNELL and Pat KREITZ, SLAC, Stanford: eConf: An Online Conference
Proceedings Site.
The eConf Electronic Conference Proceedings site has been developed to
facilitate scientific communication and reduce the cost to libraries of
buying, cataloging and warehousing conference proceedings. We would like
to show the current system and discuss what could be done to extend this
model beyond the physics community.
Feb 9: Fred GEY: Evaluation of Entry Vocabulary: A
Technology to Enhance Digital Search.
For the past three years the Metadata research program at SIMS has been
developing advanced search technologies to improve search across diverse
genres of digital objects -- documents, patents, multilingual retrieval,
numeric data and images. The technology leverages human indexing of
objects in specialized domains to provide increased accessibility to
non-expert searchers. Thus far the evidence of the technology's benefits
has been anecdotal. The program is about to embark, under supplemental funding
from DARPA, on a comprehensive evaluation of what does and doesn't work
with entry vocabulary software and data. Our ideas for three approaches
to evaluation will be described -- TREC-like compartmentalized evaluation
without human intervention, evaluation of search for numeric statistical
data by human subjects, and evaluation through web-based session logging
and attempting to determine 'proxies' for relevance. The description
will start with a review of our research program and its accomplishments.
The Metadata research program website is at
www.sims.berkeley.edu/research/metadata/
Also: Michael BUCKLAND: The Effect of Dialects on
the Use of Indexes.
The scope ("domain") covered by large bibliographic or textual
databases usually includes several specialized topical areas
("subdomains"). Each topical area reflects the work of a community of
specialists. These communities evolve their own specialized discourse and
vocabulary: different terms and specialized meanings of other terms. The
obvious approach is to create a single dictionary for the target
database as a whole. But searches are usually concerned with a specialized
topic within a database. So...
1. Should search support be customized
to each subdomain?
2. Would that lead to better (more specialized)
search terms?
3. Would that lead to different, better search results?
Specialized dictionaries (entry vocabulary modules) have been created for
specialized topics (subdomains) in order to find out.
Preliminary analyses indicate substantial differences in the
choice of metadata terms and in the retrieval results.
This work and possible future developments will be briefly summarized.
Some earlier technical notes are at
http://www.sims.berkeley.edu/research/metadata/subvocab.html.
Feb 16: Howard BESSER, UCLA:
Longevity of Electronic Art.
This talk explores the problems of maintaining accessibility to electronic
works of art over time. It examines the various hardware and software
issues surrounding digital longevity, then discusses the special
characteristics of electronic art that make it much more problematic to
preserve than more conventional types of works. Finally, the speaker
offers up a new paradigm for approaching preservation of these types of
works, and suggests some concrete/pragmatic steps that can be taken to
preserve this type of material.
Feb 23: David M. LEVY, U of Washington: The "LC21:
A Digital Strategy for the Library of Congress" report.
I was a member of the National Research Council committee that recently
published its digital strategy for the Library of Congress. In this talk, I
will describe some of the report's principal findings and recommendations.
(David Levy was a researcher at Xerox PARC for nearly twenty years. He is
currently a visiting professor in the Information School of the University
of Washington.)
The report is available for sale or online; see:
http://books.nap.edu/catalog/9940.html
Also, anyone wanting to borrow a copy of the report can contact
Michael Buckland.
Joseph A. BUSCH, Content Intelligence Evangelist, Interwoven, and
2001 President, American Society for Information Science and Technology:
Helping people find content...preparing content to be found: Enabling the
semantic Web.
Anyone who has spent time searching for information on the Web or at a Web
site knows how frustrating the experience can be. More often than not the
search returns zero hits, or thousands of hits that must be further sifted
manually. Tim Berners-Lee, inventor of the Web, and Dagobert Soergel, a
professor of Library and Information Science, share a vision for the future
that they call the semantic Web or SemWeb. This vision provides the lingua
franca--XML, and the Rosetta Stone--DTDs; but the Holy Grail--accurate
content automatically processed so that it can be easily found, remains out
of reach. Institutions that are authorities in vertical subject areas have a
unique opportunity to be major players in transforming content roulette into
successful search experiences. My talk will discuss the concept of the
semantic Web, how it is being built, how organizations can participate in
building it, and how it is transforming the Web user experience today and
will continue to transform it in the future.
I will also provide an update on ASIST activities.
Mar 9: Liv Aasa HOLM, Oslo University College:
SOME R & D PROJECTS AT OSLO UNIVERSITY COLLEGE.
1. ONE-2 (OPAC Network in Europe - 2, 1998 - 2001).
This is the continuation of the European Union
project ONE which linked together 10 European database hosts via
the Z39.50 protocol and the ILL protocol. The user interface to this
"network" is either through the local library systems (integrated solution), via
stand-alone clients like ICONE or via the Web.
In ONE-2 we will concentrate on ILL and electronic document delivery.
What were the problems and how were they solved?
2. The Norwegian emigration to North America - a 175-year commemoration.
What were the problems in creating the Web site
www.nb.no/html/emigrasjon.html
and how were they solved?
3. The life and work of Alexander Kielland. A multimedia Web site:
www.kielland.org/LivVirke/. How to organize the different types of information.
Note:
Liv Holm has nearly thirty years' experience in digital
library research and development, first in Norsk dokumentdata,
a governmental program for improving automation in libraries, and later in
BRODD: the Research, Development and Consultancy department of the Norwegian
School for Library and Information Science.
She is now Associate Professor in the Faculty of Journalism and Library
and Information Studies of the Oslo University College,
Norway's largest institution of professional education,
established in 1994 when the Norwegian college system
was restructured and 18 smaller colleges in the Oslo area merged.
She is in Berkeley for the current semester and
was visiting scholar here in South Hall in 1992.
Mar 16: Short Reports by Lincoln CUSHING; Chan Jean LEE; Frederick
WONG; and Clifford LYNCH.
Lincoln CUSHING: The shifting paradigm: Collaborative collection-building and displaying
of large-format digital image collections on the Web.*
Until recently, libraries and archives built their collections the old-fashioned way - they bought them. This allowed institutions to fulfill their mission by expanding their holdings with only space and cost constraining them. However, as digital information begins to supplement and even replace conventional documents, the logic of this approach begins to break down. Digital technology and the Internet allow documents to be shared with multiple users simultaneously with no risk of damage or loss to the original artifact, thus dramatically transforming the ability of academic researchers and the curious public alike to better understand their world. Unfortunately, not all cultural materials are entering the digital tidal wave at the same rate. At risk of being lost in this transition are documents that are "difficult" (i.e., expensive) to digitize, which includes large-format documents on paper such as maps and posters. This paper will look at options for collaborative building of Web-accessible "union catalogs" of image-based collections.
Chan Jean LEE: How to organize products in e-commerce site.*
It is hard to search for products in e-commerce sites. Users' tremendously diverse search terms and manufacturers' arbitrary brand names rarely match. To solve this problem, users should be able to browse products very efficiently. Organizing products by facets is good, because the number of products can be narrowed down easily. It can also help users express their information needs by choosing terms used in the website.
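The narrowing-by-facets idea in the abstract above can be illustrated with a minimal Python sketch. It is not taken from the talk: the product records, the facet names (category, color, brand), and the helper functions `narrow` and `facet_counts` are all hypothetical.

```python
from collections import Counter

# Hypothetical product records; facet names and values are illustrative only.
PRODUCTS = [
    {"name": "Trail Runner", "category": "shoes", "color": "blue", "brand": "Acme"},
    {"name": "City Walker", "category": "shoes", "color": "black", "brand": "Bolt"},
    {"name": "Rain Shell", "category": "jackets", "color": "blue", "brand": "Acme"},
    {"name": "Wind Shell", "category": "jackets", "color": "black", "brand": "Acme"},
]


def narrow(products, **selected):
    """Keep only the products whose facets match every selected value."""
    return [
        p for p in products
        if all(p.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(products, facet):
    """Count the remaining products per value of one facet (the browse choices)."""
    return Counter(p[facet] for p in products if facet in p)


if __name__ == "__main__":
    remaining = narrow(PRODUCTS, category="jackets")
    print([p["name"] for p in remaining])
    print(dict(facet_counts(remaining, "color")))
```

Because the user picks from values the site already uses, no free-text term has to match a brand name; each selection simply shrinks the candidate set.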
Frederick WONG: Un-structural searching based on structural information.*
Most of the search engines today use massive indexing with keywords and metadata. Documents are returned based on their hit scores, computed from the number of occurrences of the words used in the search query. In a situation where users have only an incomplete idea of what they are searching for, the search engine usually returns irrelevant documents with high hit scores. This research project first tries to understand the state of the art in text pattern matching technology, with the idea of creating a search engine prototype (data model and a simple search engine) that uses structural information obtained from the user through a sequence of searches to obtain better search results. The idea is to use the user's search history to build a meaningful subject area that can be used together with the search keywords.
Clifford LYNCH: Meetings and Metadata Architecture.
I will do a brief trip report on a few meetings I have been to lately, and then (time permitting) I will talk a bit about the ARIST chapter Cecilia Preston and I are starting to work on covering metadata architecture and infrastructure.
Mar 23: Alice AGOGINO & Andy DONG:
Demonstrating the Core Integration System for the National SMET Education Digital Library.
For some years Professor Agogino has been deeply involved in the development of The National Engineering Education Delivery System (NEEDS), which developed a scalable infrastructure that allows engineering educators to locate and discuss digital learning resources and participate as a community of practice. NEEDS is now expanding to include a broader community of educators concerned with Science, Mathematics, Engineering, and Technology Education (SMETE), including a SMETE Digital Library (See
www.smete.org).
Two recent papers (in pdf format) are
"Using the National Engineering Education Delivery System as the Foundation for Building a Test-Bed Digital Library for
Science, Mathematics, Engineering and Technology Education" and
"Towards a Digital Learning Community for Engineering Education."
Mar 30: Spring Break: No seminar meeting.
Apr 6: John OBER, California Digital Library: Building an Infrastructure
for Digital Content Management.
The California Digital Library,
http://www.cdlib.org -- "a co-library of the
campuses of the University of California" -- finds
itself working with and making commitments for the management of an
increasing variety of digital objects and their metadata. Infrastructures
to support this management have tended to be program-based (e.g. for a
"government information" initiative) or format-based (e.g. "images"). We've
started discussions about commonality of purpose and whether
it is feasible to define and build a generalized infrastructure. I would
like to review the needs and goals, the informal process we're using for
investigation, and then get input from seminar participants about next steps.
Apr 13: Clifford LYNCH: Meetings and Metadata.
Two topics: Reports on recent meetings; and Metadata architecture and infrastructure.
Apr 20: Aitao CHEN: What can one do with 5 million catalog records?
A new development in the School's Metadata Research Program
is detailed analysis of a large set of library catalog records.
Statistical association techniques allow mapping between values in one field and values in
another field. For example, title words can be used to create multilingual
indexes to Library of Congress Subject Headings and such an index can be used to establish
mappings between databases.
See
http://otlet.sims.berkeley.edu/mevm.html
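As a rough illustration of mapping values in one field to values in another, the following Python sketch associates title words with subject headings by simple co-occurrence counting. It is only a sketch under stated assumptions: the sample records, the stopword list, and the scoring by summed conditional frequency are hypothetical stand-ins, not the statistical association measure actually used by the Metadata Research Program.

```python
from collections import Counter, defaultdict

# Hypothetical catalog records: (title, [subject headings]). Illustrative only.
RECORDS = [
    ("Lobster fishing in Maine", ["Lobster fisheries"]),
    ("The lobster fisheries of Norway", ["Lobster fisheries"]),
    ("Fishing boats of Norway", ["Fishing boats"]),
    ("Design of fishing boats", ["Fishing boats"]),
]
STOPWORDS = {"the", "of", "in"}


def build_index(records):
    """Count how often each title word co-occurs with each subject heading."""
    cooc = defaultdict(Counter)
    for title, headings in records:
        words = {w for w in title.lower().split() if w not in STOPWORDS}
        for word in words:
            for heading in headings:
                cooc[word][heading] += 1
    return cooc


def suggest(cooc, query, top=3):
    """Rank headings by the summed share of each query word's co-occurrences."""
    scores = Counter()
    for word in query.lower().split():
        counts = cooc.get(word)
        if not counts:
            continue  # word never seen in a title
        total = sum(counts.values())
        for heading, n in counts.items():
            scores[heading] += n / total
    return [heading for heading, _ in scores.most_common(top)]


if __name__ == "__main__":
    index = build_index(RECORDS)
    print(suggest(index, "norway boats"))
```

Titles in any language can feed the same counts, which is one way such an index could become multilingual, provided the records in each language carry the same subject headings.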
Fredric GEY: Report on the Human Language Technology conference,
www.hlt2001.org.
The first DARPA-sponsored Human Language Technology conference
was held March 18-21 in San Diego. The conference brought together
a wide variety of specialists across the spectrum of research in
human language technology, including:
speech recognition
machine translation
cross language information retrieval
text summarization
information extraction
question answering
I gave a demonstration of our Entry Vocabulary Technology
and I will give a personal overview of the conference. Come
learn why the answer to the question "Where do lobsters live?"
is "On the table".
Apr 27: Michael BUCKLAND and Ruth MOSTERN:
Metadata and Geo-temporal Analysis: New Ideas.
Michael Buckland:
Fredric Gey, Ray Larson, Aitao Chen, several students and others, and I
have been engaged in a "Metadata Research Program" since 1997.
The past twelve months have been particularly fruitful in new ideas,
mostly not yet published, and in proposals for future research.
We have been exploring how existing metadata (categorization, indexing, etc.)
could be made easier to use; how metadata-rich resources could
be used to support metadata-poor resources; how variations ("dialects")
in the vocabularies of the populations served can be used to
improve search effectiveness; how the performance of intermediaries
can be measured; and other topics.
I will provide an overview and explanation of these developments
-- and also talk about what we want to work on in the
next twelve months.
Ruth Mostern, Electronic Cultural Atlas Initiative,
will describe some of the intriguing ideas and possibilities in
the exceptional resources world-wide associated with the Electronic
Cultural Atlas Initiative. The emphasis in ECAI is on the opportunities
for new insights when it becomes possible
to examine culture and history with detailed attention to place
and time in data analysis and visualization.
There are exciting possibilities for collaboration between ECAI and the
Metadata Research Program.
May 4: Last Seminar Meeting of the Semester: Students' Projects.
Frederick WONG: Searching the Impossible?!*
To a computer science student, a search engine is basically keyword
matching in a giant meta-table. To end-users, a search engine is a place to
find what they are looking for, which has a far deeper meaning than that n x m
matrix. In this term project presentation, we will examine some of the
technologies being used in existing industrial search engines deployed in both
web-based and document-based environments; the lessons learnt; and a
discussion of a simple prototype "smart search helper" that combines some
ideas from existing technologies.
Chan Jean LEE: For Better Performance of Entry Vocabulary Indexes.*
1. Utilize the Internet as a metadata-rich resource.
Feeding terms and directories of websites into the Entry Vocabulary
Module (EVM) may improve the EVM's precision and recall.
This is because terms and directories of websites not only enrich the
term metrics of the EVM but are also more domain
specific.
2. Provide Relational Operators to connect different domains.
Users may want to find information across domains without knowing the
structure of each domain. For example, MeSH has a "disease" field and US PATENT has
an "invention" classification. Without knowing these details, users may want to search for
an "invention" that can be used to cure "a certain disease".
To join different fields effectively without knowing the details,
there is a need for "relational operators" so that users can specify the
relationship between query terms in the query (e.g. query = invention "USED
FOR" disease_name).
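One way to picture the proposed relational operator is the short Python sketch below, which parses a query of the form invention "USED FOR" disease_name into a structured cross-domain request. Everything beyond that query form is an assumption made for illustration: the `FIELD_MAP` table, the `parse_query` function, and the choice to treat the right-hand side as a disease name.

```python
import re

# Hypothetical mapping from a query role to (database, field). Illustrative only.
FIELD_MAP = {
    "invention": ("US PATENT", "invention classification"),
    "disease": ("MeSH", "disease"),
}
OPERATORS = {"USED FOR"}

# left-hand role, quoted upper-case operator, right-hand value
QUERY_RE = re.compile(r'^\s*(\w+)\s+"([A-Z ]+)"\s+(.+?)\s*$')


def parse_query(query, right_role="disease"):
    """Split 'left "OPERATOR" right' into a structured cross-domain query."""
    match = QUERY_RE.match(query)
    if not match:
        raise ValueError(f"not a relational query: {query!r}")
    left, operator, right = match.groups()
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator!r}")
    if left not in FIELD_MAP or right_role not in FIELD_MAP:
        raise ValueError("no field mapping for this query")
    return {
        "operator": operator,
        "left": {"role": left, "target": FIELD_MAP[left]},
        "right": {"role": right_role, "value": right, "target": FIELD_MAP[right_role]},
    }


if __name__ == "__main__":
    print(parse_query('invention "USED FOR" asthma'))
```

The point of the sketch is that the user states only the relationship; the lookup of which database and field each side belongs to happens behind the operator.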
Lincoln CUSHING:
Union Catalog Architecture and "Hunting-Gathering".*
Union catalogs are generally designed and built following two proven
models: adding new collections to existing databases or linking external
databases. They are based on the idea that the logical primary task is
to help patrons find what they are looking for. This goal necessitates
considerable institutional effort in standards promulgation (e.g.
Z39.50), thesauri, clever search engines, and data massaging. However,
in some ways institutions have been slow to adopt the full power of file
transfer possibilities that the HTTP protocol provides for materials on
the Web. Much has been done to optimize data storage and retrieval, but
these functions do not address all the needs of patrons.
"'Collaboration' in this context is the capability to publish the
hypermedia resource base locally and have it viewable globally, and the
capability to swiftly and easily transfer the hypermedia resources,
annotate them, and republish them on another site" (emphasis by
author).
I am suggesting that the practical needs of most web researchers are
not met without a viable method for dealing with the materials they find
after searching. This could be described as a Hunter-Gatherer model.
"Gatherer" describes the act of assembling found items, and describes
the actual human practice of building knowledge. No one goes to a search
engine, formulates a query, and assumes that the end result is all there
is to know. True research is built on iterative stages of searching
("hunting"), picking out relevant materials, and searching some more.
When dealing with unusual cultural materials, such as poster archives or
museum holdings, there will always be a limit to the consistency and
integration of information retrieved across different collections. The
practice of gathering is a natural style for doing research, and
expanding tools to help this gathering opens up a wide range of
opportunities for productive collaborative projects. The ArtsConnected
website, hosted by the Minneapolis Institute of Arts/Walker Art Center
(www.artsconnected.org/)
offers a very rich (though local)
Gatherer model, using a well-designed user interface to assemble and
present materials on the Web. I have built something similar at my Docs
Populi site, demonstrating how materials from two distinct databases can
be "joined" in a gathering operation. The next step is to expand this
approach to materials from other resources. Patrons would be able to
build classroom presentations or dissertation references based on
cataloged and digitized materials from URLs anywhere in the world.
Several technical and procedural approaches could easily lend themselves
to this sort of "item-by-item" gathering from different holdings.