School of Information Management & Systems
Previously School of Library & Information Studies
296a-3
Seminar: Information Access.
("The Friday Afternoon Seminar")
Summaries
Fridays 3-5. 107 South Hall.
Schedule.
Fri Jan 21:
Clifford LYNCH: Introduction. Report on HICSS:
Hawaii International Conference on System
Sciences, Jan 2000. Also:
Authenticity and integrity in a digital environment.
Jan 28: Brett BUTLER, INFOUR Intellectual Property Development, Foster City:
AnswerBase and how we restructure a query to
get closer to users' needs - and get an answer, not just a
citation or a text.
Brett Butler, founding President of Information Access, established
Magazine Index and, later, Infotrac, pioneering electronic indexing
services, using Library of Congress Subject Headings and an OPAC-like
structure to build a service that became the largest access product line
in the U.S.
Now he is taking another look at access, starting a company based on the
premise that we have interposed too many indexing and browsing tools
between the patron's question and the target answer. AnswerBase will be a
reference database that links queries directly with specific answers using
traditional library classification and other structures in non-traditional
ways.
The service will also be collaborative, enabling libraries to capture
questions as they are asked and submit answers to a central, editorially
reviewed database - a first for reference publishing. Traditionally,
information flows only from publisher to library.
He will address the impact of a truly interactive query and response
system on library practices and on organization for information delivery.
Feb 4: Fredric GEY & Ray LARSON: TREC-8: Report on the 8th Text REtrieval
Conference.
TREC, the Text REtrieval Conference, has been conducted for the past
8 years by the National Institute of Standards and Technology (NIST)
with support from the Defense Advanced Research Projects Agency (DARPA).
The Text Retrieval Conference (TREC) workshop series encourages research
in information retrieval from large text applications by providing a
large test collection, uniform scoring procedures, and a forum for
organizations interested in comparing their results. Now in its ninth
year, the conference has become the major experimental effort in the field.
Participants in the previous TREC conferences have examined a wide variety
of retrieval techniques, including methods using automatic thesauri,
sophisticated term weighting, natural language techniques, relevance
feedback, and advanced pattern matching. Other related problems such as
cross-language retrieval, retrieval of recorded speech, and question
answering have also been studied. Details about TREC can be found
at the TREC web site, http://trec.nist.gov.
TREC focuses on a number of specific retrieval tasks in a set of "tracks".
Below is a brief summary of the tasks.
Complete descriptions of tasks
performed in previous years are included in the Overview papers in each
of the TREC proceedings (in the Publications section of the web site).
The central task for all past TRECs has been the Ad Hoc retrieval task,
attempting to find the relevant documents in a fixed database.
In addition there are a number of more specialized tracks:
- Cross-Language Track -- a track that investigates the ability of
retrieval systems to find documents that pertain to a topic
regardless of the language in which the document is written.
- Filtering Track -- a task in which the user's information need is stable
(and some relevant documents are known) but there is a stream
of new documents. For each document, the system must make a
binary decision as to whether the document should be retrieved
(as opposed to forming a ranked list).
- Interactive Track -- a track studying user interaction with text retrieval systems.
- Query Track -- a track designed to foster research on the effects of
query variability and analysis on retrieval performance.
- Question Answering Track -- a track designed to take a step closer
to *information* retrieval rather than *document* retrieval.
For each of a set of 500 questions, systems produce a text
extract that answers the question.
- Spoken Document Retrieval Track -- a track that investigates the
effects of speech recognition errors on retrieval performance.
- Web Track -- a track featuring ad hoc search tasks on a document
set that is a snapshot of the World Wide Web.
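The contrast between the Ad Hoc task (a ranked list over a fixed collection) and the Filtering task (a yes/no decision for each document in a stream) can be sketched in a few lines of Python. The term-count scoring function, the threshold, the sample documents and all function names are illustrative assumptions, not TREC's evaluation procedure or any participant's system.

```python
from collections import Counter


def score(query_terms, text):
    """Toy relevance score (assumption): how often the query terms occur in the text."""
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in query_terms)


def ad_hoc(query_terms, collection):
    """Ad hoc style: rank every document in a fixed collection, best first."""
    return sorted(collection,
                  key=lambda doc_id: score(query_terms, collection[doc_id]),
                  reverse=True)


def filter_stream(query_terms, stream, threshold):
    """Filtering style: a yes/no decision for each arriving document, no ranking."""
    for doc_id, text in stream:
        yield doc_id, score(query_terms, text) >= threshold


# Hypothetical sample data.
collection = {"d1": "speech recognition errors",
              "d2": "web search engines search the web",
              "d3": "library catalog records"}
print(ad_hoc(["search", "web"], collection))  # ['d2', 'd1', 'd3']

stream = [("n1", "a new web page"), ("n2", "paper permanence")]
print(list(filter_stream(["search", "web"], stream, threshold=1)))
# [('n1', True), ('n2', False)]
```

The same scoring function drives both tasks; only the output differs, which is the point of the Filtering track's "binary decision as opposed to a ranked list."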
Feb 11: Jack L. XU, Senior Manager, Search Technology Group, Excite@Home Corp: Internet Search Engines: Real World IR Issues and Challenges.
The Excite Web search engine was created in 1996. As one of the industry's
leading engines, Excite Search currently indexes 250 million Web pages
from an initial database of over 920 million visited Web pages and supports
eleven languages, including Japanese, Italian, Spanish and Chinese.
It serves tens of millions of users per day and hundreds of queries per second,
with tens of terabytes of data, a database of indexed terms in the hundreds of
gigabytes, tens of Sun E4500 servers on the back end, and multiple data centers.
At that scale, nothing is easy.
Jack Xu joined Excite in 1996 as one of its founding researchers and
developers. He currently manages the Search Technology Group
within Excite@Home Corp. This talk discusses real-world IR issues (Web
collection, users, queries ...) and why the problems underlying Internet
search engines are challenging, illustrated with lessons
learned along the way in managing the Excite search engine.
Feb 18: Fred GEY, UCDATA; Steve LUSSIER; John McCARTHY, LBL & Frank OLKEN, LBL:
ISO/IEC 11179 metadata registries.
Report on the
Open Forum on ISO 11179 Metadata Registries, Jan 17-21, 2000,
in Santa Fe, New Mexico, US, the fourth in a series of
international conferences with participants from private
enterprise, government, academe and standards organizations exploring
the capabilities, uses, content, development and operation of metadata
registries, particularly those based on ISO/IEC 11179. Emphasis is on
managing the content (semantics) of data that is shared within
and between organizations or disseminated via the World Wide Web.
Feb 25: Richard GEIGER, S.F. Chronicle:
From "Morgue" to Electronic Publisher -- The Evolution
of Newspaper Libraries.
A survey of the many changes that have taken place in news libraries over the last two decades, and a look ahead to the
future, addressing such issues as subject access, format standardization, database software and vendor relations. The talk will also
discuss the effects of the Web and the global economy on news libraries.
Mar 3: Michel BIEZUNSKI, Infoloom, Paris, France:
The Topic Maps International Standard (ISO/IEC 13250:1999).
The Topic Maps International Standard (ISO/IEC 13250:1999) provides
a standard syntax for interchanging the information needed to support
collaborative creation and maintenance of finding aids such as indexes and
glossaries. Topic Maps permit such modeling information to be maintained
separately from the materials that are indexed. This presentation will give
an overview of the Topic Maps architecture, covering concepts and syntax,
and will present some applications currently under development.
Michel Biezunski is working as an independent
consultant. He specializes in SGML applications, and has worked
specifically on document architectures based on links within information
objects.
For an explanation of Topic Maps, see:
"Topic Maps" and
"Welcome to Topic Map Land."
"The new ISO standard ISO/IEC 13250 Topic Maps defines a model and
architecture for the semantic structuring of link networks. The basic
concepts of the standard are topics, occurrences of topics, and
relationships ("associations") between topics. A topic map in its
interchange form is an SGML (or XML) document (or set of documents) in
which different element types are used to represent topics, occurrences of
topics, and associations between topics."
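The three basic concepts quoted above (topics, occurrences of topics, and associations between topics) can be illustrated as a small in-memory data model. This is a hypothetical Python sketch of the model only: it does not implement the SGML/XML interchange syntax defined by ISO/IEC 13250, and the class names, field names and sample data are our own assumptions. Occurrences are stored as addresses pointing outside the map, reflecting the idea that the map is maintained separately from the indexed materials.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    names: list[str]
    # Addresses of resources that discuss this topic; the resources
    # themselves live outside the map.
    occurrences: list[str] = field(default_factory=list)


@dataclass
class Association:
    assoc_type: str
    members: dict[str, str]  # role name -> topic_id


class TopicMap:
    def __init__(self) -> None:
        self.topics: dict[str, Topic] = {}
        self.associations: list[Association] = []

    def add_topic(self, topic: Topic) -> None:
        self.topics[topic.topic_id] = topic

    def associate(self, assoc_type: str, **members: str) -> None:
        # Only allow associations between topics already in the map.
        for tid in members.values():
            if tid not in self.topics:
                raise KeyError(f"unknown topic: {tid}")
        self.associations.append(Association(assoc_type, dict(members)))

    def related(self, topic_id: str) -> list[str]:
        """Return ids of topics linked to topic_id by any association."""
        result = []
        for assoc in self.associations:
            if topic_id in assoc.members.values():
                result.extend(t for t in assoc.members.values() if t != topic_id)
        return result


# Hypothetical example data ("overview.sgml" is an invented address).
tm = TopicMap()
tm.add_topic(Topic("iso13250", ["ISO/IEC 13250", "Topic Maps"], ["overview.sgml"]))
tm.add_topic(Topic("sgml", ["SGML"]))
tm.associate("based-on", standard="iso13250", syntax="sgml")
print(tm.related("iso13250"))  # ['sgml']
```

Keying association members by role name ("standard", "syntax") mirrors the idea that an association relates topics in named roles rather than as an unlabeled pair.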
Mar 10: Progress reports:
-- Lincoln CUSHING: "Call for Paper": Paper permanency developments.
It is common knowledge in the library community that most printed
documents produced over the past 150 years are slowly deteriorating
because the paper is archivally unstable. The resulting loss of
knowledge has been serious enough that the ALA has described the
situation as "...a form of censorship." This report summarizes the
extent of the problem, reviews the proactive efforts made to improve the
quality of new materials being produced, and raises suggestions for
areas of further policy development.
-- Karthik IYER: The application of XML in the e-commerce arena.
Also a look at new technologies such as tuple spaces and the
Jini architecture, and the possible incorporation of XML into those
technologies.
-- Sridarshan KOUNDINYA: Ontologies.
What are they? Why are they interesting? Summary of definitions.
Various approaches taken by different researchers.
Research questions that intrigue me. How does this topic link my past background in public
policy with my current and future interest in information management?
Progress in clarifying the concept.
-- Kathryn KADA & Steve LUSSIER:
Environmental Informatics Portal Prototype.
We are taking a non-profit, "open source" approach to
support creators and users of environmental datasets, seeking both to
lower entry barriers to the field and to promote structured dialogue and
collaborative knowledge development among researchers.
A recent draft interface is up at
www.sims.berkeley.edu/~slussier/newmain2.html.
Mar 17: Patricia BREIVIK, Dean of the University Library, San Jose State U.:
Two Changing Faces of Libraries: Information Literacy
and Joint Libraries.
While concerns about America's digital divide
intensify, some librarians are aggressively confronting this challenge.
This presentation will explore examples of two very different approaches to
closing the gap between the haves and have-nots in our Information Society:
a $771.5 million project of San Jose State University and the City of
San Jose to build a joint library with integrated services, and the
growing impact of information literacy in education.
Mar 31: Spring Break.
Apr 7: Reports on recent developments:
- New project on "Translingual Information Management Using
Domain Ontologies";
- Web-Wise: Institute of Museum and Library Services conference for National
Library Leadership Grant recipients;
- DARPA TIDES kick-off meeting;
- Coalition for Networked Information Forum; and more!
Apr 14: John L. OBER, California Digital Library, UC Office of the President:
Applied Research and Technology Transfer for the California Digital Library.
John Ober, Director of Education and Applied Research
at the CDL, will
discuss the creative tension between immediate goals, available and
emerging technology, and the establishment of an applied research and
technology transfer agenda to address the mid- and long-term goals of the
CDL and its users. A tools and services "wish list," creation of strategic
partnerships, and organizational processes are all facets of the topic open
for discussion.
Apr 21: Clifford LYNCH: Collaborative filtering,
popularity based notification, and "how hits happen."
Apr 28: Ray LARSON: Cross-Domain Resource Discovery:
Integrated Discovery and Use
of Textual, Numeric and Spatial Data.
This talk will describe the International Digital Library project
sponsored by NSF and, in the UK, by JISC under the NSF/JISC International Digital Library Grant program.
The goals of this project are twofold:
1) Practical application of existing DL technologies to some
large-scale cross-domain collections.
2) Theoretical examination and evaluation of next-generation designs
for systems architecture and distributed cross-domain searching
for DLs.
The Participants:
* University of Liverpool
* Art and Humanities Data Service (http://ahds.ac.uk/)
* OTA (Oxford), HDS (Essex), PADS (Glasgow), ADS (York),
VADS (Surrey & Northumbria)
* Consortium of University Research Libraries (CURL)
* UC Berkeley Library
* Making of America II
* Online Archive of California
* Use in NESSTAR
For the first goal, we are implementing a distributed search system
based on international standards (Z39.50 and SGML/XML) called
"Cheshire II", which will be used for cross-domain searching. Databases
include the Arts and Humanities Data Service (AHDS), CURL (Consortium of
University Research Libraries), the Online Archive of California (OAC)
and Making of America II (MOA2).
The second goal will be addressed in the design, development, and
evaluation of the distributed information retrieval system
architecture, its client-side systems that aid the user in exploiting
distributed resources and in the design and evaluation of protocols
for efficient and effective retrieval in an internationally distributed
multi-database environment. (Cheshire III?).
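Both goals involve sending one query to several independent databases and combining what comes back. One common, simple way to merge such result lists is to rescale each database's scores to a shared range before interleaving them. The Python sketch below shows that min-max normalization approach; it is an illustrative assumption, not a description of how Cheshire II or Z39.50 merge results, and the database names, lambdas and scores are hypothetical stand-ins for remote servers.

```python
def normalize(results):
    """Min-max normalize one database's (doc_id, score) pairs to [0, 1]."""
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If every score is equal, treat all hits as equally good.
    return [(doc, (s - lo) / span if span else 1.0) for doc, s in results]


def federated_search(query, databases, k=10):
    """Send the query to every database, then merge the normalized hits."""
    merged = []
    for name, search in databases.items():
        for doc, s in normalize(search(query)):
            merged.append((s, name, doc))
    # Python's sort is stable (also with reverse=True): ties keep database order.
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return [(name, doc, s) for s, name, doc in merged[:k]]


# Hypothetical stand-ins for remote databases, each scoring on its own scale.
databases = {
    "ahds": lambda q: [("a1", 12.0), ("a2", 4.0)],
    "oac": lambda q: [("o1", 90), ("o2", 60), ("o3", 30)],
}
for hit in federated_search("maps", databases):
    print(hit)
```

Normalizing per database keeps one source with large raw scores from crowding out the others; more careful merging schemes weight each source by its estimated relevance, which this sketch does not attempt.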
We will be dealing with three types of data: 1) document databases with
information about various topics ranging from news reports and library
catalogue entries to full-text articles from academic journals
including text, images and multimedia elements, 2) Numeric statistical
databases which assemble facts about a wide variety of social,
economic, and natural phenomena and 3) Geographic databases derived
from geographic information systems, digitized maps, and other
resource types which have a georeferenced view of the geographic
features and boundaries including georeferenced information derived
from place names.
For CHESHIRE see
http://cheshire.sims.berkeley.edu
For this project see
http://cheshire.sims.berkeley.edu/proposal.html
May 5:
Sridarshan KOUNDINYA: Ontologies.
"Ontologies" are a topic of widespread discussion, but what is meant?
There is "Ontology" as a field of Philosophy and there are different kinds
of "AN ontology." The differences will be explained with special reference to
the use of "ontology" in the sense of a metadata language, such as a thesaurus.
Also
Lincoln CUSHING: Modern Industrial Papermaking and its Consequences
for Librarians.
An overview of why the paper used for producing documents over the past
150 years has had terrible consequences for librarians and archivists.
Technical and policy issues are explored, as well as suggestions for
future work.
May 12:
Karthik IYER:
XML Applications and related technologies.
There are various XML-based applications used for commercial purposes.
Some, like HotMetal server, are used for e-commerce applications.
I will also describe some of the XML technologies, such as XSL, XLink, XPointer, etc.
Also
Michael GEBBIE: Preliminary Research on Subdomain Indexes:
Special Vocabulary for Special Information Needs.
Conventional practice in indexing is to create a general index to the
entire database or corpus. But searchers are usually looking for
something within a specific topic. In the DARPA Metadata project we
have been experimenting with creating indexes based on a specialized sub-area
("subdomain") only, with striking results. A progress report.
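The summary does not say how the DARPA Metadata project builds its subdomain indexes. As a minimal illustration of the general idea only, the hypothetical sketch below indexes just the documents in one subdomain and compares a standard inverse document frequency (IDF) weight, a statistic we chose for the example, against the same weight computed over the whole corpus. A term's weight depends on the collection it is computed over: "trial" gets a small positive IDF in the general index but zero inside the medical subdomain, where every document contains it, while legal vocabulary such as "court" drops out of the subdomain index entirely.

```python
import math
from collections import defaultdict


def build_index(docs):
    """Inverted index: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def idf(index, n_docs, term):
    """Inverse document frequency, log(N / df); 0.0 for unseen terms."""
    df = len(index.get(term, ()))
    return math.log(n_docs / df) if df else 0.0


def subdomain(docs, labels, name):
    """Keep only the documents labelled with the given subdomain."""
    return {d: t for d, t in docs.items() if labels.get(d) == name}


# Hypothetical corpus with hypothetical subdomain labels.
docs = {"m1": "clinical trial results", "m2": "drug trial dosage",
        "l1": "court trial verdict", "l2": "court appeal"}
labels = {"m1": "medicine", "m2": "medicine", "l1": "law", "l2": "law"}

general = build_index(docs)
medicine_docs = subdomain(docs, labels, "medicine")
medicine = build_index(medicine_docs)

print(round(idf(general, len(docs), "trial"), 3))            # 0.288
print(round(idf(medicine, len(medicine_docs), "trial"), 3))  # 0.0
print("court" in medicine)                                   # False
```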
Fall 1999 schedule.