296a2 Summary - Seminar Information Access Spring 2001

School of Information Management & Systems
Previously School of Library & Information Studies

296a-1 Seminar: Information Access.
("The Friday Afternoon Seminar")
Summaries

Spring 2001. Fridays 3-5. 107 South Hall.

Friday Jan 19: Clifford LYNCH: Introduction.
1. Introduction to Seminar plans for spring;
2. Updates and comments on recent meetings: HICSS, IETF, and others;
3. Discussion of reputation management.

Jan 26: Niels Windfeld LUND: From Information Science to Documentation.
Abstract: "The title is inspired by Irene Farkas-Conn's title, 'From Documentation to Information Science.' Do we need to consider the physical aspects of documents in the Age of Cyberspace? My preliminary answer is Yes. To me it is more important than ever before to consider the interrelationship between the physical, social, and cognitive aspects of documents and the roles they play. For my forthcoming book I am trying to develop an analytical framework for studies of these issues."
Note: Niels Windfeld LUND is Visiting Professor in the School for the Spring semester. He is completing a book on the multiple ways in which records (documents, texts, images, etc.) are used in contemporary society, and he is conducting a seminar, "Documents in Society," on Tuesday afternoons. Prof. Lund comes from the Institute for Documentation Studies, University of Tromsoe, Norway, "The world's northern-most university".

Feb 2: Heath O'CONNELL and Pat KREITZ, SLAC, Stanford: eConf: An Online Conference Proceedings Site.
The eConf Electronic Conference Proceedings site has been developed to facilitate scientific communication and reduce the cost to libraries of buying, cataloging and warehousing conference proceedings. We would like to show the current system and discuss what could be done to extend this model beyond the physics community.

Feb 9: Fred GEY: Evaluation of Entry Vocabulary: A Technology to Enhance Digital Search.
For the past three years the Metadata research program at SIMS has been developing advanced search technologies to improve search across diverse genres of digital objects -- documents, patents, multilingual retrieval, numeric data and images. The technology leverages human indexing of objects in specialized domains to provide increased accessibility to non-expert searchers. Thus far the evidence for the technology's benefits has been anecdotal. The program is about to embark, under supplemental funding from DARPA, on a comprehensive evaluation of what does and doesn't work with entry vocabulary software and data. Our ideas for three approaches to evaluation will be described -- TREC-like compartmentalized evaluation without human intervention, evaluation of search for numeric statistical data by human subjects, and evaluation through web-based session logging and attempting to determine 'proxies' for relevance. The description will start with a review of our research program and its accomplishments. The Metadata research program website is at www.sims.berkeley.edu/research/metadata/

Also: Michael BUCKLAND: The Effect of Dialects on the Use of Indexes.
The scope ("domain") covered by large bibliographic or textual databases usually includes several specialized topical areas ("subdomains"). Each topical area reflects the work of a community of specialists. These communities evolve their own specialized discourse and vocabulary: different terms, and specialized meanings of other terms. The obvious approach is to create a single dictionary for the target database as a whole. But searches are usually concerned with a specialized topic within a database. So...
1. Should search support be customized to each subdomain?
2. Would that lead to better (more specialized) search terms?
3. Would that lead to different, better search results?
Specialized dictionaries (entry vocabulary modules) have been created for specialized topics (subdomains) in order to find out. Preliminary analyses indicate substantial differences in the choice of metadata terms and in the retrieval results. This work and possible future developments will be briefly summarized. Some earlier technical notes are at http://www.sims.berkeley.edu/research/metadata/subvocab.html.

Feb 16: Howard BESSER, UCLA: Longevity of Electronic Art.
This talk explores the problems of maintaining accessibility to electronic works of art over time. It examines the various hardware and software issues surrounding digital longevity, then discusses the special characteristics of electronic art that make it much more problematic to preserve than more conventional types of works. Finally, the speaker offers up a new paradigm for approaching preservation of these types of works, and suggests some concrete/pragmatic steps that can be taken to preserve this type of material.

Feb 23: David M. LEVY, U of Washington: The "LC21: A Digital Strategy for the Library of Congress" report.
I was a member of the National Research Council committee that recently published its digital strategy for the Library of Congress. In this talk, I will describe some of the report's principal findings and recommendations.
(David Levy was a researcher at Xerox PARC for nearly twenty years. He is currently a visiting professor in the Information School of the University of Washington.)
The report is available for sale or online; see http://books.nap.edu/catalog/9940.html. Anyone wanting to borrow a copy of the report can also contact Michael Buckland.

Joseph A. BUSCH, Content Intelligence Evangelist, Interwoven, and 2001 President, American Society for Information Science and Technology: Helping people find content...preparing content to be found: Enabling the semantic Web.
Anyone who has spent time searching for information on the Web or at a Web site knows how frustrating the experience can be. More often than not the search returns zero hits, or thousands of hits that must be further sifted manually. Tim Berners-Lee, inventor of the Web, and Dagobert Soergel, a professor of Library and Information Science, share a vision for the future that they call the semantic Web or SemWeb. This vision provides the lingua franca--XML, and the Rosetta Stone--DTDs; but the Holy Grail--accurate content automatically processed so that it can be easily found, remains out of reach. Institutions that are authorities in vertical subject areas have a unique opportunity to be major players in transforming content roulette into successful search experiences. My talk will discuss the concept of the semantic Web, how it is being built, how organizations can participate in building it, and how it is transforming the Web user experience today and will continue to transform it in the future. I will also provide an update on ASIST activities.

Mar 9: Liv Aasa HOLM, Oslo University College:
SOME R & D PROJECTS AT OSLO UNIVERSITY COLLEGE.
1. ONE-2 (OPAC Network in Europe - 2, 1998 - 2001). This is the continuation of the European Union project ONE, which linked together 10 European database hosts via the Z39.50 protocol and the ILL protocol. The user interface to this "network" is either through the local library systems (integrated solution), via stand-alone clients like ICONE, or via the Web. In ONE-2 we will concentrate on ILL and electronic document delivery. What were the problems, and how were they solved?
2. The Norwegian emigration to North America - a 175-year commemoration. What were the problems in creating the Web site www.nb.no/html/emigrasjon.html, and how were they solved?
3. The life and work of Alexander Kielland. A multimedia Web site: www.kielland.org/LivVirke/. How to organize the different types of information.
Note: Liv Holm has nearly thirty years' experience in digital library research and development, first in Norsk dokumentdata, a government program for improving automation in libraries, and later in BRODD: the Research, Development and Consultancy department of the Norwegian School for Library and Information Science. She is now Associate Professor in the Faculty of Journalism and Library and Information Studies of Oslo University College, Norway's largest institution of professional education, established in 1994 when the Norwegian college system was restructured and 18 smaller colleges in the Oslo area merged. She is in Berkeley for the current semester and was a visiting scholar here in South Hall in 1992.

Mar 16: Short Reports by Lincoln CUSHING; Chan Jean LEE; Frederick WONG; and Clifford LYNCH.
Lincoln CUSHING: The shifting paradigm: Collaborative collection-building and displaying of large-format digital image collections on the Web.*
Until recently, libraries and archives built their collections the old-fashioned way - they bought them. This allowed institutions to fulfill their mission by expanding their holdings with only space and cost constraining them. However, as digital information begins to supplement and even replace conventional documents, the logic of this approach begins to break down. Digital technology and the Internet allow documents to be shared with multiple users simultaneously with no risk of damage or loss to the original artifact, thus dramatically transforming the ability of academic researchers and the curious public alike to better understand their world. Unfortunately, not all cultural materials are entering the digital tidal wave at the same rate. At risk of being lost in this transition are documents that are "difficult" (i.e., expensive) to digitize, which include large-format documents on paper such as maps and posters. This paper will look at options for collaborative building of Web-accessible "union catalogs" of image-based collections.
Chan Jean LEE: How to organize products in e-commerce sites.*
It is hard to search for products in e-commerce sites: users' tremendously diverse search terms and manufacturers' arbitrary brand names rarely match. To solve this problem, users should be able to browse products very efficiently. Organizing products by facets is good, because the number of products can be narrowed down easily. Facets can also help users express their information needs by choosing terms used in the website.
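The facet-based narrowing described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's system: the product records, the facet names (`category`, `brand`, `price_band`) and the helpers `narrow` and `facet_counts` are all hypothetical. Each selection filters the product set, and the counts per remaining facet value show the user which terms are still available to choose from.

```python
from collections import Counter

# Hypothetical product records; facet names and values are illustrative only.
PRODUCTS = [
    {"name": "Trail runner X", "category": "shoes", "brand": "Acme", "price_band": "50-100"},
    {"name": "Road racer 2", "category": "shoes", "brand": "Zenith", "price_band": "100-150"},
    {"name": "Rain shell", "category": "jackets", "brand": "Acme", "price_band": "100-150"},
    {"name": "Fleece vest", "category": "jackets", "brand": "Zenith", "price_band": "50-100"},
]


def narrow(products, selections):
    """Keep only products that match every selected facet value."""
    return [
        p for p in products
        if all(p.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(products, facet):
    """Count remaining products per value of one facet (the next browsing options)."""
    return Counter(p[facet] for p in products)


if __name__ == "__main__":
    step1 = narrow(PRODUCTS, {"category": "shoes"})
    print(len(step1))  # 2
    print(facet_counts(step1, "brand"))
    step2 = narrow(PRODUCTS, {"category": "shoes", "brand": "Acme"})
    print([p["name"] for p in step2])  # ['Trail runner X']
```

Because every facet value shown comes from the site's own vocabulary, the user never has to guess a manufacturer's brand name; they pick from what is there.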
Frederick WONG: Un-structural searching based on structural information.*
Most search engines today use massive indexing with keywords and metadata. Documents are returned based on hit scores computed from the number of occurrences of the query words in each document. When users have only an incomplete idea of what they are searching for, the search engine usually returns irrelevant documents with high hit scores. This research project first tries to understand the state of the art in text pattern matching technology, with the idea of creating a search engine prototype (data model and a simple search engine) that uses structural information obtained from the user through a sequence of searches to obtain better search results. The idea is to use the user's search history to build a meaningful subject area that can be used together with the search keywords.
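One way to read the idea of combining keyword hit scores with a subject area built from the user's search history is sketched below. It is an illustration under assumptions, not the project's prototype: the naive tokenizer, the bag-of-words "profile" built from earlier queries, and the `weight` parameter are all hypothetical choices.

```python
from collections import Counter


def tokens(text):
    """Lower-case whitespace tokenization (deliberately naive)."""
    return text.lower().split()


def keyword_score(doc, query):
    """Baseline hit score: how often the query words occur in the document."""
    counts = Counter(tokens(doc))
    return sum(counts[w] for w in tokens(query))


def history_profile(past_queries):
    """Approximate the session's subject area by the words of earlier queries."""
    return Counter(w for q in past_queries for w in tokens(q))


def rank(docs, query, past_queries, weight=1.0):
    """Order documents by hit score plus weighted overlap with the history profile."""
    profile = history_profile(past_queries)

    def score(doc):
        context = sum(profile[w] for w in set(tokens(doc)))
        return keyword_score(doc, query) + weight * context

    return sorted(docs, key=score, reverse=True)


if __name__ == "__main__":
    docs = [
        "jaguar car dealer prices",
        "jaguar habitat in rainforest",
        "jaguar jaguar top speed",
    ]
    # Hit counts alone favor the document that repeats the query word.
    print(rank(docs, "jaguar", []))
    # Earlier queries about animals pull the habitat document to the top.
    print(rank(docs, "jaguar", ["rainforest animals", "big cat habitat"]))
```

With an empty history the ranking reduces to plain hit counting, which is exactly the behavior the abstract criticizes; the history term only reorders results when earlier searches supply context.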
Clifford LYNCH: Meetings and Metadata Architecture.
I will do a brief trip report on a few meetings I have been to lately, and then (time permitting) I will talk a bit about the ARIST chapter Cecilia Preston and I are starting to work on covering metadata architecture and infrastructure.

Mar 23: Alice AGOGINO & Andy DONG: Demonstrating the Core Integration System for the National SMET Education Digital Library.
For some years Professor Agogino has been deeply involved in the development of The National Engineering Education Delivery System (NEEDS), which developed a scalable infrastructure that allows engineering educators to locate and discuss digital learning resources and participate as a community of practice. NEEDS is now expanding to include a broader community of educators concerned with Science, Mathematics, Engineering, and Technology Education (SMETE), including a SMETE Digital Library (See www.smete.org).
Two recent papers (in PDF format) are "Using the National Engineering Education Delivery System as the Foundation for Building a Test-Bed Digital Library for Science, Mathematics, Engineering and Technology Education" and "Towards a Digital Learning Community for Engineering Education."

Mar 30: Spring Break: No seminar meeting.

Apr 6: John OBER, California Digital Library: Building an Infrastructure for Digital Content Management.
The California Digital Library http://www.cdlib.org -- "a co-library of the campuses of the University of California" -- finds itself working with and making commitments for the management of an increasing variety of digital objects and their metadata. Infrastructures to support this management have tended to be program-based (e.g. for a "government information" initiative) or format-based (e.g. "images"). We've started discussions about commonality of purpose and whether it is feasible to define and build a generalized infrastructure. I would like to review the needs and goals, the informal process we're using for investigation, and then get input from seminar participants about next steps.

Apr 13: Clifford LYNCH: Meetings and Metadata.
Two topics: Reports on recent meetings; and Metadata architecture and infrastructure.

Apr 20: Aitao CHEN: What can one do with 5 million catalog records?
A new development in the School's Metadata Research Program is the detailed analysis of a large set of library catalog records. Statistical association techniques allow values in one field to be mapped to values in another field. For example, title words can be used to create multilingual indexes to Library of Congress Subject Headings, and such an index can be used to establish mappings between databases. See http://otlet.sims.berkeley.edu/mevm.html
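The kind of field-to-field statistical association described here can be illustrated with a toy example in Python. This sketch assumes a deliberately simple score, the conditional frequency P(heading | title word) computed over the records; it is not necessarily the weighting used in the Metadata Research Program, and the records and the functions `build_index` and `suggest` are hypothetical. Words from a user's query are looked up in the resulting index to suggest candidate subject headings.

```python
from collections import Counter, defaultdict

# Hypothetical catalog records: (title, assigned subject headings).
RECORDS = [
    ("Salmon fisheries of the Pacific coast", ["Fisheries", "Salmon"]),
    ("Pacific salmon ecology", ["Salmon", "Ecology"]),
    ("Coastal fisheries management", ["Fisheries"]),
    ("Forest ecology and fire", ["Ecology", "Forest fires"]),
]


def tokenize(title):
    return [w.lower() for w in title.split()]


def build_index(records):
    """Map each title word to headings scored by P(heading | word) over the records."""
    word_count = Counter()
    pair_count = defaultdict(Counter)
    for title, headings in records:
        for w in set(tokenize(title)):  # count each word once per record
            word_count[w] += 1
            for h in headings:
                pair_count[w][h] += 1
    return {
        w: {h: n / word_count[w] for h, n in hs.items()}
        for w, hs in pair_count.items()
    }


def suggest(index, query, k=3):
    """Sum word-level scores across the query words; return the top-k headings."""
    totals = Counter()
    for w in tokenize(query):
        for h, score in index.get(w, {}).items():
            totals[h] += score
    return [h for h, _ in totals.most_common(k)]


if __name__ == "__main__":
    index = build_index(RECORDS)
    print(index["salmon"])
    print(suggest(index, "Pacific salmon"))
```

The same construction works for any pair of fields (for instance, title words in one language against subject headings), which is what makes the mapping usable across databases.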
Fredric GEY: Report on the Human Language Technology conference, www.hlt2001.org.
The first DARPA sponsored Human Language Technology conference was held March 18-21 in San Diego. The conference brought together a wide variety of specialists across the spectrum of research in human language technology, including:
speech recognition
machine translation
cross language information retrieval
text summarization
information extraction
question answering
I gave a demonstration of our Entry Vocabulary Technology and I will give a personal overview of the conference. Come learn why the answer to the question "Where do lobsters live?" is "On the table".

Apr 27: Michael BUCKLAND and Ruth MOSTERN: Metadata and Geo-temporal Analysis: New Ideas.
Michael Buckland: Fredric Gey, Ray Larson, Aitao Chen, several students and others, and I have been engaged in a "Metadata Research Program" since 1997. The past twelve months have been particularly fruitful in new ideas, mostly not yet published, and in proposals for future research. We have been exploring how existing metadata (categorization, indexing, etc.) could be made easier to use; how metadata-rich resources could be used to support metadata-poor resources; how variations ("dialects") in the vocabularies of the populations served can be used to improve search effectiveness; how the performance of intermediaries can be measured; and other topics. I will provide an overview and explanation of these developments -- and also talk about what we want to work on in the next twelve months.
Ruth Mostern, Electronic Cultural Atlas Initiative, will describe some of the intriguing ideas and possibilities offered by the exceptional worldwide resources associated with the Electronic Cultural Atlas Initiative (ECAI). The emphasis in ECAI is on the opportunities for new insights when it becomes possible to examine culture and history with detailed attention to place and time in data analysis and visualization. There are exciting possibilities for collaboration between ECAI and the Metadata Research Program.

May 4: Last Seminar Meeting of the Semester: Students' Projects.
Frederick WONG: Searching the Impossible?!*
To a computer science student, a search engine is basically keyword matching in a giant meta-table. To end users, a search engine is a place to find what they are looking for, which has a far deeper meaning than that n-by-m matrix. In this term project presentation, we will examine some technologies used in existing industrial search engines deployed in both web-based and document-based environments, the lessons learned, and a simple prototype "smart search helper" that combines some ideas from existing technologies.
Chan Jean LEE: For Better Performance of Entry Vocabulary Indexes.*
1. Utilize the Internet as a metadata rich resource.
Feeding terms and directories of websites into the Entry Vocabulary Module (EVM) may improve the EVM's precision and recall, because terms and directories from websites would not only enrich the EVM's term metrics but are also more domain-specific.
2. Provide Relational Operators to connect different domains.
Users may want to find information across domains without knowing the structure of each domain. For example, MeSH has a "disease" field and US PATENT has an "invention" classification; without knowing these details, a user may want to search for an "invention" that can be used to cure "a certain disease". To join different fields effectively without knowing those details, there is a need for "relational operators" with which users can specify the relationship between query terms (e.g. query = invention "USED FOR" disease_name).
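The relational-operator proposal can be sketched as a small parser that splits such a query and attaches a guessed database and field to each side. Only the example `invention "USED FOR" disease_name` comes from the abstract; the extra operators, the `FIELD_HINTS` table and the `parse` and `plan` helpers are hypothetical. A real system would still need a way to decide which records in the two domains actually satisfy the relation.

```python
import re
from dataclasses import dataclass

# Hypothetical operator vocabulary; only "USED FOR" appears in the abstract.
OPERATORS = ("USED FOR", "CAUSED BY", "PART OF")

# Hypothetical hints mapping query terms to (database, field) pairs.
FIELD_HINTS = {
    "invention": ("US Patent", "invention classification"),
    "diabetes": ("MeSH", "disease"),
}

# Matches: left-term "OPERATOR" right-term
_PATTERN = re.compile(
    r'^\s*(.+?)\s+"(' + "|".join(re.escape(op) for op in OPERATORS) + r')"\s+(.+?)\s*$'
)


@dataclass
class RelationalQuery:
    left: str
    relation: str
    right: str


def parse(query):
    """Split 'left "OPERATOR" right'; return None when no known operator is present."""
    m = _PATTERN.match(query)
    if m is None:
        return None
    return RelationalQuery(*m.groups())


def plan(q):
    """Attach a (database, field) guess to each side so the two domains can be joined."""
    return {
        "left": (q.left, FIELD_HINTS.get(q.left)),
        "relation": q.relation,
        "right": (q.right, FIELD_HINTS.get(q.right)),
    }


if __name__ == "__main__":
    q = parse('invention "USED FOR" diabetes')
    print(q)
    print(plan(q))
```

Keeping the operator explicit in the query means the user states the relationship once, and the system, not the user, is responsible for knowing which field in each database carries which side of it.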
Lincoln CUSHING: Union Catalog Architecture and "Hunting-Gathering".*
Union catalogs are generally designed and built following two proven models: adding new collections to existing databases or linking external databases. They are based on the idea that the logical primary task is to help patrons find what they are looking for. This goal necessitates considerable institutional effort in standards promulgation (e.g. Z39.50), thesauri, clever search engines, and data massaging. However, in some ways institutions have been slow to adopt the full power of file transfer possibilities that the HTTP protocol provides for materials on the Web. Much has been done to optimize data storage and retrieval, but these functions do not address all the needs of patrons. "'Collaboration' in this context is the capability to publish the hypermedia resource base locally and have it viewable globally, and the capability to swiftly and easily transfer the hypermedia resources, annotate them, and republish them on another site" (emphasis by author).
I am suggesting that the practical needs of most web researchers are not met without a viable method for dealing with the materials they find after searching. This could be described as a Hunter-Gatherer model. "Gatherer" describes the act of assembling found items, and describes the actual human practice of building knowledge. No one goes to a search engine, formulates a query, and assumes that the end result is all there is to know. True research is built on iterative stages of searching ("hunting"), picking out relevant materials, and searching some more. When dealing with unusual cultural materials, such as poster archives or museum holdings, there will always be a limit to the consistency and integration of information retrieved across different collections. The practice of gathering is a natural style for doing research, and expanding tools to help this gathering opens up a wide range of opportunities for productive collaborative projects. The ArtsConnected website, hosted by the Minneapolis Institute of Arts/Walker Art Center (www.artsconnected.org/), offers a very rich (though local) Gatherer model, using a well-designed user interface to assemble and present materials on the Web. I have built something similar at my Docs Populi site, demonstrating how materials from two distinct databases can be "joined" in a gathering operation. The next step is to expand this approach to materials from other resources. Patrons would be able to build classroom presentations or dissertation references based on cataloged and digitized materials from URLs anywhere in the world. Several technical and procedural approaches could easily lend themselves to this sort of "item-by-item" gathering from different holdings.
