Aug 29: Clifford LYNCH: Introduction. Interesting Recent Meetings.
    Introduction to the seminar.
    There have been several interesting meetings since last spring. I will lead a discussion on some of these. We could cover SIGIR, EuroDL, JCDL, the Document Academy's DOCAM03 conference in South Hall, the IMS Global Learning Consortium/CNI workshop, the IMS meeting, and others.

Sept 5: David MESSERSCHMITT: Implications of the NSF Cyberinfrastructure Report.
    Interim Dean Messerschmitt served on the National Science Foundation Advisory Committee for Cyberinfrastructure. The Committee's report envisions a future cyberinfrastructure that will "radically empower" the science and engineering community. Major funding for cyberinfrastructure research and development is expected.
News release: pa/news/03/pr0318.htm.
Report: www.communitytech
    At the request of the Association of Research Libraries, Dave prepared a paper outlining many opportunities for libraries in the NSF Advanced Cyberinfrastructure Program.
    Dave will discuss the report and his paper.

Sept 12: John CHUANG: Economics of Peer-to-Peer Computing
    John Chuang will provide a summary report on the first Workshop on Economics of Peer-to-Peer Systems, which he organized in June 2003. The workshop brought together, for the first time, researchers and practitioners from multiple disciplines to discuss the economic characteristics of P2P systems and economics-informed P2P system design, with applications ranging from file-sharing to distributed computation, and from application layer overlays to mobile ad hoc networks. He will discuss the state of the art in this area and highlight the many research challenges and open questions facing this emerging research community.

Sept 19: Fredric GEY and Aitao CHEN: The Hindi Surprise Language Exercise and Other Current Research in Cross-Language Information Retrieval.
    What would your choice of a surprise language be if you wanted search and retrieval for that language in 29 days? Would it be Cebuano, spoken by 24 percent of the Philippine population (the lingua franca of the southern Philippines)? Or would it be Hindi, spoken by over 200 million people on the Indian subcontinent? These challenges were presented to us this spring by the DARPA Surprise Language exercise.
    We will also be presenting our work on multilingual access to nine European languages (English, Dutch, Finnish, French, German, Italian, Russian, Spanish, and Swedish) from the recent CLEF workshop on European language retrieval in Norway (Aug 21-22), and the forthcoming special issue on cross-language information retrieval of Information Processing and Management being edited by Fredric Gey, Carol Peters of Italy and Noriko Kando of Japan.

Sept 26: Michael BUCKLAND: Representing Place, Time, Topic, Person: Intermediate Infrastructure.
    We have discussed place, place names, and geo-referencing in the seminar in previous semesters. I will update and extend those discussions. Place names have temporal aspects, since both places and place names change over time. This has led to exploration of the idea of a "time period directory" analogous to a gazetteer, especially as period names have a geographical aspect. Another related genre is the biographical dictionary, which draws heavily on places and times. This work has helped to develop ideas about infrastructure for digital libraries and for Question Answering systems.

Oct 3: Jessie HEY, Southampton University Library. Member of the Intelligence, Agents, Multimedia Group in the School of Electronics and Computer Science. Academic Scholarship and the Deep (or Invisible) Web.
    Academics constantly need to keep up to date with the latest work for their research and their teaching. However, resource discovery has become a complex task in a hybrid world of paper and digital libraries. Various techniques have been tried to make this easier. When Google becomes the search engine of choice, many valuable resources lie behind a barrier that we think of as the invisible web. We describe experiments with a global information gathering agent, combining agent technologies and information management skills, to make these hybrid resources visible. We then discuss an alternative approach, stimulated by the Open Archives Initiative, in which academic e-Print archives are harvested by global search services.
    Biography: Dr. Jessie Hey has worked in information management for many years at the interface of computers and users and has taught courses on Human Computer Interaction and many workshops on the electronic/digital library. She has a Physics degree from Oxford, a postgraduate Certificate of Education, and a Diploma in Library and Information Studies. She is a Chartered Librarian (MCLIP) and a Member of the ACM.
    Besides spells at Caltech in Pasadena and CERN in Geneva, she has worked in the UK in a variety of posts in higher, further and primary education. She was Manager of Technical Information Services at IBM's UK Research Labs for some years, where she also set up an interactive learning centre. At Southampton University Library she supported Engineering, Mathematics and Physical Science users and is now working on the UK-funded TARDis e-Prints repository project. A member of the Intelligence, Agents, Multimedia Group in the School of Electronics and Computer Science, she has previously worked on digital and 'hybrid' library projects such as ERCOMS and MALIBU. She completed a PhD in Resource Discovery in Digital Libraries last year and maintains a wide interest in digital libraries and scholarly communication issues as we move towards the Semantic Web.
    "Building quality assurance into metadata creation: an analysis based on the learning objects and e-prints communities of practice," by Barton, Currier, and Hey, will be presented at DC-2003 on 29 September.

Oct 10: Students' progress reports:
    Melanie FEINBERG: Design of Named Time Period Directory.
    Previous seminar presentations have discussed the use of place and time as important components for organization and retrieval of historical data, and the use of geographical gazetteers for disambiguating place names. My project investigates whether gazetteer schemas for place names can be adapted for time period descriptions (such as "Akkadian Empire" and "Elizabethan"). I will describe the work done so far and plans for the remainder of the semester.
    Marcus ISON: Digital Preservation.
    Aspects of storing and accessing information digitally in the near future. Sample issues addressed are the methods by which non-digitized media are preserved, and methods to counteract obsolescence or to standardize the information systems used to store this information.
    Luz MARIN: Naming Documents for Short and Long-term Retrieval.
    My goal is to find out whether offices in general rely on a pre-designed, effective indexing scheme to store and retrieve documents for short and long-term use. My findings so far are disappointing. Some use meaningful prefixes, numbers, and dates as unique identifiers in their naming schemes. Some indexing systems are available that were created for database documents as well as commercial parts. These systems could probably be implemented to index documents in the office environment. Currently, I am looking into Intelligent versus Non-Intelligent naming systems.
    Keasha MARTINDILL: Digital Libraries for Children.
    I plan to examine various issues regarding the setup of digital libraries for children, such as publishing copyrighted works, filtering and censorship, and design considerations to support searching and browsing by children. Additionally, I will address whether these libraries are likely to bring about the benefits hoped for, and the current lack of standards about what a children's digital library should contain.
    As the International Children's Digital Library (ICDL), a product of the University of Maryland and the Internet Archive, is a unique, well-documented project, I will conduct an evaluation of it in light of the aforementioned issues. Lastly, my paper will draw conclusions about the future of children's digital libraries.
    John RISTEVSKI: Cy-Ark: A Multimedia Archive for Three-Dimensional Cultural Heritage Data.
    Cy-Ark is a repository and distribution portal for high-definition geometric and other associated data required to describe sites accurately, both for general preservation and to facilitate virtual reconstruction and academic research. The project has been conducted and funded through the Kacyra Family Foundation and the University of California, Berkeley, and is currently in the prototype stage. I will demo the prototype and give an overview of the project and where it is heading.

Oct 17: David WARTHEN: Information Access for Young Children
    I have identified a lot of potentially interesting material on information access for young children. Through reading and evaluation of papers and products, I propose to examine what can be learned in this area.
    Michael BUCKLAND: The Artificiality of "Information": or, Resurrecting the Human in Information Management.
    Several related tendencies have reinforced an emphasis on the objectification of information and a corresponding reduction in the role of the human: Modernist emphasis on "facts"; huge growth of modern IT dependent on formal specifications, fixity of data, and algorithms; mass media lead to second-hand factoids; increasingly we know less and less from direct experience; the commodification of "information" positions human action as consumer choice rather than inquiry; the social ideology of "formal sciences" encourages formalization and scientism in social sciences; formalism encourages mechanistic views of the role of humans within information systems.
    These developments are systemically incomplete. Consequences include: natural language processing based on character strings, statistical analysis and syntax, with less attention to semantics, semiotics, pragmatics; attention deflected away from processes of knowing / becoming informed and towards data and records; sustained disregard for interpretation, symbolism, aesthetics, affect, belief, cognitive authority. The fundamental difference between IM and IT could get lost in a reductionist view of IM as being only within IT, with a limited, mechanistic view of humans.
    Resurrecting the human in IM could follow from:
    (1) A shift of emphasis from formalisms (fixity, algorithms) to a focus on processes and events. Information-as-phenomenon: perception is an event, remembering is an event, and, with constructivist theories of knowledge, knowing is an event.
    (2) Acknowledging the artificiality of documents: Documents are artefacts; meaning is constructed individualistically, document by document, person by person, time after time.
    (3) A document is meaningless absent a context; contexts are personal/social, fluid, unstable, constructed. Placing documents in meaningful contexts, meaningful situationally for individuals is the central concern of Information Management.

Oct 24: Dagobert SOERGEL, Univ. of Maryland.
    Developing a Test Collection for Retrieval in Large Digital Oral History Archives: The Example of SHOAH MALACH
    This talk will first give an overview of the MALACH (Multilingual Access to Large spoken ArCHives) project and then describe an innovative approach to developing a test collection that supports not only overall recall and precision measurements but also detailed analysis of factors that affect the retrievability of documents.
    The purpose of the MALACH project is to develop techniques to improve access to large oral history archives. It advances automated speech recognition for difficult speech (spontaneous, accented, emotional, speech from elderly interviewees) in multiple languages: English, Czech, etc. Informed by user studies, it develops retrieval systems that use ASR results and user interfaces that facilitate interaction with oral history materials. The test bed for these techniques is the very large archive of videotaped oral histories assembled by the Survivors of the Shoah Visual History Foundation -- 116,000 hours of digitized testimonies (interviews) in 32 languages from 52,000 survivors, liberators, rescuers, and witnesses of the Nazi Holocaust. In this context, particular emphasis is on the use of these materials by educators and students for tolerance education.
    For evaluation purposes, we are developing a test collection of ultimately 20,000 testimony segments and 50 questions. We selected 50 questions from requests submitted to the Shoah Foundation. We worked from a 4,000-testimony subset of the Shoah Foundation collection for which detailed cataloging data are available: the testimonies are divided into meaningful segments of 3-5 minutes; each segment has a three-sentence summary, often augmented by more extensive cataloger notes, and assigned subject descriptors. We configured a test collection of 400 testimonies which contains a reasonable number of relevant segments (predominant range 20-80) for each of the 50 questions. Four assessors worked for two months, completing work on 30 questions. They used the catalog data to conduct thorough searches and then assessed relevance on a 5-point scale (0-4) based on the catalog data and, as needed, on listening to the audio. They also assessed the reason for or type of relevance (direct evidence, indirect or circumstantial evidence, comparison, context, and pointer to other information), and they indicated for each segment roughly what proportion pertains to the question. They identified pivotal portions of the cataloger notes or the spoken testimony that support their assessment. In addition, they kept notes on each query, setting out in writing their interpretation of the query and their criteria for assigning values for each type of relevance. The assessors were supported by an interface that minimized clerical effort and thus made this detailed data collection feasible. This test collection can be used for refined analyses of the factors that influence retrieval of a segment (the type of question, the degree of relevance, the reason for relevance, the proportion of the segment that is germane, ASR word error rate, ASR confidence, and possibly others).

Oct 31: Clifford LYNCH: Stewardship in the Digital Age
    This talk considers a series of issues about the stewardship of cultural heritage and cultural memory as large parts of this memory become digital, and as digital technology becomes available to create surrogates for physical materials. I consider these questions from the perspective of the individual, the organization and the nation-state, and argue that we face fundamental changes in the nature of effective and responsible stewardship.
    This talk is an extended version of my keynote at the European Digital Libraries meeting in Trondheim, Norway, August 2003.

Nov 7: No seminar meeting.

Nov 14: Peter BRANTLEY, UC Office of the President, & Raymond YEE, IST-Interactive University, UCB.
    Libraries and Instructional Technology.
    How should digital libraries facilitate the use of their content and services in the development of digital learning materials? This question is of current interest in both the library and educational technology communities. The California Digital Library (CDL) -- the 11th university library of the University of California -- and the UC Berkeley Interactive University Project (IU) have been working together to test and develop ways for educational technologies to make the library's resources more accessible to all its audiences -- including current and potential users in K-12 communities.
    In this talk, Peter Brantley, Director of Technology at the CDL, and Raymond Yee, Technology Architect of the IU, will discuss how the problem of interoperability between information and learning environments looks from the perspective of their respective institutions and end-users, both theoretically and practically, functionally and technologically.
    For some background reading: "Interoperability between Information and Learning Environments - Bridging the Gaps: A Joint White Paper on behalf of the IMS Global Learning Consortium and the Coalition for Networked Information."

Nov 21: Warren SACK, Marc DAVIS, Michael BUCKLAND: DOCAM03 and the Berkeley New Media Initiative.
    A report on the joint SIMS / Document Academy (DOCAM03) conference in South Hall last August, which examined the nature of documents from various perspectives. Also a report on the Berkeley campus' New Media Initiative and a discussion of the relevance of both to the development of SIMS. Technology permitting, Professor Niels Windfeld Lund (Univ of Tromso, Norway, who organized the DOCAM03 conference) will join the discussion by teleconference and will report briefly on related developments at Tromso.

Nov 28: Thanksgiving. No seminar meeting.

Dec 5: Students' final progress reports:
    Keasha MARTINDILL: Digital Libraries for Children.
    My paper will discuss the potential benefits of digital libraries for children, review current children's digital libraries and collections of digitized books for children, discuss pertinent issues regarding these libraries and collections, and review the International Children's Digital Library regarding those issues.
    David WARTHEN: Information Access for Young Children.
    Marcus ISON: Digital Preservation: Taking a Byte out of History.
    An exploration of the ramifications of Digital Preservation. What are the current methods of Digital Preservation? Who has and will have access? Is Digital the answer?
    Melanie FEINBERG: Design of Time Period Directory.
    I will present a summary of the prototype development process since my initial report, show the prototype database and some simple queries, and discuss how work in this area might proceed.
    John RISTEVSKI: Cy-Ark: A Multimedia Archive for Three-Dimensional Cultural Heritage Data.

