School of
Information Management & Systems
Previously School of Library & Information Studies
296a-1
Seminar: Information Access.
("The Friday Afternoon Seminar")
Summaries - Fall 2003.
Fridays 3-5. 107 South Hall.
Summaries will be added below as they become available.
Aug 29: Clifford LYNCH: Introduction. Interesting Recent Meetings.
Introduction to the seminar.
There have been several interesting meetings
since last spring. I will lead a discussion on some of these.
We could cover SIGIR, EuroDL, JCDL, the Document Academy's
DOCAM03 conference in South Hall, the IMS Global Learning Consortium/CNI
workshop, the
IMS meeting, and others.
Sept 5: David MESSERSCHMITT: Implications of the NSF Cyberinfrastructure Report.
Interim Dean Messerschmitt served on the National Science
Foundation Advisory Committee for Cyberinfrastructure. The Committee's
report envisions a future cyberinfrastructure that will "radically empower"
the science and engineering community. Major funding for cyberinfrastructure
research and development is expected.
News release: www.nsf.gov/od/lpa/news/03/pr0318.htm.
Report: www.communitytechnology.org/nsf_ci_report/
At the request of the
Association of Research Libraries, Dave prepared a paper
outlining many opportunities for libraries
in the NSF Advanced Cyberinfrastructure Program.
Paper:
www.eecs.berkeley.edu/~messer/PAPERS/03/CI-ARL-Bimonthly.pdf
Dave will discuss the report and his paper.
Sept 12: John CHUANG: Economics of Peer-to-Peer Computing
John Chuang will provide a summary report on the first
Workshop on Economics of Peer-to-Peer Systems
(www.sims.berkeley.edu/p2pecon/) that he
organized in June 2003. The workshop brought together for the first time
researchers and practitioners from multiple disciplines to discuss the
economic characteristics of P2P systems and economics-informed P2P system
design, with applications ranging from file-sharing to distributed
computation, application layer overlays to mobile ad hoc networks. He will
discuss the state-of-the-art in this area, and highlight the many research
challenges and open questions facing this emerging research community.
Sept 19: Fredric GEY and Aitao CHEN:
The Hindi Surprise Language Exercise and Other
Current Research in Cross-Language Information Retrieval.
What would your choice of a surprise language be if you wanted search and
retrieval for that language in 29 days? Would it be Cebuano, spoken by
24 percent of the Philippine population (the lingua franca of the southern
Philippines)? Or would it be Hindi, spoken by over 200 million people on
the Indian subcontinent? These challenges were presented to us this
spring by the DARPA Surprise Language exercise.
We
will also be presenting our work on multilingual access to nine European
languages (English, Dutch, Finnish, French, German, Italian, Russian,
Spanish, and Swedish)
from the recent CLEF workshop on European language retrieval in Norway
(Aug 21-22), and the forthcoming special issue on cross-language information
retrieval of Information Processing and Management being edited by
Fredric Gey, Carol Peters of Italy and Noriko Kando of Japan.
Sep 26: Michael BUCKLAND: Representing Place, Time, Topic, Person: Intermediate
Infrastructure.
We have discussed place, place names, and geo-referencing
in the seminar in previous semesters. I will update and extend
those discussions. Place names have temporal aspects, since both places
and place names change over time. This has led to exploration of the idea
of a "time period directory" analogous to a gazetteer, especially as
period names have a geographical aspect. Another related genre is the
biographical dictionary, which draws heavily on places and times.
This work has helped to develop ideas about
infrastructure for digital libraries and for Question Answering systems.
Oct 3: Jessie HEY,
Southampton University Library.
Member of the Intelligence, Agents, Multimedia Group in the School of
Electronics and Computer Science.
Academic Scholarship and the Deep (or Invisible) Web.
Academics constantly need to keep up to date with the latest work for
their
research and their teaching. However, resource discovery has become a
complex task in a hybrid world of paper and digital libraries. Various
techniques have been tried to make this easier. When Google becomes a search
engine of choice, many valuable resources lie behind a barrier that we think
of as the invisible web. We describe experiments with a global information
gathering agent, combining agent technologies and information management
skills, to make visible these hybrid resources. We then discuss an
alternative approach stimulated by the Open Archives Initiative in which
academic e-Print archives become harvested by global search services.
Biography:
Dr. Jessie Hey has worked in information management for many years at
the interface of
computers and users and has taught courses on Human Computer
Interaction and many workshops on the electronic/digital library.
She has a Physics degree from Oxford, a postgraduate Certificate of
Education, and a Diploma in Library and Information Studies.
She is a Chartered Librarian
(MCLIP) and Member of the ACM.
Besides spells at Caltech in Pasadena and CERN in Geneva, she worked in
the UK in a variety of posts in higher, further and primary education. She was
Manager of Technical Information Services at IBM's UK Research Labs
for some years where she also set up an interactive learning centre.
At Southampton University Library she supported Engineering, Mathematics
and Physical Science users and is now working on the UK funded TARDis
e-Prints repository
project. A member of the Intelligence, Agents, Multimedia Group in the School of
Electronics and Computer Science, she has previously worked on digital and 'hybrid'
library projects such as ERCOMS and MALIBU.
She completed a PhD in Resource Discovery
In Digital Libraries last year and maintains a wide interest in digital libraries
and scholarly communication issues as we move towards the Semantic Web.
Personal webpage:
www.ecs.soton.ac.uk/~jmnh/
Related URLs:
tardis.eprints.org/ and
www.ecs.soton.ac.uk
"Building quality assurance into metadata creation: an analysis
based on the learning objects and e-prints communities of practice" by Barton,
Currier, and Hey will be presented at DC-2003
(http://dc2003.ischool.washington.edu/index.html) on 29 September.
Oct 10: Students' progress reports: Provisional titles:
Melanie FEINBERG: Design of Named Time Period Directory.
Previous seminar presentations have discussed the use of place and time as
important components for organization and retrieval of historical data,
and the use of geographical gazetteers for disambiguating place names. My
project investigates whether gazetteer schemas for place names can be
adapted for time period descriptions (such as "Akkadian Empire" and
"Elizabethan"). I will describe the work done so far and plans for the
remainder of the semester.
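As a rough illustration of the idea, a time period entry could mirror a gazetteer record, with a span of years standing in for the geographic footprint. The sketch below is only an assumption about what such a schema might look like; the class, field names, and lookup function are hypothetical and are not taken from the project, and the Akkadian dates are approximate conventional values included for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PeriodEntry:
    """One directory entry, modeled loosely on a gazetteer record.

    A gazetteer entry pairs a place name with a type and a footprint
    (coordinates). Here the footprint is a span of years, and the
    associated place is kept as a plain region name.
    """
    name: str                       # period name, e.g. "Elizabethan"
    period_type: str                # hypothetical category, e.g. "reign"
    start_year: int                 # negative values stand for BCE
    end_year: int
    region: Optional[str] = None
    variant_names: List[str] = field(default_factory=list)

    def overlaps(self, first_year: int, last_year: int) -> bool:
        """True if the period shares at least one year with the range."""
        return self.start_year <= last_year and first_year <= self.end_year


def find_periods(directory, first_year, last_year, region=None):
    """Return names of entries overlapping the year range (and region, if given)."""
    return [
        entry.name
        for entry in directory
        if entry.overlaps(first_year, last_year)
        and (region is None or entry.region == region)
    ]


directory = [
    # Dates are those of the reign of Elizabeth I of England.
    PeriodEntry("Elizabethan", "reign", 1558, 1603, region="England",
                variant_names=["Elizabethan era"]),
    # Approximate conventional dating, for illustration only.
    PeriodEntry("Akkadian Empire", "empire", -2334, -2154,
                region="Mesopotamia"),
]

print(find_periods(directory, 1590, 1610))                        # ['Elizabethan']
print(find_periods(directory, 1590, 1610, region="Mesopotamia"))  # []
```

Keeping the region as a name rather than as coordinates is a simplification; a fuller design would presumably link each entry to a gazetteer record instead.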
Marcus ISON: Digital Preservation.
Aspects of storing and accessing information
digitally in the near future.
Sample issues addressed are the methods by which non-digitized
media are preserved and methods to counteract or standardize
information systems used to store this information to
avoid obsolescence.
Luz MARIN: Naming Documents for Short and
Long-term Retrieval.
My goal is to find whether offices in general
rely on an effective, pre-designed indexing scheme to store and retrieve
documents for short and long-term use. My findings so far are
disappointing. Some use meaningful prefixes, numbers,
and dates as unique identifiers in their naming schemes.
Some indexing systems are available that were
created for database documents as well as commercial parts.
These systems could probably be implemented to index documents
in the office environment.
Currently, I am looking into Intelligent versus Non-Intelligent
naming systems.
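To make the distinction concrete, the sketch below assumes that an "intelligent" name encodes meaning (a prefix, a number, and a date) while a "non-intelligent" name is an opaque counter. The name layout, the pattern, and the function names are hypothetical choices made for illustration, not a scheme reported by any office in the study.

```python
import re
from datetime import date

# Hypothetical layout: PREFIX-NNNN-YYYYMMDD, e.g. "INV-0042-20031010".
NAME_PATTERN = re.compile(
    r"^(?P<prefix>[A-Z]{2,5})-(?P<number>\d{4})-(?P<date>\d{8})$"
)


def make_name(prefix: str, number: int, day: date) -> str:
    """Build an 'intelligent' name whose parts carry meaning.

    Assumes numbers no larger than 9999 so the name stays parseable.
    """
    return f"{prefix.upper()}-{number:04d}-{day:%Y%m%d}"


def parse_name(name: str):
    """Recover (prefix, number, date) from a name built by make_name."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized document name: {name!r}")
    digits = match.group("date")
    day = date(int(digits[:4]), int(digits[4:6]), int(digits[6:]))
    return match.group("prefix"), int(match.group("number")), day


def opaque_name(counter: int) -> str:
    """Build a 'non-intelligent' name: unique, but it says nothing about the document."""
    return f"{counter:08d}"


print(make_name("inv", 42, date(2003, 10, 10)))   # INV-0042-20031010
print(parse_name("INV-0042-20031010"))
print(opaque_name(42))                            # 00000042
```

The trade-off is the usual one: the first form can be read and sorted by a person without an index, while the second never needs renaming when a document's category or date is corrected.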
Keasha MARTINDILL: Digital Libraries for Children.
I plan to examine various issues regarding their setup,
such as publishing copyrighted works, filtering and censorship,
plus design considerations to support searching and browsing by
children. Additionally, I will address whether these libraries will
likely bring about the benefits hoped for, and the current lack of
standards about what a children's digital library should contain.
As the International Children's Digital Library (ICDL),
a product of the University of Maryland and the Internet Archive,
is a unique, well-documented project, I will conduct an evaluation of
it in light of the aforementioned issues.
Lastly, my paper will draw conclusions about the future of children's
digital libraries.
John RISTEVSKI: Cy-Ark: A Multimedia
Archive for Three-Dimensional Cultural Heritage Data.
Cy-Ark is a repository and distribution portal for high-definition geometric
and other associated data required to describe sites accurately, both for
general preservation and to facilitate virtual reconstruction and
academic research. The project has been conducted and funded through the
Kacyra Family Foundation and the University of California, Berkeley and is
currently in the prototype stage.
I will demo the prototype and give an overview of
the project and where it is heading.
Oct 17: David WARTHEN: Information Access for Young
Children.
I have identified a lot of potentially interesting material
on information access for young
children.
Through reading and evaluation of papers and
products, I propose to examine
what can be learned in this area.
Michael BUCKLAND: The Artificiality of "Information":
or, Resurrecting the Human in Information Management.
Several related tendencies have reinforced an emphasis on the objectification of information and a corresponding reduction in the role of the human:
Modernist emphasis on "facts"; huge growth of modern IT dependent on formal
specifications, fixity of data, and algorithms; mass media lead to
second-hand factoids; increasingly we know less and less from direct experience;
the commodification of "information" positions human action as consumer
choice rather than inquiry; the social ideology of "formal sciences"
encourages formalization and scientism in social sciences; formalism encourages mechanistic views of the role of humans within information systems.
These developments are systemically incomplete.
Consequences include: natural language processing based on
character strings, statistical analysis and syntax, with less attention
to semantics, semiotics, pragmatics; attention deflected away from processes
of knowing / becoming informed and toward data and records; sustained disregard
for interpretation, symbolism, aesthetics, affect, belief, cognitive authority.
The fundamental difference between IM and IT could get lost in a reductionist
view of IM as being only within IT, with a limited, mechanistic view of
humans.
Resurrecting the human in IM could follow from:
(1) A shift of emphasis from formalisms (fixity, algorithms)
to a focus on processes and events. Information-as-phenomenon: perception
is an event, remembering is an event, and, with constructivist theories of
knowledge, knowing is an event.
(2) Acknowledging the artificiality of documents: Documents are artefacts; meaning is constructed individualistically, document by document, person by person, time after time.
(3) A document is meaningless absent a context; contexts are personal/social, fluid, unstable, constructed. Placing documents in meaningful contexts, meaningful situationally for individuals, is the central concern of Information Management.
Oct 24: Dagobert SOERGEL, Univ. of Maryland.
Developing a Test Collection for Retrieval in
Large Digital Oral History Archives: The Example of SHOAH MALACH
This talk will first give an overview of the MALACH
(Multilingual Access to Large spoken ArCHives) project and then
describe an innovative approach to developing a test collection that
supports not only overall recall and precision measurements but also
detailed analysis of factors that affect the retrievability of
documents.
The purpose of the MALACH project is to develop
techniques to improve access to large oral history archives.
It advances automated speech recognition for difficult speech
(spontaneous, accented, emotional, speech from elderly interviewees)
in multiple languages: English, Czech, etc. Informed by user studies,
it develops retrieval systems that use ASR results and user interfaces
that facilitate interaction with oral history materials. The test bed
for these techniques is the very large archive of videotaped oral
histories assembled by the Survivors of the Shoah Visual History
Foundation -- 116,000 hours of digitized testimonies (interviews)
in 32 languages from 52,000 survivors, liberators, rescuers, and
witnesses of the Nazi Holocaust. In this context, particular
emphasis is on the use of these materials by educators and students
for tolerance education.
For evaluation purposes, we are developing a test
collection of ultimately 20,000 testimony segments and 50 questions.
We selected 50 questions from requests submitted to the Shoah
Foundation. We worked from a 4,000-testimony subset of the Shoah
Foundation collection for which detailed cataloging data are
available: the testimonies are divided into meaningful segments of
3-5 minutes; each segment has a three-sentence summary, often
augmented by more extensive cataloger notes, and assigned subject
descriptors. We configured a test collection of 400 testimonies
which contains a reasonable number of relevant segments (predominant
range 20-80) for each of the 50 questions. Four assessors worked
for two months completing work on 30 questions. They used the
catalog data to conduct thorough searches and then assessed
relevance on a 5-point scale (0 - 4) based on the catalog
data and, as needed, on listening to the audio. They also
assessed the reason for or type of relevance (direct evidence,
indirect or circumstantial evidence, comparison, context, and
pointer to other information) and they indicated for each segment
roughly what proportion pertains to the question. They identified
pivotal portions of the cataloger notes or the spoken testimony
that support their assessment. In addition, they kept notes on
each query, setting out in writing their interpretation of the
query and their criteria for assigning values for each type of
relevance. The assessors were supported by an interface that
minimized clerical effort and thus made this detailed data
collection feasible. This test collection can be used for
refined analyses of the factors that influence retrieval of
a segment (the type of question, the degree of relevance,
the reason for relevance, the proportion of the segment that
is germane, ASR word error rate, ASR confidence, and possibly
others).
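As a minimal sketch of how graded judgments of this kind could feed overall recall and precision measurements, the function below treats a segment as relevant when its grade on the 0-4 scale meets a chosen threshold. The function name, the threshold parameter, the segment identifiers, and the sample grades are all invented for illustration; this is not the MALACH evaluation code or data.

```python
def precision_recall(retrieved, judgments, min_grade=1):
    """Compute set-based precision and recall for one question.

    retrieved -- segment ids returned by a system
    judgments -- dict mapping segment id to a relevance grade (0-4)
    min_grade -- lowest grade counted as relevant (an assumed cutoff)
    """
    relevant = {seg for seg, grade in judgments.items() if grade >= min_grade}
    returned = set(retrieved)
    hits = relevant & returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented grades for five segments and one invented result list.
judgments = {"s1": 4, "s2": 2, "s3": 0, "s4": 1, "s5": 3}
retrieved = ["s1", "s3", "s5", "s6"]

print(precision_recall(retrieved, judgments, min_grade=1))  # (0.5, 0.5)
print(precision_recall(retrieved, judgments, min_grade=3))  # (0.5, 1.0)
```

Raising the threshold from 1 to 3 shows why recording the degree of relevance matters: the same result list looks quite different depending on which segments count as relevant. The same breakdown could in principle be repeated per type of relevance.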
Oct 31: Clifford LYNCH: Stewardship in the Digital Age
This talk considers a series of issues about the
stewardship of cultural heritage and cultural memory as large parts
of this memory become digital, and as digital technology becomes
available to create surrogates for physical materials. I consider
these questions from the perspective of the individual, the
organization and the nation-state, and argue that we face
fundamental changes in the nature of effective and responsible
stewardship.
This talk is an extended version of my keynote
at the European Digital Libraries meeting in Trondheim, Norway,
August 2003.
Nov 7: No seminar meeting.
Nov 14: Peter BRANTLEY, UC Office of the President, & Raymond YEE,
IST-Interactive University, UCB.
Libraries and Instructional Technology.
How should digital libraries facilitate the use
of their content and services in the development of digital
learning materials? This question is of current interest in both
the library and educational technology communities.
The California Digital Library (CDL) -- the 11th university
library of the University of California -- and the UC Berkeley
Interactive University Project (IU) have been working together
to test and develop ways for educational technologies to make the
library's resources more accessible to all its audiences --
including current and potential users in K-12 communities.
In this talk, Peter Brantley, Director of Technology
at the CDL, and Raymond Yee, Technology Architect of the IU, will
discuss how the problem of interoperability between information
and learning environments looks from their respective institutions
and their end-users, both theoretically and practically,
functionally and technologically.
For some background reading: "Interoperability between
Information and Learning Environments – Bridging the Gaps:
A Joint White Paper on behalf of the IMS Global Learning Consortium
and the Coalition for Networked Information"
http://www.imsglobal.org/DLims_white_paper_publicdraft_1.pdf
Nov 21: Warren SACK, Marc DAVIS, Michael BUCKLAND: DOCAM03 and the Berkeley
New Media Initiative.
A report on the joint SIMS / Document Academy (DOCAM03)
conference in South Hall last August, which examined the nature of
documents from various perspectives. Please see
thedocumentacademy.hum.uit.no/events/announcements/DOCAM03.Program.html.
Also a report on the Berkeley campus' New Media Initiative
and a discussion of the relevance of both to the development of SIMS.
Technology permitting, Professor Niels Windfeld Lund (Univ of Tromso, Norway, who
organized the DOCAM03 conference)
will join the discussion by teleconference
and will report briefly on related developments at Tromso.
Nov 28: Thanksgiving. No seminar meeting.
Dec 5: Students Final Progress Reports.
Keasha MARTINDILL: Digital Libraries for Children.
My paper will discuss the potential benefits of digital
libraries for children, review current children's digital
libraries and collections of digitized books for children,
discuss pertinent issues regarding these libraries and
collections, and review the International Children's Digital
Library regarding those issues.
David WARTHEN: Information Access for Young
Children.
Marcus ISON: Digital Preservation:
Taking a Byte out of History.
An exploration of the ramifications of
Digital Preservation. What are the current methods of
Digital Preservation? Who has and will have access?
Is Digital the answer?
Melanie FEINBERG: Design of Time Period
Directory.
I will present a summary of the prototype development process
since my initial report, show the prototype database and some simple
queries, and discuss how work in this area might proceed.
John RISTEVSKI: Cy-Ark: A Multimedia
Archive for Three-Dimensional Cultural Heritage Data.
The Seminar will resume on Friday, January 23.