School of
Information
Previously School of Library & Information Studies
Friday Afternoon Seminar: Summaries.
296a-1 Seminar: Information Access, Fall 2009.
Fridays 3-5. 107 South Hall.
Schedule.
Summaries are added as they become available.
Friday, Aug 28: Clifford LYNCH: Introduction.
Building Computational Instruments for the Humanities and Social Sciences.
Introduction to Seminar; Schedule for Semester;
Introduction of Participants.
Building Computational Instruments for the Humanities and
Social Sciences (Lynch). Continuing and extending a discussion at last
year's seminar, I'll explore some of the potential for building new computational
services that can function as new instruments for the humanities and social
sciences (and indeed for many other areas of investigation) and relate them
to developments in text and data mining and information retrieval.
I will highlight a number of recent experiments in this area. Finally, I'll
frame questions about how such instruments might be deployed, and by what organizations.
Friday Sep 4: No Seminar Meeting.
Friday Sep 11: Doug OARD, Univ of Maryland: Finding Things You Can't Read:
Interactive cross-language search for monolingual users.
Speech recognition and machine translation techniques are evolving
rapidly, creating new opportunities to build systems that can support information
seeking in large collections of multilingual and multimedia content.
Little is presently known, however, about how people would use such systems to
accomplish real tasks. In such circumstances, designers naturally rely on their own
judgment to decide how component capabilities should be optimized and how those
components should be integrated. Once that's been done, the next step is to put
the resulting system in the hands of users in order to learn what they do with it.
In this talk, I will describe what we have learned so far from such a process.
I'll start with some background on user-centered evaluation for cross-language
information retrieval at the Cross Language Evaluation Forum (CLEF). I will then
introduce Rosetta, an integrated system that supports search and display of live
and archived news feeds in four languages for users who know only English, and I'll
explain how we have used a formative evaluation process to co-evolve both the design
of the system and of the ways in which it can be used. I'll conclude the talk with
a few design ideas that build on what we have learned to date.
Douglas Oard is an Associate Professor at the University of
Maryland, College Park, with joint appointments in the College of Information
Studies and the Institute for Advanced Computer Studies. He is on sabbatical at
Berkeley's School of Information for the Fall 2009 semester. Dr. Oard earned
his Ph.D. in Electrical Engineering from the University of Maryland, and his
research interests center around the use of emerging technologies to support
information seeking by end users. His recent work has focused on interactive
techniques for cross-language information retrieval, searching conversational
media, and support for sense-making in large digital archival collections.
Additional information is available at
www.glue.umd.edu/~oard/.
Friday Sep 18: Julian WARNER, Queen's University, Belfast:
Creativity in Feist.
This paper is not about the legal aspects of Feist (1991), but
approaches the judgment from an information science perspective.
Analogies are found between concepts in the widely circulated public
discourse of Feist and distinctions between forms of mental labor recently introduced
to information science. The delineation of the absence of creativity in Feist
is analogous to syntactic mental labor and the judgment's criteria for creativity
can be encompassed by semantic labor.
The validity and significance of the distinction between syntactic
and semantic mental labor is supported by the discovery of corresponding concepts
in the judgment.
Julian Warner teaches information science and information
policy in the Management School at the Queen's University, Belfast, and has been
a Visiting Scholar here. He is interested in the history of information and of
information technology.
His forthcoming book
Human Information Retrieval will be the first in the
new MIT Press series on the History and Theory of Information Science.
More at
http://www.qub-efrg.com/faculty-directory/26/julian-warner.
Friday Sep 25: Catherine MARSHALL, Microsoft Research, Silicon Valley:
No Bull, No Spin: Comparing Public Tags with other Descriptive User Metadata.
User-contributed tags have shown promise as a means of indexing
multimedia collections by harnessing the efforts and enthusiasm of online communities.
But tags are only one way of creating viable descriptions of multimedia collections.
In this talk, I report on a study that takes a close look at the characteristics of
public tags by comparing them to other forms of descriptive metadata that users have
assigned to an image collection. I also use the study results to formulate design
recommendations for tagging tools and to speculate on how photo sharing sites may be
used as de facto art and architecture resources.
Cathy Marshall is currently a senior researcher at Microsoft
Research's Silicon Valley laboratory after a stint in Microsoft's product divisions
as part of the Advanced Reading Technologies team. Before that, she was a long-time
member of the research staff at Xerox PARC. Cathy's non-Microsoft homepage is at
www.csdl.tamu.edu/~marshall. There
you will find her publications, blog, contact information, and will learn why she
was not invited to her high school reunion.
Also Brief Progress Report: Ryan SHAW:
Modeling Colligatory Concepts in Historical Texts.
The philosopher of history W.H. Walsh introduced the notion of
"colligation" to describe how historians gather diverse factual
statements under a unifying concept like "The Renaissance" or "The
French Revolution." Frank Ankersmit, building on Walsh's ideas,
proposed that types of these "colligatory concepts" could be defined
extensionally, by clustering overlapping sets of statements from
various texts narrating similar concepts under the same name.
My proposal for this semester is to investigate Ankersmit's theory by
analyzing the full text of 10 books on the 1886 Haymarket Square Riot
from the Internet Archive. I plan to use sentence alignment techniques
(Barzilay & Elhadad 2003) to identify overlapping sets of statements
among the 10 texts. I hope to demonstrate that we can extensionally
model the Library of Congress Subject Heading "Haymarket Square Riot,
Chicago, Ill., 1886" according to Ankersmit's theory and provide an
interface for highlighting differences among the individual narratives
constructed by the different texts.
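
The idea of finding overlapping statements across texts can be illustrated with a minimal sketch. It does not implement the sentence alignment technique of Barzilay & Elhadad; it substitutes a much simpler token-level Jaccard similarity. The function names, the threshold, and the sample sentences are all assumptions made up for illustration.

```python
import re
from itertools import combinations

def tokens(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))

def jaccard(a, b):
    """Jaccard similarity between the token sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def overlapping_pairs(texts, threshold=0.5):
    """Return cross-text sentence pairs whose similarity meets the threshold.

    texts maps a text identifier to a list of sentences.
    """
    pairs = []
    for (id_a, sents_a), (id_b, sents_b) in combinations(texts.items(), 2):
        for sa in sents_a:
            for sb in sents_b:
                score = jaccard(sa, sb)
                if score >= threshold:
                    pairs.append((id_a, sa, id_b, sb, score))
    return pairs

# Made-up sentences for illustration, not quotations from any book.
texts = {
    "book1": ["A bomb was thrown at the police.", "The trial began in June."],
    "book2": ["Someone threw a bomb at the police.", "The weather was mild."],
}
pairs = overlapping_pairs(texts)
```

Sets of statements that overlap across many texts would then be candidates for the extension of a colligatory concept; a real study would need a more robust similarity measure than shared tokens.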
Friday, Oct 2: Michael BUCKLAND: Design for the Future Use of Reference Works.
Understanding depends on knowing the background, context, and
relationships of whatever is of interest. Learning comes through adding to or
modifying what one already knows. For these purposes a variety of reference
books evolved in the print environment. Having a suitable set of explanatory works
conveniently at hand is a valuable amenity, but has been slow to evolve in the
online environment. How could such an amenity be made part of everyone's personal
computing environment? The literature on library reference service has concentrated
on empowering librarians to find answers for library users, which is good, but
most people prefer to find explanations for themselves if they can do so easily enough.
Economic considerations and changes in technology make a compelling basis for a
shift in emphasis to the support of reference self-service.
A current project of the Electronic Cultural Atlas
Initiative and the School of Information entitled "Context and Relationships:
Ireland and Irish Studies" seeks to provide a remedy.
After several months' experience
with an initial prototype "Context Finder", a quite different design is
now being worked on.
I will lead a
discussion of some of the implications of enabling self-service discovery in relatively
trustworthy resources, which are commonly digital versions of traditional
print-on-paper reference works. We will consider the design implications for
three groups: providers of office software (browsers, word processors);
publishers of reference works; and librarians, bibliographers, and
teachers.
More at ecai.org/neh2007.
Friday, Oct 9: Students' progress reports:
Krishna JANAKIRAMAN: BellKor and other approaches towards building
book recommendation systems.
The Netflix recommendation system competition has effected a surge
in recommendation systems research. This has resulted
in more accurate and scalable approaches towards building recommendation systems
(Y. Koren, RecSys 2008). For the seminar, I would like to take
a detailed look at the BellKor algorithm (Y. Koren, "Factorization Meets the
Neighborhood: a Multifaceted Collaborative Filtering Model"),
the algorithm that won the Netflix recommendation system competition for the year 2008.
One motivation is to try and apply the same algorithm
for book recommendation using the BookCrossing dataset (www.bookcrossing.com).
Another motivation is to perform a detailed statistical analysis
of the BookCrossing dataset itself. Such an analysis, I believe, may lead towards
discovering interesting rules and patterns within a large book reading
community like BookCrossing. The inferred rules can further be utilized towards
engineering rules or decision tree based recommendation systems for books -
an approach seldom taken by recommendation system engineers.
Nick DOTY: A Meaningful Ontology of Location.
As more and more devices have the ability to geolocate themselves,
we have an increasing ability to map our own geospatial position. Where we are
at a given point of time can provide a valuable and meaningful context to our
lives, but in practice most location-based services exclusively exchange latitude
and longitude coordinates. Though those coordinates are straightforward for
storage and transmission, they leave out a lot of the semantic content.
I'll report on my work so far looking at existing ontologies of location
and then roughly sketch out some of the additions that might be useful in
capturing the cultural meaning of our location.
Clifford LYNCH: Storage Systems, Resilience, and the Research
Agenda for Digital Preservation.
I'll share some reflections on the recent Library of Congress
sponsored Symposium on Storage Systems for Digital Preservation, what we are
learning about storage systems, some ideas from the emerging field of
resilient systems, and talk about what this may suggest for the future
research and development agenda in support of digital preservation going
forward.
Friday, Oct 16: Katsumi TANAKA, Kyoto University: Web Search and Information
Credibility Analysis.
We describe a new concept for improving Web search performance and/or
increasing the information credibility of search results using Web 1.0
and Web 2.0 content in a complementary manner. Conventional Web search
engines still suffer from a low precision/recall ratio, especially for
searching multimedia content (images, videos, etc.). The quality control
of Web search is generally insufficient due to low publishing barriers.
As a result, there is a large amount of mistaken and unreliable
information on the Web that can have detrimental effects on users. This
calls for technology that facilitates the judging of the trustworthiness
or credibility of content and the accuracy of the information that users
encounter on the Web. Such technology should be able to handle a wide
range of tasks: extracting credible information related to a given
topic, organizing this information, detecting its provenance, and
clarifying background, facts, and other related opinions and their
distribution. We propose and describe a concept of enhancing the search
performance of conventional Web search engines and analyzing information
credibility of Web information using the interaction between Web 1.0 and
Web 2.0 content. We also overview our recent research activities on Web
search and information credibility based on this concept.
Professor Katsumi Tanaka received the BS, MS and PhD degrees in
Information Science from Kyoto University, in 1974, 1976 and 1981,
respectively. In 1986, he joined the Department of Instrumentation
Engineering, Faculty of Engineering at Kobe University, as an associate
professor. In 1994, he became a full professor in the Department of
Computer and Systems Engineering, Faculty of Engineering,
Kobe University. Since 2001, he has been a professor of the Graduate
School of Informatics, Kyoto University. He is currently a vice-dean of
the school. His research interests include database theory and systems,
Web search, video retrieval, and multimedia information systems.
More at
www.dl.kuis.kyoto-u.ac.jp/~tanaka/.
Also Kazutoshi SUMIYA, University of Hyogo:
Less-Conscious Information Retrieval Techniques for Location Based Services.
We have developed methods which can deal with the users'
interaction without the conventional conscious searching
manner. When a user generally performs map operations with
certain information retrieval intentions (less-conscious),
a system using our method can detect the specific
operation sequences. For example, if the user performs
zooming-in and centering operations, the user is narrowing
down the search area to a certain location. We define such
operation sequences as chunks. The system detects the
chunks and uses them to analyze the user's operations and
thereby detect the user's intentions. We have developed
several prototype systems based on the proposed methods.
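
The chunk idea above can be sketched as simple pattern matching over a log of map operations. This is only an illustration under stated assumptions: the operation names, the pattern table, and the intention labels are hypothetical and do not come from the speakers' systems.

```python
# Hypothetical operation names and intention labels; the abstract does not
# specify the actual patterns used by the prototype systems.
CHUNK_PATTERNS = {
    ("zoom_in", "center"): "narrowing_down",
    ("center", "zoom_in"): "narrowing_down",
    ("zoom_out", "pan"): "widening_search",
}

def detect_chunks(operations):
    """Scan a list of map operations for known two-step chunks.

    Returns (start_index, label) for each non-overlapping match,
    scanning left to right.
    """
    chunks = []
    i = 0
    while i < len(operations) - 1:
        label = CHUNK_PATTERNS.get((operations[i], operations[i + 1]))
        if label:
            chunks.append((i, label))
            i += 2  # skip past the matched chunk
        else:
            i += 1
    return chunks

log = ["pan", "zoom_in", "center", "zoom_in", "center", "zoom_out", "pan"]
chunks = detect_chunks(log)
```

A fixed lookup table keeps the sketch short; chunks of varying length or with timing constraints would need a richer matcher.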
Kazutoshi Sumiya is a professor in the School of Human Science and
Environment, University of Hyogo, Japan. He specializes in information
search, the WWW, content integration and multimedia. He
received his BE and ME degrees in instrumentation
engineering from Kobe University in 1986 and 1988, respectively.
Then he joined Matsushita Electric Industrial Co. He received
his Ph.D. in information media from Kobe University in 1998. He
left the company and became a lecturer at Kobe University in
1999, and then was promoted to an associate professor in 2000.
He became an associate professor in 2001 at Kyoto University and
a professor at the University of Hyogo in 2004. He developed
software development support systems using visual prototyping
for embedded software in home appliances and digital satellite
data dissemination systems at Matsushita Electric. At Kobe
University and Kyoto University, he developed information
dissemination systems and fusion techniques for broadcast media
and network media. At the University of Hyogo, he is developing
next-generation information techniques. He is chair of the
Database System special interest group (DBS) of the Information
Processing Society of Japan (IPSJ) and a co-editor of IPSJ
Transaction on Database.
Friday, Oct 23: Patrick SCHMITZ: Berkeley Prosopography Services and
CollectionSpace.
Berkeley Prosopography Services (BPS) is an open-source prosopographical
toolkit that generates interactive visualizations of the biological and
social connections that link documented individuals, providing a dynamic and
heuristic tool for researching historical communities documented in legal
and administrative archives.
We are currently exploring and developing a prototype application with a
single target corpus, but will soon expand to support multiple corpora. The
initial corpus is a set of Hellenistic Babylonian legal texts (cuneiform
tablets). I'll describe our architecture and the tools we're using, and
describe our plans for the next year or so.
CollectionSpace is a collaboration that brings together a variety of
cultural and academic institutions with the common goal of developing and
deploying an open-source, web-based software application for the
description, management, and dissemination of museum collections
information.
Berkeley is responsible for the development of the services back-end, which
follows SOA principles adapted to this domain. I'll talk about the overall
project architecture and organization, and some of the new approaches we've
developed to services architecture, SOA methodology, and SOA governance.
Pilot deployments of CollectionSpace are underway with the Phoebe A. Hearst
Museum of Anthropology, and with the Herbaria collections.
Both of these projects fit into a longer term mission in IST-Data Services
to build a platform of reusable, interoperable services that support
research and teaching.
See
Using Natural Language Processing and Social Network Analysis
to study ancient Babylonian society. inews.berkeley.edu/articles/Spring2009/BPS.
Also
Collection management systems for campus museums:
CollectionSpace 0.1 released. inews.berkeley.edu/articles/Aug2009/CollectionSpace.
Patrick Schmitz is Semantic Services Architect in
the campus Information Services and Technology's Data Services section.
Friday, Oct 30: Isaac MAO, Social Brain Foundation:
The Future of Sharism: Social Media's Impact in China.
As we mark 40 years since the transformation of the Internet from a
single meme into a global communication tool, it's time for us to
imagine that the future of the Internet could be both socialized to
connect all people and materialized to connect all things. Considering
the speed with which we now connect, a high level of global
consciousness could emerge with active sharism around the world. This
kind of emergent power could be showcased soon in some rapidly wired
countries like China to see its constructive potential in politics and
society.
Isaac MAO is a philosopher on Sharism, social entrepreneur, blogger,
software architect and researcher in learning and social technology. He
divides his time between research, social works, business and
technology. He is now managing director of Social Brain Foundation.
As one of the earliest bloggers in the Chinese community, Isaac is not
only a co-founder of CNBlog.org (www.cnblog.org), the
earliest evangelizing site in China for grassroots publishing, but also
co-chair of the Chinese Blogger Conference
(www.cnbloggercon.org/).
Isaac Mao's homepage is at
www.isaacmao.com.
Friday, Nov 6: Clifford LYNCH: Very Large Scale Preservation; Free Speech and
Access to Knowledge.
After a quick around-the-table for announcements, I'm going
to first cover questions about new storage and computational models for very
large scale digital preservation, with particular focus on the issues raised
by work going on in the resilient computing area, which I will summarize.
This will help to shape a new research agenda for digital preservation.
If there's enough time, I'll follow this up with a re-visiting
of some of the talk that I gave last week at the inaugural Kaplan symposium at
Penn State, which examines the relationships between fundamental American
values of free speech and freedom of the press and related ideas of rights of
access to knowledge and information. There are some surprisingly deep problems
here, including the relationships among "knowledge", "information",
"entertainment" and "culture".
Friday, Nov 13: Tom MORITZ: Data as Evidence.
For decades there has been a general recognition that data should
be freely and effectively available for use. (The scientific method assumes the
availability of data for replication or falsification of results.) A variety
of countervailing pressures have impeded such access and use.
Recently, the European Union, the US National Academies, the Ecological
Society of America, GBIF (the Global Biodiversity Information Facility), and the
NSF OCI DataNet initiative have all been exploring new models for full
life cycle management of data.
In well funded, "big science" domains, models for data management
incorporating community standards, metrics and best practices have evolved to
provide for access and use. In small science such models are less well developed.
This talk will consider data and emerging developments in data curation and
dissemination -- focusing on "small science" and on effective applications of
data to policy formation and decision making.
Tom Moritz has worked since 1975 as a librarian and
knowledge manager in both the public sector and the non-profit private sector,
in governmental, academic and museum settings. He has worked as an advisor on
knowledge management in Africa, Asia, Europe, the Pacific and Latin America,
was a lead organizer of the Biodiversity Heritage Library Project and the
now-UNEP-based Conservation Commons
(www.conservationcommons.org).
He led in the development and release of the first World Database on Protected
Areas, and has successfully participated in grants from the Mellon Foundation,
the Sloan Foundation and the US National Science Foundation. In the Fall of
2005, he served as Visiting Assoc. Prof. at the Pratt Institute Graduate School
of Library and Info. Science in NY.
Friday, Nov 20: Students' Final Progress Reports.
Nick DOTY: Personalized Ontologies of Location.
Many valuable location-based services will depend on an understanding
of location that is personal rather than universal. How would such a parameterized
ontology vary from standard concepts of geography and how can it enable
self-reflection, privacy and context? What challenges stand in the way of any
implementation?
Krishna JANAKIRAMAN: Neighborhood Based Approaches Towards Building
a Book Recommendation System.
Collaborative filtering algorithms are recommender systems that
predict unknown user ratings for items from previously known ratings. A predominant
approach towards building such algorithms is to build a neighborhood model for items
(or users) and then predict the unknown rating for an item using ratings from the
item's (or user's) neighborhood. In my final report, I will discuss three such
approaches towards building a Collaborative Filtering algorithm for recommending
books using the Book Crossing dataset (www.bookcrossing.com). This includes a new
approach suggested by Koren et al. in their Netflix prize winning algorithm.
In their BellKor algorithm (Y. Koren, "Factorization Meets the Neighborhood:
a Multifaceted Collaborative Filtering Model"), Koren et al. proposed a neighborhood
model in which the weights that relate ratings in the neighborhood to the predicted
rating are learned from a global optimization scheme. I will be analyzing their
method's performance on the Book Crossing dataset against two well known neighborhood
based approaches where the neighborhood models are built using Pearson's
correlation coefficient and the SVD of the user-book ratings matrix respectively.
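
For readers unfamiliar with the Pearson-based baseline mentioned above, here is a minimal sketch of a classic user-based neighborhood predictor. It is not the BellKor model, whose neighborhood weights are learned by global optimization, and it is not the speaker's implementation. The toy ratings, function names, and the neighborhood size `k` are assumptions for illustration only.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {book: rating}); not BookCrossing data.
ratings = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "cat": {"A": 1, "B": 5, "D": 2},
}

def pearson(u, v, ratings):
    """Pearson correlation computed over the books both users rated."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0
    mu_u = sum(ratings[u][b] for b in common) / len(common)
    mu_v = sum(ratings[v][b] for b in common) / len(common)
    num = sum((ratings[u][b] - mu_u) * (ratings[v][b] - mu_v) for b in common)
    den_u = sqrt(sum((ratings[u][b] - mu_u) ** 2 for b in common))
    den_v = sqrt(sum((ratings[v][b] - mu_v) ** 2 for b in common))
    den = den_u * den_v
    return num / den if den else 0.0

def user_mean(user, ratings):
    """Mean of all ratings given by a user."""
    return sum(ratings[user].values()) / len(ratings[user])

def predict(user, book, ratings, k=2):
    """Predict a rating as the user's mean plus a similarity-weighted
    average of the k strongest neighbors' deviations from their own means."""
    base = user_mean(user, ratings)
    neighbors = [
        (pearson(user, v, ratings), v)
        for v in ratings
        if v != user and book in ratings[v]
    ]
    neighbors = sorted(neighbors, key=lambda x: abs(x[0]), reverse=True)[:k]
    norm = sum(abs(s) for s, _ in neighbors)
    if norm == 0:
        return base  # no informative neighbors: fall back to the user's mean
    offset = sum(s * (ratings[v][book] - user_mean(v, ratings)) for s, v in neighbors)
    return base + offset / norm

prediction = predict("ann", "D", ratings)
```

In this baseline the weights are fixed by the similarity measure, whereas the learned-weight approach described in the abstract fits them to the data.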
Ryan SHAW: The Haymarket Affair/Massacre/Riot:
Programmatically Analyzing Full
Texts About a Contested Event.
I will present a progress report on my attempt to investigate
Ankersmit's theory of colligation by analyzing the full text of 10
books on the 1886 Haymarket Affair from the Internet Archive.
Friday, Nov 27: No Seminar Meeting -- Thanksgiving.
Friday, Dec 4: No Seminar Meeting.
The Seminar will resume on January 22, 2010.