School of Information
 Previously School of Library & Information Studies

 Friday Afternoon Seminar: Summaries.
  296a-1 Seminar: Information Access, Spring 2010.

Fridays 3-5. 107 South Hall. Schedule.
Friday Jan 22: Clifford LYNCH: Introductions and Varied Topics.
    1. Introduction to the seminar and preview of the semester.
    2. Brief topics (contributions from all participants welcome): Reflections on HICSS, the fall CNI meeting, the NSF workshop on assessing the impact of scholarly communication.
    3. Some initial thoughts on personal archiving and the role of cultural memory organizations in the digital age (preparation for a February workshop talk).

Friday Jan 29: Jöran BEEL and Béla GIPP, Magdeburg, Germany: Mindmaps and Citation Proximity Analysis. See below.
    Also Cecilia PRESTON: Professional Societies and their members: What is effective communication today?
    A brief discussion: The professional societies and the journal have a long history, going back to the 1660s or so. In the 21st century, is this the only way? I am the co-chair of a committee of the American Society for Information Science and Technology (ASIS&T) which has been charged with exploring the communication methods of the Society, including traditional publication. We are to recommend strategies to move the Society forward in light of the many rapid changes in modes of communication. We all have heard over and over about the crisis in scholarly communications - these problems also occur within the professional societies (which in this case includes a highly ranked scholarly journal, JASIS). What works and/or doesn't work for you and the professional/scholarly societies to which you belong? Let's talk.
    Béla GIPP, Guericke University, Magdeburg, Germany: Citation Proximity Analysis
    The search for related work is a time-consuming procedure that, even if performed by experts, often leads to unsatisfying results. To alleviate the problem, digital libraries, search engines and recommender systems such as Google Books and Google Scholar offer to display "related" documents. The most powerful approaches to determining related documents usually apply citation analysis. As part of my PhD project I developed a new approach to identifying related documents, called Citation Proximity Analysis (CPA). It is based on co-citation analysis, but additionally evaluates the citations' position within the full text. The same concept can also be applied to mind maps and quotations (Quotation Proximity Analysis; QPA) and therefore could be used to determine related documents. The talk presents the latest research results on CPA and QPA as well as a new approach to detecting plagiarism based on citation order analysis, even if the plagiarized document was translated into another language.
    Béla GIPP is a PhD student in Computer Science at the Very Large Business Applications Lab in the Faculty of Computer Science at the Otto-von-Guericke University in Magdeburg, Germany, and a Visiting Scholar in the I-School this year. His fields of research are information retrieval and recommender systems.
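The idea behind Citation Proximity Analysis described in the abstract above (co-citation counts refined by where the citations sit in the full text) can be illustrated with a minimal sketch. This is not the speaker's implementation: the weight values, the `(section, paragraph, sentence)` position encoding, and the function names `proximity_level` and `cpa_scores` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical proximity weights (an assumption, not taken from the talk):
# the closer two citations appear in the citing text, the higher the weight.
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "document": 0.125}


def proximity_level(a, b):
    """Return the closest structural level shared by two citation positions.

    Each position is a (section, paragraph, sentence) tuple of indices.
    Paragraph indices restart in each section, and sentence indices
    restart in each paragraph.
    """
    if a == b:
        return "sentence"
    if a[:2] == b[:2]:
        return "paragraph"
    if a[0] == b[0]:
        return "section"
    return "document"


def cpa_scores(citing_documents):
    """Sum proximity-weighted co-citation scores over all citing documents.

    citing_documents is a list of citing documents, each given as a list of
    (cited_id, position) pairs. Within one citing document, only the closest
    pair of positions for two cited works contributes to their score.
    """
    scores = defaultdict(float)
    for citations in citing_documents:
        positions = defaultdict(list)
        for cited_id, pos in citations:
            positions[cited_id].append(pos)
        for x, y in combinations(sorted(positions), 2):
            best = max(
                WEIGHTS[proximity_level(p, q)]
                for p in positions[x]
                for q in positions[y]
            )
            scores[(x, y)] += best
    return dict(scores)


# Two made-up citing documents that both cite works A, B and C.
doc1 = [("A", (0, 0, 0)), ("B", (0, 0, 0)), ("C", (2, 1, 0))]
doc2 = [("A", (1, 0, 0)), ("B", (1, 0, 3)), ("C", (1, 4, 0))]
scores = cpa_scores([doc1, doc2])
```

Plain co-citation analysis would give every pair here the same count of 2, since each pair is cited together in both documents. With the assumed weights, the pair (A, B) scores higher than (A, C) or (B, C) because A and B are cited in the same sentence or paragraph.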
    Jöran BEEL, Guericke University, Magdeburg, Germany: Enhancing Search Applications with Data Retrieved from Mind Maps.
    Mind maps are used for many purposes, among others for brainstorming, project management and outlining documents. In my presentation I will demonstrate how to manage scientific literature and draft research papers with the mind mapping software "SciPlore MindMapping". In addition, I will present concepts and first research results on how data retrieved from mind maps can enhance search applications such as expert search systems; web and academic search engines; research paper recommender systems; and search query recommenders.
    Jöran BEEL is also a PhD student at the Very Large Business Applications Lab in the Faculty of Computer Science at the Otto-von-Guericke University in Magdeburg, Germany and a Visiting Scholar in the I-School this year. His fields of research are academic search engines and mind maps.
    For further reading on both presentations:
and for SciPlore MindMapping:

Friday Feb 5: ** Double program: 1:30 p.m. and 3:00 p.m. **
    Dominic POWLESLAND, Landscape Research Centre, U.K.: My mind Boggles as I Goggle at my Google Earth: Challenging the Serendipity and Emptiness of Past Landscapes through the dissemination of 30 years of archaeological research in the Vale of Pickering, Yorkshire, England

    Thirty years ago the Vale of Pickering in Yorkshire, England, was an archaeological distributional "blank", a landscape dominated by apparent "emptiness" - like most of the maps produced by archaeologists and others in the humanities whose knowledge had mostly been gained through serendipitous discovery or chance survival.
    Over the last three decades the Landscape Research Centre (LRC), with the support of English Heritage, has run multiple projects, both large excavations and pioneering remote sensing programmes. When combined, the results expose what appears to be the most densely utilised fragment of human landscape in Britain, covering c. 100 sq km and showing intense human activity over the last 10,000 years. The results of this work challenge our perceptions and interpretations of the past at almost every level and for almost every chronological period; however they also challenge our theoretical frameworks and our methods of knowledge dissemination.
    The lecture will reflect on past research and how false theories and impressions have created an un-functioning view of the past. Whilst complex archaeological datasets are difficult to articulate and disseminate, we innovatively achieve this at present using a vast plot measuring 1 x 5 meters at a scale of 1:2000 termed the "wallpaper". The lecture will review work we are doing to try and deliver the results of our research through the Google Earth platform. The addition of a clock in Google Earth was a dream come true for anyone wishing to articulate cultural sequence in space -- the holy grail 4th dimension for archaeological computing. However we have had to look at more novel approaches to face up to the challenge of presenting more than 27,000 distinct and interlinked features from the Mesolithic to Medieval periods.
    Dominic Powlesland is Visiting Professor, Institute of Medieval Studies, University of Leeds, England, and Director of The Landscape Research Centre.

- 3:00 p.m.: W. David BAMMAN and Gregory CRANE, Project Perseus, Tufts Univ: New Questions for Old Data: Information Technology as Catalyst for New Possibilities and Combinations of Teaching and Research with Historical Languages.
    The rise of massive collections, increasingly sophisticated services and global, heterogeneous intellectual communities has changed the role of intellectual inquiry: not only can we work with far more textual data and with more languages than was ever before feasible, but there are new opportunities for undergraduates and non-specialists to contribute as well. We will talk about some of the opportunities as well as challenges before us.
    David Bamman is a senior researcher in computational linguistics for the Perseus Project at Tufts University. Gregory Crane is Editor-in-Chief of the Perseus Project and Professor of Classics at Tufts University.

Friday Feb 12: Bob BELL: E-Ris (E-Resources for Industry Studies)
    The project I intend to develop for the Information Access Seminar is a continuation of the database project I started in INFO 257: e-RIS (e-Resources for Industry Studies). In the database course, I developed the database structure and a simple PHP-enabled website for accessing and viewing tables as well as executing queries and filtering results. The second phase of the project will have two parts: (1) improving usability so that the research team can more easily add and edit records in the database; (2) migrating the database to a public website with search functionality and user feedback enabled.
    Also Clifford LYNCH: Evolving Roles of Subject and Institutional Repositories.
    In this discussion, I'll reflect on the evolving roles and relationships between subject, or disciplinary, repositories on one hand and institutional repositories on the other. This is based in part on a talk I gave in late January at the conference on Subject Repositories in Economics hosted at the British Library.

Friday Feb 19: Nick DOTY, Ryan GREENBERG, Julián LIMÓN NÚÑEZ, Hyunwoo PARK, Ljuba MILJKOVIC, and Abe COFFMAN: IO Lab Showcase Redux.
    Information Organization Lab was a new course taught in the Fall at the School of Information for students to experiment with fundamental concepts of information organization and retrieval. Students from the class will demo some of the over 30 group and individual projects that were built as part of rapid two- to three-week assignments using tools like JavaScript, jQuery, Python, Google App Engine and APIs from Delicious, Flickr, Yahoo! and Freebase. Come see Vannevar Bush's Memex implemented using Delicious, a location-based news retrieval interface, a Flickr tag browser and more. The instructors will also be on hand to answer questions about how the course ran and to get your feedback on whether and how to continue the class in the future.

Friday Feb 26: Ray LARSON: Bringing Lives to Light: Biography in Context -- A Retrospective.
    Cultural heritage, history, and the social sciences are fundamentally about human activity. Everyone is interested in what other people do and have done. Chronological, geographical and biographical data lend themselves naturally to being connected: an event is associated with a place, a time and potentially with particular people; places are associated with different events and people; and individual people are also associated (in a variety of ways) with different places and events. Life-events in sequence constitute a narrative that can engage interest and spark inquiry. History, geography, and most other subjects can come alive in the travelogues, journeys of discovery, and the life-stories of those involved. Science can be explained through the work of scientists. Engineering is routinely explained through the heroic struggles of inventors. But mere narrative is not enough. Understanding the context of these life events differentiates education from memorizing. It is understanding the circumstances of people's actions that illuminates their lives, but there is a significant gap in the infrastructure developed by libraries, museums, and publishers in this area. Our objective in this project was to design, demonstrate, and evaluate techniques that would bring lives to light by revealing them in their contexts.
    Project website Final report:

Friday Mar 5: Ben SHNEIDERMAN, Univ of Maryland: A National Initiative for Technology-Mediated Social Participation.
    Technology-mediated social participation is generated when social networking tools (such as Facebook), blogs and microblogs (Twitter), user-generated content sites (YouTube), discussion groups, problem reporting, recommendation systems, and other social media are applied to national priorities such as health, energy, education, disaster response, environmental protection, business innovation, cultural heritage, or community safety.
    Fire, earthquake, storm, fraud, or crime reporting sites provide information to civic authorities; AmberAlert has more than 7 million users who help with information on child abductions; Peer-to-Patent provides valuable information for patent examiners; and SERVE.GOV enables citizens to volunteer for national parks, museums and other institutions. These early attempts hint at the vast potential for technology-mediated social participation, but substantial research is needed to scale up, raise motivation, control malicious attacks, limit misguided rumors, and protect privacy.
    As national initiatives are launched in several countries to dramatically increase research and education on social media, a coordinated approach will be helpful. Clearly stated research challenges should have three key elements: (1) close linkage to compelling national priorities; (2) scientific foundation based on established theories and well-defined research questions (privacy, reciprocity, trust, motivation, recognition, etc.); and (3) computer science research challenges (security, privacy protection, scalability, visualization, end-user development, distributed data handling for massive user-generated content, network analysis of community evolution, cross network comparison, etc.).
    Potential short-term interventions include:
- universities changing course content, adding courses, and offering new degree programs;
- industry helping researchers by providing access to data and platforms for testing; and
- government agencies applying these strategies in pilot studies related to national priorities.
    Ben Shneiderman is a Professor in the Department of Computer Science, Founding Director (1983-2000) of the Human-Computer Interaction Laboratory, and Member of the Institute for Advanced Computer Studies at the University of Maryland at College Park. He is the author of Leonardo's Laptop: Human Needs and the New Computing Technologies (MIT Press, 2002) and Designing the User Interface: Strategies for Effective Human-Computer Interaction (Fifth Edition, Addison-Wesley, 2009).

Friday Mar 12: Michael BUCKLAND: Information, Knowledge, and I-Studies.
    Information Studies is widely regarded as a new and emerging field -- and has been for several decades. The rise of "i-schools" and demands for the development of a science of information make the time ripe for a clarification of the core concerns of i-studies. I will introduce and invite discussion of ideas about how that might be achieved through a more careful view of "information", of "knowledge", and of being "interdisciplinary", and of what kind of "science" this field could be.

Friday Mar 19: David S. H. ROSENTHAL, Stanford University Libraries: Stepping Twice Into the Same River.
    The talk Dr. Rosenthal gave here in November 2008 on the problems of digital preservation evolved into a plenary talk "How Are We Ensuring the Longevity of Digital Documents?" at CNI. It concluded that the transition from pre-Web electronic documents for manipulation to post-Web documents for publication had profound effects on digital preservation. In this talk he will broaden the focus to look at the effects on the publishing ecosystem of the looming transition from static content to dynamic services, and how access for future readers could be maintained.

Friday Mar 26: Spring break. No seminar meeting.

Friday Apr 2: Short Reports:
    Bob BELL: E-Ris (E-Resources for Industry Studies): Progress report
    I will provide a short update on my work on the E-RIS project, which aims to create a public e-service that improves both the access of industry researchers to e-resources and their ability to make use of e-resources in generating their output. In the update, I will discuss (1) creating a new set of meta-data for the initial e-resources; (2) providing supplemental data on each e-resource; and (3) revising the database structure.
    Ray LARSON: New NEH Project on Archival Name Extraction and Disambiguation.
    Noah KERSEY & Michael BUCKLAND: New Context Finder.
    Michael BUCKLAND: New Mellon Project on Editors' and Curators' Notes

To be rescheduled:
    Avi RAPPOPORT, Search Tools Consulting: The Average Lifespan of a Web Page
    I'll talk about an interesting statistic that came out of the research I did about the "average lifespan of a web page". It seems existential: What is a lifespan? What is a web page? What is average? What about greater web sites? How is this related to URL turnover? What about institutional vs. personal networking communities? Who decides and when?
    I will also report briefly on The UK Web Archive. The British Library got tired of waiting for the copyright law to be interpreted, so they're going ahead and indexing everything on the .uk top-level domain. IBM is providing services for handling this "Big Data" (new-ish jargon term), including Apache Hadoop, Pig Latin, Nutch, Open Calais, InfoSphere and ManyEyes. They're playing with metadata extraction and interface ideas. The one they're currently touting is a spreadsheet interface (thus the name BigSheets).
    Avi Rappoport is a metadata and search engine consultant with Search Tools Consulting.

Friday Apr 9: Alan INOUYE, Director, Office for Information Technology Policy, American Library Association: Information Policy and Politics in the Obama Administration.
    This session will include discussion of the broadband grant programs under the American Recovery and Reinvestment Act (the big stimulus package from 2009) and the recently-released National Broadband Plan. How do these initiatives affect the library and related communities? What is the back story of how such initiatives are crafted? In addition, there will be some discussion of other activities within ALA's Office for Information Technology Policy such as the Google Book Search lawsuit, readers' rights in the digital age, the future of libraries and the public's access to information, and other topics as attendee interest dictates.
    Alan Inouye received his Ph.D. from South Hall in 1997. In 2007, he was appointed director of the American Library Association's Office for Information Technology Policy, based in Washington. Previously, he served as coordinator of the President's Information Technology Advisory Committee (PITAC) and study director at the National Academy of Sciences' Computer Science and Telecommunications Board (CSTB).

Friday Apr 16: Ingeborg & Arne SØLVBERG, Trondheim, Norway, and Thomas TUNSCH, Berlin.
    Short report: Thomas TUNSCH, Berlin: The Wikimedia@MW2010 Workshop.
    A report on the Museums and the Web 2010 conference's Wikimedia@MW2010 Workshop on how the museum and Wikipedia communities can work together more effectively.
    Ingeborg & Arne SØLVBERG, Trondheim, Norway: Knowledge Management in Times of IT Revolutions.
    The distribution channels for knowledge have become much less expensive over the last 5-10 years, and distribution may now be regarded as a free commodity relative to other costs, e.g., those of producing and consuming knowledge. Established organizations for knowledge management have come under pressure, e.g., newspapers, libraries, publishers, booksellers, and TV. The representation of knowledge is also changing, from the representation of knowledge in printed documents to the representation of knowledge in the semantic web of the future. The presentation is composed of two parts:
    Ingeborg Torvik Sølvberg: The reengineering of Norwegian research libraries during the previous IT revolution: what we can learn from the experience of bringing IT into an existing library organization.
    Arne Sølvberg: Model management versus document management, when will the two meet?
    The role of models in the representation of knowledge; how knowledge, information and data relate; approaches to information service provision; cross-disciplinary knowledge management; the semantic web.
    Discussion theme: The skills basis of the knowledge manager of the future.
    Background: One of the most striking features of the IT revolution is that the distribution of knowledge has become gratis, relatively speaking. Those who previously controlled the expensive distribution channels had a near monopoly in deciding which content was "worthy" of distribution, e.g., the newspaper editors determined what the newspaper would print; the publishers determined which books to print. Only a few would be granted the privilege of delivering content, and only after a serious quality control process. Both the distribution monopoly and the quality control are currently under severe pressure.
    Libraries were originally established to make knowledge represented as printed material available to citizens at an affordable cost. Quality of content could be taken for granted because it was controlled by the owners of the distribution channels. Libraries served as distribution channels from the few content providers to the many content consumers. The current situation is that content is increasingly provided by the many and consumed by the many, and the distribution is close to free. There are no longer distinct geographical locations where knowledge is stored. Knowledge is stored in the computer "cloud". Much knowledge is stored as text. But there is also a trend towards "model management": the ideas that the texts are about are increasingly presented in explicit forms, as is evidenced by the proposals for a "semantic web".
    The digital library has taken over tasks from the traditional library and offers more services than it did. The role of the "new" digital library is unclear: which knowledge to manage, which services to produce, which needs to satisfy.
    Ingeborg Torvik Sølvberg is in the Information Management Group and Arne Sølvberg is in the Information Systems Group of the Department of Computer and Information Science, Norwegian University of Science and Technology (NTNU), Trondheim, Norway. They are Visiting Scholars here this year.

Friday, Apr 23: Clifford LYNCH: Citizen Science and Citizen Humanities in the Age of Cyberinfrastructure.
    First, briefly: Some comments on recent meetings, notably the CNI spring meeting and the National Academies E-Journal meeting.
    Some thoughts on citizen science and citizen humanities in the age of cyberinfrastructure. "Amateur" or "citizen" science has a rich history. Indeed, once all scientists were arguably amateurs. We are seeing a resurgence of this in a range of scientific areas. I'll survey a few of these, and try to draw some generalizations about where amateur engagement seems most feasible, as well as trying to raise some questions about possible distinctions between amateur and citizen science today. I'll then examine the recent rise of amateur or citizen humanistic work, which has been enormous and has been fueled both by social changes and changes in the educational system, and look at some of the implications of these developments for scholarship broadly.

Friday, Apr 30: Bob BELL: Final Progress Report, and Eric KANSA: Scientific Data.
    Bob Bell: Breathing Life into E-RIS: Creating an online platform for e-resources
    I will discuss the progress on the project, including (a) database changes, including meta-data/descriptors for e-resources, populating the database, and structural changes to the database; and (b) database-driven website development, including the interactive user's forum for e-resources.
    Eric KANSA: Carrots, Sticks, and Web Publishing of Scientific Data: Open Context in Context.
    The journal Nature recently published a series of editorials and features highlighting the lack of data sharing in many scientific domains. Reluctance to share primary data comes from a number of factors. In some cases, researchers see risks of appropriation and misuse more than they see rewards for sharing data. In other cases, "cyberinfrastructure" supporting dissemination channels and services remains poorly developed.
    I will discuss a case study from archaeology to help illustrate how data sharing practices are evolving. Often, the impact and research benefits of sharing data remain theoretical rather than demonstrated. Fortunately, this is beginning to change, at least in certain specialized subject areas within archaeology. As data sharing assumes greater prominence in publication and funding, two distinct approaches for data sharing are emerging within archaeology. On the one hand, there are models for "data-sharing through archiving," and on the other hand, there are models that try to cast "data sharing as publication." In the background, professional ethics, funding mandates, and supporting "cyberinfrastructure" are all aligning to place data sharing as a more regular part of professional practice.
    Eric KANSA is Adjunct Professor and Executive Director of the School's Information and Service Design Program. He is also Co-Founder and former Executive Director of the Alexandria Archive Institute.

Friday, May 7: No Seminar meeting.

The Seminar will resume in the Fall semester on August 27.