School of Information
 Previously School of Library & Information Studies

 Friday Afternoon Seminar: Summaries.
  296a-1 Seminar: Information Access, Fall 2022.

Fridays 3-5. Details will be added as they become available.
In person, and also on Zoom, unless indicated otherwise. Campus policy requires all Zoom participants to sign into a Zoom account before joining meetings hosted by UC Berkeley. Face masks are recommended but not required for in-person attendance. Zoom sessions are not recorded.
A link to each Seminar session is available only at the School's event listing:
Schedule. Weekly mailing list.

Aug 26: Clifford LYNCH: Overview of stewardship lectures.
    Michael BUCKLAND & Clifford LYNCH: Introduction.
    Introduction to the Seminar and plans for the fall, including a summary of upcoming sessions. Introductions of participants; expectations for registered students.
    Clifford LYNCH: Context and Overview for the Stewardship Lectures.
    In 2016 I gave a series of talks in the seminar trying to summarize and synthesize work I'd done over the previous 20 years on stewardship of the scholarly and cultural record. These were transcribed, and I hoped to use them as the basis for a book. As I covered the material, it became clear that several major areas had been omitted from the survey (and I published several papers covering some of these areas). In the following years, several urgent personal issues, and then a pandemic, intervened, and the landscape of preservation and stewardship changed significantly. In this introductory discussion, I'll outline what I hope to cover in the 4-5 sessions we've scheduled on stewardship this fall, and try to provide a broad context for the upcoming lectures.

Sep 2: No Seminar meeting. Labor Day weekend.

Sep 9: Clifford LYNCH: Stewardship 1: The Scope of the Challenge.

    Following the August 26 introduction and framing of the stewardship challenge, my talks on September 9 and September 30 will begin an examination of the scope of the stewardship challenge. I'll look at the scope and nature of the scholarly record, and then explore and contrast that with the much larger and less well-defined cultural record. After some general discussion, I'll look at a series of specific case studies: music; books; moving and still images; geospatial, remote sensing, and documenting the environment; the web at large, including the deep web, and the digital ephemera of the future; social media; personal digital archiving; factual biography; and news.
    After these case studies, I'll explore the continually shifting, contested areas and boundaries at the fringes of the cultural record. I'll briefly discuss efforts to measure or approximate the scale of various segments of the cultural record, and of the parts of that record under the protection of stewardship institutions.

Sep 16: Short Reports by Arogya KOIRALA, Shai DHALIWAL, Calvin LEE, Alan KYLE, Siddharth ADELKAR, Sarah BARRINGTON and Ameya NAIK.
    Arogya KOIRALA: Monitoring War Destruction from Space Using Machine Learning.

    Extracting information on war-related destruction is difficult because it relies on eyewitness accounts and manual detection in the field, which is not only costly for the institutions carrying out these efforts but also unsafe for the individuals performing the task. The information gathered is also incomplete, which makes it difficult to use in media reporting, humanitarian relief, understanding human rights violations, or academic research. This seminar introduces an automated approach to measuring destruction of war-damaged buildings by applying deep learning to publicly available satellite imagery. We adapt different neural network architectures and make them applicable to the building damage detection use case. As a proof of concept, we apply this method to the Syrian Civil War to reconstruct war destruction in the affected regions over time. We close the discussion by talking about how the nature and quality of the inputs used (publicly available satellite imagery) and the architectural choices made in the design of the machine learning system relate to the robustness and generalizability of the outputs produced. This work builds on prior work by Mueller et al. in the PNAS paper "Monitoring war destruction from space using machine learning".
    Shai DHALIWAL: Modernizing Mainframe RACF.

    I plan to explore how cloud modernization can improve cybersecurity for legacy mainframe information systems. I will consider the driving factors that lead organizations to continue relying on legacy mainframe systems today, along with their existing Identity & Access Management capabilities; explore opportunities for organizations to migrate IBM mainframe workloads securely to the cloud, and how this could benefit them over the next 10-20 years; and provide a recommended framework for executing Mainframe RACF modernization to improve Identity & Access Management security.
    Calvin LEE: Exploring Consumer Robocall Mitigations.
    Have you recently received a strange call from an unknown number in your area code urging you to extend your expired vehicle warranty? If so, you received one of the 75.9 billion illegal robocalls made within the last 12 months. Countermeasures have been introduced by key players such as the Federal Communications Commission and Tier-1 US network carriers, but are they proving futile? Perhaps we can borrow tactics from related domains, such as email anti-spam filtering or CAPTCHA techniques. Designing a solution to mitigate these unwanted calls is complex, but we will explore possible new mitigations.
    Alan KYLE: Drawing Lines Between Section 230 and Trust & Safety.
    In this presentation I will discuss what Section 230 and trust & safety are and how they are connected. I will use my experience as a trust & safety professional, and my research on Section 230 bills, as a jumping-off point for thinking about ways to contextualize current attempts to regulate the Internet.
    Siddharth ADELKAR: English Documentation of Non-English Stories: A Study of the People's Archive of Rural India.
    The People's Archive of Rural India (PARI) is primarily a journalism project. PARI reports on the "everyday lives of everyday people," using the craft and structure of journalism to inform its predominantly English-speaking urban readership -- an important part of which is school and college students. PARI sees itself as an antidote to the structural problems in Indian media, education and historiography. While Indian media focuses on the urban rich, PARI focuses on the rural poor. While Indian English education focuses on skilling and emigration, PARI focuses on deep engagement with one's surroundings. While Indian history focuses on the narrative of kings and empires, PARI prepares an archive for future writings of people's histories. Distinctive themes that have stood out in PARI's reportage include the agrarian crisis in India, women's sexual and reproductive health, and chronicling the impending climate catastrophe as "everyday" people feel it.
    An important question that I wish to study in PARI's coverage is the impact -- benefits and drawbacks -- of English-first documentation of livelihood and culture in a predominantly non-English-speaking country. PARI is translated into up to 11 Indian languages by over 100 accomplished translators. Yet if, in the future, PARI alone were to survive, what would be lost due to the English-ness, or English-firstness, of its stories? Would something be gained? Does the presence of pictures and video improve the situation or worsen it?
    Sarah BARRINGTON: The ‘Fungibility’ of Non-Fungible Tokens: Vulnerabilities in an Over-Hyped Market.
    Non-Fungible Tokens, digital certificates of ownership for virtual art, are becoming increasingly ubiquitous in Western media, from the release of the first notes of the Beatles’ ‘Hey Jude’ as an NFT to the sale of the first tweet as an NFT for $2.9 million. In 2021, the market was valued at $17.6bn, representing a growing and salient portion of the overall cryptocurrency and blockchain economy. Yet the NFT market is also speculative, variously described as irrational and overhyped. The emergence of vulnerabilities, along with a sustained market downturn, is now calling the role of NFTs into question: what exactly are NFTs? And most importantly, what gives them value? This project aims to address these questions, arguing that three fundamental properties (permanence, immutability and uniqueness) are necessary (but not sufficient) conditions for an NFT to have value. We explore both the underlying artworks and their associated metadata in order to define these metrics. Furthermore, we take a quantitative approach to testing these definitions against 6 months of real-world data, examining the true permanence and perceived value of over 7 million NFTs. We ultimately envision this work helping buyers and marketplaces identify NFTs that may be overvalued and warn users against purchasing them, and bringing some much-needed rigour to a presently complex and recondite market.
    Ameya NAIK: Assessing Data Subject Access Requests.
    Most mobile and desktop applications prompt you to accept a terms-of-service or privacy policy agreement that allows the application to gather information about you, your activities, and your attributes. The level and type of information gathered depend on the organization, business model, kind of application, and geographical location. You can access this information through a Data Subject Access Request (DSAR), which applications are bound to honor. While GDPR (Article 15) and the CCPA set out broad rules for DSARs, there are still differences in the way data is stored and shared with consumers (you). I plan to initiate DSARs, document the process, and then analyze the data shared. This analysis could lead to interesting observations, and I would also like to compare the data shared by similar applications (messaging, social media, etc.). Developing a catalog and visualizing the data would be another aspect of the project.

Sep 23: Chris FREELAND, Internet Archive: Controlled Digital Lending.
    The Internet Archive’s Open Libraries program empowers libraries to lend digital books to patrons using controlled digital lending (CDL). In the course of this discussion we'll cover how CDL works, different implementations of the library practice, including the Internet Archive's Open Libraries program, and the impact that the practice has on libraries and the communities we serve. We'll also cover Hachette v. Internet Archive, the lawsuit brought against the Internet Archive by four commercial publishers over controlled digital lending.
    Chris Freeland is the Director of Open Libraries at the Internet Archive, working in support of the organization's mission to provide "Universal access to all knowledge." Before joining the Internet Archive, Chris was an Associate University Librarian at Washington University in St. Louis, managing Washington University Libraries' digital initiatives and related services, and the Director of the Center for Biodiversity Informatics at the Missouri Botanical Garden. He holds an M.S. in Biological Sciences from Eastern Illinois University and an M.S. in Library and Information Science from University of Missouri-Columbia.

Sep 30: Clifford LYNCH: Stewardship: The Scope of the Challenge (Continued).
    I'll continue the discussion from September 9, exploring the nature of the (digital) cultural record through a series of detailed examinations of developments involving particular genres of content. After completing the discussion of recorded music from the last session, we'll discuss moving images (video and film); geospatial and remote sensing broadly; the web, including the "deep" web and consideration of new modes of grey literature and ephemera; and, time permitting, social media and its implications for stewardship. This discussion will continue on November 4.

Oct 7: Michael BUCKLAND & Wayne de FREMERY: Contexts, Works, and Catalogs.
    Building on work previously presented at this seminar by Wayne de Fremery and me, we propose some fundamental changes to bibliographic and library search and discovery:
    1. Consider the purpose of retrieval systems as a search for "families" or "contexts" of related works rather than for particular items;
    2. Diminish the privileged status accorded to individual creative works, notably by Seymour Lubetzky ’34 and others, and redefine and redirect the Functional Requirements for Bibliographic Records (FRBR) model accordingly; and
    3. Unbundle the tight relationship between library catalog and library collection in order to harmonize theory with contemporary technological reality.
    Wayne de Fremery is Professor of Information Science and Entrepreneurship and Director of the Francoise O. Lepage Center for Global Innovation at Dominican University of California. Previously, he was an associate professor in the School of Media, Arts, and Science at Sogang University in South Korea, where he lived for twenty years. He currently represents the Korean National Body at ISO as Convener of a working group on document description, processing languages, and semantic metadata (ISO/IEC JTC 1/SC 34 WG 9). His recent research projects have concerned "Digital humanities in the iSchool" (JASIST, 2022), "Copy theory" (JASIST, 2022), "Context, relevance, and labor" (JASIST, 2022), as well as the use of deep learning to improve Korean OCR, for which he received a national citation of merit from the South Korean Ministry of Culture, Sports, and Tourism.

Oct 14: Cathy MARSHALL: Who Broke Mechanical Turk?
    Crowdsourcing platforms provide a valuable way to perform a wide range of human intelligence tasks--e.g. data labeling, content moderation, text translation, citizen science--as well as a convenient venue for collecting participant data. I've been using Amazon Mechanical Turk in various capacities since 2010, and have followed worker forums, labor organizing efforts, and the development of worker-centered tools (on one side) and increasingly sophisticated uses of the crowd (on the other). Early on, my colleagues and I were (perhaps naively) delighted by the quality of the data we gathered and by generally positive interactions we had with workers. Using practical advice from the literature, we were able to vet work and encourage good-faith participation in our studies.
    More recently, a handful of researchers from diverse disciplines who use crowdsourcing platforms have described an uptick in unusable data from US-based workers. Frank Shipman and I saw this ourselves in 2018 and 2019 when we re-ran a survey we'd used successfully five years earlier: by 2019, we had to exclude more than 12% of the completed HITs according to our established cleaning heuristics. Even knowing this, what we saw on Mechanical Turk this spring and summer startled us. Almost 90% of the data was unusable. In this talk, I'll use a preliminary analysis of our own and other researchers' data in an effort to explain what seems to be happening on Mechanical Turk, present evidence that it is not necessarily a symptom of bots, autocompletion tools, or bad-faith work, and speculate about why Amazon has little incentive to do anything about it.
    Cathy Marshall is a senior research scientist at Texas A&M University and a former principal researcher at Microsoft Research, and before that, at Xerox PARC. She's a fan of personal ephemera, special collections, and a quiet reading room.

Oct 21: Students' Progress Reports.

Oct 28: Early start! 2:30 pm.
    Seminar combined with the School's 104th Birthday Celebration.

    For program and registration go to

Nov 4: Clifford LYNCH: Stewardship - 3.

Nov 11: No Seminar meeting. Veterans Day holiday.

Nov 18: Final Progress Reports.

Nov 25: No Seminar meeting. Thanksgiving.

Dec 2: In person only. Clifford LYNCH.

The Seminar will resume in the Spring semester.
Spring 2022 schedule and summaries.