School of Information
 Previously School of Library & Information Studies

 Friday Afternoon Seminar: Summaries.
  296a-1 Seminar: Information Access, Spring 2023.

Fridays 3-5. Details will be added as they become available.
In person, and also via Zoom, unless indicated otherwise. Campus policy requires all Zoom participants to sign into a Zoom account prior to joining meetings hosted by UC Berkeley. Face masks are recommended but not required for in-person attendance. Zoom sessions are not recorded.
A link to each Seminar session is available only at the School's event listing: www.ischool.berkeley.edu/events.
Schedule. Weekly mailing list.

Jan 20: Clifford LYNCH & Michael BUCKLAND: Introductions.
    Introduction to the Seminar. Expectations for registered students. Schedule highlights. Introduction of participants.
    Clifford LYNCH: Stewardship in the Digital Age Continued. The Scope of the Challenge: Personal Digital Archiving.
    During the Spring 2023 Semester of the Seminar, I'll continue the series of talks exploring the issues of stewardship in the digital age that I started in Fall 2022. For the benefit of those new to the Seminar, I'll briefly summarize the framing of this series of talks and the topics covered in Fall 2022, and then provide a roadmap for at least the next few talks that I'll give in Spring 2023.
    Following this I'll move into an in-depth consideration of personal digital archiving, and consider how this relates to historical and future stewardship practices. We'll examine the scope of what might constitute a personal digital archive, and how this relates to prior discussions of stewardship of social media and the web. Of particular interest is the question of when, why and how material moves from the personal (or perhaps familial) setting to the broader cultural

Jan 27: Nick MERRILL & Jake HARTNELL: Data DAOs: Promises, perils, and privacy.
    Data cooperatives and data unions have long inspired dreams of cooperative ownership over individual- and community-level data. Decentralized Autonomous Organizations (DAOs), software that provisions cooperative ownership over shared data using cryptography, have brought these dreams closer to fruition in the form of Data DAOs. This talk discusses the promises and perils of Data DAOs. We draw on our experience "on the frontline," implementing Data DAOs on top of the open source DAO DAO contracts.
    Nick Merrill is a research fellow at the UC Berkeley Center for Long-Term Cybersecurity and a core contributor to DAO DAO. More at www.else.how/about.
    Jake Hartnell (MIMS '14) is a core contributor to DAO DAO, Juno, and Stargaze. More at https://www.ischool.berkeley.edu/people/jacob-hartnell.
    For more on DAO DAO see daodao.zone/.

Feb 3: Students' Proposals and Short Reports, including:
    Sarah BARRINGTON: Humans vs. AI: Detecting AI-Created Textual Content Using ChatGPT.
    Generative AI models, such as ChatGPT and DALL-E from OpenAI, are becoming increasingly prevalent in western technology culture. These models can generate seemingly human responses to a range of complex tasks, including answering philosophical natural language questions, writing fully functioning computer code and producing new forms of art and graphics. As a result, questions are now being raised regarding the implications of these technologies for a range of fields, from plagiarism in education to displacement of the white-collar workforce. At present, these tools remain relatively new to the public. As such, there are few detection methods available to analyze whether a piece of digital content has been created by a human or a generative AI model. The aim of this project is to develop a predictive classifier model that can detect whether a text has been written by a human or produced by a generative AI program.
    Gautham KOORMA: A Survey of Music Information Retrieval Literature.
    Music Information Retrieval (MIR) is an interdisciplinary research area that uses learnings from diverse fields such as musicology, psychoacoustics, signal processing, computer science, and machine learning to organize and retrieve information from music. MIR is still a relatively nascent discipline, and the International Society for Music Information Retrieval (ISMIR) is a research community formed in 2000 to advance research in the field. Growth in this field has led to advancements related to music signal processing, music search and retrieval, and music recommendations that are increasingly used both in academia and by corporations. This project aims to survey MIR literature, provide an interdisciplinary perspective on how the field has progressed over the years, and evaluate recent developments in related fields that potentially improve MIR tasks.
    Clifford LYNCH: Varied Short Reports.
    Following presentations from registered students on their initial plans for their projects, we'll cover a variety of short topics, as time permits; we may not get to all of these topics. I'll speak briefly about the current survey that the Pew Research Center is doing on future prospects for what they call "digital life", and at somewhat more length on recent announcements and policies from the US White House Office of Science and Technology Policy (OSTP) in various areas. I'll also share some brief comments on a recent EU policy seeking to facilitate the re-use of what it characterizes as "high-value" datasets, and on the current series of National Academies webinars on digital twins. Seminar participants are welcome to contribute their own additional short reports.

Feb 10: Michael BUCKLAND: Information from the Individual's Perspective.
    Theorizing about information and information systems generally privileges a provider's perspective -- with more or less concern for the user. But what if the individual living subject's perceiving, interpreting, reasoning, knowing, and remembering really were treated as central, and as the point of departure?
    I will present a tentative outline of how this might be approached: the components, how they are related, some implications and two remaining open questions. This approach requires some changed definitions but, I will argue, offers a more complete and more coherent view of the field of information studies.

Feb 17: Clifford LYNCH: Stewardship in the Digital Age: The Scope of the Challenge.
    In the coming two talks, I expect to complete the survey of the scope of the challenge for stewardship in the digital age. I will cover a final set of case studies dealing with telemetry and sensing data broadly (including environmental, weather, "smart cities", remote sensing and similar collections of data) and commercial and governmental data and telemetry broadly (including financial records, insurance records, business records, census data). I'll focus on questions of what should be part of the cultural record, and how it should be incorporated in this record.
    After this, I'll consider a number of "edge cases", such as software (particularly open source software), data dumps and stolen data, and what we might call "abandoned data", and make some concluding comments about stewardship in an age of abundance, and issues related to acquisition/appraisal and de-acquisition policies and processes.
    Future talks will move beyond the scope of the challenge to examine legal and policy issues, institutional and collector roles, and transfers of stewardship responsibility, among other topics. (To be continued on Feb 24).

Feb 24: Clifford LYNCH: Stewardship in the Digital Age: The Scope of the Challenge - Cont.
    In this talk, I'll complete the survey of the scope of the challenge for stewardship in the digital age. I'll do a brief call for any further comments on our previous discussion of "telemetry", be it environmental or governmental/commercial. We'll spend most of the time discussing a variety of cross-cutting or otherwise problematic cases, including the role of software in the cultural record, data dumps and purloined data, and examples of what we might call "abandoned data". Time permitting, I'll conclude with some overall comments about stewardship in an age of abundance and issues related to acquisition/appraisal/collection development and de-acquisition policies and processes.
    Future talks (to be announced) will move beyond the scope of the challenge to consider legal and policy issues, institutional and individual (including collector) roles, and transitions of stewardship responsibilities, among other topics.

Mar 3: Maria GOULD: The Research Organization Registry.
    The Research Organization Registry (ROR) is an initiative developed and co-led by California Digital Library to provide an open solution to the problem of identifying affiliations in research outputs and tracking research outputs at the institutional level. The ROR registry provides unique identifiers and metadata records for 100,000+ research organizations around the world. ROR is being integrated in research systems and workflows to collect clean affiliation data and make this data openly available to support discovery and tracking of outputs by institutions.
    This session will provide an overview of what ROR is, present examples of how and where ROR IDs are being integrated, and discuss how libraries and research institutions can benefit from the open data that ROR provides.
    Maria Gould is a product manager and research data specialist at the California Digital Library (CDL), where she is responsible for the University of California Curation Center's portfolio of persistent identifier services and directs the Research Organization Registry (ROR) initiative. More at CDL staff profile.

Mar 10: Megan FINN, Sarika SHARMA, & Amelia ACKER: Evaluating Tools for Data Management Plans.
    Data management plans (DMPs) are required from researchers seeking funding from federal agencies in the USA. Ideally, DMPs disclose how research outputs will be managed and shared. How well DMPs communicate those plans is less understood. Evaluation tools such as the DART rubric and the Belmont scorecard assess the completeness of DMPs and offer one view into what DMPs communicate. This talk presents an initial analysis of the 1,000 DMPs that we collected. We also present an evaluation of the DART and Belmont tools by applying them to the same corpus of 150 DMPs from five different NSF programs.
    Megan Finn, Amelia Acker, Sarika Sharma, and Yubing Tian (PhD Candidate, UW) are working on an NSF-sponsored project on Scientific Data Governance, Preservation and Archiving. The project investigates the life of scientific data, specifically in relation to the National Science Foundation’s requirement for Data Management Plans, with a focus on the relationship between national science policies and different epistemic cultures.
    Megan Finn, PhD '12, is an associate professor at the University of Washington Information School. Her work examines relations among institutions, infrastructures, and practices in the production, circulation, and use of information. She examines these themes in her book, Documenting Aftermath: Information Infrastructures in the Wake of Disasters (MIT Press, 2018).
    Amelia Acker is an associate professor at the University of Texas at Austin in the School of Information. Her research on data archives, cultures of computing, and preservation has been funded by the National Science Foundation and the Institute of Museum and Library Services. Acker’s current research focuses on cultures of mobile computing, emerging digital preservation models, data literacy, data durability, and metadata standards for exchange between private and public archives.
    Sarika Sharma is a postdoctoral fellow on the Afterlives Project with Dr. Acker and Dr. Finn. She recently received her PhD from the School of Information at Syracuse University. Her dissertation examined the institutional effects of the Report on the Blue-Ribbon Advisory Panel on Cyberinfrastructure established by the National Science Foundation in 2003, using theories of institutionalization and institutional logics to explain how data integration became a legitimate practice in the field of ecology.

Mar 17: Gautham KOORMA & Sarah BARRINGTON: Short Reports.
    Gautham KOORMA: Music Similarity Measures for MIR and Shazam Case Study.
    Music Information Retrieval (MIR) is an interdisciplinary research area that uses learnings from diverse fields such as musicology, psychoacoustics, signal processing, computer science, and machine learning to organize and retrieve information from music. For the first update on the survey of MIR techniques, I will be covering the concept of music similarity as it pertains to music information retrieval by summarizing how music similarity measures have evolved in the field, a few typical applications of music similarity measures, followed by a case study on the use of a particular similarity technique called audio fingerprinting by the music identification app Shazam.
    Sarah BARRINGTON: Humans vs. AI: Detecting AI-Created Textual Content Using ChatGPT.
    Large Language Models (LLMs) and Generative AIs, such as ChatGPT and DALL-E from OpenAI, are becoming increasingly prevalent in western technology culture. These models can generate seemingly human responses to a range of complex tasks, including answering philosophical natural language questions, writing fully functioning computer code and producing new forms of art and graphics. As a result, questions are now being raised regarding the implications of these technologies for a range of fields, from plagiarism in education to displacement of the white-collar workforce. This presentation will provide an update on the foundational research questions exploring the weaknesses and potential for adversarial manipulation in LLMs such as ChatGPT.

Mar 24: Chris HOOFNAGLE: The TechCons.
    Congames are as old as human society. The advent of the internet did not alter congames’ narratives, but it did alter congames’ operational environment, making congames much more powerful. To understand how, I revisit a consumer protection framework developed by Yale Law Professor Arthur Leff in 1976. Leff showed how market structure is a powerful factor for distinguishing illegal congames -- what he called “swindling” -- from legal “selling,” which often does involve some deception. I will then apply Leff’s framework to explain three different internet phenomena: first, how the internet empowered con artists; second, how cryptocurrencies share fundamental traits with Ponzi schemes; and finally, why online behavioral advertising requires a monopoly market structure to deliver on its promises.

Mar 31: University holiday. No Seminar meeting.

Apr 7: Clifford LYNCH: Stewardship in the Digital Age: The Scope of the Challenge.
    In this talk, I hope to conclude the survey of the scope of the challenge for stewardship in the digital age. We'll first examine the question of "abandoned" materials (introduced in my last talk) in the cultural record (both physical and digital) as a cross-cutting issue. I'll then conclude with some overall comments about stewardship in an age of abundance and issues related to acquisition/appraisal/collection development and de-acquisition policies and processes. Future talks (to be announced) on Stewardship in the Digital Age will move beyond the scope of the challenge to consider legal and policy issues, institutional and individual (including collector) roles, and transitions of stewardship responsibilities, among other topics.

Apr 14: Clifford LYNCH: Stewardship in the Digital Age: Legal and Policy Frameworks.
    This session will explore the implications of the shifting legal framework governing digital content based on a move from first sale to licensing and its implications for stewardship of the cultural record. I'll also consider additional legal provisions such as copyright deposit and their potential role as well as ideas such as cultural heritage, patrimony, and rights of creators. Finally, I'll briefly consider possible policy or legal changes that might encourage or facilitate stewardship in the digital age.

Apr 21: Sarah BARRINGTON and Clifford LYNCH.
    Sarah BARRINGTON: Humans vs. AI: The Effectiveness of Detecting AI-Created Content.
    Large Language Models (LLMs) and Generative AIs, such as ChatGPT and DALL-E from OpenAI, are becoming increasingly prevalent in western technology culture. These models can generate seemingly human responses to a range of complex tasks, including answering philosophical natural language questions, writing fully functioning computer code and producing new forms of art and graphics. As a result, a wave of AI-generated text detectors has emerged to tackle the problem of attribution in a range of fields, from academic dishonesty to combating misinformation campaigns. This presentation will provide an update on the foundational research questions exploring the efficacy of these detectors, including empirical results generated from a large sample of real and fake text from a range of sources.
    Clifford LYNCH: Stewardship in the Digital Age: Legal and Policy Frameworks.
    We will conclude the discussion of law and policy frameworks for stewardship in the digital age with a review of issues around cultural heritage and patrimony, which have been key mechanisms for building and preserving cultural memory collections in earlier eras. As time permits, we'll then get a start on the classification of threats to stewardship of the cultural record.

Apr 28: Jeffrey MACKIE-MASON: Priorities for Shrinking Academic Research Libraries.
    Funding for academic research libraries nationally and internationally has been declining for at least two decades, especially when adjusted for inflation and the number of patrons served. During the same time, the information environment has radically changed, and along with it expectations for research library services. With no funding turnaround in sight, it is necessary to regularly revisit core priorities. I will present some facts and frame some typical critical choices facing research libraries, to stimulate discussion: what are the core priorities?

May 5: Last Seminar: Gautham KOORMA and Clifford LYNCH.
    Gautham KOORMA: Short Report: Music Recommender Systems for Music Information Retrieval.
    Music Information Retrieval (MIR) is an interdisciplinary research area that uses learnings from diverse fields such as musicology, psychoacoustics, signal processing, computer science, and machine learning to organize and retrieve information from music. In the final progress report, I will start with a discussion on auditory perception in humans. I will then discuss user-centric MIR and how networked music delivery has enabled the collection and processing of user data to build music recommender systems. Finally, I will discuss how popular music streaming apps such as Spotify use content-based filtering and collaborative filtering in music recommender systems.
    Clifford LYNCH: Stewardship in the Digital Age: Threats, Transitions, and a Return to the Fundamental Questions.
    In this session, which will conclude the 2022-2023 survey of stewardship in the digital age, I'll start with some discussion of the classification of threats to stewardship of the cultural record and then briefly discuss transitions of stewardship from one organization to another. I'll conclude with a return to examining some of the fundamental questions, notably: what are we trying to preserve, and why? How will the digital cultural record be organized to facilitate discovery, management and re-use?

    The Seminar will resume in the Fall semester.
Fall 2022 schedule and summaries. Fall 2023 schedule and summaries.