School of Information
Previously School of Library & Information Studies
Friday Afternoon Seminar: Summaries.
296a-1 Seminar: Information Access, Spring 2023.
Fridays 3-5. Details will be added as they become available.
In person, and also on Zoom, unless indicated otherwise. Campus policy requires
all Zoom participants to sign into a Zoom account before joining
meetings hosted by UC Berkeley. Face masks are recommended but not required
for in-person attendance. Zoom sessions are not recorded.
A link to each Seminar session is available only at
the School's event listing: www.ischool.berkeley.edu/events.
Jan 20: Clifford LYNCH & Michael BUCKLAND: Introductions.
Introduction to the Seminar. Expectations for registered
students. Schedule highlights. Introduction of participants.
Clifford LYNCH: Stewardship in the Digital Age, Continued.
The Scope of the Challenge: Personal Digital Archiving.
During the Spring 2023 semester of the Seminar, I'll
continue the series of talks on stewardship in
the digital age that I began in Fall 2022. For the benefit of those
joining this semester, I'll briefly summarize the framing of the series
and the topics covered in Fall 2022, and then provide a roadmap
for at least the next few talks I'll give in Spring 2023.
Following this, I'll move into an in-depth consideration
of personal digital archiving and how it relates to
historical and future stewardship practices. We'll examine the scope of
what might constitute a personal digital archive, and how this relates
to prior discussions of stewardship of social media and the web. Of
particular interest is the question of when, why, and how material moves
from the personal (or perhaps familial) setting to the broader cultural
record.
Jan 27: Nick MERRILL & Jake HARTNELL: Data DAOs: Promises, perils, and
privacy.
Data cooperatives and data unions have long inspired dreams of
cooperative ownership over individual- and community-level data.
Decentralized Autonomous Organizations (DAOs), software that provisions cooperative ownership over shared data
using cryptography, have brought these dreams closer to fruition in
the form of Data DAOs. This talk discusses the promises and perils of
Data DAOs. We draw on our experience "on the front line" implementing
Data DAOs on top of the open-source DAO DAO contracts.
Nick Merrill is a research fellow at the UC Berkeley
Center for Long-Term Cybersecurity and a core contributor to
DAO DAO. More at www.else.how/about.
Jake Hartnell (MIMS '14) is a core contributor to
DAO DAO, Juno, and Stargaze. More at https://www.ischool.berkeley.edu/people/jacob-hartnell.
For more on DAO DAO see daodao.zone/.
Feb 3: Students' Proposals and Short Reports, including:
Sarah BARRINGTON: Humans vs. AI: Detecting AI-Created
Textual Content Using ChatGPT.
Generative AI models, such as ChatGPT and DALL-E from
OpenAI, are becoming increasingly prevalent in Western technology culture.
These models can generate seemingly human responses to a range of complex
tasks, including answering philosophical questions posed in natural language,
writing fully functioning computer code, and producing new forms of art and
graphics. As a result, questions are now being raised about the
implications of these technologies for a range of fields, from plagiarism in
education to displacement of the white-collar workforce. At present, these
tools are still relatively new to the public, and few
detection methods are available to determine whether a piece of digital content
was created by a human or by a generative AI model. The aim of this
project is to develop a predictive classifier model that can detect whether
a text was written by a human or produced by a generative AI program.
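As a rough illustration of what a "predictive classifier" for this task can look like, here is a minimal sketch assuming a multinomial naive Bayes model over word counts. This is a common text-classification baseline, not the project's stated method. The class name, the tokenizer, and the four training strings are hypothetical placeholders, not project data.

```python
import math
from collections import Counter

# Hypothetical placeholder examples for illustration only; not project data.
TRAIN_TEXTS = [
    "honestly i dunno, the movie was kinda weird lol",
    "ugh my bus was late again today",
    "As an AI language model, I can provide a comprehensive overview.",
    "In conclusion, it is important to note that there are several key factors.",
]
TRAIN_LABELS = ["human", "human", "ai", "ai"]


def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace; a real detector would need richer features.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (a baseline sketch)."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesTextClassifier":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, text: str, label: str) -> float:
        # log P(label) + sum of log P(token | label); smoothing keeps unseen tokens nonzero.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = self.total_words[label] + len(self.vocab)
        for token in tokenize(text):
            score += math.log((self.word_counts[label][token] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.classes, key=lambda label: self.log_score(text, label))
```

Usage would be `NaiveBayesTextClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS).predict(some_text)`. With four toy strings this only demonstrates the mechanics; any real evaluation needs a large labeled corpus and held-out test data.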
Gautham KOORMA: A Survey of Music Information Retrieval
Literature.
Music Information Retrieval (MIR) is an interdisciplinary
research area that draws on diverse fields such as musicology,
psychoacoustics, signal processing, computer science, and machine
learning to organize and retrieve information from music. MIR is still
a relatively young discipline; the International Society for Music
Information Retrieval (ISMIR), a research community formed in
2000, works to advance research in the field. Growth in MIR has
led to advances in music signal processing, music search
and retrieval, and music recommendation that are increasingly
used both in academia and by corporations. This project aims to survey
the MIR literature, provide an interdisciplinary perspective on how the
field has progressed over the years, and evaluate recent developments
in related fields that could improve MIR tasks.
Clifford LYNCH: Varied Short Reports.
Following presentations from registered students on their
initial project plans, we'll cover a variety of short topics
as time permits; we may not get to all of them. I'll speak
briefly about the current Pew Research Center survey on future
prospects for what they call "digital life," and at
somewhat greater length about recent announcements and policies from the
US White House Office of Science and Technology Policy (OSTP) in
various areas. I'll also comment briefly on a recent EU
policy seeking to facilitate the re-use of what it characterizes as
"high-value" datasets, and on the current
series of National Academies webinars on digital twins. Seminar
participants are welcome to contribute their own short reports.
Feb 10: Michael BUCKLAND: Information from the Individual's Perspective.
Theorizing about information and information systems
generally privileges the provider's perspective, with more or less
concern for the user.
But what if the individual living subject's perceiving, interpreting,
reasoning, knowing, and remembering really were treated as central,
and as the point of departure?
I will present a tentative outline of how this might be
approached: the components, how they are related, some implications, and
two remaining open questions. This approach requires some changed
definitions but, I will argue, offers a more complete and more
coherent view of the field of information studies.
Feb 17: Clifford LYNCH: Stewardship in the Digital Age: The Scope
of the Challenge.
In the coming two talks, I expect to complete the survey
of the scope of the challenge for stewardship in the digital age.
I will cover a final set of case studies dealing with telemetry and
sensing data broadly (including environmental, weather, "smart cities",
remote sensing and similar collections of data) and commercial and
governmental data and telemetry broadly (including financial records,
insurance records, business records, census data). I'll focus on questions
of what should be part of the cultural record, and how it should be
incorporated in this record.
After this, I'll consider a number of "edge cases", such
as software (particularly open source software), data dumps and stolen
data, and what we might call "abandoned data", and make some concluding
comments about stewardship in an age of abundance, and issues related
to acquisition/appraisal and de-acquisition policies and processes.
Future talks will move beyond the scope of the challenge to
examine legal and policy issues, institutional and collector roles, and
transfers of stewardship responsibility, among other topics. (To be continued
on Feb 24).
Feb 24: Clifford LYNCH: Stewardship in the Digital Age: The Scope
of the Challenge - Cont.
In this talk, I'll complete the survey of the scope of
the challenge for stewardship in the digital age. I'll briefly invite
any further comments on our previous discussion of "telemetry," be it
environmental or governmental/commercial. We'll spend most of the time
discussing a variety of cross-cutting or otherwise problematic cases,
including the role of software in the cultural record, data dumps and
purloined data, and examples of what we might call "abandoned data".
Time permitting, I'll conclude with some overall comments about stewardship
in an age of abundance and issues related to
acquisition/appraisal/collection development and de-acquisition policies
and processes.
Future talks (to be announced) will move beyond the scope
of the challenge to consider legal and policy issues, institutional and
individual (including collector) roles, and transitions of stewardship
responsibilities, among other topics.
Mar 3: Maria GOULD: The Research Organization Registry.
The Research Organization Registry (ROR) is an initiative
developed and co-led by the California Digital Library to provide an open
solution to the problem of identifying affiliations in research outputs
and tracking research outputs at the institutional level. The registry provides unique identifiers and metadata
records for more than 100,000 research organizations around the world. ROR is
being integrated into research systems and workflows to collect clean
affiliation data and make this data openly available to support
discovery and tracking of outputs by institutions.
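Collecting "clean affiliation data" in a workflow usually starts with checking that a submitted identifier is at least well formed. The sketch below assumes the ID structure ROR describes publicly: an optional `https://ror.org/` prefix, then a leading `0`, six Crockford Base32 characters, and two checksum digits. The function name is hypothetical, and the check deliberately stops at format; it does not verify the checksum or look the ID up in the registry.

```python
import re
from typing import Optional

# Assumed ID structure: optional "https://ror.org/" prefix, then "0", then six
# Crockford Base32 characters (digits plus letters, excluding i, l, o, u),
# then two checksum digits.
ROR_ID_PATTERN = re.compile(r"^(?:https://ror\.org/)?(0[0-9a-hjkmnp-tv-z]{6}[0-9]{2})$")


def normalize_ror_id(value: str) -> Optional[str]:
    """Return the bare 9-character ID if `value` is well formed, else None.

    Format check only (hypothetical helper): it does not verify the checksum
    digits or confirm that the ID is actually registered with ROR.
    """
    match = ROR_ID_PATTERN.match(value.strip().lower())
    return match.group(1) if match else None
```

Normalizing both the URL form and the bare form to one canonical string is what makes later matching of affiliations across systems straightforward. A production pipeline would add a registry lookup on top of this check.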
This session will provide an overview of what ROR is,
present examples of how and where ROR IDs are being integrated, and
discuss how libraries and research institutions can benefit from the
open data that ROR provides.
Maria Gould is a product manager and research data
specialist at the California Digital Library (CDL), where she is
responsible for the University of California Curation Center's portfolio
of persistent identifier services and directs the Research Organization
Registry (ROR) initiative. More at her CDL staff profile.
Mar 10: Megan FINN, Sarika SHARMA, & Amelia ACKER:
Evaluating Tools for Data Management Plans.
Data management plans (DMPs) are required from researchers
seeking funding from federal agencies in the USA. Ideally, DMPs disclose
how research outputs will be managed and shared. How well DMPs
communicate those plans is less well understood. Evaluation tools such as the
DART rubric and the Belmont scorecard assess the completeness of DMPs
and offer one view into what DMPs communicate. This talk presents an
initial analysis of the 1,000 DMPs that we collected. We also present
an evaluation of the DART and Belmont tools, applying both to the
same corpus of 150 DMPs from five different NSF programs.
Megan Finn, Amelia Acker, Sarika Sharma, and Yubing
Tian (PhD Candidate, UW) are working on an NSF-sponsored project
about Scientific Data Governance, Preservation and Archiving, investigating
the life of scientific data, specifically in relation to the National
Science Foundation’s requirement for Data Management Plans, with a focus
on the relationship between national science policies and different
epistemic cultures.
Megan Finn, PhD '12, is an associate professor at the
University of Washington Information School. Her work examines relations
among institutions, infrastructures, and practices in the production,
circulation, and use of information. She examines these themes in her book
Documenting Aftermath: Information Infrastructures in the Wake
of Disasters (MIT Press, 2018).
Amelia Acker is an associate professor at the
University of Texas at Austin in the School of Information. Her research
on data archives, cultures of computing, and preservation has been funded
by the National Science Foundation and the Institute of Museum and
Library Services. Acker’s current research focuses on cultures of mobile
computing, emerging digital preservation models, data literacy, data
durability, and metadata standards for exchange between private and
public archives.
Sarika Sharma is a postdoctoral fellow on the
Afterlives Project with Dr. Acker and Dr. Finn. She recently received
her PhD from the School of Information at Syracuse University. Her
dissertation examined the institutional effects of the Report on the
Blue-Ribbon Advisory Panel on Cyberinfrastructure established by the
National Science Foundation in 2003 using theories of institutionalization
and institutional logics to explain how data integration became a
legitimate practice in the field of ecology.
Mar 17: Gautham KOORMA & Sarah BARRINGTON: Short Reports.
Gautham KOORMA: Music Similarity Measures for MIR and
Shazam Case Study.
Music Information Retrieval (MIR) is an interdisciplinary
research area that draws on diverse fields such as musicology,
psychoacoustics, signal processing, computer science, and machine
learning to organize and retrieve information from music. In this
first update on my survey of MIR techniques, I will cover the
concept of music similarity as it pertains to MIR: how music similarity
measures have evolved in the field and a few of their typical applications,
followed by a case study of one particular similarity technique,
audio fingerprinting, as used by the music identification
app Shazam.
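The core idea behind Shazam-style audio fingerprinting, as publicly described by its developers, is to pair spectrogram peaks into "landmark" hashes `(f1, f2, dt)`. A clip then matches a track when many of its hashes agree on a single time offset. The sketch below assumes the spectrogram peaks have already been extracted, so peak picking and audio decoding are out of scope. The function names and the `fan_out` and `max_dt` parameters are hypothetical illustrations, not Shazam's actual code or settings.

```python
from collections import Counter, defaultdict
from typing import Iterator, Optional

Peak = tuple[int, int]  # (time frame, frequency bin) of one spectrogram peak
Hash = tuple[int, int, int]  # (anchor freq, target freq, time delta)


def landmark_hashes(peaks: list[Peak], fan_out: int = 3, max_dt: int = 50) -> Iterator[tuple[Hash, int]]:
    """Pair each anchor peak with up to `fan_out` later peaks within `max_dt` frames.

    Yields ((f1, f2, dt), t1). The hash records only relative timing, so the
    same landmarks appear wherever in the recording a clip starts.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt <= 0:
                continue  # simultaneous peaks are skipped in this sketch
            if dt > max_dt or paired >= fan_out:
                break
            yield (f1, f2, dt), t1
            paired += 1


def build_index(tracks: dict[str, list[Peak]]) -> dict[Hash, list[tuple[str, int]]]:
    """Map each landmark hash to the (track id, anchor time) pairs where it occurs."""
    index: dict[Hash, list[tuple[str, int]]] = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in landmark_hashes(peaks):
            index[h].append((track_id, t))
    return index


def best_match(index: dict[Hash, list[tuple[str, int]]], query: list[Peak]) -> Optional[tuple[str, int, int]]:
    """Vote for (track, time offset); a true match piles many votes on one offset."""
    votes: Counter = Counter()
    for h, t_query in landmark_hashes(query):
        for track_id, t_track in index.get(h, []):
            votes[(track_id, t_track - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count
```

Voting on a consistent offset, rather than just counting shared hashes, is what makes the method robust. Random hash collisions scatter across many offsets, while a genuine match concentrates on one.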
Sarah BARRINGTON: Humans vs. AI: Detecting AI-Created
Textual Content Using ChatGPT.
Large Language Models (LLMs) and Generative AIs, such as
ChatGPT and DALL-E from OpenAI, are becoming increasingly prevalent
in Western technology culture. These models can generate seemingly human
responses to a range of complex tasks, including answering philosophical
questions posed in natural language, writing fully functioning computer code, and
producing new forms of art and graphics. As a result, questions are
now being raised about the implications of these technologies for a
range of fields, from plagiarism in education to displacement of the
white-collar workforce. This presentation will provide an update on the
foundational research questions exploring the weaknesses and potential
for adversarial manipulation in LLMs such as ChatGPT.
Mar 24: Chris HOOFNAGLE: The TechCons
Congames are as old as human society. The advent of the
internet did not alter congames’ narratives, but it did
alter congames’ operational environment, making them much more powerful.
To understand how, I revisit a consumer protection framework developed
by Yale Law Professor Arthur Leff in 1976. Leff showed how market
structure is a powerful factor for distinguishing illegal congames (what
he called “swindling”) from legal “selling,” which often does involve
some deception. I will then apply Leff’s framework to explain three
different internet phenomena: first, how the internet empowered con
artists; second, how cryptocurrencies share fundamental traits with
Ponzi schemes; and finally, why online behavioral advertising
requires a monopoly market structure to deliver on its promises.
Mar 31: University holiday. No Seminar meeting.
Apr 7: Clifford LYNCH: Stewardship in the Digital Age: The
Scope of the Challenge.
In this talk, I hope to conclude the survey of the scope
of the challenge for stewardship in the digital age. We'll first
examine the question of "abandoned" materials (introduced in my last
talk) in the cultural record (both physical and digital) as a
cross-cutting issue. I'll then conclude with some overall comments about
stewardship in an age of abundance and issues related to
acquisition/appraisal/collection development and de-acquisition policies
and processes.
Future talks (to be announced) on Stewardship in the
Digital Age will move beyond the scope of the challenge to consider legal
and policy issues, institutional and individual (including collector)
roles, and transitions of stewardship responsibilities, among other topics.
Apr 14: Clifford LYNCH: Stewardship in the Digital Age:
Legal and Policy Frameworks.
This session will explore the shift in the legal framework
governing digital content from first sale to licensing, and the
implications of that shift for stewardship of the cultural record.
I'll also consider additional legal provisions, such as copyright
deposit, and their potential role, as well as ideas such as
cultural heritage, patrimony, and the rights of creators. Finally, I'll
briefly consider possible policy or legal changes that might encourage
or facilitate stewardship in the digital age.
Apr 21: Sarah BARRINGTON and Clifford LYNCH.
Sarah BARRINGTON: Humans vs AI: The Effectiveness of
Detecting AI-Created Content.
Large Language Models (LLMs) and Generative AIs, such as
ChatGPT and DALL-E from OpenAI, are becoming increasingly prevalent in
Western technology culture. These models can generate seemingly human
responses to a range of complex tasks, including answering philosophical
questions posed in natural language, writing fully functioning computer code, and
producing new forms of art and graphics. As a result, a wave of
AI-generated text detectors has emerged to tackle the problem of
attribution in a range of fields, from detecting academic dishonesty to
combating misinformation campaigns. This presentation will provide an update on the
foundational research questions exploring the efficacy of these
detectors, including empirical results generated from a large sample
of real and fake text from a range of sources.
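One common way to report a detector's "efficacy" is through confusion-matrix metrics. The abstract does not say which metrics the study uses, so this choice is an assumption. The false positive rate deserves particular attention in academic-dishonesty settings, because it measures human writing wrongly flagged as AI. The names below (`DetectorReport`, `evaluate_detector`) are hypothetical, and True is taken to mean "AI-generated".

```python
from dataclasses import dataclass


@dataclass
class DetectorReport:
    accuracy: float
    precision: float  # of texts flagged as AI, the fraction that really were AI
    recall: float  # of AI texts, the fraction the detector flagged
    false_positive_rate: float  # of human texts, the fraction wrongly flagged as AI


def evaluate_detector(y_true: list[bool], y_pred: list[bool]) -> DetectorReport:
    """Score binary detector output where True means 'AI-generated'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    tn = sum((not t) and (not p) for t, p in pairs)

    def ratio(num: int, den: int) -> float:
        # Report 0.0 rather than dividing by zero when a category is empty.
        return num / den if den else 0.0

    return DetectorReport(
        accuracy=ratio(tp + tn, len(pairs)),
        precision=ratio(tp, tp + fp),
        recall=ratio(tp, tp + fn),
        false_positive_rate=ratio(fp, fp + tn),
    )
```

Reporting these metrics separately, rather than accuracy alone, matters when the sample is imbalanced between real and generated text. In that case a detector can score high accuracy while still flagging many human authors.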
Clifford LYNCH: Stewardship in the Digital Age:
Legal and Policy Frameworks.
We will conclude the discussion of law and policy
frameworks for stewardship in the digital age with a review of issues
around cultural heritage and patrimony, which have been key mechanisms
for building and preserving cultural memory collections in earlier eras.
As time permits, we'll then get a start on the classification of threats
to stewardship of the cultural record.
Apr 28: Jeffrey MACKIE-MASON: Priorities for Shrinking Academic
Research Libraries.
Funding for academic research libraries, nationally
and internationally, has
been declining for at least two decades, especially when adjusted for
inflation and the number of patrons served. During the same period, the
information environment has changed radically, and with it
expectations for research library services. With no funding turnaround
in sight, it is necessary to revisit core priorities regularly. To
stimulate discussion, I will present some facts and frame some typical
critical choices facing research libraries: what are the core priorities?
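The adjustment the abstract mentions, for inflation and for the number of patrons served, can be written as a simple ratio. This is a sketch under stated assumptions: $F(t)$ is nominal funding in year $t$, $N(t)$ the number of patrons served, $\mathrm{CPI}$ a consumer price index used as the deflator, and $t_0$ a base year. The talk does not say which deflator or patron measure it uses. $R(t)$ is then real funding per patron in base-year dollars:

```latex
R(t) = \frac{F(t)}{N(t)} \cdot \frac{\mathrm{CPI}(t_0)}{\mathrm{CPI}(t)}
```

As a hypothetical example, suppose nominal funding stays flat while prices rise 50% and the patron count rises 20%. Real funding per patron then falls to $1/(1.5 \times 1.2) \approx 0.56$ of its base-year level.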
May 5: Last Seminar: Gautham KOORMA and Clifford LYNCH.
Gautham KOORMA: Short Report:
Music Recommender Systems for Music Information Retrieval.
Music Information Retrieval (MIR) is an
interdisciplinary research
area that draws on diverse fields such as musicology,
psychoacoustics, signal processing, computer science, and machine
learning to organize and retrieve information from music. In this final
progress report, I will start with a discussion of auditory perception
in humans. I will then discuss user-centric MIR and how networked music
delivery has enabled the collection and processing of user data to build
music recommender systems. Finally, I will discuss how popular music
streaming apps such as Spotify use content-based filtering and
collaborative filtering in music recommender systems.
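To make the contrast between the two filtering techniques concrete, here is a toy sketch ranking tracks by cosine similarity. This is not Spotify's implementation, which the abstract does not describe. Content-based filtering compares per-track feature vectors. Item-item collaborative filtering compares tracks by which users played them. The function names, feature values, and play counts are all hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def content_based(track: str, features: dict[str, list[float]]) -> list[str]:
    """Rank other tracks by similarity of their (hypothetical) audio feature vectors."""
    others = [t for t in features if t != track]
    return sorted(others, key=lambda t: cosine(features[track], features[t]), reverse=True)


def collaborative(track: str, plays: dict[str, dict[str, int]]) -> list[str]:
    """Item-item collaborative filtering: rank tracks by how similarly users played them.

    `plays` maps user -> {track: play count}; each track becomes a vector over users.
    """
    users = sorted(plays)
    tracks = sorted({t for counts in plays.values() for t in counts})
    column = {t: [float(plays[u].get(t, 0)) for u in users] for t in tracks}
    others = [t for t in tracks if t != track]
    return sorted(others, key=lambda t: cosine(column[track], column[t]), reverse=True)
```

Consider invented data where tracks `a` and `b` have similar features but `a` and `c` are played by the same listeners. `content_based` then ranks `b` first for `a`, while `collaborative` ranks `c` first. This divergence is why recommender systems often blend the two signals.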
Clifford LYNCH: Stewardship in the Digital Age:
Threats, Transitions, and a Return to the Fundamental Questions.
In this session, which will conclude the 2022-2023 survey
of stewardship in the digital age, I'll start with some discussion of
the classification of threats to stewardship of the cultural record and
then briefly discuss transitions of stewardship from one organization
to another. I'll conclude with a return to examining some of the
fundamental questions, notably: what are we trying to preserve, and why?
How will the digital cultural record be organized to facilitate discovery,
management and re-use?
The Seminar will resume in the Fall semester.
Fall 2022 schedule and summaries.
Fall 2023 schedule and summaries.