School of
Information
Previously School of Library & Information Studies
Friday Afternoon Seminar: Summaries.
296a-1 Seminar: Information Access, Fall 2022.
Fridays 3-5. Details will be added as they become available.
In person and on Zoom, unless indicated otherwise. Campus policy requires
all Zoom participants to sign into a Zoom account prior to joining
meetings hosted by UC Berkeley. Face mask recommended but not required
for in-person attendance. Zoom sessions are not recorded.
A link to each Seminar session is available only at
the School's event listing: www.ischool.berkeley.edu/events.
Schedule. Weekly
mailing list.
Aug 26: Clifford LYNCH: Overview of Stewardship Lectures.
Michael BUCKLAND & Clifford LYNCH: Introduction.
Introduction to the Seminar, and Plans for Fall,
including a summary of upcoming sessions. Introductions of
Participants; expectations for registered students.
Clifford LYNCH: Context and Overview for the
Stewardship Lectures.
In 2016 I gave a series of talks in the seminar
trying to summarize and synthesize work I've done over the past
20 years on stewardship of the scholarly and cultural record.
These were transcribed and I hoped to use them as the basis for
a book. As I covered the material it became clear that there
were several major areas that were omitted from the survey
(and I published several papers covering some of these areas);
in the following years several urgent personal issues, and then
a pandemic, intervened. And the landscape of preservation and
stewardship changed significantly. In this introductory
discussion, I'll outline what I hope to cover in the 4-5
sessions we've scheduled on stewardship this fall, and try
to provide a broad context for the upcoming lectures.
Sep 2: No Seminar meeting. Labor Day weekend.
Sep 9: Clifford LYNCH: Stewardship 1: The Scope of
the Challenge.
Following the August 26 introduction and framing
of the stewardship challenge, my talks on September 9
and October 7 will begin an examination of the scope of the
stewardship challenge. I'll look at the scope and nature of
the scholarly record, and then explore and contrast that with
the much larger and less well defined cultural record. After
some general discussion, I'll look at a series of specific
case studies: music; books; moving and still images; geospatial,
remote sensing and documenting the environment; the web at
large, including the deep web, and the digital ephemera of the
future; social media, personal digital archiving; factual
biography; and news.
After these case studies, I'll explore the
continually shifting contested areas and boundaries at the
fringes of the cultural record. I'll briefly discuss efforts
to measure or approximate the scale of various segments of the
cultural record, and of the parts of that record under the
protection of stewardship institutions.
Sep 16: Short Reports by Arogya KOIRALA, Shai DHALIWAL,
Calvin LEE, Alan KYLE, Siddharth ADELKAR,
Sarah BARRINGTON and Ameya NAIK.
Arogya KOIRALA: Monitoring War Destruction
from Space Using Machine Learning.
Extracting information on war-related
destruction is difficult because it relies on eyewitness
accounts and manual detection in the field, which is not only
costly for institutions carrying out these efforts, but also
unsafe for the individuals carrying out this task. The information
gathered is also incomplete, which makes it difficult for use
in media reporting, carrying out humanitarian relief,
understanding human rights violations, or academic reporting.
This seminar introduces an automated approach to measure
destruction in war-damaged buildings by applying deep learning
to publicly available satellite imagery. We adapt different
neural network architectures and make them applicable for the
building damage detection use case. As a proof of concept,
we apply this method to the Syrian Civil War to reconstruct
war destruction in these regions over time. We close the
discussion by talking about how the nature and quality of the
inputs used (publicly available satellite imagery) and
different architectural choices made in the design of the
machine learning system relate to the robustness and
generalizability of the outputs produced. This work builds
on prior work by Mueller et al. in the PNAS paper
"Monitoring war destruction from space using machine
learning".
Shai DHALIWAL: Modernizing Mainframe RACF.
I plan to explore how cloud modernization will improve
cyber security for legacy mainframe information systems: Consider the
driving factors which influence organizations to leverage legacy
mainframe systems today and existing capabilities for Identity &
Access Management; explore opportunities for organizations to migrate
IBM Mainframe workloads securely to the cloud and how this could
benefit organizations for the next 10-20 years, and provide a
recommended framework to execute Mainframe RACF modernization for
improved Identity & Access Management security.
Calvin LEE: Exploring Consumer Robocall
Mitigations.
Have you recently received a strange call from
an unknown number in your area code requesting that you extend
your expired vehicle warranty insurance? If you have, you received
one of the 75.9 billion illegal robocalls that were made within
the last 12 months. Counter-advances have been made by critical
players such as the Federal Communications Commission and
Tier-1 US network carriers, but are they proving futile?
Perhaps we can look towards similar tactics such as email
anti-spam or CAPTCHA techniques. Designing a solution to
mitigate these unwanted calls is complex but we will explore
new possible mitigations.
Alan KYLE: Drawing Lines Between Section 230 and
Trust & Safety.
In this presentation I will discuss what Section 230
and trust & safety are and how they are connected. I will use my
experience as a trust & safety professional, and my research on
Section 230 bills as a jumping-off point for thinking about ways
to contextualize current attempts to regulate the Internet.
Siddharth ADELKAR: English Documentation of Non-English
Stories: A Study of the People's Archive of Rural India.
The People's Archive of Rural India
(PARI) is primarily a journalism project. PARI reports on the "everyday
lives of everyday people" by using the craft and structure of journalism
to inform their predominantly English speaking urban readership --
an important part of which is school and college students. PARI sees
itself as an antidote to the structural problems in Indian media,
education and historiography. While Indian media focuses on the urban
rich, PARI focuses on the rural poor. While Indian English education
focuses on skilling and emigration, PARI focuses on deep engagement
with one's surroundings. While Indian history focuses on the narrative
of kings and empires, PARI prepares an archive for future writings of
people's histories. The unique themes that have stood out in PARI's
reportage include the agrarian crisis in India, women's sexual and
reproductive health, and chronicling the impending climatic
catastrophe as "everyday" people feel it.
An important question that I wish to study in PARI's
coverage is the impact -- benefits and drawbacks -- of English-first
documentation of livelihood and culture in a predominantly non-English
speaking country. PARI is translated into up to 11 Indian languages by
over 100 accomplished translators. Yet if, in the future, only PARI
were to survive, what would be lost due to the English-ness or
English-firstness of the stories? Would something be gained? Does
the presence of pictures and video improve the situation or worsen it?
Sarah BARRINGTON: The ‘Fungibility’ of Non-Fungible
Tokens: Vulnerabilities in an Over-Hyped Market.
Non-Fungible Tokens, digital certificates of ownership
for virtual art, are becoming increasingly ubiquitous in Western
media, from the release of the first notes of the Beatles’ ‘Hey Jude’
as an NFT to the sale of the first tweet as an NFT for $2.9 million.
In 2021, the market was valued at $17.6bn, representing a growing
and salient portion of the overall cryptocurrency and blockchain
economy. Yet, the NFT market is also speculative, variously
described as irrational and overhyped. The emergence of
vulnerabilities, along with a sustained market downturn, is now
calling the role of NFTs into question: what exactly are NFTs?
And most importantly, what gives them value? This project aims to
address these questions, arguing that three fundamental properties
(permanence, immutability and uniqueness) are necessary (but not
sufficient) conditions for an NFT to have value. We explore both
the underlying artworks and their associated metadata in order to
define these metrics. Furthermore, we take a quantitative approach
to testing these definitions against 6 months of real-world data,
examining the true permanence and perceived value of over 7
million NFTs. We envision that this work will ultimately help buyers and
marketplaces identify and warn users against purchasing NFTs that
may be overvalued, and bring some much-needed rigour to a presently
complex and recondite market.
Ameya NAIK: Assessing Data Subject Access Requests.
Any mobile or desktop application prompts you to sign
a term of policy agreement that allows the application to gather
information related to you, your activities, and your attributes. The
level and type of information gathered depend on the organization,
business model, kind of application, and geographical location. You
could access this information through Data Subject Access Requests,
which the applications are bound to provide. Although GDPR (Article 15)
and CCPA have broadly drawn rules and regulations for Data Subject
Access Requests (DSARs), there are still differences in the way
the data is stored and shared with consumers (you). I plan to
initiate DSARs, note the process, and then analyze the data
shared. This analysis could potentially lead to interesting observations,
and I would like to compare the data shared by similar applications
(messaging, social media, etc.). Developing a catalog and visualizing
the data would be the other aspect of the project.
Sep 23: Chris FREELAND, Internet Archive: Controlled Digital Lending.
The Internet Archive’s Open Libraries program
empowers libraries to lend digital books to patrons using controlled
digital lending (CDL). In the course of this discussion we'll cover
how CDL works, different implementations of the library practice,
including Internet Archive's Open Libraries program, and the impact
that the practice has for libraries and the communities we serve.
We'll also cover Hachette v. Internet Archive, the lawsuit brought
against the Internet Archive by four commercial publishers for
controlled digital lending.
Chris Freeland is the Director of Open
Libraries at the Internet Archive, working in support of the
organization's mission to provide "Universal access to all
knowledge." Before joining the Internet Archive Chris was an
Associate University Librarian at Washington University in
St. Louis, managing Washington University Libraries' digital
initiatives and related services, and the Director of the
Center for Biodiversity Informatics at the Missouri Botanical
Garden. He holds an M.S. in Biological Sciences from Eastern
Illinois University and an M.S. in Library and Information
Science from the University of Missouri-Columbia.
Sep 30: Clifford LYNCH: Stewardship: The Scope of the
Challenge (Continued).
I'll continue the discussion from September 16
exploring the nature of the (digital) cultural record through
a series of detailed examinations of developments involving
particular genres of content. After completing the discussion of
recorded music from the last session, we'll discuss moving images
(video and film); geospatial and remote sensing broadly; the web,
including the "deep" web and consideration of new modes of grey
literature and ephemera; and, time permitting, social media and
its implications for stewardship. This discussion will continue
on November 4.
Oct 7: Michael BUCKLAND & Wayne de FREMERY.
Jeanette ZERNEKE: Brief Report on the
Recent PNC and ECAI Meetings.
Michael BUCKLAND & Wayne de FREMERY:
Contexts, Works, and Catalogs.
Building on work previously presented at this seminar
by Wayne de Fremery and myself,
we propose some fundamental changes to bibliographic and library search
and discovery:
1. Consider the purpose of retrieval systems as a
search for "families" or "contexts" of related works rather than
for particular items;
2. Diminish the privileged status accorded to
individual creative works, notably by Seymour Lubetzky ’34 and
others, and redefine and redirect the Functional Requirements for
Bibliographic Records (FRBR) model accordingly; and
3. Unbundle the tight relationship between library catalog
and library collection in order to harmonize theory with contemporary
technological reality.
Wayne de Fremery is Professor of Information Science and
Entrepreneurship and Director of the Francoise O. Lepage Center for
Global Innovation at Dominican University of California. Previously,
he was an associate professor in the School of Media, Arts, and
Science at Sogang University in South Korea, where he has lived for
twenty years. He currently represents the Korean National Body at ISO
as Convener of a working group on document description, processing
languages, and semantic metadata (ISO/IEC JTC 1/SC 34 WG 9). His
recent research projects have concerned "Digital humanities in the
iSchool" (JASIST, 2022), "Copy theory" (JASIST, 2022), "Context,
relevance, and labor" (JASIST, 2022), as well as the use of deep
learning to improve Korean OCR, for which he received a national
citation of merit from the South Korean Ministry of Culture, Sports,
and Tourism. More at www.pwdef.info.
Oct 14: Cathy MARSHALL: Who Broke Mechanical Turk?
Crowdsourcing platforms provide a valuable way to
perform a wide range of human intelligence tasks--e.g. data labeling,
content moderation, text translation, citizen science--as well as a
convenient venue for collecting participant data. I've been using
Amazon Mechanical Turk in various capacities since 2010, and have
followed worker forums, labor organizing efforts, and the development
of worker-centered tools (on one side) and increasingly sophisticated
uses of the crowd (on the other). Early on, my colleagues and I were
(perhaps naively) delighted by the quality of the data we gathered
and by generally positive interactions we had with workers. Using
practical advice from the literature, we were able to vet work and
encourage good-faith participation in our studies.
More recently, a handful of researchers from diverse
disciplines who use crowdsourcing platforms have described an uptick
in unusable data from US-based workers. Frank Shipman and I saw this
ourselves in 2018 and 2019 when we re-ran a survey we'd used
successfully five years earlier: by 2019, we had to exclude more than
12% of the completed HITs according to our established cleaning
heuristics. Even knowing this, what we saw on Mechanical Turk this
spring and summer startled us. Almost 90% of the data was unusable.
In this talk, I'll use a preliminary analysis of our own and other
researchers' data in an effort to explain what seems to be happening
on Mechanical Turk, present evidence of why it's not necessarily a
symptom of bots, autocompletion tools, or bad faith work, and
speculate why Amazon has little incentive to do anything about it.
Cathy Marshall is a senior research scientist
at Texas A&M University and a former principal researcher at Microsoft
Research, and before that, at Xerox PARC. She's a fan of personal
ephemera, special collections, and a quiet reading room.
Oct 21: Students' Progress Reports.
Sarah BARRINGTON, Arogya KOIRALA, Shai DHALIWAL,
Calvin LEE, Alan KYLE, Siddharth ADELKAR and Ameya NAIK.
Sarah BARRINGTON: The Fungibility of Non-Fungible Tokens:
A Quantitative Analysis of ERC-721 Metadata.
Non-Fungible Tokens (NFTs), digital certificates of
ownership for virtual art, have until recently been traded on a highly
lucrative and speculative market. Yet the emergence of misconceptions,
along with a sustained market downturn, is calling the value of NFTs
into question. This project (1) describes three properties that any
valuable NFT should possess (permanence, immutability and uniqueness),
(2) creates a quantitative summary of permanence as an initial criterion,
and (3) tests our measures on 6 months of NFTs on the Ethereum blockchain,
finding 45% of ERC-721 tokens in our corpus do not satisfy this initial
criterion. Our work could help buyers and marketplaces identify and warn
users against purchasing NFTs that may be overvalued.
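As a rough illustration of what a quantitative summary of permanence
might look like, the sketch below groups ERC-721 token URIs (the string
returned by the standard's optional tokenURI function) by where the
metadata appears to be stored. The three storage categories, the
function names, and the use of the URI scheme as a proxy for permanence
are assumptions made for this example; the abstract does not state the
project's actual metric.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical storage categories, chosen for illustration only.
ON_CHAIN = "on-chain"                      # metadata embedded in a data: URI
CONTENT_ADDRESSED = "content-addressed"    # e.g. ipfs:// or ar:// links
LOCATION_ADDRESSED = "location-addressed"  # ordinary web servers
UNKNOWN = "unknown"


def classify_token_uri(uri: str) -> str:
    """Guess where a token's metadata lives from its URI scheme alone."""
    scheme = urlparse(uri.strip()).scheme.lower()
    if scheme == "data":
        return ON_CHAIN
    if scheme in ("ipfs", "ar"):
        return CONTENT_ADDRESSED
    if scheme in ("http", "https"):
        # Note: an HTTP gateway URL that wraps an IPFS hash also lands
        # here; a fuller treatment would inspect the path as well.
        return LOCATION_ADDRESSED
    return UNKNOWN


def summarize(uris):
    """Return the share of token URIs falling into each storage category."""
    counts = Counter(classify_token_uri(u) for u in uris)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()} if total else {}


if __name__ == "__main__":
    sample = [
        "data:application/json;base64,e30=",
        "ipfs://QmExampleHash",
        "https://example.com/metadata/1.json",
        "https://example.com/metadata/2.json",
    ]
    for category, share in summarize(sample).items():
        print(f"{category}: {share:.0%}")
```

Under this assumed scheme, a token whose metadata sits on an ordinary
web server would count as the least permanent, since the content can
change or disappear without any change to the token itself.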
Arogya KOIRALA: Monitoring War Destruction from Space
Using Machine Learning.
Extracting information on war-related destruction is
difficult because it relies on eyewitness accounts and manual detection
in the field, which is not only costly for institutions carrying out
these efforts, but also unsafe for the individuals carrying out this
task. The information gathered is also incomplete, which makes it
difficult for use in media reporting, carrying out humanitarian relief,
understanding human rights violations, or academic reporting. This
seminar introduces an automated approach to measure destruction in
war-damaged buildings by applying deep learning to publicly available
satellite imagery. We adapt different neural network architectures and
make them applicable for the building damage detection use case. As a
proof of concept, we apply this method to the Syrian Civil War to
reconstruct war destruction in these regions over time. In the last
talk we outlined the problem space, and talked about different
data-related considerations to keep in mind when approaching the
problem using machine learning. For this talk, we will take a closer
look at the data, introduce the different machine learning architectures
that we will employ, and (if time permits) discuss the potential
benefits and drawbacks of these choices as they relate to our goal of
identifying war-related building destruction.
Shai DHALIWAL: Modernizing Mainframe RACF.
During this seminar, I plan to explore how cloud
modernization will improve cyber security for legacy mainframe
information systems, focusing on research progress in the following
technical areas:
- Structuring Mainframe Information, Metadata, Databases
- Tokenization, Privacy, and Standards, and
- Time permitting: RACF User Migration to the Cloud & Benefits.
Calvin LEE: Evaluating Consumer Robocall Mitigations.
At the all-time high in October, 4.5B robocalls were
made across the US. We will evaluate the most and least effective
mitigations to date. We will also provide an update and review the
latest policies through which the Federal Communications Commission is
placing new obligations on gateway providers to play a more active role
in curbing this abuse.
Alan KYLE: Artificial Intelligence and Machine Learning
Fairness.
In this presentation I will share my progress in
formulating my project on machine learning fairness toolkits. As a
part of a large team, my contribution will focus on the policy aspects
of these tools. Questions to be answered are: How are fairness toolkits
used? What are the current AI policies that practitioners need to be
advised on? And, how can organizations promote fairness practices
through their internal policies?
Siddharth ADELKAR: English Documentation of Non-English
Stories: A Study of the People's Archive of Rural India.
In understanding what information is lost or gained in
English-first documentation of Indian conditions, I investigate the
discourse in Indian English and its closeness to other Indian languages.
I focus on four dimensions of language, viz. phonology, lexicon, syntax
and, most importantly, what is being said: the contexts. I consider
the verbal and written communications of Indian English writers who
are native Marathi speakers, and compare the distance between their
English and Marathi. If Indian English is closer to Indian languages
than to, say, English (UK), and if the distance between them is
comparable to that between native dialects of English, then it would
be fair to say that Indian English is only as good or bad in documenting
local conditions as its native languages.
Ameya NAIK: Assessing “Data” in Data Subject Access Requests.
Any mobile or desktop application prompts you to sign
a term of policy agreement even before using it. This term of policy
agreement allows the application to gather certain information related to
you, your activities, and your attributes. The level and type of
information gathered depend on the organization, business model, kind
of application, and geographical location. EU Data Protection and CCPA
grant the consumer the right to personal data the company holds on them.
Although GDPR (Article 15) and CCPA have broadly drawn rules and
regulations for Data Subject Access Requests (DSARs), there are still
differences in the way the data is stored and shared with
consumers (you). We have initiated DSARs for certain mobile applications
that fall into similar categories, taking due note of the process, and
are comparing and visualizing the data.
Oct 28: Early start! 2:30 pm.
Seminar combined with the School's 104th Birthday
Celebration.
For program and registration go to www.ischool.berkeley.edu/events/2022/104th-birthday-celebration.
Nov 4: Clifford LYNCH: Stewardship: The Scope of the Challenge
(continued).
In today's talk, I'll conclude my survey of
now-digital mass-market content with an examination of digital news
material. I'll next examine the stewardship challenges involved in
social media platforms, and, time permitting, start a discussion
of personal digital archiving and related issues.
Nov 11: No Seminar meeting. Veterans' Day holiday.
Nov 18: Final Progress Reports by Sarah BARRINGTON, Arogya KOIRALA,
Shai DHALIWAL, Calvin LEE, Alan KYLE, Siddharth ADELKAR and Ameya NAIK.
Sarah BARRINGTON: The Fungibility of Non-Fungible Tokens:
A Quantitative Analysis of ERC-721 Metadata.
Non-Fungible Tokens (NFTs), digital certificates of
ownership for virtual art, have until recently been traded on a highly
lucrative and speculative market of $17.6bn. Yet the emergence of
misconceptions, along with a sustained market downturn, is calling the
value of NFTs into question. This project (1) describes three properties
that any valuable NFT should possess (permanence, immutability and
uniqueness), (2) creates a quantitative summary of permanence as an
initial criterion, and (3) tests this measure on 6 months of NFTs on the
Ethereum blockchain, finding 45% of ERC-721 tokens in our corpus do not
satisfy this initial criterion. We find that, ultimately, 75% of
ERC-721 NFT assets are stored off the blockchain entirely. Our work could
help buyers and marketplaces identify and warn users against purchasing
NFTs that may be overvalued.
Arogya KOIRALA: Monitoring War Destruction from Space
Using Machine Learning.
In the last talk we presented different deep neural
network architectures we have employed to detect building destruction
from satellite imagery. We also noted some challenges, particularly
around computational efficiency and model understanding. In this talk
we walk you through approaches that we have brainstormed for addressing
some of these challenges, viz. improving computational efficiency,
getting a better sense of what the model is learning, and gaining
clarity on the benefit (if any) that we derive from employing these
deep networks.
Shai DHALIWAL: Modernizing Mainframe RACF.
During this seminar, I plan to provide a final update on
my research into how cloud modernization will improve cyber security
for legacy mainframe information systems:
- Provide an in-depth analysis of opportunities for
organizations to migrate their mainframe workloads securely to the cloud.
- Provide a final recommendation on what will come next in
this space and showcase a potential framework for organizations to
implement a RACF modernization program.
Calvin LEE: The Robocall & Robotext Trajectory.
Since the last Federal Communications Commission robocall
mitigation initiative, 9 phone companies have been shut down due to
non-compliance. Since then, there has been a reduction of 1.5B robocalls.
Although the measures are proving to be effective, bad actors have been
migrating into the robotext space instead. In our exploration of the
current state of affairs for robocalls and robotexts, we will uncover
the latest
happenings in politics and industry.
Alan KYLE: Artificial Intelligence and Machine Learning
Fairness.
This final progress report will discuss the initial steps
and findings of a MIMS capstone project to build an AI fairness toolkit
for the healthcare space. Specifically, international public policy and
ethics frameworks for AI will be discussed.
Siddharth ADELKAR: English Documentation of Non-English
Stories: A Study of the People's Archive of Rural India.
In my final report I shall cover two other lines of
investigation apart from English-first documentation of Indian life.
I discuss the adaptability of Large Language Models such as OpenAI's
Whisper to Indian-accented English and languages. I also start a
preliminary study of the multilingual "course articulation" problem in
universities.
Ameya NAIK: Assessing “Data” in Data Subject Access Requests.
Any mobile or desktop application prompts you to sign
a term of policy agreement even before using it. This term of policy
agreement allows the application to gather certain information related to
you, your activities, and your attributes. The level and type of
information gathered depend on the organization, business model, kind
of application, and geographical location. EU Data Protection and CCPA
(and new CPRA) grant the consumer the right to personal data the company
holds on them, and all major applications have mechanisms to access this
information, often referred to as Data Subject Access Requests (DSARs).
The project dives into evaluating the process of raising these DSARs and
analyzing the data applications have on their users. Additionally, we
also analyzed how this process works in educational institutions.
Nov 25: No Seminar meeting. Thanksgiving.
Dec 2: In person and Zoom.
Clifford LYNCH: Stewardship: The Scope of the Challenge
(continued).
Today's talk will focus on stewardship and preservation
issues related to social media. This will include a look at the evolving
social media "universe" and the various platforms that might be viewed
(or not viewed) as part of this universe, consideration of reasons why
some platforms might be prioritized over others and how to adapt to
shifting usage patterns among platforms. The varying possible objectives
of stewardship activities surrounding social media -- and the feasibility
of achieving them -- will also be discussed at some length from
technical, legal and policy perspectives.
If time permits, we will continue on with a discussion
of personal digital archiving (including its relationship to the
stewardship of social media). I expect that this discussion will
continue at the opening session of the Spring 2023 Seminar on
January 20, 2023.
The Seminar will resume in the Spring semester.