School of
Information
Previously School of Library & Information Studies
Friday Afternoon Seminar: Summaries.
296a-1 Seminar: Information Access, Spring 2020.
Fridays 3-5. 107 South Hall.
Schedule. Weekly
mailing list.
Details will be added as they become available.
Jan 24: Clifford LYNCH: Perhaps We Have Framed the Research Data
Management Challenge Incorrectly?
Introduction to the Seminar.
Over the past year I've been doing a good deal of thinking
about the broad way we've framed the work of research data management
and preservation, and the roles of various parties (researchers, data
curators, repositories, etc) in this effort. The dominant model to date
has been one of describing datasets, archiving them into repositories,
and assuming that they will be discovered and reused by other scholars.
I'll critically examine this model and some of the ideas -- for example,
the FAIR principles, privacy challenges, and widespread use of machine
learning -- that place great stress on it, along with the
proliferation of what I'll call "scholarly information aggregation and
management environments". I'll speculate about what these developments
may imply for how to reformulate the research data management enterprise,
including some discussion about implications for funding, roles, and
resource allocation and prioritization.
Jan 31: Michael BUCKLAND: Genres of Library Service: Economics,
Ideology, Technology, etc.
Each library is unique, of course, but there are
different types and, historically, there have been large differences
between countries, as well as major changes over time. Historical studies
have suggested some causal influences but not their relative importance.
International comparative studies, popular in the 1970s and 1980s,
were heavily descriptive with little explanatory analysis.
I will review issues that seem to me important for the interpretation
and explanation of differences.
Feb 7: Cathy MARSHALL: The US Census and Social Media Archiving.
Social media archiving at an institutional level (like
the Library of Congress’s now-truncated effort to archive Twitter) is
viewed as a grand challenge technical problem. Yet the future use of
and modes of participation in institutional social media archives have
received less attention. Social media users are generally averse to
such archiving efforts, even if only public accounts and data are
collected. Just a small minority perceive any long-term value of such
an enormous effort; far more feel that the risk is unjustified (and
continue to feel this way in the wake of scandals like Cambridge
Analytica). What can we learn from the US Census and the questions
it was slated to answer in 1940, when 120,000 enumerators went
door-to-door and conducted more than 37 million in-person household
interviews? What stories does the census tell -- both inadvertently and
intentionally -- 80 years later, now that the source data has been
released from embargo? I will discuss use, participation, and
confidentiality issues entailed by social media archiving through
the lens of recent study results, coupled with examples drawn from
contemporaneous reactions and responses to the 1940 US Census.
Cathy Marshall is a research scientist in the
CS Department at Texas A&M University, and a former principal researcher
at Microsoft Research, Silicon Valley and Xerox PARC.
Feb 14: Monique le Conge ZIESENHENNE, Palo Alto:
The Role of Public Libraries, Today and Tomorrow.
We'll discuss how public libraries are serving their
communities today, how librarians are following trends for the future
and planning to meet coming challenges. Literacy, social issues, the
political environment, and technology, among many other issues, all
play a part as cities and counties respond to basic daily needs and
work to solve long-term issues.
Monique le Conge Ziesenhenne has been the Assistant
City Manager in Palo Alto since April 2019. Before that, she was the
Library Director; she has also been the Community Services Director and worked in
the area of public art. She has been a library consultant and worked
for a high school and a county library before working as a
library director in the Bay Area since 1998. Monique earned a Bachelor of
Science degree in Design from UC Davis in 1987. She followed with a
Master of Library and Information Studies from UC Berkeley in 1988, and a
PhD in Managerial Leadership in Information Professions, from Simmons
University in 2017. She has served as the President of the Public Library
Association and of the California Library Association.
Feb 21: Hany FARID: Assessing the Reliability of Clothing-Based
Forensic Identification.
A 2009 report by the National Academy of Sciences was
highly critical of many forensic practices. This report concluded that
significant changes and advances were required to ensure
reliability across the forensic sciences. We examine the reliability of one such
forensic technique used for identification based on purported distinct
patterns on the seams of denim pants. Although first proposed more
than twenty years ago, no thorough analysis of reliability or
reproducibility of this forensic technique has previously been
reported. We performed a detailed analysis of this forensic technique
to determine its reliability and efficacy.
Joint work with postdoctoral scholar Sophie
Nightingale.
Feb 28: Seminar Project Progress Reports.
Chintan VYAS: Cloud Information Systems and Advancements in
Infrastructure Technologies.
Information Systems all around the world are going through a
major disruption due to recent advancements in infrastructure technology.
Cloud computing has changed the way organizations store, retrieve, analyze
and share data. Servers, storage, networking, software, analytics, and
intelligence are all being offered as a service over the internet to allow
organizations to innovate faster with flexible resources and leverage
economies of scale. With cloud information systems, users typically only
pay for cloud services they use, helping them lower their operating costs,
run their infrastructure more efficiently, and scale as the needs of an
organization change. My study examines the evolution of
infrastructure technologies and how they could be used to create a
distributed, robust, and scalable search engine to index documents for
information retrieval.
Chintan Vyas is a second-year graduate student at
the School of Information working towards developing a cloud-based media
management solution that could be used by organizations to automatically
index and retrieve rich media (images, audio, and video). Recently, Chintan
worked as a Product Manager at Sumo Logic (a machine data analytics company)
and was a Software Engineer for 3 years prior to pursuing graduate studies.
More at http://chintanvyas.com.
Vikramank SINGH and Mekhola MUKHERJEE: Accelerating Human
Learning through AI.
In the study of human learning, there is broad evidence
that our ability to retain information improves with repeated exposure
and decays with delays since last exposure. This plays a crucial role
in the design of educational software, leading to a trade-off between
teaching new material and reviewing what has already been taught. A
common way to balance this trade-off is spaced repetition, which uses
a periodic review of content to improve long-term retention. Though
spaced repetition is widely used in practice, e.g., in electronic
flashcard software, there is little formal understanding of the design
of these systems. Our work will address two core problems in sequential
learning of vocabulary using flashcards or a mobile application. First,
is there a specific order in which students learn vocabulary
better than by simply memorizing it in alphabetical order? Second, how
can we better optimize the revision of flashcards, so that we can increase
the overall performance of the student with less revision
of vocabulary in the learning process?
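As a point of reference for the spaced repetition idea described in this abstract, here is a minimal sketch of a Leitner-style flashcard scheduler. It is not the speakers' method: the `Card` class, the `INTERVALS` values, and the promote-or-reset rule are illustrative assumptions chosen to show how review delays can grow with each successful recall.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical review intervals (in days) for each Leitner box; values are illustrative.
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    word: str
    box: int = 0       # index into INTERVALS; box 0 is reviewed most often
    due_day: int = 0   # first day on which the card should be reviewed again


def review(card: Card, today: int, correct: bool) -> None:
    """Update a card after a review: promote one box on success, reset on failure."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due_day = today + INTERVALS[card.box]


def due_cards(cards: List[Card], today: int) -> List[Card]:
    """Return the cards whose review is due on or before `today`."""
    return [c for c in cards if c.due_day <= today]
```

A card answered correctly moves to a box with a longer interval, so well-known words are reviewed less often, while a missed word returns to the most frequent box. That is the trade-off between new material and review that the abstract refers to.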
Vikramank Singh is a second year graduate student
in the MIMS program. He has a background in machine learning and
reinforcement learning. Prior to Berkeley, Vikramank
spent time at Facebook Research as a software engineer in Machine
learning and at the MIT Media Lab as a Deep Learning Researcher.
Currently most of his research is around the topics of sequential
decision making for large scale systems using reinforcement learning
and machine learning. For more see www.vikramanksingh.org.
Mekhola Mukherjee is a second year MIMS student.
She has been a software engineer at Hewlett Packard Enterprise for 3
years working in the computational storage domain. Her current interests
lie in the field of data privacy and machine learning. For more see www.ischool.berkeley.edu/people/mekhola-mukherjee.
Aobo LYU: The Evolution of e-Commerce
Platform Marketing Modes.
China's e-commerce is developing rapidly. In recent years,
a variety of e-commerce platform marketing methods have emerged,
such as allowing consumers to obtain discounts by itemizing product
information on their social media accounts and allowing consumers to
gather friends together to complete tasks arranged by the platform
to get money. This study examines the characteristics of this
marketing change and the role of information technology in it,
and analyzes the driving force behind this change.
Step 1. By examining the topics and abstracts
of recent research papers in the field of e-commerce, identify
the research “aspects” in the field.
Step 2. By studying the changes in one e-commerce platform
in recent years, analyze each research “aspect” and summarize the
trends of the different research “aspects”.
Step 3. By analyzing the
relationships between the different research “aspects”, derive the overall trend
and driving forces of e-commerce platform evolution.
Aobo Lyu is an exchange student from China. His major
is information management and information systems, and he is
currently doing research in e-commerce and studying system theory,
information theory and Cybernetics.
Mar 6: Howard BESSER: Archiving the Non-Organizational Born-Digital:
The Challenges Posed by Material from Individuals, Communities, Social
Movements, and Events.
The transition from analog to digital creation and
communication has
forced Archives and Special Collections to handle the vastly larger corpus
of born-digital records that have already begun to enter our archives. The
ubiquity of devices for recording and sending has not only increased the
number of items that any individual creates, but has led to the creation
of vast numbers of media types (digital videos and photographs, emails,
tweets, social media postings, ...). We need to find smart ways to handle
this digital deluge, particularly ways to streamline the processes of
selection/appraisal, ingest, and description.
In this presentation, Howard Besser will discuss
his work with archivists,
individuals, and community groups in addressing some of the challenges of
this digital deluge. He will particularly look at the problems posed by
personal, community and event-based born-digital material--the type of
material that documents the lives of ordinary people and the social and
community organizations that they form. The presentation will illustrate
Archiving community, as well as the work of Activist Archivists in
documenting the Occupy Wall Street movement.
After a dozen years as an LIS professor, Howard Besser
became Professor of Cinema Studies at New York University, and Founding
Director of the Moving Image Archiving
& Preservation MA Program. His work over the past 35 years has emphasized
policy issues (copyright, privacy), technology issues (image and
multimedia databases), metadata (Dublin Core, METS, PREMIS), media
archiving and preservation (Personal Digital Archiving, museum time-based
media conservation), and teaching with technology (distance learning). He
is a graduate of South Hall.
Also Ankit BANSAL: Enhancing Digital Media Search
and Retrieval using AI.
  Project Progress Report. With the explosion of digital
content in the form of
audio, image and video files, the ability to understand the content
of the media serves as the key for faster retrieval and deeper
analysis. I aim to use various forms of Artificial Intelligence
(Computer Vision, Speech to Text, NLP etc) to create metadata that
aids in enhanced lookup of the required information. The goal would
be to understand the current state of the art, the needs of different
kinds of users, and explore the technical infrastructure needed to
meet those requirements.
Ankit Bansal is a second-year master's student.
His focus areas are Cybersecurity and Artificial Intelligence. In the
past, he has been affiliated with Cisco, National University of
Singapore, BLUES Lab, and Samsung Research on projects relating
to assistive technology, accessibility, privacy and security of
IoT devices. Currently, his research involves the development and
detection of AI-generated fake videos called deepfakes, which are
increasingly used for spreading misinformation in the context of elections.
Mar 13: Robert SANDERSON, J. Paul Getty Trust: Tiers of Abstraction and
Audience in Cultural Heritage Data Modeling.
When modeling data, and especially uncertain, historical
and culturally sensitive data such as is managed by museums, archives and
special collections, there are always design and scoping decisions that
can seriously impact the usability, precision and sustainability of any
system built with that data. Too simple, and the data will not capture
sufficient knowledge for it to be any more useful than a web page, and
too complete, and the data will be incomprehensible to anyone other than
the data model architect. I have previously and widely argued for
usability as a key indicator for success, and in this presentation will
expand upon that to investigate two parallel sets of interactions: the
different abstraction layers that must be considered when modeling
knowledge, and the different audiences that must be taken into account
when publishing that knowledge.
There are four tiers of abstraction in data modeling, and
different systems have chosen to make some or all of these explicit.
The presentation will cover several initiatives, and their choices about
the need for separation between conceptual model, ontology, vocabulary
and application profile. Different choices in abstraction give different
outcomes for the resulting data, which then has a direct impact on the
ability to serve the needs of the four tiers of audience: Humans,
Machines, the Network, and Research. These audiences have different
requirements and expectations, especially when it comes to features
such as modeling uncertainty and the provenance of the data, combined
with social factors such as trust, credit and usage metrics.
Dr. Robert Sanderson is an internationally known
information scientist and expert in Linked Open Data and cultural heritage
standards.
He is the J. Paul Getty Trust’s first Semantic Architect and is a passionate
advocate for open digital cultural heritage. He is responsible for the
design and direction of cultural heritage data information models and their
implementation with a primary goal of striking the right balance between
ease of publication and consumption, and the precision of the data’s
semantics. He is a co-chair of the Linked Art SIG, and a member of the
CIDOC-CRM SIG in ICOM. He is chair of the JSON-LD Working Group in the W3C,
a specification editor and long-standing leader in the IIIF community,
and on the advisory boards of many projects in the digital cultural and
knowledge sector including the American Art Collaborative, Annotating
All Knowledge, and Art Information Commons. He has previously been a
Standards Advocate at Stanford University, a Research Scientist at Los
Alamos National Laboratory’s Research Library, and a Lecturer in Computer
Science at the University of Liverpool. Previous projects and
collaborations include the Open Annotation Collaboration and subsequent
W3C standard, NISO/OAI's ResourceSync, Memento, SRW/SRU, and the Cheshire
information retrieval system. His Ph.D. focused on an interdisciplinary
digital edition of a Medieval French manuscript chronicling the Hundred
Years War. More at interview.
Mar 20: ** CANCELLED. WE HOPE TO RE-SCHEDULE **
Mary ELINGS and Christina FIDLER, Bancroft Library:
Building a Born Digital Archives Program at the Bancroft Library.
Much of the focus in born digital archives rightly centers
on the technical practices and tools necessary to manage these
collections; however, this is just one layer of the overall management
of these materials. A born digital archives program must address a user's
ability to access and work with these materials in practical ways while
both challenging and conforming to traditional archival practices.
In this presentation, Mary Elings and Christina V. Fidler
will discuss the Bancroft Library’s multifaceted approach to managing
born digital collections and their efforts to build a framework to
support a sustainable digital archives program. They will discuss the
challenges born digital collections present throughout the archival
process including appraisal/selection, arrangement/description, and
access/preservation as well as the tools they use to address those
challenges.
Mary Elings is Assistant Director and Head of
Bancroft's Technical Services division overseeing acquisitions,
accessioning, cataloging, archival processing, and digital collections
units. Prior to becoming Assistant Director in 2017, she served as Head
of the Digital Collections unit which is responsible for the creation
and management of Bancroft's digital collections. Ms. Elings taught a
graduate course in Digital Collections for over ten years and regularly
presents on that topic. More at https://update.lib.berkeley.edu/2018/03/29/the-bancroft-library-announces-new-head-of-technical-services/.
Christina V. Fidler is the Digital Archivist at the
Bancroft Library. Prior to this role, she was the Museum Archivist at UC
Berkeley’s Museum of Vertebrate Zoology. She has also worked at the
California Academy of Sciences as the Digital Projects Coordinator and
as the Project Manager for the Academy Library’s IMLS Grant "Connecting
Content."
Mar 27: No Seminar. Spring break.
Apr 3: Michael BUCKLAND: 101 Years of Women.
There is a campuswide project celebrating 150 years
of women on the Berkeley campus. So what about the history and heritage
of women in this School? There have always been more women than men.
In every case there is a human interest story as well as a
professional or academic record.
If we wanted to know more about them as individuals or as a group
how could we find out? I will combine brief
accounts of some individual women in the School's past with discussion
of sources and of how this history and heritage could be made more
accessible. Ppt.
Apr 10: Deirdre MULLIGAN: Covid-19 Surveillance.
Every day a new government or private sector initiative is announced
that uses trace data from social or cellphone networks to address
COVID-19 in real time, or to better understand its spread or the
impact of policy decisions to date. There are ongoing
debates about the desirability, legality, and disparate impact of
various efforts. In this discussion I would like to step back and
consider this as an example of an emerging private sector research
infrastructure--first really brought to light by the Facebook
emotional contagion study--and consider its normative and practical
implications for the research community and the public.
Join us for a conversation.
Apr 17: Jeffrey MACKIE-MASON and Rachael SAMBERG:
UC Berkeley's Digital Lifecycle Program: Mass Digitization of Special
Collections for Use and Preservation.
To join the Seminar go to the School's event announcement for it.
Nearly all of UC Berkeley's 13 million circulating
volumes are digitized and held for public use by the HathiTrust.
However, Berkeley also has vast rare and special historical
collections, most of which have not been digitized. By rough count,
we have about 5 million pages digitized so far, with at least 200
million to go. To pursue making (almost) all of these collections
easily accessible by anyone, anytime, from anywhere -- and to
ensure that our digital collections are effectively usable today
and preserved for future generations -- we have launched the
Digital Lifecycle Program.
We will give a brief overview of the history and
key architectural elements of the Program. We will then discuss
in some depth one of the special challenges for mass digitization
of special collections that Berkeley has tackled on behalf of
institutions everywhere: protocols and workflows for responsible
access. We address efficient, thoughtful and responsive treatment
of four issues: copyright restrictions, privacy rights, ethical
concerns, and contractual (gift agreement) restrictions.
Jeffrey MacKie-Mason is University Librarian, Professor in the School of Information, and Professor of Economics. More at https://jeff-mason.com/.
Rachael Samberg is Scholarly Communications
Officer, UC Berkeley Library. More at https://www.lib.berkeley.edu/scholarly-communication/about/team.
Apr 24: Seminar Project Reports: Ankit BANSAL and Chintan VYAS; Aobo LYU;
Mekhola MUKHERJEE and Vikramank SINGH.
Ankit BANSAL and Chintan VYAS: AI-powered Digital
Media Management and Analysis.
With the explosion of digital content in the form of
audio, image and video files, the ability to understand the content
of the media serves as the key for faster retrieval and deeper analysis.
The team aims to create a cloud platform to achieve this objective.
Our solution will use various forms of AI (Computer Vision, Speech to
Text, NLP, etc) to create metadata that aids in enhanced lookup of
the required information.
Aobo LYU: The Evolution of e-Commerce Platform Data
Processing Activity.
This study presents a case study of Taobao and Pinduoduo,
two well-known e-commerce platforms in China, and analyzes related articles
from the past 15 years in three journals: Journal of Theoretical and
Applied Electronic Commerce Research, International Journal of
Electronic Commerce, and Electronic Commerce Research. It summarizes
the evolution of how e-commerce platforms process and apply user information,
and predicts how future e-commerce platforms will use user
data. The conclusions of the evolution analysis are
presented in the form of an evolution model with three parts:
data collection type, data analysis method, and application of analysis
results.
Mekhola MUKHERJEE and Vikramank SINGH: Statistical Analysis
of Human Procedural Learning.
The order in which learners are exposed to various modules
and its effect on mastery and long term retention has been widely studied
in education-based literature. Most of this experimentation has been in
the field of mathematics. Our goal is to evaluate the effectiveness of
blocked and interleaved tasks in a procedural skill such as coding. We
also aim to examine whether there is a ‘best’ order within modules that
can lead to the best learning outcomes and if we can find such an order
with statistical methods.
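To make the two orderings concrete, here is a small sketch that builds a blocked and an interleaved task sequence from the same set of modules. The module names and task labels are hypothetical and are not taken from the study; the sketch only illustrates what "blocked" and "interleaved" mean as orderings.

```python
from itertools import chain, zip_longest


def blocked(modules):
    """All tasks of the first module, then all tasks of the next, and so on."""
    return list(chain.from_iterable(modules.values()))


def interleaved(modules):
    """Round-robin ordering: one task from each module in turn."""
    rounds = zip_longest(*modules.values())
    return [task for rnd in rounds for task in rnd if task is not None]


# Hypothetical coding modules, used only for illustration.
modules = {
    "loops": ["loops-1", "loops-2", "loops-3"],
    "functions": ["functions-1", "functions-2", "functions-3"],
}

print(blocked(modules))
print(interleaved(modules))
```

Comparing learning outcomes between groups that receive these two sequences is one way the effectiveness question in the abstract could be framed.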
Ankit Bansal, Chintan Vyas, Mekhola Mukherjee, and
Vikramank Singh are second year MIMS students.
Aobo Lyu is an exchange student from China.
May 1: Wayne de FREMERY and Michael BUCKLAND: Copy Theory.
The history and theory of copying and of copies has
attracted very little attention compared with writing, printing,
telecommunications, and computing. That is unfortunate because
copies and copying (which have the same root as copiousness!)
have had a very large impact. Imagine adjusting to life without any copies.
Seemingly a simple and commonplace word, "copy" is
like "information" and "relevance": The meaning is obvious
until you try to be precise. "Copy" has multiple meanings;
it becomes quite complex when analyzed; and it goes a long way towards
providing a unifying theory for much of information science and technology.
Wayne de Fremery teaches Korean literature and
bibliography at Sogang University in Seoul, Korea, where he develops
new technologies for investigating Korean literature and documentary
traditions, as well as information systems as cultural systems.
The Seminar will resume in the Fall Semester.
Fall 2019 schedule and summaries. Fall 2020 schedule and summaries.