296a-1 Seminar on Information Access Summaries Spring 2020

School of Information
Previously School of Library & Information Studies

Friday Afternoon Seminar: Summaries.
296a-1 Seminar: Information Access, Spring 2020.

Fridays 3-5. 107 South Hall.

Details will be added as they become available.

Jan 24: Clifford LYNCH: Perhaps We Have Framed the Research Data Management Challenge Incorrectly?
Introduction to the Seminar.
Over the past year I've been doing a good deal of thinking about the broad way we've framed the work of research data management and preservation, and the roles of various parties (researchers, data curators, repositories, etc.) in this effort. The dominant model to date has been one of describing datasets, archiving them in repositories, and assuming that other scholars will discover and reuse them. I'll critically examine this model and discuss how some ideas and developments, for example the FAIR principles, privacy challenges, and the widespread use of machine learning, place great stress on it, as does the proliferation of what I'll call "scholarly information aggregation and management environments". I'll speculate about what these developments may imply for reformulating the research data management enterprise, including some discussion of implications for funding, roles, and resource allocation and prioritization.

Jan 31: Michael BUCKLAND: Genres of Library Service: Economics, Ideology, Technology, etc.
Each library is unique, of course, but there are different types and, historically, there have been large differences between countries, as well as major changes over time. Historical studies have suggested some causal influences but not their relative importance. International comparative studies, popular in the 1970s and 1980s, were heavily descriptive with little explanatory analysis. I will review issues that seem to me important for the interpretation and explanation of differences.

Feb 7: Cathy MARSHALL: The US Census and Social Media Archiving.
Social media archiving at an institutional level (like the Library of Congress’s now-truncated effort to archive Twitter) is viewed as a grand-challenge technical problem. Yet the future use of, and modes of participation in, institutional social media archives have received less attention. Social media users are generally averse to such archiving efforts, even if only public accounts and data are collected. Only a small minority perceive any long-term value in such an enormous effort; far more feel that the risk is unjustified (and continue to feel this way in the wake of scandals like Cambridge Analytica). What can we learn from the US Census and the questions it was slated to answer in 1940, when 120,000 enumerators went door-to-door and conducted more than 37 million in-person household interviews? What stories, both inadvertent and intentional, does the census tell 80 years later, now that the source data has been released from embargo? I will discuss use, participation, and confidentiality issues entailed by social media archiving through the lens of recent study results, coupled with examples drawn from contemporaneous reactions and responses to the 1940 US Census.
Cathy Marshall is a research scientist in the CS Department at Texas A&M University, and a former principal researcher at Microsoft Research, Silicon Valley and Xerox PARC.

Feb 14: Monique le Conge ZIESENHENNE, Palo Alto: The Role of Public Libraries, Today and Tomorrow.
We'll discuss how public libraries are serving their communities today, how librarians are following trends for the future and planning to meet coming challenges. Literacy, social issues, the political environment, and technology, among many other issues, all play a part as cities and counties respond to basic daily needs and work to solve long-term issues.
Monique le Conge Ziesenhenne has been the Assistant City Manager in Palo Alto since April 2019. Before that, she was the Library Director, has served as the Community Services Director, and has worked in the area of public art. She was a library consultant and worked for a high school and a county library before becoming a library director in the Bay Area in 1998. Monique earned a Bachelor of Science degree in Design from UC Davis in 1987. She followed with a Master of Library and Information Studies from UC Berkeley in 1988, and a PhD in Managerial Leadership in Information Professions from Simmons University in 2017. She has served as President of the Public Library Association and of the California Library Association.

Feb 21: Hany FARID: Assessing the Reliability of Clothing-Based Forensic Identification.
A 2009 report by the National Academy of Sciences was highly critical of many forensic practices. The report concluded that significant changes and advances were required to ensure reliability across the forensic sciences. We examine the reliability of one such forensic technique, used for identification based on purportedly distinctive patterns on the seams of denim pants. Although this technique was first proposed more than twenty years ago, no thorough analysis of its reliability or reproducibility has previously been reported. We performed a detailed analysis of the technique to determine its reliability and efficacy.
Joint work with postdoctoral scholar Sophie Nightingale.

Feb 28: Seminar Project Progress Reports.
Chintan VYAS: Cloud Information Systems and Advancements in Infrastructure Technologies.
Information systems around the world are undergoing major disruption due to recent advances in infrastructure technology. Cloud computing has changed the way organizations store, retrieve, analyze, and share data. Servers, storage, networking, software, analytics, and intelligence are all offered as services over the internet, allowing organizations to innovate faster with flexible resources and to leverage economies of scale. With cloud information systems, users typically pay only for the cloud services they use, which helps them lower operating costs, run their infrastructure more efficiently, and scale as their needs change. My study examines the evolution of infrastructure technologies and how they could be used to create a distributed, robust, and scalable search engine that indexes documents for information retrieval.
Chintan Vyas is a second-year graduate student at the School of Information working toward developing a cloud-based media management solution that organizations could use to automatically index and retrieve rich media (images, audio, and video). He recently worked as a Product Manager at Sumo Logic (a machine data analytics company) and was a Software Engineer for 3 years before pursuing graduate studies. More at http://chintanvyas.com.
Vikramank SINGH and Mekhola MUKHERJEE: Accelerating Human Learning through AI.
In the study of human learning, there is broad evidence that our ability to retain information improves with repeated exposure and decays as the time since the last exposure grows. This plays a crucial role in the design of educational software, creating a trade-off between teaching new material and reviewing what has already been taught. A common way to balance this trade-off is spaced repetition, which uses periodic review of content to improve long-term retention. Though spaced repetition is widely used in practice, e.g., in electronic flashcard software, there is little formal understanding of how these systems should be designed. Our work will address two core problems in sequential vocabulary learning using flashcards or a mobile application. First, is there an order in which students learn vocabulary better than by simply memorizing it in alphabetical order? Second, how can we better schedule flashcard review so that students perform better overall while spending less time on review?
Vikramank Singh is a second-year graduate student in the MIMS program. He has a background in machine learning and reinforcement learning. Before Berkeley, Vikramank worked at Facebook Research as a software engineer in machine learning and at the MIT Media Lab as a Deep Learning Researcher. Most of his current research concerns sequential decision making for large-scale systems using reinforcement learning and machine learning. For more see www.vikramanksingh.org.
Mekhola Mukherjee is a second-year MIMS student. She was a software engineer at Hewlett Packard Enterprise for 3 years, working in the computational storage domain. Her current interests lie in data privacy and machine learning. For more see www.ischool.berkeley.edu/people/mekhola-mukherjee.
Aobo LYU: The Evolution of e-Commerce Platform Marketing Modes.
China's e-commerce is developing rapidly. In recent years, a variety of e-commerce platform marketing methods have emerged, such as allowing consumers to obtain discounts by itemizing product information on their social media accounts, and allowing consumers to gather friends together to complete tasks arranged by the platform in exchange for money. This study examines the characteristics of this change in marketing and the role of information technology in it, and analyzes the driving forces behind the change.
Step 1. Identify the research “aspects” of e-commerce by examining the topics and abstracts of recent research papers in the field.
Step 2. Analyze each research “aspect” by studying how one e-commerce platform has changed in recent years, and summarize the trends in each “aspect”.
Step 3. Derive the overall trend and driving forces of e-commerce platform evolution by analyzing the relationships among the research “aspects”.
Aobo Lyu is an exchange student from China. His major is information management and information systems, and he is currently doing research in e-commerce and studying systems theory, information theory, and cybernetics.

Mar 6: Howard BESSER: Archiving the Non-Organizational Born-Digital: The Challenges Posed by Material from Individuals, Communities, Social Movements, and Events.
The transition from analog to digital creation and communication has forced Archives and Special Collections to handle the vastly larger corpus of born-digital records that have already begun to enter our archives. The ubiquity of devices for recording and sending has not only increased the number of items that any individual creates, but has also led to the creation of vast numbers of media types (digital videos and photographs, emails, tweets, social media postings, ...). We need to find smart ways to handle this digital deluge, particularly ways to streamline the processes of selection/appraisal, ingest, and description.
In this presentation, Howard Besser will discuss his work with archivists, individuals, and community groups in addressing some of the challenges of this digital deluge. He will focus particularly on the problems posed by personal, community, and event-based born-digital material: the kind of material that documents the lives of ordinary people and the social and community organizations they form. The presentation will illustrate Archiving community, as well as the work of Activist Archivists in documenting the Occupy Wall Street movement.
After a dozen years as an LIS professor, Howard Besser became Professor of Cinema Studies at New York University, and Founding Director of the Moving Image Archiving & Preservation MA Program. His work over the past 35 years has emphasized policy issues (copyright, privacy), technology issues (image and multimedia databases), metadata (Dublin Core, METS, PREMIS), media archiving and preservation (Personal Digital Archiving, museum time-based media conservation), and teaching with technology (distance learning). He is a graduate of South Hall.
Also Ankit BANSAL: Enhancing Digital Media Search and Retrieval using AI.
Project Progress Report. With the explosion of digital content in the form of audio, image, and video files, the ability to understand the content of media is key to faster retrieval and deeper analysis. I aim to use various forms of Artificial Intelligence (Computer Vision, Speech to Text, NLP, etc.) to create metadata that aids lookup of the required information. The goal is to understand the current state of the art and the needs of different kinds of users, and to explore the technical infrastructure needed to meet those requirements.
Ankit Bansal is a second-year master's student. His focus areas are Cybersecurity and Artificial Intelligence. In the past, he has been affiliated with Cisco, the National University of Singapore, BLUES Lab, and Samsung Research on projects relating to assistive technology, accessibility, and the privacy and security of IoT devices. His current research involves the development and detection of AI-generated fake videos known as deepfakes, which are increasingly used to spread misinformation around elections.

Mar 13: Robert SANDERSON, J. Paul Getty Trust: Tiers of Abstraction and Audience in Cultural Heritage Data Modeling.
When modeling data, especially uncertain, historical, and culturally sensitive data such as that managed by museums, archives, and special collections, there are always design and scoping decisions that can seriously affect the usability, precision, and sustainability of any system built with that data. Too simple, and the data will not capture enough knowledge to be any more useful than a web page; too complete, and the data will be incomprehensible to anyone other than the data model architect. I have previously and widely argued for usability as a key indicator of success, and in this presentation I will expand on that argument to investigate two parallel sets of interactions: the different abstraction layers that must be considered when modeling knowledge, and the different audiences that must be taken into account when publishing that knowledge.
There are four tiers of abstraction in data modeling, and different systems have chosen to make some or all of these explicit. The presentation will cover several initiatives, and their choices about the need for separation between conceptual model, ontology, vocabulary and application profile. Different choices in abstraction give different outcomes for the resulting data, which then has a direct impact on the ability to serve the needs of the four tiers of audience: Humans, Machines, the Network, and Research. These audiences have different requirements and expectations, especially when it comes to features such as modeling uncertainty and the provenance of the data, combined with social factors such as trust, credit and usage metrics.
Dr. Robert Sanderson is an internationally known information scientist and expert in Linked Open Data and cultural heritage standards. He is the J. Paul Getty Trust’s first Semantic Architect and is a passionate advocate for open digital cultural heritage. He is responsible for the design and direction of cultural heritage data information models and their implementation, with a primary goal of striking the right balance between ease of publication and consumption, and the precision of the data’s semantics. He is a co-chair of the Linked Art SIG, and a member of the CIDOC-CRM SIG in ICOM. He is chair of the JSON-LD Working Group in the W3C, a specification editor and long-standing leader in the IIIF community, and on the advisory boards of many projects in the digital cultural and knowledge sector, including the American Art Collaborative, Annotating All Knowledge, and Art Information Commons. He has previously been a Standards Advocate at Stanford University, a Research Scientist at Los Alamos National Laboratory’s Research Library, and a Lecturer in Computer Science at the University of Liverpool. Previous projects and collaborations include the Open Annotation Collaboration and subsequent W3C standard, NISO/OAI's ResourceSync, Memento, SRW/SRU, and the Cheshire information retrieval system. His Ph.D. focused on an interdisciplinary digital edition of a Medieval French manuscript chronicling the Hundred Years War. More in an interview.

Mar 20: ** CANCELLED. WE HOPE TO RE-SCHEDULE **
Mary ELINGS and Christina FIDLER, Bancroft Library: Building a Born Digital Archives Program at the Bancroft Library.
Much of the focus in born digital archives rightly centers on the technical practices and tools necessary to manage these collections; however, this is just one layer of the overall management of these materials. A born digital archives program must also address users' ability to access and work with these materials in practical ways, while both challenging and conforming to traditional archival practices.
In this presentation, Mary Elings and Christina V. Fidler will discuss the Bancroft Library’s multifaceted approach to managing born digital collections and their efforts to build a framework to support a sustainable digital archives program. They will discuss the challenges born digital collections present throughout the archival process including appraisal/selection, arrangement/description, and access/preservation as well as the tools they use to address those challenges.
Mary Elings is Assistant Director and Head of Bancroft's Technical Services division overseeing acquisitions, accessioning, cataloging, archival processing, and digital collections units. Prior to becoming Assistant Director in 2017, she served as Head of the Digital Collections unit which is responsible for the creation and management of Bancroft's digital collections. Ms. Elings taught a graduate course in Digital Collections for over ten years and regularly presents on that topic. More at https://update.lib.berkeley.edu/2018/03/29/the-bancroft-library-announces-new-head-of-technical-services/.
Christina V. Fidler is the Digital Archivist at the Bancroft Library. Prior to this role, she was the Museum Archivist at UC Berkeley’s Museum of Vertebrate Zoology. She has also worked at the California Academy of Sciences as the Digital Projects Coordinator and as the Project Manager for the Academy Library’s IMLS Grant "Connecting Content."

Mar 27: No Seminar. Spring break.

Apr 3: Michael BUCKLAND: 101 Years of Women.
There is a campuswide project celebrating 150 years of women on the Berkeley campus. So what about the history and heritage of women in this School? There have always been more women than men. In every case there is a human interest story as well as a professional or academic record. If we wanted to know more about them as individuals or as a group, how could we find out? I will combine brief accounts of some individual women in the School's past with a discussion of sources and of how this history and heritage could be made more accessible.

Apr 10: Deirdre MULLIGAN: Covid-19 Surveillance.
Every day, a new government or private-sector initiative is announced that uses trace data from social or cellphone networks to address COVID-19 in real time, to better understand its spread, or to assess the impact of policy decisions to date. There are ongoing debates about the desirability, legality, and disparate impact of various efforts. In this discussion I would like to step back and consider this as an example of an emerging private-sector research infrastructure, first really brought to light by the Facebook emotional contagion study, and consider its normative and practical implications for the research community and the public. Join us for a conversation.

Apr 17: Jeffrey MACKIE-MASON and Rachael SAMBERG: UC Berkeley's Digital Lifecycle Program: Mass Digitization of Special Collections for Use and Preservation.
To join the Seminar go to the School's event announcement for it.
Nearly all of UC Berkeley's 13 million circulating volumes are digitized and held for public use by the HathiTrust. However, Berkeley also has vast rare and special historical collections, most of which have not been digitized. By rough count, we have about 5 million pages digitized so far, with at least 200 million to go. To make (almost) all of these collections easily accessible to anyone, anytime, from anywhere, and to ensure that our digital collections are effectively usable today and preserved for future generations, we have launched the Digital Lifecycle Program.
We will give a brief overview of the history and key architectural elements of the Program. We will then discuss in some depth one of the special challenges for mass digitization of special collections that Berkeley has tackled on behalf of institutions everywhere: protocols and workflows for responsible access. We address efficient, thoughtful and responsive treatment of four issues: copyright restrictions, privacy rights, ethical concerns, and contractual (gift agreement) restrictions.
Jeffrey MacKie-Mason is University Librarian, Professor in the School of Information, and Professor of Economics. More at https://jeff-mason.com/.
Rachael Samberg is Scholarly Communications Officer, UC Berkeley Library. More at https://www.lib.berkeley.edu/scholarly-communication/about/team.

Apr 24: Seminar Project Reports: Ankit BANSAL and Chintan VYAS; Aobo LYU; Mekhola MUKHERJEE and Vikramank SINGH.
Ankit BANSAL and Chintan VYAS: AI-powered Digital Media Management and Analysis.
With the explosion of digital content in the form of audio, image, and video files, the ability to understand the content of media is key to faster retrieval and deeper analysis. The team aims to create a cloud platform to achieve this objective. Our solution will use various forms of AI (Computer Vision, Speech to Text, NLP, etc.) to create metadata that aids lookup of the required information.
Aobo LYU: The Evolution of e-Commerce Platform Data Processing Activity.
This is a case study of Taobao and Pinduoduo, two well-known e-commerce platforms in China, combined with an analysis of related articles published over the past 15 years in three journals: Journal of Theoretical and Applied Electronic Commerce Research, International Journal of Electronic Commerce, and Electronic Commerce Research. It summarizes how e-commerce platforms' processing and application of user information have evolved, and predicts how e-commerce platforms will use user data in the future. The conclusions of the evolution analysis are presented as an evolution model with three parts: data collection type, data analysis method, and application of analysis results.
Mekhola MUKHERJEE and Vikramank SINGH: Statistical Analysis of Human Procedural Learning.
The order in which learners are exposed to various modules, and its effect on mastery and long-term retention, have been widely studied in the education literature. Most of this experimentation has been in mathematics. Our goal is to evaluate the effectiveness of blocked and interleaved tasks in a procedural domain like coding. We also aim to examine whether there is a ‘best’ order within modules that leads to the best learning outcomes, and whether such an order can be found with statistical methods.
Ankit Bansal, Chintan Vyas, Mekhola Mukherjee, and Vikramank Singh are second-year MIMS students.
Aobo Lyu is an exchange student from China.

May 1: Wayne de FREMERY and Michael BUCKLAND: Copy Theory.
The history and theory of copying and of copies has attracted very little attention compared with writing, printing, telecommunications, and computing. That is unfortunate because copies and copying (which have the same root as copiousness!) have had a very large impact. Imagine adjusting to life without any copies.
Seemingly a simple and commonplace word, "copy" is like "information" and "relevance": the meaning is obvious until you try to be precise. "Copy" has multiple meanings; it becomes quite complex when analyzed; and it goes a long way towards providing a unifying theory for much of information science and technology.
Wayne de Fremery teaches Korean literature and bibliography at Sogang University in Seoul, Korea, where he develops new technologies for investigating Korean literature and documentary traditions, as well as information systems as cultural systems.

The Seminar will resume in the Fall Semester.
