School of Information
 Previously School of Library & Information Studies

 Friday Afternoon Seminar: Summaries.
  296a-1 Seminar: Information Access, Spring 2024.

Fridays 3-5. Everyone interested is welcome. Details are added as they become available.
In person and also on Zoom, unless indicated otherwise. Campus policy requires all Zoom participants to sign into a Zoom account prior to joining meetings hosted by UC Berkeley. Face masks are recommended but not required for in-person attendance. Zoom sessions are not recorded.
A link to each Seminar session is available only at the School's Seminar event listing: www.ischool.berkeley.edu/events/ias.
Schedule. Weekly mailing list.

Jan 19: Clifford LYNCH & Michael BUCKLAND: Introductions.
    Introduction to the Seminar. Expectations for registered students. Schedule highlights. Introduction of participants.
    Clifford LYNCH: Recent Developments.
   Relatively short summaries and comments on recent report releases and other publications, conference reports, etc.
    Other participants will also be encouraged to contribute.

Jan 26: Ron DAY, Indiana University: Auto-Documents and Documentality.
    Robert Pagès’ concept of the “auto-document” in his 1948 article, “Transformations documentaires et milieu culturel” [Documentary Transformations and Cultural Context], proposed an understanding of documents that depends on the “uniqueness” of a document. His article proposed a post-Otletian theory of documents in ways that are similar to a discussion of documents by Bernd Frohmann in 2012 with the concept of “documentality.” Pagès’ work in these regards can be read as foreshadowing a philosophy of Documentality. Further attention to Pagès’ work and to Frohmann’s works could result in new understandings of Suzanne Briet’s works, could illuminate other works and authors understood as belonging to Neo-documentation, and could yield new understandings of documents and information from the perspective of Documentality as a new philosophy of information and documents.
    Pagès’ 1948 article is available with an introduction and an English translation at Proceedings from the Document Academy, vol 8, issue 1.
    Alumnus Ron Day, MLIS '93, is a Professor at Indiana University, Bloomington, in the Dept. of Information and Library Science. He has written books and many articles on the theory and history of documentation and information. He has degrees in Philosophy and Comparative Literature. His principal books are The Modern Invention of Information (2001), Indexing it All: The Subject in the Age of Documentation, Information, and Data (2014), and Documentarity: Evidence, Ontology, and Inscription (MIT, 2019). More at https://roday.pages.iu.edu.

Feb 2: Michael BUCKLAND: How is a Document Relevant?
    The concept of relevance is central to information studies but has resisted clear analysis. Prior discussion at the Seminar led to (i) a narrow definition of relevance as transient, existing only during reasoning, and (ii) a linear model relating contexts, documents, document properties (aka affordances or powers), perceptions, and reasoning. I will discuss this model more fully with special attention to properties and reasoning. This builds on recent work with Wayne de Fremery.

Feb 9: Lidia UZIEL, UC Santa Barbara: The Open Book Collective: Sustainable Futures for Open Access Monographs.
    The Open Book Collective (OBC) is an international partnership and a collective of open access (OA) publishers, infrastructure providers, libraries, and other non-profit organizations. Its mission is to create a new OA book publishing ecosystem that is equitable, community-governed, and built on sustainable business models and community-owned infrastructures. The session will present the OBC's community-led governance structure and highlight the project's principal challenges in reshaping the larger open knowledge ecosystem. It will also include a discussion of strategic partnerships focused on amplifying bibliodiverse and equitable community-led approaches and expanding critical infrastructures for OA monograph publishing. More at www.openbookcollective.org.
    Lidia Uziel is Associate University Librarian for Research Resources and Scholarly Communication at the University of California, Santa Barbara. She is the Chair of the Board of Stewards of the Open Book Collective and was actively involved in the Community-Led Open Publication Infrastructures for Monographs (COPIM) project. Prior to the University of California, Santa Barbara, Lidia held several leadership positions at Harvard and Yale University Libraries.
    Lidia holds a doctorate in Comparative Literature received in cotutelle from the University of Montreal and Jean Moulin Lyon 3 University, a Master in Comparative Literature from Jean Moulin Lyon 3 University, and a Master in Library and Information Science from the University of Montreal. Her current research is in digital knowledge management, including the intersection of scholarly communication, libraries, and digital humanities/computational projects.

Feb 16: Jevin WEST: Search Engines as Gates and Gateways to Misinformation.
    Search engines are indispensable tools for navigating our information worlds. They can prioritize authoritative sources and de-prioritize problematic content; they can label results and contextualize search headings; but they can also be gateways to misleading information obfuscated in ads and hard-to-debunk video content. Given this potential, what are the effects of skewed or misleading query results? And do these misleading results alter collective perceptions of health, science, and political discourse? In this talk, I will explore these questions through two recent publications. In the first paper, we audit search results for misinformation during the 2020 U.S. election. In the second paper, we look at the impact of academic search engines and recommender systems on the construction of the scientific literature. I will also talk about next steps for this kind of research and how it can inform search literacy efforts.
    Jevin West is a visiting associate professor here and an Associate Professor in the Information School at the University of Washington (UW). He is the co-founder and the inaugural director of the Center for an Informed Public at UW, aimed at resisting strategic misinformation, promoting an informed society and strengthening democratic discourse. His research and teaching focus on the impact of data and technology on science, with a focus on slowing the spread of misinformation. He is the co-author of the book, Calling Bullshit: The Art of Skepticism in a Data-Driven World, which helps non-experts question numbers, data, and statistics without an advanced degree in data science. More at jevinwest.org/.

Feb 23: Coye CHESHIRE: Trustworthiness and Online Health Information.
    Accurate, appropriate, and trustworthy information is necessary to access relevant health services, to make informed health decisions, and to combat health disparities. I will discuss some of my recent work that examines trustworthiness of health information and information seeking in online environments, with particular attention to youth. I will also discuss some of my newest, in-progress efforts with public health and social welfare researchers as we explore the scope of these issues in popular social media platforms such as TikTok.

Mar 1: Remote presentation by Zoom, not in-person.
    Rob SANDERSON, Yale: Linked Data Enlightenment: Lessons Learnt from LUX.

    Over the past five years, Yale University has built a highly innovative discovery platform (LUX) that uses knowledge graph technologies to aggregate, reconcile, enrich and present all of the University’s cultural and natural history collections in a single environment. LUX advances the University’s core missions of teaching and research excellence, while still being easily accessible to the general public. Over the course of the design and implementation, several core principles of linked open data were called into question as to whether they are useful in practice or only in theory. This discussion will introduce LUX with a live demonstration of the system, and then delve into the lessons learnt around cross-collection, cross-domain and cross-institutional linked open data including the choice for a multi-modal database, the use of URIs and their persistence, and the details of entity reconciliation and record enrichment at scale.
    Presentation slides
    More at lux.collections.yale.edu/.
    Dr Robert Sanderson is Senior Director for Digital Cultural Heritage at Yale University, and works with Yale’s museums, libraries and archives to help them to be more connected and consistent in their processes and data. He is the technical architect and visionary for LUX, Yale’s cross-collection discovery platform built using the Linked Open Usable Data paradigm and technologies. He is chair of the Linked Art working group and long-standing editor for IIIF specifications, and has been co-chair and editor of foundational W3C specifications. He has previously worked at the Getty in Los Angeles, Stanford University and Los Alamos National Laboratory.

Mar 8: Javier CHA, Hong Kong University: Future-Proofing the Past: Big Data and the Transformation of Historical Practice.
    How will future historians study the 2020s? The total amount of data estimated to have been generated by the digital revolution until 2020 dwarfs what historians have traditionally encountered. However, the challenge for the historian is not only due to information overload, but also difficulties in how to access big data as an archive, such as bit rot, sharding, replication, (in)compatibility, encryption, and the physical presence of digital information in the form of data centers and global communications infrastructure. This new reality prompts the need to rethink the established approaches to digital history, which, while innovative, are designed for converting documents to digital media and applying quantitative methods to sub-1.0 gigabyte data sets. As today's born-digital artifacts are vast, dynamic, and heterogeneous, research and training in the nature of big data from a historian's perspective are a necessity, not an option. In this talk, I will present the fruits of five years of experimental research at the Big Data Studies Lab, where we investigate the preservation, authentication, energy demands, and societal implications of big data. Our approach is inspired by how book historians examine parchment, paper, ink, printing, and circulation, but in the context of solid-state drives, 5D optical discs, and content-delivery networks.
    Javier Cha is a digital historian and medievalist who specializes in the intellectual and cultural history of East Asia. He divides his research time between translations of essays written in classical Chinese into English, the application of graph database technology in historical scholarship, and experimental humanities projects that address the challenges posed by big data and artificial intelligence. He is Assistant Professor of Digital Humanities in the Department of History at the University of Hong Kong and Principal Investigator of the Big Data Studies Lab (bigdatastudies.net/). More at https://javiercha.com/.

Mar 15: Christine BORGMAN, UCLA: From Data Creator to Data Reuser: Distance Matters.
    Sharing research data is complex, labor-intensive, expensive, and requires infrastructure investments by multiple stakeholders. Open science policies focus on data release, yet reuse is also difficult and may never occur. Investments in data management could be made more wisely by considering who might reuse data, how, why, for what purposes, and when. Drawing upon empirical studies of data sharing and reuse, we develop the theoretical construct of distance between data creator and data reuser, identifying six distance dimensions that influence the ability to transfer knowledge effectively: domain, methods, collaboration, curation, purposes, and time and temporality. These dimensions are primarily social in character, with associated technical aspects that can decrease – or increase – distances between creators and reusers. We identify ways that data creators, data reusers, data archivists, and funding agencies can make data sharing and reuse more effective.
    This reports joint work with Paul Groth, head of the Intelligent Data Engineering Lab at the University of Amsterdam.
    See Borgman, C. L., & Groth, P. T. (2024). From Data Creator to Data Reuser: Distance Matters (arXiv:2402.07926). arXiv. arxiv.org/pdf/2402.07926.pdf.
    Christine L. Borgman is Distinguished Research Professor in Information Studies at the University of California, Los Angeles. Her publications in information studies, data and computer science, communication, and law include three award-winning books from MIT Press. A Fellow of the American Association for the Advancement of Science and the Association for Computing Machinery, she has held visiting posts at Oxford, Harvard, and several European institutions. Her current work in knowledge infrastructures focuses on data sharing and reuse.

Mar 22: Howard BESSER and AnnaLee SAXENIAN.
    Howard BESSER: Preserving Digital Images and Data: Procedural, Policy, and Privacy Issues.
    21st century media pose challenges to preserving the historical record. Collecting institutions need guidance and new strategies in order to save selective cellphone video, GPS data, and video from surveillance cameras, drones, and police bodycams. In this talk, Howard Besser will discuss how saving this type of material poses procedural, policy, and privacy issues. He will demonstrate the ongoing tension between preservation and privacy. The presentation will include a case-study of preserving cellphone videos from the Occupy Movement, and a close look at police body cam videos.
    Howard Besser, PhD '88, is Professor Emeritus at New York University where he founded the MA program in Moving Image Archiving and Preservation. He has over 30 years of experience as an Information Studies educator, and has published scores of articles and conducted scores of workshops. In 2009 he was named to the Library of Congress's select list of "Pioneers of Digital Preservation". He has taught courses in digital preservation and in surveillance video. He designed the “Policy” course for the Society of American Archivists Digital Archives Specialist Certificate Program. Besser is co-founder of the Library Freedom Institute, a nationwide project to train “Privacy Advocates” to teach digital privacy skills and advocate for enlightened privacy policies. Howard is an MLIS and PhD graduate of this School.
    Also 4:10 pm: AnnaLee SAXENIAN: Rethinking Antitrust for the Cloud Era.
    Antitrust is at the top of policy agendas globally. This talk will offer lessons for US antitrust policy from an examination of the development of infrastructure for data processing in the cloud. See Gerry Berk and AnnaLee Saxenian, “Rethinking Antitrust for the Cloud Era,” Politics and Society 51(3), 2023. (UC Berkeley Library provides online access for those with campus library privileges.)

Mar 29: No Seminar. Midsemester Break.

Apr 5: Anushah HOSSAIN, Stanford: How to Type Garbage: A History of Text Standards for the ‘Rest of World’.
    Today we can send and receive messages in most of the world’s major writing systems, but this was not always the case. This talk walks through the development of core standards in our modern-day text stack that enable multilingual digital communication: the Unicode Standard, the OpenType font format, and different rendering and layout software. Together we’ll consider what values and scripts were privileged, and for whom writing online became an arduous task.
    Anushah Hossain studies the history and cultures surrounding internet standards. Currently, she is a postdoctoral fellow at the Digital Civil Society Lab at Stanford University. Prior to that, she completed her PhD at UC Berkeley in the interdisciplinary Energy and Resources Group. Anushah will be returning to Berkeley in Fall 2024 to lead the Script Encoding Initiative, a project aimed at helping historic and minority scripts gain a digital footing.

Apr 12: David S. H. ROSENTHAL: Decentralized Systems Aren't.
    Decentralized systems have many advantages over centralized ones. They can be more resilient to failures and attacks, and can scale better. But they have three major problems: these advantages come with significant additional monetary and operational costs, the user experience is worse, and they exhibit emergent behaviors that drive centralization. The world has been on a decades-long series of experiments trying to build successful decentralized systems, marked almost entirely by failure.
    David S. H. Rosenthal is retired from Stanford University Libraries. He was a team member of C-MU's "Andrew Project"; an early employee and Distinguished Engineer at Sun Microsystems; Employee #4, first Chief Scientist, and first sysadmin at Nvidia; and Co-founder 20 years ago of the LOCKSS Program. He has been an active blogger for many years.

Apr 19: Michael BUCKLAND: Belief and Misbelief in Context.
    The genesis and spread of belief and misbelief can be represented by extending an explanatory model of relationships between contexts, documents, relevance and communication. This continues the February 2 Seminar discussion.

Apr 26: Last scheduled seminar session of the semester.
    Clifford LYNCH: Early Thoughts on AI-based Systems and Stewardship of the Evolving Cultural Record.

    AI-based systems, including generative AI systems that embed foundation models (LLMs), have captured the public attention, and by virtue of this alone require consideration from the perspective of preservation and stewardship. Beyond this, we are seeing many high-impact and large-scale systems that embed AI technology deployed on a steady basis. Overall, little consideration has been given to these developments in terms of what they may represent for the digital cultural record; indeed, it's not at all clear what problems we want to solve or what we are likely to be able to solve.
    There are perhaps useful parallels and contrasts with the problems involved in thinking about the role of social media platforms as part of the digital cultural record, and I'll consider some of these. This is very preliminary work on a topic that seems to have received little attention to date. Discussion and pointers to relevant work are very welcome.

    The Seminar will resume in the Fall semester.
Fall 2023 schedule and summaries.