School of Information
 Previously School of Library & Information Studies

 Friday Afternoon Seminar: Summaries.
  296a-1 Seminar: Information Access, Spring 2018.

Fridays 3-5. 107 South Hall. Schedule. Weekly mailing list.
Details will be added as they become available.

Jan 19: Howard BESSER, New York University: Making Web Archiving Work for Streaming Media; and Digital Privacy Training Project.
    Making Web Archiving Work for Streaming Media: Archiving the Websites of Contemporary Young Composers.
    Web-archiving software is notoriously deficient at capturing streaming media. For the past two years New York University Libraries has been working with the Internet Archive to replace the ubiquitous Heritrix web crawler with one that can better capture streaming audio and video. With funding from the Andrew W. Mellon Foundation they have both created a new crawler (Brozzler) and tested it in the context of archiving the websites of contemporary young composers (showing how early-career composers represent themselves with a web presence).
    This presentation will examine the deficiencies of current web crawlers in handling streaming media and presenting it in context, and explain how Brozzler addresses those deficiencies. It will also describe the project to archive composer websites, touching on everything from contractual arrangements with the composers, to tying together various NYU Library tools (ArchivesSpace, Archivematica, the repository) with the Internet Archive’s Archive-It, to assessing researcher satisfaction with the result. It will also cover the combination of automated and manual methods used to archive composer websites.
    Digital Privacy Training Project
    Digital privacy is under constant threat from hackers, governments, and corporate entities, yet most individuals are relatively naïve about how to protect their personal privacy. This talk reports on a project to recruit and train a set of 40 digital privacy advocates. Drawn primarily from geographically dispersed public libraries, the advocates will conduct local digital privacy workshops, proactively liaise with community groups (particularly frequently targeted communities such as seniors and immigrant groups), advise on local privacy concerns, become public-policy advocates, and turn their own libraries into privacy centers. As project funding only began last month, this talk will focus on end goals, recruitment, and the planned curriculum and delivery methods for the six-month training program.
    Howard Besser is Founding Director of New York University’s Moving Image Archiving & Preservation MA Program, Professor of Cinema Studies, and Senior Scientist for Digital Initiatives for NYU’s Library. He has published over 50 articles and book chapters on information environments in libraries, museums, and archives; has been involved in the creation of a wide variety of standards (Dublin Core, METS, PREMIS, Z39.87); and is one of the Library of Congress’ Digital Preservation Pioneers. He holds both a PhD and an MLIS from this School.

Jan 26: Wayne de FREMERY, Sogang University, Korea: Documenting the 99%.
    Those attempting to assess non-standard documentary forms are at a disadvantage in curating data for large-scale, computer-aided analysis. New tools and methods are needed for enabling communities around the world to better curate, assess, share, preserve, and make use of their historically and culturally specific documentary traditions. Addressing broad topics in bibliography and document studies while focusing specifically on historical, literary, and cultural information difficult to extract from image data, I introduce new technologies and methodologies for documenting the 99% of documents for which optical character recognition (OCR) and other automated systems cannot provide accurate descriptions and/or transcriptions.
    Wayne de Fremery teaches Korean literature at Sogang University in Seoul where he develops new technologies for investigating Korean literature and documentary traditions, as well as information systems as cultural systems.

Feb 2: Anno SAXENIAN: Issues and Opportunities facing the School.
    Dean Saxenian will comment on some of the challenges and opportunities facing the campus and the School.

Feb 9: Marcia BATES, UCLA: Information and Embodiment.
    Looking for and gathering information is usually viewed as a primarily cognitive process. But information is absorbed and processed through the body as well, so a fuller understanding of human interaction with information requires integrating a sense of physical embodiment. The study of embodiment has infused anthropology, psychology, and biology in recent years. In the paper, portions of this research are brought to bear on the study of human information seeking and use. Topics addressed briefly in a targeted way include: Information and Survival, the Law of Requisite Variety, Information Literacy, Ecological Psychology, the New Unconscious, Grounded and Embodied Cognition, the Extended Phenotype, Niche Construction, and Cognitive Assemblages. In the talk, some of these topics will be used to exemplify the embodiment approach.
    Marcia J. Bates is Professor Emerita in the University of California at Los Angeles (UCLA) Department of Information Studies. A Fellow of the American Association for the Advancement of Science, she is a leading authority on information search, human-centered design of information systems, and information practices. She was Editor-in-Chief of the 7-volume Encyclopedia of Library and Information Sciences, 3rd ed., and is the recipient of many awards for her research and leadership. In addition to her teaching and scholarship, she has been a technical consultant to numerous organizations in government, foundations, and businesses, including technology startups. A graduate of Pomona College (B.A.) and of this School (M.L.S. '67, Ph.D. '72), Bates also served in Thailand in the Peace Corps. More at pages.gseis.ucla.edu/faculty/bates.

Feb 16: Elaine SEDENBERG: Study of Private Sector Research and Data Sharing Practices.
    Activity within companies in the areas of artificial intelligence, machine learning, and behavioral analytics implies new trends in internal user research, but there has been little empirical study of how these new roles are developing and affecting the existing research and development (R&D) ecosystem. Data scientists and social scientists employed by tech companies are building research divisions where the line between practice (direct product improvement) and research may be blurry. Further, these private-sector data are of rich interest to academics, yet gaining access for a study is the exception rather than the rule. These shifts in the sites of research and the sources of large repositories may challenge fundamental tenets of the research ecosystem, such as the free flow of information via publications and presentations, access to data for validation, and ethics frameworks dependent upon the academic model. Additionally, there is little understanding of public attitudes toward corporate research use of user data, and of how these evolving norms may influence the acceptability of particular practices. This talk will cover preliminary findings from an ongoing dissertation study using a mixed-methods approach of qualitative interviews with research practitioners and quantitative survey work on user attitudes. The aim of the dissertation is to inform national and corporate R&D policies.
    Elaine Sedenberg is a PhD candidate at the UC Berkeley School of Information and Co-Director of the Center for Technology, Society & Policy (CTSP). Previously she was a Science Policy Fellow at the Science and Technology Policy Institute (STPI) in Washington DC, and she has many years' experience in federal S&T policy as well as technology-transfer activities.

Feb 23: Joshua BLUMENSTOCK: Migration and the Value of Social Networks.
    How does the structure of an individual's social network affect his or her decision to migrate? Economic theory suggests two prominent mechanisms -- as conduits of information about jobs, and as a safety net of social support -- that have historically been difficult to differentiate. We bring a rich new dataset to bear on this question, which allows us to adjudicate between these two mechanisms and add considerable nuance to the debate. Using the universe of mobile phone records of an entire country over a period of four years, we first characterize the migration decisions of millions of individuals with extremely granular quantitative detail. We then use the data to reconstruct the complete social network of each person in the months before and after migration, and show how migration decisions relate to the size and structure of the migrant's social network. We use these stylized results to develop and estimate a structural model of network utility, and find that the average migrant benefits more from networks that provide social support than networks that efficiently transmit information. Finally, we show that this average effect masks considerable heterogeneity in how different types of migrants derive value from their social networks.
    Joshua Blumenstock, PhD 2012, is an Assistant Professor specializing in development economics, data science, econometrics, machine learning, and ICTD. His work focuses on using novel data and methods to better understand the economic lives of the poor. Most projects are based in developing and conflict-affected countries. More at www.jblumenstock.com/

Mar 2: Anushah HOSSAIN, Ankeet SHANKAR, and Michael BUCKLAND: Progress reports.
    Anushah HOSSAIN: Who are the humans kept out by CAPTCHAs?

    Are web services provided by global companies equally accessible to internet users across the world? Anecdotal evidence suggests that there are significant barriers to viewing and using websites in developing regions. IP addresses from entire countries are known to be blocked by certain websites, and important page elements such as CAPTCHAs often fail to appear or lock users in endless loops due to misclassification as bots. Despite widespread experience with these barriers, there is little systematic documentation of the extent to which they affect users. I investigate a single page element – the CAPTCHA challenge – and its performance across different network conditions. To what extent do network strength and location affect one’s classification as a bot or human? Which services are most restricted, and where does this burden fall globally? I invite feedback on my methods and framing for this early-stage project.
    Anushah Hossain is an MA-PhD student in the Energy and Resources Group at UC Berkeley. She previously worked as a survey researcher for non-profit and government organizations such as the Gates Foundation, USAID, and EPA, studying the usage and impacts of technologies in developing regions. Her current research focuses on differential access to information and communication technologies.
    Ankeet SHANKAR: PrivSec-F1: Compliance Focused Toolkit.
    PrivSec-F1 attempts to incorporate various legal and regulatory requirements for Product Managers, Chief Information Officers, and Chief Information Security Officers of small and mid-size firms that, often due to budget constraints, do not have cybersecurity or legal experts in their organizational structure. Furthermore, the proposed product will incorporate cybersecurity requirements from the Federal Trade Commission (“FTC”) and will highlight “soft law” privacy issues that could raise consumer ire yet still be legal, as classified in Bert-Jaap Koops's publication “A Typology of Privacy”. Other notable frameworks we propose to include are the Cloud Security Alliance (“CSA”) requirements, the Payment Card Industry Data Security Standard (“PCI-DSS”), and standards from the International Organization for Standardization (“ISO”) and the National Institute of Standards and Technology (“NIST”), to name a few.
    Ankeet Shankar is a second-year MIMS student focusing on cybersecurity and privacy. He has extensive prior experience in information technology as a management consultant, with a specialized focus on IT risk management and mitigation, IT strategy, outsourcing vendor audits, vulnerability assessments, and penetration testing.
    Michael BUCKLAND: Unified Search Support for All Kinds of Information.
    Bibliographies support search and discovery in published literature, but there is no generally agreed concept of bibliography, which has been used in three ways: for the relationship of publications to knowledge; for the listing of publications; and for the physical examination of books. How might one expand (or replace) bibliography to include search and discovery of a wider range of resources including but not limited to publications?

Mar 9: Two Sessions.
    1:30 p.m.: Clifford LYNCH: Stewardship of the Cultural Record: How do we approximate cultural "production"?

    As I have outlined earlier, a prerequisite to an effective program of cultural stewardship is making reasonable estimates of the incremental growth of the cultural record in various genres year by year, and then trying to measure the proportion of that material that is being taken care of by memory organizations. For many classes of material (e.g., books, recorded music, films) this was relatively easy in, say, 1970, because the means of production were heavily centralized; there were certainly various problematic cases around the margins, but they were around the margins. Today this is not the case. In all of the genres mentioned, and many more, we face very hard problems even in defining the universe of material that might legitimately claim stewardship attention and resources. This session will outline some of these dilemmas and discuss how they might be approached in the present day.

    3:00 p.m.: Stephen ABRAMS, California Digital Library: The Means Don’t Quantify the Ends: Criteria and Metrics for Evaluating Digital Preservation Success.
    During a workshop at the 2006 Joint Conference on Digital Libraries, Clifford Lynch stated that digital preservation was “a metric that’s defied measurement.” Unfortunately, little progress has been made since then. The scholarly literature and professional practice have focused extensively on the trustworthiness of preservation programs and systems, but have given little attention to the success of the resulting preservation outcomes. While there is broad consensus in the preservation field about what to do and how to do it, there is no such agreement about effective measures of how well it has actually been done. But without a clear sense of what constitutes success, how can one rationally plan for, expect, measure, or be held accountable for those outcomes? This presentation will focus on work towards measurable metrics for digital preservation. The goal of digital preservation is often stated as ensuring ongoing access to and use of preserved resources. But the assessment of use is a slippery notion, as it is inextricably tied to a particular time, place, person, and manner; one person’s success could quite easily be another’s failure. Too often the field tacitly assumes that trustworthy means will necessarily lead to successful ends, but those ends should be independently evaluated. Beyond being a problem of appropriate data management, digital preservation should be seen more broadly as a problem of human communication across time, with an understanding that temporal distance implicates concomitant cultural distance. In these terms, the entire preservation enterprise – embracing the production and consumption, as well as the management, of digital resources – is susceptible to communicological analysis, leading towards a semiotic-phenomenological model for preservation-enabled communication. The granular components of that model can then be used to derive appropriate criteria and metrics for evaluating digital preservation success.
    Stephen Abrams is Associate Director of the California Digital Library, responsible for strategic planning, innovation, and technical oversight of UC3’s services, systems, and initiatives in preservation, data curation, data management planning, and web archiving. More at www.cdlib.org/contact/staff_directory/sabrams.html.

Mar 16: Chris HOOFNAGLE & Aniket KESARI: Consumer Law for the 21st Century.
    Imagine a future where every purchase decision is as complex as choosing a mobile phone. Will it have coverage at home and at work? What will ongoing service cost? How long will the device last? Can I, and may I, switch providers? These are just some of the questions one must consider when a product is “tethered” – linked to the seller in an ongoing way. The Internet of Things and, more broadly, consumer products with embedded software will be tethered, with many implications for the consumer-seller relationship. Tethered products blur the lines between goods and services by incorporating elements of physical goods, digital goods, and digital services.
    The promise of new functionalities will bring consumers many benefits, and consumers will want tethered products. Our project seeks to predict the pathologies that will arise from tethered products by culling examples from recent seller/consumer conflicts and by mapping out the microeconomic dynamics of tethered products. We then rethink consumer-law approaches to maximize healthy competition in a tethered environment.
    Chris Jay Hoofnagle is adjunct professor of information and of law at UC Berkeley. He is the author of Federal Trade Commission Privacy Law & Policy, a history of the FTC’s consumer protection efforts. He is an elected member of the American Law Institute and is a strategic and legal advisor to companies in cybersecurity and emerging technology fields. For more see hoofnagle.berkeley.edu/.
    Aniket Kesari is a PhD Student at UC Berkeley's Jurisprudence & Social Policy program, and will be starting his JD at Yale Law in Fall 2018. He specializes in law & economics, with research interests that lie in technology law, data science, and public policy. He is currently working on projects related to digital privacy, innovation, and consumer protection. For more see https://goo.gl/J5NG7t.

Mar 23: No seminar meeting.

Mar 30: Spring break. No seminar meeting.

Apr 6: Catherine MARSHALL: How Digital Libraries, Social Media, and Collections of Ephemera Will Change the Practice of Biographical Research.
    Biography is a literary form that intertwines history and identity. Modern biographies have relied extensively on journalistic interviews and human memory to supplement contemporaneous sources of primary material such as letters, literary drafts, photos, and journals and notebooks, usually held as physical special collections in research libraries. I use the construction of a subject-driven collection of over 11,000 discrete digital items as a case study to demonstrate how new digital resources can extend the breadth and depth of biographical description, facilitate the rediscovery of a subject’s social network, and enable formerly invisible literary influences to be foregrounded. I also explore the implications of the use of ephemera in tandem with other digital resources, both to ask what we might want to save in the future, and to begin a discussion of the trade-offs inherent in making material that was once ephemeral (and difficult to access) so readily available online.
    Catherine Marshall is a San Francisco-based adjunct professor in the Computer Science Department at Texas A&M University. She was formerly a principal researcher at Microsoft Research Silicon Valley. More at www.csdl.tamu.edu/~marshall/.

Apr 13: Ankeet SHANKAR and Anushah HOSSAIN.
    Ankeet SHANKAR: PrivSec-F1: Compliance Focused Toolkit.

    Based on the review and feedback received during the previous progress report on March 2, a subset of the report and its key findings will be presented. This progress report focuses on the legal aspects of PrivSec-F1 pertaining to the soft requirements and guidance provided by the Federal Trade Commission (FTC) for businesses. It will cover the key lessons learnt from over 50 FTC cases and complaints, along with a trend analysis that companies and start-ups may use to keep their data safe.
    Ankeet Shankar is a second-year MIMS student focusing on cybersecurity and privacy. He has extensive prior experience in information technology as a management consultant, with a specialized focus on IT risk management and mitigation, IT strategy, outsourcing vendor audits, vulnerability assessments, and penetration testing.
    Anushah HOSSAIN: Using CAPTCHAs to Measure Internet Fragmentation.
    In 2015, the World Economic Forum (WEF) convened a session on ‘Keeping ‘Worldwide’ in the Web’ amid increasing concern about internet fragmentation. Commercial, political, and technical interests have manifested as unequal quality and distribution of infrastructure, content censorship, data localization policies, and other forms of closure. The subsequent WEF report, published in 2016, emphasized both the difficulty and the necessity of summarizing the scope and nature of internet fragmentation. What are the lines of fissure (geographic, cultural, or otherwise), and what are the consequences of fragmentation?
    This early stage project outlines one entry point to tracking internet fragmentation. I propose studying a single page element – the CAPTCHA challenge – as a way of understanding the tensions in designing both a secure and usable Web. Based on a literature review and document analysis, I summarize recent trends in CAPTCHA delivery, profiles of systematically disadvantaged users, and next steps for data gathering and framing.
    Anushah Hossain is an MA-PhD student in the Energy and Resources Group at UC Berkeley. She previously worked as a survey researcher for non-profit and government organizations such as the Gates Foundation, USAID, and EPA, studying the usage and impacts of technologies in developing regions. Her current research focuses on differential access to information and communication technologies.

Apr 20: Deirdre MULLIGAN and Daniel S. GRIFFIN: Reasons and Rights to be Brave: Rescripting Search to Respect the Right to Truth.
    What is the function of Google search? Breakdowns, as others have noted, provide an opportunity to understand a technical artifact’s function. They allow us to explore the mismatch between the roles and expectations prescribed to and demanded of users by the designers, and the behavior and expectations of actual users. This talk explores a particular breakdown to understand what it tells us about how Google imagines its users, how users imagine Google search, and how those competing imaginaries contribute to the definition and perception of failure. Through a close examination of one construed failure we identify a particular responsibility – to respect the collective right to truth, rooted in the growing expectation that businesses respect human rights – that Google failed to enact. We then explore how portrayals of search and the norms of search engineers discourage Google from acting to respect the right to truth – despite its deep entanglement with the function both Google and its users ascribe to search – while at the same time deflecting the work of protecting the right to truth onto other participants in the script of search. We leverage Akrich’s notion of de-scription to unpack the breakdown of search’s script and reveal the potential space for Google to act and perform differently. We then bring the discussion together by connecting the normative call for action with normative concerns about the manner of action, and offer a framework rooted in human rights to guide a rescripting of search that aligns with social expectations of Google’s responsibility to respect human rights in the context of search, and that protects against the slippery slope of content moderation that platforms fear.
    Deirdre Mulligan is an Associate Professor in the School of Information at UC Berkeley, a faculty Director of the Berkeley Center for Law & Technology, and an affiliated faculty on the Hewlett funded Berkeley Center for Long-Term Cybersecurity. Mulligan’s research explores legal and technical means of protecting values such as privacy, freedom of expression, and fairness in emerging technical systems. Her book, Privacy on the Ground: Driving Corporate Behavior in the United States and Europe, a study of privacy practices in large corporations in five countries, conducted with UC Berkeley Law Prof. Kenneth Bamberger was recently published by MIT Press. More at www.ischool.berkeley.edu/people/deirdre-mulligan#profile-main.
    Daniel Griffin is a PhD student at the School of Information interested in questions connecting search and misinformation. He is a co-director of the Center for Technology, Society & Policy. More at danielsgriffin.com/.

Apr 27: Michael BUCKLAND: Search Support for All Kinds of Evidence.
    Bibliographical techniques to provide sophisticated access to printed publications were developed long ago. Access to other resources has been slower to develop. Bibliographer Donald F. McKenzie wrote that bibliography should be extended to all media, including culturally significant landscapes, and librarian Suzanne Briet asserted that an antelope in a cage could be a document. But these materials are beyond the scope of established bibliographical practice. So how might a unified framework for the description, indexing, and discovery of all kinds of evidence resources be developed? What are the practical and theoretical consequences of extending (or replacing) bibliography to include antelopes, landscapes, and any other kinds of evidence?
    We approach this challenge by examining the history and affordances of bibliographies; adopting the perspective of the person becoming informed; preferring description, archetypes and similarities to definitions, distinctions and dichotomies; and seeking a generalized approach to reference and reference works.

May 4: Sebastian BENTHALL: Context, Causality, and Information Flow: Implications for Privacy Engineering, Security, and Data Economics.
    The creators of technical infrastructure are under social and legal pressure to comply with expectations that can be difficult to translate into computational and business logics. The dissertation presented in this talk bridges this gap through three projects that focus on privacy engineering, information security, and data economics, respectively. These projects culminate in a new formal method for evaluating the strategic and tactical value of data. This method relies on a core theoretical contribution building on the work of Shannon, Dretske, Pearl, Koller, and Nissenbaum: a definition of information flow as a channel situated in a context of causal relations.
    Sebastian Benthall is a security scientist working at the intersection of computer science, economics, law, and philosophy. He is a Research Fellow at the Digital Life Initiative at Cornell Tech and a Data Scientist for Ion Channel, a Washington, DC-based cybersecurity company. Before becoming a scientist, Sebastian managed the development of spatial data infrastructure for global coordination around disaster risk reduction. He holds a B.A. in Cognitive Science from Brown University and is completing his PhD at UC Berkeley's School of Information.

    The Seminar will resume in the Fall semester.
Fall 2017 schedule and summaries.