School of Information
Previously the School of Library & Information Studies
Friday Afternoon Seminar: Summaries.
296a-1 Seminar: Information Access, Spring 2018.
Fridays 3-5. 107 South Hall.
Schedule. Weekly mailing list.
Details will be added as they become available.
Jan 19: Howard BESSER, New York University: Making Web Archiving Work for
Streaming Media; and Digital Privacy Training Project.
Making Web Archiving Work for Streaming Media: Archiving
the Websites of Contemporary Young Composers.
Web Archiving software is notoriously deficient at capturing
streaming media. For the past two years New York University Libraries has
been working with the Internet Archive to replace the ubiquitous Heritrix
web crawler with one that can better capture streaming audio and video.
With funding from the Andrew W. Mellon Foundation they have both created
a new crawler (Brozzler) and tested this within the context of archiving
the websites of contemporary young composers (showing how early-career
composers represent themselves with a web presence).
This presentation will examine the deficiencies in current
web crawlers for handling streaming media and presenting it in context,
and explain how Brozzler addresses those deficiencies. It will also
explain the project to archive composer websites, touching on everything
from contractual arrangements with the composers, to tying together
various NYU Library tools (ArchivesSpace, Archivematica, repository)
with the Internet Archive’s Archive-It, to assessing researcher
satisfaction with the result. It will also cover the combinations of
automated and manual methods for archiving composer websites.
Digital Privacy Training Project
Digital privacy is under constant threat from hackers,
governments, and corporate entities, yet most individuals are relatively
naïve about how to protect their personal privacy. This talk reports
on a project to recruit and train 40 digital privacy advocates.
Primarily drawn from geographically dispersed Public Libraries, the
advocates will conduct local digital privacy workshops, proactively
liaise with community groups (particularly with frequently targeted
communities such as seniors and immigrant groups), advise on local
privacy concerns, become public policy advocates, and turn their own
libraries into privacy centers. As project funding only began last
month, this talk will focus on end goals, recruitment, and planned
curriculum and delivery methods for the six-month training program.
Howard Besser is Founding Director of New York
University’s Moving
Image Archiving & Preservation MA Program, Professor of Cinema Studies,
and Senior Scientist for Digital Initiatives for NYU’s Library. He has
published over 50 articles and book chapters on information environments
in libraries, museums, and archives; has been involved in the creation
of a wide variety of standards (Dublin Core, METS, PREMIS, Z39.87); and
is one of the Library of Congress’ Digital Preservation Pioneers. He has
both a PhD and MLIS from South Hall.
Jan 26: Wayne de FREMERY, Sogang University, Korea:
Documenting the 99%.
Those attempting to assess non-standard documentary forms
are at a disadvantage in curating data for large-scale, computer-aided
analysis. New tools and methods are needed for enabling communities
around the world to better curate, assess, share, preserve, and make
use of their historically and culturally specific documentary traditions.
Addressing broad topics in bibliography and document studies while
focusing specifically on historical, literary, and cultural information
difficult to extract from image data, I introduce new technologies and
methodologies for documenting the 99% of documents for which optical
character recognition (OCR) and other automated systems cannot provide
accurate descriptions and/or transcriptions.
Wayne de Fremery teaches Korean literature at Sogang
University in Seoul where he develops new technologies for investigating
Korean literature and documentary traditions, as well as information
systems as cultural systems.
Feb 2: Anno SAXENIAN: Issues and Opportunities facing the School.
Dean Saxenian will comment on some of the challenges and
opportunities facing the campus and the School.
Feb 9: Marcia BATES, UCLA: Information and Embodiment.
Looking for and gathering information is usually viewed
as a primarily cognitive process. But information is absorbed and
processed through the body as well. A fuller understanding of human
interaction with information requires the integration of a sense of
physical embodiment as well. The study of embodiment has infused
anthropology, psychology, and biology in recent years. In the paper,
portions of this research are brought to bear on the study of human
information seeking and use. Topics addressed briefly in a targeted way
include: Information and Survival, the Law of Requisite Variety,
Information Literacy, Ecological Psychology, the New Unconscious,
Grounded and Embodied Cognition, the Extended Phenotype, Niche Construction,
and Cognitive Assemblages. In the talk, some of these topics will be
used to exemplify the embodiment approach.
Marcia J. Bates is Professor Emerita in the University
of California at Los Angeles (UCLA) Department of Information Studies.
A Fellow of the American Association for the Advancement of Science, she
is a leading authority on information search, human-centered design of
information systems, and information practices. She was Editor-in-Chief
of the 7-volume Encyclopedia of Library and Information Sciences,
3rd ed., and is the recipient of many awards for her research and
leadership. In addition to her teaching and scholarship, she has been a
technical consultant to numerous organizations in government, foundations,
and businesses, including technology startups. A graduate of Pomona College
(B.A.) and of this School (M.L.S. '67, Ph.D. '72), Bates also served in
Thailand in the Peace Corps. More at pages.gseis.ucla.edu/faculty/bates.
Feb 16: Elaine SEDENBERG: Study of Private Sector Research and Data
Sharing Practices.
Activity within companies in the areas of artificial
intelligence, machine learning, and behavioral analytics implies new
trends in internal user research, but there has been little empirical
study of how these new roles are developing and affecting the existing
research and development (R&D) ecosystem. Data scientists and social scientists
employed by tech companies are building research divisions where the line
between practice (direct product improvement) and research may be blurry.
Further, these private sector data are of rich interest to academics,
and gaining access for a study is the exception rather than the rule. These
shifts in sites of research and sources of large repositories may
challenge fundamental tenets of the research ecosystem such as the free
flow of information via publications and presentations, access to data for
validation, and ethics frameworks dependent upon the academic model.
Additionally, there is little understanding of public attitudes regarding
corporate research use of user data, and how these evolving norms may
influence the acceptability of particular practices. This talk will cover
preliminary findings from an ongoing dissertation study using a mixed
method approach of both qualitative interviews with research practitioners
and quantitative survey work of user attitudes. The aim of this dissertation
is to inform national and corporate R&D policies.
Elaine Sedenberg is a PhD Candidate at the UC
Berkeley School of Information, and Co-Director of the Center for
Technology, Society & Policy (CTSP). Previously she was a Science Policy
Fellow at the Science and Technology Policy Institute (STPI) in Washington
DC, and has many years' experience working in federal S&T policy as well
as technology transfer activities.
Feb 23: Joshua BLUMENSTOCK: Migration and the Value of Social Networks.
How does the structure of an individual's social network
affect his or her decision to migrate? Economic theory suggests two
prominent mechanisms -- as conduits of information about jobs, and as a
safety net of social support -- that have historically been difficult to
differentiate. We bring a rich new dataset to bear on this question, which
allows us to adjudicate between these two mechanisms and add considerable
nuance to the debate. Using the universe of mobile phone records of an
entire country over a period of four years, we first characterize the
migration decisions of millions of individuals with extremely granular
quantitative detail. We then use the data to reconstruct the complete
social network of each person in the months before and after migration,
and show how migration decisions relate to the size and structure of the
migrant's social network. We use these stylized results to develop and
estimate a structural model of network utility, and find that the average
migrant benefits more from networks that provide social support than
networks that efficiently transmit information. Finally, we show that
this average effect masks considerable heterogeneity in how different
types of migrants derive value from their social networks.
Joshua Blumenstock, PhD 2012, is an Assistant Professor and
specializes in Development Economics, Data Science, Econometrics,
Machine Learning, and ICTD. His work focuses on using novel data and
methods to better understand the economic lives of the poor. Most
projects are based in developing and conflict-affected countries.
More at www.jblumenstock.com/
Mar 2: Anushah HOSSAIN, Ankeet SHANKAR, and
Michael BUCKLAND: Progress reports.
Anushah HOSSAIN: Who are the humans kept out by CAPTCHAs?
Are web services provided by global companies equally
accessible to internet users across the world? Anecdotal evidence suggests
that there are significant barriers to viewing and using websites in
developing regions. IP addresses from entire countries are known to be
blocked by certain websites, and important page elements such as CAPTCHAs
often fail to appear or lock users into endless loops due to misclassification
as bots. Despite widespread experience with these barriers, there is little
systematic documentation of the extent to which they affect users.
I investigate a single page element - the CAPTCHA challenge - and its
performance across different network conditions. To what extent do network
strength and location affect one’s classification as a bot or human? What
services are most restricted and where does this burden fall globally?
I invite feedback on my methods and framing for this early stage project.
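As a rough illustration of how such a probe might work, the sketch below checks a fetched page for strings that commonly signal a CAPTCHA challenge. The marker strings and the probe interface are hypothetical examples, not the project's actual method; a real study would fetch the same URL from many vantage points (for instance, through proxies in different countries and on different connection speeds) and compare results.

```python
# Illustrative sketch only: a crude check for whether a fetched page
# serves a CAPTCHA challenge. Marker strings are hypothetical examples.
from urllib.request import Request, urlopen

CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "cf-challenge")

def contains_captcha_marker(html: str) -> bool:
    """Report whether the given HTML contains a known CAPTCHA marker."""
    body = html.lower()
    return any(marker in body for marker in CAPTCHA_MARKERS)

def probe(url: str, user_agent: str = "captcha-probe/0.1") -> bool:
    """Fetch `url` from this vantage point and check for CAPTCHA markers."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req, timeout=10) as resp:
        return contains_captcha_marker(
            resp.read().decode("utf-8", errors="replace"))
```

Running `probe` on the same URL from different networks, and logging where the marker appears, would give one crude signal of the differential treatment the project aims to measure.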
Anushah Hossain is an MA-PhD student in the Energy
and Resources Group at UC Berkeley. She previously worked as a survey
researcher for non-profit and government organizations such as the Gates
Foundation, USAID, and EPA, studying the usage and impacts of technologies
in developing regions. Her current research focuses on differential access
to information and communication technologies.
Ankeet SHANKAR: PrivSec-F1: Compliance Focused Toolkit.
PrivSec-F1 attempts to incorporate various legal and
regulatory requirements for Product Managers, Chief Information Officers,
and Chief Information Security Officers of small and mid-size
firms that often, due to budget constraints, have no cybersecurity or
legal experts in their organizational structure. Furthermore, the
proposed product will incorporate cybersecurity requirements from the
Federal Trade Commission (“FTC”) as well as highlight “soft law” privacy
issues that could raise consumer ire yet still be legal, as classified in
Bert-Jaap Koops's publication “A Typology of Privacy”. Other notable
frameworks which we propose be included are Cloud Security Alliance (“CSA”)
Requirements, Payment Card Industry Data Security Standard (“PCI-DSS”)
requirements, and standards from the International Organization for
Standardization (“ISO”) and the National Institute of Standards and
Technology (“NIST”), to name a few.
Ankeet Shankar is a second year MIMS student
focusing on CyberSecurity and Privacy. He has extensive prior experience
in the field of Information Technology as a Management Consultant, with
a specialized focus on IT risk management and mitigation, IT strategy,
outsourcing vendor audits, vulnerability assessments, and penetration testing.
Michael BUCKLAND: Unified Search Support for All Kinds
of Information.
Bibliographies support search and discovery in published
literature, but there is no generally agreed concept of
bibliography, which has been used in three ways: for the relationship of
publications to knowledge; for the listing of publications; and for the
physical examination of books. How might one expand (or replace) bibliography
to include search and discovery of a wider range of resources including but
not limited to publications?
Mar 9: Two Sessions.
1:30 p.m.: Clifford LYNCH: Stewardship of the
Cultural Record: How do we approximate cultural "production"?
As I have outlined earlier, a prerequisite to an
effective program of
cultural stewardship is making some reasonable estimates about the
incremental growth of the cultural record in various genres year by
year, and then trying to measure the proportion of that material that
is being taken care of by memory organizations. For many classes of
material (e.g. books, recorded music, films) this was relatively easy
in, say, 1970, because the means of production were heavily centralized;
there were certainly various cases around the margins which were
problematic, but they were around the margins. Today, this is not the
case. In all of the genres mentioned, and many more, we face very hard
problems even in defining the universe of material that might
legitimately claim stewardship attention and resources. This session
will outline some of these dilemmas and discuss how
they might be approached in the present day.
3:00 p.m. Stephen ABRAMS, California Digital Library:
The Means Don’t Quantify the Ends: Criteria and Metrics for
Evaluating Digital Preservation Success.
During a workshop at the 2006 Joint Conference on Digital
Libraries, Clifford Lynch stated that digital preservation was “a metric
that’s defied measurement.” Unfortunately, little progress has been made since
then. The scholarly literature and professional practice have focused
extensively on the trustworthiness of preservation programs and systems,
but given little attention to the question of the success of the resulting
preservation outcomes. While there is broad consensus in the preservation
field about what to do and how to do it, there is no such agreement about
effective measures of how well it has actually been done. But without a
clear sense of what constitutes success, how can one rationally plan for,
expect, measure, or be held accountable for those outcomes? This
presentation will focus on work towards measurable metrics for digital
preservation. The goal of digital preservation is often stated as
ensuring ongoing access to and use of preserved resources. But the
assessment of use is a slippery notion, as it is inextricably tied to
a particular time, place, person, and manner; one person’s success could
quite easily be another’s failure. Too often the field tacitly assumes
that trustworthy means will necessarily lead to successful ends, but
those ends should be independently evaluated. Beyond being a problem
of appropriate data management, digital preservation should be seen more
broadly as a problem of human communication across time, with an
understanding that temporal distance implicates concomitant cultural
distance. In these terms, the entire preservation enterprise –
embracing the production and consumption, as well as management of
digital resources – is susceptible to communicological analysis,
leading towards a semiotic-phenomenological model for preservation-enabled
communication. The granular components of that model can then be used
to derive appropriate criteria and metrics for evaluating digital
preservation success.
Stephen Abrams is Associate Director of the California
Digital Library and responsible for strategic planning, innovation, and
technical oversight of UC3’s services, systems, and initiatives in
preservation, data curation, data management planning, and web archiving.
More at www.cdlib.org/contact/staff_directory/sabrams.html.
Mar 16: Chris HOOFNAGLE & Aniket KESARI: Consumer Law for the 21st
Century.
Imagine a future where every purchase decision is as complex as
choosing a mobile phone. Will it have coverage at home and work? What will
ongoing service cost? How long will the device last? Can I and may I switch
providers? These are just some of the questions one must consider when
a product is “tethered”--linked to the seller in an ongoing way. The
Internet of Things, but more broadly, consumer products with embedded
software, will be tethered, with many implications for the consumer-seller
relationship. Tethered products blur the lines between goods and services
by incorporating elements of physical goods, digital goods, and digital
services.
The promise of new functionalities will bring consumers
many benefits, and consumers will want tethered products. Our project seeks
to predict the pathologies that will arise from tethered products by
culling examples from recent seller/consumer conflicts and by mapping out
the microeconomic dynamics of tethered products. We then rethink consumer
law approaches to maximize healthy competition in a tethered environment.
Chris Jay Hoofnagle is adjunct professor of
information and of law at UC Berkeley. He is the author of Federal Trade
Commission Privacy Law & Policy, a history of the FTC’s consumer
protection efforts. He is an elected member of the American Law Institute
and is a strategic and legal advisor to companies in cybersecurity and
emerging technology fields. For more see hoofnagle.berkeley.edu/.
Aniket Kesari is a PhD Student at UC Berkeley's
Jurisprudence & Social Policy program, and will be starting his JD at
Yale Law in Fall 2018. He specializes in law & economics, with research
interests that lie in technology law, data science, and public policy.
He is currently working on projects related to digital privacy, innovation,
and consumer protection. For more see https://goo.gl/J5NG7t.
Mar 23: No seminar meeting.
Mar 30: Spring break. No seminar meeting.
Apr 6: Catherine MARSHALL:
How Digital Libraries, Social Media, and Collections of Ephemera
Will Change the Practice of Biographical Research.
Biography is a literary form that intertwines history and
identity. Modern biographies have relied extensively on journalistic
interviews and human memory to supplement contemporaneous sources of primary
material such as letters, literary drafts, photos, and journals and
notebooks, usually held as physical special collections in research
libraries. I use the construction of a subject-driven collection of over
11,000 discrete digital items as a case study to demonstrate how new
digital resources can extend the breadth and depth of biographical
description, facilitate the rediscovery of a subject’s social network, and
enable formerly invisible literary influences to be foregrounded. I also
explore the implications of the use of ephemera in tandem with other
digital resources, both to ask what we might want to save in the future,
and to begin a discussion of the trade-offs inherent in making material
that was once ephemeral (and difficult to access) so readily available
online.
Catherine Marshall is a San Francisco-based adjunct
professor in the Computer Science Department at Texas A&M University.
She was formerly a principal researcher at Microsoft Research Silicon Valley.
More at www.csdl.tamu.edu/~marshall/.
Apr 13: Ankeet SHANKAR and Anushah HOSSAIN.
Ankeet SHANKAR: PrivSec-F1: Compliance Focused Toolkit.
Based on the review and feedback provided during the last
progress report presented on March 2, a subset of the report and key
findings will be presented. This progress report will be focused on
the legal aspects of PrivSec-F1 pertaining to soft requirements and
guidance provided by the Federal Trade Commission (FTC) for businesses.
This will focus on key lessons learned from over 50 FTC cases and
complaints, and a subsequent trend analysis that companies and
start-ups may use to keep their data safe.
Ankeet Shankar is a second year MIMS
student focusing on CyberSecurity and Privacy. He has extensive prior
experience in the field of Information Technology as a Management
Consultant, with a specialized focus on IT risk management and
mitigation, IT strategy, outsourcing vendor audits, vulnerability
assessments, and penetration testing.
Anushah HOSSAIN: Using CAPTCHAs to Measure Internet
Fragmentation.
In 2015, the World Economic Forum (WEF) convened a session
on ‘Keeping ‘Worldwide’ in the Web’ following increasing concern about
internet fragmentation. Commercial, political, and technical interests
have manifested as unequal quality and distribution of infrastructure,
content censorship, data localization policies, and other forms of
closure. The subsequent WEF report, published in 2016, emphasized both
the difficulty and the importance of summarizing the scope and nature of
internet fragmentation. What are the lines of fissure (geographic, cultural,
or otherwise), and what are the consequences of fragmentation?
This early stage project outlines one entry point to
tracking internet fragmentation. I propose studying a single page
element – the CAPTCHA challenge – as a way of understanding the
tensions in designing both a secure and usable Web. Based on a
literature review and document analysis, I summarize recent trends
in CAPTCHA delivery, profiles of systematically disadvantaged users,
and next steps for data gathering and framing.
Anushah Hossain is an MA-PhD student in the Energy and Resources Group at UC Berkeley. She previously worked as a survey researcher for non-profit and government organizations such as the Gates Foundation, USAID, and EPA, studying the usage and impacts of technologies in developing regions. Her current research focuses on differential access to information and communication technologies.
Apr 20: Deirdre MULLIGAN and Daniel S. GRIFFIN: Reasons and Rights to be Brave: Rescripting Search to Respect the Right to Truth.
What is the function of Google search? Breakdowns, as others
have noted, provide an opportunity
to understand a technical artifact’s function. They allow us to
explore the mismatch between the roles and expectations prescribed to
and demanded of the users by the designers and the behavior
and expectations of actual users. This talk explores a particular
breakdown to understand what it tells us about how Google
imagines its users, and how users imagine Google search, and how
those competing imaginaries contribute to the definition and perception
of failure. Through a close examination of one
construed failure we identify a particular responsibility—to respect
the collective right to truth
rooted in the growing expectation on businesses to respect human
rights—that Google failed to
enact. We then explore how portrayals of search and the norms of
search engineers discourage Google from acting to respect the right to
truth--despite its deep entanglement with the function both
Google and its users ascribe to search--while at the same time
deflecting work to protect the right
to truth onto other participants in the script of search. We leverage
Akrich’s de-scription to unpack
the breakdown of search’s script, and reveal the potential space for
Google to act and perform
differently. We then bring the discussion together in connecting the
normative call for action with
normative concerns about the manner of action, and offer a
framework rooted in human rights to guide a rescripting of search that
aligns with social expectations of Google’s responsibility to respect
human rights in the context of search, and protects against the slippery
slope of content moderation that platforms fear.
Deirdre Mulligan is an Associate Professor in the
School of Information at UC Berkeley, a faculty Director of the Berkeley
Center for Law & Technology, and an affiliated faculty on the Hewlett
funded Berkeley Center for Long-Term Cybersecurity. Mulligan’s research
explores legal and technical means of protecting values such as
privacy, freedom of expression, and fairness in emerging technical
systems. Her book, Privacy on the Ground: Driving Corporate Behavior
in the United States and Europe, a study of privacy practices in
large corporations in five countries, conducted with UC Berkeley Law
Prof. Kenneth Bamberger was recently published by MIT Press.
More at www.ischool.berkeley.edu/people/deirdre-mulligan#profile-main.
Daniel Griffin is a PhD student at the School
of Information interested in questions connecting search and
misinformation. He is a co-director of the Center for Technology,
Society & Policy. More at danielsgriffin.com/.
Apr 27: Michael BUCKLAND: Search Support for All Kinds of Evidence.
Bibliographical techniques to provide sophisticated
access to printed publications were developed long ago. Access to other
resources has been slower to develop. Bibliographer Donald F. McKenzie
wrote that bibliography should be extended to all media including
culturally significant landscapes and librarian Suzanne Briet asserted
that an antelope in a cage could be a document. But these materials
are beyond the scope of established bibliographical practice.
So how might a unified framework for the description, indexing, and
discovery of all kinds of evidence resources be developed? What are the
practical and theoretical consequences of extending (or replacing)
bibliography to include antelopes, landscapes, and any other kinds
of evidence?
We approach this challenge by examining the history and
affordances of bibliographies; adopting the perspective of the person
becoming informed; preferring description, archetypes and similarities
to definitions, distinctions and dichotomies; and seeking a generalized
approach to reference and reference works.
May 4: Sebastian BENTHALL: Context, Causality, and Information Flow:
Implications for Privacy Engineering, Security, and Data Economics.
The creators of technical infrastructure are under social
and legal pressure to comply with expectations that can be difficult to
translate into computational and business logics. The dissertation
presented in this talk bridges this gap through three projects that focus
on privacy engineering, information security, and data economics,
respectively. These projects culminate in a new formal method for
evaluating the strategic and tactical value of data. This method relies
on a core theoretical contribution building on the work of Shannon,
Dretske, Pearl, Koller, and Nissenbaum: a definition of information
flow as a channel situated in a context of causal relations.
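As a toy illustration of the Shannon ingredient of that definition (not the dissertation's full formalism), one can model a single causal link as a noisy channel and measure the information flowing across it as mutual information:

```python
import math

# Toy causal link X -> Y: X is a fair bit, and Y copies X with
# probability 1 - eps (a binary symmetric channel). The mutual
# information I(X;Y) quantifies how much information "flows"
# across this one causal relation.

def mutual_information(p_x, p_y_given_x):
    """I(X;Y) in bits for discrete X with channel p(y|x)."""
    # Joint distribution p(x, y) and marginal p(y).
    p_xy = {(x, y): p_x[x] * p_y_given_x[x][y]
            for x in p_x for y in p_y_given_x[x]}
    p_y = {}
    for (x, y), p in p_xy.items():
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in p_xy.items() if p > 0)

eps = 0.1
p_x = {0: 0.5, 1: 0.5}
channel = {0: {0: 1 - eps, 1: eps}, 1: {0: eps, 1: 1 - eps}}
print(round(mutual_information(p_x, channel), 3))  # 0.531 bits, i.e. 1 - H(0.1)
```

Situating the channel in a wider context of causal relations, as the talk proposes, goes beyond this single-link picture.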
Sebastian Benthall is a security scientist working
at the intersection of computer science, economics, law, and philosophy.
He is a Research Fellow at the Digital Life Initiative at Cornell Tech
and a Data Scientist for Ion Channel, a Washington, DC based cybersecurity
company. Before becoming a scientist, Sebastian managed the development
of spatial data infrastructure for global coordination around disaster
risk reduction. He holds a B.A. in Cognitive Science from Brown University
and is completing his PhD at UC Berkeley's School of Information.
The Seminar will resume in the Fall semester.