"Information Organization & Retrieval"
INFO 202, Fall 2008 (MW 9:00-10:30 in 202 South Hall)
Professor Robert Glushko (glushko@ischool.berkeley.edu)
Teaching Assistants: Jonathan Breitbart, Shawna Hein, Nick Rabinowitz
This course introduces the intellectual foundations of information organization and retrieval: conceptual modeling, semantic representation, vocabulary and metadata design, classification, and standardization, as well as information organization and retrieval practices, technology, and applications, including computational processes for analyzing information in both textual and non-textual formats. Students will learn how information organization and retrieval is carried out by professionals, authors, and users; by individuals in association with other individuals, and as part of the business processes in an enterprise and across enterprises.
This is a required introductory course for incoming School of Information master's students, integrating perspectives and best practices from a wide range of disciplines.
Students are also required to attend a one-hour small section meeting each Monday, starting the second week of the semester (11-12, 12-1, or 4-5; all in Room 107 South Hall).
Required Texts:
- Elaine Svenonius, The Intellectual Foundation of Information Organization, MIT Press, 2000.
- Christopher D. Manning, Prabhakar Raghavan, & Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.
Other Course Resources:
1. INTRODUCTION (8/27)
Readings
- Vannevar Bush, "As We May Think," The Atlantic Monthly, July 1945, http://www.theatlantic.com/doc/194507/bush
- Jim Gemmell, Gordon Bell, and Roger Lueder, "MyLifeBits: a personal database for everything," Communications of the ACM, 49(1), January 2006, 88-95 http://doi.acm.org/10.1145/1107458.1107460
- Jorge Luis Borges, "The Library of Babel," from Labyrinths: Selected Stories and Other Writings
- Ramana Rao, "From IR to Search and Beyond" ACM Queue (May 2004) http://doi.acm.org/10.1145/1005062.1005070
- Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Section 19.1
Assignment 1
- "202 in the News" -- due 9/3
2. ISSUES AND CONTEXTS (9/3)
Readings
- Malcolm Wheatley, "Operation Clean Data," CIO (July 2004) http://www.cio.com.au/index.php/id;637800080;fp;;fpid;;pf;1
- Jane Zhang, "Tailing Virulent Veggies," Wall Street Journal (March 13, 2007)
- R. Baron, E. Fabens, M. Schiffman and E. Wolf, "Electronic Health Records: Just around the Corner? Or over the Cliff?" Annals of Internal Medicine 143 (August 2005), 222-226
- David Karger and William Jones, "Data Unification in Personal Information Management," Communications of the ACM, 49(1), January 2006, 77-82 http://doi.acm.org/10.1145/1107458.1107496
- Rajat Mukherjee and Jianchang Mao "Enterprise Search: Tough Stuff", ACM Queue (April 2004) http://doi.acm.org/10.1145/988392.988406
- Gary Stix, "The Elusive Goal of Machine Translation," Scientific American (March 2006)
3. ORGANIZATION {AND,OR,VS} RETRIEVAL (9/8)
Readings
- Elaine Svenonius, The Intellectual Foundation of Information Organization, Preface and Chapters 1-2
- David Weinberger, Everything is Miscellaneous, Preface and Chapter 1 http://www.everythingismiscellaneous.com/samples/
- Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Sections 8.1-8.3
4. XML (9/10)
Readings
- Robert J. Glushko and Tim McGrath, Document Engineering, Chapter 2, "XML Foundations"
Assignment 2
5. CONCEPTS & CATEGORIES (9/15)
Readings
- George Lakoff, Women, Fire, and Dangerous Things, Chapters 1 and 2 (pages 5-57)
- Robert J. Glushko, Paul Maglio, Teenie Matlock and Lawrence Barsalou, "Categorization in the Wild," Trends in Cognitive Sciences, 12(4): 129-135, April 2008 http://dx.doi.org/10.1016/j.tics.2008.01.007
6. METADATA & METADATA STANDARDS (9/17)
Readings
- Svenonius, The Intellectual Foundation of Information Organization, Chapter 3, Chapter 4 (62-66)
- Lois Chan and Marcia Zeng, "Metadata Interoperability and Standardization: A Study of Methodology Part I. Achieving Interoperability at the Schema Level", D-Lib Magazine, 12(6), June 2006 http://www.dlib.org/dlib/june06/chan/06chan.html
- Cory Doctorow, "Metacrap: Putting the torch to seven straw-men of the meta-utopia", 26 August 2001 http://www.well.com/~doctorow/metacrap.htm
7. CONTROLLED NAMES AND VOCABULARIES (9/22)
Readings
- Svenonius, The Intellectual Foundation of Information Organization, Chapter 6, Chapter 8 (127-132)
- George W. Furnas, Thomas K. Landauer, Louis M. Gomez, and Susan T. Dumais, "The Vocabulary Problem in Human-System Communication," Communications of the ACM, 30(11), 964-971 (1987) http://doi.acm.org/10.1145/32206.32212
- Karen Coyle, "Identifiers: Unique, Persistent, Global," The Journal of Academic Librarianship, 32(4), June 2006, 428-431. http://www.kcoyle.net/jal-32-4.html
- L. Karl Branting, "Name Matching in Law Enforcement and Counter-Terrorism," ICAIL Workshop on Data Mining, Information Extraction, and Evidentiary Reasoning for Law Enforcement and Counter-Terrorism, June 2005. http://www.karlbranting.net/papers/icail2005.pdf
Assignment 3
8. CLASSIFICATION (9/24)
Readings
9. ONTOLOGY (9/29)
Readings
Assignment 4
10. DOCUMENTS AND DATA MODELS... AND MODELING (10/1)
Readings
11. INFORMATION INTEGRATION & INTEROPERABILITY (10/6)
Readings
12. ENTERPRISE / INSTITUTIONAL CATEGORIZATION & STANDARDS (10/8)
Readings
- M. Brun, J. Brown and R. Lohde, "Adoption of UBL in Denmark: Business cases and experiences" XTech (2005) http://idealliance.org/proceedings/xtech05/papers/03-05-02/
- Smita Brunnermeier and Sheila Martin, "Interoperability Costs in the US Automotive Supply Chain" Supply Chain Management 7(2) (2002)
- Arnon Rosenthal, Len Seligman, and Scott Renner. "From Semantic Integration to Semantics Management: Case Studies and a Way Forward," SIGMOD Record, 33(4), December 2004. http://doi.acm.org/10.1145/1041410.1041418
Assignment 5
13. THE SEMANTIC WEB (10/13)
Readings
- Catherine Marshall and Frank Shipman, "Which Semantic Web?" ACM conference on Hypertext and Hypermedia (2003) http://doi.acm.org/10.1145/900051.900063
- Nigel Shadbolt, Wendy Hall, and Tim Berners-Lee. "The Semantic Web Revisited," IEEE Intelligent Systems, May/June 2006, 96-101. http://eprints.ecs.soton.ac.uk/12614/1/Semantic_Web_Revisted.pdf
- Anupriya Ankolekar, Markus Krotzsch, Thanh Tran, and Denny Vrandecic. "The two cultures: Mashing up Web 2.0 and the semantic web." Proceedings of the 16th international conference on World Wide Web, 2007, 825-834. http://doi.acm.org/10.1145/1242572.1242684
14. SOCIAL / DISTRIBUTED CATEGORIZATION (10/15)
Readings
- Cameron Marlow, Mor Naaman, Danah Boyd, and Marc Davis. "HT06, Tagging Paper, Taxonomy, Flickr, Academic Article, To Read," Proceedings of the seventeenth conference on Hypertext and hypermedia, 2006 http://doi.acm.org/10.1145/1149941.1149949
- Tom Gruber, "Collective knowledge systems: Where the social web meets the semantic web," Journal of Web Semantics, 6, 2008 doi:10.1016/j.websem.2007.11.011 AND http://tomgruber.org/writing/CollectiveKnowledgeSystems.htm
15. PERSONAL INFORMATION MANAGEMENT (10/20)
Readings
Assignment 6
16. CONTENT MANAGEMENT (10/22)
Readings
- William L. Kuechler, "Business Applications of Unstructured Text," Communications of the ACM, 50(10), 2007, 86-93. http://doi.acm.org/10.1145/1290958.1290967
- Alexander B. Schwarzman, Hyunmin Hur, Shu-Li Pai, and Carter Glass, "XML-centric workflow offers benefits to scholarly publishers," XML 2004 Conference http://www.idealliance.org/proceedings/xml04/papers/71/XML2004-schwarzman.pdf
17. MIDTERM EXAM (10/27)
18. METADATA FOR MULTIMEDIA (10/29)
Readings
- Patricia Harpring, "The Language of Images: Enhancing Access to Images by Applying Metadata Schemas and Structured Vocabularies" http://www.getty.edu/research/conducting_research/standards/intro_aia/harpring.pdf
- Ka-Ping Yee, Kirsten Swearingen, Kevin Li, and Marti Hearst, "Faceted metadata for image search and browsing" ACM CHI 2003 http://doi.acm.org/10.1145/642611.642681 AND http://bailando.sims.berkeley.edu/papers/flamenco-chi03.pdf
- Mor Naaman, Susumu Harada, QianYing Wang, Hector Garcia-Molina, and Andreas Paepcke, "Context Data in Geo-Referenced Digital Photo Collections," Proceedings of the 12th annual ACM international conference on Multimedia (2004) http://doi.acm.org/10.1145/1027527.1027573
19. INFORMATION ORGANIZATION IN / FOR USER INTERFACES (11/3)
Readings
- Jennifer Tidwell, Designing Interfaces, "Chapter 2, Organizing the Content: Information Architecture and Application Structure."
- Jennifer Tidwell, Designing Interfaces, "Chapter 4, Organizing the Page: Layout of Page Elements"
- Globalization, Localization, Internationalization and Translation
- Mano Marks and Kelly Snow, "User Interface Design Patterns: Strengths, Challenges, and Future Directions," UCB ISchool, February 2006. http://www.ui-designpatterns.org/tr/uidp_strengths_challenges_future.pdf
20. MODEL-BASED APPLICATIONS AND UIs (11/5)
Readings
Assignment 7 (OPTIONAL EXTRA CREDIT)
21. SEARCH MODELS AND UIs FOR IR (11/10)
Readings
- Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Sections 19.4, 8.5.1-8.7
- Marc Resnick and Misha Vaughan, "Best Practices and Future Visions for Search User Interfaces" Journal of the American Society for Information Science and Technology 57(6) (2006) http://www2.sims.berkeley.edu/courses/is202/f06/Readings/BestPracticesForSearchUI.pdf
- Anne Aula, Studying User Strategies and Characteristics for Developing Web Search Interfaces, PhD Thesis, University of Tampere (December 2005), Chapter 4 http://www2.sims.berkeley.edu/courses/is202/f06/Readings/AulaThesis.pdf
22. TEXT PROCESSING; BOOLEAN MODELS (11/12)
Readings
- Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Chapters 1 and 2
23. VECTOR MODELS (11/17)
Readings
- Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Chapter 6
Assignment 8
24. DIMENSIONALITY REDUCTION (11/19)
Readings
25. STRUCTURE-BASED MODELS [1] (11/24)
Readings
- Alejandro Diaz, "Through the Google Goggles: Sociopolitical Bias in Search Engine Design." 2005 (Chapters 4 and 5) http://epl.scu.edu:16080/~stsvalues/readings/Diaz_thesis_final.pdf
- Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Chapter 21
- Pairin Katerattanakul, Bernard Han, and Soongoo Hong. "Objective Quality Ranking of Computing Journals," Communications of the ACM, 46(10), October 2003, 111-114. http://doi.acm.org/10.1145/944217.944221
26. STRUCTURE-BASED MODELS [2] (11/26)
Readings
27. MULTIMEDIA IR (12/1)
Readings
- Alejandro Jaimes, Mike Christel, Sébastien Gilles, Ramesh Sarukkai, and Wei-Ying Ma, "Multimedia Information Retrieval: What is it, and why isn't anyone using it?" Proceedings of the 7th ACM SIGMM (2005) http://doi.acm.org/10.1145/1101826.1101829
- Peter Knees, Tim Pohle, Markus Schedl, and Gerhard Widmer. "A Music Search Engine Built upon Audio-based and Web-based Similarity Measures," SIGIR 2007. http://doi.acm.org/10.1145/1277741.1277818
28. APPLIED IR & NLP (12/3)
Readings
- Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Chapter 13, through Section 13.1
- Adam Kilgarriff and Gregory Grefenstette, "Introduction to the Special Issue on the Web as Corpus," Computational Linguistics 29(3) (2003) http://acl.ldc.upenn.edu/J/J03/J03-3001.pdf
- Daniel Cohen, "From Babel to Knowledge: Data Mining Large Digital Collections," D-Lib Magazine (March 2006) http://www.dlib.org/dlib/march06/cohen/03cohen.html
- Weiguo Fan, Linda Wallace, Stephanie Rich, and Zhongju Zhang, "Tapping the Power of Text Mining," Communications of the ACM, September 2006 http://doi.acm.org/10.1145/1151030.1151032
- Paul Graham, "A Plan for Spam" http://www.paulgraham.com/spam.html
29. ALUMNI DAY (12/8)
Zach Gillen
Benjamin Hill
Mano Marks
Patrick Schmitz
30. COURSE REVIEW (12/10)
FINAL EXAM (12/15)