Web Architecture and Information Management

INFO 153 (CCN 42509) – Spring 2011
School of Information, UC Berkeley

Instructors: Erik Wilde and Dilan Mahendran
TA: Prateek Kakirwar

Lecture: Mon & Wed 15.00–16.00, 88 Dwinelle Hall
Lab: Fri 14.00–15.00, 202 South Hall

Description: This course focuses on understanding the Web as an information system, and on how to use it to manage personal and shared information. The Web is an open and constantly evolving system, which can make it hard to understand how the different parts of the landscape fit together. This course provides students with an overview of the Web as a whole and of how its individual parts fit together. We briefly look at topics such as Web design and Web programming, but this course is not exclusively designed to teach HTML or JavaScript. Instead, we look at the bigger picture and at how and when to use these and other technologies. The Web already is, and will remain for a long time to come, a central part of many information-related activities, and this course provides students with the understanding and skills to better navigate and use the landscape of Web information, Web technologies, Web tools, and common Web patterns.

Date Subject Slides Required Reading Additional Resources
2011-01-19 Overview and Introduction: This introductory lecture gives the motivation for the course, some information about the people involved and the organization of the course, a high-level overview of the course's topics, and an overview of the assignments which are an important part of the course program. Introduction bSpace [https://bspace.berkeley.edu/portal/site/2676fd30-179a-46b4-be96-cea7e9694390?panel=Main]
2011-01-24 Web History: The Web is, in the words of its creator Tim Berners-Lee, a "global information space." The Web is relatively new, but the vision of a global information space is at least a century old. Looking back at these early visions can give us a sense of the recurring problems in human communication and information management to which the Web was intended to be a solution. Yet we must be careful to avoid seeing an unbroken line of technological progress where there was none: many of the pioneers of information management were forgotten, and later generations constructed their own pragmatic historical narratives. History The Web Time Forgot [http://www.nytimes.com/2008/06/17/science/17mund.html] · History of the Web [http://www.w3c.rl.ac.uk/primers/history/origins.pdf] · A Proposal [http://www.w3.org/History/1989/proposal.html] Paul Otlet [http://people.ischool.berkeley.edu/~buckland/otlet.html] · Emanuel Goldberg [http://people.ischool.berkeley.edu/~buckland/goldberg.html] · World Brain [https://sherlock.ischool.berkeley.edu/wells/world_brain.html] · Internet Pioneers [http://www.ibiblio.org/pioneers/] · Augmenting Human Intellect [http://www.dougengelbart.org/pubs/augment-3906.html] · Hypertext '87 [http://portal.acm.org/citation.cfm?id=43953] · W3C Web Chronology [http://www.w3.org/History.html] · How It All Started [http://www.w3.org/2004/Talks/w3c10-HowItAllStarted/?toc=true] · Histories of the Internet [http://www.isoc.org/internet/history/]
2011-01-26 Standards and Standards Bodies: The Web is composed of standards: rules and practices to which technologies must adhere if they are to be considered part of the Web. Most of these standards are developed by standards bodies, organizations that publish documents defining Web technologies. But sometimes standards develop more organically, from technologies that become popular and are widely implemented without a formal process. However they arise, Web standards are critical for the existence of the Web, and they can have enormous economic and societal impact. Standards Intro to Dynamics of Standards [http://books.google.com/books?id=IXkX8WKG24gC&pg=PA1] · Web Standards FAQ [http://www.webstandards.org/learn/faq/] · WHATWG FAQ (Parts 1 & 2) [http://wiki.whatwg.org/wiki/FAQ] How the Internet Got Its Rules [http://www.nytimes.com/2009/04/07/opinion/07crocker.html] · IETF [http://www.ietf.org/] · W3C [http://www.w3.org/] · WHATWG [http://www.whatwg.org/]
2011-01-31 Filesystems and Web Servers: It can be hard to understand web technologies without a basic understanding of filesystems, directory structure, and file paths. This lecture provides an overview of these topics and shows how web servers build upon these fundamentals. Filesystems Files [http://en.wikipedia.org/wiki/Computer_file] · Filesystems [http://hubpages.com/hub/understanding-your-file-system] · Directory structure [http://www.geo.hunter.cuny.edu/~tbw/spars/dept.faqs/file_dir_structure.htm] · Paths [http://en.wikipedia.org/wiki/Path_(computing)] · Website structure [http://www.netstrider.com/tutorials/HTML/structure/] · Web servers [http://www.howstuffworks.com/web-server.htm/printable] Filesystem Hierarchy Standard [http://www.pathname.com/fhs/pub/fhs-2.3.html] · Apache URL Mapping [http://httpd.apache.org/docs/2.2/urlmapping.html]
2011-02-02 HyperText Markup Language (HTML): The Hypertext Markup Language (HTML) is the most important content type on the Web. This lecture covers a basic overview of how to use HTML markup in general. In particular, we look at page titles, meta tags, inserting text and images, using lists, and creating simple tables. Attributes can be used for more layout control in the HTML tags, but most layout issues are deferred until the CSS lecture. HTML Getting started with HTML [http://www.w3.org/MarkUp/Guide/] · Getting to know HTML [http://proquest.safaribooksonline.com/059610197X/1] HTML Tutorial [http://www.w3schools.com/html] · HTML Reference [https://developer.mozilla.org/en/HTML/Element] · HTML Validator [http://validator.w3.org/]
2011-02-07 Cascading Style Sheets (CSS): Cascading Style Sheets (CSS) were designed as a language for better separating presentation-specific issues from the structuring of documents as provided by HTML. CSS uses a simple model of selectors and declarations. Selectors specify to which elements of a document a set of declarations (each being a value assigned to a property) applies; in addition, there is a model of how property values are inherited and cascaded. The biggest limitation of CSS is that it cannot change the structure of the displayed document. CSS Adding a Touch of Style [http://www.w3.org/MarkUp/Guide/Style] · Getting started with CSS [http://proquest.safaribooksonline.com/059610197X/285] CSS Spec [http://www.w3.org/TR/CSS21/] · Properties [http://www.w3.org/TR/CSS21/propidx.html] · CSS Tutorial [http://www.w3schools.com/css] · CSS Validator [http://jigsaw.w3.org/css-validator/]
2011-02-09 Web Browsers: This lecture looks at Web browsers and how they work. It introduces the basic functions of a browser: retrieval and rendering of Web pages. Any modern browser needs to support more than just HTTP and HTML; it must support CSS for stylesheets, JavaScript for scripted Web pages, various image formats, and popular applications such as Flash. In addition, browsers can support additional functionality such as off-line operation, or in general more application-oriented features such as AIR or Silverlight. Browsers Wikipedia [http://en.wikipedia.org/wiki/Web_Browser] · History [http://en.wikipedia.org/wiki/History_of_the_web_browser] Firefox [http://www.mozilla.com/firefox/] · Safari [http://www.apple.com/safari/] · IE [http://www.microsoft.com/windows/products/winfamily/ie/default.mspx] · Chrome [http://www.google.com/chrome] · Opera [http://www.opera.com/]
2011-02-14 Internet Architecture: The Internet is the technical infrastructure on top of which the Web is built. Some of the services provided by the Internet are essential for the Web, most importantly the naming service and the data transfer service. The Domain Name System (DNS) provides the human-readable names for computers, which can then be used in the addresses of Web servers and ultimately Web pages. The Transmission Control Protocol (TCP) provides the reliable data transfer service between Web Servers and Web Browsers, building on the very robust Internet Protocol (IP). Internet TCP/IP [http://xrds.acm.org/article.cfm?aid=197182] Internet Architecture [http://en.wikipedia.org/wiki/Category:Internet_architecture] · TCP/IP Overview [http://www.garykessler.net/library/tcpip.html] · Timeline [http://www.zakon.org/robert/internet/timeline/]
2011-02-16 Web Foundations (URIs & HTTP): The Web's architecture rests on a few simple principles: a consistent and global identification mechanism for resources, a standardized way of retrieving resource representations, and standardized media types that make those representations usable. Based on the Internet, the Web's transport protocol transmits representations of resources identified by a Uniform Resource Identifier (URI) between Web servers and clients. The most important protocol for data transfer on the Web is the Hypertext Transfer Protocol (HTTP). URIs & HTTP HTTP [http://en.wikipedia.org/wiki/Http] · Cool URIs [http://www.w3.org/Provider/Style/URI] Live HTTP Headers [https://addons.mozilla.org/en-US/firefox/addon/3829] · HTTP and CGI [http://www.garshol.priv.no/download/text/http-tut.html] · URI Spec [http://tools.ietf.org/html/rfc3986] · HTTP Spec [http://tools.ietf.org/html/rfc2616]
2011-02-23 Anatomy of a Basic Web Application: The vast majority of web sites today are no longer static HTML pages but database-driven web applications. Today we'll look at a simple database-driven web application in detail, to see how its various components--HTML forms, application server, and database--work together. Basic Web Apps Database-Driven Website [http://www.killersites.com/articles/articles_databaseDrivenSites.htm] · Forms [http://en.wikipedia.org/wiki/Form_(web)] · It's Alive! [http://proquest.safaribooksonline.com/9780596157739/1] · Web Application Frameworks [http://docforge.com/wiki/Web_application_framework] HTML Forms FAQ [http://htmlhelp.com/faq/html/forms.html] · HTML Forms Spec [http://www.w3.org/TR/html401/interact/forms.html]
2011-02-28 Media Types (MIME): One of the most important aspects of computer-based communication is the concept of media types: the question of what type of information a digital artifact represents and how it is encoded. The most common standard for this information is the scheme introduced by Multipurpose Internet Mail Extensions (MIME). Media types can be negotiated by peers communicating through HTTP. Some media types allow fragment identifiers, which allow references to a resource to identify a fragment of the complete resource. Media Types Firefox Handling [https://developer.mozilla.org/En/How_Mozilla_determines_MIME_Types] Registry [http://www.iana.org/assignments/media-types/] · Wikipedia [http://en.wikipedia.org/wiki/MIME_type]
2011-03-02 Multimedia: Multimedia is a broad term for pictures, audio and video. Pictures include both images and graphics. Until HTML5, images were the only multimedia content on the Web widely supported by standardized formats. With the arrival of HTML5, audio and video are now supported directly by Web browsers, and there is wider support for graphics as well. Multimedia Style Guide [http://www.webstyleguide.com/wsg3/11-graphics/] · Graphics in HTML5 [http://www.youtube.com/watch?v=siOHh0uzcuY#t=6m06s] · Video in HTML5 [http://www.youtube.com/watch?v=siOHh0uzcuY#t=20m53s] Graphics File Formats [http://en.wikipedia.org/wiki/Graphics_file_format] · Intro to SVG [http://www.w3schools.com/svg/svg_intro.asp] · Audio and Video in Firefox [https://developer.mozilla.org/En/Using_audio_and_video_in_Firefox]
2011-03-07 State Management (Cookies): HTTP is a stateless protocol, where each request/response interaction is a separate interaction and there is no protocol support for longer sessions (such as a user logging in and working on a Web site as an identified user). State management refers to mechanisms that support this kind of scenario; the most popular mechanism is cookies. Another possibility is URI-based state management. The newest option for storing state is HTML5 Web Storage. This lecture is also a glimpse into the world of Representational State Transfer (REST), the Web's fundamental model of handling interaction with resources. Cookies HowStuffWorks [http://computer.howstuffworks.com/cookie.htm/printable] · Databases in HTML5 [http://www.youtube.com/watch?v=siOHh0uzcuY#t=33m10s] Cookie Spec [http://tools.ietf.org/html/rfc2965] · Wikipedia [http://en.wikipedia.org/wiki/HTTP_cookie] · HTTP Viewer [http://www.httpviewer.net/] · HTML5 Web Storage [http://dev.w3.org/html5/webstorage/]
2011-03-09 Client-side Scripting: Scripting is used on the majority of today's modern Web sites. Scripting can be used to improve the usability and accessibility of a Web site (for example for validating form data on the client side), it can vastly improve the user experience with new interface design (the smooth scrolling of Google Maps vs. older click to scroll map services), or it can be used to implement behavior that would be impossible without scripting (for example the online applications of Google Docs). This introductory lecture looks into scripting fundamentals such as JavaScript itself, the Document Object Model (DOM) for accessing the browser window's content, and XMLHttpRequest for script-server communications. Scripting The Interactive Web [http://proquest.safaribooksonline.com/9780596527747/chapter_1_the_interactive_web] Best Practices [http://domscripting.com/book/sample/] · Tutorial [http://www.webteacher.com/javascript/] · Wikipedia [http://en.wikipedia.org/wiki/Dynamic_HTML]
2011-03-14 Anatomy of an Advanced Web Application: The widespread adoption of client-side scripting and AJAX techniques has resulted in web applications becoming easier to use but harder to understand. No longer is it the case that HTML is used simply to present a document to be read. Now HTML, JavaScript and CSS are used together to build dynamic applications that run in the browser. These applications often depend on APIs, resources intended for use by programs rather than people. Advanced Web Apps AJAX [http://www.adaptivepath.com/ideas/e000385] · AJAX Tutorial [http://code.google.com/edu/ajax/tutorials/ajax-tutorial.html] · XMLHttpRequest [http://en.wikipedia.org/wiki/XMLHttpRequest] · jQuery [http://en.wikipedia.org/wiki/JQuery] JSON [http://en.wikipedia.org/wiki/JSON] · Web service [http://en.wikipedia.org/wiki/Web_service]
2011-03-16 Mid-term Exam: Midterm
2011-03-28 Web Search: In his early vision of the Web, Tim Berners-Lee expected that most people would discover information by following hyperlinks, rather than by using keyword searches. Thus there is no search functionality built into the Web. Web search engines came later and had a profound effect on how we use and experience the Web. Now it is hard to imagine using the Web without search, a fact that has both technological and political implications. Search Google Basics [http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=70897] · Search Engine Basics [http://proquest.safaribooksonline.com/9780596809133/search_engine_basics] · Politics of Search Engines [http://epl.scu.edu/~stsvalues/readings/ShapingTheWeb.pdf] PageRank [http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf] · robots.txt [http://en.wikipedia.org/wiki/Robots_exclusion_standard] · Site Map [http://en.wikipedia.org/wiki/Site_map] · nofollow [http://en.wikipedia.org/wiki/Nofollow] · Google Insights [http://www.google.com/insights/search/#]
2011-03-30 Web Intermediaries: Until now we have discussed the Web in terms of interactions between Web servers (where content is published) and Web browsers (where content is displayed). In actuality, things are a bit more complicated than that. There are many different kinds of Web intermediaries that may occupy the path between where content originates and where it is consumed. These intermediaries can provide a number of services, from improving performance to filtering content to protecting privacy. Intermediaries Web Intermediaries [http://www.almaden.ibm.com/cs/wbi/papers/www7/Intermediaries.pdf] · The Great Firewall [http://www.theatlantic.com/magazine/archive/2008/03/-ldquo-the-connection-has-been-reset-rdquo/6650/] Proxy auto-config [http://en.wikipedia.org/wiki/Proxy_auto-config] · Content-control software [http://en.wikipedia.org/wiki/Content-control_software] · Web cache [http://en.wikipedia.org/wiki/Web_cache] · Anonymizer [http://en.wikipedia.org/wiki/Anonymizer] · CDN [http://en.wikipedia.org/wiki/Content_delivery_network]
2011-04-04 Content Syndication (Atom & RSS): For many information sources on the Web, it is useful to have some standardized way of subscribing to information updates. Syndication formats such as RSS and Atom can be used by these information sources to publish a feed of updated information items. Feeds can be read directly in a browser, but in most cases they are read by specialized software: either a feed reader that allows users to subscribe to more than one feed and manage the information received through all these feeds, or some software module that reads feeds and embeds them, for example, in a Web page. This latter example is the classical usage of feeds: news feeds published by news agencies and then embedded as news tickers into Web pages as a constantly updated source of information. Syndication History [http://en.wikipedia.org/wiki/History_of_web_syndication_technology] Wikipedia (Syndication) [http://en.wikipedia.org/wiki/Web_syndication] · Wikipedia (Feeds) [http://en.wikipedia.org/wiki/Web_feed] · Podcast Spec [http://www.apple.com/itunes/whatson/podcasts/specs.html]
2011-04-06 Third-party Content: HTML pages served by one web server can "host" content from a 3rd-party web server. That functionality is basic to the Web, but it has only really been exploited in recent years. In this lecture we'll look at methods for including 3rd-party content in a web page, and some common application patterns that use these methods. Third-Party Widgets, badges, and gadgets [http://blogs.zdnet.com/Hinchcliffe/?p=80] Transclusion [http://en.wikipedia.org/wiki/Transclusion] · Widget landscape [http://www.w3.org/TR/widgets-land/] · Widget Marketing [http://www.startup-review.com/blog/youtube-case-study-widget-marketing-comes-of-age.php]
2011-04-11 Semantic Web, Linked Data & Microformats: HTML pages are for human users and describe a resource in structural terms (headings, lists, tables, …). For machine-based interaction, it is often useful to have more information about the meanings of application concepts. The Semantic Web is a research program and set of standards for trying to specify these meanings. One goal is to enable data published on the Web to be easily interlinked. Semantics Wikipedia [http://en.wikipedia.org/wiki/Semantic_Web] FAQ [http://www.w3.org/2001/sw/SW-FAQ] · Why Semantics? [http://proquest.safaribooksonline.com/9780596802141/3] · Microformats [http://microformats.org/about] · Linked Data [http://linkeddata.org/] · The Next Web? [http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html]
2011-04-13 Mobile Web: A mobile device is a computer you carry around with you. Today, any mobile device has some capability to access the Web, and a quickly growing percentage of all Web usage is done on mobile devices. Clearly, web content needs to be adapted for the wide variety of sizes, shapes, and functionality of mobile devices. More importantly, however, mobile devices open new possibilities for adapting content based on a user's context. Mobile Mobile Web [http://en.wikipedia.org/wiki/Mobile_Web] · Context [http://proquest.safaribooksonline.com/9780596806231/45] Mobile Web Trends [http://www.quantcast.com/docs/display/info/Mobile+Report] · W3C Mobile Web [http://www.w3.org/Mobile/] · Geolocation in Firefox [http://www.mozilla.com/en-US/firefox/geolocation/] · Geolocation Privacy Issues [http://escholarship.org/uc/item/0rp834wf]
2011-04-18 Real-time Web: The Real-time Web is a collection of technologies that allow immediate and direct web publishing. Real time is perhaps the newest evolution in web technology, but in many ways it is more hype than reality. The Real-time Web has attracted the attention of marketers and of those interested in finding new ways to target ads and more actionable user behavior. Real-time Real-time Web [http://en.wikipedia.org/wiki/Real-time_web] Intro to the Real-Time Web [http://www.readwriteweb.com/archives/introduction_to_the_real_time_web.php] · Betting on the Real-Time Web [http://www.businessweek.com/magazine/content/09_33/b4143046834887.htm] · The Realtime Chronicles [http://www.roughtype.com/archives/2009/02/the_free_arts_a.php]
2011-04-20 Security & Privacy: TCP and thus HTTP are clear-text protocols, which make no attempt to hide the data being transmitted. Secure data transfers therefore require additional technologies. For the Web, the most interesting security feature is secure HTTP interaction, provided by HTTP over SSL (HTTPS), a protocol that places an encryption layer (SSL or TLS) between TCP and HTTP. For any task involving personalization and/or trust, it is necessary to have not only a concept for providing privacy, but also concepts for identity and for how to prove identity, which requires authentication. Security Security [http://en.wikipedia.org/wiki/Internet_security] · Privacy [http://en.wikipedia.org/wiki/Internet_privacy] · Browser Security [http://cacm.acm.org/magazines/2009/8/34494-browser-security/fulltext] Browser Options [http://support.mozilla.com/en-US/kb/Options+window] · HTTPS [http://en.wikipedia.org/wiki/Https] · HTTPS Spec [http://tools.ietf.org/html/rfc2818]
2011-04-25 Guest Lecture by Ashwin Jacob Mathew [http://www.ischool.berkeley.edu/people/students/ashwinmathew] : Openness & Transparency: It has become commonplace to praise the Web as "open" or to hear impassioned defenses of "the Open Web." But what does it mean for an information system to be open? Today we will examine some varying definitions of openness, and guest speaker Ashwin Jacob Mathew [http://www.ischool.berkeley.edu/people/students/ashwinmathew] will present some questions about "openness" raised by his study of the Internet Working Protocols [./img/ashwin.pdf] (slide presentation). Openness Openness in Communication [http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1367/1286] · Meaning of Open [http://googleblog.blogspot.com/2009/12/meaning-of-open.html] · Against Transparency [http://www.tnr.com/article/books-and-arts/against-transparency] Open Web Foundation [http://openwebfoundation.org/] · Creative Commons [http://creativecommons.org/] · Open Access [http://www.earlham.edu/~peters/fos/overview.htm] · Sunlight Foundation [http://www.sunlightfoundation.com/] · Data.gov [http://www.data.gov/] · Public Knowledge [http://www.publicknowledge.org/] · Open Knowledge [http://www.okfn.org/]
2011-04-27 Course Recap: Recap
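
Illustration (not part of the official course materials): the 2011-02-16 and 2011-02-28 sessions describe how a URI identifies a resource and how a media type says what kind of representation it is. The following is a minimal Python sketch of those two ideas, using only the standard library. The function name `describe_uri` and the example address on `www.example.org` are hypothetical and chosen purely for illustration; guessing a media type from a file extension is only a stand-in for what a real server declares in its HTTP `Content-Type` header.

```python
from urllib.parse import urlsplit
import mimetypes


def describe_uri(uri: str) -> dict:
    """Split a URI into its generic components and guess a media type.

    Hypothetical helper for illustration only. The media type is guessed
    from the path's file extension; on the Web the authoritative value
    comes from the server's Content-Type response header.
    """
    parts = urlsplit(uri)
    media_type, _encoding = mimetypes.guess_type(parts.path)
    return {
        "scheme": parts.scheme,      # which protocol to use, e.g. http
        "host": parts.hostname,      # the name that DNS resolves
        "path": parts.path,          # identifies the resource on that host
        "query": parts.query,        # optional parameters
        "fragment": parts.fragment,  # a part of the representation
        "media_type": media_type,    # e.g. text/html, or None if unknown
    }


if __name__ == "__main__":
    # Made-up example URI; any http URI would work the same way.
    info = describe_uri("http://www.example.org/notes/lecture.html?week=5#summary")
    for name, value in info.items():
        print(f"{name}: {value}")
```

Note that the fragment is handled entirely by the client: it is not sent to the server as part of an HTTP request, which is why it appears as a separate component here.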
Creative Commons License Please send comments to dilanm@ischool.berkeley.edu
Last modification on Tuesday, 01-Mar-2011 16:43:34 PST