URIs & HTTP

Web Architecture
Fall 2011 — INFO 253 (CCN 42598)

Dilan Mahendran, UC Berkeley School of Information
2011-10-11


This work is licensed under a Creative Commons Attribution 3.0 Unported License [http://creativecommons.org/licenses/by/3.0/]

Contents D. Mahendran: URIs & HTTP

Contents


(2) Abstract

The Web's architecture rests on a few simple principles: a consistent, global identification mechanism for resources; a standardized way to retrieve resource representations; and standardized media types that make those representations usable. Built on top of the Internet, the Web's transport protocol transmits representations of resources identified by Uniform Resource Identifiers (URIs) between Web servers and clients. The most important protocol for data transfer on the Web is the Hypertext Transfer Protocol (HTTP).



Files, Filesystems, & Directory Structure

Outline (Files, Filesystems, & Directory Structure)

  1. Files, Filesystems, & Directory Structure [4]
  2. File Paths [4]
  3. Web Servers [3]
  4. Uniform Resource Identifier (URI) [7]
  5. Hypertext Transfer Protocol (HTTP) [14]
    1. HTTP Basics [7]
    2. HTTP Authentication [5]

(4) Files




(5) Filesystems and Directory Structure




(6) Graphical View of Directory Structure

Graphical view of directory structure



(7) Textual View of Directory Structure

Textual view of directory structure


File Paths


(9) File Paths




(10) Absolute vs. Relative File Paths




(11) Absolute vs. Relative File Paths




(12) Why Relative Paths?



Web Servers


(14) Web Servers




(15) URL Paths




(16) URL Paths vs. File Paths




(17) Web Server Service



Uniform Resource Identifier (URI)


(19) Resource Identification

Global naming leads to global network effects... the value of an identifier increases the more it is used consistently

Architecture of the World Wide Web, Volume One [http://www.w3.org/TR/webarch/]




(20) URIs & Resources




(21) URIs & Resources




(22) URI Schemes

URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
http://courses.ischool.berkeley.edu/i253/f11/geolocation#(4)
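As a quick illustration of the generic syntax, Python's standard `urllib.parse.urlsplit` breaks the example URI above into its components. Python is used here only as a convenient notation; the URI is the one from the slide.

```python
from urllib.parse import urlsplit

# The example URI from the slide, split along the generic syntax
# scheme ":" hier-part [ "?" query ] [ "#" fragment ]
uri = "http://courses.ischool.berkeley.edu/i253/f11/geolocation#(4)"
parts = urlsplit(uri)

print(parts.scheme)       # http
print(parts.netloc)       # courses.ischool.berkeley.edu  (authority in hier-part)
print(parts.path)         # /i253/f11/geolocation          (path in hier-part)
print(repr(parts.query))  # ''  (this URI has no query)
print(parts.fragment)     # (4)
```

The fragment `(4)` is interpreted by the client and is not sent to the server as part of the HTTP request.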



(23) Resources & Representations




(24) 1 Resource, 2 Representations

img/google-representations-1.png



(25) 2 Resources, 1 Representation

img/google-representations-2.png


Hypertext Transfer Protocol (HTTP)


(27) DNS & HTTP

The two basic protocols which every Web browser must implement are DNS [Internet Architecture; Domain Name System (DNS) (1)] access and HTTP [Hypertext Transfer Protocol (HTTP) (1)]. However, most operating systems provide an API for DNS access, so the browser can use this service locally and only has to implement HTTP. TCP [Internet Architecture; Transmission Control Protocol (TCP) (1)] (which is required as the foundation for HTTP) is usually provided by the operating system.
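A minimal sketch of the "use the operating system's DNS API" step, assuming Python's `socket.getaddrinfo` as the interface to the OS resolver. The function name `resolve` is hypothetical.

```python
import socket

def resolve(host: str, port: int = 80) -> list:
    """Ask the operating system's resolver for the IP addresses of host,
    as a browser does before opening the TCP connection that carries HTTP."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address. Keep the resolver's order, drop duplicates.
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

Calling `resolve("www.berkeley.edu")` needs network access. A numeric address such as `"127.0.0.1"` comes back unchanged without a DNS lookup.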

browser-dns-http.png


(28) The Web's Protocol

internet-traffic-trends.png

provided by CacheLogic Inc. [http://www.cachelogic.com/]



HTTP Basics


(30) HTTP Messages

  • HTTP needs a reliable connection
    • the foundation for HTTP is the Transmission Control Protocol (TCP) [Internet Architecture; Transmission Control Protocol (TCP) (1)]
    • DNS resolution yields an IP address
    • open a TCP connection to port 80 or to the port specified in the URI (http://rosetta.sims.berkeley.edu:8085/)
  • HTTP is a text-based protocol
    • the connection is used to transmit text messages
    • all HTTP messages are human-readable (not all entities, though)
    • basic HTTP operations can be carried out by hand
start-line
message-header *

message-body ?
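Because HTTP messages are plain text, a request can be assembled as a string. The sketch below uses a hypothetical helper, `build_request`, and assumes HTTP/1.1. It produces the bytes one would type by hand: CRLF line endings, then the empty line that ends the header section, then the optional body.

```python
from typing import Dict, Optional

def build_request(method: str, target: str, host: str,
                  headers: Optional[Dict[str, str]] = None,
                  body: bytes = b"") -> bytes:
    """Assemble an HTTP/1.1 request: start-line, header fields,
    an empty line, then the optional message body."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Tell the receiver where the body ends.
        lines.append(f"Content-Length: {len(body)}")
    # Each header line ends with CRLF; an extra CRLF marks the empty line.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

print(build_request("GET", "/", "ischool.berkeley.edu").decode("ascii"))
```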



(31) HTTP Header Fields

  • Header fields contain information about the message
    • general header: Date as the message origination date
    • request header: Accept-Language indicates language preferences
    • response header: Server contains system information
    • entity header: Content-Type specifies the media type of the entity
  • HTTP defines a number of header fields [http://www.cs.tut.fi/~jkorpela/http.html]
    • unknown fields should be ignored by recipients and passed on by proxies (extensibility)
    • by convention, unstandardized fields use an X- prefix (RFC 6648 deprecated this convention in 2012)
  • HTTP is about acting on these fields
    • HTTP defines what HTTP implementations must or should do
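A minimal sketch of how an implementation might read header fields and act only on the ones it understands. Field names are case-insensitive, so they are normalized to lower case. The set `KNOWN_FIELDS` and both function names are hypothetical. Folding repeated fields into one comma-separated value is a simplification that does not hold for every field (Set-Cookie is a known exception).

```python
from typing import Dict

# Hypothetical subset of fields this implementation knows how to act on.
KNOWN_FIELDS = {"date", "accept-language", "server", "content-type", "host"}

def parse_header_fields(block: str) -> Dict[str, str]:
    """Parse CRLF-separated 'Name: value' lines into a dict keyed by the
    lower-cased field name."""
    fields: Dict[str, str] = {}
    for line in block.split("\r\n"):
        if not line:
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        key, value = name.strip().lower(), value.strip()
        # Repeated fields are folded into one comma-separated value.
        fields[key] = f"{fields[key]}, {value}" if key in fields else value
    return fields

def understood_fields(fields: Dict[str, str]) -> Dict[str, str]:
    """Keep the fields we act on; unknown ones (e.g. X-...) are ignored."""
    return {k: v for k, v in fields.items() if k in KNOWN_FIELDS}
```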



(32) HTTP Requests

  • After opening a connection, the client sends a request
    • the method indicates the action to be performed on the resource
    • HTTP's most interesting methods are: GET, HEAD, POST
    • other interesting methods are: PUT, DELETE
  • The URI identifies the resource to which the request should be applied
    • absolute URIs are required when contacting proxies
    • absolute paths are required when contacting a server directly
    • the URI may contain query information
  • The Host header field must be included in every HTTP/1.1 request
wget -Sv www.berkeley.edu
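The difference between the two forms of the request target can be sketched as follows, assuming a GET and a hypothetical helper `request_head`. A proxy receives the absolute URI. A server contacted directly receives only the absolute path plus any query, and learns the host from the mandatory Host header.

```python
from urllib.parse import urlsplit

def request_head(uri: str, via_proxy: bool = False) -> str:
    """Request line plus Host header for a GET of uri (sketch: http only,
    no IPv6 literals, no userinfo)."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    if parts.port is not None:
        host += f":{parts.port}"
    if via_proxy:
        # The proxy needs the absolute URI to know which server to contact.
        target = uri.split("#", 1)[0]
    else:
        # The origin server only needs the absolute path (and any query).
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    # Fragments are never sent; they are interpreted by the client.
    return f"GET {target} HTTP/1.1\r\nHost: {host}\r\n\r\n"
```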



(33) HTTP GET

  • Retrieval action based on the URI
    • may be implemented by reading a file
    • may be implemented by processing a file (PHP)
    • may be implemented by invoking a process
  • Semantics may change based on header fields
    • If-*: only reply with the entity if necessary
    • Range: only reply with the requested part of the entity
  • Cacheability depends on header fields of the response
GET / HTTP/1.1
Host: ischool.berkeley.edu
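How a header field changes GET semantics can be illustrated with `Range`. The sketch below uses a hypothetical `apply_range` that handles only a single `bytes=` range. It returns the status code and body a server would send. A real server also sends a `Content-Range` header with a 206 response, which is omitted here.

```python
import re
from typing import Optional, Tuple

SINGLE_RANGE = re.compile(r"bytes=(\d*)-(\d*)")

def apply_range(entity: bytes, range_header: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, body) for a GET of entity, honoring one byte range."""
    if range_header is None:
        return 200, entity
    m = SINGLE_RANGE.fullmatch(range_header.strip())
    if m is None or m.group(1) == m.group(2) == "":
        return 200, entity                 # unsupported or invalid: ignore Range
    first, last = m.group(1), m.group(2)
    size = len(entity)
    if first == "":                        # suffix range "bytes=-N": last N bytes
        n = int(last)
        return (206, entity[-n:]) if n > 0 and size > 0 else (416, b"")
    start = int(first)
    if last and int(last) < start:
        return 200, entity                 # syntactically invalid: ignore Range
    if start >= size:
        return 416, b""                    # Range Not Satisfiable
    end = min(int(last), size - 1) if last else size - 1
    return 206, entity[start:end + 1]      # Partial Content
```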



(34) HTTP Responses

  • The server's response reports the result of interpreting a request
    • the status code is given numerically and as text
    • 2xx for variations of OK
    • 3xx for redirections
    • 4xx for client-side errors (404: Not Found)
    • 5xx for server-side errors
  • Header fields specify additional information
    • information about the server
    • information about the entity (media type, encoding, language)
HTTP/Major.Minor Status-Code Text
[Header]*

[Entity]?
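Reading a response mirrors writing a request. The sketch below uses hypothetical `parse_response` and `status_class` functions. It does not handle chunked encoding, it treats everything after the empty line as the entity, and later duplicate header fields overwrite earlier ones. It extracts the status line, header fields, and entity, and maps the status code to its class by its first digit.

```python
from typing import Dict, Tuple

STATUS_CLASSES = {1: "informational", 2: "success", 3: "redirection",
                  4: "client error", 5: "server error"}

def status_class(code: int) -> str:
    """Classify a status code by its first digit (2xx, 3xx, ...)."""
    return STATUS_CLASSES.get(code // 100, "unknown")

def parse_response(raw: bytes) -> Tuple[str, int, str, Dict[str, str], bytes]:
    """Split a raw response into version, status code, reason text,
    header fields (lower-cased names) and entity body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, *reason = status_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), (reason[0] if reason else ""), headers, body
```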



(35) HTTP Performance

  • HTTP/1.0 allowed one transaction per connection
    • TCP connection setup and teardown are expensive
    • TCP's slow start slows down the initial phase of data transfer
    • typical Web pages use 10 to 20 resources (HTML + images + CSS + scripts)
    • typically, these resources are stored on the same server
  • HTTP/1.1 introduces persistent connections
    • the TCP connection stays open for some time (10 sec is a popular choice)
    • additional requests to the same server use the same TCP connection
  • HTTP/1.1 also introduces pipelining
    • instead of waiting for a response, requests can be queued
    • the server responds as fast as possible
    • responses must come back in request order (there is no sequence number to match them up)
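The cost argument above can be made concrete with a rough round-trip count. This is a hypothetical back-of-the-envelope model, not a measurement. It assumes one RTT per TCP handshake, one RTT per request/response, a single connection, no slow start, and no transfer time. It also assumes embedded resources become known only after the HTML page has arrived.

```python
def round_trips(n_resources: int, mode: str) -> int:
    """Rough number of round trips (RTTs) to load a page of n_resources
    (the HTML page plus embedded resources) from one server, under the
    simplified model described above."""
    if n_resources < 1:
        raise ValueError("a page has at least one resource")
    if mode == "http/1.0":      # new TCP connection for every resource
        return 2 * n_resources
    if mode == "persistent":    # one handshake, then one RTT per request
        return 1 + n_resources
    if mode == "pipelined":     # handshake, HTML, then all other requests at once
        return 2 + (1 if n_resources > 1 else 0)
    raise ValueError(f"unknown mode: {mode!r}")

for mode in ("http/1.0", "persistent", "pipelined"):
    print(mode, round_trips(15, mode))
```

For a page with 15 resources the model gives 30, 16, and 3 round trips. Real browsers also open several connections in parallel and are slowed down by TCP slow start, and this model captures neither effect.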



(36) HTTP Connection Handling

http-phttp-pipelining.png

HTTP Authentication


(38) HTTP Access Control

  • HTTP servers can deny access [http://en.wikipedia.org/wiki/List_of_HTTP_status_codes#4xx_Client_Error] because of access control
    • 401 Unauthorized means the resource is access controlled
    • 403 Forbidden means the server refuses the request; authenticating will not help
    • 405 Method Not Allowed signals a request using the wrong request method [HTTP Requests (1)]
  • Two different approaches to unauthorized access are possible
    • repeat the HTTP request with the proper authentication credentials
    • redirect to a Login Page [Login Page (1)] and establish an authenticated session
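Repeating the request with credentials can be sketched for the Basic scheme, where the Authorization header carries `base64(user-id ":" password)`. Base64 is an encoding, not encryption, so anyone who sees the request can read Basic credentials. The function names are hypothetical. Encoding the credentials as UTF-8 is an assumption, because the original Basic specification does not fix a character set.

```python
import base64
from typing import Dict

def basic_authorization(user_id: str, password: str) -> str:
    """Value of the Authorization header for the Basic scheme."""
    # Credentials are encoded as UTF-8 here (an assumption, see above).
    token = base64.b64encode(f"{user_id}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def should_retry_with_credentials(status: int, headers: Dict[str, str]) -> bool:
    """401 plus a WWW-Authenticate challenge means: repeat the request with
    credentials. 403 does not invite a retry."""
    return status == 401 and "www-authenticate" in headers
```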



(39) HTTP Authentication

HTTP Authentication


(40) Basic HTTP Authentication




(41) Repeated Access

  • Clients typically access more than one protected resource
    • a perfectly stateless client would always request authentication from the user
    • using the realm clients can identify repeated accesses
  • Web interactions by default are perfectly stateless
    • each request is completely independent from other requests
    • stateless interactions make the Web loosely coupled and scalable
    • concepts like the realm or Cookies [Web Storage; Cookies (1)] introduce state
  • Clients remember the authentication and replay it automatically
    • browsers provide little control over this feature
    • logging out of HTTP authenticated sessions is hard
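How a client can use the realm to recognize repeated accesses and replay credentials can be sketched with a small cache keyed by (host, realm). The class `CredentialCache` and its methods are hypothetical, and the realm is extracted with a simplified regular expression. The `forget` method is exactly the "log out" control that browsers typically do not expose.

```python
import re
from typing import Dict, Optional, Tuple

REALM = re.compile(r'realm="([^"]*)"')

class CredentialCache:
    """Hypothetical client-side store that remembers credentials per
    (host, realm) and replays them on later challenges for the same realm."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _realm(challenge: str) -> str:
        m = REALM.search(challenge)
        return m.group(1) if m else ""

    def remember(self, host: str, challenge: str, authorization: str) -> None:
        self._store[(host, self._realm(challenge))] = authorization

    def lookup(self, host: str, challenge: str) -> Optional[str]:
        """Credentials to replay without asking the user, if any."""
        return self._store.get((host, self._realm(challenge)))

    def forget(self, host: str) -> None:
        """'Log out' of every realm on host by dropping its credentials."""
        for key in [k for k in self._store if k[0] == host]:
            del self._store[key]
```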



(42) Login Page

  • Basic HTTP Authentication [Basic HTTP Authentication (1)] relies on browser controls (including the browser's login window)
    • no possibility to log out without using browser-specific controls
    • client side security depends on browser security measures
  • Using forms gives more freedom in session management
    • authentication and authorization are completely application-based
    • if there were secure personal browsers this would not work very well
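With a login form, the application itself tracks who is logged in, typically by giving the browser a random session token (for instance in a cookie) and looking it up on each request. Below is a minimal in-memory sketch with hypothetical names. It stores plain-text passwords only to stay short; a real application would store password hashes.

```python
import secrets
from typing import Dict, Optional

class SessionStore:
    """Hypothetical server-side session table for form-based login."""

    def __init__(self, users: Dict[str, str]) -> None:
        self._users = users                   # user -> password (sketch only)
        self._sessions: Dict[str, str] = {}   # session token -> user

    def login(self, user: str, password: str) -> Optional[str]:
        stored = self._users.get(user)
        if stored is None or not secrets.compare_digest(
                stored.encode("utf-8"), password.encode("utf-8")):
            return None
        token = secrets.token_urlsafe(32)     # unguessable session identifier
        self._sessions[token] = user
        return token

    def user_for(self, token: str) -> Optional[str]:
        return self._sessions.get(token)

    def logout(self, token: str) -> None:
        # Unlike Basic authentication, the application can end the session.
        self._sessions.pop(token, None)
```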



(43) Conclusions


