(2) Abstract
This lecture looks at Web browsers and how they work. It introduces the basic functions of a browser: retrieval and rendering of Web pages. Any modern browser needs to support more than just HTTP and HTML; it must support CSS for stylesheets, JavaScript for scripted Web pages, various image formats, and popular plug-ins such as Flash. In addition, browsers can support further functionality such as off-line operation or, more generally, application-oriented platforms such as AIR or Silverlight.
Browser Basics
(4) What is a Web Browser?
- Network access (HTTP, HTTPS, FTP, file system, …)
- Rendering HTML layout (a subset of CSS layout)
- CSS specifies many more features
- Handling special HTML in the required way
- images (in various formats) must be downloaded and embedded
- forms must be rendered and form data must be submitted
- Running scripts and providing them access to the page
- re-rendering when scripts change the page (DHTML)
- providing scripts with network access (Ajax)
- Utility functions to make the browser more usable
- tabs and bookmarks for more organized browsing
- security policies for safer browsing
- additional content types may be supported (by external software)
- the browser may be extended (add-ons)
(5) One Minute in the Life of a Browser
- Analyze URI and connect to server to retrieve resource
- recursively repeat until all required resources are retrieved
- Analyze HTML, correct errors, and compute a DOM tree
- DOM is a memory representation of the HTML markup
- Apply CSS and compute the layout of the styled DOM tree
- compute CSS decorated DOM and apply formatting algorithm to it
- Start executing scripting [Scripting] code and change the DOM as required
- scripting may have initial phase and user interaction phase
- Continue executing scripting code in response to user interactions
- for many dynamic Web pages, this is a continuous activity
- If the user clicks on a link, start all over again
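
The "analyze HTML, correct errors, and compute a DOM tree" step can be illustrated with a small sketch. It uses Python's standard html.parser module; the Node class, the TreeBuilder class, and the build_dom and outline helpers are hypothetical names introduced only for this illustration, and the error correction shown (closing unclosed elements, ignoring stray end tags) is far simpler than what a real browser does.

```python
from html.parser import HTMLParser

# A few elements that never have a closing tag in HTML.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """A hypothetical, minimal DOM node: a tag name, text, and children."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree and tolerates unclosed or stray tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_ELEMENTS:
            self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this name, implicitly
        # closing anything left open below it. Ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def build_dom(html):
    """Parse markup (possibly malformed) and return the document root."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as a list of tag names indented by depth."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Calling outline(build_dom("<html><body><p>Hi<br><b>bold</p></body></html>")) yields a well-formed tree even though the b element is never closed in the markup.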
(6) Browsers, Apps, Operating Systems
- Traditionally, a browser is an application for an OS
- loading page descriptions and rendering the pages
- the first browsers did not execute any code
- Scripting [Scripting] and Plug-ins changed the browser into a runtime platform
- browsers are code (the browser code) executing downloaded code
- Browsers are becoming increasingly feature-rich
- XMLHttpRequest allows script/server communications
- HTML5 [http://www.w3.org/html/wg/] adds WebStorage, WebSockets, and more OS-like features
- Browsers could become the only app to run on hardware
- Google's Chrome OS [http://googleblog.blogspot.com/2009/07/introducing-google-chrome-os.html] may become the first "browser OS"
- Palm is going a different route with WebOS [http://en.wikipedia.org/wiki/WebOS] (JavaScript apps)
- WebOS and ChromeOS are essentially the same (rich JavaScript runtimes)
(7) Browser Usage (Fall 2009)
- Internet Explorer (69.80%)
- Mozilla Firefox (20.66%)
- Safari (7.18%)
- Chrome (0.87%)
- Opera (0.72%)
- Netscape (0.52%)
- Other (0.25%)
(8) Browsers and CSS
- Browsers have their own built-in CSS code
- HTML pages with no CSS are still formatted in some way
- HTML pages can provide their own CSS to change defaults
- users can change the browser's default to their own preferences
- CSS has a cascading order [http://www.w3.org/TR/CSS21/cascade.html#cascading-order], listed here from lowest to highest precedence
- browser defaults
- user declarations
- page declarations
- page !important [http://www.w3.org/TR/CSS21/cascade.html#important-rules] declarations
- user !important [http://www.w3.org/TR/CSS21/cascade.html#important-rules] declarations
- Rendering of HTML/CSS depends on a variety of factors
- default settings of the browser
- preferences set by the user
- CSS code provided by the page author
- HTML/CSS capabilities of the browser
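
The cascading order listed above can be sketched as a small lookup. This is a simplified illustration, not a CSS implementation: the PRECEDENCE table and the resolve function are hypothetical names, and the sketch ignores selector specificity entirely, letting the later declaration win when two declarations have the same origin and importance.

```python
# Origins in increasing order of precedence, following the list above.
# Each entry is (origin, is_important).
PRECEDENCE = [
    ("browser", False),
    ("user", False),
    ("page", False),
    ("page", True),
    ("user", True),
]


def resolve(declarations, prop):
    """Return the winning value for prop, or None if nothing declares it.

    declarations is a list of (origin, important, property, value) tuples.
    Assumption: specificity is ignored; on equal rank the later one wins.
    """
    best_rank, best_value = -1, None
    for origin, important, name, value in declarations:
        if name != prop:
            continue
        rank = PRECEDENCE.index((origin, important))
        if rank >= best_rank:
            best_rank, best_value = rank, value
    return best_value
```

With a browser default of black, a user preference of gray, and a page rule of navy, the page rule wins; adding a user !important declaration overrides all of them.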
(9) Browsers and the Internet
Before retrieving the Web page [http://www.berkeley.edu/], the browser first has to find out the IP [Internet Architecture; Internet Protocol (IP) (1)] address of the www.berkeley.edu server. Using this address, it can then open an HTTP [URIs & HTTP; Hypertext Transfer Protocol (HTTP) (1)] connection. The lookup service used by the browser is the Domain Name System (DNS) [Internet Architecture; Domain Name System (DNS) (1)].
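
The name lookup that precedes the HTTP connection can be sketched with Python's standard socket module. The function name resolve_host and its resolver parameter are assumptions made for this illustration; the parameter exists only so the lookup can be swapped out where no network is available. A real browser also caches lookups and may try several addresses.

```python
import socket


def resolve_host(hostname, port=80, resolver=socket.getaddrinfo):
    """Return the distinct IP addresses for hostname, in resolver order."""
    addresses = []
    # Each result is (family, type, proto, canonname, sockaddr); the IP
    # address is the first element of sockaddr for both IPv4 and IPv6.
    for _family, _type, _proto, _canon, sockaddr in resolver(
        hostname, port, type=socket.SOCK_STREAM
    ):
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

With network access, resolve_host("www.berkeley.edu") returns whatever addresses the local DNS resolver currently reports for that name.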
(10) Supported URI Schemes
- Most Web pages are available over HTTP [URIs & HTTP; Hypertext Transfer Protocol (HTTP) (1)]
- one popular exception is pages available over HTTPS [Security & Privacy; HTTP over SSL (HTTPS) (1)]
- Most browsers support more than just the HTTP and HTTPS URI Schemes [URIs & HTTP; URI Schemes (1)]
- http: and https: are necessary (these are the Web protocols)
- file: [http://en.wikipedia.org/wiki/File_URI_scheme] allows the browser to load local files
- ftp: is useful because many documents are available on FTP servers
- mailto: usually is not built into the browser (the mail tool is started)
- tel: is a useful scheme for devices with telephone functionality
- sms: is another useful scheme for devices with telephone functionality
- Firefox 3 allows Web-based protocol handlers [https://developer.mozilla.org/en/Web-based_protocol_handlers]
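
How a browser might decide what to do with a URI based on its scheme can be sketched as follows. The BUILT_IN and EXTERNAL sets and the classify function are hypothetical; which schemes a given browser handles itself and which it hands to another application varies by browser and platform.

```python
from urllib.parse import urlsplit

# Assumed split for illustration only: schemes the browser handles itself
# versus schemes handed to another application (mail tool, dialer, ...).
BUILT_IN = {"http", "https", "file", "ftp"}
EXTERNAL = {"mailto", "tel", "sms"}


def classify(uri):
    """Return (scheme, handler) for a URI, based on its scheme alone."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme in BUILT_IN:
        return (scheme, "browser")
    if scheme in EXTERNAL:
        return (scheme, "external application")
    return (scheme, "unsupported")
```

For example, classify("mailto:someone@example.org") reports that the URI goes to an external application, while an http: URI stays in the browser.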
(11) Caching
- Browsers retrieve resources for rendering Web pages
- In a typical user session, many resources are used repeatedly
- using the browser's "back" button
- accessing pages reusing the same CSS or images
- Caching is a frequently used optimization in computer systems
- store retrieved data locally
- reuse that data when it is used again instead of fetching it again
- the hard (and important) part is cache invalidation
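
The store, reuse, and invalidate steps above can be sketched as a tiny in-memory cache. ResourceCache, its max_age parameter, and the injected fetch and clock functions are all hypothetical names for this illustration. Real browser caches follow the freshness and validation rules carried in HTTP headers, which this sketch replaces with a single fixed lifetime.

```python
import time


class ResourceCache:
    """Hypothetical in-memory cache keyed by URI with a fixed lifetime."""

    def __init__(self, fetch, max_age=60.0, clock=time.monotonic):
        self._fetch = fetch      # function taking a URI, returning bytes
        self._max_age = max_age  # seconds an entry is considered fresh
        self._clock = clock      # injectable so expiry can be simulated
        self._entries = {}       # uri -> (stored_at, body)

    def get(self, uri):
        """Return the body for uri, fetching only if missing or stale."""
        now = self._clock()
        entry = self._entries.get(uri)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]      # still fresh: reuse the stored copy
        body = self._fetch(uri)
        self._entries[uri] = (now, body)
        return body

    def invalidate(self, uri):
        """Drop a stored entry so the next get() fetches it again."""
        self._entries.pop(uri, None)
```

The hard part named above shows up in the choice of max_age: too long and the user sees outdated pages, too short and the cache saves little.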
(12) Security and Privacy
- Browsers store a lot of security-sensitive data
- data entered in forms is stored for future visits
- authentication credentials (Cookies [Web Storage; Cookies (1)]) are stored on behalf of servers
- the browsing history of visited pages is stored
- passwords are stored in password managers
- Connecting to HTTPS Web sites requires a certificate validity check
- browsers come with a large set of pre-installed certification authorities
- users implicitly trust this list of pre-installed authorities
- Browsers provide control over these features in complicated settings
- Browsers are starting to provide more user-friendly "private modes"
- Safari calls the feature private browsing
- IE8 has an InPrivate mode [http://www.microsoft.com/windows/internet-explorer/beta/features/browse-privately.aspx]
- Firefox 3.1 includes Private Browsing
- Security/Privacy (as always) is a trade-off with convenience
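
The certificate validity check for HTTPS sites, and the implicit trust in pre-installed certification authorities, can be sketched with Python's standard ssl module. The function names make_verifying_context and check_certificate are hypothetical. The sketch relies on the authorities installed on the local platform, much as a browser relies on its own bundled list, and it does not cover revocation checks or user-added exceptions.

```python
import socket
import ssl


def make_verifying_context():
    """Return a TLS context that trusts the platform's pre-installed CAs.

    ssl.create_default_context() requires a valid certificate chain and
    a certificate that matches the host name.
    """
    return ssl.create_default_context()


def check_certificate(hostname, port=443, timeout=5.0):
    """Return True if the server's certificate validates, else False.

    Network failures (DNS errors, refused connections, timeouts) are not
    certificate problems and are left to propagate as OSError.
    """
    context = make_verifying_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except ssl.SSLCertVerificationError:
        return False
```

The user never chooses the trusted authorities here; they come with the platform, which is the implicit trust the list above describes.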
(13) Browsers and Scripting
- Scripting [Scripting] is essential for most modern Web pages
- well-designed Web pages also work when scripting is turned off
- many Web pages are not designed all that well
- when scripting is turned on, behavior should be predictable and consistent
- Scripting problems plagued Web developers for a long time
- major parts of Web development go into ensuring compatibility
- ill-behaving browsers (such as IE) make it impossible to develop simple code
- JavaScript Frameworks [Scripting; JavaScript Frameworks (1)] provide "compatibility layers" on top of browsers
- Browsers can morph into "runtime environments"
- using Google Docs [http://docs.google.com/] has little to do with Web browsing
- some essential features are missing (offline capabilities, local storage)
- HTML5/Chrome is Google's attempt to morph the Web into an application platform