Web Publishing Architecture

Look at the various components of Web publishing, many of which are common to most Web applications.

Overview

HyperText Transfer Protocol (HTTP)

HTTP is a Request/Response Protocol

"HTTP is a protocol with the lightness and speed necessary for a distributed collaborative hypermedia information system. " Tim Berners-Lee, 1992, Basic HTTP

References: HTTP 1.1 Spec

Browser-Server Transaction

User initiates transaction by typing URL in browser or clicking on link.

http://xyz.com/98/document.html

Browser

  1. Uses DNS to locate the server (xyz.com) and makes a connection to port 80 (in a typical configuration) on that machine.
  2. Sends a document request using one of the HTTP methods, typically GET.
    GET /98/document.html HTTP/1.0

Server

  1. Returns status of request.
    HTTP/1.1  200 OK
              404 Not Found
  2. Sends header info followed by a blank line.
    Content-type: text/html
    Content-length: 3896
  3. Sends document or data from a CGI program.
  4. Objects embedded in document such as images generate new requests to the server.
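
The exchange above can be sketched in Python without touching the network. This is only an illustration under stated assumptions: the function names build_get_request and parse_response are hypothetical, the Host header is an addition not shown in the request above, and the sample response text is assembled from the status and header lines listed in the steps.

```python
from urllib.parse import urlsplit


def build_get_request(url, version="HTTP/1.0"):
    """Return (host, port, request_text) for a simple GET of `url`."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80            # port 80 in a typical configuration
    path = parts.path or "/"
    # The Host header is an assumption; it is not part of the example above.
    request = f"GET {path} {version}\r\nHost: {host}\r\n\r\n"
    return host, port, request


def parse_response(raw):
    """Split a raw response into (status_code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")    # blank line ends the headers
    lines = head.split("\r\n")
    _version, code, reason = lines[0].split(None, 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


host, port, request = build_get_request("http://xyz.com/98/document.html")

# Sample response text built from the status and header lines shown above.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-type: text/html\r\n"
    "Content-length: 3896\r\n"
    "\r\n"
    "<html>...</html>"
)
status, reason, headers, body = parse_response(sample)

print(host, port)
print(request.splitlines()[0])
print(status, reason, headers["content-length"])
```

A real browser would send the request text over a TCP connection to the host and port and read the response from the same connection.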

Stateless

HTTP is a stateless protocol, which means that the server does not maintain any information about the transaction that persists throughout a session. In other words, each transaction is independent of the others.

Session tracking requires a server-side application to maintain it. Session tracking is used in shopping cart applications, for instance.
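
One way to picture server-side session tracking is a table keyed by a session ID that the browser sends back with every request. The sketch below assumes an in-memory dictionary; the SessionStore class and the handle_add_to_cart function are hypothetical names, and a real application would persist the data and pass the ID in a cookie or URL.

```python
import secrets


class SessionStore:
    """In-memory session table kept by the server-side application."""

    def __init__(self):
        self._sessions = {}

    def create(self):
        session_id = secrets.token_hex(16)      # hard-to-guess identifier
        self._sessions[session_id] = {"cart": []}
        return session_id

    def get(self, session_id):
        return self._sessions.get(session_id)


store = SessionStore()


def handle_add_to_cart(session_id, item):
    """Each call stands for one independent HTTP request."""
    if session_id is None or store.get(session_id) is None:
        session_id = store.create()             # first visit: start a session
    store.get(session_id)["cart"].append(item)
    return session_id                           # sent back to the browser


sid = handle_add_to_cart(None, "book")          # request 1, no session yet
sid = handle_add_to_cart(sid, "pen")            # request 2, same session ID
cart = store.get(sid)["cart"]
print(cart)
```

HTTP itself remembers nothing between the two calls; the continuity comes entirely from the ID and the table the application keeps.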

The Basic Web Server

The Apache Group, an Open Source software project, has developed the leading Web server, which accounts for over 50% of all servers. Microsoft's IIS and Netscape's server combined don't come close.

Web servers are fairly stable technology.

Reference: Apache.org, Netcraft survey

Webmaster

Site administrator usually takes care of the following server configuration issues:

For Apache, this is the /usr/local/apache/conf directory and the main configuration file is httpd.conf.

Document Mapping

 

URL Management

Decision about URLs:

 

Authentication & Access Control

Authentication is asking a user to provide identification, usually a user name and password. With Apache, Basic Authentication is typically set up through an .htaccess file that points to a password file; more sophisticated applications manage this information in a user database.

Access control is determining which areas of the site can be accessed. You can configure the server to allow or deny access to different individuals or groups of users or IP addresses.
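
The two checks can be sketched separately. This is a simplified illustration, not how any particular server implements them: the USERS table, the ALLOWED_NETWORKS list, and both function names are hypothetical, and a real system would store password hashes rather than plain text.

```python
import base64
import ipaddress

# Hypothetical user table and allowed network, for illustration only.
USERS = {"dale": "secret"}
ALLOWED_NETWORKS = [ipaddress.ip_network("152.163.201.0/24")]


def authenticate(authorization_header):
    """Return the user name if a Basic Authorization header is valid, else None."""
    scheme, _, encoded = authorization_header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    user, _, password = decoded.partition(":")
    return user if USERS.get(user) == password else None


def is_allowed(client_ip):
    """Access control: allow only clients inside one of the listed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in ALLOWED_NETWORKS)


header = "Basic " + base64.b64encode(b"dale:secret").decode("ascii")
print(authenticate(header))
print(is_allowed("152.163.201.137"), is_allowed("10.0.0.1"))
```

Basic Authentication only encodes the credentials; it does not encrypt them, which is one reason larger sites move to other schemes.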

Logs

Access Log

152.163.201.137 - - [20/Sep/1998:02:10:08 -0700] "GET / HTTP/1.0" 200 8087
152.163.201.137 - - [20/Sep/1998:02:10:13 -0700] "GET /SLlogo2.gif HTTP/1.0" 200 6848
152.163.201.135 - - [20/Sep/1998:02:10:13 -0700] "GET /perl_id_313c.gif HTTP/1.0" 200 1911
152.163.201.136 - - [20/Sep/1998:02:10:13 -0700] "GET /w3jicon.gif HTTP/1.0" 200 1970
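
Entries in this format are easy to take apart with a regular expression. The sketch below parses three of the lines shown above and tallies hits per host and total bytes sent; the pattern and the parse_line function are illustrative, not part of any log analysis tool.

```python
import re
from collections import Counter

# One access log entry: host, ident, user, [time], "request", status, size.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+'
    r'\[(?P<time>[^\]]+)\]\s+"(?P<request>[^"]*)"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    parts = match.group("request").split()
    return {
        "host": match.group("host"),
        "time": match.group("time"),
        "method": parts[0] if parts else "",
        "path": parts[1] if len(parts) >= 2 else "",
        "status": int(match.group("status")),
        "size": 0 if match.group("size") == "-" else int(match.group("size")),
    }


lines = [
    '152.163.201.137 - - [20/Sep/1998:02:10:08 -0700] "GET / HTTP/1.0" 200 8087',
    '152.163.201.137 - - [20/Sep/1998:02:10:13 -0700] "GET /SLlogo2.gif HTTP/1.0" 200 6848',
    '152.163.201.135 - - [20/Sep/1998:02:10:13 -0700] "GET /perl_id_313c.gif HTTP/1.0" 200 1911',
]
entries = [entry for entry in map(parse_line, lines) if entry is not None]
hits = Counter(entry["host"] for entry in entries)
total_bytes = sum(entry["size"] for entry in entries)

print(hits.most_common())
print(total_bytes)
```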

Some of the tasks surrounding logs:

References: Lincoln Stein, Yahoo's list of tools, Marketwave's Hitlist Examples

Server Hardware and OS

Webmaster manages the hardware, the OS and the network.

Properly configured PCs can be powerful enough to handle a sizable load, obviating the need for more expensive servers from Sun.

Small dedicated Web server devices, such as the Cobalt server with embedded Linux and Web administration, are another option.

Key issues

References: Server Watch, WebServer Compare

Applications Development

 

CGI Applications

CGI modules in Perl and Python provide a higher-level interface for the programmer and hide the low-level details.

Script installed in server's cgi-bin directory.

HTML document containing form references the CGI script.

CGI Basics

HTML Page

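<form action="http://dale.songline.com/cgi-py/formreply.py" method="Get">
Name and Address Form
Name: <input type="text" name="name">
Address: <input type="text" name="addr">
<input type="submit" value="Submit">
</form>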

http://dale.songline.com/cgi-py/formreply.py?name=dale
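
With method GET, the browser appends the form fields to the script URL as an encoded query string. The sketch below shows that encoding and the decoding a CGI library performs on the other side, using Python's standard urllib.parse module; the addr value is made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Form fields as the browser would collect them (addr value is illustrative).
fields = {"name": "dale", "addr": "101 Main St"}

# The browser encodes the fields and appends them after "?".
url = "http://dale.songline.com/cgi-py/formreply.py?" + urlencode(fields)

# The server hands the query string to the script, which decodes it.
query = urlsplit(url).query
parsed = parse_qs(query)

print(url)
print(parsed)
```

Note that spaces and other special characters are encoded in transit, so the script never sees literal quotes or spaces in the URL.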

CGI Script

Python script installed in cgi-bin directory

import cgi     # standard library through Python 3.12; removed in 3.13
import html

form = cgi.FieldStorage()

# Both fields must be present and non-empty.
name = form.getfirst("name", "")
addr = form.getfirst("addr", "")
form_ok = name != "" and addr != ""

print("Content-type: text/html")    # HTML is following
print()                             # blank line, end of headers

if not form_ok:
    print("<H1>Error</H1>")
    print("Please fill in the name and addr fields.")
else:
    print("<H1>Results</H1>")
    # Escape user input before echoing it back into the page.
    print(html.escape(name), html.escape(addr))

Application Servers

Programs like Cold Fusion and Microsoft's Active Server Pages (ASP) are application servers.

Application servers provide a framework for non-programmers to create dynamic Web sites. Building applications still requires technical knowledge, but the coding complexity is closer to that of HTML.

Databases and SQL

Static vs. Dynamic Sites

Increasingly, applications structure information in databases and then generate HTML dynamically.

A database can mean many different things: a flat-file database, dbm files, or a relational database such as Access and SQL Server from Microsoft, or MySQL, which is free software. Oracle and Informix provide large-scale database servers.

Ideally, you design your application independent of a particular database, and then you can migrate your data to better performing database systems as the need arises.

The main application interfaces to the database are through SQL and/or ODBC. SQL can be used to create or modify data records in the database as well as to select sets of data from it.

Example:

SELECT NAME, ADDR FROM EMPLOYEES WHERE NAME = 'DALE DOUGHERTY'

Languages such as Perl, Python and Java all provide fairly standard interfaces for accessing databases.
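
As an illustration of such an interface, the sketch below runs the query above through Python's standard sqlite3 module against an in-memory database. The table contents are invented for the example, and SQLite stands in for whichever database a site actually uses.

```python
import sqlite3

# In-memory database with made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (NAME TEXT, ADDR TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEES (NAME, ADDR) VALUES (?, ?)",
    [
        ("DALE DOUGHERTY", "101 MAIN ST"),
        ("ANOTHER PERSON", "202 OAK AVE"),
    ],
)

# The "?" placeholder lets the driver quote the value safely.
rows = conn.execute(
    "SELECT NAME, ADDR FROM EMPLOYEES WHERE NAME = ?",
    ("DALE DOUGHERTY",),
).fetchall()

for name, addr in rows:
    print(name, addr)

conn.close()
```

Because the connect, execute, and fetch calls follow a common pattern across Python database drivers, moving to another database later mostly means changing the connection line.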

Cold Fusion Example

Cold Fusion from Allaire is a Windows/NT application.
Server is configured so that files ending in .cfm are passed to the Cold Fusion application server.

HTML file: (could be created as a .cfm file.)

<FORM ACTION="searchquery.cfm" METHOD="Post">
Last Name: <Input Type="text" Name="LastName">
<Input Type="Submit" Value="Search">
</FORM>

Application file (.cfm):

<CFQUERY Name="EmployeeList" Datasource="Examples">
 
	Select * From Employees
	WHERE LastName = '#LastName#'

</CFQUERY>

<body>
<H2>Results</H2>

<CFOUTPUT>
	<P>The search for #Form.LastName# returned
	the following:
</CFOUTPUT>

<CFOUTPUT QUERY="EmployeeList">
	<HR>
	#FirstName# #LastName# (Phone: #PhoneNumber#) 
	<BR>
</CFOUTPUT>

In ASP, by comparison, script is embedded between <% and %> delimiters (e.g., <%= LastName %>).
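
For comparison, the same query-then-output flow as the Cold Fusion template can be sketched in Python. This is a rough equivalent under assumptions, not a translation of how Cold Fusion works internally: the function names and the sample row are hypothetical, SQLite stands in for the "Examples" datasource, and the search value is bound as a parameter rather than pasted into the SQL text.

```python
import html
import sqlite3


def search_employees(conn, last_name):
    """Run the query with the form value bound as a parameter."""
    cursor = conn.execute(
        "SELECT FirstName, LastName, PhoneNumber FROM Employees "
        "WHERE LastName = ?",
        (last_name,),
    )
    return cursor.fetchall()


def render_results(last_name, rows):
    """Build the results page, one block per row, escaping all values."""
    parts = [
        "<H2>Results</H2>",
        f"<P>The search for {html.escape(last_name)} returned the following:",
    ]
    for first, last, phone in rows:
        parts.append("<HR>")
        parts.append(
            f"{html.escape(first)} {html.escape(last)} "
            f"(Phone: {html.escape(phone)})"
        )
        parts.append("<BR>")
    return "\n".join(parts)


# Made-up data standing in for the Employees table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (FirstName TEXT, LastName TEXT, PhoneNumber TEXT)"
)
conn.execute("INSERT INTO Employees VALUES ('Pat', 'Smith', '555-0100')")

rows = search_employees(conn, "Smith")
page = render_results("Smith", rows)
print(page)
conn.close()
```

The application server's appeal is that the template hides these steps; the trade-off is less control over details such as quoting and escaping.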
 

Corporate Applications

Other Major Components

 

Streaming Media

RealAudio and RealVideo require a separate server to stream multimedia content to users.

Content must first be prepared in RealAudio or RealVideo format using RealPublisher or another tool that supports these formats.

Licensed based on the number of simultaneous connections to the server.

RealNetworks just came out with its new G2 system.

References: Real Media, Perl Interview

Ad Server

The ad server provides for the dynamic rotation of advertising banners on a site, and the collection of data to track impressions and click-throughs.

The ad sales rep uses the server as administrator to set up campaigns. Advertisers use the site to get real-time reporting on how their ads are doing.
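
The core bookkeeping of an ad server can be sketched in a few lines: rotate through the banners, count each impression, and count each click. The AdServer class and banner file names below are hypothetical, and simple round-robin rotation is an assumption; commercial products use weighted and targeted schemes.

```python
from collections import Counter
from itertools import cycle


class AdServer:
    """Round-robin banner rotation with impression and click counters."""

    def __init__(self, banners):
        self._rotation = cycle(banners)
        self.impressions = Counter()
        self.clicks = Counter()

    def next_banner(self):
        banner = next(self._rotation)
        self.impressions[banner] += 1       # every delivery is an impression
        return banner

    def record_click(self, banner):
        self.clicks[banner] += 1

    def click_through_rate(self, banner):
        shown = self.impressions[banner]
        return self.clicks[banner] / shown if shown else 0.0


server = AdServer(["a.gif", "b.gif"])
served = [server.next_banner() for _ in range(4)]
server.record_click("a.gif")

print(served)
print(server.click_through_rate("a.gif"))
```

The counters are what the advertiser's real-time report would be built from.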

High-end ad servers allow more targeted delivery of ads based on:

References: 3 Ad Server Solutions

Search Engine

Search engine provides a full-text index of a site or a collection of sites.

Webmaster needs to configure indexer to run at certain intervals, either to regenerate complete index or simply to update it.
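
A full-text index is, at its simplest, a table that maps each word to the pages containing it. The sketch below builds such a table and answers queries by intersecting the page sets; the page URLs and text are invented, and real search engines add ranking, stemming, and incremental updates on top of this idea.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up pages standing in for a crawled site.
pages = {
    "/perl.html": "Perl and CGI programming",
    "/python.html": "Python and CGI scripts",
    "/apache.html": "Configuring the Apache server",
}
index = build_index(pages)

print(sorted(search(index, "cgi")))
print(sorted(search(index, "python cgi")))
```

Regenerating the complete index corresponds to calling build_index again on every page; an update would touch only the entries for pages that changed.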

We use a subject index as the primary interface for searching and then offer the full-text search.

References: Web Review Search

Conferencing and Chat Systems

Sites will use conferencing and chat systems to create community and increase user involvement.

Conferencing or Bulletin Board Systems

Chat

References: WebBoard

Mailing List Software

Email remains the dominant form of communication on the Web. The ability to send regular email to users is very valuable.

We use an "email" subscription box on our sites to encourage users to provide an email address to us. Then we send our table of contents to them weekly.

Mailing List Servers automate the process of maintaining a mailing list and sending out large numbers of messages:

Majordomo, LISTSERV, Lyris
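
What such a server automates can be sketched as a subscriber list plus a loop that prepares one message per address. The MailingList class and the example.com addresses are hypothetical, and the sketch stops at building the messages; actually sending them would go through an SMTP server.

```python
from email.message import EmailMessage


class MailingList:
    """Keeps the subscriber list and prepares one message per subscriber."""

    def __init__(self, sender):
        self.sender = sender
        self.subscribers = set()

    def subscribe(self, address):
        self.subscribers.add(address.strip().lower())

    def unsubscribe(self, address):
        self.subscribers.discard(address.strip().lower())

    def build_messages(self, subject, body):
        messages = []
        for address in sorted(self.subscribers):
            message = EmailMessage()
            message["From"] = self.sender
            message["To"] = address
            message["Subject"] = subject
            message.set_content(body)
            messages.append(message)
        return messages


# Hypothetical addresses, for illustration only.
weekly = MailingList("editor@example.com")
weekly.subscribe("Reader1@example.com")
weekly.subscribe("reader2@example.com")
weekly.unsubscribe("reader2@example.com")

messages = weekly.build_messages("This week's table of contents", "1. ...")
print([message["To"] for message in messages])
```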

Browser

Netscape and Microsoft share about 90% of the market. Netscape is still the leader with over 50%, but Microsoft continues to show steady growth. Opera, an interesting new entry, hasn't made much progress.

Key Developments:

References: Mozilla.org, Opera

Key Issues

Browser incompatibilities remain a big headache

 

Layout

 

Implementing Layouts

Which Layout Strategy Will You Use?

 

Content Management

A system for managing the production, development and delivery of content.

References: PACE
