GWD: Scalability Needs

Distributed Computing Applications and Infrastructure (IS 206)
Fall 1998


Scalability Needs for GWD

Global Web Developer (GWD) is an application intended to support web development in organizations of various sizes. Installations can range from an organization developing a small community center website to one building a large-scale corporate website composed of smaller departmental subsites. GWD has been designed with the scalability needs of these different types of organizations in mind.

For organizations with a small website, GWD can run adequately from a single host. In many cases, scaling a single host may be as simple as upgrading to a better server with multiple processors and more RAM. For larger-scale website development, however, GWD can be scaled and optimized by partitioning it across multiple hosts, and it has been designed to enable this kind of partitioning. Overtaxed processes can be restored to the desired performance level by adding hosts to the application server group, scaling capacity to match increased use.

Modularization for partitioning

Many of GWD's functions have been modularized so that different features can be partitioned onto different hosts. These functions have been split into 12 atomic processes, listed in the table below. Although each module could be installed on its own host, in practice scalability is better achieved by clustering similar atomic functions on the same host. Where necessary, the more resource-intensive processes, such as the collaborative environment, can have entire hosts devoted to them.

The table lists each feature and the database associated with its module. The "Group" letters in the right-hand column are clustering suggestions for the different modules. They are only suggestions, based on how closely related the processes are in function. Some of these groups or individual processes may require further partitioning onto multiple hosts, depending on their degree of use (please refer to the section on congestion).

The twelve basic modules are as follows:

Table: Process-based Modules

Feature	Database	Group
Access Control/Permission	GWD User	A
Styleguide & Template Management; Quality Control	GWD Website; Template	A
User Email Alert; Access Control/Permission	Visitor	B
(Separate location for public pages)	Public Website	B
Annotation	Annotation	C
Real Time Collaboration	Real Time	C
Replication/Reconciliation	Reconciliation	D
Version Control	Version Control	D
Labor Tracking	Site Status	E
Link Management	Link Management	E
Traffic-Based Link Suggestion; Content-Based Link Suggestion	Suggestion	E
Backup	Backup	F

Characteristics for each group:

Group A: GWD-specific administrative features
Group B: Public access information
Group C: Messaging features
Group D: Collaborative editing features
Group E: Site-specific dynamic features
Group F: Backup
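
As an illustration only, the table and group list above can be written down as data, which makes the suggested clustering easy to inspect or change per installation. The sketch below is not part of GWD: the names Module, cluster_by_group, and assign_hosts, and the host names in the usage note, are hypothetical. It also assumes the simplest possible placement, handing clusters to hosts in alphabetical order.

```python
from collections import defaultdict
from typing import NamedTuple


class Module(NamedTuple):
    feature: str
    database: str
    group: str


# Rows copied from the "Process-based Modules" table.
MODULES = [
    Module("Access Control/Permission", "GWD User", "A"),
    Module("Styleguide & Template Management; Quality Control", "GWD Website; Template", "A"),
    Module("User Email Alert; Access Control/Permission", "Visitor", "B"),
    Module("(Separate location for public pages)", "Public Website", "B"),
    Module("Annotation", "Annotation", "C"),
    Module("Real Time Collaboration", "Real Time", "C"),
    Module("Replication/Reconciliation", "Reconciliation", "D"),
    Module("Version Control", "Version Control", "D"),
    Module("Labor Tracking", "Site Status", "E"),
    Module("Link Management", "Link Management", "E"),
    Module("Traffic-Based Link Suggestion; Content-Based Link Suggestion", "Suggestion", "E"),
    Module("Backup", "Backup", "F"),
]


def cluster_by_group(modules):
    """Return {group letter: [feature names]}, one entry per suggested cluster."""
    clusters = defaultdict(list)
    for module in modules:
        clusters[module.group].append(module.feature)
    return dict(clusters)


def assign_hosts(clusters, host_names):
    """Assign clusters to hosts in alphabetical order, wrapping around
    when there are fewer hosts than clusters (hypothetical policy)."""
    if not host_names:
        raise ValueError("at least one host is required")
    return {
        group: host_names[i % len(host_names)]
        for i, group in enumerate(sorted(clusters))
    }
```

With a single host name, assign_hosts places every cluster on that host, which corresponds to the small-website case; passing more host names spreads the six clusters across them.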

Partitioning in terms of scalability

As mentioned above, each module is built around a specific function, and the modules are grouped into clusters.

Diagram: GWD Application Logic Architecture

(For a more detailed view of where this GWD application logic diagram resides, please refer to the section "Host-level Architecture for GWD.")

There are three reasons for modularizing features with feature-specific databases and for grouping them:

Separate concerns, to make interoperability and future modifications easier

Minimize the congestion that would result from all modules accessing a single large database

Minimize the communication required across modules

Although some transactions may involve processes on multiple hosts, such as those that originate in the Group A cluster and are redirected to other processes, the majority of usage is directed at a single module.

Taking the "real-time collaboration" process as an example, organizations that require intensely interactive web development will make heavy use of the collaboration features. In these organizations, the Group C cluster may have to be broken up so that one or more full-time servers are devoted to the "real-time collaboration" module. This would improve the performance of both collaboration and annotation.

Individual modules can themselves be partitioned across multiple hosts. One method, partitioning the "real-time collaboration" module across multiple hosts, is outlined in the section on concurrency. That example highlights the value of directing each collaboration request to one of many hosts so that the load is evenly distributed.
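
The method GWD actually uses is the one described in the section on concurrency. As a generic illustration of spreading requests evenly over several hosts, the sketch below uses simple round-robin rotation. The class name RoundRobinDispatcher and the host names are hypothetical, and the sketch ignores the question of whether all requests from one collaboration session need to reach the same host.

```python
import itertools


class RoundRobinDispatcher:
    """Hand each incoming request to the next host in a fixed rotation."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("at least one host is required")
        self._hosts = list(hosts)
        self._cycle = itertools.cycle(self._hosts)

    def add_host(self, host):
        # Restart the rotation so the newly added host is included.
        self._hosts.append(host)
        self._cycle = itertools.cycle(self._hosts)

    def next_host(self):
        """Return the host that should receive the next request."""
        return next(self._cycle)


dispatcher = RoundRobinDispatcher(["collab1", "collab2"])
first_four = [dispatcher.next_host() for _ in range(4)]
```

Adding a host with add_host corresponds to the scaling step described earlier: once the new host is in the rotation, it receives an equal share of subsequent requests.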

Partitioning of modules is not limited to "real-time collaboration." An organization focused on publishing web-based books, for example, might make its heaviest use of the "replication/reconciliation" feature and therefore partition that module across multiple hosts, while keeping the other clusters (A, B, C, D's "version control" module, and E) on a single host.

Scalability applies not only to the processes but also to the databases used by the modules. Because GWD will be heavily used in some organizations, these databases may accumulate a substantial amount of data, and that volume can slow down tasks. These databases can therefore also be partitioned by website subsection or another set identifier, so that each incoming request is directed to the proper partition.
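
A minimal sketch of routing by website subsection follows, assuming an explicit lookup table from subsection to partition, with a default partition for anything not yet assigned. The class name PartitionRouter and the subsection and partition names are hypothetical; the source does not specify how GWD maps requests to partitions.

```python
class PartitionRouter:
    """Map a website subsection to the database partition that stores it."""

    def __init__(self, default_partition):
        self._default = default_partition
        self._by_subsection = {}

    @staticmethod
    def _normalize(subsection):
        # Treat "/Engineering/" and "engineering" as the same subsection.
        return subsection.strip("/").lower()

    def assign(self, subsection, partition):
        """Record that a subsection's data lives on the given partition."""
        self._by_subsection[self._normalize(subsection)] = partition

    def route(self, subsection):
        """Return the partition for a subsection, or the default one."""
        return self._by_subsection.get(self._normalize(subsection), self._default)


router = PartitionRouter(default_partition="annotation_db_0")
router.assign("/engineering/", "annotation_db_1")
```

An explicit table keeps each subsection's data together and lets an administrator move a busy subsection to its own partition; a hash of the identifier would spread data more evenly, but a subsection could no longer be placed by hand.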

What questions must scalability address?

The main benefit of having different methods of partitioning is the ability to customize GWD installations for specific organizations. The use of GWD, the scalability needs, and the monetary resources will differ for each organization. A few questions must be answered to inform the scaling of GWD:

Where are the likely locations of congestion based upon the organization's anticipated use patterns and practices?

How will this congestion affect performance?

If multiple hosts are going to be added, how should modules be clustered and which modules must be partitioned?

What hardware and network communications infrastructure purchases and improvements are necessary?

Are there scalability tradeoffs that must be made to increase the performance of key areas while adhering to a strict budget?
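
The first and third questions can be approached with a rough estimate of how anticipated requests divide among modules. The sketch below flags modules whose share of expected traffic meets a threshold, as candidates for dedicated or partitioned hosts. The function name, the 0.4 default threshold, and the request figures in the usage example are all hypothetical, chosen only to show the calculation.

```python
def dedicated_host_candidates(expected_requests, threshold=0.4):
    """Return (module, share) pairs, busiest first, for modules whose share
    of anticipated requests meets the threshold (an arbitrary cutoff)."""
    total = sum(expected_requests.values())
    if total <= 0:
        return []
    shares = {module: count / total for module, count in expected_requests.items()}
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return [(module, share) for module, share in ranked if share >= threshold]


# Hypothetical estimate for an organization that collaborates heavily.
estimate = {
    "Real Time Collaboration": 600,
    "Annotation": 250,
    "Version Control": 150,
}
candidates = dedicated_host_candidates(estimate)
```

Lowering the threshold flags more modules and therefore implies more hosts, which is where the budget tradeoff in the last question comes in.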

The purpose of scalability is to optimize performance. In some cases, using mobile code to distribute application logic to the client side may be another way to increase performance without adding hardware. A diagram showing mobile code as part of GWD can be seen in the architecture section.

last updated 12/04/98
