Concurrency

Global Web Developer (GWD) is a large, distributed application supporting dozens or hundreds of users. Concurrency is therefore central to its design: many simultaneous users and processes must be accommodated to meet the application's performance goals. GWD requires concurrency both within single hosts, via time slicing, and across multiple hosts. Concurrency across several hosts, including the addition of new hosts to increase throughput as application use grows, is key to GWD's scalability strategy.

Multiple-host concurrency is realized in GWD through task routing. Each process requiring high concurrency is routed by a single host to as many hosts as are available to handle that process. This facilitates scalability: as a given task's throughput begins to lag, adding new hosts to that task's routing system alleviates the slowness and scales the system to accommodate its increased use. Employing these concurrency strategies is key to meeting GWD's performance expectations.

Parallelism and Pipelining in Concurrency

More specifically, task concurrency is realized through parallelism and pipelining, each where appropriate. Parallelism is used extensively in GWD whenever a process is scaled to service more users. For example, in an installation of GWD requiring multiple collaboration servers to handle all the sessions, users of the collaborative environment are routed to the least-taxed server running the collaboration processes. Because multiple hosts are responsible for the collaboration environment and each collaboration user pair is assigned to the most available server, these multiple instances of collaboration are facilitated through parallelism. Most other user functions requiring scaling are also handled concurrently with parallelism. Pipelining is another way to implement concurrency.
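The task-routing scheme described above can be sketched as follows. This is a minimal illustration, not GWD's actual implementation; the `Host` and `TaskRouter` classes and the `active_sessions` load metric are assumptions introduced for the example.

```python
# Minimal sketch of least-taxed task routing; host names and the
# active_sessions load metric are illustrative, not from GWD itself.

class Host:
    def __init__(self, name):
        self.name = name
        self.active_sessions = 0   # simple stand-in for host load

class TaskRouter:
    """Routes each incoming task to the least-taxed host in its pool."""

    def __init__(self, hosts):
        self.hosts = list(hosts)

    def add_host(self, host):
        # Scaling step: a new host joins the pool as throughput lags.
        self.hosts.append(host)

    def route(self):
        # Pick the host with the fewest active sessions.
        chosen = min(self.hosts, key=lambda h: h.active_sessions)
        chosen.active_sessions += 1
        return chosen

router = TaskRouter([Host("collab-1"), Host("collab-2")])
first = router.route()   # both hosts idle; the first host is chosen
second = router.route()  # routed to the other, now least-taxed, host
```

Adding a host via `add_host` immediately makes it a candidate for routing, which is how the pool absorbs increased use.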
In GWD, a good example of a concurrency need met through pipelining is the processing of documents for Content-Based Link Suggestion. Before a new set of web documents can have content-based links suggested, they must be indexed and compared with the database of previously indexed web documents. This logical process achieves greater throughput by using pipelining to process several web documents concurrently. The different stages of pipelining in this process are:

1. Index the incoming web document.
2. Compare the indexed document against the database of previously indexed documents.
3. Suggest content-based links from the comparison results.
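The indexing-and-comparison pipeline described above can be sketched with one worker thread per stage, connected by queues. The stage functions here are trivial string-tagging stand-ins for the real indexing, comparison, and suggestion work; the structure is what matters.

```python
# Hedged sketch of a three-stage document pipeline; the stage
# functions are illustrative stand-ins, not GWD's real logic.
import queue
import threading

def run_stage(work, inbox, outbox):
    """Pull documents from inbox, apply one stage, push to outbox."""
    while True:
        doc = inbox.get()
        if doc is None:          # sentinel: shut this stage down
            outbox.put(None)
            break
        outbox.put(work(doc))

index = lambda doc: doc + ":indexed"
compare = lambda doc: doc + ":compared"
suggest = lambda doc: doc + ":links-suggested"

q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
stages = [(index, q0, q1), (compare, q1, q2), (suggest, q2, q3)]
threads = [threading.Thread(target=run_stage, args=s) for s in stages]
for t in threads:
    t.start()

for doc in ["doc-a", "doc-b", "doc-c"]:
    q0.put(doc)                  # several documents enter the pipeline
q0.put(None)                     # sentinel flows through all stages

results = []
while (item := q3.get()) is not None:
    results.append(item)
for t in threads:
    t.join()
```

While one document is being compared, the next is already being indexed, which is the throughput gain pipelining provides.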
Processes and Threads

Considering the implementation of concurrency at a more technical level brings us to processes and threads. Application-level functions are split into different functional processes as required; these can be partitioned across hosts or kept within a single host. Within a single process, concurrent centers of activity are each handled as separate threads and collectively managed through multithreading.

The partitioning of the application into different processes is outlined in the scalability section. These processes, or modules, are important to making the application scalable: breaking the overall application into modules enables the processes to be located on different hosts without difficulty. Multithreading comes into play within the processes.

An example use of multithreading in GWD is the link management process. When new or modified documents are submitted to the link management module, each link starts a new thread. The link's thread checks whether the link is new or pre-existing; if it is new, it is added to the link management database. In either case, the thread then verifies that the link's destination document still exists. If the linked-to document is missing, the thread flags this in the link management database for the user's attention. Because each document submitted to the link management module can contain a great number of links, multithreading increases throughput: rather than the process waiting on each link in turn while it performs these checks and issues one or more queries, each link is handled independently in its own thread, and the process as a whole proceeds faster.

last updated 12/04/98
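The per-link threading in the link management module can be sketched as below. `link_db` and `destination_exists` are hypothetical stand-ins for GWD's link management database and its reachability check; only the one-thread-per-link structure comes from the design above.

```python
# Illustrative sketch of per-link worker threads; link_db and
# destination_exists are hypothetical stand-ins for GWD's link store.
import threading

link_db = {}                      # link -> status, guarded by a lock
db_lock = threading.Lock()

def destination_exists(link):
    # Stand-in reachability check; real code would fetch the target.
    return not link.endswith("missing")

def check_link(link):
    with db_lock:
        if link not in link_db:   # new link: add it to the database
            link_db[link] = "ok"
    if not destination_exists(link):
        with db_lock:             # flag dangling links for the user
            link_db[link] = "destination-missing"

links = ["/a.html", "/b.html", "/c-missing"]
workers = [threading.Thread(target=check_link, args=(l,)) for l in links]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Each link's checks and database updates proceed independently, so a slow destination check on one link does not stall the others.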