Chapter 4. Reliability

Table of Contents

Simple page processing
RAS properties
Behavior in case of failures
Multi-Page (parallel) processing
Homepage-charts service
RAS properties without timeout
Behavior in case of failures
RAS properties with timeout
Effect of caching on the scheduler
A design flaw:
Another design flaw
Multiple Multi-Pages
Service Access and Reliability

Simple page processing

The diagram below, taken from the Infrastructure Overview, shows the typical flow of control for a simple page request in a J2EE Model 2 architecture.

Figure 4.1.

  1. POST/GET. The request is submitted from the client browser and arrives at the Controller servlet. There is a single instance of the Controller servlet per web application. The Controller is multi-threaded and processes multiple client requests concurrently. All requests for the web application arrive at the Controller.

  2. Dispatch. The Controller identifies the requesting user and the requested page, sets up the necessary context and invokes the correct Handler to process the request.

  3. Create/Update. There is typically one Business Logic Handler per page. The handler is responsible for initiating the business operations requested by the user. Typically this will involve interaction with backend systems to retrieve or modify some persistent state. When the processing has completed, the Handler is responsible for creating or modifying State Data (the Model) held in the Web Application to represent the results. The Handler then completes and control passes back to the Controller.

  4. Forward. The Controller then determines which JSP to invoke to display the results. Normally the JSP is determined automatically from the requested page; however, the Handler may have overridden the default.

  5. Extract. The role of the JSP is to render the page as HTML. When the JSP needs to display application data, it extracts it directly from the State Data (Model) that has previously been set up by the Handler.

  6. Respond. When the JSP has finished rendering the page, it is returned to the client browser for display.

Therefore, each page to be delivered to the client typically involves writing a triplet comprising: a Handler to process the Business Logic; a Model to hold the result data; a JSP to display the results back to the client.
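The six steps above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the JADE implementation: the names Handler, View and FrontController are hypothetical, the View interface stands in for a JSP, and the model is created per request for brevity, whereas the State Data described above is held in the Web Application.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical handler contract: runs the business logic and fills the model. */
interface Handler {
    /** Returns a view name to override the default, or null to keep the default. */
    String handle(Map<String, String> request, Map<String, Object> model);
}

/** Hypothetical view contract standing in for a JSP. */
interface View {
    String render(Map<String, Object> model);
}

/** Minimal front controller covering dispatch, forward and respond. */
class FrontController {
    private final Map<String, Handler> handlers = new HashMap<>();
    private final Map<String, View> views = new HashMap<>();

    /** Registers the handler and default view for one page. */
    void register(String page, Handler handler, View view) {
        handlers.put(page, handler);
        views.put(page, view);
    }

    /** Processes one request on the caller's thread and returns the rendered page. */
    String service(String page, Map<String, String> request) {
        Handler handler = handlers.get(page);              // step 2: dispatch
        if (handler == null) {
            return "404: unknown page " + page;
        }
        Map<String, Object> model = new HashMap<>();       // state data (simplified: per request)
        String override = handler.handle(request, model);  // step 3: create/update
        String viewName = (override != null) ? override : page; // step 4: forward
        View view = views.get(viewName);
        if (view == null) {
            return "500: no view for " + viewName;
        }
        return view.render(model);                         // steps 5 and 6: extract, respond
    }
}
```

Because service() runs entirely on the calling thread, a handler that blocks also blocks the request that invoked it, which is the property the following sections examine.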

AEPortal uses the JADE infrastructure to support the above process.

RAS properties

The processing of a simple page request has the following RAS properties:

  • No additional threads are created. The handler runs on the servlet thread.

  • During the request the AEPortal database is contacted (e.g. for profile information) and optionally an external service is accessed (e.g. to load a research document from a web server).

  • Processing is sequential and response-time constrained: the handler cannot use wait times (e.g. while waiting for network responses) to perform other tasks.

  • If the handler is blocked, the whole request coming from the web container is blocked too. If the maximum number of open connections is reached, no new request can enter AEPortal while the handler(s) are busy.

  • No timeouts are specified for a handler. If timeouts happen, they do so within the external service access API.

  • There is currently no external service access API that offers a Quality-of-Service interface, e.g. to set timeouts or query the status of a service.

Behavior in case of failures

Let’s assume that an external service becomes unavailable. Eventually a handler waiting for this resource will get a timeout in the access API of that service and return with an error. This may take x seconds to become effective. (We need to know more about timeouts in our access APIs.)

While waiting for the external resource, a handler holds on to some system resources and at the same time occupies one of the limited entries (connections) into the system. The effects on the system resources should be benign.

The user will have to wait until the timeout happens and the request returns. A different user with a request to a different resource will not be affected, but a user going after the same resource will see the same delay while waiting for a timeout.

It would be an improvement for both system and user if we could tag a service as being unavailable and not start any new requests against this service. But this raises a couple of questions:

  • Who turns a service off? Is it the service itself? A handler?

  • When is a service turned off? If it does not answer at all? If it is simply slower than usual? What is too slow?

  • When is a service turned on again? After a certain time or number of requests?

  • Who can turn on a service again?
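One possible set of answers to these questions is sketched below, purely as an illustration. It assumes that handlers report the outcome of each call, that a service is turned off after a fixed number of consecutive failures, and that it is tried again after a fixed pause. The class name ServiceSwitch and both parameters are hypothetical; nothing like this exists in AEPortal today, and "too slow" would have to be mapped to a failure by the caller.

```java
/** Hypothetical availability tag for one external service. */
class ServiceSwitch {
    private final int failureThreshold;   // consecutive failures before turning off
    private final long retryAfterMillis;  // pause before requests are admitted again
    private int consecutiveFailures = 0;
    private long offSince = -1;           // -1 means the service is on

    ServiceSwitch(int failureThreshold, long retryAfterMillis) {
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
    }

    /** Handlers ask this before starting a request against the service. */
    synchronized boolean allowRequest(long nowMillis) {
        if (offSince < 0) {
            return true;
        }
        // Once the pause has elapsed, requests are admitted again as probes.
        return nowMillis - offSince >= retryAfterMillis;
    }

    /** A successful call turns the service back on. */
    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        offSince = -1;
    }

    /** A failed or timed-out call counts towards turning the service off. */
    synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            offSince = nowMillis; // start (or restart) the pause
        }
    }
}
```

A handler would call allowRequest(System.currentTimeMillis()) before contacting the service and return an error page immediately when it yields false, instead of waiting for a timeout. Passing the time in explicitly keeps the class easy to test.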

For now we need to make sure that the timeouts hidden in our access APIs are short enough to avoid blocking too many requests for too long.

Debugging has shown some requests waiting 4 minutes or more, e.g. on Quotes. This is a severe system drain and a bad user experience.
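For a service reached over HTTP, one way to bound such waits is to set explicit connect and read timeouts on the connection. The sketch below assumes the access API is built on java.net.HttpURLConnection, which may not be the case for every AEPortal service. The class name BoundedHttpAccess is hypothetical, and suitable timeout values would still have to be determined per service.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper that never hands out a connection without timeouts. */
class BoundedHttpAccess {

    /**
     * Prepares an HTTP connection with explicit timeouts (expects an http/https URL).
     * No network traffic happens here; the connection is established
     * only when the caller connects or reads from it.
     */
    static HttpURLConnection open(String url, int connectTimeoutMillis, int readTimeoutMillis) {
        try {
            HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(connectTimeoutMillis); // limit for establishing the connection
            conn.setReadTimeout(readTimeoutMillis);       // limit for each blocking read
            return conn;
        } catch (IOException e) {
            // Malformed URL or unsupported protocol.
            throw new UncheckedIOException(e);
        }
    }
}
```

When either limit is exceeded, the connection throws java.net.SocketTimeoutException, which the handler can turn into an error page rather than holding the servlet thread for minutes.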