Queuing and Response Time - Cloud Computing

Queues form because resources are limited and demand fluctuates. If there is only one bank teller, then we all need to queue (stand in line) waiting for the teller. Typically, more people go to the bank at lunch time than during normal working hours, but even during the peak lunch-hour timeframe, customer arrival rates are not uniform. People arrive when they feel like it. Queueing theory explains what happens when utilization of resources increases and tasks serialize (wait) for a resource to become available.

That is, it explains what happens as the teller gets busier and busier. Because requests aren’t evenly spaced (people arrive at the bank haphazardly), waiting times go through the roof as utilization of a resource (the teller) increases. The classic curve of queue length (and hence response time) versus utilization stays nearly flat at low utilization and then climbs steeply.

Queue of buses in Scotland (Photograph by Ian Britton, licensed under Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License from FreeFoto).

Now consider this:
Response Time = Waiting Time (Queueing Time) + Processing Time
Waiting Time = Time to Service One Request × Queue Length

For example, consider a resource (say, a disk drive servicing a database query) that takes, on average, 200 milliseconds per query. The response time is 200 milliseconds if no request is ahead of yours. But if there are 20 requests queued ahead of yours, then the average waiting time will be 200 times 20, or 4,000 milliseconds (4 seconds), in addition to the 200 milliseconds needed to get the job done, for a total response time of 4,200 milliseconds. The queue grows steeply and nonlinearly as utilization of the resource increases, in a characteristic “hockey-stick” curve.
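
The arithmetic in the disk-drive example can be sketched in a few lines of Python. The function and parameter names are illustrative only, and the sketch assumes that every queued request takes the same average service time.

```python
def response_time_ms(service_time_ms: float, queue_length: int) -> float:
    """Response time = waiting time + processing time.

    Assumes each request ahead of yours takes the same average service time.
    """
    waiting_time_ms = service_time_ms * queue_length  # time spent queued
    return waiting_time_ms + service_time_ms          # plus your own service


print(response_time_ms(200, 0))   # 200  (nobody ahead of you)
print(response_time_ms(200, 20))  # 4200 (4,000 ms waiting + 200 ms service)
```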

Queue length increases as utilization increases.


The magic is that doubling the number of servers has a dramatic effect on response times. If one server is ninety-five percent busy, the average queue length is about eighteen requests. If we add another server, and the two servers share the same workload, each server will, on average, be busy only 47.5 percent of the time, and the average queue length drops from about eighteen requests to less than two.

But the magic is not infinite, and the economic law of diminishing returns (also called diminishing marginal returns) applies. Doubling again to four servers does little to improve response times further; the servers mostly sit idle, and the resources are wasted. Notice that the “knee” of the above curve occurs between sixty-five and seventy-five percent utilization; above the knee, response times deteriorate rapidly.
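
The text does not say which queueing model lies behind its figures. As a hedged illustration, the sketch below assumes the textbook single-server M/M/1 model (random arrivals, exponentially distributed service times), in which the mean number of requests waiting is ρ² / (1 − ρ) for utilization ρ. Under that assumption, the values of about eighteen at 95 percent and less than two at 47.5 percent come out as counts of waiting requests. The function name is hypothetical.

```python
def mm1_queue_length(utilization: float) -> float:
    """Mean number of requests waiting (not yet in service) in an M/M/1 queue.

    Assumption: the M/M/1 model; the article itself does not name a model.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return utilization ** 2 / (1 - utilization)


# One server at 95% busy versus the same load split over two servers (47.5% each),
# plus the two ends of the "knee" region.
for rho in (0.475, 0.65, 0.75, 0.95):
    print(f"{rho:.1%} busy -> {mm1_queue_length(rho):.2f} requests waiting")
```

Under this model the loop prints roughly 0.43, 1.21, 2.25, and 18.05 waiting requests. Halving utilization from 95 percent shrinks the queue by a factor of about forty, while halving again from 47.5 percent can remove less than half a request, which is the diminishing return described above.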

You’ve observed queueing theory in action while waiting in line at the post office, the bank, or the Department of Motor Vehicles for a single clerk when, all of a sudden, another clerk is added: waiting times drop quickly.

You may have also observed it in heavy rush-hour traffic where the road is jammed with cars (very high utilization) and the speed of travel slows to a crawl, but there is no sign of an accident.

Of course, in a computer system, there are numerous resources, and queues form for access to each of them. This explanation was a long-winded way of demonstrating the importance of getting it right. Too few resources, and response times become unacceptable; too many, and resources are wasted.
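
One rough way to balance "too few" against "too many" is to pick the smallest number of servers that keeps average utilization per server below the knee. The sketch below is only an illustration: the function name, the idea of expressing load in "server-equivalents", and the default target of 0.70 (taken from the middle of the sixty-five to seventy-five percent knee range) are assumptions, not a method given in the text.

```python
import math


def servers_needed(offered_load: float, target_utilization: float = 0.70) -> int:
    """Smallest server count that keeps average per-server utilization
    at or below the target.

    offered_load is expressed in hypothetical "server-equivalents":
    0.95 means a single server would be 95 percent busy.
    """
    if offered_load < 0:
        raise ValueError("offered_load must be non-negative")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in the range (0, 1]")
    # Always keep at least one server, even when there is no load.
    return max(1, math.ceil(offered_load / target_utilization))


print(servers_needed(0.95))  # 2: one server at 95% is past the knee
print(servers_needed(0.50))  # 1: a single server is comfortably below the knee
print(servers_needed(3.0))   # 5
```

A real capacity plan would also account for peak loads and for each resource (CPU, disk, network) separately, since queues form for each of them.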

Cloud computing is a capacity planner’s dream. The beauty of cloud-based resources is that you can dynamically and elastically expand and contract the resources reserved for your use, and you pay only for what you have reserved.
