Instrumentation (Measuring Resource Consumption) - Cloud Computing

Unfortunately, this belief often causes the effort to degenerate into a tendency to “wing it.” Because not everything that happens inside a computer is intuitive or has been conclusively demonstrated, the results can be catastrophic.

First, Get Your Business Needs Down Clearly
As they say, when the boss says, “jump,” you need to ask, “how high?” Goals not clearly articulated cannot be achieved. You need to start by asking the hard questions:

  • What is the response time (speed) at which services must be delivered to the user?
  • What level of availability is required? (Many cloud service-level agreements promise 99.99% uptime, but what does that mean to your business?) Does availability mean that the server is running, or that applications are performing to your specifications? Is availability measured from within the cloud’s firewall or from end users’ actual devices?
  • What level of elasticity (scalability) do you need, and how quickly (at what velocity) must scaling be accomplished?
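
To make the availability question concrete, the arithmetic below converts an uptime percentage into the downtime it still allows. It is a minimal sketch that assumes a 365-day year and a 30-day month; real SLAs define their own measurement windows and exclusions. At 99.99%, it works out to roughly 52.6 minutes of downtime per year, or about 4.3 minutes per month.

```python
"""Convert an availability percentage into the downtime it permits.

Assumes a 365-day year and a 30-day month; actual SLAs define their own
measurement windows, so treat these figures as rough guides.
"""

MINUTES_PER_YEAR = 365 * 24 * 60   # 525,600 minutes
MINUTES_PER_MONTH = 30 * 24 * 60   # 43,200 minutes


def allowed_downtime_minutes(availability_pct: float, period_minutes: int) -> float:
    """Return the minutes of downtime permitted in a period at the given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        per_year = allowed_downtime_minutes(pct, MINUTES_PER_YEAR)
        per_month = allowed_downtime_minutes(pct, MINUTES_PER_MONTH)
        print(f"{pct}% uptime: {per_year:.1f} min/year, {per_month:.2f} min/month")
```

Whether a few minutes of monthly downtime is acceptable depends on when it occurs; the same budget spent during an end-of-quarter peak costs far more than one spent overnight.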

Elasticity = Velocity + Capacity

A need for a quick ramp-up during peak customer usage periods, and only during those times, calls for a high degree of elasticity. How efficiently can the system scale to your needs? If you ramp up too early, you pay for capacity you are not yet using and the benefit of scalability is diminished; scale too late, and system performance deteriorates under the increased load. The goal is “just-in-time” scalability. Will the required ramp-up be fast at all times of the day and across all geographies?

And just how much capacity can you get? Will an additional 100 or 300 instances be available when you need them? How much human intervention is required to scale? Can scaling be accomplished automatically by setting policies?
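
One common way to express such a policy is a target-tracking rule: pick a target value for a load metric (say, average CPU utilization) and resize the fleet in proportion to how far the observed value is from that target. The Kubernetes Horizontal Pod Autoscaler documents essentially this ratio. The sketch below is illustrative only; the function name, the 60% target in the example, and the 300-instance ceiling are assumptions, not settings from any particular cloud provider.

```python
import math


def desired_instances(current: int, observed_metric: float, target_metric: float,
                      min_instances: int = 1, max_instances: int = 300) -> int:
    """Proportional target-tracking rule (hypothetical policy helper).

    If 100 instances run at 90% CPU against a 60% target, the fleet should grow
    by 90/60, i.e. to 150 instances. The result is clamped so the policy can never
    scale below min_instances or above max_instances (the capacity ceiling).
    """
    if current <= 0 or target_metric <= 0:
        raise ValueError("current and target_metric must be positive")
    wanted = math.ceil(current * observed_metric / target_metric)
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    print(desired_instances(100, observed_metric=90.0, target_metric=60.0))  # 150: scale out
    print(desired_instances(10, observed_metric=30.0, target_metric=60.0))   # 5: scale in
    print(desired_instances(250, observed_metric=90.0, target_metric=60.0))  # 300: hits ceiling
```

Rounding up errs toward extra capacity. A production policy would typically also add a tolerance band and a cool-down period so the fleet does not oscillate, and the ceiling is exactly where the question "will the instances be there?" bites.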

What are the propagation delays? Is a transaction made in your London office available minutes later for use by the Mountain View, California sales team trying to close an end-of-quarter deal? How long does it take the end user to complete a multistep workflow process, irrespective of the time of day, time of the month, or geographical location?

What Technologists Must Know to Manage Performance and Capacity
The following categories of system resources are often tracked by capacity planners.

CPU Utilization: The central processing unit is always technically either busy or idle; from a Linux perspective, it is in one of several states:

  • Idle: in a wait state, available for work
  • User: busy doing high-level (application-level) work such as data movement and arithmetic
  • System: executing in “protection ring 0” (kernel mode), performing kernel functions, I/O, and other hardware interaction that user programs are prevented from accessing directly
  • Nice: similar to the user state, but for low-priority (“niced”) jobs that are willing to yield the CPU to higher-priority tasks

Analyzing the average time spent in each state, especially over time, reveals whether one state or another is overloaded. Too much idle time indicates excess capacity; excessive system time indicates possible thrashing (excessive paging) caused by insufficient memory, and/or a need for faster I/O or additional devices to distribute the load. Each system has its own signature when running normally, and watching these numbers over time lets the planner determine what constitutes normal behavior for that system. Once a baseline is established, deviations from it are easy to detect.
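
On Linux, these counters are exposed as cumulative clock ticks on the `cpu` line of `/proc/stat` (in order: user, nice, system, idle, then iowait, irq, softirq, steal, and on newer kernels guest and guest_nice). Percentages come from the difference between two samples. The sketch below folds iowait, irq, softirq, and steal into an `other` bucket and ignores the guest fields, which the kernel already counts inside user and nice. The sample lines are made-up numbers for illustration.

```python
CPU_STATES = ("user", "nice", "system", "idle")


def parse_cpu_line(line: str) -> dict:
    """Parse the aggregate 'cpu' line of /proc/stat into per-state tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    ticks = [int(v) for v in fields[1:9]]   # drop guest/guest_nice (already in user/nice)
    counts = dict(zip(CPU_STATES, ticks[:4]))
    counts["other"] = sum(ticks[4:])         # iowait + irq + softirq + steal
    return counts


def cpu_percentages(before: dict, after: dict) -> dict:
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {state: after[state] - before[state] for state in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {state: 100.0 * d / total for state, d in deltas.items()}


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first (aggregate) line of /proc/stat (Linux only)."""
    with open(path) as f:
        return f.readline()


# Hypothetical samples taken a few seconds apart.
SAMPLE_T0 = "cpu  1000 100 500 8000 200 0 50 0 0 0"
SAMPLE_T1 = "cpu  1600 100 900 8900 250 0 100 0 0 0"

if __name__ == "__main__":
    print(cpu_percentages(parse_cpu_line(SAMPLE_T0), parse_cpu_line(SAMPLE_T1)))
```

On a live system, call `read_cpu_line()` twice with `time.sleep()` in between and pass both results through `parse_cpu_line()`. Recording these percentages at regular intervals is one way to build the baseline described above.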

Interrupts: Most I/O devices use interrupts to signal the CPU when there is work for it to do. For example, a SCSI controller raises an interrupt to signal that a requested disk block has been read and is available in memory, and a serial port with a mouse attached generates an interrupt each time a button is pressed or released or the mouse is moved. Watching the count for each interrupt gives you a rough idea of how much load the associated device is handling.
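
On Linux, per-interrupt counts appear in `/proc/interrupts`: a header row naming each CPU, then one row per interrupt source with a cumulative count for each CPU followed by a description. Because the counts are cumulative, device load shows up as the rate of change between two samples. The parser below is a sketch; the sample text is invented, and real files carry additional rows (such as NMI and LOC on x86) that it simply treats as more named sources.

```python
def parse_interrupts(text: str) -> dict:
    """Sum the per-CPU counts for each row of /proc/interrupts."""
    lines = text.strip("\n").splitlines()
    ncpu = len(lines[0].split())                 # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        counts = []
        for token in rest.split()[:ncpu]:        # counts come before the description
            if not token.isdigit():
                break
            counts.append(int(token))
        totals[name.strip()] = sum(counts)
    return totals


def interrupt_rates(before: dict, after: dict, seconds: float) -> dict:
    """Interrupts per second for each source present in both samples."""
    return {irq: (after[irq] - before[irq]) / seconds
            for irq in before if irq in after}


# Hypothetical snapshots taken 10 seconds apart on a two-CPU machine.
INTERRUPTS_T0 = """\
           CPU0       CPU1
  0:         40          0   IO-APIC    2-edge      timer
  1:          9          0   IO-APIC    1-edge      i8042
 24:       1000       2000   PCI-MSI 512000-edge    ahci
"""
INTERRUPTS_T1 = """\
           CPU0       CPU1
  0:        140          0   IO-APIC    2-edge      timer
  1:         29          0   IO-APIC    1-edge      i8042
 24:       1500       2600   PCI-MSI 512000-edge    ahci
"""

if __name__ == "__main__":
    rates = interrupt_rates(parse_interrupts(INTERRUPTS_T0),
                            parse_interrupts(INTERRUPTS_T1), seconds=10)
    for irq, per_sec in sorted(rates.items(), key=lambda kv: -kv[1]):
        print(f"IRQ {irq}: {per_sec:.1f}/s")
```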

Context Switching: Processors are far faster than input/output devices, so rather than let the CPU sit idle while a job waits for I/O, the operating system allocates slices of processor time to multiple applications, which makes the computer appear to be doing multiple jobs at once. Each task is given control of the system for a certain “slice” of time; when that time is up, the system saves the state of the running process and gives control of the system to another process, making sure that the necessary resources are available. This administrative process is called context switching. In some operating systems, task switching is fairly expensive, sometimes consuming more resources than the processes being switched. Linux is very efficient in this regard, but by watching the amount of this activity, you will learn to recognize when a system is spending excessive time switching tasks.
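
Linux reports the cumulative number of context switches since boot on the `ctxt` line of `/proc/stat`, so the useful figure is the rate between two samples, compared against the system's own baseline. In the sketch below, the sample values, the baseline of 10,000 switches per second, and the 2× alert factor are all assumptions chosen for illustration; a real threshold should come from observing the system in normal operation.

```python
import time


def read_counter(stat_text: str, key: str) -> int:
    """Return the value of a single-valued line such as 'ctxt N' in /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == key:
            return int(fields[1])
    raise KeyError(key)


def rate_per_second(before: int, after: int, seconds: float) -> float:
    """Change in a cumulative counter per second."""
    return (after - before) / seconds


def is_excessive(rate: float, baseline: float, factor: float = 2.0) -> bool:
    """Hypothetical alert rule: flag a rate exceeding the baseline by `factor`."""
    return rate > baseline * factor


def sample_context_switch_rate(interval: float = 1.0, path: str = "/proc/stat") -> float:
    """Measure live context switches per second (Linux only)."""
    with open(path) as f:
        first = read_counter(f.read(), "ctxt")
    time.sleep(interval)
    with open(path) as f:
        second = read_counter(f.read(), "ctxt")
    return rate_per_second(first, second, interval)


# Hypothetical excerpts of /proc/stat taken 10 seconds apart.
STAT_T0 = "cpu  1000 100 500 8000\nctxt 1200000\nprocesses 5000\n"
STAT_T1 = "cpu  1600 100 900 8900\nctxt 1500000\nprocesses 5020\n"

if __name__ == "__main__":
    rate = rate_per_second(read_counter(STAT_T0, "ctxt"), read_counter(STAT_T1, "ctxt"), 10)
    print(f"{rate:.0f} context switches/s; excessive: {is_excessive(rate, baseline=10_000)}")
```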

Memory: When too many processes are running and using up available memory, the system slows down as processes are paged or swapped out to make room for other processes to run. When a task's time slice is exhausted, that task may have to be written out to the paging device to make way for the next process. Memory-utilization graphs help highlight memory problems.
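
The numbers behind a memory-utilization graph can be read from `/proc/meminfo` on Linux, which lists fields such as `MemTotal`, `MemAvailable` (present in kernels since 3.14), `SwapTotal`, and `SwapFree`, in kilobytes. The sketch below treats any memory not reported as available as "in use"; that definition and the sample values are assumptions for illustration.

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its numeric value (kB for sized fields)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        info[name.strip()] = int(value.split()[0])
    return info


def memory_usage(info: dict) -> dict:
    """Percentage of RAM and swap in use, counting anything not 'available' as used."""
    ram_used = 100.0 * (info["MemTotal"] - info["MemAvailable"]) / info["MemTotal"]
    swap_total = info.get("SwapTotal", 0)
    swap_used = (100.0 * (swap_total - info.get("SwapFree", 0)) / swap_total
                 if swap_total else 0.0)
    return {"ram_used_pct": ram_used, "swap_used_pct": swap_used}


# Hypothetical /proc/meminfo excerpt.
MEMINFO_SAMPLE = """\
MemTotal:        8000000 kB
MemFree:          500000 kB
MemAvailable:    2000000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""

if __name__ == "__main__":
    print(memory_usage(parse_meminfo(MEMINFO_SAMPLE)))
```

Plotting `ram_used_pct` and `swap_used_pct` over time makes the pattern described above visible: swap usage that climbs while RAM stays near full is a sign that processes are being pushed out to make room for others.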

Paging: When available (free) memory becomes scarce, the virtual memory system writes pages from real memory out to the swap device, freeing space for active processes; a page fault occurs when a process later references a page that is no longer in real memory, and that page must be read back in. Today’s disk drives are fast, but they haven’t kept pace with increases in processor speed. As a result, when page faults rise to the point that disk arm activity (which is mechanical) becomes excessive, response times slow drastically as the system spends most of its time shuttling pages in and out. This, too, is an undesirable form of thrashing. Paging on a Linux system can also be reduced by loading only the needed portions of an executable program on demand, rather than preloading the whole program. (On many systems, this happens automatically.)
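
Linux exposes paging activity as cumulative counters in `/proc/vmstat`: `pgfault` counts all page faults, `pgmajfault` counts major faults (those that had to read from disk), and `pswpin`/`pswpout` count pages read from and written to the swap device. Minor faults are cheap, so the major-fault and swap-out rates are the ones to watch for thrashing. The sketch below computes those rates from two samples; the sample values and the 100-per-second warning threshold are assumptions for illustration only.

```python
WATCHED = ("pgfault", "pgmajfault", "pswpin", "pswpout")


def parse_vmstat(text: str) -> dict:
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            counters[fields[0]] = int(fields[1])
    return counters


def paging_rates(before: dict, after: dict, seconds: float) -> dict:
    """Per-second rate for each watched counter present in both samples."""
    return {name: (after[name] - before[name]) / seconds
            for name in WATCHED if name in before and name in after}


def looks_like_thrashing(rates: dict, threshold: float = 100.0) -> bool:
    """Hypothetical rule: major faults or swap-outs per second above a threshold."""
    return rates.get("pgmajfault", 0) > threshold or rates.get("pswpout", 0) > threshold


# Hypothetical /proc/vmstat excerpts taken 10 seconds apart.
VMSTAT_T0 = "pgfault 500000\npgmajfault 1200\npswpin 300\npswpout 1000\n"
VMSTAT_T1 = "pgfault 560000\npgmajfault 1500\npswpin 800\npswpout 3560\n"

if __name__ == "__main__":
    rates = paging_rates(parse_vmstat(VMSTAT_T0), parse_vmstat(VMSTAT_T1), seconds=10)
    print(rates, "thrashing?", looks_like_thrashing(rates))
```

A single high sample is not proof of thrashing; the signal is a sustained rate well above the system's normal baseline.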

Swapping: Swapping is much like paging, except that it migrates entire process images, consisting of many pages of memory, from real memory to the swap device, rather than moving them page by page.

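Disk I/O: Linux maintains per-disk statistics, including total I/O, reads, writes, blocks read, and blocks written. (Early kernels kept these for only the first four disks; current kernels report every block device in /proc/diskstats.) These numbers can reveal uneven loading across multiple disks and show the balance of reads versus writes.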
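
In `/proc/diskstats`, each line starts with the major number, minor number, and device name, followed by reads completed, reads merged, sectors read, milliseconds spent reading, writes completed, writes merged, and sectors written (plus further fields this sketch ignores); sectors there are always 512 bytes. The sketch below compares two samples to show per-disk load and the read/write balance; the device names and numbers are invented.

```python
SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors


def parse_diskstats(text: str) -> dict:
    """Map device name -> (reads, sectors_read, writes, sectors_written)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 10:
            continue
        # f[0]=major f[1]=minor f[2]=name f[3]=reads f[5]=sectors read
        # f[7]=writes f[9]=sectors written
        stats[f[2]] = (int(f[3]), int(f[5]), int(f[7]), int(f[9]))
    return stats


def disk_load(before: dict, after: dict, seconds: float) -> dict:
    """Per-device I/O rates and the share of operations that were reads."""
    report = {}
    for dev, (r0, sr0, w0, sw0) in before.items():
        if dev not in after:
            continue
        r1, sr1, w1, sw1 = after[dev]
        reads, writes = r1 - r0, w1 - w0
        ops = reads + writes
        report[dev] = {
            "reads_per_s": reads / seconds,
            "writes_per_s": writes / seconds,
            "read_kb_per_s": (sr1 - sr0) * SECTOR_BYTES / 1024 / seconds,
            "write_kb_per_s": (sw1 - sw0) * SECTOR_BYTES / 1024 / seconds,
            "read_share": reads / ops if ops else 0.0,
        }
    return report


# Hypothetical snapshots taken 10 seconds apart.
DISKSTATS_T0 = """\
   8       0 sda 1000 0 80000 500 4000 0 320000 2000 0 2500 2500
   8      16 sdb 50 0 4000 20 10 0 800 5 0 25 25
"""
DISKSTATS_T1 = """\
   8       0 sda 1200 0 96000 600 4800 0 384000 2400 0 3000 3000
   8      16 sdb 60 0 4800 24 10 0 800 5 0 29 29
"""

if __name__ == "__main__":
    load = disk_load(parse_diskstats(DISKSTATS_T0), parse_diskstats(DISKSTATS_T1), 10)
    for dev, metrics in load.items():
        print(dev, metrics)
```

In this made-up sample, `sda` handles 100 operations per second, mostly writes, while `sdb` handles one read per second: the kind of uneven loading the paragraph above describes.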

Network I/O: Network I/O statistics can be used to diagnose problems and examine the load on the network interface(s). The statistics show traffic in and out, collisions, and errors encountered in both directions.
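
On Linux these per-interface counters are in `/proc/net/dev`: two header lines, then one line per interface with eight receive fields (bytes, packets, errs, drop, fifo, frame, compressed, multicast) followed by eight transmit fields (bytes, packets, errs, drop, fifo, colls, carrier, compressed). The sketch below turns two samples into throughput and error/collision deltas; the interface names and values are hypothetical.

```python
def parse_net_dev(text: str) -> dict:
    """Map interface -> selected cumulative counters from /proc/net/dev."""
    stats = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        f = [int(v) for v in data.split()]
        stats[name.strip()] = {
            "rx_bytes": f[0], "rx_errs": f[2],                      # receive side
            "tx_bytes": f[8], "tx_errs": f[10], "tx_colls": f[13],  # transmit side
        }
    return stats


def interface_report(before: dict, after: dict, seconds: float) -> dict:
    """Throughput in bytes/s plus new errors and collisions between two samples."""
    report = {}
    for iface, b in before.items():
        a = after.get(iface)
        if a is None:
            continue
        report[iface] = {
            "rx_bytes_per_s": (a["rx_bytes"] - b["rx_bytes"]) / seconds,
            "tx_bytes_per_s": (a["tx_bytes"] - b["tx_bytes"]) / seconds,
            "new_rx_errs": a["rx_errs"] - b["rx_errs"],
            "new_tx_errs": a["tx_errs"] - b["tx_errs"],
            "new_collisions": a["tx_colls"] - b["tx_colls"],
        }
    return report


# Hypothetical snapshots taken 10 seconds apart.
NET_DEV_T0 = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000000    4000    2    0    0     0          0         0  1000000    3000    0    0    0     1       0          0
"""
NET_DEV_T1 = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    2000      20    0    0    0     0          0         0     2000      20    0    0    0     0       0          0
  eth0: 6000000    4800    5    0    0     0          0         0  1500000    3500    0    0    0     1       0          0
"""

if __name__ == "__main__":
    report = interface_report(parse_net_dev(NET_DEV_T0), parse_net_dev(NET_DEV_T1), 10)
    for iface, metrics in report.items():
        print(iface, metrics)
```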

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
