Multiprocessors have made the processor-memory speed gap a growing problem over the years. Over a decade, DRAM access time is roughly halved, meaning memory becomes about twice as fast, while microprocessor speed increases by roughly a factor of ten. As a result, memory access latency, measured in processor clock cycles, grows by roughly a factor of five to six over that decade.
In bus-based systems, building a high-bandwidth bus between processor and memory tends to increase the latency of obtaining data from memory. When memory is physically distributed, a remote access incurs network latency on top of the local memory access latency, so remote accesses are slower still.
As machines grow larger, they accommodate more nodes for computation and communication; messages then traverse more network hops, which increases latency further. The aim of the hardware design is to sustain high bandwidth while reducing data access latency.
Understanding latency tolerance starts with an examination of how resources are used. Information moving from one node to another can be viewed as traversing a pipeline whose stages include the network interfaces at the source and destination, the switches and links of the network, and, at the endpoints, the processor, cache, and local memory.
When a single word travels through this communication pipeline, only one stage is busy at any instant; in other communication structures, either the processor or the communication architecture is busy while the rest sit idle. The goal of latency tolerance is therefore to keep these resources as fully utilized as possible by overlapping their use.
In message passing, the sender transmits a message with a send operation, and the receive operation copies the data from a buffer into the application address space. Alternatively, the receiver can issue a request message to the source of the data; this is receiver-initiated communication, and the data is then returned with a separate send operation.
The communication latency of a send operation is the sum of the time to transmit the message's data to the destination, the time for receive-side processing, and the time for the acknowledgment to arrive. The latency of a receive operation is the overhead of processing at the receiver, including the time to copy the data into the application, plus any time spent waiting if the data has not yet arrived. Latency tolerance hides these delays by overlapping other work at both the send and receive ends.
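The overlap idea above can be sketched in a toy model. This is not a real message-passing API: the "channels" are plain `queue.Queue` objects and the names `request_ch` and `data_ch` are illustrative. The receiver issues a request (receiver-initiated communication), then does useful computation while the message is in flight instead of waiting idle.

```python
import threading
import queue

# Simplified model of message passing: each "channel" is a queue.
request_ch = queue.Queue()   # receiver -> sender: request messages
data_ch = queue.Queue()      # sender -> receiver: data messages

def sender(data):
    # Wait for the receiver's request, then reply with a separate send.
    tag = request_ch.get()
    data_ch.put((tag, data))

def receiver(results):
    # Issue a request to the data's source, then overlap useful work
    # with the communication latency instead of waiting idle.
    request_ch.put("block-0")
    local_work = sum(range(1000))      # computation overlapped with latency
    tag, data = data_ch.get()          # receive copies data into our space
    results.append((tag, data, local_work))

results = []
t_send = threading.Thread(target=sender, args=([1, 2, 3],))
t_recv = threading.Thread(target=receiver, args=(results,))
t_recv.start(); t_send.start()
t_recv.join(); t_send.join()
print(results[0])   # ('block-0', [1, 2, 3], 499500)
```

In a real system the overlapped work would be independent computation or further communication, and the channels would be the network and communication assists.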
A shared address space communicates through ordinary read and write operations. A read that fetches data from another processor's cache or memory is communication initiated by the receiver. In the absence of caching, a write that deposits data in a remote memory is communication initiated by the sender.
With caching, the cache coherence protocol determines whether writes behave as sender-initiated or receiver-initiated communication. Because reads and writes in a shared address space are fine-grained and supported directly in hardware, latency tolerance is crucial for both, whichever end initiates the communication.
Explicit block transfers can also be supported in hardware or software in a shared address space, without changing user program semantics. A block transfer is initiated with a command similar to a send: the communication assist at the source interprets the command and pipelines the data from source to destination, while the communication assist at the destination pulls the data in from the network interface and stores it at the specified locations.
This differs from send-receive message passing in that the send operation itself specifies the program data structures, that is, the locations where the data is to be placed at the destination, and those locations lie within the shared address space.
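A minimal sketch of this one-sided style, under the assumption that a `bytearray` stands in for the shared address space: the initiator names the destination offset directly, the way a block-transfer command would, rather than letting a receive operation decide where the data lands. The function name `block_transfer` is illustrative only.

```python
# Sketch of a one-sided block transfer into a shared address space.
# A real machine's communication assist would perform this copy in hardware.
shared_memory = bytearray(64)          # stand-in for the shared address space

def block_transfer(src: bytes, dest_offset: int) -> None:
    # The initiator specifies the destination locations itself.
    shared_memory[dest_offset:dest_offset + len(src)] = src

block_transfer(b"hello", dest_offset=8)
print(shared_memory[8:13])             # bytearray(b'hello')
```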
In some instances a memory operation can be made non-blocking: the processor treats it as just another instruction and continues past it while the operation completes. This is easy to implement for writes, since the write can be placed in a write buffer and forwarded to the memory system while the processor proceeds. Reads are harder: a read is usually followed very quickly by an instruction that needs its value, so the value must appear to be returned immediately.
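The write-buffer idea can be illustrated with a toy model, where a `dict` stands in for memory and a `deque` for the buffer; both names are illustrative, not a real hardware interface. The "processor" deposits writes and continues immediately, and the buffered writes are retired to memory later, in program order.

```python
from collections import deque

memory = {}              # toy model of the memory system
write_buffer = deque()   # toy write buffer

def nonblocking_write(addr, value):
    # The processor does not wait for memory: it just buffers the write.
    write_buffer.append((addr, value))

def drain_write_buffer():
    # The memory system retires buffered writes in program order.
    while write_buffer:
        addr, value = write_buffer.popleft()
        memory[addr] = value

nonblocking_write(0x10, 42)
nonblocking_write(0x14, 7)
assert 0x10 not in memory   # write not yet visible in memory
drain_write_buffer()
print(memory)               # {16: 42, 20: 7}
```

In real hardware the drain happens concurrently with execution; the key property shown here is that the processor never stalls on the write itself.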
Pre-communication, widely adopted in commercial microprocessors, is another technique for hiding latency. A prefetch instruction does not deliver the data item into a register; it is non-blocking, so the fetch overlaps with subsequent computation and the latency is hidden.
Prefetched data is placed in a dedicated hardware structure known as a prefetch buffer. When the data is later needed, the read is satisfied from the prefetch buffer rather than from main memory. If the latency to hide is long, data for many loop iterations must be prefetched in advance, so the prefetch buffer must be able to hold a sufficient number of words.
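A toy software version of this scheme, with a `dict` standing in for the prefetch buffer and another for "slow" main memory (both names illustrative): the loop issues a non-blocking prefetch a fixed number of iterations ahead, so by the time a value is needed it is already staged in the buffer.

```python
# Toy prefetch buffer: prefetch stages data several iterations ahead,
# so later reads hit the buffer instead of "slow" main memory.
main_memory = {i: i * i for i in range(16)}   # pretend this is slow to access
prefetch_buffer = {}

def prefetch(addr):
    # Non-blocking: does not deliver a value to the processor, just stages it.
    prefetch_buffer[addr] = main_memory[addr]

def read(addr):
    # Reads are served from the prefetch buffer when possible.
    if addr in prefetch_buffer:
        return prefetch_buffer.pop(addr)
    return main_memory[addr]                  # miss: full memory latency

DISTANCE = 4                                  # prefetch this many iterations ahead
total = 0
for i in range(16):
    if i + DISTANCE < 16:
        prefetch(i + DISTANCE)                # issued early, overlaps with work below
    total += read(i)
print(total)   # sum of squares 0..15 = 1240
```

The prefetch distance plays the role described above: the longer the latency to hide, the further ahead the prefetches must run, and the more words the buffer must hold.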
One of the most widely used techniques for hiding and reducing latency is multithreading, a hardware-supported technique in which the processor switches to another ready thread whenever the current one stalls on a long-latency operation, keeping the hardware busy.
As latencies continue to grow longer and sophisticated microprocessors introduce new methods of extending multithreading, these two trends together suggest that multithreading will keep evolving in future designs.
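The latency-hiding effect of multithreading can be demonstrated with a small sketch using OS threads (the hardware analogue switches threads far faster, but the principle is the same). Each thread's `time.sleep` stands in for a long-latency remote access; because the waits overlap, the total elapsed time approaches the longest single latency rather than the sum of all of them.

```python
import threading
import time

def remote_access(results, idx):
    time.sleep(0.2)        # simulated long-latency memory/communication event
    results[idx] = idx     # "data" arrives

results = [None] * 4
start = time.perf_counter()
threads = [threading.Thread(target=remote_access, args=(results, i))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Four 0.2s latencies overlap: elapsed is near 0.2s, not 0.8s.
print(results, round(elapsed, 1))
```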