Hardware-Software Tradeoffs - Parallel Computer Architecture

What are the different tradeoffs between hardware and software?

Parallel architectures offer several ways to reduce hardware cost. One is to integrate the communication assist and the network less tightly into the processing node. This lowers hardware cost at the price of higher communication latency and occupancy.

Another way is to provide automatic replication and coherence in software rather than in hardware. Replication and coherence are then managed in main memory instead of in the hardware cache. This also allows the nodes and the interconnect to be built from off-the-shelf commodity parts, which reduces hardware cost further.

What are Relaxed Memory Consistency Models?

A memory consistency model for a shared address space defines the constraints on the order in which memory operations, to the same location or to different locations, appear to execute with respect to one another. Relaxed memory consistency models loosen some of these ordering constraints. Any system layer that supports a shared address space naming model needs a consistency model; this includes the programmer's interface, the user-system interface and the hardware-software interface.

System Specifications

The system specification states which orderings of memory operations the system guarantees, which reorderings it permits, and how much performance can be gained from a particular ordering.

The following specification models relax program order to different degrees.

Relaxing the Write-to-Read Program Order

This class of models lets the hardware hide the latency of write operations that miss in the first-level cache. While a write miss sits in the write buffer and is not yet visible to other processors, the processor can go on to complete reads that hit in its cache, or even a single read that misses in its cache.
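The effect of a write buffer on ordering can be illustrated with a small simulation. This is a minimal sketch, not a model of any real processor: the `Processor` class, its methods and the two-processor test in `run_litmus` are hypothetical names introduced here for illustration.

```python
class Processor:
    """A processor with a private FIFO write buffer (hypothetical model)."""

    def __init__(self, memory):
        self.memory = memory      # shared main memory: address -> value
        self.write_buffer = []    # pending writes, not yet visible to others

    def write(self, addr, value):
        # The write is buffered instead of stalling the processor.
        self.write_buffer.append((addr, value))

    def read(self, addr):
        # A read may complete before earlier buffered writes drain.
        # The newest buffered value for the same address is forwarded,
        # so dependences within the processor are preserved.
        for buffered_addr, value in reversed(self.write_buffer):
            if buffered_addr == addr:
                return value
        return self.memory[addr]

    def drain(self):
        # Make buffered writes visible to other processors, oldest first.
        for addr, value in self.write_buffer:
            self.memory[addr] = value
        self.write_buffer.clear()


def run_litmus():
    memory = {"x": 0, "y": 0}
    p0, p1 = Processor(memory), Processor(memory)
    p0.write("x", 1)      # P0: x = 1 (buffered)
    p1.write("y", 1)      # P1: y = 1 (buffered)
    r0 = p0.read("y")     # P0 reads y before its own write is visible
    r1 = p1.read("x")     # P1 reads x before its own write is visible
    p0.drain()
    p1.drain()
    return r0, r1, memory


r0, r1, memory = run_litmus()
print(r0, r1)  # prints "0 0"
```

Both reads return 0, an outcome that is impossible if every write must become visible before the following read completes. It is allowed once the write-to-read program order is relaxed.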

Relaxing the Write-to-Read and Write-to-Write Program Orders

Allowing writes to bypass earlier outstanding writes to different locations lets multiple writes be merged in the write buffer before main memory is updated. Multiple write misses can thus be overlapped and can become visible out of order. The aim is to further reduce the effect of write latency on processor stall time, and to improve communication efficiency among processors by making new data values visible to other processors sooner.
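Write merging can be sketched as a buffer that groups pending writes by memory block. This is an illustrative model under simple assumptions: the `MergingWriteBuffer` class, the block size of four words and the `memory_updates` counter are hypothetical and chosen only to make the merging visible.

```python
class MergingWriteBuffer:
    """Write buffer that merges writes to the same block (hypothetical model)."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.entries = {}          # block number -> {offset: value}
        self.memory_updates = 0    # number of block updates sent to memory

    def write(self, addr, value):
        block, offset = divmod(addr, self.block_size)
        # A later write to an already buffered block is merged into that
        # entry, bypassing writes to other blocks that were issued earlier.
        self.entries.setdefault(block, {})[offset] = value

    def flush(self, memory):
        # Each buffered block is written to memory as a single update.
        for block, words in self.entries.items():
            for offset, value in words.items():
                memory[block * self.block_size + offset] = value
            self.memory_updates += 1
        self.entries.clear()


memory = {}
buf = MergingWriteBuffer(block_size=4)
for addr, value in [(0, 10), (8, 20), (1, 11), (0, 12)]:
    buf.write(addr, value)
buf.flush(memory)
print(buf.memory_updates)  # prints 2: four writes, two memory updates
```

In this run the write to address 1 is issued after the write to address 8 but reaches memory before it, because it was merged into the older entry for block 0. This is the out-of-order visibility that relaxing the write-to-write order permits.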

Relaxing All Program Orders

By default, no program orders are assured except the data and control dependences within each process. The benefit is that multiple read requests can be outstanding at the same time, can be bypassed by later writes in program order, and can themselves complete out of order, which hides read latency. These models are particularly useful for dynamically scheduled processors, which can continue past read misses to other memory references. They also permit many of the reorderings performed by compiler optimizations.
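The three classes of models above can be summarized as a lookup table of which program orders each one relaxes. The sketch below is a simplification: the model names are taken from the headings in this section, "sequential consistency" is added only as a baseline for comparison, and the `may_reorder` function is a hypothetical helper.

```python
# Program orders relaxed by each class of model. Each pair is
# (earlier operation, later operation), with "W" a write and "R" a read,
# and the two operations going to different locations.
RELAXED = {
    "sequential consistency": set(),
    "write-to-read": {("W", "R")},
    "write-to-read and write-to-write": {("W", "R"), ("W", "W")},
    "all program orders": {("W", "R"), ("W", "W"), ("R", "R"), ("R", "W")},
}


def may_reorder(model, earlier, later, same_location=False):
    """Return True if `later` may complete before `earlier` under `model`."""
    if same_location:
        # Dependences to a location within a process are always preserved.
        return False
    return (earlier, later) in RELAXED[model]


for model in RELAXED:
    allowed = sorted(RELAXED[model])
    print(model, "->", allowed)
```

For example, `may_reorder("write-to-read", "W", "R")` is true, while `may_reorder("write-to-read", "W", "W")` is false.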

The Programming Interface

The programming interface assumes that program order need not be maintained at all between synchronization operations. All synchronization operations must be explicitly identified and labelled as such. A runtime library or the compiler then translates these synchronization operations into the order-preserving operations required by the system specification.

The system then guarantees sequentially consistent execution even though it may reorder operations between synchronization operations in any way it likes, provided it does not disrupt dependences to a location within a process. This gives the compiler enough flexibility between synchronization points for the reorderings it wants, and lets the processor perform as many reorderings as its memory model allows. At the programmer's interface, the consistency model should be at least as weak as that of the hardware interface, but it need not be the same.

Translation Mechanisms

In most microprocessors, translating the labels into order-preserving mechanisms amounts to inserting a suitable memory barrier instruction before and/or after each operation labelled as synchronization. An alternative would be for individual loads and stores to indicate which orderings to enforce, which would avoid the extra instructions. Because synchronization operations are usually infrequent, however, most microprocessors have not taken this approach.
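The translation of labels into memory barriers can be sketched as a small pass over a list of operations. This is a toy illustration only: the `insert_barriers` function, the `(kind, location, is_sync)` tuple format and the `MEMBAR` mnemonic are assumptions made for this example, not the instruction set of any particular microprocessor.

```python
def insert_barriers(ops, barrier="MEMBAR"):
    """Place a barrier before and after each operation labelled as sync.

    ops is a list of (kind, location, is_sync) tuples, where kind is a
    placeholder mnemonic such as "LD" or "ST".
    """
    out = []
    for kind, location, is_sync in ops:
        # Avoid emitting two barriers in a row between adjacent sync ops.
        if is_sync and (not out or out[-1] != barrier):
            out.append(barrier)
        out.append(f"{kind} {location}")
        if is_sync:
            out.append(barrier)
    return out


# A producer writes data, then sets a flag labelled as synchronization.
producer = [("ST", "data", False), ("ST", "flag", True)]
print(insert_barriers(producer))
# prints ['ST data', 'MEMBAR', 'ST flag', 'MEMBAR']
```

The barrier before `ST flag` keeps the data write from being reordered after the flag write. Unlabelled operations get no barriers and stay free to be reordered.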

How are Capacity Limitations Overcome?

When automatic replication and coherence are provided in hardware only in the processor cache, the cache replicates remotely allocated data directly upon reference, without first replicating it in the local main memory.

This limits the scope of local replication to the hardware cache. A block that is replaced from the cache has to be fetched from remote memory again when it is next needed. The approaches below aim to solve this replication capacity problem while still providing coherence in hardware, at the fine granularity of cache blocks, for efficiency.

Tertiary Caches

One way to address the replication capacity problem is to use a large but slower remote access cache. Such a cache is needed for functionality when the nodes of the machine are themselves small-scale multiprocessors, and it can simply be made larger for performance. This tertiary cache also holds replicated remote blocks that have been replaced from the local processor cache.
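The benefit of a remote access cache can be illustrated by counting remote fetches for a trace of block references. The sketch below is a simplified model: the `LRUCache` class, the `count_remote_fetches` function, the LRU replacement policy and the example trace are all assumptions made for illustration, and coherence is ignored.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def lookup(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            return True
        return False

    def insert(self, block):
        """Insert a block and return the evicted block, if any."""
        self.blocks[block] = True
        self.blocks.move_to_end(block)
        if len(self.blocks) > self.capacity:
            victim, _ = self.blocks.popitem(last=False)
            return victim
        return None


def count_remote_fetches(trace, cache_size, rac_size=0):
    """Count fetches from remote memory, with an optional remote access cache."""
    cache = LRUCache(cache_size)
    rac = LRUCache(rac_size) if rac_size else None
    remote_fetches = 0
    for block in trace:
        if cache.lookup(block):
            continue                          # processor cache hit
        if rac is None or not rac.lookup(block):
            remote_fetches += 1               # must go to remote memory
        victim = cache.insert(block)
        if victim is not None and rac is not None:
            rac.insert(victim)                # keep the replaced block locally
    return remote_fetches


trace = [0, 1, 2, 3] * 3
print(count_remote_fetches(trace, cache_size=2))              # prints 12
print(count_remote_fetches(trace, cache_size=2, rac_size=4))  # prints 4
```

With a two-block processor cache alone, every reference in this trace goes to remote memory. With a four-block remote access cache behind it, only the first reference to each block does.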

Cache-only Memory Architectures (COMA)

In COMA machines, every memory block in the entire main memory has a hardware tag associated with it. No node is fixed as the place where space is guaranteed to be allocated for a given memory block. Instead, data migrates to, or is replicated in, the main memories of the nodes that access it; these memories are called attraction memories. When a remote block is accessed, the hardware replicates it in the local attraction memory, brings it into the cache, and keeps both copies consistent. A data block has no specific home location and may move freely from one attraction memory to another.
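The behaviour of attraction memory can be sketched with one dictionary per node, keyed by block tag. This is a toy model: the `ComaMachine` class and its methods are hypothetical, the search over other nodes stands in for whatever lookup mechanism a real machine uses, and caches, capacity limits and coherence actions are left out.

```python
class ComaMachine:
    """Toy model of per-node attraction memories keyed by block tag."""

    def __init__(self, num_nodes):
        # Each node's attraction memory maps a block tag to its data.
        self.attraction = [dict() for _ in range(num_nodes)]

    def place(self, node, tag, data):
        self.attraction[node][tag] = data

    def read(self, node, tag):
        local = self.attraction[node]
        if tag in local:                  # tag match: block is already here
            return local[tag]
        for other in self.attraction:     # look in other attraction memories
            if tag in other:
                local[tag] = other[tag]   # replicate into the accessing node
                return local[tag]
        raise KeyError(tag)

    def migrate(self, tag, src, dst):
        # The block has no fixed home, so it can simply move.
        self.attraction[dst][tag] = self.attraction[src].pop(tag)

    def holders(self, tag):
        return [n for n, mem in enumerate(self.attraction) if tag in mem]


machine = ComaMachine(num_nodes=3)
machine.place(0, "A", 42)
machine.read(2, "A")            # node 2 attracts a replica of block A
print(machine.holders("A"))     # prints [0, 2]
machine.migrate("A", 0, 1)
print(machine.holders("A"))     # prints [1, 2]
```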

Reducing Hardware Cost

Reducing cost means moving some functions of specialized hardware into software that runs on existing hardware. Software can manage replication and coherence much more easily in main memory than in the hardware cache, so low-cost approaches tend to provide replication and coherence in main memory. For coherence to be controlled efficiently, each of the other functional components of the communication assist can still benefit from hardware specialization and integration.

Several approaches are used to reduce hardware cost. One performs access control in specialized hardware but assigns the other activities to software and commodity hardware. Another performs access control in software as well, providing a coherent shared address space abstraction on commodity nodes and networks with no specialized hardware support.
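Access control in software can be pictured as a state check that runs before every load and store. The sketch below is a single-node illustration under strong simplifications: the `SoftwareSharedMemory` class, the three state names and the `home` dictionary standing in for remote memory are hypothetical, and invalidation of other nodes' copies is not modelled.

```python
INVALID, READ_ONLY, READ_WRITE = "invalid", "read-only", "read-write"


class SoftwareSharedMemory:
    """One node's view of shared memory with software access checks."""

    def __init__(self, home):
        self.home = home            # stands in for the remote home memory
        self.local = {}             # blocks replicated in local main memory
        self.state = {}             # per-block access state
        self.protocol_calls = 0     # times the software protocol was invoked

    def _obtain(self, block, new_state):
        # Software protocol handler: fetch the block if it is missing and
        # record the new access rights.
        self.protocol_calls += 1
        if block not in self.local:
            self.local[block] = self.home[block]
        self.state[block] = new_state

    def load(self, block):
        # Access check performed in software before the load.
        if self.state.get(block, INVALID) == INVALID:
            self._obtain(block, READ_ONLY)
        return self.local[block]

    def store(self, block, value):
        # Access check performed in software before the store.
        if self.state.get(block, INVALID) != READ_WRITE:
            self._obtain(block, READ_WRITE)
        self.local[block] = value


node = SoftwareSharedMemory(home={"b": 7})
node.load("b")       # first access: the check fails and the block is fetched
node.load("b")       # the check passes; no protocol action
node.store("b", 9)   # upgrade from read-only to read-write
print(node.protocol_calls)  # prints 2
```

The check itself costs a few instructions on every access, which is the price paid for doing without specialized hardware.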

What are the Implications for Parallel Software?

A relaxed memory consistency model requires that parallel programs label the desired conflicting accesses as synchronization points. A programming language can provide support for labelling some variables as synchronization; the compiler then translates these labels into suitable order-preserving instructions. The compiler can also use the labels itself to restrict its own reordering of accesses to shared memory.
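In a high-level language, labelling usually means using the language's synchronization primitives for the conflicting accesses and ordinary variables for everything else. The sketch below uses Python's standard `threading.Event` as the labelled flag; the producer and consumer functions and the variable names are made up for this example.

```python
import threading

data = {}                    # ordinary shared data, not labelled
ready = threading.Event()    # the flag, labelled as synchronization
result = []


def producer():
    data["value"] = 42       # ordinary write
    ready.set()              # synchronization write: publishes the data


def consumer():
    ready.wait()             # synchronization read: blocks until the flag is set
    result.append(data["value"])


threads = [threading.Thread(target=consumer), threading.Thread(target=producer)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(result)  # prints [42]
```

Only the flag is accessed through a synchronization primitive. The write to `data` needs no special treatment, because it is ordered before `ready.set()` and the consumer reads it only after `ready.wait()` returns.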
