Cache Coherence and Synchronization - Parallel Computer Architecture

What is the Cache Coherence Problem?

In a multiprocessor system, copies of the same data can differ between adjacent levels of the memory hierarchy or within the same level. For instance, the copy of a data object in a cache can differ from the original object in main memory.

Because multiple processors operate in parallel and independently, their caches can end up holding different copies of the same memory block. This is the cache coherence problem. To overcome it, parallel architectures provide cache coherence schemes, which keep all cached copies of the data in a consistent state.

Inconsistency Sharing

In the figure, two processors P1 and P2 each hold a cached copy of a shared data element X. When P1 writes a new value X1 into its cache, a write-through policy immediately copies X1 to shared memory as well, so the copy in the cache of P2 (still X) no longer matches. Under a write-back policy, main memory is updated only when the modified block is replaced or invalidated in the cache of P1, so until then both main memory and the cache of P2 hold the outdated X.
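
The scenario can be reproduced with a small simulation. The following Python sketch is illustrative only: the `Cache` class, its `policy` argument, and the `run` helper are names assumed for this example, and no coherence protocol is modeled, which is exactly why the copies diverge.

```python
class Cache:
    """A private cache with no coherence protocol (illustrative only)."""

    def __init__(self, memory, policy):
        self.memory = memory      # shared main memory, modeled as a dict
        self.policy = policy      # "write-through" or "write-back"
        self.lines = {}           # address -> cached value
        self.dirty = set()        # addresses modified but not yet written back

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]               # hit: served locally

    def write(self, addr, value):
        self.lines[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value         # memory updated on every write
        else:
            self.dirty.add(addr)              # memory updated only on eviction

    def evict(self, addr):
        if addr in self.dirty:                # write-back happens here
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)


def run(policy):
    """Return what P1, P2, and main memory see after P1 writes X1."""
    memory = {"X": "X"}
    p1, p2 = Cache(memory, policy), Cache(memory, policy)
    p1.read("X")
    p2.read("X")                              # both caches now hold X
    p1.write("X", "X1")                       # P1 updates its own copy
    return p1.read("X"), p2.read("X"), memory["X"]
```

With `run("write-through")`, memory receives X1 but P2 still reads X from its cache. With `run("write-back")`, both P2 and main memory still hold X until P1 evicts the block.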

The inconsistency problem can arise from several sources. Some of them are -

  • Sharing of writable data
  • Process migration
  • I/O activity

What are Snoopy Bus Protocols?

Snoopy protocols maintain consistency between the cache memories and the shared memory in a bus-based memory system. Every cache controller monitors (snoops on) the bus for transactions that affect the blocks it holds. Consistency is maintained with one of two policies: write-invalidate or write-update.

Consistent Copies

Write Invalidate Operation

In the first figure, processors P1, P2, and P3 each hold a consistent copy of data element X in their caches, and shared memory holds X as well. Under the write-invalidate protocol, when P1 writes X1 into its cache, a command sent over the bus invalidates all other cached copies. Invalidated blocks must not be used (they are also described as dirty here, although dirty more commonly denotes a modified block that has not yet been written back to memory). Under the write-update protocol, the new data is instead broadcast over the bus to update all cached copies. With write-back caches, the memory copy is updated as well.

Write Update Operation
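
The difference between the two policies can be sketched in a few lines of Python. This is a simplified model, not a real protocol implementation: `Bus`, `SnoopyCache`, and `demo` are hypothetical names, and memory is written through on every write (an assumption made to keep the example short).

```python
class Bus:
    """Broadcast medium: every attached cache sees every transaction."""

    def __init__(self):
        self.caches = []

    def broadcast(self, sender, kind, addr, value):
        for cache in self.caches:
            if cache is not sender:
                cache.snoop(kind, addr, value)


class SnoopyCache:
    def __init__(self, bus, memory, policy):
        self.bus = bus
        self.memory = memory
        self.policy = policy      # "write-invalidate" or "write-update"
        self.lines = {}           # address -> value; absent means invalid
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: refetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.memory[addr] = value             # assumption: write-through
        kind = "invalidate" if self.policy == "write-invalidate" else "update"
        self.bus.broadcast(self, kind, addr, value)

    def snoop(self, kind, addr, value):
        if addr not in self.lines:
            return                            # no copy held, nothing to do
        if kind == "invalidate":
            del self.lines[addr]              # copy may no longer be used
        else:
            self.lines[addr] = value          # copy refreshed in place


def demo(policy):
    """P1, P2, P3 all cache X; P1 then writes X1."""
    bus, memory = Bus(), {"X": "X"}
    caches = [SnoopyCache(bus, memory, policy) for _ in range(3)]
    for cache in caches:
        cache.read("X")
    caches[0].write("X", "X1")
    still_cached = ["X" in cache.lines for cache in caches]
    values_read = [cache.read("X") for cache in caches]
    return still_cached, values_read
```

Under write-invalidate only the writer keeps its copy, and the other caches refetch on their next read. Under write-update every copy stays cached and is refreshed in place. In both cases all three processors read X1 afterwards.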

Cache Events and Actions

When memory-access and invalidation commands are executed, the following events and actions occur -

  • Read-miss – A read-miss occurs when the processor wants to read a block that is not present in its cache. This initiates a bus-read operation. If no dirty copy exists, main memory, which holds a consistent copy, supplies the block to the requesting cache. If a dirty copy exists in a remote cache, that cache supplies the block to the requesting cache instead.
  • Write-hit – If the copy is in the dirty or reserved state, the write is performed locally and the new state is dirty. If the copy is in the valid state, a write-invalidate command is broadcast to invalidate all other cached copies; the write goes through to main memory, and the resulting state after this first write is reserved.
  • Write-miss – When the block to be written is not in the local cache, the copy must come either from main memory or from a remote cache that holds a dirty block. The cache sends a read-invalidate command, which fetches the block and invalidates all other cached copies. The local copy is then updated and placed in the dirty state.
  • Read-hit – A read-hit is served from the local cache without using the bus and without causing any state transition.
  • Block replacement – When a dirty copy is replaced, it must be written back to main memory. No write-back is needed when the replaced copy is in the valid, reserved, or invalid state.
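
The events above can be read as a state machine per cache block. The sketch below assumes the four block states invalid, valid, reserved, and dirty, as in the write-once snoopy protocol; the function names and the string encoding of events and bus commands are hypothetical choices for this example.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    VALID = "valid"
    RESERVED = "reserved"
    DIRTY = "dirty"


def local_transition(state, event):
    """Return (next_state, bus_command) for a processor-side event.

    event is "read", "write", or "replace"; bus_command is None when
    the bus is not used.
    """
    if event == "read":
        if state is State.INVALID:
            return State.VALID, "bus-read"            # read-miss
        return state, None                            # read-hit: no change
    if event == "write":
        if state is State.INVALID:
            return State.DIRTY, "read-invalidate"     # write-miss
        if state is State.VALID:
            # first write: written through to memory, others invalidated
            return State.RESERVED, "write-invalidate"
        return State.DIRTY, None                      # reserved or dirty: local
    if event == "replace":
        command = "write-back" if state is State.DIRTY else None
        return State.INVALID, command
    raise ValueError(f"unknown event: {event}")


def snoop_transition(state, bus_command):
    """Return (next_state, supplies_block) for a command issued by another cache."""
    if state is State.INVALID:
        return state, False
    if bus_command == "bus-read":
        # a dirty copy is supplied to the requester; the block becomes shared
        return State.VALID, state is State.DIRTY
    if bus_command == "read-invalidate":
        return State.INVALID, state is State.DIRTY
    if bus_command == "write-invalidate":
        return State.INVALID, False
    raise ValueError(f"unknown bus command: {bus_command}")
```

For example, two successive writes to a valid block move it to reserved (using the bus once) and then to dirty (no bus traffic), which is the property the reserved state exists to provide.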

What are Directory-Based Protocols?

Snoopy cache protocols must be modified to suit a multistage network when building a large multiprocessor with hundreds of processors. Because broadcasting is very expensive in a multistage network, consistency commands are sent only to those caches that hold a copy of the block. Directory-based protocols were developed for such network-connected multiprocessors.

In a directory-based system, the data to be shared is tracked in a common directory that maintains coherence among the caches. The directory acts as a filter: a processor must obtain permission from it before loading an entry from primary memory into its cache. When an entry is changed, the directory either updates or invalidates the other caches that hold that entry.
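
A minimal sketch of the idea, assuming a directory that keeps one set of sharers per block and uses invalidation rather than update. The `Directory` class and its message log are hypothetical and stand in for point-to-point network messages.

```python
class Directory:
    """Tracks which caches hold each block (illustrative only)."""

    def __init__(self):
        self.sharers = {}     # address -> set of cache ids holding a copy
        self.messages = []    # log of point-to-point messages sent

    def load(self, cache_id, addr):
        """A cache asks permission to load a block; record it as a sharer."""
        self.sharers.setdefault(addr, set()).add(cache_id)

    def write(self, cache_id, addr):
        """A cache writes a block; invalidate only the other sharers."""
        others = self.sharers.get(addr, set()) - {cache_id}
        for other in sorted(others):
            self.messages.append(("invalidate", other, addr))
        self.sharers[addr] = {cache_id}       # writer is now the only holder
        return len(others)                    # number of messages sent
```

If caches 0, 2, and 5 out of several hundred hold X and cache 0 writes it, exactly two invalidations are sent, to caches 2 and 5, instead of a broadcast to every cache in the machine.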

What are Hardware Synchronization Mechanisms?

Synchronization is a special form of communication in which control information, rather than data, is exchanged between communicating processes that reside on the same processor or on different processors.

Multiprocessor systems implement low-level synchronization mostly with hardware mechanisms. Most multiprocessors provide atomic operations, such as memory read, write, or read-modify-write, on which synchronization primitives are built. In addition to atomic memory operations, inter-processor interrupts are also used for synchronization.
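
As an illustration of how a read-modify-write operation supports synchronization, the sketch below builds a spin lock on a test-and-set primitive. Python has no direct access to the hardware instruction, so the atomicity is emulated here with an internal `threading.Lock`; the `SpinLock` class and `count_with_lock` helper are hypothetical names for this example.

```python
import threading
import time


class SpinLock:
    """Spin lock built on an emulated atomic test-and-set."""

    def __init__(self):
        self._flag = False
        # Assumption: this inner lock stands in for the atomicity that
        # hardware gives to a single read-modify-write instruction.
        self._atomic = threading.Lock()

    def test_and_set(self):
        """Atomically set the flag and return its previous value."""
        with self._atomic:
            old = self._flag
            self._flag = True
            return old

    def acquire(self):
        # Only the caller that observes the old value False owns the lock.
        while self.test_and_set():
            time.sleep(0)                     # yield while spinning

    def release(self):
        self._flag = False                    # a plain write frees the lock


def count_with_lock(num_threads=4, increments=500):
    """Increment a shared counter from several threads under the lock."""
    lock = SpinLock()
    counter = [0]

    def worker():
        for _ in range(increments):
            lock.acquire()
            counter[0] += 1                   # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter[0]
```

Because exactly one thread can see the flag change from False to True, the increments never interleave and the final count equals `num_threads * increments`.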

How is Cache Coherency Maintained in Shared Memory Machines?

Maintaining cache coherence is an important and difficult problem in a multiprocessor in which each processor has its own cache, because the copies of a data item held in different caches can easily become inconsistent.

The major concern areas are −

  • Sharing of writable data
  • Process migration
  • I/O activity

Sharing of writable data

Suppose two processors P1 and P2 hold the same data element X in their local caches. When P1 writes to X and the caches are write-through, main memory is updated as well. The copy in the cache of P2 is now outdated, so when P2 reads X it does not obtain the current value.

Process migration

Initially, data element X is present in the cache of P1 and not in the cache of P2. In the first case, a process on P2 writes to X and then migrates to P1. When the process reads X on P1, it finds the outdated copy in the cache of P1 and cannot read the current value. In the second case, with write-back caches, a process on P1 writes to X and then migrates to P2. When the process reads X on P2, the value is fetched from main memory, which still holds the outdated version of X.
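
Both migration cases can be traced step by step. In this sketch the caches and main memory are plain dictionaries, and the two function names are hypothetical; the first case assumes write-through caches and the second assumes write-back caches.

```python
def migrate_after_write_through():
    """A process writes X on P2, then migrates to P1 and reads X."""
    memory = {"X": "X"}
    p1_cache = {"X": "X"}                     # P1 already caches X
    p2_cache = {}
    p2_cache["X"] = "X1"                      # process on P2 writes X ...
    memory["X"] = "X1"                        # ... written through to memory
    value_read = p1_cache["X"]                # after migration: hit on old copy
    return value_read, memory["X"]


def migrate_after_write_back():
    """A process writes X on P1, then migrates to P2 and reads X."""
    memory = {"X": "X"}
    p1_cache = {"X": "X"}
    p2_cache = {}
    p1_cache["X"] = "X1"                      # write-back: memory not updated
    p2_cache["X"] = memory["X"]               # after migration: miss, fetch
    return p2_cache["X"], p1_cache["X"]
```

In both cases the migrated process reads the old value X even though it wrote X1 itself before migrating.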

I/O activity

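Consider a two-processor architecture in which an I/O device is attached to the shared bus, and both caches initially hold data element X. When the I/O device receives a new value of X, it stores that value directly in main memory, so the cached copies become outdated and a processor that reads X obtains the old value. In the other direction, when a processor writes to X in its write-back cache and the I/O device then transmits X from main memory, the device sends the outdated copy.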
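
A short trace of both directions, using dictionaries for the cache and main memory. The function name and the values `X_in` and `X_out` are hypothetical, and the cache is assumed to be write-back with no coherence support for I/O.

```python
def io_scenario():
    """Return (value P1 reads after device input, value the device outputs)."""
    memory = {"X": "X"}
    p1_cache = {"X": memory["X"]}             # P1 has cached X

    # Input: the device stores new data directly in main memory,
    # bypassing the caches.
    memory["X"] = "X_in"
    stale_read = p1_cache["X"]                # cache hit returns the old X

    # Output: P1 updates X in its write-back cache; memory is unchanged.
    p1_cache["X"] = "X_out"
    device_sends = memory["X"]                # device reads memory, not cache
    return stale_read, device_sends
```

The processor misses the input `X_in`, and the device later transmits `X_in` rather than the newer `X_out` written by the processor.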

What is Uniform Memory Access (UMA)?

Uniform Memory Access (UMA) refers to an architecture in which all processors share the same physical memory and access it uniformly. The most widely used UMA machines are Symmetric Multiprocessors (SMPs), which are common in servers. In an SMP, every processor has equal access to all system resources, such as memory, disks, and other I/O devices.

What is Non-Uniform Memory Access (NUMA)?

In the NUMA design, multiple SMP clusters, each with its own internal shared network, are connected by a scalable message-passing network. NUMA is therefore a logically shared, physically distributed memory architecture.

In a NUMA machine, the cache controller of a processor determines whether a memory reference is local to the memory of its own SMP or remote. To reduce the number of remote accesses, NUMA architectures usually use caches that can hold remote data. Because caches are involved, cache coherency must be maintained, and such systems are therefore also called CC-NUMA (Cache Coherent NUMA).
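
The local-or-remote decision can be sketched as a lookup on the physical address. The layout below is an assumption made for illustration (each node owns one contiguous 1 GiB range); real machines use their own address maps, and `NODE_MEMORY_BYTES`, `home_node`, and `classify` are hypothetical names.

```python
NODE_MEMORY_BYTES = 1 << 30   # assumption: 1 GiB of contiguous memory per node


def home_node(phys_addr):
    """Return the index of the node whose memory holds this address."""
    return phys_addr // NODE_MEMORY_BYTES


def classify(phys_addr, requesting_node):
    """Decide whether a reference is served locally or over the network."""
    if home_node(phys_addr) == requesting_node:
        return "local"
    return "remote"
```

A reference from node 0 to address 0x1000 is local, while a reference from node 0 to an address in the second gigabyte is remote and is a candidate for caching under a coherence protocol.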

What is Cache Only Memory Architecture (COMA)?

In a COMA machine, the main memory of each node acts as a large cache (a DRAM cache). Data blocks are hashed to a location in the DRAM cache according to their addresses. Data that is fetched remotely is stored in the local main memory. Data blocks have no fixed home location, so they can move freely throughout the system.

COMA architectures mostly use a hierarchical message-passing network organized as a tree. Each switch in the tree contains a directory of the data elements held in its subtree. Because data has no home location, it must be searched for explicitly: a remote access requires a traversal along the switches of the tree, searching their directories for the required data. If a switch receives multiple requests for the same data from its subtree, it combines them into a single request and sends that to its parent. When the requested data returns, the switch sends copies of it down its subtree to each requester.
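
The directory search can be sketched as a walk up the tree until some directory lists the block, followed by a walk down to the memory that holds it. This is a simplified model with hypothetical names (`Node`, `place`, `locate`, `demo`); request combining and block migration are not modeled.

```python
class Node:
    """A switch (interior node) or a memory module (leaf) in the tree."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.directory = set()    # blocks held anywhere in this subtree
        if parent is not None:
            parent.children.append(self)


def place(leaf, block):
    """Record that `leaf` holds `block` in its own and all ancestor directories."""
    node = leaf
    while node is not None:
        node.directory.add(block)
        node = node.parent


def locate(start, block):
    """Return the names of the nodes visited while searching for `block`."""
    path = [start.name]
    node = start
    while block not in node.directory:        # climb until a directory hits
        if node.parent is None:
            raise KeyError(block)             # block is nowhere in the system
        node = node.parent
        path.append(node.name)
    while node.children:                      # descend toward the holder
        node = next(c for c in node.children if block in c.directory)
        path.append(node.name)
    return path


def demo():
    """Two-level tree with four memories; block B lives in m3."""
    root = Node("root")
    s0, s1 = Node("s0", root), Node("s1", root)
    m0, _m1 = Node("m0", s0), Node("m1", s0)
    m2, m3 = Node("m2", s1), Node("m3", s1)
    place(m3, "B")
    return locate(m0, "B"), locate(m2, "B"), locate(m3, "B")
```

A request from m0 has to climb to the root before any directory knows about B, while a request from m2 is resolved one level up at s1. This traversal is the cost of not having a fixed home location.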

What are the differences between COMA and CC-NUMA?

Some of the significant differences between COMA and CC-NUMA are as follows -

  • COMA tends to be more flexible than CC-NUMA, because COMA supports the migration and replication of data transparently, without involving the operating system.
  • COMA machines are expensive and complex to build, because they require non-standard memory management hardware and their coherency protocol is harder to implement.
  • Remote accesses in COMA are often slower than in CC-NUMA: because data has no home location, the tree network must be traversed to find it.

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd