Compiler Intermediate Code Generation - Compiler Design

What is Intermediate Code in Compiler Design?

Intermediate code is required for the following reasons:


  • If the compiler translates the source language directly into the target machine language, without generating intermediate code, a full native compiler is required for each new machine.
  • Intermediate code removes the need for a new full compiler for every distinct machine: the analysis part of the compiler stays the same for all of them.
  • Only the second part of the compiler, synthesis, is changed according to the target machine.
  • Code optimization techniques can be applied to the intermediate code, which makes it easy to apply modifications that improve the performance of the generated code.
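The retargeting argument above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: the tuple IR format and both target instruction syntaxes (an invented register machine and an invented stack machine) are hypothetical, not real instruction sets. The point is that the front end produces the IR once, and only the back end changes per target.

```python
# Toy illustration of retargeting. The IR format and both target
# syntaxes below are hypothetical, invented for this sketch.

# Machine-independent IR, produced once by the front end (analysis part).
# Each instruction is (op, arg1, arg2, result); this encodes a = b + c * d.
IR = [
    ("*", "c", "d", "r1"),
    ("+", "b", "r1", "r2"),
    ("=", "r2", None, "a"),
]


def emit_register_machine(ir):
    """Back end for an invented three-operand register machine."""
    mnemonic = {"*": "MUL", "+": "ADD"}
    out = []
    for op, a1, a2, res in ir:
        if op == "=":
            out.append(f"MOV {res}, {a1}")
        else:
            out.append(f"{mnemonic[op]} {res}, {a1}, {a2}")
    return out


def emit_stack_machine(ir):
    """Back end for an invented stack machine."""
    mnemonic = {"*": "mul", "+": "add"}
    out = []
    for op, a1, a2, res in ir:
        out.append(f"push {a1}")
        if op != "=":
            out.append(f"push {a2}")
            out.append(mnemonic[op])
        out.append(f"pop {res}")
    return out


# Same IR, two targets: only the synthesis part differs.
print("\n".join(emit_register_machine(IR)))
print("\n".join(emit_stack_machine(IR)))
```

Supporting a third machine in this sketch would mean writing one more emit function; the IR and whatever produced it stay untouched.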

How is Intermediate Code Represented in Compiler Design?

There are many ways of representing intermediate code, and each has its own benefits.

  • High-Level IR – Intermediate code that is close to the source language is called high-level intermediate code. It is easily generated from the source code, and code modifications that improve performance are easy to apply to it. However, it is not well suited to target machine optimization.
  • Low-Level IR – Intermediate code that is close to the target machine and suited to tasks such as memory allocation and instruction selection is called low-level intermediate code. Machine-dependent optimizations benefit most from this form.

Intermediate code can be either language-specific (e.g., bytecode for Java) or language-independent (e.g., three-address code).

What is Three-Address Code in Compiler Design?

The intermediate code generator receives its input from its predecessor phase, the semantic analyzer, in the form of an annotated syntax tree. It then converts the syntax tree into a linear representation. Because intermediate code is machine-independent, the code generator assumes it has an unlimited number of memory locations available when generating the code.

For instance, the intermediate code generator divides an expression into sub-expressions and generates the corresponding code for each one. Temporary names such as r1, r2, and r3 stand for registers in the target program.
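A minimal sketch of this splitting in Python, assuming a hypothetical syntax-tree type BinOp whose leaves are plain variable names. It walks the tree bottom-up, emits one three-address instruction per operator, and names each intermediate result with a fresh temporary r1, r2, and so on. The statement a = b + c * d is an illustrative choice, not taken from the text.

```python
from dataclasses import dataclass
from itertools import count
from typing import Union


@dataclass
class BinOp:
    """Hypothetical syntax-tree node for a binary operator."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[str, BinOp]  # a leaf is just a variable name


def gen_three_address(target: str, expr: Expr) -> list[str]:
    """Linearize `expr` into three-address code assigning to `target`."""
    code: list[str] = []
    temps = count(1)  # yields 1, 2, 3, ... for r1, r2, r3, ...

    def walk(e: Expr) -> str:
        if isinstance(e, str):
            return e  # a variable needs no code of its own
        left = walk(e.left)
        right = walk(e.right)
        temp = f"r{next(temps)}"
        code.append(f"{temp} = {left} {e.op} {right}")
        return temp

    result = walk(expr)
    code.append(f"{target} = {result}")
    return code


# a = b + c * d
tree = BinOp("+", "b", BinOp("*", "c", "d"))
print("\n".join(gen_three_address("a", tree)))
```

This prints r1 = c * d, r2 = b + r1, and a = r2. The sketch allocates a fresh temporary for every operator and never reuses one, in line with the assumption of unlimited storage; it does not detect common sub-expressions.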

In three-address code, each instruction refers to at most three addresses (typically two operands and one result) to compute an expression. Three-address code is represented in two forms: quadruples and triples.

Quadruples

In the quadruple representation, each instruction is divided into four fields: op, arg1, arg2, and result. An example in quadruple format is shown below:

Op   arg1   arg2   result
*    c      d      r1
+    b      r1     r2
+    r2     r1     r3
=    r3            a
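To make the table concrete, here is a sketch that stores the quadruples above as Python records and interprets them. The Quad type, the run_quads helper, and the sample input values b = 1, c = 2, d = 3 are illustrative assumptions.

```python
import operator
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    op: str
    arg1: str
    arg2: Optional[str]  # unused by the "=" (copy) instruction
    result: str


# The quadruple table above, one row per Quad.
QUADS = [
    Quad("*", "c", "d", "r1"),
    Quad("+", "b", "r1", "r2"),
    Quad("+", "r2", "r1", "r3"),
    Quad("=", "r3", None, "a"),
]

OPS = {"+": operator.add, "*": operator.mul}


def run_quads(quads, env):
    """Execute quadruples in order and return the final name -> value map."""
    env = dict(env)  # copy so the caller's mapping is not modified
    for q in quads:
        if q.op == "=":
            env[q.result] = env[q.arg1]
        else:
            env[q.result] = OPS[q.op](env[q.arg1], env[q.arg2])
    return env


env = run_quads(QUADS, {"b": 1, "c": 2, "d": 3})
print(env["a"])  # 13
```

With these values r1 = 6, r2 = 7, and r3 = 13. Because every quadruple names its result explicitly, an instruction can be moved without rewriting the instructions that use it.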

Triples

In the triples representation, each instruction is divided into three fields: op, arg1, and arg2. Results are not named; instead, the result of each sub-expression is referred to by the position of the triple that computes it, written (0), (1), and so on. Triples are similar to a DAG and a syntax tree: when they represent expressions, they are equivalent to a DAG.

       Op   arg1   arg2
(0)    *    c      d
(1)    +    b      (0)
(2)    +    (1)    (0)
(3)    =    (2)
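The same computation in triple form can be sketched as follows. A hypothetical Ref(i) operand stands for "the result of triple (i)", so no result names are stored. The table's final = row has no explicit target, so this sketch simply records the assigned value as the result of triple (3). The input values are made up for illustration.

```python
import operator
from typing import NamedTuple, Optional, Union


class Ref(NamedTuple):
    """Hypothetical operand type: refers to the result of the triple at `index`."""
    index: int


class Triple(NamedTuple):
    op: str
    arg1: Union[str, Ref]
    arg2: Optional[Union[str, Ref]]


# The triples table above; a triple's position in the list is its number.
TRIPLES = [
    Triple("*", "c", "d"),        # (0)
    Triple("+", "b", Ref(0)),     # (1)
    Triple("+", Ref(1), Ref(0)),  # (2)
    Triple("=", Ref(2), None),    # (3) target not named in the table
]

OPS = {"+": operator.add, "*": operator.mul}


def run_triples(triples, env):
    """Evaluate triples in order; results[i] holds the value of triple (i)."""
    results = []

    def value(arg):
        return results[arg.index] if isinstance(arg, Ref) else env[arg]

    for t in triples:
        if t.op == "=":
            results.append(value(t.arg1))
        else:
            results.append(OPS[t.op](value(t.arg1), value(t.arg2)))
    return results


print(run_triples(TRIPLES, {"b": 1, "c": 2, "d": 3}))  # [6, 7, 13, 13]
```

Note how triple (0) is referenced twice: the shared c * d result is what makes the triple form equivalent to a DAG.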

During optimization, triples face a code immovability problem: optimization may change the order or position of an expression, and because results are referenced by position, every reference to a moved triple must then be updated.
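The following sketch shows why moving a triple is costly. A hypothetical helper, move_triple, relocates one triple and then has to rewrite every positional reference in the list. The Ref and Triple types and the three-triple program are assumptions for illustration, and the helper assumes the caller has already checked that the move respects data dependencies; it does not check this itself.

```python
from typing import NamedTuple, Optional, Union


class Ref(NamedTuple):
    """Hypothetical operand type: the result of the triple at position `index`."""
    index: int


class Triple(NamedTuple):
    op: str
    arg1: Union[str, Ref]
    arg2: Optional[Union[str, Ref]]


def move_triple(triples, src, dst):
    """Return a new list with triple `src` moved to position `dst`.

    Because triples refer to each other by position, every Ref in the
    list must be renumbered. Assumes the caller has already checked that
    the move respects data dependencies.
    """
    order = list(range(len(triples)))
    order.insert(dst, order.pop(src))
    new_pos = {old: new for new, old in enumerate(order)}

    def fix(arg):
        return Ref(new_pos[arg.index]) if isinstance(arg, Ref) else arg

    return [Triple(triples[old].op, fix(triples[old].arg1), fix(triples[old].arg2))
            for old in order]


# Hypothetical program: (0) a + b, (1) c * d, (2) (0) - (1)
TRIPLES = [
    Triple("+", "a", "b"),
    Triple("*", "c", "d"),
    Triple("-", Ref(0), Ref(1)),
]

for t in move_triple(TRIPLES, 1, 0):
    print(t)
```

Moving the c * d triple ahead of a + b forces the subtraction to change from (0) - (1) to (1) - (0), even though the computation itself is unchanged. In a long program, a single move can touch many instructions.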

Indirect Triples

Indirect triples are an enhancement of the triples representation. Instead of relying on positions alone, they keep a separate list of pointers to the triples, which lets the optimizer reposition sub-expressions and produce optimized code without rewriting the triples themselves.
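A sketch of indirect triples in Python. The Ref and Triple types, the table contents, and the input values are hypothetical. The triples live in a table that is never reordered, and a separate statement list holds pointers (here, indices) into that table in execution order. Reordering the statement list changes the execution order without touching any Ref.

```python
import operator
from typing import NamedTuple, Optional, Union


class Ref(NamedTuple):
    """Hypothetical operand type: index into the triple table (not the statement list)."""
    index: int


class Triple(NamedTuple):
    op: str
    arg1: Union[str, Ref]
    arg2: Optional[Union[str, Ref]]


# Triple table: once created, entries are never moved.
TABLE = [
    Triple("+", "a", "b"),        # 0
    Triple("*", "c", "d"),        # 1
    Triple("-", Ref(0), Ref(1)),  # 2
]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run_indirect(table, statements, env):
    """Execute the triples pointed to by `statements`, in that order."""
    results = {}  # table index -> computed value

    def value(arg):
        return results[arg.index] if isinstance(arg, Ref) else env[arg]

    for i in statements:
        t = table[i]
        results[i] = OPS[t.op](value(t.arg1), value(t.arg2))
    return results


env = {"a": 10, "b": 1, "c": 2, "d": 3}
statements = [0, 1, 2]               # pointers into TABLE, in execution order
before = run_indirect(TABLE, statements, env)

statements = [1, 0, 2]               # optimizer reorders pointers only
after = run_indirect(TABLE, statements, env)

print(before[2], after[2])  # 5 5
```

The price of this flexibility is one extra level of indirection: the statement list must be kept alongside the triple table.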

What are Declarations in Compiler Design?

A variable or procedure must be declared before it is used. Declaring it may require allocating space in memory and entering its name in the symbol table. The program is coded and designed with the structure of the target machine in mind, but source code cannot always be converted exactly into the target language.

The whole program is treated as a collection of procedures and sub-procedures, and all names local to a procedure are declared within it. Names are allocated memory in sequence, in consecutive locations. An offset variable is used and is initially set to zero {offset = 0}, denoting the base address.

The target machine architecture and the source programming language differ in how names are stored, so relative addressing is used. The first name is allocated starting at memory location 0 {offset = 0}, and each subsequently declared name is allocated the memory immediately after the previous one.

Example:

Consider an illustration in the C programming language in which an integer variable is assigned 2 bytes of memory and a float variable is assigned 4 bytes.

A procedure enter is used to record these details in the symbol table. It creates an entry in the symbol table for the variable name, recording its type and its relative address in the data area.
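The allocation scheme and the enter procedure can be sketched together as follows. This is a minimal illustration, not a real compiler's symbol table: the SymbolTable class, the Entry record, the declare helper, the enter(name, type_, offset) signature, and the declarations int a; float b; int c; are all hypothetical. The widths follow the C example in the text (int = 2 bytes, float = 4 bytes).

```python
from dataclasses import dataclass

# Widths from the example in the text: int = 2 bytes, float = 4 bytes.
WIDTH = {"int": 2, "float": 4}


@dataclass
class Entry:
    name: str
    type: str
    offset: int  # relative address in the data area


class SymbolTable:
    """Hypothetical symbol table with sequential (relative) allocation."""

    def __init__(self):
        self.entries: dict[str, Entry] = {}
        self.offset = 0  # base address: {offset = 0}

    def enter(self, name, type_, offset):
        """Create a symbol-table entry for `name` (signature is an assumption)."""
        self.entries[name] = Entry(name, type_, offset)

    def declare(self, name, type_):
        """Allocate the next consecutive slot for `name` and record it."""
        self.enter(name, type_, self.offset)
        self.offset += WIDTH[type_]


# Hypothetical declarations: int a; float b; int c;
table = SymbolTable()
for type_, name in [("int", "a"), ("float", "b"), ("int", "c")]:
    table.declare(name, type_)

for e in table.entries.values():
    print(e.name, e.type, e.offset)
print("next free offset:", table.offset)
```

Here a gets offset 0, b gets offset 2 (after the 2-byte int), c gets offset 6 (after the 4-byte float), and the next free offset is 8.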

All rights reserved © 2020 Wisdom IT Services India Pvt. Ltd
