# Static semantics of JavaC - JVM

The capital Latin letters A, B, C have been used to denote primitive types.

We use the same letters now to denote classes and interfaces. This is on purpose, since in the next chapter, classes and interfaces will be used as types, too. In the present chapter, classes and interfaces are treated as modules only.

By convention, classes or interfaces A, B, C are identifiers starting with an upper case letter. Interfaces are often denoted by I, J.

Syntax of JavaC

Fig. below contains a schematic definition of a Java class. The parts in angle brackets are optional. The keywords public, abstract and final are called modifiers. We say that the class is m, if the modifier m appears in the definition of the class.

Constraint. The class definition in Fig. below must satisfy the following constraints:

1. The type B must be a class and $I_1, \dots, I_n$ must be different interfaces.
2. The class B is not final.
3. If A is final, then it is not abstract.
4. If A = Object, then there is no extends clause.
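The class header constraints can be tried out directly with javac. The following is a minimal sketch. The names ClassHeaderDemo, Shape, Named, Figure and Circle are hypothetical. The declarations that javac rejects are kept as comments so that the file compiles.

```java
import java.lang.reflect.Modifier;

// Hypothetical demo: reflection prints the modifiers, the superclass and the
// superinterfaces that the declarations below produce.
class ClassHeaderDemo {
    public static void main(String[] args) {
        int mods = Circle.class.getModifiers();
        System.out.println("final: " + Modifier.isFinal(mods));       // final: true
        System.out.println("abstract: " + Modifier.isAbstract(mods)); // abstract: false
        System.out.println("super: " + Circle.class.getSuperclass().getSimpleName()); // super: Figure
        System.out.println("interfaces: " + Circle.class.getInterfaces().length);    // interfaces: 2
    }
}

interface Shape { }          // plays the role of I1
interface Named { }          // a different interface, I2
abstract class Figure { }    // plays the role of B: a class that is not final (constraint 2)

// final, hence not abstract (constraint 3); B is a class and the
// implemented interfaces are pairwise different (constraint 1).
final class Circle extends Figure implements Shape, Named { }

// Each of the following declarations is rejected by javac:
// class Bad1 extends Shape { }            // constraint 1: B must be a class
// class Bad2 implements Named, Named { }  // constraint 1: interfaces must be different
// class Bad3 extends Circle { }           // constraint 2: B must not be final
// abstract final class Bad4 { }           // constraint 3: final excludes abstract
```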

If the extends clause is present, we say that A is a direct subclass of B or B is a direct superclass of A and define $A \prec_d B$. If the extends clause is missing and $A \neq \mathrm{Object}$, we define $A \prec_d \mathrm{Object}$. If the implements clause is present, we say that $I_1, \dots, I_n$ are direct superinterfaces of A and define $A \prec_d I_i$ for $i = 1, \dots, n$.

Syntax of a Java class

The syntax of interfaces differs slightly from the syntax of classes. If the extends clause is present, we say that I is a direct subinterface of $J_i$ or $J_i$ is a direct superinterface of I and define $I \prec_d J_i$ for $i = 1, \dots, n$.

Constraint. The interface definition in Fig. below must satisfy the following constraints:

1. The types $J_1, \dots, J_n$ are different interfaces.
2. The interface I is implicitly abstract.

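Let $\prec_h$ be the transitive closure of $\prec_d$. This means that $A \prec_h B$ holds if, and only if, B can be reached from A by a finite, non-zero number of direct $\prec_d$ steps. The relation $\prec_h$ is called the inheritance relation. The following terminology is used for classes A, B and interfaces I, J:

- $A \prec_h B$: A is a subclass of B, and B is a superclass of A.
- $I \prec_h J$: I is a subinterface of J, and J is a superinterface of I.
- $A \prec_h I$: I is a superinterface of A, and A implements I.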

It is not allowed that $A \prec_h A$. Cycles in the inheritance relation are detected at compile-time or at run-time when classes are dynamically loaded.

Constraint. The inheritance relation $\prec_h$ must be acyclic.

The relation $A \preceq B$ is defined as $A \prec_h B$ or $A = B$. If $A \preceq B$, then one can say that each A ‘is-a’ B.
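The relations $\prec_h$ and $\preceq$ can be computed from $\prec_d$ by graph reachability. The following is a minimal sketch under two assumptions: types are identified by plain strings, and $\prec_d$ is given as a map from each type to its direct supertypes. The class Inheritance and the sample type names are hypothetical. Under this representation, the acyclicity constraint becomes "no type reaches itself".

```java
import java.util.*;

// Hypothetical model: `direct` maps each type to its direct supertypes (the relation ≺d).
class Inheritance {
    private final Map<String, List<String>> direct;

    Inheritance(Map<String, List<String>> direct) { this.direct = direct; }

    // All B with A ≺h B: the types reachable from A in one or more ≺d steps.
    Set<String> strictSupertypes(String a) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(direct.getOrDefault(a, List.of()));
        while (!todo.isEmpty()) {
            String b = todo.pop();
            if (seen.add(b)) {                 // visit each type once, so cycles terminate
                todo.addAll(direct.getOrDefault(b, List.of()));
            }
        }
        return seen;
    }

    boolean inherits(String a, String b) { return strictSupertypes(a).contains(b); } // A ≺h B
    boolean isA(String a, String b) { return a.equals(b) || inherits(a, b); }       // A ⪯ B

    // The constraint: ≺h must be acyclic, i.e. no type satisfies A ≺h A.
    // Types without direct supertypes cannot lie on a cycle, so checking the keys suffices.
    boolean isAcyclic() {
        for (String a : direct.keySet()) {
            if (inherits(a, a)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Inheritance h = new Inheritance(Map.of(
            "Point3D", List.of("Point", "Comparable"),
            "Point", List.of("Object"),
            "Comparable", List.of()));
        System.out.println(h.inherits("Point3D", "Object")); // true
        System.out.println(h.isA("Point", "Point"));         // true
        System.out.println(h.inherits("Point", "Point"));    // false
        System.out.println(h.isAcyclic());                   // true
    }
}
```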

Syntax of a Java interface

Lemma. The relation $\preceq$ is a partial ordering:

1. $A \preceq A$.
2. If $A \preceq B$ and $B \preceq C$, then $A \preceq C$.
3. If $A \preceq B$ and $B \preceq A$, then $A = B$.

The relation $\preceq$ restricted to classes is a finite tree. The root of the tree is the class Object. In mathematical terms, this can be expressed as follows.

Lemma. Let A, B and C be classes. Then we have:

1. $A \preceq \mathrm{Object}$ (every class is a subclass of Object).
2. If $A \prec_d B$ and $A \prec_d C$, then $B = C$.
3. If $A \preceq B$ and $A \preceq C$, then $B \preceq C$ or $C \preceq B$.

Not much can be said with respect to interfaces except that interfaces have no superclasses but only superinterfaces.

Lemma. If A is an interface and $A \preceq B$, then B is an interface, too.

Classes and interfaces are collected in so-called packages.

Definition. A package is a collection of classes and interfaces.
Definition. A JavaC program is a set of packages.

The usual way to tell the compiler to which package a class or interface belongs is to prepend a package statement to the file in which the class or interface is defined. A package statement has the following form:

package PackageName;

A package name is a sequence of identifiers separated by dots. The JLS proposes a convention for unique package names based on Internet domain names, written in reverse order. For example:

package ch.ethz.inf.staerk;

Syntax of JavaC

Inside the package one can refer to a class by its simple name, e.g. Point3D. Outside the package one has to use the fully qualified name, e.g. ch.ethz.inf.staerk.Point3D.

Since the dot is overloaded, an expression ‘x.x.x’ can denote different things in different contexts [18, §6.5].

Definition. We say that a type B is accessible from A, if one of the following conditions is true:

1. B is a primitive type (Table 3.1), or
2. B is in the same package as A, or
3. B is public.

Constraint. The inheritance relation must satisfy the following constraint:

If $A \prec_d B$, then B is accessible from A.

Fig. below defines what is added in JavaC to the syntax of JavaI, namely return statements and expressions for fields and method invocations. Method invocations can occur inside expressions or as top-level statements. Fig. below uses the following universes:

Class: (fully qualified) class and interface names.

Class members

Class members are constructor declarations, field declarations, method declarations and static initializers.

Field declarations. A field declaration in a class C has the following syntax:

We refer to the field as C/field. The type A is called the declared type of the field. We say that the field is m, if the modifier m appears in the declaration of the field. If the optional part ‘= exp’ is present, then the assignment field = exp is called the initializer of the field.

Constraint. A field declaration must satisfy the following constraints:

1. The type A is accessible from C.
2. The field is declared at most once in C.
3. If the field is final, then a variable initializer must appear in the declaration of the field.

Fields are classified according to whether they are static or not:

- If the field is static, then it is called a class field.
- If the field is not static, then it is called an instance field.

Class fields correspond to global variables in a module, whereas instance fields correspond to fields in a record.

Method declarations. A method declaration in a class C has the following syntax:

⟨public | protected | private⟩ ⟨abstract⟩ ⟨final⟩ ⟨static⟩ ⟨native⟩ A meth($B_1\ loc_1, \dots, B_n\ loc_n$) body

The method body can be:

body := ‘;’ | block

We refer to the method as C/msig, where msig is the signature of the method, i.e., msig is the expression $meth(B_1, \dots, B_n)$. We say that C/msig is m, if the modifier m appears in the declaration of msig in C. The universe MSig consists of method signatures, i.e., method names together with the number of arguments and the types of the arguments.

Constraint. A method declaration must satisfy the following constraints:

1. The name A is a type or the keyword void. It is called the declared return type of the method.
2. The types $A, B_1, \dots, B_n$ are accessible from C.
3. The identifiers $loc_1, \dots, loc_n$ are pairwise different. They are called the formal parameters of the method. We say that the parameter $loc_j$ is declared of type $B_j$.
4. The formal parameters $loc_1, \dots, loc_n$ are different from identifiers in local variable declarations of body.
5. If a variable loc is used in an expression in body, then loc is a formal parameter of the method or loc is in the scope of a local variable declaration of loc.
6. If the declared return type A is different from void, then each execution path of body must be terminated with a statement ‘return exp;’.
7. If the declared return type A is void, then each execution path of body must be terminated by the statement ‘return;’. (Otherwise, the compiler inserts a return statement at the end of the body.)
8. The method msig is declared at most once in C.
9. The method is abstract if, and only if, its body is the semicolon.
10. If C/msig is abstract, then C is abstract.
11. If C/msig is private, final or static, then it is not abstract.
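Several of these constraints can be observed with javac. The following is a minimal sketch with hypothetical names (MethodDeclDemo, declaresAbstract, Shape2D). The declarations that violate constraints 3, 4 and 11 are commented out because javac rejects them.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class MethodDeclDemo {
    // Returns true if class c declares a method with the given name that is abstract.
    static boolean declaresAbstract(Class<?> c, String name) {
        for (Method m : c.getDeclaredMethods()) {
            if (m.getName().equals(name)) {
                return Modifier.isAbstract(m.getModifiers());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(declaresAbstract(Shape2D.class, "area"));            // true
        System.out.println(declaresAbstract(Shape2D.class, "unit"));            // false
        System.out.println(Modifier.isAbstract(Shape2D.class.getModifiers())); // true
        System.out.println(Shape2D.unit());                                     // cm^2
    }
}

// Constraint 10: a class that declares an abstract method must be abstract.
abstract class Shape2D {
    abstract double area();            // constraint 9: abstract, so the body is ';'

    static String unit() {             // constraint 6: every path ends in 'return exp;'
        return "cm^2";
    }

    // Each of the following is rejected by javac:
    // void twice(int x, int x) { }        // constraint 3: parameters must differ
    // void shadow(int x) { int x = 0; }   // constraint 4: local clashes with a parameter
    // private abstract void p();          // constraint 11
    // static abstract void s();           // constraint 11
    // final abstract void f();            // constraint 11
}
```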

Note that void is not a real type. It is not allowed to declare a formal parameter or a local variable to be of type void.

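Methods are classified according to whether they are static or not:

- If the method is static, then it is called a class method.
- If the method is not static, then it is called an instance method.

Class methods correspond to procedures in a module.

Static initializers. A static initializer has the following syntax: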

static block

We assume that all static initialization blocks and all static _eld initializers of a class are combined in textual order in one single static initialization block.

This block is called the initializer of the class or interface. It is executed when the class is initialized.

Constraint. The keyword return is not allowed to appear in the block of a static initializer.
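When a class is initialized, Java executes its static field initializers and static blocks in textual order, which is what the combined block models. The following is a minimal sketch that records this order. The names InitOrderDemo, Config and log are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Records the order in which the parts of Config's class initializer run.
class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    public static void main(String[] args) {
        System.out.println(Config.b); // 20: first active use, so Config is initialized here
        System.out.println(LOG);      // [a, block, b]
    }
}

class Config {
    static int a = log("a", 1);           // field initializer: runs first

    static {                              // static block: runs second
        InitOrderDemo.LOG.add("block");
        a = a + 1;
    }

    static int b = log("b", a * 10);      // field initializer: runs third, a is already 2

    static int log(String what, int value) {
        InitOrderDemo.LOG.add(what);
        return value;
    }
}
```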

Interface members

The members of an interface are constant declarations and abstract method declarations.

Constant declarations. A constant declaration in an interface I has the following syntax:

The expression exp can be an arbitrary expression; it need not be a compile-time constant. It is evaluated when the interface is initialized. Usually the identifier field consists of upper case letters only.

Constraint. A constant declaration must satisfy the following constraints:

1. The type A is accessible from I.
2. A field is declared at most once in I.
3. The field is implicitly public, static and final.

Although an interface does not contain static initialization blocks, we assume that all field initializers are combined in textual order as a sequence of assignments in one block which is called the initializer of the interface I.

Abstract method declarations. An abstract method declaration in an interface I has the following syntax:

If a class implements an interface, then all abstract methods of the interface must be implemented in the class. What this means will be explained below.

Constraint. An abstract method declaration must satisfy the following constraints:

1. The types A and $B_1, \dots, B_n$ must be accessible from I.
2. The method is implicitly public and abstract (and not static).

Accessibility, visibility, hiding and overriding

A class inherits members from its superclasses and superinterfaces. A declaration of a field or static method, however, may hide a member of a superclass with the same name. A declaration of an instance method is said to override a declaration of a method with the same signature. Members which are visible in a class can be referred to by their simple names. In the following definitions, x denotes a field or a method signature.

Definition. We say that x has default access in class C, if x is neither private nor public nor protected in C.

Definition. An element C/x is accessible from A means:

1. x is private in C and A = C, or
2. x is not private in C and C is in the same package as A, or
3. x is public in C, or
4. x is protected in C and $A \preceq C$.

Some consequences of these definitions are:

1. If x is private in C, then C/x is accessible from class C only. Outside of C, the element C/x is not accessible.
2. If x has default access in C, then C/x is accessible from all classes in the same package. Outside of the package, the element C/x is not accessible.
3. If x is public in C, then C/x is accessible from everywhere.
4. If x is protected in C, then C/x is accessible from the same package and, outside of the package, from subclasses of C.
5. Elements of interfaces are accessible from everywhere, because they are public by definition.

The next definition is almost identical to the previous definition except that the clause for the modifier protected has an additional condition [18, §6.6.2].

Definition. An element C/x is accessible from A with respect to B means:

1. x is private in C and A = C, or
2. x is not private in C and C is in the same package as A, or
3. x is public in C, or
4. x is protected in C and $B \preceq A \preceq C$.
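Both accessibility definitions can be written as predicates. The following is a minimal sketch under three assumptions: each class is a string, its package is looked up in a map, and $\preceq$ is given as a precomputed set of supertypes per class. Access, Mod and the sample data are hypothetical. The sample shows the extra condition of the second definition. A protected member of C in package p is accessible from a subclass A in package q with respect to a subclass B of A, but not with respect to a sibling D of A.

```java
import java.util.Map;
import java.util.Set;

class Access {
    enum Mod { PRIVATE, DEFAULT, PROTECTED, PUBLIC }

    final Map<String, String> pkg;          // class -> package name
    final Map<String, Set<String>> upSet;   // class -> all D with class ⪯ D (including itself)

    Access(Map<String, String> pkg, Map<String, Set<String>> upSet) {
        this.pkg = pkg;
        this.upSet = upSet;
    }

    boolean le(String a, String b) { return upSet.getOrDefault(a, Set.of(a)).contains(b); } // a ⪯ b

    // C/x with modifier m is accessible from A.
    boolean accessible(Mod m, String c, String a) {
        if (m == Mod.PRIVATE) return a.equals(c);              // clause 1
        if (pkg.get(c).equals(pkg.get(a))) return true;         // clause 2
        if (m == Mod.PUBLIC) return true;                       // clause 3
        return m == Mod.PROTECTED && le(a, c);                  // clause 4
    }

    // C/x with modifier m is accessible from A with respect to B.
    boolean accessibleWrt(Mod m, String c, String a, String b) {
        if (m == Mod.PROTECTED && !pkg.get(c).equals(pkg.get(a))) {
            return le(b, a) && le(a, c);                        // clause 4: B ⪯ A ⪯ C
        }
        return accessible(m, c, a);                             // clauses 1 to 3 are unchanged
    }

    public static void main(String[] args) {
        Access acc = new Access(
            Map.of("C", "p", "A", "q", "B", "q", "D", "q"),
            Map.of("C", Set.of("C"),
                   "A", Set.of("A", "C"),
                   "B", Set.of("B", "A", "C"),
                   "D", Set.of("D", "C")));
        System.out.println(acc.accessible(Mod.PROTECTED, "C", "A"));         // true
        System.out.println(acc.accessibleWrt(Mod.PROTECTED, "C", "A", "B")); // true
        System.out.println(acc.accessibleWrt(Mod.PROTECTED, "C", "A", "D")); // false
        System.out.println(acc.accessible(Mod.DEFAULT, "C", "A"));           // false
    }
}
```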

In the next de_nition we de_ne what it means that an element is visible in a class or interface A. In terms of the JLS this means that it is a member of A.

Definition. The visibility of members is defined inductively:

1. If x is declared in A, then A/x is visible in A.
2. If $A \prec_d B$, C/x is visible in B, x is not declared in A and C/x is accessible from A, then C/x is visible in A.

Example (see CD). Consider the following two classes:

The field A/i is not visible in class B, because i is defined in class B, too. The field A/j is not visible in class B, because it is private in A and therefore not accessible from B.
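The situation of the example can be reproduced with javac. The following is a minimal sketch that assumes static int fields with hypothetical initial values. HidingDemo and readI are also hypothetical names. The access to the private field is commented out because javac rejects it.

```java
class HidingDemo {
    public static void main(String[] args) {
        System.out.println(B.readI());   // 10: the simple name i in B denotes B/i
        System.out.println(A.i);         // 1: A/i still exists, it is just not visible in B
    }
}

class A {
    static int i = 1;
    private static int j = 2;          // private: not accessible from B, hence not visible in B
}

class B extends A {
    static int i = 10;                 // this declaration hides A/i in B

    static int readI() {
        // return j;                   // rejected by javac: j has private access in A
        return i;                      // the simple name i denotes B/i
    }
}
```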

It is possible that two fields with the same identifier are visible in a class, since a class can implement several interfaces.

Example (see CD). Both the field I/MAX and the field J/MAX are visible in class A.

As long as the constant MAX is not accessed in A by its simple name, no syntax error occurs.

The field I/MAX is visible in class B through its superclass A as well as directly, since B implements I.
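The following is a minimal sketch of the MAX example. It assumes the hypothetical values 10 and 20 and a hypothetical helper limit(). The unqualified access is commented out because javac reports it as ambiguous.

```java
class ConstantsDemo {
    public static void main(String[] args) {
        System.out.println(A.limit());  // 10
        System.out.println(B.limit());  // 10: the static method is inherited from A
    }
}

interface I { int MAX = 10; }           // implicitly public, static and final
interface J { int MAX = 20; }

class A implements I, J {               // both I/MAX and J/MAX are visible in A
    static int limit() {
        // return MAX;                  // rejected by javac: the reference to MAX is ambiguous
        return I.MAX;                   // a qualified name selects I/MAX
    }
}

class B extends A implements I { }      // I/MAX reaches B through A and directly through I
```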

The JLS uses the term ‘override’ for instance methods only. We use it here for class methods, too.

Definition. A method A/msig is said to directly override a method C/msig, if there is a class or interface B such that

1. $A \prec_d B$,
2. C/msig is visible in B, and
3. C/msig is accessible from A.

When a new method overrides or hides a method with the same signature in a superclass or superinterface, several conditions have to be satisfied. For example, the return type has to be the same.

Constraint. If A/msig directly overrides C/msig, then the following constraints must be satisfied:

1. The return type of msig in A is the same as in C.
2. Method msig is not final in C.
3. Method msig is static in A if, and only if, it is static in C.
4. Method msig is not private in A.
5. If msig is public in C, then msig is public in A.
6. If msig is protected in C, then msig is public or protected in A.

The last three constraints say that access may not decrease according to the following ordering:

private < default < protected < public
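Constraints 4 to 6 collapse into a single comparison along this ordering. The following is a minimal sketch that uses an enum's declaration order as the ordering. OverrideAccess, Level and accessOk are hypothetical names.

```java
class OverrideAccess {
    // Declaration order encodes private < default < protected < public.
    enum Level { PRIVATE, DEFAULT, PROTECTED, PUBLIC }

    // Access check for A/msig (level inA) directly overriding C/msig (level inC).
    static boolean accessOk(Level inC, Level inA) {
        if (inA == Level.PRIVATE) {
            return false;                  // constraint 4
        }
        return inA.compareTo(inC) >= 0;    // constraints 5 and 6: access may not decrease
    }

    public static void main(String[] args) {
        System.out.println(accessOk(Level.PROTECTED, Level.PUBLIC));  // true
        System.out.println(accessOk(Level.PUBLIC, Level.PROTECTED));  // false
        System.out.println(accessOk(Level.DEFAULT, Level.PRIVATE));   // false
    }
}
```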

The relation ‘overriding’ is the reflexive, transitive closure of ‘direct overriding’.

Definition. The relation A/msig overrides B/msig is inductively defined as follows:

1. If msig is declared in A, then A/msig overrides A/msig.
2. If A/msig directly overrides B/msig and B/msig overrides C/msig, then A/msig overrides C/msig.

It is possible that a method msig is declared in several superinterfaces of a class A. It is also possible that msig is declared in a superinterface and in a superclass of A. In order to avoid inconsistencies one has to require that the return type of msig is always the same.

Constraint. If two methods B/msig and C/msig with the same signature are both visible in A, then the following constraints must be satisfied:

1. msig has the same return type in B and C.
2. If msig is public in B, then msig is public in C.
3. If msig is not static in B, then msig is not static in C.

The following constraint for abstract methods is not contained in the JLS.

Constraint. If C/msig is abstract, then it is public or protected.

The constraint is natural, since abstract methods in interfaces are public by definition. The constraint is later used in the Lookup Lemma 8.4.1.

The JLS allows abstract methods with default access. Such methods, however, are strange, because they cannot be implemented in a different package.

Definition. A class A implements a method msig, if there exists a class B such that

1. $A \preceq B$ and msig is declared in B,
2. B/msig is visible in A, and
3. msig is not abstract in B.

Unless a class A implements all methods of its superinterfaces, the class has to be declared abstract. Also, if an abstract method of a superclass is visible in A, then A has to be declared abstract.

Constraint. If the abstract method C/msig is visible in class A and A does not implement msig, then A is abstract.

In other words, if a non-abstract class A implements an interface I, then A implements each method declared in the interface I.

Example (see CD). Class A inherits from its direct superclass B a non-abstract method m(int). Therefore, class A implements method m(int).

The abstract method I/m(int) is visible in class A. Since A implements m(int), class A is not abstract.
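The following is a minimal sketch of this example and of the two variants that follow it. It assumes a hypothetical int result for m(int) so that the call can be observed. ImplementsDemo is also a hypothetical name.

```java
class ImplementsDemo {
    public static void main(String[] args) {
        I x = new A();
        System.out.println(x.m(20));   // 21: I/m(int) is implemented by the method inherited from B
    }
}

interface I {
    int m(int n);                      // implicitly public and abstract
}

class B {
    // Must be public. With default access, javac rejects A because the access would be
    // weaker than public. If private, m(int) is not inherited and A would have to be abstract.
    public int m(int n) { return n + 1; }
}

class A extends B implements I { }    // not abstract: A implements m(int) via B
```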

Example (see CD). If the method m(int) is declared private in class B, then it is no longer visible in class A.

Since class A does not implement method m(int), class A has to be declared abstract.

Example (see CD). If the method m(int) is declared with default access in class B, then the constraint on two methods with the same signature that are both visible in a class is violated, because m(int) is public in I but has default access in B.

The compiler reports an error because the access modifier of m(int) is made more restrictive.

Static type checking

For the rest of this chapter we assume that all fields and methods of classes are static. Static field access expressions are replaced at compile-time by abstract expressions C.field, where field is a class field declared in class or interface C. There are two possibilities to access a static field:

1. B.field, where B is a class or interface.
2. field

These expressions are replaced at compile-time as follows:

1. In an expression B.field the identifier field can denote a field of the class or interface B or a field of one of B's superclasses or superinterfaces which is visible in B. At compile-time, the expression B.field in class A is replaced by C.field, if the class or interface C is unique with the property that C/field is visible in B and accessible from A. If there is no such class C or if field is not static in C, then a syntax error occurs.
2. If a simple expression field in class A is not in the scope of a local variable declaration or formal parameter with the same name, then it denotes a field of A or a field of one of A's superclasses or superinterfaces which is visible in A. The simple expression field is replaced by the expression C.field, if the class or interface C is unique with the property that C/field is visible in A, and if field is static in C.
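The following is a minimal sketch of both access forms. The names StaticFieldDemo, Base, Derived, count and read are hypothetical. The field is declared only in Base, so Derived.count and the simple name count inside Derived both denote Base/count. The replacement by C.field is the text's abstraction. javac itself records Derived as the qualifying class in the bytecode and leaves the lookup of the declaring class to field resolution at link time.

```java
class StaticFieldDemo {
    public static void main(String[] args) {
        Derived.count = 7;                  // in the text's model: replaced by Base.count
        System.out.println(Base.count);     // 7: both names denote one variable
        System.out.println(Derived.read()); // 7: the simple name count in Derived is Base/count
    }
}

class Base {
    static int count = 5;
}

class Derived extends Base {
    static int read() { return count; }
}
```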

The type of a static field access expression C.field is the declared type of field in C (see Table below).

A method invocation expression can refer to a method in the current class or to a visible method of one of its superclasses. Since methods can be overloaded, the compiler chooses the most specific method which is applicable to the types of the arguments of the invocation. The type of the method invocation is then the return type of the chosen method. A method is more specific, if it is defined in a subclass or in the same class and if the argument types are subtypes. The return type of the method is ignored in the comparison. The relation ‘more specific’ is a partial ordering. If a set of methods has a least element, then this element is unique.

Definition. A method $C/meth(A_1, \dots, A_n)$ is more specific than a method $D/meth(B_1, \dots, B_n)$, if $C \preceq D$ and $A_i \preceq B_i$ for $i = 1, \dots, n$.

There are two kinds of method invocations:

1. $\alpha\, meth\, \beta(exps)$
2. $\alpha\, C.meth\, \beta(exps)$, where C is a class.

Type constraints for JavaC

As a result of parsing and elaboration each kind of expression is replaced by $\alpha\, D.m\, \beta(exps)$, where D and the method signature m are determined as follows. Assume that the position $\alpha$ is in class A and that $\beta(exps)$ is $\beta(\beta_1 exp_1, \dots, \beta_n exp_n)$. Let msig be the signature $meth(T(\beta_1), \dots, T(\beta_n))$, where $T(\beta_i)$ denotes the type at position $\beta_i$. A set of applicable methods $app(\alpha)$ is determined as follows:

- For an invocation $\alpha\, meth\, \beta(exps)$, let $app(\alpha)$ be the set of all methods D/m such that
  a) A/msig is more specific than D/m and
  b) D/m is visible in A.
- For an invocation $\alpha\, C.meth\, \beta(exps)$, let $app(\alpha)$ be the set of all methods D/m such that
  a) C/msig is more specific than D/m and
  b) D/m is visible in C and accessible from A with respect to C.

Assume that $app(\alpha)$ contains a most specific element D/m, i.e.,

- $D/m \in app(\alpha)$, and
- if $E/k \in app(\alpha)$, then D/m is more specific than E/k.

Assume that m is static in D. Then D/m is the method chosen by the compiler. Moreover, the type at position $\alpha$ is the declared return type of m in D (see Table below).
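The following is a minimal sketch of the relation ‘more specific’ and of the choice of a most specific element of $app(\alpha)$. It makes three assumptions:

- Types are strings.
- $\preceq$ is a small hand-written table containing the widening int ⪯ long ⪯ double and a class B below a class A.
- Visibility and accessibility hold for all candidates.

Overloading, Sig, moreSpecific and choose are hypothetical names. The two calls in main correspond to the m(long)/m(double) example and to Example 4.1.9 below.

The model follows the text's definition, which compares declaring classes as early editions of the JLS did. Later JLS editions compare only parameter types, so a current javac accepts m(0) in Example 4.1.9 and selects A/m(int).

```java
import java.util.*;

class Overloading {
    // t ⪯ u for the types used below: reflexive, int ⪯ long ⪯ double, and B ⪯ A.
    static final Map<String, Set<String>> UP = Map.of(
        "int", Set.of("int", "long", "double"),
        "long", Set.of("long", "double"),
        "double", Set.of("double"),
        "A", Set.of("A"),
        "B", Set.of("B", "A"));

    static boolean le(String t, String u) { return UP.get(t).contains(u); }

    // A method D/m(B1, ..., Bn): declaring class and parameter types.
    static final class Sig {
        final String cls;
        final List<String> params;
        Sig(String cls, String... params) { this.cls = cls; this.params = List.of(params); }
        @Override public String toString() { return cls + "/m(" + String.join(", ", params) + ")"; }
    }

    // C/m(A1..An) is more specific than D/m(B1..Bn) iff C ⪯ D and Ai ⪯ Bi for all i.
    static boolean moreSpecific(Sig x, Sig y) {
        if (!le(x.cls, y.cls) || x.params.size() != y.params.size()) return false;
        for (int i = 0; i < x.params.size(); i++) {
            if (!le(x.params.get(i), y.params.get(i))) return false;
        }
        return true;
    }

    // app = candidates D/m such that the invocation signature is more specific than D/m;
    // returns the element of app that is more specific than every element, if any.
    static Optional<Sig> choose(Sig call, List<Sig> candidates) {
        List<Sig> app = new ArrayList<>();
        for (Sig d : candidates) {
            if (moreSpecific(call, d)) app.add(d);
        }
        for (Sig d : app) {
            boolean least = true;
            for (Sig e : app) {
                if (!moreSpecific(d, e)) least = false;
            }
            if (least) return Optional.of(d);
        }
        return Optional.empty();   // no applicable method, or the invocation is ambiguous
    }

    public static void main(String[] args) {
        // m(long) and m(double) declared in A, invoked in A with an int argument.
        List<Sig> ms = List.of(new Sig("A", "long"), new Sig("A", "double"));
        System.out.println(choose(new Sig("A", "int"), ms));   // Optional[A/m(long)]

        // A/m(int) and B/m(long), invoked in B as m(0): the two are not comparable.
        List<Sig> ms2 = List.of(new Sig("A", "int"), new Sig("B", "long"));
        System.out.println(choose(new Sig("B", "int"), ms2));  // Optional.empty
    }
}
```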

Example (see CD). In the following program the method m is overloaded. It can take arguments of type double as well as arguments of type long. The most specific method is chosen during compile-time:

Since i is declared to be of type int, the most specific method for the method invocation m(i) is m(long). In order to ensure type safety, the compiler automatically inserts a type cast: the method invocation m(i) is replaced by m((long)i). Hence, before the method m(long) is invoked, the argument is converted from type int to long.
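The following is a minimal sketch of this situation. OverloadDemo and the String results are hypothetical; the results only make the chosen overload observable.

```java
class OverloadDemo {
    // Two overloads; the results only report which one was selected.
    static String m(double x) { return "m(double)"; }
    static String m(long x) { return "m(long)"; }

    public static void main(String[] args) {
        int i = 3;
        System.out.println(m(i));          // m(long): int widens to both, m(long) is more specific
        System.out.println(m((long) i));   // m(long): the invocation with the inserted cast
        System.out.println(m(3.0));        // m(double): only m(double) is applicable
    }
}
```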

Example (see CD). It can happen that there exists no most specific method which is applicable to a method invocation.

In this case, the compiler reports that the reference to the method is ambiguous.

Note that the literal 0 is of type int.

Example 4.1.9 (see CD). In the following example A/m(int) and B/m(long) are both applicable to the method invocation expression m(0):

Since $B \prec_h A$ but $\mathrm{int} \prec \mathrm{long}$, the class ordering and the argument ordering point in opposite directions. The two methods are therefore not comparable, and there is no most specific method for m(0).

Example (see CD). The type of the expression in the return statement can be a subtype of the declared return type of the method:

In this example, the value of i is automatically converted to type long: the compiler replaces ‘return i’ by ‘return (long)i’.

Type constraints after introduction of primitive type casts

Vocabulary of JavaC

The extension of the vocabulary that we describe in this section for JavaC reflects three features of the machine. It comes with a class environment, including a class initialization mechanism. It deals with different methods, which can be invoked and returned from. And with method return, it introduces a new reason for abruption of normal program execution.

For the sake of simplicity, but without loss of generality, we assume that any class C has a class initializer C/<clinit>(). Its body (whose function is to initialize the class fields at the first active use of the class, see below) is a phrase static block, where block may be empty. Non-constant class field initializations are syntactically reduced to assignments and are placed at the beginning of the class initializer. JavaC abstracts from initializations of constant fields; the latter are final class fields whose values are compile-time constants [18, §15.27]. The value of constant fields is precomputed (as part of the elaboration phase) and stored in the class and interface environment of the given program.

We also assume that there are only field access expressions of the kind C.field, where field is a static field declared in C. Other field access expressions are replaced during parsing and type checking. Moreover, method invocations are of the kind C.msig(exps), where msig is a method signature of a static method of class C. The method signature as well as the class C have been determined during type checking.

JavaC programs are executed w.r.t. a static class environment which is set up during parsing and elaboration. The following static functions look up information in this environment, possibly traversing the inheritance hierarchy from bottom to top (from subtype to supertype).

The function super returns the direct superclass of a given class, provided there is a superclass, i.e., $C \prec_d super(C)$. We use the function classNm to access the class name of a compound identifier (e.g. classNm(c/m) = c). The function methNm accesses the method name (e.g. methNm(c/m) = methNm(m)). The function body yields the body of the given method in the given class.

$\mathit{super}: \mathit{Class} \to \mathit{Class}$

$\mathit{body}: \mathit{Class}/\mathit{MSig} \to \mathit{Block}$

In JavaC we distinguish four initialization states for a class. Either the initialization of the class has not yet started (but the class is Linked), or it is InProgress, or it is already Initialized, or an error occurred during the initialization (Unusable).

Therefore we introduce a universe

data ClassState = Linked | InProgress | Initialized | Unusable

together with a dynamic function

$\mathit{classState}: \mathit{Class} \to \mathit{ClassState}$

which records the current initialization status of a class. A class is initialized, if the initialization state for the class is InProgress or Initialized.
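The following is a minimal sketch of the universe and the derived predicate. It assumes a Java enum for ClassState and a HashMap for the dynamic function classState. ClassStateDemo, initialized and the class name "Point" are hypothetical. A class whose initialization is InProgress already counts as initialized, so that uses of the class inside its own initializer do not start the initialization again.

```java
import java.util.HashMap;
import java.util.Map;

class ClassStateDemo {
    // The dynamic function classState: Class -> ClassState, modeled as a mutable map.
    static final Map<String, ClassState> classState = new HashMap<>();

    public static void main(String[] args) {
        classState.put("Point", ClassState.Linked);
        System.out.println(classState.get("Point").initialized()); // false
        classState.put("Point", ClassState.InProgress);             // initialization has started
        System.out.println(classState.get("Point").initialized()); // true
    }
}

enum ClassState {
    Linked, InProgress, Initialized, Unusable;

    // A class counts as initialized once its initialization has started and not failed.
    boolean initialized() {
        return this == InProgress || this == Initialized;
    }
}
```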

To model the dynamic state of class fields, we have to reserve storage for these variables. The dynamic function globals yields the value stored under a field specification.

In JavaC we have to deal with different methods which can be invoked and returned from. We use the dynamic function meth to denote the currently executed method.

$\mathit{meth}: \mathit{Class}/\mathit{MSig}$

A method may call other methods. We use the usual stack technique to implement method calls. When a new method is invoked, the frame of the invoking method meth, consisting of meth, restbody, pos, and locals, is pushed onto the stack to be resumed after the invoked method has finished. We denote by a dynamic function frames the sequence of currently still to be executed frames on the stack.
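The following is a minimal sketch of the frame stack. It replaces the text's phrases and positions with simplified representations: restbody is a String, pos is an int, and locals is a map. FrameStackDemo, Frame, invoke and returnFromMethod are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

class FrameStackDemo {
    // One saved frame: the invoking method and its execution state.
    static final class Frame {
        final String meth;
        final String restbody;
        final int pos;
        final Map<String, Object> locals;
        Frame(String meth, String restbody, int pos, Map<String, Object> locals) {
            this.meth = meth; this.restbody = restbody; this.pos = pos; this.locals = locals;
        }
    }

    String meth = "Main/main()";
    String restbody = "body of main";
    int pos = 0;
    Map<String, Object> locals = new HashMap<>();
    final Deque<Frame> frames = new ArrayDeque<>();   // the dynamic function frames

    // Invocation: push the caller's frame, then start the callee with fresh state.
    void invoke(String callee, String calleeBody, Map<String, Object> args) {
        frames.push(new Frame(meth, restbody, pos, locals));
        meth = callee; restbody = calleeBody; pos = 0; locals = new HashMap<>(args);
    }

    // Return: resume the most recently pushed frame.
    void returnFromMethod() {
        Frame f = frames.pop();
        meth = f.meth; restbody = f.restbody; pos = f.pos; locals = f.locals;
    }

    public static void main(String[] args) {
        FrameStackDemo s = new FrameStackDemo();
        s.pos = 4;
        s.locals.put("x", 1);
        s.invoke("Point/dist(int)", "body of dist", Map.of("n", 2));
        System.out.println(s.meth + " " + s.frames.size());         // Point/dist(int) 1
        s.returnFromMethod();
        System.out.println(s.meth + " " + s.pos + " " + s.locals);  // Main/main() 4 {x=1}
    }
}
```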

In JavaC there are two new reasons for abruption, namely Return and Return(Val), occurring through the execution of return statements which by definition complete the body of a method abruptly and possibly return a result value to the invoker of the method. They will be used in the extension of the abruption handling rules of execJavaStmI in execJavaStmC.