Type assignments without subroutine call stacks - JVM

This notion combines the type frame successor function of the previous section with the checking function of the defensive VM.

Based on the examples above we make the following assumptions about subroutines. We assume that the _rst instruction of a subroutine is a Store(addr; x ) instruction which stores the return address in a variable x .

This assumption is reasonable, since if a subroutine ever returns, it has to take its return address from a local variable in the Ret instruction. The return address is on top of the operand stack, when the subroutine is called. It has to be stored in the local variable somewhere in the subroutine. We assume that this is at the beginning.

Our second assumption is that the code generated by a Java compiler for a finally block is connected. The code may contain several Ret instructions, but the code indices for the block must form an interval. Both assumptions are implicit in the notion of bytecode type assignment below.

We divide the instruction set of the Java VM into normal instructions and control transfer instructions.

De_nition (Control transfer instructions). Control transfer instructions are: Goto(i), Cond(p; i), Return(_ ), Throw, Jsr(i), Ret(x ).

Each code index has a (possibly empty) set of successor indices which can be reached in the next step of the execution of the Java VM without applying a return instruction (Return(t) or Ret(x )).

De_nition(Successor index). A code index j is called a successor index of i (with respect to a given method), if one of the following conditions is true:

in the exception table of the method such that f _ i < u and the ith instruction is not Jsr, Goto, Return or Load

Note that the index of a Jsr instruction has two possible successors unlike the type frame associated to a successor instruction which has only one (except for type frames of possible exception handlers). This is because in the intended pairing" of Jsr(s) and Ret(x) instructions, the subroutine called by Jsr returns to the instruction immediately following Jsr, similarly to the association between the index i of a method invocation instruction and its successor index i +1 which is reached by executing the corresponding Return instruction from the invoked method. In the following de_nition of reachability it is important that both successor indices of Jsr are included.

De_nition(Reachable). A code index j is called reachable from I if there exists a _nite (possibly empty) sequence of successor steps from I to j . A code index is called reachable, if it is reachable from 0.

Since the treatment of subroutines is rather complicated we have to de_ne precisely what we mean by a subroutine.

De_nition (Subroutine). If i is reachable from 0 and the ith instruction is Jsr(s), then the code index s is called a subroutine of the method. Since we assume that the _rst instruction of a subroutine stores the return address in a local variable, the possible returns from the subroutine are easy to identify.

De_nition (Return from subroutine). A code index r is a possible return from subroutine s, if code(s) = Store(addr; x ), code(r ) = Ret(x) and r is reachable from s + 1 on a path that does not use any Store( ; x ) instruction.

The instructions which belong to a subroutine are simply those which are in the interval between the _rst instruction of the subroutine and any possible return from the subroutine (there could be several).

De_nition (Belongs to a subroutine). A code index i belongs to subroutine s, if there exists a possible return r from s such that s _ i _ r .
For the treatment of polymorphic subroutines a function mod is used which assigns to each subroutine the set of local variables modi_ed by the subroutine.

This set includes also the variables which are used in other subroutines called by this subroutine as well as the variables which are used in implicitly called exception handlers, as long as they belong to the subroutine.

De_nition (Modi_ed variables). Let s be a subroutine. A variable x belongs to mod(s), if there exists a code index i which belongs to s such that code(i) = Store(t; y) and one of the following conditions is satis_ed:

1. size(t) = 1 and x = y, or
2. size(t) = 2 and x = y or x = y + 1.

The set mod(s) of variables which are modi_ed by a subroutine is used to restrict the type assignment to local variables when a subroutine returns with a Ret(x ) instruction. At this time, the variable x must be of type retAddr(s) for some subroutine s. The types of the variables modi_ed by s have to be returned to the callers of s. For this purpose we use the notion of the restriction of type assignments to a given set of variables.

Since instructions Ret(x have no successor type frames, they have to be treated in a special way in the following de_nition.

De_nition (Bytecode type assignment). A bytecode type assignment with domain D for a method _ is a family (regTi ; opdTi )i2D of type frames satisfying the following conditions:


T1. D is a set of valid code indices of the  method _.
T2. Code index 0 belongs to D.
T3. Let [_1; : : : ; _n] = argTypes(_)and c = classNm(_).
If _ is a
a) class initialization method: regT0 = ;.
b) class method: f0 7! _1; : : : ; n   1 7! _ng vreg regT0.
c) instance method: f0 7! c; 1 7! _1; : : : ; n 7! _ng vreg  regT0.
d) constructor: f0 7! InInit; 1 7! _1; : : : ; n 7! _ng vreg regT0.
T4. The list opdT0 is empty.
T5. If i 2 D, then check(_;i; regTi;opdTi)is true.
T6. If i 2 D and (j ;regS; opdS)2succ(_;i; regTi;opdTi ), then
j 2 D, regS vreg regTj and opdS vseq opdTj .
T7. If i 2 D,code(i)=Ret(x)and regTi(x)= retAddr(s), then for all
reachable j 2 D with code(j ) = Jsr(s):
a) j + 1 2 D,
b) regTi vreg mod(s) _ regTj+1,
c) opdTi vseq opdTj+1,
d) regTj vreg mod(s) _  regTj+1,
e) if retAddr() occurs in mod(s) _  regTj+1, then each code index
which belongs to s belongs to l ,
f) neither (c; k)new nor InInit occur in mod(s) _  regTj+1.
T8. If i 2 D and retAddr(s) occurs in regTi , then i belongs to s.
If i 2 D and retAddr(s) occurs in opdTi , then i = s.


The verify type unusable is allowed on the operand stack but not as the type of a register.

T1 says that the domain D of a bytecode type assignment must be a set of valid code indices. A bytecode type assignment does not necessarily assign type frames to all code indices which are reachable from 0 according to Def. For example, if the bytecode jumps at index i to subroutine s and s does never return, then i +1 does not necessarily belong to D. We will see in the soundness proof that at run-time the program counter pc always belongs to D. This is certainly true at the beginning, because T2 ensures that code index 0 is in D.

T3 and T4 are conditions for the type frame at code index 0. The operand stack must be empty at the beginning. The declared types of the arguments of the method must be more speci_c than the types assigned in regT0. An instance method assigns the value of this to register 0, hence the type of 0 must be the class in which the method is de_ned. When a constructor is called, the this object becomes partially initialized, hence the type of register 0 must be InInit except in class Object where the this object becomes fully initialized. Therefore, the constructor of class Object is treated as an instance method in T3. We do not require that the types of the arguments are equal to the ones in regT0, because it may happen that the code later jumps back to 0 when some of the registers have more general types or are not even used any longer.

In T5, we use the check functions of the defensive VM in the previous chapter to ensure the correct applicability of an instruction. Since the check functions operate on the type frame and not on the raw values, we can apply them to regTi and opdTi for each code index i in the domain D of the bytecode assignment. We write check(meth; pc; regT; opdT) for checkE(meth)(code(meth)(pc); maxOpd(meth); pc; regT; opdT).

T6 says that a successor type frame has to be more speci_c than the type frame assigned to the successor index. T6 requires that the successor index j of i belongs to the domain D. Since D consists of valid code indices only, T6 ensures that successor code indices are valid code indices. T6 therefore makes it impossible to drop o_ the end of a method body. The length of the operand stack of the computed successor type frame for the instruction with index j must be the same as the length of the operand stack which is assigned to j and the types must be compatible. For type registers the condition is weakened. For each register in the computed successor either the type is compatible with the one assigned at index j or it cannot be used at successor index j .

T7 deals with subroutines. If the ith instruction is Ret(x), a return from a subroutine, then by T5 the local variable x has type retAddr(s) in regTi for some subroutine s. This means that the value of x at run-time will be a return address to a caller of the subroutine s. Let j be any code index which calls s, i.e., code(j ) = Jsr(s). Then the code index j + 1 must be valid (a).

For local variables which are used at index j + 1 and are modi_ed by the subroutine s, the type at i must be more speci_c than at j + 1 (b). For local variables not modi_ed by s, the type at j , from where the subroutine s was called, has to be more speci_c than the type at index j + 1 to which subroutine s is returning (d). The types assigned to the operand stack at index i must be more speci_c than the types at index j + 1 (c).

Condition T7 (e) ensures that subroutines are properly nested. If a type retAddr() is used at code index j + 1 for a register which is not modi_ed by the subroutine s, then the subroutine l must enclose s in the sense of Def. In Fig. below all conditions are satis_ed except of T7 (e). Although the subroutine s is called by the subroutine l , the variables x and r2 which are modi_ed by s are not in mod(l). We have mod(s) = fr2; xg and mod(l ) = fr1g. Because x is not in mod(l), the type String is propagated at the Jsr(l ) instruction and not the type int. At run-time there would be a type violation there. Condition T7 (e) is violated, because the second Jsr(s) belongs to subroutine l but subroutine s is not contained in l. Condition T7 (e) is used below in the proof of the Coincidence Lemma.

Instead of condition T7 (e) we could also use the following inductive de_nition of the sets mod(s):

Violation of T7 (e): subroutine s is not contained in l (; CD)

Violation of T7 (f): (c; k)new occurs in mod(s)_ regTj+1 (; CD)

1. If i belongs to s and code(i) = Store(t; x ), then x 2 mod(s) and, if size(t) = 2, then x + 1 2 mod(s).

1. 2. If i belongs to l , code(i ) = Jsr(s) and x 2 mod(s), then x 2 mod(l ).

With this extended de_nition of mod, more bytecode programs would have type assignments. A compiler for Java, however, had to use more local registers for storing return values and for storing exceptions in default handlers.

Condition T7 (f) is similar to a condition proposed by Freund and Mitchell in. Without T7 (f) a subroutine may create new objects which can later be used without being fully initialized. This

behavior is shown. When the second object which has been created in the subroutine is initialized, the type (c; k)new is replaced by c everywhere in the type frame. The variable x gets type c, although the value of x at run-time is an uninitialized object. Hence a bytecode type assignment must ensure that there is always at most one object of type (c; k)new. Without T7 (f) (or a similar condition) this is not possible.

In a similar way T7 (f) prevents code from initializing an object twice.

an example of what could happen without T7 (f). Since the variable x is not modi_ed by the subroutine s, it does not belong to mod(s). Therefore type (c; k)new is propagated from j to j +1 and type c remains at index i. The object can be initialized a second time.

In Fig. below conditions T1{T7 are satis_ed but condition T8 is violated.

According to Def.the last instruction Ret(r1) is a possible return from subroutine s1. Therefore mod(s1) = fr1; r2g. The instruction Ret(r2) is a possible return from subroutine s1. Since the instruction Ret(r2) is before s1, the set mod(s2) is empty although subroutine s1 stores at run-time an integer in register x . Condition T8 is violated, since the type retAddr(s2) is used several times for register r2, but there are no instructions that belong to subroutine s2.