# Soundness of bytecode type assignments - JVM

For the rest of this chapter we assume that each method in each class of the current class environment has a bytecode type assignment.

What does it mean that the type assignment is sound? It means that the bytecode does not violate any checks when it runs on the defensive VM. One can show that at run-time the values of the operands and the values stored in local variables belong to the assigned types. If there is a verify type assigned to a local variable, then at run-time the local variable contains a value which belongs to that verify type. If the type is a primitive type, then the value is of exactly that type. If the type is a reference type, then the value is a pointer to an object or array which is compatible with that type. The same is true for the verify types assigned to the operand stack. Moreover, at run-time the

Violation of T7 (f): (c; k)new occurs in mod(s)_ regTj+1 (; CD)

Violation of T8 (; CD)

operand stack has exactly the same length as the list of verify types which are assigned to the operand stack.

The crucial point is how to de_ne what it means that a return address belongs to the type retAddr(s). The idea is that j + 1 belongs to the type retAddr(s), if code(j) = Jsr(s) and, for all variables x which are not modi_ed by the subroutine s, the value of x in the local environment belongs to the type assigned to x at index j +1, i.e., reg(x ) belongs to regTj+1(x ). Therefore a return address has to be typed with respect to the local environment reg.

Moreover, the type of a reference depends on the class of the reference, which is stored in the heap, and on the initialization status of the reference, which is stored in the functions initState.

De_nition(Typing rules). Let (regTi ; opdTi )i2D be a type assignment for a method _. Let reg be a local environment for _, v be a value and _ be a verify type. The


typing judgment regv:_is de_ned by the following rules:
type(v) v _
reg  v: _
j 2 D; code(j ) = Jsr(s); s does not return
j 2 D; code(j ) = Jsr(s); s returns
reg reg(x ):regTj+1(x)for each x 2 dom(clean(mod(s)_regTj+1))
The function clean _lters out all not fully  initialized types:
clean(regT) = f(x ; t) 2 regT j t 6= ( ; )new ; t 6= InInitg


A subroutine s returns, if there exists an i 2 D such that code(i ) = Ret(x ) and regTi (x ) = retAddr(s).

The typing rules for return addresses depend on the local environment reg, the function heap and the type assignment to the method. Since a Store instruction changes the local environment, a return address could possibly loose its type. The following coincidence lemma therefore states a convenient condition under which a return address keeps its type. The typing of a return address from a subroutine depends only on the local variables which are not modi_ed by the subroutine. In the following we often omit the tag retAddr(s) in values which represent return addresses.

Lemma 16.4.1 (Coincidence). If reg j+1: retAddr(s) and mod(s)_ reg = mod(s)_ reg0, then reg0 j+1: retAddr(s).

Proof. By induction on the length of a derivation of reg v: _ . Assume that regj+1: retAddr(s) and mod(s)_ reg = mod(s)_ reg0:
From the typing rules we obtain that j 2 D and code(j ) = Jsr(s).

If s does not return, we obtain reg0  j +1: retAddr(s) by the second typing rule. Otherwise, by the third typing rule we have

1. j 2 D, code(j ) = Jsr(s), s returns,
2. reg  reg(x ): regTj+1(x ) for each x 2 dom(clean(mod(s) _ regTj+1)).

Let x be a local variable which is de_ned in regTj+1 but does not belong to mod(s). Since reg0 is equal to reg outside of mod(s), it follows by (2) that reg

reg0(x ): regTj+1(x ). If regTj+1(x )

is not a return address type, then, by the _rst typing rule, we immediately obtain reg0  reg0(x ): regTj+1(x ).

Otherwise there is an  such that regTj+1(x ) = retAddr(). By T7 (e) , it follows that mod(s) _ mod(). Hence, mod()_ reg = mod() _ reg0:

By the induction hypothesis, we obtain that reg0  reg0(x ): retAddr().

Hence, the three premises of the third typing rule are satis_ed and we can conclude that reg0  j + 1: retAddr(s). ut

Because of object initialization the types of references may change at runtime.

When the initialization state of an object ref changes from InInit to Complete, its type given by the function typeOf (ref) changes from InInit to classOf (ref ), too. Once the initialization state is Complete, the reference keeps its type forever. Since un-initialized types and partially initialized types are excluded in the typing rules for return addresses via the function clean, the initialization of objects causes no problems for typing judgments.

A bytecode type assignment ensures by T6 that if the instruction at code index i is a call to an instance initialization method invoked on a reference of type (c; k)new , this type is replaced by the type c everywhere in the type frame at code index i+1. When the instance initialization method is executed at run-time, we know that the reference on which it is invoked is of that type. In the soundness proof below, we have to prove that in the future, after the instance initialization method has returned and the object has been initialized, the invariants are still true. At code index i, however, we do not know what the heap will be after the completion of the initialization method.

Hence we have to prove the invariants for all possible continuations of the current heap. A possible continuation of the current heap with respect to a frame on the stack, is a heap in the state of the computation when the frame again becomes the current frame.

De_nition (Active reference). Let (pc_; reg_; opd_; meth_) be a frame and (regTi ; opdTi )i2D be the bytecode type assignment for meth_. We say that the frame contains an active reference r with assigned type _ , if one of the following two conditions is satis_ed:

1. there exists a local register x with regTpc_ (x ) = _ and reg_(x ) = r , or
2. there exists a stack position i with opdTpc_ (i) = _ and opd_(i) = r .

We say that a reference r is active in the frame, if the frame contains r with some assigned type.

De_nition (Continuation of the heap). Let f be a frame on the stack. A continuation of the heap for f is a function heap_ with the following properties:

1. dom(heap) _ dom(heap_),
classOf (p) = classOf _(p) for each p
2. dom(heap).
If the frame f is followed by a constructor invocation frame with this object ref , then
3. initState(p) = initState_(p) for each active p of f with p 6= ref ,
4. initState_(ref ) = Complete.
If the frame f is not followed by a constructor invocation frame, then
5. initState(p) = initState_(p) for each active p in f .
(Continuation of a frame). Assume that the frame

(pc_; reg_; opd_; meth_) is a frame on the stack of the JVM. Let _ be the method of the next frame (the called frame). Let vals be a sequence of possible return values compatible with the return type of _ with respect to a continuation of heap for the frame. If the return type is void, then vals is the empty sequence. Then (pc_+1; reg_; opd__vals; meth_) is called a continuation of the frame.

De_nition (Init sequence). An init sequence on the stack is a maximal sequence of consecutive frames on the stack with the following properties:

1. Each frame in the sequence is a constructor invocation frame.
2. For each frame in the sequence except for the _rst frame, the constructor belongs to the same class as the constructor of the parent frame or is in its direct superclass.
3. The value of register 0 is the same in all frames of the sequence.

The value of register 0 is called the init object of the init sequence.

Theorem 16.4.1 (Soundness of type assignments). Assume that in the given run, the frame (pc_; reg_; opd_; meth_)

1. is the current frame (pc; reg; opd; meth) and heap_ is the current heap heap, or
2. is a frame on the stack waiting for a <clinit> method to return and heap_ is a continuation of heap for the frame, or
3. is a continuation of a frame f on the stack and heap_ is a continuation of heap for the frame f .

Let (regTi ; opdTi )i2D be the type assignment for meth_. Then the following invariants are satis_ed at run-time for the frame (pc_; reg_; opd_; meth_):


(pc) pc_ 2 D (hence pc_ is a valid code index for meth_).
(check) check(meth_; pc_; types(reg_);types(opd_)) is true.
(reg1) dom(regTpc_ )_ dom(reg_).
(reg2) reg_ reg_(x ):regTpc_(x)for every x 2 dom(regTpc_ ).
(reg64)If reg_(x)= (w1;lowt)and reg_(x +1)=(w2;hight),then the pair
(w1; w2) is a correct 64-bit value of type t.
(opd1) dom(opdTpc_) = dom(opd_).
(opd2) reg_ opd_(i ): opdTpc_ (i) for each i 2 dom(opdTpc_ ).
(opd3) length(opdTpc_ ) < maxOpd.
(opd64)If opd_(k)= (w1; lowt)and opd_(k + 1)=(w2; hight),then the
pair (w1; w2) is a correct 64-bit value of type t.


(init1) If the frame contains an active reference r with assigned type InInit, then meth_ is an <init> method (not of class Object), r = reg_(0) and classOf (r ) v classNm(meth_). Conversely, if meth_ is an <init> method, then reg_(0) contains a reference r with initState(r ) = InInit or initState(r ) = Complete.

(init2) For each c and i , the frame contains at most one active reference r with assigned type (c; i)new .

(init3) If the frame contains an active reference r with assigned type (c; i)new , then r is not the init object of an init sequence of stack and r is not active in any other frame. The following global invariants are true at run-time:

(global) If the static _eld c=f of declared type A is used in the program, then type(globals(c=f )) v A.

(ref) If a reference r is used in the current state of the JVM, then r points to an existing object or array on the heap; classOf (r ) is a non-abstract class or an array type.

(object) If classOf (r ) is a class c and d=f is an instance _eld of c of declared type A, then type(getField(r ; d=f )) v A.

(array) If classOf (r ) is an array type A[ ], then type(arrayElem(r ; i )) v A for each i < arraySize(r ).

(initseq) There are not two di_erent init sequences on stack with the same init object.

Proof. We show _rst that the invariant (check) follows from the other invariants:

The type assignment ensures that check(meth_; pc_; regTpc_ ; opdTpc_ ) is true. Let regS = types(reg_) and opdS = types(opd_) be the type assignments associated to the local environment reg_ and the operand stack opd_. From (reg1), (reg2), (opd1) and (opd2) we can deduce that regS vreg regTpc_ and opdS vseq opdTpc_ . Since the structural constraints are monotonic, it follows that check(meth_; pc_; regS; opdS) is true as well.

The remaining invariants are proved by an induction on the run of the defensive VM. We show here the critical cases for subroutines, object initialization and the current frame.

Case 1. code(pc) = Jsr(s): The new pc is s and the new operand stack is

opd _ [pc + 1]. The type assignment condition T6 ensures that { regTpc vreg regTs

{ opdTpc _ [retAddr(s)] vseq opdTs

This implies (by the transitivity of v and the induction hypothesis) invariants (reg1), (reg2), (reg64), (opd1), (opd2), (opd3) and (opd64) at s, except for reg  pc+1: retAddr(s). The induction hypothesis says that reg  reg(x ): regTpc(x ) for every x 2 dom(regTpc). If s does not return, then by the second typing rule of Def., reg  pc + 1: retAddr(s):

Otherwise, if s returns, then by T7 (d), regTpc vreg mod(s) _ regTpc+1 and, by the third typing rule of Def., it follows that reg  pc+1: retAddr(s).

The invariants (init1){(init3) and the global invariants can be moved from pc to s.

Case 2. code(pc) = Ret(x ): Condition T5 and the check in Fig. below ensure that there exists a subroutine s such that regTpc(x ) = retAddr(s) and therefore s returns. The induction hypothesis (reg2) for x implies that reg  reg(x ): retAddr(s). Let reg(x = j + 1. By the third typing rule in Def. it follows that reg  reg(x ): regTj+1(x ) for every x 2 dom(clean(mod(s) _ regTj+1)).

The new pc is j +1. The typing rules in Def. below also yield that j 2 D and code(j = Jsr(s). The type assignment condition T7 ensures that {j + 1 2 D (hence j + 1 is a valid code index for meth)

Invariants (reg1), (reg2) and (reg64) follow from the induction hypothesis at pc. Namely regTj vreg mod(s) _ regTj+1, the fact that neither (c; k)new nor InInit occur in mod(s)_ regTj+1 and the typing rules for the return address j + 1 imply that reg  reg(x ): regTj+1(x ) for every x 2 dom(regTj+1) n mod(s).

Moreover, regTpc vreg mod(s) _ regTj+1 implies that reg  reg(x ): regTj+1(x ) for every x 2 dom(regTj+1) mod(s). Invariants (opd1){(opd64) follow immediately from the induction hypothesis, since the operand stack is propagated to j + 1 without modi_cations. The global invariants and the invariants (init1){(init3) can be moved from pc to j + 1, since by condition T7 (f) we know that mod(s)_ regTj+1 is free of ( ; )new and InInit.

Case 3. code(pc) = Store(t; x ): Assume that size(t) = 1 (the argument for the case size(t) = 2 is similar). The type assignment condition T6 ensures that there exist opdS and _ such that

Hence, opd = opd0 _ [v]. Let reg0 = reg[x 7! v]. The new pc is pc +1, the new operand stack is opd0 and the new local environment is reg0. Invariants (pc), (reg1), (reg64), (opd1), (opd3), (opd64) for pc + 1 follow from the induction hypothesis. The same holds for invariants (reg2) and (opd2), using part 1 of Def. below, except for operands and local variables of return address type.

Assume that retAddr() occurs in regTpc+1 or opdTpc+1. Then retAddr() already occurs in regTpc or opdTpc and, by T8, pc belongs to subroutine

Therefore x 2 mod() and mod() _ reg = mod() _ reg0. We can apply the Coincidence Lemma and see that, if reg  j + 1: retAddr(), then reg0  j + 1: retAddr(). Hence, the invariants are also satis_ed for return addresses at code index pc + 1. The global invariants and the invariants (init1){(init3) can be moved from pc to pc + 1 without problems.

Case 4. code(pc) = New(c): The type assignment condition T6 for the successor frame (see Fig. below) ensures that { opdS _ [(c; pc)new ] vseq opdTpc+1, where opdS = [if t = (c; pc)new then unusable else t j t 2 opdTpc] { f(x ; t) 2 regTpc j t 6= (c; pc)new g vreg regTpc+1 Hence the only occurrence of the type (c; pc)new in the type frame at pc + 1 is the topmost position in opdTpc+1. Since there is only one occurrence of this type, invariant (init2) can be carried over from pc to pc + 1. The other invariants follow with similar arguments as in the previous cases.

The VM creates a new reference r on heap with classOf (r ) = c. Hence, each continuation of the new heap for a frame f on stack is also a continuation of the current heap for f . Therefore the invariants for continuations of other frames on the stack remain true.

Case 5. code(pc) = InvokeSpecial (void; c=<init>): We consider the invocation of a nullary constructor only. The general case is similar. For a nullary constructor the type assignment condition T5 yields with Fig. below that { opdTpc = opdS _ [_ ] and initCompatible(meth; _; c).

The predicate initCompatible implies that one of the following statements is true:

Let r be the topmost value on the operand stack opd. Invariants (opd1) and (opd2) yield reg  r : _ and thus type(r ) = _ .

The defensive VM deletes r from opd, pushes the current frame onto the stack (Fig. below) and sets initState(r) to Complete if c is class Object and to InInit otherwise. Hence type(r ) might change and we have to think about the possibility that the invariants for the continuations of other frames on the stack might be violated. Fortunately this is not the case because of invariants (init3) and (initseq).

We have to show the invariants for the new frame for the method c=<init>:

If _ = (c; k)new , then by invariant (init3), the reference r is not the init object of any init sequence on stack. This fact is needed, because r is the init object of the new frame created for the method c=<init>. Hence all init sequences have di_erent init objects and invariant (initseq) is true. Invariant (init1) is true in the new frame, since classOf (r) = c. The other invariants are trivially true for the new frame.

If _ = InInit, then by invariant (init1) we are already in an init sequence with init object r . This init sequence is extended by the new frame. By the induction hypothesis (init1), we obtain that classOf (r) v classNm(meth). Since classNm(meth) v c, it follows that classOf (r) v c. The other invariants are trivially true for the new frame.

It is not enough to show the invariants for the new frame for method c=<init>. We have to show them also for every continuation of the current frame which is pushed on stack and for every continuation of the current heap. In such a continuation we are at code index pc + 1 and we can assume that initState(r ) = Complete.

If _ = (c; k)new , then in the successor type frame of pc the type _ is replaced by c. Because initState(r ) = Complete in the continuation of the frame, we have type(r) = c and hence reg  r : c. Note, that if the successor frame contains an active occurrence of r , then its assigned type in the current frame must be _ .

If _ = InInit, then in the successor type frame of pc the type _ is replaced by classNm(meth) (Fig. below). By the induction hypothesis (init1) it follows that classOf (r) v classNm(meth). Since type(r) = classOf (r ) in the continuation, we obtain reg ` r : type(r ).

Case 6. code(pc) = Return(t): The checks in Fig. below (including the clause endinit) ensure that

The VM pops the topmost frame from the stack and makes it again the current frame. The returned value is appended to the operand stack and the program counter is incremented, if necessary. Hence we obtain exactly a continuation of the frame according to Def. Moreover, the current heap is a continuation of itself for the topmost frame according to Def. The induction hypothesis yields the invariants for the new state.

Case 7. code(pc) = Dupx(s1; s2): { opdTpc = opdS _ ts1 _ ts2 { length(tsi ) = si and validTypeSeq(tsi ) for i = 1; 2

Hence, opd = opd0 _ ws1 _ ws2, where length(wsi ) = si and the new operand stack at pc + 1 is opd0 _ ws2 _ ws1 _ ws2.

By the de_nition of the predicate validTypeSeq, the type sequences ts1 and ts2 do not start with a type hight. Hence, by invariant (opd2), the _rst words in the sequences ws1 and ws2 do not have the type tag hight. Therefore we see that the instruction Dupx cannot be used to construct an invalid 64-bit value be permuting single words on the operand stack and the invariant (opd64) remains true.

The cases for the other instructions are treated in a similar way. ut The entanglement of embedded subroutines and object initialization is rather delicate. In our approach we forbid that a caller of a subroutine uses later an un-initialized or partially initialized object stored in a register not modi_ed by the subroutine. This prevents that a subroutine can mislead the caller in believing that an object has already been initialized by the subroutine although it is not. It prevents also the reverse direction that a caller of a subroutine believes an object is still un-initialized although the subroutine already did that. We could achieve the same by other conditions on subroutines. We could forbid that

1. a caller passes not fully initialized objects to a subroutine and
2. a subroutine returns not fully initialized objects to a caller.

The _rst condition can be implemented by de_ning the successor set of a Jsr(s) instruction at type frame (regT; opdT) as follows: