THE REXX DATA MODEL - IBM Mainframe

Now that we know at the highest level what the structure of a REXX program is, it is time to look more closely at how REXX manages data. There are two primary facts to remember about the REXX data model. The first is that all data is stored (conceptually at least) as character strings. That is, REXX in general does not recognize data types. All data in REXX, without exception, can be handled as a character string. It can be concatenated with other strings. String operations such as substring extraction can be performed on it. All data can be input and output without the need to perform conversions.

Certain operations in REXX, like arithmetic, do require the data to be understandable as a number, and will give an error if it isn't. But conversions in such cases are implicit and automatic. Even when data has to be treated as numeric, the user is relieved of the requirement present in other languages to be concerned with the internal representation of the number. That is, there is no distinction made between integer, binary, decimal, or floating point representations. Indeed, there is little need to be concerned with the precision of a number, i.e., the size or number of significant digits in it. By default, REXX allows for nine significant digits. If necessary, this default limit can be raised, subject only to limits of the specific implementation and available space.
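The following minimal sketch illustrates both points: the same values are used as strings and as numbers without any conversion, and the default precision of nine significant digits is raised with NUMERIC DIGITS. The variable names A and B are arbitrary choices for the illustration.

```rexx
/* Sketch: one value used as both a string and a number */
a = '12'
b = 30
say a + b            /* arithmetic: displays 42 */
say a || b           /* concatenation: displays 1230 */
say length(a + b)    /* string function on a numeric result: displays 2 */

say 1 / 3            /* default of nine significant digits: 0.333333333 */
numeric digits 20    /* raise the precision for later arithmetic */
say 1 / 3            /* displays 0.33333333333333333333 */
```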

The second primary fact about data in REXX is that the language makes no provision at all for declaring data. In other languages, data must usually be declared for at least three reasons: to specify the type of the data, to specify the amount of storage required for the data, and to specify the name used to access the data. All of these reasons are in reality designed for the convenience of the language processor rather than for the convenience of the user. REXX handles each of these details automatically. As just explained, it handles any necessary conversions implicitly. It automatically manages storage allocation. And it can always recognize variable names from context. Since REXX eliminates these needs for declaration of data, the language does not have any data declarations.

All data items in REXX are referred to with a symbolic name. REXX has no other way, such as pointers, to access data. This makes REXX very safe to use, since it is impossible to reference memory that has not been allocated. REXX symbols are tokens that contain only the upper- and lowercase alphabetic characters, numerals, and certain special characters ("!", ".", "?" and "_"). Not all valid REXX symbols can be used as the name of a variable, but the precise rules are not important at this point. Variable names can usually be quite long, though this is implementation dependent. Commonly the limit is 250 characters or more. REXX always converts symbols to uppercase before interpreting them.

A REXX variable acquires a value when it is the target of an assignment or in a few other specific cases such as PARSE. Such a variable is said to be initialized. It is legal in REXX to use uninitialized variables. This is because any uninitialized variable is assumed to have a value that is the same as its name. Although there are a few times where this convention is convenient, relying on it is usually no more advisable in REXX than it is in any other language. While a REXX program will never crash simply because it refers to an uninitialized variable (as can happen in many languages), it certainly may malfunction and give incorrect results. Though not the default, it is possible to force REXX to raise an error condition when an uninitialized variable is used inadvertently.

There are two kinds of variables in REXX: simple and compound. So far, all examples we have presented use simple variables. These behave much like variables in any other language. The other kind, compound variables, are one of the most significant and characteristic features of the language. Compound variables are similar to arrays in other languages, but with significant differences (as well as advantages and disadvantages). A compound variable is referred to with a symbol that contains one or more periods in it, such as:

ARRAY.I

TWO_DIMENSIONAL_ARRAY.I.J

DATABASE_RECORD.TYPE.FIELD.NAME

Each part of such a symbol is a simple symbol. We may speak of simple and compound symbols as (respectively) those that do not or do contain a period. There is a fundamental distinction in REXX between the symbol that refers to a variable and the actual name of the variable, although it is relevant only for compound variables. For simple variables, the symbol that refers to the variable and the variable's name are the same; this is not true for compound variables.
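Before looking at how compound names are derived, the behavior of uninitialized simple variables described earlier can be sketched as follows. The variable names TOTAL and COUNT are hypothetical; SIGNAL ON NOVALUE is the standard way to turn uninitialized use into an error condition, and the handler shown is only one possible way to report it.

```rexx
/* Sketch: uninitialized variables and the NOVALUE condition */
say total            /* uninitialized: displays TOTAL */
total = 10
say total            /* initialized: displays 10 */

signal on novalue    /* from here on, uninitialized use raises NOVALUE */
say count            /* COUNT was never assigned: control goes to NOVALUE */
exit

novalue:
  /* CONDITION('D') gives the variable name; SIGL holds the line number */
  say 'Uninitialized variable' condition('D') 'used on line' sigl
  exit 1
```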

Let us agree to call each portion of a compound symbol delimited by periods a node. The first node, up to the first period, is called the stem. The rule for mapping a symbol to a variable name is as follows: for each node in the symbol except the stem, substitute the value of the variable named by the corresponding simple symbol. As a special case, for each node corresponding to a simple symbol that names an uninitialized variable, substitute the name in uppercase. (This is, after all, the "value" of an uninitialized variable.) The stem does not undergo substitution (but it is uppercased). The result is the name of a variable. (The periods are retained, too, so the name contains at least as many periods as the original symbol.) This name, sometimes called the derived name, is then used just like an ordinary (simple) name in whatever way is appropriate for the context. To take the simplest example, suppose X = 1 and Y is undefined. Then the assignments FOO.X = 'ALEXIS' and FOO.Y = 'LEON' assign values to two variables, having derived names FOO.1 and FOO.Y, respectively.

The statement SAY FOO.X FOO.Y displays ALEXIS LEON. Many different symbols can refer to the same variable if they produce the same derived name. For instance, if Z = 'Y', then SAY FOO.1 FOO.Z produces the result ALEXIS LEON as before.

Keep in mind that there are symbols that cannot be the names of (simple) variables. Such symbols include numbers, or any symbol that begins with a digit.

When symbols like this occur in a node of a compound symbol, they are used literally. The symbol FOO.1 is an example of this. For additional examples of the general process, suppose the following assignments have been made:

[Table: example assignments and the resulting derived names]
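The FOO example from the text can be run as the short sketch below. It assumes a fresh program in which Y has never been assigned, so the node Y is replaced by its own uppercased name, while the numeric node in FOO.1 is used literally.

```rexx
/* Sketch: several symbols that derive the same variable names */
x = 1                 /* X is initialized; Y is deliberately left unset */
foo.x = 'ALEXIS'      /* derived name FOO.1 */
foo.y = 'LEON'        /* derived name FOO.Y */
say foo.x foo.y       /* displays ALEXIS LEON */

z = 'Y'
say foo.1 foo.z       /* same derived names: displays ALEXIS LEON */
```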

Note that the values of the simple variables that are substituted may contain lowercase letters, which are not uppercased. In fact, those values may contain any characters at all, even blanks, special characters, and extra periods. So variable names may contain arbitrary characters.
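As an illustration of this point (a sketch only, with hypothetical names KEY, OP, OTHER and TABLE.), the values below contain lowercase letters, a blank, an operator character, and an extra period. Each becomes part of a derived name exactly as written.

```rexx
/* Sketch: derived names containing arbitrary characters */
key = 'mixed Case'        /* lowercase letters and a blank */
op  = 'a+b.c'             /* an operator character and an extra period */
table.key = 'first'       /* derived name: TABLE.mixed Case */
table.op  = 'second'      /* derived name: TABLE.a+b.c */
say table.key table.op    /* displays first second */

other = 'MIXED CASE'      /* differs from KEY only in case */
say table.other           /* a different, unset variable: TABLE.MIXED CASE */
```

The last line shows that substituted values are not uppercased: TABLE.mixed Case and TABLE.MIXED CASE are distinct variables.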

Because variable names may contain arbitrary characters, there are many names that cannot appear explicitly in a program. This is the case, for instance, with names that contain blanks or operator symbols. Such names can be referred to only when derived from an appropriate compound symbol.

The periods occurring in the original compound symbol remain in the compound name, and additional periods may occur if they form part of the value of one of the simple variables being substituted. Unlike a compound symbol, however, a compound name should be thought of as having only two parts: the initial part up to and including the first period (the stem), and everything else (which may contain additional periods).

When REXX goes to look up the value of a compound variable, it first searches for the stem. Then under the stem it searches for the suffix consisting of the remainder of the name, just as if this suffix named a simple variable in a private name space defined by the stem. This suffix is called the tail. If the resulting name is not found, the original compound symbol still refers to a value, which is the derived compound name, according to the normal rules by which REXX handles undefined variables.
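The two-part view of a compound name (stem plus tail) can be seen in the sketch below, where a two-node symbol and a one-node symbol produce the same tail. The names GRID., I, J and IJ are hypothetical.

```rexx
/* Sketch: the tail is a single string, however many nodes produced it */
i = 1
j = 2
ij = '1.2'            /* a value that itself contains a period */
grid.i.j = 'cell'     /* derived name GRID.1.2 (stem GRID., tail 1.2) */
say grid.ij           /* same stem and tail: displays cell */
say grid.j.i          /* tail 2.1 was never assigned: displays GRID.2.1 */
```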

Though the details of this process for working with compound variables are somewhat involved, REXX compound variables turn out to be a very powerful and useful facility of the language. Compound variables can be used very much as arrays are in other languages. Even so, the REXX approach has several advantages. It is not necessary (or possible) to determine the size of an array in advance; storage is allocated as needed, and there can even be large gaps in the array without wasting space. Also, though compound variables can be used as if they were arrays of a specific number of dimensions, they can also be used without any specific fixed dimensionality if that is convenient.
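A brief sketch of these advantages, using a hypothetical stem READING.: no size is declared, widely separated elements are assigned without filling the gap between them, and the same stem is used with one "subscript" and then with two.

```rexx
/* Sketch: a sparse "array" with no declared size or fixed dimensionality */
reading.1 = 21.5
reading.1000000 = 19.8     /* nothing is allocated for elements 2..999999 */
i = 1000000
say reading.i              /* displays 19.8 */
i = 500
say reading.i              /* never assigned: displays READING.500 */

site = 'north'
reading.site.1 = 18.2      /* same stem, now with two "subscripts" */
say reading.site.1         /* displays 18.2 */
```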

REXX compound variables have the significant advantage over arrays in most other languages in that the "subscripts" need not be numeric; they can be any valid character string (up to some implementation-defined maximum length). This permits very useful associative retrieval of data. For instance, database records pertaining to individuals can be retrieved directly by the name of the individual:

These symbols might be used to work with a personnel file. To access any piece of data, it is necessary to have only the actual name as the value of the variable NAME. (All current REXX implementations keep data in memory only; they do not refer directly to external files. Therefore, this example assumes the data has been loaded in from some sort of file. But in principle REXX could transparently use disk files for its data.)
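Associative retrieval of this kind might look like the following sketch. The stems DEPT. and SALARY. and the records themselves are made up for illustration and are assigned directly in the program rather than loaded from a file.

```rexx
/* Sketch: personnel data keyed by the employee's name (hypothetical data) */
name = 'Mary Jones'
dept.name   = 'Accounting'
salary.name = 52000

name = 'Li Wei'
dept.name   = 'Engineering'
salary.name = 61000

/* Retrieval needs only the person's name as the value of NAME */
name = 'Mary Jones'
say name 'works in' dept.name 'and earns' salary.name
/* displays Mary Jones works in Accounting and earns 52000 */
```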


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
