STRING MANIPULATION AND PARSING - IBM Mainframe

One of the most important strengths of the REXX language is its character string handling ability. As noted earlier, REXX has no explicit data types and all data can be manipulated as character strings. This is not a limitation for most applications where REXX is naturally used (application macros, command procedures, prototyping, etc.), and is actually quite convenient. Further, because REXX specializes in handling character strings, it does it very well and offers many built-in facilities for this purpose.

The most frequent string operation, concatenation, can be expressed with a simple operator ("||") or in many cases none at all (direct abuttal of tokens). Equality and comparison operators for strings are the same as for numeric values, and the distinction is usually immaterial. REXX also tries to work with strings in the way that is most natural in ordinary applications, so the standard equality and comparison operators ignore leading and trailing blanks. Alternative strict ("exact") operators, such as "==", are available when leading and trailing blanks should not be ignored.
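A minimal sketch of these operators follows. The variable names and values are illustrative only and do not come from the original text.

```rexx
/* Sketch: concatenation and comparison; all values are illustrative */
first  = 'Hello'
second = 'World'
say first || second          /* explicit operator: HelloWorld   */
say first second             /* blank operator:    Hello World  */
say first'!'                 /* abuttal:           Hello!       */

padded = '  REXX  '
/* The normal comparison ignores leading and trailing blanks */
if padded = 'REXX' then say 'equal under the normal comparison'
/* The strict comparison compares character by character */
if padded == 'REXX' then say 'strictly equal'
else say 'not strictly equal'
```

With these values the normal comparison succeeds and the strict one fails, because the blanks around REXX count only in the strict form.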

String handling is greatly facilitated by the fact that storage allocation and management in REXX is completely automatic. It is never necessary to specify the (maximum) length of a string or to allocate space for it. Providing temporary storage for intermediate results is also handled transparently, and there is no need for "garbage collection."

REXX has two other significant features designed for manipulating character strings. The first is a collection of string-oriented built-in functions, and the second is the PARSE instruction. A number of REXX's string-handling functions provide services commonly available in other programming languages. Some examples are:

  • SUBSTR() - Substring of argument string
  • LENGTH() - Length of argument string
  • POS() - Position of one argument string in another
  • COPIES() - Arbitrary number of copies of argument string
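As a quick illustration, the four functions listed above can be tried on a sample string. The string itself is an assumption chosen for the example.

```rexx
/* Sketch: the four basic functions applied to an illustrative string */
s = 'mainframe'
say substr(s, 5, 5)      /* frame  : 5 characters starting at position 5 */
say length(s)            /* 9                                            */
say pos('frame', s)      /* 5      : where 'frame' starts within s       */
say copies('ab', 3)      /* ababab                                       */
```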

REXX string functions extend far beyond such standard capabilities, however. One interesting group of functions is based on the frequently occurring situation of regarding a string as a sequence of words delimited by blanks. Strings of this sort include natural language text (after punctuation is removed) as well as short lists ("bread eggs butter onions tomatoes"). In this category are functions like WORD(string, n), which returns the nth word in the string, and WORDS(string), which returns the total number of words in the string.
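A short sketch of the word functions, using the shopping list from the paragraph above (the loop and variable names are illustrative):

```rexx
/* Sketch: treating a string as a blank-delimited list of words */
list = 'bread eggs butter onions tomatoes'
say words(list)            /* 5      */
say word(list, 3)          /* butter */

/* Walk the list one word at a time */
do i = 1 to words(list)
  say i word(list, i)
end
```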

There are quite a few other string functions for miscellaneous purposes, some of which have surprisingly powerful capabilities. Among these are COMPARE(), which returns 0 if two strings are identical and otherwise returns the first position in which they differ; INSERT(), which inserts one string at an arbitrary position in another; STRIP(), which removes leading or trailing occurrences of a specific character (blank by default) from a string; VERIFY(), which tests a string for the occurrence or nonoccurrence of a specific set of characters; and TRANSLATE(), which replaces any desired characters with specific others.
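The following sketch exercises each of these functions once. All argument values are made up for illustration.

```rexx
/* Sketch: miscellaneous string functions; all values are illustrative */
say compare('abcdef', 'abcxef')          /* 4 : first differing position */
say compare('same', 'same')              /* 0 : strings are identical    */
say insert('XY', 'abcdef', 3)            /* abcXYdef : after 3rd char    */
say strip('000123000', 'L', '0')         /* 123000 : leading zeros only  */
say '[' || strip('  padded  ') || ']'    /* [padded] : blanks, both ends */
say verify('12a45', '0123456789')        /* 3 : first non-digit position */
say translate('2018-06-22', '/', '-')    /* 2018/06/22                   */
```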

To show a bit of the flavor of string handling in REXX, here is a little program that takes a time in the form HH:MM (hours and minutes) and displays the value in English:

[Program listing not reproduced in this copy; only its final line survives:]

'TIME IS' HR MIN'.'

For instance, when the input is 10:33 this program displays

TIME IS TEN THIRTY-THREE

There are a few features of REXX used here which have not been explained yet, such as the use of a literal in the PULL instruction and the "%" (integer division) and "//" (remainder) arithmetic operators. However, apart from illustrating string handling in REXX, the main point to be made here is how transparently REXX deals appropriately with data as either numbers or strings. Arithmetic can be performed directly on character strings when appropriate. In particular, notice how the variable minutes can be used as easily with string functions (RIGHT(), LEFT()) as with numeric operators. Of course, a real program would have error-checking to ensure that only valid numbers are involved.
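The original listing is not available here, so the following is only a sketch of how such a program might look, not the author's code. It uses the features named above: a literal in PULL, "%" and "//", WORD(), LEFT() and RIGHT(). The word lists, the handling of minutes below ten, and the assumption of valid input (hours 1 to 12, minutes 00 to 59, typed without blanks) are all choices made for this illustration.

```rexx
/* Sketch only: display a time entered as HH:MM in English.         */
/* Assumes valid input: hours 1-12, minutes 00-59, no blanks.       */
units = 'ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN',
        'ELEVEN TWELVE THIRTEEN FOURTEEN FIFTEEN SIXTEEN',
        'SEVENTEEN EIGHTEEN NINETEEN'
tens  = 'TEN TWENTY THIRTY FORTY FIFTY'

say 'ENTER TIME AS HH:MM'
pull hours ':' minutes          /* literal ":" splits the input line */

hr = word(units, hours + 0)     /* "+ 0" drops any leading zero      */
select
  when minutes = 0  then min = "O'CLOCK"
  when minutes < 10 then min = 'OH' word(units, minutes + 0)
  when minutes < 20 then min = word(units, minutes + 0)
  when minutes // 10 = 0 then min = word(tens, minutes % 10)
  otherwise
    /* Same variable used as a string: first and last digit */
    min = word(tens, left(minutes, 1)) || '-' || word(units, right(minutes, 1))
end
say 'TIME IS' hr min'.'
```

The same variable, minutes, is compared numerically, divided with "%" and "//", and sliced with LEFT() and RIGHT(), with no conversion step between the two uses.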

This example also illustrates how one often uses lists of words separated by blanks instead of arrays. The WORD() built-in function is used to access specific elements of the list. The use of the PULL instruction here also bears further discussion. PULL is really just a shorthand form of the PARSE instruction. The example could have been written equivalently with the line PARSE UPPER PULL HOURS ":" MINUTES instead. The full interpretation of this instruction is: "read a line of input from the user, translate it to uppercase, assign everything before ":" to the variable hours and everything after ":" to minutes."
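The same template can be tried without keyboard input by parsing a literal value instead. This is a sketch under that assumption; PARSE UPPER VALUE ... WITH stands in for PARSE UPPER PULL, and the sample strings are invented.

```rexx
/* Sketch: a literal pattern in a template, applied to a fixed string */
parse upper value '10:33' with hours ':' minutes
say hours        /* 10 */
say minutes      /* 33 */

/* UPPER translates the source string to uppercase before parsing */
parse upper value 'noon:sharp' with part1 ':' part2
say part1        /* NOON  */
say part2        /* SHARP */
```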

The PARSE instruction (or its equivalents implied by PULL and ARG) is used frequently in REXX programs. It is able to take strings from a number of possible sources and break them apart into constituent parts using a fairly natural notation. The part of the instruction that tells how to parse the string is called the parse template. The simplest form of a template is just a list of variable names. The input string is divided into blank-delimited words, which are assigned, in order, to the variables. If there are more words than variables, the entire remaining part of the string is assigned to the last variable. If there are more variables than words, the extra variables are assigned the null string. This construct is useful in reading several numbers from a user or tabular data from a file. For instance:
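The two boundary cases of a simple template (more words than variables, and more variables than words) can be sketched as follows. The sample strings and variable names are illustrative.

```rexx
/* Sketch: a template that is just a list of variable names */
parse value 'alpha beta gamma delta' with w1 w2 w3
say w1           /* alpha                                        */
say w2           /* beta                                         */
say w3           /* gamma delta : last variable takes the rest   */

/* More variables than words: the extras receive the null string */
parse value 'alpha' with first rest
say '[' || first || ']'     /* [alpha] */
say '[' || rest || ']'      /* []      */
```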

[Program listing not reproduced in this copy.]

The listing uses the LINEIN() function to read a line at a time from a file into compound variables with the stem avg. Each line of the file contains four numbers separated by blanks, but otherwise in a free format. The file is easy to maintain with a text editor because there is no need for a restriction to specific column numbers. (The LINES() function is nonzero until the end of the file is reached, which makes it convenient for terminating input loops.)
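A loop of the kind described might look like the sketch below. It is not the original listing: the file name, the counter, the shape of the compound-variable tails, and the avg.0 count are assumptions made for illustration. It also assumes an implementation that provides the LINEIN() and LINES() stream functions.

```rexx
/* Sketch only: read four blank-delimited numbers per line into avg. */
file = 'averages.txt'            /* hypothetical file name            */
n = 0
do while lines(file) > 0         /* nonzero until end of file         */
  n = n + 1
  parse value linein(file) with avg.n.1 avg.n.2 avg.n.3 avg.n.4
end
avg.0 = n                        /* assumed convention: count in .0   */
say n 'lines read'
```

Because the template is a plain list of variables, the numbers may be separated by any amount of blank space on each line.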

It is often helpful to be able to automate the processing of computer files produced by various applications. When such files are in a report format meant for reading by people, they are harder for another program to process. For instance, a report may have on a single line:

NAME: ALEXIS LEON BIRTH-DATE: 22/06/67 EMPNO: 098765

In many languages, this would require a lot of work to interpret, because (for instance) the name might be a variable number of words. A single PARSE instruction,

PARSE VAR LINE 'NAME:' NAME 'BIRTH-DATE:' BIRTHDAY 'EMPNO:' EMPNO

handles the whole thing and assigns each data item to an appropriate variable: NAME, BIRTHDAY and EMPNO.
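As a runnable sketch, the report line above can be parsed and then tidied. The STRIP() calls are an addition for illustration: each variable receives the text between two literal patterns, including the blanks next to them.

```rexx
/* Sketch: parse the sample report line using literal patterns */
line = 'NAME: ALEXIS LEON BIRTH-DATE: 22/06/67 EMPNO: 098765'
parse var line 'NAME:' name 'BIRTH-DATE:' birthday 'EMPNO:' empno

/* Remove the blanks that surrounded each value in the report */
name     = strip(name)
birthday = strip(birthday)
empno    = strip(empno)

say name          /* ALEXIS LEON */
say birthday      /* 22/06/67    */
say empno         /* 098765      */
```

The name may contain any number of words, since everything up to the next literal pattern goes into the one variable.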


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
