Making Your Sorts More Efficient: Using the NOEQUALS Option - SAS Programming

If you do not need to maintain the original order of observations within each BY group, you can specify the NOEQUALS option of PROC SORT to reduce machine time. The following program uses the default EQUALS option, which does maintain the original order:

Example – INEFFICIENT

PROC SORT DATA=TEST;
BY YEAR;
RUN;

Use the NOEQUALS option to specify that the order of observations within the levels of the BY variables in the sorted data set does not have to be the same as that of the data set before sorting.

Example – EFFICIENT

PROC SORT DATA=TEST NOEQUALS;
BY YEAR;
RUN;

A Word on Indexing

It would be a serious omission if we did not mention indexing before leaving this chapter, so we mention it briefly. Do not take that to mean that it is unimportant. It is simply a topic that is beyond our scope here.

For the programmer working with large data sets, indexing is a method that trades disk storage for efficiency. The decreases in search time may be offset by the increased space needed to store the indices. The main advantage of indexing is that you can directly access observations by the value of an indexed variable. When you are considering a small subset of a large file, indexing can be significantly more efficient than processing without indices. Data can also be retrieved for BY-group processing without sorting when indexing is used.
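As a minimal sketch of these ideas, the program below builds a simple index on YEAR in the data set TEST from the sorting examples, then uses it for subsetting and for BY-group processing. The output data set name SUBSET and the value 1992 are assumptions made for illustration only. Whether SAS actually uses the index for a given WHERE expression is decided at run time, so treat this as a pattern to try rather than a guaranteed speedup.

```sas
* Create a simple index on YEAR (stored alongside TEST, costing disk space);
PROC DATASETS LIBRARY=WORK NOLIST;
   MODIFY TEST;
   INDEX CREATE YEAR;
RUN;
QUIT;

* WHERE processing can use the index to read only the matching observations;
* SUBSET and 1992 are hypothetical names and values;
DATA SUBSET;
   SET TEST;
   WHERE YEAR = 1992;
RUN;

* BY-group processing on the indexed variable, with no PROC SORT step;
PROC PRINT DATA=TEST;
   BY YEAR;
RUN;
```

An index tends to pay off when the WHERE condition selects a small fraction of a large data set. If most observations qualify, reading the file sequentially is usually the cheaper choice.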

Problems

    Rewrite each of the following programs to make them more efficient:

  1. DATA ONE;
    INPUT GROUP $ X Y Z;
    DATALINES;
    A 1 2 3
    B 2 3 4
    B 6 5 4
    A 4 5 6
    RUN;
    PROC SORT DATA=ONE;
    BY GROUP;
    RUN;
    PROC MEANS N MEAN STD DATA=ONE;
    BY GROUP;
    VAR X Y Z;
    RUN;
    DATA TWO;
    SET ONE;
    IF 0 LE X LE 2 THEN XGROUP=1;
    IF 2 LT X LE 4 THEN XGROUP=2;
    IF 4 LT X LE 6 THEN XGROUP=3;
    RUN;
    PROC FREQ DATA=TWO;
    TABLES XGROUP;
    RUN;

  2. DATA NEW;
    SET OLD;
    RAWSCORE=SUM (OF SCORE1-SCORE100);
    RUN;
    PROC SORT DATA=NEW;
    BY GENDER;
    RUN;
    PROC MEANS N MEAN STD MAXDEC=3;
    BY GENDER;
    VAR RAWSCORE;
    RUN;

  3. The raw data file BIGFILE contains one or more blanks between all data values. SAS data set variables ITEM1-ITEM5 are 1 byte in length.

    DATA ONE;
    INFILE 'BIGFILE';
    INPUT GENDER $ ITEM1-ITEM5 X Y Z;
    IF GENDER = 'M' THEN COMPUTE = 2 * X + Y;
    IF GENDER = 'F' THEN COMPUTE = 2 * X;
    RUN;
    PROC FREQ DATA=ONE;
    TABLES ITEM1-ITEM5;
    RUN;
    PROC PLOT DATA=ONE;
    PLOT Z * COMPUTE;
    RUN;

    Note: Variables ITEM1-ITEM5 are used for frequencies only. No arithmetic operations are performed on these variables.

  4. SAS data set ONE contains variables GROUP, GENDER, RACE, and X1-X100.

    DATA TWO;
    SET ONE;
    IF GROUP=1 OR GROUP=3 OR GROUP=5;
    RUN;
    PROC FREQ DATA=TWO;
    TABLES GENDER * RACE;
    RUN;

  5. SAS data set LARGE contains the variables ID, DATE, YEAR, SCORE1-SCORE5, and X1-X100. You want to create two new SAS data sets. The first one should contain the first observation for each date (the one with the lowest ID number on each date). The other one should contain the last observation for each date (the one with the highest ID number on each date). The new data sets are to contain only ID, DATE, and SCORE1-SCORE5. You also want to restrict the data to the years 1990 through 1993, inclusive.

    PROC SORT DATA=LARGE;
    BY DATE;
    RUN;
    DATA FIRST;
    SET LARGE;
    BY DATE;
    WHERE YEAR BETWEEN 1990 AND 1993;
    DROP X1-X100;
    IF FIRST.DATE;
    RUN;
    DATA LAST;
    SET LARGE;
    BY DATE;
    WHERE YEAR BETWEEN 1990 AND 1993;
    DROP X1-X100;
    IF LAST.DATE;
    RUN;

All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd
