This course contains the basics of Sas Programming

Course introductionTest Your Caliber

Interview Questions

Pragnya Meter Exam

**SAS* Dates**

SAS software has a wealth of features for working with dates. Date values are commonly found in many applications in many different forms—and SAS software can read them all into SAS data sets and work with them.Dates may be found in MM/DD/YY, YY/MM/DD,or DD/MM/YY formats, with or without the slashes (or just about any delimiter such as dashes,commas, spaces,periods, etc.for that matter). Dates may also be encountered in Julian format.

A Julian date consists of the year (either two digits, such as 94,or four digits, such as 1994) followed by the number of days since January 1.As an example, February 1,1995,is written as 95032 (or 1995032) in Julian format.Dates are also found in ddMMMyy format,such as 01FEB95.Whatever the original format,the SAS System can read the date and convert it to a SAS date value.More about that in a paragraph or two.

Once you've read dates with your SAS program,you can perform any number of operations on them,including formatting them for output as well as for arithmetic operations.You may want to

- sort a data set in date order
- find the elapsed number of intervals between two dates in days, months, years, etc.
- extract the day of the week, month of the year, or the year for a particular date
- print dates in reports in a variety of formats.

To help you accomplish these goals,SAS software provides you with informats, formats,and date functions.In this chapter, we show you how these powerful features can assist you in all of these tasks.

Now, let's get back to those SAS date values. All SAS dates are stored as the number of days from January 1,1960 (day 0,picked somewhat arbitrarily).This allows you to find the interval between any two dates and to have a uniform way of storing all dates. It is very important to remember that once a date has been read in and converted to a number, it is treated as any other numeric quantity in a SAS data set. For example, if you read in a date of January 1, 1990, a value of 10,958 will be stored. SAS programs have no way of knowing if a number is a date or not. With these points in mind, let's proceed to our first example.

There are many ways to represent dates in a raw data file. For example, February 13, 1982, could be found as:

02/13/82

02-13-82

02+13,82

13FEB82

021382

02/13/1982

13/02/82

82044 (Julian Date)

1982044 (Julian Date)

SAS software can read all these forms and more. For example,if each of these date values starts in column 8, you can read each of them with the appropriate informat as follows:

Now, here is the first example.You want to read in the following raw data:

Write a SAS program to read in these data and compute the number of days between the admission and discharge dates. You will then print out a report showing: ID, ADMIT, DISCHRG, LOS,and COST, where LOS is the length of stay that you computed. Here is the program:

**Example **

DATA PATIENT;

INFILE 'HOSP'?

INPUT @i ID $2.

€5 ADMIT MMDDYY8,

€15 DISCHRG MMDDYY8.

€25 COST 5.;

LOS * DISCHRG-ADMIT+1?

LABEL ADMIT *'Admission Date'

DISCHRG-'Discharge Date'

COST ='Cost of Treatment'

LOS »'Length of Stay';

FORMAT ADMIT DISCHRG MMDDYY8. COST DOLLARS.;

RUN;

PROC PRINT LABEL DATA=PATIENT;

TITLE 'Hospital Report';

ID ID;

VAR ADMIT DISCHRG LOS COST;

RON;

Why do we add one day (ADMIT + 1) to the LOS calculation? This is something that most people forget to do when calculating a range in general. In this case, when patients come in for same-day surgery,if you just take DISCHRG-ADMTT as the LOS, you would lose all of those days.Now, here is the output:

*Output from Example- Reading a Date from Raw Data*

The FORMAT statement assigning the format MMDDYY8. to the variables ADMIT and DISCHRG is necessary to make the displayed date values understandable. Also note the use of the DOLLAR8,format applied to COST to enhance its readability. Had you omitted the FORMAT statement, the listing would have looked like this:

The numbers you see under the Admission Date and Discharge Date columns are the number of days from January 1,1960, corresponding to each of the dates.When you see output like this,you should be reminded to include date formats for your date variables. Also note that the values for COST are not easily readable without the dollar signs and commas.

Suppose you have a data file containing dates in a format for which SAS software does not have an informat.Suppose further that you can extract the month, day, and year of each date.To make it still more gruesome,suppose that the month, day, and year information are not even in contiguous columns. Enter the MDY function.

This function,which takes as its arguments month, day, and year, will create a SAS date for you. For example, suppose you have DAY in columns 1-2, MONTH in columns 10-11,and YEAR in columns 20-23.To create a SAS date, you would code:

**Example **

DATA MDYEXAMP;

INPUT DAY 1-2

MONTH 10-11

YEAR 20-23;

DATE=MDY (MONTH,DAY,YEAR);

FORMAT DATE WORDDATE*;

DATAMNES;

12 11 1992

11 09 1899

;

PROC PRINT DATA=MDYEXAMP;

TITLE 'Example of MDY function';

RUN;

*Output from Example - Creating a SAS Date from Month, Day, and Year*

The WORDDATE format prints the month name,the day of the month, and the year as seen in the above output.

Suppose you have a SAS data set EMPLOYEE which contains variables ID, DOB (date of birth),and several other variables. One requirement is to compute AGE from a date of birth. You have several choices to make.You can round the age to the nearest year or drop all fractional parts and take the age as of the last birthday.

You also need to consider as of what date you want the age computed.The example that follows is technically an approximate value for age,using the fact that an "average" year has 365.25 days (1leap year every 4 years). More sophisticated programming is needed to compute exact ages. First, let's look at ways to compute ages using the approximate method. Here is a program to compute a person's age several different ways:

**Example **

DATA AGE;

SET EMPLOYEE;

AGE1=INT (('01JAN95'D-DOB)/365.25);

AGE2=ROUND (('01JAN95'D-DOB)/365,25,1);

AGE3=INT ((TODAY()-DOB)/365.25);

AGE4=ROUND ((TODAY()-DOB)/365.25,1);

RUN;

AGE1 and AGE2 represent the age as of January 1,1995,with AGE2 rounded to the nearest year.These assignment statements use a date literal which is of the form: 'DDMMMYY'D DD is a two-digit day of the month, MMM is a month abbreviation, and YY is a two- or fourdigit year (it would be'DDMMYYYY'D in the latter case).

This string is placed in single or double quotation marks,followed by an upper-or lower-case D. SAS software will convert this literal into a SAS date (i.e. the number of days since January 1, 1960).Variables AGE3 and AGE4 are ages computed as of the current date which is represented by the TODAY() function.

The null parentheses () following the function name are required by SAS System syntax; all functions, even ones without any arguments, need parentheses so that the software can distinguish variable names from functions names.(Question: What did the TODAY function say to the MEAN function? Answer: "Don't give me any arguments!") Finally, AGE1 and AGE3 compute age as of the last birthday—the INT function truncates (throws away) the decimal part of the number.

AGE2 and AGE4 use the ROUND function to compute age to the nearest year.These functions are described in more detail in Example 3 in Chapter 5, "SAS Functions." We demonstrate an exact method to compute age later in this chapter.

The SAS System contains functions to compute day of the week and day of the month from a SAS date. It is easy to confuse the two. The DAY function returns a number from 1 to 31 which represents the day of the month corresponding to a given date; the WEEKDAY function returns a number from 1 to 7,corresponding to the day of the week.Suppose you want to chart the frequency of hospital admissions by day of the week and day of the month using the HOSP data from . Here is the program:

**Exarrmle **

PROC FORMAT; O

VALUE DAYFMT 1='SUN' 2='MON' 3='TUE' 4='WED

5='THU' 6='FRI' ?='SAT';

RON;

DATA PATIENT;

INF1LE 'HOSP';

INPUT €1 ID $2.

85 ADMIT MMDDYY6.

115 DISCHRG MMDDYY8,

€25 COST 5.?

DAY_WEEK * WEEKDAY (ADMIT); ©

DAYJ10N - DAY (ADMIT); ®

tABBL ADMIT ='Admission Date'

DISCHRG ='Discharge Date'

COST ='Cost of Treatment'

DAYJ=EEK-'Day of the Week'

DAYJION -'Day of the Month';

FORMAT ADMIT DISCHRG MMDDYY8, COST DOLLARS,

DAYJfEEK DAYFMT,;

RUN;

PROC CHART DATA=PATIENT;

VBAR DAY_WEEK DAY MON/DISCRETE;

RUN;

Line © computes DAY_WEEK which is the day of the week, a number from 1 to 7.Line © computes DAY_MON,the day of the month, a number from 1 to 31.PROCFORMAT 0 creates a format for DAY_WEEK.As is usually the case with the SAS System, there is more than one way to achieve the same results.In the current example, DAY_WEEK could have been calculated in an alternative manner which would not necessitate the PROCFORMAT step.The following line of code,

DAY_WEEK=PUT(ADMIT,WEEKDATE3.);

creates a three-character variable (DAY_WEEK) which has as its value SUN,MON,TUE, etc.The only other difference is that DAY_WEEK is now a character variable as opposed to the earlier method which yields a numeric value for DAY_WEEK.In all likelihood, however, this would not present a major difficulty, if at all.This is just another testament to the versatility and power of the SAS System.

What day of the week was July 4,1776? How can you create variables that represent the YEAR and MONTH for a given date? SAS software has a number of very useful date functions to answer these and other questions.The next two examples demonstrate representative situations.Suppose you have a SAS data set FUND which contains fund drive information. Included are variables DATE (date of donation, stored as a SAS date value) and AMOUNT (amount of donation).You would like charts showing donation AMOUNT by year and month. Here is the program:

**Example **

DATA TEMP;

SET FUND;

YEAR=YEAR (DATE);

MONTH=MONTH (DATE);

RUN?

PRQC CHART DATA=TEMP;

VBAR YEAR/SUMVAR=AMOUNT DISCRETE;

VBAR MONTH/SUMVAR=AMOUNT DISCRETE;

RUN;

You need only use the YEAR and MONTH functions to extract the appropriate information from DATE.The YEAR function returns a four-digit year; the MONTH function returns a number from 1 to 12.You might want to create a user-defined format for MONTH with 1='JAN', etc."PROC CHART," has more details on the CHART procedure.

*Computing Date Intervals*

The remaining examples in this chapter cover the INTCK and INTNX functions. These functions are used to compute the number of intervals between two dates and the date after which a given number of intervals has occurred, respectively. Interval units can be one of the following: DAY, WEEK,TENDAY, SEMIMONTH, MONTH, QUARTER (QTR), SEMIYEAR, or YEAR. There is also a WEEKDAY interval which allows you to count the number of working days, specifying which day(s) of the week will be days off. These functions can be quite complicated and SAS Technical Report P-222, a paper in the 1994 Northeast SAS Users Group Proceedings, by Alan Dickson entitled "Blind Dates & Other Lapses of Reason: Handling Dates in SAS." We demonstrate some elementary applications of these functions .

Suppose you have a company SAS data set EMPLOY which contains, among other variables, EMP_NUM (employee number) and DATEHIRE (date of hire).You want to know how many years each employee has worked.One approximate method is to count how many year intervals have passed between the date of hire and the current date.The SAS function INTCK can be used to count how many times the period of employment has entered a new year (i.e. how many January Ist's have passed).This is accomplished using the following code:

**Example**

DATA NEW_EMP;

SET EMPLOY;

CURRENT=TODAY();

WORK_YRS=INTCK ('YEAR',DATEHIRE,CURRENT)?

RUN?

Using 'YEAR' for the first argument of the INTCK function gives you the number of times a yearly boundary has passed.For example,if an employee were hired on November 3, 1993, and the value for CURRENT is February 1,1994, the program above would set WORK_YRS equal to 1 since there is one yearly boundary (January 1st) between November 3, 1993, and February 1, 1994. If an employee were hired on January 2, 1994, and the value for CURRENT is December 31,1994, the value of WORK_YRS would be 0,even though the employee had worked almost a full year.

Be cautious when using these functions and interpret your results very carefully.If you want to count the number of months between the current month and the month of hire you can use the INTCK function with MONTH as the appropriate interval.In general, the code to determine the number of months between the DATEHIRE date and the CURRENT date is:

WORK_MON=INTCK ('MONTH',DATEHIRE,CURRENT);

In the first case where the date of hire is 11/3/93 and the CURRENT date is 2/1/94,the calculation gives you 3 (you have passed 3 month boundaries: Dec., Jan., and Feb.).

Age in years(as of the last birthday)was computed using an approximate calculation based on the fact that an average year contains 365.25 days.While this is usually good enough,an exact method is shown next in case you need to use it someday.It also serves as another example of the INTCK function.

This calculation involves counting the number of months between two dates (e.g. date of birth and the current date) and subtracting one month if the current day of the month is less than the day of the month of the person's birthday.This calculation was originally posted on a public SAS bulletin board (SAS-L) by William Kreuter,and we present it here (with a slight modification) with his permission:

AGE=INT( (INTCK('MONTH',BIRTH,CURRENT) - (DAY(CURRENT)<DAY(BIRTH)))/12 );

BIRTH is the date of birth and CURRENT is the date of the age that you want to compute.The result is the age as of the last birthday. Note the clever use of the logical expression (DAY(CURRENT) < DAY(BIRTH)) in the equation.If the current day of the month is less than the person's birthday,the logical expression is true and you subtract 1 from the number of months.If the current day of the month is greater than the birthday day of the month,the expression is false,which means you subtract 0 from the number of months (you count the current month in the total). Dividing by 12 converts the months to years,and the INT function drops any fractional part of the year.

Let's test this with several current dates. If a person were born on October 21, 1992, and the current date is November 23, 1994,the AGE will be 2 years since INT ((25 - 0)/12) = 2.If the current date is September 15,1994, the number of months is 23.You subtract 1 since the current day of the month (15) is less than the day of the month at birth (21).The result is AGE=1.Finally, if the current date is October 3, 1994,the number of months is 24, but you subtract 1 (3 is less than 21 and the logical expression is true) to get 23. Dividing this number by 12 and than applying the INT function to the result yields the correct answer, 1.

One situation where this exact method differs from the approximate method is when you are computing ages and the current date is a person's birthday. For example, if the date of birth is 10/21/92,and you compute the age in years for a current date of 10/21/93 by the approximate method (using the INT function), the age will be computed as 0; by the exact method it will be 1 year.

The reason for the bogus number is the .25 in the divisor. In this example, the difference in days is 365.Dividing this by 365.25 yields a number less than 1, and applying the INT function to that result yields 0. Be careful.

The companion function to INTCK is the INTNX function.This function returns a SAS date value which is a given number of intervals after a base date value. Suppose you want to find the date 10 months after a patient's initial visit so a follow-up office visit can be arranged.Working from the initial visit date, VISIT, you can easily generate the date 10 months hence as:

FOLLOWUP=INTNX ('MONTH',VISIT,10);

For example,if the date of VISIT is 10/21/94,the FOLLOWUP date will be August 1,1995.Once again, realize that this function,like INTCK,works with internal boundaries (in this case,the 1st of each month).As long as you are fully cognizant of what the code is doing, you can't go wrong. Remember one of Murphy's laws: When working toward solving a problem, it always helps when you know the answer.

**Problems**

- You have an instream raw data file of patient hospital stays with the following file layout:
- You have a SAS data set, DATES2,that contains variables MONTH and YEAR (of birth)but not day. Create a new data set based on DATES2 and include AGE as of the date the program is run, using the 15th of the month as an approximate day for the date of birth. Round the value of AGE in years to the nearest year. Store the MMDDYY8. format with DOB.
- You have a SAS data set, DATES3,which contains the variables ID, DOB (date of birth), VISIT_D, VISIT_M, and VISIT_Y(visit day, month, and year respectively). Write a program to create a new data set which includes a variable called AGE (approximate age rounded to the nearest year at the time of visit.) Try to calculate the value of AGE with one statement.
- You have a set of raw data showing the number of crayons produced at Company X on certain sample dates. The company started producing crayons in the beginning of 1990. Here is the raw data file layout:
- You are assigned the job of "match-making" for a dating firm. The firm has its data in a SAS data set called CLIENTS, consisting of variables ID (ID number),GENDER (M or F),and DOB (date of birth.)The astrologer on staff suggests that there are two compatible groups of clients, one being males born on a Wednesday or Thursday in either January or March, and the other being females born on Fridays in either August or September.First, write a program to create a subset of CLIENTS which contains clients fitting the astrologer's suggestions. Next, print out two lists, one for females and one for males,listing the clients by ID and showing the DOB, the DAY of the week on which they were born,and the MONTH of the year in which they were born.

- a) Write a program to create a SAS data set called DATES 1, and list the resulting data set with PROC PRINT. Create variables ID, ADMIT, DISCH, and DOB from the given data, and also create the following new variables:

AGE age in years on the date of admission (as of the last birthday)

DAY numeric day of the week of admission date (l=Sun, 2=Mon, etc.)

MONTH numeric month of year of admission date (l=Jan, 2=Feb, etc.)

Set up the DATA step so that the variables print with the following formats:

ADMIT mm/dd/yy

DISCH mm/dd/yy

DOB ddMMMyyyy

AGE no decimals

DAY 3 letter names (SUN, MON, etc.)

MONTH numeric (1=Jan, 2=Feb, etc.)

b) Compute the mean age of all patients (use PROC MEANS — see Chapter 10, "PROC MEANS and PROC UNIVARIATE").

c) Create a chart showing the frequency of admissions by day of week (use PROC CHART—see Chapter 12, "PROC CHART").

d) Compute frequencies on the number of visits by month of year (use PROC FREQ — see Chapter 10).

You want to compute the mean number of crayons produced in each quarter, starting from 01/01/90. For example, the mean number of crayons produced in quarter 0 (from 01/01/90 to 03/31/90) is 12,550.There are various ways to do this, but try using the INTCK function.Also, see Chapter 10 for the syntax for PROC MEANS.