Kreara: August 2009

21.8.09

Macro Variable Resolution Using Multiple Ampersands

Prepared by Meena R S (meena@kreara.com)

The SAS macro consists of two basic parts: macros and macro variables. The names of macro variables are prefixed with an ampersand (&) while the names of macros are prefixed with percent sign (%). Prior to the execution of the SAS code the macro variables are resolved. The resolved values are then substituted back into the code.

The macro variable references that have been described with one ampersand preceding the macro variable name are direct reference to a macro variable. In indirect referencing, more than one ampersand precedes a macro variable reference. The macro processor follows specific rules in resolving references with multiple ampersands.

The rules that the macro processor uses to resolve macro variable reference that contain multiple ampersands follow

Macro variable references are resolved from left to right
Two ampersands (&&) resolve to one ampersand (&)
Multiple leading ampersands cause the macro processor to rescan the reference until no more ampersands can be resolved.

Consider the example below.

Options symbolgen;

%let section4 =operating system;

%let n=4;

%put &&section&n;

For the above code, on the first pass the two ampersands are resolved to one and &n is resolved to 4, yielding &section4. On the second pass the macro variable reference &section4 resolves to operating system.

The following figure shows the process of resolving the macro variable reference in the program.

%let a =freight;

%let b=passenger;

%let c=special;

%let code=a;

%put &code;

%put &&code;

%put &&&code;

The following demonstrates how the macro variables with multiple ampersands are resolved.

Combining SAS data sets using UPDATE statement

Prepared by Rajeev V (rajeev@kreara.com)

UPDATE is an executable type SAS statement generally used in DATA steps. By updating a SAS dataset replaces the values of variables in one dataset with values from another dataset. The dataset containing the original information is the master data set, and the data set containing the new information is the transaction data set. UPDATE performs much the same function as merge with two exceptions:

Ø Only two datasets can be combined using UPDATE statement

Ø If the observation of a variable in the transaction dataset (i.e. data set containing new information) is missing, then the updated dataset containing the value of the observation same as in the master dataset

Syntax

DATA updated data set;

UPDATE master-data-set transaction-data-set;

BY variable-list;

RUN;

Where Master-data-set names the SAS data set that is used as the master file. Transaction-data-set names the SAS dataset that contains the changes to be applied to the master data set. Variable-list specifies the variables by which observations are matched.

Basic use of UPDATE statement

Consider two datasets, Lab1 and Lab2. Both data sets have four subjects with information about their blood pressure at two different times. The first dataset Lab1 contains the subject id, name of the subject and Blood pressure reading. The second dataset contain the latest blood pressure reading for the same subjects but the name of the subjects is not given.

It is required to replace the latest Blood pressure values in LAB1 with the values in LAB2. Here LAB1 is known as Master data set and LAB2 is transaction dataset.

Lab 1

Subject Name BP

001 AAA 120

002 BBB 130

003 CCC 140

004 DDD 150

Lab 2

Subject BP

001 160

002 160

003 160

004 160

The following program updates LAB1 (Master Dataset) with LAB2 (Transaction dataset).

DATA lab1_updated;

UPDATE lab1 lab2;

BY Subject;

RUN;

Printed output of lab1_updated is given below;

In the above example it is found that the latest blood pressure value is populated into updated dataset based on the BY variable Subject identifier.

Suppose Lab1 contains a duplicate observation. For example,

Lab 1

Subject Name BP

001 AAA 120

002 BBB 130

003 CCC 140

004 DDD 150

DATA lab1_updated;

UPDATE lab1 lab2;

BY Subject;

RUN;

If the above program is executed updating will not work for duplicate BY value.

That means if master data set contains two observations with the same value of the BY variable, the first observation is updated and the second observation is ignored. SAS writes a warning message to the log when the data step executes.

Printed output of lab1_updated is given below;

Observations two and three have duplicate values of BY variable subject identifier. However, the value of variable Blood pressure was not updated in the second occurrence of the duplicate BY value.

A situation may arise where a missing observation can occur in master dataset or in transaction dataset. Below example illustrates updating when dataset contains unmatched and missing observations.

Lab A

Subject Name BP

001 AAA .

002 BBB 130

003 CCC 140

004 DDD 150

Lab B

Subject BP

001 160

002 .

003 160

004 160

In the above dataset Lab A (Master Dataset) contains missing value corresponding to the subject 001 and the Lab B (Transaction dataset) contains missing value corresponding to the subject 002. The output of the below code gives the latest known BP value for each subject.

DATA lab1_updated;

UPDATE LabA LabB;

BY Subject;

RUN;

Printed output is given below;

The following is a more advanced example where the UPDATE statement is used to “flatten” a dataset containing different variable data values for a key spread across several observations. The goal is to combine non-missing values into one record per unique key value. The master dataset structure is read, but the OBS=0 option stops the DATA step from reading any data from it. The same dataset is then uploaded as transaction dataset, flattening non-missing observations for each unique key into one observation.

DATA lab;

INPUT Subject $ Calcium Albumin Chloride;

CARDS;

001 2.25 . .

001 . 49 .

001 . . 100

002 3.1 . .

002 . 50 90

002 . . .

;

RUN;

DATA lab;

UPDATE lab (obs=0) lab;

BY subject;

RUN;

Output is given below;

20.8.09

INFILE OPTIONS

Prepared by Sreeja E V(sreeja@kreara.com)

Infile has a number of options available.

FLOWOVER

FLOWOVER is the default option on INFILE statement. Here, when the INPUT statement reaches the end of non-blank characters without having filled all variables, a new line is read into the Input Buffer and INPUT attempts to fill the rest of the variables starting from column one. The next time an INPUT statement is executed, a new line is brought into the Input Buffer.

Consider the following text file containing three variables id, type and amount.

11101 A

11102 A 100

11103 B 43

11104 C

11105 C 67

The following SAS code uses the flowover option which reads the next non missing values for missing variables.

data B;

infile "External file" flowover;

input id $ type $ amount;

run;

which creates the following dataset

MISSOVER

When INPUT reads a short line, MISSOVER option on INFILE statement does not allow it to move to the next line. MISSOVER option sets all the variables without values to missing.

data B;

infile "External file" missover;

input id $ type $ amount;

run;

which creates the following dataset

TRUNCOVER

Causes the INPUT statement to read variable-length records where some records are shorter than the INPUT statement expects. Variables which are not assigned values are set to missing.

Difference between TRUNCOVER and MISSOVER

Both will assign missing values to variables if the data line ends before the variable’s field starts. But when the data line ends in the middle of a variable field, TRUNCOVER will take as much as is there, whereas MISSOVER will assign the variable a missing value.

Consider the text file below containing a character variable chr.

ccc

dddd

eeeee

ffffff

Consider the following SAS code

data trun;

infile "External file" truncover;

input chr $3. ;

run;

When using truncover option we get the following dataset

data miss;

infile ""External file" missover;

input chr $3. ;

run;

While using missover option we get the output