8.6.10

Index, Indexc and Indexw functions

Prepared by Meena R S

The INDEX, INDEXC and INDEXW functions search a character value for matching characters, which makes them useful for testing the contents of a variable. The INDEX and INDEXC functions indicate whether a string of characters is present in a target variable. Both functions return the position of the match in the target variable. A zero indicates that the search argument is not present in the target variable.

The INDEX function finds a single character or a group of characters (letters or special characters) in a string. The search is case sensitive. The syntax is:

INDEX (source, excerpt)

The INDEXC function accepts multiple excerpt arguments and returns the position of the first occurrence of any character from any of those arguments; otherwise it works like the INDEX function.
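The difference between the two functions can be illustrated with a minimal sketch. The data set name, variable names and sample strings here are hypothetical and chosen only for illustration.

```sas
/* Hypothetical data set, variable names and sample strings */
data index_demo;
   input string $20.;
   /* INDEX looks for the exact pattern 'cd' */
   pos_index  = index(string, 'cd');
   /* INDEXC looks for the first occurrence of any of the characters d, c or x */
   pos_indexc = indexc(string, 'dc', 'x');
   datalines;
abxcd
xyz
ABXCD
;
run;

proc print data=index_demo;
run;

/* abxcd : pos_index = 4, pos_indexc = 3 (the x in position 3)    */
/* xyz   : pos_index = 0, pos_indexc = 1                          */
/* ABXCD : pos_index = 0, pos_indexc = 0 (both are case sensitive) */
```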

/* Example: 1 */

/* Results*/


The INDEXW function searches source, from left to right, for the first occurrence of excerpt and returns the position in source of the substring's first character. If the substring is not found in source, INDEXW returns a value of 0. If there are multiple occurrences of the string, INDEXW returns only the position of the first occurrence.

The INDEXW function is a case-sensitive function that works like the INDEX function, with one significant exception. INDEXW searches only for strings that are whole words, whereas INDEX finds the pattern whether it is a separate word or part of another word.

/* Example: 2 */

/* Results */

Result Explanation

The above program demonstrates the difference between the INDEX and INDEXW functions. In the first observation in the table above, the INDEX function returns a 1 because the letters "the", as part of the word "there", begin the string. Since the INDEXW function needs white space, or the beginning or end of the string, to delimit a word, it returns a 12, the position of the word "the" in the string. Observation 3 emphasizes that a punctuation mark does not serve as a word separator. Finally, since the string "the" does not appear anywhere in the fourth observation, both functions return a 0.
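The behaviour described above can be reproduced with a sketch such as the following. The data set name, the variable names and the sample strings are assumptions chosen to be consistent with the explanation; they are not necessarily the original listing.

```sas
/* Hypothetical data; strings chosen to match the explanation above */
data find_word;
   input string $40.;
   position   = index(string, 'the');    /* matches 'the' anywhere          */
   position_w = indexw(string, 'the');   /* matches 'the' only as a word    */
   datalines;
there is a the here
ends in the
ends in the.
none here
;
run;

proc print data=find_word;
run;

/* Obs 1: position = 1, position_w = 12                              */
/* Obs 2: position = 9, position_w = 9                               */
/* Obs 3: position = 9, position_w = 0 (the period is not a delimiter) */
/* Obs 4: position = 0, position_w = 0                               */
```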

Read from multiple external files in one data step by using FILEVAR= option

Prepared by Jose Abraham

External files are usually read into SAS one at a time, with a separate DATA step for each external file. However, multiple external files that have the same structure can easily be read into SAS in one DATA step by using the FILEVAR= and END= options in the INFILE statement. The following example illustrates how to read multiple external files when the locations of those files are stored in another external file.

Suppose we have demographic information on subjects from three different centers stored in three external files. All three external files have the same structure, as given below.




The data values are aligned in columns and there are no missing values. The layout follows.


We have another external file that contains the locations of these external files. Suppose the three files are stored in the 'demog' folder on the E: drive, and the external file that lists their locations (dmgfiles), shown below, is stored in the same folder.


The following SAS DATA step reads the three external files in one pass, using the names specified in the external file 'dmgfiles'. The step reads this list to determine which external files it should read.
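A minimal sketch of such a DATA step is given here. The .txt extension on the list file, the END= variable name 'lastrec' and the column positions in the second INPUT statement are assumptions made for illustration; the other names follow the description and the output data set.

```sas
data demogdat;
   /* Declare Source first so that it is the first column in the output */
   length Source $60;

   /* File that lists the external files, one path per line,           */
   /* for example E:\demog\dmg01.txt (list-file extension is assumed)  */
   infile 'E:\demog\dmgfiles.txt';

   /* Modified list input: read one file name, up to 60 characters */
   input dmgfiles : $60.;

   /* 'dummy' is only a placeholder; FILEVAR= supplies the file to open */
   infile dummy filevar=dmgfiles end=lastrec;

   /* Read every data line of the currently opened file */
   do while (not lastrec);
      /* Column positions are assumed, not taken from the original layout */
      input Ctrn $ 1-3 Subjid $ 5-10 Age 12-13 Sex $ 15-20 Race $ 22-30;
      Source = dmgfiles;   /* keep the name of the source file */
      output;              /* write one observation per data line */
   end;
run;
```

The FILEVAR= variable itself is not written to the output data set, which is why its value is copied into Source.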



Data step working:

1. The first INFILE statement specifies the name of the external file containing the list of filenames that the DATA step should read.

2. The first INPUT statement reads the name of an external file with modified list input. A width (60) sufficient to hold the name of the external file is specified.

3. The second INFILE statement specifies the text dummy, which acts as a placeholder for the file specification that is always required in the INFILE statement. The actual specification for the input file comes from the value of the variable assigned by the FILEVAR= option.

a. The FILEVAR= option is set to 'dmgfiles', the variable that contains the name of the external file that the current iteration of the DATA step should read.

b. The END= option defines a variable that SAS sets to 1 when it reads the last data line in the currently opened external file. The END= variable is initialized to 0 and retains that value until SAS detects that the current input data line is the last in the external file. SAS then sets the variable to 1.

c. When the FILEVAR= option is included in the INFILE statement, SAS resets the END= variable to 0 when the value of the FILEVAR= variable changes. (If SAS did not reset the END= variable to 0 each time it opened a new external file, the DATA step would stop after reading the first external file.)

4. The DO WHILE loop is controlled by testing the value of the END= variable. The loop stops after SAS reads the last data line in the currently opened external file.

5. The name of the file from which the records are read (the source file name) is assigned to the variable 'Source'.

6. The above DATA step iterates four times: once for each of the dmg files (dmg01, dmg02, dmg03) and a fourth time in which it detects that there are no more data lines in the external file that contains the filenames.

7. By default, SAS writes an observation to a data set only at the end of each iteration of the DATA step. An explicit OUTPUT statement is specified to override this and write every data line read from the external file.

8. The output data set 'demogdat' is as follows:

Source               Ctrn   Subjid   Age   Sex      Race
E:\demog\dmg01.txt   001    001_01   29    Male     Caucasian
E:\demog\dmg01.txt   001    001_02   28    Female   Caucasian
E:\demog\dmg01.txt   001    001_03   25    Male     Caucasian
E:\demog\dmg02.txt   002    002_01   27    Male     Asian
E:\demog\dmg02.txt   002    002_02   28    Male     Asian
E:\demog\dmg02.txt   002    002_03   25    Female   Asian
E:\demog\dmg03.txt   003    003_01   27    Female   Asian
E:\demog\dmg03.txt   003    003_02   28    Male     Asian
E:\demog\dmg03.txt   003    003_03   25    Female   Asian