1.7.10

Dynamic concatenation of transposed variables

Prepared by Jose Abraham

The transpose procedure in SAS changes multiple values in rows (for a column) into columns, and can also change multiple columns’ values into multiple rows values for a single column. The number of variables (columns) to be created in the output file is based on the maximum number of observations (rows) in a variable (column) to be transposed. If the prefix= option is not specified, then the transposed variables have names Col1, Col2 etc…. i.e. if the dataset has 4 observations in a variable, then on transposing 4 variables namely Col1, Col2, Col3 and Col4 are obtained against each value of the by variable (labparam, lab_id in the below example).

In certain situations, it is required to concatenate the columns obtained during transposing. This can be attained simply by setting the dataset and concatenating the variables using the concatenation operator (||).

e.g. Newvar= Col1|| ‘,’||Col2|| ‘,’||Col3|| ‘,’||Col4;

This procedure may however not produce desirable results if any of the four columns have missing values. If suppose Col3 and Col4 have missing values then consecutive ‘,’ will be presented as the last few characters in the variable Newvar.

The concatenation is thus done by determining the maximum number of Col variables in the transposed dataset and then concatenating the non missing variables to create the new variable against each of the by variables.

Consider the following example from a clinical trial analysis, where the laboratory values are collected at different laboratories. For different laboratories, the reference ranges adopted for parameters are different.Such a dataset is shown below


While obtaining descriptive statistics for different lab parameters, the lab ranges are to be presented in the report as a sub header of the following form.

e.g. Param1

Lab-A: 0-35 g/L (Age 0-99)

Lab-B: 0-34 g/L (Age 0-18), 0-35 g/L (Age 18-60), 0-36 g/L (Age 60-99)

For different laboratory groups the number of ranges will be different, so it is necessary to dynamically concatenate the ranges for each lab for each parameter. The steps involved in this process are given below

  1. Dataset with the lab ranges is transposed.

    *Transposing the data to get the reference ranges horizontally;

    proc transpose data=range out=range_trn;

    by labparam lab_id;

    var range;

    run;

  2. Select the information about the number of columns in the transposed dataset with the help of sashelp.vcolumn dataset. A macro variable named ‘coln’ is created with the maximum number of ‘Col’ variables as its value.

    *Creating a macro variable with maximum number of columns;

    data _null_;

    set sashelp.vcolumn end=last;

    where trim(libname) = 'WORK' and trim(memname) ='RANGE_TRN' and upcase(substr(name,1,3))='COL';

    if last then call symput('coln',trim(left(put(_n_,6.))));

    run;

  3. Set the transposed dataset and giving array reference to the ‘Col’ variables. Concatenate the ranges with using a do loop and then remove the preceding comma by taking substring of the concatenated variable. A variable named ‘labrange’ is created by concatenating with the lab_id.

    data range_con;

    length convar $400;

    array cols {&coln.} $50 col1-col&coln.;

    set range_trn;

    do i=1 to dim(cols);

    if not missing(cols[i]) then convar= trim(convar)||', '||cols[i];

    end;

    convar=substr(left(convar),2);

    Labrange=trim(left(lab_id))||':'|| compbl(convar);

    keep labparam lab_id labrange;

    run;

The dataset range_con contains the concatenated ‘labrange’ variable.

*Transposing the dataset to present the ranges for different lab_id's in different variables;

proc transpose data= range_con out=lab_rng(drop=_name_);

id lab_id;

by labparam;

var labrange;

run;

The output dataset obtained during transposing is of the following form

By merging it with the descriptive statistics dataset and using ‘labparam’ and the variables containing the concatenated lab ranges (here LAB_A and LAB_B) as by variables in proc report creates a report which contains descriptive statistics for one parameter per page. By using the ‘#byval’ option in the title statement we can obtain the sub headers of the following type.

Param1

Lab-A: 0-35 g/L (Age 0-99)

Lab-B: 0-34 g/L (Age 0-18), 0-35 g/L (Age 18-60), 0-36 g/L (Age 60-99)





No comments: