Kreara: March 2009

13.3.09

Shift table

Laboratory test results are commonly labeled as "low", "high" or "normal" depending on whether the observed value is below the lower reference limit, above the upper reference limit or between the limits. These categorical outcomes are commonly summarized using shift tables. A laboratory shift table is a tabular display that can show how a population’s laboratory data change, or ‘shift’, over time. A shift table can provide this information at a glance.

The following example illustrates how a shift table presents changes in laboratory values from visit to visit. Here we consider the laboratory parameter chloride, in the laboratory group bio-chemistry. The normal range for the parameter is 98 to 107. Then the laboratory values obtained for chloride are categorized as low, normal and high according as described above. Here the lab values less than or equal to 98 are flagged as Low, the lab values within 97 and 107 are flagged as Normal and the lab values above 107 as high. There are three visits in this example screening, visit 1 and visit 2. The screening is considered as baseline visit so here we consider two cases

1. Changes occurred in lab values from screening to visit 1
2. Changes occurred in lab values from screening to visit 2

*********************************************************;

*Program name: Shift table;
*Purpose: Illustrate how a shift table create and present;
*********************************************************;
*Give formats for Visits;
proc format;
value vsyn 0='Screening'
1='Visit-1'
2='Vist-2';
*Give formats for categories;
value flg -1='Low'
0='Normal'
1='High';
run;

data lab;
length labgrp $20 labparam $20;
format visit vsyn. flag flg.;
input SUBNO VISIT labgrp&$ labparam&$ val low high;
if val <=low then flag=-1; else if val>low and val<=high then flag=0; else if val>high then flag=1;
cards;

1 0 Bio chemistry Chloride (MMOL/L) 108 98 107
1 1 Bio chemistry Chloride (MMOL/L) 101 98 107
2 0 Bio chemistry Chloride (MMOL/L) 105 98 107
2 1 Bio chemistry Chloride (MMOL/L) 97 98 107
3 0 Bio chemistry Chloride (MMOL/L) 101 98 107
3 1 Bio chemistry Chloride (MMOL/L) 100 98 107
4 0 Bio chemistry Chloride (MMOL/L) 104 98 107
4 1 Bio chemistry Chloride (MMOL/L) 106 98 107
5 0 Bio chemistry Chloride (MMOL/L) 101 98 107
5 1 Bio chemistry Chloride (MMOL/L) 108 98 107
6 0 Bio chemistry Chloride (MMOL/L) 103 98 107
6 1 Bio chemistry Chloride (MMOL/L) 103 98 107
7 0 Bio chemistry Chloride (MMOL/L) 105 98 107
7 1 Bio chemistry Chloride (MMOL/L) 104 98 107
8 0 Bio chemistry Chloride (MMOL/L) 97 98 107
8 1 Bio chemistry Chloride (MMOL/L) 108 98 107
9 0 Bio chemistry Chloride (MMOL/L) 100 98 107
9 1 Bio chemistry Chloride (MMOL/L) 100 98 107
10 0 Bio chemistry Chloride (MMOL/L) 103 98 107
10 1 Bio chemistry Chloride (MMOL/L) 102 98 107
11 0 Bio chemistry Chloride (MMOL/L) 100 98 107
11 1 Bio chemistry Chloride (MMOL/L) 102 98 107
12 0 Bio chemistry Chloride (MMOL/L) 109 98 107
12 1 Bio chemistry Chloride (MMOL/L) 104 98 107
13 0 Bio chemistry Chloride (MMOL/L) 99 98 107
13 1 Bio chemistry Chloride (MMOL/L) 101 98 107
14 0 Bio chemistry Chloride (MMOL/L) 101 98 107
14 1 Bio chemistry Chloride (MMOL/L) 100 98 107
15 0 Bio chemistry Chloride (MMOL/L) 110 98 107
15 1 Bio chemistry Chloride (MMOL/L) 96 98 107
1 2 Bio chemistry Chloride (MMOL/L) 108 98 107
2 2 Bio chemistry Chloride (MMOL/L) 99 98 107
3 2 Bio chemistry Chloride (MMOL/L) 100 98 107
4 2 Bio chemistry Chloride (MMOL/L) 106 98 107
5 2 Bio chemistry Chloride (MMOL/L) 96 98 107
6 2 Bio chemistry Chloride (MMOL/L) 103 98 107
7 2 Bio chemistry Chloride (MMOL/L) 108 98 107
8 2 Bio chemistry Chloride (MMOL/L) 103 98 107
9 2 Bio chemistry Chloride (MMOL/L) 100 98 107
10 2 Bio chemistry Chloride (MMOL/L) 96 98 107
11 2 Bio chemistry Chloride (MMOL/L) 102 98 107
12 2 Bio chemistry Chloride (MMOL/L) 104 98 107
13 2 Bio chemistry Chloride (MMOL/L) 101 98 107
14 2 Bio chemistry Chloride (MMOL/L) 100 98 107
15 2 Bio chemistry Chloride (MMOL/L) 99 98 107
;

*Create datasets for each of the visits and rename the flag variable;

data scr (where=(visit=0)rename=(flag=flg0))
vis1(where=(visit=1)rename=(flag=flg1))
vis2(where=(visit=2)rename=(flag=flg2));
set lab;
run;

*Merge the datasets so that the flags for the three visits lie adjacent to each other;

data comon;
merge scr vis1 vis2;
run;

*Find number of subjects for all combinations of laboratory flags

from baseline (screening) to visit 1;
proc means data=comon completetypes noprint missing;
class labgrp labparam flg0 flg1 /preloadfmt ;
output out=shif01 N=num;
run;

*Transposing data into presentable format;

proc transpose data= shif01(where=(flg1 ne . and flg0 ne .)) out=trn1;
by flg0 notsorted;
id flg1;
var num;
copy labgrp labparam;
run;

*Find number of subjects for all combinations of laboratory flags from baseline (screening) to visit 2;

proc means data=comon completetypes noprint missing;
class labgrp labparam flg0 flg2 /preloadfmt ;
output out=shif02 N=num;
run;

*Transposing data into presentable format;

proc transpose data= shif02(where=(flg2 ne . and flg0 ne .)) out=trn2;
by flg0 notsorted;
id flg2;
var num;
copy labgrp labparam;
run;

*Combine two transposed datasets to create report;

data final(where=(_NAME_ ne '' and labgrp ne '' and labparam ne ''));
merge trn1
trn2(rename=(low=low2 normal=normal2 high=high2));
run;

*Determining number of subjects under each flag at baseline;

data base;
set shif01;
keep flg0 num;
where labgrp eq '' and labparam eq '' and flg1 eq . and flg0 ne .;
rename num=basenum;
run;

data final1;
merge final base;
by flg0;
run;

*rtf output of shift table;

option nodate nonumber;

ods rtf file=".\shifteg.rtf";
proc report data= final1 nowd style(header)=[background=white font_size=8 pt] split='*';
column labgrp labparam flg0 ('Screening vs. Visit 1' Low Normal High ) ('Screening vs. Visit 2' Low2 Normal2 High2 ) basenum;
define labgrp/order 'Lab Group'left;
define labparam/order 'Lab Parameter' left;
define flg0/display 'Baseline *Status'left;
define Low/display 'Low' center;
define Normal/display 'Normal' center;
define High/display 'High'center;
define Low2/display 'Low'center;
define Normal2/display 'Normal'center;
define High2/display 'High'center;
define basenum/display 'Baseline*Count' center;
run;
ods rtf close;

The output is obtained as shown below

In the table the column named ‘Baseline Count’ represents the number of low, normal and high flags at baseline visit(or screening) i.e. there is 1 subject with low flag, 11 with normal flag and 3 subjects with high flag.

The ‘Screening vs Visit 1’ section is a 3x3 contingency table. The value in first cell (first row, first column) presents the number of subjects for whom the flag was low both at Screening and Visit1. The second cell (first row, second column) further presents the number of subjects with low flag at screening and normal flag at visit1 and so on.

6.3.09

The Bootstrap Method

Prepared by Meena R S

Bootstrap method is a way of simulating results for a larger number of samples based on the given sample. The bootstrap method is used to quantify the uncertainty in any parameter estimate (e.g., mean, variance, percentile value, etc.). All bootstrap methods involve generating hypothetical samples from the original sample. Each hypothetical sample is called a Bootstrap Sample.

Bootstrap method has the following assumptions

1. The sample taken should be a valid representative of the population

2. Bootstrap method takes sampling with replacement from the sample. Each sub sampling is independent and identical distribution (i.i.d.). In other words, it assumes that the sub samples come from the same distribution of the population, but each sample is drawn independently from the other samples.

The bootstrap works by computing the desired statistic for a sub sample of the data set. The sub sampling is done with replacement and the size of the sample is equal to the size of the original sample. The desired statistic is calculated for each sub sample. The collection of these statistics is used as an estimate of the sampling distribution.

Example 1: The following example represents the length of 3 different petals 10 trees of the same type. This program estimates the uncertainty parameters mean and standard deviation

data petals;
input petal1 petal2 petal3;
cards;
1.21 1.31 1.53
2.13 2.21 3.17
1.59 1.70 1.56
1.45 1.23 1.21
1.41 1.96 1.24
1.04 1.8 1.58
1.03 1.05 2.1
1.4 1.25 1.26
1.56 1.26 1.34
1.82 1.24 1.56
;
run;

* This macro is used to generate 10 bootstrap samples of the above data;

%macro bootsamp(data,boot, b);
data &boot;
do isample=1 to &b;
do i = 1 to nobs;
pt = round(ranuni(0) * nobs) ;
set &data nobs = nobs point=pt; *point options is used to create samples in any order;
output;
end;
end;
stop;
run;
%mend;

%bootsamp(petals, boot, 10); *Generating 10 bootstrap samples;

*Calculating the parameters mean and standard deviation for each subsample and appending them to obtain the final sample dataset;

%macro sample(j=, n=);
%do i=&j %to &n;
data petals1;
set boot;
where isample=&i;
run;

proc means data=petals1 ;
var petal1 petal2 petal3 ;
output out = petals_&i mean = mean std = std n =n;
run;

proc append base=final data=petals_&i force;
run;
%end;
%mend;

%sample (j=1, n=10);

proc means data=final mean std ;
var mean std;
output out = means_ mean = mean std = std n =n;
run;

Output is shown below

Use of ampersands and semicolons in SAS macros

Prepared by Jose Abraham

Single, double and triple ampersands

Multiple ampersands can be used to allow the value of a macro variable to become another macro variable reference. The macro variable reference will be rescanned until the macro variable is resolved.

The following demonstrates how macro variables with multiple ampersands are resolved. There are 4 macro variables

Macro variable :Value
A : CATCH
B : STUMP
C : RUN
HIT : A

Resolving a macro variable:

&VARNAME references a macro variable. The rule is that the scanner reads from left to right.

1. If we put one ampersand i.e., ‘&HIT’ then the macro variable hit resolves to ‘A’.

2. If we put two ampersands then the two ampersands resolve to one and scanner continues. i.e., ‘&&HIT’
On the first scan - ‘&&’ resolves to ‘&’ and ‘HIT’ held as token.
On the second scan – ‘&HIT’ resolves to ‘A’.

3. If we put three ampersands i.e., ‘&&&HIT’
On the first scan -‘&&’ resolves to & and the remaining &HIT resolves to ‘A’ and the it results ‘&A’
On the second scan –‘&A’ resolves to ‘CATCH’

Single and double semi colons
When creating macros for programming, sometimes we would like to generate a dynamic SAS statement within a macro %DO loop. For example if we want to run a print procedure inside a macro and refer to a set of macro variables within the VAR statement.

proc print;
var
%do i = 1 %to &max;
&&var&i
%end;;
run;

Consider a simple program containing this %DO loop in a macro

data one;
input A $ B C D E;
datalines;
a 12 16 18 20
;
run;

%let var1=A;
%let var2=B;
%let var3=C;
%let var4=D;
%let var5=E;
%let max=4;
%let indt=one;

%macro prnt;
proc print data=&indt.;
var
%do i = 1 %to &max.;
&&var&i
%end;;
run;
%mend;

%prnt;

In this program, there are two consecutive semicolons used after the %end statement which is not common in a simple SAS program. Here the first semicolon closes the %END and the second semicolon closes the VAR statement. And if we run this macro it will generate the following SAS statements

proc print data=one;
var a b c d;
run;

and it produces the result

5.3.09

Alignment of decimal points

Prepared by Sreeja E V

As per quality standards when presenting descriptive statistics for parameters in clinical trial reporting, the data should be aligned with respect to the decimal point. This dynamic decimal alignment and numeric precision should be maintained between varying parameters in the same dataset.

The following example contains 4 parameters A, B, C and D and their values. We have to present the descriptive statistics namely n, mean, standard deviation, minimum, median and maximum. The mean, standard deviation and median will be presented to one more decimal place than the observed value while minimum and maximum will be presented to the same number of decimal places as observed value. The value for n will be presented as integer.

As a first step the descriptive statistics needs to be computed for the parameters and then it is required to determine the number of decimal places needed for the descriptive statistics for each parameter. The observed value with the most number of decimal places is then found out and the maximum number of decimal places used to present the data is determined for each parameter.

Once the maximum number of decimal places per parameter is obtained, one simply needs to pass this information into a character variable containing a representation of the appropriate numeric format as described.

Further the maximum integer length for each parameter is determined and white space is inserted using repeat function for values whose integer length is less than maximum integer length.

data lab;
input parameter $ value;
datalines;
A 12.3654
A 13.1
B 456.1
B 456
C 41.236
C 41.04
D 1.76
D 1.241
;
run;

proc means data=lab noprint;
by parameter;
var value;
output out=desc_data N=N Mean=Mean Std=std Min=Min Median=Median Max=Max;
run;

*Determining the number of decimal points;
*For that the values of all the parameters have been converted to character values so that the digits after the decimal places can be extracted to the variable de_part and its length can be stored in the variable dec_no. If the values of a particular parameter are whole numbers then dec_no will be assigned to zero;

data deci_point;
set lab;
value_n=put(value,best.);
de_part=scan(value_n,2,'.');
if de_part ne ' ' then dec_no=length(de_part);
else dec_no=0;
run;

*Determining maximum number of decimal points for each parameter; *Here the maximum of the variable dec_no for each parameter is determined and stored in the variable decimal;

proc sql noprint;
create table decimal as select
distinct parameter,
max(dec_no) as decimal
from deci_point
group by parameter;
select * from decimal;
quit;

*Creating the formats;

*The variables zerornd, onernd, zerofmt and onefmt are determined for each parameter for rounding and formatting purpose;

proc sql noprint;
create table decimal_1 as select
distinct
parameter,
decimal,
10**(-decimal -0) format best. as zerornd,
10**(-decimal -1) format best. as onernd,
"8." put(decimal +0,1.) as zerofmt,
"8." put(decimal +1,1.) as onefmt
from decimal
;
select * from decimal_1;
quit;

*Applying decimal formats;

data desc_stats(keep=parameter fn fmean fmedian fstd fmin fmax );
merge desc_data decimal_1;
by parameter;
fn=compress(put(n,3.));
if mean ne . then fmean=compress(putn(round(mean,onernd),onefmt));
if median ne . then fmedian=compress(putn(round(median,onernd),onefmt));
if std ne . then fstd=compress(putn(round(std,onernd), onefmt));
if min ne . then fmin=compress(putn(round(min,zerornd),zerofmt));
if max ne . then fmax=compress(putn(round(max,zerornd), zerofmt));
run;

proc sort data=desc_stats;
by parameter;
run;

proc transpose data=desc_stats out=stat;
var fn fmean fstd fmin fmedian fmax;
by parameter;
run;

*To align decimal points;

*Length of integer part of each value is determined and stored in the variable lenint. Further maximum integer length is obtained by determining maximum over lenint and maxint where initial value of the variable maxint is set to zero. While attaining end of the file the value of maxint is assigned to the macro variable max;

data outdata;
set stat(rename=(col1=value)) end=eof;
retain maxint 0;
lenint=length(compress(scan(value,1,'.')));
maxint = max(maxint, lenint);
if eof then call symput("max", put(maxint, best.));
run;

*The difference between the variable max and lenint is determined by the variable diffint. For observations whose diffint>0 (i.e.the observations whose integer length is diffint times less than max) white space is inserted diffint-1 times using repeat function (repeat function gives repetitions after the original string) and concatenates that with the value after removing trailing blanks of value using trim function. For observations whose diffint=0 (i.e.the observations whose integer length same as max) no white space is inserted;

data aligned(drop=maxint lenint diffint value);
retain parameter _name_ value value_aligned;
length value_aligned $15;
set outdata;
if parameter ne '' and value ne '' then do;
diffint = &max - lenint - 1;
if diffint >= 0 then do;
value_aligned = repeat(" ", diffint)trim(left(value));
end;
else do;
value_aligned = trim(left(value));
end;
end;
run;

proc format ;
value $stat
"fn"="n"
"fmean"="Mean"
"fstd"="SD"
"fmin"="Minimum"
"fmedian"="Median"
"fmax"="Maximum"
;
run;

proc print data=aligned;
format _name_ $stat.;
run;

The output is obtained as