Prepared by Sreeja E V
Quality standards for clinical trial reporting require that descriptive statistics be aligned on the decimal point. Both the alignment and the numeric precision must adapt to each parameter, because parameters in the same dataset can be recorded to different numbers of decimal places.
The following example contains four parameters, A, B, C and D, and their values. We have to present the descriptive statistics n, mean, standard deviation, minimum, median and maximum. The mean, standard deviation and median are presented to one more decimal place than the observed values, while the minimum and maximum are presented to the same number of decimal places as the observed values. The value of n is presented as an integer.
As a first step, the descriptive statistics are computed for each parameter. Next, the number of decimal places in each observed value is counted, and the largest count within a parameter determines the number of decimal places used to present that parameter's statistics.
Once the maximum number of decimal places per parameter is known, it is written into a character variable that holds the corresponding numeric format (for example 8.3), and that format is then applied with the PUTN function.
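The idea of holding a format in a character variable can be sketched in isolation as follows. This is only a minimal illustration, not part of the program below: the values of decimal and value are hypothetical, and CATS is used here simply as one convenient way to build the format string.

```sas
data _null_;
  decimal = 3;                           /* hypothetical: most decimals observed */
  value   = 41.138;                      /* hypothetical statistic, e.g. a mean */
  onernd  = 10**(-decimal - 1);          /* rounding unit: one extra decimal place */
  onefmt  = cats("8.", decimal + 1);     /* format held as text: "8.4" */
  shown   = strip(putn(round(value, onernd), onefmt));
  put shown=;                            /* writes shown=41.1380 to the log */
run;
```

Because PUTN accepts the format as a character value at run time, each parameter can carry its own format in the same data step.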
Finally, the maximum integer length across all formatted values is determined, and white space is inserted with the REPEAT function in front of values whose integer part is shorter than that maximum.
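The padding step relies on the fact that REPEAT returns the original string followed by the requested number of repetitions, so repeat(" ", k) yields k+1 blanks. A minimal sketch of that arithmetic, using hypothetical values for maxint and value, might look like this:

```sas
data _null_;
  length padded $15;
  maxint  = 3;                            /* hypothetical: widest integer part */
  value   = "41.1380";                    /* hypothetical formatted statistic */
  lenint  = length(scan(value, 1, "."));  /* digits before the point: 2 */
  diffint = maxint - lenint - 1;          /* 0, so repeat returns one blank */
  if diffint >= 0 then padded = repeat(" ", diffint) || strip(value);
  else padded = strip(value);             /* already as wide as the maximum */
  put "[" padded $char15. "]";            /* $char15. keeps the leading blank */
run;
```

Subtracting 1 before calling REPEAT is what makes the number of inserted blanks equal to maxint minus lenint.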
data lab;
input parameter $ value;
datalines;
A 12.3654
A 13.1
B 456.1
B 456
C 41.236
C 41.04
D 1.76
D 1.241
;
run;
proc means data=lab noprint;
by parameter;
var value;
output out=desc_data N=N Mean=Mean Std=std Min=Min Median=Median Max=Max;
run;
*Determining the number of decimal places;
*For that, the values of all the parameters are converted to character values so that the digits after the decimal point can be extracted into the variable de_part and their count stored in the variable dec_no. If a value is a whole number then dec_no is set to zero;
data deci_point;
set lab;
value_n=put(value,best.);
de_part=scan(value_n,2,'.');
if de_part ne ' ' then dec_no=length(de_part);
else dec_no=0;
run;
*Determining the maximum number of decimal places for each parameter;
*Here the maximum of the variable dec_no for each parameter is determined and stored in the variable decimal;
proc sql noprint;
create table decimal as select
distinct parameter,
max(dec_no) as decimal
from deci_point
group by parameter;
select * from decimal;
quit;
*Creating the formats;
*The variables zerornd, onernd, zerofmt and onefmt are determined for each parameter for rounding and formatting purpose;
proc sql noprint;
create table decimal_1 as select
distinct
parameter,
decimal,
10**(-decimal - 0) as zerornd format=best.,
10**(-decimal - 1) as onernd format=best.,
"8." || put(decimal + 0, 1.) as zerofmt,
"8." || put(decimal + 1, 1.) as onefmt
from decimal
;
select * from decimal_1;
quit;
*Applying decimal formats;
data desc_stats(keep=parameter fn fmean fmedian fstd fmin fmax );
merge desc_data decimal_1;
by parameter;
fn=compress(put(n,3.));
if mean ne . then fmean=compress(putn(round(mean,onernd),onefmt));
if median ne . then fmedian=compress(putn(round(median,onernd),onefmt));
if std ne . then fstd=compress(putn(round(std,onernd), onefmt));
if min ne . then fmin=compress(putn(round(min,zerornd),zerofmt));
if max ne . then fmax=compress(putn(round(max,zerornd), zerofmt));
run;
proc sort data=desc_stats;
by parameter;
run;
proc transpose data=desc_stats out=stat;
var fn fmean fstd fmin fmedian fmax;
by parameter;
run;
*To align decimal points;
*The length of the integer part of each value is determined and stored in the variable lenint. The maximum integer length is then obtained by taking the maximum of lenint and maxint, where maxint is retained across observations with an initial value of zero. At the end of the file the value of maxint is assigned to the macro variable max;
data outdata;
set stat(rename=(col1=value)) end=eof;
retain maxint 0;
lenint=length(compress(scan(value,1,'.')));
maxint = max(maxint, lenint);
if eof then call symput("max", put(maxint, best.));
run;
*The variable diffint is the macro variable max minus lenint minus 1. The 1 is subtracted because the repeat function returns the original string followed by the requested number of repetitions. For observations with diffint>=0 (i.e. the observations whose integer length is less than max) max minus lenint blanks are placed in front of the value after its leading and trailing blanks have been removed with the left and trim functions. For observations with diffint<0 (i.e. the observations whose integer length is the same as max) no white space is inserted;
data aligned(drop=maxint lenint diffint value);
retain parameter _name_ value value_aligned;
length value_aligned $15;
set outdata;
if parameter ne '' and value ne '' then do;
diffint = &max - lenint - 1;
if diffint >= 0 then do;
value_aligned = repeat(" ", diffint) || trim(left(value));
end;
else do;
value_aligned = trim(left(value));
end;
end;
run;
proc format ;
value $stat
"fn"="n"
"fmean"="Mean"
"fstd"="SD"
"fmin"="Minimum"
"fmedian"="Median"
"fmax"="Maximum"
;
run;
proc print data=aligned;
format _name_ $stat.;
run;
The output is obtained as