Prepared by Jose Abraham
Survival analysis (also called time-to-event analysis) studies the time between a subject's entry into a study and a subsequent event. The methods were originally designed for the study of deaths, which explains the name, and that remains their most common application. They are equally useful in medical research for studying other events, such as the onset or recurrence of disease.
The aim of survival analysis is to follow subjects over time and record when each one experiences the event of interest. Data obtained from survival studies may contain censored observations. Censoring takes several forms and occurs for different reasons.
For example, consider a cancer study in which subjects who responded to treatment were followed for a fixed period for recurrence of cancer (the event of interest). If a subject's recurrence time t is not known exactly, and all we know is that it falls after the time T at which the subject was last observed (i.e. t > T), then T is recorded and the survival time for that subject is right censored. If instead the recurrence is known to have occurred before a specific time but the exact time is unknown, the survival time for that subject is left censored. Thus the times recorded for subjects who have had no recurrence by the end of the study, and for those who were lost to follow-up before the end of the study period, are right censored.
In this study, each case has one variable that contains either the time at which recurrence happened or, for censored cases, the last time at which the case was observed, both measured from the chosen origin. A second variable gives the censoring status of each case (uncensored = 1, censored = 0). The data also contain other variables, such as marker status, stage and histology. A small part of the data in this form is given below.
data molecules;
input marker surv censor stage histo;
datalines;
0 75 1 2 1
1 115 0 3 2
1 96 1 1 1
0 110 0 2 3
0 178 0 3 2
1 149 1 2 3
1 163 1 4 4
0 211 1 1 2
1 167 1 2 1
0 195 0 2 1
1 140 1 3 4
0 202 0 4 4
0 153 0 2 2
1 147 0 1 3
0 132 0 4 1
0 178 1 3 2
;
run;
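Before fitting any model, it can help to check how many of the recorded times are events and how many are censored. The following is a minimal sketch, assuming the molecules data set above and the coding censor=1 for an observed recurrence and censor=0 for a censored time, as implied by the censor(0) specification in the procedure calls in this article. It is an illustrative check and not part of the original analysis.

```sas
/* Count events (censor=1) and censored times (censor=0), overall and by marker */
proc freq data=molecules;
   tables censor marker*censor / nocol nopercent;
run;

/* Summarise the recorded times separately for events and censored cases */
proc means data=molecules n min median max;
   class censor;
   var surv;
run;
```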
Censored data can be analysed in SAS with procedures such as PROC LIFETEST and PROC PHREG. The purpose of the analysis is to model the underlying distribution of the survival time and to assess its dependence on the independent variables.
The Kaplan-Meier curve plots disease-free survival time on the horizontal axis against the estimated survival probability on the vertical axis. It shows the estimated proportion of patients who remain event-free at a given time. The survival experience of two groups can be compared by comparing their curves, which is done with the strata statement in PROC LIFETEST. Whether the Kaplan-Meier curves differ significantly can be tested with the log-rank test. If the p-value of the log-rank test is large (> 0.05), there is no evidence of a difference in survival between the groups. The following SAS code compares the survival curves of the cases in which the marker is present (marker=1) with those in which it is absent (marker=0).
proc lifetest data=molecules method=km plots=(s,lls) outsurv=km_est;
/* censor=0 marks a censored time; km_est is an illustrative output data set name */
time surv*censor(0);
strata marker;
run;
The strata statement produces the log-rank and Wilcoxon test statistics. The outsurv= option in the proc lifetest statement creates a SAS data set containing the KM survival estimates. plots=(s,lls) produces the survival curves as well as the log-log survival curves. The log-log survival curves will be parallel or nearly parallel if the proportional hazards assumption is met.
Kaplan-Meier Curves
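To read the estimated survival probability at particular times rather than from the plot, the timelist= option of PROC LIFETEST can be used. The following is a minimal sketch, assuming the molecules data set above. The time points 100, 150 and 200 are arbitrary illustrations, in whatever unit surv is recorded, and are not taken from the original analysis.

```sas
/* Kaplan-Meier estimates reported only at the listed time points, by marker group */
proc lifetest data=molecules method=km timelist=100 150 200;
   time surv*censor(0);   /* censor=0 marks a censored time */
   strata marker;
run;
```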
The hazard ratio is a convenient measure of the effect of different factors on the occurrence of the event. It can be estimated with the Cox regression model, which models time-to-event data and is fitted in SAS with PROC PHREG. The following code can be used to model the data.
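proc phreg data=molecules;
/* rl: confidence limits for the hazard ratios; selection=b: backward elimination */
model surv*censor(0) = marker stage histo / rl ties=breslow selection=b;
/* save baseline survival, log survival and log(-log survival) estimates in out1 */
baseline out=out1 survival=s logsurv=ls loglogs=lls;
run;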
The backward selection procedure (option selection=b) removes non-significant variables from the Cox regression model one at a time, so that only significant variables remain in the final model. The option ties=breslow specifies Breslow's method for handling tied event times. For the variables in the final model, PROC PHREG reports the regression coefficients, their standard errors and the p-values from the Wald chi-square test. With the rl option, the output also includes the hazard ratios and their 95% confidence intervals.
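The threshold that backward elimination uses to decide whether a variable stays in the model can also be stated explicitly. The following is a minimal sketch, assuming the molecules data set above. The value 0.05 for slstay= is chosen here for illustration and is not specified in the original code, and the details option simply prints each elimination step.

```sas
proc phreg data=molecules;
   /* slstay=0.05: variables are removed while their Wald p-value exceeds 0.05 */
   model surv*censor(0) = marker stage histo
         / rl ties=breslow selection=backward slstay=0.05 details;
run;
```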
Hazard ratios are interpreted in much the same way as odds ratios. A hazard ratio of 1 for an explanatory variable means that the variable has no effect on the hazard. A hazard ratio less than 1 means that the variable is associated with a decreased hazard, and a hazard ratio greater than 1 means that it is associated with an increased hazard.
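This interpretation follows from the form of the Cox model. As a brief sketch in standard notation (the symbols h_0, beta and x are introduced here for illustration and do not appear in the text above), the hazard for a subject with covariate values x_1, ..., x_p, and the hazard ratio for a one-unit increase in x_j with the other covariates held fixed, can be written as:

```latex
\[
  h(t \mid x) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)
\]
\[
  \mathrm{HR}_j
  = \frac{h(t \mid x_1, \dots, x_j + 1, \dots, x_p)}{h(t \mid x_1, \dots, x_j, \dots, x_p)}
  = \exp(\beta_j)
\]
```

A coefficient of zero therefore gives a hazard ratio of 1, a negative coefficient gives a hazard ratio below 1, and a positive coefficient gives a hazard ratio above 1.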