Survival models with high-dimensional data structure
Principal investigators
Prof. Dr. Martin Schumacher
Institute for Medical Biometry and Medical Informatics
University Medical Center Freiburg
Stefan-Meier-Str. 26, 79104 Freiburg, Germany
Phone: ++49 (0)761 203 6661
Fax: ++49 (0)761 203 6688
Prof. Dr. Jens Timmer
Physics Institute
University of Freiburg
Hermann-Herder-Str. 3, 79104 Freiburg, Germany
Phone: ++49 761 203 5829
Fax: ++49 761 203 5967
Researchers
Name | E-mail | Phone
Prof. Dr. Martin Schumacher | ms@imbi.uni-freiburg.de | ++49 (0)761 203 6661
Prof. Dr. Jens Timmer | jeti@fdm.uni-freiburg.de | ++49 (0)761 203 5829
Dr. Harald Binder | binderh@imbi.uni-freiburg.de | ++49 (0)761 203 7704
Dipl. Stat. Christine Porzelius | cp@imbi.uni-freiburg.de | ++49 (0)761 203 7708
Summary
Many clinical disciplines still suffer from the comparatively low predictive power of specially developed risk scores. The hope is that the identification of genomic and proteomic features will bring substantial progress; microarray data and protein mass spectra promise further insights here. The understanding of whole genomes and the development of disease-specific biomarkers should aid diagnosis, improve the performance of prognostic scores, and ultimately lead to new treatments. Such data are characterized by a huge number of potential predictors and typically only a few patients, which makes them difficult to analyze. Standard survival techniques, such as fitting a Cox regression model by maximizing the partial likelihood, are not directly applicable.
In this project we adapt statistical approaches that can deal with high-dimensional data structures, such as penalized estimation and boosting. These methods have been developed mostly for continuous and binary responses. Only recently have some proposals been made for right-censored event time responses, and methodological problems remain. One example is the rather fragile selection of the number of steps required for path algorithm procedures. There is also little research on modelling time-varying covariates in high-dimensional data, potentially in combination with time-varying effects on survival. We therefore start with discrete-time survival models, where time-varying covariates are easily incorporated and available techniques for binary response variables can be adapted. In a next step we develop a competitive continuous-time approach. Boosting and path algorithm techniques will be investigated for estimation.
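The reduction of a discrete-time survival model to a binary response problem can be illustrated with a small sketch. It is not the project's implementation; the function name `person_period` and the toy data are assumptions made for illustration. Each patient contributes one row per time interval at risk, with the covariate values valid in that interval and a binary indicator that is 1 only in the interval where the event was observed.

```python
def person_period(times, events, covariates):
    """Expand survival data into one row per patient and interval at risk.

    times      : interval index (1-based) of the event or of censoring
    events     : 1 if the event was observed in that interval, 0 if censored
    covariates : covariates[i][k - 1] is the covariate vector of patient i
                 in interval k, so values may change over time
    Returns (rows, intervals, y), where y is the binary response.
    """
    rows, intervals, y = [], [], []
    for i, (t, d) in enumerate(zip(times, events)):
        for k in range(1, t + 1):
            rows.append(covariates[i][k - 1])
            intervals.append(k)
            # Response is 1 only in the interval where the event occurred.
            y.append(1 if (k == t and d == 1) else 0)
    return rows, intervals, y


# Toy data (hypothetical): three patients, two covariates each.
times = [3, 2, 1]
events = [1, 0, 1]
covariates = [
    [[0.5, 1.0], [0.6, 1.0], [0.9, 1.0]],  # patient 0, intervals 1..3
    [[1.5, 0.0], [1.4, 0.0]],              # patient 1, censored in interval 2
    [[-0.3, 2.0]],                         # patient 2, event in interval 1
]

rows, intervals, y = person_period(times, events, covariates)
```

The expanded response `y` and the rows, together with the interval index as an additional predictor, can then be passed to any technique for binary responses, for example penalized logistic regression or boosting.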
A central problem is the selection of regularization or complexity parameters. For our discrete-time survival approach, we will adapt model selection criteria that build on model-based estimates of the effective degrees of freedom. For validation, we will investigate bootstrap-based estimates of the degrees of freedom. For continuous-time survival models, such degrees-of-freedom estimates are difficult to obtain, and it is important to take the right-censored data structure into account. We will focus on resampling-based estimates of prediction error that incorporate time and deal appropriately with right censoring. These estimates will then be used to select model complexity, in order to avoid overfitting in our flexible continuous-time survival approach. As an alternative, model selection based on false discovery rates will be investigated.
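One common way to make a prediction error estimate respect right censoring is a Brier score with inverse probability of censoring weights. The sketch below is an illustration under stated assumptions, not the estimator used in the project: the function names are hypothetical, the censoring distribution is estimated by Kaplan-Meier, and event and censoring times are assumed not to be tied. In a resampling scheme, the score would be evaluated on observations left out of each bootstrap sample.

```python
def km_censoring(times, events):
    """Kaplan-Meier estimate of the censoring survival function G.

    Censoring is treated as the event of interest (indicator 1 - event).
    Assumes no ties between event times and censoring times.
    """
    censor_times = sorted({t for t, d in zip(times, events) if d == 0})
    steps, g = [], 1.0
    for u in censor_times:
        at_risk = sum(1 for t in times if t >= u)
        censored = sum(1 for t, d in zip(times, events) if t == u and d == 0)
        g *= 1.0 - censored / at_risk
        steps.append((u, g))

    def G(t, left_limit=False):
        # G(t) = P(C > t); with left_limit=True, G(t-) = P(C >= t).
        value = 1.0
        for u, g_u in steps:
            if u < t or (u == t and not left_limit):
                value = g_u
            else:
                break
        return value

    return G


def ipcw_brier(t_star, times, events, surv_pred):
    """Censoring-weighted Brier score at time t_star.

    surv_pred[i] is the predicted probability that patient i is still
    event-free at t_star.
    """
    G = km_censoring(times, events)
    total = 0.0
    for t, d, s in zip(times, events, surv_pred):
        if t <= t_star and d == 1:
            # Event before t_star: true status is 0, weight 1 / G(t-).
            total += (0.0 - s) ** 2 / G(t, left_limit=True)
        elif t > t_star:
            # Still at risk at t_star: true status is 1, weight 1 / G(t_star).
            total += (1.0 - s) ** 2 / G(t_star)
        # Censored before t_star: status unknown, contributes weight 0.
    return total / len(times)


# Toy data (hypothetical): patient 1 is censored at time 2.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
pred = [0.2, 0.4, 0.7, 0.9]
score = ipcw_brier(2.5, times, events, pred)
score_uncensored = ipcw_brier(2.5, times, [1, 1, 1, 1], pred)
```

Without censoring all weights equal 1 and the score reduces to the ordinary mean squared error of the predicted survival probabilities.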
The work in this project will be closely coordinated with the projects of our clinical research partners. In particular, a comprehensive analysis will be provided for the project "Microarray validation of cardiovascular risk factors". Further benefit can be expected from collaboration with the project "Time-varying and Dynamic scores".
Publications