Survival models with high-dimensional data structure

Principal investigators

Prof. Dr. Martin Schumacher
Institute for Medical Biometry and Medical Informatics
University Medical Center Freiburg
Stefan-Meier-Str. 26, 79104 Freiburg, Germany
Phone: +49 761 203 6661
Fax: +49 761 203 6688

Prof. Dr. Jens Timmer
Physics Institute
University of Freiburg
Hermann-Herder-Str. 3, 79104 Freiburg, Germany
Phone: +49 761 203 5829
Fax: +49 761 203 5967

Researchers

Prof. Dr. Martin Schumacher, ms@imbi.uni-freiburg.de, +49 761 203 6661
Prof. Dr. Jens Timmer, jeti@fdm.uni-freiburg.de, +49 761 203 5829
Dr. Harald Binder, binderh@imbi.uni-freiburg.de, +49 761 203 7704
Dipl. Stat. Christine Porzelius, cp@imbi.uni-freiburg.de, +49 761 203 7708

Summary

In many clinical disciplines, specially developed risk scores still have comparatively low predictive power. One hope is that identifying genomic and proteomic features will bring substantial progress; here, microarray data and protein mass spectra promise further insights. Understanding whole genomes and developing disease-specific biomarkers should aid diagnosis, improve the performance of prognostic scores, and ultimately lead to new treatments. Such data are characterized by a huge number of potential predictors and typically only a few patients, which makes them difficult to analyze. Standard survival techniques, such as fitting a Cox regression model by maximizing the partial likelihood, are not directly applicable: with more predictors than patients, the partial likelihood has no unique maximizer.
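Why the Cox model breaks down can be made concrete. When there are more predictors than patients, the design matrix X has a null space, and the Cox log partial likelihood depends on β only through Xβ, so it is flat along every null-space direction. The minimal NumPy sketch below illustrates this on simulated data; the sample sizes, the function name and the ridge penalty value are illustrative assumptions, not the project's code. It also shows how a ridge penalty, one form of penalized estimation, removes the ambiguity by making the objective strictly concave.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, time, event):
    """Cox log partial likelihood (Breslow form; assumes no tied event times)."""
    eta = X @ beta
    ll = 0.0
    for i in np.flatnonzero(event == 1):
        at_risk = time >= time[i]              # risk set at the i-th event time
        ll += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return ll

rng = np.random.default_rng(1)
n, p = 20, 100                                 # hypothetical: 20 patients, 100 predictors
X = rng.normal(size=(n, p))
t_event = rng.exponential(size=n)
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)             # observed, right-censored times
event = (t_event <= t_cens).astype(int)

# With p > n, X has a non-trivial null space; the last right singular vector lies in it.
_, _, vt = np.linalg.svd(X)                    # vt has shape (p, p), orthonormal rows
v = vt[-1]                                     # X @ v is zero up to rounding

beta0 = np.zeros(p)
ll_0 = cox_log_partial_likelihood(beta0, X, time, event)
ll_v = cox_log_partial_likelihood(beta0 + 5.0 * v, X, time, event)
print(np.isclose(ll_0, ll_v))                  # True: moving along v leaves the fit unchanged

# A ridge penalty (lam / 2) * ||beta||^2 makes the objective strictly concave;
# along the flat direction it prefers the coefficient vector with smaller norm.
lam = 1.0
pen_0 = ll_0 - 0.5 * lam * beta0 @ beta0
pen_v = ll_v - 0.5 * lam * (5.0 * v) @ (5.0 * v)
print(pen_v < pen_0)                           # True
```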
In this project, we adapt statistical approaches that can deal with high-dimensional data structures, such as penalized estimation and boosting. These methods have been developed mostly for continuous and binary responses. Only recently have some proposals been made for right-censored event-time responses, and methodological problems remain. One example is the rather fragile selection of the number of steps in path algorithm procedures. There is also little research on modelling time-varying covariates in high-dimensional data, possibly in combination with time-varying effects on survival. We therefore start with discrete-time survival models, in which time-varying covariates are easily incorporated and available techniques for binary response variables can be adapted. In a next step, we will develop a competitive continuous-time approach. Boosting and path algorithm techniques will be investigated for estimation.
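The discrete-time idea can be sketched as follows. Each patient contributes one binary observation per period at risk, with response 1 only in the period where the event occurs, so the discrete hazard model logit λ(t | x) = γ_t + xᵀβ can be fitted with binary-response methods. The Python sketch below expands the data this way and then estimates β by a simplified componentwise likelihood-based boosting in the spirit of Tutz and Binder: in each iteration, only the single covariate with the largest penalized score statistic receives one penalized Newton (Fisher scoring) step. The function names, the penalty value, the number of steps and the simulated data are illustrative assumptions; this is not the GAMBoost implementation.

```python
import numpy as np

def person_period(time, event, X):
    """Expand discrete survival data (time in 1..K) to one binary row per period at risk.

    Patient i contributes rows for periods 1..time[i]; the response is 1 only in
    the last row, and only if the event (not censoring) happened in that period.
    """
    rows, periods, y = [], [], []
    for i, (t_i, d_i) in enumerate(zip(time, event)):
        for t in range(1, t_i + 1):
            rows.append(i)
            periods.append(t)
            y.append(1 if (d_i == 1 and t == t_i) else 0)
    return X[np.array(rows)], np.array(periods), np.array(y)

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30, 30)))

def boost_discrete_hazard(Xpp, period, y, n_steps=100, penalty=100.0):
    """Componentwise likelihood-based boosting for logit(hazard_t) = gamma_t + x'beta.

    The period intercepts gamma_t (baseline hazard) are unpenalized and get one
    Newton step per iteration; then only the covariate with the largest penalized
    score statistic receives a penalized Newton step.
    """
    K, p = period.max(), Xpp.shape[1]
    gamma, beta = np.zeros(K), np.zeros(p)
    for _ in range(n_steps):
        pi = sigmoid(gamma[period - 1] + Xpp @ beta)
        for k in range(K):                      # unpenalized baseline update per period
            m = period == k + 1
            gamma[k] += (y[m] - pi[m]).sum() / max((pi[m] * (1 - pi[m])).sum(), 1e-12)
        pi = sigmoid(gamma[period - 1] + Xpp @ beta)
        score = Xpp.T @ (y - pi)                # score for each single covariate
        info = (Xpp ** 2).T @ (pi * (1 - pi))   # componentwise Fisher information
        j = np.argmax(score ** 2 / (info + penalty))
        beta[j] += score[j] / (info[j] + penalty)
    return gamma, beta

# Simulated example: 50 covariates, only the first two affect the hazard.
rng = np.random.default_rng(0)
n, p, K = 300, 50, 5
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.0, -1.0]
hazard = sigmoid(-1.5 + X @ beta_true)          # same hazard in every period
time, event = np.full(n, K), np.zeros(n, dtype=int)
for i in range(n):
    for t in range(1, K + 1):
        if rng.uniform() < hazard[i]:
            time[i], event[i] = t, 1
            break                               # otherwise censored after period K

Xpp, period, y = person_period(time, event, X)
gamma, beta = boost_discrete_hazard(Xpp, period, y)
print(np.argsort(-np.abs(beta))[:2])            # should be the informative covariates 0 and 1
```

Because the penalty shrinks every step, the number of boosting steps acts as the complexity parameter; choosing it is exactly the selection problem discussed next.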
A central problem is the selection of regularization or complexity parameters. For our discrete-time survival approach, we will adapt model selection criteria based on model-based estimates of the effective degrees of freedom, and for validation we will investigate bootstrap-based estimates of the degrees of freedom. For continuous-time survival models, such degrees-of-freedom estimates are difficult to obtain, and the right-censored data structure must be taken into account. We will therefore focus on resampling-based estimates of prediction error that are time-dependent and deal appropriately with right censoring. These estimates will then be used to select model complexity and so avoid overfitting in our flexible survival approach. As an alternative, model selection based on false discovery rates will be investigated.
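One time-dependent prediction error measure that handles right censoring is the Brier score with inverse probability of censoring weights, in the form of Graf et al. (1999, listed under Publications). With Ĝ the Kaplan-Meier estimate of the censoring distribution and Ŝ(t | x_i) the predicted probability of surviving beyond t, it is BS(t) = (1/n) Σ_i [ Ŝ(t | x_i)² · 1{T_i ≤ t, δ_i = 1} / Ĝ(T_i−) + (1 − Ŝ(t | x_i))² · 1{T_i > t} / Ĝ(t) ], so patients censored before t get weight 0. The Python sketch below computes this score for given predictions; the function names are illustrative, ties between event and censoring times get no special treatment, and it is a sketch rather than the project's software.

```python
import numpy as np

def km_steps(time, status):
    """Kaplan-Meier estimate: jump times and survival value just after each jump."""
    jump_times = np.unique(time[status == 1])
    surv, s = [], 1.0
    for u in jump_times:
        at_risk = np.sum(time >= u)
        d = np.sum((time == u) & (status == 1))
        s *= 1.0 - d / at_risk
        surv.append(s)
    return jump_times, np.array(surv)

def step_value(jump_times, surv, s, left_limit=False):
    """Evaluate the right-continuous KM step function at s, or its left limit."""
    idx = np.searchsorted(jump_times, s, side="left" if left_limit else "right")
    return np.concatenate(([1.0], surv))[idx]

def ipcw_brier(t, time, event, surv_pred):
    """IPCW Brier score at horizon t; surv_pred[i] estimates P(T_i > t | x_i)."""
    g_times, g_surv = km_steps(time, 1 - event)       # censoring distribution G
    g_Ti_minus = step_value(g_times, g_surv, time, left_limit=True)
    g_t = step_value(g_times, g_surv, t)
    died = (time <= t) & (event == 1)                 # event observed by t
    alive = time > t                                  # still under observation after t
    contrib = np.zeros(len(time))                     # censored before t: weight 0
    contrib[died] = surv_pred[died] ** 2 / g_Ti_minus[died]
    contrib[alive] = (1.0 - surv_pred[alive]) ** 2 / g_t
    return contrib.mean()

time = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
event = np.array([1, 0, 1, 1, 0])
surv_pred = np.array([0.1, 0.5, 0.3, 0.7, 0.9])
print(ipcw_brier(3.5, time, event, surv_pred))        # approximately 0.052667 (= 79/1500)
```

Without censoring all weights equal 1 and the score reduces to the ordinary Brier score. To select model complexity as described above, the score would be computed on resampled test data for each candidate complexity; this sketch covers only the scoring step.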
The work in this project will be closely coordinated with the projects of our clinical research partners. In particular, a comprehensive analysis will be provided for the project "Microarray validation of cardiovascular risk factors". Further benefit can be expected from collaboration with Time-varying and Dynamic scores.

Publications

  • Faller D, Voss HU, Timmer J, Hobohm U. Normalization of DNA-microarray data by nonlinear correlation maximization. Journal of Computational Biology 2003; 10(5):751-762.
  • Donauer J, Rumberger B, Klein M, Faller D, Wilpert J, Sparna T, Schieren G, Rohrbach R, Dern P, Timmer J, Pisarski P, Kirste G, Walz G. Expression profiling on chronically rejected transplant kidneys. Transplantation 2003; 76(3):539-547.
  • Goerttler PS, Kreutz C, Donauer J, Faller D, Maiwald T, März E, Rumberger B, Sparna T, Schmitt-Gräff A, Wilpert J, Timmer J, Walz G, Pahl HL. Gene expression profiling in polycythaemia vera: overexpression of transcription factor NF-E2. British Journal of Haematology 2005; 129(1):138-150.
  • Schieren G, Rumberger B, Klein M, Kreutz C, Wilpert J, Geyer M, Faller D, Timmer J, Quack I, Rump LC, Walz G, Donauer J. Gene profiling of polycystic kidneys. Nephrology Dialysis Transplantation 2006; 21(7):1816-1824.
  • Pfeifer D, Pantic M, Skatulla I, Rawluk J, Kreutz C, Martens U, Fisch P, Timmer J, Veelken H. Genome-wide analysis of DNA copy number changes and LOH in CLL using high-density SNP arrays. Blood 2006; in press.
  • Fang X, Zeisel MB, Wilpert J, Gissler B, Thimme R, Kreutz C, Maiwald T, Timmer J, Kern WV, Donauer J, Geyer M, Walz G, Depla E, von Weizsäcker F, Blum HE, Baumert TF. Host cell responses induced by hepatitis C virus binding. Hepatology 2006; 43(6):1326-1336.
  • Schumacher M, Binder H, Gerds T. Assessment of survival prediction models in high-dimensional settings. Manuscript, submitted 2006.
  • Graf E, Schmoor C, Sauerbrei W, Schumacher M. Assessment and comparison of prognostic classification schemes for survival data. Statistics in Medicine 1999; 18:2529-2545.
  • Gerds TA, Schumacher M. Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biometrical Journal 2006; in press.
  • Schumacher M, Graf E, Gerds T. How to assess prognostic models for survival data: A case study in oncology. Methods of Information in Medicine 2003; 42(5):564-571.
  • Gerds TA, Schumacher M. On Efron type measures of prediction error for survival analysis. Manuscript, submitted 2006.
  • Tutz G, Binder H. Flexible modelling of discrete failure time including time-varying smooth effects. Statistics in Medicine 2004; 23:2445-2461.
  • Binder H. Flexible Semi- and Non-Parametric Modelling and Prognosis for Discrete Outcomes. Logos: Berlin, 2006.
  • Tutz G, Binder H. Generalized additive modelling with implicit variable selection by likelihood based boosting. Biometrics 2006; in press.
  • Binder H. GAMBoost: Generalized additive models by likelihood based boosting. R package, 2006.
  • Stollhoff R, Sauerbrei W, Schumacher M. Boosting methods to improve the performance of classifiers: an experimental investigation. Manuscript, submitted 2006.
  • Binder H, Tutz G. Fitting generalized additive models: A comparison of methods. FDM-Preprint No. 93, University of Freiburg, 2006.
  • Tutz G, Binder H. Boosting ridge regression. Computational Statistics & Data Analysis 2006; in press.