Lena Kulla
On a Mystery in Machine Learning
When | 07.07.2023, 12:00 to 13:00
Where | Hörsaal 2, Albertstr. 23b
In classical regression modelling, the complexity of the model, measured for example by the number of parameters, is smaller than the amount of training data. The prediction error, viewed as a function of model complexity, exhibits a U-shaped behaviour: the (first) descent is due to decreasing bias, the ascent to increasing variance. In modern machine learning, the number of parameters often far exceeds the number of training data points. Intuitively, one might expect the prediction error to explode with increasing model complexity due to overfitting. Belkin et al. (2019) observed that this is not the case. Instead, the prediction error decreases again once the model complexity surpasses a certain threshold, in some cases even below the minimum of the classical, U-shaped regime, a phenomenon the authors termed double descent. To understand double descent, we study the simplest setting of linear regression and show that it can be explained by investigating the singular values of the design matrix. Finally, we give an outlook on the non-linear model setting.
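The linear-regression picture sketched in the abstract can be reproduced numerically. The following minimal sketch (an illustrative setup of my own, not the speaker's code; the constants n, D and sigma_noise are assumptions chosen for the demo) fits minimum-norm least-squares estimators with a growing number of features p to a fixed number n of training points, and prints the test error together with the smallest singular value of the design matrix, which governs the variance blow-up near the interpolation threshold p ≈ n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed demo constants: n training points, D latent features carrying the
# true linear signal, Gaussian noise of standard deviation sigma_noise.
n, n_test, D, sigma_noise = 40, 2000, 400, 0.5
beta_star = rng.normal(size=D) / np.sqrt(D)

def sample(m):
    """Draw m points with D Gaussian features and noisy linear responses."""
    X = rng.normal(size=(m, D))
    y = X @ beta_star + sigma_noise * rng.normal(size=m)
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(n_test)

for p in [5, 10, 20, 35, 40, 45, 60, 100, 200, 400]:
    # The model only sees the first p of the D features.
    Xp = X_train[:, :p]
    # Minimum-norm least-squares fit via the pseudoinverse; for p > n this
    # interpolates the training data (the overparameterised regime).
    beta_hat = np.linalg.pinv(Xp) @ y_train
    test_mse = np.mean((X_test[:, :p] @ beta_hat - y_test) ** 2)
    # The smallest nonzero singular value of the design matrix is typically
    # smallest near p ≈ n, which is where the prediction error peaks.
    s = np.linalg.svd(Xp, compute_uv=False)
    print(f"p = {p:3d}  test MSE = {test_mse:6.3f}  s_min = {s[s > 1e-10].min():6.2f}")
```

Under this setup one typically sees the classical U-shape for p < n, a spike in test error near p = n where the smallest singular value is close to zero, and a second descent as p grows further.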
Belkin, M.; Hsu, D.; Ma, S.; Mandal, S. (2019): Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proceedings of the National Academy of Sciences 116(32), 15849–15854.