Franziska Grundner-Culemann

Inference of biogeographical ancestry from SNP data -- an evaluation of selection methods and classifiers on a variety of SNP data sets

When 7 March 2018
from 14:30 to 15:30
Where Eckerstraße 1, Room 404, 4th floor
Single nucleotide polymorphisms (SNPs) in DNA have proven suitable for inferring the biogeographical ancestry of human individuals. Various methods have been developed, and recent articles in this field focus on their advantages and evaluate their quality in a variety of settings and under different aspects, such as the ability to predict admixture rates, the dependence on assumptions, or the handling of different rates of missing data. This thesis covers three closely linked aspects:

First, we test forward selection algorithms that select a minimal sufficient, or possibly even best, subset of SNPs for ancestry prediction from a given set of SNPs that may not be preselected. We compare the quality of predictions on SNP sets chosen by forward selection under different methods with the quality on a SNP set selected by a procedure based on FST values. Second, we introduce a novel version of a naive Bayesian classifier. Different versions of Bayesian classifiers have been developed for this purpose and perform well. Finally, we use SNP data simulation software to test our methods systematically and compare the Bayesian classifier with logistic regression, which is an established method for predicting eye color from SNP data. We investigate the impact of parameters, such as migration and the number of islands, on prediction performance and conclude the analysis by comparing the results with those on real data sets.
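
To make the first two aspects concrete, the following is a minimal sketch of greedy forward selection wrapped around a textbook naive Bayes classifier. It is not the novel classifier or the selection procedure from the thesis. The genotype coding (0, 1 or 2 copies of one allele), the Laplace smoothing, the use of validation accuracy as the selection criterion, the toy data and all function names are assumptions made for illustration only.

```python
import math

def fit_naive_bayes(X, y, snps, alpha=1.0):
    """Estimate genotype frequencies per population for the chosen SNPs.
    Assumed coding: each genotype is 0, 1 or 2 copies of one allele."""
    pops = sorted(set(y))
    model = {"pops": pops, "snps": list(snps), "prior": {}, "freq": {}}
    for p in pops:
        rows = [x for x, label in zip(X, y) if label == p]
        model["prior"][p] = len(rows) / len(X)
        for j in snps:
            counts = [alpha, alpha, alpha]  # Laplace smoothing (assumption)
            for x in rows:
                counts[x[j]] += 1
            total = sum(counts)
            model["freq"][(p, j)] = [c / total for c in counts]
    return model

def predict(model, x):
    """Return the population with the highest log posterior score."""
    best_pop, best_score = None, -math.inf
    for p in model["pops"]:
        score = math.log(model["prior"][p])
        for j in model["snps"]:
            score += math.log(model["freq"][(p, j)][x[j]])
        if score > best_score:  # ties go to the first population
            best_pop, best_score = p, score
    return best_pop

def accuracy(X_train, y_train, X_val, y_val, snps):
    model = fit_naive_bayes(X_train, y_train, snps)
    hits = sum(predict(model, x) == label for x, label in zip(X_val, y_val))
    return hits / len(X_val)

def forward_selection(X_train, y_train, X_val, y_val, max_snps):
    """Greedily add the SNP that most improves validation accuracy;
    stop when no candidate gives a strict improvement."""
    selected, best_acc = [], 0.0
    candidates = set(range(len(X_train[0])))
    while candidates and len(selected) < max_snps:
        best_j, best_j_acc = None, best_acc
        for j in sorted(candidates):
            acc = accuracy(X_train, y_train, X_val, y_val, selected + [j])
            if acc > best_j_acc:
                best_j, best_j_acc = j, acc
        if best_j is None:
            break
        selected.append(best_j)
        candidates.remove(best_j)
        best_acc = best_j_acc
    return selected, best_acc

# Hand-made toy data: SNP 0 separates the two populations,
# SNP 1 is constant, SNP 2 is distributed identically in both.
X_train = [[0, 1, 0], [0, 1, 2], [0, 1, 1], [2, 1, 0], [2, 1, 2], [2, 1, 1]]
y_train = ["A", "A", "A", "B", "B", "B"]
X_val = [[0, 1, 2], [0, 1, 0], [2, 1, 1], [2, 1, 2]]
y_val = ["A", "A", "B", "B"]

selected, acc = forward_selection(X_train, y_train, X_val, y_val, max_snps=3)
print(selected, acc)  # [0] 1.0
```

On this toy input the search stops after one SNP, because adding either uninformative marker does not improve the validation accuracy. An FST-based procedure would instead rank SNPs by a population differentiation statistic without consulting the classifier.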
