Franziska Grundner-Culemann
Filed under: FDM-Seminar
Inference of biogeographical ancestry from SNP data -- an evaluation of selection methods and classifiers on a variety of SNP data sets
When | 7 March 2018, 14:30 to 15:30 |
---|---|
Where | Eckerstraße 1, Room 404, 4th floor |
Single nucleotide polymorphisms (SNPs) in DNA have proven suitable for inferring the biogeographical ancestry of human individuals. Various methods have been developed, and recent articles in this field focus on their advantages and evaluate their quality in a variety of settings and from different angles, such as the ability to predict admixture rates, the dependency on assumptions, and the handling of different rates of missing data. This thesis covers three closely linked aspects:
First, we test forward selection algorithms for choosing a
minimal sufficient, or possibly even best, subset of SNPs for ancestry
prediction from a given set of SNPs that has not necessarily been
preselected. We compare the quality of predictions on SNP sets chosen by
forward selection under different methods with the quality on a SNP set
selected by a procedure based on FST values. Second, we introduce a novel
version of a naive Bayesian classifier. Different versions of Bayesian
classifiers have been developed for this purpose, and they perform
well. Finally, we use SNP data simulation software to
systematically test our methods and to compare the Bayesian classifier with
logistic regression, which is an established method for predicting eye
color from SNP data. We investigate the impact of parameters, such
as migration and the number of islands, on prediction performance
and conclude the analysis by comparing the results with those on real
data sets.
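
To make the first aspect concrete, greedy forward selection wrapped around a simple naive Bayes ancestry classifier might look roughly like the sketch below. It is an illustration only, not the method presented in the thesis: the genotype coding (0, 1 or 2 copies of an allele), the Hardy-Weinberg genotype likelihood, the add-one smoothing, the uniform population priors, the stopping rule, the toy data and all function names are assumptions made for this example.

```python
import math

def allele_freqs(genotypes, labels, snps):
    """Per-population allele frequency for each SNP, with add-one smoothing."""
    freqs = {}
    for pop in sorted(set(labels)):
        rows = [g for g, lab in zip(genotypes, labels) if lab == pop]
        freqs[pop] = {
            j: (sum(r[j] for r in rows) + 1) / (2 * len(rows) + 2)
            for j in snps
        }
    return freqs

def log_likelihood(genotype, pop_freqs, snps):
    """Naive Bayes: SNPs treated as independent, genotype ~ Binomial(2, p)."""
    total = 0.0
    for j in snps:
        p, g = pop_freqs[j], genotype[j]
        total += (math.log(math.comb(2, g))
                  + g * math.log(p) + (2 - g) * math.log(1 - p))
    return total

def predict(genotype, freqs, snps):
    """Most likely population under uniform priors (assumption)."""
    return max(freqs, key=lambda pop: log_likelihood(genotype, freqs[pop], snps))

def accuracy(train_X, train_y, val_X, val_y, snps):
    freqs = allele_freqs(train_X, train_y, snps)
    hits = sum(predict(g, freqs, snps) == y for g, y in zip(val_X, val_y))
    return hits / len(val_y)

def forward_select(train_X, train_y, val_X, val_y, max_snps=None):
    """Greedily add the SNP that most improves validation accuracy;
    stop as soon as no candidate gives a strict improvement."""
    n_snps = len(train_X[0])
    limit = n_snps if max_snps is None else max_snps
    selected, best = [], 0.0
    while len(selected) < limit:
        candidates = [j for j in range(n_snps) if j not in selected]
        scored = [(accuracy(train_X, train_y, val_X, val_y, selected + [j]), j)
                  for j in candidates]
        # Ties are broken in favour of the lowest SNP index.
        score, j = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best:
            break
        selected.append(j)
        best = score
    return selected, best

# Toy data (invented): only SNP 1 differs between populations "A" and "B".
train_X = [[1, 0, 1], [1, 0, 2], [0, 0, 1], [1, 2, 1], [1, 2, 2], [0, 2, 1]]
train_y = ["A", "A", "A", "B", "B", "B"]
val_X = [[1, 0, 2], [0, 0, 1], [1, 2, 1], [0, 2, 2]]
val_y = ["A", "A", "B", "B"]

selected, best = forward_select(train_X, train_y, val_X, val_y)
print(selected, best)  # [1] 1.0
```

On this toy input the procedure picks the single informative SNP and then stops, because adding either remaining SNP does not raise validation accuracy. A real evaluation would use cross-validation or a held-out test set rather than one small validation split.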