Merle Behr
Filed under: FDMSeminar
Learning compositional structures
When: 08.11.2019, 12:00 to 13:15
Where: Ernst-Zermelo-Straße 1, Room 404, 4th floor
Many data problems, in particular in biogenetics, come with a
highly complex underlying structure. This often makes it difficult to
extract interpretable information. In this talk we want to demonstrate
that these complex structures are often well approximated by a
composition of a few simple parts, which provides very descriptive
insights into the underlying data generating process. We demonstrate
this with two examples.
In the first example, the single components are finite alphabet
vectors (e.g., binary
components), which encode some discrete information. For instance, in
genetics a binary vector of length n can encode whether or not a
mutation (e.g., a SNP) is present at location i = 1,…,n in the genome.
At the population level, studying genetic variation is often highly
complex, as various groups of mutations are present simultaneously.
However, in many settings a population might be well approximated by a
composition of a few dominant groups. Examples are Evolve&Resequence
experiments, where the external supply of genetic variation is limited
and thus, over time, only a few haplotypes survive. Similarly, in a
cancer tumor, often only a few competing groups of cancer cells (clones)
come out on top.
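As a rough illustration of this first setting, the following minimal sketch treats a population as a weighted mixture of a few binary haplotype vectors and computes the resulting mutation frequency at each location. The haplotypes, the weights, and the function name mixture_frequencies are hypothetical toy choices made for this sketch; the abstract does not specify the actual data model used in the talk.

```python
# Hypothetical toy data: 3 haplotypes over n = 5 genome locations.
# An entry of 1 means the mutation (e.g., a SNP) is present at that location.
haplotypes = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
]

# Assumed relative abundances of the haplotypes in the population (sum to 1).
weights = [0.5, 0.25, 0.25]


def mixture_frequencies(haplotypes, weights):
    """Return the population-level mutation frequency at each location.

    The frequency at location i is the weighted sum of the i-th entries
    of all haplotypes (a simple linear mixture, assumed for illustration).
    """
    n = len(haplotypes[0])
    return [
        sum(w * h[i] for h, w in zip(haplotypes, weights))
        for i in range(n)
    ]


frequencies = mixture_frequencies(haplotypes, weights)
print(frequencies)
```

In this toy setup only the mixed frequencies would be observed. Recovering the binary vectors and their weights from such observations is the kind of question the abstract refers to as identifiability of the single components.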
In the second example, the single components relate to separate
branches of a tree structure. Tree
structures, showing hierarchical relationships between samples, are
ubiquitous in genomic and biomedical sciences. A common question in many
studies is whether there is an association between a response variable
and the latent group structure represented by the tree. Such a relation
can be highly complex in general. However, it is often well
approximated by a simple composition of relations associated with a few
branches of the tree.
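One way to picture such a composition is an additive toy model in which each selected branch contributes a fixed effect to every sample below it. The sketch below assumes exactly that. The tree, the branch names, the effect sizes, and the function branch_signal are all hypothetical, and the additive form is an assumption of this sketch rather than the model presented in the talk.

```python
# Hypothetical tree over 6 samples, described by the set of samples
# (leaves) that sit below each branch.
branches = {
    "root": {0, 1, 2, 3, 4, 5},
    "A": {0, 1, 2},
    "B": {3, 4, 5},
    "A1": {0, 1},
    "B1": {4, 5},
}

# Assumed sparse effects: only two branches are associated with the response.
effects = {"A": 2.0, "B1": -1.0}


def branch_signal(sample, branches, effects):
    """Sum the effects of all selected branches that contain the sample."""
    return sum(
        (e for name, e in effects.items() if sample in branches[name]),
        0.0,
    )


# Noise-free expected response for each of the 6 samples.
signal = [branch_signal(s, branches, effects) for s in range(6)]
print(signal)
```

Here the response pattern across all six samples is fully described by two branch effects, which is the sense in which a complex-looking relation can reduce to a few simple parts.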
For both of these examples we first study theoretical aspects of the
underlying compositional structure, such as identifiability of single
components and optimal statistical procedures under a probabilistic data
model. Based on this, we gain insights into practical aspects of the
problem, namely
how to actually recover such components from data.