Local Ancestry Inference in Non-Model Organisms

Abstract

Recent advances have made it possible to infer accurately, from SNP data, both the timing of admixture events (in which genetically diverged populations mix) and the populations that contributed to them. Statistical methods based on hidden Markov models have been shown to assign chromosomal segments of admixed individuals accurately to unobserved ancestral populations, while simultaneously inferring how these ancestral populations relate to observed modern populations. This has been achieved using high-quality, phased haplotype data on many individuals, sampled from both the admixed target population and multiple extant reference populations. The most recent methods in this area do not require prior knowledge of the relationship between the unobserved ancestral mixing groups and the reference panels. However, such methods have not yet been extended to non-model organisms. Researchers studying these organisms increasingly rely on low-coverage whole-genome sequencing data, for which genotypes are not called and the rate of missing SNP data is high. Versions of imputation and phasing algorithms exist that replace called genotypes with genotype likelihoods, which makes them applicable to such data. We propose and outline an extension of existing models for multiway admixture that accounts for genotype uncertainty and tolerates a high rate of missing data. We demonstrate and assess the method by downsampling high-coverage data. The approach will be especially useful in conservation genetics studies.
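The central modelling change, using genotype likelihoods instead of called genotypes, can be illustrated with a minimal sketch. It is not the authors' model. It assumes a simple binomial read-error model and a diploid HMM whose hidden state at each site is the ordered pair of ancestral sources carried by the two haplotypes. Under those assumptions the emission probability sums over the three possible genotypes, weighting each by its likelihood, and a site with no reads contributes equally to every state. All function names and parameters (`error`, `freqs`, and so on) are hypothetical.

```python
from itertools import product


def genotype_likelihoods(n_ref, n_alt, error=0.01):
    """Return P(reads | g) for g = 0, 1, 2 copies of the alt allele.

    Assumed read model: each read independently shows the alt allele with
    probability (g/2)(1 - error) + (1 - g/2) * error. The binomial coefficient
    is omitted because it is identical for every g.
    """
    gls = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        gls.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return gls


def genotype_prior(f_a, f_b):
    """P(g | ancestry pair) for a diploid whose two haplotypes descend from
    sources with alt-allele frequencies f_a and f_b at this site."""
    return [
        (1 - f_a) * (1 - f_b),
        f_a * (1 - f_b) + (1 - f_a) * f_b,
        f_a * f_b,
    ]


def emission(gls, f_a, f_b):
    """Marginalise over the unobserved genotype rather than conditioning on a call."""
    prior = genotype_prior(f_a, f_b)
    return sum(p * l for p, l in zip(prior, gls))


def emissions_all_states(n_ref, n_alt, freqs, error=0.01):
    """Emission for every ordered pair of ancestral sources at one site.

    freqs: hypothetical alt-allele frequency of each source population here.
    """
    gls = genotype_likelihoods(n_ref, n_alt, error)
    return {
        (a, b): emission(gls, freqs[a], freqs[b])
        for a, b in product(range(len(freqs)), repeat=2)
    }


# A site with no reads (missing data) versus a site covered by a single alt read.
missing = emissions_all_states(0, 0, [0.1, 0.9])
one_alt_read = emissions_all_states(0, 1, [0.1, 0.9])
print(missing)
print(one_alt_read)
```

Because a site with no coverage gives (up to rounding) the same emission for every ancestry state, it drops out of the forward-backward recursion and no imputation is required. A site with even a single read still shifts the posterior toward sources in which the observed allele is common.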

Publication
47th European Mathematical Genetics Meeting