TITLE: “Population structure inference for biobank-scale data.”
ABSTRACT: Inferring the structure of human populations from genetic variation data is a key task in population and medical genomic studies. Although a number of methods for population structure inference have been proposed, they are impractical to run on biobank-scale genomic datasets containing millions of individuals and genetic variants. We introduce SCOPE, a method that can infer population structure from biobank-scale data. We show that SCOPE is as accurate as or more accurate than existing methods while being orders of magnitude faster. SCOPE is able to infer population structure in about a day on a dataset consisting of one million individuals and one million SNPs. Furthermore, SCOPE can incorporate allele frequencies from previous studies in a supervised fashion to further aid the interpretability of the estimated admixture proportions.