TITLE: “AutoComplete: Deep Learning-based Phenotype Imputation”
ABSTRACT: Health data has become increasingly available and vast in scale, yet it is characterized by high rates of missingness. For many downstream applications, accurately imputing missing features in health records can unlock analytical power that would otherwise go unrealized. Although existing imputation methods are applicable, many fall short in reliability, scalability, or both when applied to massive, highly incomplete, and heterogeneous population-scale data. We propose AutoComplete, a deep learning-based imputation method that scales readily to incomplete datasets with millions of entries and handles heterogeneous data comprising both continuous and categorical features. In imputing phenotypes for a cohort of half a million individuals from the UK Biobank, AutoComplete significantly improved imputation accuracy for several phenotypes compared with the best-performing low-rank factorization and deep learning methods.