Friday, November 22 2019
2:00pm - 4:30pm
PhD Thesis Presentation
Subrata Paul Doctoral Exam

Modeling Heterogeneity in an Association Framework for a Complex Trait through the Use of Mixture Models

Complex traits are influenced by multiple genetic variants and possibly by their interaction with environmental factors. Genome-wide association studies (GWAS) are used to identify common genetic variants associated with a complex trait. Individually, a genetic variant explains only a small proportion of total phenotypic variance; however, the effects of many genetic variants can be combined to produce a polygenic risk score (PRS). The predictive ability of a PRS depends on the amount of phenotypic variance explained by the genetic variants used to build the risk score. In the presence of phenotypic heterogeneity, classical association tests lose statistical power and yield biased effect size estimates. The main theme of this dissertation is to incorporate phenotypic heterogeneity into an association analysis framework to accurately estimate effect sizes and boost the power of the association tests.

First, to motivate methods development and provide context, polygenic risk scores are considered for generalized vitiligo, an autoimmune skin disorder. The predictive abilities of 51 different scores were compared, with a final score comprising 48 genetic variants performing best. Next, a binomial mixture model (BMM) was developed to incorporate phenotypic heterogeneity into a case-control association framework. Based on our experience in analyzing vitiligo, we hypothesize that a genetic variant could be associated with only part of the population. Under this hypothesis, a simulation study was performed demonstrating that the BMM yields less biased odds ratio estimates than logistic regression while having comparable statistical power.

Next, to further increase the power of association tests, a multivariate model, mixture of factor analysis regression (MixFAR), is proposed. MixFAR simultaneously identifies clusters of individuals in the multivariate phenotypic space, performs dimension reduction on each cluster to identify the latent factors, and detects association between the genetic variants and the latent factors. Through simulation, MixFAR was compared with the trait-based association test that uses the extended Simes procedure (TATES) and with factor mixture analysis (FMA) under three- and six-trait scenarios. In simulations with six phenotypes, MixFAR has greater power than TATES. In simulations with three phenotypes, MixFAR has power greater than or comparable to that of FMA and TATES when the phenotypes are positively correlated and the genetic variant is associated with all three phenotypes. While TATES is robust to the directions of correlations among phenotypes, MixFAR is robust to the association of the genetic variant with the case-control status.
Speaker: Subrata Paul
