Friday, April 29, 2022

8:45am - 10:15am

PhD Thesis Presentation

Using Data-Consistent Inversion to Overcome Spurious Inference in Genome-Wide Association Studies.

Linear or logistic regression models, depending on whether the phenotype is continuous or binary, have typically been used in the additive genome-wide association study (GWAS) setting to identify associations between phenotypes (traits) and genotypes (genetic variants) of interest. Regression, especially under the assumption of additive variant effects, has two major flaws. The first concerns the regression assumptions themselves. Normality of residuals is one of the primary assumptions of a linear regression model, yet residuals rarely follow a perfectly normal distribution. Deviation from normality increases the Type II error rate, that is, it reduces the power of the test. Classical statistics usually applies transformations to coerce the data toward a normal distribution. However, a transformation cannot guarantee that the parameter estimates are asymptotically normally distributed as the regression assumptions require, so p-values based on these assumptions may be inaccurate.
The second flaw concerns the additive assumption in the GWAS regression model. GWAS traditionally codes a genetic predictor as the count of a given variant (0, 1, or 2) and assumes additivity: two copies of the variant have twice the mean effect of one copy. This assumption is made to justify treating the discrete count as a continuous variable in an ordinary regression model. If the additive assumption is false, the analysis may reach spurious conclusions about the association between the variant and a trait.
This research proposes methods to address both flaws. First (Aim 1), to address deviation from normality, we propose applying Data-Consistent Inversion (DCI) as a new way to identify the parameters of the GWAS model. DCI requires none of the assumptions of the linear regression model; it uses only domain or contextual knowledge about the parameters to find the true distribution of a variant's mean effects.
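The DCI idea described above can be illustrated with a minimal sketch of the data-consistent update as it is usually written in the DCI literature: the updated density is the initial density (which encodes the prior domain knowledge) multiplied by r(lam) = pi_obs(Q(lam)) / pi_pred(Q(lam)), where pi_pred is the push-forward of the initial density through a forward map Q. Everything here is an assumption for illustration: the function names, the linear toy map Q(lam) = 2 * lam + 1, the uniform initial density, and the observed N(3, 0.5^2) density are hypothetical and do not reproduce the thesis's GWAS setup. The push-forward density is estimated with a hand-rolled Gaussian KDE, and samples are drawn by rejection sampling.

```python
import numpy as np


def gaussian_kde_1d(samples, points):
    """Evaluate a 1-D Gaussian kernel density estimate of `samples` at `points`
    (bandwidth from Silverman's rule of thumb)."""
    samples = np.asarray(samples, dtype=float)
    points = np.asarray(points, dtype=float)
    n = samples.size
    bw = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    z = (points[:, None] - samples[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))


def dci_rejection_sample(init_samples, forward_map, observed_pdf, rng):
    """Data-consistent update of an initial density by rejection sampling.

    Updated density = pi_init(lam) * r(lam), with
    r(lam) = pi_obs(Q(lam)) / pi_pred(Q(lam)); pi_pred is the push-forward
    of the initial density through Q, estimated here by KDE.
    Each initial sample is kept with probability r / max(r).
    """
    q = forward_map(init_samples)
    pred_density = gaussian_kde_1d(q, q)
    r = observed_pdf(q) / pred_density
    keep = rng.uniform(size=r.size) < r / r.max()
    # mean(r) close to 1 is a common diagnostic that the update is well posed
    return init_samples[keep], r.mean()


def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))


# Toy usage (all hypothetical): initial knowledge about a mean effect `lam`
# is uniform on [-1, 3]; the observed quantity Q(lam) = 2 * lam + 1 ~ N(3, 0.5^2).
rng = np.random.default_rng(0)
init = rng.uniform(-1.0, 3.0, size=2000)
updated, mean_r = dci_rejection_sample(
    init, lambda lam: 2.0 * lam + 1.0, lambda q: normal_pdf(q, 3.0, 0.5), rng
)
```

Pushing the accepted samples through Q should approximately reproduce the observed N(3, 0.5^2) distribution; that reproduction of the observed distribution, rather than a single best-fit point estimate, is what "data-consistent" refers to.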
To address the additive-assumption flaw and account for the true ordinal format of the variant, we replace the GWAS regression model with a cell means model and apply DCI independently to the variant-count-specific models derived from the additive GWAS model. This allows us to find the true distribution of the parameters of each model, specifically the plausible mean effects for each variant count. To quantify the significance of associations, we use non-parametric tests for differences among these three distributions of means. We hypothesize that this adaptation produces more accurate p-values than the usual GWAS regression approach, and we test this hypothesis in multiple simulation scenarios and in a real-data example from the COPDGene project. The second part of the project focuses on the effect of non-normality in GWAS by studying only continuous forms of the genetic variant, which removes the need for the additive assumption. Its goal is to modify the asymptotic p-values directly using the DCI-identified true distributions (in contrast to the first aim, which switches to non-parametric tests), resulting in improved power and lower Type I error rates. The final component of the dissertation adapts the underlying likelihood ratio testing framework, which relies on asymptotic assumptions, to the DCI-identified parameter distributions.
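A minimal sketch of the contrast between the additive coding and the cell means model, followed by a non-parametric comparison of the three variant-count groups. Assumptions, stated plainly: the data are simulated (a hypothetical dominant variant with heavy-tailed residuals), the function names are hypothetical, and the Kruskal-Wallis test stands in for the non-parametric test, which the abstract does not name. The test is also applied directly to the phenotypes in each group rather than to DCI-identified distributions of means, so this illustrates the model structure, not the thesis's procedure.

```python
import numpy as np


def additive_fit(counts, y):
    """OLS fit of y = b0 + b1 * count, treating the 0/1/2 count as continuous."""
    X = np.column_stack([np.ones(counts.size), counts.astype(float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta  # [intercept, per-copy (additive) effect]


def cell_means_fit(counts, y):
    """Cell means model y_ij = mu_g + e_ij: one free mean per variant count g."""
    return np.array([y[counts == g].mean() for g in (0, 1, 2)])


def kruskal_wallis_3(*groups):
    """Kruskal-Wallis H test for exactly three groups, without tie correction
    (assumes continuous data). With 2 degrees of freedom the chi-square
    survival function is exactly exp(-H / 2), so no stats library is needed."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = np.concatenate(groups)
    n = pooled.size
    ranks = np.empty(n)
    ranks[np.argsort(pooled)] = np.arange(1, n + 1)
    total, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + g.size]  # ranks of this group's observations
        total += r.sum() ** 2 / g.size
        start += g.size
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, float(np.exp(-h / 2))


# Toy data (hypothetical): a dominant variant, where one or two copies shift
# the mean by the same amount, with heavy-tailed (non-normal) residuals.
rng = np.random.default_rng(1)
counts = rng.choice([0, 1, 2], size=600, p=[0.49, 0.42, 0.09])
true_means = np.array([0.0, 1.0, 1.0])
y = true_means[counts] + rng.standard_t(df=3, size=counts.size)

b0, b1 = additive_fit(counts, y)
additive_means = b0 + b1 * np.arange(3)  # forced to be linear in the count
cell_means = cell_means_fit(counts, y)   # estimated separately per count
h, p = kruskal_wallis_3(*(y[counts == g] for g in (0, 1, 2)))
```

Because `additive_means` is linear in the count, the fitted count-1 mean always lies exactly halfway between the count-0 and count-2 means, which cannot represent a dominant pattern; `cell_means` carries no such constraint.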

Speaker: Negar Janani

Location: Zoom link in email
