Monday, August 3, 2020
10:00am - 12:00pm
PhD Thesis Presentation
Advancement of understudied variants within statistical genetics

Genetics, the study of how traits are passed down through generations, plays an important role in health and disease. While genetic variants such as common single nucleotide variants (SNVs) have been widely analyzed, other types of variants, such as copy number variants (CNVs) and rare SNVs, remain understudied because they are more complex to analyze. In this dissertation, I focus on two open questions surrounding CNVs and rare SNVs.
I first assess the relationship between CNVs and normal facial variation in a cohort of Bantu African children. Similarity in facial characteristics between relatives suggests a strong genetic component, but little is known about the role of CNVs in facial variation. We first assess how well CNVs are tagged by densely imputed SNVs. We then present a genome-wide association study (GWAS) and gene set analysis of the relationship between normal facial variation and CNVs in our Bantu sample. We find that CNVs play a role in normal facial variation, with putative novel associations in five regions (p<1e-5) and nominal evidence of independent CNV association in three regions previously identified in SNV-based GWAS (p<0.05). Ultimately, our findings suggest that the study of CNVs, including the re-evaluation of existing GWAS data, is likely beneficial for the study of complex traits.
Second, I address the lack of methods to simulate rare variant genetic data that is representative of real data with respect to the total number of variants, allele frequency spectrum (AFS), haplotype structure, and variant annotation. I present RAREsim, a flexible and scalable genetic simulation method designed for accurate simulation of rare variants. I demonstrate RAREsim's ability to simulate the expected AFS and total number of variants within the coding region of chromosome 19 across four ancestries with varying sample sizes while maintaining haplotype structure and the ability to annotate variants. I show the generalizability of RAREsim's ancestry-specific default parameters in coding regions on other chromosomes, intergenic regions, various sample sizes, and other target datasets. RAREsim is easily implemented within the accompanying R package, as demonstrated in the vignette, enabling previously unavailable accurate simulation of large samples of rare variant data.
Speaker: Megan Null
Affiliation: Department of Mathematical and Statistical Sciences
Location: Zoom meeting
