2022 Bruins-In-Genomics Summer Undergraduate Research Program

2022 B.I.G. Summer Participants

Lab PIs | Mentors | Students
PATRICK ALLARD | - | Ramanuj Sarkar
PATRICK ALLARD | - | Medhini Sosale
VALERIE ARBOLEDA | Sarah Spendlove | Desiree Cervantes
VALERIE ARBOLEDA | Sarah Spendlove | Matthew Moleres
PAUL BOUTROS | Taka Yamaguchi | Anna Neiman-Golden
PAUL BOUTROS | Chenghao Zhu | Siddharth Mahesh
PAUL BOUTROS | Julie Livingstone | Karthik Guruvayurappa
PAUL BOUTROS | Chenghao Zhu | Ali Mobedi
PAUL BOUTROS | Taka Yamaguchi | Philippa Steinberg
MANISH BUTTE | - | Audrey Yang
JEFF CHIANG | Johnson Chen | Asiya Falak
JEFF CHIANG | Johnson Chen | Eleanor Carr
JEFF CHIANG | Akos Rudas | Simon Lee
JASON ERNST | Ha Vu | Aahna Rathod
JASON ERNST | Ha Vu | Saiyang Liu
JASON ERNST | Ha Vu | Elijah Jones
NANDITA GARUD | Ricky Wolff | Kaysa Pfannmuller
NANDITA GARUD | Mariana Harris | Julianna Perez
NANDITA GARUD | Xiaolu Guo | Jianche Liu
WILLIAM HSU | Ruiwen (Rina) Ding | Garrett Duncan
JIMMY HU | Leah Ye | Hanifa Maswali
STEVEN JACOBSEN | Zhenhui Zhong | Karisa Ke
COLIN KREMER | - | Aarshi Jain
HONGHU LIU | Jie Shen | Talia Villa
HONGHU LIU | Jie Shen | Fabely Moreno
KIRK LOHMUELLER | - | Saige Daines
CHONGYUAN LUO | Matthew Heffel | Catalina Holguin
CHONGYUAN LUO | Matthew Heffel | Maya Brawer-Cohen
AARON MEYER | Jackson Chin | Carly Gordon
AARON MEYER | Jackson Chin | Jalissa Emmens
ALIREZA MOSHAVERINIA | - | Isabela Molina
ALIREZA MOSHAVERINIA | - | Mary Torres Guzman
ROEL OPHOFF | Marcelo Francia | Ethan Concepcion
ROEL OPHOFF | Marcelo Francia | Zachary Jordan
PAIVI PAJUKANTA | - | Asha Kar
MATTEO PELLEGRINI | - | Sabrina Perez
MATTEO PELLEGRINI | - | Ashley Lopez
MATTEO PELLEGRINI | Wenbin Guo | Junxi Feng
MATTEO PELLEGRINI | Wenbin Guo | Hongxiang Fu
MATTEO PELLEGRINI | - | Salvador Ayala
TAMER SALLAM | Zhengyi Zhang | Vivien Su
SRIRAM SANKARARAMAN | - | Noela Wheeler
SRIRAM SANKARARAMAN | - | Aakarsh Anand
SRIRAM SANKARARAMAN | - | Prateek Anand
SRIRAM SANKARARAMAN | - | Michelle Johnson
MARC SUCHARD | Fan Bu | Keyi Xue
HUNG TON-THAT | - | Amanda Fuenzalida
DANIEL TWARD | Bryson Gray | Osagie Aimiuwu
DANIEL TWARD | Bryson Gray | Tiffanie Crumbie
DANIEL TWARD | Bryson Gray | Sarthak Tiwari
SHARMILA VENUGOPAL | - | Daniel Felipe Forero
SHARMILA VENUGOPAL | - | Fiona Latifi
SHARMILA VENUGOPAL | - | Haoxuan (Harry) Zhang
SHARMILA VENUGOPAL | - | Cameron Gill
CLAUDIO VILLANUEVA | Mirian Krystel De Siqueira | Victoria Pellot
CUN-YU WANG | Jing Wang | Kennedey Boyette
CUN-YU WANG | Jing Wang | Eduardo Del Rio
JENNIFER WILSON | - | Emily Liu
JENNIFER WILSON | - | Jarod Le
DAVID WONG | Karolina Kaczor-Urbanowicz | Rachelle Chanthavong
DAVID WONG | Misagh Kordi | Timothy Lindsey
DAVID WONG | Misagh Kordi | Manuel Cortes
GRACE XIAO | Jonatan Hervoso | Rohan Chatterjee
GRACE XIAO | Jonatan Hervoso | Raymond Benitez
XIA YANG | Monty Blencowe | Nuoya Jiang
XIA YANG | Monty Blencowe | Imri Haggin
XIA YANG | Monty Blencowe | Darren Wijaya
XIA YANG | Monty Blencowe | Ning Wang
JASMINE ZHOU | Ran Hu | Tony Zhang
JASMINE ZHOU | Ran Hu | Harinarayana Mellacheruvu

2022 B.I.G. Summer Poster Abstracts

Integrating Gene Expression in the Alignment of Spatial Transcriptomics Data

OSAGIE K. AIMIUWU1,2, Kalen Clifton3, Michael I. Miller3, Jean Fan3,4, Daniel Tward5

1BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2University of North Carolina, Chapel Hill

3Department of Biomedical Engineering, Johns Hopkins University

4Department of Computer Science, Johns Hopkins University

5Department of Computational Medicine, Department of Neurology, UCLA

Spatial transcriptomics (ST) can reveal novel information about cell types by providing gene expression profiles of cells in tissues, including the brain. To compare datasets and test biological hypotheses, pairs of brain slices must be aligned. However, it is unknown how gene expression information should be included in the alignment procedure. We acquired ST datasets consisting of three coronal slices with three replicates each. The datasets were rasterized, with each cell represented as a smooth Gaussian function whose height was proportional to its gene expression level. We applied dimensionality reduction and large deformation diffeomorphic metric mapping (LDDMM) to the rasterized images, calculating both affine and diffeomorphic transformations to generate aligned coordinates for the original cells. Alignment accuracy was measured as the error between sets of annotated landmarks on the images. We found that alignment that ignored gene expression generally had lower error, suggesting that simpler alignment calculations may suffice in future work.
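
The rasterization step described above can be sketched in Python with NumPy. In this step each cell becomes a Gaussian bump whose height scales with expression. This is a minimal single-gene illustration and not the authors' pipeline. The function names `rasterize` and `landmark_rmse`, the grid and the bandwidth `sigma` are hypothetical choices. The LDDMM alignment itself is not reproduced. Landmark error is summarized here as a root-mean-square distance, which is one common choice.

```python
import numpy as np

def rasterize(xy, weights, xs, ys, sigma):
    """Render cells as isotropic Gaussians on a regular grid (hypothetical helper).

    xy: (x, y) cell centroids; weights: expression level per cell, used as the
    Gaussian height; xs, ys: 1D grid coordinates; sigma: blur width.
    """
    X, Y = np.meshgrid(xs, ys)          # image rows follow ys, columns follow xs
    img = np.zeros(X.shape, dtype=float)
    for (cx, cy), w in zip(xy, weights):
        img += w * np.exp(-((X - cx) ** 2 + (Y - cy) ** 2) / (2 * sigma ** 2))
    return img

def landmark_rmse(fixed, moved):
    """Root-mean-square Euclidean distance between paired landmarks."""
    d = np.asarray(fixed, float) - np.asarray(moved, float)
    return float(np.sqrt(np.mean(np.sum(d ** 2, axis=1))))

xs = ys = np.arange(-2, 3)
img = rasterize([(0, 0)], [2.0], xs, ys, sigma=1.0)
print(img[2, 2])   # 2.0: the peak height equals the cell's weight
print(landmark_rmse([[0, 0], [1, 1]], [[3, 4], [1, 1]]))
```

With several genes, one image channel per gene (or per reduced dimension) would be stacked before alignment.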

Explaining potential epistasis in genomic data using symbolic representations of complex black box models

AAKARSH ANAND1, PRATEEK ANAND1, Boyang Fu2, Sriram Sankararaman2,3,4,5

  1. BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA
  2. Department of Computer Science, UCLA
  3. Department of Human Genetics, David Geffen School of Medicine, UCLA
  4. Department of Computational Medicine, David Geffen School of Medicine, UCLA
  5. Bioinformatics Interdepartmental Graduate Program, UCLA

Epistasis, the interaction among genetic variants, has long been hypothesized to play a major role in explaining missing heritability. Although recent studies have found many candidate variants showing epistasis signals in the UK Biobank, how to interpret these findings remains controversial. Nonlinear models have shown potential in capturing these signals, but additional explanation methods are needed to understand the relationships these models learn. Here, we use symbolic pursuit, a form of symbolic regression that provides a closed-form, interpretable model that generalizes first-order explanations. We further extend this approach by applying Taylor expansions to the model, balancing interpretability with performance while improving its generalizability. Across a variety of simulated datasets, the method performed reliably and was consistent with other methods. This work has strong implications for applying the method to large genomic datasets and for capturing nonlinear interactions without prior knowledge of the genetic architecture.
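
Consider a second-order Taylor expansion of a black-box model f around a point. The mixed partial derivatives d^2 f / dx_i dx_j are the coefficients of the pairwise x_i * x_j terms, so a nonzero value flags a potential interaction between features i and j. The Python sketch below estimates these coefficients with central finite differences. It illustrates the idea under stated assumptions: a hypothetical `interaction_matrix` helper, a toy model and a chosen step size `h`. It is not symbolic pursuit itself or the authors' implementation.

```python
import numpy as np

def interaction_matrix(f, x0, h=1e-3):
    """Central-difference estimate of the off-diagonal Hessian of f at x0.

    Entry (i, j) approximates d^2 f / dx_i dx_j, the coefficient of the
    pairwise term in a second-order Taylor expansion; the diagonal stays 0.
    """
    x0 = np.asarray(x0, dtype=float)
    d = x0.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            ei = np.zeros(d)
            ei[i] = h
            ej = np.zeros(d)
            ej[j] = h
            H[i, j] = H[j, i] = (
                f(x0 + ei + ej) - f(x0 + ei - ej)
                - f(x0 - ei + ej) + f(x0 - ei - ej)
            ) / (4 * h * h)
    return H

# Toy "black box": features 0 and 1 interact, feature 2 acts on its own.
def model(x):
    return 2.0 * x[0] * x[1] + x[2] ** 2

H = interaction_matrix(model, [0.5, 1.0, -1.0])
print(np.round(H, 6))
```

For a real model, the expansion point would matter: interactions in a nonlinear model can vary across the genotype space.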

Using the Multistate Epigenetic Pacemaker (MSEPM) to investigate the effects of age, weight, cell type, and sex on DNA methylation

SALVADOR AYALA1, ASH LOPEZ1, SABRINA C. PEREZ1, Matteo Pellegrini2, Christopher He2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Molecular, Cell and Developmental Biology, UCLA

Methylomes are dynamic and change throughout the life of an organism. Numerous physiological characteristics, including age and weight, can affect DNA methylation. The Multistate Epigenetic Pacemaker (MSEPM) implements a conditional expectation maximization approach that attempts to account for the influence of multiple factors on DNA methylation. This project uses MSEPM modeling to develop multivariate models of DNA methylation across three different datasets. Using Python, we constructed models for blood and buccal swab samples. We analyzed how multiple factors, such as age, sex, and cell type, along with weight and exercise, affect the methylome. Our results indicate that factors other than age, sex, and cell type also have measurable effects on DNA methylation.
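
The conditional expectation maximization idea can be illustrated with the single-state Epigenetic Pacemaker model. In that model, the methylation of site i in sample j is modeled as m0_i + r_i * s_j. The fit alternates between two steps. First, each site is regressed on the current states. Second, each sample's state is solved for given the site parameters. The NumPy sketch below is a simplified, hypothetical illustration of that alternation; the `epm_fit` name and the synthetic data are assumptions. It is not the MSEPM package API, and it omits the additional states that MSEPM models.

```python
import numpy as np

def epm_fit(M, s0, n_iter=25):
    """Alternating least squares for M[i, j] ~ m0[i] + r[i] * s[j].

    M: sites x samples methylation matrix; s0: initial states (e.g. ages).
    Returns site intercepts m0, site rates r, states s, and the SSE history.
    """
    s = np.asarray(s0, dtype=float).copy()
    sse = []
    for _ in range(n_iter):
        # Step 1: with states fixed, fit every site by ordinary least squares.
        X = np.column_stack([np.ones_like(s), s])       # samples x 2
        coef, *_ = np.linalg.lstsq(X, M.T, rcond=None)   # 2 x sites
        m0, r = coef
        # Step 2: with site parameters fixed, each state has a closed-form LS solution.
        s = (r @ (M - m0[:, None])) / (r @ r)
        resid = M - m0[:, None] - np.outer(r, s)
        sse.append(float(np.sum(resid ** 2)))
    return m0, r, s, sse

rng = np.random.default_rng(0)
true_s = rng.uniform(0, 80, 50)                          # synthetic "epigenetic states"
M = (rng.uniform(0.2, 0.8, 200)[:, None]
     + np.outer(rng.normal(0, 0.005, 200), true_s)
     + rng.normal(0, 0.01, (200, 50)))
ages = true_s + rng.normal(0, 10, 50)                    # noisy chronological ages as a start
m0, r, s, sse = epm_fit(M, ages)
print(np.corrcoef(s, true_s)[0, 1], sse[0], sse[-1])
```

Each step minimizes the squared error over one block of parameters, so the error history cannot increase from one iteration to the next.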

Characterization of allele-specific alternative splicing and expression events in schizophrenia

RAYMOND BENITEZ1, ROHAN CHATTERJEE1, Jonatan Hervoso1,2, Xinshu (Grace) Xiao1,2,3

  1. BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA
  2. Bioinformatics Interdepartmental Program, University of California, Los Angeles, California 90095, USA;
  3. Department of Integrative Biology and Physiology, University of California, Los Angeles, California 9009

RNA splicing is a highly regulated RNA processing mechanism in which introns are removed and exons are ligated to produce mature mRNA. RNA splicing is also a primary link between genetic variation and complex traits. Furthermore, splicing is highly tissue-specific, with frequent and complex alternative splicing patterns in brain tissues. Many studies have shown that phenotypic changes in the prefrontal cortex are associated with schizophrenia. However, the genetic basis of dysregulated splicing in schizophrenia is not fully understood. In this study, we examined RNA-seq data from the dorsolateral prefrontal cortex (DLPFC) of schizophrenia and control samples. We applied the Allele Specific Alternative mRNA processing (ASARP) method to identify genetically influenced allele-specific alternative splicing (ASAS) events and allele-specific expression (ASE) events that contribute to schizophrenia. Our approach identified several genes with allele-specific expression and splicing patterns caused by single nucleotide polymorphisms. In addition, genes showed statistically significant proportional differences between control and schizophrenia samples. While most genes with ASAS events were common to both cohorts, a smaller subset was exclusive to each cohort, suggesting the existence of distinct splicing patterns in schizophrenia samples. A large subset of the genes with ASE was exclusive to the schizophrenia samples, suggesting that schizophrenia samples can be characterized by more distinct ASE events than control samples. Our results provide an improved understanding of the genetically driven ASAS and ASE contributing to schizophrenia.
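
Allele-specific expression at a heterozygous SNP is commonly assessed by asking whether reference and alternative read counts depart from the 50:50 ratio expected without allelic imbalance. The pure-Python sketch below computes an exact two-sided binomial p-value for that question. The kind of test shown is an illustrative assumption, and the `binom_two_sided` name and the read counts are hypothetical. It does not reproduce the ASARP pipeline, which also handles splicing events and multiple-testing control.

```python
from math import comb

def binom_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: P(outcomes at least as unlikely as k).

    k: reads supporting the reference allele; n: total reads at the SNP.
    """
    pmf = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so tied probabilities are not lost to rounding.
    tail = sum(x for x in pmf if x <= observed * (1 + 1e-7))
    return min(1.0, tail)

# Hypothetical SNP: 28 reference reads out of 40 total.
print(binom_two_sided(28, 40))
print(binom_two_sided(20, 40))   # balanced expression gives p close to 1
```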

Single Cell Analysis of tumors expressing CD45 in patients with HNSCC

KENNEDEY BOYETTE1, EDUARDO DEL RIO1, Dr. Jing Wang3, Dr. Cun Yu Wang1,2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Oral and Systemic Health Sciences, School of Dentistry, UCLA

3 Department of Human Genetics, David Geffen School of Medicine, UCLA

Head and Neck Squamous Cell Carcinoma (HNSCC) is an aggressive form of cancer that is typically diagnosed at late stages, which increases the risk of metastasis and, in turn, limits treatment options for patients with HNSCC. In this study, we analyzed 33 patient tumor samples, both CD34-positive and CD34-negative; the primary objective was to determine the cell-type composition of these tumors. The secondary objective was to use gene expression to determine differences in tumor composition between male and female patients. Using single-cell RNA sequencing (scRNA-seq) data, we grouped cells into clusters to determine whether most of the tumors shared the same composition. This makes it possible to better understand the biochemical pathways that enhance tumor growth in the tumor microenvironment (TME).
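
Once cells carry cluster labels, comparing tumor composition reduces to tabulating the fraction of each sample's cells that fall in each cluster. The short Python sketch below shows that bookkeeping. It assumes the clustering itself has already been done, typically with a single-cell toolkit. The `cluster_composition` name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict

def cluster_composition(sample_ids, cluster_labels):
    """Fraction of each sample's cells assigned to each cluster.

    sample_ids and cluster_labels are parallel sequences, one entry per cell.
    """
    counts = defaultdict(Counter)
    for sample, cluster in zip(sample_ids, cluster_labels):
        counts[sample][cluster] += 1
    return {
        sample: {cl: n / sum(c.values()) for cl, n in sorted(c.items())}
        for sample, c in counts.items()
    }

# Toy example with two tumors and three clusters.
samples = ["T1", "T1", "T1", "T1", "T2", "T2"]
clusters = ["tumor", "tumor", "T cell", "fibroblast", "tumor", "T cell"]
print(cluster_composition(samples, clusters))
```

Grouping the resulting proportions by patient sex would then support the secondary comparison.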

Identification of cell-types and biological pathways associated with autism spectrum disorder using single-cell genomic analysis

MAYA BRAWER-COHEN1, Katherine Eyring3, Cuining Liu2, Kevin Abuhanna2, Terence Yi2, Yi Zhang2, Brie Wamsley3, Daniel H. Geschwind2,3, Chongyuan Luo2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Human Genetics, David Geffen School of Medicine, UCLA

3 Department of Neurology, David Geffen School of Medicine, UCLA

Autism Spectrum Disorder (ASD) has been associated with gene misregulation in studies using bulk transcriptome profiling. Single-cell sequencing technology enables the identification of associated cell types and differentially expressed genes. This project aims to identify relevant cell types in ASD versus control subjects, as well as differentially expressed and differentially methylated genes at the single-cell level, through multi-omic analysis (snmCT-seq). Although the effect size of individual genes was small, pathway over-enrichment analysis indicates the involvement of excitatory (IT-L2 – IT-L6), inhibitory (MGE-Pvalb, MGE-SST, CGE-Vip, Sncg), and non-neuronal (ASC, ODC/OPC) cell types in several disorder-relevant pathways. These include the reduction of cytosolic Ca2+ levels (in the RNA data) and cation-coupled chloride cotransporters (in the methylation data). These pathway differences could have behavioral implications given the role of calcium in neurotransmitter release and membrane excitability, and may contribute to the cognitive differences observed in ASD patients.
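
Over-enrichment (over-representation) analysis usually asks whether a pathway's genes appear among the differentially expressed or differentially methylated genes more often than chance predicts. The standard tool for this is a one-sided hypergeometric test. The Python sketch below computes that p-value exactly with integer arithmetic. The function name and numbers are illustrative assumptions, and this is not necessarily the specific enrichment tool used in this project.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background universe; K: genes in the pathway;
    n: genes in the hit list (e.g. differentially expressed); k: overlap.
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)

# Hypothetical numbers: 20,000 background genes, a 50-gene pathway,
# 400 differentially expressed genes, 8 of which fall in the pathway.
print(hypergeom_enrichment_p(20000, 50, 400, 8))
```

When many pathways are tested, these p-values would normally be corrected for multiple testing.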

Differential loop analysis of phenotypically and genetically juxtaposed mouse strains using visualization tools

ELEANOR S. CARR1, Zeyuan Chen2, Jonathan Flint3, Jeffrey Chiang4

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Dept of Computer Science, Samueli School of Engineering, UCLA

3 Dept of Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, UCLA

4 Dept of Computational Medicine, David Geffen School of Medicine, UCLA

Previous gene knockout experiments in the mouse strains C57BL/6 (black6) and DBA/2J (DBA) have suggested that the genes Lsamp, Ptprd, and Nptx2, which encode neuronal proteins, are key determinants of behavioral differences in fear, anxiety, and pain responses. To further explore the mechanisms underlying these genes' effects, tissue samples were sequenced and analyzed for conformational differences in chromatin packaging, captured in Hi-C data. While many computational tools exist for processing these data, the outputs of these complex differential analyses can be difficult to interpret. With this project, we aim to draw conclusions about long-range gene regulation by visualizing chromatin packaging via looping. Using the WashU Epigenome Browser from Washington University in St. Louis, we demonstrate how the flexibility and modularity of browser-based linear visualizations allow us to identify long-range interactions in an intuitive and accessible manner.
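
A simple first pass at differential looping, before visual inspection in a genome browser, is to flag loops called in one strain that have no counterpart in the other within some anchor tolerance. The Python sketch below does this for loops represented as (chromosome, anchor1, anchor2) tuples. The representation, the `loops_unique_to` name and the 5 kb tolerance are hypothetical choices, not the project's pipeline.

```python
def loops_unique_to(loops_a, loops_b, tol=5000):
    """Loops in loops_a with no loop in loops_b whose anchors both lie within tol bp.

    Each loop is a (chrom, anchor1_pos, anchor2_pos) tuple.
    """
    def same_loop(x, y):
        return x[0] == y[0] and abs(x[1] - y[1]) <= tol and abs(x[2] - y[2]) <= tol

    return [x for x in loops_a if not any(same_loop(x, y) for y in loops_b)]

black6 = [("chr1", 1_000_000, 1_450_000), ("chr2", 3_200_000, 3_900_000)]
dba = [("chr1", 1_002_000, 1_448_000)]
print(loops_unique_to(black6, dba))   # the chr2 loop has no DBA counterpart
```

The flagged loops could then be loaded into the browser as candidate regions for closer inspection.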

Identifying case versus control differences in polygenic risk scores for endo-phenotypes associated with congenital heart disease

DESIREE CERVANTES1, Sarah Spendlove2,3, Maria Palafox3, Valerie Arboleda3,4,5

1 Bruins in Genomics (B.I.G.) Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Bioinformatics Interdepartmental Program, UCLA

3 Department of Human Genetics, David Geffen School of Medicine, UCLA

4 Pathology and Laboratory Medicine, University of California Los Angeles

5 Department of Computational Medicine, University of California Los Angeles

This investigation explores the distribution of polygenic risk scores (PRS) for congenital heart disease (CHD) across six different endo-phenotypes. Previous work by Spendlove et al. found that the heart valve PRS computed from all SNVs had the most significant p-value of all phenotypes studied. To expand on this study, we examined the deciles of each endo-phenotype PRS to characterize finer-resolution differences between probands and pseudo-controls. Comparing the first and tenth deciles of the PRS distribution through their odds ratio yielded the most significant findings. However, the direction of the effect was opposite to what was expected: individuals with the highest PRS had the lowest risk of CHD. Because PRS considers only common variants, we suspect that rare variants in these individuals' genomes contribute to their risk of CHD. The previous paper showed that the heart valve PRS is significantly associated with the phenotype and that the tenth decile is significantly different. Despite this, a Wilcoxon test comparing probands and pseudo-controls across all deciles found no significant difference between the two groups. However, when the deciles were plotted as individual box plots, probands in the tenth decile were much more abundant in PRS counts than their pseudo-controls. This supports the possibility that rare variants contribute to the expression of CHD. The PRS computed from all heart sounds severity SNVs showed an even larger difference between probands and pseudo-controls than the heart valve PRS. Singletons in the top and bottom deciles for all heart valve SNVs and all heart sounds severity SNVs were matched to individuals in each decile to examine their potential contribution as rare variants. Finally, rare variant burden analysis was performed on the tenth PRS decile.
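
The decile comparison can be made concrete in three steps. Rank individuals by PRS, split them into ten equal-sized groups, and compare the odds of being a proband in the top decile against the bottom decile. The pure-Python sketch below shows that calculation on toy data. The decile assignment rule, the `decile_odds_ratio` name and the labels are assumptions for illustration. They are not the study's code or results.

```python
def assign_deciles(scores):
    """Decile 1..10 for each score, by rank (ties broken by input order)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles

def decile_odds_ratio(scores, is_proband, top=10, bottom=1):
    """Odds ratio of being a proband in the top vs. bottom PRS decile."""
    deciles = assign_deciles(scores)

    def counts(d):
        cases = sum(1 for dec, y in zip(deciles, is_proband) if dec == d and y)
        controls = sum(1 for dec, y in zip(deciles, is_proband) if dec == d and not y)
        return cases, controls

    a, b = counts(top)      # top decile: probands, pseudo-controls
    c, d = counts(bottom)   # bottom decile: probands, pseudo-controls
    return (a * d) / (b * c)

# Toy data: 40 individuals, so each decile holds 4.
prs = list(range(40))
labels = [1, 0, 0, 0] + [0] * 32 + [1, 1, 1, 0]
print(decile_odds_ratio(prs, labels))   # (3 * 3) / (1 * 1) = 9.0
```

A zero count in any cell makes this ratio undefined. A continuity correction, or an exact test, is a common way to handle sparse deciles.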

Exploring the characteristics of compromised (“loser”) pancreatic cells as potential type-II diabetes biomarkers

RACHELLE CHANTHAVONG3, Karolina Kaczor Urbanowicz1, Slavica Tudzarova-Trajkovska2, Matteo Pellegrini4

1 School of Dentistry, UCLA

2 School of Medicine, UCLA

3 Samueli School of Engineering, UCLA

4 B.I.G. Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

There is emerging evidence that injured cells in some beta-cell populations are not properly eliminated and may contribute to the development of type II diabetes. Our research aims to find potential biomarkers in these beta-cell populations that identify these injured, or “loser”, cells. Our approach will be to develop scripts in RStudio that select all insulin-positive cells from non-diabetic and type II diabetic individuals, separate the beta-cell subpopulations that express certain genes, and assign each a module score. Using these scores, we will then determine the differentially expressed genes between the highest-scoring and lowest-scoring beta-cell subpopulations. The results will then be used to develop a gene signature that can identify biomarkers for “loser” cells. This would help clarify the mechanism underlying the development of type II diabetes and could serve as a potential diagnostic measure for the disease.
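
In the simplified form sketched below, a module score is the average expression of a gene signature in each cell minus the average expression of a set of randomly drawn control genes. Subtracting the control average keeps cells with globally high expression from automatically ranking highest. This NumPy version rests on a simplifying assumption. Seurat's AddModuleScore in R, which an RStudio workflow may use, additionally draws control genes from expression-matched bins. The `module_score` name and the toy matrix are hypothetical.

```python
import numpy as np

def module_score(expr, signature, n_ctrl=100, seed=0):
    """Per-cell score: mean signature expression minus mean of random control genes.

    expr: genes x cells matrix (e.g. log-normalized); signature: row indices.
    Control genes are sampled uniformly from non-signature genes (simplification).
    """
    rng = np.random.default_rng(seed)
    others = np.setdiff1d(np.arange(expr.shape[0]), signature)
    ctrl = rng.choice(others, size=min(n_ctrl, others.size), replace=False)
    return expr[signature].mean(axis=0) - expr[ctrl].mean(axis=0)

# Toy matrix: 10 genes x 3 cells; cell 0 strongly expresses signature genes 0 and 1.
expr = np.ones((10, 3))
expr[[0, 1], 0] = 5.0
scores = module_score(expr, signature=[0, 1], n_ctrl=5)
print(scores)
```

Cells could then be split by score into high- and low-scoring groups for differential expression testing.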

Metabolome-Wide Association Study of Cerebral Spinal Fluid

ETHAN CONCEPCION1, ZACHARY JORDAN1, Marcelo Francia2, Toni Boltz3, Roel Ophoff3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Neuroscience, David Geffen School of Medicine, UCLA

3 Department of Genetics, David Geffen School of Medicine, UCLA

The genetic etiology of neuropsychiatric disorders is extremely complex. Genome-wide association studies (GWAS) have identified genomic loci that could be implicated in these disorders, but the biological mechanisms by which these loci act remain to be elucidated. To study the impact of these genetic variants, we performed a metabolome-wide association study (MWAS) using metabolites from cerebrospinal fluid (CSF) samples collected from 460 subjects in conjunction with imputed genotype data. CSF is a direct reservoir of brain metabolic products; therefore, it can reflect physiological changes associated with neuropsychiatric disorders. Using FUSION, we generated predictive weights for 780 CSF metabolites, which were used to test metabolite-phenotype associations in 9 brain-related disorders. At a false discovery rate (FDR) of 0.05, we found 16 metabolite-phenotype associations. This investigation revealed a number of metabolites significantly associated with schizophrenia, migraine, ADHD, bipolar disorder, and MDD, as well as overlap in metabolites across these disorders.
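
Two statistical steps in this workflow can be sketched briefly. In FUSION-style analyses, a trait's association with the predicted level of a molecular feature is summarized as z = w^T z_GWAS / sqrt(w^T S w). Here w are the predictive SNP weights, z_GWAS are the GWAS z-scores at those SNPs, and S is their LD (correlation) matrix. Significance across many tests is then controlled with the Benjamini-Hochberg procedure. The NumPy sketch below illustrates both steps on toy inputs. The function names and numbers are hypothetical, and this is not the FUSION software itself.

```python
import math
import numpy as np

def twas_z(weights, gwas_z, ld):
    """Weighted association z-score: w^T z / sqrt(w^T LD w)."""
    w = np.asarray(weights, float)
    return float(w @ np.asarray(gwas_z, float) / math.sqrt(w @ np.asarray(ld, float) @ w))

def two_sided_p(z):
    """Two-sided normal p-value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected at false discovery rate alpha."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()      # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject

# Toy example: one metabolite model with three SNPs in mild LD.
w = [0.4, 0.3, 0.1]
z = [3.0, 2.5, 0.5]
ld = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.1], [0.0, 0.1, 1.0]]
zt = twas_z(w, z, ld)
print(zt, two_sided_p(zt))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```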

Gastric Cancer Biomarker Discovery Using a Machine Learning Approach

MANUEL CORTES1, TIMOTHY LINDSEY1, Misagh Kordi2, David T. Wong2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 UCLA School of Dentistry, Center for Oral/Head & Neck Oncology Research, UCLA

The advent of accurate and cost-effective sequencing has led to the growth of analysis techniques such as liquid biopsy for disease diagnosis. It is well documented that certain genetic markers in liquid biopsies differ markedly between clinical groups, in settings such as organ transplantation, prenatal testing, and, more recently, oncology. Within oncological research, advances have been made in understanding how the distribution of tumor-derived cell-free DNA (cfDNA) can serve as a critical biomarker for cancer detection. Through the study of cfDNA fragmentomics, we seek to extend our understanding of the DNA Integrity Index (DII) and soft-clipped read mapping as biomarkers for cancer diagnostics. Our goal is to identify differences between healthy patients and cancer patients. In particular, we focus on the statistical description and analysis of soft clips in ultrashort DNA (nt < 150) to elucidate biologically relevant features that are rarely explored in the literature. We then applied gradient boosting and random forest models to determine cancer presence based on the differential alignment of long (nt > 63) vs. short reads (nt < 63). We trained our models on a synthetic sample set with an N-dimensional feature space derived from existing liquid biopsy data. We then tested them on a liquid biopsy cancer/healthy sample set (DII). ML analysis of the flattened DII sample space validated that the features hold information relevant for classification. Our models yield 68.75 ± 6.25% accuracy, with sensitivity = 70% and specificity = 83.33%, in classifying healthy and cancer samples. In addition to the ML analysis, we analyzed soft clips from our cfDNA samples using traditional distance measures (e.g., Jaccard and Levenshtein), along with alignment analysis. Our results show that soft-clipped regions hold pertinent information that warrants further investigation.
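
The two traditional distance measures used for comparing soft-clipped sequences are simple to state. Levenshtein distance counts the minimum number of single-base insertions, deletions and substitutions that turn one sequence into another. Jaccard similarity compares two sets; here, as an assumption, the sets are the k-mers of each sequence. The pure-Python sketch below is illustrative only. The function names, the choice of k and the example sequences are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def kmer_jaccard(a, b, k=3):
    """Jaccard similarity of the k-mer sets of two sequences."""
    sa = {a[i:i + k] for i in range(len(a) - k + 1)}
    sb = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0

clip1, clip2 = "ACGTTGCA", "ACGTAGCA"   # hypothetical soft-clipped segments
print(levenshtein(clip1, clip2), kmer_jaccard(clip1, clip2))
```

A single substitution changes only one base but disrupts every k-mer that overlaps it. The two measures therefore weight local differences quite differently.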
Further work on expanded feature extraction is under way, aiming to improve accuracy and provide a robust, non-invasive tool for future cancer diagnosis in the clinic.

Tractography Algorithms Applied to Diffusion Tensor Images to Establish White Matter Tracts Between Brain Regions Using a Micron Resolution Dataset

TIFFANIE CRUMBIE1,8, Bryson Gray3, Ricardo Coronadoleija7, Jiangyang Zhang7, Partha Mitra6, David Nauen5, Daniel Tward2,3,4

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Neurology, David Geffen School of Medicine, UCLA

3 Ahmanson-Lovelace Brain Mapping Center, UCLA

4 Department of Computational Medicine, UCLA

5 Johns Hopkins School of Medicine

6 Cold Spring Harbor Laboratory

7 NYU Grossman School of Medicine

8 University of Central Florida

Neuroscientists have developed methods for identifying irregularities in neural pathways, but further improvements would allow a better understanding of disorders. Currently, most white matter tracts can be identified reproducibly, but tracts through gray matter and some complex regions cannot, owing to a lack of high-resolution data. Diffusion Tensor Imaging (DTI) tractography is an MRI technique that identifies white matter fibers connecting brain regions based on the direction of water diffusion. By studying a high-resolution dataset consisting of submillimeter-resolution DTI and micron-resolution neuron-stained microscopy images, we can identify these pathways. Using two graph-based algorithms (minimum spanning trees (MSTs) and Dijkstra's algorithm), we developed computational tools for estimating pathways between brain regions. We test the hypothesis that MSTs compute fastest but are less accurate when compared with microscopy data, and that other algorithms are slower but more accurate. Our platform serves as a starting point for developing and validating tractography algorithms in the future.
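
Dijkstra's algorithm finds the lowest-cost path between two nodes of a weighted graph; for tractography, the key design decision is the edge cost. The Python sketch below uses a 2D voxel grid and a hypothetical cost. The cost is small when a step follows the local principal diffusion direction and large when it cuts across it. The cost function, the 4-connected grid and all names are assumptions for illustration, not the project's tools. Real DTI data would be 3D, with a tensor per voxel.

```python
import heapq
import math

def step_cost(directions, u, v, eps=1e-3):
    """Cost of moving from voxel u to neighbor v: low when aligned with u's fiber direction."""
    step = (v[0] - u[0], v[1] - u[1])
    d = directions[u]
    length = math.hypot(*step)
    norm = math.hypot(*d)
    # abs(): a fiber direction and its negation describe the same fiber.
    cos = abs(step[0] * d[0] + step[1] * d[1]) / (length * norm) if norm else 0.0
    return length * (1.0 - cos + eps)

def shortest_tract(directions, shape, start, goal):
    """Dijkstra's algorithm on a 4-connected grid; returns (path, cost)."""
    rows, cols = shape
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == goal:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (u[0] + dr, u[1] + dc)
            if 0 <= v[0] < rows and 0 <= v[1] < cols and v not in done:
                nd = d + step_cost(directions, u, v)
                if nd < dist.get(v, math.inf):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]

# Toy 3x3 grid whose fibers all run along the column axis.
dirs = {(r, c): (0.0, 1.0) for r in range(3) for c in range(3)}
path, cost = shortest_tract(dirs, (3, 3), (1, 0), (1, 2))
print(path, cost)
```

Comparing paths from this kind of search against microscopy-traced fibers is one way the speed-versus-accuracy hypothesis could be evaluated.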

The effect of evolutionary history and linkage disequilibrium on the genetic architecture of complex traits

SAIGE DAINES1, Xinjun Zhang2, Kirk Lohmueller2,3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Ecology and Evolutionary Biology, UCLA

3 Department of Human Genetics, David Geffen School of Medicine, UCLA

Genome-wide association studies (GWAS) have identified numerous common variants associated with complex traits. However, rare variants may have a large influence on complex traits, particularly if natural selection keeps mutations of large effect at low frequency in the population. The ability of GWAS to discover associations with rare variants is unclear. For example, in regions of the genome with low recombination rates, higher levels of linkage disequilibrium (LD) may increase the power of GWAS to detect rare causal mutations. It is not well understood how LD in populations with different demographic histories affects the power of GWAS. Here we investigated the effects of different recombination rates, combined with models of European and African population history, in an evolutionary model that includes negative natural selection. Effect sizes of deleterious mutations and individual phenotypes were simulated under a model in which a mutation's effect on the trait was proportional to its effect on fitness. We performed GWAS on simulations with varying recombination rates. Allele frequencies and estimated effect sizes of causal and non-causal GWAS 'hits' differed between regions of low and high recombination rate. Further, we show that the demographic history of a population plays an important role in the power of GWAS and the fine-mapping of genetic variants. Our study provides insight into the influence of LD and evolutionary history on the genetic architecture of complex traits, and into how to interpret GWAS results when such factors are accounted for.
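
A toy simulation shows how LD produces non-causal GWAS hits. A SNP in strong LD with a causal variant picks up much of its association signal in a marginal (one-SNP-at-a-time) regression, while an unlinked SNP does not. The NumPy sketch below is a deliberately simple illustration with hypothetical parameters. It does not model demography, recombination maps or selection as the study's simulations do.

```python
import numpy as np

def marginal_z(g, y):
    """z-statistic for the slope of y regressed on a single genotype vector g."""
    gc = g - g.mean()
    yc = y - y.mean()
    beta = gc @ yc / (gc @ gc)
    resid = yc - beta * gc
    se = np.sqrt(resid @ resid / (len(y) - 2) / (gc @ gc))
    return beta / se

rng = np.random.default_rng(1)
n = 2000
causal = rng.binomial(2, 0.3, n).astype(float)
# Linked SNP: copies the causal genotype in ~90% of individuals (high LD).
linked = np.where(rng.random(n) < 0.9, causal, rng.binomial(2, 0.3, n)).astype(float)
unlinked = rng.binomial(2, 0.3, n).astype(float)
y = 0.5 * causal + rng.normal(size=n)

r2 = np.corrcoef(causal, linked)[0, 1] ** 2
print(f"LD r^2 = {r2:.2f}")
for name, g in [("causal", causal), ("linked", linked), ("unlinked", unlinked)]:
    print(name, round(float(marginal_z(g, y)), 2))
```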

Automated segmentation and classification pipeline for lung cancer detection and diagnosis

GARRETT DUNCAN1, Anil Yadav2, Ruiwen Ding2, William Hsu2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Medical & Imaging Informatics, Department of Radiological Sciences, DGSOM, UCLA

Detecting and annotating lung nodules in computed tomography (CT) scans can be a time-consuming, tedious, and subjective task. In addition, classifying nodules as benign or malignant can be difficult when nodules are 5-30 mm in diameter. We hypothesize that providing a lung nodule malignancy prediction model with a nodule mask from segmentation will improve its classification performance compared with providing the entire CT scan. For segmentation, we used the MedicalNet model as a basis and fine-tuned it using 839 CT scans from 300 patients in the Lung Image Database Consortium (LIDC) dataset. For classification, we trained a ResNet-34 model on 693 volumes from 251 patients in the LIDC dataset. For the reference standard, we considered nodules given a suspicion level of 3.75 or higher by radiologists as malignant. Our initial exploratory analysis of the dataset found positive correlations between the risk of malignancy and nodule diameter, calcification, lobulation, and spiculation. Using a held-out test set of 435 scans, we will report the intersection over union (IOU) and Dice coefficient of the fine-tuned MedicalNet model. Using a test set of 244 scans with nodules 5-30 mm in diameter, we will report the area under the curve, along with sensitivity and specificity at the tested thresholds. We aim to demonstrate the utility of performing nodule segmentation prior to classification and to highlight the strengths and limitations of using deep learning-based methods for lung cancer diagnosis.
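
The two segmentation metrics named above compare a predicted binary mask A with a reference mask B. IoU = |A ∩ B| / |A ∪ B| and Dice = 2|A ∩ B| / (|A| + |B|). A minimal NumPy sketch follows. The function names and toy masks are illustrative, and scoring two empty masks as a perfect match is an assumed convention.

```python
import numpy as np

def iou(pred, ref):
    """Intersection over union of two binary masks of any shape."""
    pred, ref = np.asarray(pred, bool), np.asarray(ref, bool)
    union = np.logical_or(pred, ref).sum()
    return np.logical_and(pred, ref).sum() / union if union else 1.0

def dice(pred, ref):
    """Dice coefficient: twice the overlap divided by the total mask size."""
    pred, ref = np.asarray(pred, bool), np.asarray(ref, bool)
    total = pred.sum() + ref.sum()
    return 2 * np.logical_and(pred, ref).sum() / total if total else 1.0

# Toy 1D "masks"; the same functions work unchanged on 3D CT volumes.
pred = [1, 1, 0, 0]
ref = [1, 0, 1, 0]
print(iou(pred, ref), dice(pred, ref))   # 1/3 and 0.5
```

The two metrics are monotonically related (Dice = 2·IoU / (1 + IoU)). They therefore rank models identically, though their numeric values differ.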

Using PCA to Explore the Data of Candidates

JALISSA EMMENS, CARLY GORDON

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Human Genetics, David Geffen School of Medicine, UCLA

Our project in the Meyer lab uses principal component analysis (PCA). PCA reduces the dimensionality of large datasets by transforming a large set of variables into a smaller one that still retains most of the information. We use PCA because our dataset contains hundreds of measurements that need to be recorded and compared; without PCA, the sheer amount of data would make it too complicated to explore and interpret. The sample groups in our data are healthy, bacterial, viral, SIRS, and candidemia. Using PCA, we created three scatter plots with component 1 on the x-axis and component 2 on the y-axis. A fourth plot shows the variance explained versus the number of components. These plots help us visualize the larger set of variables and better understand the data as a whole. From the plots, we found that bacterial, healthy, and candidemia candidates tend to have a lower component 1 and a higher component 2. Healthy candidates tend to have a higher component 1 and component 2, and SIRS candidates also tend to have a higher component 1 and component 2. We have not yet determined what these patterns mean, but we did find that the cumulative variance explained increases as the number of components increases.
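
The sketch below shows the computation behind the scatter plots and the variance-explained curve. It centers the data, takes a singular value decomposition, projects onto the first components, and accumulates each component's share of the variance. It uses NumPy on random stand-in data. The `pca` name and the data shape are hypothetical. In practice, a library routine such as scikit-learn's PCA reports the same quantities.

```python
import numpy as np

def pca(X, k=2):
    """Return the first k component scores and the cumulative variance explained."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * S[:k]                    # coordinates on components 1..k
    explained = S ** 2 / np.sum(S ** 2)          # fraction of variance per component
    return scores, np.cumsum(explained)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))                     # stand-in: 60 samples x 8 measurements
scores, cumulative = pca(X, k=2)
print(scores.shape)                              # (60, 2): x = component 1, y = component 2
print(np.round(cumulative, 3))
```

Each component adds a non-negative share of the variance, so the cumulative curve can only rise as components are added. That rise is expected by construction; the informative part is how quickly the curve approaches 1.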

Exploring patterns in Hi-C datasets from various mouse strains using standard and contrastive principal component analysis

ASIYA FALAK1, Zeyuan Chen2, Jeffrey N. Chiang3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Computer Science, UCLA

3 Department of Computational Medicine, UCLA

Principal Component Analysis (PCA) is often used to reveal low-dimensional structure within a single dataset, such as high-throughput chromosome conformation capture (Hi-C) data. An extension termed contrastive PCA (cPCA) has been shown to capture differential trends often missed by standard PCA. It does this by maximizing the variance preserved in target data while minimizing the variance in background data. In our project, we sought to detect differential conformation regions between two mouse strains, BLK6 and DBA, using both PCA and cPCA. Using PCA, we tested the autosomes of both mouse strains and found several differential conformation regions. Some of these involved sign flips, and others involved changes within the same compartment. When conducting unsupervised cPCA on both strains, we could not see meaningful patterns in the foreground data relative to the background data. This suggests that the datasets need to be modified before differential trends can be detected.
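
Contrastive PCA, as usually formulated, finds directions that maximize target variance minus alpha times background variance. These are the top eigenvectors of C_target - alpha * C_background for a contrast parameter alpha >= 0; alpha = 0 recovers standard PCA. The NumPy sketch below contrasts the two methods on synthetic data. In this data, the target has extra variance along an axis that is quiet in the background. The data, alpha = 1 and the function names are assumptions for illustration. Real Hi-C inputs would first need to be converted to comparable feature matrices.

```python
import numpy as np

def top_direction(C):
    """Eigenvector of symmetric matrix C with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(C)               # eigenvalues in ascending order
    return vecs[:, -1]

def cpca_direction(target, background, alpha=1.0):
    """Top contrastive direction: leading eigenvector of C_t - alpha * C_b."""
    C = np.cov(target, rowvar=False) - alpha * np.cov(background, rowvar=False)
    return top_direction(C)

rng = np.random.default_rng(0)
n = 5000
background = rng.normal(size=(n, 3)) * [5.0, 1.0, 1.0]  # dominant axis 0
target = rng.normal(size=(n, 3)) * [5.0, 3.0, 1.0]      # same, plus extra variance on axis 1

print(np.round(top_direction(np.cov(target, rowvar=False)), 2))  # standard PCA: ~axis 0
print(np.round(cpca_direction(target, background), 2))           # cPCA: ~axis 1
```

The outcome depends on alpha, and in practice several values are usually compared. A poor choice, or a background that shares little structure with the target, can leave cPCA without an interpretable signal.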

BSReadSim: a profile-based Whole Genome Bisulfite Sequencing Read Simulator

JUNXI FENG1, HONGXIANG FU1, Wenbin Guo2, Matteo Pellegrini3

1 Department of Statistics, University of California Los Angeles

2 Bioinformatics Interdepartmental PhD Program, University of California Los Angeles

3 Department of Molecular Cell and Developmental Biology, University of California Los Angeles

DNA methylation plays an important role in gene expression regulation, cell development, and disease progression. Bisulfite sequencing is currently the standard method for quantifying DNA methylation. Benchmarking computational methods designed for analyzing bisulfite sequencing data has become a pressing need for the research community. Because whole genome bisulfite sequencing is still expensive, in silico simulation remains the primary approach for benchmarking. Existing simulators such as WGBSSuite and pWGBSSimla produce only read-count-level simulated data. Although BSBolt and Sherman can generate simulated reads, they do not support user-specified, profile-based simulation. We introduce BSReadSim, an efficient and flexible tool that enables profile-based simulation of bisulfite sequencing reads. Specifically, BSReadSim allows users to incorporate reference genetic variants and methylation profiles into read simulation. As a result, BSReadSim can generate much more realistic bisulfite reads. Furthermore, our tool can be used to benchmark various tools, such as bisulfite sequencing aligners, SNP callers, and meQTL analyses.
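
The core transformation that a bisulfite read simulator applies can be sketched in a few lines. On the simulated strand, a cytosine is kept as C if it is methylated and read as T otherwise. Each CpG's methylation is drawn from a user-supplied profile. The pure-Python sketch below handles only a single forward-strand read and assumes that non-CpG cytosines are always unmethylated. It is a hypothetical illustration, not BSReadSim's implementation, which also incorporates genetic variants and sequencing details.

```python
import random

def bisulfite_convert(seq, meth_prob, rng):
    """Simulate bisulfite conversion of one forward-strand read.

    meth_prob maps a 0-based position of a CpG cytosine to its methylation
    probability; unmethylated cytosines (including all non-CpG C) become T.
    """
    out = list(seq)
    for i, base in enumerate(seq):
        if base != "C":
            continue
        is_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
        p_meth = meth_prob.get(i, 0.0) if is_cpg else 0.0
        if rng.random() >= p_meth:      # unmethylated: C converts to T
            out[i] = "T"
    return "".join(out)

rng = random.Random(42)
read = "ACGTCAGCGA"
profile = {1: 1.0, 7: 0.5}   # position 1 always methylated, position 7 half the time
print(bisulfite_convert(read, profile, rng))
```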

Modeling the dynamics of epileptic firing behaviors in neurons with dysregulated extracellular potassium

FIONA LATIFI1, DANIEL FELIPE FORERO1, Sharmila Venugopal2

  1. BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA
  2. Department of Integrative Biology and Physiology, UCLA

Dysregulated extracellular potassium (K+) buffering can cause neural hyperexcitability and the high-frequency firing associated with seizures. To test this, we used electrophysiological data from neurons of patients with pharmacoresistant medial temporal lobe epilepsy (data from the Allen Institute). Neurons were classified by their computational properties as Type 1 or Type 2. We then modified two well-established neuron models (Connor-Stevens for Type 1 and Hodgkin-Huxley for Type 2) to match the empirical input-output gains evident from the frequency versus injected current (F-I) curves. To mimic dysregulated K+ buffering, we increased the K+ Nernst potential in the models and observed an increased firing rate and a decreased F-I curve slope. These results shed light on how extracellular potassium dysregulation is implicated in epilepsy. Future work incorporating astrocyte models will enable examination of astrocyte-mediated control of neural excitability through spatial K+ buffering and uptake via inward-rectifying K+ (Kir) and AQP4 channels, gap junctions, and Na+/K+-ATPase pumps.
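
The K+ Nernst potential raised in the models is set by the ratio of extracellular to intracellular K+. The sketch below computes E_K with the Nernst equation, E = (RT/zF) ln([K+]o/[K+]i). The concentrations are illustrative textbook-range values, not the parameters used in this project.

```python
import math

R = 8.314      # gas constant, J/(mol*K)
F = 96485.0    # Faraday constant, C/mol


def nernst_mV(conc_out, conc_in, z=1, temp_k=310.15):
    """Nernst equilibrium potential in millivolts (default: monovalent ion at 37 C)."""
    return 1000.0 * (R * temp_k) / (z * F) * math.log(conc_out / conc_in)


# Illustrative concentrations (mM): raising extracellular K+ makes E_K less negative.
for k_out in (4.0, 8.0, 12.0):
    print(f"[K+]o = {k_out:4.1f} mM -> E_K = {nernst_mV(k_out, 140.0):6.1f} mV")
```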

Investigating the Hypervirulence of a Fusobacterium nucleatum Ethanolamine Utilization Mutant in a Pre-Term Birth Model

AMANDA FUENZALIDA1, Dana Franklin2, Yi-Wei Chen2, Hung Ton-That2

Bruins in Genomics, University of California-Los Angeles1, School of Dentistry, University of California-Los Angeles2

Known to play a significant role in the development of oral biofilms, the oral pathobiont Fusobacterium nucleatum can spread to distal organs, including the placenta and colon, promoting preterm birth and colorectal cancer. Many bacterial pathogens use ethanolamine (EA) as a source of carbon and nitrogen, and EA is upregulated in the placenta during fetal development; because the role of EA metabolism in fusobacterial virulence is unknown in F. nucleatum, we previously sought to examine it. Because the EA-ammonia lyase EutBC catalyzes the breakdown of EA, we generated a eutBC mutant (ΔeutBC), initially predicting that this mutant would show attenuated virulence. Surprisingly, we found that ΔeutBC is hypervirulent, exhibiting increased placental and fetal colonization and reduced pup survival. We now hypothesize that eutBC deficiency may cause intracellular accumulation of EA, which triggers upregulation of virulence factors that promote bacterial virulence and colonization. To investigate this, we began isolating total RNA from placentas recovered from animals infected with the parent or ΔeutBC strain, in order to analyze expression of several predicted Fusobacterium virulence factors by qRT-PCR. Using a Python script we wrote to analyze gene expression data with the ΔΔCT method, we found that genes coding for putative toxins and for factors involved in oxidative stress are upregulated in ΔeutBC relative to the parent strain. In ongoing experiments, we are using RNA-seq to identify genes differentially regulated by eutBC deficiency at genome-wide scale, aiming to reveal novel virulence factors that contribute to F. nucleatum pathophysiology.
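
The ΔΔCT calculation mentioned above normalizes each target gene to a reference gene and then to the parent strain. A minimal sketch follows, assuming roughly 100% amplification efficiency and one Ct value per sample. The function name and Ct values are hypothetical; this is not the project's script or data.

```python
def fold_change_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression by the 2^(-ddCt) method.

    test = mutant sample, ctrl = parent-strain sample; ref = reference gene.
    Assumes ~100% amplification efficiency for both primer pairs.
    """
    d_ct_test = ct_target_test - ct_ref_test   # normalize to the reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_test - d_ct_ctrl              # express relative to the control strain
    return 2 ** (-dd_ct)


# Hypothetical Ct values: the target amplifies 2 cycles earlier in the mutant.
print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))
```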

Examining cell-specific biomarkers in Amyotrophic Lateral Sclerosis

CAMERON GILL, Sharmila Venugopal

Department of Integrative Biology and Physiology, UCLA

BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

Amyotrophic lateral sclerosis (ALS) is a devastating neurodegenerative disease that is undetectable at early stages because objective biomarkers are lacking. Using the well-established SOD1-G93A mouse model of ALS, our lab has uncovered early physiological abnormalities not only in vulnerable brainstem motor neurons but also in their proprioceptive input neurons and in GABAergic inhibition. In parallel, single-cell RNA sequencing near the symptomatic stage revealed differentially expressed genes (DEGs) in a variety of cell types in the brainstem of SOD1-G93A mice, many of which correspond to human genes. Here, we cross-examined protein biomarkers reported in ALS patients against the cell types identified in our SOD1-G93A RNA-seq data to search for cell-specific markers in mouse and human. Using a bioinformatics approach, we generated gene/protein interaction networks and identified 11 markers overlapping between the SOD1-G93A mouse data and human patient reports. Specifically, GABAergic neurons and astroglia each accounted for 4 of the 11 overlapping patient markers, further supporting the utility of the SOD1-G93A mouse model for identifying cell-specific biomarkers in ALS.
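
A simplified sketch of the cross-referencing step intersects each cell type's DEGs with the patient biomarker list. It assumes that uppercasing mouse gene symbols is an adequate ortholog mapping, which is only a rough approximation; a real analysis would use an ortholog table. The function and gene names are placeholders.

```python
def markers_by_cell_type(deg_by_cell_type, patient_markers):
    """Intersect each cell type's DEGs with reported patient biomarkers.

    deg_by_cell_type: dict cell type -> iterable of mouse gene symbols.
    patient_markers: iterable of human gene symbols.
    Uppercasing symbols is a crude stand-in for proper ortholog mapping.
    """
    patient = {g.upper() for g in patient_markers}
    return {cell: sorted(patient & {g.upper() for g in degs})
            for cell, degs in deg_by_cell_type.items()}


# Placeholder genes, not results from this study.
degs = {"astroglia": ["Gene1", "Gene2"], "GABAergic": ["Gene3"]}
print(markers_by_cell_type(degs, ["GENE2", "GENE3", "GENE9"]))
```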

Exploring the RNA Editing Landscape of Metastatic Prostate Cancer

KARTHIK GURUVAYURAPPAN1, 2, 3, 4, 5, Julie Livingstone1, 2, 3, 4, 5, Paul C. Boutros1, 2, 3, 4, 5

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Human Genetics, University of California, Los Angeles

3 Department of Urology, University of California, Los Angeles

4 Institute for Precision Health, University of California, Los Angeles

5 Jonsson Comprehensive Cancer Center, University of California, Los Angeles

RNA editing is a post-transcriptional mechanism affecting both protein-coding and non-coding transcripts. The most common modification is adenosine-to-inosine (A-to-I) editing, which is frequently more active in cancer cells. A-to-I editing can introduce non-synonymous changes, resulting in protein heterogeneity, but the nature and functional consequences of these changes in cancer remain largely unexplored. To characterize the landscape of A-to-I RNA editing in prostate cancer, we identified A-to-I editing sites across 99 publicly available metastatic tumours using both RNA-sequencing and DNA-sequencing data. We then investigated associations between A-to-I editing and key clinical and genomic features. We found that A-to-I editing sites were significantly associated with site of metastasis, chromothripsis, homologous recombination deficiency, and decreased gene abundance. This exploration of A-to-I editing in metastatic prostate cancer, combined with future analyses of localized tumours, may identify candidate editing sites that contribute to protein heterogeneity in prostate cancer.
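
Because inosine is read as guanosine during sequencing, A-to-I editing appears as A-to-G mismatches in RNA that are absent from matched DNA. The sketch below flags such sites from per-site base counts. The input structure and thresholds are illustrative assumptions, not the pipeline used in this study. Genes on the reverse strand, where editing appears as T-to-C, are not handled.

```python
def candidate_a_to_i_sites(sites, min_rna_depth=10, min_edit_frac=0.1, max_dna_g_frac=0.0):
    """Flag candidate A-to-I editing sites from matched RNA and DNA base counts.

    sites: iterable of dicts with keys 'pos', 'ref', 'rna', 'dna', where 'rna'
    and 'dna' map bases to read counts (forward-strand genes only).
    Thresholds are illustrative assumptions.
    """
    hits = []
    for s in sites:
        if s["ref"] != "A":
            continue
        rna_depth = sum(s["rna"].values())
        dna_depth = sum(s["dna"].values())
        if rna_depth < min_rna_depth or dna_depth == 0:
            continue
        edit_frac = s["rna"].get("G", 0) / rna_depth
        dna_g_frac = s["dna"].get("G", 0) / dna_depth
        # Require G in RNA but not in DNA so that A>G DNA variants are excluded.
        if edit_frac >= min_edit_frac and dna_g_frac <= max_dna_g_frac:
            hits.append((s["pos"], round(edit_frac, 3)))
    return hits
```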

Time course of Islet Transcriptome changes due to IAPP-induced beta-cell stress

IMRI HAGGIN1, NUOYA JIANG1, Montgomery Blencowe2, Zac Zamora2, Peter C. Butler4, Tatyana Gurlo4, Xia Yang2,3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA

3 Interdepartmental Program of Bioinformatics, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA

4 Larry L. Hillblom Islet Research Center, University of California, Los Angeles, David Geffen School of Medicine, Los Angeles, CA, USA

Type 2 diabetes (T2D) is characterized by progressive deterioration of beta-cell function, in part due to misfolding of human islet amyloid polypeptide (hIAPP), which is co-expressed with insulin by beta cells. Beta-cell-specific overexpression of hIAPP in mice results in diabetes and has been shown to induce gene expression changes similar to those in islets from humans with T2D, with a pronounced increase in inflammation. To reveal the mechanisms of progressive beta-cell failure, we compared gene signatures of islets from 6-week-old and prediabetic 9-week-old transgenic mice. Through WGCNA and DEG analysis, we found that the major pathways, including ECM organization and immune functions, were consistent between ages; however, wound healing, RNA processing, and ribosome biogenesis differed. RRHO analysis revealed concordance for upregulated genes but less concordance for downregulated genes, a pattern that was also seen in comparison with human T2D islets. Future analysis of gene networks might identify the key players in disease progression and targets for intervention.
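
RRHO compares two ranked gene lists by testing the overlap of their top-ranked genes at many thresholds with a hypergeometric test. The sketch below computes a simplified, one-quadrant version that uses only overlaps from the upregulated end of both lists. The full method also scans from the downregulated end and is typically run with a dedicated package. Function names and the `step` parameter are hypothetical.

```python
from math import comb, log10


def hypergeom_sf(k, total, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(total, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(comb(successes, i) * comb(total - successes, draws - i)
               for i in range(k, upper + 1))
    return tail / comb(total, draws)


def rrho_grid(rank_a, rank_b, step):
    """-log10 hypergeometric p-values for overlaps of the top-i and top-j genes.

    rank_a, rank_b: the same genes, each ordered from most up- to most down-regulated.
    """
    total = len(rank_a)
    grid = {}
    for i in range(step, total + 1, step):
        top_a = set(rank_a[:i])
        for j in range(step, total + 1, step):
            overlap = len(top_a & set(rank_b[:j]))
            p = hypergeom_sf(overlap, total, i, j)
            grid[(i, j)] = -log10(max(p, 1e-300))  # guard against underflow to 0
    return grid
```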

Single cell multi-omic association of divergent learning behaviors in C57BL/6J and DBA mice with perturbed biological pathways in the Hippocampus

CATALINA HOLGUIN1, Matthew Heffel2, Yi Zhang2, Kevin Abuhanna2, Patrick Chen3, Cuining Liu2, Terence Li2, Chongyuan Luo2, Jonathan Flint3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Human Genetics, David Geffen School of Medicine, UCLA

3 Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, UCLA

Little is known about the transcriptomic and epigenetic pathways underlying behavioral phenotypes in mice, a key animal model whose strains differ substantially in behavior (e.g., learning, anxiety). To address this, we collected 47,926 single-cell RNA profiles (10x scRNA-seq) and 2,950 single-nucleus methylomes (snmC-seq2) from the hippocampal region of two mouse strains (DBA and C57BL/6J), then identified genes differentially expressed and differentially methylated between strains (Wilcoxon FDR < 0.05), which we used for pathway overrepresentation analyses (Enrichr). Between-strain differences included pathways likely relevant to behavior, such as the NMDA receptor pathway, which is involved in synaptic plasticity and memory in the hippocampus. Increased gene expression in this pathway has previously been associated with faster learning. Thus, increased expression and decreased methylation of NMDA receptor pathway genes in C57BL/6J dentate gyrus cells could contribute to the faster learning of C57BL/6J compared with DBA in spatial alternation tasks.

Modelling Bacteria-enhanced Thermal Tolerance in Marine Phytoplankton

AARSHI JAIN1,2, Colin T. Kremer3

1 BIG Summer Program, Department of Ecology and Evolutionary Biology, UCLA

2 URC Sciences Summer program, Department of Ecology and Evolutionary Biology, UCLA

3 Department of Ecology and Evolutionary Biology, UCLA

Marine phytoplankton account for about 50% of global primary productivity and form the base of the oceanic food web. Recent studies of freshwater species show that cross-feeding between phytoplankton and bacteria in their microbiome influences phytoplankton thermal tolerance: phytoplankton supply bacteria with photosynthate and receive cobalamin (vitamin B12) in return. Cobalamin allows phytoplankton to synthesize methionine through a specific pathway (METH) that functions better at high temperatures than the alternative pathway (METE). Few studies have investigated this mutualism in marine species. To study how cross-feeding affects marine phytoplankton and their response to global warming, we developed an ODE model. We found that temperature affects the species' equilibrium population sizes. When we allowed evolution along a tradeoff between growth and substrate synthesis, the interaction became unstable. Accounting for nutrient concentrations within cells may stabilize this interaction from an eco-evolutionary perspective. This study expands fundamental understanding of how species interactions affect adaptation to thermal gradients.
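
As a toy illustration of how temperature can shift equilibrium population sizes in such a mutualism, the sketch below integrates a two-species ODE with forward Euler. The model structure, the Gaussian thermal performance curve, and every parameter value are assumptions chosen for illustration, not the authors' model. It omits the METH/METE distinction, intracellular nutrients, and evolution.

```python
import math


def growth_rate(temp, r_max=1.0, t_opt=20.0, width=5.0):
    """Hypothetical Gaussian thermal performance curve for phytoplankton growth."""
    return r_max * math.exp(-((temp - t_opt) / width) ** 2)


def simulate(temp, days=200.0, dt=0.01, p0=10.0, b0=10.0):
    """Forward-Euler integration of a toy phytoplankton (P) / bacteria (B) mutualism.

    dP/dt = P * (r(T) * B / (h + B) - m_p - c_p * P)   # growth limited by bacterial B12
    dB/dt = B * (e * P / (g + P) - m_b - c_b * B)      # bacteria grow on photosynthate
    All parameter values are illustrative assumptions.
    """
    h, m_p, c_p = 1.0, 0.1, 0.01
    e, g, m_b, c_b = 0.5, 10.0, 0.1, 0.01
    r = growth_rate(temp)
    p, b = p0, b0
    for _ in range(int(days / dt)):
        dp = p * (r * b / (h + b) - m_p - c_p * p)
        db = b * (e * p / (g + p) - m_b - c_b * b)
        p, b = p + dt * dp, b + dt * db
    return p, b


for t in (20.0, 26.0):
    p_eq, b_eq = simulate(t)
    print(f"T = {t:.0f} C: phytoplankton ~ {p_eq:.1f}, bacteria ~ {b_eq:.1f}")
```

In this toy model, warming past the thermal optimum lowers the phytoplankton growth rate, which in turn lowers the photosynthate supply and the bacterial equilibrium.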

Effects of stochastic trace estimators on method of moments estimations

MICHELLE JOHNSON1,2,3, NOELLE WHEELER1, Ali Pazoki4, Sriram Sankararaman4

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Biology, Harvey Mudd College

3 Department of Computer Science, Harvey Mudd College

4 Department of Computer Science, UCLA

Variance components analysis has emerged as a powerful tool in complex trait genetics. Applying it to large-scale genetic datasets can reveal important insights into genetic architecture, but previous methods for fitting variance components do not scale to these datasets. To address this, we used the scalable method-of-moments (MoM) estimator. The key computational bottleneck in the MoM estimator is computing the trace of a large transformed matrix A. Explicitly computing A requires O(N^3) time, which is not feasible at this scale. In this project, we assessed how different scalable stochastic trace estimators affect variance component estimation. To compare these estimators, we examined the bias and variance of the MoM estimator as a function of the random-vector distribution and the number of random vectors. We found that the Rademacher distribution yielded the smallest standard error, which decreased as the number of random vectors increased. This decrease was stronger for the Hutch++ estimator than for the Hutchinson estimator.
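
The Hutchinson estimator approximates tr(A) by averaging z^T A z over random vectors z, so it needs only matrix-vector products and never forms A explicitly. A minimal sketch with Rademacher vectors (entries ±1) follows. Hutch++, which first projects out a low-rank component estimated from products with A, is not shown. The function names are hypothetical.

```python
import random


def hutchinson_trace(matvec, n, num_vectors, seed=0):
    """Estimate tr(A) as the mean of z^T A z over Rademacher vectors z.

    matvec: function returning A @ z for a length-n list z, so A is never formed.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_vectors):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = matvec(z)
        total += sum(zi * azi for zi, azi in zip(z, az))
    return total / num_vectors


def dense_matvec(a):
    """Wrap a small dense matrix (list of rows) as a matvec function for testing."""
    return lambda z: [sum(aij * zj for aij, zj in zip(row, z)) for row in a]


print(hutchinson_trace(dense_matvec([[2.0, 1.0], [1.0, 3.0]]), 2, 2000))
```

For a diagonal matrix every Rademacher sample returns the exact trace; the estimator's variance comes entirely from off-diagonal entries, which is one reason Rademacher vectors tend to give smaller errors than Gaussian ones.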

Using pair-wise scores derived from epigenomic features to predict phenotypic similarity between genetic variants

ELIJAH JONES1, SAIYANG LIU1, Ha Vu2, Jason Ernst3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Bioinformatics Interdepartmental Ph.D. Program, UCLA

3 Department of Biological Chemistry, David Geffen School of Medicine, UCLA

Genome-wide association studies (GWAS) provide an unprecedented opportunity to uncover genotype-phenotype associations. Identifying genetic variants whose epigenetic properties underlie their shared associations with certain phenotypes is an important challenge, especially in studies of the biological implications of GWAS variants. Here, we explore developing a pair-wise score for genetic variants that measures whether they are associated with similar phenotypes. We first analyzed the patterns of epigenetic signals that differentiate pairs of shared-phenotype GWAS variants, using data from Enformer, a deep learning framework that predicts more than 5,000 epigenetic features from DNA sequence. Then, we trained a fully connected Siamese network with a contrastive loss function to measure epigenetic distances between variants as a proxy for their phenotypic similarity. This study offers evidence that the epigenetic signals at each variant in a pair can predict whether the two are associated with similar phenotypes and diseases. The resulting score will be a resource for studies of human genetic variants and their associated phenotypes, based on variants that are closely related at the functional level.
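
One common formulation of the contrastive loss used to train Siamese networks penalizes the squared distance between embeddings of similar pairs and penalizes dissimilar pairs only when they fall within a margin. A minimal sketch for a single pair is below. The margin value and the omission of the conventional 1/2 factor are assumptions, and in practice the embeddings would come from the network rather than being given by hand.

```python
import math


def contrastive_loss(emb_a, emb_b, same_phenotype, margin=1.0):
    """Contrastive loss for one pair of embeddings.

    same_phenotype: True if the two variants share a phenotype association.
    Similar pairs are pulled together (loss = d^2); dissimilar pairs are
    pushed apart until their distance exceeds the margin.
    """
    d = math.dist(emb_a, emb_b)   # Euclidean distance between the embeddings
    if same_phenotype:
        return d ** 2
    return max(0.0, margin - d) ** 2
```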

A Semi-automated Pipeline for Phenotyping Diabetes Complications in the UK Biobank: Generating and Analyzing Interval-censored Time-to-event Data

KELLY JONES1, Do Hyun Kim2, Aubrey Jensen3, Hua Zhou2, Jin Zhou4

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Biostatistics, UCLA

3 Samueli School of Engineering, UCLA

4 Department of Medicine, UCLA

The increased availability of electronic health records through online databases has created new opportunities to study complex diseases in large populations. This study aimed to create a semi-automated, reproducible pipeline for extracting biomarkers and phenotyping incident outcomes from available UK Biobank records. We applied this pipeline to determine the time to event for three diabetes complications: cardiovascular disease, diabetic kidney disease, and diabetic retinopathy. These time-to-event data were then analyzed together with biomarker trajectories and other diabetes mellitus (DM) risk factors to estimate the risk of complications developing over time. Specifically, we generated interval-censored time-to-event data to account for uncertainty in the diagnosis dates of diabetes and its complications. We also derived right-censored time-to-event data as a special case of the interval-censored data. The primary analyses were conducted on both types of time-to-event data, and we compared the effect sizes and directions of various factors on the risk of developing these complications.
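
Interval censoring arises because an event is known only to have occurred between the last record without it and the first record with it. The sketch below derives (left, right) bounds from ordered records and collapses them to right-censored (time, event) pairs by right-endpoint imputation, which is only one possible convention; the abstract does not say which convention the pipeline uses. The function names are hypothetical.

```python
import math


def interval_censor(visits, baseline=0.0):
    """Return (left, right) bounds for the event time from ordered records.

    visits: list of (time, event_recorded) tuples sorted by time, e.g. years
    since diabetes diagnosis. right = inf means the subject is right-censored.
    """
    left = baseline
    for time, event_recorded in visits:
        if event_recorded:
            return (left, time)
        left = time
    return (left, math.inf)


def to_right_censored(interval):
    """Collapse an interval to (time, event) using right-endpoint imputation."""
    left, right = interval
    if math.isinf(right):
        return (left, 0)   # censored at the last event-free record
    return (right, 1)      # event assumed at the first record showing it
```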

Methplotter: a Python package for the visualization of DNA methylation data

KARISA KE1, Zhenhui Zhong2, Steve Jacobsen2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Molecular, Cell and Developmental Biology, University of California Los Angeles, Los Angeles, CA, USA.

DNA methylation is a heritable epigenetic mark that locks genes or transposable elements (TEs) in the “off” position. Through the epigenetic regulation of gene expression, it serves as an important component of cellular processes such as genomic imprinting, embryonic development, differentiation, and maintenance of cellular identity. High-throughput sequencing is widely used to profile genome-wide DNA methylation at single-base resolution. Here, we built methplotter, a Python package for visualizing high-throughput DNA methylation data. Functions of methplotter include converting methylation reports from different methylation pipelines into a consistent DNA methylation format, generating wiggle reports, and plotting methylation data as bar plots, line plots, and box plots over chromosomes, genes, TEs, or specific regions. In summary, our work provides a new tool for the downstream visualization of DNA methylation data.
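
As an illustration of the report conversion and wiggle generation described above, here is a minimal sketch that turns Bismark-style coverage lines (chromosome, start, end, methylation percentage, methylated count, unmethylated count; tab-separated) into variableStep wiggle text. It is a hypothetical helper, not methplotter's API, and it assumes input sorted by chromosome and position.

```python
def coverage_to_wiggle(lines, min_depth=1):
    """Convert Bismark-style coverage lines to variableStep wiggle text.

    Output values are methylation fractions (methylated / total reads);
    sites with fewer than min_depth reads are dropped.
    """
    out = []
    current_chrom = None
    for line in lines:
        if not line.strip():
            continue
        chrom, start, _end, _pct, meth, unmeth = line.split("\t")[:6]
        meth, unmeth = int(meth), int(unmeth)
        depth = meth + unmeth
        if depth < min_depth:
            continue
        if chrom != current_chrom:          # start a new wiggle block per chromosome
            out.append(f"variableStep chrom={chrom}")
            current_chrom = chrom
        out.append(f"{start} {meth / depth:.3f}")
    return "\n".join(out)
```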

Using linear regression and PathFX networks identifies genes associated with psychiatric drug effects in zebrafish screens

JAROD LE1, Jennifer Wilson2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Bioengineering, Samueli School of Engineering, UCLA

New treatments for psychiatric diseases are needed: over 25% of U.S. adults have serious mental illnesses (NIH, 2020), and no new therapeutic mechanisms or drug targets have been identified in the last few decades (Mathur & Guo, 2010). Moreover, many psychiatric diseases are polygenic, so discovering new therapies requires a systems-level approach. PathFX is a protein-protein interaction (PPI) network program that predicts drug outcomes by analyzing proteins downstream of drug targets. PPI models, including PathFX, tend to overpredict drug outcomes and have low precision and prediction performance. Previous work showed that a context-specific approach could reduce overprediction and improve precision. We hypothesized that antidepressant data from in vivo screening could be used to find the proteins with the most impact in the network. We introduced a pipeline that takes intermediate network proteins and in vivo screening data as input and uses machine learning to prioritize network proteins associated with strong and weak drug effects in the screen. The model prioritized the gene variant rs2725252 and the network genes PENK and CXCL6, suggesting that in vivo animal models are sensitive to drugs that interact with xenobiotic transporters and with opioid and chemokine receptors.
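
The abstract does not specify the model. One simple way to prioritize network proteins is to regress the screen effect on whether each protein appears in a drug's network. The sketch below fits a univariate least-squares slope per protein; with a binary predictor, this equals the difference in mean effect between drugs whose networks contain the protein and drugs whose networks do not. Drug names, protein names, and effect values are hypothetical.

```python
def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0   # protein in all or no networks: no information
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx


def rank_network_proteins(networks, effects):
    """Rank proteins by the slope of screen effect on network membership.

    networks: dict drug -> set of proteins in that drug's network.
    effects: dict drug -> screen effect size for the same drugs.
    """
    drugs = sorted(effects)
    proteins = sorted(set().union(*(networks[d] for d in drugs)))
    y = [effects[d] for d in drugs]
    scores = {p: slope([1.0 if p in networks[d] else 0.0 for d in drugs], y)
              for p in proteins}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```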

Making improved predictions of Non-invasive Continuous Arterial Blood Pressure through leveraging EHR medication data

SIMON LEE1, Ákos Rudas2, & Jeffrey N. Chiang2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Computational Medicine, University of California, Los Angeles, CA, USA

Hypotension (a sustained decrease in blood pressure) in critical care patients is associated with a higher risk of mortality and other severe complications. Blood pressure is periodically monitored non-invasively in all ICU patients. Continuous blood pressure monitoring via arterial line catheters has been shown to enable faster response times but is also associated with complications such as infection. With recent advances in machine learning, current models can predict arterial blood pressure (ABP) continuously and non-invasively, but their precision is limited. This project therefore aimed to leverage vasopressor medication data from electronic health records (EHR) to make better-informed predictions. By including the administration rate of vasopressor drugs, we retrained the ABP Imputer deep learning model on electrocardiogram (EKG) and photoplethysmographic (PPG) waveforms to build a prediction system that accounts for clinical interventions. Our predicted waveforms were, on average, 5.67 mmHg away from the actual ABP measurements. These results suggest that this potential new method could measure blood pressure continuously and non-invasively for all patients in the ICU setting and beyond, without the need for any additional instrumentation.
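
Including vasopressor administration rates requires aligning sparsely charted EHR doses with densely sampled waveforms. The minimal sketch below carries the most recent charted rate forward to each waveform sample. It also computes a mean absolute error, one way to express an average deviation in mmHg. The forward-fill assumption and the function names are hypothetical, and the ABP Imputer model itself is not shown.

```python
from bisect import bisect_right


def rate_at_samples(sample_times, dose_times, dose_rates):
    """Carry the most recent vasopressor infusion rate forward to each waveform sample.

    dose_times must be sorted; samples before the first charted rate get 0.0.
    """
    rates = []
    for t in sample_times:
        i = bisect_right(dose_times, t) - 1   # index of the last dose charted at or before t
        rates.append(dose_rates[i] if i >= 0 else 0.0)
    return rates


def mean_absolute_error(predicted, measured):
    """Average absolute difference (mmHg) between predicted and measured ABP."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)
```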

Associations of antidepressant drug pathway proteins between drug efficacy and acceptability outcomes

EMILY LIU1, Jennifer Wilson2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Bioengineering, Samueli School of Engineering, UCLA

Antidepressant drug therapies provide positive therapeutic outcomes but can also produce unexpected and sometimes negative drug outcomes. Studies have shown that understanding drug target protein-protein interactions (PPI) and associated phenotypes helps scientists better predict these unexpected outcomes. In this project, we used PathFX, a network algorithm that analyzes PPI downstream of drug targets, to investigate associations between network proteins and clinical assessments of antidepressant effects. We applied linear regression and regularized linear regression (RLR) to identify relationships between PathFX-identified drug network genes and clinical outcomes such as efficacy and acceptability. The two methods yielded similar model coefficients, whether efficacy or acceptability data were used. In applying RLR to clinical efficacy data, ATP-binding cassette family genes, serotonin-binding genes, and cytochrome enzyme genes (e.g., ABCG2, SLC6A4, and CYP1A2) had the highest coefficients, and cytochrome P450 family member genes (e.g., CYP2D6, CYP3A4, and CYP3A5) had the lowest. In applying RLR to clinical acceptability data, cytochrome P450 family enzyme genes, serotonin-binding genes, and isozyme genes (e.g., CYP2D6, HTR2A, and GSTP1) had the highest coefficients, and ATP-binding cassette family genes, potassium channel genes, and genes forming ligand-gated ion channels (e.g., ABCG2, KCNH2, and CHRNB4) had the lowest. Our results suggest that antidepressant drug network genes associated with high clinical efficacy or acceptability are related but contain distinct network proteins. Further investigating these distinctions may therefore help scientists refine their understanding of antidepressants and minimize negative drug outcomes.
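
Regularized linear regression shrinks coefficients toward zero so that correlated network genes do not produce unstable estimates. The abstract does not say which penalty was used. The sketch below assumes ridge (L2) regression and solves (X^T X + lam I) beta = X^T y directly. Rows are drugs, columns are binary network-gene indicators, and y is an efficacy or acceptability score. The intercept and feature scaling are omitted for brevity, so y should be centered first.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam*I) beta = X^T y by Gauss-Jordan elimination (no intercept)."""
    p = len(X[0])
    # Augmented normal-equation matrix [X^T X + lam*I | X^T y].
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))   # partial pivoting
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]
```

Larger `lam` values shrink the coefficients more strongly; with `lam = 0` the function reduces to ordinary least squares.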

High-throughput neoantigen discovery pipeline using noncanonical peptide identification

SIDDHARTH MAHESH1, ALI MOBEDI1, Chenghao Zhu2, 3, 4, 5, Lydia Y Liu2,4,6,7, Paul C. Boutros2, 3, 4, 5

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Human Genetics, University of California, Los Angeles, CA, USA

3 Department of Urology, University of California, Los Angeles, CA, USA

4 Jonsson Comprehensive Cancer Center, University of California, Los Angeles, CA, USA

5 Institute for Precision Health, University of California, Los Angeles, CA, USA

6 Department of Medical Biophysics, University of Toronto, Toronto, Canada

7 Princess Margaret Cancer Centre, University Health Network, Toronto, Canada

The use of tumor neoantigens is a new approach in cancer immunotherapy that relies on the body’s immune system to eliminate cancerous cells. Neoantigens are peptides derived from genomic and transcriptomic variants that are bound by the major histocompatibility complex (MHC). These tumor-specific peptides can then be presented on the cell surface for recognition by T-cell receptors. Recent studies demonstrate the low accuracy of current neoantigen discovery pipelines, revealing that fewer than 5% of predicted bound peptides are truly present on cell surfaces. We propose an optimized high-throughput pipeline that uses non-canonical peptide callers (moPepGen) and HLA genotyping for neoantigen discovery. The pipeline was tested on sequencing data from 14 metastatic head and neck squamous cell carcinoma tumors. By leveraging proteogenomic data, we hope that our model can ultimately become a powerful tool for developing cancer vaccines, improving patient outcomes, and informing disease prognosis.
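
MHC class I molecules typically bind peptides 8 to 11 amino acids long, so a common first step is to enumerate every such peptide that contains a variant residue. A minimal sketch follows, assuming the mutant protein sequence and the 0-based variant position are already known. It is not moPepGen's interface; binding prediction against the patient's HLA alleles would follow.

```python
def peptides_spanning(protein, variant_index, lengths=range(8, 12)):
    """All k-mers (default 8-11 aa) of a mutant protein that contain the variant residue."""
    peptides = []
    for k in lengths:
        first = max(0, variant_index - k + 1)          # leftmost window still covering the variant
        last = min(variant_index, len(protein) - k)    # rightmost window that fits in the protein
        for start in range(first, last + 1):
            peptides.append(protein[start:start + k])
    return peptides
```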

Exploring the Cell Heterogeneity in the Developing Mouse Mandible

HANIFA MASWALI1, Leah Ye2, Jimmy Hu2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Division of Oral Biology, School of Dentistry, UCLA

During embryonic development, progenitor cells in the first pharyngeal arch generate distinct cell types that contribute to various tissues, including connective tissue. How diverse cell types arise from seemingly uniform progenitors remains an important question in craniofacial biology, and understanding the mechanisms that pattern the pharyngeal arch is a critical step toward answering it. Because the mandibular arch is close to the developing heart, we hypothesized that the two tissues interact through signaling. Here we performed single-cell RNA sequencing to dissect the cell populations in the early developing mouse mandible and the nearby heart tissue at embryonic day (E) 9.5. Using CellChat, we further identified signaling pathways, such as BMP, responsible for differentiating cells within the mandible and heart. We anticipate that our findings will contribute to research aiming to characterize the pattern formation underlying craniofacial and heart development.
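
CellChat (an R package) scores ligand-receptor communication with a model based on the law of mass action, together with permutation tests. The much simpler stand-in below scores a sender-receiver pair as mean ligand expression in the sender cluster times mean receptor expression in the receiver cluster. It uses the BMP ligand Bmp4 and its receptor Bmpr1a as an example pair. The expression values and the function name are hypothetical.

```python
from statistics import fmean


def lr_score(expr, clusters, sender, receiver, ligand, receptor):
    """Simplified ligand-receptor score between two clusters.

    expr: dict gene -> list of per-cell expression values.
    clusters: list of cluster labels, one per cell (same order as expr lists).
    Score = mean ligand expression in sender * mean receptor expression in receiver.
    """
    send_cells = [i for i, c in enumerate(clusters) if c == sender]
    recv_cells = [i for i, c in enumerate(clusters) if c == receiver]
    lig = fmean(expr[ligand][i] for i in send_cells)
    rec = fmean(expr[receptor][i] for i in recv_cells)
    return lig * rec


# Hypothetical expression values for four cells.
expr = {"Bmp4": [2.0, 4.0, 0.0, 0.0], "Bmpr1a": [0.0, 0.0, 1.0, 3.0]}
clusters = ["heart", "heart", "mandible", "mandible"]
print(lr_score(expr, clusters, "heart", "mandible", "Bmp4", "Bmpr1a"))
```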

DNA Methylation as a Prognostic Indicator of Liver Hepatocellular Carcinoma (LIHC)

HARINARAYANA MELLACHERUVU1, TONY ZHANG1, Ran Hu1,2,3, Jasmine Zhou1,2,3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA

3 Bioinformatics Interdepartmental Ph.D. Program, UCLA

DNA methylation at CpG sites is a promising marker in cancer because of its association with survival. Using Illumina 450k array data from 377 patients with liver hepatocellular carcinoma (LIHC), we aimed to identify differentially methylated positions (DMPs) as prognostic indicators. By comparing 50 pairs of tumor and adjacent normal tissue samples, we selected a set of candidate DMP markers. We then trained univariate and multivariate Cox models on a training set (n=218) to identify probes that predict overall survival. To handle the high dimensionality of DNA methylation data, we used 5-fold cross-validation to fit an elastic-net-regularized Cox model and tested it on a validation set (n=108). The predicted survival risk scores correlate with cancer stage and stratify patients into risk groups with a more significant log-rank test p-value than cancer stage, showing the utility of DNA methylation markers in LIHC prognosis.
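
Once patients are stratified into risk groups by the predicted score, the groups can be compared with a log-rank test. A pure-Python sketch of the two-group test follows. It assumes one row per patient with a follow-up time, an event indicator, and a 0/1 risk-group label. It computes the p-value from the chi-square distribution with one degree of freedom. The function name is hypothetical.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times: follow-up times; events: 1 if death observed, 0 if censored;
    groups: 0 or 1 (e.g., low- vs high-risk by methylation score).
    Returns (chi-square statistic with 1 df, p-value).
    """
    data = list(zip(times, events, groups))
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({tt for tt, e, _ in data if e == 1}):
        n = sum(1 for tt, _, _ in data if tt >= t)                      # at risk
        n1 = sum(1 for tt, _, g in data if tt >= t and g == 1)          # at risk in group 1
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)           # deaths at t
        d1 = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n                                     # observed - expected
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square(1)
```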

Leveraging Human Phenotype Ontology (HPO) to identify phenotypes associated with chromatin modifier diseases

MATTHEW MOLERES1, Leroy Bondhus2, Valerie Arboleda2,3,4

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Human Genetics, David Geffen School of Medicine, UCLA

3 Department of Computational Medicine, David Geffen School of Medicine, UCLA

4 Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles

Some rare diseases are caused by mutations in chromatin modifiers. These chromatin modifier diseases occur when a gene encodes a mutated chromatin modifier protein, causing disease-related phenotypes. We aimed to link these diseases to specific phenotypes so that their phenotypic aberrations can be understood and compared across all syndromes. To do this, we used Online Mendelian Inheritance in Man (OMIM) to relate genes to phenotypes and the HPO to capture the relationships between phenotypes. We built a search tree modeling the HPO tree structure to associate chromatin modifier genes with phenotypes. We then used Fisher's exact test to determine which phenotypes were enriched in chromatin modifier diseases. Gene Ontology (GO) analysis was performed on genes associated with the phenotypes in the tree to test for the molecular functions of genes that share phenotypes with chromatin modifier diseases. For example, one observed phenotype, hypoplastic fifth fingernail, suggests a relation to biological processes involved in cardiac chamber formation, cardiac septum morphogenesis, and neuroepithelial cell differentiation. In this project, we identified and related phenotypes that are associated with chromatin modifier diseases. In future research, these diseases and phenotypes can be examined over a developmental timeline to trace the origins of the genes underlying these phenotypes and their roles in biological processes over time.
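
Two steps above can be sketched together. The first propagates each gene's phenotype annotations up the ontology, so that a gene annotated to a term is also annotated to all of that term's ancestors. The second tests enrichment with a one-sided Fisher's exact test. The toy ontology and counts are hypothetical. The HPO is a directed acyclic graph rather than a strict tree, and the ancestor search handles terms with multiple parents.

```python
from math import comb


def ancestors(term, parents):
    """All ancestors of an ontology term (parents: dict term -> list of parent terms)."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def propagate(gene_terms, parents):
    """Annotate each gene with its terms plus all their ancestors."""
    return {g: set(ts).union(*(ancestors(t, parents) for t in ts))
            for g, ts in gene_terms.items()}


def fisher_enrichment_p(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment in cell a of [[a, b], [c, d]].

    a: chromatin-modifier genes with the phenotype, b: without;
    c: other genes with the phenotype, d: without.
    """
    total, with_pheno, cm_genes = a + b + c + d, a + c, a + b
    upper = min(with_pheno, cm_genes)
    tail = sum(comb(with_pheno, k) * comb(total - with_pheno, cm_genes - k)
               for k in range(a, upper + 1))
    return tail / comb(total, cm_genes)
```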

Enhanced mechano-sensing of stem cells in dynamic hydrogels promotes osteogenic differentiation

ISABELA MOLINA1, Weihao Yuan2, Alireza Moshaverinia2,3,4

  1. BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA
  2. Laboratory of Biomaterials Innovation and Tissue Engineering (BITE), School of Dentistry, UCLA
  3. Weintraub Center for Reconstructive Biotechnology, School of Dentistry, UCLA
  4. Division of Advanced Prosthodontics, School of Dentistry, UCLA

The biomaterials-assisted delivery of stem cells offers great advantages for the regeneration of dental tissues. Although traditional static networks have been widely used, an alternative has emerged: structurally dynamic 3D hydrogels. These hydrogels respond to artificial activators and biological signals with spatial precision, allowing them to better mimic the physicochemical properties of the cellular microenvironment. Here, we synthesized gelatin and acryloyl cyclodextrin hybrid hydrogels based on “host-guest” interactions and used them to culture human gingival fibroblasts (hGnF) in growth medium (GM) and osteogenic medium (OM). We studied osteogenic differentiation by examining gene expression of ALP, OCN, Col1, RunX2, and MAPK via PCR and immunofluorescence (IF). Our results demonstrated that cells cultured inside the dynamic hydrogels showed significant differentiation in the osteogenic medium. Our findings provide valuable insights into how different types of hydrogel media can influence gene expression and the growth and propagation of gingival fibroblasts in dental tissue regeneration.

Keywords: Gingival fibroblasts, Osteogenic medium, Growth medium, ALP, OCN.

An empirical best linear unbiased prediction approach to small area estimation of untreated dental decay and sealant prevalence in California children

FABELY MORENO1, TALIA VILLA1, Yilan Huang2, Jason Shen3, Honghu Liu2,3,4

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Biostatistics, Fielding School of Public Health, UCLA

3 Division of Public Health and Community Dentistry, School of Dentistry, UCLA

4 Department of Medicine, David Geffen School of Medicine, UCLA

Although preventable, dental decay is the single most common chronic disease in children. Oral health data are essential for understanding prevalence at the local level, monitoring progress in disease prevention, and developing community health improvement plans. County-level data are scarce, and obtaining sufficient data is resource-intensive. We used small area estimation (SAE) to generate county-level estimates of oral health indicators in children across California by borrowing strength from multiple data sources, including local surveys and auxiliary census data. Using the Battese-Harter-Fuller (BHF) model and the Fay-Herriot (FH) model (unit-level and area-level linear mixed models, respectively), we generated estimates of the prevalence of untreated tooth decay and dental sealants in children. We then compared direct and model-based estimates. These granular, disaggregated estimates will inform local health departments’ efforts to prevent disease and reduce children’s oral health disparities in counties across California.
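
The area-level Fay-Herriot EBLUP is a weighted average of each county's direct estimate and a regression (synthetic) prediction. It puts more weight on the direct estimate when that estimate's sampling variance is small. A minimal sketch follows, assuming one area-level covariate and a known area-effect variance `sigma2_u`. In practice `sigma2_u` is estimated (for example, by REML), and the unit-level BHF model is not shown. All names are hypothetical.

```python
def fay_herriot_eblup(direct, covariate, sampling_var, sigma2_u):
    """Fay-Herriot EBLUP for each area, given the area-effect variance sigma2_u.

    Model: y_i = b0 + b1 * x_i + u_i + e_i, with Var(u_i) = sigma2_u and known
    sampling variances Var(e_i) = psi_i.
    direct: direct survey estimates y_i; covariate: x_i; sampling_var: psi_i.
    """
    w = [1.0 / (sigma2_u + p) for p in sampling_var]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, covariate))
    swy = sum(wi * yi for wi, yi in zip(w, direct))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, covariate))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, covariate, direct))
    # Weighted least squares for the regression (synthetic) part.
    b1 = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    b0 = (swy - b1 * swx) / sw
    estimates = []
    for yi, xi, p in zip(direct, covariate, sampling_var):
        gamma = sigma2_u / (sigma2_u + p)      # shrinkage weight on the direct estimate
        estimates.append(gamma * yi + (1 - gamma) * (b0 + b1 * xi))
    return estimates, (b0, b1)
```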

Impact of seed selection on subclonal reconstruction solutions

ANNA NEIMAN-GOLDEN1, PHILIPPA STEINBERG1, Lydia Y. Liu2,3,4, Takafumi Yamaguchi2,3, Yash Patel2,3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Human Genetics, David Geffen School of Medicine, UCLA

3 Institute for Precision Health, UCLA

4 Department of Medical Biophysics, University of Toronto, Toronto, Canada

High intra-tumoural heterogeneity makes cancer prognosis and personalized treatment more difficult. We developed subclonal reconstruction (SRC) pipelines that use single nucleotide variant (SNV) and copy number aberration (CNA) mutation callers together with SRC algorithms to quantify intra-tumoural heterogeneity. However, previous research shows that the choice of algorithm affects the results of subclonal reconstruction. In this project, we expanded the pipelines to work with multiple mutation callers by creating parsers that extract variant data from different tools. It is also unclear how strongly the random seed that initializes a probabilistic SRC algorithm affects pipeline results. We therefore benchmarked different randomized seeds across combinations of mutation callers and SRC algorithms and quantified the variance in outcomes. Standardizing the pipelines across tools and accounting for variance due to seed selection will improve the reproducibility and accuracy of studies of cancer evolution and support precision medicine.
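The seed question can be illustrated with a toy example. The sketch below uses a hypothetical stand-in for a probabilistic SRC algorithm (a one-dimensional k-means on variant allele frequencies with random initialization, not the algorithm of any of the pipelines); each run gets its own seeded `random.Random` instance so results are reproducible, and one output is summarized across seeds.

```python
import random
import statistics


def toy_src_run(vafs, seed, k=2, iters=20):
    """Hypothetical stand-in for one SRC run: 1D k-means on variant allele
    frequencies with seeded random initialization. Returns sorted centres."""
    rng = random.Random(seed)  # local RNG: runs never share global state
    centres = rng.sample(vafs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vafs:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # keep the old centre if a cluster ends up empty
        centres = [statistics.fmean(g) if g else centres[j]
                   for j, g in enumerate(groups)]
    return sorted(centres)


def seed_variance(vafs, seeds):
    """Run the toy SRC once per seed and return the mean and population
    variance of the highest centre (a proxy for the clonal cluster)."""
    clonal = [toy_src_run(vafs, s)[-1] for s in seeds]
    return statistics.fmean(clonal), statistics.pvariance(clonal)
```

With real SRC tools, the same loop would wrap each caller and algorithm combination, and the summarized output might be the number of subclones or the cellular prevalence of each cluster.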

Role of thiazolidinediones in adipose tissue remodeling

VICTORIA PELLOT1, Mirian De Siqueira2, Sicheng Zhang2, Claudio Villanueva2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Integrative Biology and Physiology, UCLA

Adipose tissue is a plastic and heterogeneous tissue involved in regulating systemic energy metabolism, including food intake, energy storage, and thermogenesis. Excess adipose tissue is closely linked with type 2 diabetes and cardiovascular disease, and contributes to the obesity epidemic. There are three types of adipocytes: white, beige, and brown; white adipocytes store energy, while brown and beige adipocytes carry out thermogenesis. PPAR-γ is a major transcription factor that is highly expressed in adipocytes. Activation of PPAR-γ by thiazolidinediones (TZDs) produces a significant antidiabetic response; however, the detailed mechanism of action is still unclear. We performed RNA-seq analysis of brown, epididymal, and inguinal adipose tissue from a genetic mouse model of obesity and type 2 diabetes after treatment with the PPAR-γ agonist rosiglitazone. We observed more overlapping genes between brown and inguinal adipose tissue than between the other tissue pairs. For downstream analysis, we focused on the upregulated genes shared by all three tissues because they represent a common rosiglitazone-driven remodeling program. Within this gene list, the oxidoreductase, carboxylic acid metabolism, and PPAR-γ signaling pathways were enriched. Identifying the pathways most relevant to the processes occurring within adipose tissue will help us understand how rosiglitazone and other TZDs work as antidiabetic drugs and how they might help combat obesity.
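The tissue-overlap step can be sketched with Python sets. The input format (gene mapped to log2 fold change and adjusted p-value), the thresholds (positive fold change, adjusted p below 0.05), and the gene names are illustrative assumptions rather than details from the study.

```python
from itertools import combinations


def upregulated(results, padj_cutoff=0.05):
    """Genes with a positive log2 fold change and adjusted p below the cutoff.
    results: dict mapping gene -> (log2_fold_change, adjusted_p)."""
    return {g for g, (lfc, padj) in results.items()
            if lfc > 0 and padj < padj_cutoff}


def shared_upregulated(*tissue_results, padj_cutoff=0.05):
    """Genes upregulated in every tissue passed in."""
    sets = [upregulated(r, padj_cutoff) for r in tissue_results]
    return set.intersection(*sets)


def pairwise_overlaps(named_sets):
    """Size of the overlap for every pair of tissues (dict name -> gene set)."""
    return {(a, b): len(named_sets[a] & named_sets[b])
            for a, b in combinations(named_sets, 2)}
```

The pairwise counts support comparisons such as which two tissues share the most genes, while the three-way intersection gives the common response used for pathway enrichment.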

Detection of spatial selective sweeps with haplotype homozygosity statistics

JULIANNA PEREZ1, Mariana Harris2, Nandita Garud3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Computational Medicine, UCLA

3 Department of Ecology and Evolutionary Biology, UCLA

Adaptation occurs when a species evolves to become more fit in its environment. Detecting adaptation reveals the forces that shape populations, with applications to human populations and disease. Adaptation leaves behind selective sweeps: genomic signatures that are distinct from those expected under neutrality. Recently, the haplotype homozygosity statistic H12 was developed to detect hard and soft sweeps (Garud et al., 2015). In a hard sweep, a single haplotype with a selective advantage rises to high frequency in the population, whereas in a soft sweep several haplotypes rise in frequency simultaneously. H12 is computed in windows defined by a fixed number of SNPs; the statistic has power in numerous evolutionary scenarios, but it may miss complete hard sweeps because diversity is depleted around such a sweep, so SNP-based windows there can span many base pairs. We propose a new statistic, XP-H12, in which window size is defined by the diversity of a second, distinct population, which may make complete hard sweeps easier to detect. We tested the ability of XP-H12 to detect hard and soft sweeps in several evolutionary scenarios and found that XP-H12 outperforms H12 in detecting complete sweeps when the selection coefficient is ≥ 0.05. This research provides insight into the use of XP-H12 for detecting sweeps in populations, highlighting cases where H12 might miss sweeps that XP-H12 is able to detect.
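For reference, H12 in a single window can be computed from haplotype frequencies: the frequencies of the two most common haplotypes are pooled, and then squared frequencies are summed (Garud et al., 2015). The sketch below assumes haplotypes are given as allele strings over one window; how windows are chosen (by SNP count for H12, or by the diversity of another population for XP-H12) is not shown. The plain homozygosity H1 is included for comparison.

```python
from collections import Counter


def haplotype_freqs(haplotypes):
    """Haplotype frequencies in one window, most common first."""
    n = len(haplotypes)
    return sorted((c / n for c in Counter(haplotypes).values()), reverse=True)


def h1(haplotypes):
    """Haplotype homozygosity: sum of squared haplotype frequencies."""
    return sum(p * p for p in haplotype_freqs(haplotypes))


def h12(haplotypes):
    """H12: pool the two most frequent haplotypes, then sum squared frequencies."""
    freqs = haplotype_freqs(haplotypes)
    top = freqs[0] + (freqs[1] if len(freqs) > 1 else 0.0)
    return top ** 2 + sum(p * p for p in freqs[2:])
```

In a soft sweep with two equally common haplotypes, H1 is 0.5 while H12 is 1.0, which is why pooling the top two haplotypes helps detect soft sweeps.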

Microbial diversity across Cold Creek

KAYSA PFANNMULLER1, Ricky Wolff2,3, Nandita Garud2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Ecology and Evolutionary Biology, UCLA

3 Bioinformatics Interdepartmental PhD Program, UCLA

Microbial communities are diverse across time and space. In environmental microbiomes, this diversity plays a role in several community functions, including nutrient cycling, nitrogen fixation, and carbon sequestration. Characterizing microbial diversity across landscapes can help us better understand the functional roles of organisms within the microbiome. In this study, microbial samples were taken from a series of thirteen pools in Cold Creek, a breeding site for highly diverse amphibian populations in the Santa Monica Mountains. Of these samples, eight were sequenced. The raw reads were then mapped to reference genomes with the MIDAS software. The results of this mapping were inconclusive, likely because of the human-centric nature of the reference database. Because environmental microbiomes are highly diverse and may contain many unknown species, we instead assembled genomes de novo from the raw reads of the eight samples with the Anvi’o software. After this assembly, the reads were mapped to the assembled genomes, and species abundances were inferred with Anvi’o. This approach allowed us to make taxonomic inferences: some of the most common organisms belonged to the genera Methyloceanibacter and Chryseolinea, the order Nitrospirales, and the class Actinomycetia.

A machine-learning based pairwise phenotypic similarity score for disease-associated genetic variants

AAHNA RATHOD1, Luke Li2, Jason Ernst3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Bioinformatics Interdepartmental Program, UCLA

3 Department of Biological Chemistry, David Geffen School of Medicine, UCLA

Genome-wide association studies (GWAS) investigate associations between specific phenotypic traits and common variants in the human genome. These genetic variants can alter gene expression through histone modifications, and such changes can be reflected in chromatin state annotations. We reason that variants associated with the same trait should share functional similarities. We therefore propose a machine-learning framework that computes a phenotypic similarity score for a pair of variants based on their functional annotations. We took variants from the EMBL-EBI GWAS Catalog and annotated them with the recently developed universal ChromHMM state segmentation. We trained the model to distinguish pairs of variants associated with the same trait (“positive pairs”) from pairs associated with different traits (“negative pairs”), using the correspondence between the two variants’ ChromHMM states as input features. Preliminary analysis indicates that positive and negative pairs have significantly different distributions of training features. Our results will offer insight into whether epigenomic features can predict whether variants share a phenotypic association.
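One way to turn a variant pair into training features is sketched below. It assumes each variant carries a single universal ChromHMM state number (1-based) and encodes the unordered pair of states as a one-hot vector; this encoding, the example variant IDs, and the traits are hypothetical, and the classifier itself is not shown.

```python
from itertools import combinations


def pair_features(state_a, state_b, n_states):
    """One-hot encoding of the unordered pair of ChromHMM states (1-based).
    Pairs (i, j) with i <= j are laid out row by row in an upper triangle."""
    i, j = sorted((state_a - 1, state_b - 1))
    index = i * n_states - i * (i - 1) // 2 + (j - i)
    vec = [0] * (n_states * (n_states + 1) // 2)
    vec[index] = 1
    return vec


def build_pairs(variants, n_states):
    """variants: list of (variant_id, trait, chromhmm_state).
    Returns (features, labels); label 1 marks a same-trait (positive) pair."""
    features, labels = [], []
    for (_, trait_a, s_a), (_, trait_b, s_b) in combinations(variants, 2):
        features.append(pair_features(s_a, s_b, n_states))
        labels.append(1 if trait_a == trait_b else 0)
    return features, labels
```

Any standard binary classifier could then be trained on these features, and its predicted probability used as the pairwise similarity score.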

Analysis of methylomic data in .fastq files using WG-Blimp

RAMANUJ SARKAR1, Patrick Allard2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Molecular Biology Institute, UCLA

Bisphenol A (BPA) has been associated with changes in the frequency of gene expression in mouse epiblast-like cells (EpiLCs), without a change in effect size compared with control EpiLCs. This project analyzed previously collected methylomic data from the two groups of EpiLCs and identified regions that are differentially methylated between them, to determine whether methylation differences correspond to the changed expression frequencies. We attempted to use the end-to-end analysis pipeline wg-blimp to create methylation reports that would show whether the two were correlated. The results were inconclusive.

Using whole-genome bisulfite sequencing to investigate the effect of BPA on germ cells

MEDHINI SOSALE1, Patrick Allard2,3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 UCLA Institute for Society and Genetics

3 UCLA Molecular Biology Institute

Bisphenol A (BPA) is a hazardous chemical that causes reproductive dysfunction over multiple generations. This project applies whole-genome bisulfite sequencing (WGBS) to investigate the epigenetic mechanisms underlying this transgenerational inheritance. WGBS can comprehensively identify methylated cytosines and reveal how a treatment alters these methylation events. We compared the effectiveness of two WGBS pipelines, wg-blimp and BSBolt, in detecting differentially methylated regions (DMRs) between untreated and BPA-exposed in vitro mouse primordial germ cells. While wg-blimp is designed to run a complete WGBS pipeline, BSBolt requires each step to be performed individually. I found that wg-blimp is harder to troubleshoot, whereas BSBolt is more time-consuming. Using BSBolt, I obtained FastQC quality-control results and aligned the reads to a reference genome as part of the WGBS pipeline. Once the pipeline is complete, DMRs will be visualized and compared to determine how BPA exposure is reflected in the genome.

Genome-wide Chromatin Architecture in Patients with HIV and Myocardial Infarction Nominates Immunometabolic Regulatory Networks in Disease Pathogenesis

VIVIEN SU1,2, Zhengyi Zhang2, Judith Currier3, Janine Trevillyan3, and Tamer Sallam2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Departments of Medicine and Physiology, David Geffen School of Medicine, UCLA

3 Division of Infectious Diseases, David Geffen School of Medicine, UCLA

People with human immunodeficiency virus (HIV) have an increased risk of myocardial infarction (MI) independent of traditional risk factors for heart disease such as dyslipidemia, hypertension, and diabetes. HIV is characterized by unique perturbations in innate and adaptive immune responses that can influence plaque stability, but these factors are not well captured by routine cardiovascular risk prediction models. To better elucidate the mechanisms of MI in patients with HIV and identify potential predictive markers, we interrogated peripheral blood mononuclear cells (PBMCs) from patients with HIV in the AIDS Clinical Trial Group with or without MI. PBMCs were collected within 3 months prior to a recorded MI from patients, or from matched controls enrolled in the same protocol. ATAC-seq and RNA-seq were performed using optimized protocols. We obtained high-quality sequencing data from 24 patient samples, 13 with MI and 11 without MI. Principal component analysis (PCA) of the full ATAC-seq profiles verified that the chromatin accessibility signatures distinguished the MI and non-MI groups. The chromatin accessibility profiles were further partitioned into 8 distinct clusters by k-means clustering, and one cluster (“cluster 4”) showed higher chromatin accessibility in the MI group. Functional and pathway annotation revealed that the gene signature of cluster 4 was significantly enriched for energy homeostasis terms and pathways.
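Given k-means cluster labels for accessible regions, the cluster that is more accessible in the MI group can be found by comparing group means, as in this sketch. The matrix layout and values are hypothetical, clustering is assumed to have been done already, and a real analysis would add normalization and a statistical test.

```python
from statistics import fmean


def most_mi_biased_cluster(accessibility, peak_clusters, sample_is_mi):
    """accessibility: one row per peak, one normalized value per sample.
    peak_clusters: cluster label per peak (e.g. from k-means with k = 8).
    sample_is_mi: True for MI samples, False for non-MI samples.
    Returns (cluster, mean(MI) - mean(non-MI)) for the most MI-biased cluster."""
    diffs = {}
    for cluster in set(peak_clusters):
        rows = [r for r, c in zip(accessibility, peak_clusters) if c == cluster]
        mi = [v for r in rows for v, is_mi in zip(r, sample_is_mi) if is_mi]
        non_mi = [v for r in rows for v, is_mi in zip(r, sample_is_mi) if not is_mi]
        diffs[cluster] = fmean(mi) - fmean(non_mi)
    best = max(diffs, key=diffs.get)
    return best, diffs[best]
```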

Using Machine Learning Classification Approaches for Prediction of Obstructive Sleep Apnea from Mean Diffusivity MRI Images

SARTHAK TIWARI1,2, Bhaswati Roy3, Rajesh Kumar3,4,5, Daniel Tward6,7

  1. BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA
  2. Department of Biomedical Engineering, University of Utah
  3. Department of Anesthesiology & Perioperative Medicine, UCLA
  4. Department of Radiological Sciences, UCLA
  5. Department of Bioengineering, UCLA
  6. Department of Computational Medicine, UCLA
  7. Department of Neurology, UCLA

Obstructive sleep apnea (OSA) is a disorder that affects 10% of adults. Early diagnosis and treatment can lower the risk of associated diseases and brain damage, but current diagnostic methods are costly and time-consuming. MRI is a promising alternative that may be faster and cheaper. Using 155 mean diffusivity images, we tested convolutional neural networks (CNN), logistic regression, and stationary linear discriminant analysis (LDA). We trained networks with combinations of noise, deformation, and affine augmentations and calculated receiver operating characteristic (ROC) curves. For each approach, the model with the highest area under the curve (AUC) was applied to the test data. The best test-set AUCs were 0.62 for logistic regression, 0.56 for the CNN, and 0.63 for LDA. Diffusion MRI shows promise for evaluating OSA, but its use is limited by small datasets, which reduce generalizability. Future work on transfer learning and larger datasets could make deep learning a promising approach for diagnosing OSA.
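The AUC values above can be computed without tracing the ROC curve explicitly: the area equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. A minimal sketch, independent of any particular model:

```python
def roc_auc(scores, labels):
    """Area under the empirical ROC curve via pairwise comparisons.
    labels: 1 for positive (e.g. OSA), 0 for negative (e.g. control)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75; an AUC near 0.5 means the scores barely separate the classes.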

Bioactive peptide-loaded nanoparticles for the inhibition of the local inflammatory microenvironment

MARY TORRES1, Weihao Yuan2, Alireza Moshaverinia2,3,4

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Laboratory of Biomaterials Innovation and Tissue Engineering (BITE), School of Dentistry, UCLA

3 Weintraub Center for Reconstructive Biotechnology, School of Dentistry, UCLA

4 Division of Advanced Prosthodontics, School of Dentistry, UCLA

Modifications to titanium dental implants have proliferated as a way to create more biomimetic conditions and encourage bone growth. However, activated pro-inflammatory macrophages in the local inflammatory microenvironment inhibit bone regeneration. This study explored whether ZIF-8 nanoparticles loaded with the RGD and TCP-25 peptides could modulate the inflammatory response governed by macrophages. Herein, mouse macrophages were polarized into their pro-inflammatory or pro-regeneration phenotype and cultured on titanium substrates with either ZIF-8 nanoparticles or ZIF-8 nanoparticles encapsulating the TCP-25 and RGD peptides. The gene expression levels of GAPDH, MAPK, Taz, TGFβ, and Smad3 were then measured, and we illustrated the signaling pathways associated with these proteins in macrophages. Our findings demonstrated that the TCP-25/RGD@ZIF-8 nanoparticles significantly suppress the pro-inflammatory polarization and promote the pro-regeneration polarization of the seeded macrophages via the MAPK and TGFβ/Smad3 signaling pathways. These findings may inform future studies on how to direct M1-to-M2 polarization on titanium substrates.

Cell-cell Communication Gene Regulatory Network Inference Based on Single Cell Multiomics

NING WANG1,2, Russell Littman1,2,3, and Xia Yang1,2,3,4,5

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Integrative Biology and Physiology, UCLA

3 Bioinformatics Interdepartmental PhD Program, UCLA

4 Computational and Systems Biology Interdepartmental PhD Program, UCLA

5 Molecular, Cellular and Integrative Physiology Interdepartmental PhD Program, UCLA

Gene regulatory networks (GRNs) are directed graphs that depict which genes regulate one another: nodes represent genes, and edges represent regulatory relationships between gene pairs. GRNs help illustrate how genes and their interactions contribute to health and disease. The advent of single-cell multiomics brings new opportunities to leverage multidimensional data to model GRNs across cell types. We designed and developed a new computational method, grnComm, that infers inter-cellular GRNs using gradient boosting decision trees based on combined information from single-cell RNA sequencing, spatial transcriptomics, and receptor-ligand databases. To benchmark performance, we measure the cosine similarity between the predicted and true expression of downstream genes, predicted from their upstream regulators, across cell types. grnComm will enable efficient integration of single-cell multiomics data to derive accurate and comprehensive GRNs across cell types, shedding light on the regulatory cell types, genes, and pathways that govern physiological homeostasis or disease etiology.
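The cosine-similarity benchmark can be sketched as follows, assuming each downstream gene's predicted and observed expression is summarized as one vector across cell types. The gene names are hypothetical, and the gradient-boosting predictor that produces the predictions is not shown.

```python
import math


def cosine_similarity(predicted, observed):
    """Cosine of the angle between two expression vectors (e.g. across cell types)."""
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm = (math.sqrt(sum(p * p for p in predicted))
            * math.sqrt(sum(o * o for o in observed)))
    return dot / norm if norm > 0 else 0.0


def benchmark(predicted_by_gene, observed_by_gene):
    """Mean cosine similarity over downstream genes present in both dicts."""
    genes = sorted(predicted_by_gene.keys() & observed_by_gene.keys())
    total = sum(cosine_similarity(predicted_by_gene[g], observed_by_gene[g])
                for g in genes)
    return total / len(genes)
```

Cosine similarity rewards getting the pattern across cell types right while ignoring overall scale, which suits comparisons between predicted and measured expression on different scales.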

Single-Cell Multiomic Analysis of the Effects of PFOA on Mouse Liver Gene Expression and Chromatin Accessibility

DARREN WIJAYA1,2, Russell Littman1,2,3, Graciel Diamante2, In Sook Ahn2, Kavya Immadisetty2, Guanglin Zhang2, Ingrid Cely2, Xia Yang1,2,3,4,5

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Integrative Biology and Physiology, UCLA

3 Bioinformatics Interdepartmental PhD Program, UCLA

4 Computational and Systems Biology Interdepartmental PhD Program, UCLA

5 Molecular, Cellular and Integrative Physiology Interdepartmental PhD Program, UCLA

Perfluorooctanoic acid (PFOA) is a common chemical used in various household products, most notably non-stick Teflon products. Its prevalence has caused widespread exposure, and it has been associated with various forms of toxicity and carcinogenicity, particularly in the liver. However, few studies have examined the effects of PFOA on the liver at single-cell resolution. In this study, we used a single-cell multiomic approach to investigate PFOA effects in the liver by analyzing differential gene expression and chromatin accessibility in the same cells. Using computational tools such as DecontX and SoupX, we removed ambient RNA contamination from the RNA data. We then clustered cells using a joint multiomic dimensionality reduction (UMAP) that draws on both the RNA and ATAC assays, and annotated cell types using a predetermined database of liver cell-type-specific marker genes.

Evaluating polygenic score methods in admixed populations

ZIQI XU1,2, Kangcheng Hou3, Yi Ding3, Bogdan Pasaniuc3,4,5,6

  1. Department of Computer Science, UCLA, Los Angeles, CA, USA
  2. BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA
  3. Bioinformatics Interdepartmental Program, University of California, Los Angeles (UCLA), Los Angeles, CA, USA
  4. Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  5. Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  6. Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA

Polygenic scores (PGS) have recently emerged as a powerful tool for predicting the genetic component of complex traits and diseases. However, PGS are usually trained in individuals of European ancestry and have shown limited utility in individuals of non-European ancestry. Applying PGS is especially problematic in recently admixed populations, in which individuals inherit a mosaic of ancestry segments (local ancestry) originating from multiple continental ancestral populations. Within admixed populations, PGS accuracy can decay as the proportion of European ancestry decreases, which prevents the equitable application of PGS. Best practices for applying PGS to admixed individuals remain under-explored, and we investigate them in this work. We performed simulation studies based on real genotypes from 500K individuals in the UK Biobank. We trained PGS models with LDpred2 on 370K individuals of European ancestry and compared two methods of applying PGS to admixed individuals with European and African ancestry: (1) total PGS, in which PGS weights are applied to the genotypes of admixed individuals regardless of local ancestry; and (2) partial PGS, in which PGS weights are applied only to genotypes in European local-ancestry segments (Marnetto et al., 2020, Nature Communications). We compared the performance of the two methods using the correlation between the PGS prediction and the simulated true phenotype, and we provide recommendations for applying PGS in admixed populations.
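The two scoring rules can be sketched per individual as follows. The sketch assumes phased 0/1 alleles for each haplotype, per-haplotype local-ancestry labels such as "EUR" and "AFR", and already-trained SNP weights (for example, from LDpred2); the toy values are hypothetical. A Pearson correlation helper is included for the evaluation metric.

```python
import math


def total_pgs(weights, hap1, hap2):
    """Total PGS: weights applied to all alleles regardless of local ancestry."""
    return sum(w * (a + b) for w, a, b in zip(weights, hap1, hap2))


def partial_pgs(weights, hap1, hap2, anc1, anc2, keep="EUR"):
    """Partial PGS: only alleles on haplotype segments whose local ancestry
    equals `keep` contribute to the score."""
    score = 0.0
    for w, a, b, la, lb in zip(weights, hap1, hap2, anc1, anc2):
        if la == keep:
            score += w * a
        if lb == keep:
            score += w * b
    return score


def pearson_r(x, y):
    """Pearson correlation, e.g. between PGS and simulated phenotypes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Scoring every admixed individual both ways and correlating each score vector with the simulated phenotype gives the head-to-head comparison described above.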

Evaluating Bayesian Sequential Testing Methods for Drug and Vaccine Safety Surveillance through Large-scale Simulations

KEYI XUE1, Fan Bu1,2, Marc Suchard1,2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Biostatistics, UCLA

Post-market safety surveillance of drugs and vaccines is important because many rare but high-risk adverse reactions may not be identified in clinical trials, whose sample sizes are limited. Surveillance is performed by sequentially analyzing observational data collected at discrete time points, conventionally with frequentist tests such as the maximized sequential probability ratio test (MaxSPRT). However, such approaches suffer from inflexible observation schedules and a lack of power. We therefore aim to develop a more flexible alternative based on Bayesian sequential tests. Through comprehensive simulations, we evaluate a Bayesian sequential testing framework in terms of testing error and timeliness under three data models commonly assumed in safety surveillance. We also explore the impact of different sample sizes, priors, and decision thresholds. We find that by “tuning” the thresholds and/or priors, the Bayesian approach can achieve low false positive rates while maintaining promising statistical power, demonstrating its flexibility and feasibility.
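As a simplified illustration of a Bayesian sequential test (not the framework evaluated here), the sketch below uses a beta-binomial model: at each look, the number of adverse events falling in an exposed window out of all events updates a Beta prior, and a signal is raised once the posterior probability that the exposed proportion exceeds its null value `p0` passes a threshold. The value `p0 = 0.5` (e.g., equal-length risk and control windows), the Beta(1, 1) prior, and the 0.95 threshold are assumptions; with integer Beta parameters, the posterior tail probability reduces to a binomial sum.

```python
from math import comb


def posterior_prob_exceeds(p0, alpha, beta):
    """P(p > p0) for p ~ Beta(alpha, beta) with integer alpha, beta >= 1,
    using P(p > p0) = P(Binomial(alpha + beta - 1, p0) <= alpha - 1)."""
    n = alpha + beta - 1
    return sum(comb(n, j) * p0 ** j * (1 - p0) ** (n - j) for j in range(alpha))


def bayesian_sequential_test(batches, p0=0.5, prior=(1, 1), threshold=0.95):
    """batches: sequence of (events_in_exposed_window, total_events) per look.
    Returns the index of the first look that signals, or None."""
    a, b = prior
    for look, (x, n) in enumerate(batches):
        a += x        # events attributed to the exposed window
        b += n - x    # events in the comparison window
        if posterior_prob_exceeds(p0, a, b) > threshold:
            return look
    return None
```

Raising the threshold or using a more skeptical prior lowers the false positive rate at the cost of later signals, which is the kind of tuning the simulations explore.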

Using Principal Component Analysis to Investigate Causes of Disseminated Coccidioidomycosis

AUDREY YANG1, Sarah J. Spendlove2,3, Samantha L. Jensen3,4, Diego Orellana3, Alexis V. Stephens5, George R. Thompson6,7,8, Royce H. Johnson9,10, Arash Heidari9,10, Rasha Kuran9,10, Valerie A. Arboleda2,3,4,11, and Manish J. Butte4,5

  1. BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA, Los Angeles, CA, USA
  2. Interdepartmental Bioinformatics Program, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  3. Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  4. Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  5. Division of Immunology, Allergy, and Rheumatology, Department of Pediatrics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  6. UC Davis Center for Valley Fever, UC Davis, Davis, CA, USA
  7. Department of Medical Microbiology and Immunology, UC Davis, Davis, CA, USA
  8. Department of Medicine and Division of Infectious Diseases, UC Davis, Davis, CA, USA
  9. Department of Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA
  10. Valley Fever Institute, Kern Medical, Bakersfield, CA, USA
  11. Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA

Coccidioidomycosis, or “valley fever,” is a fungal disease endemic to the Western Hemisphere. About 60% of cases are asymptomatic, 40% of infections cause pneumonic symptoms, and about 1% disseminate. Dissemination is associated with significant morbidity and mortality and may be driven by both environmental and genetic factors. Previous findings revealed a correlation between genetic ancestry and dissemination, and between dissemination and increased Th2 cell skewing (an imbalanced Th2:Th1 ratio favoring Th2). Here, we looked for correlations between genetic ancestry, Th2 skewing, and dissemination, and searched for rare genetic variants that may cause dissemination. We applied PCA to data from 86 patients classified as having disseminated or uncomplicated disease. We were unable to find a significant correlation between dissemination and Th2 skewing, possibly because of insufficient sample size or unaccounted-for factors contributing to the risk of dissemination. However, larger datasets and analyses of rare variants may still help identify the causes of dissemination.
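For readers unfamiliar with PCA, the sketch below finds the leading principal component by power iteration on the sample covariance matrix, using only the standard library. Rows are samples (for example, patients) and columns are features (for example, genotype-derived values); the input layout and toy data are hypothetical, and real analyses would typically use a numerical library and several components.

```python
import math


def first_principal_component(rows, iters=500):
    """Leading principal component of `rows` (samples x features) by power
    iteration on the sample covariance matrix. Returns (loadings, scores)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Non-uniform start vector; power iteration only fails if it is exactly
    # orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all features constant: no variance to explain
            break
        v = [x / norm for x in w]
    # Project each centred sample onto the component (its PC1 score)
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores
```

Plotting the PC scores colored by disseminated versus uncomplicated status is one way to look for group structure.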

A Computational Model to Investigate the Dynamic Crosstalk Between Autophagy and Apoptosis

HAOXUAN ZHANG1, Sharmila Venugopal2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Integrative Biology and Physiology, UCLA

Apoptosis is a highly regulated and irreversible process of cell death, while autophagy maintains cellular homeostasis by degrading and recycling cytoplasmic components such as aggregated and damaged proteins. The pivotal roles of both processes in cancer and neurodegenerative disorders have motivated extensive research on understanding and modeling their dynamics. However, there is no simple yet effective framework that models the interplay between them in neurons. Here we present an interaction network, applicable to neurons, that describes the dynamics of and among cellular stress, apoptosis, and autophagy. The model was developed through an extensive literature search and by implementing and integrating selected ordinary differential equation (ODE) models, each of which describes an important compartment of the network. We believe our network provides a framework on which experiments can be designed to substantiate the model and better understand the dynamics involved in neurodegeneration.
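To show what integrating coupled ODE compartments looks like in practice, here is a purely illustrative toy system, not the authors' model: stress S induces autophagy A, autophagy clears stress, and a pro-apoptotic signal P is driven by stress and dampened by autophagy. The equations, parameter values, and function name are hypothetical, and the system is integrated with a simple forward-Euler scheme.

```python
def simulate(stress_input=1.0, k_clear=1.0, t_end=50.0, dt=0.01):
    """Forward-Euler integration of a hypothetical 3-variable toy network:
    S = cellular stress, A = autophagy activity, P = pro-apoptotic signal.
    Returns (S, A, P) at t_end."""
    # Hypothetical parameters (illustrative only, not fitted to data)
    d_s, k_a, K_a, d_a = 0.1, 1.0, 1.0, 0.5
    k_p, K_p, K_i, d_p = 1.0, 1.0, 0.5, 0.2
    S = A = P = 0.0
    for _ in range(int(round(t_end / dt))):
        dS = stress_input - k_clear * A * S - d_s * S   # autophagy clears stress
        dA = k_a * S / (K_a + S) - d_a * A               # stress induces autophagy
        dP = k_p * S / (K_p + S) / (1.0 + A / K_i) - d_p * P  # autophagy dampens apoptotic signaling
        S, A, P = S + dt * dS, A + dt * dA, P + dt * dP
    return S, A, P
```

Comparing `simulate(k_clear=1.0)` with `simulate(k_clear=0.0)` shows how removing the autophagy-mediated clearance term leaves stress higher; in a fuller model, each term would be replaced by a literature-derived compartment model and a stiff-aware solver.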