2025 Bruins-In-Genomics Summer Undergraduate Research Program

2025 B.I.G. Summer Participants

| Lab PIs | Mentors | Students |
| VALERIE ARBOLEDA | Neerja Vashist | Kirin Chacko |
| BRUNILDA BALLIU | Lena Krockenberger | Lise Tucker |
| PAUL BOUTROS | Selina Wu | Oluwatimilehin Adefioye |
| JOSEPH CAPRIOLI | | Mashal Malik |
| TIMOTHY CHANG | Thai Tran | Abdallah Fares |
| CHRISTOPHER COLWELL | | Natalie Kim |
| DINO DI CARLO | Rajesh Ghosh | Pranava Jana |
| JASON ERNST | | Lisa Barooah |
| NANDITA GARUD | Mariana Harris & Maya Weissmann | Brendan Aeria |
| DANIEL GESCHWIND | Jacqueline Martin | Alexandru Georgescu |
| ROBERT GUNSALUS | | Avantika Mohan |
| NEIL HARRIS | Sam Vander Dussen | Azad Azargushasb |
| ALEXANDER HOFFMANN | Alex Gorin | Sophia Lambrecht |
| | Alex Gorin | Natalie Pham |
| | Helen Huang | Zhiyuan Zhu |
| JIMMY HU | Leah Ye | Manya Lalwani |
| | Leah Ye | Christian Suarez |
| KIRK LOHMUELLER | Chenlu Di | Clarissa Lai |
| ALDONS LUSIS | Dylan Sarver | Ushaswini Namburu |
| MATTEO PELLEGRINI | Yiqian Gu | Megan Huang |
| | | Charalampos Kiaris |
| | Lajoyce Mboning | Megan Mitchell |
| HAROLD PIMENTEL | Jingyou Rao | Joanna Rhim |
| SRIRAM SANKARARAMAN | Ziyuan Chen | Hanzhang Liu |
| WILLIAM SPEIER | Micah Vinet | Sarah Wu |
| MICHAEL TEITELL | Thang Nguyen | Prannay Veerabahu |
| HUNG TON-THAT | Yi-Wei Chen | Omri Kariv |
| YAN WANG | Ariana Waters | Sumayyah Borders |
| | Ariana Waters | Alima Deen |
| MICHAEL WELLS | Ana Rodriguez Vega | Sreyan Sarkar |
| DAVID WONG | Irene Choi | Trinity Chan |
| | Irene Choi | Kevin Trochez |
| GRACE XINSHU XIAO | Ting Fu | Angela Zhang |
| XIA YANG | Montgomery Blencowe & Daniel Ha | Tiffany Lin |
| | Montgomery Blencowe & Daniel Ha | Sherri Xu |

2025 B.I.G. Summer Poster Abstracts

Assessing Somatic Loss of Heterozygosity as a Driver in Fumarate Hydratase Deficient Renal Cell Carcinoma Development

OLUWATIMILEHIN ADEFIOYE1, Selina Wu2, Paul Boutros3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Medical Informatics, UCLA

3 Department of Human Genetics, David Geffen School of Medicine, UCLA

Fumarate hydratase (FH) is a tumor suppressor gene associated with fumarate hydratase-deficient renal cell carcinoma (FH-RCC), a rare and aggressive hereditary kidney cancer. Bi-allelic inactivation of FH, often involving a germline pathogenic/likely pathogenic variant and somatic loss of heterozygosity (LOH), is a well-studied model for FH-RCC tumor formation. However, the precise role of FH inactivation in FH-RCC development remains incompletely understood. Cancer drivers are genes that the American College of Medical Genetics (ACMG) links to tumor development. We performed whole-genome sequencing on paired tumor/normal samples from 13 FH-RCC patients to assess bi-allelic inactivation, focusing on loss of heterozygosity in FH and other ACMG-defined driver genes. Through detailed analysis of B-allele frequencies, we identified evidence of LOH at heterozygous loci in FH and PCSK9, verifying FH bi-allelic inactivation and highlighting notable early events in tumor evolution. The presentation of LOH provides valuable insight into developmental pathways driving FH-RCC and other pathologies.
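The B-allele frequency comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name, the heterozygosity window (0.4 to 0.6), and the shift threshold (0.2) are assumptions chosen only for illustration. The idea is that a locus that is heterozygous in the normal sample (BAF near 0.5) and strongly skewed toward 0 or 1 in the tumor is consistent with LOH.

```python
def loh_fraction(tumor_baf, normal_baf, het_window=(0.4, 0.6), shift=0.2):
    """Fraction of germline-heterozygous loci whose tumor BAF departs from 0.5.

    tumor_baf / normal_baf: B-allele frequencies at the same loci, in order.
    het_window and shift are illustrative thresholds, not published values.
    """
    low, high = het_window
    # Keep only loci that look heterozygous in the matched normal sample.
    informative = [t for t, n in zip(tumor_baf, normal_baf) if low <= n <= high]
    if not informative:
        return 0.0
    # Count loci where the tumor allele balance has moved away from 0.5.
    shifted = sum(1 for t in informative if abs(t - 0.5) >= shift)
    return shifted / len(informative)
```

A high fraction across a gene region would flag it for closer inspection; a real analysis would also account for tumor purity and copy number.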

Adefioye_Oluwatimilehin_BIG_Poster

Synonymous and Non-synonymous Variants Empower Deep Learning to Classify Selective Sweeps in Ancient Human DNA

BRENDAN AERIA1, Mariana Harris2,3, Maya Weissman3, Nandita Garud3,4

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Computational Medicine, UCLA

3 Department of Ecology and Evolutionary Biology, UCLA

4 Department of Human Genetics, UCLA

Beneficial mutations arise and rapidly spread through a population to high frequency in a process known as a selective sweep. Detecting selection from ancient human DNA (aDNA) can be challenging given large amounts of missing data and complex demography, but, if discoverable, could reveal the targets and rates of adaptation in human history. While existing methods, including deep learning, have shown promise, they have so far overlooked a rich source of information: the distinct evolutionary patterns of synonymous versus non-synonymous mutations. By comparing a CNN with access to this information to one without, we aim to demonstrate the value of synonymous versus non-synonymous patterns and improve deep learning-based discovery of selective sweeps. This could clarify our evolutionary history while offering a powerful new method for other species.

Brendan-Aeria-BIG-Poster6

Discovering Injury-Specific Brain States After Traumatic Brain Injury Through Deep Learning Network Analysis

AZAD AZARGUSHASB¹, Samuel Vander Dussen², Afshin Paydar²,³, Neil G. Harris²,³

¹ Departmental Scholar, Computational and Systems Biology B.S. and Bioinformatics M.S., UCLA

² UCLA Brain Injury Research Center, Department of Neurosurgery, David Geffen School of Medicine, UCLA

³ Intellectual Development and Disabilities Research Center, UCLA

Functional brain networks reorganize dynamically following traumatic brain injury (TBI), but how, why, and when these changes occur after injury remains to be determined. We hypothesized that TBI induces distinct, measurable brain states that differentially evolve during recovery. One of the main issues is how to characterize these network-level changes, since there is no agreed-upon methodology. We applied a Gaussian Mixture Variational Autoencoder (GMVAE) to forelimb-evoked and resting-state functional magnetic resonance imaging (fMRI) data from rats with controlled cortical impact injury, comparing the effects of constraint-induced movement therapy (CIMT) to untreated and sham-injured rats across three post-injury timepoints (7, 21, 49 days). GMVAE identified 9 distinct brain states from forelimb-evoked fMRI data. State 6 emerged exclusively within injured animals and was absent in all sham animals, representing an injury-specific network configuration. State distributions shifted temporally: State 1 dominated acutely (36.8% at 7d), State 4 emerged during subacute recovery (30.2% at 21d), and State 9 increased chronically (19.3% at 49d). Graph theory analysis revealed state-specific network topologies through modularity and clustering metrics. These brain states could provide biomarkers for tracking TBI progression and evaluating therapeutic interventions.

AZARGUSHASB

Leveraging Bulk Data to Build a Single-Cell Methylation Clock

LISA BAROOAH¹, Jason Ernst²

¹BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA
²Department of Biological Chemistry, David Geffen School of Medicine, UCLA

DNA methylation clocks in aging research are traditionally built from bulk tissue. In 2013, Dr. Steve Horvath transformed the field by designing an epigenetic clock based on 353 CpG sites that were highly correlated with chronological age. Over the past decade, single-cell methylation clocks have emerged but have faced coverage limitations. In this project, we designed a single-cell methylation clock and applied it to a pseudobulked single-cell dataset from reprogrammed fibroblasts. First, we replicated Horvath’s clock. To address coverage issues, we cross-referenced Horvath’s 353 CpG sites with methylation profiles from an IHEC (International Human Epigenome Consortium) bulk dataset. We identified similar CpG sites by calculating the smallest Euclidean distances. Ultimately, selecting the top 20 most similar CpG sites with a Gaussian weighting yielded the strongest correlation with biological age. This approach mitigates imputation challenges inherent to single-cell data while enabling more accurate age estimation across reprogramming clusters.
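The site-matching step above (nearest CpG sites by Euclidean distance, combined with a Gaussian weighting) might look roughly like the following sketch. How the weights enter the clock is not stated in the abstract, so the weighted-mean substitution shown here, the function names, and the `sigma` bandwidth are all assumptions for illustration.

```python
import math

def nearest_sites(target_profile, candidate_profiles, k=20):
    """Return the k (distance, site_name) pairs closest to a clock CpG.

    target_profile: methylation values of one clock CpG across bulk samples.
    candidate_profiles: dict of site_name -> values across the same samples.
    """
    ranked = sorted(
        (math.dist(target_profile, profile), name)
        for name, profile in candidate_profiles.items()
    )
    return ranked[:k]

def gaussian_weights(distances, sigma=1.0):
    """Normalized Gaussian kernel weights; sigma is an assumed bandwidth."""
    raw = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

def weighted_methylation(neighbors, observed, sigma=1.0):
    """Weighted mean over the neighbor sites actually covered in a cell.

    observed: dict of site_name -> methylation value for covered sites only.
    Returns None when no neighbor is covered.
    """
    covered = [(d, name) for d, name in neighbors if name in observed]
    if not covered:
        return None
    weights = gaussian_weights([d for d, _ in covered], sigma)
    return sum(w * observed[name] for w, (_, name) in zip(weights, covered))
```

Closer sites receive larger weights, so a sparsely covered cell can still contribute an estimate for a clock CpG it does not cover directly.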

lisabarooah_bigsummer-2

Assessing the Predictive Power of the Oral Microbiome for Cardiometabolic Risk Using Machine Learning Approaches

SUMAYYAH BORDERS1,2,3, Yan Wang4

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Biological Sciences, University of Pittsburgh

3 Department of Computer Science, University of Pittsburgh

4 Public and Population Health, Division of Oral and Systemic Health, School of Dentistry, UCLA

Cardiovascular disease (CVD) is the leading cause of death in the United States, encompassing several heart and blood vessel problems from coronary heart disease to heart failure. While prior research has focused heavily on the gut microbiome, emerging evidence suggests the oral microbiome may also influence cardiometabolic health. This study explores the potential of predicting CVD risk using oral microbiome data from the NHANES 2009-2012 dataset and supervised machine learning. We trained and evaluated four machine learning models on phylum-level relative abundance data: logistic regression, random forest, XGBoost, and Gaussian Naïve Bayes. Our results show that XGBoost achieved the highest performance (AUC = 0.9989). However, random forest revealed interpretable microbial associations, with nearly half of the top 20 most important features belonging to distinct phyla. These findings highlight the predictive power of oral microbiome profiles for cardiometabolic risk stratification and support further investigation into its contributions to CVD.
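For readers unfamiliar with the AUC metric reported above, a minimal sketch of how it can be computed from labels and predicted scores follows. It uses the pairwise (Mann-Whitney) definition rather than any particular library, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation; values should be computed on held-out data to be meaningful.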

S_Borders_BIGposter

Identifying Gene Expression Differences in Blood and Fibroblasts of Patients with Bainbridge-Ropers Syndrome, a Rare Neurodevelopmental Disorder

KIRIN CHACKO1, Neerja Vashist2, Angela Wei2,3,4, Michael Reyes2,5, Maneesha Thaker2, Isabella Lin2,3,4, Valerie Arboleda2,3,4

1BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2Department of Pathology and Laboratory Medicine, DGSOM, UCLA

3Department of Human Genetics, DGSOM, UCLA

4Department of Computational Medicine, DGSOM, UCLA

5College of Medicine, University of the Philippines Manila, Philippines

Additional Sex Combs-Like 3 (ASXL3) encodes a chromatin modifier protein that plays an important role in epigenetic regulation and transcriptional activation. De novo truncating mutations in ASXL3 cause Bainbridge-Ropers Syndrome (BRS), a rare neurodevelopmental disorder that affects various biological processes in addition to neurological development. However, the mechanism by which the mutation causes BRS is poorly understood. Previous research has established that de novo mutations alter the transcriptome in patient fibroblasts. We performed RNA sequencing of BRS blood and fibroblasts to identify differentially expressed genes across tissues. Results showed that BRS blood and fibroblasts had 351 and 161 differentially expressed genes (DEGs), respectively. Gene ontology revealed immune and neurological genes dysregulated in BRS blood, while fibroblasts showed additional dysregulation in organ development, craniofacial development, and gene regulation factors. Pathogenic variants in ASXL3 affect the transcriptome of genes related to development and cell differentiation across tissues.

Kirin-Chacko-BIG-Summer-2025-Poster-

Machine learning model for gastric cancer prediction using cfDNA

TRINITY CHAN1, KEVIN TROCHEZ1, Irene Choi2, Neeti Swarup2, David T.W. Wong2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 School of Dentistry, UCLA

Gastric cancer (GC) is often diagnosed at late stages, leading to a poor prognosis. Cell-free DNA (cfDNA) in saliva contains rich, non-invasive biomarkers for the early detection of GC. In particular, we investigate the alterations of cfDNA fragmentation patterns caused by GC. Preliminary analysis suggests a significant difference between GC and non-GC patients. Using these non-mutational features, we employed a machine learning classifier to aid in the detection of GC. Class balance and models were optimized using SMOTE, ensemble methods, domain-specific engineering, and grid search with cross-validation, using the features that differed most significantly, including fragmentomics, chromosomal coverage, motif, and microbial abundance. These features form the basis for a classification model that ultimately achieved an AUC of 0.81, suggesting the model can accurately classify GC and non-GC patients.

Chan

Network Analysis of Oral Microbiome Interactions in Early and Advanced Periodontitis

ALIMA DEEN1,2, Yan Wang3

1 B.I.G. Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Biometry and Statistics, College of Agriculture and Life Sciences, Cornell University

3 Public and Population Health, Division of Oral and Systemic Health, School of Dentistry, UCLA

Periodontitis is a dysbiosis-driven disease in which disruptions to the oral microbiome’s interconnected community alter microbial interactions. Since the microbiome functions as a network of interacting taxa, network analysis offers a powerful framework to reveal features linked to community resilience. Here, we determine how oral microbiome network structure and the influence of key taxa differ between individuals with early and advanced periodontitis. Using National Health and Nutrition Examination Survey (NHANES) 2009–2010 data, we constructed correlation-based networks at the phylum level using SparCC and applied spectral graph theory and network metrics to characterize groups. Then, perturbation experiments simulated removal of phyla to assess their effect on network integrity. We found that in early periodontitis, Fusobacteria removal enhanced stability, while Spirochetes supported local clustering; in advanced periodontitis, Fusobacteria loss weakened the network. These findings show how network-based approaches can identify vulnerabilities in the oral microbiome, informing precision-targeted interventions for periodontitis.

Deen_uclabig_poster

Investigating Medications As An Additional Data Modality In Positive Unlabeled Learning for Predicting Alzheimer’s Disease in Electronic Health Records

ABDALLAH FARES1, Thai Tran2,3, Mingzhou Fu2,3, Sriram Sankararaman4, David A. Elashoff4,5, Keith Vossel2, Timothy Chang2

1BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2Department of Neurology, David Geffen School of Medicine, UCLA

3Medical Informatics Home Area, Department of Bioinformatics, UCLA

4Computational Medicine, UCLA

5Department of Biostatistics, UCLA

Alzheimer’s disease (AD) is the most common neurodegenerative disease. Early diagnosis is critical for improved outcomes, with predictive modeling offering promising detection potential. Positive-unlabeled learning (PUL) can leverage unlabeled data but has seldom been applied to real-world electronic health records (EHR) data. Previously, we proposed a semi-supervised positive unlabeled learning (SSPUL) framework coupling PUL with bias mitigation for equitable prediction of undiagnosed AD at UCLA Health, but this relied exclusively on demographics and diagnostic data, limiting model performance. Here we extend the SSPUL framework to incorporate medication data alongside diagnostics, leveraging elastic net to address inherent collinearity between modalities. Using elastic net for feature selection reduced the feature set while producing statistically similar results. However, using medications and diagnoses together as features did not aid AD detection in this setting, likely because of their collinearity. Elastic net feature selection could be useful as an addition to the SSPUL pipeline as more data modalities are incorporated.

Fares

Genomic Network Analysis of Three Key Genes In Neuropsychiatric Disorders

ALEXANDRU GEORGESCU1,2, Jaqueline Martin3,5, Ramin Ali Marandi Ghoddousi2,3,4,5, Dan Geschwind2,3,4,5#

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Neurology, David Geffen School of Medicine, UCLA

3 Department of Genetics, David Geffen School of Medicine, UCLA

4 Program in Neurobehavioral Genetics, Semel Institute, DGSOM, UCLA

5 Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, UCLA

The SSPsyGene Initiative, launched by the NIMH, aims to functionally characterize ~100 genes associated with autism and schizophrenia. As part of this effort, we generated ~700 clonal knockout human stem cell lines via CRISPR/Cas9. To contribute to the consortium’s goals, here we perform an in-depth analysis of 3 genes critical to neuropsychiatric disorders (NPDs): SHANK3, ARID1B, and SMARCC2. These genes play key roles in synaptic density and chromatin remodeling. Despite their importance, the biological pathways that converge downstream of gene perturbations, particularly those affecting seemingly unrelated functions but leading to similar phenotypes, remain poorly understood. Using R, we applied genomic functional analysis tools, including Gene Ontology (GO), Gene Set Enrichment Analysis (GSEA), and Weighted Gene Co-expression Network Analysis (WGCNA), to identify shared molecular functions and biological processes. Our findings aim to provide insight into the convergent mechanisms of gene regulation in the developing nervous system and the etiology of NPDs.

Georgescu

K-mer Profiling: A Novel Approach for Cell Type Classification and Marker Discovery in Single-Cell RNA Sequencing Data

MEGAN HUANG1, Yiqian Gu2, Matteo Pellegrini2

1B.I.G. Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2Department of Molecular, Cell and Developmental Biology, UCLA

Single-cell RNA sequencing (scRNA-seq) enables the classification of cell types and states by quantifying gene expression on a per-cell basis. Traditionally, this is achieved by analyzing gene counts and identifying gene markers that distinguish between cell populations. This project investigates whether cell type marker discovery could instead be performed using k-mer counts (counts of short nucleotide substrings) rather than conventional gene counts. We developed a computational pipeline that generates comprehensive k-mer count profiles from scRNA-seq data and applies these profiles for downstream clustering and marker identification. Our analyses revealed that clustering cells based on k-mer counts generates clusters similar to those seen with gene count-based methods, yielding similar cell type annotations. However, the sets of cluster-specific markers identified with k-mer analysis differed from traditional gene markers, suggesting a novel perspective on cellular identity. These findings demonstrate that k-mer count-based approaches can provide a parallel method for cell type classification and marker discovery. This may uncover previously overlooked cell type markers and enable more flexible scRNA-seq analysis, ultimately broadening the toolkit for exploring cell heterogeneity and transcriptomic complexity.
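As a rough illustration of what a per-cell k-mer count profile is, the sketch below counts k-mers across the reads assigned to each cell. It is a simplified stand-in for the pipeline described above, not the pipeline itself; the function names and the input layout (a dict of cell barcode to read sequences) are assumptions.

```python
from collections import Counter

VALID_BASES = set("ACGT")

def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across a list of reads.

    k-mers containing ambiguous bases such as N are skipped.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return counts

def per_cell_profiles(reads_by_cell, k):
    """Build one k-mer count profile per cell barcode."""
    return {cell: kmer_counts(reads, k) for cell, reads in reads_by_cell.items()}
```

The resulting cell-by-k-mer table plays the same role as a cell-by-gene count matrix in standard clustering workflows, which is why the downstream steps can stay largely unchanged.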

Huang

High-Throughput Functional Screening of Calcium Biosensors via PicoShell-Enabled Directed Protein Evolution

PRANAVA JANA1,2, Rajesh Ghosh1,3, Dino Di Carlo1,3,4,5,6

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Microbiology, Immunology, and Molecular Genetics, UCLA

3 Department of Bioengineering, UCLA

4 Department of Mechanical Engineering, UCLA

5 California NanoSystems Institute (CNSI), UCLA

6 Jonsson Comprehensive Cancer Center, UCLA

Traditional colony screening technologies suffer from low throughput and limited functional resolution, hindering the discovery of novel protein biosensor functions through directed evolution. Our lab developed a high-throughput, single-cell screening workflow using PicoShells: hollow, porous hydrogel microparticles that encapsulate individual bacterial clones while supporting nutrient exchange, reagent diffusion, clonal outgrowth, and fluorescence-based functional assays within intact compartments. This overcomes a limitation of droplet-based microfluidics, which lacks solution exchange. The platform was implemented to evolve jGCaMP7, a genetically encoded calcium indicator, for enhanced dynamic response and signal-to-noise ratio. We constructed a mutagenized GCaMP library targeting the N- and C-linker regions of the protein, and performed sequential sorts using FACS-based screening within PicoShells by alternating between calcium-rich and calcium-free buffers. We analyzed over 1.5 million encapsulated variants and progressively enriched for clones with superior calcium sensitivity. Within 3 screening cycles, we identified high-performing biosensors exhibiting up to 10-fold improved ΔF/F0 relative to the parent variant. These screens enable deep sequence-modeling and predictive-biosensor optimization.

Pranava-Jana-B.I.G.-Summer-2025-Poster

Genome-Wide Identification of Fusobacterium nucleatum Factors Associated with Oral Squamous Cell Carcinoma

OMRI KARIV1, Yi-Wei Chen2, Hung Ton-That2,3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Division of Oral Biology and Medicine, School of Dentistry, UCLA

3 Molecular Biology Institute, UCLA

Fusobacterium nucleatum (Fn) is an oral commensal bacterium known to promote both oral and extraoral diseases such as periodontitis, colorectal cancer, and oral squamous cell carcinoma (OSCC). Although OSCC is the most prevalent and aggressive oral malignancy, the Fn bacterial factors that promote it are still largely unknown. To identify and characterize OSCC-promoting Fn virulence factors, we applied a sequence-defined Fn-Tn5 transposon library. A genome-wide high-throughput screening approach was adopted by infecting a GFP-expressing OSCC cell line with this library. By monitoring GFP intensity, we identified mutants with promoted or attenuated spheroid growth. Secondary and tertiary screenings examining spheroid formation, growth, and 2D and 3D cell migration excluded false-positive mutants. Two candidates that consistently showed cancer promotion or attenuation were selected for generating marker-less deletion mutants for future confirmation. This large-scale screening identified Fn virulence factors contributing to our understanding of Fn-promoted OSCC, ultimately aiming to illuminate potential OSCC therapeutic targets.

Kariv

Using k-mer Frequencies to Determine Evolutionary Relationships

CHARALAMPOS KIARIS1, Matteo Pellegrini1, 2, 3, 4

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Institute for Genomics & Proteomics, UCLA

3 Department of Molecular, Cell & Developmental Biology, UCLA

4 Department of Human Genetics, David Geffen School of Medicine, UCLA

Traditionally, phylogenetic trees were constructed by examining specific, highly conserved genes. Recently, it was shown that the whole genome can be used to determine phylogenetic relationships, alleviating biases due to differential selective pressure at specific loci. Both methods might miss patterns that appear within the structure of the genome itself, ignoring possible evolutionary information. Patterns often appear when examining smaller sections, such as k-mers, nucleotide substrings of length k. This project focuses on these genomic patterns by using k-mer frequencies to construct phylogenetic relationships. With MKMC, a Python k-mer frequency analysis program, we examined 447 mammalian species through different distance calculations and clustering algorithms. Our results indicate that, using 10-mer counts, the reconstructed tree split superorders (Euarchonta, Glires, Scrotifera, Afrotheria) relatively accurately but was less accurate at intraorder differentiation. This shows that patterns among different species exist not only in genes, but also in the general frequencies of k-mers.
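To make the idea of comparing species by k-mer frequencies concrete, here is a minimal sketch that turns each sequence into a normalized k-mer frequency vector and computes pairwise Euclidean distances. It is not MKMC and makes no claim about MKMC's interface; the function names are hypothetical, Euclidean distance is only one of several possible choices, and the dense vector used here is practical only for small k.

```python
import math
from collections import Counter
from itertools import product

def kmer_frequencies(sequence, k):
    """Normalized frequency of every possible ACGT k-mer in a sequence."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[kmer] / total for kmer in all_kmers]

def distance_matrix(genomes, k):
    """Pairwise Euclidean distances between k-mer frequency vectors.

    genomes: dict of species name -> sequence.
    Returns (sorted names, square matrix in the same order).
    """
    names = sorted(genomes)
    freqs = {name: kmer_frequencies(genomes[name], k) for name in names}
    matrix = [[math.dist(freqs[a], freqs[b]) for b in names] for a in names]
    return names, matrix
```

A distance matrix of this kind can then be passed to a hierarchical clustering or neighbor-joining routine to produce a tree.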

Time-Series Characterization of Circadian Temperature Rhythms During Pregnancy

NATALIE KIM1, Laura Cortes2, Stephanie Correa2, Christopher S. Colwell3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Integrative Biology and Physiology, UCLA

3 Department of Psychiatry and Biobehavioral Sciences, UCLA

The suprachiasmatic nucleus (SCN), which acts as the body’s master circadian clock, drives circadian rhythms in core body temperature (CBT) through a network of central and peripheral pathways. The SCN projects to the medial preoptic area (MPOA), and this structure integrates SCN timing signals with thermal sensory input to regulate CBT. In this study, I first sought to determine the impact of pregnancy on rhythms in CBT using time series analysis tools including the Chi-Square periodogram and Rhythmicity Analysis Incorporating Nonparametric methods (RAIN). The analysis showed that pregnancy dramatically disrupted circadian rhythms in CBT and that the rhythms were restored within a few days after birth. Next, I will determine the impact of selectively blocking estrogen signaling in the MPOA. We hope that these mechanistic studies will provide insight into maternal physiology, circadian biology, and potential links to pregnancy complications including sleep disturbances.
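The Chi-Square periodogram mentioned above can be sketched as follows. This is a minimal version of the commonly used Sokolove-Bushell statistic, assuming evenly spaced samples and candidate periods expressed in number of samples; it is not the analysis code used in the study, and the function name is hypothetical.

```python
def chi_square_periodogram(series, periods):
    """Qp statistic for each candidate period (in samples).

    The series is folded into K rows of length P. Qp compares the variance
    of the P column means with the total variance:
        Qp = K * N * sum((M_h - M)^2) / sum((x_i - M)^2),  N = K * P.
    Larger Qp means stronger periodicity at P.
    """
    result = {}
    for p in periods:
        k = len(series) // p
        if k < 2:
            continue  # need at least two full cycles to fold the data
        n = k * p
        data = series[:n]
        mean = sum(data) / n
        total_ss = sum((x - mean) ** 2 for x in data)
        if total_ss == 0:
            result[p] = 0.0
            continue
        # Mean of each phase position across the k folded cycles.
        column_means = [sum(data[h::p]) / k for h in range(p)]
        between_ss = sum((m - mean) ** 2 for m in column_means)
        result[p] = k * n * between_ss / total_ss
    return result
```

With hourly temperature samples, scanning periods around 24 and checking whether the peak Qp exceeds a chi-square critical value with P - 1 degrees of freedom gives a simple test for a circadian rhythm; a flattened or absent peak would be consistent with the disruption reported during pregnancy.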

NatalieK_BIG_Poster_UPDATED

Evolutionary Constraint Highlights Noncoding Variant Enrichment Across Complex Traits

CLARISSA LAI1, Chenlu Di2, Kirk E. Lohmueller2,3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Dept of Ecology and Evolutionary Biology, UCLA

3 Dept of Human Genetics, David Geffen School of Medicine, UCLA

Mutations in noncoding DNA, especially in regions of the human genome conserved across species, are under negative selection, yet their phenotypic impact remains unclear. Building on recent work modeling the distribution of fitness effects (DFE) of noncoding mutations, we asked whether evolutionary constraint predicts enrichment of trait-associated variants across complex phenotypes. We identified putatively functional noncoding regions using ChromHMM annotations and classified them based on PhastCons scores as conserved in both primates and mammals, primates only, or mammals only. To evaluate enrichment, GWAS-associated variants were compared across conservation categories and non-conserved regions for schizophrenia and 78 additional complex traits. Preliminary findings show that primate-only conserved regions exhibit the strongest enrichment for GWAS-associated variants. Enrichment patterns were highly sensitive to the stringency used to identify causal variants, suggesting that both evolutionary constraint and statistical confidence shape the interpretability of noncoding variation. These findings support evolutionary models of constraint and inform strategies for variant prioritization.

Lai

The Role of Nuclear Laminal Rigidity in Compensatory Regeneration of the Mouse Incisor

MANYA LALWANI1, Christian R. Suarez1, Leah Ye2, Abinaya Thooyamani2, Jimmy K. Hu2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Division of Oral Biology, School of Dentistry, UCLA

The mouse incisor grows continually due to stem cells in the epithelial and mesenchymal tissue of the incisor. The epithelial stem cells differentiate into transit-amplifying cells (TACs), which become pre-ameloblasts and enamel-secreting ameloblasts. These populations exhibit distinct nuclear rigidity levels and cell behavior. This study investigates the role of nuclear lamins (A/C, B1, B2) in nuclear shape and cellular function. Epithelial knockout of all three isoforms resulted in pre-ameloblasts with shortened nuclei and altered nuclear spatial organization. ScRNA-seq of epithelial cells from control and knockout mice yielded 11 cell clusters. Gene-set enrichment of the clusters revealed upregulation of genes related to nuclear and cellular integrity. Cell trajectory analysis showed forking of the triple-knockout differentiation pathway at a relatively TAC-like cluster compared to the same control cluster. These findings help explain the triple-knockout phenotype and associate lamin loss with cellular plasticity, supporting a role for nuclear rigidity in regulating cell differentiation.

Lalwani, Suarez

Distinct but Overlapping Sets of Gene Expression Programs Drive Macrophage and Dendritic Cell Differentiation

SOPHIA LAMBRECHT1*, NATALIE PHAM1,5*, Benancio Rodriguez2, Aleksandr Gorin2,4, Alexander Hoffmann2,3

1BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2Department of Microbiology, Immunology, and Molecular Genetics, UCLA

3Institute for Quantitative and Computational Biosciences (QCBio), UCLA

4Department of Medicine, David Geffen School of Medicine, UCLA

5UC LEADS, Department of Molecular, Cellular, and Developmental Biology, UCSB

*These authors contributed equally to the work

Macrophages and dendritic cells originate from hematopoietic progenitor cells in the bone marrow, and mediate related immune system processes, including phagocytosis and antigen presentation. However, the specific genes and regulatory pathways involved in the differentiation of each cell type remain under investigation. We undertook an unbiased RNA-seq transcriptomic profiling approach to track macrophage and dendritic cell differentiation driven by three growth factors over the course of nine days. Using differential gene expression analysis, principal component analysis, and k-means clustering, we identified gene groups that are either common to all or specific to each growth factor pathway. Gene Ontology analysis identified functions associated with these genes. We found that NFkappaB mutants affected each growth factor pathway differently, suggesting that inflammatory dysregulation alters differentiation pathways. These findings serve as a starting point for a systematic analysis of myeloid differentiation to reveal key biomarkers and guide potential therapeutic intervention in inflammatory disease.

Sophia-and-Natalie-Big-Summer-Poster.pptx

Genetic Pathways and Networks Interacting with Sports to Modify Learning and Memory in Adolescents

TIFFANY LIN1, Michael Cheng2,3, Melody Mao2, Isabella Lam2, Montgomery Blencowe2,4, Xia Yang2,3,4,5,6

1BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2Department of Integrative Biology and Physiology, UCLA

3Bioinformatics Interdepartmental Program, UCLA

4Molecular, Cellular and Integrative Physiology Interdepartmental Program, UCLA

5Brain Research Institute, UCLA

6Institute for Quantitative and Computational Biosciences, UCLA

Regular physical activity is associated with enhanced learning and memory, but the genetic mechanisms moderating these benefits remain unclear. We hypothesized that genetic factors affecting specific molecular and cell type–specific pathways in brain regions relevant to cognition modify the effect of sports participation on performance. Using the Adolescent Brain Cognitive Development (ABCD) cohort, we conducted a genome-wide interaction study (GWIS) and identified SNP-by-sports interaction effects enriched in collagen activated signaling, tube formation, and synapse organization pathways. Top genes included those well known for roles in learning and memory (e.g., APP, TGFB1, WNT5A) and novel candidates (e.g., SDK2, EZR, OLFM1). Integration with single-cell RNA sequencing–derived regulatory networks identified key drivers (RAB6B, NAPB, NSF, NDRG4) coordinating synaptic organization and transmission across neuronal cell types. In hippocampal microglia, CCL4 emerged as a driver in the tumor necrosis factor pathway, implicated in neuroinflammation and multiple neurological disorders. Overlap with a previous SNP-by-mild traumatic brain injury (mTBI) study in the ABCD cohort revealed shared key drivers (NSF, NAPB, NDRG4) in anterior cingulate cortex neurons. Our findings highlight shared molecular pathways through which sports may enhance cognitive resilience and support mTBI recovery, offering potential therapeutic targets at the intersection of physical activity, neurodevelopment, and neuroinflammation.

Lin

Doublet Detection in Droplet Based Sequencing Data by Masked Gaussian Mixture Model

HANZHANG LIU1, Zeyuan Johnson Chen2,3, Eran Halperin3, Sriram Sankararaman2,3,4

1BIG Summer Program, Institute for Quantitative and Computational Biosciences, University of California, Los Angeles

2Department of Computer Science, University of California, Los Angeles

3Department of Computational Medicine, University of California, Los Angeles

4Department of Human Genetics, David Geffen School of Medicine at UCLA, University of California, Los Angeles

Multiplets, which arise when a single droplet captures two or more cells in droplet-based single-cell sequencing, introduce spurious signals that confound downstream analyses. Existing detection methods often rely on generating synthetic doublets and detect multiplets irrespective of the underlying cell-type identity. We introduce the masked Gaussian Mixture Model (mGMM), an expectation-maximization algorithm that jointly models cluster membership and detects multiplets under a probabilistic framework, thereby circumventing the need for time-consuming artificial doublet generation. By refining its estimate of plausible multiplets at each iteration, mGMM steers the optimization process toward recovering the true singlet distribution in epigenomic and transcriptomic datasets. On simulated datasets derived from single-nucleus methyl-3C sequencing, mGMM outperforms existing methods across a range of doublet rates (PRAUC≥0.80; P=0.012). It also demonstrates state-of-the-art performance on single-cell RNA-seq datasets with orthogonal multiplet validation, particularly for those from solid tissues. Finally, mGMM is computationally efficient and robust to hyperparameter choices and normalization schemes.

Assumption-Free Uncertainty Quantification in Epigenetic Clocks

MEGAN MITCHELL1, Lajoyce Mboning2, Louis-S. Bouchard2, Matteo Pellegrini3

1 BIG Summer Program, Institute for Quantitative & Computational Biosciences, UCLA

2 Department of Chemistry and Biochemistry, UCLA

3 Department of Molecular, Cell and Developmental Biology, UCLA

Epigenetic clocks – models that estimate biological age from DNA methylation (DNAm) data – typically provide point predictions without accompanying measures of uncertainty. To address this limitation, our previously published model, BayesAge, used simulated data to generate a range of plausible ages for each individual. However, this approach was computationally intensive and systematically underestimated error. In this study, we evaluated split conformal prediction and its locally weighted variant as computationally efficient and statistically rigorous methods for quantifying the uncertainty of BayesAge’s predictions. We found that split conformal inference produced empirically valid confidence intervals, even in finite-sample settings, with minimal residual bias. These findings suggest that conformal inference may be valuable for supporting high-risk decision-making tasks in clinical or forensic contexts, where reliable uncertainty bounds are essential.
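
The split conformal procedure described above can be sketched in a few lines. This is a minimal illustration assuming absolute residuals as the nonconformity score and a held-out calibration set; the function name, arguments, and toy numbers are hypothetical and are not taken from BayesAge or from the study's data.

```python
import math


def split_conformal_interval(calib_true, calib_pred, new_pred, alpha=0.1):
    """Prediction interval around new_pred with nominal coverage 1 - alpha.

    calib_true / calib_pred: known ages and model predictions for a held-out
    calibration set that was not used to fit the model.
    """
    # Nonconformity score: absolute residual on each calibration sample.
    scores = sorted(abs(t - p) for t, p in zip(calib_true, calib_pred))
    n = len(scores)
    # Finite-sample-corrected rank of the (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration samples for this alpha: interval is unbounded.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)


# Toy numbers for illustration only (not data from the study).
calib_true = [30, 40, 50, 60, 70, 80, 35, 45, 55]
calib_pred = [32, 39, 47, 64, 70, 75, 36, 43, 60]
interval = split_conformal_interval(calib_true, calib_pred, new_pred=50, alpha=0.25)
print(interval)
```

The locally weighted variant mentioned in the abstract would, in the usual formulation, divide each residual by a per-sample spread estimate before taking the quantile; that step is omitted here.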

Identifying & Classifying Hydrogenases in Bacteria Using an Exploratory Computational Tool

AVANTIKA MOHAN3, Robert Gunsalus1, Thomas Holton2

1 Department of Microbiology, Immunology, and Molecular Genetics, UCLA

2 Institute for Genomics and Proteomics, UCLA

3 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

Hydrogenases are crucial enzymes found across bacterial, archaeal, and eukaryotic domains. Detecting the presence of hydrogenase genes in a genome is complicated by their diversity in gene content, enzyme composition, and specificity. Existing search tools rely on DNA and/or amino-acid primary sequence data and have been shown to be of limited value in predicting gene and enzymatic function for a diverse group of organisms. We designed and tested a computational tool that recognizes protein families (pfams) in hydrogenase genes to better predict enzyme function across a subset of sequenced genomes. Using Syntrophomonas wolfei as a model organism, we selected a ‘protein signature,’ comprising pfams associated with catalytically active subunits, and searched for it in other species of the genus Syntrophomonas. Our tool was able to predict identical and related classes of hydrogenases, and shows potential in identifying other types of hydrogenases, like formate hydrogenases and dehydrogenases, across a wide variety of organisms. We hope this algorithm can be used to better understand the ubiquitous process of hydrogen consumption and production.
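
The idea of screening genomes against a pfam-based protein signature can be illustrated with a simple set-containment check. This is a hypothetical sketch, not the tool described above: it assumes each gene has already been annotated with a list of pfam identifiers, it matches the signature per gene rather than per operon, and the identifiers (`PFAM_A`, `PFAM_B`, and so on) are placeholders rather than real Pfam accessions.

```python
def find_signature_hits(genome_annotations, signature):
    """Return gene IDs whose pfam annotations contain every pfam in the signature.

    genome_annotations: dict mapping gene ID -> iterable of pfam identifiers.
    signature: iterable of pfam identifiers that must all be present.
    """
    required = set(signature)
    return sorted(
        gene for gene, pfams in genome_annotations.items()
        if required <= set(pfams)  # subset test: all required pfams found
    )


# Placeholder annotations for illustration only.
annotations = {
    "gene_001": ["PFAM_A", "PFAM_B", "PFAM_C"],
    "gene_002": ["PFAM_A"],
    "gene_003": ["PFAM_B", "PFAM_A", "PFAM_D"],
}
hits = find_signature_hits(annotations, ["PFAM_A", "PFAM_B"])
print(hits)
```

A looser match (for example, requiring only a fraction of the signature) might be a better fit for detecting related hydrogenase classes, but the abstract does not specify the matching rule.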

Quantification of Circadian Locomotor Traits in the Hybrid Mouse Diversity Panel

USHASWINI NAMBURU1,2, Dylan C. Sarver1,3, Aldons J. Lusis1,2,3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Microbiology, Immunology, and Molecular Genetics, UCLA

3 Department of Cardiology, UCLA

Circadian rhythms are central to the temporal regulation of many physiological processes, and their disruption is linked to an increased risk of disease. To investigate the natural variation in circadian behavior, we analyzed pre-existing locomotor activity data from female mice in the Hybrid Mouse Diversity Panel (HMDP), a well-characterized and genetically diverse set of ~100 inbred strains of mice. Using ClockLab software, we extracted quantitative circadian parameters, including activity onset and offset, activity levels during the active and inactive cycles, and bout characteristics. These traits varied across strains, suggesting a heritable basis for circadian behavioral diversity. This study established a comprehensive dataset of circadian traits that will enable further analysis through genome-wide association studies. These data allow mapping of loci that influence rhythmic behavior and may uncover novel or underappreciated regulators of circadian processes. Our findings contribute to a systems genetics approach to understanding the complex genetic makeup of circadian rhythms.

Transcriptomic Response of Neural Progenitor Cells to Lead Exposure

SREYAN SARKAR1, Ana Rodriguez Vega2,3, Timothy Derebensky2,3, Rachel Fox2,3, Yanyuan Kang4, Patrick Allard4, Michael Wells2,3

1 Department of Molecular, Cellular, and Developmental Biology, UCLA, Los Angeles, CA, USA

2 Neuroscience Interdepartmental Program, David Geffen School of Medicine, Los Angeles, CA, USA

3 Department of Human Genetics, David Geffen School of Medicine, Los Angeles, CA, USA

4 Department of Molecular Toxicology, David Geffen School of Medicine, Los Angeles, CA, USA

Lead exposure in the earliest stages of brain development can disrupt the development of neural progenitor cells (NPCs), a key cell type in CNS ontogeny, heightening the risk of neurodevelopmental disorders. To uncover the cellular mechanisms behind this, induced pluripotent stem cells (iPSCs) from 38 different donors were differentiated into NPCs and then co-cultured in a “cell village” model. Villages were exposed to lead (Pb; 3 and 10 μM) for four days, followed by single-cell RNA sequencing (scRNA-seq) to assess transcriptomic responses. scRNA-seq analysis involves many preprocessing steps prior to conducting differential gene expression analysis on cell types of interest (NPCs). Initial clustering of cells showed that cell cycle and sex drove much of the variation in the data, so both were regressed out to ensure that variation was largely dictated by cell type. This enabled precise assignment of cell types, significantly improving our ability to conduct differential gene expression (DEG) and other analyses on NPCs.

Exploring the effects of microtubules on cellular gene expression in the mouse dental incisor epithelium using scRNA-seq 

CHRISTIAN SUAREZ1, MANYA LALWANI1, Qianlin Ye2, Abinaya Thooyamani2, Jimmy Hu2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Division of Oral and Systemic Health Sciences, School of Dentistry, UCLA

The mouse incisor provides a tractable system for studying the mechanisms that regulate regeneration, cell behavior, and tissue organization because epithelial and mesenchymal stem cells sustain continuous cell renewal. These populations are governed by biochemical signaling and tissue mechanical forces that influence their proliferative and differentiation potential. While cell organization modulates mechanical responses, how these physical factors regulate gene expression and cell fate remains an important question. Microtubules are a crucial cytoskeletal component that contributes to cell shape and function. The microtubule-severing protein spastin provides a platform for investigating the roles of cell shape and tissue organization. To identify the effects of severed microtubules, we combined Seurat, CellChat, and SCENIC workflows. Our analysis of signaling interactions indicates that perturbing cell shape and packing triggers a cascading failure in cell differentiation, proliferation, and metabolic functions, compromising regenerative capacity. This work elucidates the mechanistic consequences of microtubule disruption on oral epithelial regeneration.

Unraveling the cell type-specific regulatory mechanisms underlying neuropsychiatric and neurodegenerative disease risk

LISE TUCKER1, Lena Krockenberger2, Brunilda Balliu3,4,5

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Bioinformatics Interdepartmental Graduate Program, UCLA

3 Department of Computational Medicine, UCLA

4 Department of Pathology and Laboratory Medicine, UCLA

5 Department of Biostatistics, UCLA

Most neurodegenerative and neuropsychiatric GWAS variants are in non-coding regions, and their impact on disease remains unexplained by current bulk brain tissue eQTLs. Single-nucleus studies show greater success linking cell type-specific eQTLs with GWAS variants, but many context-specific eQTLs remain undiscovered due to low power from small cell counts, limited definitions of cell type-specificity, and intra-individual correlation inherent to snRNA-seq studies. This project addressed these gaps by applying FastGxC, a powerful context-dependent cis-eQTL mapping method, to ROSMAP DLPFC snRNA-seq data and then integrating these eQTLs with neuropsychiatric and neurodegenerative GWAS summary statistics using LDSC. We identified 2,217 cell type-specific and 2,300 shared eQTLs: 30.6% were shared only, 33.1% were specific only, and 36.3% were both. LDSC results showed that the specific associations were more enriched for trait heritability than the shared associations. Overall, this work reveals powerful insights into the cell type-specific regulatory mechanisms underlying neuropsychiatric and neurodegenerative diseases.

Quantitative Phase Analysis of Dynamic Cellular Internal Motion via Extracellular Fluid Viscosity Modulation

PRANNAY VEERABAHU1, Thang L. Nguyen2, Michael A. Teitell3

1 Department of Chemistry and Biochemistry, University of California at Los Angeles

2 Department of Bioengineering, University of California at Los Angeles

3 Department of Bioengineering, Molecular Biology Institute, Broad Center for Regenerative Medicine and Stem Cell Research, California NanoSystems Institute, Department of Pathology and Laboratory Medicine, Department of Pediatrics, and Jonsson Comprehensive Cancer Center, David Geffen School of Medicine, University of California at Los Angeles

Cellular motion is critical for maintaining homeostasis and physiological function, with changes in motility linked to cancer dissemination and metastasis. However, the mechanisms underlying motility’s role in cancer progression remain unclear. Recent studies indicate that cellular motion is composed of two components: internal motion, driven by movement of internal molecules against the cellular framework, and external motion through the physical environment. Motility has been shown to influence chemotherapy drug efficacy. Thus, we hypothesize that internal motion, modulated through extracellular fluid viscosity (ECV), contributes to drug efficacy by impeding or aiding intracellular dissemination. Using Quantitative Phase Imaging, we analyzed single-cell biomass dynamics to quantify growth heterogeneity, growth rate, and cellular proliferation under drug-stress at varied ECVs, probing a potential link between TRPV4-regulated internal motion and drug efficacy. We establish internal motion as a key pharmacological variable, with contributions to drug efficacy by both the physical environment and internal motion signaling pathways.

Application of Deep Learning Classification Models in Anterior Segment Optical Coherence Tomography Images for Improving Limbal Stem Cell Deficiency Diagnosis

SARAH WU1, Micah Vinet2,3, William Speier2,4, Sophie Deng3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Bioengineering, UCLA

3 Stein Eye Institute, UCLA

4 Department of Radiology, David Geffen School of Medicine, UCLA

Limbal Stem Cell Deficiency (LSCD) is a degenerative eye disease marked by the loss of corneal epithelial cells, impairing wound healing. Anterior segment optical coherence tomography (AS-OCT) enables measurement of corneal epithelial thickness, a key indicator of LSCD severity, but clinical criteria for grading severity and distinguishing scarred from healthy tissue remain unclear. We developed a machine learning framework using a modified InceptionV3 model to classify AS-OCT images into LSCD disease severity classes. A deep learning-based classifier was first trained to differentiate control vs. severe LSCD patches, which were then mapped back to full-sized images. The model was further extended to classify all LSCD severity levels and distinguish epithelial from non-epithelial tissue, with the ultimate goal of identifying scarred vs. healthy regions. The model demonstrated 97.99% precision and 93.21% recall in binary classification and 82% accuracy in multiclass classification. These findings highlight the potential of deep learning models to enhance efficiency and reduce clinical workload in LSCD diagnosis.

NutriOmics: A species- and tissue-specific nutrient signature database

SHERRI XU1, Montgomery Blencowe2,3, Daniel Ha3, Xia Yang2,3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Integrative Biology and Physiology, UCLA

3 Molecular, Cellular, and Integrative Physiology Interdepartmental Program, UCLA

Nutrients are essential for human health, yet no consistent, well-structured, and easily queryable databases comprehensively detail the mechanisms underlying their beneficial or deleterious effects on complex organ systems. Elucidating species- and tissue-specific effects of nutrients on gene expression can provide valuable molecular insights into their physiological or pathological effects on health. Here, we present NutriOmics, a nutrient knowledgebase and analytical platform hosted on an interactive web server. Leveraging transcriptome data from human, mouse, and rat curated from the Gene Expression Omnibus, we constructed a database that contains the gene signatures and pathways of individual nutrients, allowing both simple queries and more sophisticated gene overlap analysis and network-based nutrient matching for different disease conditions. We demonstrate the utility of NutriOmics in identifying shared gene targets between nutrients and diseases, thereby highlighting nutrients with potential to modify disease pathway networks. By integrating tissue- and species-specific nutrient signatures with gene networks, NutriOmics enhances our systematic understanding of how nutrients affect individual tissues and organ systems in different species and our ability to uncover data- and mechanism-driven personalized nutrition to improve health outcomes and prevent or mitigate various diseases.
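
A gene overlap analysis of the kind mentioned above is often scored with a one-sided hypergeometric test. The sketch below assumes that statistic; the abstract does not say which test NutriOmics uses, and the function name, gene symbols, and background size are placeholders for illustration.

```python
from math import comb


def overlap_pvalue(signature, disease_genes, background_size):
    """Overlap count and one-sided hypergeometric p-value P(X >= overlap).

    signature, disease_genes: iterables of gene symbols drawn from the same
    background of `background_size` genes.
    """
    sig, dis = set(signature), set(disease_genes)
    k = len(sig & dis)            # observed overlap
    n, K, N = len(sig), len(dis), background_size
    total = comb(N, n)            # number of ways to draw a signature of size n
    # Sum the upper tail: overlaps at least as large as the one observed.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / total


# Placeholder gene sets for illustration only.
k, p = overlap_pvalue({"GENE1", "GENE2", "GENE3"}, {"GENE2", "GENE3", "GENE4"}, 10)
print(k, round(p, 4))
```

In practice the background would be the set of genes measured in the relevant tissue and species, and p-values across many nutrient-disease pairs would need multiple-testing correction.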

Analysis of RNA Editing in NFT-bearing and NFT-free Neurons

ANGELA ZHANG1, Ting Fu2, Xinshu Xiao2,3

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Integrative Biology and Physiology, UCLA

3 Bioinformatics Interdepartmental Program, UCLA

Hyperphosphorylated tau proteins aggregate into neurofibrillary tangles (NFTs), which are hallmarks of Alzheimer’s disease (AD). While a difference in RNA editing has been found between AD patients and controls, the link between RNA editing and NFTs remains unclear. Here, we analyzed single-cell RNA-seq data from 8 AD donors and called differential RNA editing sites (DEs) between NFT-bearing and NFT-free neurons using de novo RNA editing identification and REDIT testing for differential sites. We identified 64,087 DEs in excitatory neurons, 56% of which showed an increase in editing, and 7,661 in inhibitory neurons, 76% of which showed an increase. Fifty DEs fall in protein-coding regions, including sites in SORBS1, GRIK2, NAE1, and HMGA2, which are AD-relevant genes. To our knowledge, this study is the first to map global RNA editing profiles related to tau pathology, and the identified NFT-neuron-specific editing sites help pave the way to further understand the mechanisms behind RNA editing’s relevance to tauopathies.

A Framework for Rigorous Cell Segmentation and Annotation for Xenium Spatial Transcriptomics

ZHIYUAN ZHU1, Helen Huang2,3, Xiaolu Guo2, Alexander Hoffmann2

1 BIG Summer Program, Institute for Quantitative and Computational Biosciences, UCLA

2 Department of Microbiology, Immunology and Molecular Genetics, UCLA

3 Bioinformatics Interdepartmental PhD Program, UCLA

In imaging-based spatial transcriptomics like Xenium, cell segmentation and cell-type annotation (S&A) affect the reliability of biological conclusions. However, achieving accurate S&A is challenging due to tissue sectioning that leads to overlapping cells, nucleus-less cells, and lateral transcript diffusion. Although numerous S&A algorithms have been developed, evidence-based strategies for optimizing image analysis parameters and estimating accuracy are lacking. We evaluated widely used S&A tools by assessing single-cell expression profiles. Our results demonstrate that default parameters often lead to mis-segmentation and fail to classify cell types. To resolve these limitations, we developed a framework for optimizing segmentation parameters based on cellular morphological properties, combined with hierarchical annotation that is both knowledge- and data-driven. Our framework generated accurate cell maps, validated by marker purity. This work highlights the limitations of existing tools and provides a framework and criteria for parameter tuning to generate rigorous insights from complex spatial data.
