Workshop Description

In this workshop, we will go over statistical methods used to address common questions related to genomic data analysis, and their implementation in R. This workshop will include four segments.

  1. NP classification and feature ranking, with Jingyi Jessica Li. This segment will introduce the Neyman-Pearson classification algorithm as an approach to control the more severe type of error in asymmetric binary classification and the related feature selection problem. Learn more in Tong et al. 2018, Science Advances and Li et al., Journal of Machine Learning Research.
  2. Clipper: p-value-free false discovery rate (FDR) control on high-throughput data from two conditions, with Xinzhou Ge. Large-scale feature screening is ubiquitous in high-throughput biological data analysis, such as differential gene expression analysis. Clipper is a general statistical framework for identifying the features that differ between conditions, with theoretical FDR control and without requiring p-values. Clipper is a versatile and effective tool for correcting FDR inflation in multiple bioinformatics applications. Learn more in Ge et al. 2021, Genome Biology.
  3. Large-sample differentially expressed gene (DEG) analysis, with Yumei Li. Building on our surprising finding that popular differential expression methods have high false discovery rates on population-level RNA-seq data with large sample sizes, this segment will recommend the Wilcoxon rank-sum test for such data. Learn more in Li et al. 2022, Genome Biology.
  4. Hidden covariates in quantitative trait locus (QTL) analysis, with Heather Zhou. Estimating and accounting for hidden variables is widely practiced as an important step in quantitative trait locus (QTL) analysis for improving the power of QTL identification. This segment explores the best practices for hidden variable inference in QTL analysis. Learn more in Zhou et al. 2022, bioRxiv.
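
As a rough illustration of the test recommended in segment 3, the sketch below computes a two-sided Wilcoxon rank-sum test for a single gene using the large-sample normal approximation with a tie-corrected variance. It is not taken from the workshop materials, which use R (base R provides this test as wilcox.test); it is written in Python with only the standard library, and the function names and expression values are made up for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test via the normal approximation.

    Returns (U statistic for x, p-value). No continuity correction is
    applied, so the p-value is only reliable for reasonably large samples.
    """
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p_value


# Hypothetical normalized expression values for one gene in two conditions.
condition_a = [5.1, 6.3, 4.8, 7.0, 5.5, 6.1, 5.9, 6.6]
condition_b = [7.4, 8.1, 6.9, 9.2, 7.7, 8.5, 7.1, 8.8]
u_stat, p = wilcoxon_rank_sum(condition_a, condition_b)
print(f"U = {u_stat}, two-sided p = {p:.4g}")
```

In a DEG analysis, a test like this would be run once per gene and the resulting p-values adjusted for multiple testing; that step is omitted here.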

This workshop is intended for computational biologists interested in RNA-seq data analysis.

Technical Requirements

A computer with R and RStudio installed

Instructor

Jingyi Jessica Li is an Associate Professor in the Department of Statistics (primary) and the Departments of Biostatistics, Computational Medicine, and Human Genetics (secondary) at UCLA. Prior to joining UCLA in 2013, Jessica obtained her Ph.D. from UC Berkeley, where she worked with Profs. Peter J. Bickel and Haiyan Huang, and her B.S. (summa cum laude) from Tsinghua University, China. At UCLA, Jessica leads the group “Junction of Statistics and Biology”, which comprises students from interdisciplinary backgrounds. On the statistical methodology side, her research interests include association measures, asymmetric classification, p-value-free false discovery rate control, and high-dimensional variable selection. On the biomedical application side, her research interests include bulk and single-cell RNA sequencing, comparative genomics, and information flow in the central dogma. Jessica is the recipient of the Alfred P. Sloan Research Fellowship (2018), the Johnson & Johnson WiSTEM2D Math Scholar Award (2018), the NSF CAREER Award (2019), and the MIT Technology Review 35 Innovators Under 35 China (2020).

Workshop Details

Required Prerequisites: Basic knowledge of statistics and machine learning. Experience with R is required.
Length: 1 day, 4 hrs
Level: Introductory
Location: Online stream
Seats Available: N/A

Upcoming Dates

REGISTRATION IS CLOSED!

May 27, 2022 (9:00 AM – 11:00 AM and 1:30 PM – 3:30 PM PT)