QCB Bioinformatics Core

About

The QCB Bioinformatics Core offers next-generation sequencing data processing and analysis services.

Mission

The Collaboratory aims to develop and execute computational methods and procedures that facilitate sequencing data analysis for the UCLA community. As part of that effort, we have developed pipelines that are specifically designed to make efficient use of the Hoffman2 cluster resources. Through this new service, QCB Collaboratory fellows will analyze the next-generation sequencing data you submit and provide you with the results. We can analyze data such as RNA-seq, ChIP-seq, and bisulfite sequencing, and we can also perform variant calling on whole genomes or exomes. If you are interested in participating, please contact Matteo Pellegrini (matteop@mcdb.ucla.edu).
The service is provided at no cost to users, but we ask that the Collaboratory fellow who analyzes your data be included as an author on any resulting publications.

Example service: RNA-Seq Analysis with TopHat and Cufflinks

Authors:

Serghei Mangul (serghei@cs.ucla.edu), Yehudit Hasin (yehudit.hasin@gmail.com)


Technological advances in recent years have made RNA-Seq the method of choice for examining gene expression. Our service allows groups at UCLA to use RNA-Seq to obtain gene expression levels across multiple samples. We offer ultra-fast pipelines that can analyze a large number of RNA-Seq samples in a matter of days: running on the Hoffman2 infrastructure, they can process 100 samples per day. The RNA-Seq pipeline performs spliced alignment of RNA-Seq reads to a genome to identify exon-exon splice junctions, using TopHat2 (version 2.0.9), which is built on the short-read aligner Bowtie2. The aligned reads are then used to assemble alternatively spliced transcripts and to estimate the expression levels of the assembled transcripts and genes; for this step we use Cufflinks (version 2.1.1).
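The two stages described above (spliced alignment, then assembly and quantification) can be sketched as a short Python helper that only assembles the command lines for one sample and runs nothing. This is a minimal illustration, not the Core's pipeline code: the sample name, index name, output directory layout, and thread count are hypothetical, and the flags used (-p, -o, -r and --mate-std-dev for TopHat2; -p and -o for Cufflinks) should be checked against the documentation for the installed versions.

```python
import shlex
from typing import List, Optional


def tophat2_cmd(index: str, reads1: str, reads2: Optional[str], outdir: str,
                threads: int = 4, inner_dist: Optional[int] = None,
                inner_dist_sd: Optional[int] = None) -> List[str]:
    """Build (but do not run) a TopHat2 spliced-alignment command."""
    cmd = ["tophat2", "-p", str(threads), "-o", outdir]
    # -r / --mate-inner-dist only applies to paired-end data
    if reads2 is not None and inner_dist is not None:
        cmd += ["-r", str(inner_dist)]
        if inner_dist_sd is not None:
            cmd += ["--mate-std-dev", str(inner_dist_sd)]
    # Positional arguments: Bowtie2 index basename, then the read file(s)
    cmd += [index, reads1]
    if reads2 is not None:
        cmd.append(reads2)
    return cmd


def cufflinks_cmd(bam: str, outdir: str, threads: int = 4) -> List[str]:
    """Build a Cufflinks command that assembles and quantifies transcripts from one BAM."""
    return ["cufflinks", "-p", str(threads), "-o", outdir, bam]


def sample_commands(sample: str, index: str, reads1: str,
                    reads2: Optional[str] = None,
                    inner_dist: Optional[int] = None) -> List[List[str]]:
    """Alignment followed by assembly/quantification (hypothetical <sample>/... layout)."""
    align_dir = f"{sample}/tophat"
    # TopHat2 writes its spliced alignments to accepted_hits.bam in the output directory
    bam = f"{align_dir}/accepted_hits.bam"
    return [
        tophat2_cmd(index, reads1, reads2, align_dir, inner_dist=inner_dist),
        cufflinks_cmd(bam, f"{sample}/cufflinks"),
    ]


if __name__ == "__main__":
    for cmd in sample_commands("sample1", "genome_index", "s1_R1.fq", "s1_R2.fq",
                               inner_dist=150):
        print(shlex.join(cmd))
```

Each command list could then be passed to `subprocess.run` or written into a cluster job script; how the Core actually schedules these jobs on Hoffman2 is not described here.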

Delivery of the results

We deliver results in whatever file format you prefer, either by email or through the shared Hoffman2 infrastructure.

Technical details

Each RNA-Seq sample is partitioned into batches of 1M reads, and each batch is mapped separately, which dramatically increases the speed of the analysis. The aligned batches are then merged into a single alignment. For paired-end reads, the pipeline optimizes the insert size of the sample (TopHat's -r option, the expected inner distance between mates) based on alignments of the reads to a database of annotated transcripts (we use the UCSC annotation database). For assembly and quantification, the pipeline runs Cufflinks on each reference chromosome separately; partitioning the Cufflinks analysis in this way increases both the robustness and the speed of the analysis. The results of the parallel RNA-Seq analysis are consistent with those obtained in single-process mode.
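The three parallelization steps above can be illustrated with small, self-contained Python helpers. This is a sketch under stated assumptions rather than the pipeline's implementation: reads are split into fixed-size FASTQ batches; the mate inner distance for -r is estimated as fragment length minus both read lengths, assuming the fragment lengths come from pairs aligned to annotated transcript sequences (where no introns inflate the distance); and per-chromosome partitioning is shown on plain-text SAM lines grouped by the RNAME column. All function names are hypothetical.

```python
import itertools
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

BATCH_SIZE = 1_000_000  # reads per batch, as in the text


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Group a FASTQ line stream into 4-line records (header, sequence, '+', quality)."""
    it = iter(lines)
    while True:
        record = tuple(itertools.islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def batches(records: Iterable[Tuple[str, ...]],
            batch_size: int = BATCH_SIZE) -> Iterator[List[Tuple[str, ...]]]:
    """Yield consecutive batches of at most batch_size reads, each alignable as its own job."""
    it = iter(records)
    while True:
        chunk = list(itertools.islice(it, batch_size))
        if not chunk:
            return
        yield chunk


def estimate_inner_distance(pairs: Iterable[Tuple[int, int, int]]) -> Tuple[int, int]:
    """Mean and SD of the mate inner distance from (fragment_len, read1_len, read2_len).

    Assumes fragment lengths were measured on annotated transcript sequences,
    so spliced-out introns do not inflate them.
    """
    inner = [frag - r1 - r2 for frag, r1, r2 in pairs]
    # Rounded to integers, since TopHat's -r and --mate-std-dev take integer values
    return round(statistics.mean(inner)), round(statistics.pstdev(inner))


def split_sam_by_chromosome(sam_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Split SAM text into one header+body list per reference sequence (RNAME column)."""
    header: List[str] = []
    per_chrom: Dict[str, List[str]] = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
            continue
        rname = line.split("\t", 3)[2]
        if rname != "*":  # '*' means no reference assigned (unmapped read)
            per_chrom[rname].append(line)
    return {chrom: header + body for chrom, body in per_chrom.items()}
```

Each batch would typically be written to its own FASTQ file and aligned as a separate job before the alignments are merged; the per-chromosome split plays the same role as querying a sorted, indexed BAM file one region at a time (for example with `samtools view`).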

More information

The RNA-Seq pipeline itself is freely available to all Hoffman2 users. If you are interested in running the pipeline for your own analyses on Hoffman2, please contact Serghei Mangul (serghei@cs.ucla.edu); assistance will be provided.