QCB encompasses a broad range of quantitative and computational biosciences research. We develop cutting-edge quantitative and computational tools ranging from statistical analysis and modeling approaches to physics-based algorithms and mechanistic modeling.

Computational Analysis and Software Tools by QCBio-affiliated Laboratories

Interpreting Next Gen Seq Data

CLAM: CLIP-seq Analysis of Multi-mapped reads
CLAM uses an expectation-maximization algorithm to assign multi-mapped reads and calls peaks combining uniquely and multi-mapped reads. To demonstrate the utility of CLAM, we applied it to a wide range of public CLIP-seq/RIP-seq datasets involving numerous splicing factors, microRNAs, and m6A RNA methylation. CLAM recovered a large number of novel RNA regulatory sites inaccessible by uniquely mapped reads.
https://github.com/Xinglab/CLAM
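
The read-assignment idea can be illustrated with a toy expectation-maximization loop. This is a generic sketch, not CLAM's actual model or code: the function name `em_assign`, the locus labels, and the simple abundance-proportional weighting are assumptions made for illustration only.

```python
from collections import defaultdict


def em_assign(reads, n_iter=50):
    """Estimate relative locus abundances from uniquely and multi-mapped reads.

    reads: list of tuples; each tuple holds the candidate loci of one read
           (one locus for a unique read, several for a multi-mapped read).
    Returns a dict mapping locus -> estimated share of all reads.
    """
    loci = sorted({locus for candidates in reads for locus in candidates})
    # Start from a uniform guess (hypothetical initialization).
    theta = {locus: 1.0 / len(loci) for locus in loci}
    for _ in range(n_iter):
        # E-step: split each read across its candidate loci in proportion
        # to the current abundance estimates.
        counts = defaultdict(float)
        for candidates in reads:
            total = sum(theta[locus] for locus in candidates)
            for locus in candidates:
                counts[locus] += theta[locus] / total
        # M-step: re-estimate abundances from the fractional counts.
        theta = {locus: counts[locus] / len(reads) for locus in loci}
    return theta


# Three reads unique to A, one unique to B, one read mapping to both.
toy_reads = [("A",), ("A",), ("A",), ("B",), ("A", "B")]
print(em_assign(toy_reads))  # converges toward A = 0.75, B = 0.25
```

In this toy case the shared read ends up weighted 3:1 toward locus A, mirroring the evidence from the uniquely mapped reads.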

GENESCISSORS
http://csbio.unc.edu/genescissors/
A comprehensive approach to detecting and correcting spurious transcriptome inference caused by RNA-seq read misalignment.

GIREMI
https://www.ibp.ucla.edu/research/xiao/GIREMI.html
GIREMI is a method to identify RNA editing sites and distinguish them from SNPs using RNA-Seq data.
http://www.nature.com/nmeth/journal/v12/n4/full/nmeth.3314.html

ImReP: Immune Repertoire Profiling by RNA Sequencing
ImReP is a novel computational method for rapid and accurate profiling of the adaptive immune repertoire from regular RNA-Seq data. Applied to Genotype-Tissue Expression (GTEx v6) RNA-seq data, ImReP efficiently extracts TCR- and BCR-derived reads and accurately assembles the complementarity determining region 3 (CDR3) sequences. Using ImReP, we have created a systematic atlas of B- and T-cell repertoires (https://sergheimangul.wordpress.com/atlas-immune-repertoires/), which is the largest collection of CDR3 sequences and tissue types. ImReP is freely available at https://sergheimangul.wordpress.com/imrep/.

LAPELS
https://github.com/shunping/lapels
Remaps reads aligned to the in silico genome back to the reference coordinate and annotates variants.

MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification
MSIQ provides more accurate and robust isoform quantification than competing software by integrating multiple RNA-seq samples under a Bayesian framework. Our method aims to (1) identify the consistent group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples with more weight on the consistent group.

NMFP
(Non-negative Matrix Factorization based Preselection) http://www.stat.ucla.edu/~jingyi.li/software-and-data.html
NMFP is a non-negative matrix factorization based pre-selection method to increase accuracy of identifying mRNA isoforms from RNA-seq data.

RASER https://www.ibp.ucla.edu/research/xiao/RASER.html
Read aligner for SNPs and editing sites of RNA.
http://bioinformatics.oxfordjournals.org/content/early/2015/09/04/bioinformatics.btv505.abstract

RNA-Skim
https://github.com/zzj/RNASkim
A rapid method for RNA-Seq quantification at transcript level

ROP
https://github.com/smangul1/rop
A computational protocol that aims to discover the source of all reads, which may originate from complex RNA molecules, recombinant antibodies, and microbial communities.

SURVIV
https://github.com/Xinglab/SURVIV
Survival Analysis of mRNA Isoform Variation

SUSPENDERS
https://github.com/holtjma/suspenders
Merges multiple alignments of the same reads under different pretenses.

Exploring Molecular Genomics

CHROMHMM 

http://compbio.mit.edu/ChromHMM/
ChromHMM allows for chromatin state discovery and characterization.
http://www.nature.com/nmeth/journal/v9/n3/full/nmeth.1906.html

CHROMIMPUTE http://www.biolchem.ucla.edu/labs/ernst/ChromImpute/
ChromImpute allows imputation of specific epigenomic data based on a number of available datasets. This can fill in missing data or provide a means to identify and correct low-quality data. http://www.nature.com/nbt/journal/v33/n4/full/nbt.3157.html

DYNAMIC REGULATORY EVENTS MINER (DREM) 

http://www.sb.cs.cmu.edu/drem/
DREM is used for the analysis of dynamically changing TF binding events or mRNA abundances, as revealed in time series NGS datasets.
http://msb.embopress.org/content/3/1/74

MATS
http://rnaseq-mats.sourceforge.net/
A computational tool to detect differential alternative splicing events from RNA-Seq data.

RRHO
(Rank-Rank Hypergeometric Overlap gene expression signature comparison)
http://systems.crump.ucla.edu/rankrank
Algorithm for comparing and visualizing overlap in two gene expression signatures input as ranked gene lists.
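
The core statistic behind a rank-rank overlap comparison can be sketched as a single hypergeometric test on the top of two ranked lists. This is only a minimal illustration under stated assumptions: the function names, the toy gene labels, and the choice of one fixed pair of rank thresholds are hypothetical, and it assumes both lists rank the same set of genes. It is not the RRHO implementation itself.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(k, upper + 1)
    )
    return favorable / comb(population, draws)


def overlap_pvalue(ranked_a, ranked_b, i, j):
    """Overlap of the top i genes of list A with the top j genes of list B.

    Returns (overlap size, probability of an overlap at least that large
    if the two rankings were unrelated).
    """
    top_a = set(ranked_a[:i])
    top_b = set(ranked_b[:j])
    k = len(top_a & top_b)
    return k, hypergeom_sf(k, len(ranked_a), i, j)


# Two toy rankings of the same eight genes that agree on the top two.
list_a = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
list_b = ["g2", "g1", "g8", "g7", "g6", "g5", "g4", "g3"]
print(overlap_pvalue(list_a, list_b, 2, 2))  # overlap 2, p = 1/28
```

Repeating the same calculation over a grid of thresholds (i, j) would give a map of where in the two rankings the overlap is strongest.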

SAVANT
http://pathways.mcdb.ucla.edu/savant/
The web-based Signature Visualization Tool (SaVanT) visualizes cell-type-specific gene expression signatures in user-generated expression data.

SHORT TIME-SERIES EXPRESSION MINER (STEM)
http://sb.cs.cmu.edu/stem/
For clustering and analyzing short time series gene expression data. http://www.biomedcentral.com/1471-2105/7/191

TROM
https://cran.r-project.org/web/packages/TROM/index.html
For comparing transcriptomes of two biological samples from the same or different species. The comparison (i.e., transcriptome mapping) is conducted based on the overlap of the associated genes of different samples. More examples and detailed explanations are available in the vignette.

WGCNA
(Weighted Gene Co-Expression Network Analysis)
https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/

Interpreting Molecular Genomics

CELLFI
CELLFI is a tool to identify the epigenetic fingerprint of a particular cellular subset through selection of unique CpG loci with cell-specific methylation patterns. CELLFI utilizes the cellular fingerprints to estimate the cellular proportions from a complex tissue. We also demonstrate our method on clinical samples from patients undergoing immune reconstitution and from patients with metabolic syndrome or cancer.

ConsHMM
ConsHMM extends the existing ChromHMM software to systematically annotate genomes into ‘conservation states’ based on the combinatorial and spatial patterns of which species align. These states have base-wise resolution and are able to capture distinct enrichments such as regions of open chromatin, CpG islands, transcription start sites, repeat families, exons, proximity to specific gene families or tissue expression, and phenotype-associated genetic variation. These annotations are a resource for interpreting putative regulatory regions and disease-associated variation that is complementary to existing conservation representations and/or functional genomics-based annotations.

GEDIT: Gene Expression Deconvolution Interactive Tool
GEDIT utilizes gene expression data to estimate in silico the cell type composition of an unknown mixture. This tool searches for a combination of immune and other cell types (with known expression patterns) that best explains the expression profile observed in the submitted tissue sample. GEDIT outperforms leading tools in the field and is available online at www.webtools.mcdb.ucla.edu. It currently provides the user with a choice of 5 curated reference matrices for human data and 2 for mouse data, and also lets users submit their own.
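
The idea of finding the combination of cell types that best explains a mixture can be sketched as a small non-negative least-squares problem. The sketch below is a generic illustration, not GEDIT's algorithm: the function name `deconvolve`, the multiplicative-update solver, and the made-up signature values are all assumptions, and it requires strictly positive expression values.

```python
def deconvolve(signatures, mixture, n_iter=500):
    """Estimate non-negative cell-type fractions for one mixed sample.

    signatures: one list of per-gene expression values per cell type
    mixture:    per-gene expression values of the mixed sample
    Uses multiplicative updates for non-negative least squares and
    rescales the result so the fractions sum to 1.
    """
    n_types = len(signatures)
    n_genes = len(mixture)
    weights = [1.0 / n_types] * n_types
    # Precompute A^T b and A^T A, where the columns of A are the signatures.
    atb = [sum(sig[g] * mixture[g] for g in range(n_genes)) for sig in signatures]
    ata = [
        [sum(si[g] * sj[g] for g in range(n_genes)) for sj in signatures]
        for si in signatures
    ]
    for _ in range(n_iter):
        denom = [
            sum(ata[j][k] * weights[k] for k in range(n_types))
            for j in range(n_types)
        ]
        # Each update keeps the weights non-negative.
        weights = [weights[j] * atb[j] / denom[j] for j in range(n_types)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical reference profiles for two cell types over three genes.
t_cell = [10.0, 1.0, 5.0]
b_cell = [1.0, 10.0, 5.0]
# A synthetic sample made of 70% T cells and 30% B cells.
sample = [0.7 * t + 0.3 * b for t, b in zip(t_cell, b_cell)]
print(deconvolve([t_cell, b_cell], sample))  # close to [0.7, 0.3]
```

Real reference matrices contain many more genes and cell types, and noise in the sample means the fit is approximate rather than exact.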

Word2vec
The Gene Ontology (GO) contains GO terms that describe biological functions of genes and proteins in the cell. Our new method uses the Word2vec model to compare two words, two sentences, and definitions of two GO terms. Because a gene or protein is annotated by a set of GO terms, we can apply our method to compare two genes or two proteins. Our results are equivalent to those of previous methods which depend on the GO tree. This gives promise to the development of NLP methods in comparing GO terms.
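
One simple way to compare two GO term definitions with word embeddings is to average the word vectors of each definition and take the cosine similarity. The sketch below makes several assumptions: the averaging step is a common convention rather than a confirmed detail of the method, the three-dimensional vectors are made up for illustration, and a real application would load vectors from a trained Word2vec model.

```python
from math import sqrt


def mean_vector(words, embedding):
    """Average the vectors of all words that have an embedding."""
    vectors = [embedding[w] for w in words if w in embedding]
    if not vectors:
        raise ValueError("no word in the definition has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def definition_similarity(def_a, def_b, embedding):
    """Compare two definitions by the cosine of their mean word vectors."""
    vec_a = mean_vector(def_a.lower().split(), embedding)
    vec_b = mean_vector(def_b.lower().split(), embedding)
    return cosine(vec_a, vec_b)


# Made-up toy vectors; not taken from any trained model.
toy_embedding = {
    "dna": [1.0, 0.0, 0.0],
    "repair": [0.8, 0.2, 0.0],
    "replication": [0.9, 0.1, 0.0],
    "lipid": [0.0, 1.0, 0.0],
    "transport": [0.0, 0.8, 0.6],
}
print(definition_similarity("DNA repair", "DNA replication", toy_embedding))
print(definition_similarity("DNA repair", "lipid transport", toy_embedding))
```

Because a gene is annotated by a set of GO terms, the same score could then be aggregated over pairs of terms to compare two genes.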

Interpreting Clinical Genetics

ASGENSENG
https://sourceforge.net/projects/asgenseng/
Software for detecting allele-specific CNV from both WGS and WES data.

BEAST
Bayesian Evolutionary Analysis by Sampling Trees for Bayesian phylogenetic inference.
http://mbe.oxfordjournals.org/content/29/8/1969

CAVIAR
(CAusal Variants Identification in Associated Regions)
http://genetics.cs.ucla.edu/caviar/ or https://github.com/fhormoz/caviar
CAVIAR implements a new statistical framework that allows for the possibility of an arbitrary number of causal variants in genome-wide association studies. http://www.genetics.org/content/early/2014/08/06/genetics.114.167908

eQTLs
Expression quantitative trait loci (eQTLs) are genetic variants associated with gene expression, and eGenes are genes whose expression levels are associated with genetic variants. eQTLs and eGenes provide strong supporting evidence for GWAS hits and potential insights into the regulatory pathways involved in disease. However, for some tissues insufficient datasets are available. We introduce a meta-analysis model for finding eGenes in data from many tissues, and show that our model is better than other types of meta-analyses. Source code and supplementary data are at https://github.com/datduong/RECOV.

FASTANOVA
http://compgen.unc.edu/wp/?page_id=275
An Efficient Algorithm for Genome-Wide Association Study

FOURSITE
https://github.com/LohmuellerLab/FourSite
Estimates heterozygosity from sequencing reads in low-coverage data

GAIA
https://sourceforge.net/projects/discriminatives/
Author's implementation of “GAIA: graph classification using evolutionary computation” (SIGMOD’10), a discriminative subgraph pattern mining algorithm using evolutionary computation.

GAIN
http://liuyi1.com/GAIN/
Efficient Genome Ancestry Inference in Complex Pedigrees

GENOTYPE SEQUENCE SEGMENTATION
http://compgen.unc.edu/wp/?page_id=253
http://compgen.unc.edu/wp/wp-content/uploads/2008/07/minseg-final.pdf

GENSENG
https://sourceforge.net/projects/genseng/
Software for detecting CNVs (copy number variations) from NGS (next-generation sequencing) data.

HTREEQA
http://www.csbio.unc.edu/htreeqa/
Using semi-perfect phylogeny trees in quantitative trait loci study on genotype data

IGMS
(Inferring Genome-wide Mosaic Structure)
http://compgen.unc.edu/wp/?page_id=256
http://compgen.unc.edu/minmosaic/

LAMP-LD
Local ancestry inference aims to infer the ancestry at every position along an individual’s genome. I will describe latent variable models for local ancestry inference that capture the fine-scale correlation structure. The first model, implemented in a program called LAMP, uses a Bayesian hidden Markov model with an efficient initialization based on spectral clustering. The second, LAMP-LD, uses a two-level hidden Markov model to model within-population linkage disequilibrium, allowing the method to infer local ancestry using dense genome-wide genotype data. We show that these methods are highly accurate while being computationally efficient for large genomic datasets and have been used in a number of applications such as finding genetic variants associated with phenotype in admixed populations.

MACH-ADMIX
http://www.unc.edu/~yunmli/MaCH-Admix/
Genotype imputation software that extends the capabilities of MaCH 1.0.

MENDEL http://www.genetics.ucla.edu/software/download?package=1
Mendel is a comprehensive package for statistical analysis of qualitative and quantitative traits.
http://www.ncbi.nlm.nih.gov/pubmed/26567478 http://www.ncbi.nlm.nih.gov/pubmed/24955378

moloc 
Expression QTLs (eQTLs) and methylation QTLs (mQTLs) help pinpoint the responsible gene among the GWAS regions that harbor many genes. Multiple-trait-coloc (moloc) is a Bayesian statistical framework that integrates GWAS summary data with multiple molecular QTL data to identify regulatory effects at GWAS risk loci, and thus helps prioritize disease-associated genes.

NPUTE
http://compgen.unc.edu/wp/?page_id=57
Fast Algorithm for Imputing Missing Genotypes in SNPs

PAINTOR http://bogdan.bioinformatics.ucla.edu/software/paintor
PAINTOR integrates functional and association data in fine-mapping studies

PASANIUC LAB TOOLS
Integrating functional data to prioritize causal variants in statistical fine-mapping studies.
PLoS Genet. 2014 Oct 30;10(10):e1004722.
Leveraging functional annotation data in trans-ethnic fine-mapping studies.
Am J Hum Genet. 2015, 97(2):260-71.

ρ-HESS
ρ-HESS is a method to quantify the correlation between pairs of traits due to genetic variation at a small region in the genome. It only requires GWAS summary data, and makes no distributional assumption on the causal variant effect sizes while accounting for linkage disequilibrium (LD) and overlapping GWAS samples. Analyzing large-scale GWAS summary data across multiple complex traits, novel genomic regions may be identified that contribute significantly to the genetic correlation among these traits.

PREFERSIM
https://github.com/LohmuellerLab/PReFerSim
Performs forward in time population genetic simulations.

REM
http://csbio.unc.edu/eQTL/

TREEQA
http://compgen.unc.edu/wp/?page_id=239
Tree-based quantitative genome-wide association mapping

TWAS http://bogdan.bioinformatics.ucla.edu/software/twas/
Transcriptome-wide association study through expression imputation
Nat Genet. 2016 48(3):245-52

Integrating Clinical Data

ANGICART
https://github.com/mnewberry/angicart/
Angicart analyzes 3D radiographic images of blood vessels to determine the centerlines, topology, radius, length, and volume of blood vessel segments.
PLoS Comput Biol 11(8): e1004455.

FFSM
Fast Frequent Subgraph Mining
https://sourceforge.net/projects/ffsm/

LTS
https://sourceforge.net/projects/learning2search2/
LTS is an optimized Java implementation of the algorithm from “LTS: Discriminative Subgraph Mining by Learning from Search History” in Data Engineering (ICDE), IEEE 27th International Conference, pages 207-218, 2011.

MERGEOMICS
http://mergeomics.research.idre.ucla.edu
Mergeomics integrates multidimensional genomic data to identify biological pathways, gene networks, and key regulators of a disease or physiological trait.

http://biorxiv.org/content/early/2016/01/07/036012

Making Biomed BigData Accessible

LOS ANGELES DATA RESOURCE (LADR)
http://ctsi.ucla.edu/researcher-resources/pages/LADR
A joint project of major Los Angeles healthcare provider organizations (including UCLA, Cedars-Sinai (CSMC), Charles Drew University (CDU), USC, Children’s Hospital Los Angeles (CHLA) and the City of Hope) aimed at enabling research that improves the health of all people in the region using data representing the continuum of care across the region’s major health systems. LADR allows investigators to conduct interactive searches across the participating organizations on patient demographics, diagnosis and procedure codes (ICD-9 and CPT), labs, and medications, and will be available to you and your research team for study recruitment. LADR formally launched in May 2014 with two organizations, UCLA and CSMC, and a total of 6.8 million patient records. Three additional institutions, USC, CHLA, and CDU, have joined LADR since 2015. A key feature being developed for LADR is its “private record linkage” technology, which identifies data from the same patients across the participating organizations. By creating this linkage, LADR will enable institutions to assemble more data on patient treatments and other exposures along with more data on their outcomes, empowering research that could not be conducted by any individual organization.

OHDSI
http://www.ohdsi.org
The Observational Health Data Sciences and Informatics (or OHDSI, pronounced “Odyssey”) program is a multi-stakeholder, interdisciplinary collaborative to bring out the value of health data through large-scale analytics.
http://www.ncbi.nlm.nih.gov/pubmed/26262116

REDCAP: RESEARCH ELECTRONIC DATA CAPTURE
http://ctsi.ucla.edu/researcher-resources/pages/REDCap
REDCap (Research Electronic Data Capture) is a secure, HIPAA-compliant web-based application for quickly building and managing online surveys, data collection forms, and databases. REDCap provides audit trails for tracking data manipulation and user activity, as well as automated export procedures for seamless data downloads to Excel, PDF, and common statistical packages (SPSS, SAS, Stata, R).

UC-RESEARCH EXCHANGE
http://ctsi.ucla.edu/researcher-resources/pages/ucrex
The University of California Research eXchange (UC-ReX) is a joint activity of the 5 University of California (UC) CTSAs, charged with fostering multi-site clinical research by providing access to harmonized clinical data from the 5 health systems. The UC ReX Data Explorer is a secure online system designed to enable UC clinical investigators to identify potential research study cohorts spanning the five UC medical centers. The Data Explorer allows investigators to conduct interactive searches of data derived from patient care activities at Davis, Irvine, Los Angeles, San Diego and San Francisco. Search criteria can include demographics, diagnosis and procedure codes (ICD-9 and CPT), labs, and medications. The output of each query from the UC ReX Data Explorer is a numeric count, by site, of patients who match the criteria identified in the query. The numeric count helps investigators assess the feasibility of their study idea by identifying whether there are sufficient numbers of prospective subjects within the UC system.

Interpreting biomedical BigData

ResearchMaps.org
ResearchMaps helps scientists to plan their next experiment by navigating the enormous space of causal and molecular mechanistic information (“Big Knowledge”). Users manually enter the empirical results and hypothetical assertions from research articles, which our interface visualizes in a graphical summary known as a “research map.” Nodes in the research map identify the phenomena that were studied; edges between nodes show the kinds of relations that were either supported by experiments or hypothesized by researchers. We apply a Bayesian approach to quantify both the convergence and consistency of the empirical evidence, helping the user to identify which new experiments may prove most instructive. In the graphical structure of a research map, every edge is linked to the article(s) it references, allowing the user to retrieve additional details of the annotated literature.
http://researchmaps.org

sparsebn R package
The sparsebn R package was developed for learning the structure of large, sparse graphical models with a focus on Bayesian networks. While there are many existing packages for this task within the R ecosystem, this package focuses on the unique setting of learning large networks from high-dimensional data, possibly with interventions. As such, the methods provided place a premium on scalability and consistency in a high-dimensional setting. The sparsebn package is open-source and available on CRAN (http://CRAN.R-project.org/package=sparsebn).

Exploring Dynamical Cell Biology

CTCS MODELS: Co-transcriptional constitutive Splicing model
https://github.com/jdavisturak/CTCSmodel
Co-transcriptional splicing is a dynamic process that renders introns as potential bottlenecks for efficient mRNA processing. This model allows exploration of the parameters that affect the efficiency of constitutive mRNA processing.
Nucleic Acids Res. 2015 Jan;43(2):699-707

DSNICKFURY
https://github.com/michael-weinstein/dsNickFury2
A Python3 program to help select guide RNA sequences for use with any CRISPR/Cas system.

Fit∂a∂i
The distribution of fitness effects (DFE) has considerable importance in population genetics. Here we present a flexible and computationally tractable method, called Fit∂a∂i, to estimate the DFE of new mutations using the site frequency spectrum from a large number of individuals. We apply our approach to the frequency spectrum of 1300 Europeans from the Exome Sequencing Project ESP6400 dataset, 1298 Danes from the LuCamp dataset, and 432 Europeans from the 1000 Genomes Project to estimate the DFE of deleterious nonsynonymous mutations. We infer significantly fewer (0.38-0.84 fold) strongly deleterious mutations with selection coefficient |s| > 0.01 and more (1.24-1.43 fold) weakly deleterious mutations with selection coefficient |s| < 0.001 compared to previous estimates. Furthermore, a DFE that is a mixture distribution of a point mass at neutrality plus a gamma distribution fits better than a gamma distribution in two of the three datasets. Our results suggest that nearly neutral forces play a larger role in human evolution than previously thought.
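
The mixture form of the DFE mentioned above (a point mass at neutrality plus a gamma distribution) can be made concrete by drawing selection coefficients from it. This sketch only samples from such a mixture and does not perform any inference. The function names and all parameter values are hypothetical and are not the estimates reported for any dataset.

```python
import random


def sample_dfe(n, p_neutral, shape, scale, seed=0):
    """Draw n values of |s| from a neutral point mass plus gamma mixture.

    With probability p_neutral a mutation is exactly neutral (|s| = 0);
    otherwise |s| is drawn from a gamma distribution with the given
    shape and scale.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < p_neutral:
            draws.append(0.0)
        else:
            draws.append(rng.gammavariate(shape, scale))
    return draws


def fraction_in_range(draws, low, high):
    """Fraction of draws with low <= |s| < high."""
    return sum(low <= s < high for s in draws) / len(draws)


# Hypothetical parameters chosen only to illustrate the mixture.
draws = sample_dfe(20000, p_neutral=0.2, shape=0.2, scale=0.05)
print("weakly deleterious, |s| < 0.001:", fraction_in_range(draws, 0.0, 0.001))
print("strongly deleterious, |s| >= 0.01:", fraction_in_range(draws, 0.01, float("inf")))
```

Binning the draws by |s| in this way corresponds to the weakly and strongly deleterious classes compared in the description above.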

Fitness landscapes and evolutionary forecasting
To predict the dynamics of the simplest evolutionary systems, only three parameters are required: population size, mutation rate, and the fitness effects of mutations. The first two are relatively easy to obtain for natural populations, but acquiring information about the space of fitness effects, aka the fitness landscape, is a monumental task that has only recently seen significant advances. With fitness landscapes measured for microbial populations, simulations can be run, and evolutionary forecasts can be made to predict how bacteria and viruses will evolve in the face of environmental stressors, such as antibiotics and antivirals. Analyzing fitness landscapes is a distinct problem involving visualization of large amounts of data and quantification of epistasis. I will show examples of fitness landscapes and their quantitative properties for a range of organisms facing various environmental challenges, and show how to make evolutionary forecasts based on these data.

FLOWMAX
Interpreting Lymphocyte dynamics
http://www.signalingsystems.ucla.edu/models-and-code/
Dye dilution experiments (typically CFSE) are commonly used to investigate the population dynamics of lymphocytes in response to immunogenic stimulation. FLowMax interprets such data to derive cell biological parameters (such as probability to grow, time to division and time to death) with a measure of confidence.
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0067620

KINETIC MODELS OF NFκB DYNAMICS http://www.signalingsystems.ucla.edu/webmodel/view.DetailSelectModel.php
This web-interface allows access to a series of mathematical models that simulate NFκB dynamics in response to different stimuli. The user may investigate the effects of knockouts or kinetic reactions on the dynamics of NFκB.
Immunol Rev. 2012 Mar;246(1):221-38.

py-SUBSTITUTION
Distinguishing between kinetic and static features within a molecular network
http://www.signalingsystems.ucla.edu/models-and-code/
Typical formulations of dynamical systems models of molecular networks involve kinetic parameters that affect both the abundances and the flux of molecular species. py-Substitution allows for analytical expressions of the steady state that enable the study of abundances and fluxes separately.
PLoS Comput Biol. 2013 Feb; 9(2): e1002901

SPOTLITE
https://lbgsites2.bioinf.unc.edu/spotlite/
Web Application and Augmented Algorithms for Predicting Co-Complexed Proteins from Affinity Purification – Mass Spectrometry Data.

Interpreting and modeling structural data

3D Profiling
3D Profiling of a protein structure is a method to identify other protein sequences that fold in the same way as the profiled protein. Here we apply 3D profiling to identify LARKS (Low-complexity, Amyloid-like, Reversible, Kinked Segments) in the human proteome which may recruit proteins into intracellular bodies. Based on the atomic structures of four LARKS determined in our lab, we have identified more than 1500 candidate LARKS in the human proteome. Many of the candidate LARKS are in proteins already known to participate in dynamic intracellular bodies, including stress granules and P-bodies.

Protein Crystallography
Protein crystallography has advanced to interrogate nano-scale crystalline assemblies. The tools required for frontier analyses of protein structure produce more data than ever before. New detectors are producing data at rates that parallel those of web video traffic. The reduction of ever-larger data sets is therefore an important challenge for the future growth of these technologies. Examples of the new kinds of data that require interpretation and reduction include those collected from experiments in femtosecond diffraction using x-ray lasers, multi-dimensional electron nano-diffraction, and diffractive x-ray imaging. I provide examples of experiments in which the growth of data production is quickly outpacing our reduction capabilities and limiting in-line data processing. Solutions that reduce the complexity of these data could facilitate the next generation of rapid structural analysis.

Interpreting Clinical Data

CancerLocator
CancerLocator exploits the diagnostic potential of cell-free DNA by determining not only the presence but also the location of tumors. CancerLocator simultaneously infers the proportions and the tissue-of-origin of tumor-derived cell-free DNA in a blood sample using genome-wide DNA methylation data. We comprehensively evaluate CancerLocator with simulations and real data, and compare its performance with that of two established multi-class classification methods. We show that the predicted tumor burdens are highly consistent with the true values.
Genome Biol. 2017 Mar 24;18(1):53