Workshop Description

This course takes researchers with no genomic experience through variant calling based on the Broad Institute’s best practices for GATK. It covers the path from raw sequence data to called variants, with emphasis on understanding when to apply the best practices and when a particular study or system dictates deviating from them. We’ll emphasize reproducible data processing and combining data sets, using shell scripts and for-loops to minimize accidental processing errors. By the end of the course, students should feel comfortable working through their own data and know when and how to deviate from best practices.
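As a rough illustration of the scripted for-loop approach mentioned above, the sketch below prints one GATK HaplotypeCaller command per sample instead of running it. The sample names, the reference file name, and the helper function genotype_cmd are hypothetical placeholders and not workshop material; the actual commands and options used in the course may differ.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample names and reference path; replace with your own.
samples=(sampleA sampleB sampleC)
ref="reference.fasta"

# Build the per-sample command in one place so every sample
# is processed with exactly the same options.
genotype_cmd() {
    local sample="$1"
    printf 'gatk HaplotypeCaller -R %s -I %s.bam -O %s.g.vcf.gz -ERC GVCF\n' \
        "$ref" "$sample" "$sample"
}

# Dry run: print the commands rather than executing them.
for sample in "${samples[@]}"; do
    genotype_cmd "$sample"
done
```

Printing the commands first makes it easy to check them before committing compute time. Because the command is defined once and reused for every sample, a typo cannot affect one sample and not the others.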

Day 1

  • Computer Setup
  • Data pre-processing and reference genome indexing
  • Mapping sequence reads to reference
  • SAM to BAM files and some BAM processing
  • Processing BAM files (cleaning, indexing, read groups, sorting, duplicate marking)
  • Genotyping individuals to GVCF files

Day 2

  • Joint variant calling to VCF files
  • Combining/adding data to studies at later times.
  • Processing raw VCF files
  • Variant recalibration or hard filtering of VCF files
  • Evaluation of final variant calls
  • Processing VCF files for analyses (if time remains)

Technical Requirements

  • A laptop with the workshop virtual machine installed (the virtual machine will be provided for installation)
  • Necessary software will be provided on the virtual machine
  • W1: Unix command line is a useful, but not required, prerequisite


Peter Scott is a postdoctoral researcher in the laboratory of Dr. Brad Shaffer. He received his PhD in Ecology and Evolutionary Biology from the University of Alabama. Peter’s primary research interests lie in applying genomic methods to understand species limits and diversification, hybrid zone dynamics, and landscape and conservation genomics in reptiles and amphibians. Additionally, he is interested in investigating how to best apply and adapt modern genomic methods to non-model systems that are representative of difficult evolutionary questions (e.g. resolving relationships in recent, rapid radiations), or that push the limits of these technologies (e.g. organisms with very large genomes).


Workshop Details

Prerequisites: W1, 2, or equivalent knowledge.
Length: 2 days, 3 hrs per day
Level: Introductory
Location: Collaboratory Classroom (Boyer Hall, 529)
Seats Available: 28

Fall 2018 Dates

November 15 and 16, 2018
9:30 AM – 12:30 PM