Blog @ Illumina
Real scientists. Real commentary.

Utilization of New DNA Sequencing Technologies: An AGBT Tutorial Workshop, Part 2

Abizar Lakdawalla | Feb 16, 2012

[Continued from previous blog post]

Matt Blow, Department of Energy, Joint Genome Institute
Discussed sequencing, from single cells to metagenomes. Primary projects involve plants, fungi, microbes, and metagenomes, mostly de novo assemblies, where gap filling remains a challenge. The main reasons for these gaps are: repeats (e.g., wheat, 90% repeats), polyploidy (e.g., switchgrass = tetraploid), extreme sequence composition (e.g., microbes with 70% GC), and uneven sequence abundance (e.g., soil metagenomes contain 10^5 species with orders-of-magnitude differences in abundance). Hybrid assembly with PacBio and Illumina is a popular route to reduce the number of gaps.

In the latter part of his talk, he presented whole-genome sequencing of single microbial cells. The process involves flow sorting single cells, followed by genome amplification and sequencing, with computational normalization of coverage by downsampling overrepresented regions. An ambitious 300 single cells are expected to be sequenced in 2012. Conclusion: Short read assembly works, and PacBio hybrid assembly helps. Question: How does PacBio assembly by itself look? Matt's answer: not done yet; the high error rate might interfere with assembly, though rapid improvements are expected.
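To make the "downsample overrepresented regions" idea concrete, here is a minimal sketch in Python. It uses a digital-normalization-style rule (keep a read only while the median count of its k-mers is still below a cutoff). The talk did not say which method JGI actually uses, so the approach, the function names, and the `k` and `cutoff` values are all assumptions for illustration.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def normalize_coverage(reads, k=5, cutoff=3):
    """Drop reads from regions that are already well covered.

    A read is kept only if the median count of its k-mers, among the
    reads kept so far, is below `cutoff`. Both parameters are
    illustrative, not values from the talk.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = list(kmers(read, k))
        if not read_kmers:
            continue  # read is shorter than k, nothing to count
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Toy input: one heavily amplified region and one rare region.
reads = ["ACGTTGCAAG"] * 10 + ["GGATCCTTAA"] * 2
kept = normalize_coverage(reads)
```

In this toy input the amplified read is capped at the cutoff while both copies of the rare read survive, which is the flattening effect you want before assembling unevenly amplified single-cell data.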

Shawn Levy, HudsonAlpha Institute
Discussed Transcriptomics (at last, a nice short title!). Most of the focus was on automated protocols, including sample handling and library prep. Sample quality control is critical. With the TruSeq RNA kits, they use just 100 ng of input RNA. Shawn gave a couple of biological examples, including a mouse knock-in model where they identified unique, novel SNPs at splice sites. They typically run experiments in large batches, requiring runs on 7 HiSeqs and 4 Genome Analyzers, equating to 13 flow cells, 104 pooled sample sets. Normalization after fluorescence quantitation gives more uniform cluster density, and automation always produces more uniform cluster densities. A somewhat surprising data point: even when the fluorescence quantitation is the same, different library preps give different cluster numbers. TruSeq gave the highest cluster numbers for a given amount of quantified DNA. qPCR is also valuable for quantitation. The other surprise was that bead-based purification had essentially zero loss of DNA, even over multiple purification cycles, as long as the beads are retained in the tube for the process steps. This is important for very low input DNA preps.
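For readers who have not done the normalization arithmetic by hand, here is a rough Python sketch of the usual conversion from a fluorescence reading (ng/µL) to molarity, followed by a dilution to a common target. The 660 g/mol per base pair figure is the standard approximation for double-stranded DNA. The function names and example numbers are mine, not Shawn's, and as he pointed out, equal fluorescence does not guarantee equal cluster numbers across prep types, so treat this as a starting point rather than his protocol.

```python
def molarity_nM(conc_ng_per_ul, mean_fragment_bp, da_per_bp=660.0):
    """Convert a dsDNA concentration in ng/uL to nM.

    Uses the common approximation of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (da_per_bp * mean_fragment_bp)


def dilution_volumes(stock_nM, target_nM, final_ul):
    """Return (library_ul, diluent_ul) to reach target_nM in final_ul."""
    if stock_nM < target_nM:
        raise ValueError("stock is more dilute than the target")
    library_ul = target_nM * final_ul / stock_nM
    return library_ul, final_ul - library_ul


# Hypothetical library: 6.6 ng/uL with a 500 bp mean fragment size.
stock = molarity_nM(6.6, 500)
library_ul, diluent_ul = dilution_volumes(stock, target_nM=2.0, final_ul=20.0)
```

Running the same two steps on every library in a pool is what brings them to a common molarity before clustering; a qPCR-based estimate could be dropped in for `stock` where fluorescence is not trusted.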

Olivier Elemento, Weill Cornell Medical College
Spoke about Epigenomics. He walked through the three main approaches: ChIP-Seq, bisulfite sequencing, and Hi-C (chromatin interaction). Olivier talked about ChIP-Seq data analysis packages (MACS, PeakSeq, Cistrome, etc.). An ideal data set would show >90% alignment, >80% unique mapping with low PCR duplicates (<20%), a high signal-to-noise ratio (for beautiful track visualization), and strong enrichment of the expected motif region (I thought all data sets were like that!). They integrate multiple ChIP-Seq experiments by performing principal component analysis of binding patterns. Perturbation experiments are key to finding the true targets (the functional binding sites) of transcription factors. He then moved on to bisulfite sequencing. The primary limitation of BS-seq is the high cost due to the extensive coverage required; therefore, reduced representation bisulfite sequencing (RRBS) libraries are attractive. One HiSeq 2000 lane gave >10x coverage for the 2-4 M CpGs sampled. Olivier showed an example for prostate cancer with linkage clustering on 3868k CpGs. The correlation with gene expression was not as strong as I thought it would be, but there is always more hidden biology than meets the eye.
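The ChIP-Seq quality thresholds Olivier listed are easy to turn into an automated check. The Python sketch below is my own illustration, not his pipeline: the class and function names are hypothetical, and the choice of denominators (alignment and unique mapping as fractions of total reads, duplicates as a fraction of aligned reads) is an assumption, since the talk did not spell them out.

```python
from dataclasses import dataclass


@dataclass
class ChipSeqCounts:
    total_reads: int
    aligned_reads: int
    uniquely_mapped_reads: int
    duplicate_reads: int


def qc_report(c):
    """Compare read-level metrics against the thresholds from the talk.

    Returns {metric: (value, passed)}. The denominators are assumptions.
    """
    aligned = c.aligned_reads / c.total_reads
    unique = c.uniquely_mapped_reads / c.total_reads
    duplicates = c.duplicate_reads / c.aligned_reads
    return {
        "alignment": (aligned, aligned > 0.90),       # >90% alignment
        "unique_mapping": (unique, unique > 0.80),    # >80% unique mapping
        "pcr_duplicates": (duplicates, duplicates < 0.20),  # <20% duplicates
    }


def passes_qc(c):
    """True only if every metric meets its threshold."""
    return all(ok for _, ok in qc_report(c).values())


# Hypothetical counts for two libraries.
good = ChipSeqCounts(1_000_000, 950_000, 850_000, 100_000)
over_amplified = ChipSeqCounts(1_000_000, 950_000, 850_000, 300_000)
```

Signal-to-noise and motif enrichment need peak calls and are left out here; this covers only the three numbers you can read straight off the alignment statistics.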

In the final part of his talk, Olivier described an involved Hi-C protocol, based on Lieberman-Aiden et al., for quantifying chromatin proximity. Sexy 3D models of chromatin interactions and animated changes in ERG overexpression in prostate cancer cells got everyone's attention.

Lisa White, Baylor College
Talked about sequencing as an assay in the CLIA lab. This was a very good introduction to a bit of a different topic. Lisa talked about the requirements of the College of American Pathologists (CAP) accreditation guidelines in the US. The overall process for the sequencing workflow includes a substantial pre-processing setup. The data delivery aspects are also significantly more complex than I had anticipated. The sign-off requires confirmation of data followed by geneticist review, generation of a report with diagnostic information, and genetic counselor approval. The report is delivered to the physician, who finally relates it to the patient.

Examples were provided from Baylor's Cancer Genetics lab using a cancer panel based on multiplex PCR on the Ion Torrent system, mtDNA sequencing on HiSeq after hybridization pulldown, and exome sequencing with Nimblegen in combination with HiSeq. The analysis pipeline for each assay was described, as well as the various reviews, confirmations, and reports. Lisa's presentation was a real education, detailing the intricate process required for moving sequencing into clinical use. Check out their website:

Eric Johnson, University of Oregon
Implored us to "Sequence, and ye shall find: Genotyping by sequencing." Cute title. Was not sure what it meant though. After explaining that genotyping by sequencing (GBS) means "reproducibly sequencing a subset of the genome," Eric explained the value and the tradeoffs in identifying novel polymorphisms this way: reduced cost, but lower throughput. Multiple applications with GBS were described, including RAD (restriction site associated DNA). In RAD, gDNA is cut with a frequent-cutting restriction enzyme, and then Illumina sequencing adapters are added to the restricted DNA, followed by sequencing of the library. He showed a few interesting case studies:

  • Mutation mapping in temperature-sensitive C. elegans with EcoRI RAD markers
  • SNP markers to produce genetic maps in gar fish
  • RAD-seq genotyping on natural populations with phenotypic differences in threespine stickleback
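
The RAD idea described above (cut at restriction sites, then sequence what sits next to them) can be mimicked in silico. The Python sketch below is a simplified illustration of my own, not Eric's pipeline: it finds EcoRI recognition sites (GAATTC) in a sequence and returns the bases flanking each site. Real RAD reads start at the cut position inside the site and depend on adapter design, which this ignores; the function names and `tag_len` are assumptions.

```python
import re

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def find_sites(genome, site=ECORI_SITE):
    """Return the start position of every occurrence of `site`.

    A lookahead is used so that overlapping occurrences are also found.
    """
    pattern = "(?=" + re.escape(site) + ")"
    return [m.start() for m in re.finditer(pattern, genome)]


def rad_tags(genome, site=ECORI_SITE, tag_len=8):
    """Return (position, upstream_tag, downstream_tag) for each site.

    Tags are the bases immediately flanking the recognition site,
    truncated at the sequence ends.
    """
    tags = []
    for pos in find_sites(genome, site):
        end = pos + len(site)
        upstream = genome[max(0, pos - tag_len):pos]
        downstream = genome[end:end + tag_len]
        tags.append((pos, upstream, downstream))
    return tags


# Toy sequence with two EcoRI sites.
genome = "AAAACCCC" + "GAATTC" + "GGGGTTTT" + "GAATTC" + "ACGT"
tags = rad_tags(genome)
```

Because the same sites are cut in every individual, the same flanking tags get sequenced each time, which is what makes the subset of the genome "reproducible" and lets SNPs in those tags serve as markers.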

And so ended this year's stellar Technology Tutorial sessions. After the last fish slide slipped from view, I was off and running to the Illumina User Group meeting.