Blog @ Illumina
Real scientists. Real commentary.

Perspectives from the Illumina User Group Meeting

Abizar Lakdawalla
| Feb 16, 2012

For me, the highlight of this first day of AGBT was the Illumina User Group meeting, mapping out where our sequencing platforms are heading. I got in a bit late and missed the beginning of Jay Flatley's talk, but what I did hear was impressive. He had some clear comments about business ethics in the highly competitive NGS environment. Life Tech's email to their customers in response to the Roche offer on Illumina seemed to be the crux, and Jay was pretty authentic in his response to some of these competitive tactics. He also walked through the Roche offer history and the emphatic response from the Illumina board. Jay expressed his enthusiasm about how much more there is to come from Illumina's sequencing technologies and urged the 200 customers in attendance to stay tuned as we continue to innovate.

Geoff Smith from Illumina UK dived right into the recent developments that were announced in January at the JP Morgan Healthcare Conference. With the HiSeq 2500, in addition to the 600 Gb standard run, you can flip a switch and perform a "turbo-charged" run of 2 x 100 bp (120 Gb) in just 27 hours. A 2 x 50 bp run would be done in < 8 hrs, including cluster generation directly on the HiSeq 2500! Very high quality data was demonstrated on the HiSeq 2500, which also delivers new flexibility, allowing users to tune the throughput to the specific experimental need.

Illumina is also focused on speeding up sample prep, with a new protocol in the works that uses TruSeq library prep on 500 ng DNA, with a PCR-free and gel-free workflow to shorten sample prep to just 3 hours with even better data quality. So the total process time breaks down as 3 hrs for sample prep, 27 hrs to sequence, and 20 hrs for data analysis (variant calling). Geoff showed an amazing calendar of sample sequencing runs on a prototype instrument this year alone. In 3.5 weeks, they generated about 2 Tb of data from 16 samples, with > 90% of the data above Q30. Some of the samples were prepped the same day the sequencing run started. This machine is not vaporware, kids.

Geoff jumped into MiSeq performance improvements next. In 2012, MiSeq throughput increases to 7 Gb of data from 15 M clusters, with a run time of < 24 hrs for 2 x 150 bp. The new chemistry improvements have shaved a minute and a half off the cycle time of the first version of MiSeq, and the increased data amounts come from imaging more flow cell area, including the second surface. But what is even more exciting is the launch of a new 2 x 250 bp chemistry. Even at 250 cycles, the raw error rate is only 0.69% (with 8 Gb from 16 M reads)! Internally, the new chemistry has been used to generate 2 x 400 bp reads, but with an error rate of 2% at 400 cycles, though 80% of the bases are still over Q30. Effort is underway to optimize the 2 x 400 reads to achieve the target accuracy requirement. As a teaser, Geoff also showed a 678 bp overlapping read with zero errors. He confessed to being "stupid to only use a 500 bp human library" for this experiment. That data is comparable to the best data generated on a capillary sequencer!
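For anyone who wants to put the Q-scores and error rates quoted above on the same scale, here is a minimal Python sketch. It assumes the standard Phred definition (Q = -10 * log10 of the error probability) and assumes that the "16 M reads" figure means 16 M clusters, each read from both ends. The function names are made up for illustration and are not part of any Illumina software. Note also that a raw error rate at a given cycle is an aggregate measure, so converting it to a Phred-style number is only a rough comparison, not a per-base quality score.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score (standard definition)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred-scaled equivalent of an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def paired_end_yield_bases(clusters, read_length, reads_per_cluster=2):
    """Total bases, assuming every cluster is read reads_per_cluster times."""
    return clusters * reads_per_cluster * read_length


# Q30 corresponds to an expected error of 1 base in 1,000.
q30_error = phred_to_error_prob(30)

# Raw error rates quoted in the talk, expressed on the Phred scale.
q_at_250 = error_prob_to_phred(0.0069)  # 0.69% at 250 cycles
q_at_400 = error_prob_to_phred(0.02)    # 2% at 400 cycles

# Assumption: 16 M clusters, 2 x 250 bp.
yield_250 = paired_end_yield_bases(16_000_000, 250)

print(f"Q30 error probability: {q30_error:.4f}")
print(f"0.69% error is roughly Q{q_at_250:.1f}")
print(f"2% error is roughly Q{q_at_400:.1f}")
print(f"2 x 250 bp yield: {yield_250 / 1e9:.1f} Gb")
```

Under those assumptions, the 2 x 250 bp yield works out to the 8 Gb quoted in the talk, and the 0.69% raw error rate at cycle 250 sits a little above the Q20 mark (1% error).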

[Image: GIAD data set]

Sheila Fisher of the Broad Institute talked about their experiences. Since July 2011, they have performed over 220 MiSeq runs, reporting excellent data quality and very low failure rates right out of the box. This powerhouse genome center runs a wide variety of applications, the biggest being library QC, followed by amplicon sequencing for mutation validation. The group ran 85 index QC runs representing 8000 samples, which is much greater than 1000 production lanes on HiSeq! The Broad QC run is 8 cycles and takes only 2 hrs and 40 minutes to complete. "This is a dramatic difference, we just took two weeks off our process time!" said Sheila. The MiSeq and HiSeq data are totally concordant. In fact, the MiSeq 2 x 150 bp quality is slightly better than 2 x 100 bp on HiSeq, and therefore might be better suited to certain applications, for example, bacterial de novo assembly. They have achieved 2 x 250 bp using standard reagent chemistry, maintaining quality above Q20 and getting 3.84 Gb per run. Another application suited to MiSeq is de novo viral assembly. Sheila suggested that they will soon switch over all viral work from 454 to MiSeq.

Sheila concluded with some love for the HiSeq 2500, predicting the fast turnaround and high data quality would have a huge impact on the genome center. A Coriell DNA sample was run through a modified PCR- and gel-free sample prep protocol. The total process time was 53 hrs, yielding 127 Gb of passing filter reads. It actually took longer to transfer the data from Chesterford UK to the Broad than it did to sequence the sample. Looking at the data slide, Sheila stated, "There is really no bias here! This is the best bias curve I have ever seen! Regions which had gaps before now also look good!"

Stephan Schuster from Penn State focused on de novo sequencing applications which have really been enabled by MiSeq. The first project done on the MiSeq had an interesting experimental design: researchers infected themselves serially with Helicobacter pylori, with intervening antibiotic therapy. Stephan used an approach very similar to sequencing by genotyping to figure out H. pylori allele distribution for these samples. In the second project, he talked about the sequencing of an ancient polar bear jaw bone (twice as old as the mammoth DNA) alongside contemporary polar bear samples. He prepared 24 amplicon libraries per MiSeq run at only $41 sequencing cost per sample. He got quite amazing genotyping calls, with a call ratio of 3,000:0 for homozygous alleles and almost an exact split for heterozygous alleles. In the third project, he is using the MiSeq for plant genomes; they had limited funding to sequence a parasitic plant. The first run of the MiSeq produced 1x sequence coverage, but this was enough to get a 30% better assembly. In the last project, he talked about mtDNA sequencing of flies (ancient species that pre-date the dinosaurs). He used two long PCR products to cover the whole mtDNA (other than 16S) and, in a single run, did all 20 complete mt genomes. This is a great example of how the MiSeq has enabled a small lab to take on a diversity of projects and deliver data in a surprisingly short time.

Stephen Kingsmore from the Center for Pediatric Genomic Medicine in Kansas City gave an inspiring talk on the NGS revolution in the clinical space, and how the higher-speed HiSeq 2500 has the potential to provide data in a short time, making clinical intervention possible for many neonatal diseases. He walked through two specific examples: one child with a skin lesion disease, and another with lactic acid disease. Sequencing quickly pointed out the mutations with an indication of a mitochondrial defect for the lactic acid phenotype.

Really awe-inspiring science, and a fantastic end to the first day @AGBT.