NextSeq 500: Versatility for RNA-Seq
Baseball fans will recognize the value of the 5-tool player: an exceptional athlete who hits for power, hits for average, has the speed to steal bases, and has exceptional fielding and throwing skills. These players are highly prized by teams looking to acquire a player whom they might be able to develop into “the next Willie Mays.”
The newest member of the Illumina team of sequencing platforms is the NextSeq 500, a versatile sequencer not unlike a 5-tool baseball player, providing all of the functions needed by today’s NGS researcher. It is powerful, providing as much data as the current HiSeq 2500 system does in Rapid Run mode. It’s fast, completing a standard 2 x 75 bp paired-end RNA-Seq run in less than 18 hours. The NextSeq 500 is also easy to operate, using cartridge-based reagent kits and touch-screen controls similar to those pioneered on the MiSeq. It is flexible, since it can run various configurations of flow cells and read lengths, suitable for a wide range of NGS applications. And, being tightly integrated with new bioinformatic tools in BaseSpace, it has strong connectivity to downstream data analysis.
RNA-Seq is one of the applications that is well suited to a versatile instrument such as the NextSeq 500. RNA-Seq provides a quantitative view of gene expression by sampling, or counting, the transcriptome. How deeply the transcriptome must be sampled depends on the application: simple gene-level expression profiling requires only about 10 million reads per sample, while detection of transcript isoforms requires 25 to 100 million reads per sample, depending on the precise application and the sample prep kit used in the study. By providing a total of more than 400 million paired-end reads per run, the NextSeq 500 hits the sweet spot for a variety of RNA-Seq experiments, which are often in the range of 4 to 24 samples. Both individual users and core labs dealing with a variety of RNA-Seq experiments are going to enjoy the versatility and speed the NextSeq 500 system offers, especially when considering its comparatively low operating cost.
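The depth figures above translate into a rough multiplexing estimate. The following is a minimal sketch of that arithmetic, not an Illumina planning tool: it assumes perfectly even pooling, treats the 400 million figure as the usable per-run total, and ignores reads lost to index or quality filtering. The function and variable names are illustrative.

```python
# Rough multiplexing estimate: how many RNA-Seq samples fit in one run?
# Assumption: reads are split evenly across samples with no losses.

RUN_READS = 400_000_000  # per-run output quoted in the text

# Per-sample depth targets quoted in the text (reads per sample).
DEPTH_TARGETS = {
    "gene-level expression profiling": 10_000_000,
    "isoform detection (low end)": 25_000_000,
    "isoform detection (high end)": 100_000_000,
}


def samples_per_run(run_reads: int, reads_per_sample: int) -> int:
    """Whole samples that fit in one run at the requested depth."""
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    return run_reads // reads_per_sample


for application, depth in DEPTH_TARGETS.items():
    count = samples_per_run(RUN_READS, depth)
    print(f"{application}: {count} samples per run")
```

Under these assumptions the estimate ranges from 4 samples at 100 million reads each to 40 samples at 10 million reads each. A 24-sample experiment would receive roughly 16 to 17 million reads per sample.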
Connectivity of NextSeq 500 to BaseSpace
One of the most important new features of the NextSeq 500 system is the built-in connectivity to downstream data analysis tools in BaseSpace. For the first time, NGS users will not be required to have a server attached to their sequencer, since the system is designed to stream data directly to the BaseSpace analysis environment in the cloud.
The first RNA-Seq applications that Illumina is launching in BaseSpace include the widely used TopHat and Cufflinks suite of tools, which have been packaged together into truly easy-to-use “apps”. These software tools are as simple and intuitive to use as most apps that you have on your iPad or iPhone. The TopHat app will perform alignment and provide a large amount of key QC and run metrics. You will also have the option to enable discovery of fusion transcripts using the TopHat Fusion algorithm, for instance, if you are studying cancer. The Cufflinks app will support novel transcript assembly and then perform differential expression analysis at both the gene and transcript level. These apps will also provide several types of raw data output that can be downloaded, so that experienced users can combine the apps with other downstream analyses, such as R tools, with other local pipelines, or even with other BaseSpace apps.
Put it this way: these analysis tools are so easy that even I can run them – and I have no training in bioinformatics and had never run any RNA-Seq pipeline myself before using BaseSpace!
NextSeq 500 Data Quality Comparison
The NextSeq 500 system uses an innovative new 2-dye sequencing chemistry that is distinct from the chemistry of other Illumina SBS sequencing platforms. Understandably, this can raise some questions with regard to the data quality for RNA-Seq, yet we have already seen that the NextSeq 500 system shows remarkable concordance with Illumina’s traditional 4-dye sequencing platforms. My group in R&D has run many RNA-Seq samples on the NextSeq 500 system and we have looked very closely at the system performance. One of our first observations was that it was relatively easy to exceed the 400 million PF read specification for the system when using RNA-Seq libraries. In fact, most of the runs we have done have yielded closer to 500 million reads.
The NextSeq 500 system can generate a lot of reads, but how do the data compare to those from the other well-established Illumina platforms? The data shown below give an idea of the overall agreement between NextSeq 500 data and data generated on MiSeq and HiSeq. The top panels show correlation plots of gene counts (FPKM) observed when running the same RNA-Seq library on the NextSeq, HiSeq, or MiSeq platforms. The bottom panels show the concordance of fold-change ratios between two samples (UHRR and Brain), comparing NextSeq with either HiSeq or MiSeq, for all genes that Cufflinks calls as significantly differentially expressed between these two samples. You can see that in both types of comparison, gene counts (FPKM) or gene expression ratios, the NextSeq 500 system generates data that are entirely consistent with the other two Illumina sequencing platforms in use today.
Figure Description: The samples used in this analysis are made with standard RNA from UHRR and Human Brain, two samples we have been using for years in development, dating back to the MAQC project [Shi et al., Nat Biotechnol. 24:1151-61 (2006)]. The libraries were prepared with the Illumina TruSeq Stranded mRNA Sample Prep Kit. TOP: The two panels on the top show the correlation between gene counts for NextSeq 500 compared to both HiSeq and MiSeq data. These counts were generated using the Illumina TopHat Application that will soon be available in BaseSpace. BOTTOM: The two panels on the bottom show the Log2 fold-change comparisons between differentially expressed genes in Brain/UHRR samples for HiSeq and MiSeq relative to the same fold-change calls made with NextSeq 500. For each panel, the coefficient of determination of the linear regression (R²), the slope of the best-fit line, and the number of genes included in the plot are given.
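For readers who want to reproduce this kind of cross-platform comparison on their own exported gene tables, the summary statistics in each panel (R², slope, and gene count) can be computed with an ordinary least-squares fit. The sketch below is an illustration only, not the analysis behind the figure: the FPKM values are made-up placeholders, the pseudocount of 1.0 is an assumption to keep zero-expression genes finite, and Cufflinks reports its own fold changes, which would normally be used directly.

```python
import math


def log2_fold_change(fpkm_a, fpkm_b, pseudocount=1.0):
    """log2(A/B); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((fpkm_a + pseudocount) / (fpkm_b + pseudocount))


def fit_line(x, y):
    """Least-squares fit of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder (Brain FPKM, UHRR FPKM) pairs per gene; NOT data from the study.
platform_a = {"gene1": (120.0, 15.0), "gene2": (3.0, 48.0),
              "gene3": (60.0, 55.0), "gene4": (0.5, 9.0)}
platform_b = {"gene1": (115.0, 16.0), "gene2": (3.5, 50.0),
              "gene3": (58.0, 57.0), "gene4": (0.4, 10.0)}

# Compare only genes present on both platforms, in a fixed order.
genes = sorted(platform_a.keys() & platform_b.keys())
lfc_a = [log2_fold_change(*platform_a[g]) for g in genes]
lfc_b = [log2_fold_change(*platform_b[g]) for g in genes]

slope, intercept, r2 = fit_line(lfc_a, lfc_b)
print(f"n = {len(genes)}, slope = {slope:.3f}, R^2 = {r2:.3f}")
```

A slope and R² both close to 1 indicate that the two platforms report the same fold changes for the same genes, which is what the bottom panels summarize.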
Besides looking at gene count comparisons, for me one of the most satisfying aspects of analyzing RNA-Seq data is that you can visualize the results in amazing detail, with your own eyes, using a genome browser. The next figure shows gene coverage plots of the human GAPDH and CALR genes with data from all three Illumina platforms: the NextSeq 500, HiSeq, and MiSeq systems. As you can see in these plots, the detailed, base-by-base patterns of aligned reads across these genes are virtually identical for all three platforms. Although each of the three platforms uses different SBS sequencing and clustering chemistries, along with entirely different instrumentation and hardware (lasers, optics, cameras, flow cells, etc.), this base-by-base visualization shows that all three platforms are sequencing, and counting, the same molecules with very little, if any, detectable system-to-system bias.
Figure Description: These two panels show Integrative Genomics Viewer (IGV, available from the Broad Institute) browser shots of RNA-Seq data. They display the read coverage of two human genes, GAPDH (upper) and CALR (lower), from sequencing data generated on the NextSeq 500 system (top, orange), the HiSeq 2500 system (middle, blue), and the MiSeq system (bottom, purple).
Looking Back at an Evolving Game
Back in the early days of RNA-Seq we generated the original “body map” data using the Genome Analyzer II (GAII), which was the first Illumina sequencing platform. These data were collected in the spring of 2007 and eventually formed the basis of one of our first RNA-Seq papers, published in Nature in late 2008 as part of our collaboration with Chris Burge and his lab at MIT [Wang et al., Nature 456:470-6 (2008)].
In that era of Illumina sequencing, generating 400 million reads, as we did for this dataset, required more than 20 flow cells, or complete runs, on the original Illumina GAII. Run setup was lengthy and difficult, and required a completely separate machine called the Cluster Station to grow clusters on the flow cells. The turnaround time for a single 32 bp read was almost 3 days, and the GAII could neither read beyond 35 bp nor perform paired-end reads.
While the GAII was a star player in ’07 and ’08, it now pales in comparison to today’s platforms. For instance, if we wanted to re-generate all of the data used in our first body map study, we could do it with a single 18-hour run on the NextSeq 500 platform. The cost would be a fraction of that of the original study, we would be able to use much more powerful 2 x 75 bp reads, and the quality of the data would be much higher too. It is amazing how the sequencing game has changed!
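To put rough numbers on that change, here is a back-of-the-envelope sketch using only the figures quoted above. It assumes the GAII runs were performed one after another on a single instrument and rounds "more than 20" runs down to 20 and "almost 3 days" up to 3, so the result is an approximation rather than a record of the original study. The names are illustrative.

```python
# Back-of-the-envelope turnaround comparison using figures from the text.
# Assumptions: GAII runs happen back to back on one instrument;
# setup and cluster-generation time are not counted.

TOTAL_READS = 400_000_000  # reads in the original body map dataset
GAII_RUNS = 20             # "more than 20 flow cells"; treated as a lower bound
GAII_DAYS_PER_RUN = 3      # "almost 3 days" per 32 bp read; rounded up
NEXTSEQ_HOURS = 18         # single NextSeq 500 run


def instrument_days(runs: int, days_per_run: float) -> float:
    """Total sequencing time when runs are performed sequentially."""
    return runs * days_per_run


gaii_days = instrument_days(GAII_RUNS, GAII_DAYS_PER_RUN)
nextseq_days = NEXTSEQ_HOURS / 24
speedup = gaii_days / nextseq_days
reads_per_gaii_run = TOTAL_READS // GAII_RUNS

print(f"GAII: about {gaii_days} days of sequencing")
print(f"NextSeq 500: {nextseq_days} days of sequencing")
print(f"Approximate speedup: {speedup:.0f}x")
print(f"Reads per GAII run: at most about {reads_per_gaii_run:,}")
```

With these rounded inputs, roughly two months of GAII instrument time collapses into less than one day, before accounting for the longer paired-end reads.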
The greatest 5-tool players in baseball have changed the game and raised the bar for how other players are evaluated. The ultimate 5-tool player was Willie Mays, who hit a home run for his first hit in the major leagues and went on to become Rookie of the Year. In 1954, Mays was named National League MVP and led the Giants to a World Series championship. Many years of exciting baseball followed.
Here’s your chance to get the newest 5-tool player from Illumina. This powerful, economical and versatile system will change the way that you view desktop sequencers. What will the NextSeq 500 do for you?