Blog @ Illumina
Real scientists. Real commentary.

A First Look at RNA-Seq Data on the HiSeq 4000 System

by Stephen M. Gross, Smita Pathak, Lisa C. Watson, and Gary P. Schroth, Illumina Applications Group | Jan 12, 2015

Today, Illumina announced the HiSeq 3000 and HiSeq 4000 sequencing platforms, the first Illumina sequencers that use patterned flow cells for a more diverse set of applications. Patterned flow cells contain billions of nanowells at fixed locations, providing even cluster spacing and uniform feature size to deliver extremely high cluster densities. Is your research focus on microbes, metagenomics, plants, or agriculture? Or are you a fan of human exome sequencing and RNA-Seq? Your next sequencing workhorse is on the horizon. Inspired by technologies pioneered on the HiSeq X Ten, these new systems are built to be the sequencer of choice for genomics labs with a focus broader than human whole genome sequencing.

Our Illumina Applications group specializes in the development of new technologies for Illumina platforms with a focus on RNA-Seq. Part of what we do includes developing new methods for RNA-Seq, such as TruSeq RNA Access, and working closely with collaborators on cutting-edge science. We also continually evaluate Illumina products with a customer’s critical mindset to help ensure that what we offer exceeds expectations and improves with each version. The HiSeq 3000/HiSeq 4000 system with patterned flow cells is an intriguing engineering advance that expands the combinations of technologies available on Illumina’s three high-throughput sequencers used for core RNA-Seq applications (Table 1).

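Platform    | Clustering Capability | Flow Cell Cluster Generation | SBS Chemistry
------------|-----------------------|------------------------------|--------------
NextSeq     | On-board              | Random                       | 2-channel
HiSeq 2500  | cBot                  | Random                       | 4-channel
HiSeq 4000  | cBot                  | Patterned                    | 4-channel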

 

Given these technical differences, we, like any Illumina user, need to evaluate new platforms and determine whether they are suitable for the experiments we do. A year ago, we were pleased to introduce RNA-Seq data from the NextSeq 500 platform. This year, we are similarly thrilled to show that data from the HiSeq 4000 patterned flow cells correlate highly with data produced on our HiSeq 2500 and NextSeq 500 systems, as demonstrated by comparing gene expression counts from RNA-Seq libraries sequenced across the three platforms (Figure 1).

 

Figure 1: High correspondence between the HiSeq 4000 system and other Illumina platforms. Gene-level expression is shown as fragments per kilobase of transcript per million fragments mapped (FPKM) [Trapnell et al. (2010), Nature Biotechnology, 28, 511-515]. Scatterplots demonstrate high correlation between identical libraries sequenced on HiSeq 4000, HiSeq 2500, and NextSeq 500.
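For readers less familiar with the FPKM unit used in Figure 1, here is a minimal sketch of the basic definition. The function name and the example numbers are ours for illustration only; Cufflinks estimates abundances with a more involved statistical model, so this shows the unit, not the tool's method.

```python
def fpkm(fragment_count, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million fragments mapped.

    Simple textbook definition; the argument names are illustrative
    assumptions, not taken from any Illumina or Cufflinks API.
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) is the same as
    # multiplying the raw count by 1e9 and dividing by length * total.
    return fragment_count * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Made-up example: 500 fragments on a 2,000 bp transcript
# in a library with 50 million mapped fragments.
example_value = fpkm(500, 2000, 50_000_000)
print(example_value)  # 5.0
```

Because the value is normalized for both transcript length and sequencing depth, the same library sequenced on different platforms can be compared directly, which is what the scatterplots rely on.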

RNA-Seq experiments are, by their nature, largely about counts and statistics. But there is no better way to get a good feel for data quality than to look at the alignments by eye in a genome browser. We have a few genes we like to evaluate, and you can see the high correlation in RNA-Seq read coverage among the three sequencing platforms for two of these genes, GAPDH and CALR (Figure 2). These figures show that not only are the gene-level counts highly concordant across the three platforms, but the sequence-specific patterns of coverage that we observe in RNA-Seq are also very consistent.


Figure 2: Coverage of GAPDH (top) and CALR (bottom) on HiSeq 4000 correlates with HiSeq 2500 and NextSeq 500. Integrative Genomics Viewer [Thorvaldsdottir et al., (2013) Brief Bioinform 14, 178-192] screenshot displays bedgraph files generated from alignment of a human reference brain mRNA library sequenced on indicated Illumina platforms. Green: HiSeq 4000, Blue: HiSeq 2500, Orange: NextSeq 500.

If you are interested in checking out the data, we invite you to access some of our HiSeq 4000 RNA-Seq data on BaseSpace and see for yourself (free login required). This dataset encompasses eight RNA-Seq libraries generated from human universal reference RNA and human reference brain RNA, prepared using both the TruSeq mRNA and TruSeq Total RNA kits and sequenced on one lane of a HiSeq 4000.

The data we’re sharing total 387.5 million high-quality 2 x 76 bp paired-end reads. In some of our testing of the HiSeq 4000 capabilities, we’ve been able to generate over 400 million quality RNA-Seq reads per lane, which equates to an impressive 3.2 billion reads per flow cell! As we’ll discuss towards the end of this post, Illumina’s BaseSpace cloud storage and analysis environment is designed to make analyses fast and easy, a must for dealing with the data output of the HiSeq 4000. In the data that we are sharing, the eight libraries were each sequenced to a depth of approximately 50 million reads. Using the BaseSpace TopHat Alignment and Cufflinks Differential Expression apps, we were able to complete a full analysis in just over 12 hours. This analysis includes use of TopHat-Fusion [Kim et al., (2011), Genome Biology, 12, R72], a tool for discovery of gene fusions from RNA-Seq data. It’s nice to be able to turn data into insights so quickly using the power of cloud computing.

As ‘big’ as the new HiSeq 3000 and 4000 sequencers are, they are part of a growing family of products that enable ‘sample-to-answer’ solutions for NGS experiments. Recognizing that library preparation is time-consuming, we announced the NeoPrep automated library preparation system last year. Illumina has been working hard to get NeoPrep ready for customers and demonstrate its capabilities for applications such as low-input RNA-Seq. The output of NeoPrep (16 libraries) matches well with our flagship sequencing systems. For example, we’ve found ~50 million paired-end reads from an mRNA library is a robust target for many RNA-Seq discovery applications. The 16 libraries produced on NeoPrep can be sequenced to a depth of 50 million reads on just two lanes of a HiSeq 4000 flow cell. Or, if you have one of our versatile NextSeq 500 systems, 8 NeoPrep RNA-Seq libraries pair nicely with a high-output NextSeq 500 kit.
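The lane arithmetic above is easy to adapt to your own experiment. Below is a rough sketch using the figures quoted in this post (about 400 million reads per lane and a 50 million read target per library). The constant and function names are our own, and the eight lanes per flow cell is what the 3.2 billion figure implies; treat actual yields as run-dependent.

```python
# Figures quoted in this post; real yields vary from run to run.
READS_PER_LANE = 400_000_000
LANES_PER_FLOW_CELL = 8  # implied by 3.2 billion reads per flow cell
TARGET_READS_PER_LIBRARY = 50_000_000


def lanes_needed(n_libraries, reads_per_library=TARGET_READS_PER_LIBRARY,
                 reads_per_lane=READS_PER_LANE):
    """Smallest whole number of lanes that covers the requested depth."""
    total_reads = n_libraries * reads_per_library
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total_reads // reads_per_lane)


def libraries_per_lane(reads_per_library=TARGET_READS_PER_LIBRARY,
                       reads_per_lane=READS_PER_LANE):
    """How many libraries fit in one lane at the target depth."""
    return reads_per_lane // reads_per_library


reads_per_flow_cell = READS_PER_LANE * LANES_PER_FLOW_CELL
print(lanes_needed(16))        # one NeoPrep run of 16 libraries
print(libraries_per_lane())    # libraries per lane at 50 M reads each
print(reads_per_flow_cell)     # total reads across the flow cell
```

Rounding up to whole lanes is a deliberate choice here: a partially filled lane still has to be run, so 17 libraries at this depth would need three lanes rather than two.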

We recently tested a set of NeoPrep RNA-Seq libraries on the HiSeq 4000. Not only did the libraries sequence as well on a HiSeq 4000 as on a HiSeq 2500, but the technical reproducibility provided by NeoPrep is remarkable (Figure 3). We’re excited about getting this into our customers’ hands soon.

 

Figure 3: Gene-level expression correlation plots demonstrating technical reproducibility of NeoPrep RNA-Seq library preparation. Eight mRNA-Seq libraries were prepared on NeoPrep, alternating universal human reference RNA (UHRR) and human reference brain RNA in adjacent wells, and sequenced on a HiSeq 4000. Scatterplots demonstrate high correlation between like samples (UHRR vs. UHRR or Brain vs. Brain) and many differentially expressed genes in the UHRR vs. Brain comparisons. In these plots the gene-level expression is shown as log10 transformed FPKM.
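If you download the BaseSpace data and want to reproduce this kind of comparison yourself, the sketch below shows one way to compute a correlation on log10 transformed FPKM values. It is not the code used to make Figure 3. The pseudocount of 1 (to avoid taking the log of zero) is our assumption, since the handling of zero values is not described here, and the two short lists are made-up toy values rather than real results.

```python
import math


def log10_fpkm(values, pseudocount=1.0):
    """Log10 transform; the pseudocount is an assumption to handle zeros."""
    return [math.log10(v + pseudocount) for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy FPKM values for the same five genes in two hypothetical replicates.
replicate_a = [0.0, 1.2, 15.0, 230.0, 1800.0]
replicate_b = [0.1, 1.0, 16.5, 210.0, 1900.0]

r = pearson_r(log10_fpkm(replicate_a), log10_fpkm(replicate_b))
print(round(r, 4))
```

Working on the log scale keeps a handful of very highly expressed genes from dominating the correlation, which is one reason expression scatterplots are usually drawn this way.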

Generating libraries and sequencing, however, are only the first steps. Like the MiSeq and NextSeq, the HiSeq 4000 is BaseSpace-enabled and can stream data to our cloud analysis environment. BaseSpace has grown substantially over the past year, adding many new features to enhance usability, and many new sequence analysis apps. The RNA-Seq apps from Illumina were launched in spring of last year, followed by new applications for analyses of 16S and shotgun metagenomics data, de novo assembly, and improvements to the exome sequencing apps.

For those new to sequencing technologies, BaseSpace is a great way to get started and explore your data without having to make major investments in computer hardware or learning command-line software. If you are an advanced user, you’ll appreciate BaseSpace cloud storage solutions and easy tools for first-pass analyses and quality control before expending your own compute resources. We know many HiSeq customers are technologically savvy and have built their own storage systems and sophisticated analysis pipelines. If you already have a great pipeline you’d like to share, we’re hoping that you’ll become a BaseSpace developer and enable other Illumina customers to use the great tools you’ve developed.

At Illumina, we take pride in creating accessible, versatile sequencing products and software tools and we seek to continually improve our technologies. Yet, we recognize that sequencing is not an end in itself, but rather a tool for discovery. Challenges are now less about how to sequence, and more about interpretation, innovation, and insight. Our sample-to-answer solutions are designed to save time for your ideas to come to fruition, and to be ready for the next challenge.

 
