Blog @ Illumina
Real scientists. Real commentary.

HiSeq X Ten Data Available in BaseSpace

by Andrew Boudreau | Sep 05, 2014

When we introduced the HiSeq X Ten, the positive reception was staggering, not only from the sequencing-focused scientific community but also from the general public. The long-discussed “thousand dollar genome” was now an achievable reality, and it captured the attention of the scientific and popular press as well as social media channels worldwide. Despite widespread enthusiasm for this system and the population-scale, high-quality sequencing it makes possible, the average scientist cannot yet see or touch an X Ten today, much less run samples on one. Maybe you’re thinking of buying an X Ten system, or perhaps your existing Illumina sequencers fit your throughput needs but you’re eager to compare your own data to data from the HiSeq X. Thanks to the cloud and the accessibility afforded by BaseSpace, your opportunity to examine and analyze HiSeq X data has arrived.

If you have a BaseSpace account, exploring HiSeq X whole human genome data takes only a few clicks (and if you don’t already have a BaseSpace account, signing up is free and quick). The HiSeq X Ten project we’ve uploaded to the Public Data section of BaseSpace is named “HiSeq X Ten: TruSeq Nano (4 replicates of NA12878)”. As the project name indicates, we sequenced the same sample (NA12878) in parallel in four different lanes of the HiSeq X. The libraries were prepared with the TruSeq Nano DNA Sample Preparation Kit, and sequencing was paired-end (2 x 151 bp). For each flow cell lane, we performed two sets of analyses. In the first, we aligned the raw reads to the UCSC hg19 version of the human reference using the BWA aligner, then called variants with the GATK variant caller (v1.6.22). In the second, we used the same reference version and performed both alignment and variant calling with Isaac, an ultra-fast analysis tool developed by Illumina. We have uploaded the raw read data from four lanes (lanes 1, 3, 5 and 7) to BaseSpace, and stored both sets of analyses for lanes 1 and 3. Analysis run time in BaseSpace averaged about 12 hours per sample for Isaac, compared with about 50 hours per sample for the equivalent BWA workflow. That is roughly a fourfold speedup, demonstrating how fast the Isaac alignment and variant calling workflow is. Upcoming parallel versions of the Isaac App will be able to process many samples at once, providing even faster analysis completion.

In the project’s run summary report in BaseSpace, one of the most immediately apparent and impressive metrics is the high, uniform depth of coverage across each sample. Below is a coverage histogram (using the BWA alignment) from flow cell lane 1:

[Figure: coverage histogram for flow cell lane 1, BWA alignment]

Though we won’t list all four coverage histograms here (you can see them all, along with per-chromosome coverage, in the “Analysis Reports” section of the project in BaseSpace), coverage is highly repeatable across both analyzed lanes, for both sets of analyses.
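For readers who want to build a similar histogram from their own alignments, here is a minimal Python sketch. It assumes per-base depths in the three-column, tab-separated layout written by `samtools depth` (chromosome, position, depth); the function names and the 20x threshold are illustrative choices, not how the BaseSpace report is implemented. Note that `samtools depth` omits zero-depth positions unless run with `-a`, so the fraction computed here is relative to the positions present in the input.

```python
from collections import Counter
from typing import Iterable


def coverage_histogram(depth_lines: Iterable[str]) -> Counter:
    """Count how many positions have each depth value.

    Each line is assumed to be "chrom<TAB>pos<TAB>depth", the layout
    written by `samtools depth`. Blank lines are ignored.
    """
    hist: Counter = Counter()
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _chrom, _pos, depth = line.split("\t")
        hist[int(depth)] += 1
    return hist


def summarize(hist: Counter, min_depth: int = 20) -> tuple[float, float]:
    """Return (mean depth, fraction of positions with depth >= min_depth).

    The 20x default is a hypothetical threshold chosen for illustration.
    """
    total_positions = sum(hist.values())
    if total_positions == 0:
        return 0.0, 0.0
    mean_depth = sum(depth * count for depth, count in hist.items()) / total_positions
    covered = sum(count for depth, count in hist.items() if depth >= min_depth)
    return mean_depth, covered / total_positions


if __name__ == "__main__":
    # Toy input for illustration only, not real HiSeq X data.
    toy = ["chr1\t1\t28", "chr1\t2\t30", "chr1\t3\t30", "chr1\t4\t12"]
    hist = coverage_histogram(toy)
    mean_depth, frac = summarize(hist, min_depth=20)
    print(sorted(hist.items()))  # [(12, 1), (28, 1), (30, 2)]
    print(mean_depth, frac)      # 25.0 0.75
```

Keeping the histogram as a `Counter` of depth values rather than a list of per-base depths keeps memory proportional to the number of distinct depths, which matters for a 3 Gb genome.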

Turning to variant calling, we observed similarly consistent results across all replicates. The table below compares the analysis of all four replicates:

[Table: comparison of variant calls across all four replicates]

1 – Illumina HiSeq X Replicate 1
2 – Illumina HiSeq X Replicate 2
3 – Illumina HiSeq X Replicate 3
4 – Illumina HiSeq X Replicate 4

This shows consistent, high-quality data across all replicates, similar to the quality and reproducibility you see with other Illumina platforms.
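The post doesn’t describe how the replicate comparison was computed. As one hedged illustration, here is a minimal Python sketch of a pairwise comparison: it reduces each replicate’s VCF to a set of (CHROM, POS, REF, ALT) keys and reports the Jaccard index (shared calls divided by calls in either replicate) for every pair. The function names, the toy records, and the choice of Jaccard as the metric are assumptions for illustration, not Illumina’s method. Real comparisons usually normalize variant representation first and may restrict to PASS calls, which this sketch does not do.

```python
from itertools import combinations
from typing import Iterable

# (CHROM, POS, REF, ALT) for a single alternate allele.
Variant = tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> set[Variant]:
    """Collect (CHROM, POS, REF, ALT) keys from VCF data lines.

    Header lines (starting with '#') are skipped, multi-allelic ALT fields
    are split so each alternate allele is its own key, and '.' ALT values
    are ignored. FILTER is not checked and no normalization is attempted.
    """
    keys: set[Variant] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            if allele != ".":
                keys.add((chrom, int(pos), ref, allele))
    return keys


def jaccard(a: set, b: set) -> float:
    """Shared calls divided by calls present in either set (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(calls: dict[str, set[Variant]]) -> dict[tuple[str, str], float]:
    """Jaccard index for every unordered pair of replicates, in sorted name order."""
    return {(x, y): jaccard(calls[x], calls[y]) for x, y in combinations(sorted(calls), 2)}


if __name__ == "__main__":
    # Toy records for illustration only, not real NA12878 calls.
    rep1 = ["##fileformat=VCFv4.1", "chr1\t100\t.\tA\tG", "chr1\t200\t.\tC\tT"]
    rep2 = ["chr1\t100\t.\tA\tG", "chr1\t300\t.\tG\tA,C"]
    calls = {"rep1": variant_keys(rep1), "rep2": variant_keys(rep2)}
    print(pairwise_concordance(calls))  # {('rep1', 'rep2'): 0.25}
```

With four replicates, `pairwise_concordance` returns six pairs, so a single low value points directly at the replicate that disagrees with the others.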

All the data we used is available to you in BaseSpace, along with all the tools needed to do these analyses and more. We hope you enjoy checking out the data from the system bringing the $1,000 genome to fruition!