Blog @ Illumina
Real scientists. Real commentary.

Long Reads Enable Accurate Rice Genome Assembly

by Rey Mali | Jun 25, 2014
TruSeq Synthetic Long-Read technology combines shorter sequencing reads to bioinformatically produce high-quality, accurate genome assemblies.

Many estimates of global food consumption indicate that rice, Oryza sativa, makes up the majority of the daily diet for at least half of the world’s population. The draft rice genome was completed in 2002, making rice the second plant to be sequenced, after the model plant Arabidopsis. In the twelve years since, our knowledge of the functional characteristics of the rice genome has grown considerably. Comparing genome structure, genes, and intergenic regions between strains and with other cereal grass species helps pinpoint regions that are highly conserved or rapidly evolving. This information provides insight into genome evolution, speciation, and domestication, and improvements in genome sequencing technology continue to propel agricultural research.

Traditionally, next-generation sequencing approaches have relied on short, paired-end reads. Longer sequence reads can facilitate alignment and improve the accuracy of genome assembly by resolving traditionally challenging regions, such as stretches of repetitive elements. This capability is especially useful for assembling genomes such as that of rice, which, although fairly compact at around 400 Mb, contains paleopolyploid features¹ and over 35,000 coding genes and is riddled with transposons².

Below is a data summary from a genome assembly of Oryza sativa japonica strain Nipponbare using TruSeq Synthetic Long-Read technology. This technology combines a unique assay, sequencing, and analysis to generate synthetic long reads for genome finishing. Genome libraries are first prepared for Illumina sequencing with the TruSeq Synthetic Long-Read DNA Library Prep Kit. The TruSeq Long-Read Assembly App, offered in BaseSpace, Illumina’s cloud computing environment, then assembles the shorter reads into long fragments. The parameters indicated in Table 1 were used to calculate assembly quality. The size distribution of end-marked long reads was also generated with the TruSeq Long-Read Assembly App. The fully assembled long reads are arranged according to read length (Figure 1), with a median read length of ~8–10 kb.
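For readers who want to compute similar summary statistics on their own long reads, here is a minimal Python sketch. It is not the TruSeq Long-Read Assembly App's method, and the metrics chosen (median length and N50) are an assumption about what is useful to report rather than a reproduction of Table 1. The function names and the example read lengths are hypothetical.

```python
from statistics import median


def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarize(lengths):
    """Collect a few basic length statistics into a dictionary."""
    return {
        "count": len(lengths),
        "total_bases": sum(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
    }


# Hypothetical long-read lengths in base pairs (illustration only,
# not data from the rice assembly described in this post).
example_lengths = [2000, 4000, 8000, 9000, 10000]
summary = summarize(example_lengths)
print(summary)
```

In practice, the lengths would be read from a FASTA or FASTQ file of assembled long reads rather than typed in by hand.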




The TruSeq Synthetic Long-Read technology combines shorter sequencing reads to bioinformatically produce high-quality, accurate genome assemblies. This technology will provide new insight into traditionally challenging regions and can create comprehensive scaffolds for newly sequenced genomes.

Related Information:

For more information about TruSeq Synthetic Long-Read Technology, please visit our website, or download the data sheet.

To access the data set described here, please visit BaseSpace (free login required) and read more about the app in the BaseSpace blog.

References:

  1. Levy AA, Feldman M. (2002) The impact of polyploidy on grass genome evolution. Plant Physiology 130(4): 1587–93.
  2. International Rice Genome Sequencing Project. (2005) The map-based sequence of the rice genome. Nature 436: 793–800.
  3. Gurevich A, Saveliev V, Vyahhi N, Tesler G. (2013) QUAST: Quality assessment tool for genome assemblies. Bioinformatics 29(8): 1072–75.
  4. Kurtz S, Phillippy A, Delcher A, Smoot M, Shumway M, Antonescu C, et al. (2004) Versatile and open software for comparing large genomes. Genome Biology 5: R12.

 
