Blog @ Illumina
Real scientists. Real commentary.

How Long-Read Sequencing Helps a Model Organism Define its Sense of Self

Dmitry Pushkarev
| Aug 05, 2013

Commonly known as the golden star ascidian, Botryllus schlosseri is a colony-forming organism that grows in shallow and deep waters of the Atlantic Ocean, the North Sea, the English Channel, and the Mediterranean Sea. Members of the subphylum Tunicata, these urochordates are the closest living relatives of vertebrates. Colony-forming tunicates like B. schlosseri can recognize compatible individuals and fuse their blood vessels to form a single organism, whereas incompatible colonies reject others and remain separate. Botryllus is a model organism for understanding allorecognition, or recognition of self vs. non-self, with important implications for transplantation, fetal/maternal outcomes, and stem cell-mediated regeneration. It was these unique adaptations, their phylogenetic importance, and their lengthy history of scientific study that prompted us to examine the genome of this colonial tunicate.

Sometime around 2010, I was in Stephen Quake’s Stanford lab working in collaboration with Irv Weissman’s lab on methods to sequence the highly complex and repetitive Botryllus genome. We needed long reads and low error rates to assemble the genome, but no technology offered both at the time. After trying all of the tools and sequencing technologies we had at hand (Illumina and 454) and numerous genome assemblers, we realized that we could not assemble the genome to an acceptable level without performing complicated and prohibitively expensive sequencing of fosmid libraries. Fosmids are F-plasmid-based cloning vectors used to create stable libraries of large genome fragments.

Necessity being the mother of invention, we drew inspiration from fosmid sequencing and decided to try long-range PCR. After some tweaking of the method, we got long reads from Illumina shotgun sequencing. Using this novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome [1]. The genome assembly comprises nearly 14,000 intron-containing predicted genes and 13,500 intron-less predicted genes, 40% of which were confidently parceled into 13 (of 16 haploid) chromosomes.

Long-read DNA sequencing is well suited to overcoming problematic regions such as repeats, which make genome assembly a very complex problem. This technology could advance the assembly of notoriously difficult genomes, such as the strawberry with 10 copies of each chromosome, or Paris japonica, whose genome of 150 billion base pairs is 50 times the size of the human genome. Whole-genome assembly is also being used in projects like Genome 10K to capture the genetic diversity of vertebrate species and preserve the genomes of rare animals that are close to extinction. The utility of genome assemblies is indisputable, and as the cost of sequencing continues to decrease, long-read sequencing could make draft genome assembly of nearly any organism affordable and routine.
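
To make the point about repeats concrete, here is a minimal Python sketch of why read length matters. It is a toy model, not the method we used: the function name, the 100 kb region, the 5 kb repeat, the 20-base anchor, and both read lengths are illustrative assumptions. It counts the start positions at which a read covers an entire repeat plus some unique sequence on each side, which is what lets an assembler place the repeat unambiguously.

```python
def count_spanning_positions(genome_len, repeat_start, repeat_len,
                             read_len, anchor=20):
    """Count read start positions where a read of length `read_len`
    covers the whole repeat plus at least `anchor` unique bases on
    each side. All coordinates are 0-based."""
    # Earliest start that still reaches `anchor` bases past the repeat's end.
    first = max(0, repeat_start + repeat_len + anchor - read_len)
    # Latest start that still begins `anchor` bases before the repeat
    # and fits inside the genome.
    last = min(repeat_start - anchor, genome_len - read_len)
    return max(0, last - first + 1)


# Hypothetical 100 kb region containing one 5 kb repeat.
GENOME_LEN = 100_000
REPEAT_START = 50_000
REPEAT_LEN = 5_000

short_hits = count_spanning_positions(GENOME_LEN, REPEAT_START, REPEAT_LEN, read_len=100)
long_hits = count_spanning_positions(GENOME_LEN, REPEAT_START, REPEAT_LEN, read_len=8_000)

print("100 bp reads spanning the repeat:", short_hits)
print("8 kb reads spanning the repeat:", long_hits)
```

In this toy setting no 100-base read can bridge the repeat, while thousands of start positions work for an 8 kb read. Real assemblies also depend on coverage, error rates, and how many copies of the repeat exist, none of which this sketch models.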

Back in the Quake laboratory, with the Botryllus draft genome assembly and other available sets of genetic information in hand, Ayelet Voskoboynik and collaborators continued research on Botryllus histocompatibility. They identified a single unique gene that confers allorecognition [2], allowing the organism to participate in fusion or rejection with other individuals in a colony. Like histocompatibility genes in higher organisms, this gene is polymorphic. Having an accurate draft genome assembly was critical for resolving this highly polymorphic region, and allowed us to find this gene.

We also realized that synthetic long-read sequencing technology can be applied to other genomes. I teamed up with postdoc Michael Kertesz to start a company based on this idea, and Moleculo was created. The rest, as they say, is history.

Related Links: 

The B. schlosseri genome is an important new tool for studying the genetic basis of histocompatibility, immunity, stem cell regeneration, and vertebrate evolution:

More information about ongoing research on the Botryllus genome:

More information about Illumina Long-Read Sequencing services:


  1. Voskoboynik A, Neff NF, Sahoo D, Newman AM, Pushkarev D, et al. (2013) The genome sequence of the colonial chordate, Botryllus schlosseri. eLife 2:e00569.
  2. Voskoboynik A, Newman AM, Corey DM, Sahoo D, Pushkarev D, et al. (2013) Identification of a colonial chordate histocompatibility gene. Science 341:384.