Blog @ Illumina
Real scientists. Real commentary.

Thursday Morning at AGBT

Abizar Lakdawalla, Ph.D.
| Feb 16, 2012

[Photo: heavy security presence at the ILMN User meeting]

Is it slightly strange, going to a nice breakfast at a prestigious scientific conference only to be faced with fully attired security guards checking badges at the entrance? Would they taser someone who tried to occupy the pastries?

The meeting is now at full capacity, and it was nice running into folks from Europe whom I had the good fortune to interact with last year, including Joakim Lundeberg from SciLifeLab in Sweden, who I almost did not recognize due to a significantly greater amount of filamentous keratinous material around his chin. Chad Nusbaum kicked off the session with the suggestion that we vigorously and/or violently throw our cell phones (against the wall, I suppose). Alas, I did not get to pick up a few free iPhones, as everyone chose simply to silence them.

Lynn Jorde from the University of Utah School of Medicine gave a very good talk on whole genome sequencing for disease-causing mutations and estimating the human mutation rate. He started with a historical reference to the classical geneticist JBS Haldane, who estimated the human mutation rate to be ~2 x 10^-5 based on the prevalence of hemophilia in London, and that one third of these cases were caused by de novo mutations. Lynn introduced the Miller syndrome phenotype (postaxial acrofacial dysostosis), which presents with missing digits, a small jaw, and facial malformations. The subjects were a single family with the first known recurrence of Miller syndrome, plus a second family identified with the same condition. The genomes were sequenced by Complete Genomics at >50x coverage for 4 family members. To calculate the mutation rate, the whole genome sequence data had to be validated first. Of the 34k candidate variants in the analyzed data, about 99.9% were reportedly sequencing errors. The candidates were validated with Agilent capture and re-sequenced on a Genome Analyzer. Only 28 variants were found to be real, confirmed by mass spec. After the validation steps, a mutation rate of 1.1 x 10^-8 was calculated. Lynn showed a table of other mutation rate estimates, including those from trio sequencing in the 1000 Genomes Project, which were quite close to the 1.1 x 10^-8 estimate. The exome and WGS data showed recessive mutations in two autosomal genes, DHODH and DNAH5 (the latter involved in primary ciliary dyskinesia, explaining some of the phenotype features). Both affected children inherited five disease-causing variants. This study was featured in one of the top stories of 2010 – "Family genomics links DNA to disease".
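As a quick sanity check on the validation numbers above, here is a minimal Python sketch. The two counts come straight from the talk summary. The `per_base_mutation_rate` helper and its inputs are my own illustrative assumptions, not the denominators used in the study, so it is not meant to reproduce the 1.1 x 10^-8 figure.

```python
# Figures quoted in the talk summary.
candidate_variants = 34_000   # "34k" candidate variants before validation
confirmed_variants = 28       # variants that survived validation

# Fraction of candidates that did not hold up (i.e. were not real).
error_fraction = 1 - confirmed_variants / candidate_variants
print(f"Unconfirmed candidates: {error_fraction:.2%}")


def per_base_mutation_rate(de_novo, callable_bases, offspring):
    """De novo mutations per base per haploid genome per generation.

    Each child carries two haploid genomes, hence the factor of 2.
    """
    return de_novo / (2 * callable_bases * offspring)


# HYPOTHETICAL inputs for illustration only; not the study's actual values.
example_rate = per_base_mutation_rate(de_novo=60, callable_bases=2.5e9, offspring=1)
print(f"Example rate: {example_rate:.1e}")
```

With 28 survivors out of 34,000 candidates, roughly 99.92% of the initial calls were not confirmed, which squares with the "about 99.9%" quoted in the talk.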

One of the challenges of whole genome sequencing is that a single genome yields, for example, 3 million SNPs, 10,000 non-synonymous SNPs, and 100 loss-of-function variants. Now, which one or ones of these is disease-causing? An all-in-one software tool called VAAST compares sequences with variant frequencies in dbSNP, 1000 Genomes, and other published genomes. The tool assesses the functional impact of amino acid changes, purifying selection, disease-causing variants gene by gene, and pedigree info using a composite likelihood ratio test. Incorporating family data further improves the chances of getting the right gene. Exome analysis identified DHODH from just four families. VAAST indicated only two candidate genes, DNAH5 and DHODH.
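To give a feel for what a composite likelihood ratio test does, here is a deliberately simplified Python sketch. It is not VAAST's implementation: it only compares allele frequencies between cases and a background set, and leaves out the amino acid impact weighting and pedigree handling mentioned above. All function names and counts are made up for illustration.

```python
import math


def _binom_loglik(alt, total, p):
    """Binomial log-likelihood without the constant coefficient term."""
    ref = total - alt
    ll = 0.0
    if alt:
        ll += alt * math.log(p)
    if ref:
        ll += ref * math.log(1 - p)
    return ll


def variant_llr(case_alt, case_n, ctrl_alt, ctrl_n):
    """Log-likelihood ratio for one variant.

    Null: cases and controls share one allele frequency.
    Alternative: each group has its own allele frequency.
    """
    p_null = (case_alt + ctrl_alt) / (case_n + ctrl_n)
    ll_null = (_binom_loglik(case_alt, case_n, p_null)
               + _binom_loglik(ctrl_alt, ctrl_n, p_null))
    ll_alt = (_binom_loglik(case_alt, case_n, case_alt / case_n)
              + _binom_loglik(ctrl_alt, ctrl_n, ctrl_alt / ctrl_n))
    return ll_alt - ll_null


def gene_score(variants):
    """Composite score for a gene: 2 x the summed per-variant ratios."""
    return 2 * sum(variant_llr(*v) for v in variants)


# Made-up allele counts: (case_alt, case_n, ctrl_alt, ctrl_n) per variant.
enriched_gene = [(8, 8, 1, 1000), (3, 8, 2, 1000)]
neutral_gene = [(5, 10, 50, 100)]
print(gene_score(enriched_gene), gene_score(neutral_gene))
```

Genes whose variants are much more common in cases than in the background get a high score, while genes with matching frequencies score near zero. Ranking genes by such a score is the general idea behind narrowing millions of variants down to a couple of candidates.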

A second demonstration of VAAST was shown with an analysis of a lethal progeria-like condition (Ogden syndrome) associated with the X chromosome. This appeared to be a brand new genetic disease stemming from a loss-of-function mutation in an acetyltransferase, with an expected inheritance pattern. A second, unrelated family has been identified with a similar phenotype, but the same mutation is estimated to be 100 generations old. The question remains: will this analysis work for common diseases? For example, in Crohn's disease, GWAS alone has about 10% power to detect the NOD2 mutation. With VAAST, that power climbs to 80%.

In a study of cardiac septal defects within a multigenerational family, VAAST with pedigree data immediately pointed to GATA4 as the causal gene. More controls improve the detection power. They identified 12 mutations in 600 Mb of sequence data, with a male mutation rate 5x higher than the female mutation rate. This harks back to Haldane, who estimated that the male mutation rate can be as much as ~10x higher than the female rate, so it seems he was on to something there.

Heidi Rehm, from the Laboratory for Molecular Medicine, Partners Center for Personalized Genetic Medicine and Harvard Medical School, gave us an interesting Perspective on Sequencing for the CLIA Lab. The number of laboratory tests for specific genetic conditions has steadily increased, and now sits north of 150 tests available for cardiovascular conditions, cancer, hearing loss, pharmacogenetics, and other genetic syndromes. As more genes with potential therapies are identified and added to test panels, we understand more about the prevalence of these alleles in the population. For example, no one ordered tests on the α-galactosidase A gene (GLA) until it was linked to an X-linked lysosomal storage disorder and found to be a relatively prevalent cause of ventricular hypertrophy, in 2% of patients. To screen for childhood hearing loss, the Affymetrix OtoChip with 19 genes started showing a higher prevalence of Usher syndrome (hearing loss plus retinitis pigmentosa) in the population (8%), with the test becoming a primary indication of the syndrome. One challenge remains in that labs must perform Sanger sequencing to "fill in" any regions not covered or called with NGS. Performing a pan-cardiomyopathy NGS test requires confirmation of 65 amplicons by Sanger sequencing, per patient. Moving the OtoChip test to NGS requires the development of ~1,000 Sanger-based confirmatory runs.

Targeted sequencing is even more challenging than using exomes or full genomes. The number of novel variants identified per month is difficult to keep up with! The average time to assess a novel variant ranges from 21 to 119 minutes, depending on what information is available across various sources. dbSNP data is mostly unannotated for clinical significance, and most of the annotations that do exist are wrong due to small control populations. Unlike the vast banks of available experimental sequencing data, clinical lab data is not in the public domain.

Heidi's proposal would bring clinical data into the public domain by developing a massive clinical-grade variant database. The database would consist of 160,000 records, with data curated from different labs and reports created by GeneInsight. Variants would have defined classification levels (benign, likely benign, unknown significance, likely pathogenic, pathogenic). Is this variant hub project the future of clinical whole-genome sequencing? Maybe it is too early to call, but this has got to be a step in the right direction.
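The five classification levels lend themselves to a simple ordered type. Below is a minimal Python sketch of how submissions from several labs might be stored and checked for disagreement. Only the five level names come from the talk; the record layout, the `find_conflicts` helper, and the variant and lab names are hypothetical and not a description of GeneInsight or the proposed database.

```python
from collections import defaultdict
from enum import IntEnum


class Classification(IntEnum):
    """The five levels from the talk, ordered benign to pathogenic."""
    BENIGN = 1
    LIKELY_BENIGN = 2
    UNKNOWN_SIGNIFICANCE = 3
    LIKELY_PATHOGENIC = 4
    PATHOGENIC = 5


def find_conflicts(submissions):
    """Return variants that different labs classified differently.

    submissions: iterable of (variant_id, lab_name, Classification).
    """
    by_variant = defaultdict(set)
    for variant_id, _lab, classification in submissions:
        by_variant[variant_id].add(classification)
    return {v: sorted(c) for v, c in by_variant.items() if len(c) > 1}


# Made-up variant IDs and lab names, for illustration only.
submissions = [
    ("variant_A", "lab_1", Classification.PATHOGENIC),
    ("variant_A", "lab_2", Classification.LIKELY_PATHOGENIC),
    ("variant_B", "lab_1", Classification.BENIGN),
    ("variant_B", "lab_2", Classification.BENIGN),
]
conflicts = find_conflicts(submissions)
print(conflicts)
```

Using an ordered enum rather than free text makes it easy to spot where labs disagree, which is presumably where curation effort in a shared database would be most useful.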

Lisa Shaffer from PerkinElmer gave a self-billed "non-sequencing talk" detailing a phenotype-first approach to identifying clinical syndromes. Many Mendelian chromosomal abnormalities cannot be identified by NGS, but require arrays instead. For example, the eponymously named Potocki-Shaffer syndrome is characterized by holes in the back of the skull and bony outgrowths due to haploinsufficiency of the 11p11.2p12 region, identified by a microarray. A proprietary database known as Genoglyphix, with 16,000 abnormal microdeletion cases run on arrays and correlated with citation information, is available to researchers. Getting back to the "what defines a syndrome?" question, whole genome arrays and deletion information can help identify candidate genes that may be associated with phenotypes.

Darren Dinwiddie from Children's Mercy Hospital, Kansas City, discussed his initial clinical experience with molecular testing using NGS, as this hospital is home to the Center for Pediatric Genomic Medicine. There are just over 7,000 known Mendelian diseases, 3,000 of those with a known molecular basis. Together, those diseases affect about 3-4% of children. Darren questioned the need to test for all recessive diseases. Timely diagnosis is important, including the obvious psychosocial benefits, as well as the ability to rule out other diseases and forgo unnecessary treatment. Today, testing a child for 10 diseases costs thousands of dollars, for ~300 patients per year. With a targeted NGS approach, they could test 3,000 patients per year for 600 diseases at a dramatically reduced cost per patient.

Dinwiddie's group uses Illumina TruSeq enrichment on 8,366 genomic regions (533 genes + mt genomes) for about 2 Mb of sequence. They run 4 Gb of sequence per sample, with SNP accuracy of 98.8%, sensitivity of 95.6%, and specificity of 99.9%. There's a nice front end for inputting phenotype that lists genes meeting the phenotypic criteria. Out of 290 cases, an impressive 286 correlated genes have been identified. Alignment is done with GSNAP, followed by proprietary variant callers and annotation. They use the Systematized Nomenclature of Medicine (SNOMED) to describe symptoms, and every variant is characterized by standards set by the American College of Medical Genetics. Indels and CNVs account for 5-20% of all disease-causing variants (Human Gene Mutation Database 2011). A reduced representation of reads would indicate a homozygous or heterozygous deletion.
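The read-depth idea in the last sentence can be sketched in a few lines of Python. This is only an illustration of the principle: the function name and the threshold values are my own assumptions, not the group's proprietary caller, and a real pipeline would need normalization for capture efficiency and GC content.

```python
def classify_deletion(sample_depth, reference_depth, het_max=0.75, hom_max=0.1):
    """Classify a target region by its depth relative to a reference depth.

    A ratio near 1.0 suggests two copies, near 0.5 one copy, near 0 none.
    The het_max and hom_max cutoffs are hypothetical illustration values.
    """
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    ratio = sample_depth / reference_depth
    if ratio <= hom_max:
        return "homozygous deletion"
    if ratio <= het_max:
        return "heterozygous deletion"
    return "no deletion detected"


# Made-up mean depths for three target regions against a 100x reference.
for depth in (2, 52, 98):
    print(depth, classify_deletion(depth, 100))
```

A region covered at about half the expected depth is consistent with one missing copy, and a region with almost no reads is consistent with both copies missing.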

Validation with about 700 samples is nearly complete, and was done in three phases:

  • Phase I – Optimization with 96 Coriell samples using TruSeq v2
  • Phase II – Blinded with 384 CMH and Coriell controls
  • Phase III – Efficacy example: 2 sisters, in hospital for 5 years, spent $20,000 plus on negative genetic tests. The NGS test found a mutation in the APTX gene, which was confirmed with Sanger sequencing. The SNOMED phenotype ID mapped to mutations in this gene, which cause CoQ10 deficiency. Treatment with CoQ10 has started, and the patients are starting to see significant improvement.

More to come: talks after the break in the next post.