Blog @ Illumina
Real scientists. Real commentary.

Thursday Afternoon at AGBT

Abizar Lakdawalla
| Feb 17, 2012

Goncalo Abecasis from the University of Michigan School of Public Health talked about early returns on sequencing studies of complex traits. Using LDL genetics as an example, his group examined Mendelian variants in several genes. GWAS revealed 95 loci, 15 of which overlapped with the Mendelian disorders. GALNT2 was associated with a very small change in lipid levels, which, he said, "gives GWAS a bad name." In 190,000 individuals, rare variants show large effects. Why study rare variants? Exome sequencing of 12,000 individuals suggests there are many rare functional variants still to discover: synonymous, nonsynonymous, and nonsense. Nonsynonymous ones are seen at higher frequency.

In an interesting study of the Sardinian population, a low-pass (4x) sequencing project involved 6,000 Sardinians from 4 towns. Sequencing 3 genomes from a family can inform more genomes because of inheritance. The Q39X mutation in HBB, which is linked to resistance to malaria, was found only by sequencing and not by GWAS, while other genes found by sequencing were also found by GWAS. There is a gray area, perhaps about 25% of the genome, where analysis of selection and mutation rates requires special care.

Ewan Birney from the European Bioinformatics Institute gave a great talk about a massive effort called the ENCODE project, dedicated to understanding our genome. Ewan put out the first no-tweeting request because the data came from a paper under review, and as the reviewer was right there in the audience, Ewan decided to be more circumspect.

Charles Perou from the University of North Carolina, Chapel Hill discussed work on molecular classification of breast tumors using gene expression profiling and its translation into clinical practice. A decade ago, gene expression arrays were the tool of choice for identifying prognostic indicators. Perou started with homemade cDNA arrays, then moved to Agilent arrays with ~70 genes indicating prognoses. However, the story doesn't end at prognosis—therapy is more important.

A list of 2,000 genes giving a classification concordance of 93% was reduced to 50 genes. Several methods, including prognostic risk stratification and the C-index, were discussed to put a number behind incidence, overall patient survival, and response to treatment. However, degraded RNA from FFPE samples was not compatible with Agilent microarrays, so the group moved to qPCR on the LightCycler, which gave a good ROR score on the 50 genes. Based on these scores, the OncotypeDx qPCR test put patients into three groups: low, intermediate, and high, with the high-risk group being most likely to benefit from chemotherapy. The qPCR proved to be a bit laborious, so they moved to Nanostring for counting single RNA molecules without amplification, all the way up to 800 genes. The Nanostring PAM50 assay on 1,000 samples was compared with Oncotype to predict the 10-year risk of distant recurrence.
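For readers who have not met the C-index mentioned above: it measures how often a risk score ranks pairs of patients in the same order as their actual outcomes. The snippet below is a minimal sketch of Harrell's concordance index, not the code or data from the talk. The function name, the toy follow-up times, and the toy risk scores are all made up for illustration, and pairs with tied follow-up times are simply skipped.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       -- follow-up time per patient
    events      -- True/1 if recurrence was observed, False/0 if censored
    risk_scores -- higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is only comparable if the earlier patient had an event
        # (simplification: tied times are skipped entirely).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk, earlier event: correct order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: years to recurrence, event flag, risk score.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.3, 0.2, 0.4]
toy_c = concordance_index(toy_times, toy_events, toy_scores)
```

A value of 1.0 means the score orders every comparable pair correctly, and 0.5 is what a random score would give on average.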

Moving to RNA sequencing at 2 x 50 bp on HiSeq 2000 instruments: 850 breast tumors were analyzed as part of The Cancer Genome Atlas (TCGA), and in fact they did RNA-Seq for more than 2,000 tumor samples. Ribosomal RNA depletion on FFPE samples is a bit tricky: ribo-capture methods don't seem to do well on degraded samples, but Ribominus, Ribozero and DSN all seemed to work. Bonus: FFPE RNA comes pre-fragmented, so you can skip fragmentation. RNA-Seq from FFPE correlated well with arrays from FFPE, and with fresh tissue. The conclusion was that combining and implementing new testing strategies, even for a relatively small number of genes, is a challenging task, but often gives the best picture of prognosis and response.

Arend Sidow from Stanford University continued the breast cancer discussion by describing breast cancer progression from the earliest lesions to clinically relevant carcinoma, as revealed by deep whole genome sequencing. This talk had a restricted media request, so I will only try to summarize the concepts. Arend started by discussing the origins and progression of breast cancer, and the evolution of genome variation over time. He had good success with FFPE samples, helped by good sequencing quality on HiSeq, on par with fresh frozen tissue. The general idea is to quantify the reads from the cancer genome: an LOH shows up as a decrease in counts from one allele to zero, and duplications result in extra reads. However, Sidow hinted at some issues with normal cell contamination. The data from this talk will remain privileged information, only in the brains of those who were lucky enough to be in the chairs.
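The read-counting idea can be illustrated with a small sketch. This is not Sidow's method or data (those were not public); it is a toy classifier that assumes the site is heterozygous in the patient's normal genome and looks only at reference and alternate allele read counts in the tumor. The function name and every threshold here are hypothetical choices for illustration.

```python
def classify_site(ref_reads, alt_reads, min_depth=20,
                  loh_cutoff=0.1, balanced_band=0.1):
    """Label a germline-heterozygous site from tumor allele read counts.

    All thresholds are illustrative assumptions, not published values.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "low_coverage"      # too few reads to call anything
    minor_fraction = min(ref_reads, alt_reads) / depth
    if minor_fraction <= loh_cutoff:
        return "LOH"               # one allele has (almost) no reads left
    if abs(minor_fraction - 0.5) <= balanced_band:
        return "balanced"          # both alleles present about equally
    return "allelic_imbalance"     # e.g. extra reads from a duplicated allele


# Hypothetical read counts at four sites.
calls = [
    classify_site(60, 0),    # one allele lost entirely
    classify_site(57, 3),    # lost allele still seen in a few reads
    classify_site(30, 28),   # normal heterozygous site
    classify_site(40, 20),   # 2:1 ratio, consistent with a duplication
]
```

The cutoff for LOH is deliberately above zero: when normal cells are mixed into the tumor sample, the lost allele still contributes a few reads, so its count falls toward zero without reaching it. That is the contamination issue hinted at in the talk.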