Blog @ Illumina
Real scientists. Real commentary.

ESHG Day 2: Focus on Functional Genomics

Scott Brouilette, Ph.D. | Jun 06, 2014

ESHG 14 was in Milano, Italia

Day 2 got underway with a session dedicated to Functional Genomics. Bart Deplancke (Lausanne) used ChIP-Seq and RNA-Seq to dissect the genetic basis of phenotypic variation in 47 lymphoblastoid cell lines. ChIP-Seq indicated chromatin activity in 20% of all tested sites, which exhibited coordinated quantitative association. ChIP-Seq also identified around 15,000 "Variable Molecular Modules" (VMMs; essentially clusters of transcription-factor binding sites) with strong enrichment of variable enhancers and highly coordinated chromatin activity. Interestingly, VMMs comprised many putative enhancers with no previously reported role in gene expression (GEX). Adding the corresponding RNA-Seq data revealed associations between chromatin variation and GEX variation, with differing QTL effects on different epigenetic marks. Unsurprisingly, the most significant SNPs disrupt VMM binding motifs. Overall, the data suggest that VMMs may reflect a more fine-grained impact on chromatin topology (and, one would presume, on gene expression) than was previously appreciated.

Michel Georges discussed the role of gene expression in inflammatory bowel disease (IBD). Various GWAS have identified IBD risk loci, but much of the heritability remains unexplained and causative genes remain elusive. Michel observed what many others who have witnessed the rise of GWAS have previously noted: in many cases the top reported SNP in a given study is simply not causative, at least when considered in isolation. Conditional analysis addresses this by evaluating the effect of each SNP while accounting for the effects of the others. Using an Illumina Immunochip dataset, Michel nicely showed how unconditional testing yields a single “sentinel” hit, whereas conditional analysis reveals several additional hits in the same vicinity. As a final point he also discussed the combination of eQTL and GWAS data, highlighting that the association patterns should match if the risk variant acts by perturbing transcription.
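For readers who have not met conditional analysis before, here is a minimal sketch of the idea in Python. It is not the pipeline used in the talk: the genotypes and phenotype are simulated, the function name `association_z`, the effect sizes and the choice of plain linear regression are all my own assumptions for illustration. SNP 3 plays the sentinel, SNP 4 is merely correlated with it (an LD proxy), and SNP 7 carries an independent signal.

```python
import numpy as np

def association_z(genotypes, phenotype, covariates=None, skip=()):
    """Z-score per SNP from a linear regression of phenotype on that SNP.

    If `covariates` is given (e.g. the sentinel SNP's genotypes), each SNP
    is tested conditional on them. SNP indices in `skip` are left at z = 0.
    """
    n, m = genotypes.shape
    base = np.ones((n, 1))  # intercept
    if covariates is not None:
        base = np.column_stack([base, covariates])
    z = np.zeros(m)
    for j in range(m):
        if j in skip:
            continue
        X = np.column_stack([base, genotypes[:, j]])
        beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
        resid = phenotype - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        cov_beta = sigma2 * np.linalg.inv(X.T @ X)
        z[j] = beta[-1] / np.sqrt(cov_beta[-1, -1])
    return z

# Simulated data (hypothetical): 4000 individuals, 10 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
n, m = 4000, 10
genotypes = rng.binomial(2, 0.3, size=(n, m))

# Make SNP 4 an LD proxy of SNP 3: about 70% of individuals share the genotype.
redraw = rng.random(n) < 0.3
genotypes[:, 4] = np.where(redraw, genotypes[:, 4], genotypes[:, 3])

# Two truly associated SNPs: 3 (strong) and 7 (weaker, independent).
phenotype = 0.6 * genotypes[:, 3] + 0.3 * genotypes[:, 7] + rng.normal(size=n)

# Unconditional scan: the sentinel is the SNP with the largest |z|.
marginal_z = association_z(genotypes, phenotype)
sentinel = int(np.argmax(np.abs(marginal_z)))

# Conditional scan: re-test every other SNP with the sentinel as a covariate.
conditional_z = association_z(
    genotypes, phenotype, covariates=genotypes[:, [sentinel]], skip=(sentinel,)
)

print("sentinel SNP:", sentinel)
print("marginal z:   ", np.round(marginal_z, 1))
print("conditional z:", np.round(conditional_z, 1))
```

In this toy setting the proxy SNP looks strongly associated in the unconditional scan but loses its signal once the sentinel is in the model, while the independent SNP stays significant. Real analyses would repeat the conditioning step until no further signals remain and would typically use logistic regression for case/control data.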

The final talk of the session brought us back to single-cell transcriptomics, with John Marioni from the Sanger-EBI Single Cell Genomics Centre evaluating the technical noise that currently plagues this embryonic field. The first issue discussed was bias introduced by the amplification steps in NGS protocols; dilution series clearly show that the noise gets progressively worse as the amount of starting material decreases. And, as you can imagine, the problem is exacerbated for lowly expressed genes. The suggestion was to use spike-ins to allow quantification of variability; the nature of the spikes depends on the number of single cells being studied. The second key issue is that 35% of genes show strong correlation with cell cycle genes; by estimating the magnitude of this effect it is possible to “normalise” expression relative to the stage of the cell cycle. To test these solutions John’s team examined T-cells and estimated that 27% of the variance in GEX is due to technical noise, and that for 42% of genes 30% of the variance can be attributed to the “cell cycle effect”. After correcting for both, two T-cell sub-populations at different stages of differentiation become apparent. This was an elegant example of the use of single-cell transcriptomics for identification of cell sub-types, with the caveat that the technical noise and cell cycle effect must be taken into consideration. Personally I would like to see this approach extended to investigate the impact of circadian biology on gene expression at the single-cell level, as this is another potential source of GEX variation…
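To make the cell cycle “normalisation” described above a little more concrete, here is a rough sketch in Python. It is a simple per-gene linear regression used as a stand-in, not the model John’s team presented, and everything in it is an assumption for illustration: the simulated expression matrix, the per-cell `cycle_score` (in practice this might be derived from known cell cycle genes) and the function name `regress_out_factor`.

```python
import numpy as np

def regress_out_factor(expr, factor):
    """Remove the linear effect of a per-cell factor from each gene.

    expr   : cells x genes matrix of (log-scale) expression values
    factor : one value per cell, e.g. a cell cycle score
    Returns the corrected matrix and, per gene, the fraction of variance
    explained by the factor (R^2). Assumes no gene is constant across cells.
    """
    f = factor - factor.mean()
    centered = expr - expr.mean(axis=0)
    slopes = f @ centered / (f @ f)        # least-squares slope per gene
    fitted = np.outer(f, slopes)           # part of expression tracking the factor
    corrected = expr - fitted
    explained = fitted.var(axis=0) / centered.var(axis=0)
    return corrected, explained

# Simulated data (hypothetical): 300 cells, 50 genes.
rng = np.random.default_rng(1)
n_cells, n_genes = 300, 50
cycle_score = rng.normal(size=n_cells)

# The first 20 genes follow the cell cycle; the remaining 30 do not.
loadings = np.zeros(n_genes)
loadings[:20] = 1.0
expr = 5.0 + np.outer(cycle_score, loadings) + rng.normal(size=(n_cells, n_genes))

corrected, explained = regress_out_factor(expr, cycle_score)

print("mean variance explained, cycling genes:", explained[:20].mean().round(2))
print("mean variance explained, other genes:  ", explained[20:].mean().round(2))
```

The corrected matrix is, by construction, uncorrelated with the cell cycle score, so any clustering done afterwards is less likely to simply separate cells by cycle stage. A linear fit is the crudest possible choice here; it ignores technical noise entirely, which is why spike-ins are needed as a separate step.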