Misexpression of inactive genes in whole blood is associated with nearby rare structural variants.
Vanderstichele T., Burnham KL., de Klein N., Tardaguila M., Howell B., Walter K., Kundu K., Koeppel J., Lee W., Tokolyi A., Persyn E., Nath AP., Marten J., Petrovski S., Roberts DJ., Di Angelantonio E., Danesh J., Berton A., Platt A., Butterworth AS., Soranzo N., Parts L., Inouye M., Paul DS., Davenport EE.
Gene misexpression is the aberrant transcription of a gene in a context where it is usually inactive. Despite its known pathological consequences in specific rare diseases, we have a limited understanding of its wider prevalence and mechanisms in humans. To address this, we analyzed gene misexpression in 4,568 whole-blood bulk RNA sequencing samples from INTERVAL study blood donors. We found that although individual misexpression events were rare, in aggregate they occurred in almost all samples and in a third of inactive protein-coding genes. Using 2,821 paired whole-genome and RNA sequencing samples, we found that misexpression events are enriched in cis for rare structural variants (SVs). We established putative mechanisms through which a subset of SVs leads to gene misexpression, including transcriptional readthrough, transcript fusions, and gene inversion. Overall, we develop misexpression as a type of transcriptomic outlier analysis and extend our understanding of the variety of mechanisms by which genetic variants can influence gene expression.