Genomic Studies of Bipolar Disorder in a Large Cohort from The Netherlands

The transcriptomes of whole blood samples from a large cohort of individuals from the Netherlands were assayed using RNA-seq. The cohort includes patients diagnosed with schizophrenia or bipolar disorder, family members of patients, and healthy controls. Within the whole blood dataset, about 600 samples were sequenced to an average of 13.9 million mapped reads per sample, while about 2,000 were sequenced to an average of 5.9 million mapped reads per sample, in order to assess how the power to detect eQTLs changes with sample size. We find that expression, defined as the log of transcripts per million (TPM), is highly correlated between the moderate (13.9M) and lower (5.9M) coverage datasets.
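As an illustration only, the comparison of log TPM between the two coverage levels can be sketched in Python. The study description does not state how log TPM or the correlation was computed, so the base-2 log, the pseudocount of 1, the comparison of per-gene means, and the simulated counts (with depths scaled down from 13.9M and 5.9M) are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def log_tpm(counts, lengths_kb, pseudocount=1.0):
    """Convert a genes x samples count matrix to log2(TPM + pseudocount)."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_kb, dtype=float)
    rate = counts / lengths_kb[:, None]                  # reads per kilobase
    tpm = rate / rate.sum(axis=0, keepdims=True) * 1e6   # each sample sums to 1e6
    return np.log2(tpm + pseudocount)

def per_gene_mean_correlation(log_a, log_b):
    """Pearson correlation of per-gene mean log expression between two datasets."""
    return float(np.corrcoef(log_a.mean(axis=1), log_b.mean(axis=1))[0, 1])

# Toy data (assumption): 500 genes sharing one underlying abundance profile.
rng = np.random.default_rng(0)
n_genes = 500
lengths_kb = rng.uniform(0.5, 5.0, size=n_genes)
abundance = rng.gamma(shape=0.5, scale=1.0, size=n_genes)
read_share = abundance * lengths_kb
read_share = read_share / read_share.sum()

# Depths are scaled down tenfold from 13.9M and 5.9M to keep the toy small.
moderate = rng.multinomial(1_390_000, read_share, size=6).T   # genes x samples
lower = rng.multinomial(590_000, read_share, size=20).T

r = per_gene_mean_correlation(log_tpm(moderate, lengths_kb),
                              log_tpm(lower, lengths_kb))
print(f"correlation of per-gene mean log2(TPM + 1): {r:.3f}")
```

Because TPM normalizes each sample to the same total, the two datasets are on a comparable scale despite the difference in depth; the remaining disagreement is concentrated in lowly expressed genes.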

Additionally, a subset of about 150 individuals provided fibroblast cell lines, whose gene expression was assayed using RNA-seq at an average of 50 million mapped reads per sample. We use downsampling techniques to generate synthetic datasets from this real fibroblast RNA-seq data to explore the relationship between coverage and the number of individuals in an eQTL analysis.
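The downsampling idea can be sketched as follows. This is a minimal example, not the study's procedure: it assumes binomial thinning of a gene-level count matrix, in which each read is kept independently with probability target depth / observed depth, whereas the study may have subsampled reads before alignment or quantification. The function name and the toy counts (depths scaled down a thousandfold from 50M and 5.9M) are hypothetical.

```python
import numpy as np

def downsample_counts(counts, target_depth, rng):
    """Thin a genes x samples count matrix to roughly target_depth reads per sample.

    Each read is kept independently with probability target_depth / depth,
    so the resulting depth is close to the target but not exactly equal.
    Samples already below the target are left unchanged. Every sample is
    assumed to have at least one read.
    """
    counts = np.asarray(counts, dtype=np.int64)
    depth = counts.sum(axis=0)                      # reads per sample
    keep = np.minimum(1.0, target_depth / depth)    # per-sample keep probability
    return rng.binomial(counts, keep[None, :])

# Toy data (assumption): 4 samples at a depth of 50,000 reads each.
rng = np.random.default_rng(1)
n_genes = 300
read_share = rng.dirichlet(np.ones(n_genes))
counts = rng.multinomial(50_000, read_share, size=4).T   # genes x samples

thinned = downsample_counts(counts, target_depth=5_900, rng=rng)
print("depth before:", counts.sum(axis=0))
print("depth after: ", thinned.sum(axis=0))
```

Repeating the thinning at several target depths, and for subsets of individuals of different sizes, gives a grid of synthetic datasets on which the same eQTL analysis can be run.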

Corresponding individual-level genotypes for the samples included in the expression matrix are also made available. The genotypes were derived from multiple other project cohorts, each using a different genotyping platform, including OmniExpress Exome, Global Screening Array, Illumina550, and PsychChip. A PLINK file set of these cohorts is provided here; it was produced by merging the separate project file sets and extracting only the individuals whose expression data are also provided. Quality control for SNP missingness is recommended.
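A minimal sketch of the recommended missingness check is given below. It assumes the genotypes have already been loaded into an individuals x SNPs dosage matrix with NaN marking missing calls; the 0.2 threshold, the toy matrix, and the function names are hypothetical. On the PLINK file set itself, the `--geno` filter in PLINK serves the same purpose.

```python
import numpy as np

def snp_missing_rate(genotypes):
    """Fraction of individuals with a missing call (NaN) at each SNP."""
    g = np.asarray(genotypes, dtype=float)
    return np.isnan(g).mean(axis=0)

def filter_snps(genotypes, max_missing):
    """Drop SNPs whose missing rate exceeds max_missing.

    Returns the filtered matrix and the boolean keep-mask over SNPs.
    """
    g = np.asarray(genotypes, dtype=float)
    keep = snp_missing_rate(g) <= max_missing
    return g[:, keep], keep

# Toy data (assumption): 6 individuals x 4 SNPs. The third SNP is missing
# for half of the individuals, as might happen when a SNP is absent from
# one of the merged genotyping platforms.
nan = np.nan
genotypes = np.array([
    [0, 1,   nan, 2],
    [1, 1,   nan, 0],
    [2, 0,   nan, 1],
    [0, nan, 1,   1],
    [1, 2,   0,   2],
    [0, 1,   2,   0],
])

filtered, keep = filter_snps(genotypes, max_missing=0.2)
print("missing rate per SNP:", snp_missing_rate(genotypes))
print("kept SNPs:", keep)
```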

We demonstrate in this study that, given a fixed budget, eQTL discovery power can be increased by lowering the sequencing depth per sample so that a greater number of individuals can be sequenced.
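The budget trade-off behind this statement can be illustrated with simple arithmetic. Every cost figure below is hypothetical and in arbitrary units: the study description gives no prices, and the split into a fixed per-sample cost plus a per-read cost is an assumed model. The code shows only how many individuals each depth affords; whether the extra individuals increase eQTL discovery power is the study's finding, not something this sketch demonstrates.

```python
def affordable_samples(budget, fixed_cost_per_sample,
                       cost_per_million_reads, depth_millions):
    """Number of samples a fixed budget covers at a given sequencing depth.

    Assumed cost model: each sample costs a fixed amount (for example,
    library preparation) plus an amount proportional to its depth.
    """
    per_sample = fixed_cost_per_sample + cost_per_million_reads * depth_millions
    return int(budget // per_sample)

# Hypothetical costs, arbitrary units.
budget = 100_000
fixed_cost = 50.0
cost_per_million = 2.0

n_moderate = affordable_samples(budget, fixed_cost, cost_per_million, 13.9)
n_lower = affordable_samples(budget, fixed_cost, cost_per_million, 5.9)
print("samples at 13.9M reads:", n_moderate)
print("samples at 5.9M reads: ", n_lower)
```

Under this model, the larger the fixed per-sample cost relative to the sequencing cost, the smaller the gain in sample size from lowering depth.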

Type: Cohort
Archiver: The database of Genotypes and Phenotypes (dbGaP)