ProHealth: Kaiser Permanente Genome-wide Association Study of Prostate Cancer

A genome-wide association study (GWAS) of prostate cancer (PCa) was conducted in Kaiser Permanente (KP) Northern California health plan members (7,783 cases, 38,595 controls; 80.3% non-Hispanic white, 4.9% African-American, 7.0% East Asian, and 7.8% Latino) [PMID: 26034056]. The data for these members were drawn from three KP cohort studies: Research Program in Genes, Environment and Health (RPGEH), ProHealth, and California Men's Health Study (CMHS) (described further under Study History). Four custom arrays were designed for genotyping, one for each of the four major race-ethnicity groups in the RPGEH cohort: African Americans, East Asians, Latinos, and Non-Hispanic Whites. The number of SNPs and the SNP content varied by array. Content was designed to maximize genome-wide coverage of low-frequency and more common variants specific to each race-ethnicity group, and included newly identified SNPs from sequencing projects and SNPs with established associations with disease phenotypes and risk factors [PMIDs: 21565264, 21903159]. Within the total study cohort, n=34,736 participants completed a consent that permitted deposition of data to NIH.

Genotyping followed the same general procedure described in [PMID: 26092718], plus additional quality control (QC) steps for the additional men, in order to control for potential batch and kit effects, described in [PMID: 26034056]. Briefly, we first repeated the filters described in [PMID: 26092718] for all four arrays (EUR, LAT, EAS, AFR). Then, on an array-wise basis, we removed SNPs with MAF < 0.01, with a call rate < 95%, or with a Hardy-Weinberg Equilibrium (HWE) p-value < 1x10^-5 in homogeneous groups. Furthermore, on the EUR array, to adjust for a potential kit effect, we conducted a GWAS of kit and removed kit-associated SNPs with p < 1x10^-6; we also re-genotyped each of the new samples (those not genotyped with the original GERA data) with some of the original GERA data, and removed SNPs with >13/1,268 (1%) mismatches. For the AFR array, to adjust for potential plate batch issues, we conducted a GWAS of whether an individual was in the original GERA data vs. in the newly genotyped data and removed batch-associated SNPs with p < 0.05 (we used a more stringent threshold than that used for the EUR array because there were fewer individuals on the AFR array); we also re-genotyped each of the new samples with the original GERA data and removed SNPs with >2/78 (2.6%) mismatches.
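The array-wise filters above can be illustrated with a minimal sketch. It is not the study's pipeline: the SnpStats record, its field names, and the ArrayRules container are hypothetical, and only the numeric thresholds (MAF, call rate, HWE p-value, kit/batch association p-value, and duplicate-mismatch counts for the EUR and AFR arrays) are taken from the text.

```python
from dataclasses import dataclass

# Thresholds shared by all arrays, as quoted in the text.
MAF_MIN = 0.01        # remove SNPs with MAF < 0.01
CALL_RATE_MIN = 0.95  # remove SNPs with call rate < 95%
HWE_P_MIN = 1e-5      # remove SNPs with HWE p-value < 1x10^-5


@dataclass
class ArrayRules:
    """Hypothetical container for the array-specific thresholds."""
    batch_p_min: float    # remove SNPs whose kit/batch GWAS p-value is below this
    max_mismatches: int   # remove SNPs with more duplicate mismatches than this


# Values from the text: EUR uses p < 1x10^-6 and >13/1,268 mismatches;
# AFR uses p < 0.05 and >2/78 mismatches.
EUR = ArrayRules(batch_p_min=1e-6, max_mismatches=13)
AFR = ArrayRules(batch_p_min=0.05, max_mismatches=2)


@dataclass
class SnpStats:
    """Hypothetical per-SNP summary; field names are illustrative only."""
    snp_id: str
    maf: float
    call_rate: float
    hwe_p: float
    batch_p: float    # p-value from the GWAS of kit (EUR) or batch (AFR)
    mismatches: int   # discordant calls among re-genotyped samples


def passes_qc(snp: SnpStats, rules: ArrayRules) -> bool:
    """Return True if the SNP survives every filter described in the text."""
    if snp.maf < MAF_MIN:
        return False
    if snp.call_rate < CALL_RATE_MIN:
        return False
    if snp.hwe_p < HWE_P_MIN:
        return False
    if snp.batch_p < rules.batch_p_min:
        return False
    if snp.mismatches > rules.max_mismatches:
        return False
    return True


def filter_snps(snps: list[SnpStats], rules: ArrayRules) -> list[str]:
    """Return the IDs of SNPs that pass QC under the given array rules."""
    return [s.snp_id for s in snps if passes_qc(s, rules)]


# Made-up example records, for illustration only.
example = [
    SnpStats("snp_ok", 0.20, 0.99, 0.5, 0.3, 0),
    SnpStats("snp_rare", 0.005, 0.99, 0.5, 0.3, 0),
    SnpStats("snp_batch", 0.20, 0.99, 0.5, 0.01, 0),
    SnpStats("snp_mismatch", 0.20, 0.99, 0.5, 0.3, 3),
]
print(filter_snps(example, EUR))
print(filter_snps(example, AFR))
```

In this sketch, the record with batch p-value 0.01 and the record with 3 mismatches both pass under the EUR rules but fail under the AFR rules, which mirrors the point in the text that the AFR array, having fewer individuals, was filtered at a more stringent association threshold.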

After the QC described above, imputation was performed as described in [PMID: 26034056]. Imputation was performed on an array-wise basis, pre-phasing with SHAPE-IT v2.5 [PMID: 22138821], and imputing from the 1000 Genomes Project October 2014 release as a cosmopolitan reference panel with IMPUTE2 [PMID: 22384356].

In addition to the GWAS described above, a nested exome-wide association study (EWAS) of PCa was also conducted (7,489 cases, 7,323 controls; 78% non-Hispanic white, 9% African-American, 3% East Asian, 6% Latino, 4% Other). A custom EWAS array, focused primarily on rare variants, was designed for genotyping to complement the GWAS arrays [PMID: 26034056]. The EWAS array content included missense and loss-of-function mutations, and rare exonic mutations from The Cancer Genome Atlas (TCGA) and dbGaP prostate cancer tumor exomes [PMID: 26544944]. Much of the EWAS array design content overlapped with the probesets on the UK Biobank Affymetrix Axiom array [PMID: 30305743]. The genotyping and QC steps taken to filter out low-quality samples and variants with low call rates are described in Emami et al., 2020 [bioRxiv]. The resulting EWAS array genotypes are provided here.

Type: Cohort
Archiver: The database of Genotypes and Phenotypes (dbGaP)