Friday, June 9, 2023

Investigating the association between glycaemic traits and colorectal cancer in the Japanese population using Mendelian randomisation – Scientific Reports

Study design

The MR method estimates the relationship between an exposure and an outcome of interest using genetic variants known to be associated with the exposure, under the following assumptions: (i) the selected genetic instruments are associated with the exposure of interest; (ii) they are not associated with any confounders of the exposure-outcome relationship; and (iii) they are associated with the outcome only through the exposure of interest21.

As shown in Fig. 1, we conducted two-sample MR analyses. In this design, two independent study samples were used to estimate the single nucleotide polymorphism (SNP)-risk factor associations (fasting glucose, HbA1c, and fasting C-peptide) and the SNP-outcome (colorectal cancer) associations. There was slight sample overlap between the SNP-glycaemic trait and SNP-colorectal cancer analyses, which may have biased the estimates toward a non-null association22.
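In a two-sample design, the two sets of summary statistics are linked SNP by SNP. The Python sketch below shows the per-SNP Wald ratio. It assumes a SNP-exposure beta from one sample and a SNP-outcome log odds ratio, with its standard error, from the other. The function name and the numbers are hypothetical. The first-order standard error used here ignores uncertainty in the SNP-exposure estimate.

```python
from typing import Tuple


def wald_ratio(beta_exp: float, beta_out: float, se_out: float) -> Tuple[float, float]:
    """Per-SNP ratio estimate for a two-sample MR design.

    beta_exp: SNP-exposure association (sample 1, e.g. change in fasting glucose per allele)
    beta_out: SNP-outcome association (sample 2, log odds ratio for colorectal cancer)
    se_out:   standard error of beta_out
    Returns (ratio estimate, first-order standard error).
    """
    if beta_exp == 0:
        raise ValueError("beta_exp must be non-zero")
    ratio = beta_out / beta_exp
    # First-order SE: treats beta_exp as known (no uncertainty propagated from sample 1)
    se = se_out / abs(beta_exp)
    return ratio, se


# Hypothetical values, not taken from the study
est, se = wald_ratio(beta_exp=0.05, beta_out=0.01, se_out=0.004)
print(round(est, 3), round(se, 3))  # 0.2 0.08
```

Exponentiating the ratio gives an odds ratio per unit of the exposure. Multi-SNP estimators such as IVW combine these per-SNP ratios.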

Figure 1 Mendelian randomisation study design. BBJ BioBank Japan, FC fasting C-peptide, FG fasting glucose, FI fasting insulin, HbA1c haemoglobin A1c, HERPACC the Hospital-based Epidemiologic Research Program at Aichi Cancer Center, J-MICC the Japan Multi-Institutional Collaborative Cohort, JPHC Japan Public Health Center, TMM the Tohoku Medical Megabank, SNP single nucleotide polymorphism.

Genetic instrument selection

To satisfy MR assumption (i), instrumental variables for fasting glucose, HbA1c, and fasting C-peptide were systematically selected from previously published GWASs, as shown in Supplementary Figure S1 and Supplementary Table S1. Briefly, for all three phenotypes we used the GWAS Catalog, co-published by the National Human Genome Research Institute and the European Bioinformatics Institute. GWASs of fasting C-peptide were extremely limited, and none of the SNPs met the baseline exclusion criteria (Supplementary Figure S1). We therefore used fasting insulin as a substitute for fasting C-peptide when searching for SNPs in the GWAS Catalog. Genome-wide association data from the Meta-Analysis of Glucose and Insulin-related traits Consortium on 1.3 + billion Caucasian individuals free of diabetes were also used when selecting SNPs for fasting glucose and fasting insulin (the substitute for fasting C-peptide)23. For each glycaemic trait, we selected as instruments the SNPs that reached genome-wide significance (P < 5 × 10⁻⁸) and had a minor allele frequency > 0.01 in the East Asian populations of the 1000 Genomes Project. As of December 1, 2020, we identified 97, 94, and 23 candidate instruments for fasting glucose, HbA1c, and fasting C-peptide, respectively. To minimise potential confounding due to linkage disequilibrium, we applied the "clumping" function in PLINK (a widely used open-source toolset for population-based linkage and genome-wide association analyses), using an r² threshold of 0.001 and a 1 Mb window. This left 34, 43, and 17 instruments for fasting glucose, HbA1c, and fasting C-peptide, respectively. We used all of the selected SNPs as instrumental variables, regardless of the statistical significance of the SNP-glycaemic trait associations in our samples. This minimised bias from false negatives due to insufficient power24 and from overfitting25. Detailed information on the instrumental variables for each glycaemic trait is presented in Supplementary Tables S2 and S3.
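The selection and clumping steps can be sketched in Python as follows. This is a simplified, hypothetical re-implementation of greedy LD clumping for illustration only; the study itself used PLINK. It assumes a list of SNP records with hypothetical keys and a precomputed table of pairwise r² values, which in practice would come from an East Asian LD reference panel. The rsIDs and values in the example are placeholders.

```python
from typing import Dict, List, Tuple

GWS_P = 5e-8            # genome-wide significance threshold
MIN_MAF = 0.01          # minimum minor allele frequency (East Asian reference)
R2_MAX = 0.001          # clumping r^2 threshold
WINDOW_BP = 1_000_000   # 1 Mb clumping window


def select_and_clump(snps: List[dict], ld_r2: Dict[Tuple[str, str], float]) -> List[str]:
    """Filter candidate SNPs, then greedily clump them by LD (simplified PLINK-style logic).

    snps:  records with hypothetical keys 'rsid', 'chrom', 'pos', 'p', 'maf'
    ld_r2: pairwise r^2 keyed by (rsid_a, rsid_b); pairs not listed are treated as r^2 = 0
    Returns the rsIDs of the retained index SNPs, most significant first.
    """
    # Step 1: keep genome-wide significant, non-rare SNPs
    candidates = [s for s in snps if s["p"] < GWS_P and s["maf"] > MIN_MAF]
    # Step 2: visit SNPs from the smallest to the largest P-value
    candidates.sort(key=lambda s: s["p"])
    kept: List[dict] = []
    for snp in candidates:
        clumped = False
        for index in kept:
            nearby = (snp["chrom"] == index["chrom"]
                      and abs(snp["pos"] - index["pos"]) <= WINDOW_BP)
            r2 = ld_r2.get((snp["rsid"], index["rsid"]),
                           ld_r2.get((index["rsid"], snp["rsid"]), 0.0))
            if nearby and r2 > R2_MAX:
                clumped = True  # absorbed into the clump of a stronger index SNP
                break
        if not clumped:
            kept.append(snp)  # becomes a new index SNP
    return [s["rsid"] for s in kept]


# Hypothetical SNPs (placeholder rsIDs and values)
example = [
    {"rsid": "rsA", "chrom": 1, "pos": 100_000, "p": 1e-20, "maf": 0.30},
    {"rsid": "rsB", "chrom": 1, "pos": 150_000, "p": 1e-12, "maf": 0.25},
    {"rsid": "rsC", "chrom": 1, "pos": 5_000_000, "p": 1e-9, "maf": 0.10},
    {"rsid": "rsD", "chrom": 2, "pos": 200_000, "p": 1e-6, "maf": 0.40},
    {"rsid": "rsE", "chrom": 3, "pos": 300_000, "p": 1e-10, "maf": 0.005},
]
print(select_and_clump(example, {("rsA", "rsB"): 0.8}))  # ['rsA', 'rsC']
```

In this toy example:

- rsB is clumped with the stronger rsA.
- rsC lies outside the 1 Mb window and is retained.
- rsD fails the significance filter.
- rsE fails the MAF filter.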

Several variants may affect HbA1c levels via erythrocyte biology26, which would violate MR assumptions (ii) and (iii). We therefore examined whether the selected instruments were associated with erythrocyte-related traits at the genome-wide significance threshold (P < 5 × 10⁻⁸) using the PhenoScanner and HaploReg databases. The instruments with such associations were:

- rs579459 for haemoglobin concentration;
- rs17509001 and rs13134327 for high light scatter reticulocyte count;
- rs6684514, rs7616006, rs4737009, rs12602486, and rs4820268 for mean corpuscular haemoglobin concentration;
- rs11248914, rs9914988, rs2748427, and rs57601949 for mean corpuscular volume;
- rs11964178, rs7776054, rs6980507, rs10823343, rs174594, and rs12819124 for red blood cell count;
- rs857691, rs9818758, rs837763, and rs17533903 for reticulocyte count; and
- rs282587 for reticulocyte fraction of red cells.

Additionally, we examined whether the selected SNPs were associated with potential confounders of the association between glycaemic metabolism and colorectal cancer2. These included body mass index (BMI), smoking, alcohol intake, and physical inactivity. Some instruments were associated with these traits (for example, rs2237892 in KCNQ1 with BMI). However, manual exclusion of instrumental variables that may have pleiotropic effects is generally not recommended27. Therefore, we conducted MR-Egger regression28 as a sensitivity analysis (details are described in the Statistical analysis section).
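The trait screen described above amounts to checking each instrument against a table of lookup results. The sketch below is hypothetical: it assumes the PhenoScanner or HaploReg results have already been exported into a simple rsID-to-(trait, P-value) mapping, which is not either database's actual output format. The placeholder rsIDs and P-values are invented for illustration. The function only flags SNPs and leaves the decision about exclusion to the analyst.

```python
from typing import Dict, List, Set, Tuple

GWS_P = 5e-8

# Illustrative subset of the erythrocyte-related traits named in the text
ERYTHROCYTE_TRAITS: Set[str] = {
    "haemoglobin concentration",
    "mean corpuscular volume",
    "red blood cell count",
    "reticulocyte count",
}


def flag_trait_associations(
    lookup: Dict[str, List[Tuple[str, float]]], traits: Set[str]
) -> Dict[str, List[str]]:
    """Return SNPs with a genome-wide significant association with any trait in `traits`.

    lookup: rsID -> list of (trait, P-value) pairs, a hypothetical simplification of
            results exported from a database such as PhenoScanner or HaploReg.
    The function only flags SNPs; it does not remove them.
    """
    flagged: Dict[str, List[str]] = {}
    for rsid, hits in lookup.items():
        matched = sorted({trait for trait, p in hits if trait in traits and p < GWS_P})
        if matched:
            flagged[rsid] = matched
    return flagged


# Hypothetical lookup results (placeholder rsIDs and P-values)
lookup = {
    "rsX1": [("haemoglobin concentration", 3e-12), ("body mass index", 0.2)],
    "rsX2": [("red blood cell count", 1e-5)],
    "rsX3": [("body mass index", 2e-9)],
}
print(flag_trait_associations(lookup, ERYTHROCYTE_TRAITS))
# {'rsX1': ['haemoglobin concentration']}
print(flag_trait_associations(lookup, {"body mass index", "smoking"}))
# {'rsX3': ['body mass index']}
```

The same function can screen for potential confounders by passing a different trait set, as in the second call.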

Proportion of explained variance and F-statistics

We calculated the proportion of variance in each phenotype (X) explained by the selected genetic variants using the previously described formula below:

$${R}^{2}=\frac{{\sum }_{i=1}^{N} 2\left({p}_{i}\right)\left(1-{p}_{i}\right){\beta }_{i}^{2}}{Var(X)}$$

(R² for the proportion of explained variance, N for the number of independent SNP instruments, \({p}_{i}\) for the effect allele frequency of SNP \(i\), and \({\beta }_{i}\) for the magnitude of association between SNP \(i\) and the phenotype)29.
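The explained-variance formula translates directly into code. The Python sketch below assumes that the per-SNP betas are on the same scale as X, so that Var(X) is in squared units of X. The function name and inputs are hypothetical.

```python
from typing import Sequence


def explained_variance(eaf: Sequence[float], beta: Sequence[float], var_x: float) -> float:
    """R^2 = sum_i 2 p_i (1 - p_i) beta_i^2 / Var(X), assuming independent SNPs.

    eaf:   effect allele frequency p_i of each SNP
    beta:  per-allele effect of each SNP on X (same units as X)
    var_x: variance of X (squared units of X)
    """
    if len(eaf) != len(beta):
        raise ValueError("eaf and beta must have the same length")
    if var_x <= 0:
        raise ValueError("var_x must be positive")
    numerator = sum(2 * p * (1 - p) * b ** 2 for p, b in zip(eaf, beta))
    return numerator / var_x


# Two hypothetical SNPs on a trait with unit variance
print(round(explained_variance([0.5, 0.2], [0.1, 0.2], 1.0), 4))  # 0.0178
```

The term 2p(1 − p) is the variance of an additively coded genotype under Hardy-Weinberg equilibrium. Summing the terms is valid only because the instruments were clumped to be approximately independent.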

Based on the explained variance for each glycaemic trait, we performed power calculations for the MR analyses, setting a type I error rate of 5% and a power of 80%29. We also calculated approximate F-statistics, which reflect the strength of the instrumental variables. The F-statistic can be approximated as

$${F}=\frac{{R}^{2} (n-1-k)}{(1-{R}^{2})k}$$

(R² for the proportion of explained variance, n for the sample size, and k for the number of instrumental variables)30.
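The approximate F-statistic can be computed from the same quantities. The Python sketch below is a direct transcription of the formula. The inputs in the example are hypothetical, not values from this study.

```python
def approx_f_statistic(r2: float, n: int, k: int) -> float:
    """F = R^2 (n - 1 - k) / ((1 - R^2) k).

    r2: proportion of variance in the exposure explained by the instruments
    n:  sample size of the SNP-exposure GWAS
    k:  number of instruments
    """
    if not 0 <= r2 < 1:
        raise ValueError("r2 must lie in [0, 1)")
    if k < 1 or n - 1 - k <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    return r2 * (n - 1 - k) / ((1 - r2) * k)


# Hypothetical inputs: 20 instruments explaining 2% of the variance in 10,000 people
print(round(approx_f_statistic(0.02, 10_000, 20), 1))  # 10.2
```

For a fixed R², the statistic grows with sample size and shrinks as more instruments share the same explained variance. Larger values indicate stronger instruments.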

Genetic associations with glycaemic traits (fasting glucose, HbA1c, and C-peptide)

We obtained summary statistics for the SNP-glycaemic trait (fasting glucose and HbA1c) associations from the Japanese Consortium of Genetic Epidemiology (J-CGE) studies31. These comprise the Japan Public Health Center (JPHC)-based prospective study, the Tohoku Medical Megabank (TMM) Community-Based Cohort study, the Japan Multi-Institutional Collaborative Cohort (J-MICC) study, and the Hospital-based Epidemiologic Research Program at Aichi Cancer Center (HERPACC). In the current MR study, genetic data for each glycaemic trait were available from the JPHC, TMM, and J-MICC studies within the J-CGE. The characteristics of each cohort are presented in Table 1 and Supplementary Table S4. As a proxy for insulin resistance, we used fasting C-peptide measurements from the JPHC study, because C-peptide is a more stable measure of insulin secretion than insulin itself32.

Table 1 Characteristics of the studies considered for the analysis of single nucleotide polymorphism-exposure associations in the Japanese Consortium of Genetic Epidemiology Studies.

The exclusion criteria were:

- physician-diagnosed diabetes;
- any diabetes treatment;
- fasting (≥ 8 h) serum glucose ≥ 126 mg/dL (≈ 7 mmol/L; for the fasting glucose and fasting C-peptide analyses) or HbA1c ≥ 6.5% (for the HbA1c analysis); and
- missing data on fasting status.

In total, we included 17,289 participants with fasting glucose measurements (n = 3,537 in JPHC, 9,900 in TMM, and 3,852 in J-MICC), 52,802 with HbA1c measurements (n = 8,207 in JPHC, 36,647 in TMM, and 7,948 in J-MICC), and 1,666 with fasting C-peptide measurements (all from JPHC). Details of each GWAS are described in Supplementary Table S5. For fasting glucose and HbA1c, the study-specific β coefficients and 95% confidence intervals (CIs) for each SNP-exposure association were combined using fixed-effects inverse-variance weighted (IVW) meta-analysis. Fasting glucose (mg/dL) and HbA1c (%) were analysed untransformed, whereas fasting C-peptide (ng/mL) was log-transformed to better approximate a normal distribution.
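The fixed-effects IVW meta-analysis of study-specific SNP-exposure estimates can be sketched in Python as follows. This assumes that each study reports a β and a symmetric 95% CI on the same scale, so that the CI can be converted back to a standard error. The per-cohort numbers are hypothetical, and the helper names are illustrative.

```python
import math
from typing import List, Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def se_from_ci(lower: float, upper: float) -> float:
    """Standard error implied by a symmetric 95% CI on the beta scale."""
    return (upper - lower) / (2 * Z_975)


def fixed_effect_ivw(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted pooling of (beta, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical (beta, se) for one SNP in three cohorts (e.g. JPHC, TMM, J-MICC)
per_cohort = [(0.12, 0.04), (0.08, 0.02), (0.10, 0.04)]
beta, se = fixed_effect_ivw(per_cohort)
print(round(beta, 3), round(se, 3))  # 0.09 0.016
```

The cohort with the smallest standard error dominates the pooled estimate, because each weight is the inverse of that cohort's variance.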

Genetic associations with colorectal cancer

Data for colorectal cancer came from two sources:

- individual-level GWAS data from the JPHC case-cohort study-base (482 colorectal cancer cases and 2,434 control subjects), the JPHC case-cohort study-5 years (194 cases and 3,607 controls), the NAGANO study (105 cases and 103 controls), the HERPACC study (163 cases and 3,819 controls), and the J-MICC study (300 cases and 901 controls); and
- summary-level GWAS data from the BioBank Japan (BBJ) study (6,692 cases and 27,178 controls; NBDC Human Database primary accession code hum0014) (Supplementary Table S6).

The GWASs used phase 1 (BBJ study) or phase 3 (other institutions) of the 1000 Genomes Project as the imputation reference panel and were adjusted for genetic principal components (Supplementary Table S5). The study-specific SNP-outcome estimates were combined using IVW meta-analysis.

This study was approved by the review boards of the National Cancer Center, Japan (Approval No.: 2011-044), TMM (Approval No.: 2012-4-617), Iwate Medical University (HG H25-2), Aichi Cancer Center (Approval No.: 12-27), and the Nagoya University Graduate School of Medicine (Approval No.: 2010-0939). JPHC participants who had provided blood samples were contacted by mail and given the opportunity to opt out before this study began. In addition, information on the study was posted on the JPHC website so that participants could opt out at any time; this procedure was approved by the institutional review board of the National Cancer Center. Written informed consent was obtained from all participants in the other studies. Details are shown in Supplementary Table S4. All procedures contributing to this study complied with the ethical standards of the relevant institutional committees on research involving human participants and with the 1964 Declaration of Helsinki, as revised in 2008.

Statistical analysis

We used the TwoSampleMR package in R v3.6.4 for the MR analyses, except for the MR-Pleiotropy Residual Sum and Outlier (MR-PRESSO) analysis, for which we used the MR-PRESSO package v1.0. The IVW method with random effects was used to assess the relationship between genetically predicted glycaemic traits (fasting glucose, HbA1c, and fasting C-peptide) and the risk of colorectal cancer.
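The random-effects IVW estimator can be written out in a few lines. The Python sketch below implements the multiplicative random-effects form. The point estimate is a weighted regression of SNP-outcome on SNP-exposure betas through the origin, and its standard error is inflated by the residual standard error when there is excess heterogeneity. We believe this matches the behaviour of `mr_ivw` in TwoSampleMR, but that is an assumption to check against the package source, not a reproduction of it. The function name and the data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_random_effects(bx: Sequence[float], by: Sequence[float],
                       se_y: Sequence[float]) -> Tuple[float, float]:
    """IVW causal estimate with multiplicative random effects.

    bx:   SNP-exposure betas
    by:   SNP-outcome betas (log odds ratios for a binary outcome)
    se_y: standard errors of by
    The point estimate is a weighted regression of by on bx through the origin
    (weights 1/se_y^2). Its fixed-effect SE is scaled up by the residual standard
    error when heterogeneity makes that exceed 1, and is never scaled down.
    """
    k = len(bx)
    if k < 2 or not (len(by) == len(se_y) == k):
        raise ValueError("need at least two SNPs and equal-length inputs")
    w = [1 / s ** 2 for s in se_y]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    sxy = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    beta = sxy / sxx
    se_fixed = math.sqrt(1 / sxx)
    # Residual standard error of the weighted regression (k - 1 degrees of freedom)
    rss = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    sigma = math.sqrt(rss / (k - 1))
    return beta, se_fixed * max(1.0, sigma)


# Hypothetical summary statistics for three SNPs
beta, se = ivw_random_effects([0.1, 0.2, 0.3], [0.05, 0.02, 0.08], [0.01, 0.01, 0.01])
print(f"OR per unit of exposure = {math.exp(beta):.2f}, SE(log OR) = {se:.3f}")
```

Flooring the scale factor at 1 prevents under-dispersed data from producing standard errors smaller than the fixed-effect ones.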

Although the IVW method has high statistical power, it relies on the stringent assumption that all instrumental variables are valid or that any pleiotropy is balanced34. In the presence of directional horizontal pleiotropy, however, the IVW method may produce biased estimates35. To address such violations of MR assumptions (ii) and (iii), we applied several sensitivity analyses that are more robust to pleiotropic effects: MR-Egger regression, the weighted median method, and MR-PRESSO.

- MR-Egger regression: the intercept indicates unbalanced (directional) pleiotropy (P < 0.05 was considered significant)28. However, MR-Egger estimates are sensitive to outliers and influential data points.
- Weighted median: this method is useful in that situation because it can produce valid estimates if at least half of the weight in the analysis comes from valid instruments35.
- MR-PRESSO: this method regresses the SNP-outcome estimates on the SNP-exposure estimates to detect outlier SNPs36. Variants whose causal estimates differed substantially from those of the other variants were removed as outliers, and the IVW analysis was then repeated with the remaining variants.

Funnel plots and leave-one-out plots were also generated. The threshold for nominal significance was set at P < 0.05.
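Two of the sensitivity analyses, MR-Egger regression and the simple weighted median, can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the TwoSampleMR code used in the study.

Assumptions:

- MR-Egger is written as an inverse-variance weighted regression with an intercept, after orienting every SNP so that its SNP-exposure association is positive.
- Its standard errors use the residual standard error floored at 1.
- The weighted median uses first-order weights (SNP-exposure beta divided by the SNP-outcome SE, squared).

MR-PRESSO is not reproduced here. The function names and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mr_egger(bx: Sequence[float], by: Sequence[float],
             se_y: Sequence[float]) -> Tuple[float, float, float, float]:
    """MR-Egger: weighted regression of by on bx with an intercept (weights 1/se_y^2).

    Returns (intercept, slope, se_intercept, se_slope). The intercept estimates
    average directional pleiotropy; the slope is the pleiotropy-adjusted estimate.
    """
    k = len(bx)
    if k < 3:
        raise ValueError("MR-Egger needs at least three SNPs")
    # Orient every SNP so that its SNP-exposure association is positive
    x = [abs(xo) for xo in bx]
    y = [yo if xo >= 0 else -yo for xo, yo in zip(bx, by)]
    w = [1 / s ** 2 for s in se_y]
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx  # zero if all |bx| are identical
    slope = (sw * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / sw
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    # Residual standard error (k - 2 df), floored at 1
    sigma = max(1.0, math.sqrt(rss / (k - 2)))
    return (intercept, slope,
            sigma * math.sqrt(sxx / det), sigma * math.sqrt(sw / det))


def weighted_median(bx: Sequence[float], by: Sequence[float],
                    se_y: Sequence[float]) -> float:
    """Simple weighted median of per-SNP ratio estimates with first-order weights."""
    if len(bx) < 2:
        raise ValueError("need at least two SNPs")
    ratios = [yo / xo for xo, yo in zip(bx, by)]
    weights = [(xo / s) ** 2 for xo, s in zip(bx, se_y)]  # 1 / (se_y / |bx|)^2
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    b = [ratios[i] for i in order]
    wts = [weights[i] for i in order]
    total = sum(wts)
    # Cumulative weight up to the midpoint of each SNP's own weight, scaled to [0, 1]
    cum, running = [], 0.0
    for wj in wts:
        running += wj
        cum.append((running - 0.5 * wj) / total)
    below = max(j for j, c in enumerate(cum) if c < 0.5)
    # Linear interpolation between the two ratio estimates straddling 0.5
    return b[below] + (b[below + 1] - b[below]) * (0.5 - cum[below]) / (cum[below + 1] - cum[below])


# Hypothetical data generated as by = 0.01 + 0.2 * bx (directional pleiotropy of 0.01)
bx = [0.1, 0.2, 0.3, 0.4]
by = [0.03, 0.05, 0.07, 0.09]
se_y = [0.01] * 4
intercept, slope, se_int, _ = mr_egger(bx, by, se_y)
print(round(intercept, 3), round(slope, 3))  # 0.01 0.2
print(round(weighted_median(bx, by, se_y), 3))  # 0.23
```

In this toy example, the non-zero intercept signals directional pleiotropy, and the MR-Egger slope recovers 0.2. A P-value for the intercept can then be obtained from intercept / se_intercept, for example against a t distribution with k − 2 degrees of freedom.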
