Study population
The HCHS/SOL is a population-based, longitudinal, multi-site cohort study of Hispanic/Latino adults in the United States. The study primarily enrolled participants from six self-identified Hispanic/Latino backgrounds: Cuban, Central American, Dominican, Mexican, Puerto Rican, and South American [17, 18]. A total of 16,415 adults aged 18–74 years were enrolled at the baseline visit (2008–2011) at four field centers (Bronx, NY; Chicago, IL; Miami, FL; and San Diego, CA). A detailed description of the sampling design, including the generation and use of survey weights for the HCHS/SOL, was previously published [17, 18]. Cognitive function was assessed in 9714 individuals aged 45 years or older during the baseline visit. The Study of Latinos-Investigation of Neurocognitive Aging (SOL-INCA) is an ancillary study of HCHS/SOL focusing on the middle-aged and older adults who underwent cognitive assessment at visit 1 [19]. Overall, 6377 individuals aged 50 years or older with baseline cognitive testing participated in the SOL-INCA examination, which took place at or after HCHS/SOL visit 2, on average 7 years after visit 1. Metabolites were measured in fasting serum from a random subset of 3978 HCHS/SOL participants at visit 1; profiling was performed in 2017 at Metabolon Inc. (Durham, NC) using untargeted liquid chromatography-mass spectrometry (LC-MS) on the Discovery HD4 platform.
All participants in this analysis provided written informed consent in their preferred language (Spanish or English). The HCHS/SOL was approved by the institutional review boards (IRBs) at each field center and by the Non-Biomedical IRB at the University of North Carolina at Chapel Hill for the HCHS/SOL Data Coordinating Center. The approving IRBs were: the Non-Biomedical IRB at the University of North Carolina at Chapel Hill, Chapel Hill, NC; the Einstein IRB at the Albert Einstein College of Medicine of Yeshiva University, Bronx, NY; the IRB at the Office for the Protection of Research Subjects (OPRS), University of Illinois at Chicago, Chicago, IL; the Human Subject Research Office, University of Miami, Miami, FL; and the Institutional Review Board of San Diego State University, San Diego, CA. The present study was approved as a secondary data analysis by the Mass General Brigham IRB (protocol #2019P000057).
Neurocognitive outcomes
We studied prevalent MCI at the SOL-INCA visit, classified according to National Institute on Aging-Alzheimer’s Association criteria [20]. In brief, the SOL-INCA MCI research diagnostic operational definition [7, 19] included three criteria: (1) a cognitive test score below −1 standard deviation (SD) in any of the cognitive tests applied at the SOL-INCA exam, where means and SDs were based on SOL-INCA robust internal norms; (2) a rate of global cognitive decline between the HCHS/SOL baseline and the SOL-INCA exam of −0.055 SD or more per year; and (3) any self-reported subjective cognitive decline on the Everyday Cognition 12-item version (E-Cog12) [21]. Additionally, individuals were classified as MCI+ if they met two conditions: (a) a cognitive test performance below −2 SD in any SOL-INCA neurocognitive test, and (b) more than minimal impairment in the instrumental activities of daily living (IADL) [22].
Metabolomic risk score (MRS) for MCI
We previously developed an MRS for MCI based on selected fasting serum metabolites, from a LASSO-penalized regression [23] using 1451 SOL-INCA individuals who also had metabolite measures [14]. The MRS forms a combined measure of the joint effect of 61 metabolites in predicting MCI. The MRS is defined as a weighted sum of metabolite values, of the form, for participant i:
$$mrs_i = \sum_{j=1}^{61} w_j m_{ij},$$
where $m_{ij}$ is the level of the $j$th metabolite in participant $i$, and $w_j$ is the weight of that metabolite. The list of metabolites and weights is provided in Supplementary Table 1. Based on the metabolites and their weights, we constructed the MRS for 3968 HCHS/SOL individuals with metabolomics data. All metabolites used in the MRS had less than 25% missing values. They were treated as continuous, and missing values were imputed with half of the lowest value observed in the sample for each metabolite, under the assumption that values are missing because the metabolite concentration was below the limit of detection (i.e., missing not at random). Because some metabolites have skewed distributions, we originally rank-normalized the metabolites before summing them in the MRS and scaled them back to their original scale by multiplying them by their standard deviations (SDs), estimated prior to rank-normalization. We also adapted the weights according to the SDs estimated in the sample used for developing the MRS.
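The preprocessing and scoring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the metabolite names and weights are hypothetical placeholders (the actual weights are in Supplementary Table 1), the `(rank - 0.5) / n` offset used for rank-normalization is an assumed convention, ties are broken by input order, and the additional adaptation of weights to the SDs of the development sample is not modeled.

```python
from statistics import NormalDist, stdev


def impute_half_min(values):
    """Replace missing entries (None) with half of the lowest observed value."""
    observed = [v for v in values if v is not None]
    fill = 0.5 * min(observed)
    return [fill if v is None else v for v in values]


def rank_normalize(values):
    """Map values to normal quantiles by rank, then rescale by the original SD.

    Assumption: quantiles use the (rank - 0.5) / n offset; ties keep input order.
    """
    n = len(values)
    sd = stdev(values)  # SD estimated before rank-normalization
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = sd * NormalDist().inv_cdf((rank - 0.5) / n)
    return out


def metabolomic_risk_score(metabolites, weights):
    """Weighted sum of processed metabolite levels, one score per participant.

    metabolites: dict name -> list of levels per participant (None = missing).
    weights:     dict name -> weight w_j (hypothetical values in this sketch).
    """
    names = list(weights)
    n = len(metabolites[names[0]])
    processed = {j: rank_normalize(impute_half_min(metabolites[j])) for j in names}
    return [sum(weights[j] * processed[j][i] for j in names) for i in range(n)]
```

For example, `impute_half_min([2.0, None, 4.0])` fills the missing entry with 1.0, half of the lowest observed value.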
Genotyping
APOE genotyping was performed using commercial TaqMan assays previously described [24]. For individuals with missing APOE genotypes, we computed APOE genotypes based on phased whole-genome sequencing (WGS) data from TOPMed Freeze 8. Other genetic data were used based on genotyping (rather than WGS) using an Illumina custom array, as previously reported [25]. Genome-wide imputation was conducted using the multi-ethnic NHLBI Trans-Omics for Precision Medicine (TOPMed) freeze 8 reference panel (GRCh38 assembly) [25]. Principal components (PCs) were previously computed using PC-Relate [26], and the kinship matrix was computed using the genetic data. “Genetic analysis groups” were constructed based on a combination of self-identified Hispanic/Latino backgrounds and genetic similarity, and are classified as Central American, Cuban, Dominican, Mexican, Puerto Rican, and South American [27].
Heritability estimation for MCI-MRS and BAIBA
The heritability of the MRS and of BAIBA (the MRS metabolite highlighted by the MCI-MRS GWAS results; see further details below) was estimated via a mixed model using the variance explained by the kinship matrix, representing the variance explained by additive effects of common genetic variants. Heritability was estimated in 3496 HCHS/SOL individuals (Fig. 1, set B), after excluding >3rd-degree relatives estimated via the kinship coefficient.
Genome-wide association studies (GWAS) for MCI-MRS and BAIBA
We performed the MCI-MRS and BAIBA GWAS in 3890 HCHS/SOL individuals who had both genetic data and an MCI-MRS score, and in 3863 individuals with BAIBA values (27 individuals had missing BAIBA values) (Fig. 1, Step 1). We used the linear mixed model approach from the “GENESIS” R package and adjusted for age, sex, center, genetic analysis groups, and the first five PCs of genetic data, with random effects for kinship, household, and block unit. For both GWAS, we removed genetic variants with low minor allele count (MAC < 60, corresponding to a minor allele frequency (MAF) ≲ 0.77%) and/or low imputation quality (R² < 0.6), resulting in 12,518,657 variants in the MCI-MRS GWAS and 12,481,432 in the BAIBA GWAS. We used a two-stage method, in which we first regressed the trait on covariates, obtained residuals, and rank-normalized them, and then tested the association of the rank-normalized residuals with the genotypes [28], adjusting for the same covariates again. We applied a genome-wide significance threshold of p value < 5 × 10⁻⁸. Because we applied the two-stage rank-normalization approach, the selected MAC threshold was expected to provide appropriate type 1 error control. Two-sided p values were computed using the score test.
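As a check on the MAC-to-MAF correspondence quoted above, and as a schematic of the two-stage procedure, the following sketch may be helpful. The notation ($N$, $X$, $u$, $g$, $r$, $\tilde r$) is introduced here for illustration only and is not taken from the original analysis code. With $N = 3890$ diploid individuals in the MCI-MRS GWAS:

$$\mathrm{MAF} = \frac{\mathrm{MAC}}{2N} = \frac{60}{2 \times 3890} \approx 0.0077 = 0.77\%.$$

Writing $y$ for the trait, $X$ for the covariate matrix, $u$ for the combined random effects (kinship, household, and block unit), and $g$ for the genotype vector of a single variant, the two stages can be written schematically as

$$\text{Stage 1: } y = X\beta + u + \varepsilon, \quad r = y - X\hat{\beta}; \qquad \text{Stage 2: } \tilde{r} = X\alpha + g\gamma + u' + \varepsilon', \quad H_0\!: \gamma = 0,$$

where $\tilde{r}$ denotes the rank-normalized residuals. Whether the stage 1 residuals are marginal or conditional on the random effects is an assumption of this sketch; the details of the procedure are given in [28].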
When multiple variants within a genomic region (1 Mb window) were significantly associated with the MRS or BAIBA (p value < 5 × 10⁻⁸), we conducted conditional analyses using the index (most significant) SNP as a covariate. If any of the remaining variants had associations with p value < 5 × 10⁻⁸, we repeated this process, adding the top remaining variant to the model. We report the associations for independent SNPs based on the first discovery model. Finally, we assessed whether the findings from our BAIBA GWAS are similar to previously reported findings by looking up associations of SNPs from regions identified in other GWAS.
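The iterative conditioning described above can be sketched as a simple loop. This is an illustration only: `pvalue_fn` is a hypothetical callable standing in for a refit of the association model with the already-selected SNPs as covariates, and `toy_pvalue` with its variant names and p values is made up to show the control flow.

```python
def conditional_analysis(pvalue_fn, variants, threshold=5e-8):
    """Return the SNPs that remain significant after stepwise conditioning.

    pvalue_fn(variant, conditioned_on) is a hypothetical callable returning the
    p value of `variant` in a model that includes the SNPs in `conditioned_on`
    as additional covariates.
    """
    independent = []
    remaining = list(variants)
    while remaining:
        pvals = {v: pvalue_fn(v, tuple(independent)) for v in remaining}
        top = min(pvals, key=pvals.get)  # most significant remaining variant
        if pvals[top] >= threshold:
            break  # nothing left below the genome-wide threshold
        independent.append(top)
        remaining.remove(top)
    return independent


def toy_pvalue(variant, conditioned_on):
    """Made-up p values: B is significant only through its correlation with A."""
    base = {"A": 1e-12, "B": 1e-10, "C": 1e-9}
    if variant == "B" and "A" in conditioned_on:
        return 0.3
    return base[variant]


independent_snps = conditional_analysis(toy_pvalue, ["A", "B", "C"])
```

In this toy region, the index SNP `A` is selected first, `B` loses significance once `A` is in the model, and `C` is retained as a second independent signal.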
We computed the trait variance explained by the identified, genome-wide significant variants for each of the MCI-MRS and BAIBA, by comparing the total variance of a linear mixed model fitted to the metabolite outcome (MCI-MRS or BAIBA) with covariates age, sex, center, genetic analysis groups, first five PCs of genetic data, to the total variance of a similar model that also has the identified variants as covariates. The total variance was defined as the sum of the variance components corresponding to the kinship, household, and block unit matrices, and the residual variance of each model. The percent explained variance was defined as the percent reduction in total variance between the model with and without genetic variants.
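The percent explained variance defined above amounts to the following calculation. This is a minimal sketch; the function name and the dictionary keys are ours, and the variance component values in the usage note are purely illustrative, not study results.

```python
def percent_variance_explained(components_without, components_with):
    """Percent reduction in total variance after adding the variants as covariates.

    Each argument maps a variance component name (kinship, household,
    block unit, residual) to its estimate in the corresponding model.
    """
    total_without = sum(components_without.values())
    total_with = sum(components_with.values())
    return 100.0 * (total_without - total_with) / total_without
```

For instance, with illustrative components summing to 1.0 in the model without the variants and to 0.9 in the model with them, the function returns approximately 10.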
MCI-MRS-associated SNPs and their associations with MCI-MRS metabolites
While we focused on BAIBA because the single association region of the MCI-MRS encompasses the AGXT2 gene known to be strongly associated with BAIBA, we also estimated genetic associations of the two MCI-MRS SNPs from the AGXT2 region with all metabolites composing the MCI-MRS. We used the same linear mixed model approach as for the MCI-MRS and BAIBA GWAS, while focusing only on the two SNPs.
Genetic association analysis with MCI in a separate HCHS/SOL dataset
We tested the association between MCI and the variants significantly associated with MCI-MRS or BAIBA levels in a set of 3149 HCHS/SOL individuals who were not included in the dataset used for the construction of the MCI-MRS (due to lack of metabolite data) (Fig. 1, Step 2). We employed the mixed model approach with a logistic link function and with the same covariates and random effects as described above. We stratified the analysis by APOE-ε4 carrier status because the association of BAIBA with MCI was driven by the APOE-ε4 carrier stratum [14]. In a second model, we further included APOE-ε4 and APOE-ε2 carrier status as covariates. Associations were considered significant if they had a p value <0.05. P values were two-sided and were based on the score test. We note that family-wise error rate (FWER) control requires a p value threshold accounting for all tested associations, i.e., 0.05/10 = 0.005. Finally, we performed a sensitivity analysis in which we applied the same analysis to a smaller subset of 2748 individuals who were genetically unrelated to those who participated in the GWAS of the MCI-MRS and of BAIBA (individuals with >3rd-degree relatedness estimated via the kinship coefficient were excluded; Fig. 1, Step 3). This sensitivity analysis addresses the possibility that replicated genetic associations are driven by genetic similarity with the discovery dataset, which could lead to replication of false associations.
In another analysis, we constructed a weighted genetic risk score (wGRS) based on AGXT2 variants for each of the MCI-MRS and BAIBA: the wGRS was a weighted sum of the effect alleles of the genome-wide significant variants (2 for MCI-MRS and 7 for BAIBA), with weights being their estimated effect sizes from the GWAS. These wGRSs were constructed, and their associations with MCI were estimated, in the HCHS/SOL dataset that was separate from the dataset with metabolomics (set C from Fig. 1). The goal of this analysis was to potentially increase power by aggregating information across SNPs.
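The wGRS construction can be illustrated with a short sketch. The SNP identifiers, dosages, and effect sizes below are hypothetical placeholders rather than the variants or estimates from the GWAS.

```python
def weighted_grs(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict SNP id -> effect-allele count or imputed dosage (0 to 2).
    weights: dict SNP id -> estimated GWAS effect size for that effect allele.
    """
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical two-SNP example (placeholder identifiers and effect sizes).
example_weights = {"snp_1": 0.10, "snp_2": -0.20}
example_dosages = {"snp_1": 2, "snp_2": 1}
example_score = weighted_grs(example_dosages, example_weights)
```

Here the individual carries two effect alleles at `snp_1` and one at `snp_2`, so the score is 2 × 0.10 + 1 × (−0.20), which is zero up to floating-point rounding.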
Generalization of SNP associations with MCI in the ARIC study and meta-analysis
We further evaluated the generalization of the significantly associated SNPs in the ARIC longitudinal cohort study (Fig. 1, Step 3) comprising two major US race/ethnic groups, European and African Americans [29, 30]. The protocol for MCI/dementia diagnosis in ARIC has been previously described [31] and is provided in Supplementary Note 1. Data from ARIC visit 5, which includes MCI assessment, were used in this analysis. Next, we meta-analyzed the results from HCHS/SOL Hispanic/Latino individuals, ARIC European Americans, and ARIC African Americans in an inverse-variance, fixed-effect meta-analysis. To conclude significance of an association while controlling the FWER in the meta-analysis results, a p value below 0.05/10 = 0.005 is required for a given association.
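An inverse-variance, fixed-effect meta-analysis of per-cohort estimates can be sketched as below. This is a generic illustration of the standard estimator, assuming a normal approximation for the two-sided p value; the effect sizes and standard errors in the usage example are made up and do not correspond to any result from HCHS/SOL or ARIC.

```python
from math import erfc, sqrt


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect estimate, its SE, and a two-sided p value."""
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    z = beta / se
    p = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p


# Made-up estimates for three cohorts, for illustration only.
meta_beta, meta_se, meta_p = fixed_effect_meta([0.2, 0.2, 0.2], [0.1, 0.1, 0.1])
```

With three identical estimates, the pooled effect is unchanged and the pooled standard error shrinks by a factor of √3, which shows how pooling can bring an association below the 0.005 threshold.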
Mediation analyses
Mediation analyses were conducted to further examine the relationship between MCI and the two variants associated with it in the replication meta-analysis, and to explore whether these associations are mediated by BAIBA. We used the R “mediation” package, with a complex survey design from the R “survey” package [32], with a “quasibinomial” family for binary traits. This method accounts for the stratification, clustering, and probability weighting in HCHS/SOL to allow correct generalizations to the target population of Latinos in the US. Models were adjusted for age, sex, and study center. A total of n = 1490 HCHS/SOL participants with genetic, metabolite, and MCI data were included in the analysis (Fig. 1, Step 4).
Lifestyle associations with MCI-MRS and BAIBA
We further explored the associations of lifestyle characteristics with MCI-MRS and BAIBA. We used the complex survey design as described above, with the number of participants varying between 3525 and 3978 depending on the tested lifestyle characteristic. The characteristics were depression, education, physical activity, sleep duration, insomnia, respiratory event index, BMI, smoking, alcohol consumption, and Mediterranean diet score (more information in Supplementary Note 2). We computed estimated effect sizes and two-sided Wald test p values, noted significance at the nominal p value <0.05 level, and computed the p value threshold required for controlling the FWER when testing two metabolite measures (MCI-MRS and BAIBA) and ten lifestyle characteristics as 0.05/(2 × 10) = 0.0025.
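The FWER thresholds quoted in this section follow from dividing the nominal level by the number of tests (a Bonferroni correction). A minimal sketch, with function and variable names chosen here for illustration:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p value threshold that controls the FWER at level alpha."""
    return alpha / n_tests


# Ten tested genetic associations with MCI.
genetic_threshold = bonferroni_threshold(0.05, 10)
# Two metabolite measures times ten lifestyle characteristics.
lifestyle_threshold = bonferroni_threshold(0.05, 2 * 10)
```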