Demographic and clinical characteristics
A total of 760 patients (660 PD patients (nPD) and 100 patients with other forms of parkinsonism (npark)) and 808 healthy controls (nHC) from the Luxembourg Parkinson’s study (Fig. 1) were genotyped using NeuroChip and screened for GBA1 variants using targeted PacBio DNA sequencing method, while a subset of 72 patients was screened with WGS. Among the patients, 66.4% (n = 499) were male with a mean age at disease onset (AAO) of 63 ± 11.5 years (Supplementary Table 1). The control group consisted of 52.7% (n = 426) males with a mean age at assessment (AAA) of 59.3 ± 12.2 years. Due to their above 30-fold coverage provided by the long-read DNA sequencing, all samples were selected after successfully passing the MultiQC step (Supplementary Table 9). To ensure ethnic homogeneity and exclude other genetic factors that may bias the assessment of the genetic contribution of GBA1 to PD in the Luxembourgish population, we excluded carriers of mutations in other PD-causing genes (point mutations: n = 10, nPD = 8,nHC = 2; CNV: nPD = 4) in PD-associated genes (no CNVs in GBA1 were detected), first-degree family members (n = 64, nPD = 8, npark = 2, nHC = 54), younger HC (<60 AAA) with first-degree relatives having PD (nHC = 74), and individuals of non-European descent (n = 6) from the cohort. The final cohort consisted of 735 patients (nPD=637, npark = 98) and 675 HC with a mean AAO among the patients of 63.2 ± 11.3 years, whereas the mean AAA for HC was 61 ± 11.5 years. Based on Neurochip and WGS data, none of the GBA1 carriers carried pathogenic variants in other PD-associated genes as defined by MDSGene10.
Targeted PacBio DNA sequencing showed the highest specificity for detecting rare coding variants in GBA1
To measure the reliability of calling rare GBA1 coding variants, we performed two types of comparison. Rare variants were here defined as variants with minor allele frequency (MAF) < 1% in the European population. We compared the results from the PacBio, WGS, and NeuroChip data for a subset of samples (n = 72). We then compared the PacBio and NeuroChip data as they both covered the majority of samples (n = 1568). We considered true positives to be the GBA1 variants validated by Sanger sequencing. False-positive variants were those identified by the analysis method but not confirmed by Sanger sequencing. False-negative variants were not called by the analysis method but were later validated with Sanger sequencing (Supplementary Table 2). First, we evaluated 72 samples screened by all three methods (Fig. 2). Using the GBA1-targeted PacBio DNA sequencing method and WGS in combination with the Gauchian11 tool implemented in Dragen v4 (GBA caller option), we detected six individuals carrying GBA1 variants (p.E365K (n = 3), p.T408M (n = 1), p.N409S (n = 1), RecNciI (n = 1)). The RecNil combines the three variants p.L483P, p.A495P, and p.V499V in one haplotype allele. All variants detected were confirmed by Sanger sequencing (true positive rate (TPR) of 100%). We did not identify any false positive variant calls. However, using the Dragen v.4 pipeline without the GBA1 caller, relying only on the GATK best practices pipeline, the WGS method failed to detect the RecNciI recombinant allele in one individual (TPR of 83.3% (5/6)). Using Neurochip, we detected three potential GBA1 variant carriers (p.T408M (n = 1), p.N431S (n = 1), p.A215D (n = 1), but only one variant (p.T408M) was subsequently confirmed by Sanger sequencing (TPR of 16.6% (1/6), resulting in a false discovery rate (FDR) of 66.6% (2/3).
Next, we compared the results from 1568 samples screened with both, the GBA1-targeted PacBio DNA sequencing method and the NeuroChip array (Fig. 3). Using the GBA1-targeted PacBio DNA method, we detected 135 GBA1 variants carriers, of which 100% were validated by Sanger sequencing. Using the NeuroChip array, we detected 47 potential GBA1 variant carriers, among which only 36 were validated by Sanger sequencing (TPR of 26.7% (36/133), resulting in an FDR of 23.4% (11/47).
Classification of GBA1 variants
Of the 1568 individuals sequenced using the GBA1-targeted PacBio DNA sequencing method, we identified 135 carriers of at least one GBA1 variant (Supplementary Tables 3, 4). Based on the classification of Höglinger et al.12, 25 were carriers of severe variants, 10 of mild variants, 72 of risk variants and 22 of VUS. The most common GBA1 variants in PD patients were the risk variants p.E365K (n = 23; 3.5%) and p.T408M (n = 17; 2.6%).
GBA1 variants were mostly heterozygous missenses, one patient carried a heterozygous stop-gain variant p.R398*(rs121908309), two PD patients carried a homozygous missense variant p.E365K/p.E365K(rs2230288). We identified two HC carrying a pathogenic LRRK2 variant and being positive for GBA1 variant (p.E365KGBA-p.R1441CLRRK2; p.K13RGBA-p.G2019SLRRK2). We also detected nine different synonymous variants in exonic regions (Supplementary Table 4). The variant p.T408T(rs138498426) is a splice site variant (located within 2 bp of the exon boundary) and is classified as VUS12. The remaining synonymous variants were not further analyzed. Additionally, we identified 69 variants in intronic and UTRs regions of GBA1 (Supplementary Table 5) with unclear pathogenic relevance, 35 of which were rare with MAF < 1% in gnomAD for the Non-Finnish European population10.
We classified the following combinations of multiple variants per individual as severe based on the classification of the respective associated pathogenic variants (Table 1): p.N409S-p.L483P, p.K13R-p.L483P, p.F252I-p.T408M, p.Y61H-p.T408M.
GBA1 variant frequency
To calculate the GBA1 frequency in our study, we considered the individuals remaining after the exclusion step (735 patients and 675 HC). We detected 12.1% (n = 77) GBA1 variant carriers among 637 PD patients and 5% (34/675) in HC individuals. We found a frequency of 10.5% (67/637) of pathogenic variants in PD patients (severe, mild, risk) and 4.3% (29/675) in controls (Table 2). Four patients with parkinsonism had GBA1 variants. Carriers of severe GBA1 variants (n = 21; 3.2%; OR = 11.4; 95% CI = [2.6, 49]; p = 0.0010) have a high risk of developing PD as defined by the indicated OR.
Genotype–phenotype associations in GBA1-PD patients
We characterized the clinical phenotype of severe (n = 21), mild (n = 7) and risk (n = 39) GBA1 carriers and non-carriers (n = 554) only in unrelated PD patients excluding carriers with only one synonymous or VUS variant in individuals remaining after the filtering step. The AAO was similar between GBA1 carriers (61.6 ± 11.5) and non-carriers (62.6 ± 11.6). Severe PDGBA1 variant carriers showed a trend towards younger AAO compared to mild and risk (severe: 58.6 ± 13.1 vs mild: 65.4 ± 17 vs risk: 62.5 ± 9.3 years; p = 0.29) (Table 3), with a significant risk to develop early onset PD (OR = 4.02; p = 0.0098). In contrast to non-carriers, we also observed that carriers of pathogenic variants have a strong family history of PD (OR = 0.74; p = 0.0401). Compared to severe (6.4 ± 4.7) and risk (4.4 ± 4.9) variant carriers, the mild (1.7 ± 1.4) variant carriers show a significantly shorter disease duration (Table 3).
We compared clinical features between PD patients carrying pathogenic GBA1 variants and PD patients without GBA1 variants (Supplementary Table 6). We found that in carriers the sense of smell was strongly impaired (uncorrected p = 0.0210) and a higher rate of hallucinations (uncorrected p = 0.0415). Next, we compared patients carrying variants from each category (severe, mild or risk) separately with PD patients without GBA1 variants (Table 4). Carriers of severe GBA1 variants showed more severe non-motor symptoms when compared to non-GBA1 carriers, such as MDS-UPDRS Part I (uncorrected p = 0.0074) and hallucinations (uncorrected p = 0.0099), and also an impaired sense of smell as assessed by Sniffin’ Stick test (uncorrected p = 0.0405). To show the deleterious impact of the severe variants, we compared carriers of severe variants with patients carrying either mild or risk GBA1 variants (Table 5). We observed that severe variants carriers have more severe gait disorder (uncorrected p = 0.0188) and depression (uncorrected p = 0.0074) and worse MDS-UPDRS Part I (uncorrected p = 0.0019) and PDQ-39 (uncorrected p = 0.0422). For all clinical features, there were no significant associations after the correction for multiple comparisons using FDR adjustment.
VUS and the glucosylceramidase structure
We detected nine already reported VUS (p.K13R, p.Y61H, p.R78C, p.L213P, p.E427K, p.A495P, p.H529R, p.R534C, p.T408T) and three new VUS (p.A97G, p.A215 and p.R434C).
According to our strategy developed for the VUS classification of GBA1 variants, where we assign the pathogenicity based on the REVEL, the CADD and the dbscSNV scores, as well as the presence or absence of the variants in the patients. We propose to subclassify the VUS p.Y61H, p.L213P, p.A215D, and p.R434C as probably pathogenic severe variants (Supplementary Table 7). The variant p.L213P changes the leucine into proline, which is known to be the “helix breaker” amino acid that induces a bend into the protein structure13 (Supplementary Fig. 1). The p.L213P and p.A215D variants are in the catalytic site of the enzyme in the triose-phosphate isomerase (TIM) barrel structure. The p.Y61H variant (Fig. 4a) is located next to the residue position of the known severe variant p.C62W, and the patient carrying this variant had an AAO of 38 years, indicating an early-onset, probably severe form of PD. This patient has a family history of PD and reported that the paternal uncle and aunt were diagnosed with PD at the ages of 60 and 70, respectively. The p.R434C variant is close to a known severe (p.V433L) and mild (p.W432R, p.N435T) PD variants in the 3D structure. We compared the clinical scores obtained from carriers of known severe variants with the four carriers of probable severe VUS (p.Y61H, p.L213P, p.A215D, and p.R434C) (Supplementary Table 7). The z-score was used to determine the number of SD deviations from the mean for each clinical score. We observed that the PD patient carrying the p.L213P variant had a z-score that was significantly different for MDS-UPDRS II (z-score = 3.05) and MDS-UPDRS III (z-score = 2.94) confirming its classification as a severe variant.
We propose to subclassify the variants p.H529R and p.R534C as probably mild variants, as they are both found only in PD patients. The variants p.K13R, p.R78C, p.E427K, and p.A495P are subclassified as probable risk variants. The variant p.K13R is located in the signal peptide region. The variant p.R78C was annotated as “PD susceptibility” in HGMD with deleterious impact in CADD. The variant p.E427K was annotated as associated to “parkinsonism” in ClinVar and “reduced activity” in HGMD. We propose to classify the variant p.A97G as probably benign because it is localized in a coil-bend structure and is not close to any known pathogenic variants.
The synonymous variant p.T408T was found in two cases and one healthy control individual. Two established splice-site prediction scores (dbscSNV: ada_score 0.9797 and rf_score 0.85) agreed in their prediction that the variant is likely to affect splicing. HGMD classified the variant as disease mutation (DM) (Supplementary Table 4). Therefore, we propose to classify the variant as a risk variant.
Overall, we propose to classify four VUS variants as probably severe pathogenic variants (p.Y61H, p.L213P, p.A215D, and p.R434C), two as probably mild pathogenic variants (p.H529R and p.R534C), five as probably pathogenic risk variants (p.K13R, p.R78C, p.E427K, p.A495P, and p.T408T) and one as probably benign variants (p.A97G) (Fig. 4b).