Participants
Study materials were collected from the digital clock drawing consortium data shared between the University of Florida (UF) and the New Jersey Institute for Successful Aging (NJISA) Memory Assessment Program, School of Osteopathic Medicine, Rowan University. The Institutional Review Boards of the University of Florida and Rowan University approved the study. Participants at both institutions provided written informed consent. All study procedures were carried out in accordance with the Declaration of Helsinki, the respective university guidelines, and the TRIPOD criteria31. The study consisted of two data cohorts:
The training dataset comprised 23,521 clock drawings from 11,762 participants aged ≥ 65 years, primarily English-speaking, who completed clock drawing to command and copy conditions as part of a routine medical care assessment in a preoperative setting32. Exclusion criteria were as follows: non-fluency in English; education < 4 years; and visual, hearing, or motor extremity limitations that could prevent production of a valid clock drawing.
The classification dataset consists of a “fine-tuning” dataset and a “testing” dataset, used to fine-tune and test the dementia-versus-non-dementia neural network classifier, respectively. These datasets comprise clock drawings from individuals diagnosed with dementia and from non-dementia peers. The dementia clocks were collected from 56 participants evaluated through a community memory assessment program at Rowan University, where they were seen by a neuropsychologist, a psychiatrist, and a social worker. Inclusion criterion: age ≥ 55. Exclusion criteria: head trauma, heart disease, or other major medical illness that can induce encephalopathy; major psychiatric disorders; documented learning disability; seizure disorder or other major neurological disorder; less than a 6th-grade education; and history of substance abuse. All individuals with dementia were assessed using the Mini-Mental State Examination (MMSE), serum studies, and an MRI scan of the brain. These individuals have been described in previous studies33. As reported there, they were diagnosed with either AD or VaD using standard diagnostic criteria34,35.
A total of 175 non-dementia participants completed a research protocol consisting of neuropsychological measures and neuroimaging. Two neuropsychologists reviewed all data. Inclusion criteria: age ≥ 60, English as primary language, and intact activities of daily living (ADLs) per Lawton and Brody’s Activity of Daily Living Scale, completed by both the participant and their caregiver36. Exclusion criteria: clinical evidence of major neurocognitive disorder at baseline per the Diagnostic and Statistical Manual of Mental Disorders—Fifth Edition37; presence of a significant chronic medical condition; major psychiatric disorder; history of head trauma or neurodegenerative disease; documented learning disorder; epilepsy or other significant neurological illness; less than a 6th-grade education; substance abuse in the past year; major cardiac disease; and chronic medical illness-induced encephalopathy. These participants were screened for dementia over the telephone using the Telephone Interview for Cognitive Status (TICS)38 and one in-person interview with a neuropsychologist and a research coordinator, who also evaluated comorbidity rating39, anxiety, depression, ADLs, neuropsychological functioning, and digital clock drawing40. Data from these participants have been described in other studies3,19.
Procedure
Cohort participants completed two clock drawings: (a) the command condition, in which they were instructed to “Draw the face of a clock, put in all the numbers, and set the hands to ten after eleven”, and (b) the copy condition, in which the participant was presented with a model of a clock and asked to copy it underneath2. A digital pen from Anoto, Inc. and associated smart paper17 were used to complete the drawings. The digital pen captures and measures pen positions on the smart paper 75 times per second. The 8.5 × 11 inch smart paper was folded in half, giving participants a drawing area of 8.5 × 5.5 inches. Only the final drawing was extracted and used for analyses in the current study.
Clock drawings to both command and copy conditions from the training cohort were used to train the RF-VAE. Thereafter, clock drawings to both conditions from the fine-tuning cohort were used to train the weights of a neural network classifier and to fine-tune the weights of the RF-VAE encoder to distinguish dementia from control clocks. Command and copy clocks were not separated in training because we wanted the model to learn clock encodings that are agnostic to any cognitive outcome and hence generalizable to multiple different classification tasks. The fine-tuning dataset comprised 84 dementia and 263 non-dementia clocks. Finally, the classification network was tested on the test dataset, comprising 28 dementia and 87 control clocks.
Individual clock drawings were extracted from each file using contour detection. The extracted contours were cropped to the boundaries of the clock drawing, padded with white space to a square, and resized to 64 × 64 pixels, as this was the only size supported by the RF-VAE implementation25 used. Supplementary Fig. 3 shows this preprocessing pipeline.
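The crop–pad–resize steps can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code: the contour-detection step (e.g., via OpenCV) is replaced by a simple ink bounding box, and nearest-neighbour resizing stands in for whatever interpolation the original pipeline used.

```python
import numpy as np

def preprocess_clock(img: np.ndarray, size: int = 64) -> np.ndarray:
    """Crop to ink boundaries, pad with white to a square, resize to size x size.

    `img` is a 2-D grayscale array with white background (255) and darker ink.
    """
    # Crop to the bounding box of non-white (ink) pixels.
    ys, xs = np.where(img < 255)
    img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # Pad with white space to a square, centering the drawing.
    h, w = img.shape
    side = max(h, w)
    canvas = np.full((side, side), 255, dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = img

    # Nearest-neighbour resize to the target resolution.
    idx = (np.arange(size) * side / size).astype(int)
    return canvas[np.ix_(idx, idx)]
```

In practice the bounding box would come from the detected contour rather than a raw pixel threshold, but the padding and resizing logic is the same.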
Statistical testing
The latent features developed by the RF-VAE were tested for statistical differences between dementia and non-dementia cohorts using two-tailed Student’s t-tests with correction for multiple comparisons by the Benjamini–Hochberg method41 at FDR = 0.01. The confounding effects of age and education were removed by propensity score matching using the open-source Python library PsmPy42. This yielded a propensity-score-matched cohort of 110 dementia clocks and 220 non-dementia clocks. Significance values shown in Fig. 3A were based on adjusted p-values estimated on this propensity-matched cohort, as shown in Supplementary Table 1. Correlations between the variables were calculated using Pearson’s product-moment correlation coefficient. Thereafter, the correlation matrix was thresholded at 0.2 and −0.2, as these values represented the 5th and 95th percentiles of the non-parametric distribution of correlation values. The thresholded binary matrix was used as an adjacency matrix to generate a cross-correlation graph between the latent variables.
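The two statistical steps described above can be sketched as follows. This is a minimal illustration of the standard Benjamini–Hochberg step-up procedure and of thresholding a Pearson correlation matrix into an adjacency matrix; the authors may have used library routines (e.g., from statsmodels) rather than code like this, and the matching step via PsmPy is omitted.

```python
import numpy as np

def benjamini_hochberg(pvals: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (step-up FDR procedure)."""
    m = len(pvals)
    order = np.argsort(pvals)
    ranked = pvals[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, cap at 1.
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1].clip(max=1.0)
    out = np.empty(m)
    out[order] = adjusted
    return out

def correlation_adjacency(X: np.ndarray, thresh: float = 0.2) -> np.ndarray:
    """Binary adjacency matrix from Pearson correlations of the columns of X."""
    r = np.corrcoef(X, rowvar=False)
    adj = (np.abs(r) >= thresh).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops in the cross-correlation graph
    return adj
```

Latent variables whose adjusted p-value falls below the FDR threshold are flagged as significant, and each 1 in the adjacency matrix becomes an edge in the cross-correlation graph.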
Models and experimental setup
A variational autoencoder (VAE) is a generative model that learns a lower-dimensional representation of input data in the form of the mean and standard deviation of a Gaussian distribution, from which it samples to reconstruct the input. The non-linear decoder network compensates for the loss of generality caused by the normal prior distribution. One disadvantage of the VAE latent distribution is its lack of disentanglement, i.e., each latent variable is not exclusively responsible for the variation of a unique aspect of the input data. In this paper, we used an existing implementation of a VAE-based deep autoencoder model that can learn all meaningful sources of variation in clock drawings in its disentangled latent representation. This model, called RF-VAE, penalizes the total correlation (TC) of the latent space to improve disentanglement of relevant sources of variation while tolerating significant KL divergences from nuisance prior distributions; factors with low divergence from these nuisance priors are identified as “nuisance sources of variation”. In this way, it can learn “all meaningful sources of variation” in its latent space.
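For concreteness, the total correlation term has the standard definition below, stated over the aggregate posterior $q(z)$ of the $d$ latent factors. This is a sketch of the generic TC penalty; the exact RF-VAE objective additionally distinguishes relevant from nuisance factors via their priors, as described above.

```latex
% Total correlation of the aggregate posterior q(z) over d latent factors;
% it is zero if and only if the factors are mutually independent.
\mathrm{TC}(z) \;=\; D_{\mathrm{KL}}\!\left( q(z) \,\middle\|\, \prod_{j=1}^{d} q(z_j) \right)
```

Minimizing this divergence pushes the joint latent distribution toward the product of its marginals, which is what makes each latent dimension responsible for a distinct source of variation.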
The preprocessed clock images were fed to the RF-VAE network with a latent dimension of 10. The RF-VAE network was trained for 1400 epochs at a learning rate of 10⁻⁴ with a batch size of 64, following recommendations in the source articles25,43. The reconstruction loss was cross-entropy, and the optimizer was Adam44. RF-VAE training took 3.5 h on a GeForce GP102 Titan X GPU from NVIDIA Corporation. The trained latent space of the RF-VAE was fed to a fully connected feed-forward neural network with two hidden layers, with seven neurons in the first hidden layer and four in the second. Using the Adam optimizer, the classifier was trained on the fine-tuning dataset for 20 epochs with a batch size of 32 and a learning rate of 0.0075. The classification loss was binary cross-entropy. A 3.125:1 weight was assigned to the dementia class during training to ameliorate the class imbalance in the fine-tuning dataset. All hyper-parameters were selected on the fine-tuning dataset inside a fivefold cross-validation design by maximizing the model’s average fold AUC. Figure 6 shows the network architecture and our method’s conceptual workflow. The top portion of each panel shows the training process of the RF-VAE; the bottom portion shows how the trained encoder weights of the RF-VAE support a task-specific classifier. The performance of this trained classifier was tested on the test data, and several performance metrics, namely AUC, accuracy, sensitivity, specificity, precision, and negative predictive value (NPV), were reported. The test data were bootstrapped 100 times using random sampling with replacement to create confidence intervals, and the median, 2.5th percentile, and 97.5th percentile of each metric over the bootstrapped test datasets were reported.
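The bootstrap step can be sketched as follows. This is a generic resampling routine under the stated settings (100 resamples with replacement, median with 2.5th/97.5th percentile bounds); `metric_fn` is a placeholder for any of the scalar metrics listed above, and the fixed seed is only for reproducibility of the sketch.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric_fn, n_boot=100, seed=0):
    """Median and 2.5th/97.5th percentiles of a metric over bootstrap resamples.

    `metric_fn(y_true, y_pred) -> float` is any scalar metric
    (accuracy, sensitivity, specificity, ...).
    """
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample indices with replacement
        scores.append(metric_fn(y_true[idx], y_pred[idx]))
    lo, med, hi = np.percentile(scores, [2.5, 50, 97.5])
    return med, (lo, hi)
```

Applied to the 115 test clocks, this yields one median and interval per metric.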
Conceptual workflow of the proposed method. (A) High-level conceptual diagram showing the training, validation, and testing procedures. The RF-VAE undergoes unsupervised training with 23,521 unlabeled clock drawings. Subsequently, the trained RF-VAE encoder is transferred to a “fine-tuning” stage, where a fully connected neural network is optimized using 84 dementia and 263 normal clocks. Finally, the pre-trained encoder and the fine-tuned classifier are tested on 28 dementia and 87 normal clocks. (B) Detailed workflow showing the different loss functions minimized during training and classification. In the training stage, a ten-dimensional RF-VAE latent space is constructed by minimizing the loss between original and reconstructed clock drawings and minimizing the total correlation between latent dimensions to disentangle them. Feature relevance is ensured in the latent space by eliminating latent variables that do not diverge significantly from previously defined prior distributions. In the classification stage, the trained encoder is fine-tuned jointly with a fully connected neural network to classify dementia versus non-dementia clocks. Furthermore, age, sex, race, and years of education are added to the latent dimensions to train another classifier with higher performance.
We evaluated the performance gain of the classifier upon the addition of participants’ age, sex, race, and years of education to the model. The best-performing classifier consisted of three hidden layers, with ten input neurons, 512 neurons in the first hidden layer, 256 in the second, and 128 in the third. It was trained for 20 epochs on the fine-tuning data with a batch size of 8 at a learning rate of 0.0075. All hyper-parameters were selected on the fine-tuning dataset inside a fivefold cross-validation design by maximizing the model’s average fold AUC. Figure 6 illustrates the different steps in the workflow.
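The forward pass of a fully connected classifier with the layer widths reported above can be sketched as follows. Only the layer sizes come from the text: the ReLU hidden activations, sigmoid output, and random stand-in weights are assumptions for illustration, not the trained parameters.

```python
import numpy as np

def mlp_forward(x, sizes=(10, 512, 256, 128, 1), seed=0):
    """Forward pass through fully connected layers of the reported widths.

    ReLU hidden activations and a sigmoid output head are assumed;
    weights are random placeholders for trained parameters.
    """
    rng = np.random.default_rng(seed)
    h = x
    for i, (fan_in, fan_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), (fan_in, fan_out))
        b = np.zeros(fan_out)
        h = h @ W + b
        if i < len(sizes) - 2:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return 1.0 / (1.0 + np.exp(-h))  # probability of the dementia class
```

In a framework such as PyTorch or TensorFlow the same stack would be expressed as a sequence of dense layers and trained with the weighted binary cross-entropy described earlier.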