These actigraphy data were collected as part of the Wellness Monitoring for Major Depressive Disorder study (Wellness Monitoring Study), a longitudinal observational study conducted by the Canadian Biomarker Integration Network in Depression (CAN-BIND) that aimed to identify predictive biomarkers of relapse of major depressive disorder (MDD) (ClinicalTrials.gov Identifier: NCT02934334). The Wellness Monitoring Study used ambulatory monitoring to establish which variables can act as “warning signals” prior to a relapse of MDD. Several symptom domains were evaluated, including mood and anxiety symptoms, sleep, activity, biological rhythms, anhedonia, pain, quality of life, treatment compliance-related variables, and speech and voice characteristics. The domains were assessed through different methods, including self-report questionnaires, clinician-rated assessments, audio recording of voice, and objective monitoring of activity, sleep and biological rhythms with actigraphy.
Participants were enrolled into the study if they had a diagnosis of MDD, had responded to treatment for their most recent major depressive episode, and had a current MADRS score < 14 at the screening and baseline visits, resulting in a total of 101 participants who completed a baseline visit. Following written informed consent, participants received a study-specific smartphone (LogPad®, ERT, Clario [formerly, PHT]) and a wrist-worn actigraph, which were used for the duration of the study. Further information about the study sample is provided in the Supplementary Materials, including Supplementary Figure 1, which describes participants in the Wellness Monitoring Study.
Participants completed a screening visit, a baseline visit within 2 weeks of screening, and an observational phase of at least one year (early withdrawal was allowed). Most participants completed the screening and baseline visits on the same day. During the observational phase of the study, participants completed in-person assessments every 8 weeks in addition to continuous ambulatory monitoring. Participants were enrolled on a rolling basis and therefore had follow-up periods of variable length, with a target duration of at least 1 year after the last participant was enrolled.
At baseline, and subsequent 8-weekly follow-up visits, participants were assessed through an on-site electronic data collection device (the SitePad®) which recorded measures of depressive symptom severity, healthcare service use, and symptom severity. Additionally, participants completed self-report questionnaires through the Brain-CODE REDCap interface and provided blood samples, as well as a series of weekly self-reports, and biweekly speech and voice characteristics through the LogPad® device. Further information about the study inclusion/exclusion criteria, treatment and relapse is provided in the supplementary material. All procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. Study procedures were approved by local research ethics boards and all participants provided informed consent before study entry.
Data acquisition: raw actigraphy data
The ActiGraph GT9X-BT Link® (ActiGraph, Pensacola, Florida, USA) device was used to collect sleep, activity and biological rhythms parameters throughout the observational phase of the study. Study coordinators uploaded the data to the CentrePoint Study Admin System (http://www.actigraphcorp.com/product-category/study-admin/) and monitored adherence during in-person visits. CentrePoint is a cloud-based technology platform developed by ActiGraph, which preserved data integrity, as well as network security, availability, and standards compliance. The GT9X Link contains a capacitive touch wear sensor18.
Participants were instructed to wear the GT9X Link® device 24 h per day for the entire duration of the study, and received a charging dock and USB cable to charge the device from home. Data were collected at 30 Hz on the non-dominant wrist. At each in-person visit, data were extracted to the CentrePoint system by study coordinators. Data from the CentrePoint system were transferred to OBI’s Brain-CODE platform at the completion of the study. Data were first extracted as raw .gt3x files, at intervals corresponding to occasions on which data were uploaded. Data were additionally aggregated into minute-by-minute epochs, as one .csv file for each participant, and were initially sleep scored using the Cole-Kripke algorithm19 (Fig. 1: Raw Actigraphy Data).
Raw actigraphy data provided information about the direction and orientation of the actigraph, while count data only provided information about the amount of movement. Count data aggregated by epoch are traditionally used as the basis for calculating sleep and energy expenditure parameters20, as well as non-wear, while more recent actigraphy processing methods use raw data16,21.
Data processing and analysis
Figure 1 shows a summary of the automated data pre-processing pipeline, as executed in R Statistical Software (v 4.0). As part of this pre-processing pipeline, we assessed data missingness and scored sleep and wake for minute-by-minute epochs using the Cole-Kripke19 and Tudor-Locke22 algorithms. Next, we tested the accuracy of four methods of non-wear detection: (1) the built-in wear sensor available in this actigraph model; (2) the Choi14 and (3) Troiano15 algorithms, applied to the minute-by-minute epoch data; and (4) the van Hees algorithm23, applied to the raw 30 Hz actigraphy data. From these four methods, we created a new non-wear scoring algorithm (the majority algorithm), and conducted visual quality control of this majority algorithm (see “Non-wear detection” section below). Next, we combined the sleep intervals with non-wear intervals, and conducted sensitivity analyses to assess the influence of valid day selection and percentage of overlap between non-wear and sleep on the relationship between sleep variables and the main outcome measure of this study, the Montgomery-Åsberg Depression Rating Scale (MADRS)24, which was collected at each in-person visit.
An important step in data pre-processing is to trim the data so that only data that will be used for analysis are retained. For instance, in case of withdrawal from the study, participants may have worn the actigraph (or the actigraph may have collected data) until it was returned to the lab, at a later date than the official withdrawal date from the study. Additionally, researchers may only be interested in analyzing a specific portion of the collected data, in which case data trimming is also necessary. In the Wellness Monitoring Study, data were trimmed to 1 year of collection, and data that extended beyond the participant’s enrolment period, or that were collected prior to enrolment because of a configuration error, were trimmed based on study enrolment dates. Duplicate rows were removed (Fig. 1: Data Trimming).
It is important to ensure that data for all dates were accounted for, including periods of missing data, if such paradata were to be recorded or reported. Paradata refers to administrative data that were obtained during the process of collection, management and treatment of actigraphy data25. If a participant was asked to wear multiple actigraph devices throughout the duration of the study, the periods of overlap must be correctly accounted for, and the correct data interval should be used. We maintained accurate paradata of the rows that were removed, and the number of missing minutes per day, per participant, which will be stored and made available with the pre-processed data.
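The trimming and paradata steps described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R pipeline. The function name `trim_and_audit`, the `(timestamp, counts)` row layout, the use of repeated timestamps to identify duplicate rows, and the assumption that enrolment starts at midnight are all hypothetical choices made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta

MINUTES_PER_DAY = 24 * 60


def trim_and_audit(rows, enrolment, follow_up_days=365):
    """Trim minute epochs to the study window and record paradata.

    rows: list of (timestamp, counts) tuples, one per minute epoch.
    enrolment: datetime at midnight on the enrolment date (assumption).
    Returns (kept_rows, n_removed, missing_minutes_per_day).
    """
    end = enrolment + timedelta(days=follow_up_days)
    seen = set()
    kept = []
    for ts, counts in sorted(rows, key=lambda row: row[0]):
        # Drop epochs outside the study window and repeated timestamps.
        if ts < enrolment or ts >= end or ts in seen:
            continue
        seen.add(ts)
        kept.append((ts, counts))

    # Paradata: how many rows were removed by trimming and de-duplication.
    n_removed = len(rows) - len(kept)

    # Paradata: missing minutes for every calendar day in the window,
    # including days with no data at all.
    observed = Counter(ts.date() for ts, _ in kept)
    missing = {}
    for offset in range(follow_up_days):
        day = (enrolment + timedelta(days=offset)).date()
        missing[day] = MINUTES_PER_DAY - observed.get(day, 0)
    return kept, n_removed, missing


# Hypothetical example: one complete day, one duplicate, one early epoch.
enrol = datetime(2020, 1, 1)
rows = [(enrol + timedelta(minutes=i), 0) for i in range(MINUTES_PER_DAY)]
rows.append(rows[0])                            # duplicate row
rows.append((enrol - timedelta(minutes=1), 3))  # recorded before enrolment
kept, n_removed, missing = trim_and_audit(rows, enrol, follow_up_days=2)
```

Iterating over every day of the window, rather than only over days that contain data, is what keeps fully missing days visible in the paradata.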
Minute-by-minute epoch data were scored for sleep and wake using the Cole-Kripke and Tudor-Locke algorithms deployed in the actigraph.sleepr package (https://github.com/dipetkov/actigraph.sleepr), which is an open-source implementation of the ActiLife software’s sleep and non-wear detection algorithms (Fig. 1: Sleep/Wake Scoring: Cole-Kripke and Tudor Locke Algorithms). This analysis yielded epoch-level sleep/wake scores and sleep intervals. Sleep intervals were characterized by the following variables: sleep maintenance efficiency (SE, %), sleep duration (mins), activity counts, non-zero epochs, total sleep time (TST, mins), wake after sleep onset (WASO, mins), number of awakenings, movement index, fragmentation index, sleep fragmentation index, sleep onset time (HH:MM:SS), time out of bed (HH:MM:SS), number of one minute sleep intervals, mean mid sleep time ([time out of bed – sleep onset time]/2), average awakening (mins). The fragmentation index is the percentage of sleep periods lasting 1 min relative to the total number of sleep periods within the sleep interval. The movement index is the percentage of epochs within the sleep interval in which y-axis counts were larger than zero. The sleep fragmentation index is the sum of the movement index and the fragmentation index26.
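As an illustration of the three fragmentation measures defined above, the following Python sketch computes them for a single sleep interval. It is not the actigraph.sleepr implementation. The function name, the `"S"`/`"W"` state coding and the input layout are assumptions made for the example.

```python
from itertools import groupby


def sleep_fragmentation(states, y_counts):
    """Return (movement index, fragmentation index, sleep fragmentation index).

    states: per-minute sleep/wake scores within one sleep interval,
            coded "S" (sleep) or "W" (wake) -- hypothetical coding.
    y_counts: per-minute y-axis activity counts for the same epochs.
    """
    if not states or len(states) != len(y_counts):
        raise ValueError("states and y_counts must be non-empty and aligned")

    # Movement index: percentage of epochs with y-axis counts above zero.
    movement_index = 100.0 * sum(c > 0 for c in y_counts) / len(y_counts)

    # Lengths (in minutes) of each run of consecutive sleep epochs.
    bouts = [
        len(list(run))
        for is_sleep, run in groupby(states, key=lambda s: s == "S")
        if is_sleep
    ]

    # Fragmentation index: percentage of sleep bouts that last 1 minute.
    if bouts:
        fragmentation_index = 100.0 * sum(b == 1 for b in bouts) / len(bouts)
    else:
        fragmentation_index = 0.0

    return (
        movement_index,
        fragmentation_index,
        movement_index + fragmentation_index,
    )


# Hypothetical 10-minute interval: four sleep bouts, one lasting 1 minute.
example = sleep_fragmentation(
    list("SSWSWSSWSS"), [0, 0, 12, 0, 7, 0, 0, 3, 0, 0]
)
```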
In the Wellness Monitoring Study, we used the wear sensor embedded in the ActiGraph GT9X Link, in addition to the Troiano, Choi and van Hees algorithms, to detect non-wear. The Troiano and Choi algorithms were chosen due to their wide use, ease of implementation, and availability through the ActiLife software. The van Hees algorithm was chosen due to its superior performance in Syed and colleagues’ study27 and its ease of implementation. The Troiano and Choi algorithms use epoch-aggregated count data14,15. The Troiano algorithm15 defines non-wear intervals as 60 or more consecutive minute epochs with no activity, allowing for 1 or 2 min with counts between 0 and 100. Since this algorithm is prone to classifying sedentary activity as non-wear time, Choi and colleagues14 proposed a modified algorithm in which non-wear is classified as intervals of at least 90 min of consecutive minute epochs with no activity. Intervals of 1 or 2 min with non-zero counts do not change this classification if there is no activity in the 30 min before or after that interval. Newer approaches, such as the van Hees algorithm, use raw data16. In this algorithm, a period is deemed to be non-wear when the standard deviation of the acceleration signal is lower than 3.0 mg (1 mg = 0.00981 m/s²) or the value range is lower than 50 mg for at least 2 of 3 axes in a given 30-min period16,23. These approaches are useful for detecting longer periods of non-wear; however, shorter periods of non-wear (e.g., taking the actigraph off for showers) will not be detected.
The capacitive sensor on the ActiGraph GT9X Link provided epoch-aggregated non-wear detection at the minute level. The capacitive sensor consists of a metallic plate. Based on the concept of capacitive coupling, the sensor charges more quickly when it is in closer proximity to the body. The sensor measures the amount of time that the capacitor takes to charge, and therefore allows estimation of non-wear28. The Troiano15 and Choi14 algorithms were used to score the activity (motion) data from csv files containing minute-by-minute data using the actigraph.sleepr package (Fig. 1: Non-wear Scoring: Choi and Troiano Algorithms). Additionally, non-wear scoring was performed on the raw data gt3x files using the van Hees algorithm through the GGIR package23. While using this package, we specified a 5 s window for calculating acceleration and angle, 900 s for the epoch length to calculate non-wear and signal clipping, and 3600 s for the window of wear detection (Fig. 1: Non-wear Scoring: Van Hees Algorithm). Agreement between algorithms during each epoch was evaluated through minute-by-minute overlap of non-wear detected by the different algorithms and the wear sensor. Additional information about data processing is provided in the Supplement.
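The raw-data criterion described above (standard deviation below 3.0 mg or value range below 50 mg on at least 2 of 3 axes) can be sketched for a single window as follows. This Python sketch follows the wording of the text only. It is not the GGIR implementation, and the function name, the use of the sample standard deviation and the input layout (acceleration in units of g) are assumptions.

```python
from statistics import stdev

SD_THRESHOLD_G = 0.003     # 3.0 mg expressed in g
RANGE_THRESHOLD_G = 0.050  # 50 mg expressed in g


def window_is_nonwear(x, y, z):
    """Classify one window of raw acceleration (in g) as non-wear.

    x, y, z: sequences of samples for the three axes within one window
             (a 30-min period in the text). Each needs at least 2 samples.
    """
    quiet_axes = 0
    for axis in (x, y, z):
        low_sd = stdev(axis) < SD_THRESHOLD_G
        low_range = (max(axis) - min(axis)) < RANGE_THRESHOLD_G
        # An axis counts as "quiet" if either criterion is met.
        if low_sd or low_range:
            quiet_axes += 1
    # Non-wear requires at least 2 of the 3 axes to be quiet.
    return quiet_axes >= 2


# Hypothetical windows: a device lying still versus a device being moved.
still = [0.0] * 50
gravity = [1.0] * 50
moving = [0.5 if i % 2 else -0.5 for i in range(50)]
lying_on_desk = window_is_nonwear(still, moving, gravity)  # two quiet axes
being_worn = window_is_nonwear(moving, moving, gravity)    # one quiet axis
```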
Development of a novel non-wear algorithm: the majority algorithm
A novel non-wear algorithm, the majority algorithm, was developed by calculating the percentage of overlap between the wear sensor and the Troiano, Choi and van Hees algorithms in each minute epoch (Fig. 1: Non-wear Scoring: Development of the Majority Algorithm). If 3 or 4 of the 4 methods of detection indicated that a minute epoch should be classified as non-wear, this minute epoch was classified as non-wear. As the Choi algorithm is an updated version of the Troiano algorithm, we compared the performance of a 4-method version of the majority algorithm (which combined the wear sensor, Troiano, Choi and van Hees algorithms) to a 3-method version of the majority algorithm (which only used the wear sensor, Choi and van Hees algorithms). For the 3-method version, if 2 or 3 of the 3 methods of detection indicated that a minute epoch should be classified as non-wear, this minute epoch was classified as non-wear. To validate the use of this algorithm, we performed visual quality control to evaluate the performance of the majority algorithm in a subset of participants. We selected a majority of these participants based on their relapse status, as this was the major outcome in the Wellness Monitoring Study (see Supplementary Material). Each participant file was reviewed day-by-day, and false non-wear detection was identified by one or two trained independent scorers (see Supplementary Material for further details). Accuracy, positive predictive value, sensitivity and specificity statistics were calculated for epoch-level data for each of the 5 algorithms (Choi, Troiano, van Hees, majority (4), and majority (3)) and the wear sensor, as compared to visual quality control at the day level. As the data files of 6 participants were scored by 2 scorers, we averaged the accuracy, positive predictive value, sensitivity and specificity statistics across the two scorers for these participants.
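The voting rule and the performance statistics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code. The function names, the dictionary layout and the example flags are hypothetical. The vote thresholds (3 of 4, or 2 of 3) are taken from the text.

```python
def majority_nonwear(flags_by_method, min_votes):
    """Per-minute majority vote across non-wear detection methods.

    flags_by_method: dict mapping a method name to a list of booleans,
                     one per minute epoch (True = non-wear).
    min_votes: number of methods that must agree (3 of 4, or 2 of 3).
    """
    series = list(flags_by_method.values())
    n_epochs = len(series[0])
    if any(len(s) != n_epochs for s in series):
        raise ValueError("all methods must cover the same epochs")
    return [sum(votes) >= min_votes for votes in zip(*series)]


def performance(predicted, reference):
    """Accuracy, PPV, sensitivity and specificity against a reference."""
    pairs = list(zip(predicted, reference))
    tp = sum(p and r for p, r in pairs)
    tn = sum((not p) and (not r) for p, r in pairs)
    fp = sum(p and (not r) for p, r in pairs)
    fn = sum((not p) and r for p, r in pairs)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are returned as None.
        return num / den if den else None

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Hypothetical flags for five minute epochs.
flags = {
    "wear_sensor": [True, True, False, False, True],
    "troiano": [True, False, False, True, True],
    "choi": [True, True, False, False, False],
    "van_hees": [False, True, False, True, True],
}
majority4 = majority_nonwear(flags, min_votes=3)
majority3 = majority_nonwear(
    {k: v for k, v in flags.items() if k != "troiano"}, min_votes=2
)
```

With an even number of methods a 2-2 split is possible; requiring 3 of 4 votes, as the text specifies, resolves such ties in favour of wear.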
To test the difference in performance of the algorithms, we fitted mixed linear models, with day-level performance statistics as dependent variables and algorithm*day as the independent variables using the lme4 package. We compared the performance of the different algorithms using estimated marginal means of the models, with a Tukey correction for multiple comparisons using the emmeans package. Inter-rater reliability (Cohen’s kappa) was calculated.
Addressing data missingness
Some analytic procedures require complete data. Data missingness can be classified into three types. Data are missing completely at random (MCAR) when missingness is independent of both observed and unobserved data; this type of missingness does not cause bias, although it increases standard errors. Data are missing at random (MAR) when the mechanism of missingness depends on the observed data. When the mechanism of missingness depends on the missing values themselves, the data are not missing at random (NMAR)29.
It is plausible that participants’ non-wear may correspond with periods of relapse of depression, which is the key outcome measured in the Wellness Monitoring Study, indicating that these data are likely not MAR or MCAR. Additionally, summary statistics regarding non-wear can be used in modeling outcomes during the analysis stage. Therefore, we intend to use missing data as part of our modelling approach, where variables describing non-wear and missingness will be included in predictive models for mental health outcomes.
At the epoch level, we used the average day imputation method, in which missing data are imputed with the average of the values collected during the same time of day on other days (for instance, if data are missing from 7:00 to 7:15, the algorithm averages the data that were collected between 7:00 and 7:15 on the surrounding days)30. To perform this average day imputation, we used a window of 7 days (i.e., 3 days prior to and 3 days following the day with missing data). We did not impute full days of data; only days with partially missing data were imputed. In this study, data could have been missing as a result of non-wear (based on the majority (3) algorithm) or as a result of data not being collected for the period (Fig. 1: Addressing Data Missingness).
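A minimal Python sketch of average day imputation with a 7-day window is given below. It is not the authors' implementation. The function name, the list-of-lists layout (one list of minute values per day, with `None` marking missing minutes) and the choice to average only originally observed values are assumptions made for illustration.

```python
def impute_average_day(days, half_window=3):
    """Fill partial-day gaps with the same-minute mean of nearby days.

    days: chronological list of days; each day is a list of per-minute
          values of equal length, with None marking a missing minute.
    half_window: days on each side of the target day (3 gives 7 days).
    Returns a new list; the input is left unchanged.
    """
    imputed = [list(day) for day in days]
    for d, day in enumerate(days):
        # Fully missing days are not imputed.
        if all(value is None for value in day):
            continue
        first = max(0, d - half_window)
        last = min(len(days), d + half_window + 1)
        for minute, value in enumerate(day):
            if value is not None:
                continue
            # Donors: observed values at the same minute of day within
            # the window (the target day itself is missing, so it is
            # excluded automatically).
            donors = [
                days[k][minute]
                for k in range(first, last)
                if days[k][minute] is not None
            ]
            if donors:
                imputed[d][minute] = sum(donors) / len(donors)
    return imputed


# Hypothetical toy data with 4 "minutes" per day for readability.
toy = [
    [10, 20, 30, 40],
    [20, None, 30, 40],        # partial gap: imputed
    [None, None, None, None],  # full missing day: left as is
    [30, 40, None, 40],        # partial gap: imputed
]
filled = impute_average_day(toy)
```

Drawing donors from the original data rather than from already imputed values avoids propagating imputed estimates from one day into the next.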
Spearman correlations were applied to assess the relationship between depressive symptoms according to the MADRS and data missingness or non-wear patterns. As the data for sleep and depressive symptoms were assessed at different frequencies, we aggregated these data by creating an average of each sleep variable.
Many studies in the actigraphy literature use filtering approaches, in which a day is only considered valid if the actigraph was worn for at least a certain number of hours31. This threshold has not been standardized, though the most commonly used threshold for a valid day is 10 h or more of available data31. A sensitivity analysis was conducted to test the influence of non-wear on the relationship between sleep and MADRS scores, the main symptom outcome measure in this study. This sensitivity analysis consisted of two components: (1) the number of valid hours of data required for a day to be considered valid and (2) the overlap of the sleep interval with non-wear. We examined how these components influenced the relationship between sleep variables and depressive symptoms (Fig. 1: Sensitivity Analyses).
First, the sensitivity analysis applied hourly thresholds, from > 6 up to 24 valid hours per day, that a day had to meet for the relevant sleep interval to be included; the analysis was also run on all collected data. The second component of the analysis applied several thresholds for excluding sleep intervals based on their overlap with non-wear. Overlap of sleep with non-wear was calculated for each sleep interval, first by counting the number of non-wear minutes in the sleep interval, and subsequently by expressing this count as a percentage of the duration of the sleep interval. Thresholds were tested in 10% steps, ranging from < 10% overlap up to 100% overlap. In each iteration, sleep intervals exceeding the given threshold (e.g., > 80% overlap) were excluded from analysis. Since MADRS scores were obtained every 8 weeks for the duration of the study, and at each relapse verification visit, we averaged sleep values across each 8-week period. For each combination of thresholds, we fitted a mixed linear model of MADRS score with the following standardized variables as fixed effects: sleep variables (SE, duration, activity counts, non-zero epochs, TST, number of awakenings, movement index, fragmentation index, sleep onset time, out of bed time, number of one minute sleep intervals, average awakenings), time since study enrolment and number of missing or non-wear minutes. Participant ID was included as a random intercept. We evaluated 190 combinations of overlap threshold and valid day selection and, based on the marginal R² statistic32, chose the threshold combination with the lowest value.
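The overlap calculation and the threshold grid can be sketched in Python as follows. The exact threshold levels are not listed in the text. The grid below (valid-hour thresholds of 7 through 24 h plus an "all data" setting, crossed with overlap thresholds of 10% through 100%) is an assumption chosen because it reproduces the reported 190 combinations. The function and variable names are hypothetical.

```python
from itertools import product


def nonwear_overlap_pct(sleep_start, sleep_end, nonwear_flags):
    """Percentage of a sleep interval flagged as non-wear.

    sleep_start, sleep_end: minute indices, end exclusive.
    nonwear_flags: per-minute booleans (True = non-wear).
    """
    duration = sleep_end - sleep_start
    if duration <= 0:
        raise ValueError("sleep interval must have positive duration")
    nonwear_minutes = sum(nonwear_flags[sleep_start:sleep_end])
    return 100.0 * nonwear_minutes / duration


def keep_interval(overlap_pct, max_overlap_pct):
    """Exclude sleep intervals whose overlap exceeds the threshold."""
    return overlap_pct <= max_overlap_pct


# Assumed grid: None stands for "all collected data" (no valid-day filter).
valid_hour_thresholds = [None] + list(range(7, 25))  # 19 settings
overlap_thresholds = list(range(10, 101, 10))        # 10 settings
grid = list(product(valid_hour_thresholds, overlap_thresholds))

# Hypothetical 20-minute record with non-wear flagged in minutes 4 to 7.
flags = [False] * 4 + [True] * 4 + [False] * 12
overlap = nonwear_overlap_pct(2, 12, flags)
```

Under these assumptions the grid has 19 × 10 = 190 entries, and one mixed model would be fitted per entry.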
All analyses were implemented in R statistical software (v. 4.0).