With this work, we present a novel deep learning-based algorithm for GA segmentation and quantification on OCT with high and consistent performance. The algorithm was able to accurately segment GA areas on OCT with a mean DSC of 0.86 in the internal test set and a mean DSC of 0.91 in the external test set. Notably, the algorithm's DSC matched the inter-grader agreement of manual segmentation by human expert graders, which requires extensive manual effort.
The DSC of the GA growth area between baseline and month 12 was close to 0.5 and therefore lower than that of the segmentation of the total GA area. As the GA growth area represents a small region in most cases, especially compared with the total GA area, every wrongly segmented pixel, whether segmented manually or automatically, has a large impact on the DSC, generally lowering it. As a complementary performance metric we calculated the HD95, the 95th percentile of the distances between the 2D en-face contours of the reference and the segmented region. The mean HD95 for the GA growth area was 0.40 mm, meaning that 95% of the contour distances were below 0.40 mm. This was within the range of the mean HD95 of the external validation set for the total GA area (0.38 mm), which indicates the quality of the predicted lesion localization25. Additionally, we compared manually and automatically segmented growth rates across treatment groups. No statistically significant difference was detected between the automated and the manually segmented growth rates at month 12, with a correlation coefficient of 0.81, which is a clinically relevant finding with regard to providing reliable automated GA monitoring under therapy.
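For illustration, the two evaluation metrics can be expressed in a few lines of code. The following Python sketch is not the evaluation code used in this study; the mask layout and the default pixel spacing are assumptions. It computes the DSC and the HD95 between two binary en-face masks, where the HD95 is the 95th percentile of the boundary-to-boundary distances in millimetres.

```python
import numpy as np
from scipy import ndimage

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient (DSC) between two binary en-face masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return float(2.0 * np.logical_and(a, b).sum() / denom) if denom else 1.0

def hd95(a: np.ndarray, b: np.ndarray, spacing_mm=(0.12, 0.012)) -> float:
    """95th percentile of symmetric boundary-to-boundary distances in mm.

    spacing_mm is an assumed en-face pixel spacing (B-scan spacing,
    A-scan spacing); both masks are assumed to be non-empty.
    """
    def boundary(m):
        return m & ~ndimage.binary_erosion(m)

    a, b = a.astype(bool), b.astype(bool)
    ba, bb = boundary(a), boundary(b)
    # Distance of every pixel to the nearest boundary pixel of the other mask.
    dist_to_b = ndimage.distance_transform_edt(~bb, sampling=spacing_mm)
    dist_to_a = ndimage.distance_transform_edt(~ba, sampling=spacing_mm)
    # Pool the directed distances in both directions before the percentile.
    dists = np.concatenate([dist_to_b[ba], dist_to_a[bb]])
    return float(np.percentile(dists, 95))
```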
Precise morphological monitoring of disease activity and therapeutic response is crucial in GA. Functional parameters such as best corrected visual acuity (BCVA), which are used to evaluate disease progression and therapeutic response in other retinal diseases, do not reflect disease progression in GA and are thus unsuitable for disease monitoring. BCVA can be preserved until an advanced stage of the disease, when the foveal tissue becomes affected by the atrophic process14. Currently, slowing the anatomical growth rate of GA is an accepted clinical trial endpoint5. This underlines the importance of precise and objective image analysis methods to measure morphologic parameters in GA patients, especially since a novel treatment has now been approved by regulatory authorities. AI-based image analysis tools are urgently needed to support physicians in clinical practice in handling the large amount of data, and particularly the subclinical hallmarks at the population level, generated by the large number of patients affected by GA who may benefit from regular monitoring. In particular, regulatory authorities such as the U.S. Food and Drug Administration (FDA) have documented their interest in considering anatomical endpoints when functional parameters are too variable26. Regarding OCT as a modality, it was noted that a more automated measurement is more likely to provide better accuracy26. A first step towards this goal is the automated segmentation of the RPE loss area on OCT.
To this end, we developed an automated algorithm for RPE loss segmentation on OCT, whose technical framework has been introduced previously20. To the best of our knowledge, our study is the first to evaluate AI-based longitudinal assessment of GA growth under therapy. Our algorithm was evaluated for GA progression monitoring on OCT and, furthermore, compared to the results of a prospective phase 2 clinical trial of an investigational complement inhibition therapy for GA. There was a slight difference in performance between the internal and external test sets (mean DSC 0.86 in the internal test set vs. 0.91 in the external test set). Of note, the internal and external test sets derive from two completely independent datasets. One explanation for this difference could be the clinical study setting of the external test set (FILLY trial), with standardized, good-quality images, versus the real-world cohort setting of the internal test set. Moreover, OCT annotations derived from FAF images were used as training data and for the internal validation, whereas manual annotations on OCT were used for the external validation. A slight misregistration between the OCT and FAF images could also contribute to the difference in performance. We were able to show that the difference in growth rates between the treatment groups of the trial, measured semi-automatically on FAF, matched the results of our deep learning-based algorithm using OCT scans from a subset of patients. Thus, we can conclude that FAF and OCT, as well as manual and automated segmentation on OCT, resulted in the same clinical trial outcome. Notably, the algorithm performed within the inter-grader variability between two experts. Therefore, we believe that the performance of the algorithm has high validity for possible clinical use.
Different approaches for automated GA segmentation on various imaging modalities have been proposed previously27. Most studies focused on GA segmentation on 2D FAF, with a DSC ranging from 0.83 to 0.8928,29. As SD-OCT has become the standard imaging method for AMD in the clinical setting7, more studies have now focused on GA segmentation on OCT30,31,32, with a reported DSC range of 0.81–0.8727.
In contrast to our algorithm, which was trained on the pathology itself, namely the annotated RPE loss, previous algorithms mainly relied on choroidal hypertransmission, a secondary consequence of overlying tissue loss, to detect GA on OCT33,34. By processing an OCT projection image obtained from the region between the RPE/Bruch's membrane and the choroid, those methods achieved an average DSC of 0.8733 and 0.8134, respectively. However, choroidal hypertransmission alone is not sufficient to identify GA on OCT, as it is subject to great inter-individual variability and depends on image quality as well as on the overall signal level of the OCT volume35. The inhomogeneity of the hypertransmission signal, also referred to as a bar-code pattern, makes it difficult to consistently quantify small changes in cellular loss.
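To illustrate this family of approaches, the following minimal Python sketch computes such a sub-RPE projection image from a slab beneath a given RPE/Bruch's membrane segmentation; it is not a re-implementation of the cited methods33,34, and the array layout, function name and slab thickness are assumptions.

```python
import numpy as np

def subrpe_projection(volume: np.ndarray, rpe_depth: np.ndarray,
                      slab_px: int = 40) -> np.ndarray:
    """En-face mean intensity of a slab directly beneath RPE/Bruch's membrane.

    volume:    OCT volume of shape (n_bscans, n_axial, n_ascans)
    rpe_depth: axial pixel index of RPE/Bruch's membrane per A-scan,
               shape (n_bscans, n_ascans), assumed to come from a prior
               layer segmentation step
    slab_px:   slab thickness in axial pixels, extending towards the choroid
    """
    n_b, n_z, n_a = volume.shape
    proj = np.zeros((n_b, n_a), dtype=np.float32)
    for b in range(n_b):
        for a in range(n_a):
            top = int(rpe_depth[b, a])
            bottom = min(top + slab_px, n_z)
            if bottom > top:
                proj[b, a] = volume[b, top:bottom, a].mean()
    return proj  # bright en-face regions indicate choroidal hypertransmission
```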
Other AI methods used single A-scans of the OCT as input instead of a projection image, reaching a DSC of 0.8730 and 0.9136. By using only individual A-scans as input, an algorithm cannot learn from the full 3D contextual information. Our algorithm was specifically trained to delineate a topographic GA area on a 2D en-face map, using the whole 3D OCT volume as input and benefitting from a rich spatial 3D context. Moreover, the algorithm performs on par with human expert graders in an independent external test set and was evaluated using a clinically relevant endpoint.
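As an illustration of this difference in input representation, the following minimal PyTorch sketch maps a full 3D OCT volume to a 2D en-face probability map by pooling learned features along the axial dimension. It is a toy example under assumed tensor layouts, not the architecture used in this study, which is described in reference 20.

```python
import torch
import torch.nn as nn

class EnFaceGASegmenter(nn.Module):
    """Toy 3D-to-2D model: volume (B, C, depth, n_bscans, n_ascans) ->
    en-face GA probability map (B, 1, n_bscans, n_ascans)."""

    def __init__(self, in_channels: int = 1, base: int = 16):
        super().__init__()
        # 3D feature extractor operating on the whole OCT volume.
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, base, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(base, base * 2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # 2D head applied after the axial dimension has been pooled away.
        self.head = nn.Conv2d(base * 2, 1, kernel_size=1)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        feats = self.encoder(volume)              # (B, C', depth, H, W)
        enface = feats.mean(dim=2)                # pool over depth -> (B, C', H, W)
        return torch.sigmoid(self.head(enface))   # per-pixel GA probability

# Example with a small toy volume (depth=64, 16 B-scans, 64 A-scans).
model = EnFaceGASegmenter()
pred = model(torch.randn(1, 1, 64, 16, 64))       # -> (1, 1, 16, 64)
```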
Recently, Zhang et al. published an algorithm for GA segmentation on OCT trained on the FILLY data37. They used the previously reported classification system proposed by the Classification of Atrophy Meeting (CAM) group for the description of earlier lesions in atrophic AMD on OCT, namely incomplete RPE and outer retinal atrophy (iRORA) and complete RPE and outer retinal atrophy (cRORA). Three major criteria have to be present to meet the definitions for these lesions: (1) a region of hypertransmission, (2) RPE disruption or attenuation and (3) signs of photoreceptor degeneration, with (1) and (2) < 250 µm in diameter representing iRORA and ≥ 250 µm in diameter representing cRORA38,39. These definitions, however, still have to be validated and implemented in clinical practice, as they have been shown to be subject to substantial inter-reader variability40. The CAM classification originates from pre-AI times, when an accurate measurement of the extent of alteration in the different layers was not available, and therefore represents a rather qualitative assessment. With high-precision measurement using AI tools, a distinct grading of GA progression has become possible, enabling monitoring of disease activity and therapeutic response as area change on a micron scale and replacing the gross distinction between iRORA and cRORA. In particular, detecting morphologic changes preceding GA and investigating potential earlier targets for new treatments require a resolution finer than 250 µm. During advanced progression of GA, photoreceptor loss has been shown to exceed and precede RPE loss and could therefore identify patients at risk of faster progression11,41. In contrast to our work, the method proposed by Zhang et al. was trained on the FILLY data and validated on clinical data, reaching a mean DSC for RPE loss of 0.87 ± 0.21 in the external test set37.
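For illustration only, the size criterion summarized above can be written as a simple rule. This is a deliberately simplified sketch of the published definitions38,39; the function name and inputs are assumptions rather than a validated grading tool.

```python
def classify_rora(hypertransmission: bool,
                  rpe_attenuation_or_disruption: bool,
                  photoreceptor_degeneration: bool,
                  lesion_diameter_um: float) -> str:
    """Simplified rendering of the iRORA/cRORA size criterion."""
    if not (hypertransmission and rpe_attenuation_or_disruption
            and photoreceptor_degeneration):
        return "criteria not met"
    # Lesions meeting all criteria are split by the 250 µm diameter threshold.
    return "cRORA" if lesion_diameter_um >= 250 else "iRORA"
```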
Another publication recently reported a deep learning-based method using the RORA classification. The method reached a mean DSC of 0.88 ± 0.074 and 0.84 ± 0.076 compared with two separate graders in the external test set. However, the number of OCT scans in the external test set was very limited (18 OCT volumes)35.
The algorithm proposed in this work goes beyond previous methods by taking the full 3D context into account, as opposed to slice-by-slice segmentation. Moreover, it adds the clinically most relevant value of investigating AI-based monitoring of GA progression under complement inhibition therapy on OCT. Two other studies by our group used the proposed RPE loss segmentation algorithm to investigate the therapeutic effect of pegcetacoplan on OCT, but with distinctly different purposes. The work of Riedl et al. focused on the inhibition of photoreceptor thickness and integrity loss42, and the work of Vogl et al. investigated local GA growth estimation43; neither aspect was evaluated in our study. Neither publication included the extensive clinical validation performed in this study; however, they used additional automated algorithms, as we believe it is crucial to introduce automated OCT monitoring of GA, specifically under therapeutic conditions, to the community. Our results suggest that the proposed algorithm can be used for objective, scalable and precise quantification of GA areas on OCT over time. To be clinically applicable, the robustness of the model across different imaging devices and different disease stages has to be investigated. Further prospective, randomized studies are needed to evaluate whether the algorithm can be implemented in clinical practice.
Now that a treatment for GA secondary to AMD has become available, we believe that automated and objective AI methods will be indispensable for the management of GA patients and OCT-based treatment guidance in routine clinical practice. Treatment effects in GA patients cannot be assessed by BCVA change as in exudative AMD, yet treatment will be invasive and long-term. Patients will have to be motivated to follow the continued regimen, and payers may request proof of benefit. AI models can be used to predict disease progression and identify further biomarkers correlated with future GA growth, thus helping us to better understand the underlying pathomechanisms44,45,46. Furthermore, treatment requirements may be adapted on an individual patient level based on such predictions. With respect to the huge GA population to be treated, only fast, accurate and automated AI-based evaluation, available at the click of a mouse, will be efficient.
A limitation of this study is a possible selection bias due to the post-hoc analysis of a potentially non-random subset of the FILLY trial. Furthermore, the trial's inclusion criteria define a minimum GA size (lesions > 2.5 mm2). Although the MUV GA cohort consists of real-world patients from routine clinical care, patients with other retinal diseases were excluded in order to train the algorithm only on GA secondary to non-neovascular AMD, in concordance with the inclusion criteria of currently ongoing treatment trials. Also, the model was trained and evaluated on Spectralis scans only. More studies are needed to investigate the performance of the algorithm on other OCT devices as well as on mixed cases. Potential discrepancies with the topline results reported in FILLY could be due to FAF having a larger field of view than OCT. Furthermore, the automated registration of FAF-based annotations to OCT might lead to some discrepancies. However, this only applies to the internal evaluation, while for the external validation the expert annotations on every B-scan of the whole OCT volume were taken as the reference, which is a great strength of this study. The performance of the algorithm was even slightly higher in the laboriously annotated external test set than in the internal evaluation, possibly owing to the standardized setting of a randomized clinical trial. While the overall correlation of GA growth rates between the manual and the automated method was high for the different treatment groups, the prediction for an individual patient can still be challenging. More extensive phase 3 data are needed for further evaluation.
In conclusion, we propose a fully automated segmentation method, developed on a real-world cohort, for reliable delineation and quantitative measurement of GA areas on SD-OCT. The method was shown to be capable of monitoring GA progression under therapy with the first successful therapeutic approach, i.e. complement inhibition, addressing a major unmet need for future personalized treatment of GA. The method represents an important step toward large-scale AI-based monitoring of GA patients in clinical practice.