Study participants and study datasets
For our Full Dataset, we obtained retrospective, de-identified coronary angiographic studies from all patients aged 18 years or older who underwent a coronary angiogram at the University of California, San Francisco (UCSF) between April 1, 2008 and December 31, 2019 (Supplementary Fig. 1). Patients without videos of the left or right coronary artery were excluded. Angiograms were acquired with Philips (Koninklijke Philips N.V., Amsterdam, Netherlands) and Siemens (Siemens Healthineers, Forchheim, Germany) systems at 15 frames per second using Iopromide contrast. For each of the four primary tasks performed by CathAI, we generated specifically annotated training datasets (either through available metadata or expert annotation) from subsets of the Full Dataset.
To minimize the manual labeling effort required to generate training data for each algorithm, we generated an "extracted Full Dataset" by first automatically identifying the frames within each video most likely to contain peak contrast. We calculated the structural similarity index measure (SSIM) between each frame and the frame in position '0', where no dye is usually present, which we called the "reference" frame. SSIM is higher when images have similar pixel values and lower when they differ more. The frame with the lowest similarity to the reference frame was selected as most likely to contain peak contrast (i.e., when the artery is full of contrast). Up to 8 frames were then extracted from each video by retaining the reference frame, the peak-contrast frame and the 3 frames immediately preceding and following the peak-contrast frame. These are referred to as the "extracted" Full Dataset frames (n = 1,418,297). All frames of a video were converted to images of dimension 512×512 pixels for algorithmic analysis. Subsets of frames from the extracted Full Dataset were then labeled for each task, as described below, to generate training data for Algorithms 1–3.
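As an illustrative sketch only (not the study's code), the peak-contrast frame selection described above could be implemented as follows. For brevity, `ssim_global` computes a simplified single-window SSIM over the whole image rather than the windowed SSIM of standard implementations; the function and array shapes are assumptions.

```python
import numpy as np

def ssim_global(a, b, data_range=255.0):
    # Simplified single-window SSIM over the whole image, using the
    # standard SSIM formula with its default stabilizing constants.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def extract_key_frames(video):
    # video: array of shape (n_frames, H, W); frame 0 is the dye-free reference.
    ref = video[0]
    sims = [ssim_global(ref, frame) for frame in video]
    peak = int(np.argmin(sims))  # least similar to reference = peak contrast
    window = range(max(0, peak - 3), min(len(video), peak + 4))
    return peak, sorted({0, *window})  # up to 8 retained frame indices
```

The returned index set keeps the reference frame, the peak-contrast frame and up to 3 frames on either side of the peak.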
For all algorithms except Algorithm 3, data were split randomly into Training (70%), Development (10%) and Test (20%) datasets, each containing non-overlapping patients. The Development dataset was used for algorithm tuning, when required. For Algorithm 3, dataset splits were Training (80%) and Test (20%), since we used original hyperparameters and did not require algorithm tuning16,32.
Algorithm 1 labels were taken directly from the DICOM metadata describing the cranial-caudal and LAO-RAO orientations. Algorithm 2 and 3 required annotations by a board-certified cardiologist (Supplementary Fig. 8 and Supplementary Table 1, 2 and 3 for definitions). For Algorithm 4, the stenoses were taken directly from the procedural report.
The Report dataset
Shortly after performing the procedure, interventional cardiologists typically interpret the angiogram using visual assessment, as per standard clinical practice, and describe the severity of coronary stenosis in the procedure report. This procedural report text was parsed (see below) to identify any description of coronary artery stenosis, the maximal stenosis percentage (called the REPORT-stenosis) and its location in one of 11 coronary artery segments (Supplementary Table 3). We identified 9782 coronary stenoses in artery segments (REPORT-stenoses) and 1766 non-stenosed complete vessels, yielding a total of 10,088 non-stenosed vessel segments, derived from 84,153 images. We then randomly sampled 10,000 images corresponding to healthy artery subsegments (Supplementary Fig. 1). Metadata was extracted from each angiogram video, including the procedure date, the primary (Right Anterior Oblique [RAO]/Left Anterior Oblique [LAO]) and secondary (cranio-caudal) angles of rotation, a unique exam identifier and a unique patient identifier. Non-matched REPORT-stenoses were removed from the dataset. We also excluded videos where an intra-coronary guidewire was present in more than 4 frames, as automatically determined by Algorithm 3 (6,076 videos, 41,780 stenosis-frames identified with guidewires), since these videos likely represent percutaneous coronary interventions, which could alter the stenosis percentage within that video (Supplementary Fig. 1); videos from earlier in the study, prior to the insertion of an intracoronary guidewire, were retained.
Text parsing methods
The free text from the procedural report was first segmented using commas (",") or periods ("."). We then applied text parsing methods to identify distinct coronary segments (Supplementary Table 3). When a coronary segment was found, we identified any description of a corresponding stenosis percentage by localizing "%" and the nearest one- to three-digit number in that sentence. We initially searched using standard terms (such as "right coronary artery"), then expanded the keywords by manual review of the text, over multiple iterations, to include the most common abbreviations, alternate spellings and permutations (such as "RCA"). Qualitative descriptions of obstructive CHD, such as "mild", "moderate" or "severe" disease, were not extracted. Furthermore, we searched for keywords such as "thrombus", "obstruction" or "occlusion" in the report; when present in a coronary segment, 100% stenosis was assigned to that segment. For analysis, ostial and proximal coronary segments were merged into the 'proximal' class, and ostial, proximal, middle and distal left main arteries were merged under the 'left main' class. The most severe stenosis within any of the 11 segments was retained (Supplementary Table 3). We did not analyze chronic total occlusions or stenoses in diagonals, marginals, septals, ramus, left posterior descending artery, left posterolateral artery or bypass grafts.
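The parsing logic above can be sketched roughly as below. This is a hedged illustration, not the study's parser: the keyword dictionary shows only a tiny assumed subset of the full lists (Supplementary Table 3), and the percentage-matching rule is simplified to "any 1–3-digit number directly preceding a % sign within the clause".

```python
import re

# Illustrative keyword subset only; the paper's full keyword lists were
# built over multiple iterations of manual review.
SEGMENT_KEYWORDS = {
    "right coronary artery": "RCA", "rca": "RCA",
    "left anterior descending": "LAD", "lad": "LAD",
    "circumflex": "LCX", "lcx": "LCX",
}
OCCLUSION_WORDS = ("thrombus", "obstruction", "occlusion")

def parse_report(text):
    # Returns the most severe stenosis percentage found for each segment.
    worst = {}
    for clause in re.split(r"[,.]", text.lower()):
        segment = next((name for kw, name in SEGMENT_KEYWORDS.items()
                        if kw in clause), None)
        if segment is None:
            continue
        percents = [int(m) for m in re.findall(r"(\d{1,3})\s*%", clause)]
        if any(word in clause for word in OCCLUSION_WORDS):
            percents.append(100)  # occlusion keywords imply 100% stenosis
        if percents:
            worst[segment] = max(worst.get(segment, 0), max(percents))
    return worst
```

Splitting on commas and periods before keyword matching mirrors the sentence segmentation step described in the text.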
Human subjects research
This study was reviewed by the University of California, San Francisco Institutional Review Board and need for informed consent was waived. The external validation was reviewed and approved by the University of Ottawa Institutional Review Board.
The CathAI system is comprised of 4 neural network algorithms organized in a pipeline (Fig. 1a). Angiographic images are analyzed by each algorithm and “flow” sequentially to the next to accomplish the four foundational tasks for automated angiogram analysis: (1) classification of angiographic projection angle; (2) classification of an angiogram’s primary anatomic structure; (3) localization of relevant objects within an angiogram, including coronary artery sub-segments and stenoses; (4) prediction of coronary artery stenosis severity (as a percentage of artery narrowing).
We customized the base neural network architecture of each of CathAI's 4 algorithms to achieve an angiogram-relevant task, as detailed in the sections below. As a high-level summary, Algorithm 1 accepted individual images (coronary angiogram video frames) as input and identified the angiographic projection angle used, described by the LAO-RAO and cranial-caudal axes (LAO cranial, RAO caudal, etc.); labels were available from each video's metadata. Algorithm 2 identified the primary anatomic structure (Supplementary Table 2), since it is common to capture angiogram videos containing non-cardiac anatomic structures such as the aorta or the femoral artery. Algorithm 2 allowed CathAI to subsequently focus only on angiogram videos primarily containing the left and right coronary arteries (LCA and RCA, respectively). Algorithm 3 localized relevant objects within images of the LCA and RCA by outputting bounding box coordinates for identified objects (Supplementary Video). Coronary artery stenosis location was assigned according to the greatest overlap between two Algorithm 3-predicted bounding boxes: that of the coronary artery sub-segment and that of the stenosis (Fig. 1b). Algorithm 4 accepted images cropped around stenosed artery segments (by Algorithm 3 bounding boxes) and predicted the maximal percentage stenosis within the image as a continuous value between 0 and 100 for each image. Predictions were averaged across a video to provide the video-level prediction, and the mean of video-level predictions from all videos that visualized an artery segment within a study provided the final artery-level prediction.
Algorithm 1: classification of angiographic projection angle
Algorithm 1 accepted individual images (video frames) as its input and identified the angiographic projection angle used. The projection angle refers to the fluoroscopic angulation used to obtain the image, commonly described on two axes defined by LAO-RAO and cranial-caudal (LAO cranial, RAO caudal, etc.). For Algorithm 1 training data, all images from the extracted Full Dataset were categorized into 12 categories of left-right and cranio-caudal projection angles based on the primary and secondary angles extracted from each video's metadata (between −180 and 180 degrees for the primary angle and between −50 and 50 degrees for the secondary; Supplementary Table 1). We then split the extracted Full Dataset into training (990,082), development (128,590) and test (299,625) datasets.
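The angle-to-class labeling can be illustrated as below. The exact bin boundaries and the full 12-class scheme are defined in Supplementary Table 1 and are not reproduced here; the ±10-degree cutoffs and the "AP"/"neutral" middle bins in this sketch are illustrative assumptions.

```python
def projection_class(primary_deg, secondary_deg, threshold=10.0):
    # primary_deg: LAO (+) / RAO (-) rotation; secondary_deg: cranial (+) /
    # caudal (-) angulation, both read from the DICOM metadata.
    lr = ("RAO" if primary_deg < -threshold
          else "LAO" if primary_deg > threshold else "AP")
    cc = ("caudal" if secondary_deg < -threshold
          else "cranial" if secondary_deg > threshold else "neutral")
    return f"{lr} {cc}"
```

Each frame inherits the class of its parent video, since the projection angles are stored once per video in the metadata.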
Algorithm 1's architecture was based on Xception, a convolutional neural network that has achieved state-of-the-art performance at image recognition tasks33. It was initialized with 'ImageNet' weights34, as commonly performed for faster algorithm convergence in image classification settings; all layers were trainable. Images were augmented by random zoom (range = 0.2) and shear rotate (range = 0.2). The development dataset was used to iteratively compare algorithm performance and fine-tune hyperparameters using grid search (Supplementary Table 5). We experimented with different architectures such as VGG-16, ResNet50 and InceptionNet-V3 but found no incremental benefit over Xception. The Test dataset was not used at all during training and was only used to report final performance. The most common prediction across the extracted frames of each video was assigned as the angiographic projection of that video; ties were addressed by selecting the projection with the highest average probability across all frames. We used Algorithm 1 weights as a "core model" to initialize the weights for training the subsequent algorithms based on the Xception architecture.
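The video-level aggregation rule (mode across frames, with ties broken by the highest mean class probability) could be sketched as follows; the array layout is an assumption.

```python
import numpy as np

def video_level_prediction(frame_probs):
    # frame_probs: (n_frames, n_classes) array of per-frame class probabilities.
    frame_preds = frame_probs.argmax(axis=1)
    counts = np.bincount(frame_preds, minlength=frame_probs.shape[1])
    tied = np.flatnonzero(counts == counts.max())
    if len(tied) == 1:
        return int(tied[0])  # clear majority vote across frames
    # Tie-break: pick the tied class with the highest mean probability.
    mean_probs = frame_probs.mean(axis=0)
    return int(tied[np.argmax(mean_probs[tied])])
```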
Algorithm 2: classification of primary anatomic structure
Algorithm 2 aimed to identify the primary anatomic structure present in an angiographic video (Supplementary Table 2), since it is common to capture angiogram videos containing non-cardiac anatomic structures such as the aorta or the femoral artery during the procedure. To generate Algorithm 2 training data, we randomly selected 14,366 images from the extracted Full Dataset, and a cardiologist categorized each image into one of 11 classes describing the primary anatomic structure (Supplementary Table 2). This dataset was split into Training/Development/Test datasets containing 9887 (70%), 1504 (10%) and 2975 (20%) images, respectively. We trained Algorithm 2 using the Xception architecture, initialized weights from the trained Algorithm 1, and tuned hyperparameters (Supplementary Table 5). Images were augmented by random zoom (range = 0.2) and shear rotate (range = 0.2). The predicted primary anatomic structure of a video was the mode prediction of all of its extracted Full Dataset frames. Only videos that primarily contained right or left coronary arteries flowed to Algorithm 3 for subsequent CathAI analyses (Supplementary Fig. 1). F1 scores and model performance varied by anatomic class, but in general classes with fewer frames had lower performance (Supplementary Fig. 3b), suggesting that performance could be improved if more labeled data were available.
Algorithm 3: localization of angiogram objects
Algorithm 3 aimed to localize relevant objects within images of the left and right coronary arteries (the output of Algorithm 2). While Algorithm 3 was trained to localize multiple objects (Supplementary Table 3), the tasks most critical to the CathAI pipeline were to (i) identify coronary artery segments, (ii) identify stenoses (if present) and (iii) localize other relevant objects such as guidewires or sternotomy wires. To generate training data for Algorithm 3, 2338 contrast-containing images of the LCA and RCA, both with and without stenosis (as identified by Algorithm 2), were annotated by a cardiologist who placed bounding boxes around all relevant objects in each image (Supplementary Table 3). Only stenoses in the main epicardial vessels, not side branches such as diagonals or marginals, were labeled. In 100% of the 2338 frames, the LCA or RCA was the primary anatomic structure, and the artery was well visualized, well opacified and not underfilled, according to the annotating cardiologist.
In our final CathAI pipeline we trained two versions of Algorithm 3: Algorithm 3a was trained on and accepted both LCA and RCA images as input. Since the RCA in the LAO projection contained the greatest number of annotated images in our dataset, we also trained a dedicated Algorithm 3b on this projection to demonstrate possible performance gains from focusing an algorithm on a specific artery/projection (RCA in LAO). To train Algorithms 3a/b, we split our labeled images for this task into two separate datasets: One containing left/right coronary arteries (2,338 images) and one containing RCA images in the LAO projection (450 images). Each dataset was subsequently split into 90% training (2,104 and 405 images respectively) and 10% test (234 and 45 images respectively) and Algorithms 3a/b were trained for 50 epochs. Once deployed in the CathAI pipeline, Algorithm 3b served to decrease input variability for Algorithm 3a, which produced performance improvements for both algorithms. Since we achieved performance gains by developing an algorithm on this specific artery/projection, future gains may be achieved with other dedicated algorithms.
Algorithms 3a/b employed the RetinaNet architecture and were trained using original hyperparameters16; a development dataset was not used. RetinaNet has achieved state-of-the-art performance for object localization tasks such as pedestrian detection for self-driving cars35. For our task, Algorithms 3a/b output bounding box coordinates for any objects present in each input image. Because some artery segments in certain angiographic projections are known a priori to be foreshortened or not visible, we applied a post-hoc heuristic to exclude certain Algorithm 3a/b-predicted artery segments from angiographic projections as predicted by Algorithm 1 (Supplementary Table 4). This represents a fusion of intermediate predictions from two CathAI pipeline algorithms to achieve more clinically relevant overall pipeline performance. To assess Algorithm 3a/b performance, the predicted coordinates were compared with the ground-truth coordinates using the ratio of the area of intersection over the area of union (called Intersection-over-Union [IoU])36. An IoU ≥0.5 between the predicted and annotated coordinates was considered a true positive. Next, we measured the mean average precision (mAP), which represents the ratio of true positives over true and false positives at different thresholds of IoU, for each class37. A mAP of 50% compares with state-of-the-art results for this type of task16,35.
Algorithm 4: predicting the percentage of coronary artery stenosis
Algorithm 4 was developed to predict the severity of coronary artery stenosis as a percentage, given input images cropped around stenosed artery segments identified by Algorithm 3. Algorithms 3a/b were run on all Report dataset videos to localize artery segments and stenoses. All frames that contained a stenosis bounding box overlapping a coronary artery segment bounding box with IoU ≥0.20 comprised potential input frames for Algorithm 4. A stenosis was localized to the Algorithm 3-identified artery segment with which it had the greatest overlap by IoU. To derive training and test labels for Algorithm 4, we cross-matched stenoses found by Algorithm 3a/b with the stenosis percentage found in the procedural report in corresponding artery segments (Supplementary Fig. 1). Matched procedural report values served as labels to train Algorithm 4 on input images cropped around stenosed artery segments according to Algorithm 3a/b bounding boxes. Non-matched stenoses were removed from our dataset, and videos with an intra-coronary guidewire present in more than 4 frames were excluded, as described above (Supplementary Fig. 1). To train and validate Algorithm 4, we combined 6258 images of non-stenosed coronary artery segments with the remaining 98,756 images of stenoses.
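The IoU-based assignment of a stenosis to its artery segment could be sketched as below; the box convention (corner coordinates) and segment names are illustrative assumptions.

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); returns intersection-over-union.
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def assign_stenosis(stenosis_box, segment_boxes, min_iou=0.20):
    # segment_boxes: {segment name: box}. Returns the segment with the
    # greatest overlap, or None if no segment reaches the IoU threshold.
    best_name, best_iou = None, min_iou
    for name, box in segment_boxes.items():
        overlap = iou(stenosis_box, box)
        if overlap >= best_iou:
            best_name, best_iou = name, overlap
    return best_name
```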
Once a stenosis was identified, bounding box coordinates were expanded by 12 pixels in all dimensions, then cropped and resized to the nearest of three predetermined sizes: 256×256 pixels (aspect ratio no. 1), 256×128 pixels (aspect ratio no. 2) and 128×256 pixels (aspect ratio no. 3). This was performed to maximize the signal-to-noise (vessel-to-background) ratio, given different vessel orientations and stenosis sizes. The "Report Dataset" used to train Algorithm 4 consisted of 105,014 images (6,667 stenoses from 2,736 patients and 5,134 healthy vessel segments from 1,160 patients; Supplementary Fig. 1). Since non-stenosed vessel segments tended to be longer than focal stenoses, which may bias training, we cropped all non-stenosed segments randomly to a height and width mirroring the distribution of stenosis image sizes within that coronary segment. This yielded similar vessel sizes between the stenosed and non-stenosed images for each vessel segment. Images were randomly split into 70% training, 10% development and 20% testing datasets.
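The margin expansion and nearest-aspect-ratio selection could be implemented as follows. Whether each predetermined size is width×height or height×width is an assumption of this sketch.

```python
def crop_target_shape(box, margin=12,
                      shapes=((256, 256), (256, 128), (128, 256))):
    # box: (x1, y1, x2, y2) stenosis bounding box; shapes as (width, height).
    x1, y1, x2, y2 = box
    width = (x2 - x1) + 2 * margin   # expand by 12 px on each side
    height = (y2 - y1) + 2 * margin
    aspect = width / height
    # Pick the predetermined output size with the closest aspect ratio.
    return min(shapes, key=lambda s: abs(s[0] / s[1] - aspect))
```

The crop is then resized to the chosen shape, so wide horizontal vessels land in the 2:1 shape and vertical ones in the 1:2 shape, limiting background dilution.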
Algorithm 4 was based on a modified Xception architecture in which the last layer (the Softmax layer used for classification) was replaced with an average pooling layer followed by a dense layer with a linear activation function, to enable prediction of stenosis severity as a continuous percentage value. Image metadata consisting of the coronary artery segment label and cropped aspect ratio were also added as inputs to the final layer of Algorithm 4, which improved performance. The algorithm output a percentage stenosis value between 0 and 100 for every input image, representing the maximal stenosis in that coronary artery segment. The percentage value was then averaged across all frames of the stenosed artery segment in a video, then averaged across videos of the same artery segment to obtain a final stenosis percentage (artery-level percentage).
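The frame-to-video-to-artery averaging amounts to a nested mean, as in this minimal sketch:

```python
from statistics import fmean

def artery_level_stenosis(videos):
    # videos: list of per-video lists of frame-level stenosis predictions
    # for the same artery segment within one study.
    video_level = [fmean(frames) for frames in videos]  # frame -> video mean
    return fmean(video_level)                           # video -> artery mean
```

Note that averaging video-level means (rather than pooling all frames) weights each video equally regardless of its frame count.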
Model weights were initialized using those from the trained Algorithm 1. Images were augmented by random flips (both horizontal and vertical); random contrast, gamma and brightness variations; and random application of adaptive histogram equalization (to improve image contrast). The algorithm was trained to minimize the squared loss between the predicted stenosis (AI-stenosis) and the REPORT-stenosis using the RAdam optimizer38 with an initial learning rate of 0.001, momentum of 0.9 and batch size of 12, for 50 epochs. Training was halted when loss on the test dataset stopped improving for 8 consecutive epochs.
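The patience-based halting rule could be expressed as below; this is a generic early-stopping sketch, not the study's training loop.

```python
def early_stop_epoch(losses, patience=8):
    # Returns the epoch at which training halts: the first epoch after the
    # loss has failed to improve for `patience` consecutive epochs.
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(losses) - 1  # ran to the final epoch without halting
```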
For Algorithm 4, we modified the training scheme such that each epoch was trained on images of one aspect ratio, with the next epoch trained on another aspect ratio (copying all weights from the previous iteration), as performed previously for multi-size inputs39. This was iterated until convergence. We measured algorithm performance on the complete test dataset, consisting of all three aspect ratios. We observed that the convergence of multi-size input training was similar to that of algorithms trained with a fixed input size. We also examined various pre-processing approaches and sequences without improvement in algorithm performance (Supplementary Table 7).
Neural network explainability methods
We applied two neural network explainability approaches to the fully trained CathAI algorithms to better understand how the algorithms made their predictions for their respective tasks. The GradCAM17 technique highlights the image regions most critical to a prediction: red areas denote higher importance to the algorithm's prediction, whereas blue regions denote lower importance. We also derived saliency maps using the Layer Ordered Visualization of Information (LOVI) method18, which highlights the individual pixels in the image that contribute most to the algorithm's prediction; brighter pixels represent a greater contribution.
For external validation, we randomly sampled 1000 coronary angiogram videos performed at the University of Ottawa Heart Institute (UOHI) between July 1, 2020 and October 31, 2020, acquired with Philips Azurion systems (Koninklijke Philips N.V., Amsterdam, Netherlands) at 15 frames per second using Iopromide dye. Algorithms 1, 2 and 3 were applied to each video to identify and localize stenoses, and Algorithm 4 predicted the AI-stenosis. We then sampled up to 40 examples of angiogram videos per coronary artery segment to form our external validation dataset, identifying a total of 464 coronary angiograms with distinct stenoses. Two board-certified interventional cardiologists at the UOHI, each with experience as primary operator of over 2500 coronary angiograms, adjudicated these 464 videos in a blinded fashion by grading stenosis severity as a percentage between 0 and 100%, describing the underlying anatomic structure and localizing the stenosis to a coronary artery sub-segment. Algorithm performance in this dataset was reported as the AUC of the AI-stenosis compared to each adjudicator, and to the average of both adjudicators. Since stenoses in this external validation dataset were only visualized in one video, artery-level AI-stenosis performance was not calculated. The same binary threshold (0.54) was used for obstructive AI-stenosis as in the primary analysis. We also describe the concordance between the stenosis localization determined by Algorithm 3 and by the two adjudicators, as well as the average difference between the adjudicators' stenosis percentages.
To adapt CathAI Algorithm 4 to the different population distribution of the QCA dataset, we split the QCA dataset into training (75%), development (12.5%) and testing (12.5%) sets and fine-tuned the last two fully connected layers of Algorithm 4, allowing the algorithm to learn to predict QCA stenosis values from the input stenosis images rather than visually assessed stenosis. We performed a grid search of initial learning rates from 1e−4 to 1e−8 and selected the rate that produced the lowest loss value on the development set in 100 epochs. We then trained the model starting with a learning rate of 1e−6, dropping by a factor of 0.1 every 100 epochs, for 300 epochs.
Quantitative coronary angiography dataset
The CathAI system provides an algorithmic foundation that can be re-trained for future angiogram-relevant tasks. To demonstrate this, we obtained an external dataset of coronary angiograms from an a priori different patient population, adjudicated at the Montreal Heart Institute (MHI) Core Laboratory using the CMS QCA system (MEDIS, Leiden, Netherlands). This dataset comprised angiograms analyzed by the MHI Angiographic Core Lab, obtained during randomized controlled trials (RCTs) in patients ≥18 years old who underwent coronary angiography as part of the study and received novel lipid-lowering drugs or placebo40. The trials from which these data were derived used inclusion criteria that excluded patients with obstructive coronary artery disease (CAD) at the start of the study, which resulted in a majority of mild-to-moderate severity coronary stenoses in this dataset. QCA analysis was performed by two trained technicians and supervised by an expert physician. For this dataset, severe stenosis was defined as ≥50% QCA stenosis severity6,13.
Coronary angiogram images were acquired using Philips (Koninklijke Philips N.V., Amsterdam, Netherlands), General Electric Medical Systems (General Electric, Chicago, Illinois, United States) and Toshiba (Toshiba Corporation, Minato City, Tokyo, Japan) systems at 15 frames per second, by injecting Iopromide dye into the coronary arteries. For each QCA stenosis analysis at the MHI core lab, an end-diastolic frame was selected with the angulation that best showed the stenosis at its most severe degree, with minimal foreshortening and branch overlap. QCA software automatically calculated the percent diameter stenosis for coronary artery segments with a reference diameter ≥1.5 mm.
This RCT QCA dataset contained a different patient population a priori from the real-world clinical UCSF dataset, since all stenoses ≥ 50% were not present during the baseline angiogram (but could be present at a follow-up angiogram), leaving primarily mild-CAD with a mean QCA stenosis severity of 31.7% ± 11.6%. This provided an optimal opportunity to examine how CathAI could provide a foundation for retraining using QCA stenosis labels to function as an automated tool for core lab angiogram analysis. We split the QCA dataset into training (75%), development (12.5%) and testing (12.5%) datasets and fine-tuned the last two fully connected layers of CathAI Algorithm 4 to predict QCA values rather than visually assessed stenosis estimates.
Algorithm evaluation and statistical analysis
As appropriate for each task, each algorithm's performance for categorical values was evaluated using positive predictive value (PPV), negative predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), F1 score and Bland-Altman plots. For continuous values, we present the intra-class correlation (ICC [2,2])41, the Pearson correlation and the mean absolute error between CathAI's stenosis prediction and the report stenosis or QCA stenosis. All three centers report continuous percentage stenoses as part of their routine clinical care.
Neural networks were trained using Keras v.2.24 and TensorFlow v.1.12. Final algorithm performance was reported on the Test dataset. All analyses were performed using Python 2.7.
Algorithms 1 and 2 were evaluated at the frame/image level using precision (i.e., positive predictive value) and recall (sensitivity), and performance was plotted using confusion matrices. We also derived the F1 score for each class, which is the harmonic mean of precision and recall.
To evaluate Algorithms 3a and 3b, we calculated the area of intersection over the area of union (IoU) between the predicted bounding-box coordinates and the expert-annotated bounding-box coordinates for objects of each class in the test dataset. The IoU is the ratio of the area of overlap to the area of union between the predicted and annotated sets of coordinates36. An IoU ≥0.5 signifies at least 50% overlap between the predicted and true bounding boxes, which we considered a true positive. We then report the performance of Algorithms 3a/b using the mean average precision (mAP) metric, which represents the ratio of true positives over true and false positives at different thresholds of IoU, for every class37. A mAP value of 50% compares with state-of-the-art results for this type of task16,35. We also present the mean average precision for Algorithms 3a and 3b calculated as the proportion of correct class predictions with an IoU ≥0.1 against our ground-truth labels across all classes in the test dataset, as well as the positive predictive value of stenosis localization using the report or QCA dataset as ground truth.
To evaluate Algorithm 4, the primary metric of interest was the average absolute error between the reported value (REPORT-stenosis) and the predicted value (AI-stenosis) at the artery level. This mirrors guideline-based standard clinical practice for stenosis estimation, in which stenosis is measured in multiple orthogonal projections and the maximal degree of narrowing is reported2,3. Image-level AI-stenosis was averaged across a video to obtain video-level AI-stenosis and compared against REPORT-stenosis using the mean squared error. Pearson and intra-class correlations (ICC) and Bland-Altman42 plots were then calculated between the REPORT-stenosis and AI-stenosis at the video level and artery level. Reliability was classified as poor (<0.5), moderate (0.50–0.75), good (0.76–0.90) or excellent (0.91–1.0). Finally, we present the mean squared error between REPORT-stenosis and AI-stenosis at the video level.
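The error metric and the reliability classification above can be sketched as follows (the ICC computation itself is omitted; only the cutoffs stated in the text are encoded):

```python
import numpy as np

def mean_absolute_error(report, predicted):
    # Average absolute error between REPORT-stenosis and AI-stenosis.
    report = np.asarray(report, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(report - predicted)))

def reliability_class(icc):
    # Cutoffs as stated in the text: poor <0.5, moderate 0.50-0.75,
    # good 0.76-0.90, excellent 0.91-1.0.
    if icc < 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"
```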
To predict binary "obstructive" coronary artery stenosis, defined as ≥70% stenosis2,3, we used a threshold of 0.54, which optimized the F1 score. We also report the area under the receiver operating characteristic curve (AUC), sensitivity, specificity and diagnostic odds-ratio43 at the frame, video and artery levels based on this threshold.
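Applying the binary threshold and deriving sensitivity/specificity could be sketched as below; note the 0.54 threshold implies the model output is on a normalized scale here, which is an assumption of this illustration.

```python
def binary_metrics(y_true, scores, threshold=0.54):
    # y_true: 1 = obstructive (>=70% report stenosis), 0 = non-obstructive.
    # scores: model outputs on the same scale as the threshold.
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for p, t in zip(preds, y_true) if p and t)
    tn = sum(1 for p, t in zip(preds, y_true) if not p and not t)
    fp = sum(1 for p, t in zip(preds, y_true) if p and not t)
    fn = sum(1 for p, t in zip(preds, y_true) if not p and t)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity
```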
Confidence intervals for performance metrics were derived by bootstrapping 80% of the test data over 1000 iterations to obtain 5th and 95th percentile values. We present the performance of Algorithm 4 stratified by sex, left and right coronary arteries, artery segment and age group. We also present two-sided P values for interaction (between the CathAI stenosis, the covariates and the reported stenosis), calculated by the Wald test. We categorized AI-stenosis and REPORT-stenosis into concordant and discordant lesion groups based on the visual ≥70% cutoff. For discordant lesions, we present their prevalence stratified by coronary artery segment. For lesion/vessel-level data, a mixed-effects logistic regression model was used to account for within-subject correlation and repeated angiograms.
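The bootstrap procedure can be sketched as below; whether the 80% resamples were drawn with or without replacement is not stated, and this illustration assumes resampling with replacement and approximates the percentiles by order statistics.

```python
import random
from statistics import fmean

def bootstrap_ci(values, metric=fmean, n_iter=1000, frac=0.8, seed=0):
    # Resample `frac` of the data `n_iter` times and return the
    # approximate 5th and 95th percentile values of the metric.
    rng = random.Random(seed)
    k = max(1, int(len(values) * frac))
    stats = sorted(metric(rng.choices(values, k=k)) for _ in range(n_iter))
    return stats[int(0.05 * n_iter)], stats[int(0.95 * n_iter)]
```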
Further information on research design is available in the Nature Research Reporting Summary linked to this article.