The engine of any MDT is a computational model. Depending on the application and available data, it may include mechanistic information about the relevant human biology, and it may take as input information specific to either an individual patient or a patient population. In all cases, the output is information that can be used in the treatment of an individual patient. Figure 2 depicts the role of the computational model in the workflow of MDT applications.
For instance, a deep learning model might be trained on clinical data from a large cohort of gastric cancer patients and then used to predict an individual patient’s response to an immunotherapy treatment29. Such models may or may not include mechanistic information about the relevant tumor biology, such as mutated signaling pathways and their downstream effects; predictions for a specific patient are based on correlations between that patient’s data and those of the reference population used to train the model. At the other end of the spectrum, a computational model may capture all known features of human biology relevant to a given application and make treatment recommendations based on model analysis, informing clinical trials, without directly using any data from a specific patient29. The focus of Forum participants was primarily on MDTs based on a mechanistic computational model. This preference stems from the ability of mechanistic MDTs to link outcomes to mechanisms, thereby informing treatment, and from the fact that such models allow uncertainty quantification of their predictions.
Many mechanistic models of human biology are now available, particularly ones incorporating aspects of the immune system. For numerous applications, the underlying model of an MDT will need to encompass various mechanisms, spanning several spatial and temporal scales. For example, while most drug mechanisms are intracellular, their effects manifest at the tissue or organ scale, necessitating cross-scale integration. The immune response to an infection is multifaceted, coordinating diverse mechanisms and cell types. Consequently, computational models for MDTs will likely be high-dimensional, multi-scale, multi-physics, hybrid, and stochastic, containing numerous parameters. Integrating heterogeneous data types, from molecular to physiological, will be essential for their parameterization and application. Most crucially, these models should be adaptable to individual patient data. Very few such models have been constructed for clinical use or for the discovery of new biology, leading us into uncharted territory in their construction, analysis, validation, and application.
An important issue in quantitative medicine in general, and MDTs in particular, is data integration. We first note that MDTs are an ideal vehicle for integrating heterogeneous data types at different scales, from molecular to organism-level data, since they provide a rigorous framework that links data characterizing different biological processes in a biologically faithful way. In practice, however, this raises many challenges that are unique to different applications, and, to our knowledge, there are no general approaches to this problem. In addition, data collection might be technologically challenging. Sometimes, surrogate data types can be used. For instance, gene expression data are often used as surrogates for protein data, even though it is well understood that this is often problematic, e.g., because transcript and protein levels are frequently poorly correlated. Also, while it is sometimes possible to obtain data at different scales from the same experiment, this is not always the case.
Finally, we comment on the use of machine learning and artificial intelligence (ML/AI) models for constructing MDTs. For illustration, we focus on cancer applications, since this is a field that is comparatively rich in data. It is important to note that there are certain problems of central importance to oncology where the methods of AI/Big Data are fundamentally limited in their ability to power digital twins. One example is predicting treatment response and subsequently identifying an optimal intervention plan for an individual patient. The need to solve this problem abounds in both medical and radiation oncology, where it is well recognized that a “one-size-fits-all” approach is not appropriate, but a practical method for identifying optimal patient-specific interventions is not well established. Given the tremendous heterogeneity between patients, it is difficult to imagine how a digital twin built on population data can provide anything more than general insights into predicting an individual patient’s response, let alone how to optimize their treatment plan30. This is because a patient is not just diagnosed with cancer, or (for example) breast cancer, or (for example) triple negative breast cancer; rather, they are diagnosed with one of the (currently) known subtypes of triple negative breast cancer. Thus, to build a digital twin for Ms. Jane Doe that is powered by population-based data, one would need to find a training data set with hundreds (thousands?) of patients who share her subtype and her biological characteristics, and that covers all the therapeutic regimens she might receive. That data set does not exist and is unlikely to ever exist, because cancer is being diagnosed ever more precisely (thereby increasing patient heterogeneity) and the number of available drugs is increasing (thereby increasing treatment options).
(This, in fact, recently played out for early triple negative breast cancer when pembrolizumab was approved as part of the standard-of-care therapy31, thereby necessitating the building of new databases for all population-based approaches to this disease.) This is but one example of a problem for which biology-based mathematical models offer a distinct advantage over an AI/Big Data-only approach. By explicitly incorporating known biology and physics into the mathematical model27, one can calibrate such models using only patient-specific data to personalize the digital twin, thereby allowing one not only to systematically simulate patient-specific interventions, but also to select the one with the highest probability of yielding a positive outcome32.
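To make the calibrate-then-intervene loop concrete, the following is a minimal sketch under invented assumptions: a one-parameter logistic tumor-growth ODE stands in for a mechanistic model, a grid search stands in for calibration, and the growth rate, kill rate, dose levels, and patient measurements are all illustrative placeholders rather than values from the cited work.

```python
# Hypothetical sketch: personalize a minimal tumor-growth model with
# patient-specific measurements, then compare candidate treatments in silico.
# All parameter values and data below are invented for illustration.

def simulate(growth_rate, kill_rate, dose, v0=1.0, days=60, dt=0.1):
    """Euler-integrate logistic growth with a dose-dependent kill term."""
    v, capacity = v0, 10.0
    for _ in range(int(days / dt)):
        dv = growth_rate * v * (1 - v / capacity) - kill_rate * dose * v
        v += dv * dt
    return v

def calibrate(measurements):
    """Grid-search the growth rate so untreated simulations match the patient."""
    best, best_err = None, float("inf")
    for i in range(1, 100):
        r = i * 0.01
        err = sum((simulate(r, 0.0, 0.0, days=t) - v) ** 2
                  for t, v in measurements)
        if err < best_err:
            best, best_err = r, err
    return best

# Pretend longitudinal tumor volumes (day, volume) measured for one patient.
patient_data = [(10, 1.6), (20, 2.4), (30, 3.5)]
r_hat = calibrate(patient_data)

# Simulate candidate dose levels and pick the one with the lowest final burden.
doses = [0.0, 0.5, 1.0]
outcomes = {d: simulate(r_hat, 0.3, d) for d in doses}
best_dose = min(outcomes, key=outcomes.get)
```

In a realistic setting the model would have many parameters, calibration would use a proper optimizer with identifiability checks, and "lowest final burden" would be replaced by a clinically meaningful objective; the structure of the loop, however, is the same.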
Five-year action plan for mathematical and computational modeling
The biomedical modeling community has spent decades building complex models of medical and disease processes in humans, from cancer to infections. These are all potentially usable as drivers of MDTs or components thereof. As a first step, we need to develop and curate a repository of model templates (i.e., accepted model structures) and specific models (e.g., peer-reviewed models of specific signaling networks) that can be used in the construction of MDTs, ranging from the intracellular to the physiological scale. Existing repositories include, e.g., Biomodels33, Cell Collective34, and GinSim35. These can be built upon to create a more comprehensive curated collection.
Existing techniques for the validation, calibration, and analysis of computational models, most importantly sensitivity analysis and parameter identifiability, are not always directly applicable to the models underlying an MDT. Research is needed to develop appropriate model analysis techniques for MDTs.
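As a point of reference for what such techniques look like in the simple ODE setting, here is a minimal sketch of local parameter sensitivity via one-sided finite differences; the logistic model and its parameter values are illustrative placeholders, not part of any specific MDT.

```python
# A minimal sketch of local parameter sensitivity for an ODE-based model,
# using one-sided finite differences. Model and parameter values are
# invented for illustration.

def predict(params, days=30, dt=0.1):
    """Euler-integrate dv/dt = r*v*(1 - v/K); return the final state."""
    r, K = params["r"], params["K"]
    v = 1.0
    for _ in range(int(days / dt)):
        v += (r * v * (1 - v / K)) * dt
    return v

def local_sensitivity(params, eps=1e-4):
    """Relative sensitivity of the output to each parameter: (dy/y)/(dp/p)."""
    base = predict(params)
    sens = {}
    for name in params:
        bumped = dict(params)
        bumped[name] = params[name] * (1 + eps)  # perturb one parameter
        sens[name] = (predict(bumped) - base) / base / eps
    return sens

s = local_sensitivity({"r": 0.1, "K": 10.0})
```

For the high-dimensional, stochastic, multi-scale models envisioned for MDTs, such pointwise derivatives are far less informative, which is exactly why new global and structure-aware analysis methods are needed.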
For many applications, MDTs will be used to forecast the future health trajectory of a patient, as well as the effects of available interventions on that trajectory. Existing control and optimization methods (e.g., ref. 28) mostly apply only to ordinary differential equation models. Research is needed to develop novel forecasting and control approaches suitable for complex MDTs.
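To illustrate the kind of control method that currently applies mainly to ODE systems, the following is a toy receding-horizon dosing loop: at each step, a greedy controller forecasts a short horizon ahead and picks the admissible dose with the lowest predicted burden. The dynamics, dose grid, and toxicity budget are all invented for illustration.

```python
# Toy receding-horizon (MPC-style) dosing on a simple ODE model.
# All rate constants, dose levels, and the toxicity budget are invented.

def step(v, dose, r=0.15, K=10.0, kill=0.4, dt=1.0):
    """One Euler step of logistic growth with a dose-dependent kill term."""
    return v + (r * v * (1 - v / K) - kill * dose * v) * dt

def rollout(v, dose, horizon=5):
    """Forecast the burden a few steps ahead under a constant dose."""
    for _ in range(horizon):
        v = step(v, dose)
    return v

v, total_dose, budget = 2.0, 0.0, 6.0
trajectory = [v]
for day in range(30):
    # Greedy choice: among doses still within the cumulative toxicity
    # budget, pick the one whose short-horizon forecast is lowest.
    options = [d * 0.25 for d in range(5) if total_dose + d * 0.25 <= budget]
    dose = min(options, key=lambda d: rollout(v, d))
    total_dose += dose
    v = step(v, dose)
    trajectory.append(v)
```

Even this toy version exposes the core difficulty: the controller relies on cheap, repeated forward simulation of the model, which becomes the bottleneck once the underlying model is multi-scale and stochastic rather than a small ODE system.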
There are many existing models of disease processes and immune system function that can be used to build MDTs, as mentioned above. Research is needed to develop a platform for the modular construction of complex MDT models from component models. Such a platform is essential for achieving the long-term vision of a virtual patient. A possible approach is proposed in refs. 12,36.
The data from an individual patient capture different aspects of their characteristics and health status. They range from genomic data, gene expression measurements, and protein and metabolite concentrations in different tissues under different conditions, to imaging data of everything from immune cells in lymph nodes to functional MRI of the brain, to electronic health records and lifestyle and behavioral data. All of these provide information about some aspect of a person, and the challenge is to integrate them in a meaningful way into a holistic representation. A computational model of the patient that is dynamically updated with all this information is a natural way to accomplish the required data integration. The confluence of several simultaneous developments has created an environment in which this promise of personalized medicine is taking shape: vastly increased availability of data, from the molecular to the population scale; a deeper understanding of human biology and its role in health and disease; and an expansion of our computational and modeling tools.
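The "dynamically updated" part of this picture can be sketched in its simplest possible form: a scalar Kalman filter that revises a patient-specific estimate each time a new measurement arrives. The random-walk model of the tracked quantity and the noise levels below are illustrative assumptions, not a proposed clinical method.

```python
# A minimal sketch of dynamically updating a patient-specific estimate as
# new measurements stream in, using a scalar Kalman filter. The random-walk
# state model and both noise variances are invented for illustration.

def kalman_update(est, var, measurement, process_var=0.05, meas_var=0.2):
    """One predict-update cycle for a scalar random-walk state."""
    var += process_var                 # predict: uncertainty grows over time
    gain = var / (var + meas_var)      # weight given to the new measurement
    est = est + gain * (measurement - est)
    var = (1 - gain) * var             # update: uncertainty shrinks
    return est, var

est, var = 0.0, 1.0                    # vague prior before any data
for z in [1.1, 0.9, 1.2, 1.0]:         # streaming patient measurements
    est, var = kalman_update(est, var, z)
```

A real MDT would track a high-dimensional, nonlinear model state fed by heterogeneous data streams, but the same predict-update cycle, with uncertainty carried along explicitly, is the natural template for keeping the twin in sync with its patient.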
As mentioned earlier, most of the projects discussed at the Forum do not meet the criteria for being considered a digital twin or being readily turned into one. All the models discussed can, in principle, serve as the basis for a computational model that is personalized to a patient, since all of them capture some aspect of disease-relevant human biology. The most common reason this has not been done is that the models are still in the phase of validation using, in most cases, animal or in vitro data, as appropriate human data are often not available. The second reason is that patient data routinely collected in a healthcare setting are often not suitable for directly calibrating these models, since the models often involve events at the intracellular scale, or spatial heterogeneities, that are difficult to capture at the tissue scale. Thus, the first step needs to be to develop surrogate measurements for unavailable data and surrogate models that can serve as the underlying MDT model.
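The surrogate-model idea can be sketched in miniature: run an expensive mechanistic simulation at a handful of affordable design points, then fit a cheap approximation that stands in for it elsewhere. Here a fine-step logistic ODE plays the expensive model and a closed-form linear least-squares fit plays the surrogate; both choices are illustrative assumptions.

```python
# A minimal sketch of building a cheap surrogate for an expensive
# mechanistic simulation. The "expensive" model and parameter range
# are invented for illustration.

def expensive_model(r, days=30, dt=0.01):
    """Stand-in for a costly mechanistic simulation (fine-step logistic ODE)."""
    v = 1.0
    for _ in range(int(days / dt)):
        v += (r * v * (1 - v / 10.0)) * dt
    return v

# Design points: the few parameter values we can afford to simulate.
ps = [0.02, 0.05, 0.08, 0.11]
ys = [expensive_model(p) for p in ps]

# Closed-form least squares for the linear surrogate y ~ a + b * p.
n = len(ps)
pbar, ybar = sum(ps) / n, sum(ys) / n
b = sum((p - pbar) * (y - ybar) for p, y in zip(ps, ys)) / \
    sum((p - pbar) ** 2 for p in ps)
a = ybar - b * pbar

def surrogate(r):
    """Cheap approximation usable wherever the full simulation is too slow."""
    return a + b * r
```

In practice the surrogate would be a richer emulator (e.g., a Gaussian process or neural network) over many parameters, validated against held-out simulations, but the division of labor is the same: the mechanistic model supplies the training runs, and the surrogate supplies the speed needed for patient-facing use.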
If the experience of the computational biology community over the last 30 years is any indication, then possibly the most daunting challenge to widespread adoption of the MDT paradigm is the formation and functioning of the interdisciplinary teams required for this purpose. In addition to biologists and mathematical modelers, we also need to integrate clinicians. All three communities need to find ways to come together for both research and deployment of this technology. This point is made strongly in the National Academies report on the subject11. The Forum participants included representatives of all three communities, and the difficulty of aligning objectives and bridging language barriers was discussed. There are no ready-made solutions to this problem, but appropriate funding mechanisms requiring this integration can provide incentives.
A well-designed funding program for MDT research by the public sector is crucial if substantial progress is to be made over the next decade. New funding paradigms should be considered for this purpose. There can also be an important role for the business community and philanthropic organizations in providing funding for this effort and collaborating on the myriad research problems that will need to be tackled and solved. The Forum we are reporting on here is intended to support a dialog around this topic. Collectively, a community is emerging around this effort that can, with the right resources, help make rapid progress on bringing MDTs to patients at a large scale.
Finally, we comment briefly on a number of important issues related to MDTs that were not discussed at the Forum, in part because time was very limited, and in part because the topics required expertise not well represented among the participants. These include regulatory hurdles to the deployment of digital twins, ethical and privacy issues, the economics of large-scale MDT use, and health equity issues, among many others. Another topic, vitally important for digital twin deployment, is uncertainty quantification. There are many contributors to model uncertainty, including the stochasticity displayed by many models, particularly those involving the immune system. This, too, was not discussed at length at the Forum for lack of time. Most of these topics are addressed in both the EDITH draft strategic plan8 and the National Academies report11, and we encourage the reader to consult them.