Why are most AI systems failing to reach clinical practice?

Artificial intelligence (AI) research in healthcare is growing rapidly, with potential applications being demonstrated across a variety of medical areas. However, there are currently few examples of such systems being successfully implemented in clinical practice.



Retrospective versus prospective studies

That is, studies that analyse previously collected data versus those that evaluate performance prospectively on newly collected, real-world data.

While existing studies have included large numbers of patients and substantial benchmarking against expert performance, the great majority have been retrospective: they train and test algorithms on previously collected, labelled data. Only prospective studies can reveal the true utility of AI systems, because performance is likely to degrade when algorithms encounter real-world data that differ from those seen during training. The few prospective studies to date include diabetic retinopathy grading, detection of breast cancer metastases in sentinel lymph node biopsies, wrist fracture detection, colonic polyp detection, and detection of congenital cataracts.

Metrics do not always indicate clinical relevance

The term “AI chasm” refers to the fact that accuracy does not necessarily equate to clinical efficacy. Despite its widespread use in machine learning studies, the area under the receiver operating characteristic curve (AUC-ROC) is not always the best metric to represent clinical applicability, and it is difficult for many clinicians to interpret. In addition to reporting sensitivity and specificity at a chosen model operating point, papers should report positive and negative predictive values, which depend on disease prevalence in the target population.
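To make this concrete, here is a minimal sketch of why sensitivity and specificity alone can mislead. The counts below are hypothetical, chosen to represent a test that is 90% sensitive and 90% specific applied to a screening population with 1% disease prevalence:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Summarise a classifier at a single operating point."""
    return {
        "sensitivity": tp / (tp + fn),  # recall among diseased patients
        "specificity": tn / (tn + fp),  # recall among healthy patients
        "ppv": tp / (tp + fp),          # P(disease | positive result)
        "npv": tn / (tn + fn),          # P(no disease | negative result)
    }

# Hypothetical counts: 10,000 screened patients, 1% prevalence,
# 90% sensitivity and 90% specificity.
m = diagnostic_metrics(tp=90, fp=990, tn=8910, fn=10)
```

Despite the apparently strong sensitivity and specificity, the positive predictive value here is only about 8% (90 true positives among 1,080 positive results), which is the kind of clinically decisive information an AUC figure hides.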

Clinicians must understand how the proposed algorithms could improve patient care within a realistic clinical workflow; however, most papers do not attempt to present such information. Potential approaches have been suggested, including decision curve analysis, which quantifies the net benefit of using a model to guide subsequent actions. To improve understanding, medical students and practising clinicians should have access to an AI curriculum that teaches them to critically appraise, adopt, and use AI tools safely in their practice.
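A minimal sketch of the standard decision-curve net-benefit calculation may help: at a chosen risk threshold, the model's true positives are credited and its false positives penalised by the odds of that threshold. The labels and predicted risks below are hypothetical:

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk exceeds
    `threshold`, per the standard decision-curve formula:
        NB = TP/n - (FP/n) * threshold / (1 - threshold)
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

# Hypothetical outcomes and model-predicted risks for ten patients.
y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 1]
p = [0.9, 0.2, 0.7, 0.4, 0.1, 0.8, 0.3, 0.6, 0.2, 0.5]

nb_model = net_benefit(y, p, threshold=0.5)
nb_treat_all = net_benefit(y, [1.0] * 10, threshold=0.5)  # baseline policy
```

Plotting net benefit across a range of thresholds, and comparing against the "treat all" and "treat none" baselines, is what produces the decision curve itself.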

Difficulty comparing various algorithms

Objectively comparing algorithms across studies is difficult because each study reports performance using different methodologies on different populations with different sample distributions and characteristics. A fair comparison requires algorithms to be evaluated on the same independent test set, representative of the target population, using the same performance metrics. Without this, clinicians will struggle to determine which algorithm is most likely to perform well for their patients.
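The requirement is simple to state in code: both algorithms must be scored with the same metric on predictions for the same held-out patients. The labels and predictions below are hypothetical:

```python
def accuracy(preds, labels):
    """Fraction of correct predictions (one shared metric for all models)."""
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

# One shared, independent test set representative of the target population.
labels = [1, 0, 1, 1, 0, 0, 1, 0]

# Predictions from two hypothetical algorithms on the SAME patients.
alg_a = [1, 0, 1, 0, 0, 0, 1, 0]
alg_b = [1, 1, 1, 1, 0, 0, 1, 1]

acc_a = accuracy(alg_a, labels)  # comparable because the test set
acc_b = accuracy(alg_b, labels)  # and metric are identical
```

Reported accuracies from two papers with different cohorts and metrics would not support the conclusion this comparison supports.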

Challenges in machine learning science

AI algorithms are susceptible to a variety of flaws, including inapplicability outside the training domain, bias, and brittleness (being easily fooled by inputs that differ subtly from the training data). Important challenges include dataset shift, fitting confounders rather than true signal, propagating unintended biases from clinical practice, making algorithms interpretable, generating reliable measures of model confidence, and generalizing to new populations.

Dataset shift

It is easy to overlook the fact that all input data are generated in a non-stationary environment with shifting patient populations, where clinical and operational practices evolve over time; this is especially important for algorithms built on electronic health records (EHRs). Deploying a new predictive algorithm may itself change practice, producing a data distribution different from the one used to train the algorithm. Methods for identifying drift and updating models in response to deteriorating performance are therefore crucial. Mitigations include careful monitoring of performance over time to proactively surface problems, along with the likely need for periodic retraining. To maintain performance over time, data-driven testing procedures have been proposed to select the most appropriate updating strategy, ranging from simple recalibration to full model retraining.
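One common way to flag drift is to compare a feature's deployment-time distribution against its training-time distribution. A minimal sketch using the Population Stability Index (PSI) follows; the cohorts are hypothetical, and the 0.2 alert level is a widely used rule of thumb, not a universal standard:

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between a training-time sample and a
    deployment-time sample of one feature; larger values mean more drift.
    A common rule-of-thumb alert level is PSI > 0.2."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(xs):
        counts = [0] * bins
        for x in xs:
            counts[sum(x > e for e in edges)] += 1
        return [max(c / len(xs), 1e-6) for c in counts]  # avoid log(0)

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train_ages = list(range(30, 90))              # hypothetical training cohort
deployed_ages = [a + 15 for a in train_ages]  # older deployment population
drift = psi(train_ages, deployed_ages)        # well above the 0.2 alert level
```

In practice such checks would run continuously on each input feature and on the model's output scores, triggering recalibration or retraining when drift is detected.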

Vulnerability to adversarial attack or manipulation

Algorithms have been shown to be vulnerable to adversarial attack. Although still somewhat speculative at present, an adversarial attack describes an otherwise effective model that can be manipulated by inputs explicitly designed to fool it. In one study, for example, images of benign moles were misclassified as malignant after the addition of adversarial noise or even a simple rotation.
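The mechanism can be illustrated on a toy model. The sketch below uses a hypothetical linear "mole risk" classifier, not the model from the cited study, and applies a fast-gradient-sign-style perturbation: each feature is nudged by a small amount in whichever direction increases the predicted risk (for a linear model, that direction is simply the sign of the weight):

```python
import math

# Toy linear classifier: risk = sigmoid(w . x + b). Weights are invented
# for illustration only.
w = [0.8, -0.5, 0.3]
b = -0.2

def risk(x):
    """Predicted probability of malignancy for feature vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

def perturb(x, eps):
    """FGSM-style attack: shift each feature by eps in the direction
    that increases risk, i.e. along sign(gradient) = sign(w)."""
    return [xi + eps * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]

benign = [0.1, 0.9, 0.2]          # hypothetical benign-looking features
adv = perturb(benign, eps=0.6)    # slightly shifted adversarial copy
```

The unperturbed input scores below the 0.5 decision threshold while the perturbed copy scores above it, flipping a "benign" call to "malignant" without any change a clinician would consider meaningful.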

Logistical challenges in implementing AI systems

Many of the current obstacles in translating AI algorithms to clinical practice stem from the fact that most healthcare data are not readily available for machine learning. Data are frequently siloed across a multitude of medical imaging archiving systems, pathology systems, EHRs, electronic prescribing tools, and insurance databases, making them difficult to aggregate. Adoption of standardized data formats, such as Fast Healthcare Interoperability Resources (FHIR), could improve data aggregation, although greater interoperability does not by itself solve the problem of inconsistent semantic coding in EHR data.
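To show what a standardized format buys, here is a minimal sketch of parsing one FHIR Observation resource. The JSON is a pared-down illustrative example containing only a few of the fields a real resource carries (4548-4 is the LOINC code for haemoglobin A1c):

```python
import json

# A pared-down FHIR Observation resource (illustrative fields only).
raw = """
{
  "resourceType": "Observation",
  "status": "final",
  "code": {"coding": [{"system": "http://loinc.org", "code": "4548-4",
                       "display": "Hemoglobin A1c"}]},
  "valueQuantity": {"value": 6.2, "unit": "%"}
}
"""

obs = json.loads(raw)
coding = obs["code"]["coding"][0]
value = obs["valueQuantity"]
summary = f'{coding["display"]}: {value["value"]}{value["unit"]}'
```

Because every compliant system emits the same structure, the same few lines can aggregate lab results from many sources; the remaining difficulty, as noted above, is that different sites may still code the same concept inconsistently.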

Human-made impediments to AI adoption in healthcare

Even if a highly effective algorithm overcomes all of the obstacles above, significant human barriers to adoption remain. To ensure this technology reaches and benefits patients, it is critical to maintain a focus on clinical applicability and patient outcomes, to advance methods for algorithmic interpretability, and to develop a deeper understanding of human-computer interaction.


Breakthroughs in artificial intelligence offer a compelling opportunity to improve healthcare. Translating research approaches into effective clinical deployment, however, represents a new frontier for clinical and machine learning research. Ensuring that AI systems are safe and effective will require robust prospective clinical evaluation, using clinically meaningful performance metrics that go beyond technical accuracy to capture how AI affects the quality of care, variability among healthcare professionals, the efficiency and productivity of clinical practice and, most importantly, patient outcomes. Independent datasets representative of prospective target populations should be curated so that different algorithms can be compared while checking for signals of potential bias and unexpected confounders.

AI tool developers must be aware of the potential unintended consequences of their algorithms and ensure that algorithms are designed with the global community in mind. Further work to improve algorithmic interpretability and to understand human-algorithm interactions will be essential for future adoption and safety, supported by the development of appropriate regulatory frameworks.


Corpradar is a next-gen digital IR 4.0 corporate media house that combines the power of technology with human capital to bring decisive and insight-driven content on key business affairs. In an absolute sense, we create a space for leading business houses and visionary corporate leaders to chime in with their opinions and thoughts on relevant industry-specific matters that provide a detailed expert perspective for our followers.
