Publication Details
Abstract
The growing adoption of Electronic Health Records (EHRs) has produced large volumes of heterogeneous healthcare data, both structured (demographics, laboratory measurements, and diagnosis codes) and unstructured (clinical narratives such as physician notes and laboratory comments). Although artificial intelligence (AI) has shown promise in improving clinical risk prediction and clinical decision support, most current models rely on a single data modality and operate as opaque black boxes, which limits transparency and clinical trust. This study describes the design and evaluation of an interpretable clinical risk prediction model that uses both structured and unstructured EHR data to improve predictive power and intelligibility. Structured features were combined with natural language processing (NLP) representations of clinical text to build a unified feature space, and the model was trained on a de-identified dataset derived from the MIMIC-IV clinical database comprising more than 230,000 patient admission records. A supervised Support Vector Machine (SVM) classifier with probability estimation was used so that each risk prediction is accompanied by a confidence estimate. Model performance was measured using accuracy, precision, recall, F1-score, confusion matrix analysis, and the Receiver Operating Characteristic (ROC) curve. The experimental results indicate strong discriminative performance, with an Area Under the Curve (AUC) of about 0.99. Combining both modalities yielded more reliable predictions than single-source approaches. The probability-based outputs also improved interpretability by supporting clear risk stratification. The proposed framework contributes to the development of scalable, ethical, and interpretable AI systems that support clinical decision-making with more accurate, timely, and reliable healthcare risk prediction.
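As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes confusion matrix counts, accuracy, precision, recall, F1-score, and ROC AUC from predicted probabilities. It is not the study's code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are assumptions made purely for illustration, and the AUC is computed with the pairwise (Mann-Whitney) formulation rather than any particular library routine.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) after thresholding the predicted probabilities."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score at a given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    correct = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data, invented for illustration only (not taken from the study).
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_probs = [0.92, 0.81, 0.40, 0.35, 0.10, 0.55, 0.77, 0.05]

print(classification_metrics(toy_labels, toy_probs))
print(roc_auc(toy_labels, toy_probs))
```

Because AUC is computed from the ranking of probabilities rather than from thresholded predictions, it can be high even when accuracy at a fixed threshold is modest, which is one reason the two are usually reported together.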