Harmonization of Discrepant Data: A Solution to the Computational Models for Data Collection in the Tertiary Institutions

E. C. Aburuotu; Nathaniel A. O.

doi:10.51699/1ndcxz69

Publication Details

Journal: Innovative: International Multidisciplinary Journal of Applied Technology (2995-486X)

Issue: Vol 3, No 11 (2025)

Pages: 90-106

ISSN: 2995-486X

Abstract

The growing volume of heterogeneous data produced by agencies and institutions in the education sector poses a significant challenge for data visualization, interoperability, analysis, and business decision-making. Harmonizing heterogeneous data sets is therefore a critical solution for stakeholders in the sector. Legacy data intended for decision support systems and analytical applications is imported from various sources with different data types, database architectures, and structures, and all of it must be harmonized before it can support the intended business solutions and growth. For this purpose, the Support Vector Machine (SVM) algorithm for heterogeneous data harmonization has emerged as the most effective method for producing high-quality data that improves governance and usefulness across the enterprise. The goal of the research was to create and refine an SVM-based heterogeneous data harmonization solution for enterprise databases. A data harmonization technique was developed by integrating harmonization tools with a support vector machine learning algorithm, and the work was implemented in the JavaScript (JS) development environment. The study examined existing data production techniques on various active databases and followed the Rapid Application Development (RAD) methodology. Following supervised machine learning procedures, the system's models were trained and tested on both structured and unstructured data imported from Microsoft Excel. A total of 10,990 data sets were used: 8,393 (70%) for testing and 2,597 (30%) for training. The outcomes demonstrate that the system successfully redefined the data headers and column dimensions as a means of coordinating the data imported into the system.
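
To make the idea of SVM-based header harmonization concrete, the following is a minimal sketch in JavaScript, the environment the abstract names. It is not the authors' implementation: the header strings, the canonical field names (`matric_number`, `full_name`, `department`), the character-trigram features, and the training settings are all assumptions chosen for illustration. It trains one linear SVM per canonical field (one-vs-rest, hinge loss with an L2 penalty, plain subgradient descent) and maps a raw spreadsheet header to the best-scoring field.

```javascript
// Hypothetical training pairs: raw spreadsheet header -> canonical field name.
const TRAINING = [
  ["Matric No", "matric_number"],
  ["Matriculation Number", "matric_number"],
  ["Reg/Matric No.", "matric_number"],
  ["Student Name", "full_name"],
  ["Full Name", "full_name"],
  ["Name of Student", "full_name"],
  ["Dept", "department"],
  ["Department", "department"],
  ["Department/Unit", "department"],
];

// Turn a header into a sparse bag of character trigrams (assumed feature choice).
function featurize(header) {
  const text = ` ${header.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim()} `;
  const features = new Map();
  for (let i = 0; i + 3 <= text.length; i++) {
    const gram = text.slice(i, i + 3);
    features.set(gram, (features.get(gram) || 0) + 1);
  }
  return features;
}

// Sparse dot product between a weight map and a feature map.
function dot(weights, features) {
  let sum = 0;
  for (const [gram, value] of features) sum += (weights.get(gram) || 0) * value;
  return sum;
}

// Train one binary linear SVM by subgradient descent on the hinge loss.
function trainBinary(samples, positiveLabel, epochs = 200, lambda = 0.01, rate = 0.1) {
  const weights = new Map();
  let bias = 0;
  for (let epoch = 0; epoch < epochs; epoch++) {
    for (const { features, label } of samples) {
      const y = label === positiveLabel ? 1 : -1;
      const margin = y * (dot(weights, features) + bias);
      // L2 penalty: shrink every weight slightly on each step.
      for (const [gram, w] of weights) weights.set(gram, w * (1 - rate * lambda));
      // Hinge loss: only samples inside the margin push the weights.
      if (margin < 1) {
        for (const [gram, value] of features) {
          weights.set(gram, (weights.get(gram) || 0) + rate * y * value);
        }
        bias += rate * y;
      }
    }
  }
  return { weights, bias };
}

// One-vs-rest: one SVM per canonical field; the highest score wins.
function trainHeaderClassifier(pairs) {
  const samples = pairs.map(([header, label]) => ({ features: featurize(header), label }));
  const labels = [...new Set(pairs.map(([, label]) => label))];
  const models = labels.map((label) => ({ label, ...trainBinary(samples, label) }));
  return (header) => {
    const features = featurize(header);
    let best = null;
    for (const model of models) {
      const score = dot(model.weights, features) + model.bias;
      if (best === null || score > best.score) best = { label: model.label, score };
    }
    return best.label;
  };
}

const harmonize = trainHeaderClassifier(TRAINING);
console.log(harmonize("MATRIC NUMBER"));
```

In a pipeline of the kind the abstract describes, a function like the hypothetical `harmonize` would be applied to each column header of an imported Excel sheet, so that differently labelled columns from separate sources end up under one shared schema.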

Keywords

Data Harmonization Analytics Visualization Dashboard Systems Platform Interoperability Machine Learning Algorithms
