Abstract
Rapid growth in transaction volumes across the global Banking, Financial Services, and Insurance (BFSI) sector has made money laundering harder to detect. The United Nations Office on Drugs and Crime (UNODC) estimates that laundered funds amount to 2–5% of global GDP annually (≈ USD 800 billion – 2 trillion). Traditional rule-based Anti-Money Laundering (AML) systems suffer from high false positive rates, often exceeding 95%, and scale poorly when confronted with big data transaction streams. To address these limitations, this paper proposes an AI/ML-powered real-time AML pipeline built on a distributed architecture that combines Hadoop, PySpark, and graph-based algorithms for suspicious activity detection.
The pipeline integrates streaming data ingestion (Kafka + HDFS), parallelized ML models (PySpark MLlib), and graph-based community detection to uncover hidden relationships between accounts and transactions. Key innovations include dynamic risk scoring with gradient boosting models, fraud ring detection through distributed graph algorithms (PageRank for ranking influential accounts, Louvain modularity for community detection), and adaptive feedback loops for continuous model refinement. The proposed system demonstrates a 40–60% reduction in false positives, near real-time processing at sub-second latency for millions of transactions per day, and regulator-ready audit trails through explainable AI components.
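To make the graph-scoring step concrete, the following is a minimal sketch of PageRank over a toy transfer graph. It is a single-process, pure-Python stand-in, not the distributed PySpark implementation the paper describes. The account names, the `transfers` list, and the `damping` and `iterations` defaults are hypothetical values chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed graph given as (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by accounts with no outgoing transfers is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue
            # Each account passes an equal share of its rank to every recipient.
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy data: three accounts funnel money into "acct_hub",
# which forwards it to "acct_d", which cycles funds back to "acct_a".
transfers = [
    ("acct_a", "acct_hub"),
    ("acct_b", "acct_hub"),
    ("acct_c", "acct_hub"),
    ("acct_hub", "acct_d"),
    ("acct_d", "acct_a"),
]

scores = pagerank(transfers)
top_account = max(scores, key=scores.get)
```

In this toy graph `acct_hub` receives the highest score because three accounts send funds to it. In a production setting the same idea would run on a distributed graph engine, and a high score would be only one input to the risk model rather than evidence of laundering on its own.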
This architecture enables financial institutions to move beyond static rule-based monitoring toward proactive, scalable, and explainable AML detection. Beyond BFSI, the design principles apply to fintech, cross-border payments, and cryptocurrency exchanges, where transaction velocity and complexity call for more advanced analytics. The work shows how combining AI/ML, big data platforms, and distributed graph analytics can make AML compliance both scalable and intelligence-driven.