From Paperwork Chaos to AI Precision: How teX.ai Unlocked Seamless Text Extraction for a Financial Service Firm
Client Overview
The client is a fast-growing financial firm that has embraced digital transformation to stay ahead in a competitive market. Specializing in rapid and accurate credit score assessments, they help banks evaluate customer creditworthiness with greater efficiency. By leveraging cutting-edge technology and digital platforms, the client aims to revolutionize the credit delivery process in India.
As part of their customer validation workflow, they process thousands of scanned bank statements to meet applicants' KYC (Know Your Customer) requirements. Automating and optimizing this process is crucial to maintaining speed, accuracy, and compliance in their operations.
Extracting Insights, Not Just Data: The KYC Processing Challenge
To streamline KYC processing, the client needed a robust and scalable text extraction system capable of handling diverse document formats and high data volumes. The goal was to automate the extraction of critical customer details while ensuring accuracy and efficiency.
01
Two Formats, One Challenge
The extraction process needed to handle two distinct document formats—scanned images and digital PDFs. Key information was spread across both structured tables (tabular data) and unstructured sections (peripheral data), requiring a precise and adaptable extraction approach.
02
Finding the Needle in the Haystack
The extraction process had to identify and extract five key fields from both structured (tabular) and unstructured (peripheral) data sources within the statements. These fields included the Account Holder's Name, Date, Bank Name, Transaction Details, and Address.
03
Learning from Data
The extraction models had to be built from a provided corpus of 2,000 bank statements.
04
Built to Scale, Ready for Growth
With thousands of new documents arriving daily, the system needed to scale effortlessly, adapting to fluctuating workloads without compromising performance.
The teX.ai Approach: Implementing Intelligence in KYC Processing
To tackle the complexities of bank statement extraction, Indium implemented a structured, AI-driven approach using its in-house accelerator, teX.ai. By leveraging deep learning, NLP models, and intelligent classification techniques, teX.ai ensured accurate, scalable, and efficient KYC processing.
Decoding Complexity: Understanding Bank Statement Variability
The initial analysis uncovered 200 different types of bank statements within the provided dataset of 2,000 files, highlighting the need for a flexible and adaptable extraction solution.
Quality First: Training the AI on High-Fidelity Data
Out of the 2,000 bank statements, 830 high-quality files were selected to train the initial NLP model, ensuring superior text recognition and extraction accuracy.
Cracking the Code: AI-Powered Table Detection
A deep learning object detection model was used to locate tables in scanned bank statements. Bounding boxes were drawn to identify table structures, enabling precise extraction.
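The hand-off from detection to extraction can be sketched as follows. This is a minimal illustration, not teX.ai's actual code: the `Detection` class, the 0.5 confidence threshold, and the 300 DPI render resolution are all assumptions made for the example. It keeps only confident boxes and converts them from image pixels (origin at the top-left) to PDF points (origin at the bottom-left), which is the coordinate space that PDF table extractors such as Camelot expect for table areas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One predicted table region in image pixels (origin top-left). Hypothetical type."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    score: float  # detector confidence in [0, 1]


def keep_confident(detections, min_score=0.5):
    """Drop low-confidence boxes; the 0.5 threshold is an illustrative assumption."""
    return [d for d in detections if d.score >= min_score]


def to_pdf_points(det, image_height_px, dpi=300):
    """Convert a pixel box to PDF points (1/72 inch, origin bottom-left).

    Returns (x1, y1, x2, y2), where (x1, y1) is the top-left corner and
    (x2, y2) is the bottom-right corner in PDF space.
    """
    def scale(px):
        return px * 72.0 / dpi

    # Flip the y axis: pixel rows grow downward, PDF points grow upward.
    return (
        scale(det.x_min),
        scale(image_height_px - det.y_min),
        scale(det.x_max),
        scale(image_height_px - det.y_max),
    )


# Made-up detections for one 11-inch page rendered at 300 DPI (3300 px tall).
detections = [
    Detection(150, 600, 2250, 2400, score=0.97),
    Detection(100, 100, 400, 180, score=0.21),  # low score, e.g. a logo
]
table_boxes = [to_pdf_points(d, 3300) for d in keep_confident(detections)]
print(table_boxes)
```

Each resulting tuple could then be formatted as an `"x1,y1,x2,y2"` string and passed to the table extractor as the region to read.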
Extracting the Essentials: Turning Tables into Usable Data
Tabula and Camelot, integrated within teX.ai, were used to extract table contents from both digital and scanned PDFs with high accuracy.
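Table extractors return every cell as text, so a normalization pass is usually needed before the rows are usable as transaction data. The sketch below shows one way that step might look. The column names (Date, Description, Debit, Credit, Balance), the DD/MM/YYYY date format, and the sample rows are assumptions for illustration; real statements vary by bank.

```python
from datetime import datetime
from decimal import Decimal


def parse_amount(cell):
    """Parse a money cell such as '1,250.00'; empty cells become None."""
    text = cell.replace(",", "").strip()
    return Decimal(text) if text else None


def rows_to_transactions(rows):
    """Convert raw table rows (header row first) into typed transaction dicts."""
    header = [h.strip().lower() for h in rows[0]]
    transactions = []
    for row in rows[1:]:
        cells = dict(zip(header, row))
        if not cells.get("date", "").strip():
            continue  # skip blank or continuation rows
        transactions.append({
            "date": datetime.strptime(cells["date"].strip(), "%d/%m/%Y").date(),
            "description": cells["description"].strip(),
            "debit": parse_amount(cells["debit"]),
            "credit": parse_amount(cells["credit"]),
            "balance": parse_amount(cells["balance"]),
        })
    return transactions


# Made-up rows in the shape a table extractor might return.
raw_rows = [
    ["Date", "Description", "Debit", "Credit", "Balance"],
    ["01/04/2023", "UPI payment", "1,250.00", "", "48,750.00"],
    ["", "", "", "", ""],
    ["03/04/2023", "Salary credit", "", "75,000.00", "1,23,750.00"],
]
for txn in rows_to_transactions(raw_rows):
    print(txn)
```

Using `Decimal` rather than `float` avoids rounding drift when amounts are later summed or compared against the running balance.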
Beyond the Grid: Extracting Peripheral Data with AI
Not all critical information was inside tables. For data scattered outside the structured sections, a combined recurrent model (LSTM-CRF) captured the required fields, regardless of the PDF document type.
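A sequence tagger such as an LSTM-CRF typically emits one label per token, and a small post-processing step merges those labels into field values. The sketch below assumes BIO-style tags (`B-` begins a field, `I-` continues it, `O` is outside any field). The source does not state which tagging scheme was used, and the tag names and sample tokens here are hypothetical.

```python
def group_bio_spans(tokens, tags):
    """Merge per-token BIO tags into a list of (field, text) pairs."""
    spans = []
    current_field, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, field = tag.partition("-")
        if prefix == "I" and field == current_field:
            # Continuation of the span that is currently open.
            current_tokens.append(token)
            continue
        # Any other tag closes the open span, if there is one.
        if current_field is not None:
            spans.append((current_field, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            # A stray I- tag is treated as the start of a new span.
            current_field, current_tokens = field, [token]
        else:
            current_field, current_tokens = None, []
    if current_field is not None:
        spans.append((current_field, " ".join(current_tokens)))
    return spans


# Made-up tokens and tags standing in for model output on a statement header.
tokens = ["Account", "Holder", ":", "Asha", "Rao", "Bank", ":", "Example", "Bank"]
tags = ["O", "O", "O", "B-NAME", "I-NAME", "O", "O", "B-BANK", "I-BANK"]
print(group_bio_spans(tokens, tags))
```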
Ready-to-Use Insights: Structured JSON Output
The extracted data was delivered as a JSON file, seamlessly integrating into the client’s in-house systems for further processing and analysis.
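To make the shape of such an output concrete, here is a sketch of how the five extracted fields might be serialized. The key names, the use of ISO date strings, and the sample values are assumptions for illustration; the client's actual JSON schema is not described in this case study.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StatementRecord:
    """Hypothetical container for the five key fields of one bank statement."""
    account_holder_name: str
    date: str  # ISO 8601 string keeps the JSON portable
    bank_name: str
    address: str
    transactions: list = field(default_factory=list)


# Made-up sample values, not client data.
record = StatementRecord(
    account_holder_name="Asha Rao",
    date="2023-04-30",
    bank_name="Example Bank",
    address="12 Sample Street, Chennai",
    transactions=[
        {
            "date": "2023-04-01",
            "description": "UPI payment",
            "debit": "1250.00",
            "credit": None,
            "balance": "48750.00",
        }
    ],
)

payload = json.dumps(asdict(record), indent=2)
print(payload)
```

Amounts are kept as strings in this sketch so that downstream systems can parse them with exact decimal arithmetic instead of floating point.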
Secure & Scalable: On-Premises Deployment
teX.ai was deployed on-premises within the client’s data center, ensuring full control and security over sensitive financial data. The system was built using Flask and Requests, facilitating a smooth and scalable data processing pipeline. Training was also provided to the in-house team for seamless adoption.
teX.ai in Action: Revolutionizing KYC Processing with Measurable Impact
The implementation of teX.ai revolutionized the client’s KYC processing, delivering speed, accuracy, and scalability like never before. Here’s how it made a measurable difference:
01
80% Boost in Throughput: Faster Processing, Higher Efficiency
With teX.ai, the processing time for a single bank statement dropped to under a minute, allowing the client to handle a significantly higher volume of applications per day. This resulted in an 80% increase in overall throughput, accelerating loan approvals and customer onboarding.
02
90% Accuracy: Reliable Data for Smarter Decisions
The AI-driven text extraction achieved 90% accuracy, ensuring clean, structured, and reliable data for further processing. As the system processed larger datasets, its accuracy improved, delivering more precise and higher-quality outputs over time.