Text Extraction for KYC Processing Using teX.ai for a Financial Services Firm
Project Overview
The goal was to build a text extraction model for bank statements using the teX.ai product, along with a pipeline that orchestrates text extraction for future bank statements.
About Client
The client is a rapidly growing firm in India that operates in the financial domain and has embraced digital transformation to gain a competitive edge. They specialize in providing quick and efficient credit score ratings to their customers and offer assistance to banks in assessing their customers’ creditworthiness. By leveraging technology and digital platforms, the client aims to revolutionize the credit delivery process in India. As part of their customer validation process, the client needs to process thousands of scanned bank statements to meet the KYC (Know Your Customer) requirements for the applicants.
Business Requirements
- The documents to be processed were of two types: scanned images and digital PDFs. The extraction needed to capture five key fields, located both inside tables (tabular data) and outside them (peripheral data).
- The five fields were the account holder name, date, name of the bank, transaction details, and address.
- A training corpus of 2,000 bank statements was initially used to train the extraction models.
- To accommodate the daily influx of documents, the system needed to be scalable.
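The requirements above imply a simple output record for each statement. The following is a minimal sketch of such a record in Python, not the teX.ai API or the client's actual schema. The names `DocumentType`, `StatementRecord`, `REQUIRED_FIELDS`, and `missing_fields` are hypothetical, and the field types (for example, storing the date as a string) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class DocumentType(Enum):
    """The two input types named in the requirements."""
    SCANNED_IMAGE = "scanned_image"
    DIGITAL_PDF = "digital_pdf"


# The five key fields listed in the requirements (names are hypothetical).
REQUIRED_FIELDS = (
    "account_holder_name",
    "date",
    "bank_name",
    "transaction_details",
    "address",
)


@dataclass
class StatementRecord:
    """One extracted bank statement (assumed layout, for illustration only)."""
    document_type: DocumentType
    account_holder_name: Optional[str] = None
    date: Optional[str] = None  # assumption: kept as raw text, not parsed
    bank_name: Optional[str] = None
    transaction_details: List[str] = field(default_factory=list)
    address: Optional[str] = None


def missing_fields(record: StatementRecord) -> List[str]:
    """Return the names of required fields that are empty or not extracted."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]
```

A check like `missing_fields` could flag statements that need manual review before they count toward the KYC process, for instance when a scanned image yields only some of the five fields.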