As part of a digital transformation drive, one of the high courts of India wanted to convert all documents related to its past cases into digital form. Our client, a well-known scanning company, won the contract to scan and clean all of these historical documents. Until then, the client had been removing noise such as black patches and scanning impressions with a semi-automatic tool. The tool still required manual work: an operator had to select each patch of noise and delete it.
The client wanted to automate this process because of the sheer volume involved, roughly 100 million pages, all of which needed to be cleaned as quickly as possible.
A pilot project was set up to assess whether the documents could be cleaned using AI. An AI model was built and trained on about 10,000 pages. The pilot demonstrated the value of AI for this monotonous and laborious work: on the test data, the model removed about 90 to 95% of the noise present.
The model is now deployed on Amazon Web Services (AWS) and is being used to clean the full collection of close to 100 million pages. It does so cost-effectively and in much less time than manual cleaning was expected to take.