I’m on a project to implement SAP S/4HANA, and before Xmas I received a request from a member of the business to fully automate the processing of vendor invoices.
The business representative in question is new to SAP and to the concepts behind ERP solutions. So I explained the benefits of an integrated, enterprise-wide solution: how operational tasks automatically post to the financial accounts, and how the integrated analytics provide real-time insights. He was excited by all of that, but he was also eager to explore automating the capture of vendor invoices, in other words without any manual intervention. Apparently they receive a lot of invoices each month …
There is not (yet) a solution from SAP to do this last piece, so at the time of writing SAP relies on partner solutions (although I believe a native solution from SAP may be imminent). Nevertheless I had found a problem waiting for an OCR solution … BINGO!
So I spent the lockdown period over Xmas writing a proof-of-concept Python application to do the following:
- Convert PDF files to individual images (sometimes the vendors send multiple invoices in a single PDF file and these need to be split out and converted to image files for OCR processing)
- Build a template for each invoice type using manual image annotations to capture regions of interest for each vendor
- Automatically extract all relevant text from the vendor invoices with OCR, using the coordinates of the annotation templates as references
- Format the extracted text and map it onto the fields of the SAP Invoice API and use that API to post the invoices directly into SAP
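To make the template and extraction steps above more concrete, here is a minimal sketch of one way a per-vendor template could be represented. It assumes regions are stored as fractions of the page size, so a template annotated on one scan still lines up on scans at a different resolution. The names `Region`, `ACME_TEMPLATE`, `template_from_annotations` and `extract_fields`, along with the coordinates, are hypothetical illustrations and not the code from my application. The OCR engine is passed in as a plain callable so the sketch stays independent of any particular library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A region of interest stored as fractions (0..1) of page width/height."""
    left: float
    top: float
    right: float
    bottom: float

    def to_pixels(self, width: int, height: int) -> tuple[int, int, int, int]:
        """Scale to a (left, top, right, bottom) pixel box for an image of this size."""
        return (
            round(self.left * width),
            round(self.top * height),
            round(self.right * width),
            round(self.bottom * height),
        )


# Hypothetical template for one vendor: field name -> region of interest.
ACME_TEMPLATE = {
    "invoice_number": Region(0.65, 0.08, 0.95, 0.12),
    "invoice_date": Region(0.65, 0.12, 0.95, 0.16),
    "gross_amount": Region(0.70, 0.80, 0.95, 0.85),
}


def template_from_annotations(annotations, width, height):
    """Turn pixel boxes drawn on a reference image (width x height) into a template."""
    return {
        field: Region(l / width, t / height, r / width, b / height)
        for field, (l, t, r, b) in annotations.items()
    }


def extract_fields(template, width, height, ocr):
    """Call `ocr(box)` for each region's pixel box and return stripped text per field."""
    return {
        field: ocr(region.to_pixels(width, height)).strip()
        for field, region in template.items()
    }
```

With Pillow and pytesseract installed, the `ocr` callable could be something like `lambda box: pytesseract.image_to_string(img.crop(box))`, because `Image.crop` takes the same `(left, upper, right, lower)` box that `to_pixels` returns.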
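The last step in the list above, formatting the OCR output and mapping it onto the invoice API, could look roughly like the sketch below. It assumes each vendor uses a known date format and decimal separator. The SAP field names are assumptions modelled on the header entity of the supplier invoice OData API, so check them against your own system's service metadata. The helper names are hypothetical. The HTTP call itself (authentication, CSRF token handling, and any date representation your service version requires) is left out.

```python
from datetime import datetime
from decimal import Decimal


def parse_amount(text: str, decimal_sep: str = ".") -> Decimal:
    """Turn OCR text such as '1,234.56' or 'EUR 1.234,56' into a Decimal."""
    thousands_sep = "," if decimal_sep == "." else "."
    # Keep only digits, separators and a minus sign; drop currency symbols etc.
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch in ",.-")
    cleaned = cleaned.replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


def parse_date(text: str, fmt: str = "%d/%m/%Y") -> str:
    """Parse a vendor-specific date string and return it as ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text.strip(), fmt).date().isoformat()


def to_invoice_payload(fields, company_code, supplier, currency,
                       date_fmt="%d/%m/%Y", decimal_sep="."):
    """Map extracted fields onto assumed supplier-invoice header field names."""
    return {
        "CompanyCode": company_code,
        "InvoicingParty": supplier,
        "SupplierInvoiceIDByInvcgParty": fields["invoice_number"],
        "DocumentDate": parse_date(fields["invoice_date"], date_fmt),
        "DocumentCurrency": currency,
        "InvoiceGrossAmount": str(parse_amount(fields["gross_amount"], decimal_sep)),
    }
```

Keeping the date format and decimal separator as per-vendor parameters means they can live alongside each vendor's annotation template, rather than being guessed from the OCR text.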
That’s quite a lot for a single post so I’m working on a series of posts to explain how I achieved each step.
It will certainly need some refinement, but it is a working MVP. My last post in this series will outline areas for further development.
Here are some images to whet your appetite:
I hope you enjoy the journey.