The global invoice landscape continues to evolve, with 550 billion invoices generated annually as of 2019. Despite increasing digitization efforts, only 10% of these are processed without paper, so most still depend on manual data entry and document handling. Governments and international bodies, driven by stringent fiscal policies and privacy regulations such as GDPR, are intensifying their push for electronic invoicing. This shift is critical for addressing operational inefficiencies, reducing manual errors, and ensuring data security.
Current solutions in the market, such as DocDigitizer, Rossum, and ABBYY FineReader, show varying degrees of efficiency in optical character recognition (OCR) and document processing. However, these platforms have critical shortcomings when it comes to country-specific requirements and to user autonomy during the document lifecycle. For instance:
- ABBYY FineReader struggles with character recognition nuances, such as accented letters, and with image brightness adjustments.
- DocDigitizer achieves high reliability but gives users no control over processing, introducing an unnecessary dependency on vendor teams.
- Rossum, while robust in English-language processing, shows accuracy issues when applied to Portuguese invoices.
To address these industry challenges, our study leverages the following technologies:
- Optical Character Recognition (OCR): Pre-trained models (e.g., YOLO) with transfer learning to enhance recognition accuracy for Portuguese and European character sets.
- Machine Learning (ML): Feed-forward neural networks for character detection, and support vector machines (SVMs) for document and text classification.
- Convolutional Neural Networks (CNNs): Applied in logo detection to validate invoice authenticity and enhance data extraction accuracy.
Study Details
Our study set out to develop an efficient and secure system for invoice processing, integrating OCR, ML, and classification technologies. The objectives were:
- Enable capture via low-cost mobile devices.
- Provide immediate identification and categorization of documents.
- Seamlessly recognize key business fields and integrate with enterprise systems.
- Extract text from scanned or photographed invoices.
- Apply ML models to classify document types and extract fields.
- Offer user-interactive adjustments for improved accuracy.
The initial lack of training data was addressed through transfer learning on pre-trained models (YOLO for OCR and SVM for text classification). This allowed us to build on existing knowledge while customizing the models for region-specific needs.
An SVM was used to classify invoice fields. Its lightweight nature proved advantageous compared with deep learning models such as neural networks, offering faster training times while reaching high precision (98.9%). Textual data was pre-processed before classification to improve accuracy and to accommodate Portuguese-specific syntax and semantics.
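The pre-processing and SVM classification steps can be illustrated with a minimal, standard-library sketch. It is not the study's implementation: the stopword list, the sample labels (`date`, `total`, `tax_id`), the training phrases, and the hyperparameters are all assumptions chosen for illustration, and the linear SVM is trained with plain stochastic gradient descent on the hinge loss in a one-vs-rest setup.

```python
import random
import re
import unicodedata
from collections import Counter

# Illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"a", "o", "e", "de", "da", "do", "com", "em"}

def normalize(text):
    """Lowercase, strip diacritics (ã -> a, ç -> c), tokenize, drop stopwords."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return [tok for tok in re.findall(r"[a-z0-9]+", plain) if tok not in STOPWORDS]

def featurize(text):
    """Bag-of-words counts over the normalized tokens."""
    return Counter(normalize(text))

def score(weights, features):
    """Linear decision value w . x for a sparse feature vector."""
    return sum(weights.get(tok, 0.0) * count for tok, count in features.items())

def train_binary_svm(samples, targets, epochs=30, lr=0.1, lam=0.01, seed=0):
    """SGD on the L2-regularized hinge loss; targets are +1 or -1.

    No bias term is used, to keep the sketch short.
    """
    rng = random.Random(seed)
    weights = {}
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], targets[i]
            violated = y * score(weights, x) < 1.0   # inside the margin or misclassified
            for tok in weights:                      # gradient step on the L2 penalty
                weights[tok] *= 1.0 - lr * lam
            if violated:                             # subgradient step on the hinge loss
                for tok, count in x.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * y * count
    return weights

def train_one_vs_rest(texts, labels):
    """Train one binary SVM per label (that label vs. all others)."""
    samples = [featurize(t) for t in texts]
    return {
        label: train_binary_svm(samples, [1 if l == label else -1 for l in labels])
        for label in sorted(set(labels))
    }

def predict(models, text):
    """Return the label whose binary SVM gives the highest decision value."""
    features = featurize(text)
    return max(models, key=lambda label: score(models[label], features))

# Hypothetical training phrases, as they might appear next to invoice fields.
TEXTS = [
    "Data de emissão", "Data da fatura",
    "Total a pagar", "Valor total com IVA",
    "NIF do cliente", "Número de contribuinte",
]
LABELS = ["date", "date", "total", "total", "tax_id", "tax_id"]

MODELS = train_one_vs_rest(TEXTS, LABELS)
print(predict(MODELS, "DATA DE EMISSÃO"))
```

Stripping diacritics and lowercasing means that OCR output such as "EMISSAO" and "emissão" maps to the same token, which is one simple way to make a classifier less sensitive to accent-recognition errors.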
CNNs reached approximately 98% accuracy for logo identification, helping validate vendor authenticity. Training on the CIFAR-10 dataset and fine-tuning with national logos supported robust performance.
Leveraging microservices and containerization facilitated portability and streamlined deployment across diverse environments, including mobile platforms.
Implementation and Results
Tested against six market solutions, our Azure API-based OCR implementation achieved superior text recognition accuracy with minimal data size requirements, supporting both digital and scanned invoices.
The system identified and extracted business-critical fields, with an average confidence score of 96%. Dynamic templates allowed users to define field mappings, reducing errors in downstream processes.
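As a rough illustration of how a user-defined template might map OCR output to business fields, the sketch below pairs each field name with a regular expression and carries the OCR engine's per-line confidence over to the extracted value. The template format, field names, patterns, sample lines, and the 0.90 review threshold are all hypothetical; the study does not describe its template schema.

```python
import re
from dataclasses import dataclass

@dataclass
class OcrLine:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the OCR engine

@dataclass
class Field:
    value: str
    confidence: float

# Hypothetical user-defined template: field name -> regex with one capture group.
TEMPLATE = {
    "invoice_number": r"Fatura\s+(FT\s*\d+/\d+)",
    "tax_id": r"NIF:?\s*(\d{9})\b",
    "issue_date": r"Data:?\s*(\d{4}-\d{2}-\d{2})",
    "total": r"Total:?\s*([\d.]+,\d{2})",
}

def extract_fields(lines, template):
    """For each template field, take the first OCR line whose text matches."""
    fields = {}
    for name, pattern in template.items():
        for line in lines:
            match = re.search(pattern, line.text)
            if match:
                fields[name] = Field(match.group(1), line.confidence)
                break
    return fields

def average_confidence(fields):
    """Mean confidence over the extracted fields (0.0 if none were found)."""
    if not fields:
        return 0.0
    return sum(f.confidence for f in fields.values()) / len(fields)

def needs_review(fields, template, threshold=0.90):
    """Template fields that are missing or fall below the confidence threshold."""
    return [
        name for name in template
        if name not in fields or fields[name].confidence < threshold
    ]

# Made-up OCR output for a single invoice.
LINES = [
    OcrLine("Fatura FT 2021/123", 0.98),
    OcrLine("NIF: 123456789", 0.97),
    OcrLine("Data: 2021-03-12", 0.95),
    OcrLine("Total: 1.234,56 EUR", 0.88),
]

FIELDS = extract_fields(LINES, TEMPLATE)
print(FIELDS["total"].value, average_confidence(FIELDS), needs_review(FIELDS, TEMPLATE))
```

Keeping the template as plain data, separate from the extraction logic, is what lets users change field mappings without code changes; flagging low-confidence fields gives downstream steps a way to route only doubtful values to a person.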
A web application and REST API were developed to process, classify, and display invoice data. Users could manually adjust extracted data, offering control and flexibility absent in competing solutions like DocDigitizer.
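The manual-adjustment step could look something like the following sketch, which overlays user corrections on the extracted values and records what changed. The function name, the field names, and the idea of keeping a change log are assumptions for illustration; the study does not specify how corrections are stored.

```python
def apply_corrections(extracted, corrections):
    """Overlay user corrections on extracted fields.

    Returns a new record plus a list of changes; the input is not mutated.
    Corrections that repeat the extracted value are ignored.
    """
    updated = dict(extracted)
    changes = []
    for name, new_value in corrections.items():
        old_value = extracted.get(name)
        if new_value != old_value:
            updated[name] = new_value
            changes.append({"field": name, "old": old_value, "new": new_value})
    return updated, changes

# Hypothetical example: the OCR step misread one digit of the total.
extracted = {"tax_id": "123456789", "total": "1.284,56"}
corrections = {"tax_id": "123456789", "total": "1.234,56"}

record, changes = apply_corrections(extracted, corrections)
print(record)
print(changes)
```

A change log of this kind is one possible source of labeled examples for later retraining, since each entry pairs a model output with the value the user accepted.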
The system incorporated GDPR-compliant workflows covering the handling of sensitive data and user access control. Integration with eFaturas enabled real-time data validation and reduced the amount of user input required.
The combination of OCR and ML technologies allowed for high levels of automation and accuracy, addressing the inefficiencies of manual invoice processing. Continuous learning algorithms enabled adaptability to evolving invoice formats and user needs. Containerized microservices provided seamless integration with enterprise resource planning (ERP) systems and mobile applications.
The study demonstrated a scalable and efficient system for invoice processing. By combining OCR, ML, and cloud technologies, we addressed inefficiencies in manual data entry.