I am looking for an experienced developer (or small team) to build a configurable AI-based system for document processing and data extraction.
The system must not only extract text from documents, but also interpret the extracted data and validate it against external datasets.
Core requirements:
Import documents (PDF, images) from folders or APIs
AI-based document classification (different document types)
Data extraction using AI (not only template-based)
Validation and matching logic against external data sources (e.g. product catalogs or structured datasets)
Ability to handle partial matches, inconsistencies, or missing data
Configurable rules for matching and validation (no hardcoding)
Feedback loop: when data is not correctly recognized, the system should let a user correct it and use those corrections to improve future results
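To make the matching requirements above more concrete, here is a rough sketch of the behavior I have in mind. It is an illustration only, not a required design: the names `MatchRule`, `load_rules` and `match_field`, the JSON rule format, and the threshold values are all hypothetical, and it uses Python's standard `difflib` for string similarity where a real implementation may well choose something better.

```python
import json
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical rule format: rules live in configuration, not in code.
RULES_JSON = """
[
  {"field": "product_name", "min_score": 0.95, "review_score": 0.70}
]
"""


@dataclass
class MatchRule:
    field: str           # extracted field this rule applies to
    min_score: float     # similarity at or above this is accepted automatically
    review_score: float  # at or above this (but below min_score) goes to review


def load_rules(raw: str) -> Dict[str, MatchRule]:
    """Parse the JSON rule list into a dict keyed by field name."""
    return {r["field"]: MatchRule(**r) for r in json.loads(raw)}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1], based on difflib."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_field(value: Optional[str], catalog: List[str], rule: MatchRule) -> dict:
    """Match one extracted value against catalog entries under the given rule."""
    if not value or not value.strip():
        # Missing data is reported explicitly instead of failing.
        return {"field": rule.field, "status": "missing", "match": None, "score": 0.0}
    if not catalog:
        return {"field": rule.field, "status": "unmatched", "match": None, "score": 0.0}
    best = max(catalog, key=lambda entry: similarity(value, entry))
    score = similarity(value, best)
    if score >= rule.min_score:
        status = "matched"
    elif score >= rule.review_score:
        status = "review"  # partial match: route to a human
    else:
        status = "unmatched"
        best = None
    return {"field": rule.field, "status": status, "match": best, "score": round(score, 3)}


rules = load_rules(RULES_JSON)
catalog = ["Widget A-100", "Gadget B-200"]
for raw in ["Widget A-100", "Widgt A100", None]:
    print(match_field(raw, catalog, rules["product_name"]))
```

The point of the sketch is that thresholds and field names come from configuration, and that partial matches and missing values produce an explicit status rather than an error.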
System features:
Modular architecture
API-based integration with external systems (ERP or others)
Basic interface for configuration and validation
Logging, error handling, and traceability
Nice to have:
Use of LLMs or advanced AI models for document understanding
Human-in-the-loop validation workflow
Ability to “learn” from corrections over time
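As an example of the simplest form of "learning" I would accept for the MVP, the sketch below stores human corrections and reapplies them once the same correction has been confirmed often enough. This is a lookup-based override, not model retraining, and everything in it (the `CorrectionStore` class, the `min_votes` parameter, the sample document type and values) is hypothetical and only meant to show the idea.

```python
from collections import Counter, defaultdict


class CorrectionStore:
    """Remembers human corrections and reuses them on later documents."""

    def __init__(self, min_votes: int = 2):
        # A correction is only trusted after it has been made min_votes times.
        self.min_votes = min_votes
        self._votes = defaultdict(Counter)

    def record(self, doc_type: str, field: str, raw_value: str, corrected: str) -> None:
        """Store one correction made by a human reviewer."""
        self._votes[(doc_type, field, raw_value)][corrected] += 1

    def apply(self, doc_type: str, field: str, raw_value: str) -> str:
        """Return the learned correction if trusted, else the raw value unchanged."""
        votes = self._votes.get((doc_type, field, raw_value))
        if not votes:
            return raw_value
        value, count = votes.most_common(1)[0]
        return value if count >= self.min_votes else raw_value


store = CorrectionStore(min_votes=2)
store.record("invoice", "supplier", "ACME Gmbh.", "ACME GmbH")
print(store.apply("invoice", "supplier", "ACME Gmbh."))  # one vote: still the raw value
store.record("invoice", "supplier", "ACME Gmbh.", "ACME GmbH")
print(store.apply("invoice", "supplier", "ACME Gmbh."))  # two votes: correction applied
```

A production version would persist the corrections and could also feed them back into prompts or model fine-tuning; I am open to proposals on that.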
Tech preferences:
I am open to proposals, but would prefer:
Python (for AI/ML components)
REST APIs
Docker-based deployment on Linux
Use of open-source AI tools where possible
Goal:
Develop an MVP that can be extended into a more advanced system.