Project Description
I manage a manufacturing plant and am building "Paideia," an AI maintenance assistant. I need a Python backend developer to build a Dockerized RAG (Retrieval-Augmented Generation) system.
The Core Goal: I need a system where I can personally upload and manage a library of 100+ technical manuals (PDFs). I must be able to add new manuals or delete old ones at any time, on my own, through a simple API or interface, without needing a developer to run scripts for me.
The Roadmap:
Phase 1 (Cloud): The system will run on Google Cloud Run to take advantage of cloud compute for fast processing.
Phase 2 (Local): I must be able to eventually "download" the entire Dockerized app and run it offline on a Windows workstation.
Key Technical Requirements
1. Self-Service "Bulk Upload" Capability
I must be able to upload batches of manuals (e.g., 10-20 PDFs at once).
Robust Background Queue: The system must use a background worker (e.g., Celery/Redis or Cloud Tasks) to process these files. I need to be able to upload 1 GB of data, close my laptop, and have the server finish the job.
Error Handling: If I upload 50 manuals and 1 is corrupt, the system must skip the bad one and finish the other 49. It should not crash the whole batch.
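The queue and error-handling requirements above can be illustrated with a minimal, standard-library-only sketch. It is not a Celery or Cloud Tasks implementation: an in-process `queue.Queue` and a daemon thread stand in for the real broker and worker, so unlike Redis or Cloud Tasks it does not survive a server restart. The names `enqueue_batch`, `worker`, `Job`, and `extract_text` are hypothetical, and the "corrupt file" check (a missing `%PDF-` header) is only a placeholder for real PDF parsing. What it shows is the pattern that matters: the upload call returns a job id immediately, and each file is wrapped in its own `try`/`except`, so one bad manual is recorded and skipped while the rest of the batch continues.

```python
import queue
import threading
import uuid
from dataclasses import dataclass, field


def extract_text(data: bytes) -> str:
    """Hypothetical stand-in for real PDF parsing.

    Treats a file as corrupt unless it starts with the "%PDF-" header bytes.
    """
    if not data.startswith(b"%PDF-"):
        raise ValueError("not a valid PDF")
    return data.decode("latin-1")


@dataclass
class Job:
    job_id: str
    total: int
    done: int = 0
    succeeded: list = field(default_factory=list)
    failed: dict = field(default_factory=dict)  # filename -> error message


jobs: dict = {}  # job_id -> Job (a real system would persist this)
work_queue: queue.Queue = queue.Queue()


def enqueue_batch(files: dict) -> str:
    """Register a job, queue every file, and return the job id immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = Job(job_id=job_id, total=len(files))
    for name, data in files.items():
        work_queue.put((job_id, name, data))
    return job_id


def worker() -> None:
    """Process queued files one at a time; an error affects only that file."""
    while True:
        job_id, name, data = work_queue.get()
        job = jobs[job_id]
        try:
            extract_text(data)  # chunking and embedding would follow here
            job.succeeded.append(name)
        except Exception as exc:  # record the failure and move on
            job.failed[name] = str(exc)
        finally:
            job.done += 1
            work_queue.task_done()


# One background worker thread; it keeps running after enqueue_batch returns.
threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    jid = enqueue_batch({"pump.pdf": b"%PDF-1.7 ...", "broken.pdf": b"garbage"})
    work_queue.join()  # wait until every queued file has been handled
    print(jobs[jid].succeeded, jobs[jid].failed)
    # prints: ['pump.pdf'] {'broken.pdf': 'not a valid PDF'}
```

With a real broker, the same per-file isolation would live inside the task function, ideally with one task per file so a crash in one task cannot take down its neighbours.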
2. Full Data Lifecycle (Add/Delete/Update)
Granular Deletion: I need to be able to delete a specific manual (e.g., "Remove Safety_Manual_2023.pdf").
Deep Cleaning: When I delete a file, the system must physically remove every vector chunk associated with that file from the database, so the AI can never retrieve (or "hallucinate" from) content that has been removed.
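One common way to make deletion complete is to tag every chunk with its source filename when it is stored, then delete by that tag rather than by tracking individual chunk ids. ChromaDB's `Collection.delete` accepts a `where` metadata filter for this, and Qdrant supports deleting points by payload filter. The sketch below is a hypothetical in-memory stand-in (the `VectorStore` class and its methods are illustrative names, not any library's API) that shows the logic, including "update" implemented as delete-then-add.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    source: str      # filename the chunk came from, e.g. "Safety_Manual_2023.pdf"
    text: str
    embedding: list  # placeholder for the vector


class VectorStore:
    """Hypothetical in-memory stand-in for a vector DB collection."""

    def __init__(self) -> None:
        self._chunks: dict = {}  # chunk_id -> Chunk

    def add_manual(self, source: str, texts: list) -> None:
        for i, text in enumerate(texts):
            chunk_id = f"{source}::{i}"  # deterministic ids tied to the file
            self._chunks[chunk_id] = Chunk(chunk_id, source, text, embedding=[])

    def delete_manual(self, source: str) -> int:
        """Remove every chunk whose source matches; return how many were removed."""
        doomed = [cid for cid, c in self._chunks.items() if c.source == source]
        for cid in doomed:
            del self._chunks[cid]
        return len(doomed)

    def replace_manual(self, source: str, texts: list) -> None:
        """Update = delete all old chunks first, then add the new version."""
        self.delete_manual(source)
        self.add_manual(source, texts)

    def sources(self) -> set:
        return {c.source for c in self._chunks.values()}

    def count(self, source: str) -> int:
        return sum(1 for c in self._chunks.values() if c.source == source)
```

Deleting before re-adding matters for updates: if a revised manual produces fewer chunks than the old one, simply overwriting by id would leave the old trailing chunks behind, which is exactly the stale data this requirement is meant to prevent.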
3. Portable Architecture (No Vendor Lock-in)
Docker is Mandatory: The API, Database, and Worker must run in containers.
Portable Vector DB: Use ChromaDB, Qdrant, or Weaviate. Do not use proprietary cloud databases (like Pinecone) that I cannot run offline later.
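Portability between Cloud Run and an offline Windows workstation is easiest if the containers never hard-code where their services live. A minimal sketch, assuming configuration comes from environment variables: the variable names `VECTOR_DB_URL`, `BROKER_URL`, and `DATA_DIR`, and the default hostnames `vectordb` and `redis` (as they might appear as Docker Compose service names), are hypothetical choices for illustration. Port 8000 is ChromaDB's default server port; a Qdrant setup would typically use 6333 instead.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    vector_db_url: str
    broker_url: str
    data_dir: str  # where uploaded PDFs and persistent DB files are kept


def load_settings(env=None) -> Settings:
    """Read connection settings from environment variables.

    The same image runs in the cloud and locally; only these values change.
    Defaults are hypothetical local (Docker Compose) values.
    """
    env = os.environ if env is None else env
    return Settings(
        vector_db_url=env.get("VECTOR_DB_URL", "http://vectordb:8000"),
        broker_url=env.get("BROKER_URL", "redis://redis:6379/0"),
        data_dir=env.get("DATA_DIR", "/data"),
    )
```

Moving from the cloud to the workstation then means changing environment values, not code.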
Tech Stack
Backend: Python (FastAPI).
Infrastructure: Google Cloud Platform (Cloud Run).
Database: Dockerized Vector DB (Chroma/Qdrant) with Persistent Storage.
Queue: Async Task Queue (for handling large uploads).
Deliverables
Dockerized Source Code: Ready to deploy.
API Endpoints: For Upload (Bulk), Status Checking, and Deletion.
Basic "Admin" UI (Optional but Preferred): A simple HTML/Streamlit page to drag-and-drop files and see a progress bar (e.g., "Processing 3 of 10...").
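The status endpoint and the progress bar can share one small record per batch. The sketch below is a hypothetical shape for that record (the `JobStatus` class, its field names, and the "Finished: ..." wording are illustrative assumptions); it produces the "Processing 3 of 10..." message from the brief and a JSON body that a UI could poll.

```python
import json
from dataclasses import dataclass, field


@dataclass
class JobStatus:
    job_id: str
    total: int     # files in the batch
    done: int = 0  # files finished (successfully or not)
    failed: list = field(default_factory=list)  # names of files that failed

    @property
    def finished(self) -> bool:
        return self.done >= self.total

    def percent(self) -> int:
        """Whole-number percentage for the progress bar."""
        if self.total == 0:
            return 100
        return round(100 * self.done / self.total)

    def message(self) -> str:
        if self.finished:
            ok = self.total - len(self.failed)
            return f"Finished: {ok} of {self.total} processed, {len(self.failed)} failed"
        # done counts completed files, so the file in progress is done + 1
        return f"Processing {self.done + 1} of {self.total}..."

    def to_json(self) -> str:
        """Response body a status endpoint could return."""
        return json.dumps({
            "job_id": self.job_id,
            "done": self.done,
            "total": self.total,
            "percent": self.percent(),
            "message": self.message(),
            "failed": self.failed,
        })
```

Including the list of failed filenames in the response lets the UI show which manuals were skipped, so they can be fixed and re-uploaded without repeating the whole batch.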