Ashari Abidin's Developer Docs

OpenClaw OCR API & Agent

🚁 Helicopter View: Whole System at a Glance

MVP Architecture
What are we building? A lightweight OCR automation pipeline using two independent Docker containers.

📦 Container 1 (OCR API): A FastAPI web server with the Tesseract OCR engine. It exposes an /ocr endpoint that accepts image uploads and returns the extracted text as JSON.
📦 Container 2 (Watcher Agent): A Python watchdog script that monitors a local folder (./uploads). Whenever a new image appears, it automatically sends a POST request to the OCR API and prints the result.

🧩 Data flow: User copies image → uploads/ → Watcher detects → HTTP POST to OCR API → Tesseract processes → JSON text output.
✅ Why this MVP? Fully containerized, portable, scalable, and can evolve into an intelligent document processing system with AI agents later.
+------------------+      +------------------+      +------------------+      +------------------+
| Upload Folder    | ---> | Watcher Agent    | ---> | OCR API          | ---> | JSON Result      |
| ./uploads        |      | (watchdog)       |      | FastAPI+Tesseract|      | Extracted Text   |
+------------------+      +------------------+      +------------------+      +------------------+

📘 Step-by-Step Implementation & Code Explanations

28 Steps Detailed
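1-4 🐳 Docker Installation & Setup
Install Docker Engine and the Compose plugin, enable the service, add your user to the docker group, and verify the versions. Note that docker-compose-plugin is published in Docker's own apt repository; if apt cannot locate it on a stock Ubuntu system, the distribution's docker-compose-v2 package is the likely equivalent.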
sudo apt update && sudo apt install -y docker.io docker-compose-plugin
sudo systemctl enable docker && sudo systemctl start docker
sudo usermod -aG docker $USER && newgrp docker
docker --version && docker compose version
5-7 📁 Project Structure
Create the root folders for the OCR API and the watcher agent, then the uploads and outputs subfolders inside the OCR API project.
mkdir -p ~/.openclaw && cd ~/.openclaw && mkdir -p openclaw-ocr-mvp openclaw-agent
cd openclaw-ocr-mvp && mkdir -p uploads outputs
8 📄 requirements.txt (OCR API)
Python dependencies: FastAPI, Uvicorn (the ASGI server), python-multipart (needed by FastAPI to parse file uploads), Pillow for image loading, pytesseract as the bridge to Tesseract, and NumPy.
fastapi
uvicorn[standard]
python-multipart
pillow
pytesseract
numpy
9 🖥️ main.py (OCR Service)
FastAPI app with /ocr endpoint. Receives image, saves temporarily, runs OCR via Tesseract, returns JSON. Root endpoint for health check.
from fastapi import FastAPI, UploadFile, File
from PIL import Image
import pytesseract, tempfile, shutil, os

app = FastAPI()

@app.get("/")
async def root():
    # Health check: confirms the service is up.
    return {"status": "running"}

@app.post("/ocr")
async def do_ocr(file: UploadFile = File(...)):
    # Accept only the image extensions this MVP supports.
    ext = os.path.splitext(file.filename or "")[1].lower()
    if ext not in [".jpg", ".jpeg", ".png"]:
        return {"error": "Only image files supported"}
    # Save the upload to a temporary file so Pillow can open it from disk.
    with tempfile.NamedTemporaryFile(delete=False, suffix=ext) as tmp:
        shutil.copyfileobj(file.file, tmp)
        path = tmp.name
    try:
        with Image.open(path) as image:
            text = pytesseract.image_to_string(image)
    finally:
        # delete=False keeps the file after the with block, so remove it explicitly.
        os.remove(path)
    return {"filename": file.filename, "text": text}
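The extension check in do_ocr can also be pulled into a small helper that is easy to test without starting the server. The following is only a sketch: is_supported_image and SUPPORTED_EXTENSIONS are hypothetical names that do not appear in main.py, and the helper looks at the filename only, not at the file contents.

```python
import os

# Hypothetical helper; mirrors the extension list used by the /ocr endpoint.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def is_supported_image(filename):
    """Return True if the filename ends in one of the supported image extensions."""
    if not filename:
        return False  # covers None and the empty string
    ext = os.path.splitext(filename)[1].lower()
    return ext in SUPPORTED_EXTENSIONS

if __name__ == "__main__":
    for name in ["scan.JPG", "invoice.png", "notes.pdf", "README"]:
        print(name, is_supported_image(name))
```

Because the check is purely name-based, a renamed non-image file would still pass it and only fail later when Pillow tries to open it.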
10 🐳 Dockerfile (OCR API)
Build image from Python 3.11, install Tesseract engine, copy requirements, install pip dependencies, expose port 8000, run Uvicorn.
FROM python:3.11
WORKDIR /app
RUN apt-get update && apt-get install -y tesseract-ocr
COPY requirements.txt .
RUN pip install --upgrade pip && pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
11 🧩 docker-compose.yml (OCR API)
Simplifies orchestration: builds the container, maps port 8000, and mounts the ./uploads and ./outputs volumes for persistence.
services:
  ocr-api:
    build: .
    container_name: ocr-api
    ports:
      - "8000:8000"
    volumes:
      - ./uploads:/app/uploads
      - ./outputs:/app/outputs
    restart: unless-stopped
12-14 🔨 Build & Run OCR API
cd ~/.openclaw/openclaw-ocr-mvp
docker compose build --no-cache
docker compose up -d
curl http://localhost:8000 # Expected: {"status":"running"}
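Right after docker compose up -d the API may need a moment before it answers, so a single curl can fail even though nothing is wrong. A minimal polling sketch using only the standard library is shown below; http_probe and wait_until_ready are hypothetical helper names, and the attempt count and delay are arbitrary assumptions.

```python
import time
import urllib.error
import urllib.request

def http_probe(url="http://localhost:8000/"):
    """Return True if the URL answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def wait_until_ready(probe=http_probe, attempts=10, delay=1.0):
    """Call probe() up to `attempts` times, sleeping `delay` seconds after each failure.

    Returns the number of tries it took, or None if the service never answered.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            time.sleep(delay)
    return None
```

Calling wait_until_ready() from the host returns a small integer once the root endpoint responds, or None if the container never came up within the allotted attempts.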
15-17 👁️ Watcher Agent (requirements + watcher.py)
requirements.txt: watchdog (filesystem events), requests (HTTP calls). watcher.py: monitors /data/inbox, sends new images to OCR API.
# requirements.txt (inside openclaw-agent)
watchdog
requests
# watcher.py
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
import requests, time, os

WATCH_FOLDER = "/data/inbox"
OCR_API = "http://host.docker.internal:8000/ocr"

class Handler(FileSystemEventHandler):
    def on_created(self, event):
        # Ignore directories and anything that is not a supported image.
        if event.is_directory:
            return
        path = event.src_path
        ext = os.path.splitext(path)[1].lower()
        if ext not in [".jpg", ".jpeg", ".png"]:
            return
        print(f"[NEW FILE] {path}")
        try:
            with open(path, "rb") as f:
                response = requests.post(OCR_API, files={"file": f}, timeout=60)
            print(response.json())
        except (OSError, requests.RequestException, ValueError) as exc:
            # Log and carry on so one failed upload does not stop the watcher.
            print(f"[ERROR] {path}: {exc}")

if __name__ == "__main__":
    os.makedirs(WATCH_FOLDER, exist_ok=True)
    observer = Observer()
    observer.schedule(Handler(), WATCH_FOLDER, recursive=False)
    observer.start()
    print("Watching folder...")
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
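One caveat with on_created is that the event fires as soon as the file appears, which can be before a large copy has finished. A possible mitigation, sketched here under the assumption that a file whose size stops changing is complete, is to poll the size before uploading. wait_for_stable_size is a hypothetical helper, not part of watchdog, and its default intervals are arbitrary.

```python
import os
import time

def wait_for_stable_size(path, checks=3, interval=0.5, timeout=30.0):
    """Return True once the file is non-empty and its size is unchanged for `checks` polls in a row.

    Returns False if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable = 0
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file vanished or is not readable yet
        if size > 0 and size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            stable = 0  # size changed (or file missing), start counting again
        last_size = size
        time.sleep(interval)
    return False
```

Inside the handler this could be called just before the file is opened, skipping the upload when it returns False.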
18-19 🐳 Dockerfile & Build Watcher
Dockerfile for watcher: Python 3.11, copy requirements, install, run watcher.py. Build as openclaw-agent.
FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "watcher.py"]
cd ~/.openclaw/openclaw-agent
docker build -t openclaw-agent .
20 🚀 Run Watcher Container (with mount)
Bind-mount the uploads folder from the OCR API project into the container at /data/inbox. Pass --add-host so that host.docker.internal resolves to the host, where the OCR API is published on port 8000.
docker run -it --name openclaw-agent -v ~/.openclaw/openclaw-ocr-mvp/uploads:/data/inbox --add-host=host.docker.internal:host-gateway openclaw-agent
You should see: Watching folder...
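watcher.py hard-codes the OCR endpoint to host.docker.internal, which is why the --add-host flag is needed. If the endpoint ever has to change, one option is to read it from an environment variable with the current value as the fallback. This is a sketch only: OCR_API_URL is a hypothetical variable name and resolve_ocr_api is a hypothetical helper.

```python
import os

# Same default as the OCR_API constant in watcher.py.
DEFAULT_OCR_API = "http://host.docker.internal:8000/ocr"

def resolve_ocr_api(env=os.environ):
    """Return the OCR endpoint, preferring the (hypothetical) OCR_API_URL variable."""
    url = env.get("OCR_API_URL", "").strip()
    return url or DEFAULT_OCR_API

if __name__ == "__main__":
    print(resolve_ocr_api())
```

With this in place the value could be supplied at start-up through docker run's -e option, for example -e OCR_API_URL=http://host.docker.internal:8000/ocr.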
21-22 🧪 Test Automation Workflow
Copy any image into the uploads folder. The watcher detects and prints OCR result automatically.
cp ~/Downloads/sample.jpg ~/.openclaw/openclaw-ocr-mvp/uploads/
Expected watcher output: [NEW FILE] /data/inbox/sample.jpg followed by JSON with extracted text.
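To tell an API problem apart from a watcher problem, it can help to post an image to /ocr directly from the host. The sketch below does this with the standard library only, building the multipart/form-data body by hand. build_multipart and post_image are hypothetical helper names, and the image/jpeg content type is an assumption. The form field is named file to match the parameter of do_ocr.

```python
import os
import urllib.request
import uuid

def build_multipart(field, filename, data, content_type="application/octet-stream"):
    """Encode one file as multipart/form-data; returns (body_bytes, content_type_header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"

def post_image(path, url="http://localhost:8000/ocr"):
    """POST the image at `path` to the OCR API and return the raw JSON response text."""
    with open(path, "rb") as f:
        data = f.read()
    body, header = build_multipart("file", os.path.basename(path), data, "image/jpeg")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read().decode("utf-8")
```

If post_image returns JSON with a text field while the watcher prints nothing, the issue is more likely in the mount or the filesystem events than in the OCR service.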
23-27 🔄 Container Management
docker ps # see both containers running
docker compose restart # run from ~/.openclaw/openclaw-ocr-mvp
docker restart openclaw-agent
docker stop openclaw-agent # stop watcher
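The docker ps check can also be scripted. The sketch below parses the output of docker ps --format '{{.Names}}' and reports which of the two containers from this guide are not running. missing_containers and running_container_names are hypothetical helper names, and the script assumes the docker CLI is on PATH.

```python
import subprocess

# Container names used in this guide.
EXPECTED = ("ocr-api", "openclaw-agent")

def missing_containers(ps_output, expected=EXPECTED):
    """Given one container name per line, return the expected names that are absent."""
    running = {line.strip() for line in ps_output.splitlines() if line.strip()}
    return [name for name in expected if name not in running]

def running_container_names():
    """Ask the Docker CLI for the names of running containers, one per line."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling missing_containers(running_container_names()) returns an empty list when both containers are up.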
28 🎯 Final Architecture Status
Both containers run independently. OCR API listens on port 8000, watcher monitors uploads folder. This MVP is ready to be extended with AI reasoning, document classification, or multi-agent orchestration.
🔴 OpenClaw OCR MVP: bilingual documentation with clear separation, a helicopter-view prologue, and an explanation of every code block and step.