OpenClaw OCR MVP · PaddleOCR
MVP Focus (Stage 1)
In scope
- Upload image → OCR runs → JSON output
- OpenClaw can call it via REST API
- Fully Dockerized (docker-compose)
- Easy to scale later
- PaddleOCR as the core engine
Not yet (Stage 2)
- Queue / Redis
- Database, vector DB
- Multi-agent, Kubernetes
- Complex workflows
- Heavy async workers
Steps 1-3: Install Docker & Create the Project
Ubuntu / Linux:
sudo apt update && sudo apt install docker.io docker-compose -y
docker --version
docker compose version
mkdir openclaw-ocr-mvp
cd openclaw-ocr-mvp
(If `docker compose` is not recognized, the `docker-compose` package installed above provides the older hyphenated `docker-compose` command; use that form instead.)
Final folder structure:
openclaw-ocr-mvp/
├── app/
│   ├── main.py
│   ├── ocr.py
│   ├── utils.py (optional preprocessing)
│   └── requirements.txt
├── uploads/
├── Dockerfile
└── docker-compose.yml
Core Code: PaddleOCR + FastAPI Endpoint
app/requirements.txt
fastapi
uvicorn
python-multipart
paddleocr
paddlepaddle
opencv-python
pillow
app/ocr.py (engine)
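from paddleocr import PaddleOCR

# Load the model once at import time so every request reuses it.
ocr = PaddleOCR(use_angle_cls=True, lang='en')


def run_ocr(image_path):
    """Run OCR on an image file and return a list of {text, confidence} dicts."""
    result = ocr.ocr(image_path)
    output = []
    # result[0] holds the detected lines of the first (only) image;
    # each line is [box, (text, confidence)]. This layout is assumed from
    # PaddleOCR 2.x and may differ in other major versions.
    if result and result[0]:
        for line in result[0]:
            output.append({
                "text": line[1][0],
                "confidence": float(line[1][1])
            })
    return output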
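The indexing in run_ocr (line[1][0] and line[1][1]) is easier to follow against a concrete value. The following is a minimal sketch, assuming the PaddleOCR 2.x result layout of one list per image in which each entry is [box, (text, confidence)]. The names `parse_result` and `fake_result` are made up for illustration, the sample data is invented, and no model is loaded.

```python
def parse_result(result):
    """Flatten a PaddleOCR 2.x style result into {text, confidence} dicts.

    Assumed layout: result[0] is the list of lines for the first image,
    and each line is [box, (text, confidence)].
    """
    output = []
    if result and result[0]:
        for box, (text, confidence) in result[0]:
            # box is the four corner points of the text region; unused here
            output.append({"text": text, "confidence": float(confidence)})
    return output


# Invented sample shaped like the assumed layout (one image, one text line).
fake_result = [
    [
        [[[0, 0], [120, 0], [120, 30], [0, 30]], ("Hello", 0.97)],
    ]
]

print(parse_result(fake_result))
```

An image with no detected text is represented here as `[None]`, which the `result[0]` check turns into an empty list.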
app/main.py (FastAPI)
from fastapi import FastAPI, UploadFile, File
import shutil
import uuid

from ocr import run_ocr

app = FastAPI(title="OpenClaw OCR API", version="1.0")


@app.get("/")
def home():
    # Simple status check
    return {"status": "running", "service": "OpenClaw OCR MVP"}


@app.post("/ocr")
async def process_ocr(file: UploadFile = File(...)):
    # Save the upload under a unique name, then run OCR on the saved file
    filename = f"uploads/{uuid.uuid4()}.png"
    with open(filename, "wb") as buffer:
        shutil.copyfileobj(file.file, buffer)
    result = run_ocr(filename)
    return {"status": "success", "data": result}
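The endpoint above stores every upload with a `.png` suffix, whatever the client actually sent. The following is a minimal sketch of one way to derive the saved filename from the upload's original name instead, assuming a small allow-list of image extensions. `ALLOWED_EXTENSIONS` and `build_upload_path` are hypothetical names that are not part of the original code.

```python
import uuid
from pathlib import Path

# Hypothetical allow-list; adjust to the image types you expect.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}


def build_upload_path(original_name, upload_dir="uploads"):
    """Return a unique path in upload_dir that keeps a safe image extension.

    Raises ValueError when the extension is not in the allow-list.
    """
    suffix = Path(original_name or "").suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or 'none'}")
    # Only the extension is reused; the rest of the name is a random UUID,
    # so directory components in the client-supplied name are ignored.
    return str(Path(upload_dir) / f"{uuid.uuid4()}{suffix}")
```

Inside `process_ocr`, `filename = build_upload_path(file.filename)` could replace the hard-coded path, with the `ValueError` translated into an HTTP 400 response.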
Dockerfile + docker-compose.yml
Dockerfile
FROM python:3.11
WORKDIR /app
COPY app/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app .
RUN mkdir -p uploads
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
docker-compose.yml
version: '3'
services:
  paddleocr:
    build: .
    container_name: paddleocr-api
    ports:
      - "8000:8000"
    volumes:
      - ./uploads:/app/uploads
docker compose up --build
(The first run takes a while because the PaddleOCR models have to be downloaded.)
Test API & Sample OCR Result
- Open http://localhost:8000/docs in a browser; the Swagger UI is ready for testing image uploads.
- Endpoint POST /ocr: upload an image file and get JSON back:
{
  "status": "success",
  "data": [
    { "text": "Nama Mahasiswa", "confidence": 0.99 },
    { "text": "ASHARI ABIDIN", "confidence": 0.98 }
  ]
}
Calling the endpoint from Python:
import requests

url = "http://localhost:8000/ocr"
with open("sample.png", "rb") as f:
    files = {"file": f}
    response = requests.post(url, files=files)
print(response.json())
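Once the JSON is back, a caller usually wants plain text rather than a list of objects. The following is a minimal sketch of post-processing the response shape shown above, assuming a confidence cutoff is wanted. `extract_text` and the `min_confidence` default are hypothetical, and the third entry in `sample` is invented to show the filter at work.

```python
def extract_text(response_json, min_confidence=0.5):
    """Join recognized lines whose confidence is at least min_confidence."""
    if response_json.get("status") != "success":
        return ""
    lines = [
        item["text"]
        for item in response_json.get("data", [])
        if item.get("confidence", 0.0) >= min_confidence
    ]
    return "\n".join(lines)


sample = {
    "status": "success",
    "data": [
        {"text": "Nama Mahasiswa", "confidence": 0.99},
        {"text": "ASHARI ABIDIN", "confidence": 0.98},
        {"text": "~~", "confidence": 0.21},  # invented low-confidence noise
    ],
}

print(extract_text(sample))
```

With the client code above, `extract_text(response.json())` would give the same kind of joined string for a real response.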
Add app/utils.py with Otsu thresholding for noisy scans.
import cv2


def preprocess(path):
    # Convert to grayscale, binarize with Otsu's threshold, overwrite the file
    img = cv2.imread(path)
    if img is None:
        raise ValueError(f"could not read image: {path}")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
    cv2.imwrite(path, thresh)
Then call preprocess(filename) in main.py before running OCR.
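To make clear what Otsu thresholding does to a noisy scan, here is a pure-Python illustration of the idea: pick the gray level that best separates dark and bright pixels by maximizing the between-class variance. This is a sketch for understanding only, not OpenCV's implementation; `otsu_threshold` and `binarize` are hypothetical helper names, and the pixel list is invented.

```python
def otsu_threshold(pixels):
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels <= t form the dark class, pixels > t form the bright class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark = 0  # number of pixels in the dark class so far
    sum_dark = 0     # sum of gray levels in the dark class so far
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_bright = total - weight_dark
        if weight_bright == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_bright = (sum_all - sum_dark) / weight_bright
        var_between = weight_dark * weight_bright * (mean_dark - mean_bright) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, t):
    """Map pixels above t to white (255) and the rest to black (0)."""
    return [255 if p > t else 0 for p in pixels]


# Invented example: dark ink around level 10, bright paper around level 200.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
binary = binarize(pixels, t)
```

In the real `preprocess` function, `cv2.threshold` with `cv2.THRESH_OTSU` performs this selection on the whole grayscale image.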
Why This Is the Right MVP
- Modular: the OCR engine and the API layer are separate
- API-based: ready to integrate with OpenClaw as a tool / agent action
- Dockerized & scalable: easy to replicate horizontally later
- Agent call: POST /ocr returns JSON directly
- Easy to upgrade to Stage 2 (PDF, queue, GPU)
CPU: 4 cores | RAM: 8 GB | Disk: 20 GB
PaddleOCR is production-ready and lightweight enough for the initial stage.
Stage 2 upgrades (later, not now): PDF OCR, table extraction, Redis queue, GPU, async workers, AI cleanup, RAG, vector DB, spreadsheet export.
Why PaddleOCR? It is a balanced open-source option: multilingual, capable of document parsing, actively developed, and popular for modern production document AI.
Quick Command Reference
Build & start
docker compose up --build
Stop
docker compose down
Realtime logs
docker compose logs -f
Restart
docker compose restart
Make sure port 8000 is open, then visit http://localhost:8000 to check the status. Interactive Swagger docs are available at http://localhost:8000/docs.
OpenClaw → OCR MVP Integration Flow
The simplest architecture that is still "right": already API-based, Dockerized, scalable, and ready to be called by an agent. No over-engineering.
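On the agent side, the integration amounts to one HTTP call: send an image to POST /ocr and read the JSON. The following is a minimal sketch using only the standard library, assuming the service runs at http://localhost:8000 and expects the form field name `file` as in the endpoint definition. `build_multipart` and `ocr_tool` are hypothetical names; how OpenClaw registers a tool is not covered here.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field_name, filename, content):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def ocr_tool(image_path, base_url="http://localhost:8000"):
    """Send an image to POST /ocr and return the parsed JSON response."""
    path = Path(image_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/ocr",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, `ocr_tool("sample.png")` would return the same structure as the JSON example earlier. The `requests` client shown in the test section does the same job with less code if that dependency is acceptable.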
Conclusion: OpenClaw + PaddleOCR MVP
- Upload image → OCR runs → stable JSON result.
- A single docker compose command and the service is ready to use.
- PaddleOCR offers a strong balance of accuracy, speed, and ease of deployment.
- This architecture is a suitable foundation for OpenClaw as an intelligent agent that needs to extract text from images and documents.
- When traffic grows, scale by adding a load balancer, running multiple containers, or moving to GPU, without breaking the core design.