Case Study
AI
BehördeBot KI
BehördeBot is an AI-powered assistant that helps users fill out German public-service forms.
YEAR
2025
TEAM
Rohit Kulkarni
TECH-STACK
RAG, AI
LOCATION
Germany
Published on: October 24, 2025
Project Introduction
BehördeBot is a modular system with a Flask backend (core processing) and a Streamlit frontend (UI). Users upload scanned or photographed forms (PDF/JPG/PNG). The backend converts pages to images, runs OCR, and visualizes recognized text and bounding boxes.
The system classifies the form type, extracts structured fields (with confidences), performs named-entity extraction, applies heuristic error checks, and runs a Retrieval-Augmented Generation (RAG) search across a local legal-doc corpus to surface relevant legal snippets.
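The heuristic error checks mentioned above could look roughly like the following minimal sketch. The field names (`geburtsdatum`, `plz`), the confidence threshold, and the rule set are assumptions made for illustration, not the project's actual checks.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional, Tuple

LOW_CONFIDENCE = 0.6  # assumed threshold, not taken from the project


def check_date(value: str) -> Optional[str]:
    """Return an error message unless value parses as DD.MM.YYYY."""
    try:
        datetime.strptime(value.strip(), "%d.%m.%Y")
    except ValueError:
        return "expected a valid date in DD.MM.YYYY format"
    return None


def check_postal_code(value: str) -> Optional[str]:
    """German postal codes consist of exactly five digits."""
    if re.fullmatch(r"\d{5}", value.strip()) is None:
        return "expected a five-digit postal code"
    return None


# Hypothetical mapping from field name to its format check.
CHECKS = {"geburtsdatum": check_date, "plz": check_postal_code}


def validate(fields: Dict[str, Tuple[str, float]]) -> Dict[str, List[str]]:
    """Collect issues per field from (value, OCR confidence) pairs."""
    issues: Dict[str, List[str]] = {}
    for name, (value, confidence) in fields.items():
        messages = []
        if confidence < LOW_CONFIDENCE:
            messages.append("low OCR confidence, please verify")
        check = CHECKS.get(name)
        if check is not None:
            error = check(value)
            if error:
                messages.append(error)
        if messages:
            issues[name] = messages
    return issues
```

Calling `validate` with a dictionary of extracted fields returns only the fields that need attention, which is a convenient shape for highlighting rows in an editable table.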
Users can translate content German↔English, edit detected field values in an interactive table, save structured JSON, and generate translated, layout-preserving PDFs for download. An evaluation dashboard computes OCR accuracy (word error rate, WER, and character error rate, CER), translation quality (BLEU), and usability metrics (System Usability Scale, SUS, and task completion). Session outputs and logs are stored in uploads/ and outputs/. The architecture is extensible: more form types and LLMs can be added, and cloud databases can be connected.
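To make the OCR metrics concrete, here is a minimal sketch of how WER and CER are commonly computed: the Levenshtein edit distance between the reference and the OCR output, divided by the reference length in words or characters. The function names are illustrative, and the dashboard may well rely on a library instead.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For the reference "Name des Antragstellers" and the OCR output "Name dos Antragstellers", one of three words is wrong, so the WER is 1/3, while only one of 23 characters differs, so the CER is 1/23. Both metrics can exceed 1.0 when the OCR output contains many spurious insertions.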
Challenges
OCR reliability on poor scans: Handwritten text, low-resolution scans, or complex layouts reduce extraction accuracy and downstream field detection.
Form variability & layout complexity: Many government forms have subtle layout differences; robust classification and field-mapping for many templates is labor-intensive.
Legal-context relevance & trust: RAG results must be legally accurate, up to date, and presented with appropriate caveats so that they do not mislead users.
AI solution
Modular OCR + NER pipeline: Combine Tesseract (or better OCR engines) with spaCy-based NER and heuristics to extract field candidates and entity types (names, dates, addresses).
Form classification + structured extraction: ML classifiers detect the form type, and template-aware extraction logic (bounding-box + semantic parsing) maps OCR text to fields.
Legal RAG for context-aware guidance: A local vector store over legal documents fetches supporting snippets per query or page, helping to explain form requirements and cite references.
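The retrieval step of the legal RAG component can be illustrated with a toy local index. This is a sketch under stated assumptions: it swaps in bag-of-words cosine similarity for the embedding model a real vector store would use, the `LocalIndex` class is hypothetical, and the snippets are invented placeholders rather than real legal text.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class LocalIndex:
    """Hypothetical in-memory index over snippet id -> snippet text."""

    def __init__(self, snippets: Dict[str, str]):
        self.snippets = snippets
        self.vectors = {sid: vectorize(text) for sid, text in snippets.items()}

    def search(self, query: str, k: int = 2) -> List[Tuple[str, float]]:
        """Return up to k (snippet id, score) pairs, best match first."""
        q = vectorize(query)
        scored = [(cosine(q, vec), sid) for sid, vec in self.vectors.items()]
        scored = [pair for pair in scored if pair[0] > 0.0]
        scored.sort(reverse=True)
        return [(sid, score) for score, sid in scored[:k]]


# Invented placeholder snippets, not real legal passages.
index = LocalIndex({
    "placeholder-1": "Placeholder text about registering a new address after moving.",
    "placeholder-2": "Placeholder text about applying for child benefit.",
    "placeholder-3": "Placeholder text about vehicle registration documents.",
})
hits = index.search("address registration after moving")
```

Returning snippet ids alongside scores keeps each retrieved passage traceable to its source document, which matters when the guidance shown to users has to cite references.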
Results / Benefits
Faster, less error-prone form completion: Users complete forms more quickly with field-level guidance, error tips, and autocomplete from extracted entities.
Improved accessibility & multilingual support: On-the-fly German↔English translation and layout-preserving PDF generation lower language barriers for non-German speakers.
Auditability & traceability: Structured JSON outputs and logs make it easy to review what was extracted, corrected, and referenced (useful for help desks or compliance).
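As an illustration of what an auditable structured output might contain, the sketch below serializes one field with its OCR value, the user's correction, the confidence, and the referenced snippets. The schema, the `FieldRecord` class, and all key names are hypothetical; the project's actual JSON layout is not documented here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class FieldRecord:
    """One extracted field plus what is needed to audit it (assumed schema)."""
    name: str
    ocr_value: str
    confidence: float
    corrected_value: Optional[str] = None  # set when the user edits the value
    references: List[str] = field(default_factory=list)

    @property
    def final_value(self) -> str:
        """The user's correction wins over the raw OCR value."""
        if self.corrected_value is not None:
            return self.corrected_value
        return self.ocr_value


def to_json(form_type: str, records: List[FieldRecord]) -> str:
    """Serialize the records, keeping umlauts readable in the output."""
    payload = {
        "form_type": form_type,
        "fields": [dict(asdict(r), final_value=r.final_value) for r in records],
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)


record = FieldRecord(name="plz", ocr_value="1O115", confidence=0.52,
                     corrected_value="10115", references=["placeholder-1"])
document = to_json("example_form", [record])
```

Keeping both the raw OCR value and the correction in the same record lets a reviewer see exactly what was changed and why.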
Resource efficiency
Reduced manual processing time: Automating OCR + validation lowers staff time spent on manual data-entry and corrections.
Fewer repeat submissions: Error detection and clear guidance reduce re-submissions, saving administrative costs and paper usage.
Local/offline-first deployment option: File- and RAM-based state and local legal corpora allow deployments without heavy cloud costs or persistent databases (lower hosting costs; better data privacy).
