AI Document Translator & OCR Tool

Overview

This tool automates the translation of PDF documents by combining Optical Character Recognition (OCR) with Neural Machine Translation (NMT). It generates a reconstructed PDF that displays the original text alongside the translation, so each translated passage keeps its source context and can be verified easily. It is aimed at academic and professional use.

Key Features

Hybrid Text Extraction: Extracts standard text via PyMuPDF and automatically falls back to Tesseract OCR for images with embedded text.
State-of-the-Art Translation: Leverages the Helsinki-NLP/opus-mt-tc-big-en-tr MarianMT model from Hugging Face for high-quality English-to-Turkish translations.
Side-by-Side Layout: Unique output format that places the source text and translated text adjacent to each other on the same page.
Smart Optimization: Skips previously processed files and automatically uses a CUDA-enabled GPU, when one is available, for accelerated inference.
Robust Logging: Detailed tracking of the translation process via translation_log.txt to monitor progress and catch errors.
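
The extraction fallback and the skip logic described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names, the "_translated.pdf" output suffix, the "empty text layer" trigger for OCR, and the use of the pytesseract wrapper are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable, List


def needs_translation(pdf_path: Path, output_dir: Path,
                      suffix: str = "_translated.pdf") -> bool:
    """Return True when no output file exists yet for this input.

    The output naming scheme (stem + suffix) is hypothetical.
    """
    return not (output_dir / (pdf_path.stem + suffix)).exists()


def extract_page_text(native_text: str, run_ocr: Callable[[], str],
                      min_chars: int = 1) -> str:
    """Prefer the PDF text layer; call OCR only when it is effectively empty."""
    cleaned = native_text.strip()
    if len(cleaned) >= min_chars:
        return cleaned
    return run_ocr().strip()


def extract_pdf_text(pdf_path: Path) -> List[str]:
    """Return one text string per page, using OCR where the text layer is empty."""
    # Imported lazily so the helpers above work without these packages installed.
    import io
    import fitz  # PyMuPDF
    import pytesseract  # assumed Python wrapper around the Tesseract binary
    from PIL import Image

    pages = []
    with fitz.open(str(pdf_path)) as doc:
        for page in doc:
            def ocr(page=page) -> str:
                # Render the page to an image and hand it to Tesseract.
                pix = page.get_pixmap(dpi=300)
                image = Image.open(io.BytesIO(pix.tobytes("png")))
                return pytesseract.image_to_string(image, lang="eng")

            pages.append(extract_page_text(page.get_text(), ocr))
    return pages
```

Passing the OCR step as a callable keeps the expensive rendering and recognition work off the common path: it only runs for pages where PyMuPDF returns no usable text.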

Tech Stack

Core Logic: Python
AI & ML: PyTorch, Transformers (Hugging Face)
PDF & Image Processing: PyMuPDF (imported as fitz), FPDF, Pillow
OCR Engine: Tesseract
Hardware Support: CUDA (GPU Acceleration)
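
As a rough illustration of how the PyTorch, Transformers, and CUDA pieces of this stack fit together, the sketch below loads the Helsinki-NLP/opus-mt-tc-big-en-tr model and translates sentences in batches. The function names, batch size, and batching strategy are assumptions for the example rather than the tool's actual implementation.

```python
from typing import List

MODEL_NAME = "Helsinki-NLP/opus-mt-tc-big-en-tr"


def batched(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def translate(sentences: List[str], batch_size: int = 8) -> List[str]:
    """Translate English sentences to Turkish, on the GPU when one is available."""
    # Imported lazily so `batched` can be used without the ML packages installed.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    # Use CUDA if PyTorch can see a GPU, otherwise fall back to the CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = MarianTokenizer.from_pretrained(MODEL_NAME)
    model = MarianMTModel.from_pretrained(MODEL_NAME).to(device)
    model.eval()

    results: List[str] = []
    for batch in batched(sentences, batch_size):
        # Pad to the longest sentence in the batch; truncate over-long inputs.
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True).to(device)
        with torch.no_grad():
            output_ids = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(output_ids,
                                              skip_special_tokens=True))
    return results
```

A call such as translate(["The results are shown in Table 2."]) would download the model on first use and return a list with one Turkish string per input sentence.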