Cloud-based OCR services are powerful, but sometimes you need a tool that runs locally: securely, without API costs, and without network round-trips. I decided to build exactly that on top of the Windows Runtime (WinRT) APIs.
What is WindowsImagePdfOcr?
WindowsImagePdfOcr is a command-line tool and library implemented in C# targeting .NET 8. It acts as a wrapper around the built-in Windows.Media.Ocr engine, allowing developers and power users to extract text from images and PDF documents directly on their desktop environment.
The project focuses on accuracy and ease of use, automating the tedious process of converting scanned documents into editable .txt files.
Key Features
Beyond calling the OCR engine, the tool handles the whole image-processing pipeline to improve recognition results:
- PDF Support: Automatically renders PDF documents page-by-page into high-resolution images for recognition.
- Smart Preprocessing: The tool doesn't just pass the image through; it applies padding, color inversion, and uniform scaling before recognition to help the OCR engine detect characters more accurately.
- Format Agnostic: Works out-of-the-box with PNG, JPEG, BMP, TIFF, and GIF.
- Multi-Language Support: Includes a reusable OCR engine wrapper that supports language selection (e.g., ru-RU, en-US) based on installed Windows language packs.
- WinRT Integration: Designed to run on Windows environments that support native WinRT OCR APIs.
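To make the preprocessing steps more concrete, here is a minimal sketch of the arithmetic behind uniform scaling, padding, and color inversion. The class name, method names, and the BGRA8 buffer layout are assumptions chosen for illustration; the project's actual code may be organized differently.

```csharp
using System;

// Hypothetical helper, not taken from the project's source.
public static class OcrPreprocessing
{
    // Computes the image size after uniform scaling so that the longer side
    // equals targetLongSide. Both axes share one factor, so the aspect ratio is kept.
    public static (int Width, int Height) ScaleUniform(int width, int height, int targetLongSide)
    {
        if (width <= 0 || height <= 0 || targetLongSide <= 0)
            throw new ArgumentOutOfRangeException(nameof(width), "All dimensions must be positive.");

        double factor = (double)targetLongSide / Math.Max(width, height);
        return (Math.Max(1, (int)Math.Round(width * factor)),
                Math.Max(1, (int)Math.Round(height * factor)));
    }

    // Adds the same margin on every side, so text touching the border gets breathing room.
    public static (int Width, int Height) AddPadding(int width, int height, int padding)
        => (width + 2 * padding, height + 2 * padding);

    // Inverts the color channels of a BGRA8 pixel buffer in place; alpha is left untouched.
    // Useful for light-on-dark scans, since OCR tends to work better on dark text.
    public static void InvertBgra(byte[] pixels)
    {
        if (pixels.Length % 4 != 0)
            throw new ArgumentException("Expected 4 bytes per pixel.", nameof(pixels));

        for (int i = 0; i < pixels.Length; i += 4)
        {
            pixels[i] = (byte)(255 - pixels[i]);         // blue
            pixels[i + 1] = (byte)(255 - pixels[i + 1]); // green
            pixels[i + 2] = (byte)(255 - pixels[i + 2]); // red
        }
    }
}
```

For example, a 1000x500 scan scaled to a long side of 2000 becomes 2000x1000, and a 20-pixel padding turns that into 2040x1040.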
Technical Implementation
The core challenge was bridging .NET 8 with Windows native APIs to handle visual data efficiently. By using the Windows.Media.Ocr namespace, the tool relies on the OCR engine that ships with Windows and needs no external dependencies such as Tesseract.
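As a rough illustration of what calling Windows.Media.Ocr from .NET 8 can look like, here is a minimal sketch that loads one image and returns its recognized text. It is not the project's code: the class and method names are hypothetical, and it assumes a Windows-specific target framework such as net8.0-windows10.0.19041.0 so that the WinRT projections are available.

```csharp
// Assumes <TargetFramework>net8.0-windows10.0.19041.0</TargetFramework> in the project file.
using System;
using System.IO;
using System.Threading.Tasks;
using Windows.Globalization;
using Windows.Graphics.Imaging;
using Windows.Media.Ocr;
using Windows.Storage;
using Windows.Storage.Streams;

// Hypothetical wrapper for illustration only.
public static class WinRtOcrSketch
{
    public static async Task<string> RecognizeAsync(string imagePath, string languageTag)
    {
        // WinRT storage APIs expect an absolute path.
        StorageFile file = await StorageFile.GetFileFromPathAsync(Path.GetFullPath(imagePath));
        using IRandomAccessStream stream = await file.OpenAsync(FileAccessMode.Read);

        // Decode the file into a SoftwareBitmap, the input type the OCR engine accepts.
        // Images larger than OcrEngine.MaxImageDimension must be scaled down first.
        BitmapDecoder decoder = await BitmapDecoder.CreateAsync(stream);
        using SoftwareBitmap bitmap = await decoder.GetSoftwareBitmapAsync();

        // TryCreateFromLanguage returns null when that language is not available for OCR,
        // so fall back to whatever the user's profile languages provide.
        var engine = OcrEngine.TryCreateFromLanguage(new Language(languageTag))
                     ?? OcrEngine.TryCreateFromUserProfileLanguages();
        if (engine is null)
            throw new InvalidOperationException("No OCR language is available on this machine.");

        OcrResult result = await engine.RecognizeAsync(bitmap);
        return result.Text;
    }
}
```

A call such as `await WinRtOcrSketch.RecognizeAsync("scan.png", "en-US")` would then return the recognized text as a single string.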
Extracted text is automatically saved to a .txt file located next to the input source, making batch processing of archives seamless.
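Placing the output next to the input can be sketched with the standard System.IO.Path helpers. The class name and the exact extension list below are assumptions for illustration; the list simply mirrors the formats mentioned earlier.

```csharp
using System;
using System.IO;

// Hypothetical helper, not taken from the project's source.
public static class OutputPaths
{
    // PDF plus the common spellings of the supported image formats.
    private static readonly string[] Supported =
        { ".pdf", ".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".gif" };

    // True when the file extension is one the tool could process (case-insensitive).
    public static bool IsSupported(string path) =>
        Array.Exists(Supported,
            ext => ext.Equals(Path.GetExtension(path), StringComparison.OrdinalIgnoreCase));

    // Keeps the directory and base name and swaps only the extension for ".txt".
    public static string TextPathFor(string inputPath) =>
        Path.ChangeExtension(inputPath, ".txt");
}
```

With this mapping, scans/invoice.pdf produces scans/invoice.txt. Note that two inputs sharing a base name in the same folder (say invoice.pdf and invoice.png) would map to the same output file, so a batch run may want to guard against that.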
Source Code
The full source code is available on GitHub. It supports argument parsing for automation pipelines and handles language fallback (e.g., automatically selecting en-US or ru-RU).
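The language fallback idea can be sketched as a small pure function: walk a preference list and take the first tag that is actually installed. The names here are hypothetical, and the real tool may order or resolve languages differently. On Windows, the installed tags could plausibly come from OcrEngine.AvailableRecognizerLanguages by reading each entry's LanguageTag.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from the project's source.
public static class LanguageFallback
{
    // Returns the first preferred tag that is installed, or null when none match.
    // Tags are compared case-insensitively, so "en-us" matches "en-US".
    public static string? Pick(IEnumerable<string> preferred, IEnumerable<string> installed)
    {
        var available = new HashSet<string>(installed, StringComparer.OrdinalIgnoreCase);
        return preferred.FirstOrDefault(tag => available.Contains(tag));
    }
}
```

For instance, with a preference of ru-RU then en-US on a machine that only has en-US and de-DE installed, Pick returns en-US; if nothing matches, the caller gets null and can report a clear error instead of failing inside the OCR call.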