# Tooling

A collection of useful command-line tools.

## OCR Screenshot Tool

A cross-platform CLI tool that takes screenshots, performs OCR using DocTR (state-of-the-art deep learning OCR), and copies the result to clipboard. Features intelligent text formatting preservation and optional image annotation.

### Features

- 🌍 **Cross-platform** - Works on Windows, macOS, and Linux
- ⚡ **Multiple screenshot methods** - Choose the fastest for your system
- 🔍 **Advanced OCR** - Uses DocTR with PARSeq recognition model
- 📝 **Smart formatting** - Preserves text layout and indentation
- 🎨 **Image annotation** - Visualize detected text regions
- 📋 **Clipboard integration** - Automatic text copying

### Installation

#### Basic installation:

```bash
pip install .
```

#### With cross-platform screenshot support:

```bash
# For fastest screenshots (recommended)
pip install ".[screenshot-fast]"

# For full automation features (region selection)
pip install ".[screenshot-full]"

# For maximum compatibility (all backends)
pip install ".[screenshot-all]"
```

#### Install specific screenshot libraries:

```bash
pip install mss           # Fastest (~30x faster than others)
pip install pyautogui     # Interactive region selection
pip install pyscreenshot  # Multiple backends
```

### Usage

#### Basic Commands

Take a screenshot and perform OCR:

```bash
ocr-screenshot
```

With verbose output and annotation:

```bash
ocr-screenshot --verbose --annotate --save-image
```

#### Screenshot Methods

Choose your preferred screenshot method:

```bash
# Auto-detect best method (default)
ocr-screenshot --screenshot-method auto

# Use MSS (fastest)
ocr-screenshot --screenshot-method mss

# Use PyAutoGUI (supports region selection)
ocr-screenshot --screenshot-method pyautogui

# Use Pillow ImageGrab (built-in)
ocr-screenshot --screenshot-method pillow

# Interactive region selection
ocr-screenshot --screenshot-method interactive

# macOS native (region selection with drag)
ocr-screenshot --screenshot-method macos
```

#### Advanced Features

Save screenshot with annotation showing detected text:

```bash
ocr-screenshot --save-image --annotate --show-words --show-text
```

Capture specific monitor (MSS method):

```bash
ocr-screenshot --screenshot-method mss --monitor-number 2
```

Full annotation with all detection levels:

```bash
ocr-screenshot --annotate --show-words --show-lines --show-blocks --show-text --save-image
```

### Screenshot Method Comparison

| Method | Speed | Region Selection | Cross-Platform | Notes |
|--------|-------|------------------|----------------|-------|
| **mss** | ⚡⚡⚡ Fastest | ❌ (crop after) | ✅ | ~30x faster, recommended |
| **pyautogui** | ⚡ Slow | ✅ Interactive | ✅ | Best for region selection |
| **pillow** | ⚡ Slow | ✅ Coordinates | ✅ | Built into Pillow |
| **pyscreenshot** | ⚡ Variable | ✅ Coordinates | ✅ | Multiple backends |
| **macos** | ⚡⚡ Fast | ✅ Native UI | 🍎 macOS only | Native drag selection |

### How it works

1. **Screenshot**: Multiple cross-platform methods available
   - **Auto**: Tries best method for your platform
   - **MSS**: Fastest full-screen capture
   - **Interactive**: Guided region selection
   - **macOS**: Native drag-to-select interface
2. **OCR**: Advanced DocTR processing
   - Uses state-of-the-art PARSeq recognition model
   - Preserves text layout and indentation
   - Handles multiple languages
3. **Annotation** (optional): Visual feedback
   - Word-level bounding boxes (red)
   - Line-level groupings (green)
   - Block-level sections (blue)
   - Text overlay showing detected content
4. **Output**: Formatted text copied to clipboard

### Command Line Options

```bash
ocr-screenshot [OPTIONS]

Options:
  --lang TEXT               Language code for OCR (default: eng)
  --save-image              Save the screenshot image
  --output-dir PATH         Directory to save images (default: ~/Desktop)
  --verbose                 Show detailed output
  --annotate                Create annotated image with detection boxes
  --show-words              Show word-level boxes (default: True)
  --show-lines              Show line-level boxes
  --show-blocks             Show block-level boxes
  --show-text               Overlay detected text on image
  --screenshot-method TEXT  Method: auto, mss, pyautogui, pillow, pyscreenshot, macos, interactive
  --monitor-number INTEGER  Monitor to capture (MSS method only, 0=all)
  --help                    Show this message and exit
```

### Examples

**Quick OCR with fastest method:**

```bash
ocr-screenshot --screenshot-method mss
```

**Debug OCR accuracy with annotations:**

```bash
ocr-screenshot --annotate --show-words --show-text --save-image --verbose
```

**Interactive region selection:**

```bash
ocr-screenshot --screenshot-method interactive --save-image
```

**Multi-monitor setup (capture monitor 2):**

```bash
ocr-screenshot --screenshot-method mss --monitor-number 2
```

## Development Guide

### How to Add New Packages

To add a new production dependency (e.g., 'requests'):

```bash
uv add requests
```

To add a new development dependency (e.g., 'ipdb'):

```bash
uv add --dev ipdb
```

After adding dependencies, always re-generate requirements.txt:

```bash
uv pip compile pyproject.toml -o requirements.txt
```

### How to Build Packages

To build your project's distributable packages (.whl, .tar.gz):

```bash
python -m build
```

Or using the virtual environment directly:

```bash
./venv/bin/python -m build
```

### Offline Build

To build offline packages for deployment:

```bash
./dev_scripts/build_offline.sh
```

This will create `offline_packages/` with all dependencies and `install.sh`.
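
The contents of `dev_scripts/build_offline.sh` are not shown in this README, so the following is only a rough sketch of how such an offline bundle could be produced with standard pip commands. It assumes `requirements.txt` has already been generated with `uv pip compile`. The function names `build_offline` and `write_installer` and the `OUT_DIR` / `REQ_FILE` variables are hypothetical and not taken from the project.

```bash
#!/usr/bin/env bash
# Hypothetical sketch only; NOT the actual dev_scripts/build_offline.sh.
set -euo pipefail

OUT_DIR="${OUT_DIR:-offline_packages}"    # output directory named in this README
REQ_FILE="${REQ_FILE:-requirements.txt}"  # assumed to exist (see "How to Add New Packages")

# Write an install.sh that installs only from the files next to it.
write_installer() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/install.sh" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
here="$(cd "$(dirname "$0")" && pwd)"
# --no-index blocks access to PyPI; --find-links points pip at the bundled files.
python -m pip install --no-index --find-links "$here" -r "$here/requirements.txt"
EOF
  chmod +x "$dir/install.sh"
}

# Download every pinned dependency into OUT_DIR (this step needs network access).
build_offline() {
  mkdir -p "$OUT_DIR"
  python -m pip download -r "$REQ_FILE" -d "$OUT_DIR"
  cp "$REQ_FILE" "$OUT_DIR/requirements.txt"
  write_installer "$OUT_DIR"
}

# Only run the build when explicitly requested, so the file can also be sourced.
if [[ "${1:-}" == "--run" ]]; then
  build_offline
fi
```

Saved under a hypothetical name such as `offline_sketch.sh`, running `bash offline_sketch.sh --run` on a machine with network access would fill `offline_packages/`. Copying that directory to the target host and running `./install.sh` there would then install from local files only. Note that `pip download` fetches wheels for the platform and Python version it runs on, so the build machine should match the deployment target.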