Files
tooling/README.md
T
dingfeng.wong e0e5edbfdc a
2025-07-22 01:51:19 +08:00

212 lines
5.6 KiB
Markdown

# Tooling
A collection of useful command-line tools.
## OCR Screenshot Tool
A cross-platform CLI tool that takes screenshots, performs OCR using DocTR (state-of-the-art deep learning OCR), and copies the result to clipboard. Features intelligent text formatting preservation and optional image annotation.
### Features
- 🌍 **Cross-platform** - Works on Windows, macOS, and Linux
-**Multiple screenshot methods** - Choose the fastest for your system
- 🔍 **Advanced OCR** - Uses DocTR with PARSeq recognition model
- 📝 **Smart formatting** - Preserves text layout and indentation
- 🎨 **Image annotation** - Visualize detected text regions
- 📋 **Clipboard integration** - Automatic text copying
### Installation
#### Basic installation:
```bash
pip install .
```
#### With cross-platform screenshot support:
```bash
# For fastest screenshots (recommended)
pip install ".[screenshot-fast]"
# For full automation features (region selection)
pip install ".[screenshot-full]"
# For maximum compatibility (all backends)
pip install ".[screenshot-all]"
```
#### Install specific screenshot libraries:
```bash
pip install mss # Fastest (~30x faster than others)
pip install pyautogui # Interactive region selection
pip install pyscreenshot # Multiple backends
```
### Usage
#### Basic Commands
Take a screenshot and perform OCR:
```bash
ocr-screenshot
```
With verbose output and annotation:
```bash
ocr-screenshot --verbose --annotate --save-image
```
#### Screenshot Methods
Choose your preferred screenshot method:
```bash
# Auto-detect best method (default)
ocr-screenshot --screenshot-method auto
# Use MSS (fastest)
ocr-screenshot --screenshot-method mss
# Use PyAutoGUI (supports region selection)
ocr-screenshot --screenshot-method pyautogui
# Use Pillow ImageGrab (built-in)
ocr-screenshot --screenshot-method pillow
# Interactive region selection
ocr-screenshot --screenshot-method interactive
# macOS native (region selection with drag)
ocr-screenshot --screenshot-method macos
```
#### Advanced Features
Save screenshot with annotation showing detected text:
```bash
ocr-screenshot --save-image --annotate --show-words --show-text
```
Capture specific monitor (MSS method):
```bash
ocr-screenshot --screenshot-method mss --monitor-number 2
```
Full annotation with all detection levels:
```bash
ocr-screenshot --annotate --show-words --show-lines --show-blocks --show-text --save-image
```
### Screenshot Method Comparison
| Method | Speed | Region Selection | Cross-Platform | Notes |
|--------|-------|------------------|----------------|-------|
| **mss** | ⚡⚡⚡ Fastest | ❌ (crop after) | ✅ | ~30x faster, recommended |
| **pyautogui** | ⚡ Slow | ✅ Interactive | ✅ | Best for region selection |
| **pillow** | ⚡ Slow | ✅ Coordinates | ✅ | Built into Pillow |
| **pyscreenshot** | ⚡ Variable | ✅ Coordinates | ✅ | Multiple backends |
| **macos** | ⚡⚡ Fast | ✅ Native UI | 🍎 macOS only | Native drag selection |
### How it works
1. **Screenshot**: Multiple cross-platform methods available
- **Auto**: Tries best method for your platform
- **MSS**: Fastest full-screen capture
- **Interactive**: Guided region selection
- **macOS**: Native drag-to-select interface
2. **OCR**: Advanced DocTR processing
- Uses state-of-the-art PARSeq recognition model
- Preserves text layout and indentation
- Handles multiple languages
3. **Annotation** (optional): Visual feedback
- Word-level bounding boxes (red)
- Line-level groupings (green)
- Block-level sections (blue)
- Text overlay showing detected content
4. **Output**: Formatted text copied to clipboard
### Command Line Options
```bash
ocr-screenshot [OPTIONS]
Options:
--lang TEXT Language code for OCR (default: eng)
--save-image Save the screenshot image
--output-dir PATH Directory to save images (default: ~/Desktop)
--verbose Show detailed output
--annotate Create annotated image with detection boxes
--show-words Show word-level boxes (default: True)
--show-lines Show line-level boxes
--show-blocks Show block-level boxes
--show-text Overlay detected text on image
--screenshot-method TEXT Method: auto, mss, pyautogui, pillow, pyscreenshot, macos, interactive
--monitor-number INTEGER Monitor to capture (MSS method only, 0=all)
--help Show this message and exit
```
### Examples
**Quick OCR with fastest method:**
```bash
ocr-screenshot --screenshot-method mss
```
**Debug OCR accuracy with annotations:**
```bash
ocr-screenshot --annotate --show-words --show-text --save-image --verbose
```
**Interactive region selection:**
```bash
ocr-screenshot --screenshot-method interactive --save-image
```
**Multi-monitor setup (capture monitor 2):**
```bash
ocr-screenshot --screenshot-method mss --monitor-number 2
```
## Development Guide
### How to Add New Packages
To add a new production dependency (e.g., 'requests'):
```bash
uv add requests
```
To add a new development dependency (e.g., 'ipdb'):
```bash
uv add --dev ipdb
```
After adding dependencies, always re-generate requirements.txt:
```bash
uv pip compile pyproject.toml -o requirements.txt
```
### How to Build Packages
To build your project's distributable packages (.whl, .tar.gz):
```bash
python -m build
```
Or using the virtual environment directly:
```bash
./venv/bin/python -m build
```
### Offline Build
To build offline packages for deployment:
```bash
./dev_scripts/build_offline.sh
```
This will create offline_packages/ with all dependencies and install.sh