a
This commit is contained in:
@@ -5,40 +5,170 @@ A collection of useful command-line tools.
|
||||
|
||||
## OCR Screenshot Tool
|
||||
|
||||
A CLI tool that takes region screenshots on macOS, performs OCR using Tesseract, and copies the result to clipboard.
|
||||
A cross-platform CLI tool that takes screenshots, performs OCR using DocTR (state-of-the-art deep learning OCR), and copies the result to clipboard. Features intelligent text formatting preservation and optional image annotation.
|
||||
|
||||
### Prerequisites
|
||||
### Features
|
||||
|
||||
- macOS (uses built-in `screencapture` command)
|
||||
- Tesseract OCR (install with `brew install tesseract`)
|
||||
- 🌍 **Cross-platform** - Works on Windows, macOS, and Linux
|
||||
- ⚡ **Multiple screenshot methods** - Choose the fastest for your system
|
||||
- 🔍 **Advanced OCR** - Uses DocTR with PARSeq recognition model
|
||||
- 📝 **Smart formatting** - Preserves text layout and indentation
|
||||
- 🎨 **Image annotation** - Visualize detected text regions
|
||||
- 📋 **Clipboard integration** - Automatic text copying
|
||||
|
||||
### Installation
|
||||
|
||||
#### Basic installation:
|
||||
```bash
|
||||
pip install .
|
||||
```
|
||||
|
||||
#### With cross-platform screenshot support:
|
||||
```bash
|
||||
# For fastest screenshots (recommended)
|
||||
pip install ".[screenshot-fast]"
|
||||
|
||||
# For full automation features (region selection)
|
||||
pip install ".[screenshot-full]"
|
||||
|
||||
# For maximum compatibility (all backends)
|
||||
pip install ".[screenshot-all]"
|
||||
```
|
||||
|
||||
#### Install specific screenshot libraries:
|
||||
```bash
|
||||
pip install mss # Fastest (~30x faster than others)
|
||||
pip install pyautogui # Interactive region selection
|
||||
pip install pyscreenshot # Multiple backends
|
||||
```
|
||||
|
||||
### Usage
|
||||
|
||||
Basic usage (takes screenshot, performs OCR, copies to clipboard):
|
||||
#### Basic Commands
|
||||
|
||||
Take a screenshot and perform OCR:
|
||||
```bash
|
||||
uv run ocr-screenshot
|
||||
ocr-screenshot
|
||||
```
|
||||
|
||||
With verbose output:
|
||||
With verbose output and annotation:
|
||||
```bash
|
||||
uv run ocr-screenshot --verbose
|
||||
ocr-screenshot --verbose --annotate --save-image
|
||||
```
|
||||
|
||||
Save the screenshot image:
|
||||
#### Screenshot Methods
|
||||
|
||||
Choose your preferred screenshot method:
|
||||
|
||||
```bash
|
||||
uv run ocr-screenshot --save-image
|
||||
# Auto-detect best method (default)
|
||||
ocr-screenshot --screenshot-method auto
|
||||
|
||||
# Use MSS (fastest)
|
||||
ocr-screenshot --screenshot-method mss
|
||||
|
||||
# Use PyAutoGUI (supports region selection)
|
||||
ocr-screenshot --screenshot-method pyautogui
|
||||
|
||||
# Use Pillow ImageGrab (built-in)
|
||||
ocr-screenshot --screenshot-method pillow
|
||||
|
||||
# Interactive region selection
|
||||
ocr-screenshot --screenshot-method interactive
|
||||
|
||||
# macOS native (region selection with drag)
|
||||
ocr-screenshot --screenshot-method macos
|
||||
```
|
||||
|
||||
Specify OCR language (e.g., for Chinese):
|
||||
#### Advanced Features
|
||||
|
||||
Save screenshot with annotation showing detected text:
|
||||
```bash
|
||||
uv run ocr-screenshot --lang chi_sim
|
||||
ocr-screenshot --save-image --annotate --show-words --show-text
|
||||
```
|
||||
|
||||
Capture specific monitor (MSS method):
|
||||
```bash
|
||||
ocr-screenshot --screenshot-method mss --monitor-number 2
|
||||
```
|
||||
|
||||
Full annotation with all detection levels:
|
||||
```bash
|
||||
ocr-screenshot --annotate --show-words --show-lines --show-blocks --show-text --save-image
|
||||
```
|
||||
|
||||
### Screenshot Method Comparison
|
||||
|
||||
| Method | Speed | Region Selection | Cross-Platform | Notes |
|
||||
|--------|-------|------------------|----------------|-------|
|
||||
| **mss** | ⚡⚡⚡ Fastest | ❌ (crop after) | ✅ | ~30x faster, recommended |
|
||||
| **pyautogui** | ⚡ Slow | ✅ Interactive | ✅ | Best for region selection |
|
||||
| **pillow** | ⚡ Slow | ✅ Coordinates | ✅ | Built into Pillow |
|
||||
| **pyscreenshot** | ⚡ Variable | ✅ Coordinates | ✅ | Multiple backends |
|
||||
| **macos** | ⚡⚡ Fast | ✅ Native UI | 🍎 macOS only | Native drag selection |
|
||||
|
||||
### How it works
|
||||
|
||||
1. **Screenshot**: Click and drag to select a region, or press Space to capture an entire window
|
||||
2. **OCR**: The selected region is processed with Tesseract OCR
|
||||
3. **Clipboard**: The extracted text is automatically copied to your clipboard
|
||||
1. **Screenshot**: Multiple cross-platform methods available
|
||||
- **Auto**: Tries best method for your platform
|
||||
- **MSS**: Fastest full-screen capture
|
||||
- **Interactive**: Guided region selection
|
||||
- **macOS**: Native drag-to-select interface
|
||||
|
||||
2. **OCR**: Advanced DocTR processing
|
||||
- Uses state-of-the-art PARSeq recognition model
|
||||
- Preserves text layout and indentation
|
||||
- Handles multiple languages
|
||||
|
||||
3. **Annotation** (optional): Visual feedback
|
||||
- Word-level bounding boxes (red)
|
||||
- Line-level groupings (green)
|
||||
- Block-level sections (blue)
|
||||
- Text overlay showing detected content
|
||||
|
||||
4. **Output**: Formatted text copied to clipboard
|
||||
|
||||
### Command Line Options
|
||||
|
||||
```bash
|
||||
ocr-screenshot [OPTIONS]
|
||||
|
||||
Options:
|
||||
--lang TEXT Language code for OCR (default: eng)
|
||||
--save-image Save the screenshot image
|
||||
--output-dir PATH Directory to save images (default: ~/Desktop)
|
||||
--verbose Show detailed output
|
||||
--annotate Create annotated image with detection boxes
|
||||
--show-words Show word-level boxes (default: True)
|
||||
--show-lines Show line-level boxes
|
||||
--show-blocks Show block-level boxes
|
||||
--show-text Overlay detected text on image
|
||||
--screenshot-method TEXT Method: auto, mss, pyautogui, pillow, pyscreenshot, macos, interactive
|
||||
--monitor-number INTEGER Monitor to capture (MSS method only, 0=all)
|
||||
--help Show this message and exit
|
||||
```
|
||||
|
||||
### Examples
|
||||
|
||||
**Quick OCR with fastest method:**
|
||||
```bash
|
||||
ocr-screenshot --screenshot-method mss
|
||||
```
|
||||
|
||||
**Debug OCR accuracy with annotations:**
|
||||
```bash
|
||||
ocr-screenshot --annotate --show-words --show-text --save-image --verbose
|
||||
```
|
||||
|
||||
**Interactive region selection:**
|
||||
```bash
|
||||
ocr-screenshot --screenshot-method interactive --save-image
|
||||
```
|
||||
|
||||
**Multi-monitor setup (capture monitor 2):**
|
||||
```bash
|
||||
ocr-screenshot --screenshot-method mss --monitor-number 2
|
||||
```
|
||||
|
||||
## Development Guide
|
||||
|
||||
|
||||
Reference in New Issue
Block a user