Tooling
A collection of useful command-line tools.
OCR Screenshot Tool
A cross-platform CLI tool that takes a screenshot, runs OCR on it with DocTR (a deep-learning OCR library), and copies the recognized text to the clipboard. It preserves text layout and indentation, and can optionally save an annotated copy of the image.
Features
- 🌍 Cross-platform - Works on Windows, macOS, and Linux
- ⚡ Multiple screenshot methods - Choose the fastest for your system
- 🔍 Advanced OCR - Uses DocTR with PARSeq recognition model
- 📝 Smart formatting - Preserves text layout and indentation
- 🎨 Image annotation - Visualize detected text regions
- 📋 Clipboard integration - Automatic text copying
Installation
Basic installation:
pip install .
With cross-platform screenshot support:
# For fastest screenshots (recommended)
pip install ".[screenshot-fast]"
# For full automation features (region selection)
pip install ".[screenshot-full]"
# For maximum compatibility (all backends)
pip install ".[screenshot-all]"
Install specific screenshot libraries:
pip install mss # Fastest (~30x faster than others)
pip install pyautogui # Interactive region selection
pip install pyscreenshot # Multiple backends
Usage
Basic Commands
Take a screenshot and perform OCR:
ocr-screenshot
With verbose output and annotation:
ocr-screenshot --verbose --annotate --save-image
Screenshot Methods
Choose your preferred screenshot method:
# Auto-detect best method (default)
ocr-screenshot --screenshot-method auto
# Use MSS (fastest)
ocr-screenshot --screenshot-method mss
# Use PyAutoGUI (supports region selection)
ocr-screenshot --screenshot-method pyautogui
# Use Pillow ImageGrab (built-in)
ocr-screenshot --screenshot-method pillow
# Interactive region selection
ocr-screenshot --screenshot-method interactive
# macOS native (region selection with drag)
ocr-screenshot --screenshot-method macos
Advanced Features
Save screenshot with annotation showing detected text:
ocr-screenshot --save-image --annotate --show-words --show-text
Capture specific monitor (MSS method):
ocr-screenshot --screenshot-method mss --monitor-number 2
Full annotation with all detection levels:
ocr-screenshot --annotate --show-words --show-lines --show-blocks --show-text --save-image
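As a rough illustration of what the annotation flags draw, the sketch below scales a relative bounding box to pixel coordinates and pairs each detection level with the colours this README lists (words red, lines green, blocks blue). It assumes boxes arrive as ((xmin, ymin), (xmax, ymax)) fractions of the image size, which is how DocTR reports geometry for straight boxes. The names `to_pixel_box` and `LEVEL_COLOURS` are hypothetical and not taken from the tool's source.

```python
from typing import Dict, Tuple

RelBox = Tuple[Tuple[float, float], Tuple[float, float]]
PixelBox = Tuple[int, int, int, int]

# Colours as described in this README; the dict itself is a hypothetical helper.
LEVEL_COLOURS: Dict[str, str] = {"word": "red", "line": "green", "block": "blue"}


def to_pixel_box(geometry: RelBox, width: int, height: int) -> PixelBox:
    """Scale a relative ((xmin, ymin), (xmax, ymax)) box to integer pixels."""
    (xmin, ymin), (xmax, ymax) = geometry
    return (
        round(xmin * width),
        round(ymin * height),
        round(xmax * width),
        round(ymax * height),
    )


# Example: a word box on a 1920x1080 screenshot.
box = to_pixel_box(((0.25, 0.5), (0.75, 0.625)), width=1920, height=1080)
print(box, LEVEL_COLOURS["word"])
```

The resulting tuple can be passed to Pillow's `ImageDraw.rectangle` together with `outline=LEVEL_COLOURS[level]`.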
Screenshot Method Comparison
| Method | Speed | Region Selection | Cross-Platform | Notes |
|---|---|---|---|---|
| mss | ⚡⚡⚡ Fastest | ❌ (crop after) | ✅ | ~30x faster, recommended |
| pyautogui | ⚡ Slow | ✅ Interactive | ✅ | Best for region selection |
| pillow | ⚡ Slow | ✅ Coordinates | ✅ | Built into Pillow |
| pyscreenshot | ⚡ Variable | ✅ Coordinates | ✅ | Multiple backends |
| macos | ⚡⚡ Fast | ✅ Native UI | 🍎 macOS only | Native drag selection |
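The auto method is described only as trying the best method for your platform. One plausible way to do that is to probe for installed backends in a fixed preference order, as in the minimal sketch below. The order in `CANDIDATES` and the function names are assumptions for illustration; the tool's actual detection logic may differ.

```python
import importlib.util
from typing import Callable, Optional

# Hypothetical preference order, fastest backend first.
# Each entry maps a --screenshot-method value to the module it needs.
CANDIDATES = [
    ("mss", "mss"),
    ("pillow", "PIL"),
    ("pyautogui", "pyautogui"),
    ("pyscreenshot", "pyscreenshot"),
]


def is_installed(module: str) -> bool:
    """Return True if the module can be found, without importing it."""
    return importlib.util.find_spec(module) is not None


def pick_method(installed: Callable[[str], bool] = is_installed) -> Optional[str]:
    """Return the first method whose backing module is available, else None."""
    for method, module in CANDIDATES:
        if installed(module):
            return method
    return None


# Result depends on which optional packages are installed locally.
print(pick_method())
```

Passing the availability check as a parameter keeps the selection logic testable without installing any screenshot library.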
How It Works
- Screenshot: Multiple cross-platform methods available
  - Auto: Tries best method for your platform
  - MSS: Fastest full-screen capture
  - Interactive: Guided region selection
  - macOS: Native drag-to-select interface
- OCR: Advanced DocTR processing
  - Uses state-of-the-art PARSeq recognition model
  - Preserves text layout and indentation
  - Handles multiple languages
- Annotation (optional): Visual feedback
  - Word-level bounding boxes (red)
  - Line-level groupings (green)
  - Block-level sections (blue)
  - Text overlay showing detected content
- Output: Formatted text copied to clipboard
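The layout-preservation step can be pictured as turning word positions back into indented lines. The sketch below is one simple way to do that, not the tool's actual algorithm: it assumes each word comes with a left edge and a vertical centre as fractions of the image size, groups words whose centres are close together into a line, and indents each line in proportion to its left edge. The `Word` tuple, `layout_text`, and its `columns` and `line_tol` parameters are hypothetical.

```python
from typing import List, Tuple

# (x_left, y_center, text), coordinates as fractions of the image size.
Word = Tuple[float, float, str]


def layout_text(words: List[Word], columns: int = 80, line_tol: float = 0.01) -> str:
    """Rebuild text so each line is indented in proportion to its left edge."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w[1], w[0])):
        # Stay on the current line while the vertical gap is within tolerance.
        if lines and abs(word[1] - lines[-1][-1][1]) <= line_tol:
            lines[-1].append(word)
        else:
            lines.append([word])

    out = []
    for line in lines:
        line.sort(key=lambda w: w[0])  # left-to-right reading order
        indent = round(line[0][0] * columns)
        out.append(" " * indent + " ".join(w[2] for w in line))
    return "\n".join(out)


words = [
    (0.1, 0.15, "return"),
    (0.0, 0.10, "def"),
    (0.2, 0.15, "0"),
    (0.06, 0.10, "main():"),
]
print(layout_text(words))
```

With an 80-column target, a line starting at 10% of the image width is indented by eight spaces, which is enough to keep code-like screenshots readable.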
Command Line Options
ocr-screenshot [OPTIONS]
Options:
--lang TEXT Language code for OCR (default: eng)
--save-image Save the screenshot image
--output-dir PATH Directory to save images (default: ~/Desktop)
--verbose Show detailed output
--annotate Create annotated image with detection boxes
--show-words Show word-level boxes (default: True)
--show-lines Show line-level boxes
--show-blocks Show block-level boxes
--show-text Overlay detected text on image
--screenshot-method TEXT Method: auto, mss, pyautogui, pillow, pyscreenshot, macos, interactive
--monitor-number INTEGER Monitor to capture (MSS method only, 0=all)
--help Show this message and exit
Examples
Quick OCR with fastest method:
ocr-screenshot --screenshot-method mss
Debug OCR accuracy with annotations:
ocr-screenshot --annotate --show-words --show-text --save-image --verbose
Interactive region selection:
ocr-screenshot --screenshot-method interactive --save-image
Multi-monitor setup (capture monitor 2):
ocr-screenshot --screenshot-method mss --monitor-number 2
Speech-to-Text (STT) Tool
A real-time speech-to-text tool using RealtimeSTT with wake word activation. Features the "jarvis" wake word by default and supports live transcription with various output options.
Features
- 🎙️ Real-time transcription - Live speech-to-text conversion
- 🎯 Wake word activation - Multiple wake words including "jarvis"
- ⚡ GPU acceleration - CUDA support for faster processing
- 🔄 Live display - Real-time transcription preview
- 💾 File output - Save transcriptions to text files
- 🎛️ Multiple models - Choose from tiny to large Whisper models
- 🌍 Multi-language - Support for multiple languages
- 🧪 Test mode - Test functionality without wake words
Installation
The STT dependencies are included in the base installation:
pip install .
For optimal performance with GPU acceleration:
# For CUDA 11.8
pip install torch==2.5.1+cu118 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.X
pip install torch==2.5.1+cu121 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
Usage
Basic Commands
Start STT with jarvis wake word:
tooling stt listen
Test STT without wake words:
tooling stt test
Show system information:
tooling stt info
Wake Word Options
Use different wake words:
# Use alexa wake word
tooling stt listen --wake-word alexa
# Use hey google wake word
tooling stt listen --wake-word "hey google"
# Use computer wake word
tooling stt listen --wake-word computer
Model Selection
Choose different Whisper models for speed vs accuracy:
# Fastest (tiny model)
tooling stt listen --model tiny
# Balanced (base model, default)
tooling stt listen --model base
# Best accuracy (large model)
tooling stt listen --model large-v2
Advanced Features
Save transcriptions to file:
tooling stt listen --save-to-file transcripts.txt
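This README does not specify the format of the saved file. As a minimal sketch of what appending transcriptions could look like, the helper below writes one timestamped line per utterance. The function name `append_transcript` and the line format are assumptions.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def append_transcript(path: Path, text: str) -> None:
    """Append one transcription as a timestamped line, creating the file if needed."""
    stamp = datetime.now().isoformat(timespec="seconds")
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"[{stamp}] {text.strip()}\n")


# Demonstrate against a throwaway file so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "transcripts.txt"
    append_transcript(target, "hello world")
    append_transcript(target, "  second utterance ")
    lines = target.read_text(encoding="utf-8").splitlines()

print(len(lines))
```

Opening the file in append mode means repeated runs extend the same file rather than overwriting earlier sessions.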
Disable real-time display for better performance:
tooling stt listen --no-realtime
Set custom sensitivity and language:
tooling stt listen --sensitivity 0.8 --language en --verbose
Force CPU usage:
tooling stt listen --device cpu
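The default `--device auto` has to resolve to either `cuda` or `cpu` at some point. A minimal sketch of that decision, assuming it relies on PyTorch's `torch.cuda.is_available()` and falls back to the CPU when PyTorch is missing, could look like this. The function name `resolve_device` is hypothetical.

```python
def resolve_device(requested: str = "auto") -> str:
    """Map a --device value to 'cuda' or 'cpu'."""
    if requested not in {"auto", "cuda", "cpu"}:
        raise ValueError(f"unknown device: {requested!r}")
    if requested != "auto":
        return requested
    try:
        import torch  # optional here; if it is missing, fall back to CPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Result depends on the local PyTorch build and GPU drivers.
print(resolve_device("auto"))
```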
Available Wake Words
The following wake words are supported:
- jarvis (default)
- alexa
- americano
- blueberry
- bumblebee
- computer
- grapefruits
- grasshopper
- hey google
- hey siri
- ok google
- picovoice
- porcupine
- terminator
Wake Word Engines
Two wake word engines are supported:
- openwakeword (default) - Open source, free to use, good accuracy
- pvporcupine - Picovoice's Porcupine engine, highly optimized
Choose the engine based on your requirements:
# Use OpenWakeWord (default)
tooling stt listen --wakeword-engine openwakeword
# Use Porcupine for better performance
tooling stt listen --wakeword-engine pvporcupine
Available Models
| Model | Speed | Accuracy | Parameters | Use Case |
|---|---|---|---|---|
| tiny | ⚡⚡⚡ | ⭐⭐ | 39M | Testing, low-power devices |
| base | ⚡⚡ | ⭐⭐⭐ | 74M | Balanced (default) |
| small | ⚡ | ⭐⭐⭐⭐ | 244M | Better accuracy |
| medium | 🐌 | ⭐⭐⭐⭐⭐ | 769M | High accuracy |
| large-v2 | 🐌🐌 | ⭐⭐⭐⭐⭐ | 1550M | Best accuracy |
Command Line Options
tooling stt listen [OPTIONS]
Options:
--wake-word TEXT Wake word to activate recording [default: jarvis]
--model TEXT Whisper model (tiny, base, small, medium, large-v2) [default: base]
--language TEXT Language code for transcription (empty for auto-detection)
--realtime/--no-realtime Enable real-time transcription display [default: realtime]
--save-to-file PATH Save transcriptions to a file
--sensitivity FLOAT Wake word sensitivity (0.0 to 1.0) [default: 0.6]
--device TEXT Device to use (auto, cuda, cpu) [default: auto]
--wakeword-engine TEXT Wake word engine (openwakeword, pvporcupine) [default: openwakeword]
--verbose Show verbose output and configuration
--help Show this message and exit
Examples
Basic usage with jarvis:
tooling stt listen
Fast transcription with tiny model:
tooling stt listen --model tiny --wake-word computer
High accuracy with file output:
tooling stt listen --model large-v2 --save-to-file meeting_notes.txt --verbose
Quick test without wake words:
tooling stt test --duration 5 --model tiny
Custom language and sensitivity:
tooling stt listen --language es --sensitivity 0.8 --wake-word "hey google"
Use different wake word engine:
tooling stt listen --wakeword-engine pvporcupine --wake-word alexa
How It Works
- Initialization: Loads the selected Whisper model and sets up audio processing
- Wake Word Detection: Listens for the specified wake word using Porcupine or OpenWakeWord
- Voice Activity Detection: Uses WebRTC VAD and Silero VAD for accurate speech detection
- Real-time Transcription: Processes audio chunks in real-time (optional)
- Final Transcription: Generates high-quality final transcription when speech ends
- Output: Displays results and optionally saves to file
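The flow above is essentially a two-state loop: wait for the wake word, then record until voice activity detection reports silence. The simulation below sketches that control flow with plain strings standing in for audio events. It does not use RealtimeSTT, and the event names (`"wake"`, `"silence"`) and `run_pipeline` are hypothetical.

```python
from enum import Enum, auto
from typing import Iterable, List


class State(Enum):
    WAITING = auto()    # listening for the wake word
    RECORDING = auto()  # wake word heard, capturing speech


def run_pipeline(events: Iterable[str]) -> List[str]:
    """Consume simulated events and return the finalized utterances.

    Events: "wake" (wake word detected), "silence" (end of speech);
    anything else is treated as a chunk of recognized speech.
    """
    state = State.WAITING
    chunks: List[str] = []
    finished: List[str] = []
    for event in events:
        if state is State.WAITING:
            # Everything except the wake word is ignored while waiting.
            if event == "wake":
                state = State.RECORDING
        elif event == "silence":
            # End of speech: emit the final transcription and wait again.
            if chunks:
                finished.append(" ".join(chunks))
            chunks = []
            state = State.WAITING
        elif event != "wake":
            chunks.append(event)
    return finished


result = run_pipeline(["noise", "wake", "turn on", "the lights", "silence", "ignored"])
print(result)
```

Speech that arrives before the wake word, or after the utterance has ended, never reaches the transcript, which is the behaviour wake word activation is meant to provide.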
Performance Tips
- GPU: Use CUDA for 3-5x faster transcription
- Model: Use tiny or base for real-time applications
- Sensitivity: Adjust wake word sensitivity based on environment noise
- Device: Set --device cpu if experiencing GPU memory issues
- Real-time: Pass --no-realtime to disable the live display for better final transcription performance
Troubleshooting
No microphone detected:
# Check audio devices
tooling stt info
CUDA not available:
# Install CUDA-enabled PyTorch
pip install torch==2.5.1+cu121 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
Wake word not detected:
# Increase sensitivity
tooling stt listen --sensitivity 0.8 --verbose
Poor transcription quality:
# Use larger model
tooling stt listen --model large-v2
Development Guide
How to Add New Packages
To add a new production dependency (e.g., 'requests'):
uv add requests
To add a new development dependency (e.g., 'ipdb'):
uv add --dev ipdb
After adding dependencies, always re-generate requirements.txt:
uv pip compile pyproject.toml -o requirements.txt
How to Build Packages
To build your project's distributable packages (.whl, .tar.gz):
python -m build
Or using the virtual environment directly:
./venv/bin/python -m build
Offline Build
To build offline packages for deployment:
./dev_scripts/build_offline.sh
This creates an offline_packages/ directory containing all dependencies and an install.sh script.