Pronunciation
Practice speaking words and sentences with local speech recognition. No cloud, no network latency.
Steam feature — included in all 8 Practice modes.
How It Works
- A word or sentence from your Word Book is displayed
- Click the microphone button and speak
- Playto uses Whisper (via whisper-rs) to transcribe your speech locally
- Your pronunciation is scored by comparing the transcription against the expected text
Everything runs on your CPU — no GPU needed, no internet, no data leaves your PC.
Scoring
Accuracy formula:
accuracy = correct_words / max(expected_words, recognised_words)
Words are normalized (lowercase, alphanumeric only) before comparison. Numbers are converted to words (e.g., "1" → "one") for flexible matching.
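The normalization and accuracy formula can be sketched in Rust as follows. This is a minimal illustration, not Playto's actual code: the function names are hypothetical, the digit table only covers "0" to "9", and it assumes `correct_words` counts words that match at the same position.

```rust
/// Spelled-out forms for single digits (illustrative only; a real
/// implementation would presumably handle multi-digit numbers too).
const DIGITS: [&str; 10] = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
];

/// Lowercase a token, drop non-alphanumeric characters, and spell out
/// a lone digit such as "1" as "one".
fn normalize_word(raw: &str) -> String {
    let cleaned: String = raw
        .chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect();
    if cleaned.len() == 1 {
        if let Some(d) = cleaned.chars().next().and_then(|c| c.to_digit(10)) {
            return DIGITS[d as usize].to_string();
        }
    }
    cleaned
}

/// Split on whitespace and normalize each word, skipping tokens that
/// become empty (for example, a stray punctuation mark).
fn normalize(text: &str) -> Vec<String> {
    text.split_whitespace()
        .map(normalize_word)
        .filter(|w| !w.is_empty())
        .collect()
}

/// accuracy = correct_words / max(expected_words, recognised_words)
/// Assumption: a word is "correct" when it matches the expected word
/// at the same position.
fn accuracy(expected: &str, recognised: &str) -> f64 {
    let exp = normalize(expected);
    let rec = normalize(recognised);
    let denom = exp.len().max(rec.len());
    if denom == 0 {
        return 0.0; // nothing to compare
    }
    let correct = exp.iter().zip(rec.iter()).filter(|(e, r)| e == r).count();
    correct as f64 / denom as f64
}
```

Under these assumptions, `accuracy("I have 1 cat.", "i have one cat")` yields 1.0, while `accuracy("the quick brown fox", "the quick crown")` yields 0.5: two of four expected words match. Dividing by the larger word count means both missing and extra words lower the score.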
Feedback: Word-by-word mismatch details showing which words were wrong and what was recognised instead.
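One way the word-by-word feedback could be produced is sketched below. The `Mismatch` struct and `mismatches` function are hypothetical names, and the sketch assumes both word lists are already normalized and compared by position.

```rust
/// One entry of feedback: what was expected at a position and what was
/// recognised there. `None` means the word was missing on that side.
#[derive(Debug, PartialEq)]
struct Mismatch {
    position: usize,
    expected: Option<String>,
    recognised: Option<String>,
}

/// Compare two normalized word lists position by position and collect
/// every position where they differ.
fn mismatches(expected: &[&str], recognised: &[&str]) -> Vec<Mismatch> {
    let len = expected.len().max(recognised.len());
    (0..len)
        .filter_map(|i| {
            let e = expected.get(i).copied();
            let r = recognised.get(i).copied();
            if e == r {
                None // words agree, nothing to report
            } else {
                Some(Mismatch {
                    position: i,
                    expected: e.map(String::from),
                    recognised: r.map(String::from),
                })
            }
        })
        .collect()
}
```

For the expected words `["the", "quick", "brown", "fox"]` and the recognised words `["the", "quick", "crown"]`, this reports "brown" heard as "crown" at position 2 and "fox" as missing at position 3.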
Whisper Model
The Whisper model downloads automatically on first use. Playto tries models in order of quality:
| Model | Size | Quality |
|---|---|---|
| ggml-small.bin | ~460 MB | Best |
| ggml-base.bin | ~140 MB | Good (default) |
| ggml-tiny.bin | ~75 MB | Fast |
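Trying models in order of quality could look roughly like the sketch below. It assumes that "tries" means checking which model file is already present in a local directory. The `pick_model` function and the directory layout are hypothetical, and the download step is not shown.

```rust
use std::path::{Path, PathBuf};

/// Candidate file names, best quality first (order taken from the table).
const MODEL_CANDIDATES: [&str; 3] = ["ggml-small.bin", "ggml-base.bin", "ggml-tiny.bin"];

/// Return the path of the highest-quality model file found in `model_dir`,
/// or `None` when no candidate exists there.
fn pick_model(model_dir: &Path) -> Option<PathBuf> {
    MODEL_CANDIDATES
        .iter()
        .map(|name| model_dir.join(name))
        .find(|path| path.is_file())
}
```

A caller would fall back to downloading a model when `pick_model` returns `None`.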
Audio format: 16 kHz mono, 32-bit float PCM. Transcription uses 4 CPU threads and a greedy sampling strategy.
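To illustrate the audio format, the sketch below converts interleaved 16-bit integer samples into mono 32-bit float samples. It assumes the microphone delivers 16-bit PCM that is already sampled at 16 kHz, so resampling is not shown, and `to_mono_f32` is a hypothetical helper rather than part of Playto or whisper-rs.

```rust
/// Convert interleaved 16-bit PCM into mono 32-bit float samples in
/// the range [-1.0, 1.0). Each frame holds one sample per channel, and
/// channels are averaged to produce the mono signal. A trailing
/// incomplete frame is ignored.
fn to_mono_f32(samples: &[i16], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    samples
        .chunks_exact(channels)
        .map(|frame| {
            // Scale each sample to float, then average across channels.
            let sum: f32 = frame.iter().map(|&s| s as f32 / 32768.0).sum();
            sum / channels as f32
        })
        .collect()
}
```

For example, `to_mono_f32(&[16384, -16384], 2)` produces a single sample of 0.0, because the two channels cancel out.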