Pronunciation

Practice speaking words and sentences with local speech recognition. No cloud and no network latency.

Steam feature — included in all 8 Practice modes.

How It Works

Everything runs on your CPU — no GPU needed, no internet, no data leaves your PC.

Accuracy formula:

accuracy = correct_words / max(expected_words, recognised_words)

Words are normalized (lowercased, with non-alphanumeric characters removed) before comparison. Numbers are converted to words (e.g., "1" → "one") so that digits and spelled-out numbers match.
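
As a rough illustration of the formula and the normalization step, here is a minimal Python sketch. It is not Playto's actual code. The names normalize, accuracy, and NUMBER_WORDS are hypothetical, the number table only covers 0 to 10, and counting correct words position by position is an assumption, since the page does not say how words are aligned.

```python
# Hypothetical lookup table; the page only gives "1" -> "one" as an example.
NUMBER_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
    "10": "ten",
}


def normalize(text):
    """Lowercase, keep only alphanumeric characters, and spell out numbers."""
    words = []
    for token in text.lower().split():
        cleaned = "".join(ch for ch in token if ch.isalnum())
        if cleaned:
            words.append(NUMBER_WORDS.get(cleaned, cleaned))
    return words


def accuracy(expected, recognised):
    """correct_words / max(expected_words, recognised_words)."""
    exp = normalize(expected)
    rec = normalize(recognised)
    longest = max(len(exp), len(rec))
    if longest == 0:
        return 0.0  # assumption: nothing to score counts as zero
    # Assumption: a word is correct when it matches at the same position.
    correct = sum(1 for e, r in zip(exp, rec) if e == r)
    return correct / longest
```

With this sketch, accuracy("I have 1 apple", "i have one apple.") gives 1.0, and accuracy("the cat sat", "the dog sat down") gives 0.5 (2 matching words out of 4). Dividing by the larger word count means extra recognised words lower the score just as missing words do.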

Feedback: word-by-word mismatch details show which words were wrong and what was recognised instead.
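
One way such feedback could be produced is sketched below in Python, using the standard library's difflib to align the two word lists. This is an assumption for illustration only: the page does not describe how Playto aligns words, and the function name mismatches and the report layout are hypothetical. The inputs are expected to be word lists that have already been normalized.

```python
from difflib import SequenceMatcher


def mismatches(expected_words, recognised_words):
    """List the spans where the recognised words differ from the expected ones."""
    matcher = SequenceMatcher(a=expected_words, b=recognised_words, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # matching words need no feedback
        report.append({
            "expected": expected_words[i1:i2],      # empty if a word was added
            "recognised": recognised_words[j1:j2],  # empty if a word was skipped
        })
    return report
```

For example, mismatches(["the", "cat", "sat"], ["the", "dog", "sat"]) returns one entry saying that "cat" was expected and "dog" was recognised. Aligning the lists first keeps a single skipped word from marking every later word as wrong.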

The Whisper model downloads automatically on first use. Playto tries models in order of quality:

Audio format: 16 kHz mono, 32-bit float PCM. Recognition uses 4 CPU threads with a greedy sampling strategy.
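
To make the audio format concrete, the Python sketch below turns interleaved 16-bit PCM samples into 16 kHz mono 32-bit floats. It only illustrates the target format and is not how Playto captures or converts audio. The function names are hypothetical, and the linear-interpolation resampler is a deliberate simplification (a production pipeline would normally low-pass filter before downsampling to avoid aliasing).

```python
from array import array

TARGET_RATE = 16_000  # Hz, the sample rate stated above


def to_mono_float32(pcm16, channels):
    """Average interleaved 16-bit samples into mono floats in [-1.0, 1.0)."""
    mono = array("f")  # "f" is a 32-bit float on common platforms
    for start in range(0, len(pcm16) - channels + 1, channels):
        frame = pcm16[start:start + channels]
        mono.append(sum(frame) / channels / 32768.0)
    return mono


def resample_linear(samples, source_rate, target_rate=TARGET_RATE):
    """Naive resampling by linear interpolation between neighbouring samples."""
    if source_rate == target_rate or len(samples) == 0:
        return array("f", samples)
    out_len = int(len(samples) * target_rate / source_rate)
    out = array("f")
    for n in range(out_len):
        pos = n * source_rate / target_rate  # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out
```

For instance, one second of 48 kHz stereo input (96,000 interleaved values) becomes 48,000 mono floats after to_mono_float32 and 16,000 floats after resample_linear.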