Pronunciation

Practice speaking words and sentences with local speech recognition: no cloud service and no network round-trip.

Steam feature — included in all 8 Practice modes.

How It Works

  1. A word or sentence from your Word Book is displayed
  2. Click the microphone button and speak
  3. Playto uses Whisper (via whisper-rs) to transcribe your speech locally
  4. Playto scores your pronunciation by comparing the transcription against the expected text

Everything runs on your CPU: no GPU needed, no internet connection after the one-time model download, and no data leaves your PC.

Scoring

Accuracy formula:

accuracy = correct_words / max(expected_words, recognised_words)
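The formula can be sketched in Rust as follows. This is a minimal sketch, not Playto's code: it assumes both inputs are already normalised word lists and that a word counts as "correct" when it matches the expected word at the same position. The real matcher may align words differently, and the empty-input behaviour is also an assumption.

```rust
/// Sketch of: accuracy = correct_words / max(expected_words, recognised_words).
/// Assumption: "correct" means the same word at the same position.
fn accuracy(expected: &[&str], recognised: &[&str]) -> f64 {
    let denom = expected.len().max(recognised.len());
    if denom == 0 {
        // Assumption: nothing expected and nothing heard counts as a perfect score.
        return 1.0;
    }
    // Count positions where the expected and recognised words agree.
    let correct = expected
        .iter()
        .zip(recognised.iter())
        .filter(|(e, r)| e == r)
        .count();
    correct as f64 / denom as f64
}
```

For example, `accuracy(&["the", "cat", "sat"], &["the", "hat", "sat", "down"])` gives 2 / max(3, 4) = 0.5. Dividing by the larger of the two lengths means extra recognised words lower the score just as missing words do.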

Words are normalised (lowercased, with non-alphanumeric characters removed) before comparison. Numbers are converted to words (e.g., "1" → "one") so that digits and spelled-out numbers match each other.
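A rough sketch of this normalisation step follows. The helper name `normalise`, splitting on whitespace, and spelling out only the single digits 0 to 9 are assumptions made for illustration; Playto's own number conversion likely covers more cases.

```rust
/// Hypothetical helper: turn a sentence into comparable words.
/// Lowercases, keeps alphanumeric characters only, and spells out
/// single digits. Larger numbers stay as digits in this sketch.
fn normalise(text: &str) -> Vec<String> {
    const DIGITS: [&str; 10] = [
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine",
    ];
    text.split_whitespace()
        .map(|token| {
            // Drop punctuation and other non-alphanumeric characters, then lowercase.
            token
                .chars()
                .filter(|c| c.is_alphanumeric())
                .collect::<String>()
                .to_lowercase()
        })
        // Tokens made only of punctuation become empty; skip them.
        .filter(|word| !word.is_empty())
        .map(|word| match word.parse::<usize>() {
            Ok(n) if n < DIGITS.len() => DIGITS[n].to_string(),
            _ => word,
        })
        .collect()
}
```

With this sketch, `normalise("I have 1 cat!")` yields `["i", "have", "one", "cat"]`, which then compares cleanly against a transcription of "I have one cat".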

Feedback: a word-by-word breakdown showing which words were wrong and what was recognised in their place.
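A sketch of how such word-by-word feedback could be produced follows. The `Mismatch` type and `mismatches` function are hypothetical, and the position-by-position comparison is an assumption rather than Playto's documented matching strategy.

```rust
/// One word-level mismatch. `None` means there is no word on that side.
#[derive(Debug, PartialEq)]
struct Mismatch {
    index: usize,
    expected: Option<String>,
    recognised: Option<String>,
}

/// Hypothetical helper: compare normalised words position by position
/// and report every slot where they differ, including missing or extra words.
fn mismatches(expected: &[&str], recognised: &[&str]) -> Vec<Mismatch> {
    let len = expected.len().max(recognised.len());
    (0..len)
        .filter_map(|i| {
            let e = expected.get(i).copied();
            let r = recognised.get(i).copied();
            if e == r {
                None
            } else {
                Some(Mismatch {
                    index: i,
                    expected: e.map(str::to_string),
                    recognised: r.map(str::to_string),
                })
            }
        })
        .collect()
}
```

For the expected words "the cat sat" and the recognised words "the hat sat down", this reports two mismatches: "cat" heard as "hat" at index 1, and an extra "down" at index 3.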

Whisper Model

The Whisper model downloads automatically on first use. Playto tries models in order of quality:

Model             Size       Quality
ggml-small.bin    ~460 MB    Best
ggml-base.bin     ~140 MB    Good (default)
ggml-tiny.bin     ~75 MB     Fast
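One way to read "tries models in order of quality" is sketched below: pick the first of the listed model files that already exists locally, best first. The directory argument and the `pick_model` name are assumptions. The download step is not shown, and neither is how the default (base) model fits into this order.

```rust
use std::path::{Path, PathBuf};

/// Model files from best to fastest, matching the model table.
const MODELS: [&str; 3] = ["ggml-small.bin", "ggml-base.bin", "ggml-tiny.bin"];

/// Hypothetical helper: return the highest-quality model file present in `dir`,
/// or `None` if no model has been downloaded yet.
fn pick_model(dir: &Path) -> Option<PathBuf> {
    MODELS
        .iter()
        .map(|name| dir.join(name))
        .find(|path| path.is_file())
}
```

Checking in quality order means that if a user has both the small and base models on disk, the small one wins, even though base is the default download.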

Audio format: 16 kHz mono, 32-bit float PCM. Transcription uses 4 CPU threads and greedy sampling.
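To show what this format means in practice, here is a sketch that converts interleaved 16-bit stereo samples into a 32-bit float mono buffer. It assumes the microphone input is already captured at 16 kHz (resampling is not shown), and `to_whisper_input` is a hypothetical name, not part of Playto or whisper-rs.

```rust
/// Hypothetical helper: interleaved i16 stereo (L, R, L, R, ...) at 16 kHz
/// into f32 mono samples in the range [-1.0, 1.0).
/// A trailing unpaired sample, if any, is dropped.
fn to_whisper_input(interleaved_stereo: &[i16]) -> Vec<f32> {
    interleaved_stereo
        .chunks_exact(2)
        .map(|frame| {
            // Average the left and right channels.
            let mixed = (frame[0] as f32 + frame[1] as f32) / 2.0;
            // Scale by 2^15 so i16::MIN maps to -1.0 and i16::MAX to just under 1.0.
            mixed / 32768.0
        })
        .collect()
}
```

Averaging the two channels, rather than keeping only one, preserves speech that is louder on one side of a stereo microphone.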