phonon / audiobench

phonon // evaluation infra for audio ML

Benchmarks for audio models that aren't just speech.

audiobench is an open, reproducible evaluation suite for ASR, separation, tagging, and the long tail of audio ML tasks. One command. Real workloads. No marketing-grade numbers.

$ pip install audiobench

47 tasks live · 218k eval clips · MIT license

Most audio benchmarks lie by omission.

  • Single-number vanity metrics: WER on LibriSpeech-clean tells you nothing about how a model handles noisy restaurants, code-switching, or 8kHz phone audio.
  • Train-test contamination: Public eval sets leak into pretraining corpora. audiobench includes held-out, freshly licensed clips you can trust.
  • Reproducibility by accident: Most suites reproduce only if your environment happens to match the author's. audiobench pins data revisions, decodes deterministically, and hash-verifies outputs.
  • Built by audio engineers: Generic ML harnesses ignore the signal path. Here, sample rate, dynamic range, and codec round-trips are first-class concerns.
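Hash-verified outputs can be pictured with a short sketch. This is illustrative, not audiobench's actual code: the `output_digest` helper and its fields are assumptions. The idea is that hashing the task id, the pinned data revision, and the predictions in a canonical order means two byte-identical runs yield the same digest.

```python
# Illustrative sketch of hash-verified eval outputs (not audiobench's real API).
import hashlib
import json

def output_digest(task_id: str, data_revision: str, predictions: list) -> str:
    """Hash task id, pinned data revision, and predictions in a canonical
    serialization, so identical runs produce identical digests."""
    payload = json.dumps(
        {"task": task_id, "revision": data_revision, "predictions": predictions},
        sort_keys=True,           # canonical key order
        separators=(",", ":"),    # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

d1 = output_digest("ab/asr-robust", "rev-2024.06", ["hello world"])
d2 = output_digest("ab/asr-robust", "rev-2024.06", ["hello world"])
assert d1 == d2  # deterministic: same pinned inputs, same digest
```

Any drift in data revision or decoding shows up as a digest mismatch rather than a silently different score.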

Eight task families. More on the way.

  • ab/asr-robust — speech recognition (12 tasks, stable)
  • ab/separation-musdb+ — source separation (6 tasks, stable)
  • ab/tagging-audioset-v2 — audio tagging (9 tasks, stable)
  • ab/diarization-cw — speaker diarization (5 tasks, in design)
  • ab/music-tag-mtg — music understanding (8 tasks, in design)
  • ab/sed-urban — sound event detection (7 tasks, in design)
  • ab/codec-perceptual — neural codecs (4 tasks, in design)
  • ab/tts-eval — speech synthesis (6 tasks, in design)
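The family names above suggest a simple registry. A minimal sketch, assuming a flat mapping from family id to metadata (the structure is hypothetical; only the ids, counts, and statuses come from the list above):

```python
# Hypothetical task-family registry; ids and counts are from the page,
# the data structure itself is an assumption.
TASK_FAMILIES = {
    "ab/asr-robust":          {"tasks": 12, "status": "stable"},
    "ab/separation-musdb+":   {"tasks": 6,  "status": "stable"},
    "ab/tagging-audioset-v2": {"tasks": 9,  "status": "stable"},
    "ab/diarization-cw":      {"tasks": 5,  "status": "in design"},
    "ab/music-tag-mtg":       {"tasks": 8,  "status": "in design"},
    "ab/sed-urban":           {"tasks": 7,  "status": "in design"},
    "ab/codec-perceptual":    {"tasks": 4,  "status": "in design"},
    "ab/tts-eval":            {"tasks": 6,  "status": "in design"},
}

# Select the families that are runnable today.
stable = [fid for fid, meta in TASK_FAMILIES.items() if meta["status"] == "stable"]
```

A registry like this is what lets "one command" fan out over whichever families are stable.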

Why this isn't just another benchmark repo.

The evaluation suite is free and open; the licensed data behind the hard tasks is the business. Each one feeds the other.

  1. Researchers run audiobench — free, open evaluation suite, one command.
  2. Results land on the leaderboard — public, reproducible, model-card linked.
  3. Weak spots become datasets — phonon licenses targeted training data to fix them.
  4. New datasets feed new tasks — the suite gets harder, the moat gets deeper.

Public, append-only, hash-signed leaderboard.

Top submissions on ab/asr-robust: whisper-large-v3-turbo (4.82 WER), canary-1b (5.04), parakeet-tdt-1.1b (5.31), seamlessm4t-v2-large (6.18).
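"Append-only, hash-signed" can be sketched as a hash chain: each leaderboard entry commits to the hash of the entry before it, so history can grow but not be silently rewritten. This is a sketch of the general technique, not audiobench's actual signing scheme; `append_entry` and its fields are illustrative.

```python
# Hash-chained append-only log sketch (illustrative, not audiobench's scheme).
import hashlib
import json

def append_entry(chain: list, model: str, wer: float) -> list:
    """Append an entry that commits to the previous entry's hash.
    Tampering with any earlier entry breaks every later link."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"model": model, "wer": wer, "prev": prev_hash}
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return chain + [{**body, "hash": digest}]

chain = []
chain = append_entry(chain, "whisper-large-v3-turbo", 4.82)
chain = append_entry(chain, "canary-1b", 5.04)
assert chain[1]["prev"] == chain[0]["hash"]  # the chain links back correctly
```

Verifying a published chain is just re-hashing each entry and checking every `prev` pointer; a single edited score invalidates the rest of the log.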