Audio challenges

AudioRenderer is the accessibility alternative to ImageRenderer. It encodes the challenge's solution as a sequence of sine-wave tones — one per character — and returns a WAV byte-stream. No extra dependencies.

from captchakit import (
    AudioRenderer,
    CaptchaManager,
    MemoryStorage,
    TextChallengeFactory,
)

manager = CaptchaManager(
    factory=TextChallengeFactory(length=5, charset="0123456789"),
    renderer=AudioRenderer(),
    storage=MemoryStorage(),
)
# inside an async context; issue() is a coroutine
cid, wav_bytes = await manager.issue()

Each character is mapped to a frequency (digits climb a pentatonic scale; lowercase letters span 300–1200 Hz). The result is not intelligible speech; it is a distinctive, bot-unfriendly audio fingerprint that a human listener can transcribe after one or two listens.
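
For intuition only (this is not captchakit's implementation, just an illustration of the idea), per-character tones can be packed into an in-memory WAV with nothing but the standard library; the character-to-frequency rule below is made up:

import math
import struct
import wave
from io import BytesIO


def toy_tone_wav(text: str, tone_ms: int = 400, rate: int = 8000) -> bytes:
    frames = bytearray()
    for ch in text:
        freq = 300.0 + (ord(ch) % 32) * 30.0  # made-up character -> Hz rule
        for n in range(rate * tone_ms // 1000):
            sample = int(16000 * math.sin(2 * math.pi * freq * n / rate))
            frames += struct.pack("<h", sample)  # 16-bit little-endian PCM
    buf = BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return buf.getvalue()

Calling toy_tone_wav("31415") returns playable WAV bytes, one tone per character.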

Pairing with image renderer

Best practice for accessibility: expose both renderers and let the user pick. Hold on to the challenge id across the two endpoints:

# Option A: single manager; dispatch rendering in the adapter layer
spec = await manager.factory.create()
...

# Option B: two managers sharing a Storage (sketched below)
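
A sketch of the two-manager option, assuming nothing beyond the constructors and issue() shown above (and that ImageRenderer is importable from the package top level like the other classes):

from captchakit import (
    AudioRenderer,
    CaptchaManager,
    ImageRenderer,
    MemoryStorage,
    TextChallengeFactory,
)

storage = MemoryStorage()  # shared, so a challenge issued by either manager is visible to both
factory = TextChallengeFactory(length=5, charset="0123456789")

image_manager = CaptchaManager(factory=factory, renderer=ImageRenderer(), storage=storage)
audio_manager = CaptchaManager(factory=factory, renderer=AudioRenderer(), storage=storage)

async def issue_challenge(prefer_audio: bool = False):
    # route to whichever renderer the user asked for
    manager = audio_manager if prefer_audio else image_manager
    return await manager.issue()  # (cid, payload bytes)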

For a quick approach, issue with the image renderer and render a second WAV on demand using the stored Challenge.solution.
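
A sketch of that on-demand route; storage.get(cid) below is purely a placeholder for whatever lookup your Storage actually provides, not a documented captchakit call:

async def audio_fallback(cid: str) -> bytes:
    # fetch the Challenge that was issued earlier with the image renderer
    challenge = await storage.get(cid)  # hypothetical lookup by id
    # re-render the same solution as a WAV byte-stream
    return await AudioRenderer().render(challenge)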

Custom tone mapping

from captchakit import AudioRenderer

renderer = AudioRenderer(
    tone_map={"0": 300.0, "1": 400.0, "2": 500.0},  # character -> frequency in Hz
    fallback_freq=220.0,   # frequency for characters not present in tone_map
    tone_ms=500,           # length of each tone, in milliseconds
    gap_ms=150,            # silence between consecutive tones, in milliseconds
)
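
Wired into a manager exactly as before (imports as in the first example); restricting the charset to the mapped characters means fallback_freq should never be needed:

manager = CaptchaManager(
    factory=TextChallengeFactory(length=4, charset="012"),  # only characters present in tone_map
    renderer=renderer,
    storage=MemoryStorage(),
)
cid, wav_bytes = await manager.issue()  # tones follow the custom map and timings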

Beyond tones: real TTS

If you need a human voice reading out the characters, write a small adapter around gTTS, pyttsx3, or a cloud TTS API; anything that implements the Renderer protocol (an async render(challenge) coroutine returning bytes, plus a content_type) works. captchakit intentionally keeps TTS out of the core to avoid heavy runtime dependencies.
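
A minimal sketch of such an adapter around gTTS, assuming the protocol amounts to an async render(challenge) method returning bytes plus a content_type attribute on the renderer, and that the challenge carries its text in challenge.solution (as used earlier on this page):

import asyncio
from io import BytesIO

from gtts import gTTS


class SpokenRenderer:
    """Hypothetical adapter: speaks the challenge characters via gTTS."""

    content_type = "audio/mpeg"  # gTTS produces MP3

    async def render(self, challenge) -> bytes:
        # space the characters out so "371" is read as "3 7 1"
        text = " ".join(challenge.solution)
        buf = BytesIO()
        # gTTS does blocking network I/O; keep it off the event loop
        await asyncio.to_thread(gTTS(text=text, lang="en").write_to_fp, buf)
        return buf.getvalue()

pyttsx3 or a cloud TTS client slots in the same way; only the body of render() changes.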