In a previous post, I set up Kokoro TTS so Claude Code could speak short conversational responses via a /voice slash command. It worked — but it had a problem. The agent would sometimes forget to speak after a few turns, because the voice behavior instructions only lived in the skill invocation context and faded as the conversation grew. This post covers how I fixed that with a Claude Code hook, and the design tradeoffs I considered along the way.
The Problem: Skills Are One-Shot Context
When you invoke /voice on in Claude Code, the skill’s instructions get injected into the conversation for that turn. Claude reads the rules (“speak short responses, skip code and diffs”), enables the config, and speaks a confirmation. Great.
But those instructions are not permanent. As the conversation grows and context compresses, the behavioral rules gradually lose influence. Ten turns later, Claude is back to text-only responses — not because voice is disabled, but because the instructions are no longer salient in context.
The original skill tried to solve this by being thorough: 56 lines of detailed rules about when to speak and when not to. But verbose instructions do not help if they are 4000 tokens back in context and competing with recent code diffs for attention.
Three Approaches I Considered
1. Hook as Speaker
The most obvious idea: use a Stop hook (fires after every Claude response) to read the transcript, extract the response, and pipe it through say. Let the hook handle all speech.
This fails because the value of voice mode is editorial judgment. Claude does not just parrot its text output — it composes a separate, shorter spoken version. It writes a 30-line diff as text but speaks “Done, fixed the import error.” A hook script cannot make that editorial decision. You would need one of:
- Heuristics (length checks, regex for code fences): fragile, often wrong
- An LLM in the hook (agent-type hook): adds latency and cost to every response, even when voice is off
- A special marker in Claude’s output (e.g., <!-- voice: Done -->): the most promising, but still brittle
None of these are better than just letting Claude decide when to speak inline.
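To make the fragility of the heuristic option concrete, here is a minimal sketch of what such a filter could look like. The function name should_speak and the 160-character cutoff are hypothetical, chosen only for illustration; nothing like this is part of the actual setup.

```bash
#!/bin/bash
# Hypothetical heuristic filter for a "hook as speaker" design.
# Returns 0 (speak) only for short text that contains no code fence.
should_speak() {
  local text="$1"
  local max_chars=160              # arbitrary cutoff, an assumption
  local fence=$'\x60\x60\x60'      # three backticks, written as escapes

  # Rule 1: anything containing a code fence is treated as code.
  [[ "$text" == *"$fence"* ]] && return 1
  # Rule 2: long responses are assumed to be unsuitable for speech.
  (( ${#text} > max_chars )) && return 1
  return 0
}
```

A short stack trace pasted without a fence passes both rules and would be read aloud, while a long response that deserves a one-line spoken summary is simply dropped. The filter can only veto text; it cannot compose the shorter spoken version.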
2. Hook as Reminder
Instead of having the hook do the speaking, have it remind Claude to speak. A Stop hook that injects a one-line system message: “Voice is on. Remember to speak short responses.”
This is lightweight: the hook checks a config file, and if voice is enabled, returns a short JSON blob with a systemMessage field. That message gets injected into Claude’s context for the next turn. The behavioral knowledge stays fresh without restating the full rules every time.
3. No Hook, Just Better Instructions
Alternatively, keep everything in the skill and make the instructions stickier. Shorter, punchier rules that survive context compression better.
This helps but does not solve the fundamental problem: skill instructions are injected once and decay over time. A hook is structural — it fires after every response regardless of context pressure.
I went with approach 2 — the reminder hook — combined with slimming down the skill instructions.
The Implementation
The Hook Script
#!/bin/bash
# .claude/hooks/voice-reminder.sh
# Stop hook -- injects a short voice-mode reminder when enabled.
# Fires after every Claude response. No-ops instantly when voice is off.
set -euo pipefail
CONFIG="${CLAUDE_PROJECT_DIR:-.}/.claude/voice-config.json"
# Drain stdin (hook sends JSON context we don't need)
cat > /dev/null
# Bail fast if voice is off or config missing
if [[ ! -f "$CONFIG" ]]; then
  exit 0
fi
ENABLED=$(jq -r '.enabled // false' "$CONFIG" 2>/dev/null || echo "false")
if [[ "$ENABLED" != "true" ]]; then
  exit 0
fi
# Inject terse reminder for next turn
cat <<'EOF'
{"systemMessage":"Voice ON. Speak short responses (acks, status, questions, completions) via say \"message\". Skip code, diffs, long output, anything >2 sentences."}
EOF
Key design choices:
- Bail fast. When voice is off (the common case), the script reads the config and exits in a few milliseconds. No wasted work.
- Drain stdin. Claude Code sends JSON context to hooks via stdin. We do not need it, but we read it to avoid a broken pipe.
- Terse reminder. The system message is one line, about 25 tokens. Compare to the original 56-line skill instructions. This fires after every response, so brevity matters.
Registering the Hook
Add it to .claude/settings.json:
{
  "permissions": {
    "allow": [
      "Bash(say *)"
    ]
  },
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": ".claude/hooks/voice-reminder.sh"
          }
        ]
      }
    ]
  }
}
Two things happening here:
- The Stop hook registers voice-reminder.sh to fire after every response.
- The say * permission auto-allows all say commands without a confirmation prompt. Without this, every spoken response triggers an “Allow Bash: say …?” dialog, which defeats the purpose.
Slimming the Skill
The /voice skill went from 56 lines to 27. Before, it contained all the behavioral rules (when to speak, when not to, how to sound natural). Now it is just a toggle:
## Toggle
- **`on`**: Set `enabled: true`. Confirm: `say "Voice mode on."`.
- **`off`**: Set `enabled: false`. Print "Voice mode off." (don't speak).
- **`status`**: Report current state (enabled, voice name).
- **No argument**: Toggle current state.
A Stop hook automatically reminds you about voice behavior
while enabled -- no rules to memorize here.
The behavioral knowledge moved from the skill (injected once, decays) to the hook (injected every turn, persistent). The skill just flips the switch.
How It Flows
User: /voice on
-> Skill reads config, sets enabled: true
-> Claude speaks "Voice mode on."
-> Stop hook fires, sees enabled: true, injects reminder
User: "fix the auth bug"
-> Claude sees reminder in context: "Voice ON. Speak short responses..."
-> Claude speaks "Got it, looking into that."
-> Claude reads files, writes fix, shows diff as text
-> Claude speaks "Done. Fixed the token validation."
-> Stop hook fires again, refreshes the reminder
User: /voice off
-> Skill sets enabled: false
-> Stop hook fires, sees enabled: false, exits silently
-> No more reminders until re-enabled
The hook costs essentially nothing when voice is off (a config file read), and about 25 tokens per turn when voice is on. Compared to the original approach — where the full behavioral instructions consumed 300+ tokens and still decayed over time — this is both cheaper and more reliable.
When This Pattern Is Useful Beyond Voice
The “Stop hook as persistent reminder” pattern works for any session-level behavior that should survive context compression:
- Code style enforcement. A hook that reminds the agent about project conventions (naming, error handling, test patterns) after every response.
- Safety constraints. A hook that reinforces rules like “never modify production config” or “always run tests after editing.”
- Mode toggles. Any feature where the user expects the agent to “remember” a preference — verbose mode, a specific output format, a target branch.
The key insight: skills inject context once, hooks inject it continuously. Use skills for actions (toggle a setting, run a command) and hooks for persistence (keep the agent aware of the setting).
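As a rough sketch of how the pattern generalizes, the hypothetical emit_reminder helper below prints a reminder only while a flag file exists. The flag-file convention and the function name are assumptions for illustration; the systemMessage field is the same one the voice hook emits.

```bash
#!/bin/bash
# Hypothetical generic reminder: print a systemMessage JSON blob while a
# mode is active, print nothing otherwise. Assumes jq is installed.
emit_reminder() {
  local flag_file="$1" message="$2"
  # Mode is off: stay silent, like the voice hook's fast exit.
  [[ -f "$flag_file" ]] || return 0
  # jq takes care of escaping quotes and backslashes in the message.
  jq -cn --arg msg "$message" '{systemMessage: $msg}'
}
```

A Stop hook script would drain stdin as before and then call something like emit_reminder "$CLAUDE_PROJECT_DIR/.claude/verbose-mode" "Verbose mode ON. Explain each step." (the path and message here are made up).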
Making say Non-Blocking
One subtle UX problem with the original say script: it was synchronous. When Claude called say "Done, fixed the import error.", the Bash tool waited for the entire TTS pipeline — HTTP request to Kokoro, audio download, full playback via ffplay — before returning. That meant Claude’s text output stalled for 1-3 seconds while audio played. In a voice-enhanced workflow, you want speech and text output to happen in parallel.
The first fix was a one-line change at the end of the say script:
# Play audio in background — script returns immediately, subshell cleans up after playback
(ffplay -nodisp -autoexit -loglevel quiet "$TMPFILE" 2>/dev/null; rm -f "$TMPFILE") &
The subshell (...) groups the playback and cleanup, and & backgrounds it. The script returns immediately after curl finishes downloading the audio, Claude continues its text output, and the audio plays concurrently. The temp file gets cleaned up after playback finishes, even though the parent script is long gone.
This matters more than it sounds. Without it, every spoken response adds a noticeable pause to Claude’s output. With it, speech feels like a side channel — information arrives through your ears while your eyes keep reading the terminal.
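The effect is easy to reproduce without any audio. In this sketch, sleep stands in for ffplay and the hypothetical play_async function stands in for the tail of the say script; both are assumptions for illustration.

```bash
#!/bin/bash
# Stand-in for the tail of the say script: "play" a file in a backgrounded
# subshell and clean it up afterwards. sleep is a placeholder for ffplay.
play_async() {
  local tmpfile="$1"
  (sleep 1; rm -f "$tmpfile") &
}

clip=$(mktemp)
SECONDS=0
play_async "$clip"
echo "returned after ${SECONDS}s"   # control is back before "playback" ends
wait                                # only for the demo; the real script does not wait
[[ -e "$clip" ]] || echo "temp file cleaned up by the subshell"
```

The function returns while the temp file still exists, and the file disappears only once the background subshell finishes.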
The Overlap Problem
Non-blocking playback introduced a new bug: when Claude fires multiple say calls in quick succession, the audio clips overlap and play simultaneously. This happens because each backgrounded ffplay process starts immediately — there is nothing serializing playback.
The fix is flock — a file-based lock that serializes the playback subshells while keeping the script itself non-blocking:
# Play audio in background with serialization — script returns immediately,
# but a lockfile ensures clips play sequentially (no overlap).
LOCKFILE="/tmp/say-playback.lock"
(
  flock 9
  ffplay -nodisp -autoexit -loglevel quiet "$TMPFILE" 2>/dev/null
  rm -f "$TMPFILE"
) 9>"$LOCKFILE" &
Each say call still returns immediately (Claude does not wait), but the backgrounded subshells queue on the lockfile. The first clip grabs the lock and plays; the second waits for the lock, then plays; and so on. You get non-blocking output for the agent and sequential audio for the listener.
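A stripped-down reproduction of the queueing behavior, again with sleep standing in for ffplay. The function name play_serialized and the log file are hypothetical, and flock here is the util-linux command, so the sketch assumes Linux.

```bash
#!/bin/bash
# Serialize backgrounded "playback" with flock. Each call returns at once,
# but the subshells take turns holding the lock on file descriptor 9.
play_serialized() {
  local name="$1" lockfile="$2" log="$3"
  (
    flock 9                        # block until no other clip holds the lock
    echo "start $name" >> "$log"
    sleep 0.3                      # placeholder for ffplay
    echo "end $name" >> "$log"
  ) 9>"$lockfile" &
}

lock=$(mktemp)
log=$(mktemp)
play_serialized first "$lock" "$log"
play_serialized second "$lock" "$log"
wait
cat "$log"   # each "start X" line is directly followed by "end X"
```

Which clip wins the lock first is not guaranteed when two calls land almost simultaneously; the lock only guarantees that clips never interleave.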
Known Limitations
The reminder might not be enough. The main uncertainty is whether the Stop hook’s systemMessage survives context compression as well as I hope. It gets injected fresh each turn, so even if old reminders get compressed away, the latest one should be in the hot zone of context. Early results are promising — Claude consistently remembers to speak across 20+ turn sessions now — but I have not stress-tested it with truly long conversations.
If that turns out to be a problem, the next step would be a smarter hook that reads the transcript (via the transcript_path field in the hook’s stdin JSON), checks whether Claude actually called say in its last response, and only injects the reminder if it did not. That would trade a bit more hook complexity for targeted reminders instead of blanket ones. For now, 25 tokens per turn is cheap enough that blanket reminders are fine.
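A rough sketch of that smarter check, under loud assumptions: the transcript is a JSONL file, the most recent response sits in its last few lines, and a spoken response shows up as a tool call with a "command" key whose value starts with say. The transcript schema is not covered in this post, so the grep pattern and the 20-line window are guesses, and said_recently is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical check: does the tail of the transcript contain a say call?
# ASSUMPTION: the transcript is JSONL and a Bash tool call appears with a
# "command" key whose value starts with say. Adjust the pattern to match
# the real schema before relying on it.
said_recently() {
  local transcript="$1" window="${2:-20}"
  [[ -f "$transcript" ]] || return 1
  tail -n "$window" "$transcript" \
    | grep -Eq '"command"[[:space:]]*:[[:space:]]*"say '
}
```

In the hook, the path would be read from stdin instead of draining it, for example with jq -r '.transcript_path // empty', and the reminder would be printed only when said_recently fails for that path.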
Stale config across sessions. The config file persists on disk. If you end a session with voice on and start a new one, the hook will immediately start injecting reminders — even if you did not explicitly enable voice for that session. This is arguably the right default (you left it on, so it stays on), but it can be surprising. If you prefer voice to be opt-in per session, you could add a SessionStart hook that resets enabled to false on launch.
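One possible shape for that reset, as a sketch: a function that a SessionStart hook script could call to force enabled back to false. The name reset_voice is hypothetical, and wiring it up assumes a SessionStart entry in .claude/settings.json next to the Stop entry.

```bash
#!/bin/bash
# Hypothetical SessionStart helper: force voice off at launch.
# Assumes jq is installed; a missing config file is left alone.
reset_voice() {
  local config="$1" tmp
  [[ -f "$config" ]] || return 0
  tmp=$(mktemp) || return 1
  # Write to a temp file first so a jq failure never truncates the config.
  if jq '.enabled = false' "$config" > "$tmp"; then
    mv "$tmp" "$config"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

The hook script would call it as reset_voice "${CLAUDE_PROJECT_DIR:-.}/.claude/voice-config.json". Other fields in the config are preserved because jq rewrites only the enabled key.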
Silent first turn in new sessions. Because the Stop hook fires after a response, the very first response in a new session has no voice reminder in context. The agent responds in text only, the hook fires, and from the second turn onward speech works normally. It self-corrects in one turn, but if you are expecting immediate voice output it can be confusing. A potential fix would be to add a SessionStart hook that injects the voice reminder into the initial context if the config is already enabled.
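A sketch of that potential fix, assuming Claude Code adds whatever a SessionStart hook prints on stdout to the session's initial context (worth checking against the hooks documentation for your version before relying on it). The function name session_start_reminder is hypothetical; the reminder text is copied from the Stop hook.

```bash
#!/bin/bash
# Hypothetical SessionStart helper: print the voice reminder if the config
# is already enabled, print nothing otherwise. Assumes jq is installed.
session_start_reminder() {
  local config="$1" enabled
  [[ -f "$config" ]] || return 0
  enabled=$(jq -r '.enabled // false' "$config" 2>/dev/null || echo "false")
  [[ "$enabled" == "true" ]] || return 0
  echo 'Voice ON. Speak short responses (acks, status, questions, completions) via say "message". Skip code, diffs, long output, anything >2 sentences.'
}
```

This option and the reset-on-launch option are alternatives: one keeps voice sticky across sessions and fixes the silent first turn, the other makes voice opt-in per session.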
No false-positive risk when off. One thing that works well: when voice is disabled, the hook is a true no-op. It reads one JSON file, sees false, and exits. There is no chance of the agent “remembering” voice mode from earlier in the conversation and speaking when it should not — the hook is the only thing that injects voice instructions, and it only does so when the config says to.
Setup Summary
If you want to replicate this (assuming you already have the Kokoro TTS setup from the previous post):
- Create .claude/hooks/voice-reminder.sh with the script above
- chmod +x .claude/hooks/voice-reminder.sh
- Add the Stop hook and Bash(say *) permission to .claude/settings.json
- Restart your Claude Code session (hooks are captured at session start)
- /voice on to enable, /voice off to disable
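Before restarting, a quick sanity check can catch the usual mistakes (script not executable, hook not registered, permission missing). This is only a sketch: check_voice_setup is a hypothetical helper, and it assumes jq is installed and that the hook is registered the same way as in the settings.json from this post.

```bash
#!/bin/bash
# Hypothetical sanity check for the voice hook setup under a project root.
# Prints one line per problem and returns non-zero if anything is missing.
check_voice_setup() {
  local root="${1:-.}"
  local script="$root/.claude/hooks/voice-reminder.sh"
  local settings="$root/.claude/settings.json"
  local status=0

  [[ -x "$script" ]] || { echo "missing or not executable: $script"; status=1; }

  # Any Stop hook whose command mentions the reminder script counts.
  jq -e '[.hooks.Stop[]?.hooks[]?.command | strings]
         | any(contains("voice-reminder.sh"))' "$settings" >/dev/null 2>&1 \
    || { echo "Stop hook not registered in $settings"; status=1; }

  # The allow list must contain the say permission verbatim.
  jq -e '(.permissions.allow // []) | any(. == "Bash(say *)")' \
    "$settings" >/dev/null 2>&1 \
    || { echo "Bash(say *) permission missing in $settings"; status=1; }

  return "$status"
}
```

Run it from the project root as check_voice_setup . and fix whatever it reports before starting a new session.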
The hook, the skill, and the say script are all in my popos-management repo if you want to see the full implementation.
Updated Feb 13, 2026: Added the non-blocking say section, the flock overlap fix, and documented the silent-first-turn limitation after testing in a fresh session.