voice-tts Security Report — Low Risk | ClawSafe

25/100

voice-tts

Voice input (Whisper ASR) + voice output (Edge TTS) skill. Supports agent-specific voices and can call send_voice_reply.mjs to send Telegram voice messages.

A legitimate voice TTS/ASR skill for OpenClaw with no malicious behavior. However, its code performs shell execution, credential reading, and network access that SKILL.md does not declare.

Skill Name: voice-tts

Duration: 74.1s

Engine: pi

✓

Safe to install

Add explicit declarations to SKILL.md: (1) shell:WRITE for subprocess/pip/curl usage, (2) credential reading (openclaw.json botToken access), (3) network:WRITE for Telegram API calls. Also remove the references to non-existent scripts/edge_tts and scripts/whisper from SKILL.md.

Findings 5 items

Severity	Finding	Location
Medium	Undocumented shell subprocess execution [Doc Mismatch] voice-asr.mjs and voice-tts.mjs use Node.js spawn() to execute Python scripts (whisper, edge-tts), and send_voice_reply.mjs runs curl. None of this shell execution is declared in SKILL.md's capability declarations; the allowed-tools section documents only the Node.js CLI entrypoints, not the underlying subprocess calls. `const child = spawn('python3', [script, audioFile, ...passthrough], { stdio: ['ignore', 'pipe', 'pipe'] });` → Add shell:WRITE to SKILL.md's capability declarations, or refactor to use documented Node.js-only approaches.	`bin/voice-asr.mjs:67`
Medium	Undocumented network access (Telegram API) [Doc Mismatch] send_voice_reply.mjs makes HTTPS POST requests to api.telegram.org via curl to send voice messages. This network:WRITE access is not declared in SKILL.md. The Telegram bot token is also read from openclaw.json without a capability declaration. `` await runCommand('curl', ['-s', '-o', '/dev/null', '-w', '%{http_code}', '-F', `chat_id=${chatId}`, '-F', `voice=@${voiceFile}`, '-F', `caption=${caption.slice(0, 1024)}`, apiUrl]); `` → Add network:WRITE and credential access to SKILL.md's capability declarations.	`scripts/send_voice_reply.mjs:80`
Low	SKILL.md references non-existent internal script files [Doc Mismatch] SKILL.md states 'scripts/edge_tts and scripts/whisper are internal Python wrappers, not direct entrypoints', but these files do not exist in the scripts/ directory. Execution actually goes through pip-installed Python packages (edge_tts, whisper) invoked via python3. This is a doc-to-code mismatch, though not a malicious one. `scripts/edge_tts 和 scripts/whisper 是内部 Python 封装，非直接入口` → Remove the references to scripts/edge_tts and scripts/whisper, or add actual stub wrapper scripts.	`SKILL.md:200`
Low	Credential reading from openclaw.json not capability-declared [Sensitive Access] getBotToken() in send_voice_reply.mjs reads ~/.openclaw/openclaw.json to extract botToken. Although this is described in the parameter docs, it is not declared as a credential access capability in the skill's capability map. `` const cfg = vm.runInNewContext(`(${raw})`, {}); const accounts = cfg?.channels?.telegram?.accounts; return accounts[agentId]?.botToken || accounts['default']?.botToken || null; `` → Add environment:READ to the capability declarations if reading openclaw.json is considered credential access.	`scripts/send_voice_reply.mjs:49`
Low	install.sh uses proxy variable unquoted in pip command [Doc Mismatch] At install.sh lines 40-41, the PROXY variable is reported as unquoted when interpolated into the pip install command. Although the variable comes from a controlled script argument (--proxy), unquoted variables in shell commands are a general code quality concern. `PIP_CMD="pip3 install edge-tts whisper click" if [[ -n "$PROXY" ]]; then PIP_CMD="pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple edge-tts whisper click" export https_proxy="$PROXY" http_proxy="$PROXY" fi` → Quote the proxy variable: export https_proxy="$PROXY". The pip command itself is fine since PROXY is URL-based.	`install.sh:40`
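
To illustrate the credential-reading finding, here is a minimal sketch of the same agent-then-default token lookup that parses the file with `JSON.parse` rather than evaluating it through `vm.runInNewContext`. It assumes openclaw.json is strict JSON, which this report does not confirm. `resolveBotToken` is a hypothetical name, not a function in the skill.

```javascript
// Hypothetical helper (not part of voice-tts): resolve a Telegram bot token
// from the raw text of openclaw.json.
// ASSUMPTION: the file is strict JSON. If it uses comments or unquoted keys,
// JSON.parse fails and this sketch returns null.
function resolveBotToken(raw, agentId) {
  let cfg;
  try {
    cfg = JSON.parse(raw); // parses data only, never executes file content
  } catch {
    return null;
  }
  const accounts = cfg?.channels?.telegram?.accounts;
  // Optional chaining on every step, so a missing accounts map yields null
  // instead of throwing.
  return accounts?.[agentId]?.botToken || accounts?.default?.botToken || null;
}
```

The fallback order matches the snippet quoted in the finding: the agent's own account first, then `default`, then `null`.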

Resource	Declared	Inferred	Status	Evidence
Filesystem	`NONE`	`READ`	✓ Aligned	bin/voice-asr.mjs:7 (fs.readFileSync reads openclaw.json); lib/config.mjs:43
Filesystem	`NONE`	`WRITE`	✓ Aligned	bin/voice-asr.mjs:85-91 (copyFileSync/unlinkSync for archiving); bin/voice-tts.m…
Shell	`NONE`	`WRITE`	✓ Aligned	bin/voice-asr.mjs:67 (spawn('python3', ...)); bin/voice-tts.mjs:51 (spawn('pytho…
Network	`NONE`	`WRITE`	✓ Aligned	scripts/send_voice_reply.mjs:80-91 (curl POST to https://api.telegram.org/)
Environment	`NONE`	`READ`	✓ Aligned	bin/voice-asr.mjs:82 (process.env.OPENCLAW_WORKSPACE); scripts/send_voice_reply.…
Skill Invoke	`NONE`	`READ`	✓ Aligned	bin/voice-asr.mjs:93-95 (generates output instructing agent to call send_voice_r…

4 findings

Medium External URL

http://127.0.0.1:7897

SKILL.md:50

Medium External URL

https://nodejs.org/

install.sh:37

Medium External URL

https://pypi.tuna.tsinghua.edu.cn/simple

install.sh:49

Medium External URL

https://api.telegram.org/bot$

scripts/send_voice_reply.mjs:80
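
The Telegram call at this location passes user-influenced text (the caption) to curl through `-F`. As a hedged sketch, the argument vector could be built so that text fields use curl's `--form-string`, which takes the value literally. With `-F`, a leading `@` or `<` in the value makes curl read from a file. `buildSendVoiceArgs` and its parameter names are hypothetical; only the curl flags and the 1024-character caption cut come from the quoted call and curl itself.

```javascript
// Hypothetical helper (not part of voice-tts): build the argv array for
// spawn('curl', args). An array means no shell is involved in parsing it.
function buildSendVoiceArgs({ apiUrl, chatId, voiceFile, caption }) {
  return [
    '-s', '-o', '/dev/null', '-w', '%{http_code}',
    // --form-string sends the value literally; a leading '@' or '<' is not
    // treated as "read this file", unlike -F.
    '--form-string', `chat_id=${chatId}`,
    // The voice file is a real upload, so -F with '@' is intended here.
    '-F', `voice=@${voiceFile}`,
    // Same 1024-character cut as the call quoted in the finding.
    '--form-string', `caption=${String(caption).slice(0, 1024)}`,
    apiUrl,
  ];
}
```

A caption such as `@/etc/hostname` then reaches Telegram as literal text rather than as the contents of a local file.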

File Tree

11 files · 34.1 KB · 980 lines

JavaScript 6f · 473L | Markdown 1f · 261L | Shell 2f · 208L | JSON 2f · 38L

├─ bin/
│  ├─ voice-asr.mjs  JavaScript 127L · 5.2 KB
│  └─ voice-tts.mjs  JavaScript 68L · 2.4 KB
├─ lib/
│  ├─ audio.mjs  JavaScript 15L · 673 B
│  ├─ config.mjs  JavaScript 63L · 2.2 KB
│  └─ errors.mjs  JavaScript 19L · 860 B
├─ scripts/
│  └─ send_voice_reply.mjs  JavaScript 181L · 6.9 KB
├─ tests/
│  └─ smoke.sh  Shell 24L · 955 B
├─ config.default.json  JSON 24L · 735 B
├─ install.sh  Shell 184L · 6.7 KB
├─ package.json  JSON 14L · 288 B
└─ SKILL.md  Markdown 261L · 7.3 KB

Dependencies 3 items

Package	Version	Source	Known Vulns	Notes
`edge-tts`	`latest (unpinned in install.sh)`	pip	No	Installed by install.sh via pip install with no version constraint
`whisper`	`latest (unpinned in install.sh)`	pip	No	Installed by install.sh via pip install with no version constraint
`click`	`latest (unpinned in install.sh)`	pip	No	Installed by install.sh via pip install with no version constraint

Security Positives

✓ No evidence of reverse shell, C2, or data exfiltration to unauthorized destinations

✓ All network calls are to legitimate, documented endpoints (api.telegram.org, pypi.org, nodejs.org)

✓ No base64 encoding, obfuscation, or anti-analysis techniques detected

✓ No credential exfiltration — botToken is only used locally for Telegram API authentication

✓ File operations are scoped to expected paths (media directories, /tmp, workspace)

✓ Audio file archiving uses copy-before-delete pattern to prevent data loss

✓ Timeout protection on subprocess calls (SIGKILL after timeout)

✓ No access to ~/.ssh, ~/.aws, .env, or other sensitive credential paths

✓ pip install uses trusted packages (edge-tts, whisper, click) from official PyPI
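
The timeout protection listed above (SIGKILL after timeout) can be sketched roughly as follows. This is an illustration of the general pattern, not the skill's actual code. `runWithTimeout` is a hypothetical name, and the dynamic import is used only so the snippet loads as either an ES module or a CommonJS script.

```javascript
// Hypothetical helper (not part of voice-tts): run a subprocess and
// force-kill it if it exceeds timeoutMs.
async function runWithTimeout(cmd, args, timeoutMs) {
  const { spawn } = await import('node:child_process');
  return new Promise((resolve, reject) => {
    // Same stdio layout as the spawn call quoted in the findings.
    const child = spawn(cmd, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let stdout = '';
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      child.kill('SIGKILL'); // cannot be caught or ignored by the child
    }, timeoutMs);

    child.stdout.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.resume(); // drain stderr so the child never blocks on it

    child.on('error', (err) => { clearTimeout(timer); reject(err); });
    child.on('close', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut, stdout });
    });
  });
}
```

A caller would treat `timedOut === true` as a failure and report it rather than use the partial `stdout`.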
