Low Risk — Risk Score 25/100
Last scan: 23 hr ago
voice-tts
Voice input (Whisper ASR) + voice output (Edge TTS) skill. Supports agent-specific voices and can call send_voice_reply.mjs to send Telegram voice messages.
Legitimate voice TTS/ASR skill for OpenClaw with no malicious behavior. However, the code performs shell execution, credential reading, and network access that are not declared in SKILL.md.
Skill Name: voice-tts
Duration: 74.1s
Engine: pi
Safe to install
Add explicit declarations to SKILL.md: (1) shell:WRITE for subprocess/pip/curl usage, (2) credential reading (openclaw.json botToken access), (3) network:WRITE for Telegram API calls. Also remove the references to non-existent scripts/edge_tts and scripts/whisper from SKILL.md.

Findings 5 items

Severity Finding Location
Medium
Undocumented shell subprocess execution Doc Mismatch
voice-asr.mjs and voice-tts.mjs use Node.js spawn() to execute Python scripts (whisper, edge-tts) and send_voice_reply.mjs runs curl. None of this shell execution is declared in SKILL.md's capability declarations. The allowed-tools section only documents Node.js CLI entrypoints but not the underlying subprocess calls.
const child = spawn('python3', [script, audioFile, ...passthrough], { stdio: ['ignore', 'pipe', 'pipe'] });
→ Add shell:WRITE to SKILL.md's capability declarations, or refactor to use documented Node.js-only approaches.
bin/voice-asr.mjs:67
Medium
Undocumented network access (Telegram API) Doc Mismatch
send_voice_reply.mjs makes HTTPS POST requests to api.telegram.org via curl to send voice messages. This network:WRITE access is not declared in SKILL.md. The Telegram bot token is also read from openclaw.json without capability declaration.
await runCommand('curl', ['-s', '-o', '/dev/null', '-w', '%{http_code}', '-F', `chat_id=${chatId}`, '-F', `voice=@${voiceFile}`, '-F', `caption=${caption.slice(0, 1024)}`, apiUrl]);
→ Add network:WRITE and credential access to SKILL.md capability declarations.
scripts/send_voice_reply.mjs:80
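One way to follow the "Node.js-only" suggestion for this finding is to replace the curl subprocess with Node's built-in fetch. The sketch below is illustrative only and is not taken from the skill: `buildVoiceForm` and `sendVoice` are hypothetical names, it assumes Node.js 18+ (global `fetch`, `FormData`, `Blob`), and it takes the audio as bytes instead of a file path. The `sendVoice` method and the 1024-character caption cap mirror the curl call quoted above. Network access would still need to be declared in SKILL.md.

```javascript
// Sketch only: send a Telegram voice message without spawning curl.
// Assumes Node.js 18+; all function names here are hypothetical.
const CAPTION_LIMIT = 1024; // same cap as caption.slice(0, 1024) in the finding

// Build the multipart body separately so it can be inspected without network access.
function buildVoiceForm(chatId, voiceBytes, fileName, caption = '') {
  const form = new FormData();
  form.append('chat_id', String(chatId));
  form.append('voice', new Blob([voiceBytes]), fileName);
  form.append('caption', caption.slice(0, CAPTION_LIMIT));
  return form;
}

// POST the form to the Bot API and return the HTTP status code,
// which is what the curl call extracted via -w '%{http_code}'.
// fetchImpl is injectable so the function can be exercised offline.
async function sendVoice(botToken, chatId, voiceBytes, fileName, caption = '', fetchImpl = fetch) {
  const res = await fetchImpl(`https://api.telegram.org/bot${botToken}/sendVoice`, {
    method: 'POST',
    body: buildVoiceForm(chatId, voiceBytes, fileName, caption),
  });
  return res.status;
}
```

Keeping the token out of a process argument list is a side benefit of this approach, since command-line arguments can be visible to other local users through process listings.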
Low
SKILL.md references non-existent internal script files Doc Mismatch
SKILL.md states 'scripts/edge_tts and scripts/whisper are internal Python wrappers' but these files do not exist in the scripts/ directory. The actual execution goes through pip-installed Python packages (edge_tts, whisper) invoked via python3. This is a doc-to-code mismatch, though not malicious.
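scripts/edge_tts 和 scripts/whisper 是内部 Python 封装,非直接入口 (English: "scripts/edge_tts and scripts/whisper are internal Python wrappers, not direct entry points")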
→ Remove the references to scripts/edge_tts and scripts/whisper, or add actual stub wrapper scripts.
SKILL.md:200
Low
Credential reading from openclaw.json not capability-declared Sensitive Access
getBotToken() in send_voice_reply.mjs reads ~/.openclaw/openclaw.json to extract botToken. While this is described in the parameter docs, it is not declared as a credential access capability in the skill's capability map.
const cfg = vm.runInNewContext(`(${raw})`, {});
const accounts = cfg?.channels?.telegram?.accounts;
return accounts[agentId]?.botToken || accounts['default']?.botToken || null;
→ Add environment:READ to capability declarations if reading openclaw.json is considered credential access.
scripts/send_voice_reply.mjs:49
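The token lookup quoted in this finding evaluates the config text with `vm.runInNewContext`, and Node's documentation notes that `node:vm` is not a security mechanism. As a hedged illustration, the same lookup can be written with `JSON.parse`, which never executes code. This is a sketch, not the skill's implementation: `pickBotToken` is a hypothetical name, and it assumes openclaw.json is strict JSON (the original's use of `vm` may mean the file allows a relaxed syntax, in which case this would return null).

```javascript
// Sketch only: extract a Telegram bot token from the raw text of openclaw.json.
// Assumption: the file is strict JSON. pickBotToken is a hypothetical helper.
function pickBotToken(raw, agentId) {
  let cfg;
  try {
    cfg = JSON.parse(raw);
  } catch {
    return null; // not strict JSON: leave the decision to the caller
  }
  const accounts = cfg?.channels?.telegram?.accounts;
  // Guard against a missing accounts map before indexing into it.
  if (!accounts || typeof accounts !== 'object') return null;
  // Same fallback order as the quoted code: agent-specific token, then 'default'.
  return accounts[agentId]?.botToken || accounts.default?.botToken || null;
}
```

The explicit guard on `accounts` also avoids a TypeError when the config has no Telegram section.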
Low
install.sh uses proxy variable unquoted in pip command Doc Mismatch
In install.sh lines 40-41, the PROXY variable is unquoted when interpolated into the pip install command. Although the value comes from a controlled script argument (--proxy), unquoted variables in shell commands are a general code quality concern.
PIP_CMD="pip3 install edge-tts whisper click"
if [[ -n "$PROXY" ]]; then
  PIP_CMD="pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple edge-tts whisper click"
  export https_proxy="$PROXY" http_proxy="$PROXY"
fi
→ Quote proxy variable: export https_proxy="$PROXY". The pip command itself is fine since PROXY is URL-based.
install.sh:40
Resource Declared Inferred Status Evidence
Filesystem NONE READ ✓ Aligned bin/voice-asr.mjs:7 (fs.readFileSync reads openclaw.json); lib/config.mjs:43
Filesystem NONE WRITE ✓ Aligned bin/voice-asr.mjs:85-91 (copyFileSync/unlinkSync for archiving); bin/voice-tts.m…
Shell NONE WRITE ✓ Aligned bin/voice-asr.mjs:67 (spawn('python3', ...)); bin/voice-tts.mjs:51 (spawn('pytho…
Network NONE WRITE ✓ Aligned scripts/send_voice_reply.mjs:80-91 (curl POST to https://api.telegram.org/)
Environment NONE READ ✓ Aligned bin/voice-asr.mjs:82 (process.env.OPENCLAW_WORKSPACE); scripts/send_voice_reply.…
Skill Invoke NONE READ ✓ Aligned bin/voice-asr.mjs:93-95 (generates output instructing agent to call send_voice_r…
4 findings
Medium External URL
http://127.0.0.1:7897
SKILL.md:50
Medium External URL
https://nodejs.org/
install.sh:37
Medium External URL
https://pypi.tuna.tsinghua.edu.cn/simple
install.sh:49
Medium External URL
https://api.telegram.org/bot$
scripts/send_voice_reply.mjs:80

File Tree

11 files · 34.1 KB · 980 lines
JavaScript 6f · 473L Markdown 1f · 261L Shell 2f · 208L JSON 2f · 38L
├─ 📁 bin
│ ├─ 📜 voice-asr.mjs JavaScript 127L · 5.2 KB
│ └─ 📜 voice-tts.mjs JavaScript 68L · 2.4 KB
├─ 📁 lib
│ ├─ 📜 audio.mjs JavaScript 15L · 673 B
│ ├─ 📜 config.mjs JavaScript 63L · 2.2 KB
│ └─ 📜 errors.mjs JavaScript 19L · 860 B
├─ 📁 scripts
│ └─ 📜 send_voice_reply.mjs JavaScript 181L · 6.9 KB
├─ 📁 tests
│ └─ 🔧 smoke.sh Shell 24L · 955 B
├─ 📋 config.default.json JSON 24L · 735 B
├─ 🔧 install.sh Shell 184L · 6.7 KB
├─ 📋 package.json JSON 14L · 288 B
└─ 📝 SKILL.md Markdown 261L · 7.3 KB

Dependencies 3 items

Package Version Source Known Vulns Notes
edge-tts latest (unpinned in install.sh) pip No No version pinning in install.sh — pip install without version constraint
whisper latest (unpinned in install.sh) pip No No version pinning in install.sh — pip install without version constraint
click latest (unpinned in install.sh) pip No No version pinning in install.sh — pip install without version constraint

Security Positives

✓ No evidence of reverse shell, C2, or data exfiltration to unauthorized destinations
✓ All network calls go to legitimate, publicly known endpoints (api.telegram.org, pypi.org or the pypi.tuna.tsinghua.edu.cn mirror, nodejs.org)
✓ No base64 encoding, obfuscation, or anti-analysis techniques detected
✓ No credential exfiltration — botToken is only used locally for Telegram API authentication
✓ File operations are scoped to expected paths (media directories, /tmp, workspace)
✓ Audio file archiving uses copy-before-delete pattern to prevent data loss
✓ Timeout protection on subprocess calls (SIGKILL after timeout)
✓ No access to ~/.ssh, ~/.aws, .env, or other sensitive credential paths
✓ pip install uses trusted packages (edge-tts, whisper, click) from PyPI, or from the Tsinghua PyPI mirror when --proxy is set
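
The timeout protection listed above (SIGKILL after timeout) can be sketched roughly as follows. This is an assumption about the general pattern, not a copy of the skill's code: `runWithTimeout` is a hypothetical helper, and the `stdio` layout simply mirrors the `spawn()` call quoted in the first finding.

```javascript
// Sketch only: run a subprocess and force-kill it if it exceeds timeoutMs.
// runWithTimeout is a hypothetical name; the skill's real helper may differ.
async function runWithTimeout(command, args, timeoutMs) {
  // Dynamic import keeps the snippet usable from both ESM and CommonJS files.
  const { spawn } = await import('node:child_process');
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ['ignore', 'pipe', 'pipe'] });
    let stdout = '';
    let stderr = '';
    let timedOut = false;

    // SIGKILL cannot be caught or ignored, so a hung child cannot outlive the timer.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill('SIGKILL');
    }, timeoutMs);

    child.stdout.setEncoding('utf8');
    child.stderr.setEncoding('utf8');
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.stderr.on('data', (chunk) => { stderr += chunk; });

    // 'error' fires when the process cannot be spawned at all (for example ENOENT).
    child.on('error', (err) => { clearTimeout(timer); reject(err); });
    // 'close' fires once the process has exited and its stdio streams are closed.
    child.on('close', (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut, stdout, stderr });
    });
  });
}
```

A caller would check `timedOut` before trusting `stdout`, for example after `await runWithTimeout('python3', [script, audioFile], 60000)`, where the arguments and the 60-second limit are placeholders.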