kiwi-voice Security Report — Low Risk | ClawSafe

5 /100

kiwi-voice

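Kiwi Voice: a multilingual real-time voice assistant that communicates with an LLM backend over the OpenClaw WebSocket. It supports Faster Whisper / MLX Whisper / ElevenLabs for STT and Qwen3 / ElevenLabs / Piper / Kokoro for TTS, and integrates wake-word detection, speaker identification, dangerous-command filtering, and Telegram approval.

Kiwi Voice is a legitimate open-source voice assistant project with accurate documentation, a clear architecture, and solid security controls. All subprocess calls invoke legitimate system tools (pactl/ffmpeg/openclaw) with no command injection; all external network requests go to trusted AI service APIs; there is no credential harvesting and no data exfiltration.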

Skill Name: kiwi-voice

Duration: 110.7s

Engine: pi

✓

Safe to install

Safe to use. Enabling auth for the API and pinning dependency versions are recommended.

Findings 2 items

Severity	Finding	Location
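Low	Unpinned dependency versions (Supply Chain): all dependencies in requirements.txt use >= version ranges, and qwen-tts, kokoro-onnx, and others have no version specification. An attacker could exploit the PyPI supply chain to inject malicious code when a package is updated. `faster-whisper>=1.0.0 requests>=2.28.0 ...` → Use pip-compile to generate a pinned requirements.txt, or pin versions under `dependencies` in the `[project]` table of pyproject.toml	`requirements.txt:1`
Low	REST API binds to 0.0.0.0 by default (Priv Escalation): the API server listens on 0.0.0.0:7789 by default, which may expose it in multi-tenant environments. api_auth_tokens is supported but not enabled by default. `api_host: str = "0.0.0.0"` → Change the default to 127.0.0.1, or enable auth by default	`kiwi/config_loader.py:95`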
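
The second finding's remediation can be illustrated with a minimal sketch. It assumes a hypothetical `ApiConfig` dataclass; only the `api_host` and `api_auth_tokens` field names and port 7789 come from the report, and the project's real config class in `kiwi/config_loader.py` may look different. The sketch defaults to loopback and refuses a wider bind address unless at least one token is configured.

```python
import hmac
import ipaddress
from dataclasses import dataclass, field


@dataclass
class ApiConfig:
    """Hypothetical stand-in for the API settings, not the project's real class."""
    api_host: str = "127.0.0.1"  # loopback by default instead of 0.0.0.0
    api_port: int = 7789
    api_auth_tokens: list[str] = field(default_factory=list)


def is_loopback(host: str) -> bool:
    """Return True when host is 'localhost' or a loopback IP address."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as potentially reachable from outside.
        return False


def validate(config: ApiConfig) -> None:
    """Refuse to expose the API beyond loopback unless tokens are configured."""
    if not is_loopback(config.api_host) and not config.api_auth_tokens:
        raise ValueError(
            f"api_host={config.api_host!r} is not loopback; set api_auth_tokens first"
        )


def token_is_valid(config: ApiConfig, authorization_header: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header value in constant time."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme != "Bearer" or not token:
        return False
    return any(
        hmac.compare_digest(token.encode(), known.encode())
        for known in config.api_auth_tokens
    )
```

Calling `validate(ApiConfig(api_host="0.0.0.0"))` raises `ValueError`, while the same host with a non-empty `api_auth_tokens` list passes.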

Resource	Declared	Inferred	Status	Evidence
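Filesystem	`NONE`	`READ`	✓ Aligned	SKILL.md declares only configuration-file management; no file write operations
Network	`NONE`	`READ`	✓ Aligned	Connects to the ElevenLabs/RunPod/Telegram/Home Assistant APIs, all controlled by the configuration file; no undeclared external connections
Shell	`NONE`	`READ`	✓ Aligned	subprocess calls to the pactl/ffmpeg/openclaw CLIs are tool integrations rather than arbitrary command execution; no injection risk
Environment	`NONE`	`READ`	✓ Aligned	Only reads service credentials (API keys) from .env; no credential exfiltration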
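
To make the Shell row concrete, here is a hedged sketch of what a hard-coded tool invocation typically looks like. The function names and the specific ffmpeg flags are illustrative assumptions, not code taken from the project. The point is that the command is an argument list with a fixed program name and no shell is involved, so shell metacharacters in a file name stay literal.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_argv(src: Path, dst: Path) -> list[str]:
    """Build a fixed ffmpeg command line; only the two file paths vary."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file without prompting
        "-i", str(src),  # input file, passed as a single argv element
        "-ar", "16000",  # resample to 16 kHz
        "-ac", "1",      # downmix to mono
        str(dst),
    ]


def convert(src: Path, dst: Path) -> None:
    """Run ffmpeg without a shell (no shell=True), raising on a non-zero exit."""
    subprocess.run(build_ffmpeg_argv(src, dst), check=True)
```

Even a hostile-looking name such as `in; rm -rf ~.wav` ends up as one argument to ffmpeg rather than as a second command.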

1 Critical 32 findings

Severity	Finding	Match	Location
Critical	Dangerous Command (dangerous shell command)	`rm -rf /`	`docs/features/voice-security.md:17`
Medium	External URL	https://kiwi-voice.com	`README.md:2`
Medium	External URL	https://kiwi-voice.com/assets/og-image.svg	`README.md:3`
Medium	External URL	https://img.shields.io/badge/license-MIT-blue.svg	`README.md:14`
Medium	External URL	https://www.python.org/downloads/	`README.md:15`
Medium	External URL	https://img.shields.io/badge/python-3.10%2B-blue.svg	`README.md:15`
Medium	External URL	https://img.shields.io/badge/backend-OpenClaw-orange.svg	`README.md:16`
Medium	External URL	https://docs.kiwi-voice.com	`README.md:19`
Medium	External URL	https://docs.openclaw.ai	`README.md:21`
Medium	External URL	http://homeassistant.local:8123	`README.md:429`
Medium	External URL	https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html	`docs/deployment/docker.md:70`
Medium	External URL	https://download.pytorch.org/whl/cu121	`docs/deployment/gpu.md:10`
Medium	External URL	https://www.home-assistant.io/	`docs/features/home-assistant.md:3`
Medium	External URL	http://192.168.1.100:7789	`docs/features/home-assistant.md:20`
Medium	External URL	https://t.me/botfather	`docs/features/voice-security.md:60`
Medium	External URL	https://api.telegram.org/bot	`docs/features/voice-security.md:67`
Medium	External URL	https://developer.mozilla.org/en-US/docs/Web/API/AudioWorklet	`docs/features/web-microphone.md:7`
Medium	External URL	https://ffmpeg.org/download.html	`docs/getting-started/installation.md:52`
Medium	External URL	http://192.168.1.100:8123	`kiwi/integrations/homeassistant.py:30`
Medium	External URL	https://api.elevenlabs.io/v1/text-to-speech	`kiwi/tts/elevenlabs.py:93`
Medium	External URL	https://api.runpod.ai/v2/	`kiwi/tts/runpod.py:44`
Medium	External URL	http://www.w3.org/2000/svg	`kiwi/web/index.html:11`
Medium	External URL	https://console.runpod.io/serverless	`runpod/README.md:27`
Medium	External URL	https://api.runpod.ai/v2/YOUR_ENDPOINT_ID	`runpod/README.md:172`
Medium	External URL	http://www.apache.org/licenses/LICENSE-2.0	`runpod/qwen_tts/__init__.py:9`
Medium	External URL	https://huggingface.co/papers/2103.17239	`runpod/qwen_tts/core/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py:394`
Medium	External URL	https://huggingface.co/papers/2006.08195	`runpod/qwen_tts/core/tokenizer_12hz/modeling_qwen3_tts_tokenizer_v2.py:588`
Medium	External URL	https://discuss.pytorch.org/t/how-to-generate-variable-length-mask/23397/3	`runpod/qwen_tts/core/tokenizer_25hz/modeling_qwen3_tts_tokenizer_v1.py:233`
Medium	External URL	https://huggingface.co/papers/2005.07143	`runpod/qwen_tts/core/tokenizer_25hz/modeling_qwen3_tts_tokenizer_v1.py:345`
Medium	External URL	https://arxiv.org/pdf/2107.03312.pdf	`runpod/qwen_tts/core/tokenizer_25hz/vq/core_vq.py:336`
Medium	External URL	https://arxiv.org/abs/2305.02765	`runpod/qwen_tts/core/tokenizer_25hz/vq/core_vq.py:479`
Medium	External URL	https://colab.research.google.com/drive/1q1oe2zOyZp7UsB3jJiQ1IFn8z5YfjwEb	`scripts/train_wake_word.py:17`

File Tree

146 files · 1.4 MB · 40646 lines

Python 79f · 24900L YAML 18f · 8512L Markdown 35f · 3575L CSS 2f · 1831L JavaScript 3f · 1372L HTML 1f · 282L JSON 3f · 69L Text 1f · 53L TOML 2f · 43L Config 1f · 9L

├─ ▾ 📁 custom_components

│ └─ ▾ 📁 kiwi_voice

│ ├─ ▾ 📁 translations

│ │ └─ 📋 en.json JSON 29L · 843 B

│ ├─ 🐍 __init__.py Python 185L · 6.0 KB

│ ├─ 🐍 button.py Python 108L · 3.6 KB

│ ├─ 🐍 config_flow.py Python 133L · 4.4 KB

│ ├─ 🐍 const.py Python 5L · 116 B

│ ├─ 🐍 coordinator.py Python 159L · 5.6 KB

│ ├─ 📋 manifest.json JSON 11L · 312 B

│ ├─ 🐍 sensor.py Python 165L · 5.6 KB

│ ├─ 📋 services.yaml YAML 36L · 994 B

│ ├─ 📋 strings.json JSON 29L · 843 B

│ ├─ 🐍 switch.py Python 75L · 2.7 KB

│ └─ 🐍 tts.py Python 89L · 2.8 KB

├─ ▾ 📁 docs

│ ├─ ▾ 📁 api

│ │ ├─ 📝 rest.md Markdown 268L · 3.9 KB

│ │ └─ 📝 websocket.md Markdown 118L · 3.0 KB

│ ├─ ▾ 📁 assets

│ │ └─ 📦 favicon.svg 115 B

│ ├─ ▾ 📁 deployment

│ │ ├─ 📝 docker.md Markdown 82L · 1.6 KB

│ │ ├─ 📝 gpu.md Markdown 76L · 1.6 KB

│ │ ├─ 📝 local.md Markdown 76L · 1.6 KB

│ │ ├─ 📝 reverse-proxy.md Markdown 47L · 1.2 KB

│ │ └─ 📝 systemd.md Markdown 58L · 1.1 KB

│ ├─ ▾ 📁 development

│ │ ├─ 📝 architecture.md Markdown 81L · 3.4 KB

│ │ ├─ 📝 code-patterns.md Markdown 129L · 2.4 KB

│ │ └─ 📝 contributing.md Markdown 86L · 2.8 KB

│ ├─ ▾ 📁 features

│ │ ├─ 📝 home-assistant.md Markdown 82L · 2.4 KB

│ │ ├─ 📝 multilanguage.md Markdown 113L · 3.3 KB

│ │ ├─ 📝 souls.md Markdown 95L · 2.5 KB

│ │ ├─ 📝 speaker-id.md Markdown 70L · 2.4 KB

│ │ ├─ 📝 streaming-tts.md Markdown 42L · 1.7 KB

│ │ ├─ 📝 stt-engines.md Markdown 94L · 2.7 KB

│ │ ├─ 📝 tts-providers.md Markdown 122L · 2.6 KB

│ │ ├─ 📝 voice-security.md Markdown 91L · 3.2 KB

│ │ ├─ 📝 wake-word.md Markdown 91L · 2.7 KB

│ │ ├─ 📝 web-dashboard.md Markdown 47L · 1.8 KB

│ │ └─ 📝 web-microphone.md Markdown 47L · 1.7 KB

│ ├─ ▾ 📁 getting-started

│ │ ├─ 📝 configuration.md Markdown 170L · 4.0 KB

│ │ ├─ 📝 first-run.md Markdown 81L · 2.2 KB

│ │ └─ 📝 installation.md Markdown 81L · 1.7 KB

│ ├─ ▾ 📁 stylesheets

│ │ └─ 📄 extra.css CSS 230L · 4.8 KB

│ └─ 📝 index.md Markdown 130L · 4.7 KB

├─ ▾ 📁 kiwi

│ ├─ ▾ 📁 api

│ │ ├─ 🐍 __init__.py Python 2L · 83 B

│ │ ├─ 🐍 audio_bridge.py Python 259L · 8.9 KB

│ │ └─ 🐍 server.py Python 952L · 39.5 KB

│ ├─ ▾ 📁 integrations

│ │ ├─ 🐍 __init__.py Python 1L · 54 B

│ │ └─ 🐍 homeassistant.py Python 230L · 8.7 KB

│ ├─ ▾ 📁 locales

│ │ ├─ 🐍 __init__.py Python 0 B

│ │ ├─ 📋 ar.yaml YAML 533L · 17.3 KB

│ │ ├─ 📋 de.yaml YAML 541L · 15.8 KB

│ │ ├─ 📋 en.yaml YAML 557L · 14.7 KB

│ │ ├─ 📋 es.yaml YAML 535L · 15.2 KB

│ │ ├─ 📋 fr.yaml YAML 541L · 15.7 KB

│ │ ├─ 📋 hi.yaml YAML 533L · 23.6 KB

│ │ ├─ 📋 id.yaml YAML 535L · 14.6 KB

│ │ ├─ 📋 it.yaml YAML 539L · 15.5 KB

│ │ ├─ 📋 ja.yaml YAML 525L · 17.5 KB

│ │ ├─ 📋 ko.yaml YAML 524L · 15.4 KB

│ │ ├─ 📋 pl.yaml YAML 535L · 15.1 KB

│ │ ├─ 📋 pt.yaml YAML 539L · 15.2 KB

│ │ ├─ 📋 ru.yaml YAML 591L · 21.9 KB

│ │ ├─ 📋 tr.yaml YAML 530L · 14.7 KB

│ │ └─ 📋 zh.yaml YAML 526L · 14.0 KB

│ ├─ ▾ 📁 mixins

│ │ ├─ 🐍 __init__.py Python 15L · 478 B

│ │ ├─ 🐍 audio_playback.py Python 331L · 13.4 KB

│ │ ├─ 🐍 dialogue_pipeline.py Python 699L · 32.2 KB

│ │ ├─ 🐍 llm_callbacks.py Python 247L · 10.4 KB

│ │ ├─ 🐍 stream_watchdog.py Python 292L · 12.5 KB

│ │ └─ 🐍 tts_speech.py Python 585L · 24.0 KB

│ ├─ ▾ 📁 souls

│ │ ├─ 📝 comedian.md Markdown 29L · 1.2 KB

│ │ ├─ 📝 hype-person.md Markdown 27L · 1.2 KB

│ │ ├─ 📝 mindful-companion.md Markdown 37L · 1.7 KB

│ │ ├─ 📝 siren.md Markdown 33L · 2.0 KB

│ │ └─ 📝 storyteller.md Markdown 29L · 1.3 KB

│ ├─ ▾ 📁 stt

│ │ ├─ 🐍 __init__.py Python 1L · 52 B

│ │ ├─ 🐍 elevenlabs.py Python 196L · 6.0 KB

│ │ └─ 🐍 mlx_whisper.py Python 139L · 4.0 KB

│ ├─ ▾ 📁 tts

│ │ ├─ 🐍 __init__.py Python 2L · 110 B

│ │ ├─ 🐍 base.py Python 138L · 4.4 KB

│ │ ├─ 🐍 elevenlabs_ws.py Python 710L · 27.0 KB

│ │ ├─ 🐍 elevenlabs.py Python 302L · 11.2 KB

│ │ ├─ 🐍 kokoro.py Python 193L · 5.9 KB

│ │ ├─ 🐍 piper.py Python 143L · 4.9 KB

│ │ ├─ 🐍 qwen_local.py Python 197L · 7.1 KB

│ │ ├─ 🐍 runpod.py Python 265L · 8.2 KB

│ │ └─ 🐍 streaming.py Python 356L · 14.1 KB

│ ├─ ▾ 📁 web

│ │ ├─ ▾ 📁 static

│ │ │ ├─ 📜 app.js JavaScript 1000L · 31.7 KB

│ │ │ ├─ 📜 audio-client.js JavaScript 288L · 8.0 KB

│ │ │ ├─ 📜 audio-worklet.js JavaScript 84L · 2.3 KB

│ │ │ └─ 📄 style.css CSS 1601L · 34.2 KB

│ │ └─ 📄 index.html HTML 282L · 15.0 KB

│ ├─ 🐍 __init__.py Python 7L · 247 B

│ ├─ 🐍 __main__.py Python 4L · 78 B

│ ├─ 🐍 config_loader.py Python 720L · 34.4 KB

│ ├─ 🐍 event_bus.py Python 466L · 13.6 KB

│ ├─ 🐍 hardware_aec.py Python 466L · 16.2 KB

│ ├─ 🐍 i18n.py Python 71L · 1.9 KB

│ ├─ 🐍 listener.py Python 2665L · 122.8 KB

│ ├─ 🐍 openclaw_cli.py Python 254L · 9.6 KB

│ ├─ 🐍 openclaw_ws.py Python 1648L · 64.8 KB

│ ├─ 🐍 service.py Python 748L · 29.8 KB

│ ├─ 🐍 soul_manager.py Python 258L · 9.6 KB

│ ├─ 🐍 speaker_id.py Python 451L · 15.5 KB

│ ├─ 🐍 speaker_manager.py Python 625L · 23.3 KB

│ ├─ 🐍 state_machine.py Python 11L · 437 B

│ ├─ 🐍 text_processing.py Python 242L · 9.3 KB

│ ├─ 🐍 unified_vad.py Python 398L · 12.9 KB

│ ├─ 🐍 utils.py Python 148L · 4.7 KB

│ ├─ 🐍 voice_security.py Python 624L · 22.6 KB

│ └─ 🐍 wake_word.py Python 104L · 3.1 KB

├─ ▾ 📁 runpod

│ ├─ ▾ 📁 qwen_tts

│ │ ├─ ▾ 📁 cli

│ │ │ └─ 🐍 demo.py Python 634L · 28.5 KB

│ │ ├─ ▾ 📁 core

│ │ │ ├─ ▾ 📁 tokenizer_12hz

│ │ │ │ ├─ 🔑 configuration_qwen3_tts_tokenizer_v2.py ⚠ Python 172L · 7.8 KB

│ │ │ │ └─ 🔑 modeling_qwen3_tts_tokenizer_v2.py ⚠ Python 1025L · 39.5 KB

│ │ │ ├─ ▾ 📁 tokenizer_25hz

│ │ │ │ ├─ ▾ 📁 vq

│ │ │ │ │ ├─ 🐍 core_vq.py Python 522L · 19.6 KB

│ │ │ │ │ ├─ 🐍 speech_vq.py Python 356L · 14.5 KB

│ │ │ │ │ └─ 🐍 whisper_encoder.py Python 406L · 14.0 KB

│ │ │ │ ├─ 🔑 configuration_qwen3_tts_tokenizer_v1.py ⚠ Python 332L · 14.2 KB

│ │ │ │ └─ 🔑 modeling_qwen3_tts_tokenizer_v1.py ⚠ Python 1528L · 55.1 KB

│ │ │ └─ 🐍 __init__.py Python 18L · 990 B

│ │ ├─ ▾ 📁 inference

│ │ │ ├─ 🐍 qwen3_tts_model.py Python 877L · 36.3 KB

│ │ │ └─ 🔑 qwen3_tts_tokenizer.py ⚠ Python 410L · 15.3 KB

│ │ ├─ 🐍 __init__.py Python 23L · 839 B

│ │ └─ 🐍 __main__.py Python 24L · 800 B

│ ├─ 🐍 download_model.py Python 22L · 734 B

│ ├─ 🐍 handler.py Python 415L · 13.0 KB

│ └─ 📝 README.md Markdown 238L · 6.1 KB

├─ ▾ 📁 scripts

│ ├─ 🐍 generate_cues.py Python 94L · 2.8 KB

│ ├─ 🐍 noise_monitor.py Python 85L · 2.6 KB

│ ├─ 🐍 send_voice.py Python 33L · 916 B

│ ├─ 🐍 telegram_voice.py Python 170L · 5.1 KB

│ └─ 🐍 train_wake_word.py Python 112L · 3.1 KB

├─ ▾ 📁 tests

│ ├─ 🐍 __init__.py Python 0 B

│ ├─ 🐍 test_api.py Python 27L · 704 B

│ ├─ 🐍 test_audio_bridge.py Python 106L · 3.0 KB

│ ├─ 🐍 test_config.py Python 58L · 1.9 KB

│ ├─ 🐍 test_i18n.py Python 78L · 2.2 KB

│ ├─ 🐍 test_smoke.py Python 80L · 2.6 KB

│ ├─ 🐍 test_text_processing.py Python 137L · 4.2 KB

│ └─ 🐍 test_voice_security.py Python 72L · 2.5 KB

├─ 📝 CLAUDE.md Markdown 228L · 8.8 KB

├─ 📋 config.yaml YAML 283L · 9.9 KB

├─ 📋 mkdocs.yml YAML 109L · 2.9 KB

├─ 📄 mypy.ini Config 9L · 217 B

├─ 📄 pyproject.toml TOML 26L · 830 B

├─ 📝 README.md Markdown 441L · 17.0 KB

├─ 📄 requirements.txt Text 53L · 947 B

├─ 📄 ruff.toml TOML 17L · 489 B

├─ 📝 SKILL.md Markdown 124L · 3.3 KB

└─ 📝 SOUL.md Markdown 12L · 851 B

Dependencies 7 items

Package	Version	Source	Known Vulns	Notes
`faster-whisper`	`>=1.0.0`	pip	No	Version not pinned
`requests`	`>=2.28.0`	pip	No	Version not pinned
`qwen-tts`	`unspecified`	pip	No	Version not pinned; non-standard source
`kokoro-onnx`	`>=0.4.0`	pip	No	Version not pinned
`pyannote.audio`	`>=3.1.0`	pip	No	Version not pinned
`torch`	`>=2.0.0`	pip	No	Version not pinned
`aiohttp`	`>=3.9.0`	pip	No	Version not pinned
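
A check like the one behind the notes column can be approximated in a few lines. The sketch below is an assumption about how such a check might work, not the scanner's actual logic. It treats a requirement as pinned only when it uses an exact `==` specifier, and the pinned `requests` version in the usage note is illustrative.

```python
import re

# Matches "name==version" with optional extras and an optional environment
# marker; anything else (>=, ~=, wildcards, bare names) counts as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^\s;,*]+\s*(;.*)?$")


def find_unpinned(requirements_text: str) -> list[str]:
    """Return the requirement lines that do not pin an exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For example, given the lines `faster-whisper>=1.0.0`, `requests==2.31.0`, and `qwen-tts`, the function returns `["faster-whisper>=1.0.0", "qwen-tts"]`.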

Security Positives

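✓ SKILL.md accurately describes the functionality; no deceptive documentation or shadow functionality

✓ Dangerous-command detection (DangerousCommandDetector) is regex-based, with two layers of filtering (pre-LLM + post-LLM)

✓ Telegram approval: dangerous commands from non-Owner users require confirmation via a Telegram button

✓ All subprocess calls use hard-coded commands (pactl/ffmpeg/openclaw); no command injection vulnerabilities

✓ base64 is used only for audio data encoding and Ed25519 key serialization, not for obfuscated execution

✓ All external network connections go to known, trusted services (ElevenLabs/RunPod/Telegram/Home Assistant)

✓ The API supports Bearer token authentication and fine-grained scope control

✓ The Speaker Priority system implements tiered permissions: OWNER > FRIEND > GUEST > BLOCKED

✓ Clear, well-modularized code structure with unified logging through kiwi_log

✓ No credential harvesting; os.environ is used only to read configuration env vars, which are not sent elsewhere

✓ Complete i18n system with 15 languages; user-facing strings are fully externalized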
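
The regex-based detection, the speaker tiers, and the approval step listed above can be combined into one small sketch. The patterns, the `decide` function, and its return values are hypothetical. The report names `DangerousCommandDetector` and the four tiers but does not show their code, and letting an OWNER bypass approval is an assumption drawn from the "non-Owner" wording.

```python
import re
from enum import IntEnum


class Priority(IntEnum):
    """Speaker tiers from the report, ordered so OWNER compares highest."""
    BLOCKED = 0
    GUEST = 1
    FRIEND = 2
    OWNER = 3


# Illustrative patterns only; the project's real pattern list is not shown here.
DANGEROUS_PATTERNS = [
    re.compile(r"\brm\s+-\w*r\w*\s+/", re.IGNORECASE),  # e.g. "rm -rf /"
    re.compile(r"\bmkfs(\.\w+)?\b", re.IGNORECASE),
    re.compile(r"\bshutdown\b", re.IGNORECASE),
]


def is_dangerous(text: str) -> bool:
    """Return True when any dangerous pattern occurs anywhere in the text."""
    return any(p.search(text) for p in DANGEROUS_PATTERNS)


def decide(text: str, speaker: Priority) -> str:
    """Return 'reject', 'needs_approval', or 'allow' for a transcribed command."""
    if speaker is Priority.BLOCKED:
        return "reject"
    if is_dangerous(text) and speaker < Priority.OWNER:
        return "needs_approval"  # e.g. wait for a Telegram button confirmation
    return "allow"
```

The same `is_dangerous` check could plausibly run twice, once on the transcribed request and once on the LLM's reply, which is one way to read the pre-LLM and post-LLM layers.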
