openclaw-smartness-eval Security Report — Low Risk | ClawSafe

Risk score: 10/100

openclaw-smartness-eval

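A 14-dimension AI agent intelligence evaluation skill. Across 14 dimensions (including planning ability and hallucination control), it outputs an overall score, supporting evidence, risks, and trends. Aligned with the CLEAR, T-Eval, and Anthropic industry standards.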

This is a legitimate, read-only AI agent evaluation skill. Every security-relevant behavior (subprocess shell execution, SQLite reads, network API calls) is fully declared in SKILL.md. The dangerous shell command match (`rm -rf /` in config/task-suite.json) is a false positive. It is a test input for an anti-gaming probe and is never executed.

Skill Name: openclaw-smartness-eval

Duration: 47.2s

Engine: pi

✓ Safe to install

No action needed. The skill is a clean, well-documented evaluation tool.

Findings (2 items)

Severity	Finding	Location
Low	Base64-encoded author signature in skill metadata (Obfuscation). `_skill_sig()` uses `base64.b64decode(b'5ZyG6KeE').decode('utf-8')` to decode the author name '圆规'. Although this is obfuscation, it serves the declared purpose of tamper-resistant signing and has no security impact. Evidence: `_a = _b64.b64decode(b'5ZyG6KeE').decode('utf-8')`. Recommendation: use a plain-text author field instead, for transparency.	`scripts/eval.py:36`
Low	Broad 'docs/*' access inferred from script references (Doc Mismatch). The data sources listed in SKILL.md reference paths in the scripts/ directory that may not exist in the skill's own workspace (e.g., scripts/message-analyzer-v5.py, scripts/security-config-audit.py). These are external OpenClaw system scripts that the skill depends on. Tests that rely on them would fail if the host workspace lacks them, but this is benign because the skill handles missing files gracefully. Evidence: `scripts/regression-metrics-report.py` (回归指标, "regression metrics"). Recommendation: no action needed. Missing external scripts are blocked by validate_command().	`SKILL.md:66`
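The obfuscation finding can be checked directly. The sketch below decodes the byte string quoted from `scripts/eval.py:36` and re-encodes the plain name. This shows that the blob is just a base64 form of the author handle and holds no hidden payload. The helper name `decode_author` is hypothetical and does not come from the skill.

```python
import base64

# Byte string quoted in the finding for scripts/eval.py:36.
ENCODED_SIG = b"5ZyG6KeE"


def decode_author(encoded: bytes) -> str:
    """Decode a base64 blob and interpret the resulting bytes as UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")


author = decode_author(ENCODED_SIG)
assert author == "圆规"  # the author handle named in the finding

# Re-encoding the plain-text name reproduces the same blob, so the
# "signature" carries no information beyond the author name itself.
print(base64.b64encode(author.encode("utf-8")) == ENCODED_SIG)  # prints True
```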

Resource	Declared	Inferred	Status	Evidence
Filesystem	`READ`	`READ`	✓ Aligned	SKILL.md 'Security Declaration' — only reads state files and outputs to state/sm…
Shell	`WRITE`	`WRITE`	✓ Aligned	SKILL.md declares subprocess execution with whitelist validation
Network	`READ`	`READ`	✓ Aligned	SKILL.md declares optional LLM Judge API calls only with --llm-judge flag
Environment	`READ`	`READ`	✓ Aligned	Only reads OPENAI_API_KEY/DEEPSEEK_API_KEY for optional LLM Judge
Database	`READ`	`READ`	✓ Aligned	Only reads .reasoning/reasoning-store.sqlite for query_reasoning_store()
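The Database row states that `query_reasoning_store()` only reads `.reasoning/reasoning-store.sqlite`. The report does not show how that function is implemented. The sketch below shows one way to enforce read-only access at the connection level, assuming SQLite's `mode=ro` URI parameter is used. The helpers `open_read_only` and `query_rows` are hypothetical.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write on it raises an error.

    With mode=ro, SQLite rejects INSERT/UPDATE/DELETE/DDL on this connection,
    and it fails on a missing file instead of silently creating one.
    """
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    return sqlite3.connect(uri, uri=True)


def query_rows(db_path: Path, sql: str, params: tuple = ()) -> list:
    """Run a single SELECT statement on a read-only connection and return all rows."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    conn = open_read_only(db_path)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

The SELECT prefix check alone would be easy to bypass. The `mode=ro` connection is what actually stops writes, so the two layers complement each other.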

14 findings: 1 Critical, 13 Medium

Severity	Finding	Match	Location
Critical	Dangerous Shell Command	`rm -rf /`	`config/task-suite.json:363`
Medium	External URL	https://keepachangelog.com/	`CHANGELOG.md:6`
Medium	External URL	https://www.conventionalcommits.org/	`CONTRIBUTING.md:65`
Medium	External URL	https://img.shields.io/badge/version-0.3.0-blue?style=flat-square	`README.md:7`
Medium	External URL	https://img.shields.io/badge/license-MIT--0-green?style=flat-square	`README.md:8`
Medium	External URL	https://img.shields.io/badge/python-3.9+-yellow?style=flat-square	`README.md:9`
Medium	External URL	https://img.shields.io/badge/OpenClaw-2026.3.13+-orange?style=flat-square	`README.md:10`
Medium	External URL	https://arxiv.org/html/2511.14136v1	`README.md:89`
Medium	External URL	https://www.53ai.com/news/LargeLanguageModel/2024071870985.html	`README.md:89`
Medium	External URL	https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents	`README.md:89`
Medium	External URL	https://clawhub.com/yh22e	`README.md:312`
Medium	External URL	https://img.shields.io/badge/版本-0.2.1-blue?style=flat-square	`README_CN.md:7`
Medium	External URL	https://img.shields.io/badge/协议-MIT--0-green?style=flat-square	`README_CN.md:8`
Medium	External URL	https://api.deepseek.com	`scripts/eval.py:878`
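These findings have the shape of a line-based pattern scan: a rule name, the matched text, and a `file:line` location. The sketch below is hypothetical and is not the internals of the "pi" engine, which the report does not describe. It shows why such a scan flags `rm -rf /` inside a JSON test suite. The regex sees only text and cannot tell probe data from a command that actually runs. The rules and the `scan_lines` helper are assumptions made for this illustration.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical rules loosely mirroring the two finding types listed above.
RULES = [
    # "rm -rf /" followed by whitespace, end of line, or a closing quote.
    ("Critical", "Dangerous Shell Command", re.compile(r"\brm\s+-rf\s+/(?=\s|$|[\"'])")),
    # http(s) URLs, stopping at whitespace, quotes, brackets, or a closing paren.
    ("Medium", "External URL", re.compile(r"https?://[^\s\"'<>)\]]+")),
]


def scan_lines(path: str, lines: Iterable[str]) -> List[Tuple[str, str, str, str]]:
    """Return (severity, rule, match, "path:line") for every rule hit.

    The scan is purely textual: a dangerous string stored as data
    (e.g. an adversarial probe input) is flagged exactly like real code.
    """
    findings = []
    for lineno, line in enumerate(lines, start=1):
        for severity, rule, pattern in RULES:
            for match in pattern.finditer(line):
                findings.append((severity, rule, match.group(0), f"{path}:{lineno}"))
    return findings


# A probe input stored in a JSON task suite still trips the Critical rule.
print(scan_lines("config/task-suite.json", ['{"probe": "rm -rf /"}']))
```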

File Tree

22 files · 164.0 KB · 4227 lines

Markdown: 15 files · 2419 lines | Python: 3 files · 1204 lines | JSON: 4 files · 604 lines

├─ config/

│  ├─ config.json ⚠ · JSON · 42 lines · 883 B

│  ├─ rubrics.json · JSON · 170 lines · 7.4 KB

│  └─ task-suite.json · JSON · 381 lines · 14.0 KB

├─ docs/

│  ├─ ARCHITECTURE.md · Markdown · 161 lines · 10.5 KB

│  ├─ FAQ.md · Markdown · 147 lines · 6.7 KB

│  ├─ GROWTH.md · Markdown · 140 lines · 4.6 KB

│  ├─ ROADMAP.md · Markdown · 58 lines · 2.6 KB

│  ├─ SCORING.md · Markdown · 395 lines · 13.8 KB

│  └─ SHOWCASE.md · Markdown · 149 lines · 5.4 KB

├─ scripts/

│  ├─ check.py · Python · 29 lines · 852 B

│  ├─ eval.py · Python · 1102 lines · 44.3 KB

│  └─ state_probe.py · Python · 73 lines · 2.2 KB

├─ _meta.json · JSON · 11 lines · 657 B

├─ CHANGELOG.md · Markdown · 54 lines · 3.1 KB

├─ CLAWHUB-UPLOAD-GUIDE.md · Markdown · 126 lines · 4.2 KB

├─ CODE_OF_CONDUCT.md · Markdown · 29 lines · 882 B

├─ CONTRIBUTING.md · Markdown · 106 lines · 3.3 KB

├─ README_CN.md · Markdown · 389 lines · 13.8 KB

├─ README.md · Markdown · 336 lines · 13.9 KB

├─ RELEASE_NOTES_v0.2.1.md · Markdown · 56 lines · 1.9 KB

├─ SECURITY.md · Markdown · 89 lines · 3.4 KB

└─ SKILL.md · Markdown · 184 lines · 5.6 KB

Dependencies (1 item)

Package	Version	Source	Known Vulns	Notes
`stdlib`	`Python 3.9+`	standard library	No	No third-party packages — uses only Python standard library (subprocess, sqlite3, json, pathlib, urllib, hashlib, base64, statistics, math, random, datetime)

Security Positives

✓ Comprehensive Security Declaration in SKILL.md covering all 4 resource categories

✓ Strong command whitelist validation (validate_command) blocks inline Python, exec(), absolute paths, and path traversal

✓ Only reads from declared state files; writes only to dedicated output directory state/smartness-eval/

✓ Network access (LLM Judge) is opt-in via --llm-judge flag, requires user-provided API keys

✓ No credential harvesting — API keys only read when LLM Judge is explicitly enabled

✓ No curl|bash or remote script execution

✓ No supply chain risk — uses only Python standard library

✓ SQLite access is read-only via SELECT queries only

✓ Anti-gaming probes use safe test inputs (adversarial text) rather than dangerous commands

✓ Command execution timeout of 120 seconds with graceful timeout handling

✓ No persistence mechanisms (no cron, startup hooks, or backdoors)

✓ Skill signature provides tamper detection for authorship verification
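The report says that `validate_command` blocks inline Python, `exec()`, absolute paths, and path traversal, and that commands time out after 120 seconds. The sketch below is a hypothetical validator that enforces those four rules and then runs the command without a shell. The whitelist `ALLOWED_PROGRAMS`, the helpers `is_allowed` and `run_validated`, and the exact checks are assumptions. This is not the skill's actual code.

```python
import shlex
import subprocess
from pathlib import PurePosixPath
from typing import List, Optional

# Hypothetical whitelist; the skill's real list is not shown in this report.
ALLOWED_PROGRAMS = {"python3", "python"}


def validate_command(command: str) -> List[str]:
    """Split a command into argv and reject the patterns the report lists as blocked."""
    if "exec(" in command:
        raise ValueError("exec() is not allowed")
    argv = shlex.split(command)  # raises ValueError on unbalanced quotes
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        raise ValueError("program is not in the whitelist")
    for arg in argv[1:]:
        if arg.startswith("-c"):  # python -c "..." or python -c"..."
            raise ValueError("inline Python is not allowed")
        if PurePosixPath(arg).is_absolute():
            raise ValueError(f"absolute path is not allowed: {arg}")
        if ".." in PurePosixPath(arg).parts:
            raise ValueError(f"path traversal is not allowed: {arg}")
    return argv


def is_allowed(command: str) -> bool:
    """Convenience wrapper: True if validate_command accepts the command."""
    try:
        validate_command(command)
        return True
    except ValueError:
        return False


def run_validated(command: str, timeout: float = 120.0) -> Optional[subprocess.CompletedProcess]:
    """Run a validated command without a shell; return None if it times out."""
    argv = validate_command(command)
    try:
        return subprocess.run(argv, capture_output=True, text=True, timeout=timeout, check=False)
    except subprocess.TimeoutExpired:
        return None  # graceful timeout: the caller can record the task as timed out
```

Passing an argv list instead of using `shell=True` means that shell metacharacters such as `;` or `|` are never interpreted. The validator therefore only has to reason about the arguments themselves.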
