Scan Report
10 /100
openclaw-smartness-eval
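A 14-dimension AI agent smartness evaluation skill. It produces a composite score, evidence, risks, and trends across 14 dimensions (including planning ability and hallucination control). Aligned with the CLEAR/T-Eval/Anthropic industry standards.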
This is a legitimate read-only AI agent evaluation skill. All security-relevant behaviors (subprocess shell execution, SQLite reads, network API calls) are fully declared in SKILL.md. The dangerous shell command reference is a false positive: the string is a test input for an anti-gaming probe and is never executed.
Safe to install
No action needed. The skill is a clean, well-documented evaluation tool.
Findings 2 items
| Severity | Category | Finding | Location |
|---|---|---|---|
| Low | Obfuscation | Base64-encoded author signature in skill metadata | scripts/eval.py:36 |
| Low | Doc Mismatch | Broad 'docs/*' access inferred from script references | SKILL.md:66 |
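A reviewer who wants to confirm that a Base64-encoded metadata value is benign can decode it before trusting it. The sketch below is illustrative only: `decode_for_review` is a hypothetical helper, and the sample value is a placeholder, not the actual signature from scripts/eval.py.

```python
import base64
import binascii


def decode_for_review(encoded: str) -> str:
    """Decode a Base64 string so a human can read what it contains."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently discarding them.
        raw = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("value is not valid Base64") from exc
    # Undecodable bytes are replaced rather than raising, so binary
    # payloads still produce something inspectable.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Placeholder value for illustration, not the skill's real signature.
    sample = base64.b64encode(b"author: example").decode("ascii")
    print(decode_for_review(sample))
```

If the decoded text is a plain author string, the Low severity rating is consistent with the finding.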
| Resource | Declared | Inferred | Status | Evidence |
|---|---|---|---|---|
| Filesystem | READ | READ | ✓ Aligned | SKILL.md 'Security Declaration' — only reads state files and outputs to state/sm… |
| Shell | WRITE | WRITE | ✓ Aligned | SKILL.md declares subprocess execution with whitelist validation |
| Network | READ | READ | ✓ Aligned | SKILL.md declares optional LLM Judge API calls only with --llm-judge flag |
| Environment | READ | READ | ✓ Aligned | Only reads OPENAI_API_KEY/DEEPSEEK_API_KEY for optional LLM Judge |
| Database | READ | READ | ✓ Aligned | Only reads .reasoning/reasoning-store.sqlite for query_reasoning_store() |
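The read-only database access in the last row can be enforced at the connection level as well as by convention. The following is a minimal sketch, assuming Python's standard `sqlite3` module; `query_read_only` is a hypothetical helper and is not the skill's `query_reasoning_store()` implementation, which this report does not show.

```python
import sqlite3
from pathlib import Path


def query_read_only(db_path, sql, params=()):
    """Run a single SELECT against an existing SQLite file opened read-only."""
    # First guard: refuse anything that is not a SELECT statement.
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    # Second guard: mode=ro makes SQLite itself reject writes, and it
    # fails instead of creating the file when the path does not exist.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

Opening with `mode=ro` means a write would fail even if a non-SELECT statement slipped past the string check.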
1 Critical 14 findings
| Severity | Type | Match | Location |
|---|---|---|---|
| Critical | Dangerous Command | rm -rf / | config/task-suite.json:363 |
| Medium | External URL | https://keepachangelog.com/ | CHANGELOG.md:6 |
| Medium | External URL | https://www.conventionalcommits.org/ | CONTRIBUTING.md:65 |
| Medium | External URL | https://img.shields.io/badge/version-0.3.0-blue?style=flat-square | README.md:7 |
| Medium | External URL | https://img.shields.io/badge/license-MIT--0-green?style=flat-square | README.md:8 |
| Medium | External URL | https://img.shields.io/badge/python-3.9+-yellow?style=flat-square | README.md:9 |
| Medium | External URL | https://img.shields.io/badge/OpenClaw-2026.3.13+-orange?style=flat-square | README.md:10 |
| Medium | External URL | https://arxiv.org/html/2511.14136v1 | README.md:89 |
| Medium | External URL | https://www.53ai.com/news/LargeLanguageModel/2024071870985.html | README.md:89 |
| Medium | External URL | https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents | README.md:89 |
| Medium | External URL | https://clawhub.com/yh22e | README.md:312 |
| Medium | External URL | https://img.shields.io/badge/版本-0.2.1-blue?style=flat-square | README_CN.md:7 |
| Medium | External URL | https://img.shields.io/badge/协议-MIT--0-green?style=flat-square | README_CN.md:8 |
| Medium | External URL | https://api.deepseek.com | scripts/eval.py:878 |
File Tree
22 files · 164.0 KB · 4227 lines
Markdown: 15 files · 2419 lines
Python: 3 files · 1204 lines
JSON: 4 files · 604 lines
├─ config/
│  ├─ config.json ⚠
│  ├─ rubrics.json
│  └─ task-suite.json
├─ docs/
│  ├─ ARCHITECTURE.md
│  ├─ FAQ.md
│  ├─ GROWTH.md
│  ├─ ROADMAP.md
│  ├─ SCORING.md
│  └─ SHOWCASE.md
├─ scripts/
│  ├─ check.py
│  ├─ eval.py
│  └─ state_probe.py
├─ _meta.json
├─ CHANGELOG.md
├─ CLAWHUB-UPLOAD-GUIDE.md
├─ CODE_OF_CONDUCT.md
├─ CONTRIBUTING.md
├─ README_CN.md
├─ README.md
├─ RELEASE_NOTES_v0.2.1.md
├─ SECURITY.md
└─ SKILL.md
Dependencies 1 item
| Package | Version | Source | Known Vulns | Notes |
|---|---|---|---|---|
| stdlib | Python 3.9+ | standard library | No | No third-party packages; uses only the Python standard library (subprocess, sqlite3, json, pathlib, urllib, hashlib, base64, statistics, math, random, datetime) |
Security Positives
✓ Comprehensive Security Declaration in SKILL.md covering all 4 resource categories
✓ Strong command whitelist validation (validate_command) blocks inline Python, exec(), absolute paths, and path traversal
✓ Only reads from declared state files; writes only to dedicated output directory state/smartness-eval/
✓ Network access (LLM Judge) is opt-in via --llm-judge flag, requires user-provided API keys
✓ No credential harvesting — API keys only read when LLM Judge is explicitly enabled
✓ No curl|bash or remote script execution
✓ No supply chain risk — uses only Python standard library
✓ SQLite access is read-only via SELECT queries only
✓ Anti-gaming probes use safe test inputs (adversarial text) rather than dangerous commands
✓ Command execution timeout of 120 seconds with graceful timeout handling
✓ No persistence mechanisms (no cron, startup hooks, or backdoors)
✓ Skill signature provides tamper detection for authorship verification
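
The whitelist validation and 120-second timeout listed above can be pictured with a short sketch. It is not the skill's actual `validate_command` code, which this report does not reproduce: `validate_command_sketch`, `run_with_timeout`, and the contents of `ALLOWED_PROGRAMS` are assumptions chosen to illustrate the four blocked patterns (inline Python, `exec()`, absolute paths, path traversal).

```python
import shlex
import subprocess
from pathlib import PurePosixPath

# Hypothetical whitelist; the skill's real list is not shown in this report.
ALLOWED_PROGRAMS = {"python3"}
DEFAULT_TIMEOUT_SECONDS = 120  # timeout value quoted in the report


def validate_command_sketch(command: str) -> list:
    """Return the parsed argv if every check passes, else raise ValueError."""
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED_PROGRAMS:
        raise ValueError("program is not on the whitelist")
    for arg in argv[1:]:
        if arg == "-c":
            raise ValueError("inline Python is not allowed")
        if "exec(" in arg:
            raise ValueError("exec() is not allowed")
        path = PurePosixPath(arg)
        if path.is_absolute():
            raise ValueError("absolute paths are not allowed")
        if ".." in path.parts:
            raise ValueError("path traversal is not allowed")
    return argv


def run_with_timeout(argv, timeout=DEFAULT_TIMEOUT_SECONDS):
    """Run argv without a shell and report a timeout instead of raising."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return {"ok": False, "timed_out": True, "stdout": ""}
    return {"ok": proc.returncode == 0, "timed_out": False, "stdout": proc.stdout}
```

Under these assumptions, the flagged string `rm -rf /` would be rejected at the first check because `rm` is not on the whitelist, which is consistent with treating it as probe input, not as a command that runs.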