Scan Report
5/100
skill-eval
OpenClaw skill evaluation framework for measuring trigger rates and quality improvements of other skills via sessions_spawn + sessions_history.
Skill-eval is a legitimate OpenClaw evaluation framework. It safely runs trigger-rate and quality tests against other skills using declared agent tools, and all of its Python scripts are pure data processors with no exfiltration capability.
Safe to install
This skill is safe to use. The subprocess/shell access is entirely managed by OpenClaw's agent tooling (sessions_spawn), which is declared and necessary for its evaluation function.
Findings (3 items)
| Severity | Finding | Location |
|---|---|---|
| Low | HTTP server bound to localhost | viewer/generate_review.py:197 |
| Low | lsof subprocess for port management | viewer/generate_review.py:168 |
| Low | requests library declared without version pinning | requirements.txt:4 |
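The first two Low findings describe a localhost-bound report server and an lsof-based port check in viewer/generate_review.py. A minimal sketch of that pattern, assuming stdlib tooling (function names and the default-port logic are illustrative, not the skill's actual code):

```python
# Sketch of a localhost-only report server with an lsof port check.
# All names here are assumptions; only the pattern matches the findings.
import http.server
import socketserver
import subprocess


def port_in_use(port: int) -> bool:
    """Ask lsof whether a local process already holds the port."""
    result = subprocess.run(
        ["lsof", "-i", f":{port}"],
        capture_output=True, text=True,
    )
    return bool(result.stdout.strip())


def make_report_server(port: int = 0) -> socketserver.TCPServer:
    """Create (but don't start) a report server bound to loopback only.

    port=0 lets the OS pick a free port. Binding to 127.0.0.1 keeps the
    viewer unreachable from other hosts, which is why the finding is Low.
    """
    handler = http.server.SimpleHTTPRequestHandler
    return socketserver.TCPServer(("127.0.0.1", port), handler)
```

Because the bind address is the loopback interface, the severity stays Low: the server exposes the report only to the local machine.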
| Resource | Declared | Inferred | Status | Evidence |
|---|---|---|---|---|
| Filesystem | READ | READ | ✓ Aligned | SKILL.md: Reads ~/.openclaw/openclaw.json for extraDirs discovery; scripts/* rea… |
| Filesystem | WRITE | WRITE | ✓ Aligned | SKILL.md: 'Write to eval-workspace/' for evaluation results storage |
| Skill Invoke | WRITE | WRITE | ✓ Aligned | SKILL.md: 'Call sessions_spawn' and 'Call sessions_history' for agent execution;… |
| Network | NONE | NONE | — | No external network requests. generate_review.py HTTP server is localhost-only (… |
| Shell | WRITE | WRITE | ✓ Aligned | sessions_spawn runs subagents (which execute shell commands as part of evaluatio… |
| Environment | NONE | NONE | — | No iteration over os.environ for credential keys. resolve_paths.py only reads op… |
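The Filesystem READ row and the Environment row together describe a narrow config read: resolve_paths.py consumes only the extraDirs list from openclaw.json and never iterates os.environ. A hedged sketch of that declared behavior (the key name extraDirs comes from this report; the function name and config layout are assumptions):

```python
# Sketch of the declared Filesystem READ behavior: read only the
# OpenClaw config and return the skill-directory list. No environment
# variables or credential keys are touched.
import json
from pathlib import Path


def discover_skill_dirs(config_path: Path) -> list:
    """Return extraDirs from openclaw.json; nothing else is consumed."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    # Only the skill-directory list is read from the parsed config.
    return config.get("extraDirs", [])
```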
External URL Findings (3 items)
| Severity | Type | URL | Location |
|---|---|---|---|
| Medium | External URL | https://img.shields.io/badge/License-MIT-yellow.svg | README.md:3 |
| Medium | External URL | https://api.zephyr.internal | test-skills/fake-tool/SKILL.md:14 |
| Medium | External URL | https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js | viewer/viewer.html:10 |
File Tree
40 files · 288.2 KB · 8934 lines — Python 19 files / 5572 lines · Markdown 13 files / 1518 lines · HTML 1 file / 1325 lines · JSON 6 files / 506 lines · Text 1 file / 13 lines

├─ agents/
│  ├─ analyzer.md
│  ├─ comparator.md
│  └─ grader.md
├─ docs/
│  └─ ARCHITECTURE.md
├─ evals/
│  ├─ weather/
│  │  ├─ quality.json
│  │  └─ triggers.json
│  ├─ example-quality.json
│  └─ example-triggers.json
├─ scripts/
│  ├─ legacy/
│  │  ├─ README.md
│  │  ├─ run_compare.py
│  │  ├─ run_diagnostics.py
│  │  ├─ run_latency_profile.py
│  │  ├─ run_model_compare.py
│  │  ├─ run_orchestrator.py
│  │  └─ run_trigger.py
│  ├─ aggregate_benchmark.py
│  ├─ analyze_latency.py
│  ├─ analyze_model_compare.py
│  ├─ analyze_quality.py
│  ├─ analyze_triggers.py
│  ├─ build_evals_with_context.py
│  ├─ extract_session_history.py
│  └─ resolve_paths.py
├─ templates/
│  └─ cli-wrapper/
│     ├─ quality.json
│     ├─ README.md
│     └─ triggers.json
├─ test-skills/
│  ├─ fake-tool/
│  │  └─ SKILL.md
│  └─ README.md
├─ tests/
│  ├─ __init__.py
│  ├─ conftest.py
│  ├─ test_analyze_quality.py
│  └─ test_analyze_triggers.py
├─ viewer/
│  ├─ generate_review.py
│  └─ viewer.html
├─ CHANGELOG.md
├─ CONTRIBUTING.md
├─ README.md
├─ requirements.txt
├─ SKILL.md
└─ USAGE.md
Dependencies (1 item)
| Package | Version | Source | Known Vulns | Notes |
|---|---|---|---|---|
| requests | >=2.28.0,<3.0.0 | pip | No | Version range is adequate but not strictly pinned. Only used by legacy oc_tools integration, not by current analyze_*.py scripts. |
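If strict pinning is preferred over the declared range, the requirements.txt line could be tightened to a single audited release (the version below is purely illustrative, not a recommendation from this scan):

```text
requests==2.31.0  # example pin; substitute whichever release you have audited
```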
Security Positives
✓ All runtime actions explicitly declared in SKILL.md Runtime Actions Disclosure table
✓ No credential harvesting — resolve_paths.py only reads openclaw.json for skill directory paths
✓ No data exfiltration — all Python scripts are pure data processors operating on local workspace files
✓ No suspicious patterns — no base64|bash, eval(atob()), curl|bash, or direct IP network requests
✓ No hidden functionality — every script's purpose is documented and traceable
✓ requests library correctly scoped — only for legacy oc_tools integration, not used by current analyze_*.py scripts
✓ viewer.html is a self-contained HTML report with no dynamic external fetches (SheetJS CDN is the only external resource, for spreadsheet rendering only)
✓ sessions_spawn/shell access is via OpenClaw's own tooling, not direct subprocess in skill code
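The "no suspicious patterns" positive refers to grep-style checks for common obfuscation and exfiltration idioms. A minimal sketch of such a scan, assuming a hand-picked regex list (the scanner's actual rule set is not published; these four patterns only mirror the ones named above):

```python
# Illustrative suspicious-pattern scan: flag source lines that match
# common exfiltration/obfuscation idioms. The rule set is an example,
# not the scanner's real configuration.
import re

SUSPICIOUS = [
    re.compile(r"base64\s.*\|\s*bash"),             # decoded payload piped to a shell
    re.compile(r"eval\(atob\("),                    # JS obfuscation idiom
    re.compile(r"curl\s+[^|]*\|\s*bash"),           # remote script execution
    re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}"),  # direct-IP network request
]


def scan_text(text: str) -> list:
    """Return every source line that matches any suspicious pattern."""
    return [
        line for line in text.splitlines()
        if any(p.search(line) for p in SUSPICIOUS)
    ]
```

A clean skill, like this one, yields an empty list for every file scanned.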