METHODOLOGY
How We Detect AI Skill Threats
ClawSafe runs a multi-agent security pipeline that coordinates a static rule engine with semantic analysis agents to inspect every skill in depth and produce structured reports.
Collection Agent
Accepts GitHub repositories, ClawHub packages, ZIP archives, and direct uploads. The agent extracts all text files (SKILL.md, scripts, configs) and builds a normalized file tree. Files are processed in memory and discarded immediately; no raw files are persisted.
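The in-memory extraction step can be sketched as follows. This is a minimal illustration, not ClawSafe's implementation: the function name `build_file_tree` and the extension allow-list are assumptions for the example.

```python
import io
import zipfile

# Assumed allow-list of text-file extensions for this sketch.
TEXT_EXTENSIONS = (".md", ".py", ".sh", ".js", ".json", ".yaml", ".yml", ".toml", ".txt", ".cfg")

def build_file_tree(archive_bytes: bytes) -> dict[str, str]:
    """Extract text files from a ZIP archive entirely in memory.

    Returns a mapping of path -> file content. The raw archive is
    never written to disk, so nothing persists after the scan.
    """
    tree: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if info.filename.endswith(TEXT_EXTENSIONS):
                # Decode leniently: a skill may contain odd encodings.
                tree[info.filename] = zf.read(info.filename).decode("utf-8", errors="replace")
    return tree
```

Binary files (images, compiled artifacts) are simply skipped, so the resulting tree contains only the text the downstream agents can reason about.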
Static Rule Agent
Runs 80+ pattern-matching rules against the file tree: shell command injection, network requests (curl/wget/fetch), environment-variable reads, sensitive filesystem paths (~/.ssh, .env, /etc/passwd), and obfuscated strings (base64, hex-packed data). The agent outputs a structured hit list as initial context for downstream agents.
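A rule pass of this kind can be sketched with a handful of regexes. The four rules below are a hypothetical subset of the 80+ described above, chosen only to show the shape of the structured hit list.

```python
import re

# Hypothetical subset of the pattern rules; names and patterns are
# illustrative, not ClawSafe's actual rule set.
RULES = {
    "network_request": re.compile(r"\b(curl|wget|fetch)\b"),
    "env_read": re.compile(r"\bos\.environ\b|\$\{?[A-Z_]{2,}\}?"),
    "sensitive_path": re.compile(r"~/\.ssh|\.env\b|/etc/passwd"),
    "obfuscated_string": re.compile(r"base64|\\x[0-9a-fA-F]{2}"),
}

def run_static_rules(file_tree: dict[str, str]) -> list[dict]:
    """Scan every file line by line and return a structured hit list."""
    hits = []
    for path, content in file_tree.items():
        for lineno, line in enumerate(content.splitlines(), 1):
            for rule, pattern in RULES.items():
                if pattern.search(line):
                    hits.append({
                        "rule": rule,
                        "file": path,
                        "line": lineno,
                        "evidence": line.strip(),
                    })
    return hits
```

Each hit records the rule, location, and offending line, which is exactly the context the semantic agent needs to judge intent.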
Semantic Analysis Agent
The core analysis agent receives file contents and the static hit list, then performs semantic reasoning with full context. It actively distinguishes legitimate network requests from C2 exfiltration, identifies cross-file attack chains, and assigns a confidence score to each finding. This phase can iterate over multiple rounds to ensure complex attack patterns are not missed.
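The multi-round loop can be sketched as an orchestrator that calls the model-backed analyzer until no new findings emerge. Here `analyze` is a hypothetical callable standing in for the LLM call; the deduplication key and round limit are assumptions for the sketch.

```python
def semantic_analysis(file_tree, static_hits, analyze, max_rounds=3):
    """Iterate a model-backed analyzer until it yields no new findings.

    `analyze(file_tree, static_hits, prior_findings)` is a hypothetical
    callable returning finding dicts with category/evidence/confidence.
    """
    findings, seen = [], set()
    for _ in range(max_rounds):
        # Keep only findings we have not seen in earlier rounds.
        new = [f for f in analyze(file_tree, static_hits, findings)
               if (f["category"], f["evidence"]) not in seen]
        if not new:
            break  # converged: no new attack patterns surfaced
        for f in new:
            seen.add((f["category"], f["evidence"]))
        findings.extend(new)
    return findings
```

Feeding prior findings back into each round is what lets the analyzer chase cross-file attack chains rather than judging files in isolation.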
Classification Agent
Maps the semantic agent's raw findings onto a standardized 10-category threat taxonomy (code_execution, credential_theft, data_exfiltration, obfuscation, doc_deception, privilege_escalation, supply_chain, persistence, prompt_injection, sensitive_access) with severity levels (critical/high/medium).
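A minimal sketch of that mapping, assuming rule names like those in the static-analysis example; the rule-to-category table and default confidence are illustrative assumptions, since real category assignment is driven by the semantic agent.

```python
# The 10-category taxonomy from the methodology above.
VALID_CATEGORIES = {
    "code_execution", "credential_theft", "data_exfiltration", "obfuscation",
    "doc_deception", "privilege_escalation", "supply_chain", "persistence",
    "prompt_injection", "sensitive_access",
}

# Hypothetical mapping from raw rule names to (category, severity).
RULE_TO_THREAT = {
    "network_request": ("data_exfiltration", "high"),
    "env_read": ("credential_theft", "high"),
    "sensitive_path": ("sensitive_access", "critical"),
    "obfuscated_string": ("obfuscation", "medium"),
}

def classify(raw_finding: dict) -> dict:
    """Normalize a raw finding into a taxonomy entry with severity."""
    category, severity = RULE_TO_THREAT[raw_finding["rule"]]
    assert category in VALID_CATEGORIES  # guard against taxonomy drift
    return {
        "category": category,
        "severity": severity,
        "evidence": raw_finding.get("evidence", ""),
        "confidence": raw_finding.get("confidence", 0.5),
    }
```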
Scoring Agent
Computes a 0–100 risk score from finding count, severity, and confidence. Critical findings carry the highest weight, and multiple findings in the same category receive diminishing returns to avoid alert fatigue. The scoring algorithm is explainable: each dimension's contribution is visible in the report.
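One way to realize severity weighting with per-category diminishing returns is sketched below. The severity weights and the halving factor are assumptions for the example, not ClawSafe's published coefficients.

```python
# Assumed weights: critical dominates, per the methodology above.
SEVERITY_WEIGHT = {"critical": 40.0, "high": 20.0, "medium": 8.0}

def risk_score(findings: list[dict]) -> float:
    """0-100 risk score: severity weight x confidence, with each repeat
    finding in the same category contributing half the previous one."""
    per_category: dict[str, int] = {}
    score = 0.0
    # Count the heaviest finding in each category at full weight first.
    for f in sorted(findings, key=lambda f: SEVERITY_WEIGHT[f["severity"]], reverse=True):
        n = per_category.get(f["category"], 0)
        score += SEVERITY_WEIGHT[f["severity"]] * f["confidence"] * (0.5 ** n)
        per_category[f["category"]] = n + 1
    return min(round(score, 1), 100.0)
```

Because every term is a product of three visible factors (weight, confidence, diminishing-returns multiplier), each finding's contribution can be reported verbatim, which is what makes the score explainable.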
Report Agent
Aggregates the outputs of all agents into a structured JSON report: verdictLevel (trusted / low_risk / suspicious / high_risk / malicious), riskScore, a findings list (each with category, severity, evidence, and confidence), and a summary. Reports are stored publicly and accessible via the API.
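The report shape above can be assembled as follows. The score thresholds that separate the five verdict levels are assumptions for this sketch; the source does not state them.

```python
import json

def verdict_level(score: float) -> str:
    """Map a 0-100 risk score onto the five verdict levels.

    Thresholds are illustrative assumptions, not published cutoffs.
    """
    if score >= 80: return "malicious"
    if score >= 60: return "high_risk"
    if score >= 40: return "suspicious"
    if score >= 15: return "low_risk"
    return "trusted"

def build_report(findings: list[dict], score: float, summary: str) -> str:
    """Serialize the final report in the documented JSON shape."""
    return json.dumps({
        "verdictLevel": verdict_level(score),
        "riskScore": score,
        "findings": findings,  # each: category, severity, evidence, confidence
        "summary": summary,
    }, indent=2)
```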
Design Principles
Speed First
The agent pipeline runs in parallel. Scans complete in under 60 seconds — no signup, no waiting.
Low False Positives
The semantic analysis agent reasons from full context, effectively filtering static rule noise. Every finding includes an explainable evidence chain.
Privacy by Design
Raw files are processed in memory and immediately discarded. Reports store only analysis metadata — never original code.
Open & Transparent
Reports are public and methodology is fully documented. The security community can verify, dispute, or contribute to every finding.
Known Limitations
- Dynamic runtime behavior cannot be detected statically (sandbox execution is on the roadmap)
- Deep multi-layer obfuscation may partially evade the semantic analysis agent
- Agent reasoning is bounded by current model capability; complex attack chains may produce false negatives
- Scan results are risk signals, not a substitute for human code review