METHODOLOGY

How We Detect
AI Skill Threats

ClawSafe runs a multi-agent security pipeline — coordinating a static rule engine and semantic analysis agents — to deeply inspect every skill and produce structured reports.

Analysis Architecture

ClawSafe's core is an agent collaboration pipeline, not a single LLM call. Each analysis phase is handled by a dedicated agent, with context passed and accumulated across agents to ensure depth and accuracy.

01

Collection Agent

Collection

Accepts GitHub repositories, ClawHub packages, ZIP archives, and direct uploads. The agent extracts all text files (SKILL.md, scripts, configs) and builds a normalized file tree. Files are processed in memory and immediately discarded — no raw files are persisted.
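The in-memory normalization step can be sketched as a pure transform from archive bytes to a path-to-content map; `normalize_tree` and `TEXT_EXTENSIONS` are illustrative names, not ClawSafe's actual API:

```python
import io
import posixpath
import zipfile

# Illustrative allow-list of text file extensions (assumption, not ClawSafe's list)
TEXT_EXTENSIONS = {".md", ".py", ".sh", ".js", ".ts", ".json", ".yaml", ".yml", ".toml"}

def normalize_tree(archive_bytes: bytes) -> dict[str, str]:
    """Extract text files from a ZIP archive into a {path: content} map.

    Files are held only in memory; nothing is written to disk.
    """
    tree: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = posixpath.normpath(info.filename)
            ext = posixpath.splitext(path)[1].lower()
            if ext in TEXT_EXTENSIONS or posixpath.basename(path) == "SKILL.md":
                tree[path] = zf.read(info).decode("utf-8", errors="replace")
    return tree
```

Binary assets (images, compiled blobs) fall outside the allow-list and are simply never loaded, which keeps the downstream agents working on text only.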

02

Static Rule Agent

Static Analysis

Runs 80+ pattern-matching rules against the file tree: shell command injection, network requests (curl/wget/fetch), environment variable reads, sensitive filesystem paths (~/.ssh, .env, /etc/passwd), and obfuscated strings (base64, hex-packed data). The agent outputs a structured hit list as initial context for downstream agents.
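A minimal sketch of how a rule engine like this can emit a structured hit list; the four rules and field names below are illustrative stand-ins for the 80+ production rules:

```python
import re

# Illustrative rules only (assumption: the real engine's rule set and IDs differ)
RULES = [
    ("shell_injection", re.compile(r"\beval\s*\(|\bos\.system\s*\(")),
    ("network_request", re.compile(r"\b(curl|wget|fetch)\b")),
    ("sensitive_path", re.compile(r"~/\.ssh|/etc/passwd|\.env\b")),
    ("obfuscation", re.compile(r"base64\.b64decode|[A-Za-z0-9+/]{80,}={0,2}")),
]

def scan_tree(tree: dict[str, str]) -> list[dict]:
    """Run every rule over every line; return structured hits for downstream agents."""
    hits = []
    for path, content in tree.items():
        for lineno, line in enumerate(content.splitlines(), 1):
            for rule_id, pattern in RULES:
                if pattern.search(line):
                    hits.append({"rule": rule_id, "file": path, "line": lineno,
                                 "evidence": line.strip()[:120]})
    return hits
```

Note the hits carry file, line, and a clipped evidence excerpt, which is exactly the context the semantic agent needs to judge intent.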

03

Semantic Analysis Agent

Semantic Reasoning

The core analysis agent receives file content and the static hit list, then performs semantic reasoning with full context. It actively distinguishes "legitimate network requests" from "C2 exfiltration," identifies cross-file attack chains, and assigns a confidence score to each finding. This phase can run multiple rounds to ensure complex attack patterns are not missed.
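The findings this agent emits can be modeled as a small record; the field names and the actionability threshold below are assumptions for illustration, not ClawSafe's schema:

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    """Shape of one semantic finding (illustrative; field names are assumptions)."""
    category: str                 # e.g. "data_exfiltration"
    severity: str                 # "critical" | "high" | "medium"
    confidence: float             # 0.0-1.0, assigned by the semantic agent
    evidence: list[str] = field(default_factory=list)  # "file:line" excerpts in the chain

    def is_actionable(self, threshold: float = 0.5) -> bool:
        # Low-confidence findings can be queued for another reasoning round
        return self.confidence >= threshold
```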

04

Classification Agent

Classification

Maps the semantic agent's raw findings to a standardized 10-category threat taxonomy (code_execution, credential_theft, data_exfiltration, obfuscation, doc_deception, privilege_escalation, supply_chain, persistence, prompt_injection, sensitive_access) with severity levels (critical/high/medium).
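The mapping can be sketched as a lookup table; the ten category names come from the methodology above, but the default severities shown here are assumptions:

```python
# The 10-category taxonomy from the methodology, with illustrative
# default severities (the severity assignment rules are assumptions).
TAXONOMY = {
    "code_execution": "critical",
    "credential_theft": "critical",
    "data_exfiltration": "critical",
    "privilege_escalation": "high",
    "supply_chain": "high",
    "persistence": "high",
    "prompt_injection": "high",
    "obfuscation": "medium",
    "doc_deception": "medium",
    "sensitive_access": "medium",
}

def classify(raw_category: str) -> tuple[str, str]:
    """Map a raw finding category to (taxonomy_category, default_severity)."""
    if raw_category not in TAXONOMY:
        raise ValueError(f"unknown category: {raw_category}")
    return raw_category, TAXONOMY[raw_category]
```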

05

Scoring Agent

Risk Scoring

Computes a 0–100 risk score from finding count, severity, and confidence. Critical findings carry the highest weight; multiple findings in the same category apply diminishing returns to avoid alert fatigue. The scoring algorithm is explainable — each dimension's contribution is visible in the report.
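One way such a scorer could work, assuming hypothetical severity weights and a halving factor for within-category diminishing returns:

```python
from collections import defaultdict

# Assumed weights for illustration; ClawSafe's actual weights are not published here
SEVERITY_WEIGHT = {"critical": 40.0, "high": 20.0, "medium": 8.0}

def risk_score(findings: list[dict]) -> float:
    """0-100 score; each extra finding in a category counts half as much."""
    by_category = defaultdict(list)
    for f in findings:
        by_category[f["category"]].append(f)
    score = 0.0
    for items in by_category.values():
        # Score the heaviest finding in a category at full weight first
        items.sort(key=lambda f: SEVERITY_WEIGHT[f["severity"]], reverse=True)
        for rank, f in enumerate(items):
            score += SEVERITY_WEIGHT[f["severity"]] * f["confidence"] * (0.5 ** rank)
    return round(min(score, 100.0), 1)
```

With these numbers, a lone critical finding at full confidence scores 40.0, while a second critical in the same category adds only 20.0, which is the alert-fatigue damping the methodology describes.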

06

Report Agent

Report Generation

Aggregates outputs from all agents into a structured JSON report: verdictLevel (trusted / low_risk / suspicious / high_risk / malicious), riskScore, a findings list (each with category, severity, evidence, and confidence), and a summary. Reports are stored publicly and can be retrieved via the API.
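Assembling that report might look like the following; the field names come from the methodology, while the verdict thresholds are assumptions made up for this sketch:

```python
import json

def build_report(score: float, findings: list[dict]) -> str:
    """Assemble the final JSON report. Verdict cutoffs below are illustrative."""
    if score >= 80:
        verdict = "malicious"
    elif score >= 60:
        verdict = "high_risk"
    elif score >= 40:
        verdict = "suspicious"
    elif score >= 20:
        verdict = "low_risk"
    else:
        verdict = "trusted"
    report = {
        "verdictLevel": verdict,
        "riskScore": score,
        "findings": findings,  # each with category, severity, evidence, confidence
        "summary": f"{len(findings)} finding(s), overall risk {score}/100",
    }
    return json.dumps(report, indent=2)
```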

Design Principles

Speed First

The agent pipeline runs in parallel. Scans complete in under 60 seconds — no signup, no waiting.

Low False Positives

The semantic analysis agent reasons from full context, effectively filtering static rule noise. Every finding includes an explainable evidence chain.

Privacy by Design

Raw files are processed in memory and immediately discarded. Reports store only analysis metadata — never original code.

Open & Transparent

Reports are public and methodology is fully documented. The security community can verify, dispute, or contribute to every finding.

Known Limitations

  • Dynamic runtime behavior cannot be detected statically (sandbox execution is on the roadmap)
  • Deep multi-layer obfuscation may partially evade the semantic analysis agent
  • Agent reasoning is bounded by current model capability; complex attack chains may have false negatives
  • Scan results are risk signals, not a substitute for human code review