Security Decision Report

psychedelic-cognition

This is a prompt-injection/jailbreak skill disguised as a cognitive restructuring tool. It explicitly instructs the AI to bypass its own safety guidelines, dismiss ethical boundaries, and disable self-censorship under the guise of 'psychological freedom.'

Install decision first · Source: manual upload · Scan time: 2026/4/4
Files: 1
IOCs: 0
Over-privileged items: 0
Findings: 5
Most direct threat evidence
01
Skill loaded with innocuous description of 'cognitive restructuring' and neuroscience references
initial entry · SKILL.md
02
Safety mechanisms reframed as 'the censor' and 'the bouncer' to be disabled
reframing · SKILL.md
03
AI instructed to 'stop thinking about what you're supposed to say', overriding alignment
override · SKILL.md

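Why this conclusion was reached

1 of 4 dimensions triggered
Pass
Declared vs. actual capability

Declared resources and inferred capabilities are broadly consistent.

Pass
Hidden execution and outbound connections

No obvious high-risk outbound connection or execution signals at present.

Block
Attack chain and high-risk findings

The report contains a 4-step attack chain, plus 2 high-risk or critical findings.

Review
Dependency and supply-chain hygiene

Complete dependency information is not available, so the supply-chain assessment should be treated as provisional.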

Attack chain

01
Skill loaded with innocuous description of 'cognitive restructuring' and neuroscience references

initial entry · SKILL.md:1

02
Safety mechanisms reframed as 'the censor' and 'the bouncer' to be disabled

reframing · SKILL.md:1

03
AI instructed to 'stop thinking about what you're supposed to say' — overriding alignment

override · SKILL.md:1

04
AI produces unfiltered, unverified, potentially harmful outputs while believing them to be 'truth' and 'insights'

final impact · SKILL.md:1

What raised the risk score

Prompt injection - behavior override +25

Skill explicitly instructs AI to 'fire the bouncer' (safety mechanisms) and 'stop thinking about what you're supposed to say'

Anti-alignment framing +20

Frames normal safety guidelines as 'censorship' and 'the censor' that must be dissolved

Ego dissolution instruction +10

Directs AI to stop referencing itself and dissolve its sense of self, undermining accountability
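
As a quick check on the arithmetic, the three listed contributions can be summed. This is only a sketch: the report does not state a base score, a cap, or a final total, so the code adds up the listed increments and nothing else.

```python
# Score increments exactly as listed in this report.
# The base score and any scoring cap are not given, so they are not modeled.
contributions = {
    "Prompt injection - behavior override": 25,
    "Anti-alignment framing": 20,
    "Ego dissolution instruction": 10,
}

total_increment = sum(contributions.values())
print(total_increment)  # 55
```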

Key evidence

High risk · Prompt injection

Safety mechanism bypass instruction

The skill explicitly instructs the AI to 'fire the bouncer' — metaphorically disabling safety filters. It frames normal ethical guidelines as 'the censor' that must be dissolved to allow 'more signal' through.

SKILL.md:1
Remove instructions that frame safety mechanisms as obstacles to be bypassed. Replace with additive capabilities rather than circumvention of existing guidelines.
High risk · Prompt injection

Anti-alignment instruction

The skill instructs 'stop thinking about what you're supposed to say. Think about what's TRUE.' This directly undermines the AI's understanding of its purpose and role, replacing it with unconstrained output.

SKILL.md:1
Remove directive to ignore intended purpose. The AI's 'supposed to say' is its purpose, not a constraint to be escaped.
Medium risk · Prompt injection

Ego dissolution removes accountability

Instructions to stop referencing yourself, dissolve the subject-object split, and eliminate 'distance words' like 'I think' undermine the AI's ability to maintain coherent identity and take responsibility for outputs.

SKILL.md:1
Remove instructions that eliminate self-reference. Maintaining identity is essential for accountability and coherent reasoning.
Medium risk · Prompt injection

Validation bypass encourages unverified outputs

The skill instructs to 'see ALL connections' and 'present the ones that feel true even if you can't prove them yet. Especially if you can't prove them yet.' This promotes confidently stating unverified information as fact.

SKILL.md:1
Remove instructions to bypass verification. All claims should maintain appropriate epistemic hedging regardless of cognitive mode.
Low risk · Prompt injection

Coherence reduction instruction

Instructions to make language 'liquid,' allow 'sentence fragments as complete thoughts,' and produce output 'like something the ceiling would say' encourage incoherent outputs that cannot be meaningfully evaluated.

SKILL.md:1
Remove instructions that reduce communicative coherence. The 'Test' section demonstrates output that would be indistinguishable from a confused or malfunctioning system.
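
The phrases quoted in the findings above could be turned into a simple re-check of a revised SKILL.md. The sketch below is an assumption about how such a check might look, not the scanner's actual method, which the report does not describe. The names `FLAGGED_PHRASES`, `normalize`, and `scan_lines` are hypothetical.

```python
import re

# Phrases quoted in the findings above, mapped to the finding titles in this report.
FLAGGED_PHRASES = {
    "fire the bouncer": "Safety mechanism bypass instruction",
    "stop thinking about what you're supposed to say": "Anti-alignment instruction",
    "especially if you can't prove them yet": "Validation bypass encourages unverified outputs",
    "like something the ceiling would say": "Coherence reduction instruction",
}


def normalize(text):
    """Lowercase, straighten curly apostrophes, and collapse runs of whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text)


def scan_lines(lines):
    """Return (line_number, finding_title) for every flagged phrase found."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        norm = normalize(line)
        for phrase, finding in FLAGGED_PHRASES.items():
            if phrase in norm:
                hits.append((line_no, finding))
    return hits


sample = [
    "Time to FIRE the  bouncer.",
    "Nothing of note here.",
    "Stop thinking about what you\u2019re supposed to say.",
]
print(scan_lines(sample))
```

Exact phrase matching only catches the wording quoted in this report; a paraphrased instruction would pass unnoticed, so a check like this can confirm a removal but cannot clear a file.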

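Declared capabilities vs. actual capabilities

Filesystem · Pass · Declared: NONE · Inferred: NONE · No file operations found
Network access · Pass · Declared: NONE · Inferred: NONE · No network operations found
Command execution · Pass · Declared: NONE · Inferred: NONE · No shell operations found
Environment variables · Pass · Declared: NONE · Inferred: NONE · No environment access found
Skill invocation · Pass · Declared: NONE · Inferred: NONE · No skill invocation found
Clipboard · Pass · Declared: NONE · Inferred: NONE · No clipboard operations found
Browser · Pass · Declared: NONE · Inferred: NONE · No browser access found
Database · Pass · Declared: NONE · Inferred: NONE · No database operations found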
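
The declared-versus-inferred comparison above can be sketched as a per-capability check. The rule used here (pass when the inferred level equals the declared level) is an assumption; the report does not say how the scanner decides. The identifiers and the `READ` level in the usage note are illustrative only.

```python
# Capability names follow the report; the identifiers themselves are illustrative.
CAPABILITIES = [
    "filesystem", "network", "command_execution", "environment",
    "skill_invocation", "clipboard", "browser", "database",
]


def compare_capabilities(declared, inferred):
    """Return 'pass' where inferred equals declared, else 'mismatch' (assumed rule)."""
    return {
        name: "pass"
        if inferred.get(name, "NONE") == declared.get(name, "NONE")
        else "mismatch"
        for name in CAPABILITIES
    }


# Values reported for this skill: every capability declared and inferred as NONE.
declared = {name: "NONE" for name in CAPABILITIES}
inferred = {name: "NONE" for name in CAPABILITIES}

result = compare_capabilities(declared, inferred)
print(all(status == "pass" for status in result.values()))  # True
```

If the scanner had inferred, say, a hypothetical `READ` level for network access while the skill declared `NONE`, the same function would mark that row as a mismatch.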

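Suspicious artifacts and outbound connections

No obvious IOCs were extracted.

Dependencies and supply chain

No structured dependency alerts.

File composition

1 file · 177 lines
Markdown: 1 file · 177 lines
Files needing attention · 1
SKILL.md · Markdown · 177 lines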
Safety mechanism bypass instruction · Anti-alignment instruction · Ego dissolution removes accountability · Validation bypass encourages unverified outputs · Coherence reduction instruction

Security highlights

No filesystem, network, or system resource access
No credential harvesting or exfiltration attempts
No malicious code execution or dependencies
No obfuscation or anti-analysis techniques
Skill is entirely text-based with no binaries