fetch-archive-to-lexiang Security Scan Report: Low Risk | ClawSafe

18/100

fetch-archive-to-lexiang

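A general-purpose article fetching and archiving tool. It fetches the full text of an article from any URL (free, paid, or behind a login wall), converts it to structured Markdown, and can optionally save it to a Lexiang knowledge base.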

A legitimate web scraping and content archiving tool with well-documented behavior; the one undocumented detail, Chrome cookie extraction, is a standard scraping technique and shows no sign of exfiltration.

Skill name: fetch-archive-to-lexiang

Analysis time: 60.9s

Engine: pi

✓ Safe to install

Consider documenting Chrome cookie database access in SKILL.md for full transparency. No blocking issues identified.

Security findings: 3

Severity	Finding	Location
Low	Chrome cookie decryption mechanism not documented in SKILL.md [Documentation deception] SKILL.md mentions '从 Chrome Cookie DB 提取 cookies' ("extract cookies from the Chrome Cookie DB") but doesn't explicitly document the SQLite database access + PBKDF2/Safe Storage decryption chain. This is a standard Chrome scraping technique but should be disclosed. `chrome_dir = Path.home() / 'Library/Application Support/Google/Chrome/Default'; cookies_db = chrome_dir / 'Cookies'; subprocess.run(['security', 'find-generic-password', '-s', 'Chrome Safe Storage', '-w'])` → Add a brief technical note in SKILL.md explaining that the script extracts cookies from Chrome's encrypted SQLite database using macOS Keychain Safe Storage; this is how it bypasses paywalls for authenticated content.	`scripts/fetch_article.py:230`
Low	Pinned dependency versions missing [Supply chain] No requirements.txt or pinned versions. Scripts import: playwright, pymupdf, openai-whisper, openai, opencc-python-reimplemented, cryptography. SKILL.md documents install commands but without version pins. `pip3 install pymupdf \| pip3 install openai-whisper` → Add a requirements.txt with pinned versions to prevent supply chain attacks from unpinned dependencies.	`SKILL.md:1`
Info	Pre-scan incorrectly flagged a hardcoded IP at line 543 [Documentation deception] Pre-scan flagged '131.0.0.0' as a hardcoded IP at line 543. This is actually the Chrome browser version string 'Chrome/131.0.0.0' in a user agent, so it is a false positive. The string '131.0.0.0' is not used for any network connection. `"Chrome/131.0.0.0 Safari/537.36"` → No action needed: this is a version number, not an IP address.	`scripts/fetch_article.py:543`
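The false positive in the last finding can be illustrated with a minimal sketch of a hardcoded-IP check that skips version strings. This is not ClawSafe's actual pre-scan logic, which the report does not show; the names `find_hardcoded_ips`, `IPV4_CANDIDATE`, and `PRODUCT_TOKEN` are hypothetical, and the heuristic (ignore a dotted quad that directly follows a `Name/` product token) is an assumption chosen for illustration.

```python
import ipaddress
import re

# A dotted quad that is not embedded in a longer word or dotted number.
IPV4_CANDIDATE = re.compile(r"(?<![\w.])(\d{1,3}(?:\.\d{1,3}){3})(?![\w.])")

# A product token such as "Chrome/" immediately before the candidate.
# Hypothetical heuristic: treat "Name/1.2.3.4" as a version, not an address.
PRODUCT_TOKEN = re.compile(r"[A-Za-z][\w-]*/$")


def find_hardcoded_ips(line: str) -> list[str]:
    """Return dotted quads in `line` that parse as IPv4 addresses and are
    not the version part of a product token like "Chrome/131.0.0.0"."""
    hits = []
    for match in IPV4_CANDIDATE.finditer(line):
        candidate = match.group(1)
        prefix = line[: match.start()]
        if PRODUCT_TOKEN.search(prefix):
            continue  # version string inside a user agent or similar
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # not a valid address, e.g. an octet above 255
        hits.append(candidate)
    return hits
```

Under these assumptions, the user agent fragment `"Chrome/131.0.0.0 Safari/537.36"` yields no hits, while a loopback URL such as `http://127.0.0.1:9222` is still reported. A path segment that happens to look like `name/1.2.3.4` would also be skipped, so a real scanner would likely need more context than this sketch uses.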

Resource type	Declared permission	Inferred permission	Status	Evidence
Network access	`READ`	`READ`	✓ Consistent	WebFetch for articles, image downloads, YouTube/podcast downloads
File system	`WRITE`	`WRITE`	✓ Consistent	Writes .md, .json, images/, .pdf to output directories
Command execution	`WRITE`	`WRITE`	✓ Consistent	subprocess.Popen for Chrome, subprocess.run for ffmpeg/yt-dlp/security commands
Environment variables	`NONE`	`READ`	✓ Consistent	OPENAI_API_KEY, proxy vars - used locally for AI translation, no exfiltration
Browser	`NONE`	`WRITE`	✓ Consistent	CDP port 9222 connection, Playwright automation - declared in SKILL.md CDP mode …

1 high risk · 24 findings

Severity	Type	Value	Location
High	Hardcoded IP address	131.0.0.0	scripts/fetch_article.py:543
Medium	External URL	https://lexiangla.com	README.md:3
Medium	External URL	https://www.codebuddy.ai/	README.md:5
Medium	External URL	https://docs.anthropic.com/en/docs/claude-code	README.md:5
Medium	External URL	https://lexiangla.com/mcp	README.md:54
Medium	External URL	https://mp.weixin.qq.com/s/xxxxx	README.md:65
Medium	External URL	https://www.lennysnewsletter.com/p/xxxxx	README.md:68
Medium	External URL	https://www.youtube.com/watch?v=xxxxx	README.md:71
Medium	External URL	https://www.xiaoyuzhoufm.com/episode/xxxxx	README.md:74
Medium	External URL	https://lexiangla.com/pages/	README.md:117
Medium	External URL	https://lexiangla.com/spaces/	README.md:118
Medium	External URL	https://www.youtube.com/watch?v=xxx	SKILL.md:290
Medium	External URL	https://www.xiaoyuzhoufm.com/episode/	SKILL.md:368
Medium	External URL	https://www.xiaoyuzhoufm.com/episode/xxx	SKILL.md:410
Medium	External URL	https://lexiangla.com/spaces/b6013f6492894a29abbd89d5f2e636c6?company_from=e6c565d6d16811efac17768586f8a025	SKILL.md:473
Medium	External URL	https://lexiangla.com/spaces/xxxxx?company_from=yyyyy	SKILL.md:502
Medium	External URL	https://lexiangla.com/spaces/xxx?company_from=yyy	SKILL.md:548
Medium	External URL	https://lexiangla.com/spaces/xxx	SKILL.md:549
Medium	External URL	https://lexiangla.com/pages/xxx	SKILL.md:551
Medium	External URL	https://lexiangla.com/spaces/xxx?company_from=yyy). You can open the target knowledge base's home page in Lexiang and copy the link from the address bar."|	SKILL.md:551
Medium	External URL	https://www.dedao.cn/course/article?id=	SKILL.md:865
Medium	External URL	http://127.0.0.1:	scripts/fetch_article.py:228
Medium	External URL	https://substack.com	scripts/fetch_article.py:1474
Medium	External URL	https://substack.com/sign-in	scripts/fetch_article.py:1505
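One of the SKILL.md:551 entries above carries trailing prose after the URL, which suggests the pre-scan pattern matched past the end of the link. The following is a minimal sketch of a stricter extractor, assuming the scanner works on plain text lines; it is not ClawSafe's implementation, and `URL_PATTERN` and `extract_urls` are hypothetical names. It limits matches to the ASCII characters that RFC 3986 allows in a URI and then trims trailing sentence punctuation.

```python
import re

# ASCII characters that may appear in a URI per RFC 3986
# (unreserved, reserved, and "%" for percent-encoding).
URL_PATTERN = re.compile(r"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")

# Punctuation that commonly follows a URL in prose rather than ending it.
TRAILING_PUNCTUATION = ".,;:!?'\")"


def extract_urls(text: str) -> list[str]:
    """Return URLs found in `text`, stopping at the first non-URI character
    and dropping trailing sentence punctuation."""
    urls = []
    for match in URL_PATTERN.finditer(text):
        urls.append(match.group(0).rstrip(TRAILING_PUNCTUATION))
    return urls
```

With this pattern, full-width punctuation and CJK text end the match, so the entry in question would reduce to `https://lexiangla.com/spaces/xxx?company_from=yyy`. The trade-off is that a URL legitimately ending in `)` loses that character, so a production scanner might balance parentheses instead of stripping them.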

Directory structure

5 files · 156.4 KB · 3628 lines

Python: 3 files · 2578 lines; Markdown: 2 files · 1050 lines

├─ 📁 scripts
│ ├─ 🐍 fetch_article.py Python 1590L · 66.2 KB
│ ├─ 🐍 md_to_pdf.py Python 412L · 14.0 KB
│ └─ 🐍 yt_download_transcribe.py Python 576L · 19.0 KB
├─ 📝 README.md Markdown 138L · 5.0 KB
└─ 📝 SKILL.md Markdown 912L · 52.2 KB

Dependency analysis: 8 items

Package	Version	Source	Known vulnerabilities	Notes
`playwright`	`*`	pip	No	Version not pinned
`pymupdf`	`*`	pip	No	Version not pinned
`openai-whisper`	`*`	pip	No	Version not pinned
`openai`	`*`	pip	No	Version not pinned
`opencc-python-reimplemented`	`*`	pip	No	Version not pinned
`cryptography`	`*`	pip	No	Version not pinned; used for Chrome cookie AES decryption
`yt-dlp`	`*`	brew	No	brew-installed, not pip; version not pinned in docs
`ffmpeg`	`*`	brew	No	brew-installed, version not pinned in docs
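The "Version not pinned" notes above could be checked mechanically once the skill ships a requirements.txt. The following is a minimal sketch, assuming a plain pip requirements file; the function name `unpinned_requirements` and the `PINNED` pattern are hypothetical, and the check simply treats any `==` specifier as a pin.

```python
import re

# A requirement with an exact "==" specifier, optionally with extras,
# e.g. "name==<version>" or "name[extra]==<version>".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(requirements_text: str) -> list[str]:
    """Return the requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop inline comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank lines, comments, and pip options such as "-r"
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Run against a file that lists the six pip packages by bare name, this sketch would return all six. It does not cover the brew-installed tools (`yt-dlp`, `ffmpeg`), and it does not verify hashes, which pip supports separately through `--require-hashes`.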

Security highlights

✓ No evidence of data exfiltration — all cookies decrypted locally and used only for scraping, not sent anywhere

✓ No base64-encoded execution, eval(), or obfuscation patterns found

✓ No reverse shell, C2 communication, or credential harvesting for exfiltration

✓ No access to ~/.ssh, ~/.aws, or other sensitive credential directories

✓ SKILL.md accurately describes all major capabilities (CDP mode, cookie injection, Substack login, image extraction)

✓ OpenAI API calls read the key from an environment variable and use it only locally; no hardcoded keys

✓ Substack login state saved to ~/.substack/ only — scoped to relevant domain

✓ Chrome CDP profile uses dedicated directory (~/.fetch_article/chrome_cdp_profile), not default Chrome profile

✓ No curl|bash or wget|sh remote script execution

✓ YouTube/podcast downloads use yt-dlp with documented format preferences