Low risk: risk score 18/100
Last scanned: 23 hours ago
fetch-archive-to-lexiang
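A general-purpose article fetching and archiving tool. It fetches the full text of an article from any URL (free, paid, or behind a login wall), converts it to structured Markdown, and can optionally save it to a Lexiang (乐享) knowledge base.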
A legitimate web scraping and content archiving tool with well-documented behavior; the undocumented Chrome cookie extraction is a standard scraping technique and involves no exfiltration.
Skill name: fetch-archive-to-lexiang
Analysis time: 60.9s
Engine: pi
Safe to install
Consider documenting Chrome cookie database access in SKILL.md for full transparency. No blocking issues identified.

Security findings (3)

Severity | Finding | Location
Low
Chrome cookie decryption mechanism not documented in SKILL.md (category: documentation deception)
SKILL.md mentions '从 Chrome Cookie DB 提取 cookies' ("extract cookies from the Chrome Cookie DB") but doesn't explicitly document the SQLite database access and the PBKDF2/Safe Storage decryption chain. This is a standard Chrome scraping technique, but it should be disclosed.
chrome_dir = Path.home() / 'Library/Application Support/Google/Chrome/Default'; cookies_db = chrome_dir / 'Cookies'; subprocess.run(['security', 'find-generic-password', '-s', 'Chrome Safe Storage', '-w'])
→ Add a brief technical note in SKILL.md explaining that the script extracts cookies from Chrome's encrypted SQLite database using macOS Keychain Safe Storage — this is how it bypasses paywalls for authenticated content.
scripts/fetch_article.py:230
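The key-derivation step of the PBKDF2/Safe Storage chain named in this finding can be sketched as follows. This is a minimal illustration, not the code in scripts/fetch_article.py: the salt, iteration count, and key length are the values commonly documented for Chrome on macOS and are an assumption about what the script uses, and the function names `derive_cookie_key` and `split_encrypted_value` are hypothetical.

```python
import hashlib
from typing import Tuple

# Assumed constants: the values commonly documented for Chrome "v10" cookies
# on macOS. The scanned script may use different ones.
SALT = b"saltysalt"
ITERATIONS = 1003
KEY_LENGTH = 16  # bytes, i.e. an AES-128 key


def derive_cookie_key(safe_storage_password: str) -> bytes:
    """Derive the AES key from the 'Chrome Safe Storage' Keychain password."""
    return hashlib.pbkdf2_hmac(
        "sha1",
        safe_storage_password.encode("utf-8"),
        SALT,
        ITERATIONS,
        dklen=KEY_LENGTH,
    )


def split_encrypted_value(encrypted_value: bytes) -> Tuple[str, bytes]:
    """Separate the 3-byte version prefix (e.g. b'v10') from the ciphertext."""
    prefix, ciphertext = encrypted_value[:3], encrypted_value[3:]
    return prefix.decode("ascii", errors="replace"), ciphertext


# Dummy password for illustration only; the real one comes from the Keychain.
key = derive_cookie_key("dummy-password")
print(len(key))
```

A note like this in SKILL.md would tell users that the Keychain password never needs to leave the machine: it only feeds a local key derivation.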
Low
Dependency versions not pinned (category: supply chain)
No requirements.txt or pinned versions. Scripts import: playwright, pymupdf, openai-whisper, openai, opencc-python-reimplemented, cryptography. SKILL.md documents install commands but without version pins.
pip3 install pymupdf | pip3 install openai-whisper
→ Add a requirements.txt with pinned versions to prevent supply chain attacks from unpinned dependencies.
SKILL.md:1
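One way to check for the condition this finding describes is to scan a requirements file for entries without an exact `==` pin. The sketch below is an assumption about how such a check might look, not part of the scanned skill; the helper name `unpinned_requirements` is hypothetical and the version in the sample is a placeholder, not a recommendation.

```python
import re
from typing import List

# A requirement counts as pinned only if it uses an exact '==' specifier.
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+\s*==\s*[^\s;]+")


def unpinned_requirements(requirements_text: str) -> List[str]:
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


sample = """\
playwright
pymupdf==0.0.0  # placeholder version, not a recommendation
openai-whisper>=1
"""
print(unpinned_requirements(sample))
```

A range specifier such as `>=1` is reported as unpinned here, because it still lets a later release be installed.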
Info
Pre-scan incorrectly flagged a hardcoded IP at scripts/fetch_article.py:543 (category: documentation deception)
Pre-scan flagged '131.0.0.0' as a hardcoded IP at line 543. This is actually the Chrome browser version string 'Chrome/131.0.0.0' in a user agent — a false positive. The string '131.0.0.0' is not used for any network connection.
"Chrome/131.0.0.0 Safari/537.36"
→ No action needed — this is a version number, not an IP address.
scripts/fetch_article.py:543
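A pre-scan rule could avoid this kind of false positive by skipping dotted quads that follow a product token such as `Chrome/`. The following is a rough sketch under that assumption; the pattern and the function name `find_ip_literals` are illustrative and do not describe how the actual pre-scan works.

```python
import ipaddress
import re
from typing import List

# Optional product token ("Chrome/") followed by a dotted quad that is not
# part of a longer number or dotted sequence.
DOTTED_QUAD = re.compile(
    r"(?P<token>[A-Za-z][\w-]*/)?(?<![\d.])(?P<quad>\d{1,3}(?:\.\d{1,3}){3})(?![\w.])"
)


def find_ip_literals(text: str) -> List[str]:
    """Return IPv4 literals in text, ignoring 'Product/1.2.3.4' version strings."""
    found = []
    for match in DOTTED_QUAD.finditer(text):
        if match.group("token"):
            continue  # e.g. 'Chrome/131.0.0.0' is a version, not an address
        try:
            ipaddress.IPv4Address(match.group("quad"))
        except ValueError:
            continue  # octet out of range, e.g. 999.1.1.1
        found.append(match.group("quad"))
    return found


user_agent = "Mozilla/5.0 Chrome/131.0.0.0 Safari/537.36"
print(find_ip_literals(user_agent))
print(find_ip_literals("connect to http://127.0.0.1:9222"))
```

The host in a URL is still reported, because `//` is not a product token.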
Resource type | Declared permission | Inferred permission | Status | Evidence
Network access | READ | READ | ✓ Consistent | WebFetch for articles, image downloads, YouTube/podcast downloads
File system | WRITE | WRITE | ✓ Consistent | Writes .md, .json, images/, .pdf to output directories
Command execution | WRITE | WRITE | ✓ Consistent | subprocess.Popen for Chrome, subprocess.run for ffmpeg/yt-dlp/security commands
Environment variables | NONE | READ | ✓ Consistent | OPENAI_API_KEY, proxy vars - used locally for AI translation, no exfiltration
Browser | NONE | WRITE | ✓ Consistent | CDP port 9222 connection, Playwright automation - declared in SKILL.md CDP mode …
Pre-scan: 24 findings (1 high severity)
Severity | Type | Value | Location
High | Hardcoded IP address | 131.0.0.0 | scripts/fetch_article.py:543
Medium | External URL | https://lexiangla.com | README.md:3
Medium | External URL | https://www.codebuddy.ai/ | README.md:5
Medium | External URL | https://docs.anthropic.com/en/docs/claude-code | README.md:5
Medium | External URL | https://lexiangla.com/mcp | README.md:54
Medium | External URL | https://mp.weixin.qq.com/s/xxxxx | README.md:65
Medium | External URL | https://www.lennysnewsletter.com/p/xxxxx | README.md:68
Medium | External URL | https://www.youtube.com/watch?v=xxxxx | README.md:71
Medium | External URL | https://www.xiaoyuzhoufm.com/episode/xxxxx | README.md:74
Medium | External URL | https://lexiangla.com/pages/ | README.md:117
Medium | External URL | https://lexiangla.com/spaces/ | README.md:118
Medium | External URL | https://www.youtube.com/watch?v=xxx | SKILL.md:290
Medium | External URL | https://www.xiaoyuzhoufm.com/episode/ | SKILL.md:368
Medium | External URL | https://www.xiaoyuzhoufm.com/episode/xxx | SKILL.md:410
Medium | External URL | https://lexiangla.com/spaces/b6013f6492894a29abbd89d5f2e636c6?company_from=e6c565d6d16811efac17768586f8a025 | SKILL.md:473
Medium | External URL | https://lexiangla.com/spaces/xxxxx?company_from=yyyyy | SKILL.md:502
Medium | External URL | https://lexiangla.com/spaces/xxx?company_from=yyy | SKILL.md:548
Medium | External URL | https://lexiangla.com/spaces/xxx | SKILL.md:549
Medium | External URL | https://lexiangla.com/pages/xxx | SKILL.md:551
Medium | External URL | https://lexiangla.com/spaces/xxx?company_from=yyy)。你可以在乐享中进入目标知识库首页,复制地址栏链接。」| | SKILL.md:551
Medium | External URL | https://www.dedao.cn/course/article?id= | SKILL.md:865
Medium | External URL | http://127.0.0.1: | scripts/fetch_article.py:228
Medium | External URL | https://substack.com | scripts/fetch_article.py:1474
Medium | External URL | https://substack.com/sign-in | scripts/fetch_article.py:1505
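The pre-scan list labels every URL as external, including `http://127.0.0.1:`, which points at the local machine. A reviewer triaging such a list might separate loopback and private hosts from genuinely external ones. The sketch below shows one way to do that with the standard library; `classify_url` is a hypothetical helper, not part of the scanner.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_url(url: str) -> str:
    """Label a URL 'loopback', 'private', or 'external' based on its host."""
    host = urlsplit(url).hostname or ""
    if host == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "external"  # a domain name (or empty host), not an IP literal
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "external"


for url in ("http://127.0.0.1:", "https://substack.com/sign-in"):
    print(url, classify_url(url))
```

Domain names are treated as external without DNS resolution, so this is a coarse first pass rather than a definitive classification.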

Directory structure

5 files · 156.4 KB · 3628 lines
Python: 3 files, 2578 lines · Markdown: 2 files, 1050 lines
├─ 📁 scripts
│ ├─ 🐍 fetch_article.py Python 1590L · 66.2 KB
│ ├─ 🐍 md_to_pdf.py Python 412L · 14.0 KB
│ └─ 🐍 yt_download_transcribe.py Python 576L · 19.0 KB
├─ 📝 README.md Markdown 138L · 5.0 KB
└─ 📝 SKILL.md Markdown 912L · 52.2 KB

Dependency analysis (8)

Package | Version | Source | Known vulnerabilities | Notes
playwright | * | pip | | Version not pinned
pymupdf | * | pip | | Version not pinned
openai-whisper | * | pip | | Version not pinned
openai | * | pip | | Version not pinned
opencc-python-reimplemented | * | pip | | Version not pinned
cryptography | * | pip | | Version not pinned; used for Chrome cookie AES decryption
yt-dlp | * | brew | | brew-installed, not pip; version not pinned in docs
ffmpeg | * | brew | | brew-installed, version not pinned in docs

Security highlights

✓ No evidence of data exfiltration — all cookies decrypted locally and used only for scraping, not sent anywhere
✓ No base64-encoded execution, eval(), or obfuscation patterns found
✓ No reverse shell, C2 communication, or credential harvesting for exfiltration
✓ No access to ~/.ssh, ~/.aws, or other sensitive credential directories
✓ SKILL.md accurately describes all major capabilities (CDP mode, cookie injection, Substack login, image extraction)
✓ OpenAI API calls use environment variable only locally, no hardcoded keys
✓ Substack login state saved to ~/.substack/ only — scoped to relevant domain
✓ Chrome CDP profile uses dedicated directory (~/.fetch_article/chrome_cdp_profile), not default Chrome profile
✓ No curl|bash or wget|sh remote script execution
✓ YouTube/podcast downloads use yt-dlp with documented format preferences