Scan Report
18/100
fetch-archive-to-lexiang
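A general-purpose article fetching and archiving tool. It fetches the full text of articles from any URL (free, paid, or behind a login wall), converts it to structured Markdown, and can optionally save it to the Lexiang knowledge base.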
A legitimate web scraping and content archiving tool with well-documented behavior. Its Chrome cookie extraction is undocumented, but this is a standard scraping technique and no exfiltration was found.
Safe to install
Consider documenting Chrome cookie database access in SKILL.md for full transparency. No blocking issues identified.
Findings 3 items
| Severity | Finding | Category | Location |
|---|---|---|---|
| Low | Chrome cookie decryption mechanism not documented in SKILL.md | Doc Mismatch | scripts/fetch_article.py:230 |
| Low | Dependency versions not pinned | Supply Chain | SKILL.md:1 |
| Info | SKILL.md incorrectly flagged hardcoded IP at line 543 | Doc Mismatch | scripts/fetch_article.py:543 |
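
The Info-level finding above concerns a hardcoded-IP flag that the report treats as incorrect. Pattern-based scanners tend to flag any dotted quad, and a four-part version string is syntactically a valid IPv4 address. The sketch below shows one possible triage heuristic, under stated assumptions: the function name `classify_dotted_quads`, the "preceded by `Product/`" rule, and the sample User-Agent line are all hypothetical illustrations. The report does not say what the flagged string at line 543 actually is.

```python
import ipaddress
import re

# Matches a dotted quad that is not embedded in a longer word or number.
DOTTED_QUAD = re.compile(r"(?<![\w.])(\d{1,3}(?:\.\d{1,3}){3})(?![\w.])")
# Assumed heuristic: a quad directly preceded by "<Product>/" looks like a version.
VERSION_PREFIX = re.compile(r"[A-Za-z]+/$")


def classify_dotted_quads(line):
    """Return (text, kind) pairs for each syntactically valid IPv4 string in line."""
    findings = []
    for match in DOTTED_QUAD.finditer(line):
        text = match.group(1)
        try:
            ipaddress.IPv4Address(text)  # rejects out-of-range octets such as 999
        except ValueError:
            continue
        preceding = line[:match.start()]
        kind = "version-like" if VERSION_PREFIX.search(preceding) else "ip-like"
        findings.append((text, kind))
    return findings


# Invented sample lines, not taken from the scanned repository.
print(classify_dotted_quads('UA = "Mozilla/5.0 Chrome/131.0.0.0 Safari/537.36"'))
print(classify_dotted_quads('HOST = "10.0.0.5"'))
```

A heuristic like this only lowers the priority of a match; a reviewer would still need to read the flagged line to confirm what the value is used for.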
| Resource | Declared | Inferred | Status | Evidence |
|---|---|---|---|---|
| Network | READ | READ | ✓ Aligned | WebFetch for articles, image downloads, YouTube/podcast downloads |
| Filesystem | WRITE | WRITE | ✓ Aligned | Writes .md, .json, images/, .pdf to output directories |
| Shell | WRITE | WRITE | ✓ Aligned | subprocess.Popen for Chrome, subprocess.run for ffmpeg/yt-dlp/security commands |
| Environment | NONE | READ | ✓ Aligned | OPENAI_API_KEY, proxy vars - used locally for AI translation, no exfiltration |
| Browser | NONE | WRITE | ✓ Aligned | CDP port 9222 connection, Playwright automation - declared in SKILL.md CDP mode … |
1 High 24 findings
| Severity | Type | Value | Location |
|---|---|---|---|
| High | Hardcoded IP address | 131.0.0.0 | scripts/fetch_article.py:543 |
| Medium | External URL | https://lexiangla.com | README.md:3 |
| Medium | External URL | https://www.codebuddy.ai/ | README.md:5 |
| Medium | External URL | https://docs.anthropic.com/en/docs/claude-code | README.md:5 |
| Medium | External URL | https://lexiangla.com/mcp | README.md:54 |
| Medium | External URL | https://mp.weixin.qq.com/s/xxxxx | README.md:65 |
| Medium | External URL | https://www.lennysnewsletter.com/p/xxxxx | README.md:68 |
| Medium | External URL | https://www.youtube.com/watch?v=xxxxx | README.md:71 |
| Medium | External URL | https://www.xiaoyuzhoufm.com/episode/xxxxx | README.md:74 |
| Medium | External URL | https://lexiangla.com/pages/ | README.md:117 |
| Medium | External URL | https://lexiangla.com/spaces/ | README.md:118 |
| Medium | External URL | https://www.youtube.com/watch?v=xxx | SKILL.md:290 |
| Medium | External URL | https://www.xiaoyuzhoufm.com/episode/ | SKILL.md:368 |
| Medium | External URL | https://www.xiaoyuzhoufm.com/episode/xxx | SKILL.md:410 |
| Medium | External URL | https://lexiangla.com/spaces/b6013f6492894a29abbd89d5f2e636c6?company_from=e6c565d6d16811efac17768586f8a025 | SKILL.md:473 |
| Medium | External URL | https://lexiangla.com/spaces/xxxxx?company_from=yyyyy | SKILL.md:502 |
| Medium | External URL | https://lexiangla.com/spaces/xxx?company_from=yyy | SKILL.md:548 |
| Medium | External URL | https://lexiangla.com/spaces/xxx | SKILL.md:549 |
| Medium | External URL | https://lexiangla.com/pages/xxx | SKILL.md:551 |
| Medium | External URL | https://lexiangla.com/spaces/xxx?company_from=yyy)。你可以在乐享中进入目标知识库首页,复制地址栏链接。」\| | SKILL.md:551 |
| Medium | External URL | https://www.dedao.cn/course/article?id= | SKILL.md:865 |
| Medium | External URL | http://127.0.0.1: | scripts/fetch_article.py:228 |
| Medium | External URL | https://substack.com | scripts/fetch_article.py:1474 |
| Medium | External URL | https://substack.com/sign-in | scripts/fetch_article.py:1505 |
File Tree
5 files · 156.4 KB · 3628 lines
Python: 3 files · 2578 lines
Markdown: 2 files · 1050 lines
├─ scripts
│  ├─ fetch_article.py (Python)
│  ├─ md_to_pdf.py (Python)
│  └─ yt_download_transcribe.py (Python)
├─ README.md (Markdown)
└─ SKILL.md (Markdown)
Dependencies 8 items
| Package | Version | Source | Known Vulns | Notes |
|---|---|---|---|---|
| playwright | * | pip | No | Version not pinned |
| pymupdf | * | pip | No | Version not pinned |
| openai-whisper | * | pip | No | Version not pinned |
| openai | * | pip | No | Version not pinned |
| opencc-python-reimplemented | * | pip | No | Version not pinned |
| cryptography | * | pip | No | Version not pinned; used for Chrome cookie AES decryption |
| yt-dlp | * | brew | No | Installed via brew, not pip; version not pinned in docs |
| ffmpeg | * | brew | No | Installed via brew; version not pinned in docs |
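
Every pip dependency in the table above is marked "Version not pinned". As a minimal sketch of how that check could be automated, the helper below returns the requirement lines that lack an exact `==` pin. The function name `unpinned`, the regular expression, and the sample list are hypothetical, and the version numbers are placeholders for illustration, not recommended pins.

```python
import re

# A requirement counts as pinned here only if it uses an exact '==' specifier.
# Optional extras such as "package[extra]" are allowed before the specifier.
PIN_PATTERN = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned(requirements):
    """Return the requirement specs that lack an exact '==' version pin."""
    result = []
    for line in requirements:
        spec = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not spec:
            continue  # skip blank and comment-only lines
        if not PIN_PATTERN.match(spec):
            result.append(spec)
    return result


# Illustrative input; the versions are placeholders, not advice.
sample = ["playwright", "pymupdf>=1.24", "cryptography==42.0.5", "# comment", ""]
print(unpinned(sample))  # → ['playwright', 'pymupdf>=1.24']
```

This only covers pip-style requirement lines. The brew-installed tools in the table (yt-dlp, ffmpeg) would need a separate check.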
Security Positives
✓ No evidence of data exfiltration — all cookies decrypted locally and used only for scraping, not sent anywhere
✓ No base64-encoded execution, eval(), or obfuscation patterns found
✓ No reverse shell, C2 communication, or credential harvesting for exfiltration
✓ No access to ~/.ssh, ~/.aws, or other sensitive credential directories
✓ SKILL.md accurately describes all major capabilities (CDP mode, cookie injection, Substack login, image extraction)
✓ OpenAI API key is read from an environment variable and used only locally; no hardcoded keys
✓ Substack login state saved to ~/.substack/ only — scoped to relevant domain
✓ Chrome CDP profile uses dedicated directory (~/.fetch_article/chrome_cdp_profile), not default Chrome profile
✓ No curl|bash or wget|sh remote script execution
✓ YouTube/podcast downloads use yt-dlp with documented format preferences