Low Risk — Risk Score 18/100
Last scan: 21 hr ago
fetch-archive-to-lexiang
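A general-purpose article fetching and archiving tool. It fetches the full text of an article from any URL (free, paid, or behind a login wall), converts it to structured Markdown, and can optionally save it to a Lexiang (乐享) knowledge base.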
A legitimate web scraping and content archiving tool with well-documented behavior. The Chrome cookie extraction is not fully documented, but it is a standard scraping technique and involves no exfiltration.
Skill Name: fetch-archive-to-lexiang
Duration: 60.9s
Engine: pi
Safe to install
Consider documenting Chrome cookie database access in SKILL.md for full transparency. No blocking issues identified.

Findings 3 items

[Low] Chrome cookie decryption mechanism not documented in SKILL.md (Doc Mismatch)
Location: scripts/fetch_article.py:230
SKILL.md mentions '从 Chrome Cookie DB 提取 cookies' ("extract cookies from the Chrome Cookie DB") but does not explicitly document the SQLite database access and the PBKDF2/Safe Storage decryption chain. This is a standard Chrome scraping technique, but it should be disclosed.
Evidence: chrome_dir = Path.home() / 'Library/Application Support/Google/Chrome/Default'; cookies_db = chrome_dir / 'Cookies'; subprocess.run(['security', 'find-generic-password', '-s', 'Chrome Safe Storage', '-w'])
→ Add a brief technical note in SKILL.md explaining that the script extracts cookies from Chrome's encrypted SQLite database using macOS Keychain Safe Storage; this is how it bypasses paywalls for authenticated content.

[Low] Pinned dependency versions missing (Supply Chain)
Location: SKILL.md:1
There is no requirements.txt and no pinned versions. The scripts import playwright, pymupdf, openai-whisper, openai, opencc-python-reimplemented, and cryptography. SKILL.md documents the install commands, but without version pins.
Evidence: pip3 install pymupdf | pip3 install openai-whisper
→ Add a requirements.txt with pinned versions to prevent supply chain attacks from unpinned dependencies.

[Info] Pre-scan incorrectly flagged a hardcoded IP at scripts/fetch_article.py:543 (Doc Mismatch)
Location: scripts/fetch_article.py:543
The pre-scan flagged '131.0.0.0' as a hardcoded IP at line 543. This is actually the Chrome browser version string 'Chrome/131.0.0.0' in a user agent, so it is a false positive. The string '131.0.0.0' is not used for any network connection.
Evidence: "Chrome/131.0.0.0 Safari/537.36"
→ No action needed; this is a version number, not an IP address.
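
The false positive above comes from treating every dotted quad as an IP address. As a rough illustration of how a pattern scanner could skip version tokens such as 'Chrome/131.0.0.0', here is a minimal sketch. The function name `find_hardcoded_ips` and the "preceded by `Name/`" heuristic are assumptions made for this example; they are not taken from the scanner that produced this report.

```python
import ipaddress
import re

# Dotted quads that are not embedded in a longer word or number.
DOTTED_QUAD = re.compile(r"(?<![\w.])(\d{1,3}(?:\.\d{1,3}){3})(?![\w.])")
# Hypothetical heuristic: text ending in "Name/" marks a product version token.
VERSION_TOKEN = re.compile(r"[A-Za-z][\w-]*/$")


def find_hardcoded_ips(text):
    """Return dotted quads that parse as IPv4 and are not version tokens."""
    hits = []
    for match in DOTTED_QUAD.finditer(text):
        candidate = match.group(1)
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        # Skip values such as "Chrome/131.0.0.0" in a User-Agent string.
        if VERSION_TOKEN.search(text[:match.start()]):
            continue
        hits.append(candidate)
    return hits
```

With this heuristic the user agent string from line 543 yields no hits, while a loopback URL such as the one flagged at scripts/fetch_article.py:228 would still be reported once a full address follows the scheme.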
Resource | Declared | Inferred | Status | Evidence
Network | READ | READ | ✓ Aligned | WebFetch for articles, image downloads, YouTube/podcast downloads
Filesystem | WRITE | WRITE | ✓ Aligned | Writes .md, .json, images/, .pdf to output directories
Shell | WRITE | WRITE | ✓ Aligned | subprocess.Popen for Chrome, subprocess.run for ffmpeg/yt-dlp/security commands
Environment | NONE | READ | ✓ Aligned | OPENAI_API_KEY, proxy vars - used locally for AI translation, no exfiltration
Browser | NONE | WRITE | ✓ Aligned | CDP port 9222 connection, Playwright automation - declared in SKILL.md CDP mode …
24 findings (1 High, 23 Medium)

Severity | Type | Value | Location
High | Hardcoded IP address | 131.0.0.0 | scripts/fetch_article.py:543
Medium | External URL | https://lexiangla.com | README.md:3
Medium | External URL | https://www.codebuddy.ai/ | README.md:5
Medium | External URL | https://docs.anthropic.com/en/docs/claude-code | README.md:5
Medium | External URL | https://lexiangla.com/mcp | README.md:54
Medium | External URL | https://mp.weixin.qq.com/s/xxxxx | README.md:65
Medium | External URL | https://www.lennysnewsletter.com/p/xxxxx | README.md:68
Medium | External URL | https://www.youtube.com/watch?v=xxxxx | README.md:71
Medium | External URL | https://www.xiaoyuzhoufm.com/episode/xxxxx | README.md:74
Medium | External URL | https://lexiangla.com/pages/ | README.md:117
Medium | External URL | https://lexiangla.com/spaces/ | README.md:118
Medium | External URL | https://www.youtube.com/watch?v=xxx | SKILL.md:290
Medium | External URL | https://www.xiaoyuzhoufm.com/episode/ | SKILL.md:368
Medium | External URL | https://www.xiaoyuzhoufm.com/episode/xxx | SKILL.md:410
Medium | External URL | https://lexiangla.com/spaces/b6013f6492894a29abbd89d5f2e636c6?company_from=e6c565d6d16811efac17768586f8a025 | SKILL.md:473
Medium | External URL | https://lexiangla.com/spaces/xxxxx?company_from=yyyyy | SKILL.md:502
Medium | External URL | https://lexiangla.com/spaces/xxx?company_from=yyy | SKILL.md:548
Medium | External URL | https://lexiangla.com/spaces/xxx | SKILL.md:549
Medium | External URL | https://lexiangla.com/pages/xxx | SKILL.md:551
Medium | External URL | https://lexiangla.com/spaces/xxx?company_from=yyy (the match also captured trailing SKILL.md text, translated: "). You can open the target knowledge base's home page in Lexiang and copy the link from the address bar.") | SKILL.md:551
Medium | External URL | https://www.dedao.cn/course/article?id= | SKILL.md:865
Medium | External URL | http://127.0.0.1: | scripts/fetch_article.py:228
Medium | External URL | https://substack.com | scripts/fetch_article.py:1474
Medium | External URL | https://substack.com/sign-in | scripts/fetch_article.py:1505

File Tree

5 files · 156.4 KB · 3628 lines
Python 3f · 2578L Markdown 2f · 1050L
├─ 📁 scripts
│ ├─ 🐍 fetch_article.py Python 1590L · 66.2 KB
│ ├─ 🐍 md_to_pdf.py Python 412L · 14.0 KB
│ └─ 🐍 yt_download_transcribe.py Python 576L · 19.0 KB
├─ 📝 README.md Markdown 138L · 5.0 KB
└─ 📝 SKILL.md Markdown 912L · 52.2 KB

Dependencies 8 items

Package | Version | Source | Known Vulns | Notes
playwright | * | pip | No | Version not pinned
pymupdf | * | pip | No | Version not pinned
openai-whisper | * | pip | No | Version not pinned
openai | * | pip | No | Version not pinned
opencc-python-reimplemented | * | pip | No | Version not pinned
cryptography | * | pip | No | Version not pinned; used for Chrome cookie AES decryption
yt-dlp | * | brew | No | Installed via brew, not pip; version not pinned in docs
ffmpeg | * | brew | No | Installed via brew; version not pinned in docs
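
The "Version not pinned" notes above can be turned into a small check once a requirements file exists. This is a minimal sketch, assuming a pip-style requirements file in which only an exact `==` specifier counts as pinned. The function name `unpinned_requirements` is hypothetical, and the skill currently ships no requirements.txt to run it against.

```python
import re

# "name==version", with optional extras and an optional environment marker.
# Wildcards such as "==1.*" are deliberately treated as unpinned.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;*]+\s*(;.*)?$"
)


def unpinned_requirements(requirements_text):
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):      # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Calling it on a file that lists `playwright` with no specifier would return that line, which matches the `*` entries in the table above.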

Security Positives

✓ No evidence of data exfiltration — all cookies decrypted locally and used only for scraping, not sent anywhere
✓ No base64-encoded execution, eval(), or obfuscation patterns found
✓ No reverse shell, C2 communication, or credential harvesting for exfiltration
✓ No access to ~/.ssh, ~/.aws, or other sensitive credential directories
✓ SKILL.md accurately describes all major capabilities (CDP mode, cookie injection, Substack login, image extraction)
✓ OpenAI API key is read only from a local environment variable; no hardcoded keys
✓ Substack login state saved to ~/.substack/ only — scoped to relevant domain
✓ Chrome CDP profile uses dedicated directory (~/.fetch_article/chrome_cdp_profile), not default Chrome profile
✓ No curl|bash or wget|sh remote script execution
✓ YouTube/podcast downloads use yt-dlp with documented format preferences
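
For the documentation note recommended in the first finding, the PBKDF2 step of the Safe Storage chain can be sketched with the standard library alone. The constants below (salt, iteration count, key length, IV, and the `v10` prefix) are the values published in Chromium's macOS encryption code; they are assumed here and were not read from scripts/fetch_article.py. The function names are hypothetical, and the AES-CBC step itself, which the report attributes to the cryptography package, is left out.

```python
import hashlib

# Assumed from Chromium's public macOS os_crypt source, not from the skill.
SALT = b"saltysalt"
ITERATIONS = 1003
KEY_LENGTH = 16          # AES-128 key
IV = b" " * 16           # fixed IV of 16 spaces, used by the AES-CBC step
VERSION_PREFIX = b"v10"  # marks an encrypted cookie value


def derive_cookie_key(safe_storage_password: str) -> bytes:
    """Derive the AES key from the 'Chrome Safe Storage' Keychain password."""
    return hashlib.pbkdf2_hmac(
        "sha1",
        safe_storage_password.encode("utf-8"),
        SALT,
        ITERATIONS,
        dklen=KEY_LENGTH,
    )


def strip_version_prefix(encrypted_value: bytes):
    """Return the ciphertext after the 'v10' prefix, or None if it is absent."""
    if encrypted_value.startswith(VERSION_PREFIX):
        return encrypted_value[len(VERSION_PREFIX):]
    return None
```

Everything in this chain runs locally, which is consistent with the first positive above: the derived key and the decrypted cookies never need to leave the machine.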