Low risk · Risk score 20/100
Last scanned: 1 day ago
invoice-extractor
Extract invoice information from images and PDF files using Baidu OCR API, export to Excel
A legitimate invoice OCR extraction tool using Baidu API with no malicious behavior detected; minor supply chain concern due to unpinned dependencies.
Skill name: invoice-extractor
Analysis time: 57.0s
Engine: pi
OK to install
Pin the requests library version in requirements.txt to prevent supply chain risk. Otherwise safe to use.

Security findings (3)

Severity · Finding · Location
Medium · Unpinned dependencies without upper bounds · Supply chain
requirements.txt specifies requests>=2.28.0, PyMuPDF>=1.23.0, and Pillow>=10.0.0 without upper version bounds, which creates supply chain risk if a malicious version is published to PyPI.
Evidence: requests>=2.28.0
Recommendation: Pin exact versions or set reasonable upper bounds, e.g. requests>=2.28.0,<3.0.0
Location: requirements.txt:1
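
The check behind this finding can be approximated with a few lines of Python. The sketch below is illustrative only and is not the scanner's actual logic: `find_unbounded` is a hypothetical helper, and treating `==`, `<`, `<=`, and `~=` as "bounded" is an assumption about what counts as a pin.

```python
import re

# Operators that cap the allowed version range (assumption: these count as "bounded").
_UPPER_BOUND_OPS = ("==", "<=", "<", "~=")


def find_unbounded(requirements_text):
    """Return package names whose specifier has no upper bound or exact pin.

    Hypothetical helper for illustration; it handles simple
    'name<specifiers>' lines only, not the full requirements grammar.
    """
    unbounded = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        match = re.match(r"^([A-Za-z0-9._-]+)\s*(.*)$", line)
        if not match:
            continue
        name, spec = match.groups()
        clauses = [c.strip() for c in spec.split(",") if c.strip()]
        if not any(c.startswith(_UPPER_BOUND_OPS) for c in clauses):
            unbounded.append(name)
    return unbounded
```

Calling `find_unbounded("requests>=2.28.0")` reports `requests`, while the recommended form `requests>=2.28.0,<3.0.0` passes because the second clause caps the range.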
Low · Credential handling not explicitly declared · Documentation deception
SKILL.md documents Baidu API credentials but does not mention that the tool reads BAIDU_API_KEY and BAIDU_SECRET_KEY from environment variables via os.getenv().
Evidence: BAIDU_API_KEY = os.getenv("BAIDU_API_KEY", "")
Recommendation: Add a section to SKILL.md documenting environment variable support for API credentials.
Location: src/config.py:13
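
For readers documenting this behavior, the pattern reported in src/config.py can be sketched as follows. Only the two variable names and the use of an empty-string default come from the finding. The function `load_baidu_credentials` and its error handling are assumptions for illustration, not the skill's actual code.

```python
import os


def load_baidu_credentials(env=None):
    """Read the Baidu OCR credentials from environment variables.

    Hypothetical helper: the skill itself is reported to call
    os.getenv("BAIDU_API_KEY", "") at module level in src/config.py.
    """
    env = os.environ if env is None else env
    api_key = env.get("BAIDU_API_KEY", "")
    secret_key = env.get("BAIDU_SECRET_KEY", "")

    # Fail loudly instead of sending empty credentials to the API.
    missing = [
        name
        for name, value in (
            ("BAIDU_API_KEY", api_key),
            ("BAIDU_SECRET_KEY", secret_key),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return api_key, secret_key
```

Passing a plain dictionary as `env` makes the function easy to exercise without touching the real environment.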
Low · PaddleOCR imported but not in requirements.txt · Supply chain
src/invoice_extractor.py imports PaddleOCR for local OCR fallback, but the package is not listed in requirements.txt. If a user switches to local OCR mode, installation will fail silently or require manual setup.
Evidence: from paddleocr import PaddleOCR
Recommendation: Add 'paddleocr' to requirements.txt or document the local OCR dependency separately.
Location: src/invoice_extractor.py:18
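
If the dependency stays optional, one common way to handle it is to check for the package before importing it and to raise a clear error otherwise. The sketch below uses only the standard library. The helper names `local_ocr_available` and `require_local_ocr` are hypothetical and do not appear in the skill.

```python
import importlib
import importlib.util


def local_ocr_available(module_name="paddleocr"):
    """Return True if the optional OCR package can be imported."""
    return importlib.util.find_spec(module_name) is not None


def require_local_ocr(module_name="paddleocr"):
    """Import the optional OCR package or raise an explicit error.

    Hypothetical guard: it replaces an unconditional top-level import
    so that a missing package produces an actionable message.
    """
    if not local_ocr_available(module_name):
        raise ImportError(
            f"Local OCR mode needs the '{module_name}' package, "
            "which is not listed in requirements.txt; install it separately."
        )
    return importlib.import_module(module_name)
```

With a guard like this, the Baidu API path keeps working on a minimal install, and the local OCR path reports exactly which package is missing.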
| Resource type | Declared permission | Inferred permission | Status | Evidence |
|---|---|---|---|---|
| File system | READ | READ | ✓ Consistent | SKILL.md: Reads invoice images/PDFs from user-specified directories; creates out… |
| Network access | READ | READ | ✓ Consistent | src/baidu_ocr_extractor.py:23-49 - Uses requests.post to Baidu OCR API endpoints… |
| Environment variables | NONE | READ | ✓ Consistent | src/config.py:13-14 - Reads BAIDU_API_KEY and BAIDU_SECRET_KEY from os.getenv() |
| Command execution | NONE | NONE | | install.sh:15 - Uses pip install, but this is a setup script not invoked by the … |
9 findings

| Severity | Type | Value | Location |
|---|---|---|---|
| Medium | External URL | https://cloud.baidu.com/product/ocr: | SKILL.md:29 |
| Medium | External URL | https://cloud.baidu.com/doc/OCR/index.html | SKILL.md:290 |
| Medium | External URL | https://cloud.baidu.com/product/ocr | config.template.txt:2 |
| Medium | External URL | https://python-poetry.org/docs/#installation | setup.md:93 |
| Medium | External URL | https://cloud.baidu.com/ | setup.md:106 |
| Medium | External URL | https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice | src/baidu_ocr_extractor.py:23 |
| Medium | External URL | https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic | src/baidu_ocr_extractor.py:25 |
| Medium | External URL | https://aip.baidubce.com/oauth/2.0/token | src/baidu_ocr_extractor.py:49 |
| Info | Email address | [email protected] | examples.md:338 |

Directory structure

15 files · 101.6 KB · 3396 lines
Python 9f · 2171L Markdown 3f · 1170L Shell 1f · 44L Text 2f · 11L
├─ 📁 scripts
│ ├─ 🐍 batch_process.py Python 120L · 3.0 KB
│ └─ 🐍 verify_export.py Python 114L · 3.7 KB
├─ 📁 src
│ ├─ 🐍 baidu_ocr_extractor.py Python 367L · 13.8 KB
│ ├─ 🐍 config.py Python 105L · 3.6 KB
│ ├─ 🐍 excel_exporter.py Python 345L · 11.3 KB
│ ├─ 🐍 invoice_extractor.py Python 588L · 22.2 KB
│ ├─ 🐍 invoice_model.py Python 96L · 3.5 KB
│ ├─ 🐍 main_baidu.py Python 309L · 8.7 KB
│ └─ 🐍 main.py Python 127L · 3.6 KB
├─ 📄 config.template.txt Text 6L · 196 B
├─ 📝 examples.md Markdown 504L · 10.4 KB
├─ 🔧 install.sh Shell 44L · 1.2 KB
├─ 📄 requirements.txt Text 5L · 83 B
├─ 📝 setup.md Markdown 376L · 9.0 KB
└─ 📝 SKILL.md Markdown 290L · 7.4 KB

Dependency analysis (6)

| Package | Version | Source | Known vulnerabilities | Notes |
|---|---|---|---|---|
| requests | >=2.28.0 | pip | | Version not pinned - no upper bound allows potentially malicious future versions |
| pandas | >=2.0.0 | pip | | Version not pinned |
| openpyxl | >=3.1.0 | pip | | Version not pinned |
| PyMuPDF | >=1.23.0 | pip | | Version not pinned |
| Pillow | >=10.0.0 | pip | | Version not pinned |
| paddleocr | not listed | pip | | Imported in invoice_extractor.py but missing from requirements.txt |

Security highlights

✓ No reverse shell, C2, or data exfiltration to external IPs beyond intended Baidu API
✓ No credential harvesting or exfiltration - API keys are used only for Baidu OCR authentication
✓ No base64-encoded shell execution or obfuscated code
✓ No access to sensitive paths like ~/.ssh, ~/.aws, or .env files
✓ No curl|bash or wget|sh remote script execution patterns
✓ File operations are limited to user-specified input/output directories
✓ Baidu API credentials are never hardcoded - always loaded from environment or config file
✓ No hidden instructions in comments or HTML
✓ No eval(), exec(), or other dynamic code execution
✓ API endpoints use HTTPS with domain names, not direct IP addresses
✓ Config values are written to a local config.txt, not transmitted anywhere
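
The highlight about HTTPS endpoints with domain names can be spot-checked with the standard library. This is a minimal sketch, not the scanner's implementation: `uses_https_domain` is a hypothetical helper, and the URLs in the usage note are the three API endpoints listed in the findings above.

```python
import ipaddress
from urllib.parse import urlsplit


def uses_https_domain(url):
    """Return True if the URL uses HTTPS and its host is a name, not an IP literal."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.scheme != "https" or not host:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True  # not parseable as an IP address, so it is a domain name
    return False  # host is a raw IPv4 or IPv6 address
```

Applied to `https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice`, `https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic`, and `https://aip.baidubce.com/oauth/2.0/token`, the check passes for all three, since each uses the `https` scheme and a named host.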