Scan Report
20/100
invoice-extractor
Extract invoice information from images and PDF files using Baidu OCR API, export to Excel
A legitimate invoice OCR extraction tool using Baidu API with no malicious behavior detected; minor supply chain concern due to unpinned dependencies.
Safe to install
Pin the requests library version in requirements.txt to prevent supply chain risk. Otherwise safe to use.
Security findings: 3
| Severity | Finding | Location |
|---|---|---|
| Medium | Unpinned dependencies without upper bounds (supply chain) | requirements.txt:1 |
| Low | Credential handling not explicitly declared (documentation deception) | src/config.py:13 |
| Low | PaddleOCR imported but not in requirements.txt (supply chain) | src/invoice_extractor.py:18 |
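The PaddleOCR finding means that an install from requirements.txt alone may fail at import time. A minimal sketch of an early availability check follows. The helper name `is_importable` is hypothetical and not part of the scanned package, and `paddleocr` is assumed to be the import name based on the finding above.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: "paddleocr" is the module imported in src/invoice_extractor.py.
    if not is_importable("paddleocr"):
        print("paddleocr is not installed; declare it in requirements.txt "
              "or skip the local OCR path")
```

A check like this only reports the gap; declaring the package in requirements.txt is still the actual fix.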
| Resource type | Declared permission | Inferred permission | Status | Evidence |
|---|---|---|---|---|
| File system | READ | READ | ✓ Consistent | SKILL.md: Reads invoice images/PDFs from user-specified directories; creates out… |
| Network access | READ | READ | ✓ Consistent | src/baidu_ocr_extractor.py:23-49 - Uses requests.post to Baidu OCR API endpoints… |
| Environment variables | NONE | READ | ✓ Consistent | src/config.py:13-14 - Reads BAIDU_API_KEY and BAIDU_SECRET_KEY from os.getenv() |
| Command execution | NONE | NONE | - | install.sh:15 - Uses pip install, but this is a setup script not invoked by the … |
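The environment-variable row can be illustrated with a small sketch of reading the two credentials and failing fast when either is missing. Only the variable names `BAIDU_API_KEY` and `BAIDU_SECRET_KEY` come from the report. The function and exception names are hypothetical, and this is not the code in src/config.py.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required Baidu credential is not set (hypothetical name)."""


def load_baidu_credentials(env=None):
    """Return (api_key, secret_key) read from the environment mapping."""
    env = os.environ if env is None else env
    values = {
        name: env.get(name, "").strip()
        for name in ("BAIDU_API_KEY", "BAIDU_SECRET_KEY")
    }
    # Collect every missing name so the user sees all problems at once.
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise MissingCredentialError("missing credentials: " + ", ".join(missing))
    return values["BAIDU_API_KEY"], values["BAIDU_SECRET_KEY"]
```

Passing the mapping as a parameter keeps the function testable without touching the real process environment.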
9 findings
| Severity | Type | Value | Location |
|---|---|---|---|
| Medium | External URL | https://cloud.baidu.com/product/ocr: | SKILL.md:29 |
| Medium | External URL | https://cloud.baidu.com/doc/OCR/index.html | SKILL.md:290 |
| Medium | External URL | https://cloud.baidu.com/product/ocr | config.template.txt:2 |
| Medium | External URL | https://python-poetry.org/docs/#installation | setup.md:93 |
| Medium | External URL | https://cloud.baidu.com/ | setup.md:106 |
| Medium | External URL | https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice | src/baidu_ocr_extractor.py:23 |
| Medium | External URL | https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic | src/baidu_ocr_extractor.py:25 |
| Medium | External URL | https://aip.baidubce.com/oauth/2.0/token | src/baidu_ocr_extractor.py:49 |
| Info | Email address | [email protected] | examples.md:338 |
Directory structure
15 files · 101.6 KB · 3396 lines
Python 9 files · 2171 lines
Markdown 3 files · 1170 lines
Shell 1 file · 44 lines
Text 2 files · 11 lines
├─ scripts
│  ├─ batch_process.py (Python)
│  └─ verify_export.py (Python)
├─ src
│  ├─ baidu_ocr_extractor.py (Python)
│  ├─ config.py (Python)
│  ├─ excel_exporter.py (Python)
│  ├─ invoice_extractor.py (Python)
│  ├─ invoice_model.py (Python)
│  ├─ main_baidu.py (Python)
│  └─ main.py (Python)
├─ config.template.txt (Text)
├─ examples.md (Markdown)
├─ install.sh (Shell)
├─ requirements.txt (Text)
├─ setup.md (Markdown)
└─ SKILL.md (Markdown)
Dependency analysis: 6 items
| Package | Version | Source | Known vulnerabilities | Notes |
|---|---|---|---|---|
| requests | >=2.28.0 | pip | No | Version not pinned - no upper bound allows potentially malicious future versions |
| pandas | >=2.0.0 | pip | No | Version not pinned |
| openpyxl | >=3.1.0 | pip | No | Version not pinned |
| PyMuPDF | >=1.23.0 | pip | No | Version not pinned |
| Pillow | >=10.0.0 | pip | No | Version not pinned |
| paddleocr | not listed | pip | No | Imported in invoice_extractor.py but missing from requirements.txt |
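The "Version not pinned" notes can be checked mechanically. The sketch below flags any requirement line that is not an exact `name==version` pin. It is an illustration only: `find_unpinned` is a hypothetical helper, the regex covers only simple requirement lines, and `SAMPLE` is reconstructed from the table above rather than copied from the actual requirements.txt.

```python
import re

# Matches "name==version", with optional extras and environment marker.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;,]+\s*(;.*)?$")


def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that are not exact '==' pins."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Assumption: specifiers taken from the dependency table in this report.
SAMPLE = """\
requests>=2.28.0
pandas>=2.0.0
openpyxl>=3.1.0
PyMuPDF>=1.23.0
Pillow>=10.0.0
"""

if __name__ == "__main__":
    for line in find_unpinned(SAMPLE):
        print("unpinned:", line)
```

Exact pins trade automatic updates for reproducibility, so they are usually paired with a routine for reviewing and bumping versions.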
Security highlights
✓ No reverse shell, C2, or data exfiltration to external IPs beyond intended Baidu API
✓ No credential harvesting or exfiltration - API keys are used only for Baidu OCR authentication
✓ No base64-encoded shell execution or obfuscated code
✓ No access to sensitive paths like ~/.ssh, ~/.aws, or .env files
✓ No curl|bash or wget|sh remote script execution patterns
✓ File operations are limited to user-specified input/output directories
✓ Baidu API credentials are never hardcoded - always loaded from environment or config file
✓ No hidden instructions in comments or HTML
✓ No eval(), exec(), or other dynamic code execution
✓ API endpoints use HTTPS with domain names, not direct IP addresses
✓ Config values are written to a local config.txt, not transmitted anywhere
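The highlight about HTTPS endpoints with domain names can be expressed as a simple check. This is a sketch under stated assumptions, not the scanner's actual rule: `uses_https_hostname` is a hypothetical helper, and `ENDPOINTS` lists the three API URLs reported for src/baidu_ocr_extractor.py.

```python
import ipaddress
from urllib.parse import urlparse

# The three API endpoints listed in the findings for src/baidu_ocr_extractor.py.
ENDPOINTS = [
    "https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice",
    "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic",
    "https://aip.baidubce.com/oauth/2.0/token",
]


def uses_https_hostname(url: str) -> bool:
    """Return True if the URL is HTTPS and its host is a name, not an IP literal."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme != "https" or not host:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True  # not parseable as an IP address, so treat it as a domain name
    return False


if __name__ == "__main__":
    for url in ENDPOINTS:
        print(url, uses_https_hostname(url))
```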