Scan Report
20/100
invoice-extractor
Extract invoice information from images and PDF files using Baidu OCR API, export to Excel
A legitimate invoice OCR extraction tool using Baidu API with no malicious behavior detected; minor supply chain concern due to unpinned dependencies.
Safe to install
Pin the requests library version in requirements.txt to prevent supply chain risk. Otherwise safe to use.
Findings 3 items
| Severity | Finding | Category | Location |
|---|---|---|---|
| Medium | Unpinned dependencies without upper bounds | Supply Chain | requirements.txt:1 |
| Low | Credential handling not explicitly declared | Doc Mismatch | src/config.py:13 |
| Low | PaddleOCR imported but not in requirements.txt | Supply Chain | src/invoice_extractor.py:18 |
| Resource | Declared | Inferred | Status | Evidence |
|---|---|---|---|---|
| Filesystem | READ | READ | ✓ Aligned | SKILL.md: Reads invoice images/PDFs from user-specified directories; creates out… |
| Network | READ | READ | ✓ Aligned | src/baidu_ocr_extractor.py:23-49 - Uses requests.post to Baidu OCR API endpoints… |
| Environment | NONE | READ | ✓ Aligned | src/config.py:13-14 - Reads BAIDU_API_KEY and BAIDU_SECRET_KEY from os.getenv() |
| Shell | NONE | NONE | — | install.sh:15 - Uses pip install, but this is a setup script not invoked by the … |
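
The Environment row notes that `src/config.py` reads `BAIDU_API_KEY` and `BAIDU_SECRET_KEY` via `os.getenv()`. As a minimal sketch of that pattern (not the skill's actual code; the function name `load_baidu_credentials` and the fail-fast behavior are assumptions made for illustration), credentials could be loaded and validated like this:

```python
import os


def load_baidu_credentials(env=None):
    """Return (api_key, secret_key), raising if either variable is unset.

    Hypothetical helper: only the two variable names come from the report.
    """
    env = os.environ if env is None else env
    api_key = env.get("BAIDU_API_KEY", "").strip()
    secret_key = env.get("BAIDU_SECRET_KEY", "").strip()

    # Collect the names of any variables that are missing or blank.
    missing = [
        name
        for name, value in (
            ("BAIDU_API_KEY", api_key),
            ("BAIDU_SECRET_KEY", secret_key),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return api_key, secret_key
```

Failing early with the variable names (never the values) keeps secrets out of logs while making a misconfiguration obvious.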
9 findings
| Severity | Type | Value | Location |
|---|---|---|---|
| Medium | External URL | https://cloud.baidu.com/product/ocr: | SKILL.md:29 |
| Medium | External URL | https://cloud.baidu.com/doc/OCR/index.html | SKILL.md:290 |
| Medium | External URL | https://cloud.baidu.com/product/ocr | config.template.txt:2 |
| Medium | External URL | https://python-poetry.org/docs/#installation | setup.md:93 |
| Medium | External URL | https://cloud.baidu.com/ | setup.md:106 |
| Medium | External URL | https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice | src/baidu_ocr_extractor.py:23 |
| Medium | External URL | https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic | src/baidu_ocr_extractor.py:25 |
| Medium | External URL | https://aip.baidubce.com/oauth/2.0/token | src/baidu_ocr_extractor.py:49 |
| Info | Email | [email protected] | examples.md:338 |
File Tree
15 files · 101.6 KB · 3396 lines
Python 9f · 2171L
Markdown 3f · 1170L
Shell 1f · 44L
Text 2f · 11L
├─ scripts
│  ├─ batch_process.py (Python)
│  └─ verify_export.py (Python)
├─ src
│  ├─ baidu_ocr_extractor.py (Python)
│  ├─ config.py (Python)
│  ├─ excel_exporter.py (Python)
│  ├─ invoice_extractor.py (Python)
│  ├─ invoice_model.py (Python)
│  ├─ main_baidu.py (Python)
│  └─ main.py (Python)
├─ config.template.txt (Text)
├─ examples.md (Markdown)
├─ install.sh (Shell)
├─ requirements.txt (Text)
├─ setup.md (Markdown)
└─ SKILL.md (Markdown)
Dependencies 6 items
| Package | Version | Source | Known Vulns | Notes |
|---|---|---|---|---|
| requests | >=2.28.0 | pip | No | Version not pinned - no upper bound allows potentially malicious future versions |
| pandas | >=2.0.0 | pip | No | Version not pinned |
| openpyxl | >=3.1.0 | pip | No | Version not pinned |
| PyMuPDF | >=1.23.0 | pip | No | Version not pinned |
| Pillow | >=10.0.0 | pip | No | Version not pinned |
| paddleocr | not listed | pip | No | Imported in invoice_extractor.py but missing from requirements.txt |
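
The "Version not pinned" notes above can be checked mechanically. The following is a rough sketch, not the scanner's actual logic: the helper name `unpinned_requirements` is hypothetical, and it treats a requirement as pinned only when its specifier starts with `==` and contains no wildcard. The sample text reuses the specifiers from the table.

```python
import re

# Name, optional [extras], then the specifier up to any marker or comment.
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*([^;#]*)")


def unpinned_requirements(text):
    """Return names of requirements lacking an exact '==' pin (hypothetical helper)."""
    unpinned = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and pip options such as "-r other.txt".
        if not line or line.startswith(("#", "-")):
            continue
        match = _REQ.match(line)
        if not match:
            continue
        name, spec = match.group(1), match.group(3).strip()
        if not spec.startswith("==") or "*" in spec:
            unpinned.append(name)
    return unpinned


sample = """\
requests>=2.28.0
pandas>=2.0.0
openpyxl>=3.1.0
PyMuPDF>=1.23.0
Pillow>=10.0.0
"""
print(unpinned_requirements(sample))
# ['requests', 'pandas', 'openpyxl', 'PyMuPDF', 'Pillow']
```

A check like this only covers direct dependencies; a lock file with hashes would be needed to pin transitive ones as well.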
Security Positives
✓ No reverse shell, C2, or data exfiltration to external IPs beyond intended Baidu API
✓ No credential harvesting or exfiltration - API keys are used only for Baidu OCR authentication
✓ No base64-encoded shell execution or obfuscated code
✓ No access to sensitive paths like ~/.ssh, ~/.aws, or .env files
✓ No curl|bash or wget|sh remote script execution patterns
✓ File operations are limited to user-specified input/output directories
✓ Baidu API credentials are never hardcoded - always loaded from environment or config file
✓ No hidden instructions in comments or HTML
✓ No eval(), exec(), or other dynamic code execution
✓ API endpoints use HTTPS with domain names, not direct IP addresses
✓ Config values are written to a local config.txt, not transmitted anywhere
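
The positive "API endpoints use HTTPS with domain names, not direct IP addresses" is simple to verify for a list of URLs. This is a minimal sketch under stated assumptions: `uses_https_domain` is a hypothetical name, and the check is illustrative rather than the scanner's real rule. The endpoints are the three Baidu API URLs listed in the findings.

```python
import ipaddress
from urllib.parse import urlsplit


def uses_https_domain(url):
    """True if the URL uses https and its host is a name, not an IP literal."""
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme != "https" or not host:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return True  # Not parseable as an IP address, so treated as a domain name.
    return False


endpoints = [
    "https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice",
    "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic",
    "https://aip.baidubce.com/oauth/2.0/token",
]
print(all(uses_https_domain(u) for u in endpoints))  # True
```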