Low Risk — Risk Score 20/100
Last scan: 22 hr ago
invoice-extractor
Extract invoice information from images and PDF files using Baidu OCR API, export to Excel
A legitimate invoice OCR extraction tool using Baidu API with no malicious behavior detected; minor supply chain concern due to unpinned dependencies.
Skill Name: invoice-extractor
Duration: 57.0s
Engine: pi
Safe to install
Pin the version of requests (and the other dependencies) in requirements.txt to reduce supply chain risk. Otherwise safe to use.

Findings (3 items)

Severity | Finding | Location
Medium | Unpinned dependencies without upper bounds (Supply Chain) | requirements.txt:1
requirements.txt specifies requests>=2.28.0, PyMuPDF>=1.23.0, and Pillow>=10.0.0 (and, per the Dependencies table, pandas>=2.0.0 and openpyxl>=3.1.0) with lower bounds only. If a malicious version of any of these packages is published to PyPI, a fresh install will pull it in automatically.
Evidence: requests>=2.28.0
→ Pin exact versions or set reasonable upper bounds, e.g. requests>=2.28.0,<3.0.0
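To check a requirements file for this issue, a small audit can flag every entry without an upper bound. The sketch below is illustrative only (it is not part of the skill or the scanner): it uses a simplified regex parser rather than a full PEP 440 / PEP 508 implementation, and the function name `find_unbounded` is hypothetical.

```python
import re

# Operators that cap a version from above or fix it exactly (PEP 440).
BOUNDING_OPS = {"<", "<=", "==", "~=", "==="}
# Longest operators first so "<=" is not read as "<".
OP_PATTERN = re.compile(r"\s*(===|==|~=|<=|>=|!=|<|>)")
# Package name, optional [extras], then the specifier string.
NAME_PATTERN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(\[[^\]]*\])?\s*(.*)$")


def find_unbounded(requirements_text: str) -> list[str]:
    """Return names of requirements whose specifiers set no upper bound."""
    unbounded = []
    for raw in requirements_text.splitlines():
        # Drop comments and environment markers; skip blanks and pip options (-r, -e).
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = NAME_PATTERN.match(line)
        if match is None:
            continue
        name, _extras, spec = match.groups()
        ops = set()
        for part in spec.split(","):
            op = OP_PATTERN.match(part)
            if op:
                ops.add(op.group(1))
        # No specifier at all, or only >=, >, != : nothing caps the version.
        if not ops & BOUNDING_OPS:
            unbounded.append(name)
    return unbounded


if __name__ == "__main__":
    sample = "requests>=2.28.0,<3.0.0\npandas>=2.0.0\nPyMuPDF==1.23.8\nPillow~=10.0\n"
    print(find_unbounded(sample))
```

For real projects, the `packaging` library's requirement parser covers the full specifier grammar that this regex approximates.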
Low | Credential handling not explicitly declared (Doc Mismatch) | src/config.py:13
SKILL.md documents the Baidu API credentials but does not mention that the tool reads BAIDU_API_KEY and BAIDU_SECRET_KEY from environment variables via os.getenv().
Evidence: BAIDU_API_KEY = os.getenv("BAIDU_API_KEY", "")
→ Add a section to SKILL.md documenting environment-variable support for API credentials
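A minimal sketch of the kind of credential lookup this finding asks SKILL.md to document. It assumes environment variables take precedence over a local config.txt and that the file uses `KEY=VALUE` lines; the precedence order, the file format, and the function `load_baidu_credentials` are assumptions, not taken from src/config.py.

```python
import os
from pathlib import Path


def load_baidu_credentials(config_path: str = "config.txt") -> tuple[str, str]:
    """Return (api_key, secret_key), preferring environment variables.

    Assumption: the fallback file holds KEY=VALUE lines; '#' lines are comments.
    """
    api_key = os.getenv("BAIDU_API_KEY", "")
    secret_key = os.getenv("BAIDU_SECRET_KEY", "")
    path = Path(config_path)
    if (not api_key or not secret_key) and path.is_file():
        values = {}
        for line in path.read_text(encoding="utf-8").splitlines():
            key, sep, value = line.partition("=")
            if sep and not key.strip().startswith("#"):
                values[key.strip()] = value.strip()
        # Only fill in whatever the environment did not provide.
        api_key = api_key or values.get("BAIDU_API_KEY", "")
        secret_key = secret_key or values.get("BAIDU_SECRET_KEY", "")
    if not api_key or not secret_key:
        raise RuntimeError(
            "Set BAIDU_API_KEY and BAIDU_SECRET_KEY in the environment or the config file"
        )
    return api_key, secret_key
```

Failing loudly when both sources are empty avoids sending unauthenticated requests with blank keys, which is what a bare `os.getenv(..., "")` default would otherwise allow.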
Low | PaddleOCR imported but not in requirements.txt (Supply Chain) | src/invoice_extractor.py:18
src/invoice_extractor.py imports PaddleOCR for a local OCR fallback, but paddleocr is not listed in requirements.txt. If a user switches to local OCR mode, the import will fail until PaddleOCR is installed manually.
Evidence: from paddleocr import PaddleOCR
→ Add paddleocr to requirements.txt or document the local OCR dependency separately
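One way to keep the Baidu path working while making the local OCR dependency explicit is to import PaddleOCR lazily and fail with an install hint. This is a hypothetical helper (`require_optional`), not code from src/invoice_extractor.py.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency, or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        raise RuntimeError(
            f"'{module_name}' is required for this feature; install it with: {install_hint}"
        ) from exc


# Hypothetical use, placed only inside the local-OCR code path so that
# Baidu-only users never need PaddleOCR installed:
# paddleocr = require_optional("paddleocr", "pip install paddleocr")
# ocr = paddleocr.PaddleOCR()
```

Deferring the import also means a missing PaddleOCR cannot break module loading for users who only call the Baidu API.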
Resource | Declared | Inferred | Status | Evidence
Filesystem | READ | READ | ✓ Aligned | SKILL.md: Reads invoice images/PDFs from user-specified directories; creates out…
Network | READ | READ | ✓ Aligned | src/baidu_ocr_extractor.py:23-49: Uses requests.post to Baidu OCR API endpoints…
Environment | NONE | READ | ✓ Aligned | src/config.py:13-14: Reads BAIDU_API_KEY and BAIDU_SECRET_KEY from os.getenv()
Shell | NONE | NONE | | install.sh:15: Uses pip install, but this is a setup script not invoked by the …
9 findings
Severity | Type | Value | Location
Medium | External URL | https://cloud.baidu.com/product/ocr | SKILL.md:29
Medium | External URL | https://cloud.baidu.com/doc/OCR/index.html | SKILL.md:290
Medium | External URL | https://cloud.baidu.com/product/ocr | config.template.txt:2
Medium | External URL | https://python-poetry.org/docs/#installation | setup.md:93
Medium | External URL | https://cloud.baidu.com/ | setup.md:106
Medium | External URL | https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice | src/baidu_ocr_extractor.py:23
Medium | External URL | https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic | src/baidu_ocr_extractor.py:25
Medium | External URL | https://aip.baidubce.com/oauth/2.0/token | src/baidu_ocr_extractor.py:49
Info | Email address | [email protected] | examples.md:338
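The three aip.baidubce.com URLs in the list are the runtime endpoints: a token endpoint plus two OCR endpoints. The sketch below only builds the requests and sends nothing. It assumes Baidu's client-credentials token flow (`grant_type`, `client_id`, `client_secret`) and a base64-encoded `image` form field for vat_invoice. These parameter names are taken from Baidu's public OCR documentation, not from src/baidu_ocr_extractor.py, so verify them against https://cloud.baidu.com/doc/OCR/index.html. The function names are hypothetical.

```python
import base64
from urllib.parse import urlencode

TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"
VAT_INVOICE_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/vat_invoice"


def token_request(api_key: str, secret_key: str) -> str:
    """Build the token URL; the caller would send it with an HTTP POST."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    return f"{TOKEN_URL}?{query}"


def vat_invoice_request(access_token: str, image_bytes: bytes) -> tuple[str, str]:
    """Build (url, form_body) for one VAT-invoice OCR call on raw image bytes."""
    url = f"{VAT_INVOICE_URL}?{urlencode({'access_token': access_token})}"
    # Assumed field name "image": base64 text, then URL-encoded into the form body.
    body = urlencode({"image": base64.b64encode(image_bytes).decode("ascii")})
    return url, body
```

Each pair would then go out as an HTTP POST, e.g. `requests.post(url, data=body, headers={"Content-Type": "application/x-www-form-urlencoded"})`. Note that the API key and secret travel only to aip.baidubce.com, which matches the network evidence above.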

File Tree

15 files · 101.6 KB · 3396 lines
Python: 9 files, 2171 lines · Markdown: 3 files, 1170 lines · Shell: 1 file, 44 lines · Text: 2 files, 11 lines
├─ 📁 scripts
│ ├─ 🐍 batch_process.py Python 120L · 3.0 KB
│ └─ 🐍 verify_export.py Python 114L · 3.7 KB
├─ 📁 src
│ ├─ 🐍 baidu_ocr_extractor.py Python 367L · 13.8 KB
│ ├─ 🐍 config.py Python 105L · 3.6 KB
│ ├─ 🐍 excel_exporter.py Python 345L · 11.3 KB
│ ├─ 🐍 invoice_extractor.py Python 588L · 22.2 KB
│ ├─ 🐍 invoice_model.py Python 96L · 3.5 KB
│ ├─ 🐍 main_baidu.py Python 309L · 8.7 KB
│ └─ 🐍 main.py Python 127L · 3.6 KB
├─ 📄 config.template.txt Text 6L · 196 B
├─ 📝 examples.md Markdown 504L · 10.4 KB
├─ 🔧 install.sh Shell 44L · 1.2 KB
├─ 📄 requirements.txt Text 5L · 83 B
├─ 📝 setup.md Markdown 376L · 9.0 KB
└─ 📝 SKILL.md Markdown 290L · 7.4 KB

Dependencies (6 items)

Package | Version | Source | Known Vulns | Notes
requests | >=2.28.0 | pip | No | Lower bound only; with no upper bound, a potentially malicious future release could be installed
pandas | >=2.0.0 | pip | No | Lower bound only (no upper bound)
openpyxl | >=3.1.0 | pip | No | Lower bound only (no upper bound)
PyMuPDF | >=1.23.0 | pip | No | Lower bound only (no upper bound)
Pillow | >=10.0.0 | pip | No | Lower bound only (no upper bound)
paddleocr | not listed | pip | No | Imported in src/invoice_extractor.py but missing from requirements.txt
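The paddleocr gap can be caught mechanically by comparing a module's imports against requirements.txt. A sketch using the standard `ast` module follows. The import-to-distribution mapping (`fitz` and `pymupdf` for PyMuPDF, `PIL` for Pillow) is a hand-maintained assumption, names are only lowercased rather than fully normalized per PEP 503, and `missing_requirements` is a hypothetical name.

```python
import ast
import re

# Import names that differ from their PyPI distribution names (assumed mapping).
IMPORT_TO_DIST = {"fitz": "pymupdf", "pymupdf": "pymupdf", "PIL": "pillow"}


def top_level_imports(source: str) -> set[str]:
    """Collect top-level module names from absolute imports in the source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])  # relative imports are skipped
    return names


def missing_requirements(source: str, requirements_text: str, ignore: set[str]) -> set[str]:
    """Return imports (minus stdlib/local names in `ignore`) absent from requirements."""
    declared = set()
    for line in requirements_text.splitlines():
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
        if match:
            declared.add(match.group(1).lower())
    missing = set()
    for name in top_level_imports(source) - ignore:
        if IMPORT_TO_DIST.get(name, name).lower() not in declared:
            missing.add(name)
    return missing
```

On Python 3.10+, `sys.stdlib_module_names` can serve as the `ignore` set, extended with the project's own module names.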

Security Positives

✓ No reverse shell, C2, or data exfiltration to external IPs beyond the intended Baidu API
✓ No credential harvesting or exfiltration; API keys are used only for Baidu OCR authentication
✓ No base64-encoded shell execution or obfuscated code
✓ No access to sensitive paths like ~/.ssh, ~/.aws, or .env files
✓ No curl|bash or wget|sh remote script execution patterns
✓ File operations are limited to user-specified input/output directories
✓ Baidu API credentials are never hardcoded; they are always loaded from environment variables or a config file
✓ No hidden instructions in comments or HTML
✓ No eval(), exec(), or other dynamic code execution
✓ API endpoints use HTTPS with domain names, not direct IP addresses
✓ Config values are written to a local config.txt file and are not transmitted anywhere
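The network-related positives (HTTPS with domain names, and no traffic beyond the intended Baidu API) can be expressed as a simple endpoint check. This is a sketch: the allowlist containing only aip.baidubce.com is an assumption based on the runtime endpoints in the URL list, and `check_endpoint` is a hypothetical helper, not the scanner's logic.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"aip.baidubce.com"}  # assumption: the only runtime API host


def check_endpoint(url: str) -> list[str]:
    """Return the problems found with an outbound API URL (empty list if OK)."""
    parts = urlsplit(url)
    problems = []
    if parts.scheme != "https":
        problems.append("not HTTPS")
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)  # succeeds only for a literal IPv4/IPv6 address
        problems.append("raw IP address instead of a domain name")
    except ValueError:
        pass
    if host not in ALLOWED_HOSTS:
        problems.append(f"host {host!r} not in allowlist")
    return problems
```

Running every URL the code can reach through this check makes the "HTTPS with domain names" positive a testable property rather than a one-time observation.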