Trusted — Risk Score 5/100
Last scan: 16 hr ago
paddleocr-vl-locally
Complex document parsing with PaddleOCR — converts PDFs and document images into Markdown and JSON via a user-provided Triton inference endpoint.
PaddleOCR document parsing skill: a clean implementation with no malicious behavior, no credential theft, no obfuscation, and documentation that accurately matches the code.
Skill Name: paddleocr-vl-locally
Duration: 34.3 s
Engine: pi
Safe to install
No action needed. The skill is safe to use as documented.

Findings (1 item)

Severity | Finding | Location
Low | Loose dependency version pins (Supply Chain) | scripts/requirements.txt:4
requirements.txt uses >= pins (httpx>=0.24.0, Pillow>=10.0.0, pypdfium2>=4.0.0) instead of exact, locked versions. This is a minor risk because no known CVEs affect these packages at their current versions.
Evidence: httpx>=0.24.0
Recommendation: Pin to specific versions (e.g., httpx==0.27.0) for reproducible builds.
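The recommendation to pin exact versions can be checked mechanically before a build. The following is a minimal sketch, assuming simple `name<operator>version` lines; the helper `unpinned_requirements` is hypothetical and not part of the skill's scripts, and it does not handle extras, environment markers, or `-r` includes (lines using extras are always flagged).

```python
import re

# Hypothetical helper for illustration; not part of the skill's scripts.
# Captures "name", an optional PEP 440 comparison operator, and the rest of the line.
_REQ_LINE = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(===|==|>=|<=|~=|!=|>|<)?\s*(.*)$"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version with ==."""
    flagged = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        match = _REQ_LINE.match(line)
        if match is None:
            flagged.append(line)
            continue
        op, version = match.group(2), match.group(3)
        # An exact pin uses == (or ===) with a single version and no wildcard.
        if op not in ("==", "===") or not version or "*" in version or "," in version:
            flagged.append(line)
    return flagged


requirements = """\
httpx>=0.24.0
Pillow>=10.0.0
pypdfium2>=4.0.0
"""
print(unpinned_requirements(requirements))
# → ['httpx>=0.24.0', 'Pillow>=10.0.0', 'pypdfium2>=4.0.0']
```

Run against a pinned file (for example one containing `httpx==0.27.0`), the function returns an empty list, so it could serve as a simple CI gate.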
Resource | Declared | Inferred | Status | Evidence
Filesystem | NONE | READ | ✓ Aligned | _load_file_as_base64 reads user-provided files (lib.py:141)
Network | NONE | WRITE | ✓ Aligned | httpx.Client.post to the user-configured Triton endpoint (lib.py:151)
Shell | NONE | NONE | - | No subprocess or shell invocation anywhere in the codebase
Environment | READ | READ | ✓ Aligned | _get_env reads PADDLEOCR_DOC_PARSING_API_URL, PADDLEOCR_ACCESS_TO… (lib.py:47-57)
External URLs (7 findings)
Severity | Finding | URL | Location
Medium | External URL | http://10.0.0.1:8020/v2/models/layout-parsing/infer | SKILL.md:213
Medium | External URL | http://10.0.133.33:8020/v2/models/layout-parsing/infer | SKILL.md:223
Medium | External URL | https://your-server.com/large_file.pdf | SKILL.md:261
Medium | External URL | http://www.apache.org/licenses/LICENSE-2.0 | scripts/lib.py:7
Medium | External URL | https://www.paddleocr.com | scripts/smoke_test.py:42
Medium | External URL | https://your-api-url.paddleocr.com/layout-parsing | scripts/smoke_test.py:49
Medium | External URL | https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png | scripts/smoke_test.py:115
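Not every flagged URL points at a public host: the two SKILL.md endpoints use IP literals in the private 10.0.0.0/8 range, and others such as https://your-server.com/large_file.pdf look like documentation placeholders. A reviewer could triage such findings with a short standard-library sketch like the one below; the `classify_url` helper and its category names are hypothetical and are not part of the scanner or the skill.

```python
import ipaddress
from urllib.parse import urlsplit


def classify_url(url: str) -> str:
    """Classify a URL's host as 'private-ip', 'public-ip', or 'hostname' (hypothetical categories)."""
    host = urlsplit(url).hostname or ""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, e.g. www.apache.org or your-server.com.
        return "hostname"
    # is_private covers RFC 1918 ranges such as 10.0.0.0/8, plus loopback and link-local.
    return "private-ip" if addr.is_private else "public-ip"


for url in [
    "http://10.0.0.1:8020/v2/models/layout-parsing/infer",
    "http://10.0.133.33:8020/v2/models/layout-parsing/infer",
    "https://www.paddleocr.com",
]:
    print(classify_url(url), url)
```

Hostnames still need a manual look (or DNS resolution), since a name alone says nothing about where it resolves; the sketch only separates private-network IP literals from everything else.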

File Tree

10 files · 44.2 KB · 1413 lines
Python: 5 files, 974 lines · Markdown: 2 files, 422 lines · Text: 2 files, 12 lines · JSON: 1 file, 5 lines
├─ 📁 references
│ └─ 📝 output_schema.md Markdown 100L · 2.6 KB
├─ 📁 scripts
│ ├─ 🐍 lib.py Python 368L · 11.7 KB
│ ├─ 🐍 optimize_file.py Python 165L · 5.2 KB
│ ├─ 📄 requirements-optimize.txt Text 8L · 190 B
│ ├─ 📄 requirements.txt Text 4L · 71 B
│ ├─ 🐍 smoke_test.py Python 157L · 4.7 KB
│ ├─ 🐍 split_pdf.py Python 130L · 4.3 KB
│ └─ 🐍 vl_caller.py Python 154L · 4.9 KB
├─ 📋 _meta.json JSON 5L · 140 B
└─ 📝 SKILL.md Markdown 322L · 10.6 KB

Dependencies (3 items)

Package | Version | Source | Known Vulns | Notes
httpx | >=0.24.0 | pip | No | Version not pinned to exact release
Pillow | >=10.0.0 | pip | No | Version not pinned to exact release
pypdfium2 | >=4.0.0 | pip | No | Version not pinned to exact release

Security Positives

✓ No shell execution, subprocess, or curl|bash patterns anywhere in the codebase
✓ No credential theft: environment variables are only read, and only to configure and authenticate to the API endpoint
✓ No data exfiltration: all network calls go only to the user-configured Triton endpoint
✓ No obfuscation: all code is plaintext Python, with no base64-encoded payloads and no eval/atob patterns
✓ Documentation accurately reflects all implemented behavior (doc-to-code match)
✓ No access to sensitive paths (~/.ssh, ~/.aws, .env, etc.)
✓ No hidden instructions or steganographic content
✓ Filesystem writes are limited to user-specified output paths or the OS temp directory
✓ Error handling is thorough and returns structured errors without leaking system internals
✓ Licensed under Apache 2.0 by the PaddlePaddle Authors, consistent with the PaddleOCR ecosystem