Scan Report
5/100
paddleocr-vl-locally
Complex document parsing with PaddleOCR — converts PDFs and document images into Markdown and JSON via a user-provided Triton inference endpoint.
A clean PaddleOCR document-parsing implementation: no malicious behavior, no credential theft, no obfuscation, and documentation that matches the code.
Safe to install
No action needed. The skill is safe to use as documented.
Findings (1 item)
| Severity | Finding | Category | Location |
|---|---|---|---|
| Low | Loose dependency version pins | Supply Chain | scripts/requirements.txt:4 |
| Resource | Declared | Inferred | Status | Evidence |
|---|---|---|---|---|
| Filesystem | NONE | READ | ✓ Aligned | lib.py:141 — _load_file_as_base64 reads user-provided files |
| Network | NONE | WRITE | ✓ Aligned | lib.py:151 — httpx.Client.post to user-configured Triton endpoint |
| Shell | NONE | NONE | — | No subprocess, no shell invocation anywhere in codebase |
| Environment | READ | READ | ✓ Aligned | lib.py:47-57 — _get_env reads PADDLEOCR_DOC_PARSING_API_URL, PADDLEOCR_ACCESS_TO… |
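The Filesystem and Environment rows above describe two behaviors: reading configuration from environment variables and loading a user-provided file as base64 before it is sent to the endpoint. The sketch below illustrates both under stated assumptions. `get_env` and `load_file_as_base64` are hypothetical stand-ins for `_get_env` and `_load_file_as_base64` in lib.py, whose actual code is not reproduced in this report. Only the variable name `PADDLEOCR_DOC_PARSING_API_URL` comes from the report.

```python
from __future__ import annotations

import base64
import os
from pathlib import Path


def get_env(name: str, default: str | None = None) -> str:
    """Return a stripped environment value, falling back to `default` if unset.

    Hypothetical stand-in for lib.py's _get_env. The environment is only read,
    never modified. Raises RuntimeError when the resulting value is missing or blank.
    """
    value = os.environ.get(name, default)
    if value is None or not value.strip():
        raise RuntimeError(f"environment variable {name} is not set")
    return value.strip()


def load_file_as_base64(path: str | os.PathLike) -> str:
    """Read one user-provided file and return its bytes as a base64 string.

    Hypothetical stand-in for lib.py's _load_file_as_base64. Only the given
    path is opened; no other location on disk is read.
    """
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"input file not found: {file_path}")
    return base64.b64encode(file_path.read_bytes()).decode("ascii")
```

For example, `get_env("PADDLEOCR_DOC_PARSING_API_URL")` returns the configured endpoint URL or raises if it is missing. Failing fast avoids silently falling back to an unintended endpoint.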
External URL findings (7 items)
| Severity | Finding | URL | Location |
|---|---|---|---|
| Medium | External URL | http://10.0.0.1:8020/v2/models/layout-parsing/infer | SKILL.md:213 |
| Medium | External URL | http://10.0.133.33:8020/v2/models/layout-parsing/infer | SKILL.md:223 |
| Medium | External URL | https://your-server.com/large_file.pdf | SKILL.md:261 |
| Medium | External URL | http://www.apache.org/licenses/LICENSE-2.0 | scripts/lib.py:7 |
| Medium | External URL | https://www.paddleocr.com | scripts/smoke_test.py:42 |
| Medium | External URL | https://your-api-url.paddleocr.com/layout-parsing | scripts/smoke_test.py:49 |
| Medium | External URL | https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/pp_structure_v3_demo.png | scripts/smoke_test.py:115 |

File Tree
10 files · 44.2 KB · 1413 lines
| Language | Files | Lines |
|---|---|---|
| Python | 5 | 974 |
| Markdown | 2 | 422 |
| Text | 2 | 12 |
| JSON | 1 | 5 |
├─ references/
│  └─ output_schema.md (Markdown)
├─ scripts/
│  ├─ lib.py (Python)
│  ├─ optimize_file.py (Python)
│  ├─ requirements-optimize.txt (Text)
│  ├─ requirements.txt (Text)
│  ├─ smoke_test.py (Python)
│  ├─ split_pdf.py (Python)
│  └─ vl_caller.py (Python)
├─ _meta.json (JSON)
└─ SKILL.md (Markdown)
Dependencies (3 items)
| Package | Version | Source | Known Vulns | Notes |
|---|---|---|---|---|
| httpx | >=0.24.0 | pip | No | Version not pinned to exact release |
| Pillow | >=10.0.0 | pip | No | Version not pinned to exact release |
| pypdfium2 | >=4.0.0 | pip | No | Version not pinned to exact release |
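All three dependencies use `>=` lower bounds, which is what the Low-severity supply-chain finding refers to. A minimal sketch of detecting such loose pins in a requirements file follows. It is a simplified, hypothetical checker, not the scanner's actual logic. It treats only the exact `==` operator as pinned and deliberately ignores extras, environment markers, and URL requirements.

```python
from __future__ import annotations

import re

# Matches simple requirement lines such as "httpx>=0.24.0" or "Pillow==10.0.0".
# Extras ("pkg[extra]"), environment markers, and URL requirements are not handled.
REQUIREMENT_RE = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(==|>=|<=|~=|!=|>|<)?\s*([^\s;#]*)"
)


def find_loose_pins(requirements_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, package, specifier) for each requirement not pinned with ==."""
    loose = []
    for lineno, raw in enumerate(requirements_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        match = REQUIREMENT_RE.match(line)
        if match is None:
            continue
        name, op, version = match.groups()
        if op != "==":
            # Lower bounds, ranges, and bare names all allow unreviewed upgrades.
            loose.append((lineno, name, f"{op or ''}{version}"))
    return loose
```

For example, `find_loose_pins("httpx>=0.24.0\nPillow==10.0.0\n")` returns `[(1, "httpx", ">=0.24.0")]`. Exact pins (ideally with hashes) trade automatic bug-fix upgrades for reproducible installs.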
Security Positives
✓ No shell execution, subprocess, or curl|bash patterns anywhere in the codebase
✓ No credential theft: environment variables are only read, and only to configure the endpoint URL and API authentication
✓ No data exfiltration: all network calls go only to the user-configured Triton endpoint
✓ No obfuscation: all code is plaintext Python with no eval or atob patterns; base64 appears only in _load_file_as_base64, which encodes input files
✓ Documentation accurately reflects all implemented behavior (doc-to-code match)
✓ No access to sensitive paths (~/.ssh, ~/.aws, .env, etc.)
✓ No hidden instructions or steganographic content
✓ Filesystem writes are limited to user-specified output paths or the OS temp directory
✓ Error handling is thorough and returns structured errors without leaking system internals
✓ Licensed under Apache 2.0 by the PaddlePaddle authors, consistent with the PaddleOCR ecosystem
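The write-scope item above (output goes only to a user-specified path or the OS temp directory) can be illustrated with a small helper. This is a hypothetical sketch, not code from the skill: the name `resolve_output_path`, the default `.md` suffix, and the `paddleocr_` prefix are all assumptions made for illustration.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def resolve_output_path(user_path: str | None, suffix: str = ".md") -> Path:
    """Return where results should be written (hypothetical helper).

    If the caller supplied a path, use exactly that path, creating its parent
    directory if needed. Otherwise create an empty file in the OS temp
    directory and return its path.
    """
    if user_path:
        target = Path(user_path).expanduser().resolve()
        target.parent.mkdir(parents=True, exist_ok=True)
        return target
    # mkstemp securely creates a uniquely named file in tempfile.gettempdir();
    # close the descriptor so the caller can reopen the path for writing.
    fd, tmp_name = tempfile.mkstemp(suffix=suffix, prefix="paddleocr_")
    os.close(fd)
    return Path(tmp_name)
```

Routing every write through one function like this makes the write scope easy to audit: any output location either came from the caller or from `tempfile`.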