fix: fall back from empty Docling native chunks#15601

Merged
yingfeng merged 3 commits into
infiniflow:mainfrom
he-yufeng:fix/docling-empty-chunk-fallback
Jun 4, 2026

Conversation

@he-yufeng

Contributor

Summary

  • keep the native Docling chunking path when it returns usable chunks
  • fall back to the standard Docling response parser when a chunked request gets HTTP 200 but returns no usable chunks
  • add a regression test for older Docling servers that accept the chunking request but return a standard conversion payload

Why

Older external Docling servers can accept a request containing do_chunking: true and still return the standard conversion response shape. The current code treats any HTTP 200 from the chunked request as a native chunk response, finds no chunk entries, and returns zero sections without trying the standard response parser.

Fixes #15569.
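The failure mode and the fix described above can be sketched roughly as follows. This is a minimal illustration, not the code in deepdoc/parser/docling_parser.py: the function names and the payload keys ("chunks", "document", "text") are hypothetical stand-ins chosen for the example.

```python
from __future__ import annotations

import logging

logger = logging.getLogger("docling_fallback_sketch")


def parse_native_chunks(payload: dict) -> list[str]:
    # Hypothetical shape: a native chunk response carries a "chunks" list.
    return [c["text"] for c in payload.get("chunks", []) if c.get("text")]


def parse_standard(payload: dict) -> list[str]:
    # Hypothetical shape: a standard conversion response carries document text.
    text = payload.get("document", {}).get("text", "")
    return [line for line in text.splitlines() if line.strip()]


def sections_from_response(payload: dict, chunking_requested: bool) -> list[str]:
    if chunking_requested:
        sections = parse_native_chunks(payload)
        if sections:
            # Native chunking produced usable chunks: keep that path.
            return sections
        # HTTP 200 but no usable chunks: do not return an empty result yet.
        logger.warning("Chunked request returned no usable chunks; falling back")
    return parse_standard(payload)
```

With this shape, a server that ignores the chunking flag and answers with a standard payload still yields sections instead of an empty list.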

Validation

  • python -m pytest test/unit_test/deepdoc/parser/test_docling_parser_remote.py -q
  • python -m py_compile deepdoc/parser/docling_parser.py test/unit_test/deepdoc/parser/test_docling_parser_remote.py
  • python -m ruff check deepdoc/parser/docling_parser.py test/unit_test/deepdoc/parser/test_docling_parser_remote.py
  • git diff --check

@dosubot dosubot Bot added size:XS This PR changes 0-9 lines, ignoring generated files. 🐞 bug Something isn't working, pull request that fix bug. labels Jun 3, 2026
@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 62e31397-3cb6-4b6b-b0f7-c869f23ae4a2

📥 Commits

Reviewing files that changed from the base of the PR and between dc919ed and a8c0e66.

📒 Files selected for processing (1)
  • test/unit_test/deepdoc/parser/test_docling_parser_remote.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/unit_test/deepdoc/parser/test_docling_parser_remote.py

📝 Walkthrough

The parser now short-circuits on a native Docling chunked response only when the returned sections list is non-empty; otherwise it logs a warning and falls back to standard parsing. A new unit test verifies this fallback with a mocked remote response.

Changes

Docling Parser Chunked Response Fallback

| Layer / File(s) | Summary |
| --- | --- |
| Conditional chunked response short-circuit (deepdoc/parser/docling_parser.py) | Native chunked responses now return (sections, tables) immediately only if sections contains data; if chunking returns no usable chunks, a warning is logged and parsing falls back to the standard (non-chunked) response path. |
| Test coverage for chunked fallback (test/unit_test/deepdoc/parser/test_docling_parser_remote.py) | New unit test module with an in-test dependency loader that stubs common.constants, deepdoc.parser.utils, pdfplumber, and PIL.Image, monkeypatches requests.post to return a standard 200 payload, calls DoclingParser._parse_pdf_remote(..., parse_method="raw"), and asserts the fallback behavior and that the outgoing request enabled chunking (do_chunking == True). |
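The testing approach summarized in the table can be illustrated with a small self-contained sketch. It is not the test added in this PR: `transport`, `parse_remote`, `FakeResponse`, the URL, and the payload keys are hypothetical, and the stand-in `transport.post` plays the role that requests.post plays in the real test.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for the HTTP layer (the real test patches requests.post).
transport = SimpleNamespace(post=None)


def parse_remote(do_chunking: bool = True) -> list:
    # Hypothetical parser: send the chunking flag, then pick a response path.
    resp = transport.post("http://docling.invalid/convert",
                          json={"do_chunking": do_chunking})
    payload = resp.json()
    chunks = [c["text"] for c in payload.get("chunks", []) if c.get("text")]
    if chunks:
        return chunks
    # No usable chunks: fall back to the standard response shape.
    text = payload.get("document", {}).get("text", "")
    return [text] if text else []


class FakeResponse:
    # Mimics an older server: HTTP 200 with a standard payload, no chunks.
    status_code = 200

    def json(self) -> dict:
        return {"document": {"text": "hello"}}


def run_fallback_check():
    captured = {}

    def fake_post(url, json=None, **kwargs):
        captured["json"] = json  # record what the parser sent
        return FakeResponse()

    with mock.patch.object(transport, "post", fake_post):
        sections = parse_remote()
    return sections, captured["json"]
```

Capturing the outgoing JSON lets the check confirm both halves of the behavior: chunking was requested, and the standard payload was still parsed.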

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • infiniflow/ragflow#14218: Earlier PR adding native chunking endpoint parsing and early-return logic related to Docling chunk handling.

Suggested labels

🌈 python, size:M

Suggested reviewers

  • Magicbook1108

Poem

🐰 I nibble through chunks that come up thin,
If none are found, I warn with a grin.
Back to the old path the parser will roam,
Safe parsing again—home sweet code home.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix: fall back from empty Docling native chunks' clearly summarizes the main change: a fallback mechanism when chunking returns empty results. |
| Description check | ✅ Passed | The PR description covers the problem, solution, and validation, but lacks explicit type-of-change checkbox selection from the template. |
| Linked Issues check | ✅ Passed | The PR directly addresses issue #15569 by implementing fallback logic when external Docling servers return HTTP 200 with standard payload instead of native chunks, preventing zero-section results. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the Docling parser fallback logic: core fix in docling_parser.py and regression test in test_docling_parser_remote.py, with no unrelated modifications. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/unit_test/deepdoc/parser/test_docling_parser_remote.py (1)

58-74: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add a pytest priority marker to this test.

This new test is missing the required p1/p2/p3 priority marker.

Proposed patch
 from __future__ import annotations
 
 import importlib.util
 import sys
 import types
 from pathlib import Path
+import pytest
@@
+@pytest.mark.p1
 def test_remote_chunked_200_standard_payload_falls_back(monkeypatch):

As per coding guidelines test/**/*.py: Use pytest with priority markers (p1/p2/p3) for Python testing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/unit_test/deepdoc/parser/test_docling_parser_remote.py` around lines 58
- 74, The test function test_remote_chunked_200_standard_payload_falls_back is
missing a pytest priority marker; add the appropriate marker decorator (e.g.,
`@pytest.mark.p1` or p2/p3 as required by the test classification) above the
function and ensure pytest is imported in the file (add "import pytest" if
absent) so the marker is recognized; keep the existing assertions and behavior
and only add the marker decorator to the function definition.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 285eb5c2-93b5-49b1-acb4-7568544c4ac9

📥 Commits

Reviewing files that changed from the base of the PR and between 67c3e73 and dc919ed.

📒 Files selected for processing (2)
  • deepdoc/parser/docling_parser.py
  • test/unit_test/deepdoc/parser/test_docling_parser_remote.py

@yingfeng yingfeng added the ci Continue Integration label Jun 3, 2026
@yingfeng yingfeng marked this pull request as draft June 3, 2026 13:47
@yingfeng yingfeng marked this pull request as ready for review June 3, 2026 13:47
@he-yufeng

Contributor Author

Pushed a small follow-up for the test priority marker flagged by CodeRabbit.

Validation run locally:

  • python -m pytest test/unit_test/deepdoc/parser/test_docling_parser_remote.py -q: 1 passed
  • python -m py_compile test/unit_test/deepdoc/parser/test_docling_parser_remote.py: passed
  • git diff --check: passed

@codecov

codecov Bot commented Jun 4, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.16%. Comparing base (67c3e73) to head (1680612).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #15601   +/-   ##
=======================================
  Coverage   93.16%   93.16%           
=======================================
  Files          10       10           
  Lines         717      717           
  Branches      118      118           
=======================================
  Hits          668      668           
  Misses         29       29           
  Partials       20       20           


@yingfeng yingfeng merged commit 5db1b29 into infiniflow:main Jun 4, 2026
2 checks passed

Labels

🐞 bug Something isn't working, pull request that fix bug. ci Continue Integration size:XS This PR changes 0-9 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: external docling pdf parsing always results in [Docling] Native chunks received: 0

2 participants