fix: add JSON-aware comparison to Python comparator fallback#1276
Merged
Conversation
Fixed a bug where the Python fallback comparator used simple string
comparison for JSON results, causing false negatives when JSON was
semantically identical but formatted differently.
Problem:
The compare_invocations_directly() function compared result_json fields
using direct string comparison (orig_result != cand_result). This failed
for semantically identical JSON with:
- Different whitespace: {"a":1,"b":2} vs { "a": 1, "b": 2 }
- Different key ordering: {"a":1,"b":2} vs {"b":2,"a":1}
The Java Comparator handles this correctly by parsing JSON, but the Python
fallback did not.
Solution:
- Added _compare_json_values() helper function that:
1. Handles None values correctly
2. Fast-path for exact string matches
3. Parses JSON and compares deserialized objects
4. Falls back to string comparison if JSON parsing fails
- Updated compare_invocations_directly() to use JSON-aware comparison
Impact:
- Prevents false negatives in behavior verification
- Matches Java Comparator behavior for consistency
- Handles whitespace, key ordering, and nested objects correctly
- Gracefully handles invalid JSON by falling back to string comparison
Tests added:
- Updated test_whitespace_in_json to expect correct behavior (True)
- Added TestJsonComparison class with 8 comprehensive tests:
* test_json_key_ordering_difference
* test_json_whitespace_and_ordering_combined
* test_json_nested_object_comparison
* test_json_array_comparison_order_matters
* test_json_invalid_json_falls_back_to_string
* test_json_null_vs_string_null
* test_json_empty_object_vs_null
* test_json_numeric_equivalence
Test results: 344 Java tests pass (26 comparator tests)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Contributor
|
@claude format and push |
Contributor
|
Claude finished @KRRT7's task in 44s —— View job Tasks
StatusI attempted to run the code formatter, but I don't have permission to execute The working tree is currently clean with no uncommitted changes, which suggests that either:
What needs to be doneTo format and push the code, you can run these commands locally: # Format the code
uv run ruff format codeflash/ tests/
# Check for linting issues
uv run ruff check codeflash/ tests/
# If there are changes, commit and push
git add -A
git commit -m "chore: format code with ruff
Co-authored-by: Kevin Turcios <KRRT7@users.noreply.github.com>"
git push origin fix/java-comparator-json-comparisonAlternatively, if you'd like me to have the permissions to run these commands, you can update the |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Problem
The
compare_invocations_directly()function incomparator.pywas using simple string comparison for JSON results:This caused false negatives when JSON was semantically identical but formatted differently:
Example False Negative:
{"a":1,"b":2}{ "b": 2, "a": 1 }These are semantically identical but fail string comparison due to:
The Java Comparator handles this correctly by parsing JSON and doing deep equality, but the Python fallback did not.
Solution
Added JSON-aware comparison to match Java Comparator behavior:
New Helper Function:
_compare_json_values()Updated Comparison Logic:
What This Fixes
✅ Whitespace differences -
{"a":1}=={ "a": 1 }✅ Key ordering -
{"a":1,"b":2}=={"b":2,"a":1}✅ Nested objects - Deep equality on nested JSON structures
✅ Numeric types -
42==42.0(Python's JSON behavior)✅ Invalid JSON - Gracefully falls back to string comparison
❌ Array ordering -
[1,2,3]!=[3,2,1](correct behavior - order matters)Tests Added
Updated Existing Test:
test_whitespace_in_json- Now expectsTrue(was incorrectly expectingFalse)New Test Class:
TestJsonComparison(8 tests)test_json_key_ordering_difference
{"a":1,"b":2,"c":3}=={"c":3,"a":1,"b":2}test_json_whitespace_and_ordering_combined
test_json_nested_object_comparison
{"outer":{"inner":{"value":123}}}with whitespace variationstest_json_array_comparison_order_matters
[1,2,3]!=[3,2,1](order matters)test_json_invalid_json_falls_back_to_string
test_json_null_vs_string_null
nullvalue comparisontest_json_empty_object_vs_null
{}!=null(correctly different)test_json_numeric_equivalence
{"value":42}=={"value":42.0}Impact
Before (Buggy):
After (Fixed):
Testing
✅ All 344 Java tests pass (7 skipped)
✅ All 26 comparator tests pass (18 existing + 8 new)
✅ No regressions - existing tests still pass
✅ Comprehensive coverage - edge cases, invalid JSON, nested objects
Why This Matters
🤖 Generated with Claude Code