Fix CGI.unescapeHTML CompatibilityError in the pure-Ruby fallback by hsbt · Pull Request #127 · ruby/cgi

hsbt · 2026-06-23T06:53:09Z

The pure-Ruby fallback of CGI.unescapeHTML raised Encoding::CompatibilityError for strings that mix non-ASCII bytes with a numeric character reference, while the C extension decoded them correctly. Issue #103 reported this on TruffleRuby, and it also affects CRuby whenever the C extension fails to load.

The ASCII-compatible path builds a binary buffer but returned numeric character references through chr(enc), so appending a non-ASCII replacement to a buffer that already held non-ASCII bytes failed. The fix decodes references into the binary buffer instead and lets the trailing force_encoding retag the whole string.
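The failure mode described above can be reproduced without the cgi gem at all. The following is a minimal sketch, not the code from this PR: the helper names `append_via_chr` and `append_as_bytes` are hypothetical, and UTF-8 is assumed as the target encoding.

```ruby
# Hypothetical helpers illustrating the failure mode; not the gem's code.

# Mimics the old behaviour: the reference is decoded with chr(enc) and the
# resulting UTF-8 string is appended to a binary buffer.
def append_via_chr(buf, codepoint)
  buf + codepoint.chr(Encoding::UTF_8)
end

# Mimics the approach the description outlines: append the decoded bytes
# as binary, then retag the whole string once at the end.
def append_as_bytes(buf, codepoint)
  (buf + codepoint.chr(Encoding::UTF_8).b).force_encoding(Encoding::UTF_8)
end

buf = "caf\u00E9 ".b # binary buffer that already holds non-ASCII bytes

begin
  append_via_chr(buf, 0x3042)
rescue Encoding::CompatibilityError => e
  puts "raised: #{e.class}"
end

puts append_as_bytes(buf, 0x3042).encoding
```

Both operands of the failing `+` contain non-ASCII bytes and carry different encodings, which is the condition under which Ruby refuses to concatenate. Converting the replacement to binary first avoids the check, and the final `force_encoding` restores the intended tag.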

To keep the two implementations in lockstep I compared every method against the C extension across many encodings and inputs. The remaining divergences were all in unescapeHTML. Out-of-range references now stay verbatim with their leading zeros preserved, and surrogate code points are emitted as raw bytes the way rb_enc_mbcput does rather than raising RangeError. Regression tests covering these cases run against both the C extension and the pure-Ruby path.
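The two edge cases named above can be illustrated with a small sketch. It is an approximation under stated assumptions, not the gem's implementation: it handles only numeric references, assumes UTF-8 input with an upper limit of U+10FFFF, and the names `raw_utf8_bytes` and `sketch_unescape_numeric` are hypothetical.

```ruby
# Assumed upper bound for UTF-8 input; the real limit depends on the encoding.
SKETCH_UNICODE_MAX = 0x10FFFF

# Encode a code point as UTF-8 bytes by hand, without validation, so that
# surrogate code points (U+D800..U+DFFF) come out as raw bytes.
def raw_utf8_bytes(cp)
  bytes =
    if cp < 0x80
      [cp]
    elsif cp < 0x800
      [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    elsif cp < 0x10000
      [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
    else
      [0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
       0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
    end
  bytes.pack("C*") # binary (ASCII-8BIT) string
end

# Decode numeric character references inside a binary copy of the input,
# then retag the result with the original encoding.
def sketch_unescape_numeric(str)
  out = str.b.gsub(/&#(?:(\d+)|[xX]([0-9a-fA-F]+));/) do
    match = $&
    cp = $1 ? $1.to_i : $2.hex
    # Out-of-range references stay verbatim, leading zeros included.
    cp <= SKETCH_UNICODE_MAX ? raw_utf8_bytes(cp) : match
  end
  out.force_encoding(str.encoding)
end

p sketch_unescape_numeric("caf\u00E9 &#x3042;")
p sketch_unescape_numeric("&#0001114112;")
p sketch_unescape_numeric("&#xD800;").bytes
```

Encoding the bytes by hand sidesteps `Integer#chr`, which raises RangeError for surrogates in UTF-8. The surrogate result is an invalid UTF-8 string, which is the trade-off of emitting raw bytes instead of raising.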

Fixes #103

The ASCII-compatible path builds a binary buffer but returned numeric character references via chr(enc), so a non-ASCII replacement appended to a buffer that already held non-ASCII bytes raised Encoding::CompatibilityError. Decode into the binary buffer instead, matching the C extension's optimized_unescape_html for out-of-range references (kept verbatim, leading zeros included) and surrogate code points (emitted as raw bytes).

#103

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

hsbt merged commit 09970b0 into master Jun 23, 2026
70 checks passed

hsbt deleted the fix-pure-ruby-escape-issue-103 branch June 23, 2026 07:05

hsbt mentioned this pull request Jun 23, 2026

unescape_html - Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8 #103

Closed

hsbt merged 1 commit into master from fix-pure-ruby-escape-issue-103
