Notare/lib/notare/xml_sanitizer.rb
mathias234 64c8679044
Sanitize invalid XML characters in text content
Strip invalid XML 1.0 control characters (0x00-0x08, 0x0B-0x0C, 0x0E-0x1F)
from text to prevent corrupted docx files that fail to open in LibreOffice.

Fixes SAXParseException 'PCData Invalid Char value' errors.
2026-01-22 09:10:33 +01:00
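The ranges in the commit message come from the XML 1.0 `Char` production, which allows only tab (0x09), LF (0x0A) and CR (0x0D) below 0x20. Below is a minimal sketch of that production as a Ruby predicate. The helper name `xml10_char?` is hypothetical and is not part of Notare. The sketch checks which C0 control characters XML permits.

```ruby
# Sketch of the XML 1.0 "Char" production as a codepoint predicate.
# Hypothetical helper for illustration; not part of Notare.
def xml10_char?(cp)
  cp == 0x09 || cp == 0x0A || cp == 0x0D ||
    (0x20..0xD7FF).cover?(cp) ||
    (0xE000..0xFFFD).cover?(cp) ||
    (0x10000..0x10FFFF).cover?(cp)
end

# Which C0 control characters (0x00-0x1F) does XML 1.0 allow?
allowed = (0x00..0x1F).select { |cp| xml10_char?(cp) }
puts allowed.map { |cp| format("0x%02X", cp) }.join(", ")
# prints "0x09, 0x0A, 0x0D"
```

Every other codepoint in 0x00-0x1F fails the predicate, and those are the ranges this commit strips. The full production also excludes U+FFFE, U+FFFF and the surrogate block. Those characters lie outside the C0 set that this change targets.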


# frozen_string_literal: true

module Notare
  module XmlSanitizer
    # C0 control characters forbidden in XML 1.0 text:
    # 0x00-0x08, 0x0B-0x0C and 0x0E-0x1F.
    # Whitespace controls that XML allows are preserved:
    # 0x09 (tab), 0x0A (LF), 0x0D (CR).
    INVALID_XML_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

    # Returns a copy of +text+ with the characters above removed.
    # Non-String values (e.g. nil) are returned unchanged.
    def self.sanitize(text)
      return text unless text.is_a?(String)

      text.gsub(INVALID_XML_CHARS, "")
    end
  end
end
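Below is a short usage sketch. It repeats the module so that the snippet runs standalone. The sample string is hypothetical and imitates text pasted from another source. It mixes forbidden controls (vertical tab, NUL, form feed) with tab and newline, which must survive the sanitizer.

```ruby
# frozen_string_literal: true

# Copy of Notare::XmlSanitizer so this snippet runs standalone.
module Notare
  module XmlSanitizer
    INVALID_XML_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

    def self.sanitize(text)
      return text unless text.is_a?(String)

      text.gsub(INVALID_XML_CHARS, "")
    end
  end
end

# Hypothetical input: \v (0x0B), \u0000 (NUL) and \f (0x0C) are invalid in
# XML 1.0 text; \t and \n are allowed and should be kept.
raw = "Line one\vstill one\u0000\tcol\nLine two\f"

clean = Notare::XmlSanitizer.sanitize(raw)
p clean                                  # prints "Line onestill one\tcol\nLine two"

# Non-String values pass through untouched.
p Notare::XmlSanitizer.sanitize(nil)     # prints nil
```

`gsub` returns a new string, so `sanitize` also works on frozen inputs. String literals are frozen here because of the `frozen_string_literal` magic comment.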