Sanitize invalid XML characters in text content
All checks were successful
CI Pipeline / build (push) Successful in 49s
Strip invalid XML 1.0 control characters (0x00-0x08, 0x0B-0x0C, 0x0E-0x1F) from text to prevent corrupted docx files that fail to open in LibreOffice. Fixes SAXParseException 'PCData Invalid Char value' errors.
15
lib/notare/xml_sanitizer.rb
Normal file
@@ -0,0 +1,15 @@
# frozen_string_literal: true

module Notare
  module XmlSanitizer
    # Invalid XML 1.0 characters: 0x00, 0x01-0x08, 0x0B-0x0C, 0x0E-0x1F
    # Valid whitespace preserved: 0x09 (tab), 0x0A (LF), 0x0D (CR)
    INVALID_XML_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

    def self.sanitize(text)
      return text unless text.is_a?(String)

      text.gsub(INVALID_XML_CHARS, "")
    end
  end
end
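A minimal usage sketch of the sanitizer added in this commit. The module body is repeated from the diff only so the snippet runs on its own; the sample string and the variable names `dirty` and `clean` are made up for illustration and are not part of the commit.

```ruby
# frozen_string_literal: true

# Copy of the module from this commit, repeated so the snippet is standalone.
module Notare
  module XmlSanitizer
    INVALID_XML_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

    def self.sanitize(text)
      return text unless text.is_a?(String)

      text.gsub(INVALID_XML_CHARS, "")
    end
  end
end

# Hypothetical input: a vertical tab (0x0B) and a NUL (0x00) mixed into
# otherwise ordinary text that also contains a tab and a line feed.
dirty = "Hello\u000BWorld\u0000!\tTabbed\nLine"
clean = Notare::XmlSanitizer.sanitize(dirty)

# The control characters are dropped; tab and LF are kept.
puts clean.inspect

# Non-String values are returned unchanged rather than raising.
puts Notare::XmlSanitizer.sanitize(nil).inspect
puts Notare::XmlSanitizer.sanitize(42).inspect
```

Returning non-String input untouched means a caller can pass `nil` or numeric values through the sanitizer without guarding each call site. Note that `gsub` returns a new string, so the original value is not mutated.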