Sanitize invalid XML characters in text content
All checks were successful
CI Pipeline / build (push) Successful in 49s
Strip invalid XML 1.0 control characters (0x00-0x08, 0x0B-0x0C, 0x0E-0x1F) from text to prevent corrupted docx files that fail to open in LibreOffice. Fixes SAXParseException 'PCData Invalid Char value' errors.
@@ -3,6 +3,7 @@
 require "nokogiri"
 
 require_relative "notare/version"
+require_relative "notare/xml_sanitizer"
 require_relative "notare/nodes/base"
 require_relative "notare/nodes/break"
 require_relative "notare/nodes/hyperlink"
@@ -8,7 +8,7 @@ module Notare
 def initialize(text, bold: false, italic: false, underline: false,
                strike: false, highlight: nil, color: nil, style: nil)
   super()
-  @text = text
+  @text = XmlSanitizer.sanitize(text)
   @bold = bold
   @italic = italic
   @underline = underline
@@ -1,5 +1,5 @@
 # frozen_string_literal: true
 
 module Notare
-  VERSION = "0.0.5"
+  VERSION = "0.0.6"
 end
15
lib/notare/xml_sanitizer.rb
Normal file
@@ -0,0 +1,15 @@
+# frozen_string_literal: true
+
+module Notare
+  module XmlSanitizer
+    # Invalid XML 1.0 characters: 0x00, 0x01-0x08, 0x0B-0x0C, 0x0E-0x1F
+    # Valid whitespace preserved: 0x09 (tab), 0x0A (LF), 0x0D (CR)
+    INVALID_XML_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/
+
+    def self.sanitize(text)
+      return text unless text.is_a?(String)
+
+      text.gsub(INVALID_XML_CHARS, "")
+    end
+  end
+end
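A minimal sketch of how the new sanitizer is expected to behave, assuming the module body shown in the diff above. The module is repeated here only so the snippet runs on its own; the sample strings are hypothetical inputs, not taken from the test suite.

```ruby
# frozen_string_literal: true

# Copy of the module added in this commit, repeated so the sketch is standalone.
module Notare
  module XmlSanitizer
    INVALID_XML_CHARS = /[\x00-\x08\x0B\x0C\x0E-\x1F]/

    def self.sanitize(text)
      return text unless text.is_a?(String)

      text.gsub(INVALID_XML_CHARS, "")
    end
  end
end

# Hypothetical input containing NUL, vertical tab, and unit separator.
dirty = "Hello\x00 wor\x0Bld\x1F"
puts Notare::XmlSanitizer.sanitize(dirty).inspect      # "Hello world"

# Tab, LF, and CR are valid in XML 1.0 and pass through unchanged.
whitespace = "a\tb\nc\r"
puts Notare::XmlSanitizer.sanitize(whitespace) == whitespace  # true

# Non-string values are returned as-is rather than coerced.
puts Notare::XmlSanitizer.sanitize(nil).inspect        # nil
```

Because `sanitize` uses `gsub` rather than `gsub!`, it returns a new string and leaves the caller's object untouched, which also keeps it safe for frozen string literals.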