Working with XML documents in Ruby with REXML
Ruby provides a set of libraries that allow Ruby developer to work with XML documents. However, before you begin working with XML documents you need to validate them. Thankfully, Ruby provides a method to accomplish this task with relative ease.
valid_xml? Is such a method. You can use in order to validate xml. It returns parsed xml document which is ready for you to use.
require “rexml/document”
def valid_xml?(xml)
begin
REXML::Document.new(xml)
rescue REXML::ParseException
end
end
Ruby allows you to parse XML document and put it into Ruby data structure. In order to accomplish this we need to execute the following line of code
require “rexml/document”
my_xml = REXML::Document.new(my_xml_doc)
We can navigate to the root of this XML file via root method and iterate via each element with the help of each_element method. Both of these methods belong to Element class.
Here is an example how these methods can be utilized to print entire shallow level XML document
my_xml.root.each_element do |node1|
node1.each_element do |node2|
if node2.has_elements?
node.each_element do |child|
puts “#{child.name}”
end
end
end
Other useful method of parsing a large XML document is to parse it via REXML::Documenmt.parse_stream. We don’t load entire document into memory and parse it for specific part we want to interrogate.
Ruby XML parser provides developer with Xpath methods to navigate XML documents. Some of the often used methods are:
REXML::XPath.first(xmlDoc, ‘//name’)
REXML::XPath.match(xmlDoc, ‘//[@title=”Mr”]’)
REXML::XPath.each(xmlDoc,’//name’) do |person|
puts “#{person}”
end
In addition, there are third party gems that allow you to parse XML documents into Hashes. Example of such a gem is xmlsimple.
gem ‘xmlsimple’
require ‘xmlsimple’
require ‘pp’
doc = XmlSimple.xml_in xml
pp doc
XmlSimple is not a brand new approach to parsing an XML, it is in fact uses Document class and then parses XML into hash and array for ease of access and use.
In conclusion, REXML is so much more than it comes to XML domain. It can help you create new xml documents, modify and delete nodes, compress white spaces, perform replacements among other things.