Difference between DOM and SAX

  • Both are XML parsing APIs: each breaks the text of an XML document into small, identifiable pieces (elements, attributes, text) and hands them to the application as objects to process.
  • Document Object Model (DOM) reads the entire document and stores it in memory as a tree of objects that the application can navigate and manipulate.
  • Simple API for XML (SAX) processes the document as it is being read (like a stream), generating events for the tags it encounters; those events are handled by an event handler that the application supplies.

DOM: A tree-based processing

DOM processes the XML document and loads the data into memory in a tree-like structure. Consider the following XML code snippet:

  <?xml version="1.0"?>
  <users>
    <user ID="1">
      <fname>John</fname>
      <lname>Doe</lname>
      <email>jdoe@force.com</email>
    </user>
  </users>

A DOM processor analyzing this code snippet would generate the following tree structure in memory.

[Figure: In-memory DOM Tree Structure]
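
To make the tree-based model concrete, here is a minimal sketch using Python's standard `xml.dom.minidom` module. The choice of Python and the helper name `read_users` are assumptions for illustration; the original article does not tie DOM to any particular language or library.

```python
from xml.dom import minidom

XML = """<?xml version="1.0"?>
<users>
  <user ID="1">
    <fname>John</fname>
    <lname>Doe</lname>
    <email>jdoe@force.com</email>
  </user>
</users>"""


def read_users(xml_text):
    """Parse the whole document into a tree, then walk that tree."""
    # The entire document is read and held in memory at this point.
    doc = minidom.parseString(xml_text)
    users = []
    for user in doc.getElementsByTagName("user"):
        record = {"ID": user.getAttribute("ID")}
        for field in ("fname", "lname", "email"):
            node = user.getElementsByTagName(field)[0]
            # The text inside an element is a child text node.
            record[field] = node.firstChild.data
        users.append(record)
    return users


print(read_users(XML))
```

Because the tree stays in memory, the elements can be visited in any order and as many times as needed.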

SAX: An event-based processing

SAX analyzes an XML stream as it goes by. The above example document generates the following events:

  1. Start document
  2. Start element (users)
  3. Characters (white space)
  4. Start element (user) with Attribute (ID="1")
  5. Characters (white space)
  6. Start element (fname)
  7. Characters (John)
  8. End element (fname)
  9. Characters (white space)
  10. Start element (lname)
  11. Characters (Doe)
  12. End element (lname)
  13. Characters (white space)
  14. Start element (email)
  15. Characters (jdoe@force.com)
  16. End element (email)
  17. Characters (white space)
  18. End element (user)
  19. Characters (white space)
  20. End element (users)
  21. End document

The SAX API allows a developer to capture these events and act on them.
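
As a rough sketch of what capturing these events can look like, the following uses Python's standard `xml.sax` module. The `EventLogger` class and `collect_events` function are hypothetical names introduced here, and the exact way whitespace is split across `characters` calls can vary between parsers.

```python
import xml.sax

XML = b"""<?xml version="1.0"?>
<users>
  <user ID="1">
    <fname>John</fname>
    <lname>Doe</lname>
    <email>jdoe@force.com</email>
  </user>
</users>"""


class EventLogger(xml.sax.ContentHandler):
    """Records every SAX event it receives, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append(("start-document",))

    def endDocument(self):
        self.events.append(("end-document",))

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Whitespace between tags is reported as character data too.
        kind = "chars" if content.strip() else "whitespace"
        self.events.append((kind, content))


def collect_events(xml_bytes):
    handler = EventLogger()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


for event in collect_events(XML):
    print(event)
```

A real application would usually act on each event inside the handler methods instead of storing them all; the list is kept here only so the sequence can be inspected.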

Pros and Cons

DOM

  • The tree persists in memory, so an application can change both the data and the structure, and it can move up and down the tree at any time.
  • DOM can also be much simpler to use, since the application works with a ready-made tree instead of tracking parser state itself.
  • On the other hand, a lot of overhead is involved in building these trees in memory. It’s not unusual for large files to exhaust a system’s memory.
  • In addition, creating a DOM tree for a large document can be a very slow process.
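
The first point above, modifying the tree and moving around in it, can be sketched with Python's `xml.dom.minidom`. The function `change_email`, the `edited` attribute, and the replacement address are made up for this illustration.

```python
from xml.dom import minidom

XML = """<users>
  <user ID="1">
    <fname>John</fname>
    <lname>Doe</lname>
    <email>jdoe@force.com</email>
  </user>
</users>"""


def change_email(xml_text, user_id, new_email):
    """Edit one value in the in-memory tree and serialize the result."""
    doc = minidom.parseString(xml_text)
    for user in doc.getElementsByTagName("user"):
        if user.getAttribute("ID") == user_id:
            email = user.getElementsByTagName("email")[0]
            # Move down to the text node and change the data in place.
            email.firstChild.data = new_email
            # Move back up to the parent and change the structure.
            email.parentNode.setAttribute("edited", "true")
    return doc.documentElement.toxml()


print(change_email(XML, "1", "john.doe@example.com"))
```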

SAX

  • Analysis can start immediately rather than waiting for all of the data to be read, so processing is fast.
  • The application simply examines the data as it goes by and does not need to store it in memory, so it uses fewer resources.
  • The application does not even have to parse the entire document; it can stop once certain criteria have been satisfied, which makes processing efficient.
  • On the other hand, the application does not persist the data in any way, so with SAX alone it is impossible to change the data or to move backwards in the data stream.

  DOM                      SAX
  Slow processing          Fast processing
  Costs more resources     Costs fewer resources
  Inefficient processing   Efficient processing
  Persistent               Non-persistent
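
The ability of a SAX application to stop before the end of the document can be sketched as follows with Python's `xml.sax`. Raising a custom exception from the handler is one common way to abort parsing; the names `Found`, `FirstEmailHandler`, and `first_email` are hypothetical.

```python
import xml.sax


class Found(Exception):
    """Raised by the handler to stop parsing once the value is known."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


class FirstEmailHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_email = False
        self.parts = []

    def startElement(self, name, attrs):
        if name == "email":
            self.in_email = True

    def characters(self, content):
        # Text may arrive in several chunks, so collect them all.
        if self.in_email:
            self.parts.append(content)

    def endElement(self, name):
        if name == "email":
            # The criterion is satisfied: abandon the rest of the stream.
            raise Found("".join(self.parts))


def first_email(xml_bytes):
    """Return the text of the first <email> element, or None if absent."""
    try:
        xml.sax.parseString(xml_bytes, FirstEmailHandler())
    except Found as found:
        return found.value
    return None


doc = b'<users><user ID="1"><email>jdoe@force.com</email></user></users>'
print(first_email(doc))
```

Only the state needed for the current question (a flag and a few text chunks) is kept, which is why memory use stays small regardless of document size.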

When to choose DOM or SAX?

We can choose between DOM and SAX based on the following factors:

  1. Application purpose
    • If the application needs to refer back to processed data, make changes to the data, and output it as XML, then DOM is the natural choice. SAX can still be used, but the process is more complex, as the application has to make changes to a copy of the data rather than to the original data itself.
  2. XML data size
    • For large files, SAX is a better choice, since it processes the XML data as a stream.
  3. Need for speed
    • SAX implementations are normally faster than DOM implementations.

Also note that SAX and DOM are two approaches to parsing XML data and are not mutually exclusive: we can walk a DOM tree to create a stream of SAX events, and we can use SAX events to build a DOM tree.
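
Both directions can be sketched in a few lines of Python with the standard `xml.sax` and `xml.dom.minidom` modules. `DomBuilder` and `dom_to_sax` are hypothetical helpers written for this illustration; they handle only elements, attributes, and text.

```python
import xml.sax
from xml.sax.xmlreader import AttributesImpl
from xml.dom import minidom, Node


class DomBuilder(xml.sax.ContentHandler):
    """SAX to DOM: assembles a minidom tree from incoming events."""

    def startDocument(self):
        self.document = minidom.Document()
        self.stack = [self.document]  # currently open nodes

    def startElement(self, name, attrs):
        element = self.document.createElement(name)
        for key, value in attrs.items():
            element.setAttribute(key, value)
        self.stack[-1].appendChild(element)
        self.stack.append(element)

    def characters(self, content):
        self.stack[-1].appendChild(self.document.createTextNode(content))

    def endElement(self, name):
        self.stack.pop()


def dom_to_sax(node, handler):
    """DOM to SAX: walks a tree and replays it as handler events."""
    if node.nodeType == Node.DOCUMENT_NODE:
        handler.startDocument()
        dom_to_sax(node.documentElement, handler)
        handler.endDocument()
    elif node.nodeType == Node.ELEMENT_NODE:
        attrs = AttributesImpl(dict(node.attributes.items()))
        handler.startElement(node.tagName, attrs)
        for child in node.childNodes:
            dom_to_sax(child, handler)
        handler.endElement(node.tagName)
    elif node.nodeType == Node.TEXT_NODE:
        handler.characters(node.data)


XML = b'<users><user ID="1"><fname>John</fname></user></users>'

builder = DomBuilder()
xml.sax.parseString(XML, builder)        # SAX events build a DOM tree

replayed = DomBuilder()
dom_to_sax(builder.document, replayed)   # DOM tree replayed as SAX events

print(builder.document.documentElement.toxml())
print(replayed.document.documentElement.toxml())
```

The second `DomBuilder` receives events that never came from a parser at all, which shows that a SAX handler only cares about the event sequence, not where it originates.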

 
