Reading a BIG xml in .NET

I have been using different libraries for reading and writing XML files in .NET, and for me the most comfortable one is System.Xml.Linq. Reading and writing XML by creating XElement objects is easy, and I really like it. I was really happy when I found on StackOverflow how to read very big XML files using classes from the System.Xml.Linq namespace.
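As a quick illustration of that API, here is a minimal sketch (not code from the original test program; the class, method, element and attribute names are made up) that builds, serializes and queries a small tree with XElement:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical names; a sketch, not the post's original code.
static class XElementBasics
{
    // Functional construction: the tree is built by nesting constructors.
    public static XElement BuildItems() =>
        new XElement("items",
            new XElement("item", new XAttribute("id", 1), "first"),
            new XElement("item", new XAttribute("id", 2), "second"));

    // LINQ to XML query: read the id attribute of every <item> child.
    public static List<int> ReadIds(XElement root) =>
        root.Elements("item").Select(e => (int)e.Attribute("id")).ToList();

    static void Main()
    {
        XElement root = BuildItems();
        string xml = root.ToString();          // root.Save(fileName) would write it to disk instead
        XElement parsed = XElement.Parse(xml); // XElement.Load(fileName) would read it from disk
        Console.WriteLine(string.Join(",", ReadIds(parsed))); // 1,2
    }
}
```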

A simple XDocument.Load(string fileName) parses the whole file into an in-memory tree, so for really big XML files it allocates a huge amount of memory. So I started looking for a better approach. The enumerator mentioned on StackOverflow uses an XmlReader to walk through the XML file node by node (not line by line), and whenever the condition is fulfilled (for example, the reader is on an element with a given name) it returns that element as an XElement object. It seems ideal at first glance, but let's test it.
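A minimal sketch of such an enumerator follows. It is not necessarily the exact StackOverflow code: the class and method names (XmlStreaming, StreamElements) are hypothetical, and the condition is assumed to be "the reader is on an element with a given name". It combines XmlReader with XNode.ReadFrom, so only the element currently being yielded is held in memory:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical names; a sketch of the streaming technique, not the original code.
static class XmlStreaming
{
    // Lazily yields each element named `elementName`, one XElement at a time.
    // The whole document is never loaded; only the current element is materialized.
    public static IEnumerable<XElement> StreamElements(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == elementName)
                {
                    // XNode.ReadFrom consumes the whole element (including its children)
                    // and leaves the reader on the following node, so reader.Read()
                    // must NOT be called here or an adjacent sibling would be skipped.
                    if (XNode.ReadFrom(reader) is XElement element)
                    {
                        yield return element;
                    }
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    static void Main()
    {
        // For a real file, pass File.OpenText(path) instead of a StringReader.
        string xml = "<root><item id=\"1\"/><item id=\"2\"><item id=\"nested\"/></item><other/><item id=\"3\"/></root>";
        using (var input = new StringReader(xml))
        {
            foreach (XElement item in StreamElements(input, "item"))
            {
                // Prints 1, 2 and 3; the nested item is part of item 2.
                Console.WriteLine((string)item.Attribute("id"));
            }
        }
    }
}
```

The one subtle design point is the missing Read() after XNode.ReadFrom: because ReadFrom already advances the reader, an extra Read() is a common bug that silently drops every second element when siblings are not separated by whitespace.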

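I have created two test classes: TestXmlReader1, which loads the whole file with XDocument.Load, and TestXmlReader2, which uses the streaming enumerator.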

I tested these readers with a simple test program. It loads a file using one of the readers, iterates through all child elements directly under the root element, and measures the elapsed time.
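The original test program is not shown, so here is a hypothetical sketch of such a harness. Any reader that returns IEnumerable<XElement> can be passed to Measure; LoadWholeDocument stands in for the XDocument.Load-based reader, and all names are made up. The post does not say how RAM was measured, so reading the process's peak working set is an assumption here:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Xml.Linq;

// Hypothetical names; a sketch of a timing harness, not the post's test program.
static class ReaderBenchmark
{
    // Stand-in for an XDocument.Load-based reader: the whole file is parsed
    // into memory first, then the root's direct children are returned.
    public static IEnumerable<XElement> LoadWholeDocument(string fileName)
    {
        XDocument doc = XDocument.Load(fileName);
        return doc.Root.Elements();
    }

    // Times one full pass over the first-level elements produced by `reader`,
    // then reads the process's peak working set (assumed RAM metric).
    public static (int Count, TimeSpan Elapsed, long PeakWorkingSetKB) Measure(
        Func<string, IEnumerable<XElement>> reader, string fileName)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        int count = 0;
        foreach (XElement element in reader(fileName))
        {
            count++; // lazy readers only do their work while being enumerated
        }
        stopwatch.Stop();

        long peakKB;
        using (Process self = Process.GetCurrentProcess())
        {
            peakKB = self.PeakWorkingSet64 / 1024;
        }
        return (count, stopwatch.Elapsed, peakKB);
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: ReaderBenchmark <file.xml>");
            return;
        }
        var (count, elapsed, peakKB) = Measure(LoadWholeDocument, args[0]);
        Console.WriteLine($"{count} first-level nodes, {elapsed.TotalSeconds:F4} s, peak working set {peakKB} KB");
    }
}
```

Because the peak working set is a per-process value, each reader and file combination should run in its own process; otherwise a large XDocument.Load run would inflate the numbers of every run that follows it.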

I ran the tests on my laptop:

  • CPU: Intel Core i7-3610QM
  • RAM: 2 x 4 GB DDR3
  • HDD: Seagate ST750LM022 HN-M750MBB 750 GB (test files)
  • SSD: Crucial CT-128M550SSD3 (OS)
  • OS: Windows 7
 
Test Files
                    Large      Big        Medium     Small
Size [KB]           769,915    301,335    11,191     175
Number of nodes*    65,536     24,602     853        13

* Only first-level nodes (direct children of the root element) are counted, but the test files contain multi-level nodes with many attributes.

 
Results for Test Xml Reader 1
                    Large         Big           Medium        Small
Time [s]            23.8592633    7.6588557     0.2699508     0.00411189
RAM [KB]            4,014,440     1,530,208     69,488        10,172
 
Results for Test Xml Reader 2
                    Large         Big           Medium        Small
Time [s]            12.4330569    4.2049116     0.1781893     0.00540669
RAM [KB]            12,800        12,336        13,220        10,028
 

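As you can see, with XDocument.Load (Reader 1) the RAM consumption for big XML files is enormous: over 4,000,000 KB for the 769,915 KB file, more than five times the file size. The enumerator (Reader 2) stays at about 10,000 to 13,000 KB regardless of file size, and it was also almost twice as fast on the two largest files. So if you want to iterate through a big XML file and filter it down (or import only a part of it), the enumerator is the best solution. But if you only need to open a small XML file (fewer than about 100 first-level nodes), you can use XDocument.Load without any worries.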
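Filtering fits naturally on top of an enumerator because LINQ operators such as Where and Take are lazy. The sketch below is a variant of the enumerator that uses XmlReader.ReadToFollowing and ReadSubtree instead of XNode.ReadFrom; all names (FilteredImport, StreamByName, and the orders/order/total markup) are hypothetical. Only matching elements are kept, and reading stops as soon as Take has enough results:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical names; a sketch of filtered streaming, not the post's original code.
static class FilteredImport
{
    // Streams every element named `elementName`, materializing one XElement at a time.
    public static IEnumerable<XElement> StreamByName(TextReader input, string elementName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            while (reader.ReadToFollowing(elementName))
            {
                XElement element;
                // ReadSubtree limits XElement.Load to the current element. Disposing it
                // leaves the outer reader on this element's end tag (or on the element
                // itself if it is empty), so the next ReadToFollowing moves past it.
                using (XmlReader subtree = reader.ReadSubtree())
                {
                    element = XElement.Load(subtree);
                }
                yield return element;
            }
        }
    }

    static void Main()
    {
        string xml = "<orders><order id=\"1\" total=\"50\"/><order id=\"2\" total=\"250\"/>"
                   + "<order id=\"3\" total=\"900\"/><order id=\"4\" total=\"120\"/></orders>";
        using (var input = new StringReader(xml))
        {
            // Where/Take are lazy: the reader stops after the second match,
            // so order 4 is never read.
            List<string> bigOrders = StreamByName(input, "order")
                .Where(o => (decimal)o.Attribute("total") > 100m)
                .Take(2)
                .Select(o => (string)o.Attribute("id"))
                .ToList();
            Console.WriteLine(string.Join(",", bigOrders)); // 2,3
        }
    }
}
```

Since each matching node is still an ordinary XElement, accessing its attributes and children afterwards works exactly as it does after XDocument.Load.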

3 COMMENTS

  1. Wit
    June 07, 2016 11:54

    Interesting result showing how built-in methods are not appropriate for big tasks.
    How about accessing some selected nodes?

  2. Paweł Mucha
    June 08, 2016 08:07

    In both reader implementations, XML nodes are deserialized into XElement objects, so access time to their attributes/child nodes is exactly the same.

  3. Srihariraju Penmatsa
    October 19, 2016 11:00

    Super! TestXmlReader2 worked for me, really a good solution.
