September 9th, 2005

Proof that XML is extremely bloated

Posted by George Ou @ 12:03 pm

Categories: Infrastructure

When I wrote this blog blasting the massive bulk of XML, our own John Carroll responded with a defense of XML for some situations.  John and I both agree that XML should not be used to handle large amounts of data because of the increased storage and processing requirements; we only disagree on where the cut-off should be.  Some readers noted that XML word processing documents were actually smaller than Microsoft Word documents, and this is true because the XML tags are small relative to the paragraphs of text they wrap.  To clarify my position, I was complaining about bloated XML spreadsheets and databases rather than word processing documents.

I quoted 1000% bloat in reference to spreadsheets and databases, but one of our regular readers, "Yagotta B. Kidding", pressed me to present some hard evidence.  Reader Patrick Jones responded by offering a spreadsheet that was stored as an 11 Megabyte XML file and as a 3 Megabyte XLS file.  Jones noted that the XML file was very compressible: 194 Kilobytes when zipped, to be exact.  In that sense, one could argue that XML files can actually be smaller when ZIP is used, but I should warn that this may not be the best example because of the amount of redundant data in the sample that Jones provided.  We also have to take into account that compression takes additional resources and that we are essentially left with a binary file.  If you want to run your own experiments, I’ve zipped up a copy of this sample XML file and placed it here.
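To see why redundant data flatters the compression result, here is a minimal Python sketch. It is not the Jones sample: the element names, the cell values, and the helper names `build_xml_rows` and `compression_report` are all made up for illustration, and it uses zlib's DEFLATE rather than a ZIP archive tool. It only shows that highly repetitive XML shrinks dramatically, and that getting the data back requires a separate decompression step.

```python
import zlib


def build_xml_rows(n_rows: int) -> bytes:
    """Build a toy XML table with highly repetitive cells (hypothetical layout)."""
    rows = []
    for i in range(n_rows):
        # Only ten distinct rows ever appear, so the data is very redundant.
        rows.append(
            '<Row><Cell><Data Type="Number">%d</Data></Cell>'
            '<Cell><Data Type="String">widget</Data></Cell></Row>' % (i % 10)
        )
    return ("<Table>" + "".join(rows) + "</Table>").encode("utf-8")


def compression_report(payload: bytes) -> dict:
    """Compress the payload, verify the round trip, and report the sizes."""
    compressed = zlib.compress(payload, 9)
    # Reading the data back costs an extra decompression pass.
    restored = zlib.decompress(compressed)
    if restored != payload:
        raise ValueError("round trip failed")
    return {
        "raw_bytes": len(payload),
        "compressed_bytes": len(compressed),
        "ratio": len(payload) / len(compressed),
    }


if __name__ == "__main__":
    print(compression_report(build_xml_rows(10_000)))
```

Less repetitive cell values would give a much less impressive ratio, which is the caveat about the sample above.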

When I ran my tests, I noticed that opening the 11 MB XML file took substantially longer than opening the 3 MB file in XLS, Microsoft Excel's native binary format.  To get a more accurate measurement of the time difference, I decided to make the sample larger by duplicating the entire sheet within the sample XML file.  I did this by right-clicking on the bottom tab to copy the entire sheet.  Once I had two sheets, I highlighted both and copied them to make 4, then 8, and then 16 sheets.  By the time I got to 8 sheets, my laptop was getting a pretty good workout, and getting to 16 almost locked it up because it was beginning to run out of memory.  After the 4th duplication, I had 16 sheets that took up 193 Megabytes on disk.  It took 45 seconds to save this XML file to disk and 46 seconds to open it.  I then saved the file in Microsoft’s native XLS format, which took a mere 7 seconds, and opening that file was even faster at 2 seconds.  Compression with IZArc took an additional 26 seconds and uncompressing the file took another 19 seconds.  Here is a breakdown of this simple little experiment.

File                 Size          Read      Write
Native XLS binary    50,995 KB     2 sec     7 sec
XML spreadsheet      192,892 KB    46 sec    45 sec
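For readers who want to check the multipliers, the short Python sketch below divides the XML figures by the XLS figures. The numbers are copied from the measurements above; the dictionary keys and the `ratios` helper are just names chosen for this illustration.

```python
# Measurements copied from the experiment above.
xls = {"size_kb": 50_995, "read_s": 2, "write_s": 7}
xml = {"size_kb": 192_892, "read_s": 46, "write_s": 45}


def ratios(binary: dict, text: dict) -> dict:
    """Return how many times larger or slower the text format is per metric."""
    return {key: text[key] / binary[key] for key in binary}


r = ratios(xls, xml)

if __name__ == "__main__":
    for key, value in r.items():
        print(f"{key}: XML is {value:.2f} times the XLS figure")
```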

Some of you will note that this particular XML file is only 3.78 times bigger than the XLS file, but this particular sample doesn’t have that many fields, which reduces the number of XML tags.  In this second sample, the XML version is 10.6 times bigger than the CSV file and 7.7 times bigger than the XLS version.  So the 1000% bloat figure I originally quoted might not always be true, but it is indeed possible.  Using compression would solve the storage and transmission problems, but it worsens the processing and memory requirements for using XML.
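The effect of per-field tags on size can be sketched in Python by writing the same rows once as CSV and once as XML. This is only an illustration under stated assumptions: the `Table`/`Row`/`Cell`/`Data` element names are a made-up layout rather than Excel's actual XML spreadsheet schema, the data is synthetic, and `make_rows`, `csv_size`, and `xml_size` are hypothetical helper names. The resulting ratio depends heavily on tag names and on how long the cell values are.

```python
import csv
import io
import xml.etree.ElementTree as ET


def make_rows(n_rows: int, n_fields: int) -> list:
    """Generate synthetic rows of short numeric strings."""
    return [[str(r * n_fields + c) for c in range(n_fields)] for r in range(n_rows)]


def csv_size(rows: list) -> int:
    """Size in bytes of the rows serialized as CSV."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def xml_size(rows: list) -> int:
    """Size in bytes of the rows serialized with one tagged element per cell."""
    table = ET.Element("Table")
    for row in rows:
        row_el = ET.SubElement(table, "Row")
        for value in row:
            # Every cell pays for its opening and closing tags plus an attribute.
            cell = ET.SubElement(row_el, "Cell")
            data = ET.SubElement(cell, "Data", {"Type": "Number"})
            data.text = value
    return len(ET.tostring(table, encoding="utf-8"))


if __name__ == "__main__":
    rows = make_rows(1000, 20)
    print("CSV bytes:", csv_size(rows))
    print("XML bytes:", xml_size(rows))
    print("XML/CSV ratio:", xml_size(rows) / csv_size(rows))
```

Because each short cell value is wrapped in its own pair of tags, the markup outweighs the data, and the more fields a row has, the more tags it carries.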

The bottom line is that the large sample XML file was excruciatingly slow: it took more than 20 times longer to read and more than 6 times longer to write than the XLS version.  It doesn’t matter that computers are faster today; these are huge multiplier factors that greatly reduce the speed and capacity of any system, no matter how fast it is.  In my business, where I’m responsible for server and network architecture, these are huge concerns of mine.  People tend to forget that the purpose of better hardware is to get better performance and capacity, not merely to maintain the performance and capacity we already had in the face of bloated software.

George Ou is Technical Director of ZDNet. See his full profile and disclosure of his industry affiliations.

  • Talkback (102 comments)
Just for comparison  Yagotta B. Kidding | 09/09/05
Just generate it yourself  george_ou | 09/09/05
Actually, get it here  george_ou | 09/09/05
The problem is with the software, not XML  figgle | 09/13/05
Don't even go there  george_ou | 09/13/05
well said !  mbraincell@... | 09/09/05
The trouble with generalizing is.....  figgle | 09/13/05
You're not paranoid, you're just wrong  george_ou | 09/13/05
The cut-off point is relative  John Carroll (ZDNet Moderator) | 09/09/05
This is scary  Yagotta B. Kidding | 09/09/05
But why not a binary format?  george_ou | 09/09/05
Never too much  Yagotta B. Kidding | 09/09/05
nonsense  george_ou | 09/09/05
Pixel count  Yagotta B. Kidding | 09/10/05
Panning is the point  george_ou | 09/10/05
Tradeoffs  Yagotta B. Kidding | 09/09/05
Processors speed  ramien@... | 09/12/05
x8  Yagotta B. Kidding | 09/09/05
Compression usually isn't the answer  george_ou | 09/09/05
Notepad  Patrick Jones | 09/10/05
Try telling the average user to use "compressed folders"  george_ou | 09/10/05
Don't have to  Patrick Jones | 09/12/05
But forget about editing and saving it... NT  mlynch1234 | 09/16/05
Try 16 sheets, not 8  george_ou | 09/09/05
XML file  Yagotta B. Kidding | 09/09/05
I think it's just too big for Open Office to handle  george_ou | 09/09/05
No, it's the data  Yagotta B. Kidding | 09/10/05
Can you post results for 16?  george_ou | 09/10/05
Sorry for typo, meant to say "half the sheets"  george_ou | 09/11/05
28 second load time is for Open Office? Only 2 seconds with Excel!  george_ou | 09/09/05
Parsing  Yagotta B. Kidding | 09/09/05
Don't blame the disk  george_ou | 09/09/05
But why XML?  theharmonyguy_z | 09/09/05
It's a topic because XML is being pushed for everything  george_ou | 09/09/05
So Microsoft is wrong?  Patrick Jones | 09/10/05
Yes, Microsoft is wrong in case  george_ou | 09/10/05
You should be sending messages to Microsoft!  B.O.F.H. | 09/10/05
It's not just Microsoft  george_ou | 09/10/05
Perhaps you should be explaining to them their errors.  B.O.F.H. | 09/11/05
All I can do is offer my point of view  george_ou | 09/11/05
So how does Excel do in large scale electronic publishing?  B.O.F.H. | 09/10/05
Bloat is bloat  george_ou | 09/10/05
Depends on the application context.  B.O.F.H. | 09/10/05
Desktops are just an example  george_ou | 09/10/05
Neither Java nor XML are targeted to what you want.  B.O.F.H. | 09/10/05
OS bloat is only once  george_ou | 09/11/05
re : Windows XP works fine on 64 MBs of RAM so long as you don't try to run  JasonL31 | 09/13/05
MS Office runs fine on a WinXP machine with 64 MB ram  george_ou | 09/13/05
If this is XML, YOU can have it!  jbaviera@... | 09/13/05
Fortunately, MS Office allows multiple save formats.  No_Ax_to_Grind | 09/10/05
Except, oddly enough  Yagotta B. Kidding | 09/10/05
Nothing odd...  No_Ax_to_Grind | 09/13/05
yeah, ok - we will see  JasonL31 | 09/13/05
Crapily done multiple formats!  An_Axe_to_Grind | 09/10/05
Mine seem to work fine...  No_Ax_to_Grind | 09/13/05
Have you looked at the output of some of these formats?  B.O.F.H. | 09/10/05
Let the squirrel go!  Patrick Jones | 09/10/05
46 seconds versus 2 seconds is "not that far"?  george_ou | 09/10/05
Yes...  Patrick Jones | 09/11/05
Stop talking about minutes and talk about percentage  george_ou | 09/11/05
No FUD here  Patrick Jones | 09/11/05
It's deliberate...  figgle | 09/13/05
It's worse with Open Office  george_ou | 09/13/05
Java  Anti_Zealot | 09/14/05
Java is a beast, but this can be blamed on XML  george_ou | 09/14/05
Just wrote my own XML Database and its faster than SQL Server  wildranger | 09/10/05
Only in your particular instance  george_ou | 09/10/05
yes, that is true...  wildranger | 09/10/05
Look at how "efficient" Open Office on XML is  george_ou | 09/10/05
Does your "database" do the following  jorwell | 09/12/05
Sorry, you both still missed the point...  wildranger | 09/12/05
If it is catching up with SQL  jorwell | 09/13/05
St. George versus the XML Dragon?  Robert Crocker | 09/12/05
But that isn't the point of view for George  JJ_z | 09/12/05
Text Based Standard  Patrick Jones | 09/12/05
Don't think you understand George's point  JJ_z | 09/12/05
I understand George's point  Patrick Jones | 09/12/05
That's all I ask  george_ou | 09/12/05
My Matrix text is purple with blue highlights  Patrick Jones | 09/12/05
No, it IS his point of view  Robert Crocker | 09/12/05
Cross-purposes  cd2_z | 09/12/05
Big difference  george_ou | 09/12/05
XML is still better than proprietary solutions  wildranger | 09/12/05
You haven't answered a single question to the blog  george_ou | 09/12/05
Open Office uses XML  George Jay | 09/12/05
Yes, but what about the slowness?  george_ou | 09/12/05
George, you have to accept some compromises ...  George Jay | 09/13/05
Windows and Office is not bloated  george_ou | 09/13/05
Cannot argue with illogical statements ...  George Jay | 09/13/05
Open Office is slow to start and slow to use  george_ou | 09/13/05
What is the purpose of Mindsweeper in Windows?  George Jay | 09/13/05
Looks like you've run out of ideas  george_ou | 09/13/05
No, I haven't run out of ideas ...  George Jay | 09/13/05
Sounds like you have  george_ou | 09/13/05
Why does Windows have Minesweeper?  George Jay | 09/14/05
Still don't quite get it  compguy_z | 09/13/05
You're not alone  george_ou | 09/13/05
I don't always agree with you George, but...  crash89 | 09/14/05
Develop in assembly language for performance.  praveencv@... | 09/14/05
Assembly maps directly to binary  george_ou | 09/14/05
Some comments on multiple posts and general discussion  coaster_z | 09/20/05
Excel vs OpenOffice Benchmark Report w/ 16 worksheets  coaster_z | 09/21/05