Wednesday, November 25, 2009

JavaScript, Arrays and more than 1 million HTML tags

So, we have a piece of code that does this:
    // searchClass and classElements are defined earlier in the script
    var els = node.getElementsByTagName("*");
    var elsLen = els.length;
    var pattern = new RegExp('(^|\\s)'+searchClass+'(\\s|$)');
    for (var i = 0, j = 0; i < elsLen; i++) {
        // keep only the elements whose class list contains searchClass
        if ( pattern.test(els[i].className) ) {
            classElements[j] = els[i];
            j++;
        }
    }
It's a pretty simple for loop that walks every element under a node and collects the ones that belong to a given class. We use it in our printing code to work around some bad HTML in reports (report HTML can include client-generated content... it gets ugly fast).
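
As a self-contained sketch of the same technique, the loop can be wrapped in a function. The names filterByClass and escapeRegExp are made up for illustration and are not from our code; escaping the class name is an extra assumption, added so that characters like "." in a class name are matched literally.

```javascript
// Hypothetical helper: escape regex metacharacters in a class name.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical wrapper around the loop above. `els` is any array-like
// list of objects with a className string, for example the result of
// node.getElementsByTagName("*").
function filterByClass(els, searchClass) {
  var pattern = new RegExp("(^|\\s)" + escapeRegExp(searchClass) + "(\\s|$)");
  var classElements = [];
  for (var i = 0, len = els.length; i < len; i++) {
    if (pattern.test(els[i].className)) {
      classElements.push(els[i]);
    }
  }
  return classElements;
}
```

Because it only needs length, numeric indexes and className, the function can be tried on plain objects without a browser.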

We recently had a support call come in reporting an error on very large files. The first file had 57MB of HTML. Yes, 57 megabytes. Sigh. Looking at the source code, we found 10,425 occurrences of </div> without a corresponding opening tag. Removing those made the code work, so I thought we had solved the issue: it was bad HTML.

The next day it came back, this time on a 75MB report, after they had fixed their templates so they no longer emitted a closing div without an opening div. What?

The error being returned was the fantastic "null object" error, squarely on the "classElements[j] = els[i];" line. Say what?

We use the Web Browser Control (IE) in our application, so I opened the report in IE to see the issue. Some further investigation showed that it failed when accessing the element at array position 1,000,000! Checking the element in position 999,999 was fine.
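
One way to pin down a failing position like this is a small probe that walks the collection and reports the first index that cannot be read. This is only a sketch, not necessarily how the investigation above was done, and firstBadIndex is a hypothetical name.

```javascript
// Hypothetical diagnostic: return the first index in [0, els.length) whose
// entry is null/undefined or throws on access, or -1 if every entry is fine.
function firstBadIndex(els) {
  for (var i = 0, len = els.length; i < len; i++) {
    try {
      if (els[i] == null) {
        return i;
      }
    } catch (e) {
      return i;
    }
  }
  return -1;
}
```

Calling firstBadIndex(node.getElementsByTagName("*")) on a problem report would then give the position to look at.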

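So I reworked the code to look like this:
    var j = 0;
    for (var clsItem in els) {
        // for...in yields property names, not elements, so look each one up
        var el = els[clsItem];
        // skip non-element properties such as length
        if ( el && el.nodeType === 1 && pattern.test(el.className) ) {
            classElements[j] = el;
            j++;
        }
    }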
And now it works. Apparently indirect references past 1 million work in IE.
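
One thing worth keeping in mind with for...in: it enumerates property names (as strings), not the values stored under them, so the element itself has to be fetched by key. A minimal sketch with a plain array-like object; fakeEls and collectKeys are made-up names, and a real IE collection may enumerate additional properties.

```javascript
// Hypothetical stand-in for a DOM collection.
var fakeEls = { 0: { className: "a" }, 1: { className: "b" }, length: 2 };

// Collect what for...in actually hands to the loop body.
function collectKeys(obj) {
  var keys = [];
  for (var key in obj) {
    keys.push(key);
  }
  return keys;
}

var keys = collectKeys(fakeEls);   // property names, including "length"
var first = fakeEls[keys[0]];      // the value has to be looked up by key
```

This is also why a for...in loop over a collection should check that what it looked up is really an element before using it.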

We did suggest they clean up their HTML (tables inside of tables inside of tables? Must have been FrontPage) to help reduce the size of the reports. The 75MB report had 1.2 million tag elements!