Home Section Blog Technology Beyond Recursive Descent
Beyond Recursive Descent PDF Print E-mail
Blog - Technology
Written by Dennis Reinhardt   
Saturday, 21 February 2009 19:01
I have been writing a parser to indent JavaScript code.  To simplify discussion, assume that all statements look like A(B)C where A, B, and C are a series of characters where A contains no parenthesis and B and C can contain parenthesis which balance.  Let's say the routine which indents this code is

    indent(fragment, level)

In pseudo-code, indent looks like this:

    indent(fragment, level):
        fragment = entire_file
        while fragment not empty:
            A,B,C = split_on__matched_parenthesis(fragment)
            if terms split into 3:
                emit(A + "(")
                indent(B, level+1)
                emit(")")
                fragment = C
            else:
                emit fragment
                fragment = empty

Thus, 123(456(abc))789 is emitted as

    123(
        456(
            abc
        )
    )
    789

Note that indent calls itself twice.  That is what makes it recursive.  However, this code can overflow any reasonable size stack.  I used a 200+K jQuery library to test my code and it would not run to completion.  The call stack overflowed.  

Looking at this code, we see that the recursion is held open until all terms are analyzed.  In order to even get this code to run, I had to change the recursion on C (tail recursion, so named because it is at the tail of the fragment) into iteration:

    indent(fragment, level):
        fragment = entire_file
        while fragment not empty:
            fragment, level = indent_step(fragment, level)

    indent_step(fragment, level):
        fragment = entire_file
        A,B,C = split_on__matched_parenthesis(fragment)
        if terms split into 3:
            emit(A + "(")
            indent_step(B, level+1)
            emit(")")
            fragment = C
        else:
            emit fragment
            fragment = empty
        return fragment

Thus, we exit the recursive routine at the tail end and relieve pressure on the subroutine stack.  Doing this, I got the jQuery library to run to completion in 3 minutes.  With further work, I changed the recursion still remaining to iteration and reduced the run time to 10 seconds.  

However, the overall structure was a recursive architecture with iterative hacks. The code was fragile and more than once I spent several hours tracking down a bug.

I have now re-written this as a purely iterative algorithm.  The fragment is scanned left to right for "(" (in our simplified example) and an indentation or emit decision made depending on how far ahead the matching ")" is.  The decision is recorded on a stack and the ")" consults the stack to see what decision its matching "(" made.  Unlike a recursive call stack which holds all possible subroutine calls until the entire evaluation is made, the iterative stack never grows any deeper than the nesting depth of the of the balanced parenthesis in the original fragment.

The purely iterative algorithm now runs in approximately 1 second.  A benefit of the iterative approach is that it is very easy to check the stack for matching punctuation types.  That could theoretically be done in a recursive approach but is incompatible with running fast.  It was necessary to bound the length of search for matching character in both approaches and this prevents error checking in the recursive approach but not the iterative approach.
Trackback(0)
Comments (0)Add Comment

Write comment

busy
Last Updated on Saturday, 21 February 2009 19:05
 
Copyright © 2010 TextToolKit. All Rights Reserved.
Joomla! is Free Software released under the GNU/GPL License.