PDA

View Full Version : Dear Lord, please deliver me from RECURSION



ayanda83
29th March 2017, 15:33
Hi guys, as always, trying to figure out a recursive function just hurts my brain. The function below works perfectly but I am a bit confused by line 13 and 14. If a recursive function calls itself inside a loop, what happens to the value of "i", does it ever get incremented. Lets take the code below as an example, lets assume the "length" of the vector "children" is 5, on the first iteration, the function calls itself and then what happens to the other 4 subsequent iterations of the loop. I hope my question is clear.


void Html_Parser::tableData(GumboNode *node)
{
if (node->type != GUMBO_NODE_ELEMENT) {
return;
}
GumboAttribute* href;
if (node->v.element.tag == GUMBO_TAG_A &&
(href = gumbo_get_attribute(&node->v.element.attributes, "href"))) {
std::cout << href->value << std::endl;
}

GumboVector* children = &node->v.element.children;
for (unsigned int i = 0; i < children->length; ++i) {
tableData(static_cast<GumboNode*>(children->data[i]));
}
}

jmg227
29th March 2017, 16:17
"i" is local to the function and is stored on the call stack. When the function calls itself, a new value of "i" is created for that call. The function will call itself for each child node (which will in turn call itself for each child) until is processes a node with no children.

This is a depth-first traversal of the nodes in the HTML document.

ayanda83
29th March 2017, 16:39
If I understand you correctly, the value of "i" will be 1 on the second call to the function even though you are calling the same function that will initialize "i" to zero every time you call it (i.e. line 13).

d_stranz
29th March 2017, 17:21
"i" is a local variable to the for() loop. It actually isn't used at all in the recursion - the argument to the recursive call is the "GumboNode" pointer to the "i-th" child of the node passed in the previous level of recursion.

I think it would help you better understand if you drew a diagram that shows how the recursion progresses:



Root node
Child node 0
Grandchild node 0 (Child 0 of root node's child 0)
Great-grandchild node 0 (Child 0 of grandchild 0)
Great-grandchild node 1
Great-grandchild node 2
...
Grandchild node 1
Great-grandchild node 0 (Child 0 of grandchild 1)
Great-grandchild node 1
Great-grandchild node 2
...
Child node 1 (of root node)
Grandchild node 0 (Child 0 of root node's child 1)
Great-grandchild node 0 (Child 0 of grandchild 0)
Great-grandchild node 1
Great-grandchild node 2
...
Grandchild node 1
Great-grandchild node 0 (Child 0 of grandchild 1)
Great-grandchild node 1
Great-grandchild node 2
...
and so forth.


The integers (0, 1, 2...) are the value of "i" used to retrieve the child node that starts the next level of recursion. This is called 'depth-first" recursion because each branch of the tree starting at the root node is traversed completely to the end (by indexing the 0-th child). It then backs up on level, gets the 1-th child, then explores the tree that starts with its 0th child, until the tree starting at the root has been explored.

The recursive function you show has three criteria for stopping: first, the node passed in isn't the right type; second, the node doesn't have any children; or third, all of the children have been examined. In any case, if the recursion was called from within the for() loop, when the recursive call returns, the local value of "i" will be whatever it was when the function was called. So if the loop variable "i" was 1 when the function was called, it will be 1 again when the recursion returns back to the level where it started. It gets incremented when the loop goes around again, and a new descent into the tree starts.

The alternative is "breadth-first" recursion, which I'll let you use Google to learn about.

ayanda83
29th March 2017, 17:47
Thank you very much d_stranz for taking the time answer my question. I think I understand now. All recursion does is to suspend computation of the function every time it encounters a call to itself. So if the function calls itself 5 times then there should be 4 suspended computations on the stack before it encounters a base-case which will cause function to return and of course working backwards on the suspended computation on the stack.

high_flyer
31st March 2017, 16:11
Generally it is advisable to substitute recursion with iteration.
There are several advantages but the main two are:
1. you can't get in to an infinite loop
2. the code is easier to read which makes it less prone to error and easier to maintain.
Here is the first google result I got, there are plenty more:
http://stackoverflow.com/questions/159590/way-to-go-from-recursion-to-iteration

Also, when you refactor a recursive function to iteration you learn what it does much easier.

d_stranz
31st March 2017, 18:22
Also, when you refactor a recursive function to iteration you learn what it does much easier.

Generally because it takes quite a while to find all the bugs you introduce in the process ;)

If you don't have to worry about stack overflow (i.e. the recursion doesn't go very deep) then sometimes recursion is the simplest way to go from the mathematical expression to a program. The classic example is computing factorials: f(n) = n * f(n-1) where the ending condition is f(1) = 1. Or visiting all the nodes in a tree as in the current example.

You could spend quite a while trying to turn a simple recursive algorithm into an iterative one. If recursion works to solve your problem, then write it that way and get on with the next piece of code. Just be sure you understand when it could fail - for example, the factorial algorithm can quickly run into integer overflow.

The goal is to write correct programs, on schedule. Go with what is fastest to write and won't come back to bite you in the form of a bug.

Good link to an interesting stackoverflow discussion. Thanks.