Monday 27 January 2014

Depth-first search and breadth-first search in Python

I recently had a tree structure like this (actually from the Gene Ontology hierarchy), and wanted to traverse the tree:
If you start at node N1, there are two ways to traverse the tree:

(i) a depth-first search: this uses a 'stack', and focuses on the last item that you put onto the stack. Starting at node N1, we first put N1 on the stack. The stack is the list [N1] now.
Then we take the last item off the stack: N1 in this case. We take the parent nodes of this item (N2, N3, N4) and add them to the stack. The stack contains list [N2, N3, N4] now.
Then we take the last item off the stack: N4. We take the parent nodes of this item (N3). It is already in the stack. The stack contains list [N2, N3] now.
We take the last item off the stack: N3. We take the parent nodes of this item (N6, N7), and add them to the stack. The stack contains list [N2, N6, N7] now.
We take the last item off the stack: N7. This item has no parents. The stack is [N2, N6] now.
We take the last item off the stack: N6. We add its parents (N13) to the stack, so the stack is [N2, N13].
We take the last item off the stack: N13. It has no parents. The stack is [N2] now.
We take the last item off the stack: N2. It has no parents. The stack is empty now, so we've finished.
That is, a depth-first traversal starting at node N1 takes this route:
(ii) a breadth-first search: this uses a 'queue', and focuses on the first item that you put on the queue.
Starting at node N1, we first put N1 on the queue. The queue is list [N1] now.
Then we take the first item off the queue: N1 in this case. We take the parent nodes of this item (N2, N3, N4) and add them to the queue. The queue is list [N2, N3, N4]  now.
We then take the first item off the queue: N2 in this case. N2 has no parents. The queue is [N3, N4] now.
We take the first item off the queue: N3 in this case. Its parents (N6, N7) are added to the queue, so the queue is [N4, N6, N7] now.
We take the first item off the queue: N4. We have already seen this parent (it was in the queue previously), so we ignore it. The queue is [N6, N7] now.
We take the first item off the queue: N6. We add its parents (N13) to the queue, so the queue is [N7, N13] now.
We take the first item off the queue: N7. It has no parents. The queue is [N13] now.
We take the first item off the queue: N13. It has no parents. The queue is empty now.

That is, a breadth-first search starting at node N1 takes this route:


Python code for depth-first and breadth-first searches:
My python script tree_traversal.py has code for depth-first and breadth-first traversal of this tree (thanks to my husband Noel for help with this!) If you run it, you should get output showing the stack at each point in the depth-first search, and the queue at each point in the breadth-first search:

% python3 tree_traversal.py
Depth-first search:
('stack=', [('N1', 0)])
('stack=', [('N2', 1), ('N3', 1), ('N4', 1)])
('stack=', [('N2', 1), ('N3', 1)])
('stack=', [('N2', 1), ('N6', 2), ('N7', 2)])
('stack=', [('N2', 1), ('N6', 2)])
('stack=', [('N2', 1), ('N13', 3)])
('stack=', [('N2', 1)])
{'N13': 3, 'N1': 0, 'N2': 1, 'N3': 1, 'N4': 1, 'N6': 2, 'N7': 2}


Breadth-first search:
('queue=', [('N1', 0)])
('queue=', [('N2', 1), ('N3', 1), ('N4', 1)])
('queue=', [('N3', 1), ('N4', 1)])
('queue=', [('N4', 1), ('N6', 2), ('N7', 2)])
('queue=', [('N6', 2), ('N7', 2)])
('queue=', [('N7', 2), ('N13', 3)])
('queue=', [('N13', 3)])
{'N13': 3, 'N1': 0, 'N2': 1, 'N3': 1, 'N4': 1, 'N6': 2, 'N7': 2}


Notes on the script: pop() takes the item from the end of a list, removes it and gives it to you.

No comments: