Pragmatic Programming Techniques: MapReduce: Detecting Cycles in Network Graph

Wednesday, January 2, 2013

MapReduce: Detecting Cycles in Network Graph

I recently received an email from an audience of my blog on Map/Reduce algorithm design regarding how to detect whether a graph is acyclic using Map/Reduce. I think this is an interesting problem and can imagine there can be wide range of application to it.

Although I haven't solved this exact problem in the past, I'd like to sketch out my thoughts on a straightforward approach, which may not be highly optimized. My goal is to invite other audience who has solved this problem to share their tricks.

To define the problem: Given a simple directed graph, we want to tell whether it contains any cycles.

Lets say the graph is represented in incidence edge format where each line represent an edge from one node to another node. The graph can be split into multiple files.

node1, node2
node4, node7
node7, node1
....

Here is a general description of the algorithm.

We'll maintain two types of graph data.

A set of link files where each line represent an edge in the graph. This file will be static.
A set of path files where each line represent a path from one node to another node. This file will be updated in each round of map/reduce.

The general idea is to grow the path file until either the path cannot grow further, or the path contain a cycle, we'll use two global flags to keep track of that: "change" flag to keep track of whether new path is discovered, and "cycle" flag to keep track whether a cycle is detected.

The main driver program will create the initial path file by copying the link file. And it will set the initial flag condition: Set cycle and change flag to true. In each round, the driver will

Check if the cycle is still false and change is true, exit if this is not the case
Otherwise, it will clear the change flag and invoke the map/reduce

Within each round, the map function will do the following

Emit the tuple [key=toNode, value=fromNode] if it is a path
Emit the tuple [key=fromNode, value=toNode] if it is a link

This ensure a path and all extended link reaches the same reducer, which will do the following

Check to see if the beginning point of the path is the same as the endpoint of the link. If so, it detects a cycle and mark the cycle flag to be true.
Otherwise, it compute the product of every path to every link, and form the new path which effectively extend the length by one.

The following diagram illustrates the algorithm ...

1 comment:

Unknown said...: Good Algorithm but can you explain what does list_of_tuples stand for in the reducer function?

Thanks; September 13, 2014 at 6:58 PM