Although I haven't solved this exact problem in the past, I'd like to sketch out my thoughts on a straightforward approach, which may not be highly optimized. My goal is to invite other audience who has solved this problem to share their tricks.
To define the problem: Given a simple directed graph, we want to tell whether it contains any cycles.
Lets say the graph is represented in incidence edge format where each line represent an edge from one node to another node. The graph can be split into multiple files.
node1, node2 node4, node7 node7, node1 ....
Here is a general description of the algorithm.
We'll maintain two types of graph data.
- A set of link files where each line represent an edge in the graph. This file will be static.
- A set of path files where each line represent a path from one node to another node. This file will be updated in each round of map/reduce.
The main driver program will create the initial path file by copying the link file. And it will set the initial flag condition: Set cycle and change flag to true. In each round, the driver will
- Check if the cycle is still false and change is true, exit if this is not the case
- Otherwise, it will clear the change flag and invoke the map/reduce
- Emit the tuple [key=toNode, value=fromNode]
if it is a path
- Emit the tuple
[key=fromNode, value=toNode] if it is a link
- Check to see if the beginning point of the path is the same as the endpoint of the link. If so, it detects a cycle and mark the cycle flag to be true.
- Otherwise, it compute the product of every path to every link, and form the new path which effectively extend the length by one.
The following diagram illustrates the algorithm ...