7.3 Dealing with Cycles
We consider first the problem of finding paths in a graph.
The problem of finding traversals will then be seen as a
straightforward extension of our solution to the problem of finding
paths in a cyclic graph.
Furthermore, we begin by representing graphs directly as Prolog
programs, using an explicit arcs/2 relation to implement a
collection of adjacency lists. This simplifies our path-finding
predicate slightly. Then we move on to consider the case where the
graph is represented as a list of adjacency lists, and so must be
passed around as an argument to the relevant predicates and accessed
with general-purpose list processing predicates like member/2.
Suppose we have the graph illustrated in figure 7.3.
The set of nodes is just the four element set {a,b,c,d }, and
the arc relation is as given below.
arcs( a, [b,c] ).
arcs( b, [a,d] ).
arcs( c, [b] ).
arcs( d, [] ).
Figure 7.3: A sample graph.
Naively, we might define a path/3 predicate for this graph in
essentially the same way that we implemented it for trees, that is, by
simply taking the reflexive transitive closure of the arc
relation, compactly encoded by the arcs/2 predicate.
path( X, Y, [] ) :- X = Y.
path( X, Y, P ) :-
arcs( X, ArcList ),
member( Z, ArcList ),
P = [Z|PTail],
path( Z, Y, PTail ).
However, because there is a loop in the graph, taking us from node
b back to node a, there is an infinite branch in the search
tree for path/3. It gets worse! Because the upward pointing
arc---from b back to a---will be found first by
member/2,
before the downward pointing arc to d, Prolog will find this
infinite branch before it finds any other. And because the clause for
b is earlier than the clause for c in the definition of
arcs/2, note that Prolog will even fail to find a path from
a to c! Clearly, if we want to be able to find paths in an
arbitrary graph, something extra has to be done.
Recalling the example of finite state automata, where we looked at an
automaton that gave us essentially the same problem, we already know
one solution we can apply. We observe that the data structure
definition predicate list/1 will enumerate the lists in order
of increasing length. Given that paths are encoded as lists of
node names, we can redefine the problem
as follows.
list([]).
list([_|L]):- list(L).
?- list(P), path(a,Y,P ).
This technique, where we simulate breadth-first search using
repeated bounded depth-first searches actually has a name: it is
called iterative deepening. Here each successive call to
path/3, in effect,
deepens the search by one level of the search tree.
This solution, by the simulation of iterative deepening,
still finds an infinite number
of (ever longer) paths from a to d, for example.
Almost all of these paths, however, are arrived at by following a loop
in the graph. There are only two ways of reaching d from a
which are truly unique: [a,b,d] and [a,c,b,d]. All the
others are arrived at by following the loop.
What want to do here is not to find all of the paths but rather
all of the cycle-free paths.
A neat way to do this is to note that, after all, we are computing a
path from one node to another. To find out if we have visited a
particular node yet, we should be able to just search the path we are
constructing to see if that node is on it yet. However, path/3 is
not designed in such a way that we can do this intuitively appealing
thing. The part of the path that we have already seen is not
represented as a variable in the code, so we have no access to it; the
only part of the path which we can actually examine is the tail of the
path, the nodes we are going to visit in the future, and this is no
help at all to us! So we modify the code slightly so as to keep track
of the nodes we have actually visited, giving us path/4, below.
path( X, Y, P ) :- % path/3 redefined
path( X, Y, [X], P ).
path( X, X, Path, Path ).
path( X, Y, PIn, POut ) :-
arcs( X, L ),
member( Z, L ),
nonmember( Z, PIn ), % Been here before?
path( Z, Y, [Z|PIn], POut ).
This code uses a technique, called accumulator pairs,
which is used everywhere in logic
programming, and which we will examine in some detail when we study
the control and optimization of logic programs. Here we note only
that, while the logical, ``declarative'' semantics of this program may be at this
moment somewhat obscure, it should be reasonably clear what it
does. If we type in a goal like
?- path( a, d, P ).
then this code will begin by initializing the ``path-so-far''
argument to a list containing only the starting point of the
search. Then we call path/4, which does the real work. When we
finally reach our destination, and the base clause applies, then we
simply pass the contents of the ``path-so-far'' argument PIn over
to the output argument POut. This is then passed up to the
calling goal and from it to the goal that called it, and so on up the
chain of recursive calls to path/4 and thence to
path(a,d,P), our original goal. Note that whereas in path/3
we actually built up the path ``on the way up out'' of the recursion,
here we are building it up ``on the way down in.'' This means that at
every point, we can see where we have been so far, by examining the
PIn variable. In this case we do that with a call to
nonmember/2.
As usual, using nonmember/2 requires defining neq/2, in this
case for node names. We can do that in the usual way, with a
collection of unit clauses like ``neq(a,b).'' and so forth. But
neq/2, like any binary relation, can be seen as defining a
graph. This graph has the same set of nodes as the original graph of
figure 7.3, but different arcs. There is an arc between
two nodes in the ``neq-graph'' whenever they are different
nodes. Note that this relation is symmetric, and so technically the
neq-graph is our first example of an undirected graph, but
we can still use the same adjacency list representation scheme, as
follows.
neq( Node1, Node2 ) :-
neq_arcs( Node1, L ),
member( Node2, L ).
neq_arcs( a, [b,c,d] ).
neq_arcs( b, [a,c,d] ).
neq_arcs( c, [a,b,d] ).
neq_arcs( d, [a,b,c] ).
With this code in place, we can now safely search in a graph, even
when it contains cycles.
This approach works well when the graph is represented as a Prolog
program, with an arcs/2 predicate defined by a collection of unit
clauses. There is another even neater approach which we can apply when we
represent our graphs using adjacency lists. This approach uses
select/3 to mark a node as visited by actually
removing that node from the graph. Suppose, then, that our
graph is encoded as an instance of the graph/1 predicate.
graph( 1, [a/[b,c],
b/[a,d],
c/[b],
d/[]
] ).
(This technique is very useful in general for encoding a set of test
data, especially when the data involve very complex Prolog terms.)
Now we want to define a path/4 predicate so that we can issue
queries like
?- graph( G ), path( a, d, G, P ).
Accordingly, we reimplement path so that it includes an argument position
for the graph itself.
path( From, To, Graph, [From|Path] ) :-
p( From, To, Graph, Path ).
p( X, Y, _G, [] ) :- X = Y.
p( X, Y, G, [Z|P] ) :-
select( X/Arcs, G, SmallerG ),
member( Z, Arcs ),
p( Z, Y, SmallerG, P ).
Note that each time through the recursive clause of p/4 we reduce
the size of the graph by removing one adjacency record from
it. The select/3 predicate fails when its list argument is empty,
so we can only call clause 2 a finite number of times before it fails
to apply, and clause 1 is
nonrecursive, so we can only call it once in any case.
So no matter which clause of p/4 Prolog selects, the recursion
is bound to terminate. From this one sees immediately that we cannot
possibly get caught in a loop: that would contradict the theorem that
this program is bound to terminate.
Now we can use the same logic to reimplement our depth first traversal
predicate so that it handles (rooted) cyclic graphs. The only
essential difference comes from the fact that this is a traversal
predicate rather than a search predicate. With p/4, if the
call to select/3 fails (the node has already been visited),
then the whole call to p/4 fails, we backtrack and try to
find a different path. With dft/4
, if the call to
select/3 fails, we must simply ignore that node and
continue. Therefore we need a call to nonmember_/2
so as to
be certain that we only ignore nodes that we really have already
visited. Similarly, where p/4 calls member/2 to
nondeterministically select one way to extend its current path,
dft/4 must execute a recursion on the entire list of
neighbors, so as to check all ways to traverse the graph.
df_traversal(GIn,[Root|Route]):-
GIn = [Root/Neighbors|GInRest], % get the root's neighbors
dft(Neighbors,GInRest,_GOut,Route).
dft([],RemainingG,RemainingG,[]).
dft([N|RestNs],GIn,GOut,[N|Route]):-
select(N/Nbrs,GIn,SmallerG),
dft(Nbrs,SmallerG,EvenSmallerG,RfromN),
dft(RestNs,EvenSmallerG,GOut,RfromRest),
append(RfromN,RfromRest,Route).
dft([N|RestNs],GIn,GOut,Route):-
nonmember_(N,GIn), % already visited
dft(RestNs,GIn,GOut,Route).
The predicate nonmember_/2
which is called in clause 3 of
df_search_2/3
is not quite the same as the predicate
nonmember/2 we have seen before, although it is about as much different
from that predicate as its name implies. The trick is that we are
asking whether or not a node is still present in a graph
and a graph is not a list of nodes, but a list of node/adjacency-list
pairs. The code for nonmember_/2
looks like the following and
assumes that neq/2 has been implemented.
nonmember_( _, [] ).
nonmember_( X, [Y/_|L] ) :-
neq( X, Y ),
nonmember_( X, L ).
So we just pull out the node-part of each node/list pair and check
that for inequality with X.