Graph Unification---Simultaneous traversal of two graphs

7.5 Graph Unification---Simultaneous traversal of two graphs

In this section we explore a more sophisticated application of depth first graph traversal, namely the unification of typed feature structures represented as graphs, or TFS unification for short. This is based on a straightforward depth first traversal algorithm, but this time we will be traversing two graphs, and we we will be doing it simultaneously.

7.5.1 An introduction to typed feature structures

We begin with a short description of what we will mean here by ``typed feature structure''. Then we will consider how to represent them using a graph data type.

We define typed feature structure terms with the following little grammar. (Single quotes are meant to indicate that the enclosed character appears literally in the expression, that is, it is a terminal of the grammar, but the quotes themselves are not meant to appear. They just indicate that something is a terminal.)

átermñ	¾®	átfs-structñ \| ávariableñ	(7.1)
átfs-structñ	¾®	átype-nameñ `/' list(áfv-pairñ)	(7.2)
áfv-pairñ	¾®	áfeature-nameñ`:'átermñ	(7.3)

Observe that the definition is close to the definition of logical terms: we have structures and we have variables. (We treat constants as structures of arity 0; in this case that means type names associated with empty lists.) We emphasize that typed feature structures are in many respects generalizations of logical terms; or, alternatively, logical terms are special cases of typed feature structures: Functors in logical terms correspond to types and features in typed feature structures are represented positionally in logical terms. (That is, logical terms are typed feature structures over a signature in which the only features are first-argument, second-argument and so forth.) So the main new things we have to deal with are the possibility of unifying two structures with different numbers of arguments, and the possibility that corresponding arguments may appear in different positions in their respective structures: it is the name of the feature that matters, not its position in the list.

Just as with logical terms, structure sharing is indicated here through the use of variables. To take a simple case, if we have a TFS of type f with two features i and j, both of which take us to the very same substructure (say a structure g/[], of type g with no features), then we can express this with a set of equations like the following.

{A = f/[i:B,j:C], D = g/[], B = D, C = D }

Typed feature structures give us a convenient way of representing constraints that might hold of a structure. Consider the augmented phrase structure rules in Fig. 7.4. Here we see a number of constraints expressed in terms of equations over feature values. We can express essentially the same constraints using typed feature structures. In fact, the augmented phrase structure grammar makes no direct use of types except to represent ``constants'' like np or third. For purposes of illustration we will add ``functor'' types phrase, lbl, and ref (for ``phrase structure,'' ``label'' and ``reference'', respectively) so that now all of our feature structures are typed. The result is given in Fig. 7.5.

x₀ ® x₁ x₂

x₀.cat = s

x₁.cat = np

x₂.cat = vp

x₁.agr = x₂.agr

x₀ ® x₁ x₂

x₀.cat = vp

x₁.cat = v

x₂.cat = np

x₀.agr = x₁.agr

x₀ ® ``nature''

x₀.cat = np

x₀.agr.per = third

x₀.agr.num = sing

Figure 7.4: Some grammar rules, augmented with constraints.

phrase/[x0:lbl/[cat:s/[]],
        x1:phrase/[x0:lbl/[cat:np/[],
                           agr:X ]],
        x2:phrase/[x0:lbl/[cat:vp/[],
                           agr:X ]]]


phrase/[x0:lbl/[cat:vp/[],
                agr:X ],
        x1:phrase/[x0:lbl/[cat:v/[],
                           agr-X ]],
        x2:phrase/[x0:lbl/[cat:np/[]]]]


phrase/[x0:lbl/[cat:np/[],
                agr:ref/[per:third/[],
                         num:sing/[]]]]

Figure 7.5: Some grammar rules, TFS style.

We can also represent such grammatical constraints with graphs. Considering the grammar rules in figure 7.4, it is clear how to construct a graph from the path expressions and equations there. Paths correspond to paths through a graph, except that the path reflects the arcs traversed rather than the nodes visited. Equations between paths and symbols indicate that the node found at the end of the path is labeled with the symbol and equations between paths indicate that both paths lead to the same place. Typed feature structures can be thought of as graphs in the same way. Again, arcs are labeled with features and now all nodes are labeled with types. Just as with Prolog terms, the appearance of a variable in two places is taken to mean that those two nodes are the same. That is, referring to the s-example in figure 7.5, the node reached by following the agr arc from the ``subject'' label node is identical to the node reached by following the agr arc from the ``predicate'' label node. The important thing to keep in mind is just this, that shared variables indicate shared structure.

There is an extra aspect to typed feature structures which is not readily apparent by comparing them to grammar rules like those in figure 7.4. Namely, we can add to our grammar an elaborate type hierarchy which can tell us, for example, that transitive verbs are a subtype of verb or that proper names are a subtype of noun phrase. Basically, statements of type inclusion (``type t is a type s,'') are implications and support a rich array of possible inferences. In what follows we will make use of only a tiny fragment of these possibilities. We will assume one special type which we will call top, which is at the very top of the hierarchy: everything is a top. The rest of our types are all distinct and bear no relation to each other. We will make use of typed feature structures of the form top/[] as simple placeholders: they indicate that a particular node must exist, but we know no more about it than that it exists, and that it may be found at the end of one or another arc.

So let us consider now how to encode these graphs using our adjacency list notation. Each TFS corresponds to a node in the graph, namely the node which is its root. Node names will just be positive integers, and we adopt the convention that the node named i will be stored in position i of the graph. In practice, this means that we access the graph with calls to nth/3 rather than member/2. As an example, consider figure 7.6. There node 1 of the graph corresponds to the entire feature structure at the top of the figure, that is, node 1 represents the root of a feature structure of type phrase whose x0 attribute has as its value the feature structure corresponding to node 2, a structure with type lbl, whose cat feature has as its value a feature structure of type s, corresponding to node 5, and so on.

    phrase/[x0:lbl/[cat:s/[]],
            x1:phrase/[x0:lbl/[cat:np/[],
                               agr:X ]],
            x2:phrase/[x0:lbl/[cat:vp/[],
                               agr:X ]]]


          [1-phrase/[x0:2, x1:3, x2:4],
           2-lbl/[cat:5],
           3-phrase/[x0:6],
           4-phrase/[x0:7],
           5-s/[],
           6-lbl/[cat:8, agr:9],
           7-lbl/[cat:10, agr:9],
           8-np/[],
           9-top/[],
           10-vp/[]
           ]

Figure 7.6: From feature structures to graph data structures.

7.5.2 The unification operation on graphs

In order to unify two feature structure terms we make the simplifying assumption that both terms are substructures of a single graph. We can always make this assumption since graphs, in general, are not required to be connected. We do require that whatever substructures are meant to represent typed feature structures be rooted and hence connected. But the entire graph need not be. We presume then that prior to the execution of the unification procedure both terms to be unified (``unifcands'') have been stored in a single graph. So the arguments to a call to the unification procedure will be node names in the input graph, rather than entire terms.

In a rather deep sense this overall graph corresponds more closely to computational working memory than to a grammatical representation. Node names, following our naming convention, are addresses in this memory. So we can think of adjacency records like lbl/[cat:8,agr:9] as arrays of pointers, with feature names as their indices. Note however that there is a very important way in which Prolog terms (like the lists we are using as our basic data structures) are not like a computer's memory. We can instruct a computer to replace the contents of memory address x with something new, but we cannot do that to a logical term. There is no assignment operation in Prolog. Instead, when we need to write programs, i.e., definite clause theories, about objects that evolve over the course of a computation, we use Prolog terms to denote states of this object, and write predicates which tell us how one state of the object is related to its next state. A simple example is the familiar list processing predicate select/3, which is a relation between two ``states'' of a list. The graph predicates which mark nodes as visited by removing them from the graph are a more elaborate example in which the successive states of the graph are indicated by a long string of graph-valued variables.

We will implement here a species of destructive unification. That means we will take our two terms (or subgraphs), construct a most general unifier and then replace the root of the first subgraph with a pointer to the root of the second, and replace the second subgraph with the unifier we have just computed. This means we will enrich our data structure with a new kind of expression, this ``pointer'' which I just referred to. Pointers will be represented by terms of the form @N, where N is a node name, that is, an address in our overall graph.

This adds a level of abstraction in the way our data structure represents an intuitively given graph. On the one hand our data structure is an ordinary sort of graph in which we have either labeled nodes with a collection of labeled outgoing arcs (i.e., ordinary adjacency records) or else special sorts of unlabeled nodes with a single unlabeled outgoing arc (i.e., our ``pointers''; a pointer term of the form @7 represents an unlabeled node with a single unlabeled arc leading to node 7).

On the other hand, this graph over two sorts of nodes is meant to represent a graph over nodes only of the first sort. We can recover our intended graph by collecting together a node of the first sort (i.e., a node with a standard adjacency record) and all of the nodes of the second sort which point to it, forming an equivalence class. Each such ``pointer equivalence'' class corresponds to a single node in the intended graph and so, transitively, to the single feature structure of which that node is the root.

7.5.3 The `unif.pl` Program

The Main Routine

As promised the main routine is essentially a simultaneous depth first search of two (sub-) graphs. The top-level predicate unify(P,Q,G0,G) takes two pointers (i.e., node names) P and Q which point to the roots of the unificands in the graph G0. Output is the modified graph G, in which both P and Q point to the root of their most general unifier.

*/
:- op(200,fx,@).

unify(P,Q,Graph0,Graph):-
    get(P,Graph0,PVal),
    get(Q,Graph0,QVal),
    unif(PVal,QVal,P,Q,Graph0,Graph).
/*

First unify/4 calls get/3 twice to fetch whatever is stored in locations P and Q from Graph0. Then control passes to unif/6, which handles the various different cases, depending on what we find in PVal and QVal.

There are two kinds of expression that can be stored as nodes in the graph---terms and pointers---and there are two subgraphs to consider so there are four possible cases for unif/6 to handle. In three of these cases at least one of the unificands is a pointer expression. In that case we simply get whatever it points to and continue in this wise until both arguments are terms. We keep track of which locations we are examining in the fourth and fifth argument positions. When we finally get two terms as unificands, we first compute the type of their unifier (if such a type can be found), then we call unify/4 recursively on each arc in the adjacency record of the first term (FS1). When the recursion on P's and Q's neighbors is complete, we take the graph as it is output from recursive_unify/6 (G2) and update it so that P now points to Q.

*/
unif(Typ/FS,@Q,P,_,G0,G):-
        get(Q,G0,QVal),
        unif(Typ/FS,QVal,P,Q,G0,G).
unif(@P,Typ/FS,_,Q,G0,G):-
        get(P,G0,PVal),
        unif(PVal,Typ/FS,P,Q,G0,G).
unif(@P,@Q,_,_,G0,G):-
        get(P,G0,PVal),
        get(Q,G0,QVal),
        unif(PVal,QVal,P,Q,G0,G).

unif(Typ1/FS1,Typ2/FS2,P,Q,G0,G):-
        type_inference(Typ1,Typ2,Q,G0,G1),
        recursive_unify(FS1,FS2,Q,G1,G2),
        point(P,Q,G2,G).
/*

In general there are three cases to consider when we are comparing the arcs of two feature structures. A given feature may appear in both unificands, only in the first one, or only in the second. However, since we are storing the result in the location of the second unificand, we can ignore the third case: all arcs belonging only to the second unificand are already correctly represented in the input. In case a given feature F has values in both feature structures (clause 2), then we must unify those values. (This is where this algorithm takes on the appearance of a depth first traversal. Remember that unify/4 will ultimately call recursive_unify/6, giving us the familiar double recursion, first on the neighbors of V1 and then afterwards on its ``siblings''.) In case F is not present in FVs2, then we must copy F:V to FVs2, which is stored in the graph at location Loc, but we do nothing further with the structure at location V.

*/
recursive_unify([],_FVs,_Loc,G,G).
recursive_unify([F:V1|FVs1],FVs2,Loc,G0,G):-
    select(F:V2,FVs2,FVs2a),
    unify(V1,V2,G0,G1),
    recursive_unify(FVs1,FVs2a,Loc,G1,G).
recursive_unify([F:V|FVs1],FVs2,Loc,G0,G):-
    nonmember(feature,F,FVs2),
    copy(F:V,Loc,G0,G1),
    recursive_unify(FVs1,FVs2,Loc,G1,G).
/*

Utility Predicates

Most of the remaining predicates are straightforward extensions of nth/3. The only significant exception is type_inference/5 which must compute the type of the new structure. This is done in the following way: the type of the unifier is defined to be the most specific possible type which is at least as general as the types of the unificands. Formally, we conceive of the type hierarchy as a lattice, ordered by the partial order ``more general than''. In such a setting, we are looking for the greatest lower bound of the two input types. Details of this calculation are found in the Test Data section. Like the list of particular features that are allowed, and the list of possible types, the type hierarchy can be freely varied according to the problem at hand, without disturbing the core algorithm.

*/
% Get node record Value with name Ptr in Graph.
get(Ptr,Graph,Value):- member(Ptr-Value,Graph).

% Make node name P in G0 point to node name Q in G.
point(P,Q,G0,G):- replace(P,@Q,G0,G).

copy(Arc,Loc,G0,G):-
        add_arc(Loc,Arc,G0,G).

type_inference(T1,T2,Loc,G0,G):-
        glb(T1,T2,T), % defined in test data section
        replace_type(Loc,T,G0,G).


member(X,[X|_]).
member(X,[_|L]):- member(X,L).

nth(1,[X|_],X).
nth(K,[_|L],X):- K>1, J is K-1, nth(J,L,X).

replace( Key, Ptr, [Key-_|Rest], [Key-Ptr|Rest] ).
replace( Key, Ptr, [Hd|Tail1], [Hd|Tail2] ):-
        replace( Key, Ptr, Tail1, Tail2 ).

replace_type( Key, T, [Key-_/FVs|Rest], [Key-T/FVs|Rest] ).
replace_type( Key, T, [Hd|Rest1], [Hd|Rest2] ):-
        replace_type( Key, T, Rest1, Rest2 ).

add_arc( Key, Arc, [Key-T/Arcs|Rest], [Key-T/[Arc|Arcs]|Rest] ).
add_arc( Key, Arc, [Hd|Tail1], [Hd|Tail2] ):-
        add_arc( Key, Arc, Tail1, Tail2 ).

select(X,[X|L],L).
select(X,[Y|L1],[Y|L2]):- select(X,L1,L2).

nonmember(_Type,_,[]).
nonmember(Type,X,[Y|L]):-
        neq(Type,X,Y),
        nonmember(Type,X,L).

neq(feature,F1,F2:_):-
        features(Feats), % defined in test data section
        domain_neq(F1,F2,Feats).

domain_neq(X,Y,[X|Dom]):- member(Y,Dom).
domain_neq(X,Y,[Y|Dom]):- member(X,Dom).
domain_neq(X,Y,[_|Dom]):- domain_neq(X,Y,Dom).

/*

Test Data

This section contains code which is not considered part of the core program. In particular, besides test data, it includes the definitions of features/1 and glb/3.

Greatest lower bounds in our type hierarchy are simple to compute. If one of the types is top then the other one is our solution. If neither one is top, then they must be identical. Otherwise no solution exists. A little informally, we are saying that the most specific type which is both a T and a top is T itself. (T in this case can of course also be top.) If T1 and T2 are different, and neither is top, then there are no types that are both T1 and T2 (so a fortiori there is no most specific such type), so the computation fails in this case.

*/
glb(top,T2,T2).
glb(T1,top,T1):- 
        definite(T1).
glb(T,T,T):- definite(T).

definite(T):- member(T,[f,g,h]).

features([1,2,3,4,a,b,c,d]).


test(1):- test(1,1,5,X), report(1,1,5,X).
test(2):- test(2,1,3,X), report(2,1,3,X).
test(3):- test(3,1,5,X), report(3,1,5,X).

test(Graph,FS1,FS2,Result):- 
    data(Graph,G0), 
    unify(FS1,FS2,G0,Result).

report(Graph,Start1,Start2,Result):-
        data(Graph,G),
        nl,
        print('Input: '), print( G ), nl,
        print('A: '), print(Start1), print(', B: '), print(Start2), nl,
        print('Output: '), print(Result), nl.

data(1, % X=h/[],Y=h/[],f[a:g[a:X],c:g[a:X]]=f[a:g[a:Y],b:g[a:Y]].
        [1-f/[a:2,c:3], %1
         2-g/[a:4],
         3-g/[a:4],
         4-h/[],
         5-f/[a:6,b:7], %5
         6-g/[a:8],
         7-g/[a:8],
         8-h/[]
        ]).
data(2, % f[a:X,b:X]=f[a:g[b:Y,c:Z]].
        [1-f/[a:2,b:2],
         2-top/[],
         3-f/[a:4],
         4-g/[b:5,c:6],
         5-top/[],
         6-top/[]
         ]).
data(3, % Roots: 1, 5.
     [1-f/[a:2,b:3],
      2-f/[a:4],
      3-f/[a:2],
      4-f/[],
      5-f/[a:6,b:7],
      6-f/[],
      7-f/[a:8],
      8-f/[a:6]
     ]).
         
/*

Running the test data

| ?- test(1).

Input: [1-f/[a:2,c:3],2-g/[a:4],3-g/[a:4],4-h/[],5-f/[a:6,b:7],6-g/[a:8],7-g/[a:8],8-h/[]]
A: 1, B: 5
Output: [1- @5,2- @6,3-g/[a:4],4- @8,5-f/[c:3,a:6,b:7],6-g/[a:8],7-g/[a:8],8-h/[]]

yes
{source_info}
| ?- test(2).

Input: [1-f/[a:2,b:2],2-top/[],3-f/[a:4],4-g/[b:5,c:6],5-top/[],6-top/[]]
A: 1, B: 3
Output: [1- @3,2- @4,3-f/[b:2,a:4],4-g/[b:5,c:6],5-top/[],6-top/[]]

yes
{source_info}
| ?- test(3).

Input: [1-f/[a:2,b:3],2-f/[a:4],3-f/[a:2],4-f/[],5-f/[a:6,b:7],6-f/[],7-f/[a:8],8-f/[a:6]]
A: 1, B: 5
Output: [1- @5,2- @6,3- @7,4- @6,5-f/[a:6,b:7],6- @8,7-f/[a:8],8-f/[a:6]]

yes

Note that in both cases the computed solution can be found by starting at the first member of the graph. So, in the first case, we start at location 1 which contains the pointer @5, so we first go to location 5, and continue from there.

*/