7.5 Graph Unification---Simultaneous traversal of two graphs
In this section we explore a more sophisticated application of depth
first graph traversal, namely the unification of typed feature
structures represented as graphs, or TFS unification for short.
This is based on a straightforward depth first traversal
algorithm, but this time we will be
traversing two graphs, and we we will be doing it
simultaneously.
7.5.1 An introduction to typed feature structures
We begin with a short description of what we will mean here by ``typed
feature structure''. Then we will consider how to represent them using
a graph data type.
We define typed feature structure terms with the following
little grammar. (Single quotes are meant to indicate that the enclosed
character appears literally in the expression, that is, it is a
terminal of the grammar, but the quotes themselves are not meant to
appear. They just indicate that something is a terminal.)
|
átermñ |
¾® |
átfs-structñ | ávariableñ |
(7.1) |
|
átfs-structñ |
¾® |
átype-nameñ `/' list(áfv-pairñ) |
(7.2) |
|
áfv-pairñ |
¾® |
áfeature-nameñ`:'átermñ
|
(7.3) |
Observe that the definition is close to the definition of logical
terms: we have structures and we have variables. (We treat constants
as structures of arity 0; in this case that means type names
associated with empty lists.)
We emphasize
that typed feature structures are in many respects generalizations of
logical terms; or, alternatively, logical terms are special cases of
typed feature structures: Functors in logical terms
correspond to types and features in typed feature structures
are represented positionally in logical terms. (That is, logical terms
are typed feature structures over a signature in which the only
features are first-argument, second-argument and so
forth.)
So the main new things we have to
deal with are the possibility of unifying two structures with
different numbers of arguments, and the possibility that
corresponding arguments may appear in different positions in their
respective structures: it is the name of the feature that matters, not
its position in the list.
Just as with logical terms, structure sharing is indicated here
through the
use of variables. To take a simple case, if we have a TFS of type
f with two features i and j, both of which
take us to the very same substructure (say a structure g/[]
, of
type g with no features), then we can express this with a set
of equations like the following.
{A = f/[i:B,j:C]
, D = g/[]
, B = D, C = D }
Typed feature structures give us a convenient way of representing
constraints that might hold of a structure.
Consider the augmented phrase structure rules in Fig. 7.4.
Here we see a number of constraints expressed in terms of equations
over feature values.
We can express essentially the same constraints using typed feature
structures.
In fact, the augmented phrase structure grammar
makes no direct use of types
except to represent ``constants'' like np or third.
For purposes
of illustration we will add ``functor'' types phrase,
lbl, and ref (for ``phrase structure,''
``label'' and ``reference'', respectively) so that now all of our
feature structures are typed. The result is given in
Fig. 7.5.
|
x0 ® x1 x2 |
|
x0.cat |
= |
s |
x1.cat |
= |
np |
x2.cat |
= |
vp |
x1.agr |
= |
x2.agr
|
|
|
x0 ® x1 x2 |
|
x0.cat |
= |
vp |
x1.cat |
= |
v |
x2.cat |
= |
np |
x0.agr |
= |
x1.agr
|
|
|
x0 ® ``nature'' |
|
x0.cat |
= |
np |
x0.agr.per |
= |
third |
x0.agr.num |
= |
sing
|
|
Figure 7.4: Some grammar rules, augmented with constraints.
phrase/[x0:lbl/[cat:s/[]],
x1:phrase/[x0:lbl/[cat:np/[],
agr:X ]],
x2:phrase/[x0:lbl/[cat:vp/[],
agr:X ]]]
phrase/[x0:lbl/[cat:vp/[],
agr:X ],
x1:phrase/[x0:lbl/[cat:v/[],
agr-X ]],
x2:phrase/[x0:lbl/[cat:np/[]]]]
phrase/[x0:lbl/[cat:np/[],
agr:ref/[per:third/[],
num:sing/[]]]]
Figure 7.5: Some grammar rules, TFS style.
We can also represent such grammatical constraints with graphs.
Considering the grammar rules in figure 7.4, it is
clear how to construct a graph from the path expressions and
equations there. Paths correspond to paths through a graph, except
that the path reflects the arcs traversed rather than the
nodes visited. Equations between paths and symbols indicate that the
node found at the end of the path is labeled with the symbol and
equations between paths indicate that both paths lead to the same
place.
Typed feature structures can be thought of as graphs in the same way.
Again, arcs are labeled with features and now all nodes are labeled
with types. Just as with Prolog terms, the appearance of a variable in
two places is taken to mean that those two nodes are the same.
That is, referring to the s-example in figure 7.5,
the node reached by following the agr arc from the ``subject''
label node is identical to the node reached by following the agr
arc from the ``predicate'' label node. The important thing to keep in
mind is just this, that shared variables indicate shared
structure.
There is an extra aspect to typed feature structures which is not
readily apparent by comparing them to grammar rules like those in
figure 7.4. Namely, we can add to our grammar an elaborate
type hierarchy which can tell us, for example, that
transitive verbs are a subtype of verb or that proper names are a
subtype of noun phrase. Basically, statements of type inclusion
(``type t is a type s,'') are implications and support a rich
array of possible inferences. In what follows we will make use of only
a tiny fragment of these possibilities. We will assume one special
type which we will call top, which is at the
very top of the hierarchy: everything is a top. The rest of
our types are all distinct and bear no relation to each other. We
will make use of typed feature structures of the form top/[] as
simple placeholders: they indicate that a particular node must exist,
but we know no more about it than that it exists, and that it may be found
at the end of one or another arc.
So let us consider now how to encode these graphs using our adjacency
list notation.
Each TFS corresponds to a node in the graph, namely the node which is
its root.
Node names will just be positive integers,
and we adopt the convention that the node
named i will be stored in position i of the graph.
In practice, this means that we access the graph with calls to
nth/3 rather than member/2.
As an example, consider figure 7.6.
There node 1 of the graph corresponds to the
entire feature structure at the top of the figure, that is,
node 1 represents the root of
a feature structure of type phrase
whose x0 attribute has as its value the feature structure
corresponding to node 2, a structure with type lbl, whose
cat feature has as its value a feature structure of type
s, corresponding to node 5, and so on.
phrase/[x0:lbl/[cat:s/[]],
x1:phrase/[x0:lbl/[cat:np/[],
agr:X ]],
x2:phrase/[x0:lbl/[cat:vp/[],
agr:X ]]]
[1-phrase/[x0:2, x1:3, x2:4],
2-lbl/[cat:5],
3-phrase/[x0:6],
4-phrase/[x0:7],
5-s/[],
6-lbl/[cat:8, agr:9],
7-lbl/[cat:10, agr:9],
8-np/[],
9-top/[],
10-vp/[]
]
Figure 7.6: From feature structures to graph data structures.
7.5.2 The unification operation on graphs
In order to unify two feature structure
terms we make the simplifying assumption that
both terms are substructures of a single graph. We can always make
this assumption since graphs, in general, are not required to be
connected. We do require that whatever substructures are meant to
represent typed feature structures be rooted and hence connected. But
the entire graph need not be.
We presume then that prior to the execution of the unification
procedure both terms to be unified (``unifcands'') have been stored
in a single graph. So the arguments to a call to the unification
procedure will be
node names in the input graph, rather than entire terms.
In a rather deep sense this overall graph corresponds
more closely to computational working memory than to
a grammatical representation.
Node names, following our naming convention, are addresses in
this memory. So we can think of adjacency records like
lbl/[cat:8,agr:9] as arrays of pointers, with feature names
as their indices.
Note however that there is a very important way in which Prolog
terms (like the lists we are using as our basic data structures) are
not like a computer's memory. We can instruct a computer to replace
the contents of memory address x with something new, but we cannot
do that to a logical term. There is no assignment operation in
Prolog. Instead, when we need to write programs, i.e., definite
clause theories, about objects that evolve over the course of a
computation, we use Prolog terms to denote states of this
object, and write predicates which tell us how one state of the object
is related to its next state. A simple example is the familiar list
processing predicate select/3, which is a relation between
two ``states'' of a list. The graph predicates which
mark nodes as visited by removing them from the graph are a more
elaborate example in which the successive states of the graph are
indicated by a long string of graph-valued variables.
We will implement here a species of destructive unification.
That means we will take our two terms (or subgraphs), construct a
most general unifier and then replace the root of the first subgraph
with a pointer to the root of the second, and replace the second
subgraph with the unifier we have just computed.
This means we will enrich our data structure with a new kind of
expression, this ``pointer'' which I just referred to.
Pointers will be represented by terms of the form @N, where
N is a node name, that is, an address in our overall graph.
This adds a level of abstraction in the way our data structure
represents an intuitively given graph.
On the one hand our data
structure is an ordinary sort of graph in which we have either
labeled nodes with a collection of labeled outgoing arcs (i.e.,
ordinary adjacency records) or else special sorts of unlabeled nodes
with a single unlabeled outgoing arc (i.e., our ``pointers''; a
pointer term of the form @7 represents an unlabeled node
with a single unlabeled arc leading to node 7).
On the other hand, this graph over two sorts of nodes is meant to
represent a graph over nodes only of the first sort. We can recover
our intended graph by collecting together a node of the first sort
(i.e., a node with a standard adjacency record) and all of the nodes
of the second sort which point to it, forming an equivalence class.
Each such ``pointer equivalence'' class corresponds to a single node
in the intended graph and so, transitively, to the single feature
structure of which that node is the root.
7.5.3 The unif.pl Program
The Main Routine
As promised the main routine is essentially a simultaneous depth
first search of two (sub-) graphs. The top-level predicate
unify(P,Q,G0,G) takes two pointers (i.e., node names)
P and Q which point to the roots of the unificands
in the graph G0. Output is the modified graph G, in
which both P and Q point to the root of their most
general unifier.
*/
:- op(200,fx,@).
unify(P,Q,Graph0,Graph):-
get(P,Graph0,PVal),
get(Q,Graph0,QVal),
unif(PVal,QVal,P,Q,Graph0,Graph).
/*
First unify/4 calls get/3 twice to fetch whatever
is stored in locations P and Q from Graph0.
Then control passes to unif/6, which handles the various
different cases, depending on what we find in PVal and
QVal.
There are two kinds of expression that can be stored as nodes
in the graph---terms and pointers---and there are two subgraphs to
consider so there are
four possible cases for unif/6 to handle.
In three of these cases at least one of the unificands is a pointer
expression. In that case we simply get whatever it points to
and continue in this wise until both arguments are terms.
We keep track of which locations we are examining in the fourth and
fifth argument positions.
When we finally get two terms as unificands,
we first compute the type of their unifier (if such a type can be
found),
then we call unify/4 recursively on each arc
in the adjacency record of the first term (FS1). When the
recursion on P's and Q's neighbors is complete, we
take the graph as it is output from recursive_unify/6
(G2) and update it so that P now points to Q.
*/
unif(Typ/FS,@Q,P,_,G0,G):-
get(Q,G0,QVal),
unif(Typ/FS,QVal,P,Q,G0,G).
unif(@P,Typ/FS,_,Q,G0,G):-
get(P,G0,PVal),
unif(PVal,Typ/FS,P,Q,G0,G).
unif(@P,@Q,_,_,G0,G):-
get(P,G0,PVal),
get(Q,G0,QVal),
unif(PVal,QVal,P,Q,G0,G).
unif(Typ1/FS1,Typ2/FS2,P,Q,G0,G):-
type_inference(Typ1,Typ2,Q,G0,G1),
recursive_unify(FS1,FS2,Q,G1,G2),
point(P,Q,G2,G).
/*
In general there are three cases to consider when we are comparing the
arcs of two feature structures. A given feature may appear in both
unificands, only in the first one, or only in the second. However,
since we are storing the result in the location of the second
unificand, we can ignore the third case: all arcs belonging only to
the second unificand are already correctly represented in the input.
In case a given feature F has values in both feature
structures (clause 2), then we must unify those values. (This is where
this algorithm takes on the appearance of a depth first traversal.
Remember that unify/4 will ultimately call
recursive_unify/6
, giving us the familiar double recursion,
first on the neighbors of V1 and then afterwards on its
``siblings''.)
In case F is not present in FVs2, then we must copy
F:V to FVs2, which is stored in the graph
at location Loc,
but we do nothing further with the structure at location V.
*/
recursive_unify([],_FVs,_Loc,G,G).
recursive_unify([F:V1|FVs1],FVs2,Loc,G0,G):-
select(F:V2,FVs2,FVs2a),
unify(V1,V2,G0,G1),
recursive_unify(FVs1,FVs2a,Loc,G1,G).
recursive_unify([F:V|FVs1],FVs2,Loc,G0,G):-
nonmember(feature,F,FVs2),
copy(F:V,Loc,G0,G1),
recursive_unify(FVs1,FVs2,Loc,G1,G).
/*
Utility Predicates
Most of the remaining predicates are straightforward extensions of
nth/3. The only significant exception is
type_inference/5
which must compute the type of the new
structure. This is done in the following way: the type of the unifier
is defined to be the most specific possible type which is at least as
general as the types of the unificands. Formally, we conceive of the
type hierarchy as a lattice, ordered by the partial order ``more
general than''. In such a setting, we are looking for the
greatest
lower bound of the two input types. Details of this calculation are
found in the Test Data section. Like the list of particular features
that are allowed, and the list of possible types, the type hierarchy
can be freely varied according to the problem at hand,
without disturbing the core algorithm.
*/
% Get node record Value with name Ptr in Graph.
get(Ptr,Graph,Value):- member(Ptr-Value,Graph).
% Make node name P in G0 point to node name Q in G.
point(P,Q,G0,G):- replace(P,@Q,G0,G).
copy(Arc,Loc,G0,G):-
add_arc(Loc,Arc,G0,G).
type_inference(T1,T2,Loc,G0,G):-
glb(T1,T2,T), % defined in test data section
replace_type(Loc,T,G0,G).
member(X,[X|_]).
member(X,[_|L]):- member(X,L).
nth(1,[X|_],X).
nth(K,[_|L],X):- K>1, J is K-1, nth(J,L,X).
replace( Key, Ptr, [Key-_|Rest], [Key-Ptr|Rest] ).
replace( Key, Ptr, [Hd|Tail1], [Hd|Tail2] ):-
replace( Key, Ptr, Tail1, Tail2 ).
replace_type( Key, T, [Key-_/FVs|Rest], [Key-T/FVs|Rest] ).
replace_type( Key, T, [Hd|Rest1], [Hd|Rest2] ):-
replace_type( Key, T, Rest1, Rest2 ).
add_arc( Key, Arc, [Key-T/Arcs|Rest], [Key-T/[Arc|Arcs]|Rest] ).
add_arc( Key, Arc, [Hd|Tail1], [Hd|Tail2] ):-
add_arc( Key, Arc, Tail1, Tail2 ).
select(X,[X|L],L).
select(X,[Y|L1],[Y|L2]):- select(X,L1,L2).
nonmember(_Type,_,[]).
nonmember(Type,X,[Y|L]):-
neq(Type,X,Y),
nonmember(Type,X,L).
neq(feature,F1,F2:_):-
features(Feats), % defined in test data section
domain_neq(F1,F2,Feats).
domain_neq(X,Y,[X|Dom]):- member(Y,Dom).
domain_neq(X,Y,[Y|Dom]):- member(X,Dom).
domain_neq(X,Y,[_|Dom]):- domain_neq(X,Y,Dom).
/*
Test Data
This section contains code which is not considered part of the core
program. In particular, besides test data, it includes the
definitions of features/1 and glb/3.
Greatest lower bounds in our type hierarchy are simple to compute. If one
of the types is top then the other one is our solution. If
neither one is top, then they must be identical. Otherwise no
solution exists. A little informally, we are saying that the most
specific type which is both a T and a top is
T itself.
(T in this case can of course also be top.)
If T1 and T2 are different, and
neither is top, then there are no types that are both
T1 and T2 (so a fortiori there is no most specific
such type), so the computation fails in this case.
*/
glb(top,T2,T2).
glb(T1,top,T1):-
definite(T1).
glb(T,T,T):- definite(T).
definite(T):- member(T,[f,g,h]).
features([1,2,3,4,a,b,c,d]).
test(1):- test(1,1,5,X), report(1,1,5,X).
test(2):- test(2,1,3,X), report(2,1,3,X).
test(3):- test(3,1,5,X), report(3,1,5,X).
test(Graph,FS1,FS2,Result):-
data(Graph,G0),
unify(FS1,FS2,G0,Result).
report(Graph,Start1,Start2,Result):-
data(Graph,G),
nl,
print('Input: '), print( G ), nl,
print('A: '), print(Start1), print(', B: '), print(Start2), nl,
print('Output: '), print(Result), nl.
data(1, % X=h/[],Y=h/[],f[a:g[a:X],c:g[a:X]]=f[a:g[a:Y],b:g[a:Y]].
[1-f/[a:2,c:3], %1
2-g/[a:4],
3-g/[a:4],
4-h/[],
5-f/[a:6,b:7], %5
6-g/[a:8],
7-g/[a:8],
8-h/[]
]).
data(2, % f[a:X,b:X]=f[a:g[b:Y,c:Z]].
[1-f/[a:2,b:2],
2-top/[],
3-f/[a:4],
4-g/[b:5,c:6],
5-top/[],
6-top/[]
]).
data(3, % Roots: 1, 5.
[1-f/[a:2,b:3],
2-f/[a:4],
3-f/[a:2],
4-f/[],
5-f/[a:6,b:7],
6-f/[],
7-f/[a:8],
8-f/[a:6]
]).
/*
Running the test data
| ?- test(1).
Input: [1-f/[a:2,c:3],2-g/[a:4],3-g/[a:4],4-h/[],5-f/[a:6,b:7],6-g/[a:8],7-g/[a:8],8-h/[]]
A: 1, B: 5
Output: [1- @5,2- @6,3-g/[a:4],4- @8,5-f/[c:3,a:6,b:7],6-g/[a:8],7-g/[a:8],8-h/[]]
yes
{source_info}
| ?- test(2).
Input: [1-f/[a:2,b:2],2-top/[],3-f/[a:4],4-g/[b:5,c:6],5-top/[],6-top/[]]
A: 1, B: 3
Output: [1- @3,2- @4,3-f/[b:2,a:4],4-g/[b:5,c:6],5-top/[],6-top/[]]
yes
{source_info}
| ?- test(3).
Input: [1-f/[a:2,b:3],2-f/[a:4],3-f/[a:2],4-f/[],5-f/[a:6,b:7],6-f/[],7-f/[a:8],8-f/[a:6]]
A: 1, B: 5
Output: [1- @5,2- @6,3- @7,4- @6,5-f/[a:6,b:7],6- @8,7-f/[a:8],8-f/[a:6]]
yes
Note that in both cases the computed solution can be found by starting
at the first member of the graph. So, in the first case, we start at
location 1 which contains the pointer @5, so we
first go to location 5, and continue from there.
*/