Chapter 2     Why Logic Programming is So Wonderful

One always has to start a programming language course with a little lecture one could call ``Why bother?'' There are a very great many programming languages available nowadays, and learning each one involves developing new habits, memorizing new things and learning new techniques. All of this can be quite a lot of WORK! So why bother? (WARNING: The rest of this chapter is loaded with PROPAGANDA!)

A programming language is a formal language, like the language of first order predicate logic, or like a grammar formalism such as HPSG or Categorial Grammar. As such it is endowed with a syntax which says what the legal expressions are, and in addition a number of ways of interpreting syntactically well-formed expressions. On the one hand, we have the interpretation that means something to us, the programmers. This is the declarative interpretation of a program, and it captures the way in which the symbols of the program are supposed to reflect the ``real world''. (In a sense, Model Theory is about the declarative interpretation of logical languages; aspects of human grammatical competence furnish the declarative interpretation of grammar formalisms.) Computers, of course, know nothing of the real world (many philosophers would say that computers know nothing, period). They just flip bits from one state to another, basically. More abstractly, computers interpret the programs we write in a way which is blind to the meanings we give our symbols, but which depends very much on the structure of the program and the placement of certain keywords. This meaning-free bit-mashing interpretation of a program furnishes its procedural interpretation. (In a sense, Proof Theory is about the procedural interpretation of logical languages; grammatical derivations (phrase structural, transformational, type-logical, etc.) provide a procedural interpretation of grammar formalisms.)

Sometimes you will hear people make claims like ``Prolog is a declarative language, while C is a procedural language.'' This is loose talk and rather misleading. Both languages have both declarative and procedural interpretations. The thing which makes Prolog special in this regard is that, while for C the procedural interpretation is fairly straightforward and the declarative interpretation is nightmarishly complicated, for (``pure'') Prolog both interpretations are extremely simple, and it is relatively straightforward to prove that they are equivalent in a certain sense.

The reason that logic programming languages like Prolog are considered ``declarative languages'' is that they are designed primarily with their declarative interpretation in mind. They are, I suppose, ``problem oriented'' where ``procedural'' languages are more ``machine oriented''. Instead of treating the statements of the language as commands, the computer treats them as descriptions, in particular as descriptions of what things are true in the world described by the program. Because logical descriptions of a problem have implicit consequences, we can still think of this static worldview in computational terms: computation in a logic programming setting is a process of making a program's implicit consequences explicit. For example, the logical description of human grammatical competence may have among its consequences that a certain linguistic expression (this sentence, for example) is to be assigned a particular semantic interpretation. That consequence is implicit in the statement of the grammar. Turning that grammar into a parser, for example, is essentially the problem of making that implicit consequence explicit. In this sense proof, that is, the explicit assertion of the consequence relation, can be thought of as computation and, maybe more remarkably, computation can be thought of as proof.

A lot of discussion has taken place in the past several decades over whether declarative or procedural approaches are ``better''. They certainly are different, and the differences are especially obvious when you are learning your first declarative language. These differences are very easy to over-emphasize! Still, there are some important things that really are easier to do in logic programming. Here are two of them.

2.1   From Descriptions of Problems to Solutions of Problems

Computation, whether we approach it from a declarative or procedural standpoint, is mainly about problem-solving. We always specify a problem and its solution by providing certain constraints which describe what information we are given to begin with, and certain other constraints which describe what counts as a solution. So, if done correctly, both procedural and declarative programming projects start from the same place: an essentially declarative description of the problem to be solved. Consider the problem of defining the dominance relation that holds between two nodes in a phrase structure tree when the first lies strictly closer to the root than the second along the same path. In what follows, we will give definitions in plain English, but phrased as carefully as may be. This is ``specificationese'', a special (ill-defined) language for giving (hopefully) unambiguous specifications of procedures without requiring the rigid syntax of, e.g., first order logic, or, for that matter, Prolog. Since we don't know anything about Prolog's syntax yet, we will use ``specificationese'' as an example of a logic programming language. (The corresponding Prolog code is given in an appendix; you can see for yourselves how closely it reflects the structure of these specifications, even without knowing anything about the rules of Prolog syntax.) A rather literal definition of dominance is the following.

X dominates Y in T iff
  there is a branch B in T such that
    X is higher than Y in B.
We assume for the moment that we can always extract the branches from a tree, and that a branch is a list, that is, a data structure with a head and a tail, where the tail is also a list.
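For example, a branch running from a root labeled a, through a node labeled b, down to a leaf labeled c is the three-element list [a, b, c]: its head is a, and its tail is the list [b, c], which in turn has head b and tail [c], itself a (one-element) list.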

X is higher than Y in B iff
  the position of X in B is P1 and
  the position of Y in B is P2 and
  P1 < P2.
Now we need to define what is meant by the position of an expression in a list of expressions. The obvious recursive definition is the following.

the position of X in L is N iff
  either
    X is the head of L, and
    N is 1,
  or the tail of L is L1, and
    the position of X in L1 is N1, and
    N is N1+1.
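For example, the position of c in the list [a, b, c] is 3: c is not the head of [a, b, c], so we look in its tail [b, c]; c is not the head there either, so we look in its tail [c]; now c is the head, so its position in [c] is 1, and each of the two recursive steps adds 1, giving 1 + 1 + 1 = 3.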
These definitions, together with a definition of the ``branch'' relation, give us enough information to actually compute the dominance relationships between nodes in a tree. The translation of this ``specificationese'' program into Prolog is provided in the appendix to this chapter; again, you should not need to know anything about Prolog syntax to see that the translation from the specification to Prolog is quite direct, indeed almost word for word. (The code in the appendix contains the definitions of trees and branches as well, and is in fact a complete, running Prolog program, though not by any means remarkably efficient.)

Those of you with some programming background probably already know that iterative procedures are often more efficient than recursive, non-iterative ones. Here is the corresponding iterative definition, which represents our first optimization of this logic program.

the position of X in L is N iff
  I equals 1 and
  a loop with index I, searching L for X, returns N.
This sounds very procedural, but we see immediately that a loop such as this is in fact just another kind of recursively defined relation.

a loop with index I, searching L for X, returns N iff
  either
    X is the head of L, and
    N equals I,
  or I1 is I+1, and
    the tail of L is L1, and
    a loop with index I1, searching L1 for X, returns N.
The only difference between these two definitions is that in the first we do not know the value of N until we have proved the recursive statement and found the value of N1. In the second, we can do our arithmetic before we prove the recursive conjunct. Thought of as ways of defining the ``position'' relation, there does not seem to be much to choose between the two definitions, except that the first is decidedly shorter and simpler. Thought of as procedures, however, the second one turns out to be more efficient. Using the first definition, the Prolog interpreter must remember, for every time the procedure is invoked, that it must still do some stuff after control returns from the recursive call. So the interpreter must create a stack frame, one frame each time the procedure calls itself. Using the second definition, there is nothing that needs to be remembered, so when the end of the recursion is reached, the Prolog interpreter can go straight back to the definition of ``higher'', where ``position'' was originally called. No stack frames need to be created (or discarded).
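To see the difference concretely, consider the question ``what is the position of c in the list [a, b, c]?'' Using the first definition, the proof unfolds like this, with arithmetic left pending at every level:

  the position of c in [a, b, c] is N
    the position of c in [b, c] is N1      (N is N1+1 still pending)
      the position of c in [c] is N2       (N1 is N2+1 still pending)
        c is the head, so N2 is 1, hence N1 is 2, hence N is 3.

Using the second definition, nothing is ever left pending:

  a loop with index 1, searching [a, b, c] for c, returns N
    a loop with index 2, searching [b, c] for c, returns N
      a loop with index 3, searching [c] for c, returns N
        c is the head, so N is 3, and we are done.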

At this point in the course, you certainly do not need to know what a ``stack frame'' is. All you need to grasp is that there can be two correct definitions of a relation which are logically equivalent (of course they must be equivalent or they cannot both be correct!) but which are not equally efficient. Furthermore, you should appreciate that knowing how to write efficient logic programs requires knowing a fair amount about how the logic program interpreter or compiler works; nonetheless, one can, in most cases, write correct logic programs knowing only how logic works. (And of course the syntax of your logic programming language, e.g., Prolog.)

There are lots of other ways this ``logic program'' could be optimized as well. For example, in order to find the position of Y in branch B we start over at the beginning of the branch, this time searching for Y. But clearly, if X does indeed dominate Y, then at some point while we are searching for the position of Y on the branch we will pass X. So why not search for them both in a single pass through the branch? And in that case, why bother actually calculating exact positions on the branch? Why not just work our way through the list looking for X and, once we have found it, then start looking for Y? Only if we find Y under these conditions can we say that it does indeed come after X. This way we can discover whether X is higher than Y in B with a single pass through B, and without doing any arithmetic. A little more thought (left as an exercise for the especially ambitious reader) and we can eliminate the call to ``branch'' entirely, working instead directly with the tree.
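For concreteness, here is a minimal sketch of this single-pass version of ``higher'', written in the style of the appendix code and using its head, tail and member relations; these two clauses are our own illustration, not part of the appendix program (and the tree-based version really is left as the exercise).

higher(X,Y,B):-
        head(X,B),          % we have found X;
        tail(B,Rest),       % Y must now occur somewhere
        member(Y,Rest).     % in the rest of the branch.
higher(X,Y,B):-
        tail(B,Rest),       % X not found yet:
        higher(X,Y,Rest).   % keep scanning the branch.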

This kind of optimization is a little different than the optimization where we introduced an iterative recursion in place of a non-iterative one. There we needed to know something about the interpreter or compiler that implements our programming language. Our reasoning here is more general, and more abstract. But it is clearly algorithmic reasoning, not logical reasoning. Once again, we are faced with many different logically equivalent definitions of a relation, and the basis of our choice is efficiency, here interpreted as simply the number of steps we must perform to accomplish a certain task.

This is the kind of reasoning that goes into the discovery of better and better algorithms in general, whether for implementation in ``declarative'' languages or ``procedural'' languages. And in fact the more reasoning we do, and the more optimization we do, the more closely our supposedly declarative programs will come to resemble supposedly procedural programs. After all, at the heart of all computation, whether procedural or declarative, we will always find algorithms. So it is important not to over-emphasize the difference between declarative programming and procedural programming. Much more important is the distinction between reasoning about computers and reasoning about problems. And here pretty much everyone is in agreement: reasoning about problems comes first and foremost.

If you are in a situation where you have to squeeze the last ounce of performance from your programs, then there will always come a time at which you will say to yourself ``I could make this go faster if I rewrote it in C.'' In some cases that may be true. However, given the current state of logic programming technology, it almost never is! The truly significant gains in speed and memory use always come from reasoning about the problem, and reasoning is always better and clearer if it is done logically. The true advantage of logic programming lies here, then: logic programming presents us with the fewest possible barriers between reasoning and programming.

2.2   From Theories to Programs

As linguists, we are generally less interested in the performance of a program than we are in its correctness. In the case of simple mathematical problems like defining and calculating dominance relations in a tree structure, the underlying theory is already well worked out and not subject to much debate. We can safely assume that we know what a tree is, and what operations are important in dealing with one. In computational linguistics and computational cognitive science more generally we are not in this position. While it may appear that we understand what a ``sentence'' is, it is not at all clear that we understand what a syntactic analysis of a sentence is. We cannot be certain what the primitive objects of the theory are, nor what operations are possible on them, nor what constraints exist on the computation of these operations. Linguistic theories are extremely complex; they exist in many varieties, which are at best only partly compatible; and they are subject to change almost from day to day. So in cognitive science, when one writes a computer program one must always be prepared to demonstrate rigorously exactly how that program is related to the corpus of Theory.

This is not always easy to do! It is especially difficult if the program is written in a procedural language. The reason is obvious enough: theories are almost always presented to us declaratively, not procedurally. This is especially true in linguistics. A grammar is meant to tell us what the analysis of a particular expression is, not how to arrive at that analysis. This is just one way of stating the well-known distinction between competence and performance. But the problems don't end there. Even if we are interested in linguistic performance---suppose, for example, that we are collaborating with psychologists rather than linguists---we still have to start from theories. The theory may be about how certain types of sentences require especially large amounts of a person's working memory to process, which on the face of it at least is more ``computational''. Nonetheless, even here you will find that psychological theories (for example) are not algorithmic. A great deal remains to be filled in by the computationalist. And once again, every decision the computationalist makes must be rigorously justified with respect to the original theory. You must be prepared to prove that your program neither introduces answers which are not consequences of the theory nor fails to find answers that are consequences of the theory.

As an example, admittedly oversimplifying the problems involved, let us consider the recognition problem. The recognition problem for a particular grammar G is the problem of determining for any given string of symbols S whether or not S can be derived from G, that is, whether or not S is in the language generated by G. Consider the following very simple phrase structure grammar. (Here S stands for sentence, NP for noun phrase, VP for verb phrase, and Det for determiner, i.e., articles like ``the'', quantifiers like ``every'', possessives like ``John's'', etc.)

S → NP VP     (1.1)
NP → Det Noun     (1.2)
NP → ``it''     (1.3)
Det → ``a''     (1.4)
Noun → ``person''     (1.5)
Noun → ``thing''     (1.6)
VP → ``exists''     (1.7)

A grammar generates a string if there is a rewriting derivation beginning with the start-symbol S, at each step rewriting some non-terminal symbol in the current string until eventually the string consists only of terminal symbols. This grammar generates sentences like A thing exists. (For example, apply rule 1.1, then rules 1.2, 1.4, 1.6, 1.7.) Therefore, A thing exists is in the language generated by this grammar; Exists a thing is not.
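Written out step by step, that derivation looks like this:

S  ⇒  NP VP                          (rule 1.1)
   ⇒  Det Noun VP                    (rule 1.2)
   ⇒  ``a'' Noun VP                  (rule 1.4)
   ⇒  ``a'' ``thing'' VP             (rule 1.6)
   ⇒  ``a'' ``thing'' ``exists''     (rule 1.7)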

The following Prolog program implements a solution to the recognition problem for this grammar.
recognize( String ) :- 
        s( String ).
Translation: ``For all String, String is recognized if it is an s.''
s( String ) :- 
        np( String1 ), 
        vp( String2 ), 
        concatenation( String1, String2, String ).
Translation: ``String is an s if there is an np String1 and a vp String2, and String is the result of concatenating String1 and String2.''
np( String ) :- 
        det( String1 ),
        noun( String2 ), 
        concatenation( String1, String2, String ).
np( String ) :- string_eq( String, it ).
        
det( String ) :- string_eq( String, a ).
        
noun( String ) :- string_eq( String, thing ).
noun( String ) :- string_eq( String, person ).
        
vp( String ) :- string_eq( String, exists ).
For the moment we will say nothing about how concatenation or string_eq might be implemented, but will assume that they can be implemented correctly, somehow. (They can, don't worry.)
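If you would like to peek anyway, here is one way they could be defined, under the purely illustrative assumption that a string is represented as a list of words; this representation and these clauses are our own sketch, not part of the program above.

% Illustration only: assume a string is a list of words.
string_eq( [Word], Word ).

% Concatenating two strings is appending their word lists.
concatenation( [], String, String ).
concatenation( [Word|Rest1], String2, [Word|Rest3] ) :-
        concatenation( Rest1, String2, Rest3 ).

Under that representation, recognize( [a, thing, exists] ) succeeds and recognize( [exists, a, thing] ) fails, just as it should.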

Interestingly, the description of the recognition problem for this grammar looks very much like the grammar itself! For every rule in the grammar there is exactly one sentence in the program, and the form of the sentence in the program is completely determined by the form of the rule. In a sense, the ability to write declarative programs allows us to blur the distinction between competence and performance. Once we have adequately described (some aspect of) linguistic competence, we almost immediately derive a computational device capable of (some limited) linguistic performance. In fact, in a week or so we will see how to construct a recognizer without even mentioning concatenation explicitly; at that point, our recognizer will essentially be just our grammar with the arrows turned around!

Suppose now that the theory changes. Let's say we discover a new verb, sneezes, or we discover relative clauses and want to be able to recognize sentences like A thing which exists exists. It is as simple to add clauses to our program as it is to add rules to our grammar; the sketch after this paragraph shows the addition for sneezes. Just as with our dominance program, there are many ways we can optimize our recognizer, though they are perhaps not as obvious. Still it must be borne in mind that every optimization we make takes us farther and farther away from the original theory. The primary goal in implementing linguistic theories is always clarity. Remember: Premature optimization is the root of all evil! (Incidentally, that was said by a computer scientist, not a linguist!)
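For sneezes, the additions would be just these (we number the new grammar rule 1.8 purely for concreteness):

VP → ``sneezes''     (1.8)

vp( String ) :- string_eq( String, sneezes ).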

2.3  Appendix to Chapter 2: Prolog code for the ``dominates'' relation

dominates(X,Y,T):-
        branch(B,T),
        higher(X,Y,B).

higher(X,Y,B):-
        position(X,B,P1),
        position(Y,B,P2),
        P1 < P2.
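
% The non-iterative definition of position from the text, kept here for
% reference but commented out; the iterative definition below is the one
% actually used.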
/*
position(X,B,N):-
        head(X,B),
        N is 1.
position(X,B,N):-
        tail(B,B1),
        position(X,B1,N1),
        N is N1+1.
*/
position(X,L,N):-
        I is 1,
        loop(I,X,L,N).

loop(I,X,L,N):-
        head(X,L),
        N is I.
loop(I,X,L,N):-
        I1 is I+1,
        tail(L,L1),
        loop(I1,X,L1,N).

branch(B,T):-
        leaf(T,R), % T is a leaf with label R.
        head(R,B),
        tail(B,[]).
branch(B,T):-
        root(T,R),
        head(R,B),
        tail(B,BTail),
        daughter(T,TDtr),
        branch(BTail,TDtr).


% List data structure:
head(H,[H|_]).

tail([_|T],T).


% Tree data structure:
leaf(t(R),R).

root(t(R,_),R).

daughter(t(_,DtrList),D):- member(D,DtrList).

member(X,L):- head(X,L).
member(X,L):- tail(L,L1), member(X,L1).


% Test data, for testing branch and dominates:
data(1,t(a,[A1,A2])):-
        A1 = t(a1,[A11,A12]),
        A2 = t(a2,[A21,A22,A23]),
        A11 = t(a11),
        A12 = t(a12),
        A21 = t(a21),
        A22 = t(a22,[t(a221)]),
        A23 = t(a23).
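
% Sample queries to try once this file is loaded:
%   ?- data(1,T), dominates(a2,a221,T).     (succeeds)
%   ?- data(1,T), dominates(a11,a2,T).      (fails: a11 and a2 never lie
%                                            on the same branch)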
