Chapter 2
Why Logic Programming is So Wonderful
One always has to start a programming language course with a little
lecture one could call ``Why bother?'' There are a very great many
programming languages available nowadays, and learning each one involves
developing new habits, memorizing new things and learning new techniques.
All of this can be quite a lot of WORK! So why bother?
(WARNING: The rest of this chapter is loaded with PROPAGANDA!)
A programming language is a formal language, like the language of
first order predicate logic, or like a grammar formalism like HPSG or
Categorial Grammar. As such it is endowed with a syntax, which says
what the legal expressions are, and with a number of ways of
interpreting syntactically well-formed expressions.
On the one hand, we have the interpretation that means something to
us, the programmers. This is the declarative interpretation of
a program, and reflects the way in which the symbols of the program
are supposed to reflect the ``real world''. (In a sense, Model Theory
is about the declarative interpretation of logical languages; aspects of
human grammatical competence furnish the declarative interpretation of
grammar formalisms.)
Computers, of course, know nothing of the real world (many
philosophers would say that computers know nothing, full stop). They just
flip bits from one state to another, basically. More abstractly,
computers interpret the programs we write in a way which is blind to
the meanings we give our symbols, but which depends very much on the
structure of the program and the placement of certain keywords. This
meaning-free bit-mashing interpretation of a program furnishes its
procedural interpretation. (In a sense, Proof Theory is about the
procedural interpretation of logical languages; grammatical
derivations (phrase structural, transformational, type-logical, etc.)
provide a procedural interpretation of grammar formalisms.)
Sometimes you will hear people make claims like ``Prolog is a
declarative language, while C is a procedural language.''
This is loose talk and rather misleading. Both languages have both
declarative and procedural interpretations. The thing which makes
Prolog special in this regard is that, while for C the procedural
interpretation is fairly straightforward and the declarative
interpretation is nightmarishly complicated, for (``pure'') Prolog both
interpretations are extremely simple, and it is relatively
straightforward to prove that they are equivalent in a certain sense.
The reason that logic programming languages like Prolog are considered
``declarative languages'' is that they are designed
primarily with their declarative
interpretation in mind.
They are, I suppose, ``problem oriented'' where ``procedural''
languages are more ``machine oriented''.
Instead of treating the statements of the language as
commands, the computer treats them as descriptions, in particular
as descriptions of what things are true in the world described by the
program.
Because logical descriptions of a problem have implicit
consequences, we can still think of this static worldview in
computational terms: computation in a logic programming setting is a
process of making a program's implicit consequences
explicit.
For example, the logical description of human grammatical competence
may have among its consequences that a certain linguistic expression
(this sentence, for example) is to be assigned a particular semantic
interpretation. That consequence is implicit in the statement of the
grammar. Turning that grammar into a parser, for example, is
essentially the problem of making that implicit consequence
explicit. In this sense proof, that is, the explicit assertion
of the consequence relation, can be thought of as computation
and, maybe more remarkably, computation can be thought of as proof.
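To make this concrete, here is a tiny preview of Prolog (we introduce
its syntax properly later; for now, read ``:-'' as ``if'' and the comma
as ``and''; the family facts are, of course, my own invention).
parent(tom,ann).        % a fact: Tom is a parent of Ann.
parent(ann,sue).        % a fact: Ann is a parent of Sue.
grandparent(X,Z):-      % a rule: X is a grandparent of Z if
    parent(X,Y),        %   X is a parent of some Y, and
    parent(Y,Z).        %   Y is a parent of Z.
Nowhere does this program state that Tom is a grandparent of Sue, yet
that is an implicit consequence of what it does state. The query
``?- grandparent(tom,Who).'' sets the computation going, and the answer
``Who = sue'' is exactly that consequence, made explicit.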
A lot of discussion has taken place in the past several decades over
whether declarative or procedural approaches are ``better''. They
certainly are different, and the differences are especially obvious when
you are learning your first declarative language. These differences
are very easy to over-emphasize! Still, there are some important things
that really are easier to do in logic programming.
Here are two of them.
2.1 From Descriptions of Problems to Solutions of Problems
Computation, whether we approach it from a declarative or
procedural standpoint, is mainly about problem-solving.
We always specify a problem and its solution
by providing certain constraints which describe what information we
are given to begin with, and certain other constraints which describe what
counts as a solution. So, if done correctly, both procedural and
declarative programming projects start from the same place: an essentially
declarative description of the problem to be solved.
Consider the problem of defining the dominance relation that holds
between nodes in a phrase structure tree when the first lies strictly
closer to the root than the other along the same path.
In what follows, we will give definitions in plain English, but phrased as
carefully as may be. This is ``specificationese'', a special
(ill-defined) language
for giving (hopefully) unambiguous specifications of procedures
without requiring the rigid syntax of, e.g., first order logic, or,
for that matter, Prolog.
Since we don't know anything about Prolog's syntax yet, we will use
``specificationese'' as an example of a logic programming language.
(The corresponding Prolog code is given in an
appendix; you can see for yourselves how closely it reflects the
structure of these specifications, even without knowing anything about
the rules of Prolog syntax.)
A rather literal
definition of dominance is the following.
X dominates Y in T iff
    there is a branch B in T such that
        X is higher than Y in B.
We assume for the moment that we can always extract the branches from
a tree, and that a branch is a list, that is, a data structure
with a head and a tail, where the tail is also a list. (The list
a, b, c, for example, has head a and tail the list b, c.)
X is higher than Y in B iff
    the position of X in B is P1 and
    the position of Y in B is P2 and
    P1 < P2.
|
Now we need to define what is meant by the position of an expression
in a list of expressions. The obvious recursive definition is the
following.
the position of X in L is N iff
    either
        X is the head of L, and
        N is 1,
    or
        the tail of L is L1, and
        the position of X in L1 is N1, and
        N is N1+1.
These definitions, together with a definition of the ``branch''
relation, give us enough information to actually compute the dominance
relationships between nodes in a tree. The translation of this
``specificationese'' program into Prolog is provided in the appendix
to this chapter; again, you should not need to know anything about
Prolog syntax to see that the translation from the specification to
Prolog is quite direct, indeed almost word for word. (The code in the
appendix contains the definitions of trees and branches as well, and
is in fact a complete, running Prolog program, though not by any means
remarkably efficient.)
Those of you with some programming background probably already know
that often iterative procedures are more efficient than
recursive, non-iterative ones. Here is the corresponding iterative
definition,
which represents our
first optimization of this logic program.
the position of X in L is N iff
    I equals 1 and
    a loop with index I, searching L for X, returns N.
|
This sounds very procedural, but we see immediately that a loop such
as this is in fact just another kind of recursively defined relation.
a loop with index I, searching L for X, returns N iff
    either
        X is the head of L, and
        N equals I,
    or
        I1 is I+1, and
        the tail of L is L1, and
        a loop with index I1, searching L1 for X, returns N.
The only difference between these two definitions is that in the first
we do not know the value of N until we have proved the recursive
statement and found the value of N1. In the second, we can do our
arithmetic before we prove the recursive conjunct. Thought of as ways
of defining the ``position'' relation, there does not seem to be much
to choose between the two definitions, except that the first is
decidedly shorter and simpler. Thought of as procedures, however,
the second one turns out to be more efficient. Using the first
definition, the Prolog interpreter must remember, for every time the
procedure is invoked, that it must still do some stuff after control
returns from the recursive call. So the interpreter must create a
stack frame, one frame each time the procedure calls
itself. Using the second definition, there is nothing that needs to be
remembered, so when the end of the recursion is reached, the Prolog
interpreter can go straight back to the definition of ``higher'',
where ``position'' was originally called. No stack frames need to be
created (or discarded).
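To see the difference concretely, here are the recursive clauses of the
two definitions as they appear in the appendix, with annotations added.
(You need not follow the syntax; just note where the pending work sits.)
position(X,B,N):-
    tail(B,B1),
    position(X,B1,N1),   % the recursive call...
    N is N1+1.           % ...with arithmetic still pending after it.

loop(I,X,L,N):-
    I1 is I+1,           % the arithmetic is done up front,
    tail(L,L1),
    loop(I1,X,L1,N).     % so the recursive call is the very last step.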
At this point in the course, you certainly do not need to know what a
``stack frame'' is. All you need to grasp is that there can be two
correct definitions of a relation which are logically equivalent (of
course they must be equivalent or they cannot both be correct!) but
which are not equally efficient. Furthermore, you should appreciate
that knowing how to write efficient logic programs requires knowing a
fair amount about how the logic program interpreter or compiler works;
nonetheless, one can, in most cases,
write correct logic programs knowing only
how logic works. (And of course the syntax of your logic
programming language, e.g., Prolog.)
There are lots of other ways this ``logic program'' could be optimized
as well. For example, in order to find the position of Y in branch B
we start over at the beginning of the branch, this time searching for
Y. But clearly, if X does indeed dominate Y, then at some point while
we are searching for the position of Y on the branch we will pass
X. So why not search for them both in a single pass through the
branch? And in that case, why bother actually calculating exact
positions on the branch? Why not just work our way through the list
looking for X, and, once we have found it, then start looking
for Y. Only if we find Y under these conditions can we say that it
does indeed come after X. This way we can discover whether X is higher
than Y in B with a single pass through B, and without doing any
arithmetic. A little more thought (left as an exercise to the
especially ambitious reader)
and we can eliminate the call to ``branch'' entirely, working instead
directly with the tree.
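Here is one way such a single-pass definition of ``higher'' might look
in Prolog, using the head, tail and member relations defined in the
appendix. (This is a sketch of the optimization just described, not
part of the appendix program.)
higher(X,Y,B):-
    head(X,B),           % once X is found at the head of B,
    tail(B,B1),
    member(Y,B1).        %   Y need only occur somewhere after it.
higher(X,Y,B):-
    tail(B,B1),          % otherwise, keep walking down B
    higher(X,Y,B1).      %   looking for X.
No positions are computed and no arithmetic is done; the branch is
traversed just once.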
This kind of optimization is a little different from the optimization
where we introduced an iterative recursion in place of a non-iterative
one. There we needed to know something about the interpreter or
compiler that implements our programming language. Our reasoning here
is more general, and more abstract. But it is clearly
algorithmic reasoning, not logical reasoning.
Once again, we are faced with many different logically
equivalent definitions of a relation, and the basis of our choice
is efficiency, here interpreted as simply the number of steps we must
perform to accomplish a certain task.
This is the kind of reasoning that goes into the discovery of
better and better algorithms in general, whether for implementation in
``declarative'' languages or ``procedural'' languages.
And in fact the more reasoning we do, and the more optimization we do, the
more closely our supposedly declarative programs will come to resemble
supposedly procedural programs. After all, at the heart of all computation,
whether procedural or declarative, we will always find algorithms. So it is
important not to over-emphasize the difference between declarative
programming and procedural programming. Much more important is the
distinction between reasoning about computers and reasoning
about problems. And here pretty much everyone is in agreement:
reasoning
about problems is first and foremost.
If you are in a situation where you
have to squeeze the last ounce of performance from your programs, then
there will always come a time at which you will say to yourself ``I could
make this go faster if I rewrote it in C.'' In some cases that may be true.
However, given the current state of logic programming technology,
it almost never is! The truly significant gains in speed
and memory use always come from reasoning about the problem, and reasoning
is always better and clearer if it is done logically. The true
advantage of logic programming lies here, then:
logic programming presents us
with the fewest possible barriers between reasoning and programming.
2.2 From Theories to Programs
As linguists, we are generally less interested in the performance of a
program than we are in its correctness. In the case of simple
mathematical problems like defining and calculating dominance
relations in a tree structure,
the underlying theory
is already well worked out and not subject to much debate. We can safely
assume that we know what a tree is, and what operations are important
in dealing with one.
In computational linguistics and computational cognitive science more
generally we are not in this position. While it may appear that we
understand what a ``sentence'' is, it is not at all clear that we
understand what a syntactic analysis of a sentence is. We cannot be
certain what the primitive objects of the theory are, nor what operations
are possible on them, nor what constraints exist on the computation of
these operations. Linguistic theories are extremely complex; they exist
in many varieties, which are at best only partly compatible; and they
are subject
to change almost from day to day. So in cognitive science, when one
writes a computer program one must always be prepared to demonstrate
rigorously just exactly how that program is related to the corpus of Theory.
This is not always easy to do! It is especially difficult if the program
is written in a procedural language. The reason is obvious enough:
theories are almost always presented to us declaratively, not
procedurally. This is especially true in linguistics. A grammar is meant
to tell us what the analysis of a particular expression is, not how to
arrive at that analysis. This is just one way of stating the well-known
distinction between competence and performance. But the
problems don't end there. Even if we are interested in linguistic
performance---suppose, for example, that we are collaborating
with psychologists rather than
linguists---we still have to start from theories. The theory may be about
how certain types of sentences require especially large amounts of a
person's working memory to process, which on the face of it at least is
more ``computational''. Nonetheless, even here you will find that
psychological theories (for example) are not algorithmic. A great deal
remains to be filled in by the computationalist. And once again, every
decision the computationalist makes must be rigorously justified
with respect to the original theory. You must be prepared to prove
that your program does not introduce answers which are not consequences
of the theory, nor does it fail to find answers that are consequences of
the theory.
As an example, admittedly oversimplifying the problems involved,
let us consider the recognition problem. The
recognition problem for a particular grammar G is the problem of
determining, for any given string of symbols S, whether or not S
can be derived from G, that is, whether or not S is in the
language generated by G. Consider the following very simple phrase
structure grammar. (Here S stands for sentence, NP for noun phrase, VP
for verb phrase, and Det for determiner, i.e., articles like ``the'',
quantifiers like ``every'', possessives like ``John's'', etc.)
S    → NP VP          (1.1)
NP   → Det Noun       (1.2)
NP   → ``it''         (1.3)
Det  → ``a''          (1.4)
Noun → ``person''     (1.5)
Noun → ``thing''      (1.6)
VP   → ``exists''     (1.7)
A grammar generates a string if there is a rewriting derivation beginning
with the start-symbol S, at each step rewriting some non-terminal
symbol in the current string until eventually the string consists only of
terminal symbols.
This grammar generates sentences like
A thing exists. (For example, apply rule 1.1,
then rules 1.2, 1.4, 1.6, 1.7.)
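Spelled out, that derivation runs: S ⇒ NP VP ⇒ Det Noun VP ⇒
a Noun VP ⇒ a thing VP ⇒ a thing exists.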
Therefore, A thing exists is in the set of expressions that it
recognizes; Exists a thing is not.
The following Prolog program implements a solution to the recognition
problem for this
grammar.
recognize( String ) :-
s( String ).
Translation: ``For all String, String is
recognized if it
is an s.''
s( String ) :-
np( String1 ),
vp( String2 ),
concatenation( String1, String2, String ).
Translation: ``String is an s
if there is an np
String1 and a vp
String2, and String is the result of
concatenating String1 and String2.''
np( String ) :-
det( String1 ),
noun( String2 ),
concatenation( String1, String2, String ).
np( String ) :- string_eq( String, it ).
det( String ) :- string_eq( String, a ).
noun( String ) :- string_eq( String, thing ).
noun( String ) :- string_eq( String, person ).
vp( String ) :- string_eq( String, exists ).
For the moment we will say nothing about how concatenation or
string_eq might be implemented, but will assume that they can
be implemented correctly, somehow. (They can, don't worry.)
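(For the curious, here is one way they might be implemented, treating a
string as a list of words. This is just one possible choice of
representation, my own assumption rather than anything the program
above commits us to.)
concatenation( [], L, L ).
concatenation( [H|T], L, [H|R] ) :-
    concatenation( T, L, R ).
string_eq( [Word], Word ).    % a one-word string equals that word.
Under this representation the query ``?- recognize( [a,thing,exists] ).''
succeeds.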
Interestingly, the description of the recognition problem for this grammar
looks very much like the grammar itself! For every rule in the grammar
there is exactly one sentence in the program, and the form of the sentence
in the program is completely determined by the form of the rule.
In a sense, the ability to write
declarative programs allows us to blur the distinction between competence
and performance. Once we have adequately described (some aspect of)
linguistic competence, we almost immediately derive a computational device
capable of (some limited) linguistic performance. In fact, in a week or
so we will see how to construct a recognizer without even mentioning
concatenation explicitly; at that point, our recognizer will essentially be
just our grammar with the arrows turned around!
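(For the impatient, here is roughly what that will look like, in
Prolog's grammar-rule notation, which we will introduce properly when
the time comes. Note how closely it mirrors rules 1.1 through 1.7.)
s --> np, vp.
np --> det, noun.
np --> [it].
det --> [a].
noun --> [person].
noun --> [thing].
vp --> [exists].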
Suppose now that the theory changes. Let's say we discover a new verb,
sneezes, or we discover relative clauses and want to be able to
recognize sentences like A thing which exists exists. It is as
simple to add clauses to our program as it is to add rules to our grammar.
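For instance, the new verb takes exactly one new clause, and a
(deliberately simplistic) treatment of relative clauses takes just one
more. (The rule behind the second clause, NP → Det Noun ``which'' VP,
is my own stipulation here, not part of grammar 1.1-1.7.)
vp( String ) :- string_eq( String, sneezes ).

np( String ) :-
    det( String1 ),
    noun( String2 ),
    string_eq( String3, which ),
    vp( String4 ),
    concatenation( String1, String2, String12 ),
    concatenation( String12, String3, String123 ),
    concatenation( String123, String4, String ).
With these clauses added, A thing which exists exists is recognized.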
Just as it was true that we could optimize our dominance program,
there are many ways we can optimize our recognizer,
though they are perhaps not as obvious. Still it must be borne in mind
that every optimization we make takes us farther and farther away from
the original theory. The primary goal in implementing linguistic theories
is always clarity. Remember: Premature optimization is the
root of all evil! (Incidentally, that was said by a computer scientist,
not a linguist!)
2.3 Appendix to Chapter 2: Prolog code for the ``dominates'' relation
dominates(X,Y,T):-
    branch(B,T),
    higher(X,Y,B).

higher(X,Y,B):-
    position(X,B,P1),
    position(Y,B,P2),
    P1 < P2.
/* The original, non-iterative definition of ``position''
   (kept for reference; superseded by the iterative version below):

position(X,B,N):-
    head(X,B),
    N is 1.
position(X,B,N):-
    tail(B,B1),
    position(X,B1,N1),
    N is N1+1.
*/

% The iterative version of ``position'', via ``loop'':
position(X,L,N):-
    I is 1,
    loop(I,X,L,N).
loop(I,X,L,N):-
    head(X,L),
    N is I.
loop(I,X,L,N):-
    I1 is I+1,
    tail(L,L1),
    loop(I1,X,L1,N).
branch(B,T):-
    leaf(T,R),           % T is a leaf with label R;
    head(R,B),           % the branch is then the
    tail(B,[]).          % one-element list [R].
branch(B,T):-
    root(T,R),           % R labels the root of T
    head(R,B),           % and heads the branch B;
    tail(B,BTail),       % the rest of B is a branch
    daughter(T,TDtr),    % of one of T's daughters.
    branch(BTail,TDtr).
% List data structure:
head(H,[H|_]).
tail([_|T],T).
% Tree data structure:
leaf(t(R),R).
root(t(R,_),R).
daughter(t(_,DtrList),D):- member(D,DtrList).
member(X,L):- head(X,L).
member(X,L):- tail(L,L1), member(X,L1).
% Test data, for testing branch and dominates:
data(1,t(a,[A1,A2])):-
    A1 = t(a1,[A11,A12]),
    A2 = t(a2,[A21,A22,A23]),
    A11 = t(a11),
    A12 = t(a12),
    A21 = t(a21),
    A22 = t(a22,[t(a221)]),
    A23 = t(a23).
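A few sample queries to try (the outcomes follow from the definitions
above):
?- data(1,T), branch(B,T).          % enumerates the branches of the test tree.
?- data(1,T), dominates(a,a221,T).  % succeeds: a lies above a221 on a branch.
?- data(1,T), dominates(a1,a22,T).  % fails: no branch contains both nodes.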