Cracking the Natural Language Code
Modern logic is best described as linguistics without tears. Our declared interest is language. But we have found that the language we actually speak is too complicated and that life is too short. So instead of studying real languages we investigate toy languages invented by ourselves for our own convenience. It is a bit as if zoologists, dismayed by the complexity of real ducks and bears, decided to study plastic ducks and stuffed teddy bears instead. Or perhaps as if an engineer, daunted by the complexities involved in building real bridges, spent his time playing with Lego pieces.
We flatter ourselves that somehow or other our findings concerning the toy languages will be relevant to understanding how a real language works. But the nature of this relevance has never been made clear.
We hammer the first-order predicate calculus into our freshmen and make them formalise natural-English sentences. I am no exception, for this is what a logician is paid to do. But I never do it with a clear conscience. I dread the day when a bright student raises his hand and asks me: 'When you tell us to formalise an English sentence, what exactly is it that you want us to do?' I have to confess that I would be stuck for an answer. I certainly want them to come up with a formula which has the same truth-condition as the given English sentence. But this is not all I require. A student who formalised the sentence 'It is raining or not raining' as '(∀x)(U(x) ⊃ U(x))', where U stands for 'unicorn', would not get a good mark, despite the fact that the sentence and the formula have the same truth-condition. This is because the English sentence is about rain and the formula about unicorns, and this makes them bad translations of each other. Sameness of subject matter is obviously one of the minimal conditions of intertranslatability. Thus perhaps what I want the student to do is translate the English sentence into my toy language. But this cannot be what I want either. For in the case of the sentence 'All unicorns are unicorns' I do want the students to come up with '(∀x)(U(x) ⊃ U(x))', despite the fact that the sentence and the formula do not agree in subject matter. The hook in the formula refers to a certain truth-function, called the material conditional, but there is no mention of that truth-function in the English sentence. To refer to that truth-function in English one has to use the connective 'if then'; there is clearly no sign of anything like 'if then' in the sentence.
One surprising fact is that the made-up languages we logicians like to play with have been a great hit not just with ourselves but also with those whose declared business is to study languages that people actually speak. Rather than being impressed by the vast gulf separating the logician's calculi from the subject of their study, modern linguists are mesmerised by them and try to emulate the logician's methods in studying natural languages.
One distinctive feature of the artificial notations invented to simplify the logician's life is absence of
ambiguities. This has one important consequence. If the meaning of every expression is uniquely reflected in the syntactical shape of the expression itself, as it is in the logician's ideography, the combinability of two or more expressions can be described purely in terms of the syntactic shape of those expressions. Thus it is that the logician can afford to separate syntax from semantics. He can describe his language in two instalments, so to speak. First he tells us how the meaningful expressions of the language are put together, ignoring for the time being what they mean; only later does he turn to matters of meaning.
It is perhaps worth noting that even when dealing with the logician's toy language, separating syntax from semantics is not a particularly rational way to proceed. The recursive rules of semantic interpretation are perfectly parallel to those of syntactic formation, so defining syntax and semantics simultaneously would seem far more economical than going through the same recursive rigmarole twice over. But in the case of a formal language this is just a matter of elegance and economy. There is no logical objection to doing the job in two separate stages.
Now the linguists emulate the logicians, completely oblivious of the fact that the ambiguity of the language they deal with precludes the logician's two-stage procedure. The well-formedness of a natural-language compound depends on more than the syntactic shape of its components. It depends equally on what those components mean. A very crude example is the phrase 'Bill and Hillary Clinton' which is only well-formed if the two names refer to reasonably close relatives; if they refer to unrelated people the coordination is just as unacceptable as, say 'Bill Clinton and Buckley' (where Bill is supposed to go with 'Buckley' as well as 'Clinton'). A slightly more sophisticated example is 'The chicken is ready to eat and sleep', which is only well-formed if 'ready to eat' means 'ready to eat something or other'. If it means 'ready to be eaten' the very same form of words is no more acceptable than the phrase 'The chicken is ready to eat and or'. Hence in the case of a natural language any attempt to separate syntax from semantics is not only inelegant and wasteful, but outright impossible. Yet the reigning linguistic methodology is to divide grammar into two so-called modules, a syntactic module, where uninterpreted sentences are formed according to purely syntactic rules, and a semantic module where strings independently thrown up by the syntactic module are assigned meanings.
Most linguistic theories, notably that of the dominant Chomsky school, actually never get around to describing the alleged semantic module. There is some irony in that since it is Chomsky himself who has coined the very apposite slogan that a grammar is a generator of sentence-meaning pairs. As far as I can tell, not a single sentence-meaning pair has yet been generated by Chomsky or his followers.
There are other schools which, at least on the face of it, take Chomsky's slogan seriously. I am referring to Montague Grammar and other theories inspired by Montague's approach. But on closer inspection it turns out that what those grammars generate are not sentence-meaning pairs but sentence-sentence pairs. They provide rules whereby sentences of English can be associated with sentences of some other language. Montague Grammar, for example, offers rules whereby English
sentences can be mapped onto formulas of an artificial language called Intensional Logic (or IL for short).
One obvious point is that the other language is no less in need of a semantic account than English is, so that, on this approach, the problem of semantic interpretation is not solved but merely pushed back, passed on to another language.
It will perhaps be objected that the interpretation of a language like Montague's Intensional Logic is unproblematic, that it is, indeed, explicitly defined. It is far from clear that this is the case. What Montague's semantic rules explicitly define are the truth-conditions of the IL formulas. But as I have noted already, there must be more to meaning than truth-conditions, for otherwise '(∀x)(U(x) ⊃ U(x))' and 'P ∨ ~P' would be semantically indistinguishable. Meanings, as Max Cresswell rightly emphasises, are structured entities, complexes, not simple objects like truth-conditions. If formulas like '(∀x)(U(x) ⊃ U(x))' and 'P ∨ ~P' are to be semantically distinct, they must express complexes which contain the referents of U and P as constituents. In the truth-conditions of the formulas those referents are irretrievably lost.
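The contrast between truth-conditions and structured meanings can be illustrated with a small sketch. It is only an illustration under stated assumptions: the nested-tuple encoding of formulas, the function names `evaluate`, `constituents` and `all_models`, and the restriction to models over a two-element domain are all hypothetical choices made here, not part of Montague's Intensional Logic. The sketch checks that the two formulas agree in truth value in every such model while sharing no non-logical constituent.

```python
from itertools import product

# Formulas as nested tuples (a hypothetical encoding, not Montague's IL).
EXCLUDED_MIDDLE = ("or", ("atom", "P"), ("not", ("atom", "P")))
UNICORN_LAW = ("forall", "x", ("implies", ("pred", "U", "x"), ("pred", "U", "x")))


def evaluate(formula, model, env=None):
    """Return the truth value of `formula` in `model` under assignment `env`."""
    env = env or {}
    op = formula[0]
    if op == "atom":
        return model["atoms"][formula[1]]
    if op == "pred":
        return env[formula[2]] in model["preds"][formula[1]]
    if op == "not":
        return not evaluate(formula[1], model, env)
    if op == "or":
        return evaluate(formula[1], model, env) or evaluate(formula[2], model, env)
    if op == "implies":
        return (not evaluate(formula[1], model, env)) or evaluate(formula[2], model, env)
    if op == "forall":
        return all(evaluate(formula[2], model, {**env, formula[1]: d})
                   for d in model["domain"])
    raise ValueError(f"unknown operator: {op}")


def constituents(formula):
    """Collect the non-logical symbols (atoms and predicates) in a formula."""
    op = formula[0]
    if op in ("atom", "pred"):
        return {formula[1]}
    if op == "forall":
        return constituents(formula[2])
    return set().union(*(constituents(sub) for sub in formula[1:]))


def all_models(domain=(0, 1)):
    """Enumerate every model over a small domain for atom P and predicate U."""
    extensions = [{d for d, keep in zip(domain, bits) if keep}
                  for bits in product([False, True], repeat=len(domain))]
    for p_value, extension in product([False, True], extensions):
        yield {"domain": domain, "atoms": {"P": p_value}, "preds": {"U": extension}}


same_truth_condition = all(
    evaluate(EXCLUDED_MIDDLE, m) == evaluate(UNICORN_LAW, m) for m in all_models()
)
print(same_truth_condition)           # True
print(constituents(EXCLUDED_MIDDLE))  # {'P'}
print(constituents(UNICORN_LAW))      # {'U'}
```

The truth-value comparison cannot tell the two formulas apart; only the structural inspection done by `constituents` recovers the difference between P and U.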
But maybe some will argue that the complexes expressed by the formulas of Intensional Logic can be reconstructed from Montague's definitions of their truth-conditions. So let us put the complexity quibble aside and assume that Montague's theory provides interpretations of the artificial formulas which it associates with English sentences.
On this assumption a Montague grammar does, indirectly, generate sentence-meaning pairs. Which leaves us with one last question. Does it associate the right meanings with the right sentences? Please note that I do not mean to ask whether the truth-condition that the grammar associates with a sentence is correct. I am happy to grant that it is. By 'meaning' I mean structured meaning, what Russell used to call a denoting complex. Structured meanings are finer than truth-conditions because one and the same truth condition, as we have seen, can be specified by many different complexes of this sort. The question I want to raise is whether the complex that Montague's theory associates, via an IL formula, with an English sentence is the one which is expressed by that sentence.
Since the point I want to make is somewhat subtle, let me first give you an analogy. Suppose you own an electronic digital watch and ask your friend how the thing works. To enlighten you, your friend shows you a grandfather clock and says: 'Look, when your watch shows 12:00, for example, it is telling you the same thing as the clock tells you by having both its hands pointing straight up. Indeed every reading on the watch translates into a reading on the dial of the clock.' You may reply: 'Well, this is fascinating, but what I wanted to know was how my watch works, how it manages to track the fugitive present moment.' Now imagine that your friend reacts to this by unveiling the mechanism of the grandfather clock, explaining in all detail the synergy of the weights, springs and interlocking cogs.
Clearly your curiosity will still be unsatisfied. You wanted to know what makes your digital watch
tick and all you learned is what makes the grandfather clock tick. You know now, for example, how the grandfather clock manages to come up with both hands up at the right time, and you also know that both hands up on the clock is tantamount to 12:00 on your watch; but you still do not know how your watch manages to come up with 12:00 at the right moment.
Now think of the English language as the electronic watch and of Montague's Intensional Logic as the grandfather clock. You want to understand how English works. Montague tells you: Here is another language, IL. I will show you how to formalise every English expression in IL. This clearly won't assuage your curiosity. A bilingual person may be completely ignorant of how either of the two languages he has mastered works. Now Montague tells you: but wait a minute, I can tell you not only how to translate English into IL, I can also explain to you in all detail how IL works. I can give you a detailed semantics for it. Clearly, when he has delivered all that, your question still remains unanswered.
Just as every time-piece keeps track of the present moment in a specific way, so every language encodes meanings in a specific way. The task of the linguist is to decipher the code peculiar to English. This task is not discharged by translating English sentences into a language which is based on completely different coding principles. Such a procedure leaves the specific coding principles of English unexamined.
But things are worse than that. Montague's theory not only fails to explore the specific coding principles of English. The IL formulas which it associates with English sentences are not even translations of them. They are just what are known as formalisations. The English sentence 'Every unicorn walks', for example, is associated with the IL formula '(∀x)(U(x) ⊃ W(x))', which is hardly an acceptable translation of the sentence. Surely the phrase 'every unicorn' represents a unit of the English sentence's meaning, but there is nothing corresponding to it in the formula. On the other hand, the hook in the formula refers to a truth-function which is never mentioned in the English sentence.
Let me illustrate the point with a more recent, but otherwise completely random
example. Barry Richards, trying to shed semantic light on tensed sentences of English,
offers the following definition:
PAST(v,t)A is true in model M at (w,i) if
w = gc(v), i = gc(t) and there is an interval j < i such that A
is true in M at (w,j); it is false at (w,i) if either w ≠ gc(v) or i ≠ gc(t)
or there is no j < i such that A is true at (w,j); and otherwise it is undefined.
Now I do not happen to believe that the operator thus defined has much in common with the English past tense at all, but this is beside the point I am trying to make. I want to draw your attention to Richards' methodology. He set about explaining the semantics of an English sentence like 'Hillary swam'. What he gives us is a truth-condition for a piece of artificial notation: 'PAST(v,t)Hillary swim'. Now a moment's reflection reveals that the coding principles underlying the artificial notation are diametrically opposite to those underlying the English sentence. In the artificial notation a
past-tense sentence is formed from the outside, as it were, by prefixing a pastness modifier, PAST, to an untensed clause. English on the other hand, which does not have a pastness operator, forms a past-tense sentence from the inside. Pastness is indicated by the morphological form of a pre-clausal constituent of the sentence, the verb.
Richards' approach is typical of most recent work in the semantics of natural language. The analyst's main concern is to defend his favourite toy language; he wants to demonstrate that for any semantic trick one can pull in natural English (such as making a statement about the past), a similar trick can be pulled in his toy language. He is not interested in how those tricks are pulled in English. He does not see his task as that of deciphering the code of the natural language. He is a formaliser, not a code-cracker. And since the natural language and its code are in fact one and the same thing, he is not really interested in natural language as such at all.
In the rest of my paper I want to sketch an alternative approach. It seems to me that if we are really interested in English expressions and how they come to have the meanings they do then this is what we should concentrate upon. Dragging another language into the picture is just an unnecessary complication.
One principle which was introduced into linguistics by Chomsky and his school seems to me beyond any reasonable doubt. That is the idea that the grammar of a natural language must take the form of an inductive definition or generative process. There are infinitely many meaningful expressions in English and mathematical induction is the only way to handle infinite sets.
But Chomsky took too simplistic a view of what sort of items are to be generated. As I have already argued, the well-formedness of an English expression depends not only on the well-formedness of its constituents, but also on what they mean. Hence we can say, on first approximation, that a grammar must generate pairs consisting of a meaning and a form of words expressing that meaning.
This, however, is still a simplification. The well-formedness of a compound depends not only on whether its components are well formed and on what they mean, but also on their grammatical category, on their agreement characteristics like number, gender, and case, on whether they are absolute or relative, on their positive or negative polarity, and on their anaphoric and coordination status. All these features have to be taken into account before it can be determined whether a compound has been correctly formed and interpreted. Thus the basic unit of the generative process must be a meaning/expression pair supplemented with indices indicating the values of all these grammatical features. I shall call such indexed meaning/expression pairs semantic pairs or, briefly, s-pairs. An s-pair can be represented in the following form:
[Diagram: general form of an s-pair, with a construction box on the left and a string box on the right, each carrying its indices]
Here the left-hand component, C, is a meaning, i.e. a complex which constructs an object from other objects in terms of functional application and functional abstraction. The right-hand component is a string of letters which expresses that meaning in English. The indices attached to the pair give the logical type of the object constructed by C, the grammatical category of the string, an agreement index indicating gender, number, person, case etc., a status index showing whether the string is absolute or relative, a polarity index and an anaphoric index. (I will return to the remaining indices shortly.)
A language is a system which offers certain strings of letters as expressions of certain meanings. In other words, a language licenses a certain range of s-pairs. Since the range is invariably infinite, the licensing cannot be given in the form of a list. It must be given in the form of a set of rules. Some of the rules license s-pairs directly, others conditionally: they say what s-pairs are licensed provided some other s-pairs have been licensed already. The general form of a rule is thus as follows:
[Diagram: general form of an s-rule, with the 'given' s-pairs above a horizontal line, the 'resulting' s-pairs below it, and a condition C attached]
The rule says that if the s-pairs listed above the horizontal line (the 'given' s-pairs) are licensed then so are the s-pairs listed below (the 'resulting' pairs). I shall refer to rules of this sort simply as s-rules.
Here is an example of an s-rule:
[Diagram: s-rule with no 'given' s-pairs, licensing the s-pair for the string 'one']
Since there are no 'given' s-pairs, the rule licenses its resulting s-pair directly. It makes the string 'one' a numeral of person 5 (neuter singular) and rank 0 expressing a trivial construction of the number one.
It would obviously be wasteful to write out a rule like that for each primitive numeral in full. Rules which introduce primitive numerals share a common form which can be represented as follows:
L1(category, agreement index, rank, number, string):
[Diagram: schematic s-rule introducing a primitive numeral]
The above rule can then be written briefly as L1(NUM,5,0,1,one), a similar rule for 'two' as L1(NUM,8,0,2,two), and so forth.
Classes of rules can then be conveniently listed using this notation. Thus, for example, numerals of rank 0 and 1 can be introduced by the following s-rules:
| L1(NUM,5,0,1,one) | L1(NUM,8,0,6,six) |
| L1(NUM,8,0,2,two) | L1(NUM,8,0,7,seven) |
| L1(NUM,8,0,3,three) | L1(NUM,8,0,8,eight) |
| L1(NUM,8,0,4,four) | L1(NUM,8,0,9,nine) |
| L1(NUM,8,0,5,five) | |
| L1(NUM,8,1,2.10,twenty) | L1(NUM,8,1,6.10,sixty) |
| L1(NUM,8,1,3.10,thirty) | L1(NUM,8,1,7.10,seventy) |
| L1(NUM,8,1,4.10,forty) | L1(NUM,8,1,8.10,eighty) |
| L1(NUM,8,1,5.10,fifty) | L1(NUM,8,1,9.10,ninety) |
Then we can formulate a rule whereby the concatenation of a numeral of rank 1 and a numeral of rank zero gives us another numeral of rank one. This numeral expresses the composition of the addition function with the constructions expressed by the original numerals. The rule looks like this:
[Diagram: s-rule whereby a numeral of rank 1 followed by a numeral of rank 0 yields a numeral of rank 1 expressing the sum of the two constructions]
According to these rules 'thirty two', for example, is a numeral of rank 1 expressive of the construction [3.10]+2.
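The numeral fragment is small enough to be sketched in code. The following is a minimal sketch under stated assumptions, not a rendering of the original rule diagrams: the class `SPair`, the functions `L1`, `concat_rule` and `value`, the encoding of constructions as integers and nested tuples, and the agreement index 8 given to the compound are all hypothetical. Only the argument order of `L1` and the listed index values are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SPair:
    """A simplified s-pair: a construction paired with the string expressing it."""
    construction: object  # an int, or a nested tuple such as ("*", 3, 10)
    string: str
    category: str         # grammatical category, e.g. "NUM"
    agreement: int        # agreement index as listed in the text (5 or 8)
    rank: int


def L1(category, agreement, rank, construction, string):
    """Rule with no 'given' s-pairs: licenses a primitive numeral directly."""
    return SPair(construction, string, category, agreement, rank)


UNIT_WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
TENS_WORDS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

LEXICON = {}
for n, word in enumerate(UNIT_WORDS, start=1):
    # 'one' carries agreement index 5, the other units 8, as in the listed rules.
    LEXICON[word] = L1("NUM", 5 if n == 1 else 8, 0, n, word)
for n, word in enumerate(TENS_WORDS, start=2):
    # The text writes the construction as n.10; here it is the tuple ("*", n, 10).
    LEXICON[word] = L1("NUM", 8, 1, ("*", n, 10), word)


def concat_rule(tens, unit):
    """Conditional rule: rank-1 numeral + rank-0 numeral gives a rank-1 numeral."""
    if (tens.category, tens.rank) != ("NUM", 1):
        return None  # first 'given' s-pair has the wrong shape
    if (unit.category, unit.rank) != ("NUM", 0):
        return None  # second 'given' s-pair has the wrong shape
    # Agreement index 8 for the result is an assumption of this sketch.
    return SPair(("+", tens.construction, unit.construction),
                 f"{tens.string} {unit.string}", "NUM", 8, 1)


def value(construction):
    """Compute the number that a construction constructs."""
    if isinstance(construction, int):
        return construction
    op, left, right = construction
    l, r = value(left), value(right)
    return l + r if op == "+" else l * r


thirty_two = concat_rule(LEXICON["thirty"], LEXICON["two"])
print(thirty_two.string)               # thirty two
print(thirty_two.construction)         # ('+', ('*', 3, 10), 2)
print(value(thirty_two.construction))  # 32
```

The string and the construction are built in the same step, so form and meaning never come apart: the pair for 'thirty two' keeps the structure [3.10]+2 rather than just the number 32.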
I have chosen numerals as an example because they constitute one of the most well-behaved and trouble-free fragments of English. Even so you will have noticed that I am simplifying. In particular, I have glossed over 'eleven' through 'nineteen' which are also primitive numerals. I am cutting corners for the sake of brevity.
The reason the grammar of the numerals is easy is that the steps by which the expressions are formed are closely parallel to the steps whereby the corresponding constructions are formed. In this respect the grammar of numerals is almost like the grammar of the artificial languages of logic and mathematics.
One principle implemented in all artificial languages is this: an argument expression is always outside the corresponding functor and, conversely, a functor is always outside the corresponding argument(s). Almost any mathematical or logical expression, e.g. 'sin(7+3)' and '~ (Px & Qx)', can serve as an illustration of the principle. In the mathematical example, the sine function applies to the result of adding 3 to 7, hence the functor 'sin' is not just outside but also contiguous with the argument expression '(7+3)'; in the logical example the negation applies to the conjunction, hence the tilde is outside and contiguous with the whole of '(Px & Qx)'.
Natural as this notational principle may seem, natural English deviates from it systematically. In English, a formative indicative of a modifier is frequently buried within the corresponding arguments. It is as if a mathematician wrote '7sin+3', or perhaps '7+sin3' instead of 'sin(7+3)', or if a logician wrote '(Px ~ & Qx)' or perhaps '(P~x & Q~x)' instead of '~ (Px & Qx)'. Likewise there are converse cases where a functor stands outside its argument expression but not immediately next to it: it appears rather next to a larger expression in which the argument string is buried. This would be analogous to a mathematician's writing, say, '+sin(7,3)' instead of 'sin(7+3)' or a logician's writing '&~ (Px,Qx)' instead of '~ (Px & Qx)'.
As an example of the first kind of deviation, consider the verb phrase
(1) knocked Ali promptly out.
The promptness modifier clearly applies to what is expressed by the (discontinuous) string '- knocked ... out' of the phrase: the phrase adverts to a prompt knockout. Yet the adverb 'promptly', which expresses the modifier, is buried inside that string.
To illustrate the reverse kind of deviation, consider
(2) With alacrity, Fred knocked Ali out.
Here the alacrity modifier cannot apply to what is expressed by the clause 'Fred knocked Ali out'. It is only the action of knocking Ali out that can sensibly be judged as one which is done with or without alacrity; it makes little sense to characterise the circumstance that Fred (rather than someone else) performed the action, which is what the whole clause adverts to, as alacritous. Yet syntactically the adverb governs the whole clause.
Both sentences contain discontinuous constituents. The string 'knock out' is clearly a constituent of (1), but its two fragments have drifted apart and are separated by foreign material. The string 'knock out with alacrity' is a constituent of (2), but it appears in three separate bits with other strings intervening between them.
Conventional linguistics employs two stratagems in dealing with discontinuous constituents. Sometimes it simply denies their existence. The most conspicuous case is the way it deals with bitransitive verb phrases, like
(3) take Mary to Nick
By representing the structure of the phrase as
(4)
         VP
  ┌──────┼──────┐
  V      NP     PP
  │      │    ┌─┴─┐
  │      │    P   NP
  │      │    │   │
 take   Mary  to Nick
the linguists lump 'to' illogically with 'Nick' rather than with 'take'. The string 'to Nick' is thus accorded the status of a syntactic constituent (called 'prepositional phrase').
This analysis flies in the face of standard English dictionaries which invariably list 'take
... to ...' as a single entry:
take sb. to sb., v, to conduct or escort.
There is a very good reason for it. It certainly would not do to try and give a definition of 'take' and a separate definition of 'to', relying on the reader to figure out for himself what 'take sb. to sb.'
means. All by itself, the word 'take' is not associated with any specifiable activity or accomplishment; it is not like 'sing', which does advert to a well-defined kind of activity. It is perhaps not outright absurd to argue that in the phrase 'sing the Internationale to Nick' the verb 'sing', all by itself, refers to that kind of activity and 'to Nick' expresses, all by itself, a certain manner in which any activity can be conducted. But in the case of (3), reference to the escorting activity is patently the joint work of 'take' and 'to'. There is no such activity as taking simpliciter; only taking somebody to somebody, taking something from somebody, etc.
The decision to represent the structure of (3) as (4) is a momentous step which amounts to divorcing form from meaning and syntax from semantics. It is incompatible with the view that the syntax of an expression is diagrammatic of its meaning, that it corresponds, however loosely, to the logical structure of what it expresses. Once a linguist makes this decision, it is almost inevitable that he will give up any attempts to keep syntax and semantics in step. The idea of an 'autonomous syntax' where meaning need not be taken into account becomes irresistible as a rationalisation of a wrong-headed methodology.
The other stratagem used by conventional linguists to deal with discontinuous strings is transformationalism. This is the idea that sentences come into this world in an orderly form, every constituent nice and continuous, whereupon all manner of transformations rend bits and pieces from their natural places and deposit them in places where they do not properly belong. Take for example the sentence:
(5) Bill and Hillary Clinton sang the Internationale.
It is difficult to deny that the string 'Bill Clinton' is a constituent of this sentence. The story we get is as follows: (5) arises from a conjunctive sentence:
(6) Bill Clinton sang the Internationale and Hillary Clinton sang the Internationale.
First a transformation called CONJUNCTION REDUCTION applies and we get
(7) Bill Clinton and Hillary Clinton sang the Internationale
Then a transformation called SURNAME DELETION applies which erases the first occurrence of 'Clinton' giving us (5).
There are three problems with this account. Firstly, on one of its natural readings (5) reports that Bill and Hillary combined to give a single performance of the song. On this reading (5) and (6) are completely independent statements, either of which may be true without the other one being true. Secondly, it follows from the account that the string 'Bill and Hillary Clinton' is meaningless in isolation. This is hard to reconcile with the fact that, all by itself, the string can serve as an answer to a question like 'Who sang the Internationale?' Finally, as already noted, the 'deletion' of the first occurrence of 'Clinton' from (7) is only possible if the two persons mentioned share their surname thanks to a familial relationship. This information can perhaps be captured in the form of a
subcategorization feature pertaining to the two proper names. But by the time clause (7) is generated this information will be lost. The only way to recover it is to retrace the steps which have led to the sentence.
But why concatenate 'Bill' with 'Clinton' into a continuous string if the next move is to tear them asunder again? Why not assume that the grammar generates the two as a pair of correlated but not concatenated strings which refer to the president between them, so to speak? Similarly, why not assume that the grammar generates 'take' and 'to' as a pair of correlated but not concatenated strings which refer to a kind of action collectively? When the two fragments are eventually embedded into a clause by word-order rules they may be separated by intervening foreign material without thereby losing their ability to refer in combination to that action kind. Similarly the three strings
'knocked Ali' 'out' and 'promptly'
may refer to a kind of action collectively before they are embedded into a clause and so can
'knocked Ali out' and 'with alacrity'.
On this approach clauses can be generated directly with every word in its right place, rather than being first joined together and later torn apart again.
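The idea of correlated but not concatenated strings can be made concrete in a short sketch. Everything in it is a hypothetical illustration: the class `Fragments`, the function `linearise`, and the use of plain text labels in place of genuine constructions are assumptions made here, and the word-order step is reduced to simple interleaving.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import Tuple


@dataclass(frozen=True)
class Fragments:
    """Correlated but not concatenated strings that jointly express one meaning."""
    parts: Tuple[str, ...]
    meaning: str  # a plain label standing in for the construction expressed


def linearise(fragments, fillers):
    """Word-order step: interleave the fragments with the intervening material."""
    words = []
    for part, filler in zip_longest(fragments.parts, fillers, fillvalue=""):
        words.extend(w for w in (part, filler) if w)
    return " ".join(words)


# 'take' and 'to' refer to the escorting activity only in combination.
TAKE_TO = Fragments(("take", "to"), "conduct or escort")
# 'knocked' and 'out' likewise form one unit although material may intervene.
KNOCK_OUT = Fragments(("knocked", "out"), "knock out")

print(linearise(TAKE_TO, ("Mary", "Nick")))     # take Mary to Nick
print(linearise(KNOCK_OUT, ("Ali promptly",)))  # knocked Ali promptly out
```

The meaning stays attached to the fragment pair as a whole, so nothing has to be joined first and torn apart later; the fragments reach their final positions in a single step.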
Another characteristic feature of the natural-language code is what may be called re-categorisation.
The sentence
(8) Fred knocked exactly three boxers out
is another case of a modifier appearing in the middle of what it modifies. The phrase 'exactly three boxers' is not anybody's name; it is a quantifier governing the whole clause. The clause asserts that there are exactly three boxers x such that Fred knocked x out. Yet the quantificational phrase is buried deep inside the clause.
This time, however, it is not a matter of gratuitous word order. The phrase occurs in the position of direct object, a position where an individual name, such as 'Ali' might occur and where the variable x occurs in the semi-formal paraphrase just given. Thus although it denotes a quantifier, that is, roughly, a function from classes to truth values, natural English treats the phrase syntactically as if it was a singular term.
This feature, which, incidentally, is a feature of every natural language I have come across, is an extremely ingenious and economical notational device. In an artificial notation a quantificational phrase stands in front of a clausal abstract:
Exactly three boxers λx Fred knocked x out
the 'place' engaged by the abstraction operator being pinpointed by the letter x. Natural language, on the other hand, pinpoints the place by inserting there the quantificational phrase itself.
Here is my hypothesis of how all this comes about. First the quantificational phrase 'exactly three
boxers' is generated and interpreted as a name of a quantifier Q,
that is, of an operation from classes to truth-values:
(9) [Diagram: s-pair in which the string 'exactly three boxers' names the quantifier Q]
Once this is done, the quantificational phrase is re-categorised as a noun phrase expressing a variable, say x, on the understanding that at a later stage the variable will be abstracted upon and Q applied to the resulting construction. Such an understanding is conveniently codified in the form of a schematic s-pair which may be called an anticipation s-pair. In the present case, the anticipation s-pair looks roughly thus:
(10) [Diagram: anticipation s-pair for Q, with boxes standing for a construction and a clause that are yet to be generated]
The box
| CL | - |
abs | p | z |
on the right represents an unspecified string with specific grammatical
features assigned to it.
The box
| Jtw | - |
abs | p | z |
on the left represents an unspecified construction with specific constructional features assigned to it. The whole represents a promissory note as it were to the effect that the quantifier will be applied in due course to a construction expressed by a clause which is yet to be generated.
Now there will be a rule licensing step
(11) [Diagram: s-pair in which 'exactly three boxers' is a noun phrase expressing the variable x, carrying the anticipation index h]
relative to step (10), the fact that the license is relative
to (10) being marked by the index h. From then on the phrase 'exactly
three boxers' can be treated as any other noun phrase. In particular, it can be
inserted as an ordinary direct-object phrase into the verb 'knocked/out'
to form the verb phrase 'knocked exactly three boxers out'. The verb
phrase can in turn be combined with 'Fred' into the clause 'Fred
knocked exactly three boxers out'. Suppose that the step at which this happens is
(12) [Diagram: s-pair for the clause 'Fred knocked exactly three boxers out', still carrying the anticipation index h]
Since the logical and grammatical features in this step match those in the boxes of the
anticipation step, the promise that the step represents can now be fulfilled. The two
upstairs boxes in (10) can be supplanted with the construction and string
in step (12), yielding
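(13) [Diagram: s-pair for 'Fred knocked exactly three boxers out' in which Q is applied to the abstracted construction and no anticipation index remains]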
The promise has been fulfilled, hence the anticipation index h disappears.
Since licensing is thus often relative to the promissory notes, a step in a derivation
is characterized not just by the s-pair which is being licensed but also by the promissory
notes which are still outstanding at that step. The general form of a step is thus
A1, A2, ..., An ⇒ S,
where A1, A2, ... and An are the anticipation s-pairs yet to be discharged and S is the s-pair which is being licensed relative to A1, A2, ... and An. Let us call such steps sequents in recognition of their similarity to what Gentzen called sequents. But note that while the constituents of a Gentzen sequent are propositions and the arrow signifies entailment, the sequents I am talking about consist of s-pairs, that is of pairs each of which in turn consists of a string of letters and a logical construction expressed by that string. But the s-rules which govern the generation of sequents are just like Gentzen's: they tell you what sequents you are entitled to derive, given that you have already derived some other sequents.
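The derivation of 'Fred knocked exactly three boxers out' through steps (9) to (13) can be traced in a rough sketch. It rests on several assumptions that are not in the text: the toy model (`DOMAIN`, `BOXERS`, `KNOCKED_OUT`), the class `Sequent`, the function names, the category labels "QP", "NP", "VP" and "CL", and above all the modelling of meanings as ordinary Python functions and values. The last choice flattens constructions into what they construct, so the sketch shows only the bookkeeping of anticipations, not structured meanings.

```python
from dataclasses import dataclass
from typing import Tuple

# A toy model, invented for illustration.
DOMAIN = {"Ali", "Joe", "Ken", "Max", "Fred"}
BOXERS = {"Ali", "Joe", "Ken", "Max"}
KNOCKED_OUT = {("Fred", "Ali"), ("Fred", "Joe"), ("Fred", "Ken")}


@dataclass(frozen=True)
class Sequent:
    anticipations: Tuple[str, ...]  # promissory notes still outstanding
    string: str
    meaning: object                 # Python stand-in for a construction
    category: str


def exactly_three_boxers(cls):
    """The quantifier Q: maps a class (a predicate on the domain) to a truth value."""
    return len([d for d in DOMAIN if d in BOXERS and cls(d)]) == 3


def recategorise(qp, label="h"):
    """Steps (10)-(11): treat the phrase as an NP expressing the variable x."""
    def variable(assignment):
        return assignment["x"]
    return Sequent(qp.anticipations + (label,), qp.string, variable, "NP")


def insert_object(verb_fragments, relation, np):
    """Insert a direct object into a discontinuous verb such as 'knocked/out'."""
    head, particle = verb_fragments
    def meaning(subject, assignment):
        return (subject, np.meaning(assignment)) in relation
    return Sequent(np.anticipations, f"{head} {np.string} {particle}", meaning, "VP")


def form_clause(subject, vp):
    """Step (12): combine a subject name with a verb phrase."""
    def meaning(assignment):
        return vp.meaning(subject, assignment)
    return Sequent(vp.anticipations, f"{subject} {vp.string}", meaning, "CL")


def discharge(clause, qp, label="h"):
    """Step (13): abstract over x, apply Q, and drop the anticipation index."""
    if label not in clause.anticipations:
        raise ValueError("no outstanding anticipation with this label")
    def abstract(d):
        return clause.meaning({"x": d})
    remaining = tuple(a for a in clause.anticipations if a != label)
    return Sequent(remaining, clause.string, qp.meaning(abstract), "CL")


step9 = Sequent((), "exactly three boxers", exactly_three_boxers, "QP")
step11 = recategorise(step9)
step12 = form_clause("Fred", insert_object(("knocked", "out"), KNOCKED_OUT, step11))
step13 = discharge(step12, step9)

print(step12.anticipations)  # ('h',)
print(step13.string)         # Fred knocked exactly three boxers out
print(step13.anticipations)  # ()
print(step13.meaning)        # True
```

Each `Sequent` records which promissory notes are still open, so a clause containing a re-categorised quantifier phrase counts as finished only once `discharge` has emptied the list.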
The notion of s-rule is the key notion of the whole approach. Every s-rule governs an aspect of the interplay between form and meaning peculiar to the language in question. It is at the same time a rule of syntax, semantics, and morphology. You may doubt the particular story I told concerning quantificational phrases like 'exactly three boxers' and consequently doubt that the s-rules the story requires are genuine rules of the English language. But it seems to me that any attempt to describe the mechanism which enables the English language to endow sentences like (8) with their meaning will have to be given in the form of what I have called s-rules. There is no other way to capture the interface between form and meaning peculiar to the English language. It is exactly this interface that both the autonomous syntactician and the Montague-style formaliser leave unexamined: the autonomous syntactician because he leaves meaning out in the cold, and the formaliser because he goes from the vernacular expression straight to its symbolization in a formal language whose form/meaning interface is radically different.
Given the complexity of natural language, the s-rules required to provide its grammar are, of course, numerous and often quite involved. But they are explicit, rigorous, and computer-friendly. In the age of the microchip, complexity is not a problem. What I have been referring to as the logician's toy languages should be seen for what they are, a legacy of the pencil-and-paper era. It is now time to look at the real thing.