Author: Mössenböck H. Rechenberg P.
Tags: programming computer science microcomputers software tools
ISBN: 0-13-155060-8
Year: 1989
Text
A COMPILER GENERATOR
FOR MICROCOMPUTERS
Limits of Liability and Disclaimer of Warranty
The authors and publishers of this book have used their best efforts in
preparing this book and the programs contained within it. These efforts
include the development, research and testing of the theories and programs
to determine their effectiveness. The authors and publishers make no
warranty of any kind, expressed or implied, with regard to these programs
or the documentation contained in this book. The authors and publishers
shall not be liable in any event for incidental and consequential damages
in connection with, or arising from, the furnishing, performance or use of
these programs.
A COMPILER GENERATOR
FOR MICROCOMPUTERS
Peter Rechenberg
University of Linz
Hanspeter Mössenböck
University of Zurich
Translated by John O'Meara
and the authors
First published in English 1989 by
Prentice Hall International (UK) Ltd,
66 Wood Lane End, Hemel Hempstead,
Hertfordshire, HP2 4RG
A division of
Simon & Schuster International Group
This book was originally published in German under the title
Ein Compiler-Generator für Mikrocomputer by Peter
Rechenberg and Hanspeter Mössenböck
© 1985 Carl Hanser Verlag, Munich and Vienna.
© 1989 Carl Hanser Verlag and
Prentice Hall International (UK) Ltd
All rights reserved. No part of this publication may be
reproduced, stored in a retrieval system, or transmitted, in
any form, or by any means, electronic, mechanical,
photocopying, recording or otherwise, without the prior
permission, in writing, from the publisher.
For permission within the United States of America contact
Prentice Hall Inc., Englewood Cliffs, NJ 07632.
Printed and bound in Great Britain by
A. Wheaton & Co. Ltd, Exeter.
Library of Congress Cataloguing-in-Publication Data
Rechenberg, Peter
[Compiler-Generator für Mikrocomputer. English]
A compiler generator for microcomputers / Peter
Rechenberg,
Hanspeter Mössenböck.
p. cm.
Translation of: Ein Compiler-Generator für
Mikrocomputer.
Bibliography: p.
Includes index.
ISBN 0-13-155060-8 : $40.00
1. Compilers (Computer programs) 2. Microcomputers
- Programming.
I. Mössenböck, Hanspeter, 1959- . II. Title
QA76.76.C65R4313 1988
005.26-dc19 88-28926
British Library Cataloguing in Publication Data
Rechenberg, Peter
A compiler generator for microcomputers.
1. Computer systems. Programming languages.
Compilers. Design & construction
I. Title II. Mössenböck, Hanspeter
III. Ein Compiler-Generator für Mikrocomputer.
English 005.4'53
ISBN 0-13-155060-8
ISBN 0-13-155136-1 Pbk
1 2 3 4 5 92 91 90 89 88
Contents
Preface ix
Numbered definitions, algorithms, examples xi
Symbols xiv
1 Introduction and survey 1
1.1 Compilers and compiler compilers 2
1.2 Static compiler structure 4
1.3 Dynamic compiler structure 8
1.4 The structure of the book 10
2 Syntax 13
2.1 Basic concepts from formal language theory 13
2.2 LL(1) grammars and syntax analysis 23
2.3 The top-down graph 42
2.4 The G-code 53
2.5 Parsing with the G-code 56
2.6 Error handling 62
3 Semantics 69
3.1 Semantic actions 70
3.2 Attributes 71
3.3 Context conditions 76
3.4 Attributed grammars 79
3.5 L-Attributed grammars 82
3.6 Implementation of the semantic interface 85
4 Various compiler compilers 91
4.1 YACC - yet another compiler compiler 92
4.2 HLP84 - Helsinki language processor 94
4.3 GAG - generator based on attribute grammars 96
4.4 MUG - modular compiler generator 98
4.5 Coco - compiler compiler 100
4.6 Summary 102
5 The compiler description language Cocol 105
5.1 Lexical structure 105
5.2 Cocol as a syntax description language 106
5.2.1 Productions 107
5.2.2 Declarations 109
5.3 Cocol as a semantic description language 110
5.3.1 Semantic actions 111
5.3.2 Attributes 113
5.3.3 Context conditions 115
5.3.4 Semantic declarations 115
5.3.5 Scope of semantic objects 116
6 The compiler compiler Coco 117
6.1 Characteristics 117
6.2 Components of the generated compiler 119
6.3 Operation of the generated compiler 120
6.4 Interfaces of the generated compiler 121
6.4.1 Caller interface 121
6.4.2 Input interface 122
6.4.3 Output interface 122
6.4.4 Syntax error interface 123
6.5 Generation of multi-pass compilers 124
7 The implementation 125
7.1 Survey 126
7.2 Structure of the symbol list 127
7.2.1 Symbol list representation 127
7.2.2 Symbol list construction 129
7.3 Structure of the top-down graph 130
7.3.1 Top-down graph representation 130
7.3.2 Top-down graph construction 131
7.3.3 Insertion of eps-nodes 136
7.3.4 Removal of redundant eps-nodes 138
7.4 Collecting the symbol sets 140
7.4.1 Deletable nonterminals 141
7.4.2 Terminal start symbols of nonterminals 142
7.4.3 Terminal successors of nonterminals 143
7.4.4 eps-sets 145
7.4.5 any-sets 147
7.5 Grammar tests 147
7.5.1 Completeness 149
7.5.2 Reachability 149
7.5.3 Noncircularity 150
7.5.4 Termination 152
7.5.5 LL(1) condition 153
7.6 Generation of the parser tables 154
7.6.1 Table format 154
7.6.2 Generation of the G-code 156
7.6.3 Generation of the remaining tables 159
7.7 Generation of the syntax analyzer 159
7.8 Generation of the semantic evaluator 160
7.8.1 The invariant parts of the semantic evaluator 161
7.8.2 Processing of the semantic declarations 162
7.8.3 Processing of the semantic actions 163
7.8.4 Attribute processing 164
8 Applications 171
8.1 Applications in compiler construction 171
8.1.1 Specification of a lexical analyzer 172
8.1.2 Description of a lexical analyzer for Modula-2 173
8.1.3 Semantic procedures for lexical analysis 180
8.2 Applications in software engineering 182
8.2.1 Attributed grammars as a software design method 182
8.2.2 The telegram problem as an example 184
8.2.3 Attributed grammars as documentation 187
8.2.4 The Jackson method as a special case 187
8.3 Results of a Coco run 192
8.3.1 The generated syntax analyzer 193
8.3.2 The generated semantic evaluator 194
8.3.3 The generated parser tables 195
9 Experiences with Coco 197
9.1 A basis for measurements 197
9.2 Measurements on Coco 199
9.3 Measurements on some generated compilers 200
9.4 General experiences 201
Appendices
A Definition of Adele 203
B Modula-2 and Pascal 207
C Syntax of Cocol 212
D G-code 213
E Intermodular cross-reference list 214
F Program listings 220
Bibliography 370
Index 373
Preface
This book describes the structure of the compiler compiler Coco, which was
developed for microcomputers by the authors. It also deals with the techniques
used by Coco and those by which Coco was developed. Special attention is
given to table-driven top-down syntax analysis with automatic error
recovery and to the description of semantics using L-attributed grammars. Coco is
written in Modula-2 and generates compilers in Modula-2. It is hoped that this
will show how well Modula-2 is suited to the implementation and
documentation of large modular programs.
Compiler compilers, as we understand them, are not the field of a few
specialists in compiler construction, but rather are tools for managing various
tasks in software engineering, a fact which is not generally known. The
methodology of attributed grammars which lies at the foundation of compiler
compilers includes, for example, the Jackson method as a simple special case,
and can be applied where the program flow is primarily controlled by one
structured input data stream.
Thus this book has something to offer for a wide circle of readers:
1. It presents the principles of compiler construction insofar as
they concern the analysis part of compilers, especially LL(1) syntax
analysis with attributed grammars. (Lexical analysis is covered only
marginally.)
2. It is a detailed description of a compiler compiler.
3. It illustrates the application of a compiler compiler by numerous
examples.
4. It illustrates the application of software documentation methods to a large
program system, especially the method of stepwise refinement and the
use of an algorithm description language.
5. It can be used to evaluate the suitability of Modula-2 for software
engineering because it presents a large program in Modula-2 which
exploits the special properties of modular programming.
We consider the primary circle of readers to be advanced computer science
students, theoretically and practically active computer scientists and software
engineers. We therefore presuppose the usual terminology, assume that the
reader is acquainted with the development of software and that he can read
Pascal, or even better Modula-2, or some similar language. Accordingly, we
have kept the discussion brief, but have also taken pains not to refer to special
knowledge cited elsewhere to make the book understandable in itself.
The focal point around which the entire book evolved is the complete
Modula-2 code of Coco in Appendix F. We consider the publishing of such a
large program system a gamble because we are not sure whether the reader
will be interested in the numerous details in it, and because we expose
ourselves to all sorts of criticism of our programming style and choice of
algorithms. But at the same time we hope that it is just this completeness
which makes the book valuable and distinct from others.
For information concerning the structure of the book the reader is referred
to Section 1.4.
The Austrian Foundation for the Advancement of Scientific Research
financially supported the development of the compiler compiler and thereby
rendered it possible, for which we wish to express our appreciation.
For the careful review of the manuscript and for helpful suggestions we
wish to thank our colleagues and friends Prof. G. Pomberger, Dr G. Blaschek
and F. Ritzinger; for proofreading the English translation we wish to thank D.
Raye; for the review of the examples in Chapter 4 we wish to thank Prof. H.
Ganzinger, Prof. U. Kastens, Dr K. Koskimies and Prof. R. Marty. The text
was produced by ourselves with the text processor WriteNow on a Macintosh
computer.
Linz
August, 1988
P. Rechenberg
H. Mössenböck
Numbered definitions,
algorithms, examples
1.1 Definition Compiler 2
1.2 Definition Compiler compiler 3
1.3 Versions of Coco 4
1.4 Example Lexical analysis 6
1.5 Example Syntax tree 7
2.1 Definition Abbreviations for strings and sets of strings 14
2.2 Definition Grammar 15
2.3 Definition Derivation, sentential form, sentence, language 16
2.4 Example Derivation of all sentential forms of a language 16
2.5 Definition Left-canonical derivation 17
2.6 Definition Phrase 18
2.7 Definition Simple phrase, handle 18
2.8 Example Phrase, simple phrase, handle 18
2.9 Definition Recursive grammar 19
2.10 Example Arithmetic expressions 19
2.11 Definition Terminating symbol, derivable symbol 21
2.12 Definition Useless symbol 21
2.13 Definition Reduced grammar 21
2.14 Definition LL(k) grammar 25
2.15 Definition Terminal start symbols of a nonterminal 26
2.16 Definition Terminal start symbols of a string 26
2.17 LL(1) conditions for ε-free grammars 27
2.18 Example LL(1) conditions 27
2.19 Definition Terminal successors 28
2.20 LL(1) conditions for arbitrary grammars 28
2.21 Example LL(1) conditions 29
2.22 Example Dangling else 29
2.23 Definition Deletability 31
2.24 Algorithm Marking deletable symbols 32
2.25 Algorithm Calculation of the sets of terminal start symbols 32
2.26 Algorithm Calculation of successor sets 33
2.27 Algorithm LL(1) analysis (recursive) 35
2.28 Example Recursive LL(1) parsing 36
2.29 Algorithm LL(1) parsing (nonrecursive) 38
2.30 Example Nonrecursive LL(1) parsing 39
2.31 Definition Terminal start symbols of length k 40
2.32 Definition LL(k) grammar 40
2.33 LL(k) condition 40
2.34 Example LL(2) and LL(3) test 41
2.35 Example Basic structure of the top-down graph 43
2.36 Definition Complement symbol any 45
2.37 Example Equivalent top-down graphs 46
2.38 Definition Alternative chain 48
2.39 Example Alternative chains 48
2.40 Definition Match 48
2.41 Definition LL(1) conditions for top-down graphs 49
2.42 Definition G-code (incomplete) 55
2.43 Algorithm Parse (simplified) 58
2.44 Algorithm Parse (complete) 60
2.45 Example Error situation 62
2.46 Principle of error handling 64
2.47 Algorithm Error (basic structure) 65
2.48 Algorithm Triple 66
2.49 Algorithm Fill 67
2.50 Algorithm FillSucc 67
2.51 Algorithm Error (with heuristic enhancements) 68
3.1 Example Semantic actions 70
3.2 Example Semantic actions 71
3.3 Example Interpretation of arithmetic expressions 73
3.4 Example Interpretation of arithmetic expressions in EBNF 74
3.5 Example Inherited attributes 75
3.6 Example A context-sensitive language 78
3.7 Example Context condition 78
3.8 Example Context condition 78
3.9 Definition Attributed grammar 79
3.10 Example Variable declaration 80
3.11 Definition L-attributed grammar 83
3.12 Parser with semantic interface 86
3.13 Example Attribute passing 87
3.14 Definition G-code (remainder) 88
3.15 Principle of attribute saving for recursive symbols 90
4.1 Example Attributed grammar as input for YACC 93
4.2 Example Attributed grammar as input for HLP84 95
4.3 Example Attributed grammar as input for GAG 97
4.4 Example Attributed grammar as input for MUG 99
4.5 Example Attributed grammar as input for Coco 101
5.1 Example Cocol grammar for real constants 107
5.2 Example The use of eps 107
5.3 Example The use of any 108
5.4 Example How the compiler treats LL(1) conflicts 108
5.5 Example Terminal declarations 109
5.6 Example Pragma declarations 110
5.7 Example Nonterminal declarations 110
5.8 Example Semantic actions 112
5.9 Example Indication of dataflow at parameters 112
5.10 Example Semantic macros 113
5.11 Example Semantic actions for pragmas 113
5.12 Example Attributes 114
5.13 Example Context conditions 115
5.14 Example Declarations of semantic objects 115
5.15 Example Stacking of semantic objects 116
6.1 Example Application of any 124
8.1 Example LL(1) conflicts in lexical structures 179
Symbols
a^n        The string of n identical symbols a
a^+        The set {a^n : n ≥ 1}
a^*        The set {a^n : n ≥ 0}
G          Grammar
O          Order (asymptotic time complexity)
S          Sentence symbol
V          Alphabet
V_T        Alphabet of terminals
V_N        Alphabet of nonterminals
V^+        The set of all non-empty strings built from symbols of V
V^*        The set of all strings built from symbols of V, including the
           empty string
ε          The empty string
∅          The empty set
∈          Element of
∩          Intersection of two sets
∪          Union of two sets
→          Replacement symbol: 'is defined as'
|          Separates alternatives
[ ]        Option notation (encloses optional symbols and strings)
{ }        Set notation
{ }        Repetition notation
⇒          Direct derivation: 'produces directly'
⇒+         Derivation: 'produces'
⇒*         Derivation: 'produces or is equal to'
⇒L         Left-canonical derivation
⇏*         'Does not produce and is not equal to'
↓ ↑ ↕      Input, output, transient parameters
α, β, φ    Strings
a          String to be analyzed
1 Introduction and survey
The older of the two authors distinctly remembers that he first heard the word
'compiler compiler' at the IFIP Congress in Munich in 1962 in connection
with Atlas, the supercomputer of its time built by the English company Ferranti. It
was a dark, secretive term. Since compiler writing was still an art mastered by
only a few initiates, one could only touch one's cap humbly to people who
were involved in writing compilers which generated compilers. There was just
no way to understand them.
The two works which focused attention on compiler generating programs
and which eliminated much of the mystery from the concept were the
anthology by Rosen [1967] and the survey article 'Translator Writing Systems' by
Feldman and Gries [1968]. But it was the clear formulation of the two most
important classes of deterministic grammars, LR(k) grammars by Knuth [1965] and
LL(k) grammars by Lewis and Stearns [1968], that gave compiler
generators their actual breakthrough.
Today, the terms 'compiler generator', 'compiler generating program'
and 'compiler compiler' are used synonymously and refer to a system which
in some way supports and partially automates the production of compilers.
In the first chapter we introduce the concepts of 'compiler' and 'compiler
compiler', survey the subtasks which a compiler must handle and discuss the
organization of the book. The reader who is acquainted with the terminology
of compiler construction, even only partially, can start immediately with
Section 1.4.
1.1 Compilers and compiler compilers
With the exception of special cases, a program can be seen as the description
of a process (algorithm) which transforms input data into output data (Fig.
1.1).
[Fig. 1.1 Program: input data → program P → output data]
If the input data themselves form a program, and the program P transforms
them into another language, P is called a compiler, the input data are called the
source program and the output data are called the target program (Fig. 1.2).
[Fig. 1.2 Compiler: source program S → compiler C → target program T]
Here the source language is almost always the higher-level, less
machine-oriented language, and the target language the lower-level, more
machine-oriented language, often the machine language itself. Thus a compiler
can be defined as in Waite and Goos [1984].
1.1 Definition Compiler
A compiler is a program which transforms an algorithm from a language
acceptable to humans into a language acceptable to machines.
Because a compiler is a complex program which itself must be written in a
programming language, the question arose quite early as to whether, given an
abstract description of the source language and its transformation into a target
language, a compiler could be generated either completely or partially. A
program CC which is to solve such a task reads the description of the source
language S together with its transformation into a target language T as input data.
It transforms this description into a program C which, when it is later
executed, transforms source programs written in S into the target language T. Thus
CC generates a compiler C, and is known as a compiler generator or
compiler compiler (Fig. 1.3).
[Fig. 1.3 Compiler compiler: compiler description in compiler description language CDL → compiler compiler CC → compiler in compiler implementation language OL]
This leads to the following definition.
1.2 Definition Compiler compiler
A compiler compiler is a program which generates a compiler, or major
parts thereof, from the complete or partial description of the compiler.
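To make Definition 1.2 concrete, here is a minimal sketch in Python; it is our own illustration, not part of Coco. The compiler description is reduced to a grammar given as a Python dictionary, and the hypothetical function `generate_parser` plays the role of CC: from the description it produces a new program, a recognizer that checks whether a sequence of symbols belongs to the described language. It assumes a very restricted class of grammars in which every alternative begins with a terminal and the alternatives of each nonterminal begin with different terminals, so that one look-ahead symbol always suffices; Chapter 2 treats the general LL(1) case.

```python
def generate_parser(grammar, start):
    """Generate a recognizer from a grammar description.

    `grammar` maps each nonterminal to a list of alternatives; each
    alternative is a non-empty list of symbols. Symbols that are not keys
    of `grammar` are terminals. Assumption: every alternative starts with a
    terminal, and the alternatives of one nonterminal start with distinct
    terminals, so a single look-ahead symbol selects the alternative.
    """

    def parse(symbol, tokens, pos):
        if symbol not in grammar:                  # terminal: must match input
            if pos < len(tokens) and tokens[pos] == symbol:
                return pos + 1
            raise SyntaxError(f"{symbol!r} expected at position {pos}")
        lookahead = tokens[pos] if pos < len(tokens) else None
        for alternative in grammar[symbol]:        # nonterminal: pick alternative
            if alternative[0] == lookahead:
                for s in alternative:
                    pos = parse(s, tokens, pos)
                return pos
        raise SyntaxError(f"no alternative of {symbol} starts with {lookahead!r}")

    def recognizer(tokens):
        """The generated program: True if `tokens` is a sentence of the language."""
        end = parse(start, tokens, 0)
        if end != len(tokens):
            raise SyntaxError(f"unexpected {tokens[end]!r} at position {end}")
        return True

    return recognizer


# A hypothetical compiler description: a tiny statement language.
GRAMMAR = {
    "Stat": [["if", "Cond", "then", "Stat"], ["x", ":=", "Expr"]],
    "Cond": [["b"]],
    "Expr": [["x"], ["1"]],
}

recognize = generate_parser(GRAMMAR, "Stat")
```

The generated `recognize` accepts `if b then x := 1` and raises `SyntaxError` for `x := then`. A real compiler compiler would also generate the translation part from the description, which this sketch omits.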
A compiler compiler and the compiler it generates can be represented as in Fig.
1.4.
[Fig. 1.4 Compiler compiler and the generated compiler: compiler description in CDL → compiler compiler CC → compiler (in OL); source program S → compiler → target program T]
A compiler compiler and its compiler description language are very closely
related. For the user of a compiler compiler the compiler description language
is actually the only interesting feature because it determines whether the
description of the compiler to be generated can be formulated and how
conveniently this may be accomplished.
Compiler description languages have two primary tasks: (1) the
description of the syntax of the source language of the compiler to be generated and
(2) the description of the transformation of the source program into the target
program. Because the meaning of the source program is visible in this
transformation, the description of the transformation is also known as a semantic
description.
There are basically two notations for syntax description: Backus-Naur
form (BNF) and Extended Backus-Naur form (EBNF). Both describe the
syntax as a grammar in the form of so-called productions. They constitute
well-understood formal systems and are based on the theory of formal
languages.
The technique of describing semantics is less consolidated. Aside from
ad hoc methods, attributed grammars in a wide variety of forms are usually
applied here.
The compiler compiler described in this book is named Coco (a not very
imaginative abbreviation of 'compiler compiler') and its compiler description
language is called Cocol (compiler compiler language). Cocol uses the EBNF
of Wirth [1982] for syntax description and a special form of attributed
grammars, the so-called L-attributed grammars, for semantic descriptions.
Coco was originally implemented in PL/M-80 and generated compilers in
Pascal-86. The version described here is written in Modula-2 and generates
compilers in Modula-2. Table 1.3 shows the versions of Coco that were
available at the time of writing for several popular computers. They differ in
the language of the generated compilers (Modula-2 or Pascal) and in the
machines on which they run.
1.3 Versions of Coco
Computer            Modula-2            Pascal
Macintosh           Mac-METH            Turbo-Pascal
MS-DOS computers    Logitech V. 3.0     Turbo-Pascal V. 4.0
                    M2-SDS
                    Taylor-Modula
ATARI-ST            TDI-Modula
IBM/370             Modula/370
1.2 Static compiler structure
Like the translation of a sentence in a natural language Q into another natural
language Z, the transformation of a source program into a target program can
be roughly divided into two phases. First the sentence in Q must be
'understood' through grammatical analysis. With knowledge of its grammatical
structure and the aid of a dictionary it is then possible to construct the sentence
in Z with the same meaning. In a similar way, the translation of a program
consists of analysis and synthesis.
In the analysis phase the source program is decomposed into its
constituent parts. Here one distinguishes:
1. lexical analysis, which transforms the input character stream into
'symbols' such as names, numbers and operators;
2. syntax analysis, which analyzes the grammatical structure of the
program;
3. semantic analysis, which analyzes all the properties of the program
which are not of a syntactical nature.
Analysis yields:
1. the determination of the correctness of the program;
2. the internal representation of the source program in a form which is
particularly well adapted for synthesis (so-called intermediate language);
3. memory tables which are used for further processing of the intermediate
language.
[Fig. 1.5 Static compiler structure: source program → lexical analysis → syntax analysis → semantic analysis (analysis, compiler front end) → optimization → code generation (synthesis, compiler back end) → target program]
In the synthesis phase the target program is generated from the program in the
intermediate language. Here one distinguishes:
1. optimization, which transforms the program in the intermediate language
to improve the target program with respect to certain criteria;
2. code generation, which generates the target program from the optimized
intermediate language.
This static, or logical compiler structure is shown in Fig. 1.5.
The analysis sections are determined by the source language and the
intermediate language; the synthesis sections are determined by the intermediate
language and the target language. The analysis sections are known as the compiler
front end; the synthesis sections are known as the compiler back end.
The compiler front end is independent of the target language; the compiler
back end is independent of the source language.
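The two synthesis steps can be illustrated with a minimal Python sketch. It is our own illustration, not taken from Coco (which, like this book, concentrates on analysis), and it assumes a hypothetical intermediate language in which an expression is a nested tuple and a hypothetical stack machine as the target. `fold` performs one simple optimization, constant folding; `generate` emits target instructions. Neither function knows anything about the source language, which mirrors the independence of the back end stated above.

```python
def fold(node):
    """Optimization: replace operator nodes with two constant operands by a constant."""
    if node[0] in ("+", "*"):
        op, left, right = node[0], fold(node[1]), fold(node[2])
        if left[0] == "const" and right[0] == "const":
            value = left[1] + right[1] if op == "+" else left[1] * right[1]
            return ("const", value)
        return (op, left, right)
    return node                                  # ("const", n) or ("var", name)


def generate(node, code):
    """Code generation: append postfix instructions for a hypothetical stack machine."""
    kind = node[0]
    if kind == "const":
        code.append(f"LOADC {node[1]}")          # push a constant
    elif kind == "var":
        code.append(f"LOAD {node[1]}")           # push the value of a variable
    else:
        generate(node[1], code)                  # operands first, then operator
        generate(node[2], code)
        code.append("ADD" if kind == "+" else "MUL")
    return code
```

For the intermediate form of `2 * 4 + base`, `fold` yields `('+', ('const', 8), ('var', 'base'))`, from which `generate` produces `LOADC 8`, `LOAD base`, `ADD`.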
Compiler compilers primarily support the analysis phase; this book therefore
deals only with the analysis phase.
Lexical analysis
Lexical analysis preprocesses the source program text in order to simplify the
tasks of the later phases. This preprocessing includes the following points:
1. Elimination of meaningless characters. Comments, empty lines and
unnecessary spaces are eliminated.
2. Recognition of symbols. One or more characters in sequence which
together constitute a symbol are recognized. Symbols are:
(a) keywords such as IF, WHILE, END, etc.;
(b) names for constants, types, variables, procedures, etc.;
(c) literals (numerical constants) such as 3.14;
(d) strings, usually enclosed in inverted commas, such as 'This is a
string';
(e) compound characters such as ':=', '<=', '..', etc.;
(f) individual characters such as '(', '+', etc.
3. Arithmetization of symbols. Because numbers can be processed more
easily than strings, keywords, names and strings are replaced by
numbers, and literals are converted to the internal numerical representation of
the machine. This process is known as arithmetization. Names are stored
in a name list, strings in a string list, and literals, possibly, in a constant
list.
1.4 Example Lexical analysis
The source statement
x := 3 + base * factor;
contains the names x, base and factor; the numerical value 3, the
character combination ':=' and the individual characters '+', '*' and ';'. If
ident, becomessy, number, plussy, timessy and semicolonsy are
names for the arithmetized symbols, lexical analysis yields the sequence
of 8 symbols:
ident becomessy number plussy ident timessy ident semicolonsy
Some of these symbols are uniquely determined (e.g. plussy); others
such as ident and number refer to a class of symbols and must be made
unique by a semantic value (e.g. an index in the name list for names, the
converted numerical value for literals). If x, base and factor are stored
respectively in places 1, 2 and 3 in the name list, lexical analysis yields
the following symbols with their semantic values:
ident/1 becomessy number/3 plussy ident/2 timessy ident/3
semicolonsy
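The three preprocessing steps and Example 1.4 can be reproduced with a short hand-written scanner. This is a sketch in Python using the standard `re` module, not the Cocol lexical analyzer of Section 8.1. The symbol names are those of Example 1.4; the comment syntax `(* ... *)` and the restriction to integer literals are our own assumptions, and keywords are omitted for brevity.

```python
import re

# Symbol classes, tried in this order at each position. The names follow
# Example 1.4; the Modula-2 style comment syntax is an assumption.
TOKEN_SPEC = [
    ("comment",     r"\(\*.*?\*\)"),
    ("blank",       r"\s+"),
    ("number",      r"\d+"),
    ("ident",       r"[A-Za-z][A-Za-z0-9]*"),
    ("becomessy",   r":="),
    ("plussy",      r"\+"),
    ("timessy",     r"\*"),
    ("semicolonsy", r";"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC),
                     re.DOTALL)


def scan(text):
    """Return the list of (symbol, semantic value) pairs and the name list."""
    name_list = {}                        # name -> place in the name list (1-based)
    symbols = []
    pos = 0
    while pos < len(text):
        match = PATTERN.match(text, pos)  # 2. recognize the next symbol
        if match is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("comment", "blank"):
            continue                      # 1. eliminate meaningless characters
        if kind == "ident":               # 3. arithmetize: place in the name list
            symbols.append((kind, name_list.setdefault(lexeme, len(name_list) + 1)))
        elif kind == "number":            # 3. convert to internal representation
            symbols.append((kind, int(lexeme)))
        else:                             # uniquely determined symbol, no value
            symbols.append((kind, None))
    return symbols, list(name_list)
```

`scan("x := 3 + base * factor;")` returns the eight symbols `ident/1 becomessy number/3 plussy ident/2 timessy ident/3 semicolonsy` (as pairs such as `("ident", 1)`), with `x`, `base` and `factor` in places 1 to 3 of the name list.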
Lexical analysis is the simplest part of the compiler. However, it does take up
a large portion of the compilation time (typically 20 to 40%), which means that
efficiency is especially important.
A lexical analyzer written in Cocol is described in Section 8.1. Apart from this,
lexical analyzers are not discussed in this book; the reader is referred to the
literature, for example Gries [1971] or Bauer [1976].
Syntax analysis
Syntax analysis decomposes the source program, which now consists of
symbols, into its grammatical parts and represents its structure as a tree (called a
syntax tree) or as something equivalent to a tree.
[Fig. 1.6 Syntax tree: an Assignment node whose subtrees are the Variable, the assignment symbol, the Expression (consisting of Terms) and the closing semicolon]
1.5 Example Syntax tree
The source statement in Example 1.4 is an assignment. An assignment
consists of a variable, the assignment symbol, an expression and a
closing semicolon. An expression consists of terms connected by addition
operators, and terms consist of factors connected by multiplication
operators. This yields the syntax tree in Fig. 1.6.
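A recursive descent parser for the assignment of Example 1.5 shows how the grammatical structure is turned into a tree. This is a sketch in Python under our own assumptions: the grammar is the EBNF given in the comments, the input is the list of (symbol, semantic value) pairs produced by lexical analysis, and the tree is represented as nested tuples that keep only the nonterminals and semantic values (the fixed symbols `:=` and `;` are implied). Each nonterminal becomes one method; Chapter 2 treats this technique systematically.

```python
class Parser:
    """Recursive descent parser; one method per nonterminal."""

    def __init__(self, symbols):
        self.symbols = list(symbols) + [("eof", None)]   # end-of-input marker
        self.pos = 0

    def peek(self):
        return self.symbols[self.pos][0]

    def expect(self, kind):
        found, value = self.symbols[self.pos]
        if found != kind:
            raise SyntaxError(f"{kind} expected, {found} found")
        self.pos += 1
        return value

    def parse(self):
        tree = self.assignment()
        self.expect("eof")
        return tree

    def assignment(self):        # Assignment = ident ":=" Expression ";" .
        variable = ("Variable", self.expect("ident"))
        self.expect("becomessy")
        expression = self.expression()
        self.expect("semicolonsy")
        return ("Assignment", variable, expression)

    def expression(self):        # Expression = Term { "+" Term } .
        terms = [self.term()]
        while self.peek() == "plussy":
            self.expect("plussy")
            terms.append(self.term())
        return ("Expression", *terms)

    def term(self):              # Term = Factor { "*" Factor } .
        factors = [self.factor()]
        while self.peek() == "timessy":
            self.expect("timessy")
            factors.append(self.factor())
        return ("Term", *factors)

    def factor(self):            # Factor = ident | number .
        kind = self.peek()
        if kind not in ("ident", "number"):
            raise SyntaxError(f"factor expected, {kind} found")
        return (kind, self.expect(kind))
```

For the symbol sequence of Example 1.4 the parser returns an Assignment whose Expression contains two Terms: one with the number 3, and one with the names in places 2 and 3 of the name list joined by the multiplication.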
Syntax analysis is much more difficult than lexical analysis. There are,
however, methods for syntax analysis which are based on the grammar of the
source language. Knowledge of these methods makes syntax analysis a
routine task.
Semantic analysis
Semantic analysis examines the properties of the source program which cannot
be represented grammatically, in particular:
1. the scope of names;
2. the correspondence between declarations and uses of names;
3. the type compatibility of operands in expressions and statements.
Semantic analysis and syntax analysis can be performed together, in which
case the two phases merge; or they can be performed separately, in which case
the syntax tree, the result of the syntax analysis, is augmented with semantic
information.
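Scope and the correspondence between declarations and uses of names are commonly handled with a symbol table organized as a stack of scopes. The following Python sketch is our own illustration, with hypothetical names `SymbolTable`, `SemanticError` and `check_assignment`; type compatibility is simplified to equality of type names, which real languages relax considerably.

```python
class SemanticError(Exception):
    """Raised when a context condition of the source language is violated."""


class SymbolTable:
    def __init__(self):
        self.scopes = [{}]                       # innermost scope is last

    def open_scope(self):
        self.scopes.append({})

    def close_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        if name in self.scopes[-1]:
            raise SemanticError(f"{name} declared twice in the same scope")
        self.scopes[-1][name] = type_name

    def lookup(self, name):
        for scope in reversed(self.scopes):      # search from inner to outer scope
            if name in scope:
                return scope[name]
        raise SemanticError(f"{name} is not declared")


def check_assignment(table, target, operands):
    """Check `target := op1 + op2 + ...`: all names declared, all types equal."""
    target_type = table.lookup(target)
    for name in operands:
        if table.lookup(name) != target_type:
            raise SemanticError(f"{name} is not type compatible with {target}")
```

A name declared in an inner scope is visible only until that scope is closed, and an inner declaration hides an outer one with the same name.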
1.3 Dynamic compiler structure
Dynamic, or time-dependent, compiler structure must be distinguished from
static, or logical, compiler structure. The individual logical divisions - lexical
analysis, syntax analysis, semantic analysis, optimization and code generation
- can be executed either sequentially or simultaneously, that is,
interwoven in time. Each part of the compiler which reads the source program or
an intermediate program in its entirety is called a pass, and thus compilers are
classified as single-pass or multi-pass compilers.
Figure 1.7 shows both cases. For a single-pass compiler the syntax
analyzer is the central, controlling program. It calls the lexical analyzer when it
requires the next source symbol, and it calls the semantic analyzer when it
wishes to pass on a syntactically correct construction. The semantic analyzer
generates a section of intermediate code or the corresponding machine code
(with or without optimization). For a multi-pass compiler each section is
executed sequentially. The result of each section is an intermediate program
which is written onto an external storage device and is read again by the next
pass.
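The difference in control flow can be sketched in Python. This is our own toy illustration under simplifying assumptions: symbols are just whitespace-separated words, the "syntax analyzer" merely counts them, and a temporary file stands in for external storage. In the single-pass version the syntax analyzer pulls each symbol from a generator exactly when it needs it; in the multi-pass version the first pass writes the complete symbol sequence out and the second pass reads it back.

```python
import json
import tempfile


def lexical_analyzer(text):
    """Yield one symbol at a time (here simply whitespace-separated words)."""
    for word in text.split():
        yield word


def syntax_analyzer(symbols):
    """Stand-in for a parser: consume symbols on demand and count them."""
    count = 0
    for _ in symbols:
        count += 1
    return count


def single_pass(text):
    # The syntax analyzer is in control: each symbol is produced only on request.
    return syntax_analyzer(lexical_analyzer(text))


def multi_pass(text):
    with tempfile.TemporaryFile("w+") as intermediate:
        # Pass 1 writes the complete symbol sequence to external storage ...
        json.dump(list(lexical_analyzer(text)), intermediate)
        # ... and pass 2 reads it back in its entirety.
        intermediate.seek(0)
        return syntax_analyzer(iter(json.load(intermediate)))
```

Both functions process the same eight symbols of `x := 3 + base * factor ;`; only the multi-pass version pays for writing and reading the intermediate program.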
Single-pass compilers are generally much faster than multi-pass compilers
because they avoid access to external storage devices for reading and writing
intermediate programs. Multi-pass compilers, on the other hand, require less
storage space because only one part of the compiler need ever reside in main
storage at once, and they are logically simpler because the various parts are not
intertwined. Some source languages cannot even be compiled by single-pass
compilers because they contain grammatical constructs whose translation
requires information which becomes available only from parts of the source
program that are processed later. This is the case, for example, when a
variable can be used before it is declared.
The advantages and disadvantages of single-pass and multi-pass
compilers can be summarized as in Fig. 1.8.
[Fig. 1.7 Single-pass and multi-pass compilers. Single-pass: the syntax analyzer controls the lexical analyzer (symbols) and the semantic analyzer (tree parts), which feeds optimization and code generation. Multi-pass: lexical analyzer, syntax analyzer, semantic analyzer, and optimization and code generation run in sequence, each passing its intermediate program through external memory.]
Fig. 1.8 Properties of single-pass and multi-pass compilers

                          Single-pass   Multi-pass
                          compiler      compiler
Speed                         +             -
Memory                        -             +
Logical complexity            -             +
Universal applicability       -             +

+ = favorable, - = unfavorable
1.4 The structure of the book
This book consists of nine chapters and six appendices. The first three
chapters cover the principles of compiler construction as far as they are required for
an understanding of Coco; occasionally rather more than the minimum is
presented in order to provide a well-rounded picture. The fourth chapter provides
a glimpse into other compiler compilers, and the rest of the chapters present
Coco itself, its compiler description language, its implementation and
applications. Accordingly, the outline is as follows:
Principles of compiler construction
1. Introduction and survey
2. Syntax
3. Semantics
Various compiler compilers
4. Various compiler compilers
The compiler compiler Coco
5. The compiler description language Cocol
6. The compiler compiler Coco
7. The implementation
8. Applications
9. Experiences with Coco
The second chapter starts with those concepts from formal language theory
which are necessary for the remainder of the book. Then table-driven LL(1)
syntax analysis is covered; this determines the fundamental structure of this
compiler compiler, and at the same time is a simple and efficient method for
developing the syntactic section of compilers. Most importantly, this chapter
contains a method for automatic error recovery which is independent of the
language to be analyzed.
In the third chapter, the method applied in this compiler compiler for
describing the actual translation process, using attributed grammars, is
presented. The special case of L-attributed grammars is used here and the translation
process is described by attributes, context conditions and semantic actions.
The fourth chapter gives a survey of a few compiler generators described
in the literature, and thus also surveys the state of the art.
The fifth chapter is a definition of the compiler description language
Cocol.
The sixth chapter describes Coco from the viewpoint of the user: its
characteristics, how to use it and what the compilers it generates look like.
Along the way it is shown that Coco is also suitable for implementing
multipass compilers. This chapter, together with the language description of
Chapter 5, forms the 'external' description of Coco.
The seventh chapter, the longest, contains the details of the
implementation of Coco. This chapter is also intended as a study in program
documentation.
The eighth chapter presents three major examples of the use of Coco. The
first is a complete description of a lexical analyzer in Cocol. The second
illustrates Coco as a software engineering tool and the method of attributed
grammars as a software engineering method which encompasses the Jackson
method as a special case. The third presents the compiler sections generated
for a concrete input grammar.
Finally, the ninth chapter presents the authors' experiences with
Coco.
The appendices contain a definition of the algorithm description language
Adele used here, a description of Modula-2 insofar as it differs from or goes
beyond Pascal, a complete listing of Coco in Modula-2, and a description of
Coco in Cocol, that is, a self-description of Coco.
Systematic readers should read the book chapter by chapter. Readers who
wish to begin with lexical analysis should consult Section 8.1 as early as
Chapter 2. Readers who wish to know about Coco only (or firstly) from the
user's point of view can start immediately with Chapter 5 followed by
Chapters 6 and 8, and perhaps Chapter 4.
Finally, readers who are already familiar with LL(1) grammars and are
primarily interested in the implementation of Coco can acquaint themselves
first with Cocol in Chapter 5 and then concentrate on Chapters 6 and 7,
although they will occasionally have to refer back to Chapters 2 and 3.
The following chapter sequences are therefore recommended (Chapters
which extend the material are in italics):
Novices and all-embracing readers: 2-9
Primarily interested in applying Coco: 5, 6, 8, 4
Primarily interested in comparing Coco: 4, 5, 6, 8
Primarily interested in the implementation of Coco: 5, 6, 7, S.J
Some remarks have been repeated so that the chapters do not become too
interdependent. We hope the all-embracing reader will forgive us for this.
In general the presentation is organized according to the principle of
stepwise refinement. This holds for the individual chapters as well as for the
book as a whole. Thus Chapters 2 and 3 are basically refinements of Section
1.2, Chapters 5 and 7 refinements of Chapters 2 and 3 and Appendix F,
containing the text of Coco in Modula-2, is a refinement of Chapter 7.
For representing algorithms our algorithm design language Adele is
used. It is defined in Appendix A, but should be understandable without a
definition as it relies strongly on Modula-2 and Pascal. The authors use Adele
constantly in their daily work and view Adele as a method for algorithm
description which is adequate in most cases.
Actual Modula-2 programs occur only in the appendices, although Chapters 5
and 7 also contain Modula-2 fragments. The book is therefore accessible to
readers who are not familiar with Modula-2. Nevertheless, Modula-2 plays a
major role in this book because of its support for modular programming, and
especially for data encapsulation. One of the book's important aims is to
document a large Modula-2 program and, in the process, to demonstrate how
well Modula-2 is suited to software engineering projects.
Definitions, algorithms and examples are numbered and indented. A list of
all numbered items follows the table of contents so that they can be found
quickly.
2
Syntax
In this chapter we deal with all syntax-related questions as far as they concern
compilers that use LL(1) syntax analysis. First, we will summarize the
terminology and some important results of formal language theory. Next, we
look at LL(1) grammars and their syntax analysis. Since the flexibility
and efficiency of syntax analysis depend to a large degree on the
representation of the grammar in memory, we will describe the tree-like data
structure used in Coco, which is called a top-down graph. We will also describe
an optimized version of the top-down graph, called the G-code, which is
especially suited for interpretation. At the end of the chapter we describe the
G-code syntax analyzer and a method for automatic error handling.
Except for the G-code and its interpretation, this chapter is not
Coco-specific. Thus, it can be read as a general treatment of syntax issues in
compiler design. Bottom-up analysis and LR(k) grammars have been left out, since
they constitute a large and self-contained topic that does not apply to Coco.
Interested readers are referred to Knuth [1965], Aho and Johnson [1974],
Waite and Goos [1984], and Fischer and LeBlanc [1988].
2.1 Basic concepts from formal language theory
We assume that the reader is familiar with the elements of formal languages,
and we summarize here only the terms and definitions that we will use later
on. We primarily use the terminology from the books of Gries [1971] and
Aho and Ullman [1972].
Symbols and strings
Programs consist syntactically of sequences or strings of symbols which
belong to an alphabet or vocabulary. If a, b, c are the symbols that
constitute the alphabet V, then we can write:
$V = \{a, b, c\}$
Symbols can be concatenated to form strings. For some strings and sets of
strings there are commonly used abbreviations:
2.1 Definition Abbreviations for strings and sets of strings
$a^n$ denotes the string consisting of n identical symbols a, e.g. $a^3 = aaa$.
$\varepsilon$ denotes the empty string, i.e. a string of zero symbols.
$a^+$ denotes the set $\{a^n : n \ge 1\}$, e.g. $a^+ = \{a, aa, aaa, aaaa, \dots\}$.
$a^*$ denotes the set $\{a^n : n \ge 0\}$, e.g. $a^* = \{\varepsilon, a, aa, aaa, \dots\}$.
It is obvious that $a^* = a^+ \cup \{\varepsilon\}$.
$V^+$ denotes the set of all non-empty strings which can be formed from
the symbols contained in V. For example, if $V = \{a, b, c\}$ then
$V^+ = \{a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, \dots\}$
$V^+$ is called the free semi-group over the alphabet V.
$V^*$ denotes the set of all strings, including the empty string, that can be
formed from the symbols of V. For example, if $V = \{a, b, c\}$
then
$V^* = \{\varepsilon, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, \dots\}$
$V^*$ is called the free monoid over the alphabet V.
It is obvious that $V^* = V^+ \cup \{\varepsilon\}$.
The set V is always finite, whereas the sets $a^+$, $a^*$, $V^+$, $V^*$ are always
infinite.
Grammar and language
In Section 1.2, we showed that the grammatical structure of an instruction, a
program, or generally of a 'sentence' is a tree, the syntax tree. In the syntax
tree, there are two types of symbols:
1. Terminals are the symbols of the sentence itself. They are the leaves of
the tree and cannot be decomposed further.
2. Nonterminals are all other symbols.
In addition to the above, each tree contains a distinguished nonterminal, the
sentence symbol, or the root, from which the entire tree originates. The valid
structures of syntax trees and hence the sentences of a formal language are
described by a grammar.
A context-free grammar or, simply, grammar - since we only use
context-free grammars - is a system of rules for producing strings over an
alphabet V.
2.2 Definition Grammar
A grammar G is a quadruple $G = (V_N, V_T, R, S)$ with the following
components:
$V_N$: alphabet of nonterminals,
$V_T$: alphabet of terminals,
R: set of productions, also called syntax rules or derivation rules,
S: sentence symbol, a special symbol from $V_N$, the root of the syntax
tree.
By $V = V_N \cup V_T$ we denote the union of the terminal and nonterminal
alphabets.
A production is written as
$X \to \alpha$ where $X \in V_N$ and $\alpha \in V^*$
(read: 'X is defined as $\alpha$' or 'X can be replaced by $\alpha$'), which means that the
nonterminal X can be replaced by the string $\alpha$ in each string that contains X.
Several productions may have the same left-hand side, such as:
$X \to \alpha_1$
$X \to \alpha_2$
They denote alternatives and can be grouped by use of the symbol '|':
$X \to \alpha_1 \mid \alpha_2 \mid \alpha_3$
(read: 'X is defined as $\alpha_1$ or $\alpha_2$ or $\alpha_3$').
The productions describe the replacement of nonterminals by strings. We
start from the sentence symbol S, and replace it by a string according to the
productions of the grammar. Then we repeatedly replace nonterminals in the
string by other strings until we reach a string that contains only terminals. S
itself and all strings that result from S by the application of the productions
are called sentential forms. The sentential forms that consist of terminals only
are called sentences.
We denote replacement by the replacement or derivation symbol $\Rightarrow$. If $\alpha$
and $\beta$ are two sentential forms and $\beta$ may be derived from $\alpha$ by the application
of a production, we write:
$\alpha \Rightarrow \beta$
(read: '$\alpha$ produces $\beta$' or '$\beta$ is derived from $\alpha$').
These terms are formalized by Definition 2.3 and are illustrated by
Example 2.4.
2.3 Definition Derivation, sentential form, sentence, language
We say that a string $\alpha$ directly produces a string $\beta$, written $\alpha \Rightarrow \beta$, if
there exist strings $\omega_1$ and $\omega_2$ such that we can write $\alpha = \omega_1 A \omega_2$,
$\beta = \omega_1 \varphi \omega_2$ and the production $A \to \varphi$ belongs to the grammar. We then
call $\beta$ a direct derivation of $\alpha$. We describe a sequence of several
derivations by the symbols $\Rightarrow^+$ and $\Rightarrow^*$. A string $\alpha$ produces a string $\beta$,
written as
$\alpha \Rightarrow^+ \beta$
if there exists a sequence of direct derivations
$\alpha = \omega_0 \Rightarrow \omega_1 \Rightarrow \omega_2 \Rightarrow \dots \Rightarrow \omega_n = \beta$ where $n \ge 1$
Such a sequence is called a derivation of length n. For the case of
$\alpha \Rightarrow^+ \beta$ or $\alpha = \beta$, we write
$\alpha \Rightarrow^* \beta$
(read: '$\alpha$ produces or is equal to $\beta$'). If G is a grammar with sentence
symbol S, then a string $\alpha$ is called a sentential form if
$S \Rightarrow^* \alpha$
A sentence is a sentential form that consists only of terminals, and a
language L(G) is the set of all sentences that can be derived from the
sentence symbol:
$L(G) = \{\alpha : S \Rightarrow^+ \alpha \text{ and } \alpha \in V_T^*\}$
2.4 Example Derivation of all sentential forms of a language
Consider the grammar $G = (\{S, A, B\}, \{a, b, ;\}, R, S)$ with the
nonterminals S, A, B, the terminals a, b, ;, and the set R of
productions:
S -> A;
A -> aB | BBb
B -> b | ab
From this, the following derivations of sentential forms can be produced:
S => A; => aB;  => ab;
                => aab;
        => BBb; => bBb;  => bbb;
                         => babb;
                => abBb; => abbb;
                         => ababb;
The result is L(G) = {ab;, aab;, bbb;, babb;, abbb;, ababb;}. Hence,
the language L(G) consists of 6 sentences.
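Because this language is finite, its sentences can also be found mechanically. The following Python sketch is only an illustration (it is not part of Coco, and the names GRAMMAR, sentences and the cutoff max_len are assumptions of the sketch): the grammar is a dictionary from nonterminals to lists of alternatives, and a breadth-first search applies left-canonical derivations until only terminals remain.

```python
from collections import deque

# Example 2.4. Keys are the nonterminals; every other symbol is a terminal.
GRAMMAR = {
    "S": [["A", ";"]],
    "A": [["a", "B"], ["B", "B", "b"]],
    "B": [["b"], ["a", "b"]],
}

def sentences(grammar, start="S", max_len=10):
    """Return the sentences derivable from start by left-canonical derivations.

    Sentential forms longer than max_len symbols are abandoned, so the search
    also terminates for grammars with infinitely many sentences; the result
    then contains only sentences reachable without exceeding that length.
    """
    result = set()
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        form = queue.popleft()
        # index of the leftmost nonterminal, or None if form is a sentence
        pos = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if pos is None:
            result.add("".join(form))
            continue
        for alt in grammar[form[pos]]:
            new = form[:pos] + tuple(alt) + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return result

print(sorted(sentences(GRAMMAR)))
```

The `seen` set keeps the search from expanding the same sentential form twice; for the finite language above, the cutoff is never reached.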
A syntax tree can be assigned to each sentence. Figure 2.1 shows the syntax
tree of abbb; in two forms.
[Figure: the syntax tree of abbb; drawn in two forms]
Fig. 2.1 Two forms of syntax tree for abbb;
In the top-down syntax analysis discussed later on, we will always use
derivations in which the leftmost nonterminal is replaced. This kind of derivation
is called left-canonical:
2.5 Definition Left-canonical derivation
A direct derivation $\omega_1 A \omega_2 \Rightarrow \omega_1 \alpha \omega_2$ is called left-canonical
if $\omega_1 \in V_T^*$, that is, if A is the leftmost nonterminal. A derivation is
called left-canonical if all its direct derivations are left-canonical.
Sometimes it is useful to have a name for the string that is substituted for
a nonterminal during a derivation. This string is called a 'phrase'.
2.6 Definition Phrase
When $\omega_1 \alpha \omega_2$ is a sentential form such that
$S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow^* \omega_1 \alpha \omega_2$,
then $\alpha$ is called a phrase, more specifically an A-phrase.
According to this definition, each sentential form is an S-phrase.
Because of their importance in bottom-up syntax analysis, which is not
covered in this book, we shall also define the terms simple phrase and
handle.
2.7 Definition Simple phrase, handle
If $\alpha$ is an A-phrase and a direct derivation of A, then
$S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow \omega_1 \alpha \omega_2$
holds and $\alpha$ is also called a simple A-phrase. The leftmost simple phrase
in a sentential form is called the handle of the sentential form.
2.8 Example Phrase, simple phrase, handle
Consider Example 2.4 and the derivation sequence
$S \Rightarrow A; \Rightarrow B_1 B_2 b_3; \Rightarrow B_1 b_2 b_3; \Rightarrow a b_1 b_2 b_3;$
where the different Bs and bs have been distinguished by an index. In
the sentential form $a b_1 b_2 b_3;$
$a b_1$ is a simple B-phrase and the handle,
$b_2$ is a simple B-phrase,
$a b_1 b_2 b_3$ is an A-phrase,
$a b_1 b_2 b_3;$ is an S-phrase.
In the sentential form $B_1 b_2 b_3;$
$b_2$ is a simple B-phrase and the handle,
$B_1 b_2 b_3$ is an A-phrase,
$B_1 b_2 b_3;$ is an S-phrase.
In the sentential form $B_1 B_2 b_3;$
$B_1 B_2 b_3$ is a simple A-phrase and the handle,
$B_1 B_2 b_3;$ is an S-phrase.
In the sentential form A;
A; is a simple S-phrase and the handle.
Recursive productions produce languages with an infinite number of
sentences. The production A -> a | Ab produces the set of sentences $ab^*$, the
production A -> a | bA produces the set of sentences $b^*a$, and the production
A -> a | (A) produces the set of sentences $\{(^n\, a\, )^n : n \ge 0\}$.
Recursion can also appear indirectly, which means it can span several
productions, as in the production pair
A -> x | By
B -> z | Au
This leads to the following definition:
2.9 Definition Recursive grammar
A grammar is called recursive if it permits derivations of the form
$A \Rightarrow^+ \omega_1 A \omega_2$ with $A \in V_N$, $\omega_1 \in V^*$, $\omega_2 \in V^*$. More specifically, it is called
Left-recursive if $A \Rightarrow^+ A \omega$
Right-recursive if $A \Rightarrow^+ \omega A$
Central-recursive or self-embedding if $A \Rightarrow^+ \omega_1 A \omega_2$
2.10 Example Arithmetic expressions
The grammar of arithmetic expressions with the sentence symbol E and
the terminals v for variables and c for constants:
E -> T | +T | -T | E + T | E - T
T -> F | T * F | T / F
F -> v | c | ( E )
is left-recursive in E and T, and central-recursive in E, T, and F.
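For an ε-free grammar such as this one, left recursion can be detected mechanically: A is left-recursive exactly when A can be reached from itself by repeatedly following the leftmost symbol of an alternative. The Python sketch below illustrates this check on Example 2.10; it is not Coco's algorithm, the names EXPR and left_recursive are hypothetical, and grammars with empty alternatives would additionally need to know which nonterminals can derive the empty string.

```python
# Example 2.10: "v" and "c" are the terminal classes for variables and constants.
EXPR = {
    "E": [["T"], ["+", "T"], ["-", "T"], ["E", "+", "T"], ["E", "-", "T"]],
    "T": [["F"], ["T", "*", "F"], ["T", "/", "F"]],
    "F": [["v"], ["c"], ["(", "E", ")"]],
}

def left_recursive(grammar):
    """Return the nonterminals A with A =>+ A w; assumes an epsilon-free grammar."""
    # starts[A] = nonterminals that begin some alternative of A
    starts = {a: {alt[0] for alt in alts if alt[0] in grammar}
              for a, alts in grammar.items()}
    result = set()
    for a in grammar:
        stack, visited = list(starts[a]), set()
        while stack:                      # depth-first search along leftmost symbols
            x = stack.pop()
            if x == a:
                result.add(a)
                break
            if x not in visited:
                visited.add(x)
                stack.extend(starts[x])
    return result

print(sorted(left_recursive(EXPR)))
```

Central recursion (F -> ( E ) leading back to F) is deliberately not reported, since the leftmost symbol of that alternative is the terminal "(".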
The extended Backus-Naur form (EBNF)
Computer science uses various notations for grammar productions. The
notation used so far has the following characteristics:
1. terminals are lower case
2. nonterminals are upper case
3. the replacement symbol is ->
4. separation of alternatives is denoted by |
Indefinite repetition, which is a frequently occurring language element, must
be described by recursive productions, in particular by left-recursive productions.
In many cases this seems unnatural, and left recursion is also unsuited for
top-down syntax analysis. Several grammatical notations have therefore evolved that
remove these and other deficiencies. Among these, the notation introduced by
Wirth [1982] for the description of Modula-2 is especially notable because of
its simplicity and clarity. Its characteristics are:
1. Terminals that represent themselves (literals) are in quotes
2. Other terminals and nonterminals have names that imply their meaning
(this is customary but not mandatory)
3. The replacement symbol is =
4. Separation of alternatives is denoted by |
5. Productions are ended explicitly by a period
6. Option symbol: [A] means A | $\varepsilon$
7. Repetition symbol: {A} means $\varepsilon$ | A | A A | A A A | ...
8. Parentheses for grouping
The grammar of the arithmetic expressions is as follows:
expression = ["+"|"-"] term {("+"|"-") term}.
term = factor {("*"|"/") factor}.
factor = c | v | "(" expression ")".
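To show how the EBNF constructs map onto control structures, here is a minimal Python sketch of a recursive-descent recognizer for this grammar: one function per nonterminal, an if statement for an option [...] and a while loop for a repetition {...}. It is only an illustration of the notation; Coco itself uses the table-driven analysis discussed later in this chapter. The input is assumed to be a list of already-scanned terminals such as "v", "c" and the operator symbols, and the name parse_expression is hypothetical.

```python
def parse_expression(tokens):
    """Return True if tokens form a sentence of the expression grammar.

    tokens is a list of terminals such as ["v", "+", "c"]; each EBNF
    production becomes one function, [...] an if, and {...} a while loop.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(sym):
        nonlocal pos
        if peek() != sym:
            raise SyntaxError(f"expected {sym!r}, found {peek()!r}")
        pos += 1

    def expression():      # expression = ["+"|"-"] term {("+"|"-") term}.
        if peek() in ("+", "-"):
            expect(peek())
        term()
        while peek() in ("+", "-"):
            expect(peek())
            term()

    def term():            # term = factor {("*"|"/") factor}.
        factor()
        while peek() in ("*", "/"):
            expect(peek())
            factor()

    def factor():          # factor = c | v | "(" expression ")".
        if peek() in ("c", "v"):
            expect(peek())
        else:
            expect("(")
            expression()
            expect(")")

    try:
        expression()
    except SyntaxError:
        return False
    return pos == len(tokens)      # the whole input must have been consumed

print(parse_expression(["v", "+", "v", "*", "c"]))
```

Each decision is made by looking at a single next terminal, which is possible only because this grammar avoids left recursion by using repetition instead.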
The form of the EBNF grammar itself can also be described by an EBNF
grammar:
EBNFGrammar = production {production}.
production = ident "=" expr ".".
expr = term {"|" term}.
term = factor {factor}.
factor = ident | string
       | "(" expr ")" | "[" expr "]" | "{" expr "}".
ident is the terminal for names, string is the terminal for a character string
enclosed in quotes.
In this book, we will primarily use Wirth's EBNF notation. However,
where structural simplicity of the grammar is important, we will still use the
older notation of formal language theory.
Reduced grammars
In the grammar of a programming language, each nonterminal and each
alternative should contribute to the generation of sentences. If this is the case, the
grammar is called reduced. In the development of a grammar, unnecessary
nonterminals and alternatives may creep in. Therefore, each newly developed
grammar should be checked to determine if it is reduced. If it is not, the
unnecessary symbols and productions should be removed.
In order to contribute to the generation of sentences, each nonterminal
must meet the following two conditions: It must be 'terminating', that is, it
must produce a terminal string, and it must be 'derivable', that is, it must
appear in some sentential form.
2.11 Definition Terminating symbol, derivable symbol
A nonterminal A is called terminating if it produces a terminal string,
that is,
$A \Rightarrow^+ \alpha$ with $\alpha \in V_T^*$.
A nonterminal A is called derivable if it appears in a sentential form, that
is, if
$S \Rightarrow^* \omega_1 A \omega_2$.
A nonterminal that is not derivable or not terminating contributes nothing to
the generation of sentences, and is therefore useless.
2.12 Definition Useless symbol
A nonterminal A is called useless if there is no derivation
$S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow^* \omega_1 \alpha \omega_2$ where $\omega_1, \omega_2, \alpha \in V_T^*$
2.13 Definition Reduced grammar
A grammar that contains no useless nonterminals is called reduced.
Algorithms for the detection of all useless symbols are simple (see Sections
7.5.2 and 7.5.4, or Hopcroft and Ullman [1979]). If one wants to delete
them, the order is important. First, the nonterminating symbols must be
found and all alternatives in which they appear must be deleted from the
grammar. Then, the nonderivable symbols and the alternatives in which they
appear must be found in the new grammar and deleted. Automatic deletion is
possible but not recommended since useless symbols often indicate errors in
the grammar.
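The two-step deletion can be sketched in Python as follows. The terminating nonterminals are found by a fixed-point iteration, the derivable ones by a graph search from the sentence symbol, and the search is run only on the grammar that remains after the first step, as required above. This is an illustrative sketch, not the algorithm of Sections 7.5.2 and 7.5.4; the function names and the small grammar DEMO are made up for the purpose.

```python
def terminating(grammar):
    """Nonterminals that derive some terminal string (fixed-point iteration)."""
    term = set()
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            if a not in term and any(all(s not in grammar or s in term for s in alt)
                                     for alt in alts):
                term.add(a)
                changed = True
    return term

def derivable(grammar, start):
    """Nonterminals that appear in some sentential form derived from start."""
    seen, stack = {start}, [start]
    while stack:
        for alt in grammar[stack.pop()]:
            for s in alt:
                if s in grammar and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen

def reduce_grammar(grammar, start):
    """Delete nonterminating symbols first, then nonderivable ones."""
    term = terminating(grammar)
    if start not in term:
        return {}                          # the language is empty
    # step 1: drop nonterminating nonterminals and every alternative using one
    step1 = {a: [alt for alt in alts if all(s not in grammar or s in term for s in alt)]
             for a, alts in grammar.items() if a in term}
    # step 2: drop nonterminals that are not derivable in the remaining grammar
    reach = derivable(step1, start)
    return {a: alts for a, alts in step1.items() if a in reach}

# A never terminates (A -> A c); B is never derivable from S.
DEMO = {"S": [["a", "A"], ["b", "C"]], "A": [["A", "c"]], "B": [["d"]], "C": [["c"]]}
print(reduce_grammar(DEMO, "S"))
```

The order matters: for S -> a | AB, A -> a, B -> Bb, checking derivability first would keep A, although A becomes useless once the alternative AB is deleted because B does not terminate.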
Even after removing useless symbols, the grammar may still contain
useless alternatives, which permit derivations of the form $A \Rightarrow^+ A$. Such a
derivation is called a circular derivation, and the grammar is called circular or
cyclic. Section 7.5.3 contains an algorithm for a circularity check of a
grammar. The book by Hopcroft and Ullman [1979] contains an algorithm for the
deletion of productions where the right-hand side consists of only a
nonterminal, and thus for the removal of cycles.
In the following, we will cover only non-circular reduced grammars.
Grammatical levels
Programming languages contain constructs at various levels of hierarchy. At the very
top are programs, which are composed of modules, procedures, declarations
and statements. Declarations and statements in turn are composed of
expressions, keywords, names and numbers. Names and numbers are composed
of characters. It is somewhat arbitrary which of these constructs are defined
as terminals. If one only wants to show the nesting of procedures, then
declarations and expressions can be regarded as terminals. If one wants to describe
the structure of expressions, then keywords, names, numbers, and operators
can be regarded as terminals. Only if one wants to descend further must
individual characters be seen as terminals.
In this way, the syntax of a programming language need not be
completely described by one grammar, but may be partitioned into several grammatical
levels. The terminals of the higher level are the nonterminals of the lower
level.
In compiler design, usually two levels are used: the syntactical and the
lexical level. The syntactical level is the higher of the two; its sentence symbol
is the program. Its terminals are keywords, names, numbers, operators, etc.
Below this, nonterminals of the lexical level are keywords, names, numbers,
and special symbols. Its terminals are the individual characters of the source
text, insofar as they are meaningful for the grammar (comments, end-of-lines,
and insignificant blanks are not part of the grammar). Figure 2.2 shows
this relationship.
Level       Nonterminals                                  Terminals
syntactic   program, procedure, statement, expression     name, number, keyword
lexical     name, number, keyword                         individual character
Fig. 2.2 Syntactic and lexical grammatical levels
In this book, we consider mainly the syntactical level. This results in a
difficulty with the notation of terminals. From the syntactical level, the expression
a + b * 3
consists of two names v, a number c, and the operators '+' and '*':
v + v * c
While the terminals '+' and '*' are simultaneously members of the syntactical
and the lexical level, the terminal v denotes all names, and the terminal c
denotes all numbers. In order to emphasize this fact, we call terminals of the
syntactical level that represent an entire class of symbols of the lexical level, a
terminal class. Thus, in the grammar of arithmetic expressions, v and c are
terminal classes, and +, -, *, /, (,) are individual terminals.
It is to some extent arbitrary which terminals of the lexical level are also
considered as terminals of the syntactical level, and which are combined to
terminal classes. For instance, the operators *, /, and MOD from the lexical
level can be considered at the syntactical level as individual terminals or can be
combined at the lexical level into the terminal class mulop by the production
mulop = "*" | "/" | "MOD".
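A scanner is what turns the characters of the lexical level into the terminals of the syntactical level. As a rough Python sketch (the regular expression and the function name scan are assumptions, and keywords such as MOD are not treated specially here), names become the terminal class v, numbers become the terminal class c, and every other non-blank character stands for itself:

```python
import re

# Hypothetical token classes: a name starts with a letter, a number is a digit
# sequence, and any other non-blank character is an individual terminal.
TOKEN = re.compile(r"([A-Za-z][A-Za-z0-9]*)|([0-9]+)|(\S)")

def scan(text):
    """Map source text to syntactic-level terminals; blanks are skipped."""
    terminals = []
    for name, number, other in TOKEN.findall(text):
        if name:
            terminals.append("v")      # terminal class for all names
        elif number:
            terminals.append("c")      # terminal class for all numbers
        else:
            terminals.append(other)    # individual terminal, e.g. "+" or "*"
    return terminals

print(scan("a + b * 3"))
```

For the expression a + b * 3 this yields the terminal string v + v * c used above. Blanks never match any of the three alternatives and are therefore skipped.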
2.2 LL(1) grammars and syntax analysis
A grammar for a language can be used in two different ways: as a generative
grammar for the generation of sentences of the language, and as an analyzing
grammar for the decision whether a given string is a sentence of the language.
The generation of sentences is a trivial, straightforward, combinatorial
problem, and of no interest in the practical areas of computer science.
However, the aspect of the generative grammar is important in theoretical computer
science and mathematics. In these fields grammars are classified according to
the expressive power and the structural characteristics of the languages they
generate.
The analysis, more precisely the recognition of sentences is, from a
mathematical point of view, also a trivial problem. All sentences of the
grammar may simply be generated in ascending order by their length, and it is then
easily determined if the specified string is among the sentences (search by
exhaustion). In reality, this is not feasible since the number of sentences
generally grows exponentially with their length. Analysis methods are needed that
make use of all information in the grammar, and that perform the analysis of
the given string in a minimum of time and memory requirement. These
methods can be separated into two main categories: top-down methods start at
the top with the sentence symbol and move downward by repeated derivations
trying to find a sentence which matches the given terminal string; bottom-up
methods start at the given terminal string and move upward by repeated
reductions of phrases until the sentence symbol is reached. In addition to these
two main approaches, there are analysis methods that mix the top-down and
bottom-up approach.
In this book, we will cover only top-down analysis.
In top-down analysis, we start from the sentence symbol and repeatedly
generate new sentential forms by left-canonical derivations, with the goal of
deriving a sentence matching the given string. If this is successful, the string
has been parsed. If it is not successful, and we have exhausted all possibilities
for the derivation of sentences that match the string, then it is clear that the
symbol string is not a sentence of the grammar.
The only difficulty with this approach is the selection of the correct
alternative. Generally, there is not enough information available at the time when
the selection between several alternatives must be made to be reasonably sure
of choosing the correct one. Therefore, usually the alternatives must be tried
one after the other until the correct one is found. The alternatives that have
been tried unsuccessfully are dead ends from which one has to return by
backtracking. Fortunately, programming languages are structured in such a
way that the proper alternative can be determined with certainty by considering
only a part of the input string. These grammars are called deterministic. In
compiler construction, only deterministic grammars are used, and so we shall
cover only the top-down analysis of deterministic grammars.
Deterministic top-down parsing
The concept of deterministic top-down parsing consists in selecting the proper
alternative by looking at the start symbols of the string to be analyzed. In this
way parsing proceeds from left to right. Consider, for instance, the grammar
S -> aA | bB
A -> x | aB
B -> y | bA
and the input string $\alpha = bbay$. The grammar has the property that all of its
alternatives start with terminals, and also that the heads of the alternatives are
different in each rule. This property permits the dead-end-free determination of
the correct alternative by consulting the string $\alpha$. Assuming that the string is
read from left to right, the parsing proceeds as follows:
1. In the beginning, when a choice has to be made between $S \Rightarrow aA$ and
$S \Rightarrow bB$, the first symbol of $\alpha$ is read, b is found, and therefore it is
known that $S \Rightarrow bB$ must be the correct alternative since $S \Rightarrow aA$ can
never lead to a sentence starting with b.
2. If bB is derived further, one has the choice of replacing B with y or
with bA. When the next symbol is read, a b is found, and so bA must be
the correct alternative.
3. Continuing this procedure, the following derivations are generated:
$S \Rightarrow bB \Rightarrow bbA \Rightarrow bbaB \Rightarrow bbay$
resulting in the recognition of $\alpha$ as a sentence.
From the above derivation, the syntax tree of Fig. 2.3 follows.
[Figure: syntax tree of bbay]
Fig. 2.3 Syntax tree
This is the essence of deterministic top-down parsing: Starting with the
sentence symbol, a sequence of left-canonical derivations is built, selecting the
correct alternatives by the inspection of the string to be parsed. The string is
read from left to right.
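The procedure can be sketched in a few lines of Python. The sketch below is illustrative (its names are hypothetical, and it relies on the property stated above that the alternatives of each nonterminal start with distinct terminals): it always expands the leftmost nonterminal, chooses the alternative whose first symbol equals the next input symbol, and records the resulting left-canonical derivation.

```python
# The grammar from the text: every alternative begins with a terminal, and the
# alternatives of each nonterminal begin with different terminals.
GRAMMAR = {
    "S": [["a", "A"], ["b", "B"]],
    "A": [["x"], ["a", "B"]],
    "B": [["y"], ["b", "A"]],
}

def derive(grammar, start, text):
    """Return the left-canonical derivation of text as a list of sentential
    forms, or None if text is not a sentence of the grammar."""
    form = [start]
    steps = [start]
    pos = 0                                   # form[:pos] matches text[:pos]
    while pos < len(form):
        sym = form[pos]
        if sym not in grammar:                # terminal: compare with the input
            if pos >= len(text) or text[pos] != sym:
                return None
            pos += 1
            continue
        nxt = text[pos] if pos < len(text) else None
        # sym is the leftmost nonterminal; one input symbol selects the alternative
        alt = next((alt for alt in grammar[sym] if alt[0] == nxt), None)
        if alt is None:
            return None                       # no alternative can start with nxt
        form[pos:pos + 1] = alt
        steps.append("".join(form))
    return steps if pos == len(text) else None

print(" => ".join(derive(GRAMMAR, "S", "bbay")))
```

No alternative is ever retried, so there is no backtracking; the price is that the sketch works only for grammars with the property described above.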
More than one symbol of the input string must be considered when
several alternatives of a production start with the same symbol. This lookahead is
characteristic of LL(k) grammars:
2.14 Definition LL(k) grammar
A grammar is called LL(k) (deterministically recognizable from left to
right with left-canonical derivations and a lookahead of k symbols) if its
sentences can be parsed by a top-down analysis from left to right in such
a way that in each situation where a choice must be made between several
alternatives, the correct alternative can always be found by considering
the next k symbols of the input string.
Since it is desirable to restrict the lookahead to one symbol, and since it turns
out that this restriction allows us to handle most practical cases, we will
examine more closely only the LL(1) grammars. The main question is how to
determine if a given grammar is LL(1). We will answer this question first for
$\varepsilon$-free grammars (i.e. grammars without empty alternatives), and then for
grammars that do contain empty alternatives.
LL(1) Grammars without empty alternatives
Even a grammar whose alternatives begin with nonterminals may be parsable
without running into dead ends. Consider the grammar
S -> Aa | Bb
A -> xz | yB
B -> uz | vA
and the string $\alpha = uzb$. Even though none of the alternatives of the production
for S starts with u, it is obvious that only B can be derived into a string
starting with u, while all derivations of A start with x or y. The symbols x
and y are the 'terminal start symbols' of A, and u and v are the terminal
start symbols of B. The concept of a set of terminal start symbols is central
for the description of the LL(1) property.
2.15 Definition Terminal start symbols of a nonterminal
The set first(A) of terminal start symbols of the nonterminal A is
defined to be the set of all terminals with which a string derived from A can
start:
$\mathit{first}(A) = \{x : A \Rightarrow^* x\omega, \text{ for } x \in V_T \text{ and } \omega \in V^*\}$
If $A \to \varepsilon$ is the only production for A, then $\mathit{first}(A) = \emptyset$ (the empty set).
This definition can be extended in a natural way to a string as argument:
2.16 Definition Terminal start symbols of a string
The set first($\alpha$) of the terminal start symbols of a string $\alpha$ is defined to be
the set of all terminals with which $\alpha$ or a string derived from $\alpha$ can start:
$\mathit{first}(\alpha) = \{x : \alpha \Rightarrow^* x\omega, \text{ for } x \in V_T \text{ and } \omega \in V^*\}$
As a special case we define $\mathit{first}(\varepsilon) = \emptyset$.
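The sets first(A) can be computed by a fixed-point iteration that keeps enlarging all sets until nothing changes. The Python sketch below is an illustration, not Coco's implementation (the function names are hypothetical). Because the definitions also admit empty alternatives, written here as empty lists, it tracks which nonterminals can derive the empty string: a symbol that can vanish lets the start symbols of the following symbol show through. It is applied to the grammar of Example 2.4.

```python
# Grammar of Example 2.4; keys are nonterminals, [] would denote an empty alternative.
GRAMMAR = {
    "S": [["A", ";"]],
    "A": [["a", "B"], ["B", "B", "b"]],
    "B": [["b"], ["a", "b"]],
}

def first_of(symbols, grammar, first, nullable):
    """first(symbols) computed from the current sets; symbols may be empty."""
    result = set()
    for x in symbols:
        if x not in grammar:          # a terminal starts the string
            result.add(x)
            return result
        result |= first[x]
        if x not in nullable:         # x cannot vanish, so later symbols are hidden
            return result
    return result                     # first(epsilon) is the empty set

def first_sets(grammar):
    """Return (first, nullable), where nullable holds the A with A =>* epsilon."""
    first = {a: set() for a in grammar}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, grammar, first, nullable)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
                if a not in nullable and all(x in nullable for x in alt):
                    nullable.add(a)
                    changed = True
    return first, nullable

first, nullable = first_sets(GRAMMAR)
print(first)
```

The iteration terminates because the sets only grow and are bounded by the finite set of terminals.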
With the concept of terminal start symbols, it is easy to define the conditions
under which an $\varepsilon$-free grammar is LL(1):
2.17 LL(1) condition for $\varepsilon$-free grammars
An $\varepsilon$-free grammar is LL(1) if, for each of its productions, the sets of
terminal start symbols of its alternatives are pairwise disjoint. That is,
for each of its productions
$A \to \alpha_1 \mid \dots \mid \alpha_n$
the following holds:
$\mathit{first}(\alpha_i) \cap \mathit{first}(\alpha_j) = \emptyset$ for $1 \le i \ne j \le n$
2.18 Example LL(1) conditions
The grammar
S -> A;
A -> aB | BBb
B -> b | ab
is not LL(1) since the following is true for the production
A -> aB | BBb:
first(aB) = {a},  first(BBb) = {a, b},
and hence
first(aB) ∩ first(BBb) = {a}
The sets of terminal start symbols of both alternatives are not disjoint.
Both alternatives can start with an a. As a result, if a choice must be
made between the alternatives and a is the leftmost symbol of the input
string, the correct alternative cannot be found without a lookahead of
more than one symbol.
No left-recursive grammar is LL(1), since for a production of the form
A -> α | Aβ the following is true: first(α) = first(Aβ).
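Condition 2.17 is easy to check mechanically. The following Python sketch assumes an ε-free grammar (every alternative is non-empty), so the start set of an alternative is simply the start set of its leading symbol; it reports every pair of alternatives whose start sets overlap. The function names are hypothetical. For Example 2.18 it finds the conflict on a, and for the left-recursive production A -> x | Ay it reports x.

```python
def first_sets(grammar):
    """first(A) for every nonterminal A of an epsilon-free grammar."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                head = alt[0]                    # epsilon-free: alt is non-empty
                new = first[head] if head in grammar else {head}
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first

def ll1_violations(grammar):
    """List (A, i, j, common) for alternatives i < j of A whose start sets overlap."""
    first = first_sets(grammar)

    def start(alt):
        return first[alt[0]] if alt[0] in grammar else {alt[0]}

    violations = []
    for a, alts in grammar.items():
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                common = start(alts[i]) & start(alts[j])
                if common:
                    violations.append((a, i, j, common))
    return violations

EXAMPLE_2_18 = {
    "S": [["A", ";"]],
    "A": [["a", "B"], ["B", "B", "b"]],
    "B": [["b"], ["a", "b"]],
}
print(ll1_violations(EXAMPLE_2_18))
```

An empty result means that the grammar satisfies condition 2.17.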
LL(1) Grammars with empty alternatives
For a grammar with empty alternatives, the LL(1) condition of the preceding
section is no longer sufficient. Consider, for instance, the grammar
S -> aA; | bAc;
A -> c | ε
and the input string $\alpha = bc;$. It is obvious that the production for S meets the
LL(1) condition 2.17, which is also true for the production for A because
first(c) = {c},  first(ε) = ∅  and hence  first(c) ∩ first(ε) = ∅
However, the grammar is not LL(1) since after the derivation
S => bAc;
it is impossible to determine with a lookahead of only one symbol whether
A -> c or A -> ε must be used for the next derivation. The choice of A -> c:
S => bAc; => bcc;
does not lead to $\alpha$. The choice of A -> ε is the correct one. Therefore, the
grammar is not LL(1).
If we must choose one of the alternatives of a production
$A \to \alpha_1 \mid \dots \mid \alpha_n \mid \varepsilon$
and only the next symbol of $\alpha$ can be used, then the terminal start symbols of
$\alpha_1$ to $\alpha_n$ and the terminal successors of A must be pairwise disjoint, since
in the case of the production $A \to \varepsilon$, the terminal following A is the next one
in $\alpha$. The set of terminal successors is defined as follows:
2.19 Definition Terminal successors
The set follow(A) of the terminal successors of a nonterminal A is the
set of all terminals that can follow A in any sentential form:
$\mathit{follow}(A) = \{x : S \Rightarrow^* \omega_1 A x \omega_2, \text{ for } A \in V_N, x \in V_T, \omega_1, \omega_2 \in V^*\}$
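The successor sets can be computed by a fixed-point iteration: for every occurrence of a nonterminal X in an alternative of A, follow(X) receives the start symbols of the rest of that alternative and, if the rest can derive the empty string, all of follow(A). The Python sketch below (hypothetical names; empty alternatives are written as empty lists; following the definition above, no end-of-input marker is added) computes the nullable nonterminals and the first and follow sets together. For the grammar S -> aA; | bAc;, A -> c | ε it shows that c lies both in first(c) and in follow(A).

```python
def first_of(symbols, grammar, first, nullable):
    """(first(symbols), True if symbols =>* epsilon) from the current approximations."""
    result = set()
    for x in symbols:
        if x not in grammar:
            result.add(x)
            return result, False
        result |= first[x]
        if x not in nullable:
            return result, False
    return result, True

def analyze(grammar):
    """Return (nullable, first, follow) for a grammar given as {A: [alt, ...]}."""
    nullable = set()
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                f, empty = first_of(alt, grammar, first, nullable)
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                if empty and a not in nullable:
                    nullable.add(a)
                    changed = True
                for i, x in enumerate(alt):
                    if x in grammar:
                        # what can follow x here: first(rest), plus follow(a) if rest vanishes
                        f, empty = first_of(alt[i + 1:], grammar, first, nullable)
                        if empty:
                            f = f | follow[a]
                        if not f <= follow[x]:
                            follow[x] |= f
                            changed = True
    return nullable, first, follow

GRAMMAR = {"S": [["a", "A", ";"], ["b", "A", "c", ";"]], "A": [["c"], []]}
nullable, first, follow = analyze(GRAMMAR)
print(follow["A"])
```

Since A can be empty and follow(A) contains c, the lookahead c cannot decide between A -> c and A -> ε, which is exactly the problem described above.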
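This definition makes it possible to determine the conditions under which an
arbitrary grammar is LL(1):
2.20 LL(1) conditions for arbitrary grammars
A grammar is LL(1) if (1) for each of its productions, the sets of all
terminal start symbols of all alternatives are pairwise disjoint, and (2) for the
nonterminals which can be derived into the empty string, all terminal
successors of the nonterminal are disjoint from the terminal start symbols of
every other alternative. Formally: for each production
$A \to \alpha_1 \mid \dots \mid \alpha_n$
the following must hold:
$\mathit{first}(\alpha_i\, \mathit{follow}(A)) \cap \mathit{first}(\alpha_j\, \mathit{follow}(A)) = \emptyset$ for $1 \le i \ne j \le n$
Note that in the formal representation the cases $\alpha_i \not\Rightarrow^* \varepsilon$ and $\alpha_i \Rightarrow^* \varepsilon$ are
combined. For $\alpha_i \not\Rightarrow^* \varepsilon$ we have $\mathit{first}(\alpha_i\, \mathit{follow}(A)) = \mathit{first}(\alpha_i)$; for
$\alpha_i \Rightarrow^* \varepsilon$ we have $\mathit{first}(\alpha_i\, \mathit{follow}(A)) = \mathit{first}(\alpha_i) \cup \mathit{follow}(A)$, which is just
$\mathit{follow}(A)$ when $\alpha_i = \varepsilon$.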
2.21 Example LL(1) conditions
Consider the grammar of Knuth[1965]:
S -* E;
E -» aAbE I bBaE I 6
A -> aAbA I 6
B -> bBaB | 6
Is it LL(1)? Since ε appears in the productions for E, A, and B, the terminal successors of E, A, and B are needed. From the grammar it can easily be seen that
follow(E) = {;}
follow(A) = {b}
follow(B) = {a}
The lookahead sets are:
for the alternatives of the E production:
    first(aAbE follow(E)) = {a}
    first(bBaE follow(E)) = {b}
    first(ε follow(E)) = {;}
for the alternatives of the A production:
    first(aAbA follow(A)) = {a}
    first(ε follow(A)) = {b}
for the alternatives of the B production:
    first(bBaB follow(B)) = {b}
    first(ε follow(B)) = {a}
Since the lookahead sets are pairwise disjoint for the alternatives of each
production, the grammar is LL(1).
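The test of Example 2.21 is easy to mechanize. The following Python sketch is an illustration, not part of Coco: it assumes that the sets first(X), follow(A), and the set of deletable nonterminals are already known (for Knuth's grammar they are written out by hand), and it represents a grammar as a dictionary from nonterminals to lists of alternatives, where an empty list stands for ε. The function names `lookahead` and `is_ll1` are hypothetical.

```python
from itertools import combinations

def lookahead(alt, follow_a, first, deletable):
    """first(alt follow(A)): start symbols of alt, plus follow(A) if alt can vanish."""
    result = set()
    for x in alt:
        if x in first:                 # nonterminal
            result |= first[x]
            if x not in deletable:
                return result
        else:                          # terminal
            result.add(x)
            return result
    return result | follow_a           # every symbol of alt is deletable

def is_ll1(grammar, first, follow, deletable):
    """Condition 2.20: the lookahead sets of each production are pairwise disjoint."""
    for a, alts in grammar.items():
        sets = [lookahead(alt, follow[a], first, deletable) for alt in alts]
        if any(s & t for s, t in combinations(sets, 2)):
            return False
    return True

# Knuth's grammar of Example 2.21 ([] is the empty alternative)
KNUTH = {
    "S": [["E", ";"]],
    "E": [["a", "A", "b", "E"], ["b", "B", "a", "E"], []],
    "A": [["a", "A", "b", "A"], []],
    "B": [["b", "B", "a", "B"], []],
}
FIRST = {"S": {"a", "b", ";"}, "E": {"a", "b"}, "A": {"a"}, "B": {"b"}}
FOLLOW = {"S": set(), "E": {";"}, "A": {"b"}, "B": {"a"}}
DELETABLE = {"E", "A", "B"}

print(is_ll1(KNUTH, FIRST, FOLLOW, DELETABLE))   # True
```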
The calculation of the successor sets is not always easy, as the following example of an if statement with a dangling else clause shows.
2.22 Example Dangling else
Consider the grammar
1  program     -> statement programrest
2  programrest -> program | end
3  statement   -> assignment | ifstatement
4  assignment  -> v := expr ;
5  ifstatement -> if thenpart elsepart
6  thenpart    -> expr then statement
7  elsepart    -> else statement | ε
with the sentence symbol program and the terminals end, v, :=, expr, ;, if, then, and else.
30
Syntax
Chap. 2
Is it LL(1)? There are three productions with alternatives: programrest, statement, and elsepart. The first two are LL(1), since
first(program) = {v, if} and first(end) = {end} are disjoint, and
first(assignment) = {v} and first(ifstatement) = {if} are disjoint.
The calculation of follow(elsepart) is longer:
follow(elsepart) = follow(ifstatement)      by production 5
                 = follow(statement)        by production 3
                 = first(programrest)       by production 1
                   ∪ follow(thenpart)       by production 6
                   ∪ follow(elsepart)       by production 7
with the result:
follow(elsepart) = first(programrest) ∪ follow(thenpart) ∪ follow(elsepart).
Since the last term on the right-hand side agrees with the left-hand side, it adds nothing to the set. In addition, since
first(programrest) = first(program) ∪ first(end) = {v, if, end}
we have
follow(elsepart) = {v, if, end} ∪ follow(thenpart).
Additionally,
follow(thenpart) = first(elsepart) ∪ follow(ifstatement)    by production 5
                 = {else} ∪ follow(statement)               by productions 7 and 3
Since follow(statement) = follow(elsepart), this last term again adds nothing new, hence
follow(elsepart) = {v, if, end} ∪ {else} = {v, if, end, else}.
Checking the LL(1) condition for production 7 results in:
first(else statement) ∩ follow(elsepart) = {else} ≠ ∅.
The grammar is therefore not LL(1).
The fact that the grammar in this example is not LL(1) does not prevent it from being parsed deterministically with a lookahead of one symbol. Whenever the syntax analyzer is parsing elsepart and else is the next input symbol, it can always choose the first alternative. Despite the ambiguity of the statement
if a then if b then c else d
the else is then always attached to the innermost then (as in PL/I and Pascal).
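A recursive-descent parser for the grammar of Example 2.22 can resolve the conflict exactly as described: in elsepart it takes the else alternative whenever else is the next symbol. This is a minimal Python sketch, assuming the input is already a list of the terminals v, :=, expr, ;, if, then, else, and end; the function name and the tuple-shaped result are hypothetical.

```python
def parse_program(tokens):
    """Recursive descent for Example 2.22; returns one tree per statement."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(t):
        nonlocal pos
        if peek() != t:
            raise SyntaxError(f"expected {t!r}, found {peek()!r}")
        pos += 1

    def statement():
        if peek() == "if":                        # ifstatement -> if thenpart elsepart
            expect("if"); expect("expr"); expect("then")
            then_part = statement()
            else_part = None
            if peek() == "else":                  # conflict resolved: prefer the else alternative
                expect("else")
                else_part = statement()
            return ("if", then_part, else_part)
        expect("v"); expect(":="); expect("expr"); expect(";")   # assignment
        return "assign"

    stmts = [statement()]                         # program -> statement programrest
    while peek() != "end":                        # programrest -> program | end
        stmts.append(statement())
    expect("end")
    if pos != len(tokens):
        raise SyntaxError("input after 'end'")
    return stmts

tokens = "if expr then if expr then v := expr ; else v := expr ; end".split()
print(parse_program(tokens))   # [('if', ('if', 'assign', 'assign'), None)]
```

The token list encodes if a then if b then c else d; the else ends up inside the inner if, and the outer if has no else part.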
LL(1) grammars and grammars of programming languages
The LL(1) conditions severely restrict the class of grammars that can be analyzed deterministically. Almost all programming language grammars, as usually written, violate the LL(1) conditions. Two facts are especially disturbing:
1. Left-recursive productions are not LL(1).
2. Alternatives that start with the same string are not LL(1).
However, it is almost always possible to transform non-LL(1) constructs into LL(1) constructs. EBNF notation is a great help here: left-recursive productions can be described with the repetition symbol { }, and common beginnings of alternatives can be extracted by factorization. We have defined the LL(1) conditions only for grammars with simple BNF productions, which raises the question of what they look like for an EBNF grammar. We defer the answer until the end of Section 2.3.
Computation of start and successor sets
For small grammars, the start and successor sets needed to check the LL(1) property can be computed by careful visual inspection. Larger grammars, however, require an algorithm. Since derivations of the form $A \Rightarrow^+ \varepsilon$ play an important role, we first introduce the concept of 'deletability'.
2.23 Definition Deletability
A nonterminal A is called deletable if it can derive the empty string:
$A \Rightarrow^+ \varepsilon$
In this section we will write deletable symbols in square brackets: [A].
An algorithm for marking the deletable symbols of a grammar is simple. It is based on the following assertions:
1. If $A \to \varepsilon$ is a production, then A is deletable.
2. If $A \to X_1 \dots X_n$ is a production and all $X_i$ are deletable, then A is also deletable.
2.24 Algorithm Marking deletable symbols
MarkDeletableSymbols:
  Mark all nonterminals A for which A -> ε exists;
  repeat
    -- Assert: all marked symbols are deletable
    Mark all nonterminals A for which A -> X1...Xn exists
      and X1...Xn are all marked nonterminals
  until no new symbol was marked
end MarkDeletableSymbols
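A direct transcription of Algorithm 2.24 into Python might look as follows. The grammar representation (a dictionary from nonterminals to lists of alternatives, with [] for ε and every symbol that is not a key treated as a terminal) is an assumption of this sketch.

```python
def mark_deletable(grammar):
    """Algorithm 2.24: return the set of deletable nonterminals.

    grammar maps each nonterminal to a list of alternatives; an alternative
    is a list of symbols and [] stands for the empty string.
    """
    # Mark all nonterminals A for which A -> eps exists
    deletable = {a for a, alts in grammar.items() if [] in alts}
    changed = True
    while changed:                           # until no new symbol was marked
        changed = False
        for a, alts in grammar.items():
            if a in deletable:
                continue
            # A -> X1...Xn with all Xi marked (terminals are never marked)
            if any(all(x in deletable for x in alt) for alt in alts):
                deletable.add(a)
                changed = True
    return deletable

example = {"S": [["A", "B"]], "A": [[], ["a"]], "B": [["A", "A"]], "C": [["c", "A"]]}
print(sorted(mark_deletable(example)))   # ['A', 'B', 'S']
```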
Sets of terminal start symbols. Three cases must be distinguished for the calculation of the terminal start symbols of a string α:
1. the string is deletable;
2. its first nondeletable symbol is a terminal y;
3. its first nondeletable symbol is a nonterminal Y.
From this, computation rules (1) through (3) follow:
1. for $\alpha = [X_1] \dots [X_t]$:
   $first(\alpha) = first(X_1) \cup \dots \cup first(X_t)$
2. for $\alpha = [X_1] \dots [X_t]\, y\, \omega$:
   $first(\alpha) = first(X_1) \cup \dots \cup first(X_t) \cup \{y\}$
3. for $\alpha = [X_1] \dots [X_t]\, Y \omega$:
   $first(\alpha) = first(X_1) \cup \dots \cup first(X_t) \cup first(Y)$
The set of start symbols of a nonterminal is the union of the sets of start symbols of its alternatives:
4. for $A \to \alpha_1 \mid \dots \mid \alpha_n$:
   $first(A) = first(\alpha_1) \cup \dots \cup first(\alpha_n)$
From these computation rules, the following algorithm is derived.
2.25 Algorithm Calculation of the sets of terminal start symbols
FindFirstSets(↓G ↑first):
  param G: a grammar with marked deletable symbols and n nonterminals;
        first: array(1:n) of set of terminal;
begin
  first(1:n) := ∅;                        -- start with empty sets
  repeat
    for all productions A -> α1 | ... | αm do
      for all alternatives αi = [X1]...[Xt] Y ω with t >= 0, Y ω ∈ V* do
        first(A) := first(A) + first(X1) + ... + first(Xt);
        case of
          Y is terminal: first(A) := first(A) + {Y}
        | Y is nonterminal: first(A) := first(A) + first(Y)
        | Y ω is absent: -- nothing
        end
      end
    end
  until no change in first
end FindFirstSets
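Algorithm 2.25 can be sketched in Python as follows, using the same hypothetical grammar representation (a dictionary of alternatives, [] for ε). `first_of_string` implements rules 1 to 3 and the outer loop applies rule 4 until nothing changes. The set of deletable nonterminals is passed in, because the algorithm assumes a grammar whose deletable symbols are already marked.

```python
def first_of_string(symbols, grammar, deletable, first):
    """Rules 1-3: start symbols of [X1]...[Xt] followed by y or Y (if present)."""
    result = set()
    for x in symbols:
        if x in grammar:                 # nonterminal
            result |= first[x]
            if x not in deletable:       # rule 3: first nondeletable symbol Y
                break
        else:                            # rule 2: terminal y
            result.add(x)
            break
    return result                        # rule 1 if the whole string is deletable

def find_first_sets(grammar, deletable):
    """Algorithm 2.25 (rule 4): repeat until no first set changes."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alts in grammar.items():
            for alt in alts:
                new = first_of_string(alt, grammar, deletable, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first

KNUTH = {
    "S": [["E", ";"]],
    "E": [["a", "A", "b", "E"], ["b", "B", "a", "E"], []],
    "A": [["a", "A", "b", "A"], []],
    "B": [["b", "B", "a", "B"], []],
}
first = find_first_sets(KNUTH, {"E", "A", "B"})
print(sorted(first["S"]))   # [';', 'a', 'b']
```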
Terminal successor sets. When calculating the terminal successors of a
nonterminal A there are also three cases which must be distinguished: in the
right-hand side of a production in which A appears, either a terminal y, a
nondeletable nonterminal Y, or nothing follows after any deletable symbols.
From this, the computation rules (5) through (7) follow:
5. for $B \to \omega_1 A [X_1] \dots [X_t]$:
   $first(X_1) \cup \dots \cup first(X_t) \cup follow(B) \subseteq follow(A)$
6. for $B \to \omega_1 A [X_1] \dots [X_t]\, y\, \omega_2$:
   $first(X_1) \cup \dots \cup first(X_t) \cup \{y\} \subseteq follow(A)$
7. for $B \to \omega_1 A [X_1] \dots [X_t]\, Y \omega_2$:
   $first(X_1) \cup \dots \cup first(X_t) \cup first(Y) \subseteq follow(A)$
If all occurrences of A on the right-hand sides of the productions are considered, the complete set follow(A) is the union of all partial sets of follow(A) that result from (5) through (7). This gives the following algorithm.
2.26 Algorithm Calculation of successor sets
FindFollowSets(↓G ↓first ↑follow):
  param G: a grammar with marked deletable symbols and n nonterminals;
        first: array(1:n) of set of terminal;
        follow: array(1:n) of set of terminal;
begin
  follow(1:n) := ∅;                       -- start with empty follow sets
  repeat
    for all nonterminals A do
      for all productions B -> ω1 A [X1]...[Xt] Y ω2 with t >= 0 and Y ω2 ∈ V* do
        follow(A) := follow(A) + first(X1) + ... + first(Xt);
        case of
          Y is terminal: follow(A) := follow(A) + {Y}
        | Y is nonterminal: follow(A) := follow(A) + first(Y)
        | Y ω2 is absent: follow(A) := follow(A) + follow(B)
        end
      end
    end
  until no change in follow
end FindFollowSets
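The following Python sketch of Algorithm 2.26 visits every occurrence of a nonterminal A on a right-hand side and applies rules 5 to 7. It iterates over productions first rather than over nonterminals first, which reaches the same fixed point. Deletable symbols and first sets are assumed to be given; for the dangling-else grammar of Example 2.22 they are written out by hand.

```python
def find_follow_sets(grammar, deletable, first):
    """Algorithm 2.26: successor sets by fixed-point iteration (rules 5-7)."""
    follow = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for b, alts in grammar.items():
            for alt in alts:
                for pos, a in enumerate(alt):
                    if a not in grammar:              # only nonterminals get follow sets
                        continue
                    new = set()
                    rest_vanishes = True              # is everything after A deletable?
                    for x in alt[pos + 1:]:
                        if x in grammar:
                            new |= first[x]
                            if x not in deletable:    # rule 7: nonterminal Y
                                rest_vanishes = False
                                break
                        else:                         # rule 6: terminal y
                            new.add(x)
                            rest_vanishes = False
                            break
                    if rest_vanishes:                 # rule 5: follow(B) is in follow(A)
                        new |= follow[b]
                    if not new <= follow[a]:
                        follow[a] |= new
                        changed = True
    return follow

DANGLING = {
    "program": [["statement", "programrest"]],
    "programrest": [["program"], ["end"]],
    "statement": [["assignment"], ["ifstatement"]],
    "assignment": [["v", ":=", "expr", ";"]],
    "ifstatement": [["if", "thenpart", "elsepart"]],
    "thenpart": [["expr", "then", "statement"]],
    "elsepart": [["else", "statement"], []],
}
FIRST = {
    "program": {"v", "if"}, "programrest": {"v", "if", "end"},
    "statement": {"v", "if"}, "assignment": {"v"}, "ifstatement": {"if"},
    "thenpart": {"expr"}, "elsepart": {"else"},
}
follow = find_follow_sets(DANGLING, {"elsepart"}, FIRST)
print(sorted(follow["elsepart"]))   # ['else', 'end', 'if', 'v']
```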
The implementation of the algorithms depends strongly on the data structure of
the grammar. The execution time depends on the order in which the
productions are visited. Many optimizations are possible.
Principles of syntax analysis of LL(1) grammars
The principle of deterministic syntax analysis of LL(1) grammars can be
described abstractly under the following assumptions.
1. The grammar is given in 'matrix form': it has imax productions of the form
   $A_i \to \alpha_{i1} \mid \dots \mid \alpha_{i,jmax(i)}$ where $1 \le i \le imax$
   The sentence symbol is $A_1$. An alternative $\alpha_{ij}$ consists of kmax(i,j) components:
   $\alpha_{ij} = X_{ij1} \dots X_{ij,kmax(i,j)}$
   $\alpha_{ij} = \varepsilon$ means $kmax(i,j) = 1$ and $X_{ij1} = \varepsilon$.
   The representation is matrix-like: index i describes the production, index j the alternative, and index k the component.
2. As programmers, we understand this representation as an abstract data structure with the access functions:
   X(↓i ↓j ↓k): symbol
     returns the value of the symbol $X_{ijk}$
   Kind(↓i ↓j ↓k): Symkind
     returns the kind of the symbol $X_{ijk}$,
     where Symkind = (terminal, nonterminal, epsilon).
   Rule(↓i ↓j ↓k): integer
     If $X_{ijk}$ is the nonterminal $A_f$, this function returns the index f:
     $Rule(i,j,k) = f \iff X_{ijk} = A_f$
   Kmax(↓i ↓j): integer
     returns the number of components of alternative j in production i.
   Match(↓x ↓i): boolean
     returns true if a phrase of the nonterminal $A_i$ can start with the terminal x or, if $A_i \Rightarrow^+ \varepsilon$, if x can follow a phrase of $A_i$:
     $Match(x,i) \iff x \in first(A_i\, follow(A_i))$
   Alternative(↓x ↓i): integer
     returns the index j of the alternative of production i that can begin with the terminal x:
     $Alternative(x,i) = j \iff x \in first(\alpha_{ij}\, follow(A_i))$
3. The string to be parsed consists of pmax symbols $s_p$:
   $\sigma = s_1 \dots s_{pmax}$ with $pmax \ge 1$
The description is basic and abstract since we ignore (1) the concrete data
structures of the stored grammar, (2) the implementation of the access
functions, and (3) the fact that the input string is actually supplied by a lexical
analyzer.
We will now give a recursive and a nonrecursive parsing algorithm.
The recursive algorithm uses an internal recursive procedure Parse. Its
operation should be clear from the following specifications and from the text
of the algorithm without additional explanations.
Initial state: The input string up to the symbol $s_{p-1}$ is recognized as a legal beginning of a sentence. The $A_i$-phrase starts with $s_p$.
Function: Parse(↓i ↑correct) tries to parse the $A_i$-phrase.
Final state: If correct = true, an $A_i$-phrase has been parsed and p has been advanced such that $s_p$ is the first input symbol that is no longer part of the $A_i$-phrase. If correct = false, no $A_i$-phrase could be parsed.
2.27 Algorithm LL(1) analysis (recursive)
ParseRecursive(↑correct):
  param correct: boolean;               -- the string is successfully parsed
  global grammar in matrix form;
         s: array(1:pmax) of symbol;     -- input string
         pmax: integer;
  local p: integer;                      -- index of current input symbol

  Parse(↓i ↑correct):
    param i: integer;
          correct: boolean;              -- an A_i phrase is parsed
    local j, k: integer;
  begin                                  -- try to parse an A_i phrase
    -- position 1 --
    if Match(↓s(p) ↓i)
    then                                 -- parsing of A_i possible
      j := Alternative(↓s(p) ↓i); k := 1;   -- parse α_ij
      loop                               -- parse X_ijk
        -- position 2 --
        case Kind(↓i ↓j ↓k) of
          terminal:
            if p > pmax or s(p) <> X(↓i ↓j ↓k) then
              correct := false; exit
            end;
            p := p + 1                   -- read next input symbol
        | nonterminal:
            Parse(↓Rule(↓i ↓j ↓k) ↑correct);
            if not correct then exit end
        | epsilon:                       -- do nothing
        end;
        if k < Kmax(↓i ↓j) then k := k + 1 else correct := true; exit end
      end
    else correct := false                -- parsing of A_i impossible
    end
    -- position 3 --
  end Parse;

begin                                    -- pmax and s are assumed to have values
  p := 1; Parse(↓1 ↑correct); correct := correct & (p = pmax + 1)
end ParseRecursive
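A Python sketch of Algorithm 2.27 for the grammar of Example 2.28 is shown below. It keeps the rule numbers of the text but, as is idiomatic in Python, counts alternatives, components, and input positions from 0; an empty alternative stands for ε. The lookahead sets first(α_ij follow(A_i)) are computed by hand, and the hypothetical helper `alternative` combines Match and Alternative by returning None when no alternative matches.

```python
# Grammar of Example 2.28 in matrix form: GRAMMAR[i][j] is alternative j of rule i.
GRAMMAR = {
    1: [["E", "eof"]],
    2: [["a", "A", "b", "E"], ["b", "B", "a", "E"], []],
    3: [["a", "A", "b", "A"], []],
    4: [["b", "B", "a", "B"], []],
}
RULE = {"S": 1, "E": 2, "A": 3, "B": 4}          # Rule(): nonterminal -> rule index
# first(alpha_ij follow(A_i)) for every alternative, computed by hand
LOOKAHEAD = {
    1: [{"a", "b", "eof"}],
    2: [{"a"}, {"b"}, {"eof"}],
    3: [{"a"}, {"b"}],
    4: [{"b"}, {"a"}],
}

def alternative(i, x):
    """Alternative(x, i); None means that Match(x, i) is false."""
    for j, la in enumerate(LOOKAHEAD[i]):
        if x in la:
            return j
    return None

def parse_recursive(s):
    """Algorithm 2.27: True if the symbol list s is a sentence."""
    p = 0                                         # index of the current input symbol

    def parse(i):
        nonlocal p
        j = alternative(i, s[p] if p < len(s) else None)
        if j is None:                             # parsing of A_i impossible
            return False
        for x in GRAMMAR[i][j]:                   # an empty alternative is epsilon
            if x in RULE:                         # nonterminal: recursive call
                if not parse(RULE[x]):
                    return False
            elif p < len(s) and s[p] == x:        # terminal: read next symbol
                p += 1
            else:
                return False
        return True

    return parse(1) and p == len(s)

print(parse_recursive("a b b a eof".split()))    # True
print(parse_recursive("a b a eof".split()))      # False
```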
Example 2.28 shows the behavior of this algorithm by means of snapshots of its state at 'position 1', 'position 2', and 'position 3'.
2.28 Example Recursive LL(1) parsing
Consider Knuth's ε-containing grammar from Example 2.21. Let us give its components the indices i, j, and k, and extend it by the component eof so that it cannot produce empty sentences:
$S_1 \to E_{111}\ eof_{112}$
$E_2 \to a_{211} A_{212} b_{213} E_{214} \mid b_{221} B_{222} a_{223} E_{224} \mid \varepsilon_{231}$
$A_3 \to a_{311} A_{312} b_{313} A_{314} \mid \varepsilon_{321}$
$B_4 \to b_{411} B_{412} a_{413} B_{414} \mid \varepsilon_{421}$
The input string is
$a_1\ b_2\ b_3\ a_4\ eof_5$
All steps performed by the algorithm can be traced in full detail in the snapshots of Fig. 2.4.
Position  p  s_p  Depth  ijk  X_ijk  Remark
1         1  a    0      1--  -
2         1  a    0      111  E
1         1  a    1      2--  -
2         1  a    1      211  a
2         2  b    1      212  A
1         2  b    2      3--  -
2         2  b    2      321  ε
3         2  b    2      321  ε      correct := true
2         2  b    1      213  b
2         3  b    1      214  E
1         3  b    2      2--  -
2         3  b    2      221  b
2         4  a    2      222  B
1         4  a    3      4--  -
2         4  a    3      421  ε
3         4  a    3      421  ε      correct := true
2         4  a    2      223  a
2         5  eof  2      224  E
1         5  eof  3      2--  -
2         5  eof  3      231  ε
3         5  eof  3      231  ε      correct := true
3         5  eof  2      224  E      correct := true
3         5  eof  1      214  E      correct := true
2         5  eof  0      112  eof
3         6  -    0      112  eof    correct := true
Result: correct = true
Fig. 2.4 Snapshot of the LL(1) parsing of Algorithm 2.27 applied to the
grammar of Example 2.28
The nonrecursive algorithm uses a stack to store the indices of all nonterminals that are currently being processed. The access functions of the stack are InitStack, Push(↓i ↓j ↓k), and Pop(↑i ↑j ↑k).
The algorithm can be in three states: findalternative, try, and forward. Each state is characterized by an assertion that holds in it:
findalternative: The input string up to the symbol $s_{p-1}$ has been recognized as a legal beginning of a sentence. An $A_i$-phrase starting with $s_p$ is expected to follow, and the index j of the matching alternative of the $A_i$-production is to be found.
try: The grammar symbol $X_{ijk}$ is to be parsed.
forward: $X_{ijk}$ has been parsed successfully, so the algorithm moves to its successor.
For the stack, the following assertion holds in all three states: if i = 1, the stack is empty; if i > 1, the top of the stack holds the indices of the occurrence of $A_i$ that is currently being parsed.
2.29 Algorithm LL(1) parsing (nonrecursive)
ParseNonRecursive(↑correct):
  param correct: boolean;               -- the string is successfully parsed
  global grammar in matrix form;
         s: array(1:pmax) of symbol;     -- input string
         pmax: integer;
  type State = (findalternative, try, forward);
  local i, j, k, p: integer;
        st: State;
begin                                    -- pmax and s are assumed to have values
  i := 1; p := 1; stack := empty;        -- start with first rule
  st := findalternative;
  loop
    case st of
      findalternative:
        -- position 1 --
        if Match(↓s(p) ↓i)
        then j := Alternative(↓s(p) ↓i);
             k := 1; st := try           -- X_ijk is first component
        else correct := false; exit
        end
    | try:
        -- position 2 --
        case Kind(↓i ↓j ↓k) of
          terminal:
            if p > pmax or X(↓i ↓j ↓k) <> s(p) then   -- s(p) does not match
              correct := false; exit
            end;
            p := p + 1; st := forward    -- X_ijk is parsed
        | nonterminal:                   -- parse X_ijk
            Push(↓i ↓j ↓k); i := Rule(↓i ↓j ↓k); st := findalternative
        | epsilon:
            st := forward
        end                              -- case Kind
    | forward:
        -- position 3 --
        if k < Kmax(↓i ↓j)               -- no end of alternative
        then k := k + 1; st := try       -- advance to next component
        else if i > 1                    -- end of alternative
        then Pop(↑i ↑j ↑k)               -- nonterminal X_ijk is parsed
        else correct := p = pmax + 1; exit   -- rule 1 is parsed
        end
    end                                  -- case st
  end                                    -- loop
  -- position 4 --
end ParseNonRecursive
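The nonrecursive variant can be sketched in Python with the same data as for the recursive version (repeated here so that the block stands alone); the three states become string values and the stack holds (i, j, k) triples. One deliberate deviation from the text: the sketch tests whether the stack is empty instead of testing i > 1, which is equivalent for this grammar because rule 1 never occurs on a right-hand side. Names such as `alternative` are hypothetical.

```python
GRAMMAR = {
    1: [["E", "eof"]],
    2: [["a", "A", "b", "E"], ["b", "B", "a", "E"], []],
    3: [["a", "A", "b", "A"], []],
    4: [["b", "B", "a", "B"], []],
}
RULE = {"S": 1, "E": 2, "A": 3, "B": 4}
LOOKAHEAD = {1: [{"a", "b", "eof"}], 2: [{"a"}, {"b"}, {"eof"}],
             3: [{"a"}, {"b"}], 4: [{"b"}, {"a"}]}

def alternative(i, x):
    """Alternative(x, i); None means that Match(x, i) is false."""
    for j, la in enumerate(LOOKAHEAD[i]):
        if x in la:
            return j
    return None

def parse_nonrecursive(s):
    """Algorithm 2.29: an explicit stack of (i, j, k) replaces recursion."""
    i, j, k, p = 1, 0, 0, 0
    stack = []
    state = "findalternative"
    while True:
        if state == "findalternative":            # position 1
            j = alternative(i, s[p] if p < len(s) else None)
            if j is None:
                return False
            k = 0                                 # X_ijk is first component
            state = "try"
        elif state == "try":                      # position 2
            alt = GRAMMAR[i][j]
            if not alt:                           # epsilon component
                state = "forward"
            elif alt[k] in RULE:                  # nonterminal: save position, descend
                stack.append((i, j, k))
                i = RULE[alt[k]]
                state = "findalternative"
            elif p < len(s) and s[p] == alt[k]:   # terminal matches s_p
                p += 1
                state = "forward"
            else:                                 # s_p does not match
                return False
        else:                                     # "forward", position 3
            if k + 1 < len(GRAMMAR[i][j]):        # no end of alternative
                k += 1
                state = "try"
            elif stack:                           # nonterminal X_ijk is parsed
                i, j, k = stack.pop()             # state stays "forward"
            else:                                 # rule 1 is parsed
                return p == len(s)

print(parse_nonrecursive("a b b a eof".split()))   # True
```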
The behavior of the nonrecursive algorithm is shown in Example 2.30.
2.30 Example Nonrecursive LL(1) parsing
We consider the same grammar and input string as in Example 2.28, with snapshots at positions 1 to 4. The algorithm executes as shown in Fig. 2.5.
Position  p  s_p  ijk  X_ijk  Stack (top of stack on the left)
1         1  a    1    -      empty
2         1  a    111  E      empty
1         1  a    2    -      111
2         1  a    211  a      111
3         2  b    211  a      111
2         2  b    212  A      111
1         2  b    3    -      212, 111
2         2  b    321  ε      212, 111
3         2  b    321  ε      212, 111
3         2  b    212  A      111
2         2  b    213  b      111
3         3  b    213  b      111
2         3  b    214  E      111
1         3  b    2    -      214, 111
2         3  b    221  b      214, 111
3         4  a    221  b      214, 111
2         4  a    222  B      214, 111
1         4  a    4    -      222, 214, 111
2         4  a    421  ε      222, 214, 111
3         4  a    421  ε      222, 214, 111
3         4  a    222  B      214, 111
2         4  a    223  a      214, 111
3         5  eof  223  a      214, 111
2         5  eof  224  E      214, 111
1         5  eof  2    -      224, 214, 111
2         5  eof  231  ε      224, 214, 111
3         5  eof  231  ε      224, 214, 111
3         5  eof  224  E      214, 111
3         5  eof  214  E      111
3         5  eof  111  E      empty
2         5  eof  112  eof    empty
3         6  -    112  eof    empty
4         6  -    112  eof    empty
Result: correct = true
Fig. 2.5 Snapshots of the nonrecursive LL(1) algorithm 2.29, applied to the grammar of Example 2.28
The text of the recursive algorithm is shorter and more elegant. The nonrecursive algorithm is better suited to the inclusion of error handling, since the explicitly stacked symbols are accessible (see Section 2.6).
Both algorithms scan the input string strictly from left to right (p is never decremented). In addition, there is a grammar-specific upper limit c such that a new input symbol is read after at most c steps. Hence both algorithms run in time linear in the length of the input string; their time complexity is O(pmax).
LL(k) grammars for k > 1
A lookahead of more than one symbol is rarely used in compilers. We shall
therefore treat LL(k) grammars for k > 1 only briefly, for the sake of
completeness.
First, we define the set of terminal start symbols of length k of a string α:
2.31 Definition Terminal start symbols of length k
$first_k(\alpha) = \{\beta : \alpha \Rightarrow^* \beta\omega,\ \beta \in V_T^*,\ |\beta| = k,\ \omega \in V^*\} \cup \{\beta : \alpha \Rightarrow^* \beta,\ \beta \in V_T^*,\ |\beta| < k\}$
If a terminal string derivable from α is shorter than k, it belongs to $first_k(\alpha)$ as a whole, so $first_k(\alpha)$ can also contain strings shorter than k.
We will now give a formal definition of LL(k) grammars according to Rosenkrantz and Stearns [1970]:
2.32 Definition LL(k) grammar
A grammar is called LL(k) if for any two left-canonical derivations of the form
$S \Rightarrow^* \alpha A \omega \Rightarrow \alpha\beta\omega \Rightarrow^* \alpha x$
$S \Rightarrow^* \alpha A \omega \Rightarrow \alpha\gamma\omega \Rightarrow^* \alpha y$
with $\alpha, x, y \in V_T^*$ and $first_k(x) = first_k(y)$, it follows that $\beta = \gamma$.
This means that in an LL(k) grammar there can be no sentential form with leftmost nonterminal A and alternatives A -> β and A -> γ (β ≠ γ) in which the left-canonical derivations of the remaining strings beginning with β and γ agree in the first k symbols. From this we get the following condition:
2.33 LL(k) condition
A grammar is LL(k) if for each pair of rules
$A \to \beta$ and $A \to \gamma$ with $\beta \ne \gamma$
and each left-canonically derived sentential form $\alpha A \omega$ that contains A, the following condition holds:
$first_k(\beta\omega) \cap first_k(\gamma\omega) = \emptyset$
2.34 Example LL(2) and LL(3) test
Again, consider the grammar
S -> A;
A -> aB | BBb
B -> b | ab
in order to see whether it is LL(k) and, if so, for which k.
The only pair of rules that creates a problem is
A -> aB
A -> BBb
and the only sentential form in which their left-hand side A appears is A;.
k = 1: the LL(1) test produces $first_1(aB;) = \{a\}$ and $first_1(BBb;) = \{a, b\}$.
Since a belongs to both sets, the grammar is not LL(1).
k = 2: the LL(2) test produces $first_2(aB;) = \{aa, ab\}$ and $first_2(BBb;) = \{bb, ba, ab\}$.
Since the element ab belongs to both sets, the grammar is not LL(2).
k = 3: the LL(3) test produces $first_3(aB;) = \{ab;, aab\}$ and $first_3(BBb;) = \{bbb, bab, abb, aba\}$.
Since both sets are disjoint, the grammar is LL(3).
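For a grammar without recursion, such as the one in this example, the sets first_k can be found by brute force: generate every terminal string derivable from a string and cut it to length k. The following Python sketch does exactly that; it only works for finite languages and is an illustration, not the general algorithm referred to below. The function names are hypothetical.

```python
from itertools import product

# Grammar of Example 2.34; it is not recursive, so its language is finite
GRAMMAR = {
    "S": [["A", ";"]],
    "A": [["a", "B"], ["B", "B", "b"]],
    "B": [["b"], ["a", "b"]],
}

def sentences(symbols):
    """All terminal strings (as tuples) derivable from a list of symbols."""
    choices = []
    for x in symbols:
        if x in GRAMMAR:
            choices.append({t for alt in GRAMMAR[x] for t in sentences(alt)})
        else:
            choices.append({(x,)})
    return {sum(parts, ()) for parts in product(*choices)}

def first_k(symbols, k):
    """first_k as in Definition 2.31: prefixes of length k, or shorter strings."""
    return {"".join(t[:k]) for t in sentences(symbols)}

for k in (1, 2, 3):
    common = first_k(["a", "B", ";"], k) & first_k(["B", "B", "b", ";"], k)
    print(k, sorted(common))
# 1 ['a']
# 2 ['ab']
# 3 []
```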
Algorithms for computing the sets $first_k(\alpha)$ and for checking the LL(k) conditions for k > 1 can be found in Aho and Ullman [1972].
No left-recursive grammar is LL(k) for any k. Another simple grammar that is not LL(k) for any k is:
S -> A;
A -> a | aAa
Its language is $\{a^{2n+1};\ :\ n \ge 0\}$. If there were a value of k such that, for the sentential form $a^n A a^n;$,
$first_k(a\, a^n;) \cap first_k(aAa\, a^n;) = \emptyset$
then k > n+1 would have to hold. However, since n can become arbitrarily large, there is no such k.
Rosenkrantz and Stearns [1970] proved the following interesting statements about LL(k) grammars:
(1) It is undecidable whether a given arbitrary grammar is LL(k) for some value of k.
(2) It is decidable whether a given arbitrary grammar is LL(k) for a given fixed value of k.
(3) If a grammar G is not LL(k) for a given k, it cannot be determined whether there is an equivalent LL(k) grammar for G.
(4) For each LL(k) grammar G that contains ε, there is an ε-free LL(k+1) grammar that produces the same language as G, but without the empty string.
2.3 The top-down graph
In a table-driven syntax analysis, the grammar of the source language must be stored in main memory so that the analysis algorithm can access it freely. The three-dimensional abstract data structure of rules, alternatives, and components, which was used in Section 2.2 to present the basic algorithms, is not suitable as a concrete data structure: it does not use memory efficiently, and it cannot represent a grammar in EBNF form. A representation that is much better suited to practical top-down analysis is a special kind of graph, which we call the top-down graph. It is similar to the syntax diagrams introduced by Wirth to describe the syntax of Pascal. In Coco, the top-down graph is used as a preliminary step toward the G-code, which is even better suited. Since the G-code can only be understood by means of the top-down graph, we describe the top-down graph first.
Basic structure
The basic structure of the top-down graph is a collection of ordered binary
trees. Its nodes are the grammar symbols of the right-hand sides of syntax
rules. Right pointers link the components of an alternative while left pointers
link the start symbols of different alternatives.
In the picture of a top-down graph, a right pointer leaves a node at the
right, a left pointer leaves the node at the bottom:
Sec.
2.3
The top-down graph
43
node ---> right pointer (to the next component)
 |
 left pointer (to the next alternative)
2.35 Example Basic structure of the top-down graph
Figure 2.6 shows the top-down graph of the grammar
S -> A;
A -> aB | BBb
B -> b | ab
Notice that the top-down graph comprises only the right-hand sides of the rules.
S => A ---> ;
A => a ---> B
     |
     B ---> B ---> b
B => b
     |
     a ---> b
Fig. 2.6 Top-down graph of the grammar of Example 2.35
Factorization
An advantage of the top-down graph over a linear representation is its ability to show alternatives in factorized form, as EBNF can. From the rule
A -> abc | acd
with the top-down graph
A => a ---> b ---> c
     |
     a ---> c ---> d
we get by left-factorization the EBNF rule
A -> a (bc | cd)
with the top-down graph
A => a ---> b ---> c
            |
            c ---> d
From the rule
A -> abc | dec
with the top-down graph
A => a ---> b ---> c
     |
     d ---> e ---> c
we obtain by right-factorization the EBNF rule
A -> (ab | de) c
with the top-down graph
A => a ---> b ---> c
     |             ^
     d ---> e -----+
Notice that the last top-down graph is no longer a tree.
A special case occurs when one alternative is the beginning of another. Then factorization creates an ε. For the rule
A -> ab | a
with the top-down graph
A => a ---> b
     |
     a
we get by left-factorization the EBNF rule
A -> a [b]
with the top-down graph
A => a ---> b
            |
            ε
Removal of left-recursive rules
The symbol strings defined by left-recursive rules can be represented in EBNF by the repetition symbol. Repetition corresponds to a loop in the top-down graph. From the rule
A -> a | Ab
with the top-down graph
A => a
     |
     A ---> b
we get the EBNF rule
A -> a {b}
with the top-down graph
A => a ---> b ---> (back to b)
            |
            ε
This top-down graph is not a tree either. It can easily be verified that it represents all possible right-hand sides such as a, ab, abb, abbb, etc.
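The transformation from a left-recursive rule to an EBNF rule with a repetition can be stated generally: A -> Aβ1 | ... | Aβm | α1 | ... | αn becomes A = (α1 | ... | αn) {β1 | ... | βm}. The small Python sketch below produces the EBNF text for immediate left recursion only; it assumes that every alternative and every βi is non-empty, and it writes terminals without quotes. The function names are hypothetical.

```python
def group(alts):
    """Render alternatives; parenthesize when there is more than one."""
    texts = [" ".join(alt) for alt in alts]
    return texts[0] if len(texts) == 1 else "(" + " | ".join(texts) + ")"

def remove_left_recursion(name, alts):
    """A -> A b1 | ... | A bm | a1 | ... | an  becomes  A = (a1|...|an) {b1|...|bm}."""
    tails = [alt[1:] for alt in alts if alt[0] == name]   # the b_i
    heads = [alt for alt in alts if alt[0] != name]       # the a_i
    if not heads:
        raise ValueError(f"{name} has no non-left-recursive alternative")
    rule = f"{name} = {group(heads)}"
    if tails:
        rule += " {" + " | ".join(" ".join(t) for t in tails) + "}"
    return rule + "."

print(remove_left_recursion("A", [["a"], ["A", "b"]]))
# A = a {b}.
print(remove_left_recursion("E", [["T"], ["+", "T"], ["-", "T"],
                                  ["E", "+", "T"], ["E", "-", "T"]]))
# E = (T | + T | - T) {+ T | - T}.
```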
The complement symbol any
Sometimes it is desirable to represent a set of terminals by its complementary set, for example
1. in the description of a string enclosed in quotes: the set of all symbols in the alphabet except the quote;
2. in a comment of the form (* ... *): the set of all symbols except the symbol *);
3. when skipping declarations: the set of all symbols except begin.
Complementary sets cannot be represented in the production notation of formal languages. The only alternative is to enumerate all members of the complementary set, which is very inconvenient. For use in Coco, we therefore introduce the special symbol any to denote complementary sets.
2.36 Definition Complement symbol any
The complement symbol any represents every terminal that is not a terminal start symbol of one of the alternatives competing with any.
Figure 2.7 shows the three examples above with the symbol any as an EBNF
rule and as a top-down graph.
string  = "'" {any} "'".
comment = "(*" {any} "*)".
skip    = {any} "begin".
[Figure: top-down graphs of the rules string, comment, and skip]
Fig. 2.7 The meaning of the complement symbol any
Equivalent top-down graphs
If one uses only the basic structure, then a unique top-down graph results
from a grammar rule. This uniqueness is lost with factorization and removal of
left recursion. In these cases there are sometimes several equivalent top-down
graphs.
2.37 Example Equivalent top-down graphs
Consider the expression grammar
E -> T | +T | -T | E + T | E - T
By factorization and elimination of left recursion, the graph shown in the upper part of Fig. 2.8 results. It has 10 nodes and corresponds to the EBNF rule
E = (T | "+" T | "-" T) {"+" T | "-" T}.
Another top-down graph, which is equivalent to it but consists of only 7 nodes, appears in the middle part of Fig. 2.8. This graph corresponds to the EBNF rule
E = ["+" | "-"] T {("+" | "-") T}.
The lower part of Fig. 2.8 shows another equivalent and even more condensed top-down graph with only 6 nodes. This top-down graph no longer corresponds to an EBNF rule.
[Figure: three equivalent top-down graphs for E with 10, 7, and 6 nodes]
Fig. 2.8 Three equivalent top-down graphs for expressions
The graph with the fewest nodes occupies the least memory. However, there
may be reasons (due to the treatment of semantics, see Section 3.6) not to
compress the top-down graph too much.
These examples show that for each EBNF rule there is a corresponding
top-down graph. But a top-down graph does not always correspond to an
EBNF rule.
Representation
The top-down graph can be represented in memory by a data structure of
nodes and pointers that may be dynamically generated or statically declared
and initialized. Since the number of nodes is known in advance and does not
change, the static declaration is more efficient. In Coco, the top-down graph
basically consists of an array of nodes, and each node consists of four
components:
type Graphnode = record
       kind: (terminal, nonterminal, any, eps);
       val, lp, rp: integer;
     end;
var graph: array(1:n) of Graphnode;
The names have the following meaning:
kind: the various node types.
val: the node symbol in some encoding; meaningless for ε-nodes.
lp: the left pointer, which points to the first node of the next alternative. If lp > 0, then graph(lp) starts the next alternative. If lp = 0, the current alternative is the last one of the production.
rp: the right pointer that points to the next component. If rp > 0 then
graph(rp) is the next component. If rp = 0, the current component is
the last component of an alternative.
n: the number of nodes in the grammar.
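The node layout above maps naturally onto a Python dataclass. The sketch below builds the graph of A = a {b} from the previous pages, reserving index 0 for 'no node' so that lp = 0 and rp = 0 keep their meaning, and walks it for an input string. The recognizer `recognize` is a hypothetical illustration: it handles only terminal and ε-nodes, and it assumes that an ε-node is always the last node of its alternative chain, so that it is taken only when no terminal matches.

```python
from dataclasses import dataclass

@dataclass
class GraphNode:
    kind: str   # "terminal", "nonterminal", "any" or "eps"
    val: str    # node symbol; meaningless for eps nodes
    lp: int     # first node of the next alternative, 0 = last alternative
    rp: int     # next component, 0 = last component

# Top-down graph of A = a {b}; index 0 is unused so that 0 can mean "no node".
graph = [
    None,
    GraphNode("terminal", "a", 0, 2),   # 1: a, followed by the loop node b
    GraphNode("terminal", "b", 3, 2),   # 2: b loops back to itself; alternative eps
    GraphNode("eps", "", 0, 0),         # 3: leave the loop
]

def recognize(graph, start, tokens):
    """Walk a graph of terminal and eps nodes (eps last in its chain)."""
    pos, node = 0, start
    while node != 0:
        n = node                          # scan the alternative chain starting at node
        while n != 0:
            g = graph[n]
            if g.kind == "eps":
                break
            if g.kind == "terminal" and pos < len(tokens) and tokens[pos] == g.val:
                pos += 1
                break
            n = g.lp
        if n == 0:                        # no node of the chain matches
            return False
        node = graph[n].rp                # follow the right pointer
    return pos == len(tokens)

print([recognize(graph, 1, list(w)) for w in ("a", "abbb", "ba")])   # [True, True, False]
```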
LL(1) Conditions for top-down graphs
The LL(1) condition of Section 2.2 refers to the simple grammar representation with rules and alternatives. If a grammar meets this condition, the correct alternative can be selected by a lookahead of one symbol without backtracking. A similar condition for the top-down graph ensures the correct selection of alternatives without backtracking by use of a lookahead of one symbol.
To simplify the discussion, we introduce two auxiliary concepts. Since they are of central importance for the syntax analysis of top-down graphs, we will use them often. We call these concepts 'alternative chain' and 'match'.
2.38 Definition Alternative chain
An alternative chain is the ordered set of all nodes of a top-down graph that are linked together by left pointers. A node pointed to by a right pointer is the start of an alternative chain. A node without a left pointer is the end of an alternative chain. A node that is not linked to other nodes by left pointers also forms an alternative chain, which consists of the node alone.
2.39 Example Alternative chains
In the top-down graph of Fig. 2.9, symbols are distinguished by subscripts. The graph contains the alternative chains
$\{+_1, -_2, T_3\}\quad \{T_3\}\quad \{+_4, -_6, \varepsilon_8\}\quad \{T_5\}\quad \{T_7\}$
Note that node $T_3$ belongs to two alternative chains.
[Figure: top-down graph of E with indexed nodes]
Fig. 2.9 Top-down graph for expressions with indexed symbols
2.40 Definition Match
An input symbol x and a node of the top-down graph with symbol sy match (i.e. fit together) if one of the following conditions is met:
1. sy is a terminal and x = sy;
2. sy is a nondeletable nonterminal that can start with x;
3. sy is a deletable nonterminal, and sy can start with x or x can follow the node sy;
4. sy is an ε-node and x can follow the node sy;
5. sy is an any-node and x matches no other node in the alternative chain to which the any-node belongs.
In order to select a node uniquely from an alternative chain using a lookahead symbol x, x must match only one node of the chain:
2.41 LL(1) conditions for top-down graphs
An alternative chain is LL(1) if an arbitrary input symbol matches at most one of its nodes. A top-down graph is LL(1) if all of its alternative chains are LL(1).
The top-down graph of Fig. 2.9 is therefore LL(1) if T cannot start with + or - and if E cannot be followed by + or - (these symbols would match the ε-node).
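Definition 2.40 and condition 2.41 can be checked mechanically for a single alternative chain. In the Python sketch below, a node is a (kind, val, succ) triple, where succ is the set of terminals that can follow the node; computing succ from the graph is left to the caller, and all names are hypothetical.

```python
from itertools import combinations

def match_sets(chain, first, deletable, alphabet):
    """Definition 2.40: the set of input symbols matching each node of a chain."""
    sets = []
    for kind, val, succ in chain:
        if kind == "terminal":                 # case 1
            sets.append({val})
        elif kind == "nonterminal":            # cases 2 and 3
            sets.append(first[val] | (succ if val in deletable else set()))
        elif kind == "eps":                    # case 4
            sets.append(set(succ))
        else:                                  # "any": case 5, filled in below
            sets.append(None)
    others = set().union(*(s for s in sets if s is not None))
    return [alphabet - others if s is None else s for s in sets]

def chain_is_ll1(chain, first, deletable, alphabet):
    """Condition 2.41: no input symbol matches two nodes of the chain."""
    sets = match_sets(chain, first, deletable, alphabet)
    return all(not (s & t) for s, t in combinations(sets, 2))

# The chain {+, -, T} of Fig. 2.9, assuming first(T) = {id, (}
FIRST = {"T": {"id", "("}}
chain = [("terminal", "+", set()), ("terminal", "-", set()), ("nonterminal", "T", set())]
print(chain_is_ll1(chain, FIRST, set(), {"+", "-", "id", "(", ")"}))   # True
```

If first(T) also contained -, the same call would return False, as would an ε-node whose successors include + or -.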
Since each EBNF production corresponds to a top-down graph, the
LL(1) conditions for top-down graphs are also the LL(1) conditions for EBNF
grammars. In order to check if an EBNF grammar is LL(1), it is easiest to
generate its top-down graph and check if it meets the LL(1) conditions. The
LL(1) conditions for EBNF grammars can also be derived from the definition
of the EBNF grammar alone without constructing a top-down graph.
However, this is cumbersome and results in no new insights. We therefore
omit the description and leave the task to the interested reader.
LL(1) Top-down graphs and grammars of
programming languages
If top-down graphs are to have practical value, it must be possible to represent the grammars of programming languages as LL(1) top-down graphs, and therefore as LL(1) EBNF grammars. We may therefore ask whether this is possible without exception, or whether there are constructs that resist an LL(1) representation, and if so, what can be done about them.
First of all, LL(1) violations caused by left-recursive productions and by several alternatives starting with the same symbol can easily be avoided in top-down graphs and in EBNF notation. The remaining LL(1) violations can usually be removed with various tricks that require insight into the particular situation. As an aid for the 'grammar designer', we will treat several typical cases and distinguish between the following five methods:
1. substitution and factorization;
2. alphabet extension;
3. syntactic extension;
4. acceptance of non-LL(1) constructs;
5. miscellaneous transformations.
Substitution and factorization. Consider a production with two alternatives that start with different nonterminals X and Y, where X and Y can start with the same symbol (terminal or nonterminal). Then it is often possible to replace the symbols X and Y by the right-hand sides of their productions and to extract their common starting string by left-factorization.
This can be simple and obvious, as in the various DO statements of PL/M-80 (and similarly in PL/I):
statement = ...
    | dostatement
    | whilestatement
    | forstatement
    | casestatement
dostatement = "DO" ";" block.
whilestatement = "DO" "WHILE" expr ";" {statement} ending.
forstatement = "DO" ident "=" expr "TO" expr ["BY" expr] ";"
    {statement} ending.
casestatement = "DO" "CASE" expr ";" {statement} ending.
By substitution and factorization this results in
statement = ...
    | "DO"
      ( ";" block
      | ( "CASE" expr ";"
        | "WHILE" expr ";"
        | ident "=" expr "TO" expr ["BY" expr] ";"
        ) {statement} ending
      )
    | ...
However, it can also be difficult. In Modula-2, for example, a factor can be a set or a designator, and both can begin with an identifier:
factor = ... | designator [actpars] | set | ...
designator = qualident {"." ident | "[" exprlist "]" | "^"}.
set = [qualident] "{" [elementlist] "}".
qualident = ident {"." ident}.
Note that even the production for designator taken alone is not LL(1). For instance, ident.ident may be simply a qualident, or a qualident consisting of one ident followed by the selector .ident.
The LL(1) conflict is removed by combining designator and set into a new symbol deset and then splitting designator into ident and a remainder desigrest. After several substitutions and factorizations, the following LL(1) constructs result:
factor = ... | deset | ...
deset =
    "{" [elementlist] "}"
  | ident {"." ident}
    ( ( "^" | "[" exprlist "]" ) desigrest [actpars]
    | "{" [elementlist] "}"
    | [actpars]
    ).
desigrest = {"." ident | "[" exprlist "]" | "^"}.
The equivalence of the old and new constructs can no longer be easily seen.
Alphabet extension. When selecting an alternative, it is fairly common for two
lookahead symbols to be needed to find the right one. The main example of
this is labels in front of statements:
statement = [ident ":"] (ident ":=" expr | ifstatement | ...).
An ident at the beginning of a statement may be a label or the left part of an
assignment. This can only be determined by the symbol following ident. This
conflict can often be resolved by extending the terminal alphabet. In the
preceding case, the terminal symbol label can be added to the alphabet, and the lexical
analyzer can be required to supply a label instead of an ident whenever the identifier is
followed by a ':'. In this case, the lexical analyzer is used to resolve the LL(1)
conflict.
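The Python sketch below applies this idea to a small assumed token set: identifiers, numbers, ':=' and single-character symbols, with keywords not distinguished from identifiers. An identifier is reported as label when the next token is ':'. The colon is still delivered as its own token, so the parser can work with a production such as statement = [label ":"] (ident ":=" expr | ...). The names TOKEN_RE and scan_with_labels are hypothetical.

```python
import re

# One token per match; ':=' is tried before the single character ':'.
TOKEN_RE = re.compile(r"\s*(:=|[A-Za-z][A-Za-z0-9]*|[0-9]+|\S)")


def scan_with_labels(text: str) -> list[tuple[str, str]]:
    """Return (kind, lexeme) pairs; an identifier followed by ':' becomes a label."""
    lexemes = TOKEN_RE.findall(text)
    tokens = []
    for i, lx in enumerate(lexemes):
        if lx[0].isalpha():
            followed_by_colon = i + 1 < len(lexemes) and lexemes[i + 1] == ":"
            tokens.append(("label" if followed_by_colon else "ident", lx))
        elif lx.isdigit():
            tokens.append(("number", lx))
        else:
            tokens.append((lx, lx))  # operators and punctuation stand for themselves
    return tokens
```

Deciding between label and ident still needs two symbols of lookahead. That decision now happens in the scanner instead of the parser.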
This method leads to complications if the lexical analyzer has to
inspect a wider context to decide whether or not to
replace two terminals by a new one. For example, in Algol 60, 'ident:' does not
always denote the label of a statement: an identifier followed by ':' may also appear in a
declaration, as in ARRAY(n : m). In such cases, the lexical analyzer is no
longer independent of the syntax analyzer, since it must consider the context.
Syntactic extension. Algol 60 has multiple assignments, defined by
assignment = designator ":=" {designator ":="} expr.
where expr can start with designator. This LL(1) violation is very nasty. It
can be removed by substitution and factorization, but this is very
cumbersome (the reader should try it). It is easier to 'expand' the designator inside
the curly brackets to expr. This requires the introduction of an additional
production for assignrest:
assignment = designator ":=" assignrest.
assignrest = expr [":=" assignrest].
The syntactic extension must be compensated by a semantic restriction: if the
right-recursive part of the production for assignrest is present, expr must be
restricted to a designator. This can be achieved by introducing a
boolean attribute isdesignator. Anticipating Chapter 3, this
may be written as an attributed grammar as follows:
assignrest =
    expr↑isdesignator
    [":=" where (isdesignator)
     assignrest].
This means: by syntactic extensions, portions of the language definition are
moved from syntax to static semantics.
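The following Python sketch shows this shift from syntax to static semantics. The parser follows assignrest = expr [":=" assignrest] and computes isdesignator for each expr. The where-condition is checked only when ':=' follows. The sketch makes two simplifying assumptions: a designator is a bare identifier, and expr = operand {"+" operand}. The class AssignParser and its methods are hypothetical.

```python
class AssignParser:
    """assignment = designator ":=" assignrest.
    assignrest = expr [":=" assignrest], where isdesignator must hold if ":=" follows."""

    def __init__(self, tokens: list[str]):
        self.toks = tokens + ["<eof>"]
        self.pos = 0

    def peek(self) -> str:
        return self.toks[self.pos]

    def advance(self) -> str:
        tok = self.toks[self.pos]
        self.pos += 1
        return tok

    def operand(self) -> str:
        tok = self.advance()
        if not (tok.isidentifier() or tok.isdigit()):
            raise SyntaxError(f"operand expected, found {tok!r}")
        return tok

    def expr(self) -> tuple[str, bool]:
        """expr = operand {"+" operand}; returns (text, isdesignator)."""
        parts = [self.operand()]
        while self.peek() == "+":
            self.advance()
            parts.append(self.operand())
        isdesignator = len(parts) == 1 and parts[0].isidentifier()
        return " + ".join(parts), isdesignator

    def assignrest(self) -> tuple[list[str], str]:
        text, isdesignator = self.expr()
        if self.peek() == ":=":
            if not isdesignator:  # where (isdesignator)
                raise SyntaxError(f"{text!r} cannot be assigned to")
            self.advance()
            targets, value = self.assignrest()
            return [text] + targets, value
        return [], text

    def assignment(self) -> tuple[list[str], str]:
        """Return (list of assigned designators, text of the assigned expression)."""
        target = self.operand()
        if not target.isidentifier():
            raise SyntaxError("designator expected")
        if self.advance() != ":=":
            raise SyntaxError("':=' expected")
        targets, value = self.assignrest()
        if self.peek() != "<eof>":
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return [target] + targets, value
```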
Acceptance of non-LL(1) constructs. If it is known that the parser tries to
match the alternatives in the order they are written, some LL(1) violations can
be left alone. The best known case is the dangling else:
ifstatement = "IF" expr "THEN" statement ["ELSE" statement].
Although this construct is not LL(1), and is even ambiguous (see Example
2.22), it can be left alone if one can be sure that the parser, having recognized
the statement following THEN, first tries to detect the optional ELSE, and
only regards the entire if statement as complete if there is no ELSE.
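The Python sketch below shows this behaviour. It assumes that conditions and simple statements are single tokens, and the function name is hypothetical. After the statement following THEN, the parser first checks for ELSE. An ELSE therefore always attaches to the innermost IF that is still open.

```python
def parse_statement(toks: list[str], i: int = 0):
    """statement = ifstatement | simple.
    ifstatement = "IF" cond "THEN" statement ["ELSE" statement].
    Returns (tree, index of the next unread token)."""
    if toks[i] == "IF":
        cond = toks[i + 1]
        if toks[i + 2] != "THEN":
            raise SyntaxError("THEN expected")
        then_part, i = parse_statement(toks, i + 3)
        # Try the optional ELSE first; only without ELSE is this IF complete.
        if i < len(toks) and toks[i] == "ELSE":
            else_part, i = parse_statement(toks, i + 1)
            return ("if", cond, then_part, else_part), i
        return ("if", cond, then_part), i
    return toks[i], i + 1  # simple statement: one token (assumption)
```

For IF a THEN IF b THEN x ELSE y, the ELSE ends up in the inner if statement.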
Other transformations. Sometimes, a grammar that is not LL(1) can be
transformed into an equivalent LL(1) grammar by simple transformations that do
not fall into any of the four categories above. For example, in Algol 60, a
block is defined as
block = head ";" body.
head = "begin" declaration {";" declaration}.
This construct is not LL(1) since the semicolon plays a dual role: it
separates adjacent declarations, and it separates body from head. The solution is
simple: the grammar can be transformed so that the semicolon becomes a
terminator instead of a separator.
block = head body.
head = "begin" declaration ";" {declaration ";"}.
The necessity of such transformations, their difficulty, and the uncertainty of
executing them correctly are a weakness of the LL method, and
often a cause for criticism. Bottom-up analyzable LR(1) grammars need no
transformations, or only a few, so research has focused on
the LR method. However, syntax is but one aspect. What is gained with the
LR method must be paid for when semantics is connected to syntax: this connection is
much less flexible in the LR method than in the LL method, often leads to
violations of the LR property, and then also requires transformations. In
addition, the LL(1) method is much easier to understand than the LR method. This
results in easier transformations and more understandable error messages.
2.4 The G-code
A top-down graph that resides in memory is a useful way of representing a
grammar. It already requires little space, but it can be significantly compressed
further. Let us consider the grammar of arithmetic expressions:
S = expr.
expr = term {"+" term}.
term = factor {"*" factor}.
factor = v | "(" expr ")".
Now, let us add the production S' = S eofsy, where S' is the new sentence
symbol and eofsy (= end of file) a new terminal. This trick ensures that each
sentence terminates with the same symbol eofsy, and that there is no empty
sentence even if S can be derived into the empty string.
Fig. 2.10 Top-down graph for an expression: graphic representation
Fig. 2.10 shows the top-down graph of this grammar, which has 15 nodes. Fig.
2.11 shows its internal memory representation as described in Section 2.3. If
we assume one byte each for the components typ and val, and two bytes
each for lp and rp, then the table requires 15 × 6 = 90 bytes.
Compaction can be achieved by partitioning the nodes according to their
types, and by coding the individual types so that they do not contain any
unnecessary information. The G-code (grammar code) that we use is such a
code. For syntax analysis the elements of the G-code behave as instructions,
and therefore they are written as instructions. Consecutive G-code instructions
are executed sequentially. They correspond to nodes in the top-down graph
that are connected by right pointers. Definition 2.42 defines the G-code as far
as it is relevant for the representation of a top-down graph.
 i   typ(i)  val(i)  lp(i)  rp(i)
 1   nt      S        0      2     rule for S'
 2   t       eofsy    0      0
 3   nt      expr     0      0     rule for S
 4   nt      term     0      5     rule for expr
 5   t       "+"      7      6
 6   nt      term     0      5
 7   eps     -        0      0
 8   nt      factor   0      9     rule for term
 9   t       "*"     11     10
10   nt      factor   0      9
11   eps     -        0      0
12   t       v       13      0     rule for factor
13   t       "("      0     14
14   nt      expr     0     15
15   t       ")"      0      0
Fig. 2.11 Top-down graph for an expression: representation in memory
The G-code is augmented by tables containing the lookahead symbols. With
each nonterminal symbol sy (not with each nonterminal node) there is
associated a set first(sy) containing its terminal start symbols. The operand nr
of an ε-instruction (EPS and EPSA) refers to an array epsset. Thus
epsset(nr) contains all terminals that match the corresponding ε-node (see
Definition 2.40). The operand nr of an ANYA instruction refers to an array
anyset. Thus anyset(nr) contains all terminals that match the corresponding
any-node. In summary, these G-code lookahead sets have the following data
structures:
first: array(1:maxnt) of Symbolset
epsset: array(1:maxeps) of Symbolset
anyset: array(1:maxany) of Symbolset
If the lookahead sets are stored bitwise, they do not require much memory.
It can be seen that each node of the top-down graph corresponds to a G-code
instruction. The G-code instructions RET and JMP are added at the end
of productions and loops, where the linear execution sequence is interrupted.
2.42 Definition G-code (incomplete)
Instruction   Bytes  Description
T sy          2      terminal
                     If the next input symbol is sy then recognize it, else report an
                     error.
TA sy adr     4      terminal with alternative
                     If the next input symbol is sy then recognize it, else go to
                     adr.
NT sy         2      nonterminal
                     If the next input symbol is a terminal start of sy then step
                     through its production, else report an error.
NTA sy adr    4      nonterminal with alternative
                     If the next input symbol is a terminal start of sy then step
                     through its production, else go to adr.
ANY           1      any
                     Recognize the next input symbol.
ANYA nr adr   4      any with alternative
                     If the next input symbol is contained in the symbol set indicated
                     by nr then recognize it, else go to adr.
EPS nr        2      epsilon
                     If the next input symbol is contained in the successor set
                     indicated by nr then recognize the empty string, else report an error.
EPSA nr adr   4      epsilon with alternative
                     If the next input symbol is contained in the successor set
                     indicated by nr then recognize the empty string, else go to adr.
JMP adr       3      jump
                     Go to adr.
RET           1      return
                     Return from the production of a nonterminal.
The operation code and the operands sy and nr are 1 byte each; adr is 2
bytes.
The following G-code results for the grammar shown in the top-down
graph of Fig. 2.10:
 1  NT   S          The production S' = S eofsy.
 3  T    eofsy
 5  RET
 6  NT   expr       The production S = expr.
 8  RET
 9  NT   term       The production expr = term {"+" term}.
11  TA   "+" 20
15  NT   term
17  JMP  11
20  EPS  1
22  RET
23  NT   factor     The production term = factor {"*" factor}.
25  TA   "*" 34
29  NT   factor
31  JMP  25
34  EPS  2
36  RET
37  TA   v 42       The production factor = v | "(" expr ")".
41  RET
42  T    "("
44  NT   expr
46  T    ")"
48  RET
The lookahead sets are:
first(S) = {v, "("}
first(expr) = {v, "("}       epsset(1) = {eofsy, ")"}
first(term) = {v, "("}       epsset(2) = {eofsy, ")", "+"}
first(factor) = {v, "("}
anyset is empty since the grammar contains no any-symbol.
The total amount of G-code is 48 bytes, which is slightly more than half
the size of the top-down graph (90 bytes).
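The addresses in the listing follow directly from the instruction sizes of Definition 2.42. As a cross-check, the Python sketch below assembles the same program from symbolic labels in two passes: the first pass assigns addresses, and the second resolves jump targets. The label names and the tuple encoding are assumptions made for this illustration. They are not Coco's internal format.

```python
SIZE = {"T": 2, "TA": 4, "NT": 2, "NTA": 4, "ANY": 1, "ANYA": 4,
        "EPS": 2, "EPSA": 4, "JMP": 3, "RET": 1}  # bytes, from Definition 2.42

# Strings are labels; tuples are instructions whose last operand may be a label.
PROGRAM = [
    "S'", ("NT", "S"), ("T", "eofsy"), ("RET",),
    "S", ("NT", "expr"), ("RET",),
    "expr", ("NT", "term"),
    "expr_loop", ("TA", "+", "expr_exit"), ("NT", "term"), ("JMP", "expr_loop"),
    "expr_exit", ("EPS", 1), ("RET",),
    "term", ("NT", "factor"),
    "term_loop", ("TA", "*", "term_exit"), ("NT", "factor"), ("JMP", "term_loop"),
    "term_exit", ("EPS", 2), ("RET",),
    "factor", ("TA", "v", "factor_alt"), ("RET",),
    "factor_alt", ("T", "("), ("NT", "expr"), ("T", ")"), ("RET",),
]

JUMPING = {"TA", "NTA", "ANYA", "EPSA", "JMP"}  # opcodes whose last operand is adr


def assemble(program):
    """Return (code by address, label addresses, total size in bytes); addresses start at 1."""
    labels, addr = {}, 1
    for item in program:  # pass 1: assign addresses
        if isinstance(item, str):
            labels[item] = addr
        else:
            addr += SIZE[item[0]]
    code, addr = {}, 1
    for item in program:  # pass 2: replace label operands by addresses
        if isinstance(item, str):
            continue
        op, *args = item
        if op in JUMPING:
            args[-1] = labels[args[-1]]
        code[addr] = (op, *args)
        addr += SIZE[op]
    return code, labels, addr - 1
```

The label of each production also gives its root address, for example labels["term"] == 23. This is the address that Root(term) must return during parsing.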
In Coco, first of all a top-down graph is generated. It is then used to
check several properties of the grammar, and to calculate the start and
successor sets. Finally the graph is transformed into G-code, and this is the ultimate
structure in which the grammar is stored.
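A common way to compute terminal start sets is a fixed-point iteration, sketched below in Python. The book does not say that Coco proceeds this way, so this is an assumption. The grammar is rewritten in BNF for the sketch: the loops {"+" term} and {"*" factor} become hypothetical right-recursive helper nonterminals, exprtail and termtail, which may derive the empty string. The resulting sets for S, expr, term and factor should agree with the first sets listed above.

```python
Grammar = dict[str, list[list[str]]]

GRAMMAR: Grammar = {
    "S": [["expr"]],
    "expr": [["term", "exprtail"]],
    "exprtail": [["+", "term", "exprtail"], []],    # {"+" term}
    "term": [["factor", "termtail"]],
    "termtail": [["*", "factor", "termtail"], []],  # {"*" factor}
    "factor": [["v"], ["(", "expr", ")"]],
}


def first_sets(grammar: Grammar) -> tuple[dict[str, set[str]], set[str]]:
    """Return (first, deletable): terminal start sets and nonterminals deriving the empty string."""
    first: dict[str, set[str]] = {nt: set() for nt in grammar}
    deletable: set[str] = set()
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                alt_deletable = True
                for sym in alt:
                    if sym in grammar:  # nonterminal: inherit its start symbols
                        new = first[sym] - first[nt]
                    else:               # terminal: it is itself a start symbol
                        new = {sym} - first[nt]
                    if new:
                        first[nt] |= new
                        changed = True
                    if sym not in deletable:  # terminals are never deletable
                        alt_deletable = False
                        break
                if alt_deletable and nt not in deletable:
                    deletable.add(nt)
                    changed = True
    return first, deletable
```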
2.5 Parsing with the G-code
Parsing becomes quite simple with the G-code since the G-code itself is
already a parsing program. To make a parser, it is only necessary to code an
interpreter for a G-code program.
In this section we will develop such a parser without error handling. In
the next section we will add the error handling.
Assumptions
We will summarize here the assumptions on which parsing with the G-code is
based.
1. The G-code is derived from a top-down graph that meets the LL(1)
conditions.
2. If S is the sentence symbol, then the top-down graph and the G-code are
expanded by the production
S' = S eofsy
where eofsy is the terminal end-of-file that does not appear in the
original grammar. The first G-code instruction of this production has the
address 1.
3. The symbol string to be parsed is supplied by a lexical analyzer, which
provides the next input symbol in the variable typ for each call. After
reaching the last source symbol, the lexical analyzer supplies the symbol
eofsy.
4. The parsing algorithm uses a stack of actual length lacts (= actual length
of stack) to store the addresses that follow the nonterminal instructions
currently being processed (these are the "return addresses" of the
currently parsed nonterminals).
Overview
The parsing algorithm executes the G-code program which is controlled by the
input string to be recognized. It starts at address 1 and ends at the instruction
for the symbol eofsy. Depending on the current input symbol typ and the
current G-code instruction several courses of action are possible. When the
algorithm tries to recognize a terminal there are two possibilities: if it succeeds
then it moves to the next symbol; if it fails then it goes on to the next
alternative (if there is any). When the algorithm tries to recognize a nonterminal,
there are also two possibilities: if the input string and the nonterminal match
then the algorithm pushes the address of the next instruction on the stack and
jumps to the first G-code instruction of the nonterminal; if they do not match
then it goes on to the next alternative (if there is any). At the end of
productions, the 'return address' is popped from the stack with RET, and the
algorithm continues from there on. When an error occurs, error handling and
synchronization take place, after which parsing continues as if no error had
occurred. The analysis ends when typ = eofsy and the corresponding G-
code instruction is T eofsy.
The parsing algorithm is called Parse. It returns a boolean variable
correct which will be true if the analyzed input text is syntactically correct.
Parse is an interpreter that has the following structure:
Parse(↑correct):
  pc:=1;                                -- program counter
  loop
    opcode:=code(↓pc);                  -- G operation code
    case opcode of
      t:   execute instruction "T sy" and change pc
    | ta:  execute instruction "TA sy adr" and change pc
      ...
    | jmp: execute instruction "JMP adr"
    end
  end
end Parse
Inside the loop, a value is assigned to the result parameter and the loop is
terminated if typ = eofsy.
The simplified G-code parsing algorithm
First we will present a simplified version of Parse that does not contain the
instructions ANY, ANYA, EPS, and EPSA, and does not have any error
actions. We further assume that nonterminals are not deletable. For the
description of Parse in Adele, we will use the following routines:
Decode(↓pc,↑opcode,↑sy,↑nr,↑adr,↑nextpc)
returns the parameters of the G-code instruction starting at address pc.
(An operand that does not appear in the actual instruction returns an
undefined value of the corresponding parameter.) nextpc is the address of
the next instruction.
NewSym(↑typ)
returns the next input symbol.
Root(↓sy): integer
returns the address of the first G-code instruction for the production for
the nonterminal sy.
By using these actions, the simplified algorithm is as follows:
2.43 Algorithm Parse (simplified)
Parse(↑correct):
  param correct: boolean;                 -- correctness indicator
  const eofsy = ... ;                     -- end of file symbol
  type Instruction = (t,ta,nt,nta,jmp,ret);
  local
    adr: integer;                         -- instruction part adr
    first: array of Symbolset;            -- lookahead symbol sets
    lacts: integer;                       -- actual stack length
    nextpc: integer;                      -- addr. of next G-code instr.
    nr: integer;                          -- instruction part nr
    opcode: Instruction;                  -- instruction part opcode
    pc: integer;                          -- program counter
    stack: array of integer;              -- nonterminals worked on
    sy: integer;                          -- sy part of G-code instr.
    typ: integer;                         -- current source symbol
  begin
    pc:=1; lacts:=0; NewSym(↑typ);        -- init. and read first symbol
    loop
      Decode(↓pc,↑opcode,↑sy,↑nr,↑adr,↑nextpc);  -- get instruction at pc
      case opcode of
        t:                                -- term. without alternative
          if typ=sy                       -- must match
          then
            if typ=eofsy then
              correct:=true; exit         -- terminate successfully
            end;
            pc:=nextpc; NewSym(↑typ)      -- advance and read
          else correct:=false; exit       -- terminate unsuccessfully
          end
      | ta:                               -- terminal with alternative
          if typ=sy                       -- may match
          then pc:=nextpc; NewSym(↑typ)   -- advance and read
          else pc:=adr                    -- goto alternative
          end
      | nt:                               -- nonterm. without alternative
          if typ in first(sy)             -- must match
          then lacts:=lacts+1; stack(lacts):=nextpc; pc:=Root(↓sy)
          else correct:=false; exit       -- terminate loop if error
          end
      | nta:                              -- nonterminal with alternative
          if typ in first(sy)             -- may match
          then lacts:=lacts+1; stack(lacts):=nextpc; pc:=Root(↓sy)
          else pc:=adr                    -- goto alternative
          end
      | jmp:                              -- jump to next instruction
          pc:=adr
      | ret:                              -- return
          pc:=stack(lacts); lacts:=lacts-1   -- find follower in stack
      end -- case
    end -- loop
  end Parse.
The complete G-code parsing algorithm
We will now add the interpretation of the instructions and properties that were
left out in the previous section, and provide the following explanations.
The instruction ANY recognizes any source symbol, and ANYA
recognizes any source symbol that is a member of the lookahead set belonging to
this instruction.
The instructions EPS and EPSA recognize the empty string if the source
symbol matches their lookahead set.
In the case of an error, the analysis is not terminated. Rather, the
error handler
Error(↕pc,↕altroot)
is executed. Error requires as parameters the address pc of the
non-matching G-instruction and the address altroot (root of alternative chain) of
the first G-instruction of the alternative chain in which the error occurred.
Error synchronizes by skipping input symbols, changes pc and altroot,
and sets correct to false. Error is therefore declared locally within Parse.
Every time an input symbol has been successfully parsed, the next
symbol can be read, and altroot can be set to a new alternative chain. For semantic
reasons, however, these actions are delayed until the input symbol is actually
required by the parser. Instead of reading a symbol, the parser sets the variable mustread
to true.
Furthermore, in the complete version we will consider the possibility that
a nonterminal X can be derived into the empty string. This can be tested with
the function
Deletable(↓X): boolean
Such a nonterminal is always recognized, even if the current input symbol
does not belong to its terminal start symbols (explanation in Section 7.3.3).
This requires the interpretation of the instructions NT and NTA to be
extended.
Expanded in this way, the algorithm Parse has the following complete
form:
2.44 Algorithm Parse (complete)
Parse(↑correct):
  param correct: boolean;                 -- correctness indicator
  const eofsy = ... ;                     -- end of file symbol
  type Instruction = (t,ta,nt,nta,any,anya,eps,epsa,jmp,ret);
  local
    adr: integer;                         -- instruction part adr
    altroot: integer;                     -- root of alternative chain
    anyset: array of Symbolset;           -- lookahead symbol set
    epsset: array of Symbolset;           -- lookahead symbol set
    first: array of Symbolset;            -- lookahead symbol set
    lacts: integer;                       -- actual stack length
    mustread: boolean;                    -- typ is consumed
    nextpc: integer;                      -- address of next G-instruction
    nr: integer;                          -- instruction part nr
    opcode: Instruction;                  -- instruction part opcode
    pc: integer;                          -- program counter
    stack: array of integer;              -- nonterminals worked on
    sy: integer;                          -- instruction part sy
    typ: integer;                         -- current source symbol
    Error(↕pc,↕altroot): ... end Error;   -- local error procedure
  begin
    pc:=1; altroot:=1;                    -- initialize
    mustread:=true; lacts:=0;
    loop
      Decode(↓pc,↑opcode,↑sy,↑nr,↑adr,↑nextpc);  -- get instruction at pc
      if mustread then                    -- read next source symbol
        NewSym(↑typ); altroot:=pc; mustread:=false
      end;
      case opcode of
        t:                                -- terminal without alternative
          if typ=sy                       -- must match
          then
            if typ=eofsy then
              correct:=true; exit         -- terminate loop successfully
            end;
            pc:=nextpc; mustread:=true    -- advance
          else Error(↕pc,↕altroot)        -- sets correct:=false
          end
      | ta:                               -- terminal with alternative
          if typ=sy                       -- may match
          then pc:=nextpc; mustread:=true -- advance
          else pc:=adr                    -- goto alternative
          end
      | nt:                               -- nonterm. without alternative
          if typ in first(sy) or Deletable(↓sy)   -- must match
          then
            lacts:=lacts+1; stack(lacts):=nextpc; -- push follower
            pc:=Root(↓sy); altroot:=pc    -- parse rule for nonterminal
          else Error(↕pc,↕altroot)        -- sets correct:=false
          end
      | nta:                              -- nonterminal with alternative
          if typ in first(sy) or Deletable(↓sy)   -- may match
          then
            lacts:=lacts+1; stack(lacts):=nextpc; -- push follower
            pc:=Root(↓sy); altroot:=pc    -- parse rule for nonterminal
          else pc:=adr                    -- goto alternative
          end
      | any:                              -- any without alternative
          pc:=nextpc; mustread:=true      -- advance
      | anya:                             -- any with alternative
          if typ in anyset(nr)
          then pc:=nextpc; mustread:=true -- advance
          else pc:=adr                    -- goto alternative
          end
      | eps:                              -- epsilon
          if typ in epsset(nr)            -- must match
          then pc:=nextpc                 -- advance
          else Error(↕pc,↕altroot)        -- sets correct:=false
          end
      | epsa:                             -- epsilon with alternative
          if typ in epsset(nr)            -- may match
          then pc:=nextpc                 -- advance
          else pc:=adr                    -- goto alternative
          end
      | jmp:                              -- jump to next instruction
          pc:=adr
      | ret:                              -- return
          pc:=stack(lacts); lacts:=lacts-1;   -- find follower in stack
          altroot:=pc
      end -- case
    end -- loop
  end Parse.
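To experiment with the algorithm, it can be transcribed into Python and run on the G-code of Section 2.4. The sketch below is a transcription built on several assumptions and is not Coco's implementation. Instructions are tuples keyed by address. ROOT is a dictionary of the root addresses taken from the listing, and no nonterminal of this grammar is deletable. The error handler is replaced by simply returning False, so altroot is not tracked.

```python
EOF = "eofsy"

# G-code of the expression grammar (Section 2.4), keyed by address.
# T/TA/NT/NTA carry a symbol, EPS/EPSA/ANYA an index nr;
# TA/NTA/ANYA/EPSA/JMP end with a target address.
CODE = {
    1: ("NT", "S"), 3: ("T", EOF), 5: ("RET",),
    6: ("NT", "expr"), 8: ("RET",),
    9: ("NT", "term"), 11: ("TA", "+", 20), 15: ("NT", "term"),
    17: ("JMP", 11), 20: ("EPS", 1), 22: ("RET",),
    23: ("NT", "factor"), 25: ("TA", "*", 34), 29: ("NT", "factor"),
    31: ("JMP", 25), 34: ("EPS", 2), 36: ("RET",),
    37: ("TA", "v", 42), 41: ("RET",),
    42: ("T", "("), 44: ("NT", "expr"), 46: ("T", ")"), 48: ("RET",),
}
SIZE = {"T": 2, "TA": 4, "NT": 2, "NTA": 4, "ANY": 1, "ANYA": 4,
        "EPS": 2, "EPSA": 4, "JMP": 3, "RET": 1}
ROOT = {"S": 6, "expr": 9, "term": 23, "factor": 37}  # Root(sy)
FIRST = {nt: {"v", "("} for nt in ROOT}
EPSSET = {1: {EOF, ")"}, 2: {EOF, ")", "+"}}
ANYSET: dict[int, set[str]] = {}   # no any-symbols in this grammar
DELETABLE: set[str] = set()        # no deletable nonterminals in this grammar


def parse(symbols: list[str]) -> bool:
    """Return True iff symbols (without the final eofsy) form a sentence."""
    source = iter(list(symbols) + [EOF])
    pc, stack, mustread, typ = 1, [], True, EOF
    while True:
        op, *args = CODE[pc]                    # Decode
        nextpc = pc + SIZE[op]
        if mustread:                            # delayed NewSym
            typ, mustread = next(source, EOF), False
        if op == "T":
            if typ != args[0]:
                return False                    # Error(...) in Algorithm 2.44
            if typ == EOF:
                return True                     # T eofsy reached: sentence recognized
            pc, mustread = nextpc, True
        elif op == "TA":
            if typ == args[0]:
                pc, mustread = nextpc, True
            else:
                pc = args[1]
        elif op in ("NT", "NTA"):
            if typ in FIRST[args[0]] or args[0] in DELETABLE:
                stack.append(nextpc)            # push follower
                pc = ROOT[args[0]]
            elif op == "NTA":
                pc = args[1]
            else:
                return False
        elif op == "ANY":
            pc, mustread = nextpc, True
        elif op == "ANYA":
            if typ in ANYSET[args[0]]:
                pc, mustread = nextpc, True
            else:
                pc = args[1]
        elif op in ("EPS", "EPSA"):
            if typ in EPSSET[args[0]]:
                pc = nextpc                     # recognize the empty string
            elif op == "EPSA":
                pc = args[1]
            else:
                return False
        elif op == "JMP":
            pc = args[0]
        else:                                   # RET: find follower in stack
            pc = stack.pop()
```

For example, parse(["v", "+", "v", "*", "(", "v", ")"]) returns True, and parse(["v", "+"]) returns False because NT term meets eofsy.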
2.6 Error handling
Principle
A syntax error arises in one of three situations: (1) the input symbol typ does
not match the symbol sy in the G-code instruction T; (2) typ is not a
terminal start symbol of the instruction NT; (3) typ is not a terminal successor of
the instruction EPS. In any of these situations, the variable altroot contains
the address of the alternative chain in which the error occurred and the stack
contains the return addresses of all nonterminals that are currently being
processed.
This is sufficient information to collect all terminals that can be used to
continue the analysis. The following example illustrates the situation.
2.45 Example Error situation
Consider the grammar fragment:
program = declarations body "end".
declarations = ...
body = statement {statement}.
statement =
    ...
  | "if" relation "then" body ...
relation = expr relop expr.
relop = ">" | ">=" | "=" | "<>" | "<=" | "<".
expr = ...
and the input text
... if a:=b then c:=d end ...
When the syntax analyzer detects the error caused by the ':=', the
situation shown in Fig. 2.12 has been reached. The boxes in this figure
enclose the grammar symbols of the G-code instructions whose addresses
are in the stack.
Fig. 2.12 Partial syntax tree of an erroneous translation of the instruction
if a:=b then c:=d end
The last input symbol that was correctly recognized is a. It was
recognized as expr. Then relop must follow. Since relop cannot start with
':=', the procedure Error(↕pc,↕altroot) is called. The stack contains the
addresses of the G-code instructions for the recognition of
    eofsy   end   statement   then
    ↑                         ↑
    bottom of stack           top of stack
We will now collect the so-called 'anchors', i.e. all terminals that are suitable
for the resumption of the syntax analysis. They can be grouped into four
classes:
1. All terminal start symbols of the alternative chain starting at altroot,
because the erroneous symbol may have been inserted inadvertently by the
programmer in front of a symbol of the unrecognized alternative chain. In the
example, these are the symbols >, >=, =, <>, <=, <.
2. All terminal successors of the alternative chain at altroot, because the
erroneous symbol may appear in place of a symbol of the unrecognized
alternative chain. In the example, this set consists of the start symbols of
expr: v, c, +, -, (.
3. The terminal start symbols of all symbols in the stack, and of the
alternative chains beginning with them. With these, syntax analysis can be
resumed after a non-recognized nonterminal. In the example these are the
symbols then, end, eofsy and the set first(statement).
4. All terminal successors of the alternative chains whose addresses are in
the stack. In the example, these are all terminal start symbols of body
since body follows then, and all terminal start symbols of statement
since statement follows statement.
While the inclusion of items 1 to 3 in the set of anchors is plausible, the
inclusion of item 4 seems rather arbitrary. We could justify it by the fact that
items 3 and 4 are symmetric to items 1 and 2, but there is a heuristic reason as
well. In a grammar where ';' is a statement separator rather than a
statement terminator, without rule 4 the set of anchors would contain ';' but not
the start symbols of statements. Then, in the case of a missing ';' between
statements, which is a common error, the next statement would be skipped.
Rule 4 prevents this by adding the start symbols of statements to the set of
anchors. Similar errors, e.g. the omission of a comma between
expressions, are also quite likely to occur.
Now, input symbols are skipped until one of them appears in the set of
anchors. In the worst case this happens at the end of the input text, since
eofsy is always among the anchors. Next, the stack must be corrected. If the
anchor is a terminal start symbol of the alternative chain whose address is in
stack(t), analysis will be resumed at this address and the stack length will be
reduced to t - 1. In Example 2.45, only ':=' is skipped, since b is a start
symbol of expr, and the stack is not reduced.
In summary, we can describe the principle of error handling as follows:
2.46 Principle of error handling
An error is detected if an alternative chain is unsuccessfully traversed up
to its end. Then the error is flagged and the analysis must be
synchronized. The synchronization consists of collecting a set of anchors and of
skipping the input text up to the first input symbol that is contained in the
set of anchors. With it, the analysis can be resumed at the address pc of
the anchor. During this process the stack is reduced if necessary so that at
the end of the error handling the following assertion holds:
Starting with the G-code instruction at pc, the analysis can be
continued with the current input symbol typ (typ matches the alternative
chain at pc). The stack contains the return addresses of all
nonterminals currently under process when continuing the analysis with pc.
This error handling has two remarkable features:
1. It is completely independent of the syntax of the input language.
2. Anchors are collected only if an error is detected. The procedure is therefore
completely dynamic and starts anew for each error. Hence, the presence of error
handling does not reduce the parsing speed for a correct input
string. The synchronization itself is expensive, but, since errors are
infrequent, this is only a slight disadvantage.
The algorithm Error
From the preceding section, the basic structure of the algorithm Error is
obvious now:
2.47 Algorithm Error (basic structure)
Error(↕pc,↕altroot):
  global correct: boolean;
         lacts: integer;                  -- actual stack length
  begin
    correct:=false;
    Print error message;
    Collect anchors;
    Skip input symbols up to the first anchor;
    Correct pc, altroot and lacts
    -- It is synchronized. The analysis can continue
  end Error
Error messages
The error messages are also independent of the input language. At the error
location, we simply extract all expected symbols from the G-code and list
them. In Example 2.45, the following error message will occur:
    ... if a:=b then c:=d end ...
            ^
            relop expected
This message is sufficient for most purposes. In Coco we also provide the
option for the user to output his own error messages (see Section 5.2.2).
The collection of anchors
Since, after synchronization, parsing is resumed with a new G-code
instruction newpc and with a new stack length newlacts, anchors are collected as
triples:
(newtyp, newpc, newlacts)
A procedure Triple produces a triple list in which the following triple
categories are included:
1. the terminal start symbols of the alternative chain beginning with altroot;
2. the terminal successors of the alternative chain beginning with altroot;
3. the terminal start symbols of all alternative chains whose addresses are in
the stack;
4. the terminal successors of all alternative chains whose addresses are in the
stack.
If a terminal belongs to more than one of the four categories, category 1 has
priority (because no symbol needs to be read). Category 2 has priority over
categories 3 and 4 (because synchronization can take place in the same
production where the error occurred). Of the anchors derived from the stack, the
ones closest to the error location have priority, and the terminal start symbols
of the stacked alternative chains have priority over their successors.
In order to fill the triple list with terminal start symbols and successors
corresponding to the priority rules, we use the algorithms Fill and FillSucc.
Hence, the algorithm Triple has the following form:
2.48 Algorithm Triple
Triple(↓altroot):
  global triple list;
         stack: array of integer;
         lacts: integer;                  -- actual stack size
  begin
    triple list := empty;
    for i:=1 to lacts do
      FillSucc(↓stack(i),↓i-1);           -- class 4
      Fill(↓stack(i),↓i-1)                -- class 3
    end;
    FillSucc(↓altroot,↓lacts);            -- class 2
    Fill(↓altroot,↓lacts)                 -- class 1
  end Triple
As a concrete data structure for the triple list, we use two arrays newpc and
newlacts, which are indexed by the maxt + 1 terminals of the grammar:
newpc, newlacts: array(0:maxt) of integer
The algorithms Fill and FillSucc use the following procedure to obtain G-code
instructions:
GetSymInstr(↓pc,↑opcode,↑sy,↑nextpc,↑altpc)
which supplies the G-code instruction at pc. The last two parameters have the
following meaning:
nextpc: Address of the first 'symbol-recognizing' instruction (T, TA, NT,
NTA, ANY, ANYA) that follows the instruction at pc in the same
production, or 0 if no such instruction exists.
altpc: Address of the first 'symbol-recognizing' instruction that is an
alternative of the instruction at pc, or 0 if no such instruction exists.
Fill and FillSucc can now be described as follows:
2.49 Algorithm Fill
Fill(↓firstpc,↓lacts):
  global newpc,newlacts: array(0:maxt) of integer;
  begin
    pc:=firstpc;
    while pc≠0 do
      GetSymInstr(↓pc,↑opcode,↑sy,↑nextpc,↑altpc);
      case opcode of
        t,ta:
          newpc(sy):=pc; newlacts(sy):=lacts
      | nt,nta,nts,ntas:
          for all x ∈ first(sy) do
            newpc(x):=pc; newlacts(x):=lacts
          end
      | any,anya:                         -- nothing (eps and ret do not occur)
      end;
      pc:=altpc
    end
  end Fill
2.50 Algorithm FillSucc
FillSucc(↓startpc,↓lacts):
  global newpc,newlacts: array(0:maxt) of integer;
  begin
    pc:=startpc;
    while pc≠0 do
      GetSymInstr(↓pc,↑opcode,↑sy,↑nextpc,↑altpc);
      if nextpc≠0 then Fill(↓nextpc,↓lacts) end;
      pc:=altpc
    end
  end FillSucc
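The Python sketch below transcribes Fill, FillSucc and Triple for the expression grammar of Section 2.4. The triple list is represented as a dictionary newtyp -> (newpc, newlacts). GetSymInstr is approximated by a hypothetical helper, resolve, that follows JMPs and treats EPS and RET as "no symbol-recognizing instruction" (0). Start addresses taken from the stack are resolved in the same way. These details are assumptions, since the book does not specify them. The scenario is the input v + * v: the error occurs at NT term (address 15) with '*', the stack holds the followers [3, 8], and altroot is 15.

```python
EOF = "eofsy"

# G-code of the expression grammar from Section 2.4, keyed by address.
CODE = {
    1: ("NT", "S"), 3: ("T", EOF), 5: ("RET",),
    6: ("NT", "expr"), 8: ("RET",),
    9: ("NT", "term"), 11: ("TA", "+", 20), 15: ("NT", "term"),
    17: ("JMP", 11), 20: ("EPS", 1), 22: ("RET",),
    23: ("NT", "factor"), 25: ("TA", "*", 34), 29: ("NT", "factor"),
    31: ("JMP", 25), 34: ("EPS", 2), 36: ("RET",),
    37: ("TA", "v", 42), 41: ("RET",),
    42: ("T", "("), 44: ("NT", "expr"), 46: ("T", ")"), 48: ("RET",),
}
SIZE = {"T": 2, "TA": 4, "NT": 2, "NTA": 4, "ANY": 1, "ANYA": 4,
        "EPS": 2, "EPSA": 4, "JMP": 3, "RET": 1}
FIRST = {nt: {"v", "("} for nt in ("S", "expr", "term", "factor")}
SYMBOLIC = {"T", "TA", "NT", "NTA", "ANY", "ANYA"}  # symbol-recognizing opcodes


def resolve(pc: int) -> int:
    """Hypothetical helper: follow JMPs; 0 if no symbol-recognizing instruction is reached."""
    while pc and CODE[pc][0] == "JMP":
        pc = CODE[pc][1]
    return pc if pc and CODE[pc][0] in SYMBOLIC else 0


def get_sym_instr(pc: int):
    """Approximation of GetSymInstr: (opcode, sy, nextpc, altpc) for the instruction at pc."""
    op, *args = CODE[pc]
    sy = args[0] if op in ("T", "TA", "NT", "NTA") else None
    nextpc = resolve(pc + SIZE[op])
    altpc = resolve(args[-1]) if op in ("TA", "NTA", "ANYA") else 0
    return op, sy, nextpc, altpc


def fill(firstpc: int, lacts: int, anchors: dict) -> None:
    """Enter the start symbols of the alternative chain at firstpc; later calls overwrite."""
    pc = resolve(firstpc)
    while pc:
        op, sy, _, altpc = get_sym_instr(pc)
        if op in ("T", "TA"):
            anchors[sy] = (pc, lacts)
        elif op in ("NT", "NTA"):
            for x in FIRST[sy]:
                anchors[x] = (pc, lacts)
        pc = altpc


def fill_succ(startpc: int, lacts: int, anchors: dict) -> None:
    """Enter the successors of every alternative in the chain at startpc."""
    pc = resolve(startpc)
    while pc:
        _, _, nextpc, altpc = get_sym_instr(pc)
        if nextpc:
            fill(nextpc, lacts, anchors)
        pc = altpc


def triple(altroot: int, stack: list[int]) -> dict[str, tuple[int, int]]:
    """Collect anchors as newtyp -> (newpc, newlacts); stack is listed bottom first."""
    anchors: dict[str, tuple[int, int]] = {}
    for i, follower in enumerate(stack, start=1):
        fill_succ(follower, i - 1, anchors)  # class 4
        fill(follower, i - 1, anchors)       # class 3
    fill_succ(altroot, len(stack), anchors)  # class 2
    fill(altroot, len(stack), anchors)       # class 1
    return anchors
```

In this scenario '*' is not an anchor, so it is skipped. The following v maps to (15, 2), so parsing resumes at NT term with the stack unchanged.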
Heuristic improvements
This synchronization procedure works well in most cases and synchronizes
rapidly. However, it is not uncommon for the synchronization to be incorrect,
causing spurious error messages or the skipping of longer portions of text. The
quality of the synchronization also depends on the grammar. It can be
improved by partitioning long grammar productions into several shorter ones,
which increases the number of anchors.
We have improved the procedure with two heuristics, which are also
independent of the grammar:
1. If several errors occur close together, we print only the first one, under
the assumption that the remaining errors are spurious, resulting from the
first one. We introduce an error distance, errdist, which is set to 0 after
the handling of any error, and is increased by one for each input symbol
read. If errdist is less than a predetermined limit errdistmin when an
error occurs, no error message is given. We use errdistmin = 2, i.e. at
least two symbols must have been recognized since the last error;
otherwise a spurious error is assumed.
2. When a spurious error occurs, the stack may have already changed from
its state at the original error. Therefore, we save the stack
at each original error, and restore it at a spurious error.
The heuristics only apply to the program Error and not to its subprograms.
Error now has the final form:
2.51 Algorithm Error (with heuristic enhancements)
Error(↕pc,↕altroot):
  global correct: boolean;
         lacts: integer;                  -- stack length
         errdist: integer;                -- error distance
         errdistmin: integer;             -- minimal error distance
  begin
    correct:=false;
    if errdist>=errdistmin                -- genuine error
    then
      Print error message;
      Collect the anchors;
      Save the stack
    else Restore the saved stack          -- spurious error
    end;
    Skip input symbols up to the first anchor;
    Correct pc, altroot, and lacts;
    -- It is synchronized. The analysis can continue
    errdist:=0
  end Error
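The error-distance bookkeeping of heuristics 1 and 2 can be sketched in Python as follows. Only the counting, the suppression of messages, and the saving and restoring of the stack are modelled; anchor collection and skipping are left out. The class name ErrorFilter is hypothetical. Starting with errdist = errdistmin, so that the very first error is always reported, is also an assumption.

```python
class ErrorFilter:
    """Suppress error messages that follow an earlier error too closely."""

    def __init__(self, errdistmin: int = 2):
        self.errdistmin = errdistmin
        self.errdist = errdistmin        # assumption: the first error is always reported
        self.messages: list[str] = []
        self.saved_stack: list[int] = []

    def symbol_read(self) -> None:
        self.errdist += 1                # one more input symbol since the last error

    def error(self, message: str, stack: list[int]) -> list[int]:
        """Handle an error; return the stack with which synchronization continues."""
        if self.errdist >= self.errdistmin:   # genuine error
            self.messages.append(message)
            self.saved_stack = list(stack)    # save the stack
            result = list(stack)
        else:                                 # spurious error: no message
            result = list(self.saved_stack)   # restore the stack saved at the original error
        self.errdist = 0
        return result
```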
Coco includes the above error-handling method in the generated parser.
A similar error handling was published by Spenke et al. [1984]. They
assign weights to the anchors and make the use of an anchor for
synchronization dependent upon its 'insertion overhead' and its 'reliability'.
3 Semantics
Syntax analysis checks a source program only for formal correctness. That is,
it only determines whether the input string is a sentence of the given grammar.
This function is shown in Fig. 3.1.
Fig. 3.1 Parser: source program (character sequence) → Parser → recognized or not recognized
Translation into a target language presents the additional requirement that the
source program must be transformed into the target program. The 'meaning'
of the target program should be the same as that of the source program, i.e.
the semantics should be retained. A program that does this is a compiler (Fig.
3.2).
Fig. 3.2 Compiler: source program → Compiler → target program
A compiler emerges from a parser if the parser executes so-called
'semantic actions' each time it has parsed some syntactic construct. The semantic
actions in turn generate output symbols, which together constitute the
target program.
This chapter covers attributed grammars, which are presently the most
common technique for the formal description of translation processes. To
describe the translation, the context-free grammar of the source language is
enhanced by three items:
1. semantic actions, which describe the actions that must be performed
during the translation;
2. attributes, which describe properties of the grammar symbols and their
environment;
3. context conditions, which describe relationships between attributes.
We will introduce these three items one by one, then cover the formalism of
the attributed grammar as a whole, and finally cover a subset of the attributed
grammars, the so-called L-attributed grammars, which are used by Coco.
3.1 Semantic actions
The description of semantic actions can be inserted directly at the desired
locations in the grammar productions, e.g. by means of the special delimiters
sem ... endsem.
For a left-to-right parsing of a production A -> ω1 ω2, the execution of
the semantic action statseq after parsing ω1 and before parsing ω2 can be
described by inserting the semantic action between ω1 and ω2:
A -> ω1 sem statseq endsem ω2
This production is to be interpreted in such a way that, for the parsing of A,
where syntax analysis proceeds from left to right, first ω1 is parsed, then the
semantic action statseq is performed, and afterwards ω2 is parsed.
For the description of the semantic actions themselves there are no
generally accepted conventions. We will use the language constructs of Adele or
Modula-2.
3.1 Example Semantic actions
Given a grammar of an arbitrary sequence of zeros and ones:
S -> 0S | 1S | ε
The task consists of reversing a sentence σ of L(G(S)) to produce an
output where the first input symbol is output as the last, the second input
symbol is output as the next to last, and so on. This translation is simply
written as
S -> 0S sem Write('0') endsem
   | 1S sem Write('1') endsem
   | ε
For a given input sentence, e.g. σ = 001, the semantic actions can be
traced according to the syntax tree of Fig. 3.3.
If parsing is performed top-down from left to right, the output string
100 results.
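[Figure: syntax tree of 001. The root S -> 0 S sem Write('0') endsem; its inner S -> 0 S sem Write('0') endsem; the next inner S -> 1 S sem Write('1') endsem; the innermost S derives ε]
Fig. 3.3 Syntax tree with semantic actions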
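Example 3.1 can be tried out with a small recursive-descent sketch in Python, assuming the input is a plain string of '0' and '1' characters; the function name reverse_translation is illustrative. Each call of S corresponds to one S node of Fig. 3.3, and because Write follows the nested S in the production, it runs only after that S has been parsed. The two alternatives 0S and 1S are merged into one branch that writes the symbol it has read.

```python
def reverse_translation(text: str) -> str:
    """S -> 0S sem Write('0') endsem | 1S sem Write('1') endsem | ε"""
    out = []   # receives the output of the Write actions
    pos = 0    # index of the next input symbol

    def S() -> None:
        nonlocal pos
        if pos < len(text) and text[pos] in "01":
            sym = text[pos]
            pos += 1          # recognize '0' or '1'
            S()               # parse the nested S first ...
            out.append(sym)   # ... then execute Write(sym)
        # any other symbol or end of input: S -> ε

    S()
    if pos != len(text):
        raise SyntaxError(f"unexpected symbol at position {pos}")
    return "".join(out)

print(reverse_translation("001"))  # prints 100
```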
The next example will show that this method can also describe more difficult
transformations.
3.2 Example Semantic actions
Given the grammar of the previous example, the task is to transform an
input sequence of n zeros and m ones into an output sequence of the
same length which contains all n zeros followed by all m ones, i.e. the
sequence 0^n 1^m. This translation is described by
S -> 0 sem Write('0') endsem S
   | 1 S sem Write('1') endsem
   | ε
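The only change from Example 3.1 is the position of the semantic actions: Write('0') now comes before the nested S, so zeros are written as soon as they are read, while Write('1') still waits until the nested S has been parsed. A minimal Python sketch under the same assumptions as before (input as a string, illustrative function name):

```python
def zeros_then_ones(text: str) -> str:
    """S -> 0 sem Write('0') endsem S | 1 S sem Write('1') endsem | ε"""
    out = []
    pos = 0

    def S() -> None:
        nonlocal pos
        if pos < len(text) and text[pos] == "0":
            pos += 1
            out.append("0")   # Write('0') before the nested S
            S()
        elif pos < len(text) and text[pos] == "1":
            pos += 1
            S()
            out.append("1")   # Write('1') after the nested S
        # otherwise: S -> ε

    S()
    if pos != len(text):
        raise SyntaxError(f"unexpected symbol at position {pos}")
    return "".join(out)

print(zeros_then_ones("0101"))  # prints 0011
```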
3.2 Attributes
Even for such a simple task as the transformation of the input sequence
'79 + 83' into the output sequence '162', the grammar with semantic actions
fails. In general, it fails for every input consisting of two numbers connected
by '+' whose output is to be the sum of the two numbers.
Why?
When recognizing a constant, the lexical analyzer supplies only the
terminal class c (as explained up to now). Thus, the parser 'sees' only the
sequence c + c as input. A semantic action that produces the sum of the two
numbers, however, is not satisfied with the terminal classes of the two
numbers, but requires the values of the constants. These values are the semantic
properties of the individual members of the terminal class c. Thus, a lexical
analyzer will have to supply two items for input symbols that are terminal
classes: the type and the value of the input symbol. The symbol type (not to
be confused with the data type) is the terminal symbol in the context of the
grammar (e.g. variable, constant) and therefore a syntactic property; the symbol
value is a semantic property.
By assigning an attribute to each terminal symbol that represents a
terminal class, the semantic properties of terminal classes can be introduced into the
formal language description. We write attributes as indices preceded by an
arrow, whereby a constant now assumes the form c↑x, where x is of type
integer. The up-arrow shows that x is the result of the parsing of c, i.e. has
the character of an output parameter.
By the use of attributes, we can describe the task of reading and adding
two constants connected by a plus sign as follows:
S -> c↑x + c↑y sem Write(x+y) endsem
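A sketch of this idea in Python, under the assumption that the lexical analyzer returns each symbol as a pair (symbol type, symbol value): the parser checks only the types, as syntax analysis does, while the semantic action uses the values x and y. The functions scan and add_constants and the type names 'c' and '+' are illustrative.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[int]]   # (symbol type, symbol value)

def scan(text: str) -> List[Token]:
    """Lexical analysis: constants become ('c', value), other symbols (sym, None)."""
    tokens: List[Token] = []
    for number, other in re.findall(r"(\d+)|(\S)", text):
        if number:
            tokens.append(("c", int(number)))   # terminal class c with its value
        else:
            tokens.append((other, None))         # e.g. '+': no semantic value
    return tokens

def add_constants(text: str) -> int:
    """S -> c↑x + c↑y sem Write(x+y) endsem"""
    tokens = scan(text)
    if [kind for kind, _ in tokens] != ["c", "+", "c"]:   # syntax: types only
        raise SyntaxError("expected: constant + constant")
    x, y = tokens[0][1], tokens[2][1]                     # attributes c↑x, c↑y
    return x + y                                          # the semantic action

print(add_constants("79 + 83"))  # prints 162
```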
In general, attributes describe properties that are associated with a grammar
symbol. Therefore, nonterminals can also have one or more attributes. For
example, let the following three properties apply to the symbol expr: (1) 'type
of expression', (2) 'the expression has no operators', and (3) 'the expression
can be evaluated at compile time'. Then we can assign these three attributes to
expr by writing
expr↑exprtype↑simple↑valueknown
exprtype may assume various values depending on the data type of the expression;
simple and valueknown can assume boolean values. In general, one can assign to each
nonterminal and to each terminal class X of the context-free grammar a
number of attributes that describe those properties of X that cannot be described
by the context-free grammar alone. Each attribute can assume values from a
predetermined set, which forms the attribute type. The attributes of terminal
classes receive their values through the recognition of the terminal symbols by
the lexical analyzer. The values of the attributes of all nonterminals are
calculated by the semantic actions.
3.3 Example Interpretation of arithmetic expressions
Consider the grammar of arithmetic expressions consisting of numbers,
operators, and parentheses, and terminated by a semicolon:
S -> E ;
E -> T | E + T
T -> F | T * F
F -> c | ( E )
We want to define formally the meaning of such an expression by a
description of its interpretation. 'Interpretation' means that an expression
will be read, its value computed, and the result printed. In the formal
description it must be stated that each symbol of the grammar, except for
operators, parentheses, and semicolons, has a value. This value is
denoted by an attribute. For example, the production F -> c is verbally
interpreted by the sentence 'the value of the factor F is the value of the
constant c' and formally by the production:
F↑a -> c↑b sem a:=b endsem
Similarly, multiplication is described by the attributed production:
T↑a -> T↑b * F↑c sem a:=b*c endsem
This means: 'When the right-hand side is recognized, the attributes b and
c are assigned values; subsequently the product of these values is
computed and assigned to the attribute a of the symbol T.'
Correspondingly, the remaining productions of the grammar can be assigned
attributes and semantic actions, so the complete description is as follows:
S -> E↑a sem Write(a) endsem
E↑a -> T↑b sem a:=b endsem
E↑a -> E↑b + T↑c sem a:=b+c endsem
T↑a -> F↑b sem a:=b endsem
T↑a -> T↑b * F↑c sem a:=b*c endsem
F↑a -> c↑b sem a:=b endsem
F↑a -> ( E↑b ) sem a:=b endsem
Such a description is called an attributed grammar.
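Because every attribute in Example 3.3 is synthesized, the values can be computed bottom-up over a syntax tree: a node's attribute is known as soon as the attributes of its children are. The Python sketch below assumes a hypothetical tuple encoding of tree nodes, ('c', value), ('+', left, right) and ('*', left, right), in which the chain productions E -> T, T -> F and F -> ( E ) with their copy actions a:=b have already been collapsed.

```python
def value(node: tuple) -> int:
    """Synthesized attribute a of a node; children are evaluated first."""
    kind = node[0]
    if kind == "c":
        return node[1]                       # F↑a -> c↑b: a:=b
    b, c = value(node[1]), value(node[2])    # attributes of the subtrees
    if kind == "+":
        return b + c                         # E↑a -> E↑b + T↑c: a:=b+c
    if kind == "*":
        return b * c                         # T↑a -> T↑b * F↑c: a:=b*c
    raise ValueError(f"unknown node kind {kind!r}")

# syntax tree of (2+3)*4; (chain and parenthesis productions collapsed)
tree = ("*", ("+", ("c", 2), ("c", 3)), ("c", 4))
print(value(tree))  # prints 20
```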
A simplified notation
The reader may notice that in Example 3.3 most semantic actions consist of
only an assignment. It is therefore a useful shortcut to abbreviate
F↑a -> c↑b sem a:=b endsem
by
F↑b -> c↑b
This notation expresses the fact that the attribute of c is assigned to the
output attribute of F without change.
Attributes and semantic actions in EBNF
The extended Backus-Naur form can also be used for the description of
attributed grammars. Example 3.4 is the same as Example 3.3 but uses the
simplified notation in EBNF form.
3.4 Example Interpretation of arithmetic expressions in EBNF
S -> E↑a ";" sem Write(a) endsem.
E↑a -> T↑a
  { "+" T↑b sem a:=a+b endsem
  }.
T↑a -> F↑a
  { "*" F↑b sem a:=a*b endsem
  }.
F↑a -> c↑a
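Read as an algorithm, the EBNF form maps directly onto recursive-descent procedures: each repetition {...} becomes a loop, the output attribute a becomes the return value, and each semantic action is a statement in the loop body. A minimal Python sketch, assuming decimal integer constants; the ( E ) alternative of F is taken from Example 3.3, and Write(a) is modelled by returning a.

```python
import re

def interpret(text: str) -> int:
    tokens = re.findall(r"\d+|\S", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(sym: str) -> None:
        nonlocal pos
        if peek() != sym:
            raise SyntaxError(f"expected {sym!r}, found {peek()!r}")
        pos += 1

    def E() -> int:        # E↑a -> T↑a {"+" T↑b sem a:=a+b endsem}.
        nonlocal pos
        a = T()
        while peek() == "+":
            pos += 1
            a = a + T()
        return a

    def T() -> int:        # T↑a -> F↑a {"*" F↑b sem a:=a*b endsem}.
        nonlocal pos
        a = F()
        while peek() == "*":
            pos += 1
            a = a * F()
        return a

    def F() -> int:        # F↑a -> c↑a | "(" E↑a ")"
        nonlocal pos
        tok = peek()
        if tok is not None and tok.isdigit():
            pos += 1
            return int(tok)
        expect("(")
        a = E()
        expect(")")
        return a

    a = E()                # S -> E↑a ";" sem Write(a) endsem
    expect(";")
    if pos != len(tokens):
        raise SyntaxError("input after ';'")
    return a

print(interpret("2+3*(4+1);"))  # prints 17
```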
With this notation, one can see how the visual separation of syntax and
semantics significantly improves readability.
Input and output attributes
All of the previously used attributes behave like output parameters: they are
generated by the parsing of a terminal or a nonterminal, and are used
afterwards. We therefore call them derived or synthesized attributes and denote
them by an up-arrow. But nonterminals can also have attributes that behave as
input parameters, i.e. attributes that already have values when the parsing of
the nonterminal starts. Semantic actions which are executed during the
parsing of the nonterminal can use these values. We call such attributes
inherited attributes, and denote them by a down-arrow. The next example
shows the application of inherited attributes.
3.5 Example Inherited attributes
Given the following grammar which describes the declaration of
variables:
S -> del typ idlist ;
typ -> real | int | bool
idlist -> id | idlist , id
id is the terminal class of all identifiers. The declaration consists of a
keyword del, a type, and one or more variables of this type, for
example: del int x, y, z. The semantic action, which should be
performed during parsing of the declaration, consists of entering each
variable's name and type t into the name list. Let this be done by a
call of the procedure NewId(↓name↓t). It is appropriate to call NewId
immediately after the parsing of an identifier id in the production for
idlist. But how can the type be known at this point, since it was
already parsed in the production for typ?
The solution is to attach the type t as an inherited attribute to the
nonterminal idlist:
S -> del typ↑t idlist↓t ;
idlist↓t -> id↑name sem NewId(↓name↓t) endsem
   | idlist↓t , id↑name sem NewId(↓name↓t) endsem
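In a recursive-descent sketch, an inherited attribute simply becomes an input parameter of the procedure for the nonterminal. The Python version below is illustrative: new_id stands in for the book's NewId, and the left-recursive idlist production is replaced by a loop over id {"," id}, which calls NewId in the same order (x, y, z) as the left-recursive form.

```python
import re
from typing import List, Optional, Tuple

def declare(text: str) -> List[Tuple[str, str]]:
    """S -> del typ↑t idlist↓t ;  with NewId called for every identifier."""
    tokens = re.findall(r"\w+|\S", text)
    pos = 0
    name_list: List[Tuple[str, str]] = []

    def next_token() -> Optional[str]:
        nonlocal pos
        tok = tokens[pos] if pos < len(tokens) else None
        pos += 1
        return tok

    def new_id(name: str, t: str) -> None:     # stands in for NewId(↓name↓t)
        name_list.append((name, t))

    def typ() -> str:                          # typ↑t -> real | int | bool
        t = next_token()
        if t not in ("real", "int", "bool"):
            raise SyntaxError(f"type expected, found {t!r}")
        return t

    def ident() -> str:                        # id↑name
        name = next_token()
        if name is None or not name.isidentifier():
            raise SyntaxError(f"identifier expected, found {name!r}")
        return name

    def idlist(t: str) -> None:                # idlist↓t: t is an input parameter
        new_id(ident(), t)
        while pos < len(tokens) and tokens[pos] == ",":
            next_token()
            new_id(ident(), t)

    if next_token() != "del":
        raise SyntaxError("'del' expected")
    t = typ()                                  # output attribute of typ ...
    idlist(t)                                  # ... is the input attribute of idlist
    if next_token() != ";":
        raise SyntaxError("';' expected")
    return name_list

print(declare("del int x, y, z;"))
# prints [('x', 'int'), ('y', 'int'), ('z', 'int')]
```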
Output attributes of a symbol A are computed during the parsing of
the right-hand side of the A-production, and can thus be used during the
parsing of other grammar productions that contain A as a part of their
right-hand side. Thus the information flows from the bottom to the top, from the
leaves to the sentence symbol. Input attributes of a nonterminal A are
computed prior to the parsing of the A-production, and are used during its parsing.
Thus the information flows from top to bottom in the syntax tree, from the
sentence symbol to the leaves. Output attributes of A describe properties of
the A-phrase and its constituent phrases. Input attributes of A denote
properties of the environment of the A-phrase.
Figure 3.4 shows a syntax tree 'decorated' with attributes for the
sentence:
del int x,y,z
The flow of attribute values along the dashed lines can easily be seen.
[Figure: syntax tree of del int x,y,z. The root S has the children del, typ↑t (deriving int) and idlist↓t; each idlist node passes ↓t down to its inner idlist, and each id↑name leaf (x, y, z) leads to a call NewId(↓name↓t)]
Fig. 3.4 Analysis of the sentence del int x,y,z.
The attributes flow along the dashed lines
3.3 Context conditions
The formal syntax description of a programming language is not sufficient to
distinguish between correct and incorrect programs. For example, in a
programming language where all variables must be explicitly declared, the
following code is syntactically correct, even though it does not represent a
valid program since the variables x and y are not declared.
PROCEDURE P;
  VAR a, b: INTEGER;
BEGIN
  x := y
END P
If a programming language definition states 'each variable in an assignment
statement must be declared', this defines a relationship between textually
separated language elements, which cannot be represented by a context-free grammar.
Such constraints are called context conditions and are usually
considered part of the semantics since they cannot be described syntactically.
The total set of context conditions is called the static semantics of the
programming language. The word static signifies that they refer to the source
code and not to the execution of the program.
Programming languages are full of context conditions. It would be
desirable if the language definition contained explicit definitions for them,
separating them from the other parts of the language definition and stressing their
importance. Unfortunately, this is rarely the case since they are often buried
implicitly in other definitions. Sometimes they are missing altogether since the
author wants a small defining document, or because it is assumed that the
reader understands them.
Attributed grammars also permit the formal description of context
conditions. The context condition is expressed as a relation between attributes. For
example, the context condition 'the left side and the right side of an
assignment must be of the same type' imposes a relation between the type attributes
of both sides. If
assign = id↑v1↑typ1 ":=" expr↑v2↑typ2 ";".
is the production for the assignment, where typ1 and typ2 are the types of
id and expr, then the context condition is typ1 = typ2. The context
condition can be written separately from the production in the form
assign = id↑v1↑typ1 ":=" expr↑v2↑typ2 ";".
CC: typ1 = typ2
or it can be integrated into the production, e.g. in a manner proposed by Watt
and Lehrmann Madsen [1983]:
assign = id↑v1↑typ1 ":=" expr↑v2↑typ2 ";" where(typ1=typ2).
The first form separates the context condition from the production more
clearly and is especially suited for several long context conditions. The
second form emphasizes the coherence between production and context
condition.
According to van Wijngaarden's two-level grammar, the part where(...)
can be regarded as a nonterminal that is derived into the empty string if the
relationship inside the parentheses is true, and that cannot be derived into any
terminal string if it is false. If typ1 = typ2, the syntax analysis of where(typ1 =
typ2) thus results in the empty string, so that an assignment is parsed with the
remaining part of the production. However, if typ1 ≠ typ2, the terminal
string representing the assignment statement is rejected since the where-part
cannot be derived into a terminal string.
We use the style with where and define the point of execution of the test
of the context condition by its position in the production in the following way.
The production
A = ω1 where(CC) ω2 .
means that in order to parse A, we must execute a syntax analysis from left to
right that will parse ω1 first. Thereafter the context condition CC is tested. If
it is not met, an error is reported. Then ω2 is parsed.
The following examples show the application of context conditions.
3.6 Example A context-sensitive language
The language {a^n b^n c^n : n ≥ 1} is not context-free. It is shown in
textbooks on formal languages that no context-free grammar
exists for this language. However, the following attributed grammar with a
context condition is easily constructed:
S = A↑p B↑q C↑r where(p=q=r).
A↑p = a sem p:=1 endsem {a sem p:=p+1 endsem}.
B↑q = b sem q:=1 endsem {b sem q:=q+1 endsem}.
C↑r = c sem r:=1 endsem {c sem r:=r+1 endsem}.
Here, p, q, and r represent the counts of the characters a, b, and c.
The context condition requires that they are equal.
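The grammar of Example 3.6 can be executed almost literally: the context-free part only checks the shape a+ b+ c+, the semantic actions count, and the context condition is tested once the three counts are known. In this illustrative Python sketch, a violated context condition (or a syntax error) simply yields False instead of an error report.

```python
def in_language(text: str) -> bool:
    """S = A↑p B↑q C↑r where(p=q=r); A↑p = a {a}, and so on."""
    pos = 0

    def count(ch: str) -> int:   # e.g. A↑p = a sem p:=1 endsem {a sem p:=p+1 endsem}
        nonlocal pos
        if pos >= len(text) or text[pos] != ch:
            raise SyntaxError(f"{ch!r} expected at position {pos}")
        n = 0
        while pos < len(text) and text[pos] == ch:
            pos += 1
            n += 1
        return n

    try:
        p = count("a")
        q = count("b")
        r = count("c")
    except SyntaxError:
        return False             # not even of the form a+ b+ c+
    if pos != len(text):
        return False
    return p == q == r           # the context condition where(p=q=r)

print(in_language("aabbcc"), in_language("aabbc"))  # prints True False
```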
3.7 Example Context condition
The context condition 'in the declaration of an array, both index bounds
must be of type integer, and the lower bound must not be greater than the
upper bound' can be described as follows:
arraydeclaration =
  id↑name "(" constant↑c1↑typ1 ":" constant↑c2↑typ2 ")"
  where((typ1=typ2=integer) & (c1≤c2)).
where c1 and c2 represent the numerical values of the bounds.
3.8 Example Context condition
The context condition 'each variable appearing in a statement must have
been previously declared' can be described as follows. One must
distinguish syntactically the applied occurrence of a name (in a statement) from
the defining occurrence (in a declaration), with the additional syntax rule:
var = id.
The nonterminal var denotes the applied occurrence of the name id.
Therefore, var must be written in all statements in place of id. If a
semantic procedure IsDeclared(↓name) is used to check the symbol list to
see whether the name of the variable is declared, the context condition can be
simply formulated as follows:
var↑name = id↑name where(IsDeclared(↓name)).
If a context condition is not met, this usually affects the execution of the
subsequent semantic actions, but this cannot be expressed well in the
attributed grammar. In Coco, we therefore avoid explicit context
conditions, replacing their checking with semantic actions (see Section 3.6).
However, for the description of the static semantics of programming
languages context conditions are very suitable.
3.4 Attributed grammars
In the previous sections we have introduced the elements of attributed
grammars. We now consider them in their entirety. In the literature the concept of
an attributed grammar is defined in many different ways (see for example,
Raiha [1977], Tienari [1980], Watt and Lehrmann Madsen [1983]). We will
follow Waite and Goos [1984].
3.9 Definition Attributed grammar
An attributed grammar is a quadruple AG = (G, A, R, K):
G = (VN, VT, P, S) is a reduced context-free grammar, A is a finite
set of attributes, R is a finite set of semantic actions, and K is a finite set
of context conditions. With each symbol X ∈ VT ∪ VN zero or more
attributes from A are associated. With each production zero or more
semantic actions from R and zero or more context conditions from K are
associated. For each occurrence of a nonterminal X in the syntax tree of
a sentence of L(G) the attributes of X can be computed in at most one
way by semantic actions.
The attribute computation process
In the concept of attributed grammars, it is essential that the definition says
nothing about the order in which the semantic actions are executed. In the
previous examples, we assumed that syntax analysis was performed top-down
from left to right, and that the semantic actions were executed in the same
order. However, according to Definition 3.9, this is not required. The order of
the semantic actions is not predetermined by any syntax-analysis method;
rather, it is free. This eliminates the necessity of putting the semantic actions
and context conditions in particular places of the right-hand side of the
grammar productions. All semantic actions and context conditions that belong to a
syntax production can be summarized and written at the end of the production.
In the general case, the translation runs in two phases:
1. syntax analysis, which constructs a syntax tree;
2. execution of semantic actions, which mainly compute the attribute values
attached to the nodes of the syntax tree in an arbitrary order.
Step 2 implies that an 'attribute computation process' will traverse the syntax
tree in an arbitrary manner and compute the values of the unknown attributes
at each node. A semantic action can be executed at a specific time if and only if
all attribute values which contribute to the computation are known at that time.
The attribute computation process continues until all attribute values are
calculated. It is therefore possible that the attribute computation process must
traverse the syntax tree several times, up and down, criss-crossing from left to
right. In order to avoid ambiguous computations of attributes, the definition of
attributed grammar contains the sentence: 'For each appearance of a
nonterminal X, the attributes of X can be calculated in at most one way'.
3.10 Example Variable declaration
In Pascal, variables are declared by their enumeration after the keyword
var, and the type follows the list of variables. For example,
var x,y,z: integer
The semantic actions implied by the declaration may consist of a call to a
procedure NewId(↓name↓t) which appends the name and type of the
variable to the name list. In a strict translation from left to right, this
construct leads to difficulties, since the type is known only after all names
have been parsed, and therefore NewId cannot be called immediately
after recognizing a name. In an attributed grammar, these difficulties do
not arise if it is formulated as follows:
1 declaration = "var" idlist↓t0 ":" typ↑t1 sem t0:=t1 endsem.
2 idlist↓t1 = id↑name sem NewId(↓name↓t1) endsem
3    | idlist↓t2 "," id↑name sem NewId(↓name↓t1); t2:=t1 endsem.
For the source text var x,y,z: integer, first a syntax tree is generated,
where all attributes except those of terminal classes have no values (see
Fig. 3.5).
[Figure: syntax tree of var x,y,z: integer. The root declaration has the children var, idlist↓t0, ':' and typ↑t1 (deriving integer). Each of the three idlist nodes has one id↑name leaf (z, y, x from top to bottom); the upper two use production 3 with sem NewId(↓name↓t1); t2:=t1 endsem, the lowest uses production 2 with sem NewId(↓name↓t1) endsem]
Fig. 3.5 Analysis of the sentence var x,y,z: integer with the flow of
attributes along the dashed lines
The attribute computation process now starts at an arbitrary node in order
to compute the missing attributes and to call procedure NewId.
Wherever it starts, the first semantic action that can be executed is t0 := t1 in
production 1. Then, NewId(↓name↓t1) and t2 := t1 in production 3
can be executed. This process continues along the dashed lines until all of
the semantic actions are executed.
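The attribute computation process of Example 3.10 can be sketched in Python. As simplifying assumptions, the syntax tree of Fig. 3.5 is not built explicitly; instead, each semantic action is listed together with the attribute instances it reads (hypothetical names such as 'idlist0.t' for the inherited attribute of the topmost idlist), and typ↑t1 is taken as already computed. The process sweeps over the actions repeatedly and executes each one as soon as its arguments are known; the sweep order is deliberately arbitrary, and only the dependencies decide when an action can run.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, str]                              # attribute instance -> value
Action = Tuple[List[str], Callable[[Env], None]]  # (arguments, action)

def compute_attributes(actions: List[Action], env: Env) -> None:
    """Sweep until all actions have run; an action runs once its arguments are known."""
    pending = list(actions)
    while pending:
        still_pending: List[Action] = []
        for args, action in pending:
            if all(a in env for a in args):
                action(env)
            else:
                still_pending.append((args, action))
        if len(still_pending) == len(pending):
            raise RuntimeError("remaining attributes cannot be computed")
        pending = still_pending

def declaration_actions(names: List[str], name_list: List[Tuple[str, str]]) -> List[Action]:
    """Semantic actions attached to the syntax tree of 'var names: type'."""
    def set_t0(env: Env) -> None:                 # production 1: t0 := t1
        env["idlist0.t"] = env["typ.t1"]
    actions: List[Action] = [(["typ.t1"], set_t0)]
    # idlist k (k = 0 is the topmost node) holds the (k+1)-th name from the right
    for k, name in enumerate(reversed(names)):
        here = f"idlist{k}.t"
        def new_id(env: Env, name: str = name, here: str = here) -> None:
            name_list.append((name, env[here]))   # NewId(↓name↓t1)
        actions.append(([here], new_id))
        if k < len(names) - 1:                    # production 3: t2 := t1
            below = f"idlist{k + 1}.t"
            def pass_down(env: Env, here: str = here, below: str = below) -> None:
                env[below] = env[here]
            actions.append(([here], pass_down))
    return actions

name_list: List[Tuple[str, str]] = []
env: Env = {"typ.t1": "integer"}   # known once typ has been analyzed
# sweep in reverse order on purpose: the dependencies still fix the result
compute_attributes(declaration_actions(["x", "y", "z"], name_list)[::-1], env)
print(name_list)  # prints [('z', 'integer'), ('y', 'integer'), ('x', 'integer')]
```

In this sketch the names happen to be entered in the order z, y, x; as the text notes, another strategy of the attribute computation process could produce a different order.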
In Example 3.10, the order in which the three calls to Newld are executed is
not determined by the attributed grammar, but rather depends on the strategy
of the attribute computation process. In most cases, the order is unimportant,
and therefore this kind of attributed grammar is adequate. If desired, a
particular order can be imposed by introducing additional attributes.
Cyclic semantic dependencies
Attributed grammars can be constructed in which the attribute computation
process does not terminate since some attributes depend on themselves. This
is called a cyclic semantic dependency. In Definition 3.9, this possibility is
addressed by the sentence: 'For each appearance of a nonterminal X, the
attributes of X can be calculated in at most one way'. There are algorithms that
can check the grammar for this property (Knuth [1968], Waite and Goos
[1984]). If an attributed grammar of the general form described above has
been defined, it must first be checked for cyclic semantic dependencies, and
possibly transformed into a well-defined form.
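The core of such a check is cycle detection in a dependency graph. The Python sketch below is deliberately simpler than the grammar-level algorithms cited above: it does not examine all possible syntax trees of a grammar, but tests one concrete dependency graph (for instance the attribute instances of a single syntax tree) with a depth-first search. The graph encoding is an assumption made for illustration.

```python
from typing import Dict, List

def has_cycle(depends_on: Dict[str, List[str]]) -> bool:
    """Depth-first search for a cycle in an attribute dependency graph."""
    ON_PATH, DONE = 1, 2
    state: Dict[str, int] = {}

    def visit(attr: str) -> bool:
        state[attr] = ON_PATH
        for arg in depends_on.get(attr, []):
            if state.get(arg) == ON_PATH:
                return True               # arg is on the current path: a cycle
            if arg not in state and visit(arg):
                return True
        state[attr] = DONE
        return False

    return any(attr not in state and visit(attr) for attr in depends_on)

# X.a is computed from X.b and X.b from X.a: a cyclic semantic dependency
print(has_cycle({"X.a": ["X.b"], "X.b": ["X.a"]}))  # prints True
```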
3.5 L-attributed grammars
Great effort is required to translate an attributed grammar as described in the
previous section. First, the syntax tree of the program to be translated must be
generated, and each of its nodes must be 'decorated' with the attributes. Then
the syntax tree must be traversed more than once to compute the attributes until
all attributes are determined. Nowadays storage and run-time requirements
confine this method to mainframes - if it is regarded as practical at all.
Hence, special forms of attributed grammars are needed for compilers,
permitting the computation of the attributes in a single pass from left to right
through the syntax tree. Then the semantic actions can be executed in parallel
with the syntax analysis and no syntax tree is needed. Such attributed
grammars are called L-attributed (i.e. left attributed) according to Lewis et al.
[1976]. All examples in Sections 3.1 through 3.3 are of this kind. The
limitations imposed on attributed grammars to make them L-attributed are
related only to the order of the attribute occurrences in a production. Each
inherited attribute a of a grammar symbol X on the right-hand side of a
production must be computable before X can be recognized. Therefore, for
its computation only those attributes can be used that are known prior to the
parsing of X. From this, the following definition follows:
3.11 Definition L-attributed grammar
An attributed grammar is called L-attributed if for each of its productions
Y -> X1 ... Xn, the following is true: an input attribute of Xk depends
only on the input attributes of Y and on the output attributes of
X1 ... Xk-1.
Based on this definition, it can easily be checked by inspection whether a
given grammar is L-attributed.
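The inspection can be mechanized per production. In the following Python sketch, which uses a hypothetical encoding, every attribute occurrence is a triple (position, kind, name), with position 0 for the left-hand side Y and 1 to n for X1 to Xn; for each input attribute of some Xk, the function receives the occurrences its computation uses and checks them against Definition 3.11.

```python
from typing import Dict, List, Tuple

# An attribute occurrence in a production Y -> X1 ... Xn: (position, kind, name).
# Position 0 is Y, positions 1..n are X1..Xn; kind is "in" or "out".
Occ = Tuple[int, str, str]

def is_l_attributed(inputs: Dict[Occ, List[Occ]]) -> bool:
    """inputs maps each input attribute of some Xk to the occurrences
    its computation uses; checks the condition of Definition 3.11."""
    for (k, kind, _name), uses in inputs.items():
        if kind != "in" or k < 1:
            raise ValueError("keys must be input attributes of X1 ... Xn")
        for pos, use_kind, _ in uses:
            from_y = pos == 0 and use_kind == "in"           # input attribute of Y
            from_left = 1 <= pos < k and use_kind == "out"   # output of X1 ... Xk-1
            if not (from_y or from_left):
                return False
    return True

# Example 3.5, S -> del typ↑t idlist↓t ; : idlist (position 3) uses typ↑t (position 2)
print(is_l_attributed({(3, "in", "t"): [(2, "out", "t")]}))    # prints True
# Example 3.10, production 1: idlist↓t0 (position 2) uses typ↑t1 (position 4)
print(is_l_attributed({(2, "in", "t0"): [(4, "out", "t1")]}))  # prints False
```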
The question is, how far can one get with an L-attributed grammar, and
what do the limitations mean? The general attributed grammars are
indisputably the more powerful tool. The user does not need to be concerned about the
processing order of attributes (and possibly storage of intermediate results)
since this is all done automatically by the attribute computation process. The
description is essentially static and thus 'in principle' simple. In reality, such
descriptions can be cumbersome and difficult to understand, particularly in the
presence of many attributes.
L-attributed grammars can be used to describe the translation of nearly all
important language constructions. However, in many cases more context must
be used for the translation. This is expressed by the necessity of saving
intermediate results in lists, stacks, etc. In Section 3.6 it is shown how the non-L-
attributed grammar of Example 3.10 can be easily replaced by an L-attributed
grammar with semantic actions for temporarily saving variable names. The
worst that can happen is that the order of the semantic actions which is
imposed by the use of the L-attributed grammar will require the partition of the
translation into several passes in which each pass can be defined by an
L-attributed grammar. In view of these disadvantages, Waite and Goos [1984]
say: 'L-attributed grammars are inadequate, even in comparatively simple
cases.' We do not agree with this categorical statement. In most cases, the
simplicity and the ease of implementation of L-attributed grammars more than
compensate for their disadvantages. Therefore we feel that they are a very
suitable tool for compiler implementations, at least as long as our computers
are limited in memory and speed.
Coco processes only L-attributed grammars, and all attributed grammars
in the following chapters of this book are L-attributed.
Algorithmic interpretation of L-attributed grammars
While general attributed grammars are a declarative and therefore
non-algorithmic formalism, L-attributed grammars can also be regarded as algorithmic
descriptions, imposing an order in which semantic actions have to be executed.
Programmers who are used to thinking algorithmically will find it easier to follow
this approach. Therefore, we understand an L-attributed grammar as a very
high-level algorithmic language in the following sense.
The context-free portion of a production
A = α1 | α2 | ... | αn.
denotes the algorithm: 'Parse the nonterminal A by choosing the matching
alternative αi and recognizing its components sequentially from left to right.'
Each alternative with a semantic action of the form
αi = X1 ... Xj sem SA endsem Xj+1 ... Xn
denotes the algorithm: 'Parse X1 through Xj, then execute the semantic
action SA, and then parse Xj+1 through Xn.'
Each alternative with a context condition of the form
αi = X1 ... Xj where(CC) Xj+1 ... Xn
denotes the algorithm: 'Parse X1 through Xj, then test the context
condition CC (and report any errors), and then parse Xj+1 through Xn.'
An attributed production of the form
A↓a0↑b0 = X↓a1↑b1 Y↓a2↑b2.
denotes the following algorithm:
1. compute a1 (using semantic actions that are not stated here, which must
precede X and may depend on a0);
2. parse X (thereby b1 gets a value);
3. compute a2 (using semantic actions that are not stated here, which must
precede Y and may depend on a0, a1, b1);
4. parse Y (thereby b2 gets a value);
5. compute b0 (using semantic actions that are not stated here, which may
depend on a0, a1, b1, a2, b2).
This algorithmic interpretation adds a further clause to the definition of
L-attributed grammars (Definition 3.11): 'Attributes that are used as
arguments in a semantic action or context condition between the grammar
symbols Xi and Xi+1 can only be input attributes of the left-hand side of the
production and output attributes of X1 to Xi.'
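The five steps can be illustrated with a recursive-descent procedure in which input attributes are parameters and output attributes are return values. The attribution used here is hypothetical and chosen only to make every step concrete: Number↓a0↑b0 = Digit↓a1↑b1 Digit↓a2↑b2, where a0 is a number base, each Digit checks its digit against the base it inherits, and b0 is the value of the two-digit number.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def number(text: str, base: int) -> int:
    """Number↓a0↑b0 = Digit↓a1↑b1 Digit↓a2↑b2 (hypothetical attribution)."""
    pos = 0

    def digit(base_in: int) -> int:            # Digit↓base↑value
        nonlocal pos
        d = DIGITS.find(text[pos].lower()) if pos < len(text) else -1
        if d < 0:
            raise SyntaxError(f"digit expected at position {pos}")
        if d >= base_in:                       # uses the input attribute
            raise SyntaxError(f"{text[pos]!r} is no digit in base {base_in}")
        pos += 1
        return d

    a0 = base
    a1 = a0                 # 1. compute a1 (may depend on a0)
    b1 = digit(a1)          # 2. parse X: b1 gets a value
    a2 = a0                 # 3. compute a2 (may depend on a0, a1, b1)
    b2 = digit(a2)          # 4. parse Y: b2 gets a value
    b0 = b1 * a0 + b2       # 5. compute b0 (may depend on a0, a1, b1, a2, b2)
    if pos != len(text):
        raise SyntaxError("exactly two digits expected")
    return b0

print(number("ff", 16))  # prints 255
```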
3.6 Implementation of the semantic interface
The implementation of the semantic interface in a compiler compiler and in the
generated compiler consists of three tasks:
1. translation and storage of semantic actions during compiler generation
time and execution of semantic actions at run-time of the generated
compiler,
2. translation and storage of context conditions during compiler generation
time and test of context conditions at run-time of the generated compiler,
3. reserving memory for attributes at compiler generation time and attribute
passing at run-time of the generated compiler.
These tasks are most simply and directly implemented if the generated
compiler performs its syntax analysis with the popular method of recursive descent,
which is not covered in this book (Gries [1971], Hartmann [1977], Wirth
[1986]). In this method, semantic actions and context conditions are directly
embedded as code in the syntactic procedures, and attributes become parameters of
the syntactic procedures. The simplicity of this kind of semantic interface
makes the method of recursive descent still attractive today for hand-coded
compilers. If the generated compiler performs a table-driven syntax analysis,
then somewhat more effort is required for the semantic interface. In this
section, we cover the method used by Coco.
Semantic actions
The semantic actions are numbered. The order is arbitrary, but it is easiest to
order them as they appear in the attributed grammar. We start the numbering at
12 for reasons that follow. All semantic actions are placed in the single
procedure Semant as follows:
Semant(↓nr):
  case nr of
    12: Semantic Action 12
  | 13: Semantic Action 13
  | n : Semantic Action n
  end
end Semant
The G-code is expanded to provide as many instructions as there are semantic
actions. The G-code instructions treated in Section 2.4 (and two more, see
Definition 3.14) have operation codes 0 through 11. Operation codes 12
through 255 correspond to semantic actions 12 through 255. Thus, Coco has
a limit of 244 semantic actions, which will probably rarely be reached. We only
need 68 semantic actions to describe the attributed grammar of Coco itself, and
126 semantic actions for the largest pass of a Modula-2 compiler.
For the processing of semantic actions the parser of Algorithm 2.44 needs
to be expanded only by an if statement:
3.12 Parser with semantic interface
Parse(↑correct):
  loop
    case opcode of
      t: ...
    | ret: ...
    else
      if correct then Semant(↓opcode) end  -- perform semantic action
    end -- case
  end -- loop
end Parse
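The essential point of Algorithm 3.12 is that every opcode outside the syntactic range is handed to Semant, but only as long as no error has occurred. A Python sketch of this dispatch, under loud assumptions: only two stand-in syntactic opcodes are modelled (T for a terminal and RET for the end of the program, with hypothetical numbers 0 and 1), error recovery is reduced to clearing the correct flag, and Semant is a dictionary from action numbers to functions.

```python
from typing import Callable, List, Optional, Tuple

T, RET = 0, 1     # two stand-ins for the syntactic opcodes 0..11
FIRST_SEM = 12    # opcodes 12..255 denote semantic actions

Instr = Tuple[int, Optional[str]]

def parse(code: List[Instr], tokens: List[str], semant: Callable[[int], None]) -> bool:
    correct = True
    pc = 0        # index of the next G-code instruction
    pos = 0       # index of the next input symbol
    while True:
        opcode, arg = code[pc]
        pc += 1
        if opcode == T:                  # terminal: compare with the input
            if pos < len(tokens) and tokens[pos] == arg:
                pos += 1
            else:
                correct = False          # error recovery is omitted here
        elif opcode == RET:
            return correct and pos == len(tokens)
        elif opcode >= FIRST_SEM:
            if correct:
                semant(opcode)           # perform semantic action
        else:
            raise ValueError(f"opcode {opcode} is not modelled in this sketch")

def make_semant(log: List[str]) -> Callable[[int], None]:
    def semant(nr: int) -> None:         # case nr of 12: ... | 13: ... end
        actions = {12: lambda: log.append("action 12"),
                   13: lambda: log.append("action 13")}
        actions[nr]()
    return semant

code: List[Instr] = [(T, "a"), (12, None), (T, "b"), (13, None), (RET, None)]
log: List[str] = []
print(parse(code, ["a", "b"], make_semant(log)), log)
# prints True ['action 12', 'action 13']
```

With the input a c instead, action 12 still runs, but action 13 is skipped because correct has become false.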
We will now study this method in more detail by an example that uses an
L-attributed grammar to translate the following declaration:
var x,y,z: integer;
(In Example 3.10 we have already given a general attributed grammar for this
task.) Before we can add the identifiers and their type to the name list, the
identifiers must be temporarily stored. For this purpose we will use a queue as an
abstract data structure with the access procedures InitQueue, Enqueue, Dequeue,
and EmptyQueue, whose meaning is obvious. The attributed grammar is as
follows:
declaration =
  "var" id↑name sem InitQueue; Enqueue(↓name) endsem
  { "," id↑name sem Enqueue(↓name) endsem
  } ":" type↑t sem while not EmptyQueue do
                     Dequeue(↑x); NewId(↓x↓t)
                   end
               endsem
  ";".
The numbering of the semantic actions and their integration into the procedure
Semant results in the following:
Semant(↓nr):
local name, x: Nametype;
      t: (int, bool, real);
begin
  case nr of
    12: InitQueue; Enqueue(↓name)
  | 13: Enqueue(↓name)
  | 14: while not EmptyQueue do
          Dequeue(↑x); NewId(↓x↓t)
        end
  end
end Semant
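A Python sketch of this translation, assuming a simple token list and a collections.deque as the queue. As in Semant, the attributes name and t are shared by all semantic actions (here as entries of one dictionary), and the parser calls semant with the action numbers 12, 13 and 14 at the positions given in the attributed grammar. The identifiers are not validated further, and the list name_list stands in for the book's name list and NewId.

```python
import re
from collections import deque
from typing import Dict, List, Optional, Tuple

def translate_declaration(text: str) -> List[Tuple[str, str]]:
    tokens = re.findall(r"\w+|\S", text)
    attrs: Dict[str, Optional[str]] = {"name": None, "t": None}  # shared attributes
    queue = deque()
    name_list: List[Tuple[str, str]] = []

    def semant(nr: int) -> None:
        if nr == 12:
            queue.clear()                          # InitQueue
            queue.append(attrs["name"])            # Enqueue(↓name)
        elif nr == 13:
            queue.append(attrs["name"])            # Enqueue(↓name)
        elif nr == 14:
            while queue:                           # while not EmptyQueue do
                x = queue.popleft()                # Dequeue(↑x)
                name_list.append((x, attrs["t"]))  # NewId(↓x↓t)

    pos = 0

    def take(expected: Optional[str] = None) -> str:
        """Consume one token; if expected is given, the token must match it."""
        nonlocal pos
        tok = tokens[pos] if pos < len(tokens) else None
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a symbol'}, found {tok!r}")
        pos += 1
        return tok

    take("var")
    attrs["name"] = take()      # id↑name
    semant(12)
    while pos < len(tokens) and tokens[pos] == ",":
        take(",")
        attrs["name"] = take()
        semant(13)
    take(":")
    attrs["t"] = take()         # type↑t
    semant(14)
    take(";")
    return name_list

print(translate_declaration("var x,y,z: integer;"))
# prints [('x', 'integer'), ('y', 'integer'), ('z', 'integer')]
```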
The attributes are local variables of Semant. This means that in general all the
names contained in a semantic action (enclosed between sem and endsem)
are global to this semantic action, and therefore common to all of the other
semantic actions.
Context conditions
Context conditions are not treated as an independent language element in
Coco. Rather, they are represented as semantic actions. Instead of
where(CC)
we write, for example,
sem if not CC then SemErr end endsem
where SemErr is a semantic error processing procedure.
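A context condition written as a semantic action might look as follows in C. The condition chosen here (an identifier must not be declared twice), the error number 1, and the helpers NotYetDeclared and DeclareAction are hypothetical; SemErr is modelled as a procedure that only records the most recent error number.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical error procedure: remembers the last semantic error number. */
static int lastSemErr = 0;
static void SemErr(int n) { lastSemErr = n; }

/* Hypothetical table of identifiers declared so far. */
static const char *declared[32];
static int ndeclared;

/* The context condition CC: id has not been declared before. */
static bool NotYetDeclared(const char *id) {
    for (int i = 0; i < ndeclared; i++)
        if (strcmp(declared[i], id) == 0) return false;
    return true;
}

/* Semantic action of the form  sem if not CC then SemErr end; ... endsem */
static void DeclareAction(const char *id) {
    if (!NotYetDeclared(id)) SemErr(1);
    else if (ndeclared < 32) declared[ndeclared++] = id;
}
```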
Attribute passing
Coco treats all attributes as local variables of Semant. They receive their
values through attribute passing, which works differently for terminals and
nonterminals. The attributes of terminals (i.e. terminal classes) are always
synthesized attributes; they receive their values from the lexical analyzer
during parsing. The inherited attributes of a nonterminal are passed by an
implicit semantic action before the nonterminal is parsed, whereas its
synthesized attributes are passed after it has been parsed.
3.13 Example Attribute passing
For the productions
A = ... B↓x↑y ... .
B↓u↑v = ... .
the attribute passing
u := x
is done in the A-production before the parsing of B, and the attribute
passing
y:=v
is done in the A-production after the parsing of B.
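The two assignments can be pictured as code around the call that parses B. In this minimal C sketch all attributes are global variables, as they are in Coco's Semant; the body of B (v := u + 1) is a hypothetical placeholder, since the text leaves the production for B open.

```c
/* Attributes as globals, as with Coco's Semant. */
static int x, y;   /* attributes of the A-production */
static int u, v;   /* B's formal attributes: u inherited, v synthesized */

/* Placeholder for the parsing of B; its production is hypothetical. */
static void B(void) { v = u + 1; }

/* The part of the A-production around the occurrence B(in x, out y). */
static void A_part(void) {
    u = x;   /* inherited attribute passed before the parsing of B */
    B();     /* parsing of B */
    y = v;   /* synthesized attribute passed after the parsing of B */
}
```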
The attribute passing after the parsing of a nonterminal can be executed by a
'normal' G-code instruction, i.e. by an instruction that activates a semantic
action. For the passing of inherited attributes, however, two additional G-code
instructions are necessary:
3.14 Definition G-code (remainder)
Instruction       Bytes  Description
NTS sy sem        3      Nonterminal with input attribute semantics.
                         If the next input symbol is a terminal start symbol
                         of sy, then execute the semantic action sem (for input
                         attribute passing) and start the parsing of the
                         production for sy, else report an error.
NTAS sy adr sem   5      Nonterminal with alternative and input attribute
                         semantics.
                         If the next input symbol is a terminal start symbol
                         of sy, then execute the semantic action sem (for input
                         attribute passing) and start the parsing of the
                         production for sy, else go to adr.
A complete example for the translation of an attributed grammar into G-code,
including attribute passing semantics, can be found in Section 8.3.
Problems with semantic interfaces
The simplicity of this semantic interface gives rise to two problems:
1. Semantic actions may only be executed when it is clear that no other
alternative will match. In the production
A = sem action1 endsem C
  | sem action2 endsem D.
it must be determined whether C or D is the proper alternative before
executing action1 or action2. Coco takes this into account by automatically
inserting an ε-node before the corresponding semantic actions, which
leads to the following result:
Top-down graph:
A:  ε -> action1 -> C
  | ε -> action2 -> D
G-code:
      EPSA 1 M
      SEM  12
      NT   C
      RET
M:    EPS  2
      SEM  13
      NT   D
      RET
where the proper selection of alternatives is done with the following
lookahead sets:
epsset(1) = first(C)
epsset(2) = first(D)
This also works in the following production:
A = B sem action1 endsem
    { sem action2 endsem
      C sem action3 endsem
    }.
For the above the following top-down graph and corresponding G-code
are generated:
Top-down graph:
A:   B -> action1 -> (*)
     (*): ε -> action2 -> C -> action3 -> back to (*)
        | ε -> end of A
G-code:
      NT   B
      SEM  12
M1:   EPSA 1 M2
      SEM  13
      NT   C
      SEM  14
      JMP  M1
M2:   EPS  2
      RET
with the lookahead sets
epsset(1) = first(C)
epsset(2) = follow(A)
If the ε-nodes have disjoint lookahead sets, these constructs are LL(1).
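The second G-code sequence corresponds roughly to the following hand-written C function. It is a sketch under assumptions: B and C are taken to be the single terminals 'b' and 'c', follow(A) is assumed to be {'.'}, and each semantic action only appends its number to a trace. What it shows is that action2 runs only after the ε-node test has established that C follows.

```c
#include <string.h>

/* Hypothetical setting: B and C are the terminals 'b' and 'c', and
   follow(A) = {'.'}. Semantic actions only append their number to a trace. */
static const char *in;     /* remaining input */
static char trace[32];
static int syntaxError;

static void Action(char nr) {
    size_t k = strlen(trace);
    if (k + 1 < sizeof trace) { trace[k] = nr; trace[k + 1] = '\0'; }
}
static void Match(char c) { if (*in == c) in++; else syntaxError = 1; }

/* Hand translation of
     NT B; SEM 12; M1: EPSA 1 M2; SEM 13; NT C; SEM 14; JMP M1; M2: EPS 2; RET */
static void A(void) {
    Match('b');                    /* NT B */
    Action('1');                   /* SEM 12: action1 */
    for (;;) {                     /* M1 */
        if (*in == 'c') {          /* EPSA 1 M2: lookahead in first(C) */
            Action('2');           /* SEM 13: action2 */
            Match('c');            /* NT C */
            Action('3');           /* SEM 14: action3 */
        } else {                   /* M2: EPS 2: lookahead must be in follow(A) */
            if (*in != '.') syntaxError = 1;
            return;                /* RET */
        }
    }
}
```

With the input bcc. the trace is 12323; with the input bx it is 1 and a syntax error is reported, so action2 is never executed.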
2. Attributes in Coco are implemented as local variables of Semant. This
results in the undesirable feature that their values are not retained during
recursive parsing of nonterminals. For example, in the interpretation of
expressions, the following production arises:
E↑x = T↑x {"+" T↑y sem x:=x+y endsem}.
Here, the output attribute x of the left T must still be available after the
parsing of the right T, since its value is used afterwards. However, since
T is recursive over F and E, the attribute x of the left T may be
destroyed by the parsing of the right T. Coco does not take care of this
problem; it is up to the programmer to save and restore x explicitly. This
can be done by using a stack and replacing the above production by the
following:
E↑x = T↑x {"+" sem Push(↓x) endsem
           T↑y sem Pop(↑x); x:=x+y endsem}.
From this follows the
3.15 Principle of attribute saving for recursive symbols
Attribute values that must be preserved beyond the parsing of a recursive
nonterminal X must be saved before the parsing of X and restored after
the parsing of X.
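The effect of Principle 3.15 can be reproduced in C. In this hypothetical mini-grammar, E = T {"+" T}, T = F and F = digit | "(" E ")", the attributes x (of E), y and t (the output attribute of T and F) are globals, as they would be as local variables of Semant. With saveAttributes false, the left operand is overwritten by the nested E; with saveAttributes true, Push and Pop around the right T preserve it. The parser performs no syntax checks and assumes well-formed input.

```c
#include <stdbool.h>

/* Hypothetical mini-grammar (no syntax checks, well-formed input assumed):
     E<out x> = T<out x> { "+" T<out y>  sem x := x + y endsem }.
     T<out t> = F<out t>.
     F<out t> = digit | "(" E<out x> ")"   (t := x).
   All attributes are globals, as locals of Semant would be. */
static const char *in;
static int x, y, t;
static bool saveAttributes;         /* true: save x around the right T */
static int stack[64], sp;

static void Push(int v) { stack[sp++] = v; }
static int Pop(void) { return stack[--sp]; }

static void E(void);

static void F(void) {
    if (*in == '(') {
        in++;                       /* "(" */
        E();
        t = x;                      /* F's output is E's output */
        in++;                       /* ")" */
    } else {
        t = *in++ - '0';            /* digit */
    }
}

static void T(void) { F(); }

static void E(void) {
    T();
    x = t;                          /* left operand */
    while (*in == '+') {
        in++;
        if (saveAttributes) Push(x);     /* sem Push(x) endsem */
        T();                             /* may parse E recursively */
        y = t;
        if (saveAttributes) x = Pop();   /* sem Pop(x) endsem */
        x = x + y;                       /* sem x := x + y endsem */
    }
}
```

For the input 1+(2+3) the unsafe version yields 10, because the left operand 1 has been replaced by the 5 computed inside the parentheses; the version with Push and Pop yields 6.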
4 Various compiler compilers
In the previous chapter we covered the theoretical background of compilers. In
the following chapters we will show the practical application of these
principles in the design of the compiler compiler Coco.
However, before we go into the details of Coco, it will be interesting to
look at some other compiler compilers. This will enable the reader to compare
Coco with these systems.
There is extensive literature about compiler-generating systems.
Bibliographies can be found in Räihä [1980] and Meijer and Nijholt [1982]. The
scope of this book allows us to cover only a few of them, and even those only
to a limited degree. Some of the best-known compiler compilers are YACC
(Johnson [1975]), HLP84 (Koskimies [1984]), GAG (Kastens et al. [1982]),
and MUG (Ganzinger and Giegerich [1984]). In the following sections we
compare these systems with each other.
The basic operation of today's compiler compilers is always the same.
The compiler to be generated is described by a metalanguage based on
attributed grammars. From this compiler description, a parser and a semantic
evaluator are generated which constitute the essential parts of the resulting
compiler. The generated compiler reads the source text to be translated,
performs a syntax analysis to check the correctness of the input, and builds a
syntax tree in memory. It then assigns attribute values to the tree nodes
according to the attributed grammar. This process normally requires several
passes which traverse the tree from left to right or from right to left. In each
pass as many attributes as possible are evaluated. Finally the total semantics of
the source program is represented by the attributes in the tree. The last pass
generates the target code from the attribute values.
The various compiler compilers mainly differ in their compiler description
languages, and in their algorithms to traverse the syntax tree. Although much
effort is spent to reduce execution time and attribute space, large memory
requirements and long processing times are the main reasons why
automatically generated compilers are still less efficient than hand coded compilers.
Therefore some compiler compilers like YACC and Coco bypass the
construction of a syntax tree and accept that they are less powerful and less
generally applicable than HLP84, GAG, or MUG.
The above mentioned compiler compilers will be compared without going
into too much detail. We will give a short example of their input language
which will show the translation of a signed integer constant into its value.
Normally, such tasks are handled by the lexical analyzer. However, they can
also be solved with an attributed grammar, which is short and easy to
understand and is therefore well suited as an example of attributed grammars.
Of course compiler compilers can achieve more than what is demonstrated
in this short example. Most of them will only show their advantages on a large
and complex task. However, these small examples will allow some interesting
conclusions about the user-friendliness and the effort required to learn the
description language of the various systems.
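For reference, the computation that all of the following examples describe can be written by hand in C. This sketch assumes well-formed input (an optional minus sign followed by at least one digit) and uses the same overflow test as the YACC, GAG and Coco examples below: a value above 32767 is reported (here through the hypothetical flag tooBig) and replaced by 0.

```c
#include <ctype.h>
#include <stdbool.h>

/* Set when a constant exceeds 32767 (the limit checked in the examples). */
static bool tooBig;

/* Digitlist<out value> = digit<out value> { digit<out value1> ... }.
   Assumes *s points to at least one digit. */
static int Digitlist(const char **s) {
    int value = *(*s)++ - '0';
    while (isdigit((unsigned char)**s)) {
        int value1 = *(*s)++ - '0';
        if (value < 3276 || (value == 3276 && value1 < 8))
            value = 10 * value + value1;
        else {
            value = 0;            /* "Constant too big" */
            tooBig = true;
        }
    }
    return value;
}

/* Number = ["-"] Digitlist. Returns the value of the signed constant. */
static int Number(const char *s) {
    tooBig = false;
    if (*s == '-') {
        s++;
        return -Digitlist(&s);
    }
    return Digitlist(&s);
}
```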
4.1 YACC - yet another compiler compiler
Origin and scope
YACC was produced by Stephen C. Johnson at Bell Laboratories in 1975. It
runs under Unix and is therefore widely available. YACC accepts L-attributed
grammars with the limitation that each grammar symbol has only one
synthesized attribute and no inherited attributes. From the compiler
description, YACC generates an LALR(1) parser (look-ahead LR(1)) and a semantic
analyzer which is simply a collection of all of the semantic actions of the
compiler description. The user must supply a main program, a lexical
analyzer, and a syntax-error handler.
Description language
The syntax parts of the YACC source language are written as BNF
productions. All terminals (with the exception of literals) must be declared.
For the production X0 : X1 X2 ... Xn, the symbol $$ denotes the
attribute of X0, $1 the attribute of X1, and $n the attribute of Xn. Semantic
actions can be specified at any position between the symbols of the
context-free grammar. They must be written in C and may contain an arbitrary
sequence of valid C statements. Context conditions are written as if statements
in semantic actions. At the end of the grammar, one can write C procedures
which are called in the semantic actions. This is also where a scanner procedure
named yylex must be provided.
Attribute processing
The attribute processing is done in a single pass during syntax analysis. An
explicit syntax tree of the source language is not produced.
Implementation
YACC is written in C and produces compilers that are also written in C. It has
been used for the translation of many languages, including C, APL,
RATFOR, and Pascal.
4.1 Example Attributed grammar as input for YACC
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s);
%}
%start Number     /* start symbol of the grammar */
%token digit      /* declaration of terminals. Literals don't */
                  /* have to be declared as terminals. */
%%                /* separator */
Number:     '-' Digitlist     {printf("%d\n", -$2);}
          | Digitlist         {printf("%d\n", $1);}
          ;
Digitlist:  digit             {$$ = $1;}
          | Digitlist digit   {if (($1 > 3276) ||
                                   (($1 == 3276) && ($2 > 7)))
                                  {printf("Constant too big\n");
                                   $$ = 0;}
                               else {$$ = $1*10 + $2;}
                              }
          ;
%%
#include <ctype.h>
int yylex(void)                /* lexical analyzer */
{
    int ch;
    while ((ch = getchar()) == ' ')
        ;
    if (ch == '\n' || ch == EOF) return 0;      /* end of input */
    if (isdigit(ch)) {yylval = ch - '0'; return digit;}
    else return ch;
}
void yyerror(const char *s)    /* error procedure */
{printf("%s\n", s);}
int main(void)                 /* main procedure */
{return yyparse();}
4.2 HLP84 - Helsinki language processor
Origin and scope
The first version of HLP was produced in 1978 under the name HLP78 at the
University of Helsinki by Räihä et al. [1983]. Since then a new version,
HLP84 (Koskimies [1984]), has been created which has little in common with
the previous one. HLP84 accepts attributed grammars for a one-pass
translation of programs. It generates a scanner, an LALR(1) parser with error
handling, and a semantic evaluator to which user procedures can be attached.
Symbol table handling can be partially described in the compiler definition
language; in certain cases it is even done automatically. This reduces the
number of semantic procedures required.
Description language
The description language Lisa is nonterminal oriented. This is in sharp
contrast to other compiler description languages, where the emphasis is on
productions. Each nonterminal is described by a block which forms the scope
of its local objects. This is similar to the use of procedures in higher-level
languages. A block contains all productions of a nonterminal in extended
BNF, as well as the description of all terminals used in it. Within a block,
attributes and local variables are declared in a Pascal-like form.
A set of semantic rules consisting of assignments and function calls is
attached to each production. These rules assign values to the synthesized
attributes on the left-hand side and to the inherited attributes on the right-hand
side of the production. An attribute a of a grammar symbol S is denoted by
S.a. Terminals can have a single synthesized attribute. There is a specific
language element for context conditions. Lisa provides some standard facilities
for frequently needed operations such as definition of scopes and searching of
names in them. These mechanisms free the user from some clerical work. For
example, an identifier will be automatically searched in all open scopes and its
node in the syntax tree will be automatically attributed according to the
information in its symbol table entry.
Attribute processing
Attributes are processed in a single pass from left to right by means of an
attribute-stack and without an explicit syntax tree. This limits the application of
HLP84 to languages that can be translated in one pass although it is not
required that semantic analysis is done during syntax analysis.
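One-pass evaluation with an attribute stack can be sketched for the DigitList productions of the HLP84 example below. This is an illustration under assumptions, not HLP84's evaluator: the shift/reduce sequence for this tiny left-recursive grammar is hard-coded instead of being driven by LALR(1) tables, and each reduction replaces the attribute values of the right-hand side by the value of the left-hand side.

```c
/* Attribute stack running parallel to the parse stack (capacity not checked). */
static int attr[64];
static int top;

static void Shift(int tokenValue) { attr[top++] = tokenValue; }   /* DigitToken */

/* DigitList = DigitToken: val := DigitToken. One entry is replaced by one,
   and the value is already in place, so nothing has to be done. */
static void ReduceFirst(void) { }

/* DigitList = DigitList DigitToken: val := 10*DigitList.val + DigitToken. */
static void ReduceNext(void) {
    int digit = attr[--top];
    int list = attr[--top];
    attr[top++] = 10 * list + digit;
}

/* Performs the shifts and reductions in the order an LR parser would for a
   digit string (at least one digit assumed) and returns DigitList.val. */
static int EvalDigitList(const char *s) {
    top = 0;
    Shift(*s++ - '0');
    ReduceFirst();
    while (*s >= '0' && *s <= '9') {
        Shift(*s++ - '0');
        ReduceNext();
    }
    return attr[top - 1];
}
```

No syntax tree is built: at any moment the stack holds only the attributes of the symbols that have been recognized but not yet reduced.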
Implementation
HLP84 was implemented on a Burroughs B7800 computer in Pascal. It
generates compilers in Pascal. The system has been used for its own
implementation and for the generation of a Pascal compiler.
4.2 Example Attributed grammar as input for HLP84
external  -- declaration of external Pascal objects
  type Outfile = Extfile;
  function WriteInt(f: Outfile; i: Integer): (f: Outfile) =
    procedure ExtOut(var f: Extfile; i: Integer);
  -- Connects the Pascal procedure ExtOut with the Lisa function
  -- WriteInt.
  -- Extfile and ExtOut are given in a special system file.
nont Number;  -- description of the nonterminal Number (start symbol).
              -- Number has no attributes.
  attrset Intval = (val: Integer);
    -- val is declared to be an integer attribute. The attribute
    -- declaration is given the name Intval.
  var out: Outfile;  -- global variable
  const max = 65535;
  nont SignedNumber: Intval;  -- description of the nt "SignedNumber".
                              -- SignedNumber has an attr. set "Intval"
    nont DigitList: Intval;
      check val < max;  -- context condition
      token DigitToken: Integer = Digit;
        -- the terminal "DigitToken" with an attr. of type Integer is
        -- declared to consist of a single digit (Digit is predefined)
      DigitList = DigitToken;  -- syntactic production
      rules  -- semantic rules
        val := DigitToken
        -- the attr. of a token is denoted by the name of the token
      end;
      DigitList = DigitList DigitToken;
      rules
        val := 10*DigitList.val + DigitToken
      end
    end DigitList;
    SignedNumber = '-' DigitList;
    rules
      val := -DigitList.val
    end;
    SignedNumber = DigitList;
    rules
      val := DigitList.val
    end
  end SignedNumber;
  Number = SignedNumber;
  rules
    post out := WriteInt(out, SignedNumber.val);
    -- after SignedNumber is processed, its attribute val is written.
  end
end Number
4.3 GAG - generator based on attribute grammars
Origin and scope
GAG was developed by Kastens, Hutt, and Zimmermann [1982] at the
University of Karlsruhe. It accepts ordered attributed grammars where the
attribute evaluation order of each nonterminal is fixed and independent of the
context of the nonterminal. From the compiler description, an attribute
evaluator and an LALR(1) parser are produced (by separate tools). The user
must supply a lexical analyzer and a few other procedures such as a code
generator. These modules together with some fixed parts constitute a complete
compiler.
Description language
The grammar is written in extended BNF with special constructs for options
and repetitions. All nonterminals and terminals (except literals) must be
declared. Every production is associated with a set of semantic rules. In these
rules the strongly typed, functional language Aladin is used, allowing attribute
assignments and function calls. The right-hand side of an assignment can be a
complex expression of attribute values, function calls, if expressions, syntax
symbols, and many others (see Example 4.3). As a functional language Aladin
has neither variables nor control statements. The attribute notation S.a means
the attribute a of the symbol S. If S occurs in a production several times, the
first occurrence is denoted by S[1], the second by S[2], and so on. There is
a special language element for context conditions.
Attribute processing
A decorated syntax tree is built during attribute evaluation, but it is not
traversed in alternating passes from left to right and from right to left, as is
done in some other compiler compilers. A node is visited if there are no more
nodes to the left of it, and a parent node is visited when no more of the
children can be visited. The syntax tree is therefore not processed in a straight
direction. In fact, evaluation may sometimes step back some nodes to evaluate
attributes that could not be computed earlier. In this manner, the number of
passes over the tree can be reduced. The memory requirements for attributes in
the syntax tree are optimized by various algorithms. After the attribute
evaluation, the decorated syntax tree is passed to a user program which generates the
target code.
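What distinguishes this approach from the one-pass systems is that the whole tree exists before any attribute is computed. The following C sketch separates the two phases for the productions r2 and r3 of the GAG example below. It does not reproduce GAG's ordered evaluation strategy or its memory optimizations; the node layout and the functions Build, Evaluate and FreeTree are hypothetical.

```c
#include <stdlib.h>

/* Node of the syntax tree for  Digitlist ::= digit | Digitlist digit. */
typedef struct Node {
    struct Node *list;   /* left Digitlist child, NULL for rule r2 */
    int digit;           /* value of the digit terminal */
    int value;           /* attribute Digitlist.value, set by Evaluate */
} Node;

/* Phase 1: build the tree for the first n digits of s (n >= 1). */
static Node *Build(const char *s, int n) {
    Node *node = malloc(sizeof *node);
    if (node == NULL) abort();
    node->list = n > 1 ? Build(s, n - 1) : NULL;
    node->digit = s[n - 1] - '0';
    node->value = 0;                 /* not yet evaluated */
    return node;
}

/* Phase 2: decorate the finished tree. */
static void Evaluate(Node *node) {
    if (node->list == NULL) {
        node->value = node->digit;                               /* r2 */
    } else {
        Evaluate(node->list);
        node->value = 10 * node->list->value + node->digit;      /* r3 */
    }
}

static void FreeTree(Node *node) {
    while (node != NULL) {
        Node *next = node->list;
        free(node);
        node = next;
    }
}
```

Keeping the tree costs memory proportional to the input, which is the price paid for being able to evaluate attributes in an order other than strictly left to right.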
Implementation
GAG is implemented in Standard Pascal under Unix BSD 4.2 on a Siemens
7.760 computer. It also generates compilers in Standard Pascal. Compilers for
Pearl, LIS, Pascal, and Ada have already been produced by GAG.
4.3 Example Attributed grammar as input for GAG
% symbol and attribute declarations
TERM digit value: INT SYNT;
% value is a synthesized integer attribute
NONTERM Number
NONTERM Digitlist value: INT SYNT;
% rules
RULE r1:
Number ::= ["-"] Digitlist
STATIC
  Number.value :=
    IF "-" IS THERE
    THEN -Digitlist.value
    ELSE Digitlist.value
    FI
  % No output of the attribute Number.value.
  % The attributed tree is passed to a user written program,
  % which prints the results.
END;
%
RULE r2:
Digitlist ::= digit
STATIC
  Digitlist.value := digit.value
END;
%
RULE r3:
Digitlist ::= Digitlist digit
STATIC
  Digitlist[1].value := 10*Digitlist[2].value + digit.value
CONDITION
  (Digitlist[2].value < 3276) OR
  ((Digitlist[2].value = 3276) AND (digit.value < 8))
MESSAGE "Constant value too big"
END;
4.4 MUG - modular compiler generator
Origin and scope
MUG (Modularer Übersetzer-Generator) was developed in 1985 at the
University of Dortmund (Germany) by Ganzinger and Vach. It processes so-
called one-sweep grammars (Engelfriet and File [1981]). MUG supports all
phases of semantic analysis (attribute processing, optimization, and code
generation). However, it does not produce a scanner or a parser. Those can be
generated with YACC and then attached to the MUG system. Semantic
modules are written in Modula-2.
The underlying principles of MUG are substantially different from
traditional attributed grammars. Terminals are viewed as the types of some
semantic objects (so-called semantic sorts), nonterminals are viewed as the
types of syntax trees (so-called syntactic sorts). Productions are therefore
viewed as functions, mapping objects of syntactic and semantic sorts into
syntax trees which are themselves elements of syntactic sorts.
The translation of trees of an input grammar into trees of an output
grammar is called an attribute coupling of the two grammars. Attributes can
be classified as semantic attributes, which contain semantic values (and
therefore, like the values of terminal symbols, are objects of semantic sorts)
and syntactic attributes, which represent subtrees of the output grammar (and
thus are objects of syntactic sorts). Semantic attributes are computed in
semantic rules, whereas syntactic attributes are built by applying productions
of the output grammar. Semantic attributes can also be viewed as 'terminal
symbols' of the output grammar.
As a result of this view, several attribute coupling processes can be
concatenated so that the output grammar of the first coupling becomes the
input grammar of the second one. As an option, MUG can automatically
combine the two attribute couplings into a single one. The user can therefore
describe complex translation processes as a sequence of simple translations
(e.g. L-attributed grammars), which the system - hidden from the user -
combines into a single attributed grammar that does not need to be L-attributed. In
this manner, readability is balanced with efficiency.
Description language
MUG uses one description language for all translation phases. It is based on
Modula-2. The production
Prod1: A -> B c
is written in a function-like manner as
CONSTRUCTOR Prod1(btree: B; cval: c): A
An attribute a of a nonterminal S is written as S^a. All nonterminals must
be declared together with their attributes and attribute types. For semantic
sorts, the user must write Modula-2 modules that export them as types unless
they are standard types of Modula-2. There must be separate modules for the
input grammar, the output grammar, and their attribute coupling. Semantic
rules can contain assignments with arbitrary Modula-2 expressions, function
calls, and if expressions. Syntactic attributes are calculated through
constructors of the output grammar. Context conditions have no construct of their
own. They must be specified within semantic functions.
Attribute processing
The attribute processor generated by MUG uses the 'one-sweep' method,
which is an L-attributed processing of the syntax tree in which the children
of each node may first have been brought into a suitable order.
Implementation
MUG was implemented in Modula-2 on a CADMUS computer. It generates
compilers in Modula-2 and has been used for its own implementation.
4.4 Example Attributed grammar as input for MUG
SIGNATURE DEFINITION MODULE Numbers;
(*definition of the context-free input grammar*)
FROM Values IMPORT
  Value;          (*syntactic sort from the output grammar*)
FROM User IMPORT  (*semantic sorts (terminals)*)
  digit, minus;
SORT              (*syntactic sorts (nonterminals)*)
  Number, Digitlist;
(*rules of the context-free grammar*)
CONSTRUCTOR PosNumber(dl:Digitlist): Number;
CONSTRUCTOR NegNumber(m:minus; dl:Digitlist): Number;
CONSTRUCTOR SingleDigit(d:digit): Digitlist;
CONSTRUCTOR MoreDigits(dl:Digitlist; d:digit): Digitlist;
(*attribution function for the context-free grammar*)
OPERATOR Evaluate(n:Number): Value;
END Numbers.
SIGNATURE DEFINITION MODULE Values;
(*definition of the context-free output grammar*)
SORT Value;
CONSTRUCTOR Result(val:INTEGER): Value;
END Values.
ATTRIBUTATION MODULE Numbers;
(*attribute coupling of the above grammars*)
FROM Values IMPORT Value;
OPERATOR Evaluate(n:Number): Value;
  (*declaration of attributes*)
  ATTR Number SATTR nval: Value;
  ATTR Digitlist SATTR dval: INTEGER;
  (*attributations of the productions*)
  CONSTRUCTOR PosNumber(dl:Digitlist): Number;
  BEGIN
    PosNumber^nval = Result(dl^dval);
    (*the constructor "Result" builds a
      syntactical attribute of type "Value"*)
  END PosNumber;
  CONSTRUCTOR NegNumber(m:minus; dl:Digitlist): Number;
  BEGIN
    NegNumber^nval = Result(-dl^dval);
  END NegNumber;
  CONSTRUCTOR SingleDigit(d:digit): Digitlist;
  BEGIN
    SingleDigit^dval = d;
  END SingleDigit;
  CONSTRUCTOR MoreDigits(dl:Digitlist; d:digit): Digitlist;
  BEGIN
    MoreDigits^dval = 10 * dl^dval + d;
  END MoreDigits;
END Evaluate;
END Numbers.
4.5 Coco - compiler compiler
Origin and scope
Coco arose in 1983 at the University of Linz as a successor of a parser-
generator. It processes L-attributed grammars, which are viewed as procedural
descriptions of a translation process. The compiler description is translated
into an LL(1) parser with automatic error recovery and a semantic evaluator to
which user modules can be attached. The user must further supply a main
program and a scanner (for which there is a scanner generator). It is possible
to generate multi-pass compilers with Coco.
Description language
The compiler description language Cocol is based on context-free grammars
in Wirth's EBNF notation. All terminals and nonterminals must be declared.
Each syntax symbol can have one or more attributes. A symbol S with an
output attribute a is written as S<out:a> wherever it occurs within a
production. Semantic actions are written directly in Modula-2. They may appear
at arbitrary points on the right-hand side of the productions. Attributes can be
accessed like normal variables. Context conditions are written as if statements
in semantic actions.
Attribute processing
Semantic evaluation takes place during the syntax analysis. A syntax tree of
the input is not built. Productions are processed strictly from left to right.
When a semantic action is encountered, it is executed immediately. Attribute
values of terminals are returned by the scanner, those of nonterminals are
passed using assignments generated by Coco.
Implementation
Coco is implemented in Modula-2 on various microcomputers including
Macintosh, IBM-PC, Atari, and Lilith. It is also available on IBM
mainframes. Coco generates compilers in Modula-2. It has been used for the
construction of a multi-pass Modula-2 compiler and for the generation of
several tools for static program analysis.
4.5 Example Attributed grammar as input for Coco
GRAMMAR Number
SEMANTIC DECLARATIONS
  FROM InOut IMPORT WriteString, WriteInt;
  VAR value, value1: INTEGER;
TERMINALS
  digit <out:value>
NONTERMINALS
  Number
  Digitlist <out:value>
RULES
  Number =
      Digitlist<out:value>  sem WriteInt(value,5); endsem
    | "-"
      Digitlist<out:value>  sem WriteInt(-value,5); endsem.
  Digitlist<out:value> =
    digit<out:value>
    { digit<out:value1>  sem IF (value<3276) OR
                                ((value=3276) AND (value1<8))
                             THEN value:=10*value+value1;
                             ELSE
                               value:=0;
                               WriteString("Constant too big");
                             END;
                         endsem
    }.
ENDGRAM
4.6 Summary
This short overview of some of the better known compiler compilers has
shown that many powerful systems with complex input languages exist for the
definition of many exotic special cases. Why then are these generators so
seldom used for practical applications? There are many reasons. The most
significant is the fact that automatically generated compilers are simply less
efficient than manually coded ones. According to Koskimies et al. [1982], a
Pascal compiler produced with HLP78 ran seven times slower and used three
times as much memory (for its code alone!) as a hand-written compiler.
However, efficiency is not the main goal of a compiler compiler. Often it
is more important that the compiler description be short, formal, and complete.
Then it can be used as a prototype of a compiler implementation for a new
language or to study the techniques of compiler construction as such.
Compiler description languages are sometimes not easy to read. In most
cases ordinary BNF is used for the syntax definition. Although concise and
elegant, this notation often looks unnatural because of the recursion needed to
express repetitions. Attributes usually appear only in semantic rules and not
with the grammar symbols. This makes the productions short, but the reader
must extract from the semantic rules those attributes which belong to a given
syntax symbol. In many cases, the semantic rules may only be attribute
assignments. Therefore, important parts of the actual translation must be
hidden in procedures. Having these difficulties to contend with may even
make the compiler compiler a burden rather than a help.
Finally, most compiler compilers require a lot of memory themselves. For
example, GAG required 4 megabytes of main memory for the generation of an
Ada compiler, and this amount of memory is not available on many
microcomputers.
We believe that a compiler compiler should be a tool which is easy to
understand and easy to use. Above all, its input language should be clear and
natural, but its availability (e.g. on microcomputers) and efficiency are equally
important. These were the considerations behind the development of Coco and
its input language Cocol.
Table 4.1 summarizes the main features of the described compiler
compilers.
Table 4.1 Properties of various compiler compilers
Developed in
  YACC:  1975
  HLP84: 1978/1984
  GAG:   1980
  MUG:   1985
  Coco:  1983
Class of context-free grammars
  YACC:  LALR(1)
  HLP84: LALR(1)
  GAG:   LALR(1)
  MUG:   LALR(1)
  Coco:  LL(1)
Class of attributed grammars
  YACC:  L-attributed grammars with a single synthesized attribute per symbol
  HLP84: L-attributed grammars
  GAG:   Ordered attributed grammars (evaluation order of attributes
         independent for every nonterminal)
  MUG:   'One-sweep' grammars (L-attributed grammars on possibly
         reordered syntax trees)
  Coco:  L-attributed grammars
Generated parts
  YACC:  Parser, semantic evaluator
  HLP84: Scanner, parser, attribute evaluator
  GAG:   Attribute evaluator
  MUG:   Attribute evaluator
  Coco:  Parser, semantic evaluator
Syntax tree built
  YACC:  No
  HLP84: No
  GAG:   Yes
  MUG:   Yes
  Coco:  No
Attribute evaluation order in the tree
  YACC:  -
  HLP84: In a single pass from left to right
  GAG:   Evaluation order results from the ordered attributed grammar
  MUG:   In a single 'sweep'
  Coco:  -
Syntax notation (example)
  YACC:  Digitlist: digit | Digitlist digit;
  HLP84: Digitlist = digit; Digitlist = Digitlist digit;
  GAG:   Digitlist ::= digit+
  MUG:   CONSTRUCTOR P1(dl: Digitlist; d: digit): Digitlist
  Coco:  Digitlist = digit {digit}.
Attribute notation (example)
  YACC:  $$, $1, $2, ...
  HLP84: Digitlist.value
  GAG:   Digitlist.value
  MUG:   Digitlist^value
  Coco:  Digitlist <out:value>
Semantic rules (actions)
  YACC:  Arbitrary C statements
  HLP84: Assignments, attribute expressions, function calls
  GAG:   Special attribution language: assignments, function calls,
         attribute expressions
  MUG:   Assignments, attribute expressions, function calls, constructors
  Coco:  Arbitrary Modula-2 statements
Context conditions
  YACC:  Embedded in semantic actions
  HLP84: Special construct
  GAG:   Special construct
  MUG:   Embedded in semantic actions
  Coco:  Embedded in semantic actions
Applied to which languages
  YACC:  C, APL, Ratfor, Pascal
  HLP84: Pascal
  GAG:   PEARL, Pascal, Ada
  MUG:   For its own implementation
  Coco:  Modula-2, software engineering tools
Implementation language
  YACC:  C
  HLP84: Pascal
  GAG:   Standard Pascal
  MUG:   Modula-2
  Coco:  Modula-2
Language of the generated compiler
  YACC:  C
  HLP84: Pascal
  GAG:   Standard Pascal
  MUG:   Modula-2
  Coco:  Modula-2
Host computer
  YACC:  On most Unix systems
  HLP84: Burroughs B7800
  GAG:   Siemens 7.760
  MUG:   Cadmus
  Coco:  Macintosh, IBM-PC, ...
5 The compiler description language Cocol
This chapter describes Cocol, the input language of the compiler generator
Coco. A Cocol text essentially consists of an attributed grammar and
declarations. From this description, Coco generates a parser and a semantic
evaluator. The user has to provide a main program, a scanner, an error
message module and semantic modules to get a complete compiler. Some of
these modules can be generated by tools or are standard modules that do not
depend on the language to be processed.
The attributed grammar consists of a context-free grammar as a
description of the compiler input and of semantic information as a description of how
this input is to be translated. When designing an attributed grammar one
usually starts with the context-free grammar and completes it step by step with
attributes, semantic actions and context conditions. Therefore this chapter is
arranged in two parts: the specification of Cocol as a syntax description
language and its specification as a semantic description language.
5.1 Lexical structure
A grammar description in Cocol consists of keywords, identifiers, strings,
numbers, comments and special characters.
Keywords
ALIAS          ENDSEM    MACROS         RULES
ANY            EPS       NONTERMINALS   SEM
DECLARATIONS   GRAMMAR   OUT            SEMANTIC
ENDGRAM        IN        PRAGMAS        TERMINALS
Keywords must be written with upper-case letters, except for the following
keywords, which may also be written with lower-case letters because they often
appear in a context where they are not to be emphasized:
alias   endsem   in    sem
any     eps      out
Identifiers
identifier = letter {letter | digit}.
Identifiers may be of arbitrary length. Case is significant.
Strings
string = quote {anybutquote} quote
       | apostrophe {anybutapostrophe} apostrophe.
quote means the character ", apostrophe means the character ', anybutquote
is any character except quote, and anybutapostrophe is any character except
apostrophe. Strings must not extend beyond line boundaries.
Numbers
number = digit {digit}.
Special characters
for the syntax description: | ( ) [ ] { } = .
for the semantic description: < > : , ;
Comments start with the string '--' and extend to the end of the line.
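The lexical rules above can be checked with a small C classifier. It is a hypothetical sketch, not Coco's scanner: it inspects one token at a time, does not separate keywords from identifiers, and treats a string that reaches the end of the line as an error. The enumeration TokenKind and the function NextToken are illustrative names.

```c
#include <ctype.h>
#include <string.h>

typedef enum { TK_IDENT, TK_NUMBER, TK_STRING, TK_COMMENT, TK_SPECIAL, TK_ERROR } TokenKind;

/* Classifies the token starting at s; *len receives its length
   (0 for TK_ERROR). */
static TokenKind NextToken(const char *s, size_t *len) {
    size_t n = 0;
    if (isalpha((unsigned char)s[0])) {              /* letter {letter | digit} */
        while (isalnum((unsigned char)s[n])) n++;
        *len = n; return TK_IDENT;
    }
    if (isdigit((unsigned char)s[0])) {              /* digit {digit} */
        while (isdigit((unsigned char)s[n])) n++;
        *len = n; return TK_NUMBER;
    }
    if (s[0] == '"' || s[0] == '\'') {               /* quote or apostrophe */
        n = 1;
        while (s[n] != s[0] && s[n] != '\n' && s[n] != '\0') n++;
        if (s[n] != s[0]) { *len = 0; return TK_ERROR; }   /* crosses a line */
        *len = n + 1; return TK_STRING;
    }
    if (s[0] == '-' && s[1] == '-') {                /* comment to end of line */
        while (s[n] != '\n' && s[n] != '\0') n++;
        *len = n; return TK_COMMENT;
    }
    if (s[0] != '\0' && strchr("|()[]{}=.<>:,;", s[0]) != NULL) {
        *len = 1; return TK_SPECIAL;
    }
    *len = 0; return TK_ERROR;
}
```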
5.2 Cocol as a syntax description language
The kernel of a Cocol text is the syntactic description of the language that the
generated compiler is to process.
Grammar = "GRAMMAR" identifier
SyntaxDeclarations
Productions
"ENDGRAM".
The syntax description consists of declarations for terminals and nonterminals
and of the context-free grammar. The identifier following the keyword
GRAMMAR is the grammar name. It is the root symbol (start symbol) of the
grammar and must be declared as a nonterminal. We start with the productions
and continue with the declarations later.
5.2.1 Productions
The productions of the context-free grammar are written in an EBNF
suggested by Wirth [1982] (square brackets enclose optional expressions,
curly brackets denote repetition zero or more times).
Productions = "RULES" {Production}.
Production = identifier "=" Expression ".".
Expression = Term {"|" Term}.
Term = Factor {Factor}.
Factor = Symbol
       | "(" Expression ")"
       | "[" Expression "]"
       | "{" Expression "}"
       | "eps"
       | "any".
Symbol = identifier | string.
5.1 Example Cocol grammar for real constants
RULES
Real = Integer "." [Integer] [Exponent].
Integer = digit {digit}.
Exponent = "E" ["+" | "-"] Integer.
The symbols Real, Integer and Exponent are nonterminals. The
symbols digit, "E", ".", "+" and "-" are terminals (they have no
productions).
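To see how such productions are processed top-down, the grammar of Example 5.1 can be turned into a recursive-descent recognizer with one procedure per nonterminal. This is a hand-written Python sketch, not code generated by Coco (whose parsers interpret G-code tables); the class name RealRecognizer and the helper is_real are hypothetical, and the characters of the input play the role of the terminals.

```python
class RealRecognizer:
    """One method per nonterminal of Example 5.1 (illustrative sketch)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ok, what):
        if not ok:
            raise ValueError(f"{what} expected at position {self.pos}")
        self.pos += 1

    def real(self):                    # Real = Integer "." [Integer] [Exponent].
        self.integer()
        self.expect(self.peek() == ".", '"."')
        if self.peek().isdigit():      # optional Integer starts with a digit
            self.integer()
        if self.peek() == "E":         # optional Exponent starts with "E"
            self.exponent()

    def integer(self):                 # Integer = digit {digit}.
        self.expect(self.peek().isdigit(), "digit")
        while self.peek().isdigit():
            self.pos += 1

    def exponent(self):                # Exponent = "E" ["+" | "-"] Integer.
        self.expect(self.peek() == "E", '"E"')
        if self.peek() in ("+", "-"):
            self.pos += 1
        self.integer()


def is_real(text):
    """True if the whole text is a real constant according to Example 5.1."""
    r = RealRecognizer(text)
    try:
        r.real()
    except ValueError:
        return False
    return r.pos == len(text)          # all input must be consumed
```

For instance, "1.5E-3" and "2.E10" are accepted, while ".5" and "3" are rejected because Real requires an Integer followed by ".".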
eps
The symbol eps denotes the empty string (see Section 2.1) and is used to
describe empty alternatives.
5.2 Example The use of eps
sign = "+" | "-" | eps. is equivalent to sign = ["+" | "-"].
eps is not necessarily needed for the syntax description, but it is required if
one has to attach semantic actions to empty alternatives.
any
The symbol any denotes any terminal that does not start another alternative
of the alternative chain to which the any symbol belongs. Thus any represents
a whole set of terminals, namely all terminals that cannot be recognized in
its place at that point in the grammar.
5.3 Example The use of any
Option = "$" any.
Here, any means any terminal.
Token = keyword | identifier | number | any.
Here, any means any terminal except keyword, identifier or number
(which may be recognized instead of it).
String = '"' {any} '"'.
Here, any means any terminal except '"' (which may be recognized
instead of it).
Properties of a correct grammar
Coco generates a compiler only if the grammar is:
1. complete: there must exist a rule for every nonterminal;
2. free of redundancy: every nonterminal must occur in at least one
derivation of the root symbol;
3. free of cycles: there must not be a nonterminal which can be derived
from itself in one or more steps;
4. terminating: every nonterminal must be able to produce a string of
terminals;
5. unambiguous: the grammar must be LL(1).
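Tests 1, 2 and 4 can be sketched as simple set computations. The Python function below is a hypothetical illustration, not Coco's implementation (see Section 7.5): it represents a grammar as a dict from each nonterminal to a list of alternatives, assumes that EBNF options and repetitions have already been expanded into plain alternatives, and leaves out the cycle test (3) and the LL(1) test (5).

```python
def check_grammar(root, nonterminals, rules):
    """Return error messages for tests 1, 2 and 4 (hypothetical sketch).

    nonterminals: set of declared nonterminal names
    rules: dict nonterminal -> list of alternatives (lists of symbols);
           symbols that are not nonterminals count as terminals
    """
    errors = []
    # 1. complete: there must be a rule for every nonterminal
    for nt in sorted(nonterminals - set(rules)):
        errors.append(f"{nt}: no rule")
    # 2. free of redundancy: every nonterminal must be reachable from the root
    reached, todo = {root}, [root]
    while todo:
        for alt in rules.get(todo.pop(), []):
            for sym in alt:
                if sym in nonterminals and sym not in reached:
                    reached.add(sym)
                    todo.append(sym)
    for nt in sorted(nonterminals - reached):
        errors.append(f"{nt}: not reachable from {root}")
    # 4. terminating: grow the set of nonterminals known to derive terminals
    terminating, changed = set(), True
    while changed:
        changed = False
        for nt, alts in rules.items():
            if nt not in terminating and any(
                all(sym not in nonterminals or sym in terminating for sym in alt)
                for alt in alts
            ):
                terminating.add(nt)
                changed = True
    for nt in sorted(nonterminals - terminating):
        errors.append(f"{nt}: cannot derive a terminal string")
    return errors
```

A nonterminal without a rule is reported by test 1 and, consequently, also by test 4, since it cannot derive anything.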
LL(1) conflicts do not necessarily mean serious errors. They can be viewed as
warnings in situations where the generated compiler will take the first
matching alternative and ignore the others. Sometimes this is what the user
wants, as in the well-known case of the dangling else.
5.4 Example How the compiler treats LL(1) conflicts
This is the grammar of the dangling else:
Statement = ... | IfStatement | ... .
IfStatement = "IF" Expr "THEN" Statement ["ELSE" Statement].
When analyzing the string
IF a THEN IF b THEN c ELSE d
it is not clear whether the else clause belongs to the inner or to the outer
if. During parsing, the first matching alternative is the else of the inner
if. The generated compiler takes this alternative.
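The resolution of the dangling else can be mimicked by a small recursive-descent parser. In this hypothetical Python sketch (parse_statement and parse_if are illustrative names), Expr is simplified to a single identifier and any token other than IF counts as a simple statement; the optional ELSE part is taken whenever ELSE is the next token, which is exactly the first-matching-alternative rule.

```python
def parse_statement(tokens, pos):
    """Statement = IfStatement | simple.  Returns (tree, next position)."""
    if tokens[pos] == "IF":
        return parse_if(tokens, pos)
    return tokens[pos], pos + 1          # simplified: one token is a statement


def parse_if(tokens, pos):
    """IfStatement = "IF" Expr "THEN" Statement ["ELSE" Statement]."""
    cond = tokens[pos + 1]               # simplified: Expr is one identifier
    if tokens[pos + 2] != "THEN":
        raise ValueError("THEN expected")
    then_part, pos = parse_statement(tokens, pos + 3)
    else_part = None
    # LL(1) conflict: ELSE could start the option here or follow an outer IF.
    # Taking the first matching alternative binds ELSE to the innermost IF.
    if pos < len(tokens) and tokens[pos] == "ELSE":
        else_part, pos = parse_statement(tokens, pos + 1)
    return ("IF", cond, then_part, else_part), pos
```

Parsing "IF a THEN IF b THEN c ELSE d" yields ("IF", "a", ("IF", "b", "c", "d"), None): the else part ends up in the inner if.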
5.2.2 Declarations
All terminals and nonterminals must be declared before they can be used in
productions. Declarations have the following order:
SyntaxDeclarations = TerminalDeclarations
[PragmaDeclarations]
NonterminalDeclarations.
Terminal declarations
TerminalDeclarations = "TERMINALS" {Symbol [AliasName]}.
AliasName = "alias" Symbol.
Symbol = identifier | string.
Terminals are declared by enumerating them after the keyword TERMINALS.
Consecutive token numbers are assigned to them in the order of their
declaration. The first symbol gets the number 1, the next one the number 2,
and so on. If a symbol name contains a special character, it must be enclosed
in quotes (e.g. "+", "plus-symbol").
The end-of-file symbol must not be declared. It is always assumed to
have the token number 0. The lexical analyzer has to supply it as the last
symbol of the input text. When it arrives, the syntax analyzer automatically
interprets it as an indication that the input is exhausted. The end-of-file
symbol must not (and cannot) be specified in a production.
A symbol may be given an alias name, which is used in error messages
by the generated compiler. If the alias name is omitted, the symbol name is
used instead of it. Alias names allow the use of short names in the grammar
and of expressive names in error messages.
5.5 Example Terminal declarations
TERMINALS
id alias identifier
":=" alias "becomes symbol"
";" alias semicolon
Pragma declarations
Pragmas are a special feature of Cocol. They are neither terminals nor
nonterminals and must not be used in productions. They may occur at any
position in the input text and are read by the parser as if they were terminals,
but they do not belong to the syntax of the language (examples of pragmas are
options, the end-of-line symbol, and comments). Parsing is not influenced by
pragmas but they may carry semantic information (such as line numbers,
option values, etc.). Pragmas can be used to propagate information between
the passes of a multi-pass compiler.
PragmaDeclarations = "PRAGMAS" {Symbol}.
Symbol = identifier | string.
Pragmas are declared by enumerating them after the keyword PRAGMAS.
They are assigned consecutive token numbers, starting with the highest
terminal number plus one.
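The numbering rules for terminals, pragmas and the end-of-file symbol can be summarized in a few lines. This Python sketch is a hypothetical illustration (the name number_symbols and the display text "end of file" are assumptions); it also records which name an error message would use, preferring the alias when one is given. The usage at the end follows Examples 5.5 and 5.6.

```python
def number_symbols(terminals, pragmas):
    """Hypothetical helper assigning token numbers as described above.

    terminals: list of (name, alias or None) in declaration order
    pragmas:   list of names in declaration order
    Returns (numbers, display): name -> token number, and
    token number -> name used in error messages.
    """
    numbers = {}
    display = {0: "end of file"}           # never declared, always number 0
    for n, (name, alias) in enumerate(terminals, start=1):
        numbers[name] = n
        display[n] = alias or name         # alias preferred in error messages
    for n, name in enumerate(pragmas, start=len(terminals) + 1):
        numbers[name] = n                  # pragmas follow the last terminal
        display[n] = name
    return numbers, display


numbers, display = number_symbols(
    [("id", "identifier"), (":=", "becomes symbol"), (";", "semicolon")],
    ["end of line", "option"])
# numbers == {"id": 1, ":=": 2, ";": 3, "end of line": 4, "option": 5}
```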
5.6 Example Pragma declarations
PRAGMAS
"end of line"
option
The purpose of pragmas will become clear when we attach semantic actions to
them (see Example 5.11).
Nonterminal declarations
NonterminalDeclarations = "NONTERMINALS" {identifier [AliasName]}.
AliasName = "alias" Symbol.
Symbol = identifier | string.
Nonterminals are declared by enumerating them after the keyword
NONTERMINALS. Their declaration order is insignificant. Nonterminals can be
given an alias name too. The root symbol (grammar name) must also be
declared as a nonterminal.
5.7 Example Nonterminal declarations
NONTERMINALS
Stat alias Statement
Expr alias Expression
5.3 Cocol as a semantic description language
The semantics of a translation are specified by attaching semantic actions,
attributes and semantic declarations to the syntax description. The following
grammar of Cocol shows that there are only a few locations where semantic
parts (SemanticDeclarations, Attributes and SemAction) have to be added to a
syntax description in order to get an attributed grammar.
CocolText               = "GRAMMAR" identifier
                          SemanticDeclarations
                          SyntaxDeclarations
                          Productions
                          "ENDGRAM".
SyntaxDeclarations      = TerminalDeclarations
                          [PragmaDeclarations]
                          NonterminalDeclarations.
TerminalDeclarations    = "TERMINALS"
                          {Symbol [Attributes] [AliasName]}.
PragmaDeclarations      = "PRAGMAS"
                          {Symbol [Attributes] [SemAction]}.
NonterminalDeclarations = "NONTERMINALS"
                          {identifier [Attributes] [AliasName]}.
AliasName               = "ALIAS" Symbol.
Productions             = "RULES" {Production}.
Production              = identifier [Attributes] "=" Expression ".".
Expression              = Term {"|" Term}.
Term                    = Factor {Factor}.
Factor                  = Symbol [Attributes]
                        | "(" Expression ")"
                        | "[" Expression "]"
                        | "{" Expression "}"
                        | SemAction
                        | "eps"
                        | "any".
Symbol                  = identifier | string.
5.3.1 Semantic actions
A semantic action is a statement sequence on the right-hand side of a
production, which is executed after the symbol to the left of it has been recognized
and before the symbol to the right of it will be recognized. Semantic actions
may be written in any algorithmic programming language (in our Coco
implementation this language is Modula-2). There are two kinds of semantic
actions.
SemAction = SimpleAction | SemMacroCall.
Simple semantic actions
SimpleAction = "sem" {any} "endsem".
A semantic action is enclosed by the keywords sem and endsem. Between
them, any statements such as assignments, procedure calls, conditional
statements and loops are allowed. The syntactical correctness of the statements
is not checked by Coco.
5.8 Example Semantic actions
We want to have a compiler which counts the words in a text. The
context-free grammar is
Text = {Word}.
Now we add semantic actions.
Text = sem count:=0 endsem
       {Word sem count:=count+1 endsem}
       sem IF count>0 THEN
             WriteCard(count,3); WriteString(" words")
           END
       endsem.
Since the intermixed syntactic and semantic parts are hard to read, we
separate them into two columns:
Text =          sem count:=0 endsem
  {Word         sem count:=count+1 endsem
  }             sem IF count>0 THEN
                      WriteCard(count,3); WriteString(" words")
                    END
                endsem.
Syntactic and semantic parts are separated clearly now. The production
must be read line by line from the left to the right.
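The positions at which the actions run can be made explicit by hand-translating the production into a loop. In this hypothetical Python sketch, text.split() stands in for a scanner that delivers one Word per blank-separated item, and the output is returned as a string instead of being written with WriteCard and WriteString; in a compiler generated by Coco the actions would be called from the semantic evaluator instead.

```python
def count_words(text):
    """Hand translation of the Text production with its semantic actions."""
    output = []
    count = 0                              # sem count:=0 endsem (before {Word})
    for _word in text.split():             # each recognized Word ...
        count += 1                         # ... is followed by count:=count+1
    if count > 0:                          # final action after the repetition
        output.append(f"{count:3d}")       # WriteCard(count,3)
        output.append(" words")            # WriteString(" words")
    return "".join(output)
```

For the input "the quick brown fox" the result is "  4 words"; an empty text produces no output at all.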
The parameters of procedure calls in semantic actions may be marked as
input, output or transient parameters by writing the characters '↓', '↑' or '↕'
in front of them ('!', '^' and '!^' on an ASCII keyboard). This is a simple
way to make procedure calls more readable. In the resulting compiler these
marks are removed.
5.9 Example Indication of data flow at parameters
ComputeValues(↓argument1,↓argument2,↑result);
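Removing the marks is a purely textual step. The following Python sketch (strip_flow_marks is a hypothetical name) deletes an arrow that directly follows an opening parenthesis or a comma, possibly after blanks; it handles only the arrow characters, not the ASCII substitutes, and it is not taken from Coco's implementation.

```python
import re

# An arrow directly after "(" or "," (blanks allowed in between).
_MARK = re.compile(r"([(,]\s*)[↓↑↕]")


def strip_flow_marks(action):
    """Remove ↓ (input), ↑ (output) and ↕ (transient) parameter marks."""
    return _MARK.sub(r"\1", action)
```

Applied to Example 5.9, strip_flow_marks turns the call into ComputeValues(argument1,argument2,result);.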
Semantic macros
Sometimes a semantic action is needed at more than one location in a
grammar. To avoid rewriting the action, the user can define a macro for it
and call it wherever it is needed.
SemMacroDefinition = "sem" ":" MacroName ":" {any} "endsem".
SemMacroCall = "sem" "(" MacroName ")" "endsem".
MacroName = identifier.
A macro definition is a semantic action headed by a macro name which is
enclosed in colons. It must be given in a special section of the semantic
declarations (see Section 5.3.4). Note: The use of semantic macros also
reduces the code size of the resulting compiler.
5.10 Example Semantic macros
Suppose that the last semantic action of Example 5.8 is needed more than once.
The action is defined as a macro in the semantic declarations as follows
(see Section 5.3.4):
MACROS
sem :WriteCounter:
IF count>0 THEN
WriteCard(count,3); WriteString(" words")
END
endsem
It may then be called by writing
sem (WriteCounter) endsem
Semantic actions for pragmas
A semantic action may be associated with the declaration of a pragma. This
means that the action is executed every time the parser reads the pragma. In
this way a pragma can cause the execution of a semantic action although it
does not occur in any production.
5.11 Example Semantic actions for pragmas
PRAGMAS
eolsy sem PrintLineInfo;  -- call a semantic procedure
          Emit(↓eol)      -- write pragma to next interpass file
      endsem
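The effect of attaching actions to pragmas can be pictured as a filter between the lexical analyzer and the parser (Section 6.3 describes such a filter procedure). In this hypothetical Python sketch, tokens are (typ, attr) pairs and pragma_actions maps a pragma's token number to a callable that plays the role of its semantic action; terminals pass through unchanged.

```python
def filter_pragmas(tokens, pragma_actions):
    """Yield only terminals; run the semantic action of each pragma.

    tokens: iterable of (typ, attr) pairs from the lexical analyzer
    pragma_actions: dict pragma token number -> callable(attr)
    """
    for typ, attr in tokens:
        action = pragma_actions.get(typ)
        if action is None:
            yield typ, attr              # terminal: hand it to the parser
        else:
            action(attr)                 # pragma: process it, then drop it
```

Because the parser only ever sees the filtered stream, pragmas can occur anywhere in the input without appearing in any production.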
5.3.2 Attributes
Attributes describe semantic properties of symbols and their context.
Attributes    = "<" OutAttributes ">"
              | "<" InAttributes [";" OutAttributes] ">".
InAttributes  = "in" ":" InAttr {"," InAttr}.
OutAttributes = "out" ":" OutAttr {"," OutAttr}.
InAttr        = identifier | number.
OutAttr       = identifier.
In Cocol, attributes play the role of parameters of the grammar symbols. They
are classified into input attributes, which are passed to a nonterminal for its
recognition, and output attributes, which arise during the recognition of a
symbol.
We also distinguish between formal and actual attributes. Formal
attributes occur in the declaration of a symbol or are attached to nonterminals on
the left-hand side of a production. Actual attributes are attached to symbols on
the right-hand side of a production.
5.12 Example Attributes
NONTERMINALS
  Variable <in:type; out:object>   -- type: formal input attribute
                                   -- object: formal output attribute
RULES
  Variable <in:type; out:object>   -- type: formal input attribute
                                   -- object: formal output attribute
  Declaration
    = Variable <in:tp; out:obj>    -- tp: actual input attribute
                                   -- obj: actual output attribute
Attribute names may be used like variables in semantic actions.
Attributes of nonterminals
Nonterminals may have input and output attributes of arbitrary types. The type
of an attribute is declared like the type of any other variable (see Section
5.3.4). Formal and actual attributes must be assignment compatible in the
sense of Modula-2, although this is not checked by Coco.
Whenever a nonterminal occurs, all its attributes must follow it. Formal
and actual attributes must correspond in number, sequence, and kind (in or
out). A numeric constant may only be specified as an actual input attribute.
Attribute evaluation is similar to parameter passing in procedures: before
the recognition of a nonterminal is started, the values of the actual input
attributes of the nonterminal are assigned to its formal input attributes; when
the nonterminal has been recognized, the formal output attribute values are
assigned to its actual output attributes.
Attributes of terminals and pragmas
Terminals and pragmas may have only output attributes. For implementation
reasons their size is restricted to word size. This restriction can be
circumvented by using abstract data types for longer attributes.
Whenever a terminal or a pragma occurs, all its attributes must follow it.
For terminals, the names of the formal attributes are insignificant, but for
pragmas they are significant as they may be used in a semantic action.
Pragmas don't have actual attributes since they cannot appear on the right-hand
side of a production. The attribute values of terminals and pragmas are
supplied by the scanner (see Section 6.4.2).
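Since attribute evaluation works like parameter passing, an input attribute can be pictured as a value parameter and an output attribute as a result. The Python sketch below uses a small hypothetical grammar, not taken from the book, in which Number passes the numeric constant 10 or 16 as actual input attribute to Digits and receives val as output attribute; the class name NumberParser is likewise an assumption.

```python
class NumberParser:
    """Hypothetical grammar used to illustrate attribute passing:
         Number<out:val> = Digits<in:10; out:val>
                         | "$" Digits<in:16; out:val>.
         Digits<in:base; out:val> = digit {digit}.
    In-attributes become parameters, out-attributes become return values.
    """

    def __init__(self, text):
        self.text, self.pos = text, 0

    def number(self):                      # Number<out:val>
        if self.text.startswith("$", self.pos):
            self.pos += 1
            return self.digits(16)         # actual input attribute: constant 16
        return self.digits(10)             # actual input attribute: constant 10

    def digits(self, base):                # Digits<in:base; out:val>
        val, start = 0, self.pos
        while self.pos < len(self.text):
            d = "0123456789ABCDEF".find(self.text[self.pos])
            if d < 0 or d >= base:         # not a digit in this base
                break
            val = val * base + d
            self.pos += 1
        if self.pos == start:
            raise ValueError("digit expected")
        return val                         # formal out-attribute -> caller's actual
```

Here NumberParser("$FF").number() and NumberParser("255").number() both return 255.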
5.3.3 Context conditions
There is no special language construct for context conditions in Cocol. They
are written as conditional statements in semantic actions. This has the
drawback of hiding them somewhat but has the advantage that arbitrary error
actions can be associated with them.
5.13 Example Context conditions
sem IF type1=type2   -- context condition
    THEN ...         -- semantic action
    ELSE ...         -- error action
END
endsem
5.3.4 Semantic declarations
All variables, procedures and named constants that are used as attributes or in
semantic actions must be declared. The compiler description can be viewed as
a module to which these objects are local. The user may also import objects
from other modules.
SemanticDeclarations = [ObjectDeclarations]
[SemMacroDeclarations].
Declarations of semantic objects
ObjectDeclarations = "SEMANTIC" "DECLARATIONS" modulatext.
modulatext is an arbitrary text of import statements, constant, type, variable,
or procedure declarations in Modula-2. The syntax of this text is not checked
by Coco.
5.14 Example Declarations of semantic objects
SEMANTIC DECLARATIONS
FROM InOut IMPORT WriteCard, WriteString;
FROM UserModule IMPORT UserProcedure;
CONST maxint = 32767;
VAR field: ARRAY[1..100] OF CHAR;
PROCEDURE Equal(x,y:ARRAY OF CHAR): BOOLEAN;
BEGIN ... END Equal;
Declaration of semantic macros
In this section the user may declare a set of semantic macros that
can be used in the productions.
SemMacroDeclarations = "MACROS" {SemMacroDefinition}.
SemMacroDefinition = "sem" ":" MacroName ":" {any} "endsem".
MacroName = identifier.
An example of the definition and the use of a semantic macro can be found in
Section 5.3.1 (Example 5.10).
5.3.5 Scope of semantic objects
For implementation reasons, the scope of a semantic object cannot be
restricted to a single production: all declared and imported objects are global to the
whole compiler description. This means that the value of a semantic object
may be destroyed by a nonterminal that is processed between the assignment
and the use of that object. One has to resort to the following remedies:
1. Naming conventions. Every production should use its own names for
those attributes and semantic objects which may be destroyed by another
production. This reduces the problem to semantic objects of recursive
nonterminals.
2. Stacking. All values which may be destroyed by a nonterminal should be
stacked before this nonterminal is entered and unstacked afterwards.
5.15 Example Stacking of semantic objects
Expression<out:exprval> =
Term<out:exprval>
  {"+"               sem Push(↓exprval) endsem
   Term<out:x>       sem Pop(↑exprval); exprval:=exprval+x endsem
  }.
Term<out:termval> =
  Factor<out:termval>
  {"*"               sem Push(↓termval) endsem
   Factor<out:x>     sem Pop(↑termval); termval:=termval*x endsem
  }.
Factor<out:factval> =
    integer<out:factval>
  | "(" Expression<out:factval> ")".
The original values of exprval and termval are destroyed by the recursive
calls to Term and Factor so they must be saved on a stack.
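The effect of global semantic objects, and why stacking helps, can be reproduced by hand-translating Example 5.15 into Python with module-level variables. The sketch below is an illustration under assumptions: the token pattern, the helper names peek, advance and evaluate, and the stacking flag (which switches the Push/Pop actions off for comparison) are not part of the original example.

```python
import re

# Global semantic objects: as in Coco, every production sees the same copy.
exprval = termval = factval = x = 0
stack = []
tokens, pos, use_stack = [], 0, True


def peek():
    return tokens[pos] if pos < len(tokens) else ""


def advance():
    global pos
    tok = peek()
    pos += 1
    return tok


def expression():                          # Expression<out:exprval> =
    global exprval, x
    term()
    exprval = termval                      #   Term<out:exprval>
    while peek() == "+":                   #   {"+"
        advance()
        if use_stack:
            stack.append(exprval)          #   sem Push(↓exprval) endsem
        term()
        x = termval                        #   Term<out:x>
        if use_stack:
            exprval = stack.pop()          #   sem Pop(↑exprval);
        exprval = exprval + x              #     exprval:=exprval+x endsem


def term():                                # Term<out:termval> =
    global termval, x
    factor()
    termval = factval                      #   Factor<out:termval>
    while peek() == "*":                   #   {"*"
        advance()
        if use_stack:
            stack.append(termval)          #   sem Push(↓termval) endsem
        factor()
        x = factval                        #   Factor<out:x>
        if use_stack:
            termval = stack.pop()          #   sem Pop(↑termval);
        termval = termval * x              #     termval:=termval*x endsem


def factor():                              # Factor<out:factval> =
    global factval
    tok = advance()
    if tok == "(":                         #   "(" Expression<out:factval> ")"
        expression()
        factval = exprval
        if advance() != ")":
            raise ValueError('")" expected')
    else:                                  #   | integer<out:factval>
        factval = int(tok)


def evaluate(text, stacking=True):
    global tokens, pos, use_stack
    tokens, pos, use_stack = re.findall(r"\d+|[+*()]", text), 0, stacking
    stack.clear()
    expression()
    return exprval
```

evaluate("2+3*(4+1)") returns 17. With stacking switched off, the expression inside the parentheses overwrites exprval and termval, so evaluate("2+3*(4+1)", stacking=False) returns 10 instead.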
6
The compiler compiler Coco
This chapter describes the compiler compiler Coco from the user's point of
view. It contains everything the user needs to know in order to produce a
compiler with Coco. Section 6.1 presents a survey of the main characteristics
of Coco, Section 6.2 describes the components of the generated compilers,
and Section 6.3 shows how these compilers work. Since Coco produces only
the basic parts of a compiler, the user must supply additional modules to get a
complete compiler. Section 6.4 describes the interfaces for these modules and
Section 6.5 shows how a multi-pass compiler can be produced with Coco.
6.1 Characteristics
Coco is a program which generates the basic parts of a compiler from a
compiler description that is supplied as its input. The characteristics of Coco
are:
1. The compiler definition language Cocol is easy to read and easy to learn.
It is based on L-attributed grammars whose syntax rules are written in
Wirth's EBNF notation, and whose semantic actions are coded directly in
Modula-2.
2. Coco and the compilers produced by it are small and efficient, since they
use simple analysis techniques (table-driven top-down parsing and L-
attributed grammars), and since the parser tables are encoded in a very
compact form (G-code). Therefore, they can be used efficiently on
microcomputers with a small memory and limited processor performance.
3. The generated compilers contain a syntax error-recovery algorithm that is
automatically derived from the attributed grammar. This frees the user
from developing individual error handlers for each target compiler.
4. The user can attach modules of his own to the generated compiler parts,
thus adapting the compiler to his particular needs.
5. The input grammar is checked for completeness, consistency, and
unambiguity.
6. Coco supports the production of multi-pass compilers for languages that
cannot be translated in a single pass, or that are so large that a single-pass
compiler will not fit into memory.
7. Coco offers the possibility of excluding selected source text portions from
syntax analysis. Thus, it is possible to describe complements of regular
languages, or to forward parts of the input from one pass to the next
without modification.
8. Besides terminals and nonterminals, Coco provides a third class of
symbols called pragmas. Pragmas are special terminals that can appear at
arbitrary positions in the input stream, but are not part of the syntax of the
language itself (e.g. end-of-line symbols or compiler options).
How to invoke Coco
The invocation of Coco and the naming of the files involved depend on the
computer on which Coco is running. We describe the version for the Apple
Macintosh. On the Macintosh, Coco is invoked by clicking its icon and by
selecting an input file from the open dialog box which shows all available text
files. Fig. 6.1 is a block diagram of a Coco run.
Fig. 6.1 Input and output files of Coco (inputs: the compiler description in
Cocol and the program frames; outputs: the source list, the syntax analyzer
and the semantic evaluator)
Coco reads a compiler description and produces the following:
1. a syntax analyzer as described in Section 2.5 together with parser tables
(G-code and symbol information);
2. a semantic evaluator as described in Section 3.6;
3. a source list of the Cocol input with any syntax and semantic error
messages, with the results of the grammar tests and with statistical data
about the grammar.
The syntax analyzer and the semantic evaluator are generated from program
frames on files. On the Macintosh, the generated parts are written to the
following files:
Syntax analyzer:    grammarnamesyn.DEF, grammarnamesyn.MOD
Semantic evaluator: grammarnamesem.DEF, grammarnamesem.MOD
Source list:        inputname.LST
grammarname is the grammar name specified in Cocol, inputname is the
name of the input file. Section 8.3 shows an example of these files.
6.2 Components of the generated compiler
In order to get a complete compiler, the user must attach his own modules to
the compiler parts produced by Coco. The following table shows which parts
are generated by Coco, which must be supplied by the user, and which are
available as standard modules.
Generated by Coco     User-supplied        Standard module
Syntax analyzer       Main program         Error message module
Semantic evaluator    Lexical analyzer
                      Semantic modules
Hence, Coco generates only the basic parts of a compiler (those which are
described by the attributed grammar). For flexibility, the remaining parts may
be written individually, although they are very similar in all compilers (see
program listings in Appendix F).
The lexical analyzer can be generated with the scanner generator Alex
(Mössenböck [1986]), which is a separate tool not described in this book. It
produces a scanner module in Modula-2 that fits the modules generated by
Coco exactly.
The semantic modules are written in Modula-2. Only a few conventions
have to be obeyed (see Section 6.4).
6.3 Operation of the generated compiler
Figure 6.2 shows the overall structure of a generated single-pass compiler.
The main program calls the syntax analyzer. The syntax analyzer parses the
source program by interpreting the G-code and executes semantic actions
contained in the semantic evaluator, which in turn call semantic procedures to
emit the target code. A filter procedure between the actual syntax analyzer and
the lexical analyzer filters any pragmas out of the input stream and processes
them semantically.
To create a multi-pass compiler, one must write a compiler description for
each pass separately and translate it with Coco. This results in a syntax
analyzer and a semantic evaluator for each pass. Figure 6.3 shows the
interaction of the generated parts in a two-pass compiler. The first pass reads
the source program, processes it and generates an intermediate language (IL).
The second pass reads the intermediate language, processes it again and
generates the target code.
Fig. 6.2 Overall structure of a generated single-pass compiler (the main
program calls the syntax analyzer, which reads the source text through the
lexical analyzer, calls the semantic evaluator and its semantic procedures to
produce the target code, and reports errors through the error message module)
Fig. 6.3 Overall structure of a generated two-pass compiler (the main program
calls syntax analyzer 1, which reads the source text through the lexical
analyzer and uses semantic evaluator 1 and semantic procedures 1 to produce
the IL; syntax analyzer 2 reads the IL and uses semantic evaluator 2 and
semantic procedures 2 to produce the target code)
6.4 Interfaces of the generated compiler
A compiler nucleus produced by Coco has four interfaces (shown in Fig.
6.4). It is called by the main module, reads the input stream, translates it into
an output stream, and produces error messages. This nucleus is the same for
all generated compilers. The user must attach some of his own modules to
these interfaces to adapt the compiler to his particular needs.
Fig. 6.4 Interfaces of a generated compiler (the syntax analyzer and the
semantic evaluator, connected to the operating system interface, the input
interface, the output interface and the error interface)
6.4.1 Caller interface
The main program must call the syntax analyzer of the generated compiler to
perform the syntax analysis and semantic processing of the input text. The
following definition module shows the interface between the syntax analyzer
and the main program.
DEFINITION MODULE grammarnamesyn;
VAR
printinput: BOOLEAN; (*trace the input?*)
printnodes: BOOLEAN; (*trace the parser?*)
PROCEDURE Parse(VAR correct:BOOLEAN);
END grammarnamesyn.
grammarnamesyn is the name of the generated syntax analyzer (the grammar
name from Cocol with the suffix syn). The procedure Parse is the actual
syntax analyzer. It must be called from the main program of the compiler.
Prior to this, the lexical analyzer (see Section 6.4.2) must be initialized and
ready to supply the first symbol. The parameter correct shows if syntax
errors have been found. The variables printinput and printnodes can be set to
TRUE in order to produce a trace of the syntax analysis for debugging.
6.4.2 Input interface
The syntax analyzer expects the input from a procedure GetSy which must be
supplied by the user in a module grammarnamelex (grammar name from
Cocol with the suffix lex). The corresponding definition module must look
like this:
DEFINITION MODULE grammarnamelex;
VAR
typ: CARDINAL; (*current symbol number*)
at: ARRAY[1..10] OF CHAR; (*attributes of the current symbol*)
line: CARDINAL; (*current symbol line number*)
col: CARDINAL; (*current symbol column number*)
PROCEDURE GetSy;
END grammarnamelex.
Every time the syntax analyzer needs a new terminal, it calls the procedure
GetSy which returns the symbol number, line number and column number of
the next source symbol in the global variables typ, line and col. It also fills
the array at. If a symbol has i attributes, then at[1..i] holds their values.
at is implicitly imported in any attributed grammar. It can hold at most 10
attributes, which experience has shown to be sufficient. If imported, typ,
line and col can be used in the attributed grammar to get the type and the
attributes of symbols that are recognized by the special symbol any.
The symbol numbers returned by GetSy must correspond to the
declaration sequence of the terminals and pragmas in the compiler description. The
first declared symbol must have the number 1, the next symbol must have 2
and so on. At the end of the input stream GetSy must return an end-of-file
symbol which by convention has the symbol number 0.
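A lexical analyzer that honors this interface can be sketched as follows. The Python class is a hypothetical analogue of a grammarnamelex module, not code from Coco or Alex: numbers maps each declared terminal to its token number, with the assumed keys "ident" and "number" for identifiers and integer literals, and line and column numbers start at 1.

```python
class Lexer:
    """Hypothetical Python analogue of a grammarnamelex module."""

    def __init__(self, text, numbers):
        self.text, self.numbers = text, numbers
        self.i, self.cur_line, self.cur_col = 0, 1, 1
        self.typ, self.at, self.line, self.col = 0, [], 0, 0

    def _next_char(self):
        ch = self.text[self.i]
        self.i += 1
        if ch == "\n":
            self.cur_line, self.cur_col = self.cur_line + 1, 1
        else:
            self.cur_col += 1
        return ch

    def _more(self, test):
        return self.i < len(self.text) and test(self.text[self.i])

    def get_sy(self):
        """Set typ, at, line and col for the next symbol (like GetSy)."""
        while self._more(str.isspace):
            self._next_char()
        self.line, self.col, self.at = self.cur_line, self.cur_col, []
        if self.i >= len(self.text):
            self.typ = 0                       # end-of-file symbol: number 0
            return
        ch = self._next_char()
        if ch.isalpha():                       # identifier, attribute = name
            name = ch
            while self._more(str.isalnum):
                name += self._next_char()
            self.typ, self.at = self.numbers["ident"], [name]
        elif ch.isdigit():                     # number, attribute = value
            digits = ch
            while self._more(str.isdigit):
                digits += self._next_char()
            self.typ, self.at = self.numbers["number"], [int(digits)]
        else:                                  # one-character terminal
            self.typ = self.numbers[ch]
```

The parser would call get_sy repeatedly and stop when typ becomes 0, exactly as the interface prescribes for the end-of-file symbol.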
6.4.3 Output interface
For the generation of object code and other compiler outputs the user is not
bound by any restrictions. One can arbitrarily attach one's own modules to the
compiler nucleus and call one's procedures from the semantic actions of the
attributed grammar.
Thus, the output interface is the interface to all user-supplied semantic
modules. It is described by the import clauses in the semantic declarations of
the compiler description and by the imported definition modules.
6.4.4 Syntax error interface
The syntax analyzer of the generated compiler automatically recovers from a
syntax error and gathers information about the cause of error. However, the
user must provide for the output of the error message by supplying a
procedure SyntaxError exported from a module Errors (see standard module
in Appendix F). This procedure is called by the syntax analyzer each time a
syntax error occurs. It can print the error message immediately or store it in
order to display all error messages together at the end of the compilation. The
definition module Errors must have the following form:
DEFINITION MODULE Errors;
TYPE Symbolname = ARRAY[1..25] OF CHAR;
     Errorptr = POINTER TO Errornode;
     Errornode = RECORD
       txt: Symbolname;  (*symbol name*)
       l: CARDINAL;      (*length of symbol name*)
       next: Errorptr;   (*to next symbol of the same message*)
     END;
PROCEDURE SyntaxError(symbols: Errorptr; line,col: CARDINAL);
END Errors.
SyntaxError has three parameters: symbols is a pointer to a linked list
containing the symbol at the error location followed by the symbols that are
expected there (if available, alias names are used in place of symbol names).
The parameters line and col indicate the line number and column number of
the error location.
Figure 6.5 shows a sample list of expected symbols pointed to by the
parameter symbols.
symbols -> colon -> semicolon -> END
Fig. 6.5 List of expected symbols: colon is the symbol causing the error;
semicolon or END have been expected instead
The first node of the list contains the symbol that caused the error (in this case
the colon); the subsequent nodes contain the symbols that were expected
instead of the erroneous symbol (in this case semicolon and END).
SyntaxError can now produce the following message:
Syntax error in line ... column ... near colon: semicolon or END expected
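A SyntaxError procedure that produces the message above only has to walk the list once. This Python sketch is hypothetical: ErrorNode mirrors Errornode without the length field l, since Python strings know their length, and the function returns the message instead of printing or storing it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorNode:
    """Mirrors Errornode (txt, next); the length field l is not needed."""
    txt: str
    next: Optional["ErrorNode"] = None


def syntax_error_message(symbols, line, col):
    """First node: the offending symbol; remaining nodes: expected symbols."""
    expected = []
    node = symbols.next
    while node is not None:
        expected.append(node.txt)
        node = node.next
    return (f"Syntax error in line {line} column {col} near {symbols.txt}: "
            f"{' or '.join(expected)} expected")
```

For the list of Fig. 6.5 at line 12, column 5, it returns "Syntax error in line 12 column 5 near colon: semicolon or END expected".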
6.5 Generation of multi-pass compilers
With L-attributed grammars, some languages can only be translated in multiple
passes. Some other languages are so complex that a single-pass compiler
would not fit into the memory of a microcomputer. For these reasons, a
compiler must often be split into several passes.
Each pass is a compiler of its own. It reads the source program, or an
intermediate language from which it produces a new intermediate language, or
the target program. If somebody wants to write a multi-pass compiler, he must
write a compiler description for each pass, and then put the produced compiler
passes in sequence (see Fig. 6.3). Cocol has features that are specially
designed for the generation of multi-pass compilers:
Input from an intermediate language. It is possible to read an
intermediate language file instead of a source text by simply supplying an
appropriate input procedure GetSy (see Section 6.4.2).
Pragmas serve mainly to pass control information from one pass to the
next in the intermediate language. Before they get to the syntax analyzer of the
next pass they are extracted from the input stream and processed semantically.
The symbol any. The grammar symbol any can be used to exclude parts
of the source text from the syntax analysis, and forward it unchanged to the
next pass.
6.1 Example Application of any
A typical application of the complement symbol any is to process
declarations in the first pass of a compiler and statements in the second
pass. The following example skips statements and forwards them to the
next pass:
Block =
  Declarations
  BEGINSY
  { any
  }
  ENDBLOCKSY.
Here, any denotes all terminal symbols except ENDBLOCKSY. It can
be semantically processed using the variables typ and at exported by the
lexical analyzer (see Section 6.4.2):
sem Copy(↓typ,↓line,↓col,↓at);   -- copy symbol to next
                                 -- intermediate language
endsem
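The forwarding of statements can be sketched with an explicit loop over the token stream. In this hypothetical Python version, tokens are (typ, line, col, at) tuples whose typ is a symbol name rather than a number, Declarations are merely collected instead of parsed, and a Python list stands in for the intermediate-language file written by Copy.

```python
def block(tokens, il):
    """Pass-1 sketch of  Block = Declarations BEGINSY {any} ENDBLOCKSY.

    tokens: list of (typ, line, col, at) tuples, typ given as a name here
    il:     list standing in for the intermediate-language file
    Returns (declaration tokens, position after ENDBLOCKSY).
    """
    pos, declarations = 0, []

    def current():
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        return tokens[pos]

    while current()[0] != "BEGINSY":       # Declarations (only collected here)
        declarations.append(current())
        pos += 1
    pos += 1                               # BEGINSY
    while current()[0] != "ENDBLOCKSY":    # {any}: every symbol but ENDBLOCKSY
        typ, line, col, at = current()
        il.append((typ, line, col, at))    # sem Copy(↓typ,↓line,↓col,↓at) endsem
        pos += 1
    return declarations, pos + 1           # skip ENDBLOCKSY
```

The statements are never parsed in this pass; the second pass reads them from the intermediate language, with their original line and column numbers preserved for error messages.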
7
The implementation
In this chapter we will show how Coco is structured and how it works. First
we provide an overview of its design (7.1). Then we describe the internal data
structures such as the symbol list (7.2) and the top-down graph (7.3), as well
as the collection of some sets of terminal symbols (7.4). Section 7.5 covers
various grammar tests which the top-down graph is subjected to before the
target compiler is generated. The last three sections cover the generation of the
compiler parts, namely the parser tables (7.6), the syntax analyzer (7.7), and
the semantic evaluator (7.8). Section 8.3 shows an example of the generated
compiler parts for a specific input grammar.
At the beginning of each section, a diagram is used to illustrate how this
section relates to the structure of Chapter 7.
Fig. 7.1 Structure of Chapter 7 (The implementation: structure of the symbol
list; structure of the top-down graph; collecting the symbol sets; grammar
tests; generation of the parser tables; generation of the syntax analyzer;
generation of the semantic evaluator)
We describe algorithms in an abstract manner, using Adele or Cocol.
Appendix F contains the concrete implementation of Coco. Details that are not
necessary for understanding the algorithms are omitted as they can be found in
the program listings.
Coco is written in Modula-2 and has been implemented on various micro-
computers including Macintosh, IBM-PC, Atari and Lilith. It produces
compilers in Modula-2 and was used for its own implementation, too. We
describe the implementation on the Macintosh.
7.1 Survey
Like any compiler, Coco is composed of an analysis part (front end) and a
synthesis part (back end). The analysis part consists of a lexical analyzer and
a syntax analyzer. The synthesis part consists of a semantic evaluator with
several semantic modules attached to it (Fig. 7.2).
Main program
  Syntax analyzer
    Lexical analyzer
    Semantic evaluator
      Symbol list handler
      Top-down graph handler
      Grammar tests
      Generation of the syntax analyzer
      Generation of the semantic evaluator
Fig. 7.2 Structure of Coco with its main tasks shown as semantic modules
From the above, the main tasks of Coco are:
1. handling a symbol list: symbol information (name, symbol number,
   attribute, scope, etc.) is stored;
2. handling a top-down graph: graph nodes are generated and linked to
   form subgraphs;
3. testing the grammar: the grammar is checked to see whether it is complete,
   noncircular, and LL(1). It is also checked to see whether all nonterminals
   can be reached and derived into terminal strings;
4. generating the syntax analyzer: the source code of the generated syntax
   analyzer is built from fixed frame parts and from variable parts derived from
   the compiler description. It includes LL(1) parser tables generated from
   the attributed grammar;
5. generating the semantic evaluator: the source code of the semantic
   evaluator is built from fixed frame parts and from semantic actions and
   declarations copied from the compiler description.
The main algorithm of Coco is as follows:
Coco:
  Initialize lexical analyzer;
  Parse(↑ok);                                — Section 2.4
  if ok then
    Find deletable symbols;                  — Section 7.4.1
    Insert eps-nodes before deletable nt's;  — Section 7.3.3
    Delete redundant eps-nodes;              — Section 7.3.4
    Get symbol sets;                         — Section 7.4
    Test grammar(↑ok);                       — Section 7.5
  end;
  if ok
  then Generate compiler;                    — Sections 7.6 and 7.7
  else Print error message;
  end;
end Coco;
The procedure Parse parses the input text and calls the semantic actions for
the construction of the top-down graph and the symbol list as well as for the
generation of the semantic evaluator. After some tests and transformations of
the data structures, the target compiler is produced.
7.2 Structure of the symbol list
Coco handles a symbol list with information about terminals, nonterminals,
and pragmas. This section describes its representation and shows how it is
filled.
7.2.1 Symbol list representation
The symbol list is a linear list of symbol nodes, each of them describing one
syntax symbol. The list is indexed by symbol numbers.
TYPE
  Symboltype = (eps,t,pr,nt,any,err);
               (*eps, terminal, pragma, nonterminal, any, error symbol*)
  Symbolnode = RECORD
    spix: CARDINAL;          (*spelling index of symbol name*)
    aliasspix: CARDINAL;     (*spelling index of alias name*)
    nra: CARDINAL;           (*number of attributes*)
    CASE typ: Symboltype OF  (*symbol kind*)
      t,eps,any:             (*nothing*)
    | pr:
      sem1,sem2: CARDINAL;   (*pragma semantics*)
    | nt,err:
      start: CARDINAL;       (*start of top-down graph*)
      del: BOOLEAN;          (*TRUE if deletable*)
      firstat: Attributeptr; (*to first formal attribute*)
    END;
  END;
  Symbollist = ARRAY[0..maxsymbol] OF Symbolnode;
Fig. 7.3 Structure of Section 7.2 (structure of the symbol list: symbol list representation; symbol list construction)
The fields spix, aliasspix, nra, and typ are filled when the symbol is
declared. For terminals, this is the only information stored in the symbol list.
The node of a pragma has two additional fields denoting the semantic
actions which the generated compiler has to execute when it reads this pragma.
The first action is for the output attribute assignments (Section 7.8.4), the
second is the semantic action associated with this pragma in Cocol. If no
actions are to be executed, both fields are zero. The fields are filled when the
pragma is declared.
Nonterminal nodes contain additional information. The field start points
to the root of the top-down graph of this specific nonterminal. It is set when
the corresponding rule has been processed. At the same time, the field del is
set, which indicates whether the nonterminal is directly deletable, i.e. whether it can
be immediately derived into the empty string. The indirect deletability of a
nonterminal can only be determined when the top-down graphs of all
nonterminals have been built (see Section 7.4.1). Finally, nonterminal nodes
have a field firstat pointing to a list of formal attributes. This list contains
the name and direction (input or output) of each attribute of the nonterminal. The
attribute list is built when the nonterminal is declared. It is implemented as
follows:
TYPE
Direction = (up,down); (*attribute direction*)
Attributeptr = POINTER TO Attribute;
Attribute = RECORD
spix: CARDINAL; (*attribute name*)
dir: Direction; (*up,down*)
next: Attributeptr; (*to next attribute of same nt*)
END;
Names of symbols and attributes are not stored in the symbol list directly.
Rather, they are stored in a name list, which is an array of characters. Instead
of the actual names, the symbol list contains only their address in the name list,
called spix (spelling index). The lexical analyzer maintains a hashed list of
spixes for fast searching of names.
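A minimal Python sketch may make the idea of spelling indices concrete. It is an illustration under stated assumptions, not Coco's code: the class NameList, its methods enter and spelling, the NUL terminator after each name, and the dictionary that stands in for the lexical analyzer's hashed list of spixes are all hypothetical. Position 0 is reserved so that spix 0 never denotes a real name.

```python
class NameList:
    """Hypothetical name list: names are addressed by spelling index (spix)."""

    def __init__(self):
        # All names live in one character array. Position 0 holds a dummy
        # terminator, so spix 0 can stand for "no name".
        self.chars = ["\0"]
        # Stand-in for the hashed list of spixes: maps a name to its spix.
        self.lookup = {}

    def enter(self, name):
        """Return the spix of name, appending the name if it is new."""
        if name in self.lookup:
            return self.lookup[name]
        spix = len(self.chars)
        self.chars.extend(name)
        self.chars.append("\0")        # terminator marks the end of the name
        self.lookup[name] = spix
        return spix

    def spelling(self, spix):
        """Recover the name stored at spix."""
        end = self.chars.index("\0", spix)
        return "".join(self.chars[spix:end])
```

Entering "Expression" and then "Term" yields the spixes 1 and 12; entering "Expression" a second time returns 1 without storing a second copy.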
7.2.2 Symbol list construction
For each symbol in the syntax declarations of Cocol, a symbol node with a
successive number is allocated. Therefore, symbol numbers correspond to the
declaration sequence of the symbols. The following procedures are used to
generate, access, and modify symbol nodes:
PROCEDURE NewSy(spix:CARDINAL; typ:Symboltype): CARDINAL;
PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
PROCEDURE RepSy(sy:CARDINAL; sn:Symbolnode);
NewSy generates a new symbol node with the fields spix and typ and
returns its node number. SyNr searches for the symbol with the name spix.
If spix is found, SyNr returns the corresponding symbol number; otherwise it
returns 65535 (the value of the null symbol). GetSy gets the symbol node sn
corresponding to symbol number sy. RepSy replaces the symbol sy by the
node sn.
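The four symbol list procedures can be sketched in Python as follows. The snake_case names, the reduced set of fields, numbering from 0 and the linear search in sy_nr are assumptions made for illustration (the text does not say how SyNr searches). get_sy returns a copy, so that, as with GetSy and RepSy, a change becomes visible only after rep_sy. The field del is renamed deletable because del is a Python keyword.

```python
from dataclasses import dataclass, replace

NULL_SY = 65535  # returned by sy_nr for an undeclared name (the null symbol)

@dataclass
class Symbolnode:
    spix: int                # spelling index of the symbol name
    typ: str                 # 'eps', 't', 'pr', 'nt', 'any' or 'err'
    start: int = 0           # nt, err: root of the top-down graph
    deletable: bool = False  # nt, err: corresponds to the field del

class SymbolList:
    """Hypothetical symbol list; the position in the list is the symbol number."""

    def __init__(self):
        self.nodes = []

    def new_sy(self, spix, typ):
        """NewSy: append a node; numbers follow the declaration order."""
        self.nodes.append(Symbolnode(spix, typ))
        return len(self.nodes) - 1

    def sy_nr(self, spix):
        """SyNr: number of the symbol named spix, or NULL_SY."""
        for sy, sn in enumerate(self.nodes):
            if sn.spix == spix:
                return sy
        return NULL_SY

    def get_sy(self, sy):
        """GetSy: return a copy of node sy (like a VAR parameter)."""
        return replace(self.nodes[sy])

    def rep_sy(self, sy, sn):
        """RepSy: overwrite node sy with sn."""
        self.nodes[sy] = sn
```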
Attributes are processed with the following procedures:
PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
PROCEDURE CompleteAt(sy,n:CARDINAL): BOOLEAN;
NewAt defines a new attribute for the symbol sy. For nonterminals, it also
appends the name (spix) and the direction (dir) of the attribute to the attribute
list. GetAt gets the fields spix and dir of the nth attribute of the nonterminal
sy. If sy has fewer than n attributes, then 0 is returned as the value of spix.
CompleteAt returns TRUE if the symbol sy has exactly n attributes. The
implementation of these procedures is trivial, as can be seen in Appendix F.
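The attribute procedures can be sketched in the same spirit, keeping a singly linked list per symbol as in the Attribute record. The class AttributeTable, the is_nonterminal flag and counting attributes from 1 are assumptions of this sketch, not details taken from Appendix F; attribute names are given as spix integers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Attribute:
    spix: int                            # attribute name
    direction: str                       # 'up' or 'down'
    next: Optional["Attribute"] = None   # next attribute of the same nt

class AttributeTable:
    """Hypothetical attribute store, keyed by symbol number."""

    def __init__(self):
        self.firstat = {}   # sy -> first Attribute of its list
        self.nra = {}       # sy -> number of attributes

    def new_at(self, sy, spix, direction, is_nonterminal=True):
        """NewAt: count the attribute; for nonterminals also append it."""
        self.nra[sy] = self.nra.get(sy, 0) + 1
        if not is_nonterminal:
            return                       # terminals keep only the count
        node = Attribute(spix, direction)
        if sy not in self.firstat:
            self.firstat[sy] = node
            return
        p = self.firstat[sy]
        while p.next is not None:        # walk to the end of the list
            p = p.next
        p.next = node

    def get_at(self, sy, n):
        """GetAt: (spix, direction) of attribute n (n >= 1); spix 0 if absent."""
        p = self.firstat.get(sy)
        for _ in range(n - 1):
            if p is None:
                break
            p = p.next
        if p is None:
            return 0, None
        return p.spix, p.direction

    def complete_at(self, sy, n):
        """CompleteAt: True if sy has exactly n attributes."""
        return self.nra.get(sy, 0) == n
```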
7.3 Structure of the top-down graph
The top-down graph has already been described in Section 2.3 as an internal
grammar representation. In Coco, it is implemented in a somewhat extended
form. First, we will describe the extended top-down graphs, and then show
how they are generated. In Section 7.6.2, we will describe the translation of
top-down graphs into G-code.
Fig. 7.4 Structure of Section 7.3 (structure of the top-down graph: top-down graph representation; top-down graph construction; insertion of eps-nodes; removal of redundant eps-nodes)
7.3.1 Top-down graph representation
The top-down graph is a linear list of graph nodes. Each symbol on the right-
hand side of a Cocol rule is represented by a node. The pointers linking the
nodes are indices of this list.
TYPE
  Graphnode = RECORD
    typ: (eps,t,nt,any);  (*symbol kind*)
    sp: CARDINAL;         (*t,nt: pointer to node in symbol list*)
                          (*eps: pointer to eps-set*)
                          (*any: pointer to any-set*)
    lp: CARDINAL;         (*left pointer*)
    rp: CARDINAL;         (*right pointer*)
    sem1: CARDINAL;       (*in-attribute action*)
    sem2: CARDINAL;       (*out-attribute action*)
    sem3: CARDINAL;       (*explicit semantic action*)
    line: CARDINAL;       (*line number in the source text*)
    link: CARDINAL;       (*pointer to the next right end*)
  END;
  Topdowngraph = ARRAY[1..maxnode] OF Graphnode;
Compared to Section 2.3, the graph node is extended by three semantic
action numbers, a line number, and a pointer (link). These fields have the
following meaning:
sem1: action number of the input attribute assignments, or zero (Sect. 7.8.4);
sem2: action number of the output attribute assignments, or zero (Sect. 7.8.4);
sem3: number of the user-written semantic action which follows this symbol
      in the Cocol text, or zero;
line: line number of this symbol in the Cocol text (for error messages);
link: pointer for linking the right ends of a graph (the right ends are the
      nodes whose right pointer is zero).
7.3.2 Top-down graph construction
It is useful to think of a top-down graph as a 'black box' linked to its
environment by two pointers, head and tail. The interior of the black box may
contain a single node or an arbitrarily complex graph with several nodes (Fig. 7.5).
Fig. 7.5 Top-down graph as a black box
head points to the root of the graph and tail to its right end. Since the right
end of the graph usually consists of several nodes, these nodes are linked via
the field link (shown as dashed lines in Fig. 7.5). The following procedures are
used to generate and process the graph nodes:
PROCEDURE NewNode(typ:Symboltype; sy,line:CARDINAL): CARDINAL;
PROCEDURE GetNode(n:CARDINAL; VAR gn:Graphnode);
PROCEDURE RepNode(n:CARDINAL; gn:Graphnode);
NewNode creates a graph node containing the specified symbol sy, the symbol
type typ, and the line number line, and returns its node number.
GetNode returns the nth graph node in gn. RepNode replaces the nth
graph node by gn.
Two top-down graphs can be combined into a new graph by arranging
them either side by side as successive components or below one another as
alternatives. In either case, a new top-down graph with head and tail is
produced.
Linking of successive components
Coco uses the procedure ConcatRight to link successive components.
ConcatRight(↕head1,↕tail1,↓head2,↓tail2):
  param head1,head2,tail1,tail2: Cardinal;
  local p: Cardinal;
begin
  p:=tail1;
  while p<>0 do
    gn(p).rp:=head2;
    p:=gn(p).link;
  end;
  tail1:=tail2;
end ConcatRight;
ConcatRight links the graphs (head1, tail1) and (head2, tail2) via right
pointers, giving the new graph (head1, tail1). The right ends of the first
graph are linked with the root of the second graph (see Fig. 7.6).
Fig. 7.6 Linking of successive components
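A Python sketch of ConcatRight over a global node list, in which node 0 acts as the null pointer. Graphnode keeps only the fields the procedure touches, and new_node is a hypothetical stand-in for NewNode. Because Python has no VAR parameters, the function returns the new (head, tail) pair instead of updating tail1 in place.

```python
from dataclasses import dataclass

@dataclass
class Graphnode:
    typ: str        # 't', 'nt', 'eps' or 'any'
    sp: int = 0     # symbol number for t and nt
    lp: int = 0     # left pointer: next alternative
    rp: int = 0     # right pointer: successor
    link: int = 0   # chains the right ends of a graph

gn = [Graphnode("unused")]  # node 0 is the null pointer

def new_node(typ, sp=0):
    """Hypothetical stand-in for NewNode (no line number)."""
    gn.append(Graphnode(typ, sp))
    return len(gn) - 1

def concat_right(head1, tail1, head2, tail2):
    """Point every right end of graph 1 at the root of graph 2."""
    p = tail1
    while p != 0:               # follow the chain of right ends
        gn[p].rp = head2
        p = gn[p].link
    return head1, tail2         # the new tail is the tail of graph 2
```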
Linking of alternatives
Coco uses the procedure ConcatLeft to link alternatives.
ConcatLeft(↕head1,↕tail1,↓head2,↓tail2):
  param head1,head2,tail1,tail2: Cardinal;
  local p: Cardinal;
begin
  p:=head1;
  while gn(p).lp<>0 do p:=gn(p).lp; end;
  gn(p).lp:=head2;
  p:=tail1;
  while gn(p).link<>0 do p:=gn(p).link; end;
  gn(p).link:=tail2;
end ConcatLeft;
ConcatLeft links the graphs (head1, tail1) and (head2, tail2) via left
pointers, giving the new graph (head1, tail1). The last node of the alternative
chain of the first graph is linked with the root of the second graph. The right
ends of both graphs are connected in a similar way via the field link (see Fig. 7.7).
Fig. 7.7 Linking of alternatives
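ConcatLeft can be sketched in Python in the same way. Graphnode, new_node and concat_right are defined again here so that the block runs on its own; all of them are illustrative stand-ins rather than Coco's code, and the new (head, tail) pair is returned because Python lacks VAR parameters.

```python
from dataclasses import dataclass

@dataclass
class Graphnode:
    typ: str
    sp: int = 0
    lp: int = 0     # left pointer: next alternative
    rp: int = 0     # right pointer: successor
    link: int = 0   # chains the right ends of a graph

gn = [Graphnode("unused")]  # node 0 is the null pointer

def new_node(typ, sp=0):
    gn.append(Graphnode(typ, sp))
    return len(gn) - 1

def concat_right(head1, tail1, head2, tail2):
    p = tail1
    while p != 0:
        gn[p].rp = head2
        p = gn[p].link
    return head1, tail2

def concat_left(head1, tail1, head2, tail2):
    """Make graph 2 the last alternative of graph 1 and merge the right ends."""
    p = head1
    while gn[p].lp != 0:        # walk to the end of the alternative chain
        p = gn[p].lp
    gn[p].lp = head2
    p = tail1
    while gn[p].link != 0:      # walk to the end of the right-end chain
        p = gn[p].link
    gn[p].link = tail2
    return head1, tail1
```

For example, after concat_left(a, a, b, b) followed by concat_right on the result with c, both a and b have c as their right successor; concat_right reaches b only through the link field of a, which is why the right ends are chained.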
An attributed grammar for the construction of top-down graphs
In order to show that attributed grammars can be used for documentation as
well, we will describe the generation of the top-down graph for one syntax
rule by means of an attributed grammar. The complete top-down graph is
composed of the graphs for all syntax rules.
The grammar of EBNF rules
    Rule       = identifier "=" Expression ".".
    Expression = Term {"|" Term}.
    Term       = Factor {Factor}.
    Factor     = symbol | "eps" | "any"
               | "(" Expression ")"
               | "[" Expression "]"
               | "{" Expression "}".
contains the nonterminals Expression, Term, and Factor. Each of these
nonterminals supplies as an output attribute a top-down graph with the ends
head and tail. These graphs can be linked in two different ways: factor
graphs are linked via right pointers, term graphs via left pointers
(ConcatRight and ConcatLeft). In either case a new top-down graph is formed,
which is again represented by head and tail.
Expression, Term, and Factor also supply an output attribute del,
which indicates whether the expression, term, or factor is directly deletable,
i.e. whether it can be derived into the empty string. For each rule, the del
value of its right-hand side is entered into the symbol list.
The attributed grammar uses the procedures described above to handle the
symbol list (GetSy, RepSy, SyNr) and the top-down graph (NewNode,
ConcatLeft, ConcatRight).
GRAMMAR Rule                          — graph generation for a single rule
SEMANTIC DECLARATIONS
  FROM cocogra IMPORT NewNode, ConcatLeft, ConcatRight, Push, Pop;
  FROM cocosym IMPORT GetSy, RepSy, SyNr, Symbolnode, anysy, epssy;
  VAR h1,h2,h3: CARDINAL;             — head pointers
      t1,t2,t3: CARDINAL;             — tail pointers
      del1,del2,del3: BOOLEAN;        — TRUE, if element is deletable
      sn: Symbolnode;
      spix,syspix: CARDINAL;          — spelling indices
      sy: CARDINAL;                   — symbol number
MACROS
  sem :PushValues:
    Push(↓h1); Push(↓t1); Push(↓del1);
    Push(↓h2); Push(↓t2); Push(↓del2);
  endsem
  sem :PopValues:
    Pop(↑del2); Pop(↑t2); Pop(↑h2);
    Pop(↑del1); Pop(↑t1); Pop(↑h1);
  endsem
TERMINALS
  "=" "." "|" "(" ")" "[" "]" "{" "}" "eps" "any"
  symbol<out:spix>
NONTERMINALS
  Rule
  Expression<out:h1,t1,del1>
  Term<out:h2,t2,del2>
  Factor<out:h3,t3,del3>
RULES
Rule =
  symbol<out:syspix>
  "="
  Expression<out:h1,t1,del1>
    sem sy:=SyNr(↓syspix);
        GetSy(↓sy,↑sn);
        sn.del:=del1; sn.start:=h1;
        RepSy(↓sy,↓sn);
    endsem
  ".".
Expression<out:h1,t1,del1> =
  Term<out:h1,t1,del1>
  { "|" Term<out:h2,t2,del2>
    sem ConcatLeft(↕h1,↕t1,↓h2,↓t2);
        del1:=del1 OR del2;
    endsem
  }.
Term<out:h2,t2,del2> =
  Factor<out:h2,t2,del2>
  { Factor<out:h3,t3,del3>
    sem ConcatRight(↕h2,↕t2,↓h3,↓t3);
        del2:=del2 AND del3;
    endsem
  }.
Factor<out:h3,t3,del3> =
    symbol<out:spix>
      sem sy:=SyNr(↓spix);
          h3:=NewNode(↓sy); t3:=h3; del3:=FALSE;
      endsem
  | "eps"
      sem h3:=NewNode(↓epssy); t3:=h3; del3:=TRUE;
      endsem
  | "any"
      sem h3:=NewNode(↓anysy); t3:=h3; del3:=FALSE;
      endsem
  | "(" sem (PushValues) endsem
    Expression<out:h3,t3,del3>
    ")" sem (PopValues) endsem
  | "[" sem (PushValues) endsem
    Expression<out:h3,t3,del3>
      sem h1:=NewNode(↓epssy); t1:=h1;
          ConcatLeft(↕h3,↕t3,↓h1,↓t1);
          del3:=TRUE;
      endsem
    "]" sem (PopValues) endsem
  | "{" sem (PushValues) endsem
    Expression<out:h3,t3,del3>
      sem h1:=NewNode(↓epssy); t1:=h1;
          ConcatRight(↕h3,↕t3,↓h3,↓t3);
          ConcatLeft(↕h3,↕t3,↓h1,↓t1);
          t3:=t1; del3:=TRUE;
      endsem
    "}" sem (PopValues) endsem.
ENDGRAM
Figure 7.8 shows which graphs are produced by the translation of an EBNF
expression in brackets. As an example, we select the expression able.
Panels: (able)   [able]   {able}
Fig. 7.8 Translation of an EBNF expression into a top-down graph
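To make the attribute flow of the grammar above concrete, here is a hypothetical Python sketch of the same semantic actions, driven by a hand-written recursive-descent parser over a pre-split token list. The functions expression, term and factor return (head, tail, del) triples instead of using output attributes, and the local variables of each Python call take the place of the PushValues and PopValues macros. Line numbers, semantic-action fields and the symbol list are omitted, grammar symbols are stored by name, and the extra eps-nodes of Section 7.3.3 are not inserted.

```python
from dataclasses import dataclass

@dataclass
class Graphnode:
    typ: str        # 'sym' (terminal or nonterminal), 'eps' or 'any'
    sp: str = ""    # symbol name (a symbol number in Coco)
    lp: int = 0     # left pointer: alternative
    rp: int = 0     # right pointer: successor
    link: int = 0   # chain of right ends

gn = [Graphnode("unused")]  # node 0 is the null pointer

def new_node(typ, sp=""):
    gn.append(Graphnode(typ, sp))
    return len(gn) - 1

def concat_right(h1, t1, h2, t2):
    p = t1
    while p != 0:               # every right end of graph 1 ...
        gn[p].rp = h2           # ... continues with the root of graph 2
        p = gn[p].link
    return h1, t2

def concat_left(h1, t1, h2, t2):
    p = h1
    while gn[p].lp != 0:        # end of the alternative chain of graph 1
        p = gn[p].lp
    gn[p].lp = h2
    p = t1
    while gn[p].link != 0:      # end of the right-end chain of graph 1
        p = gn[p].link
    gn[p].link = t2
    return h1, t1

CLOSE = {"(": ")", "[": "]", "{": "}"}

def expression(toks):
    """Expression = Term {"|" Term}; returns (head, tail, deletable)."""
    h1, t1, del1 = term(toks)
    while toks and toks[0] == "|":
        toks.pop(0)
        h2, t2, del2 = term(toks)
        h1, t1 = concat_left(h1, t1, h2, t2)
        del1 = del1 or del2
    return h1, t1, del1

def term(toks):
    """Term = Factor {Factor}."""
    h2, t2, del2 = factor(toks)
    while toks and toks[0] not in ("|", ")", "]", "}"):
        h3, t3, del3 = factor(toks)
        h2, t2 = concat_right(h2, t2, h3, t3)
        del2 = del2 and del3
    return h2, t2, del2

def factor(toks):
    """Factor = symbol | "eps" | "any" | (...) | [...] | {...}."""
    tok = toks.pop(0)
    if tok in ("eps", "any"):
        n = new_node(tok)
        return n, n, tok == "eps"
    if tok not in CLOSE:
        n = new_node("sym", tok)
        return n, n, False
    h3, t3, del3 = expression(toks)
    if toks.pop(0) != CLOSE[tok]:
        raise ValueError("unbalanced " + tok)
    if tok == "[":                      # option: add an eps alternative
        h1 = new_node("eps")
        h3, t3 = concat_left(h3, t3, h1, h1)
        del3 = True
    elif tok == "{":                    # iteration: loop back, leave via eps
        h1 = new_node("eps")
        h3, t3 = concat_right(h3, t3, h3, t3)
        h3, t3 = concat_left(h3, t3, h1, h1)
        t3, del3 = h1, True
    return h3, t3, del3
```

For {a} b the sketch produces the shape discussed for Fig. 7.13: node a loops back to itself, its alternative is an eps-node, and that eps-node leads to b.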
7.3.3 Insertion of eps-nodes
Normally each symbol of the input grammar corresponds to one node in the
top-down graph. However, from Fig. 7.8, we see that the translation of
expressions in square or curly brackets leads to the generation of additional
eps-nodes which have no counterpart in the input grammar. They are inserted
by Coco to indicate that an expression is deletable.
There are also some other cases where eps-nodes must be inserted into
graphs. The algorithm of Section 7.3.2 fails if a term that begins with an
expression in curly brackets has an alternative. The production
S = ({a} b | c).
would lead to the top-down graph shown in Fig. 7.9.
Fig. 7.9 Erroneous top-down graph for S = ({a} b | c)
This is obviously wrong because once an a has been identified, only a or b
should follow, not c, as is possible in the above graph. This problem is
solved by including an eps-node in front of the first alternative (Fig. 7.10).
Fig. 7.10 Correct top-down graph for S = ({a} b | c) with inserted eps-node
This graph is now correct, since after identifying an a, only a or b can
follow, not c. For each eps-node, the set of terminal successors (eps-set) is
computed (Section 7.4.4). The eps-set of the node e1 (namely {a, b}) allows
us to distinguish between the two alternatives in the above example.
Eps-nodes are inserted in front of all expressions in curly brackets during the
construction of the top-down graph (see attributed grammar in Appendix F).
Deletable nonterminals present a similar problem. If a nonterminal is
deletable, it is always processed by the syntax analyzer, because even if the
current input symbol is not a start symbol of the nonterminal itself, it may still
be a valid successor. Now, if there is a node which is an alternative of a deletable
nonterminal, this node will never be visited, since the nonterminal will always
be recognized beforehand. Coco solves this problem by inserting an eps-node
in front of a deletable nonterminal. The eps-set of this node is then used to
distinguish between the alternatives. From the graphs shown in Fig. 7.11,
where the deletable nonterminal Y has an alternative, the graphs in Fig. 7.12
are produced.
Fig. 7.11 Top-down graph with deletable nonterminal Y
Fig. 7.12 Top-down graph with inserted eps-node in front of deletable nonterminal Y
The eps-set of the node e1 (namely {a, c}; c is a terminal start symbol of Y and a
is a successor of the deletable nonterminal Y) enables the selection between the
two alternatives starting with e1 and b. There are no more alternatives to the
node with the deletable nonterminal Y. It can therefore be safely visited by the
syntax analyzer.
The algorithm for the insertion of eps-nodes in front of deletable
nonterminals is shown below.
Insert eps-nodes before deletable nt's:
  local gn,gn1: Graphnode;
        sn: Symbolnode;
        j: Cardinal;
begin
  for all nodes i do
    GetNode(↓i,↑gn);
    if (gn.typ=nt) and (gn.lp<>0) then
      GetSy(↓gn.sp,↑sn);
      if sn.del then                     — deletable nt with alternative
        gn1:=gn; gn1.lp:=0;              — gn1 now holds the deletable nt
        j:=NewNode(↓nt,↓0,↓0);           — create empty node
        RepNode(↓j,↓gn1);
        gn.typ:=eps; gn.sp:=0;           — gn holds the new eps-node
        gn.rp:=j; gn.sem1:=0; gn.sem2:=0; gn.sem3:=0;
        RepNode(↓i,↓gn);
      end;
    end;
  end; — for
end Insert eps-nodes before deletable nt's;
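The insertion step can be sketched in Python as below. Nodes are records in a list with position 0 unused; the set deletable_nts stands in for the del fields of the symbol list, and the function name is hypothetical. As in the algorithm above, the eps-node takes over the original position i, so every pointer that referred to the nonterminal now leads to the eps-node, while the nonterminal moves to a new node without an alternative.

```python
from dataclasses import dataclass, replace

@dataclass
class Graphnode:
    typ: str        # 't', 'nt', 'eps' or 'any'
    sp: int = 0     # symbol number for t and nt
    lp: int = 0     # left pointer: alternative
    rp: int = 0     # right pointer: successor
    sem1: int = 0
    sem2: int = 0
    sem3: int = 0

def insert_eps_before_deletable(gn, deletable_nts):
    """Put an eps-node in front of every deletable nt that has an alternative."""
    for i in range(1, len(gn)):           # the nodes present at the start
        node = gn[i]
        if node.typ == "nt" and node.lp != 0 and node.sp in deletable_nts:
            gn.append(replace(node, lp=0))  # moved nt loses its alternative
            j = len(gn) - 1
            # Node i becomes the eps-node: it keeps lp, leads to the nt.
            gn[i] = replace(node, typ="eps", sp=0, rp=j,
                            sem1=0, sem2=0, sem3=0)
```

Applied to X of Fig. 7.11 (Y followed by a, with alternative b), node 1 becomes e1 with alternative b, and e1 leads to a new node holding Y, which still leads to a.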
7.3.4 Removal of redundant eps-nodes
When expressions in square or curly brackets are translated, eps-nodes arise
that can be removed again if it turns out that the expressions have successors
(see Fig. 7.13). The algorithm for the removal of redundant eps-nodes is
shown below:
Delete redundant eps-nodes:
  global visited: set of nodenumbers;   — mark list for visited nodes
  local sn: Symbolnode;
begin
  visited:={};
  for all nonterminals i do
    GetSy(↓i,↑sn);
    DelEps(↓sn.start);
  end;
end Delete redundant eps-nodes;
Fig. 7.13 Creation and removal of redundant eps-nodes (columns: EBNF expression, graph with redundant eps-nodes, equivalent graph without redundant eps-nodes; rows: [a]b and {a}b)
The procedure DelEps(↓loc) deletes all redundant eps-nodes in the top-down
graph with the root loc. Redundant eps-nodes can be recognized by the
following characteristics: they have no associated semantic actions, their left
pointer is null, and their right pointer is not null. They are always reached
via the left pointer of some other node.
DelEps(↓loc):
  param loc: Cardinal;
  global visited: set of nodenumbers;   — mark list for visited nodes
  local gn,gn1: Graphnode;
begin
  if (loc=0) or (loc in visited) then return; end;   — end or cycle
  visited:=visited+{loc};
  GetNode(↓loc,↑gn);
  if gn.lp<>0 then                      — test if alt. node is a redundant eps
    GetNode(↓gn.lp,↑gn1);
    if (gn1.typ=eps) and (gn1.sem3=0)
       and (gn1.lp=0) and (gn1.rp<>0) then
      gn.lp:=gn1.rp;
      RepNode(↓loc,↓gn);
    end;
  end;
  DelEps(↓gn.lp);
  DelEps(↓gn.rp);
end DelEps;
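DelEps translates almost line by line into Python. The sketch assumes mutable node records, so no RepNode call is needed, and it passes the mark list explicitly instead of using a global; to mirror Delete redundant eps-nodes, the same visited set would be passed for every nonterminal. The names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Graphnode:
    typ: str        # 't', 'nt', 'eps' or 'any'
    lp: int = 0     # left pointer: alternative
    rp: int = 0     # right pointer: successor
    sem3: int = 0   # explicit semantic action, 0 = none

def del_eps(gn, loc, visited):
    """Bypass redundant eps alternatives in the graph rooted at loc."""
    if loc == 0 or loc in visited:        # end of graph or cycle
        return
    visited.add(loc)
    node = gn[loc]
    if node.lp != 0:
        alt = gn[node.lp]
        if alt.typ == "eps" and alt.sem3 == 0 and alt.lp == 0 and alt.rp != 0:
            node.lp = alt.rp              # skip the redundant eps-node
    del_eps(gn, node.lp, visited)
    del_eps(gn, node.rp, visited)
```

For both [a]b and {a}b of Fig. 7.13, the left pointer of node a is redirected from the eps-node to b; in the {a}b case the mark list stops the recursion at the loop from a back to itself.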
7.4 Collecting the symbol sets
So far, the input grammar has been read and the symbol list as well as the top-
down graph have been built. From these two data structures, Coco calculates
the symbol sets needed for the grammar tests and for the generated compiler.
Fig. 7.14 Structure of Section 7.4 (collecting the symbol sets: deletable nonterminals; terminal start symbols of nonterminals; terminal successors of nonterminals; eps-sets; any-sets)
Coco collects four sets of terminals:
1. start symbols of nonterminals;
2. successors of nonterminals;
3. successors of eps-nodes (eps-sets);
4. sets represented by any-symbols (any-sets).
The following procedures are used to access the top-down graph and the
symbol list:
PROCEDURE GetNode(loc:CARDINAL; VAR gn:Graphnode);
PROCEDURE RepNode(loc:CARDINAL; gn:Graphnode);
PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
PROCEDURE RepSy(sy:CARDINAL; sn:Symbolnode);
GetNode gets the graph node gn with the number loc. RepNode replaces
the graph node with the number loc by the node gn. GetSy gets the symbol
node sn with the number sy. RepSy replaces the symbol node with the
number sy by the node sn.
Before the symbol sets are collected, it is necessary to find out which
nonterminals are deletable.
7.4.1 Deletable nonterminals
All deletable nonterminals are tagged in the symbol list. In the first step,
those symbols which can be directly derived into the empty string are
tagged. In the second step, all those nonterminals whose top-down graph
can be traversed along a path of already tagged symbols are tagged. The
second step is repeated until no more deletable symbols are found.
The directly deletable nonterminals are found when the top-down graph is
created (see Section 7.3.2). The following algorithm finds the indirectly
deletable nonterminals.
Find deletable symbols:
  local sn: Symbolnode;
        changed: Boolean;
begin
  repeat
    changed:=false;
    for all nonterminals i do
      GetSy(↓i,↑sn);
      if not sn.del and Deletable(↓sn.start) then
        sn.del:=true; RepSy(↓i,↓sn); changed:=true;
      end;
    end;
  until not changed;
end Find deletable symbols;
The procedure Deletable(↓loc) checks whether the top-down graph rooted at loc is
deletable (i.e. whether it can be traversed along a path of deletable symbols).
Deletable(↓loc) Boolean:
  param loc: Cardinal;
  global marked: set of nodenumbers;   — mark list for visited nodes
begin
  marked:={}; return DelGraph(↓loc);
end Deletable;
The actual work is performed by the procedure DelGraph.
DelGraph(↓loc) Boolean:
  param loc: Cardinal;
  global marked: set of nodenumbers;
  local gn: Graphnode;
begin
  if loc=0 then return true; end;            — end of graph found
  if loc in marked then return false; end;   — already visited: cycle
  marked:=marked+{loc};
  GetNode(↓loc,↑gn);
  return ((gn.lp<>0) and DelGraph(↓gn.lp))   — deletable alternative
      or (DelNode(↓gn) and DelGraph(↓gn.rp)); — or deletable right
end DelGraph;                                — part of graph
Finally, DelNode checks whether a node (i.e. its corresponding symbol) is
deletable.
DelNode(↓gn) Boolean:
  param gn: Graphnode;
  local sn: Symbolnode;
begin
  if gn.typ=nt
  then GetSy(↓gn.sp,↑sn); return sn.del;
  else return gn.typ=eps;
  end;
end DelNode;
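The fixpoint iteration of Find deletable symbols, together with DelGraph and DelNode, might look as follows in Python. The dictionaries start (nonterminal to root node) and nt_del (nonterminal to del flag, initialized with the directly deletable nonterminals) are assumptions that replace the symbol list; the function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Graphnode:
    typ: str        # 't', 'nt', 'eps' or 'any'
    sp: int = 0     # symbol number for t and nt
    lp: int = 0     # next alternative (0 = none)
    rp: int = 0     # successor (0 = right end)

def del_node(node, nt_del):
    """DelNode: eps-nodes and deletable nonterminals are deletable."""
    if node.typ == "nt":
        return nt_del[node.sp]
    return node.typ == "eps"

def del_graph(gn, loc, nt_del, marked):
    """DelGraph: is there a path of deletable nodes from loc to a right end?"""
    if loc == 0:
        return True                 # end of graph found
    if loc in marked:
        return False                # already visited: cycle
    marked.add(loc)
    node = gn[loc]
    return ((node.lp != 0 and del_graph(gn, node.lp, nt_del, marked))
            or (del_node(node, nt_del) and del_graph(gn, node.rp, nt_del, marked)))

def find_deletable(gn, start, nt_del):
    """Repeat until no further nonterminal becomes deletable."""
    changed = True
    while changed:
        changed = False
        for nt, root in start.items():
            # A fresh mark list per call, as in Deletable.
            if not nt_del[nt] and del_graph(gn, root, nt_del, set()):
                nt_del[nt] = True
                changed = True
    return nt_del
```

In the test grammar A = B C, B = eps, C = B | "x", only B is directly deletable; the first pass finds C, the second finds A, and a third pass without changes ends the loop.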
7.4.2 Terminal start symbols of nonterminals
The terminal start symbols of a nonterminal are the terminal start symbols of
its top-down graph, i.e. the start symbols of its first alternative chain. Those
nodes of the chain which contain nonterminals will have their terminal start
symbols calculated recursively. If the chain contains a deletable symbol, its
successors also have to be considered. The terminal start symbols of all
nonterminals are stored in a list.
Get terminal start symbols:
  global first: array(nonterminals) of record
           ts: set of terminals;   — terminal start symbols
           ready: Boolean;         — true, if ts is computed
         end;
  local sn: Symbolnode;
begin
  for all nonterminals i do first(i).ready:=false; end;
  for all nonterminals i do
    GetSy(↓i,↑sn);
    GetFirstSet(↓sn.start,↑first(i).ts);
    first(i).ready:=true;
  end;
end Get terminal start symbols;
The procedure GetFirstSet(↓loc,↑s) supplies the terminal start symbols of
the top-down graph with the root loc.
GetFirstSet(↓loc,↑s):
  param loc: Cardinal;
        s: set of terminals;
  global visited: set of nodenumbers;   — mark list for visited nodes
begin
  visited:={};
  CollectFirst(↓loc,↑s);
end GetFirstSet;
GetFirstSet initializes a mark list for the prevention of cycles and calls the
procedure CollectFirst, which does the actual work.
CollectFirst(↓loc,↑s):
  param loc: Cardinal;
        s: set of terminals;
  global visited: set of nodenumbers;   — mark list for visited nodes
         first: like in 'Get terminal start symbols';
  local sn: Symbolnode;
        gn: Graphnode;
        s1: set of terminals;
begin
  s:={};
  while loc<>0 do                       — for all alternatives
    if loc in visited then return; end; — cycle
    visited:=visited+{loc};
    GetNode(↓loc,↑gn);
    if DelNode(↓gn) then CollectFirst(↓gn.rp,↑s1); s:=s+s1; end;
    case gn.typ of
      t:   s:=s+{gn.sp};
    | nt:  if first(gn.sp).ready
           then s:=s+first(gn.sp).ts;
           else
             GetSy(↓gn.sp,↑sn); CollectFirst(↓sn.start,↑s1);
             s:=s+s1;
           end;
    | any: s:=s+{all terminals};
    | eps: — nothing
    end;
    loc:=gn.lp;
  end;
end CollectFirst;
The procedure DelNode(↓gn) from Section 7.4.1 checks whether the graph node
gn is deletable.
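A Python sketch of Get terminal start symbols and CollectFirst, under the same assumptions as before: dictionaries replace the symbol list, node 0 is the null pointer, and all names are illustrative. Membership of a nonterminal in the dictionary first plays the role of the ready flag, and each nonterminal gets a fresh mark list, as GetFirstSet does.

```python
from dataclasses import dataclass

@dataclass
class Graphnode:
    typ: str        # 't', 'nt', 'eps' or 'any'
    sp: int = 0     # symbol number for t and nt
    lp: int = 0     # next alternative (0 = none)
    rp: int = 0     # successor (0 = right end)

def del_node(node, nt_del):
    """DelNode: eps-nodes and deletable nonterminals are deletable."""
    if node.typ == "nt":
        return nt_del[node.sp]
    return node.typ == "eps"

def collect_first(gn, loc, start, nt_del, first, terminals, visited):
    """CollectFirst: terminal start symbols of the graph rooted at loc."""
    s = set()
    while loc != 0:                          # for all alternatives
        if loc in visited:                   # cycle
            return s
        visited.add(loc)
        node = gn[loc]
        if del_node(node, nt_del):           # successors of a deletable node count too
            s |= collect_first(gn, node.rp, start, nt_del, first, terminals, visited)
        if node.typ == "t":
            s.add(node.sp)
        elif node.typ == "nt":
            if node.sp in first:             # corresponds to first(i).ready
                s |= first[node.sp]
            else:
                s |= collect_first(gn, start[node.sp], start, nt_del, first,
                                   terminals, visited)
        elif node.typ == "any":
            s |= terminals
        loc = node.lp                        # an eps-node adds nothing itself
    return s

def get_first_sets(gn, start, nt_del, terminals):
    """Get terminal start symbols for every nonterminal."""
    first = {}
    for nt, root in start.items():
        first[nt] = collect_first(gn, root, start, nt_del, first, terminals, set())
    return first
```

For S = A "b" and A = "a" | eps, A is deletable, so the start symbols of S include b as well as a.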
7.4.3 Terminal successors of nonterminals
The terminal successors of all nonterminals are stored in another list. They are
collected in two steps: first, a search is made for the direct successors of all
nonterminals (those terminals immediately following this nonterminal at all its
occurrences in the graph); then the indirect successors are calculated (if a
nonterminal is at the end of a rule, its indirect successors are the successors of
the nonterminal on the left-hand side of this rule).
In the first step, the data structure follow is filled; for each nonterminal i
it contains the direct successors of i (ts) and those nonterminals (nts) whose
successors are indirect successors of i. In the second step, the indirect
successors are added to ts.
Get terminal successors:
  global follow: array(nonterminals) of record
           ts: set of terminals;         — terminal successors
           nts: set of nonterminals;     — nt's whose successors
         end;                            — must be added to ts
         visitednod: set of nodenumbers; — mark list (visited nodes)
         visitedsym: set of nonterminals; — mark list (visited nt's)
  local sn: Symbolnode;
        i: Cardinal;
begin
  for all nonterminals i do follow(i).ts:={}; follow(i).nts:={}; end;
  visitednod:={};
  for all nonterminals i do              — fill follow.ts and follow.nts
    GetSy(↓i,↑sn);
    CollectFollow(↓sn.start,↓i);
  end;
  for all nonterminals i do              — complete follow.ts
    visitedsym:={};
    Complete(↓i); follow(i).nts:={};
  end;
end Get terminal successors;
The procedure CollectFollow(↓loc,↓sy) traverses the top-down graph of
the nonterminal sy starting at the node loc. Every time it encounters a
nonterminal i, it adds its direct successors to the set follow(i).ts. For each
nonterminal i at the right end of the graph, it adds sy to the set follow(i).nts.
CollectFollow(↓loc,↓sy):
  param loc,sy: Cardinal;
  global follow: as in 'Get terminal successors';
         visitednod: set of nodenumbers;
  local gn: Graphnode;
        s: set of terminals;
begin
  while loc<>0 do                          — step through alternatives chain
    if loc in visitednod then return; end; — cycle
    visitednod:=visitednod+{loc};
    GetNode(↓loc,↑gn);
    if gn.typ=nt then
      GetFirstSet(↓gn.rp,↑s);
      follow(gn.sp).ts := follow(gn.sp).ts + s;
      if Deletable(↓gn.rp) then            — nt at end of rule
        follow(gn.sp).nts := follow(gn.sp).nts + {sy};
      end;
    end;
    CollectFollow(↓gn.rp,↓sy);
    loc:=gn.lp;
  end;
end CollectFollow;
The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 computes the set of
terminal start symbols s of the graph with the root loc. The procedure
Deletable(↓loc) from Section 7.4.1 checks whether the graph rooted at loc
is deletable.
The procedure Complete(↓i) used in Get terminal successors completes
the direct successors of the nonterminal i (follow(i).ts) by adding its
indirect successors, which are the successors of the nonterminals contained in
follow(i).nts.
Complete(↓i):
  param i: Cardinal;
  global visitedsym: set of nonterminals;
         follow: like in 'Get terminal successors';
  local j: Cardinal;
begin
  if i in visitedsym then return; end;   — cycle
  visitedsym:=visitedsym+{i};
  for all j in follow(i).nts do
    Complete(↓j);
    follow(i).ts:=follow(i).ts+follow(j).ts;
  end;
end Complete;
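The two-step computation of successors can be sketched in a single Python function. To keep it self-contained, it includes simplified versions of Deletable and GetFirstSet (without the ready cache and without any-nodes); the dictionaries follow and nts correspond to follow(i).ts and follow(i).nts, and all other names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Graphnode:
    typ: str        # 't', 'nt' or 'eps' (any-nodes are not handled here)
    sp: int = 0     # symbol number for t and nt
    lp: int = 0     # next alternative (0 = none)
    rp: int = 0     # successor (0 = right end)

def get_follow_sets(gn, start, nt_del):
    """Terminal successors of every nonterminal (dict nt -> set)."""

    def del_node(node):
        return nt_del[node.sp] if node.typ == "nt" else node.typ == "eps"

    def del_graph(loc, marked):              # simplified Deletable
        if loc == 0:
            return True
        if loc in marked:
            return False
        marked.add(loc)
        node = gn[loc]
        return ((node.lp != 0 and del_graph(node.lp, marked))
                or (del_node(node) and del_graph(node.rp, marked)))

    def first_of(loc, visited):              # simplified GetFirstSet
        s = set()
        while loc != 0 and loc not in visited:
            visited.add(loc)
            node = gn[loc]
            if del_node(node):
                s |= first_of(node.rp, visited)
            if node.typ == "t":
                s.add(node.sp)
            elif node.typ == "nt":
                s |= first_of(start[node.sp], visited)
            loc = node.lp
        return s

    follow = {nt: set() for nt in start}     # follow(i).ts
    nts = {nt: set() for nt in start}        # follow(i).nts
    visitednod = set()

    def collect_follow(loc, sy):
        while loc != 0:                      # step through alternatives chain
            if loc in visitednod:            # cycle
                return
            visitednod.add(loc)
            node = gn[loc]
            if node.typ == "nt":
                follow[node.sp] |= first_of(node.rp, set())
                if del_graph(node.rp, set()):    # nt at end of rule
                    nts[node.sp].add(sy)
            collect_follow(node.rp, sy)
            loc = node.lp

    def complete(i, visitedsym):
        if i in visitedsym:                  # cycle
            return
        visitedsym.add(i)
        for j in nts[i]:
            complete(j, visitedsym)
            follow[i] |= follow[j]

    for nt, root in start.items():           # step 1: direct successors
        collect_follow(root, nt)
    for nt in start:                         # step 2: indirect successors
        complete(nt, set())
        nts[nt] = set()
    return follow
```

In the test grammar S = "x" T "b", T = "y" B, B = "d", the nonterminal B ends the rule for T, so T is recorded in nts of B and complete copies follow(T) = {b} into follow(B).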
7.4.4 eps-sets
eps-nodes having an alternative must not be recognized by the generated
syntax analyzer unless the next input symbol is a valid successor of this eps-
node. In order to find out whether a symbol is a valid successor, the syntax
analyzer must know the set of all possible successors of each eps-node with
alternatives.
The terminal successors of an eps-node are the terminal start symbols of
the subgraph rooted at the right pointer of the eps-node. If the right pointer is
null, the terminal successors are the successors of the nonterminal on the left-
hand side of the graph containing the eps-node.
First, the top-down graph of each nonterminal is searched for eps-nodes.
Get eps-sets:
  global epsset: array of set of terminals;   — eps successors
         maxeps: Cardinal;                    — number of eps-sets
         visited: set of nodenumbers;         — mark list for visited nodes
  local sn: Symbolnode;
begin
  visited:={}; maxeps:=0;
  for all nonterminals i do
    GetSy(↓i,↑sn);
    FindEps(↓sn.start,↓i,↓false);
  end;
end Get eps-sets;
The procedure FindEps(↓loc,↓leftsy,↓vialp) searches the top-down graph
with the root loc for eps-nodes. It computes their successors and stores them
in the global array epsset. The field sp of the eps-node is set to point to this
entry in epsset. The flag vialp indicates whether loc has been reached via a
left pointer.
FindEps(↓loc,↓leftsy,↓vialp):
  param loc: Cardinal;        — root of TDG
        leftsy: Cardinal;     — left side nonterminal
        vialp: Boolean;       — true, if loc is reached via lp
  global visited: set of nodenumbers;   — mark list for visited nodes
  local gn: Graphnode;
begin
  if (loc=0) or (loc in visited) then return; end;   — end or cycle
  visited:=visited+{loc};
  GetNode(↓loc,↑gn);
  if (gn.typ=eps) and (vialp or (gn.lp<>0)) then     — eps with alt.
    FindEpsFollowers(↓gn.rp,↓leftsy,↑gn.sp);         — gn.sp points to
    RepNode(↓loc,↓gn);                               — eps-set
  end;
  FindEps(↓gn.lp,↓leftsy,↓true);
  FindEps(↓gn.rp,↓leftsy,↓false);
end FindEps;
The procedure FindEpsFollowers(↓loc,↓leftsy,↑nr) collects the terminal
start symbols of the subgraph with the root loc. If the subgraph is deletable, the
successors of the nonterminal leftsy are also added. The collected set is stored
in epsset(nr), where nr is the index into the global array epsset.
FindEpsFollowers(↓loc,↓leftsy,↑nr):
  param loc,leftsy,nr: Cardinal;
  global epsset: array of set of terminals;   — successors of eps-nodes
         follow: like in 'Get terminal successors';
         maxeps: Cardinal;
  local s: set of terminals;
begin
  GetFirstSet(↓loc,↑s);
  if Deletable(↓loc) then s:=s+follow(leftsy).ts; end;
  maxeps:=maxeps+1; epsset(maxeps):=s;
  nr:=maxeps;
end FindEpsFollowers;
The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 collects the terminal
start symbols of the graph with the root loc. The procedure Deletable(↓loc)
from Section 7.4.1 determines whether the graph with the root loc is
deletable.
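Get eps-sets, FindEps and FindEpsFollowers can be sketched like this in Python. The argument follow is assumed to be the result of the previous step (a dict from nonterminal to terminal set), and the nested helpers are simplified stand-ins for Deletable and GetFirstSet that ignore any-nodes. In this sketch the mark list of find_eps is kept separate from the sets used while collecting start symbols; all names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Graphnode:
    typ: str        # 't', 'nt' or 'eps'
    sp: int = 0     # t, nt: symbol number; eps: index of its eps-set
    lp: int = 0     # next alternative (0 = none)
    rp: int = 0     # successor (0 = right end)

def get_eps_sets(gn, start, nt_del, follow):
    """Return the eps-sets and store each index in the eps-node's sp."""

    def del_node(node):
        return nt_del[node.sp] if node.typ == "nt" else node.typ == "eps"

    def del_graph(loc, marked):              # simplified Deletable
        if loc == 0:
            return True
        if loc in marked:
            return False
        marked.add(loc)
        node = gn[loc]
        return ((node.lp != 0 and del_graph(node.lp, marked))
                or (del_node(node) and del_graph(node.rp, marked)))

    def first_of(loc, visited):              # simplified GetFirstSet
        s = set()
        while loc != 0 and loc not in visited:
            visited.add(loc)
            node = gn[loc]
            if del_node(node):
                s |= first_of(node.rp, visited)
            if node.typ == "t":
                s.add(node.sp)
            elif node.typ == "nt":
                s |= first_of(start[node.sp], visited)
            loc = node.lp
        return s

    epsset = [set()]                         # index 0 unused
    visited = set()                          # shared by all nonterminals

    def find_eps(loc, leftsy, vialp):
        if loc == 0 or loc in visited:       # end or cycle
            return
        visited.add(loc)
        node = gn[loc]
        if node.typ == "eps" and (vialp or node.lp != 0):  # eps with alternative
            s = first_of(node.rp, set())
            if del_graph(node.rp, set()):    # rest of the rule is deletable:
                s |= follow[leftsy]          # add successors of the left side
            epsset.append(s)
            node.sp = len(epsset) - 1        # sp points to the eps-set
        find_eps(node.lp, leftsy, True)
        find_eps(node.rp, leftsy, False)

    for nt, root in start.items():
        find_eps(root, nt, False)
    return epsset
```

With the graphs of Fig. 7.12 (X: e1 in front of Y followed by a, with alternative b; Y: c with alternative e2) and follow(Y) = {a}, the sketch yields {a, c} for e1, as stated in Section 7.3.3, and {a} for e2.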
7.4.5 any-sets
In order to recognize an any-symbol, the generated syntax analyzer needs the
set of all terminals represented by the any-symbol. An any-symbol represents
all terminals which are not in the alternative chain to which it belongs. For
any-symbols without alternatives, no any-sets are computed. The syntax
analyzer recognizes them regardless of the next input symbol.
Get any-sets:
global anyset: array of set of terminals; — any-sets
maxany: Cardinal; — number of any-sets
eofsy: Cardinal; — symbol number of eof-symbol
local gn: Graphnode;
s: set of terminals;
begin
for all nodes i do
GetNode(↓i,↑gn);
if (gn.typ=any) and (gn.lp<>0) then
GetFirstSet(↓gn.lp,↑s);
Make complement of s;
s:=s-{eofsy}; — eofsy must not be recognized by any
maxany:=maxany+1;
anyset(maxany):=s;
gn.sp:=maxany; — sp of any-node points to any-set
RepNode(↓i,↓gn);
end;
end;
end Get any-sets;
The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 supplies the
terminal start symbols of the graph with the root loc.
For the calculation of an any-set, only those symbols are considered
which can be reached via the left pointer of the any-node. The symbols which
lie before the any-node in the alternative chain are not considered, since the
syntax analyzer has already checked them before it gets to the any-node.
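A Python sketch of the set arithmetic in Get any-sets follows. It assumes terminals are small integers and that the start symbols of the alternatives reached via the left pointer have already been collected; any_set and its parameters are hypothetical names, not part of Coco.

```python
def any_set(all_terminals, first_of_rest, eof_sy):
    """Terminals represented by an any-symbol that has alternatives.

    all_terminals: all terminal numbers of the grammar
    first_of_rest: start symbols of the alternatives reached via the
                   any-node's left pointer (what GetFirstSet would deliver)
    eof_sy:        symbol number of the eof-symbol
    """
    s = set(all_terminals) - set(first_of_rest)   # make complement of s
    s.discard(eof_sy)                             # eof must not be matched by any
    return s


# Terminals 0..5 with eof = 0; the other alternative starts with 3 or 4.
print(sorted(any_set(range(6), {3, 4}, 0)))   # [1, 2, 5]
```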
7.5 Grammar tests
Before Coco generates the target compiler, it carefully checks whether the grammar
satisfies certain requirements which are necessary for a correct compiler. Here
the compiler compiler proves very valuable: even in large grammars,
which are hard for human readers to understand, it rapidly finds hidden
ambiguities or circularities. The well-known problem of the 'dangling else' clearly
shows how easily bugs in the grammar design can remain undetected without
the support of an automatic tool (this ambiguity was actually overlooked in
the language definition of Algol).
Coco verifies the following properties:
1. completeness;
2. reachability;
3. noncircularity;
4. termination;
5. LL(1) property.
[Figure: overview of Chapter 7 (structure of the symbol list, structure of the top-down graph, collecting the symbol sets, grammar tests, generation of the parser tables, generation of the syntax analyzer, generation of the semantic evaluator), with the grammar tests subdivided into completeness, reachability, noncircularity, terminalization and LL(1) condition]
Fig. 7.15 Structure of Section 7.5
The test algorithms are executed in the following order:
Test grammar(↑ok):
Test completeness(↑ok1);
Test if all nt's can be reached(↑ok2);
Find circular rules(↑ok3);
Test if all nt's can be derived to t's(↑ok4);
LL1 test(↑ok5);
ok:=okl and ok2 and ok3 and ok4 and ok5;
end Test grammar;
These algorithms access the top-down graph and the symbol list with the
following procedures, already described in Sections 7.2.2 and 7.3.2:
PROCEDURE GetNode(loc:CARDINAL; VAR gn:Graphnode);
PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
7.5.1 Completeness
A check is made as to whether there is a rule for every nonterminal.
Basic idea: The field start in the symbol node of each nonterminal must
point to a top-down graph.
Test completeness(↑ok):
param ok: Boolean;
local sn: Symbolnode;
begin
ok:=true;
for all nonterminals i do
GetSy(↓i,↑sn);
if sn.start=0 then ok:=false; end;
end;
end Test completeness;
7.5.2 Reachability
A check is made as to whether all declared nonterminals appear in some
sentential form derived from the start symbol of the grammar.
Basic idea: First, tagging is done on all those nonterminals which can be
derived directly from the start symbol, then on those nonterminals which can
be derived from symbols already tagged. This is repeated until no more
nonterminals can be tagged. The untagged nonterminals are not reachable.
Test if all nt's can be reached(↑ok):
param ok: Boolean;
global visited: set of nodenumbers; — already visited nodes
reached: set of nonterminals; — reachable nonterminals
rootsy: Cardinal; — start symbol of grammar
local sn: Symbolnode;
begin
visited:={};
reached:={rootsy};
GetSy(↓rootsy,↑sn);
MarkReachedNts(↓sn.start);
ok:=true;
for all nonterminals i do
if not (i in reached) then ok:=false; end;
end;
end Test if all nt's can be reached;
The procedure MarkReachedNts(↓loc) marks all nonterminals which can be
reached from the node loc.
MarkReachedNts(↓loc):
param loc: Cardinal;
global reached: set of nonterminals; — reachable nonterminals
visited: set of nodenumbers; — already visited nodes
local gn: Graphnode;
sn: Symbolnode;
begin
if loc=0 or loc in visited then return; end; — end or cycle
visited:=visited+{loc}; — visit loc
GetNode(↓loc,↑gn);
if (gn.typ=nt) and not (gn.sp in reached) then — new nt reached
reached:=reached+{gn.sp};
GetSy(↓gn.sp,↑sn);
MarkReachedNts(↓sn.start);
end;
MarkReachedNts(↓gn.lp);
MarkReachedNts(↓gn.rp);
end MarkReachedNts;
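The same marking can be done without a top-down graph. The Python sketch below assumes a simplified grammar representation (a dictionary from each nonterminal to its alternatives, each a list of symbols, where every symbol that is not a key counts as a terminal) and uses an explicit work list instead of recursion; reachable is a hypothetical name.

```python
def reachable(rules, root):
    """Return the set of nonterminals reachable from root.

    rules maps each nonterminal to a list of alternatives; an alternative
    is a list of symbols. Symbols that are not keys of rules are terminals.
    """
    reached = {root}
    work = [root]                       # nonterminals whose rules are unscanned
    while work:
        nt = work.pop()
        for alternative in rules[nt]:
            for sym in alternative:
                if sym in rules and sym not in reached:   # new nt reached
                    reached.add(sym)
                    work.append(sym)
    return reached


rules = {
    "S": [["A", "b"], ["c"]],
    "A": [["a"], []],
    "U": [["u"]],          # never used on a right-hand side
}
unreached = set(rules) - reachable(rules, "S")
print(unreached)   # {'U'}
```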
7.5.3 Noncircularity
A check is made as to whether there are nonterminals which can be derived
into themselves, i.e. whether there are derivations $X \Rightarrow^{+} X$ for some nonterminal X.
(This definition of circularity differs from the usual one for attributed
grammars, which refers to circular dependencies between attributes.)
Basic idea: Only those productions are considered whose right-hand side can be
derived into a single nonterminal (possibly surrounded by deletable symbols).
These single-nonterminal productions make up a graph that must be acyclic.
Algorithm: The graph is stored as pairs (left, right) of nonterminals for
which there is a production left → right.
Find circular rules(↑ok):
param ok: Boolean;
global visited: set of nodenumbers; — mark list for visited nodes
local graph: array of record — derivation graph
left,right: Cardinal;
deleted: Boolean;
end;
graphlength: Cardinal;
singles: set of nonterminals; — single descendants of a nt
sn: Symbolnode;
changed: Boolean;
i,j: Cardinal;
begin
graphlength:=0;
for all nonterminals i do — build the graph
singles:={}; visited:={};
GetSy(↓i,↑sn);
GetSingles(↓sn.start,↑singles); — get nt's j such that i->j
for all nonterminals j in singles do
graphlength:=graphlength+1;
with graph(graphlength) do
left:=i; right:=j; deleted:=false;
end;
end;
end;
repeat — remove edges which are not on a cycle
changed:=false;
for i:=1 to graphlength do
if not graph(i).deleted and
(graph(i).left is not the right side of any undeleted edge or
graph(i).right is not the left side of any undeleted edge) then
graph(i).deleted:=true; changed:=true;
end;
end;
until not changed;
ok:=all edges of graph are deleted;
end Find circular rules;
The edges that have not been deleted represent the circular part
of the grammar.
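The edge-deletion loop can be tried out in Python on an explicit list of (left, right) pairs. In this sketch (the function name is hypothetical) each pass compares edges against the edges that were still undeleted at the start of the pass; because deletion is repeated until nothing changes, it should end with the same surviving edges as deleting one edge at a time.

```python
def circular_edges(edges):
    """Return the edges left after repeatedly deleting edges not on a cycle.

    edges is a list of (left, right) pairs: left can be derived into the
    single nonterminal right. An edge is deleted if its left end is not the
    right end of any undeleted edge, or its right end is not the left end
    of any undeleted edge.
    """
    deleted = [False] * len(edges)
    changed = True
    while changed:
        changed = False
        live = [e for e, d in zip(edges, deleted) if not d]
        has_out = {left for left, _ in live}     # nts with an outgoing edge
        has_in = {right for _, right in live}    # nts with an incoming edge
        for i, (left, right) in enumerate(edges):
            if not deleted[i] and (left not in has_in or right not in has_out):
                deleted[i] = True
                changed = True
    return [e for e, d in zip(edges, deleted) if not d]


survivors = circular_edges([("A", "B"), ("B", "A"), ("C", "A"), ("B", "D")])
ok = not survivors
print(survivors, ok)   # [('A', 'B'), ('B', 'A')] False
```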
The procedure GetSingles(↓loc,↑singles) collects a set (singles) of
nonterminals in the top-down graph with the root loc. If the graph can be
derived into a single nonterminal X, then X is added to singles. The
following assertion always holds: loc is on a path which contains only
deletable symbols between its beginning and loc.
GetSingles(↓loc,↑singles):
param loc: Cardinal;
singles: set of nonterminals;
global visited: set of nodenumbers;
local gn: Graphnode;
begin — assert: all nodes to the left of loc are deletable
if loc=0 or loc in visited then return; end; — end or cycle
visited:=visited+{loc};
GetNode(↓loc,↑gn);
if (gn.typ=nt) and Deletable(↓gn.rp) then — right subgraph
singles:=singles+{gn.sp}; — deletable
end;
if DelNode(↓gn) then GetSingles(↓gn.rp,↑singles); end;
GetSingles(↓gn.lp,↑singles);
end GetSingles;
A nonterminal X is added to singles if it is on a path from loc to the end of
the top-down graph and if this path has only deletable nodes to the left and
right of X. The deletability of subgraphs and nodes is determined by the
procedures Deletable and DelNode from Section 7.4.1.
7.5.4 Termination
A check is made as to whether all nonterminals can be derived into (possibly
empty) strings of terminals.
Basic idea: Those nonterminals are tagged which are deletable or can be
derived into a string consisting only of terminals or already tagged
nonterminals. This is repeated until no more nonterminals can be tagged. The
untagged nonterminals are those which cannot be derived into terminals.
Test if all nt's can be derived to t's(↑ok):
param ok: Boolean;
global visited: set of nodenumbers; — mark list for visited nodes
termlist: set of nonterminals; — nonterminals which can be
— derived to terminals
local changed: Boolean;
sn: Symbolnode;
begin
termlist:={};
repeat
changed:=false;
for all nonterminals i which are not in termlist do
GetSy(↓i,↑sn);
visited:={};
if IsTerm(↓sn.start) then
termlist:=termlist+{i}; changed:=true;
end;
end; — for
until not changed;
ok:=all nonterminals are in termlist;
end Test if all nt's can be derived to t's;
The procedure IsTerm(↓loc) checks whether the top-down graph with the root loc
has a (possibly empty) path which consists only of terminals or already tagged
nonterminals.
IsTerm(↓loc): Boolean:
param loc: Cardinal;
global visited: set of nodenumbers;
termlist: set of nonterminals;
local gn: Graphnode;
begin
if loc=0 or loc in visited then return false; end; — end or cycle
visited:=visited+{loc};
GetNode(↓loc,↑gn);
if (gn.typ=nt) and not (gn.sp in termlist)
then return IsTerm(↓gn.lp);
else return (gn.rp=0) or IsTerm(↓gn.rp) or IsTerm(↓gn.lp);
end;
end IsTerm;
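A Python sketch of the tagging loop follows. It assumes a flat grammar representation in which each nonterminal maps to a list of alternatives (lists of symbols) and an empty list stands for an empty alternative; EBNF repetitions in the top-down graph would first have to be expanded into such alternatives. terminalizable is a hypothetical name.

```python
def terminalizable(rules):
    """Fixed-point tagging of the nonterminals that derive terminal strings.

    A nonterminal is tagged as soon as one of its alternatives consists only
    of terminals and already tagged nonterminals (an empty alternative counts).
    """
    termlist = set()
    changed = True
    while changed:
        changed = False
        for nt, alternatives in rules.items():
            if nt in termlist:
                continue
            for alt in alternatives:
                if all(sym not in rules or sym in termlist for sym in alt):
                    termlist.add(nt)
                    changed = True
                    break
    return termlist


rules = {
    "S": [["a", "A"]],
    "A": [["b"], ["A", "c"]],
    "L": [["x", "L"]],        # L -> x L can never terminate
}
print(set(rules) - terminalizable(rules))   # {'L'}
```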
7.5.5 LL(1) condition
A check is made as to whether it is always possible to decide which path of the
top-down graph should be followed during syntax analysis depending on the
next input symbol.
Basic idea: The LL(1) test consists of the following two subtests:
1. The terminal start symbols of all alternatives in an alternative chain must
be disjoint.
2. The terminal start symbols of deletable subgraphs must be disjoint from
the terminal successors of the left-hand side nonterminal.
LL1 test(↑ok):
param ok: Boolean;
global visited: set of nodenumbers; — mark list for visited nodes
local sn: Symbolnode;
begin
ok:=true;
for all nonterminals i do
visited:={}; GetSy(↓i,↑sn);
CheckAlternatives(↓sn.start,↓i,↑ok);
end;
end LL1 test;
The procedure CheckAlternatives(↓loc,↓sy,↑ok) checks whether the alternative
chain with the root loc contains only alternatives with distinct start symbols
(subtest 1). If the subgraph rooted at loc is deletable (i.e. if it can produce the
empty string), it is also checked whether the start symbols of the subgraph are
different from the successors of the left-hand side nonterminal sy (subtest 2).
CheckAlternatives uses GetF(↓sy,↑first) and GetFo(↓sy,↑follow)
to access the already calculated sets of terminal start symbols and successors
of nonterminals.
CheckAlternatives(↓loc,↓sy,↑ok):
param loc,sy: Cardinal;
ok: Boolean;
global visited: set of nodenumbers; — mark list for visited nodes
local first: set of terminals;
follow: set of terminals;
locset: set of terminals; — start symbols of current node
s: set of terminals; — start symbols of prev. alt.
gn: Graphnode;
begin
if loc=0 or loc in visited then return; end; — end or cycle
if Deletable(↓loc) then — subtest 2
GetFirstSet(↓loc,↑s);
GetFo(↓sy,↑follow);
if s * follow <> {} then ok:=false; end;
end;
s:={};
while loc<>0 do — for all alternatives ... subtest 1
if loc in visited then return; end;
visited:=visited+{loc};
GetNode(↓loc,↑gn);
if DelNode(↓gn)
then GetFirstSet(↓gn.rp,↑locset);
else locset:={};
end;
case gn.typ of
t: locset:=locset+{gn.sp};
| nt: GetF(↓gn.sp,↑first);
locset:=locset+first;
| eps,any: — nothing
end;
if s * locset <> {} then ok:=false; end;
s:=s+locset;
CheckAlternatives(↓gn.rp,↓sy,↑ok);
loc:=gn.lp;
end;
end CheckAlternatives;
The procedures Deletable(↓loc) and DelNode(↓gn) from Section 7.4.1
check whether the top-down graph with the root loc or the graph node gn is
deletable. The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 supplies
the terminal start symbols s of the top-down graph with the root loc.
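Both subtests reduce to set intersections once the start-symbol sets are known. The Python sketch below assumes those sets (one per alternative), the deletability of the chain and the successor set of the left-hand side nonterminal are given; the names are hypothetical. The usage example models an option ["else" Statement] at the end of a rule for Statement, which is where the dangling-else ambiguity mentioned at the start of Section 7.5 appears as a subtest 2 conflict.

```python
from itertools import combinations


def ll1_conflicts(alt_firsts, deletable, follow):
    """Apply both LL(1) subtests to one alternative chain.

    alt_firsts: one set of terminal start symbols per alternative
    deletable:  True if the chain can derive the empty string
    follow:     terminal successors of the left-hand side nonterminal
    Returns a list of conflict descriptions (empty if the chain is LL(1)).
    """
    conflicts = []
    # Subtest 1: start symbols of the alternatives must be pairwise disjoint.
    for (i, a), (j, b) in combinations(enumerate(alt_firsts), 2):
        if a & b:
            conflicts.append(f"alternatives {i} and {j} share {sorted(a & b)}")
    # Subtest 2: a deletable chain must not start with a successor symbol.
    if deletable:
        common = set().union(*alt_firsts) & set(follow)
        if common:
            conflicts.append(f"start and successor symbols share {sorted(common)}")
    return conflicts


# Option ["else" Statement] at the end of Statement: one alternative starts
# with "else", the other is empty, and "else" can also follow Statement.
print(ll1_conflicts([{"else"}, set()], True, {"else", ";"}))
# ["start and successor symbols share ['else']"]
```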
7.6 Generation of the parser tables
When the grammar tests are completed, Coco can generate the target compiler.
From the symbol list and the top-down graph, the parser tables which drive
the generated compiler are constructed. The tables contain information for the
recognition of symbols and for error handling, including the G-code which
controls the syntax analysis. This section is structured as shown in Fig. 7.16.
7.6.1 Table format
The parser tables are inserted into the generated syntax analyzer as
initialization code. Table 7.1 shows their contents:
[Figure: overview of Chapter 7 (structure of the symbol list, structure of the top-down graph, collecting the symbol sets, grammar tests, generation of the parser tables, generation of the syntax analyzer, generation of the semantic evaluator), with the generation of the parser tables subdivided into table format, generation of the G-code and generation of the remaining tables]
Fig. 7.16 Structure of Section 7.6
Table 7.1 Contents of the parser tables
Table item          Contents
header              table dimensions (for decoding)
code                G-code
ntsymbols           information about nonterminals
epssets             sets of valid successors, one for each eps-instruction in the G-code
anysets             sets of terminals represented by each any-symbol
attribute numbers   number of attributes for each terminal and each pragma
pragma semantics    for each pragma, the semantic actions to be executed when
                    the pragma is recognized
namelist            symbol names for error messages
name pointers       pointers to the symbol names
The structure of the above data is shown by the following Modula-2 type
declarations:
TYPE
Header = RECORD
maxcodevar, maxtvar, maxpvar, maxsvar,
maxepsvar, maxanyvar, maxnamevar, maxnamepvar: CARDINAL;
END;
Code = ARRAY[1..maxcode] OF [0..255];
Symbolset = ARRAY[0..maxt DIV 16] OF BITSET;
Ntsymbols = ARRAY[maxp+1..maxsym] OF RECORD
startpc: CARDINAL; (*start of rule in G-code*)
del: BOOLEAN; (*true, if deletable*)
first: Symbolset; (*terminal start symbols*)
END;
Epsset = ARRAY[1..maxeps] OF Symbolset;
Anyset = ARRAY[1..maxany] OF Symbolset;
Attributenumbers = ARRAY[0..maxp] OF [0..255];
Pragmasemantics = ARRAY[maxt..maxp] OF RECORD
sem1,sem2: CARDINAL; (*element maxt is a dummy*)
END;
Namelist = ARRAY[1..maxname] OF CHAR;
Namepointers = ARRAY[0..maxnamep] OF CARDINAL;
Checksum = CARDINAL;
The constants maxcode, maxt, maxp, etc. are the table dimensions derived
from the input grammar. They are inserted into the generated syntax analyzer
as constant declarations. The header of the parser tables repeats these
values as variables. They are not used by the syntax analyzer, however,
but are reserved for a decoding program.
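Each Symbolset packs one bit per terminal into 16-bit BITSET words. The Python sketch below mirrors that layout under the assumption that element i of a BITSET corresponds to the integer bit 2**i; the actual bit order depends on the Modula-2 implementation, and the function names are hypothetical.

```python
def pack_symbolset(terminals, maxt, bits=16):
    """Pack terminal numbers 0..maxt into a list of `bits`-bit words.

    Mirrors Symbolset = ARRAY[0..maxt DIV 16] OF BITSET, assuming that
    element i of a BITSET corresponds to the integer bit 2**i.
    """
    words = [0] * (maxt // bits + 1)
    for t in terminals:
        if not 0 <= t <= maxt:
            raise ValueError(f"terminal {t} outside 0..{maxt}")
        words[t // bits] |= 1 << (t % bits)   # word t DIV 16, bit t MOD 16
    return words


def contains(words, t, bits=16):
    """Membership test corresponding to 't IN set' in Modula-2."""
    return bool((words[t // bits] >> (t % bits)) & 1)


words = pack_symbolset({0, 3, 17}, maxt=20)
print(words, contains(words, 17), contains(words, 16))   # [9, 2] True False
```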
7.6.2 Generation of the G-code
The G-code is derived from the top-down graph. This process is very simple:
A recursive algorithm visits all nodes of the top-down graph and translates
them into G-code instructions. The simplified algorithm is shown below:
GenCode(↓node):
Generate code for node;
if (node.rp<>0) and (node.rp not yet visited) then
GenCode(↓node.rp);
end;
if (node.lp<>0) and (node.lp not yet visited) then
GenCode(↓node.lp);
end;
end GenCode;
Each node is processed as follows (for the definition of the G-code, see
Section 2.4 or Appendix D):
1. Depending on the node type, a G-code instruction for the recognition of
this node is generated (T, NT, NTS, ANY and EPS instructions). For
nodes with a nonzero left pointer value, the generated instruction also
contains the address of the corresponding alternative (TA, NTA, NTAS,
ANYA and EPSA instructions).
2. If semantic actions are specified in the node, SEM instructions are
generated.
3. If the right pointer of the node is zero, a RET instruction is generated.
4. If the right pointer points to an already visited node, a JMP instruction to
the address of this node is generated.
In order to resolve jumps and addresses of alternatives, an address list of all
G-code sequences generated from graph nodes is needed. It is handled by the
following procedures:
PROCEDURE NewAdr(loc:CARDINAL; adr:CARDINAL);
PROCEDURE GetAdr(loc,fixup:CARDINAL; VAR adr:CARDINAL);
PROCEDURE Visited(loc:CARDINAL): BOOLEAN;
NewAdr defines that the G-code sequence generated from node loc has the
address adr. GetAdr returns the address adr of the G-code sequence
corresponding to node loc. If the address is not yet in the address list, then adr is
zero. In this case, fixup is remembered as a G-code location where the node's
address is to be entered as soon as it becomes known. An address becomes
known when it is defined by NewAdr. It is then automatically entered into
all fixup locations waiting for this address. Visited returns TRUE if the
address of the node with number loc is already known.
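The address list with its waiting fixup locations is a classic backpatching scheme. A minimal Python sketch, assuming the code segment is a Python list and using the hypothetical method names new_adr, get_adr and visited for NewAdr, GetAdr and Visited:

```python
class AddressList:
    """Node addresses plus fixup locations waiting for them (a sketch)."""

    def __init__(self, code):
        self.code = code      # the G-code segment, patched in place
        self.adr = {}         # node number -> G-code address
        self.pending = {}     # node number -> code cells waiting for it

    def new_adr(self, loc, adr):
        """Define the address of node loc and patch all waiting fixups."""
        self.adr[loc] = adr
        for fixup in self.pending.pop(loc, []):
            self.code[fixup] = adr

    def get_adr(self, loc, fixup):
        """Return the address of loc, or 0 after remembering the fixup."""
        if loc in self.adr:
            return self.adr[loc]
        self.pending.setdefault(loc, []).append(fixup)
        return 0

    def visited(self, loc):
        return loc in self.adr


code = [0] * 10               # a tiny code segment
addresses = AddressList(code)
a = addresses.get_adr(7, 3)   # node 7 unknown yet: returns 0, remembers cell 3
addresses.new_adr(7, 5)       # node 7 now at address 5: cell 3 is patched
print(a, code[3], addresses.get_adr(7, 9))   # 0 5 5
```

Using 0 for "not yet known" works only because the code generator starts numbering G-code addresses at 1, so 0 never denotes a real instruction.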
Two additional procedures are needed: one to emit G-code instructions
and one to access nodes of top-down graphs:
PROCEDURE Emit(VAR pc:CARDINAL; code:Instruction);
PROCEDURE GetNode(loc:CARDINAL; VAR node:Graphnode);
Emit writes the specified instruction code into the code segment at the location
pc and increases the code segment length accordingly. Here, Instruction is a
symbolic type that is represented by the text of the instruction. The actual
implementation deviates from this. GetNode gets the graph node with the
node number loc. The type Graphnode is described in Section 7.3.1.
The actual algorithm for the generation of the G-code follows:
Generate G-code:
local pc: Cardinal;
begin
pc:=1;
for all nonterminals i do
GenCode(↓root of top-down graph of nonterminal i,↑pc);
end;
end Generate G-code;
GenCode(↓loc,↑pc) is a recursive procedure which will now be refined. It
translates the top-down graph with the root loc into a corresponding G-code
sequence and inserts it into the code segment at the location pc.
When GenCode arrives at a node loc that has already been visited, the
G-code for the subgraph at loc has already been generated, so this node does
not have to be revisited.
GenCode(↓loc,↑pc):
param loc,pc: Cardinal;
var node: Graphnode;
adr,nr: Cardinal;
begin
if Visited(↓loc) then return; end;
NewAdr(↓loc,↓pc); — now visit loc
GetNode(↓loc,↑node);
case node.typ of
t: if node.lp=0
then Emit(↑pc,↓"T node.sp");
else
GetAdr(↓node.lp,↓pc+2,↑adr);
Emit(↑pc,↓"TA node.sp,adr");
end;
| nt: if node.lp=0
then
if node.sem1=0
then Emit(↑pc,↓"NT node.sp");
else Emit(↑pc,↓"NTS node.sp,node.sem1");
end;
else
GetAdr(↓node.lp,↓pc+2,↑adr);
if node.sem1=0
then Emit(↑pc,↓"NTA node.sp,adr");
else Emit(↑pc,↓"NTAS node.sp,adr,node.sem1");
end;
end;
| any: if node.sp=0
then Emit(↑pc,↓"ANY");
else
GetAdr(↓node.lp,↓pc+2,↑adr);
Emit(↑pc,↓"ANYA node.sp,adr");
end;
| eps: if node.sp<>0 then — node with eps-set
if node.lp=0
then Emit(↑pc,↓"EPS node.sp");
else
GetAdr(↓node.lp,↓pc+2,↑adr);
Emit(↑pc,↓"EPSA node.sp,adr");
end;
end;
end; — case
if node.sem2<>0 then Emit(↑pc,↓"SEM (node.sem2)"); end;
if node.sem3<>0 then Emit(↑pc,↓"SEM (node.sem3)"); end;
if node.rp=0
then Emit(↑pc,↓"RET");
else
if Visited(↓node.rp) then
GetAdr(↓node.rp,↓pc+1,↑adr); Emit(↑pc,↓"JMP adr");
end;
end;
if node.rp<>0 then GenCode(↓node.rp,↑pc); end;
if node.lp<>0 then GenCode(↓node.lp,↑pc); end;
end GenCode;
The G-code is completely stored in memory so that the missing addresses can
be inserted when they become known.
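A drastically simplified Python sketch of GenCode follows. It handles only terminal nodes, encodes the graph as a hypothetical dictionary node -> (sp, lp, rp), and assumes that T, TA, JMP and RET occupy 2, 3, 2 and 1 code cells, which matches the fixup offsets pc+2 and pc+1 used in GenCode. The example graph corresponds to a rule such as A = ("a" | "b") "c".

```python
def gen_code(nodes, root):
    """Translate a graph of terminal nodes into symbolic G-code.

    nodes maps a node number to (sp, lp, rp): terminal, left (alternative)
    pointer and right (successor) pointer, where 0 means nil.
    """
    code = [None]              # cell 0 unused: addresses start at 1, 0 = unknown
    adr, fixups = {}, {}       # node -> address, node -> cells waiting for it

    def get_adr(loc, fixup):
        if loc in adr:
            return adr[loc]
        fixups.setdefault(loc, []).append(fixup)
        return 0

    def gen(loc):
        if loc in adr:                      # Visited(loc)
            return
        pc = len(code)
        adr[loc] = pc                       # NewAdr(loc, pc) ...
        for cell in fixups.pop(loc, []):    # ... patches waiting cells
            code[cell] = pc
        sp, lp, rp = nodes[loc]
        if lp == 0:
            code.extend(["T", sp])
        else:
            code.extend(["TA", sp, get_adr(lp, pc + 2)])
        if rp == 0:
            code.append("RET")
        elif rp in adr:                     # successor already generated
            code.extend(["JMP", adr[rp]])
        if rp != 0:
            gen(rp)
        if lp != 0:
            gen(lp)

    gen(root)
    return code


nodes = {1: ("a", 2, 3), 2: ("b", 0, 3), 3: ("c", 0, 0)}
print(gen_code(nodes, 1)[1:])
# ['TA', 'a', 7, 'T', 'c', 'RET', 'T', 'b', 'JMP', 4]
```

The address cell of TA is patched to 7 once the alternative "b" is placed, and the "b" branch ends with a jump to the already generated code for "c".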
7.6.3 Generation of the remaining tables
Besides the G-code, the contents of the generated tables are almost entirely
extracted from the symbol list. Only the name list is handled by the lexical
analyzer of Coco. Coco gets the necessary data from the symbol list and from
the lexical analyzer with the help of access procedures, and writes them
unchanged into the syntax analyzer as initialization values.
7.7 Generation of the syntax analyzer
Coco generates a table-driven LL(1) syntax analyzer with error handling in the
form of a Modula-2 source module which the user must compile and include
in his compiler. The syntax analyzer is the implementation of the analysis
algorithm described in Section 2.5. It is the same for all generated compilers.
Only the parser tables differ from compiler to compiler, so they have to be
inserted into the otherwise invariant parser module.
[Figure: overview of Chapter 7 (structure of the symbol list, structure of the top-down graph, collecting the symbol sets, grammar tests, generation of the parser tables, generation of the syntax analyzer, generation of the semantic evaluator), highlighting the generation of the syntax analyzer]
Fig. 7.17 Structure of Section 7.7
The definition module and the implementation module of the syntax analyzer
are generated from a frame text which Coco reads from the file cocosynframe.
At certain locations grammar-dependent parts have to be inserted into
this frame. The locations are marked by the string '-->' and a descriptive name
of the text to be inserted. The following table shows what has to be inserted at
these locations.
-->modulename          grammar name + syn
-->semantic analyzer   grammar name + sem
-->input module        grammar name + lex
-->declarations        table dimensions declared as constants
                       (see example in Section 8.3)
-->tables              table values
The syntax analyzer contains references to other modules (e.g. the lexical
analyzer or the semantic evaluator) whose names are constructed from the
grammar name (the name of the root symbol in the attributed grammar) and
from a suffix. The resulting syntax analyzer is written to the files
grammarnamesyn.DEF and grammarnamesyn.MOD.
Coco uses a procedure CopyFramePart to copy pieces of text from the
frame to the syntax analyzer module.
PROCEDURE CopyFramePart(VAR source,target:File; str:ARRAY OF CHAR);
CopyFramePart copies text from the file source to the file target until it
encounters the string str (str is not copied). When it is next called, it continues
copying the text immediately behind str.
This procedure is called with the name of the next piece of text to be
inserted (e.g. '-->tables'). It copies the frame up to this name and then Coco
inserts the specified text in place of the name. This process is repeated until the
entire syntax analyzer has been generated. A source listing of cocosynframe
is shown in Appendix F. The module cocosyn, also shown in Appendix F, is
an example of a syntax analyzer generated by this process.
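CopyFramePart can be sketched in Python over an in-memory string instead of files; the class, its method and the grammar name Demo are hypothetical. In this sketch a marker that does not occur causes the rest of the frame to be copied, which is one possible way to finish the module.

```python
class FrameCopier:
    """In-memory sketch of CopyFramePart over a frame text."""

    def __init__(self, frame):
        self.frame = frame
        self.pos = 0          # where the next call continues copying

    def copy_frame_part(self, target, marker):
        """Append frame text up to `marker` to the list `target`.

        The marker itself is not copied; the next call resumes right after
        it. If the marker does not occur, the rest of the frame is copied.
        """
        end = self.frame.find(marker, self.pos)
        if end < 0:
            target.append(self.frame[self.pos:])
            self.pos = len(self.frame)
        else:
            target.append(self.frame[self.pos:end])
            self.pos = end + len(marker)


frame = "DEFINITION MODULE -->modulename;\nEND -->modulename.\n"
out = []
copier = FrameCopier(frame)
copier.copy_frame_part(out, "-->modulename")
out.append("Demosyn")                      # grammar name + syn
copier.copy_frame_part(out, "-->modulename")
out.append("Demosyn")
copier.copy_frame_part(out, "-->end")      # marker absent: copy the rest
print("".join(out), end="")
# prints:
# DEFINITION MODULE Demosyn;
# END Demosyn.
```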
7.8 Generation of the semantic evaluator
In addition to the syntax analyzer and the parser tables, Coco also generates a
semantic evaluator. This is a Modula-2 source module which the user must
compile and include in his compiler. The semantic evaluator consists of some
invariant parts and of the semantic actions and declarations which are copied
from the attributed grammar. Its generation can be divided into three tasks:
1. copy the semantic declarations from the attributed grammar to the
semantic evaluator;
2. translate the semantic actions into components of a case statement;
3. generate new semantic actions (assignments) for attribute passing.
Before covering these three tasks in detail, we will describe the invariant parts
of the semantic evaluator.
[Figure: overview of Chapter 7 (structure of the symbol list, structure of the top-down graph, collecting the symbol sets, grammar tests, generation of the parser tables, generation of the syntax analyzer, generation of the semantic evaluator), with the generation of the semantic evaluator subdivided into constant parts of the semantic evaluator, translation of semantic declarations, translation of semantic actions and attribute processing]
Fig. 7.18 Structure of Section 7.8
7.8.1 The invariant parts of the semantic evaluator
Like the syntax analyzer, the semantic analyzer is derived from a frame
module which Coco reads from the file cocosemframe. Again Coco copies
the frame using the procedure CopyFramePart (see Section 7.7) and inserts
grammar-dependent parts at some specified places in the frame. These places
are:
-->modulename     grammar name + sem
-->scannername    grammar name + lex
-->declarations   semantic declarations of the grammar
-->actions        semantic actions of the grammar
The frame module is as follows:
DEFINITION MODULE -->modulename;
VAR printactions: BOOLEAN;
PROCEDURE Semant(sem:CARDINAL);
END -->modulename.
IMPLEMENTATION MODULE -->modulename;
FROM SYSTEM IMPORT WORD;
FROM -->scannername IMPORT at;
-->declarations
PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
BEGIN x:=y END ASSIGN;
PROCEDURE Semant(sem:CARDINAL);
BEGIN
CASE sem OF
11: ; (*action numbers start at 12*)
-->actions
END;
END Semant;
END -->modulename.
The resulting semantic analyzer is written to the files grammarnamesem.DEF
and grammarnamesem.MOD. The user may set the exported variable printactions
to TRUE if he wants a trace of the executed semantic actions.
7.8.2 Processing of the semantic declarations
The semantic declarations, which are written in Modula-2, are copied
immediately and without change from the attributed grammar to the frame program,
and are inserted at the location marked by '-->declarations'. This happens in
the following manner: the lexical analyzer of Coco returns the symbols of the
Modula-2 text to the syntax analyzer as Cocol symbols, and from there they
go to the semantic evaluator. The procedure Copy(↓typ,↓col) is called for
each symbol to translate its symbol code back into its source text, which is
then inserted into the frame module.
Problems can arise since the Modula-2 text may contain symbols that are
not Cocol symbols (e.g. +, *, &). Such symbols are copied by means of a
trick: the lexical analyzer assigns them a special symbol code (nococosy) and
an attribute (spix). They are treated like names and entered into the name list.
spix is their address in the name list, which allows their source text to be
accessed.
In order to keep the name list small, the Modula-2 names are entered only
temporarily. Permanent storage is prevented with the procedure StopHash.
This causes a name to be entered, but overwritten by the next name, so the
names can be accessed via their addresses just like the permanently stored
names, but only until the next name has been recognized. The procedure
RestartHash re-establishes permanent storage.
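The effect of StopHash and RestartHash can be illustrated with a Python sketch of a character-array name list. It assumes names are stored back to back, each followed by a terminating character, and it leaves out any lookup of existing names (the procedure names suggest Coco uses a hash table, which this sketch does not model); all names in the sketch are hypothetical.

```python
class NameList:
    """Name list in which entries are permanent or temporary (a sketch)."""

    def __init__(self):
        self.chars = []         # the name list itself
        self.top = 0            # end of the permanently stored names
        self.permanent = True

    def enter(self, name):
        """Store name and return its address (spix)."""
        spix = self.top
        del self.chars[self.top:]             # drop a temporary entry, if any
        self.chars.extend(name)
        self.chars.append("\0")               # terminator
        if self.permanent:
            self.top = len(self.chars)        # keep the entry
        return spix

    def name(self, spix):
        end = self.chars.index("\0", spix)
        return "".join(self.chars[spix:end])

    def stop_hash(self):
        self.permanent = False

    def restart_hash(self):
        self.permanent = True


names = NameList()
expr = names.enter("Expr")          # permanent entry at address 0
names.stop_hash()
plus = names.enter("+")             # temporary entry at address 5
print(names.name(plus))             # +
x = names.enter("x")                # overwrites "+" at the same address
print(x == plus, names.name(plus))  # True x
names.restart_hash()
```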
Coco copies the declarations without checking the syntax. If there are
syntax errors, they will be detected by the Modula-2 compiler when the
generated semantic evaluator is compiled. We now describe the translation of
the semantic declarations by an attributed grammar in Cocol.
GRAMMAR Declarations
SEMANTIC DECLARATIONS
FROM cocogen IMPORT Copy;
FROM cocolex IMPORT col, typ, StopHash, RestartHash;
— PROCEDURE Copy(typ,col:CARDINAL);
— writes the source text of the symbol 'typ' to the generated
— semantic analyzer. col is the symbol column in the grammar.
TERMINALS SEMANTICSY DECLARATIONSY
NONTERMINALS Declarations
RULES
Declarations =
SEMANTICSY DECLARATIONSY sem StopHash endsem
{ any sem Copy(↓typ,↓col) endsem
} sem RestartHash endsem.
ENDGRAM
7.8.3 Processing of the semantic actions
Coco translates the semantic actions of the attributed grammar into
continuously numbered variants of a case statement, and inserts them into the
semantic frame program at the location marked by the string '-->actions'.
Like the declarations, the semantic actions are copied unchanged and without a
syntax check. Again, each symbol is copied by translating its symbol code
back into its source text. We describe this process in Cocol.
GRAMMAR SemAction
SEMANTIC DECLARATIONS
FROM cocogen IMPORT Copy, OpenSem;
FROM cocosym IMPORT NewMacro, GetMacroNr;
FROM cocolex IMPORT col, typ, StopHash, RestartHash;
FROM Errors IMPORT SemErr;
—PROCEDURE OpenSem(VAR sem:CARDINAL);
— generates a new case label and returns its number sem.
—PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
— gets the action number sem of the macro 'spix'. If the macro
— does not exist, sem=0.
VAR spix,sem: CARDINAL;
TERMINALS SEMSY ENDSEMSY "(" ")" ":" IDENT<out:spix>
NONTERMINALS SemAction<out:sem>
RULES
SemAction<out:sem> =
SEMSY
( "(" IDENT<out:spix> sem GetMacroNr(↓spix,↑sem);
IF sem=0 THEN SemErr(1) END
endsem
")"
| sem OpenSem(↑sem); StopHash endsem
{ any sem Copy(↓typ,↓col) endsem
}
)
ENDSEMSY sem RestartHash endsem.
ENDGRAM
The above grammar also shows how semantic macros are processed. The
module cocosym handles a list of macro names and their corresponding
semantic action numbers. The action number of a macro is supplied by the
procedure GetMacroNr.
7.8.4 Attribute processing
While declarations and semantic actions need only to be copied from the
attributed grammar into the semantic evaluator, attributes need further
processing. For each symbol, its attributes must be stored in the symbol list, and
must be checked for consistency every time this symbol occurs. In addition to
this, Coco must generate semantic actions by which values are assigned to the
attributes at run-time.
The processing of attributes depends on the context in which they appear.
In Cocol there are three different places where attributes may occur:
1. at the declaration of a syntax symbol;
2. at a nonterminal on the left-hand side of a rule;
3. at a symbol on the right-hand side of a rule.
We will now describe the processing of attributes in each of these three cases,
and then summarize it by an attributed grammar.
Declaration of attributes
Attributes are declared together with syntax symbols and are entered into the
symbol list. The context of attribute declarations is:
SyntaxDeclarations =
TERMINALS {Symbol [Attributes] [AliasName]}
[ PRAGMAS {Symbol [Attributes] [SemAction]} ]
NONTERMINALS {identifier [Attributes] [AliasName]}.
Coco uses the procedure NewAt to enter an attribute into the symbol list.
TYPE Direction = (up,down);
PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
NewAt enters an attribute spix with the direction dir for the symbol sy.
Depending on the kind of sy, the following information is stored:
for terminal symbols: number of attributes;
for pragmas: number of attributes;
for nonterminals: number, names, and directions of attributes.
Attributes on the left-hand side of productions
Attributes on the left-hand side of productions are called formal attributes.
Their context is:
Rule = identifier [Attributes] "=" Expression "." .
Formal attributes are checked for consistency with their declaration. For every
left-hand side nonterminal the number of attributes, their names, order, and
direction must agree with the attributes declared for this nonterminal. The
procedure GetAt is used to access the attribute information in the symbol list.
It gets the name (spix) and the direction (dir) of the nth attribute of the
nonterminal sy. If sy has fewer than n attributes, then spix is zero.
PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
Attributes on the right-hand side of productions
Attributes on the right-hand side of productions appear as actual attributes of
syntax symbols in EBNF expressions.
Expression = Term {"|" Term}.
Term = Factor {Factor}.
Factor = Symbol [Attributes] | ... .
In this context, attributes denote semantic values which result from the
recognition of a syntax symbol, or which are required for its recognition. Coco
generates assignments between the attribute values and the attribute names,
and includes them as semantic actions in the evaluator program. It also checks
whether the number of attributes, their order and their direction agree with the
corresponding attribute declaration.
Attribute assignments for terminals and pragmas
The lexical analyzer of the generated compiler exports the attribute values of
terminals and pragmas in the variable at. The array at is filled for each
symbol by the lexical analyzer. A terminal (or pragma) t<out:a,b> is handled by
the generated compiler as follows:
recognize t and fill at;
a:=at(1); b:=at(2);
When t has been recognized, a semantic action must be executed in which
the attribute values at(1) and at(2) are assigned to the attributes a and b.
Since this action does not appear in the attributed grammar, Coco must generate it.
Attribute assignments for nonterminals
For nonterminals, attribute assignments occur between formal and actual
attributes. A nonterminal nt<in:a,b; out:c,d> is handled by the generated
compiler as follows:
formal attribute corresponding to a := a;
formal attribute corresponding to b := b;
parse nt;
c := formal attribute corresponding to c;
d := formal attribute corresponding to d;
Again Coco must generate semantic actions for the attribute assignments.
Generation of attribute assignments
For each attribute on the right-hand side of a production, Coco calls the
procedure GenAssign, which generates an assignment of the corresponding
attribute value to the attribute variable.
TYPE Attrkind = (term, (*attribute of a terminal*)
nonterm, (*attribute of a nonterminal*)
const); (*const. value as an attribute of an nt*)
PROCEDURE GenAssign(typ:Attrkind; left,right:CARDINAL);
Table 7.2 shows the meaning of the parameters left and right depending on
the value of the parameter typ. It also shows which code is generated:
Table 7.2 Parameters of GenAssign and the generated code
Value of typ   Meaning of left     Meaning of right     Generated code
term           Spix of left side   Attribute number     name(left):=at[right]
nonterm        Spix of left side   Spix of right side   name(left):=name(right)
const          Spix of left side   Constant value       name(left):=right
name(spix) denotes the name at the address spix in the name list. The array
at is exported by the lexical analyzer and contains the attribute values of the
most recently recognized terminal.
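A minimal sketch of the code selection in Table 7.2 follows. It assumes, for illustration, that the generated assignment is returned as text and that a Python dictionary called names stands in for the name list; gen_assign is a hypothetical name, not Coco's interface.

```python
from __future__ import annotations


def gen_assign(typ: str, left: int, right: int, names: dict[int, str]) -> str:
    """Return the assignment text listed in Table 7.2 for one attribute.

    names maps a spelling index (spix) to the name stored at that address
    in the name list; it stands in for name(spix).
    """
    target = names[left]                     # left is always the spix of the left side
    if typ == "term":
        return f"{target}:=at[{right}]"      # right = attribute number
    if typ == "nonterm":
        return f"{target}:={names[right]}"   # right = spix of the right side
    if typ == "const":
        return f"{target}:={right}"          # right = constant value
    raise ValueError(f"unknown attribute kind: {typ!r}")
```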
The procedure EmitAction builds a semantic action from the attribute
assignments generated since its last call. It inserts the action as a variant of a
case statement into the semantic evaluator. Thus, the semantic evaluator
contains not only the semantic actions of the attributed grammar, but also the
actions generated from attributes by Coco. EmitAction returns the action
number of the generated semantic action.
PROCEDURE EmitAction(VAR sem:CARDINAL);
Optimization of attribute passing
Coco performs two optimizations to reduce the number of attribute
assignments:
1. If the formal and the actual attribute of a nonterminal have identical
names, no assignment is generated.
2. Identical semantic actions (with the same assignments) are generated only
once.
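The interplay of GenAssign, EmitAction and the two optimizations can be sketched as follows. This is a hypothetical Python model: ActionGenerator and its methods are illustrative names, actions are numbered from 1 in the order they are created (in Coco they share the numbering with the grammar's own semantic actions), and returning 0 when no assignment is pending is an assumption.

```python
from __future__ import annotations


class ActionGenerator:
    """Hypothetical model of GenAssign/EmitAction with both optimizations."""

    def __init__(self) -> None:
        self.pending: list[str] = []       # assignments since the last emit
        self.actions: list[str] = []       # action number k is actions[k-1]
        self.numbers: dict[str, int] = {}  # action text -> action number

    def add_assign(self, left: str, right: str) -> None:
        # Optimization 1: formal and actual attribute have the same name.
        if left != right:
            self.pending.append(f"{left}:={right}")

    def emit_action(self) -> int:
        if not self.pending:
            return 0                       # nothing to generate
        text = "; ".join(self.pending)
        self.pending.clear()
        # Optimization 2: an identical action is generated only once.
        if text not in self.numbers:
            self.actions.append(text)
            self.numbers[text] = len(self.actions)
        return self.numbers[text]

    def case_statement(self) -> str:
        # The actions as variants of a CASE statement in the evaluator.
        variants = [f"{k}: {text}" for k, text in enumerate(self.actions, 1)]
        return "CASE sem OF\n  " + "\n| ".join(variants) + "\nEND"
```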
Description of the attribute processing in Cocol
We will now summarize the attribute processing, describing it by an attributed
grammar in Cocol. The start symbol of the grammar is the nonterminal
Attributes. The grammar is a segment of a larger grammar in which attributes
can appear in various contexts. Therefore, Attributes has three input attributes
which control its processing.
Attributes<in:sy,styp,kind; out:sem1,sem2>
sy denotes the symbol to which the attributes belong; styp specifies the type
of this symbol; kind is the context in which the attributes are used; it
indicates how they are to be processed:
kind=def: treat them as an attribute declaration;
kind=check: perform a consistency check
(when used on the left-hand side of a production);
kind=use: generate semantic actions for attribute passing
(when used on the right-hand side of a production).
sem1 and sem2 are the numbers of the generated semantic actions for input
and output attribute passing (or zero).
GRAMMAR Attributes
SEMANTIC DECLARATIONS
  FROM cocosym IMPORT NewAt, GetAt, CompleteAt, Direction, Usage,
                      Symboltype;
  FROM cocogen IMPORT Attrtype, EmitAction, GenAssign;
  FROM Errors IMPORT SemErr;
  --TYPE
  --  Attrtype = (term,nonterm,const);
  --  Direction = (up,down);         (*out-at or in-at*)
  --  Usage = (def,check,use);       (*attribute context*)
  --  Symboltype = (eps,t,pr,nt,any);
  --PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
  --  declares an attribute for the symbol sy with the name spix and
  --  the direction dir.
  --PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL;
  --                VAR dir:Direction);
  --  gets the name spix and the direction dir of attribute number n
  --  of symbol sy. If sy has fewer than n attributes, then spix=0.
  --PROCEDURE CompleteAt(sy,n:CARDINAL): BOOLEAN;
  --  returns TRUE if symbol sy has exactly n attributes.
  VAR
    sy,spix,spix1,sem1,sem2,n,val: CARDINAL;
    styp: Symboltype;
    kind: Usage;
    dir,dir1: Direction;
MACROS
  sem :AssignInAt:
    n:=n+1;
    CASE kind OF
      use:   IF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF dir=dir1
                 THEN GenAssign(↓nonterm,↓spix1,↓spix)
                 ELSE SemErr(2)
                 END
               END
             END;
    | check: IF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF spix<>spix1 THEN SemErr(3) END;
                 IF dir<>dir1 THEN SemErr(2) END
               END
             END;
    | def:   NewAt(↓sy,↓spix,↓dir)
    END -- CASE
  endsem
  sem :AssignNumber:
    n:=n+1;
    IF kind=use
    THEN
      IF styp=nt THEN
        GetAt(↓sy,↓n,↑spix1,↑dir1);
        IF spix1<>0 THEN
          IF dir=dir1
          THEN GenAssign(↓const,↓spix1,↓val)
          ELSE SemErr(2)
          END
        END
      END
    ELSE SemErr(4)
    END
  endsem
  sem :AssignOutAt:
    n:=n+1;
    CASE kind OF
      use:   IF styp=t THEN GenAssign(↓term,↓spix,↓n)
             ELSIF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF dir=dir1
                 THEN GenAssign(↓nonterm,↓spix,↓spix1)
                 ELSE SemErr(2)
                 END
               END
             END;
    | check: IF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF spix<>spix1 THEN SemErr(3) END;
                 IF dir<>dir1 THEN SemErr(2) END
               END
             END;
    | def:   NewAt(↓sy,↓spix,↓dir);
             IF styp=pr THEN GenAssign(↓term,↓spix,↓n) END
    END -- CASE
  endsem
TERMINALS
  ">"  "<"  ";"  ","  ":"  INSY  OUTSY
  IDENT<out:spix>  NUMBER<out:val>
NONTERMINALS
  Attributes<in:sy,styp,kind; out:sem1,sem2>
  InAttr<in:sy,styp,kind; out:sem1,n>
                      -- n: attribute counter
  OutAttr<in:sy,styp,kind,n; out:sem2,n>
RULES
Attributes<in:sy,styp,kind; out:sem1,sem2> =
  "<"                     sem sem1:=0; sem2:=0 endsem
  ( InAttr<in:sy,styp,kind; out:sem1,n>
    [ ";" OutAttr<in:sy,styp,kind,n; out:sem2,n> ]
  | OutAttr<in:sy,styp,kind,0; out:sem2,n>
  )
  ">"                     sem IF NOT CompleteAt(↓sy,↓n) THEN
                                SemErr(5)
                              END
                          endsem.
InAttr<in:sy,styp,kind; out:sem1,n> =
  INSY ":"                sem IF styp<>nt THEN SemErr(1) END;
                              dir:=down; n:=0
                          endsem
  ( IDENT<out:spix>       sem (AssignInAt) endsem
  | NUMBER<out:val>       sem (AssignNumber) endsem
  )
  { ","
    ( IDENT<out:spix>     sem (AssignInAt) endsem
    | NUMBER<out:val>     sem (AssignNumber) endsem
    )
  }                       sem IF kind=use THEN EmitAction(↑sem1) END
                          endsem.
OutAttr<in:sy,styp,kind,n; out:sem2,n> =
  OUTSY ":"               sem dir:=up endsem
  IDENT<out:spix>         sem (AssignOutAt) endsem
  { "," IDENT<out:spix>   sem (AssignOutAt) endsem
  }                       sem IF (kind=use) OR (styp=pr) THEN
                                EmitAction(↑sem2)
                              END
                          endsem.
ENDGRAM
If one of the context conditions is violated, the procedure SemErr(n) is
called, which emits an error message depending on n:
n error message
1: In-attributes for a pragma or terminal
2: Wrong attribute direction
3: Wrong attribute name
4: Formal attribute is a constant
5: Wrong number of attributes
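For the check context (attributes on the left-hand side of a production), the macros above reduce to a comparison against the declaration. The following Python function is a hypothetical restatement of that logic: it compares names and directions position by position and finally the attribute count. check_formals and its list-of-pairs arguments are assumptions for illustration.

```python
from __future__ import annotations


def check_formals(declared: list[tuple[str, str]],
                  used: list[tuple[str, str]]) -> list[int]:
    """Error numbers for the check context (left-hand side of a production).

    Both lists hold (name, direction) pairs, direction being "in" or "out".
    """
    errors: list[int] = []
    for n, (spix, direction) in enumerate(used, start=1):
        if n <= len(declared):                 # GetAt found attribute n
            spix1, dir1 = declared[n - 1]
            if spix != spix1:
                errors.append(3)               # wrong attribute name
            if direction != dir1:
                errors.append(2)               # wrong attribute direction
    if len(used) != len(declared):             # NOT CompleteAt(sy, n)
        errors.append(5)                       # wrong number of attributes
    return errors
```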
8 Applications
8.1 Applications in compiler construction
Attributed grammars are mainly used in compiler construction, more
precisely for the description of compilers. However, the description of an actual
compiler is far too complex to be used as an introductory example. Therefore,
in this section we will use Cocol to develop a lexical analyzer, which is part of
a compiler. This example is general enough to demonstrate all language
constructs of Cocol, and yet simple enough for a reader inexperienced with
attributed grammars to follow it. The application of Coco to an actual compiler
(the compiler description for Coco itself) can be found in Appendix F.
It is unusual to describe and to generate lexical analyzers with attributed
grammars. Normally, they are coded by hand since they must be very efficient
(lexical analysis takes the largest part of the compilation time). There are
special scanner generators which are designed to produce fast lexical
analyzers. Although Coco is not such a generator, run-time measurements show
that lexical analyzers can be implemented with Coco in practice, not only in
theory.
As an example, we will develop a lexical analyzer for Modula-2. First we
will give a general specification for lexical analyzers. Then we will prepare a
special specification of a lexical analyzer for Modula-2. Next we will describe
and build this lexical analyzer using Cocol. Finally we will explain some of
the problems that can arise. At the end of this section, we will specify the
semantic procedures used in the description of the lexical analyzer.
8.1.1 Specification of a lexical analyzer
General tasks
A lexical analyzer must at least perform the following tasks:
1. read and optionally print the source program;
2. skip meaningless character sequences such as blanks, comments, etc.;
3. recognize and tokenize terminals such as keywords, names, numbers,
and operators;
4. report lexical errors.
Usually, a lexical analyzer will recognize only one terminal per call and pass it
to the syntax analyzer. However, there are also analyzers that process the
entire source text at once, and write the symbol codes of the recognized
terminals to an intermediate file so that the syntax analyzer can read them later
on. The lexical analyzer described here is of the latter type.
Tasks of a lexical analyzer for Modula-2
A lexical analyzer for Modula-2 must recognize the following terminals:
Keywords
AND
ARRAY
BEGIN
BY
CASE
CONST
DEFINITION
DIV
DO
ELSE
ELSIF
END
EXIT
EXPORT
FOR
FROM
IF
IMPLEMENTATION
IMPORT
IN
LOOP
MOD
MODULE
NOT
OF
OR
POINTER
PROCEDURE
QUALIFIED
RECORD
REPEAT
RETURN
SET
THEN
TO
TYPE
UNTIL
VAR
WHILE
WITH
Names
Identifier = Letter {Letter | Digit}.
Letter = "A" | "B" | ... | "Z" | "a" | "b" | ... | "z".
Digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9".
Decimal constants
DecNumber = Digit {Digit}.
Hexadecimal constants
HexNumber = Digit {HexDigit} "H".
HexDigit = Digit | ("A" | "B" | "C" | "D" | "E" | "F").
Octal constants
OctalNumber = OctalDigit {OctalDigit} "B".
OctalDigit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7".
Real constants
RealNumber = Digit {Digit} "." {Digit}
             ["E" ["+" | "-"] Digit [Digit]].
Character constants
CharConst = "'" any "'" | '"' any '"'
          | OctalDigit {OctalDigit} "C".
Character strings
String = '"' {any} '"' | "'" {any} "'".
Comments
Comment = "(*" {Comment | any} "*)".
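The productions for numeric and character constants are regular, so they can be checked with regular expressions. This Python sketch transcribes them one by one; the name classify is hypothetical, and reading any inside a quoted character constant as any character other than the closing quote is an assumption. Note that the kinds differ only in their final character or in the presence of a period.

```python
from __future__ import annotations

import re

# One pattern per production above (CharConst covers both of its forms).
PATTERNS = [
    ("RealNumber",  r"[0-9]+\.[0-9]*(E[+-]?[0-9][0-9]?)?"),
    ("HexNumber",   r"[0-9][0-9A-F]*H"),
    ("OctalNumber", r"[0-7]+B"),
    ("CharConst",   r"[0-7]+C|'[^']'|\"[^\"]\""),
    ("DecNumber",   r"[0-9]+"),
]


def classify(text: str) -> str | None:
    """Name of the production that derives text, or None."""
    for name, pattern in PATTERNS:
        if re.fullmatch(pattern, text):
            return name
    return None
```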
Operators and separators
+      addition
-      subtraction
*      multiplication
/      real division
:=     assignment
&      logical and
=      equal
#      not equal
<>     not equal
<      less than
>      greater than
<=     less than or equal
>=     greater than or equal
( )    round parentheses
[ ]    index parentheses
{ }    set brackets
^      pointer
,      comma
.      period
;      semicolon
:      colon
..     range operator
|      variant operator
Context conditions
1. Decimal, hexadecimal, or octal constants must be in the range 0 to 65535.
2. The numerical value of character constants must be in the range 0 to 255.
3. Real constants must be in the range 1.4694E-39 to 1.7014E+38.
4. Character strings must not extend over line boundaries.
8.1.2 Description of a lexical analyzer for Modula-2
In the previous section, we described the lexical structure of Modula-2 by a
context-free grammar. Now we will have to attribute it. The following points
need special attention.
The lexical analyzer supplies the terminals for syntax analysis. These are
the nonterminals of the lexical analyzer, whereas the terminals of the lexical
analyzer are the characters of the source text. These characters must be
supplied by a mini-scanner with the following tasks:
1. read and print the source program;
2. supply the characters of the source text as terminals;
3. treat the character sequences '..', '(*' and '*)' as special terminals (to
simplify the attributed grammar).
This still leaves enough work for the lexical analyzer proper. In accordance
with Section 6.4.2, we will implement the mini-scanner in the procedure
GetSy of the module Scannerlex. The mini-scanner is so simple that we
refrain from describing it further.
Now we will specify the lexical analyzer of Modula-2 with Cocol.
GRAMMAR Scanner
SEMANTIC DECLARATIONS
  FROM Conversions IMPORT Convert, ConvertReal;
  FROM Errors IMPORT SemErr;
  FROM ListMod IMPORT EnterString, Hash;
  FROM Scannerlex IMPORT typ, line, col;
  FROM OutMod IMPORT Symboltype, Emit, EmitConstant,
                     EmitIdent, EmitString;
  --TYPE Symboltype = (*token codes*)
  --  (eofsy, andsy, divsy, timessy, slashsy, modsy, notsy, plussy,
  --   minussy, orsy, eqlsy, neqsy, grtsy, geqsy, lsssy, leqsy,
  --   insy, lparsy, rparsy, lbracksy, rbracksy, lconbrsy, rconbrsy,
  --   commasy, semicolonsy, periodsy, colonsy, rangesy, constsy,
  --   typesy, varsy, arraysy, recordsy, variantsy, setsy, pointersy,
  --   tosy, arrowsy, importsy, exportsy, fromsy, qualifiedsy,
  --   beginsy, casesy, ofsy, ifsy, thensy, elsifsy, elsesy, loopsy,
  --   exitsy, repeatsy, untilsy, whilesy, dosy, withsy, forsy, bysy,
  --   returnsy, becomessy, endsy, callsy, definitionsy,
  --   implementationsy, proceduresy, modulesy, ident, cardcon,
  --   intcardcon, realcon, charcon, stringcon, eolsy);
  CONST
    blmax = 80;                        -- buffer length (every token must
                                       -- fit on an 80-character line)
  VAR
    addr:    CARDINAL;                 -- string address in string list
    b:       ARRAY[1..blmax] OF CHAR;  -- buffer
    bl:      CARDINAL;                 -- buffer length
    ch:      CHAR;                     -- auxiliary
    firstch: CHAR;                     -- first character in a string
    i:       CARDINAL;                 -- auxiliary
    length:  CARDINAL;                 -- string length
    rval:    REAL;                     -- value of real-constant
    spix:    CARDINAL;                 -- spelling index of identifier
    sy:      Symboltype;               -- token code
    symcol:  CARDINAL;                 -- symbol column
    val:     CARDINAL;                 -- constant value
MACROS
  sem :AddCh:            -- it is assumed that lines
                         -- are not longer than 80 characters
    bl:=bl+1; b[bl]:=ch
  endsem
TERMINALS
  ".."   "(*"   "*)"   chr4   chr5   chr6   chr7   chr8
  chr9   chr10  chr11  chr12  CR     chr14  chr15  chr16
  chr17  chr18  chr19  chr20  chr21  chr22  chr23  chr24
  chr25  chr26  chr27  chr28  chr29  chr30  chr31
  " "    "!"    '"'    "#"    "$"    "%"    "&"    "'"
  "("    ")"    "*"    "+"    ","    "-"    "."    "/"
  "0"    "1"    "2"    "3"    "4"    "5"    "6"    "7"
  "8"    "9"    ":"    ";"    "<"    "="    ">"    "?"
  "@"    A      B      C      D      E      F      G
  H      I      J      K      L      M      N      O
  P      Q      R      S      T      U      V      W
  X      Y      Z      "["    "\"    "]"    "^"    "_"
  "`"    a      b      c      d      e      f      g
  h      i      j      k      l      m      n      o
  p      q      r      s      t      u      v      w
  x      y      z      "{"    "|"    "}"    chr126 chr127
NONTERMINALS
  Scanner
  Symbol
  Identifier<out:sy,spix,symcol>
  Number<out:sy,val,symcol>
  String<out:sy,addr,length,firstch,symcol>
  Comment
  Letter<out:ch>
  Digit<out:ch>
  HexDigit<out:ch>
RULES
Scanner =
  {Symbol}                sem Emit(↓eofsy,↓col) endsem.
Symbol =
  {" "}                                   -- skip blanks
  ( Identifier<out:sy,spix,symcol>
                  sem IF sy=ident
                      THEN EmitIdent(↓spix,↓symcol)   -- identifier
                      ELSE Emit(↓sy,↓symcol)          -- keyword
                      END
                  endsem
  | Number<out:sy,val,symcol>
                  sem EmitConstant(↓sy,↓val,↓symcol) endsem
                  -- cardcon, intcardcon, realcon, charcon
  | String<out:sy,addr,length,firstch,symcol>
                  sem IF sy=stringcon
                      THEN EmitString(↓addr,↓length,↓symcol)
                      ELSE EmitConstant(↓charcon,
                                        ↓ORD(firstch),↓symcol)
                      END
                  endsem
  | Comment
  | ";"           sem Emit(↓semicolonsy,↓col) endsem
  | "="           sem Emit(↓eqlsy,↓col) endsem
  | "("           sem Emit(↓lparsy,↓col) endsem
  | ")"           sem Emit(↓rparsy,↓col) endsem
  | "["           sem Emit(↓lbracksy,↓col) endsem
  | "]"           sem Emit(↓rbracksy,↓col) endsem
  | "{"           sem Emit(↓lconbrsy,↓col) endsem
  | "}"           sem Emit(↓rconbrsy,↓col) endsem
  | "*"           sem Emit(↓timessy,↓col) endsem
  | ","           sem Emit(↓commasy,↓col) endsem
  | "/"           sem Emit(↓slashsy,↓col) endsem
  | "+"           sem Emit(↓plussy,↓col) endsem
  | "-"           sem Emit(↓minussy,↓col) endsem
  | "^"           sem Emit(↓arrowsy,↓col) endsem
  | "|"           sem Emit(↓variantsy,↓col) endsem
  | "#"           sem Emit(↓neqsy,↓col) endsem
  | "&"           sem Emit(↓andsy,↓col) endsem
  | "."           sem Emit(↓periodsy,↓col) endsem
  | ".."          sem Emit(↓rangesy,↓col) endsem
  | CR            sem Emit(↓eolsy,↓col) endsem
  | ":" ( "="     sem Emit(↓becomessy,↓col) endsem
        | eps     sem Emit(↓colonsy,↓col) endsem
        )
  | "<" ( ">"     sem Emit(↓neqsy,↓col) endsem
        | "="     sem Emit(↓leqsy,↓col) endsem
        | eps     sem Emit(↓lsssy,↓col) endsem
        )
  | ">" ( "="     sem Emit(↓geqsy,↓col) endsem
        | eps     sem Emit(↓grtsy,↓col) endsem
        )
  ).
Identifier<out:sy,spix,symcol> =
                          sem symcol:=col; bl:=0 endsem
  Letter<out:ch>          sem (AddCh) endsem
  { Letter<out:ch>        sem (AddCh) endsem
  | Digit<out:ch>         sem (AddCh) endsem
  }                       sem Hash(↓b,↓bl,↑sy,↑spix)
                              -- sy is identifier or keyword
                          endsem.
Number<out:sy,val,symcol> =
                          sem symcol:=col; bl:=0 endsem
  Digit<out:ch>           sem (AddCh) endsem
  { HexDigit<out:ch>      sem (AddCh) endsem
  }
  ( H                     sem bl:=bl+1; b[bl]:="H";
                              Convert(↓b,↓bl,↑sy,↑val)
                          endsem
  | "."                   sem bl:=bl+1; b[bl]:=CHR(typ) endsem
    { Digit<out:ch>       sem (AddCh) endsem
    }
    [ E                   sem bl:=bl+1; b[bl]:=CHR(typ) endsem
      [ "+" | "-"         sem bl:=bl+1; b[bl]:=CHR(typ) endsem
      ]
      Digit<out:ch>       sem (AddCh) endsem
      [ Digit<out:ch>     sem (AddCh) endsem
      ]
    ]                     sem ConvertReal(↓b,↓bl,↑rval);
                              sy:=realcon;
                              val:=CARDINAL(rval)
                          endsem
  | eps                   sem Convert(↓b,↓bl,↑sy,↑val) endsem
  ).
String<out:sy,addr,length,firstch,symcol> =
                          sem symcol:=col; bl:=0 endsem
  ( "'"
    { ".."                sem b[bl+1]:="."; b[bl+2]:="."; bl:=bl+2
                          endsem
    | "(*"                sem b[bl+1]:="("; b[bl+2]:="*"; bl:=bl+2
                          endsem
    | "*)"                sem b[bl+1]:="*"; b[bl+2]:=")"; bl:=bl+2
                          endsem
    | CR                  sem SemError(↓1,↓line,↓col); bl:=0 endsem
    | any                 sem bl:=bl+1; b[bl]:=CHR(typ) endsem
    }
    "'"
  | '"'
    { ".."                sem b[bl+1]:="."; b[bl+2]:="."; bl:=bl+2
                          endsem
    | "(*"                sem b[bl+1]:="("; b[bl+2]:="*"; bl:=bl+2
                          endsem
    | "*)"                sem b[bl+1]:="*"; b[bl+2]:=")"; bl:=bl+2
                          endsem
    | CR                  sem SemError(↓1,↓line,↓col); bl:=0 endsem
    | any                 sem bl:=bl+1; b[bl]:=CHR(typ) endsem
    }
    '"'
  )                       sem length:=bl;
                              IF length=1
                              THEN sy:=charcon; firstch:=b[1]
                              ELSE
                                sy:=stringcon;
                                EnterString(↓b,↓bl,↑addr)
                              END
                          endsem.
Comment =
  "(*" { Comment
       | any
       }
  "*)".
Letter<out:ch> =
  ( A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|
    a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z )
                          sem ch:=CHR(typ) endsem.
Digit<out:ch> =
  ( "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9" )
                          sem ch:=CHR(typ) endsem.
HexDigit<out:ch> =
    Digit<out:ch>
  | ( A|B|C|D|E|F )       sem ch:=CHR(typ) endsem.
ENDGRAM
The rules for Number and String need some explanation:
Numerical constants cannot be converted while they are being recognized
because decimal, hexadecimal, octal, and real constants can be distinguished
only by their last character or by a decimal point. Their text must therefore be
stored and converted later.
Strings also have their peculiarities. Our mini-scanner returns the
character sequences '..', '(*' and '*)' as single terminals. If one of these
sequences appears within a string it has to be expanded again, since strings
must be stored in their original form. Therefore, the rule for strings gets more
complicated than expected.
On the other hand, the description of strings and comments with the
symbol any looks very simple and elegant. In accordance with Section 5.2.1,
any stands for all those terminals that cannot occur in its place at
this point in the grammar (in String: all terminals except '..', '(*', '*)', CR,
and ''' (or '"'); in Comment: all terminals except '(*' and '*)'). The
example also shows the semantic processing of any. In a string, the symbol
recognized by any is processed using the global variable typ (see Section
6.4.2).
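The expansion that the String rule performs can be traced on a stream of mini-scanner token codes. The Python sketch below is hypothetical (scan_string is an illustrative name); it uses the token numbers documented for Scannerlex in Section 8.1.3 ('..'=1, '(*'=2, '*)'=3) and ASCII 13 for CR. Unlike the grammar, which reports the error and continues, it simply raises an exception when a string runs into CR.

```python
from __future__ import annotations

CR = 13
# Token codes 1..3 are the mini-scanner's special terminals.
EXPANSION = {1: "..", 2: "(*", 3: "*)"}


def scan_string(tokens: list[int]) -> tuple[str, str]:
    """tokens starts with the opening quote; returns (kind, text)."""
    quote = tokens[0]
    buf: list[str] = []
    for typ in tokens[1:]:
        if typ == quote:
            text = "".join(buf)
            return ("charcon" if len(text) == 1 else "stringcon"), text
        if typ == CR:
            raise ValueError("string extends over a line boundary")
        # Special terminals are expanded back to their two characters;
        # any other token code is the ASCII value of a single character.
        buf.append(EXPANSION.get(typ, chr(typ)))
    raise ValueError("string is not terminated")
```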
The reason for introducing the terminals '..', '(*' and '*)' is not
obvious and requires an explanation: the symbol '..' is necessary because
otherwise a lookahead of two characters would be needed (the first period in the
sequence '1..2' may be a decimal point or the start of a range operator).
Although comments can be processed with a single lookahead character, it
simplifies the processing of comments considerably if we treat the sequences
'(*' and '*)' as single terminals.
LL(1) Conflicts
As shown by Example 8.1, it is often difficult to avoid LL(1) conflicts when
lexical structures are described by an attributed grammar.
8.1 Example LL(1) conflicts in lexical structures
Scanner = {Symbol}.
Symbol = ...
       | "="
       | ">" ["="].
This situation represents an LL(1) conflict: if '>' is read and the
next character is '=', the syntax analyzer cannot decide whether this
character belongs to the symbol '>=' or whether it constitutes a separate
symbol '='. Such conflicts also appear in the symbols ':=', '<>', '<=',
Identifier, and Number. However, they are not critical since the syntax
analyzer always chooses the first alternative it encounters during
analysis. In the example above, this means that '=' is correctly considered part
of the symbol '>=' rather than being recognized as a separate symbol.
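The same resolution, always preferring the longer symbol, can be illustrated with a hand-written fragment. In this hypothetical Python sketch, tokenize and its two tables are illustrative names; the token names are taken from Symboltype, and only the symbols involved in the conflict are handled.

```python
from __future__ import annotations

TWO_CHAR = {":=": "becomessy", "<>": "neqsy", "<=": "leqsy", ">=": "geqsy"}
ONE_CHAR = {":": "colonsy", "<": "lsssy", ">": "grtsy", "=": "eqlsy"}


def tokenize(text: str) -> list[str]:
    tokens: list[str] = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1                             # skip blanks
        elif text[pos:pos + 2] in TWO_CHAR:      # like entering ["="] after ">"
            tokens.append(TWO_CHAR[text[pos:pos + 2]])
            pos += 2
        elif text[pos] in ONE_CHAR:
            tokens.append(ONE_CHAR[text[pos]])
            pos += 1
        else:
            raise ValueError(f"unexpected character {text[pos]!r}")
    return tokens
```

A separate '=' is produced only when something, such as a blank, separates it from the preceding '>'.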
Speed
A lexical analyzer implemented with Coco runs at approximately one-half the
speed of a hand-coded analyzer. A 35% speed gain can be achieved if the
nonterminals Letter and Digit with their many alternatives are already
recognized as terminal classes by the mini-scanner.
Assessment
The example has shown how easily a translation process can be described with
Cocol. At first glance, the grammar may seem a bit confusing. Yet, as
soon as one becomes familiar with this notation, the following advantages can
be observed:
1. The grammar is short and precise. For the recognition of a symbol, it is
sufficient to write its name without any additional actions.
2. The syntax is clearly separated from the semantics. Thus the syntax is
more explicit than it is in a hand-coded compiler.
3. From the syntax declarations, one can see immediately which terminals
and nonterminals are in the language.
4. Error-handling actions need not be described explicitly.
5. Many constructs, like nested comments, can be described with any in a
straightforward and elegant way which is hard to surpass.
Of course, there are some parts of the grammar which are not very simple to
read, e.g. the production for Number. It has a rather complex structure, but
this only shows that Cocol can also handle difficult constructs. After all, the
production for Number describes four different kinds of numerical constants.
This would be difficult to read in a hand-coded lexical analyzer, too, and could
hardly be written in this short and concise form using a conventional
programming language.
8.1.3 Semantic procedures for lexical analysis
We decompose the semantic procedures of the attributed grammar into four
modules Scannerlex, OutMod, ListMod, and Conversions and specify
their definition modules, but omit their implementation modules due to space
limits.
DEFINITION MODULE Scannerlex;
VAR typ,col,line: CARDINAL;  (*information about the current token*)
    at: ARRAY[1..10] OF CHAR;  (*not needed here*)
PROCEDURE GetSy;
END Scannerlex.
Scannerlex reads and prints a source text and returns every single character as
a separate token. The token number as well as its column and its line number
are returned by GetSy in the global variables typ, col, and line. The token
numbers are the ASCII values of the source characters (exceptions: eofch=0,
'..'=1, '(*'=2, and '*)'=3). After the last character in the source text has been
read, GetSy always returns eofch.
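A minimal model of this interface is sketched below in Python. MiniScanner and get_sy are hypothetical names. Printing the source, which Scannerlex also does, is omitted; reporting a newline as CR (ASCII 13) and counting lines and columns from 1 are assumptions.

```python
class MiniScanner:
    """Hypothetical model of Scannerlex.GetSy: one token per character."""

    EOFCH, CR = 0, 13
    SPECIAL = {"..": 1, "(*": 2, "*)": 3}

    def __init__(self, source: str) -> None:
        self.src = source
        self.pos = 0
        self._line, self._col = 1, 1          # position of the next character
        self.typ, self.line, self.col = self.EOFCH, 0, 0

    def get_sy(self) -> None:
        self.line, self.col = self._line, self._col
        if self.pos >= len(self.src):
            self.typ = self.EOFCH             # stays at eofch once the text ends
            return
        pair = self.src[self.pos:self.pos + 2]
        if pair in self.SPECIAL:
            self.typ, size = self.SPECIAL[pair], 2
        elif self.src[self.pos] == "\n":
            self.typ, size = self.CR, 1       # line end reported as CR (assumption)
        else:
            self.typ, size = ord(self.src[self.pos]), 1
        for ch in self.src[self.pos:self.pos + size]:
            if ch == "\n":
                self._line, self._col = self._line + 1, 1
            else:
                self._col += 1
        self.pos += size
```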
DEFINITION MODULE OutMod;
TYPE Symboltype = (*token codes*)
(eofsy, andsy, divsy, timessy, slashsy, modsy, notsy, plussy,
minussy, orsy, eqlsy, neqsy, grtsy, geqsy, lsssy, leqsy, insy,
lparsy, rparsy, lbracksy, rbracksy, lconbrsy, rconbrsy, commasy,
semicolonsy, periodsy, colonsy, rangesy, constsy, typesy, varsy,
arraysy, recordsy, variantsy, setsy, pointersy, tosy, arrowsy,
importsy, exportsy, fromsy, qualifiedsy, beginsy, casesy, ofsy,
ifsy, thensy, elsifsy, elsesy, loopsy, exitsy, repeatsy,
untilsy, whilesy, dosy, withsy, forsy, bysy, returnsy,
becomessy, endsy, callsy, definitionsy, implementationsy,
proceduresy, modulesy, ident, cardcon, intcardcon, realcon,
charcon, stringcon, eolsy);
PROCEDURE Emit(sy:Symboltype; col:CARDINAL);
PROCEDURE EmitConstant(sy:Symboltype; val,col:CARDINAL);
PROCEDURE EmitIdent(spix,col:CARDINAL);
PROCEDURE EmitString(addr,len,col:CARDINAL);
END OutMod.
The module OutMod contains procedures to write symbols to an intermediate
language file.
Emit writes a symbol without attributes (e.g. a keyword, an operator or a
single character) to the intermediate language. It emits a word which contains
the symbol type sy and the column col of that symbol.
EmitConstant writes a numeric constant to the intermediate language. It
emits two words, the first of which contains the type sy and the column col
of the symbol and the second the constant value val.
EmitIdent writes a name to the intermediate language. It emits two
words, the first of which contains the symbol type 'ident' and the column col
and the second the spelling index (spix) of the name.
EmitString writes a string to the intermediate language. It emits three
words, the first of which contains the symbol type 'stringcon' and the column
col, the second the string address addr and the third the string length len.
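How a symbol and its column share one word is not specified here. The Python sketch below assumes a 16-bit word with the symbol code in the high byte and the column in the low byte; that layout and the snake_case function names are hypothetical. The symbol codes are the ordinal values of Symboltype.

```python
from __future__ import annotations

from enum import IntEnum

# Ordinals follow the order of the Symboltype declaration in OutMod.
Symbol = IntEnum("Symbol", """
    eofsy andsy divsy timessy slashsy modsy notsy plussy
    minussy orsy eqlsy neqsy grtsy geqsy lsssy leqsy insy
    lparsy rparsy lbracksy rbracksy lconbrsy rconbrsy commasy
    semicolonsy periodsy colonsy rangesy constsy typesy varsy
    arraysy recordsy variantsy setsy pointersy tosy arrowsy
    importsy exportsy fromsy qualifiedsy beginsy casesy ofsy
    ifsy thensy elsifsy elsesy loopsy exitsy repeatsy
    untilsy whilesy dosy withsy forsy bysy returnsy
    becomessy endsy callsy definitionsy implementationsy
    proceduresy modulesy ident cardcon intcardcon realcon
    charcon stringcon eolsy""".split(), start=0)


def word(sy: Symbol, col: int) -> int:
    # Assumed 16-bit layout: symbol code in the high byte, column in the low byte.
    return (int(sy) << 8) | col


def emit(out: list[int], sy: Symbol, col: int) -> None:
    out.append(word(sy, col))


def emit_constant(out: list[int], sy: Symbol, val: int, col: int) -> None:
    out.extend([word(sy, col), val])


def emit_ident(out: list[int], spix: int, col: int) -> None:
    out.extend([word(Symbol.ident, col), spix])


def emit_string(out: list[int], addr: int, length: int, col: int) -> None:
    out.extend([word(Symbol.stringcon, col), addr, length])
```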
DEFINITION MODULE ListMod;
FROM OutMod IMPORT Symboltype;
PROCEDURE EnterString(buffer:ARRAY OF CHAR; len:CARDINAL;
VAR addr:CARDINAL);
PROCEDURE Hash(buffer:ARRAY OF CHAR; len:CARDINAL; VAR sy:Symboltype;
VAR spix:CARDINAL);
END ListMod.
ListMod handles the name list and the string list of the scanner. EnterString
enters a string (stored in buffer[1..len]) into the string list and returns its
address addr. Hash searches for a name (stored in buffer[1..len]) in the name
list. If the name is not found, it is entered. For keywords Hash returns the token
code of the keyword and spix is 0. Otherwise Hash returns the token code 'ident'
and spix is the address (spelling index) of the name in the name list.
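The following hypothetical Python sketch mirrors the behaviour described for Hash and EnterString. A dictionary replaces the real hash table, the name and string lists are flat character lists with 1-based addresses, and each name is followed by a terminating 0C character; all of these details are assumptions. Keyword token codes follow the naming pattern of Symboltype (WHILE gives whilesy).

```python
from __future__ import annotations

KEYWORDS = set("""
    AND ARRAY BEGIN BY CASE CONST DEFINITION DIV DO ELSE ELSIF END
    EXIT EXPORT FOR FROM IF IMPLEMENTATION IMPORT IN LOOP MOD MODULE
    NOT OF OR POINTER PROCEDURE QUALIFIED RECORD REPEAT RETURN SET
    THEN TO TYPE UNTIL VAR WHILE WITH""".split())


class ListModel:
    """Hypothetical name list and string list; a dict replaces the hash table."""

    def __init__(self) -> None:
        self.names: list[str] = []          # flat characters, "\0" after each name
        self.spix_of: dict[str, int] = {}   # name -> spelling index
        self.strings: list[str] = []        # flat characters of all strings

    def hash(self, buffer: list[str], length: int) -> tuple[str, int]:
        word = "".join(buffer[:length])
        if word in KEYWORDS:
            return word.lower() + "sy", 0   # keyword: token code, spix = 0
        if word not in self.spix_of:        # not found: enter it
            self.spix_of[word] = len(self.names) + 1
            self.names.extend(word + "\0")
        return "ident", self.spix_of[word]

    def enter_string(self, buffer: list[str], length: int) -> int:
        addr = len(self.strings) + 1        # 1-based address of the string
        self.strings.extend(buffer[:length])
        return addr
```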
DEFINITION MODULE Conversions;
FROM OutMod IMPORT Symboltype;
PROCEDURE Convert(buffer:ARRAY OF CHAR; len:CARDINAL;
VAR sy:Symboltype; VAR val:CARDINAL);
PROCEDURE ConvertReal(buffer:ARRAY OF CHAR; len:CARDINAL;
VAR rval:REAL);
END Conversions.
The module Conversions converts digit strings to cardinal or real numbers.
The procedure Convert converts a digit string (stored in buffer[1..len])
to a numeric constant or a character constant. The digit string may have the
following syntax:
digitstring = digit {digit}                  -- decimal constant
            | digit {hexdigit} 'H'           -- hex constant
            | octaldigit {octaldigit} 'B'    -- octal constant
            | octaldigit {octaldigit} 'C'.   -- character constant
For numeric constants the output parameter sy is cardcon and val is in the
range 0..65535; for character constants sy is charcon and val is in the range
0C..377C.
ConvertReal converts a digit string (stored in buffer[1..len]) to its real
value rval. The syntax of the digit string is
digitstring = digit {digit} '.' {digit} ['E' ['+'|'-'] digit [digit]].
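A minimal Python sketch of the two procedures follows. It assumes that the kind of a constant is determined solely by its last character, as described above, and that an out-of-range or malformed constant raises an exception; the error handling of the real module is not shown. convert and convert_real are hypothetical names, and the range check for real constants is omitted.

```python
from __future__ import annotations


def convert(buffer: list[str], length: int) -> tuple[str, int]:
    """Hypothetical Convert: the last character selects the kind of constant."""
    s = "".join(buffer[:length])
    if s.endswith("H"):
        sy, val = "cardcon", int(s[:-1], 16)
    elif s.endswith("B"):
        sy, val = "cardcon", int(s[:-1], 8)
    elif s.endswith("C"):
        sy, val = "charcon", int(s[:-1], 8)
    else:
        sy, val = "cardcon", int(s, 10)
    # Context conditions 1 and 2 of Section 8.1.1.
    limit = 255 if sy == "charcon" else 65535
    if val > limit:
        raise ValueError(f"constant {s} out of range 0..{limit}")
    return sy, val


def convert_real(buffer: list[str], length: int) -> float:
    # Python's float() accepts the syntax above, e.g. "1." or "2.5E+3".
    return float("".join(buffer[:length]))
```

Malformed digits, such as 9 in an octal constant, make int() raise ValueError as well.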
8.2 Applications in software engineering
An attributed grammar as a description method and a compiler compiler as an
implementation tool are not limited to compiler construction. They can also be
useful in other fields of software engineering.
The reason why compiler construction techniques can be generally used
in software engineering is that most large programs have the following
characteristics:
1. Input streams are sufficiently complex to be described in terms of syntax
and semantics.
2. The structure of the input text often determines the logical structure of the
entire program or of a large portion of it.
This wide field of applications is remarkable. We will now show that the well-known
Jackson method of program design can be regarded as a special case of
program design with attributed grammars. With this in mind, in this section
the compiler description language is emphasized while the compiler compiler
stays in the background.
8.2.1 Attributed grammars as a software design method
The use of attributed grammars automatically leads to a two-step design
process: In the first step (coarse design) the problem is decomposed into its
syntactical and semantical parts. Here, the attributed grammar serves as a design
method. In the second step (refined design) the semantic procedures are
designed from their specifications in the coarse design.
The creation of the coarse design consists of the following steps, which
may be executed sequentially or iteratively:
1. Write the grammar. The syntactic structure of the input text is described
by a context-free grammar.
2. Find attributes. Starting from the meaning of each syntax symbol, one
tries to find out which (semantic) attributes should be attached to it. Then
one defines these attributes and their occurrences in the grammar rules.
With some experience and a proper understanding of the problem the
right choice is almost automatic. This step is therefore also a good check
on correct understanding of the problem.
3. Prepare context conditions. Possibly further attributes may be necessary
for this process.
4. Define semantic procedures. In this step, all procedures which are used
in semantic actions are defined. The refinement of semantic actions into
code and procedure calls may again be done in a coarse or fine manner.
Using the first approach, one may associate a special semantic procedure
with each semantic action; using the latter approach, one may describe
each semantic action in terms of elementary operations of a programming
language without calling semantic procedures. Since many of the
semantic procedures are usually access procedures to data structures, they
support a modular design in the form of data capsules. The collection of all
procedures shows which operations can be performed with the various
data structures and which relations exist between the data structures.
5. Set up the attributed grammar. One combines the context-free grammar,
the attributes, the semantic actions, and any context conditions into a
proper attributed grammar.
After these five steps, the coarse design is completed and the following has
been accomplished:
1. The problem has been decomposed into three parts: syntax, context
conditions, and semantic actions.
2. The attributes and the data structures derived from them are the terms in
which the problem solution can be appropriately described.
3. The access routines to the data structures and all other algorithms required
for the solution are defined by the semantic procedures.
This completes the design method with attributed grammars. The result is
sufficiently abstract to fix only the essential semantic design decisions but to leave
enough freedom to the implementor. On the other hand it is sufficiently
concrete to specify explicitly those details that should not be left to the decision of
the implementor.
The result of the coarse design, consisting of a system of attributes,
semantic procedures, and an attributed grammar, can be viewed as the
specification for the refined design, since it describes what is to be done but not how it
should be done.
The next step is the refined design which may now exclusively
concentrate on the semantic procedures without having to consider any syntactic
problems.
However, coarse design and refined design may influence each other.
After the definition of the attributes, one may find that the semantic procedures
are either too abstract or too concrete, too complex or too simple. For
example, too many access procedures to the data structures of a module may
indicate that it would have been better to add a lower level of abstraction, and to
divide the large module into several smaller ones. The concise and formal
notation of attributed grammars encourages one to try several approaches and to
check their consequences without much effort, even when the task is large.
The refined design is followed by the implementation. Only a lexical
analyzer has to be written here; the rest is done by the compiler compiler.
8.2.2 The telegram problem as an example
Henderson and Snowdon [1972] presented the following problem, which is
known as the 'telegram problem':
A stream of telegrams is to be processed. Each telegram is terminated by
the string 'ZZZZ'. The telegram stream is terminated when an empty
telegram followed by 'ZZZZ' arrives. The words in a telegram are to be
counted. Long words with more than 12 characters are to be counted
separately. After each telegram, the counter values are to be printed. The
telegrams are read and subsequently printed in lines of 100-120 characters.
Superfluous blanks are to be eliminated. The maximum word length is 20
characters. Longer words are to cause the program to stop.
Since the input consists of structured data, and its structure will significantly
determine the algorithm, this task is well suited for the application of attributed
grammars, and a subsequent implementation with a compiler compiler.
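As a preview of where the design steps below lead, the grammar of step 1 can be recognized by a small recursive-descent program that computes the attributes n and l of step 2 while it reads. This Python sketch is hypothetical: the function names are illustrative, splitting the input on white space stands in for the lexical analyzer (and also removes superfluous blanks), and printing the telegrams in lines of 100-120 characters is left out.

```python
from __future__ import annotations

END_WORD = "ZZZZ"
LONG_WORD = 12   # words with more than 12 characters count as long
MAX_WORD = 20    # longer words stop the program


def parse_text_telegram(words: list[str], pos: int) -> tuple[int, int, int]:
    """TextTelegram = textword {textword} endword. Returns (n, l, next pos)."""
    n = l = 0
    while True:
        if pos >= len(words):
            raise ValueError("telegram not terminated by ZZZZ")
        w = words[pos]
        pos += 1
        if w == END_WORD:
            return n, l, pos
        if len(w) > MAX_WORD:
            raise ValueError(f"word longer than {MAX_WORD} characters")
        n += 1
        if len(w) > LONG_WORD:
            l += 1


def parse_telegram_stream(text: str) -> list[tuple[int, int]]:
    """TelegramStream = {TextTelegram} EmptyTelegram; (n, l) per telegram."""
    words = text.split()          # stands in for the lexical analyzer
    pos, counts = 0, []
    while pos < len(words) and words[pos] != END_WORD:
        n, l, pos = parse_text_telegram(words, pos)
        counts.append((n, l))
    if pos >= len(words):         # EmptyTelegram = endword is missing
        raise ValueError("stream not terminated by an empty telegram")
    return counts
```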
The design steps for the solution of the telegram problem are:
1. Set up the grammar of the input data
Terminals:
textword a word in a telegram
endword end word (= ZZZZ)
Sec. 8.2
Applications in software engineering
185
Nonterminals:
TelegramStream the total telegram stream
TextTelegram a text telegram (including its end word)
EmptyTelegram an empty telegram containing only the end word
Context-free grammar:
TelegramStream = {TextTelegram} EmptyTelegram.
TextTelegram = textword {textword} endword.
EmptyTelegram = endword.
2. Define attributes. From the specification of the task, three attributes
result:
w array of char the text of a word
n integer the number of words in a telegram
l integer the number of long words in a telegram
3. Assign attributes to the grammar symbols. In this step, we list the
grammar symbols and attach attributes to them.
textword↑w recognizes a word and provides its text w.
TextTelegram↑n↑l recognizes and prints a telegram with n
words, of which l words are long.
The remaining grammar symbols have no attributes.
Note that the attributed symbols are viewed from an algorithmic
point of view (i.e. we do not say 'TextTelegram represents a telegram', but
rather 'TextTelegram recognizes a telegram'). The verbal description of
the attributed symbols should specify all attributes of the symbol. It
should be accurate enough to be used as a specification of the translation
process. This is usually possible and easy to accomplish since the few
items involved have already been previously defined.
4. Define semantic procedures. The actions the program must execute can
be seen from the problem description:
(a) read the source text, recognize and count the words;
(b) print the source text with a different line length;
(c) print the counter values.
Reading the source text is the task of the lexical analyzer and does not
concern us here. The words are counted with the attributes n and l.
Therefore, the only candidates for semantic procedures are those which
print the text and the counter values. A variable will probably be needed
to ensure that the line length will not exceed 120 characters. It will be
initialized at the beginning of each telegram, and will be checked and increased
when a new word is added to the line. A line buffer may also be needed.
Following the principle of stepwise refinement, we are not yet interested
in the implementation details here. Rather, we define the following three
procedures which will do the whole printing job.
OutInit initialize the output of a telegram;
OutWord(↓w) print the word w according to the problem definition;
OutAccount(↓n↓l) print the counter values n and l with an
appropriate text
5. Write down the attributed grammar. Having completed steps 1
through 4, the attributed grammar is almost self-evident now:
TelegramStream =
  { sem OutInit endsem
    TextTelegram↑n↑l sem OutAccount(↓n↓l) endsem
  } EmptyTelegram.
TextTelegram↑n↑l =
  textword↑w where (|w| <= 20)
    sem n:=1;
        if |w|>12 then l:=1 else l:=0 end;
        OutWord(↓w)
    endsem
  { textword↑w where (|w| <= 20)
    sem n:=n+1;
        if |w|>12 then l:=l+1 end;
        OutWord(↓w)
    endsem
  } endword.
EmptyTelegram = endword.
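The attributed grammar maps almost one-to-one onto a recursive-descent program. The following Python sketch illustrates this under stated assumptions; it is not the authors' implementation. The lexical analyzer (splitting the input at blanks), the class Printer with the methods out_init, out_word and out_account standing in for OutInit, OutWord and OutAccount, the text printed for the counters, and the exception TelegramError are all hypothetical.

```python
END_WORD = "ZZZZ"
MAX_WORD = 20   # longer words are to stop the program
LONG_WORD = 12  # words with more than 12 characters count as long
MAX_LINE = 120  # upper bound of the printed line length


class TelegramError(Exception):
    """Hypothetical: raised for a word longer than MAX_WORD or a missing end word."""


class Printer:
    """Hypothetical printing module behind OutInit, OutWord and OutAccount."""

    def __init__(self) -> None:
        self.lines: list[str] = []  # printed lines, kept in memory
        self.words: list[str] = []  # line buffer for the line being filled
        self.length = 0             # length of that line, blanks included

    def out_init(self) -> None:
        # OutInit: every telegram starts with an empty line buffer.
        self.words, self.length = [], 0

    def out_word(self, w: str) -> None:
        # OutWord: one blank before every word except the first of a line,
        # so superfluous blanks of the input never reach the output.
        needed = len(w) + (1 if self.words else 0)
        if self.words and self.length + needed > MAX_LINE:
            self._flush()
            needed = len(w)
        self.words.append(w)
        self.length += needed

    def out_account(self, n: int, l: int) -> None:
        # OutAccount: print the rest of the telegram, then the counters.
        if self.words:
            self._flush()
        self.lines.append(f"words: {n}, long words: {l}")

    def _flush(self) -> None:
        self.lines.append(" ".join(self.words))
        self.words, self.length = [], 0


def telegram_stream(text: str, out: Printer) -> None:
    """Recursive-descent translation of the attributed grammar above."""
    tokens = text.split()  # assumed lexical analyzer: blank-separated words
    pos = 0

    def sym() -> str:
        if pos >= len(tokens):
            raise TelegramError("missing end word")
        return tokens[pos]

    def text_telegram() -> tuple[int, int]:
        # TextTelegram↑n↑l = textword {textword} endword.
        # Starting from n = l = 0 and counting the first word as well gives
        # the same values as the grammar's separate first-word action.
        nonlocal pos
        n = l = 0
        while sym() != END_WORD:
            w = sym()
            if len(w) > MAX_WORD:  # where (|w| <= 20)
                raise TelegramError(f"word too long: {w}")
            n += 1
            if len(w) > LONG_WORD:
                l += 1
            out.out_word(w)
            pos += 1
        pos += 1  # endword
        return n, l

    # TelegramStream = {OutInit TextTelegram↑n↑l OutAccount(n, l)} EmptyTelegram.
    # The loop stops at the endword that forms the EmptyTelegram.
    while sym() != END_WORD:
        out.out_init()
        n, l = text_telegram()
        out.out_account(n, l)
```

Because no word exceeds 20 characters, a line is only flushed once it already holds at least 100 characters, so every line except the last one of a telegram stays within the required 100-120 characters.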
This completes the coarse design of the telegram problem. Syntax and
semantics are clearly separated. Together they provide a clear decomposition of the
program, making its structure apparent. The separation shows that the
semantic processing - i.e. the essential part - is very simple if there is a printing
module with the access procedures Outlnit, OutWord, and OutAccount.
A comparison with Henderson and Snowdon's solution shows that in their
program lexical analysis and syntax analysis attract the major part of attention
in design, program text, and possible design errors. Output and counting are
of minor importance and are nearly lost. Their solution avoids the terms
syntax and semantics, thus letting the problem appear to be much more complex
than it is. In contrast, we focus most of our attention on printing and counting.
We consider lexical analysis and syntax analysis as routine matters that do not
require special attention.
8.2.3 Attributed grammars as documentation
From the above, it should be obvious that attributed grammars are also well
suited for documentation. The syntax, attributes, semantic procedures, and
attributed grammar of a software product together form its documentation
(on the abstraction level of the attributed grammar). The following advantages
of this documentation method are evident:
1. The form of the documentation (its structure) is easy to find. It is almost
independent of the product to be described, and consists of the parts:
terminals, nonterminals, context-free grammar, attributes, attributed
grammar symbols, semantic procedures, and attributed grammar (in
this order). This arrangement aids standardization.
2. The documentation is formal and therefore precise, complete, and short.
3. The documentation is abstract enough to hide implementation details, but
concrete enough to express important conceptual details.
4. Because attributed grammars are machine-readable, the documentation
does not have to be kept separate from the implementation, which ensures
that the documentation is always up-to-date.
8.2.4 The Jackson method as a special case
At first glance, the often discussed Jackson method of program design
seems to have nothing in common with attributed grammars. Jackson [1975]
uses a totally different terminology and describes his method only by
examples in an indirect and unsystematic manner. To find out the essence of
Jackson's method, the reader is forced to study other literature.
The Jackson method is based on the following three concepts:
1. The structure of an algorithm is derivable from its input and output data.
2. The structure of the input and output data is described by tree diagrams
which allow the description of sequences, alternatives, and (unlimited)
repetitions.
3. If the structures of the input and output data 'match' in a certain way, the
total algorithm for the transformation of the input into the output data can
be viewed as an assembly of the transformation algorithms for the
individual substructures.
If the structures of the input and output data do not match, the Jackson method
fails. However, in the examples in his book, Jackson shows that his method
can still be used with the aid of tricks such as 'backtracking', 'program
inversion', and some other techniques.
Hughes [1979] looked at the Jackson method from the standpoint of
formal languages and summarized the following points:
1. Jackson's tree diagrams describe only regular languages since they are
only based on sequences, alternatives, and unlimited iterations.
2. In addition, it is required that the input data can be deterministically
analyzed with a single-character look-ahead.
3. Jackson's requirement of a structural matching between input and output
data means in the terminology of formal languages that there must be a
finite automaton that transforms the input into the output.
Jackson's design method can be viewed as a special case of the design method
with attributed grammars, in which:
1. the input data is regular and its grammar is LL(1);
2. the output data form a regular language;
3. a certain correspondence exists between input and output language that
manifests itself in the fact that a finite automaton can be found that
transforms the input into the output.
It is therefore only applicable to a narrow range of tasks that meet these
conditions.
It is surprising that this relationship between Jackson's method and the
design method with attributed grammars has hardly been recognized. The
reason for this may be that Jackson does not distinguish between syntax and
semantics (in fact, they are indistinguishably coupled in his examples), and does
not use attributes.
If we describe the examples in Jackson's book with attributed grammars,
they will become simpler, shorter, and easier to understand. The grammars are
simple throughout. We will show this with example 14 of Jackson's book,
whose discussion covers 17 pages, making it the longest example in the
entire book.
Problem description. An operating system collects data about its use. These
data are: A record for the start of each session (LOGON), the end of a session
(LOGOFF), the start of a program (PROGSTART), and the end of a program
(PROGEND). At logon time, the user is assigned a unique session number.
The system makes sure that a user can start a session only when his terminal is
free, and cannot terminate a session that he has not initiated. Furthermore, a
user can have only one active program at any given time. He must terminate
an active program before starting a new one.
The collected data is written to a file. The records have the following
contents:
Logon record: LOGON session number start time
Logoff record: LOGOFF session number stop time
Progstart record: PROGSTART session number program name start time
Progend record: PROGEND session number program name stop time
The records are stored in strict chronological order. However, it is possible
that records are missing due to erroneous processing. In this case, the data file
contains incomplete information for some sessions and programs: a logon
record without corresponding logoff record, and vice versa; a progstart record
without corresponding progend record, and vice versa.
As a result, the program should produce the following list:
Number of complete sessions = nnnn
Average session length = tttt
Number of known sessions
Number of complete programs = pppp
Average program length = uuuu
Number of known programs = qqqq
Grammar. The input consists of four kinds of records. We regard them as
terminals: logon, logoff, progstart, and progend, and arrive at the
following grammar:
input = {logon | logoff | progstart | progend}.
It consists of a single rule (for regular languages, there is always a grammar
that consists of a single rule). In accordance with the problem description we
attach attributes to the terminals:
session: integer session number
prog: name program name
time: integer time of logon, logoff, progstart and progend
and get the attributed grammar symbols
logon↑session↑time
logoff↑session↑time
progstart↑session↑prog↑time
progend↑session↑prog↑time
Semantics. In the semantic actions, we need variables that hold the results.
We call them
completesessions: integer number of complete sessions
knownsessions: integer number of known sessions
completeprogs: integer number of complete programs
knownprogs: integer number of known programs
sessiontime: integer total length of all complete sessions
progtime: integer total length of all complete programs
It is clear from the above that, when a logon record appears, the session number
and the start time must be stored until a logoff record with the same session
number is encountered. The same is true for programs. For the time being, we will
put the definition of the concrete data structures in the background, and
consider only the fact that we need the following access procedures:
NewSession(↓session↓time)
Define the start of a session at the specified time.
DisposeSession(↓session)
Define the end of a session.
SessionStarted(↓session): boolean
Return true if the specified session has been started.
SessionStartTime(↓session): integer
Return the start time of the specified session.
NewProg(↓session↓prog↓time)
Define the start of the program prog in the specified session at the
specified time.
DisposeProg(↓session↓prog)
Define the end of the program prog in the specified session.
ProgStarted(↓session↓prog): boolean
Return true if prog in session has been started.
ProgStartTime(↓session↓prog): integer
InitStorage
Initialize the abstract data structure.
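The book deliberately leaves the concrete data structure open at this point. One possible refined design, sketched in Python as an assumption rather than the authors' choice, keeps two dictionaries: one keyed by session number and one keyed by the pair (session, program name), each mapping to a start time. The class name Storage and the snake_case method names are hypothetical.

```python
class Storage:
    """Hypothetical implementation of the access procedures listed above."""

    def __init__(self) -> None:
        self.init_storage()

    def init_storage(self) -> None:  # InitStorage
        self.sessions: dict[int, int] = {}           # session -> start time
        self.progs: dict[tuple[int, str], int] = {}  # (session, prog) -> start time

    def new_session(self, session: int, time: int) -> None:  # NewSession
        self.sessions[session] = time

    def dispose_session(self, session: int) -> None:  # DisposeSession
        self.sessions.pop(session, None)

    def session_started(self, session: int) -> bool:  # SessionStarted
        return session in self.sessions

    def session_start_time(self, session: int) -> int:  # SessionStartTime
        return self.sessions[session]

    def new_prog(self, session: int, prog: str, time: int) -> None:  # NewProg
        self.progs[(session, prog)] = time

    def dispose_prog(self, session: int, prog: str) -> None:  # DisposeProg
        self.progs.pop((session, prog), None)

    def prog_started(self, session: int, prog: str) -> bool:  # ProgStarted
        return (session, prog) in self.progs

    def prog_start_time(self, session: int, prog: str) -> int:  # ProgStartTime
        return self.progs[(session, prog)]
```

Keying programs by the pair (session, prog) keeps the lookup correct even if two sessions run programs with the same name at the same time.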
Attributed grammar. With only these few facts, which take little thought to
derive, the attributed grammar of the problem can be formulated:
input =
  sem InitStorage;
      completesessions:=0; knownsessions:=0;
      completeprogs:=0; knownprogs:=0;
      sessiontime:=0; progtime:=0;
  endsem
  { logon↑session↑time
      sem knownsessions:=knownsessions+1;
          NewSession(↓session↓time);
      endsem
  | progstart↑session↑prog↑time
      sem knownprogs:=knownprogs+1;
          NewProg(↓session↓prog↓time)
      endsem
  | progend↑session↑prog↑time
      sem if ProgStarted(↓session↓prog)
          then
            completeprogs:=completeprogs+1;
            progtime:=progtime+(time-ProgStartTime(↓session↓prog));
            DisposeProg(↓session↓prog)
          else knownprogs:=knownprogs+1
          end
      endsem
  | logoff↑session↑time
      sem if SessionStarted(↓session)
          then
            completesessions:=completesessions+1;
            sessiontime:=sessiontime+(time-SessionStartTime(↓session));
            DisposeSession(↓session)
          else knownsessions:=knownsessions+1
          end
      endsem
  } sem Write(↓completesessions);
        Write(↓sessiontime/completesessions);
        Write(↓knownsessions);
        Write(↓completeprogs);
        Write(↓progtime/completeprogs);
        Write(↓knownprogs)
    endsem.
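Because the grammar consists of a single iterated alternative, its translation is a single loop over the records. The following Python sketch assumes that records are tuples such as ("LOGON", session, time) and ("PROGSTART", session, prog, time); it inlines the storage as two dictionaries, uses integer division for the averages, and reports an average of 0 when nothing is complete (the attributed grammar above would divide by zero in that case). The function name summarize and the result keys are hypothetical.

```python
def summarize(records):
    """Evaluate input = {logon | logoff | progstart | progend} in one pass."""
    sessions = {}  # InitStorage: session -> start time
    progs = {}     # (session, prog) -> start time
    complete_sessions = known_sessions = 0
    complete_progs = known_progs = 0
    session_time = prog_time = 0
    for kind, session, *rest in records:
        if kind == "LOGON":
            (time,) = rest
            known_sessions += 1
            sessions[session] = time  # NewSession
        elif kind == "PROGSTART":
            prog, time = rest
            known_progs += 1
            progs[(session, prog)] = time  # NewProg
        elif kind == "PROGEND":
            prog, time = rest
            if (session, prog) in progs:  # ProgStarted
                complete_progs += 1
                # ProgStartTime, then DisposeProg (pop does both)
                prog_time += time - progs.pop((session, prog))
            else:
                known_progs += 1
        elif kind == "LOGOFF":
            (time,) = rest
            if session in sessions:  # SessionStarted
                complete_sessions += 1
                session_time += time - sessions.pop(session)
            else:
                known_sessions += 1
        else:
            raise ValueError(f"unknown record kind: {kind!r}")
    return {
        "complete sessions": complete_sessions,
        "average session length": session_time // complete_sessions if complete_sessions else 0,
        "known sessions": known_sessions,
        "complete programs": complete_progs,
        "average program length": prog_time // complete_progs if complete_progs else 0,
        "known programs": known_progs,
    }
```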
At this point, the coarse design is already completed. The refined design will
decide about the concrete implementation of the abstract data structure. In
principle, the program can be implemented with a compiler compiler. In order
to read the input data, only a (trivial) lexical analyzer needs to be written. But
since the grammar of this problem is so simple (as it is also for the telegram
problem), the use of a compiler compiler is like taking a sledgehammer to
crack a nut. It is therefore natural to code the syntax analyzer for this
problem by hand, using the method of recursive descent (in this case it is
not even recursive).
Jackson instead undertakes voluminous considerations about intermediate
data files and program inversions which make the task appear much more
complicated than it really is.
8.3 Results of a Coco run
For readers interested in the way Coco works, we present an example
showing the contents of the compiler parts generated from a specific input
grammar. It can be viewed as a supplement to the implementation description
in Chapter 7 and should help the reader understand the principles explained there.
The example will be the description of an index generator, which is a
program that generates an index from a list of keywords entered according to
some syntactic rules. This problem provides another example of the use of
attributed grammars in software engineering.
The input to the index generator is to be as follows: for each page of a
document, the page number and all keywords on this page are entered in the
following manner:
1 = Introduction; User's Guide;
2 = Start up; Parts of the tool;
3 = General characteristics; User's Guide;
On the left-hand side of the '=' sign, page numbers as well as words are
allowed. Words, however, must start with a '*':
*Appendix = Maintenance; Troubleshooting;
From this input, the compiler generates a file of pairs <keyword, page
number>, sorts this file, and prints an index in which page numbers of
identical keywords are collected (the index at the end of this book was
produced with such a program).
In our example, we will describe the first phase of this compiler, i.e. the
generation of the <keyword, pagenumber> file.
 1 GRAMMAR Index
 2
 3 SEMANTIC DECLARATIONS
 4   FROM FileIO IMPORT File,Open,Close,Write,WriteString,WriteLn;
 5   FROM Indexlex IMPORT GetKeyword,AdjustNumber;
 6
 7   VAR f: File;
 8       keystring,refstring,string: ARRAY[1..50] OF CHAR;
 9       value: CARDINAL;
10
11 TERMINALS
12   "=" alias equal                                  -- 1
13   ";" alias semicolon                              -- 2
14   "*" alias asterisk                               -- 3
15   keyword                                          -- 4
16   number<out:value>                                -- 5
17
18 PRAGMAS
19   eolsy                                            -- 6
20
21 NONTERMINALS
22   Index                                            -- 7
23   Relation                                         -- 8
24   Reference<out:string>                            -- 9
25
26 RULES
27
28 Index =
29                    sem Open(f,"INDEX.OUT") endsem
30   {Relation}       sem Close(f) endsem.
31
32 Relation =
33   Reference<out:refstring>
34   "="
35   { keyword        sem GetKeyword(↑keystring);
36                        WriteString(↓f,↓keystring);
37                        Write(↓f,↓CHR(0));
38                        WriteString(↓f,↓refstring); WriteLn(f)
39                    endsem
40     ";"
41   }.
42
43 Reference<out:string> =
44     number<out:value>  sem AdjustNumber(↓value,↑string) endsem
45   | "*" keyword        sem GetKeyword(↑string) endsem.
46
47 ENDGRAM
This is the description of the translation process. The only thing the user has
to provide is the module Indexlex that supplies the terminals and exports the
two procedures GetKeyword and AdjustNumber. GetKeyword should
return the keyword string that the lexical analyzer has obtained after
recognition of the terminal keyword. AdjustNumber should right-justify a number
in a character field for sorting. The pragma eolsy is specified only to show
how pragmas are encoded in the generated tables.
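To show what the translation does, here is a hand-written Python equivalent of the index grammar; it is an illustration, not Coco's output. The tokenizer stands in for the user-supplied Indexlex and is an assumption: '=', ';' and '*' are symbols, a run of digits is a number, and any other text up to the next delimiter or line end is a keyword (so keywords must not start with a digit). The field width of 5 used to right-justify page numbers (the job of AdjustNumber) is also assumed, and the function names are hypothetical.

```python
FIELD_WIDTH = 5  # assumed width of the right-justified page-number field


def tokenize(text: str) -> list[tuple[str, str]]:
    """Hypothetical stand-in for Indexlex: returns (kind, text) pairs."""
    tokens, i = [], 0
    while i < len(text):
        c = text[i]
        if c.isspace():
            i += 1
        elif c in "=;*":
            tokens.append((c, c))
            i += 1
        elif c.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(("number", text[i:j]))
            i = j
        else:  # keyword: everything up to the next delimiter, blanks allowed
            j = i
            while j < len(text) and text[j] not in "=;*\n":
                j += 1
            tokens.append(("keyword", text[i:j].strip()))
            i = j
    return tokens


def index_pairs(text: str) -> list[str]:
    """Return 'keyword<CHR(0)>reference' strings, as the semantic actions write them."""
    tokens, pos = tokenize(text), 0

    def kind() -> str:
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(k: str) -> str:
        nonlocal pos
        if kind() != k:
            raise SyntaxError(f"expected {k!r}, found {kind()!r}")
        pos += 1
        return tokens[pos - 1][1]

    def reference() -> str:
        # Reference<out:string> = number | "*" keyword.
        if kind() == "number":
            return expect("number").rjust(FIELD_WIDTH)  # AdjustNumber
        expect("*")
        return expect("keyword")  # GetKeyword

    pairs = []
    while kind() != "EOF":  # Index = {Relation}.
        refstring = reference()  # Relation = Reference "=" {keyword ";"}.
        expect("=")
        while kind() == "keyword":
            keystring = expect("keyword")
            pairs.append(keystring + "\0" + refstring)
            expect(";")
    return pairs
```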
From this input, Coco generates a table-driven syntax analyzer and a
semantic evaluator. These modules will be discussed in the next sections.
8.3.1 The generated syntax analyzer
The syntax analyzer is generated from a frame program (cocosynframe,
shown in Appendix F) into which Coco inserts the following constant
declarations.
CONST
  maxname  = 75;  (*length of name list*)
  maxnamep = 9;   (*number of names*)
  maxcode  = 48;  (*length of G-code*)
  maxany   = 1;   (*number of any-sets. At least one dummy*)
  maxeps   = 2;   (*number of eps-follower sets*)
  maxt     = 5;   (*last terminal number*)
  maxp     = 6;   (*last pragma number*)
  maxs     = 9;   (*last nonterminal number*)
  startpc  = 44;  (*start address of the grammar*)
These values are the table dimensions derived from the above grammar.
8.3.2 The generated semantic evaluator
The semantic evaluator also consists of fixed frame parts and parts that are
copied from the attributed grammar. For the index generator, the semantic
evaluator is as follows (generated parts are shown in italics and frame parts are
shown in roman type):
IMPLEMENTATION MODULE Indexsem;
FROM SYSTEM IMPORT WORD;
FROM Indexlex IMPORT at;
FROM FileIO IMPORT File,Open,Close,Write,WriteString,WriteLn;
FROM Indexlex IMPORT GetKeyword,AdjustNumber;
VAR f: File;
    keystring,refstring,string: ARRAY[1..50] OF CHAR;
    value: CARDINAL;
PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
BEGIN x:=y END ASSIGN;
PROCEDURE Semant(sem:CARDINAL);
BEGIN
  CASE sem OF
    U: ;
  | 12: (*line 29*)
        Open(f,"INDEX.OUT")
  | 13: (*line 30*)
        Close(f)
  | 14: (*line 33*)
        refstring:=string;
  | 15: (*line 35*)
        GetKeyword(keystring);
        WriteString(f,keystring);
        Write(f,CHR(0));
        WriteString(f,refstring); WriteLn(f)
  | 16: (*line 44*)
        ASSIGN(value,at[1]);
  | 17: (*line 44*)
        AdjustNumber(value,string)
  | 18: (*line 45*)
        GetKeyword(string)
  END;
END Semant;
END Indexsem.
8.3.3 The generated parser tables
Coco generates the following tables:
1. G-code;
2. information about nonterminals (G-code start address, deletability, set of
start symbols);
3. terminal successors of eps-symbols;
4. symbol sets represented by any-symbols;
5. number of attributes for terminals and pragmas;
6. number of semantic actions for pragmas;
7. symbol names for error messages.
The table values are inserted as initialization code into the generated syntax
analyzer. We will now show these values in a decoded form.
G-code

Address  Instruction         Code (addresses take 2 bytes)
         Index:
 1       SEM12               12
 2       NTA Relation, 9     3 8 0 9
 6       JMP 2               10 0 2
 9       EPS 1               8 1
11       SEM13               13
12       RET                 11
         Relation:
13       NT Reference        2 9
15       SEM14               14
16       T "="               0 1
18       TA keyword, 28      1 4 0 28
22       SEM15               15
23       T ";"               0 2
25       JMP 18              10 0 18
28       EPS 2               8 2
30       RET                 11
         Reference:
31       TA number, 38       1 5 0 38
35       SEM16               16
36       SEM17               17
37       RET                 11
38       T "*"               0 3
40       T keyword           0 4
42       SEM18               18
43       RET                 11
         dummy rule:
44       NT Index            2 7
46       T EOF               0 0
48       RET                 11
The entire grammar occupies only 48 bytes of G-code!
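The G-code can be executed directly by a small interpreter. The Python sketch below is a reconstruction for illustration, not Coco's actual syntax analyzer. The meaning of each instruction is inferred from the table and is an assumption: T matches a terminal, TA matches a terminal or jumps, NT calls a nonterminal, NTA calls it only if the current symbol is in its start set and otherwise jumps, EPS checks the current symbol against an eps-successor set, JMP jumps, RET returns. The high-byte-first address encoding is likewise assumed. Instead of calling Semant, the sketch records the numbers of the semantic actions; the function name run is hypothetical.

```python
# Terminal numbers from the symbol table (EOF = 0, ..., number = 5).
EOF, EQUAL, SEMICOLON, ASTERISK, KEYWORD, NUMBER = range(6)
# Opcodes as they appear in the Code column; codes >= 12 are SEM actions.
T, TA, NT, NTA, EPS, JMP, RET = 0, 1, 2, 3, 8, 10, 11
FIRST_SEM = 12

G_CODE = [None,  # padding, so that a list index equals a G-code address
          12, 3, 8, 0, 9, 10, 0, 2, 8, 1, 13, 11,                      # Index      1..12
          2, 9, 14, 0, 1, 1, 4, 0, 28, 15, 0, 2, 10, 0, 18, 8, 2, 11,  # Relation  13..30
          1, 5, 0, 38, 16, 17, 11, 0, 3, 0, 4, 18, 11,                # Reference 31..43
          2, 7, 0, 0, 11]                                             # dummy     44..48
START_PC = 44
START = {7: 1, 8: 13, 9: 31}  # nonterminal -> start address
FIRST = {7: {ASTERISK, NUMBER}, 8: {ASTERISK, NUMBER}, 9: {ASTERISK, NUMBER}}
EPS_SETS = {1: {EOF}, 2: {EOF, ASTERISK, NUMBER}}


def run(symbols: list[int]) -> list[int]:
    """Parse a terminal sequence; return the SEM actions in execution order."""
    syms = symbols + [EOF]
    pos, pc, stack, actions = 0, START_PC, [], []

    def addr(at: int) -> int:  # 2-byte address, high byte first
        return G_CODE[at] * 256 + G_CODE[at + 1]

    while True:
        op = G_CODE[pc]
        sym = syms[pos] if pos < len(syms) else EOF
        if op >= FIRST_SEM:
            actions.append(op)
            pc += 1
        elif op == T:  # terminal must be present
            if sym != G_CODE[pc + 1]:
                raise SyntaxError(f"unexpected symbol {sym} at address {pc}")
            pos, pc = pos + 1, pc + 2
        elif op == TA:  # terminal alternative: match it or jump
            if sym == G_CODE[pc + 1]:
                pos, pc = pos + 1, pc + 4
            else:
                pc = addr(pc + 2)
        elif op == NT:  # call a nonterminal
            stack.append(pc + 2)
            pc = START[G_CODE[pc + 1]]
        elif op == NTA:  # call a nonterminal if sym can start it, else jump
            if sym in FIRST[G_CODE[pc + 1]]:
                stack.append(pc + 4)
                pc = START[G_CODE[pc + 1]]
            else:
                pc = addr(pc + 2)
        elif op == EPS:  # empty alternative: sym must be a legal successor
            if sym not in EPS_SETS[G_CODE[pc + 1]]:
                raise SyntaxError(f"unexpected symbol {sym} at address {pc}")
            pc += 2
        elif op == JMP:
            pc = addr(pc + 1)
        elif op == RET:
            if not stack:
                return actions
            pc = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op} at address {pc}")
```

For the input `1 = Introduction;` (number, "=", keyword, ";") the sketch executes the actions 12, 16, 17, 14, 15, 13, matching the order in which the semantic evaluator's cases would be called.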
Nonterminal description
symbol (no.) start address deletability terminal start symbols
Index (7)       1    deletable      {"*", number}
Relation (8)    13   nondeletable   {"*", number}
Reference (9)   31   nondeletable   {"*", number}
eps-successors
1: {EOF}
2: {EOF, "*", number}
Number of attributes for terminals and pragmas
EOF: 0   "=": 0   ";": 0   "*": 0   keyword: 0   number: 1   eolsy: 0
Pragma semantics
         attribute passing action   user action
eolsy:   0                          0
Symbol names
names:
EOF/equal/semicolon/asterisk/keyword/
number/eolsy/Index/Relation/Reference
name pointers: 1, 5, 11, 21, 30, 38, 45, 51, 57, 66
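The name pointers follow directly from the name list. Assuming that every name, including the last one, is followed by '/' (which agrees with maxname = 75), this Python sketch recomputes the 1-based pointers; the function name name_table is hypothetical.

```python
def name_table(names: list[str]) -> tuple[str, list[int]]:
    """Concatenate the names, each followed by '/', and record 1-based start positions."""
    text, pointers = "", []
    for name in names:
        pointers.append(len(text) + 1)  # position where this name starts
        text += name + "/"
    return text, pointers
```

Applied to the ten symbol names above, it reproduces the pointers 1, 5, 11, ..., 66 and a total length of 75.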
9
Experiences with Coco
In 1981, workers at the University of Linz built a parser generator that
generates parser tables for an LL(1) syntax analyzer from an input grammar in
Wirth's EBNF notation. Because the generator proved useful, it was enhanced
in 1983 and eventually evolved into the compiler compiler Coco.
The first version of Coco ran on an Intel 8080 development system, and
was written in PL/M-80, a language similar to PL/I for microcomputers. Since
then, many more versions of Coco have been implemented in Modula-2 on
various microcomputers including the Macintosh, the IBM-PC, the Atari 1040
and the Lilith. There is also a version for IBM mainframes. Coco has been in
use for several years now and has proved to be useful both in research
projects (e.g. construction of a Modula-2 compiler, tools for static program
analysis) and in student courses.
9.1 A basis for measurements
In the following sections, we will describe the results of memory and run-time
measurements performed on Coco, and on three compilers generated by Coco.
First, we will measure the generation of a Modula-2 compiler. This
compiler consists of 6 passes (lexical analysis, syntax analysis, name
analysis, declaration analysis, semantic analysis, and code generation). Each
of passes 2 through 6 reads the entire source program in an intermediate
language generated by the previous pass. This intermediate program is
analyzed and forwarded to the next pass as a new, usually shorter, intermediate
program (with the exception of pass 6, which generates the object
code). Each pass is therefore a compiler in itself, described with an attributed
grammar and translated by Coco into a syntax analyzer and a semantic
evaluator. For the measurements, we will not look at the entire Modula-2
compiler, but rather at two specific passes, since we are interested in the
individual Coco runs. We select pass 2 (syntax analysis) and pass 4
(declaration analysis). These two passes have rather different characteristics,
which make them well suited for a comparison. Pass 2 has a large and deeply
nested recursive grammar with only a few semantic actions, while pass 4 has a
simple grammar with a lot of semantic actions. In the following paragraphs,
we will talk about each of the passes as if they were independent compilers.
Secondly, we will measure the generation of Coco by itself. Compared to
the Modula-2 compiler, Coco is much smaller and consists of a single pass.
Thus, we have a comparison between two large applications and a small
application. Table 9.1 shows the sizes of the compilers in terms of their
attributed grammar.
Table 9.1 Size of the attributed grammars of the example compilers

                          Modula-2     Modula-2     Coco
                          (pass 2)     (pass 4)
Number of lines           960          968          609
Terminal symbols          77           77           43
Pragmas                   4            4            1
Nonterminal symbols       62           27           10
Alternatives              224          94           54
Symbols in productions    491          192          137
Semantic actions          70           126          68
G-code                    1726 bytes   733 bytes    447 bytes
The measurements shown in the following sections were taken from the Lilith,
since the Modula-2 compiler was only available there. For the Macintosh the
results would have been very similar.
The Lilith is a 16-bit computer built on an Am2901 bit-slice processor
with a cycle time of 150 nanoseconds. It has a very compact object code
format (the so-called M-code) which has been especially tailored to Modula-2.
9.2 Measurements on Coco
First, we will look at Coco and measure the memory requirements and the run-
time required by Coco to generate a compiler.
Memory requirements
Obviously the memory requirements for the code and the static data of Coco
are the same in all three measurements (65 347 bytes). The size of the dynamic
data depends on the input grammar but typically requires less than 1000 bytes
(see Table 9.2).
Table 9.2 Memory requirements of Coco for the generation of various compilers

                 Modula-2       Modula-2       Coco
                 (pass 2)       (pass 4)
Code             34 170         34 170         34 170
Static data      31 177         31 177         31 177
Dynamic data     190            872            564
Totals           65 537 bytes   66 219 bytes   65 911 bytes
The memory requirement for the code is shared between ten Coco-specific
modules and two standard modules. In addition, Coco uses one module that
belongs to the resident part of the operating system, and thus does not increase
Coco's memory requirements.
Run-time
The run-time of Coco depends on the size of the input grammar. Most of the
time is used by the lexical analyzer that reads and lists the grammar. To write
out the syntax analyzer and the semantic evaluator of the target compiler also
requires considerable time, while the rest of the work is done fairly rapidly. In
large grammars with a deeply nested hierarchy of nonterminals (as in pass 2
of the Modula-2 compiler), the grammar tests also take a noticeable amount of
time (see Table 9.3).
Table 9.3 Run-time of Coco for the generation of various compilers

                                        Modula-2   Modula-2   Coco
                                        (pass 2)   (pass 4)
Lexical analysis                        14.9       19.7       12.0
Syntax analysis, semantic processing    6.8        4.6        3.2
Grammar tests                           5.3        1.7        0.9
Output of the generated compiler        10.9       14.0       10.9
Totals                                  37.9 s     40.0 s     27.0 s
9.3 Measurements on some generated compilers
We will now consider the memory requirements and the run-time of the
compilers generated by Coco.
Memory requirements
Here, we are only interested in parts which are actually generated by Coco,
namely the syntax analyzer, the semantic evaluator, and the parser tables. We
are not going to consider the size of the semantic modules since they are
independent of Coco.
Table 9.4 Memory requirements of some generated compilers

                     Modula-2     Modula-2     Coco
                     (pass 2)     (pass 4)
Syntax analyzer      2836         2836         2836
Semantic processor   2096         4084         2214
Analysis tables      4600         1469         1294
Totals               9532 bytes   8389 bytes   6344 bytes
All three compilers use the same syntax analyzer driven by different tables. Its
size is constant. The size of the semantic evaluator depends on the number and
the length of the semantic actions of the attributed grammar. As expected, its
size is larger in pass 4 of the Modula-2 compiler than in pass 2 and in Coco.
Note that the memory requirements of the generated compilers do not
depend on the length of the input text, since no syntax tree of the input is built.
Run-time
The run-time of the generated compilers on input texts of various length is
shown in Table 9.5.
Table 9.5 Run-time of some generated compilers
                      Modula-2   Modula-2   Coco
                      (pass 2)   (pass 4)
100 input symbols     0.9 s      0.5 s      9.1 s
1000 input symbols    1.9 s      1.2 s      14.5 s
5000 input symbols    7.9 s      4.5 s      35.5 s
Even though Coco is the smallest of the three compilers, it runs much slower
than the others since it does a lot of input and output (it writes long parts of
source programs to disk), while pass 2 and pass 4 of the Modula-2 compiler
work almost entirely in the main memory (with input and output used only for
intermediate languages).
9.4 General experiences
Our experience with Coco has been exceptionally good. Coco allows a tight and
very readable specification of the translation processes. The attributed
grammars become essential parts of each compiler documentation.
By automating syntax analysis, error handling, and semantic processing,
attention can be focused on the actual translation process in the semantic
procedures, so more time is available for design. Working with
attributed grammars almost automatically leads to a modular program structure
with abstract data structures and access procedures, which are usually small
and easy to understand.
In multi-pass compilers, like the Modula-2 compiler, the symbol any is
especially useful since it lets one easily skip over portions of the input that are
not of interest in this pass. The concept of pragmas has also proved useful
since they make it easy to pass control information between successive passes
(e.g. trace commands, options, etc.).
The limitations of LL(1) grammars are not a serious problem. Because of
Wirth's EBNF notation, it is not necessary to perform complex grammar
transformations in order to remove LL(1) conflicts, as is usually required
in the standard BNF notation. The only time when we failed to resolve LL(1)
conflicts was in the translation of the language PL/M-80. The conflicts were
resolved by delegating some parts of the processing to the lexical analyzer.
Processing the input with L-attributed grammars and without building a
syntax tree is not a serious restriction. If during processing some attributes are
needed which only become available later, intermediate results are stored until
the required attributes have been calculated and the final translation is possible.
The omission of a syntax tree leads to efficient compilers with regard to speed
and memory requirements. Most of the generated compilers run on
microcomputers.
The negative experiences in the use of Coco are limited to the global
nature of semantic objects in Cocol, which requires explicit stacking of
variables, and to the fact that whenever an error has been detected in the
attributed grammar the program development cycle is enlarged by an additional
run of the compiler compiler.
However, the positive experiences outweigh the negative ones. Even
though we have no hand-coded compiler that we can compare directly to a
Coco-generated compiler, we are not afraid to claim that the efficiency of
compilers generated by Coco is close to that of hand-coded compilers, and it is
certainly easier to implement and to maintain a compiler with Coco than by
hand.
A
Definition of Adele
An algorithm description language, like a programming language, should offer all concepts
for the description of algorithms, but should be free of syntactic peculiarities. In this way,
the algorithms will stand out clearly and the reader will not be distracted by all sorts of
baroque constructs. For the same reason, it should use only a few constructs and give the
user freedom of expression. It should lean on popular programming languages so that it is
easy to read, but should not be firmly bound to a particular programming language. Our
algorithm description language Adele contains elements of PL/I, Modula-2, and Ada. We
will describe its structure by a few examples.
Overall structure
Each algorithm has a name, parameters, and instructions:
Search(↓list↓length↓x↑i):
begin
Instructions
end Search
The parameter list of functions is followed by the type of the function:
Search(↓list↓length↓x) integer:
begin
Instructions
return i
end Search
Input parameters are marked by ↓, output parameters by ↑, and transition parameters by ↕.
Statements
We distinguish between assignments, procedure calls, control statements, input-output
statements, and text statements. To improve readability, instructions may optionally be
separated by a semicolon.
Assignment. The assignment has the form
variable := expression
Procedure call. The call of a procedure consists of the procedure name and the actual
parameters in parentheses:
ReadCard(↑card)
It is a useful convention to write procedure names partly in capital letters (e.g. ReadCard), and variable
names entirely in lower case letters.
Control Statements. Here we use the modern forms of Modula-2 which are explicitly
terminated by an end, with the exception of the repeat statement:
if expression then statement sequence end
if expression then statement sequence else statement sequence end
case expression of
  label: statement sequence
| label: statement sequence
end
or
case expression of
  label: statement sequence
| label: statement sequence
else statement sequence
end
while expression do statement sequence end
repeat statement sequence until expression
loop statement sequence with exit end
for variable := expression to expression [by expression] do
statement sequence
end
The control variable will be undefined after completion of the for loop.
exit
exits from the immediately enclosing loop statement.
return
exits from a procedure.
return expression
exits from the function procedure with expression as the function
value.
halt
stops the algorithm without return to a surrounding algorithm.
Input-output statements. Here we use only three statements:
read(↑x ↑eof)   read x or signal the end of the input file
write(↓x)       write x to the output medium
writeln         emit a line feed
We do not concern ourselves with the format of the input and output text. The boolean
parameter eof indicates the end of the input file. When x has been read, eof is false on
return. If x could not be read because the end of the file was reached, eof is true and x is
undefined on return.
Text statements. Text statements are free texts that describe actions. For example:
calculate mean values and variances;
The only rule is that they be terminated or separated by a semicolon so that their end can be
seen.
Expressions
For expressions we stipulate the common combinations of operators and operands without
giving specific rules. We state only that boolean expressions can be viewed as conditional
expressions with short circuit evaluation:
a & b is equivalent to if a then b else false end
a | b is equivalent to if a then true else b end
This means that if the left operand alone is sufficient to determine the value of the
expression, then the right operand is not evaluated.
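Python's `and` and `or` operators follow the same short-circuit rule, so a small Python sketch can check the two equivalences. The helper names `operand`, `cond_and`, and `cond_or` are hypothetical. They only record which operands are actually evaluated.

```python
# Illustration of Adele's short-circuit rule. Python also evaluates
# `and`/`or` left to right and stops as soon as the result is known.
trace = []

def operand(name, value):
    """Record that an operand was evaluated, then return its value."""
    trace.append(name)
    return value

def cond_and(a, b):
    # a & b  ==  if a then b else false end
    return b() if a() else False

def cond_or(a, b):
    # a | b  ==  if a then true else b end
    return True if a() else b()

# In both calls the left operand alone decides the result.
result_and = cond_and(lambda: operand("a", False), lambda: operand("b", True))
result_or = cond_or(lambda: operand("c", True), lambda: operand("d", False))
print(result_and, result_or, trace)   # False True ['a', 'c']
```

Only the left operands `a` and `c` appear in `trace`. The right operands are never evaluated.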
Declarations
Usually declarations are not needed for the description of short and simple algorithms,
especially if the variables used are obvious from the preceding explanations. However, in
longer algorithms with local variables, global variables, parameters, and perhaps also named
constants, it is advantageous if the algorithm description language also contains declarations.
In Adele, the declarations of constants and variables can be written between the head of
the algorithm and begin. We partition the declared items into the following classes:
parameters, global variables, constants, local dynamic variables, and local static variables.
The classes are identified by the keywords param, global, const, local, and static. After each
keyword, one or more declarations of names of the corresponding class can be placed.
A constant declaration has the form
name = value
a variable declaration has the form
name: type
As types we use the elementary types of Pascal and Modula-2 with the following keywords
or structures:
integer
real
boolean
char
(red,green,blue)
array (index:index) of type
Array types allow a certain amount of freedom. If the range limits are not needed, we write
array of type
If the type is not needed, we write
array (index:index)
If neither is needed, we simply write
array
As an example of the use of declarations, we describe a linear search algorithm with
declarations of all names:
Search (↓list ↓length ↓x ↑i):
param list: array of integer
      length, x, i: integer
local j: integer
begin
  j := length
  while j > 0 & list(j) <> x do
    j := j - 1
  end
  i := j
end Search
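The Adele algorithm translates almost line for line into Python, as the sketch below shows. Several choices here are ours, not Adele's:

- The 1-based Adele array is modelled by a 0-based Python list, so `lst[j - 1]` stands for `list(j)`.
- The output parameter ↑i becomes the return value.
- The name `search` is our own.

```python
def search(lst, length, x):
    """Linear search from the end, following the Adele example.

    Returns the 1-based position of the last occurrence of x among the
    first `length` elements of lst, or 0 if x does not occur there.
    """
    j = length
    # `and` short-circuits, so lst[j - 1] is never evaluated once j == 0
    while j > 0 and lst[j - 1] != x:
        j = j - 1
    return j
```

For example, `search([7, 3, 9, 3], 4, 3)` returns 4 and `search([7, 3, 9, 3], 4, 5)` returns 0. The short-circuit guard matters even more in Python than in Adele. Without `j > 0 and`, the access `lst[-1]` would silently read the last element instead of failing.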
For static variables, we allow optional initialization. This is done by adding the phrase
init(value) after the type:
static finished: boolean init(false)
Comments
Comments, like those in Ada, start with two minus signs and extend over the rest of the
line.
-- This is a comment
-- which extends over two lines.
Undefined issues
Adele has no rules for the remaining items such as records, pointers, modules, etc. We write
them, more or less, in the style of Modula-2.
B
Modula-2 and Pascal
Since Modula-2 evolved from Pascal, its appearance is very similar to Pascal, and so Pascal
programmers have no difficulty in reading Modula-2 programs. Here we will briefly present
the most important differences for the reader of the Modula-2 programs in this book. The
complete language definition and examples can be found in the books of Wirth [1982] and
Pomberger [1986]. A didactically oriented introduction to Modula-2 is the book by
Blaschek, Pomberger, and Ritzinger [1985].
General characteristics
Modula-2 is a system implementation language that extends Pascal with the following key
features:
1. Modular program structure. Modula-2 programs are composed of separately compiled
modules. The compiler checks the consistency of the interface between modules. The
language is therefore especially suited for the implementation of data capsules and
abstract data types.
2. Coroutines and parallel processes. Modula-2 provides the coroutine facility as the
basic element for the implementation of parallel processes.
3. Low-level features. Modula-2 provides facilities to bypass strong type checking so
that memory words can be directly accessed and addresses can be handled. This makes it
possible to produce machine-specific code.
We will not describe parallel processing or low-level features in this appendix, since Coco
does not use them.
Lexical elements
Modula-2 differs from most Pascal implementations by its sensitivity to the case of letters.
The names TRUE, True, and true denote three different objects.
Single character constants can be denoted by an octal number followed by the letter C,
e.g. CONST ff = 14C;
Declarations
In contrast to Pascal, constant, type, variable, and procedure declarations can appear in any
order. There are no labels or label declarations.
Standard types. In addition to Pascal's standard types INTEGER, REAL, BOOLEAN, and
CHAR, Modula-2 has the standard type CARDINAL for unsigned natural numbers. For 16-bit
implementations, the range of integer values is -32 768 to +32 767, and the range of cardinal
values is 0 to 65 535.
Enumeration, subrange, array, record, and pointer types are the same as in Pascal with
the exception that arrays cannot be packed, and variant record types have an improved syntax.
If the word length of the computer is w bits, then the cardinality of set types is limited
to w or a small multiple thereof (according to the language definition). There is a standard
type BITSET that consists of the elements 0 through w-1:
TYPE BITSET = SET OF [0..w-1]
Set constants are enclosed in braces { and }.
The machine-dependent type WORD denotes arbitrary data whose length is a
machine word. It is compatible with all types whose length is a machine word.
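On a 16-bit machine, a BITSET can be pictured as one word in which bit i is set exactly when i is an element. The Python sketch below models that picture with an integer mask. The helper names `incl`, `excl`, `contains`, and `bitset` echo INCL, EXCL, and IN but are hypothetical, and W = 16 is an assumption matching the 16-bit implementations mentioned above.

```python
W = 16  # assumed word length in bits

def incl(s, i):
    """INCL(s, i): return s + {i}."""
    assert 0 <= i < W, "element out of range 0..W-1"
    return s | (1 << i)

def excl(s, i):
    """EXCL(s, i): return s - {i}."""
    assert 0 <= i < W, "element out of range 0..W-1"
    return s & ~(1 << i)

def contains(s, i):
    """i IN s."""
    return 0 <= i < W and (s >> i) & 1 == 1

def bitset(*elements):
    """Build a set constant such as {0, 3, 15}."""
    s = 0
    for e in elements:
        s = incl(s, e)
    return s

s = bitset(0, 3, 15)
s = excl(incl(s, 5), 3)   # now {0, 5, 15}
print(bin(s))             # 0b1000000000100001
```

Union, intersection, and difference of such sets reduce to single machine instructions (`|`, `&`, `& ~`). This is one reason set cardinality is tied to the word length.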
Expressions
Expressions in Modula-2 are constructed in the same way as in Pascal. The operators have
essentially the same meaning. One important difference in Modula-2 is that expressions that
contain the operators AND or OR are interpreted as conditional expressions whose evaluation
is terminated as soon as the result of the expression is known (short-circuit evaluation):
a AND b is equivalent to if a then b else false
a OR b is equivalent to if a then true else b
Statements
Assignment, procedure call, and repeat-statement are taken from Pascal without change.
If, case, while, and for statements have been syntactically improved and expanded. The
if statement can have one or more elsif parts, the case statement can have an else part. All
of these constructs are explicitly terminated by END, which eliminates the need to
distinguish between single and multiple statements in a block:
ifstatement =
  IF expr THEN statementsequence
  {ELSIF expr THEN statementsequence}
  [ELSE statementsequence]
  END.
casestatement =
  CASE expr OF case {"|" case} [ELSE statementsequence] END.
case = caselabellist ":" statementsequence.
whilestatement =
  WHILE expr DO statementsequence END.
forstatement =
  FOR ident ":=" expr TO expr [BY constexpr] DO
  statementsequence
  END.
New features are the loop statement (infinite loop), the exit statement to leave the loop
statement, and the return statement to leave a procedure or function (for a function, the
return statement also passes the function value):
loopstatement = LOOP statementsequence END.
exitstatement = EXIT.
returnstatement = RETURN [expr].
There is no goto statement, and there are no input-output statements in Modula-2. Input and
output are done by procedure calls.
Procedures
There are procedures and function procedures as in Pascal, with value and VAR
parameters. Procedures and functions both begin with the keyword PROCEDURE. Modula-2
permits procedure variables (not used by Coco) and arrays of unspecified length (so-called
open arrays), e.g. in the form:
PROCEDURE Sort(VAR list: ARRAY OF INTEGER);
  VAR n: INTEGER;
BEGIN (* the actual array is indexed as list[0..n] *)
  n := HIGH(list); (* standard procedure returning the upper index bound *)
END Sort;
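The following Python sketch shows what the body of Sort might contain, to make the role of HIGH concrete. The sorting method is our choice, since the text leaves it open: a simple insertion sort. The function `high` is a hypothetical stand-in for HIGH. For an open array indexed 0..n it returns n, which is `len(a) - 1` for a Python list.

```python
def high(a):
    """Stand-in for Modula-2's HIGH: the upper index bound of an open array."""
    return len(a) - 1

def sort(lst):
    """Insertion sort on a list of any length, modifying it in place
    (the analogue of passing an open array as a VAR parameter)."""
    n = high(lst)
    for i in range(1, n + 1):        # insert elements 1..n one by one
        key = lst[i]
        j = i - 1
        while j >= 0 and lst[j] > key:
            lst[j + 1] = lst[j]      # shift larger elements to the right
            j -= 1
        lst[j + 1] = key

data = [5, 2, 9, 1]
sort(data)
print(data)   # [1, 2, 5, 9]
```

As with an open array, `sort` works for actual arrays of any length, including the empty one, because it never assumes a fixed index range.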
Standard procedures. The standard procedures that differ from Pascal are:
CAP(ch): CHAR      converts ch from lower to upper case
HIGH(a): CARDINAL  returns the upper bound of array a
DEC(x)             decrease: x := x-1
DEC(x,n)           decrease: x := x-n
EXCL(s,i)          exclude element i from set s: s := s-{i}
HALT               terminate the entire program
INC(x)             increase: x := x+1
INC(x,n)           increase: x := x+n
INCL(s,i)          include element i in set s: s := s+{i}
Type transfer functions. Modula-2 offers the possibility of explicit type conversions by
so-called type transfer functions. Each type name can be used as a function with one
argument. For example, the type transfer function
CARDINAL(b)
denotes the bit pattern of b (without any conversion) but with type CARDINAL. The context
condition is that the type of b must occupy the same number of bits as CARDINAL.
Type transfer functions should be used with care, since they make programs machine
dependent.
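A type transfer differs from a genuine conversion: the same 16-bit pattern is simply reinterpreted. The Python sketch below shows this using the standard `struct` module. It assumes a 16-bit two's-complement INTEGER and a 16-bit CARDINAL, as in the 16-bit implementations described earlier. The helper name `cardinal_transfer` is hypothetical.

```python
import struct

def cardinal_transfer(b):
    """Mimic CARDINAL(b) for a 16-bit INTEGER b: keep the bit pattern and
    change only the type. No range check or value conversion takes place."""
    raw = struct.pack("<h", b)          # 16-bit signed, little-endian
    return struct.unpack("<H", raw)[0]  # the same two bytes read as unsigned

print(cardinal_transfer(5))    # 5
print(cardinal_transfer(-1))   # 65535
```

Non-negative values look the same under both types. The negative INTEGER -1, however, has the bit pattern 0FFFFH and therefore appears as the CARDINAL 65 535.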
Modules
An executable Modula-2 program consists of one or more separately compiled modules. A
module is a collection of declarations and statements giving a higher-level unit. Module
boundaries are like a fence for names, which means that names declared inside a module are
unknown outside, and names declared outside a module are unknown inside. The programmer
can open the fence for selected names by an import list, which contains all names that are
declared outside and are to be known inside the module, and an export list, which contains
names that are declared inside the module and are to be known outside. Thus the flow of
names across module boundaries is explicitly specified by the programmer and visible in the
program text.
There are four kinds of modules: main modules, definition modules, implementation
modules, and inner modules.
Main modules are almost like Pascal programs. They consist of import lists, declarations
(of constants, types, variables, procedures, and inner modules), and statements:
programmodule =
  MODULE ident ";"
  {import}
  {declaration}
  BEGIN
  statement sequence
  END ident ".".
Only the line {import} is different from Pascal. It references other separately compiled
modules and causes these modules to be loaded. In the most common form
import = FROM ident IMPORT identlist ";".
ident is the name of the module to be loaded, and identlist contains the names of the
objects exported by the loaded module for use in the declarations and statements of the
importing module. In the less common form
import = IMPORT identlist ";".
the identlist contains only the names of the modules that are to be loaded together with the
importing module.
Separately compilable modules that are not main programs consist of two separately
compiled parts, the definition module and the implementation module. The definition
module describes the interface of the module to its clients. All declared names are
automatically exported.
definitionmodule =
  DEFINITION MODULE ident ";"
  {import}
  {definition}
  END ident ".".
definition contains the declarations of the exported objects. Procedures are only specified by
their procedure heading (procedure name and parameters):
definition =
    CONST ...
  | TYPE ...
  | VAR ...
  | PROCEDURE ident [formalparameters] ";".
The implementation module contains the declaration of the non-exported objects, the code
for all procedures, and the statements of the module:
implementationmodule =
  IMPLEMENTATION MODULE ident ";"
  {import}
  {declaration}
  BEGIN
  statement sequence
  END ident ".".
Definition and implementation modules exist in pairs and have the same name. The
definition module must be compiled before the implementation module. A module can be
compiled only if the definition modules of all of the imported modules have been compiled.
Storage for local objects of separately compiled modules is allocated when the object
program is loaded, and remains allocated until the program terminates (static memory
allocation). The statement sequence of the implementation module is executed immediately
after loading the module, and can therefore be used for the initialization of data.
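These rules define a partial order on compilation. The Python sketch below derives one valid order using `graphlib.TopologicalSorter` from Python's standard library (Python 3.9 or later). It assumes each module is described only by the modules its definition part and its implementation part import. The module names in the dictionary are a made-up example, not Coco's actual import structure.

In the resulting order, every definition module is compiled before any module that imports it, and each implementation module after its own definition module.

```python
from graphlib import TopologicalSorter

# Hypothetical example: module -> (modules imported by its definition part,
# modules imported by its implementation part). None marks a main module,
# which has no definition part.
modules = {
    "Lists":   ([],        ["Storage"]),
    "Storage": ([],        []),
    "Parser":  (["Lists"], ["Lists", "Storage"]),
    "main":    (None,      ["Parser", "Lists"]),
}

def compile_order(mods):
    """Return (kind, name) pairs, kind being DEF, IMP or MAIN, in a valid order."""
    graph = TopologicalSorter()
    for name, (def_imports, imp_imports) in mods.items():
        if def_imports is None:  # main module: needs the imported definitions
            graph.add(("MAIN", name), *[("DEF", m) for m in imp_imports])
            continue
        # a definition module needs the definition modules it imports
        graph.add(("DEF", name), *[("DEF", m) for m in def_imports])
        # an implementation module needs its own definition module and
        # the definition modules of everything it imports
        graph.add(("IMP", name), ("DEF", name),
                  *[("DEF", m) for m in imp_imports])
    return list(graph.static_order())   # raises CycleError on circular imports

order = compile_order(modules)
```

Clients depend only on definition modules, not on implementation modules. An implementation module can therefore be changed and recompiled without recompiling the modules that import it.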
Inner modules are modules that are not separately compiled. They are like procedures
nested inside other modules or procedures. They can import and export names:
moduledeclaration =
  MODULE ident ";"
  {import}
  [EXPORT [QUALIFIED] identlist ";"]
  {declaration}
  BEGIN
  statement sequence
  END ident.
Storage for local objects of inner modules is allocated when the procedure that contains the
inner module is activated, and released when the procedure returns to its caller. When the
surrounding procedure is called, the statements of the inner module are executed as well.
There is a (fictitious) separately compiled module SYSTEM, provided by the
compiler, that gives access to low-level features. It exports types and related procedures
(including the type WORD). Each module that imports SYSTEM is therefore machine
dependent.
C
Syntax of Cocol
Keywords:               Upper-case letters
Other terminal symbols: Literals or lower-case letters
Nonterminal symbols:    Upper and lower-case letters

Cocol         = GRAMMAR identifier
                [SEMANTIC DECLARATIONS {any}]
                [MACROS {SemMacroDef}]
                TERMINALS {Symbol [Attributes] [AliasName]}
                [PRAGMAS {Symbol [Attributes] [SemAction]}]
                NONTERMINALS {identifier [Attributes] [AliasName]}
                RULES {identifier [Attributes] "=" Expression "."}
                ENDGRAM.
Expression    = Term {"|" Term}.
Term          = Factor {Factor}.
Factor        = Symbol [Attributes]
              | EPS
              | ANY
              | SemAction
              | "(" Expression ")"
              | "[" Expression "]"
              | "{" Expression "}".
Attributes    = "<" ( OutAttributes
                    | InAttributes [";" OutAttributes] ) ">".
InAttributes  = IN ":" (identifier | number)
                {"," (identifier | number)}.
OutAttributes = OUT ":" identifier {"," identifier}.
SemAction     = SEM ( "(" identifier ")"
                    | {any}
                    ) ENDSEM.
SemMacroDef   = SEM ":" identifier ":" {any} ENDSEM.
Symbol        = identifier | string.
AliasName     = ALIAS Symbol.
D
G-code
0     T sy             terminal
      If the next input symbol is sy, then recognize it, else report an error.
1     TA sy adr        terminal with alternative
      If the next input symbol is sy, then recognize it, else go to adr.
2     NT sy            nonterminal
      If the next input symbol is a valid start of the nonterminal sy, then enter the
      production of sy, else report an error.
3     NTA sy adr       nonterminal with alternative
      If the next input symbol is a valid start of the nonterminal sy, then enter the
      production of sy, else go to adr.
4     NTS sy sem       nonterminal with input attribute semantics
      If the next input symbol is a valid start of the nonterminal sy, then execute the
      semantic action sem (for input attribute assignment) and enter the production of sy,
      else report an error.
5     NTAS sy adr sem  nonterminal with alternative and input attribute semantics
      If the next input symbol is a valid start of the nonterminal sy, then execute the
      semantic action sem (for input attribute assignment) and enter the production of sy,
      else go to adr.
6     ANY              any
      Recognize the next input symbol.
7     ANYA nr adr      any with alternative
      If the next input symbol is in the symbol set (any-set) denoted by nr, then recognize
      it, else go to adr.
8     EPS nr           epsilon (empty string)
      If the next input symbol is in the successor set (eps-set) denoted by nr, then
      recognize the empty string, else report an error.
9     EPSA nr adr      epsilon with alternative
      If the next input symbol is in the successor set (eps-set) denoted by nr, then
      recognize the empty string, else go to adr.
10    JMP adr          jump
      Go to adr.
11    RET              return
      Return from the production of a nonterminal.
12... SEM              semantic action
      Execute the semantic action whose number is given by the G-code instruction.
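The instruction set is small enough that an interpreter fits on a page. The following Python sketch is a hypothetical illustration, not Coco's parser. It rests on these assumptions:

- Instructions are tuples, and addresses are list indices.
- Each nonterminal's set of valid start symbols and each numbered any-set or eps-set is supplied as a Python set.
- The input is a list of token strings, followed by an implicit "EOF".
- NTS, NTAS, and SEM are omitted, since they differ from NT and NTA only by executing semantic actions.
- Errors raise an exception instead of triggering error recovery.

```python
class ParseError(Exception):
    """Raised wherever the G-code description says 'report an error'."""


def parse(code, start, first, sets, root, tokens):
    """Interpret G-code for `tokens`, beginning with nonterminal `root`.

    code  -- list of instruction tuples, e.g. ("T", sy) or ("TA", sy, adr)
    start -- nonterminal -> address of its production in `code`
    first -- nonterminal -> set of terminals that can start it
    sets  -- nr -> symbol set (the numbered any-sets and eps-sets)
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "EOF"

    def fail(pc):
        raise ParseError(f"unexpected {peek()!r} at token {pos} (address {pc})")

    def production(pc):
        nonlocal pos
        while True:
            op, *args = code[pc]
            if op == "T":                       # terminal
                if peek() != args[0]:
                    fail(pc)
                pos, pc = pos + 1, pc + 1
            elif op == "TA":                    # terminal with alternative
                sy, adr = args
                if peek() == sy:
                    pos, pc = pos + 1, pc + 1
                else:
                    pc = adr
            elif op == "NT":                    # nonterminal
                if peek() not in first[args[0]]:
                    fail(pc)
                production(start[args[0]])
                pc += 1
            elif op == "NTA":                   # nonterminal with alternative
                sy, adr = args
                if peek() in first[sy]:
                    production(start[sy])
                    pc += 1
                else:
                    pc = adr
            elif op == "ANY":                   # any symbol
                if peek() == "EOF":
                    fail(pc)
                pos, pc = pos + 1, pc + 1
            elif op == "ANYA":                  # any with alternative
                nr, adr = args
                if peek() in sets[nr]:
                    pos, pc = pos + 1, pc + 1
                else:
                    pc = adr
            elif op == "EPS":                   # empty string
                if peek() not in sets[args[0]]:
                    fail(pc)
                pc += 1
            elif op == "EPSA":                  # empty string with alternative
                nr, adr = args
                pc = pc + 1 if peek() in sets[nr] else adr
            elif op == "JMP":
                pc = args[0]
            elif op == "RET":
                return
            else:
                raise ValueError(f"unknown G-code operation {op!r}")

    production(start[root])
    if peek() != "EOF":
        raise ParseError(f"unexpected {peek()!r} after the end of {root}")
    return True


# Item = "a" | "b".    List = "(" Item {"," Item} ")".
code = [
    ("TA", "a", 2),  # 0: Item
    ("JMP", 3),      # 1
    ("T", "b"),      # 2
    ("RET",),        # 3
    ("T", "("),      # 4: List
    ("NT", "Item"),  # 5
    ("TA", ",", 9),  # 6: loop head
    ("NT", "Item"),  # 7
    ("JMP", 6),      # 8
    ("T", ")"),      # 9
    ("RET",),        # 10
]
start = {"Item": 0, "List": 4}
first = {"Item": {"a", "b"}, "List": {"("}}
print(parse(code, start, first, {}, "List", list("(a,b,a)")))  # True
```

The example shows how alternatives and iterations reduce to TA and JMP instructions. A failed TA is not an error. It only redirects control to the next alternative or out of the loop.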
E
Intermodular cross-reference list
The following list contains all names that are exported or imported by a module of the Coco
system as well as their data types. For every name, the first reference denotes the exporting
module and the other references the importing modules.
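A list of this kind can be derived mechanically from the modules' export and import lists. The Python sketch below assumes a simplified, hypothetical description of each module: the names it exports with their types, and the names it imports. The two sample entries are taken from the list below, but the data layout and the function `cross_reference` are ours.

```python
def cross_reference(exports, imports):
    """Build {name: (type, [exporting module, importing modules...])}.

    exports -- module -> {name: type} for the names the module exports
    imports -- module -> set of names the module imports
    """
    xref = {}
    for module, names in exports.items():
        for name, typ in names.items():
            xref[name] = (typ, [module])
    for module, names in imports.items():
        for name in sorted(names):
            if name not in xref:  # a compiler would reject this import
                raise KeyError(f"{name} is imported by {module} but exported by no module")
            xref[name][1].append(module)
    # sort alphabetically, ignoring case
    return dict(sorted(xref.items(), key=lambda item: item[0].lower()))


exports = {"cocogra": {"alts": "CARDINAL"}, "cocogen": {"CloseFile": "PROC"}}
imports = {"cocogen2": {"alts"}, "coco": {"CloseFile"}, "cocosem": {"alts", "CloseFile"}}
xref = cross_reference(exports, imports)
for name, (typ, modules) in xref.items():
    print(name, typ)
    print("    " + ", ".join(modules))
# alts CARDINAL
#     cocogra, cocogen2, cocosem
# CloseFile PROC
#     cocogen, coco, cocosem
```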
Allocate PROC (VAR ptr:ADDRESS; size:LONGINT)
System, cocogen, cocogen2, cocosym, cocosyn, Errors
alts CARDINAL
cocogra, cocogen2, cocosem
at ARRAY[1..10] OF CARDINAL
cocolex, cocogen, cocosem, cocosyn
Attrtype (term,nonterm,const)
cocogen, cocosem
ClearMarkList PROC (VAR m:Marklist)
cocogra, cocosym, cocotst
ClearSet PROC (VAR s:Symbolset; n:CARDINAL)
cocosym, cocotst
Close PROC (f:File)
FileIO, coco, cocogen, cocogen2, cocolst
CloseFile PROC
cocogen, coco, cocosem
col CARDINAL
cocolex, cocogen, cocogen2, cocosem, cocosym, cocosyn
CompErr PROC (nr:CARDINAL)
Errors, cocogen, cocogen2, cocosem, cocosym
CompleteAt PROC (sy,nr:CARDINAL): BOOLEAN
cocosym, cocosem
con File
FileIO, coco, cocogen, cocogen2, cocogra, cocolex, cocosem, cocosym, cocosyn, cocotst, Errors
ConcatLeft PROC (VAR gp,gl,gp1,gl1:CARDINAL)
cocogra, cocosem
ConcatRight PROC (VAR gp,gl,gp1,gl1:CARDINAL)
cocogra, cocosem
Copy PROC (typ,col:CARDINAL)
cocogen, cocosem
CopyFramePart PROC (VAR f1,f2:File; s:ARRAY OF CHAR)
cocogen, cocogen2
ddt ARRAY["A".."Z"] OF BOOLEAN
cocolex, coco, cocogra, cocosem, cocosym, cocotst
Deallocate PROC (VAR ptr:ADDRESS)
System, cocogen, cocogen2, Errors
Deletable PROC (loc:CARDINAL): BOOLEAN
cocogra, cocosym, cocotst
DeleteRedundantEps PROC
cocogra, coco
DelNode PROC (gn:Graphnode): BOOLEAN
cocogra, cocosym, cocotst
Direction (up,down)
cocosym, cocosem
Done BOOLEAN
FileIO, coco, cocogen, cocogen2
EF CONST CHAR
FileIO, cocolex, cocolst
EmitAction PROC (line:CARDINAL; VAR sem:CARDINAL)
cocogen, cocosem
EOL CONST CHAR
FileIO, cocolex, cocolst
Errornode RECORD
Errors, cocosyn
Errorptr POINTER TO Errornode
Errors, cocolst, cocosyn
File RECORD
FileIO, coco, cocogen, cocogen2, cocolex, cocolst, Errors
filesopen BOOLEAN
cocogen, coco
FindCircularRules PROC (VAR ok:BOOLEAN)
cocotst, coco
FindDelSymbols PROC
cocosym, coco
GenAssign PROC (typ:Attrtype; left,right:CARDINAL)
cocogen, cocosem
GenSynFiles PROC
cocogen2, coco
GetA PROC (n:CARDINAL; VAR set:Symbolset)
cocosym, cocogen2
GetAt PROC (sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction)
cocosym, cocosem
GetE PROC (n:CARDINAL; VAR set:Symbolset)
cocosym, cocogen2
GetF PROC (sy:CARDINAL; VAR first:Symbolset)
cocosym, cocogen2, cocotst
GetFirstSet PROC (loc:CARDINAL; VAR set:Symbolset)
cocosym, cocotst
GetFo PROC (sy:CARDINAL; VAR set:Symbolset)
cocosym, cocotst
GetMacroNr PROC (spix:CARDINAL; VAR sem:CARDINAL)
cocosym, cocosem
GetName PROC (spix:CARDINAL; VAR name:ARRAY OF CHAR; VAR len:CARDINAL)
cocolex, cocogen, cocogen2, cocogra, cocosym, cocotst
GetNextSemErr PROC (VAR nr,line,col:CARDINAL)
Errors, cocolst
GetNextSynErr PROC (VAR symbols:Errorptr; VAR line,col:CARDINAL)
Errors, cocolst
GetNode PROC (p:CARDINAL; VAR gn:Graphnode)
cocogra, cocogen2, cocosem, cocosym, cocotst
GetNumberOfErrors PROC (VAR synerrors,semerrors:CARDINAL)
Errors, coco
GetSy PROC
cocolex, cocosyn
GetSy PROC (sy:CARDINAL; VAR sn:Symbolnode)
cocosym, cocogen2, cocogra, cocosem, cocotst
GetSymbolSets PROC
cocosym, coco
gramspix CARDINAL
cocosym, cocogen2, cocosem
GraphList PROC
cocogra, cocosem
Graphnode RECORD
cocogra, cocogen2, cocosem, cocosym, cocotst
InsertFramePart PROC
cocogen, cocosem
IsInSet PROC (n:CARDINAL; VAR s:Symbolset): BOOLEAN
cocosym, cocotst
line CARDINAL
cocolex, cocogen, cocogen2, cocosem, cocosym, cocosyn
LL1Test PROC (VAR ll1:BOOLEAN)
cocotst, coco
lst File
cocolst, coco, cocogen2, cocosym, cocotst
Mark PROC (loc:CARDINAL; VAR m:Marklist)
cocogra, cocosym, cocotst
Marked PROC (loc:CARDINAL; VAR m:Marklist): BOOLEAN
cocogra, cocosym, cocotst
Marklist ARRAY[0..maxnodes DIV 16] OF BITSET
cocogra, cocosym, cocotst
maxany CARDINAL
cocosym, cocogen2
maxeps CARDINAL
cocosym, cocogen2
maxn CARDINAL
cocogra, cocogen2, cocosym
maxp CARDINAL
cocosym, cocogen2, cocogra, cocotst
maxs CARDINAL
cocosym, cocogen2, cocogra, cocotst
maxsem CARDINAL
cocogen, cocogen2
maxt CARDINAL
cocosym, cocogen2, cocotst
NewAt PROC (sy,spix:CARDINAL; dir:Direction)
cocosym, cocosem
NewEpsBeforeDelNts PROC
cocogra, coco
NewMacro PROC (spix,sem:CARDINAL; VAR ok:BOOLEAN)
cocosym, cocosem
NewNode PROC (typ:Symboltype; sp,line:CARDINAL): CARDINAL
cocogra, cocosem
NewSy PROC (spix:CARDINAL; typ:Symboltype): CARDINAL
cocosym, cocosem
normal enumeration constant
System, coco, Errors
Open PROC (VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR; output:BOOLEAN)
FileIO, coco, cocogen, cocogen2, cocolst
OpenFile PROC (spix:CARDINAL)
cocogen, cocosem
OpenSem PROC (line:CARDINAL; VAR sem:CARDINAL)
cocogen, cocosem
Parse PROC (VAR correct:BOOLEAN)
cocosyn, coco
printinput BOOLEAN
cocosyn, coco, cocolex
PrintListing PROC
cocolst, coco
printnodes BOOLEAN
cocosyn, coco, cocolex
PrintSynError PROC (VAR f:File; VAR synerrors:CARDINAL)
Errors, cocolst
PutStatistics PROC
cocogen2, coco
Read PROC (VAR f:File; VAR ch:CHAR)
FileIO, coco, cocogen, cocolex, cocolst, Errors
RepNode PROC (p:CARDINAL; gn:Graphnode)
cocogra, cocosem, cocosym
RepSy PROC (sy:CARDINAL; sn:Symbolnode)
cocosym, cocogen2, cocogra, cocosem, cocotst
RestartHash PROC
cocolex, cocosem
Restriction PROC (nr:CARDINAL)
Errors, cocogra, cocolex, cocosem, cocosym
rootloc CARDINAL
cocogra, cocogen2, cocosem, cocosym, cocotst
rules CARDINAL
cocogra, cocogen2, cocosem
Semant PROC (sem:CARDINAL)
cocosem, cocosyn
SemErr PROC (nr,line,col:CARDINAL)
Errors, cocogen, cocogen2, cocolex, cocosem, cocosym
SetBit PROC (VAR s:Symbolset)
cocosym, cocotst
src File
cocolex, coco, cocogen, cocolst
StartCopy PROC (col:CARDINAL)
cocogen, cocosem
StopHash PROC
cocolex, cocosem
Symbolnode RECORD
cocosym, cocogen2, cocogra, cocosem, cocotst
Symbolset ARRAY[0..maxterminals DIV 16] OF BITSET
cocosym, cocogen2, cocotst
Symboltype (eps,t,pr,nt,any,err)
cocosym, cocogen2, cocogra, cocosem, cocotst
SyNr PROC (spix:CARDINAL): CARDINAL
cocosym, cocosem
SyntaxError PROC (symbols:Errorptr; line,col:CARDINAL)
Errors, cocosyn
Terminate PROC (st:Status)
System, coco, Errors
TestCompleteness PROC (VAR ok:BOOLEAN)
cocotst, coco
TestIfAllNtReached PROC (VAR ok:BOOLEAN)
cocotst, coco
TestIfNtToTerm PROC (VAR ok:BOOLEAN)
cocotst, coco
typ CARDINAL
cocolex, cocosem, cocosyn
Unit PROC (VAR s1,s2:Symbolset; n:CARDINAL)
cocosym, cocotst
Write PROC (VAR f:File; ch:CHAR)
FileIO, cocogen, cocogen2, cocolex, cocolst, cocosym, Errors
WriteCard PROC (VAR f:File; nr:CARDINAL; w:INTEGER)
FileIO, cocogen, cocogen2, cocogra, cocolex, cocolst, cocosem, cocosym, cocosyn, cocotst, Errors
WriteInt PROC (VAR f:File; nr:INTEGER; w:INTEGER)
FileIO, coco
WriteLn PROC (VAR f:File)
FileIO, coco, cocogen, cocogen2, cocogra, cocolst, cocosym, cocosyn, cocotst, Errors
WriteString PROC (VAR f:File; s:ARRAY OF CHAR)
FileIO, coco, cocogen, cocogen2, cocogra, cocolex, cocolst, cocosem, cocosym, cocosyn, cocotst, Errors
WriteText PROC (VAR f:File; t:ARRAY OF CHAR; l:INTEGER)
FileIO, cocogen, cocogen2, cocogra, cocolex, cocosym, cocotst, Errors
F
Program listings
This appendix contains the program listings of Coco, more than 3500 lines of Modula-2
source code. It is not our intention to describe the program step by step. Instead, we want
to give the reader an overview of the function of the individual modules, and to indicate
where to start reading and which procedures to study further in order to understand the
program. Modula-2 programs are largely self-documenting: the language makes it possible
to partition a large program into small modules that are easy to understand, and to divide
these modules further into even smaller procedures that are again easy to understand. With
the algorithms of Chapters 2, 3, and 7 in mind, the reader should have no difficulty
understanding all the details of Coco.
F.1 Overview
Figure F.1 shows the phases of Coco with their modules and the data flow between them.
The lexical analyzer (cocolex) reads the compiler description and separates it into
tokens. The syntax analyzer (cocosyn) checks the syntax of the input stream and drives the
semantic processing program (cocosem) by activating semantic actions via action numbers.
In this phase, the symbol list (in cocosym) and the top-down graph (in cocogra) are
generated. The module cocogen generates the new semantics evaluator from the semantic
actions of the compiler description. Finally, the symbol list and the top-down graph are
analyzed in the grammar tests (cocotst), and if these tests have been successfully completed,
the new syntax analyzer with its parser tables is generated.
Since Coco was constructed by itself, the syntax analyzer (cocosyn) and its semantic
evaluator (cocosem) are examples of compiler parts produced by Coco.
[Figure: compiler description -> lexical analysis (cocolex) -> symbols, attributes -> syntax analysis (cocosyn) and semantic analysis (cocosem, cocosym, cocogra) -> symbol list, top-down graph; semantic evaluator generation (cocogen); grammar tests (cocotst) and compiler generation (cocogen2) -> syntax analyzer]
Fig. F.1 Phases and modules of Coco
F.2 Module hierarchy
Coco consists of
1. Ten Coco-related modules:
   coco       main module
   cocolex    lexical analyzer
   cocosyn    syntax analyzer
   cocosem    semantic evaluator
   cocogra    top-down graph handler
   cocosym    symbol list handler
   cocotst    grammar tests
   cocogen    generator of the new semantic evaluator
   cocogen2   generator of the new syntax analyzer and the parser tables
   cocolst    source list generator
2. Two general-purpose standard modules:
   Errors     general error module for compilers generated by Coco
   FileIO     input/output procedures
3. One operating system module (not part of Coco):
   System     dynamic memory management (heap)
Figure F.2 shows the module hierarchy. An arrow from module A to module B means that
A calls B.
Arrows leading to the operating system module and the standard modules are not shown
for simplicity. Those modules are used by almost all of the other modules, and are not a
direct part of Coco.
[Figure: module hierarchy of coco, cocosyn, cocosem, cocogen, cocogen2, cocogra, cocotst, cocolex, cocolst, and cocosym, drawn above the modules System, FileIO, and Errors]
Fig. F.2 Module hierarchy with relation 'uses procedures from'
F.3 Module descriptions
We will now give a short description of all modules of the Coco system. For each module,
a diagram shows which procedures it calls in other modules.
coco
coco is the main module. It opens the source file and the list file and calls the syntax
analyzer (Parse). When the syntax analysis is completed, the source file has been read, and
the symbol list and the top-down graph have been stored. The top-down graph is further
processed by inserting and deleting eps-nodes at certain positions (NewEpsBeforeDelNts,
DeleteRedundantEps), and the deletable nonterminals and the terminal start symbols are
determined (FindDelSymbols, GetSymbolSets). After that, coco calls the grammar tests
(FindCircularRules, TestIfNtToTerm, TestCompleteness, TestIfAllNtReached, LL1Test) and
generates the target compiler (GenSynFiles) if no errors are found. At the end, statistics
about the compilation are written to the list file (PutStatistics), and all files are closed.
[Figure: coco imports Parse from cocosyn; FindDelSymbols, GetSymbolSets from cocosym; GenSynFiles, PutStatistics from cocogen2; CloseFile from cocogen; PrintListing from cocolst; FindCircularRules, TestIfNtToTerm, TestCompleteness, TestIfAllNtReached, LL1Test from cocotst]
Fig. F.3 coco and the modules imported by it
cocolex
cocolex is the lexical analyzer of Coco. It reads the Cocol input, separates it into tokens,
and passes them together with their attributes to the syntax analyzer. Names and strings are
stored in a name list. Numbers are translated into their numeric value. The main procedure of
cocolex is GetSy.
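The name list of cocolex is not shown here. The Python sketch below gives a rough idea of such a scanner. It assumes that each distinct name or string is stored once and identified by a consecutive index, numbered from 1. That is our reading of parameters such as `spix` in the cross-reference list. The class `NameList`, the function `scan`, the token kinds, and the regular expression are all hypothetical, and they cover far less than Cocol's actual lexical structure.

```python
import re

class NameList:
    """Store each distinct name once and refer to it by an index (spix)."""
    def __init__(self):
        self.index = {}   # name -> spix
        self.names = []   # position spix - 1 holds the name

    def enter(self, name):
        if name not in self.index:
            self.names.append(name)
            self.index[name] = len(self.names)   # numbered from 1 (assumption)
        return self.index[name]

    def get_name(self, spix):
        return self.names[spix - 1]

TOKEN = re.compile(r'\s*(?:([A-Za-z][A-Za-z0-9]*)|(\d+)|("[^"]*")|(\S))')

def scan(text, names):
    """Yield (kind, attribute) pairs: names and strings are entered into the
    name list, numbers become their numeric value, other characters stand
    for themselves."""
    pos = 0
    while True:
        m = TOKEN.match(text, pos)
        if m is None:                 # only white space is left
            return
        pos = m.end()
        ident, number, string, other = m.groups()
        if ident:
            yield ("ident", names.enter(ident))
        elif number:
            yield ("number", int(number))
        elif string:
            yield ("string", names.enter(string))
        else:
            yield ("sym", other)

names = NameList()
tokens = list(scan('x = 12 "abc" x;', names))
print(tokens)
# [('ident', 1), ('sym', '='), ('number', 12), ('string', 2), ('ident', 1), ('sym', ';')]
```

Both occurrences of `x` yield the same index. The parser can therefore compare names by comparing small integers.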
cocosyn
cocosyn is the syntax analyzer of Coco and has been generated by Coco itself. It operates
according to the table-driven LL(1) parsing algorithm described in Section 2.5, and uses the
error-handling mechanism described in Section 2.6. cocosyn gets the source tokens from the
lexical analyzer (GetSy), analyzes them, and calls the procedure Semant to execute the
semantic actions.
[Figure: cocosyn imports GetSy from cocolex and Semant from cocosem]
Fig. F.4 cocosyn and the modules imported by it
cocosem
cocosem is the semantic evaluator of Coco. It has been generated by Coco itself and
contains the semantic actions of the attributed grammar of Coco. cocosem calls the
procedures for the generation and management of the symbol list and the top-down graph:
1. symbol handling: NewSy, GetSy, RepSy, SyNr;
2. attribute handling: NewAt, GetAt, CompleteAt;
3. top-down graph handling: NewNode, GetNode, RepNode, ConcatLeft, ConcatRight,
   GraphList;
4. generation of the semantic evaluator: OpenFile, CloseFile, OpenSem, StartCopy, Copy,
   InsertFramePart, GenAssign, EmitAssign, EmitAction;
5. handling of the semantic macros: NewMacro, GetMacroNr;
6. control over the entries into the name list: StopHash, RestartHash.
The listing of cocosem is an example of a large semantic evaluator generated by Coco.
However, rather than studying cocosem itself, the reader should study the attributed grammar
from which it was generated.
[Figure: cocosem imports StopHash, RestartHash from cocolex; NewSy, GetSy, RepSy, SyNr, NewAt, GetAt, CompleteAt, NewMacro, GetMacroNr from cocosym; NewNode, GetNode, RepNode, ConcatLeft, ConcatRight, GraphList from cocogra; OpenFile, CloseFile, Copy, InsertFramePart, StartCopy, OpenSem, GenAssign, EmitAction from cocogen]
Fig. F.5 cocosem and the modules imported by it
cocosym
The module cocosym handles the symbol list of Coco. It contains procedures to generate,
read, and modify symbol nodes, to search names in the symbol list, to enter, read, and check
attributes, and to generate and retrieve information about semantic macros. It also contains
procedures to determine the deletability of nonterminals and to collect their terminal start
symbols. cocosym uses a few procedures from cocolex and cocogra.
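Deletability and terminal start symbols are usually computed by iterating until nothing changes (a fixed point). The Python sketch below shows that standard technique. The grammar is given as a dictionary of alternatives: each alternative is a list of symbols, and nonterminals are the dictionary keys.

The sketch works on plain BNF rather than on Coco's top-down graph. It therefore illustrates the idea behind FindDelSymbols and GetSymbolSets, not their actual code. The function name `deletable_and_first` is hypothetical.

```python
def deletable_and_first(grammar):
    """Return (set of deletable nonterminals, {nonterminal: start terminals})."""
    deletable = set()
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                       # repeat until nothing new is learned
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                for sym in alt:
                    if sym in grammar:   # nonterminal: inherit its start symbols
                        if not first[sym] <= first[nt]:
                            first[nt] |= first[sym]
                            changed = True
                        if sym not in deletable:
                            break        # later symbols cannot start nt
                    else:                # terminal: it starts nt, stop here
                        if sym not in first[nt]:
                            first[nt].add(sym)
                            changed = True
                        break
                else:                    # every symbol of alt can vanish
                    if nt not in deletable:
                        deletable.add(nt)
                        changed = True
    return deletable, first

grammar = {
    "List":  [["(", "Items", ")"]],
    "Items": [["Item", "Rest"], []],
    "Rest":  [[",", "Item", "Rest"], []],
    "Item":  [["a"], ["b"]],
}
deletable, first = deletable_and_first(grammar)
print(sorted(deletable))   # ['Items', 'Rest']
```

The iteration terminates because both results only grow and are bounded by the finite sets of symbols. The fixed point also handles left-recursive rules without looping.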
[Figure: cocosym imports GetName from cocolex; GetNode, RepNode, Deletable, DelNode, ClearMarkList, Mark, Marked from cocogra]
Fig. F.6 cocosym and the modules imported by it
cocogra
The module cocogra handles the top-down graph. It contains procedures to generate, read,
and modify graph nodes, to link subgraphs, and to print the entire top-down graph for
tracing. cocogra also contains procedures to insert eps-nodes in front of deletable
nonterminals and to remove redundant eps-nodes. To output the top-down graph, cocogra
needs the syntax symbols and their names, which it gets from the modules cocosym and
cocolex.
[Figure: cocogra imports GetName from cocolex; GetSy, RepSy from cocosym]
Fig. F.7 cocogra and the modules imported by it
cocogen
The module cocogen generates the semantic evaluator of the target compiler from the
semantic declarations and semantic actions of the input grammar. It contains procedures to
read the frame module, to copy the semantic parts from the attributed grammar, and to
translate attributes into semantic actions. cocogen uses no other modules of Coco except
for the lexical analyzer, from which it gets the symbol names.
[Figure: cocogen imports GetName from cocolex]
Fig. F.8 cocogen and the modules imported by it
cocotst
The module cocotst is a collection of procedures for the execution of the grammar tests as
described in Section 7.5. It uses the symbol list (from cocosym) and the top-down graph
(from cocogra). For the output of error messages, cocotst needs the symbol names which
are obtained with the procedure GetName. To recognize the deletability of graph nodes and
subgraphs, it uses the procedures Deletable and DelNode from cocogra.
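The core of an LL(1) test can be stated briefly. At each branching point, the sets of symbols that can start the alternatives must be pairwise disjoint. For a deletable alternative, that set must also include the symbols that may follow the branching point.

The Python sketch below checks this condition for one branching point whose start sets have already been computed. The function name `ll1_conflicts` and the input format are hypothetical.

```python
from itertools import combinations

def ll1_conflicts(alternatives):
    """Return the pairs of alternatives whose start sets overlap.

    alternatives -- list of (label, set of start terminals); for an
    alternative that can derive the empty string, the set is assumed to
    include the terminals that may follow the branching point.
    """
    conflicts = []
    for (la, sa), (lb, sb) in combinations(alternatives, 2):
        common = sa & sb
        if common:
            conflicts.append((la, lb, sorted(common)))
    return conflicts

statement = [("assignment", {"ident"}), ("call", {"ident"}), ("while", {"while"})]
print(ll1_conflicts(statement))   # [('assignment', 'call', ['ident'])]
```

A conflict means that one symbol of lookahead cannot decide which alternative to enter. In G-code terms, the alternative tried first would capture input meant for the other.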
[Figure: cocotst imports GetSy, RepSy, GetFirstSet, GetF, GetFo, SetBit, IsInSet, Unit, ClearSet from cocosym; GetNode, Deletable, DelNode, ClearMarkList, Mark, Marked from cocogra; GetName from cocolex]
Fig. F.9 cocotst and the modules imported by it
cocogen2
The module cocogen2 generates the syntax analyzer and the parser tables of the target
compiler. The table values are obtained from the symbol list (with GetSy, RepSy, GetF,
GetE, and GetA) and from the top-down graph (GetNode). Before the tables can be
inserted into the syntax analyzer, cocogen2 transforms the top-down graph into G-code
instructions. The syntax analyzer of the target compiler is assembled mainly from the frame
parts (in the file cocosynframe), into which cocogen2 inserts the parser tables, some
declarations, and grammar-specific names. For the output of statistics, cocogen2 uses the
procedure GetName from the lexical analyzer.
Fig. F.10 cocogen2 and the modules imported by it (from cocogen: CopyFramePart; from cocogra: GetNode; from cocosym: GetSy, RepSy, GetF, GetE, GetA; from cocolex: GetName)
cocolst
cocolst is called by the main program if errors have been detected during parsing. It reads
the input again and prints a source list with error messages.
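A minimal Python sketch of producing such a source list: every source line is printed with its number, followed by a marker under the column of each error reported for that line. The function name print_listing, the (line, column, message) tuple format and the output layout are assumptions made for this illustration. The actual format of cocolst is not described here.

```python
from typing import Dict, List, Tuple

def print_listing(source: str, errors: List[Tuple[int, int, str]]) -> str:
    """Number the source lines and put each error, given as
    (line, column, message) with 1-based positions, under its line."""
    by_line: Dict[int, List[Tuple[int, str]]] = {}
    for line, col, msg in errors:
        by_line.setdefault(line, []).append((col, msg))
    out: List[str] = []
    for nr, text in enumerate(source.splitlines(), start=1):
        out.append(f"{nr:5}  {text}")                  # 5-digit number, 2 blanks
        for col, msg in sorted(by_line.get(nr, [])):
            # same 7-character prefix width, so the caret lands under column col
            out.append("*****  " + " " * (col - 1) + "^ " + msg)
    return "\n".join(out)

print(print_listing("a := b\nc := ;", [(2, 6, "expression expected")]))
```

Because the messages are only merged in when the listing is printed, the parser can simply store errors as it finds them and never has to produce output in source order itself.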
Errors
Errors is a general-purpose error message module that can be used by all compilers
generated by Coco. It contains procedures for storing semantic and syntax errors, for
retrieving stored error messages, and for printing all of the stored error messages at the end of
the program. In addition, it contains procedures for handling implementation restrictions and
compiler errors.
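An error module of this kind mainly needs to store errors as they are reported, count them, and return a sorted report at the end. The Python class below is a hypothetical sketch. sem_err takes its arguments in the order nr, line, col used by SemErr in the attributed grammar, and number_of_errors returns the pair that the main program obtains from GetNumberOfErrors. Everything else, including names and the message format, is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ErrorStore:
    """Hypothetical error store; entries are (line, col, kind, nr)."""
    entries: List[Tuple[int, int, str, int]] = field(default_factory=list)

    def syn_err(self, nr: int, line: int, col: int) -> None:
        self.entries.append((line, col, "syntax", nr))

    def sem_err(self, nr: int, line: int, col: int) -> None:
        self.entries.append((line, col, "semantic", nr))

    def number_of_errors(self) -> Tuple[int, int]:
        """(syntax errors, semantic errors)"""
        syn = sum(1 for e in self.entries if e[2] == "syntax")
        return syn, len(self.entries) - syn

    def report(self) -> List[str]:
        """All stored errors in source order, for printing at the end."""
        return [f"line {line}, col {col}: {kind} error {nr}"
                for line, col, kind, nr in sorted(self.entries)]

errs = ErrorStore()
errs.sem_err(8, 12, 5)           # located at line 12, reported first
errs.syn_err(3, 4, 1)            # located at line 4, reported second
print(errs.number_of_errors())   # (1, 1)
print("\n".join(errs.report()))
```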
FileIO
FileIO is a general-purpose module that contains screen and disk I/O procedures for
characters, strings, and numbers. It is based on five system modules which are not described
in this book. These are Terminal, MemTypes, OS, Toolbox and QuickDraw (see Inside
Macintosh [1985] and Wirth et al. [1986]).
System
System is an operating system module that among other things manages the heap.
F.4 Instructions on how to study the source code
The listings consist of the attributed grammar of Coco and all other modules in alphabetical
order. The reader should first study the source code of the main module coco to see how the
program is started and initialized. The lexical analyzer and the syntax analyzer are not
essential for an understanding of the other modules, so they may be skipped in the
beginning.
The central document that describes the actual translation is the attributed grammar.
The reader should study the attributed grammar and the procedures that are called from the
semantic actions in detail. It is recommended that the procedures belonging to a particular
task are studied together. These tasks are:
1. handling the symbol list: NewSy, GetSy, RepSy, IsSy
2. handling the attributes: NewAt, GetAt, CompleteAt
3. handling the top-down graph: NewNode, GetNode, RepNode, ConcatLeft,
   ConcatRight, GraphList
4. generating the semantic evaluator: CloseFile, CopyFramePart, InsertFramePart
5. copying semantic parts: OpenSem, StartCopy, Copy
6. generating attribute assignments: GenAssign, EmitAction
7. handling semantic macros: NewMacro, GetMacroNr
8. controlling the name list entries: StopHash, RestartHash
The procedures for the collection of the symbol sets and the execution of the grammar tests
may be studied in any order. The only procedures used almost everywhere are the procedures
for marking paths that have been previously visited in traversing the top-down graph
(ClearMarkList, Mark, and Marked in cocogra) and the procedures which check the
deletability of graphs and graph nodes (Deletable and DelNode in cocogra). These
procedures should be read first.
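The mark list matters because the top-down graph contains cycles, since an iteration leads back to its own start, so a traversal that did not remember the nodes it has visited could run forever. Below is a minimal Python sketch of the idea, assuming nodes are identified by integers and successors are given as lists. MarkList only mirrors the roles of ClearMarkList, Mark and Marked. It is not their implementation, and reachable is a hypothetical helper.

```python
from typing import Dict, List, Set

class MarkList:
    """Remembers which graph nodes a traversal has already visited."""
    def __init__(self) -> None:
        self._marked: Set[int] = set()

    def clear(self) -> None:          # role of ClearMarkList
        self._marked.clear()

    def mark(self, node: int) -> None:      # role of Mark
        self._marked.add(node)

    def marked(self, node: int) -> bool:    # role of Marked
        return node in self._marked

def reachable(graph: Dict[int, List[int]], start: int, marks: MarkList) -> List[int]:
    """Depth-first traversal that terminates even on cyclic graphs."""
    marks.clear()
    order: List[int] = []
    stack = [start]
    while stack:
        n = stack.pop()
        if marks.marked(n):
            continue                  # already visited: do not loop again
        marks.mark(n)
        order.append(n)
        # push successors in reverse so they are visited in list order
        stack.extend(reversed(graph.get(n, [])))
    return order

g = {1: [2, 3], 2: [1], 3: [3, 4], 4: []}   # 2 -> 1 and 3 -> 3 are cycles
print(reachable(g, 1, MarkList()))           # [1, 2, 3, 4]
```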
The reader should study cocogen2 last. It generates the parser tables and the syntax
analyzer from the data structures built by the other modules, so those modules should be
studied first to understand how the data structures are filled.
Before an implementation module is studied, the corresponding definition module
should be inspected. It describes the interface of the module, and contains the declarations and
descriptions of all exported objects. The procedures of an implementation module appear in
alphabetical order. Most of them are at the outermost level of the module. Only auxiliary
procedures that are clearly part of another procedure are nested within this procedure.
Each implementation module is followed by a cross-reference list. As an additional aid,
Appendix E contains an intermodular cross-reference list with the names and types of all
objects transferred between modules. This list also shows which modules export an object
and which import it.
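A cross-reference list of the kind printed after each module maps every name to the numbers of the lines on which it occurs, with one entry per occurrence. Below is a simplified Python sketch. The regular expression for names and the hand-picked subset of Modula-2 reserved words to skip are both assumptions. The tool that produced the lists in this appendix is not described.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# A hand-picked subset of Modula-2 reserved words (an assumption, not complete).
KEYWORDS = {
    "AND", "ARRAY", "BEGIN", "CASE", "CONST", "DEFINITION", "DO", "ELSE",
    "ELSIF", "END", "EXPORT", "FOR", "FROM", "IF", "IMPLEMENTATION",
    "IMPORT", "MODULE", "NOT", "OF", "OR", "PROCEDURE", "RETURN", "THEN",
    "TO", "TYPE", "VAR", "WHILE",
}
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def cross_reference(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each name to the line numbers of its occurrences (repeated when
    a name occurs several times on one line), sorted by name without
    regard to case."""
    refs: Dict[str, List[int]] = defaultdict(list)
    for nr, text in enumerate(lines, start=1):
        for name in IDENT.findall(text):
            if name not in KEYWORDS:
                refs[name].append(nr)
    return dict(sorted(refs.items(), key=lambda kv: (kv[0].lower(), kv[0])))

for name, nrs in cross_reference(["VAR i: INTEGER;", "BEGIN i:=i+1 END"]).items():
    print(f"{name:<10}", *nrs)
```

The sketch does not distinguish names inside comments or string literals from names in code. It also sorts without regard to case, which matches the order of the lists in this appendix.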
Program listings in alphabetical order

File                          Contents                              Page
coco.ATG                      attributed grammar                     228
coco.MOD                      main program                           241
cocogen.DEF, cocogen.MOD      generator of semantics processor       245
cocogen2.DEF, cocogen2.MOD    generator of the syntax analyzer       254
cocogra.DEF, cocogra.MOD      top-down graph manager                 266
cocolex.DEF, cocolex.MOD      lexical analyzer                       274
cocolst.DEF, cocolst.MOD      source list generator                  283
cocosem.DEF, cocosem.MOD      semantic evaluator of Coco             287
cocosemframe                  semantics evaluator frame              297
cocosym.DEF, cocosym.MOD      symbol list manager                    299
cocosyn.DEF, cocosyn.MOD      syntax analyzer                        316
cocosynframe                  syntax analyzer frame                  328
cocotst.DEF, cocotst.MOD      grammar tests                          338
Errors.DEF, Errors.MOD        standard error module                  348
FileIO.DEF, FileIO.MOD        input/output module                    356
System.DEF                    dynamic memory management              369
  1 --===================================================================
  2 -- Attributed grammar of Coco                             Moe 13.3.83
  3 -- This grammar is a documentation of the compiler compiler Coco,
  4 -- but it is also an example how to use the Coco input language Cocol.
  5 -- The grammar describes the construction of the parser tables and of
  6 -- the semantic evaluator.
  7
  8 GRAMMAR coco
  9
 10 -- coco      = GRAMMARSY IDENT
 11 --             [SEMANTICSY DECLARATIONSY {any}]
 12 --             [MACROSY {macrodef}]
 13 --             TERMINALSY {symbol [attr] [aliasname]}
 14 --             [PRAGMASY {symbol [attr] [semaction]}]
 15 --             NONTERMINALSY {IDENT [attr] [aliasname]}
 16 --             RULESSY {IDENT [attr] '=' expr '.'}
 17 --             ENDGRAMSY .
 18 -- expr      = term {'|' term} .
 19 -- term      = fact {fact} .
 20 -- fact      = ( symbol [attr]
 21 --             | EPSSY
 22 --             | ANYSY
 23 --             | semaction
 24 --             | '(' expr ')'
 25 --             | '[' expr ']'
 26 --             | '{' expr '}' ) .
 27 -- attr      = '<' (outattr | inattr [';' outattr]) '>' .
 28 -- inattr    = INSY ':' (IDENT | NUMBER) {',' (IDENT | NUMBER)} .
 29 -- outattr   = OUTSY ':' IDENT {',' IDENT} .
 30 -- semaction = SEMSY ( '(' IDENT ')' | {any} ) ENDSEMSY .
 31 -- macrodef  = SEMSY ':' IDENT ':' {any} ENDSEMSY .
 32 -- symbol    = IDENT | STRING .
 33 -- aliasname = ALIASSY symbol .
 34
 35
 36 SEMANTIC DECLARATIONS
 37 --=====================
 38
 39 FROM cocogen IMPORT Attrtype, CloseFile, Copy, EmitAction, GenAssign,
 40                     InsertFramePart, OpenFile, OpenSem, StartCopy;
 41 FROM cocogra IMPORT alts, rules, rootloc, ConcatLeft, ConcatRight,
 42                     GetNode, GraphList, Graphnode, NewNode, RepNode;
 43 FROM cocolex IMPORT typ, line, col, ddt, RestartHash, StopHash;
 44 FROM cocosym IMPORT gramspix, CompleteAt, Direction,
 45                     GetAt, GetMacroNr, GetSy, NewAt, NewMacro,
 46                     NewSy, RepSy, Symbolnode, Symboltype, SyNr;
 47 FROM Errors  IMPORT CompErr, Restriction, SemErr;
 48 FROM SYSTEM  IMPORT VAL;
 49
 50
 51 CONST
 52   null = 65535;                  -- null symbol
 53
 54 TYPE
 55   Usage = (def, check, use);
 56
 57 VAR
 58   -- symbol nodes
 59   sn: Symbolnode;                -- symbol node
 60   sy, sy1: CARDINAL;             -- symbol numbers
 61   rootsy: CARDINAL;              -- start symbol of grammar
 62   eofsy: CARDINAL;               -- endfile symbol (always Nr. 0)
 63
 64   -- graph nodes
 65   gn: Graphnode;                 -- graph node
 66   gp,gp1,gp2,gp3: CARDINAL;      -- ptr to start of graphs
 67   gl,gl1,gl2,gl3: CARDINAL;      -- ptr to right open ends of graphs
 68   dd,dd1,dd2: BOOLEAN;           -- is graph deletable ?
 69   gpo: CARDINAL;                 -- auxiliary ptr
 70   firstfact: BOOLEAN;            -- TRUE if first factor in term
 71   -- attribute processing
 72   kind: Usage;                   -- usage of attribute
 73   styp: Symboltype;              -- (eps,t,pr,nt,any,err)
 74   dir, dir1: Direction;          -- input/output attribute
 75   count: CARDINAL;               -- attribute counter
 76   n: CARDINAL;                   -- value of an attribute constant
 77   -- generation of semantic evaluator
 78   sem1,sem2,sem3: CARDINAL;      -- semantic actions
 79   firstsymbol: BOOLEAN;          -- current symbol the first in action
 80   -- various
 81   ok: BOOLEAN;                   -- error indicator
 82   spix, spix1: CARDINAL;         -- auxiliaries
 83   dummy: CARDINAL;
 84
 85
 86 -- SEMANTICSTACK           Stack to save semantic values
 87
 88 MODULE SEMANTICSTACK;
 89   IMPORT CompErr, Restriction;
 90   EXPORT Pop, Push;
 91   CONST maxstacksize = 70;
 92   VAR
 93     stack: ARRAY[1..maxstacksize] OF CARDINAL;
 94     sp: CARDINAL;
 95
 96   PROCEDURE Pop(): CARDINAL;
 97     VAR x: CARDINAL;
 98   BEGIN
 99     IF sp=0 THEN CompErr(6); ELSE x:=stack[sp]; DEC(sp); END;
100     RETURN x;
101   END Pop;
102
103   PROCEDURE Push(x:CARDINAL);
104   BEGIN
105     IF sp<maxstacksize
106     THEN INC(sp); stack[sp]:=x;
107     ELSE Restriction(14);
108     END;
109   END Push;
110
111 BEGIN
112   sp:=0;
113 END SEMANTICSTACK;
114
115
116 -- Error                   Report semantic error
117
118 PROCEDURE Error(nr: CARDINAL);
119 BEGIN SemErr(nr,line,col); END Error;
120
121
122 MACROS
123 --======
124
125 sem :AssignId1:
126   INC(count);
127   CASE kind OF
128     use:
129       IF styp=nt THEN
130         GetAt(!sy,!count,^spix1,^dir1);
131         IF spix1<>0 THEN
132           IF dir=dir1
133           THEN GenAssign(!nonterm,!spix1,!spix);
134           ELSE Error(8); END;
135         END;
136       END;
137   | check:
138       IF styp=nt THEN
139         GetAt(!sy,!count,^spix1,^dir1);
140         IF spix1<>0 THEN
141           IF spix<>spix1 THEN Error(9); END;
142           IF dir<>dir1 THEN Error(8); END;
143         END;
144       END;
145   | def:
146       NewAt(!sy,!spix,!dir);
147   END; -- CASE
148 endsem
149
150 sem :AssignId2:
151   INC(count);
152   CASE kind OF
153     use:
154       IF styp=t THEN
155         GenAssign(!term,!spix,!count);
156       ELSIF styp=nt THEN
157         GetAt(!sy,!count,^spix1,^dir1);
158         IF spix1<>0 THEN
159           IF dir=dir1
160           THEN GenAssign(!nonterm,!spix,!spix1)
161           ELSE Error(8);
162           END;
163         END;
164       END;
165   | check:
166       IF styp=nt THEN
167         GetAt(!sy,!count,^spix1,^dir1);
168         IF spix1<>0 THEN
169           IF spix<>spix1 THEN Error(9); END;
170           IF dir<>dir1 THEN Error(8); END;
171         END;
172       END;
173   | def:
174       NewAt(!sy,!spix,!dir);
175       IF styp=pr THEN
176         GenAssign(!term,!spix,!count);
177       END;
178   END; -- CASE
179 endsem
180
181 sem :AssignNumber:
182   INC(count);
183   IF kind=use
184   THEN
185     IF styp=nt THEN
186       GetAt(!sy,!count,^spix1,^dir1);
187       IF spix1<>0 THEN
188         IF dir=dir1
189         THEN GenAssign(!const,!spix1,!n);
190         ELSE Error(8);
191         END;
192       END;
193     END;
194   ELSE Error(10);
195   END;
196 endsem
197
198 sem :CheckAttr:
199   IF NOT CompleteAt(!sy,!count) THEN
200     Error(6);
201   END;
202 endsem
203
204 sem :Copy:
205   Copy(typ,col)
206 endsem
207
208 sem :InitCopy:
209   StartCopy(1)
210 endsem
211
212 sem :PopPointers:
213   firstfact:=VAL(BOOLEAN,Pop());
214   dd1:=VAL(BOOLEAN,Pop()); gl1:=Pop(); gp1:=Pop();
215   dd:=VAL(BOOLEAN,Pop()); gl:=Pop(); gp:=Pop();
216   gpo:=0
217 endsem
218
219 sem :PushPointers:
220   Push(!gp); Push(!gl); Push(!VAL(CARDINAL,dd));
221   Push(!gp1); Push(!gl1); Push(!VAL(CARDINAL,dd1));
222   Push(!VAL(CARDINAL,firstfact));
223 endsem
224
225 sem :StoreSymbol:
226   sy:=SyNr(!spix);
227   IF sy=null
228   THEN sy:=NewSy(spix,styp)
229   ELSE Error(1);
230   END;
231 endsem
232
233
234 TERMINALS
235 --=======
236
237 -- key words
238 ALIASSY        alias "ALIAS"           --  1: ALIAS
239 ANYSY          alias "any"             --  2: ANY, any
240 DECLARATIONSY  alias "DECLARATIONS"    --  3: DECLARATIONS
241 ENDGRAMSY      alias "ENDGRAM"         --  4: ENDGRAM
242 ENDSEMSY       alias "endsem"          --  5: ENDSEM
243 EPSSY          alias "eps"             --  6: EPS, eps
244 GRAMMARSY      alias "GRAMMAR"         --  7: GRAMMAR
245 INSY           alias "in"              --  8: IN, in
246 MACROSY        alias "MACROS"          --  9: MACROS
247 NONTERMINALSY  alias "NONTERMINALS"    -- 10: NONTERMINALS
248 OUTSY          alias "out"             -- 11: OUT, out
249 PRAGMASY       alias "PRAGMAS"         -- 12: PRAGMAS
250 RULESSY        alias "RULES"           -- 13: RULES
251 SEMSY          alias "sem"             -- 14: SEM, sem
252 SEMANTICSY     alias "SEMANTICS"       -- 15: SEMANTICS
253 TERMINALSY     alias "TERMINALS"       -- 16: TERMINALS
254
255 -- terminal classes
256 IDENT <out:spix> alias identifier      -- 17: name
257 STRING <out:spix>                      -- 18: string
258 NUMBER <out:n>                         -- 19: constant
259
260 -- single characters
    [lines 261 to 274 not legible in the scan]
275 nococosy <out:n>
276
277
278 NONTERMINALS
279 --==========
280 coco alias "correct grammar"                                     -- 35
281   -- recognizes the whole compiler description
282 expr <out:gp,gl,dd> alias expression                                -- 36
283   -- recognizes an expression and builds its TDG.
284   -- gp points to the root of the TDG
285   -- gl points to right open ends of the TDG
286   -- dd indicates if the TDG is deletable
287 term <out:gp1,gl1,dd1> alias alternative                            -- 37
288   -- recognizes an alternative and builds its TDG.
289   -- gp1 points to the root of the TDG
290   -- gl1 points to right open ends of the TDG
291   -- dd1 indicates if the TDG is deletable
292 fact <in:gpo,firstfact; out:gp2,gl2,dd2,gpo>                       -- 38
293   alias symbol
294   -- recognizes a component and builds its TDG.
295   -- gp2 points to the root of the TDG
296   -- gl2 points to right open ends of the TDG
297   -- dd2 indicates if the TDG is deletable
298   -- gpo points to the predecessor of fact or is 0
299   -- firstfact is TRUE, if fact is the first one in the term
300 attr <in:sy,styp,kind; out:sem1,sem2,count>                        -- 39
301   alias attribute
302   -- recognizes input/output attributes for the symbol sy
303   -- with type styp.
304   -- kind=def: used in declaration context
305   --   sem1=0, sem2=0 (except for pragmas)
306   -- kind=check: used on the left-hand side of rules
307   --   sem1=0, sem2=0
308   -- kind=use: used on the right-hand side of rules
309   --   sem1: sem.no. of input attribute evaluation
310   --   sem2: sem.no. of output attribute evaluation
311   -- count is the nr.of attributes in attr
312 inattr <in:sy,styp,kind,count; out:sem1,count>                    -- 40
313   alias "in-attribute"
314   -- recognizes input/output attributes for the symbol sy
315   -- with type styp (sy must be a nonterminal).
316   -- kind=def: used in declaration context
317   --   sem1=0.
318   -- kind=check: used on the left-hand side of rules
319   --   sem1=0.
320   -- kind=use: used on the right-hand side of rules
321   --   sem1: sem.no. of input attribute evaluation
322   -- count is the no.of attributes in inattr
323 outattr <in:sy,styp,kind,count; out:sem2,count>                   -- 41
324   alias "out-attribute"
325   -- recognizes input/output attributes for the symbol sy
326   -- with type styp.
327   -- kind=def: used in declaration context
328   --   sem2=0.
329   -- kind=check: used on the left-hand side of rules
330   --   sem2=0.
331   -- kind=use: used on the right-hand side of rules
332   --   sem2: sem.no. of output attribute evaluation
333   -- count is the no.of attributes in outattr
334 semaction <out:sem3> alias "semantic action"                        -- 42
335   -- recognizes a semantic action and generates a CASE block
336   -- in Semant. sem2 is the action number.
337 macrodef alias "semantic macro"                                     -- 43
338 symbol <out:spix>                                                   -- 44
339   -- recognizes a name or a string
340 aliasname <in:sy> alias "alias name"                                -- 45
341   -- recognizes a name which is used for the symbol sy in
342   -- syntax error messages in the generated compiler.
343
344
345 --=============================== grammar rules ===============================
346 RULES
347 coco =
348   GRAMMARSY
349   IDENT <out:gramspix>            sem rules:=0; alts:=0;
350                                       OpenFile(gramspix); StopHash;
351                                   endsem
352
353   [ SEMANTICSY DECLARATIONSY
354                                   sem (InitCopy) endsem
355     { any                         sem (Copy) endsem
356     }
357   ]                               sem RestartHash;
358                                       InsertFramePart; styp:=t;
359                                   endsem
360   [ MACROSY { macrodef } ]
361
362   TERMINALSY
363                                   sem eofsy:=NewSy(!0,!t) endsem
364   { symbol <out:spix>             sem (StoreSymbol) endsem
365     [ attr <in:sy,t,def; out:sem1,sem2,count> ]
366     [ aliasname <in:sy> ]
367   }
368   [ PRAGMASY                      sem styp:=pr endsem
369     { symbol <out:spix>           sem (StoreSymbol) endsem
370       [ attr <in:sy,pr,def; out:sem1,sem2,count>
371                                   sem GetSy(!sy,^sn); sn.sem1:=sem2;
372                                       RepSy(!sy,!sn);
373                                   endsem
374       ]
375       [ semaction <out:sem3>
376                                   sem GetSy(!sy,^sn); sn.sem2:=sem3;
377                                       RepSy(!sy,!sn);
378                                   endsem
379       ]
380     }
381   ]
382   NONTERMINALSY                   sem styp:=nt endsem
383   { IDENT <out:spix>              sem (StoreSymbol) endsem
384     [ attr <in:sy,nt,def; out:sem1,sem2,count> ]
385     [ aliasname <in:sy> ]
386   }                               sem rootsy:=SyNr(!gramspix);
387                                       IF rootsy=null THEN Error(2); END;
388                                   endsem
389   RULESSY
390   { IDENT <out:spix>              sem sy:=SyNr(!spix);
391                                       IF sy=null THEN
392                                         Error(3); sy:=NewSy(!spix,!err)
393                                       END;
394                                       GetSy(!sy,^sn);
395                                       IF (sn.typ<>nt) AND (sn.typ<>err) THEN
396                                         Error(4);
397                                       END;
398                                       IF sn.start<>0 THEN Error(5); END;
399                                       sy1:=sy; count:=0; styp:=sn.typ
400                                   endsem
401     [ attr <in:sy,styp,check; out:sem1,sem2,count> ]
402                                   sem (CheckAttr) endsem
403     '='
404     expr <out:gp,gl,dd>           sem GetSy(!sy1,^sn);
405                                       sn.start:=gp; sn.del:=dd;
406                                       RepSy(!sy1,!sn);
407                                       INC(rules);
408                                   endsem
409     '.'
410   }                               sem rootloc:=NewNode(!nt,!rootsy,!0);
411                                       gp1:=NewNode(!t,!eofsy,!0);
412                                       gl:=rootloc; gl1:=gp1;
413                                       ConcatRight(rootloc,gl,!gp1,!gl1)
414                                   endsem
415   ENDGRAMSY                       sem IF ddt["L"] THEN GraphList; END;
416                                       CloseFile;
417                                   endsem.
418
419 expr <out:gp,gl,dd> =
420   term <out:gp,gl,dd>             sem INC(alts); endsem
421   { '|'
422     term <out:gp1,gl1,dd1>        sem INC(alts);
423                                       ConcatLeft(gp,gl,!gp1,!gl1);
424                                       dd:=dd OR dd1
425                                   endsem
426   }.
427
428 term <out:gp1,gl1,dd1> =
429                                   sem gpo:=0 endsem
430   fact <in:gpo,TRUE; out:gp1,gl1,dd1,gpo>
431   { fact <in:gpo,FALSE; out:gp2,gl2,dd2,gpo>
432                                   sem IF gp2<>0 THEN
433                                         ConcatRight(gp1,gl1,!gp2,!gl2);
434                                         dd1:=dd1 AND dd2;
435                                       END;
436                                   endsem
437   }.
438
439 fact <in:gpo,firstfact; out:gp2,gl2,dd2,gpo> =
440   ( symbol <out:spix>             sem sy:=SyNr(!spix);
441                                       IF sy=null THEN
442                                         Error(3); sy:=NewSy(!spix,!err)
443                                       END;
444                                       GetSy(!sy,^sn);
445                                       IF sn.typ=pr THEN Error(16); END;
446                                       gp2:=NewNode(!sn.typ,!sy,!line);
447                                       gl2:=gp2; dd2:=FALSE; gpo:=gp2;
448                                       count:=0; styp:=sn.typ
449                                   endsem
450     [ attr <in:sy,styp,use; out:sem1,sem2,count>
451                                   sem GetNode(!gp2,^gn);
452                                       gn.sem1:=sem1; gn.sem2:=sem2;
453                                       RepNode(!gp2,!gn)
454                                   endsem
455     ]                             sem (CheckAttr) endsem
456   | EPSSY                         sem gp2:=NewNode(!eps,!0,!line);
457                                       gl2:=gp2; dd2:=TRUE; gpo:=gp2
458                                   endsem
459   | ANYSY                         sem gp2:=NewNode(!any,!0,!line);
460                                       gl2:=gp2; dd2:=FALSE; gpo:=gp2
461                                   endsem
462   | semaction <out:sem3>          sem IF gpo=0
463                                       THEN
464                                         gp2:=NewNode(!eps,!0,!line);
465                                         gl2:=gp2; dd2:=TRUE;
466                                         GetNode(!gp2,^gn); gn.sem3:=sem3;
467                                         RepNode(!gp2,!gn);
468                                       ELSE
469                                         GetNode(!gpo,^gn); gn.sem3:=sem3;
470                                         RepNode(gpo,gn);
471                                         gp2:=0; gl2:=0; gpo:=0
472                                       END;
473                                   endsem
474   | '('                           sem (PushPointers) endsem
475     expr <out:gp2,gl2,dd2>
476     ')'                           sem (PopPointers) endsem
477   | '['                           sem (PushPointers) endsem
478     expr <out:gp,gl,dd>           sem gp2:=NewNode(!eps,!0,!line);
479                                       gl2:=gp2;
480                                       ConcatLeft(gp,gl,!gp2,!gl2);
481                                       gp2:=gp; gl2:=gl; dd2:=TRUE;
482                                   endsem
483     ']'                           sem (PopPointers) endsem
484   | '{'                           sem (PushPointers) endsem
485     expr <out:gp,gl,dd>           sem gp2:=NewNode(!eps,!0,!line);
486                                       gl2:=gp2;
487                                       ConcatRight(gp,gl,!gp,!gl);
488                                       ConcatLeft(gp,gl,!gp2,!gl2);
489                                       gp2:=gp; dd2:=TRUE;
490                                       -- gl2 is link of eps node
491                                   endsem
492     '}'                           sem (PopPointers) endsem
493                                   sem IF firstfact THEN
494                                         gp3:=gp2; gl3:=gl2;
495                                         gp2:=NewNode(!eps,!0,!line); gl2:=gp2;
496                                         ConcatRight(gp2,gl2,!gp3,!gl3);
497                                       END;
498                                   endsem
499   ).
500
501 attr <in:sy,styp,kind; out:sem1,sem2,count> =
502   '<'                             sem sem1:=0; sem2:=0 endsem
503   ( inattr <in:sy,styp,kind,0; out:sem1,count>
504     [ ';' outattr <in:sy,styp,kind,count; out:sem2,count> ]
505   | outattr <in:sy,styp,kind,0; out:sem2,count>
506   )
507   '>'.
508
509 inattr <in:sy,styp,kind,count; out:sem1,count> =
510   INSY                            sem IF styp<>nt THEN Error(7); END;
511                                       dir:=down;
512                                   endsem
513   ':'
514   ( IDENT <out:spix>              sem (AssignId1) endsem
515   | NUMBER <out:n>                sem (AssignNumber) endsem
516   )
517   { ','
518     ( IDENT <out:spix>            sem (AssignId1) endsem
519     | NUMBER <out:n>              sem (AssignNumber) endsem
520     )}                            sem IF kind=use THEN
521                                         EmitAction(!line,^sem1);
522                                       END;
523                                   endsem.
524
525 outattr <in:sy,styp,kind,count; out:sem2,count> =
526   OUTSY                           sem dir:=up endsem
527   ':'
528   IDENT <out:spix>                sem (AssignId2) endsem
529   { ','
530     IDENT <out:spix>              sem (AssignId2) endsem
531   }                               sem IF (kind=use) OR (styp=pr) THEN
532                                         EmitAction(!line,^sem2);
533                                       END;
534                                   endsem.
535
536 semaction <out:sem3> =
537   SEMSY                           sem StopHash; firstsymbol:=TRUE endsem
538   ( '('                           sem RestartHash endsem
539     IDENT <out:spix>              sem GetMacroNr(!spix,^sem3);
540                                       IF sem3=0 THEN Error(12); END;
541                                   endsem
542     ')'
543   | { any                         sem IF firstsymbol THEN
544                                         firstsymbol:=FALSE;
545                                         OpenSem(!line,^sem3); StartCopy(!col)
546                                       END;
547                                       Copy(!typ,!col)
548                                   endsem
549     }
550   )
551   ENDSEMSY                        sem RestartHash; endsem.
552
553 macrodef =
554   SEMSY
555   ':'
556   IDENT <out:spix>                sem OpenSem(!line,^sem3);
557                                       NewMacro(!spix,!sem3,^ok);
558                                       IF NOT ok THEN Error(11); END;
559                                       StopHash; firstsymbol:=TRUE;
560                                   endsem
561   ':'
562   { any                           sem IF firstsymbol THEN
563                                         firstsymbol:=FALSE; StartCopy(col)
564                                       END;
565                                       Copy(!typ,!col)
566                                   endsem
567   }
568   ENDSEMSY                        sem RestartHash endsem.
569
570 symbol <out:spix> =
571   ( IDENT <out:spix> | STRING <out:spix> ).
572
573 aliasname <in:sy> =
574   ALIASSY
575   symbol <out:spix>               sem GetSy(!sy,^sn); sn.aliasspix:=spix;
576                                       RepSy(!sy,!sn);
577                                   endsem.
578
579 ENDGRAM

alias           238 239 240 241 242 243 244 245 246 247 248 249
                250 251 252 253 256 280 282 287 293 301 313 324
                334 337 340
aliasname        13  15  33 340 366 385 573
aliasspix       575
ALIASSY          33 238 574
alts             41 349 420 422
any              11  30  31  73 239 355 459 543 562
ANYSY            22 239 459
AssignId1       125 514 518
AssignId2 150 528 530
AssignNumber 181 515 519
attr 13 14 15 16 20 27 300 311 365 370 384 401
450 501
attributes 302 311 314 322 325 333
Attrtype 39
CheckAttr 198 402 455
CloseFile 39 416
coco 8 10 280 347
cocogen 39
cocogra 41
cocolex 43
cocosym 44
col 43 119 205 545 547 563 565
CompErr 47 89 99
CompleteAt 44 199
ConcatLeft 41 423 480 488
ConcatRight 41 413 433 487 496
const 189
Copy 39 204 205 355 547 565
count 75 126 130 139 151 155 157 167 176 182 186 199
300 311 312 312 322 323 323 333 365 370 384 399
401 448 450 501 503 504 504 505 509 509 525 525
dd 68 215 220 282 286 404 405 419 420 424 424 478
485
dd1 68 214 221 287 291 422 424 428 430 434 434
dd2 68 292 297 431 434 439 447 457 460 465 475 481
489
ddt 43 415
DECLARATIONSY 11 240 353
def 55 145 173 304 316 327 365 370 384
del 405
dir 74 132 142 146 159 170 174 188 511 526
dir1 74 130 132 139 142 157 159 167 170 186 188
Direction 44 74
down 511
dummy 83
EmitAction 39 521 532
ENDGRAMSY 17 241 415
ENDSEMSY 30 242 551 568
eofsy 62 363 411
eps 73 243 456 464 478 485 490 495
EPSSY 21 243 456
err 73 392 395 442
Error 116 118 119 134 141 142 161 169 170 190 194 200
229 387 392 396 398 442 445 510 540 558
Errors 47
expr 16 18 24 25 26 282 404 419 475 478 485
fact 19 19 20 292 298 299 430 431 439
firstfact 70 213 222 292 299 439 493
firstsymbol 79 537 543 544 559 562 563
GenAssign 39 133 155 160 176 189
GetAt 45 130 139 157 167 186
GetMacroNr 45 539
GetNode 42 451 466 469
GetSy 45 371 376 394 404 444 575
gl 67 215 220 282 285 404 412 413 419 420 423 478
480 481 485 487 487 488
gl1 67 214 221 287 290 412 413 422 423 428 430 433
gl2 67 292 296 431 433 439 447 457 460 465 471 475
[Cross-reference entries gl3 to RestartHash: line numbers not legible in the scan. Identifiers covered: gl3, gn, gp, gp1, gp2, gp3, gpo, GRAMMARSY, gramspix, GraphList, Graphnode, IDENT, inattr, InitCopy, InsertFramePart, INSY, kind, line, link, macrodef, MACROSY, maxstacksize, n, name, NewAt, NewMacro, NewNode, NewSy, nococosy, nodes, nonterm, NONTERMINALSY, nr, nt, null, NUMBER, ok, OpenFile, OpenSem, outattr, OUTSY, Pop, PopPointers, pr, PRAGMASY, Push, PushPointers, RepNode, RepSy, RestartHash.]
Restriction 47 89 107
root 284 289 295
rootloc 41 410 412 413
rootsy 61 386 387 410
rules 41 306 308 318 320 329 331 345 349 407
RULESSY 16 250 389
sem1 78 300 305 307 309 312 317 319 321 365 370 371
384 401 450 452 452 501 502 503 509 521
sem2 78 300 305 307 310 323 328 330 332 336 365 370
371 376 384 401 450 452 452 501 502 504 505 525
532
sem3 78 334 375 376 462 466 466 469 469 536 539 540
545 556 557
semaction 14 23 30 334 375 462 536
Semant 336
SEMANTICSTACK 86 88 113
SEMANTICSY 11 252 353
SemErr 47 119
SEMSY 30 31 251 537 554
sn 59 371 371 372 376 376 377 394 395 395 398 399
404 405 405 406 444 445 446 448 575 575 576
sp 94 99 99 99 105 106 106 112
spix 82 133 141 146 155 160 169 174 176 226 228 256
257 338 364 369 383 390 390 392 440 440 442 514
518 528 530 539 539 556 557 570 571 571 575 575
spix1 82 130 131 133 139 140 141 157 158 160 167 168
169 186 187 189
stack 93 99 106
StartCopy 40 209 545 563
StopHash 43 350 537 559
StoreSymbol 225 364 369 383
STRING 32 257 571
string 257 339
styp 73 129 138 154 156 166 175 185 228 300 303 312
315 323 326 358 368 382 399 401 448 450 501 503
504 505 509 510 525 531
sy 60 130 139 146 157 167 174 186 199 226 227 228
300 302 312 314 315 323 325 340 341 365 366 370
371 372 376 377 384 385 390 391 392 394 399 401
440 441 442 444 446 450 501 503 504 505 509 525
573 575 576
[Cross-reference entries sy1 to x: line numbers not legible in the scan. Identifiers covered: sy1, symbol, Symbolnode, Symboltype, SyNr, SYSTEM, t, term, TERMINALSY, typ, type, up, Usage, use, VAL, x.]
  1 (* Coco                 Compiler compiler Coco                  Moe 27.12.83
  2
  3    This is the main module of Coco, a compiler compiler.
  4    It controls the execution of the compiler compiler. It
  5      a) opens and closes the files
  6      b) initializes the scanner
  7      c) calls the parser
  8      d) calls the procedures which collect the symbol sets
  9      e) calls the grammar test procedures
 10      f) calls the procedure which generates the compiler
 11      g) calls the lister to print a listing with error messages
 12
 13    Implementation restrictions:
 14      1: cocolex   Hash          Hash table full
 15      2: cocolex   Hash          Name list full
 16      3: cocolex   PushInc       Include stack overflow
 17      4: cocolex   EnQueue       Attribute queue overflow
 18      5: cocogra   NewNode       Too many nodes in TDG (>600)
 19      6: cocosym   NewSy         Symbol list overflow (>199)
 20      7: cocosym   NewSy         Too many terminals (>127)
 21
 22    Compiler errors:
 23      1: cocolex   PopInc        Include stack underflow
 24      2: cocolex   DeQueue       Attribute queue underflow
 25      3: cocosym   GetAt         Try to get attribute inf. for a terminal
 26      4: cocogen   OpenFile      Semantic frame not found
 27      5: cocogen2  GenSynFiles   Parser frame not found
 28      6: cocogen2  NewAdr        Fixups already resolved
 29
 30    Trace switches: can be set by "$D letter {letter}" (without spaces)
 31      A: cocosyn                      Print parser input (remove comments!)
 32      B: cocosyn                      Trace parser run (remove comments!)
 33      C: cocogra  DelGraph            Print visited nodes
 34      D: cocotst  FindCircularRules   Print derivations between single nt's
 35      E: cocotst  TestIfNtToTerm      Trace flow of algorithm
 36      F: cocotst  CheckAlternatives   Print visited nodes
 37      G: cocosym  CollectFirstSet     Print visited nodes
 38      H: cocosym  GetFirstSet         Print resulting set
 39      I: cocosym  GetFollowSets       Print resulting sets
 40      J: cocosym  CollectFollowSets   Print visited nodes
 41      K: cocosym                      Print sets of term.starts and succ.s
 42      L: cocosem                      Print generated TDG
 43 *)
 44 MODULE Coco;
 45
 46 FROM cocogen  IMPORT filesopen, CloseFile;
 47 FROM cocogen2 IMPORT GenSynFiles, PutStatistics;
 48 FROM cocogra  IMPORT DeleteRedundantEps, NewEpsBeforeDelNts;
 49 FROM cocolex  IMPORT ddt, src;
 50 FROM cocolst  IMPORT lst, PrintListing;
 51 FROM cocosym  IMPORT FindDelSymbols, GetSymbolSets;
 52 FROM cocosyn  IMPORT Parse, printinput, printnodes;
 53 FROM cocotst  IMPORT FindCircularRules, LL1Test, TestCompleteness,
 54                      TestIfAllNtReached, TestIfNtToTerm;
 55 FROM Errors   IMPORT GetNumberOfErrors;
 56 FROM FileIO   IMPORT con, File, Done, Open, Close, Read, WriteInt,
 57                      WriteLn, WriteString;
 58 FROM System   IMPORT Terminate, normal;
 59
 60
 61 VAR
 62   ch: CHAR;
 63   correct: BOOLEAN;
 64   ll1: BOOLEAN;                    (*TRUE if grammar is LL(1)*)
 65   lstn: ARRAY[0..63] OF CHAR;      (*list file name*)
 66   ok: BOOLEAN;
 67   semerrors: CARDINAL;
 68   synerrors: CARDINAL;
 69
 70
 71 (* ChangeExtension          Change extension of file name
 72 *)
 73 PROCEDURE ChangeExtension(VAR old,new:ARRAY OF CHAR; ext:ARRAY OF CHAR);
 74   VAR i,j: INTEGER;
 75 BEGIN
 76   i:=0;
 77   WHILE (i<=HIGH(old)) AND (old[i]<>0C) DO i:=i+1; END;
 78   WHILE (i>=0) AND (old[i]<=" ") DO DEC(i) END;
 79   j:=i;
 80   WHILE (j>=0) AND (old[j]<>".") DO DEC(j) END;
 81   IF j>=0 THEN i:=j-1; END;
 82   FOR j:=0 TO i DO new[j]:=old[j]; END;
 83   new[i+1]:="."; new[i+2]:=ext[0]; new[i+3]:=ext[1];
 84   new[i+4]:=ext[2]; new[i+5]:=0C;
 85 END ChangeExtension;
 86
 87
 88 BEGIN
 89   WriteString(con,"Coco - Compiler Compiler Vs 4.1$");
 90   Open(src,0,"",FALSE);
 91   IF NOT Done THEN Terminate(normal) END; (*cancel*)
 92   ChangeExtension(src^.name,lstn,"LST");
 93   Open(lst,src^.volRef,lstn,TRUE);
 94   WriteString(lst,"Coco - Compiler Compiler Vs 4.1 ");
 95   WriteString(lst,"(Source file: "); WriteString(lst,src^.name);
 96   WriteString(lst,")$$");
 97
 98   WriteString(con,"parsing");
 99   Parse(correct);                          (*parse input grammar*)
100   GetNumberOfErrors(synerrors,semerrors);  (*check for errors*)
101   IF synerrors+semerrors<>0 THEN
102     IF filesopen THEN CloseFile END;
103     WriteString(con,"$listing");
104     PrintListing;
105     WriteString(con,"$Compilation terminated. ");
106     WriteInt(con,synerrors+semerrors, 0);
107     WriteString(con," errors detected. Press any key.$");
108     Close(src); Close(lst);
109     Read(con,ch); Terminate(normal);
110   END;
111
112   WriteString(con,"$evaluating$");
113   FindDelSymbols;
114   NewEpsBeforeDelNts;
115   DeleteRedundantEps;
116   GetSymbolSets;
117   TestCompleteness(ok);
118   IF ok THEN TestIfAllNtReached(ok); END;
119   IF ok THEN FindCircularRules(ok); END;
120   IF ok THEN TestIfNtToTerm(ok); END;
121   IF ok THEN LL1Test(ll1); END;
122   IF NOT ok OR NOT ll1 THEN
123     WriteString(con,"listing$");
124     WriteLn(lst); WriteLn(lst); PrintListing;
125   END;
126
127   IF ok THEN
128     WriteString(con,"writing$");
129     GenSynFiles;
130     PutStatistics;
131   END;
132   IF NOT ok THEN
133     WriteString(con,"Compilation ended with errors in grammar tests.");
134   ELSIF NOT ll1 THEN
135     WriteString(con,"Compilation ended with LL(1) errors.");
136   ELSE
137     WriteString(con,"Compilation completed. No errors detected.");
138   END;
139   Close(src); Close(lst);
140   WriteString(con," Press any key.$"); Read(con,ch);
141 END Coco.
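ChangeExtension (lines 73 to 85) derives the name of the list file from the name of the source file: it trims trailing blanks, drops everything from the last dot onwards, and appends a dot and three characters of ext. The same steps are shown below in Python as an illustrative sketch. change_extension is a hypothetical name, and because Python strings need no 0C terminator, the first loop of the original is replaced by starting at the last character.

```python
def change_extension(old: str, ext: str) -> str:
    """Replace the extension of a file name, following the steps of
    ChangeExtension in coco.MOD."""
    i = len(old) - 1
    while i >= 0 and old[i] <= " ":      # skip trailing blanks/control chars
        i -= 1
    j = i
    while j >= 0 and old[j] != ".":      # look for the last dot
        j -= 1
    if j >= 0:
        i = j - 1                        # keep only the part before the dot
    return old[: i + 1] + "." + ext[:3]  # the original copies ext[0..2]

print(change_extension("coco.ATG", "LST"))   # coco.LST
```

One edge case deserves attention in the Modula-2 version. If old contains no 0C terminator, the first loop ends with i = HIGH(old)+1, and the second loop then reads old[i] outside the array bounds. The Python sketch avoids this because its strings carry their own length.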
C 77 84
ch 62 109 140
ChangeExtension 73 85 92
Close 56 108 108 139 139
CloseFile 46 102
Coco 44 141
cocogen 46
cocogen2 47
cocogra 48
cocolex 49
cocolst 50
cocosym 51
cocosyn 52
cocotst 53
con 56 89 98 103 105 106 107 109 112 123 128 133
135 137 140 140
correct 63 99
ddt 49
DeleteRedundantEps 48 115
Done 56 91
Errors 55
ext 73 83 83 84
File 56
F*lelO 56
filesopen 46 102
FlndCircularRules 53 119
FindDelSymbols 51 113
GenSynFiles 47 129
GetNumberOfErrors 55 100
GetSymbolSets 51 116
HIGH
1
J
74
83
74
76
83
79
77
83
80
77
84
80
77
'84
80
77 78 78 78 79 81 82
81 81 82 82 82
244
Program listings
*!*>.$?
111 64 121 122 134
LLlTest 53 121
1st 50 93 94 95 95 96 108 124 124 139
lstn 65 92 93
name 92 95
new 73 82 83 83 83 84 84
NewEpsBeforeDelNts 48 114
normal 58 91 109
ok 66 117 118 118 119 119 120 120 121 122 127 132
old 73 77 77 78 80 82
Open 56 90 93
Parse 52 99
printinput 52
PrintListing 50 104 124
printnodes 52
PutStatistics 47 130
Read 56 109 140
semerrors 67 100 101 106
src 49 90 92 93 95 108 139
synerrors 68 100 101 106
System 58
Terminate 58 91 109
TestCompleteness 53 117
TestIfAllNtReached 54 118
TestIfNtToTerm 54 120
volRef 93
WriteInt 56 106
WriteLn 57 124 124
WriteString 57 89 94 95 95 96 98 103 105 107 112 123
128 133 135 137 140
1 (* cocogen     Generator of compiler files     Moe 28.12.83
2    This module generates the semantic evaluator. It
3    a) copies symbols from the input grammar to the evaluator
4    b) copies text from the semantic frame to the evaluator
5    c) stores attribute assignments (and emits them as semantic actions)
6
7 *)
8 DEFINITION MODULE cocogen;
9
10 FROM FileIO IMPORT File;
11
12 TYPE
13 Attrtype = (term, nonterm, const);
14
15 VAR
16 maxsem: CARDINAL; (*number of last semantic action*)
17 filesopen: BOOLEAN; (*files may remain open after a syntax error*)
18
19 PROCEDURE CloseFile;
20 (* Closes the file where the semantic evaluator is written to*)
21
22 PROCEDURE Copy (typ, col:CARDINAL);
23 (* Copies the source symbol typ at column col to the generated
24 semantic file*)
25
26 PROCEDURE CopyFramePart (VAR f1,f2:File; s:ARRAY OF CHAR);
27 (* Copies file f1 to file f2 until string s occurs; s is not copied*)
28
29 PROCEDURE EmitAction(line:CARDINAL; VAR sem:CARDINAL);
30 (* Emits the stored attribute assignments as a semantic action, line
31 is used to print a comment, sem is the number of the new action*)
32
33 PROCEDURE GenAssign (typ: Attrtype; left, right:CARDINAL);
34 (* Generates an assignment arg(left)<-arg(right). typ indicates if
35 arg(right) is a terminal attribute, a nonterminal attribute or
36 a constant*)
37
38 PROCEDURE InsertFramePart;
39 (* inserts the middle part in the generated semantics file*)
40
41 PROCEDURE OpenFile(spix:CARDINAL);
42 (* opens the file where the semantic evaluator is written to. spix is
43 the grammar name in Cocol. The name of the generated file is the
44 grammar name with the suffix "sem"*)
45
46 PROCEDURE OpenSem(line:CARDINAL; VAR sem:CARDINAL);
47 (* Prints the start of a new semantic action (case-number of a new
48 case-block). line is used to print a comment, sem is the number of
49 the new action*)
50
51 PROCEDURE StartCopy(col:CARDINAL);
52 (* Saves col as the leftmost column in the following semantic action*)
53
54 END cocogen.
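CopyFramePart copies f1 to f2 up to a marker string and does not copy the marker. OpenFile relies on this to splice the grammar name into a prepared frame file. The Python sketch below shows that contract on in-memory strings. It is a simplified illustration, not a transcription of the Modula-2 procedure: `str.find` replaces the character-by-character scan, and the function name, frame text and markers in the usage lines are hypothetical.

```python
def copy_frame_part(frame: str, pos: int, marker: str, out: list) -> int:
    """Append frame[pos:] to out up to (not including) the next occurrence
    of marker. Return the position just after the marker, or len(frame)
    if the marker does not occur (the rest of the frame is then copied)."""
    idx = frame.find(marker, pos)
    if idx < 0:
        out.append(frame[pos:])
        return len(frame)
    out.append(frame[pos:idx])
    return idx + len(marker)


# Hypothetical frame with two name markers and the end marker "$$$".
frame = "DEFINITION MODULE -->modulename;\nEND -->modulename.\n$$$"
out = []
pos = copy_frame_part(frame, 0, "-->modulename", out)
out.append("Gsem")                      # grammar name + "sem", as in OpenFile
pos = copy_frame_part(frame, pos, "-->modulename", out)
out.append("Gsem")
pos = copy_frame_part(frame, pos, "$$$", out)
print("".join(out))
```

Each call resumes where the previous marker ended, so a frame is filled by alternating copy calls with writes of the generated text.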
1 (* cocogen     Generation of semantic evaluator     Moe 30.12.83
2    This module generates the semantic evaluator. It
3    a) copies symbols from the input grammar to the evaluator
4    b) copies text from the semantic frame to the evaluator
5    c) stores attribute assignments (and emits them as semantic actions)
6
7 *)
8 IMPLEMENTATION MODULE cocogen;
9
10 FROM cocolex IMPORT at, line, col, src, GetName;
11 FROM Errors IMPORT CompErr, SemErr;
12 FROM FileIO IMPORT con, File, Done, Open, Close, Read, Write,
13 WriteCard, WriteLn, WriteString, WriteText;
14 FROM System IMPORT Allocate, Deallocate;
15
16 CONST
17 blanks =
18
19 ident = 17; (*symbol numbers*)
20 string = 18;
21 number = 19;
22 lparsy = 23;
23 commasy = 33;
24 eolsy = 255;
25
26 TYPE
27 Actionptr = POINTER TO Action;
28 Assignmentptr = POINTER TO Assignment;
29 Action = RECORD (*information about attr.eval. action*)
30 sem: CARDINAL; (*action number*)
31 firstass: Assignmentptr; (*to first assignment*)
32 next: Actionptr; (*to next action*)
33 END;
34 Assignment = RECORD (*information about an attr. assignment*)
35 typ: Attrtype; (*term,nonterm,const*)
36 left: CARDINAL; (*spix of left-hand side*)
37 right: CARDINAL; (*spix or val of right-hand side*)
38 next: Assignmentptr; (*to next assignment*)
39 END;
40 Name = ARRAY[1..80] OF CHAR;
41
42 VAR
43 firstact: Actionptr; (*first generated action*)
44 firstass: Assignmentptr; (*first stored assignment*)
45 fram: File; (*file with frame of sem.Analyzer*)
46 gram: Name; (*grammar name*)
47 graml: CARDINAL; (*length of grammar name*)
48 lastact: Actionptr; (*last generated action*)
49 lastass: Assignmentptr; (*last stored assignment*)
50 lastcol: CARDINAL; (*column of last symbol*)
51 lasttyp: CARDINAL; (*type of last symbol*)
52 leftcol: CARDINAL; (*leftmost column in semantic action*)
53 margin: CARDINAL; (*indent from left margin*)
54 op: ARRAY[0..commasy] OF CHAR; (*operator table*)
55 sem: File; (*file containing sem.evaluator*)
56 semname: Name; (*file name of sem.evaluator*)
57
58
59 PROCEDURE EmitAssign(p:Assignmentptr); FORWARD;
60
61
62 (* CloseFile   Close file containing the semantic evaluator
63 *)
64 PROCEDURE CloseFile;
65 BEGIN
66 CopyFramePart(fram,sem,"-->modulename");
67 WriteText(sem,gram,graml); WriteString(sem,"sem");
68 CopyFramePart(fram,sem,"$$$");
69 Close(fram); Close(sem);
70 filesopen:=FALSE;
71 END CloseFile;
72
73
74 (* Copy   Copy source symbol to semantic evaluator
75 *)
76 PROCEDURE Copy (typ,col:CARDINAL) ;
77 VAR
78 ch: CHAR;
79 l,i: CARDINAL;
80 name: Name;
81 BEGIN
82 IF col<=lastcol THEN (*new line*)
83 WriteLn(sem);
84 WriteText(sem,blanks,margin);
85 IF col>leftcol THEN
86 WriteText(sem,blanks,col-leftcol);
87 END;
88 lasttyp:=eolsy;
89 END;
90 IF (typ<=number) AND (lasttyp<=number) THEN
91 Write (sem," ");
92 END;
93 CASE typ OF
94 |1: WriteString(sem,"alias");
95 | 2: WriteString(sem,"any");
96 | 3: WriteString(sem,"DECLARATIONS");
97 | 4: WriteString(sem,"ENDGRAM");
98 | 5: WriteString(sem,"endsem");
99 | 6: WriteString(sem,"eps");
100 | 7: WriteString(sem,"GRAMMAR");
101 | 8: WriteString(sem,"IN");
102 | 9: WriteString(sem,"MACROS");
103 | 10: WriteString(sem,"NONTERMINALS");
104 | 11: WriteString(sem,"out");
105 | 12: WriteString(sem,"PRAGMAS");
106 | 13: WriteString(sem,"RULES");
107 | 14: WriteString(sem,"sem");
108 | 15: WriteString(sem,"SEMANTICS");
109 | 16: WriteString(sem,"TERMINALS");
110 | 17,18: (*ident, string*)
111 GetName(at[1],name,l); WriteText(sem,name,l);
112 | 19: WriteCard(sem,at[1],0);
113 | 20..33: (*operators*)
114 Write(sem,op[typ]);
115 | 34: ch:=CHR(at[1]);
116 IF (ch="!") OR ((ch="^") AND (lasttyp<>ident))
117 THEN;
118 ELSE Write(sem,ch);
119 END;
120 END; (*CASE*)
121 lasttyp:=typ; lastcol:=col;
122 END Copy;
123
124
125 (* CopyFramePart   Copies file f1 to file f2 until string s occurs
126 *)
127 PROCEDURE CopyFramePart(VAR f1,f2:File; s:ARRAY OF CHAR);
128 VAR
129 ch,startch: CHAR;
130 i: INTEGER;
131 t: ARRAY[0..50] OF CHAR;
132 BEGIN
133 startch:=s[0]; Read(f1,ch);
134 WHILE NOT f1^.eof DO
135 IF ch=startch
136 THEN (*check if s occurs*)
137 i:=0;
138 WHILE (i<HIGH(s)) AND (ch=s[i]) AND NOT f1^.eof DO
139 t[i]:=ch; INC(i); Read(f1,ch);
140 END;
141 IF ch=s[i] THEN RETURN; END; (*found - exit*)
142 WriteText(f2,t,i); (*not found- continue*)
143 Write(f2,ch);
144 ELSE Write(f2,ch); (*normal character - write it*)
145 END;
146 Read(f1,ch);
147 END; (*WHILE*)
148 END CopyFramePart;
149
150
151 (* EmitAction Emit stored semantic action
152 *)
153 PROCEDURE EmitAction (line:CARDINAL; VAR sem:CARDINAL);
154 VAR
155 act,p: Actionptr;
156 q: Assignmentptr;
157
158 PROCEDURE EqualAct(p1,p2: Assignmentptr): BOOLEAN;
159 BEGIN
160 WHILE (p1<>NIL) AND (p2<>NIL) AND (p1^.typ=p2^.typ) AND
161 (p1^.left=p2^.left) AND (p1^.right=p2^.right) DO
162 p1:=p1^.next; p2:=p2^.next;
163 END;
164 RETURN (p1=NIL) AND (p2=NIL);
165 END EqualAct;
166
167 BEGIN
168 IF firstass=NIL
169 THEN sem:=0;
170 ELSE
171 p:=firstact;
172 WHILE (p<>NIL) AND NOT EqualAct(p^.firstass,firstass) DO
173 p:=p^.next;
174 END;
175 IF p=NIL
176 THEN (*new action*)
177 OpenSem(line,sem); EmitAssign(firstass);
178 Allocate(act,SIZE(Action));
179 act^.sem:=sem; act^.firstass:=firstass; act^.next:=NIL;
180 IF firstact=NIL
181 THEN firstact:=act
182 ELSE lastact^.next:=act
183 END;
184 lastact:=act;
185 ELSE (*same action found; delete recently stored assignments*)
186 sem:=p^.sem;
187 WHILE firstass<>NIL DO
188 q:=firstass; firstass:=firstass^.next; Deallocate(q);
189 END;
190 END;
191 END;
192 firstass:=NIL;
193 END EmitAction;
194
195
196 (* EmitAssign Write attribute assignment
197 *)
198 PROCEDURE EmitAssign (p:Assignmentptr);
199 VAR
200 l: CARDINAL;
201 name: Name;
202 BEGIN
203 WHILE p<>NIL DO
204 WriteLn(sem); WriteText(sem,blanks,margin);
205 GetName(p^.left,name,l);
206 CASE p^.typ OF
207 term:
208 WriteString(sem,"ASSIGN("); WriteText(sem,name,l);
209 WriteString(sem,",at["); WriteCard(sem,p^.right,0);
210 WriteString(sem,"]);");
211 | nonterm:
212 WriteText(sem,name,l); WriteString(sem,":=");
213 GetName(p^.right,name,l);
214 WriteText(sem,name,l); Write(sem,";");
215 | const:
216 WriteText(sem,name,l); WriteString(sem,":=");
217 WriteCard(sem,p^.right,0); Write(sem,";");
218 END; (*CASE*)
219 p:=p^.next;
220 END; (*WHILE*)
221 END EmitAssign;
222
223
224 (* GenAssign Store attribute assignment
225 *)
226 PROCEDURE GenAssign (t:Attrtype; l,r:CARDINAL);
227 VAR ass: Assignmentptr;
228 BEGIN
229 IF (t=nonterm) AND (l=r) THEN RETURN; END;
230 Allocate(ass,SIZE(Assignment));
231 WITH ass^ DO typ:=t; left:=l; right:=r; next:=NIL; END;
232 IF firstass=NIL THEN firstass:=ass; ELSE lastass^.next:=ass; END;
233 lastass:=ass;
234 END GenAssign;
235
236
237 (* InsertFramePart Insert middle part of semantic evaluator
238 *)
239 PROCEDURE InsertFramePart;
240 BEGIN
241 CopyFramePart(fram,sem,"-->actions");
242 margin:=9;
243 END InsertFramePart;
244
245
246 (* OpenFile Open file for semantic evaluator
247 *)
248 PROCEDURE OpenFile(spix:CARDINAL);
249 VAR i,l: CARDINAL;
250 BEGIN
251 GetName(spix,gram,l); graml:=l;
252 FOR i:=1 TO graml DO semname[i]:=gram[i]; END;
253 semname[l+1]:="s"; semname[l+2]:="e"; semname[l+3]:="m";
254 semname[l+4]:="."; semname[l+5]:="D"; semname[l+6]:="E";
255 semname[l+7]:="F"; semname[l+8]:=0C;
256
257 Open(sem,src^.volRef,semname,TRUE); (*definition module*)
258 Open(fram,src^.volRef,"cocosemframe",FALSE);
259 IF NOT Done THEN
260 SemErr(25,line,col);
261 WriteString(con,"The file 'cocosemframe' must be in the same ");
262 WriteString(con,"subdirectory as the input grammar.$Aborted.$");
263 CompErr(4)
264 END;
265 CopyFramePart(fram,sem,"-->modulename");
266 WriteText(sem,gram,graml); WriteString(sem,"sem");
267 CopyFramePart(fram,sem,"-->modulename");
268 WriteText(sem,gram,graml); WriteString(sem,"sem");
269 CopyFramePart(fram,sem,"-->implementation");
270 Close(sem);
271
272 semname[l+5]:="M"; semname[l+6]:="O"; semname[l+7]:="D";
273 Open(sem,src^.volRef,semname,TRUE); (*implementation module*)
274 CopyFramePart(fram,sem,"-->modulename");
275 WriteText(sem,gram,graml); WriteString(sem,"sem");
276 CopyFramePart(fram,sem,"-->scannername");
277 WriteText(sem,gram,graml); WriteString(sem,"lex");
278 CopyFramePart(fram,sem,"-->declarations");
279 filesopen:=TRUE;
280 END OpenFile;
281
282
283 (* OpenSem Write start of new semantic action
284 *)
285 PROCEDURE OpenSem(line:CARDINAL; VAR nr:CARDINAL);
286 BEGIN
287 INC(maxsem); nr:=maxsem;
288 WriteString(sem,"$ | "); WriteCard(sem,maxsem,3);
289 WriteString(sem,": (*line "); WriteCard(sem,line,0);
290 WriteString(sem, "*)") ;
291 END OpenSem;
292
293
294 (* StartCopy Set leftmost column in semantic action
295 *)
296 PROCEDURE StartCopy(col:CARDINAL);
297 BEGIN leftcol:=col; lasttyp:=eolsy; lastcol:=99; END StartCopy;
298
299 BEGIN (*cocogen*)
300 BEGin 12345678901234567890*)
30i .=» =.K)t3 {}<>;:,"; (*"»" must start at pos. 20*)
303 maxsem:=11; margin:=0; firstact:=NIL; firstass:=NIL; filesopen:=FALSE;
304 END cocogen.
act 155 178 179 179 179 181 182 184
Action 27 29 178
Actionptr 27 32 43 48 155
Allocate 14 178 230
ass 227 230 231 232 232 233
Assignment 28 34 230
Assignmentptr 28 31 38 44 49 59 156 158 198 227
at 10 111 112 115
Attrtype 35 226
blanks 17 84 86 204
C 255
ch 78 115 116 116 118 129 133 135 138 139 139 141
143 144 146
Close 12 69 69 270
CloseFile 64 71
cocogen 8 304
cocolex 10
col 10 76 82 85 86 121 260 296 297
commasy 23 54
CompErr 11 263
con 12 261 262
const 215
Copy 76 122
CopyFramePart 66 68 127 148 241 265 267 269 274 276 278
Deallocate 14 188
Done 12 259
EmitAction 153 193
EmitAssign 59 177 198 221
eof 134 138
eolsy 24 88 297
EqualAct 158 165 172
Errors 11
f1 127 133 134 138 139 146
f2 127 142 143 144
File 12 45 55 127
FileIO 12
filesopen 70 279 303
firstact 43 171 180 181 303
firstass 31 44 168 172 172 177 179 179 187 188 188 188
192 232 232 303
FORWARD 59
fram 45 66 68 69 241 258 265 267 269 274 276 278
GenAssign 226 234
GetName 10 111 205 213 251
gram 46 67 251 252 266 268 275 277
graml 47 67 251 252 266 268 275 277
HIGH 138
i 79 130 137 138 138 139 139 141 142 249 252 252
252
ident 19 116
InsertFramePart 239 243
l 79 111 111 200 205 208 212 213 214 216 226 229
231 249 251 251 253 253 253 254 254 254 255 255
272 272 272
lastact 48 182 184
lastass 49 232 233
lastcol 50 82 121 297
lasttyp 51 88 90 116 121 297
left 36 161 161 205 231
leftcol 52 85 86 297
line 10 153 177 260 285 289
lparsy 22
margin 53 84 204 242 303
maxsem 287 287 288 303
name 80 111 111 201 205 208 212 213 214 216
Name 40 46 56 80 201
next 32 38 162 162 173 179 182 188 219 231 232
nonterm 211 229
nr 285 287
number 21 90 90
op 54 114 302
Open 12 257 258 273
OpenFile 248 280
OpenSem 177 285 291
p 59 155 171 172 172 173 173 175 186 198 203 205
206 209 213 217 219 219
p1 158 160 160 161 161 162 162 164
p2 158 160 160 161 161 162 162 164
q 156 188 188
r 226 229 231
Read 12 133 139 146
right 37 161 161 209 213 217 231
s 127 133 138 138 141
sem 30 55 66 67 67 68 69 83 84 86 91 94
95 96 97 98 99 100 101 102 103 104 105 106
107 108 109 111 112 114 118 153 169 177 179 179
186 186 204 204 208 208 209 209 210 212 212 214
214 216 216 217 217 241 257 265 266 266 267 268
268 269 270 273 274 275 275 276 277 277 278 288
288 289 289 290
SemErr 11 260
semname 56 252 253 253 253 254 254 254 255 255 257 272
272 272 273
spix 248 251
src 10 257 258 273
startch 129 133 135
StartCopy 296 297
string 20
System 14
t 131 139 142 226 229 231
term 207
typ 35 76 90 93 114 121 160 160 206 231
volRef 257 258 273
Write 12 91 114 118 143 144 214 217
WriteCard 13 112 209 217 288 289
WriteLn 13 83 204
WriteString 13 67 94 95 96 97 98 99 100 101 102 103
104 105 106 107 108 109 208 209 210 212 216 261
262 266 268 275 277 288 289 290
WriteText 13 67 84 86 111 142 204 208 212 214 216 266
268 275 277
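EmitAction avoids generating duplicate semantic actions. Before opening a new action, it walks the list of actions already emitted. If EqualAct finds one whose assignment list matches the newly stored assignments element by element, EmitAction reuses that action's number and discards the new list.

A minimal Python sketch of that policy follows, under these assumptions:
- The class and method names are hypothetical.
- Assignments are modelled as (typ, left, right) tuples.
- The starting action number is a constructor parameter rather than a claim about the listing's initial value.

```python
class ActionTable:
    """Hands out semantic-action numbers, sharing numbers between
    identical assignment sequences (the EmitAction/EqualAct policy)."""

    def __init__(self, maxsem: int = 0):
        self.maxsem = maxsem      # number of the last action handed out
        self.actions = []         # list of (number, tuple of assignments)

    def emit_action(self, assignments):
        if not assignments:
            return 0              # nothing stored: no action (sem:=0)
        key = tuple(assignments)
        for number, stored in self.actions:
            if stored == key:     # same typ/left/right sequence: reuse it
                return number
        self.maxsem += 1          # new action, as OpenSem does with INC(maxsem)
        self.actions.append((self.maxsem, key))
        return self.maxsem


table = ActionTable()
a = table.emit_action([("term", 5, 1)])
b = table.emit_action([("nonterm", 3, 4), ("const", 6, 0)])
c = table.emit_action([("term", 5, 1)])
print(a, b, c)  # 1 2 1
```

Like the listing, the sketch uses a linear search over the emitted actions. This is adequate for the small number of actions a grammar typically contains.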
1 (* cocogen2: Generator for syntax files Moe 1.2.84
3 This module generates the parser. It
4 a) translates the top-down graph into G-code
5 b) copies text from the parser frame, inserting the declarations of
6 the table sizes
7 c) writes the parser tables
8 d) prints statistical information about the compilation
9 *)
10 DEFINITION MODULE cocogen2;
11
12 PROCEDURE GenSynFiles;
13 (* Generates the parser and the parser tables*)
14
15 PROCEDURE PutStatistics;
16 (* Writes statistics about the compilation to the list file*)
17
18 END cocogen2.
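Translating the top-down graph into G-code (task a above) needs jump targets that may not be known yet. The local module LABMOD in the implementation handles this with a fixup list per label:
- GetAdr returns the address if the label is already resolved. Otherwise it records where the address must later be patched.
- NewAdr patches all recorded places, high byte first, once the label's address is known.

The Python sketch below illustrates that backpatching scheme. The names are hypothetical, dictionaries replace the fixed-size lab array, and a ValueError stands in for CompErr(6).

```python
class LabelTable:
    """Backpatching of forward references, modelled on LABMOD."""

    def __init__(self, code):
        self.code = code          # G-code bytes, patched in place
        self.resolved = {}        # node location -> G-code address
        self.fixups = {}          # node location -> addresses awaiting a patch

    def get_adr(self, loc, fixup):
        """Return the address of loc, or 0 after recording that the two
        bytes at code[fixup] must be patched once loc is placed."""
        if loc in self.resolved:
            return self.resolved[loc]
        self.fixups.setdefault(loc, []).append(fixup)
        return 0

    def new_adr(self, loc, adr):
        """Place loc at adr and patch all pending references to it."""
        if loc in self.resolved:
            raise ValueError("label placed twice")
        self.resolved[loc] = adr
        for f in self.fixups.pop(loc, []):
            self.code[f] = adr // 256      # high byte, as in Emit2
            self.code[f + 1] = adr % 256   # low byte

    def visited(self, loc):
        return loc in self.resolved


code = [0] * 8
labels = LabelTable(code)
first = labels.get_adr(7, 2)    # forward reference: placeholder 0
labels.new_adr(7, 300)          # 300 = 1*256 + 44
print(first, code[2], code[3], labels.get_adr(7, 5))  # 0 1 44 300
```

Returning 0 as the placeholder works because G-code addresses start at 1 (pc:=1), which is also why LABMOD can use adr=0 to mean "not yet resolved".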
1 (* cocogen2:   Generator for syntax files   Moe 1.2.84
2
3    This module generates the parser. It
4    a) translates the top-down graph into G-code
5    b) copies text from the parser frame, inserting the declarations of
6       the table sizes
7    c) writes the parser tables
8    d) prints statistical information about the compilation
9 *)
10 IMPLEMENTATION MODULE cocogen2;
11
12 FROM cocogen IMPORT maxsem, CopyFramePart;
13 FROM cocogra IMPORT alts, maxn, rootloc, rules, GetNode, Graphnode;
14 FROM cocolex IMPORT line, col, GetName;
15 FROM cocolst IMPORT lst;
16 FROM cocosym IMPORT gramspix, maxany, maxeps, maxt, maxp, maxs, GetA,
17 GetE, GetF, GetSy, RepSy, Symbolnode, Symbolset,
18 Symboltype;
19 FROM Errors IMPORT CompErr, SemErr;
20 FROM FileIO IMPORT con, File, Done, Open, Close, Write, WriteCard,
21 WriteString, WriteText, WriteLn;
22 FROM System IMPORT Allocate, Deallocate;
23 FROM SYSTEM IMPORT VAL;
24
25 CONST (*for G-code*)
26 lmaxc = 3000; (*G-code length*)
27
28 TYPE
29 Filename = ARRAY[1..30] OF CHAR;
30 Instruction = (tc,tac,ntc,ntac,ntsc,ntasc,anyc,anyac,epsc,epsac,jmpc,retc);
31
32 VAR
33 code: ARRAY[1..lmaxc] OF [0..255]; (*G-code area*)
34 pc: CARDINAL; (*index in code*)
35 maxname: CARDINAL; (*length of name list*)
36 first: BOOLEAN; (*used for printing of tables*)
37 ic: CARDINAL; (*initialization counter*)
38 c: RECORD
39 CASE :BOOLEAN OF
40 TRUE: ch: ARRAY[1..2] OF CHAR;
41 | FALSE: card: CARDINAL;
42 END;
43 END;
44
45
46 PROCEDURE OutByte(VAR f:File; ch:CHAR); FORWARD;
47 PROCEDURE OutWord(VAR f:File; n:CARDINAL); FORWARD;
48 PROCEDURE PrintTables(VAR f:File); FORWARD;
49 PROCEDURE WriteConstDecl(VAR f:File;t:ARRAY OF CHAR;n:CARDINAL); FORWARD;
50
51
52
53 MODULE LABMOD; (* G-code labels
54 ====================================================================*)
55 IMPORT
56 code, CompErr, Allocate, Deallocate;
57 EXPORT
58 GetAdr, labact, NewAdr, Visited;
59
60 TYPE
61 Fixupptr = POINTER TO Fixup;
62 Fixup = RECORD
63 adr: CARDINAL; (*G-code address*)
64 next: Fixupptr; (*to next fixup*)
65 END;
66 Labeladr = RECORD
67 loc,adr: CARDINAL; (*node address and corresponding G-code address*)
68 fix: Fixupptr; (*to first fixup*)
69 END;
70 VAR
71 lab: ARRAY[1..70] OF Labeladr;
72 labact: CARDINAL;
73
74
75 PROCEDURE GetAdr(loc,fixup:CARDINAL; VAR adr:CARDINAL);
76 VAR
77 i: CARDINAL;
78 fp: Fixupptr;
79 BEGIN
80 i:=1;
81 WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
82 IF i>labact
83 THEN (*new label*)
84 INC(labact); lab[i].loc:=loc; lab[i].adr:=0;
85 Allocate(fp, SIZE(Fixup));
86 fp^.adr:=fixup; fp^.next:=NIL; lab[i].fix:=fp;
87 ELSE (*old label*)
88 IF lab[i].adr=0 THEN (*not yet resolved*)
89 Allocate(fp,SIZE(Fixup)); fp^.adr:=fixup; fp^.next:=lab[i].fix;
90 lab[i].fix:=fp;
91 END;
92 END;
93 adr:=lab[i].adr;
94 END GetAdr;
95
96
97 PROCEDURE NewAdr(loc,adr:CARDINAL);
98 VAR
99 i: CARDINAL;
100 p,q: Fixupptr;
101 BEGIN
102 i:=1;
103 WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
104 IF i>labact
105 THEN (*new label*)
106 INC(labact); lab[i].loc:=loc; lab[i].adr:=adr; lab[i].fix:=NIL;
107 ELSE (*old label*)
108 IF lab[i].adr=0
109 THEN (*resolve fixups*)
110 p:=lab[i].fix;
111 WHILE p<>NIL DO
112 code[p^.adr]:=adr DIV 256;
113 code[p^.adr+1]:=adr MOD 256;
114 q:=p; p:=p^.next; Deallocate(q);
115 END;
116 lab[i].adr:=adr; lab[i].fix:=NIL;
117 ELSE (*fixups already resolved*)
118 CompErr(6);
119 END;
120 END;
121 END NewAdr;
122
123
124 PROCEDURE Visited(loc:CARDINAL): BOOLEAN;
125 VAR i: CARDINAL;
126 BEGIN
127 i:=1;
128 WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
129 RETURN (i<=labact) AND (lab[i].adr>0);
130 END Visited;
131
132
133 BEGIN (*LABMOD*)
134 labact:=0;
135 END LABMOD;
136
137
138 (* Emit Emit G-code byte
139 *)
140 PROCEDURE Emit (byte:CARDINAL);
141 BEGIN code[pc]:=byte; INC(pc); END Emit;
142
143
144 (* Emit2 Emit G-code word
145 *)
146 PROCEDURE Emit2(word:CARDINAL);
147 BEGIN
148 code[pc]:=word DIV 256; code[pc+1]:=word MOD 256;
149 INC(pc,2);
150 END Emit2;
151
152
153 (* GenCode Generate G-code for TDG in loc
154 *)
155 PROCEDURE GenCode(loc:CARDINAL);
156 VAR
157 adr: CARDINAL;
158 gn: Graphnode;
159 BEGIN
160 IF Visited(loc) THEN RETURN; END;
161 NewAdr(loc,pc); (*now coming to address loc*)
162 GetNode(loc,gn);
163 WITH gn DO
164 CASE typ OF
165 t: IF lp=0
166 THEN Emit(ORD(tc)); Emit(sp);
167 ELSE
168 GetAdr(lp,pc+2,adr);
169 Emit(ORD(tac)); Emit(sp); Emit2(adr);
170 END;
171 | nt: IF lp=0
172 THEN IF sem1=0
173 THEN Emit(ORD(ntc)); Emit(sp);
174 ELSE Emit(ORD(ntsc)); Emit(sp); Emit(sem1);
175 END;
176 ELSE
177 GetAdr(lp,pc+2,adr);
178 IF sem1=0
179 THEN Emit(ORD(ntac)); Emit(sp); Emit2(adr);
180 ELSE Emit(ORD(ntasc)); Emit(sp); Emit2(adr); Emit(sem1);
181 END;
182 END;
183 | any: IF lp=0
184 THEN Emit(ORD(anyc));
185 ELSE
186 GetAdr(lp,pc+2,adr);
187 Emit(ORD(anyac)); Emit(sp); Emit2(adr);
188 END;
189 | eps: IF sp<>0 THEN
190 IF lp=0
191 THEN Emit(ORD(epsc)); Emit(sp);
192 ELSE
193 GetAdr(lp,pc+2,adr);
194 Emit(ORD(epsac)); Emit(sp); Emit2(adr);
195 END;
196 END;
197 END; (*CASE*)
198 IF sem2<>0 THEN Emit(sem2); END;
199 IF sem3<>0 THEN Emit(sem3); END;
200 IF rp=0 THEN Emit(ORD(retc));
201 ELSIF Visited(rp) THEN
202 GetAdr(rp,pc+1,adr); Emit(ORD(jmpc)); Emit2(adr);
203 END;
204 IF rp>0 THEN GenCode(rp); END;
205 IF lp>0 THEN GenCode(lp); END;
206 END; (*WITH*)
207 END GenCode;
208
209
210 (* GenSynFiles Generates files for syntax analysis
211 *)
212 PROCEDURE GenSynFiles;
213 VAR
214 fn: Filename;
215 fram: File; (*file with parser frame*)
216 graml: CARDINAL; (*length of grammar name*)
217 gramname: Filename; (*grammar name*)
218 i,j,l: CARDINAL;
219 name: ARRAY[1..50] OF CHAR;
220 startpc: CARDINAL;
221 sn: Symbolnode;
222 syn: File; (*file for generated parser*)
223 BEGIN
224 pc:=1;
225 FOR i:=maxp+1 TO maxs DO
226 labact:=0; startpc:=pc;
227 GetSy(i,sn);
228 GenCode(sn.start);
229 sn.start:=startpc;
230 RepSy(i,sn);
231 END;
232 startpc:=pc; GenCode(rootloc);
233
234 maxname:=4; (*"EOF"+0C*)
235 FOR i:=1 TO maxs DO
236 GetSy(i,sn); GetName(sn.aliasspix,name,l);
237 sn.spix:=maxname+1; RepSy(i,sn); INC(maxname,l+1);
238 (*sn.spix becomes a pointer in the generated name list*)
239 END;
240
241 GetName(gramspix,gramname,graml);
242
243 (* generate parser*)
244 FOR i:=1 TO graml DO fn[i]:=gramname[i]; END;
245 fn[graml+1]:="s"; fn[graml+2]:="y"; fn[graml+3]:="n"; fn[graml+4]:=".";
246 fn[graml+5]:="D"; fn[graml+6]:="E"; fn[graml+7]:="F"; fn[graml+8]:=0C;
247 Open(syn,lst^.volRef,fn,TRUE);
248 Open(fram,lst^.volRef,"cocosynframe",FALSE);
249 IF NOT Done THEN
250 WriteString(con,"The file 'cocosynframe' must be in the same ");
251 WriteString(con,"subdirectory as the input grammar.$");
252 SemErr(21,line,col); CompErr(5);
253 END;
254 CopyFramePart(fram,syn,"-->modulename"); (*definition module*)
255 WriteText(syn,gramname,graml); WriteString(syn,"syn");
256 CopyFramePart(fram,syn,"-->modulename");
257 WriteText(syn,gramname,graml); WriteString(syn,"syn");
258 CopyFramePart(fram,syn,"-->implementation");
259 Close(syn);
260
261 fn[graml+5]:="M"; fn[graml+6]:="O"; fn[graml+7]:="D";
262 Open(syn,lst^.volRef,fn,TRUE);
263
264 CopyFramePart(fram,syn,"-->modulename"); (*module name*)
265 WriteText(syn,gramname,graml); WriteString(syn,"syn");
266
267 CopyFramePart(fram,syn,"-->semantic analyzer"); (*various imports*)
268 WriteText(syn,gramname,graml); WriteString(syn,"sem");
269 CopyFramePart(fram,syn,"-->input module");
270 WriteText(syn,gramname,graml); WriteString(syn,"lex");
271
272 CopyFramePart(fram,syn,"-->declarations"); (*semantic declarations*)
273 WriteString(syn,"CONST$");
274 WriteConstDecl(syn," maxname =",maxname);
275 WriteConstDecl(syn," maxnamep =",maxs);
276 WriteConstDecl(syn," maxcode =",pc-1);
277 IF maxany=0
278 THEN WriteConstDecl(syn," maxany =",1);
279 ELSE WriteConstDecl(syn," maxany =",maxany);
280 END;
281 IF maxeps=0
282 THEN WriteConstDecl(syn," maxeps =",1);
283 ELSE WriteConstDecl(syn," maxeps =",maxeps);
284 END;
285 WriteConstDecl(syn," maxt =",maxt);
286 WriteConstDecl(syn," maxp =",maxp);
287 WriteConstDecl(syn," maxs =",maxs);
288 WriteConstDecl(syn," startpc =",startpc);
289 WriteString(syn,"$ ");
290
291 CopyFramePart(fram,syn,"-->tables");
292 PrintTables(syn);
293 CopyFramePart(fram,syn,"-->modulename"); (*module name*)
294 WriteText(syn,gramname,graml); WriteString(syn,"syn");
295 CopyFramePart(fram,syn,"$$$");
296 Close(fram); Close(syn);
297 END GenSynFiles;
298
299
300 (* OutByte Write a byte value to tables file
301 *)
302 PROCEDURE OutByte(VAR f:File; ch:CHAR);
303 BEGIN
304 IF first
305 THEN c.ch[1]:=ch;
306 ELSE c.ch[2]:=ch; OutWord(f,c.card);
307 END;
308 first:=NOT first;
309 END OutByte;
310
311
312 (* OutWord Write a word to tables file
313 *)
314 PROCEDURE OutWord(VAR f:File; n:CARDINAL);
315 BEGIN
316 IF ic=10 THEN
317 WriteString(f,"$ "); ic:=0
318 END;
319 WriteCard(f,n,5); Write(f,",");
320 INC(ic);
321 END OutWord;
322
323
324 (* PrintTables Write out an initialization of the grammar tables
325 *)
326 PROCEDURE PrintTables(VAR f:File);
327 VAR
328 i,j,l: CARDINAL;
329 name: ARRAY[1..50] OF CHAR;
330 s: Symbolset;
331 sn: Symbolnode;
332
333 BEGIN
334 first:=TRUE; WriteString(f," INLINE($ "); ic:=0;
335 OutWord(f,pc-1); (*header(table lengths)*)
336 OutWord(f,maxt);
337 OutWord(f,maxp);
338 OutWord(f,maxs);
339 OutWord(f,maxeps);
340 OutWord(f,maxany);
341 OutWord(f,maxs);
342 OutWord(f,maxname);
343 WriteString(f,"$(*--G-code--*)$ "); ic:=0;
344 FOR i:=1 TO pc-1 DO (*G-code*)
345 OutByte(f,CHR(code[i]));
346 END;
347 IF ODD(pc-1) THEN
348 OutByte(f,0C);
349 END;
350 WriteString(f,"$(*--nt-symbols--*)$ "); ic:=0;
351 FOR i:=maxp+1 TO maxs DO (*nt-symbols*)
352 GetSy(i,sn);
353 OutWord(f,sn.start);
354 OutWord(f,ORD(sn.del)*256);
355 GetF(i,s);
356 FOR j:=0 TO maxt DIV 16 DO
357 OutWord(f,VAL(CARDINAL,s[j]));
358 END;
359 END;
360 WriteString(f,"$(*--eps followers--*)$ "); ic:=0;
361 FOR i:=1 TO maxeps DO (*followers of eps nodes*)
362 GetE(i,s);
363 FOR j:=0 TO maxt DIV 16 DO
364 OutWord(f,VAL(CARDINAL,s[j]));
365 END;
366 END;
367 IF maxeps=0 THEN
368 FOR j:=0 TO maxt DIV 16 DO
369 OutWord(f,0);
370 END;
371 maxeps:=1; (*dummy*)
372 END;
373 WriteString(f,"$(*--any sets--*)$ "); ic:=0;
374 FOR i:=1 TO maxany DO (*any-sets*)
375 GetA(i,s);
376 FOR j:=0 TO maxt DIV 16 DO
377 OutWord(f,VAL(CARDINAL,s[j]));
378 END;
379 END;
380 IF maxany=0 THEN
381 FOR j:=0 TO maxt DIV 16 DO
382 OutWord(f,0);
383 END;
384 maxany:=1; (*dummy*)
385 END;
386 WriteString(f,"$(*--attribute numbers--*)$ "); ic:=0;
387 FOR i:=0 TO maxp DO (*attribute numbers*)
388 GetSy(i,sn);
389 OutWord(f,sn.nra);
390 END;
391 WriteString(f,"$(*--pragma semantic--*)$ "); ic:=0;
392 OutWord(f,0); OutWord(f,0); (*dummy psem*)
393 FOR i:=maxt+1 TO maxp DO (*pragma semantic*)
394 GetSy(i,sn);
395 OutWord(f,sn.sem1);
396 OutWord(f,sn.sem2);
397 END;
398 WriteString(f,"$(*--name pointers--*)$ "); ic:=0;
399 OutWord(f,1); (*for eofsy*)
400 FOR i:=1 TO maxs DO (*name pointers*)
401 GetSy(i,sn);
402 (*sn.spix is now a pointer in the generated name list*)
403 OutWord(f,sn.spix);
404 END;
405 WriteString(f,"$(*--name list--*)$ "); ic:=0;
406 OutByte(f,"E"); OutByte(f,"O");
407 OutByte(f,"F"); OutByte(f,0C);
408 FOR i:=1 TO maxs DO (*name list*)
409 GetSy(i,sn);
410 GetName(sn.aliasspix,name,l);
411 FOR j:=1 TO l DO OutByte(f,name[j]); END;
412 OutByte(f,0C);
413 END;
414 IF ODD(maxname) THEN OutByte(f,0C); END;
415 WriteString(f,"0);$");
416 END PrintTables;
417
418
419 (* PutStatistics   Writes statistics about compilation to list file
420 *)
421 PROCEDURE PutStatistics;
422 VAR
423 ptrsize: CARDINAL;
424 setsize: CARDINAL;
425 storage: CARDINAL;
426 BEGIN
427 ptrsize:=2; setsize:=2*((maxt DIV 16)+1);
428 storage:=pc-1 + (*G-code*)
429 (ptrsize+2+setsize)*(maxs-maxp) + (*ntsymbols*)
430 setsize*maxeps + (*eps-followers*)
431 setsize*maxany + (*any-sets*)
432 2*(maxp+1) + (*nra*)
433 4*(maxp-maxt+1) + (*ps*)
434 2*(maxs+1) + (*namep*)
435 maxname + (*name*)
436 16; (*header*)
437 WriteLn(lst); WriteString(lst,"Statistics:"); WriteLn(lst);
438 WriteCard(lst,rules,5); WriteString(lst," rules"); WriteLn(lst);
439 WriteCard(lst,alts,5); WriteString(lst," alternatives"); WriteLn(lst);
440 WriteCard(lst,maxn,5); WriteString(lst," nodes"); WriteLn(lst);
441 WriteCard(lst,maxsem-10,5); WriteString(lst," semantic actions");
442 WriteLn(lst);
443 WriteCard(lst,maxeps,5); WriteString(lst," eps with look ahead");
444 WriteLn(lst);
445 WriteCard(lst,maxany,5); WriteString(lst," any-sets"); WriteLn(lst);
446 WriteCard(lst,pc-1,5); WriteString(lst," bytes for G-code");
447 WriteLn(lst);
448 WriteCard(lst,storage,5);
449 WriteString(lst," bytes for grammar tables (total)"); WriteLn(lst);
450 END PutStatistics;
451
452
453 (* WriteConstDecl Write constant declaration text
454 *)
455 PROCEDURE WriteConstDecl (VAR f:File; t:ARRAY OF CHAR; n:CARDINAL);
456 BEGIN
457 WriteString(f,t); WriteCard(f,n,4); WriteString(f,";$");
458 END WriteConstDecl;
459
460 END cocogen2.
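PutStatistics estimates the total size of the generated grammar tables by summing one term per table. Pointers take 2 bytes, and each set takes 2*((maxt DIV 16)+1) bytes. The Python function below restates that sum term by term so the estimate can be checked by hand. The function name and the sample figures in the usage lines are invented for illustration.

```python
def grammar_table_bytes(pc, maxt, maxp, maxs, maxeps, maxany, maxname):
    """Restatement of the storage formula in PutStatistics (lines 427-436)."""
    ptrsize = 2
    setsize = 2 * (maxt // 16 + 1)                    # one 16-bit word per 16 terminals
    return ((pc - 1)                                  # G-code
            + (ptrsize + 2 + setsize) * (maxs - maxp) # nt-symbols
            + setsize * maxeps                        # eps followers
            + setsize * maxany                        # any sets
            + 2 * (maxp + 1)                          # attribute numbers (nra)
            + 4 * (maxp - maxt + 1)                   # pragma semantics (ps)
            + 2 * (maxs + 1)                          # name pointers (namep)
            + maxname                                 # name list
            + 16)                                     # header: 8 words


# Hypothetical figures: pc = 101, maxt = 20, maxp = 22, maxs = 30,
# maxeps = 3, maxany = 1, maxname = 150.
print(grammar_table_bytes(101, 20, 22, 30, 3, 1, 150))  # 466
```

Coco calls GenSynFiles before PutStatistics, and PrintTables replaces a zero maxeps or maxany by 1. The reported total therefore includes those dummy sets.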
adr 63 67 75 84 86 88 89 93 93 97 106 106
108 112 112 113 113 116 116 129 157 168 169 177
179 180 186 187 193 194 202 202
aliasspix 236 410
Allocate 22 56 85 89
alts 13 439
any 183
anyac 30 187
anyc 30 184
byte 140 141
c 38 305 306 306
C 246 348 407 412 414
card 41 306
ch 40 46 302 305 305 306 306
Close 20 259 296 296
cocogen 12
cocogen2 10 460
cocogra 13
cocolex 14
cocolst 15
cocosym 16
code
col 14 252
CompErr 19 56 118 252
con 20 250 251
CopyFramePart 12 254 256 258 264 267 269 272 291 293 295
Deallocate 22 56 114
del 354
Done 20 249
Emit 140 141 166 166 169 169 173 173 174 174 174 179
179 180 180 180 184 187 187 191 191 194 194 198
199 200 202
Emit2 146 150 169 179 180 187 194 202
eps 189
epsac 30 194
epsc 30 191
Errors 19
f 46 47 48 49 302 306 314 317 319 319 326 334
335 336 337 338 339 340 341 342 343 345 348 350
353 354 357 360 364 369 373 377 382 386 389 391
392 392 395 396 398 399 403 405 406 406 407 407
411 412 414 415 455 457 457 457
File 20 46 47 48 49 215 222 302 314 326 455
FilelO 20
Filename 29 214 217
first 36 304 308 308 334
fix 68 86 89 90 106 110 116
fixup 75 86 89
Fixup 61 62 85 89
Fixupptr 61 64 68 78 100
fn 214 244 245 245 245 245 246 246 246 246 247 261
261 261 262
FORWARD 46 47 48 49
fp 78 85 86 86 86 89 89 89 90
fram 215 248 254 256 258 264 267 269 272 291 293 295
296
GenCode 155 204 205 207 228 232
GenSynFiles 212 297
GetA 16 375
GetAdr 58 75 94 168 177 186 193 202
GetE 17 362
GetF 17 355
GetName 14 236 241 410
GetNode 13 162
GetSy 17 227 236 352 388 394 401 409
gn 158 162 163
graml 216 241 244 245 245 245 245 246 246 246 246 255
257 261 261 261 265 268 270 294
gramname 217 241 244 255 257 265 268 270 294
gramspix 16 241
Graphnode 13 158
i 77 80 81 81 81 82 84 84 86 88 89 90
93 99 102 103 103 103 104 106 106 106 108 110
116 116 125 127 128 128 128 129 129 218 225 227
230 235 236 237 244 244 244 328 344 345 351 352
355 361 362 374 375 387 388 393 394 400 401 408
409
ic 37 316 317 320 334 343 350 360 373 386 391 398
405
Instruction 30
j 218 328 356 357 363 364 368 376 377 381 411 411
jmpc 30 202
l 218 236 237 328 410 411
lab 71 81 84 84 86 88 89 90 93 103 106 106
106 108 110 116 116 128 129
labact 58 72 81 82 84 103 104 106 128 129 134 226
Labeladr 66 71
LABMOD 53 135
line 14 252
lmaxc 26 33
loc 67 75 81 81 84 84 97 103 103 106 106 124
128 128 155 160 161 162
lp 165 168 171 177 183 186 190 193 205 205
lst 15 247 248 262 437 437 437 438 438 438 439 439
439 440 440 440 441 441 442 443 443 444 445 445
445 446 446 447 448 449 449
maxany 16 277 279 340 374 380 384 431 445
maxeps 16 281 283 339 361 367 371 430 443
maxn 13 440
maxname 35 234 237 237 274 342 414 435
maxp 16 225 286 337 351 387 393 429 432 433
maxs 16 225 235 275 287 338 341 351 400 408 429 434
maxsem 12 441
maxt 16 285 336 356 363 368 376 381 393 427 433
n 47 49 314 319 455 457
name 219 236 329 410 411
NewAdr 58 97 121 161
next 64 86 89 114
nra 389
nt 171
ntac 30 179
ntasc 30 180
ntc 30 173
ntsc 30 174
ODD 347 414
Open 20 247 248 262
OutByte 46 302 309 345 348 406 406 407 407 411 412 414
OutWord 47 306 314 321 335 336 337 338 339 340 341 342
353 354 357 364 369 377 382 389 392 392 395 396
399 403
p 100 110 111 112 113 114 114 114
pc 34 141 141 148 148 149 161 168 177 186 193 202
224 226 232 276 335 344 347 428 446
PrintTables 48 292 326 416
ptrsize 423 427 429
PutStatistics 421 450
q 100 114 114
RepSy 17 230 237
retc 30 200
rootloc
rp
rules
s
sem1
sem2
sem3
SemErr
setsize
sn
sp
spix
start
startpc
storage
Symbolnode
Symbolset
Symboltype
syn
13
200
13
330
172
198
199
19
424
221
354
166
237
228
220
425
17
17
18
222
232
201
438
355
174
198
199
252
427
227
388
169
403
229
226
428
221
330
247
202
357
178
396
429
228
389
173
353
229
448
331
254
204
362
180
430
229
394
174
232
255
204
364
395
431
230
395
179
288
255
375
236
396
180
256
377
236
401
187
257
237
403
189
257
237
409
191
258
331
410
194
259
352 353
262 264
265 265 267 268 268 269 270 270 272 273 274 275
276 278 279 282 283 285 286 287 288 289 291 292
293 294 294 295 296
System 22
SYSTEM 23
t 49 165 455 457
tac 30 169
tc 30 166
typ 164
VAL 23 357 364 377
Visited 58 124 130 160 201
volRef 247 248 262
word 146 148 148
Write 20 319
WriteCard 20 319 438 439 440 441 443 445 446 448 457
WriteConstDecl 49 274 275 276 278 279 282 283 285 286 287 288
455 458
WriteLn 21 437 437 438 439 440 442 444 445 447 449
WriteString 21 250 251 255 257 265 268 270 273 289 294 317
334 343 350 360 373 386 391 398 405 415 437 438
439 440 441 443 445 446 449 457 457
WriteText 21 255 257 265 268 270 294
(* cocogra     Graph node list                      Moe 28.12.83
   This module builds and handles the top-down graph. It
   a) generates and updates single graph nodes
   b) concatenates graphs via left or right pointers
   c) prints the whole graph for tracing
   d) inserts eps nodes before deletable nonterminals with alternatives
   e) deletes redundant eps-nodes resulting from EBNF-constructs such as
      {x}y or [x]y
*)
DEFINITION MODULE cocogra;
FROM cocosym IMPORT Symboltype;

CONST
  maxnodes = 600;

TYPE
  Graphnode = RECORD
    typ:  Symboltype;  (*eps,t,pr,nt,any,err*)
    sp:   CARDINAL;    (*node symbol*)
    lp:   CARDINAL;    (*left pointer*)
    rp:   CARDINAL;    (*right pointer*)
    sem1: [0..255];    (*evaluation of in-attributes*)
    sem2: [0..255];    (*evaluation of out-attributes*)
    sem3: [0..255];    (*semantic action*)
    line: CARDINAL;    (*line number*)
    link: CARDINAL;    (*ptr to node with same right successor*)
  END;
  Marklist = ARRAY[0..maxnodes DIV 16] OF BITSET;

VAR
  maxn:    CARDINAL;   (*number of graph nodes*)
  alts:    CARDINAL;   (*number of alternatives, filled by AG*)
  rules:   CARDINAL;   (*number of grammar rules, filled by AG*)
  rootloc: CARDINAL;   (*root node of grammar, filled by AG*)

PROCEDURE ClearMarkList(VAR m:Marklist);
(* Clears the mark list m*)

PROCEDURE ConcatLeft(VAR gp,gl,gp1,gl1:CARDINAL);
(* Links the graph (gp,gl) with the graph (gp1,gl1) via left pointers.
   The resulting graph is identified by (gp,gl)*)

PROCEDURE ConcatRight(VAR gp,gl,gp1,gl1:CARDINAL);
(* Links the graph (gp,gl) with the graph (gp1,gl1) via right pointers.
   The resulting graph is identified by (gp,gl)*)

PROCEDURE Deletable(loc:CARDINAL): BOOLEAN;
(* TRUE if the graph with the root loc is deletable*)

PROCEDURE DeleteRedundantEps;
(* Deletes eps nodes in constructions {x}y and [x]y*)

PROCEDURE DelNode(gn:Graphnode): BOOLEAN;
(* TRUE if the node gn contains a deletable symbol*)

PROCEDURE GetNode(p:CARDINAL; VAR gn:Graphnode);
(* Gets the graph node with the index p*)

PROCEDURE GraphList;
(* Prints a test list of the top-down graphs of all rules*)

PROCEDURE Mark(loc:CARDINAL; VAR m:Marklist);
(* Marks loc in list m as visited*)

PROCEDURE Marked(loc:CARDINAL; VAR m:Marklist): BOOLEAN;
(* TRUE if loc is marked in m*)

PROCEDURE NewEpsBeforeDelNts;
(* Inserts eps nodes in front of deletable nt's*)

PROCEDURE NewNode(typ:Symboltype; sp,line:CARDINAL): CARDINAL;
(* Generates a new graph node with the specified values and returns
   its index*)

PROCEDURE RepNode(p:CARDINAL; gn:Graphnode);
(* Replaces the graph node with index p by gn*)

END cocogra.
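The mark list packs one visited flag per graph node into an array of BITSETs. Because the listing divides by 16, each word holds 16 flags: node loc is bit loc MOD 16 of word loc DIV 16, so 600 nodes fit in 38 words. The following is a minimal Python sketch of ClearMarkList, Mark and Marked, with plain integers standing in for BITSET words. It is not part of the book's Modula-2 sources, and the class name MarkList is hypothetical.

```python
MAXNODES = 600  # same bound as cocogra.maxnodes


class MarkList:
    """One visited flag per graph node, packed 16 to a word."""

    def __init__(self, maxnodes=MAXNODES):
        # ARRAY[0..maxnodes DIV 16] OF BITSET, each word a 16-bit int
        self.words = [0] * (maxnodes // 16 + 1)

    def clear(self):
        # ClearMarkList: every word becomes the empty set {}
        self.words = [0] * len(self.words)

    def mark(self, loc):
        # Mark: INCL(m[loc DIV 16], loc MOD 16)
        self.words[loc // 16] |= 1 << (loc % 16)

    def marked(self, loc):
        # Marked: (loc MOD 16) IN m[loc DIV 16]
        return ((self.words[loc // 16] >> (loc % 16)) & 1) == 1
```

For example, marking node 17 sets bit 1 of word 1, so `words[1]` becomes 2.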
1 (* cocogra Graph node list for coco Moe 29.12.83
2 ======= ========================
3 This module builds and handles the top-down graph. It
4 a) generates and updates single graph nodes
5 b) concatenates graphs via left or right pointers
6 c) prints the whole graph for tracing
7 d) inserts eps nodes before deletable nonterminals with alternatives
8 e) deletes redundant eps-nodes resulting from EBNF-constructs such as
9 {x}y or [x]y
10 *)
11 IMPLEMENTATION MODULE cocogra;
12
13 FROM cocolex IMPORT ddt, GetName;
14 FROM cocosym IMPORT maxp, maxs, GetSy, RepSy, Symbolnode,
15 Symboltype;
16 FROM Errors IMPORT Restriction;
17 FROM FileIO IMPORT con, WriteCard, WriteLn, WriteString,
18 WriteText;
19
20 TYPE Graphnodelist = ARRAY[1..maxnodes] OF Graphnode;
21 VAR gn: Graphnodelist; (*syntax graph*)
22
23
24 (* ClearMarkList Clear mark list m
25 *)
26 PROCEDURE ClearMarkList(VAR m:Marklist);
27 VAR i: CARDINAL;
28 BEGIN FOR i:=0 TO maxnodes DIV 16 DO m[i]:={}; END; END ClearMarkList;
29
30
31 (* ConcatLeft Concatenate graph gp1 left to graph gp
32 *)
33 PROCEDURE ConcatLeft(VAR gp,gl,gp1,gl1:CARDINAL);
34 VAR p: CARDINAL;
35 BEGIN
36 p:=gp;
37 WHILE gn[p].lp<>0 DO p:=gn[p].lp; END;
38 gn[p].lp:=gp1;
39 p:=gl;
40 WHILE gn[p].link<>0 DO p:=gn[p].link; END;
41 gn[p].link:=gl1;
42 END ConcatLeft;
43
44
45 (* ConcatRight Concatenate graph gp1 right to graph gp
46 *)
47 PROCEDURE ConcatRight(VAR gp,gl,gp1,gl1:CARDINAL);
48 VAR p: CARDINAL;
49 BEGIN
50 p:=gl;
51 WHILE p<>0 DO gn[p].rp:=gp1; p:=gn[p].link; END;
52 gl:=gl1;
53 END ConcatRight;
54
55
56 (* Deletable Check if graph in loc is deletable
57 *)
58 PROCEDURE Deletable(loc:CARDINAL):BOOLEAN;
59 VAR m: Marklist;
60
61 PROCEDURE DelGraph(loc:CARDINAL):BOOLEAN;
62 VAR gn:Graphnode;
63 BEGIN
64 IF loc=0 THEN RETURN TRUE; END; (*end of graph found*)
65 IF Marked(loc,m) THEN RETURN FALSE; END;
66 Mark(loc,m);
67 GetNode(loc,gn);
68 IF ddt["C"] THEN
69 WriteString(con,"DelGraph:");
70 WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),8);
71 WriteCard(con,gn.sp,6); WriteLn(con);
72 END;
73 RETURN ((gn.lp<>0) AND DelGraph(gn.lp)) OR
74 (DelNode(gn) AND DelGraph(gn.rp));
75 END DelGraph;
76
77 BEGIN (*Deletable*)
78 ClearMarkList(m);
79 RETURN DelGraph(loc);
80 END Deletable;
81
82
83 (* DelNode Test if node gn is deletable
84 *)
85 PROCEDURE DelNode(gn:Graphnode):BOOLEAN;
86 VAR sn:Symbolnode;
87 BEGIN
88 IF gn.typ=nt
89 THEN GetSy(gn.sp,sn); RETURN sn.del;
90 ELSE RETURN gn.typ=eps;
91 END;
92 END DelNode;
93
94
95 (* DeleteRedundantEps Delete eps nodes in constructions {x}y and [x]y
96 *)
97 PROCEDURE DeleteRedundantEps;
98 VAR
99 m: Marklist;
100 i: CARDINAL;
101 sn: Symbolnode;
102
103 PROCEDURE DelEps(loc:CARDINAL);
104 VAR gn,gn1: Graphnode;
105 BEGIN
106 IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
107 Mark(loc,m);
108 GetNode(loc,gn);
109 WITH gn DO
110 IF lp<>0 THEN
111 GetNode(lp,gn1);
112 IF (gn1.typ=eps) AND (gn1.sem3=0)
113 AND (gn1.lp=0) AND (gn1.rp<>0) THEN
114 lp:=gn1.rp; RepNode(loc,gn);
115 END;
116 END;
117 DelEps(lp);
118 DelEps(rp);
119 END;
120 END DelEps;
121
122 BEGIN
123 ClearMarkList(m);
124 FOR i:=maxp+1 TO maxs DO
125 GetSy(i,sn); DelEps(sn.start);
126 END;
127 END DeleteRedundantEps;
128
129
130 (* GetNode Get node gp
131 *)
132 PROCEDURE GetNode(gp:CARDINAL; VAR gn1:Graphnode);
133 BEGIN gn1:=gn[gp]; END GetNode;
134
135
136 (* GraphList Trace output of graph node list
137 *)
138 PROCEDURE GraphList;
139 VAR
140 i,j,l: CARDINAL;
141 name: ARRAY[1..80] OF CHAR;
142 sn: Symbolnode;
143 BEGIN
144 WriteString(con,"$$Topdown-graph:$$");
145 WriteString(con,"loc symbol typ lp rp");
146 WriteString(con," sem1 sem2 sem3 link line$$");
147 FOR i:=1 TO maxn DO
148 WriteCard(con,i,3); WriteString(con," ");
149 WITH gn[i] DO
150 CASE typ OF
151 eps,any:
152 WriteString(con," ");
153 | t,nt:
154 GetSy(sp,sn); GetName(sn.spix,name,l);
155 FOR j:=l+1 TO 12 DO name[j]:=" "; END;
156 WriteText(con,name,12);
157 | err:
158 WriteString(con,"error ");
159 END; (*CASE*)
160 CASE typ OF
161 eps: WriteString(con," eps ");
162 | t: WriteString(con," t ");
163 | pr: WriteString(con," pr ");
164 | nt: WriteString(con," nt ");
165 | any: WriteString(con," any ");
166 ELSE;
167 END; (*CASE*)
168 WriteCard(con,lp,7); WriteCard(con,rp,7);
169 WriteCard(con,sem1,7); WriteCard(con,sem2,7);
170 WriteCard(con,sem3,7); WriteCard(con,link,7);
171 WriteCard(con,line,7); WriteLn(con);
172 END; (*WITH*)
173 END; (*FOR*)
174 END GraphList;
175
176
177 (* Mark Marks node loc in m as visited
178 *)
179 PROCEDURE Mark(loc:CARDINAL; VAR m:Marklist);
180 BEGIN INCL(m[loc DIV 16],loc MOD 16); END Mark;
181
182 (* Marked Tests if node loc is marked in m
183 *)
184
185 PROCEDURE Marked(loc:CARDINAL; VAR m:Marklist): BOOLEAN;
186 BEGIN RETURN (loc MOD 16) IN m[loc DIV 16]; END Marked;
187
188
189 (* NewEpsBeforeDelNts Insert eps before del. nt's with alternatives
190 *)
191 PROCEDURE NewEpsBeforeDelNts;
192 VAR
193 gn,gn1: Graphnode;
194 loc,loc1,maxloc: CARDINAL;
195 sn: Symbolnode;
196 BEGIN
197 maxloc:=maxn;
198 FOR loc:=1 TO maxloc DO
199 GetNode(loc,gn);
200 IF (gn.typ=nt) AND (gn.lp<>0) AND DelNode(gn) THEN
201 loc1:=NewNode(gn.typ,gn.sp,gn.line);
202 gn1:=gn; gn1.lp:=0;
203 WITH gn DO
204 typ:=eps; sp:=0; rp:=loc1; sem1:=0; sem2:=0; sem3:=0;
205 END;
206 RepNode(loc1,gn1);
207 RepNode(loc,gn);
208 END;
209 END; (*FOR*)
210 END NewEpsBeforeDelNts;
211
212
213 (* NewNode Generate a new graph node and return the index
214 *)
215 PROCEDURE NewNode(t:Symboltype; s:CARDINAL; l:CARDINAL): CARDINAL;
216 BEGIN
217 INC(maxn);
218 IF maxn>maxnodes THEN Restriction(5); END;
219 WITH gn[maxn] DO
220 typ:=t; sp:=s; lp:=0; rp:=0; sem1:=0; sem2:=0; sem3:=0;
221 line:=l; link:=0;
222 END;
223 RETURN maxn;
224 END NewNode;
225
226
227 (* RepNode Replace node gp
228 *)
229 PROCEDURE RepNode(gp:CARDINAL; gn1:Graphnode);
230 BEGIN gn[gp]:=gn1; END RepNode;
231
232
233 BEGIN (*cocogra*)
234 maxn:=0;
235 END cocogra.
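Deletable decides whether a graph can derive the empty string. DelGraph succeeds when it runs off the end of the graph (loc=0). Otherwise it tries the next alternative, reached through lp, or, if the current node is itself deletable (DelNode), the successor reached through rp. A single mark list is shared by the whole search, which guarantees termination on the cycles that iterations {x} create; a node reached a second time simply counts as not deletable.

A Python sketch of the same search follows. It assumes a dictionary of nodes and a set of deletable nonterminal numbers in place of the symbol list's sn.del flags. Node, deletable and deletable_nts are illustrative names, not the book's.

```python
from dataclasses import dataclass


@dataclass
class Node:
    typ: str      # 'eps', 't', 'nt', ... (cocosym.Symboltype)
    sp: int = 0   # symbol number
    lp: int = 0   # left pointer: next alternative (0 = none)
    rp: int = 0   # right pointer: successor (0 = end of graph)


def deletable(graph, loc, deletable_nts):
    """Sketch of cocogra.Deletable over a dict index -> Node."""
    visited = set()  # plays the role of the Marklist

    def del_node(gn):
        # DelNode: an nt is deletable if its symbol is; otherwise only eps
        if gn.typ == 'nt':
            return gn.sp in deletable_nts
        return gn.typ == 'eps'

    def del_graph(loc):
        if loc == 0:           # end of graph reached: an empty path exists
            return True
        if loc in visited:     # explored before: treated as not deletable
            return False
        visited.add(loc)
        gn = graph[loc]
        return ((gn.lp != 0 and del_graph(gn.lp))
                or (del_node(gn) and del_graph(gn.rp)))

    return del_graph(loc)
```

Usage: for the graph `a | eps` (node 3 is terminal a with lp=4, node 4 is eps), `deletable(graph, 3, set())` returns True because the second alternative reaches the end of the graph.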
any 151 165
ClearMarkList 26 28 78 123
cocogra 11 235
cocolex 13
cocosym 14
con 17 69 70 70 71 71 144 145 146 148 148 152
156 158 161 162 163 164 165 168 168 169 169 170
170 171 171
ConcatLeft 33 42
ConcatRight 47 53
ddt 13 68
del 89
DelEps 103 117 118 120 125
Deletable 58 80
DeleteRedundantEps 97 127
DelGraph 61 73 74 75 79
DelNode 74 85 92 200
eps 90 112 151 161 204
err 157
Errors 16
FileIO 17
GetName 13 154
GetNode 67 108 111 132 133 199
GetSy 14 89 125 154
gl 33 39 47 50 52
gl1 33 41 47 52
gn 21 37 37 38 40 40 41 51 51 62 67 70
71 73 73 74 74 85 88 89 90 104 108 109
114 133 149 193 199 200 200 200 201 201 201 202
203 207 219 230
gn1 104 111 112 112 113 113 114 132 133 193 202 202
206 229 230
gp 33 36 47 132 133 229 230
gp1 33 38 47 51
GraphList 138 174
Graphnode 20 62 85 104 132 193 229
Graphnodelist 20 21
i 27 28 28 100 124 125 140 147 148 149
INCL 180
j 140 155 155
l 140 154 155 215 221
line 171 201 221
link 40 40 41 51 170 221
loc 58 61 64 65 66 67 70 79 103 106 106 107
108 114 179 180 180 185 186 186 194 198 199 207
loc1 194 201 204 206
lp 37 37 38 73 73 110 111 113 114 117 168 200
202 220
m 26 28 59 65 66 78 99 106 107 123 179 180
185 186
Mark 66 107 179 180
Marked 65 106 185 186
Marklist 26 59 99 179 185
maxloc 194 197 199
maxn 147 197 217 218 219 223 234
maxnodes 20 28 218
maxp 14 124
maxs 14 124
name 141 154 155 156
NewEpsBeforeDelNts 191 210
NewNode 201 215 224
nt 88 153 164 200
p 34 36 37 37 37 38 39 40 40 40 41 48
50 51 51 51 51
pr 163
RepNode 114 206 207 229 230
Restriction 16 218
rp 51 74 113 114 118 168 204 220
s 215 220
sem1 169 204 220
sem2 169 204 220
sem3 112 170 204 220
sn 86 89 89 101 125 125 142 154 154 195
sp 71 89 154 201 204 220
spix 154
start 125
Symbolnode 14 86 101 142 195
Symboltype 15 215
t 153 162 215 220
typ 70 88 90 112 150 160 200 201 204 220
WriteCard 17 70 70 71 148 168 168 169 169 170 170 171
WriteLn 17 71 171
WriteString 17 69 144 145 146 148 152 158 161 162 163 164
165
WriteText 18 156
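ConcatLeft and ConcatRight in cocogra work on pairs (gp, gl). Here gp is the first node of a graph, and gl heads a chain, threaded through link, of the nodes whose right successor is still open. ConcatRight patches the rp of every node on that chain, which forms a sequence. ConcatLeft appends a new alternative to the end of the lp chain and merges the two open-end chains.

The Python sketch below makes two simplifying assumptions. Index 0 stands for "nil", as in the listing. The functions return the new pair instead of updating VAR parameters. Graph, new_node, concat_left and concat_right are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    lp: int = 0    # next alternative
    rp: int = 0    # successor
    link: int = 0  # next node sharing the same (still open) right successor


class Graph:
    def __init__(self):
        self.gn = [None]   # index 0 means "nil"

    def new_node(self, name):
        self.gn.append(Node(name))
        n = len(self.gn) - 1
        return n, n        # a one-node graph: (gp, gl) = (n, n)

    def concat_right(self, g, g1):
        """Sequence g g1: every open end of g now points at g1."""
        gp, gl = g
        gp1, gl1 = g1
        p = gl
        while p != 0:
            self.gn[p].rp = gp1
            p = self.gn[p].link
        return gp, gl1     # the open ends are now those of g1

    def concat_left(self, g, g1):
        """Alternatives g | g1: extend the lp chain, merge open ends."""
        gp, gl = g
        gp1, gl1 = g1
        p = gp
        while self.gn[p].lp != 0:
            p = self.gn[p].lp
        self.gn[p].lp = gp1
        p = gl
        while self.gn[p].link != 0:
            p = self.gn[p].link
        self.gn[p].link = gl1
        return gp, gl
```

Building (a | b) c this way leaves both alternatives pointing at c through rp. The link chain 1 -> 2 records which nodes were patched.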
(* cocolex     Lexical analyzer for coco            Moe 83.03.27
   This is the Coco-scanner. It
   a) reads the input grammar
   b) returns symbol numbers and terminal attributes to the parser
   c) hashes names and strings into a name list (permanently or
      temporarily)
   d) converts number-strings to values
   All symbols which are not terminals of Cocol get the symbol type
   'nococosy' and are hashed into the name list.
*)
DEFINITION MODULE cocolex;
FROM FileIO IMPORT File;

VAR
  typ:  CARDINAL;                    (*next token code*)
  at:   ARRAY[1..10] OF CARDINAL;    (*attr. values of current token*)
  line: CARDINAL;                    (*current line number*)
  col:  CARDINAL;                    (*current column number*)
  ddt:  ARRAY ["A".."Z"] OF BOOLEAN; (*debug and test switches*)
  src:  File;                        (*source file*)

PROCEDURE GetName(spix:CARDINAL;VAR name:ARRAY OF CHAR;VAR len:CARDINAL);
(* Get the text of a name or a string with the spelling index spix.
   len denotes its length*)

PROCEDURE GetSy;
(* Gets the next input token and fills at, line and col*)

PROCEDURE RestartHash;
(* Causes identifiers and strings to be stored permanently*)

PROCEDURE StopHash;
(* Causes identifiers and strings to be stored temporarily*)

END cocolex.
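GetSy in cocolex is driven by a character-class table that the module body fills in. Each input character is mapped to a class. Blanks and line ends are skipped, names and numbers are read by loops, quotes start strings, and "--" starts a comment that runs to the end of the line.

Below is a rough Python sketch of that structure. It makes several assumptions: token names replace the numeric codes (eqlsy = 20 and so on), the $ debug switches are omitted, and the keywords entered with EnterKey are kept in a plain set rather than in the hashed name list. The sketch also raises an exception on an unterminated string, whereas ReadString reports SemErr(23) or SemErr(24) and goes on. The names SINGLE, KEYWORDS and tokens are hypothetical.

```python
import string

# Single-character symbols and their classes (from the class table in cocolex)
SINGLE = {'=': 'eql', '.': 'period', '|': 'variant', '(': 'lpar',
          ')': 'rpar', '[': 'lbrack', ']': 'rbrack', '{': 'lconbr',
          '}': 'rconbr', '<': 'latpar', '>': 'ratpar', ';': 'semicolon',
          ':': 'colon', ',': 'comma'}

# Keywords entered by EnterKey (upper-case forms plus lower-case aliases)
KEYWORDS = {'ALIAS', 'ANY', 'DECLARATIONS', 'ENDGRAM', 'ENDSEM', 'EPS',
            'GRAMMAR', 'IN', 'MACROS', 'NONTERMINALS', 'OUT', 'PRAGMAS',
            'RULES', 'SEM', 'SEMANTIC', 'TERMINALS',
            'alias', 'any', 'endsem', 'eps', 'in', 'out', 'sem'}


def tokens(text):
    """Yield (class, lexeme) pairs in the spirit of cocolex.GetSy."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch in ' \n':                       # blanks and line ends
            i += 1
        elif ch in string.ascii_letters:      # letter: name or keyword
            j = i
            while j < n and (text[j] in string.ascii_letters
                             or text[j] in string.digits):
                j += 1
            word = text[i:j]
            yield ('keyword' if word in KEYWORDS else 'ident'), word
            i = j
        elif ch in string.digits:             # digit: number
            j = i
            while j < n and text[j] in string.digits:
                j += 1
            yield 'number', text[i:j]
            i = j
        elif ch in '\'"':                     # string up to the same quote
            j = i + 1
            while j < n and text[j] not in (ch, '\n'):
                j += 1
            if j == n or text[j] == '\n':     # cf. SemErr(23) / SemErr(24)
                raise SyntaxError('unterminated string')
            yield 'string', text[i:j + 1]
            i = j + 1
        elif text.startswith('--', i):        # comment up to end of line
            j = text.find('\n', i)
            i = n if j < 0 else j
        elif ch in SINGLE:
            yield SINGLE[ch], ch
            i += 1
        else:                                 # class none -> nococosy
            yield 'none', ch
            i += 1
```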
1 (* cocolex     Lexical analyzer for coco            Moe 83.03.27
2    =======     =========================                83.12.23
3    This is the Coco-scanner. It
4    a) reads the input grammar
5    b) returns symbol numbers and terminal attributes to the parser
6    c) hashes names and strings into a name list (permanently or
7       temporarily)
8    d) converts number-strings to values
9    All symbols which are not terminals of Cocol get the symbol type
10   'nococosy' and are hashed into the name list.
11 *)
12 IMPLEMENTATION MODULE cocolex;
13 FROM cocosyn IMPORT printinput, printnodes;
14 FROM Errors  IMPORT SemErr, Restriction;
15 FROM FileIO  IMPORT con, EF, EOL, File,
16                     Read, Write, WriteCard, WriteString, WriteText;
17 FROM SYSTEM  IMPORT VAL;
18
19 CONST
20   eofsy       =  0;      (*lexical types*)
21   ident       = 17;      (*numbers 1..16 reserved for keywords*)
22   string      = 18;
23   number      = 19;
24   eqlsy       = 20;
25   periodsy    = 21;
26   variantsy   = 22;
27   lparsy      = 23;      (* ( *)
28   rparsy      = 24;      (* ) *)
29   lbracksy    = 25;      (* [ *)
30   rbracksy    = 26;      (* ] *)
31   lconbrsy    = 27;
32   rconbrsy    = 28;
33   latparsy    = 29;
34   ratparsy    = 30;
35   semicolonsy = 31;
36   colonsy     = 32;
37   commasy     = 33;
38   nococosy    = 34;
39   notyp       = 255;
40
41   buflen      = 1024*16;
42
43 TYPE
44   Charclass =
45     (none,letter,digit,quote,eql,period,variant,lpar,rpar,lbrack,
46      rbrack,lconbr,rconbr,latpar,ratpar,semicolon,colon,comma,endfile,
47      endline,dollar,minus);
48
49 VAR
50   c: CHAR;
51   class: ARRAY [0C..377C] OF Charclass;  (*class OF input character*)
52   buf: ARRAY[0..buflen-1] OF CHAR;        (*input buffer*)
53   bp,bpmax: CARDINAL;                     (*buffer pointers*)
54
55
56 CONST
57   idmax = 4980;   (*max. length of identifier list*)
58   htmax = 359;    (*max. length of hash table*)
59
60 VAR
61   ch:      CHAR;                        (*current input character*)
62   column:  CARDINAL;                    (*start of current column*)
63   i:       CARDINAL;
64   id:      ARRAY[0..idmax+20] OF CHAR;  (*identifiers*)
65   idact:   CARDINAL;                    (*last element IN id*)
66   keys:    CARDINAL;                    (*pos. OF last keyword IN id*)
67   ht:      ARRAY[0..htmax] OF CARDINAL; (*hash table*)
68   storeid: BOOLEAN;                     (*store id. permanently?*)
69
70
71 (* NextCh   Get next input character (ch,column global)
72 *)
73 PROCEDURE NextCh;
74 BEGIN
75   Read(src,ch); INC(column);
76 END NextCh;
77
78
79 (* Hash   Hash an identifier and return its spix
80 *)
81 PROCEDURE Hash(idp:CARDINAL; VAR spix:CARDINAL);
82 VAR h,l,d: INTEGER;
83
84   PROCEDURE EqualId(x,y,l:CARDINAL):BOOLEAN;
85   VAR i: CARDINAL;
86   BEGIN
87     i:=0;
88     WHILE (i<l) AND (id[x+i]=id[y+i]) DO INC(i); END;
89     RETURN i=l;
90   END EqualId;
91
92 BEGIN
93   l:=idp-idact; spix:=idact+1;
94   h:=(ORD(id[spix])*7 + ORD(id[spix+1]) + 1) * 17 MOD htmax;
95   d:= -htmax;
96   LOOP
97     IF ht[h]=0 THEN                            (*new identifier*)
98       IF storeid THEN ht[h]:=spix; idact:=idp; END;
99       EXIT;
100    ELSIF EqualId(ht[h],spix,l) THEN           (*old identifier*)
101      spix:=ht[h]; EXIT;
102    ELSE                                       (*collision*)
103      INC(d,2);
104      IF d=htmax THEN Restriction(1); END;     (*hash table full*)
105      h:=(h+ABS(d)) MOD htmax;
106    END;
107  END; (*LOOP*)
108  IF idp>idmax THEN Restriction(2); END;       (*identifier list full*)
109 END Hash;
110
111
112 (* EnterKey   Enter a keyword to the identifier list
113 *)
114 PROCEDURE EnterKey(sy:CARDINAL; key:ARRAY OF CHAR);
115 VAR idp,i: INTEGER;
116 BEGIN
117   INC(idact); id[idact]:=CHR(sy); idp:=idact; (*store symbol number*)
118   FOR i:=0 TO HIGH(key) DO                    (*store keyword*)
119     INC(idp); id[idp]:=key[i];
120   END;
121   INC(idp); id[idp]:=0C;
122   Hash(idp,keys); (*keys contains the last keyword spix at any time*)
123 END EnterKey;
124
125 (* GetName   Get the name of an identifier from the name list
126 *)
127
128 PROCEDURE GetName(spix:CARDINAL;VAR name:ARRAY OF CHAR;VAR l:CARDINAL);
129 VAR i,h:CARDINAL;
130 BEGIN
131   i:=spix; l:=0; h:=HIGH(name);
132   WHILE (id[i]<>0C) AND (l<=h) DO
133     name[l]:=id[i]; INC(i); INC(l);
134   END;
135 END GetName;
136
137
138 (* ReadName   Read identifier or keyword
139 *)
140 PROCEDURE ReadName(VAR typ,val:CARDINAL);
141 VAR spix,idp: CARDINAL;
142 BEGIN
143   idp:=idact;
144   WHILE (class[ch]=letter) OR (class[ch]=digit) DO
145     INC(idp); id[idp]:=ch; NextCh;
146   END;
147   INC(idp); id[idp]:=0C;
148   Hash(idp,spix);
149   IF spix<=keys
150   THEN typ:=ORD(id[spix-1]); val:=0;  (*keyword*)
151   ELSE typ:=ident; val:=spix;         (*identifier*)
152   END;
153 END ReadName;
154
155
156 (* ReadString   Read and hash a string
157 *)
158 PROCEDURE ReadString(VAR spix:CARDINAL);
159 VAR
160   och: CHAR;
161   idp: CARDINAL;
162 BEGIN
163   idp:=idact; och:=ch;
164   INC(idp); id[idp]:=och; NextCh; (*store quote*)
165   LOOP
166     IF ch=och THEN NextCh; EXIT;
167     ELSIF ch=EF THEN SemErr(24,line,col); EXIT;
168     ELSIF ch=EOL THEN SemErr(23,line,col); EXIT;
169     ELSE INC(idp); id[idp]:=ch; NextCh; END;
170   END;
171   INC(idp); id[idp]:=och; (*store quote*)
172   INC(idp); id[idp]:=0C;
173   Hash(idp,spix)
174 END ReadString;
175
176
177 (* RestartHash   Causes identifiers to be stored permanently
178 *)
179 PROCEDURE RestartHash;
180 BEGIN storeid:=TRUE; END RestartHash;
181
182
183 (* StopHash   Causes identifiers to be stored temporarily
184 *)
185 PROCEDURE StopHash;
186 BEGIN storeid:=FALSE; END StopHash;
187
188
189 (* ReadNumber   Read and convert cardinal constant
190 *)
191 PROCEDURE ReadNumber(VAR val:CARDINAL);
192 BEGIN
193   val:=0;
194   WHILE class[ch]=digit DO
195     IF (val>6553) OR ((val=6553) AND (ch>'5'))
196     THEN
197       SemErr(22,line,col);
198       WHILE class[ch]=digit DO NextCh; END;
199     ELSE
200       val:=10*val+VAL(CARDINAL,ORD(ch)-ORD('0'));
201       NextCh;
202     END;
203   END;
204 END ReadNumber;
205
206
207 (* GetSy   Get next lexical symbol
208 *)
209 PROCEDURE GetSy;
210 VAR val:CARDINAL;
211 BEGIN
212   REPEAT
213     WHILE ch=' ' DO NextCh; END;
214     col:=column;
215     CASE class[ch] OF
216       none:      typ:=nococosy; at[1]:=ORD(ch); NextCh;
217     | letter:    ReadName(typ,val);
218                  IF typ=ident THEN at[1]:=val; END;
219     | digit:     ReadNumber(at[1]); typ:=number;
220     | quote:     ReadString(at[1]); typ:=string;
221     | eql:       typ:=eqlsy; NextCh;
222     | period:    typ:=periodsy; NextCh;
223     | variant:   typ:=variantsy; NextCh;
224     | lpar:      typ:=lparsy; NextCh;
225     | rpar:      typ:=rparsy; NextCh;
226     | lbrack:    typ:=lbracksy; NextCh;
227     | rbrack:    typ:=rbracksy; NextCh;
228     | lconbr:    typ:=lconbrsy; NextCh;
229     | rconbr:    typ:=rconbrsy; NextCh;
230     | latpar:    typ:=latparsy; NextCh;
231     | ratpar:    typ:=ratparsy; NextCh;
232     | semicolon: typ:=semicolonsy; NextCh;
233     | colon:     typ:=colonsy; NextCh;
234     | comma:     typ:=commasy; NextCh;
235     | endfile:   typ:=eofsy;
236     | endline:   typ:=notyp;
237                  column:=0; INC(line); NextCh;
238                  IF (line MOD 16)=0 THEN (*update counter on screen*)
239                    IF line>16 THEN
240                      FOR i:=1 TO 5 DO Write(con,10C) END;
241                    END;
242                    WriteCard(con,line,5)
243                  END;
244     | dollar:    NextCh;
245                  IF CAP(ch)="D" (*debug option*)
246                  THEN
247                    NextCh;
248                    WHILE (CAP(ch)>="A") AND (CAP(ch)<="Z") DO
249                      ddt[CAP(ch)]:=TRUE; NextCh
250                    END;
251                    IF ddt["A"] THEN printinput:=TRUE END;
252                    IF ddt["B"] THEN printnodes:=TRUE END;
253                    WHILE ch<>EOL DO NextCh; END;
254                    typ:=notyp;
255                  ELSE typ:=nococosy; at[1]:=ORD('$');
256                  END;
257     | minus:     NextCh;
258                  IF ch='-'
259                  THEN
260                    WHILE ch<>EOL DO NextCh; END;
261                    typ:=notyp;
262                  ELSE typ:=nococosy; at[1]:=ORD('-');
263                  END;
264     END; (*CASE*)
265   UNTIL typ<>notyp;
266 END GetSy;
267
268
269 BEGIN (*cocolex*)
270   FOR c:="A" TO "Z" DO ddt[c]:=FALSE END;
271   FOR c:=0C TO 377C DO class[c]:=none; END;
272   FOR c:='a' TO 'z' DO class[c]:=letter; END;
273   FOR c:='A' TO 'Z' DO class[c]:=letter; END;
274   FOR c:='0' TO '9' DO class[c]:=digit; END;
275   class[EF]:=endfile;     class[EOL]:=endline;
276   class["'"]:=quote;      class['$']:=dollar;   class['"']:=quote;
277   class['(']:=lpar;       class[')']:=rpar;     class[',']:=comma;
278   class['-']:=minus;      class['.']:=period;   class[':']:=colon;
279   class[';']:=semicolon;  class['<']:=latpar;   class['=']:=eql;
280   class['>']:=ratpar;     class['[']:=lbrack;   class[']']:=rbrack;
281   class['{']:=lconbr;     class['|']:=variant;  class['}']:=rconbr;
282
283   column:=0; col:=0; line:=1; ch:=
284
285   FOR i:=0 TO htmax-1 DO ht[i]:=0; END;
286   storeid:=TRUE;
287   id[0]:="E"; id[1]:="O"; id[2]:="F"; id[3]:=0C;
288   idact:=3;
289   EnterKey( 1,'ALIAS');          EnterKey( 1,'alias');
290   EnterKey( 2,'ANY');            EnterKey( 2,'any');
291   EnterKey( 3,'DECLARATIONS');
292   EnterKey( 4,'ENDGRAM');
293   EnterKey( 5,'ENDSEM');         EnterKey( 5,'endsem');
294   EnterKey( 6,'EPS');            EnterKey( 6,'eps');
295   EnterKey( 7,'GRAMMAR');
296 EnterKey( 8,'IN'); EnterKey( 8,'in');
297 EnterKey( 9,'MACROS');
298 EnterKey(10,'NONTERMINALS');
299 EnterKey(11,'OUT'); EnterKey(11,'out');
300 EnterKey(12,'PRAGMAS');
301 EnterKey(13,'RULES');
302 EnterKey(14,'SEM'); EnterKey(14,'sem');
303 EnterKey(15,'SEMANTIC');
304 EnterKey(16,'TERMINALS');
305 END cocolex.
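Hash in cocolex resolves collisions by open addressing. The variable d starts at -htmax and grows by 2 on every collision, and the next slot is (h+ABS(d)) MOD htmax. For a prime table size of the form 4k+3, such as htmax = 359 = 4·89+3, this probe sequence visits every slot before d reaches htmax. Restriction(1) therefore fires only when the table really is full.

The Python sketch below reproduces the probe sequence and a small name table built on it. The home-slot function is modelled on the listing's formula. Storing the name itself in the slot, instead of its spelling index into id, is a simplification. NameTable and probe_sequence are hypothetical names.

```python
HTMAX = 359  # table size used by cocolex (a prime of the form 4k+3)


def probe_sequence(h0, htmax=HTMAX):
    """Slots visited by cocolex.Hash's collision loop, starting at h0."""
    slots = [h0]
    h, d = h0, -htmax
    while True:
        d += 2                      # INC(d,2)
        if d == htmax:              # Restriction(1): hash table full
            return slots
        h = (h + abs(d)) % htmax    # h:=(h+ABS(d)) MOD htmax
        slots.append(h)


class NameTable:
    """Open-addressing name table with the same collision strategy."""

    def __init__(self, htmax=HTMAX):
        self.htmax = htmax
        self.ht = [None] * htmax    # None marks an empty slot (ht[h]=0)
        self.storeid = True         # switched by RestartHash / StopHash

    def home(self, name):
        # Modelled on the listing: first two characters (0 if missing)
        c1 = ord(name[0])
        c2 = ord(name[1]) if len(name) > 1 else 0
        return (c1 * 7 + c2 + 1) * 17 % self.htmax

    def lookup(self, name):
        """Return the slot of name, entering it only if storeid is set."""
        h, d = self.home(name), -self.htmax
        while True:
            if self.ht[h] is None:           # new identifier
                if self.storeid:
                    self.ht[h] = name
                return h
            if self.ht[h] == name:           # old identifier
                return h
            d += 2                           # collision
            if d == self.htmax:
                raise OverflowError("hash table full")
            h = (h + abs(d)) % self.htmax
```

For htmax = 7, the sequence starting at slot 0 is 0, 5, 1, 2, 3, 6, 4: each slot appears exactly once.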
ABS 105
at 216 218 219 220 255 262
bp 53
bpmax 53
buf 52
buflen 41 52
C 51 51 121 132 147 172 240 271 271 287
c 50 270 270 271 271 272 272 273 273 274 274
CAP 245 248 248 249
ch 61 75 144 144 145 163 166 167 168 169 194 195
198 200 213 215 216 245 248 248 249 253 258 260
283
Charclass 44 51
class 51 144 144 194 198 215 271 272 273 274 275 275
276 276 276 277 277 277 278 278 278 279 279 279
280 280 280 281 281 281
cocolex 12 305
cocosyn 13
col 167 168 197 214 283
colon 46 233 278
colonsy 36 233
column 62 75 214 237 283
comma 46 234 277
commasy 37 234
con 15 240 242
d 82 95 103 104 105
ddt 249 251 252 270
digit 45 144 194 198 219 274
dollar 47 244 276
EF 15 167 275
endfile 46 235 275
endline 47 236 275
EnterKey 114 123 289 289 290 290 291 292 293 293 294 294
295 296 296 297 298 299 299 300 301 302 302 303
304
eofsy 20 235
EOL 15 168 253 260 275
eql 45 221 279
eqlsy 24 221
EqualId 84 90 100
Errors 14
File 15
FileIO 15
GetName 128 135
GetSy 209 266
h 82 94 97 98 100 101 105 105 129 131 132
Hash 81 109 122 148 173
HIGH 118 131
ht 67 97 98 100 101 285
htmax
i
id
idact
ident
idmax
idp
key
keys
l
latpar
latparsy
lbrack
lbracksy
lconbr
lconbrsy
letter
line
lpar
lparsy
minus
name
NextCh
nococosy
none
notyp
number
och
period
periodsy
printinput
printnodes
quote
ratpar
ratparsy
rbrack
rbracksy
rconbr
rconbrsy
Read
ReadName
ReadNumber
ReadString
RestartHash
Restriction
rpar
rparsy
SemErr
semicolon
semicolonsy
spix
src
StopHash
storeid
string
sy
SYSTEM
typ
val
VAL
variant
variantsy
Write
WriteCard
WriteString
WriteText
x
y
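ReadNumber in cocolex checks for overflow before each step rather than after it. With 16-bit CARDINALs the largest value is 65535, and 10*val+d exceeds it exactly when val > 6553, or when val = 6553 and the next digit is greater than 5. The test is presumably written this way because computing the product first would already wrap around. A Python sketch of the same check follows; read_number and MAXCARD are illustrative names.

```python
MAXCARD = 65535  # largest value of a 16-bit CARDINAL


def read_number(digits):
    """Convert a string of decimal digits the way ReadNumber does.

    Returns (value, overflow). On overflow the value read so far is
    returned; ReadNumber instead reports SemErr(22) and skips the rest.
    """
    val = 0
    for ch in digits:
        # 10*val + d would exceed 65535 exactly when
        # val > 6553, or val = 6553 and d > 5
        if val > MAXCARD // 10 or (val == MAXCARD // 10 and ch > '5'):
            return val, True
        val = 10 * val + (ord(ch) - ord('0'))
    return val, False
```

For example, "65535" is accepted, while "65536" is rejected as soon as its last digit is seen.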
(* cocolst   Prints listing of Cocol text   Moe 16.8.87
   =======   ============================
   This module closes the source file and reopens it for reading. It prints
   a listing of the source file with line numbers and error messages.
*)
DEFINITION MODULE cocolst;
FROM FileIO IMPORT File;

VAR lst: File; (*list file*)

PROCEDURE PrintListing;

END cocolst.
1 (* cocolst   Prints listing of Cocol text   Moe 16.8.87
2    =======   ============================
3    This module closes the source file and reopens it for reading. It prints
4    a listing of the source file with line numbers and error messages.
5 *)
6 IMPLEMENTATION MODULE cocolst;
7 FROM cocolex IMPORT src;
8 FROM Errors  IMPORT Errorptr, GetNextSynErr, GetNextSemErr, PrintSynError;
9 FROM FileIO  IMPORT File, EF, EOL, Open, Close, Read, Write,
10                    WriteString, WriteCard, WriteLn;
11
12
13 (* GetLine   Read a source line. Return empty line if eof.
14 *)
15 PROCEDURE GetLine(f:File; VAR line:ARRAY OF CHAR);
16 VAR ch:CHAR; i:CARDINAL;
17 BEGIN
18   Read(f,ch); i:=0;
19   WHILE (ch<>EOL) AND (ch<>EF) DO line[i]:=ch; INC(i); Read(f,ch) END;
20   IF (i=0) AND (ch=EF) THEN line[0]:=EF ELSE line[i]:=0C END;
21 END GetLine;
22
23
24 (* PrintSemError   Print semantic error message
25 *)
26 PROCEDURE PrintSemError(f:File; nr,col:CARDINAL);
27 VAR i:CARDINAL;
28 BEGIN
29   WriteString(f,"***** "); FOR i:=1 TO col-1 DO Write(f," ") END;
30   WriteString(f,"^ ");
31   CASE nr OF
32      1: WriteString(f,"Symbol declared twice");
33   |  2: WriteString(f,"Grammar name is no nonterminal");
34   |  3: WriteString(f,"Undeclared symbol");
35   |  4: WriteString(f,"Terminal on left-hand side of rule");
36   |  5: WriteString(f,"Two rules for the same nonterminal");
37   |  6: WriteString(f,"Wrong number of attributes");
38   |  7: WriteString(f,"In-attribute for a terminal");
39   |  8: WriteString(f,"Wrong attribute direction");
40   |  9: WriteString(f,"Wrong attribute name");
41   | 10: WriteString(f,"Attribute constant on left-hand side of rule");
42   | 11: WriteString(f,"Semantic macro declared twice");
43   | 12: WriteString(f,"Undeclared semantic macro");
44   | 16: WriteString(f,"Pragma used in rules");
45   | 21: WriteString(f,"File 'cocosynframe' not found");
46   | 22: WriteString(f,"Number too big");
47   | 23: WriteString(f,"End of line in string");
48   | 24: WriteString(f,"End of file in string");
49   | 25: WriteString(f,"File 'cocosemframe' not found");
50   ELSE WriteString(f,"Error");
51   END;
52   WriteLn(f);
53 END PrintSemError;
54
55
56 (* PrintListing   Print a source list with error messages
57 *)
58 PROCEDURE PrintListing;
59 VAR
60   volRef: INTEGER;              (*volume or directory of source file*)
61   srcn: ARRAY[0..63] OF CHAR;   (*source name*)
62   line: ARRAY[0..255] OF CHAR;  (*source line*)
63   symbols: Errorptr;            (*pointer to error symbols*)
64   synline,syncol: CARDINAL;     (*line and column of syntax error*)
65   semnr: CARDINAL;              (*semantic error number*)
66   semline,semcol: CARDINAL;     (*line and column of semantic error*)
67   lnr: CARDINAL;                (*line number*)
68   sync,semc: CARDINAL;          (*error counters*)
69   i: CARDINAL;
70 BEGIN
71   volRef:=src^.volRef;
72   i:=0; REPEAT srcn[i]:=src^.name[i]; INC(i) UNTIL srcn[i-1]=0C;
73   Close(src); Open(src,volRef,srcn,FALSE);
74   GetNextSemErr(semnr,semline,semcol);
75   GetNextSynErr(symbols,synline,syncol);
76   GetLine(src,line); lnr:=1; semc:=0; sync:=0;
77   WHILE line[0]<>EF DO
78     WriteCard(lst,lnr,5); WriteString(lst," ");
79     WriteString(lst,line); WriteLn(lst);
80     WHILE synline=lnr DO
81       PrintSynError(lst,symbols,syncol); INC(sync);
82       GetNextSynErr(symbols,synline,syncol);
83     END;
84     WHILE semline=lnr DO
85       PrintSemError(lst,semnr,semcol); INC(semc);
86       GetNextSemErr(semnr,semline,semcol);
87     END;
88     GetLine(src,line); INC(lnr);
89   END;
90   WHILE symbols<>NIL DO
91     PrintSynError(lst,symbols,syncol); INC(sync);
92     GetNextSynErr(symbols,synline,syncol);
93   END;
94   WHILE semnr<>0 DO
95     PrintSemError(lst,semnr,semcol); INC(semc);
96     GetNextSemErr(semnr,semline,semcol);
97   END;
98   WriteLn(lst);
99   WriteCard(lst,sync,5); WriteString(lst," syntax error(s)$");
100  WriteCard(lst,semc,5); WriteString(lst," semantic error(s)$$");
101 END PrintListing;
102
103 END cocolst.
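PrintListing merges three sequences, each ordered by line number: the source lines, the syntax errors and the semantic errors. After each source line it prints every error recorded for that line. Errors left over at the end of the file are printed afterwards, followed by the two counts.

The Python sketch below assumes the error lists are given as sorted (line, col, message) triples. The exact spacing of the "*****" marker in the book is not recoverable, so the spacing here is an assumption, chosen so that the caret lines up under the reported column. print_listing and error_line are hypothetical names.

```python
def error_line(col, msg):
    # "*****" plus one blank is as wide as the line-number field below,
    # so the caret lands under column col of the source text
    return "***** " + " " * (col - 1) + "^ " + msg


def print_listing(lines, syn_errors, sem_errors):
    """Merge source lines with error messages, in the manner of PrintListing.

    syn_errors and sem_errors are lists of (line, col, msg) sorted by line;
    they stand in for GetNextSynErr and GetNextSemErr.
    """
    out = []
    si = mi = 0
    for lnr, text in enumerate(lines, start=1):
        out.append(f"{lnr:5} {text}")
        while si < len(syn_errors) and syn_errors[si][0] == lnr:
            out.append(error_line(*syn_errors[si][1:]))
            si += 1
        while mi < len(sem_errors) and sem_errors[mi][0] == lnr:
            out.append(error_line(*sem_errors[mi][1:]))
            mi += 1
    # Errors that belong to no printed line (e.g. at end of file)
    out.extend(error_line(*e[1:]) for e in syn_errors[si:])
    out.extend(error_line(*e[1:]) for e in sem_errors[mi:])
    out.append("")
    out.append(f"{len(syn_errors):5} syntax error(s)")
    out.append(f"{len(sem_errors):5} semantic error(s)")
    return out
```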
C
ch
Close
cocolex
cocolst
col
EF
EOL
Errorptr
Errors
f
File
FileIO
GetLine
GetNextSemErr
GetNextSynErr
i
line
lnr
lst
name
nr
Open
PrintListing
PrintSemError
PrintSynError
Read
semc
semcol
semline
semnr
src
srcn
symbols
sync
syncol
synline
volRef
Write
WriteCard
WriteLn
WriteString
(* Generated semantic analyzer
   This module is produced by Coco from the semantic actions of an
   attributed grammar.
*)
DEFINITION MODULE cocosem;
VAR printactions: BOOLEAN; (*trace the executed semantic actions*)
PROCEDURE Semant(sem:CARDINAL);
END cocosem.
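The implementation of cocosem contains a local module SEMANTICSTACK, a bounded stack of at most 70 CARDINALs. Semantic action 19 pushes gp, gl, dd, gp1, gl1, dd1 and firstfact, and action 18 pops them in the reverse order. This appears to preserve the partially built graph of an enclosing construct while a nested one is being built.

The Python sketch below models that stack and the save/restore pattern. Exceptions stand in for Restriction(14) on overflow and CompErr(6) on underflow. SemanticStack, save and restore are hypothetical names.

```python
MAXSTACKSIZE = 70  # bound used by the generated SEMANTICSTACK


class SemanticStack:
    """Bounded stack of CARDINALs, after the local module SEMANTICSTACK."""

    def __init__(self, size=MAXSTACKSIZE):
        self.size = size
        self.items = []

    def push(self, x):
        if len(self.items) >= self.size:
            raise OverflowError("stack full (Restriction(14))")
        self.items.append(x)

    def pop(self):
        if not self.items:
            raise IndexError("stack empty (CompErr(6))")
        return self.items.pop()


def save(stack, gp, gl, dd, gp1, gl1, dd1, firstfact):
    # Like semantic action 19: BOOLEANs travel as 0/1 (VAL(CARDINAL,dd))
    for v in (gp, gl, int(dd), gp1, gl1, int(dd1), int(firstfact)):
        stack.push(v)


def restore(stack):
    # Like semantic action 18: pop in the reverse order
    firstfact = bool(stack.pop())
    dd1 = bool(stack.pop()); gl1 = stack.pop(); gp1 = stack.pop()
    dd = bool(stack.pop()); gl = stack.pop(); gp = stack.pop()
    return gp, gl, dd, gp1, gl1, dd1, firstfact
```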
1
2 (* Generated semantic analyzer
3 ===========================
4 This module is produced by Coco from the semantic actions of an
5 attributed grammar.
6 *)
7 IMPLEMENTATION MODULE cocosem;
8 FROM FileIO IMPORT con, WriteCard, WriteString;
9 FROM SYSTEM IMPORT WORD;
10 FROM cocolex IMPORT at;
11
12
13 FROM cocogen IMPORT Attrtype,CloseFile,Copy,EmitAction,GenAssign,
14 InsertFramePart,OpenFile,OpenSem,StartCopy;
15 FROM cocogra IMPORT alts,rules,rootloc,ConcatLeft,ConcatRight,
16 GetNode,GraphList,Graphnode,NewNode,RepNode;
17 FROM cocolex IMPORT typ,line,col,ddt,RestartHash,StopHash;
18 FROM cocosym IMPORT gramspix,CompleteAt,Direction,
19 GetAt,GetMacroNr,GetSy,NewAt,NewMacro,
20 NewSy,RepSy,Symbolnode,Symboltype,SyNr;
21 FROM Errors IMPORT CompErr,Restriction,SemErr;
22 FROM SYSTEM IMPORT VAL;
23 CONST null=65535;
24 TYPE Usage=(def,check,use);
25 VAR sn:Symbolnode;
26 sy,sy1:CARDINAL;
27 rootsy:CARDINAL;
28 eofsy:CARDINAL;
29 gn:Graphnode;
30 gp,gp1,gp2,gp3:CARDINAL;
31 gl,gl1,gl2,gl3:CARDINAL;
32 dd,dd1,dd2:BOOLEAN;
33 gpo:CARDINAL;
34 firstfact:BOOLEAN;
35 kind:Usage;
36 styp:Symboltype;
37 dir,dir1:Direction;
38 count:CARDINAL;
39 n:CARDINAL;
40 sem1,sem2,sem3:CARDINAL;
41 firstsymbol:BOOLEAN;
42 ok:BOOLEAN;
43 spix,spix1:CARDINAL;
44 dummy: CARDINAL;
45 MODULE SEMANTICSTACK;
46 IMPORT CompErr,Restriction;
47 EXPORT Pop,Push;
48 CONST maxstacksize=70;
49 VAR stack:ARRAY[1..maxstacksize]OF CARDINAL;
50 sp:CARDINAL;
51 PROCEDURE Pop():CARDINAL;
52 VAR x:CARDINAL;
53 BEGIN IF sp=0 THEN CompErr(6);ELSE x:=stack[sp];DEC(sp);END;
54 RETURN x;
55 END Pop;
56 PROCEDURE Push(x:CARDINAL);
57 BEGIN IF sp<maxstacksize
58 THEN INC(sp);stack[sp]:=x;
59 ELSE Restriction(14);
60 END;
61 END Push;
62 BEGIN sp:=0;
63 END SEMANTICSTACK;
64 PROCEDURE Error(nr:CARDINAL);
65 BEGIN SemErr(nr,line,col);END Error;
66
67 PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
68 BEGIN
69 x:=y;
70 END ASSIGN;
71
72 PROCEDURE Semant (sem: CARDINAL);
73 BEGIN
74 (*IF printactions THEN
75     WriteString(con,"$ [");
76     WriteCard(con,sem,3);
77     WriteString(con,"] ");
78   END;
79   *)
80   CASE sem OF
81     11: ;
82   | 12: (*line 125*)
83     INC(count);
84     CASE kind OF
85       use:
86         IF styp=nt THEN
87           GetAt(sy,count,spix1,dir1);
88           IF spix1<>0 THEN
89             IF dir=dir1
90             THEN GenAssign(nonterm,spix1,spix);
91             ELSE Error(8);END;
92           END;
93         END;
94     | check:
95         IF styp=nt THEN
96           GetAt(sy,count,spix1,dir1);
97           IF spix1<>0 THEN
98             IF spix<>spix1 THEN Error(9);END;
99             IF dir<>dir1 THEN Error(8);END;
100          END;
101        END;
102    | def:
103        NewAt(sy,spix,dir);
104    END;
105  | 13: (*line 150*)
106    INC(count);
107    CASE kind OF
108      use:
109        IF styp=t THEN
110          GenAssign(term,spix,count);
111        ELSIF styp=nt THEN
112          GetAt(sy,count,spix1,dir1);
113          IF spix1<>0 THEN
114            IF dir=dir1
115            THEN GenAssign(nonterm,spix,spix1)
116            ELSE Error(8);
117            END;
118          END;
119        END;
120    | check:
121        IF styp=nt THEN
122          GetAt(sy,count,spix1,dir1);
123          IF spix1<>0 THEN
124            IF spix<>spix1 THEN Error(9);END;
125            IF dir<>dir1 THEN Error(8);END;
126          END;
127        END;
128    | def:
129        NewAt(sy,spix,dir);
130        IF styp=pr THEN
131          GenAssign(term,spix,count);
132        END;
133    END;
134  | 14: (*line 181*)
135    INC(count);
136    IF kind=use
137    THEN IF styp=nt THEN
138           GetAt(sy,count,spix1,dir1);
139           IF spix1<>0 THEN
140             IF dir<>dir1
141             THEN GenAssign(const,spix1,n);
142             ELSE Error(8);
143             END;
144           END;
145         END;
146    ELSE Error(10);
147    END;
148  | 15: (*line 198*)
149    IF NOT CompleteAt(sy,count)THEN
150      Error(6);
151    END;
152  | 16: (*line 204*)
153    Copy(typ,col)
154  | 17: (*line 208*)
155    StartCopy(1)
156  | 18: (*line 212*)
157    firstfact:=VAL(BOOLEAN,Pop());
158    dd1:=VAL(BOOLEAN,Pop());gl1:=Pop();gp1:=Pop();
159    dd:=VAL(BOOLEAN,Pop());gl:=Pop();gp:=Pop();
160    gpo:=0
161  | 19: (*line 219*)
162    Push(gp);Push(gl);Push(VAL(CARDINAL,dd));
163    Push(gp1);Push(gl1);Push(VAL(CARDINAL,dd1));
164    Push(VAL(CARDINAL,firstfact));
165  | 20: (*line 225*)
166    sy:=SyNr(spix);
167    IF sy=null
168    THEN sy:=NewSy(spix,styp)
169    ELSE Error(1);
170    END;
171  | 21: (*line 349*)
172    ASSIGN(gramspix,at[1]);
173  | 22: (*line 349*)
174    rules:=0;alts:=0;
175    OpenFile(gramspix);StopHash;
176  | 23: (*line 357*)
177    RestartHash;
APPF
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
cocosemMOD
InsertFramePart;styp:=t;
(*line 363*)
eofsy:=NewSy(0,t)
(*line 365*)
styp:=t;
kind:=def;
(*line 368*)
styp:=pr
(*line 370*)
styp:=pr;
kind:=def;
(*line 371*)
GetSy(sy,sn);sn.semi :=sem2;
RepSy(sy,sn);
(*line 376*)
GetSy(sy,sn);sn.sem2:=sem3;
RepSy(sy,sn);
(*line 382*)
styp:=nt
(*line 383*)
ASSIGN(spix,at[1]);
(*line 384*)
styp:=nt;
kind:=def;
(*line 386*)
rootsy:=SyNr(gramspix);
IF rootsy=null THEN Error(2);END;
(*line 390*)
sy:=SyNr(spix);
IF sy=null THEN
Error(3);sy:=NewSy(spix,err)
END;
GetSy(sy,sn);
IF(sn.typont)AND(sn.typoerr)THEN
Error(4);
END;
IF sn.startoO THEN Error(5);END;
syl:=sy;count:=0;styp:=sn.typ
(*line 401*)
kind:=check;
(*line 404*)
GetSy(syl,sn);
sn.start:=gp;sn.del:=dd;
RepSy(syl,sn);
INC(rules);
(*line 410*)
rootloc:=NewNode(nt,rootsy, 0);
gpl:=NewNode(t,eofsy, 0) ;
gl:=rootloc;gll:=gpl;
ConcatRight(rootloc,gl,gpl,gll)
(*line 415*)
IF ddt["L"]THEN GraphLlst;END;
CloseFile;
(*line 420*)
gp:=gpl;
gl:-gll;
dd:=ddl;
(*line 420*)
INC(alts);
291
292
Program listings
%♦*
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
I 41:
I 42:
I 43:
1 44:
1 45:
1 46:
I 47:
! 48:
1 • 49:
1 50:
1 51:
I 52:
I 53:
1 54:
(*llne 422*)
INC(alts);
ConcatLeft(gp,glrgpl,gll);
dd:=dd OR ddl
(*llne 429*)
gpo:=0
(*llne 430*)
flrstfact:-TRUE;
(*line 430*)
gpl:=gp2;
gll:-gl2;
ddl:=dd2;
(*line 431*)
firstfact:=FALSE;
(*llne 432*)
IF gp2<>0 THEN
ConcatRight(gpl,gll,gp2,gl2);
ddl:=ddl AND dd2;
END;
(*line 440*)
sy:=SyNr(spix);
IF sy=null THEN
Error(3);sy:=NewSy(spixrerr)
END;
GetSy(sy,sn);
IF sn.typ=pr THEN Error(16);END;
gp2:=NewNode(sn.typ,syrline) ;
gl2:=gp2;dd2:=FALSE;gpo:=gp2;
count:=0;styp:=sn.typ
(*line 450*)
kind:=use;
(*line 451*)
GetNode(gp2,gn);
gn.semi:=seml;gn.sem2:-sem2;
RepNode(gp2,gn)
(*line 456*)
gp2:=NewNode(eps,0,line);
gl2:=gp2;dd2:=TRUE;gpo:=gp2
(*line 459*)
gp2:=NewNode(any,0,line);
gl2:=gp2;dd2:=FALSE;gpo:=gp2
(*line 462*)
IF gpo=0
THEN gp2:=NewNode(eps, 0 r1ine) ;
gl2:=gp2;dd2:=TRUE;
GetNode(gp2,gn);gn.sem3:=sem
RepNode(gp2rgn);
ELSE GetNode(gpo,gn);gn.sem3:=
RepNode(gpo,gn);
gp2:=0;gl2:=0;gpo:=0
END;
(*line 475*)
gp2:=gp;
gl2:=gl;
dd2:=dd;
(*llne 478*)
gp2:=NewNode(eps,0,line);
gl2:=gp2;
ConcatLeft(gp,gl,gp2,gl2);
APP-
cocosemMOD
293
gp2:=gp;gl2:=gl;dd2:=TRUE;
?07 I 55: (*llne 485*)
*?' gp2:=NewNode(eps,0,line) ;
ZZ gl2:=gp2;
*:0 concatRlght(gpr gl,gp,gl);
;" ConcatLeft(gp,gl,gp2,gl2);
,no gp2:=gp;dd2:-TRUE;
5«3 | 56: (*llne 493*)
304 IF firstfact THEN
305 gp3:=gp2;gl3:=gl2;
306 gp2:=NewNode(eps,0,1lne); gl2:=gp2 ;
307 ConcatRlght(gp2,gl2rgp3rgl3);
308 END'
309 | 57: (*line 502*)
310 seml:=0;sem2:=0
311 | 58: (*line 503*)
312 count:=0;
313 | 59: (*line 510*)
314 IF stypont THEN Error (7) ;END;
315 dir:=down;
316 I 60: (*line 515*)
317 ASSIGN(n,at[1]);
318 I 61: (*line 520*)
319 IF Jcind=use THEN
320 EmitAction (line, semi);
321 END;
322 I 62: (*line 526*)
323 dir:-up
324 I 63: (*line 531*)
325 IF(kind=use)OR(styp=pr)THEN
326 EmitAction(line,sem2);
327 END;
328 | 64: (*line 537*)
329 StopHash;firstsymbol:=TRUE
330 | 65: (*line 538*)
331 RestartHash
332 | 66: (*line 539*)
333 GetMacroNr(spix,sem3);
334 if sem3=0 THEN Error (12);END;
335 | 67: (*line 543*)
336 if firstsymbol THEN
337 firstsymbol:=FALSE;
338 0penSem(line,sem3);StartCopy(col)
339 END;
340 Copy(typ,col)
341 | 68: (*line 549*)
342 RestartHash;
343 | 69: (*line 556*)
344 0penSem(line,sem3);
345 NewMacro(spix,sem3,ok);
346 if NOT ok THEN Error(11);END;
347 StopHash;firstsymbol:=TRUE;
348 | 70: (*line 562*)
3j9 IF firstsymbol THEN
firstsymbol:=FALSE;StartCopy(col)
END;
350
351
352 Copy(typ,col)
353 I 71: (*line 568*)
354 RestartHash
355 I 72: (*line 575*)
294
Program listings
App.F
356
357
358 END;
359 END Semant;
360 BEGIN
361 printactions:=FALSE;
362 END cocosem.
GetSy(sy,sn);sn.aliasspix:-spix;
RepSy(sy,sn);
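Actions 19 and 18 use the local module SEMANTICSTACK to save and later restore the seven variables that describe the graph under construction (gp, gl, dd, gp1, gl1, dd1, firstfact). The push and pop order suggests that action 19 runs before a nested construct and action 18 runs after it. The Python sketch below reproduces that bracketing, with some assumptions. The class and function names are hypothetical. BOOLEAN values are stored as 0/1 in place of VAL(CARDINAL, ...). The calls CompErr(6) and Restriction(14) are modelled as exceptions.

```python
class SemanticStack:
    """Bounded LIFO stack of cardinals, modelled on the local module SEMANTICSTACK."""

    def __init__(self, maxstacksize=70):
        self.maxstacksize = maxstacksize
        self._items = []

    def push(self, x):
        # SEMANTICSTACK calls Restriction(14) when the stack is full.
        if len(self._items) >= self.maxstacksize:
            raise OverflowError("semantic stack full (restriction 14)")
        self._items.append(x)

    def pop(self):
        # SEMANTICSTACK calls CompErr(6) when the stack is empty.
        if not self._items:
            raise IndexError("semantic stack empty (compiler error 6)")
        return self._items.pop()

    def __len__(self):
        return len(self._items)


def save_state(stack, gp, gl, dd, gp1, gl1, dd1, firstfact):
    """Push the graph-building variables in the order used by action 19."""
    for value in (gp, gl, int(dd), gp1, gl1, int(dd1), int(firstfact)):
        stack.push(value)


def restore_state(stack):
    """Pop them back in reverse order, as action 18 does."""
    firstfact = bool(stack.pop())
    dd1 = bool(stack.pop())
    gl1 = stack.pop()
    gp1 = stack.pop()
    dd = bool(stack.pop())
    gl = stack.pop()
    gp = stack.pop()
    return gp, gl, dd, gp1, gl1, dd1, firstfact


if __name__ == "__main__":
    s = SemanticStack()
    save_state(s, 5, 9, True, 12, 14, False, True)
    print(restore_state(s))  # (5, 9, True, 12, 14, False, True)
    print(len(s))            # 0
```

restore_state pops in exactly the reverse order of save_state, so each inner construct gets back the state of its enclosing one. The default capacity of 70 matches maxstacksize.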
aliasspix
alts
any
ASSIGN
at
Attrtype
check
CloseFile
cocogen
cocogra
cocolex
cocosem
cocosym
col
CompErr
CompleteAt
con
ConcatLeft
ConcatRight
const
Copy
count
dd
dd1
dd2
ddt
def
del
dir
dir1
Direction
down
dummy
EmitAction
eofsy
eps
err
Error
Errors
FileIO
firstfact
firstsymbol
GenAssign
GetAt
GetMacroNr
GetNode
GetSy
gl
356
15 174
276
67 70
10 172
13
24 94
13 230
13
15
10
7
18
17
21
236 238
172 198 317
198 317
120 217
17
362
65
46
18 149
8
15 239
15 227
141
13 153
38 83
215 265
32 159
32 158
32 248
17 229
24 102
220
37 89
37 87
18 37
315
44
13 320
28 180
273 280
208 211
64 65
204 208
21
153 338 340 350 352
53
295 301
253 300 307
340 352
87 96 106 110 112 122 131 135 138 149
312
162 220 234 240 240 291
163 234 240 248 254 254
254 264 274 277 281 291 296 302
128 183 188 201
99 103 114 125 129 140 315 323
89 96 99 112 114 122 125 138 140
326
225
293 298 306
259
91 98 99 116 124 125 142 146 150 169
212 214 259 262 314 334 346
34 157 164 244 250 304
41 329 336 337 347 349
90 110 115 131 141
96 112 122 138
350
13
19 87
19 333
16 269 282 284
19 190 193 210 219 261 356
31 159 162 226 227 233 239 290 295 296 300 300
gl1
gl2
gl3
gn
gp
gp1
gp2
gp3
gpo
gramspix
GraphList
Graphnode
InsertFramePart
kind
line
maxstacksize
n
NewAt
NewMacro
NewNode
NewSy
nonterm
nr
nt
null
ok
OpenFile
OpenSem
Pop
pr
printactions
Push
RepNode
RepSy
RestartHash
Restriction
rootloc
rootsy
rules
sem
sem1
sem2
sem3
Semant
SEMANTICSTACK
SemErr
sn
sp
spix
spix1
301
31
31
299
31
29
30
302
30
30
276
296
30
33
18
16
16
14
35
17
344
48
39
19
19
16
20
90
64
86
23
42
14
14
47
130
361
47
16
20
17
21
15
27
15
72
40
40
40
72
45
21
25
219
50
43
206
43
124
158
247
301
305
269
159
158
246
277
298
305
160
172
229
29
178
84
65
49
141
103
345
224
168
115
65
95
167
345
175
338
51
185
56
271
191
177
46
224
203
174
79
190
190
193
359
63
65
190
220
53
90
208
87
138
163
253
305
307
270
162
163
252
277
299
307
242
175
107
263
57
317
129
225
180
111
204
346
344
55
187
61
283
194
331
59
226
204
222
270
193
282
190
220
53
98
257
88
139
226
264
306
270
220
225
253
280
301
264
203
136
273
263
208
121
207
157
262
162
285
221
342
227
224
270
270
282
191
221
53
103
259
90
141
227
274
307
271
232
226
263
281
302
274
183
276
273
259
137
258
158
325
162
357
354
310
270
284
193
261
57
110
333
96
233
277
282
239
227
264
282
305
277
188
280
276
196
158
162
320
310
284
193
262
58
115
345
97
239
281
282
289
232
264
283
306
279
201
293
280
200
158
163
326
333
194
263
58
124
356
98
247
286
283
295
239
269
286
306
284
217
298
293
211
159
163
334
210
265
62
129
112
253
290
284
296
246
271
289
307
285
267
306
298
224
159
163
338
211
356
131
113
294
284
300
253
273
293
286
319
320
306
314
159
164
344
211
356
166
115
295
285
300
274
294
325
326
345
214
357
168
122
296
301
274
295
338
215
198
123
296
stack 49 53 58
start 214 220
StartCopy 14 155 338 350
StopHash 17 175 329 347
styp 36 86 95 109 111 121 130 137 168 178 182 185
187 196 200 215 265 314 325
sy 26 87 96 103 112 122 129 138 149 166 167 168
190 191 193 194 206 207 208 210 215 257 258 259
261 263 356 357
sy1 26 215 219 221
Symbolnode 20 25
Symboltype 20 36
SyNr 20 166 203 206 257
SYSTEM 9 22
t 109 178 180 182 225
term 110 131
typ 17 153 211 211 215 262 263 265 340 352
up 323
Usage 24 35
use 24 85 108 136 267 319 325
VAL 22 157 158 159 162 163 164
WORD 9 67 67
WriteCard 8
WriteString 8
x 52 53 54 56 58 67 69
y 67 69
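Action 35 sets kind=check. After that, actions 12 and 13 apparently compare each attribute written with a nonterminal at the start of its production against the n-th attribute that NewAt recorded for that nonterminal; GetAt supplies the recorded attribute. A different name is reported as error 9 and a different direction as error 8. A minimal Python sketch of this bookkeeping follows. The class and function names are hypothetical. A list replaces the firstat/next pointer chain. The special case in which GetAt also returns spix=0 for symbols of type err is omitted.

```python
from enum import Enum


class Direction(Enum):
    UP = "up"
    DOWN = "down"


class NonterminalAttributes:
    """Formal attributes of one nonterminal, kept in declaration order.

    Stands in for the firstat/next list that NewAt builds in cocosym.
    """

    def __init__(self):
        self._attrs = []  # (spix, Direction) pairs

    def new_at(self, spix, direction):
        # NewAt appends the new attribute at the end of the list.
        self._attrs.append((spix, direction))

    def get_at(self, n):
        # GetAt is 1-based and yields spix=0, dir=down if there is no n-th attribute.
        if n < 1 or n > len(self._attrs):
            return 0, Direction.DOWN
        return self._attrs[n - 1]

    def complete_at(self, count):
        # CompleteAt: were exactly as many attributes given as were declared?
        return len(self._attrs) == count


def check_attribute(attrs, count, spix, direction):
    """Error numbers that the 'check' branch of actions 12 and 13 would report."""
    errors = []
    spix1, dir1 = attrs.get_at(count)
    if spix1 != 0:
        if spix != spix1:
            errors.append(9)
        if direction != dir1:
            errors.append(8)
    return errors
```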
cocosemframe
1 (* Generated semantic analyzer
2 ===========================
3 This module is produced by Coco from the semantic actions of an
4 attributed grammar.
5 *)
6 DEFINITION MODULE -->modulename;
7 VAR printactions: BOOLEAN; (*trace the executed semantic actions*)
8 PROCEDURE Semant(sem:CARDINAL);
9 END -->modulename.
10 -->implementation
11 (* Generated semantic analyzer
12 ===========================
13 This module is produced by Coco from the semantic actions of an
14 attributed grammar.
15 *)
16 IMPLEMENTATION MODULE -->modulename;
17 FROM FileIO IMPORT con, WriteCard, WriteString;
18 FROM SYSTEM IMPORT WORD;
19 FROM -->scannername IMPORT at;
20
21 -->declarations
22
23 PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
24 BEGIN
25 x:=y;
26 END ASSIGN;
27
28 PROCEDURE Semant(sem:CARDINAL);
29 BEGIN
30 (*IF printactions THEN
31 WriteString(con,"$ [");
32 WriteCard(con,sem,3);
33 WriteString(con,"] ");
34 END;*)
35 CASE sem OF
36 11: ;
37 -->actions
38 END;
39 END Semant;
40 BEGIN
41 printactions:=FALSE;
42 END -->modulename.
actions 37
ASSIGN 23 26
at 19
con 17
declarations 21
FileIO 17
implementation 10
modulename 6 9 16 42
printactions 7 41
scannername 19
sem 8 28 35
Semant 8 28 39
SYSTEM 18
WORD 18 23 23
WriteCard 17
WriteString 17
x 23 25
y 23 25
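The frame is a skeleton of the generated module. The places marked -->name (modulename, scannername, declarations, actions) are filled in by the generator; -->implementation apparently separates the definition module from the implementation module and would need separate handling. The listing does not show how InsertFramePart scans the frame. The Python sketch below therefore only illustrates marker substitution. The function name expand_frame and the module name calcsem are hypothetical.

```python
import re

# Assumed marker syntax: "-->" immediately followed by a part name,
# as in "-->modulename" or "-->actions" in the frame above.
MARKER = re.compile(r"-->(\w+)")


def expand_frame(frame, parts):
    """Replace every -->name marker in frame with parts[name].

    A marker with no entry in parts raises KeyError, so a misspelled
    marker is reported instead of being copied silently.
    """
    return MARKER.sub(lambda m: parts[m.group(1)], frame)


if __name__ == "__main__":
    frame = (
        "DEFINITION MODULE -->modulename;\n"
        "PROCEDURE Semant(sem:CARDINAL);\n"
        "END -->modulename.\n"
    )
    print(expand_frame(frame, {"modulename": "calcsem"}), end="")
```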
cocosym.DEF
1 (* cocosym          Symbol list for coco          Moe 28.12.83
2 --------
3 This module
4 a) generates and updates symbol nodes for terminals, pragmas and
5 nonterminals
6 b) searches names in the symbol list
7 c) stores and retrieves attribute information
8 d) stores and retrieves semantic macros
9 e) marks deletable symbols in symbol list
10 f) collects first-sets, follow-sets, eps-sets and any-sets
11 *)
12 DEFINITION MODULE cocosym;
13
14 CONST
15 maxterminals = 128;
16
17 TYPE
18 Direction = (up,down); (*attribute direction*)
19 Attributeptr = POINTER TO Attribute;
20 Attribute = RECORD
21 spix: CARDINAL; (*name of attribute*)
22 dir: Direction; (*up,down*)
23 next: Attributeptr; (*to next attribute of same nt*)
24 END;
25 Symboltype = (eps,t,pr,nt,any,err);
26 Symbolnode = RECORD
27 spix: CARDINAL; (*spelling index of symbol*)
28 aliasspix: CARDINAL; (*spelling index of alias name*)
29 nra: CARDINAL; (*no.of attributes*)
30 CASE typ: Symboltype OF (*type of symbol*)
31 pr: sem1,sem2: CARDINAL; (*pragma semantics*)
32 | nt,err:
33 start: CARDINAL; (*start of top-down graph*)
34 del: BOOLEAN; (*TRUE if deletable*)
35 firstat: Attributeptr; (*to first attribute node*)
36 END;
37 END;
38 Symbolset = ARRAY[0..maxterminals DIV 16] OF BITSET;
39
40 VAR
41 maxany: CARDINAL; (*no.of any-sets*)
42 maxeps: CARDINAL; (*no.of eps-follower-sets*)
43 maxt: CARDINAL; (*no.of last terminal*)
44 maxp: CARDINAL; (*no.of last pragma*)
45 maxs: CARDINAL; (*no.of last nonterminal*)
46 gramspix: CARDINAL; (*grammar name, filled by AG*)
47
48
49 PROCEDURE ClearSet(VAR s:Symbolset; n:CARDINAL);
50 (* clears set s*)
51
52 PROCEDURE CompleteAt(sy,nr:CARDINAL): BOOLEAN;
53 (* checks if symbol sy has nr attributes*)
54
55 PROCEDURE FindDelSymbols;
56 (* Marks deletable nonterminals and prints them*)
57
58 PROCEDURE GetA(n:CARDINAL; VAR set:Symbolset);
59 (* Gets the any-set with the number n*)
60
61 PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
62 (* Gets the spelling index spix and the direction dir of the n-th
63 attribute of the symbol sy*)
64
65 PROCEDURE GetE(n:CARDINAL; VAR set:Symbolset);
66 (* Gets the eps-follower-set with the number n*)
67
68 PROCEDURE GetF(sy:CARDINAL; VAR first:Symbolset);
69 (* Gets the set of terminal start symbols for the nonterminal sy*)
70
71 PROCEDURE GetFirstSet(loc:CARDINAL; VAR set:Symbolset);
72 (* Gets the terminal start symbols of the graph with the root loc*)
73
74 PROCEDURE GetFo(sy:CARDINAL; VAR set:Symbolset);
75 (* Gets followers of the nonterminal sy*)
76
77 PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
78 (* Gets the number sem of the semantic action corresponding to the
79 macro with the name spix*)
80
81 PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
82 (* Gets the symbol node with the index sy*)
83
84 PROCEDURE GetSymbolSets;
85 (* Collects first-sets, follower-sets, eps-sets and any-sets*)
86
87 PROCEDURE IsInSet(n:CARDINAL; VAR s:Symbolset):BOOLEAN;
88 (* TRUE if n is in set s*)
89
90 PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
91 (* Enters a new attribute for the symbol sy with the spelling index
92 spix and the direction dir*)
93
94 PROCEDURE NewMacro(spix,sem:CARDINAL; VAR ok:BOOLEAN);
95 (* Enters a new semantic macro with the name spix and the action number
96 sem*)
97
98 PROCEDURE NewSy(spix:CARDINAL; typ:Symboltype): CARDINAL;
99 (* Generates a new symbol with the name spix and the type typ and
100 returns its index*)
101
102 PROCEDURE RepSy(sy:CARDINAL; sn:Symbolnode);
103 (* Replaces the symbol sy by the node sn*)
104
105 PROCEDURE SetBit(VAR s:Symbolset; n:CARDINAL);
106 (* Sets bit n in set s*)
107
108 PROCEDURE Unit(VAR s1,s2:Symbolset; n:CARDINAL);
109 (* Adds the set s2 to the set s1*)
110
111 PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
112 (* Gets the symbol number for the identifier with the name spix*)
113
114 END cocosym.
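Symbolset packs one bit per terminal into an array of 16-bit BITSET words. Terminal n lives in word n DIV 16 at position n MOD 16; with maxterminals = 128 the array has nine words. The Python sketch below mirrors SetBit, DelBit, IsInSet, ClearSet and Unit. It also includes the complement step that GetAnySets in cocosym.MOD performs with VAL(BITSET,65535)-s[i]. It assumes that set element k corresponds to the integer bit 2**k; only the consistency of that mapping matters here. The function names are hypothetical lowercase versions of the Modula-2 ones.

```python
MAXTERMINALS = 128                        # CONST maxterminals in cocosym.DEF
WORD_BITS = 16                            # one BITSET word holds 16 elements
NWORDS = MAXTERMINALS // WORD_BITS + 1    # ARRAY[0..maxterminals DIV 16]: 9 words
FULL = 0xFFFF                             # VAL(BITSET,65535)


def new_set():
    return [0] * NWORDS


def clear_set(s, n):                      # ClearSet: words 0..n DIV 16 := {}
    for i in range(n // WORD_BITS + 1):
        s[i] = 0


def set_bit(s, n):                        # SetBit: INCL(s[n DIV 16], n MOD 16)
    s[n // WORD_BITS] |= 1 << (n % WORD_BITS)


def del_bit(s, n):                        # DelBit: EXCL(s[n DIV 16], n MOD 16)
    s[n // WORD_BITS] &= FULL & ~(1 << (n % WORD_BITS))


def is_in_set(n, s):                      # IsInSet: (n MOD 16) IN s[n DIV 16]
    return bool((s[n // WORD_BITS] >> (n % WORD_BITS)) & 1)


def unit(s1, s2, n):                      # Unit: s1 := s1 + s2 for words 0..n DIV 16
    for i in range(n // WORD_BITS + 1):
        s1[i] |= s2[i]


def complement(s, maxt):                  # GetAnySets: s[i] := {0..15} - s[i]
    for i in range(maxt // WORD_BITS + 1):
        s[i] = FULL & ~s[i]


def members(s, maxt):
    return [n for n in range(maxt + 1) if is_in_set(n, s)]
```

Only the words up to maxt DIV 16 are touched, which is why the Modula-2 procedures take the upper bound n as a parameter.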
cocosym.MOD
1 (* cocosym          Symbol list for coco          Moe 29.12.83
2 --------
3 This module
4 a) generates and updates symbol nodes for terminals, pragmas and
5 nonterminals
6 b) searches names in the symbol list
7 c) stores and retrieves attribute information
8 d) stores and retrieves semantic macros
9 e) marks deletable symbols in symbol list
10 f) collects first-sets, follow-sets, eps-sets and any-sets
11 *)
12 IMPLEMENTATION MODULE cocosym;
13 FROM cocogra IMPORT maxn, rootloc, ClearMarkList, Deletable, DelNode,
14 GetNode, Graphnode, Mark, Marked, Marklist, RepNode;
15 FROM cocolex IMPORT line, col, ddt, GetName;
16 FROM cocolst IMPORT lst;
17 FROM Errors IMPORT CompErr, Restriction, SemErr;
18 FROM FileIO IMPORT con, Write, WriteCard, WriteString, WriteText, WriteLn;
19 FROM System IMPORT Allocate;
20 FROM SYSTEM IMPORT VAL;
21
22 CONST
23 anysetsize = 20; (*max.no.of compl.-sets for any-symbols*)
24 epssetsize = 70; (*max.no.of eps-follower-sets*)
25 maxsymbols = 200; (*max.no.of symbols*)
26 maxnt = 80; (*max.number of nonterminals*)
27 null = 65535;
28 eofsy = 0;
29
30 TYPE
31 Anyset = ARRAY[1..anysetsize] OF Symbolset;
32 Epsset = ARRAY[1..epssetsize] OF Symbolset;
33 Followset = ARRAY[0..maxnt-1] OF RECORD
34 ts: Symbolset; (*terminal symbols*)
35 nts: Symbolset; (*nts whose start set is to be added*)
36 END;
37 Firstset = ARRAY[0..maxnt-1] OF RECORD
38 ts: Symbolset; (*terminal symbols*)
39 ready: BOOLEAN; (*TRUE if ts is complete*)
40 END;
41 Macroptr = POINTER TO Macronode;
42 Macronode = RECORD
43 spix: CARDINAL; (*name of semantic macro*)
44 sem: CARDINAL; (*associated semantic action*)
45 next: Macroptr; (*to next sem macro*)
46 END;
47 Symbollist = ARRAY[0..maxsymbols] OF Symbolnode;
48
49 VAR
50 anyset: Anyset; (*actual no.of any-sets*)
51 column: CARDINAL; (*printing column for terminal sets*)
52 epsset: Epsset; (*actual no.of eps-sets*)
53 first: Firstset; (*terminal start symbols*)
54 firstmacro: Macroptr; (*first sem macro*)
55 fnt: CARDINAL; (*no.of first nonterminal*)
56 follow: Followset; (*terminal successors*)
57 lastmacro: Macroptr; (*last sem macro*)
58 sn: Symbollist; (*symbol list*)
59
60
61 PROCEDURE AllBit(VAR s:Symbolset); FORWARD;
62 PROCEDURE DelBit(VAR s:Symbolset; n:CARDINAL); FORWARD;
63 PROCEDURE PrintSet(VAR s:Symbolset; n:CARDINAL); FORWARD;
64 PROCEDURE PutNt(sy:CARDINAL); FORWARD;
65 PROCEDURE PutTermSet(VAR s:Symbolset); FORWARD;
66
67
68 (* CompleteAt Test if nr is the correct no.of attributes
69 *)
70 PROCEDURE CompleteAt(sy,nr:CARDINAL): BOOLEAN;
71 BEGIN RETURN (sn[sy].nra=nr) OR (sn[sy].typ=err); END CompleteAt;
72
73
74 (* FindDelSymbols Find all deletable symbols and print them
75 *)
76 PROCEDURE FindDelSymbols;
77 VAR
78 change: BOOLEAN;
79 dummy: CARDINAL;
80 first: BOOLEAN;
81 i,l: CARDINAL;
82 name: ARRAY[1..50] OF CHAR;
83 sn: Symbolnode;
84 BEGIN
85 fnt:=maxp+1;
86 REPEAT (*while new deletable symbols*)
87 change:=FALSE;
88 FOR i:=maxp+1 TO maxs DO
89 GetSy(i,sn);
90 IF (NOT sn.del) AND (sn.start<>0) AND Deletable(sn.start) THEN
91 sn.del:=TRUE; RepSy(i,sn); change:=TRUE;
92 END;
93 END;
94 UNTIL NOT change;
95
96 first:=TRUE; (*print deletable symbols*)
97 FOR i:=maxp+1 TO maxs DO
98 GetSy(i,sn);
99 IF sn.del THEN
100 IF first THEN
101 WriteLn(lst); WriteLn(lst);
102 WriteString(lst,"Deletable symbols:"); WriteLn(lst);
103 first:=FALSE;
104 END;
105 GetName(sn.spix,name,l);
106 WriteString(lst," "); WriteText(lst,name,l); WriteLn(lst);
107 END;
108 END;
109 IF first THEN
110 WriteLn(lst); WriteLn(lst);
111 WriteString(lst,"Grammar contains no deletable symbols.");
112 WriteLn(lst);
113 END;
114 END FindDelSymbols;
115
116
117 (* GetA Returns the any-set with the number nr
118 *)
119 PROCEDURE GetA(nr:CARDINAL; VAR s:Symbolset);
120 BEGIN s:=anyset[nr]; END GetA;
121
122 (* GetAnySets Find the complement sets for any-nodes
123 *)
124
125 PROCEDURE GetAnySets;
126 VAR
127 gn: Graphnode;
128 loc,i: CARDINAL;
129 s: Symbolset;
130 BEGIN (*GetAnySets*)
131 FOR loc:=1 TO maxn DO
132 GetNode(loc,gn);
133 IF (gn.typ=any) AND (gn.lp<>0) THEN (*any with alternatives*)
134 GetFirstSet(gn.lp,s);
135 FOR i:=0 TO maxt DIV 16 DO (*make complement*)
136 s[i]:=VAL(BITSET,65535)-s[i];
137 END;
138 DelBit(s,eofsy); (*any must not recognize eofsy*)
139 INC(maxany); anyset[maxany]:=s;
140 gn.sp:=maxany; RepNode(loc,gn);
141 END;
142 END;
143 END GetAnySets;
144
145
146 (* GetAt Get name and direction of an attribute
147 *)
148 PROCEDURE GetAt(sy,nr:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
149 VAR
150 i: CARDINAL;
151 p: Attributeptr;
152 BEGIN
153 IF (sn[sy].typ<>nt) AND (sn[sy].typ<>err) THEN CompErr(3); END;
154 IF (nr>sn[sy].nra) OR (sn[sy].typ=err)
155 THEN spix:=0; dir:=down; (*semantic error*)
156 ELSE
157 p:=sn[sy].firstat;
158 FOR i:=1 TO nr-1 DO p:=p^.next; END;
159 spix:=p^.spix; dir:=p^.dir;
160 END;
161 END GetAt;
162
163
164 (* GetE Returns the eps-set with the number nr
165 *)
166 PROCEDURE GetE(nr:CARDINAL; VAR s:Symbolset);
167 BEGIN s:=epsset[nr]; END GetE;
168
169
170 (* GetEpsSets Find the follower symbols for eps-nodes
171 *)
172 PROCEDURE GetEpsSets;
173 VAR
174 curnt: CARDINAL;
175 m: Marklist;
176 sn: Symbolnode;
177
178 PROCEDURE FindEpsFollowers(loc,leftsy:CARDINAL; VAR nr:CARDINAL);
179 VAR s:Symbolset;
180 BEGIN
181 GetFirstSet(loc,s);
182 IF Deletable(loc) THEN Unit(s,follow[leftsy-fnt].ts,maxt); END;
183 INC(maxeps); epsset[maxeps]:=s;
184 nr:=maxeps;
185 END FindEpsFollowers;
186
187 PROCEDURE FindEps(loc,leftsy:CARDINAL; vialp:BOOLEAN);
188 VAR
189 gn: Graphnode;
190 nr: CARDINAL;
191 BEGIN
192 IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
193 Mark(loc,m);
194 GetNode(loc,gn);
195 WITH gn DO
196 IF (typ=eps) AND (vialp OR (lp<>0)) THEN
197 FindEpsFollowers(rp,leftsy,nr);
198 sp:=nr; RepNode(loc,gn);
199 END;
200 IF lp<>0 THEN FindEps(lp,leftsy,TRUE); END;
201 IF rp<>0 THEN FindEps(rp,leftsy,FALSE); END;
202 END;
203 END FindEps;
204
205 BEGIN (*GetEpsSets*)
206 ClearMarkList(m);
207 FOR curnt:=maxp+1 TO maxs DO
208 GetSy(curnt,sn);
209 FindEps(sn.start,curnt,FALSE);
210 END;
211 END GetEpsSets;
212
213
214 (* GetF Returns the terminal start symbols of sy
215 *)
216 PROCEDURE GetF(sy:CARDINAL; VAR s:Symbolset);
217 BEGIN s:=first[sy-fnt].ts; END GetF;
218
219
220 (* GetFirstSet Gets the terminal start symbols of the graph in loc
221 *)
222 PROCEDURE GetFirstSet(loc:CARDINAL; VAR set:Symbolset);
223 VAR m: Marklist; (*mark list for visited nodes*)
224
225 PROCEDURE CollectFirstSet(loc:CARDINAL; VAR set:Symbolset);
226 VAR
227 gn: Graphnode;
228 sn: Symbolnode;
229 s1: Symbolset;
230 BEGIN
231 ClearSet(set,maxt);
232 IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
233 WHILE loc<>0 DO (*for all alternatives*)
234 Mark (loc,m) ;
235 GetNode(loc,gn);
236 IF ddt["G"] THEN
237 WriteString(con,"CollectFirstSet:");
238 WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),6);
239 WriteCard(con,gn.sp,6); WriteLn(con);
240 END;
241 IF DelNode(gn) THEN
242 CollectFirstSet(gn.rp,s1); Unit(set,s1,maxt);
243 END;
244 CASE gn.typ OF
245 eps: ;
246 | t: SetBit(set,gn.sp);
247 | nt: IF first[gn.sp-fnt].ready
248 THEN Unit(set,first[gn.sp-fnt].ts,maxt);
249 ELSE
250 GetSy(gn.sp,sn);
251 CollectFirstSet(sn.start,s1); Unit(set,s1,maxt);
252 END;
253 | any: AllBit(set);
254 END; (*CASE*)
255 loc:=gn.lp;
256 END; (*WHILE*)
257 END CollectFirstSet;
258
259 BEGIN (*GetFirstSet*)
260 ClearMarkList(m);
261 CollectFirstSet(loc,set);
262 IF ddt["H"] THEN
263 WriteString(con,"GetFirstSet:"); PrintSet(set,maxt);
264 END;
265 END GetFirstSet;
266
267
268 (* GetFollowSets Get terminal successors of nonterminals
269 *)
270 PROCEDURE GetFollowSets;
271 VAR
272 change: BOOLEAN;
273 i,n,nl: CARDINAL;
274 m: Marklist;
275 sn: Symbolnode;
276
277 PROCEDURE CollectFollowSets(loc,sym:CARDINAL);
278 VAR
279 gn: Graphnode;
280 set: Symbolset;
281 BEGIN
282 WHILE loc<>0 DO (*step through alternative chain*)
283 IF Marked(loc,m) THEN RETURN; END; (*cycle*)
284 Mark(loc,m);
285 GetNode(loc,gn);
286 WITH gn DO
287 IF ddt["J"] THEN
288 WriteString(con,"CollectFollowSets ");
289 WriteCard(con,loc,6); WriteCard(con,sp,6);
290 WriteCard(con,sym,6); WriteLn(con);
291 END;
292 IF typ=nt THEN
293 GetFirstSet(rp,set);
294 Unit(follow[sp-fnt].ts,set,maxt);
295 IF Deletable(rp) THEN
296 SetBit(follow[sp-fnt].nts,sym-fnt);
297 END;
298 IF ddt["I"] THEN
299 WriteString(con,"CollectFollowSets:");
300 WriteCard(con,loc,6);
301 WriteString(con,"$ "); PrintSet(follow[sp-fnt].ts,maxt);
302 WriteString(con,"$ ");
303 PrintSet(follow[sp-fnt].nts,maxs-maxp);
304 WriteLn(con);
305 END;
306 END; (*IF typ=nt*)
307 CollectFollowSets(rp,sym);
308 loc:=lp;
309 END; (*WITH*)
310 END; (*WHILE*)
311 END CollectFollowSets;
312
313 PROCEDURE Complete(i:CARDINAL); (*add indirect successors of*)
314 VAR j: CARDINAL; (*i+fnt to follow[i].ts*)
315 BEGIN
316 IF Marked(i,m) THEN RETURN; END; (*already visited*)
317 Mark(i,m);
318 FOR j:=0 TO maxs-fnt DO
319 IF IsInSet(j,follow[i].nts) THEN
320 Complete(j);
321 Unit(follow[i].ts,follow[j].ts,maxt);
322 END;
323 END;
324 END Complete;
325
326 BEGIN (*GetFollowSets*)
327 FOR i:=fnt TO maxs DO
328 ClearSet(follow[i-fnt].ts,maxt);
329 ClearSet(follow[i-fnt].nts,maxs-fnt);
330 END;
331
332 ClearMarkList(m);
333 FOR i:=fnt TO maxs DO (*get direct successors of nonterminals*)
334 GetSy(i,sn);
335 IF ddt["I"] THEN
336 WriteString(con,"GetFollowSets(0):"); WriteCard(con,sn.start,6);
337 WriteCard(con,i,6); WriteLn(con);
338 END;
339 CollectFollowSets(sn.start,i);
340 END;
341 CollectFollowSets(rootloc,maxs+1); (*successors of grammar symbol*)
342
343 FOR i:=0 TO maxs-fnt DO (*add indirect successors to follow.ts*)
344 ClearMarkList(m);
345 Complete(i);
346 ClearSet(follow[i].nts,maxt);
347 END;
348
349 IF ddt["I"] THEN
350 WriteString(con,"GetFollowSets(3):$");
351 FOR i:=0 TO maxs-fnt DO
352 WriteCard(con,fnt+i,6); PrintSet(follow[i].ts,maxt);
353 WriteLn(con);
354 END;
355 END;
356 END GetFollowSets;
357
358
359 (* GetFo Get follow-set of nonterminal sy
360 *)
361 PROCEDURE GetFo (sy:CARDINAL; VAR set:Symbolset);
362 BEGIN set:=follow[sy-fnt].ts; END GetFo;
363
364
365 (* GetMacroNr Get semantic macro
366 *)
367 PROCEDURE GetMacroNr (spix:CARDINAL; VAR sem:CARDINAL) ;
368 VAR p: Macroptr;
369 BEGIN
370 p:=firstmacro;
371 WHILE (p<>NIL) AND (p^.spix<>spix) DO p:=p^.next; END;
372 IF p=NIL THEN sem:=0; ELSE sem:=p^.sem; END;
373 END GetMacroNr;
374
375
376 (* GetSy Gets the symbol sy
377 *)
378 PROCEDURE GetSy(sy:CARDINAL; VAR sn1:Symbolnode);
379 BEGIN sn1:=sn[sy]; END GetSy;
380
381
382 (* GetSymbolSets Get first-sets, follower-sets, eps-sets and any-sets
383 *)
384 PROCEDURE GetSymbolSets;
385 VAR
386 i: CARDINAL;
387 sn: Symbolnode;
388 BEGIN
389 fnt:=maxp+1;
390 FOR i:=0 TO maxs-fnt DO first[i].ready:=FALSE; END;
391 FOR i:=fnt TO maxs DO
392 GetSy(i,sn);
393 GetFirstSet(sn.start,first[i-fnt].ts);
394 first[i-fnt].ready:=TRUE;
395 END;
396 GetFollowSets;
397 GetEpsSets;
398 GetAnySets;
399 IF ddt["K"] THEN (*print first-sets and follow-sets*)
400 WriteLn(lst);
401 WriteString(lst,"List of terminal start symbols:"); WriteLn(lst);
402 FOR i:=fnt TO maxs DO
403 PutNt(i); PutTermSet(first[i-fnt].ts);
404 END;
405 WriteLn(lst); WriteLn(lst);
406 WriteString(lst,"List of terminal successors:"); WriteLn(lst);
407 FOR i:=fnt TO maxs DO
408 PutNt(i); PutTermSet(follow[i-fnt].ts);
409 END;
410 END;
411 END GetSymbolSets;
412
413
414 (* NewAt Enter new attribute for a symbol
415 *)
416 PROCEDURE NewAt (sy,spx:CARDINAL; dir:Direction);
417 VAR
418 i: CARDINAL;
419 p,at: Attributeptr;
420 BEGIN
421 WITH sn[sy] DO
422 INC(nra);
423 IF typ=nt THEN (*store name and direction*)
424 Allocate(at,SIZE(Attribute));
425 at^.spix:=spx; at^.dir:=dir; at^.next:=NIL;
426 IF firstat=NIL
427 THEN firstat:=at;
428 ELSE
429 p:=firstat; WHILE p^.next<>NIL DO p:=p^.next END;
430 p^.next:=at;
431 END;
432 END;
433 END;
434 END NewAt;
435
436
437 (* NewMacro Enter new semantic macro
438 *)
439 PROCEDURE NewMacro(spix,sem:CARDINAL; VAR ok:BOOLEAN);
440 VAR p,s: Macroptr;
441 BEGIN
442 p:=firstmacro;
443 WHILE (p<>NIL) AND (p^.spix<>spix) DO p:=p^.next; END;
444 IF p=NIL
445 THEN
446 ok:=TRUE;
447 Allocate (s,SIZE(Macronode));
448 s^.spix:=spix; s^.sem:=sem; s^.next:=NIL;
449 IF firstmacro=NIL
450 THEN firstmacro:=s; lastmacro:=s;
451 ELSE lastmacro^.next:=s; lastmacro:=s;
452 END;
453 ELSE ok:=FALSE;
454 END;
455 END NewMacro;
456
457
458 (* NewSy Generate a new symbol and return index
459 *)
460 PROCEDURE NewSy(spx:CARDINAL; tp:Symboltype): CARDINAL;
461 VAR i: CARDINAL;
462 BEGIN
463 IF maxs=null THEN maxs:=0; ELSE INC(maxs); END;
464 IF maxs>=maxsymbols THEN Restriction(6); END;
465 WITH sn[maxs] DO
466 typ:=tp; spix:=spx; aliasspix:=spix; nra:=0;
467 CASE typ OF
468 t:
469 IF maxt=null THEN maxt:=0; ELSE INC(maxt); END;
470 IF maxp=null THEN maxp:=0; ELSE INC(maxp); END;
471 IF maxt>=maxterminals THEN Restriction(7); END;
472 | pr:
473 IF maxp=null
474 THEN SemErr(25,line,col); maxp:=0; maxt:=0;
475 ELSE INC(maxp);
476 END;
477 sem1:=0; sem2:=0;
478 | nt,err:
479 start:=0; del:=FALSE; firstat:=NIL;
480 END; (*CASE*)
481 END; (*WITH*)
482 RETURN maxs;
483 END NewSy;
484
485
486 (* RepSy Replace symbol sy
487 *)
488 PROCEDURE RepSy(sy:CARDINAL; snl:Symbolnode);
489 BEGIN sn[sy]:=snl; END RepSy;
490
491
492 (* SyNr Gets index of name spix
493 *)
494 PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
495 VAR i: CARDINAL;
496 BEGIN
497 IF maxs=null THEN RETURN null; END;
498 i:=0;
499 WHILE (i<=maxs) AND (sn[i].spix<>spix) DO INC(i); END;
500 IF i>maxs THEN i:=null; END;
501 RETURN i;
502 END SyNr;
503
504
505 (* AllBit Set all bits in set s
506 *)
507 PROCEDURE AllBit(VAR s:Symbolset);
508 VAR i: CARDINAL;
509 BEGIN
510 FOR i:=0 TO maxterminals DIV 16 DO s[i]:=VAL(BITSET,65535); END;
511 END AllBit;
512
513
514 (* ClearSet Clears set s
515 *)
516 PROCEDURE ClearSet(VAR s:Symbolset; n:CARDINAL);
517 VAR i: CARDINAL;
518 BEGIN FOR i:=0 TO n DIV 16 DO s[i]:={}; END; END ClearSet;
519
520
521 (* DelBit Deletes bit n in set s
522 *)
523 PROCEDURE DelBit(VAR s:Symbolset; n:CARDINAL);
524 BEGIN EXCL(s[n DIV 16], n MOD 16); END DelBit;
525
526
527 (* Empty TRUE if set s is empty
528 *)
529 PROCEDURE Empty(VAR s:Symbolset; n:CARDINAL):BOOLEAN;
530 VAR i: CARDINAL;
531 BEGIN
532 FOR i:=0 TO n DIV 16 DO
533 IF s[i]<>{} THEN RETURN FALSE; END;
534 END;
535 RETURN TRUE;
536 END Empty;
537
538
539 (* InSet TRUE if s1 <= s2
540 *)
541 PROCEDURE InSet(VAR s1,s2:Symbolset; n:CARDINAL):BOOLEAN;
542 VAR i: CARDINAL;
543 BEGIN
544 FOR i:=0 TO n DIV 16 DO
545 IF NOT(s1[i]<=s2[i]) THEN RETURN FALSE; END;
546 END;
547 RETURN TRUE;
548 END InSet;
549
550
551 (* IsInSet TRUE if n is in set s
552 *)
553 PROCEDURE IsInSet(n:CARDINAL; VAR s:Symbolset):BOOLEAN;
554 BEGIN RETURN (n MOD 16) IN s[n DIV 16]; END IsInSet;
555
556
557 (* PrintSet ddt output of set s
558 *)
559 PROCEDURE PrintSet(VAR s:Symbolset; n:CARDINAL);
560 VAR i: CARDINAL;
561 BEGIN
562 FOR i:=0 TO n DIV 16 DO
563 WriteCard(con,VAL(CARDINAL,s[i]) DIV 256,4);
564 WriteCard(con,VAL(CARDINAL,s[i]) MOD 256,4);
565 END;
566 END PrintSet;
567
568
569 (* PutNt Print name of nonterminal sy
570 *)
571 PROCEDURE PutNt(sy:CARDINAL);
572 VAR
573 l: CARDINAL;
574 name: ARRAY[1..50] OF CHAR;
575 sn: Symbolnode;
576 BEGIN
577 GetSy(sy,sn); GetName(sn.spix,name,l);
578 WHILE l<12 DO INC(l); name[l]:=" "; END;
579 WriteLn(lst);
580 WriteString(lst," "); WriteText(lst,name,l); Write(lst," ");
581 column:=15;
582 END PutNt;
583
584
585 (* PutTermSet Print names of terminals in set s
586 *)
587 PROCEDURE PutTermSet(VAR s:Symbolset);
588 CONST maxlinelen = 72;
589 VAR
590 i,l: CARDINAL;
591 name: ARRAY[1..50] OF CHAR;
592 sn: Symbolnode;
593 BEGIN
594 FOR i:=0 TO maxt DO
595 IF IsInSet(i,s) THEN
596 GetSy(i,sn); GetName(sn.spix,name,l);
597 IF column+l>maxlinelen THEN
598 WriteLn(lst); WriteString(lst," ");
599 column:=15;
600 END;
601 WriteText(lst,name,l); WriteString(lst," ");
602 INC(column,l+2);
603 END; (*IF IsInSet*)
604 END; (*FOR*)
605 WriteLn(lst);
606 END PutTermSet;
607
608
609 (* SetBit Sets bit n in set s
610 *)
611 PROCEDURE SetBit(VAR s:Symbolset; n:CARDINAL);
612 BEGIN INCL(s[n DIV 16],n MOD 16); END SetBit;
613
614
615 (* Unit s1 := s1 + s2
616 *)
617 PROCEDURE Unit(VAR s1,s2:Symbolset; n:CARDINAL);
618 VAR i: CARDINAL;
619 BEGIN FOR i:=0 TO n DIV 16 DO s1[i]:=s1[i]+s2[i]; END; END Unit;
620
621
622 BEGIN (*cocosym*)
623 maxt:=null; maxp:=null; maxs:=null; firstmacro:=NIL;
624 maxany:=0; maxeps:=0;
625 END cocosym.
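cocosym derives three properties of the grammar. FindDelSymbols finds which nonterminals are deletable. GetFirstSet collects the terminal start symbols of each nonterminal. CollectFollowSets collects the terminal successors of each nonterminal, and Complete closes them over indirect successors. cocosym works on the syntax graph built by cocosem. The Python sketch below instead holds the grammar as a dict of alternatives and uses plain fixed-point iteration for all three sets, so it illustrates the definitions rather than cocosym's graph walk. The representation, the function names and the end-of-file marker "EOF" are assumptions. "EOF" stands in for eofsy, which action 37 places after the grammar's root symbol.

```python
def deletable_symbols(grammar):
    """Nonterminals that can derive the empty string (cf. FindDelSymbols)."""
    dels = set()
    change = True
    while change:                          # REPEAT ... UNTIL NOT change
        change = False
        for nt, alts in grammar.items():
            if nt not in dels and any(all(sym in dels for sym in alt) for alt in alts):
                dels.add(nt)
                change = True
    return dels


def first_of_seq(seq, first, dels, grammar):
    """Start symbols of a symbol sequence, and whether the whole sequence is deletable."""
    result = set()
    for sym in seq:
        result |= first[sym] if sym in grammar else {sym}
        if sym not in dels:
            return result, False
    return result, True


def first_sets(grammar, dels):
    """Terminal start symbols of every nonterminal."""
    first = {nt: set() for nt in grammar}
    change = True
    while change:
        change = False
        for nt, alts in grammar.items():
            for alt in alts:
                new, _ = first_of_seq(alt, first, dels, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    change = True
    return first


def follow_sets(grammar, start, first, dels, eof="EOF"):
    """Terminal successors of every nonterminal."""
    follow = {nt: set() for nt in grammar}
    follow[start].add(eof)                 # the root symbol is followed by end of file
    change = True
    while change:
        change = False
        for lhs, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue           # only nonterminals get follow-sets
                    new, rest_deletable = first_of_seq(alt[i + 1:], first, dels, grammar)
                    if rest_deletable:
                        new |= follow[lhs] # indirect successors (what Complete adds)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        change = True
    return follow


# Hypothetical example grammar: nonterminals are the dict keys, every other
# symbol is a terminal, and [] is an empty alternative.
EXAMPLE = {
    "E":  [["T", "E1"]],
    "E1": [["+", "T", "E1"], []],
    "T":  [["F", "T1"]],
    "T1": [["*", "F", "T1"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

if __name__ == "__main__":
    dels = deletable_symbols(EXAMPLE)
    first = first_sets(EXAMPLE, dels)
    follow = follow_sets(EXAMPLE, "E", first, dels)
    print(sorted(dels))         # ['E1', 'T1']
    print(sorted(follow["F"]))  # [')', '*', '+', 'EOF']
```

For this expression grammar, E1 and T1 come out deletable. F is followed by *, +, ) and end of file.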
aliasspix 466
AllBit 61 253 507 511
Allocate 19 424 447
any 133 253
Anyset 31 50
anyset 50 120 139
anysetsize 23 31
at 419 424 425 425 425 427 430
Attribute 424
Attributeptr 151 419
change 78 87 91 94 272
ClearMarkList 13 206 260 332 344
ClearSet 231 328 329 346 516 518
cocogra 13
cocolex 15
cocolst 16
cocosym 12 625
col 15 474
CollectFirstSet 225 242 251 257 261
CollectFollowSets 277 307 311 339 341
column 51 581 597 599 602
CompErr 17 153
Complete 313 320 324 345
CompleteAt 70 71
con
curnt
ddt
del
DelBit
Deletable
DelNode
dir
Direction
down
dummy
Empty
eofsy
eps
Epsset
epsset
epssetsize
err
Errors
EXCL
FileIO
FindDelSymbols
FindEps
FindEpsFollowers
first
firstat
firstmacro
Firstset
fnt
follow
Followset
FORWARD
GetA
GetAnySets
GetAt
GetE
GetEpsSets
GetF
GetFirstSet
GetFo
GetFollowSets
GetMacroNr
GetName
GetNode
GetSy
GetSymbolSets
gn
Graphnode
i
18
299
563
174
15
90
62
13
13
148
148
155
79
529
28
196
32
52
24
71
17
524
18
76
187
178
53
403
157
54
37
55
327
393
56
352
33
61
119
125
148
166
172
216
134
361
270
367
15
14
89
384
127
235
285
14
81
273
339
237
300
564
207
236
91
138
90
241
155
416
536
138
245
52
167
32
153
114
200
185
80
426
370
53
85
328
394
182
362
56
62
120
143
161
167
211
217
181
362
356
373
105
132
98
411
132
238
286
127
88
313
343
238
301
208
262
99
523
182
159
183
154
201
197
96
427
442
182
329
402
294
408
63
398
397
222
396
577
194
208
133
239
189
89
316
345
238
302
209
287
479
524
295
159
478
203
100
429
449
217
329
403
296
64
265
596
235
250
133
241
227
91
317
346
239
304
298
416
209
103
479
450
247
333
407
301
65
293
285
334
134
242
279
97
319
351
239
336
335
425
109
623
248
343
408
303
393
378
140
244
98
321
352
263
336
349
425
217
294
351
319
379
140
246
128
327
352
288
337
399
247
296
352
321
392
189
247
135
328
386
289
337
248
296
362
321
577
194
248
136
329
390
289
350
390
301
389
328
596
195
250
136
333
390
290
352
393
303
390
329
198
255
150
334
391
290
353
394
318
391
346
227
279
158
337
392
INCL
InSet
m
393 394 402 403 403 407 408 408 418 461 495 498
499 499 499 500 500 501 508 510 510 517 518 518
530 532 533 542 544 545 545 560 562 563 564 590
594 595 596 618 619 619 619 619
612
541 548
IsInSet 319 553 554 595
4 314 318 319 320 321
l 105 106 573 577 578 578 578 580 590 596 597
601 602
lastmacro 57 450 451 451
leftsy 178 182 187 197 200 201
line 15 474
loc 128 131 132 140 178 181 182 187 192 192 193 194
198 222 225 232 232 233 234 235 238 255 261 277
282 283 284 285 289 300 308
lp 133 134 196 200 200 255 308
lst 16 101 101 102 102 106 106 106 110 110 111 112
400 401 401 405 405 406 406 579 580 580 580 598
598 601 601 605
175 192 193 206 223 232 234 260 274 283 284 316
317 332 344
Macronode 41 42 447
Macroptr 41 45 54 57 368 440
Mark 14 193 234 284 317
Marked 14 192 232 283 316
Marklist 14 175 223 274
maxany 139 139 140 624
maxeps 183 183 184 624
maxlinelen 588 597
maxn 13 131
maxnt 26 33 37
maxp 85 88 97 207 303 389 470 470 470 473 474 475
623
maxs 88 97 207 303 318 327 329 333 341 343 351 390
391 402 407 463 463 463 464 465 482 497 499 500
623
maxsymbols 25 47 464
maxt 135 182 231 242 248 251 263 294 301 321 328 346
352 469 469 469 471 474 594 623
maxterminals 471 510
n 62 63 273 516 518 523 524 524 529 532 541 544
553 554 554 559 562 611 612 612 617 619
nl 273
name 82 105 106 574 577 578 580 591 596 601
NewAt 416 434
NewMacro 439 455
NewSy 460 483
next 45 158 371 425 429 429 430 443 448 451
nr 70 71 119 120 148 154 158 166 167 178 184 190
197 198
nra 71 154 422 466
nt 153 247 292 423 478
nts 35 296 303 319 329 346
null 27 463 469 470 473 497 497 500 623 623 623
ok 439 446 453
p 151 157 158 158 159 159 368 370 371 371 371 371
372 372 419 429 429 429 429 430 440 442 443 443
443 443 444
pr 472
PrintSet 63 263 301 303 352 559 566
PutNt 64 403 408 571 582
PutTermSet 65 403 408 587 606
ready 39 247 390 394
RepNode 14 140 198
RepSy 91 488 489
Restriction 17 464 471
rootloc 13 341
rp 197 201 201 242 293 295 307
s 61 62 63 65 119 120 129 134 136 136 138 139
166 167 179 181 182 183 216 217 440 447 448 448
448 450 450 451 451 507 510 516 518 523 524 529
533 553 554 559 563 564 587 595 611 612
s1 229 242 242 251 251 541 545 617 619 619
s2 541 545 617 619
sem 44 367 372 372 372 439 448 448
sem1 477
sem2 477
SemErr 17 474
set 222 225 231 242 246 248 251 253 261 263 280 293
294 361 362
SetBit 246 296 611 612
sn 58 71 71 83 89 90 90 90 91 91 98 99
105 153 153 154 154 157 176 208 209 228 250 251
275 334 336 339 379 387 392 393 421 465 489 499
575 577 577 592 596 596
sn1 378 379 488 489
sp 140 198 239 246 247 248 250 289 294 296 301 303
spix 43 105 148 155 159 159 367 371 371 425 439 443
443 448 448 466 466 494 499 499 577 596
spx 416 425 460 466
start 90 90 209 251 336 339 393 479
sy 64 70 71 71 148 153 153 154 154 157 216 217
361 362 378 379 416 421 488 489 571 577
sym 277 290 296 307
Symbollist 47 58
Symbolnode 47 83 176 228 275 378 387 488 575 592
Symbolset 31 32 34 35 38 61 62 63 65 119 129 166
179 216 222 225 229 280 361 507 516 523 529 541
553 559 587 611 617
Symboltype 460
SyNr 494 502
System 19
SYSTEM 20
t 246 468
tp 460 466
ts 34 38 182 217 248 294 301 321 321 328 352 362
393 403 408
typ 71 133 153 153 154 196 238 244 292 423 466 467
Unit 182 242 248 251 294 321 617 619
VAL 20 136 510 563 564
vialp 187 196
Write 18 580
WriteCard 18 238 238 239 289 289 290 300 336 337 352 563
564
WriteLn 18 101 101 102 106 110 110 112 239 290 304 337
353 400 401 405 405 406 579 598 605
WriteString 18 102 106 111 237 263 288 299 301 302 336 350
401 406 580 598 601
WriteText 18 106 580 601
1 (* General table-driven syntax analyzer
3 This is a parser module generated by Coco from an attributed grammar.
4 Before calling the procedure Parse from the main program, initialize
5 the scanner (<grammarname>lex.MOD).
6 *)
7 DEFINITION MODULE cocosyn;
8 VAR
9 printinput: BOOLEAN; (*trace the input tokens read*)
10 printnodes: BOOLEAN; (*trace the G-code interpretation*)
11
12 PROCEDURE Parse(VAR correct:BOOLEAN);
13 END cocosyn.
AppF
cocosynMOD
317
2 (* General table-driven syntax analyzer Re
3 ==================================== Moe 21.12.83
4 01 (21.12.83) First version (rewritten from PL/M)
5 02 (28.02.84) New interface for input and errors
6 03 (02.04.84) Error in EOL-processing corrected
7 04 (08.05.84) New EOL-processing
8 05 (23.07.84) For G-code
9 06 (30.08.84) Error recovery simplified
10 07 (05.04.85) New G-code instruction EPSA (ANYA modified)
11 08 (12.04.87) Grammar tables initialized INLINE
12 09 (12.04.87) typ,col,line and at exported by cocolex
13 10 (07.06.87) Name of error module and scanner procedure constant
14
15 IMPLEMENTATION MODULE cocosyn;
16
17 FROM Errors IMPORT SyntaxError, Errorptr, Errornode;
18 FROM FileIO IMPORT con, WriteCard, WriteLn, WriteString;
19 FROM System IMPORT Allocate;
20 FROM SYSTEM IMPORT ADDRESS, ADR, INLINE;
21
22 FROM cocosem IMPORT Semant;
23 FROM cocolex IMPORT GetSy, typ, at, line, col;
24
25 CONST
26 maxname = 385;
27 maxnamep = 45;
28 maxcode = 401;
29 maxany = 3;
30 maxeps = 10;
31 maxt = 34;
32 maxp = 34;
33 maxs = 45;
34 startpc = 397;
35
36
37
38 CONST (*G-code instructions*)
39 t = 0; ta = 1; nt = 2; nta = 3;
40 nts = 4; ntas = 5; any = 6; anya = 7;
41 eps = 8; epsa = 9; jmp = 10; ret = 11;
42
43 errdistmin = 2; (*min.distance between two errors*)
44 lmaxs = 50; (*max.stack length*)
45 eofsy = 0; (*token number of endfile symbol*)
46
47 TYPE
48 Attributenumbers = ARRAY[0..maxp] OF CARDINAL;
49 Namepointers = ARRAY[0..maxnamep] OF CARDINAL;
50 Namelist = ARRAY [1..maxname] OF CHAR;
51 Pragma = RECORD (*semantics for a pragma*)
52 sem2,sem3: CARDINAL;
53 END;
54 Pragmalist = ARRAY [maxt..maxp] OF Pragma;
55 Symbolset = ARRAY[0..maxt DIV 16] OF BITSET;
56 (*set of terminals*)
57 Symbolnode = RECORD (*symbol information (only for nt)*)
58 startpc: CARDINAL; (*start node of rule for nt*)
59 del: BOOLEAN; (*TRUE, if nt is deletable*)
60 first: Symbolset; (*terminals causing to analyze this nt*)
61 END;
62 Symbollist = ARRAY [maxp+1..maxs] OF Symbolnode;
63 Stack = ARRAY[1..lmaxs] OF CARDINAL;
64
65 VAR
66 tab: POINTER TO RECORD (*grammar tables*)
67 header: ARRAY[1..8] OF CARDINAL; (*not used*)
68 code: ARRAY[1..maxcode] OF CHAR; (*G-code area*)
69 ntsymbols: Symbollist; (*nonterminals information*)
70 epsset: ARRAY[1..maxeps] OF Symbolset;
71 anyset: ARRAY[1..maxany] OF Symbolset;
72 nra: Attributenumbers; (*no.of attributes*)
73 ps: Pragmalist; (*semantics for pragmas*)
74 namep: Namepointers; (*pointers to symbol names*)
75 name: Namelist; (*symbol names*)
76 END;
77 correct: BOOLEAN; (*error indicator*)
78 pc: CARDINAL; (*program counter*)
79
80 errdist: CARDINAL; (*current error distance*)
81 newlacts: ARRAY [0..maxt] OF CARDINAL; (*new stack length*)
82 newpc: ARRAY [0..maxt] OF CARDINAL; (*pc after recovery*)
83 s,olds: Stack;
84 lacts: CARDINAL; (*stack pointer*)
85
86
87 PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc: CARDINAL);
88 FORWARD;
89 PROCEDURE RestoreStack; FORWARD;
90 PROCEDURE SaveStack; FORWARD;
91 PROCEDURE StackElem(i:CARDINAL): CARDINAL; FORWARD;
92 PROCEDURE Triple(altroot:CARDINAL); FORWARD;
93
94
95 (* Match Check if sy is member of the specified set
96 *)
97 PROCEDURE Match(sy:CARDINAL; set:Symbolset): BOOLEAN;
98 BEGIN RETURN (sy MOD 16) IN set[sy DIV 16]; END Match;
99
100
101 (* NextSym Get next symbol
102 *)
103 PROCEDURE NextSym;
104 BEGIN
105 LOOP
106 GetSy;
107 (*IF printinput THEN
108 WriteString(con,"$(in:"); WriteCard(con,typ,3);
109 WriteString(con,") ");
110 IF printnodes THEN
111 WriteCard(con,lacts,3); WriteString(con,"| ");
112 END;
113 END;*)
114 IF typ<=maxt THEN RETURN END;
115 WITH tab^ DO
116 IF correct AND (ps[typ].sem2<>0) THEN Semant(ps[typ].sem2); END;
117 IF correct AND (ps[typ].sem3<>0) THEN Semant(ps[typ].sem3); END;
118 END;
119 IF typ=eofsy THEN RETURN END;
120 END;
121 END NextSym;
122
123
124
125 (*============================ ERRORS ============================*)
126
127 (* AdjustPc Adjust pc to next symbol instruction
128 *)
129 PROCEDURE AdjustPc(VAR pc:CARDINAL);
130 BEGIN
131 WITH tab^ DO
132 IF pc=0 THEN RETURN; END;
133 LOOP
134 CASE ORD(code[pc]) OF
135 t,ta,nt,nta,nts,ntas,any,anya,eps,epsa: EXIT;
136 | jmp: pc:=256*ORD(code[pc+1])+ORD(code[pc+2]);
137 | ret: pc:=0; EXIT;
138 ELSE INC(pc); (*sem*)
139 END;
140 END;
141 END;
142 END AdjustPc;
143
144
145 (* Error Report syntax error
146 *)
147 PROCEDURE Error(VAR pc,altroot:CARDINAL);
148 VAR
149 e,e1,h: Errorptr;
150 i,j: CARDINAL;
151 opcode,sy,nextpc,altpc,pc1: CARDINAL;
152
153 PROCEDURE GiveName(q:Errorptr; sy:CARDINAL);
154 VAR p,j: CARDINAL;
155 BEGIN
156 WITH tab^ DO
157 p:=namep[sy]; j:=0;
158 WHILE (j<25) AND (name[p+j]<>0C) DO
159 INC(j); q^.txt[j]:=name[p+j-1];
160 END;
161 q^.l:=j;
162 END;
163 END GiveName;
164
165 BEGIN (*Error*)
166 correct:=FALSE;
167 IF errdist >= errdistmin
168 THEN
169 Allocate(h,SIZE(Errornode)); GiveName(h,typ); (*pass near-symbol*)
170 h^.next:=NIL; e1:=h;
171 pc1:=altroot; AdjustPc(pc1);
172 WHILE pc1>0 DO
173 GetSymInstr(pc1,opcode,sy,nextpc,altpc);
174 IF opcode<any THEN (*t,nt,nts,ta,nta,ntas*)
175 Allocate(e,SIZE(Errornode));
176 GiveName(e,sy); (*pass expected symbol*)
177 e1^.next:=e; e1:=e; e^.next:=NIL;
178 END;
179 pc1:=altpc;
180 END; (*WHILE*)
181 SyntaxError(h,line,col);
182 Triple(altroot); SaveStack;
183 IF printnodes THEN
184 WriteString(con,"$ typ newpc newlacts$");
185 FOR i:=0 TO maxt DO
186 IF newpc[i]<>0 THEN
187 WriteCard(con,i,5); WriteCard(con,newpc[i],10);
188 WriteCard(con,newlacts[i],10); WriteLn(con);
189 END; (*IF*)
190 END; (*FOR*)
191 END; (*IF*)
192 ELSE RestoreStack;
193 END;
194 WHILE newpc[typ]=0 DO
195 IF printnodes THEN
196 WriteString(con,"$(skip:"); WriteCard(con,typ,0);
197 WriteString(con,") ");
198 END;
199 NextSym;
200 END;
201 pc:=newpc[typ]; altroot:=pc; lacts:=newlacts[typ]; errdist:=0;
202 END Error;
203
204
205 (* Fill Fill triple list with alt-chain starting at pc
206 *)
207 PROCEDURE Fill(pc,lacts:CARDINAL);
208 VAR
209 i,opcode,sy,nextpc,altpc: CARDINAL;
210 s: Symbolset;
211 BEGIN
212 AdjustPc(pc);
213 WHILE pcoO DO
214 GetSymInstr(pc,opcode,sy,nextpc,altpc);
215 CASE opcode OF
216 t,ta:
217 newpc[sy]:=pc; newlacts[sy]:=lacts;
218 | nt,nta,nts,ntas:
219 s:=tab^.ntsymbols[sy].first;
220 FOR i:=0 TO maxt DO
221 IF Match(i,s) THEN newpc[i]:=pc; newlacts[i]:=lacts; END;
222 END;
223 IF tab^.ntsymbols[sy].del THEN Fill(nextpc,lacts); END;
224 | eps,epsa:
225 Fill(nextpc,lacts);
226 ELSE (*any,anya: nothing*)
227 END; (*CASE*)
228 pc:=altpc;
229 END; (*WHILE*)
230 END Fill;
231
232
233 (* FillSucc Fill triple list with succ. of alt-chain at pc
234 *)
235 PROCEDURE FillSucc(pc,lacts:CARDINAL);
236 VAR
237 opcode,sy,nextpc,altpc: CARDINAL;
238 BEGIN
239 AdjustPc(pc);
240 WHILE pc>0 DO (*fill with successors of alternative-starts*)
241 GetSymInstr(pc,opcode,sy,nextpc,altpc);
242 IF nextpc<>0 THEN Fill(nextpc,lacts); END;
243 pc:=altpc;
244 END; (*WHILE*)
245 END FillSucc;
246
247
248 (* GetSymInstr Get G-code instruction at address pc
249 *)
250 PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc: CARDINAL);
251 BEGIN (*assert: pc points to a symbol instruction (not RET,JMP,SEM,ANY)*)
252 WITH tab^ DO
253 opcode:=ORD(code[pc]);
254 IF (opcode<=epsa) AND (opcode<>any)
255 THEN sy:=ORD(code[pc+1]);
256 ELSE sy:=0;
257 END;
258 CASE opcode OF
259 t,nt,eps:
260 nextpc:=pc+2; altpc:=0;
261 | ta,nta,anya,epsa:
262 nextpc:=pc+4; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
263 | nts: nextpc:=pc+3; altpc:=0;
264 | ntas: nextpc:=pc+5; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
265 | any: nextpc:=pc+1; altpc:=0;
266 END; (*CASE*)
267 AdjustPc(nextpc); AdjustPc(altpc);
268 END;
269 (*assert: nextpc,altpc point to symbol instructions or are zero*)
270 END GetSymInstr;
271
272
273 (* Triple Fill triple list
274 *)
275 PROCEDURE Triple(altroot:CARDINAL);
276 VAR i: CARDINAL;
277 BEGIN
278 FOR i:=0 TO maxt DO (*clear triple list*)
279 newpc[i]:=0; newlacts[i]:=0;
280 END;
281 FOR i:=1 TO lacts DO (*fill with succ.of stacked nt's*)
282 (*s[1] contains successor at level 0*)
283 FillSucc(StackElem(i),i-1);
284 Fill(StackElem(i),i-1);
285 END;
286 FillSucc(altroot,lacts); (*fill with succ.of alt-chain*)
287 Fill(altroot,lacts); (*fill with current alt-chain*)
288 END Triple;
289
290 (*=============================== END ERRORS ===============================*)
291
292
293
294 (*=========================== SYNTAXSTACK ===========================*)
295
296 PROCEDURE Pop(VAR loc: CARDINAL);
297 BEGIN
298 IF lacts>0
299 THEN loc:=s[lacts]; DEC(lacts);
300 ELSE WriteString(con,"-- Parser stack underflow.$"); HALT;
301 END;
302 (*IF printnodes THEN WriteString(con," pop"); END;*)
303 END Pop;
304
305 PROCEDURE Push(loc: CARDINAL);
306 BEGIN
307 IF lacts<lmaxs
308 THEN INC(lacts); s[lacts]:=loc;
309 ELSE WriteString(con,"-- Parser stack overflow.$"); HALT;
310 END;
311 (*IF printnodes THEN WriteString(con," push"); END;*)
312 END Push;
313
314 PROCEDURE RestoreStack;
315 BEGIN s:=olds; END RestoreStack;
316
317 PROCEDURE SaveStack;
318 BEGIN olds:=s; END SaveStack;
319
320 PROCEDURE StackElem(i:CARDINAL): CARDINAL;
321 BEGIN RETURN s[i]; END StackElem;
322
323 (*========================= END SYNTAXSTACK =========================*)
324
325
326 (* TableContents A dirty trick to initialize the grammar tables
327 *)
328 PROCEDURE TableContents;
329 BEGIN (*%% dont remove or change this comment*)
330 INLINE(
331 401, 34, 34, 45, 10, 3, 45, 385,
332 (*--G-code--*)
333 7, 17, 5398, 271, 22, 3, 4359, 256, 5648, 2560,
334 3592, 279, 265, 36, 811, 36, 2560, 7424, 4120, 812,
335 56, 5125, 9984,12569, 813, 39, 2560, 9985, 3072,20506,
336 812, 80, 5125, 9984,18459, 7171,10752,15645, 2560,15616,
337 2590, 273, 101, 7956, 1319, 94, 8195,11520,21258, 83,
338 2050, 8448, 3329, 4352,33311, 8709, 9984,29987, 2052, 3840,
339 5122, 9252, 21, 2560,27144, 805, 4, 9739, 549,10024,
340 278, 151, 549,10506, 141, 2053, 2858, 1062,11052, 1318,
341 168,11566, 2560,40712, 1547, 812, 186,12037, 9984,46640,
342 12552, 1807, 2817, 1536,49202, 2817, 512,50739, 2819,10752,
343 52276, 2817, 5888,55315, 548,13568, 6162, 2817, 6400,58387,
344 548,13824, 6674, 2816, 6931, 548,14080, 7186,14347, 29,
345 14597,10241, 58, 287, 253, 553, 30, 2820,10554, 2560,
346 64768, 2107, 32, 273, 297, 7948, 289, 293, 273, 286,
347 7948, 2561, 4352, 4924, 3594, 273, 2056,15627, 19,15374,
348 2561, 4352, 2878, 32, 17, 7949, 289, 324, 17, 7949,
349 2561,14600, 2367, 2816, 3648, 279, 345,16640, 4383,16896,
350 6144, 1291, 1794, 353,17162, 345, 2058,17418, 342, 14,
351 32, 17, 8005, 32, 1795, 377,17930, 369, 5,18187,
352 273, 387, 7947, 18, 7947, 1, 556,18443, 547, 0,
353 2816,
354 (*--nt-symbols--*)
355 1, 0, 128, 0, 0, 137, 0,16452, 2694, 0,
356
357
358
359
360
361 (
362
363
364
365 (
366
367 (
368
369
370
371
372 (
373
374 (
375
376
377
378
379
380 (
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405 (*
406 —
0, 0, 16,
0, 0, 5472,
0,16384, 0,
65535,65535,65502,65535,65535,
)
154, 0,16452, 2694,
239, 0, 0, 8192,
304, 0, 2048, 0,
359, 0,16384, 0,
391, 0, 2, 0,
--eps followers--*)
512, 1, 0, 8192,
16, 0, 0, 5408,
0, 0,49152, 0,
--any sets--*)
65022,65534,65535,65502,
--attribute numbers--*)
0, 0, 0, 0,
0, 0, 0, 0,
0, 0, 0, 0,
0, 0, 0, 0,
--pragma semantic--*)
0, 0,
--name pointers--*)
1, 5, 13, 19,
83, 98, 104, 114,
177, 181, 185, 189,
217, 221, 225, 229,
300, 315, 331, 349,
--name list--*)
17743,17920, 8769,19529,16723, 8704, 8801,
17731,19521,21057,21577,20302,21282, 34,
19746, 34,25966,25715,25965, 8704, 8805,
21057,19789,16722, 8704, 8809,28194, 34,
8704, 8782,20302,21573,21069,18766,16716,
29730, 34,20562,16711,19777,21282, 34,
34,29541,27938, 34,21317,19777,20052,
21573,21069,18766,16716,21282, 105,25701,
29184,21332,21065,20039, 78,21837,16965,
10030, 9984,10108, 9984,10024, 9984,10025,
10077, 9984,10107, 9984,10109, 9984,10044,
10043, 9984,10042, 9984,10028, 9984,28271,
34,25455,29298,25955,29728,26482,24941,
30832,29285,29555,26991,28160,24940,29797,
25856,29561,28002,28524, 97,29812,29289,
26990,11617,29812,29289,25205,29797, 8704,
29812,29289,25205,29797, 8704, 8819,25965,
24931,29801,28526, 8704, 8819,25965,24942,
25458,28450, 115,31085,25199,27648, 8801,
24941,25890, 0,0);
0, 171, 0,16452, 2694, 0,
0, 262, 0, 256, 0, 0,
0, 328, 0,16384, 0, 0,
0, 381, 0, 0, 6, 0,
0,
0,
0,
32,
0, 0,
16452, 8166,
0, 0,
0,
0,
0,
1,
34,
122,
193,
233,
366,
0,
0,
0,
44,
128,
197,
242,
373,
0,
0,
0,
53,
140,
201,
260,
0,
1,
0,
59,
152,
205,
271,
0,
1,
0,
69,
163,
209,
283,
0,
1,
0,
74,
170,
213,
290,
28281,
17742,
28787,
19777,
21282,
21077,
18755,
28276,
20992,
9984,
9984,
25455,
28001,
29294,
25205,
8815,
24942,
29801,
27753,
8704,
17479,
8704,
17234,
34,
19525,
21282,
26982,
10045,
10075,
10046,
25455,
29218,
24948,
29797,
30068,
29801,
25376,
24947,
8772,
21057,
8775,
20307,
28533,
21282,
34,
26981,
9984,
9984,
9984,
29561,
101,
26998,
34,
11617,
25376,
28001,
8302,
END TableContents;
(* Parse Proper syntax analyzer
*)
407 PROCEDURE Parse(VAR corr:BOOLEAN);
408 VAR
409 altroot: CARDINAL; (*root of current alternative chain*)
410 mustread: BOOLEAN; (*TRUE if next symbol must be read*)
411 opcode: CARDINAL; (*instruction code*)
412 running: BOOLEAN; (*interpreter state*)
413 sy: CARDINAL;
414
415 BEGIN
416 tab:=ADR(TableContents)+10D; (*initialize the tables*)
417 pc:=startpc; altroot:=pc;
418 line:=1; col:=1;
419 correct:=TRUE; mustread:=TRUE; running:=TRUE;
420
421 WITH tab^ DO
422 WHILE running DO
423 opcode:=ORD(code[pc]);
424 IF mustread AND (opcode<=epsa) THEN
425 NextSym; mustread:=FALSE; INC(errdist); altroot:=pc;
426 END;
427 (*IF printnodes THEN WriteCard(con,pc,5); END;*)
428 INC(pc);
429 CASE opcode OF
430 t:
431 IF ORD(typ)=ORD(code[pc])
432 THEN IF typ=eofsy (*t recognized*)
433 THEN running:=FALSE;
434 ELSE INC(pc); mustread:=TRUE;
435 END;
436 ELSE Error(pc,altroot);
437 END;
438 | ta:
439 IF ORD(typ)=ORD(code[pc])
440 THEN INC(pc,3); mustread:=TRUE; (*t recognized*)
441 ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
442 END;
443 | nt,nts:
444 sy:=ORD(code[pc]);
445 IF Match(typ,ntsymbols[sy].first) OR ntsymbols[sy].del
446 THEN (*right nt, parse it*)
447 IF opcode=nts THEN INC(pc); Semant(ORD(code[pc])); END;
448 Push(pc+1); pc:=ntsymbols[sy].startpc;
449 altroot:=pc;
450 ELSE Error(pc,altroot);
451 END;
452 | nta,ntas:
453 sy:=ORD(code[pc]);
454 IF Match(typ,ntsymbols[sy].first)
455 THEN (*right nt, parse it*)
456 INC(pc,3);
457 IF opcode=ntas THEN Semant(ORD(code[pc])); INC(pc) END;
458 Push(pc); pc:=ntsymbols[sy].startpc;
459 altroot:=pc;
460 ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
461 END;
462 | any: mustread:=TRUE; (*any recognized*)
463 | anya:
464 IF Match(typ,anyset[ORD(code[pc])])
465 THEN INC(pc,3); mustread:=TRUE; (*any recognized*)
466 ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
467 END;
468 | eps:
469 IF Match(typ,epsset[ORD(code[pc])])
470 THEN INC(pc);
471 ELSE Error(pc,altroot);
472 END;
473 | epsa:
474 IF Match(typ,epsset[ORD(code[pc])])
475 THEN INC(pc,3); (*eps recognized*)
476 ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
477 END;
478 | jmp: pc:=ORD(code[pc])*256+ORD(code[pc+1]); (*goto successor*)
479 | ret: Pop(pc); altroot:=pc; (*end of nt*)
480 ELSE (*sem*)
481 IF correct THEN Semant(ORD(opcode)); END;
482 END; (*CASE*)
483 END; (*WHILE running*)
484 END; (*WITH tab^*)
485 corr:=correct;
486 END Parse;
487
488 BEGIN
489 printinput:=FALSE;
490 printnodes:=FALSE;
491 errdist:=100;
492 lacts:=0;
493 END cocosyn.
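Parse interprets the G-code as a byte string (code, indexed from 1). The layout has three rules:

- A symbol instruction occupies 1 to 5 bytes, depending on its opcode.
- Addresses are stored in two bytes, high byte first.
- Any byte value above ret is a semantic action.

GetSymInstr and AdjustPc above decode this layout. The following Python sketch restates the two procedures so the layout can be checked on a small hand-made fragment. It is only an illustration. The Python names are hypothetical, and the list carries a dummy element at index 0 so that its indices match ARRAY[1..maxcode].

```python
# Opcode values from the CONST section of cocosyn.MOD (t = 0 ... ret = 11).
T, TA, NT, NTA, NTS, NTAS, ANY, ANYA, EPS, EPSA, JMP, RET = range(12)

def addr(code, i):
    """Two-byte address, high byte first: 256*ORD(code[i])+ORD(code[i+1])."""
    return 256 * code[i] + code[i + 1]

def adjust_pc(code, pc):
    """Like AdjustPc: follow JMPs, skip semantic-action bytes, 0 at RET."""
    if pc == 0:
        return 0
    while True:
        op = code[pc]
        if op <= EPSA:          # a symbol instruction: stop here
            return pc
        if op == JMP:
            pc = addr(code, pc + 1)
        elif op == RET:         # end of the rule: no further symbol
            return 0
        else:                   # semantic action, one byte
            pc += 1

def get_sym_instr(code, pc):
    """Like GetSymInstr: returns (opcode, sy, nextpc, altpc)."""
    op = code[pc]
    sy = code[pc + 1] if op <= EPSA and op != ANY else 0
    if op in (T, NT, EPS):
        nextpc, altpc = pc + 2, 0
    elif op in (TA, NTA, ANYA, EPSA):
        nextpc, altpc = pc + 4, addr(code, pc + 2)
    elif op == NTS:
        nextpc, altpc = pc + 3, 0
    elif op == NTAS:
        nextpc, altpc = pc + 5, addr(code, pc + 2)
    else:                       # ANY
        nextpc, altpc = pc + 1, 0
    return op, sy, adjust_pc(code, nextpc), adjust_pc(code, altpc)

# Hand-made fragment (index 0 unused):
#   1: TA 5, alt -> 7   5: semantic action 20   6: RET
#   7: T 9              9: JMP -> 6
sample = [0, TA, 5, 0, 7, 20, RET, T, 9, JMP, 0, 6]
print(get_sym_instr(sample, 1))   # (1, 5, 0, 7)
print(get_sym_instr(sample, 7))   # (0, 9, 0, 0)
```

The ta instruction at position 1 is followed by a semantic action and then ret, so its nextpc comes out as 0, and its alternative starts at 7. The jmp at 9 likewise leads to the ret.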
ADDRESS
AdjustPc
ADR
Allocate
altpc
altroot
any
anya
anyset
at
Attributenumbers
C
cocolex
cocosem
cocosyn
code
col
con
corr
correct
D
del
e
e1
eofsy
eps
epsa
epsset
errdist
errdistmin
Error
Errornode
Errorptr
20
129
20
19
87
262
92
449
40
40
71
23
48
158
23
22
15
68
439
469
23
18
407
77
416
59
149
149
45
41
41
70
80
43
147
17
17
142 171 212 239 267 267
416
169 175
151 173 179 209 214 228 237 241 243 250 260
263 264 265 267
147 171 182 201 275 286 287 409 417 425 436
450 459 471 479
135 174 254 265 462
135 261 463
464
72
493
134 136 136 253 255 262 262 264 264 423 431
441 441 444 447 453 457 460 460 464 466 466
474 476 476 478 478
181 418
184 187 187 188 188 196 196 197 300 309
485
116 117 166 419 481 485
223 445
175 176 177 177 177
170 177 177
119 432
135 224 259 468
135 224 254 261 424 473
469 474
167 201 425 491
167
202 436 450 471
169 175
149 153
Errors 17
FileIO 18
Fill 207 223 225 230 242 284 287
FillSucc 235 245 283 286
first 60 219 445 454
FORWARD 88 89 90 91 92
GetSy 23 106
GetSymInstr 87 173 214 241 250 270
GiveName 153 163 169 176
h 149 169 169 170 170 181
HALT 300 309
header 67
i 91 150 185 186 187 187 188 209 220 221 221 221
276 278 279 279 281 283 283 284 284 320 321
INLINE 20 330
j 150 154 157 158 158 159 159 159 161
jmp 41 136 478
l 161
lacts 84 201 207 217 221 223 225 235 242 281 286 287
298 299 299 307 308 308 492
line 23 181 418
lmaxs 44 63 307
loc 296 299 305 308
Match 97 98 221 445 454 464 469 474
maxany 29 71
maxcode 28 68
maxeps 30 70
maxname 26 50
maxnamep 27 49
maxp 32 48 54 62
maxs 33 62
maxt 31 54 55 81 82 114 185 220 278
mustread 410 419 424 425 434 440 462 465
name 75 158 159
Namelist 50 75
namep 74 157
Namepointers 49 74
newlacts 81 188 201 217 221 279
newpc 82 186 187 194 201 217 221 279
next 170 177 177
nextpc 87 151 173 209 214 223 225 237 241 242 242 250
260 262 263 264 265 267
NextSym 103 121 199 425
nra 72
nt 39 135 218 259 443
nta 39 135 218 261 452
ntas 40 135 218 264 452 457
nts 40 135 218 263 443 447
ntsymbols 69 219 223 445 445 448 454 458
olds 83 315 318
opcode 87 151 173 174 209 214 215 237 241 250 253 254
254 258 411 423 424 429 447 457 481
p 154 157 158 159
Parse 407 486
pc 78 87 129 132 134 136 136 136 137 138 147 201
201 207 212 213 214 217 221 228 235 239 240 241
243 250 253 255 260 262 262 262 263 264 264 264
265 417 417 423 425 428 431 434 436 439 440 441
441 441 444 447 447 448 448 449 450 453 456 457
457 458 458 459 460 460 460 464 465 466 466 466
469 470 471 474 475 476 476 476 478 478 478 479
pc1
Pop
Pragma
Pragmalist
printinput
printnodes
ps
Push
q
RestoreStack
ret
running
s
SaveStack
sem2
sem3
Semant
set
Stack
StackElem
startpc
sy
Symbollist
Symbolnode
Symbolset
SyntaxError
System
SYSTEM
t
ta
tab
TableContents
Triple
txt
typ
WriteCard
WriteLn
WriteString
479
151
296
51
54
489
183
73
305
153
89
41
412
83
90
52
52
22
97
63
91
34
87
217
448
62
57
55
17
19
20
39
39
66
328
92
159
23
431
18
18
18
171
303
54
73
195
116
312
159
192
137
419
210
182
116
117
116
98
83
283
58
97
219
453
69
62
60
181
135
135
115
402
182
114
432
187
188
184
171
479
490
116
448
161
314
479
422
219
317
116
117
117
284
417
98
223
454
70
216
216
131
416
275
116
439
187
196
172
117
458
315
433
221
318
447
320
448
98
237
458
71
259
261
156
288
116
445
188
197
173
117
299
457
321
458
151
241
97
430
438
219
117
454
196
300
179
308
481
153
250
210
223
117
464
309
315
157
255
252
119
469
318
173
256
416
169
474
321
176 209 214 217
413 444 445 445
421
194 196 201 201
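The name list at the end of TableContents holds the symbol names as 0C-terminated strings packed two characters to an INLINE word. GiveName copies at most 25 characters of a name, starting at namep[sy]. The Python sketch below decodes such words; the function names are hypothetical.

The sketch assumes that the high byte of each word holds the first character. Under that assumption the first six words of the name list (17743, 17920, 8769, 19529, 16723 and 8704) read EOF and "ALIAS" (with the quotes), starting at positions 1 and 5. The next name would then start at 13, which agrees with the name pointers 5 and 13 in the table.

```python
def unpack_words(words):
    """Split 16-bit INLINE words into bytes, high byte first (assumed order)."""
    data = []
    for w in words:
        data.append(w >> 8)
        data.append(w & 0xFF)
    return data

def symbol_name(data, start, max_len=25):
    """Like GiveName: copy at most max_len characters of the 0C-terminated
    name that begins at the 1-based position start."""
    chars = []
    i = start - 1                      # Python lists are 0-based
    while len(chars) < max_len and i < len(data) and data[i] != 0:
        chars.append(chr(data[i]))
        i += 1
    return "".join(chars)

# First words of the name list in TableContents.
names = unpack_words([17743, 17920, 8769, 19529, 16723, 8704])
print(symbol_name(names, 1), symbol_name(names, 5))   # EOF "ALIAS"
```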
1 (* General table-driven syntax analyzer
2 ====================================
3 This is a parser module generated by Coco from an attributed grammar.
4 Before calling the procedure Parse from the main program, initialize
5 the scanner (<grammarname>lex.MOD).
6 *)
7 DEFINITION MODULE -->modulename;
8 VAR
9 printinput: BOOLEAN; (*trace the input tokens read*)
10 printnodes: BOOLEAN; (*trace the G-code interpretation*)
11
12 PROCEDURE Parse(VAR correct:BOOLEAN);
13 END -->modulename.
14 --> implementation
15 (* General table-driven syntax analyzer Re
16 ==================================== Moe 21.12.83
17 01 (21.12.83) First version (rewritten from PL/M)
18 02 (28.02.84) New interface for input and errors
19 03 (02.04.84) Error in EOL-processing corrected
20 04 (08.05.84) New EOL-processing
21 05 (23.07.84) For G-code
22 06 (30.08.84) Error recovery simplified
23 07 (05.04.85) New G-code instruction EPSA (ANYA modified)
24 08 (12.04.87) Grammar tables initialized INLINE
25 09 (12.04.87) typ,col,line and at exported by cocolex
26 10 (07.06.87) Name of error module and scanner procedure constant
27
28 IMPLEMENTATION MODULE -->modulename;
29
30 FROM Errors IMPORT SyntaxError, Errorptr, Errornode;
31 FROM FileIO IMPORT con, WriteCard, WriteLn, WriteString;
32 FROM System IMPORT Allocate;
33 FROM SYSTEM IMPORT ADDRESS, ADR, INLINE;
34
35 FROM -->semantic analyzer IMPORT Semant;
36 FROM -->input module IMPORT GetSy, typ, at, line, col;
37
38 -->declarations
39
40 CONST (*G-code instructions*)
41 t = 0; ta = 1; nt = 2; nta = 3;
42 nts = 4; ntas = 5; any = 6; anya = 7;
43 eps = 8; epsa = 9; jmp = 10; ret = 11;
44
45 errdistmin = 2; (*min. distance between two errors*)
46 lmaxs = 50; (*max. stack length*)
47 eofsy = 0; (*token number of endfile symbol*)
48
49 TYPE
50 Attributenumbers = ARRAY[0..maxp] OF CARDINAL;
51 Namepointers = ARRAY[0..maxnamep] OF CARDINAL;
52 Namelist = ARRAY[1..maxname] OF CHAR;
53 Pragma = RECORD (*semantics for a pragma*)
54 sem2,sem3: CARDINAL;
55 END;
56 Pragmalist = ARRAY[maxt..maxp] OF Pragma;
57 Symbolset = ARRAY[0..maxt DIV 16] OF BITSET;
58 (*set of terminals*)
59 Symbolnode = RECORD (*symbol information (only for nt)*)
App. F cocosynframe 329
60 startpc: CARDINAL; (*start node of rule for nt*)
61 del: BOOLEAN; (*TRUE, if nt is deletable*)
62 first: Symbolset; (*terminals causing this nt to be analyzed*)
63 END;
64 Symbollist = ARRAY[maxp+1..maxs] OF Symbolnode;
65 Stack = ARRAY[1..lmaxs] OF CARDINAL;
66
67 VAR
68 tab: POINTER TO RECORD (*grammar tables*)
69 header: ARRAY[1..8] OF CARDINAL; (*not used*)
70 code: ARRAY[1..maxcode] OF CHAR; (*G-code area*)
71 ntsymbols: Symbollist; (*nonterminals information*)
72 epsset: ARRAY[1..maxeps] OF Symbolset;
73 anyset: ARRAY[1..maxany] OF Symbolset;
74 nra: Attributenumbers; (*no.of attributes*)
75 ps: Pragmalist; (*semantics for pragmas*)
76 namep: Namepointers; (*pointers to symbol names*)
77 name: Namelist; (*symbol names*)
78 END;
79 correct: BOOLEAN; (*error indicator*)
80 pc: CARDINAL; (*program counter*)
81
82 errdist: CARDINAL; (*current error distance*)
83 newlacts: ARRAY [0..maxt] OF CARDINAL; (*new stack length*)
84 newpc: ARRAY [0..maxt] OF CARDINAL; (*pc after recovery*)
85 s,olds: Stack;
86 lacts: CARDINAL; (*stack pointer*)
87
88
89 PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc: CARDINAL);
90 FORWARD;
91 PROCEDURE RestoreStack; FORWARD;
92 PROCEDURE SaveStack; FORWARD;
93 PROCEDURE StackElem(i:CARDINAL): CARDINAL; FORWARD;
94 PROCEDURE Triple(altroot:CARDINAL); FORWARD;
95
96
97 (* Match Check if sy is member of the specified set
98 *)
99 PROCEDURE Match(sy:CARDINAL; set:Symbolset): BOOLEAN;
100 BEGIN RETURN (sy MOD 16) IN set[sy DIV 16]; END Match;
101
102
103 (* NextSym Get next symbol
104 *)
105 PROCEDURE NextSym;
106 BEGIN
107 LOOP
108 GetSy;
109 (*IF printinput THEN
110 WriteString(con,"$(in:"); WriteCard(con,typ,3);
111 WriteString(con,") ");
112 IF printnodes THEN
113 WriteCard(con,lacts,3); WriteString(con,"| ");
114 END;
115 END;*)
116 IF typ<=maxt THEN RETURN END;
117 WITH tab^ DO
118 IF correct AND (ps[typ].sem2<>0) THEN Semant(ps[typ].sem2); END;
119 IF correct AND (ps[typ].sem3<>0) THEN Semant(ps[typ].sem3); END;
120 END;
121 IF typ=eofsy THEN RETURN END;
122 END;
123 END NextSym;
124
125
126
127 (*============================ ERRORS ============================*)
128
129 (* AdjustPc Adjust pc to next symbol instruction
130 *)
131 PROCEDURE AdjustPc(VAR pc:CARDINAL);
132 BEGIN
133 WITH tab^ DO
134 IF pc=0 THEN RETURN; END;
135 LOOP
136 CASE ORD(code[pc]) OF
137 t,ta,nt,nta,nts,ntas,any,anya,eps,epsa: EXIT;
138 | jmp: pc:=256*ORD(code[pc+1])+ORD(code[pc+2]);
139 | ret: pc:=0; EXIT;
140 ELSE INC(pc); (*sem*)
141 END;
142 END;
143 END;
144 END AdjustPc;
145
146
147 (* Error Report syntax error
148 *)
149 PROCEDURE Error(VAR pc,altroot:CARDINAL);
150 VAR
151 e,e1,h: Errorptr;
152 i,j: CARDINAL;
153 opcode,sy,nextpc,altpc,pc1: CARDINAL;
154
155 PROCEDURE GiveName(q:Errorptr; sy:CARDINAL);
156 VAR p,j: CARDINAL;
157 BEGIN
158 WITH tab^ DO
159 p:=namep[sy]; j:=0;
160 WHILE (j<25) AND (name[p+j]<>0C) DO
161 INC(j); q^.txt[j]:=name[p+j-1];
162 END;
163 q^.l:=j;
164 END;
165 END GiveName;
166
167 BEGIN (*Error*)
168 correct:=FALSE;
169 IF errdist >= errdistmin
170 THEN
171 Allocate(h,SIZE(Errornode)); GiveName(h,typ); (*pass near-symbol*)
172 h^.next:=NIL; e1:=h;
173 pc1:=altroot; AdjustPc(pc1);
174 WHILE pc1>0 DO
175 GetSymInstr(pc1,opcode,sy,nextpc,altpc);
176 IF opcode<any THEN (*t,nt,nts,ta,nta,ntas*)
177 Allocate(e,SIZE(Errornode));
178 GiveName(e,sy); (*pass expected symbol*)
179 e1^.next:=e; e1:=e; e^.next:=NIL;
180 END;
181 pc1:=altpc;
182 END; (*WHILE*)
183 SyntaxError(h,line,col);
184 Triple(altroot); SaveStack;
185 IF printnodes THEN
186 WriteString(con,"$ typ newpc newlacts$");
187 FOR i:=0 TO maxt DO
188 IF newpc[i]<>0 THEN
189 WriteCard(con,i,5); WriteCard(con,newpc[i],10);
190 WriteCard(con,newlacts[i],10); WriteLn(con);
191 END; (*IF*)
192 END; (*FOR*)
193 END; (*IF*)
194 ELSE RestoreStack;
195 END;
196 WHILE newpc[typ]=0 DO
197 IF printnodes THEN
198 WriteString(con,"$(skip:"); WriteCard(con,typ, 0);
199 WriteString(con,") ");
200 END;
201 NextSym;
202 END;
203 pc:=newpc[typ]; altroot:=pc; lacts:=newlacts[typ]; errdist:=0;
204 END Error;
205
206
207 (* Fill Fill triple list with alt-chain starting at pc
208 *)
209 PROCEDURE Fill(pc,lacts:CARDINAL);
210 VAR
211 i,opcode,sy,nextpc,altpc: CARDINAL;
212 s: Symbolset;
213 BEGIN
214 AdjustPc(pc);
215 WHILE pc<>0 DO
216 GetSymInstr(pc,opcode,sy,nextpc,altpc);
217 CASE opcode OF
218 t,ta:
219 newpc[sy]:=pc; newlacts[sy]:=lacts;
220 | nt,nta,nts,ntas:
221 s:=tab^.ntsymbols[sy].first;
222 FOR i:=0 TO maxt DO
223 IF Match(i,s) THEN newpc[i]:=pc; newlacts[i]:=lacts; END;
224 END;
225 IF tab^.ntsymbols[sy].del THEN Fill(nextpc,lacts); END;
226 | eps,epsa:
227 Fill(nextpc,lacts);
228 ELSE (*any,anya: nothing*)
229 END; (*CASE*)
230 pc:=altpc;
231 END; (*WHILE*)
232 END Fill;
233
234
235 (* FillSucc Fill triple list with succ. of alt-chain at pc
236 *)
237 PROCEDURE FillSucc(pc,lacts:CARDINAL);
238 VAR
239 opcode,sy,nextpc,altpc: CARDINAL;
240 BEGIN
241 AdjustPc(pc);
242 WHILE pc>0 DO (*fill with successors of alternative-starts*)
243 GetSymInstr(pc,opcode,sy,nextpc,altpc);
244 IF nextpc<>0 THEN Fill(nextpc,lacts); END;
245 pc:=altpc;
246 END; (*WHILE*)
247 END FillSucc;
248
249
250 (* GetSymInstr Get G-code instruction at address pc
251 *)
252 PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc: CARDINAL);
253 BEGIN (*assert: pc points to a symbol instruction (not RET,JMP,SEM,ANY)*)
254 WITH tab^ DO
255 opcode:=ORD(code[pc]);
256 IF (opcode<=epsa) AND (opcodeoany)
257 THEN sy:=ORD(code[pc+l]) ;
258 ELSE sy:=0;
259 END;
260 CASE opcode OF
261 t,nt,eps:
262 nextpc:«pc+2; altpc:=0;
263 I ta,nta,anya,epsa:
264 nextpc:=pc+4; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
265 | nts: nextpc:=pc+3; altpc:-0;
266 | ntas: nextpc:=pc+5; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3J);
267 I any: nextpc:=pc+l; altpc:=0;
268 END; (*CASE*)
269 AdjustPc(nextpc); AdjustPc(altpc);
270 END;
271 (*assert: nextpc,altpc point to symbol instructions or are zero*)
272 END GetSymlnstr;
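GetSymInstr decodes one variable-length G-code instruction. The Python sketch below mirrors its case analysis under several assumptions. The numeric opcode values are hypothetical, since the listing uses symbolic names only. The code is modeled as a byte string. The AdjustPc step is not modeled. As in the listing, a two-byte alternative address is stored high byte first, 256*ORD(code[pc+2])+ORD(code[pc+3]).

```python
from typing import NamedTuple

# Hypothetical numeric opcodes; the listing only uses the symbolic names.
T, TA, NT, NTA, NTS, NTAS, ANY, ANYA, EPS, EPSA = range(10)

# Instruction lengths in bytes, taken from the CASE in GetSymInstr.
LENGTH = {T: 2, NT: 2, EPS: 2, TA: 4, NTA: 4, ANYA: 4, EPSA: 4,
          NTS: 3, NTAS: 5, ANY: 1}
# Instructions that carry a two-byte alternative address after the symbol byte.
HAS_ALT = {TA, NTA, ANYA, EPSA, NTAS}


class SymInstr(NamedTuple):
    opcode: int
    sy: int
    nextpc: int
    altpc: int


def get_sym_instr(code: bytes, pc: int) -> SymInstr:
    """Decode the symbol instruction at pc (AdjustPc is not modelled)."""
    opcode = code[pc]
    # Every symbol instruction except plain ANY has an operand byte.
    sy = code[pc + 1] if opcode != ANY else 0
    nextpc = pc + LENGTH[opcode]
    # High byte first, 256*hi + lo as in the listing; 0 means "no alternative".
    altpc = 256 * code[pc + 2] + code[pc + 3] if opcode in HAS_ALT else 0
    return SymInstr(opcode, sy, nextpc, altpc)
```

For example, `get_sym_instr(bytes([TA, 7, 1, 2, T, 9]), 0)` yields a `ta` instruction for symbol 7 whose alternative starts at address 258.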
273
274
275 (* Triple Fill triple list
276 *)
277 PROCEDURE Triple(altroot: CARDINAL);
278 VAR i: CARDINAL;
279 BEGIN
280 FOR i:=0 TO maxt DO (*clear triple list*)
281 newpc[i]:=0; newlacts[i]:=0;
282 END;
283 FOR i:=1 TO lacts DO (*fill with succ.of stacked nt's*)
284 (*s[1] contains successor at level 0*)
285 FillSucc(StackElem(i),i-1);
286 Fill(StackElem(i),i-1);
287 END;
288 FillSucc(altroot,lacts); (*fill with succ.of alt-chain*)
289 Fill(altroot,lacts); (*fill with current alt-chain*)
290 END Triple;
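Error recovery rests on the triple list that Triple fills in. For each terminal, the list records where parsing may resume (newpc) and the stack depth to use there (newlacts). Error then skips input symbols until it reaches one that has an entry.

The Python sketch below models this with a dictionary. It takes a `levels` input, one resume address and terminal set per stack depth. That input is an assumption of the sketch: it replaces the walk over the G-code that Fill and FillSucc perform.

```python
from typing import Dict, List, Set, Tuple

# terminal -> (resume address, stack depth); plays the role of newpc/newlacts
Triples = Dict[str, Tuple[int, int]]


def build_triples(levels: List[Tuple[int, Set[str]]]) -> Triples:
    """levels[d] = (resume address, terminals acceptable there) at stack depth d.

    Deeper levels are entered later and therefore override shallower ones,
    as in Triple, which fills the stacked levels first and the current
    alternative chain last."""
    triples: Triples = {}
    for depth, (pc, terminals) in enumerate(levels):
        for t in terminals:
            triples[t] = (pc, depth)
    return triples


def recover(tokens: List[str], pos: int, triples: Triples) -> Tuple[int, int, int]:
    """Skip tokens until one has a triple entry (cf. WHILE newpc[typ]=0 DO NextSym).

    Assumes some later token (e.g. end of file) always has an entry, so the
    loop terminates. Returns (new token position, resume address, depth)."""
    while tokens[pos] not in triples:
        pos += 1
    pc, depth = triples[tokens[pos]]
    return pos, pc, depth
```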
291
292 (*========================== END ERRORS ==========================*)
293
294
295
296 (*========================== SYNTAXSTACK ==========================*)
297
298 PROCEDURE Pop(VAR loc: CARDINAL);
299 BEGIN
300 IF lacts>0
301 THEN loc:=s[lacts]; DEC(lacts);
302 ELSE WriteString(con,"-- Parser stack underflow.$"); HALT;
303 END;
304 (*IF printnodes THEN WriteString(con," pop"); END;*)
305 END Pop;
306
307 PROCEDURE Push(loc: CARDINAL);
308 BEGIN
309 IF lacts<lmaxs
310 THEN INC(lacts); s[lacts]:=loc;
311 ELSE WriteString(con,"-- Parser stack overflow.$"); HALT;
312 END;
313 (*IF printnodes THEN WriteString(con," push"); END;*)
314 END Push;
315
316 PROCEDURE RestoreStack;
317 BEGIN s:=olds; END RestoreStack;
318
319 PROCEDURE SaveStack;
320 BEGIN olds:=s; END SaveStack;
321
322 PROCEDURE StackElem(i: CARDINAL): CARDINAL;
323 BEGIN RETURN s[i]; END StackElem;
324
325 (*======================== END SYNTAXSTACK ========================*)
326
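The syntax stack above is a bounded array of return addresses. Below is a minimal Python analogue, with two deliberate differences. First, it raises exceptions where the Modula-2 version prints a message and halts. Second, its `restore` also resets the depth. RestoreStack copies only the array contents; in the listing, Error sets lacts separately.

```python
from typing import List


class SyntaxStack:
    """Bounded stack of return addresses, mirroring Push/Pop/SaveStack/RestoreStack."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size      # plays the role of lmaxs
        self.items: List[int] = []    # items[0] corresponds to s[1]
        self.saved: List[int] = []    # plays the role of olds

    def push(self, loc: int) -> None:
        if len(self.items) >= self.max_size:
            raise OverflowError("parser stack overflow")
        self.items.append(loc)

    def pop(self) -> int:
        if not self.items:
            raise IndexError("parser stack underflow")
        return self.items.pop()

    def elem(self, i: int) -> int:
        # 1-based access, as in StackElem
        return self.items[i - 1]

    def save(self) -> None:
        self.saved = list(self.items)

    def restore(self) -> None:
        self.items = list(self.saved)
```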
327
328 (* TableContents A dirty trick to initialize the grammar tables
329 *)
330 PROCEDURE TableContents;
331 BEGIN (*%% dont remove or change this comment*)
332 -->tables
333 END TableContents;
334
335
336 (* Parse Proper syntax analyzer
337 *)
338 PROCEDURE Parse(VAR corr:BOOLEAN);
339 VAR
340 altroot: CARDINAL; (*root of current alternative chain*)
341 mustread: BOOLEAN; (*TRUE if next symbol must be read*)
342 opcode: CARDINAL; (*instruction code*)
343 running: BOOLEAN; (*interpreter state*)
344 sy: CARDINAL;
345
346 BEGIN
347 tab:=ADR(TableContents)+10D; (*initialize the tables*)
348 pc:=startpc; altroot:=pc;
349 line:=1; col:=0;
350 correct:=TRUE; mustread:=TRUE; running:=TRUE;
351
352 WITH tab^ DO
353 WHILE running DO
354 opcode:=ORD(code[pc]);
355 IF mustread AND (opcode<=epsa) THEN
356 NextSym; mustread:=FALSE; INC(errdist); altroot:=pc;
357 END;
358 (*IF printnodes THEN WriteCard(con,pc,5); END;*)
359 INC(pc);
360 CASE opcode OF
361 t:
362 IF ORD(typ)=ORD(code[pc])
363 THEN IF typ=eofsy (*t recognized*)
364 THEN running:=FALSE;
365 ELSE INC(pc); mustread:=TRUE;
366 END;
367 ELSE Error(pc,altroot);
368 END;
369 | ta:
370 IF ORD(typ)=ORD(code[pc])
371 THEN INC(pc,3); mustread:=TRUE; (*t recognized*)
372 ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
373 END;
374 | nt,nts:
375 sy:=ORD(code[pc]);
376 IF Match(typ,ntsymbols[sy].first) OR ntsymbols[sy].del
377 THEN (*right nt, parse it*)
378 IF opcode=nts THEN INC(pc); Semant(ORD(code[pc])); END;
379 Push(pc+1); pc:=ntsymbols[sy].startpc;
380 altroot:=pc;
381 ELSE Error(pc,altroot);
382 END;
383 | nta,ntas:
384 sy:=ORD(code[pc]);
385 IF Match(typ,ntsymbols[sy].first)
386 THEN (*right nt, parse it*)
387 INC(pc,3);
388 IF opcode=ntas THEN Semant(ORD(code[pc])); INC(pc) END;
389 Push(pc); pc:=ntsymbols[sy].startpc;
390 altroot:=pc;
391 ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
392 END;
393 | any: mustread:=TRUE; (*any recognized*)
394 | anya:
395 IF Match(typ,anyset[ORD(code[pc])])
396 THEN INC(pc,3); mustread:=TRUE; (*any recognized*)
397 ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
398 END;
399 | eps:
400 IF Match(typ,epsset[ORD(code[pc])])
401 THEN INC(pc);
402 ELSE Error(pc,altroot);
403 END;
404 | epsa:
405 IF Match(typ,epsset[ORD(code[pc])])
406 THEN INC(pc,3); (*eps recognized*)
407 ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
408 END;
409 | jmp: pc:=ORD(code[pc])*256+ORD(code[pc+1]); (*goto successor*)
410 | ret: Pop(pc); altroot:=pc; (*end of nt*)
411 ELSE (*sem*)
412 IF correct THEN Semant(ORD(opcode)); END;
413 END; (*CASE*)
414 END; (*WHILE running*)
415 END; (*WITH tab^*)
416 corr:=correct;
417 END Parse;
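Parse is an interpreter for the G-code. The Python sketch below reduces it to five instruction kinds (t, ta, nt, jmp, ret) and runs it on the toy grammar S = "a" S "b" | "c". Several simplifications are assumptions made for illustration only:

- instructions are held as tuples rather than bytes;
- an `eof` token marks the end of input;
- a boolean result replaces the call to Error.

Semantic actions, deletable nonterminals and the lazy reading of symbols controlled by mustread are omitted.

```python
from typing import Dict, List, Set

# Toy grammar S = "a" S "b" | "c"; the main program starts at address 6.
CODE: List[tuple] = [
    ("ta", "a", 4),  # 0: "a", otherwise try the alternative at 4
    ("nt", "S"),     # 1: S
    ("t", "b"),      # 2: "b"
    ("ret",),        # 3: end of S
    ("t", "c"),      # 4: last alternative "c"
    ("ret",),        # 5: end of S
    ("nt", "S"),     # 6: main program: S
    ("t", "eof"),    # 7: end of input
]
STARTS: Dict[str, int] = {"S": 0}
FIRST: Dict[str, Set[str]] = {"S": {"a", "c"}}


def parse(code: List[tuple], start: int, starts: Dict[str, int],
          first: Dict[str, Set[str]], tokens: List[str]) -> bool:
    stack: List[int] = []  # return addresses, like Push/Pop
    pc, pos = start, 0
    while True:
        op, typ = code[pc], tokens[pos]
        kind = op[0]
        if kind == "t":            # terminal that must be present
            if typ != op[1]:
                return False       # the listing calls Error here
            if typ == "eof":
                return True
            pos, pc = pos + 1, pc + 1
        elif kind == "ta":         # terminal with an alternative
            if typ == op[1]:
                pos, pc = pos + 1, pc + 1
            else:
                pc = op[2]
        elif kind == "nt":         # nonterminal: check first set, then call
            if typ not in first[op[1]]:
                return False
            stack.append(pc + 1)
            pc = starts[op[1]]
        elif kind == "jmp":        # goto successor
            pc = op[1]
        elif kind == "ret":        # end of nonterminal
            pc = stack.pop()
        else:
            raise ValueError(f"unknown instruction {op!r}")
```

With these tables, `parse(CODE, 6, STARTS, FIRST, "a a c b b eof".split())` accepts the input, while `"a c eof"` is rejected at the missing "b".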
418
419 BEGIN
420 printinput:=FALSE;
421 printnodes:=FALSE;
422 errdist:=100;
423 lacts:=0;
424 END -->modulename.
[Cross-reference entries ADDRESS through HALT (ADDRESS, AdjustPc, ADR, Allocate, altpc, altroot, analyzer, any, anya, anyset, at, Attributenumbers, C, code, col, con, corr, correct, D, declarations, del, e, el, eofsy, eps, epsa, epsset, errdist, errdistmin, Error, Errornode, Errorptr, Errors, FileIO, Fill, FillSucc, first, FORWARD, GetSy, GetSymInstr, GiveName, h, HALT): the line numbers are not recoverable from the scan.]
header 69
i 93 152 187 188 189 189 190 211 222 223 223 223
278 280 281 281 283 285 285 286 286 322 323
implementation 14
INLINE 33
input 36
j 152 156 159 160 160 161 161 161 163
jmp 43 138 409
l 163
lacts 86 203 209 219 223 225 227 237 244 283 288 289
300 301 301 309 310 310 423
line 36 183 349
lmaxs 46 65 309
loc 298 301 307 310
Match 99 100 223 376 385 395 400 405
maxany 73
maxcode 70
maxeps 72
maxname 52
maxnamep 51
maxp 50 56 64
maxs 64
maxt 56 57 83 84 116 187 222 280
module 36
modulename 7 13 28 424
mustread 341 350 355 356 365 371 393 396
name 77 160 161
Namelist 52 77
namep 76 159
Namepointers 51 76
newlacts 83 190 203 219 223 281
newpc 84 188 189 196 203 219 223 281
next 172 179 179
nextpc 89 153 175 211 216 225 227 239 243 244 244 252
262 264 265 266 267 269
NextSym 105 123 201 356
nra 74
nt 41 137 220 261 374
nta 41 137 220 263 383
ntas 42 137 220 266 383 388
nts 42 137 220 265 374 378
ntsymbols 71 221 225 376 376 379 385 389
olds 85 317 320
opcode 89 153 175 176 211 216 217 239 243 252 255 256
256 260 342 354 355 360 378 388 412
p 156 159 160 161
Parse 12 338 417
pc 80 89 131 134 136 138 138 138 139 140 149 203
203 209 214 215 216 219 223 230 237 241 242 243
245 252 255 257 262 264 264 264 265 266 266 266
267 348 348 354 356 359 362 365 367 370 371 372
372 372 375 378 378 379 379 380 381 384 387 388
388 389 389 390 391 391 391 395 396 397 397 397
400 401 402 405 406 407 407 407 409 409 409 410
410
pcl 153 173 173 174 175 181
Pop 298 305 410
Pragma 53 56
Pragmalist 56 75
[Cross-reference entries printinput through WriteString (printinput, printnodes, ps, Push, q, RestoreStack, ret, running, s, SaveStack, sem2, sem3, Semant, semantic, set, Stack, StackElem, startpc, sy, Symbollist, Symbolnode, Symbolset, SyntaxError, SYSTEM, System, t, ta, tab, TableContents, tables, Triple, txt, typ, WriteCard, WriteLn, WriteString): the line numbers are not recoverable from the scan.]
1 (* cocotst Perform various tests with top-down graph Moe 12.1.83
2 ======= =========================================
3 This module tests
4 a) if all nonterminals can be reached from the start symbol
5 b) if there exist productions for all nonterminals
6 c) if all nonterminals can be derived to terminals
7 d) if the grammar is free of circular derivations
8 e) if the grammar satisfies the LL(1)-conditions
9 *)
10 DEFINITION MODULE cocotst;
11
12 PROCEDURE FindCircularRules(VAR ok:BOOLEAN);
13 (* Finds and prints the circular part of the grammar, ok means:
14 no circular part*)
15
16 PROCEDURE LL1Test(VAR ll1:BOOLEAN);
17 (* Checks if the grammar satisfies the LL(1) conditions*)
18
19 PROCEDURE TestCompleteness(VAR ok:BOOLEAN);
20 (* ok=TRUE if all nonterminals have rules*)
21
22 PROCEDURE TestIfAllNtReached(VAR ok:BOOLEAN);
23 (* ok=TRUE if all nonterminals can be reached from the start symbol*)
24
25 PROCEDURE TestIfNtToTerm(VAR ok:BOOLEAN);
26 (* ok=TRUE if all nonterminals can be reduced to terminals*)
27
28 END cocotst.
1 (* cocotst Perform various tests with the top-down graph Moe 11.1.84
2 ======= =========================================
3 This module tests
4 a) if all nonterminals can be reached from the start symbol
5 b) if there exist productions for all nonterminals
6 c) if all nonterminals can be derived to terminals
7 d) if the grammar is free of circular derivations
8 e) if the grammar satisfies the LL(1)-conditions
9 *)
10 IMPLEMENTATION MODULE cocotst;
11
12 FROM cocogra IMPORT rootloc, ClearMarkList, Deletable, DelNode,
13 Graphnode, GetNode, Mark, Marked, Marklist;
14 FROM cocolex IMPORT ddt, GetName;
15 FROM cocolst IMPORT lst;
16 FROM cocosym IMPORT maxp, maxs, maxt, ClearSet, GetF,
17 GetFirstSet, GetFo, GetSy, IsInSet, RepSy,
18 SetBit, Unit, Symbolnode, Symbolset, Symboltype;
19 FROM FileIO IMPORT con, WriteCard, WriteString, WriteText, WriteLn;
20
21 VAR
22 headline: BOOLEAN; (*TRUE if header shall be printed*)
23 ll: BOOLEAN; (*TRUE if LL(1) conditions hold*)
24
25
26 (* FindCircularRules Test grammar for circular derivations
27 *)
28 PROCEDURE FindCircularRules(VAR ok:BOOLEAN);
29 CONST
30 circmax = 150;
31 TYPE
32 Circrule = RECORD
33 left,right: CARDINAL;
34 del: BOOLEAN;
35 END;
36 Circrulelist = ARRAY[1..circmax] OF Circrule;
37 VAR
38 c: Circrulelist;
39 changed: BOOLEAN;
40 headline: BOOLEAN;
41 i,j,k,dummy: CARDINAL;
42 lcirc: CARDINAL;
43 m: Marklist;
44 singleset: Marklist; (*set of single nonterminals in a production*)
45 sn: Symbolnode;
46 rside,lside: BOOLEAN;
47
48 PROCEDURE GetSingles(loc:CARDINAL; VAR singles:Marklist);
49 VAR gn: Graphnode;
50 BEGIN
51 IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
52 Mark(loc,m);
53 GetNode(loc,gn);
54 CASE gn.typ OF
55 eps: GetSingles(gn.rp,singles);
56 | t,any: ;
57 | nt: IF Deletable(gn.rp) THEN Mark(gn.sp,singles); END;
58 IF DelNode(gn) THEN GetSingles(gn.rp,singles); END;
59 END; (*CASE*)
60 GetSingles(gn.lp,singles);
61 END GetSingles;
62
63 PROCEDURE PutCirc(i:CARDINAL);
64 VAR
65 l: CARDINAL;
66 name: ARRAY[1..50] OF CHAR;
67 sn: Symbolnode;
68 BEGIN
69 IF headline THEN
70 WriteLn(lst);
71 WriteString(lst,"Circular part for this grammar:");
72 WriteLn(lst);
73 headline:=FALSE;
74 END;
75 WriteString(lst," ");
76 GetSy(c[i].left,sn); GetName(sn.spix,name,l);
77 WriteText(lst,name,l); WriteString(lst," --> ");
78 GetSy(c[i].right,sn); GetName(sn.spix,name,l);
79 WriteText(lst,name,l); WriteLn(lst);
80 END PutCirc;
81
82 BEGIN (*FindCircularRules*)
83 lcirc:=0;
84 (* fill list of circular derivations c*)
85 FOR i:=maxp+1 TO maxs DO
86 ClearMarkList(singleset); ClearMarkList(m);
87 GetSy(i,sn);
88 GetSingles(sn.start,singleset); (*get nt's j such that i->j*)
89 FOR j:=maxp+1 TO maxs DO
90 IF Marked(j,singleset) THEN
91 INC(lcirc);
92 WITH c[lcirc] DO left:=i; right:=j; del:=FALSE; END;
93 IF ddt["D"] THEN
94 WriteCard(con,lcirc,6); WriteCard(con,i,6);
95 WriteCard(con,j,6); WriteLn(con);
96 END;
97 END; (*IF Marked*)
98 END; (*FOR j*)
99 END; (*FOR i*)
100 (* remove non circular derivations from c*)
101 REPEAT
102 changed:=FALSE;
103 FOR i:=1 TO lcirc DO
104 IF NOT c[i].del THEN
105 rside:=FALSE; lside:=FALSE;
106 FOR j:=1 TO lcirc DO
107 IF NOT c[j].del THEN
108 IF c[i].left=c[j].right THEN rside:=TRUE; END;
109 IF c[j].left=c[i].right THEN lside:=TRUE; END;
110 END;
111 END; (*FOR j*)
112 IF NOT rside OR NOT lside THEN
113 c[i].del:=TRUE; changed:=TRUE;
114 IF ddt["D"] THEN
115 WriteCard(con,i,6); WriteString(con," deleted$");
116 END;
117 END;
118 END; (*IF NOT c[i].del*)
119 END; (*FOR*)
120 UNTIL NOT changed;
121 (* c contains the circular part of the grammar. Print it*)
122 ok:=TRUE; headline:=TRUE;
123 FOR i:=1 TO lcirc DO
124 IF NOT c[i].del THEN PutCirc(i); ok:=FALSE; END;
125 END;
126 IF ok THEN
127 WriteLn(lst);
128 WriteString(lst,"Grammar contains no circular derivations.");
129 WriteLn(lst);
130 END;
131 END FindCircularRules;
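FindCircularRules works in two passes. It first collects every pair i -> j in which nonterminal i can derive the single nonterminal j. It then repeatedly deletes pairs that cannot lie on a cycle. The Python sketch below applies the same fixed-point idea; as an assumption of the sketch, nonterminals are represented as strings rather than symbol-table indices.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (A, B): A can derive the single nonterminal B


def circular_part(pairs: Iterable[Pair]) -> Set[Pair]:
    """Keep only pairs that can lie on a cycle of single-nonterminal derivations.

    A pair A->B is dropped unless some remaining pair ends in A and some
    remaining pair starts at B (the rside/lside test). Passes repeat until
    nothing changes, like the REPEAT ... UNTIL NOT changed loop."""
    kept = set(pairs)
    changed = True
    while changed:
        changed = False
        for left, right in list(kept):
            has_pred = any(dst == left for _, dst in kept)
            has_succ = any(src == right for src, _ in kept)
            if not (has_pred and has_succ):
                kept.discard((left, right))
                changed = True
    return kept
```

The result does not depend on the order in which pairs are examined: the loop always ends at the largest set of pairs in which each pair has both a predecessor and a successor.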
132
133
134 (* LL1Error Print LL(1) error message
135 *)
136 PROCEDURE LL1Error(code,line,sy:CARDINAL);
137 VAR
138 l: CARDINAL;
139 name: ARRAY[1..50] OF CHAR;
140 sn: Symbolnode;
141 BEGIN
142 IF headline THEN
143 headline:=FALSE;
144 WriteLn(lst); WriteString(lst,"LL(1)-error(s):"); WriteLn(lst);
145 END;
146 WriteString(lst," line"); WriteCard(lst,line,4);
147 GetSy(sy,sn); GetName(sn.spix,name,l);
148 WriteString(lst," ");
149 CASE code OF
150 1: WriteText(lst,name,l);
151 WriteString(lst," is start of more than one alternative.");
152 | 2: WriteText(lst,name,l);
153 WriteString(lst," is start and successor of deletable ");
154 WriteString(lst,"rest of rule.");
155 END;
156 WriteLn(lst);
157 ll:=FALSE;
158 END LL1Error;
159
160
161 (* LL1Test Collects terminal sets and checks LL(1) conditions
162 *)
163 PROCEDURE LL1Test(VAR ll1:BOOLEAN);
164 VAR
165 dummy: CARDINAL;
166 gn: Graphnode;
167 i,loc: CARDINAL;
168 m: Marklist;
169 sn: Symbolnode;
170
171
172 PROCEDURE Test(VAR s1,s2:Symbolset; code,line:CARDINAL);
173 VAR i: CARDINAL;
174 BEGIN
175 FOR i:=0 TO maxt DO
176 IF IsInSet(i,s1) AND IsInSet(i,s2) THEN
177 LL1Error(code,line,i);
178 END;
179 END;
180 END Test;
181
182
183 PROCEDURE CheckAlternatives(loc,sym:CARDINAL);
184 VAR
185 gn: Graphnode;
186 locset,s,first,follow: Symbolset;
187 BEGIN
188 IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
189 GetNode(loc,gn);
190 IF ddt["F"] THEN
191 WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),6);
192 WriteCard(con,gn.sp,6); WriteLn(con);
193 END;
194 IF Deletable(loc) THEN
195 GetFirstSet(loc,s); GetFo(sym,follow);
196 Test(s,follow,2,gn.line);
197 END;
198 ClearSet(s,maxt);
199 WHILE loc<>0 DO
200 Mark(loc,m);
201 GetNode(loc,gn);
202 IF DelNode(gn)
203 THEN GetFirstSet(gn.rp,locset);
204 ELSE ClearSet(locset,maxt);
205 END;
206 CASE gn.typ OF
207 t: SetBit(locset,gn.sp);
208 | nt: GetF(gn.sp,first); Unit(locset,first,maxt);
209 | eps,any: ;
210 END;
211 Test(s,locset,1,gn.line);
212 Unit(s,locset,maxt);
213 CheckAlternatives(gn.rp,sym);
214 loc:=gn.lp;
215 END;
216 END CheckAlternatives;
217
218
219 BEGIN (*LL1Test*)
220 ll:=TRUE; headline:=TRUE;
221 FOR i:=maxp+1 TO maxs DO
222 ClearMarkList(m);
223 GetSy(i,sn);
224 CheckAlternatives(sn.start,i);
225 END;
226 IF ll THEN
227 WriteLn(lst);
228 WriteString(lst,"Grammar satisfies LL(1)-conditions."); WriteLn(lst);
229 END;
230 ll1:=ll;
231 END LL1Test;
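LL1Test reports two kinds of conflict for each nonterminal:

- code 1: a terminal starts more than one alternative;
- code 2: in a deletable construct, a terminal is in both the first set and the follow set.

The Python sketch below assumes that the first sets of the alternatives and the follow set are already computed; in the listing, GetFirstSet, GetF and GetFo provide them. It also skips the recursive descent into nested alternative chains that CheckAlternatives performs.

```python
from typing import List, Set, Tuple


def ll1_conflicts(alt_firsts: List[Set[str]], deletable: bool,
                  follow: Set[str]) -> List[Tuple[int, str]]:
    """Return (code, terminal) pairs in the sense of LL1Error:
    code 1 = terminal starts more than one alternative,
    code 2 = terminal starts a deletable construct and also follows it."""
    errors: List[Tuple[int, str]] = []
    if deletable:
        all_first = set().union(*alt_firsts)
        errors += [(2, t) for t in sorted(all_first & follow)]
    seen: Set[str] = set()  # union of the first sets examined so far (s)
    for first in alt_firsts:
        errors += [(1, t) for t in sorted(seen & first)]
        seen |= first
    return errors
```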
232
233
234 (* TestCompleteness Test if all nonterminals have rules
235 *)
236 PROCEDURE TestCompleteness(VAR ok:BOOLEAN);
237 VAR
238 sn: Symbolnode;
239 i,l,dummy: CARDINAL;
240 name: ARRAY[1..50] OF CHAR;
241 BEGIN
242 ok:=TRUE;
243 FOR i:=maxp+1 TO maxs DO
244 GetSy(i,sn);
245 IF sn.start=0 THEN
246 IF ok THEN
247 WriteLn(lst);
248 WriteString(lst,"Nonterminals without rules:"); WriteLn(lst);
249 END;
250 GetName(sn.spix,name,l);
251 WriteString(lst," "); WriteText(lst,name,l); WriteLn(lst);
252 ok:=FALSE;
253 END;
254 END; (*FOR*)
255 IF ok THEN
256 WriteLn(lst);
257 WriteString(lst,"All nonterminals have rules."); WriteLn(lst);
258 END;
259 END TestCompleteness;
260
261
262 (* TestIfAllNtReached Tests if all nts can be reached
263 *)
264 PROCEDURE TestIfAllNtReached(VAR ok:BOOLEAN);
265 VAR
266 gn: Graphnode;
267 i,l,dummy: CARDINAL;
268 m: Marklist;
269 name: ARRAY[1..50] OF CHAR;
270 sn: Symbolnode;
271 reached: Marklist;
272
273 PROCEDURE MarkReachedNts(loc:CARDINAL);
274 VAR gn: Graphnode;
275 sn: Symbolnode;
276 BEGIN
277 IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
278 Mark(loc,m);
279 GetNode(loc,gn);
280 WITH gn DO
281 IF (typ=nt) AND NOT Marked(sp,reached) THEN
282 Mark(sp,reached); GetSy(sp,sn); MarkReachedNts(sn.start);
283 END;
284 MarkReachedNts(lp);
285 MarkReachedNts(rp);
286 END;
287 END MarkReachedNts;
288
289 BEGIN
290 ClearMarkList(m);
291 ClearMarkList(reached);
292 GetNode(rootloc,gn); Mark(gn.sp,reached);
293 GetSy(gn.sp,sn);
294 MarkReachedNts(sn.start);
295 ok:=TRUE;
296 FOR i:=maxp+1 TO maxs DO (*report not marked symbols*)
297 IF NOT Marked(i,reached) THEN
298 GetSy(i,sn); GetName(sn.spix,name,l);
299 WriteString(lst,"Nonterminal "); WriteText(lst,name,l);
300 WriteString(lst," cannot be reached."); WriteLn(lst);
301 ok:=FALSE;
302 END;
303 END;
304 IF ok THEN
305 WriteLn(lst);
306 WriteString(lst,"All nonterminals can be reached."); WriteLn(lst);
307 END;
308 END TestIfAllNtReached;
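TestIfAllNtReached walks the graph recursively from the start symbol and marks every nonterminal it meets. Below is an equivalent worklist formulation in Python. It assumes the grammar is given as a dictionary from each nonterminal to its alternatives (lists of symbol names), and that a symbol is a nonterminal exactly when it is a key.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]  # nonterminal -> alternatives


def unreachable(rules: Grammar, start: str) -> Set[str]:
    """Return the nonterminals that cannot be reached from start."""
    reached = {start}          # corresponds to the marklist 'reached'
    work = [start]
    while work:
        nt = work.pop()
        for alt in rules.get(nt, []):
            for sym in alt:
                if sym in rules and sym not in reached:  # unvisited nonterminal
                    reached.add(sym)
                    work.append(sym)
    return set(rules) - reached
```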
309
310
311 (* TestIfNtToTerm Test if all nt can be derived to t
312 *)
313 PROCEDURE TestIfNtToTerm(VAR ok:BOOLEAN);
314 VAR
315 i,l,dummy: CARDINAL;
316 sn: Symbolnode;
317 name: ARRAY[1..50] OF CHAR;
318 changed: BOOLEAN;
319 termlist: Marklist; (*list of nts which can be derived to t*)
320 m: Marklist;
321 term: BOOLEAN;
322
323 PROCEDURE IsTerm(loc:CARDINAL):BOOLEAN;
324 VAR gn: Graphnode;
325 BEGIN
326 IF (loc=0) OR Marked(loc,m) THEN RETURN FALSE; END;
327 Mark(loc,m);
328 GetNode(loc,gn);
329 WITH gn DO
330 IF (typ=nt) AND NOT Marked(sp, termlist)
331 THEN RETURN IsTerm(lp);
332 ELSE RETURN (rp=0) OR IsTerm(rp) OR IsTerm(lp);
333 END;
334 END;
335 END IsTerm;
336
337 BEGIN (*TestIfNtToTerm*)
338 ClearMarkList(termlist);
339 REPEAT
340 changed:=FALSE;
341 FOR i:=maxp+1 TO maxs DO
342 IF NOT Marked(i,termlist) THEN
343 GetSy(i,sn);
344 ClearMarkList(m);
345 term:=IsTerm(sn.start);
346 IF term THEN Mark(i,termlist); changed:=TRUE; END;
347 IF ddt["E"] THEN
348 WriteCard(con,i,6);
349 IF term
350 THEN WriteString(con," reducable to term.$");
351 ELSE WriteString(con," not reducable to term.$"); END;
352 END;
353 END; (*IF NOT Marked*)
354 END; (*FOR*)
355 UNTIL NOT changed;
356 ok:=TRUE;
357 WriteLn(lst);
358 FOR i:=maxp+1 TO maxs DO
359 IF NOT Marked(i,termlist) THEN
360 GetSy(i,sn); GetName(sn.spix,name,l);
361 WriteText(lst,name,l);
362 WriteString(lst," cannot be derived to terminals."); WriteLn(lst);
363 ok:=FALSE;
364 END;
365 END; (*FOR*)
366 IF ok THEN
367 WriteString(lst,"All nonterminals can be derived to terminals.");
368 WriteLn(lst);
369 END;
370 END TestIfNtToTerm;
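TestIfNtToTerm makes repeated passes until nothing changes, collecting the nonterminals that can derive a string of terminals. The same fixed-point iteration is sketched below in Python. It assumes the grammar is a dictionary from each nonterminal to its alternatives (lists of symbol names), that a symbol is a nonterminal exactly when it is a key, and that an empty alternative stands for eps.

```python
from typing import Dict, List, Set

Grammar = Dict[str, List[List[str]]]  # nonterminal -> alternatives


def not_derivable_to_terminals(rules: Grammar) -> Set[str]:
    term: Set[str] = set()   # corresponds to termlist
    changed = True
    while changed:           # REPEAT ... UNTIL NOT changed
        changed = False
        for nt, alts in rules.items():
            if nt in term:
                continue
            # nt derives terminals if some alternative consists only of
            # terminals and nonterminals already known to do so.
            if any(all(s not in rules or s in term for s in alt) for alt in alts):
                term.add(nt)
                changed = True
    return set(rules) - term
```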
371
372
373 END cocotst.
any 56 209
c 38 76 78 92 104 107 108 108 109 109 113 124
changed 39 102 113 120 318 340 346 355
CheckAlternatives 183 213 216 224
circmax 30 36
Circrule 32 36
Circrulelist 36 38
ClearMarkList 12 86 86 222 290 291 338 344
ClearSet 16 198 204
cocogra 12
cocolex 14
cocolst 15
cocosym 16
cocotst 10 373
code 136 149 172 177
con 19 94 94 95 95 115 115 191 191 192 192 348
350 351
ddt 14 93 114 190 347
del 34 92 104 107 113 124
Deletable 12 57 194
DelNode 12 58 202
dummy 41 165 239 267 315
eps 55 209
FileIO 19
FindCircularRules 28 131
first 186 208 208
follow 186 195 196
GetF 16 208
GetFirstSet 17 195 203
GetFo 17 195
GetName 14 76 78 147 250 298 360
GetNode 13 53 189 201 279 292 328
GetSingles 48 55 58 60 61 88
GetSy 17 76 78 87 147 223 244 282 293 298 343 360
gn 49 53 54 55 57 57 58 58 60 166 185 189
191 192 196 201 202 203 206 207 208 211 213 214
266 274 279 280 292 292 293 324 328 329
Graphnode 13 49 166 185 266 274 324
headline 22 40 69 73 122 142 143 220
i 41 63 76 78 85 87 92 94 103 104 108 109
113 115 123 124 124 167 173 175 176 176 177 221
223 224 239 243 244 267 296 297 298 315 341 342
343 346 348 358 359 360
IsInSet 17 176 176
IsTerm 323 331 332 332 335 345
j 41 89 90 92 95 106 107 108 109
k 41
l 65 76 77 78 79 138 147 150 152 239 250 251
267 298 299 315 360 361
lcirc 42 83 91 92 94 103 106 123
left 33 76 92 108 109
line 136 146 172 177 196 211
ll 23 157 220 226 230
ll1 163 230
LL1Error 136 158 177
LL1Test 163 231
loc 48 51 51 52 53 167 183 188 188 189 191 194
195 199 200 201 214 273 277 277 278 279 323 326
326 327 328
locset 186 203 204 207 208 211 212
lp 60 214 284 331 332
lside 46 105 109 112
lst 15 70 71 72 75 77 77 79 79 127 128 129
144 144 144 146 146 148 150 151 152 153 154 156
227 228 228 247 248 248 251 251 251 256 257 257
299 299 300 300 305 306 306 357 361 362 362 367
368
m 43 51 52 86 168 188 200 222 268 277 278 290
320 326 327 344
Mark 13 52 57 200 278 282 292 327 346
Marked 13 51 90 188 277 281 297 326 330 342 359
Marklist 13 43 44 48 168 268 271 319 320
MarkReachedNts 273 282 284 285 287 294
maxp 16 85 89 221 243 296 341 358
maxs 16 85 89 221 243 296 341 358
maxt 16 175 198 204 208 212
name 66 76 77 78 79 139 147 150 152 240 250 251
269 298 299 317 360 361
nt 57 208 281 330
ok 28 122 124 126 236 242 246 252 255 264 295 301
304 313 356 363 366
PutCirc 63 80 124
reached 271 281 282 291 292 297
RepSy 17
right 33 78 92 108 109
rootloc 12 292
rp 55 57 58 203 213 285 332 332
rside 46 105 108 112
s 186 195 196 198 211 212
s1 172 176
s2 172 176
SetBit 18 207
singles 48 55 57 58 60
singleset 44 86 88 90
sn 45 67 76 76 78 78 87 88 140 147 147 169
223 224 238 244 245 250 270 275 282 282 293 294
298 298 316 343 345 360 360
sp 57 192 207 208 281 282 282 292 293 330
spix 76 78 147 250 298 360
start 88 224 245 282 294 345
sy 136 147
sym 183 195 213
Symbolnode 18 45 67 140 169 238 270 275 316
Symbolset 18 172 186
Symboltype 18
t 56 207
term 321 345 346 349
termlist 319 330 338 342 346 359
Test 172 180 196 211
TestCompleteness 236 259
TestIfAllNtReached 264 308
TestIfNtToTerm 313 370
typ 54 191 206 281 330
Unit 18 208 212
WriteCard 19 94 94 95 115 146 191 191 192 348
WriteLn 19 70 72 79 95 127 129 144 144 156 192 227
228 247 248 251 256 257 300 305 306 357 362 368
WriteString 19 71 75 77 115 128 144 146 148 151 153 154
228 248 251 257 299 300 306 350 351 362 367
WriteText 19 77 79 150 152 251 299 361
1 (* Errors General module to store error messages Moe 21.03.84
2 ======= =========================================
3 This module stores information about syntax errors and semantic errors.
4 The information can either be retrieved afterwards or be printed
5 automatically as simple error messages.
6 Furthermore the module contains procedures to report compiler errors
7 and implementation restrictions. These procedures cause a program stop.
8 *)
9 DEFINITION MODULE Errors;
10
11 FROM FileIO IMPORT File;
12
13 TYPE
14 Symbolname = ARRAY[1..25] OF CHAR;
15 Errorptr = POINTER TO Errornode;
16 Errornode = RECORD (*expected symbol in syntax error message*)
17 txt: Symbolname;
18 l: CARDINAL;
19 next: Errorptr;
20 END;
21
22
23 PROCEDURE CompErr(nr:CARDINAL);
24 (* Reports compiler error nr and stops the program*)
25
26 PROCEDURE GetNextSemErr(VAR nr,line,col:CARDINAL);
27 (* Gets the error number, the line number and the column number of the
28 next semantic error. nr=0 if no next error exists*)
29
30 PROCEDURE GetNextSynErr(VAR symbols:Errorptr; VAR line,col:CARDINAL);
31 (* Gets the expected symbols, the line number and the column number of
32 the next syntax error. symbols=NIL if no next error exists*)
33
34 PROCEDURE GetNumberOfErrors(VAR synerrors,semerrors:CARDINAL);
35 (* Gets the total number of syntax errors and semantic errors which
36 occurred during compilation*)
37
38 PROCEDURE PrintSemErrors(f:File; VAR semerrors:CARDINAL);
39 (* Prints error messages for all stored semantic errors (line,col,
40 error number). semerrors holds the total number of stored semantic
41 errors*)
42
43 PROCEDURE PrintSynErrors(f:File; VAR synerrors:CARDINAL);
44 (* Prints error messages for all stored syntax errors (line,col,
45 "near symbol",expected symbols), synerrors holds the total number of
46 stored syntax errors*)
47
48 PROCEDURE PrintSynError(f:File; symbols:Errorptr; col:CARDINAL);
49 (* Prints one error message line (^ expected symbols).*)
50
51 PROCEDURE Restriction(nr:CARDINAL);
52 (* Reports implementation restriction nr and stops the program*)
53
54 PROCEDURE SemErr(nr,line,col:CARDINAL);
55 (* Stores the error number, line number and column number of a semantic
56 error*)
57
58 PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
59 (* Stores the "near-symbol", the expected symbols, the line number and
60 the column number of a syntax error*)
61
62 END Errors.
1 (* Errors General module to store error messages Moe 21.03.84
3 This module stores information about syntax errors and semantic errors.
4 The information can either be retrieved afterwards or be printed
5 automatically as simple error messages.
6 Furthermore the module contains procedures to report compiler errors
7 and implementation restrictions. These procedures cause a program stop.
8 *)
9 IMPLEMENTATION MODULE Errors;
10
11 (*imports of definition module*)
12 FROM FileIO IMPORT File;
13
14 (*imports of implementation module*)
15 FROM FileIO IMPORT con, Write, WriteCard, WriteLn, WriteString,
16 WriteText, Read;
17 FROM System IMPORT Allocate, Deallocate, Terminate, normal;
18
19
20 TYPE
21 Semerrptr = POINTER TO Semerror;
22 Semerror = RECORD
23 nr,line,col: CARDINAL;
24 next: Semerrptr;
25 END;
26 Synerrptr = POINTER TO Synerror;
27 Synerror = RECORD
28 symbols: Errorptr;
29 line,col: CARDINAL;
30 next: Synerrptr;
31 END;
32
33 VAR
34 semerr: Semerrptr;
35 synerr: Synerrptr;
36
37
38 (* CompErr Reports compiler error nr and stops the program
39 *)
40 PROCEDURE CompErr(nr:CARDINAL);
41 VAR dummy:CARDINAL; ch:CHAR;
42 BEGIN
43 PrintSynErrors(con,dummy); PrintSemErrors(con,dummy);
44 WriteString(con,"Compiler error "); WriteCard(con,nr,0);
45 WriteString(con,". Program terminated.$");
46 WriteString(con,"Press a key to continue"); Read(con,ch);
47 Terminate(normal);
48 END CompErr;
49
50
51 (* GetNextSemErr Gets next semantic error information
52 *)
53 PROCEDURE GetNextSemErr(VAR nr,line,col:CARDINAL);
54 VAR p: Semerrptr;
55 BEGIN
56 IF semerr=NIL
57 THEN nr:=0; line:=0; col:=0;
58 ELSE
59 p:=semerr;
60 nr:=p^.nr; line:=p^.line; col:=p^.col;
61 semerr:=p^.next; Deallocate(p);
62 END;
63 END GetNextSemErr;
64
65
66 (* GetNextSynErr Gets next syntax error information
67 *)
68 PROCEDURE GetNextSynErr(VAR symbols:Errorptr; VAR line,col:CARDINAL);
69 VAR p: Synerrptr;
70 BEGIN
71 IF synerr=NIL
72 THEN symbols:=NIL; line:=0; col:=0;
73 ELSE
74 p:=synerr;
75 symbols:=p^.symbols; line:=p^.line; col:=p^.col;
76 synerr:=p^.next; Deallocate(p);
77 END;
78 END GetNextSynErr;
79
80
81 (* GetNumberOfErrors Gets the total number of errors that occurred
82 *)
83 PROCEDURE GetNumberOfErrors(VAR synerrors,semerrors:CARDINAL);
84 VAR
85 syn: Synerrptr;
86 sem: Semerrptr;
87 BEGIN
88 synerrors:=0; syn:=synerr;
89 WHILE syn<>NIL DO INC(synerrors); syn:=syn^.next; END;
90 semerrors:=0; sem:=semerr;
91 WHILE sem<>NIL DO INC(semerrors); sem:=sem^.next; END;
92 END GetNumberOfErrors;
93
94
95 (* PrintSemErrors Prints simple error messages for semantic errors
96 *)
97 PROCEDURE PrintSemErrors(f:File; VAR semerrors:CARDINAL);
98 VAR
99 p: Semerrptr;
100 synerrors: CARDINAL;
101 BEGIN
102 GetNumberOfErrors(synerrors,semerrors);
103 IF semerrors>0 THEN
104 WriteString(f,"Semantic errors:$$");
105 p:=semerr;
106 WHILE p<>NIL DO
107 WriteString(f,"line"); WriteCard(f,p^.line,5);
108 WriteString(f," col"); WriteCard(f,p^.col,3);
109 WriteString(f,": error "); WriteCard(f,p^.nr,0);
110 WriteLn(f);
111 p:=p^.next;
112 END;
113 END;
114 END PrintSemErrors;
115
116
117 (* PrintSym Print a symbol in error message
118 *)
119 PROCEDURE PrintSym(f:File; txt:ARRAY OF CHAR; len:CARDINAL);
120 BEGIN
121 IF len=1
122 THEN Write(f,'"'); Write(f,txt[0]); Write(f,'"');
123 ELSE WriteText(f,txt,len);
124 END;
125 END PrintSym;
126
127
128 (* PrintExpected Print expected symbols
129 *)
130 PROCEDURE PrintExpected(f:File; VAR p:Errorptr);
131 VAR first:BOOLEAN; q:Errorptr;
132 BEGIN
133 first:=TRUE;
134 WHILE p<>NIL DO
135 IF first THEN first:=FALSE
136 ELSIF p^.next=NIL THEN WriteString(f,' or ')
137 ELSE WriteString(f,', ')
138 END;
139 PrintSym(f,p^.txt,p^.l);
140 q:=p; p:=p^.next; Deallocate(q);
141 END;
142 WriteString(f,' expected'); WriteLn(f);
143 END PrintExpected;
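Together, PrintExpected and PrintSym build the list of expected symbols that ends a syntax error message. The Python sketch below rebuilds that formatting and uses a list of strings in place of the Errorptr chain. It assumes that one-character symbols are enclosed in double quotes, a reading of line 122, which is only partly legible in the scan.

```python
from typing import List


def print_sym(txt: str) -> str:
    # One-character symbols are quoted, longer names printed as they are.
    return '"' + txt + '"' if len(txt) == 1 else txt


def expected_message(symbols: List[str]) -> str:
    parts: List[str] = []
    for i, sym in enumerate(symbols):
        if i > 0:
            # " or " before the last symbol, ", " between the others
            parts.append(" or " if i == len(symbols) - 1 else ", ")
        parts.append(print_sym(sym))
    return "".join(parts) + " expected"
```

For instance, `expected_message(["ident", ";", "END"])` gives `ident, ";" or END expected`.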
144
145
146 (* PrintSynErrors Prints simple error messages for syntax errors
147 *)
148 PROCEDURE PrintSynErrors(f:File; VAR synerrors:CARDINAL);
149 VAR
150 err,err1: Synerrptr;
151 p: Errorptr;
152 semerrors: CARDINAL;
153 BEGIN
154 GetNumberOfErrors(synerrors,semerrors);
155 IF synerrors>0 THEN
156 WriteString(f,"Syntax errors:$$");
157 err:=synerr;
158 WHILE err<>NIL DO
159 WriteString(f,'line'); WriteCard(f,err^.line,5);
160 p:=err^.symbols;
161 WriteString(f,' near '); PrintSym(f,p^.txt,p^.l);
162 WriteString(f,': ');
163 PrintExpected(f,p^.next); Deallocate(p);
164 err1:=err; err:=err^.next; Deallocate(err1);
165 END;
166 END;
167 END PrintSynErrors;
168
169
170 (* PrintSynError Prints one error message line
171 *)
172 PROCEDURE PrintSynError(f:File; symbols:Errorptr; col:CARDINAL);
173 VAR i: CARDINAL;
174 BEGIN
175 WriteString(f,"***** "); FOR i:=1 TO col-1 DO Write(f,' ') END;
176 WriteString(f,"^ ");
177 PrintExpected(f,symbols^.next); Deallocate(symbols);
178 END PrintSynError;
179
180
181 (* Restriction Reports impl. restriction nr and stops the program
182 *)
183 PROCEDURE Restriction(nr:CARDINAL);
184 VAR dummy:CARDINAL; ch:CHAR;
185 BEGIN
186 PrintSynErrors(con,dummy); PrintSemErrors(con,dummy);
187 WriteString(con,"Implementation restriction "); WriteCard(con,nr,0);
188 WriteString(con,". Program terminated.$");
189 WriteString(con,"Press a key to continue"); Read(con,ch);
190 Terminate(normal);
191 END Restriction;
192
193
194 (* SemErr Stores information about semantic error
195 *)
196 PROCEDURE SemErr(nr,line,col:CARDINAL);
197 VAR e,p,q: Semerrptr;
198 BEGIN
199 Allocate(e,SIZE(Semerror)); e^.nr:=nr; e^.line:=line; e^.col:=col;
200 p:=semerr; q:=NIL;
201 WHILE (p<>NIL) AND (p^.line<line) DO q:=p; p:=p^.next; END;
202 WHILE (p<>NIL) AND (p^.line=line) AND (p^.col<col) DO
203 q:=p; p:=p^.next;
204 END;
205 IF q=NIL THEN semerr:=e; ELSE q^.next:=e; END;
206 e^.next:=p;
207 END SemErr;
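SemErr keeps the list of semantic errors sorted by line and column. A new entry is inserted before any existing entry at the same position. Below is a direct Python transcription of the two scanning loops, with a dataclass standing in for the Semerror record; SyntaxError, which follows, keeps its list in the same way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SemError:
    nr: int
    line: int
    col: int
    next: Optional["SemError"] = None


def insert_sorted(head: Optional[SemError], nr: int, line: int, col: int) -> SemError:
    """Insert a new error and return the (possibly new) head of the list."""
    e = SemError(nr, line, col)
    prev, p = None, head
    while p is not None and p.line < line:                   # skip earlier lines
        prev, p = p, p.next
    while p is not None and p.line == line and p.col < col:  # skip earlier columns
        prev, p = p, p.next
    e.next = p
    if prev is None:
        return e          # new head, like semerr:=e
    prev.next = e
    assert head is not None  # prev is not None implies a non-empty list
    return head
```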
208
209
210 (* SyntaxError Stores information about syntax error
211 *)
212 PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
213 VAR e,p,q: Synerrptr;
214 BEGIN
215 Allocate(e,SIZE(Synerror));
216 e^.symbols:=symbols; e^.line:=line; e^.col:=col;
217 p:=synerr; q:=NIL;
218 WHILE (p<>NIL) AND (p^.line<line) DO q:=p; p:=p^.next; END;
219 WHILE (p<>NIL) AND (p^.line=line) AND (p^.col<col) DO
220 q:=p; p:=p^.next;
221 END;
222 IF q=NIL THEN synerr:=e; ELSE q^.next:=e; END;
223 e^.next:=p;
224 END SyntaxError;
225
226 BEGIN (*Errors*)
227 synerr:=NIL; semerr:=NIL;
228 END Errors.
Allocate 17 199 215
ch 41 46 184 189
col 23 29 53 57 60 60 68 72 75 75 108 172
175 196 199 199 202 202 212 216 216 219 219
CompErr 40 48
con 15 43 43 44 44 45 46 46 186 186 187 187
188 189 189
Deallocate 17 61 76 140 163 164 177
dummy 41 43 43 184 186 186
e 197 199 199 199 199 205 205 206 213 215 216 216
216 222 222 223
err 150 157 158 159 160 164 164 164
err1 150 164 164
Errorptr 28 68 130 131 151 172 212
Errors 9 228
f 97 104 107 107 108 108 109 109 110 119 122 122
122 123 130 136 137 139 142 142 148 156 159 159
161 161 162 163 172 175 175 176 177
File 12 97 119 130 148 172
FileIO 12 15
first 131 133 135 135
GetNextSemErr 53 63
GetNextSynErr 68 78
GetNumberOfErrors 83 92 102 154
i 173 175
l 139 161
len 119 121 123
line 23 29 53 57 60 60 68 72 75 75 107 159
196 199 199 201 201 202 202 212 216 216 218 218
219 219
next 24 30 61 76 89 91 111 136 140 163 164 177
201 203 205 206 218 220 222 223
normal 17 47 190
nr, PrintExpected, PrintSemErrors, PrintSym, PrintSynError, PrintSynErrors, q, Read, Restriction, sem, semerr, SemErr, Semerror, semerrors, Semerrptr, symbols, syn, synerr, Synerror, synerrors, Synerrptr, SyntaxError, System, Terminate: line numbers illegible in this copy
txt 119 122 123 139 161
Write 15 122 122 122 175
WriteCard 15 44 107 108 109 159 187
WriteLn 15 110 142
WriteString 15 44 45 46 104 107 108 109 136 137 142 156
159 161 162 175 176 187 188 189
WriteText 16 123
1 (* FileIO Simple IO with more than one file Moe 16.8.87
2 ====== =================================
3 This module provides procedures which are similar to those of InOut,
4 except that they can be used with more than one file (even with the
5 console).
6 *)
7 DEFINITION MODULE FileIO;
8
9 FROM SYSTEM IMPORT WORD;
10 FROM Toolbox IMPORT DialogPtr;
11 FROM OS IMPORT ParmBlkPtr;
12
13 CONST
14 DEL = 177C;
15 EF = 4C;
16 EOL = 15C;
17 ESC = 33C;
18 buffersize = 16*1024;
19
20 TYPE
21 File = POINTER TO FileRecord;
22 FileRecord = RECORD
23 ref: INTEGER; (*file reference number*)
24 volRef: INTEGER; (*volume (subdirectory) reference number*)
25 name: ARRAY[0..63] OF CHAR; (*Modula string terminated by 0C*)
26 buffer: ARRAY[0..buffersize-1] OF CHAR;
27 bp: CARDINAL; (*index of next byte in buffer*)
28 bb: CARDINAL; (*number of bytes in buffer*)
29 output: BOOLEAN; (*true, if opened for output*)
30 eof: BOOLEAN; (*true, if no more unread bytes*)
31 END;
32
33 VAR
34 con: File; (*console file (screen and keyboard)*)
35 Done: BOOLEAN; (*TRUE if an operation was successful*)
36 termCH: CHAR; (*first character after input text*)
37
38 (* --- for Mac open dialog box (see "Inside Macintosh") --- *)
39 TYPE
40 FilterHook = PROCEDURE(ParmBlkPtr): BOOLEAN;
41 DialogHook = PROCEDURE(INTEGER, DialogPtr): INTEGER;
42 Filetype = ARRAY[0..3] OF CHAR;
43
44 VAR
45 errCode: INTEGER; (*file manager status code*)
46 filterHook: FilterHook; (*file filter procedure (init none)*)
47 dlgHook: DialogHook; (*dialog handling procedure (init none)*)
48 ftype: ARRAY[0..3] OF Filetype;
49 (*file types to be handled by open dialog*)
50 (*init: ftype[0]:="TEXT", ftype[1..3]:=""*)
51 (* *)
52
53 PROCEDURE Open(VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR;
54 output:BOOLEAN);
55 (* Opens file f with name fn on volume (subdirectory) volRef.
56 volRef 0:default volume; 1:internal drive; 2:external drive
57 negative:volume or subdirectory reference number.
58 fn - If not empty, fn is the name of the file to be opened on
59 volume (subdirectory) volRef. The drive number may be placed
60 in front of the file name separated by a colon (e.g.1:name).
61 It overwrites volRef.
62 - If empty, an open dialog box is displayed which allows
63 choosing the volume, subdirectory and filename. The chosen
64 values are returned in f^. The value of volRef is irrelevant
65 in this case.
66 (Advanced programmers: Only those files are displayed whose
67 file type is contained in ftype. Own procedures may be
68 supplied in the variables "filterHook" and "dlgHook" to
69 suppress file names in the open box or to handle additional
70 dialog items.)
71 output TRUE: the specified file is opened for output. Any existing
72 file with the same name is deleted.
73 FALSE: the specified file is opened for input.
74 Done indicates if the file f has been opened successfully.*)
75
76 PROCEDURE Close(VAR f:File);
77 (* Closes file f. f becomes NIL*)
78
79 PROCEDURE Read(f:File; VAR ch:CHAR);
80 (* Reads a character ch from the file f (no echo on the console).
81 Done indicates if the operation has been successful*)
82
83 PROCEDURE ReadCard(f:File; VAR val:CARDINAL);
84 (* Reads a CARDINAL from file f (leading blanks are skipped).
85 termCH and Done get values*)
86
87 PROCEDURE ReadInt(f:File; VAR val:INTEGER);
88 (* Reads an INTEGER from file f (leading blanks are skipped).
89 termCH and Done get values*)
90
91 PROCEDURE ReadString(f:File; VAR s:ARRAY OF CHAR);
92 (* Reads a string of characters (terminated by " " or CR) from
93 file f. termCH and Done get values*)
94
95 PROCEDURE ReadWord(f:File; VAR w:CARDINAL);
96 (* Reads a 16 bit word w from the file f without conversion*)
97
98 PROCEDURE Write(f:File; ch:CHAR);
99 (* Writes a character ch to the file f*)
100
101 PROCEDURE WriteCard(f:File; nr:CARDINAL; w:INTEGER);
102 (* Writes a CARDINAL nr with width w to the file f. If the actual
103 width of nr is bigger than w, w is expanded*)
104
105 PROCEDURE WriteHex(f:File; a:ARRAY OF WORD; length:INTEGER);
106 (* Writes length hexadecimal bytes from a to the file f*)
107
108 PROCEDURE WriteInt(f:File; i:INTEGER; w:INTEGER);
109 (* Writes an INTEGER i with w characters to file f. If the actual
110 width of i is bigger than w, w is expanded*)
111
112 PROCEDURE WriteLn(f:File);
113 (* Skips to the start of the next line on the file f*)
114
115 PROCEDURE WriteString(f:File; s:ARRAY OF CHAR);
116 (* Writes a string s to the file f. Any occurrence of the character
117 "$" in s causes a WriteLn*)
118
119 PROCEDURE WriteText(f:File; t:ARRAY OF CHAR; l:INTEGER);
120 (* Writes a text t with length l to the file f*)
121
122 PROCEDURE WriteWord(f:File; w:CARDINAL);
123 (* Writes a 16 bit word w without conversion to the file f*)
124
125 END FileIO.
1 (* FileIO Simple IO with more than one file Moe 16.8.87
2 ====== =================================
3 This module provides procedures which are similar to those of InOut,
4 except that they can be used with more than one file (even with the
5 console).
6 *)
7 IMPLEMENTATION MODULE FileIO;
8
9 FROM SYSTEM IMPORT WORD, ADR, SETREG, REG, SHORT, VAL;
10 FROM MemTypes IMPORT Str255, ProcPtr;
11 FROM OS IMPORT DupFNErr, EOFErr, OSType, ParamBlockRec,
12 FS, PBHOpen, PBHCreate,PBClose, PBHDelete, PBRead,
13 PBWrite,
14 HFS, GetCatInfo, SetCatInfo,
15 SFGetFile, SFPutFile, SFget, SFput, SFReply,
16 SFTypeList;
17 FROM QuickDraw IMPORT Point;
18 FROM Toolbox IMPORT ModStr, PasStr;
19 FROM System IMPORT Allocate, Deallocate;
20 IMPORT Terminal;
21
22
23 (* Open Open a file on the specified volume
24 *)
25 PROCEDURE Open(VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR;
26 output:BOOLEAN);
27 VAR
28 par: ParamBlockRec;
29 s: Str255;
30 pt: Point;
31 reply: SFReply;
32 tlist: SFTypeList;
33 i,j,l: INTEGER;
34
35 PROCEDURE Create (drive:INTEGER; name:ARRAY OF CHAR;
36 type,creator:OSType; VAR status:INTEGER);
37 VAR status1:INTEGER; par:ParamBlockRec;
38 BEGIN
39 WITH par DO
40 ioNamePtr:=ADR(name); ioVRefNum:=drive; ioVersNum:=0C; ioDirID:=0;
41 status:=FS(PBHCreate,par); status1:=0;
42 IF status=DupFNErr THEN
43 status1:=FS(PBHDelete,par);
44 status:=FS(PBHCreate,par);
45 END;
46 IF (status=0) AND (status1=0) THEN (*set finder info*)
47 ioFDirIndex:=0; status:=HFS(GetCatInfo,par);
48 IF status=0 THEN
49 ioFlFndrInfo.fdType:=type; ioFlFndrInfo.fdCreator:=creator;
50 ioDirID:=0;
51 status:=HFS(SetCatInfo,par);
52 END;
53 END;
54 END;
55 END Create;
56
57 BEGIN
58 Done:=TRUE; errCode:=0;
59 IF fn[0]=0C THEN (*get file name from dialog box*)
60 pt.v:=60; pt.h:=100; PasStr(fn,s);
61 IF output
62 THEN SFPutFile(pt,s,s, VAL(ProcPtr,dlgHook),reply,SFput)
63 ELSE
64 i:=0;
65 WHILE (i<4) AND (ftype[i,0]<>0C) DO
66 FOR j:=0 TO 3 DO tlist[i,j+1]:=ftype[i,j] END;
67 INC(i)
68 END;
69 SFGetFile(pt, s, VAL(ProcPtr,filterHook),i,tlist,
70 VAL(ProcPtr,dlgHook),reply,SFget)
71 END;
72 IF reply.good
73 THEN
74 l:=ORD(reply.fName[0]);
75 FOR i:=0 TO l DO s[i]:=reply.fName[i]; END;
76 volRef:=reply.vRefNum
77 ELSE errCode:=2 (*cancel*)
78 END;
79 ELSIF (fn[1]=":") AND (fn[0]>="0") AND (fn[0]<="9") THEN
80 volRef:=ORD(fn[0])-ORD("0");
81 i:=2;
82 WHILE (i<=HIGH(fn)) AND (fn[i]<>0C) DO s[i-1]:=fn[i]; INC(i) END;
83 s[0]:=CHR(i-2);
84 ELSE PasStr(fn,s);
85 END;
86
87 IF output & (errCode=0) THEN
88 Create(volRef,s,"TEXT","????",errCode);
89 END;
90
91 IF errCode=0 THEN
92 WITH par DO
93 ioNamePtr:=ADR(s); ioVRefNum:=volRef; ioVersNum:=0C; ioDirID:=0;
94 ioPermssn:=0C; ioMisc:=NIL;
95 errCode:=FS(PBHOpen,par);
96 IF errCode=0 THEN
97 Allocate(f,SIZE(FileRecord));
98 IF f<>NIL THEN
99 f^.ref:=ioRefNum; f^.volRef:=volRef; ModStr(s,f^.name);
100 f^.bp:=0; f^.bb:=0; f^.eof:=FALSE; f^.output:=output;
101 END;
102 END;
103 END;
104 END;
105 IF errCode#0 THEN Done:=FALSE; f:=NIL END;
106 END Open;
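Lines 79 to 84 of Open implement the drive prefix described in the definition module (e.g. 1:name). If the second character is a colon and the first is a digit, that digit replaces volRef and the rest of the name is copied into the Pascal string s. The Python sketch below reproduces only this rule. It assumes the name arrives as a Python string that may contain a 0C terminator. split_drive_prefix is a hypothetical name. The dialog-box case (empty name) and the conversion to a Pascal string are not modelled.

```python
from typing import Tuple


def split_drive_prefix(fn: str, vol_ref: int) -> Tuple[int, str]:
    """Return (vol_ref, name) following the "d:name" rule of FileIO.Open (sketch)."""
    fn = fn.split("\0", 1)[0]              # a Modula string may end with 0C
    if len(fn) >= 2 and fn[1] == ":" and "0" <= fn[0] <= "9":
        # the digit before the colon overrides the vol_ref argument
        return ord(fn[0]) - ord("0"), fn[2:]
    return vol_ref, fn
```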
107
108
109 (* Close Close file f
110 *)
111 PROCEDURE Close(VAR f:File);
112 VAR par:ParamBlockRec;
113 BEGIN
114 IF f=NIL THEN RETURN END; (*con cannot be closed*)
115 par.ioRefNum:=f^.ref;
116 IF f^.output THEN
117 par.ioBuffer:=ADR(f^.buffer);
118 par.ioReqCount:=f^.bp; par.ioPosMode:=0; par.ioPosOffset:=0;
119 errCode:=FS(PBWrite,par)
120 END;
121 errCode:=FS(PBClose,par); Done:=errCode=0;
122 Deallocate(f); f:=NIL;
123 END Close;
124
125
126 (* Read Read a character from file f
127 *)
128 PROCEDURE Read(f:File; VAR ch:CHAR);
129 VAR par:ParamBlockRec;
130 BEGIN
131 IF f=NIL (*con*)
132 THEN Terminal.Read(ch);
133 ELSE
134 WITH f^ DO
135 IF bp>=bb THEN
136 par.ioRefNum:=ref; par.ioBuffer:=ADR(buffer);
137 par.ioReqCount:=buffersize; par.ioPosMode:=0;
138 par.ioPosOffset:=0;
139 errCode:=FS(PBRead,par);
140 IF errCode=EOFErr THEN errCode:=0 END;
141 bb:=SHORT(par.ioActCount); bp:=0;
142 IF bb=0 THEN
143 buffer[0]:=EF; eof:=TRUE; Done:=FALSE; errCode:=EOFErr
144 END
145 END;
146 ch:=buffer[bp]; INC(bp)
147 END
148 END;
149 END Read;
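Read refills its buffer only when bp has reached bb. Once the file is exhausted, it keeps delivering the character EF (4C) and sets Done to FALSE. The Python sketch below mirrors that scheme. It assumes the underlying file is any binary stream with a read(n) method. CharReader and its fields are illustrative names only, and the error-code handling of the original is omitted.

```python
import io

EF = "\x04"   # end-of-file character, 4C in FileIO


class CharReader:
    """Character input with the bp/bb buffer scheme of FileIO.Read (sketch)."""

    def __init__(self, raw: io.BufferedIOBase, buffersize: int = 16 * 1024):
        self.raw = raw
        self.buffersize = buffersize
        self.buffer = b""
        self.bp = 0          # index of next byte in buffer
        self.bb = 0          # number of bytes in buffer
        self.eof = False
        self.done = True

    def read(self) -> str:
        if self.bp >= self.bb:                       # buffer used up: refill it
            self.buffer = self.raw.read(self.buffersize)
            self.bb, self.bp = len(self.buffer), 0
            if self.bb == 0:                         # no more bytes: deliver EF
                self.buffer = EF.encode("ascii")
                self.eof, self.done = True, False
        ch = chr(self.buffer[self.bp])
        self.bp += 1
        return ch
```

Because bb stays 0 after the end of the file, every further call tries to refill the buffer and again returns EF, just as in the listing.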
150
151
152 (* ReadCard Read a CARDINAL-constant from file f
153 *)
154 PROCEDURE ReadCard(f:File; VAR val: CARDINAL);
155 VAR ch:CHAR; i:INTEGER;
156 BEGIN
157 IF f=NIL (*con*)
158 THEN (*input from terminal*)
159 i:=0; val:=0;
160 REPEAT Terminal.Read(ch); UNTIL ch<>" ";
161 WHILE ch>" " DO
162 IF ch=DEL THEN
163 IF i>0 THEN
164 Terminal.Write(ch); DEC(i); val:=val DIV 10;
165 END;
166 ELSIF (ch>="0") AND (ch<="9") AND
167 ((val<6553) OR ((val=6553) AND (ch<="5"))) THEN
168 Terminal.Write(ch); INC(i);
169 val:=10*val+VAL(CARDINAL,ORD(ch)-ORD("0"));
170 END;
171 Terminal.Read(ch);
172 END;
173 Done:=i>0;
174 ELSE (*input from file*)
175 val:=0; Done:=TRUE;
176 REPEAT Read(f,ch) UNTIL ch<>" ";
177 WHILE ch>" " DO
178 IF (ch>="0") AND (ch<="9") AND Done AND
179 ((val<6553) OR ((val=6553) AND (ch<="5")))
180 THEN val:=10*val+VAL(CARDINAL,ORD(ch)-ORD("0"));
181 ELSE Done:=FALSE; val:=0;
182 END;
183 Read(f,ch);
184 END;
185 END;
186 termCH:=ch;
187 END ReadCard;
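The test in lines 167 and 179 prevents overflow of a 16-bit CARDINAL. The value 10*val+d stays within 65535 exactly when val<6553, or when val=6553 and d<=5, because 10*6553+5 = 65535. The file branch can be sketched in Python as follows. accumulate_cardinal is a hypothetical name. Skipping leading blanks and setting termCH are left out.

```python
from typing import Tuple


def accumulate_cardinal(digits: str) -> Tuple[int, bool]:
    """Accumulate a digit string as the file branch of FileIO.ReadCard does (sketch)."""
    val, done = 0, True
    for ch in digits:
        # 10*val + d <= 65535 holds exactly when val < 6553, or val == 6553 and d <= 5
        if "0" <= ch <= "9" and done and (val < 6553 or (val == 6553 and ch <= "5")):
            val = 10 * val + (ord(ch) - ord("0"))
        else:
            done, val = False, 0     # after an error the result stays 0
    return val, done
```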
188
189
190 (* ReadInt Read an INTEGER-constant from file f
191 *)
192 PROCEDURE ReadInt(f:File; VAR val: INTEGER);
193 VAR
194 ch: CHAR;
195 sign: INTEGER;
196 x: CARDINAL;
197 s: ARRAY[1..80] OF CHAR;
198 i: INTEGER;
199 BEGIN
200 ReadString(f,s);
201 x:=0; val:=0; i:=1;
202 IF s[i]="-" THEN sign:=-1; INC(i); ELSE sign:=1; END;
203 ch:=s[i];
204 LOOP
205 IF ch=0C THEN Done:=TRUE; EXIT; END;
206 IF (ch<"0") OR (ch>"9") THEN Done:=FALSE; EXIT; END;
207 IF (x>3276) OR ((x=3276) AND (ch>"8")) THEN Done:=FALSE; EXIT END;
208 x:=10*x+VAL(CARDINAL,ORD(ch)-ORD("0"));
209 INC(i); ch:=s[i];
210 END;
211 IF Done THEN
212 IF x<=32767 THEN val:=sign*VAL(INTEGER,x);
213 ELSIF sign=-1 THEN val:=-32767; DEC(val);
214 ELSE Done:=FALSE; END;
215 END;
216 END ReadInt;
217
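ReadInt accumulates the magnitude in a CARDINAL and lets it reach 32768. It must do so because -32768 is representable in a 16-bit INTEGER while +32768 is not. Line 213 builds -32768 as -32767-1, presumably because the literal 32768 would not fit in an INTEGER. The Python sketch below shows the conversion. It assumes the token has already been read into a string. parse_int16 is a hypothetical name.

```python
from typing import Tuple


def parse_int16(s: str) -> Tuple[int, bool]:
    """Convert a token to a 16-bit INTEGER as FileIO.ReadInt does (sketch)."""
    i, sign, x = 0, 1, 0
    if s[:1] == "-":
        sign, i = -1, 1
    for ch in s[i:]:
        if not "0" <= ch <= "9":
            return 0, False
        if x > 3276 or (x == 3276 and ch > "8"):   # magnitude would exceed 32768
            return 0, False
        x = 10 * x + (ord(ch) - ord("0"))
    if x <= 32767:
        return sign * x, True
    if sign == -1:                 # x == 32768 fits only as -32768
        return -32767 - 1, True
    return 0, False
```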
218
219 (* ReadString Read a string of characters from file f
220 *)
221 PROCEDURE ReadString(f:File; VAR s:ARRAY OF CHAR);
222 VAR i:INTEGER; ch:CHAR;
223 BEGIN
224 IF f=NIL (*con*)
225 THEN
226 REPEAT Terminal.Read(ch); UNTIL ch<>" ";
227 i:=-1;
228 WHILE ch>" " DO
229 IF ch=DEL THEN
230 IF i>=0 THEN Terminal.Write(10C); DEC(i); END;
231 ELSIF i<HIGH(s) THEN
232 Terminal.Write(ch); INC(i); s[i]:=ch;
233 END;
234 Terminal.Read(ch);
235 END;
236 ELSE
237 REPEAT Read(f,ch); UNTIL ch<>" ";
238 i:=-1;
239 WHILE ch>" " DO
240 IF i<HIGH(s) THEN INC(i); s[i]:=ch; END;
241 Read(f,ch);
242 END;
243 END;
244 termCH:=ch;
245 INC(i);
246 IF i<=HIGH(s) THEN s[i]:=0C; END;
247 END ReadString;
248
249
250 (* ReadWord Read a word from File f without conversion
251 *)
252 PROCEDURE ReadWord(f:File; VAR w:CARDINAL);
253 VAR i, j: CHAR;
254 BEGIN
255 Read(f,i); Read(f,j);
256 w:=256*ORD(i) + ORD(j);
257 END ReadWord;
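ReadWord and its counterpart WriteWord (lines 388 to 392) store a 16-bit word with the high byte first, that is, in big-endian order. A Python sketch of the pair follows. read_word and write_word are hypothetical names. The word is assumed to lie in 0..65535, because bytes() raises ValueError for larger values.

```python
def write_word(w: int) -> bytes:
    """Two bytes, high byte first, as FileIO.WriteWord writes them (w in 0..65535)."""
    return bytes([w // 256, w % 256])


def read_word(data: bytes, pos: int = 0) -> int:
    """Inverse of write_word, following FileIO.ReadWord: w = 256*high + low."""
    return 256 * data[pos] + data[pos + 1]
```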
258
259
260 (* Write Write a character to list file
261 *)
262 PROCEDURE Write(f:File; ch:CHAR);
263 VAR par:ParamBlockRec; status:INTEGER;
264 BEGIN
265 IF f=NIL (*con*)
266 THEN Terminal.Write(ch);
267 ELSE
268 WITH f^ DO
269 IF bp>=buffersize THEN
270 par.ioRefNum:=ref; par.ioBuffer:=ADR(buffer);
271 par.ioReqCount:=buffersize; par.ioPosMode:=0;
272 par.ioPosOffset:=0;
273 status:=FS(PBWrite,par);
274 bp:=0
275 END;
276 buffer[bp]:=ch; INC(bp)
277 END
278 END;
279 END Write;
280
281
282 (* WriteCard Write a cardinal to list file
283 *)
284 PROCEDURE WriteCard(f:File; nr:CARDINAL; w:INTEGER);
285 VAR
286 l,d: INTEGER;
287 t: ARRAY[1..5] OF CHAR;
288 BEGIN
289 l:=0;
290 REPEAT
291 d:=nr MOD 10; nr:=nr DIV 10;
292 INC(l); t[l]:=CHR(ORD("0")+d);
293 UNTIL nr=0;
294 WHILE w>l DO Write(f," "); DEC(w); END;
295 WHILE l>0 DO Write(f,t[l]); DEC(l); END;
296 END WriteCard;
297
298
299 (* WriteHex Write length bytes from a
300 *)
301 PROCEDURE WriteHex(f:File; s:ARRAY OF WORD; length:INTEGER);
302 VAR i,j:INTEGER; w:CARDINAL;
303
304 PROCEDURE WriteHexDigit(b:INTEGER);
305 BEGIN
306 IF b<10
307 THEN Write(f,CHR(b+ORD("0")));
308 ELSE Write(f,CHR(b-10+ORD("A"))); END;
309 END WriteHexDigit;
310
311 BEGIN (*WriteHex*)
312 j:=0;
313 FOR i:=1 TO length DO
314 IF ODD(i)
315 THEN w:=VAL(CARDINAL,s[j]) DIV 256;
316 ELSE w:=VAL(CARDINAL,s[j]) MOD 256; INC(j);
317 END;
318 Write(f," ");
319 WriteHexDigit(w DIV 16);
320 WriteHexDigit(w MOD 16);
321 END;
322 END WriteHex;
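WriteHex walks through the WORD array one byte at a time. For odd i it prints the high byte of s[j]. For even i it prints the low byte and then advances j. The Python sketch below assumes the words are given as integers in 0..65535. hex_bytes is a hypothetical name, and it returns the text rather than writing it to a file.

```python
from typing import List


def hex_bytes(words: List[int], length: int) -> str:
    """Format `length` bytes of a 16-bit word array like FileIO.WriteHex (sketch)."""
    digits = "0123456789ABCDEF"
    out, j = [], 0
    for i in range(1, length + 1):
        if i % 2 == 1:                 # odd i: high byte of the current word
            b = words[j] // 256
        else:                          # even i: low byte, then move to the next word
            b = words[j] % 256
            j += 1
        out.append(" " + digits[b // 16] + digits[b % 16])
    return "".join(out)
```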
323
324
325 (* WriteInt Write an INTEGER-value to file f
326 *)
327 PROCEDURE WriteInt(f:File; i:INTEGER; w:INTEGER);
328 VAR
329 l,d: INTEGER;
330 x: CARDINAL;
331 t: ARRAY[1..5] OF CHAR;
332 sign: CHAR;
333 BEGIN
334 IF i<0
335 THEN sign:="-"; x:=VAL(CARDINAL,ABS(i+1)); INC(x);
336 ELSE sign:=" "; x:=VAL(CARDINAL,ABS(i));
337 END;
338 l:=0;
339 REPEAT
340 d:=x MOD 10; x:=x DIV 10;
341 INC(l); t[l]:=CHR(ORD("0")+d);
342 UNTIL x=0;
343 WHILE w>l+1 DO Write(f," "); DEC(w); END;
344 IF (sign="-") OR (w>l) THEN Write(f,sign); END;
345 WHILE l>0 DO Write(f,t[l]); DEC(l); END;
346 END WriteInt;
347
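WriteInt computes the magnitude of a negative value as ABS(i+1)+1. This avoids evaluating ABS(-32768), whose result would not fit in a 16-bit INTEGER. It then writes a sign column (a blank for non-negative values) only when the value is negative or the field is wider than the digits. The Python sketch below follows the same layout rules. format_int is a hypothetical name that returns a string, and Python's str() stands in for the digit buffer t.

```python
def format_int(i: int, w: int) -> str:
    """Right-justify INTEGER i in width w, mirroring FileIO.WriteInt (sketch)."""
    if i < 0:
        sign = "-"
        x = abs(i + 1) + 1    # in 16-bit arithmetic ABS(-32768) would overflow
    else:
        sign = " "
        x = abs(i)
    digits = str(x)           # stands in for the digit buffer t[1..l]
    l = len(digits)
    out = " " * max(0, w - (l + 1))     # WHILE w>l+1 DO Write(f," ")
    if sign == "-" or w > l:            # sign column: "-" or a padding blank
        out += sign
    return out + digits
```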
348
349 (* WriteLn skip to new line on list file
350 *)
351 PROCEDURE WriteLn(f:File);
352 BEGIN
353 IF f=NIL (*con*)
354 THEN Terminal.WriteLn;
355 ELSE Write(f,EOL);
356 END;
357 END WriteLn;
358
359
360 (* WriteString Write a string to list file
361 *)
362 PROCEDURE WriteString(f:File; s:ARRAY OF CHAR);
363 VAR i: INTEGER;
364 BEGIN
365 i:=0;
366 LOOP
367 IF i>HIGH(s) THEN EXIT;
368 ELSIF s[i]="$" THEN WriteLn(f);
369 ELSIF s[i]=0C THEN EXIT;
370 ELSE Write(f,s[i]);
371 END;
372 INC(i);
373 END;
374 END WriteString;
375
376
377 (* WriteText Write text to list file
378 *)
379 PROCEDURE WriteText(f:File; t:ARRAY OF CHAR; l:INTEGER);
380 VAR i: INTEGER;
381 BEGIN
382 FOR i:=0 TO l-1 DO Write(f,t[i]); END;
383 END WriteText;
384
385
386 (* WriteWord Write a word to File f without conversion
387 *)
388 PROCEDURE WriteWord(f:File; w:CARDINAL);
389 BEGIN
390 Write(f,CHR(w DIV 256));
391 Write(f,CHR(w MOD 256));
392 END WriteWord;
393
394 BEGIN
395 con:=NIL; ftype[0]:="TEXT"; ftype[1]:="";
396 dlgHook:=VAL(DialogHook,NIL);
397 filterHook:=VAL(FilterHook,NIL);
398 errCode:=0;
399 END FileIO.
ABS 335 336
ADR 9 40 93 117 136 270
Allocate 19 97
b 304 306 307 308
bb 100 135 141 142
bp 100 118 135 141 146 146 269 274 276 276
buffer 117 136 143 146 270 276
buffersize 137 269 271
C 40 59 65 82 93 94 205 230 246 369
ch 128 132 146 155 160 160 161 162 164 166 166 167
168 169 171 176 176 177 178 178 179 180 183 186
194 203 205 206 206 207 208 209 222 226 226 228
229 232 232 234 237 237 239 240 241 244 262 266
276
Close 111 123
con 395
Create 35 55 88
creator 36 49
d 286 291 292 329 340 341
Deallocate 19 122
DEL 162 229
DialogHook 396
dlgHook 62 70 396
Done 58 105 121 143 173 175 178 181 205 206 207 211
214
drive 35 40
DupFNErr 11 42
EF 143
eof 100 143
EOFErr 11 140 143
EOL 355
errCode 58 77 87 88 91 95 96 105 119 121 121 139
140 140 143 398
f 25 97 98 99 99 99 100 100 100 100 105 111
114 115 116 117 118 122 122 128 131 134 154 157
176 183 192 200 221 224 237 241 252 255 255 262
265 268 284 294 295 301 307 308 318 327 343 344
345 351 353 355 362 368 370 379 382 388 390 391
fdCreator 49
fdType 49
File 25 111 128 154 192 221 252 262 284 301 327 351
362 379 388
FileIO 7 399
FileRecord 97
FilterHook 397
filterHook 69 397
fn 25 59 60 79 79 79 80 82 82 82 84
fName 74 75
FS 12 41 43 44 95 119 121 139 273
ftype 65 66 395 395
GetCatInfo 14 47
good 72
h 60
HFS 14 47 51
HIGH 82 231 240 246 367
i 33 64 65 65 66 66 67 69 75 75 75 81
82 82 82 82 82 83 155 159 163 164 168 173
198 201 202 202 203 209 209 222 227 230 230 231
232 232 238 240 240 240 245 246 246 253 255 256
302 313 314 327 334 335 336 363 365 367 368 369
370 372 380 382 382
ioActCount 141
ioBuffer 117 136 270
ioDirID 40 50 93
ioFDirIndex 47
ioFlFndrInfo 49 49
ioMisc 94
ioNamePtr 40 93
ioPermssn 94
ioPosMode 118 137 271
ioPosOffset 118 138 272
ioRefNum 99 115 136 270
ioReqCount, ioVersNum, ioVRefNum, j, l, length, MemTypes, ModStr, name, nr, ODD, Open, OS, OSType, output, par, ParamBlockRec, PasStr, PBClose, PBHCreate, PBHDelete, PBHOpen, PBRead, PBWrite, Point, ProcPtr, pt, QuickDraw, Read, ReadCard, ReadInt, ReadString, ReadWord, ref, REG, reply, s, SetCatInfo, SETREG, SFget, SFGetFile, SFput, SFPutFile, SFReply, SFTypeList, SHORT, sign, status, status1, Str255, SYSTEM, System, t: line numbers illegible in this copy
termCH 186 244
Terminal 20 132 160 164 168 171 226 230 232 234 266 354
tlist 32 66 69
Toolbox 18
type 36 49
v 60
val 154 159 164 164 167 167 169 169 175 179 179 180
180 181 192 201 212 213 213
VAL 9 62 69 70 169 180 208 212 315 316 335 336
396 397
volRef 25 76 80 88 93 99 99
vRefNum 76
w 252 256 284 294 294 302 315 316 319 320 327 343
343 344 388 390 391
WORD 9 301
Write 164 168 230 232 262 266 279 294 295 307 308 318
343 344 345 355 370 382 390 391
WriteCard 284 296
WriteHex 301 322
WriteHexDigit 304 309 319 320
WriteInt 327 346
WriteLn 351 354 357 368
WriteString 362 374
WriteText 379 383
WriteWord 388 392
x 196 201 207 207 208 208 212 212 330 335 335 336
340 340 340 342
1 (* System System dependent module (from MacMETH [86])
2 ====== =======================
3 The module System is the heart of the Modula-2 system on the Macintosh.
4 It contains the loader and procedures to supply missing instructions
5 of the processor (REAL and LONGINT arithmetic). There are also
6 procedures for calling and terminating programs and handling the heap.
7 *)
8 DEFINITION MODULE System; (*H.Seller, C.Vetterli, 22-Dec-85/26-Feb-86*)
9
10 FROM SYSTEM IMPORT ADDRESS;
11 ...
12
13 TYPE Status = (normal, moduleNotFound, fileNotFound, illegalKey,
14 readError, badSyntax, noMemory, alreadyLoaded,
15 killed, tooManyPrograms, continue, noApplication);
16
17 PROCEDURE Allocate(VAR ptr:ADDRESS; size:LONGINT);
18 (* Tries to allocate a memory area of the given size on the heap. If the
19 space is not available, ptr returns NIL otherwise ptr returns the
20 address of the reserved area*)
21
22 PROCEDURE Deallocate(VAR ptr:ADDRESS);
23 (* Releases the memory area given by address ptr. ptr returns NIL*)
24
25 PROCEDURE Terminate(status:Status);
26 (* terminates the currently running process, status signals the
27 cause of termination*)
28
29 ...
30
31 END System.
Bibliography
Aho A.V., Johnson S.C. [1974] LR-parsing, Computing Surveys 6, 2, 99-124
Aho A.V., Ullman J.D. [1972] The Theory of Parsing, Translation, and Compiling,
Prentice Hall
Aho A.V., Ullman J.D. [1977] Principles of Compiler Design, Addison-Wesley
Bauer F.L., Eickel J. (eds) [1976] Compiler Construction. An Advanced Course, Springer-Verlag
Blaschek G., Pomberger G., Ritzinger F. [1985] Einführung in die Programmierung mit
Modula-2, Springer-Verlag, to appear in English 1989
Engelfriet J., File G. [1981] Passes, Sweeps, and Visits, in: Lecture Notes in Computer
Science 115, Springer-Verlag, 193-207
Feldman J.A., Gries D. [1968] Translator writing systems, CACM 9, 1, 77-113
Fischer C.N., LeBlanc R.J. [1988] Crafting a Compiler, The Benjamin/Cummings
Publishing Company
Ganzinger H., Giegerich R. [1984] Attribute coupled grammars, SIGPLAN Notices 19, 6,
157-170
Gries D. [1971] Compiler Construction for Digital Computers, Wiley
Hartmann A.C. [1977] A Concurrent Pascal Compiler for Minicomputers, Springer-Verlag
Henderson P., Snowdon R. [1972] An experiment in structured programming, Bit 2, 38-53
Hopcroft J.E., Ullman J.D. [1979] Introduction to Automata Theory, Languages, and
Computation, Addison-Wesley
Hughes J.W. [1979] A formalization and explication of the Michael Jackson method of
program design, SOFTWARE - Practice and Experience 9, 191-202
Inside Macintosh [1985] volumes I-III, Addison-Wesley
Jackson M.A. [1975] Principles of Program Design, Academic Press
Johnson S.C. [1975] YACC - Yet Another Compiler-Compiler, Tech.Rep.Nr.32, Bell
Laboratories, July 1975
Kastens U., Hutt B., Zimmermann E. [1982] GAG: A Practical Compiler-Generator, in:
Lecture Notes in Computer Science 141, Springer-Verlag
Knuth D.E. [1965] On the translation of languages from left to right, Information and
Control 8, 6, 607-639
Knuth D.E. [1968] Semantics of context-free languages, Mathematical Systems Theory 2,
127-145
Koskimies K. [1984] A specification language for one-pass semantic analysis, SIGPLAN
Notices 19, 6, 179-189
Koskimies K., Räihä K.-J., Sarjakoski M. [1982] Compiler construction using attribute
grammars, Proc. SIGPLAN 82 Symposium on Compiler Construction, June 1982,
153-159
Lewis P.M., Rosenkrantz D.J., Stearns R.E. [1976] Compiler Design Theory, Addison-
Wesley
Lewis P.M., Stearns R.E. [1968] Syntax directed transduction, Journal ACM 15,
3, 464-488
Meijer H., Nijholt A. [1982] YABBER - yet another bibliography: translator writing tools,
SIGPLAN Notices 17, 10
Mössenböck H. [1986] Alex - a simple and efficient scanner-generator, SIGPLAN Notices
21, 5
Pomberger G. [1986] Software Engineering and Modula-2, Prentice Hall
Räihä K.-J. [1977] On Attribute Grammars and their Use in a Compiler Writing System,
Report A-1977-4, Department of Computer Science, University of Helsinki
Räihä K.-J. [1980] Bibliography on attribute grammars, SIGPLAN Notices 15, 3
Räihä K.-J., et al. [1983] Revised Report on the Compiler Writing System HLP78,
Report A-1983-1, Department of Computer Science, University of Helsinki
Rosen S. (ed.) [1967] Programming Systems and Languages, McGraw-Hill, New York
Rosenkrantz D.J., Stearns R.E. [1970] Properties of deterministic top-down grammars,
Information and Control 17, 3, 226-256
Spenke M., Mühlenbein H., Mevenkamp M., et al. [1984] A language independent error
recovery method for LL(1) parsers, SOFTWARE - Practice and Experience 14, 11
Tienari M. [1980] On the Definition of an Attribute Grammar, in: Lecture Notes in
Computer Science 94 (eds Goos, G. and Hartmanis, J.), Springer-Verlag
Waite W.M., Goos G. [1984] Compiler Construction, Springer-Verlag
Watt D.E., Lehrmann Madsen O. [1983] Extended attribute grammars, The Computer
Journal 26, 2, 142-153
Wirth N. [1982] Programming in Modula-2, Springer-Verlag
Wirth N. [1986] Compilerbau, B.G. Teubner Stuttgart
Wirth N., Gutknecht J., Heiz W., et al. [1986] MacMETH - A Fast Modula-2 Language
System For the Apple Macintosh, User Manual, ETH Zürich
Index
actual attributes, 113,165
address list for G-code generation, 157
Adele, 11, 125, 203
Aho, 13,41
Alex, 119
Algol 60, 52
algorithmic interpretation of grammars, 83
alias name, 109,123
aliasspix, 128
alphabet, 14
extension, 51
alternative chain, 48,108
alternatives, 15
of deletable nonterminals, 137
of eps-nodes, 137
ambiguity, 108
analysis phase, 4
analyzing grammar, 23
and, 208
any, 45, 107, 122, 124, 178
any-set, 140,147,155
anyset, 54
applications of attributed grammars, 171
arithmetic expressions, 19
arithmetization of symbols, 6
arrows, 112
assessment of some compiler generators, 102
at, 122,165
Atari, 101,126
attribute, 71,72,113
assignment, 131,165
context, 167
coupling, 98
direction, 164
evaluation, 79
list, 129,164, 226
numbers, 155
passing, 87
processing, 164
saving, 90
attributed grammar, 73,79,105
applications, 171
of Coco, 228
attributes
consistency check, 165
of terminals, 122
Attrkind, 166
back end, 6
Bauer, 7
BITSET, 208
Blaschek,207
BNF, 102
bottom-up syntax analysis, 24
brackets, 136
caller interface, 121
CAP, 209
CARDINAL, 208
central-recursive grammar, 19
characteristics of Coco, 117
CheckAlternatives, 153
circular, 108
derivation, 21
grammar, 21
circularity, 150
CloseFile, 223
Coco, 4,104, 222,241
characteristics, 117
history, 197
short description, 100
COCO.ATG, 228
cocogen, 224,245
cocogen2, 225, 254
cocogra, 224,266
Cocol, 4, 105
example, 101,134,163,167,174,186,
190,192
syntax, 212
cocolex, 223,275
cocolst, 226,283
cocosem, 223,287
cocosemframe, 161,297
cocosy^ 224,299
cocosyn, 223, 316
cocosynframe, 159, 328
cocotst, 225,338
col, 122
CollectFirst, 143
CollectFollow, 144
comments, 106,110
compiler, 2
compiler compiler, 3,91
compiler description language, 3,105
compiler error numbers, 241
compiler structure
dynamic, 8
static, 4
complement symbol any, 45,107
Complete, 145
CompleteAt, 129, 223
completeness, 108,149
components of a generated compiler, 119
compound characters, 6
ConcatLeft, 133, 223
ConcatRight, 132, 223
context condition, 76,87,115
context-free grammar, 15,106
Copy, 162, 163, 223
CopyFramePart, 160, 161
correct grammar, properties, 108
cross-reference list, 214
cyclic semantic dependencies, 82
dangling else, 29,108,147
debug switches, 241
DEC, 209
declaration of
semantic objects, 115
symbols, 109
definition module, 210
DelEps, 139
deletability,31
direct, 128,134
indirect, 141
Deletable, 60, 141
deletable nonterminal, 31,141
Delete redundant eps-nodes, 127, 138
DelGraph, 141
derivable symbol, 21
derivation, 16
rules, 15
derived attributes, 74
deterministic grammar, 24
direct deletability, 128,134
documentation, 187
dynamic
compiler structure, 8
EBNF, 19,20,107,117
Emit, 157
EmitAction, 166, 167, 223
empty string, 14,107
end-of-file symbol, 109
end-of-line symbol, 110
ends€Ri,70
Engelfriet,98
eps, 107
followers, 54
eps-nodes
insertion, 136
removal, 138
terminal successors of, 140
eps-set, 140,145,155
example, 196
epsset, 54
equivalent top-down graphs, 45
errdist, 68
Error, 60, 65, 68
error distance, 68
error handling, 62,64
error message module, 119,226,348
error messages, 65,123
Errorptr, 123
Errors, 123, 226, 348
example of
Cocol, 101,163,167,174,186,190,192
generated compiler parts, 192
EXCL, 209
exit statement, 209
experiences, 197,201
export list, 209
extended Backus-Naur form, 19
factorization of
nonterminals, 49
top-down graphs, 43
File, 98
FileIO, 226, 356
Fill, 67
FillSucc,67
filter procedure, 120
Find circular rules, 148, 150
Find deletable symbols, 127,141
FindEps, 146
FindEpsFollowers, 146
first(X),26,54
Fischer, 13
follow(X),28,143
formal attributes, 113,165
frame module, 118,159,161,297,328
free monoid, 14
free semi-group, 14
front end, 6
G-code,53,55,88,117,155, 213
example, 195
generation, 156
parser, 58
GAG, 91,96,102,104
Ganzinger,91,98
GenAssign, 166, 167,223
GenCode, 156,157
Generate G-code, 157
generated compiler parts, 118
example, 192
generated compiler, operation, 120
generated semantic actions, 165
generation of the
semantic evaluator, 245
syntax analyzer, 254
generative grammar, 23
Get eps-sets, 145
Get symbol sets, 127
Get terminal start symbols, 142
Get terminal successors, 144
GetAdr, 157
GetAt, 129,165,167,223
GetFirstSet, 142
GetMacroNr, 163,223
GetNode, 131,140, 148,157,223
GetSingles, 151
GetSy, 122, 124, 129,140,148,223
Giegerich,91
Goos, 13,82, 83
GRAMMAR, 106
grammar, 15
grammar of Cocol, 212
grammar name, 106,110,121
grammar rules, 107
grammar tests, 126,147,225, 338
grammars in matrix form, 34
grammatical language levels, 22
GraphList, 223
Graphnode, 47, 130
Gries, 7,13,85
HALT, 209
handle, 18
Hartmann,85
Henderson, 184
HIGH, 209
hints for reading the source lists, 226
HLP84, 91,94,104
Hopcroft,21
Hughes, 188
Hutt,96
IBM-PC, 101,126
identifiers, 106
implementation description, 125
implementation module, 210
implementation restrictions, 241
import, 115,122
list, 209
INC, 209
INCL, 209
indirect deletability, 141
individual characters, 6
inherited attributes, 74,75
inner module, 211
input attribute, 113
input of Coco, 118
input interface, 122
Insert eps-nodes before deletable
nt's,127,138
interfaces of the generated compiler, 121
intermediate language, 120,124
intermodular cross-reference list, 214
invocation of Coco, 118
IsTerm, 152
Jackson, 187
Johnson, 13,91,92
Kastens, 91, 96
keywords, 6,105
Knuth, 13,29, 82
Koskimies, 91,94,102
L-attributed grammar, 4,82,83,92,117
LALR(1) parser, 92,94,96
language, 16
levels, 22
LeBlanc, 13
left-canonical derivation, 17
left-recursive grammar, 19
Lewis, 82
lexical
analysis, 5,6
analyzer, 119,122,129,165,275
analyzer described by Cocol, 171
analyzer, specification, 172
language level, 22
Lilith, 101,126,198
line, 122
line numbers, 122,131
linking
alternative graphs, 133
component graphs, 132
listings, 220
literals, 6
LL(1) test, 148,153
LL(1) analysis
nonrecursive, 38
recursive, 35
LL(1) conditions, 27,28
for top-down-graphs, 47,49
LL(1) conflicts, 108
in lexical structures, 179
LL(1) grammar, 23,26,201
LL(k) condition, 40
LL(k) grammar, 25,40
LL(k) test, 41
lookahead, 25
Macintosh, 101,119,126
macro, 112,116,163
main algorithm of Coco, 127
main program, 119,121,210,222,241
MarkFeachecttts, 150
matching of symbols, 48
matrix form of grammars, 34
measurements, 197
Meijer, 91
memory requirements of
Coco, 199
the generated compilers, 200
MemTypes, 226
mini-scanner, 174
Modula-2, 111, 115,119,126,207
modules, 209
description, 222
hierarchy, 221
overview, 220
Mössenböck, 119
MUG, 91, 98,104
multi-pass compiler, 8,9,120,124
name list, 129,155
names, 6
Newadr, 157
Newftt, 129,164, 167, 223
NewMacro, 223
NewNode, 131, 223
NewSy, 129, 223
Nijholt, 91
nococosy, 162
nodes of the top-down graph, 130
non-circular grammar, 21
nonterminal, 14,15,110,128
deletable, 141
nonterminals
factorization of, 49
replacement of, 15
substitution of, 49
terminal successors of, 140,143
termination of, 108,152
numbering of terminals, 109,122
numbers, 106
OpenFile, 223
OpenSem, 163, 223
optimization of attribute processing, 167
option symbol, 20
OR, 208
ordered attributed grammar, 96
OS, 226
output attribute, 113
output of Coco, 118
output interface, 122
parameter arrows, 112
Parse, 58, 60, 86, 121, 127
ParseNonRecursive, 38
parser, 223,316
generation, 159
interface, 121
tables, 118,155
tables, example, 195
tables, generation, 154
ParseRecursive, 35
parts of the generated compiler, 119
Pascal, 207
pass, 8
phrase, 17,18
PL/1, 50
PLM/80, 50
Pomberger, 207
pragma, 109,124
semantics, 113,128,155
printinput, 121
printnodes, 121
procedures, 115
productions, 15,107
program frames, 118
program listings, 220
QuickDraw, 226
Raiha, 91,94
reachability, 149
recursive
grammar, 19
productions, 19
reduced grammar, 20,21
redundancy, 108
redundant
eps-node, 138
symbol, 21
repetition symbol, 20
replacement of nonterminals, 15
RepNode, 131,140,223
RepSy, 129,140,223
RestartHash, 162, 223
restrictions, 241
results of a Coco run, 192
right end of graphs, 131
right-recursive grammar, 19
Ritzinger, 207
root, 15
symbol, 106,110,149
Rosenkrantz, 40,42
RULES, 107
run-time of
Coco, 199
the generated compilers, 201
scanner, 129,165,223,275
scanner generator, 119,171
scanner interface, 122
scanner procedure, 122
scanner specification, 172
scope of semantic objects, 116
sem, 70,111
Semant, 85, 86
semantic
action numbers, 131
actions, 70, 111
actions, generated, 165
actions, processing, 163
analysis, 5, 8
declarations, copying, 162
description, 110
error action, 115
evaluator, 118,119,223
evaluator of Coco, 287
evaluator, example, 194
evaluator, generation, 160
frame module, 297
interface, 85
macro, 111, 112,116,163
modules, 119,122
procedures for lexical analysis, 180
semantics, 69
sentence, 16
symbol, 15
sentential form, 16
simple phrase, 18
single-pass compiler, 8,9
Snowdon, 184
software engineering, 182
source code, 220
hints, 226
source list, 118
generator, 283
source program, 2
spelling index, 129
spix, 128,129, 162,166
stacking of semantic objects, 116
start symbol, 110,149
StartCopy, 223
static compiler structure, 4
Stearns, 40,42
stepwise refinement, 11
StopHash, 162, 223
strings, 6,14,106
substitution of nonterminals, 49
symbol list, 126,127,224,226,299
symbol names, 129
symbol sets, collection, 140
Symbolnode, 127
symbols, 6, 14
Symboltype, 127
SyNr, 129,223
syntactic extension, SI
syntactical language level, 22
syntax
analysis, 5, 34
analyzer, 118,119,223,316
analyzer, generation, 159
of Cocol, 212
description, 106
error indicator, 121
error interface, 123
error message, 109
error-recovery, 118
notation, 107
rules, 15,107
tree, 7,14,17,91
SyntaxError, 123
synthesis phase, 5
synthesized attributes, 74
SYSTEM, 211
System, 226
system specific procedures, 369
target program, 2
tasks of Coco, 126
telegram problem, 184
terminal, 14,15,109,122,128
class, 23
start symbols, 26,31,32,140,142
start symbols of length k, 40
successors, 28,31,33
successors of eps-nodes, 140,145
successors of nonterminals, 140,143
terminating symbol, 21
termination, 21
of nonterminals, 108,152
Test completeness, 148,149
Test grammar, 127,148
Test if all nt's can be derived to fs, 148,152
Test if all nt's can be reached, 148,149
token code, 109,122
Toolbox, 226
top-down
graph, 42,126,130,226,266
graphs, equivalent, 45
graphs, factorization of, 43
syntax analysis, 23,24
top-down-graphs, LL(1) conditions for, 47,49
trace switches, 241
tracing the parser, 121
Triple, 66
two level-grammar, 77
tyRl22
type transfer functions, 209
Ullman, 13,21,41
understanding the source code, hints, 226
useless symbol, 21
user modules, 122
using Coco, 117
Vach, 98
van Wijngaarden, 77
variables, 115
versions of Coco, 4
Visited, 157
vocabulary, 14
Waite, 13,82,83
Watt, 77
where, 77
Wirth, 20,85,107,198,207
word, 208
YACC, 91,92, 98, 104
Zimmermann, 96