                    A COMPILER GENERATOR
FOR MICROCOMPUTERS


Limits of Liability and Disclaimer of Warranty The authors and publishers of this book have used their best efforts in preparing this book and the programs contained within it. These efforts include the development, research and testing of the theories and programs to determine their effectiveness. The authors and publishers make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The authors and publishers shall not be liable in any event for incidental and consequential damages in connection with, or arising from, the furnishing, performance or use of these programs.
Peter Rechenberg
University of Linz

Hanspeter Mössenböck
University of Zurich

Translated by John O'Meara and the authors
First published in English 1989 by
Prentice Hall International (UK) Ltd,
66 Wood Lane End, Hemel Hempstead,
Hertfordshire, HP2 4RG
A division of Simon & Schuster International Group

This book was originally published in German under the title Ein Compiler-Generator für Mikrocomputer by Peter Rechenberg and Hanspeter Mössenböck
© 1985 Carl Hanser Verlag, Munich and Vienna.
© 1989 Carl Hanser Verlag and Prentice Hall International (UK) Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form, or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission, in writing, from the publisher. For permission within the United States of America contact Prentice Hall Inc., Englewood Cliffs, NJ 07632.

Printed and bound in Great Britain by A. Wheaton & Co. Ltd, Exeter.

Library of Congress Cataloguing-in-Publication Data
Rechenberg, Peter
[Compiler-Generator für Mikrocomputer. English]
A compiler generator for microcomputers / Peter Rechenberg, Hanspeter Mössenböck.
p. cm.
Translation of: Ein Compiler-Generator für Mikrocomputer.
Bibliography: p.
Includes index.
ISBN 0-13-155060-8 : $40.00
1. Compilers (Computer programs) 2. Microcomputers - Programming.
I. Mössenböck, Hanspeter, 1959- . II. Title
QA76.76.C65R4313 1988   005.26 - dc19   88-28926

British Library Cataloguing in Publication Data
Rechenberg, Peter
A compiler generator for microcomputers.
1. Computer systems. Programming languages. Compilers. Design & construction
I. Title II. Mössenböck, Hanspeter III. Ein Compiler-Generator für Mikrocomputer. English
005.4'53

ISBN 0-13-155060-8
ISBN 0-13-155136-1 Pbk
Contents

Preface
Numbered definitions, algorithms, examples
Symbols

1 Introduction and survey
  1.1 Compilers and compiler compilers
  1.2 Static compiler structure
  1.3 Dynamic compiler structure
  1.4 The structure of the book

2 Syntax
  2.1 Basic concepts from formal language theory
  2.2 LL(1) grammars and syntax analysis
  2.3 The top-down graph
  2.4 The G-code
  2.5 Parsing with the G-code
  2.6 Error handling

3 Semantics
  3.1 Semantic actions
  3.2 Attributes
  3.3 Context conditions
  3.4 Attributed grammars
  3.5 L-attributed grammars
  3.6 Implementation of the semantic interface

4 Various compiler compilers
  4.1 YACC - yet another compiler compiler
  4.2 HLP84 - Helsinki language processor
  4.3 GAG - generator based on attribute grammars
  4.4 MUG - modular compiler generator
  4.5 Coco - compiler compiler
  4.6 Summary

5 The compiler description language Cocol
  5.1 Lexical structure
  5.2 Cocol as a syntax description language
    5.2.1 Productions
    5.2.2 Declarations
  5.3 Cocol as a semantic description language
    5.3.1 Semantic actions
    5.3.2 Attributes
    5.3.3 Context conditions
    5.3.4 Semantic declarations
    5.3.5 Scope of semantic objects

6 The compiler compiler Coco
  6.1 Characteristics
  6.2 Components of the generated compiler
  6.3 Operation of the generated compiler
  6.4 Interfaces of the generated compiler
    6.4.1 Caller interface
    6.4.2 Input interface
    6.4.3 Output interface
    6.4.4 Syntax error interface
  6.5 Generation of multi-pass compilers

7 The implementation
  7.1 Survey
  7.2 Structure of the symbol list
    7.2.1 Symbol list representation
    7.2.2 Symbol list construction
  7.3 Structure of the top-down graph
    7.3.1 Top-down graph representation
    7.3.2 Top-down graph construction
    7.3.3 Insertion of eps-nodes
    7.3.4 Removal of redundant eps-nodes
  7.4 Collecting the symbol sets
    7.4.1 Deletable nonterminals
    7.4.2 Terminal start symbols of nonterminals
    7.4.3 Terminal successors of nonterminals
    7.4.4 eps-sets
    7.4.5 any-sets
  7.5 Grammar tests
    7.5.1 Completeness
    7.5.2 Reachability
    7.5.3 Noncircularity
    7.5.4 Termination
    7.5.5 LL(1) condition
  7.6 Generation of the parser tables
    7.6.1 Table format
    7.6.2 Generation of the G-code
    7.6.3 Generation of the remaining tables
  7.7 Generation of the syntax analyzer
  7.8 Generation of the semantic evaluator
    7.8.1 The invariant parts of the semantic evaluator
    7.8.2 Processing of the semantic declarations
    7.8.3 Processing of the semantic actions
    7.8.4 Attribute processing

8 Applications
  8.1 Applications in compiler construction
    8.1.1 Specification of a lexical analyzer
    8.1.2 Description of a lexical analyzer for Modula-2
    8.1.3 Semantic procedures for lexical analysis
  8.2 Applications in software engineering
    8.2.1 Attributed grammars as a software design method
    8.2.2 The telegram problem as an example
    8.2.3 Attributed grammars as documentation
    8.2.4 The Jackson method as a special case
  8.3 Results of a Coco run
    8.3.1 The generated syntax analyzer
    8.3.2 The generated semantic evaluator
    8.3.3 The generated parser tables

9 Experiences with Coco
  9.1 A basis for measurements
  9.2 Measurements on Coco
  9.3 Measurements on some generated compilers
  9.4 General experiences

Appendices
  A Definition of Adele
  B Modula-2 and Pascal
  C Syntax of Cocol
  D G-code
  E Intermodular cross-reference list
  F Program listings

Bibliography
Index
Preface

This book describes the structure of the compiler compiler Coco, which was developed for microcomputers by the authors. It also deals with the techniques used by Coco and those by which Coco was developed. Special attention is given to table-driven top-down syntax analysis with automatic error recovery and to the description of semantics using L-attributed grammars. Coco is written in Modula-2 and generates compilers in Modula-2. It is hoped that this will show how well Modula-2 is suited to the implementation and documentation of large modular programs.

Compiler compilers, as we understand them, are not the field of a few specialists in compiler construction, but rather are tools for managing various tasks in software engineering, a fact which is not generally known. The methodology of attributed grammars which lies at the foundation of compiler compilers includes, for example, the Jackson method as a simple special case, and can be applied wherever the program flow is primarily controlled by one structured input data stream.

Thus this book has something to offer for a wide circle of readers:

1. It is a presentation of the principles of compiler construction, as far as they concern the analysis part of compilers, especially LL(1) syntax analysis with attributed grammars. (Lexical analysis is covered only marginally.)
2. It is a detailed description of a compiler compiler.
3. It illustrates the application of a compiler compiler by numerous examples.
4. It illustrates the application of software documentation methods to a large program system, especially the method of stepwise refinement and the use of an algorithm description language.
5. It can be used to evaluate the suitability of Modula-2 for software engineering because it presents a large program in Modula-2 which exploits the special properties of modular programming.

We consider the primary circle of readers to be advanced computer science students, theoretically and practically active computer scientists and software engineers. We therefore presuppose the usual terminology, and assume that the reader is acquainted with the development of software and that he can read Pascal, or even better Modula-2, or some similar language. Accordingly, we have kept the discussion brief, but have also taken pains not to rely on special knowledge presented elsewhere, so that the book is understandable in itself.

The focal point around which the entire book evolved is the complete Modula-2 code of Coco in Appendix F. We consider the publishing of such a large program system a gamble because we are not sure whether the reader will be interested in the numerous details in it, and because we expose ourselves to all sorts of criticism of our programming style and choice of algorithms. But at the same time we hope that it is just this completeness which makes the book valuable and distinct from others.

For information concerning the structure of the book the reader is referred to Section 1.4.

The Austrian Foundation for the Advancement of Scientific Research financially supported the development of the compiler compiler and thereby rendered it possible, for which we wish to express our appreciation. For the careful review of the manuscript and for helpful suggestions we wish to thank our colleagues and friends Prof. G. Pomberger, Dr G. Blaschek and F. Ritzinger; for proofreading the English translation we wish to thank D. Raye; for the review of the examples in Chapter 4 we wish to thank Prof. H. Ganzinger, Prof. U. Kastens, Dr K. Koskimies and Prof. R. Marty.

The text was produced by ourselves with the text processor WriteNow on a Macintosh computer.

Linz, August 1988
P. Rechenberg
H. Mössenböck
Numbered definitions, algorithms, examples

1.1   Definition  Compiler
1.2   Definition  Compiler compiler
1.3               Versions of Coco
1.4   Example     Lexical analysis
1.5   Example     Syntax tree

2.1   Definition  Abbreviations for strings and sets of strings
2.2   Definition  Grammar
2.3   Definition  Derivation, sentential form, sentence, language
2.4   Example     Derivation of all sentential forms of a language
2.5   Definition  Left-canonical derivation
2.6   Definition  Phrase
2.7   Definition  Simple phrase, handle
2.8   Example     Phrase, simple phrase, handle
2.9   Definition  Recursive grammar
2.10  Example     Arithmetic expressions
2.11  Definition  Terminating symbol, derivable symbol
2.12  Definition  Useless symbol
2.13  Definition  Reduced grammar
2.14  Definition  LL(k) grammar
2.15  Definition  Terminal start symbols of a nonterminal
2.16  Definition  Terminal start symbols of a string
2.17              LL(1) conditions for ε-free grammars
2.18  Example     LL(1) conditions
2.19  Definition  Terminal successors
2.20              LL(1) conditions for arbitrary grammars
2.21  Example     LL(1) conditions
2.22  Example     Dangling else
2.23  Definition  Deletability
2.24  Algorithm   Marking deletable symbols
2.25  Algorithm   Calculation of the sets of terminal start symbols
2.26  Algorithm   Calculation of successor sets
2.27  Algorithm   LL(1) analysis (recursive)
2.28  Example     Recursive LL(1) parsing
2.29  Algorithm   LL(1) parsing (nonrecursive)
2.30  Example     Nonrecursive LL(1) parsing
2.31  Definition  Terminal start symbols of length k
2.32  Definition  LL(k) grammar
2.33              LL(k) condition
2.34  Example     LL(2) and LL(3) test
2.35  Example     Basic structure of the top-down graph
2.36  Definition  Complement symbol any
2.37  Example     Equivalent top-down graphs
2.38  Definition  Alternative chain
2.39  Example     Alternative chains
2.40  Definition  Match
2.41  Definition  LL(1) conditions for top-down graphs
2.42  Definition  G-code (incomplete)
2.43  Algorithm   Parse (simplified)
2.44  Algorithm   Parse (complete)
2.45  Example     Error situation
2.46              Principle of error handling
2.47  Algorithm   Error (basic structure)
2.48  Algorithm   Triple
2.49  Algorithm   Fill
2.50  Algorithm   FillSucc
2.51  Algorithm   Error (with heuristic enhancements)

3.1   Example     Semantic actions
3.2   Example     Semantic actions
3.3   Example     Interpretation of arithmetic expressions
3.4   Example     Interpretation of arithmetic expressions in EBNF
3.5   Example     Inherited attributes
3.6   Example     A context-sensitive language
3.7   Example     Context condition
3.8   Example     Context condition
3.9   Definition  Attributed grammar
3.10  Example     Variable declaration
3.11  Definition  L-attributed grammar
3.12              Parser with semantic interface
3.13  Example     Attribute passing
3.14  Definition  G-code (remainder)
3.15              Principle of attribute saving for recursive symbols

4.1   Example     Attributed grammar as input for YACC
4.2   Example     Attributed grammar as input for HLP84
4.3   Example     Attributed grammar as input for GAG
4.4   Example     Attributed grammar as input for MUG
4.5   Example     Attributed grammar as input for Coco

5.1   Example     Cocol grammar for real constants
5.2   Example     The use of eps
5.3   Example     The use of any
5.4   Example     How the compiler treats LL(1) conflicts
5.5   Example     Terminal declarations
5.6   Example     Pragma declarations
5.7   Example     Nonterminal declarations
5.8   Example     Semantic actions
5.9   Example     Indication of dataflow at parameters
5.10  Example     Semantic macros
5.11  Example     Semantic actions for pragmas
5.12  Example     Attributes
5.13  Example     Context conditions
5.14  Example     Declarations of semantic objects
5.15  Example     Stacking of semantic objects

6.1   Example     Application of any

8.1   Example     LL(1) conflicts in lexical structures
Symbols

a^n          The string of n identical symbols a
a^+          The set {a^n : n ≥ 1}
a^*          The set {a^n : n ≥ 0}
G            Grammar
O            Order (asymptotic time complexity)
S            Sentence symbol
V            Alphabet
V_T          Alphabet of terminals
V_N          Alphabet of nonterminals
V^+          The set of all non-empty strings built from symbols of V
V^*          The set of all strings built from symbols of V, including the empty string
ε            The empty string
∅            The empty set
∈            'Element of'
∩            Intersection of two sets
∪            Union of two sets
→            Replacement symbol: 'is defined as'
|            Separates alternatives
[ ]          Option notation (encloses optional symbols and strings)
{ }          Set notation
{ }          Repetition notation
⇒            Direct derivation: 'produces directly'
⇒+           Derivation: 'produces'
⇒*           Derivation: 'produces or is equal to'
⇒L           Left-canonical derivation
⇏*           'Does not produce and is not equal to'
↓ ↑ ↕        Input, output, transient parameters
α, β, φ, σ   Strings
ω            String to be analyzed
1 Introduction and survey

The older of the two authors distinctly remembers that he first heard the term 'compiler compiler' at the IFIP Congress in Munich in 1962, in connection with Atlas, the supercomputer of its time, built by the English company Ferranti. It was a dark, secretive term. Since compiler writing was still an art mastered by only a few initiates, one could only touch one's cap humbly to people who were involved in writing compilers which generated compilers. There was just no way to understand them.

The two works which focused attention on compiler generating programs and which eliminated much of the mystery from the concept were the anthology by Rosen [1967] and the survey article Translator Writing Systems by Feldman and Gries [1968]. But it was the clear formulation of the two most important classes of deterministic grammars, LR(k) grammars by Knuth [1965] and LL(k) grammars by Lewis and Stearns [1968], that helped compiler generators achieve the actual breakthrough.

Today, the terms 'compiler generator', 'compiler generating program' and 'compiler compiler' are used synonymously and refer to a system which in some way supports and partially automates the production of compilers.

In the first chapter we introduce the concepts of 'compiler' and 'compiler compiler', survey the subtasks which a compiler must handle and discuss the organization of the book. The reader who is acquainted with the terminology of compiler construction, even only partially, can start immediately with Section 1.4.
1.1 Compilers and compiler compilers

With the exception of special cases, a program can be seen as the description of a process (algorithm) which transforms input data into output data (Fig. 1.1).

[Fig. 1.1 Program: input data -> program P -> output data]

If the input data themselves form a program, and the program P transforms them into another language, P is called a compiler, the input data are called the source program and the output data are called the target program (Fig. 1.2).

[Fig. 1.2 Compiler: source program S -> compiler C -> target program T]

Here, the source language is almost inevitably the higher, less machine-oriented language, and the target language the lower, more machine-oriented language, often the machine language itself. Thus a compiler can be defined as in Waite and Goos [1984]:

1.1 Definition Compiler
A compiler is a program which transforms an algorithm from a language acceptable to humans into a language acceptable to machines.

Because a compiler is a complex program which itself must be written in a programming language, the question arose quite early as to whether, given an abstract description of the source language and its transformation into a target language, a compiler could be generated either completely or partially.

A program CC which is to solve such a task reads the description of the source language S together with its transformation into a target language T as input data. It transforms this description into a program C which, when it is later executed, transforms source programs written in S into the target language T. Thus CC generates a compiler C, and is known as a compiler generator or compiler compiler (Fig. 1.3).
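The two-stage relationship between CC, C, S and T can be made concrete with a very small sketch. It is only an illustration under strong assumptions: the "compiler description" is a hypothetical word-for-word translation table, and the names `compiler_compiler`, `description` and the instruction mnemonics are invented for this example and do not come from Coco.

```python
from typing import Callable, Dict

# Hypothetical toy "compiler description": a word-for-word mapping from
# source-language symbols to target-language instructions.
Description = Dict[str, str]
Compiler = Callable[[str], str]


def compiler_compiler(description: Description) -> Compiler:
    """CC: read a compiler description and return a compiler C."""
    table = dict(description)  # fixed once, at generation time

    def compiler(source_program: str) -> str:
        """C: transform a source program in S into a target program in T."""
        target = []
        for symbol in source_program.split():
            if symbol not in table:
                raise ValueError(f"unknown source symbol: {symbol!r}")
            target.append(table[symbol])
        return " ".join(target)

    return compiler


# Generation time: CC is run once on the description and yields C.
description = {"push1": "LOAD 1", "push2": "LOAD 2", "add": "ADD"}
c = compiler_compiler(description)

# Compile time: C is run on each source program.
target_program = c("push1 push2 add")
```

The point of the sketch is the staging: the description is consumed when the compiler is generated, whereas source programs are consumed only later, when the generated compiler runs.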
[Fig. 1.3 Compiler compiler: compiler description in compiler description language CDL -> compiler compiler CC -> compiler in compiler implementation language CIL]

This leads to the following definition.

1.2 Definition Compiler compiler
A compiler compiler is a program which generates a compiler, or major parts thereof, from the complete or partial description of the compiler.

A compiler compiler and the compiler it generates can be represented as in Fig. 1.4.

[Fig. 1.4 Compiler compiler and the generated compiler: the compiler compiler CC reads the compiler description in CDL and produces the compiler C in CIL; C reads the source program S and produces the target program T]

A compiler compiler and its compiler description language are very closely related. For the user of a compiler compiler the compiler description language is actually the only interesting feature because it determines whether the description of the compiler to be generated can be formulated and how conveniently this may be accomplished.

Compiler description languages have two primary tasks: (1) the description of the syntax of the source language of the compiler to be generated and (2) the description of the transformation of the source program into the target program. Because the meaning of the source program is visible in this transformation, the description of the transformation is also known as a semantical description.

There are basically two notations for syntax description: Backus-Naur form (BNF) and Extended Backus-Naur form (EBNF). Both describe the syntax as a grammar in the form of so-called productions. They constitute well-understood formal systems and are based on the theory of formal languages.

The technique of describing semantics is less consolidated. Aside from ad hoc methods, attributed grammars in a wide variety of forms are usually applied here.

The compiler compiler described in this book is named Coco (a not very imaginative abbreviation of 'compiler compiler') and its compiler description language is called Cocol (compiler compiler language). Cocol uses the EBNF of Wirth [1982] for syntax description and a special form of attributed grammars, the so-called L-attributed grammars, for semantical descriptions.

Coco was originally implemented in PLM80 and generated a compiler in Pascal-86. The version described here is written in Modula-2 and generates compilers in Modula-2. Table 1.3 shows the versions of Coco that are available for several popular compilers at the time of writing. They differ in the languages of the generated compilers (Modula-2 or Pascal) and in the machines on which they run.

1.3 Versions of Coco

Computer            Modula-2           Pascal
Macintosh           Mac-METH           Turbo-Pascal
MS-DOS computers    Logitech V. 3.0    Turbo-Pascal V. 4.0
                    M2-SDS
                    Taylor-Modula
ATARI-ST            TDI-Modula
IBM/370             Modula/370

1.2 Static compiler structure

Like the translation of a sentence in a natural language Q into another natural language Z, the transformation of a source program into a target program can be roughly divided into two phases. First the sentence in Q must be 'understood' through grammatical analysis. With knowledge of its grammatical structure and the aid of a dictionary it is then possible to construct the sentence in Z with the same meaning. In a similar way, the translation of a program consists of analysis and synthesis.

In the analysis phase the source program is decomposed into its constituent parts. Here one distinguishes:
1. lexical analysis, which transforms the input character stream into 'symbols' such as names, numbers and operators;
2. syntax analysis, which analyzes the grammatical structure of the program;
3. semantic analysis, which analyzes all the properties of the program which are not of a syntactical nature.

Analysis yields:

1. the determination of the correctness of the program;
2. the internal representation of the source program in a form which is particularly well adapted for synthesis (so-called intermediate language);
3. memory tables which are used for further processing of the intermediate language.

In the synthesis phase the target program is generated from the program in the intermediate language. Here one distinguishes:

1. optimization, which transforms the program in the intermediate language to improve the target program with respect to certain criteria;
2. code generation, which generates the target program from the optimized intermediate language.

This static, or logical, compiler structure is shown in Fig. 1.5.

[Fig. 1.5 Static compiler structure: source program -> lexical analysis -> syntax analysis -> semantic analysis (analysis, compiler front end) -> optimization -> code generation (synthesis, compiler back end) -> target program]
The analysis sections are determined by the source language and the intermediate language; the synthesis sections are determined by the intermediate language and the target language. The analysis sections are known as the compiler front end; the synthesis sections are known as the compiler back end. The compiler front end is independent of the target language; the compiler back end is independent of the source language.

Compiler compilers primarily support the analysis phase, and therefore this book only deals with the analysis phase.

Lexical analysis

Lexical analysis preprocesses the source program text in order to simplify the tasks of the later phases. This preprocessing includes the following points:

1. Elimination of meaningless characters. Comments, empty lines and unnecessary spaces are eliminated.
2. Recognition of symbols. One or more characters in sequence which together constitute a symbol are recognized. Symbols are:
   (a) keywords such as IF, WHILE, END, etc.;
   (b) names for constants, types, variables, procedures, etc.;
   (c) literals (numerical constants) such as 3.14;
   (d) strings, usually enclosed in inverted commas, such as 'This is a string';
   (e) compound characters such as ':=', '<=', '..', etc.;
   (f) individual characters such as '(', '+', etc.
3. Arithmetization of symbols. Because numbers can be processed more easily than strings, keywords, names and strings are replaced by numbers, and literals are converted to the internal numerical representation of the machine. This process is known as arithmetization. Names are stored in a name list, strings in a string list, and literals, possibly, in a constant list.
1.4 Example Lexical analysis
The source statement

   x := 3 + base * factor;

contains the names x, base and factor, the numerical value 3, the character combination ':=' and the individual characters '+', '*' and ';'. If ident, becomessy, number, plussy, timessy and semicolonsy are names for the arithmetized symbols, lexical analysis yields the sequence of 8 symbols:

   ident becomessy number plussy ident timessy ident semicolonsy
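The symbol sequence of Example 1.4 can be reproduced with a few lines of Python. This is only a sketch: the symbol names are taken from the example, whereas the regular expressions, the function name `scan` and the restriction to this handful of symbols are assumptions made for illustration. It is not the lexical analyzer described in Section 8.1.

```python
import re

# Symbol names follow Example 1.4; the patterns are assumptions for this sketch.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("ident", r"[A-Za-z][A-Za-z0-9]*"),
    ("becomessy", r":="),
    ("plussy", r"\+"),
    ("timessy", r"\*"),
    ("semicolonsy", r";"),
    ("skip", r"\s+"),  # meaningless characters are recognized but dropped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Return the list of symbol names for the given source text."""
    symbols = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character at position {pos}: {text[pos]!r}")
        if match.lastgroup != "skip":
            symbols.append(match.lastgroup)  # name of the alternative that matched
        pos = match.end()
    return symbols


symbols = scan("x := 3 + base * factor;")
print(" ".join(symbols))
```

Running the sketch prints the same eight symbol names as the example; spaces are consumed by the `skip` pattern and never reach the result.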
Some of these symbols are uniquely determined (e.g. plussy); others such as ident and number refer to a class of symbols and must be made unique by a semantic value (e.g. an index in the name list for names, the converted numerical value for literals). If x, base and factor are stored respectively in places 1, 2 and 3 in the name list, lexical analysis yields the following symbols with their semantic values:

   ident/1 becomessy number/3 plussy ident/2 timessy ident/3 semicolonsy

Lexical analysis is the simplest part of the compiler. However, it does take up a large portion of the compilation time (typically 20 to 40%), which means that efficiency is especially important.

A lexical analyzer written in Cocol is described in Section 8.1. Lexical analyzers are not discussed anywhere else in the book, and the reader is referred to the literature, for example Gries [1971] or Bauer [1976].

Syntax analysis

Syntax analysis decomposes the source program, which now consists of symbols, into its grammatical parts and represents its structure as a tree (called a syntax tree) or as something equivalent to a tree.

1.5 Example Syntax tree
The source statement in Example 1.4 is an assignment. An assignment consists of a variable, the assignment symbol, an expression and a closing semicolon. An expression consists of terms connected by addition operators, and terms consist of factors connected by multiplication operators. This yields the syntax tree in Fig. 1.6.

[Fig. 1.6 Syntax tree]
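How a syntax tree arises from the symbols with their semantic values can be sketched with a small hand-written recursive parser. The grammar is the one stated in words in Example 1.5; everything else is an assumption for illustration: the class name `Parser`, the representation of tree nodes as nested tuples, and the choice of a recursive procedure per grammar rule. It is not the table-driven parser that Coco generates.

```python
# Symbols with semantic values as given in the text: (symbol, value).
TOKENS = [("ident", 1), ("becomessy", None), ("number", 3), ("plussy", None),
          ("ident", 2), ("timessy", None), ("ident", 3), ("semicolonsy", None)]


class Parser:
    """One procedure per grammar rule; tree nodes are nested tuples."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "eof"

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def assignment(self):
        # Assignment = Variable ":=" Expression ";"
        variable = self.expect("ident")
        self.expect("becomessy")
        expression = self.expression()
        self.expect("semicolonsy")
        return ("Assignment", ("Variable", variable), expression)

    def expression(self):
        # Expression = Term {"+" Term}
        terms = [self.term()]
        while self.peek() == "plussy":
            self.expect("plussy")
            terms.append(self.term())
        return ("Expression", *terms)

    def term(self):
        # Term = Factor {"*" Factor}
        factors = [self.factor()]
        while self.peek() == "timessy":
            self.expect("timessy")
            factors.append(self.factor())
        return ("Term", *factors)

    def factor(self):
        # Factor = ident | number
        if self.peek() in ("ident", "number"):
            return ("Factor", self.expect(self.peek()))
        raise SyntaxError(f"factor expected, found {self.peek()}")


tree = Parser(TOKENS).assignment()
```

In the resulting tree the expression has two terms: the first contains the single factor number/3, the second the factors ident/2 and ident/3, which reflects the usual precedence of multiplication over addition.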
Syntax analysis is much more difficult than lexical analysis. There are, however, methods for syntax analysis which are based on the grammar of the source language. Knowledge of these methods makes syntax analysis a routine task.

Semantic analysis

Semantic analysis examines the properties of the source program which cannot be represented grammatically, in particular:

1. the scope of names;
2. the correspondence between declarations and uses of names;
3. the type compatibility of operands in expressions and statements.

Semantic analysis and syntax analysis can be performed together, in which case the two phases merge; or they can be performed separately, in which case the syntax tree, the result of the syntax analysis, is augmented with semantic information.

1.3 Dynamic compiler structure

Dynamic, or time-dependent, compiler structure must be distinguished from static, or logical, compiler structure. The individual logical divisions - lexical analysis, syntax analysis, semantic analysis, optimization and code generation - can be executed either sequentially or simultaneously, that is, interwoven in time. Each part of the compiler which reads the source program or an intermediate program in its entirety is called a pass, and thus compilers are classified as single-pass or multi-pass compilers. Figure 1.7 shows both cases.

For a single-pass compiler the syntax analyzer is the central, controlling program. It calls the lexical analyzer when it requires the next source symbol, and it calls the semantic analyzer when it wishes to pass on a syntactically correct construction. The semantic analyzer generates a section of intermediate code or the corresponding machine code (with or without optimization).

For a multi-pass compiler each section is executed sequentially. The result of each section is an intermediate program which is written onto an external storage device and is read again by the next pass.
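The control flow of a single-pass compiler can be illustrated by a small sketch in which the syntax analyzer is the controlling routine. The source language (sums of numbers separated by blanks), the stack-machine instructions PUSH and ADD, and all function and class names are assumptions chosen for this illustration only.

```python
def lexical_analyzer(text):
    """Deliver one symbol at a time, only when the syntax analyzer asks for it."""
    for word in text.split():
        if word.isdigit():
            yield ("number", int(word))
        else:
            yield ("op", word)
    yield ("eof", None)


class SemanticAnalyzer:
    """Receives recognized constructions and emits target code immediately."""

    def __init__(self):
        self.code = []

    def operand(self, value):
        self.code.append(f"PUSH {value}")

    def add(self):
        self.code.append("ADD")


def syntax_analyzer(text):
    """Central, controlling program for the grammar: Sum = number {"+" number}."""
    lexer = lexical_analyzer(text)
    semantics = SemanticAnalyzer()

    kind, value = next(lexer)            # call the lexical analyzer
    if kind != "number":
        raise SyntaxError("number expected")
    semantics.operand(value)             # call the semantic analyzer

    kind, value = next(lexer)
    while (kind, value) == ("op", "+"):
        kind, value = next(lexer)
        if kind != "number":
            raise SyntaxError("number expected")
        semantics.operand(value)
        semantics.add()
        kind, value = next(lexer)

    if kind != "eof":
        raise SyntaxError("end of input expected")
    return semantics.code


code = syntax_analyzer("3 + 4 + 5")
```

No intermediate program is ever stored: each symbol is requested, checked and translated before the next one is read, which is the interleaving in time that characterizes a single pass.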
Single-pass compilers are generally much faster than multi-pass compilers because they avoid access to external storage devices for reading and writing intermediate programs. Multi-pass compilers, on the other hand, require less storage space because only one part of the compiler need ever reside in main storage at once, and they are logically simpler because the various parts are not intertwined. Some source languages cannot even be compiled by single-pass compilers because they contain grammatical constructs whose translation requires information which becomes available only from parts of the source program that are processed later. This is the case, for example, when a variable can be used before it is declared.

[Fig. 1.7 Single-pass and multi-pass compilers. Single-pass: the syntax analyzer controls the lexical analyzer, which reads the source program and delivers symbols, and the semantic analyzer, which receives tree parts and produces the target program. Multi-pass: lexical analyzer, syntax analyzer, semantic analyzer, and optimization and code generation run one after the other and exchange intermediate-language programs through external memory.]

The advantages and disadvantages of single-pass and multi-pass compilers can be summarized as in Fig. 1.8.

                           Single-pass    Multi-pass
                           compiler       compiler
Speed                          +              -
Memory                         -              +
Logical complexity             -              +
Universal applicability        -              +

+ = favorable   - = unfavorable

Fig. 1.8 Properties of single-pass and multi-pass compilers
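The remark about variables that are used before they are declared can be made concrete with a toy check. The program representation (a list of `declare` and `use` entries) and both function names are hypothetical and serve only to contrast one pass with two.

```python
# A toy source program: x is used before its declaration, y is never declared.
PROGRAM = [("use", "x"), ("declare", "x"), ("use", "y")]


def single_pass_undeclared(program):
    """Check uses while reading the program once, from front to back."""
    declared, errors = set(), []
    for kind, name in program:
        if kind == "declare":
            declared.add(name)
        elif name not in declared:
            errors.append(name)  # declaration not seen yet
    return errors


def two_pass_undeclared(program):
    """Pass 1 collects all declarations; pass 2 checks the uses."""
    declared = {name for kind, name in program if kind == "declare"}
    return [name for kind, name in program
            if kind == "use" and name not in declared]


single = single_pass_undeclared(PROGRAM)
double = two_pass_undeclared(PROGRAM)
```

The single pass wrongly reports x, because the information it needs arrives later in the source text; the second pass of the two-pass version has the complete declaration table and reports only y.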
1.4 The structure of the book

This book consists of nine chapters and six appendices. The first three chapters cover the principles of compiler construction as far as they are required for an understanding of Coco; occasionally rather more than the minimum is presented in order to provide a well-rounded picture. The fourth chapter provides a glimpse into other compiler compilers, and the remaining chapters present Coco itself, its compiler description language, its implementation and applications. In view of this, an outline looks as follows:

Principles of compiler construction
   1. Introduction and survey
   2. Syntax
   3. Semantics
Various compiler compilers
   4. Various compiler compilers
The compiler compiler Coco
   5. The compiler description language Cocol
   6. The compiler compiler Coco
   7. The implementation
   8. Applications
   9. Experiences with Coco

The second chapter starts with those concepts from formal language theory which are necessary for the remainder of the book. Then table-driven LL(1) syntax analysis is covered; this determines the fundamental structure of this compiler compiler, and at the same time is a simple and efficient method for developing the syntactic section of compilers. Most importantly, this chapter contains a method for automatic error recovery which is independent of the language to be analyzed.

In the third chapter, the method applied in this compiler compiler for describing the actual translation process, using attributed grammars, is presented. The special case of L-attributed grammars is used here and the translation process is described by attributes, context conditions and semantic actions.

The fourth chapter gives a survey of a few compiler generators described in the literature, and thus also surveys the state of the art.

The fifth chapter is a definition of the compiler description language Cocol.
The sixth chapter describes Coco from the viewpoint of the user: its characteristics, how to use it and what the compilers it generates look like. Along the way it is shown that Coco is also suitable for implementing multi-pass compilers. This chapter, together with the language description of Chapter 5, forms the 'external' description of Coco.

The seventh chapter, the longest, contains the details of the implementation of Coco. This chapter is also intended as a study in program documentation.

The eighth chapter presents three major examples of the use of Coco. The first is a complete description of a lexical analyzer in Cocol. The second illustrates Coco as a software engineering tool and the method of attributed grammars as a software engineering method which encompasses the Jackson method as a special case. The third presents the compiler sections generated for a concrete input grammar.

In conclusion, the ninth chapter presents experiences of the authors with Coco.

The appendices define the algorithm description language Adele used here, describe Modula-2 inasmuch as it differs from or supersedes Pascal, and present a complete listing of Coco in Modula-2 and a description of Coco in Cocol, that is, a self-description of Coco.

Systematic readers should read the book chapter by chapter. Readers who wish to begin with lexical analysis should consult Section 8.1 as early as Chapter 2. Readers who wish to know about Coco only (or firstly) from the user's point of view can start immediately with Chapter 5 followed by Chapters 6 and 8, and perhaps Chapter 4. Finally, readers who are already familiar with LL(1) grammars and are primarily interested in the implementation of Coco can acquaint themselves first with Cocol in Chapter 5 and then concentrate on Chapters 6 and 7, although they will occasionally have to refer back to Chapters 2 and 3.
The following chapter sequences are therefore recommended (Chapters which extend the material are in italics):

    Novices and all-embracing readers:                     2-9
    Primarily interested in applying Coco:                 5, 6, 8, 4
    Primarily interested in comparing Coco:                4, 5, 6, 8
    Primarily interested in the implementation of Coco:    5, 6, 7, 8.3

Some remarks have been repeated so that the chapters do not become too interdependent. We hope the all-embracing reader will forgive us for this.

In general the presentation is organized according to the principle of stepwise refinement. This is true of the individual chapters as well as for the
book as a whole. Thus Chapters 2 and 3 are basically refinements of Section 1.2, Chapters 5 and 7 are refinements of Chapters 2 and 3, and Appendix F, containing the text of Coco in Modula-2, is a refinement of Chapter 7.

For representing algorithms our algorithm design language Adele is used. It is defined in Appendix A, but should be understandable without a definition as it relies strongly on Modula-2 and Pascal. The authors use Adele constantly in their daily work and view Adele as a method for algorithm description which is adequate in most cases.

Actual Modula-2 programs occur only in the appendices, but there are also Modula-2 fragments in Chapters 5 and 7. The book is therefore understandable for readers who are not familiar with Modula-2. In spite of this, Modula-2 is viewed as of major importance in this book because of the technique of modular programming, and especially because of data encapsulation. One of the book's important aims is to document a large Modula-2 program and to demonstrate in the process how well Modula-2 is suited to software engineering projects.

Definitions, algorithms and examples are numbered and indented. A collection of all numbers is to be found after the table of contents to facilitate fast searching.
2 Syntax

In this chapter we deal with all syntax-related questions as far as they concern compilers that use LL(1) syntax analysis. First, we will summarize the terminology and some important results of formal language theory. Next, we look at LL(1) grammars and their syntactical analysis. Since the flexibility and efficiency of syntax analysis depend to a large degree on the representation of the grammar in memory, we will describe the tree-like data structure used in Coco, which is called a top-down graph. We will also describe an optimized version of the top-down graph, called the G-code, which is especially suited for interpretation. At the end of the chapter we describe the G-code syntax analyzer and a method for automatic error handling.

Except for the G-code and its interpretation this chapter is not Coco-specific. Thus, it can be read as a general treatment of syntax issues in compiler design.

Bottom-up analysis and LR(k) grammars have been left out, since they constitute a large and self-contained topic that does not apply to Coco. Interested readers are referred to Knuth [1965], Aho and Johnson [1974], Waite and Goos [1984], and Fischer and LeBlanc [1988].

2.1 Basic concepts from formal language theory

We assume that the reader is familiar with the elements of formal languages, and we summarize here only the terms and definitions that we will use later on. We primarily use the terminology from the books of Gries [1971] and Aho and Ullman [1972].
Symbols and strings

Programs consist syntactically of sequences or strings of symbols which belong to an alphabet or vocabulary. If a, b, c are the symbols that constitute the alphabet V, then we can write:

    $V = \{a, b, c\}$

Symbols can be concatenated to form strings. For some strings and sets of strings there are commonly used abbreviations:

2.1 Definition  Abbreviations for strings and sets of strings

    $a^n$ denotes the string consisting of n identical symbols a, e.g. $a^3 = aaa$.

    $\varepsilon$ denotes the empty string, i.e. a string of zero symbols.

    $a^+$ denotes the set $\{a^n : n \ge 1\}$, e.g. $a^+ = \{a, aa, aaa, aaaa, \ldots\}$.

    $a^*$ denotes the set $\{a^n : n \ge 0\}$, e.g. $a^* = \{\varepsilon, a, aa, aaa, \ldots\}$. It is obvious that $a^* = a^+ \cup \{\varepsilon\}$.

    $V^+$ denotes the set of all non-empty strings which can be formed from the symbols contained in V. For example, if $V = \{a, b, c\}$ then
        $V^+ = \{a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, \ldots\}$
    $V^+$ is called the free semi-group over the alphabet V.

    $V^*$ denotes the set of all strings, including the empty string, that can be formed from the symbols of V. For example, if $V = \{a, b, c\}$ then
        $V^* = \{\varepsilon, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, \ldots\}$
    $V^*$ is called the free monoid over the alphabet V. It is obvious that $V^* = V^+ \cup \{\varepsilon\}$.

The set V is always finite whereas the sets $a^+$, $a^*$, $V^+$, $V^*$ are always infinite.

Grammar and language

In Section 1.2, we showed that the grammatical structure of an instruction, a program, or generally of a 'sentence' is a tree, the syntax tree. In the syntax tree, there are two types of symbols:

1. Terminals are the symbols of the sentence itself. They are the leaves of the tree and cannot be decomposed further.
2. Nonterminals are all other symbols.
In addition to the above, each tree contains a distinguished nonterminal, the sentence symbol, or the root, from which the entire tree originates.

The valid structures of syntax trees and hence the sentences of a formal language are described by a grammar. A context-free grammar or, simply, grammar (since we only use context-free grammars) is a system of rules for producing strings over an alphabet V.

2.2 Definition  Grammar

A grammar G is a quadruple $G = (V_N, V_T, R, S)$ with the following components:

    $V_N$: alphabet of nonterminals,
    $V_T$: alphabet of terminals,
    $R$: set of productions, also called syntax rules or derivation rules,
    $S$: sentence symbol, a special symbol from $V_N$, the root of the syntax tree.

By $V = V_N \cup V_T$ we denote the union of the terminal and nonterminal alphabets.

A production is written as

    $X \to \alpha$  where $X \in V_N$ and $\alpha \in V^*$

(read: 'X is defined as $\alpha$' or 'X can be replaced by $\alpha$'), which means that the nonterminal X can be replaced by the string $\alpha$ in each string that contains X.

Several productions may have the same left-hand side, such as:

    $X \to \alpha_1$
    $X \to \alpha_2$
    $X \to \alpha_3$

They denote alternatives and can be grouped by use of the symbol '|':

    $X \to \alpha_1 \mid \alpha_2 \mid \alpha_3$

(read: 'X is defined as $\alpha_1$ or $\alpha_2$ or $\alpha_3$').

The productions describe the replacement of nonterminals by strings. We start from the sentence symbol S, and replace it by a string according to the productions of the grammar. Then we repeatedly replace nonterminals in the string by other strings until we reach a string that contains only terminals.

S itself and all strings that result from S by the application of the productions are called sentential forms. The sentential forms that consist of terminals only are called sentences.
We denote replacement by the replacement or derivation symbol $\Rightarrow$. If $\alpha$ and $\beta$ are two sentential forms and $\beta$ may be derived from $\alpha$ by the application of a production, we write:

    $\alpha \Rightarrow \beta$

(read: '$\alpha$ produces $\beta$' or '$\beta$ is derived from $\alpha$'). These terms are formalized by Definition 2.3 and are illustrated by Example 2.4.

2.3 Definition  Derivation, sentential form, sentence, language

We say that a string $\alpha$ directly produces a string $\beta$, written $\alpha \Rightarrow \beta$, if there exist strings $\omega_1$ and $\omega_2$ such that we can write

    $\alpha = \omega_1 A \omega_2$,  $\beta = \omega_1 \varphi \omega_2$

and the production $A \to \varphi$ belongs to the grammar. We then call $\beta$ a direct derivation of $\alpha$.

We describe a sequence of several derivations by the symbols $\Rightarrow^+$ and $\Rightarrow^*$. A string $\alpha$ produces a string $\beta$, written as

    $\alpha \Rightarrow^+ \beta$

if there exists a sequence of direct derivations

    $\alpha = \omega_0 \Rightarrow \omega_1 \Rightarrow \omega_2 \Rightarrow \ldots \Rightarrow \omega_n = \beta$  where $n \ge 1$

Such a sequence is called a derivation of length n. For the case of $\alpha \Rightarrow^+ \beta$ or $\alpha = \beta$, we write

    $\alpha \Rightarrow^* \beta$

(read: '$\alpha$ produces or is equal to $\beta$').

If G is a grammar with sentence symbol S, then a string $\alpha$ is called a sentential form if

    $S \Rightarrow^* \alpha$

A sentence is a sentential form that consists only of terminals, and a language L(G) is the set of all sentences that can be derived from the sentence symbol:

    $L(G) = \{\alpha : S \Rightarrow^+ \alpha \ \&\ \alpha \in V_T^*\}$

2.4 Example  Derivation of all sentential forms of a language

Consider the grammar

    G = ({S, A, B}, {a, b, ;}, R, S)

with the nonterminals S, A, B, the terminals a, b, ;, and the set R of productions:
    S -> A;
    A -> aB | BBb
    B -> b | ab

From this, the following derivations of sentential forms can be produced:

    S => A; => aB;  => ab;
                    => aab;
            => BBb; => bBb;  => bbb;
                             => babb;
                    => abBb; => abbb;
                             => ababb;

The result is L(G) = {ab;  aab;  bbb;  babb;  abbb;  ababb;}. Hence, the language L(G) consists of 6 sentences.

A syntax tree can be assigned to each sentence. Figure 2.1 shows the syntax tree of abbb; in two forms.

    [Fig. 2.1 Two forms of syntax tree for abbb;]

In the top-down syntax analysis discussed later on, we will always use derivations in which the leftmost nonterminal is replaced. This kind of derivation is called left-canonical:

2.5 Definition  Left-canonical derivation

A direct derivation $\omega_1 A \omega_2 \Rightarrow \omega_1 \alpha \omega_2$ is called left-canonical if $\omega_1 \in V_T^*$, that is, if A is the leftmost nonterminal. A derivation is called left-canonical if all its direct derivations are left-canonical.

Sometimes it is useful to have a name for the string that is substituted for a nonterminal during a derivation. This string is called a 'phrase'.
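The derivations of Example 2.4 and the left-canonical rule of Definition 2.5 can be reproduced mechanically. The following Python sketch is only an illustration and not part of Coco: the names PRODUCTIONS, leftmost_step and language are invented here, every symbol is assumed to be a single character, and the keys of the dictionary are taken to be the nonterminals. Because it enumerates every derivation, it terminates only for grammars without recursion.

```python
from collections import deque

# Grammar of Example 2.4 (hypothetical encoding: one character per symbol).
PRODUCTIONS = {
    "S": ["A;"],
    "A": ["aB", "BBb"],
    "B": ["b", "ab"],
}


def leftmost_step(form):
    """Return all forms reachable from `form` by one left-canonical step."""
    for i, symbol in enumerate(form):
        if symbol in PRODUCTIONS:  # leftmost nonterminal found
            return [form[:i] + alt + form[i + 1:] for alt in PRODUCTIONS[symbol]]
    return []  # no nonterminal left: `form` is a sentence


def language(start="S"):
    """Enumerate L(G) breadth-first, starting from the sentence symbol."""
    sentences, queue = [], deque([start])
    while queue:
        form = queue.popleft()
        successors = leftmost_step(form)
        if successors:
            queue.extend(successors)
        else:
            sentences.append(form)
    return sentences


print(sorted(language()))
```

Replacing only the leftmost nonterminal loses no sentences, since every sentence of a context-free grammar has a left-canonical derivation.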
2.6 Definition  Phrase

When $\omega_1 \alpha \omega_2$ is a sentential form such that

    $S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow^* \omega_1 \alpha \omega_2$,

then $\alpha$ is called a phrase, more specifically an A-phrase.

According to this definition, each sentential form is an S-phrase. Because of their importance in bottom-up syntax analysis, which is not covered in this book, we shall also define the terms simple phrase and handle.

2.7 Definition  Simple phrase, handle

If $\alpha$ is an A-phrase and a direct derivation of A, then

    $S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow \omega_1 \alpha \omega_2$

holds and $\alpha$ is also called a simple A-phrase. The leftmost simple phrase in a sentential form is called the handle of the sentential form.

2.8 Example  Phrase, simple phrase, handle

Consider Example 2.4 and the derivation sequence

    S => A; => B1B2b3; => B1b2b3; => ab1b2b3;

where the different Bs and bs have been distinguished by an index.

In the sentential form ab1b2b3;
    ab1 is a simple B-phrase and the handle,
    b2 is a simple B-phrase,
    ab1b2b3 is an A-phrase,
    ab1b2b3; is an S-phrase.

In the sentential form B1b2b3;
    b2 is a simple B-phrase and the handle,
    B1b2b3 is an A-phrase,
    B1b2b3; is an S-phrase.

In the sentential form B1B2b3;
    B1B2b3 is a simple A-phrase and the handle,
    B1B2b3; is an S-phrase.

In the sentential form A;
    A; is a simple S-phrase and the handle.
Recursive productions produce languages with an infinite number of sentences. The production A -> a | Ab produces the set of sentences $ab^*$. The production A -> a | bA produces the set of sentences $b^*a$, and the production A -> a | (A) produces the set of sentences $\{(^n\, a\, )^n : n \ge 0\}$. Recursion can also appear indirectly, which means it can span several productions, as in the production pair

    A -> x | By
    B -> z | Au

The following definition is a consequence of this:

2.9 Definition  Recursive grammar

A grammar is called recursive if it permits derivations of the form

    $A \Rightarrow^+ \omega_1 A \omega_2$  with $A \in V_N$, $\omega_1 \in V^*$, $\omega_2 \in V^*$.

More specifically, it is called

    Left-recursive if $A \Rightarrow^+ A \omega$
    Right-recursive if $A \Rightarrow^+ \omega A$
    Central-recursive or self-embedding if $A \Rightarrow^+ \omega_1 A \omega_2$

2.10 Example  Arithmetic expressions

The grammar of arithmetic expressions with the sentence symbol E and the terminals v for variables and c for constants:

    E -> T | +T | -T | E + T | E - T
    T -> F | T * F | T / F
    F -> v | c | ( E )

is left-recursive in E and T, and central-recursive in E, T, and F.

The extended Backus-Naur form (EBNF)

Computer science uses various notations for grammar productions. The previously used notation has the following characteristics:

1. terminals are lower case
2. nonterminals are upper case
3. replacement symbol is ->
4. separation of alternatives is denoted by |

Indefinite repetition, which is a frequently occurring language element, must be described by recursive productions, especially left-recursive productions. In many cases this appears unnatural, and it is also unsuited for top-down syntax analysis. Several grammatical notations have therefore evolved that
remove these and other deficiencies. Among these, the notation introduced by Wirth [1982] for the description of Modula-2 is especially notable because of its simplicity and clarity. Its characteristics are:

1. Terminals that represent themselves (literals) are in quotes
2. Other terminals and nonterminals have names that imply their meaning (this is customary but not mandatory)
3. Replacement symbol is =
4. Separation of alternatives is denoted by |
5. Productions are ended explicitly by a period
6. Option symbol: [A] means A | ε
7. Repetition symbol: {A} means ε | A | AA | AAA | ...
8. Parentheses for grouping

The grammar of the arithmetic expressions is as follows:

    expression = ["+"|"-"] term {("+"|"-") term}.
    term       = factor {("*"|"/") factor}.
    factor     = c | v | "(" expression ")".

The form of the EBNF grammar itself can also be described by an EBNF grammar:

    EBNFGrammar = production {production} ".".
    production  = symbol "=" expr.
    expr        = term {"|" term}.
    term        = factor {factor}.
    factor      = ident | string | "(" expr ")" | "[" expr "]" | "{" expr "}".

ident is the terminal for names; string is the terminal for a character string enclosed in quotes.

In this book, we will primarily use Wirth's EBNF notation. However, where structural simplicity of the grammar is important, we will still use the older notation of the formal languages.

Reduced grammars

In the grammar of a programming language, each nonterminal and each alternative should contribute to the generation of sentences. If this is the case, the grammar is called reduced. In the development of a grammar, unnecessary nonterminals and alternatives may creep in. Therefore, each newly developed grammar should be checked to determine if it is reduced. If it is not, the unnecessary symbols and productions should be removed.

In order to contribute to the generation of sentences, each nonterminal must meet the following two conditions: It must be 'terminating', that is, it
must produce a terminal string, and it must be 'derivable', that is, it must appear in some sentential form.

2.11 Definition  Terminating symbol, derivable symbol

A nonterminal A is called terminating if it produces a terminal string, that is, if

    $A \Rightarrow^+ \alpha$  with $\alpha \in V_T^*$.

A nonterminal A is called derivable if it appears in a sentential form, that is, if

    $S \Rightarrow^* \omega_1 A \omega_2$.

A nonterminal that is not derivable or not terminating contributes nothing to the generation of sentences, and is therefore useless.

2.12 Definition  Useless symbol

A nonterminal A is called useless if there is no derivation

    $S \Rightarrow^* \omega_1 A \omega_2 \Rightarrow^* \omega_1 \alpha \omega_2$  where $\omega_1, \omega_2, \alpha \in V_T^*$

2.13 Definition  Reduced grammar

A grammar that contains no useless nonterminals is called reduced.

Algorithms for the detection of all useless symbols are simple (see Sections 7.5.2 and 7.5.4, or Hopcroft and Ullman [1979]). If one wants to delete them, the order is important. First, the nonterminating symbols must be found and all alternatives in which they appear must be deleted from the grammar. Then, the nonderivable symbols and the alternatives in which they appear must be found in the new grammar and deleted. Automatic deletion is possible but not recommended since useless symbols often indicate errors in the grammar.

Even after removing useless symbols, the grammar may still contain useless alternatives, which permit derivations of the form $A \Rightarrow^+ A$. Such a derivation is called a circular derivation, and the grammar is called circular or cyclic. Section 7.5.3 contains an algorithm for a circularity check of a grammar. The book by Hopcroft and Ullman [1979] contains an algorithm for the deletion of productions where the right-hand side consists of only a nonterminal, and thus for the removal of cycles.

In the following, we will cover only non-circular reduced grammars.
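The two-step deletion order described above (nonterminating symbols first, nonderivable symbols second) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the algorithm of Sections 7.5.2 and 7.5.4: the function names, the dictionary encoding of the grammar and the small sample grammar are all hypothetical.

```python
def terminating(productions, terminals):
    """Nonterminals that derive some terminal string (fixed-point iteration)."""
    done = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in productions.items():
            if lhs in done:
                continue
            # One alternative made only of terminals and terminating symbols suffices.
            if any(all(s in terminals or s in done for s in alt)
                   for alt in alternatives):
                done.add(lhs)
                changed = True
    return done


def derivable(productions, start):
    """Nonterminals that occur in some sentential form derived from `start`."""
    seen, stack = {start}, [start]
    while stack:
        for alt in productions[stack.pop()]:
            for s in alt:
                if s in productions and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen


def reduce_grammar(productions, terminals, start):
    """Delete nonterminating symbols first, then nonderivable ones."""
    term = terminating(productions, terminals)
    step1 = {a: [alt for alt in alts
                 if all(s in terminals or s in term for s in alt)]
             for a, alts in productions.items() if a in term}
    if start not in step1:
        return {}  # the sentence symbol itself produces no sentence
    reach = derivable(step1, start)
    return {a: alts for a, alts in step1.items() if a in reach}


# Hypothetical sample: C never terminates, D is never derived from S.
sample = {
    "S": [["a", "B"], ["C"]],
    "B": [["b"]],
    "C": [["C", "c"]],
    "D": [["d"]],
}
print(reduce_grammar(sample, {"a", "b", "c", "d"}, "S"))
```

Running the two steps in the opposite order could leave symbols behind that were derivable only through an alternative containing a nonterminating symbol, which is why the order matters.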
Grammatical levels

Programming languages contain constructs at various hierarchical levels. At the very top are programs, which are composed of modules, procedures, declarations and statements. Declarations and statements in turn are composed of expressions, keywords, names and numbers. Names and numbers are composed of characters.

It is somewhat arbitrary which of these constructs are defined as terminals. If one only wants to show the nesting of procedures, then declarations and expressions can be regarded as terminals. If one wants to describe the structure of expressions, then keywords, names, numbers, and operators can be regarded as terminals. Only if one wants to descend further must individual characters be seen as terminals.

In this way, the syntax of a programming language need not be completely described by one grammar, but may be partitioned into several grammatical levels. The terminals of the higher level are the nonterminals of the lower level. In compiler design, usually two levels are used: the syntactical and the lexical level. The syntactical level is the higher of the two; its sentence symbol is the program. Its terminals are keywords, names, numbers, operators, etc. Below this, the nonterminals of the lexical level are keywords, names, numbers, and special symbols. Its terminals are the individual characters of the source text, insofar as they are meaningful for the grammar (comments, end-of-lines, and meaningless empty symbols are not part of the grammar). Figure 2.2 shows this relationship.

    level        nonterminals                                 terminals
    syntactic    program, procedure, statement, expression    name, number, keyword
    lexical      name, number, keyword                        individual character

    Fig. 2.2 Syntactic and lexical grammatical levels

In this book, we consider mainly the syntactical level. This results in a difficulty with the notation of terminals. From the syntactical level, the expression

    a + b * 3
consists of two names v, a number c, and the operators '+' and '*':

    v + v * c

While the terminals '+' and '*' are simultaneously members of the syntactical and the lexical level, the terminal v denotes all names, and the terminal c denotes all numbers. In order to emphasize this fact, we call a terminal of the syntactical level that represents an entire class of symbols of the lexical level a terminal class. Thus, in the grammar of arithmetic expressions, v and c are terminal classes, and +, -, *, /, (, ) are individual terminals.

It is to some extent arbitrary which terminals of the lexical level are also considered as terminals of the syntactical level, and which are combined into terminal classes. For instance, the operators *, /, and MOD from the lexical level can be considered at the syntactical level as individual terminals or can be combined at the lexical level into the terminal class mulop by the production

    mulop = "*" | "/" | "MOD".

2.2 LL(1) grammars and syntax analysis

A grammar for a language can be used in two different ways: as a generative grammar for the generation of sentences of the language, and as an analyzing grammar for the decision whether a given string is a sentence of the language.

The generation of sentences is a trivial, straightforward, combinatorial problem, and of no interest in the practical areas of computer science. However, the aspect of the generative grammar is important in theoretical computer science and mathematics. In these fields grammars are classified according to the expressive power and the structural characteristics of the languages they generate.

The analysis, more precisely the recognition of sentences, is from a mathematical point of view also a trivial problem. All sentences of the grammar may simply be generated in ascending order by their length, and it is then easily determined if the specified string is among the sentences (search by exhaustion).
In reality, this is not feasible since the number of sentences generally grows exponentially with their length. Analysis methods are needed that make use of all information in the grammar, and that analyze the given string with minimal time and memory requirements. These methods can be separated into two main categories: top-down methods start at the top with the sentence symbol and move downward by repeated derivations trying to find a sentence which matches the given terminal string; bottom-up
methods start at the given terminal string and move upward by repeated reductions of phrases until the sentence symbol is reached. In addition to these two main approaches, there are analysis methods that mix the top-down and bottom-up approach. In this book, we will cover only top-down analysis.

In top-down analysis, we start from the sentence symbol and repeatedly generate new sentential forms by left-canonical derivations, with the goal of deriving a sentence matching the given string. If this is successful, the string has been parsed. If it is not successful, and we have exhausted all possibilities for the derivation of sentences that match the string, then it is clear that the symbol string is not a sentence of the grammar.

The only difficulty with this approach is the selection of the correct alternative. Generally, there is not enough information available at the time when the selection between several alternatives must be made to be reasonably sure of choosing the correct one. Therefore, usually the alternatives must be tried one after the other until the correct one is found. The alternatives that have been tried unsuccessfully are dead ends from which one has to return by backtracking.

Fortunately, programming languages are structured in such a way that the proper alternative can be determined with certainty by considering only a part of the input string. Grammars with this property are called deterministic. In compiler construction, only deterministic grammars are used, and so we shall cover only the top-down analysis of deterministic grammars.

Deterministic top-down parsing

The concept of deterministic top-down parsing consists in selecting the proper alternative by looking at the start symbols of the string to be analyzed. In this way parsing proceeds from left to right. Consider, for instance, the grammar

    S -> aA | bB
    A -> x | aB
    B -> y | bA

and the input string $\alpha$ = bbay.
The grammar has the property that all of its alternatives start with terminals, and also that the heads of the alternatives are different in each rule. This property permits the dead-end-free determination of the correct alternative by consulting the string $\alpha$. Assuming that the string is read from left to right, the parsing proceeds as follows:

1. In the beginning, when a choice has to be made between S => aA and S => bB, the first symbol of $\alpha$ is read, b is found, and therefore it is
known that S => bB must be the correct alternative since S => aA can never lead to a sentence starting with b.

2. If bB is further derived, one has the choice of replacing B with y or with bA. If the next symbol is read, a b is found, and so bA must be the correct alternative.

3. Continuing this procedure, the following derivations are generated:

    S => bB => bbA => bbaB => bbay

resulting in the recognition of $\alpha$ as a sentence.

From the above derivation, the syntax tree of Fig. 2.3 follows.

    [Fig. 2.3 Syntax tree]

This is the essence of deterministic top-down parsing: Starting with the sentence symbol, a sequence of left-canonical derivations is built, selecting the correct alternatives by the inspection of the string to be parsed. The string is read from left to right.

More than one symbol of the input string must be considered when several alternatives of a production start with the same symbol. This lookahead is a characteristic of the LL(k) grammar:

2.14 Definition  LL(k) grammar

A grammar is called LL(k) (deterministically recognizable from left to right with left-canonical derivations and a lookahead of k symbols) if its sentences can be parsed by a top-down analysis from left to right in such a way that in each situation where a choice must be made between several alternatives, the correct alternative can always be found by considering the next k symbols of the input string.
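The three parsing steps for the input string bbay can be sketched as a small Python program. It is an illustrative sketch only: GRAMMAR and parse are hypothetical names, each symbol is assumed to be one character, and the selection rule relies on the property stated above that every alternative starts with a terminal and that the heads within one rule differ. It is not the table-driven analyzer developed later in this chapter.

```python
# The grammar used in the parsing walk-through above.
GRAMMAR = {
    "S": ["aA", "bB"],
    "A": ["x", "aB"],
    "B": ["y", "bA"],
}


def parse(text, start="S"):
    """Return the left-canonical derivation of `text`, or None if it is no sentence."""
    form, pos = start, 0  # pos counts the input symbols matched so far
    derivation = [form]
    while pos < len(form):
        symbol = form[pos]
        if symbol in GRAMMAR:
            # Lookahead of one symbol: pick the alternative with a matching head.
            lookahead = text[pos] if pos < len(text) else None
            chosen = next((alt for alt in GRAMMAR[symbol] if alt[0] == lookahead), None)
            if chosen is None:
                return None
            form = form[:pos] + chosen + form[pos + 1:]
            derivation.append(form)
        elif pos < len(text) and symbol == text[pos]:
            pos += 1  # terminal matches the input, move right
        else:
            return None
    return derivation if pos == len(text) else None


print(" => ".join(parse("bbay")))
```

No alternative is ever retried, so the analysis needs no backtracking; a failed lookup means the input is not a sentence.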
Since it is desirable to restrict the lookahead to one symbol, and since it turns out that this restriction allows us to handle most practical cases, we will examine more closely only the LL(1) grammars. The main question is how to determine if a given grammar is LL(1). We will answer this question first for ε-free grammars (i.e. grammars without empty alternatives), and then for grammars that do contain empty alternatives.

LL(1) Grammars without empty alternatives

Even a grammar whose alternatives begin with nonterminals may be parsable without running into dead ends. Consider the grammar

    S -> Aa | Bb
    A -> xz | yB
    B -> uz | vA

and the string $\alpha$ = uzb. Even though none of the alternatives of the production for S starts with u, it is obvious that only B can be derived into a string starting with u, while all derivations of A start with x or y. The symbols x and y are the 'terminal start symbols' of A, and u and v are the terminal start symbols of B. The concept of a set of terminal start symbols is central for the description of the LL(1) property.

2.15 Definition  Terminal start symbols of a nonterminal

The set first(A) of terminal start symbols of the nonterminal A is defined to be the set of all terminals with which a string derived from A can start:

    $\mathit{first}(A) = \{x : A \Rightarrow^* x\,\omega, \text{ for } x \in V_T \text{ and } \omega \in V^*\}$

For the production $A \to \varepsilon$, $\mathit{first}(A) = \emptyset$ (the empty set).

This definition can be expanded in a natural way for a string as argument:

2.16 Definition  Terminal start symbols of a string

The set first($\alpha$) of the terminal start symbols of a string $\alpha$ is defined to be the set of all terminals with which $\alpha$ or a string derived from $\alpha$ can start:

    $\mathit{first}(\alpha) = \{x : \alpha \Rightarrow^* x\,\omega, \text{ for } x \in V_T \text{ and } \omega \in V^*\}$

As a special case we define $\mathit{first}(\varepsilon) = \emptyset$.

With the concept of terminal start symbols, it is easy to define the conditions under which an ε-free grammar is LL(1):
2.17 LL(1) condition for ε-free grammars

An ε-free grammar is LL(1) if, for each of its productions, the sets of terminal start symbols of its alternatives are pairwise disjoint. That is, for each of its productions

    $A \to \alpha_1 \mid \ldots \mid \alpha_n$

the following holds:

    $\mathit{first}(\alpha_i) \cap \mathit{first}(\alpha_j) = \emptyset$  for $1 \le i \ne j \le n$

2.18 Example  LL(1) conditions

The grammar

    S -> A;
    A -> aB | BBb
    B -> b | ab

is not LL(1) since the following is true for the production A -> aB | BBb:

    first(aB) = {a},  first(BBb) = {a, b},

and hence

    first(aB) ∩ first(BBb) = {a}

The sets of terminal start symbols of both alternatives are not disjoint. Both alternatives can start with an a. As a result, if a choice must be made between alternatives, and a is the leftmost symbol of the input string, the correct alternative cannot be found without a lookahead of more than one symbol.

No left-recursive grammar is LL(1) since for a production of the form $A \to \alpha \mid A\beta$ the following is true: $\mathit{first}(\alpha) = \mathit{first}(A\beta)$.

LL(1) Grammars with empty alternatives

For a grammar with empty alternatives, the LL(1) condition of the preceding section no longer holds. Consider, for instance, the grammar

    S -> aA; | bAc;
    A -> c | ε

and the input string $\alpha$ = bc;. It is obvious that the production for S meets the LL(1) condition 2.17, which is also true for the production for A because

    first(c) = {c},  first(ε) = ∅  and hence  first(c) ∩ first(ε) = ∅
However, the grammar is not LL(1) since after the derivation S => bAc; it is impossible to determine with a lookahead of only one symbol whether A -> c or A -> ε must be used for the next derivation. The choice of A -> c:

    S => bAc; => bcc;

does not lead to $\alpha$. The choice of A -> ε is the correct one. Therefore, the grammar is not LL(1).

If we must choose one of the alternatives of a production

    $A \to \alpha_1 \mid \ldots \mid \alpha_n \mid \varepsilon$

and only the next symbol of the input string can be used, then the terminal start symbols of $\alpha_1$ to $\alpha_n$ and the terminal successors of A must be pairwise disjoint, since in the case of the production $A \to \varepsilon$, the terminal following A is the next one in the input string. The set of terminal successors is defined as follows:

2.19 Definition  Terminal successors

The set follow(A) of the terminal successors of a nonterminal A is the set of all terminals that can follow A in any sentential form:

    $\mathit{follow}(A) = \{x : S \Rightarrow^* \omega_1 A\, x\, \omega_2, \text{ for } A \in V_N,\ x \in V_T,\ \omega_1, \omega_2 \in V^*\}$

This definition makes it possible to determine the conditions under which an arbitrary grammar is LL(1):

2.20 LL(1) conditions for arbitrary grammars

A grammar is LL(1) if

(1) for each of its productions, the sets of all terminal start symbols of all alternatives are pairwise disjoint, and
(2) for the nonterminals which can be derived into the empty string, all terminal successors of the nonterminal are disjoint from the terminal start symbols of each alternative.

Formally: for each production $A \to \alpha_1 \mid \ldots \mid \alpha_n$ the following must hold:

    $\mathit{first}(\alpha_i\, \mathit{follow}(A)) \cap \mathit{first}(\alpha_j\, \mathit{follow}(A)) = \emptyset$  for $1 \le i \ne j \le n$

Note that the formal representation combines the case in which an alternative can be derived into the empty string with the case in which it cannot. If $\alpha_i$ cannot be derived into $\varepsilon$, we have $\mathit{first}(\alpha_i\, \mathit{follow}(A)) = \mathit{first}(\alpha_i)$; if $\alpha_i \Rightarrow^* \varepsilon$, we have $\mathit{first}(\alpha_i\, \mathit{follow}(A)) = \mathit{first}(\alpha_i) \cup \mathit{follow}(A)$, which for $\alpha_i = \varepsilon$ is simply $\mathit{follow}(A)$.
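The LL(1) conditions 2.20 can be checked mechanically once the start and successor sets are known. The Python sketch below computes both by a plain fixed-point iteration and then tests the lookahead sets for pairwise disjointness. It is an assumption-laden illustration, not the algorithm used in Coco: the names analyse and is_ll1 are invented, each symbol is one character, the empty alternative is written as the empty string, and no end-of-input marker is added to the successor sets (the sample grammar ends every sentence with ';').

```python
def analyse(grammar):
    """Return (first_of, follow) for a grammar given as {nonterminal: [alternatives]}."""
    nullable = set()                      # nonterminals that derive the empty string
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}

    def first_of(string):
        """Start symbols of a symbol string, and whether the string can vanish."""
        result = set()
        for s in string:
            if s not in grammar:          # terminal: it starts the rest of the string
                result.add(s)
                return result, False
            result |= first[s]
            if s not in nullable:
                return result, False
        return result, True

    changed = True
    while changed:                        # repeat until no set grows any more
        changed = False
        for a, alternatives in grammar.items():
            for alt in alternatives:
                f, empty = first_of(alt)
                if empty and a not in nullable:
                    nullable.add(a)
                    changed = True
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                for i, s in enumerate(alt):
                    if s in grammar:      # successors of s come from what follows it
                        f, empty = first_of(alt[i + 1:])
                        new = f | (follow[a] if empty else set())
                        if not new <= follow[s]:
                            follow[s] |= new
                            changed = True
    return first_of, follow


def is_ll1(grammar):
    """Check that the lookahead sets of each production are pairwise disjoint."""
    first_of, follow = analyse(grammar)
    for a, alternatives in grammar.items():
        lookahead = []
        for alt in alternatives:
            f, empty = first_of(alt)
            lookahead.append(f | follow[a] if empty else f)
        for i in range(len(lookahead)):
            for j in range(i + 1, len(lookahead)):
                if lookahead[i] & lookahead[j]:
                    return False
    return True


# The grammar S -> aA; | bAc;   A -> c | ε discussed above.
print(is_ll1({"S": ["aA;", "bAc;"], "A": ["c", ""]}))  # prints False
```

For this grammar the check fails on the production for A, because c is both a start symbol of the first alternative and a terminal successor of A.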
2.21 Example  LL(1) conditions

Consider the grammar of Knuth [1965]:

    S -> E;
    E -> aAbE | bBaE | ε
    A -> aAbA | ε
    B -> bBaB | ε

Is it LL(1)? Since ε appears in the productions for E, A, and B, the terminal successors of E, A, and B are needed. From the grammar it can be easily seen that

    follow(E) = {;}
    follow(A) = {b}
    follow(B) = {a}

The lookahead sets are:

for the alternatives of the E production

    first(aAbE follow(E)) = {a}
    first(bBaE follow(E)) = {b}
    first(ε follow(E))    = {;}

for the alternatives of the A production

    first(aAbA follow(A)) = {a}
    first(ε follow(A))    = {b}

for the alternatives of the B production

    first(bBaB follow(B)) = {b}
    first(ε follow(B))    = {a}

Since the lookahead sets are pairwise disjoint for the alternatives of each production, the grammar is LL(1).

The calculation of the successor sets is not always easy, as we can see in the following example of an if statement having a dangling else clause.

2.22 Example  Dangling else

Consider the grammar

    1  program     -> statement programrest
    2  programrest -> program | end
    3  statement   -> assignment | ifstatement
    4  assignment  -> v := expr ;
    5  ifstatement -> if thenpart elsepart
    6  thenpart    -> expr then statement
    7  elsepart    -> else statement | ε

with the sentence symbol program and the terminals end, v, :=, expr, ;, if, then, else.
Is it LL(1)? There are three productions with alternatives: programrest, statement, elsepart. The first two are LL(1) since

    first(program)    = {v, if},   first(end)         = {end}
    first(assignment) = {v},       first(ifstatement) = {if}

The calculation of follow(elsepart) is longer:

    follow(elsepart)    = follow(ifstatement)      by production 5
    follow(ifstatement) = follow(statement)        by production 3
    follow(statement)   = first(programrest)       by production 1
                          ∪ follow(thenpart)       by production 6
                          ∪ follow(elsepart)       by production 7

with the result:

    follow(elsepart) = first(programrest) ∪ follow(thenpart) ∪ follow(elsepart).

Since the last term on the right-hand side agrees with the left-hand side, it adds nothing to the set. In addition, since

    first(programrest) = first(program) ∪ first(end) = {v, if, end}

we have

    follow(elsepart) = {v, if, end} ∪ follow(thenpart).

Additionally,

    follow(thenpart)    = first(elsepart) ∪ follow(ifstatement)    by production 5
    first(elsepart)     = {else}                                   by production 7
    follow(ifstatement) = follow(statement)                        by production 3

hence

    follow(elsepart) = {v, if, end} ∪ {else} = {v, if, end, else}

Checking the LL(1) condition for production 7 results in:

    first(else statement) ∩ follow(elsepart) = {else} ≠ ∅.

The grammar is therefore not LL(1).

The fact that the grammar in this example is not LL(1) does not preclude it from being deterministically parsable with a lookahead of one symbol. The syntax analyzer can always choose the first alternative when it sees the production elsepart and else is the next input symbol. In spite of the ambiguity of the statement
gea 2.2 LL (1) grammars and syntax analysis 31 if a then if b then c else d the first else then always belongs to the innermost then (as in PL/I and Pascal). LL(1) grammars and grammars of programming languages The LL(1) conditions severely restrict the class of grammars that can be analyzed deterministically. Almost all programming language grammars violate the LL(1) conditions. Especially disturbing are two facts: 1. Left-recursive productions are not LL(1). 2. Alternatives that start with the same string are not LL(1). However, it is almost always possible to transform non-LL(l) constructs into LL(1) constructs. This is greatly aided by the use of EBNF notation. With it, left-recursive productions can be described by use of the repetition symbol {}, and common beginnings of alternatives can be extracted by factorization. We have defined the LL(1) conditions only for grammars with simple BNF productions. So the question must arise how they look when an EBNF grammar is used. We will defer the answer for the time being until the end of Section 2.3. Computation of start and successor sets For small grammars, the calculation of start and successor sets to check for the LL(1) property can be done by careful visual inspection. However, an algorithm is needed for larger grammars. Since the derivation of the form A =>+ e plays an important role, we will first introduce the concept of 'deletability1. 2.23 Definition Deletability A nonterminal A is called deletable, if it produces the empty string: In this section we will write deletable symbols in square brackets: [A]. An algorithm for marking deletable symbols in a grammar is trivial. It is based on the following assertions: 1. If A -» e is a production then A is deletable. 2. If A -» X\... Xn is a production and all X,- are deletable, then A is also deletable.
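The two assertions above amount to a fixed-point iteration. As an illustration alongside the pseudocode of Algorithm 2.24, here is a minimal Python sketch. The dictionary representation of the grammar and the names mark_deletable and KNUTH are assumptions made for this example, not part of the book's notation.

```python
def mark_deletable(productions):
    """Return the set of deletable nonterminals.

    productions maps each nonterminal to a list of alternatives; every
    alternative is a list of symbols, and [] stands for epsilon.
    (Hypothetical representation chosen for this sketch.)
    """
    # Assertion 1: A -> epsilon makes A deletable.
    deletable = {a for a, alts in productions.items() if [] in alts}
    changed = True
    while changed:  # repeat until no new symbol was marked
        changed = False
        for a, alts in productions.items():
            if a in deletable:
                continue
            # Assertion 2: some alternative consists only of deletable symbols.
            # Terminals are never in the set, so they block an alternative.
            if any(all(x in deletable for x in alt) for alt in alts):
                deletable.add(a)
                changed = True
    return deletable


# Knuth's grammar from Example 2.21
KNUTH = {
    "S": [["E", ";"]],
    "E": [["a", "A", "b", "E"], ["b", "B", "a", "E"], []],
    "A": [["a", "A", "b", "A"], []],
    "B": [["b", "B", "a", "B"], []],
}

print(sorted(mark_deletable(KNUTH)))
```

For Knuth's grammar the sketch marks E, A, and B, while S stays unmarked because its only alternative contains the terminal ";".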
2.24 Algorithm  Marking deletable symbols

    MarkDeletableSymbols:
      Mark all nonterminals A for which A → ε exists;
      repeat
        -- Assert: All marked symbols are deletable
        Mark all nonterminals A for which A → X_1 ... X_n
          and X_1 ... X_n are all marked nonterminals
      until No new symbol was marked
    end MarkDeletableSymbols

Sets of terminal start symbols. Three cases must be distinguished for the calculation of the terminal start symbols of a string α:

1. the string is deletable;
2. its first nondeletable symbol is a terminal y;
3. its first nondeletable symbol is a nonterminal Y.

From this, computation rules (1) through (3) follow:

1. for α = [X_1] ... [X_t],      first(α) = first(X_1) ∪ ... ∪ first(X_t)
2. for α = [X_1] ... [X_t] y ω,  first(α) = first(X_1) ∪ ... ∪ first(X_t) ∪ {y}
3. for α = [X_1] ... [X_t] Y ω,  first(α) = first(X_1) ∪ ... ∪ first(X_t) ∪ first(Y)

The set of start symbols of a nonterminal is the union of the sets of start symbols of its alternatives:

4. for A → α_1 | ... | α_n,      first(A) = first(α_1) ∪ ... ∪ first(α_n)

From these computation rules, the following algorithm is derived.

2.25 Algorithm  Calculation of the sets of terminal start symbols

    FindFirstSets(↓G ↑first):
      param G: A grammar with marked deletable symbols and n nonterminals;
            first: array(1:n) of set of terminal;
    begin
      first(1:n) := ∅;   -- start with empty sets
      repeat
        for all productions A → α_1 | ... | α_n do
          for all alternatives α_i = [X_1] ... [X_t] Y ω with t >= 0, Yω ∈ V* do
            first(A) := first(A) + first(X_1) + ... + first(X_t);
            case of
              Y is terminal:    first(A) := first(A) + {Y}
            | Y is nonterminal: first(A) := first(A) + first(Y)
            | Yω is absent:     -- nothing
            end
          end
        end
      until No change in first
    end FindFirstSets

Terminal successor sets. When calculating the terminal successors of a nonterminal A, there are also three cases which must be distinguished: in the right-hand side of a production in which A appears, either a terminal y, a nondeletable nonterminal Y, or nothing follows after any deletable symbols. From this, the computation rules (5) through (7) follow:

5. for B → ω_1 A [X_1] ... [X_t],        (first(X_1) ∪ ... ∪ first(X_t) ∪ follow(B)) is in follow(A)
6. for B → ω_1 A [X_1] ... [X_t] y ω_2,  (first(X_1) ∪ ... ∪ first(X_t) ∪ {y}) is in follow(A)
7. for B → ω_1 A [X_1] ... [X_t] Y ω_2,  (first(X_1) ∪ ... ∪ first(X_t) ∪ first(Y)) is in follow(A)

If all occurrences of A on the right-hand sides of the productions are considered, the total set follow(A) is the union of all partial sets of follow(A) that result from (5) through (7). Therefore we have the following algorithm.

2.26 Algorithm  Calculation of successor sets

    FindFollowSets(↓G ↓first ↑follow):
      param G: A grammar with marked deletable symbols and n nonterminals;
            first: array(1:n) of set of terminal;
            follow: array(1:n) of set of terminal;
    begin
      follow(1:n) := ∅;   -- start with empty follow sets
      repeat
        for all nonterminals A do
          for all productions B → ω_1 A [X_1] ... [X_t] Y ω_2 with t >= 0 and Yω_2 ∈ V* do
            follow(A) := follow(A) + first(X_1) + ... + first(X_t);
            case of
              Y is terminal:    follow(A) := follow(A) + {Y}
            | Y is nonterminal: follow(A) := follow(A) + first(Y)
            | Yω_2 is absent:   follow(A) := follow(A) + follow(B)
            end
          end
        end
      until No change in follow
    end FindFollowSets

The implementation of the algorithms depends strongly on the data structure of the grammar. The execution time depends on the order in which the productions are visited. Many optimizations are possible.

Principles of syntax analysis of LL(1) grammars

The principle of deterministic syntax analysis of LL(1) grammars can be described abstractly under the following assumptions.

1. The grammar is given in 'matrix form': It has imax productions of the form

       A_i → α_i1 | ... | α_ijmax(i)    where 1 ≤ i ≤ imax

   The sentence symbol is A_1. An alternative α_ij is given by kmax components of the form

       α_ij = X_ij1 ... X_ijkmax(i,j)

   α_ij = ε means kmax(i,j) = 1 and X_ij1 = ε. The representation is matrix-like: index i describes the production, index j describes the alternative, and index k describes the component.

2. As programmers, we understand this representation as an abstract data structure with the access functions:

   X(↓i↓j↓k): symbol
       returns the value of symbol X_ijk.
   Kind(↓i↓j↓k): Symkind
       returns the kind of the symbol X_ijk, where Symkind = (terminal, nonterminal, epsilon).
   Rule(↓i↓j↓k): integer
       If X_ijk is the nonterminal A_f, then this function returns index f:
       Rule(↓i↓j↓k) = f  ⇔  X_ijk = A_f
   Kmax(↓i↓j): integer
       returns the number of components of alternative j in the production i.
   Match(↓x↓i): boolean
       returns true if a phrase of the nonterminal A_i can start with terminal x or, if A_i ⇒+ ε, x can follow the phrase of A_i:
       Match(↓x↓i)  ⇔  x ∈ first(A_i follow(A_i))
   Alternative(↓x↓i): integer
       returns the index j of the alternative of the production i which can begin with the terminal x:
       Alternative(↓x↓i) = j  ⇔  x ∈ first(α_ij follow(A_i))

3. The string to be parsed consists of pmax symbols s_p:

       σ = s_1 ... s_pmax    with pmax ≥ 1

The description is basic and abstract since we ignore (1) the concrete data structures of the stored grammar, (2) the implementation of the access functions, and (3) the fact that the input string is actually supplied by a lexical analyzer.

We will now give a recursive and a nonrecursive parsing algorithm. The recursive algorithm uses an internal recursive procedure Parse. Its operation should be clear from the following specifications and from the text of the algorithm without additional explanations.

Initial state: The input string, up to the symbol s_(p-1), is recognized as a legal beginning of a sentence. The A_i-phrase starts with s_p.

Function: Parse(↓i ↑correct) tries to parse the A_i-phrase.

Final state: If correct = true, an A_i-phrase is parsed and p is advanced such that s_p is the first input symbol that is no longer part of the A_i-phrase. If correct = false, an A_i-phrase was not parsed.

2.27 Algorithm  LL(1) analysis (recursive)

    ParseRecursive(↑correct):
      param  correct: boolean;            -- the string is successfully parsed
      global grammar in matrix form;
             s: array(1:pmax) of symbol;  -- input string
             pmax: integer;
      local  p: integer;                  -- index of current input symbol

      Parse(↓i ↑correct):
        param i: integer;
              correct: boolean;           -- an A_i phrase is parsed
        local j,k: integer;
      begin                               -- try to parse an A_i phrase
        -- position 1 --
        if Match(↓s_p ↓i) then            -- parsing of A_i possible
          j := Alternative(↓s_p ↓i); k := 1;   -- parse α_ij
          loop                            -- parse X_ijk
            -- position 2 --
            case Kind(↓i↓j↓k) of
              terminal:    if p > pmax or s_p <> X(↓i↓j↓k) then correct := false; exit end;
                           p := p+1       -- read next input symbol
            | nonterminal: Parse(↓Rule(↓i↓j↓k) ↑correct);
                           if not correct then exit end
            | epsilon:     -- do nothing
            end;
            if k < Kmax(↓i↓j) then k := k+1
            else correct := true; exit
            end
          end
        else correct := false             -- parsing of A_i impossible
        end
        -- position 3 --
      end Parse;

    begin                                 -- pmax and s are assumed to have values
      p := 1;
      Parse(↓1 ↑correct);
      correct := correct & p = pmax+1
    end ParseRecursive

We will show the behavior of the above algorithm in Example 2.28 below, where we take a snapshot of the states of the algorithm at 'position 1', 'position 2', and 'position 3'.

2.28 Example  Recursive LL(1) parsing

Consider Knuth's ε-containing grammar from Example 2.21. Let us give its components the indices i, j, and k, and extend it by the component eof so that it will not produce empty sentences:

    S_1 → E_111 eof_112
    E_2 → a_211 A_212 b_213 E_214 | b_221 B_222 a_223 E_224 | ε_231
    A_3 → a_311 A_312 b_313 A_314 | ε_321
    B_4 → b_411 B_412 a_413 B_414 | ε_421

The input string shall be

    a_1 b_2 b_3 a_4 eof_5

All steps performed by the algorithm can be traced in full detail by the snapshots of Fig. 2.4.
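To make the recursive scheme concrete, the following Python sketch runs it on the grammar and input of Example 2.28. It is only an illustration under stated assumptions: the dictionaries GRAMMAR and SELECT and the function parse_recursive are names invented here, the lookahead table is filled in by hand from the follow sets of Example 2.21 (with eof in place of ";"), and indices into the input are 0-based rather than 1-based.

```python
EPS = "eps"

# Matrix form: GRAMMAR[i][j] is alternative j of rule i.
# Nonterminals are written as rule numbers (int), terminals as strings.
GRAMMAR = {
    1: {1: [2, "eof"]},                                        # S -> E eof
    2: {1: ["a", 3, "b", 2], 2: ["b", 4, "a", 2], 3: [EPS]},   # E
    3: {1: ["a", 3, "b", 3], 2: [EPS]},                        # A
    4: {1: ["b", 4, "a", 4], 2: [EPS]},                        # B
}

# SELECT[i][x] = j plays the role of Match and Alternative together:
# x matches rule i exactly when x is a key of SELECT[i].
SELECT = {
    1: {"a": 1, "b": 1, "eof": 1},
    2: {"a": 1, "b": 2, "eof": 3},
    3: {"a": 1, "b": 2},
    4: {"b": 1, "a": 2},
}


def parse_recursive(s):
    """Return True if the symbol list s is a sentence of the grammar."""
    p = 0  # index of the current input symbol

    def parse(i):
        nonlocal p
        x = s[p] if p < len(s) else None
        if x not in SELECT[i]:           # parsing of A_i impossible
            return False
        for sym in GRAMMAR[i][SELECT[i][x]]:
            if sym == EPS:
                continue                 # epsilon: do nothing
            if isinstance(sym, int):     # nonterminal: recurse
                if not parse(sym):
                    return False
            else:                        # terminal: compare and advance
                if p >= len(s) or s[p] != sym:
                    return False
                p += 1
        return True

    return parse(1) and p == len(s)


print(parse_recursive(["a", "b", "b", "a", "eof"]))
print(parse_recursive(["a", "eof"]))
```

The first call accepts the input string of Example 2.28; the second is rejected because after a the rule for A matches neither alternative on eof.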
[Table: snapshots with the columns Position, p, s_p and, for each recursion depth 0 to 3, the index ijk and the symbol X_ijk]

Fig. 2.4 Snapshot of the LL(1) parsing of Algorithm 2.27 applied to the grammar of Example 2.28

The nonrecursive algorithm uses a stack for the intermediate storage of the indices of all nonterminals that are currently being processed. The access functions of the stack are InitStack, Push(↓i↓j↓k) and Pop(↑i↑j↑k). The algorithm can be in three states: findalternative, try, forward. These are characterized by the assertions which hold in each one respectively:

findalternative: The input string is already recognized up to the symbol s_(p-1) as a legal beginning of a sentence. s_p is recognized and it is expected that an A_i-phrase, starting with s_p, will follow. The index j of the matching alternative of the A_i-production will be found.

try: The grammar symbol X_ijk will be parsed.

forward: X_ijk has been successfully parsed, so move to its successor.

For the stack, the following assertion holds in all three states. If i = 1, the stack is empty. If i > 1, then A_i is at the top of the stack.
2.29 Algorithm  LL(1) parsing (nonrecursive)

    ParseNonRecursive(↑correct):
      param  correct: boolean;            -- the string is successfully parsed
      global grammar in matrix form;
             s: array(1:pmax) of symbol;  -- input string
             pmax: integer;
      type   State = (findalternative, try, forward);
      local  i,j,k,p: integer;
             st: State;
    begin                                 -- pmax and s are assumed to have values
      i := 1; p := 1; stack := empty;
      st := findalternative;              -- start with first rule
      loop
        case st of
          findalternative:
            -- position 1 --
            if Match(↓s_p ↓i) then
              j := Alternative(↓s_p ↓i); k := 1; st := try    -- X_ijk is first component
            else correct := false; exit                       -- s_p does not match
            end
        | try:                                                -- parse X_ijk
            -- position 2 --
            case Kind(↓i↓j↓k) of
              terminal:    if p > pmax or X(↓i↓j↓k) <> s_p then correct := false; exit end;
                           p := p+1; st := forward            -- X_ijk is parsed
            | nonterminal: Push(↓i↓j↓k); i := Rule(↓i↓j↓k); st := findalternative
            | epsilon:     st := forward
            end -- case Kind
        | forward:                                            -- advance to next component
            -- position 3 --
            if k < Kmax(↓i↓j) then k := k+1; st := try        -- no end of alternative
            else                                              -- end of alternative
              if i > 1 then Pop(↑i↑j↑k)                       -- nonterminal X_ijk is parsed
              else correct := p = pmax+1; exit                -- rule 1 is parsed
              end
            end
        end -- case st
      end -- loop
      -- position 4 --
    end ParseNonRecursive

The behavior of the nonrecursive algorithm is shown in Example 2.30.
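The three-state scheme of Algorithm 2.29 can be tried out with the following Python sketch on the grammar of Example 2.28. As before, the names GRAMMAR, SELECT, and parse_nonrecursive, the hand-filled lookahead table, and the 0-based indices are assumptions of this illustration. One deliberate deviation: the sketch tests whether the stack is empty instead of testing i > 1, which is equivalent here because rule 1 is never called from another rule.

```python
EPS = "eps"

# Matrix form of the grammar of Example 2.28 (nonterminals as rule numbers).
GRAMMAR = {
    1: {1: [2, "eof"]},
    2: {1: ["a", 3, "b", 2], 2: ["b", 4, "a", 2], 3: [EPS]},
    3: {1: ["a", 3, "b", 3], 2: [EPS]},
    4: {1: ["b", 4, "a", 4], 2: [EPS]},
}

# Hand-filled lookahead table: SELECT[i][x] = matching alternative j.
SELECT = {
    1: {"a": 1, "b": 1, "eof": 1},
    2: {"a": 1, "b": 2, "eof": 3},
    3: {"a": 1, "b": 2},
    4: {"b": 1, "a": 2},
}


def parse_nonrecursive(s):
    """Return True if the symbol list s is a sentence of the grammar."""
    stack = []                   # (i, j, k) of nonterminals being processed
    i, j, k, p = 1, 0, 0, 0      # start with the first rule
    state = "findalternative"
    while True:
        if state == "findalternative":
            x = s[p] if p < len(s) else None
            if x not in SELECT[i]:
                return False                 # s_p does not match
            j, k, state = SELECT[i][x], 0, "try"
        elif state == "try":
            sym = GRAMMAR[i][j][k]
            if sym == EPS:
                state = "forward"
            elif isinstance(sym, int):       # nonterminal: remember position
                stack.append((i, j, k))
                i, state = sym, "findalternative"
            else:                            # terminal
                if p >= len(s) or s[p] != sym:
                    return False
                p += 1
                state = "forward"
        else:                                # state == "forward"
            if k + 1 < len(GRAMMAR[i][j]):
                k, state = k + 1, "try"      # no end of alternative
            elif stack:
                i, j, k = stack.pop()        # nonterminal X_ijk is parsed
            else:
                return p == len(s)           # rule 1 is parsed


print(parse_nonrecursive(["a", "b", "b", "a", "eof"]))
```

Because the pending positions live in an explicit list, an error handler could inspect them at the point of failure, which is the advantage the text attributes to the nonrecursive variant.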
2.30 Example  Nonrecursive LL(1) parsing

We consider the same grammar and input stream as in Example 2.28 with snapshots at positions 1 to 4. The algorithm executes as in Fig. 2.5.

[Table: snapshots with the columns Position, p, s_p, ijk, X_ijk, and Stack (end-of-stack at the left)]

Fig. 2.5 Snapshots of the nonrecursive LL(1) algorithm 2.29, applied to the grammar of Example 2.28

The recursive algorithm is statically shorter and more elegant. The nonrecursive algorithm is more suited for the inclusion of error handling since the explicitly stacked symbols are accessible (see Section 2.6). Both scan the input string strictly from left to right (p is never decremented). In addition, there exists a grammar-specific upper limit c such that after a maximum of c loops, a new input symbol is read. Hence, the algorithm has a linear execution time with respect to the length of the input string. It has the time complexity O(pmax).

LL(k) grammars for k > 1

A lookahead of more than one symbol is rarely used in compilers. We shall therefore treat LL(k) grammars for k > 1 only briefly, for the sake of completeness. First, we define the set of terminal start symbols of length k of a string α:

2.31 Definition  Terminal start symbols of length k

    first_k(α) = {β : α ⇒* β ω with β ∈ V_T*, |β| = k, ω ∈ V*}
               ∪ {β : α ⇒* β with β ∈ V_T*, |β| < k}

If the terminal string which can be derived from α is shorter than k, then the elements of first_k(α) are also shorter than k.

We will now give a formal definition of the LL(k) grammars according to Rosenkrantz and Stearns [1970]:

2.32 Definition  LL(k) grammar

A grammar is called LL(k) if for all left-canonical derivations of the form

    S ⇒* α A ω ⇒ α β ω
    S ⇒* α A ω ⇒ α γ ω

where first_k(β ω) = first_k(γ ω), it is implied that β = γ.

This means that in an LL(k) grammar no two sentential forms with the leftmost nonterminal A and the alternatives A → β and A → γ can exist in which the left-canonical derivations of the remaining strings beginning with β and γ agree in the first k symbols. From this, we get the following condition:

2.33 LL(k) condition

A grammar is LL(k) if for each pair of rules A → β and A → γ and each left-canonically derived sentential form that contains A:

    S ⇒* α A ω

the following condition holds:

    first_k(β ω) ∩ first_k(γ ω) = ∅

2.34 Example  LL(2) and LL(3) test

Again, consider the grammar

    S → A ;
    A → aB | BBb
    B → b | ab

in order to see if it is LL(k) and determine the value of k. The only pair of rules that creates a problem is:

    A → aB
    A → BBb

and the only sentential form in which its left-hand side A appears is A;.

k = 1: the LL(1) test produces:

    L_1(aB;)  = {a}
    L_1(BBb;) = {a, b}

Since a belongs to both sets, the grammar is not LL(1).

k = 2: the LL(2) test produces:

    L_2(aB;)  = {aa, ab}
    L_2(BBb;) = {bb, ba, ab}

Since the element ab belongs to both sets, the grammar is not LL(2).

k = 3: the LL(3) test produces:

    L_3(aB;)  = {ab;, aab}
    L_3(BBb;) = {bbb, bab, abb, aba}

Since both sets are disjoint, the grammar is LL(3).

Algorithms for the computation of the sets first_k(α) and for checking the LL(k) conditions for k > 1 can be found in Aho and Ullman [1972].

No left-recursive grammar is LL(k) for any k. Another simple grammar that is not LL(k) for any k is:

    S → A ;
    A → a | aAa

It has the language {a^(2n+1);}. If there were a value of k such that

    first_k(a a^n ;) ∩ first_k(aAa a^n ;) = ∅

then k > n+1 would be true. However, since n can become arbitrarily large, there is no such k.
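The sets of Example 2.34 can be checked mechanically. The Python sketch below enumerates terminal prefixes by expanding the leftmost nonterminal until k leading terminals are known. It is not the algorithm of Aho and Ullman, only a brute-force illustration; the names first_k and RULES are invented here, every symbol is assumed to be a single character, and the search is only guaranteed to terminate for grammars without left recursion.

```python
def first_k(form, k, rules):
    """Terminal prefixes of length k (or shorter complete terminal strings)
    derivable from the sentential form, given as a string of one-character
    symbols. rules maps each nonterminal to its list of alternatives."""
    result, seen, todo = set(), set(), [tuple(form)]
    while todo:
        f = todo.pop()
        if f in seen:
            continue
        seen.add(f)
        # n = length of the leading run of terminals
        n = 0
        while n < len(f) and f[n] not in rules:
            n += 1
        if n >= k or n == len(f):
            result.add("".join(f[:k]))   # prefix is fixed, stop expanding
            continue
        # otherwise replace the leftmost nonterminal f[n] by each alternative
        for alt in rules[f[n]]:
            todo.append(f[:n] + tuple(alt) + f[n + 1:])
    return result


# Grammar of Example 2.34
RULES = {"S": ["A;"], "A": ["aB", "BBb"], "B": ["b", "ab"]}

for k in (1, 2, 3):
    l1 = first_k("aB;", k, RULES)
    l2 = first_k("BBb;", k, RULES)
    verdict = "disjoint" if not (l1 & l2) else "conflict"
    print(k, sorted(l1), sorted(l2), verdict)
```

Run on the two remaining strings aB; and BBb;, the sketch reports a conflict for k = 1 and k = 2 and disjoint sets for k = 3, in agreement with the example.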
Rosenkrantz and Stearns [1970] proved the following interesting statements about LL(k) grammars:

(1) It is undecidable whether a given arbitrary grammar is LL(k) for an unknown value of k.
(2) It is decidable whether a given arbitrary grammar is LL(k) for a given fixed value of k.
(3) If a grammar G is not LL(k) for a given k, it cannot be determined if there is an equivalent LL(k) grammar for G.
(4) For each LL(k) grammar G that contains ε, there is an ε-free LL(k+1) grammar that produces the same language as G, but without the empty string.

2.3 The top-down graph

In a table-driven syntax analysis, the grammar of the source language must be stored in main memory so that the analysis algorithm can access it freely. The three-dimensional abstract data structure consisting of rules, alternatives, and components, used in Section 2.2 for the representation of the principal algorithms, is not suited as a concrete data structure. It does not make efficient use of memory, and the grammar cannot be represented in EBNF form.

A representation that is much better suited for a practical top-down analysis is a special kind of graph. We call it the top-down graph. It is similar to the syntax diagrams, introduced by Wirth, that were used to describe the syntax of Pascal. In Coco, the top-down graph is used as a preliminary step to the even better suited G-code. Since the G-code is understandable only by means of the top-down graph, we will describe that first.

Basic structure

The basic structure of the top-down graph is a collection of ordered binary trees. Its nodes are the grammar symbols of the right-hand sides of syntax rules. Right pointers link the components of an alternative, while left pointers link the start symbols of different alternatives. In the picture of a top-down graph, a right pointer leaves a node at the right, a left pointer leaves the node at the bottom:

[Figure: a node with its right pointer (to next component) and its left pointer (to next alternative)]

2.35 Example  Basic structure of the top-down graph

Figure 2.6 shows the top-down graph of the grammar

    S → A ;
    A → aB | BBb
    B → b | ab

Notice that the top-down graph comprises only the right-hand sides of the rules.

    S ⇒ A → ;

    A ⇒ a → B
        |
        B → B → b

    B ⇒ b
        |
        a → b

Fig. 2.6 Top-down graph of the grammar of Example 2.35

Factorization

An advantage of the top-down graph over a linear representation is the ability to show alternatives in a factorized form, as can be done in EBNF. From the rule

    A → abc | acd        with the top-down graph    A ⇒ a → b → c
                                                         |
                                                         a → c → d

we get by left-factorization the EBNF rule

    A → a (bc | cd)      with the top-down graph    A ⇒ a → b → c
                                                             |
                                                             c → d

From the rule

    A → abc | dec        with the top-down graph    A ⇒ a → b → c
                                                         |
                                                         d → e → c

we obtain by right-factorization the EBNF rule

    A → (ab | de) c      with the top-down graph    A ⇒ a → b ─┬→ c
                                                         |      │
                                                         d → e ─┘

Notice that the last top-down graph is no longer a tree.

A special case occurs when an alternative is the beginning of another alternative. Then, an ε is created by factorization. For the rule

    A → ab | a           with the top-down graph    A ⇒ a → b
                                                         |
                                                         a

we get by left-factorization the EBNF rule

    A → a [b]            with the top-down graph    A ⇒ a → b
                                                             |
                                                             ε

Removal of left-recursive rules

The symbol strings defined by left-recursive rules can be represented in EBNF by the repetition symbol. Repetition corresponds to a loop in the top-down graph. From the rule

    A → a | Ab           with the top-down graph    A ⇒ a
                                                         |
                                                         A → b

we get the EBNF rule

    A → a {b}            with the top-down graph    [a → b, with a pointer from b looping back to b]
This top-down graph is also not a tree. It can easily be verified that it represents all possible right-hand sides such as a, ab, abb, abbb, etc.

The complement symbol any

Sometimes it is desirable to represent a set of terminals by its complementary set, for example

1. in the description of a string enclosed in quotes: the set of all symbols in the alphabet except the quote;
2. any symbol in a comment of the form (* ... *) by the set of all symbols except the symbol *);
3. any symbols except begin (to skip declarations).

Complementary sets cannot be represented in the production notation of formal languages. Therefore, the only thing left to do is to enumerate all members of the complementary set, which is very inconvenient. For use in Coco, we introduce the special symbol any to denote complementary sets.

2.36 Definition  Complement symbol any

The complement symbol any represents every arbitrary terminal that is not a terminal start symbol of an alternative of any.

Figure 2.7 shows the three examples above with the symbol any as an EBNF rule and as a top-down graph.

    string  = "'" {any} "'".
    comment = "(*" {any} "*)".
    skip    = {any} "begin".

[Figure: the corresponding top-down graphs for string, comment, and skip, each with a loop over an any node]

Fig. 2.7 The meaning of the complement symbol any

Equivalent top-down graphs

If one uses only the basic structure, then a unique top-down graph results from a grammar rule. This uniqueness is lost with factorization and removal of left recursion. In these cases there are sometimes several equivalent top-down graphs.

2.37 Example  Equivalent top-down graphs

Consider the expression

    E → T | +T | -T | E + T | E - T

By factorization and elimination of left-recursion, the graph shown in the upper part of Fig. 2.8 will result. It has 10 nodes and corresponds to the EBNF rule

    E = (T | "+" T | "-" T) {"+" T | "-" T}.

Another equivalent top-down graph, which consists of only 7 nodes, appears in the middle part of Fig. 2.8. This graph also corresponds to an EBNF rule. Figure 2.8 shows another equivalent and even more condensed top-down graph with only 6 nodes. This top-down graph no longer corresponds to an EBNF rule.

[Figure: three top-down graphs for E, with 10 nodes, 7 nodes, and 6 nodes]

Fig. 2.8 Three equivalent top-down graphs for expressions
The graph with the fewest nodes occupies the least memory. However, there may be reasons (due to the treatment of semantics, see Section 3.6) not to compress the top-down graph too much.

These examples show that for each EBNF rule there is a corresponding top-down graph. But a top-down graph does not always correspond to an EBNF rule.

Representation

The top-down graph can be represented in memory by a data structure of nodes and pointers that may be dynamically generated or statically declared and initialized. Since the number of nodes is known in advance and does not change, the static declaration is more efficient. In Coco, the top-down graph basically consists of an array of nodes, and each node consists of four components:

    type Graphnode = record
           kind: (terminal, nonterminal, any, eps);
           val, lp, rp: integer;
         end;
    var graph: array(1:n) of Graphnode;

The names have the following meaning:

kind: the various node types.
val:  the node symbol in some encoding, meaningless for ε-nodes.
lp:   the left pointer that points to the first node of the next alternative. If lp > 0, then graph(lp) starts the next alternative. If lp = 0, the current alternative is the last one of the production.
rp:   the right pointer that points to the next component. If rp > 0, then graph(rp) is the next component. If rp = 0, the current component is the last component of an alternative.
n:    the number of nodes in the grammar.

LL(1) conditions for top-down graphs

The LL(1) condition of Section 2.2 refers to the simple grammar representation with rules and alternatives. If a grammar meets this condition, the correct alternative can be selected by a lookahead of one symbol without backtracking.

A similar condition for the top-down graph ensures the correct selection of the alternatives without backtracking by use of a lookahead of one symbol. To simplify the discussion, we introduce two auxiliary concepts. Since they are of central importance for the syntax analysis of top-down graphs, we will use them often. We call these concepts 'alternative chain' and 'match'.
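Returning briefly to the node record described under 'Representation': the following Python sketch stores the left-factorized rule A → a (bc | cd) in such an array and reads the alternatives back by following rp and lp. The class GraphNode, the dictionary graph, and the function strings are names assumed for this illustration, symbols are stored as plain characters rather than in Coco's encoding, and the walk only works for graphs without loops.

```python
from dataclasses import dataclass


@dataclass
class GraphNode:
    kind: str      # "terminal", "nonterminal", "any" or "eps"
    val: str       # node symbol (meaningless for eps nodes)
    lp: int = 0    # first node of the next alternative, 0 if none
    rp: int = 0    # next component, 0 if none


# Top-down graph of A -> a (bc | cd), indexed from 1 like array(1:n)
graph = {
    1: GraphNode("terminal", "a", rp=2),
    2: GraphNode("terminal", "b", lp=4, rp=3),
    3: GraphNode("terminal", "c"),
    4: GraphNode("terminal", "c", rp=5),
    5: GraphNode("terminal", "d"),
}


def strings(i):
    """All symbol strings spelled out from node i onward (acyclic graphs only)."""
    if i == 0:
        return [""]                  # end of an alternative
    out = []
    while i != 0:                    # visit the alternatives linked by lp
        node = graph[i]
        head = "" if node.kind == "eps" else node.val
        out += [head + tail for tail in strings(node.rp)]
        i = node.lp
    return out


print(strings(1))
```

The walk reproduces the two unfactorized alternatives abc and acd, which shows that the factorized graph describes the same right-hand sides with one node fewer.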
2.38 Definition  Alternative chain

An alternative chain is the ordered set of all nodes of a top-down graph that are linked together by left pointers. A node pointed to by a right pointer is the start of an alternative chain. A node without a left pointer is the end of an alternative chain.

We can define nodes that are not linked by left pointers as also belonging to an alternative chain. In this case the alternative chain consists of the node alone.

2.39 Example  Alternative chains

In the top-down graph of Fig. 2.9, symbols are distinguished by subscripts. The graph contains the alternative chains

    {+_1, -_2, T_3}   {T_3}   {+_4, -_6, ε_8}   {T_5}   {T_7}

Note that node T_3 belongs to two alternative chains.

[Figure: top-down graph for E with the nodes +_1, -_2, T_3, +_4, T_5, -_6, T_7, ε_8]

Fig. 2.9 Top-down graph for expressions with indexed symbols

2.40 Definition  Match

An input symbol x and a node of the top-down graph with symbol sy match (i.e. fit together) if one of the following conditions is met:

1. sy is a terminal and x = sy;
2. sy is a nondeletable nonterminal that may start with x;
3. sy is a deletable nonterminal, and sy can start with x or x can follow the node sy;
4. sy is an ε-node and x can follow the node sy;
5. sy is an any-node and x matches no other node in the alternative chain to which the any-node belongs.

In order to select a node loc uniquely from an alternative chain using a lookahead symbol x, x must match only one alternative:
2.41 LL(1) conditions for top-down graphs

An alternative chain is LL(1) if an arbitrary input symbol matches at most one of its nodes. A top-down graph is LL(1) if all of its alternative chains are LL(1).

The top-down graph of Fig. 2.9 is therefore LL(1) if T cannot start with + or - and if E cannot be followed by + or - (these symbols would match the ε-node).

Since each EBNF production corresponds to a top-down graph, the LL(1) conditions for top-down graphs are also the LL(1) conditions for EBNF grammars. In order to check if an EBNF grammar is LL(1), it is easiest to generate its top-down graph and check if it meets the LL(1) conditions. The LL(1) conditions for EBNF grammars can also be derived from the definition of the EBNF grammar alone without constructing a top-down graph. However, this is cumbersome and results in no new insights. We therefore omit the description and leave the task to the interested reader.

LL(1) top-down graphs and grammars of programming languages

If top-down graphs are to have practical value, one must be able to represent the grammars of programming languages as LL(1) top-down graphs, and therefore as LL(1) EBNF grammars. We may ask, therefore, whether this is possible without exception, or whether there are constructs that resist an LL(1) representation, and if so, what can be done about it.

First of all, LL(1) violations by left-recursive productions and by the start of several alternatives with the same symbol can easily be avoided in top-down graphs and in EBNF notation. Remaining LL(1) violations can usually be removed with various tricks that are determined with insight into the particular situation. As an aid for the 'grammar designer', we will treat several typical cases and distinguish between the following five methods:

1. substitution and factorization;
2. alphabet extension;
3. syntactic extension;
4. acceptance of non-LL(1) constructs;
5. miscellaneous transformations.

Substitution and factorization. Consider a production with two alternatives that start with different nonterminals X and Y, where X and Y can start with the same symbol (terminal or nonterminal). Then it is often possible to
replace the symbols X and Y by the right-hand sides of their productions, and to extract their common starting string by left-factorization. This can be simple and obvious, as in the various DO instructions of PLM/80 (and similarly in PL/I):

    statement      = ... | dostatement | whilestatement | forstatement | casestatement | ... .
    dostatement    = "DO" ";" block.
    whilestatement = "DO" "WHILE" expr ";" {statement} ending.
    forstatement   = "DO" ident "=" expr "TO" expr ["BY" expr] ";" {statement} ending.
    casestatement  = "DO" "CASE" expr ";" {statement} ending.

By substitution and factorization this results in

    statement = ...
              | "DO" ( ";" block
                     | ( "CASE" expr ";"
                       | "WHILE" expr ";"
                       | ident "=" expr "TO" expr ["BY" expr] ";"
                       ) {statement} ending
                     )
              | ... .

However, it can also be difficult. In grammars such as that of Modula-2, a factor can be a set or a designator, and both can begin with an identifier:

    factor     = ... | designator [actpars] | set | ... .
    designator = qualident {"." ident | "[" exprlist "]" | "↑"}.
    set        = [qualident] "{" [elementlist] "}".
    qualident  = ident {"." ident}.

Note that even the production for designator taken alone is not LL(1). For instance, ident.ident may be simply a qualident, or a qualident followed by "." ident. The removal of the LL(1) conflict consists of combining designator and set into a new symbol deset, and then splitting designator into ident and a remainder desigrest. After several substitutions and factorizations, the following LL(1) constructs result:

    factor    = ... | deset | ... .
    deset     = "{" [elementlist] "}"
              | ident {"." ident}
                ( ("↑" | "[" exprlist "]") desigrest [actpars]
                | "{" [elementlist] "}"
                | [actpars]
                ).
    desigrest = {"." ident | "[" exprlist "]" | "↑"}.

The equivalence of the old and new constructs can no longer be easily seen.

Alphabet extension. In selecting an alternative, it is fairly common for two lookahead symbols to be necessary to find the right one. The main example of this is when labels appear in front of statements:

    statement = [ident ":"] (ident ":=" expr | ifstatement | ...).

An ident at the beginning of a statement may be a label or the left part of an assignment. This can only be determined by the symbol following ident. This conflict can often be resolved by extending the terminal alphabet. In the preceding case, the word label can be added to the alphabet, and the lexical analyzer can be required to supply a label instead of an ident if ident is followed by a ':'. In this case, the lexical analyzer is used to resolve the LL(1) conflict.

This method leads to complications if the lexical analyzer is required to carry out a wider inspection of context to determine whether or not to substitute two terminals by another. For example, in Algol 60, 'ident:' does not always mean the label of an instruction. An identifier may also appear in a declaration, as in ARRAY(n : m). In such cases, the lexical analyzer is no longer independent of the syntax analyzer since it must consider the context.

Syntactic extension. In Algol 60 there exist multiple assignments, such as

    assignment = designator ":=" {designator ":="} expr.

where expr can start with designator. This LL(1) violation is very nasty. It can be removed by 'substitution and factorization', but this is very cumbersome (the reader should try it). It is easier to 'expand' the designator inside the curly brackets to expr. This requires the introduction of an additional production for assignrest.

    assignment = designator ":=" assignrest.
    assignrest = expr [":=" assignrest].
The syntactic extension must be compensated by a semantic restriction. If in *e production for assignrest the right-recursive part is present, expr must be restricted to be a designator. This can be achieved by the introduction of a hoolean attribute isdesignator. Anticipating knowledge from Chapter 3, this
52 Syntax Chap. 2 may be written as an attributed grammar as follows: assignrest - exPrfisdesignator [":=" where (isdesignator) assignrest]. This means: by syntactic extensions, portions of the language definition are moved from syntax to static semantics. Acceptance of non-LL(l) constructs. If it is known that the parser tries to match the alternatives in the order they are written, some LL(1) violations can be left alone. The best known case is the dangling else: ifstatement = "IF" expr "THEN" statement ["ELSE" statement]. Although this construct is not LL(1), and is even ambiguous (see Example 2.22), it can be left alone if one can be sure that the parser, having recognized the statement following THEN, first tries to detect the optional ELSE, and only regards the entire if statement as complete if there is no ELSE. Other transformations. Sometimes, a grammar that is not LL(1) can be transformed into an equivalent LL(1) grammar by simple transformations that do not fall into any of the four categories above. For example, in Algol 60, a block is defined as block = head ";" body. head ■ "begin" declaration {";" declaration}. This construct is not LL(1) since the semicolon is used in a dual role. It separates adjacent declarations and it separates body from head. The solution is simple: The grammar can be transformed so that the semicolon becomes a terminator instead of a separator. block = head body head = "begin" declaration ";" {declaration ";"} The necessity of such transformations, their difficulty, and the uncertainty of executing these transformations correctly is a weakness of the LL-method, and often a cause for criticism. In bottom-up analyzable LR(1) grammars, no transformations, or only a few, are needed, so research has been focused on the LR-method. However, syntax is but one aspect. 
What is gained with the LR-method must be paid for in the connection of semantics to syntax: this connection is much less flexible in the LR-method than in the LL-method, often leads to violations of the LR-property, and then also requires transformations. In addition, the LL(1)-method is much easier to understand than the LR-method. This results in easier transformations and more understandable error messages.
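The treatment of the dangling else described above can be illustrated with a small recursive-descent sketch. It is not taken from Coco; the token spellings "e" (standing for any expression) and "s" (standing for any simple statement) and the function name parse_statement are assumptions made only for this illustration. The point is that the parser tries the optional ELSE immediately after the statement following THEN, so an ELSE is attached to the nearest open IF.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next position).

    Hypothetical token spellings: "e" is any expression, "s" any simple statement.
    """
    def expect(sym, at):
        if at >= len(tokens) or tokens[at] != sym:
            raise SyntaxError(f"{sym} expected at position {at}")
        return at + 1

    if pos < len(tokens) and tokens[pos] == "IF":
        pos = expect("IF", pos)
        pos = expect("e", pos)
        pos = expect("THEN", pos)
        then_part, pos = parse_statement(tokens, pos)
        else_part = None
        # Try the optional ELSE first; the if statement is regarded as
        # complete only if no ELSE follows.
        if pos < len(tokens) and tokens[pos] == "ELSE":
            else_part, pos = parse_statement(tokens, pos + 1)
        return ("if", then_part, else_part), pos

    pos = expect("s", pos)
    return "s", pos


tree, end = parse_statement("IF e THEN IF e THEN s ELSE s".split())
print(tree)  # the ELSE belongs to the inner IF
```

With this order of attempts the ambiguity never becomes visible to the parser: the inner if statement consumes the ELSE before the outer one gets a chance to look for it.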
2.4 The G-code

A top-down graph that resides in memory is a useful way of representing a grammar. It already requires little space, but it can be significantly compressed further. Let us consider the grammar of arithmetic expressions:

    S = expr.
    expr = term {"+" term}.
    term = factor {"*" factor}.
    factor = v | "(" expr ")".

Now, let us add the production

    S' = S eofsy

where S' is the new sentence symbol and eofsy (= end of file) a new terminal. This trick ensures that each sentence terminates with the same symbol eofsy and that there is no empty sentence if S can be derived into the empty string.

[Fig. 2.10 Top-down graph for an expression: graphic representation]

In Fig. 2.10 we see a top-down graph of a grammar with 15 nodes. In Fig. 2.11 we see the internal memory representation described in Section 2.3. If we assume one byte each for the components typ and val, and two bytes each for lp and rp, then the table requires 15 * 6 = 90 bytes.

Compacting can be achieved by partitioning the nodes according to their types, and by coding the individual types so that they do not contain any unnecessary information. The G-code (grammar code) that we use is such a code. For syntax analysis the elements of the G-code behave as instructions and therefore they are written as instructions. Sequential G-code instructions are sequentially executed. They correspond to nodes in the top-down graph
that are connected by right pointers. Definition 2.42 defines the G-code as far as it is relevant for the representation of a top-down graph.

     i   typ(i)  val(i)   lp(i)  rp(i)
     1   nt      S          0      2     rule for S'
     2   t       eofsy      0      0
     3   nt      expr       0      0     rule for S
     4   nt      term       0      5     rule for expr
     5   t       "+"        7      6
     6   nt      term       0      5
     7   eps                0      0
     8   nt      factor     0      9     rule for term
     9   t       "*"       11     10
    10   nt      factor     0      9
    11   eps                0      0
    12   t       v         13      0     rule for factor
    13   t       "("        0     14
    14   nt      expr       0     15
    15   t       ")"        0      0

Fig. 2.11 Top-down graph for an expression: representation in memory

The G-code is augmented by tables containing the lookahead symbols. With each nonterminal symbol sy (not with each nonterminal node) there is associated a set first(sy), containing its terminal start symbols. The operand nr of an ε-instruction (EPS and EPSA) refers to an array epsset. Thus epsset(nr) contains all terminals that match the corresponding ε-node (see Definition 2.40). The operand nr of an ANYA-instruction refers to an array anyset. Thus anyset(nr) contains all terminals that match the corresponding any-node. In summary, these G-code lookahead sets have the following data structures:

    first:  array(1:maxnt) of Symbolset
    epsset: array(1:maxeps) of Symbolset
    anyset: array(1:maxany) of Symbolset

If the lookahead sets are stored bitwise, they do not require much memory.

It can be seen that each node of the top-down graph corresponds to a G-code instruction. The G-code instructions RET and JMP are added at the end of productions and loops where the linear execution sequence is interrupted.
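The remark on bitwise storage of the lookahead sets can be made concrete with a short sketch. The numbering of the terminals and the helper names symbolset and contains are assumptions made for this illustration only; the sets themselves are the ones belonging to the expression grammar of this section. Each set is one integer in which bit i is set if terminal number i is a member.

```python
# Hypothetical numbering of the terminals of the expression grammar.
TERMINALS = ["eofsy", "v", "+", "*", "(", ")"]
BIT = {t: 1 << i for i, t in enumerate(TERMINALS)}


def symbolset(*symbols):
    """Build a bitwise symbol set from terminal names."""
    bits = 0
    for s in symbols:
        bits |= BIT[s]
    return bits


def contains(bits, symbol):
    """Membership test: a single AND operation."""
    return (bits & BIT[symbol]) != 0


# Lookahead sets of the expression grammar, one small integer each.
first = {nt: symbolset("v", "(") for nt in ("S", "expr", "term", "factor")}
epsset = {1: symbolset("eofsy", ")"), 2: symbolset("eofsy", ")", "+")}

print(contains(first["expr"], "("), contains(epsset[1], "+"))
```

For a grammar with six terminals every set fits into a single byte, and the test "typ in first(sy)" used by the parser becomes one masking operation.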
2.42 Definition G-code (incomplete)

    Instruction    Bytes  Description
    T    sy          2    terminal
                          If the next input symbol is sy then recognize it, else report an error.
    TA   sy adr      4    terminal with alternative
                          If the next input symbol is sy then recognize it, else go to adr.
    NT   sy          2    nonterminal
                          If the next input symbol is a terminal start of sy then step through its production, else report an error.
    NTA  sy adr      4    nonterminal with alternative
                          If the next input symbol is a terminal start of sy then step through its production, else go to adr.
    ANY              1    any
                          Recognize the next input symbol.
    ANYA nr adr      4    any with alternative
                          If the next input symbol is contained in the symbol set indicated by nr then recognize it, else go to adr.
    EPS  nr          2    epsilon
                          If the next input symbol is contained in the successor set indicated by nr then recognize the empty string, else report an error.
    EPSA nr adr      4    epsilon with alternative
                          If the next input symbol is contained in the successor set indicated by nr then recognize the empty string, else go to adr.
    JMP  adr         3    jump
                          Go to adr.
    RET              1    return
                          Return from the production of a nonterminal.

The operation code and the operands sy and nr are 1 byte each; adr is 2 bytes.

The following G-code results for the grammar shown in the top-down graph of Fig. 2.10:

     1  NT   S             The production S' = S eofsy.
     3  T    eofsy
     5  RET
     6  NT   expr          The production S = expr.
     8  RET
     9  NT   term          The production expr = term {"+" term}.
    11  TA   "+"  20
    15  NT   term
    17  JMP  11
    20  EPS  1
    22  RET
    23  NT   factor        The production term = factor {"*" factor}.
    25  TA   "*"  34
    29  NT   factor
    31  JMP  25
    34  EPS  2
    36  RET
    37  TA   v    42       The production factor = v | "(" expr ")".
    41  RET
    42  T    "("
    44  NT   expr
    46  T    ")"
    48  RET

The lookahead sets are:

    first(S)      = {v, "("}
    first(expr)   = {v, "("}        epsset(1) = {eofsy, ")"}
    first(term)   = {v, "("}        epsset(2) = {eofsy, ")", "+"}
    first(factor) = {v, "("}

anyset is empty since the grammar contains no any-symbol.

The total amount of G-code is 48 bytes, which is slightly more than one-half of the top-down graph.

In Coco, first of all a top-down graph is generated. It is then used to check several properties of the grammar, and to calculate the start and successor sets. Finally the graph is transformed into G-code, and this is the ultimate structure in which the grammar is stored.

2.5 Parsing with the G-code

Parsing becomes quite simple with the G-code since the G-code itself is already a parsing program. To make a parser, it is only necessary to code an interpreter for a G-code program. In this section we will develop such a parser without error handling. In the next section we will add the error handling.

Assumptions

We will summarize here the assumptions on which parsing with the G-code is based.
1. The G-code is derived from a top-down graph that meets the LL(1) conditions.
2. If S is the sentence symbol, then the top-down graph and the G-code are expanded by the production S' → S eofsy where eofsy is the terminal end-of-file that does not appear in the original grammar. The first G-code instruction of this production has the address 1.
3. The symbol string to be parsed is supplied by a lexical analyzer, which provides the next input symbol in the variable typ for each call. After reaching the last source symbol, the lexical analyzer supplies the symbol eofsy.
4. The parsing algorithm uses a stack of actual length lacts (= actual length of stack) to store the addresses that follow the nonterminal instructions currently being processed (these are the 'return addresses' of the currently parsed nonterminals).

Overview

The parsing algorithm executes the G-code program which is controlled by the input string to be recognized. It starts at address 1 and ends at the instruction for the symbol eofsy. Depending on the current input symbol typ and the current G-code instruction several courses of action are possible.

When the algorithm tries to recognize a terminal there are two possibilities: if it succeeds then it moves to the next symbol; if it fails then it goes on to the next alternative (if there is any).

When the algorithm tries to recognize a nonterminal, there are also two possibilities: if the input string and the nonterminal match then the algorithm pushes the address of the next instruction on the stack and jumps to the first G-code instruction of the nonterminal; if they do not match then it goes on to the next alternative (if there is any).

At the end of productions, the 'return address' is popped from the stack with RET, and the algorithm continues from there on.

When an error occurs, error handling and synchronization take place, after which parsing continues as if no error had occurred.
The analysis ends when typ = eofsy and the corresponding G-code instruction is T eofsy.

The parsing algorithm is called Parse. It returns a boolean variable correct which will be true if the analyzed input text is syntactically correct. Parse is an interpreter that has the following structure:
    Parse(↑correct):
      pc:=1;                                    -- program counter
      loop
        opcode:=code(↓pc);                      -- G operation code
        case opcode of
          t:   execute instruction "T sy" and change pc
        | ta:  execute instruction "TA sy adr" and change pc
        ...
        | jmp: execute instruction "JMP adr"
        end
      end
    end Parse

Inside the loop, a value is assigned to the result parameter and the loop is terminated if typ = eofsy.

The simplified G-code parsing algorithm

First we will present a simplified version of Parse that does not contain the instructions ANY, ANYA, EPS, EPSA, and does not have any error actions. We further assume that nonterminals are not deletable. For the description of Parse in Adele, we will use the following routines:

    Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc)
        returns the parameters of the G-code instruction starting at address pc. (An operand that does not appear in the actual instruction returns an undefined value of the corresponding parameter.) nextpc is the address of the next instruction.
    NewSym(↑typ)
        returns the next input symbol.
    Root(↓sy): integer
        returns the address of the first G-code instruction for the production for the nonterminal sy.

By using these actions, the simplified algorithm is as follows:

2.43 Algorithm Parse (simplified)

    Parse(↑correct):
      param  correct: boolean;                  -- correctness indicator
      const  eofsy = ... ;                      -- end of file symbol
      type   Instruction = (t,ta,nt,nta,jmp,ret);
      local  adr:    integer;                   -- instruction part adr
             first:  array of Symbolset;        -- lookahead symbol sets
             lacts:  integer;                   -- actual stack length
             nextpc: integer;                   -- addr. of next G-code instr.
             nr:     integer;                   -- instruction part nr
             opcode: Instruction;               -- instruction part opcode
             pc:     integer;                   -- program counter
             stack:  array of integer;          -- nonterminals worked on
             sy:     integer;                   -- sy part of G-code instr.
             typ:    integer;                   -- current source symbol
    begin
      pc:=1; lacts:=0; NewSym(↑typ);            -- init. and read first symbol
      loop
        Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc);   -- get instruction at pc
        case opcode of
          t:                                    -- term. without alternative
            if typ=sy                           -- must match
            then
              if typ=eofsy then
                correct:=true; exit             -- terminate successfully
              end;
              pc:=nextpc; NewSym(↑typ)          -- advance and read
            else correct:=false; exit           -- terminate unsuccessfully
            end
        | ta:                                   -- terminal with alternative
            if typ=sy                           -- may match
            then pc:=nextpc; NewSym(↑typ)       -- advance and read
            else pc:=adr                        -- goto alternative
            end
        | nt:                                   -- nonterm. without alternative
            if typ in first(sy)                 -- must match
            then lacts:=lacts+1; stack(lacts):=nextpc; pc:=Root(↓sy)
            else correct:=false; exit           -- terminate loop if error
            end
        | nta:                                  -- nonterminal with alternative
            if typ in first(sy)                 -- may match
            then lacts:=lacts+1; stack(lacts):=nextpc; pc:=Root(↓sy)
            else pc:=adr                        -- goto alternative
            end
        | jmp:                                  -- jump to next instruction
            pc:=adr
        | ret:                                  -- return
            pc:=stack(lacts); lacts:=lacts-1    -- find follower in stack
        end -- case
      end -- loop
    end Parse.

The complete G-code parsing algorithm

We will now add the interpretation of the instructions and properties that were left out in the previous section, and provide the following explanations.

The instruction ANY recognizes any source symbol, and ANYA recognizes any source symbol that is a member of the lookahead set belonging to this instruction. The instructions EPS and EPSA recognize the empty string if the source symbol matches their lookahead set.
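Before error handling is added, the interpreter can be tried out on the expression G-code of Section 2.4. The following is a minimal sketch in the spirit of Algorithm 2.43, with the EPS instruction added so that the listed G-code can actually be run. The representation is an assumption made for this illustration: instructions are held in a dictionary keyed by address instead of a byte array, symbols are strings, and the names GCODE, SIZE, ROOT, FIRST and EPSSET are hypothetical. As in the simplified algorithm, the first error terminates the analysis.

```python
# Instruction lengths in bytes, as given in Definition 2.42.
SIZE = {"T": 2, "TA": 4, "NT": 2, "EPS": 2, "JMP": 3, "RET": 1}

# G-code of the expression grammar (address -> instruction).
GCODE = {
    1: ("NT", "S"), 3: ("T", "eofsy"), 5: ("RET",),          # S' = S eofsy
    6: ("NT", "expr"), 8: ("RET",),                          # S = expr
    9: ("NT", "term"), 11: ("TA", "+", 20), 15: ("NT", "term"),
    17: ("JMP", 11), 20: ("EPS", 1), 22: ("RET",),           # expr
    23: ("NT", "factor"), 25: ("TA", "*", 34), 29: ("NT", "factor"),
    31: ("JMP", 25), 34: ("EPS", 2), 36: ("RET",),           # term
    37: ("TA", "v", 42), 41: ("RET",),
    42: ("T", "("), 44: ("NT", "expr"), 46: ("T", ")"), 48: ("RET",),  # factor
}
ROOT = {"S": 6, "expr": 9, "term": 23, "factor": 37}
FIRST = {nt: {"v", "("} for nt in ROOT}
EPSSET = {1: {"eofsy", ")"}, 2: {"eofsy", ")", "+"}}


def parse(symbols):
    """Return True if symbols (a list ending in 'eofsy') is a sentence."""
    source = iter(symbols)
    typ = next(source)                  # read first symbol
    pc, stack = 1, []
    while True:
        instr = GCODE[pc]
        op = instr[0]
        nextpc = pc + SIZE[op]
        if op == "T":                   # terminal: must match
            if typ != instr[1]:
                return False
            if typ == "eofsy":
                return True
            pc, typ = nextpc, next(source)
        elif op == "TA":                # terminal with alternative: may match
            if typ == instr[1]:
                pc, typ = nextpc, next(source)
            else:
                pc = instr[2]
        elif op == "NT":                # nonterminal: must match
            if typ not in FIRST[instr[1]]:
                return False
            stack.append(nextpc)        # push follower
            pc = ROOT[instr[1]]
        elif op == "EPS":               # empty string: successor must match
            if typ not in EPSSET[instr[1]]:
                return False
            pc = nextpc
        elif op == "JMP":
            pc = instr[1]
        elif op == "RET":
            pc = stack.pop()            # find follower in stack


print(parse(["v", "+", "v", "*", "v", "eofsy"]))
print(parse(["v", "+", "eofsy"]))
```

The sketch relies on assumption 3 of this section: the symbol sequence must end with eofsy, otherwise the iterator is exhausted before the analysis terminates.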
In the case of an error, the analysis shall not be terminated. Rather, the error handler

    Error(↕pc ↕altroot)

will be executed. Error requires as parameters the address pc of the non-matching G-instruction and the address altroot (root of alternative chain) of the first G-instruction of the alternative chain in which the error occurred. Error synchronizes by skipping input symbols, changes pc and altroot, and sets correct to the value false. Error is thus local to Parse.

Every time an input symbol has been successfully parsed, the next symbol can be read, and altroot can be set to a new alternative chain. For semantic reasons, however, these actions are delayed until the input symbol is actually required by the parser. Instead of reading a symbol, the variable mustread is set to true.

Furthermore, in the complete version we will consider the possibility that a nonterminal X can be derived into the empty string. This can be tested with the function

    Deletable(↓X): boolean

Such a nonterminal is always recognized, even if the current input symbol does not belong to its terminal start symbols (explanation in Section 7.3.3). This requires the interpretation of the instructions NT and NTA to be extended. Expanded in this way, the algorithm Parse has the following complete form:

2.44 Algorithm Parse (complete)

    Parse(↑correct):
      param  correct: boolean;                  -- correctness indicator
      const  eofsy = ... ;                      -- end of file symbol
      type   Instruction = (t,ta,nt,nta,any,anya,eps,epsa,jmp,ret);
      local  adr:      integer;                 -- instruction part adr
             altroot:  integer;                 -- root of alternative chain
             anyset:   array of Symbolset;      -- lookahead symbol set
             epsset:   array of Symbolset;      -- lookahead symbol set
             first:    array of Symbolset;      -- lookahead symbol set
             lacts:    integer;                 -- actual stack length
             mustread: boolean;                 -- typ is consumed
             nextpc:   integer;                 -- address of G-instruction
             nr:       integer;                 -- instruction part nr
             opcode:   Instruction;             -- instruction part opcode
             pc:       integer;                 -- program counter
             stack:    array of integer;        -- nonterminals worked on
             sy:       integer;                 -- instruction part sy
             typ:      integer;                 -- current source symbol
      Error(...): ... end Error;                -- local error procedure
    begin
      pc:=1; altroot:=1;                        -- initialize
      mustread:=true; lacts:=0;
      loop
        Decode(↓pc ↑opcode ↑sy ↑nr ↑adr ↑nextpc);   -- get instruction at pc
        if mustread then                        -- read next source symbol
          NewSym(↑typ); altroot:=pc; mustread:=false
        end;
        case opcode of
          t:                                    -- terminal without alternative
            if typ=sy                           -- must match
            then
              if typ=eofsy then
                correct:=true; exit             -- terminate loop successfully
              end;
              pc:=nextpc; mustread:=true        -- advance
            else Error(↕pc ↕altroot)            -- sets correct:=false
            end
        | ta:                                   -- terminal with alternative
            if typ=sy                           -- may match
            then pc:=nextpc; mustread:=true     -- advance
            else pc:=adr                        -- goto alternative
            end
        | nt:                                   -- nonterm. without alternative
            if typ in first(sy) or Deletable(↓sy)   -- must match
            then
              lacts:=lacts+1; stack(lacts):=nextpc;   -- push follower
              pc:=Root(↓sy); altroot:=pc        -- parse rule for nonterminal
            else Error(↕pc ↕altroot)            -- sets correct:=false
            end
        | nta:                                  -- nonterminal with alternative
            if typ in first(sy) or Deletable(↓sy)   -- may match
            then
              lacts:=lacts+1; stack(lacts):=nextpc;   -- push follower
              pc:=Root(↓sy); altroot:=pc        -- parse rule for nonterminal
            else pc:=adr                        -- goto alternative
            end
        | any:                                  -- any without alternative
            pc:=nextpc; mustread:=true          -- advance
        | anya:                                 -- any with alternative
            if typ in anyset(nr)
            then pc:=nextpc; mustread:=true     -- advance
            else pc:=adr                        -- goto alternative
            end
        | eps:                                  -- epsilon
            if typ in epsset(nr)                -- must match
            then pc:=nextpc                     -- advance
            else Error(↕pc ↕altroot)            -- sets correct:=false
            end
        | epsa:                                 -- epsilon with alternative
            if typ in epsset(nr)                -- may match
            then pc:=nextpc                     -- advance
            else pc:=adr                        -- goto alternative
            end
        | jmp:                                  -- jump to next instruction
            pc:=adr
        | ret:                                  -- return
            pc:=stack(lacts); lacts:=lacts-1;   -- find follower in stack
            altroot:=pc
        end -- case
      end -- loop
    end Parse.

2.6 Error handling

Principle

A syntax error arises in one of three situations: (1) the input symbol typ does not match the symbol sy in the G-code instruction T; (2) typ is not a terminal start symbol of the instruction NT; (3) typ is not a terminal successor of the instruction EPS. In any of these situations, the variable altroot contains the address of the alternative chain in which the error occurred and the stack contains the return addresses of all nonterminals that are currently being processed. This is sufficient information to collect all terminals that can be used to continue the analysis. The following example illustrates the situation.

2.45 Example Error situation

Consider the grammar fragment:

    program = declarations body end.
    declarations = ...
    body = statement {statement}.
    statement = ... | "if" relation "then" body ...
    relation = expr relop expr.
    relop = ">" | ">=" | "=" | "<>" | "<=" | "<".
    expr = ...

and the input text

    ... if a:=b then c:=d end ...

When the syntax analyzer detects the error caused by the ':=', the situation shown in Fig. 2.12 has been reached. The boxes in this figure enclose the grammar symbols of the G-code instructions whose addresses are in the stack.
[Fig. 2.12 Partial syntax tree of an erroneous translation of the instruction if a:=b then c:=d end]

The last input symbol which was correctly recognized is a. It was recognized as expr. Then relop must follow. Since relop cannot start with ':=' the procedure Error(↕pc ↕altroot) is called. The stack contains the addresses of the G-code instructions for the recognition of

    eofsy, end, statement, then

where eofsy is at the bottom of the stack and then is at the top of the stack.

We will now collect the so-called 'anchors', i.e. all terminals that are suitable for the resumption of the syntax analysis. They can be grouped into four classes:

1. All terminal start symbols of the alternative chain starting at altroot, because the erroneous symbol may have been added inadvertently by the coder, in front of a symbol of the unrecognized alternative chain. In the example, these are the symbols >, >=, =, <>, <=, <.
2. All terminal successors of the alternative chain at altroot, because the erroneous symbol may appear in place of a symbol of the unrecognized alternative chain. In the example, this set consists of the beginnings of expr: v, c, +, -, (.
3. The terminal start symbols of all symbols in the stack, and of the alternative chains beginning with them. With these, syntax analysis can be resumed after a non-recognized nonterminal. In the example these are the symbols then, end, eofsy and the set first(statement).
4. All terminal successors of the alternative chains whose addresses are in the stack. In the example, these are all terminal start symbols of body since body follows then, and all terminal start symbols of statement since statement follows statement.

While the inclusion of items 1 to 3 in the set of anchors is plausible, the inclusion of item 4 seems rather arbitrary. We could justify this by the fact that items 3 and 4 are symmetric to items 1 and 2, but there is a heuristic reason as well. In a grammar where the ';' is a statement separator rather than a statement terminator, without rule 4 the set of anchors would contain the ';' but not the start symbols of statements. Then, in the case of a missing ';' between statements, which is a common error, the next statement would be skipped. Rule 4 prevents this by adding the start symbols of statements to the set of anchors. Similar errors, e.g. the suppression of a comma between expressions, are also quite likely to occur.

Now, input symbols are skipped until one of them appears in the set of the anchors. In the worst case this appears at the end of the input text, since eofsy is always among the anchors.

Next, the stack must be corrected. If the anchor is a terminal start symbol of the alternative chain whose address is in stack(t), analysis will be resumed at this address and the stack length will be reduced to t - 1. In Example 2.45, only ':=' is skipped since b is a start symbol of expr, and the stack is not reduced.

In summary, we can describe the principle of error handling as follows:

2.46 Principle of error handling

An error is detected if an alternative chain is unsuccessfully traversed up to its end.
Then the error is flagged and the analysis must be synchronized. The synchronization consists of collecting a set of anchors and of skipping the input text up to the first input symbol that is contained in the set of anchors. With it, the analysis can be resumed at the address pc of the anchor. During this process the stack is reduced if necessary so that at the end of the error handling the following assertion holds:

    Starting with the G-code instruction at pc the analysis can be continued with the current input symbol typ (typ matches the alternative chain at pc). The stack contains the return addresses of all nonterminals currently under process when continuing the analysis with pc.
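The skipping step of Principle 2.46 can be sketched as follows. This is an illustration under stated assumptions, not the routine used in Coco: the anchors are assumed to be given as a dictionary that maps each anchor terminal to a pair (newpc, newlacts), the stack is a Python list, and all G-code addresses in the usage example (3, 40, 60, 95, 120) are invented for the situation of Example 2.45.

```python
def synchronize(typ, source, anchors, stack):
    """Skip input symbols up to the first anchor and cut back the stack.

    anchors maps a terminal to (newpc, newlacts); this representation is
    an assumption of the sketch. Returns the anchor symbol and the address
    at which the analysis resumes.
    """
    # eofsy is always among the anchors, so the loop terminates at the
    # latest at the end of the input text.
    while typ not in anchors:
        typ = next(source)
    newpc, newlacts = anchors[typ]
    del stack[newlacts:]                # reduce the stack if necessary
    return typ, newpc


# Situation of Example 2.45 with hypothetical addresses: the stack holds the
# return addresses for eofsy, end, statement, then (bottom to top).
stack = [3, 40, 60, 95]
anchors = {"v": (120, 4), "then": (95, 3), "end": (40, 1), "eofsy": (3, 0)}
rest = iter(["v", "then", "v", ":=", "v", "end", "eofsy"])

print(synchronize(":=", rest, anchors, stack), stack)
```

In this run only the erroneous ':=' is skipped, because the following variable is an anchor of the current production, and the stack keeps its full length.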
This error handling has two remarkable features:

1. It is completely independent of the syntax of the input language.
2. Anchors are collected only if an error is detected. It is therefore completely dynamic and starts anew for each error. Hence, the presence of error handling does not reduce the parsing speed in case of a correct input string. The synchronization itself is expensive, but, since errors are infrequent, this is only a slight disadvantage.

The algorithm Error

From the preceding section, the basic structure of the algorithm Error is obvious now:

2.47 Algorithm Error (basic structure)

    Error(↕pc ↕altroot):
      global correct: boolean;
             lacts: integer;                    -- actual stack length
    begin
      correct:=false;
      Print error message;
      Collect anchors;
      Skip input symbols up to the first anchor;
      Correct pc, altroot and lacts
      -- It is synchronized. The analysis can continue
    end Error

Error messages

The error messages are also independent of the input language. At the error location, we simply extract all expected symbols from the G-code and list them. In Example 2.45, the following error message will occur:

    ... if a:=b then c:=d end ...
            ↑ relop expected

This message is sufficient for most purposes. In Coco we also provide the option for the user to output his own error messages (see Section 5.2.2).

The collection of anchors

Since, after synchronization, parsing is resumed with a new G-code instruction newpc and with a new stack length newlacts, anchors are collected as triples:

    (newtyp, newpc, newlacts)

A procedure Triple produces a triple list in which the following triple categories are included:
1. the terminal start symbols of the alternative chain beginning with altroot;
2. the terminal successors of the alternative chain beginning with altroot;
3. the terminal start symbols of all alternative chains whose addresses are in the stack;
4. the terminal successors of all alternative chains whose addresses are in the stack.

If a terminal belongs to more than one of the four categories, category 1 has priority (because no symbol needs to be read). Category 2 has priority over categories 3 and 4 (because synchronization can take place in the same production where the error occurred). Of the anchors derived from the stack, the ones closest to the error location have priority, and the terminal start symbols of the stacked alternative chains have priority over their successors.

In order to fill the triple list with terminal start symbols and successors corresponding to the priority rules, we use the algorithms Fill and FillSucc. Entries made later overwrite earlier ones, so the categories are filled in the order of increasing priority. Hence, the algorithm Triple has the following form:

2.48 Algorithm Triple

    Triple(↓altroot):
      global triple list;
             stack: array of integer;
             lacts: integer;                    -- actual stack size
    begin
      triple list := empty;
      for i:=1 to lacts do
        FillSucc(↓stack(i) ↓i-1);               -- class 4
        Fill(↓stack(i) ↓i-1)                    -- class 3
      end;
      FillSucc(↓altroot ↓lacts);                -- class 2
      Fill(↓altroot ↓lacts)                     -- class 1
    end Triple

As a concrete data structure of the triple list, we use two arrays newpc and newlacts, which are indexed with the maxt + 1 terminals of the grammar:

    newpc, newlacts: array(0:maxt) of integer

The algorithms Fill and FillSucc use the following procedure to obtain G-code instructions:

    GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc)

which supplies the G-code instruction at pc. The two last parameters have the meaning:
    nextpc: Address of the first 'symbol-recognizing' instruction (T, TA, NT, ANY, ANYA) which follows the instruction at pc in the same production, or 0 if no such instruction exists.
    altpc:  Address of the first 'symbol-recognizing' instruction which is an alternative of the instruction at pc, or 0 if no such instruction exists.

Fill and FillSucc can now be described as follows:

2.49 Algorithm Fill

    Fill(↓firstpc ↓lacts):
      global newpc, newlacts: array(0:maxt) of integer;
    begin
      pc:=firstpc;
      while pc≠0 do
        GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc);
        case opcode of
          t,ta:
            newpc(sy):=pc; newlacts(sy):=lacts
        | nt,nta,nts,ntas:
            for all x ∈ first(↓sy) do
              newpc(x):=pc; newlacts(x):=lacts
            end
        | any,anya:                             -- nothing (eps and ret do not exist)
        end;
        pc:=altpc
      end
    end Fill

2.50 Algorithm FillSucc

    FillSucc(↓startpc ↓lacts):
      global newpc, newlacts: array(0:maxt) of integer;
    begin
      pc:=startpc;
      while pc≠0 do
        GetSymInstr(↓pc ↑opcode ↑sy ↑nextpc ↑altpc);
        if nextpc≠0 then Fill(↓nextpc ↓lacts) end;
        pc:=altpc
      end
    end FillSucc

Heuristic improvements

This synchronization procedure works well in most cases and synchronizes rapidly. However, it is not uncommon for the synchronization to be incorrect, causing spurious error messages or the skipping of longer text portions. The quality of the synchronization also depends on the grammar. It can be
improved by partitioning long grammar productions into several shorter ones. This increases the number of anchors.

We have improved the procedure with two heuristics, which are also independent of the grammar:

1. If several errors occur close together, we print only the first one, under the assumption that the remaining errors are spurious, resulting from the first one. We introduce an error distance, errdist, which is set to 0 after the handling of any error, and is increased by one for each input symbol read. If errdist is less than a predetermined limit errdistmin when an error occurs, no error message is given. We use errdistmin = 2, i.e. at least two symbols must have been recognized since the last error, otherwise a spurious error is assumed.
2. When a spurious error occurs, the stack may have already changed from the value when the original error occurred. Therefore, we save the stack at each original error, and restore it at a spurious error.

The heuristics only apply to the program Error and not to its subprograms. Error now has the final form:

2.51 Algorithm Error (with heuristic enhancements)

    Error(↕pc ↕altroot):
      global correct: boolean;
             lacts: integer;                    -- stack length
             errdist: integer;                  -- error distance
             errdistmin: integer;               -- minimal error distance
    begin
      correct:=false;
      if errdist>=errdistmin
      then Print error message; Collect the anchors; Save the stack
      else Restore the stack
      end;
      Skip input symbols up to the first anchor;
      Correct pc, altroot, and lacts;
      -- It is synchronized. The analysis can continue
      errdist:=0
    end Error

Coco includes the above error-handling method in the generated parser. A similar error handling was published by Spenke et al. [1984]. They assign weights to the anchors and make the use of an anchor for synchronization dependent upon its 'insertion overhead' and its 'reliability'.
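The two heuristics can be summarized in a small sketch. It covers only the bookkeeping of errdist and of the saved stack, not the collection of anchors. The class name ErrorFilter and its methods are hypothetical, and the initial value of errdist is an assumption chosen so that the very first error is always reported; the text does not state how errdist is initialized.

```python
class ErrorFilter:
    """Bookkeeping for the error-distance and stack-restoring heuristics."""

    def __init__(self, errdistmin=2):
        self.errdistmin = errdistmin
        self.errdist = errdistmin       # assumption: first error is reported
        self.saved_stack = []

    def symbol_read(self):
        """To be called for each input symbol read."""
        self.errdist += 1

    def error(self, stack):
        """Handle an error; return True if a message should be printed.

        For a spurious error the stack saved at the original error is
        restored in place.
        """
        original = self.errdist >= self.errdistmin
        if original:
            self.saved_stack = list(stack)      # save the stack
        else:
            stack[:] = self.saved_stack         # restore the stack
        self.errdist = 0
        return original


f = ErrorFilter()
stack = [3, 8]
print(f.error(stack))       # original error: message is printed
f.symbol_read()
stack.append(17)
print(f.error(stack))       # only one symbol later: spurious, stack restored
print(stack)
```

After the spurious error the stack again has the contents it had at the original error, so the synchronization starts from the same situation.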
3 Semantics

Syntax analysis checks a source program only for formal correctness. That is, it only determines whether the input string is a sentence of the given grammar. This function is shown in Fig. 3.1.

[Fig. 3.1 Parser: source program (character sequence), parser, result "recognized" or "not recognized"]

Translation into a target language presents the additional requirement that the source program must be transformed into the target program. The 'meaning' of the target program should be the same as that of the source program, i.e. the semantics should be retained. A program that does this is a compiler (Fig. 3.2).

[Fig. 3.2 Compiler: source program, compiler, target program]

A compiler emerges from a parser if the parser is able to emit so-called 'semantic actions' each time it has parsed some syntactic construct. The semantic actions in turn generate output symbols which constitute in their entirety the target program.
This chapter covers attributed grammars, which are presently the most common technique for the formal description of translation processes. To describe the translation the context-free grammar for the source program is enhanced by three items:

1. semantic actions, which describe the actions that must be performed during the translation;
2. attributes, which describe properties of the grammar symbols and their environment;
3. context conditions, which describe relationships between attributes.

We will introduce these three items one-by-one, then cover the formalism of the attributed grammar as a whole, and finally cover a subset of the attributed grammars, the so-called L-attributed grammars, used by Coco.

3.1 Semantic actions

The description of semantic actions can be inserted directly at the desired locations in the grammar productions, e.g. by means of the special delimiters sem ... endsem. For a left-to-right parsing of a production A → ω1 ω2, the execution of the semantic action statseq after parsing ω1 and before parsing ω2 can be described by inserting the semantic action between ω1 and ω2:

    A → ω1 sem statseq endsem ω2

This production is to be interpreted in such a way that, for the parsing of A, where syntax analysis proceeds from left to right, first ω1 is parsed, then the semantic action statseq is performed, and afterwards ω2 is parsed.

For the description of the semantic actions themselves there are no generally accepted conventions. We will use the language constructs of Adele or Modula-2.

3.1 Example Semantic actions

Given a grammar of an arbitrary sequence of zeros and ones:

    S → 0S | 1S | ε

The task consists of reversing a sentence σ of L(G(S)) to produce an output where the first input symbol is output as the last, the second input symbol is output as the next to last, and so on. This translation is simply written as
    S → 0 S   sem Write('0') endsem
      | 1 S   sem Write('1') endsem
      | ε

For a given input sentence, e.g. σ = 001, the semantic actions can be traced according to the syntax tree of Fig. 3.3. If parsing is performed top-down from left to right, the output string 100 results.

[Fig. 3.3 Syntax tree with semantic actions]

The next example will show that this method can also describe more difficult transformations.

3.2 Example Semantic actions

Given the grammar of the previous example, the task is to transform an input sequence of n zeros and m ones into an output sequence of the same length which contains all n zeros followed by all m ones, i.e. the sequence 0^n 1^m. This translation is described by

    S → 0 sem Write('0') endsem S
      | 1 S sem Write('1') endsem
      | ε

3.2 Attributes

Even for such a simple task as the transformation of the input sequence '79 + 83' into the output sequence '162', the grammar with semantic actions fails. In general, the input sentence of any two numbers connected by '+' to
produce an output sequence that shows the sum of the two numbers will fail. Why? When recognizing a constant, the lexical analyzer supplies only the terminal class c (as explained up to now). Thus, the parser 'sees' only the sequence c + c as input. A semantic action that produces the sum of the two numbers, however, is not satisfied with the terminal classes of the two numbers, but requires the values of the constants. These values are the semantic properties of the individual members of the terminal class c.

Thus, a lexical analyzer will have to supply two items for input symbols that are terminal classes: the type and the value of the input symbol. The symbol type (not to be confused with the data type) is the terminal symbol in the context of the grammar (variable, constant), and therefore a syntactic property; the symbol value is a semantic property.

By assigning an attribute to each terminal symbol that represents a terminal class, the semantic properties of terminal classes can be introduced into the formal language description. We write attributes as indices preceded by an arrow, whereby a constant now assumes the form c↑x, where x is of the type integer. The up-arrow shows that x is the result of the parsing of c, i.e. has the character of an output parameter. By the use of attributes, we can describe the task of reading and adding two constants connected by a plus sign as follows:

    S → c↑x + c↑y   sem Write(x+y) endsem

In general, attributes describe properties that are associated with a grammar symbol. Therefore, nonterminals can also have one or more attributes. For example, let the following three properties apply to the symbol expr: (1) the type of the expression, (2) the expression has no operators, and (3) the expression is translatable at compile-time.
Then we can assign these three attributes to expr by writing

    expr↑exprtype↑simple↑valueknown

exprtype may assume various values dependent on its data type; simple and valueknown can assume boolean values.

In general, one can assign to each nonterminal and to each terminal class X of the context-free grammar a number of attributes that describe those properties of X that cannot be described by the context-free grammar alone. Each attribute can assume a predetermined number of values. These form the attribute type. The attributes of terminal classes receive their value through the recognition of the terminal symbols by the lexical analyzer. The values of the attributes of all nonterminals are calculated by the semantic actions.
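The division of labour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names scan and translate_sum and the pair representation (symbol type, value) are hypothetical and do not come from the book. The lexical analyzer delivers both items for every symbol, and the attributed production S → c↑x + c↑y uses the values of the two constants.

```python
import re

def scan(text):
    """Hypothetical lexical analyzer: yield (symbol_type, value) pairs.

    'c' is the terminal class of constants; its attribute is the integer value.
    All other symbols are literals without an attribute (value None).
    """
    for number, other in re.findall(r"\s*(?:(\d+)|(\S))", text):
        if number:
            yield ("c", int(number))
        else:
            yield (other, None)

def translate_sum(text):
    """S -> c^x + c^y  sem Write(x+y) endsem (returns the text instead of writing it)."""
    tokens = list(scan(text))
    kinds = [kind for kind, _ in tokens]
    if kinds != ["c", "+", "c"]:          # the parser only 'sees' the symbol types
        raise SyntaxError(f"expected c + c, got {kinds}")
    x = tokens[0][1]                      # attribute of the first constant
    y = tokens[2][1]                      # attribute of the second constant
    return str(x + y)
```

The syntax check looks only at the symbol types, while the semantic action looks only at the values, which mirrors the separation between syntactic and semantic properties made in the text.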
3.3 Example Interpretation of arithmetic expressions

Consider the grammar of arithmetic expressions consisting of numbers, operators, and parentheses, and terminated by a semicolon:

    S → E ;
    E → T | E + T
    T → F | T * F
    F → c | ( E )

We want to define formally the meaning of such an expression by a description of its interpretation. 'Interpretation' means that an expression will be read, its value computed, and the result printed. In the formal description it must be stated that each symbol of the grammar, except for operators, parentheses, and semicolons, has a value. This value is denoted by an attribute. For example, the production F → c is verbally interpreted by the sentence 'the value of the factor F is the value of the constant c' and formally by the production:

    F↑a → c↑b sem a:=b endsem

Similarly, multiplication is described by the attributed production:

    T↑a → T↑b * F↑c sem a:=b*c endsem

This means: 'When recognizing the right-hand side, the attributes b and c are assigned a value, and subsequently the product of these values is computed, and assigned to the attribute a of the symbol T.' Correspondingly, the remaining productions of the grammar can be assigned attributes and semantic actions, so the complete description is as follows:

    S   → E↑a           sem Write(a) endsem
    E↑a → T↑b           sem a:=b endsem
    E↑a → E↑b + T↑c     sem a:=b+c endsem
    T↑a → F↑b           sem a:=b endsem
    T↑a → T↑b * F↑c     sem a:=b*c endsem
    F↑a → c↑b           sem a:=b endsem
    F↑a → ( E↑b )       sem a:=b endsem

Such a description is called an attributed grammar.

A simplified notation

The reader may notice that in Example 3.3 most semantic actions consist of only an assignment. It is therefore a useful shortcut to abbreviate
    F↑a → c↑b sem a:=b endsem

by

    F↑b → c↑b

This notation expresses the fact that the attribute of c is assigned to the output attribute of F without change.

Attributes and semantic actions in EBNF

The extended Backus-Naur form can also be used for the description of attributed grammars. Example 3.4 is the same as Example 3.3 but uses the simplified notation in EBNF form.

3.4 Example Interpretation of arithmetic expressions in EBNF

    S   → E↑a sem Write(a) endsem ";".
    E↑a → T↑a {"+" T↑b sem a:=a+b endsem}.
    T↑a → F↑a {"*" F↑b sem a:=a*b endsem}.
    F↑a → c↑a

With this notation, one can see how the visual separation of syntax and semantics significantly improves readability.

Input and output attributes

All of the previously used attributes behave like output parameters: they are generated by the parsing of a terminal or a nonterminal, and are used afterwards. We therefore call them derived or synthesized attributes and denote them by an up-arrow. But nonterminals can also have attributes that behave as input parameters, i.e. attributes that already have values when the parsing of the nonterminal starts. Then, semantic actions which are executed during the parsing of the nonterminal can use these values. We call such attributes inherited attributes, and denote them by a down-arrow. The next example shows the application of inherited attributes.
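Before moving on, the EBNF description of Example 3.4 can be read almost directly as a program. The following Python sketch is an assumption-laden illustration, not the book's implementation: each nonterminal becomes a method, each synthesized attribute becomes a return value, and the names tokenize, Parser and interpret are hypothetical. The parenthesized alternative of F is taken from the grammar of Example 3.3.

```python
def tokenize(text):
    """Split the input into (symbol_type, value) pairs; 'c' is the class of constants."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "0123456789":
            j = i
            while j < len(text) and text[j] in "0123456789":
                j += 1
            tokens.append(("c", int(text[i:j])))
            i = j
        elif ch in "+*();":
            tokens.append((ch, None))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens

class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {self.peek()!r}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def S(self):                      # S -> E^a sem Write(a) endsem ";"
        a = self.E()
        self.expect(";")
        return a                      # returned instead of written, for easy testing

    def E(self):                      # E^a -> T^a {"+" T^b sem a:=a+b endsem}
        a = self.T()
        while self.peek() == "+":
            self.expect("+")
            b = self.T()
            a = a + b
        return a

    def T(self):                      # T^a -> F^a {"*" F^b sem a:=a*b endsem}
        a = self.F()
        while self.peek() == "*":
            self.expect("*")
            b = self.F()
            a = a * b
        return a

    def F(self):                      # F^a -> c^a | "(" E^a ")"
        if self.peek() == "(":
            self.expect("(")
            a = self.E()
            self.expect(")")
            return a
        return self.expect("c")

def interpret(text):
    return Parser(text).S()
```

For instance, interpret("2+3*4;") evaluates the product before the sum because T is parsed inside E, exactly as the grammar prescribes.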
3.5 Example Inherited attributes

Given the following grammar which describes the declaration of variables:

    S      → dcl typ idlist ;
    typ    → real | int | bool
    idlist → id | idlist , id

id is the terminal class of all identifiers. The declaration consists of a keyword dcl, a type, and one or more variables of this type, for example: dcl int x, y, z.

The semantic action, which should be performed during parsing of the declaration, consists of entering each variable's name name and type t into the name list. Let this be done by a call of the procedure NewId(↓name↓t). It is appropriate to call NewId immediately after the parsing of an identifier id in the production for idlist. But how can one recognize the type at this point since it was already parsed in the production for typ? The solution is to attach the type t as an inherited attribute to the nonterminal idlist:

    S        → dcl typ↑t idlist↓t ;
    idlist↓t → id↑name sem NewId(↓name,↓t) endsem
             | idlist↓t , id↑name sem NewId(↓name,↓t) endsem

Output attributes of a known symbol A are computed during the parsing of the right-hand side of the A-production, and can thus be used during the parsing of other grammar productions that contain A as a part of their right-hand side. Thus the information flows from the bottom to the top, from the leaves to the sentence symbol.

Input attributes of a nonterminal A are computed prior to parsing of the A-production, and are used during its parsing. Thus the information flows from top to bottom in the syntax tree, from the sentence symbol to the leaves.

Output attributes of A describe properties of the A-phrase, and its constituent phrases. Input attributes of A denote properties of the environment of the A-phrase.

Figure 3.4 shows a syntax tree 'decorated' with attributes for the sentence:

    dcl int x,y,z

The flow of attribute values along the dashed lines can easily be seen.
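In a hand-written parser, an inherited attribute corresponds naturally to an input parameter. The Python sketch below illustrates Example 3.5 under several assumptions: the function name parse_declaration and the list-of-pairs name list are hypothetical, the tokenizer is deliberately crude, and the left-recursive idlist production is replaced by an equivalent loop because recursive descent cannot handle left recursion directly.

```python
def parse_declaration(text):
    """S -> dcl typ^t idlist_t ;   Returns the name list built by the NewId calls."""
    tokens = text.replace(",", " , ").replace(";", " ; ").split()
    pos = 0
    name_list = []

    def next_token():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def expect(symbol):
        tok = next_token()
        if tok != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {tok!r}")

    def typ():
        # typ^t: the type is a synthesized attribute, so it is returned
        tok = next_token()
        if tok not in ("real", "int", "bool"):
            raise SyntaxError(f"unknown type {tok!r}")
        return tok

    def ident():
        # id^name: the lexical analyzer supplies the name
        tok = next_token()
        if not tok.isidentifier():
            raise SyntaxError(f"identifier expected, found {tok!r}")
        return tok

    def idlist(t):
        # idlist_t: the type is an inherited attribute, so it is a parameter
        name_list.append((ident(), t))          # sem NewId(name, t) endsem
        while pos < len(tokens) and tokens[pos] == ",":
            expect(",")
            name_list.append((ident(), t))      # sem NewId(name, t) endsem

    expect("dcl")
    t = typ()
    idlist(t)
    expect(";")
    return name_list
```

The value of t is produced once by typ and then flows downward into idlist, which is the top-to-bottom flow of information described above.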
[Fig. 3.4 Analysis of the sentence dcl int x,y,z. The attributes flow along the dashed lines]

3.3 Context conditions

The formal syntax description of a programming language is not sufficient to distinguish between correct and incorrect programs. For example, in a programming language where all variables must be explicitly declared, the following code may be syntactically correct, even though it does not represent a valid program since the variables x and y are not declared.

    PROCEDURE P
      VAR a,b: INTEGER
      x:=y
    END P

If a programming language definition states 'each variable in an assignment statement must be declared', this defines a relationship between textually separated language elements, which cannot be represented by a context-free grammar. Such constraints are thus called context conditions and are usually considered as part of the semantics since they cannot be described syntactically. The total set of context conditions is called the static semantics of the programming language. The word static signifies that they refer to the source code and not to the execution of the program.

Programming languages are full of context conditions. It would be desirable if the language definition contained explicit definitions for them, separating them from the other parts of the language definition and stressing their importance. Unfortunately, this is rarely the case since they are often buried implicitly in other definitions. Sometimes they are missing altogether since the author wants a small defining document, or because it is assumed that the reader understands them.

Attributed grammars also permit the formal description of context conditions. The context condition is expressed as a relation between attributes. For example, the context condition 'the left side and the right side of an assignment must be of the same type' imposes a relation between the type attributes of both sides. If

    assign = id↑v1↑typ1 ":=" expr↑v2↑typ2 ";".

is the production for the assignment, where typ1 and typ2 are the types of id and expr, then the context condition is typ1 = typ2. The context condition can be written separately from the production in the form

    assign = id↑v1↑typ1 ":=" expr↑v2↑typ2 ";".
    CC: typ1=typ2

or it can be integrated into the production, e.g. in a manner proposed by Watt and Lehrmann Madsen [1983]:

    assign = id↑v1↑typ1 ":=" expr↑v2↑typ2 ";" where(typ1=typ2).

The first form separates the context condition from the production in a firmer manner and is especially suited for several long context conditions. The second form emphasizes the coherence between production and context condition. According to van Wijngaarden's two-level grammar, the part where(...)
can be regarded as a nonterminal that is derived into an empty string if the relationship inside the parentheses is true. It cannot be derived into a terminal string if it is false. If typ1 = typ2, the syntax analysis of where(typ1 = typ2) then results in the empty string, so that an assignment is parsed with the remaining part of the production. However, if typ1 ≠ typ2, the terminal string representing the assignment statement is rejected since the where-part is not terminating.
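The effect of a where-part can be imitated by a small helper that either does nothing or rejects the input. The following Python sketch is purely illustrative: the names ContextError, where, operand_type and check_assign are hypothetical, the right-hand side is restricted to a single operand, and the table of declared types is passed in as an ordinary dictionary.

```python
class ContextError(Exception):
    """Raised when a context condition is violated."""

def where(condition, message):
    # Corresponds to where(CC): 'derives the empty string' if CC holds, rejects otherwise.
    if not condition:
        raise ContextError(message)

def operand_type(token, declared):
    """Type attribute of a single operand (literal or declared identifier)."""
    if token.isdigit():
        return "integer"
    if token in ("true", "false"):
        return "boolean"
    where(token in declared, f"{token} is not declared")
    return declared[token]

def check_assign(statement, declared):
    """assign = id^typ1 ":=" operand^typ2 ";" where(typ1=typ2).  Returns typ1."""
    tokens = statement.split()
    if len(tokens) != 4 or tokens[1] != ":=" or tokens[3] != ";":
        raise SyntaxError("expected: id := operand ;")
    where(tokens[0] in declared, f"{tokens[0]} is not declared")
    typ1 = declared[tokens[0]]
    typ2 = operand_type(tokens[2], declared)
    where(typ1 == typ2, f"type mismatch: {typ1} := {typ2}")
    return typ1
```

Syntax errors and violated context conditions are kept as separate exception types here, reflecting the distinction between the context-free syntax and the static semantics.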
We use the style with where and define the point of execution of the test of the context condition by its position in the production in the following way. The production

    A = ω1 where(CC) ω2.

means that in order to parse A, we must execute a syntax analysis from left to right that will parse ω1 first. Thereafter the context condition CC is tested. If it is not met, an error will be reported. Then ω2 will be parsed. The following examples show the application of context conditions.

3.6 Example A context-sensitive language

The language {aⁿbⁿcⁿ : n ≥ 1} is not context-free. It is shown in all textbooks about formal languages that a context-free grammar does not exist for this language. However, the following attributed grammar with a context condition is easily constructed:

    S   = A↑p B↑q C↑r where(p=q=r).
    A↑p = a sem p:=1 endsem {a sem p:=p+1 endsem}.
    B↑q = b sem q:=1 endsem {b sem q:=q+1 endsem}.
    C↑r = c sem r:=1 endsem {c sem r:=r+1 endsem}.

Here, p, q, and r represent the counts of the characters a, b, and c. The context condition requires that they are equal.

3.7 Example Context condition

The context condition 'In the declaration of an array, both index bounds must be of type integer, and the lower bound must not be greater than the upper bound' can be described as follows:

    arraydeclaration = id↑name "(" constant↑c1↑typ1 ":" constant↑c2↑typ2 ")"
                       where((typ1=typ2=integer) & (c1≤c2)).

where c1 and c2 represent the numerical values of the bounds.

3.8 Example Context condition

The context condition 'each variable appearing in a statement must have been previously declared' can be described as follows. One must distinguish syntactically the applied occurrence of a name (in a statement) from the defining occurrence (in a declaration), with the additional syntax rule:

    var = id.
The nonterminal var denotes the applied occurrence of the name id. Therefore, var must be written in all statements in place of id. If a semantic procedure IsDeclared(↓name) is used to check the symbol list to see if the name of the variable is declared, the context condition can be simply formulated as follows:

    var↑name = id↑name where(IsDeclared(↓name)).

If a context condition is not met, this usually affects the execution of the subsequent semantic actions, but this cannot be expressed well in the attributed grammar. In Coco, we therefore avoid explicit context conditions, replacing their checking with semantic actions (see Section 3.6). However, for the description of the static semantics of programming languages context conditions are very suitable.

3.4 Attributed grammars

In the previous sections we have introduced the elements of attributed grammars. We now consider them in their entirety. In the literature the concept of an attributed grammar is defined in many different ways (see for example, Raiha [1977], Tienari [1980], Watt and Lehrmann Madsen [1983]). We will follow Waite and Goos [1984].

3.9 Definition Attributed grammar

An attributed grammar is a quadruple AG = (G, A, R, K):
G = (VN, VT, P, S) is a reduced context-free grammar;
A is a finite set of attributes;
R is a finite set of semantic actions; and
K is a finite set of context conditions.
With each symbol X ∈ VT ∪ VN zero or more attributes from A are associated. With each production zero or more semantic actions from R and zero or more context conditions from K are associated. For each occurrence of a nonterminal X in the syntax tree of a sentence of L(G) the attributes of X can be computed in at most one way by semantic actions.

The attribute computation process

In the concept of attributed grammars, it is essential that the definition says nothing about the order in which the semantic actions are executed.
In the previous examples, we assumed that syntax analysis was performed top-down from left to right, and that the semantic actions were executed in the same
order. However, according to Definition 3.9, this is not required. The order of the semantic actions is not predetermined by some syntax-analysis method: rather, it is free. This eliminates the necessity of putting the semantic actions and context conditions in particular places of the right-hand side of the grammar productions. All semantic actions and context conditions that belong to a syntax production can be summarized and written at the end of the production. In the general case, the translation runs in two phases:

1. syntax analysis, which constructs a syntax tree;
2. execution of semantic actions, which mainly compute the attribute values attached to the nodes of the syntax tree in an arbitrary order.

Step 2 implies that an 'attribute computation process' will traverse the syntax tree in an arbitrary manner and compute the values of the unknown attributes at each node. A semantic action can be executed at a specific time if and only if all attribute values which contribute to the computation are known at that time. The attribute computation process continues until all attribute values are calculated. It is therefore possible that the attribute computation process must traverse the syntax tree several times, up and down, criss-crossing from left to right. In order to avoid ambiguous computations of attributes, the definition of attributed grammar contains the sentence: 'For each appearance of a nonterminal X, the attributes of X can be calculated in at most one way'.

3.10 Example Variable declaration

In Pascal, variables are declared by their enumeration after the keyword var, and the type follows the list of variables.
For example,

    var x,y,z: integer

The semantic actions implied by the declaration may consist of a call to a procedure NewId(↓name↓t) which appends the name and type of the variable to the name list. In a strict translation from left to right, this construct leads to difficulties, since the type is known only after all names have been parsed, and therefore NewId cannot be called immediately after recognizing a name. In an attributed grammar, these difficulties do not arise if it is formulated as follows:

    1  declaration = "var" idlist↓t0 ":" typ↑t1 sem t0:=t1 endsem.
    2  idlist↓t1 = id↑name sem NewId(↓name↓t1) endsem
    3            | idlist↓t2 "," id↑name sem NewId(↓name↓t1); t2:=t1 endsem.
For the source text var x,y,z: integer first a syntax tree is generated, where all attributes except those of terminal classes have no values (see Fig. 3.5).

[Fig. 3.5 Analysis of the sentence var x,y,z: integer with the flow of attributes along the dashed lines]

The attribute computation process now starts at an arbitrary node in order to compute the missing attributes, and to call procedure NewId. Wherever it starts, the first semantic action that can be executed is t0 := t1 in production 1. Then, t2 := t1 and NewId(↓name↓t1) in production 3 can be executed. This process continues along the dashed lines until all of the semantic actions are executed.
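The attribute computation process for this example can be mimicked by a simple data-driven loop. The Python sketch below is an assumption, not the algorithm of any particular system: each semantic action is modelled as a hypothetical triple (target attribute, required attributes, function), attribute instances are named by plain strings, and the loop repeatedly executes every action whose arguments are already known.

```python
def evaluate(known, rules):
    """Run all rules; 'known' holds the attributes supplied by the lexical analyzer.

    Each rule is (target, deps, fn); target is None for actions such as NewId
    that are executed only for their effect.
    """
    values = dict(known)
    pending = list(rules)
    while pending:
        still_waiting = []
        progressed = False
        for target, deps, fn in pending:
            if all(d in values for d in deps):        # all arguments are known
                result = fn(*(values[d] for d in deps))
                if target is not None:
                    values[target] = result
                progressed = True
            else:
                still_waiting.append((target, deps, fn))
        if not progressed:
            raise ValueError("cyclic or unsatisfiable attribute dependencies")
        pending = still_waiting
    return values

def declare_example():
    """Attribute instances of the syntax tree for: var x,y,z: integer"""
    name_list = []

    def new_id(name, t):
        name_list.append((name, t))

    def copy(t):
        return t

    known = {"typ.t": "integer", "id1.name": "x", "id2.name": "y", "id3.name": "z"}
    # idlist3 is the topmost idlist node, idlist1 the innermost one.
    # The rules are deliberately listed in an unhelpful order.
    rules = [
        (None, ["id1.name", "idlist1.t"], new_id),
        ("idlist1.t", ["idlist2.t"], copy),
        (None, ["id2.name", "idlist2.t"], new_id),
        ("idlist2.t", ["idlist3.t"], copy),
        (None, ["id3.name", "idlist3.t"], new_id),
        ("idlist3.t", ["typ.t"], copy),
    ]
    evaluate(known, rules)
    return name_list
```

Which NewId call runs first depends only on the scanning strategy of the loop, not on the grammar. A rule set in which two attributes depend on each other never makes progress and is reported as an error.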
In Example 3.10, the order in which the three calls to NewId are executed is not determined by the attributed grammar, but rather depends on the strategy of the attribute computation process. In most cases, the order is unimportant, and therefore this kind of attributed grammar is adequate. If desired, a particular order can be imposed by introducing additional attributes.

Cyclic semantic dependencies

Attributed grammars can be constructed in which the attribute computation process does not terminate since some attributes depend on themselves. This is called a cyclic semantic dependency. In Definition 3.9, this possibility is covered with the sentence: 'For each appearance of a nonterminal X, the attributes of X can be calculated in at most one way'. There are algorithms that can check the grammar for this property (Knuth [1968], Waite and Goos [1984]).

If an attributed grammar of the general form described above has been defined, it must first be checked for cyclic semantic dependencies, and possibly transformed into a well defined form.

3.5 L-attributed grammars

Great effort is required to translate an attributed grammar as described in the previous section. First, the syntax tree of the program to be translated must be generated, and each of its nodes must be 'decorated' with the attributes. Then the syntax tree must be traversed more than once to compute the attributes until all attributes are determined. Nowadays storage and run-time requirements confine this method to mainframes - if it is regarded as practical at all.

Hence, special forms of attributed grammars are needed for compilers, permitting the computation of the attributes in a single pass from left to right through the syntax tree. Then the semantic actions can be executed in parallel with the syntax analysis and no syntax tree is needed. Such attributed grammars are called L-attributed (i.e. left attributed) according to Lewis et al. [1976].
All examples in Sections 3.1 through 3.3 are of this kind. The limitations imposed on attributed grammars to make them L-attributed are related only to the order of the attribute occurrences in a production. Each inherited attribute a of a grammar symbol X on the right-hand side of a production must be computable before X can be recognized. Therefore, for its computation only those attributes can be used that are known prior to the parsing of X. From this, the following definition follows:
3.11 Definition L-attributed grammar

An attributed grammar is called L-attributed if for each of its productions Y → X1 ... Xn, the following is true: An input attribute of Xk depends only on the input attributes of Y and on the output attributes of X1 ... Xk-1.

Based on this definition, it can easily be checked by inspection whether a given grammar is L-attributed.

The question is, how far can one get with an L-attributed grammar, and what do the limitations mean?

The general attributed grammars are indisputably the more powerful tool. The user does not need to be concerned about the processing order of attributes (and possibly storage of intermediate results) since this is all done automatically by the attribute computation process. The description is essentially static and thus 'in principle' simple. In reality, such descriptions can be cumbersome and difficult to understand, particularly in the presence of many attributes.

L-attributed grammars can be used to describe the translation of nearly all important language constructions. However, in many cases more context must be used for the translation. This is expressed by the necessity of saving intermediate results in lists, stacks, etc. In Section 3.6 it is shown how the non-L-attributed grammar of Example 3.10 can be easily replaced by an L-attributed grammar with semantic actions for temporarily saving variable names.

The worst that can happen is that the order of the semantic actions which is imposed by the use of the L-attributed grammar will require the partition of the translation into several passes in which each pass can be defined by an L-attributed grammar.

In view of these disadvantages, Waite and Goos [1984] say: 'L-attributed grammars are inadequate, even in comparatively simple cases.' We do not agree with this categorical statement. In most cases, the simplicity and the ease of implementation of L-attributed grammars more than compensate for their disadvantages.
Therefore we feel that they are a very suitable tool for compiler implementations, at least as long as our computers are limited in memory and speed. Coco processes only L-attributed grammars, and all attributed grammars in the following chapters of this book are L-attributed. Algorithmic interpretation of L-attributed grammars While general attributed grammars are a declarative and therefore non-algorithmic formalism, L-attributed grammars can also be regarded as algorithmic descriptions, imposing an order in which semantic actions have to be executed.
Programmers who are used to thinking algorithmically will find it easier to follow this approach. Therefore, we understand an L-attributed grammar as a very high-level algorithmic language in the following sense.

The context-free portion of a production

    A = α1 | α2 | ... | αn.

denotes the algorithm: 'Parse the nonterminal A by choosing the matching alternative αi, and recognizing its components sequentially from left to right.'

Each alternative with a semantic action of the form

    αi = X1 ... Xj sem SA endsem Xj+1 ... Xn

denotes the algorithm: 'Parse X1 through Xj, then execute the semantic action SA, and then parse Xj+1 through Xn.'

Each alternative with a context condition of the form

    αi = X1 ... Xj where(CC) Xj+1 ... Xn

denotes the algorithm: 'Parse X1 through Xj, then test the context condition CC (and report any errors), and then parse Xj+1 through Xn.'

An attributed production of the form

    A↓a0↑b0 = X↓a1↑b1 Y↓a2↑b2.

denotes the following algorithm:

1. compute a1 (using semantic actions that are not stated here, which must precede X and may depend on a0);
2. parse X (thereby b1 gets a value);
3. compute a2 (using semantic actions that are not stated here, which must precede Y and may depend on a0, a1, b1);
4. parse Y (thereby b2 gets a value);
5. compute b0 (using semantic actions that are not stated here, which may depend on a0, a1, b1, a2, b2).

This algorithmic interpretation adds as a further clause to the definition of L-attributed grammars (Definition 3.11) the sentence: 'Attributes that are used as arguments in a semantic action or context condition between the grammar symbols Xj and Xj+1 can only be input attributes of the left-hand side of the production and output attributes of X1 to Xj.'
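The condition of Definition 3.11 lends itself to a mechanical check. The Python sketch below assumes a hypothetical encoding that is not taken from the book: for a production Y → X1 ... Xn, a dictionary maps the position k of a right-hand-side symbol to the sources its input attributes depend on, where a source is a pair (position, kind), position 0 denotes the left-hand side Y, and kind is "in" or "out".

```python
def is_l_attributed(dependencies):
    """Check one production against Definition 3.11.

    dependencies: {k: [(index, kind), ...]} lists, for the input attributes of X_k,
    every attribute they depend on. index 0 is the left-hand side Y.
    """
    for k, sources in dependencies.items():
        for index, kind in sources:
            from_lhs_input = index == 0 and kind == "in"
            from_left_output = 1 <= index < k and kind == "out"
            if not (from_lhs_input or from_left_output):
                return False
    return True

# Example 3.5:  S -> dcl typ^t idlist_t ;
# idlist is X3 and its input attribute depends on the output attribute of typ (X2).
example_3_5 = {3: [(2, "out")]}

# Example 3.10: declaration = "var" idlist_t0 ":" typ^t1
# idlist is X2 but its input attribute depends on the output attribute of typ (X4).
example_3_10 = {2: [(4, "out")]}
```

Under this encoding, is_l_attributed(example_3_5) holds while is_l_attributed(example_3_10) does not, which agrees with the remark that the grammar of Example 3.10 is not L-attributed.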
3.6 Implementation of the semantic interface

The implementation of the semantic interface in a compiler compiler and in the generated compiler consists of three tasks:

1. translation and storage of semantic actions during compiler generation time and execution of semantic actions at run-time of the generated compiler,
2. translation and storage of context conditions during compiler generation time and test of context conditions at run-time of the generated compiler,
3. reserving memory for attributes at compiler generation time and attribute passing at run-time of the generated compiler.

These tasks are most simply and directly implemented if the generated compiler performs its syntax analysis with the popular method of recursive descent, which is not covered in this book (Gries [1971], Hartmann [1977], Wirth [1986]). In this, semantic actions and context conditions are directly embedded as code in the syntactic procedures, and attributes become parameters of the syntactic procedures. The simplicity of this kind of semantic interface makes the method of recursive descent still attractive today for hand-coded compilers.

If the generated compiler performs a table-driven syntax analysis, then somewhat more effort is required for the semantic interface. In this section, we cover the method used by Coco.

Semantic actions

The semantic actions are numbered. The order is arbitrary, but it is easiest to order them as they appear in the attributed grammar. We start the numbering at 12 for reasons that follow. All semantic actions are placed in the single procedure Semant as follows:

    Semant(↓nr):
      case nr of
        12: Semantic Action 12
      | 13: Semantic Action 13
      | n : Semantic Action n
      end
    end Semant

The G-code is expanded to provide as many instructions as there are semantic actions. The G-code instructions treated in Section 2.4 (and two more, see Definition 3.14) have operation codes 0 through 11.
Operation codes 12 through 255 correspond to semantic actions 12 through 255. Thus, Coco has a limit of 244 semantic actions which will probably be rarely reached. We only
need 68 semantic actions to describe the attributed grammar of Coco itself, and 126 semantic actions for the largest pass of a Modula-2 compiler.

For the processing of semantic actions the parser of Algorithm 2.44 needs to be expanded only by an if statement:

3.12 Parser with semantic interface

    Parse(↑correct):
      loop
        case opcode of
          t:   ...
        | ret: ...
        else
          if correct then Semant(↓opcode) end   -- perform semantic action
        end -- case
      end -- loop
    end Parse

We will now study this method in more detail by an example that uses an L-attributed grammar to translate the following declaration:

    var x,y,z: integer;

(In Example 3.10 we have already given a general attributed grammar for this task.) Before we can add the identifiers and their type to the name list, the identifier list must be temporarily stored. For this purpose we will use a queue as abstract data structure with the access procedures InitQueue, Enqueue, Dequeue, and EmptyQueue whose meaning is obvious. The attributed grammar is as follows:

    declaration =
      "var" id↑name      sem InitQueue; Enqueue(↓name) endsem
      {"," id↑name       sem Enqueue(↓name) endsem }
      ":" type↑t         sem while not EmptyQueue do
                               Dequeue(↑x); NewId(↓x↓t)
                             end
                         endsem
      ";".

The numbering of the semantic actions and their integration into the procedure Semant results in the following:

    Semant(↓nr):
      local name, x: Nametype; t: (int, bool, real);
    begin
      case nr of
        12: InitQueue; Enqueue(↓name)
      | 13: Enqueue(↓name)
      | 14: while not EmptyQueue do
              Dequeue(↑x); NewId(↓x↓t)
            end
      end
    end Semant

The attributes are local variables of Semant. This means that in general all the names contained in a semantic action (enclosed between sem and endsem) are global to this semantic action, and therefore common to all of the other semantic actions.

Context conditions

Context conditions are not treated as an independent language element in Coco. Rather, they are represented as semantic actions. Instead of

    where(CC)

we write, for example,

    sem if not CC then SemErr end endsem

where SemErr is a semantic error processing procedure.

Attribute passing

Coco treats all attributes as local variables of Semant. They receive their value through attribute passing. This is different for terminals and nonterminals. The attributes of terminals (i.e. terminal classes) are always synthesized attributes. They receive their value by the lexical analyzer during parsing. The inherited attributes of nonterminals are passed before parsing by an implicit semantic action, whereas the synthesized attributes are passed after parsing.

3.13 Example Attribute passing

For the productions

    A = ... B↓x↑y ...
    B↓u↑v = ...

the attribute passing u:=x is done in the A-production before the parsing of B, and the attribute passing y:=v is done in the A-production after the parsing of B.
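The numbered-action scheme with the queue can be tried out in a few lines of Python. This is a hedged sketch rather than Coco's generated code: the class name Semantics, the function compile_declaration and the crude tokenizer are hypothetical, and a small hand-written driver stands in for the table-driven parser, calling semant(12), semant(13) and semant(14) at the points where the G-code would activate them.

```python
from collections import deque

class Semantics:
    """Semantic actions of the declaration grammar, numbered from 12."""

    def __init__(self):
        self.queue = deque()     # temporary store for the identifiers
        self.name_list = []      # filled by the NewId calls
        self.name = None         # attribute of id, supplied by the lexical analyzer
        self.t = None            # attribute of type

    def semant(self, nr):
        if nr == 12:             # InitQueue; Enqueue(name)
            self.queue.clear()
            self.queue.append(self.name)
        elif nr == 13:           # Enqueue(name)
            self.queue.append(self.name)
        elif nr == 14:           # while not EmptyQueue do Dequeue(x); NewId(x, t) end
            while self.queue:
                x = self.queue.popleft()
                self.name_list.append((x, self.t))
        else:
            raise ValueError(f"no semantic action {nr}")

def compile_declaration(text):
    """Stand-in for the parser: recognizes  var id {, id} : type ;"""
    for symbol in ",:;":
        text = text.replace(symbol, f" {symbol} ")
    tokens = text.split()
    pos = 0
    sem = Semantics()

    def take(expected=None):
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        return tok

    take("var")
    sem.name = take()
    sem.semant(12)
    while pos < len(tokens) and tokens[pos] == ",":
        take(",")
        sem.name = take()
        sem.semant(13)
    take(":")
    sem.t = take()
    sem.semant(14)
    take(";")
    return sem.name_list
```

Because the attributes name and t are shared fields of one object, every action sees the values left behind by the previous one, which corresponds to the remark that all names are common to all semantic actions.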
The attribute passing after the parsing of a nonterminal can be executed by a 'normal' G-code instruction, i.e. by an instruction activating a semantic action. However, for the passing of inherited attributes, two additional G-code instructions are necessary:

3.14 Definition G-code (remainder)

    Instruction       Bytes  Description

    NTS sy sem        3      nonterminal with input attribute semantics.
                             If the next input symbol is a terminal start
                             symbol of sy, then execute the semantic action
                             sem (for input attribute passing) and start the
                             parsing of the production for sy, else report
                             an error.

    NTAS sy adr sem   5      nonterminal with alternative and input attribute
                             semantics. If the next input symbol is a terminal
                             start symbol of sy, then execute the semantic
                             action sem (for input attribute passing) and start
                             the parsing of the production, else go to adr.

A complete example for the translation of an attributed grammar into G-code, including attribute passing semantics, can be found in Section 8.3.

Problems with semantic interfaces

The simplicity of this semantic interface gives rise to two problems:

1. Semantic actions may only be executed when it is clear that no other alternative will match. In the production

    A = sem action1 endsem C
      | sem action2 endsem D.

it must be determined whether C or D is the proper alternative before executing action1 or action2. Coco takes this into account by automatic insertion of an ε-node before the corresponding semantic actions, which leads to the following result:

    Top-down graph:

        A ⇒ ε → action1 → C
            ε → action2 → D      (alternative)

    G-code:

           EPSA 1 M
           SEM  12
           NT   C
           RET
        M: EPS  2
           SEM  13
           NT   D
           RET
where the proper selection of alternatives is done with the following lookahead sets:

    epsset(1) = first(C)
    epsset(2) = first(D)

This also works in the following production:

    A = B sem action1 endsem { sem action2 endsem C sem action3 endsem }.

For the above the following top-down graph and corresponding G-code is generated:

    Top-down graph:

        A ⇒ B → action1 → ε → action2 → C → action3 (back to ε)

    G-code:

            NT   B
            SEM  12
        M1: EPSA 1 M2
            SEM  13
            NT   C
            SEM  14
            JMP  M1
        M2: EPS  2
            RET

with the lookahead sets

    epsset(1) = first(C)
    epsset(2) = follow(A)

If the ε-nodes have disjoint lookahead sets, these constructs are LL(1).

2. Attributes in Coco are implemented as local variables of Semant. This results in the undesirable feature that their values are not retained during recursive parsing of nonterminals. For example, in the interpretation of expressions, the following production arises:

    E↑x = T↑x {"+" T↑y sem x:=x+y endsem}.

Here, the output attribute x of the left T must be still available after parsing of the right T since its value is used afterwards. However, since T is recursive over F and E, the attribute x of the left T may be destroyed by the parsing of the right T. Coco does not take care of this problem. It is up to the programmer to save and restore x explicitly. This can be done by use of a stack and replacing the above production by the following:
    E↑x = T↑x {"+" sem Push(↓x) endsem T↑y sem Pop(↑x); x:=x+y endsem}.

From this follows the

3.15 Principle of attribute saving for recursive symbols

Attribute values that must be preserved beyond the parsing of a recursive nonterminal X must be saved before the parsing of X and restored after the parsing of X.
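The need for Principle 3.15 can be demonstrated with a small experiment. The Python sketch below is an illustration under assumptions and does not reproduce Coco's generated code: the class Evaluator keeps one shared variable per attribute name (x and y for the E-production, u and v for the T-production, a for F), in the way that Semant keeps one local variable per attribute, and the flag save switches the Push/Pop actions on or off.

```python
import re

class Evaluator:
    """Expression interpreter whose attributes are shared variables, not parameters."""

    def __init__(self, text, save):
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0
        self.save = save                 # True: apply Principle 3.15
        self.stack = []
        self.x = self.y = self.u = self.v = self.a = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        self.pos += 1

    def E(self):                         # E^x = T^x {"+" T^y sem x:=x+y endsem}
        self.T()
        self.x = self.u                  # attribute passing after parsing T
        while self.peek() == "+":
            self.advance()
            if self.save:
                self.stack.append(self.x)        # sem Push(x) endsem
            self.T()
            self.y = self.u
            if self.save:
                self.x = self.stack.pop()        # sem Pop(x) ...
            self.x = self.x + self.y             # ... x:=x+y endsem

    def T(self):                         # T^u = F^u {"*" F^v sem u:=u*v endsem}
        self.F()
        self.u = self.a
        while self.peek() == "*":
            self.advance()
            if self.save:
                self.stack.append(self.u)
            self.F()
            self.v = self.a
            if self.save:
                self.u = self.stack.pop()
            self.u = self.u * self.v

    def F(self):                         # F^a = c^a | "(" E^a ")"
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.advance()
            self.a = int(tok)
        elif tok == "(":
            self.advance()
            self.E()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            self.a = self.x
        else:
            raise SyntaxError(f"unexpected {tok!r}")

def evaluate(text, save):
    ev = Evaluator(text, save)
    ev.E()
    if ev.peek() is not None:
        raise SyntaxError(f"unexpected {ev.peek()!r}")
    return ev.x
```

With saving enabled, evaluate("1+(2+3)", save=True) yields 6. Without it the inner expression overwrites x before the outer addition is carried out, so the same input yields 10.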
4 Various compiler compilers

In the previous chapter we covered the theoretical background of compilers. In the following chapters we will show the practical application of these principles in the design of the compiler compiler Coco. However, before we go into the details of Coco, it will be interesting to look at some other compiler compilers. This will enable the reader to compare Coco with these systems.

There is extensive literature about compiler-generating systems. Bibliographies can be found in Raiha [1980] and Meijer and Nijholt [1982]. The scope of this book allows us to cover but a few of them, and even then only to a limited degree. Some of the best-known compiler compilers are YACC (Johnson [1975]), HLP84 (Koskimies [1984]), GAG (Kastens et al. [1982]), and MUG (Ganzinger and Giegerich [1984]). In the following paragraphs, we will compare these systems to each other.

The basic operation of today's compiler compilers is always the same. The compiler to be generated is described by a metalanguage based on attributed grammars. From this compiler description, a parser and a semantic evaluator are generated which constitute the essential parts of the resulting compiler.

The generated compiler reads the source text to be translated, performs a syntax analysis to check the correctness of the input, and builds a syntax tree in memory. It then assigns attribute values to the tree nodes according to the attributed grammar. This process normally requires several passes which traverse the tree from left to right or from right to left. In each pass as many attributes as possible are evaluated. Finally the total semantics of the source program is represented by the attributes in the tree. The last pass generates the target code from the attribute values.
The various compiler compilers differ mainly in their compiler description languages and in their algorithms for traversing the syntax tree. Although much effort is spent on reducing execution time and attribute space, large memory requirements and long processing times are the main reasons why automatically generated compilers are still less efficient than hand-coded compilers. Therefore some compiler compilers, such as YACC and Coco, bypass the construction of a syntax tree and accept that they are less powerful and less generally applicable than HLP84, GAG, or MUG.

The above-mentioned compiler compilers will be compared without going into too much detail. For each of them we will give a short example of its input language, which shows the translation of a signed integer constant into its value. Normally, such tasks are handled by the lexical analyzer. However, they can also be solved with an attributed grammar, which is short and easy to understand and is therefore well suited as an example of attributed grammars.

Of course compiler compilers can achieve more than what is demonstrated in this short example. Most of them show their advantages only on a large and complex task. However, these small examples allow some interesting conclusions about the user-friendliness of the various systems and the effort required to learn their description languages.

4.1 YACC - yet another compiler compiler

Origin and scope

YACC was produced by Stephen C. Johnson at Bell Laboratories in 1975. It runs under Unix and is therefore widely available. YACC accepts L-attributed grammars with the limitation that each grammar symbol has only one synthesized attribute and no inherited attributes. From the compiler description, YACC generates an LALR(1) parser (Lookahead LR(1)) and a semantic analyzer which is simply a collection of all of the semantic actions of the compiler description.
The user must supply a main program, a lexical analyzer, and a syntax-error handler.

Description language

The syntax parts of the YACC source language are written as BNF productions. All terminals (with the exception of literals) must be declared. For the production X0 : X1 X2 ... Xn, the symbol $$ denotes the attribute of X0, $1 the attribute of X1, and $n the attribute of Xn. Semantic actions can be specified at any position between the symbols of the context-free grammar. They must be written in C and may contain an arbitrary sequence of valid C statements. Context conditions are written as if statements in semantic actions. At the end of the grammar, one can write C procedures which are called in the semantic actions. A scanner procedure named yylex must also be provided at this point.

Attribute processing

The attribute processing is done in a single pass during syntax analysis. An explicit syntax tree of the source language is not produced.

Implementation

YACC is written in C and produces compilers that are also written in C. It has been used for the translation of many languages, including C, APL, RATFOR, and Pascal.

4.1 Example  Attributed grammar as input for YACC

    %start Number    /* start symbol of the grammar */
    %token digit     /* declaration of terminals. Literals don't */
                     /* have to be declared as terminals. */
    %%               /* separator */
    Number:    '-' Digitlist    {printf("%d\n", -$2);}
             | Digitlist        {printf("%d\n", $1);};
    Digitlist: digit            {$$ = $1;}
             | Digitlist digit  {if (($1>3276) || (($1==3276) && ($2>7)))
                                   {printf("Constant too big"); $$ = 0;}
                                 else {$$ = $1*10 + $2;}
                                };
    %%
    #include <ctype.h>
    yylex() {                   /* lexical analyzer */
      int ch;
      while ((ch=getchar())==' ');
      if (isdigit(ch)) {yylval=ch-'0'; return (digit);}
      else return (ch);
    }
    yyerror(s) char *s; {printf("%s\n",s);}   /* error procedure */
    main()                                    /* main procedure */
    {return(yyparse());}
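The range test in the second Digitlist alternative is easier to follow when it is isolated. The Python sketch below is our own illustration, not part of YACC: it assumes that the limit implied by the constants 3276 and 7 is 32767, the largest 16-bit signed integer, and it rejects a digit before the multiplication could exceed that limit. The names MAX_INT, append_digit and signed_constant are hypothetical.

```python
MAX_INT = 32767   # assumed limit; the example tests against 3276 and 7


def append_digit(value, digit):
    """Return value*10 + digit, or None if the result would exceed MAX_INT."""
    if value > MAX_INT // 10 or (value == MAX_INT // 10 and digit > MAX_INT % 10):
        return None
    return value * 10 + digit


def signed_constant(text):
    """Translate an optionally signed digit string into its value (None if too big)."""
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    value = 0
    for ch in digits:
        value = append_digit(value, int(ch))
        if value is None:
            return None
    return -value if negative else value
```

Unlike the grammar above, which reports the error and continues with the value 0, this sketch simply returns None for a constant that is too big.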
4.2 HLP84 - Helsinki language processor

Origin and scope

The first version of HLP was produced in 1978 under the name HLP78 at the University of Helsinki by Räihä et al. [1983]. Since then a new version, HLP84 (Koskimies [1984]), has been created which has little in common with the previous one. HLP84 accepts attributed grammars for a one-pass translation of programs. It generates a scanner, an LALR(1) parser with error handling, and a semantic evaluator to which user procedures can be attached. Symbol table handling can be partially described in the compiler definition language; in certain cases it is even done automatically. This reduces the number of semantic procedures required.

Description language

The description language Lisa is nonterminal oriented. This is in sharp contrast to other compiler description languages, where the emphasis is on productions. Each nonterminal is described by a block which forms the scope of its local objects. This is similar to the use of procedures in higher-level languages. A block contains all productions of a nonterminal in extended BNF, as well as the description of all terminals used in it. Within a block, attributes and local variables are declared in a Pascal-like form. A set of semantic rules consisting of assignments and function calls is attached to each production. These rules assign values to the synthesized attributes on the left-hand side and to the inherited attributes on the right-hand side of the production. An attribute a of a grammar symbol S is denoted by S.a. Terminals can have a single synthesized attribute. There is a specific language element for context conditions.

Lisa provides some standard facilities for frequently needed operations such as the definition of scopes and the search for names in them. These mechanisms free the user from some clerical work. For example, an identifier will be automatically searched for in all open scopes, and its node in the syntax tree will be automatically attributed according to the information in its symbol table entry.

Attribute processing

Attributes are processed in a single pass from left to right by means of an attribute stack and without an explicit syntax tree. This limits the application of HLP84 to languages that can be translated in one pass, although semantic analysis is not required to take place during syntax analysis.
Implementation

HLP84 was implemented on a Burroughs B7800 computer in Pascal. It generates compilers in Pascal. The system has been used for its own implementation and for the generation of a Pascal compiler.

4.2 Example  Attributed grammar as input for HLP84

    external   -- declaration of external Pascal-objects
      type Outfile = Extfile;
      function WriteInt(f:Outfile; i:Integer): (f:Outfile) =
        procedure ExtOut(var f:Extfile; i:Integer);
      -- Connects the Pascal-procedure ExtOut with the Lisa-function
      -- WriteInt.
      -- Extfile and ExtOut are given in a special system file.
    nont Number;
      -- description of the nonterminal Number (start sym.).
      -- Number has no attributes.
      attrset Intval = (val: Integer);
      -- val is declared to be an integer attribute. The attribute
      -- declaration is given the name Intval.
      var out: Outfile;   -- global variable
      const max = 65535;
      nont SignedNumber: Intval;
        -- description of the nt "SignedNumber".
        -- SignedNumber has an attr. set "Intval"
        nont DigitList: Intval;
          check val < max;   -- context condition
          token DigitToken: Integer = Digit;
          -- the terminal "DigitToken" with an attr. of type Integer is
          -- declared to consist of a single digit (Digit is predefined)
          DigitList = DigitToken;   -- syntactic production
          rules                     -- semantic rules
            val:=DigitToken
            -- the attr. of a token is denoted by the name of the token
          end;
          DigitList = DigitList DigitToken;
          rules
            val:=10*DigitList.val+DigitToken
          end
        end DigitList;
        SignedNumber = '-' DigitList;
        rules
          val:=-DigitList.val
        end;
        SignedNumber = DigitList;
        rules
          val:=DigitList.val
        end
      end SignedNumber;
      Number = SignedNumber;
      rules
        post out:=WriteInt(out,SignedNumber.val);
        -- after SignedNumber is processed, its attribute val is written.
      end
    end Number

4.3 GAG - generator based on attribute grammars

Origin and scope

GAG was developed by Kastens, Hutt, and Zimmermann [1982] at the University of Karlsruhe. It accepts ordered attributed grammars, where the attribute evaluation order of each nonterminal is fixed and independent of the context of the nonterminal. From the compiler description, an attribute evaluator and an LALR(1) parser are produced (by separate tools). The user must supply a lexical analyzer and a few other procedures such as a code generator. These modules together with some fixed parts constitute a complete compiler.

Description language

The grammar is written in extended BNF with special constructs for options and repetitions. All nonterminals and terminals (except literals) must be declared. Every production is associated with a set of semantic rules. These rules are written in the strongly typed, functional language Aladin, which allows attribute assignments and function calls. The right-hand side of an assignment can be a complex expression of attribute values, function calls, if expressions, syntax symbols, and many others (see Example 4.3). As a functional language, Aladin has neither variables nor control statements. The attribute notation S.a means the attribute a of the symbol S. If S occurs in a production several times, the first occurrence is denoted by S[1], the second by S[2], and so on. There is a special language element for context conditions.

Attribute processing

A decorated syntax tree is built during attribute evaluation, but it is not traversed in alternating passes from left to right and from right to left, as is done in some other compiler compilers. A node is visited if there are no more
nodes to the left of it, and a parent node is visited when no more of the children can be visited. The syntax tree is therefore not processed in a straight direction. In fact, evaluation may sometimes step back some nodes to evaluate attributes that could not be computed earlier. In this manner, the number of passes over the tree can be reduced. The memory requirements for attributes in the syntax tree are optimized by various algorithms. After the attribute evaluation, the decorated syntax tree is passed to a user program which generates the target code.

Implementation

GAG is implemented in Standard Pascal under Unix BSD 4.2 on a Siemens 7.760 computer. It also generates compilers in Standard Pascal. Compilers for Pearl, LIS, Pascal, and Ada have already been produced by GAG.

4.3 Example  Attributed grammar as input for GAG

    % symbol and attribute declarations
    TERM digit value: INT SYNT;   % value is a synthesized integer attribute
    NONTERM Number
    NONTERM Digitlist value: INT SYNT;
    % rules
    RULE r1: Number ::= ["-"] Digitlist
    STATIC
      Number.value := IF "-" IS THERE
                      THEN -DigitList.value
                      ELSE DigitList.value
                      FI
      % No output of the attribute Number.value.
      % The attributed tree is passed to a user written program,
      % which prints the results.
    END;
    %
    RULE r2: Digitlist ::= digit
    STATIC
      Digitlist.value := digit.value
    END;
    %
    RULE r3: Digitlist ::= Digitlist digit
    STATIC
      Digitlist[1].value := 10*Digitlist[2].value+digit.value
      CONDITION (Digitlist[2].value<3276) OR
                ((Digitlist[2].value=3276) AND (digit.value<8))
      MESSAGE "Constant value too big"
    END;
4.4 MUG - modular compiler generator

Origin and scope

MUG (Modularer Übersetzer-Generator) was developed in 1985 at the University of Dortmund (Germany) by Ganzinger and Vach. It processes so-called one-sweep grammars (Engelfriet and Filè [1981]). MUG supports all phases of semantic analysis (attribute processing, optimization, and code generation). However, it does not produce a scanner or a parser. Those can be generated with YACC and then attached to the MUG system. Semantic modules are written in Modula-2.

The underlying principles of MUG are substantially different from traditional attributed grammars. Terminals are viewed as the types of some semantic objects (so-called semantic sorts); nonterminals are viewed as the types of syntax trees (so-called syntactic sorts). Productions are therefore viewed as functions, mapping objects of syntactic and semantic sorts into syntax trees which are themselves elements of syntactic sorts. The translation of trees of an input grammar into trees of an output grammar is called an attribute coupling of the two grammars. Attributes can be classified as semantic attributes, which contain semantic values (and therefore, like the values of terminal symbols, are objects of semantic sorts), and syntactic attributes, which represent subtrees of the output grammar (and thus are objects of syntactic sorts). Semantic attributes are computed in semantic rules, whereas syntactic attributes are built by applying productions of the output grammar. Semantic attributes can also be viewed as 'terminal symbols' of the output grammar.

As a result of this view, several attribute coupling processes can be concatenated so that the output grammar of the first coupling becomes the input grammar of the second one. As an option, MUG can automatically combine the two attribute couplings into a single one. The user can therefore describe complex translation processes as a sequence of simple translations (e.g. L-attributed grammars), which the system - hidden from the user - combines into a single attributed grammar that does not need to be L-attributed. In this manner, readability is balanced with efficiency.

Description language

MUG uses one description language for all translation phases. It is based on Modula-2. The production

    Prod1: A -> B c

is written in a function-like manner as

    CONSTRUCTOR Prod1 (btree:B; cval:c): A
An attribute a of a nonterminal S is written as S^a. All nonterminals must be declared together with their attributes and attribute types. For semantic sorts, the user must write Modula-2 modules that export them as types unless they are standard types of Modula-2. There must be separate modules for the input grammar, the output grammar, and their attribute coupling. Semantic rules can contain assignments with arbitrary Modula-2 expressions, function calls, and if expressions. Syntactic attributes are calculated through constructors of the output grammar. Context conditions have no construct of their own. They must be specified within semantic functions.

Attribute processing

The attribute processor generated by MUG uses the 'one-sweep' method, which is an L-attributed processing of the syntax tree in which the children of each node may first have been brought into a suitable order.

Implementation

MUG was implemented in Modula-2 on a CADMUS computer. It generates compilers in Modula-2 and has been used for its own implementation.

4.4 Example  Attributed grammar as input for MUG

    SIGNATURE DEFINITION MODULE Numbers;
    (*definition of the context-free input grammar*)
    FROM Values IMPORT Value;     (*syntactic sort from the output grammar*)
    FROM User IMPORT              (*semantic sorts (terminals)*)
      digit, minus;
    SORT                          (*syntactic sorts (nonterminals)*)
      Number, Digitlist;
    (*rules of the context-free grammar*)
    CONSTRUCTOR PosNumber(dl:Digitlist): Number;
    CONSTRUCTOR NegNumber(m:minus; dl:Digitlist): Number;
    CONSTRUCTOR SingleDigit(d:digit): Digitlist;
    CONSTRUCTOR MoreDigits(dl:Digitlist; d:digit): Digitlist;
    (*attribution function for the context-free grammar*)
    OPERATOR Evaluate(n:Number): Value;
    END Numbers.

    SIGNATURE DEFINITION MODULE Values;
    (*definition of the context-free output grammar*)
    SORT Value;
    CONSTRUCTOR Result(val:INTEGER): Value;
    END Values.

    ATTRIBUTATION MODULE Numbers;
    (*attribute coupling of the above grammars*)
    FROM Values IMPORT Value;
    OPERATOR Evaluate(n:Number): Value;
    (*declaration of attributes*)
    ATTR Number SATTR nval: Value;
    ATTR Digitlist SATTR dval: INTEGER;
    (*attributations of the productions*)
    CONSTRUCTOR PosNumber(dl:Digitlist): Number;
    BEGIN
      PosNumber^nval = Result(dl^dval);
      (*the constructor "Result" builds a syntactical attribute of type "Value"*)
    END PosNumber;
    CONSTRUCTOR NegNumber(m:minus; dl:Digitlist): Number;
    BEGIN
      NegNumber^nval = Result(-dl^dval);
    END NegNumber;
    CONSTRUCTOR SingleDigit(d:digit): Digitlist;
    BEGIN
      SingleDigit^dval = d;
    END SingleDigit;
    CONSTRUCTOR MoreDigits(dl:Digitlist; d:digit): Digitlist;
    BEGIN
      MoreDigits^dval = 10 * dl^dval + d;
    END MoreDigits;
    END Evaluate;
    END Numbers.

4.5 Coco - compiler compiler

Origin and scope

Coco arose in 1983 at the University of Linz as the successor of a parser generator. It processes L-attributed grammars, which are viewed as procedural descriptions of a translation process. The compiler description is translated into an LL(1) parser with automatic error recovery and a semantic evaluator to which user modules can be attached. The user must further supply a main program and a scanner (for which there is a scanner generator). It is possible to generate multi-pass compilers with Coco.
Description language

The compiler description language Cocol is based on context-free grammars in Wirth's EBNF notation. All terminals and nonterminals must be declared. Each syntax symbol can have one or more attributes. A symbol S with an output attribute a is written as S<out:a> wherever it occurs within a production. Semantic actions are written directly in Modula-2. They may appear at arbitrary points on the right-hand side of the productions. Attributes can be accessed like normal variables. Context conditions are written as if statements in semantic actions.

Attribute processing

Semantic evaluation takes place during the syntax analysis. A syntax tree of the input is not built. Productions are processed strictly from left to right. When a semantic action is encountered, it is executed immediately. Attribute values of terminals are returned by the scanner; those of nonterminals are passed using assignments generated by Coco.

Implementation

Coco is implemented in Modula-2 on various microcomputers including Macintosh, IBM-PC, Atari, and Lilith. It is also available on IBM mainframes. Coco generates compilers in Modula-2. It has been used for the construction of a multi-pass Modula-2 compiler and for the generation of several tools for static program analysis.

4.5 Example  Attributed grammar as input for Coco

    GRAMMAR Number
    SEMANTIC DECLARATIONS
      FROM InOut IMPORT WriteString, WriteInt;
      VAR value,value1: INTEGER;
    TERMINALS
      digit <out:value>
    NONTERMINALS
      Number
      Digitlist <out:value>
    RULES
      Number = Digitlist<out:value>       sem WriteInt(value,5); endsem
             | "-" Digitlist<out:value>   sem WriteInt(-value,5); endsem.
      Digitlist<out:value> =
        digit<out:value>
        { digit<out:value1>   sem
                                IF (value<3276) OR ((value=3276) AND (value1<8))
                                THEN value:=10*value+value1;
                                ELSE value:=0; WriteString("Constant too big");
                                END;
                              endsem
        }.
    ENDGRAM

4.6 Summary

This short overview of some of the better-known compiler compilers has shown that many powerful systems with complex input languages exist for the definition of many exotic special cases. Why then are these generators so seldom used for practical applications? There are many reasons.

The most significant is the fact that automatically generated compilers are simply less efficient than manually coded ones. According to Koskimies et al. [1982], a Pascal compiler produced with HLP78 ran seven times slower and used three times as much memory (only for its code!) as a manually coded compiler. However, efficiency is not the main goal of a compiler compiler. Often it is more important that the compiler description be short, formal, and complete. Then it can be used as a prototype of a compiler implementation for a new language or to study the techniques of compiler construction as such.

Compiler description languages are sometimes not easy to read. In most cases ordinary BNF is used for the syntax definition. Although concise and elegant, this notation often looks unnatural because of the recursion needed to express repetitions. Attributes usually appear only in semantic rules and not with the grammar symbols. This makes the productions short, but the reader must extract from the semantic rules those attributes which belong to a given syntax symbol. In many cases, the semantic rules may only be attribute assignments. Therefore, important parts of the actual translation must be hidden in procedures. Having these difficulties to contend with may even make the compiler compiler a burden rather than a help.

Finally, most compiler compilers require a lot of memory themselves.
For example, GAG required 4 megabytes of main memory for the generation of an Ada compiler, and this amount of memory is not available on many microcomputers.
We believe that a compiler compiler should be a tool which is easy to understand and easy to use. Above all, its input language should be clear and natural, but its availability (e.g. on microcomputers) and efficiency are equally important. These were the considerations behind the development of Coco and its input language Cocol.

Table 4.1 summarizes the main features of the described compiler compilers.
Table 4.1 Properties of various compiler compilers

Developed in
    YACC:   1975
    HLP84:  1978/1984
    GAG:    1980
    MUG:    1985
    Coco:   1983

Class of context-free grammars
    YACC:   LALR(1)
    HLP84:  LALR(1)
    GAG:    LALR(1)
    MUG:    LALR(1)
    Coco:   LL(1)

Class of attributed grammars
    YACC:   L-attributed grammars with a single synthesized attribute per symbol
    HLP84:  L-attributed grammars
    GAG:    Ordered attributed grammars (evaluation order of attributes independent for every nonterminal)
    MUG:    'One-sweep' grammars (L-attributed grammars on possibly reordered syntax trees)
    Coco:   L-attributed grammars

Generated parts
    YACC:   Parser, semantic evaluator
    HLP84:  Scanner, parser, attribute evaluator
    GAG:    Attribute evaluator
    MUG:    Attribute evaluator
    Coco:   Parser, semantic evaluator

Syntax tree built
    YACC:   No
    HLP84:  No
    GAG:    Yes
    MUG:    Yes
    Coco:   No

Attribute evaluation order in the tree
    YACC:   -
    HLP84:  In a single pass from left to right
    GAG:    Evaluation order results from ordered attr. grammar
    MUG:    In a single 'sweep'
    Coco:   -

Syntax notation (example)
    YACC:   Digitlist: digit | Digitlist digit;
    HLP84:  Digitlist = digit; Digitlist = Digitlist digit;
    GAG:    Digitlist ::= digit+
    MUG:    CONSTRUCTOR P1 (dl:Digitlist; d:digit): Digitlist
    Coco:   Digitlist = digit {digit}.

Attribute notation (example)
    YACC:   $$, $1, $2, ...
    HLP84:  Digitlist.value
    GAG:    Digitlist.value
    MUG:    Digitlist^value
    Coco:   Digitlist <out:value>

Semantic rules (actions)
    YACC:   Arbitrary C statements
    HLP84:  Assignments, attribute expressions, function calls
    GAG:    Special attribution language, assignments, function calls, attribute expressions
    MUG:    Assignments, attribute expressions, function calls, constructors
    Coco:   Arbitrary Modula-2 statements

Context conditions
    YACC:   Embedded in semantic actions
    HLP84:  Special construct
    GAG:    Special construct
    MUG:    Embedded in semantic actions
    Coco:   Embedded in semantic actions

Applied to which languages
    YACC:   C, APL, Ratfor, Pascal
    HLP84:  Pascal
    GAG:    PEARL, Pascal, Ada
    MUG:    For its own implementation
    Coco:   Modula-2, software engineering tools

Implementation language
    YACC:   C
    HLP84:  Pascal
    GAG:    Standard-Pascal
    MUG:    Modula-2
    Coco:   Modula-2

Language of the generated compiler
    YACC:   C
    HLP84:  Pascal
    GAG:    Standard-Pascal
    MUG:    Modula-2
    Coco:   Modula-2

Host computer
    YACC:   On most Unix systems
    HLP84:  Burroughs B7800
    GAG:    Siemens 7.760
    MUG:    Cadmus
    Coco:   Macintosh, IBM-PC, ...
5 The compiler description language Cocol

This chapter describes Cocol, the input language of the compiler generator Coco. A Cocol text essentially consists of an attributed grammar and declarations. From this description, Coco generates a parser and a semantic evaluator. The user has to provide a main program, a scanner, an error message module and semantic modules to get a complete compiler. Some of these modules can be generated by tools or are standard modules that do not depend on the language to be processed.

The attributed grammar consists of a context-free grammar as a description of the compiler input and of semantic information as a description of how this input is to be translated. When designing an attributed grammar one usually starts with the context-free grammar and completes it step by step with attributes, semantic actions and context conditions. Therefore this chapter is arranged in two parts: the specification of Cocol as a syntax description language and its specification as a semantic description language.

5.1 Lexical structure

A grammar description in Cocol consists of keywords, identifiers, strings, numbers, comments and special characters.

Keywords

    ALIAS         ENDSEM    MACROS        RULES
    ANY           EPS       NONTERMINALS  SEM
    DECLARATIONS  GRAMMAR   OUT           SEMANTIC
    ENDGRAM       IN        PRAGMAS       TERMINALS

Keywords must be written with upper-case letters, except for the following keywords, which may also be written with lower-case letters because they often appear in a context where they are not to be emphasized:

    alias   endsem   in    sem
    any     eps      out

Identifiers

    identifier = letter {letter | digit}.

Identifiers may be of arbitrary length. Case is significant.

Strings

    string = quote {anybutquote} quote
           | apostrophe {anybutapostrophe} apostrophe.

quote means the character ", apostrophe means the character '. anybutquote is any character except quote, anybutapostrophe is any character except apostrophe. Strings must not extend beyond line boundaries.

Numbers

    number = digit {digit}.

Special characters

    for the syntax description:    | ( ) [ ] { } = .
    for the semantic description:  < > : , ;

Comments

Comments start with the string '--' and extend to the end of the line.

5.2 Cocol as a syntax description language

The kernel of a Cocol text is the syntactic description of the language that the generated compiler is to process.

    Grammar = "GRAMMAR" identifier SyntaxDeclarations Productions "ENDGRAM".

The syntax description consists of declarations for terminals and nonterminals and of the context-free grammar. The identifier following the keyword
GRAMMAR is the grammar name. It is the root symbol (start symbol) of the grammar and must be declared as a nonterminal. We start with the productions and continue with the declarations later.

5.2.1 Productions

The productions of the context-free grammar are written in an EBNF suggested by Wirth [1982] (square brackets enclose optional expressions, curly brackets denote repetition zero or more times).

    Productions = "RULES" {Production}.
    Production  = identifier "=" Expression ".".
    Expression  = Term {"|" Term}.
    Term        = Factor {Factor}.
    Factor      = Symbol
                | "(" Expression ")"
                | "[" Expression "]"
                | "{" Expression "}"
                | "eps"
                | "any".
    Symbol      = identifier | string.

5.1 Example  Cocol grammar for real constants

    RULES
      Real     = Integer "." [Integer] [Exponent].
      Integer  = digit {digit}.
      Exponent = "E" ["+"|"-"] Integer.

The symbols Real, Integer and Exponent are nonterminals. The symbols digit, "E", ".", "+" and "-" are terminals (they have no productions).

eps

The symbol eps denotes the empty string (see Section 2.1) and is used to describe empty alternatives.

5.2 Example  The use of eps

    sign = "+" | "-" | eps.

is equivalent to

    sign = ["+" | "-"].

eps is not strictly needed for the syntax description, but it is required if one has to attach semantic actions to empty alternatives.

any

The symbol any denotes any terminal, which is not the start of the alternative
chain to which the any symbol belongs. Therefore any is a representative of a whole set of terminals, i.e. all terminals which cannot be recognized instead of it at that point in the grammar.

5.3 Example  The use of any

    Option = "$" any.

Here, any means any terminal.

    Token = keyword | identifier | number | any.

Here, any means any terminal except keyword, identifier or number (which may be recognized instead of it).

    String = '"' {any} '"'.

Here, any means any terminal except '"' (which may be recognized instead of it).

Properties of a correct grammar

Coco generates a compiler only if the grammar is:

1. complete: there must exist a rule for every nonterminal;
2. free of redundancy: every nonterminal must occur in at least one derivation of the root symbol;
3. free of cycles: there must not be a nonterminal which can be derived from itself in one or more steps;
4. terminating: every nonterminal must be able to produce a string of terminals;
5. unambiguous: the grammar must be LL(1).

LL(1) conflicts do not necessarily mean serious errors. They can be viewed as warnings in situations where the generated compiler will take the first matching alternative and ignore the others. Sometimes this is what the user wants, as in the well-known case of the dangling else.

5.4 Example  How the compiler treats LL(1) conflicts

This is the grammar of the dangling else:

    Statement   = ... | IfStatement | ... .
    IfStatement = "IF" Expr "THEN" Statement ["ELSE" Statement].

When analyzing the string

    IF a THEN IF b THEN c ELSE d

it is not clear whether the else clause belongs to the inner or to the outer if. During parsing the first matching alternative is the else of the inner
if. The generated compiler takes this alternative.

5.2.2 Declarations

All terminals and nonterminals must be declared before they can be used in productions. Declarations have the following order:

    SyntaxDeclarations = TerminalDeclarations
                         [PragmaDeclarations]
                         NonterminalDeclarations.

Terminal declarations

    TerminalDeclarations = "TERMINALS" {Symbol [AliasName]}.
    AliasName            = "alias" Symbol.
    Symbol               = identifier | string.

Terminals are declared by enumerating them after the keyword TERMINALS. Consecutive token numbers are assigned to them in the order of their declaration. The first symbol gets the number 1, the next one the number 2, and so on. If a symbol name contains a special character, it must be enclosed in quotes (e.g. "+", "plus-symbol").

The end-of-file symbol must not be declared. It is always assumed to have the token number 0. The lexical analyzer has to supply it as the last symbol of the input text. On its arrival, the syntax analyzer automatically interprets it as an indication that the input is now exhausted. The end-of-file symbol must not (and cannot) be specified in a production.

A symbol may be given an alias name, which is used in error messages by the generated compiler. If the alias name is omitted, the symbol name is used instead. Alias names allow the use of short names in the grammar and of expressive names in error messages.

5.5 Example  Terminal declarations

    TERMINALS
      id    alias identifier
      ":="  alias "becomes symbol"
      ";"   alias semicolon

Pragma declarations

Pragmas are a special feature of Cocol. They are neither terminals nor nonterminals and must not be used in productions. They may occur at any position in the input text and are read by the parser as if they were terminals, but they do not belong to the syntax of the language (examples of pragmas are
options, the end-of-line symbol, and comments). Parsing is not influenced by pragmas but they may carry semantic information (such as line numbers, option values, etc.). Pragmas can be used to propagate information between the passes of a multi-pass compiler.

    PragmaDeclarations = "PRAGMAS" {Symbol}.
    Symbol             = identifier | string.

Pragmas are declared by enumerating them after the keyword PRAGMAS. They are assigned consecutive token numbers, starting with the highest terminal number plus one.

5.6 Example  Pragma declarations

    PRAGMAS
      "end of line"
      option

The purpose of pragmas will become clear when we attach semantic actions to them (see Example 5.11).

Nonterminal declarations

    NonterminalDeclarations = "NONTERMINALS" {identifier [AliasName]}.
    AliasName               = "alias" Symbol.
    Symbol                  = identifier | string.

Nonterminals are declared by enumerating them after the keyword NONTERMINALS. Their declaration order is insignificant. Nonterminals can be given an alias name too. The root symbol (grammar name) must also be declared as a nonterminal.

5.7 Example  Nonterminal declarations

    NONTERMINALS
      Stat alias Statement
      Expr alias Expression

5.3 Cocol as a semantic description language

The semantics of a translation are specified by attaching semantic actions, attributes and semantic declarations to the syntax description. The following grammar of Cocol shows that there are only a few locations (marked by underlined text) where semantic parts have to be added to a syntax description in order to get an attributed grammar.
    CocolText               = "GRAMMAR" identifier SemanticDeclarations
                              SyntaxDeclarations Productions "ENDGRAM".
    SyntaxDeclarations      = TerminalDeclarations [PragmaDeclarations]
                              NonterminalDeclarations.
    TerminalDeclarations    = "TERMINALS" {Symbol [Attributes] [AliasName]}.
    PragmaDeclarations      = "PRAGMAS" {Symbol [Attributes] [SemAction]}.
    NonterminalDeclarations = "NONTERMINALS" {identifier [Attributes] [AliasName]}.
    AliasName               = "ALIAS" Symbol.
    Productions             = "RULES" {Production}.
    Production              = identifier [Attributes] "=" Expression ".".
    Expression              = Term {"|" Term}.
    Term                    = Factor {Factor}.
    Factor                  = Symbol [Attributes]
                            | "(" Expression ")"
                            | "[" Expression "]"
                            | "{" Expression "}"
                            | SemAction
                            | "eps"
                            | "any".
    Symbol                  = identifier | string.

5.3.1 Semantic actions

A semantic action is a statement sequence on the right-hand side of a production. It is executed after the symbol to its left has been recognized and before the symbol to its right is recognized. Semantic actions may be written in any algorithmic programming language (in our Coco implementation this language is Modula-2). There are two kinds of semantic actions.

    SemAction = SimpleAction | SemMacroCall.

Simple semantic actions

    SimpleAction = "sem" {any} "endsem".

A semantic action is enclosed by the keywords sem and endsem. Between them, any statements such as assignments, procedure calls, conditional statements and loops are allowed. The syntactical correctness of the statements is not checked by Coco.
5.8 Example Semantic actions

We want to have a compiler which counts the words in a text. The context-free grammar is

    Text = {Word}.

Now we add semantic actions.

    Text = sem count:=0 endsem {Word sem count:=count+1 endsem} sem IF count>0 THEN WriteCard(count,3); WriteString(" words") END endsem.

Since syntactic and semantic parts are intermixed and hard to read, we separate them into two 'columns':

    Text =        sem count:=0 endsem
      {Word       sem count:=count+1 endsem
      }           sem IF count>0 THEN
                        WriteCard(count,3); WriteString(" words")
                      END
                  endsem.

Syntactic and semantic parts are now clearly separated. The production must be read line by line from left to right.

The parameters of procedure calls in semantic actions may be specified as input, output or transient parameters by writing the characters '↓', '↑' or '↓↑' in front of them ('!', '^' and '!^' on an ASCII keyboard). This is a simple way to make procedure calls more readable. In the resulting compiler these marks are removed.

5.9 Example Indication of data flow at parameters

    ComputeValues(↓argument1,↓argument2,↑result);

Semantic macros

Sometimes a semantic action is needed at more than one location in a grammar. To avoid rewriting the action, the user can define a macro for it and call it whenever he needs it.

    SemMacroDefinition = "sem" ":" MacroName ":" {any} "endsem".
    SemMacroCall = "sem" "(" MacroName ")" "endsem".
    MacroName = identifier.

A macro definition is a semantic action headed by a macro name which is enclosed in colons. It must be given in a special section of the semantic declarations (see Section 5.3.4).

Note: The use of semantic macros also reduces the code size of the resulting compiler.
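As a rough illustration of when each action of Example 5.8 runs, the following Python sketch hand-codes the production Text = {Word}. It is not output of Coco: the names parse_text and write_counter are invented here, the input is assumed to be an already tokenized list of words, and write_counter only imitates a semantic macro by returning the text instead of printing it.

```python
def write_counter(count):
    # Stands in for the final action of Example 5.8:
    # IF count>0 THEN WriteCard(count,3); WriteString(" words") END
    return f"{count:3d} words" if count > 0 else ""


def parse_text(words):
    # Text = sem count:=0 endsem
    count = 0
    # {Word sem count:=count+1 endsem}
    for _word in words:      # one iteration per recognized Word
        count += 1           # runs right after the Word to its left
    # final action, written once and reusable like a macro
    return write_counter(count)


print(repr(parse_text(["a", "compiler", "generator"])))
```

Keeping the final action in one function mirrors the point of a macro: the action is written once and can be called from several places in the grammar.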
5.10 Example Semantic macros

Suppose the last semantic action of Example 5.8 is needed more than once. The action is defined as a macro in the semantic declarations as follows (see Section 5.3.4):

    MACROS
      sem :WriteCounter:
        IF count>0 THEN
          WriteCard(count,3); WriteString(" words")
        END
      endsem

It may then be called by writing

    sem (WriteCounter) endsem

Semantic actions for pragmas

A semantic action may be associated with the declaration of a pragma. This means that the action is executed every time the parser reads the pragma. In this way a pragma can cause the execution of a semantic action although it does not occur in any production.

5.11 Example Semantic actions for pragmas

    PRAGMAS
      eolsy   sem
                PrintLineInfo;   -- call a semantic procedure
                Emit(↓eol)       -- write pragma to next interpass file
              endsem

5.3.2 Attributes

Attributes describe semantic properties of symbols and their context.

    Attributes = "<" OutAttributes ">" | "<" InAttributes [";" OutAttributes] ">".
    InAttributes = "in" ":" InAttr {"," InAttr}.
    OutAttributes = "out" ":" OutAttr {"," OutAttr}.
    InAttr = identifier | number.
    OutAttr = identifier.

In Cocol, attributes play the role of parameters of the grammar symbols. They are classified into input attributes, which are passed to a nonterminal for its recognition, and output attributes, which arise during the recognition of a symbol. We also distinguish between formal and actual attributes. Formal attributes occur in the declaration of a symbol or are attached to nonterminals on
the left-hand side of a production. Actual attributes are attached to symbols on the right-hand side of a production.

5.12 Example Attributes

    NONTERMINALS
      Variable <in:type; out:object>           -- type: formal input attribute
                                               -- object: formal output attribute
    RULES
      Variable <in:type; out:object>           -- type: formal input attribute
                                               -- object: formal output attribute
      Declaration = Variable <in:tp; out:obj>  -- tp: actual input attribute
                                               -- obj: actual output attribute

Attribute names may be used like variables in semantic actions.

Attributes of nonterminals

Nonterminals may have input and output attributes of arbitrary types. The type of an attribute is declared like the type of any other variable (see Section 5.3.4). Formal and actual attributes must be assignment compatible in the sense of Modula-2, although this is not checked by Coco.

Whenever a nonterminal occurs, all its attributes must follow it. Formal and actual attributes must correspond in number, sequence, and kind (in or out). A numeric constant may only be specified as an actual input attribute.

Attribute evaluation is similar to parameter passing in procedures: before the recognition of a nonterminal is started, the values of the actual input attributes of the nonterminal are assigned to its formal input attributes; when the nonterminal has been recognized, the formal output attribute values are assigned to its actual output attributes.

Attributes of terminals and pragmas

Terminals and pragmas may have only output attributes. For implementation reasons their size is restricted to word size. This restriction can be circumvented by using abstract data types for longer attributes.

Whenever a terminal or a pragma occurs, all its attributes must follow it. For terminals, the names of the formal attributes are insignificant, but for pragmas they are significant as they may be used in a semantic action. Pragmas don't have actual attributes since they cannot appear on the right-hand side of a production. The attribute values of terminals and pragmas are supplied by the scanner (see Section 6.4.2).
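The analogy between attribute evaluation and parameter passing can be made concrete with a small Python sketch. It is only an analogy under stated assumptions: the right-hand sides used here (a type name followed by one identifier) are hypothetical, since Example 5.12 does not give them, and the dictionary returned as object is an invented stand-in for whatever the semantic modules would build.

```python
def variable(type_, symbols):
    # Variable <in:type; out:object>
    # type_ is the formal input attribute; the returned value plays the
    # role of the formal output attribute 'object'.
    name = symbols.pop(0)                 # hypothetical: recognize one identifier
    return {"name": name, "type": type_}


def declaration(symbols):
    # Declaration = Variable <in:tp; out:obj>
    tp = symbols.pop(0)                   # hypothetical: a type name comes first
    obj = variable(tp, symbols)           # tp is copied in, obj is copied out
    return obj


print(declaration(["INTEGER", "counter"]))
```

The call variable(tp, symbols) shows both directions at once: the actual input attribute tp is assigned to the formal type_ before recognition starts, and the formal output value is assigned to the actual obj when recognition ends.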
5.3.3 Context conditions

There is no special language construct for context conditions in Cocol. They are written as conditional statements in semantic actions. This has the drawback of hiding them somewhat but has the advantage that arbitrary error actions can be associated with them.

5.13 Example Context conditions

    sem
      IF type1=type2   -- context condition
      THEN ...         -- semantic action
      ELSE ...         -- error action
      END
    endsem

5.3.4 Semantic declarations

All variables, procedures and named constants that are used as attributes or in semantic actions must be declared. The compiler description can be viewed as a module to which these objects are local. The user may also import objects from other modules.

    SemanticDeclarations = [ObjectDeclarations] [SemMacroDeclarations].

Declarations of semantic objects

    ObjectDeclarations = "SEMANTIC" "DECLARATIONS" modulatext.

modulatext is an arbitrary text of import statements and constant, type, variable, or procedure declarations in Modula-2. The syntax of this text is not checked by Coco.

5.14 Example Declarations of semantic objects

    SEMANTIC DECLARATIONS
      FROM InOut IMPORT WriteCard, WriteString;
      FROM UserModule IMPORT UserProcedure;
      CONST maxint = 32767;
      VAR field: ARRAY[1..100] OF CHAR;
      PROCEDURE Equal(x,y:ARRAY OF CHAR): BOOLEAN;
      BEGIN ... END Equal;
Declaration of semantic macros

At this point the user may declare a set of semantic macros which can be used in the productions.

    SemMacroDeclarations = "MACROS" {SemMacroDefinition}.
    SemMacroDefinition = "sem" ":" MacroName ":" {any} "endsem".
    MacroName = identifier.

An example of the definition and the use of a semantic macro can be found in Section 5.3.1 (Example 5.10).

5.3.5 Scope of semantic objects

For implementation reasons, the scope of a semantic object cannot be restricted to a single production: all declared and imported objects are global to the whole compiler description. This means that the value of a semantic object may be destroyed by a nonterminal that is processed between the assignment and the use of that object. One has to resort to the following remedies:

1. Naming conventions. Every production should use its own names for those attributes and semantic objects which may be destroyed by another production. This reduces the problem to semantic objects of recursive nonterminals.

2. Stacking. All values which may be destroyed by a nonterminal should be stacked before this nonterminal is entered and unstacked afterwards.

5.15 Example Stacking of semantic objects

    Expression<out:exprval> =
      Term<out:exprval>
      {"+"                  sem Push(↓exprval) endsem
       Term<out:x>          sem Pop(↑exprval); exprval:=exprval+x endsem
      }.
    Term<out:termval> =
      Factor<out:termval>
      {"*"                  sem Push(↓termval) endsem
       Factor<out:x>        sem Pop(↑termval); termval:=termval*x endsem
      }.
    Factor<out:factval> =
      integer<out:factval>
      | "(" Expression<out:factval> ")".

The original values of exprval and termval are destroyed by the recursive calls to Term and Factor, so they must be saved on a stack.
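To see why the stacking in Example 5.15 matters, the following Python sketch mimics the three productions with module-level variables, which stand in for Cocol's global semantic objects. It is a hand-written illustration, not code produced by Coco; the helper names (peek, advance, evaluate) and the save switch are assumptions made for the demonstration, and there is no error handling.

```python
import re

stack = []
exprval = termval = factval = x = 0     # global, like all Cocol semantic objects
tokens, pos, use_stack = [], 0, True


def peek():
    return tokens[pos] if pos < len(tokens) else None


def advance():
    global pos
    pos += 1


def expression():                        # Expression<out:exprval>
    global exprval, x
    term()
    exprval = termval                    # Term<out:exprval>
    while peek() == "+":
        advance()
        if use_stack:
            stack.append(exprval)        # Push(exprval)
        term()
        x = termval                      # Term<out:x>
        if use_stack:
            exprval = stack.pop()        # Pop(exprval)
        exprval = exprval + x


def term():                              # Term<out:termval>
    global termval, x
    factor()
    termval = factval                    # Factor<out:termval>
    while peek() == "*":
        advance()
        if use_stack:
            stack.append(termval)        # Push(termval)
        factor()
        x = factval                      # Factor<out:x>
        if use_stack:
            termval = stack.pop()        # Pop(termval)
        termval = termval * x


def factor():                            # Factor<out:factval>
    global factval
    if peek() == "(":
        advance()
        expression()
        factval = exprval                # Expression<out:factval>
        advance()                        # skip ")"
    else:
        factval = int(peek())            # integer<out:factval>
        advance()


def evaluate(text, save=True):
    global tokens, pos, use_stack
    tokens, pos, use_stack = re.findall(r"\d+|[+*()]", text), 0, save
    stack.clear()
    expression()
    return exprval


print(evaluate("2*(3+4)"), evaluate("2*(3+4)", save=False))
```

With stacking the input 2*(3+4) evaluates to 14. With save=False the nested Expression overwrites termval (the 2 becomes 4) before the multiplication is carried out, and the result is 28.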
6 The compiler compiler Coco

This chapter describes the compiler compiler Coco from the user's point of view. It contains everything the user needs to know in order to produce a compiler with Coco. Section 6.1 presents a survey of the main characteristics of Coco, Section 6.2 describes the components of the generated compilers, and Section 6.3 shows how these compilers work. Since Coco produces only the basic parts of a compiler, the user must supply additional modules to get a complete compiler. Section 6.4 describes the interfaces for these modules and Section 6.5 shows how a multi-pass compiler can be produced with Coco.

6.1 Characteristics

Coco is a program which generates the basic parts of a compiler from a compiler description that is supplied as its input. The characteristics of Coco are:

1. The compiler definition language Cocol is easy to read and easy to learn. It is based on L-attributed grammars whose syntax rules are written in Wirth's EBNF notation, and whose semantic actions are coded directly in Modula-2.

2. Coco and the compilers produced by it are small and efficient, since they use simple analysis techniques (table-driven top-down parsing and L-attributed grammars), and since the parser tables are encoded in a very compact form (G-code). Therefore, they can be used efficiently on microcomputers with a small memory and limited processor performance.
3. The generated compilers contain a syntax error-recovery algorithm that is automatically derived from the attributed grammar. This frees the user from developing individual error handlers for each target compiler.

4. The user can attach modules of his own to the generated compiler parts, thus adapting the compiler to his particular needs.

5. The input grammar is checked for completeness, consistency, and unambiguity.

6. Coco supports the production of multi-pass compilers for languages that cannot be translated in a single pass, or that are so large that a single-pass compiler will not fit into memory.

7. Coco offers the possibility of excluding selected source text portions from syntax analysis. Thus, it is possible to describe complements of regular languages, or to forward parts of the input from one pass to the next without modification.

8. Besides terminals and nonterminals, Coco provides a third class of symbols called pragmas. Pragmas are special terminals that can appear at arbitrary positions in the input stream, but are not part of the syntax of the language itself (e.g. end-of-line symbols or compiler options).

How to invoke Coco

The invocation of Coco and the naming of the files involved depend on the computer on which Coco is running. We describe the version for the Apple Macintosh.

On the Macintosh, Coco is invoked by clicking its icon and by selecting an input file from the open dialog box which shows all available text files. Fig. 6.1 is a block diagram of a Coco run.

[Fig. 6.1 Input and output files of Coco. Inputs: compiler description in Cocol, program frames. Outputs: source list, syntax analyzer, semantic evaluator.]

Coco reads a compiler description and produces the following:

1. a syntax analyzer as described in Section 2.5 together with parser tables (G-code and symbol information);
2. a semantic evaluator as described in Section 3.6;

3. a source list of the Cocol input with any syntax and semantic error messages, with the results of the grammar tests and with statistical data about the grammar.

The syntax analyzer and the semantic evaluator are generated from program frames on files. On the Macintosh, the generated parts are written to the following files:

    Syntax analyzer:     grammarnamesyn.DEF, grammarnamesyn.MOD
    Semantic evaluator:  grammarnamesem.DEF, grammarnamesem.MOD
    Source list:         inputname.LST

grammarname is the grammar name specified in Cocol, inputname is the name of the input file. Section 8.3 shows an example of these files.

6.2 Components of the generated compiler

In order to get a complete compiler, the user must attach his own modules to the compiler parts produced by Coco. The following table shows which parts are generated by Coco, which must be supplied by the user, and which are available as standard modules.

    Generated by Coco     User-supplied       Standard module
    Syntax analyzer       Main program        Error message module
    Semantic evaluator    Lexical analyzer
                          Semantic modules

Hence, Coco generates only the basic parts of a compiler (those which are described by the attributed grammar). For flexibility, the remaining parts may be written individually, although they are very similar in all compilers (see program listings in Appendix F).

The lexical analyzer can be generated with the scanner generator Alex (Mössenböck [1986]), which is a separate tool not described in this book. It produces a scanner module in Modula-2 that fits the modules generated by Coco exactly.

The semantic modules are written in Modula-2. Only a few conventions have to be obeyed (see Section 6.4).
6.3 Operation of the generated compiler

Figure 6.2 shows the overall structure of a generated single-pass compiler. The main program calls the syntax analyzer. The syntax analyzer parses the source program by interpreting the G-code and executes semantic actions contained in the semantic evaluator, which in turn call semantic procedures to emit the target code. A filter procedure between the actual syntax analyzer and the lexical analyzer filters any pragmas out of the input stream and processes them semantically.

To create a multi-pass compiler, one must write a compiler description for each pass separately and translate it with Coco. This results in a syntax analyzer and a semantic evaluator for each pass. Figure 6.3 shows the interaction of the generated parts in a two-pass compiler. The first pass reads the source program, processes it and generates an intermediate language (IL). The second pass reads the intermediate language, processes it again and generates the target code.

[Fig. 6.2 Overall structure of a generated single-pass compiler. Components: main program, lexical analyzer (reads the source text), syntax analyzer, error message module (writes the error messages), semantic evaluator, semantic procedures (write the target code).]

[Fig. 6.3 Overall structure of a generated two-pass compiler. Components: main program, lexical analyzer (reads the source text), syntax analyzer 1, semantic evaluator 1, semantic procedures 1 (write the IL), syntax analyzer 2, semantic evaluator 2, semantic procedures 2 (write the target code).]
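The pragma filter described in Section 6.3 can be pictured with a short Python sketch. It is an assumption-laden illustration rather than the generated Modula-2 code: symbols are modelled as (number, attributes) pairs, first_pragma is the lowest pragma number (highest terminal number plus one), and actions is a hypothetical table mapping pragma numbers to their semantic actions.

```python
def filter_pragmas(symbols, first_pragma, actions):
    """Yield only the symbols the parser should see; run pragma actions on the way."""
    for typ, at in symbols:
        if typ >= first_pragma:              # pragmas are numbered after the terminals
            action = actions.get(typ)
            if action is not None:
                action(at)                   # process the pragma semantically
        else:
            yield typ, at                    # terminals and end-of-file (0) pass through


log = []
source = [(1, ["x"]), (4, []), (2, []), (0, [])]
parser_input = list(filter_pragmas(source, 4, {4: lambda at: log.append("new line")}))
print(parser_input, log)
```

Because the filter sits in front of the parser, the grammar never has to mention the pragma, yet its action still runs every time the pragma is read.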
6.4 Interfaces of the generated compiler

A compiler nucleus produced by Coco has four interfaces (shown in Fig. 6.4). It is called by the main module, reads the input stream, translates it into an output stream, and produces error messages. This nucleus is the same for all generated compilers. The user must attach some of his own modules to these interfaces to adapt the compiler to his particular needs.

[Fig. 6.4 Interfaces of a generated compiler. The nucleus (syntax analyzer and semantic evaluator) is surrounded by the operating system interface, the input interface, the error interface and the output interface.]

6.4.1 Caller interface

The main program must call the syntax analyzer of the generated compiler to perform the syntax analysis and semantic processing of the input text. The following definition module shows the interface between the syntax analyzer and the main program.

    DEFINITION MODULE grammarnamesyn;
      VAR
        printinput: BOOLEAN;   (*trace the input?*)
        printnodes: BOOLEAN;   (*trace the parser?*)
      PROCEDURE Parse(VAR correct:BOOLEAN);
    END grammarnamesyn.

grammarnamesyn is the name of the generated syntax analyzer (the grammar name from Cocol with the suffix syn). The procedure Parse is the actual syntax analyzer. It must be called from the main program of the compiler. Prior to this, the lexical analyzer (see Section 6.4.2) must be initialized and ready to supply the first symbol. The parameter correct shows whether syntax errors have been found. The variables printinput and printnodes can be set to TRUE in order to produce a trace of the syntax analysis for debugging.
6.4.2 Input interface

The syntax analyzer expects the input from a procedure GetSy which must be supplied by the user in a module grammarnamelex (grammar name from Cocol with the suffix lex). The corresponding definition module must look like this:

    DEFINITION MODULE grammarnamelex;
      VAR
        typ: CARDINAL;              (*current symbol number*)
        at: ARRAY[1..10] OF CHAR;   (*attributes of the current symbol*)
        line: CARDINAL;             (*current symbol line number*)
        col: CARDINAL;              (*current symbol column number*)
      PROCEDURE GetSy;
    END grammarnamelex.

Every time the syntax analyzer needs a new terminal, it calls the procedure GetSy, which returns the symbol number, line number and column number of the next source symbol in the global variables typ, line and col. It also fills the array at. If a symbol has i attributes, then at[1..i] holds their values. at is implicitly imported in any attributed grammar. It can contain a maximum of 10 attributes, which experience has shown to be sufficient. If imported, typ, line, and col can be used in the attributed grammar to get the type and the attributes of symbols that are recognized by the special symbol any.

The symbol numbers returned by GetSy must correspond to the declaration sequence of the terminals and pragmas in the compiler description. The first declared symbol must have the number 1, the next symbol must have 2, and so on. At the end of the input stream GetSy must return an end-of-file symbol, which by convention has the symbol number 0.

6.4.3 Output interface

For the generation of object code and other compiler outputs the user is not bound by any restrictions. One can arbitrarily attach one's own modules to the compiler nucleus and call one's procedures from the semantic actions of the attributed grammar. Thus, the output interface is the interface to all user-supplied semantic modules. It is described by the import clauses in the semantic declarations of the compiler description and by the imported definition modules.
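The numbering convention of the input interface can be sketched in Python as follows. This is not the Modula-2 module that Coco expects, only a model of it under assumptions: the class name Scanner and its regular expression are invented, the declarations are taken from Examples 5.5 and 5.6 (without the option pragma), and the identifier text is stored directly in at, whereas a real scanner would store a word-size value such as a spelling index.

```python
import re

TERMINALS = ["id", ":=", ";"]        # declaration order gives numbers 1, 2, 3
PRAGMAS = ["end of line"]            # numbered after the last terminal: 4
SYMBOL_NUMBER = {name: n for n, name in enumerate(TERMINALS + PRAGMAS, start=1)}
EOF = 0                              # never declared, always number 0


class Scanner:
    """Model of grammarnamelex: get_sy fills typ, at, line and col."""

    def __init__(self, text):
        self.pending = []
        for line_no, line in enumerate(text.splitlines(), start=1):
            for m in re.finditer(r":=|;|[A-Za-z]\w*", line):
                word = m.group()
                typ = SYMBOL_NUMBER.get(word, SYMBOL_NUMBER["id"])
                at = [word] if typ == SYMBOL_NUMBER["id"] else []
                self.pending.append((typ, at, line_no, m.start() + 1))
            # every line end is reported as a pragma
            self.pending.append((SYMBOL_NUMBER["end of line"], [], line_no, len(line) + 1))
        self.typ, self.at, self.line, self.col = EOF, [], 0, 0

    def get_sy(self):
        if self.pending:
            self.typ, self.at, self.line, self.col = self.pending.pop(0)
        else:
            self.typ, self.at = EOF, []      # end-of-file is the last symbol


scanner = Scanner("x := y;\n")
numbers = []
while True:
    scanner.get_sy()
    numbers.append(scanner.typ)
    if scanner.typ == EOF:
        break
print(numbers)
```

Deriving SYMBOL_NUMBER from the declaration lists keeps the scanner and the compiler description in step: reordering the TERMINALS section changes the numbers in one place only.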
6.4.4 Syntax error interface

The syntax analyzer of the generated compiler automatically recovers from a syntax error and gathers information about the cause of the error. However, the user must provide for the output of the error message by supplying a procedure SyntaxError exported from a module Errors (see standard module in Appendix F). This procedure is called by the syntax analyzer each time a syntax error occurs. It can print the error message immediately or store it in order to display all error messages together at the end of the compilation. The definition module Errors must have the following form:

    DEFINITION MODULE Errors;
      TYPE
        Symbolname = ARRAY[1..25] OF CHAR;
        Errorptr = POINTER TO Errornode;
        Errornode = RECORD
          txt: Symbolname;   (*symbol name*)
          l: CARDINAL;       (*length of symbol name*)
          next: Errorptr;    (*to next symbol of the same message*)
        END;
      PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
    END Errors.

SyntaxError has three parameters: symbols is a pointer to a linked list of those symbols that are expected at the error location (if available, alias names are used in place of symbol names). The parameters line and col indicate the line number and column number of the error location. Figure 6.5 shows a sample list of expected symbols pointed to by the parameter symbols.

[Fig. 6.5 List of expected symbols: symbols -> colon -> semicolon -> END. colon is the symbol causing the error; semicolon or END have been expected instead.]

The first node of the list contains the symbol that caused the error (in this case the colon), the subsequent nodes contain the symbols that were expected instead of the erroneous symbol (in this case semicolon and END). SyntaxError can now produce the following message:

    Syntax error in line...column...near colon: semicolon or END expected
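How such a message might be assembled from the list can be sketched in Python. The record mirrors Errornode (the length field is dropped because Python strings carry their length), while the function name syntax_error, the decision to return the message instead of printing it, and the exact spacing of the message are assumptions of this sketch, not the standard module of Appendix F.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ErrorNode:
    txt: str                              # symbol name (or alias name)
    next: Optional["ErrorNode"] = None    # next symbol of the same message


def syntax_error(symbols, line, col):
    found = symbols.txt                   # first node: the symbol that caused the error
    expected = []
    node = symbols.next                   # remaining nodes: the expected symbols
    while node is not None:
        expected.append(node.txt)
        node = node.next
    return (f"Syntax error in line {line} column {col} "
            f"near {found}: {' or '.join(expected)} expected")


symbols = ErrorNode("colon", ErrorNode("semicolon", ErrorNode("END")))
print(syntax_error(symbols, 12, 5))
```

A variant that appends each message to a list instead of returning it would give the second behaviour mentioned above, where all messages are shown together at the end of the compilation.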
6.5 Generation of multi-pass compilers

With L-attributed grammars, some languages can only be translated in multiple passes. Other languages are so complex that a single-pass compiler would not fit into the memory of a microcomputer. For these reasons, a compiler must often be split into several passes. Each pass is a compiler of its own. It reads the source program or an intermediate language, from which it produces a new intermediate language or the target program. To write a multi-pass compiler, one must write a compiler description for each pass, and then put the produced compiler passes in sequence (see Fig. 6.3). Cocol has features that are specially designed for the generation of multi-pass compilers:

Input from an intermediate language. It is possible to read an intermediate language file instead of a source text by simply supplying an appropriate input procedure GetSy (see Section 6.4.2).

Pragmas serve mainly to pass control information from one pass to the next in the intermediate language. Before they get to the syntax analyzer of the next pass they are extracted from the input stream and processed semantically.

The symbol any. The grammar symbol any can be used to exclude parts of the source text from the syntax analysis, and to forward them unchanged to the next pass.

6.1 Example Application of any

A typical application of the complement symbol any is to process declarations in the first pass of a compiler and statements in the second pass. The following example skips statements and forwards them to the next pass:

    Block = Declarations BEGINSY { any } ENDBLOCKSY.

Here, any denotes all terminal symbols except ENDBLOCKSY. It can be semantically processed using the variables typ and at exported by the lexical analyzer (see Section 6.4.2).

    sem
      Copy(↓typ,↓line,↓col,↓at);   -- copy symbol to next
                                   -- intermediate language
    endsem
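The effect of { any } followed by ENDBLOCKSY can be imitated in Python as a loop that forwards every symbol until the closing symbol appears. This is a sketch under assumptions: the symbol number chosen for ENDBLOCKSY is arbitrary, copy_statements and emit are invented names (emit plays the role of Copy), and symbols are modelled as (typ, line, col, at) tuples.

```python
ENDBLOCKSY = 6    # hypothetical symbol number


def copy_statements(symbols, emit):
    """Forward symbols unchanged until ENDBLOCKSY; return False if input ends first."""
    for typ, line, col, at in symbols:
        if typ == ENDBLOCKSY:             # the only terminal that 'any' does not match
            return True
        emit(typ, line, col, at)          # corresponds to Copy(typ, line, col, at)
    return False


next_pass = []
body = iter([(1, 3, 1, ["x"]), (2, 3, 3, []), (6, 4, 1, []), (1, 5, 1, ["y"])])
closed = copy_statements(body, lambda *symbol: next_pass.append(symbol))
print(closed, next_pass)
```

The first pass never inspects the forwarded symbols, so the statement syntax only has to be described in the compiler description of the second pass.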
7 The implementation

In this chapter we will show how Coco is structured and how it works. First we provide an overview of its design (7.1). Then we describe the internal data structures such as the symbol list (7.2) and the top-down graph (7.3), as well as the collection of some sets of terminal symbols (7.4). Section 7.5 covers various grammar tests which the top-down graph is subjected to before the target compiler is generated. The last three sections cover the generation of the compiler parts, namely the parser tables (7.6), the syntax analyzer (7.7), and the semantic evaluator (7.8). Section 8.3 shows an example of the generated compiler parts for a specific input grammar.

At the beginning of each section, a diagram is used to illustrate how this section relates to the structure of Chapter 7.

[Fig. 7.1 Structure of Chapter 7. The implementation: structure of the symbol list; structure of the top-down graph; collecting the symbol sets; grammar tests; generation of the parser tables; generation of the syntax analyzer; generation of the semantic evaluator.]

We describe algorithms in an abstract manner, using Adele or Cocol. Appendix F contains the concrete implementation of Coco. Details that are not
necessary for understanding the algorithms are omitted as they can be found in the program listings.

Coco is written in Modula-2 and has been implemented on various microcomputers including Macintosh, IBM-PC, Atari and Lilith. It produces compilers in Modula-2 and was used for its own implementation, too. We describe the implementation on the Macintosh.

7.1 Survey

Like any compiler, Coco is composed of an analysis part (front end) and a synthesis part (back end). The analysis part consists of a lexical analyzer and a syntax analyzer. The synthesis part consists of a semantic evaluator with several semantic modules attached to it (Fig. 7.2).

[Fig. 7.2 Structure of Coco with its main tasks shown as semantic modules. Components: main program, syntax analyzer, lexical analyzer, semantic evaluator. Semantic modules: symbol list handler, top-down graph handler, grammar tests, generation of the syntax analyzer, generation of the semantic evaluator.]

From the above, the main tasks of Coco are:

1. handling a symbol list: Symbol information is stored (name, symbol number, attribute, scope, etc.);

2. handling a top-down graph: Graph nodes are generated and linked to form subgraphs;

3. testing the grammar: The grammar is checked to see if it is complete, non-circular, and LL(1). It is also checked to see whether all nonterminals can be reached and derived into terminal strings;

4. generating the syntax analyzer: The source code of the generated syntax analyzer is built from fixed frame parts, and variable parts derived from
the compiler description. It includes LL(1) parser tables generated from the attributed grammar;

5. generating the semantic evaluator: The source code of the semantic evaluator is built from fixed frame parts and from semantic actions and declarations copied from the compiler description.

The main algorithm of Coco is as follows:

    Coco:
      Initialize lexical analyzer;
      Parse(↑ok);                                  -- Section 2.4
      if ok then
        Find deletable symbols;                    -- Section 7.4.1
        Insert eps-nodes before deletable nt's;    -- Section 7.3.3
        Delete redundant eps-nodes;                -- Section 7.3.4
        Get symbol sets;                           -- Section 7.4
        Test grammar(↑ok);                         -- Section 7.5
      end;
      if ok then
        Generate compiler;                         -- Sections 7.6 and 7.7
      else
        Print error message;
      end;
    end Coco;

The procedure Parse parses the input text and calls the semantic actions for the construction of the top-down graph and the symbol list as well as for the generation of the semantic evaluator. After some tests and transformations of the data structures the target compiler is produced.

7.2 Structure of the symbol list

Coco handles a symbol list with information about terminals, nonterminals, and pragmas. This section describes its representation and shows how it is filled.

7.2.1 Symbol list representation

The symbol list is a linear list of symbol nodes, each of them describing a syntax symbol. The list is indexed by symbol numbers.

    TYPE
      Symboltype = (eps,t,pr,nt,any,err);
        (*eps, terminal, pragma, nonterminal, any, error-symbol*)
      Symbolnode = RECORD
        spix: CARDINAL;               (*spelling index of symbol name*)
        aliasspix: CARDINAL;          (*spelling index of alias name*)
        nra: CARDINAL;                (*number of attributes*)
        CASE typ: Symboltype OF       (*symbol kind*)
          t,eps,any: (*nothing*)
        | pr:
            sem1,sem2: CARDINAL;      (*pragma semantics*)
        | nt,err:
            start: CARDINAL;          (*start of top-down graph*)
            del: BOOLEAN;             (*TRUE if deletable*)
            firstat: Attributeptr;    (*to first formal attribute*)
        END;
      END;
      Symbollist = ARRAY[0..maxsymbol] OF Symbolnode;

[Fig. 7.3 Structure of Section 7.2: symbol list representation; symbol list construction.]

The fields spix, aliasspix, nra, and typ are filled when the symbol is declared. For terminals, this is the only information stored in the symbol list.

The node of a pragma has two additional fields denoting the semantic actions which the generated compiler has to execute when it reads this pragma. The first action is for the output attribute assignments (Section 7.8.4), the second is the semantic action associated with this pragma in Cocol. If no actions are to be executed, both fields are zero. The fields are filled when the pragma is declared.

Nonterminal nodes contain additional information: The field start points to the root of the top-down graph of this specific nonterminal. It is set when the corresponding rule has been processed. At the same time, the field del is set, which indicates whether the nonterminal is directly deletable, i.e. if it can be immediately derived into the empty string. The indirect deletability of a nonterminal can only be determined when the top-down graphs of all nonterminals have been built (see Section 7.4.1). Finally, nonterminal nodes have a field firstat pointing to a list of formal attributes. This list contains
the name and direction (input-output) of each attribute of the nonterminal. The attribute list is built when the nonterminal is declared. It is implemented as follows:

    TYPE
      Direction = (up,down);          (*attribute direction*)
      Attributeptr = POINTER TO Attribute;
      Attribute = RECORD
        spix: CARDINAL;               (*attribute name*)
        dir: Direction;               (*up,down*)
        next: Attributeptr;           (*to next attribute of same nt*)
      END;

Names of symbols and attributes are not stored in the symbol list directly. Rather, they are stored in a name list which is an array of characters. Instead of the actual names the symbol list contains only their address in the name list, called spix (spelling index). The lexical analyzer handles a hashed list of 'spixes' for fast searching of names.

7.2.2 Symbol list construction

For each symbol in the syntax declarations of Cocol, a symbol node with a successive number is allocated. Therefore, symbol numbers correspond to the declaration sequence of the symbols. The following procedures are used to generate, access, and modify symbol nodes:

    PROCEDURE NewSy(spix:CARDINAL; typ:Symboltype): CARDINAL;
    PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
    PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
    PROCEDURE RepSy(sy:CARDINAL; sn:Symbolnode);

NewSy generates a new symbol node with the fields spix and typ and returns its node number. SyNr searches for the symbol with the name spix. If spix is found, SyNr returns the corresponding symbol number, else it returns 65535 (the value of the null symbol). GetSy gets the symbol node sn corresponding to symbol number sy. RepSy replaces the symbol sy by the node sn.

Attributes are processed with the following procedures:

    PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
    PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
    PROCEDURE CompleteAt(sy,n:CARDINAL): BOOLEAN;

NewAt defines a new attribute for the symbol sy.
For nonterminals, it also appends the name (spix) and the direction (dir) of the attribute to the attribute list. GetAt gets the fields spix and dir of the nth attribute of the nonterminal sy. If sy has fewer than n attributes, then 0 is returned as the value of spix.
CompleteAt returns TRUE if the symbol sy has exactly n attributes. The implementation of these procedures is trivial, as can be seen in Appendix F.

7.3 Structure of the top-down graph

The top-down graph has already been described in Section 2.3 as an internal grammar representation. In Coco, it is implemented in a somewhat extended form. First, we will describe the extended top-down graphs, and then show how they are generated. In Section 7.6.2, we will describe the translation of top-down graphs into G-code.

Fig. 7.4 Structure of Section 7.3 (top-down graph representation, top-down graph construction, insertion of eps-nodes, removal of redundant eps-nodes)

7.3.1 Top-down graph representation

The top-down graph is a linear list of graph nodes. Each symbol on the right-hand side of a Cocol rule is represented by a node. The pointers linking the nodes are indices of this list.

  TYPE
    Topdowngraph = ARRAY[1..maxnode] OF Graphnode;
    Graphnode = RECORD
      typ: (eps,t,nt,any);     (*symbol kind*)
      sp: CARDINAL;            (*t,nt: pointer to node in symbol list*)
                               (*eps: pointer to eps-set*)
                               (*any: pointer to any-set*)
      lp: CARDINAL;            (*left pointer*)
      rp: CARDINAL;            (*right pointer*)
      sem1: CARDINAL;          (*in-attribute action*)
      sem2: CARDINAL;          (*out-attribute action*)
      sem3: CARDINAL;          (*explicit semantic action*)
      line: CARDINAL;          (*line number in the source text*)
      link: CARDINAL;          (*pointer to the next right end*)
    END;

Compared to Section 2.3 the graph node is extended by three semantic numbers, a line number, and a pointer (link). These fields have the following meaning:

  sem1: action number of the input attribute assignments or zero (Sect. 7.8.4);
  sem2: action number of the output attribute assignments or zero (Sect. 7.8.4);
  sem3: number of the user-written semantic action which follows this symbol in the Cocol text, or zero;
  line: line number of this symbol in the Cocol text (for error messages);
  link: pointer for linking the right ends of a graph (the right ends are the nodes whose right pointer is zero).

7.3.2 Top-down graph construction

It is useful to think of a top-down graph as a 'black box' linked to its environment by two pointers head and tail. The interior of the black box may contain a single node, or an arbitrarily complex graph with several nodes (Fig. 7.5).

Fig. 7.5 Top-down graph as a black box

head points to the root of the graph and tail to its right end. Since the right end of the graph usually consists of several nodes, these nodes are linked (shown as dashed lines in Fig. 7.5). The following procedures are used to generate and process the graph nodes:

  PROCEDURE NewNode(typ:Symboltype; sy,line:CARDINAL): CARDINAL;
  PROCEDURE GetNode(n:CARDINAL; VAR gn:Graphnode);
  PROCEDURE RepNode(n:CARDINAL; gn:Graphnode);

NewNode creates a graph node containing the specified symbol sy, having
the symbol type typ, and the line number line and returns its node number. GetNode returns the nth graph node in gn. RepNode replaces the nth graph node by gn.

Two top-down graphs can be combined into a new graph by arranging them either side by side as successive components or below one another as alternatives. In either case, a new top-down graph with head and tail is produced.

Linking of successive components

Coco uses the procedure ConcatRight to link successive components.

  ConcatRight(↕head1, ↕tail1, ↓head2, ↓tail2):
    param head1,head2,tail1,tail2: Cardinal;
    local p: Cardinal;
  begin
    p:=tail1;
    while p<>0 do
      gn(p).rp:=head2; p:=gn(p).link;
    end;
    tail1:=tail2;
  end ConcatRight;

ConcatRight links the graphs (head1, tail1) and (head2, tail2) via right pointers giving the new graph (head1, tail1). The right ends of the first graph are linked with the root of the second graph (see Fig. 7.6).

Fig. 7.6 Linking of successive components
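The linking of successive components can be tried out with a small Python sketch. It is only an illustration under stated assumptions: the class Node, the dictionary gn, and the function name concat_right are hypothetical stand-ins for Coco's graph list and ConcatRight, and the new head and tail are returned instead of being passed back through parameters.

```python
from dataclasses import dataclass

@dataclass
class Node:
    sym: str       # symbol carried by the node (illustrative only)
    lp: int = 0    # left pointer: next alternative, 0 = none
    rp: int = 0    # right pointer: successor, 0 = right end
    link: int = 0  # chain of right ends, 0 = end of chain

def concat_right(gn, head1, tail1, head2, tail2):
    """Append graph (head2, tail2) to graph (head1, tail1)."""
    p = tail1
    while p != 0:
        gn[p].rp = head2   # each right end of graph 1 continues in graph 2
        p = gn[p].link
    return head1, tail2    # the right ends of graph 2 become the new tail

# Graph 1 consists of the alternatives a | b (nodes 1 and 2),
# graph 2 is the single node c (node 3).
gn = {1: Node("a", lp=2, link=2), 2: Node("b"), 3: Node("c")}
head, tail = concat_right(gn, 1, 1, 3, 3)
```

After the call, both right ends of the first graph (nodes 1 and 2) point to node 3, and the combined graph is described by head 1 and tail 3.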
Linking of alternatives

Coco uses the procedure ConcatLeft to link alternatives.

  ConcatLeft(↕head1, ↕tail1, ↓head2, ↓tail2):
    param head1,head2,tail1,tail2: Cardinal;
    local p: Cardinal;
  begin
    p:=head1;
    while gn(p).lp<>0 do p:=gn(p).lp; end;
    gn(p).lp:=head2;
    p:=tail1;
    while gn(p).link<>0 do p:=gn(p).link; end;
    gn(p).link:=tail2;
  end ConcatLeft;

ConcatLeft links the graphs (head1, tail1) and (head2, tail2) via left pointers giving the new graph (head1, tail1). The end of the first alternative chain of the first graph is linked with the root of the second graph. The right ends of both graphs are connected in a similar way (see Fig. 7.7).

Fig. 7.7 Linking of alternatives

An attributed grammar for the construction of top-down graphs

In order to show that attributed grammars can be used for documentation as well, we will describe the generation of the top-down graph for one syntax rule by means of an attributed grammar. The complete top-down graph is composed of the graphs for all syntax rules.
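The linking of alternatives can be illustrated in the same way. The following Python sketch is an assumption-laden illustration, not Coco's code: Node, gn, and concat_left are hypothetical names, and the node layout keeps only the fields needed here.

```python
from dataclasses import dataclass

@dataclass
class Node:
    sym: str       # symbol carried by the node (illustrative only)
    lp: int = 0    # left pointer: next alternative, 0 = none
    rp: int = 0    # right pointer: successor, 0 = right end
    link: int = 0  # chain of right ends, 0 = end of chain

def concat_left(gn, head1, tail1, head2, tail2):
    """Make graph (head2, tail2) the last alternative of graph (head1, tail1)."""
    p = head1
    while gn[p].lp != 0:      # walk to the end of the alternative chain
        p = gn[p].lp
    gn[p].lp = head2
    p = tail1
    while gn[p].link != 0:    # walk to the end of the right-end chain
        p = gn[p].link
    gn[p].link = tail2
    return head1, tail1

# Three single-node graphs a, b, c are combined into a | b | c.
gn = {1: Node("a"), 2: Node("b"), 3: Node("c")}
head, tail = concat_left(gn, 1, 1, 2, 2)
head, tail = concat_left(gn, head, tail, 3, 3)
```

The alternative chain now runs 1, 2, 3 via lp, and the same three nodes are chained via link as the right ends of the combined graph.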
The grammar of EBNF rules

  Rule       = identifier "=" Expression ".".
  Expression = Term {"|" Term}.
  Term       = Factor {Factor}.
  Factor     = symbol | "eps" | "any" | "(" Expression ")" | "[" Expression "]" | "{" Expression "}".

contains the nonterminals Expression, Term, and Factor. Each of these nonterminals supplies as an output attribute a top-down graph with the ends head and tail. These graphs can be linked in two different ways: factor graphs are linked via right pointers, term graphs via left pointers (ConcatRight and ConcatLeft). A new top-down graph is formed in either case, which is again represented by head and tail.

Expression, Term, and Factor also supply an output attribute del, which indicates if the term or factor is directly deletable, i.e. if it can be derived into the empty string. del is entered into the symbol list.

The attributed grammar uses the procedures described above to handle the symbol list (GetSy, RepSy, SyNr) and the top-down graph (NewNode, ConcatLeft, ConcatRight).

  GRAMMAR Rule -- graph generation for a single rule
  SEMANTIC DECLARATIONS
    FROM cocogra IMPORT NewNode, ConcatLeft, ConcatRight, Push, Pop;
    FROM cocosym IMPORT GetSy, RepSy, SyNr, Symbolnode, anysy, epssy;
    VAR h1,h2,h3: CARDINAL;          -- head pointers
        t1,t2,t3: CARDINAL;          -- tail pointers
        del1,del2,del3: BOOLEAN;     -- TRUE, if element is deletable
        sn: Symbolnode;
        spix,syspix: CARDINAL;       -- spelling indices
        sy: CARDINAL;                -- symbol number
  MACROS
    sem :PushValues: Push(↓h1); Push(↓t1); Push(↓del1); Push(↓h2); Push(↓t2); Push(↓del2); endsem
    sem :PopValues: Pop(↑del2); Pop(↑t2); Pop(↑h2); Pop(↑del1); Pop(↑t1); Pop(↑h1); endsem
  TERMINALS
    "=" "." "|" "(" ")" "[" "]" "{" "}" "eps" "any"
    symbol<out:spix>
  NONTERMINALS
    Rule
    Expression <out:h1,t1,del1>
    Term <out:h2,t2,del2>
    Factor <out:h3,t3,del3>
  RULES
    Rule =
      symbol<out:syspix>
      "="
      Expression<out:h1,t1,del1>
        sem sy:=SyNr(↓syspix); GetSy(↓sy,↑sn);
            sn.del:=del1; sn.start:=h1; RepSy(↓sy,↓sn); endsem
      ".".
    Expression<out:h1,t1,del1> =
      Term<out:h1,t1,del1>
      { "|" Term<out:h2,t2,del2>
          sem ConcatLeft(↕h1,↕t1,↓h2,↓t2); del1:=del1 OR del2; endsem
      }.
    Term<out:h2,t2,del2> =
      Factor<out:h2,t2,del2>
      { Factor<out:h3,t3,del3>
          sem ConcatRight(↕h2,↕t2,↓h3,↓t3); del2:=del2 AND del3; endsem
      }.
    Factor<out:h3,t3,del3> =
        symbol<out:spix>
          sem sy:=SyNr(↓spix); h3:=NewNode(↓sy); t3:=h3; del3:=FALSE; endsem
      | "eps"
          sem h3:=NewNode(↓epssy); t3:=h3; del3:=TRUE; endsem
      | "any"
          sem h3:=NewNode(↓anysy); t3:=h3; del3:=FALSE; endsem
      | "(" sem (PushValues) endsem
        Expression<out:h3,t3,del3>
        ")" sem (PopValues) endsem
      | "[" sem (PushValues) endsem
        Expression<out:h3,t3,del3>
          sem h1:=NewNode(↓epssy); t1:=h1;
              ConcatLeft(↕h3,↕t3,↓h1,↓t1); del3:=TRUE; endsem
        "]" sem (PopValues) endsem
      | "{" sem (PushValues) endsem
        Expression<out:h3,t3,del3>
          sem h1:=NewNode(↓epssy); t1:=h1;
              ConcatRight(↕h3,↕t3,↓h3,↓t3);
              ConcatLeft(↕h3,↕t3,↓h1,↓t1);
              t3:=t1; del3:=TRUE; endsem
        "}" sem (PopValues) endsem.
  ENDGRAM

Figure 7.8 shows which graphs are produced by the translation of an EBNF expression in brackets. As an example, we select the expression a b | c.

Fig. 7.8 Translation of an EBNF expression into a top-down graph (panels: (a b | c), [a b | c], {a b | c})

7.3.3 Insertion of eps-nodes

Normally each symbol of the input grammar corresponds to one node in the top-down graph. However, from Fig. 7.8, we see that the translation of expressions in square or curly brackets leads to the generation of additional eps-nodes which have no counterpart in the input grammar. They are inserted by Coco to indicate that an expression is deletable. There are also some other cases where eps-nodes must be inserted into graphs:

The algorithm of Section 7.3.2 will fail if a term that begins with an expression in curly brackets has an alternative. The production

  S = ({a} b | c).

would lead to the top-down graph shown in Fig. 7.9.
Fig. 7.9 Erroneous top-down graph for S = ({a} b | c)

This is obviously wrong because once an a has been identified, only a or b should follow, not c, as is possible in the above graph. This problem is solved by including an eps-node in front of the first alternative (Fig. 7.10).

Fig. 7.10 Correct top-down graph for S = ({a} b | c) with inserted eps-node

This graph is now correct since after identifying an a, only a or b can follow, not c. For each eps-node, the set of terminal successors (eps-set) is computed (Section 7.4.4). The eps-set of the node e1 (namely {a, b}) allows us to distinguish between the two alternatives in the above example. Eps-nodes are inserted in front of all expressions in curly brackets during the construction of the top-down graph (see attributed grammar in Appendix F).

Deletable nonterminals present a similar problem. If a nonterminal is deletable, it is always processed by the syntax analyzer, because if the current input symbol is not a start symbol of the nonterminal itself it may still be a valid successor. Now, if there is a node which is an alternative of a deletable nonterminal, this node will never be visited, since the nonterminal will always be recognized beforehand. Coco solves this problem by inserting an eps-node in front of a deletable nonterminal. The eps-set of this node is then used to distinguish between the alternatives. From the graphs shown in Fig. 7.11, where the deletable nonterminal Y has an alternative, the graphs in Fig. 7.12 are produced.

Fig. 7.11 Top-down graph with deletable nonterminal Y

Fig. 7.12 Top-down graph with inserted eps-node in front of deletable nonterminal Y
The eps-set of the node e1 (namely {a, c}; c is a terminal start of Y and a is a successor of the deletable nonterminal Y) enables the selection between the two alternatives starting with e1 and b. There are no more alternatives to the node with the deletable nonterminal Y. It can therefore be safely visited by the syntax analyzer. The algorithm for the insertion of eps-nodes in front of deletable nonterminals is shown below.

  Insert eps-nodes before deletable nt's:
    local gn,gn1: Graphnode;
          sn: Symbolnode;
          j: Cardinal;
  begin
    for all nodes i do
      GetNode(↓i,↑gn);
      if (gn.typ=nt) and (gn.lp<>0) then
        GetSy(↓gn.sp,↑sn);
        if sn.del then                   -- deletable nt with alternative
          gn1:=gn; gn1.lp:=0;            -- gn1 now holds the deletable nt
          j:=NewNode(↓nt,↓0,↓0);         -- create empty node
          RepNode(↓j,↓gn1);
          gn.typ:=eps; gn.sp:=0;         -- gn holds the new eps-node
          gn.rp:=j; gn.sem1:=0; gn.sem2:=0; gn.sem3:=0;
          RepNode(↓i,↓gn);
        end;
      end;
    end; -- for
  end Insert eps-nodes before deletable nt's;

7.3.4 Removal of redundant eps-nodes

When expressions in square or curly brackets are translated, eps-nodes arise that can be removed again if it turns out that the expressions have successors (see Fig. 7.13). The algorithm for the removal of redundant eps-nodes is shown below:

  Delete redundant eps-nodes:
    global visited: set of nodenumbers;  -- mark list for visited nodes
           sn: Symbolnode;
  begin
    visited:={};
    for all nonterminals i do
      GetSy(↓i,↑sn); DelEps(↓sn.start);
    end;
  end Delete redundant eps-nodes;
Fig. 7.13 Creation and removal of redundant eps-nodes (columns: EBNF expression, graph with redundant eps-nodes, equivalent graph without redundant eps-nodes; rows: [a] b and {a} b)

The procedure DelEps(↓loc) deletes all redundant eps-nodes in the top-down graph with the root loc. Redundant eps-nodes can be recognized by the following characteristics: they have no associated semantic actions, their left pointer is null, and their right pointer is not null. They always receive a link from the left pointer of some other node.

  DelEps(↓loc):
    param loc: Cardinal;
    global visited: set of nodenumbers;  -- mark list for visited nodes
    local gn,gn1: Graphnode;
  begin
    if loc=0 or loc in visited then return end;   -- end or cycle
    visited:=visited+{loc};
    GetNode(↓loc,↑gn);
    if gn.lp<>0 then                     -- test if alt. node is a redundant eps
      GetNode(↓gn.lp,↑gn1);
      if (gn1.typ=eps) and (gn1.sem3=0) and (gn1.lp=0) and (gn1.rp<>0) then
        gn.lp:=gn1.rp; RepNode(↓loc,↓gn);
      end;
    end;
    DelEps(↓gn.lp); DelEps(↓gn.rp);
  end DelEps;
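The effect of DelEps on the graph of [a] b can be reproduced with a short Python sketch. The names Node, gn, and del_eps are hypothetical, nodes are modified in place instead of being copied with GetNode and RepNode, and the node numbering of the example is an assumption made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str       # "t" or "eps"
    sp: str = ""   # terminal name for "t" nodes
    lp: int = 0    # left pointer (alternative), 0 = none
    rp: int = 0    # right pointer (successor), 0 = none
    sem3: int = 0  # explicit semantic action, 0 = none

def del_eps(gn, loc, visited):
    """Bypass redundant eps-nodes reachable from loc."""
    if loc == 0 or loc in visited:   # end of graph or cycle
        return
    visited.add(loc)
    node = gn[loc]
    if node.lp != 0:
        alt = gn[node.lp]
        if alt.typ == "eps" and alt.sem3 == 0 and alt.lp == 0 and alt.rp != 0:
            node.lp = alt.rp         # skip the eps-node, go to its successor
    del_eps(gn, node.lp, visited)
    del_eps(gn, node.rp, visited)

# [a] b before removal: node 1 = a, node 2 = eps (alternative of a), node 3 = b.
gn = {1: Node("t", "a", lp=2, rp=3), 2: Node("eps", rp=3), 3: Node("t", "b")}
del_eps(gn, 1, set())
```

After the call, the alternative of a is b itself. Node 2 is still stored in the list but is no longer reachable from the root.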
7.4 Collecting the symbol sets

So far, the input grammar has been read and the symbol list as well as the top-down graph have been built. From these two data structures, Coco calculates the symbol sets needed for the grammar tests and for the generated compiler.

Fig. 7.14 Structure of Section 7.4 (deletable nonterminals, terminal start symbols of nonterminals, terminal successors of nonterminals, eps-sets, any-sets)

Coco collects four sets of terminals:

  1. start symbols of nonterminals;
  2. successors of nonterminals;
  3. successors of eps-nodes (eps-sets);
  4. sets represented by any-symbols (any-sets).

The following procedures are used to access the top-down graph and the symbol list:

  PROCEDURE GetNode(loc:CARDINAL; VAR gn:Graphnode);
  PROCEDURE RepNode(loc:CARDINAL; gn:Graphnode);
  PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
  PROCEDURE RepSy(sy:CARDINAL; sn:Symbolnode);

GetNode gets the graph node gn with the number loc. RepNode replaces the graph node with the number loc by the node gn. GetSy gets the symbol node sn with the number sy. RepSy replaces the symbol node with the number sy by the node sn.

Before the symbol sets are collected, it is necessary to find out which nonterminals are deletable.
7.4.1 Deletable nonterminals

All deletable nonterminals are tagged in the symbol list. In the first step, those symbols are tagged which can be directly derived into the empty string. In the second step, all those nonterminals are tagged whose top-down graph can be traversed along a path of already tagged symbols. The second step is repeated until no more deletable symbols are found.

The directly deletable nonterminals are found when the top-down graph is created (see Section 7.3.2). The following algorithm finds the indirectly deletable nonterminals.

  Find deletable symbols:
    local sn: Symbolnode;
          changed: Boolean;
  begin
    repeat
      changed:=false;
      for all nonterminals i do
        GetSy(↓i,↑sn);
        if not sn.del and Deletable(↓sn.start) then
          sn.del:=true; RepSy(↓i,↓sn); changed:=true;
        end;
      end;
    until not changed;
  end Find deletable symbols;

The procedure Deletable(↓loc) checks if the top-down graph rooted at loc is deletable (i.e. if it can be traversed along a path of deletable symbols).

  Deletable(↓loc) Boolean:
    param loc: Cardinal;
    global marked: set of nodenumbers;   -- mark list for visited nodes
  begin
    marked:={};
    return DelGraph(↓loc);
  end Deletable;

The actual work is performed by the procedure DelGraph.

  DelGraph(↓loc) Boolean:
    param loc: Cardinal;
    global marked: set of nodenumbers;
    local gn: Graphnode;
  begin
    if loc=0 then return true; end;              -- end of graph found
    if loc in marked then return false; end;     -- already visited: cycle
    marked:=marked+{loc};
    GetNode(↓loc,↑gn);
    return ((gn.lp<>0) and DelGraph(↓gn.lp)) or  -- deletable alternat.
           (DelNode(↓gn) and DelGraph(↓gn.rp));  -- or deletable right
  end DelGraph;                                  -- part of graph
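The fixed-point iteration can be traced with a small Python sketch. It rests on several assumptions: Node, del_node, del_graph, and find_deletable are hypothetical names, nonterminals are identified by name rather than by symbol number, and the iteration starts from an empty set instead of from the directly deletable nonterminals that Coco has already tagged.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str       # "t", "nt" or "eps"
    sp: str = ""   # symbol name for "t" and "nt" nodes
    lp: int = 0    # left pointer (alternative), 0 = none
    rp: int = 0    # right pointer (successor), 0 = none

def del_node(node, deletable):
    """A node is deletable if it is eps or a deletable nonterminal."""
    if node.typ == "nt":
        return node.sp in deletable
    return node.typ == "eps"

def del_graph(gn, loc, deletable, marked):
    """True if the graph at loc has a path of deletable nodes to its end."""
    if loc == 0:
        return True                  # end of graph found
    if loc in marked:
        return False                 # already visited: cycle
    marked.add(loc)
    node = gn[loc]
    return ((node.lp != 0 and del_graph(gn, node.lp, deletable, marked))
            or (del_node(node, deletable)
                and del_graph(gn, node.rp, deletable, marked)))

def find_deletable(gn, start):
    """Repeat the tagging pass until no new deletable nonterminal is found."""
    deletable = set()
    changed = True
    while changed:
        changed = False
        for nt, root in start.items():
            if nt not in deletable and del_graph(gn, root, deletable, set()):
                deletable.add(nt)
                changed = True
    return deletable

# A = B B.   B = "b" | eps.   C = "c".
gn = {1: Node("nt", "B", rp=2), 2: Node("nt", "B"),
      3: Node("t", "b", lp=4), 4: Node("eps"),
      5: Node("t", "c")}
start = {"A": 1, "B": 3, "C": 5}
result = find_deletable(gn, start)
```

In this example B is found in the first pass and A only in the second, which is why the pass has to be repeated until nothing changes.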
Finally, DelNode checks if a node (i.e. its corresponding symbol) is deletable.

  DelNode(↓gn) Boolean:
    param gn: Graphnode;
    local sn: Symbolnode;
  begin
    if gn.typ=nt then
      GetSy(↓gn.sp,↑sn); return sn.del;
    else
      return gn.typ=eps;
    end;
  end DelNode;

7.4.2 Terminal start symbols of nonterminals

The terminal start symbols of a nonterminal are the terminal start symbols of its top-down graph, i.e. the start symbols of its first alternative chain. Those nodes of the chain which contain nonterminals will have their terminal start symbols calculated recursively. If the chain contains a deletable symbol, its successors also have to be considered. The terminal start symbols of all nonterminals are stored in a list.

  Get terminal start symbols:
    global first: array(nonterminals) of record
             ts: set of terminals;       -- terminal start symbols
             ready: Boolean;             -- true, if ts is computed
           end;
    local sn: Symbolnode;
  begin
    for all nonterminals i do first(i).ready:=false; end;
    for all nonterminals i do
      GetSy(↓i,↑sn);
      GetFirstSet(↓sn.start,↑first(i).ts);
      first(i).ready:=true;
    end;
  end Get terminal start symbols;

The procedure GetFirstSet(↓loc,↑s) supplies the terminal start symbols of the top-down graph with the root loc.

  GetFirstSet(↓loc,↑s):
    param loc: Cardinal; s: set of terminals;
    global visited: set of nodenumbers;  -- mark list for visited nodes
  begin
    visited:={};
    CollectFirst(↓loc,↑s);
  end GetFirstSet;
GetFirstSet initializes a mark list for the prevention of cycles and calls the procedure CollectFirst which does the actual work.

  CollectFirst(↓loc,↑s):
    param loc: Cardinal; s: set of terminals;
    global visited: set of nodenumbers;  -- mark list for visited nodes
           first: like in 'Get terminal start symbols';
    local sn: Symbolnode;
          gn: Graphnode;
          s1: set of terminals;
  begin
    s:={};
    while loc<>0 do                      -- for all alternatives
      if loc in visited then return; end;   -- cycle
      visited:=visited+{loc};
      GetNode(↓loc,↑gn);
      if DelNode(↓gn) then CollectFirst(↓gn.rp,↑s1); s:=s+s1; end;
      case gn.typ of
        t:   s:=s+{gn.sp};
      | nt:  if first(gn.sp).ready then s:=s+first(gn.sp).ts;
             else GetSy(↓gn.sp,↑sn); CollectFirst(↓sn.start,↑s1); s:=s+s1;
             end;
      | any: s:=s+{all terminals};
      | eps: -- nothing
      end;
      loc:=gn.lp;
    end;
  end CollectFirst;

The procedure DelNode(↓gn) from Section 7.4.1 checks if the graph node gn is deletable.

7.4.3 Terminal successors of nonterminals

The terminal successors of all nonterminals are stored in another list. They are collected in two steps: first, a search is made for the direct successors of all nonterminals (those terminals immediately following this nonterminal at all its occurrences in the graph); then the indirect successors are calculated (if a nonterminal is at the end of a rule, its indirect successors are the successors of the nonterminal on the left-hand side of this rule).

In the first step, the data structure follow is filled; this contains for each nonterminal i its direct successors (ts) and those nonterminals (nts), whose successors are indirect successors of i. In the second step, the indirect successors are added to ts.
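The second step can be sketched in Python on a tiny grammar whose first step has been carried out by hand. The names follow_ts, follow_nts, and complete are hypothetical; they mirror the fields ts and nts described above, with nonterminals identified by name.

```python
def complete(i, follow_ts, follow_nts, visited):
    """Add to follow_ts[i] the successors of all nonterminals in follow_nts[i]."""
    if i in visited:                 # cycle
        return
    visited.add(i)
    for j in follow_nts[i]:
        complete(j, follow_ts, follow_nts, visited)
        follow_ts[i] |= follow_ts[j]

# Grammar:  S = A "x".   A = "a" B.   B = "b".
# Step 1 (done by hand): A is directly followed by "x";
# B stands at the end of the rule for A, so it inherits the successors of A.
follow_ts = {"S": set(), "A": {"x"}, "B": set()}
follow_nts = {"S": set(), "A": set(), "B": {"A"}}

# Step 2: complete every nonterminal with a fresh mark list.
for nt in follow_ts:
    complete(nt, follow_ts, follow_nts, set())
```

After the loop, B has picked up the terminal "x" from A, while the sets of S and A are unchanged.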
  Get terminal successors:
    global follow: array(nonterminals) of record
             ts: set of terminals;       -- terminal successors
             nts: set of nonterminals;   -- nt's whose successors
           end;                          -- must be added to ts
           visitednod: set of nodenumbers;   -- mark list (visited nodes)
           visitedsym: set of nonterminals;  -- mark list (visited nt's)
    local sn: Symbolnode;
          i: Cardinal;
  begin
    for all nonterminals i do follow(i).ts:={}; follow(i).nts:={}; end;
    visitednod:={};
    for all nonterminals i do            -- fill follow.ts and follow.nts
      GetSy(↓i,↑sn); CollectFollow(↓sn.start,↓i);
    end;
    for all nonterminals i do            -- complete follow.ts
      visitedsym:={}; Complete(↓i); follow(i).nts:={};
    end;
  end Get terminal successors;

The procedure CollectFollow(↓loc,↓sy) traverses the top-down graph of the nonterminal sy starting at the node loc. Every time it encounters a nonterminal i, it adds its direct successors to the set follow(i).ts. For each nonterminal i at the right end of the graph, it adds sy to the set follow(i).nts.

  CollectFollow(↓loc,↓sy):
    param loc,sy: Cardinal;
    global follow: as in 'Get terminal successors';
           visitednod: set of nodenumbers;
    local gn: Graphnode;
          s: set of terminals;
  begin
    while loc<>0 do                      -- step through alternatives chain
      if loc in visitednod then return; end;   -- cycle
      visitednod:=visitednod+{loc};
      GetNode(↓loc,↑gn);
      if gn.typ=nt then
        GetFirstSet(↓gn.rp,↑s);
        follow(gn.sp).ts := follow(gn.sp).ts + s;
        if Deletable(↓gn.rp) then        -- nt at end of rule
          follow(gn.sp).nts := follow(gn.sp).nts + {sy};
        end;
      end;
      CollectFollow(↓gn.rp,↓sy);
      loc:=gn.lp;
    end;
  end CollectFollow;

The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 computes the set of
terminal start symbols s of the graph with the root loc. The procedure Deletable(↓loc) from Section 7.4.1 checks whether the graph rooted at loc is deletable.

The procedure Complete(↓i) used in Get terminal successors completes the direct successors of the nonterminal i (follow(i).ts) by adding its indirect successors, which are the successors of the nonterminals contained in follow(i).nts.

  Complete(↓i):
    param i: Cardinal;
    global visitedsym: set of nonterminals;
           follow: like in 'Get terminal successors';
    local j: Cardinal;
  begin
    if i in visitedsym then return; end;   -- cycle
    visitedsym:=visitedsym+{i};
    for all j in follow(i).nts do
      Complete(↓j);
      follow(i).ts:=follow(i).ts+follow(j).ts;
    end;
  end Complete;

7.4.4 eps-sets

eps-nodes having an alternative must not be recognized by the generated syntax analyzer unless the next input symbol is a valid successor of this eps-node. In order to find out whether a symbol is a valid successor, the syntax analyzer must know the set of all possible successors of each eps-node with alternatives.

The terminal successors of an eps-node are the terminal start symbols of the subgraph rooted at the right pointer of the eps-node. If the right pointer is null, the terminal successors are the successors of the nonterminal on the left-hand side of the graph containing the eps-node.

First, the top-down graph of each nonterminal is searched for eps-nodes.

  Get eps-sets:
    global epsset: array of set of terminals;  -- eps successors
           maxeps: Cardinal;                   -- number of eps-sets
           visited: set of nodenumbers;        -- mark list for visited nodes
    local sn: Symbolnode;
  begin
    visited:={}; maxeps:=0;
    for all nonterminals i do
      GetSy(↓i,↑sn); FindEps(↓sn.start,↓i,↓false);
    end;
  end Get eps-sets;
The procedure FindEps(↓loc,↓leftsy,↓vialp) searches the top-down graph with the root loc for eps-nodes. It computes their successors and stores them into the global array epsset. The field sp of the eps-node is set to point to this entry in epsset. The flag vialp indicates whether loc has been reached via a left pointer.

  FindEps(↓loc,↓leftsy,↓vialp):
    param loc: Cardinal;                 -- root of TDG
          leftsy: Cardinal;              -- left side nonterminal
          vialp: Boolean;                -- true, if loc is reached via lp
    global visited: set of nodenumbers;  -- mark list for visited nodes
    local gn: Graphnode;
  begin
    if loc=0 or loc in visited then return; end;   -- end or cycle
    visited:=visited+{loc};
    GetNode(↓loc,↑gn);
    if (gn.typ=eps) and (vialp or (gn.lp<>0)) then   -- eps with alt.
      FindEpsFollowers(↓gn.rp,↓leftsy,↑gn.sp);       -- gn.sp points to
      RepNode(↓loc,↓gn);                             -- eps-set
    end;
    FindEps(↓gn.lp,↓leftsy,↓true); FindEps(↓gn.rp,↓leftsy,↓false);
  end FindEps;

The procedure FindEpsFollowers(↓loc,↓leftsy,↑nr) collects the terminal start symbols of the subgraph with the root loc. If the graph is deletable, the successors of the nonterminal leftsy are also added. nr is the index into the global array epsset. The collected set is stored in epsset(nr).

  FindEpsFollowers(↓loc,↓leftsy,↑nr):
    param loc,leftsy,nr: Cardinal;
    global epsset: array of set of terminals;  -- successors of eps-nodes
           follow: like in 'Get terminal successors';
           maxeps: Cardinal;
    local s: set of terminals;
  begin
    GetFirstSet(↓loc,↑s);
    if Deletable(↓loc) then s:=s+follow(leftsy).ts; end;
    maxeps:=maxeps+1; epsset(maxeps):=s; nr:=maxeps;
  end FindEpsFollowers;

The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 collects the terminal start symbols of the graph with the root loc. The procedure Deletable(↓loc) from Section 7.4.1 determines whether the graph with the root loc is deletable.
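The computation of a single eps-set can be checked against the example S = ({a} b | c) from Section 7.3.3 with the following Python sketch. It is a simplification under stated assumptions: the graph contains only terminal and eps nodes, so nonterminal and any nodes are left out; the names Node, first_set, deletable, and eps_set are hypothetical; the node numbering and the successor set {"eof"} for S are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Node:
    typ: str       # "t" or "eps" (nt and any nodes are omitted in this sketch)
    sp: str = ""   # terminal name for "t" nodes
    lp: int = 0    # left pointer (alternative), 0 = none
    rp: int = 0    # right pointer (successor), 0 = none

def first_set(gn, loc, visited=None):
    """Terminal start symbols of the graph rooted at loc."""
    visited = set() if visited is None else visited
    s = set()
    while loc != 0:                      # for all alternatives
        if loc in visited:               # cycle
            return s
        visited.add(loc)
        node = gn[loc]
        if node.typ == "eps":
            s |= first_set(gn, node.rp, visited)
        else:
            s.add(node.sp)
        loc = node.lp
    return s

def deletable(gn, loc, marked=None):
    """True if the graph at loc can be traversed along eps-nodes only."""
    marked = set() if marked is None else marked
    if loc == 0:
        return True
    if loc in marked:
        return False
    marked.add(loc)
    node = gn[loc]
    return ((node.lp != 0 and deletable(gn, node.lp, marked))
            or (node.typ == "eps" and deletable(gn, node.rp, marked)))

def eps_set(gn, eps_loc, follow_of_left):
    """Start symbols after the eps-node, plus successors of the left side if deletable."""
    rest = gn[eps_loc].rp
    s = first_set(gn, rest)
    if deletable(gn, rest):
        s |= follow_of_left
    return s

# S = ({a} b | c): node 1 = inserted eps-node e1, node 2 = a (loops to itself,
# alternative b), node 3 = b, node 4 = c (alternative of e1).
gn = {1: Node("eps", lp=4, rp=2), 2: Node("t", "a", lp=3, rp=2),
      3: Node("t", "b"), 4: Node("t", "c")}
result = eps_set(gn, 1, {"eof"})
```

The result is the set {a, b} quoted in Section 7.3.3. The successors of S are not added because the subgraph behind e1 is not deletable.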
7.4.5 any-sets

In order to recognize an any-symbol, the generated syntax analyzer needs the set of all terminals represented by the any-symbol. An any-symbol represents all terminals which are not in the alternative chain to which it belongs. For any-symbols without alternatives, no any-sets are computed. The syntax analyzer recognizes them regardless of the next input symbol.

  Get any-sets:
    global anyset: array of set of terminals;  -- any-sets
           maxany: Cardinal;                   -- number of any-sets
           eofsy: Cardinal;                    -- symbol number of eof-symbol
    local gn: Graphnode;
          s: set of terminals;
  begin
    for all nodes i do
      GetNode(↓i,↑gn);
      if (gn.typ=any) and (gn.lp<>0) then
        GetFirstSet(↓gn.lp,↑s);
        Make complement of s;
        s:=s-{eofsy};                    -- eofsy must not be recognized by any
        maxany:=maxany+1; anyset(maxany):=s;
        gn.sp:=maxany;                   -- sp of any-node points to any-set
        RepNode(↓i,↓gn);
      end;
    end;
  end Get any-sets;

The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 supplies the terminal start symbols of the graph with the root loc. For the calculation of an any-set, only those symbols are considered which can be reached via the left pointer of the any-node. The symbols which lie before the any-node in the alternative chain are not considered, since the syntax analyzer has already checked them before it gets to the any-node.

7.5 Grammar tests

Before Coco generates the target compiler, it carefully checks if the grammar satisfies certain requirements which are necessary for a correct compiler. Here the compiler compiler proves to be very valuable: even in large grammars, which are hard to understand for human readers, it rapidly finds hidden ambiguities or circularities. The well-known problem of the 'dangling else' clearly
shows how easily bugs in the grammar design can remain undetected without the support of an automatic tool (actually, this ambiguity was overlooked in the language definition of Algol). Coco verifies the following properties:

  1. completeness;
  2. reachability;
  3. noncircularity;
  4. termination;
  5. LL(1) property.

Fig. 7.15 Structure of Section 7.5 (completeness, reachability, noncircularity, termination, LL(1) condition)

The test algorithms are executed in the following order:

  Test grammar(↑ok):
    Test completeness(↑ok1);
    Test if all nt's can be reached(↑ok2);
    Find circular rules(↑ok3);
    Test if all nt's can be derived to t's(↑ok4);
    LL1 test(↑ok5);
    ok:=ok1 and ok2 and ok3 and ok4 and ok5;
  end Test grammar;

These algorithms access the top-down graph and the symbol list with the following procedures, already described in Sections 7.2.2 and 7.3.2:

  PROCEDURE GetNode(loc:CARDINAL; VAR gn:Graphnode);
  PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
7.5.1 Completeness

A check is carried out as to whether there is a rule for every nonterminal.

Basic idea: The field start in the symbol node of each nonterminal must point to a top-down graph.

  Test completeness(↑ok):
    param ok: Boolean;
    local sn: Symbolnode;
  begin
    ok:=true;
    for all nonterminals i do
      GetSy(↓i,↑sn);
      if sn.start=0 then ok:=false; end;
    end;
  end Test completeness;

7.5.2 Reachability

A check is made as to whether all declared nonterminals appear in some sentential form derived from the start symbol of the grammar.

Basic idea: First, tagging is done on all those nonterminals which can be derived directly from the start symbol, then on those nonterminals which can be derived from symbols already tagged. This is repeated until no more nonterminals can be tagged. The untagged nonterminals are not reachable.

  Test if all nt's can be reached(↑ok):
    param ok: Boolean;
    global visited: set of nodenumbers;  -- already visited nodes
           reached: set of nonterminals; -- reachable nonterminals
           rootsy: Cardinal;             -- start symbol of grammar
    local sn: Symbolnode;
  begin
    visited:={}; reached:={rootsy};
    GetSy(↓rootsy,↑sn);
    MarkReachedNts(↓sn.start);
    ok:=true;
    for all nonterminals i do
      if not (i in reached) then ok:=false; end;
    end;
  end Test if all nt's can be reached;

The procedure MarkReachedNts(↓loc) marks all nonterminals which can be reached from the node loc.
  MarkReachedNts(↓loc):
    param loc: Cardinal;
    global reached: set of nonterminals; -- reachable nonterminals
           visited: set of nodenumbers;  -- already visited nodes
    local gn: Graphnode;
          sn: Symbolnode;
  begin
    if loc=0 or loc in visited then return; end;   -- end or cycle
    visited:=visited+{loc};              -- visit loc
    GetNode(↓loc,↑gn);
    if (gn.typ=nt) and not (gn.sp in reached) then   -- new nt reached
      reached:=reached+{gn.sp};
      GetSy(↓gn.sp,↑sn);
      MarkReachedNts(↓sn.start);
    end;
    MarkReachedNts(↓gn.lp); MarkReachedNts(↓gn.rp);
  end MarkReachedNts;

7.5.3 Noncircularity

A check is made as to whether there are nonterminals which can be derived into themselves, i.e. if there are derivations X ⇒+ X for some nonterminals X. (This circularity definition differs from the usual definition in attributed grammars, which defines circular dependencies of attributes.)

Basic idea: All productions are considered which have a single nonterminal as their right-hand side. These single-nonterminal productions make up a graph that must be noncircular.

Algorithm: The graph is stored as pairs (left, right) of nonterminals for which there is a production left → right.

  Find circular rules(↑ok):
    param ok: Boolean;
    global visited: set of nodenumbers;  -- mark list for visited nodes
    local graph: array of record         -- derivation graph
            left,right: Cardinal;
            deleted: Boolean;
          end;
          graphlength: Cardinal;
          singles: set of nonterminals;  -- single descendants of a nt
          sn: Symbolnode;
          changed: Boolean;
          i,j: Cardinal;
  begin
    graphlength:=0;
    for all nonterminals i do            -- build the graph
      singles:={}; visited:={};
      GetSy(↓i,↑sn);
      GetSingles(↓sn.start,↑singles);    -- get nt's j such that i->j
      for all nonterminals j in singles do
        graphlength:=graphlength+1;
        with graph(graphlength) do left:=i; right:=j; deleted:=false; end;
      end;
    end;
    repeat                               -- remove edges, which are not on a cycle
      changed:=false;
      for i:=1 to graphlength do
        if not graph(i).deleted and
           (graph(i).left not on any right-hand side or
            graph(i).right not on any left-hand side)
        then graph(i).deleted:=true; changed:=true;
        end;
      end;
    until not changed;
    ok:=graph is empty;
  end Find circular rules;

The elements that have not been deleted in the graph represent the circular part of the grammar.

The procedure GetSingles(↓loc,↑singles) collects a set (singles) of nonterminals in the top-down graph with the root loc. If the graph can be derived into a single nonterminal X, then X is added to singles. The following assertion always holds: loc is on a path which contains only deletable symbols between its beginning and loc.

  GetSingles(↓loc,↑singles):
    param loc: Cardinal; singles: set of nonterminals;
    global visited: set of nodenumbers;
    local gn: Graphnode;
  begin                                  -- assert: all nodes left to loc are deletable
    if loc=0 or loc in visited then return; end;   -- end or cycle
    visited:=visited+{loc};
    GetNode(↓loc,↑gn);
    if (gn.typ=nt) and Deletable(↓gn.rp) then      -- right subgraph
      singles:=singles+{gn.sp}                     -- deletable
    end;
    if DelNode(↓gn) then GetSingles(↓gn.rp,↑singles) end;
    GetSingles(↓gn.lp,↑singles);
  end GetSingles;

A nonterminal X is added to singles if it is on a path from loc to the end of the top-down graph and if this path has only deletable nodes to the left and right of X. The deletability of subgraphs and nodes is determined by the procedures Deletable and DelNode from Section 7.4.1.
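The edge-removal loop of Find circular rules can be illustrated in isolation. In the following Python sketch the derivation graph is given directly as a set of (left, right) pairs, so the collection of the pairs by GetSingles is assumed to have happened already. The function name circular_edges is hypothetical, and edges are removed from a set instead of being flagged as deleted.

```python
def circular_edges(edges):
    """Return the edges that survive the pruning, i.e. those lying on a cycle."""
    remaining = set(edges)
    changed = True
    while changed:
        changed = False
        lefts = {left for left, _ in remaining}
        rights = {right for _, right in remaining}
        for left, right in list(remaining):
            # An edge cannot be on a cycle if nothing leads into its left
            # end or nothing leads out of its right end.
            if left not in rights or right not in lefts:
                remaining.discard((left, right))
                changed = True
    return remaining

# A -> B and B -> A form a cycle; C -> A merely leads into it.
cyclic = circular_edges({("A", "B"), ("B", "A"), ("C", "A")})

# A -> B -> C has no cycle, so every edge is removed.
acyclic = circular_edges({("A", "B"), ("B", "C")})

ok = not acyclic   # corresponds to 'ok := graph is empty'
```

The surviving pairs in the first example are exactly the circular part of the grammar, which is what an error message would report.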
7.5.4 Termination

A check is made as to whether all nonterminals can be derived into (possibly empty) strings of terminals.

Basic idea: Those nonterminals are tagged which are deletable or can be derived into a string consisting only of terminals or already tagged nonterminals. This is repeated until no more nonterminals can be tagged. The untagged nonterminals are those which cannot be derived into terminals.

Test if nt's can be derived to t's(↑ok):
  param  ok: Boolean;
  global visited: set of nodenumbers;    -- mark list for visited nodes
         termlist: set of nonterminals;  -- nonterminals which can be
                                         -- derived to terminals
  local  changed: Boolean;
         sn: Symbolnode;
begin
  termlist:={};
  repeat
    changed:=false;
    for all nonterminals i which are not in termlist do
      GetSy(↓i,↑sn); visited:={};
      if IsTerm(↓sn.start) then
        termlist:=termlist+{i}; changed:=true;
      end;
    end; -- for
  until not changed;
  ok:=all nonterminals are in termlist;
end Test if nt's can be derived to t's;

The procedure IsTerm(↓loc) checks if the top-down graph with the root loc has a (possibly empty) path which consists only of terminals or already tagged nonterminals.

IsTerm(↓loc): Boolean:
  param  loc: Cardinal;
  global visited: set of nodenumbers;
         termlist: set of nonterminals;
  local  gn: Graphnode;
begin
  if loc=0 or loc in visited then return false; end;   -- end or cycle
  visited:=visited+{loc};
  GetNode(↓loc,↑gn);
  if (gn.typ=nt) and not (gn.sp in termlist) then
    return IsTerm(↓gn.lp);
  else
    return (gn.rp=0) or IsTerm(↓gn.rp) or IsTerm(↓gn.lp);
  end;
end IsTerm;
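The tagging loop described above is a fixed-point iteration. The Python sketch below shows the same idea on a deliberately simpler representation than Coco's top-down graph: the grammar is assumed to be a dictionary that maps each nonterminal to a list of alternatives, each alternative being a list of symbols, and any symbol that is not a key of the dictionary counts as a terminal. The function name and this representation are assumptions made for the illustration.

```python
def terminating_nonterminals(grammar):
    """Return the set of nonterminals that can be derived into a
    (possibly empty) string of terminals."""
    tagged = set()
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            if nt in tagged:
                continue
            for alt in alternatives:
                # An alternative qualifies if every symbol is a terminal
                # or an already tagged nonterminal (an empty alternative
                # qualifies trivially).
                if all(sym not in grammar or sym in tagged for sym in alt):
                    tagged.add(nt)
                    changed = True
                    break
    return tagged
```

With the grammar S = A "b", A = "a" | (empty), B = B "c", the nonterminals S and A are tagged, while B remains untagged because its only alternative refers to B itself.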
7.5.5 LL(1) condition

A check is made as to whether it is always possible to decide, depending on the next input symbol, which path of the top-down graph should be followed during syntax analysis.

Basic idea: The LL(1) test consists of the following two subtests:

1. The terminal start symbols of all alternatives in an alternative chain must be disjoint.
2. The terminal start symbols of deletable subgraphs must be different from the terminal successors of the left-hand side nonterminal.

LL1 test(↑ok):
  param  ok: Boolean;
  global visited: set of nodenumbers;    -- mark list for visited nodes
  local  sn: Symbolnode;
begin
  ok:=true;
  for all nonterminals i do
    visited:={};
    GetSy(↓i,↑sn);
    CheckAlternatives(↓sn.start,↓i,↑ok);
  end;
end LL1 test;

The procedure CheckAlternatives(↓loc,↓sy,↑ok) checks if the alternative chain with the root loc contains only alternatives with distinct start symbols (subtest 1). If the subgraph rooted at loc is deletable (i.e. if it can produce the empty string), it is also checked whether the start symbols of the subgraph are different from the successors of the left-hand side nonterminal sy (subtest 2). CheckAlternatives uses GetF(↓sy,↑first) and GetFo(↓sy,↑follow) to access the already calculated sets of terminal start symbols and successors of nonterminals.

CheckAlternatives(↓loc,↓sy,↑ok):
  param  loc,sy: Cardinal; ok: Boolean;
  global visited: set of nodenumbers;    -- mark list for visited nodes
  local  first: set of terminals;
         follow: set of terminals;
         locset: set of terminals;       -- start symbols of current node
         s: set of terminals;            -- start symbols of prev. alt.
         gn: Graphnode;
begin
  if loc=0 or loc in visited then return; end;   -- end or cycle
  if Deletable(↓loc) then                        -- subtest 2
    GetFirstSet(↓loc,↑s); GetFo(↓sy,↑follow);
    if s * follow <> {} then ok:=false; end;
  end;
  s:={};
  while loc<>0 do                                -- for all alternatives ... subtest 1
    if loc in visited then return; end;
    visited:=visited+{loc};
    GetNode(↓loc,↑gn);
    if DelNode(↓gn) then GetFirstSet(↓gn.rp,↑locset); else locset:={}; end;
    case gn.typ of
      t:        locset:=locset+{gn.sp};
    | nt:       GetF(↓gn.sp,↑first); locset:=locset+first;
    | eps,any:  -- nothing
    end;
    if s * locset <> {} then ok:=false; end;
    s:=s+locset;
    CheckAlternatives(↓gn.rp,↓sy,↑ok);
    loc:=gn.lp;
  end;
end CheckAlternatives;

The procedures Deletable(↓loc) and DelNode(↓gn) from Section 7.4.1 check whether the top-down graph with the root loc or the graph node gn are deletable. The procedure GetFirstSet(↓loc,↑s) from Section 7.4.2 supplies the terminal start symbols s of the top-down graph with the root loc.

7.6 Generation of the parser tables

When the grammar tests are completed, Coco can generate the target compiler. From the symbol list and the top-down graph, the parser tables which drive the generated compiler are constructed. The tables contain information for the recognition of symbols and for error handling, including the G-code which controls the syntax analysis. This section is structured as shown in Fig. 7.16.

7.6.1 Table format

The parser tables are inserted into the generated syntax analyzer as initialization code. Table 7.1 shows their contents:
[Fig. 7.16 Structure of Section 7.6: Table format; Generation of the G-code; Generation of the remaining tables]

Table 7.1 Contents of the parser tables

  Table item          Contents
  header              table dimensions (for decoding)
  code                G-code
  ntsymbols           information about nonterminals
  epssets             sets of valid successors, one for each eps-instruction in the G-code
  anysets             sets of terminals represented by each any-symbol
  attribute numbers   number of attributes for each terminal and each pragma
  pragma semantics    for each pragma, the semantic actions to be executed when the pragma is recognized
  namelist            symbol names for error messages
  name pointers       pointers to the symbol names

The structure of the above data is shown by the following Modula-2 type declarations:

TYPE
  Header = RECORD
             maxcodevar, maxtvar, maxpvar, maxsvar,
             maxepsvar, maxanyvar, maxnamevar, maxnamepvar: CARDINAL;
           END;
  Code = ARRAY[1..maxcode] OF [0..255];
  Symbolset = ARRAY[0..maxt DIV 16] OF BITSET;
  Ntsymbols = ARRAY[maxp+1..maxsym] OF RECORD
                startpc: CARDINAL;   (*start of rule in G-code*)
                del: BOOLEAN;        (*true, if deletable*)
                first: Symbolset;    (*terminal start symbols*)
              END;
  Epsset = ARRAY[1..maxeps] OF Symbolset;
  Anyset = ARRAY[1..maxany] OF Symbolset;
  Attributenumbers = ARRAY[0..maxp] OF [0..255];
  Pragmasemantics = ARRAY[maxt..maxp] OF RECORD
                      sem1,sem2: CARDINAL;   (*element maxt is a dummy*)
                    END;
  Namelist = ARRAY[1..maxname] OF CHAR;
  Namepointers = ARRAY[0..maxnamep] OF CARDINAL;
  Checksum = CARDINAL;

The constants maxcode, maxt, maxp, etc. are the table dimensions derived from the input grammar. They are inserted into the generated syntax analyzer as constant declarations. The header of the parser tables contains the same values again as variables. However, they are not used by the syntax analyzer, but are reserved for a decoding program.

7.6.2 Generation of the G-code

The G-code is derived from the top-down graph. This process is very simple: a recursive algorithm visits all nodes of the top-down graph and translates them into G-code instructions. The simplified algorithm is shown below:

GenCode(↓node):
  Generate code for node;
  if (node.rp<>0) and (node.rp not yet visited) then GenCode(↓node.rp); end;
  if (node.lp<>0) and (node.lp not yet visited) then GenCode(↓node.lp); end;
end GenCode;

Each node is processed as follows (for the definition of the G-code, see Section 2.4 or Appendix D):

1. Depending on the node type, a G-code instruction for the recognition of this node is generated (T, NT, NTS, ANY and EPS instructions). For nodes with a nonzero left pointer value, the generated instruction also contains the address of the corresponding alternative (TA, NTA, NTAS, ANYA and EPSA instructions).
2. If semantic actions are specified in the node, SEM instructions are generated.
3. If the right pointer of the node is zero, a RET instruction is generated.
4. If the right pointer points to an already visited node, a JMP instruction to the address of this node is generated.

In order to resolve jumps and addresses of alternatives, an address list of all G-code sequences generated from graph nodes is needed. It is handled by the following procedures:
PROCEDURE NewAdr(loc:CARDINAL; adr:CARDINAL);
PROCEDURE GetAdr(loc,fixup:CARDINAL; VAR adr:CARDINAL);
PROCEDURE Visited(loc:CARDINAL): BOOLEAN;

NewAdr defines that the G-code sequence generated from node loc has the address adr.

GetAdr returns the address adr of the G-code sequence corresponding to node loc. If the address is not yet in the address list, then adr is zero. In this case, fixup is remembered as a G-code location where the node's address is to be entered as soon as it becomes known. An address becomes known when it is defined by NewAdr. It is then automatically entered into all fixup locations waiting for this address.

Visited returns TRUE if the address of the node with number loc is already known.

Two additional procedures are needed: one to emit G-code instructions and one to access nodes of top-down graphs:

PROCEDURE Emit(VAR pc:CARDINAL; code:Instruction);
PROCEDURE GetNode(loc:CARDINAL; VAR node:Graphnode);

Emit writes the specified instruction code into the code segment at the location pc and increases the code segment length accordingly. Here, Instruction is a symbolic type that is represented by the text of the instruction. The actual implementation deviates from this.

GetNode gets the graph node with the node number loc. The type Graphnode is described in Section 7.3.1.

The actual algorithm for the generation of the G-code follows:

Generate G-code:
  local pc: Cardinal;
begin
  pc:=1;
  for all nonterminals i do
    GenCode(↓root of top-down graph of nonterminal i,↑pc);
  end;
end Generate G-code;

GenCode(↓loc,↑pc) is a recursive procedure which will now be refined. It translates the top-down graph with the root loc into a corresponding G-code sequence and inserts it into the code segment at the location pc. When GenCode arrives at a node loc that has already been visited, the G-code for the subgraph at loc has already been generated, so this node does not have to be revisited.
GenCode(↓loc,↑pc):
  param loc,pc: Cardinal;
  var   node: Graphnode; adr,nr: Cardinal;
begin
  if Visited(↓loc) then return; end;
  NewAdr(↓loc,↓pc);                          -- now visit loc
  GetNode(↓loc,↑node);
  case node.typ of
    t:   if node.lp=0 then Emit(↑pc,↓"T node.sp");
         else GetAdr(↓node.lp,↓pc+2,↑adr); Emit(↑pc,↓"TA node.sp,adr");
         end;
  | nt:  if node.lp=0 then
           if node.sem1=0 then Emit(↑pc,↓"NT node.sp");
           else Emit(↑pc,↓"NTS node.sp,node.sem1");
           end;
         else
           GetAdr(↓node.lp,↓pc+2,↑adr);
           if node.sem1=0 then Emit(↑pc,↓"NTA node.sp,adr");
           else Emit(↑pc,↓"NTAS node.sp,adr,node.sem1");
           end;
         end;
  | any: if node.sp=0 then Emit(↑pc,↓"ANY");
         else GetAdr(↓node.lp,↓pc+2,↑adr); Emit(↑pc,↓"ANYA node.sp,adr");
         end;
  | eps: if node.sp<>0 then                  -- node with eps-set
           if node.lp=0 then Emit(↑pc,↓"EPS node.sp");
           else GetAdr(↓node.lp,↓pc+2,↑adr); Emit(↑pc,↓"EPSA node.sp,adr");
           end;
         end;
  end; -- case
  if node.sem2<>0 then Emit(↑pc,↓"SEM (node.sem2)"); end;
  if node.sem3<>0 then Emit(↑pc,↓"SEM (node.sem3)"); end;
  if node.rp=0 then Emit(↑pc,↓"RET");
  else
    if Visited(↓node.rp) then
      GetAdr(↓node.rp,↓pc+1,↑adr); Emit(↑pc,↓"JMP adr");
    end;
  end;
  if node.rp<>0 then GenCode(↓node.rp,↑pc); end;
  if node.lp<>0 then GenCode(↓node.lp,↑pc); end;
end GenCode;
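The address list with its fixup locations can be pictured with a small Python sketch. It is only a model of the behaviour described for NewAdr, GetAdr and Visited, not Coco's implementation: the class name AddressList is hypothetical, the G-code is assumed to be a Python list held in memory, each address is assumed to occupy a single list cell, and the fixup cell is assumed to exist already when the address becomes known.

```python
class AddressList:
    """Maps node numbers to G-code addresses and patches forward references."""

    def __init__(self, code):
        self.code = code      # G-code buffer (list of ints), kept in memory
        self.known = {}       # node number -> address of its G-code sequence
        self.pending = {}     # node number -> fixup locations awaiting the address

    def visited(self, loc):
        # True if the address of node loc is already known.
        return loc in self.known

    def get_adr(self, loc, fixup):
        # Return the address of node loc, or 0 if it is not yet known.
        # In the latter case remember fixup for later patching.
        if loc in self.known:
            return self.known[loc]
        self.pending.setdefault(loc, []).append(fixup)
        return 0

    def new_adr(self, loc, adr):
        # Define the address of node loc and enter it into all
        # fixup locations that were waiting for it.
        self.known[loc] = adr
        for fixup in self.pending.pop(loc, []):
            self.code[fixup] = adr
```

A caller that needs the address of a node which has not been translated yet receives 0, emits it as a placeholder, and finds the cell overwritten once new_adr is called for that node.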
The G-code is completely stored in memory so that the missing addresses can be inserted when they become known.

7.6.3 Generation of the remaining tables

Besides the G-code, the contents of the generated tables are almost entirely extracted from the symbol list. Only the name list is handled by the lexical analyzer of Coco. Coco gets the necessary data from the symbol list and from the lexical analyzer with the help of access procedures, and writes them unchanged into the syntax analyzer as initialization values.

7.7 Generation of the syntax analyzer

Coco generates a table-driven LL(1) syntax analyzer with error handling in the form of a Modula-2 source module which the user must compile and include in his compiler. The syntax analyzer is the implementation of the analysis algorithm described in Section 2.5. It is the same for all generated compilers. Only the parser tables differ from compiler to compiler, so they have to be inserted into the otherwise invariant parser module.

[Fig. 7.17 Structure of Section 7.7]

The definition module and the implementation module of the syntax analyzer are generated from a frame text which Coco reads from the file cocosynframe. At certain locations grammar-dependent parts have to be inserted into this frame. The locations are marked by the string '-->' and a descriptive name of the text to be inserted. The following table shows what has to be inserted at these locations.
  -->modulename          grammar name + syn
  -->semantic analyzer   grammar name + sem
  -->input module        grammar name + lex
  -->declarations        table dimensions declared as constants (see example in Section 8.3)
  -->tables              table values

The syntax analyzer contains references to other modules (e.g. the lexical analyzer or the semantic evaluator) whose names are constructed from the grammar name (the name of the root symbol in the attributed grammar) and from a suffix. The resulting syntax analyzer is written to the files grammarnamesyn.DEF and grammarnamesyn.MOD.

Coco uses a procedure CopyFramePart to copy pieces of text from the frame to the syntax analyzer module.

PROCEDURE CopyFramePart(VAR source,target:File; str:ARRAY OF CHAR);

CopyFramePart copies text from the file source to the file target until it encounters the string str (str is not copied). When it is next called, it continues copying the text immediately behind str. This procedure is called with the name of the next piece of text to be inserted (e.g. '-->tables'). It copies the frame up to this name and then Coco inserts the specified text in place of the name. This process is repeated until the entire syntax analyzer has been generated.

A source listing of cocosynframe is shown in Appendix F. The module cocosyn, also shown in Appendix F, is an example of a syntax analyzer generated by this process.

7.8 Generation of the semantic evaluator

In addition to the syntax analyzer and the parser tables, Coco also generates a semantic evaluator. This is a Modula-2 source module which the user must compile and include in his compiler. The semantic evaluator consists of some invariant parts and of the semantic actions and declarations which are copied from the attributed grammar. Its generation can be divided into three tasks:

1. copy the semantic declarations from the attributed grammar to the semantic evaluator;
2. translate the semantic actions into components of a case statement;
3. generate new semantic actions (assignments) for attribute passing.

Before covering these three tasks in detail, we will describe the invariant parts of the semantic evaluator.
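Since the semantic evaluator is produced from a frame in the same way as the syntax analyzer, it may help to see the copy step of Section 7.7 in executable form. The following Python sketch mimics the described behaviour of CopyFramePart on text streams; the function name copy_frame_part, the use of io.StringIO in place of files, and the Boolean return value are assumptions made for the illustration.

```python
import io


def copy_frame_part(source, target, marker):
    """Copy text from source to target up to (but not including) marker.
    The marker itself is consumed, so the next call continues behind it.
    Returns True if the marker was found, False at end of input."""
    if not marker:
        raise ValueError("marker must not be empty")
    window = ""                        # characters that may still belong to the marker
    while True:
        ch = source.read(1)
        if ch == "":                   # end of frame: flush what is left
            target.write(window)
            return False
        window += ch
        if window.endswith(marker):
            target.write(window[:-len(marker)])
            return True
        if len(window) >= len(marker):
            target.write(window[0])    # oldest character cannot start a match any more
            window = window[1:]


def fill_frame(frame, parts):
    """Replace each (marker, text) pair in order, then copy the rest of the frame."""
    source, target = io.StringIO(frame), io.StringIO()
    for marker, text in parts:
        copy_frame_part(source, target, marker)
        target.write(text)
    copy_frame_part(source, target, "\0")   # marker that is assumed not to occur
    return target.getvalue()
```

A call such as fill_frame("MODULE -->modulename;\nCONST -->declarations\nEND", [("-->modulename", "calcsyn"), ("-->declarations", "maxt = 10;")]) yields the frame text with both markers replaced; the module name calcsyn and the constant are made-up values.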
[Fig. 7.18 Structure of Section 7.8: Constant parts of the semantic evaluator; Translation of semantic declarations; Translation of semantic actions; Attribute processing]

7.8.1 The invariant parts of the semantic evaluator

Like the syntax analyzer, the semantic analyzer is derived from a frame module which Coco reads from the file cocosemframe. Again Coco copies the frame using the procedure CopyFramePart (see Section 7.7) and inserts grammar-dependent parts at some specified places in the frame. These places are:

  -->modulename     grammar name + sem
  -->scannername    grammar name + lex
  -->declarations   semantic declarations of the grammar
  -->actions        semantic actions of the grammar

The frame module is as follows:

DEFINITION MODULE -->modulename;
  VAR printactions: BOOLEAN;
  PROCEDURE Semant(sem:CARDINAL);
END -->modulename.

IMPLEMENTATION MODULE -->modulename;
  FROM SYSTEM IMPORT WORD;
  FROM -->scannername IMPORT at;
  -->declarations

  PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
  BEGIN x:=y END ASSIGN;

  PROCEDURE Semant(sem:CARDINAL);
  BEGIN
    CASE sem OF
      11: ;   (*action numbers start at 12*)
      -->actions
    END;
  END Semant;
END -->modulename.

The resulting semantic analyzer is written to the files grammarnamesem.DEF and grammarnamesem.MOD. The user may set the exported variable printactions to TRUE if he wants a trace of the executed semantic actions.

7.8.2 Processing of the semantic declarations

The semantic declarations, which are written in Modula-2, are copied immediately and without change from the attributed grammar to the frame program, and are inserted at the location marked by '-->declarations'. This happens in the following manner: the lexical analyzer of Coco returns the symbols of the Modula-2 text to the syntax analyzer as Cocol symbols, and from there they go to the semantic evaluator. The procedure Copy(↓typ,↓col) is called for each symbol to translate its symbol code back into its source text, which is then inserted into the frame module.

Problems can arise since the Modula-2 text may contain symbols that are not Cocol symbols (e.g. +, *, &, etc.). Such symbols are copied by means of a trick: the lexical analyzer assigns them a special symbol code (nococosy) and an attribute (spix). They are treated like names and entered into the name list. spix is their address in the name list, which allows their source text to be accessed.

In order to keep the name list small, the Modula-2 names are entered only temporarily. Permanent storage is prevented with the procedure StopHash. This causes a name to be entered, but overwritten by the next name, so the names can be accessed via their addresses just like the permanently stored names, but only until the next name has been recognized. The procedure RestartHash re-establishes permanent storage.

Coco copies the declarations without checking the syntax. If there are syntax errors, they will be detected by the Modula-2 compiler when the generated semantic evaluator is compiled.
We now describe the translation of the semantic declarations by an attributed grammar in Cocol.

GRAMMAR Declarations
SEMANTIC DECLARATIONS
  FROM cocogen IMPORT Copy;
  FROM cocolex IMPORT col, typ, StopHash, RestartHash;
  -- PROCEDURE Copy(typ,col:CARDINAL);
  --   writes the source text of the symbol 'typ' to the generated
  --   semantic analyzer. col is the symbol column in the grammar.
TERMINALS
  SEMANTICSY DECLARATIONSY
NONTERMINALS
  Declarations
RULES
  Declarations =
    SEMANTICSY DECLARATIONSY    sem StopHash endsem
    { any                       sem Copy(↓typ,↓col) endsem
    }                           sem RestartHash endsem.
ENDGRAM

7.8.3 Processing of the semantic actions

Coco translates the semantic actions of the attributed grammar into continuously numbered variants of a case statement, and inserts them into the semantic frame program at the location marked by the string '-->actions'. Like the declarations, the semantic actions are copied unchanged and without a syntax check. Again, each symbol is copied by translating its symbol code back into its source text. We describe this process in Cocol.

GRAMMAR SemAction
SEMANTIC DECLARATIONS
  FROM cocogen IMPORT Copy, OpenSem;
  FROM cocosym IMPORT NewMacro, GetMacroNr;
  FROM cocolex IMPORT col, typ, StopHash, RestartHash;
  FROM Errors IMPORT SemErr;
  --PROCEDURE OpenSem(VAR sem:CARDINAL);
  --  generates a new case label and returns its number sem.
  --PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
  --  gets the action number sem of the macro 'spix'. If the macro
  --  does not exist, sem=0.
  VAR spix,sem: CARDINAL;
TERMINALS
  SEMSY ENDSEMSY "(" ")" ":" IDENT<out:spix>
NONTERMINALS
  SemAction<out:sem>
RULES
  SemAction<out:sem> =
    SEMSY
    ( "(" IDENT<out:spix>       sem GetMacroNr(↓spix,↑sem);
                                    IF sem=0 THEN SemErr(1) END endsem
      ")"
    |                           sem OpenSem(↑sem); StopHash endsem
      { any                     sem Copy(↓typ,↓col) endsem
      }
    )
    ENDSEMSY                    sem RestartHash endsem.
ENDGRAM
The above grammar also shows how semantic macros are processed. The module cocosym handles a list of macro names and their corresponding semantic action numbers. The action number of a macro is supplied by the procedure GetMacroNr.

7.8.4 Attribute processing

While declarations and semantic actions need only to be copied from the attributed grammar into the semantic evaluator, attributes need further processing. For each symbol, its attributes must be stored in the symbol list, and must be checked for consistency every time this symbol occurs. In addition to this, Coco must generate semantic actions by which values are assigned to the attributes at run-time.

The processing of attributes depends on the context in which they appear. In Cocol there are three different places where attributes may occur:

1. at the declaration of a syntax symbol;
2. at a nonterminal on the left-hand side of a rule;
3. at a symbol on the right-hand side of a rule.

We will now describe the processing of attributes in each of these three cases, and then summarize it by an attributed grammar.

Declaration of attributes

Attributes are declared together with syntax symbols and are entered into the symbol list. The context of attribute declarations is:

SyntaxDeclarations =
  TERMINALS {Symbol [Attributes] [AliasName]}
  [ PRAGMAS {Symbol [Attributes] [SemAction]} ]
  NONTERMINALS {identifier [Attributes] [AliasName]}.

Coco uses the procedure NewAt to enter an attribute into the symbol list.

TYPE Direction = (up,down);
PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);

NewAt enters an attribute spix with the direction dir for the symbol sy. Depending on the kind of sy, the following information is stored:

  for terminal symbols:  number of attributes;
  for pragmas:           number of attributes;
  for nonterminals:      number, name, and direction of attributes.
Attributes on the left-hand side of productions

Attributes on the left-hand side of productions are called formal attributes. Their context is:

Rule = identifier [Attributes] "=" Expression ".".

Formal attributes are checked for consistency with their declaration. For every left-hand side nonterminal the number of attributes, their names, order, and direction must agree with the attributes declared for this nonterminal. The procedure GetAt is used to access the attribute information in the symbol list. It gets the name (spix) and the direction (dir) of the nth attribute of the nonterminal sy. If sy has fewer than n attributes, then spix is zero.

PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);

Attributes on the right-hand side of productions

Attributes on the right-hand side of productions appear as actual attributes of syntax symbols in EBNF expressions.

Expression = Term {"|" Term}.
Term = Factor {Factor}.
Factor = Symbol [Attributes] | ... .

In this context, attributes denote semantic values which result from the recognition of a syntax symbol, or which are required for its recognition. Coco generates assignments between the attribute values and the attribute names, and includes them as semantic actions in the evaluator program. It also checks whether the number of attributes, their order and their direction agree with the corresponding attribute declaration.

Attribute assignments for terminals and pragmas

The lexical analyzer of the generated compiler exports the attribute values of terminals and pragmas in the variable at. The array at is filled for each symbol by the lexical analyzer. A terminal (or pragma) t<out:a,b> is handled by the generated compiler as follows:

  recognize t and fill at;
  a:=at(1); b:=at(2);

When t has been recognized, a semantic action must be executed in which the attribute values at(1) and at(2) are assigned to the attributes a and b.
Since such an action does not exist, Coco must generate it.

Attribute assignments for nonterminals

For nonterminals, attribute assignments occur between formal and actual attributes. A nonterminal nt<in:a,b; out:c,d> is handled by the generated compiler as follows:

  formal attribute corresponding to a := a;
  formal attribute corresponding to b := b;
  parse nt;
  c := formal attribute corresponding to c;
  d := formal attribute corresponding to d;

Again Coco must generate semantic actions for the attribute assignments.

Generation of attribute assignments

For each attribute on the right-hand side of a production, Coco calls the procedure GenAssign, which generates an assignment of the corresponding attribute value to the attribute variable.

TYPE Attrkind = (term,      (*attribute of a terminal*)
                 nonterm,   (*attribute of a nonterminal*)
                 const);    (*const. value as an attribute of an nt*)
PROCEDURE GenAssign(typ:Attrkind; left,right:CARDINAL);

Table 7.2 shows the meaning of the parameters left and right depending on the value of the parameter typ. It also shows which code is generated:

Table 7.2 Parameters of GenAssign and the generated code

  Value of typ   Meaning of left     Meaning of right     Generated code
  term           Spix of left side   Attribute number     name(left):=at[right]
  nonterm        Spix of left side   Spix of right side   name(left):=name(right)
  const          Spix of left side   Constant value       name(left):=right

name(spix) denotes the name at the address spix in the name list. The array at is exported by the lexical analyzer and contains the attribute values of the most recently recognized terminal.

The procedure EmitAction builds a semantic action from the attribute assignments generated since its last call. It inserts the action as a variant of a case statement into the semantic evaluator. Thus, the semantic evaluator contains not only the semantic actions of the attributed grammar, but also the actions generated from attributes by Coco. EmitAction returns the action number of the generated semantic action.

PROCEDURE EmitAction(VAR sem:CARDINAL);
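The interplay of GenAssign and EmitAction can be sketched in Python as follows. This is a model under stated assumptions rather than Coco's code: the class name ActionGenerator is hypothetical, the name list is assumed to be a dictionary from spix to name, action numbers are assumed to start at 12, an empty action is assumed to yield the number zero, and the reuse of an identical earlier action is an optional refinement included here for illustration.

```python
class ActionGenerator:
    """Collects attribute assignments and turns them into numbered case variants."""

    def __init__(self, names, first_action=12):
        self.names = names            # name list: spix -> attribute name
        self.first_action = first_action
        self.pending = []             # assignments generated since the last emit_action
        self.actions = {}             # action text -> action number

    def gen_assign(self, kind, left, right):
        target = self.names[left]
        if kind == "term":            # attribute of a terminal: value comes from at[]
            source = f"at[{right}]"
        elif kind == "nonterm":       # attribute of a nonterminal: copy a named attribute
            source = self.names[right]
        elif kind == "const":         # constant value passed to a nonterminal
            source = str(right)
        else:
            raise ValueError(f"unknown attribute kind: {kind}")
        self.pending.append(f"{target}:={source}")

    def emit_action(self):
        # Build one action from the pending assignments and return its number.
        if not self.pending:
            return 0
        text = "; ".join(self.pending)
        self.pending = []
        if text not in self.actions:
            self.actions[text] = self.first_action + len(self.actions)
        return self.actions[text]

    def case_variants(self):
        # The generated actions as lines of a case statement, in numeric order.
        ordered = sorted(self.actions.items(), key=lambda item: item[1])
        return [f"{number}: {text}" for text, number in ordered]
```

With a name list containing a, b and x, two term assignments followed by emit_action produce a variant such as "12: a:=at[1]; b:=at[2]", and a later identical pair of assignments receives the same action number again.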
Optimization of attribute passing

Coco performs two optimizations to reduce the number of attribute assignments:

1. If the formal and the actual attribute of a nonterminal have identical names, no assignment is generated.
2. Identical semantic actions (with the same assignments) are generated only once.

Description of the attribute processing in Cocol

We will now summarize the attribute processing, describing it by an attributed grammar in Cocol. The start symbol of the grammar is the nonterminal Attributes. The grammar is a segment of a larger grammar in which attributes can appear in various contexts. Therefore, Attributes has three input attributes which control its processing.

Attributes<in:sy,styp,kind; out:sem1,sem2>

sy denotes the symbol to which the attributes belong; styp specifies the type of this symbol; kind is the context in which the attributes are being used, indicating how they are to be processed:

  kind=def:    treat them as an attribute declaration;
  kind=check:  perform a consistency check (when used on the left-hand side of a production);
  kind=use:    generate semantic actions for attribute passing (when used on the right-hand side of a production).

sem1 and sem2 are the numbers of the generated semantic actions for input and output attribute passing (or zero).

GRAMMAR Attributes
SEMANTIC DECLARATIONS
  FROM cocosym IMPORT NewAt, GetAt, CompleteAt, Direction, Usage, Symboltype;
  FROM cocogen IMPORT Attrtype, EmitAction, GenAssign;
  FROM Errors IMPORT SemErr;
  --TYPE
  --  Attrtype = (term,nonterm,const);
  --  Direction = (up,down);          (*out-at or in-at*)
  --  Usage = (def,check,use);        (*attribute context*)
  --  Symboltype = (eps,t,pr,nt,any);
  --PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
  --  declares an attribute for the symbol sy with the name spix and
  --  the direction dir.
  --PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL;
  --                VAR dir:Direction);
  --  gets the name spix and the direction dir of attribute number n
  --  of symbol sy. If sy has fewer than n attributes, then spix=0.
  --PROCEDURE CompleteAt(sy,n:CARDINAL): BOOLEAN;
  --  returns true if symbol sy has exactly n attributes.
  VAR sy,spix,spix1,sem1,sem2,n,val: CARDINAL;
      styp: Symboltype;
      kind: Usage;
      dir,dir1: Direction;
MACROS
  sem :AssignInAt:
    n:=n+1;
    CASE kind OF
      use:   IF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF dir=dir1
                   THEN GenAssign(↓nonterm,↓spix1,↓spix)
                   ELSE SemErr(2)
                 END
               END
             END;
    | check: IF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF spix<>spix1 THEN SemErr(3) END;
                 IF dir<>dir1 THEN SemErr(2) END
               END
             END;
    | def:   NewAt(↓sy,↓spix,↓dir)
    END -- CASE
  endsem

  sem :AssignNumber:
    n:=n+1;
    IF kind=use THEN
      IF styp=nt THEN
        GetAt(↓sy,↓n,↑spix1,↑dir1);
        IF spix1<>0 THEN
          IF dir=dir1
            THEN GenAssign(↓const,↓spix1,↓val)
            ELSE SemErr(2)
          END
        END
      END
    ELSE SemErr(4)
    END
  endsem

  sem :AssignOutAt:
    n:=n+1;
    CASE kind OF
      use:   IF styp=t THEN GenAssign(↓term,↓spix,↓n)
             ELSIF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF dir=dir1
                   THEN GenAssign(↓nonterm,↓spix,↓spix1)
                   ELSE SemErr(2)
                 END
               END
             END;
    | check: IF styp=nt THEN
               GetAt(↓sy,↓n,↑spix1,↑dir1);
               IF spix1<>0 THEN
                 IF spix<>spix1 THEN SemErr(3) END;
                 IF dir<>dir1 THEN SemErr(2) END
               END
             END;
    | def:   NewAt(↓sy,↓spix,↓dir);
             IF styp=pr THEN GenAssign(↓term,↓spix,↓n) END
    END -- CASE
  endsem
TERMINALS
  ">" "<" ":" "," ";" INSY OUTSY IDENT<out:spix> NUMBER<out:val>
NONTERMINALS
  Attributes<in:sy,styp,kind; out:sem1,sem2>
  InAttr<in:sy,styp,kind; out:sem1,sem2,n>       --n: attribute counter
  OutAttr<in:sy,styp,kind,n; out:sem1,sem2,n>
RULES
  Attributes<in:sy,styp,kind; out:sem1,sem2> =
    "<"                                 sem sem1:=0; sem2:=0 endsem
    ( InAttr<in:sy,styp,kind; out:sem1,n>
      [ ";" OutAttr<in:sy,styp,kind,n; out:sem2,n> ]
    | OutAttr<in:sy,styp,kind,0; out:sem2,n>
    )
    ">"                                 sem IF NOT CompleteAt(↓sy,↓n) THEN SemErr(5) END endsem.

  InAttr<in:sy,styp,kind; out:sem1,n> =
    INSY ":"                            sem IF styp<>nt THEN SemErr(1) END;
                                            dir:=down; n:=0 endsem
    ( IDENT<out:spix>                   sem (AssignInAt) endsem
    | NUMBER<out:val>                   sem (AssignNumber) endsem
    )
    { ","
      ( IDENT<out:spix>                 sem (AssignInAt) endsem
      | NUMBER<out:val>                 sem (AssignNumber) endsem
      )
    }                                   sem IF kind=use THEN EmitAction(↑sem1) END endsem.

  OutAttr<in:sy,styp,kind,n; out:sem2,n> =
    OUTSY ":"                           sem dir:=up endsem
    IDENT<out:spix>                     sem (AssignOutAt) endsem
    { "," IDENT<out:spix>               sem (AssignOutAt) endsem
    }                                   sem IF (kind=use) OR (styp=pr) THEN
                                              EmitAction(↑sem2)
                                            END endsem.
ENDGRAM

If one of the context conditions is violated, the procedure SemErr(↓n) is called, which emits an error message depending on n:

  n   error message
  1:  In-attributes for a pragma or terminal
  2:  Wrong attribute direction
  3:  Wrong attribute name
  4:  Formal attribute is a constant
  5:  Wrong number of attributes
8 Applications

8.1 Applications in compiler construction

Attributed grammars are mainly used in compiler construction - more precisely for the description of compilers. However, the description of an actual compiler is far too complex to be used as an introductory example. Therefore, in this section we will use Cocol to develop a lexical analyzer, which is part of a compiler. This example is general enough to demonstrate all language constructs of Cocol, and yet simple enough for a reader inexperienced with attributed grammars to follow it. The application of Coco to an actual compiler (the compiler description for Coco itself) can be found in Appendix F.

It is unusual to describe and to generate lexical analyzers with attributed grammars. Normally, they are coded by hand since they must be very efficient (lexical analysis takes the biggest part of the compilation time). There are special scanner generators which are designed to produce fast lexical analyzers. Although Coco is not such a generator, run-time measurements show that it is possible in both theory and practice to implement lexical analyzers with Coco.

As an example, we will develop a lexical analyzer for Modula-2. First we will give a general specification for lexical analyzers. Then we will prepare a special specification of a lexical analyzer for Modula-2. Next we will describe and build this lexical analyzer using Cocol. Finally we will explain some of the problems that can arise. At the end of this section, we will specify the semantic procedures used in the description of the lexical analyzer.
8.1.1 Specification of a lexical analyzer

General tasks

A lexical analyzer must at least perform the following tasks:

1. read and optionally print the source program;
2. skip meaningless character sequences such as blanks, comments, etc.;
3. recognize and tokenize terminals such as keywords, names, numbers, and operators;
4. report lexical errors.

Usually, a lexical analyzer will recognize only one terminal per call and pass it to the syntax analyzer. However, there are also analyzers that process the entire source text at once, and write the symbol codes of the recognized terminals to an intermediate file so that the syntax analyzer can read them later on. The lexical analyzer described here is of the latter type.

Tasks of a lexical analyzer for Modula-2

A lexical analyzer for Modula-2 must recognize the following terminals:

Keywords

    AND ARRAY BEGIN BY CASE CONST DEFINITION DIV DO ELSE ELSIF END EXIT
    EXPORT FOR FROM IF IMPLEMENTATION IMPORT IN LOOP MOD MODULE NOT OF OR
    POINTER PROCEDURE QUALIFIED RECORD REPEAT RETURN SET THEN TO TYPE
    UNTIL VAR WHILE WITH

Names

    Identifier = Letter {Letter | Digit}.
    Letter = "A"|"B"|...|"Z"|"a"|"b"|...|"z".
    Digit = "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9".

Decimal constants

    DecNumber = Digit {Digit}.

Hexadecimal constants

    HexNumber = Digit {HexDigit} "H".
    HexDigit = Digit | ("A"|"B"|"C"|"D"|"E"|"F").

Octal constants

    OctalNumber = OctalDigit {OctalDigit} "B".
    OctalDigit = "0"|"1"|"2"|"3"|"4"|"5"|"6"|"7".

Real constants

    RealNumber = Digit {Digit} "." {Digit} ["E" ["+"|"-"] Digit [Digit]].

Character constants

    CharConst = "'" any "'" | '"' any '"' | OctalDigit {OctalDigit} "C".

Character strings

    String = "'" {any} "'" | '"' {any} '"'.

Comments

    Comment = "(*" {Comment | any} "*)".

Operators and separators

    +     addition
    -     subtraction
    *     multiplication
    /     real division
    :=    assignment
    &     logical and
    =     equal
    #     not equal
    <>    not equal
    <     less than
    >     greater than
    <=    less than or equal
    >=    greater than or equal
    ( )   round parenthesis
    [ ]   index-parenthesis
    { }   set-brackets
    ^     pointer
    ,     comma
    .     period
    ;     semicolon
    :     colon
    ..    range operator
    |     variant operator

Context conditions

1. Decimal, hexadecimal, or octal constants must be in the range 0 to 65535.
2. The numerical value of character constants must be in the range 0 to 255.
3. Real constants must be in the range 1.4694E-39 to 1.7014E+38.
4. Character strings must not extend over line boundaries.

8.1.2 Description of a lexical analyzer for Modula-2

In the previous section, we described the lexical structure of Modula-2 by a context-free grammar. Now we will have to attribute it. The following points need special attention.

The lexical analyzer supplies the terminals for syntax analysis. These are the nonterminals of the lexical analyzer, whereas the terminals of the lexical
analyzer are the characters of the source text. These characters must be supplied by a mini-scanner with the following tasks:

1. read and print the source program;
2. supply the characters of the source text as terminals;
3. treat the character sequences '..', '(*', and '*)' as special terminals (to simplify the attributed grammar).

This still leaves enough work for the lexical analyzer proper. In accordance with Section 6.4.2, we will implement the mini-scanner in the procedure GetSy of the module Scannerlex. The mini-scanner is so simple that we refrain from describing it further.

Now we will specify the lexical analyzer of Modula-2 with Cocol.

    GRAMMAR Scanner

    SEMANTIC DECLARATIONS
      FROM Conversions IMPORT Convert, ConvertReal;
      FROM Errors IMPORT SemErr;
      FROM ListMod IMPORT EnterString, Hash;
      FROM Scannerlex IMPORT typ, line, col;
      FROM OutMod IMPORT Symboltype, Emit, EmitConstant, EmitIdent, EmitString;

      -- TYPE Symboltype = (*token codes*)
      --   (eofsy, andsy, divsy, timessy, slashsy, modsy, notsy, plussy,
      --    minussy, orsy, eqlsy, neqsy, grtsy, geqsy, lsssy, leqsy, insy,
      --    lparsy, rparsy, lbracksy, rbracksy, lconbrsy, rconbrsy, commasy,
      --    semicolonsy, periodsy, colonsy, rangesy, constsy, typesy, varsy,
      --    arraysy, recordsy, variantsy, setsy, pointersy, tosy, arrowsy,
      --    importsy, exportsy, fromsy, qualifiedsy, beginsy, casesy, ofsy,
      --    ifsy, thensy, elsifsy, elsesy, loopsy, exitsy, repeatsy, untilsy,
      --    whilesy, dosy, withsy, forsy, bysy, returnsy, becomessy, endsy,
      --    callsy, definitionsy, implementationsy, proceduresy, modulesy,
      --    ident, cardcon, intcardcon, realcon, charcon, stringcon, eolsy);

      CONST
        blmax = 80;                         -- buffer length (every token must
                                            -- fit on a 80 character line)
      VAR
        addr:    CARDINAL;                  -- string address in string list
        b:       ARRAY[1..blmax] OF CHAR;   -- buffer
        bl:      CARDINAL;                  -- buffer length
        ch:      CHAR;                      -- auxiliary
        firstch: CHAR;                      -- first character in a string
        i:       CARDINAL;                  -- auxiliary
        length:  CARDINAL;                  -- string length
        rval:    REAL;                      -- value of real-constant
        spix:    CARDINAL;                  -- spelling index of identifier
        sy:      Symboltype;                -- token code
        symcol:  CARDINAL;                  -- symbol column
        val:     CARDINAL;                  -- constant value
    MACROS
      sem :AddCh:              -- it is assumed that lines
                               -- are not longer than 80 characters
        bl:=bl+1; b[bl]:=ch
      endsem

    TERMINALS
      ".."   "(*"   "*)"   chr4   chr5   chr6   chr7   chr8
      chr9   chr10  chr11  chr12  CR     chr14  chr15  chr16
      chr17  chr18  chr19  chr20  chr21  chr22  chr23  chr24
      chr25  chr26  chr27  chr28  chr29  chr30  chr31
      " "    "!"    '"'    "#"    "$"    "%"    "&"    "'"
      "("    ")"    "*"    "+"    ","    "-"    "."    "/"
      "0"    "1"    "2"    "3"    "4"    "5"    "6"    "7"
      "8"    "9"    ":"    ";"    "<"    "="    ">"    "?"
      "@"    A      B      C      D      E      F      G
      H      I      J      K      L      M      N      O
      P      Q      R      S      T      U      V      W
      X      Y      Z      "["    "\"    "]"    "^"    "_"
      "`"    a      b      c      d      e      f      g
      h      i      j      k      l      m      n      o
      p      q      r      s      t      u      v      w
      x      y      z      "{"    "|"    "}"    chr126 chr127

    NONTERMINALS
      Scanner
      Symbol
      Identifier  <out:sy,spix,symcol>
      Number      <out:sy,val,symcol>
      String      <out:sy,addr,length,firstch,symcol>
      Comment
      Letter      <out:ch>
      Digit       <out:ch>
      HexDigit    <out:ch>

    RULES

    Scanner = {Symbol}         sem Emit(↓eofsy,↓col) endsem.

    Symbol =
      {" "}                                        -- skip blanks
      ( Identifier <out:sy,spix,symcol>
                               sem IF sy=ident
                                   THEN EmitIdent(↓spix,↓symcol)   -- ident.
                                   ELSE Emit(↓sy,↓symcol)          -- keyword
                                   END endsem
      | Number <out:sy,val,symcol>
                               sem EmitConstant(↓sy,↓val,↓symcol) endsem
                                   -- cardcon, intcardcon, realcon, charcon
176 Applications Chap, g String <out:sy,addr,length,firstch,symcol> sem IF sy=stringcon THEN EmitString(iaddr,ilength,isymcol) ELSE EmitConstant(icharcon, iORD(firstch),Asymcol) END endsem I Comment ti /if tt\ it ti rn ■]■ ti/n it \ it w*n it it f n/it n + ti CR / ii _n I eps ) ( ">n | «»=:» I eps ) tt-vti / it—it I ">" ( I eps sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit sem Emit [lsemicolonsy,lcol) endsem ;ieqlsy,icol) endsem ilparsy,lcol) endsem (irparsy,icol) endsem |ilbracksy,icol) endsem ;irbracksy,icol) endsem [ilconbrsy,icol) endsem irconbrsy,icol) endsem 4timessy,Acol) endsem icommasy,icol) endsem islashsy,Acol) endsem lplussy,icol) endsem iminussy,icol) endsem (iarrowsy,icol) endsem lvariantsy,lcol) endsem inotsy,icol) endsem (iandsy,icol) endsem iperiodsy,Acol) endsem irangesy,icol) endsem |leolsy,icol) endsem ibecomessy,icol) endsem icolonsy,icol) endsem [inotsy,icol) endsem lleqsy,icol) endsem ilsssy,icol) endsem igeqsy,icol) endsem lgtrsy,icol) endsem ). Identifier <out:sy,spix,symcol> = Letter <out:ch> { Letter <out:ch> I Digit <out:ch> } sem symcol:=col; bl:=0 endsem sem (AddCh) endsem sem (AddCh) endsem sem (AddCh) endsem sem Hash(ib,ibl,Tsy,Tspix) — sy is identifier or keyword endsem.
    Number <out:sy,val,symcol> =
                               sem symcol:=col; bl:=0 endsem
      Digit <out:ch>           sem (AddCh) endsem
      { HexDigit <out:ch>      sem (AddCh) endsem
      }
      ( H                      sem bl:=bl+1; b[bl]:="H";
                                   Convert(↓b,↓bl,↑sy,↑val) endsem
      | "."                    sem bl:=bl+1; b[bl]:=CHR(typ) endsem
        { Digit <out:ch>       sem (AddCh) endsem
        }
        [ E                    sem bl:=bl+1; b[bl]:=CHR(typ) endsem
          [ "+" | "-"          sem bl:=bl+1; b[bl]:=CHR(typ) endsem
          ]
          Digit <out:ch>       sem (AddCh) endsem
          [ Digit <out:ch>     sem (AddCh) endsem
          ]
        ]                      sem ConvertReal(↓b,↓bl,↑rval); sy:=realcon;
                                   val:=CARDINAL(rval) endsem
      | eps                    sem Convert(↓b,↓bl,↑sy,↑val) endsem
      ).

    String <out:sy,addr,length,firstch,symcol> =
                               sem symcol:=col; bl:=0 endsem
      ( "'"
        { ".."                 sem b[bl+1]:="."; b[bl+2]:="."; bl:=bl+2 endsem
        | "(*"                 sem b[bl+1]:="("; b[bl+2]:="*"; bl:=bl+2 endsem
        | "*)"                 sem b[bl+1]:="*"; b[bl+2]:=")"; bl:=bl+2 endsem
        | CR                   sem SemError(↓1,↓line,↓col); bl:=0 endsem
        | any                  sem bl:=bl+1; b[bl]:=CHR(typ) endsem
        }
        "'"
      | '"'
        { ".."                 sem b[bl+1]:="."; b[bl+2]:="."; bl:=bl+2 endsem
        | "(*"                 sem b[bl+1]:="("; b[bl+2]:="*"; bl:=bl+2 endsem
        | "*)"                 sem b[bl+1]:="*"; b[bl+2]:=")"; bl:=bl+2 endsem
        | CR                   sem SemError(↓1,↓line,↓col); bl:=0 endsem
        | any                  sem bl:=bl+1; b[bl]:=CHR(typ) endsem
        }
        '"'
      )                        sem length:=bl;
                                   IF length=1 THEN
                                     sy:=charcon; firstch:=b[1]
                                   ELSE
                                     sy:=stringcon; EnterString(↓b,↓bl,↑addr)
                                   END endsem.

    Comment = "(*" { Comment | any } "*)".

    Letter <out:ch> =
      (A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|
       a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z)
                               sem ch:=CHR(typ) endsem.

    Digit <out:ch> =
      ("0"|"1"|"2"|"3"|"4"|"5"|"6"|"7"|"8"|"9")
                               sem ch:=CHR(typ) endsem.

    HexDigit <out:ch> =
      Digit <out:ch>
      | (A|B|C|D|E|F)          sem ch:=CHR(typ) endsem.

    ENDGRAM

The rules for Number and String need some explanation. Numerical constants cannot be converted while they are being recognized, because decimal, hexadecimal, octal, and real constants can be distinguished only by their last character or by a decimal point. Their text must therefore be stored and converted later.

Strings also have their peculiarities. Our mini-scanner returns the character sequences '..', '(*', and '*)' as single terminals. If one of these sequences appears within a string, it has to be expanded again, since strings must be stored in their original form. Therefore, the rule for strings becomes more complicated than expected.

On the other hand, the description of strings and comments with the symbol any looks very simple and elegant. In accordance with Section 5.2.1, any represents all those terminals that cannot be recognized in its place at this point in the grammar (in String: all terminals except '..', '(*', '*)', CR, and ''' (or '"'); in Comment: all terminals except '(*' and '*)'). The example also shows the semantic processing of any. In a string, the symbol recognized by any is processed using the global variable typ (see Section 6.4.2).

The reason for the introduction of the terminals '..', '(*', and '*)' is not obvious, and requires an explanation: the symbol '..' is necessary, because otherwise a lookahead of 2 characters would be needed (the first period in the
sequence '1..2' may be a decimal point or the start of a range operator). Although comments can be processed with a single lookahead character, it simplifies the processing of comments considerably if we treat the sequences '(*' and '*)' as single terminals.

LL(1) Conflicts

As shown by Example 8.1, it is often difficult to avoid LL(1) conflicts when lexical structures are described by an attributed grammar.

Example 8.1  LL(1) conflicts in lexical structures

    Scanner = {Symbol}.
    Symbol = ... | "=" | ">" ["="].

This situation represents an LL(1) conflict because if '>' is read and the next character is '=', the syntax analyzer cannot decide whether this character belongs to the symbol '>=' or whether it constitutes a separate symbol '='.

Such conflicts also appear in the symbols ':=', '<>', '<=', Identifier, and Number. However, they are not critical since the syntax analyzer always chooses the first alternative it encounters during analysis. In the example above, this means that '=' is correctly considered part of the symbol '>=' rather than being recognized as a separate symbol.

Speed

A lexical analyzer implemented with Coco runs at approximately one-half the speed of a hand-coded analyzer. A 35% speed gain can be achieved if the nonterminals Letter and Digit with their many alternatives are already recognized as terminal classes by the mini-scanner.

Assessment

The example has shown how easily a translation process can be described with Cocol. At first glance, the grammar may seem a bit confusing. Yet, as soon as one becomes familiar with this notation, the following advantages can be observed:

1. The grammar is short and precise. For the recognition of a symbol, it is sufficient to write its name without any additional actions.
2. The syntax is clearly separated from the semantics. Thus the syntax is more explicit than it is in a hand-coded compiler.
3. From the syntax declarations, one can see immediately which terminals and nonterminals are in the language.
4. Error-handling actions need not be described explicitly.
5. Many constructs, like nested comments, can be described with any in a straightforward and elegant way which is hard to surpass.

Of course, there are some parts of the grammar which are not very simple to read, e.g. the production for Number. It has a rather complex structure, but this only shows that Cocol can also handle difficult constructs. After all, the production for Number describes four different kinds of numerical constants. This would be difficult to read in a hand-coded lexical analyzer, too, and could hardly be written in this short and concise form using a conventional programming language.

8.1.3 Semantic procedures for lexical analysis

We decompose the semantic procedures of the attributed grammar into four modules Scannerlex, OutMod, ListMod, and Conversions and specify their definition modules, but omit their implementation modules due to space limits.

    DEFINITION MODULE Scannerlex;
      VAR typ,col,line: CARDINAL;    (*information about the current token*)
          at: ARRAY[1..10] OF CHAR;  (*not needed here*)
      PROCEDURE GetSy;
    END Scannerlex.

Scannerlex reads and prints a source text and returns every single character as a separate token. The token number as well as its column and its line number are returned by GetSy in the global variables typ, col, and line. The token numbers are the ASCII-values of the source characters (exceptions: eofch=0, '..'=1, '(*'=2, and '*)'=3). After the last character in the source text is read, GetSy always returns eofch.
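The token numbering described for GetSy can be illustrated with a small sketch. It is written in Python rather than Modula-2 and is not the omitted implementation of Scannerlex: the name get_symbols, the delivery of tokens as (typ, line, col) tuples instead of global variables, counting lines and columns from 1, and reporting a line end as chr 13 (CR) are all assumptions made for illustration. Printing the source text is left out.

```python
# Special token numbers named in the text: eofch=0, '..'=1, '(*'=2, '*)'=3.
EOFCH, RANGE, COMMENT_OPEN, COMMENT_CLOSE = 0, 1, 2, 3

# Two-character sequences that the mini-scanner delivers as single terminals.
SPECIAL = {"..": RANGE, "(*": COMMENT_OPEN, "*)": COMMENT_CLOSE}


def get_symbols(source):
    """Yield (typ, line, col) for every terminal of source, ending with EOFCH."""
    line, col, pos = 1, 1, 0
    while pos < len(source):
        pair = source[pos:pos + 2]
        if pair in SPECIAL:
            # One token for the whole two-character sequence.
            yield SPECIAL[pair], line, col
            pos += 2
            col += 2
            continue
        ch = source[pos]
        if ch == "\n":
            yield 13, line, col          # assumption: line end is reported as CR
            line, col = line + 1, 1
        else:
            yield ord(ch), line, col     # token number = character code
            col += 1
        pos += 1
    yield EOFCH, line, col
```

For the input 1..2 the sketch delivers the digit 1, then the single terminal '..', then the digit 2, so the grammar for Number never has to look two characters ahead to tell a decimal point from a range operator.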
    DEFINITION MODULE OutMod;
      TYPE Symboltype = (*token codes*)
        (eofsy, andsy, divsy, timessy, slashsy, modsy, notsy, plussy,
         minussy, orsy, eqlsy, neqsy, grtsy, geqsy, lsssy, leqsy, insy,
         lparsy, rparsy, lbracksy, rbracksy, lconbrsy, rconbrsy, commasy,
         semicolonsy, periodsy, colonsy, rangesy, constsy, typesy, varsy,
         arraysy, recordsy, variantsy, setsy, pointersy, tosy, arrowsy,
         importsy, exportsy, fromsy, qualifiedsy, beginsy, casesy, ofsy,
         ifsy, thensy, elsifsy, elsesy, loopsy, exitsy, repeatsy, untilsy,
         whilesy, dosy, withsy, forsy, bysy, returnsy, becomessy, endsy,
         callsy, definitionsy, implementationsy, proceduresy, modulesy,
         ident, cardcon, intcardcon, realcon, charcon, stringcon, eolsy);
      PROCEDURE Emit(sy:Symboltype; col:CARDINAL);
      PROCEDURE EmitConstant(sy:Symboltype; val,col:CARDINAL);
      PROCEDURE EmitIdent(spix,col:CARDINAL);
      PROCEDURE EmitString(addr,len,col:CARDINAL);
    END OutMod.

The module OutMod contains procedures to write symbols to an intermediate language file.

Emit writes a symbol without attributes (e.g. a keyword, an operator or a single character) to the intermediate language. It emits a word which contains the symbol type sy and the column col of that symbol.

EmitConstant writes a numeric constant to the intermediate language. It emits two words, the first of which contains the type sy and the column col of the symbol and the second the constant value val.

EmitIdent writes a name to the intermediate language. It emits two words, the first of which contains the symbol type 'ident' and the column col and the second the spelling index (spix) of the name.

EmitString writes a string to the intermediate language. It emits three words, the first of which contains the symbol type 'string' and the column col, the second the string address addr and the third the string length len.

    DEFINITION MODULE ListMod;
      FROM OutMod IMPORT Symboltype;
      PROCEDURE EnterString(buffer:ARRAY OF CHAR; len:CARDINAL;
                            VAR addr:CARDINAL);
      PROCEDURE Hash(buffer:ARRAY OF CHAR; len:CARDINAL;
                     VAR sy:Symboltype; VAR spix:CARDINAL);
    END ListMod.

ListMod handles the name list and the string list of the scanner.

EnterString enters a string (stored in buffer[1..len]) into the string list and returns its address addr.

Hash searches for a name (stored in buffer[1..len]) in the name list. If the name is not found, it is entered. For keywords Hash returns the token code of the keyword and spix is 0. Otherwise Hash returns the token code 'ident' and spix is the address (spelling index) of the name in the name list.
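The contract of Hash and EnterString can be made concrete with a short sketch. It is a Python illustration only, since the implementation module of ListMod is omitted in the text. The class name NameList, the dictionary-based lookup, the use of strings for token codes, the keyword excerpt, and the choice of list indices as string addresses are assumptions; the buffer is indexed from 0 here, whereas the Modula-2 buffer is indexed from 1.

```python
# Excerpt of the keyword table; token codes are shown as plain strings.
KEYWORDS = {"BEGIN": "beginsy", "END": "endsy", "IF": "ifsy", "THEN": "thensy"}


class NameList:
    """Hypothetical stand-in for the name list and string list of ListMod."""

    def __init__(self):
        self.names = {}      # spelling -> spelling index (spix), starting at 1
        self.strings = []    # string list; the address is the list index

    def hash(self, buffer, length):
        """Return (sy, spix) for the name stored in buffer[0:length]."""
        name = "".join(buffer[:length])
        if name in KEYWORDS:
            return KEYWORDS[name], 0             # keyword: its token code, spix = 0
        if name not in self.names:               # unknown name: enter it
            self.names[name] = len(self.names) + 1
        return "ident", self.names[name]

    def enter_string(self, buffer, length):
        """Store buffer[0:length] in the string list and return its address."""
        self.strings.append("".join(buffer[:length]))
        return len(self.strings) - 1
```

Because a repeated name yields the same spelling index, later compiler phases can compare names by comparing two numbers instead of two character strings.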
    DEFINITION MODULE Conversions;
      FROM OutMod IMPORT Symboltype;
      PROCEDURE Convert(buffer:ARRAY OF CHAR; len:CARDINAL;
                        VAR sy:Symboltype; VAR val:CARDINAL);
      PROCEDURE ConvertReal(buffer:ARRAY OF CHAR; len:CARDINAL;
                            VAR rval:REAL);
    END Conversions.

The module Conversions converts digit strings to cardinal or real numbers.

The procedure Convert converts a digit string (stored in buffer[1..len]) to a numeric constant or a character constant. The digit string may have the following syntax:

    digitstring = digit {digit}                  -- decimal constant
                | digit {hexdigit} 'H'           -- hex constant
                | octaldigit {octaldigit} 'B'    -- octal constant
                | octaldigit {octaldigit} 'C'.   -- character constant

For numeric constants the output parameter sy is cardcon and val is in the range 0..65535; for character constants sy is charcon and val is in the range 0C..377C.

ConvertReal converts a digit string (stored in buffer[1..len]) to its real value rval. The syntax of the digit string is

    digitstring = digit {digit} '.' {digit} ['E' ['+'|'-'] digit [digit]].

8.2 Applications in software engineering

An attributed grammar as a description method and a compiler compiler as an implementation tool are not limited to compiler construction. They can also be useful in other fields of software engineering. The reason why compiler construction techniques can be generally used in software engineering is that most large programs have the following characteristics:

1. Input streams are sufficiently complex to be described in terms of syntax and semantics.
2. The structure of the input text often determines the logical structure of the entire program or of a large portion of it.

This wide field of applications is remarkable. We will now show that the well-known Jackson method of program design can be regarded as a special case of program design with attributed grammars. With this in mind, in this section the compiler description language is emphasized while the compiler compiler stays in the background.

8.2.1 Attributed grammars as a software design method

The use of attributed grammars automatically leads to a two-step design process: In the first step (coarse design) the problem is decomposed into its syntactical and semantical parts. Here, the attributed grammar serves as a design method. In the second step (refined design) the semantic procedures are designed from their specifications in the coarse design.

The creation of the coarse design consists of the following steps, which may be executed sequentially or iteratively:
1. Write the grammar. The syntactic structure of the input text is described by a context-free grammar.

2. Find attributes. Starting from the meaning of each syntax symbol, one tries to find out which (semantic) attributes should be attached to it. Then one defines these attributes and their occurrences in the grammar rules. With some experience and a proper understanding of the problem the right choice is almost automatic. This step is therefore also a good check on correct understanding of the problem.

3. Prepare context conditions. Further attributes may be necessary for this process.

4. Define semantic procedures. In this step, all procedures which are used in semantic actions are defined. The refinement of semantic actions into code and procedure calls may again be done in a coarse or fine manner. Using the first approach, one may associate a special semantic procedure with each semantic action; using the latter approach, one may describe each semantic action in terms of elementary operations of a programming language without calling semantic procedures. Since many of the semantic procedures are usually access procedures to data structures, they support a modular design in the form of data capsules. The collection of all procedures shows which operations can be performed with the various data structures and which relations exist between the data structures.

5. Set up the attributed grammar. One combines the context-free grammar, the attributes, the semantic actions, and any context conditions into a proper attributed grammar.

After these five steps, the coarse design is completed and the following has been accomplished:

1. The problem has been decomposed into three parts: syntax, context conditions, and semantic actions.
2. The attributes and the data structures derived from them are the terms in which the problem solution can be appropriately described.
3. The access routines to the data structures and all other algorithms required for the solution are defined by the semantic procedures.

This completes the design method with attributed grammars. The result is sufficiently abstract to fix only the essential semantic design decisions but to leave enough freedom to the implementor. On the other hand it is sufficiently concrete to specify explicitly those details that should not be left to the decision of the implementor.
The result of the coarse design, consisting of a system of attributes, semantic procedures, and an attributed grammar, can be viewed as the specification for the refined design, since it describes what is to be done but not how it should be done.

The next step is the refined design, which may now concentrate exclusively on the semantic procedures without having to consider any syntactic problems. However, coarse design and refined design may influence each other. After the definition of the attributes, one may find that the semantic procedures are either too abstract or too concrete, too complex or too simple. For example, too many access procedures to the data structures of a module may indicate that it would have been better to add a lower level of abstraction, and to divide the large module into several smaller ones. The concise and formal notation of attributed grammars encourages one to try several approaches and to check their consequences without much effort, even when the task is large.

The refined design is followed by the implementation. Only a lexical analyzer has to be written here; the rest is done by the compiler compiler.

8.2.2 The telegram problem as an example

Henderson and Snowdon [1972] presented the following problem, which is known as the 'telegram problem':

    A stream of telegrams is to be processed. Each telegram is terminated by the string 'ZZZZ'. The telegram stream is terminated when an empty telegram followed by 'ZZZZ' arrives. The words in a telegram are to be counted. Long words with more than 12 characters are to be counted separately. After each telegram, the counter values are to be printed. The telegrams are read and subsequently printed in lines of 100-120 characters. Superfluous blanks are to be eliminated. The maximum word length is 20 characters. Longer words are to cause the program to stop.
Since the input consists of structured data, and its structure will significantly determine the algorithm, this task is well suited for the application of attributed grammars, and a subsequent implementation with a compiler compiler.

The design steps for the solution of the telegram problem are:

1. Set up the grammar of the input data

Terminals:

    textword          a word in a telegram
    endword           end word (= ZZZZ)

Nonterminals:

    TelegramStream    the total telegram stream
    TextTelegram      a text telegram (including its end word)
    EmptyTelegram     an empty telegram containing only the end word

Context-free grammar:

    TelegramStream = {TextTelegram} EmptyTelegram.
    TextTelegram = textword {textword} endword.
    EmptyTelegram = endword.

2. Define attributes. From the specification of the task, three attributes result:

    w    array of char    the text of a word
    n    integer          the number of words in a telegram
    l    integer          the number of long words in a telegram

3. Assign attributes to the grammar symbols. In this step, we list the grammar symbols and attach attributes to them.

    textword↑w          recognizes a word and provides its text w.
    TextTelegram↑n↑l    recognizes and prints a telegram with n words, of which l words are long.

The remaining grammar symbols have no attributes. Note that the attributed symbols are viewed from an algorithmic point of view (i.e. we do not say 'TextTelegram represents a telegram', but rather 'TextTelegram recognizes a telegram'). The verbal description of the attributed symbols should specify all attributes of the symbol. It should be accurate enough to be used as a specification of the translation process. This is usually possible and easy to accomplish since the few items involved have already been defined.

4. Define semantic procedures. The actions the program must execute can be seen from the problem description:

(a) read the source text, recognize and count the words;
(b) print the source text with a different line length;
(c) print the counter values.

Reading the source text is the task of the lexical analyzer and does not concern us here. The words are counted with the attributes n and l. Therefore, the only candidates for semantic procedures are those which print the text and the counter values. A variable will probably be needed to ensure that the line size will not exceed 120 characters.
It will be initialized at the beginning of each telegram, and will be checked and increased when a new word is added to the line. A line buffer may also be needed. Following the principle of stepwise refinement, we are not yet interested in the implementation details here. Rather, we define the following three procedures which will do the whole printing job.
    OutInit             initialize the output of a telegram;
    OutWord(↓w)         print the word w according to the problem definition;
    OutAccount(↓n↓l)    print the counter values n and l with an appropriate text.

5. Write down the attributed grammar. Having completed steps 1 through 4, the attributed grammar is now almost self-evident:

    TelegramStream =
      {                              sem OutInit endsem
        TextTelegram↑n↑l             sem OutAccount(↓n↓l) endsem
      }
      EmptyTelegram.

    TextTelegram↑n↑l =
      textword↑w where (|w|<=20)     sem n:=1;
                                         if |w|>12 then l:=1 else l:=0 end;
                                         OutWord(↓w) endsem
      { textword↑w where (|w|<=20)   sem n:=n+1;
                                         if |w|>12 then l:=l+1 end;
                                         OutWord(↓w) endsem
      }
      endword.

    EmptyTelegram = endword.

This completes the coarse design of the telegram problem. Syntax and semantics are clearly separated. Together they provide a clear decomposition of the program, making its structure apparent. The separation shows that the semantic processing - i.e. the essential part - is very simple if there is a printing module with the access procedures OutInit, OutWord, and OutAccount.

A comparison with Henderson and Snowdon's solution shows that in their program lexical analysis and syntax analysis attract the major part of the attention in design, program text, and possible design errors. Output and counting are of minor importance and are nearly lost. Their solution avoids the terms syntax and semantics, thus letting the problem appear to be much more complex than it is. In contrast, we focus most of our attention on printing and counting. We consider lexical analysis and syntax analysis as routine matters that do not require special attention.
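To see how little semantic processing the attributed grammar actually requires, the design can be transcribed by hand into a small recursive-descent program. The following Python sketch is only an illustration under several assumptions: it is not output of Coco, the words are assumed to be already separated by a lexical analyzer, the input is assumed to end with an empty telegram, the class Printer and the wording of the counter line are invented for the example, and only the upper line limit of 120 characters is enforced.

```python
LONG, MAXLEN, LINE_WIDTH, ENDWORD = 12, 20, 120, "ZZZZ"


class Printer:
    """Hypothetical printing module offering OutInit, OutWord and OutAccount."""

    def __init__(self):
        self.lines = []       # finished output lines
        self.current = ""     # line buffer

    def out_init(self):
        self.current = ""

    def out_word(self, w):
        # Start a new line if the word no longer fits into the current one.
        if self.current and len(self.current) + 1 + len(w) > LINE_WIDTH:
            self.lines.append(self.current)
            self.current = ""
        self.current = w if not self.current else self.current + " " + w

    def out_account(self, n, l):
        if self.current:
            self.lines.append(self.current)
            self.current = ""
        self.lines.append(f"{n} words, {l} long")


def text_telegram(words, pos, out):
    """TextTelegram (out: n, l) = textword {textword} endword."""
    n = l = 0
    while words[pos] != ENDWORD:
        w = words[pos]
        if len(w) > MAXLEN:                  # context condition |w| <= 20
            raise ValueError(f"word too long: {w}")
        n += 1
        if len(w) > LONG:
            l += 1
        out.out_word(w)
        pos += 1
    return pos + 1, n, l                     # skip the endword


def telegram_stream(words, out):
    """TelegramStream = {TextTelegram} EmptyTelegram."""
    pos = 0
    while words[pos] != ENDWORD:             # a textword starts a TextTelegram
        out.out_init()
        pos, n, l = text_telegram(words, pos, out)
        out.out_account(n, l)
    # words[pos] is the endword of the EmptyTelegram
```

A call such as telegram_stream("HELLO WORLD ZZZZ ZZZZ".split(), Printer()) processes one telegram of two words. The two parsing functions mirror the two grammar rules, and everything else sits in the printing module, which matches the division of labour described above.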
8.2.3 Attributed grammars as documentation

From the above, it should be obvious that attributed grammars are also well suited for documentation. The system of syntax, attributes, semantic procedures and the attributed grammar of a software product is its documentation (on the abstraction level of the attributed grammar). The following advantages of this documentation method are evident:

1. The form of the documentation (its structure) is easy to find. It is almost independent of the product to be described, and consists of the parts: terminals, nonterminals, context-free grammar, attributes, attributed grammar symbols, semantic procedures, and attributed grammar (in this order). This arrangement aids standardization.
2. The documentation is formal and therefore precise, complete, and short.
3. The documentation is abstract enough to hide implementation details, but concrete enough to express important conceptual details.
4. The fact that attributed grammars represent a machine-readable documentation renders it unnecessary to separate implementation and documentation, thus ensuring that the documentation is always up-to-date.

8.2.4 The Jackson method as a special case

At a quick glance, the often discussed Jackson method of program design seems to have nothing in common with attributed grammars. Jackson [1975] uses a totally different terminology and describes his method only by examples in an indirect and unsystematic manner. To find out the essence of Jackson's method, the reader is forced to study other literature.

The Jackson method is based on the following three concepts:

1. The structure of an algorithm is derivable from its input and output data.
2. The structure of the input and output data is described by tree diagrams which allow the description of sequences, alternatives, and (unlimited) repetitions.
3. If the structures of the input and output data 'match' in a certain way, the total algorithm for the transformation of the input into the output data can be viewed as an assembly of the transformation algorithms for the individual substructures.

If the structures of the input and output data do not match, the Jackson method fails. However, in the examples in his book, Jackson shows that his method can still be used with the aid of tricks such as 'backtracking', 'program inversion', and some other techniques.

Hughes [1979] looked at the Jackson method from the standpoint of formal languages and summarized the following points:

1. Jackson's tree diagrams describe only regular languages since they are only based on sequences, alternatives, and unlimited iterations.
2. In addition, it is required that the input data can be deterministically analyzed with a single-character look-ahead.
3. Jackson's requirement of a structural matching between input and output data means in the terminology of formal languages that there must be a finite automaton that transforms the input into the output.

Jackson's design method can be viewed as a special case of the design method with attributed grammars, in which:

1. the input data is regular and its grammar is LL(1);
2. the output data form a regular language;
3. a certain correspondence exists between input and output language that manifests itself in the fact that a finite automaton can be found that transforms the input into the output.

It is therefore only applicable to a narrow range of tasks that meet these conditions.

It is surprising that this relationship between Jackson's method and the design method with attributed grammars has hardly been recognized. The reason for this may be that Jackson does not distinguish between syntax and semantics (in fact, they are indistinguishably coupled in his examples), and does not use attributes. If we describe the examples in Jackson's book with attributed grammars, they will become simpler, shorter, and easier to understand. The grammars are simple throughout. We will show this by example 14 of Jackson's book, which in his discussion covers 17 pages, and is the most voluminous of the entire book.

Problem description. An operating system collects data about its use.
These data are: A record for the start of each session (LOGON), the end of a session (LOGOFF), the start of a program (PROGSTART), and the end of a program (PROGEND). At logon time, the user is assigned a unique session number. The system makes sure that a user can start a session only when his terminal is free, and cannot terminate a session that he has not initiated. Furthermore, a user can have only one active program at any given time. He must terminate an active program before starting a new one.
The collected data is written to a file. The records have the following contents:

    Logon record:      LOGON      session number   start time
    Logoff record:     LOGOFF     session number   stop time
    Progstart record:  PROGSTART  session number   program name   start time
    Progend record:    PROGEND    session number   program name   stop time

The records are stored in strict chronological order. However, it is possible that records are missing due to erroneous processing. In this case, the data file contains incomplete information for some sessions and programs: a logon record without corresponding logoff record, and vice versa; a progstart record without corresponding progend record, and vice versa.

As a result, the program should produce the following list:

    Number of complete sessions = nnnn
    Average session length      = tttt
    Number of known sessions
    Number of complete programs = pppp
    Average program length      = uuuu
    Number of known programs    = qqqq

Grammar. The input consists of four kinds of records. We regard them as terminals: logon, logoff, progstart, and progend, and arrive at the following grammar:

    input = {logon | logoff | progstart | progend}.

It consists of a single rule (for regular languages, there is always a grammar that consists of a single rule). In accordance with the problem description we attach attributes to the terminals:

    session: integer    session number
    prog:    name       program name
    time:    integer    time of logon, logoff, progstart and progend

and get the attributed grammar symbols

    logon↑session↑time
    logoff↑session↑time
    progstart↑session↑prog↑time
    progend↑session↑prog↑time

Semantics. In the semantic actions, we need variables that hold the results. We call them

    completesessions: integer    number of complete sessions
    knownsessions:    integer    number of known sessions
    completeprogs:    integer    number of complete programs
    knownprogs:       integer    number of known programs
    sessiontime:      integer    length of all complete sessions
    progtime:         integer    length of all complete programs

It is clear from the above that, when a logon record appears, the job number and the start time must be stored until a logoff record with the same job number is encountered. The same is true for programs. For the time being, we will put the definition of the concrete data structures in the background, and consider only the fact that we need the following access procedures:

    NewSession(↓session↓time)
        Define the start of a session at the specified time.
    DisposeSession(↓session)
        Define the end of a session.
    SessionStarted(↓session): boolean
        Return true if the specified session has been started.
    SessionStartTime(↓session): integer
        Return the start time of the specified session.
    NewProg(↓session↓prog↓time)
        Define the start of the program prog in the specified session at the specified time.
    DisposeProg(↓session↓prog)
        Define the end of the program prog in the specified session.
    ProgStarted(↓session↓prog): boolean
        Return true if prog in session has been started.
    ProgStartTime(↓session↓prog): integer
        Return the start time of prog in session.
    InitStorage
        Initialize the abstract data structure.

Attributed grammar. With only these few facts, which are easily derived by modest thought, the attributed grammar of the problem can be formulated:

    input =
        sem InitStorage;
            completesessions:=0; knownsessions:=0;
            completeprogs:=0; knownprogs:=0;
            sessiontime:=0; progtime:=0;
        endsem
      { logon↑session↑time
        sem knownsessions:=knownsessions+1;
            NewSession(↓session↓time);
        endsem
      | progstart↑session↑prog↑time
        sem knownprogs:=knownprogs+1;
            NewProg(↓session↓prog↓time)
        endsem
      | progend↑session↑prog↑time
        sem if ProgStarted(↓session↓prog) then
              completeprogs:=completeprogs+1;
              progtime:=progtime+(time-ProgStartTime(↓session↓prog))
              DisposeProg(↓session↓prog)
            else knownprogs:=knownprogs+1
            end
        endsem
      | logoff↑session↑time
        sem if SessionStarted(↓session) then
              completesessions:=completesessions+1;
              sessiontime:=sessiontime+(time-SessionStartTime(↓session))
              DisposeSession(↓session);
            else knownsessions:=knownsessions+1
            end
        endsem
      }
        sem Write(↓completesessions)
            Write(↓sessiontime/completesessions)
            Write(↓knownsessions)
            Write(↓completeprogs)
            Write(↓progtime/completeprogs)
            Write(↓knownprogs)
        endsem

At this point, the coarse design is already completed. The refined design will decide about the concrete implementation of the abstract data structure.

In principle, the program can be implemented with a compiler compiler. In order to read the input data, only a (trivial) lexical analyzer needs to be written. But since the grammar of this problem is so simple (as it is also for the telegram problem), the use of a compiler compiler is analogous to taking a sledgehammer to crack a nut. It is therefore almost self-explanatory that the syntax analyzer for this problem is coded using the method of recursive descent (in this case it is even non-recursive). Jackson instead undertakes voluminous considerations about intermediate data files and program inversions which make the task appear much more complicated than it really is.
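The single-rule grammar with its semantic actions can be sketched as a plain loop over the records, which is what a non-recursive hand-coded analyzer amounts to here. The following Python sketch is only an illustration: the `Record` layout, the use of two dictionaries for the abstract data structure, and the `average` helper (which guards against division by zero, a case the grammar above does not treat) are assumptions, not part of the original design.

```python
from collections import namedtuple

# Hypothetical record layout; prog is None for logon and logoff records.
Record = namedtuple("Record", "kind session prog time")


def summarize(records):
    """One pass over the records, mirroring the semantic actions of the grammar."""
    # Assumed concrete form of the abstract data structure: two dictionaries.
    session_start = {}  # session -> start time           (NewSession/DisposeSession)
    prog_start = {}     # (session, prog) -> start time   (NewProg/DisposeProg)
    stats = {
        "completesessions": 0, "knownsessions": 0, "sessiontime": 0,
        "completeprogs": 0, "knownprogs": 0, "progtime": 0,
    }
    for rec in records:
        if rec.kind == "logon":
            stats["knownsessions"] += 1
            session_start[rec.session] = rec.time
        elif rec.kind == "progstart":
            stats["knownprogs"] += 1
            prog_start[(rec.session, rec.prog)] = rec.time
        elif rec.kind == "progend":
            key = (rec.session, rec.prog)
            if key in prog_start:                 # ProgStarted
                stats["completeprogs"] += 1
                stats["progtime"] += rec.time - prog_start.pop(key)
            else:                                 # progstart record was lost
                stats["knownprogs"] += 1
        elif rec.kind == "logoff":
            if rec.session in session_start:      # SessionStarted
                stats["completesessions"] += 1
                stats["sessiontime"] += rec.time - session_start.pop(rec.session)
            else:                                 # logon record was lost
                stats["knownsessions"] += 1
        else:
            raise ValueError("unknown record kind: %r" % (rec.kind,))
    return stats


def average(total, count):
    """Average length, or None when nothing was completed (an added guard)."""
    return total / count if count else None
```

Each branch of the loop corresponds to one alternative of the rule, so the correspondence between grammar and program can be checked alternative by alternative.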
8.3 Results of a Coco run

For readers interested in the way Coco works, we present an example showing the contents of the compiler parts generated from a specific input grammar. It can be viewed as a supplement to the implementation description in Chapter 7, and should help the reader to understand the principles explained there.

The example will be the description of an index generator, which is a program that generates an index from a list of keywords entered according to some syntactic rules. This problem provides another example of the use of attributed grammars in software engineering.

The input to the index generator is to be as follows: for each page of a document, the page number and all keywords on this page are entered in the following manner:

    1 = Introduction; User's Guide;
    2 = Start up; Parts of the tool;
    3 = General characteristics; User's Guide;

On the left-hand side of the '=' sign, page numbers as well as words are allowed. Words, however, must start with a '*':

    *Appendix = Maintenance; Troubleshooting;

From this input, the compiler generates a file of pairs <keyword, page number>, sorts this file, and prints an index in which page numbers of identical keywords are collected (the index at the end of this book was produced with such a program). In our example, we will describe the first phase of this compiler, i.e. the generation of the <keyword, page number> file.

     1  GRAMMAR Index
     2
     3  SEMANTIC DECLARATIONS
     4    FROM FileIO IMPORT File,Open,Close,Write,WriteString,WriteLn;
     5    FROM Indexlex IMPORT GetKeyword,AdjustNumber;
     6
     7    VAR f: File;
     8        keystring,refstring,string: ARRAY[1..50] OF CHAR;
     9        value: CARDINAL;
    10
    11  TERMINALS
    12    "=" alias equal                 -- 1
    13    ";" alias semicolon             -- 2
    14    "*" alias asterisk              -- 3
    15    keyword                         -- 4
    16    number<out:value>               -- 5
    17
    18  PRAGMAS
    19    eolsy                           -- 6
    20
    21  NONTERMINALS
    22    Index                           -- 7
    23    Relation                        -- 8
    24    Reference<out:string>           -- 9
    25
    26  RULES
    27
    28  Index =
    29                                    sem Open(f,"INDEX.OUT") endsem
    30    {Relation}                      sem Close(f) endsem.
    31
    32  Relation =
    33    Reference<out:refstring>
    34    "="
    35    { keyword                       sem GetKeyword(↑keystring);
    36                                        WriteString(↓f,↓keystring);
    37                                        Write(↓f,↓CHR(0));
    38                                        WriteString(↓f,↓refstring);
    39                                        WriteLn(f) endsem
    40      ";"
    41    }.
    42
    43  Reference<out:string> =
    44      number<out:value>             sem AdjustNumber(↓value,↑string) endsem
    45    | "*" keyword                   sem GetKeyword(↑string) endsem.
    46
    47  ENDGRAM

This is the description of the translation process. The only thing the user has to provide is the module Indexlex that supplies the terminals and exports the two procedures GetKeyword and AdjustNumber. GetKeyword should return the keyword string that the lexical analyzer has obtained after recognition of the terminal keyword. AdjustNumber should right-justify a number in a character field for sorting. The pragma eolsy is specified only to show how pragmas are encoded in the generated tables.

From this input, Coco generates a table-driven syntax analyzer and a semantic evaluator. These modules will be discussed in the next sections.

8.3.1 The generated syntax analyzer

The syntax analyzer is generated from a frame program (cocosynframe, shown in Appendix F) into which Coco inserts the following constant declarations.
    CONST
      maxname  = 75;  (*length of name list*)
      maxnamep = 9;   (*number of names*)
      maxcode  = 48;  (*length of G-code*)
      maxany   = 1;   (*number of any-sets. At least one dummy*)
      maxeps   = 2;   (*number of eps-follower sets*)
      maxt     = 5;   (*last terminal number*)
      maxp     = 6;   (*last pragma number*)
      maxs     = 9;   (*last nonterminal number*)
      startpc  = 44;  (*start address of the grammar*)

These values are the table dimensions derived from the above grammar.

8.3.2 The generated semantic evaluator

The semantic evaluator also consists of fixed frame parts and parts that are copied from the attributed grammar. For the index generator, the semantic evaluator is as follows (generated parts are shown in italics and frame parts are shown in roman type):

    IMPLEMENTATION MODULE Indexsem;
    FROM SYSTEM IMPORT WORD;
    FROM Indexlex IMPORT at;
    FROM FileIO IMPORT File,Open,Close,Write,WriteString,WriteLn;
    FROM Indexlex IMPORT GetKeyword,AdjustNumber;

    VAR f: File;
        keystring,refstring,string: ARRAY[1..50] OF CHAR;
        value: CARDINAL;

    PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
    BEGIN x:=y END ASSIGN;

    PROCEDURE Semant(sem:CARDINAL);
    BEGIN
      CASE sem OF
        11: ;
      | 12: (*line 29*)
          Open(f,"INDEX.OUT")
      | 13: (*line 30*)
          Close(f)
      | 14: (*line 33*)
          refstring:=string;
      | 15: (*line 35*)
          GetKeyword(keystring);
          WriteString(f,keystring);
          Write(f,CHR(0));
          WriteString(f,refstring);
          WriteLn(f)
      | 16: (*line 44*)
          ASSIGN(value,at[1]);
      | 17: (*line 44*)
          AdjustNumber(value,string)
      | 18: (*line 45*)
          GetKeyword(string)
      END;
    END Semant;

    END Indexsem.

8.3.3 The generated parser tables

Coco generates the following tables:

1. G-code;
2. information about nonterminals (G-code start address, deletability, set of start symbols);
3. terminal successors of eps-symbols;
4. symbol sets represented by any-symbols;
5. number of attributes for terminals and pragmas;
6. number of semantic actions for pragmas;
7. symbol names for error messages.

The table values are inserted as initialization code into the generated syntax analyzer. We will now show these values in a decoded form.

G-code

    Address   Instruction           Code (addresses take 2 bytes)
       --     Index
        1     SEM12                 12
        2     NTA  Relation, 9       3   8   0   9
        6     JMP  2                10   0   2
        9     EPS  1                 8   1
       11     SEM13                 13
       12     RET                   11
       --     Relation
       13     NT   Reference         2   9
       15     SEM14                 14
       16     T    "="               0   1
       18     TA   keyword, 28       1   4   0  28
       22     SEM15                 15
       23     T    ";"               0   2
       25     JMP  18               10   0  18
       28     EPS  2                 8   2
       30     RET                   11
       --     Reference
       31     TA   number, 38        1   5   0  38
       35     SEM16                 16
       36     SEM17                 17
       37     RET                   11
       38     T    "*"               0   3
       40     T    keyword           0   4
       42     SEM18                 18
       43     RET                   11
       --     dummy rule
       44     NT   Index             2   7
       46     T    EOF               0   0
       48     RET                   11

The entire grammar occupies only 48 bytes of G-code!

Nonterminal description

    symbol (no.)    start address   deletability    terminal start symbols
    Index (7)        1              deletable       {"*", number}
    Relation (8)    13              nondeletable    {"*", number}
    Reference (9)   31              nondeletable    {"*", number}

eps-successors

    1: {EOF}
    2: {EOF, "*", number}

Number of attributes for terminals and pragmas

    EOF: 0        keyword: 0
    "=": 0        number:  1
    ";": 0        eolsy:   0
    "*": 0

Pragma semantics

              attribute passing action    user action
    eolsy:    0                           0

Symbol names

    names:          EOF/equal/semicolon/asterisk/keyword/number/eolsy/Index/Relation/Reference
    name pointers:  1, 5, 11, 21, 30, 38, 45, 51, 57, 66
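The relationship between the name list and the name pointers can be checked with a few lines of code. The Python sketch below is an illustration only; it assumes that each pointer is the 1-based position of a name's first character and that each name occupies one extra terminating position (shown as '/' above). The function names are hypothetical.

```python
# Decoded symbol-name table of the index generator, as listed above.
NAMES = "EOF/equal/semicolon/asterisk/keyword/number/eolsy/Index/Relation/Reference"
NAME_POINTERS = [1, 5, 11, 21, 30, 38, 45, 51, 57, 66]


def build_pointers(names, sep="/"):
    """Recompute the 1-based start position of every name in the list."""
    pointers = []
    pos = 1
    for name in names.split(sep):
        pointers.append(pos)
        pos += len(name) + 1  # the name plus its terminating character
    return pointers


def symbol_name(symbol_no, names=NAMES, pointers=NAME_POINTERS, sep="/"):
    """Look up the name of a symbol number, e.g. for an error message."""
    start = pointers[symbol_no] - 1  # convert to a 0-based string index
    end = names.find(sep, start)
    if end == -1:                    # the last name has no separator after it
        end = len(names)
    return names[start:end]
```

Under these assumptions the recomputed pointers agree with the listed ones, and the list length of 74 characters plus one terminator for the last name gives the 75 stated for maxname.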
9 Experiences with Coco

In 1981 workers at the University of Linz built a parser generator that generates parser tables for an LL(1) syntax analyzer from an input grammar in Wirth's EBNF notation. The generator proved useful, which is the reason why it was enhanced in 1983, and eventually evolved into the compiler compiler Coco. The first version of Coco ran on an Intel 8080 development system, and was written in PL/M-80, a language similar to PL/I for microcomputers. Since then, many more versions of Coco have been implemented in Modula-2 on various microcomputers including the Macintosh, the IBM-PC, the Atari 1040 and the Lilith. There is also a version for IBM mainframes.

Coco has been in use for several years now and has proved to be useful both in research projects (e.g. construction of a Modula-2 compiler, tools for static program analysis) and in student courses.

9.1 A basis for measurements

In the following sections, we will describe the results of memory and run-time measurements performed on Coco, and on three compilers generated by Coco.

First, we will measure the generation of a Modula-2 compiler. This compiler consists of 6 passes (lexical analysis, syntax analysis, name analysis, declaration analysis, semantic analysis, and code generation). Each of passes 2 through 6 reads the entire source program in an intermediate language generated by the previous pass. This intermediate program is
analyzed and forwarded to the next pass as a new, usually shorter, intermediate program (with the exception of pass 6, which generates the object code). Each pass is therefore a compiler in itself, described with an attributed grammar and translated by Coco into a syntax analyzer and a semantic evaluator.

For the measurements, we will not look at the entire Modula-2 compiler, but rather at two specific passes, since we are interested in the individual Coco runs. We select pass 2 (syntax analysis) and pass 4 (declaration analysis). These two passes have rather different characteristics, which make them well suited for a comparison. Pass 2 has a large and deeply nested recursive grammar with only a few semantic actions, while pass 4 has a simple grammar with a lot of semantic actions. In the following paragraphs, we will talk about each of the passes as if they were independent compilers.

Secondly, we will measure the generation of Coco by itself. Compared to the Modula-2 compiler Coco is much smaller and consists of a single pass. Thus, we have a comparison between two large applications and a small application. Table 9.1 shows the sizes of the compilers in terms of their attributed grammar.

Table 9.1 Size of the attributed grammars of the example compilers

                              Modula-2      Modula-2      Coco
                              (pass 2)      (pass 4)
    Number of lines            960           968           609
    Terminal symbols            77            77            43
    Pragmas                      4             4             1
    Nonterminal symbols         62            27            10
    Alternatives               224            94            54
    Symbols in productions     491           192           137
    Semantic actions            70           126            68
    G-code                    1726 bytes     733 bytes     447 bytes

The measurements shown in the following sections were taken from the Lilith, since the Modula-2 compiler was only available there. For the Macintosh the results would have been very similar. The Lilith is a 16-bit computer built on an Am2901 bit-slice processor with a cycle time of 150 nanoseconds. It has a very compact object code format (the so-called M-code) which has been especially tailored to Modula-2.
9.2 Measurements on Coco

First, we will look at Coco and measure the memory requirements and the run-time required by Coco to generate a compiler.

Memory requirements

Obviously the memory requirements for the code and the static data of Coco are the same in all three measurements (65 347 bytes). The size of the dynamic data depends on the input grammar but typically requires less than 1000 bytes (see Table 9.2).

Table 9.2 Memory requirements of Coco for the generation of various compilers

                    Modula-2        Modula-2        Coco
                    (pass 2)        (pass 4)
    Code            34 170          34 170          34 170
    Static data     31 177          31 177          31 177
    Dynamic data       190             872             564
    Totals          65 537 bytes    66 219 bytes    65 911 bytes

The memory requirement for the code is shared between ten Coco-specific modules and two standard modules. In addition, Coco uses one module that belongs to the resident part of the operating system, and thus does not increase Coco's memory requirements.

Run-time

The run-time of Coco depends on the size of the input grammar. Most of the time is used by the lexical analyzer that reads and lists the grammar. Writing out the syntax analyzer and the semantic evaluator of the target compiler also requires considerable time, while the rest of the work is done fairly rapidly. In large grammars with a deeply nested hierarchy of nonterminals (as in pass 2 of the Modula-2 compiler), the grammar tests also take a certain amount of time (see Table 9.3).
Table 9.3 Run-time of Coco for the generation of various compilers

                                              Modula-2    Modula-2    Coco
                                              (pass 2)    (pass 4)
    Lexical analysis                          14.9        19.7        12.0
    Syntax analysis, semantic processing       6.8         4.6         3.2
    Grammar tests                              5.3         1.7         0.9
    Output of the generated compiler          10.9        14.0        10.9
    Totals                                    37.9 s      40.0 s      27.0 s

9.3 Measurements on some generated compilers

We will now consider the memory requirements and the run-time of the compilers generated by Coco.

Memory requirements

Here, we are only interested in parts which are actually generated by Coco, namely the syntax analyzer, the semantic evaluator, and the parser tables. We are not going to consider the size of the semantic modules since they are independent of Coco.

Table 9.4 Memory requirements of some generated compilers

                          Modula-2      Modula-2      Coco
                          (pass 2)      (pass 4)
    Syntax analyzer       2836          2836          2836
    Semantic processor    2096          4084          2214
    Analysis tables       4600          1469          1294
    Totals                9532 bytes    8389 bytes    6344 bytes

All three compilers use the same syntax analyzer driven by different tables. Its size is constant. The size of the semantic evaluator depends on the number and the length of the semantic actions of the attributed grammar. As expected, its size is larger in pass 4 of the Modula-2 compiler than in pass 2 and in Coco. Note that the memory requirements of the generated compilers do not depend on the length of the input text, since no syntax tree of the input is built.
Run-time

The run-time of the generated compilers on input texts of various lengths is shown in Table 9.5.

Table 9.5 Run-time of some generated compilers

                          Modula-2    Modula-2    Coco
                          (pass 2)    (pass 4)
    100 input symbols     0.9 s       0.5 s        9.1 s
    1000 input symbols    1.9 s       1.2 s       14.5 s
    5000 input symbols    7.9 s       4.5 s       35.5 s

Even though Coco is the smallest of the three compilers, it runs much slower than the others since it does a lot of input and output (it writes long parts of source programs to disk), while pass 2 and pass 4 of the Modula-2 compiler work almost entirely in the main memory (with input and output used only for intermediate languages).

9.4 General experiences

The experiences with Coco are exceptionally good. Coco allows a tight and very readable specification of the translation processes. The attributed grammars become essential parts of each compiler's documentation. By automating syntax analysis, error handling, and semantic processing, attention can be focused on the actual translation process in the semantic procedures. More time is now available for the design. Working with attributed grammars almost automatically leads to a modular program structure with abstract data structures and access procedures, which are usually small and easy to understand.

In multi-pass compilers, like the Modula-2 compiler, the symbol any is especially useful since it lets one easily skip over portions of the input that are not of interest in this pass. The concept of pragmas has also proved useful since they make it easy to pass control information between successive passes (e.g. trace commands, options, etc.).

The limitations of LL(1) grammars are not a serious problem. Because of Wirth's EBNF notation, it is not necessary to perform complex grammar transformations in order to remove LL(1) conflicts, which is usually required
in the standard BNF notation. The only time when we failed to resolve LL(1) conflicts in the grammar itself was in the translation of the language PL/M-80. There, the conflicts were resolved by delegating some parts of the processing to the lexical analyzer.

Processing the input with L-attributed grammars and without building a syntax tree is not a serious restriction. If during processing some attributes are needed which only become available later, intermediate results are stored until the required attributes have been calculated and the final translation is possible. The omission of a syntax tree leads to efficient compilers with regard to speed and memory requirements. Most of the generated compilers run on microcomputers.

The negative experiences in the use of Coco are limited to the global nature of semantic objects in Cocol, which requires explicit stacking of variables, and to the fact that whenever an error has been detected in the attributed grammar the program development cycle is lengthened by an additional run of the compiler compiler.

However, the positive experiences outweigh the negative ones. Even though we have no hand-coded compiler that we can compare directly to a Coco-generated compiler, we are not afraid to claim that the efficiency of compilers generated by Coco is close to that of hand-coded compilers, and it is certainly easier to implement and to maintain a compiler with Coco than by hand.
A Definition of Adele

An algorithm description language, like a programming language, should offer all concepts for the description of algorithms, but should be free of syntactic peculiarities. In this way, the algorithms will stand out clearly and the reader will not be distracted by all sorts of baroque constructs. For the same reason, it should use only a few constructs and give the user freedom of expression. It should lean on popular programming languages so that it is easy to read, but should not be firmly bound to a particular programming language. Our algorithm description language Adele contains elements of PL/I, Modula-2, and Ada. We will describe its structure by a few examples.

Overall structure

Each algorithm has a name, parameters, and instructions:

    Search(↓list↓length↓x↑i):
    begin
      Instructions
    end Search

The parameter list of functions is followed by the type of the function:

    Search(↓list↓length↓x) integer:
    begin
      Instructions
      return i
    end Search

Input parameters are marked by ↓, output parameters by ↑, and transition parameters by ↕.

Statements

We distinguish between assignments, procedure calls, control statements, input-output statements, and text statements. To improve readability, instructions may optionally be separated by a semicolon.
Assignment. The assignment has the form

    variable := expression

Procedure call. The call of a procedure consists of the procedure name and the actual parameters in parentheses:

    ReadCard(↑card)

It is a useful convention to write procedure names partly in capital letters, and variable names entirely in lower-case letters.

Control statements. Here we use the modern forms of Modula-2 which are explicitly terminated by an end, with the exception of the repeat statement:

    if expression then statement sequence end

    if expression then statement sequence else statement sequence end

    case expression of                  or    case expression of
      label: statement sequence                 label: statement sequence
    | label: statement sequence               | label: statement sequence
    end                                       else statement sequence
                                              end

    while expression do statement sequence end

    repeat statement sequence until expression

    loop statement sequence with exit end

    for variable := expression to expression [by expression] do statement sequence end

The control variable will be undefined after completion of the for loop.

    exit                 exits from the immediately enclosing loop statement.
    return               exits from a procedure.
    return expression    exits from the function procedure with expression as the function value.
    halt                 stops the algorithm without return to a surrounding algorithm.

Input-output statements. Here we only use three statements:

    read(↑x↑eof)    read x or signal end of input file
    write(↓x)       write x to the output medium
    writeln         emit line feed

We do not concern ourselves with the format of the input and output text. The boolean parameter eof indicates the end of the input file. When x has been read, eof will be false
on return. If x could not be read due to end of file, eof will be true and x will be undefined on return.

Text statements. Text statements are free texts that describe actions. For example:

    calculate mean values and variances;

The only rule is that they be terminated or separated by a semicolon so that their end can be seen.

Expressions

For expressions we stipulate the common combinations of operators and operands without giving specific rules. We state only that boolean expressions can be viewed as conditional expressions with short-circuit evaluation:

    a & b    is equivalent to    if a then b else false end
    a | b    is equivalent to    if a then true else b end

This means that if the left operand alone is sufficient to determine the value of the expression, then the right operand is not evaluated.

Declarations

Usually declarations are not needed for the description of short and simple algorithms, especially if the variables used are obvious from the preceding explanations. However, in longer algorithms with local variables, global variables, parameters, and perhaps also named constants, it is advantageous if the algorithm description language also contains declarations. In Adele, the declaration of constants and variables can be written between the head of the algorithm and begin.

We partition the declared items into the following classes: parameters, global variables, constants, local dynamic variables, and local static variables. The classes are identified by the keywords param, global, const, local, static. After each keyword, one or more declarations of names of the corresponding type can be placed.

A constant declaration has the form

    name = value

a variable declaration has the form

    name: type

As types we use the elementary types of Pascal and Modula-2 with the following keywords or structures:

    integer
    real
    boolean
    char
    (red,green,blue)
    array (index:index) of type

Array types allow a certain amount of freedom.
If the range limits are not needed, we write

    array of type

If the type is not needed, we write

    array (index:index)

If both are not needed, we simply write

    array

As an example of the use of declarations, we describe a linear search algorithm with declarations of all names:

    Search(↓list↓length↓x↑i):
      param list: array of integer
            length, x, i: integer
      local j: integer
    begin
      j:=length
      while j>0 & list(j)<>x do j:=j-1 end
      i:=j
    end Search

For static variables, we allow optional initialization. This is done by adding the phrase init(value) after the type:

    static finished: boolean init(false)

Comments

Comments, like those in Ada, start with two minus signs and extend over the rest of the line.

    -- This is a comment
    -- which extends over two lines.

Undefined issues

Adele has no rules for the remaining items such as records, pointers, modules, etc. We write them, more or less, in the style of Modula-2.
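The linear search example depends on the short-circuit rule for '&': when j reaches 0, list(j) must not be evaluated. The Python sketch below transcribes the algorithm under the assumption that the loop body decrements j; the dictionary-based 1-based array and the helper name one_based are illustrative choices, made so that an access to index 0 would fail visibly.

```python
def one_based(values):
    """Model an Adele array with indices 1..n as a dictionary (an assumption)."""
    return {index: value for index, value in enumerate(values, start=1)}


def search(items, length, x):
    """Return the largest index j in 1..length with items[j] == x, or 0 if none."""
    j = length
    # Python's 'and' short-circuits like Adele's '&': items[j] is only
    # evaluated while j > 0, so the nonexistent items[0] is never touched.
    while j > 0 and items[j] != x:
        j -= 1
    return j
```

With an operator that evaluates both operands, the unsuccessful search would access index 0 and raise a KeyError in this model; with short-circuit evaluation it simply returns 0.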
B Modula-2 and Pascal

Since Modula-2 evolved from Pascal, its appearance is very similar to Pascal, and so Pascal programmers have no difficulty in reading Modula-2 programs. Here we will briefly present the most important differences for the reader of the Modula-2 programs in this book. The complete language definition and examples can be found in the books of Wirth [1982] and Pomberger [1986]. A didactically oriented introduction to Modula-2 is the book of Blaschek, Pomberger, and Ritzinger [1985].

General characteristics

Modula-2 is a system implementation language that enhances Pascal in the following key features:

1. Modular program structure. Modula-2 programs are composed of separately compiled modules. The compiler checks the consistency of the interface between modules. The language is therefore especially suited for the implementation of data capsules and abstract data types.
2. Coroutines and parallel processes. Modula-2 provides the coroutine facility as the basic element for the implementation of parallel processes.
3. Low-level features. Modula-2 provides facilities to bypass strong type checking so that memory words can be directly accessed and addresses can be handled. This makes it possible to produce machine-specific code.

We will not describe parallel processing or low-level features in this chapter since Coco does not use them.

Lexical elements

Modula-2 differs from most Pascal implementations by its sensitivity to the case of letters. The names TRUE, True, and true denote three different objects. Single character constants can be denoted by use of an octal number that is terminated with a 'C', e.g. CONST ff = 14C.
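To make the octal notation concrete, the following Python sketch decodes such a constant. The function name octal_char is hypothetical and the sketch handles only the octal-digits-plus-'C' form described above.

```python
def octal_char(literal):
    """Decode a Modula-2 octal character constant such as '14C'."""
    if len(literal) < 2 or not literal.endswith("C"):
        raise ValueError("expected octal digits followed by 'C': %r" % (literal,))
    # int(..., 8) rejects anything that is not a string of octal digits.
    return chr(int(literal[:-1], 8))
```

For the constant in the text, 14 octal is 12 decimal, so octal_char("14C") yields the ASCII form feed character, which matches the constant name ff.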
Declarations

In contrast to Pascal, constant, type, variable, and procedure declarations can appear in any order. There are no labels or label declarations.

Standard types. In addition to the standard types of Pascal, INTEGER, REAL, BOOLEAN, and CHAR, we have the standard type CARDINAL for unsigned natural numbers. For 16-bit implementations, the range of integer values is -32 768 to +32 767. The range of cardinal values is 0 to 65 535.

Enumeration, subrange, array, record, and pointer types are the same as in Pascal with the exception that arrays cannot be packed, and variant record types have an improved syntax.

If the word length of the computer is w bits, then the cardinality of set types is confined to w, or a 'small multiple thereof' (according to the language definition). There is a standard type BITSET that consists of the elements 0 through w-1:

    TYPE BITSET = SET OF [0..w-1]

Set constants are enclosed in '{' and '}'.

The machine-dependent type WORD denotes arbitrary data whose length is a machine word. It is compatible with all types whose length is a machine word.

Expressions

Expressions in Modula-2 are constructed in the same way as in Pascal. The operators have essentially the same meaning. One important difference in Modula-2 is that expressions that contain the operators 'AND' or 'OR' are interpreted as conditional expressions whose evaluation is terminated as soon as the result of the expression is known (short-circuit evaluation):

    a AND b    is equivalent to    if a then b else false
    a OR b     is equivalent to    if a then true else b

Statements

Assignment, procedure call, and repeat statement are taken from Pascal without change. If, case, while, and for statements have been syntactically improved and expanded. The if statement can have one or more elsif parts, the case statement can have an else part.
All of these constructs are explicitly terminated by END, which eliminates the need to distinguish between single and multiple statements in a block:

    ifstatement    = IF expr THEN statementsequence
                     {ELSIF expr THEN statementsequence}
                     [ELSE statementsequence] END.
    casestatement  = CASE expr OF case {"|" case}
                     [ELSE statementsequence] END.
    case           = caselabellist ":" statementsequence.
    whilestatement = WHILE expr DO statementsequence END.
    forstatement   = FOR ident ":=" expr TO expr [BY constexpr]
                     DO statementsequence END.
New features are the loop statement (infinite loop), the exit statement to leave the loop statement, and the return statement to leave a procedure or function (here with passing of the function value):

    loopstatement   = LOOP statementsequence END.
    exitstatement   = EXIT.
    returnstatement = RETURN [expr].

There is no goto statement and no input-output statement in Modula-2. Input and output is done by procedure calls.

Procedures

There are procedures and function procedures as in Pascal that permit value and VAR parameters. Procedures and functions both begin with the keyword PROCEDURE. Modula-2 permits procedure variables (not used by Coco), and arrays of unspecified length (so-called open arrays), e.g. in the form:

    PROCEDURE Sort(VAR list:ARRAY OF INTEGER);
      VAR n: INTEGER;
    BEGIN (* assume list: ARRAY[0..n] OF INTEGER *)
      n:=HIGH(list); (* standard proc. to find upper limit of index *)
    END Sort

Standard procedures. The standard procedures that differ from Pascal are:

    CAP(ch): CHAR        converts from lower to upper case
    HIGH(a): CARDINAL    returns the upper bound of array a
    DEC(x)               decrease: x:=x-1
    DEC(x,n)             decrease: x:=x-n
    EXCL(s,i)            exclude element i from set s: s:=s-{i}
    HALT                 terminate entire program
    INC(x)               increase: x:=x+1
    INC(x,n)             increase: x:=x+n
    INCL(s,i)            include element i in set s: s:=s+{i}

Type transfer functions. Modula-2 offers the possibility of explicit type conversions by so-called type transfer functions. Each type name can be used as a function with one argument. For example, the type transfer function CARDINAL(b) denotes the bit pattern of b (without any conversion) but with type CARDINAL. The context condition must hold that the type of b has the same number of bits as CARDINAL. Type transfer functions should be used with care since they make programs machine dependent.

Modules

An executable Modula-2 program consists of one or more separately compiled modules. A module is a collection of declarations and statements giving a higher-level unit.
Module boundaries are like a fence for names, which means that names declared inside a module are unknown outside, and names declared outside a module are unknown inside. The programmer can open the fence for selected names by an import list that contains all names that are
declared outside and are to be known inside the module, and an export list that contains names that are declared inside the module and are to be known outside. Thus the interface is explicitly specified by the programmer and visible in the program text.

There are four kinds of modules: main modules, definition modules, implementation modules, and inner modules.

Main modules are almost like Pascal programs. They consist of an import list, declarations (of constants, types, variables, procedures, and inner modules), and statements:

  programmodule = MODULE ident ";"
                  {import}
                  {declaration}
                  BEGIN statementsequence END ident ".".

Only the line {import} is different from Pascal. It references other separately compiled modules, and causes these modules to be loaded. In the most common form

  import = FROM ident IMPORT identlist ";"

ident is the name of the module to be loaded and identlist contains the names of the objects exported by the loaded module for use in the declarations and statements of the importing module. In the less common form

  import = IMPORT identlist ";"

the identlist contains only the names of the modules that are to be loaded together with the importing module.

Separately compilable modules that are not main programs consist of two separately compiled parts, the definition module and the implementation module. The definition module describes the interface of the module to its clients. All declared names are automatically exported.

  definitionmodule = DEFINITION MODULE ident ";"
                     {import}
                     {definition}
                     END ident ".".

definition contains the declarations of the exported objects. Procedures are only specified by their procedure heading (procedure name and parameters):

  definition = CONST ... | TYPE ... | VAR ... |
               PROCEDURE ident [formalparameters] ";".
The implementation module contains the declaration of the non-exported objects, the code for all procedures, and the statements of the module:

  implementationmodule = IMPLEMENTATION MODULE ident ";"
                         {import}
                         {declaration}
                         BEGIN statementsequence END ident ".".
Definition and implementation modules exist in pairs and have the same name. The definition module must be compiled before the implementation module. A module can be compiled if the definition modules of all of the imported modules have been compiled before.

Storage for local objects of separately compiled modules is allocated when the object program is loaded, and remains allocated until the program terminates (static memory allocation). The statement sequence of the implementation module is executed immediately after loading the module, and therefore can be used for the initialization of data.

Inner modules are modules that are not separately compiled. They are like procedures nested inside other modules or procedures. They can import and export names:

  moduledeclaration = MODULE ident ";"
                      {import}
                      [EXPORT [QUALIFIED] identlist ";"]
                      {declaration}
                      BEGIN statementsequence END ident.

Storage for local objects of inner modules is allocated when the procedure that contains the inner module is activated, and released when the procedure returns to its caller. By calling the surrounding procedure, the statements of the inner module are also executed.

There is a (fictitious) separately compiled module SYSTEM, provided by the compiler, that gives access to low-level features. It exports types and related procedures (including the type WORD). Each module that imports SYSTEM is therefore machine dependent.
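The difference between a type transfer function and an ordinary value conversion, as described earlier in this appendix, can be illustrated outside Modula-2. The following Python sketch is only an analogy: it assumes a 16-bit word for both INTEGER and CARDINAL (an assumption, since the word size is machine dependent), and the function names are hypothetical.

```python
import struct

def type_transfer_to_cardinal(b: int) -> int:
    """Reinterpret the 16-bit pattern of a signed value as unsigned.

    Mirrors the idea of CARDINAL(b): the bits are unchanged, only the
    type under which they are read differs. 16 bits is an assumption.
    """
    raw = struct.pack("<h", b)           # two bytes, signed 16-bit
    return struct.unpack("<H", raw)[0]   # same two bytes, unsigned 16-bit

def value_convert_to_cardinal(b: int) -> int:
    """A value conversion, by contrast, must reject values out of range."""
    if b < 0:
        raise ValueError("negative value has no CARDINAL equivalent")
    return b

if __name__ == "__main__":
    print(type_transfer_to_cardinal(-1))
    print(type_transfer_to_cardinal(5))
```

The same bit pattern yields a different number under the other type, which is why such code depends on the word size of the machine.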
C Syntax of Cocol

  Keywords:                 Upper-case letters
  Other terminal symbols:   Literals or lower-case letters
  Nonterminal symbols:      Upper and lower-case letters

  Cocol         = GRAMMAR identifier
                  [SEMANTIC DECLARATIONS {any}]
                  [MACROS {SemMacroDef}]
                  TERMINALS {Symbol [Attributes] [AliasName]}
                  [PRAGMAS {Symbol [Attributes] [SemAction]}]
                  NONTERMINALS {identifier [Attributes] [AliasName]}
                  RULES {identifier [Attributes] "=" Expression "."}
                  ENDGRAM.
  Expression    = Term {"|" Term}.
  Term          = Factor {Factor}.
  Factor        = Symbol [Attributes] | EPS | ANY | SemAction
                  | "(" Expression ")"
                  | "[" Expression "]"
                  | "{" Expression "}".
  Attributes    = "<" ( OutAttributes | InAttributes [";" OutAttributes]) ">".
  InAttributes  = IN ":" (identifier | number) {"," (identifier | number)}.
  OutAttributes = OUT ":" identifier {"," identifier}.
  SemAction     = SEM ( "(" identifier ")" | {any} ) ENDSEM.
  SemMacroDef   = SEM ":" identifier ":" {any} ENDSEM.
  Symbol        = identifier | string.
  AliasName     = ALIAS Symbol.
D G-code

   0  T sy            terminal
        If the next input symbol is sy, then recognize it, else report an error.
   1  TA sy adr       terminal with alternative
        If the next input symbol is sy, then recognize it, else go to adr.
   2  NT sy           nonterminal
        If the next input symbol is a valid start of the nonterminal sy, then enter the production of sy, else report an error.
   3  NTA sy adr      nonterminal with alternative
        If the next input symbol is a valid start of the nonterminal sy, then enter the production of sy, else go to adr.
   4  NTS sy sem      nonterminal with input attribute semantics
        If the next input symbol is a valid start of the nonterminal sy, then execute the semantic action sem (for input attribute assignment) and enter the production of sy, else report an error.
   5  NTAS sy adr sem nonterminal with alternative and input attribute semantics
        If the next input symbol is a valid start of the nonterminal sy, then execute the semantic action sem (for input attribute assignment) and enter the production of sy, else go to adr.
   6  ANY             any
        Recognize the next input symbol.
   7  ANYA nr adr     any with alternative
        If the next input symbol is in the symbol set (any-set) denoted by nr, then recognize it, else go to adr.
   8  EPS nr          epsilon (empty string)
        If the next input symbol is in the successor set (eps-set) denoted by nr, then recognize the empty string, else report an error.
   9  EPSA nr adr     epsilon with alternative
        If the next input symbol is in the successor set (eps-set) denoted by nr, then recognize the empty string, else go to adr.
  10  JMP adr         jump
        Go to adr.
  11  RET             return
        Return from the production of a nonterminal.
  12... SEM           semantic action
        Execute the semantic action with the number of the G-code instruction.
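To make the control flow of these instructions concrete, the following Python sketch interprets a small subset of G-code (T, TA, NT, NTA, JMP, RET). It is a minimal illustration under stated assumptions, not Coco's parser: the tuple encoding of instructions, the tables `ENTRY` and `FIRST`, the end marker, and the example grammar S = "a" S "b" | "c" are all hypothetical, and error recovery and semantic actions are omitted.

```python
class ParseError(Exception):
    """Raised where the G-code description says 'report an error'."""

# Hypothetical G-code for the grammar  S = "a" S "b" | "c".
CODE = [
    ("TA", "a", 4),   # 0: if next is "a" recognize it, else go to 4
    ("NT", "S"),      # 1: enter the production of S
    ("T", "b"),       # 2: expect "b"
    ("RET",),         # 3: return from S
    ("T", "c"),       # 4: expect "c"
    ("RET",),         # 5: return from S
]
ENTRY = {"S": 0}            # assumed table: start address of each production
FIRST = {"S": {"a", "c"}}   # assumed table: valid start symbols

def run_gcode(code, entry, first, start, tokens):
    """Return True if tokens form a sentence of the start symbol."""
    tokens = list(tokens) + ["<eof>"]   # assumed end marker
    pos, pc, stack = 0, entry[start], []
    while True:
        op, *args = code[pc]
        look = tokens[pos]
        if op == "T":
            if look != args[0]:
                raise ParseError(f"expected {args[0]!r}, found {look!r}")
            pos, pc = pos + 1, pc + 1
        elif op == "TA":
            if look == args[0]:
                pos, pc = pos + 1, pc + 1
            else:
                pc = args[1]
        elif op in ("NT", "NTA"):
            if look in first[args[0]]:
                stack.append(pc + 1)    # return address
                pc = entry[args[0]]
            elif op == "NTA":
                pc = args[1]
            else:
                raise ParseError(f"{look!r} cannot start {args[0]}")
        elif op == "JMP":
            pc = args[0]
        elif op == "RET":
            if not stack:               # returned from the start symbol
                return pos == len(tokens) - 1
            pc = stack.pop()
        else:
            raise ValueError(f"unsupported instruction {op}")

if __name__ == "__main__":
    print(run_gcode(CODE, ENTRY, FIRST, "S", "aacbb"))
```

The return-address stack is what allows NT and RET to handle nested productions; the instructions with an alternative address differ from their plain counterparts only in jumping instead of reporting an error.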
E Intermodular cross-reference list

The following list contains all names that are exported or imported by a module of the Coco system as well as their data types. For every name, the first reference denotes the exporting module and the other references the importing modules.

Allocate  PROC (VAR ptr:ADDRESS; size:LONGINT)
    System, cocogen, cocogen2, cocosym, cocosyn, Errors
alts  CARDINAL
    cocogra, cocogen2, cocosem
at  ARRAY[1..10] OF CARDINAL
    cocolex, cocogen, cocosem, cocosyn
Attrtype  (term,nonterm,const)
    cocogen, cocosem
ClearMarkList  PROC (VAR m:Marklist)
    cocogra, cocosym, cocotst
ClearSet  PROC (VAR s:Symbolset; n:CARDINAL)
    cocosym, cocotst
Close  PROC (f:File)
    FileIO, coco, cocogen, cocogen2, cocolst
CloseFile  PROC
    cocogen, coco, cocosem
col  CARDINAL
    cocolex, cocogen, cocogen2, cocosem, cocosym, cocosyn
CompErr  PROC (nr:CARDINAL)
    Errors, cocogen, cocogen2, cocosem, cocosym
CompleteAt  PROC (sy,nr:CARDINAL): BOOLEAN
    cocosym, cocosem
con  File
    FileIO, coco, cocogen, cocogen2, cocogra, cocolex, cocosem, cocosym, cocosyn, cocotst, Errors
ConcatLeft  PROC (VAR gp,gl,gp1,gl1:CARDINAL)
    cocogra, cocosem
ConcatRight  PROC (VAR gp,gl,gp1,gl1:CARDINAL)
    cocogra, cocosem
Copy  PROC (typ,col:CARDINAL)
    cocogen, cocosem
CopyFramePart  PROC (VAR f1,f2:File; s:ARRAY OF CHAR)
    cocogen, cocogen2
ddt  ARRAY["A".."Z"] OF BOOLEAN
    cocolex, coco, cocogra, cocosem, cocosym, cocotst
Deallocate  PROC (VAR ptr:ADDRESS)
    System, cocogen, cocogen2, Errors
Deletable  PROC (loc:CARDINAL): BOOLEAN
    cocogra, cocosym, cocotst
DeleteRedundantEps  PROC
    cocogra, coco
DelNode  PROC (gn:Graphnode): BOOLEAN
    cocogra, cocosym, cocotst
Direction  (up,down)
    cocosym, cocosem
Done  BOOLEAN
    FileIO, coco, cocogen, cocogen2
EF  CONST CHAR
    FileIO, cocolex, cocolst
EmitAction  PROC (line:CARDINAL; VAR sem:CARDINAL)
    cocogen, cocosem
EOL  CONST CHAR
    FileIO, cocolex, cocolst
Errornode  RECORD
    Errors, cocosyn
Errorptr  POINTER TO Errornode
    Errors, cocolst, cocosyn
File  RECORD
    FileIO, coco, cocogen, cocogen2, cocolex, cocolst, Errors
filesopen  BOOLEAN
    cocogen, coco
FindCircularRules  PROC (VAR ok:BOOLEAN)
    cocotst, coco
FindDelSymbols  PROC
    cocosym, coco
GenAssign  PROC (typ:Attrtype; left,right:CARDINAL)
    cocogen, cocosem
GenSynFiles  PROC
    cocogen2, coco
GetA  PROC (n:CARDINAL; VAR set:Symbolset)
    cocosym, cocogen2
GetAt  PROC (sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction)
    cocosym, cocosem
GetE  PROC (n:CARDINAL; VAR set:Symbolset)
    cocosym, cocogen2
GetF  PROC (sy:CARDINAL; VAR first:Symbolset)
    cocosym, cocogen2, cocotst
GetFirstSet  PROC (loc:CARDINAL; VAR set:Symbolset)
    cocosym, cocotst
GetFo  PROC (sy:CARDINAL; VAR set:Symbolset)
    cocosym, cocotst
GetMacroNr  PROC (spix:CARDINAL; VAR sem:CARDINAL)
    cocosym, cocosem
GetName  PROC (spix:CARDINAL; VAR name:ARRAY OF CHAR; VAR len:CARDINAL)
    cocolex, cocogen, cocogen2, cocogra, cocosym, cocotst
GetNextSemErr  PROC (VAR nr,line,col:CARDINAL)
    Errors, cocolst
GetNextSynErr  PROC (VAR symbols:Errorptr; VAR line,col:CARDINAL)
    Errors, cocolst
GetNode  PROC (p:CARDINAL; VAR gn:Graphnode)
    cocogra, cocogen2, cocosem, cocosym, cocotst
GetNumberOfErrors  PROC (VAR synerrors,semerrors:CARDINAL)
    Errors, coco
GetSy  PROC
    cocolex, cocosyn
GetSy  PROC (sy:CARDINAL; VAR sn:Symbolnode)
    cocosym, cocogen2, cocogra, cocosem, cocotst
GetSymbolSets  PROC
    cocosym, coco
gramspix  CARDINAL
    cocosym, cocogen2, cocosem
GraphList  PROC
    cocogra, cocosem
Graphnode  RECORD
    cocogra, cocogen2, cocosem, cocosym, cocotst
InsertFramePart  PROC
    cocogen, cocosem
IsInSet  PROC (n:CARDINAL; VAR s:Symbolset): BOOLEAN
    cocosym, cocotst
line  CARDINAL
    cocolex, cocogen, cocogen2, cocosem, cocosym, cocosyn
LL1Test  PROC (VAR ll1:BOOLEAN)
    cocotst, coco
lst  File
    cocolst, coco, cocogen2, cocosym, cocotst
Mark  PROC (loc:CARDINAL; VAR m:Marklist)
    cocogra, cocosym, cocotst
Marked  PROC (loc:CARDINAL; VAR m:Marklist): BOOLEAN
    cocogra, cocosym, cocotst
Marklist  ARRAY[0..maxnodes DIV 16] OF BITSET
    cocogra, cocosym, cocotst
maxany  CARDINAL
    cocosym, cocogen2
maxeps  CARDINAL
    cocosym, cocogen2
maxn  CARDINAL
    cocogra, cocogen2, cocosym
maxp  CARDINAL
    cocosym, cocogen2, cocogra, cocotst
maxs  CARDINAL
    cocosym, cocogen2, cocogra, cocotst
maxsem  CARDINAL
    cocogen, cocogen2
maxt  CARDINAL
    cocosym, cocogen2, cocotst
NewAt  PROC (sy,spix:CARDINAL; dir:Direction)
    cocosym, cocosem
NewEpsBeforeDelNts  PROC
    cocogra, coco
NewMacro  PROC (spix,sem:CARDINAL; VAR ok:BOOLEAN)
    cocosym, cocosem
NewNode  PROC (typ:Symboltype; sp,line:CARDINAL): CARDINAL
    cocogra, cocosem
NewSy  PROC (spix:CARDINAL; typ:Symboltype): CARDINAL
    cocosym, cocosem
normal  enumeration constant
    System, coco, Errors
Open  PROC (VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR; output:BOOLEAN)
    FileIO, coco, cocogen, cocogen2, cocolst
OpenFile  PROC (spix:CARDINAL)
    cocogen, cocosem
OpenSem  PROC (line:CARDINAL; VAR sem:CARDINAL)
    cocogen, cocosem
Parse  PROC (VAR correct:BOOLEAN)
    cocosyn, coco
printinput  BOOLEAN
    cocosyn, coco, cocolex
PrintListing  PROC
    cocolst, coco
printnodes  BOOLEAN
    cocosyn, coco, cocolex
PrintSynError  PROC (VAR f:File; VAR synerrors:CARDINAL)
    Errors, cocolst
PutStatistics  PROC
    cocogen2, coco
Read  PROC (VAR f:File; VAR ch:CHAR)
    FileIO, coco, cocogen, cocolex, cocolst, Errors
RepNode  PROC (p:CARDINAL; gn:Graphnode)
    cocogra, cocosem, cocosym
RepSy  PROC (sy:CARDINAL; sn:Symbolnode)
    cocosym, cocogen2, cocogra, cocosem, cocotst
RestartHash  PROC
    cocolex, cocosem
Restriction  PROC (nr:CARDINAL)
    Errors, cocogra, cocolex, cocosem, cocosym
rootloc  CARDINAL
    cocogra, cocogen2, cocosem, cocosym, cocotst
rules  CARDINAL
    cocogra, cocogen2, cocosem
Semant  PROC (sem:CARDINAL)
    cocosem, cocosyn
SemErr  PROC (nr,line,col:CARDINAL)
    Errors, cocogen, cocogen2, cocolex, cocosem, cocosym
SetBit  PROC (VAR s:Symbolset)
    cocosym, cocotst
src  File
    cocolex, coco, cocogen, cocolst
StartCopy  PROC (col:CARDINAL)
    cocogen, cocosem
StopHash  PROC
    cocolex, cocosem
Symbolnode  RECORD
    cocosym, cocogen2, cocogra, cocosem, cocotst
Symbolset  ARRAY[0..maxterminals DIV 16] OF BITSET
    cocosym, cocogen2, cocotst
Symboltype  (eps,t,pr,nt,any,err)
    cocosym, cocogen2, cocogra, cocosem, cocotst
SyNr  PROC (spix:CARDINAL): CARDINAL
    cocosym, cocosem
SyntaxError  PROC (symbols:Errorptr; line,col:CARDINAL)
    Errors, cocosyn
Terminate  PROC (st:Status)
    System, coco, Errors
TestCompleteness  PROC (VAR ok:BOOLEAN)
    cocotst, coco
TestIfAllNtReached  PROC (VAR ok:BOOLEAN)
    cocotst, coco
TestIfNtToTerm  PROC (VAR ok:BOOLEAN)
    cocotst, coco
typ  CARDINAL
    cocolex, cocosem, cocosyn
Unit  PROC (VAR s1,s2:Symbolset; n:CARDINAL)
    cocosym, cocotst
Write  PROC (VAR f:File; ch:CHAR)
    FileIO, cocogen, cocogen2, cocolex, cocolst, cocosym, Errors
WriteCard  PROC (VAR f:File; nr:CARDINAL; w:INTEGER)
    FileIO, cocogen, cocogen2, cocogra, cocolex, cocolst, cocosem, cocosym, cocosyn, cocotst, Errors
WriteInt  PROC (VAR f:File; nr:INTEGER; w:INTEGER)
    FileIO, coco
WriteLn  PROC (VAR f:File)
    FileIO, coco, cocogen, cocogen2, cocogra, cocolst, cocosym, cocosyn, cocotst, Errors
WriteString  PROC (VAR f:File; s:ARRAY OF CHAR)
    FileIO, coco, cocogen, cocogen2, cocogra, cocolex, cocolst, cocosem, cocosym, cocosyn, cocotst, Errors
WriteText  PROC (VAR f:File; t:ARRAY OF CHAR; l:INTEGER)
    FileIO, cocogen, cocogen2, cocogra, cocolex, cocosym, cocotst, Errors
F Program listings

This appendix contains the program listings of Coco, more than 3500 lines of Modula-2 source code. It is not our intention to describe the program step by step. Instead, we want to give the reader an overview of the function of the individual modules, and to indicate where to start reading and which procedures to review further in order to understand the program. Modula-2 has a high degree of self-documentation, which makes it possible to partition a large program into small modules that are easy to understand, and furthermore to separate these modules into even smaller procedures that are once more easy to understand. After reviewing the algorithms in Chapters 2, 3 and 7, the reader should not find it difficult to understand all the details of Coco.

F.1 Overview

Figure F.1 shows the phases of Coco with their modules and the data flow between them. The lexical analyzer (cocolex) reads the compiler description and separates it into tokens. The syntax analyzer (cocosyn) checks the syntax of the input stream and drives the semantic processing program (cocosem) by activating semantic actions via action numbers. In this phase, the symbol list (in cocosym) and the top-down graph (in cocogra) are generated. The module cocogen generates the new semantic evaluator from the semantic actions of the compiler description. Finally, the symbol list and the top-down graph are analyzed in the grammar tests (cocotst), and if these tests have been successfully completed, the new syntax analyzer with its parser tables is generated.

Since Coco was constructed by itself, the syntax analyzer (cocosyn) and its semantic evaluator (cocosem) are examples of compiler parts produced by Coco.
Fig. F.1 Phases and modules of Coco (phases: lexical analysis with cocolex; syntax analysis with cocosyn; semantic analysis with cocosem, cocosym, cocogra and cocogen; grammar tests with cocotst; compiler generation with cocogen2. Data flow: compiler description; symbols and attributes; symbol list and top-down graph; semantic evaluator; syntax analyzer)

F.2 Module hierarchy

Coco consists of

1. 10 Coco-related modules
     coco       main module
     cocolex    lexical analyzer
     cocosyn    syntax analyzer
     cocosem    semantic evaluator
     cocogra    top-down graph handler
     cocosym    symbol list handler
     cocotst    grammar tests
     cocogen    generator of the new semantic evaluator
     cocogen2   generator of the new syntax analyzer and the parser tables
     cocolst    source list generator
2. 2 general purpose standard modules
     Errors     general error module for compilers generated by Coco
     FileIO     input/output procedures
3. 1 operating system module (not part of Coco)
     System     dynamic memory management (heap)
Figure F.2 shows the module hierarchy. An arrow from module A to module B means that A calls B. Arrows leading to the operating system module and the standard modules are not shown for simplicity. Those modules are used by almost all of the other modules, and are not a direct part of Coco.

Fig. F.2 Module hierarchy with relation 'uses procedures from' (modules shown: cocogen, cocosyn, cocogra, cocosem, cocogen2, cocolex, cocolst, cocotst, cocosym; System, FileIO, Errors)

F.3 Module descriptions

We will now give a short description of all modules of the Coco system. A diagram for each module will show which procedures are called from other modules.

coco

coco is the main module. It opens the source file and the list file and calls the syntax analyzer (Parse). When the syntax analysis is completed, the source file has been read, and the symbol list and a top-down graph have been stored. The top-down graph is further processed by inserting and deleting eps-nodes at certain positions (NewEpsBeforeDelNts, DelRedundantEps) and the terminal start symbols are collected (FindDelSymbols, GetSymbolSets). After that, coco calls the grammar tests (FindCircularRules, TestIfNtToTerm, TestCompleteness, TestIfAllNtReached, LL1Test) and generates the target compiler (GenSynFiles) if no errors are found. At the end, statistics about the compilation are written to the list file (PutStatistics), and all files are closed.
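The order of calls in the main module can be summarized in a short sketch. The Python below is only an outline of the sequence described above, not a transcription of coco.MOD: the procedures are replaced by stubs, and two points are assumptions that the text does not state, namely that graph processing is skipped when parsing fails and that all grammar tests run even after one of them has failed.

```python
GRAPH_STEPS = ["NewEpsBeforeDelNts", "DelRedundantEps",
               "FindDelSymbols", "GetSymbolSets"]
GRAMMAR_TESTS = ["FindCircularRules", "TestIfNtToTerm", "TestCompleteness",
                 "TestIfAllNtReached", "LL1Test"]

def drive(procs):
    """Run the phases in order and return the names of the calls made.

    procs maps a procedure name to a stub returning True on success;
    missing names default to a stub that succeeds.
    """
    trace = []

    def call(name):
        trace.append(name)
        return procs.get(name, lambda: True)()

    if call("Parse"):
        for name in GRAPH_STEPS:
            call(name)
        ok = True
        for name in GRAMMAR_TESTS:
            ok = call(name) and ok      # assumption: run every test
        if ok:
            call("GenSynFiles")         # only if no errors were found
    else:
        call("PrintListing")            # assumption: listing on parse errors
    call("PutStatistics")
    return trace

if __name__ == "__main__":
    print(drive({"LL1Test": lambda: False}))
```

With a failing LL1Test stub, the trace contains all five tests but no GenSynFiles, which matches the rule that the target compiler is generated only if no errors are found.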
  coco imports
    cocosyn:   Parse
    cocosym:   FindDelSymbols, GetSymbolSets
    cocogen2:  GenSynFiles, PutStatistics
    cocogen:   CloseFile
    cocolst:   PrintListing
    cocotst:   FindCircularRules, TestIfNtToTerm, TestCompleteness, TestIfAllNtReached, LL1Test

Fig. F.3 coco and the modules imported by it

cocolex

cocolex is the lexical analyzer of Coco. It reads the Cocol input, separates it into tokens, and passes them together with their attributes to the syntax analyzer. Names and strings are stored in a name list. Numbers are translated into their numeric value. The main procedure of cocolex is GetSy.

cocosyn

cocosyn is the syntax analyzer of Coco and has been generated by Coco itself. It operates according to the table-driven LL(1) parsing algorithm described in Section 2.5, and uses the error-handling mechanism described in Section 2.6. cocosyn gets the source tokens from the lexical analyzer (GetSy), analyzes them, and calls the procedure Semant to execute the semantic actions.

  cocosyn imports
    cocolex:   GetSy
    cocosem:   Semant

Fig. F.4 cocosyn and the modules imported by it

cocosem

cocosem is the semantic evaluator of Coco. It has been generated by Coco itself and contains the semantic actions of the attributed grammar of Coco. cocosem calls the procedures for the generation and management of the symbol list and the top-down graph:

1. symbol handling: NewSy, GetSy, RepSy, SyNr;
2. attribute handling: NewAt, GetAt, CompleteAt;
3. top-down graph handling: NewNode, GetNode, RepNode, ConcatLeft, ConcatRight, GraphList;
4. generation of the semantic evaluator: OpenFile, CloseFile, OpenSem, StartCopy, Copy, InsertFramePart, GenAssign, EmitAssign, EmitAction;
5. handling of the semantic macros: NewMacro, GetMacroNr;
6. control over the entries into the name list: StopHash, RestartHash.
The listing of cocosem is an example of a large semantic evaluator generated by Coco. However, studying cocosem itself is of little use; one should study the attributed grammar instead.

  cocosem imports
    cocolex:   StopHash, RestartHash
    cocosym:   NewSy, GetSy, RepSy, SyNr, NewAt, GetAt, CompleteAt, NewMacro, GetMacroNr
    cocogra:   NewNode, GetNode, RepNode, ConcatLeft, ConcatRight, GraphList
    cocogen:   OpenFile, CloseFile, Copy, InsertFramePart, StartCopy, OpenSem, GenAssign, EmitAction

Fig. F.5 cocosem and the modules imported by it

cocosym

The module cocosym handles the symbol list of Coco. It contains procedures to generate, read, and modify symbol nodes, to search names in the symbol list, to enter, read, and check attributes, and to generate and retrieve information about semantic macros. It also contains procedures to determine the deletability of nonterminals, and to collect their terminal start symbols. cocosym uses a few procedures from cocolex and cocogra.

  cocosym imports
    cocolex:   GetName
    cocogra:   GetNode, RepNode, Deletable, DelNode, ClearMarkList, Mark, Marked

Fig. F.6 cocosym and the modules imported by it

cocogra

The module cocogra handles the top-down graph. It contains procedures to generate, read, and modify graph nodes, to link subgraphs, and to print the entire top-down graph for tracing. cocogra also contains procedures to insert eps-nodes in front of deletable nonterminals, and to remove redundant eps-nodes. To output the top-down graph, cocogra needs the syntax symbols and their names, which it gets from the modules cocosym and cocolex.

  cocogra imports
    cocolex:   GetName
    cocosym:   GetSy, RepSy

Fig. F.7 cocogra and the modules imported by it

cocogen

The module cocogen generates the semantic evaluator of the target compiler from the semantic declarations and semantic actions of the input grammar. It contains procedures to read the frame module, to copy the semantic parts from the attributed grammar, and to translate attributes into semantic actions. cocogen uses no other modules of Coco except for the lexical analyzer, from which it gets the symbol names.

  cocogen imports
    cocolex:   GetName

Fig. F.8 cocogen and the modules imported by it

cocotst

The module cocotst is a collection of procedures for the execution of the grammar tests as described in Section 7.5. It uses the symbol list (from cocosym) and the top-down graph (from cocogra). For the output of error messages, cocotst needs the symbol names, which are obtained with the procedure GetName. To recognize the deletability of graph nodes and subgraphs, it uses the procedures Deletable and DelNode from cocogra.

  cocotst imports
    cocosym:   GetSy, RepSy, GetFirstSet, GetF, GetFo, SetBit, IsInSet, Unit, ClearSet
    cocogra:   GetNode, Deletable, DelNode, ClearMarkList, Mark, Marked
    cocolex:   GetName

Fig. F.9 cocotst and the modules imported by it

cocogen2

The module cocogen2 generates the syntax analyzer and the parser tables of the target compiler. The table values are obtained from the symbol list (with GetSy, RepSy, GetF, GetE, and GetA) and from the top-down graph (GetNode). Before the tables can be inserted into the syntax analyzer, cocogen2 transforms the top-down graph into G-code instructions. The syntax analyzer of the target compiler is assembled mainly from the frame parts (on the file cocosynframe), in which cocogen2 inserts the parser tables, some
declarations, and grammar-specific names. For the output of statistics, cocogen2 uses the procedure GetName from the lexical analyzer.

  cocogen2 imports
    cocogen:   CopyFramePart
    cocogra:   GetNode
    cocosym:   GetSy, RepSy, GetF, GetE, GetA
    cocolex:   GetName

Fig. F.10 cocogen2 and the modules imported by it

cocolst

cocolst is called by the main program if errors have been detected during parsing. It reads the input again and prints a source list with error messages.

Errors

Errors is a general-purpose error message module that can be used by all compilers generated by Coco. It contains procedures for storing semantic and syntax errors, for retrieving stored error messages, and for printing all of the stored error messages at the end of the program. In addition, it contains procedures for handling implementation restrictions and compiler errors.

FileIO

FileIO is a general-purpose module that contains screen and disk I/O procedures for characters, strings, and numbers. It is based on five system modules which are not described in this book. These are Terminal, MemTypes, OS, Toolbox and QuickDraw (see Inside Macintosh [1985] and Wirth et al. [1986]).

System

System is an operating system module that among other things manages the heap.

F.4 Instructions on how to study the source code

The listings consist of the attributed grammar of Coco and all other modules in alphabetical order. The reader should first study the source code of the main module coco to see how the program is started and initialized. The lexical analyzer and the syntax analyzer are not essential for an understanding of the other modules, so they may be skipped in the beginning. The central document that describes the actual translation is the attributed grammar. The reader should study the attributed grammar and the procedures that are called from the semantic actions in detail. It is recommended that the procedures belonging to a particular task are studied together. These tasks are:
1. handling the symbol list: NewSy, GetSy, RepSy, IsSy
2. handling the attributes: NewAt, GetAt, CompleteAt
3. handling the top-down graph: NewNode, GetNode, RepNode, ConcatLeft, ConcatRight, GraphList
4. generating the semantic evaluator: CloseFile, CopyFramePart, InsertFramePart
5. copying semantic parts: OpenSem, StartCopy, Copy
6. generating attribute assignments: GenAssign, EmitAction
7. handling semantic macros: NewMacro, GetMacroNr
8. controlling the name list entries: StopHash, RestartHash

The procedures for the collection of the symbol sets and the execution of the grammar tests may be studied in any order. The only procedures used almost everywhere are the procedures for marking paths that have been previously visited in traversing the top-down graph (ClearMarkList, Mark, and Marked in cocogra) and the procedures which check the deletability of graphs and graph nodes (Deletable and DelNode in cocogra). These procedures should be read first.

As the last module, the reader should study cocogen2. It generates the parser tables and the syntax analyzer, and uses the data structures generated by the other modules. The reader should study these modules first to understand how the data structures are filled.

Before an implementation module is studied, the corresponding definition module should be inspected. It describes the interface of the module, and contains the declarations and descriptions of all exported objects. The procedures of an implementation module appear in alphabetical order. Most of them are at the outermost level of the module. Only auxiliary procedures that are clearly part of another procedure are nested within this procedure. Each implementation module is followed by a cross-reference list.

As an additional aid, Appendix E contains an intermodular cross-reference list with the names and types of all objects transferred between modules.
This list also shows which modules export an object and which import it.

Program listings in alphabetical order

  coco.ATG                       attributed grammar
  coco.MOD                       main program
  cocogen.DEF, cocogen.MOD       generator of semantics processor
  cocogen2.DEF, cocogen2.MOD     generator of the syntax analyzer
  cocogra.DEF, cocogra.MOD       top-down graph manager
  cocolex.DEF, cocolex.MOD       lexical analyzer
  cocolst.DEF, cocolst.MOD       source list generator
  cocosem.DEF, cocosem.MOD       semantic evaluator of Coco
  cocosemframe                   semantics evaluator frame
  cocosym.DEF, cocosym.MOD       symbol list manager
  cocosyn.DEF, cocosyn.MOD       syntax analyzer
  cocosynframe                   syntax analyzer frame
  cocotst.DEF, cocotst.MOD       grammar tests
  Errors.DEF, Errors.MOD         standard error module
  FileIO.DEF, FileIO.MOD         input/output module
  System.DEF                     dynamic memory management
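The marking procedures recommended above as a first read (ClearMarkList, Mark, Marked) operate on a mark list that Appendix E declares as ARRAY[0..maxnodes DIV 16] OF BITSET, that is, one bit per graph node packed into 16-bit sets. The Python sketch below illustrates this packing and its use in a traversal; the function bodies, the `reachable` helper and the example graph are assumptions made for illustration and are not taken from cocogra.

```python
WORD_BITS = 16  # one BITSET holds 16 marks, as in "maxnodes DIV 16"

def clear_mark_list(max_nodes):
    """Return a mark list with every node unmarked."""
    return [0] * (max_nodes // WORD_BITS + 1)

def mark(loc, m):
    """Set the bit that belongs to graph node loc."""
    m[loc // WORD_BITS] |= 1 << (loc % WORD_BITS)

def marked(loc, m):
    """Tell whether graph node loc has been visited."""
    return bool(m[loc // WORD_BITS] & (1 << (loc % WORD_BITS)))

def reachable(graph, start, max_nodes):
    """Visit each node once, even if the graph contains cycles."""
    m = clear_mark_list(max_nodes)
    order, todo = [], [start]
    while todo:
        loc = todo.pop()
        if marked(loc, m):
            continue                    # already visited on another path
        mark(loc, m)
        order.append(loc)
        todo.extend(reversed(graph[loc]))
    return order

if __name__ == "__main__":
    # Hypothetical graph with a cycle between nodes 1 and 2.
    graph = {1: [2], 2: [3, 1], 3: [], 40: []}
    print(reachable(graph, 1, max_nodes=50))
```

Without the mark list the traversal of the cyclic graph would not terminate, which is why these procedures are used by nearly every routine that walks the top-down graph.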
-- Attributed grammar of Coco                          Moe 13.3.83
--
-- This grammar is a documentation of the compiler compiler Coco, but it is
-- also an example how to use the Coco input language Cocol.
-- The grammar describes the construction of the parser tables and of the
-- semantic evaluator.

GRAMMAR coco

-- coco      = GRAMMARSY IDENT
--               [SEMANTICSY DECLARATIONSY {any}]
--               [MACROSY {macrodef}]
--               TERMINALSY {symbol [attr] [aliasname]}
--               [PRAGMASY {symbol [attr] [semaction]}]
--               NONTERMINALSY {IDENT [attr] [aliasname]}
--               RULESSY {IDENT [attr] '=' expr '.'}
--               ENDGRAMSY .
-- expr      = term {'|' term} .
-- term      = fact {fact} .
-- fact      = symbol [attr] | EPSSY | ANYSY | semaction
--               | '(' expr ')' | '[' expr ']' | '{' expr '}' .
-- attr      = '<' (outattr | inattr [';' outattr]) '>' .
-- inattr    = INSY ':' (IDENT | NUMBER) {',' (IDENT | NUMBER)} .
-- outattr   = OUTSY ':' IDENT {',' IDENT} .
-- semaction = SEMSY ( '(' IDENT ')' | {any}) ENDSEMSY .
-- macrodef  = SEMSY ':' IDENT ':' {any} ENDSEM .
-- symbol    = IDENT | STRING .
-- aliasname = ALIASSY symbol .

SEMANTIC DECLARATIONS

FROM cocogen IMPORT Attrtype, CloseFile, Copy, EmitAction, GenAssign,
                    InsertFramePart, OpenFile, OpenSem, StartCopy;
FROM cocogra IMPORT alts, rules, rootloc, ConcatLeft, ConcatRight, GetNode,
                    GraphList, Graphnode, NewNode, RepNode;
FROM cocolex IMPORT typ, line, col, ddt, RestartHash, StopHash;
FROM cocosym IMPORT gramspix, CompleteAt, Direction, GetAt, GetMacroNr,
                    GetSy, NewAt, NewMacro, NewSy, RepSy, Symbolnode,
                    Symboltype, SyNr;
FROM Errors  IMPORT CompErr, Restriction, SemErr;
FROM SYSTEM  IMPORT VAL;

CONST
  null = 65535;                    -- null symbol

TYPE
  Usage = (def, check, use);

VAR
  -- symbol nodes
  sn: Symbolnode;                  -- symbol node
  sy, sy1: CARDINAL;               -- symbol numbers
  rootsy: CARDINAL;                -- start symbol of grammar
  eofsy: CARDINAL;                 -- endfile symbol (always Nr. 0)
  -- graph nodes
  gn: Graphnode;                   -- graph node
  gp,gp1,gp2,gp3: CARDINAL;        -- ptr to start of graphs
  gl,gl1,gl2,gl3: CARDINAL;        -- ptr to right open ends of graphs
  dd,dd1,dd2: BOOLEAN;             -- is graph deletable ?
  gpo: CARDINAL;                   -- auxiliary ptr
  firstfact: BOOLEAN;              -- TRUE if first factor in term
  -- attribute processing
  kind: Usage;                     -- usage of attribute
  styp: Symboltype;                -- (eps,t,pr,nt,any,err)
  dir, dir1: Direction;            -- input/output attribute
  count: CARDINAL;                 -- attribute counter
  n: CARDINAL;                     -- value of an attribute constant
  -- generation of semantic evaluator
  sem1,sem2,sem3: CARDINAL;        -- semantic actions
  firstsymbol: BOOLEAN;            -- current symbol the first in action
  -- various
  ok: BOOLEAN;                     -- error indicator
  spix, spix1: CARDINAL;           -- auxiliaries
  dummy: CARDINAL;


-- SEMANTICSTACK          Stack to save semantic values

MODULE SEMANTICSTACK;
  IMPORT CompErr, Restriction;
  EXPORT Pop, Push;
  CONST maxstacksize = 70;
  VAR
    stack: ARRAY[1..maxstacksize] OF CARDINAL;
    sp: CARDINAL;

  PROCEDURE Pop(): CARDINAL;
    VAR x: CARDINAL;
  BEGIN
    IF sp=0 THEN CompErr(6); ELSE x:=stack[sp]; DEC(sp); END;
    RETURN x;
  END Pop;

  PROCEDURE Push(x:CARDINAL);
  BEGIN
    IF sp<maxstacksize
      THEN INC(sp); stack[sp]:=x;
      ELSE Restriction(14);
    END;
  END Push;

BEGIN
  sp:=0;
END SEMANTICSTACK;


-- Error                  Report semantic error

PROCEDURE Error(nr:CARDINAL);
BEGIN
  SemErr(nr,line,col);
END Error;

MACROS

sem :AssignId1:
  INC(count);
  CASE kind OF
    use:   IF styp=nt THEN
             GetAt(!sy,!count,^spix1,^dir1);
             IF spix1<>0 THEN
               IF dir=dir1
                 THEN GenAssign(!nonterm,!spix1,!spix);
                 ELSE Error(8);
               END;
             END;
           END;
  | check: IF styp=nt THEN
             GetAt(!sy,!count,^spix1,^dir1);
             IF spix1<>0 THEN
               IF spix<>spix1 THEN Error(9); END;
               IF dir<>dir1 THEN Error(8); END;
             END;
           END;
  | def:   NewAt(!sy,!spix,!dir);
  END; -- CASE
endsem

sem :AssignId2:
  INC(count);
  CASE kind OF
    use:   IF styp=t THEN
             GenAssign(!term,!spix,!count);
           ELSIF styp=nt THEN
             GetAt(!sy,!count,^spix1,^dir1);
             IF spix1<>0 THEN
               IF dir=dir1
                 THEN GenAssign(!nonterm,!spix,!spix1)
                 ELSE Error(8);
               END;
             END;
           END;
  | check: IF styp=nt THEN
             GetAt(!sy,!count,^spix1,^dir1);
             IF spix1<>0 THEN
               IF spix<>spix1 THEN Error(9); END;
               IF dir<>dir1 THEN Error(8); END;
             END;
           END;
  | def:   NewAt(!sy,!spix,!dir);
           IF styp=pr THEN
             GenAssign(!term,!spix,!count);
           END;
178    END; -- CASE
179  endsem
180
181  sem :AssignNumber:
182    INC(count);
183    IF kind=use
184    THEN
185      IF styp=nt THEN
186        GetAt(!sy,!count,^spix1,^dir1);
187        IF spix1<>0 THEN
188          IF dir=dir1
189            THEN GenAssign(!const,!spix1,!n);
190            ELSE Error(8);
191          END;
192        END;
193      END;
194    ELSE Error(10);
195    END;
196  endsem
197
198  sem :CheckAttr:
199    IF NOT CompleteAt(!sy,!count) THEN
200      Error(6);
201    END;
202  endsem
203
204  sem :Copy:
205    Copy(typ,col)
206  endsem
207
208  sem :InitCopy:
209    StartCopy(1)
210  endsem
211
212  sem :PopPointers:
213    firstfact:=VAL(BOOLEAN,Pop());
214    dd1:=VAL(BOOLEAN,Pop()); gl1:=Pop(); gp1:=Pop();
215    dd:=VAL(BOOLEAN,Pop()); gl:=Pop(); gp:=Pop();
216    gpo:=0
217  endsem
218
219  sem :PushPointers:
220    Push(!gp); Push(!gl); Push(!VAL(CARDINAL,dd));
221    Push(!gp1); Push(!gl1); Push(!VAL(CARDINAL,dd1));
222    Push(!VAL(CARDINAL,firstfact));
223  endsem
224
225  sem :StoreSymbol:
226    sy:=SyNr(!spix);
227    IF sy=null
228      THEN sy:=NewSy(spix,styp)
229      ELSE Error(1);
230    END;
231  endsem
232
233
234  TERMINALS
235  --=========
236
237  -- key words
238  ALIASSY         alias "ALIAS"           --  1: ALIAS
239  ANYSY           alias "any"             --  2: ANY, any
240  DECLARATIONSY   alias "DECLARATIONS"    --  3: DECLARATIONS
241  ENDGRAMSY       alias "ENDGRAM"         --  4: ENDGRAM
242  ENDSEMSY        alias "endsem"          --  5: ENDSEM
243  EPSSY           alias "eps"             --  6: EPS, eps
244  GRAMMARSY       alias "GRAMMAR"         --  7: GRAMMAR
245  INSY            alias "in"              --  8: IN, in
246  MACROSY         alias "MACROS"          --  9: MACROS
247  NONTERMINALSY   alias "NONTERMINALS"    -- 10: NONTERMINALS
248  OUTSY           alias "out"             -- 11: OUT, out
249  PRAGMASY        alias "PRAGMAS"         -- 12: PRAGMAS
250  RULESSY         alias "RULES"           -- 13: RULES
251  SEMSY           alias "sem"             -- 14: SEM, sem
252  SEMANTICSY      alias "SEMANTICS"       -- 15: SEMANTICS
253  TERMINALSY      alias "TERMINALS"       -- 16: TERMINALS
254
255  -- terminal classes
256  IDENT <out:spix>   alias identifier     -- 17: name
257  STRING <out:spix>                       -- 18: string
258  NUMBER <out:n>                          -- 19: constant
259  -- single characters
275  nococosy <out:n>                        -- 34
276
277
278  NONTERMINALS
279
280  coco alias "correct grammar"                          -- 35
281  -- recognizes the whole compiler description
282  expr <out:gp,gl,dd> alias expression                  -- 36
283  -- recognizes an expression and builds its TDG.
284  --   gp points to the root of the TDG
285  --   gl points to right open ends of the TDG
286  --   dd indicates if the TDG is deletable
287  term <out:gp1,gl1,dd1> alias alternative              -- 37
288  -- recognizes an alternative and builds its TDG.
289  --   gp1 points to the root of the TDG
290  --   gl1 points to right open ends of the TDG
291  --   dd1 indicates if the TDG is deletable
292  fact <in:gpo,firstfact; out:gp2,gl2,dd2,gpo>          -- 38
293    alias symbol
294  -- recognizes a component and builds its TDG.
295  --   gp2 points to the root of the TDG
296  --   gl2 points to right open ends of the TDG
297  --   dd2 indicates if the TDG is deletable
298  --   gpo points to the predecessor of fact or is 0
299  --   firstfact is TRUE, if fact is the first one in the term
300  attr <in:sy,styp,kind; out:sem1,sem2,count>           -- 39
301    alias attribute
302  -- recognizes input/output attributes for the symbol sy
303  -- with type styp.
304  --   kind=def: used in declaration context
305  --     sem1=0, sem2=0 (except of pragmas)
306  --   kind=check: used on the left-hand side of rules
307  --     sem1=0, sem2=0
308  --   kind=use: used on the right-hand side of rules
309  --     sem1: sem.no. of input attribute evaluation
310  --     sem2: sem.no. of output attribute evaluation
311  --   count is the nr.of attributes in attr
312  inattr <in:sy,styp,kind,count; out:sem1,count>        -- 40
313    alias "in-attribute"
314  -- recognizes input/output attributes for the symbol sy
315  -- with type styp (sy must be a nonterminal).
316  --   kind=def: used in declaration context
317  --     sem1=0.
318  --   kind=check: used on the left-hand side of rules
319  --     sem1=0.
320  --   kind=use: used on the right-hand side of rules
321  --     sem1: sem.no. of input attribute evaluation
322  --   count is the no.of attributes in inattr
323  outattr <in:sy,styp,kind,count; out:sem2,count>       -- 41
324    alias "out-attribute"
325  -- recognizes input/output attributes for the symbol sy
326  -- with type styp.
327  --   kind=def: used in declaration context
328  --     sem2=0.
329  --   kind=check: used on the left-hand side of rules
330  --     sem2=0.
331  --   kind=use: used on the right-hand side of rules
332  --     sem2: sem.no. of output attribute evaluation
333  --   count is the no.of attributes in outattr
334  semaction <out:sem3> alias "semantic action"          -- 42
335  -- recognizes a semantic action and generates a CASE block
336  -- in Semant. sem2 is the action number.
337  macrodef alias "semantic macro"                       -- 43
338  symbol <out:spix>                                     -- 44
339  -- recognizes a name or a string
340  aliasname <in:sy> alias "alias name"                  -- 45
341  -- recognizes a name which is used for the symbol sy in
342  -- syntax error messages in the generated compiler.
343
344
345  --======================== grammar rules ========================
346  RULES
347  coco =
348    GRAMMARSY
349    IDENT <out:gramspix>          sem rules:=0; alts:=0;
350                                      OpenFile(gramspix); StopHash;
351                                  endsem
352
353    [ SEMANTICSY DECLARATIONSY
354                                  sem (InitCopy) endsem
355      { any                       sem (Copy) endsem
356      } ]
357                                  sem RestartHash;
358                                      InsertFramePart; styp:=t;
359                                  endsem
360
361    [ MACROSY { macrodef } ]
362
363    TERMINALSY                    sem eofsy:=NewSy(!0,!t) endsem
364    { symbol <out:spix>           sem (StoreSymbol) endsem
365      [ attr <in:sy,t,def; out:sem1,sem2,count> ]
366      [ aliasname <in:sy> ]
367    }
368    [ PRAGMASY                    sem styp:=pr endsem
369      { symbol <out:spix>         sem (StoreSymbol) endsem
370        [ attr <in:sy,pr,def; out:sem1,sem2,count>
371                                  sem GetSy(!sy,^sn); sn.sem1:=sem2;
372                                      RepSy(!sy,!sn);
373                                  endsem
374        ]
375        [ semaction <out:sem3>
376                                  sem GetSy(!sy,^sn); sn.sem2:=sem3;
377                                      RepSy(!sy,!sn);
378                                  endsem
379        ]
380      }
381    ]
382    NONTERMINALSY                 sem styp:=nt endsem
383    { IDENT <out:spix>            sem (StoreSymbol) endsem
384      [ attr <in:sy,nt,def; out:sem1,sem2,count> ]
385      [ aliasname <in:sy> ]
386    }                             sem rootsy:=SyNr(!gramspix);
387                                      IF rootsy=null THEN Error(2); END;
388                                  endsem
389    RULESSY
390    { IDENT <out:spix>            sem sy:=SyNr(!spix);
391                                      IF sy=null THEN
392                                        Error(3); sy:=NewSy(!spix,!err)
393                                      END;
394                                      GetSy(!sy,^sn);
395                                      IF (sn.typ<>nt) AND (sn.typ<>err) THEN
396                                        Error(4);
397                                      END;
398                                      IF sn.start<>0 THEN Error(5); END;
399                                      sy1:=sy; count:=0; styp:=sn.typ
400                                  endsem
401      [ attr <in:sy,styp,check; out:sem1,sem2,count> ]
402                                  sem (CheckAttr) endsem
403      '='
404      expr <out:gp,gl,dd>         sem GetSy(!sy1,^sn);
405                                      sn.start:=gp; sn.del:=dd;
406                                      RepSy(!sy1,!sn);
407                                      INC(rules);
408                                  endsem
409      '.'
410    }                             sem rootloc:=NewNode(!nt,!rootsy,!0);
411                                      gp1:=NewNode(!t,!eofsy,!0);
412                                      gl:=rootloc; gl1:=gp1;
413                                      ConcatRight(rootloc,gl,!gp1,!gl1)
414                                  endsem
415    ENDGRAMSY                     sem IF ddt["L"] THEN GraphList; END;
416                                      CloseFile; endsem.
417
418
419  expr <out:gp,gl,dd> =
420    term <out:gp,gl,dd>           sem INC(alts); endsem
421    { '|'
422      term <out:gp1,gl1,dd1>      sem INC(alts);
423                                      ConcatLeft(gp,gl,!gp1,!gl1);
424                                      dd:=dd OR dd1
425                                  endsem
426    }.
427
428  term <out:gp1,gl1,dd1> =
429                                  sem gpo:=0 endsem
430    fact <in:gpo,TRUE; out:gp1,gl1,dd1,gpo>
431    { fact <in:gpo,FALSE; out:gp2,gl2,dd2,gpo>
432                                  sem IF gp2<>0 THEN
433                                        ConcatRight(gp1,gl1,!gp2,!gl2);
434                                        dd1:=dd1 AND dd2;
435                                      END;
436                                  endsem
437    }.
438
439  fact <in:gpo,firstfact; out:gp2,gl2,dd2,gpo> =
440    ( symbol <out:spix>           sem sy:=SyNr(!spix);
441                                      IF sy=null THEN
442                                        Error(3); sy:=NewSy(!spix,!err)
443                                      END;
444                                      GetSy(!sy,^sn);
445                                      IF sn.typ=pr THEN Error(16); END;
446                                      gp2:=NewNode(!sn.typ,!sy,!line);
447                                      gl2:=gp2; dd2:=FALSE; gpo:=gp2;
448                                      count:=0; styp:=sn.typ
449                                  endsem
450      [ attr <in:sy,styp,use; out:sem1,sem2,count>
451                                  sem GetNode(!gp2,^gn);
452                                      gn.sem1:=sem1; gn.sem2:=sem2;
453                                      RepNode(!gp2,!gn)
454                                  endsem
455      ]                           sem (CheckAttr) endsem
456    | EPSSY                       sem gp2:=NewNode(!eps,!0,!line);
457                                      gl2:=gp2; dd2:=TRUE; gpo:=gp2
458                                  endsem
459    | ANYSY                       sem gp2:=NewNode(!any,!0,!line);
460                                      gl2:=gp2; dd2:=FALSE; gpo:=gp2
461                                  endsem
462    | semaction <out:sem3>        sem IF gpo=0
463                                      THEN
464                                        gp2:=NewNode(!eps,!0,!line);
465                                        gl2:=gp2; dd2:=TRUE;
466                                        GetNode(!gp2,^gn); gn.sem3:=sem3;
467                                        RepNode(!gp2,!gn);
468                                      ELSE
469                                        GetNode(!gpo,^gn); gn.sem3:=sem3;
470                                        RepNode(gpo,gn);
471                                        gp2:=0; gl2:=0; gpo:=0
472                                      END;
473                                  endsem
474    | '('                         sem (PushPointers) endsem
475      expr <out:gp2,gl2,dd2>
476      ')'                         sem (PopPointers) endsem
477    | '['                         sem (PushPointers) endsem
478      expr <out:gp,gl,dd>         sem gp2:=NewNode(!eps,!0,!line);
479                                      gl2:=gp2;
480                                      ConcatLeft(gp,gl,!gp2,!gl2);
481                                      gp2:=gp; gl2:=gl; dd2:=TRUE;
482                                  endsem
483      ']'                         sem (PopPointers) endsem
484    | '{'                         sem (PushPointers) endsem
485      expr <out:gp,gl,dd>         sem gp2:=NewNode(!eps,!0,!line);
486                                      gl2:=gp2;
487                                      ConcatRight(gp,gl,!gp,!gl);
488                                      ConcatLeft(gp,gl,!gp2,!gl2);
489                                      gp2:=gp; dd2:=TRUE;
490                                      -- gl2 is link of eps node
491                                  endsem
492      '}'                         sem (PopPointers) endsem
493                                  sem IF firstfact THEN
494                                        gp3:=gp2; gl3:=gl2;
495                                        gp2:=NewNode(!eps,!0,!line); gl2:=gp2;
496                                        ConcatRight(gp2,gl2,!gp3,!gl3);
497                                      END;
498                                  endsem
499    ).
500
501  attr <in:sy,styp,kind; out:sem1,sem2,count> =
502    '<'                           sem sem1:=0; sem2:=0 endsem
503    ( inattr <in:sy,styp,kind,0; out:sem1,count>
504      [ ';' outattr <in:sy,styp,kind,count; out:sem2,count> ]
505    | outattr <in:sy,styp,kind,0; out:sem2,count>
506    )
507    '>'.
508
509  inattr <in:sy,styp,kind,count; out:sem1,count> =
510    INSY                          sem IF styp<>nt THEN Error(7); END;
511                                      dir:=down; endsem
512    ':'
513    (
514      IDENT <out:spix>            sem (AssignId1) endsem
515    | NUMBER <out:n>              sem (AssignNumber) endsem
516    )
517    { ','
518      ( IDENT <out:spix>          sem (AssignId1) endsem
519      | NUMBER <out:n>            sem (AssignNumber) endsem
520      )}                          sem IF kind=use THEN
521                                        EmitAction(!line,^sem1); END;
522                                  endsem.
523
524
525  outattr <in:sy,styp,kind,count; out:sem2,count> =
526    OUTSY                         sem dir:=up endsem
527    ':'
528    IDENT <out:spix>              sem (AssignId2) endsem
529    { ','
530      IDENT <out:spix>            sem (AssignId2) endsem
531    }                             sem IF (kind=use) OR (styp=pr) THEN
532                                        EmitAction(!line,^sem2);
533                                      END;
534                                  endsem.
535
536  semaction <out:sem3> =
537    SEMSY                         sem StopHash; firstsymbol:=TRUE endsem
538    ( '('                         sem RestartHash endsem
539      IDENT <out:spix>            sem GetMacroNr(!spix,^sem3);
540                                      IF sem3=0 THEN Error(12); END;
541                                  endsem
542      ')'
543    | { any                       sem IF firstsymbol THEN
544                                        firstsymbol:=FALSE;
545                                        OpenSem(!line,^sem3); StartCopy(!col)
546                                      END;
547                                      Copy(!typ,!col)
548                                  endsem
549      }                           sem RestartHash; endsem
550    )
551    ENDSEMSY.
552
553  macrodef =
554    SEMSY
555    ':'
556    IDENT <out:spix>              sem OpenSem(!line,^sem3);
557                                      NewMacro(!spix,!sem3,^ok);
558                                      IF NOT ok THEN Error(11); END;
559                                      StopHash; firstsymbol:=TRUE;
560                                  endsem
561    ':'
562    { any                         sem IF firstsymbol THEN
563                                        firstsymbol:=FALSE; StartCopy(col)
564                                      END;
565                                      Copy(!typ,!col)
566                                  endsem
567    }
568    ENDSEMSY                      sem RestartHash endsem.
569
570  symbol <out:spix> =
571    ( IDENT <out:spix> | STRING <out:spix> ).
572
573  aliasname <in:sy> =
574    ALIASSY
575    symbol <out:spix>             sem GetSy(!sy,^sn); sn.aliasspix:=spix;
576                                      RepSy(!sy,!sn);
577                                  endsem.
578
579  ENDGRAM

alias          238 239 240 241 242 243 244 245 246 247 248 249
               250 251 252 253 256 280 282 287 293 301 313 324
               334 337 340
aliasname      13 15 33 340 366 385 573
aliasspix      575
ALIASSY        33 238 574
alts           41 349 420 422
any            11 30 31 73 239 355 459 543 562
ANYSY          22 239 459
AssignId1      125 514 518
AssignId2      150 528 530
AssignNumber   181 515 519
attr           13 14 15 16 20 27 300 311 365 370 384 401 450 501
attributes     302 311 314 322 325 333
Attrtype       39
CheckAttr      198 402 455
CloseFile      39 416
coco           8 10 280 347
cocogen        39
cocogra        41
cocolex        43
cocosym        44
col            43 119 205 545 547 563 565
CompErr        47 89 99
CompleteAt     44 199
ConcatLeft     41 423 480 488
ConcatRight    41 413 433 487 496
const          189
Copy           39 204 205 355 547 565
count          75 126 130 139 151 155 157 167 176 182 186 199
               300 311 312 312 322 323 323 333 365 370 384 399
               401 448 450 501 503 504 504 505 509 509 525 525
dd             68 215 220 282 286 404 405 419 420 424 424 478 485
dd1            68 214 221 287 291 422 424 428 430 434 434
dd2            68 292 297 431 434 439 447 457 460 465 475 481 489
ddt            43 415
DECLARATIONSY  11 240 353
def            55 145 173 304 316 327 365 370 384
del            405
dir            74 132 142 146 159 170 174 188 511 526
dir1           74 130 132 139 142 157 159 167 170 186 188
Direction      44 74
down           511
dummy          83
EmitAction     39 521 532
ENDGRAMSY      17 241 415
ENDSEMSY       30 242 551 568
eofsy          62 363 411
eps            73 243 456 464 478 485 490 495
EPSSY          21 243 456
err            73 392 395 442
Error          116 118 119 134 141 142 161 169 170 190 194 200
               229 387 392 396 398 442 445 510 540 558
Errors         47
expr           16 18 24 25 26 282 404 419 475 478 485
fact           19 19 20 292 298 299 430 431 439
firstfact      70 213 222 292 299 439 493
firstsymbol    79 537 543 544 559 562 563
GenAssign      39 133 155 160 176 189
GetAt          45 130 139 157 167 186
GetMacroNr     45 539
GetNode        42 451 466 469
GetSy          45 371 376 394 404 444 575
gl             67 215 220 282 285 404 412 413 419 420 423 478
               480 481 485 487 487 488
gl1            67 214 221 287 290 412 413 422 423 428 430 433
gl2            67 292 296 431 433 439 447 457 460 465 471 475
APPF cocoATG 239 gl3 gn gp gpl gp2 gp3 gpo GRAMMARSY gramsplx GraphLlst Graphnode IDENT lnattr InitCopy InsertFramePart INSY kind line link macrodef MACROSY maxstacksize n name NewAt NewMacro NewNode NewSy nococosy nodes nonterm NONTERMINALSY nr nt null NUMBER ok OpenFlle OpenSem outattr OUTSY Pop PopPointers Pr PRAGMASY Push PushPointers RepNode Repsy RestartHash 479 67 65 66 481 66 433 66 456 478 66 69 447 10 44 42 42 10 383 27 208 40 28 72 323 43 556 490 12 12 91 76 256 45 45 42 46 275 58 133 15 118 73 52 28 81 40 40 27 29 90 212 73 14 90 219 42 46 43 480 494 451 215 485 214 292 457 479 494 216 457 244 349 415 65 15 390 28 354 358 245 127 327 119 31 246 93 189 339 146 557 410 228 64 160 247 119 129 52 28 557 350 545 27 248 96 476 175 249 103 474 453 372 357 481 496 452 220 487 221 295 457 480 496 292 460 348 350 16 514 312 510 152 329 446 337 361 105 258 341 174 411 363 382 309 138 227 258 558 556 29 526 101 483 368 368 109 477 467 377 538 486 452 282 487 287 431 459 481 292 462 386 28 518 322 183 331 456 361 275 446 392 310 156 387 515 323 213 492 370 220 484 470 406 549 488 453 284 488 289 432 460 485 298 469 28 528 503 300 501 459 553 515 456 442 311 166 391 519 333 214 445 220 576 568 490 466 404 489 411 433 460 486 429 470 29 530 509 304 503 464 519 459 321 185 441 504 214 531 220 494 466 405 412 439 464 488 430 471 29 539 306 504 478 464 322 382 505 214 221 495 467 419 413 446 465 489 430 30 556 308 505 485 478 332 384 525 215 221 496 469 420 422 447 466 494 431 31 571 312 509 495 485 333 395 215 221 469 423 423 447 467 495 431 32 316 520 521 495 410 215 222 470 478 428 451 471 495 439 256 318 525 532 510 480 430 453 475 496 439 349 320 531 545
240 Program listings App.F Restriction 47 89 107 root 284 289 295 rootloc 41 410 412 413 rootsy 61 386 387 410 rules 41 306 308 318 320 329 331 345 349 407 RULESSY 16 250 389 semi 78 300 305 307 309 312 317 319 321 365 370 371 384 401 450 452 452 501 502 503 509 521 sem2 78 300 305 307 310 323 328 330 332 336 365 370 371 376 384 401 450 452 452 501 502 504 505 525 532 sem3 78 334 375 376 462 466 466 469 469 536 539 540 545 556 557 semaction 14 23 30 334 375 462 536 Semant 336 SEMANTICSTACK 86 88 113 SEMANTICSY 11 252 353 SemErr 47 119 SEMSY 30 31 251 537 554 sn 59 371 371 372 376 376 377 394 395 395 398 399 404 405 405 406 444 445 446 448 575 575 576 sp 94 99 99 99 105 106 106 112 spix 82 133 141 146 155 160 169 174 176 226 228 256 257 338 364 369 383 390 390 392 440 440 442 514 518 528 530 539 539 556 557 570 571 571 575 575 spixl 82 130 131 133 139 140 141 157 158 160 167 168 169 186 187 189 stack 93 99 106 StartCopy 40 209 545 563 StopHash 43 350 537 559 StoreSymbol 225 364 369 383 STRING 32 257 571 string 257 339 styp 73 129 138 154 156 166 175 185 228 300 303 312 315 323 326 358 368 382 399 401 448 450 501 503 504 505 509 510 525 531 sy 60 130 139 146 157 167 174 186 199 226 227 228 300 302 312 314 315 323 325 340 341 365 366 370 371 372 376 377 384 385 390 391 392 394 399 401 440 441 442 444 446 450 501 503 504 505 509 525 573 575 576 406 79 syl symbol Symbolnode Symboltype SyNr SYSTEM t term TERMINALSY typ type up Usage use VAL X 60 13 293 46 46 46 48 73 18 13 43 303 526 55 4 48 97 399 14 302 59 73 226 154 18 253 205 315 72 55 213 99 404 20 314 386 358 19 363 395 326 128 214 100 406 32 325 390 363 70 395 153 215 103 33 338 440 365 155 399 183 220 106 52 341 411 176 445 308 221 58 364 287 446 320 222 59 369 299 448 331 60 440 420 547 450 61 570 422 565 520 62 575 428 531
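The PushPointers and PopPointers macros in the cocoATG listing above save the graph pointers of the enclosing expression on SEMANTICSTACK before a nested expression is parsed and restore them afterwards in reverse order. The following is only an illustrative Python sketch of that save/restore discipline, not part of the book's Modula-2 sources; the class name `SemanticStack`, the dictionary `state`, and the exception types are assumptions made for the example (the listing reports overflow and underflow through Restriction(14) and CompErr(6) instead).

```python
from typing import Dict, List

MAX_STACK_SIZE = 70  # corresponds to maxstacksize in the listing

# Order in which PushPointers saves its values (PopPointers reverses it).
SAVED_NAMES = ("gp", "gl", "dd", "gp1", "gl1", "dd1", "firstfact")
BOOLEAN_NAMES = {"dd", "dd1", "firstfact"}


class SemanticStack:
    """Bounded stack of non-negative integers, like the CARDINAL stack."""

    def __init__(self) -> None:
        self._items: List[int] = []

    def push(self, x: int) -> None:
        if len(self._items) >= MAX_STACK_SIZE:
            # The listing calls Restriction(14) here; an exception is assumed.
            raise OverflowError("semantic stack overflow")
        self._items.append(x)

    def pop(self) -> int:
        if not self._items:
            # The listing calls CompErr(6) here; an exception is assumed.
            raise IndexError("semantic stack underflow")
        return self._items.pop()


def push_pointers(stack: SemanticStack, state: Dict[str, object]) -> None:
    """Save the pointers; booleans are stored as 0/1 like VAL(CARDINAL, b)."""
    for name in SAVED_NAMES:
        stack.push(int(state[name]))


def pop_pointers(stack: SemanticStack, state: Dict[str, object]) -> None:
    """Restore the pointers in reverse order and reset gpo to 0."""
    for name in reversed(SAVED_NAMES):
        value = stack.pop()
        state[name] = bool(value) if name in BOOLEAN_NAMES else value
    state["gpo"] = 0
```

Because the stack holds only cardinals, the boolean flags have to be converted on the way in and out, which is what the VAL calls in the two macros do.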
cocoMOD

  1  (* Coco        Compiler compiler Coco                     Moe 27.12.83
  2
  3     This is the main module of Coco, a compiler compiler.
  4     It controls the execution of the [...]. It
  5     a) opens and closes the files
  6     b) initializes the scanner
  7     c) calls the parser
  8     d) calls the procedures which collect the symbol sets
  9     e) calls the grammar test procedures
 10     f) calls the procedure which generates the compiler
 11     g) calls the lister to print a listing with error messages
 12
 13     Implementation restrictions:
 14       1: cocolex   Hash          Hash table full
 15       2: cocolex   Hash          Name list full
 16       3: cocolex   PushInc       Include stack overflow
 17       4: cocolex   EnQueue       Attribute queue overflow
 18       5: cocogra   NewNode       Too many nodes in TDG (>600)
 19       6: cocosym   NewSy         Symbol list overflow (>199)
 20       7: cocosym   NewSy         Too many terminals (>127)
 21
 22     Compiler errors:
 23       1: cocolex   PopInc        Include stack underflow
 24       2: cocolex   DeQueue       Attribute queue underflow
 25       3: cocosym   GetAt         Try to get attribute inf. for a terminal
 26       4: cocogen   OpenFile      Semantic frame not found
 27       5: cocogen2  GenSynFiles   Parser frame not found
 28       6: cocogen2  NewAdr        Fixups already resolved
 29
 30     Trace switches: can be set by "$D letter {letter}" (without spaces)
 31       A: cocosyn                       Print parser input (remove comments!)
 32       B: cocosyn                       Trace parser run (remove comments!)
 33       C: cocogra   DelGraph            Print visited nodes
 34       D: cocotst   FindCircularRules   Print derivations between single nt's
 35       E: cocotst   TestIfNtToTerm      Trace flow of algorithm
 36       F: cocotst   CheckAlternatives   Print visited nodes
 37       G: cocosym   CollectFirstSet     Print visited nodes
 38       H: cocosym   GetFirstSet         Print resulting set
 39       I: cocosym   GetFollowSets       Print resulting sets
 40       J: cocosym   CollectFollowSets   Print visited nodes
 41       K: cocosym                       Print sets of term.starts and succ.s
 42       L: cocosem                       Print generated TDG
 43  *)
 44  MODULE Coco;
 45
 46  FROM cocogen  IMPORT filesopen, CloseFile;
 47  FROM cocogen2 IMPORT GenSynFiles, PutStatistics;
 48  FROM cocogra  IMPORT DeleteRedundantEps, NewEpsBeforeDelNts;
 49  FROM cocolex  IMPORT ddt, src;
 50  FROM cocolst  IMPORT lst, PrintListing;
 51  FROM cocosym  IMPORT FindDelSymbols, GetSymbolSets;
 52  FROM cocosyn  IMPORT Parse, printinput, printnodes;
 53  FROM cocotst  IMPORT FindCircularRules, LL1Test, TestCompleteness,
 54                       TestIfAllNtReached, TestIfNtToTerm;
 55  FROM Errors   IMPORT GetNumberOfErrors;
 56  FROM FileIO   IMPORT con, File, Done, Open, Close, Read, WriteInt,
 57                       WriteLn, WriteString;
 58  FROM System   IMPORT Terminate, normal;
 59
 60
 61  VAR
 62    ch:        CHAR;
 63    correct:   BOOLEAN;
 64    ll1:       BOOLEAN;                (*TRUE if grammar is LL(1)*)
 65    lstn:      ARRAY[0..63] OF CHAR;   (*list file name*)
 66    ok:        BOOLEAN;
 67    semerrors: CARDINAL;
 68    synerrors: CARDINAL;
 69
 70
 71  (* ChangeExtension   Change extension of file name
 72  *)
 73  PROCEDURE ChangeExtension(VAR old,new:ARRAY OF CHAR; ext:ARRAY OF CHAR);
 74    VAR i,j: INTEGER;
 75  BEGIN
 76    i:=0;
 77    WHILE (i<=HIGH(old)) AND (old[i]<>0C) DO i:=i+1; END;
 78    WHILE (i>=0) AND (old[i]<=" ") DO DEC(i) END;
 79    j:=i;
 80    WHILE (j>=0) AND (old[j]<>".") DO DEC(j) END;
 81    IF j>=0 THEN i:=j-1; END;
 82    FOR j:=0 TO i DO new[j]:=old[j]; END;
 83    new[i+1]:="."; new[i+2]:=ext[0]; new[i+3]:=ext[1];
 84    new[i+4]:=ext[2]; new[i+5]:=0C;
 85  END ChangeExtension;
 86
 87
 88  BEGIN
 89    WriteString(con,"Coco - Compiler Compiler   Vs 4.1$");
 90    Open(src,0,"",FALSE);
 91    IF NOT Done THEN Terminate(normal) END; (*cancel*)
 92    ChangeExtension(src^.name,lstn,"LST");
 93    Open(lst,src^.volRef,lstn,TRUE);
 94    WriteString(lst,"Coco - Compiler Compiler   Vs 4.1   ");
 95    WriteString(lst,"(Source file: "); WriteString(lst,src^.name);
 96    WriteString(lst,")$$");
 97
 98    WriteString(con,"parsing");
 99    Parse(correct);                            (*parse input grammar*)
100    GetNumberOfErrors(synerrors,semerrors);    (*check for errors*)
101    IF synerrors+semerrors<>0 THEN
102      IF filesopen THEN CloseFile END;
103      WriteString(con,"$listing");
104      PrintListing;
105      WriteString(con,"$Compilation terminated. ");
106      WriteInt(con,synerrors+semerrors,0);
107      WriteString(con," errors detected. Press any key.$");
108      Close(src); Close(lst);
109      Read(con,ch); Terminate(normal);
110    END;
111
112    WriteString(con,"$evaluating$");
113    FindDelSymbols;
114    NewEpsBeforeDelNts;
115    DeleteRedundantEps;
116    GetSymbolSets;
117    TestCompleteness(ok);
118    IF ok THEN TestIfAllNtReached(ok); END;
119    IF ok THEN FindCircularRules(ok); END;
120    IF ok THEN TestIfNtToTerm(ok); END;
121    IF ok THEN LL1Test(ll1); END;
122    IF NOT ok OR NOT ll1 THEN
123      WriteString(con,"listing$");
124      WriteLn(lst); WriteLn(lst); PrintListing;
125    END;
126
127    IF ok THEN
128      WriteString(con,"writing$");
129      GenSynFiles;
130      PutStatistics;
131    END;
132    IF NOT ok THEN
133      WriteString(con,"Compilation ended with errors in grammar tests.");
134    ELSIF NOT ll1 THEN
135      WriteString(con,"Compilation ended with LL(1) errors.");
136    ELSE
137      WriteString(con,"Compilation completed. No errors detected.");
138    END;
139    Close(src); Close(lst);
140    WriteString(con," Press any key.$"); Read(con,ch);
141  END Coco.

C                   77 84
ch                  62 109 140
ChangeExtension     73 85 92
Close               56 108 108 139 139
CloseFile           46 102
Coco                44 141
cocogen             46
cocogen2            47
cocogra             48
cocolex             49
cocolst             50
cocosym             51
cocosyn             52
cocotst             53
con                 56 89 98 103 105 106 107 109 112 123 128 133
                    135 137 140 140
correct             63 99
ddt                 49
DeleteRedundantEps  48 115
Done                56 91
Errors              55
ext                 73 83 83 84
File                56
FileIO              56
filesopen           46 102
FindCircularRules   53 119
FindDelSymbols      51 113
GenSynFiles         47 129
GetNumberOfErrors   55 100
GetSymbolSets       51 116
HIGH                77
i                   74 76 77 77 77 77 78 78 78 79 81 82
                    83 83 83 84 84
j                   74 79 80 80 80 81 81 82 82 82
ll1                 64 121 122 134
LL1Test             53 121
lst                 50 93 94 95 95 96 108 124 124 139
lstn                65 92 93
name                92 95
new                 73 82 83 83 83 84 84
NewEpsBeforeDelNts  48 114
normal              58 91 109
ok                  66 117 118 118 119 119 120 120 121 122 127 132
old                 73 77 77 78 80 82
Open                56 90 93
Parse               52 99
printinput          52
PrintListing        50 104 124
printnodes          52
PutStatistics       47 130
Read                56 109 140
semerrors           67 100 101 106
src                 49 90 92 93 95 108 139
synerrors           68 100 101 106
System              58
Terminate           58 91 109
TestCompleteness    53 117
TestIfAllNtReached  54 118
TestIfNtToTerm      54 120
volRef              93
WriteInt            56 106
WriteLn             57 124 124
WriteString         57 89 94 95 95 96 98 103 105 107 112 123
                    128 133 135 137 140
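The ChangeExtension procedure of the Coco main module derives the name of the list file from the source file name: it cuts the name at its terminating 0C, drops trailing blanks, removes everything from the last period onward, and appends a period plus a three-character extension. The Python function below is a sketch of that behaviour for illustration only; the function name and the use of Python strings instead of character arrays are assumptions, and it does not reproduce the array-bound handling of the Modula-2 original.

```python
def change_extension(old: str, ext: str) -> str:
    """Return `old` with its extension replaced by the first 3 chars of `ext`."""
    # The Modula-2 version stops at the terminating 0C character.
    name = old.split("\0", 1)[0]

    # Drop trailing blanks and control characters (everything <= " ").
    end = len(name)
    while end > 0 and name[end - 1] <= " ":
        end -= 1
    name = name[:end]

    # Cut at the last period, if there is one.
    dot = name.rfind(".")
    stem = name[:dot] if dot >= 0 else name

    return stem + "." + ext[:3]
```

In the main module the call is ChangeExtension(src^.name,lstn,"LST"), so a grammar file and its listing differ only in the extension.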
cocogen DEF

  1  (* cocogen     Generator of compiler files                Moe 28.12.83
  2
  3     This module generates the semantic evaluator. It
  4     a) copies symbols from the input grammar to the evaluator
  5     b) copies text from the semantic frame to the evaluator
  6     c) stores attribute assignments (and emits them as semantic actions)
  7  *)
  8  DEFINITION MODULE cocogen;
  9
 10  FROM FileIO IMPORT File;
 11
 12  TYPE
 13    Attrtype = (term, nonterm, const);
 14
 15  VAR
 16    maxsem:    CARDINAL;  (*number of last semantic action*)
 17    filesopen: BOOLEAN;   (*files may remain open after a syntax error*)
 18
 19  PROCEDURE CloseFile;
 20  (* Closes the file where the semantic evaluator is written to*)
 21
 22  PROCEDURE Copy(typ,col:CARDINAL);
 23  (* Copies the source symbol typ at column col to the generated
 24     semantic file*)
 25
 26  PROCEDURE CopyFramePart(VAR f1,f2:File; s:ARRAY OF CHAR);
 27  (* Copies file f1 to file f2 until string s occurs. s is not copied*)
 28
 29  PROCEDURE EmitAction(line:CARDINAL; VAR sem:CARDINAL);
 30  (* Emits the stored attribute assignments as a semantic action. line
 31     is used to print a comment. sem is the number of the new action*)
 32
 33  PROCEDURE GenAssign(typ:Attrtype; left,right:CARDINAL);
 34  (* Generates an assignment arg(left)<--arg(right). typ indicates if
 35     arg(right) is a terminal attribute, a nonterminal attribute or
 36     a constant*)
 37
 38  PROCEDURE InsertFramePart;
 39  (* Inserts the middle part in the generated semantics file*)
 40
 41  PROCEDURE OpenFile(spix:CARDINAL);
 42  (* Opens the file where the semantic evaluator is written to. spix is
 43     the grammar name in Cocol. The name of the generated file is the
 44     grammar name with the suffix "sem"*)
 45
 46  PROCEDURE OpenSem(line:CARDINAL; VAR sem:CARDINAL);
 47  (* Prints the start of a new semantic action (case-number of a new
 48     case-block). line is used to print a comment. sem is the number of
 49     the new action*)
 50
 51  PROCEDURE StartCopy(col:CARDINAL);
 52  (* Saves col as the leftmost column in the following semantic action*)
 53
 54  END cocogen.
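The definition module describes CopyFramePart as copying one file to another until a marker string occurs, with the marker itself not copied. The implementation module uses this to fill a frame file: it copies up to a marker such as "-->modulename", writes the grammar name followed by "sem", and continues with the rest of the frame. The Python sketch below illustrates that template-filling idea on in-memory strings. It is an assumption-laden illustration, not a transcription of the Modula-2 code: `FrameReader`, `copy_until` and `fill_frame` are hypothetical names, and the marker search uses Python's `str.find` rather than the character-by-character comparison of the original.

```python
from typing import List


class FrameReader:
    """Hypothetical stand-in for the frame file: a text and a read position."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def copy_until(self, marker: str, out: List[str]) -> bool:
        """Copy text up to (not including) marker; return True if it was found."""
        hit = self.text.find(marker, self.pos)
        if hit < 0:
            # No marker left: copy the remainder of the frame.
            out.append(self.text[self.pos:])
            self.pos = len(self.text)
            return False
        out.append(self.text[self.pos:hit])
        self.pos = hit + len(marker)  # continue reading after the marker
        return True


def fill_frame(frame_text: str, grammar_name: str) -> str:
    """Replace every "-->modulename" marker by the grammar name plus "sem"."""
    frame = FrameReader(frame_text)
    out: List[str] = []
    while frame.copy_until("-->modulename", out):
        out.append(grammar_name + "sem")
    return "".join(out)
```

For a grammar named calc, a frame line reading MODULE -->modulename; would come out as MODULE calcsem; under these assumptions.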
  1  (* cocogen     Generation of semantic evaluator           Moe 30.12.83
  2
  3     This module generates the semantic evaluator. It
  4     a) copies symbols from the input grammar to the evaluator
  5     b) copies text from the semantic frame to the evaluator
  6     c) stores attribute assignments (and emits them as semantic actions)
  7  *)
  8  IMPLEMENTATION MODULE cocogen;
  9
 10  FROM cocolex IMPORT at, line, col, src, GetName;
 11  FROM Errors  IMPORT CompErr, SemErr;
 12  FROM FileIO  IMPORT con, File, Done, Open, Close, Read, Write, WriteCard,
 13                      WriteLn, WriteString, WriteText;
 14  FROM System  IMPORT Allocate, Deallocate;
 15
 16  CONST
 17    blanks =
 18      "                                                            ";
 19    ident   = 17;   (*symbol numbers*)
 20    string  = 18;
 21    number  = 19;
 22    lparsy  = 23;
 23    commasy = 33;
 24    eolsy   =255;
 25
 26  TYPE
 27    Actionptr     = POINTER TO Action;
 28    Assignmentptr = POINTER TO Assignment;
 29    Action = RECORD                (*information about attr.eval. action*)
 30               sem:      CARDINAL;       (*action number*)
 31               firstass: Assignmentptr;  (*to first assignment*)
 32               next:     Actionptr;      (*to next action*)
 33             END;
 34    Assignment = RECORD            (*information about an attr. assignment*)
 35               typ:   Attrtype;          (*term,nonterm,const*)
 36               left:  CARDINAL;          (*spix of left-hand side*)
 37               right: CARDINAL;          (*spix or val of right-hand side*)
 38               next:  Assignmentptr;     (*to next assignment*)
 39             END;
 40    Name = ARRAY[1..80] OF CHAR;
 41
 42  VAR
 43    firstact: Actionptr;       (*first generated action*)
 44    firstass: Assignmentptr;   (*first stored assignment*)
 45    fram:     File;            (*file with frame of sem.Analyzer*)
 46    gram:     Name;            (*grammar name*)
 47    graml:    CARDINAL;        (*length of grammar name*)
 48    lastact:  Actionptr;       (*last generated action*)
 49    lastass:  Assignmentptr;   (*last stored assignment*)
 50    lastcol:  CARDINAL;        (*column of last symbol*)
 51    lasttyp:  CARDINAL;        (*type of last symbol*)
 52    leftcol:  CARDINAL;        (*leftmost column in semantic action*)
 53    margin:   CARDINAL;        (*indent from left margin*)
 54    op:       ARRAY[0..commasy] OF CHAR;   (*operator table*)
 55    sem:      File;            (*file containing sem.evaluator*)
 56    semname:  Name;            (*file name of sem.evaluator*)
 57
 58
 59  PROCEDURE EmitAssign(p:Assignmentptr); FORWARD;
 60
 61
 62  (* CloseFile   Close file containing the semantic evaluator
 63  *)
 64  PROCEDURE CloseFile;
 65  BEGIN
 66    CopyFramePart(fram,sem,"-->modulename");
 67    WriteText(sem,gram,graml); WriteString(sem,"sem");
 68    CopyFramePart(fram,sem,"$$$");
 69    Close(fram); Close(sem);
 70    filesopen:=FALSE;
 71  END CloseFile;
 72
 73
 74  (* Copy   Copy source symbol to semantic evaluator
 75  *)
 76  PROCEDURE Copy(typ,col:CARDINAL);
 77  VAR
 78    ch:   CHAR;
 79    l,i:  CARDINAL;
 80    name: Name;
 81  BEGIN
 82    IF col<=lastcol THEN (*new line*)
 83      WriteLn(sem);
 84      WriteText(sem,blanks,margin);
 85      IF col>leftcol THEN
 86        WriteText(sem,blanks,col-leftcol);
 87      END;
 88      lasttyp:=eolsy;
 89    END;
 90    IF (typ<=number) AND (lasttyp<=number) THEN
 91      Write(sem," ");
 92    END;
 93    CASE typ OF
 94        1: WriteString(sem,"alias");
 95     |  2: WriteString(sem,"any");
 96     |  3: WriteString(sem,"DECLARATIONS");
 97     |  4: WriteString(sem,"ENDGRAM");
 98     |  5: WriteString(sem,"endsem");
 99     |  6: WriteString(sem,"eps");
100     |  7: WriteString(sem,"GRAMMAR");
101     |  8: WriteString(sem,"IN");
102     |  9: WriteString(sem,"MACROS");
103     | 10: WriteString(sem,"NONTERMINALS");
104     | 11: WriteString(sem,"out");
105     | 12: WriteString(sem,"PRAGMAS");
106     | 13: WriteString(sem,"RULES");
107     | 14: WriteString(sem,"sem");
108     | 15: WriteString(sem,"SEMANTICS");
109     | 16: WriteString(sem,"TERMINALS");
110     | 17,18: (*ident, string*)
111           GetName(at[1],name,l); WriteText(sem,name,l);
112     | 19: WriteCard(sem,at[1],0);
113     | 20..33: (*operators*)
114           Write(sem,op[typ]);
115     | 34: ch:=CHR(at[1]);
116           IF (ch="!") OR ((ch="^") AND (lasttyp<>ident))
117             THEN;
118             ELSE Write(sem,ch);
119           END;
120    END; (*CASE*)
121    lasttyp:=typ; lastcol:=col;
122  END Copy;
123
124
125  (* CopyFramePart   Copies file f1 to file f2 until string s occurs
126  *)
127  PROCEDURE CopyFramePart(VAR f1,f2:File; s:ARRAY OF CHAR);
128  VAR
129    ch,startch: CHAR;
130    i: INTEGER;
131    t: ARRAY[0..50] OF CHAR;
132  BEGIN
133    startch:=s[0]; Read(f1,ch);
134    WHILE NOT f1^.eof DO
135      IF ch=startch
136        THEN (*check if s occurs*)
137          i:=0;
138          WHILE (i<HIGH(s)) AND (ch=s[i]) AND NOT f1^.eof DO
139            t[i]:=ch; INC(i); Read(f1,ch);
140          END;
141          IF ch=s[i] THEN RETURN; END;   (*found - exit*)
142          WriteText(f2,t,i);             (*not found - continue*)
143          Write(f2,ch);
144        ELSE Write(f2,ch);               (*normal character - write it*)
145      END;
146      Read(f1,ch);
147    END; (*WHILE*)
148  END CopyFramePart;
149
150
151  (* EmitAction   Emit stored semantic action
152  *)
153  PROCEDURE EmitAction(line:CARDINAL; VAR sem:CARDINAL);
154  VAR
155    act,p: Actionptr;
156    q:     Assignmentptr;
157
158    PROCEDURE EqualAct(p1,p2: Assignmentptr): BOOLEAN;
159    BEGIN
160      WHILE (p1<>NIL) AND (p2<>NIL) AND (p1^.typ=p2^.typ) AND
161            (p1^.left=p2^.left) AND (p1^.right=p2^.right) DO
162        p1:=p1^.next; p2:=p2^.next;
163      END;
164      RETURN (p1=NIL) AND (p2=NIL);
165    END EqualAct;
166
167  BEGIN
168    IF firstass=NIL
169      THEN sem:=0;
170      ELSE
171        p:=firstact;
172        WHILE (p<>NIL) AND NOT EqualAct(p^.firstass,firstass) DO
173          p:=p^.next;
174        END;
175        IF p=NIL
176          THEN (*new action*)
177            OpenSem(line,sem); EmitAssign(firstass);
178            Allocate(act,SIZE(Action));
179            act^.sem:=sem; act^.firstass:=firstass; act^.next:=NIL;
180            IF firstact=NIL
181              THEN firstact:=act
182              ELSE lastact^.next:=act
183            END;
184            lastact:=act;
185          ELSE (*same action found; delete recently stored assignments*)
186            sem:=p^.sem;
187            WHILE firstass<>NIL DO
188              q:=firstass; firstass:=firstass^.next; Deallocate(q);
189            END;
190        END;
191    END;
192    firstass:=NIL;
193  END EmitAction;
194
195
196  (* EmitAssign   Write attribute assignment
197  *)
198  PROCEDURE EmitAssign(p:Assignmentptr);
199  VAR
200    l:    CARDINAL;
201    name: Name;
202  BEGIN
203    WHILE p<>NIL DO
204      WriteLn(sem); WriteText(sem,blanks,margin);
205      GetName(p^.left,name,l);
206      CASE p^.typ OF
207        term:
208          WriteString(sem,"ASSIGN("); WriteText(sem,name,l);
209          WriteString(sem,",at["); WriteCard(sem,p^.right,0);
210          WriteString(sem,"]);");
211      | nonterm:
212          WriteText(sem,name,l); WriteString(sem,":=");
213          GetName(p^.right,name,l);
214          WriteText(sem,name,l); Write(sem,";");
215      | const:
216          WriteText(sem,name,l); WriteString(sem,":=");
217          WriteCard(sem,p^.right,0); Write(sem,";");
218      END; (*CASE*)
219      p:=p^.next;
220    END; (*WHILE*)
221  END EmitAssign;
222
223
224  (* GenAssign   Store attribute assignment
225  *)
226  PROCEDURE GenAssign(t:Attrtype; l,r:CARDINAL);
227  VAR ass: Assignmentptr;
228  BEGIN
229    IF (t=nonterm) AND (l=r) THEN RETURN; END;
230    Allocate(ass,SIZE(Assignment));
231    WITH ass^ DO typ:=t; left:=l; right:=r; next:=NIL; END;
232    IF firstass=NIL THEN firstass:=ass; ELSE lastass^.next:=ass; END;
233    lastass:=ass;
234  END GenAssign;
235
236
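GenAssign and EmitAction work as a pair: GenAssign queues attribute assignments (skipping a nonterminal assignment whose left and right sides are the same), and EmitAction turns the queued list into a semantic action, reusing the number of an earlier action when the same list of assignments has already been emitted. The Python class below is a minimal sketch of that bookkeeping under stated assumptions: `ActionTable`, its method names, and the starting action number are hypothetical, tuples replace the linked records of the listing, and no code is written to an output file.

```python
from typing import List, Tuple

# (typ, left, right) with typ one of "term", "nonterm", "const"
Assignment = Tuple[str, int, int]


class ActionTable:
    """Queue assignments and map equal assignment lists to one action number."""

    def __init__(self, first_number: int = 1) -> None:
        self._pending: List[Assignment] = []
        self._actions: List[Tuple[int, Tuple[Assignment, ...]]] = []
        self._next_number = first_number  # assumed start; the listing uses maxsem

    def gen_assign(self, typ: str, left: int, right: int) -> None:
        # An assignment of a nonterminal attribute to itself is dropped.
        if typ == "nonterm" and left == right:
            return
        self._pending.append((typ, left, right))

    def emit_action(self) -> int:
        """Return 0 if nothing is queued, else the number of the action."""
        pending = tuple(self._pending)
        self._pending.clear()
        if not pending:
            return 0
        for number, stored in self._actions:
            if stored == pending:  # same list already emitted: reuse it
                return number
        number = self._next_number
        self._next_number += 1
        self._actions.append((number, pending))
        return number
```

Sharing identical actions keeps the generated CASE statement small, since many rule occurrences pass the same attributes in the same positions.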
237  (* InsertFramePart   Insert middle part of semantic evaluator
238  *)
239  PROCEDURE InsertFramePart;
240  BEGIN
241    CopyFramePart(fram,sem,"-->actions");
242    margin:=9;
243  END InsertFramePart;
244
245
246  (* OpenFile   Open file for semantic evaluator
247  *)
248  PROCEDURE OpenFile(spix:CARDINAL);
249  VAR i,l: CARDINAL;
250  BEGIN
251    GetName(spix,gram,l); graml:=l;
252    FOR i:=1 TO graml DO semname[i]:=gram[i]; END;
253    semname[l+1]:="s"; semname[l+2]:="e"; semname[l+3]:="m";
254    semname[l+4]:="."; semname[l+5]:="D"; semname[l+6]:="E";
255    semname[l+7]:="F"; semname[l+8]:=0C;
256
257    Open(sem,src^.volRef,semname,TRUE);   (*definition module*)
258    Open(fram,src^.volRef,"cocosemframe",FALSE);
259    IF NOT Done THEN
260      SemErr(25,line,col);
261      WriteString(con,"The file 'cocosemframe' must be in the same ");
262      WriteString(con,"subdirectory as the input grammar.$Aborted.$");
263      CompErr(4)
264    END;
265    CopyFramePart(fram,sem,"-->modulename");
266    WriteText(sem,gram,graml); WriteString(sem,"sem");
267    CopyFramePart(fram,sem,"-->modulename");
268    WriteText(sem,gram,graml); WriteString(sem,"sem");
269    CopyFramePart(fram,sem,"-->implementation");
270    Close(sem);
271
272    semname[l+5]:="M"; semname[l+6]:="O"; semname[l+7]:="D";
273    Open(sem,src^.volRef,semname,TRUE);   (*implementation module*)
274    CopyFramePart(fram,sem,"-->modulename");
275    WriteText(sem,gram,graml); WriteString(sem,"sem");
276    CopyFramePart(fram,sem,"-->scannername");
277    WriteText(sem,gram,graml); WriteString(sem,"lex");
278    CopyFramePart(fram,sem,"-->declarations");
279    filesopen:=TRUE;
280  END OpenFile;
281
282
283  (* OpenSem   Write start of new semantic action
284  *)
285  PROCEDURE OpenSem(line:CARDINAL; VAR nr:CARDINAL);
286  BEGIN
287    INC(maxsem); nr:=maxsem;
288    WriteString(sem,"$  | "); WriteCard(sem,maxsem,3);
289    WriteString(sem,": (*line "); WriteCard(sem,line,0);
290    WriteString(sem,"*)");
291  END OpenSem;
292
293
294  (* StartCopy   Set leftmost column in semantic action
295  *)
296  PROCEDURE StartCopy (col:CARDINAL);
297  BEGIN leftcol:=col; lasttyp:=eolsy; lastcol:=99; END StartCopy;
298
299
300  BEGIN (*cocogen*)
301    (*   12345678901234567890*)
302    op:="                   =.|()[]{}<>;:,"; (*"=" must start at pos. 20*)
303    maxsem:=10; margin:=0; firstact:=NIL; firstass:=NIL; filesopen:=FALSE;
304  END cocogen.

act            155 178 179 179 179 181 182 184
Action         27 29 178
Actionptr      27 32 43 48 155
Allocate       14 178 230
ass            227 230 231 232 232 233
Assignment     28 34 230
Assignmentptr  28 31 38 44 49 59 156 158 198 227
at             10 111 112 115
Attrtype       35 226
blanks         17 84 86 204
C              255
ch             78 115 116 116 118 129 133 135 138 139 139 141 143 144 146
Close          12 69 69 270
CloseFile      64 71
cocogen        8 304
cocolex        10
col            10 76 82 85 86 121 260 296 297
commasy        23 54
CompErr        11 263
con            12 261 262
const          215
Copy           76 122
CopyFramePart  66 68 127 148 241 265 267 269 274 276 278
Deallocate     14 188
Done           12 259
EmitAction     153 193
EmitAssign     59 177 198 221
eof            134 138
eolsy          24 88 297
EqualAct       158 165 172
Errors         11
f1             127 133 134 138 139 146
f2             127 142 143 144
File           12 45 55 127
FileIO         12
filesopen      70 279 303
firstact       43 171 180 181 303
firstass       31 44 168 172 172 177 179 179 187 188 188 188 192 232 232 303
FORWARD        59
fram           45 66 68 69 241 258 265 267 269 274 276 278
GenAssign      226 234
GetName        10 111 205 213 251
gram           46 67 251 252 266 268 275 277
graml          47 67 251 252 266 268 275 277
HIGH           138
i              79 130 137 138 138 139 139 141 142 249 252 252 252
ident            19 116
InsertFramePart  239 243
l                79 111 111 200 205 208 212 213 214 216 226 229 231 249 251 251 253 253 253 254 254 254 255 255 272 272 272
lastact          48 182 184
lastass          49 232 233
lastcol          50 82 121 297
lasttyp          51 88 90 116 121 297
left             36 161 161 205 231
leftcol          52 85 86 297
line             10 153 177 260 285 289
lparsy           22
margin           53 84 204 242 303
maxsem           287 287 288 303
name             80 111 111 201 205 208 212 213 214 216
Name             40 46 56 80 201
next             32 38 162 162 173 179 182 188 219 231 232
nonterm          211 229
nr               285 287
number           21 90 90
op               54 114 302
Open             12 257 258 273
OpenFile         248 280
OpenSem          177 285 291
p                59 155 171 172 172 173 173 175 186 198 203 205 206 209 213 217 219 219
p1               158 160 160 161 161 162 162 164
p2               158 160 160 161 161 162 162 164
q                156 188 188
r                226 229 231
Read             12 133 139 146
right            37 161 161 209 213 217 231
s                127 133 138 138 141
sem              30 55 66 67 67 68 69 83 84 86 91 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 111 112 114 118 153 169 177 179 179 186 186 204 204 208 208 209 209 210 212 212 214 214 216 216 217 217 241 257 265 266 266 267 268 268 269 270 273 274 275 275 276 277 277 278 288 288 289 289 290
SemErr           11 260
semname          56 252 253 253 253 254 254 254 255 255 257 272 272 272 273
spix             248 251
src              10 257 258 273
startch          129 133 135
StartCopy        296 297
string           20
System           14
t                131 139 142 226 229 231
term             207
typ              35 76 90 93 114 121 160 160 206 231
volRef           257 258 273
Write            12 91 114 118 143 144 214 217
WriteCard        13 112 209 217 288 289
WriteLn          13 83 204
WriteString      13 67 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 208 209 210 212 216 261 262 266 268 275 277 288 289 290
WriteText        13 67 84 86 111 142 204 208 212 214 216 266 268 275 277
(* cocogen2: Generator for syntax files                    Moe 1.2.84

   This module generates the parser. It
   a) translates the top-down graph into G-code
   b) copies text from the parser frame, inserting the declarations of
      the table sizes
   c) writes the parser tables
   d) prints statistical information about the compilation
*)
DEFINITION MODULE cocogen2;

PROCEDURE GenSynFiles;
(* Generates the parser and the parser tables*)

PROCEDURE PutStatistics;
(* Writes statistics about the compilation to the list file*)

END cocogen2.
  1  (* cocogen2: Generator for syntax files                    Moe 1.2.84
  2
  3     This module generates the parser. It
  4     a) translates the top-down graph into G-code
  5     b) copies text from the parser frame, inserting the declarations of
  6        the table sizes
  7     c) writes the parser tables
  8     d) prints statistical information about the compilation
  9  *)
 10  IMPLEMENTATION MODULE cocogen2;
 11
 12  FROM cocogen IMPORT maxsem, CopyFramePart;
 13  FROM cocogra IMPORT alts, maxn, rootloc, rules, GetNode, Graphnode;
 14  FROM cocolex IMPORT line, col, GetName;
 15  FROM cocolst IMPORT lst;
 16  FROM cocosym IMPORT gramspix, maxany, maxeps, maxt, maxp, maxs, GetA,
 17                      GetE, GetF, GetSy, RepSy, Symbolnode, Symbolset,
 18                      Symboltype;
 19  FROM Errors  IMPORT CompErr, SemErr;
 20  FROM FileIO  IMPORT con, File, Done, Open, Close, Write, WriteCard,
 21                      WriteString, WriteText, WriteLn;
 22  FROM System  IMPORT Allocate, Deallocate;
 23  FROM SYSTEM  IMPORT VAL;
 24
 25  CONST (*for G-code*)
 26    lmaxc = 3000;                         (*G-code length*)
 27
 28  TYPE
 29    Filename = ARRAY[1..30] OF CHAR;
 30    Instruction=(tc,tac,ntc,ntac,ntsc,ntasc,anyc,anyac,epsc,epsac,jmpc,retc);
 31
 32  VAR
 33    code:    ARRAY[1..lmaxc] OF [0..255]; (*G-code area*)
 34    pc:      CARDINAL;                    (*index in code*)
 35    maxname: CARDINAL;                    (*length of name list*)
 36    first:   BOOLEAN;                     (*used for printing of tables*)
 37    ic:      CARDINAL;                    (*initialization counter*)
 38    c:       RECORD
 39               CASE :BOOLEAN OF
 40                 TRUE:  ch: ARRAY[1..2] OF CHAR;
 41               | FALSE: card: CARDINAL;
 42               END;
 43             END;
 44
 45
 46  PROCEDURE OutByte(VAR f:File; ch:CHAR); FORWARD;
 47  PROCEDURE OutWord(VAR f:File; n:CARDINAL); FORWARD;
 48  PROCEDURE PrintTables(VAR f:File); FORWARD;
 49  PROCEDURE WriteConstDecl(VAR f:File;t:ARRAY OF CHAR;n:CARDINAL); FORWARD;
 50
 51
 52
 53  MODULE LABMOD; (* G-code labels
 54  ====================================================================*)
 55  IMPORT
 56    code, CompErr, Allocate, Deallocate;
 57  EXPORT
 58    GetAdr, labact, NewAdr, Visited;
 59
 60  TYPE
 61    Fixupptr = POINTER TO Fixup;
 62    Fixup = RECORD
 63      adr: CARDINAL;     (*G-code address*)
 64      next: Fixupptr;    (*to next fixup*)
 65    END;
 66    Labeladr = RECORD
 67      loc,adr: CARDINAL; (*node address and corresponding G-code address*)
 68      fix: Fixupptr;     (*to first fixup*)
 69    END;
 70  VAR
 71    lab: ARRAY[1..70] OF Labeladr;
 72    labact: CARDINAL;
 73
 74
 75  PROCEDURE GetAdr(loc,fixup:CARDINAL; VAR adr:CARDINAL);
 76  VAR
 77    i: CARDINAL;
 78    fp: Fixupptr;
 79  BEGIN
 80    i:=1;
 81    WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
 82    IF i>labact
 83      THEN (*new label*)
 84        INC(labact); lab[i].loc:=loc; lab[i].adr:=0;
 85        Allocate(fp,SIZE(Fixup));
 86        fp^.adr:=fixup; fp^.next:=NIL; lab[i].fix:=fp;
 87      ELSE (*old label*)
 88        IF lab[i].adr=0 THEN (*not yet resolved*)
 89          Allocate(fp,SIZE(Fixup)); fp^.adr:=fixup; fp^.next:=lab[i].fix;
 90          lab[i].fix:=fp;
 91        END;
 92    END;
 93    adr:=lab[i].adr;
 94  END GetAdr;
 95
 96
 97  PROCEDURE NewAdr(loc,adr:CARDINAL);
 98  VAR
 99    i: CARDINAL;
100    p,q: Fixupptr;
101  BEGIN
102    i:=1;
103    WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
104    IF i>labact
105      THEN (*new label*)
106        INC(labact); lab[i].loc:=loc; lab[i].adr:=adr; lab[i].fix:=NIL;
107      ELSE (*old label*)
108        IF lab[i].adr=0
109          THEN (*resolve fixups*)
110            p:=lab[i].fix;
111            WHILE p<>NIL DO
112              code[p^.adr]:=adr DIV 256;
113              code[p^.adr+1]:=adr MOD 256;
114              q:=p; p:=p^.next; Deallocate(q);
115            END;
116            lab[i].adr:=adr; lab[i].fix:=NIL;
117          ELSE (*fixups already resolved*)
118            CompErr(6);
119        END;
120    END;
121  END NewAdr;
122
123
124  PROCEDURE Visited(loc:CARDINAL): BOOLEAN;
125  VAR i: CARDINAL;
126  BEGIN
127    i:=1;
128    WHILE (i<=labact) AND (lab[i].loc<>loc) DO INC(i); END;
129    RETURN (i<=labact) AND (lab[i].adr>0);
130  END Visited;
131
132
133  BEGIN (*LABMOD*)
134    labact:=0;
135  END LABMOD;
136
137
138  (* Emit  Emit G-code byte
139  *)
140  PROCEDURE Emit (byte:CARDINAL);
141  BEGIN code[pc]:=byte; INC(pc); END Emit;
142
143
144  (* Emit2  Emit G-code word
145  *)
146  PROCEDURE Emit2(word:CARDINAL);
147  BEGIN
148    code[pc]:=word DIV 256; code[pc+1]:=word MOD 256;
149    INC(pc,2);
150  END Emit2;
151
152
153  (* GenCode  Generate G-code for TDG in loc
154  *)
155  PROCEDURE GenCode(loc:CARDINAL);
156  VAR
157    adr: CARDINAL;
158    gn: Graphnode;
159  BEGIN
160    IF Visited(loc) THEN RETURN; END;
161    NewAdr(loc,pc); (*now coming to address loc*)
162    GetNode(loc,gn);
163    WITH gn DO
164      CASE typ OF
165        t:   IF lp=0
166               THEN Emit(ORD(tc)); Emit(sp);
167               ELSE
168                 GetAdr(lp,pc+2,adr);
169                 Emit(ORD(tac)); Emit(sp); Emit2(adr);
170             END;
171      | nt:  IF lp=0
172               THEN IF sem1=0
173                      THEN Emit(ORD(ntc)); Emit(sp);
174                      ELSE Emit(ORD(ntsc)); Emit(sp); Emit(sem1);
175                    END;
176               ELSE
177                 GetAdr(lp,pc+2,adr);
178                 IF sem1=0
179                   THEN Emit(ORD(ntac)); Emit(sp); Emit2(adr);
180                   ELSE Emit(ORD(ntasc)); Emit(sp); Emit2(adr); Emit(sem1);
181                 END;
182             END;
183      | any: IF lp=0
184               THEN Emit(ORD(anyc));
185               ELSE
186                 GetAdr(lp,pc+2,adr);
187                 Emit(ORD(anyac)); Emit(sp); Emit2(adr);
188             END;
189      | eps: IF sp<>0 THEN
190               IF lp=0
191                 THEN Emit(ORD(epsc)); Emit(sp);
192                 ELSE
193                   GetAdr(lp,pc+2,adr);
194                   Emit(ORD(epsac)); Emit(sp); Emit2(adr);
195               END;
196             END;
197      END; (*CASE*)
198      IF sem2<>0 THEN Emit(sem2); END;
199      IF sem3<>0 THEN Emit(sem3); END;
200      IF rp=0 THEN Emit(ORD(retc));
201      ELSIF Visited(rp) THEN
202        GetAdr(rp,pc+1,adr); Emit(ORD(jmpc)); Emit2(adr);
203      END;
204      IF rp>0 THEN GenCode(rp); END;
205      IF lp>0 THEN GenCode(lp); END;
206    END; (*WITH*)
207  END GenCode;
208
209
210  (* GenSynFiles  Generates files for syntax analysis
211  *)
212  PROCEDURE GenSynFiles;
213  VAR
214    fn: Filename;
215    fram: File;          (*file with parser frame*)
216    graml: CARDINAL;     (*length of grammar name*)
217    gramname: Filename;  (*grammar name*)
218    i,j,l: CARDINAL;
219    name: ARRAY[1..50] OF CHAR;
220    startpc: CARDINAL;
221    sn: Symbolnode;
222    syn: File;           (*file for generated parser*)
223  BEGIN
224    pc:=1;
225    FOR i:=maxp+1 TO maxs DO
226      labact:=0; startpc:=pc;
227      GetSy(i,sn);
228      GenCode(sn.start);
229      sn.start:=startpc;
230      RepSy(i,sn);
231    END;
232    startpc:=pc; GenCode(rootloc);
233
234    maxname:=4; (*"EOF"+0C*)
235    FOR i:=1 TO maxs DO
236      GetSy(i,sn); GetName(sn.aliasspix,name,l);
237      sn.spix:=maxname+1; RepSy(i,sn); INC(maxname,l+1);
238      (*sn.spix becomes a pointer in the generated name list*)
239    END;
240
241    GetName(gramspix,gramname,graml);
242
243    (* generate parser*)
244    FOR i:=1 TO graml DO fn[i]:=gramname[i]; END;
245    fn[graml+1]:="s"; fn[graml+2]:="y"; fn[graml+3]:="n"; fn[graml+4]:=".";
246    fn[graml+5]:="D"; fn[graml+6]:="E"; fn[graml+7]:="F"; fn[graml+8]:=0C;
247    Open(syn,lst^.volRef,fn,TRUE);
248    Open(fram,lst^.volRef,"cocosynframe",FALSE);
249    IF NOT Done THEN
250      WriteString(con,"The file 'cocosynframe' must be in the same ");
251      WriteString(con,"subdirectory as the input grammar.$");
252      SemErr(21,line,col); CompErr(5);
253    END;
254    CopyFramePart(fram,syn,"-->modulename"); (*definition module*)
255    WriteText(syn,gramname,graml); WriteString(syn,"syn");
256    CopyFramePart(fram,syn,"-->modulename");
257    WriteText(syn,gramname,graml); WriteString(syn,"syn");
258    CopyFramePart(fram,syn,"-->implementation");
259    Close(syn);
260
261    fn[graml+5]:="M"; fn[graml+6]:="O"; fn[graml+7]:="D";
262    Open(syn,lst^.volRef,fn,TRUE);
263
264    CopyFramePart(fram,syn,"-->modulename"); (*module name*)
265    WriteText(syn,gramname,graml); WriteString(syn,"syn");
266
267    CopyFramePart(fram,syn,"-->semantic analyzer"); (*various imports*)
268    WriteText(syn,gramname,graml); WriteString(syn,"sem");
269    CopyFramePart(fram,syn,"-->input module");
270    WriteText(syn,gramname,graml); WriteString(syn,"lex");
271
272    CopyFramePart(fram,syn,"-->declarations"); (*semantic declarations*)
273    WriteString(syn,"CONST$");
274    WriteConstDecl(syn," maxname =",maxname);
275    WriteConstDecl(syn," maxnamep =",maxs);
276    WriteConstDecl(syn," maxcode =",pc-1);
277    IF maxany=0
278      THEN WriteConstDecl(syn," maxany =",1);
279      ELSE WriteConstDecl(syn," maxany =",maxany);
280    END;
281    IF maxeps=0
282      THEN WriteConstDecl(syn," maxeps =",1);
283      ELSE WriteConstDecl(syn," maxeps =",maxeps);
284    END;
285    WriteConstDecl(syn," maxt =",maxt);
286    WriteConstDecl(syn," maxp =",maxp);
287    WriteConstDecl(syn," maxs =",maxs);
288    WriteConstDecl(syn," startpc =",startpc);
289    WriteString(syn,"$ ");
290
291    CopyFramePart(fram,syn,"-->tables");
292    PrintTables(syn);
293    CopyFramePart(fram,syn,"-->modulename"); (*module name*)
294    WriteText(syn,gramname,graml); WriteString(syn,"syn");
295    CopyFramePart(fram,syn,"$$$");
296    Close(fram); Close(syn);
297  END GenSynFiles;
298
299
300  (* OutByte  Write a byte value to tables file
301  *)
302  PROCEDURE OutByte(VAR f:File; ch:CHAR);
303  BEGIN
304    IF first
305      THEN c.ch[1]:=ch;
306      ELSE c.ch[2]:=ch; OutWord(f,c.card);
307    END;
308    first:=NOT first;
309  END OutByte;
310
311
312  (* OutWord  Write a word to tables file
313  *)
314  PROCEDURE OutWord(VAR f:File; n:CARDINAL);
315  BEGIN
316    IF ic=10 THEN
317      WriteString(f,"$ "); ic:=0
318    END;
319    WriteCard(f,n,5); Write(f,",");
320    INC(ic);
321  END OutWord;
322
323
324  (* PrintTables  Write out an initialization of the grammar tables
325  *)
326  PROCEDURE PrintTables(VAR f:File);
327  VAR
328    i,j,l: CARDINAL;
329    name: ARRAY[1..50] OF CHAR;
330    s: Symbolset;
331    sn: Symbolnode;
332
333  BEGIN
334    first:=TRUE; WriteString(f," INLINE($ "); ic:=0;
335    OutWord(f,pc-1);                  (*header(table lengths)*)
336    OutWord(f,maxt);
337    OutWord(f,maxp);
338    OutWord(f,maxs);
339    OutWord(f,maxeps);
340    OutWord(f,maxany);
341    OutWord(f,maxs);
342    OutWord(f,maxname);
343    WriteString(f,"$(*--G-code--*)$ "); ic:=0;
344    FOR i:=1 TO pc-1 DO               (*G-code*)
345      OutByte(f,CHR(code[i]));
346    END;
347    IF ODD(pc-1) THEN
348      OutByte(f,0C);
349    END;
350    WriteString(f,"$(*--nt-symbols--*)$ "); ic:=0;
351    FOR i:=maxp+1 TO maxs DO          (*nt-symbols*)
352      GetSy(i,sn);
353      OutWord(f,sn.start);
354      OutWord(f,ORD(sn.del)*256);
355      GetF(i,s);
356      FOR j:=0 TO maxt DIV 16 DO
357        OutWord(f,VAL(CARDINAL,s[j]));
358      END;
359    END;
360    WriteString(f,"$(*--eps followers--*)$ "); ic:=0;
361    FOR i:=1 TO maxeps DO             (*followers of eps nodes*)
362      GetE(i,s);
363      FOR j:=0 TO maxt DIV 16 DO
364        OutWord(f,VAL(CARDINAL,s[j]));
365      END;
366    END;
367    IF maxeps=0 THEN
368      FOR j:=0 TO maxt DIV 16 DO
369        OutWord(f,0);
370      END;
371      maxeps:=1; (*dummy*)
372    END;
373    WriteString(f,"$(*--any sets--*)$ "); ic:=0;
374    FOR i:=1 TO maxany DO             (*any-sets*)
375      GetA(i,s);
376      FOR j:=0 TO maxt DIV 16 DO
377        OutWord(f,VAL(CARDINAL,s[j]));
378      END;
379    END;
380    IF maxany=0 THEN
381      FOR j:=0 TO maxt DIV 16 DO
382        OutWord(f,0);
383      END;
384      maxany:=1; (*dummy*)
385    END;
386    WriteString(f,"$(*--attribute numbers--*)$ "); ic:=0;
387    FOR i:=0 TO maxp DO               (*attribute numbers*)
388      GetSy(i,sn);
389      OutWord(f,sn.nra);
390    END;
391    WriteString(f,"$(*--pragma semantic--*)$ "); ic:=0;
392    OutWord(f,0); OutWord(f,0);       (*dummy psem*)
393    FOR i:=maxt+1 TO maxp DO          (*pragma semantic*)
394      GetSy(i,sn);
395      OutWord(f,sn.sem1);
396      OutWord(f,sn.sem2);
397    END;
398    WriteString(f,"$(*--name pointers--*)$ "); ic:=0;
399    OutWord(f,1);                     (*for eofsy*)
400    FOR i:=1 TO maxs DO               (*name pointers*)
401      GetSy(i,sn);
402      (*sn.spix is now a pointer in the generated name list*)
403      OutWord(f,sn.spix);
404    END;
405    WriteString(f,"$(*--name list--*)$ "); ic:=0;
406    OutByte(f,"E"); OutByte(f,"O");
407    OutByte(f,"F"); OutByte(f,0C);
408    FOR i:=1 TO maxs DO               (*name list*)
409      GetSy(i,sn);
410      GetName(sn.aliasspix,name,l);
411      FOR j:=1 TO l DO OutByte(f,name[j]); END;
412      OutByte(f,0C);
413    END;
414    IF ODD(maxname) THEN OutByte(f,0C); END;
415    WriteString(f,"0);$");
416  END PrintTables;
417
418
419  (* PutStatistics  Writes statistics about compilation to list file
420  *)
421  PROCEDURE PutStatistics;
422  VAR
423    ptrsize: CARDINAL;
424    setsize: CARDINAL;
425    storage: CARDINAL;
426  BEGIN
427    ptrsize:=2; setsize:=2*((maxt DIV 16)+1);
428    storage:=pc-1 +                             (*G-code*)
429             (ptrsize+2+setsize)*(maxs-maxp) +  (*ntsymbols*)
430             setsize*maxeps +                   (*eps-followers*)
431             setsize*maxany +                   (*any-sets*)
432             2*(maxp+1) +                       (*nra*)
433             4*(maxp-maxt+1) +                  (*ps*)
434             2*(maxs+1) +                       (*namep*)
435             maxname +                          (*name*)
436             16;                                (*header*)
437    WriteLn(lst); WriteString(lst,"Statistics:"); WriteLn(lst);
438    WriteCard(lst,rules,5); WriteString(lst," rules"); WriteLn(lst);
439    WriteCard(lst,alts,5); WriteString(lst," alternatives"); WriteLn(lst);
440    WriteCard(lst,maxn,5); WriteString(lst," nodes"); WriteLn(lst);
441    WriteCard(lst,maxsem-10,5); WriteString(lst," semantic actions");
442    WriteLn(lst);
443    WriteCard(lst,maxeps,5); WriteString(lst," eps with look ahead");
444    WriteLn(lst);
445    WriteCard(lst,maxany,5); WriteString(lst," any-sets"); WriteLn(lst);
446    WriteCard(lst,pc-1,5); WriteString(lst," bytes for G-code");
447    WriteLn(lst);
448    WriteCard(lst,storage,5);
449    WriteString(lst," bytes for grammar tables (total)"); WriteLn(lst);
450  END PutStatistics;
451
452
453  (* WriteConstDecl  Write constant declaration text
454  *)
455  PROCEDURE WriteConstDecl (VAR f:File; t:ARRAY OF CHAR; n:CARDINAL);
456  BEGIN
457    WriteString(f,t); WriteCard(f,n,4); WriteString(f,";$");
458  END WriteConstDecl;
459
460  END cocogen2.

adr        63 67 75 84 86 88 89 93 93 97 106 106 108 112 112 113 113 116 116 129 157 168 169 177 179 180 186 187 193 194 202 202
aliasspix  236 410
Allocate   22 56 85 89
alts       13 439
any        183
anyac      30 187
anyc       30 184
byte       140 141
c          38 305 306 306
C              246 348 407 412 414
card           41 306
ch             40 46 302 305 305 306 306
Close          20 259 296 296
cocogen        12
cocogen2       10 460
cocogra        13
cocolex        14
cocolst        15
cocosym        16
code           33 56 112 113 141 148 148 345
col            14 252
CompErr        19 56 118 252
con            20 250 251
CopyFramePart  12 254 256 258 264 267 269 272 291 293 295
Deallocate     22 56 114
del            354
Done           20 249
Emit           140 141 166 166 169 169 173 173 174 174 174 179 179 180 180 180 184 187 187 191 191 194 194 198 199 200 202
Emit2          146 150 169 179 180 187 194 202
eps            189
epsac          30 194
epsc           30 191
Errors         19
f              46 47 48 49 302 306 314 317 319 319 326 334 335 336 337 338 339 340 341 342 343 345 348 350 353 354 357 360 364 369 373 377 382 386 389 391 392 392 395 396 398 399 403 405 406 406 407 407 411 412 414 415 455 457 457 457
File           20 46 47 48 49 215 222 302 314 326 455
FileIO         20
Filename       29 214 217
first          36 304 308 308 334
fix            68 86 89 90 106 110 116
fixup          75 86 89
Fixup          61 62 85 89
Fixupptr       61 64 68 78 100
fn             214 244 245 245 245 245 246 246 246 246 247 261 261 261 262
FORWARD        46 47 48 49
fp             78 85 86 86 86 89 89 89 90
fram           215 248 254 256 258 264 267 269 272 291 293 295 296
GenCode        155 204 205 207 228 232
GenSynFiles    212 297
GetA           16 375
GetAdr         58 75 94 168 177 186 193 202
GetE           17 362
GetF           17 355
GetName        14 236 241 410
GetNode        13 162
GetSy          17 227 236 352 388 394 401 409
gn             158 162 163
graml          216 241 244 245 245 245 245 246 246 246 246 255 257 261 261 261 265 268 270 294
gramname       217 241 244 255 257 265 268 270 294
gramspix       16 241
Graphnode      13 158
i              77 80 81 81 81 82 84 84 86 88 89 90 93 99 102 103 103 103 104 106 106 106 108 110 116 116 125 127 128 128 128 129 129 218 225 227 230 235 236 237 244 244 244 328 344 345 351 352 355 361 362 374 375 387 388 393 394 400 401 408 409
ic             37 316 317 320 334 343 350 360 373 386 391 398 405
Instruction    30
j              218 328 356 357 363 364 368 376 377 381 411 411
jmpc           30 202
l              218 236 237 328 410 411
lab            71 81 84 84 86 88 89 90 93 103 106 106 106 108 110 116 116 128 129
labact         58 72 81 82 84 103 104 106 128 129 134 226
Labeladr       66 71
LABMOD         53 135
line           14 252
lmaxc          26 33
loc            67 75 81 81 84 84 97 103 103 106 106 124 128 128 155 160 161 162
lp             165 168 171 177 183 186 190 193 205 205
lst            15 247 248 262 437 437 437 438 438 438 439 439 439 440 440 440 441 441 442 443 443 444 445 445 445 446 446 447 448 449 449
maxany         16 277 279 340 374 380 384 431 445
maxeps         16 281 283 339 361 367 371 430 443
maxn           13 440
maxname        35 234 237 237 274 342 414 435
maxp           16 225 286 337 351 387 393 429 432 433
maxs           16 225 235 275 287 338 341 351 400 408 429 434
maxsem         12 441
maxt           16 285 336 356 363 368 376 381 393 427 433
n              47 49 314 319 455 457
name           219 236 329 410 411
NewAdr         58 97 121 161
next           64 86 89 114
nra            389
nt             171
ntac           30 179
ntasc          30 180
ntc            30 173
ntsc           30 174
ODD            347 414
Open           20 247 248 262
OutByte        46 302 309 345 348 406 406 407 407 411 412 414
OutWord        47 306 314 321 335 336 337 338 339 340 341 342 353 354 357 364 369 377 382 389 392 392 395 396 399 403
p              100 110 111 112 113 114 114 114
pc             34 141 141 148 148 149 161 168 177 186 193 202 224 226 232 276 335 344 347 428 446
PrintTables    48 292 326 416
ptrsize        423 427 429
PutStatistics  421 450
q              100 114 114
RepSy          17 230 237
retc           30 200
rootloc         13 232
rp              200 201 202 204 204
rules           13 438
s               330 355 357 362 364 375 377
sem1            172 174 178 180 395
sem2            198 198 396
sem3            199 199
SemErr          19 252
setsize         424 427 429 430 431
sn              221 227 228 229 230 236 236 237 237 331 352 353 354 388 389 394 395 396 401 403 409 410
sp              166 169 173 174 179 180 187 189 191 194
spix            237 403
start           228 229 353
startpc         220 226 229 232 288
storage         425 428 448
Symbolnode      17 221 331
Symbolset       17 330
Symboltype      18
syn             222 247 254 255 255 256 257 257 258 259 262 264 265 265 267 268 268 269 270 270 272 273 274 275 276 278 279 282 283 285 286 287 288 289 291 292 293 294 294 295 296
System          22
SYSTEM          23
t               49 165 455 457
tac             30 169
tc              30 166
typ             164
VAL             23 357 364 377
Visited         58 124 130 160 201
volRef          247 248 262
word            146 148 148
Write           20 319
WriteCard       20 319 438 439 440 441 443 445 446 448 457
WriteConstDecl  49 274 275 276 278 279 282 283 285 286 287 288 455 458
WriteLn         21 437 437 438 439 440 442 444 445 447 449
WriteString     21 250 251 255 257 265 268 270 273 289 294 317 334 343 350 360 373 386 391 398 405 415 437 438 439 440 441 443 445 446 449 457 457
WriteText       21 255 257 265 268 270 294
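The label handling of LABMOD (GetAdr, NewAdr, Visited) is a form of backpatching: a jump to a node whose G-code address is not yet known gets a placeholder, and the placeholder is overwritten once the address is defined. The following Python sketch illustrates the idea only. The class name `LabelTable` and its dictionaries are assumptions made for this illustration and replace the fixed array `lab` and the linked fixup lists of the listing.

```python
class LabelTable:
    """Maps graph-node locations to G-code addresses and patches forward references."""

    def __init__(self, code):
        self.code = code      # mutable list of byte values (the G-code area)
        self.resolved = {}    # loc -> final G-code address
        self.pending = {}     # loc -> code indices still waiting for that address

    def get_adr(self, loc, fixup):
        """Return the address of loc, or 0 (and remember fixup) if it is still unknown."""
        if loc in self.resolved:
            return self.resolved[loc]
        self.pending.setdefault(loc, []).append(fixup)
        return 0

    def new_adr(self, loc, adr):
        """Define the address of loc and patch every pending reference to it."""
        if loc in self.resolved:
            raise ValueError("label defined twice")
        for at in self.pending.pop(loc, []):
            self.code[at] = adr // 256       # high byte first, as in Emit2
            self.code[at + 1] = adr % 256    # low byte
        self.resolved[loc] = adr

    def visited(self, loc):
        """True if code for loc has already been generated."""
        return loc in self.resolved


code = [0] * 8
labels = LabelTable(code)
placeholder = labels.get_adr(loc=7, fixup=2)   # forward reference, address unknown
labels.new_adr(loc=7, adr=300)                 # patches code[2] and code[3]
```

As in the listing, an unresolved label yields the address 0, which is safe because G-code addresses start at 1.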
(* cocogra   Graph node list                               Moe 28.12.83

   This module builds and handles the top-down graph. It
   a) generates and updates single graph nodes
   b) concatenates graphs via left or right pointers
   c) prints the whole graph for tracing
   d) inserts eps nodes before deletable nonterminals with alternatives
   e) deletes redundant eps-nodes resulting from EBNF-constructs such as
      {x}y or [x]y
*)
DEFINITION MODULE cocogra;

FROM cocosym IMPORT Symboltype;

CONST
  maxnodes = 600;

TYPE
  Graphnode = RECORD
    typ:  Symboltype;  (*eps,t,pr,nt,any,err*)
    sp:   CARDINAL;    (*node symbol*)
    lp:   CARDINAL;    (*left pointer*)
    rp:   CARDINAL;    (*right pointer*)
    sem1: [0..255];    (*evaluation of in-attributes*)
    sem2: [0..255];    (*evaluation of out-attributes*)
    sem3: [0..255];    (*semantic action*)
    line: CARDINAL;    (*line number*)
    link: CARDINAL;    (*ptr to node with same right successor*)
  END;

  Marklist = ARRAY[0..maxnodes DIV 16] OF BITSET;

VAR
  maxn:    CARDINAL;   (*number of graph nodes*)
  alts:    CARDINAL;   (*number of alternatives, filled by AG*)
  rules:   CARDINAL;   (*number of grammar rules, filled by AG*)
  rootloc: CARDINAL;   (*root node of grammar, filled by AG*)

PROCEDURE ClearMarkList(VAR m:Marklist);
(* Clears the mark list m*)

PROCEDURE ConcatLeft(VAR gp,gl,gp1,gl1:CARDINAL);
(* Links the graph (gp,gl) with the graph (gp1,gl1) via left pointers.
   The resulting graph is identified by (gp,gl)*)

PROCEDURE ConcatRight(VAR gp,gl,gp1,gl1:CARDINAL);
(* Links the graph (gp,gl) with the graph (gp1,gl1) via right pointers.
   The resulting graph is identified by (gp,gl)*)

PROCEDURE Deletable(loc:CARDINAL): BOOLEAN;
(* TRUE if the graph with the root loc is deletable*)

PROCEDURE DeleteRedundantEps;
(* Deletes eps nodes in constructions {x}y and [x]y*)

PROCEDURE DelNode(gn:Graphnode): BOOLEAN;
(* TRUE if the node gn contains a deletable symbol*)

PROCEDURE GetNode(p:CARDINAL; VAR gn:Graphnode);
(* Gets the graph node with the index p*)

PROCEDURE GraphList;
(* Prints a test list of the top-down graphs of all rules*)

PROCEDURE Mark(loc:CARDINAL; VAR m:Marklist);
(* Marks loc in list m as visited*)

PROCEDURE Marked(loc:CARDINAL; VAR m:Marklist): BOOLEAN;
(* TRUE if loc is marked in m*)

PROCEDURE NewEpsBeforeDelNts;
(* Inserts eps nodes in front of deletable nt's*)

PROCEDURE NewNode(typ:Symboltype; sp,line:CARDINAL): CARDINAL;
(* Generates a new graph node with the specified values and returns
   its index*)

PROCEDURE RepNode(p:CARDINAL; gn:Graphnode);
(* Replaces the graph node with index p by gn*)

END cocogra.
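The two concatenation procedures can be illustrated with a small Python sketch. It assumes, judging from the implementation module, that `gp` is the root of a graph and `gl` is the head of a `link` chain of nodes whose right pointers are still open. Nodes are plain dictionaries with only the three pointer fields, index 0 stands for "no node", and the helper `new_node` is hypothetical.

```python
def new_node(nodes):
    """Append an empty node and return its index (index 0 is reserved for 'nil')."""
    nodes.append({"lp": 0, "rp": 0, "link": 0})
    return len(nodes) - 1


def concat_left(nodes, gp, gl, gp1, gl1):
    """Attach graph (gp1, gl1) as a further alternative of graph (gp, gl)."""
    p = gp
    while nodes[p]["lp"] != 0:       # find the last alternative
        p = nodes[p]["lp"]
    nodes[p]["lp"] = gp1
    p = gl
    while nodes[p]["link"] != 0:     # append the open right ends of the new graph
        p = nodes[p]["link"]
    nodes[p]["link"] = gl1
    return gp, gl


def concat_right(nodes, gp, gl, gp1, gl1):
    """Let every open right end of (gp, gl) continue with graph (gp1, gl1)."""
    p = gl
    while p != 0:
        nodes[p]["rp"] = gp1
        p = nodes[p]["link"]
    return gp, gl1                   # the open ends are now those of the second graph


nodes = [None]                       # slot 0 unused
a, b, c = new_node(nodes), new_node(nodes), new_node(nodes)
gp, gl = concat_left(nodes, a, a, b, b)      # a | b
gp, gl = concat_right(nodes, gp, gl, c, c)   # (a | b) c
```

The functions return the pair (gp, gl) because Python has no VAR parameters. In the listing the same effect is obtained by assigning to the VAR parameter `gl`.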
  1  (* cocogra   Graph node list for coco                      Moe 29.12.83
  2     =======   ========================
  3     This module builds and handles the top-down graph. It
  4     a) generates and updates single graph nodes
  5     b) concatenates graphs via left or right pointers
  6     c) prints the whole graph for tracing
  7     d) inserts eps nodes before deletable nonterminals with alternatives
  8     e) deletes redundant eps-nodes resulting from EBNF-constructs such as
  9        {x}y or [x]y
 10  *)
 11  IMPLEMENTATION MODULE cocogra;
 12
 13  FROM cocolex IMPORT ddt, GetName;
 14  FROM cocosym IMPORT maxp, maxs, GetSy, RepSy, Symbolnode,
 15                      Symboltype;
 16  FROM Errors  IMPORT Restriction;
 17  FROM FileIO  IMPORT con, WriteCard, WriteLn, WriteString,
 18                      WriteText;
 19
 20  TYPE Graphnodelist = ARRAY[1..maxnodes] OF Graphnode;
 21  VAR gn: Graphnodelist; (*syntax graph*)
 22
 23
 24  (* ClearMarkList  Clear mark list m
 25  *)
 26  PROCEDURE ClearMarkList (VAR m:Marklist);
 27  VAR i: CARDINAL;
 28  BEGIN FOR i:=0 TO maxnodes DIV 16 DO m[i]:={}; END; END ClearMarkList;
 29
 30
 31  (* ConcatLeft  Concatenate graph gp1 left to graph gp
 32  *)
 33  PROCEDURE ConcatLeft(VAR gp,gl,gp1,gl1:CARDINAL);
 34  VAR p: CARDINAL;
 35  BEGIN
 36    p:=gp;
 37    WHILE gn[p].lp<>0 DO p:=gn[p].lp; END;
 38    gn[p].lp:=gp1;
 39    p:=gl;
 40    WHILE gn[p].link<>0 DO p:=gn[p].link; END;
 41    gn[p].link:=gl1;
 42  END ConcatLeft;
 43
 44
 45  (* ConcatRight  Concatenate graph gp1 right to graph gp
 46  *)
 47  PROCEDURE ConcatRight(VAR gp,gl,gp1,gl1:CARDINAL);
 48  VAR p: CARDINAL;
 49  BEGIN
 50    p:=gl;
 51    WHILE p<>0 DO gn[p].rp:=gp1; p:=gn[p].link; END;
 52    gl:=gl1;
 53  END ConcatRight;
 54
 55
 56  (* Deletable  Check if graph in loc is deletable
 57  *)
 58  PROCEDURE Deletable(loc:CARDINAL):BOOLEAN;
 59  VAR m: Marklist;
 60
 61    PROCEDURE DelGraph(loc:CARDINAL):BOOLEAN;
 62    VAR gn:Graphnode;
 63    BEGIN
 64      IF loc=0 THEN RETURN TRUE; END; (*end of graph found*)
 65      IF Marked(loc,m) THEN RETURN FALSE; END;
 66      Mark(loc,m);
 67      GetNode(loc,gn);
 68      IF ddt["C"] THEN
 69        WriteString(con,"DelGraph:");
 70        WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),8);
 71        WriteCard(con,gn.sp,6); WriteLn(con);
 72      END;
 73      RETURN ((gn.lp<>0) AND DelGraph(gn.lp)) OR
 74             (DelNode(gn) AND DelGraph(gn.rp));
 75    END DelGraph;
 76
 77  BEGIN (*Deletable*)
 78    ClearMarkList(m);
 79    RETURN DelGraph(loc);
 80  END Deletable;
 81
 82
 83  (* DelNode  Test if node gn is deletable
 84  *)
 85  PROCEDURE DelNode(gn:Graphnode):BOOLEAN;
 86  VAR sn:Symbolnode;
 87  BEGIN
 88    IF gn.typ=nt
 89      THEN GetSy(gn.sp,sn); RETURN sn.del;
 90      ELSE RETURN gn.typ=eps;
 91    END;
 92  END DelNode;
 93
 94
 95  (* DeleteRedundantEps  Delete eps nodes in constructions {x}y and [x]y
 96  *)
 97  PROCEDURE DeleteRedundantEps;
 98  VAR
 99    m: Marklist;
100    i: CARDINAL;
101    sn: Symbolnode;
102
103    PROCEDURE DelEps(loc:CARDINAL);
104    VAR gn,gn1: Graphnode;
105    BEGIN
106      IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
107      Mark(loc,m);
108      GetNode(loc,gn);
109      WITH gn DO
110        IF lp<>0 THEN
111          GetNode(lp,gn1);
112          IF (gn1.typ=eps) AND (gn1.sem3=0)
113          AND (gn1.lp=0) AND (gn1.rp<>0) THEN
114            lp:=gn1.rp; RepNode(loc,gn);
115          END;
116        END;
117        DelEps(lp);
118        DelEps(rp);
119      END;
120    END DelEps;
121
122  BEGIN
123    ClearMarkList(m);
124    FOR i:=maxp+1 TO maxs DO
125      GetSy(i,sn); DelEps(sn.start);
126    END;
127  END DeleteRedundantEps;
128
129
130  (* GetNode  Get node gp
131  *)
132  PROCEDURE GetNode(gp:CARDINAL; VAR gn1:Graphnode);
133  BEGIN gn1:=gn[gp]; END GetNode;
134
135
136  (* GraphList  trace output of graph node list
137  *)
138  PROCEDURE GraphList;
139  VAR
140    i,j,l: CARDINAL;
141    name: ARRAY[1..80] OF CHAR;
142    sn: Symbolnode;
143  BEGIN
144    WriteString(con,"$$Topdown-graph:$$");
145    WriteString(con,"loc symbol typ lp rp");
146    WriteString(con," sem1 sem2 sem3 link line$$");
147    FOR i:=1 TO maxn DO
148      WriteCard(con,i,3); WriteString(con," ");
149      WITH gn[i] DO
150        CASE typ OF
151          eps,any:
152            WriteString(con," ");
153        | t,nt:
154            GetSy(sp,sn); GetName(sn.spix,name,l);
155            FOR j:=l+1 TO 12 DO name[j]:=" "; END;
156            WriteText(con,name,12);
157        | err:
158            WriteString(con,"error ");
159        END; (*CASE*)
160        CASE typ OF
161          eps: WriteString(con," eps ");
162        | t:   WriteString(con," t ");
163        | pr:  WriteString(con," pr ");
164        | nt:  WriteString(con," nt ");
165        | any: WriteString(con," any ");
166        ELSE;
167        END; (*CASE*)
168        WriteCard(con,lp,7); WriteCard(con,rp,7);
169        WriteCard(con,sem1,7); WriteCard(con,sem2,7);
170        WriteCard(con,sem3,7); WriteCard(con,link,7);
171        WriteCard(con,line,7); WriteLn(con);
172      END; (*WITH*)
173    END; (*FOR*)
174  END GraphList;
175
176
177  (* Mark  Marks node loc in m as visited
178  *)
179  PROCEDURE Mark(loc:CARDINAL; VAR m:Marklist);
180  BEGIN INCL(m[loc DIV 16],loc MOD 16); END Mark;
181
182
183  (* Marked  Tests if node loc is marked in m
184  *)
185  PROCEDURE Marked (loc:CARDINAL; VAR m:Marklist): BOOLEAN;
186  BEGIN RETURN (loc MOD 16) IN m[loc DIV 16]; END Marked;
187
188
189  (* NewEpsBeforeDelNts  Insert eps before del. nt's with alternatives
190  *)
191  PROCEDURE NewEpsBeforeDelNts;
192  VAR
193    gn,gn1: Graphnode;
194    loc,loc1,maxloc: CARDINAL;
195    sn: Symbolnode;
196  BEGIN
197    maxloc:=maxn;
198    FOR loc:=1 TO maxloc DO
199      GetNode(loc,gn);
200      IF (gn.typ=nt) AND (gn.lp<>0) AND DelNode(gn) THEN
201        loc1:=NewNode(gn.typ,gn.sp,gn.line);
202        gn1:=gn; gn1.lp:=0;
203        WITH gn DO
204          typ:=eps; sp:=0; rp:=loc1; sem1:=0; sem2:=0; sem3:=0;
205        END;
206        RepNode(loc1,gn1);
207        RepNode(loc,gn);
208      END;
209    END; (*FOR*)
210  END NewEpsBeforeDelNts;
211
212
213  (* NewNode  Generate a new graph node and return the index
214  *)
215  PROCEDURE NewNode(t:Symboltype; s:CARDINAL; l:CARDINAL): CARDINAL;
216  BEGIN
217    INC(maxn);
218    IF maxn>maxnodes THEN Restriction(5); END;
219    WITH gn[maxn] DO
220      typ:=t; sp:=s; lp:=0; rp:=0; sem1:=0; sem2:=0; sem3:=0;
221      line:=l; link:=0;
222    END;
223    RETURN maxn;
224  END NewNode;
225
226
227  (* RepNode  Replace node gp
228  *)
229  PROCEDURE RepNode(gp:CARDINAL; gn1:Graphnode);
230  BEGIN gn[gp]:=gn1; END RepNode;
231
232
233  BEGIN (*cocogra*)
234    maxn:=0;
235  END cocogra.
any                 151 165
ClearMarkList       26 28 78 123
cocogra             11 235
cocolex             13
cocosym             14
con                 17 69 70 70 71 71 144 145 146 148 148 152 156 158 161 162 163 164 165 168 168 169 169 170 170 171 171
ConcatLeft          33 42
ConcatRight         47 53
ddt                 13 68
del                 89
DelEps              103 117 118 120 125
Deletable           58 80
DeleteRedundantEps  97 127
DelGraph            61 73 74 75 79
DelNode             74 85 92 200
eps                 90 112 151 161 204
err                 157
Errors              16
FileIO              17
GetName             13 154
GetNode             67 108 111 132 133 199
GetSy               14 89 125 154
gl                  33 39 47 50 52
gl1                 33 41 47 52
gn                  21 37 37 38 40 40 41 51 51 62 67 70 71 73 73 74 74 85 88 89 90 104 108 109 114 133 149 193 199 200 200 200 201 201 201 202 203 207 219 230
gn1                 104 111 112 112 113 113 114 132 133 193 202 202 206 229 230
gp                  33 36 47 132 133 229 230
gp1                 33 38 47 51
GraphList           138 174
Graphnode           20 62 85 104 132 193 229
Graphnodelist       20 21
i                   27 28 28 100 124 125 140 147 148 149
INCL                180
j                   140 155 155
l                   140 154 155 215 221
line                171 201 221
link                40 40 41 51 170 221
loc                 58 61 64 65 66 67 70 79 103 106 106 107 108 114 179 180 180 185 186 186 194 198 199 207
loc1                194 201 204 206
lp                  37 37 38 73 73 110 111 113 114 117 168 200 202 220
m                   26 28 59 65 66 78 99 106 107 123 179 180 185 186
Mark                66 107 179 180
Marked              65 106 185 186
Marklist            26 59 99 179 185
maxloc              194 197 199
maxn                147 197 217 218 219 223 234
maxnodes            20 28 218
maxp                14 124
maxs                14 124
name                141 154 155 156
NewEpsBeforeDelNts  191 210
NewNode             201 215 224
nt                  88 153 164 200
p                   34 36 37 37 37 38 39 40 40 40 41 48 50 51 51 51 51
pr                  163
RepNode             114 206 207 229 230
Restriction         16 218
rp                  51 74 113 114 118 168 204 220
s                   215 220
sem1                169 204 220
sem2                169 204 220
sem3                112 170 204 220
sn                  86 89 89 101 125 125 142 154 154 195
sp                  71 89 154 201 204 220
spix                154
start               125
Symbolnode          14 86 101 142 195
Symboltype          15 215
t                   153 162 215 220
typ                 70 88 90 112 150 160 200 201 204 220
WriteCard           17 70 70 71 148 168 168 169 169 170 170 171
WriteLn             17 71 171
WriteString         17 69 144 145 146 148 152 158 161 162 163 164 165
WriteText           18 156
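The mark list used by Deletable and DeleteRedundantEps packs one "visited" bit per graph node into 16-bit sets, so that Mark and Marked reduce to a division and a bit operation. A minimal Python sketch of the same layout follows. It uses plain integers in place of Modula-2 BITSET values, and the function names are lower-case stand-ins chosen for this illustration.

```python
MAXNODES = 600   # same bound as maxnodes in cocogra
BITS = 16        # width of one BITSET word in the listing


def clear_mark_list():
    """Return a fresh mark list with no node marked."""
    return [0] * (MAXNODES // BITS + 1)


def mark(loc, m):
    """Mark node loc as visited (corresponds to INCL on the proper word)."""
    m[loc // BITS] |= 1 << (loc % BITS)


def marked(loc, m):
    """True if node loc has been marked (corresponds to the IN test)."""
    return (m[loc // BITS] >> (loc % BITS)) & 1 == 1


m = clear_mark_list()
mark(37, m)      # word 37 // 16 = 2, bit 37 % 16 = 5
```

With this layout the whole list for 600 nodes occupies 38 words, which is why the recursive graph walks can afford a local mark list per call.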
(* cocolex   Lexical analyzer for coco   Moe 83.03.27
   This is the Coco-scanner. It
   a) reads the input grammar
   b) returns symbol numbers and terminal attributes to the parser
   c) hashes names and strings into a name list (permanently or temporarily)
   d) converts number-strings to values
   All symbols which are not terminals of Cocol get the symbol type
   'nococosy' and are hashed into the name list.
*)
DEFINITION MODULE cocolex;
FROM FileIO IMPORT File;

VAR
  typ:  CARDINAL;                     (*next token code*)
  at:   ARRAY[1..10] OF CARDINAL;     (*attr. values of current token*)
  line: CARDINAL;                     (*current line number*)
  col:  CARDINAL;                     (*current column number*)
  ddt:  ARRAY["A".."Z"] OF BOOLEAN;   (*debug and test switches*)
  src:  File;                         (*source file*)

PROCEDURE GetName(spix:CARDINAL; VAR name:ARRAY OF CHAR; VAR len:CARDINAL);
(* Get the text of a name or a string with the spelling index spix.
   len denotes its length*)

PROCEDURE GetSy;
(* Gets the next input token and fills at, line and col*)

PROCEDURE RestartHash;
(* Causes identifiers and strings to be stored permanently*)

PROCEDURE StopHash;
(* Causes identifiers and strings to be stored temporarily*)

END cocolex.
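Item d) of the module comment, converting number-strings to values, is done by ReadNumber in the implementation module, which checks for overflow of a 16-bit CARDINAL before each multiplication. The sketch below restates that check in Python rather than Modula-2. The function read_number and its return tuple are assumptions made for illustration; the listing reports the overflow through SemErr(22, ...) instead of returning a flag.

```python
MAX_CARDINAL = 65535  # largest 16-bit CARDINAL, the limit the listing guards


def is_digit(ch):
    return "0" <= ch <= "9"


def read_number(text, pos=0):
    """Convert the digit run starting at text[pos].

    Returns (value, next_pos, overflow). On overflow the rest of the digit
    run is skipped and the value accumulated so far is kept.
    """
    val = 0
    overflow = False
    while pos < len(text) and is_digit(text[pos]):
        digit = ord(text[pos]) - ord("0")
        # Test before multiplying: 10*val + digit must stay <= 65535.
        if val > 6553 or (val == 6553 and digit > 5):
            overflow = True
            while pos < len(text) and is_digit(text[pos]):
                pos += 1
        else:
            val = 10 * val + digit
            pos += 1
    return val, pos, overflow
```

Testing before the multiplication means the accumulator itself never exceeds MAX_CARDINAL, which matters on a machine where CARDINAL arithmetic would wrap or trap.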
APP- cocolexMOD 275 X 2 3 4 5 6 7 8 9 10 U 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 (* cocolex: lexical analyzer for coco moe 83.03.27 83.12.23 *MS is the Coco-scanner. It a) reads the input grammar b! returns symbol numbers and terminal attributes to the parser J hashes names and strings into a name list (permanently or temporarily) d) converts number-strings to values Ml symbols which are not terminals of Cocol get the symbol type •nococosy' and are hashed into the name list. IMPLEMENTATION MODULE cocolex; FROM cocosyn IMPORT printinput, printnodes; IMPORT SemErr, Restriction; IMPORT con, EF, EOL, File, Read, Write, WriteCard, WriteString, WriteText; IMPORT VAL; *) FROM Errors FROM FilelO FROM SYSTEM CONST eofsy ident string number eqlsy periodsy variantsy lparsy rparsy lbracksy rbracksy Iconbrsy rconbrsy latparsy ratparsy = ■ = = = = = = = « = = = = « semicolonsy= colonsy commasy nococosy notyp buflen = = = = = 0; 17; 18; 19; 20; 21; 22; 23; 24; 25; 26; 27; 28; 29; 30; 31; 32; 33; 34; 255; 1024*16 (*lexical types*) (*numbers 1..16 reserved for keywords*) (* ( *) (* ) *) (* [ *) (* 1 *) TYPE Charclass = (none,letter,digit,quote,eql, period,variant,lpar,rpar,Ibrack, rbrack,Iconbr,rconbr,latpar,ratpar,semicolon,colon,comma,endf lie, endline,dollar,minus); VAR c: CHAR; class: ARRAY [0C..377C] OF Charclass; (*class OF input character*) *>uf: ARRAY[0. .buflen-1] OF CHAR; (*input buffer*) bp,bpmax:CARDINAL; (*buffer pointers*) CONST idmax = 4980; htmax = 359; (*max.length of identifier list*) (*max.length of hash table*)
276 Program listings APP.P 60 VAR 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 ch: column: i: id: idact: keys: ht: storeid: (* NextCh CHAR; CARDINAL; CARDINAL; ARRAY[0..idmax+20 CARDINAL; CARDINAL; ] OF CHAR; ARRAY[0..htmax] OF CARDINAL BOOLEAN; Get next PROCEDURE NextCh; BEGIN Read(src rch); INC(column); END NextCh; (* Hash PROCEDURE VAR h,l,d: (♦current input character*) (*start of current column*) (♦identifiers*) (*last element IN id*) (*pos. OF last keyword IN id*) ; (*hash table*) (*store id. permanently?*) input character (chrcolumn global) Hash an identifier and return its spix Hash(idp:CARDINAL; INTEGER; VAR spix: CARDINAL); PROCEDURE EqualId(x,y,l:CARDINAL):BOOLEAN; VAR i: CARDINAL; BEGIN i:«0; WHILE RETURl* (i<l) AND (id[x+i] 1 i=l; END EqualId; »id[y+i]) DO INC(i); END; 91 92 BEGIN 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 (* EnterKey 113 l:=idp-idact; spix:=idact+l; h:=(ORD(id[spix])*7 + ORD(id[spix+1)) + 1) * 17 MOD htmax; d:= -htmax; LOOP IF ht[h]=0 THEN (*new identifier*) IF storeid THEN ht[h]:=spix; idact:=idp; END; EXIT; ELSIF Equalld(ht[h],spix,l) THEN spix:=ht[h); EXIT; ELSE INC(dr2); IF d=htmax THEN Restriction(1); END; h:=(h+ABS(d)) MOD htmax; END; END; (*LOOP*) IF idp>idmax THEN Restriction(2); END; END Hash; (*old identifier*) (♦collision*) (*hash table full*) (*identifier list full*) Enter a keyword to the identifier list 114 PROCEDURE EnterKey(sy:CARDINAL; key:ARRAY OF CHAR); 115 VAR idp,i: INTEGER; 116 BEGIN 117 INC(idact); id[idact]:=CHR(sy); idp:=idact; (*store symbol number*) 118 FOR i:=0 TO HIGH(key) DO (*store keyword*)
119     INC(idp); id[idp]:=key[i];
120   END;
121   INC(idp); id[idp]:=0C;
122   Hash(idp,keys);   (*keys contains the last keyword spix at any time*)
123 END EnterKey;
124
125
126 (* GetName   Get the name of an identifier from the name list
127 *)
128 PROCEDURE GetName(spix:CARDINAL; VAR name:ARRAY OF CHAR; VAR l:CARDINAL);
129 VAR i,h:CARDINAL;
130 BEGIN
131   i:=spix; l:=0; h:=HIGH(name);
132   WHILE (id[i]<>0C) AND (l<=h) DO
133     name[l]:=id[i]; INC(i); INC(l);
134   END;
135 END GetName;
136
137
138 (* ReadName   Read identifier or keyword
139 *)
140 PROCEDURE ReadName(VAR typ,val:CARDINAL);
141 VAR spix,idp: CARDINAL;
142 BEGIN
143   idp:=idact;
144   WHILE (class[ch]=letter) OR (class[ch]=digit) DO
145     INC(idp); id[idp]:=ch; NextCh;
146   END;
147   INC(idp); id[idp]:=0C;
148   Hash(idp,spix);
149   IF spix<=keys
150     THEN typ:=ORD(id[spix-1]); val:=0;   (*keyword*)
151     ELSE typ:=ident; val:=spix;          (*identifier*)
152   END;
153 END ReadName;
154
155
156 (* ReadString   Read and hash a string
157 *)
158 PROCEDURE ReadString(VAR spix:CARDINAL);
159 VAR
160   och: CHAR;
161   idp: CARDINAL;
162 BEGIN
163   idp:=idact; och:=ch;
164   INC(idp); id[idp]:=och; NextCh;   (*store quote*)
165   LOOP
166     IF ch=och THEN NextCh; EXIT;
167     ELSIF ch=EF THEN SemErr(24,line,col); EXIT;
168     ELSIF ch=EOL THEN SemErr(23,line,col); EXIT;
169     ELSE INC(idp); id[idp]:=ch; NextCh; END;
170   END;
171   INC(idp); id[idp]:=och;   (*store quote*)
172   INC(idp); id[idp]:=0C;
173   Hash(idp,spix)
174 END ReadString;
175
176
177 (* RestartHash   Causes identifiers to be stored permanently
178 *)
179 PROCEDURE RestartHash;
180 BEGIN storeid:=TRUE; END RestartHash;
181
182
183 (* StopHash   Causes identifiers to be stored temporarily
184 *)
185 PROCEDURE StopHash;
186 BEGIN storeid:=FALSE; END StopHash;
187
188
189 (* ReadNumber   Read and convert cardinal constant
190 *)
191 PROCEDURE ReadNumber(VAR val:CARDINAL);
192 BEGIN
193   val:=0;
194   WHILE class[ch]=digit DO
195     IF (val>6553) OR ((val=6553) AND (ch>'5'))
196     THEN
197       SemErr(22,line,col);
198       WHILE class[ch]=digit DO NextCh; END;
199     ELSE
200       val:=10*val+VAL(CARDINAL,ORD(ch)-ORD('0'));
201       NextCh;
202     END;
203   END;
204 END ReadNumber;
205
206
207 (* GetSy   get next lexical symbol
208 *)
209 PROCEDURE GetSy;
210 VAR val:CARDINAL;
211 BEGIN
212   REPEAT
213     WHILE ch=' ' DO NextCh; END;
214     col:=column;
215     CASE class[ch] OF
216       none:      typ:=nococosy; at[1]:=ORD(ch); NextCh;
217     | letter:    ReadName(typ,val);
218                  IF typ=ident THEN at[1]:=val; END;
219     | digit:     ReadNumber(at[1]); typ:=number;
220     | quote:     ReadString(at[1]); typ:=string;
221     | eql:       typ:=eqlsy; NextCh;
222     | period:    typ:=periodsy; NextCh;
223     | variant:   typ:=variantsy; NextCh;
224     | lpar:      typ:=lparsy; NextCh;
225     | rpar:      typ:=rparsy; NextCh;
226     | lbrack:    typ:=lbracksy; NextCh;
227     | rbrack:    typ:=rbracksy; NextCh;
228     | lconbr:    typ:=lconbrsy; NextCh;
229     | rconbr:    typ:=rconbrsy; NextCh;
230     | latpar:    typ:=latparsy; NextCh;
231     | ratpar:    typ:=ratparsy; NextCh;
232     | semicolon: typ:=semicolonsy; NextCh;
233     | colon:     typ:=colonsy; NextCh;
234     | comma:     typ:=commasy; NextCh;
235     | endfile:   typ:=eofsy;
236     | endline:   typ:=notyp;
237                  column:=0; INC(line); NextCh;
238                  IF (line MOD 16)=0 THEN   (*update counter on screen*)
239                    IF line>16 THEN
240                      FOR i:=1 TO 5 DO Write(con,10C) END;
241                    END;
242                    WriteCard(con,line,5)
243                  END;
244     | dollar:    NextCh;
245                  IF CAP(ch)="D"   (*debug option*)
246                  THEN
247                    NextCh;
248                    WHILE (CAP(ch)>="A") AND (CAP(ch)<="Z") DO
249                      ddt[CAP(ch)]:=TRUE; NextCh
250                    END;
251                    IF ddt["A"] THEN printinput:=TRUE END;
252                    IF ddt["B"] THEN printnodes:=TRUE END;
253                    WHILE ch<>EOL DO NextCh; END;
254                    typ:=notyp;
255                  ELSE typ:=nococosy; at[1]:=ORD('$');
256                  END;
257     | minus:     NextCh;
258                  IF ch='-'
259                  THEN
260                    WHILE ch<>EOL DO NextCh; END;
261                    typ:=notyp;
262                  ELSE typ:=nococosy; at[1]:=ORD('-');
263                  END;
264     END; (*CASE*)
265   UNTIL typ<>notyp;
266 END GetSy;
267
268
269 BEGIN (*cocolex*)
270   FOR c:="A" TO "Z" DO ddt[c]:=FALSE END;
271   FOR c:=0C TO 377C DO class[c]:=none; END;
272   FOR c:='a' TO 'z' DO class[c]:=letter; END;
273   FOR c:='A' TO 'Z' DO class[c]:=letter; END;
274   FOR c:='0' TO '9' DO class[c]:=digit; END;
275   class[EF]:=endfile;    class[EOL]:=endline;
276   class["'"]:=quote;     class['$']:=dollar;   class['"']:=quote;
277   class['(']:=lpar;      class[')']:=rpar;     class[',']:=comma;
278   class['-']:=minus;     class['.']:=period;   class[':']:=colon;
279   class[';']:=semicolon; class['<']:=latpar;   class['=']:=eql;
280   class['>']:=ratpar;    class['[']:=lbrack;   class[']']:=rbrack;
281   class['{']:=lconbr;    class['|']:=variant;  class['}']:=rconbr;
282
283   column:=0; col:=0; line:=1; ch:=' ';
284
285   FOR i:=0 TO htmax-1 DO ht[i]:=0; END;
286   storeid:=TRUE;
287   id[0]:="E"; id[1]:="O"; id[2]:="F"; id[3]:=0C; idact:=3;
288
289   EnterKey( 1,'ALIAS');         EnterKey( 1,'alias');
290   EnterKey( 2,'ANY');           EnterKey( 2,'any');
291   EnterKey( 3,'DECLARATIONS');
292   EnterKey( 4,'ENDGRAM');
293   EnterKey( 5,'ENDSEM');        EnterKey( 5,'endsem');
294   EnterKey( 6,'EPS');           EnterKey( 6,'eps');
295   EnterKey( 7,'GRAMMAR');
280 Program listings App. 296 EnterKey( 8,'IN'); EnterKey( 8,'in'); 297 EnterKey( 9,'MACROS'); 298 EnterKey(10,'NONTERMINALS'); 299 EnterKey(11,'OUT'); EnterKey(11,'out'); 300 EnterKey(12,'PRAGMAS'); 301 EnterKey(13,'RULES'); 302 EnterKey(14,'SEM'); EnterKey(14,'sem'); 303 EnterKey(15,'SEMANTIC'); 304 EnterKey(16,'TERMINALS'); 305 END cocolex. ABS 105 at 216 218 219 220 255 262 bp 53 bpmax 53 buf 52 buflen 41 52 C 51 51 121 132 147 172 240 271 271 287 c 50 270 270 271 271 272 272 273 273 274 274 CAP 245 248 248 249 ch 61 75 144 144 145 163 166 167 168 169 194 195 198 200 213 215 216 245 248 248 249 253 258 260 283 Charclass 44 51 class 51 144 144 194 198 215 271 272 273 274 275 275 276 276 276 277 277 277 278 278 278 279 279 279 280 280 280 281 281 281 cocolex 12 305 cocosyn 13 col 167 168 197 214 283 colon 46 233 278 colonsy 36 233 column 62 75 214 237 283 comma 46 234 277 commasy 37 234 con 15 240 242 d 82 95 103 104 105 ddt 249 251 252 270 digit 45 144 194 198 219 274 dollar 47 244 276 EF 15 167 275 endfile 46 235 275 endline 47 236 275 EnterKey 114 123 289 289 290 290 291 292 293 293 294 294 295 296 296 297 298 299 299 300 301 302 302 303 304 eofsy 20 235 EOL 15 168 253 260 275 eql 45 221 279 eqlsy 24 221 Equalld 84 90 100 Errors 14 File 15 FilelO 15 GetName 128 135 GetSy 209 266 h 82 94 97 98 100 101 105 105 129 131 132 Hash 81 109 122 148 173 HIGH 118 131
APPF cocolexMOD 281 67 97 98 100 101 285 htmax i id idact ldent idmax idp key keys 1 latpar latparsy lbrack lbracksy lconbr lconbrsy letter line lpar lparsy minus name NextCh nococosy none notyp number och period periodsy printinput printnodes quote ratpar ratparsy rbrack rbracksy rconbr rconbrsy Read ReadName ReadNumber ReadString RestartHash Restriction rpar rparsy SemErr semicolon semicolonsy spix 58 63 131 64 150 65 21 57 81 143 171 114 66 82 46 33 45 29 46 31 45 167 45 27 47 128 73 223 237 38 45 40 23 160 45 25 13 13 45 46 34 46 30 46 32 16 140 191 158 179 14 45 28 14 46 35 81 150 67 85 132 88 164 93 151 64 93 145 171 118 122 84 230 230 226 226 228 228 144 168 224 224 257 131 76 224 244 216 216 236 219 163 222 222 251 252 220 231 231 227 227 229 229 75 153 204 174 180 104 225 225 167 232 232 93 151 94 87 133 88 169 93 218 108 98 145 172 119 149 88 279 280 281 217 197 277 278 133 145 225 247 255 271 254 164 278 276 280 280 281 217 219 220 108 277 168 279 94 158 95 88 133 94 171 98 108 147 172 89 272 237 164 226 249 262 261 166 276 197 94 173 104 88 240 94 172 117 115 147 173 93 273 238 166 227 253 265 171 98 105 88 285 117 287 117 117 148 94 239 169 228 257 100 285 88 285 119 287 117 119 161 100 242 198 229 260 101 89 121 287 143 119 163 128 283 201 230 128 115 132 287 163 121 164 131 213 231 131 118 133 288 121 164 132 216 232 141 119 145 122 169 133 221 233 148 129 147 141 169 133 222 234 149
282 Program listings App. p src StopHash storeld string sy SYSTEM typ val VAL variant varlantsy Write WriteCard WriteString WriteText X y 75 185 68 22 114 17 140 225 254 140 17 45 26 16 16 16 16 84 84 186 98 220 117 150 226 255 150 200 223 223 240 242 88 88 180 151 227 261 151 281 186 216 228 262 191 286 217 218 219 220 221 222 223 224 229 230 231 232 233 234 235 236 265 193 195 195 200 200 210 217 218
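The Hash procedure of cocolex looks names up in an open-addressing table of htmax = 359 slots. It starts from a value built from the first two characters and the length, and on a collision it steps on by ABS(d), with d running from -htmax upwards in steps of 2. The sketch below reproduces that probe sequence in Python rather than Modula-2, under stated assumptions: it stores the names themselves instead of spelling indices into the id array, it returns the slot index rather than a spix, and the last addend of the start value is read as the name length because the scan leaves that character ambiguous. The class NameTable is illustrative only.

```python
HTMAX = 359  # hash table size used in the listing


class NameTable:
    def __init__(self):
        self.slots = [None] * HTMAX  # None plays the role of ht[h]=0
        self.store = True            # cf. RestartHash / StopHash

    def _start(self, name):
        # Assumption: first two characters plus length, as the scan suggests.
        c0 = ord(name[0])
        c1 = ord(name[1]) if len(name) > 1 else 0
        return (c0 * 7 + c1 + len(name)) * 17 % HTMAX

    def lookup(self, name):
        """Return the slot of a non-empty name; enter it if new and store is set."""
        h = self._start(name)
        d = -HTMAX
        while True:
            if self.slots[h] is None:      # new identifier
                if self.store:
                    self.slots[h] = name
                return h
            if self.slots[h] == name:      # old identifier
                return h
            d += 2                         # collision: change the step width
            if d == HTMAX:
                raise OverflowError("hash table full")
            h = (h + abs(d)) % HTMAX
```

With store set to False a new name is located but not entered, which corresponds to the temporary storage selected by StopHash.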
App. F  cocolst.DEF

(* cocolst   Prints listing of Cocol text   Moe 16.8.87
   =======   ==============================
   This module closes the source file and reopens it for reading. It prints
   a listing of the source file with line numbers and error messages.
*)
DEFINITION MODULE cocolst;
FROM FileIO IMPORT File;

VAR lst: File;   (*list file*)

PROCEDURE PrintListing;

END cocolst.
 1 (* cocolst   Prints listing of Cocol text   Moe 16.8.87
 2    =======   ==============================
 3    This module closes the source file and reopens it for reading. It prints
 4    a listing of the source file with line numbers and error messages.
 5 *)
 6 IMPLEMENTATION MODULE cocolst;
 7 FROM cocolex IMPORT src;
 8 FROM Errors  IMPORT Errorptr, GetNextSynErr, GetNextSemErr, PrintSynError;
 9 FROM FileIO  IMPORT File, EF, EOL, Open, Close, Read, Write,
10                     WriteString, WriteCard, WriteLn;
11
12
13 (* GetLine   Read a source line. Return empty line if eof.
14 *)
15 PROCEDURE GetLine(f:File; VAR line:ARRAY OF CHAR);
16 VAR ch:CHAR; i:CARDINAL;
17 BEGIN
18   Read(f,ch); i:=0;
19   WHILE (ch<>EOL) AND (ch<>EF) DO line[i]:=ch; INC(i); Read(f,ch) END;
20   IF (i=0) AND (ch=EF) THEN line[0]:=EF ELSE line[i]:=0C END;
21 END GetLine;
22
23
24 (* PrintSemError   Print semantic error message
25 *)
26 PROCEDURE PrintSemError(f:File; nr,col:CARDINAL);
27 VAR i:CARDINAL;
28 BEGIN
29   WriteString(f,"***** "); FOR i:=1 TO col-1 DO Write(f," ") END;
30   WriteString(f,"^ ");
31   CASE nr OF
32      1: WriteString(f,"Symbol declared twice");
33   |  2: WriteString(f,"Grammar name is no nonterminal");
34   |  3: WriteString(f,"Undeclared symbol");
35   |  4: WriteString(f,"Terminal on left-hand side of rule");
36   |  5: WriteString(f,"Two rules for the same nonterminal");
37   |  6: WriteString(f,"Wrong number of attributes");
38   |  7: WriteString(f,"In-attribute for a terminal");
39   |  8: WriteString(f,"Wrong attribute direction");
40   |  9: WriteString(f,"Wrong attribute name");
41   | 10: WriteString(f,"Attribute constant on left-hand side of rule");
42   | 11: WriteString(f,"Semantic macro declared twice");
43   | 12: WriteString(f,"Undeclared semantic macro");
44   | 16: WriteString(f,"Pragma used in rules");
45   | 21: WriteString(f,"File 'cocosynframe' not found");
46   | 22: WriteString(f,"Number too big");
47   | 23: WriteString(f,"End of line in string");
48   | 24: WriteString(f,"End of file in string");
49   | 25: WriteString(f,"File 'cocosemframe' not found");
50   ELSE WriteString(f,"Error");
51   END;
52   WriteLn(f);
53 END PrintSemError;
54
55
56 (* PrintListing   Print a source list with error messages
57 *)
58 PROCEDURE PrintListing;
59 VAR
APPF cocolstMOD 60 61 62 63 64 65 66 67 68 69 volRef: INTEGER; (*volume or directory of source file*) srcn: ARRAY[0..63] OF CHAR; (*source name*) line: ARRAY[0..255] OF CHAR; (*source line*) symbols: Errorptr; synlinersyncol: CARDINAL; semnr: CARDINAL; semline,semcol: CARDINAL; lnr: CARDINAL; sync,seme:CARDINAL; i: CARDINAL; (*pointer to error symbols*) (*line and column of syntax error*) (♦semantic error number*) (*line and column of semantic error*) (*line number*) (*error counters*) 70 BEGIN 71 72 volRef:=srcA.volRef; i:=0; REPEAT srcn[i]:=srcA.name[i]; INC(i) UNTIL srcn[i-l]=0C; 73 Close(src); Open(src,volRef,srcn,FALSE); 74 GetNextSemErr(semnr,semline,semcol); 75 GetNextSynErr(symbols,synline,syncol); 76 GetLine(srcrline); lnr: =4; semc:=0; sync:=0; 77 WHILE line[0]<>EF DO 78 WriteCard(lst,lnr,5); WriteStringUst," "); 79 WriteString(1st,line); WriteLn(lst); 80 WHILE synline=lnr DO 81 PrintSynError(lst,symbols,syncol); INC(sync); 82 GetNextSynErr(symbols,synline,syncol); 83 END; 84 WHILE semline^lnr DO 85 PrintSemError(1st,semnr,semcol); INC(semc); 86 GetNextSemErr(semnr,semline,semcol); 87 END; 88 GetLine(src,line); INC(lnr); END; WHILE symbolsoNIL DO PrintSynError(1st,symbols,syncol); INC(sync); GetNextSynErr(symbols,synline,syncol); 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 END cocolst. WHILE semnr<>0 DO PrintSemError(1st,semnr,semcol); INC(seme); GetNextSemErr(semnr,semline,semcol); END; WriteLn(lst); WriteCard(1st,sync,5); WriteString(1st," WriteCard(lst,semc,5); WriteString(1st," END PrintListing; syntax error(s)$"); semantic error(s)$$* C ch Close cocolex cocolst col EF EOL Errorptr Errors f File 20 16 9 7 6 26 8 72 18 73 103 29 19 19 63 19 19 19 19 20 20 20 77 15 18 19 37 38 39 49 50 52 9 15 26 26 40 29 41 29 42 30 43 32 44 33 45 34 46 35 47
286 Program listings arm* FilelO GetLlne GetNextSemErr GetNextSynErr 1 line lnr 1st name nr Open PrintListing PrintSemError PrintSynError Read seme semcol semline semnr sre sren symbols sync syncol synline volRef Write WriteCard WriteLn WriteString 9 15 8 8 16 72 15 67 78 100 72 26 9 58 26 8 9 68 66 66 65 7 61 63 68 64 64 60 9 10 10 10 41 21 74 75 18 72 19 76 78 31 73 101 53 81 18 76 74 74 74 71 72 75 76 75 75 71 29 78 52 29 42 76 86 82 19 20 78 79 85 91 19 85 85 84 85 72 72 81 81 81 80 71 99 79 30 43 88 96 92 19 20 80 79 95 95 86 86 86 73 73 82 91 82 82 73 100 98 32 44 20 62 84 81 100 95 96 94 73 90 99 91 92 33 45 20 76 88 85 96 95 76 91 92 34 46 27 77 91 96 88 92 35 47 29 79 95 36 48 69 88 98 37 49 72 99 38 50 72 99 39 78 72 100 40 79 99 100
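PrintListing walks the source file line by line, prints each line with its number, and directly below it prints every error recorded for that line, with a caret placed under the offending column. The following is a minimal sketch of that merge in Python rather than Modula-2. It handles only one error stream and returns the listing as a list of strings, whereas cocolst merges separate syntax and semantic error queues and writes to the file lst. The names print_listing and error_line, the abbreviated message table, and the exact spacing after the line number are assumptions.

```python
# Three of the messages from PrintSemError; the full table is in the listing.
SEM_MESSAGES = {
    1: "Symbol declared twice",
    3: "Undeclared symbol",
    22: "Number too big",
}


def error_line(col, nr):
    """Format one message with a caret under column col (1-based)."""
    return "***** " + " " * (col - 1) + "^ " + SEM_MESSAGES.get(nr, "Error")


def print_listing(lines, errors):
    """lines: source lines; errors: iterable of (line_no, col, nr) tuples."""
    pending = sorted(errors)
    out = []
    for lnr, text in enumerate(lines, start=1):
        out.append(f"{lnr:5d} {text}")
        while pending and pending[0][0] == lnr:
            _, col, nr = pending.pop(0)
            out.append(error_line(col, nr))
    # Errors positioned past the last line are flushed at the end.
    for _, col, nr in pending:
        out.append(error_line(col, nr))
    out.append("")
    out.append(f"{len(errors):5d} semantic error(s)")
    return out
```

Because the line-number prefix and the "***** " prefix have the same width here, the caret lines up with the reported column of the source text.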
App. F  cocosem.DEF

(* Generated semantic analyzer
   ===========================
   This module is produced by Coco from the semantic actions of an
   attributed grammar.
*)
DEFINITION MODULE cocosem;
VAR printactions: BOOLEAN;   (*trace the executed semantic actions*)
PROCEDURE Semant(sem:CARDINAL);
END cocosem.
288 Program listings AppiF 1 2 (* Generated semantic analyzer 3 =========================== 4 This module is produced by Coco from the semantic actions of an 5 attributed grammar. 6 *j 7 IMPLEMENTATION MODULE cocosem; 8 FROM FilelO IMPORT con, WriteCard, WriteString; 9 FROM SYSTEM IMPORT WORD; 10 FROM cocolex IMPORT at; 11 12 13 FROM cocogen IMPORT Attrtype,CloseFile,Copy,EmitAction,GenAssign, 14 InsertFramePart,OpenFile,OpenSem,StartCopy; 15 FROM cocogra IMPORT alts,rules,rootloc,ConcatLeft,ConcatRight, 16 GetNode,GraphList,Graphnode,NewNode,RepNode; 17 FROM cocolex IMPORT typ,line,col,ddt,RestartHash,StopHash; 18 FROM cocosym IMPORT gramspix,CompleteAt,Direction, 19 GetAt,GetMacroNr,GetSy,NewAt,NewMacro, 20 NewSy,RepSy,Symbolnode,Symboltype,SyNr; 21 FROM Errors IMPORT CompErr,Restriction,SemErr; 22 FROM SYSTEM IMPORT VAL; 23 CONST null=65535; 24 TYPE Usage=(def,check,use); 25 VAR sn:Symbolnode; 26 sy,syl:CARDINAL; 27 rootsyrCARDINAL; 28 eofsyrCARDINAL; 29 gnrGraphnode; 30 gp,gpl,gp2,gp3:CARDINAL; 31 gl,gll,gl2,gl3rCARDINAL; 32 dd,ddl,dd2:BOOLEAN; 33 gporCARDINAL; 34 firstfactrBOOLEAN; 35 kind:Usage; 36 styprSymboltype; 37 dir,dirldirection; 38 countrCARDINAL; 39 n:CARDINAL; 40 semi, sem2, sem3: CARD INAL ; 41 firstsymbolrBOOLEAN; 42 ok:BOOLEAN; 43 spix,spixl:CARDINAL; 44 dummy: CARDINAL; 45 MODULE SEMANTICSTACK; 46 IMPORT CompErr,Restriction; 47 EXPORT Pop,Push; 48 CONST maxstacksize=70; 49 VAR stack:ARRAY[1..maxstacksize]OF CARDINAL; 50 sprCARDINAL; 51 PROCEDURE Pop():CARDINAL; 52 VAR xrCARDINAL; 53 BEGIN IF sp=0 THEN CompErr(6);ELSE x:=stack[sp];DEC(sp);END; 54 RETURN x; 55 END Pop; 56 PROCEDURE Push(x:CARDINAL); 57 BEGIN IF sp<maxstacksize 58 THEN INC(sp);stack[sp]:=x; 59 ELSE Restriction(14);
ApPF cocosem.MOD 289 60 END; 61 END Push; 62 BEGIN sp:=0; 63 END SEMANTICSTACK; 64 PROCEDURE Error(nr:CARDINAL); 65 BEGIN SemErr(nr,line,col);END Error; 66 67 PROCEDURE ASSIGN(VAR xrWORD; y:WORD); 68 BEGIN 69 x:=y; 70 END ASSIGN; 71 72 PROCEDURE Semant (sem: CARDINAL); 73 BEGIN 74 (*IF printactions THEN 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 WriteString(con,"$ ["); WriteCard(con,sem,3); WriteString(con,"] "); END; *) CASE sem OF 11: ; I 12: 1 13: (*line 125*) INC(count); CASE kind OF use: IF styp=nt THEN GetAt(sy,count,spixl,dirl); IF spixloO THEN IF dir-dirl THEN GenAssign(nonterm,spixl,spix) ; ELSE Error(8);END; END; END; 1 check: IF styp=nt THEN GetAt(sy,count,spixl,dirl); IF spixloO THEN IF spixospixl THEN Error (9) ;END; IF dirodirl THEN Error(8);END; END; END; Idef: NewAt(sy,spix,dir); END; (*line 150*) INC(count); CASE kind OF use: IF stjp-t THEN GenAssign(term,spix,count); ELSIF styp=nt THEN GetAt (sy,count,spixl,dirl); IF spixloO THEN IF dir-dirl THEN GenAssign (nonterm,spix, spixl) ELSE Error(8); END; END;
Program listings END; I check: IF styp=nt THEN GetAt (sy,count,spixl,dirl); IF spixloO THEN IF spixospixl THEN Error (9) ;END; IF dirodirl THEN Error(8);END; END; END; Idef: NewAt (sy, spix,dir); IF styp-pr THEN GenAssign(term,spix,count); END; END; 14: (*line 181*) INC(count); IF kind=use THEN IF styp=nt THEN GetAt (sy,count,spixl,dirl); IF spixloO THEN IF dirodirl THEN GenAssign(const,spixl,n); ELSE Error(8); END; END; END; ELSE Error(10); END; 15: (*line 198*) IF NOT CompleteAMsy,count)THEN Error(6); END; 16: (*line 204*) Copy(typ,col) 17: (*line 208*) StartCopy(l) 18: (*line 212*) firstfact:=VAL(BOOLEAN,Pop()); ddl:=VAL(BOOLEAN,Pop());gll:=Pop();gpl:=Pop(); dd:-VAL(BOOLEAN,Pop());gl:=Pop();gp:=Pop(); gpo:=0 19: (*line 219*) Push(gp);Push(gl);Push(VAL(CARDINAL,dd)); Push(gpl);Push(gll);Push(VAL(CARDINAL,ddl)); Push(VAL(CARDINAL,firstfact)); 20: (*line 225*) sy:-SyNr(splx); IF sy=null THEN sy:=NewSy(spix,styp) ELSE Error(1); END; 21: (*line 349*) ASSIGN(gramspix,at(1]); 22: (*llne 349*) rules:=0;alts:=0; OpenFlle(gramspix);StopHash; 23: (*line 357*) RestartHash;
APPF 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: cocosemMOD InsertFramePart;styp:=t; (*line 363*) eofsy:=NewSy(0,t) (*line 365*) styp:=t; kind:=def; (*line 368*) styp:=pr (*line 370*) styp:=pr; kind:=def; (*line 371*) GetSy(sy,sn);sn.semi :=sem2; RepSy(sy,sn); (*line 376*) GetSy(sy,sn);sn.sem2:=sem3; RepSy(sy,sn); (*line 382*) styp:=nt (*line 383*) ASSIGN(spix,at[1]); (*line 384*) styp:=nt; kind:=def; (*line 386*) rootsy:=SyNr(gramspix); IF rootsy=null THEN Error(2);END; (*line 390*) sy:=SyNr(spix); IF sy=null THEN Error(3);sy:=NewSy(spix,err) END; GetSy(sy,sn); IF(sn.typont)AND(sn.typoerr)THEN Error(4); END; IF sn.startoO THEN Error(5);END; syl:=sy;count:=0;styp:=sn.typ (*line 401*) kind:=check; (*line 404*) GetSy(syl,sn); sn.start:=gp;sn.del:=dd; RepSy(syl,sn); INC(rules); (*line 410*) rootloc:=NewNode(nt,rootsy, 0); gpl:=NewNode(t,eofsy, 0) ; gl:=rootloc;gll:=gpl; ConcatRight(rootloc,gl,gpl,gll) (*line 415*) IF ddt["L"]THEN GraphLlst;END; CloseFile; (*line 420*) gp:=gpl; gl:-gll; dd:=ddl; (*line 420*) INC(alts); 291
292 Program listings %♦* 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 I 41: I 42: I 43: 1 44: 1 45: 1 46: I 47: ! 48: 1 • 49: 1 50: 1 51: I 52: I 53: 1 54: (*llne 422*) INC(alts); ConcatLeft(gp,glrgpl,gll); dd:=dd OR ddl (*llne 429*) gpo:=0 (*llne 430*) flrstfact:-TRUE; (*line 430*) gpl:=gp2; gll:-gl2; ddl:=dd2; (*line 431*) firstfact:=FALSE; (*llne 432*) IF gp2<>0 THEN ConcatRight(gpl,gll,gp2,gl2); ddl:=ddl AND dd2; END; (*line 440*) sy:=SyNr(spix); IF sy=null THEN Error(3);sy:=NewSy(spixrerr) END; GetSy(sy,sn); IF sn.typ=pr THEN Error(16);END; gp2:=NewNode(sn.typ,syrline) ; gl2:=gp2;dd2:=FALSE;gpo:=gp2; count:=0;styp:=sn.typ (*line 450*) kind:=use; (*line 451*) GetNode(gp2,gn); gn.semi:=seml;gn.sem2:-sem2; RepNode(gp2,gn) (*line 456*) gp2:=NewNode(eps,0,line); gl2:=gp2;dd2:=TRUE;gpo:=gp2 (*line 459*) gp2:=NewNode(any,0,line); gl2:=gp2;dd2:=FALSE;gpo:=gp2 (*line 462*) IF gpo=0 THEN gp2:=NewNode(eps, 0 r1ine) ; gl2:=gp2;dd2:=TRUE; GetNode(gp2,gn);gn.sem3:=sem RepNode(gp2rgn); ELSE GetNode(gpo,gn);gn.sem3:= RepNode(gpo,gn); gp2:=0;gl2:=0;gpo:=0 END; (*line 475*) gp2:=gp; gl2:=gl; dd2:=dd; (*llne 478*) gp2:=NewNode(eps,0,line); gl2:=gp2; ConcatLeft(gp,gl,gp2,gl2);
APP- cocosemMOD 293 gp2:=gp;gl2:=gl;dd2:=TRUE; ?07 I 55: (*llne 485*) *?' gp2:=NewNode(eps,0,line) ; ZZ gl2:=gp2; *:0 concatRlght(gpr gl,gp,gl); ;" ConcatLeft(gp,gl,gp2,gl2); ,no gp2:=gp;dd2:-TRUE; 5«3 | 56: (*llne 493*) 304 IF firstfact THEN 305 gp3:=gp2;gl3:=gl2; 306 gp2:=NewNode(eps,0,1lne); gl2:=gp2 ; 307 ConcatRlght(gp2,gl2rgp3rgl3); 308 END' 309 | 57: (*line 502*) 310 seml:=0;sem2:=0 311 | 58: (*line 503*) 312 count:=0; 313 | 59: (*line 510*) 314 IF stypont THEN Error (7) ;END; 315 dir:=down; 316 I 60: (*line 515*) 317 ASSIGN(n,at[1]); 318 I 61: (*line 520*) 319 IF Jcind=use THEN 320 EmitAction (line, semi); 321 END; 322 I 62: (*line 526*) 323 dir:-up 324 I 63: (*line 531*) 325 IF(kind=use)OR(styp=pr)THEN 326 EmitAction(line,sem2); 327 END; 328 | 64: (*line 537*) 329 StopHash;firstsymbol:=TRUE 330 | 65: (*line 538*) 331 RestartHash 332 | 66: (*line 539*) 333 GetMacroNr(spix,sem3); 334 if sem3=0 THEN Error (12);END; 335 | 67: (*line 543*) 336 if firstsymbol THEN 337 firstsymbol:=FALSE; 338 0penSem(line,sem3);StartCopy(col) 339 END; 340 Copy(typ,col) 341 | 68: (*line 549*) 342 RestartHash; 343 | 69: (*line 556*) 344 0penSem(line,sem3); 345 NewMacro(spix,sem3,ok); 346 if NOT ok THEN Error(11);END; 347 StopHash;firstsymbol:=TRUE; 348 | 70: (*line 562*) 3j9 IF firstsymbol THEN firstsymbol:=FALSE;StartCopy(col) END; 350 351 352 Copy(typ,col) 353 I 71: (*line 568*) 354 RestartHash 355 I 72: (*line 575*)
294 Program listings App.F 356 357 358 END; 359 END Semant; 360 BEGIN 361 printactions:=FALSE; 362 END cocosem. GetSy(sy,sn);sn.aliasspix:-spix; RepSy(sy,sn); aliasspix alts any ASSIGN at Attrtype check CloseFile cocogen cocogra cocolex cocosem cocosym col CompErr CompleteAt con ConcatLeft ConcatRight const Copy count dd ddl dd2 ddt def del dir dirl Direction down dummy EmitAction eofsy eps err Error Errors FilelO firstfact firstsymbol GenAssign GetAt GetMacroNr GetNode GetSy gi 356 15 174 276 67 70 10 172 13 24 94 13 230 13 15 10 7 18 17 21 236 238 172 198 317 198 317 120 217 17 362 65 46 18 149 8 15 239 15 227 141 13 153 38 83 215 265 32 159 32 158 32 248 17 229 24 102 220 37 89 37 87 18 37 315 44 13 320 28 180 273 280 208 211 64 65 204 208 21 153 338 340 350 352 53 295 301 253 300 307 340 352 87 96 106 110 112 122 131 135 138 149 312 162 220 234 240 240 291 163 234 240 248 254 254 254 264 274 277 281 291 296 302 128 183 188 201 99 103 114 125 129 140 315 323 89 96 99 112 114 122 125 138 140 326 225 293 298 306 259 91 98 99 116 124 125 142 146 150 169 212 214 259 262 314 334 346 34 157 164 244 250 304 41 329 336 337 347 349 90 110 115 131 141 96 112 122 138 350 13 19 87 19 333 16 269 282 284 19 190 193 210 219 261 356 31 159 162 226 227 233 239 290 295 296 300 300
APP-F cocosemMOD 295 gll gl2 gl3 gn 9P gpl gp2 gp3 gpo gramsplx GraphList Graphnode insertFramePart kind line maxstacksize n NewAt NewMacro NewNode NewSy nonterm nr nt null ok OpenFile OpenSem Pop Pr printactions Push RepNode RepSy RestartHash Restriction rootloc rootsy rules sem semi sem2 sem3 Semant SEMANTICSTACK SemErr sn sp spix spixl 301 31 31 299 31 29 30 302 30 30 276 296 30 33 18 16 16 14 35 17 344 48 39 19 19 16 20 90 64 86 23 42 14 14 47 130 361 47 16 20 17 21 15 27 15 72 40 40 40 72 45 21 25 219 50 43 206 43 124 158 247 301 305 269 159 158 246 277 298 305 160 172 229 29 178 84 65 49 141 103 345 224 168 115 65 95 167 345 175 338 51 185 56 271 191 177 46 224 203 174 79 190 190 193 359 63 65 190 220 53 90 208 87 138 163 253 305 307 270 162 163 252 277 299 307 242 175 107 263 57 317 129 225 180 111 204 346 344 55 187 61 283 194 331 59 226 204 222 270 193 282 190 220 53 98 257 88 139 226 264 306 270 220 225 253 280 301 264 203 136 273 263 208 121 207 157 262 162 285 221 342 227 224 270 270 282 191 221 53 103 259 90 141 227 274 307 271 232 226 263 281 302 274 183 276 273 259 137 258 158 325 162 357 354 310 270 284 193 261 57 110 333 96 233 277 282 239 227 264 282 305 277 188 280 276 196 158 162 320 310 284 193 262 58 115 345 97 239 281 282 289 232 264 283 306 279 201 293 280 200 158 163 326 333 194 263 58 124 356 98 247 286 283 295 239 269 286 306 284 217 298 293 211 159 163 334 210 265 62 129 112 253 290 284 296 246 271 289 307 285 267 306 298 224 159 163 338 211 356 131 113 294 284 300 253 273 293 286 319 320 306 314 159 164 344 211 356 166 115 295 285 300 274 294 325 326 345 214 357 168 122 296 301 274 295 338 215 198 123
296 Program listings App-F stack 49 53 58 start 214 220 StartCopy 14 155 338 350 StopHash 17 175 329 347 styp 36 86 95 109 111 121 130 137 168 178 182 185 187 196 200 215 265 314 325 sy 26 87 96 103 112 122 129 138 149 166 167 168 190 191 193 194 206 207 208 210 215 257 258 259 261 263 356 357 syl 26 215 219 221 Symbolnode 20 25 Symboltype 20 36 SyNr 20 166 203 206 257 SYSTEM 9 22 t 109 178 180 182 225 term 110 131 typ 17 153 211 211 215 262 263 265 340 352 up 323 Usage 24 35 use 24 85 108 136 267 319 325 VAL 22 157 158 159 162 163 164 WORD 9 67 67 WriteCard 8 WriteString 8 x 52 53 54 56 58 67 69 y 67 69 •$*f
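The local module SEMANTICSTACK in cocosem gives the semantic actions a bounded stack of CARDINAL values. Action 19 pushes gp, gl, dd, gp1, gl1, dd1 and firstfact before a nested expression is processed, and action 18 pops them again in reverse order, converting BOOLEAN values through VAL. The sketch below shows the same save and restore discipline in Python rather than Modula-2. The helper names save and restore and the use of exceptions in place of CompErr(6) and Restriction(14) are assumptions.

```python
MAX_STACK_SIZE = 70  # maxstacksize in the listing


class SemanticStack:
    def __init__(self):
        self._items = []

    def push(self, x):
        if len(self._items) >= MAX_STACK_SIZE:
            raise OverflowError("semantic stack full")   # cf. Restriction(14)
        self._items.append(x)

    def pop(self):
        if not self._items:
            raise IndexError("semantic stack empty")     # cf. CompErr(6)
        return self._items.pop()


# Push order of action 19; action 18 pops in the reverse order.
FIELDS = ("gp", "gl", "dd", "gp1", "gl1", "dd1", "firstfact")
BOOLEAN_FIELDS = {"dd", "dd1", "firstfact"}


def save(stack, state):
    """Push the parser state; booleans are stored as 0 or 1."""
    for name in FIELDS:
        stack.push(int(state[name]))


def restore(stack):
    """Pop the parser state saved by save() and rebuild the booleans."""
    state = {}
    for name in reversed(FIELDS):
        value = stack.pop()
        state[name] = bool(value) if name in BOOLEAN_FIELDS else value
    return state
```

Since each nesting level costs seven entries, a limit of 70 allows ten levels of nested expressions in this sketch.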
App. F  cocosemframe

 1 (* Generated semantic analyzer
 2    ===========================
 3    This module is produced by Coco from the semantic actions of an
 4    attributed grammar.
 5 *)
 6 DEFINITION MODULE -->modulename;
 7 VAR printactions: BOOLEAN;   (*trace the executed semantic actions*)
 8 PROCEDURE Semant(sem:CARDINAL);
 9 END -->modulename.
10 -->implementation
11 (* Generated semantic analyzer
12    ===========================
13    This module is produced by Coco from the semantic actions of an
14    attributed grammar.
15 *)
16 IMPLEMENTATION MODULE -->modulename;
17 FROM FileIO IMPORT con, WriteCard, WriteString;
18 FROM SYSTEM IMPORT WORD;
19 FROM -->scannername IMPORT at;
20
21 -->declarations
22
23 PROCEDURE ASSIGN(VAR x:WORD; y:WORD);
24 BEGIN
25   x:=y;
26 END ASSIGN;
27
28 PROCEDURE Semant(sem:CARDINAL);
29 BEGIN
30 (*IF printactions THEN
31     WriteString(con, "$ [");
32     WriteCard(con, sem, 3);
33     WriteString(con,"] ");
34   END;*)
35   CASE sem OF
36     11: ;
37 -->actions
38   END;
39 END Semant;
40 BEGIN
41   printactions:=FALSE;
42 END -->modulename.

actions         37
ASSIGN          23 26
at              19
con             17
declarations    21
FileIO          17
implementation  10
modulename      6 9 16 42
printactions    7 41
scannername     19
sem             8 28 35
Semant          8 28 39
SYSTEM          18
WORD            18 23 23
WriteCard       17
WriteString     17
x               23 25
y               23 25
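The frame file holds both the definition and the implementation module of the generated semantic analyzer, with markers where Coco inserts generated text. How cocogen processes the frame is not shown in this part of the appendix, so the following Python sketch is only one plausible reading: it assumes the markers are written as "-->" followed by a name, that modulename, scannername, declarations and actions are plain textual substitutions, and that the implementation marker separates the two output files. The function expand_frame is hypothetical.

```python
import re

MARKER = re.compile(r"-->(\w+)")


def expand_frame(frame, parts):
    """Substitute markers and split the frame into (definition, implementation).

    parts maps marker names to replacement text. Unknown markers are left
    untouched, so parts must not contain the key "implementation".
    """
    text = MARKER.sub(lambda m: parts.get(m.group(1), m.group(0)), frame)
    definition, _, implementation = text.partition("-->implementation")
    return definition.strip(), implementation.strip()
```

A call such as expand_frame(frame, {"modulename": "cocosem", "scannername": "cocolex"}) would yield text resembling the cocosem.DEF and cocosem.MOD listings, apart from the generated declarations and actions.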
APP] cocosymDEF Symbol list for coco Moe 28.12.83 j (* cocosym 2 -—* % This module 1 a) generates and updates symbol nodes for terminals, pragmas and nonterminals b) searches names in the symbol list c) stores and retrieves attribute information d) stores and retrieves semantic macros e) marks deletable symbols in symbol list f) collects first-sets, follow-sets, eps-sets and any-sets 5 6 7 8 9 10 11 12 DEFINITION MODULE cocosym; 13 14 CONST 15 maxterminals = 128; 16 17 TYPE Direction = (up,down); (*attribute direction*) Attributeptr = POINTER TO Attribute; Attribute = RECORD spix: CARDINAL; (*name of attribute*) dir: Direction; (*up,down*) next: Attributeptr; (*to next attribute of same nt*) END; (eps,t,pr,nt,any,err); RECORD (♦spelling index of symbol*) (♦spelling index of alias name*) (*no.of attributes*) (*type of symbol*) (*pragma semantics*) Symboltype = Symbolnode = spix: aliasspix: nra: CASE typ: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 VAR 41 maxany: 42 maxeps: maxt: maxp: maxs: gramspix CARDINAL; CARDINAL; CARDINAL; Symboltype OF pr: seml,sem2: CARDINAL; I nt,err: start: CARDINAL; del: BOOLEAN; firstat: Attributeptr; END; END; Symbolset = ARRAY[0 CARDINAL; CARDINALS- CARDINAL; CARDINAL; CARDINAL; CARDINAL; (*start of top-down graph*) (*TRUE if deletable*) (*to first attribute node*) maxterminals DIV 16] OF BITSET; (*no.of any-sets*) (*no.of eps-follower-sets*) (*no.of last terminal*) (*no.of last pragma*) (*no.of last nonterminal*) (*grammar name, filled by AG*) 43 44 45 46 47 48 49 PROCEDURE ClearSet(VAR s:Symbolset; n:CARDINAL); 50 (* clears set s*) 51 52 PROCEDURE CompleteAt(sy,nr:CARDINAL): BOOLEAN; 53 (* checks if symbol sy has nr attributes*) 54 55 PROCEDURE FindDelSymbols; 56 (* Marks deletable nonterminals and prints them*) 57 58 PROCEDURE GetA(n:CARDINAL; VAR set:Symbolset); 59 (* Gets the any-set with the number n*)
 60
 61 PROCEDURE GetAt(sy,n:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
 62 (* Gets the spelling index spix and the direction dir of the n-th
 63    attribute of the symbol sy*)
 64
 65 PROCEDURE GetE(n:CARDINAL; VAR set:Symbolset);
 66 (* Gets the eps-follower-set with the number n*)
 67
 68 PROCEDURE GetF(sy:CARDINAL; VAR first:Symbolset);
 69 (* Gets the set of terminal start symbols for the nonterminal sy*)
 70
 71 PROCEDURE GetFirstSet(loc:CARDINAL; VAR set:Symbolset);
 72 (* Gets the terminal start symbols of the graph with the root loc*)
 73
 74 PROCEDURE GetFo(sy:CARDINAL; VAR set:Symbolset);
 75 (* Gets followers of the nonterminal sy*)
 76
 77 PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
 78 (* Gets the number sem of the semantic action corresponding to the
 79    macro with the name spix*)
 80
 81 PROCEDURE GetSy(sy:CARDINAL; VAR sn:Symbolnode);
 82 (* Gets the symbol node with the index sy*)
 83
 84 PROCEDURE GetSymbolSets;
 85 (* Collects first-sets, follower-sets, eps-sets and any-sets*)
 86
 87 PROCEDURE IsInSet(n:CARDINAL; VAR s:Symbolset):BOOLEAN;
 88 (* TRUE if n is in set s*)
 89
 90 PROCEDURE NewAt(sy,spix:CARDINAL; dir:Direction);
 91 (* Enters a new attribute for the symbol sy with the spelling index
 92    spix and the direction dir*)
 93
 94 PROCEDURE NewMacro(spix,sem:CARDINAL; VAR ok:BOOLEAN);
 95 (* Enters a new semantic macro with the name spix and the action number
 96    sem*)
 97
 98 PROCEDURE NewSy(spix:CARDINAL; typ:Symboltype): CARDINAL;
 99 (* Generates a new symbol with the name spix and the type typ and
100    returns its index*)
101
102 PROCEDURE RepSy(sy:CARDINAL; sn:Symbolnode);
103 (* Replaces the symbol sy by the node sn*)
104
105 PROCEDURE SetBit(VAR s:Symbolset; n:CARDINAL);
106 (* Sets bit n in set s*)
107
108 PROCEDURE Unit(VAR s1,s2:Symbolset; n:CARDINAL);
109 (* Adds the set s2 to the set s1*)
110
111 PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
112 (* Gets the symbol number for the identifier with the name spix*)
113
114 END cocosym.
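The set procedures declared in cocosym (ClearSet, SetBit, IsInSet, Unit) all operate on a Symbolset, an array of 16-bit BITSET words in which symbol number n occupies bit n MOD 16 of word n DIV 16. The following is a minimal stand-alone sketch of that representation, not part of Coco. The module name SetDemo, the constant wordsize and the use of the standard library module InOut (instead of the book's FileIO) are assumptions made for illustration only.

```modula-2
MODULE SetDemo;
(* Hypothetical demo, not part of Coco: sets of small cardinals stored
   as an array of BITSET words, 16 bits used per word. *)

FROM InOut IMPORT WriteCard, WriteLn;

CONST
  maxterminals = 128;
  wordsize     = 16;   (* assumed name for the constant 16 in cocosym *)

TYPE
  Symbolset = ARRAY [0..maxterminals DIV wordsize] OF BITSET;

VAR
  a, b: Symbolset;
  k:    CARDINAL;

PROCEDURE ClearSet(VAR s: Symbolset; n: CARDINAL);
VAR i: CARDINAL;
BEGIN (* empty all words that can hold the elements 0..n *)
  FOR i := 0 TO n DIV wordsize DO s[i] := BITSET{} END
END ClearSet;

PROCEDURE SetBit(VAR s: Symbolset; n: CARDINAL);
BEGIN (* element n lives in word n DIV 16, bit n MOD 16 *)
  INCL(s[n DIV wordsize], n MOD wordsize)
END SetBit;

PROCEDURE IsInSet(n: CARDINAL; VAR s: Symbolset): BOOLEAN;
BEGIN
  RETURN (n MOD wordsize) IN s[n DIV wordsize]
END IsInSet;

PROCEDURE Unit(VAR s1, s2: Symbolset; n: CARDINAL);
VAR i: CARDINAL;
BEGIN (* s1 := s1 + s2, word by word *)
  FOR i := 0 TO n DIV wordsize DO s1[i] := s1[i] + s2[i] END
END Unit;

BEGIN
  ClearSet(a, maxterminals); ClearSet(b, maxterminals);
  SetBit(a, 3); SetBit(b, 17); SetBit(b, 100);
  Unit(a, b, maxterminals);
  FOR k := 0 TO maxterminals DO   (* list the members of a *)
    IF IsInSet(k, a) THEN WriteCard(k, 4) END
  END;
  WriteLn
END SetDemo.
```

After the call of Unit the set a should contain the elements 3, 17 and 100. Passing the upper bound n to each procedure, as cocosym does, lets a caller restrict the work to the words actually in use (for example maxt instead of maxterminals).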
App.F cocosymMOD

  1 (* cocosym     Symbol list for coco                      Moe 29.12.83
  2    =======
  3    This module
  4    a) generates and updates symbol nodes for terminals, pragmas and
  5       nonterminals
  6    b) searches names in the symbol list
  7    c) stores and retrieves attribute information
  8    d) stores and retrieves semantic macros
  9    e) marks deletable symbols in symbol list
 10    f) collects first-sets, follow-sets, eps-sets and any-sets
 11 *)
 12 IMPLEMENTATION MODULE cocosym;
 13 FROM cocogra IMPORT maxn, rootloc, ClearMarkList, Deletable, DelNode,
 14                     GetNode, Graphnode, Mark, Marked, Marklist, RepNode;
 15 FROM cocolex IMPORT line, col, ddt, GetName;
 16 FROM cocolst IMPORT lst;
 17 FROM Errors  IMPORT CompErr, Restriction, SemErr;
 18 FROM FileIO  IMPORT con, Write, WriteCard, WriteString, WriteText, WriteLn;
 19 FROM System  IMPORT Allocate;
 20 FROM SYSTEM  IMPORT VAL;
 21
 22 CONST
 23   anysetsize = 20;    (*max.no.of compl.-sets for any-symbols*)
 24   epssetsize = 70;    (*max.no.of eps-follower-sets*)
 25   maxsymbols = 200;   (*max.no.of symbols*)
 26   maxnt      = 80;    (*max.number of nonterminals*)
 27   null       = 65535;
 28   eofsy      = 0;
 29
 30 TYPE
 31   Anyset     = ARRAY[1..anysetsize] OF Symbolset;
 32   Epsset     = ARRAY[1..epssetsize] OF Symbolset;
 33   Followset  = ARRAY[0..maxnt-1] OF RECORD
 34                  ts:  Symbolset;  (*terminal symbols*)
 35                  nts: Symbolset;  (*nts whose start set is to be added*)
 36                END;
 37   Firstset   = ARRAY[0..maxnt-1] OF RECORD
 38                  ts:    Symbolset; (*terminal symbols*)
 39                  ready: BOOLEAN;   (*TRUE if ts is complete*)
 40                END;
 41   Macroptr   = POINTER TO Macronode;
 42   Macronode  = RECORD
 43                  spix: CARDINAL;  (*name of semantic macro*)
 44                  sem:  CARDINAL;  (*associated semantic action*)
 45                  next: Macroptr;  (*to next sem macro*)
 46                END;
 47   Symbollist = ARRAY[0..maxsymbols] OF Symbolnode;
 48
 49 VAR
 50   anyset:     Anyset;     (*actual no.of any-sets*)
 51   column:     CARDINAL;   (*printing column for terminal sets*)
 52   epsset:     Epsset;     (*actual no.of eps-sets*)
 53   first:      Firstset;   (*terminal start symbols*)
 54   firstmacro: Macroptr;   (*first sem macro*)
 55   fnt:        CARDINAL;   (*no.of first nonterminal*)
 56   follow:     Followset;  (*terminal successors*)
 57   lastmacro:  Macroptr;   (*last sem macro*)
 58   sn:         Symbollist; (*symbol list*)
 59
 60
 61 PROCEDURE AllBit(VAR s:Symbolset); FORWARD;
 62 PROCEDURE DelBit(VAR s:Symbolset; n:CARDINAL); FORWARD;
 63 PROCEDURE PrintSet(VAR s:Symbolset; n:CARDINAL); FORWARD;
 64 PROCEDURE PutNt(sy:CARDINAL); FORWARD;
 65 PROCEDURE PutTermSet(VAR s:Symbolset); FORWARD;
 66
 67
 68 (* CompleteAt     Test if nr is the correct no.of attributes
 69 *)
 70 PROCEDURE CompleteAt(sy,nr:CARDINAL): BOOLEAN;
 71 BEGIN RETURN (sn[sy].nra=nr) OR (sn[sy].typ=err); END CompleteAt;
 72
 73
 74 (* FindDelSymbols     Find all deletable symbols and print them
 75 *)
 76 PROCEDURE FindDelSymbols;
 77 VAR
 78   change: BOOLEAN;
 79   dummy:  CARDINAL;
 80   first:  BOOLEAN;
 81   i,l:    CARDINAL;
 82   name:   ARRAY[1..50] OF CHAR;
 83   sn:     Symbolnode;
 84 BEGIN
 85   fnt:=maxp+1;
 86   REPEAT (*while new deletable symbols*)
 87     change:=FALSE;
 88     FOR i:=maxp+1 TO maxs DO
 89       GetSy(i,sn);
 90       IF (NOT sn.del) AND (sn.start<>0) AND Deletable(sn.start) THEN
 91         sn.del:=TRUE; RepSy(i,sn); change:=TRUE;
 92       END;
 93     END;
 94   UNTIL NOT change;
 95
 96   first:=TRUE; (*print deletable symbols*)
 97   FOR i:=maxp+1 TO maxs DO
 98     GetSy(i,sn);
 99     IF sn.del THEN
100       IF first THEN
101         WriteLn(lst); WriteLn(lst);
102         WriteString(lst,"Deletable symbols:"); WriteLn(lst);
103         first:=FALSE;
104       END;
105       GetName(sn.spix,name,l);
106       WriteString(lst," "); WriteText(lst,name,l); WriteLn(lst);
107     END;
108   END;
109   IF first THEN
110     WriteLn(lst); WriteLn(lst);
111     WriteString(lst,"Grammar contains no deletable symbols.");
112     WriteLn(lst);
113   END;
114 END FindDelSymbols;
115
116
117 (* GetA     Returns the any-set with the number nr
118 *)
119 PROCEDURE GetA(nr:CARDINAL; VAR s:Symbolset);
120 BEGIN s:=anyset[nr]; END GetA;
121
122
123 (* GetAnySets     Find the complement sets for any-nodes
124 *)
125 PROCEDURE GetAnySets;
126 VAR
127   gn:    Graphnode;
128   loc,i: CARDINAL;
129   s:     Symbolset;
130 BEGIN (*GetAnySets*)
131   FOR loc:=1 TO maxn DO
132     GetNode(loc,gn);
133     IF (gn.typ=any) AND (gn.lp<>0) THEN (*any with alternatives*)
134       GetFirstSet(gn.lp,s);
135       FOR i:=0 TO maxt DIV 16 DO (*make complement*)
136         s[i]:=VAL(BITSET,65535)-s[i];
137       END;
138       DelBit(s,eofsy); (*any must not recognize eofsy*)
139       INC(maxany); anyset[maxany]:=s;
140       gn.sp:=maxany; RepNode(loc,gn);
141     END;
142   END;
143 END GetAnySets;
144
145
146 (* GetAt     Get name and direction of an attribute
147 *)
148 PROCEDURE GetAt(sy,nr:CARDINAL; VAR spix:CARDINAL; VAR dir:Direction);
149 VAR
150   i: CARDINAL;
151   p: Attributeptr;
152 BEGIN
153   IF (sn[sy].typ<>nt) AND (sn[sy].typ<>err) THEN CompErr(3); END;
154   IF (nr>sn[sy].nra) OR (sn[sy].typ=err)
155   THEN spix:=0; dir:=down; (*semantic error*)
156   ELSE
157     p:=sn[sy].firstat;
158     FOR i:=1 TO nr-1 DO p:=p^.next; END;
159     spix:=p^.spix; dir:=p^.dir;
160   END;
161 END GetAt;
162
163
164 (* GetE     Returns the eps-set with the number nr
165 *)
166 PROCEDURE GetE(nr:CARDINAL; VAR s:Symbolset);
167 BEGIN s:=epsset[nr]; END GetE;
168
169
170 (* GetEpsSets     Find the follower symbols for eps-nodes
171 *)
172 PROCEDURE GetEpsSets;
173 VAR
174   curnt: CARDINAL;
175   m:     Marklist;
176   sn:    Symbolnode;
177
178 PROCEDURE FindEpsFollowers(loc,leftsy:CARDINAL; VAR nr:CARDINAL);
179 VAR s:Symbolset;
180 BEGIN
181   GetFirstSet(loc,s);
182   IF Deletable(loc) THEN Unit(s,follow[leftsy-fnt].ts,maxt); END;
183   INC(maxeps); epsset[maxeps]:=s;
184   nr:=maxeps;
185 END FindEpsFollowers;
186
187 PROCEDURE FindEps(loc,leftsy:CARDINAL; vialp:BOOLEAN);
188 VAR
189   gn: Graphnode;
190   nr: CARDINAL;
191 BEGIN
192   IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
193   Mark(loc,m);
194   GetNode(loc,gn);
195   WITH gn DO
196     IF (typ=eps) AND (vialp OR (lp<>0)) THEN
197       FindEpsFollowers(rp,leftsy,nr);
198       sp:=nr; RepNode(loc,gn);
199     END;
200     IF lp<>0 THEN FindEps(lp,leftsy,TRUE); END;
201     IF rp<>0 THEN FindEps(rp,leftsy,FALSE); END;
202   END;
203 END FindEps;
204
205 BEGIN (*GetEpsSets*)
206   ClearMarkList(m);
207   FOR curnt:=maxp+1 TO maxs DO
208     GetSy(curnt,sn);
209     FindEps(sn.start,curnt,FALSE);
210   END;
211 END GetEpsSets;
212
213
214 (* GetF     Returns the terminal start symbols of sy
215 *)
216 PROCEDURE GetF(sy:CARDINAL; VAR s:Symbolset);
217 BEGIN s:=first[sy-fnt].ts; END GetF;
218
219
220 (* GetFirstSet     Gets the terminal start symbols of the graph in loc
221 *)
222 PROCEDURE GetFirstSet(loc:CARDINAL; VAR set:Symbolset);
223 VAR m: Marklist; (*mark list for visited nodes*)
224
225 PROCEDURE CollectFirstSet(loc:CARDINAL; VAR set:Symbolset);
226 VAR
227   gn: Graphnode;
228   sn: Symbolnode;
229   s1: Symbolset;
230 BEGIN
231   ClearSet(set,maxt);
232   IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
233   WHILE loc<>0 DO (*for all alternatives*)
234     Mark(loc,m);
235     GetNode(loc,gn);
236     IF ddt["G"] THEN
237       WriteString(con,"CollectFirstSet:");
238       WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),6);
239       WriteCard(con,gn.sp,6); WriteLn(con);
240     END;
241     IF DelNode(gn) THEN
242       CollectFirstSet(gn.rp,s1); Unit(set,s1,maxt);
243     END;
244     CASE gn.typ OF
245       eps: ;
246     | t:   SetBit(set,gn.sp);
247     | nt:  IF first[gn.sp-fnt].ready
248            THEN Unit(set,first[gn.sp-fnt].ts,maxt);
249            ELSE
250              GetSy(gn.sp,sn);
251              CollectFirstSet(sn.start,s1); Unit(set,s1,maxt);
252            END;
253     | any: AllBit(set);
254     END; (*CASE*)
255     loc:=gn.lp;
256   END; (*WHILE*)
257 END CollectFirstSet;
258
259 BEGIN (*GetFirstSet*)
260   ClearMarkList(m);
261   CollectFirstSet(loc,set);
262   IF ddt["H"] THEN
263     WriteString(con,"GetFirstSet:"); PrintSet(set,maxt);
264   END;
265 END GetFirstSet;
266
267
268 (* GetFollowSets     Get terminal successors of nonterminals
269 *)
270 PROCEDURE GetFollowSets;
271 VAR
272   change: BOOLEAN;
273   i,n,n1: CARDINAL;
274   m:      Marklist;
275   sn:     Symbolnode;
276
277 PROCEDURE CollectFollowSets(loc,sym:CARDINAL);
278 VAR
279   gn:  Graphnode;
280   set: Symbolset;
281 BEGIN
282   WHILE loc<>0 DO (*step through alternative chain*)
283     IF Marked(loc,m) THEN RETURN; END; (*cycle*)
284     Mark(loc,m);
285     GetNode(loc,gn);
286     WITH gn DO
287       IF ddt["J"] THEN
288         WriteString(con,"CollectFollowSets ");
289         WriteCard(con,loc,6); WriteCard(con,sp,6);
290         WriteCard(con,sym,6); WriteLn(con);
291       END;
292       IF typ=nt THEN
293         GetFirstSet(rp,set);
294         Unit(follow[sp-fnt].ts,set,maxt);
295         IF Deletable(rp) THEN
296           SetBit(follow[sp-fnt].nts,sym-fnt);
297         END;
298         IF ddt["I"] THEN
299           WriteString(con,"CollectFollowSets:");
300           WriteCard(con,loc,6);
301           WriteString(con,"$ "); PrintSet(follow[sp-fnt].ts,maxt);
302           WriteString(con,"$ ");
303           PrintSet(follow[sp-fnt].nts,maxs-maxp);
304           WriteLn(con);
305         END;
306       END; (*IF typ=nt*)
307       CollectFollowSets(rp,sym);
308       loc:=lp;
309     END; (*WITH*)
310   END; (*WHILE*)
311 END CollectFollowSets;
312
313 PROCEDURE Complete(i:CARDINAL); (*add indirect successors of*)
314 VAR j: CARDINAL;                (*i+fnt to follow[i].ts*)
315 BEGIN
316   IF Marked(i,m) THEN RETURN; END; (*already visited*)
317   Mark(i,m);
318   FOR j:=0 TO maxs-fnt DO
319     IF IsInSet(j,follow[i].nts) THEN
320       Complete(j);
321       Unit(follow[i].ts,follow[j].ts,maxt);
322     END;
323   END;
324 END Complete;
325
326 BEGIN (*GetFollowSets*)
327   FOR i:=fnt TO maxs DO
328     ClearSet(follow[i-fnt].ts,maxt);
329     ClearSet(follow[i-fnt].nts,maxs-fnt);
330   END;
331
332   ClearMarkList(m);
333   FOR i:=fnt TO maxs DO (*get direct successors of nonterminals*)
334     GetSy(i,sn);
335     IF ddt["I"] THEN
336       WriteString(con,"GetFollowSets(0):"); WriteCard(con,sn.start,6);
337       WriteCard(con,i,6); WriteLn(con);
338     END;
339     CollectFollowSets(sn.start,i);
340   END;
341   CollectFollowSets(rootloc,maxs+1); (*successors of grammar symbol*)
342
343   FOR i:=0 TO maxs-fnt DO (*add indirect successors to follow.ts*)
344     ClearMarkList(m);
345     Complete(i);
346     ClearSet(follow[i].nts,maxt);
347   END;
348
349   IF ddt["I"] THEN
350     WriteString(con,"GetFollowSets(3):$");
351     FOR i:=0 TO maxs-fnt DO
352       WriteCard(con,fnt+i,6); PrintSet(follow[i].ts,maxt);
353       WriteLn(con);
354     END;
355   END;
356 END GetFollowSets;
357
358
359 (* GetFo     Get follow-set of nonterminal sy
360 *)
361 PROCEDURE GetFo(sy:CARDINAL; VAR set:Symbolset);
362 BEGIN set:=follow[sy-fnt].ts; END GetFo;
363
364
365 (* GetMacroNr     Get semantic macro
366 *)
367 PROCEDURE GetMacroNr(spix:CARDINAL; VAR sem:CARDINAL);
368 VAR p: Macroptr;
369 BEGIN
370   p:=firstmacro;
371   WHILE (p<>NIL) AND (p^.spix<>spix) DO p:=p^.next; END;
372   IF p=NIL THEN sem:=0; ELSE sem:=p^.sem; END;
373 END GetMacroNr;
374
375
376 (* GetSy     Gets the symbol sy
377 *)
378 PROCEDURE GetSy(sy:CARDINAL; VAR sn1:Symbolnode);
379 BEGIN sn1:=sn[sy]; END GetSy;
380
381
382 (* GetSymbolSets     Get first-sets, follower-sets, eps-sets and any-sets
383 *)
384 PROCEDURE GetSymbolSets;
385 VAR
386   i:  CARDINAL;
387   sn: Symbolnode;
388 BEGIN
389   fnt:=maxp+1;
390   FOR i:=0 TO maxs-fnt DO first[i].ready:=FALSE; END;
391   FOR i:=fnt TO maxs DO
392     GetSy(i,sn);
393     GetFirstSet(sn.start,first[i-fnt].ts);
394     first[i-fnt].ready:=TRUE;
395   END;
396   GetFollowSets;
397   GetEpsSets;
398   GetAnySets;
399   IF ddt["K"] THEN (*print first-sets and follow-sets*)
400     WriteLn(lst);
401     WriteString(lst,"List of terminal start symbols:"); WriteLn(lst);
402     FOR i:=fnt TO maxs DO
403       PutNt(i); PutTermSet(first[i-fnt].ts);
404     END;
405     WriteLn(lst); WriteLn(lst);
406     WriteString(lst,"List of terminal successors:"); WriteLn(lst);
407     FOR i:=fnt TO maxs DO
408       PutNt(i); PutTermSet(follow[i-fnt].ts);
409     END;
410   END;
411 END GetSymbolSets;
412
413
414 (* NewAt     Enter new attribute for a symbol
415 *)
416 PROCEDURE NewAt(sy,spx:CARDINAL; dir:Direction);
417 VAR
418   i:    CARDINAL;
419   p,at: Attributeptr;
420 BEGIN
421   WITH sn[sy] DO
422     INC(nra);
423     IF typ=nt THEN (*store name and direction*)
424       Allocate(at,SIZE(Attribute));
425       at^.spix:=spx; at^.dir:=dir; at^.next:=NIL;
426       IF firstat=NIL
427       THEN firstat:=at;
428       ELSE
429         p:=firstat; WHILE p^.next<>NIL DO p:=p^.next END;
430         p^.next:=at;
431       END;
432     END;
433   END;
434 END NewAt;
435
436
437 (* NewMacro     Enter new semantic macro
438 *)
439 PROCEDURE NewMacro(spix,sem:CARDINAL; VAR ok:BOOLEAN);
440 VAR p,s: Macroptr;
441 BEGIN
442   p:=firstmacro;
443   WHILE (p<>NIL) AND (p^.spix<>spix) DO p:=p^.next; END;
444   IF p=NIL
445   THEN
446     ok:=TRUE;
447     Allocate(s,SIZE(Macronode));
448     s^.spix:=spix; s^.sem:=sem; s^.next:=NIL;
449     IF firstmacro=NIL
450     THEN firstmacro:=s; lastmacro:=s;
451     ELSE lastmacro^.next:=s; lastmacro:=s;
452     END;
453   ELSE ok:=FALSE;
454   END;
455 END NewMacro;
456
457
458 (* NewSy     Generate a new symbol and return index
459 *)
460 PROCEDURE NewSy(spx:CARDINAL; tp:Symboltype): CARDINAL;
461 VAR i: CARDINAL;
462 BEGIN
463   IF maxs=null THEN maxs:=0; ELSE INC(maxs); END;
464   IF maxs>=maxsymbols THEN Restriction(6); END;
465   WITH sn[maxs] DO
466     typ:=tp; spix:=spx; aliasspix:=spix; nra:=0;
467     CASE typ OF
468       t:
469         IF maxt=null THEN maxt:=0; ELSE INC(maxt); END;
470         IF maxp=null THEN maxp:=0; ELSE INC(maxp); END;
471         IF maxt>=maxterminals THEN Restriction(7); END;
472     | pr:
473         IF maxp=null
474         THEN SemErr(25,line,col); maxp:=0; maxt:=0;
475         ELSE INC(maxp);
476         END;
477         sem1:=0; sem2:=0;
478     | nt,err:
479         start:=0; del:=FALSE; firstat:=NIL;
480     END; (*CASE*)
481   END; (*WITH*)
482   RETURN maxs;
483 END NewSy;
484
485
486 (* RepSy     Replace symbol sy
487 *)
488 PROCEDURE RepSy(sy:CARDINAL; sn1:Symbolnode);
489 BEGIN sn[sy]:=sn1; END RepSy;
490
491
492 (* SyNr     Gets index of name spix
493 *)
494 PROCEDURE SyNr(spix:CARDINAL): CARDINAL;
495 VAR i: CARDINAL;
496 BEGIN
497   IF maxs=null THEN RETURN null; END;
498   i:=0;
499   WHILE (i<=maxs) AND (sn[i].spix<>spix) DO INC(i); END;
500   IF i>maxs THEN i:=null; END;
501   RETURN i;
502 END SyNr;
503
504
505 (* AllBit     Set all bits in set s
506 *)
507 PROCEDURE AllBit(VAR s:Symbolset);
508 VAR i: CARDINAL;
509 BEGIN
510   FOR i:=0 TO maxterminals DIV 16 DO s[i]:=VAL(BITSET,65535); END;
511 END AllBit;
512
513
514 (* ClearSet     Clears set s
515 *)
516 PROCEDURE ClearSet(VAR s:Symbolset; n:CARDINAL);
517 VAR i: CARDINAL;
518 BEGIN FOR i:=0 TO n DIV 16 DO s[i]:={}; END; END ClearSet;
519
520
521 (* DelBit     Deletes bit n in set s
522 *)
523 PROCEDURE DelBit(VAR s:Symbolset; n:CARDINAL);
524 BEGIN EXCL(s[n DIV 16], n MOD 16); END DelBit;
525
526
527 (* Empty     TRUE if set s is empty
528 *)
529 PROCEDURE Empty(VAR s:Symbolset; n:CARDINAL):BOOLEAN;
530 VAR i: CARDINAL;
531 BEGIN
532   FOR i:=0 TO n DIV 16 DO
533     IF s[i]<>{} THEN RETURN FALSE; END;
534   END;
535   RETURN TRUE;
536 END Empty;
537
538
539 (* InSet     TRUE if s1 <= s2
540 *)
541 PROCEDURE InSet(VAR s1,s2:Symbolset; n:CARDINAL):BOOLEAN;
542 VAR i: CARDINAL;
543 BEGIN
544   FOR i:=0 TO n DIV 16 DO
545     IF NOT(s1[i]<=s2[i]) THEN RETURN FALSE; END;
546   END;
547   RETURN TRUE;
548 END InSet;
549
550
551 (* IsInSet     TRUE if n is in set s
552 *)
553 PROCEDURE IsInSet(n:CARDINAL; VAR s:Symbolset):BOOLEAN;
554 BEGIN RETURN (n MOD 16) IN s[n DIV 16]; END IsInSet;
555
556
557 (* PrintSet     ddt output of set s
558 *)
559 PROCEDURE PrintSet(VAR s:Symbolset; n:CARDINAL);
560 VAR i: CARDINAL;
561 BEGIN
562   FOR i:=0 TO n DIV 16 DO
563     WriteCard(con,VAL(CARDINAL,s[i]) DIV 256,4);
564     WriteCard(con,VAL(CARDINAL,s[i]) MOD 256,4);
565   END;
566 END PrintSet;
567
568
569 (* PutNt     Print name of nonterminal sy
570 *)
571 PROCEDURE PutNt(sy:CARDINAL);
572 VAR
573   l:    CARDINAL;
574   name: ARRAY[1..50] OF CHAR;
575   sn:   Symbolnode;
576 BEGIN
577   GetSy(sy,sn); GetName(sn.spix,name,l);
578   WHILE l<12 DO INC(l); name[l]:=" "; END;
579   WriteLn(lst);
580   WriteString(lst,"  "); WriteText(lst,name,l); Write(lst," ");
581   column:=15;
582 END PutNt;
583
584
585 (* PutTermSet     Print names of terminals in set s
586 *)
587 PROCEDURE PutTermSet(VAR s:Symbolset);
588 CONST maxlinelen = 72;
589 VAR
590   i,l:  CARDINAL;
591   name: ARRAY[1..50] OF CHAR;
592   sn:   Symbolnode;
593 BEGIN
594   FOR i:=0 TO maxt DO
595     IF IsInSet(i,s) THEN
596       GetSy(i,sn); GetName(sn.spix,name,l);
597       IF column+l>maxlinelen THEN
598         WriteLn(lst); WriteString(lst,"               ");
599         column:=15;
600       END;
601       WriteText(lst,name,l); WriteString(lst,"  ");
602       INC(column,l+2);
603     END; (*IF IsInSet*)
604   END; (*FOR*)
605   WriteLn(lst);
606 END PutTermSet;
607
608
609 (* SetBit     Sets bit n in set s
610 *)
611 PROCEDURE SetBit(VAR s:Symbolset; n:CARDINAL);
612 BEGIN INCL(s[n DIV 16],n MOD 16); END SetBit;
613
614
615 (* Unit     s1 := s1 + s2
616 *)
617 PROCEDURE Unit(VAR s1,s2:Symbolset; n:CARDINAL);
618 VAR i: CARDINAL;
619 BEGIN FOR i:=0 TO n DIV 16 DO s1[i]:=s1[i]+s2[i]; END; END Unit;
620
621
622 BEGIN (*cocosym*)
623   maxt:=null; maxp:=null; maxs:=null; firstmacro:=NIL;
624   maxany:=0; maxeps:=0;
625 END cocosym.

aliasspix          466
AllBit             61 253 507 511
Allocate           19 424 447
any                133 253
Anyset             31 50
anyset             50 120 139
anysetsize         23 31
at                 419 424 425 425 425 427 430
Attribute          424
Attributeptr       151 419
change             78 87 91 94 272
ClearMarkList      13 206 260 332 344
ClearSet           231 328 329 346 516 518
cocogra            13
cocolex            15
cocolst            16
cocosym            12 625
col                15 474
CollectFirstSet    225 242 251 257 261
CollectFollowSets  277 307 311 339 341
column             51 581 597 599 602
CompErr            17 153
Complete           313 320 324 345
CompleteAt         70 71
con                18 237 238 238 239 239 263 288 289 289 290 290
                   299 300 301 302 304 336 336 337 337 350 352 353
                   563 564
curnt              174 207 208 209
ddt                15 236 262 287 298 335 349 399
del                90 91 99 479
DelBit             62 138 523 524
Deletable          13 90 182 295
DelNode            13 241
dir                148 155 159 159 416 425 425
Direction          148 416
down               155
dummy              79
Empty              529 536
eofsy              28 138
eps                196 245
Epsset             32 52
epsset             52 167 183
epssetsize         24 32
err                71 153 154 478
Errors             17
EXCL               524
FileIO             18
FindDelSymbols     76 114
FindEps            187 200 201 203 209
FindEpsFollowers   178 185 197
first              53 80 96 100 103 109 217 247 248 390 393 394
                   403
firstat            157 426 427 429 479
firstmacro         54 370 442 449 450 623
Firstset           37 53
fnt                55 85 182 217 247 248 294 296 296 301 303 318
                   327 328 329 329 333 343 351 352 362 389 390 391
                   393 394 402 403 407 408
follow             56 182 294 296 301 303 319 321 321 328 329 346
                   352 362 408
Followset          33 56
FORWARD            61 62 63 64 65
GetA               119 120
GetAnySets         125 143 398
GetAt              148 161
GetE               166 167
GetEpsSets         172 211 397
GetF               216 217
GetFirstSet        134 181 222 265 293 393
GetFo              361 362
GetFollowSets      270 356 396
GetMacroNr         367 373
GetName            15 105 577 596
GetNode            14 132 194 235 285
GetSy              89 98 208 250 334 378 379 392 577 596
GetSymbolSets      384 411
gn                 127 132 133 133 134 140 140 189 194 195 198 227
                   235 238 239 241 242 244 246 247 248 250 255 279
                   285 286
Graphnode          14 127 189 227 279
i                  81 88 89 91 97 98 128 135 136 136 150 158
                   273 313 316 317 319 321 327 328 329 333 334 337
                   339 343 345 346 351 352 352 386 390 390 391 392
                   393 394 402 403 403 407 408 408 418 461 495 498
                   499 499 499 500 500 501 508 510 510 517 518 518
                   530 532 533 542 544 545 545 560 562 563 564 590
                   594 595 596 618 619 619 619 619
INCL               612
InSet              541 548
IsInSet            319 553 554 595
j                  314 318 319 320 321
l                  81 105 106 573 577 578 578 578 580 590 596 597
                   601 602
lastmacro          57 450 451 451
leftsy             178 182 187 197 200 201
line               15 474
loc                128 131 132 140 178 181 182 187 192 192 193 194
                   198 222 225 232 232 233 234 235 238 255 261 277
                   282 283 284 285 289 300 308
lp                 133 134 196 200 200 255 308
lst                16 101 101 102 102 106 106 106 110 110 111 112
                   400 401 401 405 405 406 406 579 580 580 580 598
                   598 601 601 605
m                  175 192 193 206 223 232 234 260 274 283 284 316
                   317 332 344
Macronode          41 42 447
Macroptr           41 45 54 57 368 440
Mark               14 193 234 284 317
Marked             14 192 232 283 316
Marklist           14 175 223 274
maxany             139 139 140 624
maxeps             183 183 184 624
maxlinelen         588 597
maxn               13 131
maxnt              26 33 37
maxp               85 88 97 207 303 389 470 470 470 473 474 475
                   623
maxs               88 97 207 303 318 327 329 333 341 343 351 390
                   391 402 407 463 463 463 464 465 482 497 499 500
                   623
maxsymbols         25 47 464
maxt               135 182 231 242 248 251 263 294 301 321 328 346
                   352 469 469 469 471 474 594 623
maxterminals       471 510
n                  62 63 273 516 518 523 524 524 529 532 541 544
                   553 554 554 559 562 611 612 612 617 619
n1                 273
name               82 105 106 574 577 578 580 591 596 601
NewAt              416 434
NewMacro           439 455
NewSy              460 483
next               45 158 371 425 429 429 430 443 448 451
nr                 70 71 119 120 148 154 158 166 167 178 184 190
                   197 198
nra                71 154 422 466
nt                 153 247 292 423 478
nts                35 296 303 319 329 346
null               27 463 469 470 473 497 497 500 623 623 623
ok                 439 446 453
p                  151 157 158 158 159 159 368 370 371 371 371 371
                   372 372 419 429 429 429 429 430 440 442 443 443
                   443 443 444
pr                 472
PrintSet           63 263 301 303 352 559 566
PutNt              64 403 408 571 582
PutTermSet         65 403 408 587 606
ready              39 247 390 394
RepNode            14 140 198
RepSy              91 488 489
Restriction        17 464 471
rootloc            13 341
rp                 197 201 201 242 293 295 307
s                  61 62 63 65 119 120 129 134 136 136 138 139
                   166 167 179 181 182 183 216 217 440 447 448 448
                   448 450 450 451 451 507 510 516 518 523 524 529
                   533 553 554 559 563 564 587 595 611 612
s1                 229 242 242 251 251 541 545 617 619 619
s2                 541 545 617 619
sem                44 367 372 372 372 439 448 448
sem1               477
sem2               477
SemErr             17 474
set                222 225 231 242 246 248 251 253 261 263 280 293
                   294 361 362
SetBit             246 296 611 612
sn                 58 71 71 83 89 90 90 90 91 91 98 99
                   105 153 153 154 154 157 176 208 209 228 250 251
                   275 334 336 339 379 387 392 393 421 465 489 499
                   575 577 577 592 596 596
sn1                378 379 488 489
sp                 140 198 239 246 247 248 250 289 294 296 301 303
spix               43 105 148 155 159 159 367 371 371 425 439 443
                   443 448 448 466 466 494 499 499 577 596
spx                416 425 460 466
start              90 90 209 251 336 339 393 479
sy                 64 70 71 71 148 153 153 154 154 157 216 217
                   361 362 378 379 416 421 488 489 571 577
sym                277 290 296 307
Symbollist         47 58
Symbolnode         47 83 176 228 275 378 387 488 575 592
Symbolset          31 32 34 35 38 61 62 63 65 119 129 166
                   179 216 222 225 229 280 361 507 516 523 529 541
                   553 559 587 611 617
Symboltype         460
SyNr               494 502
System             19
SYSTEM             20
t                  246 468
tp                 460 466
ts                 34 38 182 217 248 294 301 321 321 328 352 362
                   393 403 408
typ                71 133 153 153 154 196 238 244 292 423 466 467
Unit               182 242 248 251 294 321 617 619
VAL                20 136 510 563 564
vialp              187 196
Write              18 580
WriteCard          18 238 238 239 289 289 290 300 336 337 352 563
                   564
WriteLn            18 101 101 102 106 110 110 112 239 290 304 337
                   353 400 401 405 405 406 579 598 605
WriteString        18 102 106 111 237 263 288 299 301 302 336 350
                   401 406 580 598 601
WriteText          18 106 580 601
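The last step of GetFollowSets adds indirect successors: for each nonterminal i, the local procedure Complete visits every nonterminal j recorded in follow[i].nts, completes j first, and then adds follow[j].ts to follow[i].ts, using a mark list to stop at nodes already visited. The following stand-alone sketch shows that closure step on a tiny hand-made table. It is an illustration only, not Coco code: the module name FollowDemo, the single-word BITSET fields, the sample data and the use of the standard library module InOut are all assumptions.

```modula-2
MODULE FollowDemo;
(* Hypothetical demo, not part of Coco: closure of follow sets over the
   "nts" relation, in the style of the procedure Complete in cocosym. *)

FROM InOut IMPORT WriteString, WriteCard, WriteLn;

CONST
  maxnt = 3;   (* nonterminals are numbered 0..maxnt *)

TYPE
  Entry = RECORD
            ts:  BITSET;  (* terminal successors found so far *)
            nts: BITSET;  (* nonterminals whose successors must be added *)
          END;

VAR
  follow: ARRAY [0..maxnt] OF Entry;
  marked: BITSET;          (* plays the role of the mark list *)
  k, t:   CARDINAL;

PROCEDURE Complete(i: CARDINAL);
VAR j: CARDINAL;
BEGIN
  IF i IN marked THEN RETURN END;   (* already visited *)
  INCL(marked, i);
  FOR j := 0 TO maxnt DO
    IF j IN follow[i].nts THEN
      Complete(j);                  (* complete j before using it *)
      follow[i].ts := follow[i].ts + follow[j].ts
    END
  END
END Complete;

BEGIN
  (* sample data: direct successors and dependencies, chosen by hand *)
  follow[0].ts := BITSET{0};  follow[0].nts := BITSET{};
  follow[1].ts := BITSET{1};  follow[1].nts := BITSET{0};
  follow[2].ts := BITSET{};   follow[2].nts := BITSET{1};
  follow[3].ts := BITSET{2};  follow[3].nts := BITSET{2};

  FOR k := 0 TO maxnt DO
    marked := BITSET{};             (* fresh mark list per start symbol *)
    Complete(k)
  END;

  FOR k := 0 TO maxnt DO            (* print the completed sets *)
    WriteCard(k, 2); WriteString(":");
    FOR t := 0 TO 15 DO
      IF t IN follow[k].ts THEN WriteCard(t, 3) END
    END;
    WriteLn
  END
END FollowDemo.
```

With this sample data the completed sets should be {0} for nonterminal 0, {0,1} for nonterminals 1 and 2, and {0,1,2} for nonterminal 3. Unlike the listing, the sketch does not clear the nts set after each call of Complete. It only shows the marking and the order in which the sets are merged.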
  1 (* General table-driven syntax analyzer
  2    ====================================
  3    This is a parser module generated by Coco from an attributed grammar.
  4    Before calling the procedure Parse from the main program, initialize
  5    the scanner (<grammarname>lex.MOD).
  6 *)
  7 DEFINITION MODULE cocosyn;
  8 VAR
  9   printinput: BOOLEAN; (*trace the input tokens read*)
 10   printnodes: BOOLEAN; (*trace the G-code interpretation*)
 11
 12 PROCEDURE Parse(VAR correct:BOOLEAN);
 13 END cocosyn.
App.F cocosynMOD

  1
  2 (* General table-driven syntax analyzer                      Re
  3    ====================================                      Moe 21.12.83
  4    01 (21.12.83) First version (rewritten from PL/M)
  5    02 (28.02.84) New interface for input and errors
  6    03 (02.04.84) Error in EOL-processing corrected
  7    04 (08.05.84) New EOL-processing
  8    05 (23.07.84) For G-code
  9    06 (30.08.84) Error recovery simplified
 10    07 (05.04.85) New G-code instruction EPSA (ANYA modified)
 11    08 (12.04.87) Grammar tables initialized INLINE
 12    09 (12.04.87) typ,col,line and at exported by cocolex
 13    10 (07.06.87) Name of error module and scanner procedure constant
 14 *)
 15 IMPLEMENTATION MODULE cocosyn;
 16
 17 FROM Errors IMPORT SyntaxError, Errorptr, Errornode;
 18 FROM FileIO IMPORT con, WriteCard, WriteLn, WriteString;
 19 FROM System IMPORT Allocate;
 20 FROM SYSTEM IMPORT ADDRESS, ADR, INLINE;
 21
 22 FROM cocosem IMPORT Semant;
 23 FROM cocolex IMPORT GetSy, typ, at, line, col;
 24
 25 CONST
 26   maxname  = 385;
 27   maxnamep = 45;
 28   maxcode  = 401;
 29   maxany   = 3;
 30   maxeps   = 10;
 31   maxt     = 34;
 32   maxp     = 34;
 33   maxs     = 45;
 34   startpc  = 397;
 35
 36
 37
 38 CONST (*G-code instructions*)
 39   t   = 0;  ta   = 1;  nt  = 2;   nta  = 3;
 40   nts = 4;  ntas = 5;  any = 6;   anya = 7;
 41   eps = 8;  epsa = 9;  jmp = 10;  ret  = 11;
 42
 43   errdistmin = 2;   (*min.distance between two errors*)
 44   lmaxs      = 50;  (*max.stack length*)
 45   eofsy      = 0;   (*token number of endfile symbol*)
 46
 47 TYPE
 48   Attributenumbers = ARRAY[0..maxp] OF CARDINAL;
 49   Namepointers     = ARRAY[0..maxnamep] OF CARDINAL;
 50   Namelist         = ARRAY[1..maxname] OF CHAR;
 51   Pragma           = RECORD (*semantics for a pragma*)
 52                        sem2,sem3: CARDINAL;
 53                      END;
 54   Pragmalist       = ARRAY[maxt..maxp] OF Pragma;
 55   Symbolset        = ARRAY[0..maxt DIV 16] OF BITSET;
 56                      (*set of terminals*)
 57   Symbolnode       = RECORD (*symbol information (only for nt)*)
 58                        startpc: CARDINAL; (*start node of rule for nt*)
 59                        del:     BOOLEAN;  (*TRUE, if nt is deletable*)
 60                        first:   Symbolset;(*terminals causing to analyze this nt*)
 61                      END;
 62   Symbollist       = ARRAY[maxp+1..maxs] OF Symbolnode;
 63   Stack            = ARRAY[1..lmaxs] OF CARDINAL;
 64
 65 VAR
 66   tab: POINTER TO RECORD (*grammar tables*)
 67     header:    ARRAY[1..8] OF CARDINAL;       (*not used*)
 68     code:      ARRAY[1..maxcode] OF CHAR;     (*G-code area*)
 69     ntsymbols: Symbollist;                    (*nonterminals information*)
 70     epsset:    ARRAY[1..maxeps] OF Symbolset;
 71     anyset:    ARRAY[1..maxany] OF Symbolset;
 72     nra:       Attributenumbers;              (*no.of attributes*)
 73     ps:        Pragmalist;                    (*semantics for pragmas*)
 74     namep:     Namepointers;                  (*pointers to symbol names*)
 75     name:      Namelist;                      (*symbol names*)
 76   END;
 77   correct:  BOOLEAN;                          (*error indicator*)
 78   pc:       CARDINAL;                         (*program counter*)
 79   errdist:  CARDINAL;                         (*current error distance*)
 80   newlacts: ARRAY [0..maxt] OF CARDINAL;      (*new stack length*)
 81   newpc:    ARRAY [0..maxt] OF CARDINAL;      (*pc after recovery*)
 82   s,olds:   Stack;
 83   lacts:    CARDINAL;                         (*stack pointer*)
 84
 85
 86
 87 PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc: CARDINAL);
 88 FORWARD;
 89 PROCEDURE RestoreStack; FORWARD;
 90 PROCEDURE SaveStack; FORWARD;
 91 PROCEDURE StackElem(i:CARDINAL): CARDINAL; FORWARD;
 92 PROCEDURE Triple(altroot:CARDINAL); FORWARD;
 93
 94
 95 (* Match     Check if sy is member of the specified set
 96 *)
 97 PROCEDURE Match(sy:CARDINAL; set:Symbolset): BOOLEAN;
 98 BEGIN RETURN (sy MOD 16) IN set[sy DIV 16]; END Match;
 99
100
101 (* NextSym     Get next symbol
102 *)
103 PROCEDURE NextSym;
104 BEGIN
105   LOOP
106     GetSy;
107     (*IF printinput THEN
108       WriteString(con,"$(in:"); WriteCard(con,typ,3);
109       WriteString(con,") ");
110       IF printnodes THEN
111         WriteCard(con,lacts,3); WriteString(con," ");
112       END;
113     END;*)
114     IF typ<=maxt THEN RETURN END;
115     WITH tab^ DO
116       IF correct AND (ps[typ].sem2<>0) THEN Semant(ps[typ].sem2); END;
117       IF correct AND (ps[typ].sem3<>0) THEN Semant(ps[typ].sem3); END;
118     END;
119     IF typ=eofsy THEN RETURN END;
120   END;
121 END NextSym;
122
123
124
125 (*============================== ERRORS ================================*)
126
127 (* AdjustPc     Adjust pc to next symbol instruction
128 *)
129 PROCEDURE AdjustPc(VAR pc:CARDINAL);
130 BEGIN
131   WITH tab^ DO
132     IF pc=0 THEN RETURN; END;
133     LOOP
134       CASE ORD(code[pc]) OF
135         t,ta,nt,nta,nts,ntas,any,anya,eps,epsa: EXIT;
136       | jmp: pc:=256*ORD(code[pc+1])+ORD(code[pc+2]);
137       | ret: pc:=0; EXIT;
138       ELSE INC(pc); (*sem*)
139       END;
140     END;
141   END;
142 END AdjustPc;
143
144
145 (* Error     Report syntax error
146 *)
147 PROCEDURE Error(VAR pc,altroot:CARDINAL);
148 VAR
149   e,e1,h: Errorptr;
150   i,j: CARDINAL;
151   opcode,sy,nextpc,altpc,pc1: CARDINAL;
152
153 PROCEDURE GiveName(q:Errorptr; sy:CARDINAL);
154 VAR p,j: CARDINAL;
155 BEGIN
156   WITH tab^ DO
157     p:=namep[sy]; j:=0;
158     WHILE (j<25) AND (name[p+j]<>0C) DO
159       INC(j); q^.txt[j]:=name[p+j-1];
160     END;
161     q^.l:=j;
162   END;
163 END GiveName;
164
165 BEGIN (*Error*)
166   correct:=FALSE;
167   IF errdist >= errdistmin
168   THEN
169     Allocate(h,SIZE(Errornode)); GiveName(h,typ); (*pass near-symbol*)
170     h^.next:=NIL; e1:=h;
171     pc1:=altroot; AdjustPc(pc1);
172     WHILE pc1>0 DO
173       GetSymInstr(pc1,opcode,sy,nextpc,altpc);
174       IF opcode<any THEN (*t,nt,nts,ta,nta,ntas*)
175         Allocate(e,SIZE(Errornode));
176         GiveName(e,sy); (*pass expected symbol*)
177         e1^.next:=e; e1:=e; e^.next:=NIL;
178       END;
179       pc1:=altpc;
180     END; (*WHILE*)
181     SyntaxError(h,line,col);
182     Triple(altroot); SaveStack;
183     IF printnodes THEN
184       WriteString(con,"$ typ newpc newlacts$");
185       FOR i:=0 TO maxt DO
186         IF newpc[i]<>0 THEN
187           WriteCard(con,i,5); WriteCard(con,newpc[i],10);
188           WriteCard(con,newlacts[i],10); WriteLn(con);
189         END; (*IF*)
190       END; (*FOR*)
191     END; (*IF*)
192   ELSE RestoreStack;
193   END;
194   WHILE newpc[typ]=0 DO
195     IF printnodes THEN
196       WriteString(con,"$(skip:"); WriteCard(con,typ,0);
197       WriteString(con,") ");
198     END;
199     NextSym;
200   END;
201   pc:=newpc[typ]; altroot:=pc; lacts:=newlacts[typ]; errdist:=0;
202 END Error;
203
204
205 (* Fill     Fill triple list with alt-chain starting at pc
206 *)
207 PROCEDURE Fill(pc,lacts:CARDINAL);
208 VAR
209   i,opcode,sy,nextpc,altpc: CARDINAL;
210   s: Symbolset;
211 BEGIN
212   AdjustPc(pc);
213   WHILE pc<>0 DO
214     GetSymInstr(pc,opcode,sy,nextpc,altpc);
215     CASE opcode OF
216       t,ta:
217         newpc[sy]:=pc; newlacts[sy]:=lacts;
218     | nt,nta,nts,ntas:
219         s:=tab^.ntsymbols[sy].first;
220         FOR i:=0 TO maxt DO
221           IF Match(i,s) THEN newpc[i]:=pc; newlacts[i]:=lacts; END;
222         END;
223         IF tab^.ntsymbols[sy].del THEN Fill(nextpc,lacts); END;
224     | eps,epsa:
225         Fill(nextpc,lacts);
226     ELSE (*any,anya: nothing*)
227     END; (*CASE*)
228     pc:=altpc;
229   END; (*WHILE*)
230 END Fill;
231
232
233 (* FillSucc     Fill triple list with succ. of alt-chain at pc
234 *)
235 PROCEDURE FillSucc(pc,lacts:CARDINAL);
236 VAR
237   opcode,sy,nextpc,altpc: CARDINAL;
238 BEGIN
239   AdjustPc(pc);
240   WHILE pc>0 DO (*fill with successors of alternative-starts*)
241     GetSymInstr(pc,opcode,sy,nextpc,altpc);
242     IF nextpc<>0 THEN Fill(nextpc,lacts); END;
243     pc:=altpc;
244   END; (*WHILE*)
245 END FillSucc;
246
247
248 (* GetSymInstr     Get G-code instruction at address pc
249 *)
250 PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc: CARDINAL);
251 BEGIN (*assert: pc points to a symbol instruction (not RET,JMP,SEM,ANY)*)
252   WITH tab^ DO
253     opcode:=ORD(code[pc]);
254     IF (opcode<=epsa) AND (opcode<>any)
255     THEN sy:=ORD(code[pc+1]);
256     ELSE sy:=0;
257     END;
258     CASE opcode OF
259       t,nt,eps:
260         nextpc:=pc+2; altpc:=0;
261     | ta,nta,anya,epsa:
262         nextpc:=pc+4; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
263     | nts:  nextpc:=pc+3; altpc:=0;
264     | ntas: nextpc:=pc+5; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
265     | any:  nextpc:=pc+1; altpc:=0;
266     END; (*CASE*)
267     AdjustPc(nextpc); AdjustPc(altpc);
268   END;
269   (*assert: nextpc,altpc point to symbol instructions or are zero*)
270 END GetSymInstr;
271
272
273 (* Triple     Fill triple list
274 *)
275 PROCEDURE Triple(altroot:CARDINAL);
276 VAR i: CARDINAL;
277 BEGIN
278   FOR i:=0 TO maxt DO (*clear triple list*)
279     newpc[i]:=0; newlacts[i]:=0;
280   END;
281   FOR i:=1 TO lacts DO (*fill with succ.of stacked nt's*)
282     (*s[1] contains successor at level 0*)
283     FillSucc(StackElem(i),i-1);
284     Fill(StackElem(i),i-1);
285   END;
286   FillSucc(altroot,lacts); (*fill with succ.of alt-chain*)
287   Fill(altroot,lacts); (*fill with current alt-chain*)
288 END Triple;
289
290 (*============================ END ERRORS ==============================*)
291
292
293
294 (*============================ SYNTAXSTACK =============================*)
295
296  PROCEDURE Pop(VAR loc: CARDINAL);
297  BEGIN
298    IF lacts>0
299      THEN loc:=s[lacts]; DEC(lacts);
300      ELSE WriteString(con,"-- Parser stack underflow.$"); HALT;
301    END;
302    (*IF printnodes THEN WriteString(con," pop"); END;*)
303  END Pop;
304
305  PROCEDURE Push(loc: CARDINAL);
306  BEGIN
307    IF lacts<lmaxs
308      THEN INC(lacts); s[lacts]:=loc;
309      ELSE WriteString(con,"-- Parser stack overflow.$"); HALT;
310    END;
311    (*IF printnodes THEN WriteString(con," push"); END;*)
312  END Push;
313
314  PROCEDURE RestoreStack;
315  BEGIN s:=olds; END RestoreStack;
316
317  PROCEDURE SaveStack;
318  BEGIN olds:=s; END SaveStack;
319
320  PROCEDURE StackElem(i:CARDINAL): CARDINAL;
321  BEGIN RETURN s[i]; END StackElem;
322
323  (*========================= END SYNTAXSTACK ==========================*)
324
325
326  (* TableContents    A dirty trick to initialize the grammar tables
327  *)
328  PROCEDURE TableContents;
329  BEGIN (*%% dont remove or change this comment*)
330    INLINE(
331       401,    34,    34,    45,    10,     3,    45,   385,
332    (*--G-code--*)
333         7,    17,  5398,   271,    22,     3,  4359,   256,  5648,  2560,
334      3592,   279,   265,    36,   811,    36,  2560,  7424,  4120,   812,
335        56,  5125,  9984, 12569,   813,    39,  2560,  9985,  3072, 20506,
336       812,    80,  5125,  9984, 18459,  7171, 10752, 15645,  2560, 15616,
337      2590,   273,   101,  7956,  1319,    94,  8195, 11520, 21258,    83,
338      2050,  8448,  3329,  4352, 33311,  8709,  9984, 29987,  2052,  3840,
339      5122,  9252,    21,  2560, 27144,   805,     4,  9739,   549, 10024,
340       278,   151,   549, 10506,   141,  2053,  2858,  1062, 11052,  1318,
341       168, 11566,  2560, 40712,  1547,   812,   186, 12037,  9984, 46640,
342     12552,  1807,  2817,  1536, 49202,  2817,   512, 50739,  2819, 10752,
343     52276,  2817,  5888, 55315,   548, 13568,  6162,  2817,  6400, 58387,
344       548, 13824,  6674,  2816,  6931,   548, 14080,  7186, 14347,    29,
345     14597, 10241,    58,   287,   253,   553,    30,  2820, 10554,  2560,
346     64768,  2107,    32,   273,   297,  7948,   289,   293,   273,   286,
347      7948,  2561,  4352,  4924,  3594,   273,  2056, 15627,    19, 15374,
348      2561,  4352,  2878,    32,    17,  7949,   289,   324,    17,  7949,
349      2561, 14600,  2367,  2816,  3648,   279,   345, 16640,  4383, 16896,
350      6144,  1291,  1794,   353, 17162,   345,  2058, 17418,   342,    14,
351        32,    17,  8005,    32,  1795,   377, 17930,   369,     5, 18187,
352       273,   387,  7947,    18,  7947,     1,   556, 18443,   547,     0,
353      2816,
354    (*--nt-symbols--*)
355         1,     0,   128,     0,     0,   137,     0, 16452,  2694,     0,
App. F cocosynMOD 356 357 358 359 360 361 ( 362 363 364 365 ( 366 367 ( 368 369 370 371 372 ( 373 374 ( 375 376 377 378 379 380 ( 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 (* 406 — 0, 0, 16, 0, 0, 5472, 0,16384, 0, 65535,65535,65502,65535,65535, ) 154, 0,16452, 2694, 239, 0, 0, 8192, 304, 0, 2048, 0, 359, 0,16384, 0, 391, 0, 2, 0, —eps followers—*) 512, 1, 0, 8192, 16, 0, 0, 5408, 0, 0,49152, 0, —any sets—*) 65022,65534,65535,65502, —attribute numbers—* 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, —pragma semantic—*) 0, 0, —name pointers—*) If 5, 13, 19, 83, 98, 104, 114, 177, 181, 185, 189, 217, 221, 225, 229, 300, 315, 331, 349, —name list—*) 17743,17920, 8769,19529,16723, 8704, 8801, 17731,19521,21057,21577,20302,21282, 34, 19746, 34,25966,25715,25965, 8704, 8805, 21057,19789,16722, 8704, 8809,28194, 34, 8704, 8782,20302,21573,21069,18766,16716, 29730, 34,20562,16711,19777,21282, 34, 34,29541,27938, 34,21317,19777,20052, 21573,21069,18766,16716,21282, 105,25701, 29184,21332,21065,20039, 78,21837,16965, 10030, 9984,10108, 9984,10024, 9984,10025, 10077, 9984,10107, 9984,10109, 9984,10044, 10043, 9984,10042, 9984,10028, 9984,28271, 34,25455,29298,25955,29728,26482,24941, 30832,29285,29555,26991,28160,24940,29797, 25856,29561,28002,28524, 97,29812,29289, 26990,11617,29812,29289,25205,29797, 8704, 29812,29289,25205,29797, 8704, 8819,25965, 24931,29801,28526, 8704, 8819,25965,24942, 25458,28450, 115,31085,25199,27648, 8801, 24941,25890, 0,0); 0, 171, 0,16452, 2694, 0, 0, 262, 0, 256, 0, 0, 0, 328, 0,16384, 0, 0, 0, 381, 0, 0, 6, 0, 0, 0, 0, 32, 0, 0, 16452, 8166, 0, 0, o, o, o, lr 34, 122, 193, 233, 366, o, o, o, 44, 128, 197, 242, 373, 0, o, 0, 53, 140, 201, 260, o, lr o, 59, 152, 205, 271, 0, lr o, 69, 163, 209, 283, 0, lr Or 74, 170, 213, 290, 28281, 17742, 28787, 19777, 21282, 21077, 18755, 28276, 20992, 9984, 9984, 25455, 28001, 29294, 25205, 8815, 24942, 29801, 27753, 8704, 17479, 8704, 17234, 34, 
19525, 21282, 26982, 10045, 10075, 10046, 25455, 29218, 24948, 29797, 30068, 29801, 25376, 24947, 8772, 21057, 8775, 20307, 28533, 21282, 34, 26981, 9984, 9984, 9984, 29561, 101, 26998, 34, 11617, 25376, 28001, 8302, END TableContents; Parse Proper syntax analyzer 407 PROCEDURE Parse(VAR corr:BOOLEAN); 408 VAR 409 altroot: CARDINAL; 410 mustread: BOOLEAN; 411 opcode: CARDINAL; 412 running: BOOLEAN; 413 sy: CARDINAL; 414 (*root of current alternative chain*) (*TRUE if next symbol must be read*) (♦instruction code*) (♦interpreter state*)
415  BEGIN
416    tab:=ADR(TableContents)+10D; (*initialize the tables*)
417    pc:=startpc; altroot:=pc;
418    line:=1; col:=1;
419    correct:=TRUE; mustread:=TRUE; running:=TRUE;
420
421    WITH tab^ DO
422      WHILE running DO
423        opcode:=ORD(code[pc]);
424        IF mustread AND (opcode<=epsa) THEN
425          NextSym; mustread:=FALSE; INC(errdist); altroot:=pc;
426        END;
427        (*IF printnodes THEN WriteCard(con,pc,5); END;*)
428        INC(pc);
429        CASE opcode OF
430          t:
431            IF ORD(typ)=ORD(code[pc])
432              THEN IF typ=eofsy (*t recognized*)
433                     THEN running:=FALSE;
434                     ELSE INC(pc); mustread:=TRUE;
435                   END;
436              ELSE Error(pc,altroot);
437            END;
438        | ta:
439            IF ORD(typ)=ORD(code[pc])
440              THEN INC(pc,3); mustread:=TRUE; (*t recognized*)
441              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
442            END;
443        | nt,nts:
444            sy:=ORD(code[pc]);
445            IF Match(typ,ntsymbols[sy].first) OR ntsymbols[sy].del
446              THEN (*right nt, parse it*)
447                IF opcode=nts THEN INC(pc); Semant(ORD(code[pc])); END;
448                Push(pc+1); pc:=ntsymbols[sy].startpc;
449                altroot:=pc;
450              ELSE Error(pc,altroot);
451            END;
452        | nta,ntas:
453            sy:=ORD(code[pc]);
454            IF Match(typ,ntsymbols[sy].first)
455              THEN (*right nt, parse it*)
456                INC(pc,3);
457                IF opcode=ntas THEN Semant(ORD(code[pc])); INC(pc) END;
458                Push(pc); pc:=ntsymbols[sy].startpc;
459                altroot:=pc;
460              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
461            END;
462        | any: mustread:=TRUE; (*any recognized*)
463        | anya:
464            IF Match(typ,anyset[ORD(code[pc])])
465              THEN INC(pc,3); mustread:=TRUE; (*any recognized*)
466              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
467            END;
468        | eps:
469            IF Match(typ,epsset[ORD(code[pc])])
470              THEN INC(pc);
471              ELSE Error(pc,altroot);
472            END;
473        | epsa:
474            IF Match(typ,epsset[ORD(code[pc])])
475              THEN INC(pc,3); (*eps recognized*)
476              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
477            END;
478        | jmp: pc:=ORD(code[pc])*256+ORD(code[pc+1]); (*goto successor*)
479        | ret: Pop(pc); altroot:=pc; (*end of nt*)
480        ELSE (*sem*)
481          IF correct THEN Semant(ORD(opcode)); END;
482        END; (*CASE*)
483      END; (*WHILE running*)
484    END; (*WITH tab^*)
485    corr:=correct;
486  END Parse;
487
488  BEGIN
489    printinput:=FALSE;
490    printnodes:=FALSE;
491    errdist:=100;
492    lacts:=0;
493  END cocosyn.

ADDRESS AdjustPc ADR Allocate altpc altroot any anya anyset at Attributenumbers C cocolex cocosem cocosyn code col con corr correct D del e el eofsy eps epsa epsset errdist errdistmin Error Errornode Errorptr 20 129 20 19 87 262 92 449 40 40 71 23 48 158 23 22 15 68 439 469 23 18 407 77 416 59 149 149 45 41 41 70 80 43 147 17 17 142 171 212 239 267 267 416 169 175 151 173 179 209 214 228 237 241 243 250 260 263 264 265 267 147 171 182 201 275 286 287 409 417 425 436 450 459 471 479 135 174 254 265 462 135 261 463 464 72 493 134 136 136 253 255 262 262 264 264 423 431 441 441 444 447 453 457 460 460 464 466 466 474 476 476 478 478 181 418 184 187 187 188 188 196 196 197 300 309 485 116 117 166 419 481 485 223 445 175 176 177 177 177 170 177 177 119 432 135 224 259 468 135 224 254 261 424 473 469 474 167 201 425 491 167 202 436 450 471 169 175 149 153
326 Program listings App.F Errors 17 FilelO 18 Fill 207 223 225 230 242 284 287 FlllSucc 235 245 283 286 first 60 219 445 454 FORWARD 88 89 90 91 92 GetSy 23 106 GetSymlnstr 87 173 214 241 250 270 GiveName 153 163 169 176 h 149 169 169 170 170 181 HALT 300 309 header 67 1 91 150 185 186 187 187 188 209 220 221 221 221 276 278 279 279 281 283 283 284 284 320 321 INLINE 20 330 j 150 154 157 158 158 159 159 159 161 jmp 41 136 478 1 161 lacts 84 201 207 217 221 223 225 235 242 281 286 287 298 299 299 307 308 308 492 line 23 181 418 lmaxs 44 63 307 loc 296 299 305 308 Match 97 98 221 445 454 464 469 474 maxany 29 71 maxcode 28 68 maxeps 30 70 maxname 26 50 maxnamep 27 49 « maxp 32 48 54 62 maxs 33 62 maxt 31 54 55 81 82 114 185 220 278 mustread 410 419 424 425 434 440 462 465 name 75 158 159 Namelist 50 75 namep 74 157 Namepointers 49 74 newlacts 81 188 201 217 221 279 newpc 82 186 187 194 201 217 221 279 next 170 177 177 nextpc 87 151 173 209 214 223 225 237 241 242 242 250 260 262 263 264 265 267 NextSym 103 121 199 425 nra 72 nt 39 135 218 259 443 nta 39 135 218 261 452 ntas 40 135 218 264 452 457 nts 40 135 218 263 443 447 ntsymbols 69 219 223 445 445 448 454 458 olds 83 315 318 opcode 87 151 173 174 209 214 215 237 241 250 253 254 254 258 411 423 424 429 447 457 481 p 154 157 158 159 I Parse 407 486 pc 78 87 129 132 134 136 136 136 137 138 147 201 201 207 212 213 214 217 221 228 235 239 240 241 243 250 253 255 260 262 262 262 263 264 264 264 \ 265 417 417 423 425 428 431 434 436 439 440 441 *! 441 441 444 447 447 448 448 449 450 453 456 457 - ,
AppF cocosynMOD 327 457 458 458 459 460 460 460 464 465 466 466 466 469 470 471 474 475 476 476 476 478 478 478 479 pel Pop pragma pragmalist prlntinput prlntnodes ps Push q RestoreStack ret running s SaveStack sem2 sem3 Semant set Stack StackElem startpc sy Symbollist Symbolnode Symbolset SyntaxError System SYSTEM t ta tab TableContents Triple txt typ WriteCard WriteLn WriteString 479 151 296 51 54 489 183 73 305 153 89 41 412 83 90 52 52 22 97 63 91 34 87 217 448 62 57 55 17 19 20 39 39 66 328 92 159 23 431 18 18 18 171 303 54 73 195 116 312 159 192 137 419 210 182 116 117 116 98 83 283 58 97 219 453 69 62 60 181 135 135 115 402 182 114 432 187 188 184 171 479 490 116 448 161 314 479 422 219 317 116 117 117 284 417 98 223 454 70 216 216 131 416 275 116 439 187 196 172 117 458 315 433 221 318 447 320 448 98 237 458 71 259 261 156 288 116 445 188 197 173 117 299 457 321 458 151 241 97 430 438 219 117 454 196 300 179 308 481 153 250 210 223 117 464 309 315 157 255 252 119 469 318 173 256 416 169 474 321 176 209 214 217 413 444 445 445 421 194 196 201 201
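
The instruction lengths that GetSymInstr and Parse rely on can be collected into one table. The following is only a Python sketch for illustration and is not part of Coco: the names `LAYOUT` and `decode` are hypothetical, the G-code area is modelled as a plain list of byte values, and the AdjustPc step that GetSymInstr applies to `nextpc` and `altpc` is left out.

```python
# Opcode numbers as declared in the CONST section of the parser frame.
T, TA, NT, NTA, NTS, NTAS, ANY, ANYA, EPS, EPSA, JMP, RET = range(12)

# opcode -> (instruction length in bytes, carries a 2-byte alternative address)
# Lengths follow the CASE statement in GetSymInstr.
LAYOUT = {
    T: (2, False), NT: (2, False), EPS: (2, False),
    TA: (4, True), NTA: (4, True), ANYA: (4, True), EPSA: (4, True),
    NTS: (3, False),
    NTAS: (5, True),
    ANY: (1, False),
}

def decode(code, pc):
    """Return (opcode, sy, nextpc, altpc) for the symbol instruction at pc.

    Assumption: `code` is a list of ints in 0..255; unlike GetSymInstr,
    nextpc and altpc are NOT adjusted to the next symbol instruction.
    """
    opcode = code[pc]
    length, has_alt = LAYOUT[opcode]
    # Every symbol instruction except `any` has a symbol/set operand byte.
    sy = code[pc + 1] if opcode != ANY else 0
    # The alternative address is stored high byte first.
    altpc = 256 * code[pc + 2] + code[pc + 3] if has_alt else 0
    return opcode, sy, pc + length, altpc

# Usage: a `ta` instruction for symbol 7 whose alternative starts at 258.
example = [0, TA, 7, 1, 2]
print(decode(example, 1))
```

Keeping the lengths in a table makes it easy to see that only the instructions with an alternative (`ta`, `nta`, `ntas`, `anya`, `epsa`) carry the two address bytes, and that `nts` and `ntas` carry one additional byte for the semantic action.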
  1  (* General table-driven syntax analyzer
  2     =====================================
  3     This is a parser module generated by Coco from an attributed grammar.
  4     Before calling the procedure Parse from the main program, initialize
  5     the scanner (<grammarname>lex.MOD).
  6  *)
  7  DEFINITION MODULE -->modulename;
  8    VAR
  9      printinput: BOOLEAN; (*trace the input tokens read*)
 10      printnodes: BOOLEAN; (*trace the G-code interpretation*)
 11
 12    PROCEDURE Parse(VAR correct:BOOLEAN);
 13  END -->modulename.
 14  -->implementation
 15  (* General table-driven syntax analyzer                      Re
 16     =====================================                     Moe 21.12.83
 17     01 (21.12.83) First version (rewritten from PL/M)
 18     02 (28.02.84) New interface for input and errors
 19     03 (02.04.84) Error in EOL-processing corrected
 20     04 (08.05.84) New EOL-processing
 21     05 (23.07.84) For G-code
 22     06 (30.08.84) Error recovery simplified
 23     07 (05.04.85) New G-code instruction EPSA (ANYA modified)
 24     08 (12.04.87) Grammar tables initialized INLINE
 25     09 (12.04.87) typ,col,line and at exported by cocolex
 26     10 (07.06.87) Name of error module and scanner procedure constant
 27  *)
 28  IMPLEMENTATION MODULE -->modulename;
 29
 30  FROM Errors IMPORT SyntaxError, Errorptr, Errornode;
 31  FROM FileIO IMPORT con, WriteCard, WriteLn, WriteString;
 32  FROM System IMPORT Allocate;
 33  FROM SYSTEM IMPORT ADDRESS, ADR, INLINE;
 34
 35  FROM -->semantic analyzer IMPORT Semant;
 36  FROM -->input module IMPORT GetSy, typ, at, line, col;
 37
 38  -->declarations
 39
 40  CONST (*G-code instructions*)
 41    t   = 0;  ta   = 1;  nt  = 2;   nta  = 3;
 42    nts = 4;  ntas = 5;  any = 6;   anya = 7;
 43    eps = 8;  epsa = 9;  jmp = 10;  ret  = 11;
 44
 45    errdistmin = 2;  (*min. distance between two errors*)
 46    lmaxs = 50;      (*max. stack length*)
 47    eofsy = 0;       (*token number of endfile symbol*)
 48
 49  TYPE
 50    Attributenumbers = ARRAY[0..maxp] OF CARDINAL;
 51    Namepointers = ARRAY[0..maxnamep] OF CARDINAL;
 52    Namelist = ARRAY[1..maxname] OF CHAR;
 53    Pragma = RECORD (*semantics for a pragma*)
 54               sem2,sem3: CARDINAL;
 55             END;
 56    Pragmalist = ARRAY[maxt..maxp] OF Pragma;
 57    Symbolset = ARRAY[0..maxt DIV 16] OF BITSET;
 58                (*set of terminals*)
 59    Symbolnode = RECORD (*symbol information (only for nt)*)
App. F  cocosynframe

 60                   startpc: CARDINAL;  (*start node of rule for nt*)
 61                   del:     BOOLEAN;   (*TRUE, if nt is deletable*)
 62                   first:   Symbolset; (*terminals causing this nt to be analyzed*)
 63                 END;
 64    Symbollist = ARRAY[maxp+1..maxs] OF Symbolnode;
 65    Stack      = ARRAY[1..lmaxs] OF CARDINAL;
 66
 67  VAR
 68    tab:      POINTER TO RECORD (*grammar tables*)
 69                header:    ARRAY[1..8] OF CARDINAL;   (*not used*)
 70                code:      ARRAY[1..maxcode] OF CHAR; (*G-code area*)
 71                ntsymbols: Symbollist;                (*nonterminals information*)
 72                epsset:    ARRAY[1..maxeps] OF Symbolset;
 73                anyset:    ARRAY[1..maxany] OF Symbolset;
 74                nra:       Attributenumbers;          (*no.of attributes*)
 75                ps:        Pragmalist;                (*semantics for pragmas*)
 76                namep:     Namepointers;              (*pointers to symbol names*)
 77                name:      Namelist;                  (*symbol names*)
 78              END;
 79    correct:  BOOLEAN;                    (*error indicator*)
 80    pc:       CARDINAL;                   (*program counter*)
 81
 82    errdist:  CARDINAL;                   (*current error distance*)
 83    newlacts: ARRAY [0..maxt] OF CARDINAL; (*new stack length*)
 84    newpc:    ARRAY [0..maxt] OF CARDINAL; (*pc after recovery*)
 85    s,olds:   Stack;
 86    lacts:    CARDINAL;                   (*stack pointer*)
 87
 88
 89  PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc: CARDINAL);
 90    FORWARD;
 91  PROCEDURE RestoreStack; FORWARD;
 92  PROCEDURE SaveStack; FORWARD;
 93  PROCEDURE StackElem(i:CARDINAL): CARDINAL; FORWARD;
 94  PROCEDURE Triple(altroot:CARDINAL); FORWARD;
 95
 96
 97  (* Match       Check if sy is member of the specified set *)
 98
 99  PROCEDURE Match(sy:CARDINAL; set:Symbolset): BOOLEAN;
100  BEGIN RETURN (sy MOD 16) IN set[sy DIV 16]; END Match;
101
102
103  (* NextSym     Get next symbol
104  *)
105  PROCEDURE NextSym;
106  BEGIN
107    LOOP
108      GetSy;
109      (*IF printinput THEN
110          WriteString(con,"$(in:"); WriteCard(con,typ,3);
111          WriteString(con,") ");
112          IF printnodes THEN
113            WriteCard(con,lacts,3); WriteString(con,"| ");
114          END;
115        END;*)
116      IF typ<=maxt THEN RETURN END;
117      WITH tab^ DO
118        IF correct AND (ps[typ].sem2<>0) THEN Semant(ps[typ].sem2); END;
119        IF correct AND (ps[typ].sem3<>0) THEN Semant(ps[typ].sem3); END;
120      END;
121      IF typ=eofsy THEN RETURN END;
122    END;
123  END NextSym;
124
125
126
127  (*============================== ERRORS ==============================*)
128
129  (* AdjustPc    Adjust pc to next symbol instruction
130  *)
131  PROCEDURE AdjustPc(VAR pc:CARDINAL);
132  BEGIN
133    WITH tab^ DO
134      IF pc=0 THEN RETURN; END;
135      LOOP
136        CASE ORD(code[pc]) OF
137          t,ta,nt,nta,nts,ntas,any,anya,eps,epsa: EXIT;
138        | jmp: pc:=256*ORD(code[pc+1])+ORD(code[pc+2]);
139        | ret: pc:=0; EXIT;
140        ELSE INC(pc); (*sem*)
141        END;
142      END;
143    END;
144  END AdjustPc;
145
146
147  (* Error       Report syntax error
148  *)
149  PROCEDURE Error(VAR pc,altroot:CARDINAL);
150    VAR
151      e,e1,h: Errorptr;
152      i,j: CARDINAL;
153      opcode,sy,nextpc,altpc,pc1: CARDINAL;
154
155    PROCEDURE GiveName(q:Errorptr; sy:CARDINAL);
156      VAR p,j: CARDINAL;
157    BEGIN
158      WITH tab^ DO
159        p:=namep[sy]; j:=0;
160        WHILE (j<25) AND (name[p+j]<>0C) DO
161          INC(j); q^.txt[j]:=name[p+j-1];
162        END;
163        q^.l:=j;
164      END;
165    END GiveName;
166
167  BEGIN (*Error*)
168    correct:=FALSE;
169    IF errdist >= errdistmin
170      THEN
171        Allocate(h,SIZE(Errornode)); GiveName(h,typ); (*pass near-symbol*)
172        h^.next:=NIL; e1:=h;
173        pc1:=altroot; AdjustPc(pc1);
174        WHILE pc1>0 DO
175          GetSymInstr(pc1,opcode,sy,nextpc,altpc);
176          IF opcode<any THEN (*t,nt,nts,ta,nta,ntas*)
177            Allocate(e,SIZE(Errornode));
178            GiveName(e,sy); (*pass expected symbol*)
179            e1^.next:=e; e1:=e; e^.next:=NIL;
180          END;
181          pc1:=altpc;
182        END; (*WHILE*)
183        SyntaxError(h,line,col);
184        Triple(altroot); SaveStack;
185        IF printnodes THEN
186          WriteString(con,"$ typ newpc newlacts$");
187          FOR i:=0 TO maxt DO
188            IF newpc[i]<>0 THEN
189              WriteCard(con,i,5); WriteCard(con,newpc[i],10);
190              WriteCard(con,newlacts[i],10); WriteLn(con);
191            END; (*IF*)
192          END; (*FOR*)
193        END; (*IF*)
194      ELSE RestoreStack;
195    END;
196    WHILE newpc[typ]=0 DO
197      IF printnodes THEN
198        WriteString(con,"$(skip:"); WriteCard(con,typ,0);
199        WriteString(con,") ");
200      END;
201      NextSym;
202    END;
203    pc:=newpc[typ]; altroot:=pc; lacts:=newlacts[typ]; errdist:=0;
204  END Error;
205
206
207  (* Fill        Fill triple list with alt-chain starting at pc
208  *)
209  PROCEDURE Fill(pc,lacts:CARDINAL);
210    VAR
211      i,opcode,sy,nextpc,altpc: CARDINAL;
212      s: Symbolset;
213  BEGIN
214    AdjustPc(pc);
215    WHILE pc<>0 DO
216      GetSymInstr(pc,opcode,sy,nextpc,altpc);
217      CASE opcode OF
218        t,ta:
219          newpc[sy]:=pc; newlacts[sy]:=lacts;
220      | nt,nta,nts,ntas:
221          s:=tab^.ntsymbols[sy].first;
222          FOR i:=0 TO maxt DO
223            IF Match(i,s) THEN newpc[i]:=pc; newlacts[i]:=lacts; END;
224          END;
225          IF tab^.ntsymbols[sy].del THEN Fill(nextpc,lacts); END;
226      | eps,epsa:
227          Fill(nextpc,lacts);
228      ELSE (*any,anya: nothing*)
229      END; (*CASE*)
230      pc:=altpc;
231    END; (*WHILE*)
232  END Fill;
233
234
235  (* FillSucc    Fill triple list with succ. of alt-chain at pc
236  *)
237  PROCEDURE FillSucc(pc,lacts:CARDINAL);
238    VAR
239      opcode,sy,nextpc,altpc: CARDINAL;
240  BEGIN
241    AdjustPc(pc);
242    WHILE pc>0 DO (*fill with successors of alternative-starts*)
243      GetSymInstr(pc,opcode,sy,nextpc,altpc);
244      IF nextpc<>0 THEN Fill(nextpc,lacts); END;
245      pc:=altpc;
246    END; (*WHILE*)
247  END FillSucc;
248
249
250  (* GetSymInstr    Get G-code instruction at address pc
251  *)
252  PROCEDURE GetSymInstr(pc:CARDINAL; VAR opcode,sy,nextpc,altpc: CARDINAL);
253  BEGIN (*assert: pc points to a symbol instruction (not RET,JMP,SEM,ANY)*)
254    WITH tab^ DO
255      opcode:=ORD(code[pc]);
256      IF (opcode<=epsa) AND (opcode<>any)
257        THEN sy:=ORD(code[pc+1]);
258        ELSE sy:=0;
259      END;
260      CASE opcode OF
261        t,nt,eps:
262          nextpc:=pc+2; altpc:=0;
263      | ta,nta,anya,epsa:
264          nextpc:=pc+4; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
265      | nts:  nextpc:=pc+3; altpc:=0;
266      | ntas: nextpc:=pc+5; altpc:=256*ORD(code[pc+2])+ORD(code[pc+3]);
267      | any:  nextpc:=pc+1; altpc:=0;
268      END; (*CASE*)
269      AdjustPc(nextpc); AdjustPc(altpc);
270    END;
271    (*assert: nextpc,altpc point to symbol instructions or are zero*)
272  END GetSymInstr;
273
274
275  (* Triple      Fill triple list
276  *)
277  PROCEDURE Triple(altroot:CARDINAL);
278    VAR i: CARDINAL;
279  BEGIN
280    FOR i:=0 TO maxt DO (*clear triple list*)
281      newpc[i]:=0; newlacts[i]:=0;
282    END;
283    FOR i:=1 TO lacts DO (*fill with succ.of stacked nt's*)
284      (*s[1] contains successor at level 0*)
285      FillSucc(StackElem(i),i-1);
286      Fill(StackElem(i),i-1);
287    END;
288    FillSucc(altroot,lacts); (*fill with succ.of alt-chain*)
289    Fill(altroot,lacts); (*fill with current alt-chain*)
290  END Triple;
291
292  (*============================ END ERRORS ============================*)
293
294
295
296  (*=========================== SYNTAXSTACK ============================*)
297
298  PROCEDURE Pop(VAR loc: CARDINAL);
299  BEGIN
300    IF lacts>0
301      THEN loc:=s[lacts]; DEC(lacts);
302      ELSE WriteString(con,"-- Parser stack underflow.$"); HALT;
303    END;
304    (*IF printnodes THEN WriteString(con," pop"); END;*)
305  END Pop;
306
307  PROCEDURE Push(loc: CARDINAL);
308  BEGIN
309    IF lacts<lmaxs
310      THEN INC(lacts); s[lacts]:=loc;
311      ELSE WriteString(con,"-- Parser stack overflow.$"); HALT;
312    END;
313    (*IF printnodes THEN WriteString(con," push"); END;*)
314  END Push;
315
316  PROCEDURE RestoreStack;
317  BEGIN s:=olds; END RestoreStack;
318
319  PROCEDURE SaveStack;
320  BEGIN olds:=s; END SaveStack;
321
322  PROCEDURE StackElem(i:CARDINAL): CARDINAL;
323  BEGIN RETURN s[i]; END StackElem;
324
325  (*========================= END SYNTAXSTACK ==========================*)
326
327
328  (* TableContents    A dirty trick to initialize the grammar tables
329  *)
330  PROCEDURE TableContents;
331  BEGIN (*%% dont remove or change this comment*)
332  -->tables
333  END TableContents;
334
335
336  (* Parse       Proper syntax analyzer
337  *)
338  PROCEDURE Parse(VAR corr:BOOLEAN);
339    VAR
340      altroot:  CARDINAL; (*root of current alternative chain*)
341      mustread: BOOLEAN;  (*TRUE if next symbol must be read*)
342      opcode:   CARDINAL; (*instruction code*)
343      running:  BOOLEAN;  (*interpreter state*)
344      sy:       CARDINAL;
345
346  BEGIN
347    tab:=ADR(TableContents)+10D; (*initialize the tables*)
348    pc:=startpc; altroot:=pc;
349    line:=1; col:=0;
350    correct:=TRUE; mustread:=TRUE; running:=TRUE;
351
352    WITH tab^ DO
353      WHILE running DO
354        opcode:=ORD(code[pc]);
355        IF mustread AND (opcode<=epsa) THEN
356          NextSym; mustread:=FALSE; INC(errdist); altroot:=pc;
357        END;
358        (*IF printnodes THEN WriteCard(con,pc,5); END;*)
359        INC(pc);
360        CASE opcode OF
361          t:
362            IF ORD(typ)=ORD(code[pc])
363              THEN IF typ=eofsy (*t recognized*)
364                     THEN running:=FALSE;
365                     ELSE INC(pc); mustread:=TRUE;
366                   END;
367              ELSE Error(pc,altroot);
368            END;
369        | ta:
370            IF ORD(typ)=ORD(code[pc])
371              THEN INC(pc,3); mustread:=TRUE; (*t recognized*)
372              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
373            END;
374        | nt,nts:
375            sy:=ORD(code[pc]);
376            IF Match(typ,ntsymbols[sy].first) OR ntsymbols[sy].del
377              THEN (*right nt, parse it*)
378                IF opcode=nts THEN INC(pc); Semant(ORD(code[pc])); END;
379                Push(pc+1); pc:=ntsymbols[sy].startpc;
380                altroot:=pc;
381              ELSE Error(pc,altroot);
382            END;
383        | nta,ntas:
384            sy:=ORD(code[pc]);
385            IF Match(typ,ntsymbols[sy].first)
386              THEN (*right nt, parse it*)
387                INC(pc,3);
388                IF opcode=ntas THEN Semant(ORD(code[pc])); INC(pc) END;
389                Push(pc); pc:=ntsymbols[sy].startpc;
390                altroot:=pc;
391              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]); (*try alt.*)
392            END;
393        | any: mustread:=TRUE; (*any recognized*)
394        | anya:
395            IF Match(typ,anyset[ORD(code[pc])])
396              THEN INC(pc,3); mustread:=TRUE; (*any recognized*)
397              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
398            END;
399        | eps:
400            IF Match(typ,epsset[ORD(code[pc])])
401              THEN INC(pc);
402              ELSE Error(pc,altroot);
403            END;
404        | epsa:
405            IF Match(typ,epsset[ORD(code[pc])])
406              THEN INC(pc,3); (*eps recognized*)
407              ELSE pc:=ORD(code[pc+1])*256+ORD(code[pc+2]);
408            END;
409        | jmp: pc:=ORD(code[pc])*256+ORD(code[pc+1]); (*goto successor*)
410        | ret: Pop(pc); altroot:=pc; (*end of nt*)
411        ELSE (*sem*)
412          IF correct THEN Semant(ORD(opcode)); END;
413        END; (*CASE*)
414      END; (*WHILE running*)
415    END; (*WITH tab^*)
416    corr:=correct;
417  END Parse;
418
419  BEGIN
420    printinput:=FALSE;
421    printnodes:=FALSE;
422    errdist:=100;
423    lacts:=0;
424  END -->modulename.

ADDRESS AdjustPc ADR Allocate altpc altroot analyzer any anya anyset at Attributenumbers C code col con corr correct D declarations del e el eofsy eps epsa epsset errdist errdistmin Error Errornode Errorptr Errors FileIO Fill FillSucc first FORWARD GetSy GetSymInstr GiveName h HALT 33 131 33 32 89 264 94 380 35 42 42 73 36 50 160 70 370 400 36 31 338 12 347 38 61 151 151 47 43 43 72 82 45 149 30 30 30 31 209 237 62 90 36 89 155 151 302 144 347 171 153 265 149 381 137 137 395 74 136 372 405 183 186 416 79 225 177 172 121 137 137 400 169 169 204 171 151 225 247 221 91 108 175 165 171 311 173 177 175 266 173 390 176 263 138 372 407 349 189 118 376 178 179 363 226 226 405 203 367 177 155 227 285 376 92 216 171 171 214 181 267 184 402 256 394 138 375 407 189 119 179 179 261 256 356 381 232 288 385 93 243 178 172 241 211 269 203 410 267 255 378 409 190 168 179 399 263 422 402 244 94 252 172 269 216 277 393 257 384 409 190 350 179 355 286 272 183 269 230 288 264 388 198 412 404 289 239 289 264 391 198 416 243 245 340 348 266 266 391 395 199 302 252 262 356 367 362 397 311
336 Program listings %F header 69 i 93 152 187 188 189 189 190 211 222 223 223 223 278 280 281 281 283 285 285 286 286 322 323 implementation 14 INLINE 33 input 36 j 152 156 159 160 160 161 161 161 163 jmp 43 138 409 1 163 lacts 86 203 209 219 223 225 227 237 244 283 288 289 300 301 301 309 310 310 423 line 36 183 349 lmaxs 46 65 309 loc 298 301 307 310 Match 99 100 223 376 385 395 400 405 maxany 73 maxcode 70 maxeps 72 maxname 52 maxnamep 51 maxp 50 56 64 maxs 64 maxt 56 57 83 84 116 187 222 280 module 36 modulename 7 13 28 424 mustread 341 350 355 356 365 371 393 396 name 77 160 161 Namelist 52 77 namep 76 159 Namepointers 51 76 newlacts 83 190 203 219 223 281 newpc 84 188 189 196 203 219 223 281 next 172 179 179 nextpc 89 153 175 211 216 225 227 239 243 244 244 252 262 264 265 266 267 269 NextSym 105 123 201 356 nra 74 nt 41 137 220 261 374 nta 41 137 220 263 383 ntas 42 137 220 266 383 388 nts 42 137 220 265 374 378 ntsymbols 71 221 225 376 376 379 385 389 olds 85 317 320 opcode 89 153 175 176 211 216 217 239 243 252 255 256 256 260 342 354 355 360 378 388 412 p 156 159 160 161 Parse 12 338 417 pc 80 89 131 134 136 138 138 138 139 140 149 203 203 209 214 215 216 219 223 230 237 241 242 243 245 252 255 257 262 264 264 264 265 266 266 266 267 348 348 354 356 359 362 365 367 370 371 372 372 372 375 378 378 379 379 380 381 384 387 388 388 389 389 390 391 391 391 395 396 397 397 397 400 401 402 405 406 407 407 407 409 409 409 410 410 pel 153 173 173 174 175 181 Pop 298 305 410 Pragma 53 56 Pragmalist 56 75
APP-P cocosyrtframe 337 printinput printnodes ps push q RestoreStack ret running s SaveStack sem2 sem3 Semant semantic set Stack StackElem startpc sy Symbollist Symbolnode Symbolset SyntaxError SYSTEM System t ta tab TableContents tables Triple txt typ WriteCard WriteLn WriteString 9 10 75 307 155 91 43 343 .85 92 54 54 35 35 99 65 93 60 89 219 379 64 59 57 30 33 32 41 41 68 330 332 94 161 36 362 31 31 31 420 185 118 314 161 194 139 350 212 184 118 119 118 100 85 285 348 99 221 384 71 64 62 183 137 137 117 333 184 116 363 189 190 186 197 118 379 163 316 410 353 221 319 118 119 119 286 379 100 225 385 72 218 218 133 347 277 118 370 189 198 421 119 389 317 364 223 320 378 322 389 100 239 389 73 261 263 158 290 118 376 190 199 119 301 388 323 153 243 99 361 369 221 119 385 198 302 310 412 155 252 212 225 119 395 311 317 159 257 254 121 400 320 175 258 347 171 405 323 178 211 216 219 344 375 376 376 352 196 198 203 203
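
The `Match` procedure of the frame tests membership in a `Symbolset`, which is declared as an array of 16-bit `BITSET` words indexed by `sy DIV 16`. A minimal Python sketch of the same idea follows; it is not taken from the book. The helper names `make_set` and `match` are hypothetical, and it assumes that element `i` of a `BITSET` corresponds to the integer bit `1 << i`, which in Modula-2 depends on the compiler.

```python
def make_set(symbols, maxt):
    """Build a symbol set as a list of 16-bit words, one bit per terminal.

    Mirrors Symbolset = ARRAY[0..maxt DIV 16] OF BITSET.
    """
    words = [0] * (maxt // 16 + 1)
    for sy in symbols:
        words[sy // 16] |= 1 << (sy % 16)
    return words

def match(sy, words):
    """Counterpart of Match: (sy MOD 16) IN set[sy DIV 16]."""
    return (words[sy // 16] >> (sy % 16)) & 1 == 1

# Usage: a set over terminals 0..40 containing 0, 17 and 33.
first = make_set([0, 17, 33], 40)
print(len(first), match(17, first), match(16, first))
```

Packing the sets this way keeps each membership test to one index computation and one bit test, which matters because the interpreter in `Parse` calls `Match` for every nonterminal, `eps` and `any` instruction it executes.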
  1  (* cocotst     Perform various tests with top-down graph       Moe 12.1.83
  2     =======     ==========================================
  3     This module tests
  4     a) if all nonterminals can be reached from the start symbol
  5     b) if there exist productions for all nonterminals
  6     c) if all nonterminals can be derived to terminals
  7     d) if the grammar is free of circular derivations
  8     e) if the grammar satisfies the LL(1)-conditions
  9  *)
 10  DEFINITION MODULE cocotst;
 11
 12  PROCEDURE FindCircularRules(VAR ok:BOOLEAN);
 13  (* Finds and prints the circular part of the grammar, ok means:
 14     no circular part*)
 15
 16  PROCEDURE LL1Test(VAR ll1:BOOLEAN);
 17  (* Checks if the grammar satisfies the LL(1) conditions*)
 18
 19  PROCEDURE TestCompleteness(VAR ok:BOOLEAN);
 20  (* ok=TRUE if all nonterminals have rules*)
 21
 22  PROCEDURE TestIfAllNtReached(VAR ok:BOOLEAN);
 23  (* ok=TRUE if all nonterminals can be reached from the start symbol*)
 24
 25  PROCEDURE TestIfNtToTerm(VAR ok:BOOLEAN);
 26  (* ok=TRUE if all nonterminals can be reduced to terminals*)
 27
 28  END cocotst.
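
The implementation of `FindCircularRules` collects all single derivations `left -> right` between nonterminals and then repeatedly deletes every pair whose left side is not the right side of some remaining pair, or whose right side is not the left side of some remaining pair. What survives is reported as the circular part. The Python sketch below illustrates that pruning loop only; it is not the book's code. The function name `circular_part` is hypothetical, and the pairs are assumed to be given directly instead of being extracted from the top-down graph.

```python
def circular_part(pairs):
    """Prune single derivations (left, right) that cannot lie on a cycle.

    A pair survives only while some remaining pair derives its left side
    and some remaining pair starts from its right side (the pair itself
    counts, so X -> X survives).
    """
    alive = list(pairs)
    changed = True
    while changed:                      # REPEAT ... UNTIL NOT changed
        changed = False
        for pair in list(alive):
            left, right = pair
            rside = any(r == left for (_, r) in alive)
            lside = any(l == right for (l, _) in alive)
            if not (rside and lside):
                alive.remove(pair)
                changed = True
    return alive

# Usage: A and B derive each other; B -> C and D -> A are dead ends.
rules = [("A", "B"), ("B", "A"), ("B", "C"), ("D", "A")]
print(circular_part(rules))
```

As in the listing, the result can still contain pairs that merely connect two cycles, which is why the module speaks of the circular part of the grammar and not of individual cycles.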
App. F  cocotst.MOD

  1  (* cocotst     Perform various tests with the top-down graph   Moe 11.1.84
  2     =======     ==============================================
  3     This module tests
  4     a) if all nonterminals can be reached from the start symbol
  5     b) if there exist productions for all nonterminals
  6     c) if all nonterminals can be derived to terminals
  7     d) if the grammar is free of circular derivations
  8     e) if the grammar satisfies the LL(1)-conditions
  9  *)
 10  IMPLEMENTATION MODULE cocotst;
 11
 12  FROM cocogra IMPORT rootloc, ClearMarkList, Deletable, DelNode, Graphnode,
 13                      GetNode, Mark, Marked, Marklist;
 14  FROM cocolex IMPORT ddt, GetName;
 15  FROM cocolst IMPORT lst;
 16  FROM cocosym IMPORT maxp, maxs, maxt, ClearSet, GetF, GetFirstSet, GetFo,
 17                      GetSy, IsInSet, RepSy, SetBit, Unit,
 18                      Symbolnode, Symbolset, Symboltype;
 19  FROM FileIO  IMPORT con, WriteCard, WriteString, WriteText, WriteLn;
 20
 21  VAR
 22    headline: BOOLEAN; (*TRUE if header shall be printed*)
 23    ll:       BOOLEAN; (*TRUE if LL(1) conditions hold*)
 24
 25
 26  (* FindCircularRules    Test grammar for circular derivations
 27  *)
 28  PROCEDURE FindCircularRules(VAR ok:BOOLEAN);
 29    CONST
 30      circmax = 150;
 31    TYPE
 32      Circrule = RECORD
 33                   left,right: CARDINAL;
 34                   del: BOOLEAN;
 35                 END;
 36      Circrulelist = ARRAY[1..circmax] OF Circrule;
 37    VAR
 38      c: Circrulelist;
 39      changed: BOOLEAN;
 40      headline: BOOLEAN;
 41      i,j,k,dummy: CARDINAL;
 42      lcirc: CARDINAL;
 43      m: Marklist;
 44      singleset: Marklist; (*set of single nonterminals in a production*)
 45      sn: Symbolnode;
 46      rside,lside: BOOLEAN;
 47
 48    PROCEDURE GetSingles(loc:CARDINAL; VAR singles:Marklist);
 49      VAR gn: Graphnode;
 50    BEGIN
 51      IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
 52      Mark(loc,m);
 53      GetNode(loc,gn);
 54      CASE gn.typ OF
 55        eps: GetSingles(gn.rp,singles);
 56      | t,any: ;
 57      | nt: IF Deletable(gn.rp) THEN Mark(gn.sp,singles); END;
 58            IF DelNode(gn) THEN GetSingles(gn.rp,singles); END;
 59      END; (*CASE*)
 60      GetSingles(gn.lp,singles);
 61    END GetSingles;
 62
 63    PROCEDURE PutCirc(i:CARDINAL);
 64      VAR
 65        l: CARDINAL;
 66        name: ARRAY[1..50] OF CHAR;
 67        sn: Symbolnode;
 68    BEGIN
 69      IF headline THEN
 70        WriteLn(lst);
 71        WriteString(lst,"Circular part for this grammar:");
 72        WriteLn(lst);
 73        headline:=FALSE;
 74      END;
 75      WriteString(lst," ");
 76      GetSy(c[i].left,sn); GetName(sn.spix,name,l);
 77      WriteText(lst,name,l); WriteString(lst," --> ");
 78      GetSy(c[i].right,sn); GetName(sn.spix,name,l);
 79      WriteText(lst,name,l); WriteLn(lst);
 80    END PutCirc;
 81
 82  BEGIN (*FindCircularRules*)
 83    lcirc:=0;
 84    (* fill list of circular derivations c*)
 85    FOR i:=maxp+1 TO maxs DO
 86      ClearMarkList(singleset); ClearMarkList(m);
 87      GetSy(i,sn);
 88      GetSingles(sn.start,singleset); (*get nt's j such that i->j*)
 89      FOR j:=maxp+1 TO maxs DO
 90        IF Marked(j,singleset) THEN
 91          INC(lcirc);
 92          WITH c[lcirc] DO left:=i; right:=j; del:=FALSE; END;
 93          IF ddt["D"] THEN
 94            WriteCard(con,lcirc,6); WriteCard(con,i,6);
 95            WriteCard(con,j,6); WriteLn(con);
 96          END;
 97        END; (*IF Marked*)
 98      END; (*FOR j*)
 99    END; (*FOR i*)
100    (* remove non circular derivations from c*)
101    REPEAT
102      changed:=FALSE;
103      FOR i:=1 TO lcirc DO
104        IF NOT c[i].del THEN
105          rside:=FALSE; lside:=FALSE;
106          FOR j:=1 TO lcirc DO
107            IF NOT c[j].del THEN
108              IF c[i].left=c[j].right THEN rside:=TRUE; END;
109              IF c[j].left=c[i].right THEN lside:=TRUE; END;
110            END;
111          END; (*FOR j*)
112          IF NOT rside OR NOT lside THEN
113            c[i].del:=TRUE; changed:=TRUE;
114            IF ddt["D"] THEN
115              WriteCard(con,i,6); WriteString(con," deleted$");
116            END;
117          END;
118        END; (*IF NOT c[i].del*)
119      END; (*FOR*)
120    UNTIL NOT changed;
121    (* c contains the circular part of the grammar. Print it*)
122    ok:=TRUE; headline:=TRUE;
123    FOR i:=1 TO lcirc DO
124      IF NOT c[i].del THEN PutCirc(i); ok:=FALSE; END;
125    END;
126    IF ok THEN
127      WriteLn(lst);
128      WriteString(lst,"Grammar contains no circular derivations.");
129      WriteLn(lst);
130    END;
131  END FindCircularRules;
132
133
134  (* LL1Error    Print LL(1) error message
135  *)
136  PROCEDURE LL1Error(code,line,sy:CARDINAL);
137    VAR
138      l: CARDINAL;
139      name: ARRAY[1..50] OF CHAR;
140      sn: Symbolnode;
141  BEGIN
142    IF headline THEN
143      headline:=FALSE;
144      WriteLn(lst); WriteString(lst,"LL(1)-error(s):"); WriteLn(lst);
145    END;
146    WriteString(lst," line"); WriteCard(lst,line,4);
147    GetSy(sy,sn); GetName(sn.spix,name,l);
148    WriteString(lst," ");
149    CASE code OF
150      1: WriteText(lst,name,l);
151         WriteString(lst," is start of more than one alternative.");
152    | 2: WriteText(lst,name,l);
153         WriteString(lst," is start and successor of deletable ");
154         WriteString(lst,"rest of rule.");
155    END;
156    WriteLn(lst);
157    ll:=FALSE;
158  END LL1Error;
159
160
161  (* LL1Test     Collects terminal sets and checks LL(1) conditions
162  *)
163  PROCEDURE LL1Test(VAR ll1:BOOLEAN);
164    VAR
165      dummy: CARDINAL;
166      gn: Graphnode;
167      i,loc: CARDINAL;
168      m: Marklist;
169      sn: Symbolnode;
170
171
172    PROCEDURE Test(VAR s1,s2:Symbolset; code,line:CARDINAL);
173      VAR i:CARDINAL;
174    BEGIN
175      FOR i:=0 TO maxt DO
176        IF IsInSet(i,s1) AND IsInSet(i,s2) THEN
177          LL1Error(code,line,i);
178        END;
179      END;
180    END Test;
181
182
183    PROCEDURE CheckAlternatives(loc,sym:CARDINAL);
184    VAR
185      gn: Graphnode;
186      locset,s,first,follow: Symbolset;
187    BEGIN
188      IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
189      GetNode(loc,gn);
190      IF ddt["F"] THEN
191        WriteCard(con,loc,6); WriteCard(con,ORD(gn.typ),6);
192        WriteCard(con,gn.sp,6); WriteLn(con);
193      END;
194      IF Deletable(loc) THEN
195        GetFirstSet(loc,s); GetFo(sym,follow);
196        Test(s,follow,2,gn.line);
197      END;
198      ClearSet(s,maxt);
199      WHILE loc<>0 DO
200        Mark(loc,m);
201        GetNode(loc,gn);
202        IF DelNode(gn)
203          THEN GetFirstSet(gn.rp,locset);
204          ELSE ClearSet(locset,maxt);
205        END;
206        CASE gn.typ OF
207            t: SetBit(locset,gn.sp);
208          | nt: GetF(gn.sp,first); Unit(locset,first,maxt);
209          | eps,any: ;
210        END;
211        Test(s,locset,1,gn.line);
212        Unit(s,locset,maxt);
213        CheckAlternatives(gn.rp,sym);
214        loc:=gn.lp;
215      END;
216    END CheckAlternatives;
217
218
219  BEGIN (*LL1Test*)
220    ll:=TRUE; headline:=TRUE;
221    FOR i:=maxp+1 TO maxs DO
222      ClearMarkList(m);
223      GetSy(i,sn);
224      CheckAlternatives(sn.start,i);
225    END;
226    IF ll THEN
227      WriteLn(lst);
228      WriteString(lst,"Grammar satisfies LL(1)-conditions."); WriteLn(lst);
229    END;
230    ll1:=ll;
231  END LL1Test;
232
233
234  (* TestCompleteness    Test if all nonterminals have rules
235     -----------------------------------------------------------------*)
236  PROCEDURE TestCompleteness(VAR ok:BOOLEAN);
237  VAR
238    sn: Symbolnode;
239    i,l,dummy: CARDINAL;
240    name: ARRAY[1..50] OF CHAR;
241  BEGIN
242    ok:=TRUE;
243    FOR i:=maxp+1 TO maxs DO
244      GetSy(i,sn);
245      IF sn.start=0 THEN
246        IF ok THEN
247          WriteLn(lst);
248          WriteString(lst,"Nonterminals without rules:"); WriteLn(lst);
249        END;
250        GetName(sn.spix,name,l);
251        WriteString(lst," "); WriteText(lst,name,l); WriteLn(lst);
252        ok:=FALSE;
253      END;
254    END; (*FOR*)
255    IF ok THEN
256      WriteLn(lst);
257      WriteString(lst,"All nonterminals have rules."); WriteLn(lst);
258    END;
259  END TestCompleteness;
260
261
262  (* TestIfAllNtReached  Tests if all nts can be reached
263     -----------------------------------------------------------------*)
264  PROCEDURE TestIfAllNtReached(VAR ok:BOOLEAN);
265  VAR
266    gn: Graphnode;
267    i,l,dummy: CARDINAL;
268    m: Marklist;
269    name: ARRAY[1..50] OF CHAR;
270    sn: Symbolnode;
271    reached: Marklist;
272
273    PROCEDURE MarkReachedNts(loc:CARDINAL);
274    VAR gn: Graphnode;
275        sn: Symbolnode;
276    BEGIN
277      IF (loc=0) OR Marked(loc,m) THEN RETURN; END;
278      Mark(loc,m);
279      GetNode(loc,gn);
280      WITH gn DO
281        IF (typ=nt) AND NOT Marked(sp,reached) THEN
282          Mark(sp,reached); GetSy(sp,sn); MarkReachedNts(sn.start);
283        END;
284        MarkReachedNts(lp);
285        MarkReachedNts(rp);
286      END;
287    END MarkReachedNts;
288
289  BEGIN
290    ClearMarkList(m);
291    ClearMarkList(reached);
292    GetNode(rootloc,gn); Mark(gn.sp,reached);
293    GetSy(gn.sp,sn);
294    MarkReachedNts(sn.start);
295    ok:=TRUE;
296    FOR i:=maxp+1 TO maxs DO (*report not marked symbols*)
297      IF NOT Marked(i,reached) THEN
298        GetSy(i,sn); GetName(sn.spix,name,l);
299        WriteString(lst,"Nonterminal "); WriteText(lst,name,l);
300        WriteString(lst," cannot be reached."); WriteLn(lst);
301        ok:=FALSE;
302      END;
303    END;
304    IF ok THEN
305      WriteLn(lst);
306      WriteString(lst,"All nonterminals can be reached."); WriteLn(lst);
307    END;
308  END TestIfAllNtReached;
309
310
311  (* TestIfNtToTerm      Test if all nt can be derived to t
312     -----------------------------------------------------------------*)
313  PROCEDURE TestIfNtToTerm(VAR ok:BOOLEAN);
314  VAR
315    i,l,dummy: CARDINAL;
316    sn: Symbolnode;
317    name: ARRAY[1..50] OF CHAR;
318    changed: BOOLEAN;
319    termlist: Marklist; (*list of nts which can be derived to t*)
320    m: Marklist;
321    term: BOOLEAN;
322
323    PROCEDURE IsTerm(loc:CARDINAL):BOOLEAN;
324    VAR gn: Graphnode;
325    BEGIN
326      IF (loc=0) OR Marked(loc,m) THEN RETURN FALSE; END;
327      Mark(loc,m);
328      GetNode(loc,gn);
329      WITH gn DO
330        IF (typ=nt) AND NOT Marked(sp,termlist)
331          THEN RETURN IsTerm(lp);
332          ELSE RETURN (rp=0) OR IsTerm(rp) OR IsTerm(lp);
333        END;
334      END;
335    END IsTerm;
336
337  BEGIN (*TestIfNtToTerm*)
338    ClearMarkList(termlist);
339    REPEAT
340      changed:=FALSE;
341      FOR i:=maxp+1 TO maxs DO
342        IF NOT Marked(i,termlist) THEN
343          GetSy(i,sn);
344          ClearMarkList(m);
345          term:=IsTerm(sn.start);
346          IF term THEN Mark(i,termlist); changed:=TRUE; END;
347          IF ddt["E"] THEN
348            WriteCard(con,i,6);
349            IF term
350              THEN WriteString(con," reducable to term.$");
351              ELSE WriteString(con," not reducable to term.$"); END;
352          END;
353        END; (*IF NOT Marked*)
354      END; (*FOR*)
355    UNTIL NOT changed;
356    ok:=TRUE;
357    WriteLn(lst);
358    FOR i:=maxp+1 TO maxs DO
359      IF NOT Marked(i,termlist) THEN
360        GetSy(i,sn); GetName(sn.spix,name,l);
361        WriteText(lst,name,l);
362        WriteString(lst," cannot be derived to terminals."); WriteLn(lst);
363        ok:=FALSE;
364      END;
365    END; (*FOR*)
366    IF ok THEN
367      WriteString(lst,"All nonterminals can be derived to terminals.");
368      WriteLn(lst);
369    END;
370  END TestIfNtToTerm;
371
372
373  END cocotst.

any                56 209
c                  38 76 78 92 104 107 108 108 109 109 113 124
changed            39 102 113 120 318 340 346 355
CheckAlternatives  183 213 216 224
circmax            30 36
Circrule           32 36
Circrulelist       36 38
ClearMarkList      12 86 86 222 290 291 338 344
ClearSet           16 198 204
cocogra            12
cocolex            14
cocolst            15
cocosym            16
cocotst            10 373
code               136 149 172 177
con                19 94 94 95 95 115 115 191 191 192 192 348 350 351
ddt                14 93 114 190 347
del                34 92 104 107 113 124
Deletable          12 57 194
DelNode            12 58 202
dummy              41 165 239 267 315
eps                55 209
FileIO             19
FindCircularRules  28 131
first              186 208 208
follow             186 195 196
GetF               16 208
GetFirstSet        17 195 203
GetFo              17 195
GetName            14 76 78 147 250 298 360
GetNode            13 53 189 201 279 292 328
GetSingles         48 55 58 60 61 88
GetSy              17 76 78 87 147 223 244 282 293 298 343 360
gn                 49 53 54 55 57 57 58 58 60 166 185 189 191 192 196 201 202 203 206 207 208 211 213 214 266 274 279 280 292 292 293 324 328 329
Graphnode          13 49 166 185 266 274 324
headline           22 40 69 73 122 142 143 220
i                  41 63 76 78 85 87 92 94 103 104 108 109 113 115 123 124 124 167 173 175 176 176 177 221 223 224 239 243 244 267 296 297 298 315 341 342 343 346 348 358 359 360
IsInSet            17 176 176
IsTerm             323 331 332 332 335 345
j                  41 89 90 92 95 106 107 108 109
k                  41
l                  65 76 77 78 79 138 147 150 152 239 250 251 267 298 299 315 360 361
lcirc              42 83 91 92 94 103 106 123
left               33 76 92 108 109
line               136 146 172 177 196 211
ll                 23 157 220 226 230
ll1                163 230
LL1Error           136 158 177
LL1Test            163 231
loc                48 51 51 52 53 167 183 188 188 189 191 194 195 199 200 201 214 273 277 277 278 279 323 326 326 327 328
locset             186 203 204 207 208 211 212
lp                 60 214 284 331 332
lside              46 105 109 112
lst                15 70 71 72 75 77 77 79 79 127 128 129 144 144 144 146 146 148 150 151 152 153 154 156 227 228 228 247 248 248 251 251 251 256 257 257 299 299 300 300 305 306 306 357 361 362 362 367 368
m                  43 51 52 86 168 188 200 222 268 277 278 290 320 326 327 344
Mark               13 52 57 200 278 282 292 327 346
Marked             13 51 90 188 277 281 297 326 330 342 359
Marklist           13 43 44 48 168 268 271 319 320
MarkReachedNts     273 282 284 285 287 294
maxp               16 85 89 221 243 296 341 358
maxs               16 85 89 221 243 296 341 358
maxt               16 175 198 204 208 212
name               66 76 77 78 79 139 147 150 152 240 250 251 269 298 299 317 360 361
nt                 57 208 281 330
ok                 28 122 124 126 236 242 246 252 255 264 295 301 304 313 356 363 366
PutCirc            63 80 124
reached            271 281 282 291 292 297
RepSy              17
right              33 78 92 108 109
rootloc            12 292
rp                 55 57 58 203 213 285 332 332
rside              46 105 108 112
s                  186 195 196 198 211 212
s1                 172 176
s2                 172 176
SetBit             18 207
singles            48 55 57 58 60
singleset          44 86 88 90
sn                 45 67 76 76 78 78 87 88 140 147 147 169 223 224 238 244 245 250 270 275 282 282 293 294 298 298 316 343 345 360 360
sp                 57 192 207 208 281 282 282 292 293 330
spix               76 78 147 250 298 360
start              88 224 245 282 294 345
sy                 136 147
sym                183 195 213
Symbolnode         18 45 67 140 169 238
Symbolset          18 172 186
Symboltype         18
t                  56 207
term               321 345 346 349
termlist           319 330 338 342 346 359
Test               172 180 196 211
TestCompleteness   236 259
TestIfAllNtReached 264 308
TestIfNtToTerm     313 370
typ                54 191 206 281 330
Unit               18 208 212
WriteCard          19 94 94 95 115 146 191 191 192 348
WriteLn            19 70 72 79 95 127 129 144 144 156 192 227 228 247 248 251 256 257 300 305 306 357 362 368
WriteString        19 71 75 77 115 128 144 146 148 151 153 154 228 248 251 257 299 300 306 350 351 362 367
WriteText          19 77 79 150 152 251 299 361
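
The deletion loop of FindCircularRules (listing lines 100 to 120) may be easier to follow in isolation. The Python sketch below is an illustration only and is not part of Coco. It assumes the (left, right) pairs have already been collected, which the listing does with GetSingles; the function name circular_part and the use of strings for nonterminals are hypothetical.

```python
def circular_part(pairs):
    """Return the pairs that survive the deletion loop.

    pairs is a list of (left, right) tuples, each meaning that the
    nonterminal `left` can derive the single nonterminal `right`.
    A pair is deleted unless its left side occurs as the right side of
    some surviving pair AND its right side occurs as the left side of
    some surviving pair. The pass is repeated until nothing changes.
    """
    deleted = [False] * len(pairs)
    changed = True
    while changed:
        changed = False
        for i, (left_i, right_i) in enumerate(pairs):
            if deleted[i]:
                continue
            # rside: left_i appears on the right of a surviving pair
            rside = any(
                not deleted[j] and left_i == pairs[j][1]
                for j in range(len(pairs))
            )
            # lside: right_i appears on the left of a surviving pair
            lside = any(
                not deleted[j] and pairs[j][0] == right_i
                for j in range(len(pairs))
            )
            if not (rside and lside):
                deleted[i] = True
                changed = True
    return [p for i, p in enumerate(pairs) if not deleted[i]]


if __name__ == "__main__":
    # A and B derive each other; B -> C leads out of the cycle.
    print(circular_part([("A", "B"), ("B", "A"), ("B", "C")]))
```

Whatever remains after the loop is what PutCirc prints as the circular part of the grammar; an empty result corresponds to the message that the grammar contains no circular derivations.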
  1  (* Errors     General module to store error messages     Moe 21.03.84
  2     ======     ======================================
  3     This module stores information about syntax errors and semantic errors.
  4     The information can either be retrieved afterwards or be printed
  5     automatically as simple error messages.
  6     Furthermore the module contains procedures to report compiler errors
  7     and implementation restrictions. These procedures cause a program stop.
  8  *)
  9  DEFINITION MODULE Errors;
 10
 11  FROM FileIO IMPORT File;
 12
 13  TYPE
 14    Symbolname = ARRAY[1..25] OF CHAR;
 15    Errorptr   = POINTER TO Errornode;
 16    Errornode  = RECORD (*expected symbol in syntax error message*)
 17                   txt: Symbolname;
 18                   l: CARDINAL;
 19                   next: Errorptr;
 20                 END;
 21
 22
 23  PROCEDURE CompErr(nr:CARDINAL);
 24  (* Reports compiler error nr and stops the program*)
 25
 26  PROCEDURE GetNextSemErr(VAR nr,line,col:CARDINAL);
 27  (* Gets the error number, the line number and the column number of the
 28     next semantic error. nr=0 if no next error exists*)
 29
 30  PROCEDURE GetNextSynErr(VAR symbols:Errorptr; VAR line,col:CARDINAL);
 31  (* Gets the expected symbols, the line number and the column number of
 32     the next syntax error. symbols=NIL if no next error exists*)
 33
 34  PROCEDURE GetNumberOfErrors(VAR synerrors,semerrors:CARDINAL);
 35  (* Gets the total number of syntax errors and semantic errors which
 36     occurred during compilation*)
 37
 38  PROCEDURE PrintSemErrors(f:File; VAR semerrors:CARDINAL);
 39  (* Prints error messages for all stored semantic errors (line,col,
 40     error number). semerrors holds the total number of stored semantic
 41     errors*)
 42
 43  PROCEDURE PrintSynErrors(f:File; VAR synerrors:CARDINAL);
 44  (* Prints error messages for all stored syntax errors (line,col,
 45     "near symbol",expected symbols). synerrors holds the total number of
 46     stored syntax errors*)
 47
 48  PROCEDURE PrintSynError(f:File; symbols:Errorptr; col:CARDINAL);
 49  (* Prints one error message line (^ expected symbols).*)
 50
 51  PROCEDURE Restriction(nr:CARDINAL);
 52  (* Reports implementation restriction nr and stops the program*)
 53
 54  PROCEDURE SemErr(nr,line,col:CARDINAL);
 55  (* Stores the error number, line number and column number of a semantic
 56     error*)
 57
 58  PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
 59  (* Stores the "near-symbol", the expected symbols, the line number and
 60     the column number of a syntax error*)
 61
 62  END Errors.
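
The store-then-retrieve contract of SemErr and GetNextSemErr can be pictured with a small model. The Python sketch below is only an illustration under assumptions: the class name SemErrStore and its method names are hypothetical, and the ordering by line and then column mirrors what the implementation module's SemErr does rather than anything promised by this definition module.

```python
class SemErrStore:
    """Toy model of the semantic error list behind SemErr/GetNextSemErr."""

    def __init__(self):
        # Each entry is (nr, line, col); the list stays ordered by
        # line first and column second.
        self._errors = []

    def sem_err(self, nr, line, col):
        """Store an error, keeping the list sorted by (line, col)."""
        pos = 0
        # Skip all errors on earlier lines.
        while pos < len(self._errors) and self._errors[pos][1] < line:
            pos += 1
        # On the same line, skip errors in earlier columns.
        while (pos < len(self._errors)
               and self._errors[pos][1] == line
               and self._errors[pos][2] < col):
            pos += 1
        self._errors.insert(pos, (nr, line, col))

    def get_next_sem_err(self):
        """Remove and return the next error; nr == 0 means none is left."""
        if not self._errors:
            return (0, 0, 0)
        return self._errors.pop(0)

    def number_of_errors(self):
        return len(self._errors)


if __name__ == "__main__":
    store = SemErrStore()
    store.sem_err(7, 10, 4)
    store.sem_err(3, 2, 9)
    store.sem_err(5, 10, 1)
    while True:
        nr, line, col = store.get_next_sem_err()
        if nr == 0:
            break
        print("line", line, "col", col, "error", nr)
```

A caller that wants its own message texts would loop on the retrieval call in this way instead of using PrintSemErrors; retrieval consumes the stored entries.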
  1  (* Errors     General module to store error messages     Moe 21.03.84
  2     ======     ======================================
  3     This module stores information about syntax errors and semantic errors.
  4     The information can either be retrieved afterwards or be printed
  5     automatically as simple error messages.
  6     Furthermore the module contains procedures to report compiler errors
  7     and implementation restrictions. These procedures cause a program stop.
  8  *)
  9  IMPLEMENTATION MODULE Errors;
 10
 11  (*imports of definition module*)
 12  FROM FileIO IMPORT File;
 13
 14  (*imports of implementation module*)
 15  FROM FileIO IMPORT con, Write, WriteCard, WriteLn, WriteString,
 16                     WriteText, Read;
 17  FROM System IMPORT Allocate, Deallocate, Terminate, normal;
 18
 19
 20  TYPE
 21    Semerrptr = POINTER TO Semerror;
 22    Semerror  = RECORD
 23                  nr,line,col: CARDINAL;
 24                  next: Semerrptr;
 25                END;
 26    Synerrptr = POINTER TO Synerror;
 27    Synerror  = RECORD
 28                  symbols: Errorptr;
 29                  line,col: CARDINAL;
 30                  next: Synerrptr;
 31                END;
 32
 33  VAR
 34    semerr: Semerrptr;
 35    synerr: Synerrptr;
 36
 37
 38  (* CompErr     Reports compiler error nr and stops the program
 39     -----------------------------------------------------------------*)
 40  PROCEDURE CompErr(nr:CARDINAL);
 41  VAR dummy:CARDINAL; ch:CHAR;
 42  BEGIN
 43    PrintSynErrors(con,dummy); PrintSemErrors(con,dummy);
 44    WriteString(con,"Compiler error "); WriteCard(con,nr,0);
 45    WriteString(con,". Program terminated.$");
 46    WriteString(con,"Press a key to continued"); Read(con,ch);
 47    Terminate(normal);
 48  END CompErr;
 49
 50
 51  (* GetNextSemErr     Gets next semantic error information
 52     -----------------------------------------------------------------*)
 53  PROCEDURE GetNextSemErr(VAR nr,line,col:CARDINAL);
 54  VAR p: Semerrptr;
 55  BEGIN
 56    IF semerr=NIL
 57      THEN nr:=0; line:=0; col:=0;
 58      ELSE
 59        p:=semerr;
 60        nr:=p^.nr; line:=p^.line; col:=p^.col;
 61        semerr:=p^.next; Deallocate(p);
 62    END;
 63  END GetNextSemErr;
 64
 65
 66  (* GetNextSynErr     Gets next syntax error information
 67     -----------------------------------------------------------------*)
 68  PROCEDURE GetNextSynErr(VAR symbols:Errorptr; VAR line,col:CARDINAL);
 69  VAR p: Synerrptr;
 70  BEGIN
 71    IF synerr=NIL
 72      THEN symbols:=NIL; line:=0; col:=0;
 73      ELSE
 74        p:=synerr;
 75        symbols:=p^.symbols; line:=p^.line; col:=p^.col;
 76        synerr:=p^.next; Deallocate(p);
 77    END;
 78  END GetNextSynErr;
 79
 80
 81  (* GetNumberOfErrors     Gets the total number of errors that occurred
 82     -----------------------------------------------------------------*)
 83  PROCEDURE GetNumberOfErrors(VAR synerrors,semerrors:CARDINAL);
 84  VAR
 85    syn: Synerrptr;
 86    sem: Semerrptr;
 87  BEGIN
 88    synerrors:=0; syn:=synerr;
 89    WHILE syn<>NIL DO INC(synerrors); syn:=syn^.next; END;
 90    semerrors:=0; sem:=semerr;
 91    WHILE sem<>NIL DO INC(semerrors); sem:=sem^.next; END;
 92  END GetNumberOfErrors;
 93
 94
 95  (* PrintSemErrors     Prints simple error messages for semantic errors
 96     -----------------------------------------------------------------*)
 97  PROCEDURE PrintSemErrors(f:File; VAR semerrors:CARDINAL);
 98  VAR
 99    p: Semerrptr;
100    synerrors: CARDINAL;
101  BEGIN
102    GetNumberOfErrors(synerrors,semerrors);
103    IF semerrors>0 THEN
104      WriteString(f,"Semantic errors:$$");
105      p:=semerr;
106      WHILE p<>NIL DO
107        WriteString(f,"line"); WriteCard(f,p^.line,5);
108        WriteString(f," col"); WriteCard(f,p^.col,3);
109        WriteString(f,": error "); WriteCard(f,p^.nr,0);
110        WriteLn(f);
111        p:=p^.next;
112      END;
113    END;
114  END PrintSemErrors;
115
116
117  (* PrintSym     Print a symbol in error message
118     -----------------------------------------------------------------*)
119  PROCEDURE PrintSym(f:File; txt:ARRAY OF CHAR; len:CARDINAL);
120  BEGIN
121    IF len=1
122      THEN Write(f,'"'); Write(f,txt[0]); Write(f,'"');
123      ELSE WriteText(f,txt,len);
124    END;
125  END PrintSym;
126
127
128  (* PrintExpected     Print expected symbols
129     -----------------------------------------------------------------*)
130  PROCEDURE PrintExpected(f:File; VAR p:Errorptr);
131  VAR first:BOOLEAN; q:Errorptr;
132  BEGIN
133    first:=TRUE;
134    WHILE p<>NIL DO
135      IF first THEN first:=FALSE
136      ELSIF p^.next=NIL THEN WriteString(f,' or ')
137      ELSE WriteString(f,', ')
138      END;
139      PrintSym(f,p^.txt,p^.l);
140      q:=p; p:=p^.next; Deallocate(q);
141    END;
142    WriteString(f,' expected'); WriteLn(f);
143  END PrintExpected;
144
145
146  (* PrintSynErrors     Prints simple error messages for syntax errors
147     -----------------------------------------------------------------*)
148  PROCEDURE PrintSynErrors(f:File; VAR synerrors:CARDINAL);
149  VAR
150    err,err1: Synerrptr;
151    p: Errorptr;
152    semerrors: CARDINAL;
153  BEGIN
154    GetNumberOfErrors(synerrors,semerrors);
155    IF synerrors>0 THEN
156      WriteString(f,"Syntax errors:$$");
157      err:=synerr;
158      WHILE err<>NIL DO
159        WriteString(f,'line'); WriteCard(f,err^.line,5);
160        p:=err^.symbols;
161        WriteString(f,' near '); PrintSym(f,p^.txt,p^.l);
162        WriteString(f,': ');
163        PrintExpected(f,p^.next); Deallocate(p);
164        err1:=err; err:=err^.next; Deallocate(err1);
165      END;
166    END;
167  END PrintSynErrors;
168
169
170  (* PrintSynError     Prints one error message line
171     -----------------------------------------------------------------*)
172  PROCEDURE PrintSynError(f:File; symbols:Errorptr; col:CARDINAL);
173  VAR i: CARDINAL;
174  BEGIN
175    WriteString(f,"***** "); FOR i:=1 TO col-1 DO Write(f," ") END;
176    WriteString(f,"^ ");
177    PrintExpected(f,symbols^.next); Deallocate(symbols);
178  END PrintSynError;
179
180
181  (* Restriction     Reports impl. restriction nr and stops the program
182     -----------------------------------------------------------------*)
183  PROCEDURE Restriction(nr:CARDINAL);
184  VAR dummy:CARDINAL; ch:CHAR;
185  BEGIN
186    PrintSynErrors(con,dummy); PrintSemErrors(con,dummy);
187    WriteString(con,"Implementation restriction "); WriteCard(con,nr,0);
188    WriteString(con,". Program terminated.$");
189    WriteString(con,"Press a key to continued"); Read(con,ch);
190    Terminate(normal);
191  END Restriction;
192
193
194  (* SemErr     Stores information about semantic error
195     -----------------------------------------------------------------*)
196  PROCEDURE SemErr(nr,line,col:CARDINAL);
197  VAR e,p,q: Semerrptr;
198  BEGIN
199    Allocate(e,SIZE(Semerror)); e^.nr:=nr; e^.line:=line; e^.col:=col;
200    p:=semerr; q:=NIL;
201    WHILE (p<>NIL) AND (p^.line<line) DO q:=p; p:=p^.next; END;
202    WHILE (p<>NIL) AND (p^.line=line) AND (p^.col<col) DO
203      q:=p; p:=p^.next;
204    END;
205    IF q=NIL THEN semerr:=e; ELSE q^.next:=e; END;
206    e^.next:=p;
207  END SemErr;
208
209
210  (* SyntaxError     Stores information about syntax error
211     -----------------------------------------------------------------*)
212  PROCEDURE SyntaxError(symbols:Errorptr; line,col:CARDINAL);
213  VAR e,p,q: Synerrptr;
214  BEGIN
215    Allocate(e,SIZE(Synerror));
216    e^.symbols:=symbols; e^.line:=line; e^.col:=col;
217    p:=synerr; q:=NIL;
218    WHILE (p<>NIL) AND (p^.line<line) DO q:=p; p:=p^.next; END;
219    WHILE (p<>NIL) AND (p^.line=line) AND (p^.col<col) DO
220      q:=p; p:=p^.next;
221    END;
222    IF q=NIL THEN synerr:=e; ELSE q^.next:=e; END;
223    e^.next:=p;
224  END SyntaxError;
225
226  BEGIN (*Errors*)
227    synerr:=NIL; semerr:=NIL;
228  END Errors.

Allocate           17 199 215
ch                 41 46 184 189
col                23 29 53 57 60 60 68 72 75 75 108 172 175 196 199 199 202 202 212 216 216 219 219
CompErr            40 48
con                15 43 43 44 44 45 46 46 186 186 187 187 188 189 189
Deallocate         17 61 76 140 163 164 177
dummy              41 43 43 184 186 186
e                  197 199 199 199 199 205 205 206 213 215 216 216 216 222 222 223
err                150 157 158 159 160 164 164 164
err1               150 164 164
Errorptr           28 68 130 131 151 172 212
Errors             9 228
f                  97 104 107 107 108 108 109 109 110 119 122 122 122 123 130 136 137 139 142 142 148 156 159 159 161 161 162 163 172 175 175 176 177
File               12 97 119 130 148 172
FileIO             12 15
first              131 133 135 135
GetNextSemErr      53 63
GetNextSynErr      68 78
GetNumberOfErrors  83 92 102 154
i                  173 175
l                  139 161
len                119 121 123
line               23 29 53 57 60 60 68 72 75 75 107 159 196 199 199 201 201 202 202 212 216 216 218 218 219 219
next               24 30 61 76 89 91 111 136 140 163 164 177 201 203 205 206 218 220 222 223
normal             17 47 190
nr                 23 40 44 53 57 60 60 109 183 187 196 199 199
p                  54 59 60 60 60 61 61 69 74 75 75 75 76 76 99 105 106 107 108 109 111 111 130 134 136 139 139 140 140 140 151 160 161 161 163 163 197 200 201 201 201 201 201 202 202 202 203 203 203 206 213 217 218 218 218 218 218 219 219 219 220 220 220 223
PrintExpected      130 143 163 177
PrintSemErrors     43 97 114 186
PrintSym           119 125 139 161
PrintSynError      172 178
PrintSynErrors     43 148 167 186
q                  131 140 140 197 200 201 203 205 205 213 217 218 220 222 222
Read               16 46 189
Restriction        183 191
sem                86 90 91 91 91
semerr             34 56 59 61 90 105 200 205 227
SemErr             196 207
Semerror           21 22 199
semerrors          83 90 91 97 102 103 152 154
Semerrptr          21 24 34 54 86 99 197
symbols            28 68 72 75 75 160 172 177 177 212 216 216
syn                85 88 89 89 89
synerr             35 71 74 76 88 157 217 222 227
Synerror           26 27 215
synerrors          83 88 89 100 102 148 154 155
Synerrptr          26 30 35 69 85 150 213
SyntaxError        212 224
System             17
Terminate          17 47 190
txt                119 122 123 139 161
Write              15 122 122 122 175
WriteCard          15 44 107 108 109 159 187
WriteLn            15 110 142
WriteString        15 44 45 46 104 107 108 109 136 137 142 156 159 161 162 175 176 187 188 189
WriteText          16 123
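
The way PrintExpected and PrintSym assemble the "expected" part of a syntax error message can be sketched as follows. This Python fragment is illustrative only: the function names are hypothetical, a plain list stands in for the linked Errornode list, and the separators ", " and " or " are assumed from the listing, where the scan of those string constants is damaged.

```python
def format_symbol(txt):
    """Mimic PrintSym: one-character symbols are shown in double quotes."""
    return '"' + txt + '"' if len(txt) == 1 else txt


def format_expected(symbols):
    """Mimic PrintExpected for a list of expected symbol names.

    The first symbol gets no separator, the last one is introduced by
    " or ", every other one by ", ". The text " expected" is appended.
    """
    parts = []
    last = len(symbols) - 1
    for k, sym in enumerate(symbols):
        if k == 0:
            sep = ""
        elif k == last:
            sep = " or "
        else:
            sep = ", "
        parts.append(sep + format_symbol(sym))
    return "".join(parts) + " expected"


if __name__ == "__main__":
    print(format_expected(["ident", "number", ";"]))
```

Unlike the Modula-2 procedure, the sketch does not free the list while printing; in the listing each node is deallocated as soon as it has been written, so a stored syntax error can be printed only once.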
  1  (* FileIO     Simple IO with more than one file     Moe 16.8.87
  2     ======     =================================
  3     This module provides procedures which are similar to those of InOut,
  4     except that they can be used with more than one file (even with the
  5     console).
  6  *)
  7  DEFINITION MODULE FileIO;
  8
  9  FROM SYSTEM IMPORT WORD;
 10  FROM Toolbox IMPORT DialogPtr;
 11  FROM OS IMPORT ParmBlkPtr;
 12
 13  CONST
 14    DEL = 177C;
 15    EF  = 4C;
 16    EOL = 15C;
 17    ESC = 33C;
 18    buffersize = 16*1024;
 19
 20  TYPE
 21    File = POINTER TO FileRecord;
 22    FileRecord = RECORD
 23      ref: INTEGER;     (*file reference number*)
 24      volRef: INTEGER;  (*volume (subdirectory) reference number*)
 25      name: ARRAY[0..63] OF CHAR; (*Modula string terminated by 0C*)
 26      buffer: ARRAY[0..buffersize-1] OF CHAR;
 27      bp: CARDINAL;     (*index of next byte in buffer*)
 28      bb: CARDINAL;     (*number of bytes in buffer*)
 29      output: BOOLEAN;  (*true, if opened for output*)
 30      eof: BOOLEAN;     (*true, if no more unread bytes*)
 31    END;
 32
 33  VAR
 34    con: File;      (*console file (screen and keyboard)*)
 35    Done: BOOLEAN;  (*TRUE if an operation was successful*)
 36    termCH: CHAR;   (*first character after input text*)
 37
 38  (* --- for Mac open dialog box (see "Inside Macintosh") --- *)
 39  TYPE
 40    FilterHook = PROCEDURE(ParmBlkPtr): BOOLEAN;
 41    DialogHook = PROCEDURE(INTEGER, DialogPtr): INTEGER;
 42    Filetype   = ARRAY[0..3] OF CHAR;
 43
 44  VAR
 45    errCode: INTEGER;        (*file manager status code*)
 46    filterHook: FilterHook;  (*file filter procedure (init none)*)
 47    dlgHook: DialogHook;     (*dialog handling procedure (init none)*)
 48    ftype: ARRAY[0..3] OF Filetype;
 49                             (*file types to be handled by open dialog*)
 50                             (*init: ftype[0]:="TEXT", ftype[1..3]:=""*)
 51  (* ------------------------------------------------------- *)
 52
 53  PROCEDURE Open(VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR;
 54                 output:BOOLEAN);
 55  (* Opens file f with name fn on volume (subdirectory) volRef.
 56     volRef  0:default volume; 1:internal drive; 2:external drive
 57             negative:volume or subdirectory reference number.
 58     fn      - If not empty, fn is the name of the file to be opened on
 59               volume (subdirectory) volRef. The drive number may be placed
 60               in front of the file name separated by a colon (e.g. 1:name).
 61               It overwrites volRef.
 62             - If empty, an open dialog box is displayed which allows
 63               choosing the volume, subdirectory and filename. The chosen
 64               values are returned in f^. The value of volRef is irrelevant
 65               in this case.
 66               (Advanced programmers: Only those files are displayed whose
 67               file type is contained in ftype. Own procedures may be
 68               supplied in the variables "filterHook" and "dlgHook" to
 69               suppress file names in the open box or to handle additional
 70               dialog items.)
 71     output  TRUE: the specified file is opened for output. Any existing
 72             file with the same name is deleted.
 73             FALSE: the specified file is opened for input.
 74     Done    indicates if the file f has been opened successfully.*)
 75
 76  PROCEDURE Close(VAR f:File);
 77  (* Closes file f. f becomes NIL*)
 78
 79  PROCEDURE Read(f:File; VAR ch:CHAR);
 80  (* Reads a character ch from the file f (no echo on the console).
 81     Done indicates if the operation has been successful*)
 82
 83  PROCEDURE ReadCard(f:File; VAR val:CARDINAL);
 84  (* Reads a CARDINAL from file f (leading blanks are skipped).
 85     termCH and Done get values*)
 86
 87  PROCEDURE ReadInt(f:File; VAR val:INTEGER);
 88  (* Reads an INTEGER from file f (leading blanks are skipped).
 89     termCH and Done get values*)
 90
 91  PROCEDURE ReadString(f:File; VAR s:ARRAY OF CHAR);
 92  (* Reads a string of characters (terminated by " " or CR) from
 93     file f. termCH and Done get values*)
 94
 95  PROCEDURE ReadWord(f:File; VAR w:CARDINAL);
 96  (* Reads a 16 bit word w from the file f without conversion*)
 97
 98  PROCEDURE Write(f:File; ch:CHAR);
 99  (* Writes a character ch to the file f*)
100
101  PROCEDURE WriteCard(f:File; nr:CARDINAL; w:INTEGER);
102  (* Writes a CARDINAL nr with width w to the file f. If the actual
103     width of nr is bigger than w, w is expanded*)
104
105  PROCEDURE WriteHex(f:File; a:ARRAY OF WORD; length:INTEGER);
106  (* Writes length hexadecimal bytes from a to the file f*)
107
108  PROCEDURE WriteInt(f:File; i:INTEGER; w:INTEGER);
109  (* Writes an INTEGER i with w characters to file f. If the actual
110     width of i is bigger than w, w is expanded*)
111
112  PROCEDURE WriteLn(f:File);
113  (* Skips to the start of the next line on the file f*)
114
115  PROCEDURE WriteString(f:File; s:ARRAY OF CHAR);
116  (* Writes a string s to the file f. Any occurrence of the character
117     "$" in s causes a WriteLn*)
118
119  PROCEDURE WriteText(f:File; t:ARRAY OF CHAR; l:INTEGER);
120  (* Writes a text t with length l to the file f*)
121
122  PROCEDURE WriteWord(f:File; w:CARDINAL);
123  (* Writes a 16 bit word w without conversion to the file f*)
124
125  END FileIO.
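
Two conventions of this interface show up throughout the listings: WriteString turns every "$" into a line break, and WriteCard pads a number to a minimum width without ever truncating it. The Python sketch below models both on strings instead of files. It is an illustration under assumptions: the function names, the eol parameter and the string return values are hypothetical, and stopping at a 0C character follows the implementation module rather than this definition module.

```python
def write_card(nr, w):
    """Model of WriteCard: right-justify nr in a field of at least w chars.

    If the number needs more than w characters, the field simply grows.
    """
    digits = str(nr)
    return " " * max(0, w - len(digits)) + digits


def write_string(s, eol="\n"):
    """Model of WriteString: "$" becomes a line break, 0C ends the string."""
    out = []
    for ch in s:
        if ch == "\0":
            break
        out.append(eol if ch == "$" else ch)
    return "".join(out)


if __name__ == "__main__":
    # Roughly what PrintSemErrors produces for one stored error.
    text = write_string("Semantic errors:$$")
    text += write_string("line") + write_card(12, 5)
    text += write_string(" col") + write_card(3, 3)
    text += write_string(": error ") + write_card(41, 0)
    print(text)
```

A width of 0, as used for error numbers in the Errors module, therefore prints the number with no padding at all.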
  1  (* FileIO     Simple IO with more than one file     Moe 16.8.87
  2     ======     =================================
  3     This module provides procedures which are similar to those of InOut,
  4     except that they can be used with more than one file (even with the
  5     console).
  6  *)
  7  IMPLEMENTATION MODULE FileIO;
  8
  9  FROM SYSTEM IMPORT WORD, ADR, SETREG, REG, SHORT, VAL;
 10  FROM MemTypes IMPORT Str255, ProcPtr;
 11  FROM OS IMPORT DupFNErr, EOFErr, OSType, ParamBlockRec,
 12                 FS, PBHOpen, PBHCreate, PBClose, PBHDelete, PBRead,
 13                 PBWrite,
 14                 HFS, GetCatInfo, SetCatInfo,
 15                 SFGetFile, SFPutFile, SFget, SFput, SFReply,
 16                 SFTypeList;
 17  FROM QuickDraw IMPORT Point;
 18  FROM Toolbox IMPORT ModStr, PasStr;
 19  FROM System IMPORT Allocate, Deallocate;
 20  IMPORT Terminal;
 21
 22
 23  (* Open     Open a file on the specified volume
 24     -----------------------------------------------------------------*)
 25  PROCEDURE Open(VAR f:File; volRef:INTEGER; fn:ARRAY OF CHAR;
 26                 output:BOOLEAN);
 27  VAR
 28    par: ParamBlockRec;
 29    s: Str255;
 30    pt: Point;
 31    reply: SFReply;
 32    tlist: SFTypeList;
 33    i,j,l: INTEGER;
 34
 35    PROCEDURE Create (drive:INTEGER; name:ARRAY OF CHAR;
 36                      type,creator:OSType; VAR status:INTEGER);
 37    VAR status1:INTEGER; par:ParamBlockRec;
 38    BEGIN
 39      WITH par DO
 40        ioNamePtr:=ADR(name); ioVRefNum:=drive; ioVersNum:=0C; ioDirID:=0;
 41        status:=FS(PBHCreate,par); status1:=0;
 42        IF status=DupFNErr THEN
 43          status1:=FS(PBHDelete,par);
 44          status:=FS(PBHCreate,par);
 45        END;
 46        IF (status=0) AND (status1=0) THEN (*set finder info*)
 47          ioFDirIndex:=0; status:=HFS(GetCatInfo,par);
 48          IF status=0 THEN
 49            ioFlFndrInfo.fdType:=type; ioFlFndrInfo.fdCreator:=creator;
 50            ioDirID:=0;
 51            status:=HFS(SetCatInfo,par);
 52          END;
 53        END;
 54      END;
 55    END Create;
 56
 57  BEGIN
 58    Done:=TRUE; errCode:=0;
 59    IF fn[0]=0C THEN (*get file name from dialog box*)
 60      pt.v:=60; pt.h:=100; PasStr(fn,s);
 61      IF output
 62        THEN SFPutFile(pt,s,s, VAL(ProcPtr,dlgHook),reply,SFput)
 63        ELSE
 64          i:=0;
 65          WHILE (i<4) AND (ftype[i,0]<>0C) DO
 66            FOR j:=0 TO 3 DO tlist[i,j+1]:=ftype[i,j] END;
 67            INC(i)
 68          END;
 69          SFGetFile(pt, s, VAL(ProcPtr,filterHook),i,tlist,
 70                    VAL(ProcPtr,dlgHook),reply,SFget)
 71      END;
 72      IF reply.good
 73        THEN
 74          l:=ORD(reply.fName[0]);
 75          FOR i:=0 TO l DO s[i]:=reply.fName[i]; END;
 76          volRef:=reply.vRefNum
 77        ELSE errCode:=2 (*cancel*)
 78      END;
 79    ELSIF (fn[1]=":") AND (fn[0]>="0") AND (fn[0]<="9") THEN
 80      volRef:=ORD(fn[0])-ORD("0");
 81      i:=2;
 82      WHILE (i<=HIGH(fn)) AND (fn[i]<>0C) DO s[i-1]:=fn[i]; INC(i) END;
 83      s[0]:=CHR(i);
 84    ELSE PasStr(fn,s);
 85    END;
 86
 87    IF output & (errCode=0) THEN
 88      Create(volRef,s,"TEXT","????",errCode);
 89    END;
 90
 91    IF errCode=0 THEN
 92      WITH par DO
 93        ioNamePtr:=ADR(s); ioVRefNum:=volRef; ioVersNum:=0C; ioDirID:=0;
 94        ioPermssn:=0C; ioMisc:=NIL;
 95        errCode:=FS(PBHOpen,par);
 96        IF errCode=0 THEN
 97          Allocate(f,SIZE(FileRecord));
 98          IF f<>NIL THEN
 99            f^.ref:=ioRefNum; f^.volRef:=volRef; ModStr(s,f^.name);
100            f^.bp:=0; f^.bb:=0; f^.eof:=FALSE; f^.output:=output;
101          END;
102        END;
103      END;
104    END;
105    IF errCode#0 THEN Done:=FALSE; f:=NIL END;
106  END Open;
107
108
109  (* Close     Close file f
110     -----------------------------------------------------------------*)
111  PROCEDURE Close(VAR f:File);
112  VAR par:ParamBlockRec;
113  BEGIN
114    IF f=NIL THEN RETURN END; (*con cannot be closed*)
115    par.ioRefNum:=f^.ref;
116    IF f^.output THEN
117      par.ioBuffer:=ADR(f^.buffer);
118      par.ioReqCount:=f^.bp; par.ioPosMode:=0; par.ioPosOffset:=0;
119      errCode:=FS(PBWrite,par)
120    END;
121    errCode:=FS(PBClose,par); Done:=errCode=0;
122    Deallocate(f); f:=NIL;
123  END Close;
124
125
126  (* Read     Read a character from file f
127     -----------------------------------------------------------------*)
128  PROCEDURE Read(f:File; VAR ch:CHAR);
129  VAR par:ParamBlockRec;
130  BEGIN
131    IF f=NIL (*con*)
132      THEN Terminal.Read(ch);
133      ELSE
134        WITH f^ DO
135          IF bp>=bb THEN
136            par.ioRefNum:=ref; par.ioBuffer:=ADR(buffer);
137            par.ioReqCount:=buffersize; par.ioPosMode:=0;
138            par.ioPosOffset:=0;
139            errCode:=FS(PBRead,par);
140            IF errCode=EOFErr THEN errCode:=0 END;
141            bb:=SHORT(par.ioActCount); bp:=0;
142            IF bb=0 THEN
143              buffer[0]:=EF; eof:=TRUE; Done:=FALSE; errCode:=EOFErr
144            END
145          END;
146          ch:=buffer[bp]; INC(bp)
147        END
148    END;
149  END Read;
150
151
152  (* ReadCard     Read a CARDINAL-constant from file f
153     -----------------------------------------------------------------*)
154  PROCEDURE ReadCard(f:File; VAR val:CARDINAL);
155  VAR ch:CHAR; i:INTEGER;
156  BEGIN
157    IF f=NIL (*con*)
158      THEN (*input from terminal*)
159        i:=0; val:=0;
160        REPEAT Terminal.Read(ch); UNTIL ch<>" ";
161        WHILE ch>" " DO
162          IF ch=DEL THEN
163            IF i>0 THEN
164              Terminal.Write(ch); DEC(i); val:=val DIV 10;
165            END;
166          ELSIF (ch>="0") AND (ch<="9") AND
167                ((val<6553) OR ((val=6553) AND (ch<="5"))) THEN
168            Terminal.Write(ch); INC(i);
169            val:=10*val+VAL(CARDINAL,ORD(ch)-ORD("0"));
170          END;
171          Terminal.Read(ch);
172        END;
173        Done:=i>0;
174      ELSE (*input from file*)
175        val:=0; Done:=TRUE;
176        REPEAT Read(f,ch) UNTIL ch<>" ";
177        WHILE ch>" " DO
178          IF (ch>="0") AND (ch<="9") AND Done AND
179             ((val<6553) OR ((val=6553) AND (ch<="5")))
180            THEN val:=10*val+VAL(CARDINAL,ORD(ch)-ORD("0"));
181            ELSE Done:=FALSE; val:=0;
182          END;
183          Read(f,ch);
184        END;
185    END;
186    termCH:=ch;
187  END ReadCard;
188
189
190  (* ReadInt     Read an INTEGER-constant from file f
191     -----------------------------------------------------------------*)
192  PROCEDURE ReadInt(f:File; VAR val:INTEGER);
193  VAR
194    ch: CHAR;
195    sign: INTEGER;
196    x: CARDINAL;
197    s: ARRAY[1..80] OF CHAR;
198    i: INTEGER;
199  BEGIN
200    ReadString(f,s);
201    x:=0; val:=0; i:=1;
202    IF s[i]="-" THEN sign:=-1; INC(i); ELSE sign:=1; END;
203    ch:=s[i];
204    LOOP
205      IF ch=0C THEN Done:=TRUE; EXIT; END;
206      IF (ch<"0") OR (ch>"9") THEN Done:=FALSE; EXIT; END;
207      IF (x>3276) OR ((x=3276) AND (ch>"8")) THEN Done:=FALSE; EXIT END;
208      x:=10*x+VAL(CARDINAL,ORD(ch)-ORD("0"));
209      INC(i); ch:=s[i];
210    END;
211    IF Done THEN
212      IF x<=32767 THEN val:=sign*VAL(INTEGER,x);
213      ELSIF sign=-1 THEN val:=-32767; DEC(val);
214      ELSE Done:=FALSE; END;
215    END;
216  END ReadInt;
217
218
219  (* ReadString     Read a string of characters from file f
220     -----------------------------------------------------------------*)
221  PROCEDURE ReadString(f:File; VAR s:ARRAY OF CHAR);
222  VAR i:INTEGER; ch:CHAR;
223  BEGIN
224    IF f=NIL (*con*)
225      THEN
226        REPEAT Terminal.Read(ch); UNTIL ch<>" ";
227        i:=-1;
228        WHILE ch>" " DO
229          IF ch=DEL THEN
230            IF i>=0 THEN Terminal.Write(10C); DEC(i); END;
231          ELSIF i<HIGH(s) THEN
232            Terminal.Write(ch); INC(i); s[i]:=ch;
233          END;
234          Terminal.Read(ch);
235        END;
236      ELSE
237        REPEAT Read(f,ch); UNTIL ch<>" ";
238        i:=-1;
239        WHILE ch>" " DO
240          IF i<HIGH(s) THEN INC(i); s[i]:=ch; END;
241          Read(f,ch);
242        END;
243    END;
244    termCH:=ch;
245    INC(i);
246    IF i<=HIGH(s) THEN s[i]:=0C; END;
247  END ReadString;
248
249
250  (* ReadWord     Read a word from File f without conversion
251     -----------------------------------------------------------------*)
252  PROCEDURE ReadWord(f:File; VAR w:CARDINAL);
253  VAR i,j: CHAR;
254  BEGIN
255    Read(f,i); Read(f,j);
256    w:=256*ORD(i) + ORD(j);
257  END ReadWord;
258
259
260  (* Write     Write a character to list file
261     -----------------------------------------------------------------*)
262  PROCEDURE Write(f:File; ch:CHAR);
263  VAR par:ParamBlockRec; status:INTEGER;
264  BEGIN
265    IF f=NIL (*con*)
266      THEN Terminal.Write(ch);
267      ELSE
268        WITH f^ DO
269          IF bp>=buffersize THEN
270            par.ioRefNum:=ref; par.ioBuffer:=ADR(buffer);
271            par.ioReqCount:=buffersize; par.ioPosMode:=0;
272            par.ioPosOffset:=0;
273            status:=FS(PBWrite,par);
274            bp:=0
275          END;
276          buffer[bp]:=ch; INC(bp)
277        END
278    END;
279  END Write;
280
281
282  (* WriteCard     Write a cardinal to list file
283     -----------------------------------------------------------------*)
284  PROCEDURE WriteCard(f:File; nr:CARDINAL; w:INTEGER);
285  VAR
286    l,d: INTEGER;
287    t: ARRAY[1..5] OF CHAR;
288  BEGIN
289    l:=0;
290    REPEAT
291      d:=nr MOD 10; nr:=nr DIV 10;
292      INC(l); t[l]:=CHR(ORD("0")+d);
293    UNTIL nr=0;
294    WHILE w>l DO Write(f," "); DEC(w); END;
295    WHILE l>0 DO Write(f,t[l]); DEC(l); END;
296  END WriteCard;
297
298
299  (* WriteHex     Write length bytes from a
300     -----------------------------------------------------------------*)
301  PROCEDURE WriteHex(f:File; s:ARRAY OF WORD; length:INTEGER);
302  VAR i,j:INTEGER; w:CARDINAL;
303
304    PROCEDURE WriteHexDigit(b:INTEGER);
305    BEGIN
306      IF b<10
307        THEN Write(f,CHR(b+ORD("0")));
308        ELSE Write(f,CHR(b-10+ORD("A"))); END;
309    END WriteHexDigit;
310
311  BEGIN (*WriteHex*)
312    j:=0;
313    FOR i:=1 TO length DO
314      IF ODD(i)
315        THEN w:=VAL(CARDINAL,s[j]) DIV 256;
316        ELSE w:=VAL(CARDINAL,s[j]) MOD 256; INC(j);
317      END;
318      Write(f," ");
319      WriteHexDigit(w DIV 16);
320      WriteHexDigit(w MOD 16);
321    END;
322  END WriteHex;
323
324
325  (* WriteInt     Write an INTEGER-value to file f
326     -----------------------------------------------------------------*)
327  PROCEDURE WriteInt(f:File; i:INTEGER; w:INTEGER);
328  VAR
329    l,d: INTEGER;
330    x: CARDINAL;
331    t: ARRAY[1..5] OF CHAR;
332    sign: CHAR;
333  BEGIN
334    IF i<0
335      THEN sign:="-"; x:=VAL(CARDINAL,ABS(i+1)); INC(x);
336      ELSE sign:=" "; x:=VAL(CARDINAL,ABS(i));
337    END;
338    l:=0;
339    REPEAT
340      d:=x MOD 10; x:=x DIV 10;
341      INC(l); t[l]:=CHR(ORD("0")+d);
342    UNTIL x=0;
343    WHILE w>l+1 DO Write(f," "); DEC(w); END;
344    IF (sign="-") OR (w>l) THEN Write(f,sign); END;
345    WHILE l>0 DO Write(f,t[l]); DEC(l); END;
346  END WriteInt;
347
348
349  (* WriteLn     Skip to new line on list file
350     -----------------------------------------------------------------*)
351  PROCEDURE WriteLn(f:File);
352  BEGIN
353    IF f=NIL (*con*)
354      THEN Terminal.WriteLn;
355      ELSE Write(f,EOL);
356    END;
357  END WriteLn;
358
359
360  (* WriteString   Write a string to list file
361  *)
362  PROCEDURE WriteString(f:File; s:ARRAY OF CHAR);
363    VAR i: INTEGER;
364  BEGIN
365    i:=0;
366    LOOP
367      IF i>HIGH(s) THEN EXIT;
368      ELSIF s[i]="$" THEN WriteLn(f);
369      ELSIF s[i]=0C THEN EXIT;
370      ELSE Write(f,s[i]);
371      END;
372      INC(i);
373    END;
374  END WriteString;
375
376
377  (* WriteText   Write text to list file
378  *)
379  PROCEDURE WriteText(f:File; t:ARRAY OF CHAR; l:INTEGER);
380    VAR i: INTEGER;
381  BEGIN
382    FOR i:=0 TO l-1 DO Write(f,t[i]); END;
383  END WriteText;
384
385
386  (* WriteWord   Write a word to File f without conversion
387  *)
388  PROCEDURE WriteWord(f:File; w:CARDINAL);
389  BEGIN
390    Write(f,CHR(w DIV 256));
391    Write(f,CHR(w MOD 256));
392  END WriteWord;
393
394  BEGIN
395    con:=NIL; ftype[0]:="TEXT"; ftype[1]:="";
396    dlgHook:=VAL(DialogHook,NIL);
397    filterHook:=VAL(FilterHook,NIL);
398    errCode:=0;
399  END FileIO.

ABS          335 336
ADR          9 40 93 117 136 270
Allocate     19 97
b            304 306 307 308
bb           100 135 141 142
bp           100 118 135 141 146 146 269 274 276 276
buffer       117 136 143 146 270 276
buffersize   137 269 271
C            40 59 65 82 93 94 205 230 246 369
ch           128 132 146 155 160 160 161 162 164 166 166 167 168 169 171 176 176 177 178 178 179 180 183 186 194 203 205 206 206 207 208 209 222 226 226 228 229 232 232 234 237 237 239 240 241 244 262 266
Close con Create creator d Deallocate DEL DialogHook dlgHook Done drive DupFNErr EF eof EOFErr EOL errCode
276 111 395 35 36 286 19 162 396 62 58 214 35 11 143 100 11 355 58 123 55 49 291 122 229 70 105 40 42 143 140 77 88 292 396 121 143 87 329 143 88 340 173 91 96 105 119 121 121 139 140 140 143 398

f            25 97 98 99 99 99 100 100 100 100 105 111 114 115 116 117 118 122 122 128 131 134 154 157 176 183 192 200 221 224 237 241 252 255 255 262 265 268 284 294 295 301 307 308 318 327 343 344 345 351 353 355 362 368 370 379 382 388 390 391
fdCreator    49
fdType       49
File         25 111 128 154 192 221 252 262 284 301 327 351 362 379 388
FileIO       7 399
FileRecord   97
FilterHook   397
filterHook   69 397
fn           25 59 60 79 79 79 80 82 82 82 84
fName        74 75
FS           12 41 43 44 95 119 121 139 273
ftype        65 66 395 395
GetCatInfo   14 47
good         72
h            60
HFS          14 47 51
HIGH         82 231 240 246 367
i            33 64 65 65 66 66 67 69 75 75 75 81 82 82 82 82 82 83 155 159 163 164 168 173 198 201 202 202 203 209 209 222 227 230 230 231 232 232 238 240 240 240 245 246 246 253 255 256 302 313 314 327 334 335 336 363 365 367 368 369 370 372 380 382 382

ioActCount ioBuffer ioDirID ioFDirIndex ioFlFndrInfo ioMisc ioNamePtr ioPermssn ioPosMode ioPosOffset ioRefNum
141 117 40 47 49 94 40 94 118 118 99 136 50 49 93 137 138 115 270 93 271 272 136
ioReqCount ioVersNum ioVRefNum j l length MemTypes ModStr name nr ODD Open OS OSType output par ParamBlockRec PasStr PBClose PBHCreate PBHDelete PBHOpen PBRead PBWrite Point ProcPtr pt QuickDraw Read ReadCard ReadInt ReadString ReadWord ref REG reply s SetCatInfo SETREG SFget SFGetFile SFput SFPutFile SFReply SFTypeList SHORT sign status status1 Str255 SYSTEM System t
118 40 40 33 33 338 301 10 18 35 284 314 25 11 11 26 28 117 139 11 18 12 12 12 12 12 13 17 10 30 17 128 255 154 192 200 252 99 9 31 29 197 301 14 9 15 15 15 15 15 16 9 195 36 37 10 9 19 287 137 93 93 66 74 341 313 99 40 291 106 36 61 37 118 141 28 60 121 41 43 95 139 119 30 62 60 132 187 216 221 257 115 62 60 200 315 51 70 69 62 62 31 32 141 202 41 41 29 292 271 66 75 341 99 291 87 39 118 263 37 84 44 273 69 60 149 247 136 70 62 202 316 202 42 43 295 66 286 343 291 100 41 118 270 112 70 62 160 270 72 62 203 362 212 44 46 331 253 289 344 293 100 43 119 270 129 69 171 74 69 209 367 213 46 341 255 292 345 116 44 121 271 263 176 75 75 221 368 332 47 345 256 292 345 47 129 271 183 76 82 231 369 335 48 379 302 294 345 51 136 272 226 83 232 370 336 51 382 312 295 379 92 136 273 234 84 240 344 263 315 295 382 95 137 237 88 240 344 273 316 295 112 137 241 93 246 316 329 115 138 255 99 246
termCH         186 244
Terminal       20 132 160 164 168 171 226 230 232 234 266 354
tlist          32 66 69
Toolbox        18
type           36 49
v              60
val            154 159 164 164 167 167 169 169 175 179 179 180 180 181 192 201 212 213 213
VAL            9 62 69 70 169 180 208 212 315 316 335 336 396 397
volRef         25 76 80 88 93 99 99
vRefNum        76
w              252 256 284 294 294 302 315 316 319 320 327 343 343 344 388 390 391
WORD           9 301
Write          164 168 230 232 262 266 279 294 295 307 308 318 343 344 345 355 370 382 390 391
WriteCard      284 296
WriteHex       301 322
WriteHexDigit  304 309 319 320
WriteInt       327 346
WriteLn        351 354 357 368
WriteString    362 374
WriteText      379 383
WriteWord      388 392
x              196 201 207 207 208 208 212 212 330 335 335 336 340 340 340 342
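WriteInt in the FileIO listing (lines 327 to 346) produces the decimal digits in reverse order with MOD and DIV, pads on the left up to the field width w, and handles negative values by taking ABS(i+1) and then incrementing, which keeps the intermediate value inside the INTEGER range even for the most negative number. The following is a minimal Python sketch of those steps, not part of the module. It assumes a 16-bit INTEGER (suggested by the five-character digit buffer), and the name format_int and the returned string, in place of calls to Write, are illustrative assumptions.

```python
def format_int(i: int, w: int) -> str:
    """Mimic the steps of FileIO.WriteInt: right-align i in a field of width w."""
    # Assumption: 16-bit INTEGER range, as the ARRAY[1..5] digit buffer suggests.
    assert -32768 <= i <= 32767
    if i < 0:
        sign = "-"
        # abs(i + 1) is at most 32767, so it still fits a 16-bit INTEGER
        # even for i = -32768; the increment happens on the unsigned value.
        x = abs(i + 1) + 1
    else:
        sign = " "
        x = i
    digits = []                       # least significant digit first
    while True:
        digits.append(chr(ord("0") + x % 10))
        x //= 10
        if x == 0:
            break
    l = len(digits)
    out = []
    while w > l + 1:                  # leading blanks, leaving one place for the sign
        out.append(" ")
        w -= 1
    if sign == "-" or w > l:          # a blank sign is written only if the field has room
        out.append(sign)
    out.extend(reversed(digits))      # emit the most significant digit first
    return "".join(out)
```

With a width of 5, the values 42 and -42 are both right-aligned in five columns; with a width of 0 no padding is written and only the minus sign, if any, precedes the digits.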
App. F  System.DEF

(* System    System dependent module (from MacMETH [86])
   ======    =======================
   The module System is the heart of the Modula-2 system on the Macintosh.
   It contains the loader and procedures to supply missing instructions
   of the processor (REAL and LONGINT arithmetic). There are also
   procedures for calling and terminating programs and handling the heap.
*)
DEFINITION MODULE System; (*H.Seiler, C.Vetterli, 22-Dec-85/26-Feb-86*)

FROM SYSTEM IMPORT ADDRESS;
...

TYPE Status = (normal, moduleNotFound, fileNotFound, illegalKey,
               readError, badSyntax, noMemory, alreadyLoaded,
               killed, tooManyPrograms, continue, noApplication);

PROCEDURE Allocate(VAR ptr:ADDRESS; size:LONGINT);
(* Tries to allocate a memory area of the given size on the heap. If the
   space is not available, ptr returns NIL otherwise ptr returns the
   address of the reserved area*)

PROCEDURE Deallocate(VAR ptr:ADDRESS);
(* Releases the memory area given by address ptr. ptr returns NIL*)

PROCEDURE Terminate(status:Status);
(* terminates the currently running process, status signals the
   cause of termination*)

...

END System.
Bibliography

Aho A.V., Johnson S.C. [1974] LR-parsing, Computing Surveys 6, 2, 99-124
Aho A.V., Ullman J.D. [1972] The Theory of Parsing, Translation, and Compiling, Prentice Hall
Aho A.V., Ullman J.D. [1977] Principles of Compiler Design, Addison-Wesley
Bauer F.L., Eickel J. (eds) [1976] Compiler Construction. An Advanced Course, Springer-Verlag
Blaschek G., Pomberger G., Ritzinger F. [1985] Einführung in die Programmierung mit Modula-2, Springer-Verlag, to appear in English 1989
Engelfriet J., File G. [1981] Passes, Sweeps, and Visits, in: Lecture Notes in Computer Science 115, Springer-Verlag, 193-207
Feldman J.A., Gries D. [1968] Translator writing systems, CACM 9, 1, 77-113
Fischer C.N., LeBlanc R.J. [1988] Crafting a Compiler, The Benjamin/Cummings Publishing Company
Ganzinger H., Giegerich R. [1984] Attribute coupled grammars, SIGPLAN Notices 19, 6, 157-170
Gries D. [1971] Compiler Construction for Digital Computers, Wiley
Hartmann A.C. [1977] A Concurrent Pascal Compiler for Minicomputers, Springer-Verlag
Henderson P., Snowdon R. [1972] An experiment in structured programming, Bit 2, 38-53
Hopcroft J.E., Ullman J.D. [1979] Introduction to Automata Theory, Languages, and Computation, Addison-Wesley
Hughes J.W. [1979] A formalization and explication of the Michael Jackson method of program design, SOFTWARE - Practice and Experience 9, 191-202
Inside Macintosh [1985] volumes I-III, Addison-Wesley
Jackson M.A. [1975] Principles of Program Design, Academic Press
Johnson S.C. [1975] YACC - Yet Another Compiler-Compiler, Tech.Rep.Nr.32, Bell Laboratories, July 1975
Kastens U., Hutt B., Zimmermann E. [1982] GAG: A Practical Compiler-Generator, in: Lecture Notes in Computer Science 141, Springer-Verlag
Knuth D.E. [1965] On the translation of languages from left to right, Information and Control 8, 6, 607-639
Knuth D.E. [1968] Semantics of context-free languages, Mathematical Systems Theory 2, 127-145
Koskimies K. [1984] A specification language for one-pass semantic analysis, SIGPLAN Notices 19, 6, 179-189
Koskimies K., Räihä K.-J., Sarjakoski M. [1982] Compiler construction using attribute grammars, Proc. SIGPLAN 82 Symposium on Compiler Construction, June 1982, 153-159
Lewis P.M., Rosenkrantz D.J., Stearns R.E. [1976] Compiler Design Theory, Addison-Wesley
Lewis P.M., Stearns R.E. [1968] Syntax directed transduction, Journal ACM 15, 3, 464-488
Meijer H., Nijholt A. [1982] YABBER - yet another bibliography: translator writing tools, SIGPLAN Notices 17, 10
Mössenböck H. [1986] Alex - a simple and efficient scanner-generator, SIGPLAN Notices 21, 5
Pomberger G. [1986] Software Engineering and Modula-2, Prentice Hall
Räihä K.-J. [1977] On Attribute Grammars and their Use in a Compiler Writing System, Report A-1977-4, Department of Computer Science, University of Helsinki
Räihä K.-J. [1980] Bibliography on attribute grammars, SIGPLAN Notices 15, 3
Räihä K.-J., et al. [1983] Revised Report on the Compiler Writing System HLP78, Report A-1983-1, Department of Computer Science, University of Helsinki
Rosen S. (ed.) [1967] Programming Systems and Languages, McGraw-Hill, New York
Rosenkrantz D.J., Stearns R.E. [1970] Properties of deterministic top-down grammars, Information and Control 17, 3, 226-256
Spenke M., Mühlenbein H., Mevenkamp M., et al. [1984] A language independent error recovery method for LL(1) parsers, SOFTWARE - Practice and Experience 14, 11
Tienari M. [1980] On the Definition of an Attribute Grammar, in: Lecture Notes in Computer Science 94 (eds Goos, G. and Hartmanis, J.), Springer-Verlag
Waite W.M., Goos G. [1984] Compiler Construction, Springer-Verlag
Watt D.E., Lehrmann Madsen O. [1983] Extended attribute grammars, The Computer Journal 26, 2, 142-153
Wirth N. [1982] Programming in Modula-2, Springer-Verlag
Wirth N. [1986] Compilerbau, B.G. Teubner Stuttgart
Wirth N., Gutknecht J., Heiz W., et al. [1986] MacMETH - A Fast Modula-2 Language System For the Apple Macintosh, User Manual, ETH Zürich
Index

actual attributes, 113, 165
address list for G-code generation, 157
Adele, 11, 125, 203
Aho, 13, 41
Alex, 119
Algol60, 52
algorithmic interpretation of grammars, 83
alias name, 109, 123
aliasspix, 128
alphabet, 14
  extension, 51
alternative chain, 48, 108
alternatives, 15
  of deletable nonterminals, 137
  of eps-nodes, 137
ambiguity, 108
analysis phase, 4
analyzing grammar, 23
and, 208
any, 45, 107, 122, 124, 178
any-set, 140, 147, 155
anyset, 54
applications of attributed grammars, 171
arithmetic expressions, 19
arithmetization of symbols, 6
arrows, 112
assessment of some compiler generators, 102
at, 122, 165
Atari, 101, 126
attribute, 71, 72, 113
  assignment, 131, 165
  context, 167
  coupling, 98
  direction, 164
  evaluation, 79
  list, 129, 164, 226
  numbers, 155
  passing, 87
  processing, 164
  saving, 90
attributed grammar, 73, 79, 105
  applications, 171
  of Coco, 228
attributes
  consistency check, 165
  of terminals, 122
Attrkind, 166
back end, 6
Bauer, 7
BITSET, 208
Blaschek, 207
BNF, 102
bottom-up syntax analysis, 24
brackets, 136
caller interface, 121
CAP, 209
CARDINAL, 208
central-recursive grammar, 19
characteristics of Coco, 117
CheckAlternatives, 153
circular, 108
  derivation, 21
  grammar, 21
circularity, 150
CloseFile, 223
Coco, 4, 104, 222, 241
  characteristics, 117
  history, 197
  short description, 100
COCO.ATG, 228
cocogen, 224, 245
cocogen2, 225, 254
cocogra, 224, 266
Cocol, 4, 105
  example, 101, 134, 163, 167, 174, 186, 190, 192
  syntax, 212
cocolex, 223, 275
cocolst, 226, 283
cocosem, 223, 287
cocosemframe, 161, 297
cocosym, 224, 299
cocosyn, 223, 316
cocosynframe, 159, 328
cocotst, 225, 338
col, 122
CollectFirst, 143
CollectFollow, 144
comments, 106, 110
compiler, 2
compiler compiler, 3, 91
compiler description language, 3, 105
compiler error numbers, 241
compiler structure
  dynamic, 8
  static, 4
complement symbol any, 45, 107
Complete, 145
CompleteAt, 129, 223
completeness, 108, 149
components of a generated compiler, 119
compound characters, 6
ConcatLeft, 133, 223
ConcatRight, 132, 223
context condition, 76, 87, 115
context-free grammar, 15, 106
Copy, 162, 163, 223
CopyFramePart, 160, 161
correct grammar, properties, 108
cross-reference list, 214
cyclic semantic dependencies, 82
dangling else, 29, 108, 147
debug switches, 241
DEC, 209
declaration of
  semantic objects, 115
  symbols, 109
definition module, 210
DelEps, 139
deletability, 31
  direct, 128, 134
  indirect, 141
Deletable, 60, 141
deletable nonterminal, 31, 141
Delete redundant eps-nodes, 127, 138
DelGraph, 141
derivable symbol, 21
derivation, 16
  rules, 15
derived attributes, 74
deterministic grammar, 24
direct deletability, 128, 134
documentation, 187
dynamic compiler structure, 8
EBNF, 19, 20, 107, 117
Emit, 157
EmitAction, 166, 167, 223
empty string, 14, 107
end-of-file symbol, 109
end-of-line symbol, 110
endsem, 70
Engelfriet, 98
eps, 107
  followers, 54
eps-nodes
  insertion, 136
  removal, 138
  terminal successors of, 140
eps-set, 140, 145, 155
  example, 196
epsset, 54
equivalent top-down graphs, 45
errdist, 68
Error, 60, 65, 68
error distance, 68
error handling, 62, 64
error message module, 119, 226, 348
error messages, 65, 123
Errorptr, 123
Errors, 123, 226, 348
example of
  Cocol, 101, 163, 167, 174, 186, 190, 192
  generated compiler parts, 192
EXCL, 209
exit statement, 209
experiences, 197, 201
export list, 209
extended Backus-Naur form, 19
factorization of
  nonterminals, 49
  top-down graphs, 43
File, 98
FileIO, 226, 356
Fill, 67
FillSucc, 67
filter procedure, 120
Find circular rules, 148, 150
Find deletable symbols, 127, 141
FindEps, 146
FindEpsFollowers, 146
first(X), 26, 54
Fischer, 13
follow(X), 28, 143
formal attributes, 113, 165
frame module, 118, 159, 161, 297, 328
free monoid, 14
free semi-group, 14
front end, 6
G-code, 53, 55, 88, 117, 155, 213
  example, 195
  generation, 156
  parser, 58
GAG, 91, 96, 102, 104
Ganzinger, 91, 98
GenAssign, 166, 167, 223
GenCode, 156, 157
Generate G-code, 157
generated compiler parts, 118
  example, 192
generated compiler, operation, 120
generated semantic actions, 165
generation of the
  semantic evaluator, 245
  syntax analyzer, 254
generative grammar, 23
Get eps-sets, 145
Get symbol sets, 127
Get terminal start symbols, 142
Get terminal successors, 144
GetAdr, 157
GetAt, 129, 165, 167, 223
GetFirstSet, 142
GetMacroNr, 163, 223
GetNode, 131, 140, 148, 157, 223
GetSingles, 151
GetSy, 122, 124, 129, 140, 148, 223
Giegerich, 91
Goos, 13, 82, 83
GRAMMAR, 106
grammar, 15
grammar of Cocol, 212
grammar name, 106, 110, 121
grammar rules, 107
grammar tests, 126, 147, 225, 338
grammars in matrix form, 34
grammatical language levels, 22
GraphList, 223
Graphnode, 47, 130
Gries, 7, 13, 85
HALT, 209
handle, 18
Hartmann, 85
Henderson, 184
HIGH, 209
hints for reading the source lists, 226
HLP84, 91, 94, 104
Hopcroft, 21
Hughes, 188
Hutt, 96
IBM-PC, 101, 126
identifiers, 106
implementation description, 125
implementation module, 210
implementation restrictions, 241
import, 115, 122
  list, 209
INC, 209
INCL, 209
indirect deletability, 141
individual characters, 6
inherited attributes, 74, 75
inner module, 211
input attribute, 113
input of Coco, 118
input interface, 122
Insert eps-nodes before deletable nt's, 127, 138
interfaces of the generated compiler, 121
intermediate language, 120, 124
intermodular cross-reference list, 214
invocation of Coco, 118
IsTerm, 152
Jackson, 187
Johnson, 13, 91, 92
Kastens, 91, 96
keywords, 6, 105
Knuth, 13, 29, 82
Koskimies, 91, 94, 102
L-attributed grammar, 4, 82, 83, 92, 117
LALR(1) parser, 92, 94, 96
language, 16
  levels, 22
LeBlanc, 13
left-canonical derivation, 17
left-recursive grammar, 19
Lewis, 82
lexical
  analysis, 5, 6
  analyzer, 119, 122, 129, 165, 275
  analyzer described by Cocol, 171
  analyzer, specification, 172
  language level, 22
Lilith, 101, 126, 198
line, 122
line numbers, 122, 131
linking
  alternative graphs, 133
  component graphs, 132
listings, 220
literals, 6
LL(1) test, 148, 153
LL(1) analysis
  nonrecursive, 38
  recursive, 35
LL(1) conditions, 27, 28
  for top-down-graphs, 47, 49
LL(1) conflicts, 108
  in lexical structures, 179
LL(1) grammar, 23, 26, 201
LL(k) condition, 40
LL(k) grammar, 25, 40
LL(k) test, 41
lookahead, 25
Macintosh, 101, 119, 126
macro, 112, 116, 163
main algorithm of Coco, 127
main program, 119, 121, 210, 222, 241
MarkReachedNts, 150
matching of symbols, 48
matrix form of grammars, 34
measurements, 197
Meijer, 91
memory requirements of
  Coco, 199
  the generated compilers, 200
MemTypes, 226
mini-scanner, 174
Modula-2, 111, 115, 119, 126, 207
modules, 209
  description, 222
  hierarchy, 221
  overview, 220
Mössenböck, 119
MUG, 91, 98, 104
multi-pass compiler, 8, 9, 120, 124
name list, 129, 155
names, 6
NewAdr, 157
NewAt, 129, 164, 167, 223
NewMacro, 223
NewNode, 131, 223
NewSy, 129, 223
Nijholt, 91
nococosy, 162
nodes of the top-down graph, 130
non-circular grammar, 21
nonterminal, 14, 15, 110, 128
  deletable, 141
nonterminals
  factorization of, 49
  replacement of, 15
  substitution of, 49
  terminal successors of, 140, 143
  termination of, 108, 152
numbering of terminals, 109, 122
numbers, 106
OpenFile, 223
OpenSem, 163, 223
optimization of attribute processing, 167
option symbol, 20
OR, 208
ordered attributed grammar, 96
OS, 226
output attribute, 113
output of Coco, 118
output interface, 122
parameter arrows, 112
Parse, 58, 60, 86, 121, 127
ParseNonRecursive, 38
parser, 223, 316
  generation, 159
  interface, 121
  tables, 118, 155
  tables, example, 195
  tables, generation, 154
ParseRecursive, 35
parts of the generated compiler, 119
Pascal, 207
pass, 8
phrase, 17, 18
PL/1, 50
PLM/80, 50
Pomberger, 207
pragma, 109, 124
  semantics, 113, 128, 155
printinput, 121
printnodes, 121
procedures, 115
productions, 15, 107
program frames, 118
program listings, 220
QuickDraw, 226
Räihä, 91, 94
reachability, 149
recursive
  grammar, 19
  productions, 19
reduced grammar, 20, 21
redundancy, 108
redundant
  eps-node, 138
  symbol, 21
repetition symbol, 20
replacement of nonterminals, 15
RepNode, 131, 140, 223
RepSy, 129, 140, 223
RestartHash, 162, 223
restrictions, 241
results of a Coco run, 192
right end of graphs, 131
right-recursive grammar, 19
Ritzinger, 207
root, 15
  symbol, 106, 110, 149
Rosenkrantz, 40, 42
RULES, 107
run-time of
  Coco, 199
  the generated compilers, 201
scanner, 129, 165, 223, 275
scanner generator, 119, 171
scanner interface, 122
scanner procedure, 122
scanner specification, 172
scope of semantic objects, 116
sem, 70, 111
Semant, 85, 86
semantic
  action numbers, 131
  actions, 70, 111
  actions, generated, 165
  actions, processing, 163
  analysis, 5, 8
  declarations, copying, 162
  description, 110
  error action, 115
  evaluator, 118, 119, 223
  evaluator of Coco, 287
  evaluator, example, 194
  evaluator, generation, 160
  frame module, 297
  interface, 85
  macro, 111, 112, 116, 163
  modules, 119, 122
  procedures for lexical analysis, 180
semantics, 69
sentence, 16
  symbol, 15
sentential form, 16
simple phrase, 18
single-pass compiler, 8, 9
Snowdon, 184
software engineering, 182
source code, 220
  hints, 226
source list, 118
  generator, 283
source program, 2
spelling index, 129
spix, 128, 129, 162, 166
stacking of semantic objects, 116
start symbol, 110, 149
StartCopy, 223
static compiler structure, 4
Stearns, 40, 42
stepwise refinement, 11
StopHash, 162, 223
strings, 6, 14, 106
substitution of nonterminals, 49
symbol list, 126, 127, 224, 226, 299
symbol names, 129
symbol sets, collection, 140
Symbolnode, 127
symbols, 6, 14
Symboltype, 127
SyNr, 129, 223
syntactic extension, 51
syntactical language level, 22
syntax
  analysis, 5, 34
  analyzer, 118, 119, 223, 316
  analyzer, generation, 159
  of Cocol, 212
  description, 106
  error indicator, 121
  error interface, 123
  error message, 109
  error-recovery, 118
  notation, 107
  rules, 15, 107
  tree, 7, 14, 17, 91
SyntaxError, 123
synthesis phase, 5
synthesized attributes, 74
SYSTEM, 211
System, 226
system specific procedures, 369
target program, 2
tasks of Coco, 126
telegram problem, 184
terminal, 14, 15, 109, 122, 128
  class, 23
  start symbols, 26, 31, 32, 140, 142
  start symbols of length k, 40
  successors, 28, 31, 33
  successors of eps-nodes, 140, 145
  successors of nonterminals, 140, 143
terminating symbol, 21
termination, 21
  of nonterminals, 108, 152
Test completeness, 148, 149
Test grammar, 127, 148
Test if all nt's can be derived to t's, 148, 152
Test if all nt's can be reached, 148, 149
token code, 109, 122
Toolbox, 226
top-down
  graph, 42, 126, 130, 226, 266
  graphs, equivalent, 45
  graphs, factorization of, 43
  syntax analysis, 23, 24
top-down-graphs, LL(1) conditions for, 47, 49
trace switches, 241
tracing the parser, 121
Triple, 66
two level-grammar, 77
typ, 122
type transfer functions, 209
Ullman, 13, 21, 41
understanding the source code, hints, 226
useless symbol, 21
user modules, 122
using Coco, 117
Vach, 98
van Wijngaarden, 77
variables, 115
versions of Coco, 4
Visited, 157
vocabulary, 14
Waite, 13, 82, 83
Watt, 77
where, 77
Wirth, 20, 85, 107, 198, 207
word, 208
YACC, 91, 92, 98, 104
Zimmermann, 96