Author: Manes E.G.   Arbib M.A.  

Tags: computer science  

ISBN: 978-1-4612-9377-4

Year: 1986

Text
                    Texts and Monographs in Computer Science

Editor

David Gries
Advisory Board
F. L. Bauer
J. J. Horning
R. Reddy
D. C. Tsichritzis
W. M. Waite


The AKM Series in Theoretical Computer Science
A Subseries of Texts and Monographs in Computer Science

A Basis for Theoretical Computer Science, by M. A. Arbib, A. J. Kfoury, and R. N. Moll
A Programming Approach to Computability, by A. J. Kfoury, R. N. Moll, and M. A. Arbib
An Introduction to Formal Language Theory, by R. N. Moll, M. A. Arbib, and A. J. Kfoury
Algebraic Approaches to Program Semantics, by E. G. Manes and M. A. Arbib
Algebraic Approaches to Program Semantics

Ernest G. Manes
Michael A. Arbib

Springer-Verlag
New York Berlin Heidelberg London Paris Tokyo
Ernest G. Manes
Department of Mathematics and Statistics
University of Massachusetts
Amherst, Massachusetts 01003
U.S.A.

Michael A. Arbib
Departments of Computer Science, Neurobiology and Physiology
University of Southern California
Los Angeles, California 90089
U.S.A.

Series Editor
David Gries
Department of Computer Science
Cornell University
Upson Hall
Ithaca, New York 14853
U.S.A.

Library of Congress Cataloging in Publication Data
Manes, Ernest G., 1943-
Algebraic approaches to program semantics.
(Texts and monographs in computer science)
Includes index.
1. Programming languages (Electronic computers)--Semantics. 2. Algebra. I. Arbib, Michael A. II. Title. III. Series.
QA76.7.M34 1986 005.13'1 86-6563

© 1986 by Springer-Verlag New York Inc.
All rights reserved. No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag, 175 Fifth Avenue, New York, New York 10010, U.S.A.
The use of general descriptive names, trade names, trademarks, etc. in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Typeset by Asco Trade Typesetting Ltd., Hong Kong.

9 8 7 6 5 4 3 2 1

ISBN-13: 978-1-4612-9377-4
e-ISBN-13: 978-1-4612-4962-7
DOI: 10.1007/978-1-4612-4962-7
To Bernadette and Prue
Preface

In the 1930s, mathematical logicians studied the notion of "effective computability" using such notions as recursive functions, $\lambda$-calculus, and Turing machines. The 1940s saw the construction of the first electronic computers, and the next 20 years saw the evolution of higher-level programming languages in which programs could be written in a convenient fashion independent (thanks to compilers and interpreters) of the architecture of any specific machine. The development of such languages led in turn to the general analysis of questions of syntax, structuring strings of symbols which could count as legal programs, and semantics, determining the "meaning" of a program, for example, as the function it computes in transforming input data to output results.

An important approach to semantics, pioneered by Floyd, Hoare, and Wirth, is called assertion semantics: given a specification of which assertions (preconditions) on input data should guarantee that the results satisfy desired assertions (postconditions) on output data, one seeks a logical proof that the program satisfies its specification. An alternative approach, pioneered by Scott and Strachey, is called denotational semantics: it offers algebraic techniques for characterizing the denotation of (i.e., the function computed by) a program; the properties of the program can then be checked by direct comparison of the denotation with the specification.

This book is an introduction to denotational semantics. More specifically, we introduce the reader to two approaches to denotational semantics: the order semantics of Scott and Strachey and our own partially additive semantics. Moreover, we show how each approach may be applied both to the specification of the semantics of programs, including recursive programs, and to the specification of new data types from old. There has been a growing acceptance that category theory, a branch of abstract algebra, provides a perspicuous general setting for all these topics, and for many other algebraic approaches to program semantics as well. Thus, an important aim of this book is to interweave the study of semantics with a completely self-contained introduction to a useful core of category theory, fully motivated by basic concepts of computer science.

Computer science seeks to provide a scientific basis for the study of information processing, algorithms, and the design and programming of computers. The past four decades have witnessed major advances in programming methodology, which allow immense programs to be designed with increasing speed and reduced error, and in the development of mathematical techniques to allow the rigorous specification of program, process, and machine. The present volume is one of a series, the AKM Series in Theoretical Computer Science, designed to make key mathematical developments in computer science readily accessible to undergraduate and beginning graduate students. The book is essentially self-contained: what little background is required may be found in the AKM volume A Basis for Theoretical Computer Science. However, this book is more algebraic than other books in the AKM Series, and as such may prove somewhat heavier going, at least for American students, since the American curriculum in theoretical computer science, as distinct from the European curriculum, stresses combinatorial methods over algebraic methods.

The book is organized in three parts:

Part 1 presents the denotational semantics of control, that is, the way in which the denotation of a program can be obtained from the denotation of the pieces from which it is composed. The approach is motivated by analysis of a fragment of Pascal, a functional programming fragment, and a consideration of nondeterministic semantics. Basic notions of category theory include those of product and coproduct. Chapter 3 presents the elements of partially additive semantics, including a denotational semantics of iteration and a new theory of guards ("test functions") which provides a bridge between denotational semantics and the assertion semantics presented in Chapter 4.

Part 2 extends the theory of Part 1 by showing how the Kleene sequence yields a denotation for the computation given by a recursive program. Chapter 6 then introduces domains as the setting for the order semantics of recursion, while Chapter 8 provides the partially additive semantics of recursion. Chapter 7, on canonical fixed points, provides a unified setting for both approaches, as well as for the study of fixed points in metric spaces in Chapter 9.

Part 3 extends the theory to data types. The crucial tools are provided by the following notions from category theory, which are introduced in Chapters 10 and 11: functors, fixed points of functors, and co-continuous and continuous functors. We motivate these with a discussion of how a generalized Kleene sequence can provide the denotation of a recursive specification of a data type. In Chapter 12, we consider parametric specification of data types, analyzing arrays, stacks, queues, and our functional programming fragment in the process. We devote Chapter 13 to the order semantics of data types. Finally, Chapter 14 gives a brief introduction to describing data types using operations and equations, and extends the earlier theory of functorial fixed points to include these ideas.

As a result, the reader is not limited to any one algebraic approach to program semantics, but rather is given the tools to tailor the formal semantics to the need of different applications.

The book grew out of our research in partially additive semantics, which was in turn based on our general investigation of "category theory applied to computation and control." We thank the National Science Foundation for its support of this research. This volume represents an attempt to place the material in the perspective of other approaches to denotational semantics, and to render the common algebraic tools as accessible as possible. We thank our many colleagues in both America and Europe for all they taught us in the course of this research, and for their comments on an earlier draft of the book. It is with regret that we note that limitations of space make it impossible to address all the topics raised in this correspondence within the compass of an introductory text. Finally, we thank Gwyn Mitchell and Kathy Adamczyk for their typing of the draft of this manuscript; and Ms. Adamczyk for helping with research on the notes for Chapter 5.

Amherst, Massachusetts
ERNEST G. MANES
MICHAEL A. ARBIB
Contents

Part 1 Denotational Semantics of Control

CHAPTER 1 An Introduction to Denotational Semantics
1.1 Syntax and Semantics
1.2 A Simple Fragment of Pascal
1.3 A Functional Programming Fragment
1.4 Multifunctions
1.5 A Preview of Partially Additive Semantics

CHAPTER 2 An Introduction to Category Theory
2.1 The Definition of a Category
2.2 Isomorphism, Duality, and Zero Objects
2.3 Products and Coproducts

CHAPTER 3 Partially Additive Semantics
3.1 Partial Addition
3.2 Partially Additive Categories and Iteration
3.3 The Boolean Algebra of Guards

CHAPTER 4 Assertion Semantics
4.1 Assertions and Preconditions
4.2 Partial Correctness
4.3 Total Correctness

Part 2 Semantics of Recursion

CHAPTER 5 Recursive Specifications
5.1 The Kleene Sequence
5.2 The Pattern-of-Calls Expansion
5.3 Iteration Recursively

CHAPTER 6 Order Semantics of Recursion
6.1 Domains
6.2 Fixed Point Theorems
6.3 Recursive Specification in FPF
6.4 Fixed Points and Formal Languages

CHAPTER 7 Canonical Fixed Points

CHAPTER 8 Partially Additive Semantics of Recursion
8.1 PAR Schemes
8.2 The Canonical Fixed Point for PAR Schemes
8.3 Additive Domains
8.4 Proving Correctness
8.5 Power Series and Products

CHAPTER 9 Fixed Points in Metric Spaces
9.1 Contractions on Complete Metric Spaces
9.2 Differential Equations
9.3 Metrics on Trees
9.4 Context-Free Languages as Metric Fixed Points

Part 3 Data Types

CHAPTER 10 Functors
10.1 Data Types Lead to Functors
10.2 Fixed Points of Functors

CHAPTER 11 Recursive Specification of Data Types
11.1 From Least Upper Bounds to Least Fixed Points
11.2 Co-continuous Functors
11.3 Continuous Functors and Greatest Fixed Points

CHAPTER 12 Parametric Specification
12.1 Arrays
12.2 Stacks and Queues
12.3 A Functional Programming Fragment Revisited

CHAPTER 13 Order Semantics of Data Types
13.1 Introduction
13.2 Constructions with Domains
13.3 Cartesian-Closed Categories
13.4 Solving Function Space Equations

CHAPTER 14 Equational Specification
14.1 Initial Algebras
14.2 Sur-reflections

Epilogue
Author Index
Subject Index
PART 1 DENOTATIONAL SEMANTICS OF CONTROL
CHAPTER 1
An Introduction to Denotational Semantics

1.1 Syntax and Semantics
1.2 A Simple Fragment of Pascal
1.3 A Functional Programming Fragment
1.4 Multifunctions
1.5 A Preview of Partially Additive Semantics

1.1 Syntax and Semantics

To specify a programming language we must specify its syntax and semantics. The syntax of a programming language specifies which strings of symbols constitute valid programs. A formal description of the syntax typically involves a precise specification of the alphabet of allowable symbols and a finite set of rules delineating how symbols may be grouped into expressions, instructions, and programs. Most compilers for programming languages are implemented with syntax checking whereby the first stage in compiling a program is to check its text to see if it is syntactically valid. In practice, syntax must be described at two levels, for a human user through programming manuals and as a syntax-checking algorithm within a compiler or interpreter.

"Semantics" is a technical word for "meaning." A semantics for a programming language explains what programs in that language mean. In more mathematical terms, semantics is a function whose input is a syntactically valid program and whose output is a description of the function computed by the program. There are different approaches to semantics. We briefly introduce three: operational semantics, denotational semantics, and assertion semantics. We will give an example of an operational semantics in the next section. Assertion semantics will be further considered in Chapter 4. Denotational semantics is a major concern of this book.

Operational semantics is the most intuitive for beginners with some programming experience, being the form of semantics described in most programming manuals. To provide an operational semantics for a programming language, one invents an "abstract computer" and describes how programs "run" on this computer. Usually, the semantics prescribes how the syntactic form of a program is to be interpreted as a (data-dependent) sequence of instructions. Input data are then transformed as the program is run in sequence, instruction by instruction, branching and looping back on the basis of tests on current values of data.

By contrast to operational semantics, which traces all intermediate states in a computation, denotational semantics focuses on input/output behavior and ignores the intermediate states. Operational semantics provides more information on how to implement a programming language as long as the implementation environment resembles that of the abstract computer. For example, an operational semantics in which every computation is described as a serial sequence of state changes would be somewhat at odds with an implementation on a pipeline architecture which maximizes parallel computation. An objective of denotational semantics is to avoid worry about details of implementation.

A challenge posed by denotational semantics is to invent mathematical frameworks permitting the description of repetitive programming constructs (i.e., "loops") without explicit reference to intermediate states. The "partially additive semantics" of Section 1.5 introduces a power-series representation for computed functions which, in part, expresses programming constructs in terms of operations that manipulate power series. Other approaches to denotational semantics, to be discussed in Part 2, use partially ordered sets and metric spaces for their mathematical underpinnings.

Before discussing assertion semantics we must first introduce assertions. An assertion is a statement about the program state which is either true or false. As an example, consider the (hopefully transparent) program 1.
1   INPUTS: X
    OUTPUTS: Y
    {X $\geq$ 0}
    BEGIN
      (a block of code representing an algorithm for Y := $\sqrt{X}$)
    END
    {X = Y*Y}.

The assertions are shown enclosed by braces, { and }. They are not part of the program, but assert what properties should hold true when the assertion is encountered in executing the program. A program is correct if indeed the satisfaction of all initial assertions about the input data guarantees the truth of all assertions encountered later on.

One could attempt to design a programming language with assertions in mind. All built-in functions would come with associated assertions and for each programming construct there would be rules explaining how to find suitable assertions for the overall construct from the pieces of the construct and their assertions. Ideally, every program would automatically be strewn with assertions with the following beneficial effects. The assertions would usefully document the program, and it would be possible to write software that could automatically scan the assertions to detect bugs and check for correctness.

In the next section we introduce a small fragment of Pascal giving a formal syntax and an operational semantics. In Section 1.3, however, we introduce a functional programming fragment that makes no use of identifiers or assignment statements. Here, the concept of "state" (which in Section 1.2 means the values stored by the identifiers) would require major overhaul before one could give an operational semantics or an assertion semantics. It is hard to create general semantic theories devoid of built-in assumptions about the programming languages to which they apply!

1.2 A Simple Fragment of Pascal

In this section we describe an abbreviated version of Pascal. Although this limited version has full computing power with regard to functions whose inputs and outputs are natural numbers, this is a tangential point; the main objective of this section is to illustrate how to present a formal syntax as well as an operational semantics for a simple programming language. The reader should observe that the level of precision of the operational semantics is such that it becomes fairly clear how to write a compiler or interpreter for the Pascal fragment, so that we accomplish more than an exercise in formalizing what we already knew.

The complete syntax of our Pascal fragment is given in Table 1. Here, the colons, commas, and periods are not among the 64 symbols in the alphabet.
Parentheses are used liberally to ensure that there is exactly one way to derive an expression, test, or statement using the building rules and beginning with those which are given outright. We do not give a formal proof of this here, but encourage the reader to explore this (see Exercise 1). Three examples of expressions are ((a + 5)*2), 572, (cat + (dog + mouse)), whereas, according to our rules, a+5 is not an expression. An example of a statement is shown in 2.
Table 1 The Syntax of a Pascal Fragment

Alphabet of Symbols
  Digits: 0, 1, ..., 9
  Letters: a, b, ..., z
  Boolean Truth Values: T, F
  Parentheses: (, )
  Boolean Connectives: $\neg$, $\vee$, $\wedge$
  Comparisons: $=$, $\neq$, $<$, $\leq$, $>$, $\geq$
  Arithmetic Functions: $+$, $-$, $*$, $\div$
  Statement Constructors: :=, ;, begin, end, if, then, else, while, do, repeat, until

The set of expressions is defined by:
  Given Outright: Any nonempty string of digits (called a numeral), a letter followed by a (possibly empty) string of digits and letters (called an identifier).
  Building Rules: If $D$, $E$ are expressions so are $(D + E)$, $(D - E)$, $(D * E)$, $(D \div E)$.

The set of tests is defined by:
  Given Outright: T and F; $D = E$, $D \neq E$, $D < E$, $D \leq E$, $D > E$, $D \geq E$ for any two expressions $D$, $E$.
  Building Rules: If $B$, $C$ are tests so are $(\neg B)$, $(B \vee C)$, $(B \wedge C)$.

The set of statements is defined by:
  Given Outright: $I$ := $E$ if $I$ is an identifier and $E$ is an expression.
  Building Rules: If $S_1, \ldots, S_n$ are statements ($n \geq 0$) so is begin $S_1$; ...; $S_n$ end. If $B$ is a test and $R$, $S$ are statements, so are
    (if $B$ then $R$ else $S$)
    (while $B$ do $S$)
    (repeat $S$ until $B$).

2   begin a := 5; (while (a > 0 $\wedge$ a $\neq$ 6) do a := a - 1) end

Note that begin, while, do, and end are single symbols in the chosen alphabet and that there is no space symbol in the alphabet. Normally, one displays a statement so as to be more readable by humans, for example, as in 3:

3   begin
      a := 5;
      (while (a > 0 $\wedge$ a $\neq$ 6) do
        a := a - 1)
    end

This is harmless since we obtain 2 from 3 by ignoring the aspects (in this case the vertical arrangement and the spaces) which are not expressible in the formal syntax.

We assume that the reader already has a good idea of what the semantics of our fragment should be. (For example, the algorithm described by 2 always terminates with identifier a storing the value 0.) A formal operational semantics is as follows. We imagine an abstract computer with one memory location set aside for each identifier.
Each location stores a single value, where a value is either a natural number or the symbol $\bot$ meaning "as yet undefined." At any time, only finitely many locations store a number. The effect of executing a statement is to assign numerical values to identifiers by evaluating numerical expressions according to an algorithm controlled by tests and conditional and repetitive constructs. (Here we ignore overflow: our numerical operations, $+$, $-$, $*$, $\div$, for addition, subtraction, multiplication, and division compute exact integer values no matter how large.) The only thing that can "go wrong" is that we might attempt to evaluate an expression containing identifiers for which no numerical values have been assigned. When this happens we wish to abort the computation and so we create a special abort state $\omega$. Every other state is a normal state which we define to be a function $\sigma$ from the set of all identifiers to the set of all values, with the requirement that $\sigma(I) \neq \bot$ for only finitely many identifiers $I$. The initial state is the function $\tau$ which assigns $\bot$ to each identifier.

The operational semantics of a statement $S$ will be defined as a computation sequence of states beginning with the initial state $\tau$ and taking one of the forms 4a, 4b, or 4c:

4a  $\tau, \sigma_1, \ldots, \sigma_n, \omega$  ($n \geq 0$, all $\sigma_i \neq \omega$);
4b  $\tau, \sigma_1, \ldots, \sigma_n, \ldots$  (all $\sigma_i \neq \omega$);
4c  $\tau, \sigma_1, \ldots, \sigma_n$  ($n \geq 0$, all $\sigma_i \neq \omega$).

In 4a, computation aborts. In 4b, computation is nonterminating. In 4c, the computation terminates in a normal state $\sigma_n$.

We now turn to the details of how to associate a definite sequence of states to a statement. Here the description of Table 1 provides a guide. (We substitute the more mathematical terms "basis step" for "given outright" and "inductive step" for "building rules" from now on.) We must first assign appropriate values to expressions and tests (a process that depends on the state).

5 The value $[\sigma, E]$ of expression $E$ in normal state $\sigma$ is defined inductively as follows.

Basis Step: If $E$ is a numeral, $[\sigma, E]$ is the usual base-10 natural number value of $E$ (with leading zeros ignored). If $E$ is an identifier, $[\sigma, E] = \sigma(E)$.
Inductive Step: If either $[\sigma, D] = \bot$ or $[\sigma, E] = \bot$, then

$[\sigma, (D + E)] = [\sigma, (D - E)] = [\sigma, (D * E)] = [\sigma, (D \div E)] = \bot$.

Else

$[\sigma, (D + E)] = [\sigma, D] + [\sigma, E]$
$[\sigma, (D - E)] = [\sigma, D] \mathbin{\dot{-}} [\sigma, E]$
$[\sigma, (D * E)] = [\sigma, D]\,[\sigma, E]$
$[\sigma, (D \div E)] = [\sigma, D] \operatorname{div} [\sigma, E]$

where the operations on the right-hand sides are the expected natural-number arithmetic operations, so that $x \mathbin{\dot{-}} y$ means the maximum of $0$ and $x - y$, and $x \operatorname{div} y$ is the largest integer $\leq x/y$, that is, the unique integer $q$ with $x = qy + r$, where the remainder $r$ satisfies $0 \leq r < y$. (Here we have relied on the earlier-stated fact that there is only one way to decouple an expression; if there were more than one way the above rules might assign values to expressions ambiguously.)

To illustrate how 5 is used, suppose that $\sigma(a) = 3$. Then

$[\sigma, ((a + 5) * 2)] = [\sigma, (a + 5)]\,[\sigma, 2] = ([\sigma, a] + [\sigma, 5])\,[\sigma, 2] = (3 + 5)(2) = 16$.

Tests are evaluated in a similar way:

6 The truth value $[\sigma, B]$ of test $B$ in normal state $\sigma$ is defined inductively as follows.

Basis Step: $[\sigma, T] = T$, $[\sigma, F] = F$. $[\sigma, D = E]$ is $\bot$ if either of $[\sigma, D]$ or $[\sigma, E]$ is $\bot$, else is $T$ or $F$ accordingly as $[\sigma, D] = [\sigma, E]$, $[\sigma, D] \neq [\sigma, E]$. $[\sigma, D \neq E]$, $[\sigma, D < E]$, $[\sigma, D \leq E]$, $[\sigma, D > E]$, and $[\sigma, D \geq E]$ are defined similarly.

Inductive Step: Let $\neg$ (not), $\vee$ (or), $\wedge$ (and) have their usual meanings on the Boolean truth values $T$, $F$ ($T$ for "true," $F$ for "false") so that, for example, $\neg T = F$, $\neg F = T$, $F \wedge T = F$, and so on. Then
$[\sigma, (\neg B)]$ is $\bot$ if $[\sigma, B]$ is $\bot$, else is $\neg[\sigma, B]$.
$[\sigma, (B \vee C)]$ is $\bot$ if either of $[\sigma, B]$ or $[\sigma, C]$ is $\bot$, else is $[\sigma, B] \vee [\sigma, C]$.
$[\sigma, (B \wedge C)]$ is $\bot$ if either of $[\sigma, B]$ or $[\sigma, C]$ is $\bot$, else is $[\sigma, B] \wedge [\sigma, C]$.

As a prelude to defining the semantics of statements, we aid the reader's intuition with flowschemes for the programming constructs in Table 7.
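The evaluation rules 5 and 6 can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that are not in the text: expressions and tests are encoded as nested tuples, numerals are given as Python integers, a normal state is a dictionary from identifiers to natural numbers (a missing key stands for the undefined value), and the names `BOTTOM`, `eval_expr`, and `eval_test` are hypothetical. The text does not say what division by zero should yield; the sketch treats it as undefined.

```python
import operator

BOTTOM = None  # hypothetical stand-in for the "as yet undefined" value

# Comparison symbols of Table 1, written with ASCII keys (an encoding choice).
COMPARISONS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def eval_expr(state, expr):
    """Value of an expression in a normal state, in the spirit of rule 5."""
    if isinstance(expr, int):            # numeral, already converted to its value
        return expr
    if isinstance(expr, str):            # identifier: look it up in the state
        return state.get(expr, BOTTOM)
    op, left, right = expr               # built expression, e.g. ("+", D, E)
    x, y = eval_expr(state, left), eval_expr(state, right)
    if x is BOTTOM or y is BOTTOM:       # undefinedness propagates
        return BOTTOM
    if op == "+":
        return x + y
    if op == "-":
        return max(0, x - y)             # natural-number subtraction (monus)
    if op == "*":
        return x * y
    if op == "/":
        # Assumption of this sketch: division by zero is treated as undefined.
        return x // y if y != 0 else BOTTOM
    raise ValueError(f"unknown arithmetic symbol: {op!r}")

def eval_test(state, test):
    """Truth value of a test in a normal state, in the spirit of rule 6."""
    if isinstance(test, bool):           # the truth values T and F
        return test
    op = test[0]
    if op == "not":
        value = eval_test(state, test[1])
        return BOTTOM if value is BOTTOM else (not value)
    if op in ("or", "and"):
        # Both sides are always evaluated: rule 6 is strict, with no short-circuit.
        u, v = eval_test(state, test[1]), eval_test(state, test[2])
        if u is BOTTOM or v is BOTTOM:
            return BOTTOM
        return (u or v) if op == "or" else (u and v)
    x, y = eval_expr(state, test[1]), eval_expr(state, test[2])
    if x is BOTTOM or y is BOTTOM:
        return BOTTOM
    return COMPARISONS[op](x, y)

sigma = {"a": 3}
print(eval_expr(sigma, ("*", ("+", "a", 5), 2)))                 # 16
print(eval_test(sigma, ("and", (">", "a", 0), ("!=", "a", 6))))  # True
print(eval_expr(sigma, ("+", "b", 1)))                           # None (undefined)
```

The first call reproduces the worked example with $\sigma(a) = 3$. Note that the connectives are deliberately strict: a test such as `("or", True, ("=", "b", 1))` evaluates to the undefined value, as rule 6 requires, even though one side is already true.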
Table 7 Flowschemes for Programming Constructs
[Flowscheme diagrams not reproduced. The table shows one flowscheme each for: Assignment Statement, $I$ := $E$; Composition, begin $S_1$; ...; $S_n$ end; Conditional, (if $B$ then $R$ else $S$); Repetitive Constructs, (while $B$ do $S$) and (repeat $S$ until $B$), with branches labelled T and F.]

8 The principal semantic definition is: For any normal state $\sigma$ the computation sequence of $S$ starting at $\sigma$ is a state sequence $\langle \sigma, S \rangle$ of one of the three forms 8a, 8b, 8c,

8a  $\sigma, \sigma_1, \ldots, \sigma_n, \omega$  ($n \geq 0$, all $\sigma_i \neq \omega$);
8b  $\sigma, \sigma_1, \ldots, \sigma_n, \ldots$  (all $\sigma_i \neq \omega$);
8c  $\sigma, \sigma_1, \ldots, \sigma_n$  ($n \geq 0$, all $\sigma_i \neq \omega$)

(with interpretations similar to those of 4a, 4b, 4c) defined inductively as follows.

9 Basis Step

$\langle \sigma, I := E \rangle = \sigma, \omega$ if $[\sigma, E] = \bot$;
$\langle \sigma, I := E \rangle = \sigma, \sigma_1$ else, where $\sigma_1(J) = \sigma(J)$ if $J \neq I$ and $\sigma_1(J) = [\sigma, E]$ if $J = I$.

This is the expected meaning. Identifier $I$ is assigned the value obtained by evaluating $E$, as long as this is possible, and other identifiers are left unchanged.

Inductive Step

10 Composition. Define $\langle \sigma, \text{begin end} \rangle = \sigma$ and define $\langle \sigma, \text{begin } S_1 \text{ end} \rangle = \langle \sigma, S_1 \rangle$. Proceeding inductively on the number of statements, assume that $\langle \sigma, \text{begin } S_2; \ldots; S_{k+1} \text{ end} \rangle$ has been defined for every normal state $\sigma$ and every $k$ statements $S_2, \ldots, S_{k+1}$. Then $\langle \sigma, \text{begin } S_1; \ldots; S_{k+1} \text{ end} \rangle$ is defined as follows. It is defined to be $\langle \sigma, S_1 \rangle$ if "$S_1$ fails to terminate normally starting at $\sigma$," that is, if $\langle \sigma, S_1 \rangle$ has one of the forms 8a, 8b. Otherwise, $\langle \sigma, S_1 \rangle = \sigma, \sigma_1, \ldots, \sigma_n$ as in 8c so we define $\langle \sigma, \text{begin } S_1; \ldots; S_{k+1} \text{ end} \rangle$ to be the sequence

$\sigma, \sigma_1, \ldots, \sigma_{n-1}, \langle \sigma_n, \text{begin } S_2; \ldots; S_{k+1} \text{ end} \rangle$.

In short, we form the sequence obtained if each $S_{i+1}$ begins where the previous $S_i$ leaves off, save that this cannot continue if computation aborts or one of the $S_i$ did not terminate.

11 Conditional.

$\langle \sigma, (\text{if } B \text{ then } R \text{ else } S) \rangle = \sigma, \omega$ if $[\sigma, B] = \bot$;
$\langle \sigma, (\text{if } B \text{ then } R \text{ else } S) \rangle = \langle \sigma, R \rangle$ if $[\sigma, B] = T$;
$\langle \sigma, (\text{if } B \text{ then } R \text{ else } S) \rangle = \langle \sigma, S \rangle$ if $[\sigma, B] = F$.

Repetitive constructs. The computation sequence $\langle \sigma, (\text{while } B \text{ do } S) \rangle$ is given by 12.

12 $\langle \sigma, (\text{while } B \text{ do } S) \rangle$ is
$\sigma, \omega$ if $[\sigma, B] = \bot$;
$\sigma$ if $[\sigma, B] = F$;
$\langle \sigma, S \rangle$ if $[\sigma, B] = T$ and $\langle \sigma, S \rangle$ has one of the forms 8a, 8b;
$\sigma, \sigma_1, \ldots, \sigma_{n-1}, \langle \sigma_n, (\text{while } B \text{ do } S) \rangle$ if $[\sigma, B] = T$ and $\langle \sigma, S \rangle$ has the form $\sigma, \sigma_1, \ldots, \sigma_n$ of 8c.

This sequence may, of course, fail to terminate. We leave it to the reader to formulate a similar definition for $\langle \sigma, (\text{repeat } S \text{ until } B) \rangle$.

13 The computation sequence $\langle S \rangle$ of the statement $S$ is $\langle \tau, S \rangle$, where $\tau$ is the initial state mapping each identifier to $\bot$.

The operational semantics of our Pascal fragment is complete.

EXERCISES FOR SECTION 1.2

1. Give several examples of ambiguities that would arise if the operational semantics of the syntax of Table 1 were modified to delete some of the parentheses. Why were we able to get away without parentheses in the tests $D = E$, ..., $D \geq E$? Why were parentheses not needed to enclose begin $S_1$; ...; $S_n$ end?

2. Give an inductive definition of the set of numerals which excludes numerals with leading zeros.

3. Provide an algorithm for $\langle \sigma, (\text{repeat } S \text{ until } B) \rangle$ similar to that of 12.

4. Two statements $R$, $S$ are semantically equivalent if $\langle \sigma, R \rangle = \langle \sigma, S \rangle$ for all normal states $\sigma$. Let $B$ be any test and let $R$, $S$ be any statements. Show that any two of the following three statements are semantically equivalent:
   begin (while $B$ do $R$); $S$ end.
   (if $B$ then begin (while $B$ do $R$); $S$ end else $S$).
   begin (while $B$ do (if $B$ then $R$ else $S$)); $S$ end.

5. Let $m$ be a natural number $\geq 1$. Find a polynomial $p(n)$ with natural-number coefficients such that an $m$-symbol alphabet has $p(n)$ words of length $\leq n$. (Do not forget the empty word of length 0 which is an important input in practice: "carriage return.")

1.3 A Functional Programming Fragment

In his provocative 1977 Turing Award Lecture, John Backus expressed concern that many programming languages were syntactically fat and unwieldy but semantically lean and inexpressive.
In reaction, he proposed a new class of languages, the functional programming languages, in which a "program" is a symbolic input/output function whose inputs are not given names: there are no identifiers, assignments, or references of any kind to intermediate storage and hence there are no side effects (such as clashes between local and global variable identifiers) to concern the programmer. In this section we present a simple functional programming fragment whose principal data structures are trees similar to the "s-expressions" of the programming language LISP but many of whose function constructors are patterned after those emphasized by Backus. Because we delay introduction of repetitive constructs into this fragment until our later discussion of recursion in Chapter 5, the version of this section temporarily fails to have full computing power.

We shall call our language FPF for "Functional Programming Fragment." The syntax of FPF is given in Table 1. Here the colons and periods are not among the 32 symbols of the alphabet. The reader need not feel uneasy if Table 1 fails to explain how FPF works, since that is the job of semantics: syntax has no meaning! We will give a denotational semantics for FPF.

Table 1 The Syntax of FPF

Alphabet of Symbols
  Digits: 0 1 ... 9
  Parentheses: ( ) $\langle$ $\rangle$
  Atomic Functions: id head tail $+$ $-$ $*$ $\div$ num $=$
  Function Constructors: $\equiv$ $\circ$ if then else [ ] $\alpha$ /

The set DTN of dynamic trees of numerals (DTNs for short) is defined by:
  Basis Step: A numeral (i.e., a nonempty string of digits) is a DTN.
  Inductive Step: If $t_1 \ldots t_k$ are DTNs ($k \geq 0$) then $\langle t_1, \ldots, t_k \rangle$ is a DTN.

The set of functions is defined by:
  Basis Step: An atomic function symbol is a function; if $t$ is a DTN then $\equiv t$ is a function.
  Inductive Step: If $f_1 \ldots f_k$ are functions ($k \geq 1$) then so are $(f_k \circ \cdots \circ f_1)$ and $[f_1, \ldots, f_k]$. If $p$, $f$, $g$ are functions then so is (if $p$ then $f$ else $g$). If $f$ is a function then so are $(\alpha f)$ and $(/f)$.

We begin by discussing DTN, the set of DTNs (that is, dynamic trees of numerals), whose inductive definition is given in Table 1 (note the different typeface for the set and for the generic name of elements of the set). This set includes lists of numerals, namely, the DTNs of form $\langle n_1, \ldots, n_k \rangle$, where the $n_i$ are numerals. The case $k = 0$ gives the empty list $\langle\ \rangle$ as a DTN. Similarly, we can have a list of lists such as $\langle \langle 5, 17 \rangle, \langle\ \rangle, \langle 035 \rangle \rangle$, the list whose first entry is the list $\langle 5, 17 \rangle$, whose second entry is the empty list, and whose third entry is the length-1 list consisting of the numeral 035. Other examples are less homogeneous, for example, $\langle 05, \langle \langle\ \rangle \rangle, 0, 3 \rangle$. An $m \times n$ matrix of numerals $(a_{ij})$, usually visualized as a rectangular array with the numeral $a_{ij}$ in row $i$ and column $j$, may conveniently be coded as the DTN

2   $\langle \langle a_{11}, \ldots, a_{1n} \rangle, \ldots, \langle a_{m1}, \ldots, a_{mn} \rangle \rangle$.
The input to a matrix multiplication algorithm may then be coded as a length-2 list whose entries are matrices as in 2. These examples suggest the ease with which DTNs model complex inputs and outputs.

Each DTN has a unique derivation tree describing how to build it using the basis and inductive steps in the definition of DTN of Table 1. For example, $\langle 1, \langle \langle 0, 10 \rangle, \langle\ \rangle \rangle \rangle$ has derivation tree

[Figure: derivation tree of the DTN above, with leaves labelled 1, 0, and 10]

where each node (= dark circle) indicates a list whose entries are the subtrees branching from that node (read in left-to-right order). A node without branches thus indicates the empty list. The node at the root (= top) of the (upside down) tree indicates the list represented by the whole tree. It is clear that such derivation trees are in natural one-to-one correspondence with the elements of DTN and, indeed, that the list notation is just a convenient way to code such a tree as a string. This explains the term dynamic tree of numerals. "Dynamic" is in the same sense as in the term "dynamic array" in Pascal, meaning that the lengths and shapes of DTNs are not prespecified in a "declaration."

In our denotational semantics of FPF, the semantics of each syntactic function will be such as to transform inputs in DTN to outputs which are again in DTN. We pause briefly to note what kind of function constitutes such a transformation:

3 Definitions. Let $X$, $Y$ be sets. A partial function from $X$ to $Y$ is specified by providing a subset $A$ of $X$ and a function mapping each element of $A$ to a unique element of $Y$. We say $X$ is the domain, $Y$ is the codomain and $A$ is the domain of definition. (Other authors use "domain" for our "domain of definition." Our terminology follows the conventions of category theory as discussed in the next chapter; see Definition 2.1.1.) Our most common notation will be to assign a symbolic name such as $f$ to a partial function. We write "let $X \xrightarrow{f} Y$ be a partial function" to mean $f$ is a partial function from $X$ to $Y$.
We may also write $f\colon X \to Y$ in place of $X \xrightarrow{f} Y$. In either case, we use $f(x)$ for the value assigned by $f$ to each $x$ in its domain of definition, which we denote by $DD(f)$. If $x \in X$ but $x \notin DD(f)$ we say "$f(x)$ is undefined."

4 The set of all partial functions from $X$ to $Y$ will be written $\mathrm{Pfn}(X, Y)$. The
"partial" in partial function means "partially defined." Paradoxically, an important special case of a partial function $X \xrightarrow{f} Y$ occurs when $DD(f) = X$. This is just a function from $X$ to $Y$. For emphasis, we call such $f$ a total function from $X$ to $Y$.

5 We relate this to program semantics in general before returning to DTN. Let $X$ be an input set and let $Y$ be an output set. A given algorithm with input $x$ in $X$ may fail to terminate. Let $A$ be the subset of $X$ consisting of those $x$ for which the algorithm terminates if $x$ is the input. The denotational semantics of the algorithm is the partial function $f\colon X \to Y$ with $DD(f) = A$, and where $f(x)$ is the output at termination when $x$ is the input. (In 1.4.5 below we will consider a computation environment in which 5 requires modification.)

6 The set of all total functions from $X$ to $Y$ will be written $\mathrm{Tot}(X, Y)$.

If $f \in \mathrm{Pfn}(X, Y)$, $g \in \mathrm{Pfn}(Y, Z)$ arise as in 5 we may think of $f$, $g$ as the computations of subalgorithms which can be chained together, setting the output of $f$ as the input to $g$ to produce a net output in $Z$ from an input in $X$. The formal operation involved is as follows.

7 Definition. For $f \in \mathrm{Pfn}(X, Y)$, $g \in \mathrm{Pfn}(Y, Z)$ their composition $gf \in \mathrm{Pfn}(X, Z)$ is defined by
$$DD(gf) = \{x \in X \mid x \in DD(f),\ f(x) \in DD(g)\},$$
$$(gf)(x) = g(f(x)) \quad \text{for } x \in DD(gf).$$
Note that $gf$ is total when $f$ and $g$ are.

The functions studied in first-semester calculus are partial functions from the set of reals to itself (e.g., $DD(1/x) = \{x \mid x \neq 0\}$, $DD(\arcsin x) = \{x \mid -1 \leq x \leq 1\}$, etc.). The "chain rule" refers to the composition of 7, being a rule for the derivative of $gf$. Composition of functions is sometimes called "chaining" because the output of one function is the input to the next, creating a chain of two links. Longer chains arise in 12 below.

We turn now to the semantics for FPF by associating a partial function in $\mathrm{Pfn}(\mathrm{DTN}, \mathrm{DTN})$ to each syntactic function.
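The composition of Definition 7 can be checked on small finite examples. The following is only an illustrative sketch, not part of the formal development: it assumes that a partial function with finite domain of definition is modelled as a Python dict whose keys form DD(f), and the names `dd`, `compose`, `half`, and `pred` are hypothetical.

```python
# Assumed model: a partial function f is a dict; its keys are exactly DD(f).

def dd(f):
    """Domain of definition DD(f)."""
    return set(f)

def compose(g, f):
    """Composition gf of Definition 7.

    gf is defined at x exactly when x is in DD(f) and f(x) is in DD(g),
    and then (gf)(x) = g(f(x)).
    """
    return {x: g[f[x]] for x in f if f[x] in g}

# Illustrative partial functions on X = {0, 1, 2, 3} (assumed examples).
half = {0: 0, 2: 1}          # defined only at the even inputs 0 and 2
pred = {1: 0, 2: 1, 3: 2}    # predecessor, undefined at 0

pred_half = compose(pred, half)
# 0 is in DD(half), but half(0) = 0 is not in DD(pred), so 0 drops out.
print(dd(pred_half), pred_half)
```

Representing "undefined" by the absence of a key keeps DD(gf) a computed quantity rather than a declared one, which mirrors the way the definition derives DD(gf) from DD(f) and DD(g).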
To keep the notation as simple as possible we will denote the semantics of a function f by f: and so we will write f: t for the value f: assigns to the DTN t. Thus, the presence of the colon (which is not in the alphabet of Table 1) indicates semantics. In describing a specific partial function f, if a formula for f : t is given for t of a particular form without further comment our convention is that f: is not defined for other t. Sometimes, of course, DD(f) is sufficiently complicated for a more careful description to be necessary. We begin with the basis step functions in Table 1.
8 id: is the identity function, id: $t = t$ for all $t \in \mathrm{DTN}$. head returns the first element of a list and tail drops the first element of a list as follows:
$$\mathrm{head}\colon \langle t_1, \ldots, t_k\rangle = t_1 \quad (k \geq 1),$$
$$\mathrm{tail}\colon \langle t_1, \ldots, t_k\rangle = \langle t_2, \ldots, t_k\rangle \quad (k \geq 1).$$
Thus, we cannot make heads or tails of the empty list or numerals.

9 The arithmetic functions $+$, $-$, $*$, and $\div$ require an input of form $\langle m, n\rangle$ where $m$, $n$ are numerals. The meaning of the operations is then the same as in Pascal as described in 1.2.5. Thus,
$$+\colon \langle m, n\rangle = m + n, \qquad -\colon \langle m, n\rangle = m \mathbin{\dot{-}} n, \qquad *\colon \langle m, n\rangle = mn, \qquad \div\colon \langle m, n\rangle = m \ \mathrm{div}\ n,$$
where, on the right-hand sides, the numerals $m$, $n$ represent numbers in the usual base-10 way and the numerical results are represented as numerals without leading zeros.

10 The numeral function num is defined by
$$\mathrm{num}\colon t = \begin{cases} \langle\langle\ \rangle\rangle & \text{if } t \text{ is a numeral} \\ \langle\ \rangle & \text{else.} \end{cases}$$
Similarly, the equality function $=$ takes an input of the form $\langle t, u\rangle$ where $t, u \in \mathrm{DTN}$ are arbitrary and
$$=\colon \langle t, u\rangle \ \text{is}\ \begin{cases} \langle\langle\ \rangle\rangle & \text{if } t = u \\ \langle\ \rangle & \text{if } t \neq u. \end{cases}$$
Here we have coded the truth values as DTNs by representing T as $\langle\langle\ \rangle\rangle$ and F as $\langle\ \rangle$. This is analogous to the trick used in set theory (mathematicians sometimes adopt the view that all of mathematics may be derived from set theory) to define natural numbers in terms of sets wherein 0 is defined as the empty set $\emptyset$, 1 is defined as the one-element set $\{\emptyset\}$, 2 is defined as the two-element set $\{0, 1\} = \{\emptyset, \{\emptyset\}\}$, and $n = \{0, \ldots, n - 1\}$ in general. Using lists instead of sets, that is, by substituting $\langle$ for $\{$ and $\rangle$ for $\}$, the same constructions are available in DTN. We could have used the numerals 0 and 1 for F and T but it seemed more desirable to use a convention that would apply to dynamic trees of objects other than numerals. In fact, our convention is analogous to that used in the programming language LISP, where the empty list NIL is used for the truth value F and any other value may be interpreted as T.

We now provide the semantics for the basis step in the set of functions defined in Table 1.
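The basis functions of 8 through 10 can be tried out directly. The sketch below is a hedged illustration under an assumed encoding that is not taken from the text: a numeral is a Python str of decimal digits, a list is a tuple, and `None` stands for "undefined." Only `+` and `*` are modelled, since subtraction and division depend on the conventions of 1.2.5; all function names are hypothetical.

```python
# Assumed encoding (not from the text): numeral = str of digits,
# list <t1, ..., tk> = tuple, None = "undefined" (None is not itself a DTN).
UNDEFINED = None
T = ((),)   # the DTN << >>, coding "true"
F = ()      # the DTN < >, coding "false"

def is_numeral(t):
    return isinstance(t, str) and t != "" and all(c in "0123456789" for c in t)

def is_list(t):
    return isinstance(t, tuple)

def head(t):
    # defined only on lists of length k >= 1
    return t[0] if is_list(t) and len(t) >= 1 else UNDEFINED

def tail(t):
    # defined only on lists of length k >= 1
    return t[1:] if is_list(t) and len(t) >= 1 else UNDEFINED

def _numeral_pair(t):
    """Return (m, n) as ints when t = <m, n> with m, n numerals, else None."""
    if is_list(t) and len(t) == 2 and is_numeral(t[0]) and is_numeral(t[1]):
        return int(t[0]), int(t[1])
    return None

def plus(t):
    p = _numeral_pair(t)
    # str(int) yields a numeral without leading zeros
    return str(p[0] + p[1]) if p is not None else UNDEFINED

def times(t):
    p = _numeral_pair(t)
    return str(p[0] * p[1]) if p is not None else UNDEFINED

def num(t):
    # total: T on numerals, F on every other DTN
    return T if is_numeral(t) else F

def eq(t):
    # input must be a length-2 list; the entries are compared as DTNs
    if is_list(t) and len(t) == 2:
        return T if t[0] == t[1] else F
    return UNDEFINED

example = ("1", (("0", "10"), ()))
print(head(example), tail(example), plus(("035", "7")))
```

Under this encoding equality is structural, so the numerals 1 and 01 are different DTNs even though they denote the same number.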
11 For each DTN $t$, $\equiv t$: is the total function which is constantly $t$, that is, $\equiv t\colon u$ is $t$ for any DTN $u$.

To continue our description of the semantics of FPF we examine the constructions of the inductive step for functions in Table 1.

12 If $f_1, \ldots, f_k$ are functions ($k \geq 1$), $(f_k \circ \cdots \circ f_1)$: is the $k$-fold composition of 7, being essentially the same as the pseudo-Pascal $f_1; \ldots; f_k$, that is,
$$(f_k \circ \cdots \circ f_1)\colon t = f_k\colon (\cdots (f_2\colon (f_1\colon t)) \cdots).$$
(Such is defined, of course, only when all the intermediate steps are defined.)

The next construction applies $k$ functions in parallel and combines the results in a single list.

13 If $f_1, \ldots, f_k$ are functions ($k \geq 1$), then
$$DD([f_1, \ldots, f_k]\colon) = DD(f_1\colon) \cap \cdots \cap DD(f_k\colon)$$
and
$$[f_1, \ldots, f_k]\colon t = \langle f_1\colon t, \ldots, f_k\colon t\rangle.$$
This construction is a major tool in building lists.

14 For $p$, $f$, $g$ functions,
$$(\text{if } p \text{ then } f \text{ else } g)\colon t \ \text{is}\ \begin{cases} f\colon t & \text{if } p\colon t \neq \langle\ \rangle \\ g\colon t & \text{if } p\colon t = \langle\ \rangle \\ \text{undefined} & \text{if } p\colon t \text{ is undefined.} \end{cases}$$
Thus, our device for viewing function $p$ as a test is to consider $p\colon t$ false if it is our coding $\langle\ \rangle$ for false, true if it is defined but not false, and undefined else. The notation above is understood to mean that (if $p$ then $f$ else $g$): $t$ is undefined if $p\colon t = \langle\ \rangle$ but $g\colon t$ is undefined, or if $p\colon t$ is defined and $\neq \langle\ \rangle$ but $f\colon t$ is undefined.

15 The symbol $\alpha$ is the apply-to-all operator. If $f$ is a function, $(\alpha f)$: is "$f$: applied to all entries in the input list." Specifically, an input to $(\alpha f)$: must have the form $\langle t_1, \ldots, t_k\rangle$ with $k \geq 1$ and each $t_i \in DD(f\colon)$ and then
$$(\alpha f)\colon \langle t_1, \ldots, t_k\rangle = \langle f\colon t_1, \ldots, f\colon t_k\rangle.$$

16 The symbol $/$ is the insertion operator. If $f$ is a function then $(/f)\colon \langle t_1, t_2, t_3\rangle$, for example, will be defined as $f\colon \langle t_1, f\colon \langle t_2, t_3\rangle\rangle$. Equivalently, using infix notation $t\ f\ u$ instead of $f\colon \langle t, u\rangle$,
$$(/f)\colon \langle t_1, t_2, t_3\rangle = t_1\ f\ (t_2\ f\ t_3).$$
Similarly, $(/f)\colon \langle t_1, t_2, t_3, t_4\rangle$ will be $t_1\ f\ (t_2\ f\ (t_3\ f\ t_4))$. Thus, $/$ treats $f$ as a function of two variables and extends it to a function on any number of variables by "inserting" it between the variables. The formal definition is as follows.
The input must have the form $\langle t_1, \ldots, t_k\rangle$ ($k \geq 0$), that is, it cannot be a numeral. We use induction on $k$.
$$(/f)\colon \langle\ \rangle = \langle\ \rangle,$$
$$(/f)\colon \langle t_1\rangle = t_1,$$
$$(/f)\colon \langle t_1, \ldots, t_{k+1}\rangle = f\colon \langle t_1, (/f)\colon \langle t_2, \ldots, t_{k+1}\rangle\rangle.$$

This completes the description of the syntax and semantics of FPF. Since the reader may have had very little prior experience with functional languages, we will write some FPF functions to illustrate some of the concepts. Additional examples using recursion will be given in Section 5.1, but we shall be able to achieve quite a bit without any repetitive constructs. Indeed, it is possible to write an FPF function to multiply two square matrices and this is done in 26 below. We begin by introducing "abbreviations" which amount to "subprograms."

17 We introduce the symbol $=_{\mathrm{abb}}$. If $f$ is a syntactic function then $x =_{\mathrm{abb}} f$, read "$x$ is an abbreviation for $f$," is an informal declaration that any occurrence of $x$ may be literally replaced by the string $f$. We begin with some abbreviations which produce functions to manipulate lists and matrices.

18 For any function $f$ and $n \geq 0$, $f^n$ is the abbreviation defined by $f^0 =_{\mathrm{abb}} \mathrm{id}$, $f^1 =_{\mathrm{abb}} f$, $f^n =_{\mathrm{abb}} (f \circ \cdots \circ f)$ ($n$ times for $n > 1$).

For $i \geq 1$ we have the following abbreviations:

19 $\mathrm{pr}_i =_{\mathrm{abb}} (\mathrm{head} \circ \mathrm{tail}^{i-1})$, the $i$th projection function.

20 $\mathrm{col}_i =_{\mathrm{abb}} (\alpha\,\mathrm{pr}_i)$, the $i$th column function.

21 $\mathrm{transp}_n =_{\mathrm{abb}} [\mathrm{col}_1, \ldots, \mathrm{col}_n]$, the $n$-column transpose function.

Thus, $\mathrm{transp}_3$ is an abbreviation for the FPF function [($\alpha$(head $\circ$ id)), ($\alpha$(head $\circ$ tail)), ($\alpha$(head $\circ$ tail $\circ$ tail))]. The reader may easily check that $\mathrm{pr}_i\colon \langle t_1, \ldots, t_n\rangle$ is $t_i$ for $i \leq n$ but undefined for $i > n$ so that $\mathrm{pr}_i$ selects the $i$th entry of a list, that $\mathrm{col}_i$ returns the $i$th column of a matrix
$$\mathrm{col}_i\colon \langle\langle a_{11}, \ldots, a_{1n}\rangle, \ldots, \langle a_{m1}, \ldots, a_{mn}\rangle\rangle \ \text{is}\ \begin{cases} \langle a_{1i}, \ldots, a_{mi}\rangle & i \leq n \\ \text{undefined} & i > n, \end{cases}$$
and that
$$\mathrm{transp}_n\colon \langle\langle a_{11}, \ldots, a_{1n}\rangle, \ldots, \langle a_{m1}, \ldots, a_{mn}\rangle\rangle = \langle\langle a_{11}, \ldots, a_{m1}\rangle, \ldots, \langle a_{1n}, \ldots, a_{mn}\rangle\rangle$$
produces the transpose of an $n$-column matrix.

In 26 below we define an FPF function to multiply two $n \times n$ matrices. We present a strategy whereby a sequence of subfunctions will be composed to produce the desired function. We begin with the input at step 0 and the result at step $i$ will be the output of the $i$th subfunction and the input to the $(i + 1)$th subfunction. For $C$ a matrix, we use the notations $C_i$ and $C^j$ for the $i$th row and $j$th column of $C$, and thus $C_i^j$ for its $ij$ component.

Step 0: $\langle A, B\rangle$, $A$, $B$ the input $n \times n$ matrices coded as in 2.
Step 1: $\langle\langle A_1, B\rangle, \ldots, \langle A_n, B\rangle\rangle$.
Step 2: $\langle\langle\langle A_1, B^1\rangle, \ldots, \langle A_1, B^n\rangle\rangle, \ldots, \langle\langle A_n, B^1\rangle, \ldots, \langle A_n, B^n\rangle\rangle\rangle$.
Step 3: Replace each $\langle A_i, B^j\rangle$ with its dot product
$$\mathrm{dot}\colon \langle A_i, B^j\rangle = \sum (A_i^k B_k^j \mid 1 \leq k \leq n)$$
so that the result in Step 3 is indeed the matrix product desired.

We implement these subfunctions as follows. The $i$th row of a matrix is just the $i$th column of its transpose and this leads to

22

Since $\mathrm{pr}_2\colon \langle A, B\rangle = B$, a function to transform step 0 to step 1 is

23

In the same vein, if

then $g\colon \langle A_i, B\rangle = \langle\langle A_i, B^1\rangle, \ldots, \langle A_i, B^n\rangle\rangle$ so that

24 $f_{12} =_{\mathrm{abb}} (\alpha g)$

transforms step 1 to step 2. A powerful use of insertion and apply-to-all is

25 $\mathrm{dot}_n =_{\mathrm{abb}} ((/+) \circ (\alpha *) \circ \mathrm{transp}_n)$
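whose semantics for two length-$n$ lists of numerals is
$$\mathrm{dot}_n\colon \langle\langle p_1, \ldots, p_n\rangle, \langle q_1, \ldots, q_n\rangle\rangle = p_1 q_1 + \cdots + p_n q_n.$$
(In detail,
$$\mathrm{transp}_n\colon \langle\langle p_1, \ldots, p_n\rangle, \langle q_1, \ldots, q_n\rangle\rangle = \langle\langle p_1, q_1\rangle, \ldots, \langle p_n, q_n\rangle\rangle,$$
$$(\alpha *)\colon \langle\langle p_1, q_1\rangle, \ldots, \langle p_n, q_n\rangle\rangle = \langle p_1 q_1, \ldots, p_n q_n\rangle,$$
$$(/+)\colon \langle p_1 q_1, \ldots, p_n q_n\rangle = p_1 q_1 + (/+)\colon \langle p_2 q_2, \ldots, p_n q_n\rangle = p_1 q_1 + p_2 q_2 + (/+)\colon \langle p_3 q_3, \ldots, p_n q_n\rangle = \cdots = p_1 q_1 + \cdots + p_n q_n.)$$
Thus, step 2 to step 3 is achieved by $(\alpha(\alpha\,\mathrm{dot}_n))$. Chaining these steps together, we obtain an FPF abbreviation to multiply two $n \times n$ matrices in

26 $\mathrm{matmultiply}_n =_{\mathrm{abb}} ((\alpha(\alpha\,\mathrm{dot}_n)) \circ f_{12} \circ f_{01})$

for $f_{01}$ as in 23, $f_{12}$ as in 24, and $\mathrm{dot}_n$ as in 25.

We conclude the section with a few additional abbreviations and encourage the reader to work the exercises. Boolean operations may be derived as follows. Preserving the conventions of 10, coin the notations

27 F for $\langle\ \rangle$, T for $\langle\langle\ \rangle\rangle$, false $=_{\mathrm{abb}} \equiv$F, true $=_{\mathrm{abb}} \equiv$T.

We may then define

28 $(\neg p) =_{\mathrm{abb}}$ (if $p$ then false else true),
$$(\neg p)\colon t \ \text{is}\ \begin{cases} \mathrm{T} & \text{if } p\colon t = \mathrm{F} \\ \mathrm{F} & \text{if } p\colon t \neq \mathrm{F} \\ \text{undefined} & \text{if } p\colon t \text{ is undefined.} \end{cases}$$

29 $(p \vee q) =_{\mathrm{abb}}$ (if $p$ then (true $\circ\ q$) else (if $q$ then true else false)),
$$(p \vee q)\colon t \ \text{is}\ \begin{cases} \mathrm{T} & \text{if } p\colon t \neq \mathrm{F} \text{ or } q\colon t \neq \mathrm{F} \\ \mathrm{F} & \text{if } p\colon t = \mathrm{F} = q\colon t \\ \text{undefined} & \text{if } p\colon t \text{ or } q\colon t \text{ is undefined,} \end{cases}$$

30 $(p \wedge q) =_{\mathrm{abb}} (\neg((\neg p) \vee (\neg q)))$, $\quad \neq\ =_{\mathrm{abb}} (\neg =)$.

The numerical relations are then introduced as follows: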
20 1 An Introduction to Denotational Semantics 31 ~ =abb ( = 0 [ - , > =abb (~ /\ < =abb (I ~), ~ =abb « V == OJ), #-), =). For example, ~: <t,u) is {; undefined EXERCISES FOR SECTION if t, u are numerals and t ~ u if t, u are numerals and t > u else. 1.3 1. Draw the derivation tree of the matrix of 2 for the case m = 3, n = 2. 2. Let X be an m-element set and let Y be an n-element set. Show that Tot(X, Y) has nm elements and that Pfn(X, Y) has (n + 1)m elements. 3. Recall that a total function f: X ---> Y is injective if whenever Xl' X 2 are distinct elements of X then f(x l ) oF f(x 2 ) in Y. Given fETot(X, Y), gETot(Y, Z), prove (a) that gf is injective if f, g are injective; (b) that f is injective if gf is injective; (c) that there are P(n, r) injective functions from an r-element set to an n-element set, where P(n, r) = n(n - 1)'" (n - r + 1) is the number of ways to select r things from n if the selection is made without repetition. 4. Recall that a total function f: X ---> Y is surjective iffor every y in Y there exists at least one x in X with f(x) = y. Given fETot(X, Y), gETot(Y, Z), prove (a) that gf is surjective if f, g are surjective; (b) that g is surjective if gf is surjective. 5. Describe taiF 0 head: as a partial function. 6. Write an FPF function for the everywhere undefined function .1 defined by 00(.1:) = 0. 7. If A, Bare m x n matrices, their sum is the m x n matrix A + B with ij entry aij + bij if A, B, respectively, have ij entry a ij , bij' Write an FPF function +m.n to add two m x n matrices. 8. Write an FPF function =m,n whose input is a list of two m x n matrices, such that _. <A, B) . {T -m n' . IS undefined A, Bare equal matrices if else. Say that two FPF functions f, g are semantically equivalent if f: = g: m Pfn(DTN, DTN). 9. Show that (if p then true else false) is semantically equivalent to (..., (...,p)). Give necessary and sufficient conditions on p for p and (""(...,p)) to be semantically equivalent.
10. Prove that $(p \vee q)$ and $(q \vee p)$ are semantically equivalent. Prove, however, that if $(p \mathbin{\bigcirc} q) =_{\mathrm{abb}}$ (if $p$ then true else (if $q$ then true else false)) then $(p \mathbin{\bigcirc} q)$ and $(q \mathbin{\bigcirc} p)$ are not semantically equivalent.

11. Prove that $(p \vee q)$ and $(\neg((\neg p) \wedge (\neg q)))$ are semantically equivalent.

12. Write an FPF function $f$ to compute the number of occurrences of the least value in a nonempty list of numbers. [Hint: a possible strategy is
Step 0: $\langle n_1, \ldots, n_k\rangle$.
Step 1: $\langle t_1, \ldots, t_k\rangle$ where $t_i = \langle\langle n_i, n_i, \ldots, n_i\rangle, \langle n_1, \ldots, n_k\rangle\rangle$.
Step 2: $\langle i_1, \ldots, i_k\rangle$ where $i_j$ is 1 if $n_j \leq n_t$ for all $t$, and $i_j = 0$ else.
Step 3: sum all $i_j$.]

1.4 Multifunctions

Since denotational semantics is to assign an input/output meaning to each program, it is reasonable to consider possible general forms for input/output "functions." In the Pascal fragment of Section 1.2 inputs and outputs were assignments of natural numbers to identifiers whereas they were DTNs for the functional programming fragment of Section 1.3. In Part 3 we shall be concerned with the theory of data types which addresses the question of how inputs and outputs can be structured (e.g., "DTN structure"). But even if we bypass this issue for the time being, allowing the inputs and outputs to have no particular structure, we may nonetheless wish to consider more general things than partial functions for input/output descriptions. In this section we introduce "multifunctions." We make no claim that partial functions and multifunctions exhaust all reasonable possibilities. Rather, we introduce the notion of a "category" in Chapter 2 as a candidate for a truly general framework. The common properties of partial functions and multifunctions studied in this section will help to motivate later work with categories.

A total function is "single-valued" in the sense that exactly one output $f(x)$ results for each input $x$. Similarly, a partial function is "at-most-one-valued."
More generally, multifunctions obtain by allowing f(x) to be any set of outputs, including the empty set. For an example, consider an anthropological data base for a population P in which it is possible to retrieve the names of the children (also in P) of any person in P. The "children" multifunction f then assigns to each p in P the set f(p) of all children of p. The formal definition of a multifunction is as follows. 1 Definitions. Let X, Y be sets. A multifunction from X to Y is a total function from X to the set of subsets of Y. The set of all multifunctions from X to Y will be denoted Mfn(X, Y).
In set theory it is customary to call the set of subsets of $Y$ the power set of $Y$, which leads to the following standard:

2 Notation. If $Y$ is a set, $\mathcal{P}(Y)$ denotes the set of subsets of $Y$.

We then have, by Definition 1,

3 $$\mathrm{Mfn}(X, Y) = \mathrm{Tot}(X, \mathcal{P}(Y)).$$

Why should Definition 1 be useful, then, if multifunctions are just a special case of total functions? The reason lies in considering how we want to chain multifunctions together. For example, a grandchild is just a child of a child so that if $f \in \mathrm{Mfn}(P, P)$ is the "children" multifunction as above, one intuitively expects to obtain the "grandchildren" multifunction by an appropriate composition of $f$ with itself. Considering $f$ as a total function from $P$ to $\mathcal{P}(P)$ and trying to compose $f$ with itself as in 1.3.7 does not work because the value of the output $f(p)$ does not have the right form to be an input to $f$. What we need is the following definition.

4 Definition. For $f \in \mathrm{Mfn}(X, Y)$, $g \in \mathrm{Mfn}(Y, Z)$, their composition $gf \in \mathrm{Mfn}(X, Z)$ is defined by
$$gf(x) = \{z \in Z \mid \text{there exists } y \in f(x) \text{ with } z \in g(y)\}.$$
Indeed, it is immediate that if $f \in \mathrm{Mfn}(P, P)$ is the "children" multifunction then $ff \in \mathrm{Mfn}(P, P)$ is the "grandchildren" multifunction we desired.

Multifunctions are suitable input/output functions for the following parallel computation scenario which generalizes that of 1.3.5.

5 Let $X$ be an input set and let $Y$ be an output set. Beginning with an input $x$ in $X$, a given algorithm simultaneously initiates a set of noninteracting computations. Some of these may not terminate and those that do may halt at different times. The denotational semantics of the algorithm is the multifunction $f \in \mathrm{Mfn}(X, Y)$ which assigns to $x$ the set $f(x)$ of all outputs in $Y$ resulting from some terminating computation initiated by input $x$.

One might, for example, add atomic multifunctions to the functional programming fragment of Section 1.3 and give a multifunction denotational semantics based on 5 rather than 1.3.3.
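The "grandchildren" example of Definition 4 can be made concrete. The sketch below is an illustration only: it assumes a multifunction in Mfn(X, Y) is modelled as a dict sending every element of X to a frozenset of elements of Y, and the population and the name `mcompose` are hypothetical.

```python
# Assumed model: a multifunction is a dict from every x in X to a frozenset
# of outputs (possibly empty), i.e. a total function into the power set.

def mcompose(g, f):
    """Composition gf of Definition 4:
    gf(x) = {z : there exists y in f(x) with z in g(y)}."""
    return {x: frozenset(z for y in f[x] for z in g[y]) for x in f}

# Hypothetical population P with its "children" multifunction.
children = {
    "ann": frozenset({"bob", "cy"}),
    "bob": frozenset({"dee"}),
    "cy": frozenset({"eve"}),
    "dee": frozenset(),
    "eve": frozenset(),
}

grandchildren = mcompose(children, children)
print(sorted(grandchildren["ann"]))   # grandchildren of ann
print(len(grandchildren["bob"]))      # bob has no grandchildren in P
```

Note that plain function composition would fail here, as the text observes: `children["ann"]` is a set of people, not a person, so it cannot be fed back into `children` without ranging over its elements.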
See Exercise 3. In such a situation we would need a multifunction semantics for the FPF $(f_k \circ \cdots \circ f_1)$. Similarly, in attempting a multifunction semantics for Pascal we would need to assign a meaning to begin $f_1; \ldots; f_k$ end. While the composition operation of 4 is the natural candidate, a technical issue is raised. Up to now we have viewed the chaining together of, say, three functions in the following way:
[Figure: three functions $f$, $g$, $h$ chained in sequence]

For multifunctions, should this mean $h(gf)$ or $(hg)f$? Fortunately, it makes no difference.

6 Proposition (Associative Law for Multifunction Composition). If $f \in \mathrm{Mfn}(W, X)$, $g \in \mathrm{Mfn}(X, Y)$, $h \in \mathrm{Mfn}(Y, Z)$ then $h(gf) = (hg)f \in \mathrm{Mfn}(W, Z)$.

PROOF. Let $z \in (h(gf))(w)$. Then there exists $y \in (gf)(w)$ with $z \in h(y)$. But then there exists $x \in f(w)$ with $y \in g(x)$. By the definition of $hg$, $z \in (hg)(x)$ and so $z \in ((hg)f)(w)$. So far, we have shown that $(h(gf))(w)$ is a subset of $((hg)f)(w)$ for all $w \in W$. To complete the proof, let $z \in ((hg)f)(w)$ and show $z \in (h(gf))(w)$. There exists $x \in f(w)$ with $z \in (hg)(x)$. Thus, there exists $y \in g(x)$ with $z \in h(y)$. By the definition of $gf$, $y \in (gf)(w)$ and then $z \in (h(gf))(w)$. $\square$

Proposition 6 allows us to write the equal multifunctions $h(gf)$ and $(hg)f$ simply as $hgf$. In fact, the proof has shown

7 $$(hgf)(w) = \{z \in Z : \text{there exists } x \in f(w) \text{ and then } y \in g(x) \text{ with } z \in h(y)\}.$$

Repeated use of the associative law guarantees that parentheses can be avoided for chains of all lengths. While we will not give a formal proof here, the following example indicates the general idea.

8 Example. $e((d(cb))a) = (ed)(c(ba))$ by five uses of 6 as follows:
$$e((d(cb))a) = e(((dc)b)a) \quad (\text{as } d(cb) = (dc)b)$$
$$= e((dc)(ba)) \quad (\text{as } ((dc)b)a = (dc)(ba))$$
$$= (e(dc))(ba) = ((ed)c)(ba) = (ed)(c(ba)).$$
Thus, both compositions in 8 could be written $edcba$. See Exercise 5.

We conclude this section by showing that partial functions (and so total functions) may be thought of as special cases of multifunctions.

9 Definition. For each $f \in \mathrm{Pfn}(X, Y)$ define $\hat{f} \in \mathrm{Mfn}(X, Y)$ by
$$\hat{f}(x) = \begin{cases} \{f(x)\} & x \in DD(f) \\ \emptyset & \text{else.} \end{cases}$$
Such $\hat{f}$ is closely associated to $f$. For example, $f$ can be completely deduced from $\hat{f}$ because $DD(f) = \{x \in X \mid \hat{f}(x) \neq \emptyset\}$ and for $x \in DD(f)$, $f(x)$ is the unique element of $\hat{f}(x)$. A multifunction $g$ has the form $\hat{f}$ if and only if $g(x)$ has at most one element for all $x$. Furthermore, the compositions of 4 and 1.3.7 respect each other as is shown in the next result.
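The associative law of 6 and the passage from a partial function to its associated multifunction in Definition 9 can both be spot-checked on finite data. This is a sketch under assumptions: multifunctions are dicts into frozensets, partial functions are dicts whose keys form the domain of definition, and the names `mcompose`, `hat`, and `unhat` are hypothetical. A finite check illustrates the statements and is no substitute for the proofs.

```python
def mcompose(g, f):
    """Multifunction composition of Definition 4."""
    return {x: frozenset(z for y in f[x] for z in g[y]) for x in f}

def hat(f, X):
    """Multifunction associated to the partial function f as in Definition 9:
    {f(x)} on DD(f), the empty set elsewhere on X."""
    return {x: frozenset([f[x]]) if x in f else frozenset() for x in X}

def unhat(m):
    """Recover the partial function; requires every m(x) to have at most one element."""
    assert all(len(s) <= 1 for s in m.values())
    return {x: next(iter(s)) for x, s in m.items() if s}

# Assumed example data with W = {0, 1}, X = {"a", "b"}, Y = {10, 20}, Z = {"p"}.
f = {0: frozenset({"a", "b"}), 1: frozenset()}
g = {"a": frozenset({10}), "b": frozenset({10, 20})}
h = {10: frozenset({"p"}), 20: frozenset()}

left = mcompose(h, mcompose(g, f))    # h(gf)
right = mcompose(mcompose(h, g), f)   # (hg)f
print(left == right)

half = {0: 0, 2: 1}                   # a partial function on {0, 1, 2, 3}
print(unhat(hat(half, range(4))) == half)
```

The round trip through `hat` and `unhat` reflects the remark that the partial function can be completely deduced from its associated multifunction.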
10 Proposition. Let $f \in \mathrm{Pfn}(X, Y)$, $g \in \mathrm{Pfn}(Y, Z)$ and let $gf \in \mathrm{Pfn}(X, Z)$ be the composition of 1.3.7. Let $\hat{g}\hat{f} \in \mathrm{Mfn}(X, Z)$ be the composition of 4. Then $\widehat{gf} = \hat{g}\hat{f}$.

PROOF.
$$(\hat{g}\hat{f})(x) = \{z \mid \text{there exists } y \in \hat{f}(x) \text{ with } z \in \hat{g}(y)\}$$
$$= \{z \mid x \in DD(f) \text{ and } f(x) \in DD(g) \text{ and } z = g(f(x))\}$$
$$= \begin{cases} \{g(f(x))\} & x \in DD(f) \text{ and } f(x) \in DD(g) \\ \emptyset & \text{else} \end{cases}$$
$$= \widehat{gf}(x). \qquad \square$$

The import of 9, 10 is that "partial functions are multifunctions," that is, blurring the distinction between $f$ and $\hat{f}$ is unlikely to lead to imprecision. Usually, one writes $\hat{f}$ simply as $f$. Thus, if $f \in \mathrm{Pfn}(X, Y)$, $g \in \mathrm{Mfn}(Y, Z)$ we would write $gf$ without comment for the more precise $g\hat{f} \in \mathrm{Mfn}(X, Z)$.

One mild warning is in order, however, relating to 1.3.3. If a known programming statement computes $f$ we would expect to be able to write the statement

if $\hat{f}(x) = \emptyset$ then $g(x)$ else $h(x)$

in, say, Pascal. This would compute $h(x)$ if computation of $f(x)$ halts, but would be undefined rather than returning $g(x)$ if $f(x)$ does not halt, that is, if $x \notin DD(f)$. In short, $\hat{f}(x) = \emptyset$ in 1.3.5 should be interpreted not as a returned value but as nontermination. A similar interpretation applies to $f(x) = \emptyset$ in 5. On the other hand, there are circumstances such as the "children" multifunction where $\emptyset$ is a reasonable returned value. In a semantic environment where a possibly nonterminating algorithm has the empty set as a possible returned value, multifunctions may not provide the correct type of function. See Exercise 2.1.10.

11 Proposition (Associative Law for Partial Function Composition). If $f \in \mathrm{Pfn}(W, X)$, $g \in \mathrm{Pfn}(X, Y)$, $h \in \mathrm{Pfn}(Y, Z)$ then, with respect to the composition of 1.3.7, $h(gf) = (hg)f \in \mathrm{Pfn}(W, Z)$.

PROOF. Using 6 and 10 we have
$$\widehat{h(gf)} = \hat{h}\,\widehat{gf} = \hat{h}(\hat{g}\hat{f}) = (\hat{h}\hat{g})\hat{f} = \widehat{hg}\,\hat{f} = \widehat{(hg)f},$$
so that $h(gf) = (hg)f$. $\square$

EXERCISES FOR SECTION 1.4

1. Show that $\mathcal{P}(Y)$ has $2^n$ elements if $Y$ has $n$ elements. Show that $\mathrm{Mfn}(X, Y)$ has $2^{mn}$ elements if $X$ has $m$ elements and $Y$ has $n$ elements.

2. 
A relation from $X$ to $Y$ is a subset of $X \times Y$ where $X \times Y$ is the set of all ordered pairs $(x, y)$ with $x \in X$ and $y \in Y$. It is standard to write $xRy$ as a synonym for $(x, y) \in R$ and to say "$x$, $y$ are $R$-related" if $xRy$. Denote the set of relations from $X$ to $Y$ as $\mathrm{Rel}(X, Y)$. Three examples of relations are $\leq\ \in \mathrm{Rel}(\mathbf{N}, \mathbf{N})$ where $n \leq m$ has
its usual meaning, $R \in \mathrm{Rel}(\mathbf{R}, \mathbf{N})$ if $\mathbf{R}$ is the set of real numbers and if $xRn$ means $x^2 = n$, and $S \in \mathrm{Rel}(P, W)$ if $pSw$ means $w$ is the sister of $p$ where $P$ is a set of people and $W$ is a set of women. Prove that "a relation is the same thing as a multifunction." More precisely, for each $f \in \mathrm{Mfn}(X, Y)$ define $f^* \in \mathrm{Rel}(X, Y)$ by $x f^* y$ if and only if $y \in f(x)$. Prove that $f \mapsto f^*$ establishes a bijective (= injective and surjective) function from $\mathrm{Mfn}(X, Y)$ to $\mathrm{Rel}(X, Y)$.

3. It is easy to extend the semantics of FPF of Section 1.3 to multifunctions in $\mathrm{Mfn}(\mathrm{DTN}, \mathrm{DTN})$. For function constructors, define $(f_k \circ \cdots \circ f_1)$: using 4 (and 6):
$$[f_1, \ldots, f_k]\colon t = \{\langle u_1, \ldots, u_k\rangle \mid u_i \in f_i\colon t \text{ for all } i\},$$
$$(\text{if } p \text{ then } f \text{ else } g)\colon t = A \cup B \quad \text{where} \quad A = \begin{cases} f\colon t & \text{if } p\colon t - \{\langle\ \rangle\} \neq \emptyset \\ \emptyset & \text{else,} \end{cases} \qquad B = \begin{cases} g\colon t & \text{if } \langle\ \rangle \in p\colon t \\ \emptyset & \text{else,} \end{cases}$$
$$(\alpha f)\colon \langle t_1, \ldots, t_k\rangle = \{\langle u_1, \ldots, u_k\rangle \mid u_i \in f\colon t_i \text{ for all } i\},$$
$$(/f)\colon \langle\ \rangle = \{\langle\ \rangle\}, \qquad (/f)\colon \langle t_1\rangle = \{t_1\},$$
$$(/f)\colon \langle t_1, \ldots, t_{k+1}\rangle = \{f\colon \langle t_1, u\rangle \mid u \in (/f)\colon \langle t_2, \ldots, t_{k+1}\rangle\}.$$
Consider the atomic functions of Table 1.3.1 as multifunctions as in 9.

(a) If $f$: denotes the partial function semantics of $f$ as in Section 1.3 show that the multifunction semantics is just $\widehat{(f\colon)}$.

Because of (a) we must extend the atomic functions to include multifunctions which are not partial functions in order for multifunction semantics to be interesting. Many possibilities might be considered. Here we explore two, the first being the "index generator" or "iota" function of the programming language APL and the second being a proper multifunction. Extend the syntax of Table 1.3.1 by adding $\iota$ (lower case Greek iota) and foreach to the atomic functions. Their semantics is as follows. The input to $\iota$ is a numeral $n$ and $\iota\colon n = \langle 1, 2, \ldots, n\rangle$ whereas for $m, n \geq 1$,
$$\mathrm{foreach}\colon \langle\langle t_1, \ldots, t_m\rangle, \langle u_1, \ldots, u_n\rangle\rangle = \{\langle t_i, u_j\rangle \mid 1 \leq i \leq m,\ 1 \leq j \leq n\}.$$

(b) If $g =_{\mathrm{abb}} ((\alpha *) \circ \mathrm{transp}_2 \circ [\iota, \iota])$ show that $g\colon n = \langle 1, 4, 9, \ldots, n^2\rangle$ for each numeral $n$.
(c) Show that $(+ \circ \mathrm{foreach} \circ [g, g])\colon n = \{p^2 + q^2 : 1 \leq p, q \leq n\}$ and use this to write a function to test if an input number $n$ has the form $p^2 + q^2$ for $p$, $q$ natural numbers.
(d) Write a similar function to test if a number $n$ has the form $p^2 + q^2 + r^3$ with $p$, $q$, $r$ natural numbers.

4. Generalize 7 to $n$ multifunctions.

5. A complete list of all ways to parenthesize a chain of four functions using a binary composition is $((dc)b)a$, $(d(cb))a$, $(dc)(ba)$, $d((cb)a)$, $d(c(ba))$. As in 8, show that all five are the same if the composition satisfies the associative law.

1.5 A Preview of Partially Additive Semantics

In this section we consider partial functions and multifunctions as frameworks for denotational semantics without reference to any particular programming language. Basic constructions such as chaining, conditional testing, and looping are described at the function level. The term "partially additive" refers to a kind of sum operation which can be defined on the sets $\mathrm{Pfn}(X, Y)$, $\mathrm{Mfn}(X, Y)$ (and more generally in Section 3.2).

To fix the context we must choose just one of "partial function" or "multifunction," that is, we must specify the "semantic category" in the sense of the following definition (which will be generalized in the next chapter).

1 Definition. The semantic category is either Pfn (for partial functions) or Mfn (for multifunctions). We adopt the noncommittal notation $\mathrm{SC}(X, Y)$ to mean "$\mathrm{Pfn}(X, Y)$ if the semantic category is Pfn and $\mathrm{Mfn}(X, Y)$ if the semantic category is Mfn."

2 Notation. We will use all the notations $f\colon X \to Y$, $X \xrightarrow{f} Y$, and

[Figure: flowscheme box labeled $f$ with input line $X$ and output line $Y$]

as synonyms for $f \in \mathrm{SC}(X, Y)$. These may appear geometrically reoriented in diagrams, for example, right-to-left, vertically, diagonally, and so on. The last notation is "flowscheme" notation.

The important operation of iterated composition has already been introduced (in 1.4.4, 1.4.6-8 for Mfn, 1.3.7, 1.4.11 for Pfn). If $f_i \in \mathrm{SC}(X_{i-1}, X_i)$ for $i = 1, \ldots, n$, suitable flowscheme notation for the composition $f_n \cdots f_1 \in \mathrm{SC}(X_0, X_n)$ is
3 [Figure: flowscheme with boxes $f_1, \ldots, f_n$ connected in series]

The labeled arrow notation $X \xrightarrow{f} Y$ is useful in "commutative diagrams" such as

4 [Figure: commutative diagram on $X_0, \ldots, X_5$ with arrows $f_1, \ldots, f_5$, $g$, $h$, and $f$]

in which our convention is the following:

5 In a diagram such as 4, if two paths of arrows begin at the same place and end at the same place then, unless the contrary is indicated, the compositions of these paths are asserted to be equal. To emphasize this assertion we say "the diagram commutes."

Thus, in 4, $g = f_3 f_2 f_1 \in \mathrm{SC}(X_0, X_3)$, $h = f_4 f_3 \in \mathrm{SC}(X_2, X_4)$, $h f_2 f_1 = f_4 g \in \mathrm{SC}(X_0, X_4)$, $f = f_5 f_4 f_3 f_2 f_1 \in \mathrm{SC}(X_0, X_5)$, and so on. Notation such as

6 [Figure: triangle with arrows $f\colon X \to Y$, $g\colon Y \to Z$, $h\colon X \to Z$, marked as not necessarily commutative]

could be used to indicate that $h$ is not necessarily the same as $gf \in \mathrm{SC}(X, Z)$.

The identity function id: $\mathrm{DTN} \to \mathrm{DTN}$ was introduced with FPF in 1.3.8. More generally, we have the following:

7 Definition. For each set $X$, the identity function of $X$, $\mathrm{id}_X\colon X \to X$ is the total function defined by $\mathrm{id}_X(x) = x$. This function is in $\mathrm{Pfn}(X, X)$ and so may be considered in $\mathrm{Mfn}(X, X)$ as in 1.4.9-10, so that always $\mathrm{id}_X \in \mathrm{SC}(X, X)$.

We clearly have the following:

8 For $f \in \mathrm{SC}(X, Y)$, $\mathrm{id}_Y f = f = f\,\mathrm{id}_X$.

We may express this by a commutative diagram:
[Figure: commutative diagram expressing 8]

Alternatively, inventing the "through box"

[Figure: through box on a line labeled $X$]

as flowscheme notation for $\mathrm{id}_X$, 8 may be expressed in flowscheme terms by

[Figure: three flowschemes from $X$ to $Y$, asserted equal, built from $f$ and through boxes]

We now introduce the fundamental operation of sum, first for Mfn and then for Pfn.

9 Definitions. Let $X$, $Y$ be sets. Let $I$ be a set and for each $i \in I$ let $f_i \in \mathrm{Mfn}(X, Y)$ (we say $(f_i \mid i \in I)$ is an $I$-indexed family in $\mathrm{Mfn}(X, Y)$). Then the sum $\sum (f_i \mid i \in I)$ (alternatively written $\sum_{i \in I} f_i$) is the multifunction in $\mathrm{Mfn}(X, Y)$ defined by
$$\Big(\sum_{i \in I} f_i\Big)(x) = \bigcup_{i \in I} f_i(x) = \{y \in Y \mid y \in f_i(x) \text{ for some } i \in I\}.$$
Hence, for one-element families (meaning that $I$ has one element) $\sum (f) = f$ and, in case $I$ is empty, the sum maps $x$ to the empty set for all $x$ in $X$ (see Exercise 1). If $I = \{1, 2, \ldots, n\}$ with $n \geq 2$ so that the family $(f_i \mid i \in I)$ has the form $(f_1, \ldots, f_n)$, we write $f_1 + \cdots + f_n$ as a synonym for $\sum (f_i \mid i \in I)$. In general, we may write $\sum f_i$ instead of $\sum (f_i \mid i \in I)$ when $I$ is clear from context.

An intuitive flowscheme notation for summing is exemplified by the following.
10 $f + g$ is written

[Figure: flowscheme in which the input fans out to boxes $f$ and $g$ whose outputs are merged]

and similarly for other families $(f_i \mid i \in I)$. This notation conveys the idea of 9 since an output from $(f + g)(x)$ is an output from either of $f(x)$ or $g(x)$.

We next seek a suitable sum operation for partial functions. It is easy to see that when each $f_i$ in 9 is a partial function (i.e., a multifunction which happens to be a partial function; recall 1.4.9-10) then $\sum f_i$ need not be. In the case of 10, let $f$, $g$ be partial functions and $x$ be such that $f(x)$ and $g(x)$ are defined and different. Then the sum $f + g$ maps $x$ to the set $\{f(x), g(x)\}$ and so is not a partial function. To better understand what needs to be fixed, imagine the "fanout" in 10,

[Figure: the fanout node of 10]

as controlled by a test such as "if $f$ is defined go left; if $g$ is defined go right." For multifunctions, such a test can pass the input down both lines simultaneously. For partial functions we demand that such a test choose at most one alternative and define 10 only when $DD(f) \cap DD(g) = \emptyset$. We have motivated the following.

11 Let $X$, $Y$ be sets and let $(f_i \mid i \in I)$ be an $I$-indexed family in $\mathrm{Pfn}(X, Y)$. Then $(f_i \mid i \in I)$ is summable in $\mathrm{Pfn}(X, Y)$ if for all $i, j \in I$ with $i \neq j$, $DD(f_i) \cap DD(f_j) = \emptyset$. In that case, $\sum f_i = \sum (f_i \mid i \in I)$ in $\mathrm{Pfn}(X, Y)$ is defined by
$$DD\Big(\sum f_i\Big) = \bigcup_{i \in I} DD(f_i),$$
$$\Big(\sum f_i\Big)(x) = \begin{cases} f_j(x) & \text{if there exists } j \text{ with } x \in DD(f_j) \\ \text{undefined} & \text{else.} \end{cases}$$
Note that we do not require that $I$ be finite. The following is an immediate result:

12 If $(f_i \mid i \in I)$ is summable, $\widehat{\sum f_i} = \sum (\hat{f_i})$, where $\hat{\ }$ is defined in 1.4.9 and the latter sum is that of 9.

Thus, the Pfn sum, when it exists, specializes the Mfn sum. In particular, we have for one-element families

13 $\sum (f) = f$

and for empty families

14 $\sum \emptyset = 0$,

where $0\colon A \to B$ denotes the everywhere undefined partial function characterized by $DD(0) = \emptyset$. It is obvious that we may extend a summable family by adding any number of 0's or we may delete any number of 0's which are already there without affecting either the summability of the family or the value of the sum. It is for this reason that, in this context, we prefer the notation 0 instead of the alternate notation $\bot$ introduced in Exercise 1.3.6.

Our operation of sum, then, differs from ordinary numerical addition in two fundamental respects:

(a) It is not always defined. Indeed, for any $f \in \mathrm{Pfn}(X, Y)$ with $DD(f) \neq \emptyset$, $f + f$ is never defined. The "partial" in "partially additive" refers to this property: addition (= sum) is only partially defined.
(b) There are many infinite families whose sum is defined.

We remark that even finite sums such as 10 cannot be implemented in an unrestricted way. It is well known from computability theory that given two programs which compute partial functions $f$, $g$ there is no way to decide, in general, if $DD(f) \cap DD(g) = \emptyset$, and this makes it hard to imagine a suitable approach to compute $f + g$ for arbitrary $f$, $g$ (see Exercise 4 for an unsuccessful attempt). There remains the option to restrict the use of sum to "provably disjoint" families, and this will in fact be what happens when we give a partially additive semantics for iteration in Section 3.3 (see also 27 below).

We turn to some properties of sum, beginning with the following one.
15 Proposition (Distributive Law of Composition over Sums in Mfn). Let $f \in \mathrm{Mfn}(W, X)$, let $(g_i \mid i \in I)$ be a family in $\mathrm{Mfn}(X, Y)$, and let $h \in \mathrm{Mfn}(Y, Z)$. Then
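$$\Big(\sum g_i\Big) f = \sum (g_i f) \in \mathrm{Mfn}(W, Y),$$
$$h\Big(\sum g_i\Big) = \sum (h g_i) \in \mathrm{Mfn}(X, Z).$$

PROOF.
$y \in ((\sum g_i) f)(w)$ $\Leftrightarrow$ there exists $x \in f(w)$, $y \in (\sum g_i)(x)$
$\Leftrightarrow$ there exists $x \in X$, $i \in I$ with $x \in f(w)$ and $y \in g_i(x)$
$\Leftrightarrow$ there exists $i \in I$ with $y \in (g_i f)(w)$
$\Leftrightarrow$ $y \in (\sum (g_i f))(w)$.

$z \in (h(\sum g_i))(x)$ $\Leftrightarrow$ there exists $y \in (\sum g_i)(x)$ with $z \in h(y)$
$\Leftrightarrow$ there exists $i \in I$, $y \in g_i(x)$ with $z \in h(y)$
$\Leftrightarrow$ there exists $i \in I$ with $z \in (h g_i)(x)$
$\Leftrightarrow$ $z \in (\sum (h g_i))(x)$. $\square$

16 Corollary (Distributive Law of Composition over Sums in Pfn). Let $f \in \mathrm{Pfn}(W, X)$, let $(g_i \mid i \in I)$ be a summable family in $\mathrm{Pfn}(X, Y)$, and let $h \in \mathrm{Pfn}(Y, Z)$. Then $(g_i f \mid i \in I)$ and $(h g_i \mid i \in I)$ are summable and
$$\Big(\sum g_i\Big) f = \sum (g_i f) \in \mathrm{Pfn}(W, Y),$$
$$h\Big(\sum g_i\Big) = \sum (h g_i) \in \mathrm{Pfn}(X, Z).$$

PROOF. If $w \in DD(g_i f) \cap DD(g_j f)$ then $f(w) \in DD(g_i) \cap DD(g_j)$, so $i = j$. If $x \in DD(h g_i) \cap DD(h g_j)$ then $x \in DD(g_i) \cap DD(g_j)$, so $i = j$. Then equality of the sums follows from 15 in view of 12. $\square$

Proposition 15 and Corollary 16 are valid when $I$ is empty, yielding the following:

17 For $f \in \mathrm{SC}(X, Y)$ and for all sets $W$, $Z$ we have the commutative diagram

[Figure: commutative diagram $W \xrightarrow{0} X \xrightarrow{f} Y \xrightarrow{0} Z$ whose composite arrows are also labeled 0]

where the four 0's are the appropriate empty sums of 14. It follows that any composition $f_n \cdots f_1$ is 0 if any of the $f_i$ is 0.

A useful result about the existence of sums is the following:

18 Proposition. Let $(f_i \mid i \in I)$ be a summable family in $\mathrm{Pfn}(X, Y)$. Then:
(a) If $J \subset I$, $(f_i \mid i \in J)$ is summable in $\mathrm{Pfn}(X, Y)$.
(b) If $(g_i \mid i \in I)$ is a similarly indexed family (not necessarily summable) in $\mathrm{Pfn}(Y, Z)$, then $(g_i f_i \mid i \in I)$ is a summable family in $\mathrm{Pfn}(X, Z)$.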
PROOF. That (a) holds is obvious. For (b), if a ∈ DD(g_i f_i) ∩ DD(g_j f_j) then a ∈ DD(f_i) ∩ DD(f_j), so i = j. □

In the balance of this section we emphasize the use of sums to define programming constructs.

19 Definition. If A is a subset of X, the inclusion function of A is inc_A ∈ Pfn(X, X) defined by DD(inc_A) = A, inc_A(x) = x. Thus, inc_∅ = 0 is the everywhere undefined function X → X and inc_X = id_X. As usual, we consider inc_A ∈ Mfn(X, X) as well, as in 1.4.9-10.

20 Definition. If p ∈ Pfn(X, X) is an inclusion function (so that p = inc_A for A = DD(p)), we say p is a guard function, and for f ∈ SC(X, Y) we introduce the notation p → f for fp. The meaning of p → f is "if p is true then execute f else the result is undefined," where to say p(x) is true means x ∈ DD(p). Thus p "guards" entry to f. Such p → f is called a guarded command.

21 Definition. For n ≥ 1, an n-way test on X is (p_1, ..., p_n) with each p_i an inclusion function in Pfn(X, X) and DD(p_i) ∩ DD(p_j) = ∅ if i ≠ j.

22 Definition. Let (p_1, ..., p_n) be an n-way test on X and let f_1, ..., f_n ∈ SC(X, Y). Then a natural generalization of the case statement in Pascal is

case (p_1, ..., p_n) of (f_1, ..., f_n) = f_1 p_1 + ··· + f_n p_n

with flowscheme

[Flowscheme: an n-way branch on p_1, ..., p_n leading to f_1, ..., f_n]

The sum is defined by Proposition 18.
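The constructions of 19-22 can be tried out on finite sets. The following Python sketch is an illustration only and is not part of the original text. It assumes that a partial function is modelled as a dict whose set of keys is its domain of definition DD; the names pfn_sum, compose, inc, and case are hypothetical helpers chosen for this sketch.

```python
def pfn_sum(*fs):
    """Sum of partial functions; defined only when the domains are pairwise disjoint."""
    total = {}
    for f in fs:
        if total.keys() & f.keys():
            raise ValueError("family is not summable: domains of definition overlap")
        total.update(f)
    return total


def compose(g, f):
    """The composite gf (first f, then g); defined at x iff f(x) and g(f(x)) are defined."""
    return {x: g[y] for x, y in f.items() if y in g}


def inc(A):
    """Inclusion function inc_A: the identity restricted to the subset A."""
    return {x: x for x in A}


def case(tests, branches):
    """case (p_1, ..., p_n) of (f_1, ..., f_n) = f_1 p_1 + ... + f_n p_n."""
    return pfn_sum(*(compose(f, p) for p, f in zip(tests, branches)))


# Assumed example data: X = {0, ..., 5} with a two-way test that leaves 4 and 5 unguarded.
X = range(6)
f1 = {x: x + 10 for x in X}
f2 = {x: 2 * x for x in X}
result = case((inc({0, 1}), inc({2, 3})), (f1, f2))
print(result)
```

In this sketch the case construct is undefined on 4 and 5, since no guard is true there, and pfn_sum(f1, f1) raises an error, mirroring the remark that f + f is not defined when DD(f) is nonempty.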
A related construction in multifunction semantics is the following:

23 Definition. Let p_1, ..., p_n be guard functions in Pfn(X, X) and let f_1, ..., f_n ∈ Mfn(X, Y). Then the alternative construct is

if p_1 → f_1 □ ··· □ p_n → f_n fi = f_1 p_1 + ··· + f_n p_n ∈ Mfn(X, Y).

We emphasize that the guards here are not required to have disjoint domains. The intended meaning is "pick any i for which the guard p_i is true and execute f_i." The flowscheme is the same as in 22.

The Pascal if-then-else construction is a special case of 22 as follows.

24 Definition. Let A be a subset of X. Define A′ to be the complement of A, that is, A′ = {x ∈ X | x ∉ A}. Then (inc_A, inc_A′) is a two-way test on X. For f, g ∈ SC(X, Y) define

if A then f else g = f inc_A + g inc_A′

in SC(X, Y). Two suitable flowschemes are

[Two flowscheme renderings of a test on A with a true (T) branch to f and a false (F) branch to g]

The sum operation and composition lead to a calculus to manipulate functions. We begin with two basic properties of inclusion functions whose proofs are obvious, and follow this with an example that simplifies a compound conditional statement.

25 Proposition. Let A, B be subsets of X. Then:
(a) inc_A inc_B = inc_{A∩B} = inc_B inc_A.
(b) If A ∩ B = ∅, inc_A + inc_B exists and is inc_{A∪B}.

26 Example. If A, B ⊂ X, f, g, h ∈ SC(X, Y) then
if A then (if B then f else g) else (if A′ then f else h)
= (f inc_B + g inc_B′) inc_A + (f inc_A′ + h inc_A) inc_A′   (since A″ = A)
= f inc_B inc_A + g inc_B′ inc_A + f inc_A′ inc_A′ + h inc_A inc_A′   (by 15)
= f inc_{A∩B} + g inc_{A∩B′} + f inc_A′   (by 25)
= f (inc_{A∩B} + inc_A′) + g inc_{A∩B′}   (by 15 since the sum in parentheses is defined by 25)
= f inc_{(A∩B)∪A′} + g inc_{A∩B′}   (by 25)
= f inc_{(A∩B′)′} + g inc_{A∩B′}
= if A ∩ B′ then g else f.

Repetitive constructs may be defined using infinite sums. For example,

27 Definition. For A ⊂ X, f ∈ SC(X, X) define

while A do f = Σ_{n=0}^∞ inc_A′ (f inc_A)^n ∈ SC(X, X)

(where for g ∈ SC(Y, Y), g^n is defined by g^0 = id_Y, g^{n+1} = g^n g) with one summand for each number n of traversals of the loop in the flowscheme

[Flowscheme: a test on A whose true (T) branch passes through f and returns to the test, and whose false (F) branch exits]

That the sum exists when the semantic category is Pfn is clear from the fact that

28 DD(inc_A′ (f inc_A)^n) = {x ∈ X | x, f(x), ..., f^{n-1}(x) ∈ A, f^n(x) ∉ A},

which ensures that x ∈ DD(inc_A′ (f inc_A)^n) for at most one n.

29 Definition. For A ⊂ X, f ∈ SC(X, X) define

repeat f until A = (while A′ do f) f.

It is easy to use the laws for manipulating sums to deduce a formula like that of 27 (see Exercise 7).
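On a finite set the sum in 27 can be evaluated term by term. The sketch below is a hedged illustration rather than the book's own construction. It assumes partial functions in Pfn(X, X) are modelled as Python dicts, and the names compose, inc, while_do, and repeat_until are hypothetical. It uses the observation that, by 28, a terminating computation from x passes through distinct elements of A, so only the summands with n ≤ |X| can be nonzero.

```python
def compose(g, f):
    """The composite gf (first f, then g) of partial functions given as dicts."""
    return {x: g[y] for x, y in f.items() if y in g}


def inc(A):
    """Inclusion function inc_A."""
    return {x: x for x in A}


def while_do(A, f, X):
    """while A do f = sum over n of inc_A' (f inc_A)^n, for finite X and f in Pfn(X, X)."""
    X = set(X)
    A = set(A) & X
    exit_guard = inc(X - A)      # inc_A'
    body = compose(f, inc(A))    # f inc_A
    power = inc(X)               # (f inc_A)^0 = id_X
    result = {}
    for _ in range(len(X) + 1):  # summands n = 0, ..., |X|
        term = compose(exit_guard, power)
        # By 28 the summands have pairwise disjoint domains, so the sum is defined.
        assert not (result.keys() & term.keys())
        result.update(term)
        power = compose(body, power)
        if not power:            # every later summand is 0
            break
    return result


def repeat_until(A, f, X):
    """repeat f until A = (while A' do f) f."""
    X = set(X)
    return compose(while_do(X - set(A), f, X), f)


# Assumed example data on X = {0, ..., 5}: f is the successor, undefined at 5.
X = range(6)
succ = {x: x + 1 for x in range(5)}
loop = while_do({0, 1, 2}, succ, X)
rep = repeat_until({3, 4, 5}, succ, X)
# A loop that never exits contributes nothing to the sum.
spin = while_do({0, 1}, {0: 1, 1: 0}, {0, 1, 2})
print(loop, rep, spin)
```

Nontermination shows up here simply as absence from the domain of definition: spin is defined only at 2.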
A companion to the multivalued alternative construct of 23 is the multivalued repetitive construct

do p_1 → f_1 □ ··· □ p_n → f_n od

which is intended to mean "pick any i for which guard p_i is true and execute f_i; repeat until no such i exists and then exit." Since the choice of i is multivalued, many successful computation paths of varying length are possible. A suitable formal definition of the semantics is the following:

30 Definition. Let p_1, ..., p_n be guard functions in Pfn(X, X) and let f_1, ..., f_n ∈ Mfn(X, X). Then the multivalued repetitive construct is

do p_1 → f_1 □ ··· □ p_n → f_n od = while A do if p_1 → f_1 □ ··· □ p_n → f_n fi,

where if p_i = inc_{A_i} then A = A_1 ∪ ··· ∪ A_n. This may be expressed as an infinite sum (see Exercise 8).

31 Example. For A ⊂ X, f ∈ SC(X, X), g ∈ SC(X, Y) we have the identity

[Two equivalent flowschemes: the while loop on A and f followed by g, and a test on A whose true branch enters that loop followed by g and whose false branch goes directly to g]

that is,

g (while A do f) = if A then g (while A do f) else g.

A formal proof is as follows, where we make use of 15, 16, and 25.

if A then g (while A do f) else g
= (g Σ_{n=0}^∞ inc_A′ (f inc_A)^n) inc_A + g inc_A′
= g (Σ_{n=0}^∞ inc_A′ (f inc_A)^n inc_A + inc_A′)   (since the sum in parentheses is defined)
= g (Σ_{n=0}^∞ inc_A′ (f inc_A)^n)   (since inc_A inc_A = inc_A while inc_A′ inc_A = 0)
= g (while A do f).

EXERCISES FOR SECTION 1.5

1. If (A_i : i ∈ J) is a family of subsets of Y with union A then x ∈ A if and only if there exists i ∈ J with x ∈ A_i. Conclude that A is empty if J is empty.

2. Show that every multifunction is a sum (in Mfn) of partial functions.

3. Give an example of f, g, h ∈ Pfn(X, Y) such that f + g, f + h, g + h are defined but f + g + h is not defined.

4. Given programs to compute f, g ∈ Pfn(X, Y) one can design an operating system to run both programs simultaneously (e.g., by interleaving steps). Explain why this approach cannot be made to compute f + g.

5. Give an example of f_1, f_2 ∈ Pfn(X, Y), g_1, g_2 ∈ Pfn(W, X) such that f_1 + f_2 is defined but f_1 g_1 + f_2 g_2 does not exist.

6. Give a proof of 25.

7. Using the laws for manipulating sums give a careful proof that

repeat f until A = Σ_{n=0}^∞ (inc_A f)(inc_A′ f)^n.

8. Give a careful proof that

do p_1 → f_1 □ ··· □ p_n → f_n od = Σ_{k=0}^∞ inc_A′ (f_1 p_1 + ··· + f_n p_n)^k,

where A = DD(p_1) ∪ ··· ∪ DD(p_n).

9. Although no mathematically distinguished function in SC(X, X) suggests "abort," we may declare a particular element a ∈ X to be the "abort value" and treat the total function abort(x) = a as the abort function in SC(X, X). Give a modified form of the alternative construct of 23 which aborts if none of the guards is true. Similarly, give a modified form of the multivalued repetitive construct of 30 which aborts if none of the guards is true initially.

10. As in Example 31, draw flowschemes and give a proof of

g (while A do f) = g (while A do (if A then f else g)).

11. Draw flowschemes and give a proof of

while A do f = while A do (while A do f).

12. Let X, X̄ be sets and let A ⊂ X, Ā ⊂ X̄, f ∈ SC(X, X), h, k ∈ SC(X, X̄), f̄, ḡ ∈
SC(X̄, X̄). Assume that k is such that

if A then hf else k = (if Ā then f̄ else ḡ) h ∈ SC(X, X̄).

Show that

k (while A do f) = ḡ (while Ā do f̄) h ∈ SC(X, X̄).

Draw flowschemes for the hypothesis and the conclusion.

Notes and References for Chapter 1

For an operational semantics of Pascal see K. Jensen and N. Wirth, PASCAL User Manual and Report, Springer-Verlag, 1974. Denotational semantics of programming languages stems from the work of D. S. Scott and C. Strachey. See J. Stoy, Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory, MIT Press, 1979, for a textbook account. Assertion semantics was introduced in R. Floyd, "Assigning meanings to programs," in Mathematical Aspects of Computer Science, American Mathematical Society, 1967, pp. 19-32, and C. A. R. Hoare, "An axiomatic basis for computer programming," Communications of the Association for Computing Machinery, 12, 1969, pp. 576-580, 583. A textbook account is given by S. Alagic and M. A. Arbib, The Design of Well-Structured and Correct Programs, Springer-Verlag, 1977. Backus' Turing Award Lecture is published in Communications of the Association for Computing Machinery, 21, 1978, pp. 613-641.

For computability theory based on a Pascal fragment see A. J. Kfoury, R. N. Moll, and M. A. Arbib, A Programming Approach to Computability, Springer-Verlag, 1982. Computable functions were first defined in various equivalent ways in the 1930s, before the computer age. As such, the idea of developing a theory of computable functions without reference to a programming language, as suggested by Section 1.5, is not at all new. What is different about the modern approach is an emphasis on constructions that seem likely candidates for use in defining the semantics of a programming language.

Partially additive semantics was introduced by the present authors in two papers, Journal of Algebra, 62, 1980, pp.
203-227 and Journal of the Association for Computing Machinery, 29, 1982, pp. 577-602. The second of these cites Karp (1959), De Bakker and Meertens (1975), and De Roever (1976) for applying the Mfn sum of 1.5.9 to aspects of semantics.

For a formal proof that the associative law implies that all n-chains, regardless of parenthesization, compose equally, see N. Jacobson, Lectures in Abstract Algebra, Van Nostrand, 1951, pp. 20-21.

The alternative construction and the multivalued repetitive construct are set forth in the book by E. W. Dijkstra, A Discipline of Programming, Prentice-Hall, 1976. While he requires these constructions to have the abort features of our Exercise 9, in fact his constructions coincide with those we have given because his abort function is indistinguishable from nontermination.
CHAPTER 2

An Introduction to Category Theory

2.1 The Definition of a Category
2.2 Isomorphism, Duality, and Zero Objects
2.3 Products and Coproducts

Going beyond the partial functions and multifunctions already considered, one might invent other useful notions of the input/output function from X to Y. In addition to the need to consider X, Y as "data structures," there are theoretical approaches to semantics in which all X, Y must carry further structure. Rather than embark on the misguided task of presenting an exhaustive list of present and future possibilities, we introduce categories as a framework for semantics which possess so little structure that most models of semantics can be represented this way. Surprisingly, what structure remains can be extensively developed and there is a great deal to say.

Category theory per se is tangential to this book. We discuss only a few topics which bear directly on our analysis of the "semantic category." In Section 2.1 we introduce the notion of a category, which provides the bare bones of abstraction of the semantics of composition. Section 2.2 introduces the useful organizing principle of duality and relates it to isomorphisms and to initial and terminal objects. Isomorphism is self-dual and initial is dual to terminal. The uniqueness of initial objects has important instantiations in semantics, such as the uniqueness of a sequence defined by simple recursion. Zero objects are simultaneously initial and terminal and generalize the empty set in Pfn. To round out this introduction to category theory we present, in Section 2.3, the notion of product and the dual concept of coproduct, which both find frequent applications throughout the book. With this, we have all the category theory needed for our study of program semantics in Chapter 3. Further category theory is developed in Chapter 4 as motivated by the issues raised by attempting to describe assertion semantics in a semantic category.
When we turn to the study of data types in Part 3 we shall need to call
on further concepts from category theory: functors, limits, and algebraic theories.

The concepts in this chapter are quite abstract and may seem so even to readers with experience in pure mathematics. We encourage patience! Familiarity with the language will grow and the approach should come to seem increasingly natural with the applications to semantics in subsequent chapters.

2.1 The Definition of a Category

A "category" is an abstraction of "sets and functions between them." In a category sets become "objects," abstract things with no internal structure. There are sets in the theory, however; namely, for each two objects X, Y there is a set of "morphisms" from X to Y. These morphisms compose in an associative way, and there are identity morphisms. The motivating examples for us are 2 and 3 below. Here, then, is the precise definition:

1 Definition. A category C is given by data (i), (ii), (iii) subject to axioms (a), (b), (c) as follows.

Datum i. A collection ob(C) of C-objects X, Y, Z, ....

Datum ii. For each ordered pair of objects (X, Y) a set C(X, Y) of C-morphisms from X to Y. We use the term map as a synonym for morphism.

Axiom a. The sets C(X, Y) are disjoint: if C(X, Y) ∩ C(X′, Y′) ≠ ∅, then X = X′ and Y = Y′.

We will rarely say f ∈ C(X, Y), introducing instead the following two synonymous notations: f: X → Y, or an arrow from X to Y labeled f. Here X is called the domain of f and Y is the codomain of f. Axiom a guarantees that this definition makes sense, that is, there will never be any ambiguity concerning the domain or codomain of a morphism.

Datum iii. A composition operator ∘ assigning to each ordered pair of morphisms (f, g) of form f: X → Y, g: Y → Z (i.e., the codomain of f coincides with the domain of g) a third morphism g ∘ f: X → Z whose domain is that of f and whose codomain is that of g.

Axiom b. Composition is associative, that is, given f: X → Y, g: Y → Z, h: Z → W,

(h ∘ g) ∘ f = h ∘ (g ∘ f): X → W.

Axiom c.
For each object X there exists an identity morphism id_X: X → X with domain and codomain X with the property that for each morphism f: Y → X, id_X ∘ f = f, and for each morphism g: X → Z, g ∘ id_X = g.

This completes the definition. We observe at once that the id_X of axiom c
is unique. For suppose also that u: X → X satisfies u ∘ f = f for all f: Y → X and g ∘ u = g for all g: X → Z. Regarding u as an f for id_X, id_X ∘ u = u. Regarding id_X as a g for u, id_X ∘ u = id_X. Thus, u = id_X. Hence, id_X is well named as the identity morphism of X.

As is usual for mathematical structures generally, a host of alternate notations may prove useful. Thus, composition might be denoted g * f instead of g ∘ f for some categories. Since composition is the basic operation of category theory we shall most often write composition with no symbol at all, as gf. We shall almost always stick to id_X for the identity morphism of X.

Even in our first examples, different categories may share the same objects and even the same morphisms. In such situations different arrows such as f: X ⇀ Y may be used and alternate notation for composition may be essential.

2 Example. Set, the category of sets and total functions. Here objects are sets, a morphism f: X → Y is a total function from X to Y, composition is the usual one, (gf)(x) = g(f(x)), and id_X(x) = x.

3 Example. Pfn, the category of sets and partial functions. Here objects are sets but a morphism f: X → Y is a partial function from X to Y. Composition is as in 1.3.7. The identity (total) function still provides id_X. Note that Pfn(X, Y) in the sense of Definition 1 is exactly Pfn(X, Y) as in 1.3.4.

4 Example. Mfn, the category of sets and multivalued functions. Here objects are sets, Mfn(X, Y) is as in 1.4.3 with composition given by 1.4.4, and id_X(x) = {x}.

5 Example. ANMfn, the category of sets and multivalued functions with "all or nothing" composition. In this example, objects are sets and ANMfn(X, Y) = Mfn(X, Y), but composition gf: X → Z for f: X → Y, g: Y → Z is defined by

(gf)(x) = ∅ if g(y) = ∅ for some y ∈ f(x),
(gf)(x) = {z ∈ Z : there exists y ∈ f(x) with z ∈ g(y)} else.
(This is "all or nothing" in the sense that scenario 1.4.5 has been modified so that no output is defined if any computation fails to terminate.) The identity morphism id_X is the same as in Mfn. Thus, the only difference between ANMfn and Mfn is composition.

Examples 2-5 are categories. For all but ANMfn, axiom b has been established in Section 1.4; we leave the modification of properties 1.4.6 to ANMfn as an exercise. Axiom c is routine. Axiom a holds by definition: we consider the domain and codomain as part of the definition of a function. In the student's likely first encounter with functions, elementary calculus, axiom a is not made explicit. Formulas such as x² are confused with functions and one speaks one moment of "x² for −1 ≤ x ≤ 10" and the next moment of "x²
for 2 ≤ x ≤ 3." According to our conventions these are different functions. This is reasonable since these functions have different properties; for example, the second is monotone increasing where the first is not.

We again avoid a formal proof that repeated use of the associative law, axiom b, establishes that all n-fold compositions are equal regardless of parenthesization and so can be written without parentheses as f_n ··· f_1. (Example 1.4.8 clearly goes through in any category.) The commutative diagram designation such as 1.5.4 is useful in any category. Thus, in the diagram

[Diagram: two paths from X to Y, one through A with arrows a, b, the other through B and D with arrows f, g, h]

we understand that "ba = hgf" is asserted and we may emphasize this assertion by saying "the diagram commutes."

When one regards a category as "the semantic category" generalizing 1.5.1 (with 3, 4, and 5 being examples), the flowscheme notation of 1.5.2, clearly a workable synonym for f: X → Y in any category, is useful. In practice, however, many other types of category arise. Experience dictates that virtually any class of structures can be made the objects of a category in a "natural" way. Some of the possibilities are explored in the exercises. We turn now to examples of categories that are useful in this book but not necessarily as "semantic" categories.

6 Definitions. A partially ordered set, poset for short, is a pair (P, ≤) where P is a set and ≤ is a binary relation on P which is a partial order on P. This is defined to mean that the following three axioms hold for all x, y, z ∈ P.

Reflexivity: x ≤ x.
Transitivity: if x ≤ y and y ≤ z then x ≤ z.
Antisymmetry: if x ≤ y and y ≤ x then x = y.

We emphasize that the symbol ≤ has no a priori meaning. Any relation satisfying the three axioms is a partial order, and many different partial orders may be of interest on one set. While other symbols could be used (for example, xRy instead of x ≤ y), the ≤ symbol gives rise to the following associated definitions.
In a poset (P, ≤) say that x < y if x ≤ y but x ≠ y, x ≥ y if y ≤ x, x > y if y < x, and x ≰ y if it is false that x ≤ y (warning: this is not equivalent to x > y; see the Hasse diagram below).
x ≮ y, x ≱ y, x ≯ y are defined similarly. It is not so clear how to obtain similar conventions with the symbol R.

A useful device for drawing finite posets galore is the Hasse diagram, an example of which is

[Hasse diagram on the nodes a, b, c, d, e]

Here P is the set of nodes (= dark circles); P = {a, b, c, d, e} in this example. The partial order is defined by x ≤ y if and only if x = y, or x is below y and there exists an upward path from x to y. It is easy to see that (P, ≤) is always a poset. In the above example a ≤ b, a ≤ d, while b, c are incomparable because b ≰ c and c ≰ b.

A totally ordered set is a partially ordered set (P, ≤) in which every two elements are comparable: given x, y, at least one of x ≤ y or y ≤ x holds. The term partially ordered set refers to the possibility that incomparable pairs may exist.

Posets are fundamental structures arising frequently in mathematics and theoretical computer science. They play several roles in this book. Here are some examples of posets:

7 Example. If N = {0, 1, 2, ...} is the set of natural numbers and ≤ has its usual meaning, (N, ≤) is a totally ordered set.

8 Example. If Y is any set and 𝒫(Y) is the set of subsets of Y (1.4.2) then (𝒫(Y), ⊂) is a poset where A ⊂ B is the usual subset inclusion. Note that we may have

[Picture: two overlapping subsets A and B of Y, neither contained in the other]

that is, A, B ∈ 𝒫(Y) but neither A ⊂ B nor B ⊂ A holds. Thus, if Y has two or more elements, (𝒫(Y), ⊂) is not totally ordered.

9 Example. For any two sets X, Y and f, g ∈ Pfn(X, Y) define f ≤ g to mean g extends f, that is, "if f(x) is defined then g(x) is also defined and then g(x) = f(x)." Then (Pfn(X, Y), ≤) is a poset which is not totally ordered. This example is important in Section 5.1.
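For small finite X and Y the claims of Example 9 can be checked by brute force. The following Python sketch is offered only as an illustration under stated assumptions: partial functions are modelled as dicts, and the names extends, partial_functions, is_partial_order, and is_total are hypothetical helpers, not notation from the text.

```python
from itertools import product


def extends(f, g):
    """f <= g in the sense of Example 9: g is defined wherever f is, with the same value."""
    return f.items() <= g.items()


def partial_functions(X, Y):
    """All partial functions from the finite set X to the finite set Y, as dicts."""
    X, Y = sorted(X), sorted(Y)
    undefined = object()  # marker for 'no value at this argument'
    return [
        {x: v for x, v in zip(X, values) if v is not undefined}
        for values in product([undefined] + Y, repeat=len(X))
    ]


def is_partial_order(elements, leq):
    """Check reflexivity, transitivity, and antisymmetry of leq on a finite list."""
    reflexive = all(leq(x, x) for x in elements)
    transitive = all(
        leq(x, z)
        for x in elements for y in elements for z in elements
        if leq(x, y) and leq(y, z)
    )
    antisymmetric = all(
        x == y for x in elements for y in elements if leq(x, y) and leq(y, x)
    )
    return reflexive and transitive and antisymmetric


def is_total(elements, leq):
    """Check that every two elements are comparable."""
    return all(leq(x, y) or leq(y, x) for x in elements for y in elements)


# Assumed example: X = Y = {0, 1}, giving (2 + 1)^2 = 9 partial functions.
pfns = partial_functions({0, 1}, {0, 1})
print(len(pfns), is_partial_order(pfns, extends), is_total(pfns, extends))
```

The two functions {0: 0} and {0: 1} are incomparable in this ordering, which is why the check for totality fails.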
Partially ordered sets form a category:

10 Example. Define Poset to be the category whose objects are posets and with Poset((P, ≤), (P_1, ≤_1)) the set of all total functions f: P → P_1 which are monotone in the sense that if x_1 ≤ x_2 then f(x_1) ≤_1 f(x_2). Composition and identity morphisms are as in Set. The reader should check that Poset does then satisfy the category axioms.

We next introduce another important mathematical structure.

11 Definition. A monoid is a triple (M, ∘, e) where M is a set, ∘: M × M → M is a function, and e ∈ M, all subject to the axioms

∘ is associative: (x ∘ y) ∘ z = x ∘ (y ∘ z) for all x, y, z in M.
e is the identity: e ∘ x = x ∘ e = x for all x in M.

As for categories, the composition of x_1, ..., x_n is written without parentheses as x_1 ∘ ··· ∘ x_n.

12 Example. For any category C and any object X ∈ ob(C), the set C(X, X) of all morphisms of X to itself forms a monoid under composition, with identity id_X.

13 Example. An example of a monoid familiar from formal language theory is (X*, conc, Λ), where X* is the set of all finite strings (x_1, ..., x_m), m ≥ 0, with each x_i in the given "alphabet" X. Here conc is the operation of concatenation,

conc((x_1, ..., x_m), (y_1, ..., y_n)) = (x_1, ..., x_m, y_1, ..., y_n),

and Λ = ( ) is the empty string (= (x_1, ..., x_m) with m = 0).

14 Example. The category Mon has monoids as objects and monoid homomorphisms as morphisms. Here, given two monoids (M, ∘, e) and (M′, *, e′), we say that a function f: M → M′ is a monoid homomorphism f: (M, ∘, e) → (M′, *, e′) if f(e) = e′, while f(x ∘ y) = f(x) * f(y) for all x, y in M. We define composition and identity as for functions. The reader should check that Mon does indeed satisfy the category axioms.

15 Definition. Let C be any category and let 𝒟 be any subclass of ob(C). Define a category D by ob(D) = 𝒟, D(X, Y) = C(X, Y) for each X, Y in 𝒟, with composition and identities the same as in C. A routine check shows D is a category.
We call it the full subcategory induced by 𝒟. ("Full" refers to the fact that all C-morphisms between objects in 𝒟 have been retained.) Since no restrictions have been imposed on 𝒟, full subcategories give rise to a rich supply of new categories. Even more generally:
16 Definition. Let C be a category. A subcategory D of C is given by a subclass ob(D) of ob(C) and, for each X, Y in ob(D), a subset D(X, Y) of C(X, Y) subject to the axioms that id_X ∈ D(X, X) and, whenever f ∈ D(X, Y), g ∈ D(Y, Z), then gf ∈ C(X, Z) is, in fact, in D(X, Z).

It is obvious that such D, with the composition inherited from C, satisfies axioms a, b, c of the definition of a category. Thus, a subcategory is a category in its own right. Clearly, a subcategory D of C is a full subcategory if and only if D(X, Y) = C(X, Y) for all X, Y ∈ ob(D).

17 Example. Set is a (nonfull) subcategory of Pfn since ob(Set) = ob(Pfn), Set(X, Y) ⊂ Pfn(X, Y), id_X ∈ Pfn(X, X) is the total identity function, and if f, g are composable total functions their composition gf as partial functions is their composition as total functions.

EXERCISES FOR SECTION 2.1

1. Find a formula analogous to 1.4.7 for the composition of three multifunctions in ANMfn. Do the same for n multifunctions.

2. Repeat Exercise 1.4.5 in any category.

3. The category Vect has real vector spaces as objects and linear maps as morphisms, with composition and identity morphisms as in Set. Verify that this is a category.

4. Let C be any category and let X be any object of C. Define a category D as follows. A D-object is a C-morphism of form f: A → X. A D-morphism from f: A → X to f_1: A_1 → X is a C-morphism g: A → A_1 such that the following diagram commutes:

[Commutative triangle with arrows g: A → A_1, f: A → X, f_1: A_1 → X]

Define composition and identity morphisms as in C. Verify that D is a category. It is called the category of C-objects over X.

5. Let C be any category. Let

[Diagram: C-morphisms X → Z ← Y]

be C-morphisms. Define a category D as follows. A D-object is (S, t, u) where t: S → X, u: S → Y are C-morphisms such that
[Commutative square with vertices S, X, Y, Z and arrows t: S → X, u: S → Y, f: X → Z, and the given morphism Y → Z]

A D-morphism from (S, t, u) to (S_1, t_1, u_1) is a C-morphism α: S → S_1 such that

[Commutative diagram with arrows t: S → X, u: S → Y, α: S → S_1, t_1: S_1 → X, u_1: S_1 → Y]

Define composition and identity morphisms as in C. Verify that D is a category.

6. Let C be any category. Let D be the category of commutative squares of C defined as follows. A D-object is a commutative square (A, B, C, D, r, s, t, u):

[Commutative square on the objects A, B, C, D with arrows r, s, t, u]

A D-morphism from (A, B, C, D, r, s, t, u) to (A_1, B_1, C_1, D_1, r_1, s_1, t_1, u_1) is a 4-tuple (α, β, γ, δ) where α: A → A_1, β: B → B_1, γ: C → C_1, δ: D → D_1 such that the following "commutative cube" obtains:

[Commutative cube whose two opposite faces are the given squares, joined by the arrows α, β, γ, δ]

Define composition as in C, that is, componentwise, and similarly let (id_A, id_B, id_C, id_D) be the identity morphism. Verify that D is a category.

7. Working this exercise will give the reader a head start on later work in data types. Let (P, ≤) be a poset. Define a category C_(P,≤) whose objects are the elements of P and whose morphisms are the assertions that x ≤ y. More formally,

C_(P,≤)(x, y) = {x → y} if x ≤ y, and ∅ else.

Thus, there is a morphism x → y (and it is then unique) just in case x ≤ y. Define composition by (y → z)(x → y) = x → z and define id_x = x → x.
Verify that C_(P,≤) is a category. How do the poset axioms correspond to the category axioms here?

8. Let X be any set. Show that x ≤ y if x = y is a partial order on X, the discrete ordering. C_(X,=) as in Exercise 7 is called the discrete category on X.

9. Let X* be as in 13. Show that the "prefix ordering" w ≤ v if w is a prefix of v, that is, v = conc(w, u) for some u, is a partial order on X*.

10. Recalling the discussion following 1.4.10, it is useful to have a semantic category Mfn_∞ for which the objects are sets but such that Mfn_∞(X, Y) = Tot(X, 𝒫(Y) ∪ {∞}) where ∞ is any object not in 𝒫(Y). When f ∈ Mfn_∞(X, Y), f(x) = ∅ means "termination with the empty set of values," whereas f(x) = ∞ means "nontermination," thereby extending the scenario of 1.4.5. Capture the spirit of this extended scenario by providing a definition for composition and identities that make Mfn_∞ a category.

11. Let (M, ∘, e) be any monoid. We think of elements of M as "degrees of reliability" with e being "totally reliable." A partial function with reliability from X to Y is a partial function f: X → M × Y, where M × Y is the set of all ordered pairs (a, y) with a ∈ M, y ∈ Y. If f(x) = (a, y) we say "f(x) = y with reliability a."

(i) An intuitive choice of monoid is M = [0, 1], the interval of all real numbers a, 0 ≤ a ≤ 1, with a ∘ b ordinary numerical multiplication and e = 1. Verify that this is a monoid. (Here f(x) = (a, y) might be interpreted "f(x) = y with probability a.")

(ii) For any monoid (M, ∘, e) verify that the following defines a category.

Objects: sets.
Morphism X → Y: a partial function X → M × Y.
Identity morphism X → X: DD(id_X) = X, id_X(x) = (e, x).
Composition: given f: X → M × Y, g: Y → M × Z, "if f(x) = y with reliability a and g(y) = z with reliability b then gf(x) = z with reliability ba," that is, g(f(x)) is defined if and only if f(x) = (a, y) is defined and g(y) = (b, z) is defined, and then gf(x) = (ba, z).

The resulting category is denoted FwR_(M,∘,e).

(iii) Given f ∈ Pfn(X, Y) define f^ ∈ FwR_(M,∘,e)(X, Y) by DD(f^) = DD(f), f^(x) = (e, f(x)). A morphism of form f^ is said to be reliable. Prove that (gf)^ = g^ f^ and that the reliable morphisms constitute a subcategory of FwR_(M,∘,e).

2.2 Isomorphism, Duality, and Zero Objects

This section introduces the fundamental equivalence relation of category theory, isomorphism. Also discussed are duality, initial and terminal objects, as well as zero objects, which are simultaneously initial and terminal. The
Cartesian product of two sets, the set of words on a given alphabet, and the principle of simple recursion are all manifestations of initial or terminal objects.

Isomorphisms

All constructions in a category must ultimately be described entirely in the language of objects, morphisms, composition, and identities. Our first definition in this language is that of "isomorphism."

1 Definition. A morphism f: X → Y in a category C is an isomorphism if there exists g: Y → X with gf = id_X, fg = id_Y or, in terms of a commutative diagram,

[Commutative diagram: X → Y → X → Y with arrows f, g, f, in which the composites gf and fg are id_X and id_Y]

Such g, if it exists, is unique, since if also hf = id_X, fh = id_Y then

g = g id_Y = g(fh) = (gf)h = id_X h = h.

This proof uses the full force of axioms b and c in the definition of a category. Such g, then, is called the inverse of f and is written f⁻¹.

2 Example. In Set, f: X → Y is an isomorphism if and only if f is bijective, that is, f is one-one and onto. To see this, first suppose f is an isomorphism. If f(x) = f(x′) then x = f⁻¹(f(x)) = f⁻¹(f(x′)) = x′, which proves f is one-one (injective). If y is any element of Y, y = f(f⁻¹(y)), so f is onto (surjective). Conversely, let f be injective and surjective. If y is any element of Y there exists a unique element of X, call it g(y), which f maps to y. Thus, f(g(y)) = y. Since, in particular, f(g(f(x))) = f(x) and f is injective, g(f(x)) = x.

3 Example. In Pfn, f: X → Y is an isomorphism if and only if f is a total function which is bijective. By the first example it is obvious that a bijective total function is an isomorphism. Conversely, let f: X → Y be an isomorphism. Then f⁻¹f = id_X so that X = DD(id_X) = DD(f⁻¹f) ⊂ DD(f), which implies f is a total function. Similarly, ff⁻¹ = id_Y implies f⁻¹ is total, so f is an isomorphism in Set.

4 Example. In Mfn, f: X → Y is an isomorphism if and only if f is a total function which is bijective. As in Example 3, one way is clear. Conversely, let f: X → Y be an isomorphism.
First observe that if y ∈ f(x) then x ∈ f⁻¹(y), since ff⁻¹(y) = {y} implies that f⁻¹(y) ≠ ∅ and, if x′ ∈ f⁻¹(y), then x′ ∈ f⁻¹f(x) = {x} so that x′ = x. Then, if y, y′ ∈ f(x) then x ∈ f⁻¹(y) and y′ ∈ f(x), so y′ ∈ ff⁻¹(y) = {y} and y′ = y. This proves f is a partial function. Symmetrically, f⁻¹ is a partial function. Now use the preceding example.
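Examples 2 and 3 can be tested mechanically on finite sets. The sketch below is an illustration only, with assumptions stated here: partial functions are modelled as Python dicts, and the names compose, identity, and inverse_in_pfn are hypothetical. Since an inverse is unique when it exists, it suffices to test the single candidate obtained by reversing the pairs of f against the two equations gf = id_X and fg = id_Y.

```python
def compose(g, f):
    """The composite gf (first f, then g) of partial functions given as dicts."""
    return {x: g[y] for x, y in f.items() if y in g}


def identity(X):
    """The identity morphism id_X of Pfn."""
    return {x: x for x in X}


def inverse_in_pfn(f, X, Y):
    """Return the inverse of f: X -> Y in Pfn, or None if f is not an isomorphism."""
    candidate = {y: x for x, y in f.items()}  # reverse every pair (x, f(x))
    if compose(candidate, f) == identity(X) and compose(f, candidate) == identity(Y):
        return candidate
    return None


# Assumed example data.
bijection = inverse_in_pfn({1: "a", 2: "b"}, {1, 2}, {"a", "b"})
not_total = inverse_in_pfn({1: "a"}, {1, 2}, {"a"})
not_injective = inverse_in_pfn({1: "a", 2: "a"}, {1, 2}, {"a"})
not_surjective = inverse_in_pfn({1: "a"}, {1}, {"a", "b"})
print(bijection, not_total, not_injective, not_surjective)
```

Only the total bijection has an inverse; a partial function that is undefined somewhere fails the equation gf = id_X, in line with the argument of Example 3.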
5 Definition. Two objects X, Y in a category C are isomorphic if there exists an isomorphism f: X → Y. This is written X ≅ Y.

6 Observation. Isomorphism is an equivalence relation on ob(C).

PROOF. id_X: X → X is an isomorphism with id_X⁻¹ = id_X so that X ≅ X, and isomorphism is reflexive. If f: X → Y is an isomorphism, so is f⁻¹: Y → X, so that isomorphism is symmetric. To see that transitivity holds, if f: X → Y and g: Y → Z are isomorphisms, then (gf)(f⁻¹g⁻¹) = g(ff⁻¹)g⁻¹ = gg⁻¹ = id_Z and (f⁻¹g⁻¹)(gf) = id_X similarly, so that gf is an isomorphism (and (gf)⁻¹ = f⁻¹g⁻¹). □

As a rule, definitions and constructions in category theory (beginning with 7 below; see Theorem 8) are not unique but are "unique up to isomorphism." Thus, a major aspect of the philosophy of category theory is that "isomorphism" formalizes "abstractly the same."

Each theorem of category theory has a "dual theorem" whose proof is an automatic consequence of the original, obtained by "reversing the arrows." Before giving the general notion of duality, we explore the motivating duality of initial and terminal objects.

7 Definition. An object A in a category C is initial if for every object X there exists exactly one morphism from A to X. We denote this unique morphism by !: A → X.

The next result, simple as its proof may be, is one of the most fundamental in category theory because it turns out that many important constructs can be shown to be equivalent to initial objects in suitable categories.

8 Theorem. If A, B are both initial objects in a category C then !: A → B is an isomorphism. Thus, if C has an initial object it is unique up to a unique isomorphism.

PROOF. As any two morphisms from A to A are equal, and similarly from B to B, the following diagram commutes:

[Commutative diagram: the composites A → B → A and B → A → B of the unique morphisms ! equal id_A and id_B, respectively] □

9 Example. The empty set ∅ is initial in Pfn with !: ∅ → X the totally undefined function.
Since this function is total (else it would be undefined on some element of $\emptyset$), $\emptyset$ is also initial in Set. Moreover, !: $\emptyset \to X$ is not only the unique partial function but also the unique multifunction (by 1.4.3), so $\emptyset$ is again the initial object of Mfn and ANMfn.
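For finite sets, the claim of Example 9 can be checked by brute force. The following is a minimal sketch, not part of the text: the helper names `total_functions` and `partial_functions` are hypothetical, and functions are represented as Python dicts (an assumption made only for illustration). It enumerates all morphisms between two finite sets in Set and in Pfn, so that one can see there is exactly one morphism out of the empty set.

```python
from itertools import product

def total_functions(A, X):
    """All total functions A -> X between finite sets, each as a dict."""
    A = sorted(A)
    return [dict(zip(A, values)) for values in product(sorted(X), repeat=len(A))]

def partial_functions(A, X):
    """All partial functions A -> X: dicts defined on some subset of A."""
    A = sorted(A)
    UNDEF = object()  # marker meaning "undefined at this argument"
    choices = [UNDEF] + sorted(X)
    return [{a: v for a, v in zip(A, values) if v is not UNDEF}
            for values in product(choices, repeat=len(A))]

X = {1, 2, 3}
# Exactly one morphism out of the empty set, in Set and in Pfn alike:
# the empty dict, i.e. the totally undefined function.
only_total = total_functions(set(), X)
only_partial = partial_functions(set(), X)
# By contrast, there is no total function from a nonempty set to the empty set.
none_at_all = total_functions({1}, set())
```

Here `only_total` and `only_partial` both consist of the single empty dict, while `none_at_all` is the empty list.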
2.2 Isomorphism, Duality, and Zero Objects

For each construction defined in a general category $\mathbf{C}$, the dual construction is the construction obtained by "reversing all arrows." An initial object is one admitting unique morphisms from itself, so the dual concept should be an object admitting unique morphisms to itself, and such is aptly called a "terminal object."

10 Definition. An object $A$ in a category $\mathbf{C}$ is terminal if for each object $X$ of $\mathbf{C}$ there exists exactly one $\mathbf{C}$-morphism from $X$ to $A$. The unique $\mathbf{C}$-morphism from $X$ to $A$ will be denoted ¡: $X \to A$.

As another exercise in the language of duality, consider the notion of an isomorphism. We saw that $f: X \to Y$ is an isomorphism just in case there is a $g: Y \to X$ such that the following commutes:

[Diagram: $Y \xrightarrow{g} X \xrightarrow{f} Y$ composes to $\mathrm{id}_Y$, and $X \xrightarrow{f} Y \xrightarrow{g} X$ composes to $\mathrm{id}_X$.]

If we reverse all the arrows, we say that $f: Y \to X$ is "the dual of an isomorphism" just in case there is a $g: X \to Y$ such that the following commutes:

[Diagram: the previous diagram with every arrow reversed.]

But this just says that $f$ is an isomorphism, so the concept of isomorphism is self-dual. With this observation, the dual of Theorem 8 is the following:

11 Theorem. If $A$ and $B$ are both terminal objects, then ¡: $A \to B$ is an isomorphism.

PROOF. Once the concept of duality is understood, no proof is needed: just reverse all arrows in the proof of Theorem 8. □

We are now going to place the concept of duality on a more formal footing, by regarding a diagram in the category $\mathbf{C}$ with all its arrows reversed as being identical to a diagram in the "opposite category" $\mathbf{C}^{op}$. Here the abstraction of our general definition of a category begins to show its power. A function $f: X \to Y$ from a set $X$ to a set $Y$ is certainly not to be considered as a function from $Y$ to $X$, but there is nothing to prevent us using the "arrow-reversed notation" $f: Y \mathrel{-\!\!\prec} X$ for $f$ (using a distinctive new arrowhead) and calling this a morphism from $Y$ to $X$ in the new category $\mathbf{Set}^{op}$. Here is the general definition.
12 Definition. Let $\mathbf{C}$ be a category. The dual or opposite category of $\mathbf{C}$ is the category $\mathbf{C}^{op}$ defined as follows: $\mathrm{ob}(\mathbf{C}^{op}) = \mathrm{ob}(\mathbf{C})$, $\mathbf{C}^{op}(X, Y) = \mathbf{C}(Y, X)$. Taking $\mathbf{C}$ as the "primary" category whose arrows we write in the normal way $f: X \to Y$, we write the same morphism in $\mathbf{C}^{op}$ as $f: Y \mathrel{-\!\!\prec} X$. If $f \in \mathbf{C}^{op}(X, Y)$ and $g \in \mathbf{C}^{op}(Y, Z)$, their composition $g * f$ in $\mathbf{C}^{op}(X, Z)$ is obtained by taking the composition $f \circ g$ of $g \in \mathbf{C}(Z, Y)$ and $f \in \mathbf{C}(Y, X)$ in $\mathbf{C}$:

$$X \overset{f}{-\!\!\prec} Y \overset{g}{-\!\!\prec} Z \;=\; X \overset{g * f}{-\!\!\prec} Z \;=\; X \overset{f \circ g}{-\!\!\prec} Z \ \text{ in } \mathbf{C}^{op}, \quad\text{where}\quad Z \xrightarrow{g} Y \xrightarrow{f} X \;=\; Z \xrightarrow{f \circ g} X \ \text{ in } \mathbf{C}.$$

Axioms a, b, and c for $\mathbf{C}^{op}$ follow easily from their correspondents in $\mathbf{C}$. The identity morphisms of $\mathbf{C}^{op}$ coincide with those of $\mathbf{C}$. Moreover, rephrasing our earlier observation that isomorphism is self-dual, $f \in \mathbf{C}(X, Y)$ is an isomorphism in $\mathbf{C}$ if and only if the same $f$ considered in $\mathbf{C}^{op}$ is an isomorphism in $\mathbf{C}^{op}$.

PROOF. The diagrams (in which equally labeled commutative diagrams are equal statements in their respective categories)

[Diagrams: the triangles (A) $gf = \mathrm{id}_X$ and (B) $fg = \mathrm{id}_Y$ drawn in $\mathbf{C}$, and the same triangles (B), (A) drawn with reversed arrows in $\mathbf{C}^{op}$.]

establish the assertion about isomorphisms. □

Clearly, $\mathbf{C} = (\mathbf{C}^{op})^{op}$. There is nothing special about being of the form $\mathbf{C}^{op}$: $\mathbf{C}^{op}$ ranges over all categories as $\mathbf{C}$ does. When $\mathbf{C}$ is a "concrete" category such as Set there is no guarantee that $\mathbf{C}^{op}$ will likewise have such a representation. $\mathbf{Set}^{op}$ is "more abstract" than Set.

We now see that our earlier definition, for each construction in a general category $\mathbf{C}$, that the dual construction is that obtained by "reversing all arrows" becomes

13 Definition. Given a construction A in $\mathbf{C}$, the dual construction is that obtained by performing the construction in $\mathbf{C}^{op}$, and then interpreting the construction in $\mathbf{C}$. We will often refer to the dual of the A construct as the co-A construct.

Thus, isomorphism = co-isomorphism, co-initial = terminal, and coterminal = initial. With this, let us use $\mathbf{C}^{op}$ in spelling out a full proof of
Theorem 11: By definition, $A$, $B$ are initial objects in $\mathbf{C}^{op}$. By Theorem 8, !: $B \mathrel{-\!\!\prec} A$ is an isomorphism in $\mathbf{C}^{op}$. But we have already shown that $f: Y \mathrel{-\!\!\prec} X$ is an isomorphism in $\mathbf{C}^{op}$ if and only if $f: X \to Y$ is an isomorphism in $\mathbf{C}$. Thus, the unique morphism $A \to B$ (which we choose to call ¡ rather than !) is an isomorphism in $\mathbf{C}$.

14 Example. In Set, a terminal object is a one-element set. Hence, Set has many different terminal objects but all are isomorphic (as they must be by Theorem 11). Thus, while the "abstract theory of initial objects" and the "abstract theory of terminal objects" should be regarded as the same (whatever we can state and prove about initial objects in $\mathbf{C}$ is automatically stated and proved, dually, for terminal objects in $\mathbf{C}^{op}$, and $\mathbf{C}^{op}$ ranges over all categories as $\mathbf{C}$ does), in a particular example such as Set initial objects and terminal objects behave differently.

15 Example. Let $\mathbf{D}$ be the full subcategory of Set whose objects are sets with two or more elements. While the initial object $\emptyset$ of Set is not in $\mathbf{D}$, this does not in itself prove that $\mathbf{D}$ has no initial object (see Exercise 2(d)). In fact, if $A$ is any object of $\mathbf{D}$ there are at least two morphisms $A \to A$. This proves that no object of $\mathbf{D}$ is either initial or terminal.

Before introducing zero objects we consider the more general concept of zero morphisms, which abstract "totally undefined" morphisms in Pfn.

16 Definition. Let $\mathbf{C}$ be any category, and let $0_{XY} \in \mathbf{C}(X, Y)$ be given for each $X$, $Y$. Say that $(0_{XY})$ is a family of zero morphisms if for every $f: W \to X$, $g: Y \to Z$ we have

$$g \, 0_{XY} \, f = 0_{WZ}.$$

On taking $f$ or $g$ equal to the identity, we see that this amounts to saying "any composition which has a zero factor is itself zero." Set does not have a family of zero morphisms because $\mathbf{Set}(X, \emptyset)$ is empty whenever $X \neq \emptyset$.
When a family of zero morphisms exists, however, it is unique, since if $(0_{XY})$, $(Z_{XY})$ are both families of zero morphisms,

$$Z_{XY} = \mathrm{id}_Y \, Z_{XY} \, (0_{XX} \, \mathrm{id}_X) = (\mathrm{id}_Y \, Z_{XY}) \, 0_{XX} \, \mathrm{id}_X = 0_{XY}.$$

We often write $0: X \to Y$ for $0_{XY}: X \to Y$ if no confusion would arise.

17 Example. In Pfn, Mfn, and ANMfn, the totally undefined functions $0_{XY} = X \to \emptyset \to Y$ (the composite of ¡: $X \to \emptyset$ with !: $\emptyset \to Y$) yield a family of zero morphisms. This example motivates the next definition and proposition.
18 Definition. A zero object in a category $\mathbf{C}$ is an object that is both initial and terminal. We denote a zero object by $0$, the same symbol as for an initial object. Though arbitrary, the convention is standard.

19 Proposition. A category with a zero object has zero morphisms. In a category with zero morphisms, each initial object is also a zero object and each terminal object is also a zero object.

PROOF. For the first statement, let $0$ be a zero object and define $0_{XY}$ to be the composite $X \to 0 \to Y$ of ¡: $X \to 0$ with !: $0 \to Y$. Then

[Diagram: for $f: W \to X$ and $g: Y \to Z$, the composite $W \to X \to 0 \to Y \to Z$ coincides with $W \to 0 \to Z$, by uniqueness of morphisms into and out of $0$.]

commutes, so $(0_{XY})$ is a family of zero morphisms. For the second statement, let zero morphisms exist and let $0$ be an initial object. There exists at least one morphism $X \to 0$, namely, the zero morphism $0$. As $0$ is initial, $0: 0 \to 0$ equals $\mathrm{id}_0: 0 \to 0$, so if $f: X \to 0$ is arbitrary we have $f = \mathrm{id}_0 f = 0 f = 0$. This shows $0$ is terminal. That a terminal object is initial is simply the dual statement. □

20 Example. Pfn, Mfn, and ANMfn have $\emptyset$ as zero object. The construction in Example 17 follows the proof of 19.

In Pfn, we have said $f: X \to Y$ is total iff $DD(f) = X$, that is, $f(x)$ is defined for every $x \in X$. In the same spirit, say that $f: X \to Y$ in Mfn or ANMfn is total if $f(x) \neq \emptyset$ for all $x \in X$. It is easy to prove (work Exercise 7!) that these definitions are unified by the following abstract one.

21 Definition. Let $\mathbf{C}$ be a category with zero morphisms. Say that $f: X \to Y$ is total if, whenever $t: T \to X$, we have that $t \neq 0$ implies $ft \neq 0$.

The following results, obviously true in Pfn, Mfn, and ANMfn, hold for total morphisms in general.

22 Proposition. Let $\mathbf{C}$ be a category with zero morphisms and let $f: X \to Y$, $g: Y \to Z$. Then
(i) If $f$, $g$ are total, so is $gf: X \to Z$.
(ii) If $gf$ is total, so is $f$.

PROOF. (i) If $t \neq 0$ then $ft \neq 0$, so $g(ft) = (gf)t \neq 0$. (ii) If $t \neq 0$ then $(gf)t \neq 0$, so $g(ft) \neq 0$. Since $g0 = 0$, $ft \neq 0$. □
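The notions of zero morphism and total morphism can be tried out concretely for finite sets in Pfn. The sketch below is illustrative only: the class `PFn` and the helpers `compose`, `zero`, `all_pfns`, and `is_total_abstract` are hypothetical names, and the abstract totality test of Definition 21 is checked only against test morphisms $t: T \to X$ with $T$ drawn from a short list of small sets, so it is a finite sanity check rather than a proof.

```python
from itertools import product

class PFn:
    """A partial function dom -> cod between finite sets, stored as a dict."""
    def __init__(self, dom, cod, graph):
        self.dom, self.cod, self.graph = frozenset(dom), frozenset(cod), dict(graph)
        assert set(self.graph) <= self.dom
        assert set(self.graph.values()) <= self.cod

    def is_zero(self):
        # The zero morphism of Pfn is the totally undefined function.
        return not self.graph

    def is_total(self):
        # Concrete totality: the domain of definition is all of dom.
        return set(self.graph) == self.dom

def compose(g, f):
    """g after f: defined at x exactly when f(x) and g(f(x)) are defined."""
    assert f.cod == g.dom
    return PFn(f.dom, g.cod,
               {x: g.graph[y] for x, y in f.graph.items() if y in g.graph})

def zero(X, Y):
    return PFn(X, Y, {})

def all_pfns(T, X):
    """Enumerate every partial function T -> X."""
    T = sorted(T)
    UNDEF = object()
    for values in product([UNDEF] + sorted(X), repeat=len(T)):
        yield PFn(T, X, {t: v for t, v in zip(T, values) if v is not UNDEF})

def is_total_abstract(f, test_sets):
    """Definition 21, checked only for t: T -> X with T in test_sets."""
    return all(not compose(f, t).is_zero()
               for T in test_sets
               for t in all_pfns(T, f.dom)
               if not t.is_zero())

X, Y, Z = {1, 2}, {"a", "b"}, {"p"}
f = PFn(X, Y, {1: "a", 2: "b"})      # total
g = PFn(Y, Z, {"a": "p", "b": "p"})  # total
h = PFn(X, Y, {1: "a"})              # undefined at 2
tests = [{0}, {0, 1}]
```

With these data, `is_total_abstract(f, tests)` agrees with `f.is_total()`, while `h` fails both tests; `compose(g, f)` is total, in line with Proposition 22(i), and any composite with a `zero` factor is again zero.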
Simple Recursion

We conclude this section by showing how sequences inductively defined by simple recursion are the unique morphism ! from the initial object in an appropriate category. This is a foretaste of the principle that constructions in a category which are unique up to isomorphism are instantiations of initial objects.

By a sequence in a set $X$ we mean a function $\mathbf{N} \to X$. We may define the sequence $g: \mathbf{N} \to \mathbf{N}$, $n \mapsto 2^n$ inductively by the definition

$$g(0) = 1, \qquad g(n + 1) = 2 * g(n) \ \text{ for } n \geq 0.$$

This is a specific case of the following general notion.

23 Definition. We say that the sequence $g: \mathbf{N} \to X$ is defined from $x_0 \in X$ and $f: X \to X$ by simple recursion if $g$ satisfies the recursive definition

Basis Step: $g(0) = x_0$.
Induction Step: $g(n + 1) = f(g(n))$ for $n \in \mathbf{N}$.

In the above example, $x_0 = 1$ and $f(x) = 2 * x$. It is clear that $g$ is defined uniquely by the above scheme. We shall take up a general discussion of recursive definitions in Chapter 5. Here our task is to show that the $g$ of 23 is really an example of the unique map ! induced from an initial object.

First, we note that an element $x_0 \in X$ can also be written as the function $1 \to X$ that sends the unique element of the one-element set to $x_0$. We shall also call the map $x_0$. The basis step of 23 can then be rewritten as the commutative diagram

24 [Diagram: the triangle $1 \xrightarrow{0} \mathbf{N} \xrightarrow{g} X$ with composite $x_0: 1 \to X$, that is, $g \circ 0 = x_0$.]

(Here, $0$ is not a zero morphism as in 16, but the map whose value is $0$ in $\mathbf{N}$; we are, in fact, in Set, which does not have zero morphisms.) Again, if we let $s: \mathbf{N} \to \mathbf{N}$ denote the successor function $n \mapsto n + 1$, the induction step is equivalent to the commutative diagram

25 [Diagram: the square with $s: \mathbf{N} \to \mathbf{N}$ on top, $f: X \to X$ on the bottom, and $g: \mathbf{N} \to X$ on both sides, that is, $g s = f g$.]

More generally, then, we have the following:

26 The Principle of Simple Recursion. For each $x_0: 1 \to X$ and $f: X \to X$
there exists a unique $g: \mathbf{N} \to X$ such that the following diagram commutes:

[Diagram: diagrams 24 and 25 pasted together, so that $g \circ 0 = x_0$ and $g s = f g$.]

This leads us to consider the following category:

27 The category of simple recursion data Srd has as objects the triples $(X, x_0, f)$, where $X$ is a set, $x_0 \in X$, and $f: X \to X$ is a total function. A morphism $\psi: (X, x_0, f) \to (Y, y_0, h)$ is a total function $\psi: X \to Y$ for which the diagram

[Diagram: the triangle $\psi \circ x_0 = y_0$ on $1$, $X$, $Y$, together with the square $\psi f = h \psi$.]

commutes, that is, $\psi(x_0) = y_0$, while $\psi(f(x)) = h(\psi(x))$ for each $x$ in $X$.

We must yet specify composition and identities and verify that Srd is a category. But, given this, we can note immediately that the principle of simple recursion, 26, is equivalent to the statement that "$(\mathbf{N}, 0, s)$ is initial in Srd."

Returning to the definition of Srd, composition is defined to be the usual composition $\psi' \circ \psi$ of total functions. That this is well defined is best seen from "diagram pasting:"

[Diagram: the diagram for $\psi: (X, x_0, f) \to (Y, y_0, h)$ pasted on top of the diagram for $\psi': (Y, y_0, h) \to (Z, z_0, k)$.]

For example, $k(\psi'\psi) = (\psi'\psi)f$ because $k(\psi'\psi) = (k\psi')\psi = (\psi' h)\psi = \psi'(h\psi) = \psi'(\psi f) = (\psi'\psi)f$. Axiom b of 2.1.1 is obvious since the composition of total functions is associative. The identities for axiom c are the obvious ones:

[Diagram: $\mathrm{id}_X: (X, x_0, f) \to (X, x_0, f)$.]

Now claim: if $\psi: (X, x_0, f) \to (Y, y_0, h)$ is an Srd-morphism, $\psi$ is an isomorphism in Srd if and only if $\psi$ is bijective. On the one hand, if $\psi$ is an isomorphism then there exists $\phi: (Y, y_0, h) \to (X, x_0, f)$ with $\psi \circ \phi = \mathrm{id}_Y$, $\phi \circ \psi = \mathrm{id}_X$, so $\psi$ is bijective. Conversely, if $\psi$ is bijective then there exists a
function $\phi: Y \to X$ with $\psi \circ \phi = \mathrm{id}_Y$, $\phi \circ \psi = \mathrm{id}_X$. Is such $\phi$ a morphism $(Y, y_0, h) \to (X, x_0, f)$? Consider the diagram below; the ?'s indicate places where commutativity is yet to be proved.

[Diagram: the commuting diagram for $\psi: (X, x_0, f) \to (Y, y_0, h)$ pasted on top of the diagram for $\phi: Y \to X$, whose triangle $\phi(y_0) = x_0$ and square $\phi h = f \phi$ are marked "?".]

Well, reading from the diagram above, $(\phi h)\psi = \phi(h\psi) = \phi(\psi f) = (\phi\psi)f = \mathrm{id}_X f = f = f \circ \mathrm{id}_X = f(\phi\psi) = (f\phi)\psi$. Thus, $\phi h = (\phi h)(\psi\psi^{-1}) = ((\phi h)\psi)\psi^{-1} = ((f\phi)\psi)\psi^{-1} = (f\phi)\psi\psi^{-1} = f\phi$. Similarly, $\phi(y_0) = \phi(\psi(x_0)) = (\phi\psi)(x_0) = \mathrm{id}_X(x_0) = x_0$.

We reiterate the desire that objects in a category should be isomorphic just in case they are "abstractly the same." This works out well for the category of recursion data above where, if $\psi: (X, x_0, f) \to (Y, y_0, g)$ is an isomorphism, the bijection $\psi$ transports $x_0$ to $y_0$ and $f$ to $g$: thinking of $\psi$ as a "relabelling," the abstract structure is "the same." When designing new categories, one of the aesthetic criteria to keep in mind is that this technical sense of isomorphism should relate to intuitive ones. For example, if when defining the category of simple recursion data we dropped the requirement that $\psi(x_0) = y_0$ and $\psi f = g\psi$, we would get a category whose isomorphisms were bijections, but then, for $s: \mathbf{N} \to \mathbf{N}$ with $s(n) = n + 1$ and $z: \mathbf{N} \to \mathbf{N}$ with $z(n) = 0$, $(\mathbf{N}, 0, s)$ and $(\mathbf{N}, 0, z)$ would be isomorphic, which is not desirable.

EXERCISES FOR SECTION 2.2

1. Show that in ANMfn isomorphisms are total bijections.

2. Let $(P, \leq)$ be a poset and let $\mathbf{C}_{(P,\leq)}$ be the category of Exercise 2.1.7.
(a) Prove that isomorphic objects of $\mathbf{C}_{(P,\leq)}$ are equal.
(b) A least element of $(P, \leq)$ is $p \in P$ such that $p \leq x$ for all $x \in P$. Give a direct proof that if $(P, \leq)$ has a least element, it is unique.
(c) Give an alternate proof of (b) using Theorem 8 by showing that a least element of $(P, \leq)$ is an initial object of $\mathbf{C}_{(P,\leq)}$.
(d) Let $P = \{a, b, c\}$ be the poset with $a \leq b \leq c$.
Let $\mathbf{D}$ be the full subcategory of $\mathbf{C}_{(P,\leq)}$ with objects $b$, $c$. In comparison to Example 15, show that $\mathbf{D}$ has an initial object, but one different from that of $\mathbf{C}_{(P,\leq)}$.

3. Show that a morphism in Mon is an isomorphism if and only if it is bijective.

4. Show that a morphism $f: (P, \leq) \to (P_1, \leq_1)$ in Poset is an isomorphism if and only if $f$ is bijective and $f(x) \leq_1 f(y)$ implies $x \leq y$. Show that if $(P, \leq)$ is discrete (see Exercise 2.1.8) and $(P, \leq_1)$ is not, then $\mathrm{id}_P: (P, \leq) \to (P, \leq_1)$ is bijective and monotone but not an isomorphism.
5. Prove that an object $Z$ of $\mathbf{C}$ is a zero object in $\mathbf{C}$ if and only if it is a zero object in $\mathbf{C}^{op}$. Prove that $\mathbf{C}$ has zero morphisms if and only if $\mathbf{C}^{op}$ has zero morphisms.

6. Prove that any full subcategory of a category with zero morphisms has zero morphisms. This fails for arbitrary subcategories (see Example 2.1.17). Use this construction to give an example of a category with zero morphisms but with no zero object.

7. Show that the total morphisms of Pfn in the sense of Definition 21 are exactly those $f$ which are total functions as in 1.3.6. Similarly, show that the total morphisms of Mfn and ANMfn of 21 are those $f$ with $f(x) \neq \emptyset$ for all $x$.

8. For your category Mfn"" of Exercise 2.1.10, do zero morphisms exist? If so, characterize the total morphisms.

9. In any category with zero morphisms, prove that every isomorphism is total.

10. In Pfn, we call $t: DD(f) \to X$ with $t(x) = x$ the totalizer of $f$.
(i) Verify that this is the special case in Pfn of the following general definition: Let $\mathbf{C}$ be any category with zero morphisms. Given $f: X \to Y$, a morphism of form $t: T \to X$ is a totalizer of $f$ if $ft$ is total and if, whenever $u: U \to X$ is such that $fu$ is total, then there exists unique $\alpha$ with $t\alpha = u$ as shown:

[Diagram: $u: U \to X$ factors as $t\alpha$ through $t: T \to X$, followed by $f: X \to Y$.]

(ii) Using Exercise 9, show that if $f$ is an isomorphism then $\mathrm{id}_X$ is a totalizer of $f$.
(iii) Prove that if $t: T \to X$ and $u: U \to X$ are both totalizers of $f$ then the unique $\alpha$ with $t\alpha = u$ is an isomorphism. Give both a direct proof and a proof based on Theorem 11 obtained by representing $(T, t)$ and $(U, u)$ as objects in a suitable category $\mathbf{C}_f$.

In Pfn, (i) and (ii) suggest that the important semantic notion of the domain of definition of a morphism can be described in category-theoretic terms. Mfn and ANMfn also possess a reasonable notion of domain of definition for $f: X \to Y$, namely, $DD(f) = \{x \in X : f(x) \neq \emptyset\}$. Regrettably, it may be shown that not every $f$ has a totalizer in these categories.

11.
Let $(M, \circ, e)$ be a monoid. An element $a \in M$ is invertible if there exists $b \in M$ with $ab = e = ba$. A monoid in which every element is invertible is called a group.
(i) Let $\mathbf{R}$ be the set of real numbers. Show that $(\mathbf{R}, +, 0)$ is a group.
(ii) Imitate the proof of Definition 1 to show that if $a$ is invertible then the $b$ with $ab = e = ba$ is unique. We call $b$ the inverse of $a$ and write $b = a^{-1}$.
(iii) In fact, show that your proof in (ii) is a special case of that of Definition 1 by associating to $(M, \circ, e)$ the following category $\mathbf{C}_{(M,\circ,e)}$. There is only one object, call it $\alpha$. The set of morphisms $\alpha \to \alpha$ is $M$, with $\mathrm{id}_\alpha = e$ and $\circ$ for composition. Verify that $\mathbf{C}_{(M,\circ,e)}$ is a category whose isomorphisms are the invertible elements.

12. Let $(M, \circ, e)$ be a monoid and consider the category $\mathbf{FwR}_{(M,\circ,e)}$ of Exercise 2.1.11.
(i) Show that $f: X \to M \times Y$ is an isomorphism from $X$ to $Y$ in $\mathbf{FwR}_{(M,\circ,e)}$ if
and only if
(a) $DD(f) = X$.
(b) For all $y \in Y$ there exists unique $x \in X$ such that $f(x)$ has form $(a, y)$.
(c) For all $x \in X$, if $f(x) = (a, y)$, $a$ is invertible (as in Exercise 11).
(ii) Show that the empty set is a zero object of $\mathbf{FwR}_{(M,\circ,e)}$.

2.3 Products and Coproducts

In this section we show that two constructions of set theory which play an important role in program semantics, Cartesian products and disjoint union, can be described in category-theoretic terms and so generalize to a wide class of categories. While the original constructions seem unrelated, their category-theoretic descriptions are seen to be dual. Cartesian products abstract to products in a category whereas disjoint unions abstract to coproducts, it being common to indicate duality by the prefix "co" as discussed earlier in 2.2.13. Coproducts are an important structural aspect of the partially additive categories of the next chapter.

We begin by describing Cartesian products of sets. The term "Cartesian" honors the mathematician René Descartes, who developed plane analytic geometry whereby the plane is represented as the set of all ordered pairs $(x, y)$ with $x$, $y$ in the set $\mathbf{R}$ of real numbers. Thus, the plane is $\mathbf{R} \times \mathbf{R}$ where, in general, for any two sets $X$, $Y$ their Cartesian product is the set

$$X \times Y = \{(x, y) \mid x \in X, \ y \in Y\}$$

of all ordered pairs with $x \in X$, $y \in Y$. If a program fragment had two variables $x$ and $y$ taking values in the sets $X$ and $Y$, respectively, then $X \times Y$ would comprise all possible values which could be taken by the two variables taken together.

Turning to a formal analysis, we offer the following precise description of an ordered pair $(x, y) \in X \times Y$ which, if somewhat pedantic, is useful for generalizing to the product of infinitely many sets.
If we invent a convenient two-element set, say $\{i, j\}$, an ordered pair in $X$, $Y$ amounts to a total function $f: \{i, j\} \to X \cup Y$ with $f(i) \in X$, $f(j) \in Y$. The relationship between $(x, y)$ and $f$ is that $f(i) = x$, $f(j) = y$, and this formula defines $(x, y)$ in terms of $f$ and $f$ in terms of $(x, y)$ and, indeed, establishes a bijective correspondence between the $f$ and the $(x, y)$. We could then regard $X \times Y$ as the set of all functions $f: \{i, j\} \to X \cup Y$ with $f(i) \in X$, $f(j) \in Y$. This leads to the following general definition.

1 Definition. Let $(X_i \mid i \in I)$ be any family of sets. Their Cartesian product is the set of all functions $f: I \to \bigcup_{i \in I} X_i$
such that $f(i) \in X_i$ for each $i \in I$. We denote this set of functions by

$$\prod_{i \in I} X_i \quad\text{or}\quad \prod(X_i \mid i \in I).$$

Often family notation is used, so we write $(x_i \mid i \in I)$ instead of $f$, where $f(i) = x_i$. This directly generalizes the motivating comments above where $I$ has two elements.

2 Example. If $I = \{p, a, s\}$ and if $X_p$ is a set of persons, $X_a$ is a set of age values, say $\{16, 17, \ldots, 80\}$, and $X_s = \{\text{male}, \text{female}\}$, then $\prod_{i \in I} X_i$ is a suitable value object for a data base "record" for a person's age and sex.

We next consider unions. Note that if $X_1 \cong X_2$ are isomorphic in Set and if, similarly, $Y_1 \cong Y_2$, it is not necessarily true that $X_1 \cup Y_1 \cong X_2 \cup Y_2$. For example, let $X_1 = \{a, \bar{a}\}$, $X_2 = \{a, b\}$, $Y_1 = \{a, b, e\}$, and $Y_2 = \{c, d, e\}$. Then $X_1 \cup Y_1 = \{a, \bar{a}, b, e\}$ has four elements whereas $X_2 \cup Y_2 = \{a, b, c, d, e\}$ has five. Since any two sets in bijective correspondence are "abstractly the same," according to the philosophy of category theory we seek a notion of union that respects isomorphism better than ordinary union. A solution is given in terms of "disjoint unions" wherein, given a family $(X_i \mid i \in I)$, the elements of $X_i$ are "painted color $i$" before taking an ordinary union. The more precise definition makes use of ordered pairs and is as follows.

3 Definition. Let $(X_i \mid i \in I)$ be any family of sets. Their disjoint union is the set

$$\{(x, i) \mid i \in I, \ x \in X_i\}$$

and is denoted

$$\coprod_{i \in I} X_i \quad\text{or}\quad \coprod(X_i \mid i \in I).$$

(The choice of the upside-down Cartesian product symbol anticipates the yet-to-be established category-theoretic duality between Cartesian product and disjoint union.) Note that

$$\coprod_{i \in I} X_i = \bigcup_{i \in I} X_i \times \{i\}$$

is the ordinary union of the "colored" sets $X_i \times \{i\}$ in which an element $(x, i)$ is "$x$ painted color $i$." The union is disjoint because

$$(X_i \times \{i\}) \cap (X_j \times \{j\}) = \emptyset \quad\text{if } i \neq j,$$

even if $X_i = X_j$.

Disjoint unions also occur in semantics:

4 Example. The disjoint union has a very natural application in describing the exit value of a multiexit program.
Here, a value like (y, i) would be interpreted as "execution of the program terminates by taking exit i with
value $y$." For example, given $f_1, \ldots, f_n: X \to Y$ in Pfn, consider the case statement of 1.5.22

case $(p_1, \ldots, p_n)$ of $(f_1, \ldots, f_n)$

with flowscheme

[Flowscheme: the $n$-way test on $X$ followed by $f_1, \ldots, f_n$, with a dashed line separating this portion from the merging of the $n$ exits into a single exit $Y$.]

For $I = \{1, \ldots, n\}$, the semantics of the portion to the left of the dashed line is

$$g: X \to \coprod_{i \in I} Y$$

(here the notation $\coprod_{i \in I} Y$ means $\coprod_{i \in I} Y_i$ with each $Y_i = Y$) where

$$g(x) = \begin{cases} (f_i(x), i) & \text{if } p_i(x) \text{ is defined} \\ \text{undefined} & \text{else.} \end{cases}$$

This discussion will be completed in 25 below.

We turn now to a description of Cartesian products that uses only category-theoretic language in Set. Our starting point is a given family $(X_i \mid i \in I)$ of sets. Our Definition 1 is in terms of elements; we must think more in terms of how to use morphisms in Set to characterize when a set $X$ is to be isomorphic to $\prod_{i \in I} X_i$. To this end consider a family of morphisms of the form $(f_i: Y \to X_i \mid i \in I)$. For each $y \in Y$, $(f_i(y) \mid i \in I)$ is an element of $\prod_{i \in I} X_i$ and so should correspond to a definite element of $X$, call it $f(y)$. In this way there is a bijective correspondence between morphisms $Y \to X$ and families of morphisms $Y \to X_i$. Moreover, when $X = \prod_{i \in I} X_i$ this correspondence is easily described in terms of commutative diagrams. To begin, we need the following definition:

5 Definition. For $(X_i \mid i \in I)$ a family of sets and $j \in I$, the $j$th projection function is

$$\mathrm{pr}_j: \prod_{i \in I} X_i \to X_j, \qquad (x_i \mid i \in I) \mapsto x_j.$$

We then observe that the relationship between $f$ and the $f_i$ is precisely

6 [Diagram: $f: Y \to \prod_{i \in I} X_i$ followed by $\mathrm{pr}_j$ equals $f_j: Y \to X_j$, that is, $\mathrm{pr}_j f = f_j$ (for all $j \in I$).]
In terms of elements, 6 asserts that $f(y) = (f_i(y) \mid i \in I)$, which is what we expected. We have motivated the following definition.

7 Definition. Let $\mathbf{C}$ be any category and let $(X_i \mid i \in I)$ be a family of objects of $\mathbf{C}$. A product of $(X_i \mid i \in I)$ is $(P, (\mathrm{pr}_i \mid i \in I))$, where $P$ is an object of $\mathbf{C}$ and for each $i \in I$, $\mathrm{pr}_i: P \to X_i$ is a $\mathbf{C}$-morphism, all subject to the following property: given $(Y, (f_i \mid i \in I))$ with $Y$ a $\mathbf{C}$-object and $f_i: Y \to X_i$ $\mathbf{C}$-morphisms, there exists unique $f: Y \to P$ such that $\mathrm{pr}_i f = f_i$ for all $i \in I$, as shown:

[Diagram: $f: Y \to P$ followed by $\mathrm{pr}_i: P \to X_i$ equals $f_i: Y \to X_i$.]

The $\mathrm{pr}_i$ are called projection morphisms.

8 Example. In Set, $P = \prod_{i \in I} X_i$, with $\mathrm{pr}_i$ as in 5, is a product of $(X_i \mid i \in I)$. This was established in the discussion motivating 7.

9 Proposition. In any category $\mathbf{C}$, products are unique up to a unique isomorphism, that is, if $(P, (\mathrm{pr}_i))$, $(P', (\mathrm{pr}'_i))$ are both products of $(X_i \mid i \in I)$, the unique $\alpha: P \to P'$ with $\mathrm{pr}'_i \alpha = \mathrm{pr}_i$ for all $i$ is an isomorphism.

PROOF. For given $(X_i \mid i \in I)$, let $\mathbf{D}$ be the category whose objects are all $(Y, (f_i))$ with $f_i: Y \to X_i$, whose morphisms $h: (Y, (f_i)) \to (Z, (g_i))$ are defined to be $\mathbf{C}$-morphisms $h: Y \to Z$ such that $g_i h = f_i$ (all $i \in I$), with composition and identities as in $\mathbf{C}$. That $\mathbf{D}$ is a category is routinely verified. By Definition 7, a product of $(X_i \mid i \in I)$ is the same thing as a terminal object of $\mathbf{D}$. The desired result now follows from Theorem 2.2.11. □

10 Proposition. In any category, a product of the empty family is the same concept as a terminal object, whereas if $I = \{i\}$ has one element, $\mathrm{id}: X_i \to X_i$ is a product.
PROOF. For the first statement, consider 7 with $I = \emptyset$. A product is an object $P$ equipped with the empty family $(\mathrm{pr}_i)$, that is, $P$ is an object with no further structure. Continuing, the property satisfied by $P$ is that if $Y$ is any object with no further structure then there exists unique $f: Y \to P$ with no further conditions. This is then the same as the definition of a terminal object, 2.2.10. The second statement is immediate from the diagram

[Diagram: $f_i: Y \to X_i$ factors uniquely through $\mathrm{id}: X_i \to X_i$, namely as $f_i$ itself.] □

11 Definition and Notations. In any category, if $(P, (\mathrm{pr}_i \mid i \in I))$ is a product of $(X_i \mid i \in I)$ we use the notations

$$\prod_{i \in I} X_i \quad\text{or}\quad \prod(X_i \mid i \in I) \quad\text{or}\quad \prod X_i$$

for $P$, the third notation being a convenient shorthand if $I$ is understood from context. By 9 such $\prod X_i$ is unique only up to isomorphism, but in category theory that is unique enough. Thus, in Set we are modifying the notation of 1 to now refer to any set isomorphic to the specific model for the product which we there called "the Cartesian product." See Exercise 1.

The unique $f$ of 7 depends only on the $f_i$ and deserves a notation to indicate this. We thus write such $f$ as $[f_i \mid i \in I]$ or simply as $[f_i]$ when $I$ is understood. Thus, in our new notation,

[Diagram: $[f_i \mid i \in I]: Y \to \prod X_i$ followed by $\mathrm{pr}_i$ equals $f_i: Y \to X_i$.]

When $I$ is finite with at least two elements, infix $\times$ is a convenient symbol to indicate products. Thus, if $I = \{1, \ldots, n\}$, $n \geq 2$, $X_1 \times \cdots \times X_n$ is a synonym for $\prod X_i$. Similarly, we write $X \times Y \times Z$ instead of the more cumbersome $\prod(X_i \mid i \in I)$, $I = \{1, 2, 3\}$, $X_1 = X$, $X_2 = Y$, $X_3 = Z$. Corresponding notations exist for the unique induced map, for example,

[Diagram: $[f_1, \ldots, f_n]: Y \to X_1 \times \cdots \times X_n$ with $\mathrm{pr}_i [f_1, \ldots, f_n] = f_i$.]

and
[Diagram: the induced map $[f, g]$ into $X \times Y$, with $\mathrm{pr}_X [f, g] = f$ and $\mathrm{pr}_Y [f, g] = g$.]

A product of $(X_i \mid i \in I)$ when all $X_i = X$ are equal is called a power (of $I$ copies of $X$) and we write $X^I$ instead of $\prod_{i \in I} X_i$.

We say that a category has products if every family of objects has a product. Similarly, a category has finite products if every finite family (i.e., a family $(X_i \mid i \in I)$ with $I$ finite) has a product. By Proposition 10 any category which has finite products has a terminal object.

12 Example. Set has products, as we have already discussed. If $\mathbf{C}$ is the full subcategory of Set of all finite sets, then $\mathbf{C}$ has finite products since, for $X_1, \ldots, X_n \in \mathbf{C}$, the Set product $X_1 \times \cdots \times X_n$ is again in $\mathbf{C}$ (see Exercise 11) and so, with the same $\mathrm{pr}_i$ and $[f_1, \ldots, f_n]$ constructions as in Set, acts as a product in $\mathbf{C}$. If $(X_i \mid i \in I)$ is an infinite family in $\mathbf{C}$ for which each $X_i$ has at least two elements, no product of this family exists in $\mathbf{C}$. (See Exercise 12.)

13 Example. Two one-element sets do not have a product in ANMfn. To see this, suppose $\{a\} \xleftarrow{\mathrm{pr}_1} X \xrightarrow{\mathrm{pr}_2} \{b\}$ were a product diagram in ANMfn. Let $1$ be any one-element set. By the product property there exists a unique subset $S: 1 \to X$ of $X$ such that $\mathrm{pr}_1 S = \emptyset$, $\mathrm{pr}_2 S = \{b\}$. Since $\mathrm{pr}_2 S \neq \emptyset$, $S \neq \emptyset$. As $\mathrm{pr}_1 S = \emptyset$, there exists $x \in S$ with $\mathrm{pr}_1(x) = \emptyset$. It follows from the definition of composition in ANMfn that for $X: 1 \to X$, $\mathrm{pr}_1 X = \emptyset$. Similarly, $\mathrm{pr}_2 X = \emptyset$. Since also $\mathrm{pr}_1 \emptyset = \emptyset = \mathrm{pr}_2 \emptyset$ we must have $X = \emptyset$, which we have already seen is not so, the desired contradiction.

Using the prefix "co" to signal duality, we define coproducts as dual to products as follows:

14 Definition. In any category $\mathbf{C}$, given a family $(X_i \mid i \in I)$ of objects, a coproduct of $(X_i \mid i \in I)$ is $(C, (\mathrm{in}_i \mid i \in I))$, where $C$ is a $\mathbf{C}$-object and for each $i \in I$, $\mathrm{in}_i: X_i \to C$ is a $\mathbf{C}$-morphism, such that $(C, (\mathrm{in}_i \mid i \in I))$ is a product in $\mathbf{C}^{op}$. This makes sense since in $\mathbf{C}^{op}$ the $\mathrm{in}_i$ have the form $\mathrm{in}_i: C \mathrel{-\!\!\prec} X_i$. Given $f_i: X_i \to Y$ in $\mathbf{C}$ we have

[Diagrams: in $\mathbf{C}$, the unique $\alpha: C \to Y$ with $\alpha \, \mathrm{in}_i = f_i$; in $\mathbf{C}^{op}$, the same triangle with all arrows reversed.]
The morphism $\mathrm{in}_i$ is called the $i$th coproduct injection. The unique $\alpha$ with $\alpha \, \mathrm{in}_i = f_i$ will be denoted $\langle f_i \mid i \in I \rangle$. Notations similar to those for products are useful for coproducts. If $(C, (\mathrm{in}_i \mid i \in I))$ is a coproduct of $(X_i \mid i \in I)$, we write

$$\coprod_{i \in I} X_i \quad\text{or}\quad \coprod(X_i \mid i \in I) \quad\text{or}\quad \coprod X_i$$

as a synonym for $C$. We indicate finite coproducts with infix $+$ as in $X + Y + Z$. When all $X_i = X$, the coproduct is a copower (of $I$ copies of $X$), which we write $I \cdot X$. $\mathbf{C}$ has coproducts if every family of $\mathbf{C}$-objects has a coproduct. $\mathbf{C}$ has finite coproducts if every finite family of $\mathbf{C}$-objects has a coproduct.

The following propositions are dual to 9 and 10 and so require no further proof whatsoever!

15 Proposition. In any category, coproducts are unique up to a unique isomorphism, that is, if $(C, (\mathrm{in}_i))$, $(C', (\mathrm{in}'_i))$ are both coproducts of $(X_i \mid i \in I)$, the unique $\alpha: C \to C'$ with $\alpha \, \mathrm{in}_i = \mathrm{in}'_i$ for all $i$ is an isomorphism.

16 Proposition. In any category, a coproduct of the empty family is the same concept as an initial object, whereas if $I = \{i\}$ has one element, $\mathrm{id}: X_i \to X_i$ is a coproduct.

We now make good our promise that disjoint unions provide coproducts in Set, resolving a possible notational ambiguity between 3 and 14.

17 Example. Set has coproducts. Given a family $(X_i \mid i \in I)$ of sets, let $C = \{(x, i) : i \in I, \ x \in X_i\}$ be the disjoint union of 3 and define injections $\mathrm{in}_i: X_i \to C$ by $\mathrm{in}_i(x) = (x, i)$. Given $f_i: X_i \to Y$ we have

[Diagram: $\mathrm{in}_i: X_i \to C$ followed by $f: C \to Y$ equals $f_i: X_i \to Y$.]

where $f(x, i) = f_i(x)$. The above diagram commutes because for $i \in I$, $x \in X_i$, $f \, \mathrm{in}_i(x) = f(x, i) = f_i(x)$. Such $f$ is unique since if $g \, \mathrm{in}_i = f_i$ for all $i$ then for any $(x, i) \in C$, $g(x, i) = g \, \mathrm{in}_i(x) = f_i(x) = f(x, i)$.
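The construction of Example 17 is easy to mimic for finite sets. The following is a minimal sketch under stated assumptions: a family of sets is represented as a Python dict from indices to sets, and the names `disjoint_union`, `injection`, and `copair` are hypothetical helpers introduced only for illustration. The function returned by `copair` plays the role of the induced map $f$ with $f(x, i) = f_i(x)$.

```python
def disjoint_union(family):
    """family: dict i -> set X_i.  Returns the set of tagged pairs (x, i)."""
    return {(x, i) for i, X in family.items() for x in X}

def injection(i):
    """The ith coproduct injection, in_i(x) = (x, i)."""
    return lambda x: (x, i)

def copair(fs):
    """fs: dict i -> function X_i -> Y.  Returns f with f((x, i)) = fs[i](x)."""
    def f(tagged):
        x, i = tagged
        return fs[i](x)
    return f

# The element 2 lies in both sets, yet the disjoint union keeps two copies.
family = {"L": {1, 2}, "R": {2, 3}}
C = disjoint_union(family)

fs = {"L": lambda x: x * 10, "R": lambda x: -x}
f = copair(fs)

# The defining equation f in_i = f_i, checked on every element of every X_i.
commutes = all(f(injection(i)(x)) == fs[i](x)
               for i, X in family.items() for x in X)
```

Here `C` has four elements although the ordinary union $\{1, 2, 3\}$ has three, and `commutes` records that the triangle of Example 17 commutes for this family.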
18 Example. As discussed in 1.3.4 and 1.4.9, a total function $f: X \to Y$ may also be regarded as a partial function $X \to Y$ or as a multifunction $X \to Y$. We now observe that if $\mathrm{in}_i: X_i \to C$ is a coproduct in Set, then the same functions considered in Pfn, Mfn, or ANMfn are again coproducts in those categories. Thus, Pfn, Mfn, and ANMfn have coproducts and these are constructed as disjoint unions. To see this, first observe the following:

19 If $f: X \to Y$ is a total function and $g: Y \to Z$ is a multifunction, so that $g$ is a total function $Y \to \mathcal{P}(Z)$, then the Mfn and the ANMfn compositions $gf$ are given as the total function composition

$$X \xrightarrow{f} Y \xrightarrow{g} \mathcal{P}(Z).$$

This is immediate from 1.4.4 and 2.1.5. Continuing with the discussion, if $f_i: X_i \to \mathcal{P}(Y)$ are total functions, it follows from the coproduct property in Set that there exists a unique total function $f: C \to \mathcal{P}(Y)$ with

[Diagram: $\mathrm{in}_i: X_i \to C$ followed by $f: C \to \mathcal{P}(Y)$ equals $f_i: X_i \to \mathcal{P}(Y)$.]

where $f \, \mathrm{in}_i = f_i$ in Set. But by 19, as $\mathrm{in}_i$ is a total function, this is equivalent to $f \, \mathrm{in}_i = f_i$ in Mfn and ANMfn, and so it is clear that $(C, (\mathrm{in}_i))$ is a coproduct in Mfn and ANMfn. Finally, construct $C = \{(x, i) : i \in I, \ x \in X_i\}$ as the disjoint union with $\mathrm{in}_i(x) = (x, i)$ so that $f(x, i) = f_i(x)$ as discussed in 17. Clearly, if each $f_i(x)$ has at most one element of $Y$ this holds for each $f(x, i)$, so $f$ is a partial function if each $f_i$ is. But then by Proposition 1.4.10, it is clear that $(C, (\mathrm{in}_i))$ is a coproduct in Pfn.

In the balance of this section we offer a few examples that demonstrate the relevance of coproducts to semantics. As we remarked following 2.1.5, the flowscheme notation of 1.5.2 provides useful instruction in the "semantic category." The notations of 1.5.3 and 1.5.8 present composition and identities this way. We now extend flowscheme notation to finite coproducts, using the disjoint union construction of coproducts common to Pfn, Mfn, and ANMfn, as discussed in Example 18, for motivation.

Let $\mathbf{C}$ be any category with finite coproducts.
We consider flowschemes whose atomic components are morphisms of the form $f: X \to X_1 + \cdots + X_n$ with flowscheme
[Flowscheme: a box $f$ with one input line $X$ and $n$ output lines $X_1, \ldots, X_n$.]

Intuitively, $f$ first applies an $n$-way test and, if the $i$th test is successful, transforms an input from $X$ into an output in $X_i$. Recall Example 4. An example of sufficient complexity to illustrate the ideas is shown in Table 20. The atomic morphisms there are $f: X \to A + B$, $g: B \to E + C + F$, $h: A \to C + D$, $t: E \to D$, and $u: F \to G$.

Table 20. A Loop-Free Flowscheme [Figure: flowscheme built from $f$, $g$, $h$, $t$, $u$ with exits $D$, $C$, $G$.]

In general, however, we should consider "multi-input, multi-output" morphisms of the form $f: X_1 + \cdots + X_m \to Y_1 + \cdots + Y_n$ with flowscheme

21 [Flowscheme: a box $f$ with $m$ input lines $X_1, \ldots, X_m$ and $n$ output lines $Y_1, \ldots, Y_n$.]

For example, consider Table 20 just after $f$ executes; the remainder of the computation has the form $A + B \to D + C + G$. What we wish to do is to describe Table 20 as a morphism $X \to D + C + G$ in $\mathbf{C}$ given $f$, $g$, $h$, $t$, $u$. This can be done for this and a larger class of similar flowschemes using the operations of composition, "parallel construction," and "line tying." We will not formalize the general class of flowschemes involved, but we will describe the last two operations rigorously and apply them to give a semantics to Table 20.

22 Definition. Let $n \geq 2$ and $f_i: X_i \to Y_i$, $i = 1, \ldots, n$, in a category with finite coproducts. Then the parallel construction

$$f_1 \| \cdots \| f_n : X_1 + \cdots + X_n \to Y_1 + \cdots + Y_n$$

with flowscheme notation

[Flowscheme: boxes $f_1, \ldots, f_n$ stacked in parallel, the $i$th with input line $X_i$ and output line $Y_i$.]
is defined by $(f_1 \| \cdots \| f_n)\,\mathrm{in}_i = \mathrm{in}_i f_i$ for $i = 1, \ldots, n$. That is, $f_1 \| \cdots \| f_n = \langle \mathrm{in}_1 f_1, \ldots, \mathrm{in}_n f_n \rangle$. In Pfn, Mfn, and ANMfn, $(f_1 \| \cdots \| f_n)(x, i) = f_i(x)$ (taken in the $i$th summand $Y_i$), which is aptly described by the flowscheme notation.

23 Definition. Before defining "line tying" we give an example. The flowscheme [figure not reproduced] describes the morphism
$$X_2 + X_1 + X_1 + X_2 + X_3 \xrightarrow{\;\langle \mathrm{in}_2, \mathrm{in}_1, \mathrm{in}_1, \mathrm{in}_2, \mathrm{in}_3 \rangle\;} X_1 + X_2 + X_3$$
where the $\mathrm{in}_j$ are the injections of $X_1 + X_2 + X_3$. In general, given objects $X_1, \ldots, X_n$ ($n \ge 1$) and an ordered list $(i_1, \ldots, i_k)$ with $i_j \in \{1, \ldots, n\}$ (repetitions allowed), the line tying morphism
$$LT: X_{i_1} + \cdots + X_{i_k} \to X_1 + \cdots + X_n$$
is defined by $LT\,\mathrm{in}_j = \mathrm{in}_{i_j}$ ($j = 1, \ldots, k$), where $\mathrm{in}_j$ injects $X_{i_j}$ as the $j$th term of $X_{i_1} + \cdots + X_{i_k}$ while $\mathrm{in}_{i_j}$ injects $X_{i_j}$ as the $i_j$th term of $X_1 + \cdots + X_n$. The semantics of Table 20 arises by viewing the flowscheme as in Table 24.

Table 24 A Resolution of Table 20 [figure not reproduced]
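The parallel construction and line tying of Definitions 22 and 23 can be sketched concretely in Pfn. The following is only an illustration under stated assumptions: an element of a disjoint union is modeled as a pair `(x, i)` with a 1-based tag, a partial function is modeled as a Python `dict`, and `None` stands for "undefined." The names `parallel` and `line_tying` are hypothetical and do not come from the text.

```python
def parallel(fs):
    """Sketch of f_1 || ... || f_n in Pfn.

    fs is a list of dicts, fs[i-1] playing the role of f_i: X_i -> Y_i.
    A tagged input (x, i) is sent to (f_i(x), i), or to None if f_i is
    undefined at x (None is our assumed marker for "undefined").
    """
    def par(tagged):
        x, i = tagged
        f_i = fs[i - 1]
        return (f_i[x], i) if x in f_i else None
    return par


def line_tying(indices):
    """Sketch of the line tying morphism for the ordered list (i_1, ..., i_k).

    The j-th input line carries X_{i_j}; it is tied to output line i_j,
    so (x, j) is sent to (x, i_j).
    """
    def lt(tagged):
        x, j = tagged
        return (x, indices[j - 1])
    return lt


# The example of Definition 23: input lines X_2, X_1, X_1, X_2, X_3.
lt = line_tying([2, 1, 1, 2, 3])
```

With this encoding, the second and third input lines are both tied to output line 1, which is the "repetitions allowed" case of the definition.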
25 Example. For $g: X \to Y + \cdots + Y$ as in Example 4, the case statement should have flowscheme [figure not reproduced] and so is $\sigma g$, where $\sigma: Y + \cdots + Y \to Y$ is the line tying morphism defined by $\sigma\,\mathrm{in}_i = \mathrm{id}_Y$ for each $i$. (Recall from Proposition 16 that $\mathrm{id}_Y: Y \to Y$ is a coproduct of one copy of $Y$.)

EXERCISES FOR SECTION 2.3

1. For the Cartesian product construction of sets show that $s: X \times Y \to Y \times X$, $s(x, y) = (y, x)$ is bijective, and that $a: (X \times Y) \times Z \to X \times (Y \times Z)$, $a((x, y), z) = (x, (y, z))$ is bijective. Thus, $X \times Y \cong Y \times X$ and $X \times (Y \times Z) \cong (X \times Y) \times Z$ in Set. Discuss how $X \times Y$ and $Y \times X$ and also $X \times (Y \times Z)$ and $(X \times Y) \times Z$ are genuinely different from the point of view of implementing records (cf. Example 2).

2. Let C be a category with finite products.
(i) Define the "coordinate switching morphism" $s: X \times Y \to Y \times X$ by the requirement that $\mathrm{pr}_1 s = \mathrm{pr}_2$ and $\mathrm{pr}_2 s = \mathrm{pr}_1$ (the projections on the left being those of $Y \times X$ and those on the right being those of $X \times Y$), and show that for C = Set, $s(x, y) = (y, x)$ as in Exercise 1. In general, show that $s$ is an isomorphism.
(ii) If 1 is a terminal object, show that $\mathrm{pr}_1: X \times 1 \to X$ is an isomorphism.
(iii) Define $a: (X \times Y) \times Z \to X \times (Y \times Z)$ by the requirement that $\mathrm{pr}_1 a = \mathrm{pr}_1 \mathrm{pr}_1$ and that $\mathrm{pr}_2 a: (X \times Y) \times Z \to Y \times Z$ is the morphism with components $\mathrm{pr}_2 \mathrm{pr}_1$ and $\mathrm{pr}_2$. For C = Set show that $a((x, y), z) = (x, (y, z))$ as in Exercise 1. In general, show that $a$ is an isomorphism. [Hint: Define $a^{-1}$ in a similar way and show $a a^{-1}$ and $a^{-1} a$ are identities by composing with projections and using the uniqueness of morphisms induced into a product. For beginners, this is a somewhat involved exercise.]
(iv) State the dual of (i), (ii), and (iii) for coproducts. Describe $s: X + Y \to Y + X$ and $a: (X + Y) + Z \to X + (Y + Z)$ when C = Set.
3. Show that any category with a terminal object and such that each two objects has a product possesses all finite products. [Hint: Use induction. Given $X_1, \ldots, X_{n+1}$, if $X_1 \times \cdots \times X_n$ exists show that $(X_1 \times \cdots \times X_n) \times X_{n+1}$ is a model of $X_1 \times \cdots \times X_{n+1}$.] State the dual result for coproducts.

4. In any category, if $\mathrm{pr}_i: P \to X_i$ is a product of $(X_i \mid i \in I)$ and if $f: Q \to P$ is an isomorphism, show that $\mathrm{pr}_i f: Q \to X_i$ is also a product.

5. (i) Let $X_1$, $X_2$ be vector spaces and let $X_1 \times X_2$ be the Cartesian product set with the projection functions $\mathrm{pr}_i: X_1 \times X_2 \to X_i$ of 5. Prove that there exists a unique vector space structure on $X_1 \times X_2$ rendering $\mathrm{pr}_1$, $\mathrm{pr}_2$ linear and then show that this constructs the product of $X_1$, $X_2$ in the category Vect of Exercise 2.1.3. Now define $\mathrm{in}_i: X_i \to X_1 \times X_2$ by $\mathrm{in}_1(x) = (x, 0)$, $\mathrm{in}_2(y) = (0, y)$. Show that each $\mathrm{in}_i$ is linear and, in fact, that
$$X_1 \xrightarrow{\;\mathrm{in}_1\;} X_1 \times X_2 \xleftarrow{\;\mathrm{in}_2\;} X_2$$
is a coproduct in Vect. [Hint: $\langle f_1, f_2 \rangle (x_1, x_2) = f_1(x_1) + f_2(x_2)$.] Thus, Vect has the curious property that the same object underlies both the (finite) product and coproduct construction.
(ii) In case $I$ is infinite, show that the product vector space $(\prod X_i, \mathrm{pr}_i)$ has underlying set $\{(x_i \mid i \in I)\}$ while the coproduct vector space $(\coprod X_i, \mathrm{in}_i)$ has underlying set $\{(x_i \mid i \in I,$ and only finitely many $x_i$ are nonzero$)\}$.

6. Let C be any category with the property that for each two objects $X$, $Y$ there exists a bijection $C(X, Y) \to C(Y, X)$, $f \mapsto f^*$ with the properties $\mathrm{id}_X^* = \mathrm{id}_X$; for $f: X \to Y$, $g: Y \to Z$, $(gf)^* = f^* g^*$. Show that $\mathrm{pr}_i: P \to X_i$ is a product in C if and only if $\mathrm{pr}_i^*: X_i \to P$ is a coproduct in C.

7. Show that in Mfn, the disjoint union $C = \{(x, i) : i \in I,\ x \in X_i\}$ with projections
$$\mathrm{pr}_i(x, j) = \begin{cases} \{x\} & \text{if } i = j \\ \emptyset & \text{if } i \ne j \end{cases}$$
is a product in Mfn. Hence, the same object underlies the product and coproduct. [Hint: Use Exercise 6 with $x \in f^*(y) \Leftrightarrow y \in f(x)$.]

8. In this exercise let $X \times Y$, $\prod X_i$, and so on refer to the Cartesian product construction in Set and let $+$, $\coprod$ refer to disjoint union.
Then Pfn has products as follows: (i) The terminal object (in fact the zero object) is the empty set.
(ii) For two sets $X$, $Y$, verify that the Pfn product is $(X \times Y) + X + Y$ with $\mathrm{pr}_1: (X \times Y) + X + Y \to X$ defined by $DD(\mathrm{pr}_1) = (X \times Y) + X$, $\mathrm{pr}_1(x, y) = x$, $\mathrm{pr}_1(x) = x$, and $\mathrm{pr}_2: (X \times Y) + X + Y \to Y$ similarly.
(iii) For an arbitrary family $(X_i \mid i \in I)$ define $X_J = \prod_{i \in J} X_i$ for each nonempty subset $J$ of $I$. Define $P = \coprod (X_J \mid \emptyset \ne J \subset I)$. Define appropriate projections with respect to which $P$ is the product of $(X_i \mid i \in I)$ in Pfn.

9. Express the following flowschemes in a category with finite coproducts. [Flowscheme figures (i) and (ii) not reproduced.]

10. In this exercise let C be any category with products. We consider a simplified model of operational semantics suggested by sets and total functions. A more complex model is really required. The point of this exercise is to provide some practice with products. We limit the discussion to the semantics of assignment for the Pascal fragment of Table 1.2.1. Let $V$ be a "value" object of C equipped with three "constants" $0, 1, \bot: 1 \to V$ (where 1 is a terminal object) for "zero," "one," "undefined," and four operations $+, -, \times, \div: V \times V \to V$.
(i) Use the product $V \times V$ and Proposition 10 to describe the "successor morphism" $s: V \to V$ which, in Set, is $s(v) = v + 1$. Let $I$ be the set of all identifiers as in Table 1.2.1. Define the power $V^I$ as the "object of states." In Set an element of $V^I$ is a function $I \to V$ from identifiers to values. The evaluation of an expression $E$ takes the form of a morphism $[E]: V^I \to V$ defined inductively as follows (see Table 1.2.1). For each numeral $n$, with ordinary numerical value $N$, define
$$[n] = V^I \to 1 \xrightarrow{\;0\;} V \xrightarrow{\;s^N\;} V,$$
where $s^N$ is the $N$-fold composition of the successor morphism with itself ($s^0 = \mathrm{id}_V$, $s^1 = s$, $s^{k+1} = (s^k)s$). For each identifier $\alpha$, $[\alpha] = V^I \xrightarrow{\;\mathrm{pr}_\alpha\;} V$. Assuming $[E]$, $[F]$ have been defined then
$$[E + F] = V^I \xrightarrow{\;[[E],[F]]\;} V \times V \xrightarrow{\;+\;} V$$
and $[E - F]$, $[E \times F]$, and $[E \div F]$ are defined similarly.
(ii) Use the product property of $V^I$ to define the semantics of assigning expression $E$ to identifier $\alpha$ as a "state-transition morphism" $V^I \xrightarrow{\;[\alpha := E]\;} V^I$ which in Set would be
$$[\alpha := E](v_i) = (w_i), \qquad w_j = \begin{cases} v_j & \text{if } j \ne \alpha \\ [E](v_i) & \text{if } j = \alpha. \end{cases}$$

11. In Set, if $X_1, \ldots, X_n$ are finite and if $X_i$ has $k_i$ elements, show that $X_1 \times \cdots \times X_n$ has $k_1 \cdots k_n$ elements and so is again finite. Also, show $X_1 + \cdots + X_n$ has $k_1 + \cdots + k_n$ elements.

12. Let $X_0, X_1, X_2, \ldots$ be an infinite sequence of sets and let $x_i \ne y_i$ be distinct elements of $X_i$. For each subset $A$ of $\mathbf{N}$ define $z_A \in \prod X_i$ by
$$(z_A)_i = \begin{cases} x_i & \text{if } i \in A \\ y_i & \text{if } i \notin A \end{cases}$$
and show that $A \mapsto z_A$ is injective. This proves that $\prod X_i$ is infinite (indeed, for those readers familiar with basic facts about infinite sets, it shows $\prod X_i$ is uncountable since $\mathbf{N}$ has uncountably many subsets).

13. Let $(M, \circ, e)$ be any monoid. If $\mathrm{in}_i: X_i \to X$ is a coproduct in Pfn, show that $\mathrm{in}_i: X_i \to X$ (in the notation of Exercise 2.1.11(iii)) is a coproduct in $\mathrm{FwR}_{(M, \circ, e)}$. Thus, $\mathrm{FwR}_{(M, \circ, e)}$ has coproducts.

Notes and References for Chapter 2

Much has been written about category theory. For an elementary introduction on the level of this chapter see

M. A. Arbib and E. G. Manes, Arrows, Structures, and Functors: The Categorical Imperative, Academic Press, 1975.

Many useful texts go into more detail. The reader may wish to consult

J. Adámek, Theory of Mathematical Structures, D. Reidel, Dordrecht, 1983.
P. Freyd, Abelian Categories, Harper & Row, 1964.
H. Herrlich and G. E. Strecker, Category Theory, 2nd ed., Heldermann Verlag, Berlin, 1979.
S. Mac Lane, Categories for the Working Mathematician, Springer-Verlag, 1972.
B. Mitchell, Theory of Categories, Academic Press, 1965.

The Principle of Simple Recursion was formulated by F. W. Lawvere in the 1960s and used in a category-theoretic approach to the foundations of mathematics. For discussion and further references see

R. Goldblatt, Topoi, The Categorial Analysis of Logic, Revised Edition, North-Holland, 1984.

The observation that a wide class of loop-free flowschemes can be expressed in any category with finite coproducts is due to C. C.
Elgot in lectures given at Stevens Institute of Technology, July 1975. The isomorphisms $X \times (Y \times Z) \cong (X \times Y) \times Z$, $X \times Y \cong Y \times X$ of Exercise 2 imply that the product of $n$ objects, in any order and with any parenthesization, is of the form $X_1 \times \cdots \times X_n$ up to isomorphism. This result is more involved than meets the eye. For an introduction to so-called problems of "coherence" see the book by Mac Lane cited above.
CHAPTER 3
Partially Additive Semantics

3.1 Partial Addition
3.2 Partially Additive Categories and Iteration
3.3 The Boolean Algebra of Guards

In Section 1.5 we introduced guard functions and finite and infinite sums of partial functions and multifunctions and used these as tools to describe guarded commands and various conditional and repetitive constructs. All this was in Pfn or Mfn. The task of this chapter is to develop axioms on a general semantic category to make similar constructions possible in a wider context and to clarify the algebraic processes of manipulation and simplification as partly demonstrated by Example 1.5.31.

Section 3.1 formally axiomatizes the addition operation previously introduced for Pfn(X, Y) and Mfn(X, Y). In Section 3.2, Pfn and Mfn are generalized to "partially additive" categories C in which C(X, Y) has a partial addition as in Section 3.1 subject to appropriate axioms that relate the category structure to the addition operations. The study initiated in Section 1.5 may be pursued further in any partially additive category. All axioms on a partially additive category deal with summability and behavior of sums. It is rather surprising, then, that if $X$ is an object in a partially additive category C, no further axioms are needed to describe a subset Guard(X) of C(X, X) to play the role of the "guard functions." As proved in Section 3.3, Guard(X) is a Boolean algebra under the partial order $p \le q$ if $pq = p$.

3.1 Partial Addition

In Section 1.5 we introduced a sum operation in Mfn(X, Y) and in Pfn(X, Y). In this section we present "partially additive monoids" $(M, \sum)$ wherein $M$ is any set (abstracting $M$ = Mfn(X, Y) or $M$ = Pfn(X, Y)) and $\sum$ is a suitable sum operation on $M$.
To begin, let us pause to consider the ordinary "sum of $x_1, \ldots, x_n$." The underpinning for this is a binary operation $x + y$ which is associative ($x + (y + z) = (x + y) + z$) and commutative ($x + y = y + x$). The associativity allows us to write $x_1 + \cdots + x_n$ without parentheses (for the same reasons as discussed following 1.4.7), whereas the commutativity makes the order in which $x_1, \ldots, x_n$ are listed immaterial. For example, $y + x + z + w = w + y + x + z = w + x + y + z$ obtains from two uses of commutativity.

Sums such as those of 1.5.9 and 1.5.11 cannot be derived from a binary operation because we need infinite sums such as the formula for while-do in 1.5.27. The approach we shall adopt is to define sums of families of various size directly.

Let $M$ be a fixed set. If $I$ is a set, an $I$-indexed family in $M$ is a function $x: I \to M$. Family notation for this function is $x = (x_i \mid i \in I)$. Here we have written $x_i$ instead of $x(i)$. Whereas the function notation $x: I \to M$ makes both the domain and codomain explicit, the family notation suppresses the codomain. Family notation is thus convenient when the codomain remains fixed. The empty family is the unique $\emptyset$-indexed family (unique because $\emptyset$ is initial in Set). Two families $x: I \to M$, $y: K \to M$ are equivalent if there exists a bijection $\psi: I \to K$ with $y\psi = x$; in family notation, $y_{\psi(i)} = x_i$ for all $i \in I$. A subfamily of $(x_i \mid i \in I)$ is $(x_i \mid i \in J)$ where $J \subset I$. An $I$-indexed family is countable if $I$ is finite or denumerably infinite.

The partial addition to be axiomatized will assign an element $\sum (x_i \mid i \in I)$ of $M$ to certain $I$-indexed families in $M$. Since the semantic notions we wish to capture involve no uncountable sums, we will deal exclusively with countable families. One axiom will be an appropriate expression of the irrelevancy of breaking up a sum into subsums, for example,
$$x_1 + x_2 + x_3 + x_4 + x_5 + x_6 = x_3 + (x_1 + x_6 + x_2) + (x_4 + x_5).$$
If $I = \{1, \ldots, 6\}$, $I_a = \{3\}$, $I_b = \{1, 6, 2\}$, $I_c = \{4, 5\}$, and $J = \{a, b, c\}$ we may write this result as

1 $$\sum (x_i \mid i \in I) = \sum \bigl( \sum (x_i \mid i \in I_j) \mid j \in J \bigr).$$

Here $(I_j \mid j \in J)$ is a partition of $I$, that is, if $j \ne k$ then $I_j \cap I_k = \emptyset$ and also $I = \bigcup (I_j \mid j \in J)$. We pointedly remark that in our definition of partition, $I_j = \emptyset$ is allowed for any number of $j$ (even denumerably many $j$) as long as the partition properties just given hold. We are now ready for the formal definition.

2 Definition. A partially additive monoid is a pair $(M, \sum)$, where $M$ is a nonempty set and $\sum$ is a partial function which maps countable families in $M$ to elements of $M$ (and we say that $(x_i \mid i \in I)$ is summable if $\sum (x_i \mid i \in I)$ is defined) subject to the following three axioms:

Partition-Associativity Axiom. If $(x_i \mid i \in I)$ is a countable family and if $(I_j \mid j \in J)$ is a partition of $I$ with $J$ countable, then $(x_i \mid i \in I)$ is summable if and only if $(x_i \mid i \in I_j)$ is summable for every $j \in J$ and $(\sum (x_i \mid i \in I_j) \mid j \in J)$ is summable. In that case,
$$\sum (x_i \mid i \in I) = \sum \bigl( \sum (x_i \mid i \in I_j) \mid j \in J \bigr).$$
Unary Sum Axiom. Any family $(x_i \mid i \in I)$ in which $I$ has one element is summable and $\sum (x_i \mid i \in I) = x_j$ if $I = \{j\}$.

Limit Axiom. If $(x_i \mid i \in I)$ is a countable family and if the subfamily $(x_i \mid i \in F)$ is summable for every finite subset $F$ of $I$ then $(x_i \mid i \in I)$ is summable.

3 Example. For any two sets $X$, $Y$ the set Pfn(X, Y) with the $\sum$ of 1.5.11 is a partially additive monoid. The details are easily verified. We point out that the limit axiom is true here in a stronger form: if $(f_i \mid i \in I)$ is a countable family then it is summable if $f_i + f_j$ exists for each pair of distinct $i$, $j$ in $I$.

4 Example. For any two sets $X$, $Y$, Pfn(X, Y) can be considered as a partially additive monoid in a different way from that in Example 3. Say that a countable family $(f_i \mid i \in I)$ is overlap summable if whenever $x \in DD(f_i) \cap DD(f_j)$ then $f_i(x) = f_j(x)$, and then define $\sum' (f_i \mid i \in I)(x)$ to equal any $f_i(x)$ which is defined for that $x$. Then $\sum' = \sum$ on disjoint families but $\sum'$ is more often defined. The verification of the axioms is routine.

5 Example. For each two sets $X$, $Y$ the set Mfn(X, Y) has a partially additive monoid structure in which every countable family is summable: define $\sum (f_i \mid i \in I)(a) = \bigcup (f_i(a) \mid i \in I)$ as in 1.5.9.

Let $(M, \sum)$ be a partially additive monoid. An alternative notation for $\sum (x_i \mid i \in I)$ is $x_{i_1} + x_{i_2} + x_{i_3} + \cdots$ if $I = \{i_1, i_2, i_3, \ldots\}$. The notation gives preference to a particular enumeration of $I$ but the partition-associativity axiom and the unary sum axiom may be used to prove that equivalent families have the same sum: if $\psi: I \to I$ is any bijection then $x_{\psi(i_1)} + x_{\psi(i_2)} + x_{\psi(i_3)} + \cdots$ exists and equals $x_{i_1} + x_{i_2} + x_{i_3} + \cdots$ [just consider the partition $(\{\psi(j)\} \mid j \in I)$]. The full strength of this equivalence property is used to justify notations such as $x + y + z$ for $x, y, z \in M$. Such exists if and only if $\sum (x_i \mid i \in I)$ exists with $I$ a three-element set and $\{x_1, x_2, x_3\} = \{x, y, z\}$, but subject to this it does not matter how $I$ and the $x_i$ are chosen.
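The two sums on Pfn(X, Y) in Examples 3 and 4 can be compared on small cases. The sketch below is only an illustration for finite families, under the assumptions that a partial function is a Python `dict`, that the sum of 1.5.11 is the one defined exactly when domains are pairwise disjoint (as the phrase "disjoint families" in Example 4 suggests), and that `None` signals "not summable." The function names are hypothetical.

```python
def disjoint_sum(family):
    """Assumed reading of the sum of 1.5.11, for a finite list of dicts.

    Defined only when the domains are pairwise disjoint; returns None
    (our marker for "not summable") otherwise.
    """
    total = {}
    for f in family:
        if any(x in total for x in f):
            return None  # two members are defined at the same point
        total.update(f)
    return total


def overlap_sum(family):
    """The overlap sum of Example 4, for a finite list of dicts.

    Members may share points of definition provided they agree there.
    """
    total = {}
    for f in family:
        for x, y in f.items():
            if x in total and total[x] != y:
                return None  # disagreement at x: not overlap summable
            total[x] = y
    return total


f = {1: "a"}
g = {2: "b"}
h = {1: "a", 3: "c"}
```

Here `f` and `h` overlap at 1 but agree there, so the family `[f, h]` is overlap summable without being summable in the disjoint sense. The sum of the empty list is the empty dict, the totally undefined function.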
Thus, for example, the unary sum axiom may be written $\sum x = x$ for all $x \in M$.

Pfn(X, Y), as in Example 3, has a special element, the totally undefined function 0, which acts like a zero in that if $x_{i_1} + x_{i_2} + x_{i_3} + \cdots$ exists then so does the sum with 0's arbitrarily interspersed, for example, $x_{i_1} + 0 + x_{i_2} + 0 + 0 + x_{i_3} + x_{i_4} + \cdots$. The remarks made above make it clear that order is never important in a sum so that, in effect, the zero property of 0 is expressed by saying that whenever $(x_i \mid i \in I)$ is summable then $\sum (x_i \mid i \in I) + 0 + 0 + 0 + \cdots$ (for any countable number of 0's) exists and equals $\sum (x_i \mid i \in I)$. The following result establishes such a 0 for any partially additive monoid.

6 Theorem. Let $(M, \sum)$ be any partially additive monoid and let $!: \emptyset \to M$ be the empty family in $M$. Then $0 = \sum !$ exists. Furthermore, if $(x_i \mid i \in I)$ is any summable family, if $J$ is any countable set disjoint from $I$, and if $x_i = 0$ for $i \in J$ then $\sum (x_i \mid i \in I \cup J)$ exists and equals $\sum (x_i \mid i \in I)$.

PROOF. First observe a principle of interest in its own right:
7 Any subfamily of a summable family is summable.

The proof of 7 is immediate from the partition-associativity axiom since if $(x_i \mid i \in I)$ is summable and $K \subset I$ define $J = \{1, 2\}$, $I_1 = K$, $I_2 = I - K$ so that $(I_j \mid j \in J)$ partitions $I$ and so $(x_i \mid i \in K) = (x_i \mid i \in I_1)$ is summable. Since the empty family is a subfamily of any family, the proof that $\sum !$ exists is clear once some family is seen to be summable. This is clear from the unary sum axiom since $M$ is nonempty. Now let $(x_i \mid i \in I)$ be summable, let $J$ be a countable set disjoint from $I$ and let $x_i = 0$ for $i \in J$. For $j \in I \cup J$ define
$$I_j = \begin{cases} \{j\} & \text{if } j \in I \\ \emptyset & \text{if } j \in J. \end{cases}$$
Then $(I_j \mid j \in I \cup J)$ partitions $I$. By the partition-associativity axiom $\sum (\sum (x_i \mid i \in I_j) \mid j \in I \cup J)$ exists and equals $\sum (x_i \mid i \in I)$. But $\sum (x_i \mid i \in I_j)$ is just $x_j$ if $j \in I$ (by the unary sum axiom) and is 0 if $j \in J$, so this equality is just the statement of the theorem. □

Ordinary numerical addition for integers includes situations such as $(-1) + 1 = 0$. There are no such additive inverses for partially additive monoids: if $x + y = 0$ then necessarily $x = y = 0$, as follows.

8 Proposition. Let $(M, \sum)$ be a partially additive monoid, and let $\sum (x_i \mid i \in I) = 0$. Then each $x_i = 0$.

PROOF. By partition associativity, for each $i \in I$, $x_i + \sum (x_j \mid j \ne i) = \sum (x_i \mid i \in I) = 0$, so it suffices to prove that $x = 0$ if $x + y = 0$. This follows from partition associativity by
$$x = x + 0 + 0 + 0 + \cdots = x + (y + x) + (y + x) + (y + x) + \cdots = (x + y) + (x + y) + (x + y) + \cdots = 0.$$ □

EXERCISES FOR SECTION 3.1

1. Let $(M, \sum)$ be any partially additive monoid. Let $\infty$ be any new element not already in $M$ and set $\bar{M} = M \cup \{\infty\}$. For countable families $(x_i \mid i \in I)$ in $\bar{M}$ define
$$\bar{\sum}\, x_i = \begin{cases} \sum x_i & \text{if each } x_i \in M \text{ and } \sum x_i \text{ exists} \\ \infty & \text{else.} \end{cases}$$
Show that $(\bar{M}, \bar{\sum})$ is a partially additive monoid. This demonstrates that any partially additive monoid may be extended to one in which every countable family has a sum by adding a single element.

2. Consider the construction of Exercise 1 for Pfn(X, Y) with the overlap sum of Example 4.
Interpret $\infty$ as an "overdefined element."
3. Let $(M, +, 0)$ be a monoid, as in Definition 2.1.11, such that $x = y = 0$ whenever $x + y = 0$. Say that a countable family $(x_i \mid i \in I)$ is summable if $\{i \mid x_i \ne 0\}$ is finite, in which case $\sum x_i$ is defined to be the ordinary sum of the nonzero $x_i$, the empty sum being defined as 0. Show that $(M, \sum)$ is a partially additive monoid. Where do you need the assumption that $x + y = 0$ implies $x = y = 0$?

4. Let $M$ be any set and let $0 \in M$ be any element. For any countable family $(x_i \mid i \in I)$ define $\sum x_i = 0$. Show that $(M, \sum)$ is a partially additive monoid if and only if $M = \{0\}$.

3.2 Partially Additive Categories and Iteration

In this section we introduce an axiomatic class of semantic categories which plays an important role in this book. The object is to equip a category C with partially additive monoid structures on each of the sets C(X, Y) and impose axioms so that the constructions of Section 1.5 for Pfn and Mfn can be done in C. Proposition 1.5.15 and Corollary 1.5.16, the distributive law of composition over sum in Mfn and Pfn, immediately motivate the following definition:

1 Definition. Let C be a category. A partially additive structure $\sum$ on C is an assignment of a partially additive monoid structure $\sum_{X,Y}$ making $(C(X, Y), \sum_{X,Y})$ a partially additive monoid for each pair $X$, $Y$ of C-objects, subject to the distributivity of composition over sum, that is, for all $f: W \to X$, $h: Y \to Z$ and for all summable families $(g_i \mid i \in I)$ in C(X, Y), $(g_i f \mid i \in I)$ and $(h g_i \mid i \in I)$ are also summable and
$$\Bigl( \sum_{X,Y} g_i \Bigr) f = \sum_{W,Y} (g_i f), \qquad h \Bigl( \sum_{X,Y} g_i \Bigr) = \sum_{X,Z} (h g_i).$$
We do not exclude the case $I = \emptyset$. See the proof of Proposition 4 below. Henceforth, we will usually write $\sum g_i$ for the more tedious $\sum_{X,Y} g_i$ (as we did in Section 1.5) unless special emphasis is required.

2 Examples.
As already observed, 1.5.9 gives Mfn a partially additive structure whereas 1.5.11 provides Pfn with a partially additive structure. It is easily verified that the overlap sum of Example 3.1.4 is an alternate partially additive structure on Pfn. Hence, a category may have more than one partially additive structure.

3 Example. The sum of 1.5.9 is not a partially additive structure on ANMfn. Let $w \in W$, $f_1, f_2: W \to X$, and $g: X \to Y$ be such that $f_i(w) = \{x_i\}$ with
$g(x_1) = \emptyset$ and $g(x_2) \ne \emptyset$. Then $(g f_1 + g f_2)(w) = \emptyset \cup g(x_2) \ne \emptyset$, whereas $(f_1 + f_2)(w) = \{x_1, x_2\}$ so that $(g(f_1 + f_2))(w) = \emptyset$.

4 Proposition. A category admitting a partially additive structure has zero morphisms.

PROOF. Let $\sum$ be a partially additive structure on C and let $0_{XY} \in C(X, Y)$ be the $\sum$-sum of the empty family in C(X, Y) as in 3.1.6. Given $f: W \to X$, if $(g_i \mid i \in \emptyset)$ is the empty family in C(X, Y), $(g_i f \mid i \in \emptyset)$ is the empty family in C(W, Y) so that
$$0_{XY} f = \bigl( \sum (g_i \mid i \in \emptyset) \bigr) f = \sum (g_i f \mid i \in \emptyset) = 0_{WY}.$$
Similarly, for $g: Y \to Z$, $g\,0_{WY} = 0_{WZ}$. We thus have zero morphisms as required by Definition 2.2.16. □

5 Notation. Paralleling 1.5.10, if $(f_i \mid i \in I)$ is a summable family in C(X, Y) in a category C with partially additive structure $\sum$, the following flowscheme notation is introduced for $\sum f_i$. [Flowscheme figure not reproduced.]

Given a category with partially additive structure, what axioms should be imposed to justify the intuitions suggested by our flowscheme notations? First of all we should require the existence of sufficiently many coproducts so as to have available the theory of 2.3.20-25. For reasons of technical hindsight we will in fact require that every countable family of objects has a coproduct. We shall then require two axioms, both of which guarantee that certain families are summable. (Hence, in any situation such as Mfn where all families are summable, these axioms will necessarily hold.) We begin with some general definitions:

6 Definition. Let C be a category with countable coproducts and zero morphisms, and let $(X_i \mid i \in I)$ be a countable family of C-objects.
(i) For $J \subset I$, the quasi projection $\mathrm{PR}_J: \coprod (X_i \mid i \in I) \to \coprod (X_i \mid i \in J)$ is defined by
$$\mathrm{PR}_J\,\mathrm{in}_i = \begin{cases} \mathrm{in}_i & i \in J \\ 0 & i \notin J. \end{cases}$$
If $J = \{j\}$ we write $\mathrm{PR}_j$.
(ii) The diagonal-injection $\Delta: \coprod (X_i \mid i \in I) \to I \cdot \coprod (X_i \mid i \in I)$ is defined by $\Delta\,\mathrm{in}_i = \mathrm{in}_i\,\mathrm{in}_i$ (recall that $I \cdot X$ is the coproduct of $I$ copies of $X$; see Definition 2.3.14).
(iii) When all $X_i = X$, $\coprod X_i = I \cdot X$, we have the special line tying morphism $\sigma: I \cdot X \to X$ defined by $\sigma\,\mathrm{in}_i = \mathrm{id}_X$.

For flowschemes, we use $I = \{1, 2\}$ as an example, the general case being amply suggested by these. [Flowscheme figures for $\mathrm{PR}_1$, $\Delta$, and $\sigma$ not reproduced.]

In the flowscheme for $\Delta$ we are implicitly using the natural isomorphism $(X_1 + X_2) + (X_1 + X_2) \cong X_1 + X_2 + X_1 + X_2$. See Exercise 1, and the comment on coherence in the notes to Chapter 2.
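In Pfn the three morphisms of Definition 6 have simple descriptions on tagged elements. The sketch below is a hedged illustration only: it assumes the disjoint-union model in which an element of $\coprod X_i$ is a pair `(x, i)`, it uses `None` to stand for "undefined," and the function names are our own rather than the text's.

```python
UNDEFINED = None  # assumed marker for "the partial function is undefined here"


def quasi_projection(J):
    """PR_J in Pfn: keep (x, i) when i is in J, otherwise undefined.

    This mirrors PR_J in_i = in_i for i in J and PR_J in_i = 0 (the
    totally undefined function) for i not in J.
    """
    def pr(tagged):
        x, i = tagged
        return (x, i) if i in J else UNDEFINED
    return pr


def delta(tagged):
    """Diagonal-injection: send (x, i) to the copy of (x, i) sitting in
    the i-th summand of I . (coproduct of the X_i), i.e. ((x, i), i)."""
    x, i = tagged
    return ((x, i), i)


def sigma(tagged):
    """Special line tying I . X -> X: forget which copy an element is in."""
    x, _ = tagged
    return x
```

For instance, `delta(("a", 2))` is `(("a", 2), 2)`, and applying `sigma` to that (with $X$ taken to be the whole coproduct) returns `("a", 2)` again.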
The reader should now pause to work Exercise 2.

The first axiom on a partially additive category, called the "compatible sum axiom" in Definition 11 below, is motivated by the following flowscheme identity induced by $f: X \to Y + Y$:

7 [Flowscheme identity not reproduced.]

Using the maps $\sigma: Y + Y \to Y$, $\mathrm{PR}_1, \mathrm{PR}_2: Y + Y \to Y$ of 6, 7 asserts that

8 $$\sigma f = (\mathrm{PR}_1 f) + (\mathrm{PR}_2 f).$$

It is clear how to generalize 7 and 8 for $f: X \to I \cdot Y$ for any countable $I$. For our axiom we will ask for a weaker statement than 8, which we state as an observation about Pfn.

9 Observation. Given $f: X \to I \cdot Y$ in Pfn with $I$ countable, $(\mathrm{PR}_i f \mid i \in I)$ is summable. Indeed this holds with either 1.5.11 or the overlap sum of 3.1.4 since if $f(x)$ is defined, $f(x)$ has form $(y, i)$ so that $DD(\mathrm{PR}_i f) \cap DD(\mathrm{PR}_j f) = \emptyset$ if $i \ne j$. Of course, 9 holds in Mfn with the sum of 1.5.9 because all families are summable.

The second axiom on a partially additive category, called the "untying axiom" in Definition 11 below, is suggested by the flowscheme [figure not reproduced] for $f + g$. It seems reasonable to allow the output lines to be "untied" to yield [figure not reproduced]. Now, at least in Pfn, the effect of untying is to say that $x \in DD(f)$ gets mapped to one copy of $Y$ and $x \in DD(g)$ gets mapped to a disjoint copy of $Y$ so that either gets mapped into $Y + Y$. The injection maps
$$Y \xrightarrow{\;\mathrm{in}_1\;} Y + Y \xleftarrow{\;\mathrm{in}_2\;} Y$$
distinguish one copy from the other. From this perspective, the untied flowchart is [figure not reproduced] which now has the same form as the original and represents $\mathrm{in}_1 f + \mathrm{in}_2 g$. We have the following:

10 Observation. In Pfn, if $f, g: X \to Y$ have disjoint domains of definition then $\mathrm{in}_1 f, \mathrm{in}_2 g: X \to Y + Y$ do also. The proof is obvious.

We have motivated the central definition of this section:

11 Definition. A partially additive category is a category C which has countable coproducts and a partially additive structure $\sum$ which satisfies the following two axioms:
(i) Compatible Sum Axiom. If $(f_i \mid i \in I)$ is a countable family in C(X, Y) and if there exists $f: X \to I \cdot Y$ with $\mathrm{PR}_i f = f_i$ for each $i$ (we say the $f_i$ are compatible) then $(f_i \mid i \in I)$ is summable.
(ii) Untying Axiom. If $f, g: X \to Y$ are summable then also $\mathrm{in}_1 f, \mathrm{in}_2 g: X \to Y + Y$ are summable.

12 Example. Pfn is partially additive with $\sum$ as in 1.5.11. This was established in 1.5.16, 2.3.18, 9, and 10.

13 Example. Mfn is partially additive with $\sum$ as in 1.5.9. See 1.5.15 and 2.3.18.

The terminology of 11 is a bit off-kilter at the current stage because it appears possible that whether or not a category "is" partially additive depends on which partially additive structure $\sum$ is under consideration. We now prove, however, that if C with $\sum$ is a partially additive category then the converse of the compatible sum axiom holds so that summability is completely determined by the categorical structure of coproducts and (necessarily unique) zero maps, and further that the sum is determined by the category itself. Thus, a category can be partially additive for at most one $\sum$. This is Theorem 18 below. En route we need some preliminary results.

To begin, consider the map $\Delta: X_1 + X_2 \to (X_1 + X_2) + (X_1 + X_2)$ of 6. Identifying $(X_1 + X_2) + (X_1 + X_2)$ with $X_1 + X_2 + X_1 + X_2$ as discussed
following 6, we have the flowschemes [figures for $\Delta$ and $\sigma$ not reproduced] whose composition $\sigma \Delta$ should surely be $\mathrm{id}_{X_1 + X_2}$. A rigorous proof of the general statement is easily established with no appeal to the assertion about parenthesization of coproducts:

14 Proposition. In any category with countable coproducts and zero morphisms, let $(X_i \mid i \in I)$ be a countable family of objects and let
$$\coprod_{i \in I} X_i \xrightarrow{\;\Delta\;} I \cdot \coprod_{i \in I} X_i, \qquad I \cdot \coprod_{i \in I} X_i \xrightarrow{\;\sigma\;} \coprod_{i \in I} X_i$$
be the morphisms of 6. Then $\sigma \Delta = \mathrm{id}$.

PROOF. We have $\sigma \Delta\,\mathrm{in}_j = \sigma\,\mathrm{in}_j\,\mathrm{in}_j = \mathrm{id}\,\mathrm{in}_j$. Thus, $\sigma \Delta = \mathrm{id}$ when preceded by $\mathrm{in}_j$ for each $j$ in $I$, and hence the two are equal by the uniqueness property in the definition of coproducts. □

The next result will find frequent application. We use the morphisms of 6 without comment from now on.

15 Proposition. Given $f: X \to \coprod (Y_i : i \in I)$ in a partially additive category, there exists a unique family $f_i: X \to Y_i$ with $f = \sum \mathrm{in}_i f_i$, namely, $f_i = \mathrm{PR}_i f$.

PROOF. The family $(\mathrm{in}_i f_i)$ is compatible, because $\Delta f: X \to I \cdot \coprod Y_j$ satisfies the compatibility condition of 11(i):
$$\mathrm{PR}_j \circ (\Delta f) = \mathrm{in}_j f_j.$$
To see this, first note that the square $\mathrm{PR}_j \circ \Delta = \mathrm{in}_j \circ \mathrm{PR}_j$ commutes for each $j$, since the two paths are obviously equal (using 6) when preceded by $\mathrm{in}_i$ for each $i$ in $I$. Thus, $\mathrm{PR}_j \circ \Delta \circ f = \mathrm{in}_j \circ \mathrm{PR}_j \circ f = \mathrm{in}_j \circ f_j$ as was claimed. Hence, $\sum \mathrm{in}_j f_j$ is defined, by the compatible sum axiom.

Noting that $\mathrm{PR}_j \circ \mathrm{id} = \mathrm{PR}_j$, we see that the $\mathrm{PR}_j: I \cdot \coprod Y_j \to \coprod Y_j$ are also summable by the compatible sum axiom. But if $\sum \mathrm{PR}_j$ exists, then the untying axiom tells us that $\sum \mathrm{in}_j \mathrm{PR}_j: I \cdot \coprod Y_j \to I \cdot \coprod Y_j$ also exists. We have
$$\Bigl( \sum_j \mathrm{in}_j \mathrm{PR}_j \Bigr) \mathrm{in}_i = \sum_j \mathrm{in}_j \mathrm{PR}_j\,\mathrm{in}_i = \mathrm{in}_i$$
since $\mathrm{PR}_j\,\mathrm{in}_i = 0$ for $i \ne j$, and zero summands "drop out." This establishes $\sum \mathrm{in}_j \mathrm{PR}_j = \mathrm{id}$ since these morphisms are equal when preceded by $\mathrm{in}_i$ for each $i$ in $I$. As $\sigma \Delta = \mathrm{id}$, by 14, we have that
$$f = \sigma \Delta f = \sigma(\mathrm{id}) \Delta f = \sigma \Bigl( \sum \mathrm{in}_j \mathrm{PR}_j \Bigr) \Delta f = \sum (\sigma\,\mathrm{in}_j)(\mathrm{PR}_j \Delta) f = \sum (\mathrm{id})(\mathrm{in}_j \mathrm{PR}_j) f = \sum \mathrm{in}_j (\mathrm{PR}_j f).$$
Thus, $f = \sum \mathrm{in}_j f_j$, as was to be shown. For uniqueness, note that if also $f = \sum \mathrm{in}_j g_j$, then $\mathrm{PR}_i f = \sum_j \mathrm{PR}_i\,\mathrm{in}_j g_j = g_i$. □

16 Corollary. In a partially additive category, for each countable family $(X_i \mid i \in I)$ of objects, $\sum \mathrm{in}_i \mathrm{PR}_i$ exists and $\sum \mathrm{in}_i \mathrm{PR}_i = \mathrm{id}: \coprod X_i \to \coprod X_i$.

In Example 3 we observed that the obvious $\sum$ fails to be a partially additive structure on ANMfn. In fact no $\sum$ makes ANMfn a partially additive category:

17 Example. ANMfn is not partially additive. Coproducts exist as disjoint unions as discussed in 2.3.18. Let $X$ be any nonempty set. Then the injections $\mathrm{in}_i: X \to X + X$ and quasiprojections $\mathrm{PR}_i: X + X \to X$ are determined by the category and do not depend on any sum operation. If $t: \{y\} \to X + X$ satisfies $t(y) = X + X$ then $\mathrm{PR}_i t = 0: \{y\} \to X$ for $i = 1, 2$ since each $\mathrm{PR}_i$ maps some element of $X + X$ to $\emptyset$. Thus, if ANMfn were partially additive we would have $\mathrm{in}_1 \mathrm{PR}_1 t + \mathrm{in}_2 \mathrm{PR}_2 t = \mathrm{in}_1 0 + \mathrm{in}_2 0 = 0 + 0 = 0$ whereas by 16 $(\mathrm{in}_1 \mathrm{PR}_1 + \mathrm{in}_2 \mathrm{PR}_2) t = t \ne 0$, and this violates the distributivity of composition over sum.

We next see that a category can be partially additive in at most one way.
82 3 Partially Additive Semantics In particular, Pfn is partially additive for the disjoint sum but not for the overlap sum. 18 Theorem. The addition operation of a partially additive category is unique as follows: if C is a partially additive category, then a family (hli E I) in C(X, Y) is summable if and only if it is compatible. In that case, the f: X -----+ I· Y with PRJ = h is unique, and 19 PROOF. If "ih exists, then f "i in;h exists by the untying axiom. Then PRjf = "i(PR)n;hl iE I) = h· = This shows that summable families are compatible. It is immediate from 15 that if PRJ = PRig for each i, then f over, if PRJ = L then = g. More- o Note that the formula 19 for the sum established 8 in general. The next result generalizes 1.5.18(b). 20 Corollary. Given h: X -? Y, g;: Y exists, then so too does "i 9;h. -? Z in a partially additive category if "ih PROOF. If (h liE I) is summable then it is compatible, by 15; that is, there exists f: X -----+ I· Y with f = "i in;h. Now define g: I· Y -----+ Z by go in; = g;. Then o Definitions such as if A then f else 9 in 1.5.24 and while A do f in 1.5.27 for Pfn and Mfn can not be stated in a partially additive category without first generalizing the guard functions incA of 1.5.20, as will be done in the next section. Surprisingly, however, we need not wait to define conditional and iterative constructions which, as will be seen in the next section, do generalize the if-then-else and while-do constructions. Following 2.3.4 and 2.3.25 we have the following: 21 Definition. In any category with finite coproducts, the generalized conditional of a morphism of form t: X -? Y is at: X -? Y with a: Y + Y -----+ Yas in 6, that is, (Jin! = id y = ain 2 • An appropriate flowscheme is 22 Examples. In Pfn, given A c X, f, g: X provided t: X -----+ Y + Y is defined by -? Y, if A then f else 9 = (Jt
83 3.2 Partially Additive Categories and Iteration DD(t) = (A n DD(f) u (A' n DD(g» f(x) = {f(X) g(x) xEA x¢A. A similar construction works in Mfn. There is a partially additive version of 1.5.24 for the generalized conditional: 23 Proposition. Given t: X ----+ Y + Y is a partially additive category, t = inl t 1 + in2 t2 as in 15, the generalized conditional is given by (J( if = tl + t2 The reader should observe that the flows cherne form of proposition 22 is . precisely 7. In the same vein, a morphism of form f: X ----+ X + Y induces a "generalized while-do" which we call the iterate of f and denote by a dagger superscript ft: X --+ Y. Intuitively, ft is "repeat f until in Y." The formal definition results from the following theorem which makes crucial use of the limit axiom in Definition 3.1.2. 24 Theorem. Given f: X ----+ X + Y in a partially additive category, write f = in 1 fl + in2 f2 for unique fl: X --+ X, f2: X -+ Y as in Proposition 15. Then the sum 00 L fdt: X --+ Y n=O ft = exists. We call it the iterate off and denote it with the flows cherne ! PROOF. For any g: X --+ X {!Jr---=X::c..-...Jy Y the sum gfl + f2 exists. To see this, consider X ~X and use the fact that f +Y (g,id y ) I Y = in l fl + in2 f2 to derive <g,idy>f= <g,idy>{indl +ind2) = <g, idy>indl + <g,idy)in2f2 = gfl + f2 Setting g = f2' f2fl + f2 exists. Proceeding inductively, if g = f2ft +
$f_2 f_1^{n-1} + \cdots + f_2 f_1 + f_2$ exists, then so does

$(f_2 f_1^n + f_2 f_1^{n-1} + \cdots + f_2 f_1 + f_2) f_1 + f_2 = f_2 f_1^{n+1} + f_2 f_1^n + \cdots + f_2 f_1 + f_2.$

Now let $F$ be any finite subset of $I = \{0, 1, 2, \ldots\}$. There exists $n$ with $F \subset \{0, \ldots, n\}$. As $(f_2 f_1^i : 0 \le i \le n)$ is summable and as any subfamily of a summable family is summable by 3.1.7, $(f_2 f_1^i : i \in F)$ is summable. That $f^\dagger$ exists then follows from the limit axiom. □

25 Example. In Pfn, for $f: X \to X + Y$, if $x, f(x), f(f(x)), \ldots, f^{n-1}(x)$ are defined and in $X$ and then $f^n(x)$ is defined and in $Y$, $f^\dagger(x) = f^n(x)$. Such $n$ is clearly unique if it exists. If no such $n$ exists, $f^\dagger(x)$ is undefined. For $A \subset X$, $f: X \to X$, while A do f is $g^\dagger$ if $g: X \to X + X$ is defined by

$g(x) = \begin{cases} (f(x), 1) & \text{if } x \in DD(f) \cap A \\ (x, 2) & \text{if } x \notin A. \end{cases}$

In this case

$g^\dagger = \sum_{n=0}^{\infty} inc_{A'} (f\, inc_A)^n,$

which is just the formula for while A do f of 1.5.27.

In Chapter 8 we will discuss the semantics of recursion in a partially additive category. Our more immediate objectives are to introduce guards so as to deal directly with conditional and repetitive constructs in the form they were introduced in Section 1.5 and to discuss assertion semantics and proof rules in this context. This program is begun in the next section and continued in Chapter 4.

EXERCISES FOR SECTION 3.2

1. Construct an isomorphism $(W + X) + (Y + Z) \cong W + X + Y + Z$ in any category with finite coproducts. [Hint: define four injections rendering $(W + X) + (Y + Z)$ a model of $W + X + Y + Z$.]

2. Give explicit descriptions of PRJ> ~, and [) in Pfn and convince yourself that the flowschemes of Definition 6 are aptly chosen. Observe that the same partial functions considered in Mfn provide these constructions in Mfn.

3. Prove that Set is not partially additive.

4. Prove that Vect is not partially additive. [Hint: All finite families are compatible and this forces $(f + g)(x) = f(x) + g(x)$.]

5.
Let C be a partially additive category and let $Y$ be an object of C such that for all $X$ and for all $f, g \in C(X, Y)$, $f + g$ is defined. Prove that $Y \leftarrow Y + Y \rightarrow Y$ is a product in C.
6. Show that for $h: X \to X + Y$ in Pfn, there exist $A \subset X$, $f: X \to X$, and $g: X \to Y$ such that $h^\dagger = g \circ (\text{while } A \text{ do } f)$.

7. Show that while A do f in Mfn as in 1.5.27 has form $h^\dagger$ but that, in contrast to Pfn in Exercise 6, there exists $h: X \to X + Y$ in Mfn such that $h^\dagger$ is not of the form $g \circ (\text{while } A \text{ do } f)$. Thus, the iterate is truly more general than while-do.

8. Show that in both Pfn and Mfn, repeat f until A has the form $h^\dagger$.

9. Show that the following holds in any partially additive category.

[Flowscheme identity; figure not reproduced.]

10. Show that the following holds in any partially additive category.

[Flowscheme identity; figure not reproduced.]

11. For any monoid $(M, \circ, e)$, show that the category $Pfn_{(M,\circ,e)}$ of Exercises 2.1.11, 2.2.12, 2.3.13 is partially additive. [Hint: $Pfn_{(M,\circ,e)}(X, Y) = Pfn(X, M \times Y)$ is a partially additive monoid with the sum of 1.5.11.]

3.3 The Boolean Algebra of Guards

Pascal allows a statement of the form

if x > 0 and not x = 5 then S1 else S2.

If S1, S2 are to be semantically interpreted as morphisms $X \to Y$ in a category C, how can "P and not Q" be interpreted, where P is "x > 0" and Q is "x = 5"? In categories such as Pfn where objects are sets, "propositions" such as P, Q are tantamount to subsets. For example, "x > 0" may be identified with the subset of all x for which x > 0. Under this identification, the logical operations and, or, and not correspond to intersection, union, and complement in the set of subsets of the set of possible values of x. Equivalently, they may be represented by guard functions as in 1.5.20.

In this section, we first explore the set of subsets of a set, showing that its poset structure determines its logical structure as a "Boolean algebra." We then show that for an object $X$ in a partially additive category C, there is a subset Guard(X) of C(X, X) of "guard morphisms" which forms a Boolean algebra. This is a pleasant surprise since the axioms defining a partially additive category were not designed with this in mind: Guard(X) comes for free!
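The identification of propositions with subsets, and of subsets with guard functions, can be tried out in Pfn over a small finite set. The following is a minimal sketch under stated assumptions, not the book's construction verbatim: partial functions are modeled as Python dicts, and the helper names `inc`, `compose`, and `psum` are introduced here purely for illustration. The disjoint-sum form used for "or" is the expression $p + p'q$ that the section later establishes for the supremum of two guards.

```python
# Sketch (assumed encoding): a partial function X -> X is a dict;
# the guard inc_A is the identity restricted to the subset A.
X = set(range(-3, 9))

def inc(A):
    """Guard function inc_A: defined exactly on A, where it is the identity."""
    return {x: x for x in A}

def compose(g, f):
    """(g . f)(x) = g(f(x)), defined only where both steps are defined."""
    return {x: g[f[x]] for x in f if f[x] in g}

def psum(f, g):
    """Disjoint-domain sum of Pfn; returns None when the domains overlap."""
    if set(f) & set(g):
        return None
    return {**f, **g}

P = {x for x in X if x > 0}    # the proposition "x > 0"
Q = {x for x in X if x == 5}   # the proposition "x = 5"
p, q, not_q = inc(P), inc(Q), inc(X - Q)

# "P and not Q": composition of guards corresponds to intersection.
assert compose(p, not_q) == inc(P & (X - Q))

# "P or Q": the disjoint sum p + p'q corresponds to union.
assert psum(p, compose(inc(X - P), q)) == inc(P | Q)

# A guard and its complement sum to the identity and compose to the zero map.
assert psum(q, not_q) == inc(X)
assert compose(q, not_q) == {}
```

Returning `None` for a non-summable pair is only one possible way to model partiality of the sum; raising an exception would serve equally well.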
We begin by abstracting familiar operations in the poset $(\mathcal{P}(X), \subset)$ of subsets of $X$ (2.1.8) to an arbitrary poset.

1 Definition. Let $(P, \le)$ be a poset. A least element of $(P, \le)$ is an element $0 \in P$ such that $0 \le x$ for all $x \in P$. At most one least element exists since if $z$ is also a least element then $0 \le z$ and $z \le 0$ so that $z = 0$. A greatest element of $(P, \le)$ is an element $1 \in P$ such that $x \le 1$ for all $x \in P$. Again, antisymmetry yields that there is at most one greatest element. (It is reasonable to have chosen the same notations as for initial and terminal objects in a category; see Exercise 1.)

2 Example. $(\mathcal{P}(X), \subset)$ has $\emptyset$ as least element and has $X$ as greatest element.

"Intersection" and "union" generalize to posets by observing that the intersection of two sets is the largest set contained in both of them whereas the union of two sets is the smallest set containing both of them. Formally, we have the following:

3 Definition. Let $(P, \le)$ be a poset with $x, y \in P$. An element $w \in P$ is the infimum (or greatest lower bound or meet) of $x, y$ if
(i) $w \le x$ and $w \le y$; and
(ii) whenever $a \le x$ and $a \le y$, then $a \le w$.
Such $w$ is unique when it exists since if $u$ also satisfies (i) and (ii) then $w \le u$ and $u \le w$. We write the infimum of $x, y$ as $x \wedge y$ when it exists. A supremum (or least upper bound or join) of $x, y$ is an element $z$ of $P$ satisfying
(iii) $x \le z$ and $y \le z$; and
(iv) whenever $x \le a$ and $y \le a$, then $z \le a$.
Again, such $z$ is unique if it exists, in which case we write it as $x \vee y$. A poset in which $x \wedge y$ exists for all $x, y$ is called a meet-semilattice. A poset in which $x \vee y$ exists for all $x, y$ is called a join-semilattice. A lattice is a poset in which both $x \wedge y$ and $x \vee y$ exist for all $x, y$.

4 Example. $(\mathcal{P}(X), \subset)$ is a lattice with $A \wedge B = A \cap B$ and $A \vee B = A \cup B$.

5 Proposition. In a meet-semilattice,
(i) $x \wedge y = y \wedge x$.
(ii) If $x \le x_1$, $y \le y_1$ then $x \wedge y \le x_1 \wedge y_1$.
(iii) $x \wedge (y \wedge z) = (x \wedge y) \wedge z$.
In a join-semilattice,
(iv) $x \vee y = y \vee x$.
(v) If $x \le x_1$, $y \le y_1$ then $x \vee y \le x_1 \vee y_1$.
(vi) $x \vee (y \vee z) = (x \vee y) \vee z$.

PROOF. That (i) holds is obvious since the order in which $x, y$ are listed is immaterial in Definition 3. To prove (ii) and (iii) we use the axioms on $\le$ and 3 as follows. If $x \le x_1$ and $y \le y_1$ then $x \wedge y \le x$ and $x \le x_1$ so $x \wedge y \le x_1$; similarly, $x \wedge y \le y_1$. Thus, $x \wedge y \le x_1 \wedge y_1$ by 3 (ii). The proof of (iii) is longer. As $x \le x$ and $y \wedge z \le y$ it follows from (ii) already proved that $x \wedge (y \wedge z) \le x \wedge y$. Furthermore, $x \wedge (y \wedge z) \le y \wedge z \le z$ so $x \wedge (y \wedge z) \le z$. Thus, $x \wedge (y \wedge z) \le (x \wedge y) \wedge z$. For the reverse inequality, $(x \wedge y) \wedge z \le x \wedge y \le x$ and, as $x \wedge y \le y$ and $z \le z$, $(x \wedge y) \wedge z \le y \wedge z$ so $(x \wedge y) \wedge z \le x \wedge (y \wedge z)$. By antisymmetry, (iii) follows. The proof of (iv), (v), and (vi) for join-semilattices is similar. □

Because of (iii) and (vi) in 5 we may use parentheses-free notation for n-fold meets and joins:

$x_1 \wedge \cdots \wedge x_n, \qquad x_1 \vee \cdots \vee x_n.$

6 Proposition. Let $(P, \le)$ be a poset and let $x \in P$. Then:
(i) $x \wedge x$, $x \vee x$ exist and $x \wedge x = x = x \vee x$.
(ii) If the least element 0 of $(P, \le)$ exists, $0 \wedge x$, $0 \vee x$ exist and $0 \wedge x = 0$, $0 \vee x = x$.
(iii) If the greatest element 1 of $(P, \le)$ exists, $1 \wedge x$, $1 \vee x$ exist and $1 \wedge x = x$, $1 \vee x = 1$.

PROOF. We prove the statements involving joins, leaving the remaining results for the reader since the proofs are similar. For (i), $x \le x$ and $x \le x$, and if $x \le a$ and $x \le a$, then $x \le a$. Thus, $x$ satisfies the requirements for $x \vee x$ in 3. As $0 \le x$ and $x \le x$ and if $0 \le a$ and $x \le a$ then $x \le a$, $0 \vee x = x$. Finally, $1 \le 1$ and $x \le 1$ and if $1 \le a$ and $x \le a$ then $1 = a$ by antisymmetry since certainly $a \le 1$, so $1 \vee x = 1$. □

If $A$ is a subset of $X$, the only subset $S$ of $X$ satisfying $A \cap S = \emptyset$, $A \cup S = X$ is $S = A' = X - A$, the complement of $A$.
This suggests the following.

7 Definition. Let $(P, \le)$ be a poset with least element 0 and greatest element 1. Let $x \in P$. A complement of $x$ is an element $x'$ for which $x \wedge x'$, $x \vee x'$ exist and

$x \wedge x' = 0, \qquad x \vee x' = 1.$
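Because meets, joins, and complements are all determined by the order alone, they can be computed by brute force in a small finite poset. The sketch below is an illustration under assumptions made here (the function names `powerset`, `meet`, `join`, and `complements` are hypothetical helpers, and the poset is the set of subsets of a three-element set); it recovers the set-theoretic complement directly from Definitions 3 and 7.

```python
from itertools import combinations

def powerset(X):
    """All subsets of X as frozensets."""
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def meet(P, leq, x, y):
    """Greatest lower bound of x, y in (P, leq), or None if there is none."""
    lower = [w for w in P if leq(w, x) and leq(w, y)]
    best = [w for w in lower if all(leq(a, w) for a in lower)]
    return best[0] if best else None

def join(P, leq, x, y):
    """Least upper bound of x, y in (P, leq), or None if there is none."""
    upper = [z for z in P if leq(x, z) and leq(y, z)]
    best = [z for z in upper if all(leq(z, a) for a in upper)]
    return best[0] if best else None

def complements(P, leq, x, bottom, top):
    """Every c with x /\\ c = bottom and x \\/ c = top."""
    return [c for c in P
            if meet(P, leq, x, c) == bottom and join(P, leq, x, c) == top]

X = frozenset({1, 2, 3})
P = powerset(X)
leq = lambda a, b: a <= b      # subset inclusion as the partial order

A = frozenset({1, 3})
# The only complement found is the set-theoretic one, X - A.
assert complements(P, leq, A, frozenset(), X) == [X - A]
```

The same `complements` function can be pointed at any finite poset given as a list of elements and an order predicate, which makes it easy to look for elements with no complement or with several.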
8 Example. Consider the poset with Hasse diagram

[Hasse diagram: greatest element 1, least element 0, and elements a, b, c between them; figure not reproduced.]

Here each of a, b, c has two complements.

We would like complements to be unique. A key idea is the following.

9 Definition. A lattice is distributive if

$x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$

for all $x, y, z$ in the lattice.

10 Proposition. In a distributive lattice with least and greatest elements, each element has at most one complement.

PROOF. Suppose

$x \wedge y = 0 = x \wedge z, \qquad x \vee y = 1 = x \vee z.$

Then, making free use of Propositions 5 and 6,

$y = y \wedge 1 = y \wedge (x \vee z) = (y \wedge x) \vee (y \wedge z) = y \wedge z$

so that $y \le z$. Similarly, $z \le y$. Thus, $y = z$. □

11 Example. $(\mathcal{P}(X), \subset)$ is a distributive lattice. If $x \in A \cap (B \cup C)$ then $x \in A$ and either $x \in B$ (hence $x \in A \cap B$) or $x \in C$ (hence $x \in A \cap C$) so $x \in (A \cap B) \cup (A \cap C)$. And, if $x \in (A \cap B) \cup (A \cap C)$, then either $x \in A \cap B \subset A \cap (B \cup C)$ or $x \in A \cap C \subset A \cap (B \cup C)$.

It follows that $(\mathcal{P}(X), \subset)$ is a Boolean algebra, which is defined as follows:

12 Definition. A Boolean algebra is a distributive lattice with least and greatest elements in which every element $x$ has a (necessarily unique by 10) complement $x'$.

While the axioms on a Boolean algebra as an abstraction of $(\mathcal{P}(X), \subset)$ have been well motivated, it is not clear that enough axioms have been imposed. The reader's confidence that this is in fact so will be strengthened by working Exercise 10.

We emphasize that all operations involved ($x \wedge y$, $x \vee y$, 0, 1, $x'$) are
defined in terms of the partial order and if they exist they do so uniquely. A Boolean algebra is a type of poset.

We now turn our attention to the problem of finding a subset of C(X, X) of "guard morphisms" which forms a Boolean algebra. For intuition, consider the way in which a subset $A$ of $X$ corresponds to the partial function $inc_A: X \to X$ of 1.5.19. We note that $inc_A$ inherits from $A$ the following properties:

$inc_A + inc_{A'} = 1, \qquad inc_A \cdot inc_{A'} = 0 = inc_{A'} \cdot inc_A,$

where we now write 1 for the identity function $id_X$. This motivates the following definition:

13 Definition. For $X$ an object of a partially additive category C, Guard(X) is the subset of C(X, X) comprising all morphisms $p$ for which there exists $p'$ such that $p + p'$ exists and

$p + p' = 1, \qquad pp' = 0 = p'p,$

where we take $1 = id_X$. Elements of Guard(X) are called guards on $X$.

14 Example. For both of the partially additive categories Pfn, Mfn, Guard(X) consists of all the inclusion functions $inc_A$ of 1.5.19. First consider Mfn. The equation $p + p' = id_X$ for $p, p' \in Mfn(X, X)$ yields $p(x) \cup p'(x) = \{x\}$. If both $p(x) = \{x\}$ and $p'(x) = \{x\}$ then $p(p'(x)) = \{x\}$, which contradicts $p(p'(y)) = \emptyset$ for all $y$. Thus, exactly one of $p(x)$, $p'(x)$ can equal $\{x\}$. Setting $A = \{x \in X \mid p(x) = \{x\}\}$ we see $p = inc_A$. Conversely, if $p = inc_A$, set $p' = inc_{A'}$. The proof for Pfn is essentially the same.

The object of this section, then, is to show that for each partially additive C, Guard(X) has a poset structure with respect to which it is a Boolean algebra in such a way that Guard(X) for Pfn(X, X) and Mfn(X, X) have the usual Boolean operations on subsets. In what follows, we leave implicit a partially additive category C with respect to which Guard(X) is formed for some object $X$. We begin with the following:

15 Proposition. For $p$ in Guard(X), the $p'$ in the equations

$p + p' = 1, \qquad pp' = 0 = p'p$

is unique. Furthermore, $p'' = p$, $0' = 1$, and $1' = 0$.
PROOF. In spirit, this is much like Proposition 10. If also

$p + q = 1, \qquad pq = 0 = qp,$

then $q = q1 = q(p + p') = qp + qp' = 0 + qp' = qp'$ so that $p' = (p + q)p' = pp' + qp' = qp' = q$. That $p'' = p$ is immediate from the symmetry of $p$ and $p'$ in the defining equations. That $0' = 1$, $1' = 0$ is clear from

$0 + 1 = 1, \qquad 0 \cdot 1 = 0 = 1 \cdot 0.$ □

We next introduce the "sum-ordering" relation which, while not necessarily antisymmetric on all of C(X, X) in general, always makes Guard(X) a poset, as shown in Theorem 20.

16 Definition. The sum-ordering relation on C(X, X) is defined by $f \le g$ if there exists $h$ such that $g = f + h$. Hence, in any partially additive category, we have $p \le 1 = id_X$ for each guard $p$.

17 Examples. For Pfn(X, X), $\le$ is the extension ordering of Example 2.1.9. If $g$ extends $f$ define $DD(h) = \{x \in DD(g) \mid x \notin DD(f)\}$ and define $h(x) = g(x)$ to get $g = f + h$. That $g$ extends $f$ if $g$ has the form $f + h$ is obvious.

18 Example. For Mfn(X, X), $f \le g$ if and only if $f(x) \subset g(x)$ for all $x$. We leave this as an exercise.

19 Proposition. The sum-ordering $\le$ on C(X, X) satisfies the following properties:
(i) $\le$ is reflexive and transitive.
(ii) If $f \le g$ then for any $t, u$, $tf \le tg$ and $fu \le gu$.
(iii) If $p$ is a guard and $f \le p$ then $pf = f = fp$, $p'f = 0 = fp'$.
(iv) If $p$ is a guard and $f \le 1$ then $pf = fp$.
(v) For $p, q$ guards, $pp = p$ and $pq = qp$.
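For a finite set, properties (ii) through (v) can be spot-checked exhaustively in Pfn(X, X). The following is a sketch under assumptions made here: partial functions are encoded as Python dicts, the sum-ordering is taken to be the extension ordering (as in Examples 17), and all helper names are hypothetical. Passing the assertions is evidence for a three-element $X$ only, not a proof.

```python
from itertools import product

X = (0, 1, 2)

def partial_functions():
    """Every partial function X -> X as a dict; None marks 'undefined'."""
    for values in product((None,) + X, repeat=len(X)):
        yield {x: v for x, v in zip(X, values) if v is not None}

def subsets():
    for bits in product((False, True), repeat=len(X)):
        yield {x for x, keep in zip(X, bits) if keep}

def inc(A):
    """Guard inc_A: the identity restricted to A."""
    return {x: x for x in A}

def compose(g, f):
    """(g . f)(x) = g(f(x)) wherever both steps are defined."""
    return {x: g[f[x]] for x in f if f[x] in g}

def leq(f, g):
    """Sum-ordering in Pfn: g = f + h for some h, i.e. g extends f."""
    return all(x in g and g[x] == f[x] for x in f)

ONE, ZERO = inc(X), {}
fs = list(partial_functions())
guards = [(inc(A), inc(set(X) - A)) for A in subsets()]  # pairs (p, p')

# (ii) composition on either side preserves the ordering
for f in fs:
    for g in fs:
        if leq(f, g):
            for t in fs:
                assert leq(compose(t, f), compose(t, g))
                assert leq(compose(f, t), compose(g, t))

for p, p_c in guards:
    for f in fs:
        if leq(f, p):      # (iii)
            assert compose(p, f) == f == compose(f, p)
            assert compose(p_c, f) == ZERO == compose(f, p_c)
        if leq(f, ONE):    # (iv)
            assert compose(p, f) == compose(f, p)
    for q, _ in guards:    # (v)
        assert compose(p, p) == p
        assert compose(p, q) == compose(q, p)
```

Enumerating all 64 partial functions keeps the check small; for larger $X$ one would sample instead of enumerating.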
Before reading the proof, readers should hone their intuition by checking that (i) through (v) do indeed hold in Pfn(X, X).

PROOF. (i) As $f = f + 0$, $f \le f$ so $\le$ is reflexive. If $g = f + w$, $h = g + v$ then by partition associativity, $h = g + v = (f + w) + v = f + (w + v)$. Thus, if $f \le g$ and $g \le h$, $f \le h$ so $\le$ is transitive.
(ii) If $g = f + h$, $tg = t(f + h) = tf + w$ for $w = th$ and $gu = fu + v$ for $v = hu$.
(iii) Write $p = f + h$. Then $0 = pp' = (f + h)p' = fp' + hp'$ so $fp' = 0$ by Proposition 3.1.8. But then $f = f(p + p') = fp + 0 = fp$. That $p'f = 0$ and $f = pf$ is similar.
(iv) Applying (ii) to $f \le 1$, $pf \le p1 = p$ and similarly $fp \le p$. By two uses of (iii) we have $pf = (pf)p = p(fp) = fp$.
(v) From $p \le p$, $pp = p$ is immediate from (iii). Since $p \le 1$, it follows from (iv) that $pq = qp$. □

20 Theorem. Consider Guard(X) with the sum-ordering $\le$. Then for any guards $p, q$

$p \le q \iff pq = p.$

Furthermore, (Guard(X), $\le$) is a poset.

PROOF. Let $p, q$ be guards. If $pq = p$ then $q = (p + p')q = pq + p'q = p + h$ for $h = p'q$, so $p \le q$. Conversely, if $p \le q$ then $pq = p$ by 19 (iii). We know $\le$ is reflexive and transitive from 19 (i). To prove antisymmetry, we note that if $p \le q$ and $q \le p$ then, using 19 (v), $p = pq = qp = q$. □

The following proposition prepares the way to prove that (Guard(X), $\le$) is a Boolean algebra.

21 Proposition. For brevity, write $G$ for Guard(X). Let $p, q \in G$. Then:
(i) Also $pq \in G$ with $(pq)' = pq' + p'q + p'q'$.
(ii) The infimum of $p, q$ in $(G, \le)$ exists and is $pq$.
(iii) $p \le q$ if and only if $q' \le p'$.

PROOF. (i) We have

$1 = 1 \cdot 1 = (p + p')(q + q') = p(q + q') + p'(q + q') = pq + (pq' + p'q + p'q').$
Making free use of 19 (v),

$pq(pq' + p'q + p'q') = pqq' + pp'q + pp'qq' = p0 + 0q + 00 = 0.$

Similarly, $(pq' + p'q + p'q')pq = 0$ so by virtue of the defining equations in 13, $pq$ is in $G$ with $(pq)' = pq' + p'q + p'q'$.
(ii) That $pq$ is in $G$ has just been established. By 19 and 20, $(pq)p = ppq = pq$ shows $pq \le p$, and $pq \le q$ similarly. If also $r \le p$ and $r \le q$ then $rp = r = rq$ so $r(pq) = (rp)q = rq = r$ and $r \le pq$. By Definition 3, $pq = p \wedge q$ in $(G, \le)$.
(iii) If $p \le q$ then $p = pq$ so by (i)

$p' = (pq)' = pq' + p'q + p'q' = (p + p')q' + p'q = q' + h, \qquad h = p'q.$

So $q' \le p'$. (We caution the reader that if $ac + bc$ exists $(a + b)c$ may not; in the above, $pq' + p'q' = (p + p')q'$ is valid because we know $p + p'$ exists.) Conversely, if $q' \le p'$ then by the result already proved, $p'' \le q''$. So, recalling 15, $p \le q$. □

We can now establish the main result of the section.

22 Theorem. For $G$ = Guard(X) for $X$ an object of a partially additive category C, with sum-ordering $\le$, $(G, \le)$ is a Boolean algebra. Furthermore,
(i) the empty sum 0 is the least element of $(G, \le)$;
(ii) $1 = id_X$ is the greatest element of $(G, \le)$;
(iii) the infimum operation is $p \wedge q = pq$;
(iv) the Boolean algebra complement of $p$ coincides with the guard complement $p'$; and
(v) the supremum operation is given by any of

$p \vee q = pq' + p'q + pq = pq' + q = p + p'q.$

PROOF. Since $0p = 0$, $p1 = p$, (i) and (ii) are clear, and (iii) has already been shown in 21. For the moment let $p'$ denote the guard complement. Given $p, q$, it follows from 21 (iii) that $(p'q')'$ is the supremum of $p, q$ as follows. As $p'q' \le p'$, $p \le (p'q')'$. Similarly, $q \le (p'q')'$. If also $p \le t$, $q \le t$ then $t' \le p'$, $t' \le q'$ so $t' \le p'q'$; hence, $(p'q')' \le t$. Using 21 (i) this shows

$p \vee q = (p'q')' = p'q + pq' + pq,$

which has alternate forms $pq' + (p' + p)q = pq' + q$ and $p + p'q$ similarly. We then have
$p \vee p' = p + p'p' = p + p' = 1, \qquad p \wedge p' = pp' = 0 = p'p = p' \wedge p.$

So $p'$ is also the lattice complement. Finally, in accordance with Definition 12, we now must prove the distributive law of 9. Indeed,

$p \wedge (q \vee r) = p(q + q'r) = pq + pq'r,$

whereas

$(p \wedge q) \vee (p \wedge r) = pq + (pq)'(pr) = pq + (pq' + p'q + p'q')(pr) = pq + pq'r + p'pqr + p'pq'r = pq + pq'r.$ □

We have now justified the following definitions which extend those of Section 1.5 to an arbitrary partially additive category.

23 Definitions. Let C be a partially additive category. An n-way test on $X$ is a summable n-tuple $(p_1, \ldots, p_n)$ with each $p_i \in$ Guard(X). If $f_1, \ldots, f_n: X \to Y$ and $(p_1, \ldots, p_n)$ is an n-way test, we define

case $(p_1, \ldots, p_n)$ of $(f_1, \ldots, f_n)$ $= f_1 p_1 + \cdots + f_n p_n.$

This sum exists by Corollary 3.2.20. This recaptures both the case statement of 1.5.22 for C = Pfn and the alternative construct 1.5.23 if C = Mfn. An important special case occurs for $p \in$ Guard(X), $f, g: X \to Y$:

if p then f else g $= fp + gp'.$

For $p \in$ Guard(X), $f: X \to X$ we also have

while p do f $= \sum_{n=0}^{\infty} p'(fp)^n,$

repeat f until p $= \sum_{n=0}^{\infty} (pf)(p'f)^n,$

generalizing 1.5.27 and 1.5.29 for C = Pfn or Mfn. For the repetitive construct of 1.5.30 let $(p_1, \ldots, p_n)$ be an n-way test on $X$ and let $f_1, \ldots, f_n: X \to X$. Then

do $p_1 \to f_1$ □ $\cdots$ □ $p_n \to f_n$ od $=$ while $p_1 \vee \cdots \vee p_n$ do $f_1 p_1 + \cdots + f_n p_n.$

EXERCISES FOR SECTION 3.3

1. For the category $C_{(P,\le)}$ of Exercise 2.1.7, show that a least element of $(P, \le)$ is the same thing as an initial object of $C_{(P,\le)}$. Then invoke duality to prove that a greatest element of $(P, \le)$ is a terminal object of $C_{(P,\le)}$.

2. In the context of Definition 3, show that the uniqueness of the supremum follows by duality. Also invoke duality for the proofs of (iv), (v), and (vi) in Proposition 5 and the "remaining results" in the proof of Proposition 6.
3. Show that "any subset of a poset is a poset." More precisely, show that if $(P, \le)$ is a poset and $P_0 \subset P$ then $(P_0, \le_0)$ is a poset if $x \le_0 y$ means that $x \le y$. Also prove that $(P_0, \le_0)$ is totally ordered if $(P, \le)$ is.

4. Let $(P, \le)$ be a poset. By Exercise 3, if $A \subset P$, $A$ is itself a poset so it is meaningful to discuss the least element or the greatest element of $A$. Let $x, y \in P$. A lower bound of $x, y$ is $z \in P$ such that $z \le x$ and $z \le y$. Let $LB(x, y)$ be the set of lower bounds of $x, y$. Similarly, let $UB(x, y) = \{z \mid x \le z \text{ and } y \le z\}$ be the set of upper bounds of $x, y$. Show that the greatest lower bound $x \wedge y$ is literally the greatest element of $LB(x, y)$ in the sense that one exists if and only if the other does and then they are equal. Similarly, show that $x \vee y$ is the least element of $UB(x, y)$.

5. By Propositions 5 and 6, if $(P, \le)$ is a meet-semilattice with greatest element 1, then $(P, \wedge, 1)$ is a monoid and two special properties hold:
Commutativity: $x \wedge y = y \wedge x$.
Idempotency: $x \wedge x = x$.
Conversely, show that if $(P, \circ, e)$ is any monoid in which commutativity and idempotency hold ($x \circ y = y \circ x$, $x \circ x = x$) then $(P, \le)$ is a meet-semilattice with greatest element if $x \le y$ is defined to mean $x \circ y = x$. In fact, show that these constructions establish a bijection between meet-semilattice-with-greatest-element structures and monoid structures on $P$. This is summarized by saying "a meet-semilattice with greatest element is the same thing as a commutative idempotent monoid."

6. In the poset with Hasse diagram

[Hasse diagram containing the elements x and y; figure not reproduced.]

show that $x \vee y$ does not exist, even though there do exist $z$ with $x \le z$, $y \le z$.

7. In any lattice, prove the absorptive laws: $x \vee (x \wedge y) = x$, $x \wedge (x \vee y) = x$, for all $x, y$.

8. In any distributive lattice prove that $x \vee (y \wedge z) = (x \vee y) \wedge (x \vee z)$. Why does this not follow from duality?

9. Let $P = \{1, 2, 3, \ldots\}$. $(P, \le)$ is totally ordered if $n \le m$ means that $m$ is numerically at least as large as $n$.
Another important partial ordering on $P$ is $n \mid m$ if $n$ divides $m$, that is, $m = an$ for some integer $a$. Verify that $(P, \mid)$ is a lattice with
$n \wedge m$ = greatest common divisor of $n, m$,
$n \vee m$ = least common multiple of $n, m$,
1 = least element (defying standard notation!),
which is not totally ordered and has no greatest element.

10. Verify the following laws in any Boolean algebra:
(i) $x'' = x$.
(ii) $x \le y$ if and only if $y' \le x'$.
(iii) (De Morgan's Laws): $(x \vee y)' = x' \wedge y'$, $(x \wedge y)' = x' \vee y'$.
(iv) Use induction to show $(x_1 \vee \cdots \vee x_n)' = x_1' \wedge \cdots \wedge x_n'$.

11. Let $(P, \le)$ be a Boolean algebra. For $p, x, y \in P$ define

if p then x else y $= (p \wedge x) \vee (p' \wedge y).$

Verify the following:
(i) $p'$ = if p then 0 else 1;
(ii) $p \vee q$ = if p then 1 else q.

12. Let $f: X \to Y$ in a partially additive category. Define $\bar{f}$, if it exists, to be the least element of $\{p \in \mathrm{Guard}(X) \mid fp = f\}$. In Pfn, show that $\bar{f}$ exists and is $inc_A$ for $A = DD(f)$. Similarly, in Mfn, prove $\bar{f} = inc_A$ for $A = \{x \in X \mid f(x) \ne \emptyset\}$. Hence, $\bar{f}$ is a candidate for a general notion of "domain of definition" for morphisms in a partially additive category.

13. If C is a partially additive category, the sum-ordering on C(X, Y) is $f \le g$ if $g = f + h$ for some $h$. By the same proof as that of 19 (i), $\le$ is reflexive and transitive.
(i) Show that $(C(X, Y), \le)$ is a poset if C is Pfn or Mfn.
(ii) In general, define the extension-ordering $\sqsubseteq$ on C(X, Y) by $f \sqsubseteq g$ if $f = gp$ for some $p \in$ Guard(X). Show that $(C(X, Y), \sqsubseteq)$ is a poset.
(iii) For $f, g \in C(X, Y)$ show that $f \sqsubseteq g \Rightarrow f \le g$.
(iv) Show that $f \sqsubseteq g \iff f \le g$ in Pfn but give an example in Mfn with $f \le g$ but not $f \sqsubseteq g$. [Hint: For the latter, let $X$ have one element.]

14. A partially additive semiring is $(R, \Sigma, \circ, 1)$ such that $(R, \Sigma)$ is a partially additive monoid, $(R, \circ, 1)$ is a monoid (we write $pq$ rather than $p \circ q$), and the following distributive laws hold: if $(q_i \mid i \in I)$ is summable in $R$ then for each $p, r \in R$, $(q_i p \mid i \in I)$ and $(r q_i \mid i \in I)$ are also summable and

$(\Sigma q_i) p = \Sigma (q_i p), \qquad r (\Sigma q_i) = \Sigma (r q_i).$

The empty sum is not excluded, that is, $0p = 0 = p0$.
Show that if C is any partially additive category with partially additive structure $\Sigma$ then for every object $X$, $(C(X, X), \Sigma_{X,X}, \circ, id_X)$ is a partially additive semiring, where $\circ$ denotes C-composition.

15. Let $(R, \Sigma, \circ, 1)$ be a partially additive semiring as in Exercise 14. Define the sum-ordering $\le$ on $R$ as in Exercise 13. Verify that the center $C$ of $(R, \Sigma, \circ, 1)$ defined by

$C = \{p \in R \mid \text{there exists } p' \in R \text{ with } p + p' = 1,\ pp' = 0 = p'p\}$

is a Boolean algebra with order $\le$. [Hint: Check that all results culminating with Theorem 22 go through unchanged.]

16. Refer to Exercises 14 and 15 for terminology. The unit interval of a partially additive semiring consists of all $x$ with $x \le 1$. The center is always a subset of the unit interval and they are equal in Pfn(X, X) and Mfn(X, X). The following develops an example with a trivial center but a large unit interval. Let $a < b$ be real numbers, let $[a, b]$ denote the closed interval $\{x \mid a \le x \le b\}$, and let $R$ be the set of all functions $f: [a, b] \to [a, b]$ which are monotone, that is, if $x \le y$ then $f(x) \le f(y)$. (Thus, a function is monotone if and only if its graph is never decreasing.) We assume the reader to be familiar with the fact that every subset of $[a, b]$ has a supremum.
(i) Show that $(R, \Sigma, \circ, 1)$ is a partially additive semiring if $(\Sigma f_i)(x) = \bigvee f_i(x)$, the supremum of the $f_i(x)$, $(f \circ g)(x) = f(x) \wedge g(x)$ = minimum of $f(x), g(x)$, and 1 is the identity function $1(x) = x$. Show also that the empty sum 0 is the function $0(x) = 0$.
(ii) Show that the center of $(R, \Sigma, \circ, 1)$ is $\{0, 1\}$.
(iii) Show that the unit interval of $R$ is infinite.

17. Let $(M, \circ, e)$ be any monoid. For the partially additive category $Pfn_{(M,\circ,e)}$ of Exercise 3.2.11 show that Guard(X) is the set of all $inc_A$ (using the notation of Exercise 2.1.11) with $A$ a subset of $X$.

18. In any partially additive category, show that for $p \in$ Guard(X), $f: X \to X$, both while p do f and repeat f until p have form $h^\dagger$ for appropriate $h: X \to X + X$.
19. Let $V$ be a vector space and let $P$ be the set of all subspaces of $V$. By Exercise 3, $(P, \subset)$ is a poset if $\subset$ denotes subset inclusion. The zero subspace $\{0\}$ is the least element of $P$ and $V$ itself is the greatest element.
(i) Prove that $A \cap B$ is a subspace for $A, B \in P$ and conclude that $A \cap B$ is the infimum.
(ii) While $A \cup B$ need not be a subspace if $A, B$ are, $A \vee B$ exists and is the linear span of $A \cup B$. Verify this. It follows that $(P, \subset)$ is a lattice, the lattice of subspaces of $V$.

Notes and References for Chapter 3

Partially additive monoids and categories were introduced by the authors in "Partially-additive categories and flow-diagram semantics," Journal of Algebra, 62, 1980, pp. 203-227. Exercise 3.1.1 is due to M. E. Steenstrup. The idea of the iteration as a construction assigning a morphism of the form
$f^\dagger: X \to Y$ to one of the form $f: X \to X + Y$ is due to C. C. Elgot, "Monadic computation and iterative algebraic theories," in Proceedings of Logic Colloquium '73 (H. E. Rose and J. C. Shepherdson, Eds.), North-Holland, Amsterdam, 1975.

Other mathematical structures have subsets which are Boolean algebras. For any distributive lattice with least and greatest elements, the subset of elements which have a complement is a Boolean algebra. We leave it as an exercise to verify that this is a special case of Exercise 3.3.15 (ignore the limit axiom which plays no role, define summable = all but finitely many are the least element, sum = supremum, composition = infimum). For those familiar with a little ring theory, another well-known result is that in a commutative ring the subset of all $p$ with $pp = p$ is a Boolean algebra. While this is not formally a consequence of Exercise 15, it is interesting to note that if $p + p' = 1$, $pp' = 0 = p'p$ in a ring then, as $p' = 1 - p$, $p(1 - p) = 0$ so that $pp = p$. Hence, one suspects there is a common thread.

Much has been written about Boolean algebras. The reader may enjoy the earlier sections of P. R. Halmos, Lectures on Boolean Algebras, Van Nostrand, 1967. The material on Guard(X) and, more generally, on the center of a partially additive semiring is adapted from E. G. Manes and D. B. Benson, "The inverse semigroup of a sum-ordered semiring," Semigroup Forum, 31, 1985, pp. 129-152.
CHAPTER 4

Assertion Semantics

4.1 Assertions and Preconditions
4.2 Partial Correctness
4.3 Total Correctness

In the introductory Section 4.1 we informally define partial correctness assertions and notions relating to weakest preconditions with the Pascal fragment of Section 1.2 in mind. Here, we state a number of well-known properties and proof rules whose truth is intuitively evident.

In keeping with the spirit of this book we must break with current custom in expositions of assertion semantics by emphasizing the underlying mathematical framework without choosing any specific programming language. In particular, the "program state" upon which the informal definitions of Section 4.1 are built is not available in the general setting. We show that the theory of guards of Section 3.3 allows us to generalize a number of properties of partial correctness. We then introduce the notion of "kernel-domain decomposition" and show in Sections 4.2 and 4.3 that the remaining concepts and results of Section 4.1 can then be formulated and established in any partially additive category in which each morphism has a kernel-domain decomposition.

4.1 Assertions and Preconditions

In this book, our stress is on denotational semantics: given a program S, we associate with it a denotation which is a morphism $f: X \to Y$ that relates the state before the computation to the state (or states) after the computation. However, another approach to program semantics emphasizes the preconditions which must be met before a program is used, and the postconditions which can be guaranteed to hold thereafter. For example, a program G to compute the greatest common divisor of two numbers might only work if both numbers are positive. The precondition might thus be stated as x =
$x_0 > 0$, $y = y_0 > 0$, where $x$ and $y$ are variables, and $x_0$ and $y_0$ are specific values. On exit, we might not care about the final values of the variables $x$ and $y$, but want to assert that $z$ holds the desired result $\gcd(x_0, y_0)$ of the computation. We could write this as

1    $\{x = x_0 > 0,\ y = y_0 > 0\}\ G\ \{z = \gcd(x_0, y_0)\}.$

Note that in this formulation, "all bets are off" as to how G will perform if the precondition is not met. In fact, in this "assertion semantics" that goes back to Floyd and Hoare, 1 is even weaker than our exposition so far sounds, for it is to be interpreted as a partial correctness specification, asserting only that "if the precondition is met and if G thereafter halts, then the postcondition will be met." A total correctness specification would include the stronger claim that the precondition guarantees the eventual termination of G's computation.

In general, the precondition and postcondition need not specify more than a few variables used in the program; the idea is that to check that a program is correct, we often need only check the processing of a few key variables. For an account of the practicality of this approach, describing a methodology whereby programs and their specifications are developed together with (possibly informal) correctness proofs in a process of stepwise refinement, see the text by Alagic and Arbib cited in the Chapter 1 notes.

Here our task is to reconcile our denotational semantics with the use of assertion semantics based on the use of assertions as preconditions and postconditions in program specifications, a methodology especially associated with R. Floyd, C. A. R. Hoare, and E. Dijkstra. In the rest of this section, we provide a semiformal introduction to assertion semantics based on the Pascal fragment of Section 1.2. We then embed it in partially additive semantics in the following two sections where no specific language is in the picture.

2 Definition.
A partial correctness specification is a structure of the form

$\{\alpha\}\ S\ \{\beta\}$

where $\alpha$ and $\beta$ are tests (the precondition and postcondition, respectively) and S is a statement of the programming language. This is regarded as asserting that "If the program state initially satisfies $\alpha$ and if execution of S terminates, then the program state upon termination satisfies $\beta$."

3 Definition. The weakest liberal precondition operator, $wlp(S, \beta)$, where S is a statement and $\beta$ is a test, defines a new precondition: "$wlp(S, \beta)$ is satisfied by any initial state with the property that, if S terminates, it does so in a state satisfying $\beta$." Thus, $wlp(S, \beta)$ also holds for all initial states from which S does not terminate.

Letting $P \iff Q$ be logical equivalence (i.e., $(P \Rightarrow Q) \wedge (Q \Rightarrow P)$; P is true if and only if Q is true) we clearly have the following:

4 Observation. $\{\alpha\}\ S\ \{\beta\} \iff (\alpha \Rightarrow wlp(S, \beta))$.

By contrast with these partial correctness assertions (the word "liberal" in 3 is the sign of this partialness), total correctness assertions insist that S halts.
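Definitions 2 and 3 and Observation 4 can be tried out on a toy deterministic statement. In the sketch below, which rests on assumptions made here rather than on anything in the text, a statement is modeled as a Python dict from initial states to final states (a missing key means "does not terminate"), tests are modeled as sets of states, and the names `holds`, `wlp`, and the sample statement `S` are hypothetical.

```python
from itertools import combinations

STATES = range(6)

def subsets(xs):
    """All subsets of xs, as Python sets."""
    xs = list(xs)
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

# Hypothetical statement: halves even states, fails to terminate on odd ones.
S = {x: x // 2 for x in STATES if x % 2 == 0}

def holds(alpha, S, beta):
    """{alpha} S {beta}: from alpha, if S terminates, the result is in beta."""
    return all(S[x] in beta for x in alpha if x in S)

def wlp(S, beta):
    """States from which S either does not terminate or ends in beta."""
    return {x for x in STATES if x not in S or S[x] in beta}

# Observation 4: {alpha} S {beta}  iff  alpha implies wlp(S, beta).
for alpha in subsets(STATES):
    for beta in subsets(STATES):
        assert holds(alpha, S, beta) == (alpha <= wlp(S, beta))

# wlp holds on every state from which S does not terminate,
# even for the unsatisfiable postcondition.
assert wlp(S, set()) == {1, 3, 5}
```

Reading "implies" between tests as subset inclusion is the same identification of propositions with subsets used for guards in Section 3.3.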
5 Definition. The weakest precondition operator $wp(S, \beta)$, where S is a statement and $\beta$ is a test, defines a new precondition: "$wp(S, \beta)$ is true of an initial state from which S terminates and does so in a state satisfying $\beta$."

The total correctness version of 2 is then

6    $\alpha \Rightarrow wp(S, \beta)$

which asserts that precondition $\alpha$ guarantees that S will terminate and will do so in a state satisfying $\beta$.

The ultimate objective is to provide useful assertions of the form 2 and 5 when S describes a (perhaps complex) algorithm. While we make no attempt to be complete, we mention the following rules which have been found in practice to be important in tailoring assertions to statements and we encourage the reader to work the exercises.

7 Proof Rule. If $\alpha \Rightarrow \alpha_1$, $\{\alpha_1\}\ S\ \{\beta_1\}$ and $\beta_1 \Rightarrow \beta$ then $\{\alpha\}\ S\ \{\beta\}$.

8 Proof Rule (Composition Rule). If $\{\alpha\}\ R\ \{\beta\}$, $\{\beta\}\ S\ \{\gamma\}$ then $\{\alpha\}$ begin R; S end $\{\gamma\}$.

9 Proof Rule (Conditional Rule). If $\{\alpha \wedge B\}\ R\ \{\beta\}$ and $\{\alpha \wedge \neg B\}\ S\ \{\beta\}$ then $\{\alpha\}$ if B then R else S $\{\beta\}$.

10 Proof Rule (Iteration Rule). If $\{\alpha \wedge B\}\ S\ \{\alpha\}$ then $\{\alpha\}$ while B do S $\{\alpha \wedge \neg B\}$.

For example, Proof Rule 9 makes sense because, if we have precondition $\alpha$ satisfied before executing if B then R else S, then (since tests do not change the value of variables), we will have that precondition $\alpha \wedge B$ holds if we take the R path, while precondition $\alpha \wedge \neg B$ will hold if we take the S path. In either case, we are guaranteed that $\beta$ will hold if and when the computation terminates.

In Proof Rule 10, $\alpha$ is what Floyd calls a loop invariant. It is a property of the program state that remains unchanged no matter how many times we go round the loop of while B do S, as long as it holds when we first enter the loop.

In the next section, we shall see how to interpret partial correctness specifications $\{\alpha\}\ S\ \{\beta\}$ in partially additive semantics and then rigorously prove the above proof rules in that setting.
We shall also prove that the weakest liberal precondition operator satisfies analogues of the following properties.

11 Property. wlp(begin S₁; S₂ end, β) = wlp(S₁, wlp(S₂, β)).

12 Property. wlp(S, true) = true.
13 Property. wlp(S, β₁ ∧ β₂) = wlp(S, β₁) ∧ wlp(S, β₂).

The weakest precondition operator (total correctness) satisfies the following properties.

14 Property. wp(S, false) = false.

15 Property. wp(S, β₁ ∧ β₂) = wp(S, β₁) ∧ wp(S, β₂).

The composition law expressed by

16 Property. wp(begin S₁; S₂ end, β) = wp(S₁, wp(S₂, β))

is true for deterministic semantics (i.e., the semantics of S₁, S₂ are partial functions) but becomes problematic in the nondeterministic case. We refer the reader to Exercise 6. A mathematically precise insight into 16 is provided in Theorem 4.3.8 below.

EXERCISES FOR SECTION 4.1

1. Verify that

   {x = x₀ ≥ 0}
   begin y := 0; while x > 1 do begin x := x − 2; y := y + 1 end end
   {y = x₀ div 2}

by first verifying that

   {x = x₀ > 1, y = y₀}
   begin x := x − 2; y := y + 1 end
   {x = x₀ − 2 ≥ 0, y = y₀ + 1}

and then using Proof Rules 7 through 10.

2. Verify that {false} S {β} is true for any S and β.

3. Prove wp(n := n*n, {n > 0}) = true (applied to integers).

4. For odd(n), the predicate which is true of an integer just in case it is odd, verify that wp(while ¬odd(n) do n := n div 2, odd(n)) = true.

5. Use the semantic equivalence

   repeat S until B = begin S; while ¬B do S end

and 8 and 10 to infer a suitable proof rule for repeat S until B.

6. Let the semantics of S₁, S₂ be multifunctions. Show that 16 fails if the semantics of begin S₁; S₂ end is defined using the composition of Mfn but holds if the ANMfn composition 2.1.5 is used.
4.2 Partial Correctness

For the balance of this chapter we work in a partially additive category C. No additional axioms are required to give a definition capturing a suitable interpretation of {α} S {β} and to formulate and prove the corresponding Proof Rules 7 through 10 of Section 4.1. Thereafter, kernel-domain decompositions will be introduced to define a suitable weakest liberal precondition operator.

Our formulation of {α} S {β} will rest on our theory of guards developed in Section 3.3. In Pfn, we may associate S with its denotation f: X → Y, α with a total function X → {true, false}, and β with a total function Y → {true, false}. But we might just as well associate α with the guard inc_A: X → X where A = {x | α(x) = true}, and β with the guard inc_B: Y → Y where B = {y | β(y) = true}. We note that {α} S {β} can then be reexpressed in either of two equivalent ways:

1   inc_B · f · inc_A = f · inc_A

which says that if α(x) is true (inc_A(x) is defined) and f(x) is defined, then β(f(x)) is true (inc_B(f(x)) = f(x)). Or,

2   inc_B′ · f · inc_A = 0

(where B′ is the complement of B in Y), which says that if α(x) is true and f(x) is defined, then it is not the case that β(f(x)) = false.

To generalize this to an arbitrary partially additive category C, the reader should recall the definition of Guard(X) from Section 3.3 and our proof that Guard(X) is a Boolean algebra. We first make the following observation:

3 Observation. Given f: X → Y in C, p ∈ Guard(X), q ∈ Guard(Y),

   qfp = fp ⇔ q′fp = 0.

PROOF. If qfp = fp, then q′fp = q′qfp = 0. If q′fp = 0, then fp = (q + q′)fp = qfp. □

It is then clear that the following generalizes the informal semantics we offered for {α} S {β} in Section 4.1.

4 Definition. Given f: X → Y in C, p ∈ Guard(X), q ∈ Guard(Y), we write {p}f{q} if either of the equivalent conditions qfp = fp or q′fp = 0 holds.

We now state and prove analogues of Proof Rules 4.1.7-10.
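In Pfn the two conditions of Definition 4 can be compared directly on a small example. The sketch below is an illustration under assumptions not found in the text: partial functions are modelled as Python dicts, guards as partial identities, and the sets `X`, `Y` and the morphism `f` are arbitrary hypothetical choices.

```python
from itertools import chain, combinations

X = [0, 1, 2, 3]
Y = ["a", "b", "c"]

def compose(g, f):
    """The composite gf in Pfn: apply f first, then g; undefined wherever either is."""
    return {x: g[f[x]] for x in f if f[x] in g}

def inc(subset):
    """The guard inc_A: the partial identity defined exactly on A."""
    return {a: a for a in subset}

def subsets(items):
    """All subsets of a finite list, as tuples."""
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

f = {0: "a", 1: "b", 3: "b"}  # an arbitrary partial function X -> Y, undefined at 2

def form_1(A, B):
    """Condition q f p = f p, with p = inc_A and q = inc_B."""
    p, q = inc(A), inc(B)
    return compose(q, compose(f, p)) == compose(f, p)

def form_2(A, B):
    """Condition q' f p = 0, where q' is the guard of the complement of B."""
    p, q_prime = inc(A), inc(set(Y) - set(B))
    return compose(q_prime, compose(f, p)) == {}

# Observation 3: the two conditions agree for every pair of guards.
agree = all(form_1(A, B) == form_2(A, B) for A in subsets(X) for B in subsets(Y))
print(agree)  # True
```

Here the empty dict plays the role of the zero morphism 0, and composition of dicts plays the role of composition in Pfn.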
5 Proposition. Given p ≤ p₁ ∈ Guard(X), q₁ ≤ q ∈ Guard(Y), and f: X → Y, if {p₁}f{q₁} then {p}f{q}.

PROOF. We have p₁p = p₁ ∧ p = p and q₁ ∨ q = q. By De Morgan's Law (Exercise 3.3.10), q′ = q′q₁′. Thus, q′fp = q′q₁′fp₁p = q′0p = 0. □

6 Proposition. The composition rule holds. If {p}f{q} and {q}g{r} with f: X → Y, g: Y → Z then {p} gf {r}.

PROOF. rgfp = rg(qfp) = (rgq)fp = gqfp = gfp. □

7 Proposition. The conditional rule holds. If {p ∧ q}f{r} and {p ∧ q′}g{r} then {p} if q then f else g {r}.

PROOF. We recall from 3.3.23 that for f, g: X → Y, q ∈ Guard(X),

   if q then f else g = fq + gq′.

Since rfqp = fqp, rgq′p = gq′p are given,

   r(fq + gq′)p = rfqp + rgq′p = fqp + gq′p = (fq + gq′)p. □

8 Proposition. The iteration rule holds. Given p, q ∈ Guard(X), f: X → X, if {p ∧ q}f{q} then {q} while p do f {p′ ∧ q}.

PROOF. Recall from 3.3.23 that

   while p do f = Σ_{n=0}^{∞} p′(fp)ⁿ.

We are given q′fqp = 0. It suffices to prove that (p′q)′p′(fp)ⁿq = 0 for all n ≥ 0. Using De Morgan's Law and the formula for supremum of 3.3.22, (p′q)′ = p ∨ q′ = p + p′q′, so

   (p′q)′p′(fp)ⁿq = (p + p′q′)(p′(fp)ⁿq) = p′q′(fp)ⁿq.

For n = 0, p′q′q = 0. As q′fp = q′fp(q + q′) = q′fpq′, for n > 0 we have

   p′q′(fp)ⁿq = p′(q′fp)(fp)ⁿ⁻¹q = ⋯ = p′q′(fpq′)ⁿq

which ends in q′q and so is 0. □

To define the weakest liberal precondition operator we introduce kernel-domain decompositions. The idea is very intuitive. For f: X → Y in Pfn, X decomposes as the disjoint union of two subsets K, D where D = DD(f) and K = D′. Thus, if i: K → X, i(x) = x and j: D → X, j(x) = x are the inclusion functions, we have that
   K —i→ X ←j— D

is a coproduct diagram such that fi = 0 whereas fj is total. Noting that total morphisms were defined in any category with zero morphisms (2.2.21 asserted that f is total if t ≠ 0 always implies that ft ≠ 0) we have motivated the following definition.

9 Definition. For f: X → Y in the partially additive category C, a kernel-domain decomposition of f is (K, i, D, j) such that
(i) K —i→ X ←j— D is a coproduct.
(ii) fi = 0.
(iii) fj is total.
C has kernel-domain decompositions if every morphism has a kernel-domain decomposition.

10 Example. Pfn and Mfn have kernel-domain decompositions. For f ∈ Pfn set D = DD(f), K = {x | f(x) is not defined} as discussed above. Similarly, in Mfn, D = {x | f(x) ≠ ∅}, K = {x | f(x) = ∅}.

While not obvious from the definition, kernel-domain decompositions of a morphism are unique up to isomorphism. A proof is outlined in Exercises 3, 8, 10, and 11.

The motivation for "domain" in "kernel-domain" is clear. We have chosen "kernel" because K indeed does act as a kernel in the sense of algebra and category theory. See Exercises 3-7.

For the balance of this chapter we assume that our partially additive category C has kernel-domain decompositions.

To proceed further, we introduce some shorthand for Pfn. For f: X → Y, set K(f) = {x ∈ X | f(x) is not defined} and define two useful guards by d(f) = inc_DD(f), k(f) = inc_K(f). We can use k to define wlp in Pfn. For, given f: X → Y and q ∈ Guard(Y), we would like to define wlp(f, q) to be the guard inc_A where

   A = {x | f(x), if defined, satisfies q} = {x | q′f(x) is not defined}.

We thus have wlp(f, q) = k(q′f).

Our next task then is to show how to define d(f) and k(f) in any partially additive category with kernel-domain decompositions.

11 Definition. A kernel-domain system for f: X → Y is
   K ⇄ X ⇄ D   (i: K → X, P: X → K, j: D → X, Q: X → D)

where (K, i, D, j) is a kernel-domain decomposition of f and P and Q are the quasi projections of 3.2.6 (so that Pi = id_K, Pj = 0, Qi = 0, Qj = id_D).

By 3.2.16 we have iP + jQ = id_X. Since (iP)(jQ) = i(Pj)Q = 0 and (jQ)(iP) = 0 similarly, it follows from 3.3.13 that iP, jQ ∈ Guard(X) and iP = (jQ)′. We write d(f) for jQ and k(f) for iP. (These guards do not depend on the choice of kernel-domain system, as is proved in Theorem 13 below.) The weakest liberal precondition operator is then defined in terms of k(f) in the expected way:

12 Definition. Given f: X → Y and q ∈ Guard(Y) define wlp(f, q) ∈ Guard(X) by

   wlp(f, q) = k(q′f).

13 Theorem. Let

   K ⇄ X ⇄ D   (i: K → X, P: X → K, j: D → X, Q: X → D)

be a kernel-domain system for f: X → Y. Then r = iP is the only guard r ∈ Guard(X) satisfying the conditions
(i) fr = 0, and
(ii) if h: W → X is such that fh = 0 then rh = h.
(Hence k(f) = iP in Definition 11 depends only on f and not on the particular kernel-domain system, since (i) and (ii) are solely in terms of f and, similarly, d(f) depends only on f since it was observed in 11 that d(f) = (k(f))′.)

PROOF. We first show r = iP satisfies (i) and (ii). As fi = 0 by the definition of a kernel-domain decomposition 9, fiP = 0. For (ii), if fh = 0 then as iP + jQ = id_X we have

   0 = fh = f(iP + jQ)h = fiPh + fjQh

so that fjQh = 0 by 3.1.8. By the definition of a kernel-domain decomposition, fj is total so that Qh = 0. But then
   h = (iP + jQ)h = iPh

as desired. For the uniqueness statement, let r satisfy (i) and (ii) and define A = {t ∈ Guard(X): ft = 0}. Then as r satisfies (i), r ∈ A. Furthermore, if t ∈ A then, setting h = t in (ii), rt = t, that is, t ≤ r in the Boolean algebra Guard(X). This shows that r is the greatest element of A. □

The above proof leads to an order-theoretic characterization of wlp(f, q) ∈ Guard(X):

14 Corollary. For f: X → Y, q ∈ Guard(Y), the sets

   A = {t ∈ Guard(X): q′ft = 0}
   B = {t ∈ Guard(X): qft = ft}

are equal and have wlp(f, q) as greatest element.

PROOF. If q′ft = 0, ft = (q + q′)ft = qft whereas, conversely, if qft = ft then q′ft = q′(qft) = (q′q)ft = 0. Hence, A = B. That wlp(f, q) is the greatest element of A is immediate from Definition 12 and the proof of Theorem 13 (with q′f instead of f). □

We are now able to establish the fundamental properties of wlp, analogous to 4.1.4 and 11-13.

15 Proposition. If f: X → Y, p ∈ Guard(X), q ∈ Guard(Y) then {p}f{q} if and only if p ≤ wlp(f, q).

PROOF. Let p̄ = wlp(f, q). If {p}f{q} then qfp = fp so by 14 p ≤ p̄. Conversely, if p ≤ p̄, qfp = qf(p̄p) = (qfp̄)p = (fp̄)p = f(p̄p) = fp so {p}f{q}. □

16 Proposition. Given f: X → Y, g: Y → Z, r ∈ Guard(Z), wlp(gf, r) = wlp(f, wlp(g, r)).

PROOF. Let q̄ = wlp(g, r), p̄ = wlp(f, q̄). By 14 we need only show that p̄ is the greatest element of A = {p ∈ Guard(X) | rgfp = gfp}. To see that p̄ ∈ A,

   rgfp̄ = rg(q̄fp̄) = (rgq̄)fp̄ = (gq̄)fp̄ = g(q̄fp̄) = gfp̄.

If p ∈ A, then rg(fp) = gfp so by 13(ii) (with r′g for f and fp for h) q̄fp = fp. But then by 14, p ≤ p̄. □

17 Proposition. For f: X → Y, wlp(f, 1) = 1 (more precisely, wlp(f, id_Y) = id_X).

PROOF. id_X is the greatest element of {p ∈ Guard(X): id_Y fp = fp}. □
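Definition 12 and Corollary 14 can be illustrated in Pfn on a finite example. The sketch below is a hedged illustration only: partial functions are modelled as dicts, guards as partial identities, and the sets `X`, `Y`, the morphism `f`, and the subset `B` are hypothetical choices made for the example.

```python
from itertools import chain, combinations

X = [0, 1, 2, 3]
Y = ["a", "b", "c"]
f = {0: "a", 1: "b", 3: "c"}  # arbitrary partial function X -> Y, undefined at 2

def compose(g, f):
    """The composite gf in Pfn: apply f first, then g."""
    return {x: g[f[x]] for x in f if f[x] in g}

def inc(subset):
    """The guard inc_A as a partial identity."""
    return {a: a for a in subset}

def k(h):
    """k(h) in Pfn: the guard on the set where h: X -> Y is undefined."""
    return inc(x for x in X if x not in h)

def wlp(f, B):
    """wlp(f, q) = k(q'f) for q = inc_B, following Definition 12."""
    q_prime = inc(set(Y) - set(B))
    return k(compose(q_prime, f))

def subsets(items):
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

B = {"a", "b"}
w = set(wlp(f, B))

# Corollary 14: wlp(f, q) is the greatest guard t with q' f t = 0.
q_prime = inc(set(Y) - B)
candidates = [set(A) for A in subsets(X)
              if compose(q_prime, compose(f, inc(A))) == {}]
is_greatest = w in candidates and all(c <= w for c in candidates)
print(sorted(w), is_greatest)  # [0, 1, 2] True
```

State 2 lies in wlp(f, q) because f is undefined there, while state 3 is excluded because f(3) = "c" falls outside B.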
18 Proposition. For f: X → Y, q₁, q₂ ∈ Guard(Y),

   wlp(f, q₁ ∧ q₂) = wlp(f, q₁) ∧ wlp(f, q₂).

PROOF. Let p̄ᵢ = wlp(f, qᵢ). We must show p̄₁p̄₂ is the greatest element of A = {p ∈ Guard(X): q₁q₂fp = fp}. To see p̄₁p̄₂ ∈ A,

   q₁q₂fp̄₁p̄₂ = q₂(q₁fp̄₁)p̄₂ = q₂(fp̄₁)p̄₂ = (q₂fp̄₂)p̄₁ = fp̄₁p̄₂.

Now let p ∈ A. Then (q₁q₂)′fp = 0. By De Morgan's Law and 3.3.22, (q₁q₂)′ = q₁′ ∨ q₂′ = q₁′ + q₁q₂′. Thus, by 3.1.8, q₁′fp = 0 and so, by 3, q₁fp = fp. Thus, p ≤ p̄₁ by 14. Symmetrically, p ≤ p̄₂ so p ≤ p̄₁ ∧ p̄₂. □

EXERCISES FOR SECTION 4.2

1. Establish your proof rule of Exercise 4.1.5 in a partially additive category.

2. In a partially additive category, a diagram

   A ⇄ X ⇄ B   (i: A → X, P: X → A, j: B → X, Q: X → B)

is a direct sum system if

   Pi = id_A,  Qj = id_B,  Qi = 0,  Pj = 0,  iP + jQ = id_X.

Thus, if X is the coproduct of A, B with injections i, j and P, Q are the quasi projections, a direct sum system results. Show, conversely, that given a direct sum system as above,

   A —i→ X ←j— B

is a coproduct. [Hint: Use 3.2.16.]

3. In any category with zero morphisms, (K, i) is a kernel of f: X → Y if i: K → X satisfies the following:
(i) fi = 0.
(ii) If t: T → X satisfies ft = 0, there exists unique α: T → K with iα = t.
Given f, show that any two kernels of f are isomorphic.

4. For f: X → Y in Pfn, show that inc_K: K → X is a kernel of f if K = {x ∈ X | f(x) is not defined}.
5. For f: X → Y in Mfn, show that inc_K: K → X is a kernel of f if K = {x ∈ X | f(x) = ∅}.

6. For f: X → Y in Mon, show that inc_K: K → X is a kernel of f if K = {x ∈ X | f(x) = e}, where e denotes the unit of Y.

7. For f: X → Y in Vect show that inc_K: K → X is a kernel of f if K = {x ∈ X | f(x) = 0} is the null space of f.

8. For a direct sum system as in Exercise 2, show that j: B → X is a kernel of P and that i: A → X is a kernel of Q.

9. Let C be a nonempty category with zero morphisms. Prove that C has a zero object if and only if every total morphism has a kernel. In particular, by Exercises 4-7, Pfn, Mfn, Mon, and Vect have zero objects.

10. Let

   [diagram: two direct sum systems on the same object X]

be direct sum systems in a partially additive category as in Exercise 2. Assume that there exists an isomorphism α,

   [diagram: α]

Show that there exists an isomorphism β

   [diagram: β]

[Hint: First argue that

   [diagram: a third direct sum system on X built using α]

is a direct sum system. Using iαP + jQ = id_X show P ≤ αP and similarly show αP ≤ P. Now use Exercises 8 and 3.]

11. In a partially additive category, show that if (K, i, D, j) is a kernel-domain decomposition of f then i: K → X is a kernel of f. [Hint: The proof is implicit in the construction of β in the proof of Theorem 4.3.8 below.] Conclude, using Exercises 10 and 3, that any two kernel-domain decompositions of f are isomorphic.

12. Given f₁, …, fₙ: X → X, p₁, …, pₙ ∈ Guard(X) in a partially additive category, define

   [display not recovered: the construct DO]

to mean f₁p₁ + ⋯ + fₙpₙ (we assume that the sum exists). If r ∈ Guard(X) is such that
{r ∧ pᵢ} fᵢ {r} for all i, prove that {r} DO {p₁′ ∧ ⋯ ∧ pₙ′ ∧ r}.

13. Let V be a "value" set with at least two elements. Let Pfn_V be the category whose objects are sets and whose morphisms are given by

   Pfn_V(X, Y) = Pfn(X × V, Y × V)

with X × V and so forth the Cartesian product of sets, and with composition and identities as in Pfn. For f ∈ Pfn_V(X, Y), think of X as a set of input lines, Y as a set of output lines, and interpret f(x, d) = (y, e) as "input value d on line x results in output value e on line y." Prove that Pfn_V is partially additive but show that if v ≠ v₁ ∈ V and x ≠ x₁ ∈ X then if p ∈ Pfn_V(X, X) is defined by DD(p) = {(v, x), (v₁, x₁)} with p(v, x) = (v, x), p(v₁, x₁) = (v₁, x₁) then p ∈ Guard(X) but does not have a kernel. Conclude that Pfn_V does not have kernel-domain decompositions.

14. Let (M, ∘, e) be a monoid. Show that the partially additive category FwR(M, ∘, e) of Exercise 3.3.17 has kernel-domain decompositions. Explain the following slogan for this category: "All asserted truth is reliable."

4.3 Total Correctness

The weakest precondition wp(S, β) of 4.1.5 strengthens the liberal precondition wlp(S, β) of 4.1.3 by guaranteeing that computation of S will terminate. This shows that the relationship between wlp and wp in Pfn should be

   wp(f, q) = d(f) ∧ wlp(f, q),

where d(f) = inc_DD(f), that is, "wp(f, q) is true of an initial state providing f is defined and wlp(f, q) is true." To elevate the theory of weakest precondition to a partially additive category with kernel-domain decompositions, then, we recall our definition 4.2.11 of d(f) ∈ Guard(X) for each f: X → Y. We then prove that wp satisfies the analogues of 4.1.14-15. We are also able to characterize when the composition theorem 4.1.16 should hold.

We recall that if f: X → Y has kernel-domain system

   K ⇄ X ⇄ D   (i: K → X, P: X → K, j: D → X, Q: X → D)

then d(f) ∈ Guard(X) is defined as jQ.

1 Definition. For f: X → Y, q ∈ Guard(Y), wp(f, q) ∈ Guard(X) is defined by

   wp(f, q) = d(f) ∧ wlp(f, q).

2 Example.
In Pfn, d(f) = inc_DD(f); if r = inc_R, wp(f, r) = inc_S where S = {x ∈ X | f(x) is defined and f(x) ∈ R}.

The analogues of 4.1.14-15 follow quickly.
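The description of wp in Pfn can be checked on a small example. The sketch below rests on assumptions that are not in the text: guards are represented simply by the subsets on which they are defined (so ∧ becomes set intersection), and the state set `X`, the partial function `f`, and the subsets `R`, `R1`, `R2` are arbitrary hypothetical choices.

```python
X = range(6)
f = {0: 0, 1: 3, 2: 4, 4: 1}  # hypothetical partial function X -> X, undefined at 3 and 5

def d(f):
    """The guard d(f), represented by DD(f), the set where f is defined."""
    return set(f)

def wlp(f, R):
    """wlp(f, inc_R): states where f is undefined or lands in R."""
    return {x for x in X if x not in f or f[x] in R}

def wp(f, R):
    """wp(f, q) = d(f) /\\ wlp(f, q); the meet of guards is intersection here."""
    return d(f) & wlp(f, R)

R = {0, 1, 3}
S = {x for x in X if x in f and f[x] in R}  # the set S described for Pfn

print(sorted(wp(f, R)), wp(f, R) == S)  # [0, 1, 4] True
print(wp(f, set()))                     # set(), the analogue of wp(S, false) = false

R1, R2 = {0, 1, 3}, {1, 3, 4}
print(wp(f, R1 & R2) == wp(f, R1) & wp(f, R2))  # True
```

The last two lines illustrate the analogues of 4.1.14 and 4.1.15 for this particular f; they are spot checks, not proofs.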
3 Proposition. For any f: X → Y, wp(f, 0) = 0.

PROOF. Let

   K ⇄ X ⇄ D   (i: K → X, P: X → K, j: D → X, Q: X → D)

be a kernel-domain system for f. As f = 0′f, by Definition 4.2.12, wlp(f, 0) = k(0′f) = k(f) = iP. Since d(f) = jQ, we have d(f) = wlp(f, 0)′, by 4.2.11. Thus,

   wp(f, 0) = d(f) ∧ wlp(f, 0) = wlp(f, 0)′ ∧ wlp(f, 0) = 0. □

4 Proposition. Given f: X → Y and q₁, q₂ ∈ Guard(Y),

   wp(f, q₁ ∧ q₂) = wp(f, q₁) ∧ wp(f, q₂).

PROOF. Recall from 4.2.18 that wlp(f, q₁ ∧ q₂) = wlp(f, q₁) ∧ wlp(f, q₂), and the fact that a ∧ b ∧ c = (a ∧ b) ∧ (a ∧ c) in any lattice. Thus,

   wp(f, q₁ ∧ q₂) = d(f) ∧ wlp(f, q₁ ∧ q₂)
                  = d(f) ∧ wlp(f, q₁) ∧ wlp(f, q₂)
                  = (d(f) ∧ wlp(f, q₁)) ∧ (d(f) ∧ wlp(f, q₂))
                  = wp(f, q₁) ∧ wp(f, q₂). □

We record the following basic facts about d(f):

5 Proposition. For f: X → Y,
(i) wp(f, id_Y) = d(f);
(ii) f is total if and only if d(f) = id_X.

PROOF. (i) Recalling 4.2.17, wp(f, 1) = d(f) ∧ wlp(f, 1) = d(f) ∧ 1 = d(f).
(ii) Let p = wlp(f, 0). Thus, fp = 0 by 4.2.13(i). If f is total, p = 0. But d(f) = p′ as was pointed out in the proof of Proposition 3, so d(f) = 1. Conversely, if d(f) = 1, p = 0. Hence, if ft = 0 it follows from 4.2.13(ii) for p that t = pt = 0, so f is total. □

The reader who has worked Exercise 4.1.6 will realize that, in ANMfn, the composition theorem

   wp(gf, q) = wp(f, wp(g, q))

holds because when g(f(x)) is defined, g(y) is defined for every y ∈ f(x). This can be restated in a way more suitable for generalization, namely, as follows: Let i = inc_DD(g): DD(g) → Y. If u: U → Y is such that gu is total,
then there exists unique α: U → DD(g) with iα = u. This obviously is equivalent to the assertion that u(x) ⊂ DD(g) for each x ∈ U, so this is indeed equivalent to the original principle, having used a set U of x's instead of just one. This suggests the following definition.

6 Definition. For g: Y → Z in any category with zero morphisms, a totalizer of g is (T, i) where i: T → Y satisfies
(i) gi is total;
(ii) if u: U → Y is such that gu is total, there exists unique α: U → T with iα = u.

Totalizers are unique up to isomorphism. See Exercise 2.

7 Examples. By the discussion above, every morphism g in ANMfn has inc_DD(g) as totalizer; similarly, in Pfn. It is clear that this construction does not provide a totalizer in Mfn.

The following theorem then shows that wp(gf, q) = wp(f, wp(g, q)) in Pfn but not in Mfn. The theorem does not apply to ANMfn, which is not partially additive (but see the end of chapter notes).

8 Theorem. Let g: Y → Z and let

   K ⇄ Y ⇄ D   (i: K → Y, P: Y → K, j: D → Y, Q: Y → D)

be a kernel-domain system for g. Then the following two conditions are equivalent.
1. For all total f: X → Y and all q ∈ Guard(Z), wp(gf, q) = wp(f, wp(g, q)).
2. j: D → Y is a totalizer of g.

PROOF. 1 ⇒ 2. That gj is total is given (Definition 4.2.9). Now suppose u: U → Y is such that gu is total. If α: U → D satisfies jα = u then α = id_D α = (Qj)α = Q(jα) = Qu, so α is unique if it exists. We must show jQu = u. Now jQ = d(g) by definition. Also u is total by 2.2.22 so that d(u) = id_U by 5(ii). Thus,
   wlp(u, jQ) = id_U ∧ wlp(u, d(g))
              = wlp(u, d(g) ∧ wlp(g, id_Z))      (by 4.2.17)
              = wp(u, wp(g, id_Z))
              = wp(gu, id_Z)                      (by hypothesis, as u is total)
              = d(gu) ∧ wlp(gu, id_Z)
              = id_U                              (by 5(ii) and 4.2.17).

But then by 4.2.13(i), jQu = jQu id_U = u id_U = u.

2 ⇒ 1. Given total f: X → Y, let

   K₁ ⇄ X ⇄ D₁   (i₁: K₁ → X, P₁: X → K₁, j₁: D₁ → X, Q₁: X → D₁)

be a kernel-domain system for gf. Consider morphisms β: K₁ → K and γ: D₁ → D with iβ = fi₁ and jγ = fj₁.

Such β, γ exist as follows. Since 0 = (gf)i₁ = g id_Y fi₁ = g(iP + jQ)fi₁, it follows from 3.1.8 that gjQfi₁ = 0. As gj is total, Qfi₁ = 0. But then if β is defined as Pfi₁,

   fi₁ = id_Y fi₁ = (iP + jQ)fi₁ = iPfi₁ = iβ.

To construct γ simply use the hypothesis that j: D → Y is a totalizer of g, since gfj₁ is total. We then observe the following:
(i) (iPf)j₁ = 0. For iPfj₁ = iPjγ and Pj = 0.
(ii) (iPf)i₁ is total. For if iPfi₁t = 0 then

   0 = iPfi₁t = iPiβt = iβt (as Pi = id_K) = fi₁t.

Hence, as f is total, i₁t = 0 and then t = P₁i₁t = P₁0 = 0.

But then (i) and (ii) assert that (D₁, j₁, K₁, i₁) is a kernel-domain decomposition of iPf. Since iP = (jQ)′ in Guard(Y) it follows from Definition 4.2.12 that wlp(f, jQ) = j₁Q₁. Since d(g) = jQ and d(gf) = j₁Q₁ by Definition 4.2.11, this translates to

9   d(gf) = wlp(f, d(g)).

Hence, for q ∈ Guard(Z),
   wp(gf, q) = d(gf) ∧ wlp(gf, q)
             = wlp(f, d(g)) ∧ wlp(f, wlp(g, q))   (by 9 and 4.2.16)
             = wlp(f, d(g) ∧ wlp(g, q))            (by 4.2.18)
             = d(f) ∧ wlp(f, wp(g, q))             (by 5, as f is total)
             = wp(f, wp(g, q)). □

EXERCISES FOR SECTION 4.3

1. Let X, Y be objects in Pfn. A guard transformer from Y to X is a function

   T: Guard(Y) → Guard(X)

satisfying the axioms

   T(0) = 0,
   T(p ∧ q) = T(p) ∧ T(q),
   T(⋁ pᵢ) = ⋁ T(pᵢ)

for all families (pᵢ) where, if pᵢ = inc_Aᵢ, ⋁ pᵢ = inc_A for A = ⋃ Aᵢ.
(i) Show that for all f: X → Y in Pfn, T(q) = wp(f, q) is a guard transformer from Y to X.
(ii) Show that if T is a guard transformer from Y to X then T(q) = wp(f, q) for some f: X → Y. [Hint: f(x) = y if x ∈ T{y} and f(x) is otherwise undefined; you must show, among other things, that such f is a well-defined partial function.]
(iii) Show that the constructions of (i) and (ii) establish a bijection between Pfn(X, Y) and guard transformers from Y to X. This underlies the idea, carried out in some expository treatments, that a programming construct can be given semantics by specifying its guard transformer.

2. Fix g: Y → Z in a category with zero morphisms. Define the morphisms of a category whose objects are (U, u) with u: U → Y for which gu is total in such a way that terminal objects are totalizers of g. Conclude that any two totalizers of g are isomorphic.

3. Prove that ANMfn has kernel-domain decompositions. (Warning: Since ANMfn is not partially additive, the theory of kernel-domain systems developed in the text does not necessarily apply.)

4. For f: X → Y, prove that d(f) is the least element of {p ∈ Guard(X) | fp = f}.

5. For f: X → Y, g: Y → Z prove that d(gf) ≤ d(f).

6. For f, g: X → Y such that f + g exists prove that d(f + g) = d(f) ∨ d(g). [Hint: Use De Morgan's Law, 3.1.8, and Exercise 4.2.11.]

7. Let

   K ⇄ X ⇄ D   (i, P, j, Q)    and    K₁ ⇄ X ⇄ D₁   (i₁, P₁, j₁, Q₁)
be kernel-domain systems for f, f₁: X → Y. Prove that d(f) ≤ d(f₁) if and only if there exists α as shown:

   [diagram: triangle formed by α: D → D₁, j: D → X, and j₁: D₁ → X]

Prove that such α is unique when it exists. [Hint: Use Exercise 4.2.8.]

8. Let f, f₁: X → Y with kernel-domain systems as in Exercise 7. Prove that f ⊂ f₁ in the extension ordering of Exercise 3.3.13 if and only if d(f) ≤ d(f₁) and "f₁ agrees with f when both are defined," that is,

   [diagram: square formed by α: D → D₁, j: D → X, j₁: D₁ → X, and f, f₁: X → Y]

commutes, where α is as in Exercise 7.

9. Let p ∈ Guard(X). Show that p = d(p). [Hint: Use Exercise 4.]

10. Show that wlp(id_X, p) = p = wp(id_X, p) for all p ∈ Guard(X). [Hint: Use Exercise 9.]

11. Say that f: X → Y is deterministic if for all q₁, q₂ ∈ Guard(Y),

   wlp(f, q₁ ∨ q₂) = wlp(f, q₁) ∨ wlp(f, q₂).

(i) Prove that the deterministic morphisms form a subcategory.
(ii) Show that in Pfn every morphism is deterministic whereas in Mfn the deterministic morphisms constitute the subcategory Pfn. This motivates the terminology.

Notes and References for Chapter 4

The early papers in assertion semantics and the text of Alagic and Arbib were cited in the notes to Chapter 1. For additional expository accounts see:

E. W. Dijkstra, A Discipline of Programming, Prentice-Hall, 1976.
D. Gries, The Science of Programming, Springer-Verlag, 1983.

The theory of Sections 2 and 3 is adapted from E. G. Manes, "Assertion semantics in a control category," Theoretical Computer Science, to appear. It is proved there that a third equivalent condition for Theorem 4.3.8 is that every morphism is deterministic as defined in Exercise 4.3.11.

Homological algebra, an area of abstract algebra with ties to algebraic topology, emphasizes the theory of "abelian categories." See the books of Freyd and Mitchell cited in the Chapter 2 notes.
In an abelian category C (Vect is an example of one), there are zero morphisms, while finite products and finite coproducts share a common object; the construction is called a direct sum and is characterized by the direct sum systems of Exercise 4.2.2.

In the paper by Manes cited above a more general theory of weakest precondition is given which goes beyond partially additive categories. A generalization of Theorem
4.3.8 establishes that wp(gf, Q) = wp(f, wp(g, Q)) in ANMfn. The interpretation of wp(f, Q) in ANMfn is the one intended by Dijkstra in the book cited above, namely,

   wp(f, Q) = {x | f(x) ≠ ∅ and every y ∈ f(x) is in Q}.

As we mentioned in Exercise 4.3.1, many authors adopt the view that it is natural to define programming constructions in terms of their effects on the weakest precondition operator. We regard the assumptions on Pfn as too specialized to be adaptable to more general semantic categories, however. Counterexamples appear in the paper of Manes cited above.
PART 2 SEMANTICS OF RECURSION
CHAPTER 5

Recursive Specifications

5.1 The Kleene Sequence
5.2 The Pattern-of-Calls Expansion
5.3 Iteration Recursively

A recursive specification "defines a function in terms of itself." Recursive definitions occur commonly in the mathematical literature, including that prior to the computer age. There, the art of separating out "improper" recursive definitions was regarded as but one of the many skills necessary to write correct mathematics. But modern computer languages allow recursive specifications to be expressed directly. Since the implementation of a programming language must respond to any recursive program, no matter how ill-conceived, we must pay attention to the mathematical question of what an "arbitrary" recursive specification should mean.

We open Section 5.1 with some examples of recursive specification to demonstrate that the "desired" denotational semantics is not always clear and that there are several strategies for an operational semantics. A detailed discussion would be too long and we primarily focus on an informal treatment of "all-call" operational semantics. In this and the following three chapters we establish that for a recursive specification of a partial function X → Y there is a mathematical way to describe the specification in terms of a total function

   ψ: Pfn(X, Y) → Pfn(X, Y)

whose all-call semantics coincides with the "Kleene semantics" of ψ. A wide class of recursively defined functions find their expected semantics with this approach.

The pattern-of-calls expansion of Section 5.2 develops a partially additive form of Kleene semantics which capitalizes on the formal power series calculus available in the partially additive category Pfn. A formal proof that the semantics are the same must wait for Chapter 8.

Section 5.3 expresses iteration recursively as is always done, say, in LISP. We extend the usual theory by adjoining concepts from partially additive semantics.
5.1 The Kleene Sequence

A simple example of a recursive specification for a partial function f: N → N is

1   f(n) = 5           if n = 0
           f(n − 1)    else.

This specification is recursive because 1 is not a closed formula, but rather a definition of a function f in terms of itself. Regarding 1 as an equation, we may compute

   f(3) = f(3 − 1) = f(2) = f(2 − 1) = f(1) = f(1 − 1) = f(0) = 5.

Indeed, it is quite clear that every solution f of equation 1 is total (else there exists a smallest n > 0 with f(n) not defined, hence, f(n − 1) is defined, which contradicts f(n) = f(n − 1)) and so f(n) = 5 for all n is the unique solution. Alternatively, we may regard 1 as an algorithm which calls itself. Thus, to compute f(3) we would first call f(3), then f(2), then f(1), and then f(0), which terminates with final value 5; then f(1) (previously suspended) is 5; then f(2) is 5; and, at last, f(3) is 5.

The specification of 1 appears to have only one solution and it seems not to matter here whether it is considered an equation or an algorithm. In general, however, recursive specifications may admit more than one equational solution, and more than one algorithmic solution depending on "calling strategy," and not every algorithmic solution is an equational one. In this section we present a number of basic examples to illustrate the complexity of even disarmingly concise recursive definitions for functions of the form N × ⋯ × N → N. We then give a more mathematical formulation of recursive specification and define the "Kleene sequence" of a specification in precise mathematical terms to capture the idea of the sequence of successive algorithmic calls using the "all-call" strategy. Chapters 5-8 deal, in large part, with alternative algebraic approaches to the semantics of the Kleene sequence.

Our second example of a recursive definition is the familiar factorial function:

2 Example.
The recursive specification

   fact(0) = 1
   fact(n) = n · fact(n − 1)

of the factorial function is a perfectly sound mathematical definition. The usual function f(n) = n! is the only equational or algorithmic solution, as illustrated by the following computation.
   fact(3) = 3 · fact(2)
           = 3 · 2 · fact(1)
           = 3 · 2 · 1 · fact(0)
           = 3 · 2 · 1 · 1
           = 6.

The next example is somewhat more complicated.

3 Example. The function a(m, n) known as Ackermann's function is defined by

   a(m, n) = n + 1                       if m = 0
             a(m − 1, 1)                 if m ≠ 0, n = 0
             a(m − 1, a(m, n − 1))       else.

Thus,

   a(1, 1) = a(0, a(1, 0)) = a(1, 0) + 1 = a(0, 1) + 1 = (1 + 1) + 1 = 3.

Although this may not be obvious, there is only one equational solution and it is total. But here there is possible ambiguity from the algorithmic point of view. In simplifying a(0, a(1, 0)) we chose to "call" the "outermost" a to get a(1, 0) + 1. We could have called the "innermost" a instead, yielding a(0, a(0, 1)). Although the ultimate result is the same in this case, we have made the point that calling strategy is not unique.

Ackermann's function is quite interesting. Even for very small m, n the computation of a(m, n) is very lengthy. We invite the reader to experiment. See the end of chapter notes to see why it is unlikely that any "closed formula" exists: the recursive definition is almost certainly the most convenient description.

The next example shows how easily the equational and algorithmic approaches can be made to give different results.

4 Example. Consider the recursive "definition"

   f(n) = f(n + 1).

If we use this as an algorithm we get the sequence of calls

   f(n) → f(n + 1) → f(n + 2) → ⋯

which fails to terminate, so that f is the everywhere-undefined function. On the other hand, if we regard the above as an equation, then while the
everywhere-undefined function is still a solution, so is every constant total function.

The following example, despite the simplicity of its description, defies analysis at the time of this writing.

5 Example. Define f(n) recursively by

   f(n) = 1              if n = 0 or n = 1
          f(3n + 1)      if n is odd, n > 1
          f(n/2)         else.

Computing equationally, we have

   f(3) = f(10) = f(5) = f(16) = f(8) = f(4) = f(2) = f(1) = 1.

Similarly,

   f(7) = f(22) = f(11) = f(34) = f(17) = f(52) = f(26) = f(13) = f(40) = f(20) = f(10) = f(5) = (as above) 1.

It is clear that if f(x) is defined then f(x) = 1. It is an unsolved problem of number theory whether or not f is total.

We conclude our initial list of examples with a straightforward example in which different calling strategies yield different results.

6 Example. Define f: N × N → N recursively by

   f(m, n) = 18                      if m = 0
             f(m − 1, f(1, 0))       else.

We now compute f(1, 0) algorithmically with two different calling strategies. The dot underneath indicates the call to be made.

(i) "Call the leftmost f"

   f̣(1, 0) → f̣(0, f(1, 0)) → 18.

(ii) "Call the rightmost f"

   f̣(1, 0) → f(0, f̣(1, 0)) → f(0, f(0, f̣(1, 0))) → ⋯.

Here the computation is nonterminating. Our interpretation of (i) is that we check if m = 0 without attempting to verify that n has a value in N.

We hope these few examples, as well as those to follow, present sufficient evidence for the subtlety of the problem of assigning semantics to recursive specifications. As discussed earlier we shall limit our investigation to
one algorithmic strategy which amounts to "calling all occurrences of f simultaneously."

One approach to a rigorous definition would be to adopt a particular programming language, define the syntax of recursive call, and then adapt the "substitute-for-all" concept. In a Pascal-type language we would expect to deal with the issues of "local variables," "parameter passing," and operational semantics generally. This seems quite different from the recursive definitions written in a functional programming language (we shall extend FPF to include recursion in Section 6.3). In keeping with the spirit of this book, we will avoid an approach tied to the specifics of a single programming language and describe recursive specifications entirely in mathematical terms. Just as the requirement of syntactic validity limits the allowable specifications in a specific language, axioms will be imposed to prevent "arbitrary" specifications, and these will allow us to prove theorems guaranteeing that the assigned semantics exists and has useful properties. While we will never formally define "all-call" semantics, this idea underlies the "Kleene semantics" to be given in 16 below.

We now motivate our approach and follow with the desired formal definitions. Let X, Y be sets. A recursive specification of a function f ∈ Pfn(X, Y) takes the general form:

7   "For each x ∈ X, f(x) depends on x and on f as follows ...."

But let us look at 7 in a somewhat different way, phrasing it as follows: "Given x and certain values of f, we may combine them to form the value f(x)." We may abstract away from the individual values of x to say simply

8   "Given any function g there is a way to manipulate it that returns another function, call it ψ(g). The function f that we are seeking to define, then, is such that f = ψ(f)."

Clearly, the ψ mentioned in 8 is a total function g ↦ ψ(g) of the form

9   ψ: Pfn(X, Y) → Pfn(X, Y).

10 Example.
The function $\psi: \mathrm{Pfn}(N^2, N) \to \mathrm{Pfn}(N^2, N)$, corresponding to the specification of the Ackermann function in Example 3, is defined by

$$\psi(h)(m, n) = \begin{cases} n + 1 & \text{if } m = 0 \\ h(m - 1, 1) & \text{if } m \neq 0,\ n = 0 \\ h(m - 1, h(m, n - 1)) & \text{else.} \end{cases} \tag{11}$$

Thus, for example, if $h$ is the total function $h(m, n) = m + n$, $\psi(h)$ is the total function
$$\psi(h)(m, n) = \begin{cases} n + 1 & \text{if } m = 0 \\ m & \text{if } m \neq 0,\ n = 0 \\ 2m + n - 2 & \text{else.} \end{cases}$$

Continuing with our motivation, we now say, mathematically, that $\psi: \mathrm{Pfn}(X, Y) \to \mathrm{Pfn}(X, Y)$ is the recursive specification. This is justified by expressing the concepts we need in terms of $\psi$ as follows. Firstly, the equational solutions are exactly those $h \in \mathrm{Pfn}(X, Y)$ with

$$\psi(h) = h. \tag{12}$$

In general, if $A$ is any set and $\alpha: A \to A$ is any total function, a fixed point of $\alpha$ is an $a \in A$ with $\alpha(a) = a$. This terminology is natural since such $a$ is left fixed by $\alpha$. In particular, the solutions of 12 are called "fixed point solutions." This is the usual terminology in the literature and we henceforth use it instead of the synonym "equational solution."

How is $\psi$ related to "all-call" algorithmic solutions? Here, it pays to think syntactically. Imagine $\psi(h)(x)$ as a formula in $h$ and $x$ (as has been true in our examples so far) and interpret each $h$ as a "call" so that $\psi(h)$ is a formula for the "first level of call." Then $\psi(\psi(h))$ represents the second level of call in the "all-call" strategy because, by definition of $\psi(\psi(h))$, $\psi(h)$ is substituted for each $h$. Defining $\psi^{n+1}(h) = \psi(\psi^n(h))$ as usual, we see $\psi^n(h)$ is the expression for the $n$th level of call. For example, the factorial function corresponds to the map $\psi$ for which

$$\psi(h)(n) = \begin{cases} 1 & \text{if } n = 0 \\ n \cdot h(n - 1) & \text{if } n > 0. \end{cases}$$

Thus,

$$\psi^2(h)(n) = \begin{cases} 1 & \text{if } n = 0 \\ n \cdot \psi(h)(n - 1) & \text{if } n > 0 \end{cases} \;=\; \begin{cases} 1 & \text{if } n = 0 \\ 1 & \text{if } n = 1 \\ n \cdot (n - 1) \cdot h(n - 2) & \text{if } n > 1. \end{cases}$$

We think of $\psi^n(h)$ as having an "exit part" and a "call further part." The "call further" part collects all terms with an $h$, and the "exit part" obtains by ignoring all future calls. In the context of factorial as above, the exit parts are described by

13 exit part for $\psi^1$: if $n = 0$ then 1 else undefined;
exit part for $\psi^2$: if $n = 0$ or 1 then 1 else undefined.
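The operator view described above can be tried out directly. The following is a minimal Python sketch, not part of the original text: it models a partial function as a Python function that returns `None` where it is undefined (an assumption of this sketch), and the names `psi_fact`, `psi_ack`, and `factorial` are hypothetical. It shows the specification as a map from functions to functions, for both the factorial and the Ackermann examples.

```python
def psi_fact(h):
    """Factorial specification: psi(h)(n) = 1 if n == 0, else n * h(n - 1).

    None models 'undefined' (an assumption of this sketch).
    """
    def level(n):
        if n == 0:
            return 1
        prev = h(n - 1)
        return None if prev is None else n * prev
    return level


def psi_ack(h):
    """Ackermann specification 11, again with None for 'undefined'."""
    def level(m, n):
        if m == 0:
            return n + 1
        if n == 0:
            return h(m - 1, 1)
        inner = h(m, n - 1)
        return None if inner is None else h(m - 1, inner)
    return level


def factorial(n):
    """The usual total factorial, used to test the fixed point equation 12."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


# Fixed point check on a finite window: psi(factorial) agrees with factorial.
print(all(psi_fact(factorial)(n) == factorial(n) for n in range(10)))

# Second level of call: psi(psi(h)) substitutes psi(h) for each call of h.
second_level = psi_fact(psi_fact(lambda n: n + 7))
print(second_level(0), second_level(1), second_level(5))
```

Applying `psi_ack` to the total function `lambda m, n: m + n` gives a function that can be compared against the closed form $2m + n - 2$ stated above; the checks are only carried out on a finite range of arguments, so they illustrate the claims rather than prove them.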
A mathematically precise approach which intuitively "causes future calls to be ignored" is to substitute the everywhere-undefined function $\bot \in \mathrm{Pfn}(X, Y)$ for $h$. Thus, $\psi^n(\bot)$ is our candidate for the partial function corresponding to the "all-call" algorithmic solution after $n$ levels of call. The reader should
verify that, in 13, $\psi^i(\bot)$ are, indeed, the exit parts for $\psi^i$. Since the substitution procedure creating $\psi^{n+1}(h)$ from $\psi^n(h)$ should not disturb existing exit terms but may create new exit terms, we expect that $\psi^{n+1}(\bot)$ is an extension of $\psi^n(\bot)$ in the sense of Definition 2.1.9.

The motivations just given have glossed over numerous technical points and our discussion cannot be considered mathematically rigorous. Experience dictates that these ideas, nonetheless, conform to many examples of recursive specifications and this leads to the following mathematical definitions.

14 Definitions. Let $X$, $Y$ be sets. The everywhere-undefined function in $\mathrm{Pfn}(X, Y)$, often called 0 earlier, will synonymously be called $\bot$. We recall from 2.1.9 that $\mathrm{Pfn}(X, Y)$ is a poset under the extension ordering whereby $f \le g$ means $\mathrm{DD}(f) \subseteq \mathrm{DD}(g)$ and $g(x) = f(x)$ for $x \in \mathrm{DD}(f)$. A recursive specification on $\mathrm{Pfn}(X, Y)$ is a total function $\psi: \mathrm{Pfn}(X, Y) \to \mathrm{Pfn}(X, Y)$ such that

$$\psi(\bot) \le \psi^2(\bot) \le \psi^3(\bot) \le \cdots, \tag{15}$$

that is, $\psi^n(\bot) \le \psi^{n+1}(\bot)$ for all $n \ge 1$, where $\psi^n$ means the $n$-fold composition of $\psi$ with itself. When 15 holds, the sequence $\psi(\bot), \psi^2(\bot), \psi^3(\bot), \ldots$ is called the Kleene sequence of $\psi$. The Kleene semantics of $\psi$ is then $f_\psi \in \mathrm{Pfn}(X, Y)$ defined by

$$\mathrm{DD}(f_\psi) = \bigcup_{k=1}^{\infty} \mathrm{DD}(\psi^k(\bot)) \tag{16}$$

while $f_\psi(x) = \psi^k(\bot)(x)$ for any $k$ with $x \in \mathrm{DD}(\psi^k(\bot))$. Because of 15, it does not matter which $k$ is used in the definition of $f_\psi(x)$, that is, $\psi^n(\bot)(x) = \psi^m(\bot)(x)$ if $x \in \mathrm{DD}(\psi^n(\bot)) \cap \mathrm{DD}(\psi^m(\bot))$. We say that $\psi^k(\bot)$ is the $k$th approximant of the Kleene semantics $f_\psi$.

We pause to test these definitions on some of our earlier examples.

17 Example. The recursive definition of factorial in 2 has specification

$$\psi: \mathrm{Pfn}(N, N) \to \mathrm{Pfn}(N, N), \qquad \psi(h)(n) = \begin{cases} 1 & \text{if } n = 0 \\ n \cdot h(n - 1) & \text{else.} \end{cases}$$

It is routinely computed that $\mathrm{DD}(\psi^k(\bot)) = \{0, \ldots, k - 1\}$ with $\psi^k(\bot)(n) = n!$
Thus, the Kleene semantics of $\psi$ is the total factorial function since $f_\psi(n) = \psi^{n+1}(\bot)(n) = n!$
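The Kleene sequence of Definitions 14 can be computed mechanically for small examples. The Python sketch below is illustrative only: `None` stands for "undefined", the helper names (`bottom`, `approximant`, `domain`, `kleene_value`) are hypothetical, and the cut-off `max_k` is an assumption of the sketch, since a program cannot search the whole infinite sequence.

```python
def bottom(n):
    """The everywhere-undefined function; None models 'undefined'."""
    return None


def psi_fact(h):
    """Specification of Example 17: psi(h)(n) = 1 if n == 0 else n * h(n - 1)."""
    def approx(n):
        if n == 0:
            return 1
        prev = h(n - 1)
        return None if prev is None else n * prev
    return approx


def approximant(psi, k):
    """Return psi^k(bottom), the kth term of the Kleene sequence (k >= 1)."""
    h = bottom
    for _ in range(k):
        h = psi(h)
    return h


def domain(h, window):
    """Domain of definition of h, restricted to a finite window of arguments."""
    return [x for x in window if h(x) is not None]


def kleene_value(psi, x, max_k=50):
    """Value at x of the first approximant defined at x, searching k = 1..max_k.

    Returns None if no approximant up to max_k is defined at x; this does
    not prove that the Kleene semantics is undefined there.
    """
    for k in range(1, max_k + 1):
        value = approximant(psi, k)(x)
        if value is not None:
            return value
    return None


print(domain(approximant(psi_fact, 4), range(10)))  # arguments where psi^4(bottom) is defined
print(kleene_value(psi_fact, 5))                    # value of the Kleene semantics at 5
```

For a specification such as $\psi(h)(n) = h(n + 1)$, every approximant is everywhere undefined, so `kleene_value` returns `None` for any cut-off.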
18 Example. The specification for Example 4 is

$$\psi: \mathrm{Pfn}(N, N) \to \mathrm{Pfn}(N, N), \qquad \psi(h)(n) = h(n + 1).$$

Since $\psi(\bot) = \bot$, $\psi^k(\bot) = \bot$ for all $k$ so that the Kleene semantics of $\psi$ is $\bot$.

19 Example. In the context of Example 6, the "all-call" strategy for $f(1, 1)$ produces

$$\underset{\cdot}{f}(1, 1) \to \underset{\cdot}{f}(0, \underset{\cdot}{f}(1, 0)) \to \cdots$$

whose "simultaneous" meaning is perhaps not so clear for reasons discussed in 6. We invite the reader to struggle with it. What is the Kleene semantics here? Well, the specification $\psi: \mathrm{Pfn}(N \times N, N) \to \mathrm{Pfn}(N \times N, N)$ is given by

$$\psi(h)(m, n) = \begin{cases} 18 & \text{if } m = 0 \\ h(m - 1, h(1, 0)) & \text{else;} \end{cases}$$

$$\psi^2(\bot)(m, n) = \begin{cases} 18 & \text{if } m = 0 \\ \psi(\bot)(m - 1, \psi(\bot)(1, 0)) & \text{else.} \end{cases}$$

But $\psi(\bot)(1, 0)$ is undefined so that $\psi(\bot)(m - 1, \psi(\bot)(1, 0))$ is undefined even if $m = 1$. Thus, $\psi^2(\bot) = \psi(\bot)$ and the Kleene sequence is $\psi(\bot), \psi(\bot), \psi(\bot), \ldots$ with Kleene semantics $f_\psi = \psi(\bot)$. Thus, $\mathrm{DD}(f_\psi) = \{(m, n): m = 0\}$, $f_\psi(m, n) = 18$.

We conclude with an example that underscores many of the points developed in this section.

20 Example. Define $f$ recursively by

$$f(n) = \begin{cases} 0 & \text{if } n = 0 \\ 1 & \text{if } n > 0 \text{ and } f(n - 1) > 1 \\ 2 & \text{if } n > 0 \text{ and } f(n - 1) = 1 \\ 3 & \text{else.} \end{cases}$$

This is surely a reasonable mathematical specification since $f(n + 1)$ is defined solely in terms of what happens to $f(n)$, and since $f(0)$ is given. Computing algorithmically,

$f(0) = 0$
$f(1) = 3$ as $f(0) = 0$
$f(2) = 1$ as $f(1) = 3$

so that $f$ is the total function
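$$f(n) = \begin{cases} 0 & n = 0 \\ 1 & n = 2, 4, 6, \ldots \\ 2 & n = 3, 5, 7, 9, \ldots \\ 3 & n = 1. \end{cases} \tag{21}$$

Here

$$\psi(h)(n) = \begin{cases} 0 & \text{if } n = 0 \\ 1 & \text{if } n > 0 \text{ and } h(n - 1) > 1 \\ 2 & \text{if } n > 0 \text{ and } h(n - 1) = 1 \\ 3 & \text{else.} \end{cases} \tag{22}$$

But direct calculation shows

$$\psi(\bot)(n) = \begin{cases} 0 & \text{if } n = 0 \\ 3 & \text{else,} \end{cases} \qquad \psi^2(\bot)(n) = \begin{cases} 0 & \text{if } n = 0 \\ 3 & \text{if } n = 1 \\ 1 & \text{else,} \end{cases}$$

which violates condition 15, namely, that $\psi(\bot) \le \psi^2(\bot) \le \psi^3(\bot) \le \cdots$: for instance, $\psi(\bot)(2) = 3$ while $\psi^2(\bot)(2) = 1$. The situation is resolved informally by asserting that it is illegal to use the halting of the procedure itself as a test (since "else" meant "$h(n - 1) = 0$ or $h(n - 1)$ is undefined"). Formally, we say that the $\psi$ of 22 is illegal because 15 fails.

Experience dictates that reasonable recursive specifications can be restructured so that the Kleene semantics provides the intended semantics. In this case the desired specification is

$$\tilde{\psi}(h)(n) = \begin{cases} 0 & \text{if } n = 0 \\ 1 & \text{if } n > 0 \text{ and } h(n - 1) > 1 \\ 2 & \text{if } n > 0 \text{ and } h(n - 1) = 1 \\ 3 & \text{if } n > 0 \text{ and } h(n - 1) = 0. \end{cases} \tag{23}$$

The initial computation of $f$ goes through the same way because the case that $f(n - 1)$ is undefined never arises. However, now

$$\tilde{\psi}(\bot)(n) = \begin{cases} 0 & n = 0 \\ \bot & \text{else,} \end{cases} \qquad \tilde{\psi}^2(\bot)(n) = \begin{cases} 0 & n = 0 \\ 3 & n = 1 \\ \bot & \text{else,} \end{cases} \qquad \tilde{\psi}^3(\bot)(n) = \begin{cases} 0 & n = 0 \\ 3 & n = 1 \\ 1 & n = 2 \\ \bot & \text{else,} \end{cases}$$

and the Kleene semantics of 23 is just the intended semantics 21.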
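The failure of condition 15 for specification 22, and its repair in 23, can be checked on a finite window of arguments. The following Python sketch is an illustration under stated assumptions: `None` models "undefined", the names `psi_illegal`, `psi_legal`, `extends`, and `is_chain` are hypothetical, and the check only inspects finitely many arguments and finitely many levels, so a `True` result is evidence rather than proof.

```python
def bottom(n):
    """The everywhere-undefined function; None models 'undefined'."""
    return None


def psi_illegal(h):
    """Specification 22: the final 'else' also fires when h(n - 1) is undefined."""
    def out(n):
        if n == 0:
            return 0
        prev = h(n - 1)
        if prev is not None and prev > 1:
            return 1
        if prev is not None and prev == 1:
            return 2
        return 3
    return out


def psi_legal(h):
    """Specification 23: every non-exit case requires h(n - 1) to be defined."""
    def out(n):
        if n == 0:
            return 0
        prev = h(n - 1)
        if prev is None:
            return None
        if prev > 1:
            return 1
        if prev == 1:
            return 2
        return 3  # the remaining case is prev == 0
    return out


def extends(f, g, window):
    """f <= g in the extension ordering, checked on a finite window."""
    return all(f(x) is None or f(x) == g(x) for x in window)


def is_chain(psi, levels, window):
    """Check psi^k(bottom) <= psi^(k+1)(bottom) for k = 1..levels on the window."""
    h = psi(bottom)
    for _ in range(levels):
        nxt = psi(h)
        if not extends(h, nxt, window):
            return False
        h = nxt
    return True


print(is_chain(psi_illegal, 3, range(8)))  # condition 15 fails for 22
print(is_chain(psi_legal, 6, range(8)))    # no violation found for 23
```

The design choice that matters is how "else" treats an undefined recursive value: `psi_illegal` answers 3 in that case, which is exactly the use of halting as a test that the text declares illegal.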
EXERCISES FOR SECTION 5.1

1. Let $a(m, n)$ be Ackermann's function. Compute $a(2, 2)$.

2. The Fibonacci function is defined recursively by $f(n) =$ if $n \le 1$ then 1 else $f(n - 1) + f(n - 2)$. Compute $f(n)$ for $n = 0, \ldots, 8$. Verify that the Kleene semantics coincides with the unique fixed point solution.

3. For $f$ as in Example 5 verify that $f(19) = 1$.

4. Repeat the analysis of Example 4 for the following:
(i) $f(n) = f(n)$.
(ii) $f(n) = f(f(n + 1))$.

5. In Example 6, which of the two solutions, if any, is a fixed point solution?

6. Use the initial object property in the principle of simple recursion to prove that, given $f: N \to N$, $x_0 \in N$, $g(n) =$ if $n = 0$ then $x_0$ else $f(g(n - 1))$ has exactly one fixed point solution for $g$.

7. The class of primitive recursive functions is the class of total functions of the form $N^k \to N$ defined inductively as follows.

Basis Step:
For each $k > 0$, $1 \le i \le k$, $pr_i(n_1, \ldots, n_k) = n_i$ is primitive recursive $N^k \to N$.
succ: $N \to N$, succ$(n) = n + 1$ is primitive recursive.
zero: $N \to N$, zero$(n) = 0$ is primitive recursive.
Pred: $N \to N$, Pred$(n) = 0$ if $n = 0$, $n - 1$ else, is primitive recursive.

Inductive step. If $m, k > 0$, $g_1, \ldots, g_m: N^k \to N$ are primitive recursive and $h: N^m \to N$ is primitive recursive then $f: N^k \to N$ is primitive recursive where

$$f(n_1, \ldots, n_k) = h(g_1(n_1, \ldots, n_k), \ldots, g_m(n_1, \ldots, n_k)).$$

(Primitive recursion) If $g: N^k \to N$ is primitive recursive and if $h: N^{k+2} \to N$ is primitive recursive then $f: N^{k+1} \to N$ is primitive recursive where $f$ is defined recursively by

$$f(n_1, \ldots, n_{k+1}) = \text{if } n_{k+1} = 0 \text{ then } g(n_1, \ldots, n_k) \text{ else } h(n_1, \ldots, n_{k+1}, f(n_1, \ldots, n_{k+1} - 1)). \tag{A}$$

(i) Show that (A) has exactly one equational solution for each fixed total $g$, $h$ and show that this solution is total.
(ii) Show that simple recursion (2.2.23) is a special case of primitive recursion.
(iii) Prove that the following functions are primitive recursive:
$f(m, n) = m + n$.
$f(m, n) = mn$.
$f(m, n) = 0$ if $m < n$, $m - n$ else.
$f(m, n) =$ if $m = n$ then 1 else 0.

8. Let $\psi(h) =$ if $n = 0$ then 0 else $1 + h(h(n - 1))$. Prove that the Kleene semantics is the total identity function $f_\psi(n) = n$. [Hint: show $\psi^k(h) =$ if $n < k$ then $n$ else $1 + h^{2k}(n - 1)$ by induction on $k$.]

9. Let $\psi(h) =$ if $n > 100$ then $n - 10$ else $h(h(n + 11))$. Show that the Kleene semantics is the "91-function" $f_\psi(n) =$ if $n > 100$ then $n - 10$ else 91. [Hint: For $2 \le k \le 102$ show

$$\psi^k(\bot)(n) = \begin{cases} n - 10 & \text{if } n > 100 \\ 91 & \text{if } n \in \{100, 99, \ldots, 102 - k\} \\ \text{undefined} & \text{else} \end{cases}$$

and that $\psi^{102+m}(\bot) = \psi^{102}(\bot)$ for all $m > 0$.]

10. We claimed that the discussion motivating Definitions 14 "glossed over numerous technical points." Debate the following.
(i) If two different syntactic formulas $\psi(h)(x)$ describe the same function $h \mapsto \psi(h)$ the results of "all-call" substitution will compute the same function in both cases.
(ii) All-call substitution always computes a partial function.
(iii) In any specification one can always separate the "exit part" from the "call further part."

11. Let $\psi: \mathrm{Pfn}(N \times N, N) \to \mathrm{Pfn}(N \times N, N)$ be the specification $\psi(h) =$ if $x = y$ then $y + 1$ else $h(x, h(x - 1, y + 1))$. Define

$$f(x, y) = \begin{cases} x + 1 & \text{if } x \ge y \text{ and } x - y \text{ is even} \\ \text{undefined} & \text{else.} \end{cases}$$

Show that $f$ is a fixed point solution of $\psi$. Show that the total function $g(x, y) = x + 1$ is another fixed point solution of $\psi$.

5.2 The Pattern-of-Calls Expansion

In the previous section we considered recursive specifications $\psi: \mathrm{Pfn}(X, Y) \to \mathrm{Pfn}(X, Y)$ which were arbitrary subject to the requirement that $\psi^n(\bot) \le \psi^{n+1}(\bot)$. There is no guarantee that the class of all such $\psi$ is not much larger than the class suggested by the motivating examples, and it is reasonable to consider further axioms to narrow the gap. In this section we regard Pfn as a partially additive category and focus on those $\psi$ which are "power series" of the form
$$\psi(a) = H_0 + H_1(a) + H_2(a, a) + \cdots$$

where, roughly speaking, $H_n$ is the part of the specification involving $n$ calls. An axiomatic treatment in arbitrary partially additive categories is the subject of Chapter 8. This section, which motivates the later work, is limited to a few examples of power-series specifications in Pfn (for which all but the first two of these terms are 0), whose Kleene semantics is represented as a sum which we call the "pattern-of-calls expansion" because there is one term for each possible pattern of call, as we would expect from our earlier motivations in Section 5.1 regarding the "all-call" strategy.

We begin by examining the iterate $f^\dagger: X \to Y$ of a partial function $f: X \to X + Y$. As defined in Theorem 3.2.24,

$$f^\dagger = \sum_{n=0}^{\infty} f_2 f_1^n,$$

where $f_1 = PR_1 f$, $f_2 = PR_2 f$ as in 3.2.15. Intuitively, the following flowchart identity holds:

[Flowchart identity 2: figure not reproduced.]

Writing $f$ as $in_1 f_1 + in_2 f_2$ with flowscheme

[Flowscheme for $f$: figure not reproduced.]

2 takes the form

[Flowscheme 3, from $X$ to $Y$: figure not reproduced.]

Recalling from 3.2.5 that the flowscheme for $g + h: A \to B$ is

[Flowscheme for $g + h$: figure not reproduced.]
3 states that

$$f^\dagger = f^\dagger f_1 + f_2. \tag{4}$$

That 4 actually holds is seen by

$$f^\dagger f_1 + f_2 = \Big(\sum_{n=0}^{\infty} f_2 f_1^n\Big) f_1 + f_2 = \sum_{n=1}^{\infty} f_2 f_1^n + f_2 = \sum_{n=0}^{\infty} f_2 f_1^n = f^\dagger.$$

This suggests that we might define $f^\dagger$ recursively by the specification $\psi: \mathrm{Pfn}(X, Y) \to \mathrm{Pfn}(X, Y)$ given by

[Flowscheme 5, for $\psi(a) = f_2 + a f_1$: figure not reproduced.]

As a first step toward the general definition of power-series maps and the pattern-of-calls expansion, we examine this recursive specification associated with the iterate in more detail. We distinguish two situations in 5. If the upper path is taken, we have a 1-substitution path which transforms $a$ to the partial function

$$H_1(a) = a f_1 \tag{6}$$

given the partial function $a$ substituted for the single call along that path. If the lower path is taken, the partial function returned is

$$H_0 = f_2 \tag{7}$$

which takes no arguments since we have a 0-substitution path. We may then write

$$\psi(a) = H_0 + H_1(a). \tag{8}$$

Let $H$ denote the pair $(H_0, H_1)$. We associate with this a set of partial functions from $X$ to $Y$ which we name $PC(H)$, for pattern of calls of $H$, defined inductively as follows:

9 $H_0 \in PC(H)$. If $a \in PC(H)$, then $H_1(a) \in PC(H)$.

In the inductive step, think of $H_1(a)$ as the 1-substitution path, where a computation with interpretation $a$ is substituted for the single call. This clearly implies that the elements of $PC(H)$ are precisely those partial functions which can be diagrammed as a chain

$$H_1 \to H_1 \to \cdots \to H_1 \to H_0 \qquad (n \ge 0 \text{ occurrences of } H_1)$$

which yields the partial function $H_1^n(H_0) = f_2 f_1^n$, corresponding to $n$ calls of the 1-substitution path followed by a final call of the 0-substitution path. The semantics

$$f^\dagger = \sum_{n=0}^{\infty} f_2 f_1^n = \text{the sum of all partial functions in } PC(H)$$

for the iterate may then indeed be termed the pattern-of-calls expansion for the specification $\psi$ of 5.

The Kleene semantics of 5 is easily computed. From $\psi(a) = f_2 + a f_1$ we have

$$\psi(\bot) = f_2,$$
$$\psi^2(\bot) = f_2 + f_2 f_1,$$
$$\psi^3(\bot) = f_2 + (f_2 + f_2 f_1) f_1 = f_2 + f_2 f_1 + f_2 f_1^2,$$

and an easy induction establishes

$$\psi^k(\bot) = \sum_{n=0}^{k-1} f_2 f_1^n,$$

so that $\psi(\bot) \le \psi^2(\bot) \le \psi^3(\bot) \le \cdots$ and the Kleene semantics coincides with the pattern-of-calls expansion $f^\dagger$. Whereas the Kleene semantics approximates $f^\dagger$ with increasingly larger functions (relative to the extension ordering), the pattern-of-calls expansion sums the "smallest components" of $f^\dagger$.

We now give several further examples of recursive definitions of the form $\psi(a) = H_0 + H_1(a)$, before moving on to "nonlinear" definitions of the form $\psi(a) = H_0 + H_1(a) + H_2(a, a)$ which motivate the general power-series map

$$\psi(a) = \sum_{n \ge 0} H_n(\underbrace{a, \ldots, a}_{n \text{ times}})$$

to be introduced in Chapter 8.

10 Example. The function $g: N \to N$ for which $\mathrm{DD}(g) = \{n \mid n > 5\}$,
$g(n) = n$, may be recursively defined by the specification

[Flowscheme for $\psi(a)$ with a test having exits T and F: figure not reproduced.]

If $p: N \to N + N$ corresponds to the test ($x > 5$?) by

$$p(x) = \begin{cases} (1, x) & \text{if } x > 5 \\ (2, x) & \text{else,} \end{cases}$$

then we see that

$$\psi(a) = p_T + a \cdot p_F,$$

which decomposes into $H_0 = p_T$ which contains 0 occurrences of the variable $a$, and $H_1 = a \cdot p_F$ which contains 1 occurrence of $a$.

11 Example. Consider 5.1.4. Here $\psi(a)(n) = a(n + 1)$ or $\psi(a) = a \cdot \text{succ}$ where succ: $N \to N$, $n \mapsto n + 1$ is the successor function. In this case, $H_0 = \bot$, and $H_1(a) = a \cdot \text{succ}$.

12 Example. Consider the recursive definition

$$f(n) = \begin{cases} 0 & \text{if } n = 0 \\ f(n - 1) + 1 & \text{else.} \end{cases}$$

Introducing the predecessor function pred: $N \to N$ defined by

$$\text{pred}(n) = \begin{cases} 0 & \text{if } n = 0 \\ n - 1 & \text{if } n > 0, \end{cases}$$

we see that the corresponding $\psi$ is

$$\psi(a) = 0 \cdot p_T + \text{succ} \cdot a \cdot \text{pred} \cdot p_F,$$

where 0 is the function constantly 0, and $p: N \to N + N$ corresponds to the test ($n = 0$?). Here $H_0 = 0 \cdot p_T$ and $H_1(a) = \text{succ} \cdot a \cdot \text{pred} \cdot p_F$.
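Example 12 is small enough to run. The sketch below is a hedged illustration rather than the authors' construction: partial functions are Python functions returning `None` where undefined, and `compose`, `padd`, `H0`, `H1`, and `psi` are hypothetical helper names. Composition is written right to left as in the text, and `padd` stands in for the partial sum of Pfn, which is only defined when the two domains are disjoint.

```python
def bottom(n):
    """The everywhere-undefined function; None models 'undefined'."""
    return None


def compose(*fs):
    """Right-to-left composition, as in the text: compose(f, g)(x) = f(g(x))."""
    def out(x):
        for f in reversed(fs):
            if x is None:
                return None
            x = f(x)
        return x
    return out


def padd(f, g):
    """Partial sum f + g of two partial functions with disjoint domains."""
    def out(x):
        fx, gx = f(x), g(x)
        if fx is not None and gx is not None:
            raise ValueError("domains overlap; the sum is undefined")
        return fx if fx is not None else gx
    return out


# Guards for the test (n = 0?): each is a restriction of the identity.
p_T = lambda n: n if n == 0 else None
p_F = lambda n: n if n != 0 else None
succ = lambda n: n + 1
pred = lambda n: 0 if n == 0 else n - 1
zero = lambda n: 0

H0 = compose(zero, p_T)                 # the 0-substitution path


def H1(a):
    """The 1-substitution path: succ . a . pred . p_F."""
    return compose(succ, a, pred, p_F)


def psi(a):
    """The power-series specification psi(a) = H0 + H1(a)."""
    return padd(H0, H1(a))


# Patterns of calls H1^n(H0): the nth one is defined only at n, with value n.
term = H0
for n in range(4):
    print(n, [term(x) for x in range(5)])
    term = H1(term)
```

Iterating `psi` from `bottom` gives approximants that agree with the identity on a growing initial segment, consistent with the claim that the Kleene semantics is the sum of the terms $H_1^n(H_0)$.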
We now consider a "nonlinear" recursive definition of a partial function $f: N \to N$ by

13 $f(x) =$ if $p(x)$ then $f(f(g(x)))$ else $x$.

This corresponds to the specification $\psi(a) = p_F + a^2 g p_T$. (Here squaring refers to composition of partial functions.) Corresponding to 8, we may rewrite this $\psi$ as

$$\psi(a) = H_0 + H_2(a, a) \tag{14}$$

where $H_0 = p_F$ corresponds to the 0-substitution path, while $H_2(a_1, a_2) = a_2 a_1 g p_T$ corresponds to the 2-substitution path, with $a_1$ being the partial function substituted for the first call along the path, while $a_2$ is the partial function substituted for the second call along the path. It is only when the same substitution is made at both places (as in checking the fixed point equation 14) that we force the two arguments of $H_2$ to be equal.

We now define the set $PC(H)$ of patterns of calls for the $H = (H_0, H_2)$ of 14 to be the set of partial functions $N \to N$ given by the inductive definition

15 $H_0 \in PC(H)$. If $a_1$ and $a_2$ are in $PC(H)$, then $H_2(a_1, a_2) \in PC(H)$.

We shall now see that

16 $PC(H)$ is a family of partial functions with disjoint domains whose sum, $e_H = \sum (a: a \in PC(H))$, which we call the pattern-of-calls expansion for $\psi$, coincides with the Kleene semantics of 13.

The idea is that $PC(H)$ exhausts all possible patterns of calls. For example, consider the pattern of calls represented by the tree

$$t_2 = H_2(H_2(H_0, H_0), H_0)$$

(the root $H_2$ has first subtree $H_2(H_0, H_0)$ and second subtree $H_0$), which evaluates to $p_F (p_F p_F g p_T) g p_T = p_F (g p_T)^2$ on noting that $p_F^2 = p_F$. Here $H_0$ corresponds to the 0-substitution path. Next, $t_1 = H_2(H_0, H_0)$ corresponds to the 2-substitution path with each call being to $H_0$; while the overall pattern $t_2$ corresponds to the 2-substitution path with the first call
being to the pattern $t_1$ while the second call is to $H_0$.

The Kleene sequence of 14 takes the form

$$\psi^0(\bot) = \bot$$
$$\psi^1(\bot) = H_0$$
$$\psi^2(\bot) = H_0 + H_2(H_0, H_0)$$
$$\psi^3(\bot) = H_0 + H_2(H_0, H_0) + H_2(H_0, t_1) + H_2(t_1, H_0) + H_2(t_1, t_1), \qquad t_1 = H_2(H_0, H_0),$$

where each term corresponds to one tree of calls. It is clear that any two functions in $PC(H)$ appear as terms in some $\psi^k(\bot)$ and so have disjoint domains, and hence $e_H$ in 16 exists and coincides with the Kleene semantics.

17 Example. To illustrate the utility of the pattern-of-calls expansion we use it to prove that the Kleene semantics of the specification corresponding to 13 reduces to the semantics of while $p$ do $g$. To analyze $e_H$ we recall that $H_0 = p_F$, while $H_2(a_1, a_2) = a_2 a_1 g p_T$. It is thus clear that the leftmost term in every $a \in PC(H)$ is $p_F$, while the rightmost term is $p_T$ unless $a = H_0$. Thus,

$$H_2(a_1, a_2) = (\cdots p_T)(p_F \cdots) g p_T = 0 \quad \text{unless } a_2 = H_0,$$

since $p_T p_F = 0$, the nowhere-defined partial function. We see, then, that the only patterns-of-calls which can make a nonzero contribution to $e_H$ are of the form
$$t_n = H_2(H_2(\cdots H_2(H_2(H_0, H_0), H_0) \cdots, H_0), H_0) \qquad (n \ge 0 \text{ occurrences of } H_2),$$

that is, the tree in which every second call is to $H_0$. Now $t_0 = H_0 = p_F$, while $t_{n+1} = H_2(t_n, p_F) = p_F t_n g p_T$, and we see by induction that $t_n = p_F (g p_T)^n$ since $p_F p_F = p_F$. Thus,

$$e_H = \sum_{n=0}^{\infty} t_n = \sum_{n=0}^{\infty} p_F (g p_T)^n$$

which we recognize as the iterative fixed point solution of 4 corresponding to $f_1 = g p_T$ and $f_2 = p_F$, so that $e_H$ does indeed equal the semantics of while $p$ do $g$ (which is $f^\dagger$ for $f = in_T g p_T + in_F p_F$).

In the next example we consider a simultaneous recursive definition of two functions $t, u \in \mathrm{Pfn}(X, X)$.

18 Example. Consider the simultaneous recursive specification

[Flowschemes for $t$ and $u$: figure not reproduced.]

Here Definition 5.1.14 must be generalized to $\psi: A \to A$ where $A$ is the set $\mathrm{Pfn}(X, X) \times \mathrm{Pfn}(X, X)$ of pairs of partial functions. This $A$ inherits a partial addition from $\mathrm{Pfn}(X, X)$ by adding separately in each component. Decomposing $f$ into $f_1$ and $f_2$ and $g$ into $g_1$ and $g_2$ as before, we then rewrite 18 as

$$t = u t f_1 + f_2, \qquad u = t^2 g_1 + k g_2.$$

We are thus led to define $H_0 \in A$ and $H_2: A^2 \to A$ by
$$H_0 = (f_2, k g_2),$$
$$H_2((a_{t1}, a_{u1}), (a_{t2}, a_{u2})) = (a_{u2} a_{t1} f_1,\ a_{t2} a_{t1} g_1).$$

$H_2$ requires explanation. $H_2$ deals with the paths in the flowscheme 18 which make two calls. If $v \in \{t, u\}$, $j \in \{1, 2\}$, the argument $a_{vj}$ refers to the substitution of $v$ for the $j$th call. To see that this is the correct idea, it is essentially necessary to adapt the process of motivation as in 5.1.9-13 to appreciate the relevance of the Kleene semantics (defined via the obvious generalization of 5.1.14) of the specification

$$\psi(a, b) = H_0 + H_2((a, b), (a, b)). \tag{20}$$

The first two terms of the Kleene sequence are

$$\psi(\bot, \bot) = H_0 = (f_2, k g_2);$$
$$\psi^2(\bot, \bot) = \psi(f_2, k g_2) = H_0 + H_2((f_2, k g_2), (f_2, k g_2)) = (f_2, k g_2) + (k g_2 f_2 f_1,\ f_2^2 g_1) = (f_2 + k g_2 f_2 f_1,\ k g_2 + f_2^2 g_1).$$

We leave it as an exercise for the reader to verify that if the flowschemes for $t$ and for $u$ are substituted for each $t$ and each $u$ in 18 then the exit paths with no calls of $t$, $u$ in the resulting expanded flowschemes are exactly the components of $\psi^2(\bot, \bot)$ above.

For $H = (H_0, H_2)$, $PC(H)$ is defined by

21 $H_0 \in PC(H)$. If $(a_{t1}, a_{u1}), (a_{t2}, a_{u2}) \in PC(H)$, then $H_2((a_{t1}, a_{u1}), (a_{t2}, a_{u2})) \in PC(H)$.

While not obvious at this stage, the Kleene semantics and pattern-of-calls expansion exist and coincide. This follows from the theory in Chapter 8.

22 Example. The recursive specification of Ackermann's function, defined as in 5.1.3, has a "power-series" representation. Here, $\psi: \mathrm{Pfn}(N^2, N) \to \mathrm{Pfn}(N^2, N)$ is given by
[Flowscheme with two tests and three exit branches labeled $(m, n) \mapsto n + 1$, $(m, n) \mapsto a(m - 1, 1)$, and $(m, n) \mapsto a(m - 1, a(m, n - 1))$: figure not reproduced.]

Then $\psi(a) = H_0 + H_1(a) + H_2(a, a)$ where

$H_0(m, n) =$ if $m = 0$ then $n + 1$ else undefined;
$H_1(a)(m, n) =$ if $m > 0$ and $n = 0$ then $a(m - 1, 1)$ else undefined;
$H_2(a_1, a_2)(m, n) =$ if $m, n > 0$ then $a_2(m - 1, a_1(m, n - 1))$ else undefined.

EXERCISES FOR SECTION 5.2

1. Show that the pattern-of-calls expansion for $\psi(h) =$ if $n = 0$ then 5 else $h(n - 1)$ as in 5.1.1 is the total function $f(n) = 5$.

2. Obtain the pattern-of-calls expansion for the specification corresponding to Example 5.1.2 and prove that it is the usual factorial function.

3. For $f \in \mathrm{Pfn}(X, X)$, $p \in \mathrm{Guard}(X)$ a suitable specification for while $p$ do $f$ is $\psi(h) = h f p + p'$. Obtain the Kleene semantics and pattern-of-calls expansion and show they are equal. Discuss why this semantics is intuitively correct.

4. Repeat Exercise 3 for repeat $f$ until $p$.

5. Show that the pattern-of-calls expansion in Example 12 is the identity function $n \mapsto n$.

6. Consider the specification
$\psi(h) =$ if $n \le 1$ then 1 else $h(n - 1) + h(n - 2)$ corresponding to the Fibonacci function of Exercise 5.1.2. There are two candidates for a power-series specification, namely,

(i) $\psi(h) = H_0 + H_1(h)$

and

(ii) $\psi(h) = H_0 + H_2(h, h)$,

where
$H_0 =$ if $n \le 1$ then 1 else undefined;
$H_1(h) = h(n - 1) + h(n - 2)$;
$H_2(h_1, h_2) = h_1(n - 1) + h_2(n - 2)$.

Show that the pattern-of-calls expansions for both (i) and (ii) coincide with the Fibonacci function. [Warning: Do not confuse $+$ in $\mathrm{Pfn}(N, N)$ with $+$ in $N$.]

7. Consider the power-series specification $\psi(h) = H_0 + H_2(h, h)$, where $H_0 =$ if $n = 0$ then 0 else undefined, corresponding to Exercise 5.1.8. Show that the pattern-of-calls expansion is the identity function. [Hint: As in the analysis of 13, consider trees. Show that for any subtree $t$, the tree whose root $H_2$ has $H_0$ and $t$ as its two subtrees is 1 or undefined accordingly as $t$ evaluates to 0 or not.]

5.3 Iteration Recursively

Iteration can only be expressed recursively in the programming language LISP. We have already studied "recursive iteration" in Section 5.2, where we saw that $f^\dagger$ is defined by 5.2.5, $\psi(a) = f_2 + a f_1$. In the present section, we extend this concept by describing an iterative flowscheme with a set of simultaneous recursive equations. We then use partially additive semantics in Pfn and algebra to rewrite and solve these equations. This one example provides methods to be further applied in the exercises. Consider
[Flowscheme 1, from $X$ to $X$, with tests $p$, $q$, $r$ (exits T and F), actions $a$ and $b$, and entry points labeled $f$, $g$, $h$: figure not reproduced.]

for $a, b: X \to X$ in Pfn, $p, q, r: X \to X + X$ with $p = in_1 p_T + in_2 p_F$ with $p_T \in \mathrm{Guard}(X)$, $p_F = p_T'$, and $q = in_1 q_T + in_2 q_F$, $r = in_1 r_T + in_2 r_F$ similarly. Define $f, g, h \in \mathrm{Pfn}(X, X)$ to be the functions computed by starting at the indicated point in 1 and proceeding to the exit. Thus, the semantics of 1 is $f$. The reason for introducing $g$, $h$ lies in the fact that they allow the following simultaneous recursive equations to describe 1:

2 $f =$ if $p$ then $a$ else $g$;
$g =$ if $q$ then $g$ else $h$;
$h =$ if $r$ then $b$ else $f$.

We now express the right-hand side of 2 in partially additive form, as

$$f = g p_F + a p_T \tag{3}$$
$$g = g q_T + h q_F \tag{4}$$
$$h = f r_F + b r_T \tag{5}$$

It requires the theory of Chapters 6-8 to clarify what we intend "the correct solution" of 2 or 3-5 to be. Thus, the algebraic analysis we now give, while highly suggestive, must be justified later. Substituting 5, and then 3, in 4 we obtain

$$g = g q_T + (f r_F + b r_T) q_F = g q_T + (g p_F + a p_T) r_F q_F + b r_T q_F = g (q_T + p_F r_F q_F) + (a p_T r_F q_F + b r_T q_F), \tag{6}$$
which we recognize as the same form as 5.2.4. This prompts us to regard $g$ as $f^\dagger$ for

$$f = in_1 (q_T + p_F r_F q_F) + in_2 (a p_T r_F q_F + b r_T q_F).$$

Accepting this,

$$g = \sum_{n=0}^{\infty} (a p_T r_F q_F + b r_T q_F)(q_T + p_F r_F q_F)^n.$$

But note that since $q_F q_T = 0$, an expression of the form $t q_F (q_T + u)^n$ for $n \ge 1$ and $u \in \mathrm{Guard}(X)$ simplifies as follows:

$$t q_F (q_T + u)^n = t q_F (q_T + u)(q_T + u)^{n-1}$$
$$= t q_F u (q_T + u)^{n-1} \qquad \text{(since } q_F q_T = 0\text{)}$$
$$= t u q_F (q_T + u)^{n-1} \qquad \text{(since guards commute, by 3.3.19(v))}$$
$$= \cdots = t u^n q_F$$
$$= t q_F u^n \qquad \text{(3.3.19(v) again).}$$

Thus,

$$g = \sum_{n=0}^{\infty} (a p_T r_F q_F + b r_T q_F)(p_F r_F q_F)^n \tag{7}$$

and so $g$ is given by

[Flowscheme 8, from $X$ to $X$, with test $s$ and action $c$: figure not reproduced.]

where

9 $s =$ not $p$ and not $q$ and not $r$;
$c = a p_T r_F q_F + b r_T q_F$.

Substituting in 3, the desired semantics is then also that of
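[Flowscheme 10, containing a test with exit T leading to the action $a$: figure not reproduced.]

The reader may easily verify by inspection that 10 is equivalent to 1. Hence, mechanical algebraic manipulation has simplified the original flowscheme!

EXERCISES FOR SECTION 5.3

1. Use the methods of this section to simplify

[Flowscheme from $X$ to $X$ with a test having exits T and F: figure not reproduced.]

to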
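[Flowscheme from $X$ to $X$: figure not reproduced.]

for appropriate $s$, $c$.

2. Show that

[Flowscheme: figure not reproduced.]

simplifies to while $p$ do $a$:

[Flowscheme for while $p$ do $a$: figure not reproduced.]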
3. A matrix over $\mathrm{Pfn}(X, X)$ is an $m \times n$ array $A = [a_{ij}]$, where $i = 1, \ldots, m$, $j = 1, \ldots, n$, and each $a_{ij} \in \mathrm{Pfn}(X, X)$, subject to the requirement that for each $i$, $t$, $u$ with $t \neq u$, $\mathrm{DD}(a_{it}) \cap \mathrm{DD}(a_{iu}) = \emptyset$. If $A = [a_{ij}]$ is an $m \times n$ matrix and if $B = [b_{jk}]$ is an $n \times p$ matrix define an $m \times p$ array $BA$ by a formula familiar from linear algebra, namely,

$$c_{ik} = \sum_{j=1}^{n} b_{jk} a_{ij},$$

where $\sum$ is the usual partially additive sum of $\mathrm{Pfn}(X, X)$ and $b_{jk} a_{ij}$ refers to composition.
(i) Show that the sum for $c_{ik}$ exists.
(ii) Show that $BA$ is a matrix, that is, if $t \neq u$, $\mathrm{DD}(c_{it}) \cap \mathrm{DD}(c_{iu}) = \emptyset$.
(iii) Show that $C(BA) = (CB)A$ for $m \times n$ $A$, $n \times p$ $B$, and $p \times q$ $C$.

4. In the context of 1-5 and the previous exercise, let

$$A = \begin{bmatrix} 0 & p_F & 0 \\ 0 & q_T & q_F \\ r_F & 0 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} a p_T \\ 0 \\ b r_T \end{bmatrix}.$$

(As usual, we represent a matrix such as $A = [a_{ij}]$ as a rectangular array with $a_{ij}$ in row $i$ and column $j$.)
(i) Show that $A$ is a matrix.
If $C = [c_{ij}]$ and $D = [d_{ij}]$ are $n \times n$ matrices over $\mathrm{Pfn}(X, X)$ we say $C + D$ is defined if $c_{ij} + d_{ij}$ exists for all $i$, $j$ and if $C + D$ with $ij$ entry $c_{ij} + d_{ij}$ is again a matrix.
(ii) Writing

$$F = \begin{bmatrix} f \\ g \\ h \end{bmatrix}$$

show that $FA + B$ is defined and that 3, 4, and 5 are equivalent to $F = FA + B$.

For a continuation of the previous two exercises see Exercise 6.2.11.

Notes and References for Chapter 5

"Recursive function theory" has been studied since the 1930s. The class of recursive functions is defined by extending the inductive part of the definition of the primitive recursive functions in Exercise 5.1.7 to include the construction of minimization:
if $f(n_1, \ldots, n_{k+1})$ is recursive then $g$ is recursive if $g(n_1, \ldots, n_k)$ is the least $m$ with $f(n_1, \ldots, n_k, m) = 0$ but with $f(n_1, \ldots, n_k, n)$ defined for $0 \le n \le m$. Since there may be no $m$ with $f(n_1, \ldots, n_k, m) = 0$, this construction produces partial functions which need not be total. The class of recursive functions is known to coincide with the class of functions of the form $N^k \to N$ that can be computed in a programming language such as Fortran, Pascal, LISP, Ada, ....

In 1926, David Hilbert asked whether every recursive function that was total was necessarily primitive recursive. Ackermann's function of Example 5.1.3 provides a counterexample as was shown by W. Ackermann in a paper (written in German) in Mathematische Annalen, 99, 1928, pp. 118-133. After proving that $a(m, n)$ always halts, Ackermann proved it could not be primitive recursive by arguing that the number of computations required for $a(m, n)$ was larger than that required for any primitive recursive function. The reader may wish to check that $a(3, 3) = 61$, although here a computer might be helpful. We do not suggest that the reader should attempt to verify the following extraordinary fact:

$$a(4, 4) = 2^{2^{2^{2^{16}}}} - 3.$$

This was verified by some poor abused secretary who was handed a crate of pencils, two truck loads of paper, and told to figure it out (no hurry, next week will be fine). (The number $a(4, 4)$ is vastly in excess of the number of hydrogen atoms that could fit in a cube having a side the diameter of the Milky Way Galaxy at a density of 1 ton per cubic inch.)

For the state-of-the-art on the unsolved problem of 5.1.5 see J. C. Lagarias, "The 3x + 1 problem and its generalizations," American Mathematical Monthly, 92, January 1985, pp. 3-22.

For details on "computation strategies" see Z. Manna, Mathematical Theory of Computation, McGraw-Hill, 1974, Section 5-2.
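The small Ackermann values mentioned in these notes can be checked by machine. The following Python sketch is one possible way to do so and is not taken from the text: the function name `ackermann` is hypothetical, and an explicit stack of pending first arguments is used instead of direct recursion so that the interpreter's recursion limit is not a concern. It is only practical for very small arguments.

```python
def ackermann(m, n):
    """Ackermann's function as specified in 5.1.11, computed with an explicit stack.

    The stack holds the first arguments of calls still waiting for their
    second argument; n always holds the most recently computed value.
    """
    stack = [m]
    while stack:
        m = stack.pop()
        if m == 0:
            n = n + 1                 # a(0, n) = n + 1
        elif n == 0:
            stack.append(m - 1)       # a(m, 0) = a(m - 1, 1)
            n = 1
        else:
            stack.append(m - 1)       # outer call a(m - 1, _)
            stack.append(m)           # inner call a(m, n - 1), evaluated first
            n = n - 1
    return n


print(ackermann(2, 2))
print(ackermann(3, 3))
```

Pushing `m - 1` before `m` makes the inner call $a(m, n - 1)$ run first, so that its result is in `n` when the outer call $a(m - 1, \cdot)$ is popped.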
The reader may have been surprised at the omission of a number of terms associated with calling strategy such as "call-by-value," "call-by-name," "call-by-reference," and others. These terms refer to details of operational semantics and are usually meaningful only in the context of a specific programming language, so they apply at a level which we do not intend to address in this book.

While Definitions 5.1.14-16 are implicit in the work of S. C. Kleene in recursive function theory, their emphasis in the context of the semantics of programming languages is due to D. S. Scott. See his article, "The lattice of flow diagrams," in E. Engeler (ed.), Symposium on Semantics of Algorithmic Languages, Lecture Notes in Mathematics Vol. 118, Springer-Verlag, 1971, pp. 311-366. For an exposition of Kleene's approach see Chapter 11 of H. Rogers, Jr., Theory of Recursive Functions and Effective Computability, McGraw-Hill, 1967. (Incidentally, Chapter 12 of Rogers' book discusses "Recursively enumerable sets as a lattice.")

Section 5.2 is adapted from the authors' paper, "The pattern-of-calls expansion is the canonical fixpoint for recursive definitions," Journal of the Association for Computing Machinery, 29, 1982, pp. 577-602.

The idea of expressing iteration recursively as in 5.3.2 was emphasized by John McCarthy, the inventor of LISP. See his paper, "Recursive functions of symbolic expressions and their computation by machine," Communications of the Association for Computing Machinery, 3, 1960, pp. 164-195.
CHAPTER 6

Order Semantics of Recursion

6.1 Domains
6.2 Fixed Point Theorems
6.3 Recursive Specification in FPF
6.4 Fixed Points and Formal Languages

The task of the next three chapters is to better understand the abstract principles underlying the ideas associated with recursive definitions of partial functions as presented in Chapter 5 and to thereby elevate the theory to a wide class of semantic categories. This chapter focuses on order semantics of recursion: the use of the theory of posets to provide a framework to formulate recursion and study its properties.

We already used the extension ordering 2.1.9 in our requirement 5.1.15 that a recursive specification $\psi: \mathrm{Pfn}(X, Y) \to \mathrm{Pfn}(X, Y)$ must satisfy $\psi(\bot) \le \psi^2(\bot) \le \psi^3(\bot) \le \cdots$. This condition ensures that the formula for the Kleene semantics $f_\psi$ for $\psi$ of 5.1.16 defines a partial function, and the motivating remarks preceding Definition 5.1.14 support that it is a natural condition. For most recursive specifications in Pfn, $f_\psi$ may be characterized as the least element of the set of fixed point solutions of $\psi$, an abstract property expressed in terms of the poset structure of $\mathrm{Pfn}(X, Y)$ rather than as a specific formula such as 5.1.16 which applies only to partial functions. In Section 6.1 we show that the poset $\mathrm{Pfn}(X, Y)$ belongs to a general class of posets called "domains" and in Section 6.2 we prove that each "continuous" specification $\psi: D \to D$ with $D$ a domain has a least fixed point solution which is, moreover, given by a formula which generalizes 5.1.16. This is a satisfactory theory because, in practice, recursive specifications are continuous.

One is not forced to look to more general semantic categories than Pfn to motivate the need for a more general theory, for the simultaneous recursion of 5.2.18 required an ad hoc generalization of the Definitions 5.1.14 of Kleene semantics. Simultaneous recursion is included in the theory of Section 6.2.
This is then applied in Section 6.3 to extend the functional programming fragment of Section 1.3 to simultaneous recursion definitions in Pfn. Recursive definitions are not limited to programming languages and find many uses in mathematics and computer science. In Section 6.4 we apply the theory of Sections 6.1 and 6.2 to solve for the language generated by a context-free grammar.

6.1 Domains

The desire to define a Kleene semantics formula such as 5.1.16 in more general posets than $\mathrm{Pfn}(X, Y)$ leads to the notion of "domain." Domains are defined in this section and a few examples are given.

Posets were introduced in 2.1.6 and infima and suprema for two-element families were discussed in 3.3.3. We begin by extending these definitions to arbitrary families.

1 Definition. Let $(P, \le)$ be a poset and let $S$ be any subset of $P$. An upper bound of $S$ is an element $x$ of $P$ (not necessarily in $S$) satisfying "for all $s \in S$, $s \le x$." Let $\mathrm{UB}(S)$ denote the set of all upper bounds. Note that $\mathrm{UB}(\emptyset) = P$ (since if $x \in P$, the condition "for all $s \in \emptyset$, $s \le x$" is vacuously true). The supremum or least upper bound of $S$, denoted $\mathrm{LUB}(S)$ or $\bigvee S$, is the least element of $\mathrm{UB}(S)$; thus, it may not exist but is unique if it does, as shown in 3.3.1. A poset $(P, \le)$ is complete if $\mathrm{LUB}(S)$ exists for every subset $S \subseteq P$. A complete poset has a least element, namely, $\mathrm{LUB}(\emptyset)$. We shall often use the symbol $\perp$ for the least element of a poset.

Dually, a lower bound of $S$ is an element $x$ of $P$ (not necessarily in $S$) such that $x \le s$ for all $s \in S$. Denote the set of all lower bounds of $S$ by $\mathrm{LB}(S)$. The infimum or greatest lower bound of $S$, if it exists, is the greatest element of $\mathrm{LB}(S)$ and is denoted $\mathrm{GLB}(S)$ or $\bigwedge S$.

Since $\mathrm{UB}(\emptyset) = P$, $\mathrm{LUB}(\emptyset)$ is the same concept as the least element $\perp$ of $(P, \le)$. Dually, $\mathrm{GLB}(\emptyset)$ is the greatest element. It is immediate from the definitions in 3.3.3 that if $S = \{x, y\}$, $\bigwedge S$ is the same concept as $x \wedge y$ whereas $\bigvee S = x \vee y$.

2 Example.
Let $(\mathbb{N}, \le)$ be the poset of the natural numbers with the usual numerical ordering. Then any nonempty subset of $\mathbb{N}$ has a least element. This amounts to the principle of mathematical induction, that is, if property $P_n$ is not true for all $n$ then $\{n : P_n \text{ is false}\}$ is nonempty and so has a least element $n_0$: assuming $P_0$ is true, $n_0 > 0$, and $P_{n_0 - 1}$ is defined. As $P_{n_0 - 1}$ is true this contradicts the "induction hypothesis" that $P_{n_0 - 1} \Rightarrow P_{n_0}$. It follows that any nonempty subset of $(\mathbb{N}, \le)$ which has at least one upper bound must have a least upper bound. On the other hand, no infinite subset has any upper bounds.
3 Example. The poset $(\mathcal{P}(X), \subseteq)$ of 2.1.8 of all subsets of $X$ is a complete poset. The greatest lower bound of a family $\mathcal{S}$ of subsets of $X$ (which must exist; see Exercise 6) is its intersection

$$\bigcap \mathcal{S} = \{x \in X \mid x \in S \text{ for all } S \in \mathcal{S}\},$$

whereas the least upper bound of $\mathcal{S}$ is its union

$$\bigcup \mathcal{S} = \{x \in X \mid x \in S \text{ for at least one } S \in \mathcal{S}\}.$$

The poset $(\mathrm{Pfn}(X, Y), \le)$ with the extension ordering of 2.1.9 is not complete. Indeed, if $f, g \in \mathrm{Pfn}(X, Y)$ have an $x \in \mathrm{DD}(f) \cap \mathrm{DD}(g)$ such that $f(x) \ne g(x)$, then there are no upper bounds of $\{f, g\}$ since if $f \le h$ and $g \le h$ then $f(x) = h(x) = g(x)$, which is not so. This provides half of the proof of the following:

4 Observation. For $f, g \in \mathrm{Pfn}(X, Y)$, $f, g$ have an upper bound if and only if $f(x) = g(x)$ for all $x \in \mathrm{DD}(f) \cap \mathrm{DD}(g)$.

One way was just observed and, conversely, if $f(x) = g(x)$ for all $x \in \mathrm{DD}(f) \cap \mathrm{DD}(g)$ then $f \le h$, $g \le h$ for $h$ defined by

$$\mathrm{DD}(h) = \mathrm{DD}(f) \cup \mathrm{DD}(g), \qquad h(x) = \begin{cases} f(x) & x \in \mathrm{DD}(f) \\ g(x) & x \in \mathrm{DD}(g). \end{cases}$$

The following is then immediate.

5 Observation. For the partially additive category Pfn, if $(f_i \mid i \in I)$ is summable in $\mathrm{Pfn}(X, Y)$, $\bigvee f_i$ exists in the extension ordering and

$$\sum_{i \in I} f_i = \bigvee_{i \in I} f_i.$$

6 Example. For any sets $X$, $Y$, $(\mathrm{Mfn}(X, Y), \le)$ with $f \le g$ if $f(x) \subseteq g(x)$ for all $x \in X$ is a complete poset. We leave the poset axioms as an exercise for the reader. Given $\{f_i \mid i \in I\} \subseteq \mathrm{Mfn}(X, Y)$,

$$\Bigl(\bigvee f_i\Bigr)(x) = \bigcup_{i \in I} f_i(x)$$

defines the least upper bound, the least element being defined by $\perp(x) = \emptyset$ (corresponding to the case $I = \emptyset$).

We now ask what property of the extension ordering on $\mathrm{Pfn}(X, Y)$ allows a construction such as the formula for Kleene semantics 5.1.16. We have already seen in 4 that certain suprema exist. Indeed, further suprema exist in that $\mathrm{Pfn}(X, Y)$ is a domain as now defined.
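Observation 4 can be tried out concretely. The following Python sketch is only an illustration under stated assumptions: partial functions with finite domain of definition are modeled as dictionaries, and the names `extends`, `consistent`, and `join` are ours rather than notation from the text.

```python
from typing import Dict, Hashable, Optional

# Assumption: a partial function X -> Y with finite domain of definition
# is modeled as a dict whose key set plays the role of DD(f).
Partial = Dict[Hashable, Hashable]


def extends(f: Partial, g: Partial) -> bool:
    """Extension ordering f <= g: DD(f) is contained in DD(g) and g agrees with f on DD(f)."""
    return all(x in g and g[x] == y for x, y in f.items())


def consistent(f: Partial, g: Partial) -> bool:
    """True when f(x) = g(x) for every x in DD(f) ∩ DD(g)."""
    return all(f[x] == g[x] for x in f.keys() & g.keys())


def join(f: Partial, g: Partial) -> Optional[Partial]:
    """Return the function h of Observation 4, or None when {f, g} has no upper bound."""
    if not consistent(f, g):
        return None
    return {**f, **g}


f = {0: 1, 1: 1}
g = {1: 1, 2: 2}
h = join(f, g)           # defined on DD(f) ∪ DD(g)
clash = join(f, {1: 5})  # f and {1: 5} disagree at 1, so no upper bound exists
```

Here `h` extends both `f` and `g`, while `clash` is `None`, mirroring the two directions of the observation.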
7 Definition. A poset $(P, \le)$ is a domain if it has a least element and if whenever $(x_n : n = 1, 2, 3, \dots)$ is an ascending chain in $P$ (which means $x_n \le x_{n+1}$ for all $n$) then $\mathrm{LUB}\{x_n\}$ exists.

8 Example. $(\mathrm{Pfn}(X, Y), \le)$ with the extension ordering

$$f \le g \text{ if } \mathrm{DD}(f) \subseteq \mathrm{DD}(g) \text{ and } g(x) = f(x) \text{ for } x \in \mathrm{DD}(f)$$

of 2.1.9 is a domain. Indeed, if $f_1 \le f_2 \le f_3 \le \cdots$ define $\bigvee f_i$ by

9   $$\mathrm{DD}\Bigl(\bigvee f_i\Bigr) = \bigcup_{i=1}^{\infty} \mathrm{DD}(f_i), \qquad \Bigl(\bigvee f_i\Bigr)(x) = f_k(x), \text{ any } k \text{ with } x \in \mathrm{DD}(f_k).$$

Formula 9 is well defined because if $x \in \mathrm{DD}(f_j) \cap \mathrm{DD}(f_k)$ then either $f_j \le f_k$ or $f_k \le f_j$ (since $m \le n$ implies that $f_m \le f_n$) so that $f_j(x) = f_k(x)$. It is then obvious that each $f_j$ is $\le \bigvee f_i$. Furthermore, if $f_j \le g$ for all $j$ then $\mathrm{DD}(f_j) \subseteq \mathrm{DD}(g)$ so that $\mathrm{DD}(\bigvee f_i) \subseteq \mathrm{DD}(g)$ and $(\bigvee f_i)(x) = f_k(x)$ (for some $k$) $= g(x)$ so that $\bigvee f_i \le g$. This shows that 9 indeed defines the least upper bound.

In the context of Example 8, we see that the Kleene sequence of 5.1.15

10   $$\psi(\perp) \le \psi^2(\perp) \le \psi^3(\perp) \le \cdots$$

is indeed an ascending chain in $\mathrm{Pfn}(X, Y)$ and that its Kleene semantics

$$\mathrm{DD}(f_\psi) = \bigcup_{k=1}^{\infty} \mathrm{DD}(\psi^k(\perp)), \qquad f_\psi(x) = \psi^k(\perp)(x), \text{ any } k \text{ with } x \in \mathrm{DD}(\psi^k(\perp)),$$

as in 5.1.16 is exactly an instance of 9 so that we now see that the Kleene semantics satisfies

11   $$f_\psi = \bigvee_{k=1}^{\infty} \psi^k(\perp).$$

12 Example. $(\mathrm{Mfn}(X, Y), \le)$ as in 6 is a domain. This is obvious since any complete poset is a domain.

In the next section we will introduce a general definition of a recursive specification as a suitable function $\psi: D \to D$ on an arbitrary domain $D$ generalizing $D = \mathrm{Pfn}(X, Y)$. In this context, the following intuition is useful.

13   $f \le g$ for $f, g \in D$ means "$g$ has at least as much information as $f$."

This intuition applies to 10 in the context of the examples of Section 5.1 since
$$\psi^k(\perp) \le \psi^{k+1}(\perp)$$

corresponds to the fact that after $k + 1$ substitutions of $\psi$ (in the "all-call scenario") at least as many and possibly more exits can occur as could for $k$ substitutions. Thus, 10 is a "sequence of approximations" and 11 asserts that the Kleene semantics is the "limit of the approximating sequence."

Hasse diagram notation as in 2.1.6 can be useful even for infinite posets. For example, $(\mathbb{N}, \le)$ as in 2 has Hasse diagram

[Figure: Hasse diagram of $(\mathbb{N}, \le)$.]

The "flattest" Hasse diagram would be

14   • • • 

with one dot for each element in some set $X$, and this describes the discretely ordered poset $(X, =)$ on $X$ where, as usual, $x = y$ means $x$, $y$ are equal. For any set $X$ this is a poset as is easily verified. This is not a domain since there is no least element. This problem may be fixed by adjoining one:

15   [Figure: Hasse diagram of $X$ with a least element $\perp$ adjoined.]

This is indeed a domain since every ascending chain has one of the three forms

$$\perp \le \perp \le \perp \le \cdots, \qquad \perp \le \cdots \le \perp \le x \le x \le \cdots, \qquad x \le x \le x \le \cdots,$$

which have suprema $\perp$, $x$, and $x$, respectively. The formal definition is easily given as follows, where the superscript $\flat$ is the flat symbol from musical notation.

16 Definition. Let $X$ be any set and let $\perp$ be a new object not an element of $X$. The flat domain of $X$ is the poset $X^\flat = (X \cup \{\perp\}, \le)$, where $x \le y$ means "$x = \perp$ or $x = y$."
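Definition 16 is small enough to model directly. The sketch below is a hedged illustration, not part of the text: `BOTTOM` is a sentinel standing in for the new element, and `flat_leq`, `is_ascending`, and `chain_sup` are names chosen here. A finite list is assumed to represent an ascending chain that stays constant after its last listed entry.

```python
# Sentinel playing the role of the adjoined least element; it is assumed
# not to be an element of the underlying set X.
BOTTOM = object()


def flat_leq(x, y) -> bool:
    """Order of the flat domain: x <= y means x is bottom or x equals y."""
    return x is BOTTOM or x == y


def is_ascending(chain) -> bool:
    """Check that consecutive entries are related by the flat ordering."""
    return all(flat_leq(a, b) for a, b in zip(chain, chain[1:]))


def chain_sup(chain):
    """Supremum of an ascending chain given by a finite list of its entries.

    In a flat domain a chain is bottom for a while and then constant,
    so the supremum is the first non-bottom entry, or bottom if none occurs.
    """
    if not is_ascending(chain):
        raise ValueError("not an ascending chain in the flat domain")
    for x in chain:
        if x is not BOTTOM:
            return x
    return BOTTOM
```

For instance, `chain_sup([BOTTOM, BOTTOM, 7, 7])` evaluates to `7`, matching the second of the three chain forms listed above, while a list such as `[3, 4]` is rejected because distinct non-bottom elements are incomparable.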
The Hasse diagram of $X^\flat$ is indeed as in 15 and the reason that $X^\flat$ is a domain was given above. A domain $D$ is flat if $D = X^\flat$, where $X = \{x \in D : x \ne \perp\}$.

We conclude the section by introducing product domains. The ad hoc discussion of simultaneous recursion in 5.2.18 required a specification of the form

$$\mathrm{Pfn}(X, X) \times \mathrm{Pfn}(X, X) \to \mathrm{Pfn}(X, X) \times \mathrm{Pfn}(X, X).$$

Since our general model for a specification will be a function of the form $D \to D$ for $D$ a domain and since $E = \mathrm{Pfn}(X, X)$ is a domain, we have motivated the idea that $E \times E$ should be a domain whenever $E$ is. More general, but quite natural, examples of simultaneous recursion will require that $D_1 \times \cdots \times D_n$ be a domain when the $D_i$ are. The requisite definition is easily given as follows:

17 Definition. Let $(D_1, \le_1), \dots, (D_n, \le_n)$ be domains, $n > 0$. Then their product domain $(D, \le)$ is defined as follows: $D = D_1 \times \cdots \times D_n$ (the product set, 2.3.1, 2.3.11), with

$$(x_1, \dots, x_n) \le (y_1, \dots, y_n) \text{ if } x_i \le_i y_i \text{ for all } i.$$

It is routine to show that $(D, \le)$ is a poset. If

$$(x_1^{(1)}, \dots, x_n^{(1)}) \le (x_1^{(2)}, \dots, x_n^{(2)}) \le (x_1^{(3)}, \dots, x_n^{(3)}) \le \cdots$$

is an ascending chain in $(D, \le)$ then for each $i \in \{1, \dots, n\}$,

$$x_i^{(1)} \le_i x_i^{(2)} \le_i x_i^{(3)} \le_i \cdots$$

is an ascending chain in $(D_i, \le_i)$, and so has supremum $x_i$, and it is clear that $(x_1, \dots, x_n)$ is the supremum in $(D, \le)$ of the original chain. The least element of $(D, \le)$ is $(\perp_1, \dots, \perp_n)$, where $\perp_i$ is the least element of $(D_i, \le_i)$. Thus, $(D, \le)$ is indeed a domain.

EXERCISES FOR SECTION 6.1

1. Explain the use of the term "dually" in Definition 1 (cf. Exercises 3.3.1 and 3.3.2).

2. In any poset, show that for one-element subsets $S = \{s\}$, $\bigvee S$ and $\bigwedge S$ exist and coincide with $s$. This generalizes 3.3.6(i).

3. Let $S$, $T$ be subsets of a poset with $S \subseteq T$ and assume $\bigvee S$, $\bigvee T$ exist. Show that $\bigvee S \le \bigvee T$. State the dual result for infima (hence, of course, no proof is necessary).

4. Let $(P, \le)$ be a poset. Show that the least element is the same concept as $\bigwedge P$. What is the dual statement?

5. Extend Proposition 3.3.5(iii, vi) by proving
$$x \wedge (y \wedge z) = \mathrm{GLB}\{x, y, z\} = (x \wedge y) \wedge z$$

in any meet-semilattice and

$$x \vee (y \vee z) = \mathrm{LUB}\{x, y, z\} = (x \vee y) \vee z$$

in any join-semilattice. Included is the assertion that $\mathrm{GLB}\{x, y, z\}$ (respectively, $\mathrm{LUB}\{x, y, z\}$) exists. [Hint: Cut the work in half, using duality.]

6. Prove that every subset of a complete poset has an infimum. [Hint: $\bigwedge S = \bigvee \mathrm{LB}(S)$.]

7. Let $(P, \le)$ be a poset. A subset $S$ of $P$ is consistent if each two elements of $S$ have an upper bound in $P$. $(P, \le)$ is consistently complete if $(P, \le)$ is a meet-semilattice with least element in which every consistent subset has a least upper bound.
   (i) Show that every complete poset is consistently complete. [Hint: Use Exercise 6.]
   (ii) In $\mathrm{Pfn}(X, Y)$, with the extension ordering of 2.1.9, show that "overlap summable" and "consistent" coincide. Conclude that $\mathrm{Pfn}(X, Y)$ is consistently complete.
   (iii) In $\mathrm{Pfn}(X, Y)$, show that "disjoint-domain-summable" coincides with "consistent and $f_i \wedge f_j = \perp$ if $i \ne j$."
   (iv) Show that $(\mathbb{N}, \le)$ as in Example 2 is not consistently complete.

8. Let $(D, \le)$ be a domain, let $x_1 \le x_2 \le x_3 \le \cdots$ be an ascending chain, and let $k > 1$. Show that $\bigvee \{x_1, x_2, x_3, \dots\} = \bigvee \{x_k, x_{k+1}, x_{k+2}, \dots\}$.

9. Let $D = X^\flat$, where $X$ has one element. Show that $D \times D$ is not flat. [Hint: Draw the Hasse diagram.]

10. Let $(P, \le)$ be the poset of all nonzero real numbers with the usual numerical ordering. Let $S = \{x \in P \mid x < 0\}$. Show that $\mathrm{UB}(S)$ is infinite but has no least element. Conclude that $(P, \le)$ is not consistently complete (as defined in Exercise 7).

6.2 Fixed Point Theorems

The previous part of this chapter has motivated the following generalizations of Definitions 5.1.14:

1 Definitions. Let $D$ be a domain. A recursive specification on $D$ is a total function $\psi: D \to D$ such that

2   $$\psi(\perp) \le \psi^2(\perp) \le \psi^3(\perp) \le \cdots$$

When 2 holds, the sequence $\psi(\perp), \psi^2(\perp), \psi^3(\perp), \dots$ is the Kleene sequence of $\psi$. The Kleene semantics of $\psi$ is then $f_\psi \in D$ defined by
3   $$f_\psi = \bigvee_{n=1}^{\infty} \psi^n(\perp).$$

This supremum exists by the definition of a domain and generalizes 5.1.16 by 6.1.11.

In this section we investigate conditions on a poset $(P, \le)$ and function $\psi: P \to P$ that guarantee the existence of fixed points $f \in P$ with $\psi(f) = f$. The most important result is Theorem 13 below which asserts that if $D$ is a domain and $\psi: D \to D$ is continuous, $f_\psi$ of 3 is the least fixed point of $\psi$. We begin with the following:

4 Observation. If $(D, \le)$ is a domain and $\psi: (D, \le) \to (D, \le)$ is monotone as defined in 2.1.10 then $\psi$ is a recursive specification.

To prove this, first observe that $\perp \le f$ is true for any $f$ in $D$ since $\perp$ is the least element, so $\perp \le \psi(\perp)$ must hold. By monotonicity, $\psi(\perp) \le \psi^2(\perp)$, $\psi^2(\perp) \le \psi^3(\perp)$, ....

The definition of least fixed point is formally given as follows.

5 Definition. Let $(P, \le)$ be a poset and let $\psi: P \to P$ be a total function. A fixed point of $\psi$ is an element $f$ of $P$ satisfying $\psi(f) = f$. The least fixed point of $\psi$ (if it exists) is the least element of the set of fixed points of $\psi$.

We claimed in Example 5.1.2 that the recursive specification of the factorial function has only one fixed point solution. The next example provides a more careful verification.

6 Example. Consider the recursive specification $\psi: \mathrm{Pfn}(\mathbb{N}, \mathbb{N}) \to \mathrm{Pfn}(\mathbb{N}, \mathbb{N})$ for the factorial function

$$\psi(h)(n) = \begin{cases} 1 & \text{if } n = 0 \\ n \cdot h(n - 1) & \text{else.} \end{cases}$$

Then $\mathrm{DD}(\psi(h)) = \{0\} \cup \{n : n - 1 \in \mathrm{DD}(h)\}$ so that $g \le h$ implies $\mathrm{DD}(\psi(g)) \subseteq \mathrm{DD}(\psi(h))$, and an easy induction argument then establishes that $\psi(g) \le \psi(h)$. Hence, $\psi$ is monotone. Let $f(n) = n!$. Clearly, $\psi(f) = f$ so $f$ is a fixed point. Suppose $\psi(h) = h$. Then $h(0) = \psi(h)(0) = 1 = f(0)$. Now assume $h(n) = f(n)$ for $n = 0, \dots, k$. Then $h(k + 1) = \psi(h)(k + 1) = (k + 1) \cdot \psi(h)(k) = (k + 1) \cdot f(k) = (k + 1)k! = f(k + 1)$. Thus, $f$ is the only fixed point of $\psi$, and so is the least fixed point of $\psi$.

7 Example.
Let $\psi$ be the recursive specification corresponding to Example 5.1.4, $\psi(h)(n) = h(n + 1)$. This is obviously monotone. In 5.1.4 we showed that the set of fixed points is
the set of total constant functions together with $\perp$, and so $\perp$ is the least fixed point.

Because these examples suggest a connection between the semantics of a specification and the least fixed point, we shall prove two general theorems concerning the existence of fixed points. The first is:

8 Theorem. Let $(P, \le)$ be a poset and let $\psi: (P, \le) \to (P, \le)$ be monotone. Then if

9   $$f = \bigvee \{h : h \le \psi(h)\}$$

exists, it is a fixed point of $\psi$.

PROOF. Set $H = \{h : h \le \psi(h)\}$ so that $f = \mathrm{LUB}(H)$. For any $h \in H$ we have $h \le \psi(h)$ whereas $\psi(h) \le \psi(f)$ since $\psi$ is monotone and $f \in \mathrm{UB}(H)$ so, by transitivity, $h \le \psi(f)$. As $f = \mathrm{LUB}(H)$,

10   $$f \le \psi(f).$$

As $\psi$ is monotone, 10 yields $\psi(f) \le \psi(\psi(f))$ so that $\psi(f) \in H$. But then as $f \in \mathrm{UB}(H)$, $\psi(f) \le f$ which together with 10 and antisymmetry yields $f = \psi(f)$. □

11 Example. Let $D$ be any domain and let $\psi: D \to D$ be the identity function $\psi(d) = d$. Then $\psi$ is monotone. The Kleene semantics is $f_\psi = \perp$ since $\psi^n(\perp) = \perp$ for all $n$. This is a fixed point solution and is surely the least fixed point, being the least element of $D$. Since the supremum of 9,

$$f = \bigvee D,$$

is the greatest element of $D$, this may not exist and if it does will produce a different fixed point solution as long as $D$ has at least two elements.

Theorem 8 has useful applications in mathematics (see Exercise 3) but is not very useful in semantics. We now introduce continuous functions which lead to the more useful Theorem 13.

12 Definitions. Let $(D, \le)$ and $(E, \le')$ be domains. A monotone map $\psi: (D, \le) \to (E, \le')$ is continuous if it preserves least upper bounds of ascending chains. That is, whenever $f_0 \le f_1 \le f_2 \le \cdots$ is an ascending chain with $f = \bigvee (f_n)$ then $\psi(f_0) \le' \psi(f_1) \le' \psi(f_2) \le' \cdots$ (which is automatically an ascending chain because $\psi$ is monotone) has $\psi(f) = \bigvee (\psi(f_n))$. Thus, $\psi(\bigvee (f_n)) = \bigvee (\psi(f_n))$.

We are now ready to prove the Kleene fixed point theorem:

13 Theorem.
Let $(D, \le)$ be a domain and let $\psi: (D, \le) \to (D, \le)$ be continuous. Then the Kleene semantics

$$f_\psi = \bigvee_{n=1}^{\infty} \psi^n(\perp)$$
(as in 3) is the least fixed point of $\psi$.

PROOF. If $f_0 \le f_1 \le f_2 \le \cdots$ is any ascending chain, so is $f_1 \le f_2 \le f_3 \le \cdots$ and both have exactly the same set of upper bounds and so must have the same least upper bound (both being least elements of the same set). It follows that the least upper bound $f$ of $\perp \le \psi(\perp) \le \psi^2(\perp) \le \cdots$ must also be the least upper bound of $\psi(\perp) \le \psi^2(\perp) \le \psi^3(\perp) \le \cdots$. But the latter is exactly $\psi(f)$ by the continuity of $\psi$. It follows that $\psi(f) = f$ and $f$ is a fixed point of $\psi$. Now let $\psi(g) = g$ be an arbitrary fixed point. As $\perp \le g$ and $\psi$ is monotone, $\psi(\perp) \le \psi(g) = g$. Similarly, $\psi^2(\perp) \le \psi(g) = g$, $\psi^3(\perp) \le \psi(g) = g$, ... so $\psi^n(\perp) \le g$ for all $n$. Thus, $g$ is an upper bound of $\{\psi^n(\perp)\}$ and, as $f$ is the least upper bound of this set, $f \le g$. □

In certain situations, an intuitive guess $f$ for the value of $f_\psi$ may be hard to verify using the formula of 3, whereas it might be relatively easy to show directly that $f$ is the least fixed point of $\psi$. Thus, $f = f_\psi$ by Theorem 13. Example 6 illustrates this, provided we show $\psi$ there is continuous. This is done next.

14 Example. We saw in Example 6 that the factorial function is the least fixed point of the specification

$$\psi(h)(n) = \begin{cases} 1 & n = 0 \\ n \cdot h(n - 1) & \text{else.} \end{cases}$$

Such $\psi$ is continuous as follows. Monotonicity was already observed above. If $h = \bigvee h_k$ with $h_0 \le h_1 \le h_2 \le \cdots$ then $\mathrm{DD}(\psi(h)) = \{0\} \cup \{n : n - 1 \in \mathrm{DD}(h_k) \text{ for some } k\}$. $\psi(h)(0) = 1 = \psi(h_k)(0)$ for any $k$. For $n > 0$, $\psi(h)(n) = n \cdot h_k(n - 1)$ for any $k$ with $n - 1 \in \mathrm{DD}(h_k)$. Thus, $\psi(h) = \bigvee (\psi(h_k))$. This shows that Example 6 is an instance of Theorem 13 and proves that the factorial function is the Kleene semantics.

15 Example. The specification of 5.1.23, which defines $\psi(h)(n)$ by four cases (if $n = 0$; if $n > 0$ but $h(n - 1) > 1$; if $n > 0$ and $h(n - 1) = 1$; if $n > 0$ and $h(n - 1) = 0$), is easily seen to be continuous.
Setting $f = \psi(f)$ we get

$$f(0) = 0, \qquad f(1) = 3 \text{ as } f(0) = 0, \qquad f(2) = 1 \text{ as } f(1) = 3,$$
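thus revealing this earlier calculation in 5.1.20 as a proof that $\psi$ has a unique fixed point. The specification of 5.1.20 was seen not to be monotone so Theorem 13 does not apply, but the same total function is the unique fixed point.

16 Example. Not every monotone map $\psi: D \to D$ is continuous. For an example with $D = \mathrm{Pfn}(\mathbb{N}, \mathbb{N})$ see Exercise 6. A simple abstract example is as follows. Let $(D, \le)$ be the domain $\mathbb{N} \cup \{\infty, \infty + 1\}$, where $\infty$, $\infty + 1$ are new objects not in $\mathbb{N}$, with

$$0 < 1 < 2 < \cdots < n < \cdots < \infty < \infty + 1.$$

This is a complete poset with

$$\bigvee A = \begin{cases} 0 & \text{if } A = \emptyset \\ \text{largest element in } A & \text{if } A \text{ is finite, nonempty} \\ \infty & \text{if } A \text{ is infinite, } \infty + 1 \notin A \\ \infty + 1 & \text{if } \infty + 1 \in A \end{cases}$$

and is a domain in particular. Define $\psi: D \to D$ by

$$\psi(x) = \begin{cases} x + 1 & \text{if } x \in \mathbb{N} \\ \infty + 1 & \text{if } x = \infty \text{ or } x = \infty + 1. \end{cases}$$

Then $\psi$ is easily seen to be monotone but if $x_n = n$, $\psi(\bigvee x_n) = \psi(\infty) = \infty + 1$, whereas $\bigvee \psi(x_n) = \bigvee x_{n+1} = \infty$ so $\psi$ is not continuous. Note that

$$f_\psi = \bigvee_{n=0}^{\infty} (n + 1) = \infty$$

is not a fixed point of $\psi$.

We close the section by showing how the theory includes simultaneous recursion.

17 Example. The recursive specification

$$g(n) = \text{if } n = 0 \text{ then } h(n, n) \text{ else } 5$$
$$h(m, n) = \text{if } m = 0 \text{ then } 0 \text{ else } g(n)$$

is formalized by the specification $\psi: \mathrm{Pfn}(\mathbb{N}, \mathbb{N}) \times \mathrm{Pfn}(\mathbb{N}^2, \mathbb{N}) \to \mathrm{Pfn}(\mathbb{N}, \mathbb{N}) \times \mathrm{Pfn}(\mathbb{N}^2, \mathbb{N})$,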
$$\psi_1(t, u)(n) = \text{if } n = 0 \text{ then } u(n, n) \text{ else } 5,$$
$$\psi_2(t, u)(m, n) = \text{if } m = 0 \text{ then } 0 \text{ else } t(n),$$

where we use the product domain construction of 6.1.17, and the notation $\psi^k(t, u) = (\psi_1^k(t, u), \psi_2^k(t, u))$. It is routine to check that $\psi$ is continuous. The first three terms of the Kleene sequence are as follows:

$$\psi_1(\perp, \perp)(n) = \text{if } n > 0 \text{ then } 5 \text{ else undefined},$$
$$\psi_2(\perp, \perp)(m, n) = \text{if } m = 0 \text{ then } 0 \text{ else undefined},$$

$$\psi_1^2(\perp, \perp)(n) = \begin{cases} 5 & \text{if } n > 0 \\ 0 & \text{if } n = 0, \end{cases} \qquad \psi_2^2(\perp, \perp)(m, n) = \begin{cases} 0 & \text{if } m = 0 \\ 5 & \text{if } m, n > 0 \\ \text{undefined} & \text{else,} \end{cases}$$

$$\psi_1^3(\perp, \perp)(n) = \begin{cases} 5 & \text{if } n > 0 \\ 0 & \text{if } n = 0, \end{cases} \qquad \psi_2^3(\perp, \perp)(m, n) = \begin{cases} 0 & \text{if } m = 0 \text{ or } n = 0 \\ 5 & \text{if } m, n > 0. \end{cases}$$

Since $\psi^3(\perp, \perp)$ consists of total functions, $\psi^3 = \psi^4 = \psi^5 = \cdots$ and coincides with $f_\psi$. Thus, the Kleene semantics is

$$f_\psi = (\text{if } n = 0 \text{ then } 0 \text{ else } 5,\ \text{if } m, n > 0 \text{ then } 5 \text{ else } 0).$$

If $f$ were another fixed point solution then $f_\psi \le f$, $f_\psi$ being the least fixed point, whereas $f_\psi$ consists of total functions so that $f_\psi = f$. This shows $f_\psi$ is the unique fixed point solution.

EXERCISES FOR SECTION 6.2

1. Complete the argument of Example 6 by providing the omitted induction argument.

2. Let $X$, $Y$ be sets and let $f: X \to Y$ be a total function. Show that the following functions are continuous.
   (i) $f_*: (\mathcal{P}(X), \subseteq) \to (\mathcal{P}(Y), \subseteq)$, $f_*(A) = \{y \in Y \mid y = f(a) \text{ for some } a \in A\}$.
   (ii) $f^*: (\mathcal{P}(Y), \subseteq) \to (\mathcal{P}(X), \subseteq)$, $f^*(B) = \{x \in X \mid f(x) \in B\}$.

3. The Cantor-Schroeder-Bernstein theorem of set theory asserts that if $f: X \to Y$, $g: Y \to X$ are total injective functions then there exists an isomorphism $h: X \to Y$. This is quite an amazing result. For example, if $X$, $Y$ are the indicated subsets of the plane

[Figure: two subsets $X$ and $Y$ of the plane.]
then it is obvious that there exists an injective function $f: X \to Y$ (create a "photographically reduced" copy of $X$ inside one of the shaded rectangles in $Y$ to define $f$) and, similarly, there exists an injective function $g: Y \to X$, but it is much less obvious that there exists a bijective function $h: X \to Y$. Prove the Cantor-Schroeder-Bernstein theorem by using the following outline.
   (i) Given $f$, $g$ define $\psi: (\mathcal{P}(X), \subseteq) \to (\mathcal{P}(X), \subseteq)$ by $\psi(A) = X - g_*(Y - f_*(A))$, where $f_*$, $g_*$ are as in Exercise 2 and $X - A$ means $\{x \in X \mid x \notin A\}$.
   (ii) By Theorem 8, $\psi$ has a fixed point $S$, $\psi(S) = S$. Thus, $X - S = g_*(Y - f_*(S))$.
   (iii) Show that $h: X \to Y$ is well defined and bijective if

$$h(x) = \begin{cases} f(x) & \text{if } x \in S \\ \text{the unique } y,\ g(y) = x & \text{else.} \end{cases}$$

4. Show that the fixed point in Theorem 8 is, in fact, the greatest fixed point of $\psi$. Conclude, using duality and Exercise 6.1.6, that $\bigvee \{g : g \le h \text{ for all } h \text{ with } \psi(h) \le h\}$, if it exists, is the least fixed point of $\psi$.

5. Show that "domains and continuous functions" with composition and identities at the Set level is a category for which the construction of 6.1.17, with the usual Set-level projections, provides finite products. What is the terminal object? Prove that isomorphisms in this category are the isomorphisms of posets.

6. Define $f_k \in \mathrm{Pfn}(\mathbb{N}, \mathbb{N})$ by

$$f_k(n) = \text{if } n \text{ is even and } \le 2k \text{ then } n \text{ else undefined}$$

and define $\psi: \mathrm{Pfn}(\mathbb{N}, \mathbb{N}) \to \mathrm{Pfn}(\mathbb{N}, \mathbb{N})$ by

$$\psi(h) = \begin{cases} f_{m+1} & \text{if } \mathrm{DD}(h) \text{ is finite and } m \text{ is the largest } k \text{ with } f_k \le h \\ \mathrm{id}_{\mathbb{N}} & \text{else.} \end{cases}$$

Show that $\psi$ is monotone but that $f_k$ is an ascending chain with $\psi(\bigvee f_k) \ne \bigvee (\psi(f_k))$, so that $\psi$ is not continuous. Show that the Kleene semantics of $\psi$ is not a fixed point of $\psi$.
7. Investigate the Kleene semantics of

$$g(n) = \text{if } n > 0 \text{ then } h(n - 1, n) \text{ else } 5,$$
$$h(m, n) = \text{if } m = 0 \text{ then } 0 \text{ else } g(n + 1).$$

8. Use Theorem 13 to do Exercise 5.1.8.

9. Use Theorem 13 to show that the function $f$ provides the Kleene semantics in Exercise 5.1.11.

10. Investigate the Kleene semantics of

$$f(n) = \text{if } n = 0 \text{ then } 1 \text{ else } n - g(f(n - 1)),$$
$$g(n) = \text{if } n = 0 \text{ then } 0 \text{ else } n - f(g(n - 1)).$$

Show that the corresponding specification is continuous.

Exercise 11 continues to develop iterative programs from the matrix point of view, building on Exercises 5.3.3-4.

11. An abstract iterative program with $n$ loops over $\mathrm{Pfn}(X, X)$ is $(A, B, n)$, where $A$ is $n \times n$ and $B$ is $n \times 1$ such that $A : B$ is an $n \times (n + 1)$ matrix over $\mathrm{Pfn}(X, X)$. Such describes an algorithm to compute a partial function $f_1 \in \mathrm{Pfn}(X, X)$ by the obvious generalization of the flow diagram for $n = 3$.

[Figure: flow diagram for $n = 3$, with rows computing $f_1$, $f_2$, $f_3$, feedback functions $a_{ij}$, and exit functions $b_i$.]

In general, the functions in row $i$ have disjoint domains and at most one of these
can act on an input value, so each row has only one input line. (The flow diagram is drawn for $n = 3$, with $A = [a_{ij}]$ and $B = [b_i]$; the general case is the obvious generalization.) Furthermore, $a_{ij}$ feeds back to row $j$ and $b_i$ exits. The function $f_i$ is the function computed if entry is in row $i$.
   (i) Show that $(A, B, 3)$ of Exercise 5.3.4 describes 5.3.1.
   (ii) For general $(A, B, n)$, if $F$ is an $n \times 1$ "unknown," show that $FA + B$ is defined and that if $F = [f_i]$ is "the solution," that is, if $f_i$ is the computed function beginning in row $i$, then $F = FA + B$.
   (iii) Using the product domain $D = (\mathrm{Pfn}(X, X))^n$, show that $\psi: D \to D$, $\psi(F) = FA + B$ is continuous, and that

$$f_\psi = \sum_{m=0}^{\infty} BA^m$$

   (including the assertion that this sum exists; $BA^0$ means $B$).
   (iv) Argue that "the solution" $F$ obtained by "running the flowscheme" of $(A, B, n)$ is $f_\psi$. [Hint: Prove by induction that if $f_i(x)$ is obtained in $m$ loops then $f_i(x) = g_i(x)$, where $g_i$ is the $i$th entry of $B + BA + \cdots + BA^m$.]

6.3 Recursive Specification in FPF

In this section we extend the functional programming fragment FPF of Section 1.3 to allow recursive specification and, in particular, iterative constructs. Our approach is very straightforward. We extend the syntax to include FPF function expressions with function variables and use these to define simultaneous recursions. The semantics is defined as the Kleene semantics using Theorem 6.2.13. This illustrates the use of order semantics in providing a formal semantics for a programming language. The many examples of functions that can now be defined in FPF make the point that the Kleene semantics is a reasonable one even if, as discussed in Section 5.1, there are other approaches.

We begin by summarizing the syntax of recursion in FPF in Table 1. (A final complete syntax for FPF appears in Table 24 on p. 165.) The reader should reread Section 1.3 at this point to regain familiarity with FPF and its notations. We begin with an example.
Table 1 Syntax of Recursive Definition in FPF

In addition to Table 1.3.1:

New Alphabet Symbols
   Letters: A B ... Z
   Recursive Definition Symbol: ⇐

A function variable is a nonempty string of letters.

The set of recursive expressions (REs for short) is defined by
   Basis Step: A function is an RE. A function variable is an RE.
   Inductive Step: Same as for functions in Table 1.3.1.

A recursive specification is

   (G_1 ⇐ r_1, G_2 ⇐ r_2, ..., G_n ⇐ r_n),

where n > 0, G_1, ..., G_n are distinct function variables and r_1, ..., r_n are REs such that Var(r_i) (as defined in 7) ⊆ {G_1, ..., G_n} for 1 ≤ i ≤ n.

A recursive definition of a function to sum all the numerals on a tree (ignoring empty subtrees) is

2   sum(n) = n (if n is a numeral),
    sum(⟨⟩) = 0,
    and, for k > 0,

$$\mathrm{sum}(\langle t_1, \dots, t_k \rangle) = \begin{cases} t_1 + \mathrm{sum}(\langle t_2, \dots, t_k \rangle) & \text{if } t_1 \text{ is a numeral} \\ \mathrm{sum}(\langle t_2, \dots, t_k \rangle) & \text{if } t_1 = \langle\,\rangle \\ \mathrm{sum}(\langle \mathrm{head}\ t_1, \mathrm{tail}\ t_1, \langle t_2, \dots, t_k \rangle \rangle) & \text{else.} \end{cases}$$

For example,

   sum⟨1, ⟨2, 3⟩⟩ = 1 + sum⟨⟨2, 3⟩⟩
                 = 1 + sum⟨2, ⟨3⟩, ⟨⟩⟩
                 = 3 + sum⟨⟨3⟩, ⟨⟩⟩
                 = 3 + sum⟨3, ⟨⟩, ⟨⟨⟩⟩⟩
                 = 6 + sum⟨⟨⟩, ⟨⟨⟩⟩⟩
                 = 6 + sum⟨⟨⟨⟩⟩⟩
                 = 6 + sum⟨⟨⟩, ⟨⟩, ⟨⟩⟩
                 = 6 + sum⟨⟨⟩, ⟨⟩⟩
                 = 6 + sum⟨⟨⟩⟩
                 = 6 + sum⟨⟩
                 = 6 + 0 = 6.
For purposes of illustration, we code 2 in FPF. The following abbreviation is useful:

3   For u ∈ DTN, eq_u =abb (= ∘ [id, ≡u]).

The semantics is

$$\mathrm{eq}_u : t = \begin{cases} T & \text{if } t = u \\ F & \text{if } t \ne u. \end{cases}$$

We then have

4 Example. An FPF recursive specification corresponding to 2 is

   SUM ⇐ (if num then id else
            (if eq⟨⟩ then ≡0 else
              (if (num ∘ head) then (+ ∘ [head, (SUM ∘ tail)]) else
                (if (eq⟨⟩ ∘ head) then SUM ∘ tail
                 else SUM ∘ [head ∘ head, tail ∘ head, tail])))).

4 has the form G_1 ⇐ r_1 with G_1 = SUM and r_1 everything to the right of the ⇐. The general specification in Table 1, with n > 1, will be used for simultaneous recursion. Before giving an example of this type, we define the abbreviation

5   Pair num =abb (if (eq⟨⟩ ∨ num) then F else (if ((num ∘ head) ∧ (num ∘ tail)) then T else F))

with semantics

$$\text{Pair num} : t = \begin{cases} T & \text{if } t = \langle m, n \rangle \text{ with } m, n \text{ numerals} \\ F & \text{else.} \end{cases}$$

6 Example. An FPF coding of the simultaneous recursion of Example 6.2.17 is (G_1 ⇐ r_1, G_2 ⇐ r_2), where

   r_1 =abb if (eq_0 ∧ num) then (G_2 ∘ [id, id]) else ≡5,
   r_2 =abb if ((eq_0 ∘ pr_1) ∧ Pair num) then ≡0 else G_1.

These examples help make it clear what we are trying to do and we turn now to a formal semantics.

7 Definition. If r is an RE, Var(r) denotes the set of function variables occurring in r. For example, if r = (if (SUM ∘ [G, +]) then (G ∘ [H, ≡0]) else H), then Var(r) = {SUM, G, H}.

The need for the condition "Var(r_i) ⊆ {G_1, ..., G_n}" in Table 1 should now be clear since we cannot "call" a specification unless we can substitute for each function variable.
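Definition 7 and the side condition of Table 1 are both structural recursions over the derivation tree of an RE, and a small sketch makes that explicit. The encoding below is hypothetical and chosen only for illustration: an RE is a nested tuple whose first entry is one of the tags `"fn"`, `"var"`, `"comp"`, `"constr"`, `"if"`, `"alpha"`, or `"slash"`; none of these tags or function names come from the text.

```python
def var(r):
    """Var(r): the set of function variables occurring in the RE r."""
    tag = r[0]
    if tag == "fn":                 # a function contributes no variables
        return frozenset()
    if tag == "var":                # a function variable G contributes {G}
        return frozenset({r[1]})
    # Composition, construction, conditional, and the two unary constructors:
    # Var is the union of Var over the immediate subexpressions.
    return frozenset().union(*(var(s) for s in r[1:]))


def is_valid_spec(spec):
    """Check the Table 1 conditions on a list of (G_i, r_i) pairs."""
    names = [g for g, _ in spec]
    if not names or len(set(names)) != len(names):
        return False                # need n > 0 and distinct function variables
    return all(var(r) <= set(names) for _, r in spec)


# The expression (if (SUM o [G, +]) then (G o [H, const 0]) else H) of Definition 7
r = ("if",
     ("comp", ("var", "SUM"), ("constr", ("var", "G"), ("fn", "+"))),
     ("comp", ("var", "G"), ("constr", ("var", "H"), ("fn", "const 0"))),
     ("var", "H"))
free = var(r)
```

With this encoding `free` is the set containing `SUM`, `G`, and `H`, and `is_valid_spec([("SUM", r)])` is false because `G` and `H` have no defining clause.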
For each finite nonempty list $G_1, \dots, G_n$ of function variables, define

$$\mathrm{RE}(G_1, \dots, G_n) = \{r : r \text{ is an RE, } \mathrm{Var}(r) \subseteq \{G_1, \dots, G_n\}\}.$$

Note that $\mathrm{RE}(G_1, \dots, G_n)$ has an inductive definition, namely, that of Table 1 save that in the basis step we may take only one of $G_1, \dots, G_n$ as a function variable. Also, recall from Section 1.3 that the derivation tree of each RE is unique because of our liberal use of parentheses. This enables us to define properties of elements of $\mathrm{RE}(G_1, \dots, G_n)$ using induction.

8 Fixed Notation. For the balance of this section $D = \mathrm{Pfn}(\mathrm{DTN}, \mathrm{DTN})$. Thus, $D$ is a domain. For $n > 0$, $D^n = D \times \cdots \times D$ ($n$ times) is a domain as in 6.1.17.

9 Definition. Let $G_1, \dots, G_n$ ($n > 0$) be a list of function variables. Intuitively, for $r \in \mathrm{RE}(G_1, \dots, G_n)$, if we substitute a function $h_i \in D$ for $G_i$ the result is a function in $D$ (e.g., in 6, $r_2 \in D$ if $G_1 \in D$). We now define this rigorously. Given $r \in \mathrm{RE}(G_1, \dots, G_n)$, its evaluation $\hat{r}: D^n \to D$ is defined inductively as follows.

10   If $r = f$ is a function, $\hat{f}(h_1, \dots, h_n) : t = f : t$ for all $t \in \mathrm{DTN}$.

11   If $r = G_i$, $\hat{G}_i = \mathrm{pr}_i$, that is, $\hat{G}_i(h_1, \dots, h_n) = h_i$.

12   If $k \ge 1$, $r_1, \dots, r_k \in \mathrm{RE}(G_1, \dots, G_n)$ then

$$(r_k \circ \cdots \circ r_1)^\wedge = D^n \xrightarrow{[[\hat{r}_1, \dots, \hat{r}_k]]} D^k \xrightarrow{\mathrm{comp}_k} D,$$

where we use the double-bracket notation

13   $$[[f_1, \dots, f_k]] : X \to Y_1 \times \cdots \times Y_k$$

for the total function induced by the product property, since the notation $[f_1, \dots, f_k]$ of 2.3.11 has a different meaning in FPF, and

14   $$\mathrm{comp}_k(h_1, \dots, h_k) = h_k \circ \cdots \circ h_1$$

is the $k$-fold composition $\mathrm{DTN} \xrightarrow{h_1} \mathrm{DTN} \xrightarrow{h_2} \cdots \xrightarrow{h_k} \mathrm{DTN}$ of 1.3.12.

15   If $k \ge 1$, $r_1, \dots, r_k \in \mathrm{RE}(G_1, \dots, G_n)$ then
$$[r_1, \dots, r_k]^\wedge = D^n \xrightarrow{[[\hat{r}_1, \dots, \hat{r}_k]]} D^k \xrightarrow{\mathrm{constr}_k} D,$$

where

16   $$\mathrm{constr}_k(h_1, \dots, h_k) = [h_1, \dots, h_k]$$

as in 1.3.13.

17   If $r_1, r_2, r_3 \in \mathrm{RE}(G_1, \dots, G_n)$ then

$$(\text{if } r_1 \text{ then } r_2 \text{ else } r_3)^\wedge = D^n \xrightarrow{[[\hat{r}_1, \hat{r}_2, \hat{r}_3]]} D^3 \xrightarrow{\text{if-then-else}} D$$

with

18   $$\text{if-then-else}(h_1, h_2, h_3) = \text{if } h_1 \text{ then } h_2 \text{ else } h_3$$

as in 1.3.14.

19   If $r \in \mathrm{RE}(G_1, \dots, G_n)$ then $(\alpha r)^\wedge = D^n \xrightarrow{\hat{r}} D \xrightarrow{\alpha} D$ with $\alpha$ as in 1.3.15.

20   If $r \in \mathrm{RE}(G_1, \dots, G_n)$ then $(/r)^\wedge = D^n \xrightarrow{\hat{r}} D \xrightarrow{/} D$ with $/$ as in 1.3.16.

This completes Definition 9. We then have the following:

21 Definition. Let $p = (G_1 \Leftarrow r_1, \dots, G_n \Leftarrow r_n)$ be a recursive specification. Then the evaluation $\psi_p$ of $p$ is the function

$$\psi_p = D^n \xrightarrow{[[\hat{r}_1, \dots, \hat{r}_n]]} D^n.$$

22 Theorem. For any recursive specification $p$, $\psi_p: D^n \to D^n$ is continuous.

While a proof of Theorem 22 is fundamentally straightforward, there are many things to prove owing to the length of the inductive definition 9-21. We have preferred to relegate the details to Exercises 4-11. In Chapter 8 we will similarly establish (but again leaving the details as an exercise) that $\psi_p$ in 21 is a "power-series map." Since we shall prove that every power-series map is continuous, this would provide an alternative proof of 22.

We can now give the desired definition of the semantics of recursion:

23   Let $p = (G_1 \Leftarrow r_1, \dots, G_n \Leftarrow r_n)$ be a recursive specification. By Theorems 22 and 6.2.13, $\psi_p: D^n \to D^n$ has a least fixed point $(f_1, \dots, f_n)$. We define

$$p : t = f_1 : t,$$

that is, the semantics $p:$ of $p$ is the first component of the least fixed point of $\psi_p$.

Our definition ignores the fact that sometimes in a simultaneous recursion defining $n$ functions we want all $n$ functions, not just the first. We do this so that the semantics of any FPF program is a function $\mathrm{DTN} \to \mathrm{DTN}$,
which makes for neat mathematical bookkeeping. It is easy to prove that if p = (G_1 ⇐ r_1, ..., G_n ⇐ r_n) and ψ_p has fixed point (h_1, ..., h_n) then if, say, p' is (G_3 ⇐ r_3, G_2 ⇐ r_2, G_1 ⇐ r_1, G_4 ⇐ r_4, ..., G_n ⇐ r_n), then the corresponding fixed point of ψ_{p'} will be (h_3, h_2, h_1, h_4, ..., h_n), so that h_3 = p': is obtainable, if somewhat tediously, by Definition 23.

Tables 1.3.1 and 1 were of temporary status as needed to give a clear discussion without introducing too many new ideas at once. Table 24 gives the syntax of FPF in its final form and requires a simultaneous recursion to define functions and REs.

Table 24 Complete Syntax of FPF with Recursion

Alphabet of Symbols
   Digits: 0 1 ... 9
   Letters: A B ... Z
   Parentheses: ⟨ ⟩
   Atomic Functions: id head tail + − * ÷ = num
   Function Constructors: ≡ ∘ if then else [ ] α /
   Recursive Definition Symbol: ⇐

DTNs are defined by: A numeral (= nonempty string of digits) is a DTN, whereas ⟨t_1, ..., t_k⟩ is a DTN if t_1, ..., t_k (k ≥ 0) are.

The sets of functions and of recursive expressions (= REs) are simultaneously defined (together with Var(r) for each RE r) as follows:
   (i) An atomic function is a function.
   (ii) For each DTN t, ≡t is a function.
   (iii) Each function f is an RE and Var(f) = ∅.
   (iv) Each function variable G (= nonempty string of letters) is an RE and Var(G) = {G}.
   (v) If f_1, ..., f_k (k ≥ 1), p, f, g are functions so are (f_k ∘ ··· ∘ f_1), [f_1, ..., f_k], (if p then f else g), (αf), and (/f).
   (vi) If r_1, ..., r_k (k ≥ 1), Λ, Φ, Ψ are REs so are (r_k ∘ ··· ∘ r_1), [r_1, ..., r_k], (if Λ then Φ else Ψ), (αΦ), and (/Φ). Moreover, Var(r_k ∘ ··· ∘ r_1) = Var([r_1, ..., r_k]) = Var(r_1) ∪ ··· ∪ Var(r_k), Var(if Λ then Φ else Ψ) = Var(Λ) ∪ Var(Φ) ∪ Var(Ψ), and Var(αΦ) = Var(/Φ) = Var(Φ).
   (vii) If G_1, ..., G_n are distinct function variables and r_1, ..., r_n are REs with Var(r_i) ⊆ {G_1, ..., G_n} for all i then (G_1 ⇐ r_1, ..., G_n ⇐ r_n) is a function.
Table 24 is reasonably concise given that it is a complete syntax for a fairly powerful programming language. The semantics is exactly as discussed earlier and 23 provides the semantics of 24 (vii). The point of the simultaneous definition of functions and REs is that a function defined by recursion can be used to build new functions and REs as we would expect. We round out this section with some examples of FPF functions. We begin with a basic iterative construct.
25 Definition. If $p$, $f$ are functions,

while $p$ do $f$ $=_{abb}$ $G \Leftarrow$ if $p$ then $(G \circ f)$ else id

By 10, 11, and 17 the evaluation $\psi: D \to D$ of 21 is

$$\psi(h) = \text{if } p \text{ then } hf \text{ else id}$$

with Kleene sequence

$$\psi(\bot) = \text{if } p \text{ then } \bot \text{ else id}, \qquad t \mapsto \begin{cases} t & \text{if } p(t) = F \\ \text{undefined} & \text{else,} \end{cases}$$

$$\psi^2(\bot) = \text{if } p \text{ then } \psi(\bot)f \text{ else id}, \qquad t \mapsto \begin{cases} t & \text{if } p(t) = F \\ f(t) & \text{if } p(t) = T,\ p(f(t)) = F \\ \text{undefined} & \text{else.} \end{cases}$$

We leave it as an exercise to show $\bigvee \psi^n(\bot)$ has the expected interpretation.

In our remaining examples we will relax the formal syntax to conform to more usual programming style, using vertical structure, indenting, and fewer parentheses to enhance readability. The recursive specification $(G_1 \Leftarrow r_1, \ldots, G_n \Leftarrow r_n)$ will be written vertically

  $G_1 \Leftarrow r_1$;
  $\ldots$
  $G_n \Leftarrow r_n$.

Note that this includes abbreviations as a special case, since $r_i$ will be substituted for $G_i$. Indeed, $r_i$ is not required to have any function variables, and $G_i \Leftarrow r_i$ is a "pure abbreviation" in this case. This recaptures some of the advantages of identifiers but, even so, no values are actually "stored" so there is no danger of "side effects." We illustrate with a square root function. When a function is to be used later, we violate syntax by giving a symbolic name that is not a string of capital letters.

26 Example. The square root function, $\sqrt{n}$ = largest $m$ with $m^2 \le n$, may be coded in FPF as follows:

  $\surd \Leftarrow$ if $\neg$num then $\bot$ else if $eq_0 \vee eq_1$ then id else pred $\circ\ pr_2 \circ G \circ [\text{id}, \equiv 1, \equiv 1]$;

  $G \Leftarrow$ while $\le \circ\ [pr_3, pr_1]$ do
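27  $\bot \Leftarrow \div \circ [\equiv 1, \equiv 0]$;

28  succ $\Leftarrow$ if $\neg$num then $\bot$ else $+ \circ [\text{id}, \equiv 1]$;

29  pred $\Leftarrow$ if $\neg$num then $\bot$ else if $eq_0$ then $\equiv 0$ else $- \circ [\text{id}, \equiv 1]$.

Here there are five function variables, only $G$ used recursively. Clearly, $\bot$ is the everywhere-undefined function, succ$(n) = n + 1$, and pred$(n)$ = if $n = 0$ then 0 else $n - 1$. For a sample computation, $\sqrt{28} = \text{pred} \circ pr_2 \circ G\langle 28, 1, 1\rangle$. But $G\langle 28, 1, 1\rangle = \langle 28, 6, 36\rangle$ via

$$\langle 28, 1, 1\rangle \mapsto \langle 28, 2, 4\rangle \mapsto \langle 28, 3, 9\rangle \mapsto \langle 28, 4, 16\rangle \mapsto \langle 28, 5, 25\rangle \mapsto \langle 28, 6, 36\rangle$$

so $\sqrt{28}$ = pred 6 = 5.

30 Example. The function size $\langle t_1, \ldots, t_k\rangle = k$ is defined by

  size $\Leftarrow$ if num then $\bot$ else if $eq\langle\,\rangle$ then $\equiv 0$ else succ $\circ$ size $\circ$ tail.

A sample computation is

$$\text{size}\langle 3, \langle\langle\,\rangle, 4\rangle\rangle = 1 + \text{size}\langle\langle\langle\,\rangle, 4\rangle\rangle = 2 + \text{size}\langle\,\rangle = 2.$$

31 Example. The reverse function is defined by

  rev $\Leftarrow$ if num $\vee\ eq\langle\,\rangle$ then id else join $\circ$ [rev $\circ$ tail, head].

Thus,

rev$\langle\langle 5, \langle\,\rangle\rangle, \langle 7, 2\rangle, 4\rangle$
  = join$\langle$rev$\langle\langle 7, 2\rangle, 4\rangle$, $\langle 5, \langle\,\rangle\rangle\rangle$
  = join$\langle$join$\langle$rev$\langle 4\rangle$, $\langle 7, 2\rangle\rangle$, $\langle 5, \langle\,\rangle\rangle\rangle$
  = join$\langle$join$\langle$join$\langle$rev$\langle\,\rangle$, $4\rangle$, $\langle 7, 2\rangle\rangle$, $\langle 5, \langle\,\rangle\rangle\rangle$
  = join$\langle$join$\langle$join$\langle\langle\,\rangle, 4\rangle$, $\langle 7, 2\rangle\rangle$, $\langle 5, \langle\,\rangle\rangle\rangle$
  = join$\langle$join$\langle\langle 4\rangle, \langle 7, 2\rangle\rangle$, $\langle 5, \langle\,\rangle\rangle\rangle$
  = join$\langle\langle 4, \langle 7, 2\rangle\rangle$, $\langle 5, \langle\,\rangle\rangle\rangle$
  = $\langle 4, \langle 7, 2\rangle, \langle 5, \langle\,\rangle\rangle\rangle$.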
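The sample computations for size, rev and the square root can be replayed with a minimal Python sketch, given here only as an illustration and not as FPF itself. It makes several assumptions: DTNs are modelled as Python integers and nested lists, an undefined result is modelled by raising an exception (or by a loop that does not terminate), join is taken to append its second argument to the list in its first (which is how the rev computation above uses it), and the loop body `step` is a hypothetical stand-in inferred from the trace from [28, 1, 1] to [28, 6, 36].

```python
def size(t):
    """size: undefined on numerals, 0 on the empty list, else 1 + size(tail)."""
    if isinstance(t, int):
        raise ValueError("size is undefined on numerals")
    if t == []:
        return 0
    return 1 + size(t[1:])


def join(pair):
    """Assumed meaning of join: [[x_1, ..., x_k], y] becomes [x_1, ..., x_k, y]."""
    xs, y = pair
    return xs + [y]


def rev(t):
    """rev: identity on numerals and the empty list, else join [rev tail, head]."""
    if isinstance(t, int) or t == []:
        return t
    return join([rev(t[1:]), t[0]])


def while_do(p, f):
    """while p do f, i.e. the recursion G(t) = G(f(t)) if p(t) else t."""
    def g(t):
        while p(t):          # never returning models an undefined result
            t = f(t)
        return t
    return g


def step(t):
    """Hypothetical loop body: [n, m, m*m] becomes [n, m + 1, (m + 1)**2]."""
    n, m, _ = t
    return [n, m + 1, (m + 1) ** 2]


# Loop test: third component (m squared) does not exceed the first (n).
G = while_do(lambda t: t[0] >= t[2], step)


def pred(n):
    return 0 if n == 0 else n - 1


def sqrt_fpf(n):
    """Largest m with m*m not exceeding n, for a numeral n."""
    if n in (0, 1):
        return n
    return pred(G([n, 1, 1])[1])   # index 1 plays the role of pr_2


print(size([3, [[], 4]]))            # 2
print(rev([[5, []], [7, 2], 4]))     # [4, [7, 2], [5, []]]
print(G([28, 1, 1]), sqrt_fpf(28))   # [28, 6, 36] 5
```

Note that rev reverses only the top level of the list, exactly as in Example 31: the sublist [7, 2] is moved but not itself reversed.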
32 Example. last: $\langle t_1, \ldots, t_k\rangle = t_k$ is defined by

  last $=_{abb}$ head $\circ$ rev.

EXERCISES FOR SECTION 6.3

1. The function sum of 2 is not the same as $/+$. Explain why.

2. Modify 4 so that the semantics multiplies all numerals on a tree, ignoring empty subtrees.

3. Use Definition 9 to compute the evaluation of $r \in RE(G_1, G_2)$ if

  $r = ((\text{if } p \text{ then } (f \circ [(\alpha G_1), \text{id}]) \text{ else tail}) \circ G_2 \circ [G_1, /G_2])$.

Exercises 4-11 lead to a proof of Theorem 22.

4. If $D$, $E$, $F$ are domains and $f: D \to E$, $g: E \to F$ are continuous, prove that $gf: D \to F$ is continuous.

5. If $D$, $E$ are domains and if $f: D \to E$ is constant, $f(d) = e_0$ for all $d$, show that $f$ is continuous.

6. If $D$, $E$ are domains and if $d_1 \le d_2 \le d_3 \le \cdots$ in $D$, $e_1 \le e_2 \le e_3 \le \cdots$ in $E$, show that $\bigvee\{(d_n, e_m) \mid n, m \in N\}$ exists in $D \times E$ and coincides with $(\bigvee d_n, \bigvee e_m)$.

7. Let $D_1, \ldots, D_n$ ($n \ge 2$) be domains, let $F$ be a domain, and let $f: D_1 \times \cdots \times D_n \to F$ be a total function. Say that $f$ is separately continuous if for each $i \in \{1, \ldots, n\}$ and for each fixed choice of $d_j \in D_j$ ($j \ne i$) the function $g: D_i \to F$, $g(x) = f(d_1, \ldots, d_{i-1}, x, d_{i+1}, \ldots, d_n)$ is continuous. Show that $f$ is continuous if and only if $f$ is separately continuous. [Hint: That continuous implies separately continuous is easy. For the converse, use Exercise 6 for $n = 2$ and then use induction capitalizing on the poset isomorphism $D_1 \times \cdots \times D_{n+1} \to (D_1 \times \cdots \times D_n) \times D_{n+1}$.]

8. Let $D_1, \ldots, D_n$, $E$ be domains. Show that $pr_i: D_1 \times \cdots \times D_n \to D_i$ is continuous and show that if $f_i: E \to D_i$ are continuous then $[[f_1, \ldots, f_n]]: E \to D_1 \times \cdots \times D_n$ is continuous. (Compare Exercise 6.2.5.)

9. Show that composition, $(f_1, \ldots, f_k) \mapsto f_k \circ \cdots \circ f_1$, is a continuous function $Pfn(X_0, X_1) \times \cdots \times Pfn(X_{k-1}, X_k) \to Pfn(X_0, X_k)$. [Hint: By Exercise 7, this reduces to showing $g \mapsto hgf$ is continuous.]

10. Write a recursive definition of $/f$ in FPF (given fixed FPF function $f$) without using the symbol $/$.

11.
Complete the proof of Theorem 22 by verifying that the functions of 16, 18, 19, and 20 are continuous. (For a shortcut on 20 use Exercise 10.)

12. Show that the semantics of $F \Leftarrow$ if num $\vee\ eq_0$ then id else $\alpha F$ is the identity function.

13. Write an FPF function list such that list: $t = \langle n_1, \ldots, n_k\rangle$, where $n_1, \ldots, n_k$ are the numerals occurring in $t$ in left-to-right order. Thus,
list: $\langle\langle\,\rangle\rangle$ = list: $\langle\,\rangle$ = $\langle\,\rangle$,

list: $\langle\langle 1, \langle\langle 2\rangle, \langle\,\rangle\rangle, \langle 3, 4, 5\rangle\rangle\rangle = \langle 1, 2, 3, 4, 5\rangle$.

14. Write an FPF function iota with iota: $n = \langle 1, 2, \ldots, n\rangle$.

6.4 Fixed Points and Formal Languages

In this section we briefly recall the formal definition of a context-free grammar $G$ on the alphabet (set of terminal symbols) $X$, and the usual definition of the language $L(G) \subset X^*$ that $G$ generates. ($X^*$ is the set of finite strings over $X$, including the empty string $\Lambda$.) We give a fixed point definition of $L(G)$ by showing that $G$ induces a continuous function

$$\psi_G: (2^{X^*})^n \to (2^{X^*})^n$$

which maps $n$-tuples of languages (subsets of $X^*$) to $n$-tuples of languages, where $n$ is the number of nonterminal symbols in $G$, and that if $(L_1, \ldots, L_n)$ is the least fixed point of $\psi_G$, then $L_1 = L(G)$.

1 Definition. A context-free grammar $G$ over the alphabet $X$ with set $V = \{v_1, v_2, \ldots, v_n\}$ (with no $v_i \in X$) of nonterminals and specified start symbol $v_1$ is characterized by a set $P$ of productions

$$P \subset V \times (V \cup X)^*.$$

We write $G = (X, V, v_1, P)$, and rewrite $(v, w)$ in $P$ in the synonymous form $v \to w$.

The use of the production $v \to w$ is to allow modification of a word $u$ by replacing any occurrence of the letter $v$ in $u$ with the word $w$. We now characterize the language generated by the grammar as the set $L(G)$ of terminal strings (members of $X^*$) which can be derived from the start symbol by a finite number of applications of the productions of $P$.

2 Definition. We write $w_1 \Rightarrow w_2$ and say $w_1$ directly derives $w_2$ (with respect to $G$), if there exists a production $v \to w$ of $P$, and strings $w'$, $w''$ in $(V \cup X)^*$ such that $w_1 = w'vw''$ and $w_2 = w'ww''$. We write $w_1 \Rightarrow^* w_n$ and say $w_1$ derives $w_n$, if $w_1 = w_n$, or there exist $w_2, \ldots, w_{n-1}$ such that $w_1 \Rightarrow w_2, \ldots, w_i \Rightarrow w_{i+1}, \ldots, w_{n-1} \Rightarrow w_n$. We then define the language generated by $G$ to be the set

$$L(G) = \{w \mid w \in X^* \text{ and } v_1 \Rightarrow^* w\}$$

of all terminal strings derivable from the start symbol.

3 Example.
Let $X = \{a, b\}$, set $V = \{v_1, v_2, v_3\}$, and let $P$ comprise the productions $v_1 \to v_2$, $v_1 \to v_3$, $v_2 \to a v_2 b$, $v_2 \to ab$, $v_3 \to b v_3 a$, $v_3 \to ba$. Two
typical derivations of terminal strings from $v_1$ are:

$$v_1 \Rightarrow v_2 \Rightarrow a v_2 b \Rightarrow^* aaa v_2 bbb \Rightarrow aaaabbbb = a^4 b^4,$$

$$v_1 \Rightarrow v_3 \Rightarrow b v_3 a \Rightarrow bb v_3 aa \Rightarrow bbbaaa = b^3 a^3.$$

For this simple example it is clear by inspection that

$$L(G) = \{a^n b^n \mid n \ge 1\} \cup \{b^n a^n \mid n \ge 1\}.$$

To prepare the way for the general fixed point theory, we now show how this $L(G)$ can be obtained from the least fixed point of a suitable operator

$$\psi: (2^{X^*})^3 \to (2^{X^*})^3.$$

We start by rewriting the productions for $G$ by "adding" all the productions with the same left-hand side:

4  $v_1 \to v_2 + v_3$,  $v_2 \to a v_2 b + ab$,  $v_3 \to b v_3 a + ba$.

We now replace the nonterminal symbols $v_1$, $v_2$, and $v_3$ by variables $V_1$, $V_2$, and $V_3$ which take values in the domain $(2^{X^*}, \subset)$ of languages over $X$, and regard 4 as defining the sought-for function $\psi: (2^{X^*})^3 \to (2^{X^*})^3$:

5  $\psi(V_1, V_2, V_3) = (V_2 + V_3,\ aV_2 b + ab,\ bV_3 a + ba)$

where the previously formal $+$ now denotes union, and $bV_3 a + ba$ is shorthand for $\{bwa \mid w \in V_3\} \cup \{ba\}$, and so on.

Let us now apply Theorem 6.2.13 to determine the least fixpoint of the $\psi$ of 5. (That $\psi$ is continuous will be proved below.) We compute the Kleene sequence of $\psi$ as follows:

$$\psi^0(\bot) = (\emptyset, \emptyset, \emptyset),$$
$$\psi^1(\bot) = \psi(\emptyset, \emptyset, \emptyset) = (\emptyset, ab, ba),$$
$$\psi^2(\bot) = \psi(\emptyset, ab, ba) = (\{ab, ba\}, \{a^2b^2, ab\}, \{b^2a^2, ba\}).$$

It is then easy to prove, by induction on $m$, that

6  $\psi^m(\bot) = (\{a^j b^j \mid 1 \le j < m\} \cup \{b^j a^j \mid 1 \le j < m\},\ \{a^j b^j \mid 1 \le j \le m\},\ \{b^j a^j \mid 1 \le j \le m\})$.

Thus, the sequence $\psi^m(\bot)$ is indeed an ascending chain, and

7  $\bigvee_{m \ge 0} \psi^m(\bot) = (L(G), \{a^j b^j \mid j \ge 1\}, \{b^j a^j \mid j \ge 1\})$.

We see that $L(G)$ is indeed the first component of the least fixed point of $\psi$. More generally, we see that for $k = 1, 2, 3$,

8  the $k$th component of $\bigvee_{m \ge 0} \psi^m(\bot)$ is $\{w \mid w \in X^* \text{ and } v_k \Rightarrow^* w\}$.
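The Kleene sequence for this three-nonterminal example is small enough to compute mechanically. The following Python sketch is only an illustration: it assumes the operator sends a triple of languages $(V_1, V_2, V_3)$ to $(V_2 \cup V_3,\ \{awb \mid w \in V_2\} \cup \{ab\},\ \{bwa \mid w \in V_3\} \cup \{ba\})$, models languages as finite Python sets of strings, and uses the hypothetical names `psi`, `kleene` and `formula_6`. It checks the closed form 6 for the first few values of $m$.

```python
def psi(langs):
    """One application of the operator induced by the productions
    v1 -> v2 | v3,  v2 -> a v2 b | ab,  v3 -> b v3 a | ba."""
    v1, v2, v3 = langs
    return (
        v2 | v3,
        {"a" + w + "b" for w in v2} | {"ab"},
        {"b" + w + "a" for w in v3} | {"ba"},
    )


def kleene(m):
    """Return the m-th Kleene approximant, starting from three empty languages."""
    langs = (set(), set(), set())
    for _ in range(m):
        langs = psi(langs)
    return langs


def formula_6(m):
    """The closed form claimed in 6 for the m-th approximant."""
    def ab(j):
        return "a" * j + "b" * j

    def ba(j):
        return "b" * j + "a" * j

    return (
        {ab(j) for j in range(1, m)} | {ba(j) for j in range(1, m)},
        {ab(j) for j in range(1, m + 1)},
        {ba(j) for j in range(1, m + 1)},
    )


for m in range(6):
    assert kleene(m) == formula_6(m)

print(sorted(kleene(2)[0]))   # ['ab', 'ba']
```

Every approximant is a finite set here, so no truncation is needed; the least fixed point itself is the infinite union of these approximants and is not computed by the sketch.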
But if we look at 6, we can read off even more information than given by 8. Let $\Rightarrow^j$ be the $j$th power of the relation $\Rightarrow$: $w \Rightarrow^j w'$ ($w$ derives $w'$ in $j$ steps) just in case there exists a sequence $w_1, w_2, \ldots, w_j$ such that

$$w \Rightarrow w_1 \Rightarrow w_2 \Rightarrow \cdots \Rightarrow w_{j-1} \Rightarrow w_j = w'$$

(while $w \Rightarrow^0 w'$ just means that $w = w'$). Then the reader should be able to show that 6 yields

9  the $k$th component of $\psi^m(\bot)$ = $\{w \mid w \in X^* \text{ and } v_k \Rightarrow^j w \text{ for some } j \le m\}$.

However, the form of 9 is misleading, as our next example shows:

10 Example. Consider the grammar with one variable and with productions summarized in the form

$$v_1 \to a + a v_1 a + v_1 bb v_1.$$

Then, analogously to 5, this induces $\psi: 2^{X^*} \to 2^{X^*}$ with

$$\psi(V_1) = a + aV_1 a + V_1 bb V_1.$$

Then

$$\psi^0(\bot) = \emptyset, \qquad \psi^1(\bot) = \{a\}, \qquad \psi^2(\bot) = \{aaa, abba\} \cup \{a\}.$$

But $\psi^2(\bot)$ does not satisfy 9 since the shortest derivation of $abba$ is

$$v_1 \Rightarrow v_1 bb v_1 \Rightarrow a bb v_1 \Rightarrow abba$$

which takes 3 (rather than $\le 2$) steps. However, if we look at the derivation tree for $abba$, we see that it is of height 2. In other terms, if we allow parallel replacement (all variables in a string may be replaced in a single step), then we can indeed derive $abba$ in 2 steps:

$$v_1 \Rightarrow_p v_1 bb v_1 \Rightarrow_p abba.$$

This is reminiscent of the "all-call" semantics of Section 5.1 in that all variables are replaced on each "cycle," but is nondeterministic in that any one of a whole set of productions may be chosen in replacing each occurrence of a variable. We now give the general definition of $\Rightarrow_p$:
11 Definition. For any grammar $G$ and strings $w$, $w'$ in $(V \cup X)^*$, we say $w$ parallel-derives $w'$ and write $w \Rightarrow_p w'$ just in case either

(a) we can write $w = w_{10} v_{11} w_{11} v_{12} \cdots v_{1k} w_{1k}$ for some $k \ge 1$ with each $w_{1j}$ in $X^*$ and each $v_{1j}$ in $V$, and there exist productions $v_{1j} \to w_{2j}$ in $P$ such that $w' = w_{10} w_{21} w_{11} w_{22} \cdots w_{2k} w_{1k}$; or

(b) $w$ is terminal, and $w = w'$.

While 9 does not hold for our present example, it can be shown that $\psi$ does satisfy

12  the $k$th component of $\psi^m(\bot)$ = $\{w \mid w \in X^* \text{ and } v_k \Rightarrow_p^m w\}$.

But with 12 at our disposal, we have no trouble in providing the general theory.

13 Theorem. Let $G = (X, V, v_1, P)$ be a context-free grammar, with $V = \{v_1, \ldots, v_n\}$. Let

$$\psi_G: (2^{X^*})^n \to (2^{X^*})^n$$

be the function obtained from $P$ in the manner exemplified by 4 and 5. Then $\psi_G$ is continuous, and $L(G)$ equals the first component of the least fixed point of $\psi_G$.

PROOF. (i) A typical component of $\psi_G(V_1, \ldots, V_n)$ will look like

14  $w_{10} V_{11} w_{11} \cdots V_{1r} w_{1r} + \cdots$

where each $w$ is from $X^*$ and each $V$ is from $\{V_1, V_2, \ldots, V_n\}$. But it is clear that such functions are continuous (Exercise 1), and so $\psi_G$ is continuous.

(ii) We need to verify the assertion that

15  the $k$th component of $\psi_G^m(\bot)$ = $\{w \mid w \in X^* \text{ and } v_k \Rightarrow_p^m w\}$.

Denote the right-hand side of 15 by $L(v_k, m)$ for convenience. Since $v_k \Rightarrow_p^0 w$ iff $w = v_k$, it is clear that

$$L(v_k, 0) = \emptyset \quad \text{for } 1 \le k \le n$$

and so 15 holds for the basis step, $m = 0$. For the induction step, suppose that

$$\psi_G^M(\bot) = (L(v_1, M), \ldots, L(v_n, M))$$

for some $M$. We must now prove that 15 holds for $m = M + 1$. Suppose, then, that 14 defines the $k$th component of $\psi_G(V_1, \ldots, V_n)$. Then if $r > 0$, all the terms of
$$w_{10} L(v_{11}, M) w_{11} \cdots L(v_{1r}, M) w_{1r}$$

are in $L(v_k, M + 1)$ by clause (a) of the definition of $\Rightarrow_p$, while if $r = 0$, the terms are in $L(v_k, M + 1)$ by clause (b); similarly, for the other terms of 14. Conversely, a string $w$ belongs to $L(v_k, M + 1)$ just in case there is a production like

$$v_k \to w_{10} v_{11} \cdots v_{1r} w_{1r}$$

and strings $w'_{1u}$ in $L(v_{1u}, M)$, for $1 \le u \le r$, such that

$$w = w_{10} w'_{11} \cdots w'_{1r} w_{1r}.$$

Putting all this together, we conclude that

$$\psi_G^{M+1}(\bot) = (L(v_1, M + 1), \ldots, L(v_n, M + 1))$$

and thus, by induction, that 15 holds for all $M$.

(iii) Combining (i) and (ii), we have that the least fixed point of $\psi_G$ satisfies

$$\bigvee_{m \ge 0} \psi_G^m(\bot) = (\{w \mid w \in X^* \text{ and } v_k \Rightarrow_p^* w\})_{1 \le k \le n}.$$

In particular, the first component of the least fixed point equals $L(G)$. □

EXERCISES FOR SECTION 6.4

1. We consider maps $(2^{X^*})^n \to 2^{X^*}$. Prove that $\psi_1 + \psi_2$ is continuous if $\psi_1$, $\psi_2$ are, where

$$(\psi_1 + \psi_2)(V_1, \ldots, V_n) = \psi_1(V_1, \ldots, V_n) \cup \psi_2(V_1, \ldots, V_n).$$

Using Exercises 6.3.4-8, verify the assertion in part (i) of the proof of Theorem 13.

2. Let $A, B \in 2^{X^*}$ and define $\psi: 2^{X^*} \to 2^{X^*}$ by $\psi(Y) = AY + B$, where $AY = \{wv \mid w \in A, v \in Y\}$.
(i) Show that $A^*B$ (where $A^* = \{\Lambda\} \cup A \cup A^2 \cup \cdots$) is the least fixed point of $\psi$.
(ii) Show that if $\Lambda \notin A$, $A^*B$ is the only fixed point of $\psi$. [Hint: If $Y \not\subset A^*B$ there exists $y \in Y$ of shortest length with $y \notin A^*B$; show this is impossible if $\Lambda \notin A$.]

The following definitions are needed in Exercises 3-7. An $m \times n$ matrix of languages is

$$[A_{ij}] = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix}$$

with each $A_{ij} \in 2^{X^*}$. Mimicking the formula for matrix multiplication in linear algebra define

$$Y + Z = Y \cup Z, \qquad YZ = \{vw : v \in Y, w \in Z\}$$

for $Y, Z \in 2^{X^*}$ and then define matrix multiplication $[A_{ij}][B_{jk}] = [C_{ik}]$ for $m \times n$ $[A_{ij}]$ and $n \times p$ $[B_{jk}]$, yielding $m \times p$ $[C_{ik}]$ by
$$C_{ik} = A_{i1}B_{1k} + \cdots + A_{in}B_{nk}.$$

Define matrix sum for $m \times n$ $[A_{ij}]$, $[B_{ij}]$ by

$$[A_{ij}] + [B_{ij}] = [C_{ij}], \qquad C_{ij} = A_{ij} + B_{ij}.$$

3. Verify
(i) $[A_{ij}]([B_{jk}][C_{kl}]) = ([A_{ij}][B_{jk}])[C_{kl}]$;
(ii) $([A_{ij}] + [B_{ij}])[C_{jk}] = ([A_{ij}][C_{jk}]) + ([B_{ij}][C_{jk}])$.

4. If $A_{ij}, B_i \in 2^{X^*}$ verify that the simultaneous system

$$\begin{aligned} Y_1 &= A_{11}Y_1 + \cdots + A_{1n}Y_n + B_1 \\ &\ \ \vdots \\ Y_n &= A_{n1}Y_1 + \cdots + A_{nn}Y_n + B_n \end{aligned}$$

has matrix form $[Y_j] = [A_{ij}][Y_j] + [B_i]$ and least fixed point $[A_{ij}]^*[B_i]$,

$$[A_{ij}]^* = I + [A_{ij}] + [A_{ij}]^2 + \cdots,$$

where

$$I_{ij} = \begin{cases} \{\Lambda\} & \text{if } i = j \\ \emptyset & \text{if } i \ne j \end{cases}$$

is the identity matrix and $[A_{ij}]^n$ is the matrix product of $[A_{ij}]$ with itself $n$ times.

5. Verify that the $n \times n$ identity matrix $[I_{ij}]$ of Exercise 4 satisfies

  $[I_{ij}][A_{jk}] = [A_{jk}]$ for $n \times m$ $[A_{jk}]$,
  $[B_{ij}][I_{jk}] = [B_{ij}]$ for $p \times n$ $[B_{ij}]$.

6. Verify that the system of Exercise 4 has exactly one fixed point solution if $\Lambda \notin A_{ij}$ for all $i$, $j$. [Hint: Use induction; for $n = 1$ use Exercise 2.]

7. In this exercise we assume the reader knows what is meant by the language $L$ recognized by the finite-state automaton with state graph

[Figure: state graph of the automaton, with edge labels $x$ and $y$.]
Quick review: $L = \{a_1 \cdots a_n \mid$ there exists a path with edge labels $a_1, \ldots, a_n$ from the initial state $q_1$ to a final state (exclusively $q_2$, here)$\}$. Thus, $x^2 y^2 x \in L$ but no word beginning with $y$ is in $L$. Solve for $L$ using Exercises 4 and 6. [Hint: Let $Y_i$ be the language recognized if $q_i$ were initial, so we seek $L = Y_1$. The second equation is $Y_2 = xY_2 + \{x, y\}Y_3 + \{\Lambda\}$. To solve, use substitution and Exercise 2.]

Notes and References for Chapter 6

The systematic use of domains (whose order relation abstracts "approximation") and continuous functions in computation is due to D. S. Scott ["Lattice theory, data types and semantics," in R. Rustin (ed.), Formal Semantics of Programming Languages, Prentice-Hall, 1970]. Theorem 6.2.13 was proved earlier by S. C. Kleene [Introduction to Metamathematics, Van Nostrand, 1952]. Theorem 6.2.8 is due to A. Tarski ["A lattice-theoretical fixpoint theorem and its applications," Pacific Journal of Mathematics, 5, 1955, pp. 285-309] although the special case $(P, \le) = (\mathcal{P}(X), \subset)$ as in 2.1.8 was proved by B. Knaster in 1928. We thank Irene Guessarian for Example 6.2.16. The use of recursive equations to find the language recognized by an automaton as in Exercise 6.4.7 is due to D. N. Arden ["Delayed logic and finite-state machines," in Theory of Computing Machine Design, University of Michigan Press, Ann Arbor, 1960, pp. 1-35].
CHAPTER 7

Canonical Fixed Points

The previous chapter considered a number of situations in which an object of semantic interest arises as the least fixed point of a continuous map

$$\psi: (D, \le) \to (D, \le)$$

of some domain $(D, \le)$. So far, the domain structure is but a technical device to distinguish the least fixpoint from the other fixed points. This suggests the more general question: given $\psi: D \to D$ without assuming $D$ a domain or $\psi$ continuous, what additional requirements on $D$ and $\psi$ give $\psi$ a distinguished fixed point? But in fact this question may be misguided since it examines one $D$ and one $\psi$ in isolation. The remarkable aspect of the least fixed point $\bigvee(\psi^n(\bot))$ is that this formula is the same for all continuous $\psi$!

This brief chapter introduces "canonical fixed points" as a precise method of assigning fixed points to particular classes of $\psi$'s in a uniform way. We establish a criterion for the existence of a unique canonical fixed point. For domains and continuous $\psi$, the least fixed point is the unique canonical fixed point. In Section 8.2 we show that partially additive monoids equipped with power-series maps again have a unique canonical fixed point, namely, the pattern-of-calls expansion of Section 5.2.

We begin by abstracting a "fixed point situation" without involving specific structures such as domains. To this end we will introduce a category of "recursion schemes" according to the following definition.

1 Definition. A category of recursion schemes is a category $\mathcal{A}$ of the following type.

2  Each object of $\mathcal{A}$ has the form $\mathbf{A} = (A, \sigma, \psi)$, where $A$ is a set, $\sigma$ is an additional structure on $A$ (what sort depends on the particular $\mathcal{A}$) and $\psi: A \to A$ is a total function, possibly subject to constraints involving $\sigma$.
3  Each morphism $f: (A, \sigma, \psi) \to (A', \sigma', \psi')$ of $\mathcal{A}$ is a total function $f: A \to A'$ such that $\psi' f = f \psi$:

$$\begin{array}{ccc} A & \xrightarrow{\ f\ } & A' \\ {\scriptstyle\psi}\downarrow & & \downarrow{\scriptstyle\psi'} \\ A & \xrightarrow{\ f\ } & A' \end{array}$$

(although not every such function need be a morphism).

4  Composition in $\mathcal{A}$ is ordinary composition of total functions.

5  The identity morphisms are the identity functions.

The recursive specifications of 6.2.1 are recursion schemes in which $\sigma$ is the order relation:

6 Example. Let an object of $\mathcal{A}$ be $(D, \le, \psi)$, where $(D, \le)$ is a domain and $\psi: (D, \le) \to (D, \le)$ is a recursive specification, that is, $\psi^n(\bot) \le \psi^{n+1}(\bot)$ for all $n$. Let a morphism $f: (D, \le, \psi) \to (D', \le', \psi')$ be a strict map $(D, \le) \to (D', \le')$ (where we say $f$ is strict if $f$ is continuous and $f(\bot) = \bot$) such that $\psi' f = f \psi$. Composition and identities are, of course, defined by 4 and 5 above. Such $\mathcal{A}$ is a category of recursion schemes.

7 Observation. If $\mathcal{A}$ is a category of recursion schemes, any morphism in $\mathcal{A}$ preserves fixed points: If $\psi(a) = a$, then $\psi'(f(a)) = f(\psi(a)) = f(a)$.

The philosophy embodied by much work in formal semantics at the time of this writing is to provide the $\psi$'s with enough structure so that any $\mathbf{A}$ has a distinguished fixed point. Here, in 2, the emphasis is on $\sigma$ and morphisms are ignored. We place this in a different perspective by introducing the idea of canonical fixed point which is explicitly based on the structure of $\mathcal{A}$-morphisms.

8 Definition. A canonical fixed point $\alpha$ for a category of recursion schemes $\mathcal{A}$ is an assignment of a fixed point $\alpha_{\mathbf{A}} = \psi(\alpha_{\mathbf{A}})$ to each object $\mathbf{A}$ in such a way that for every $f: \mathbf{A} \to \mathbf{A}'$ we have

$$f(\alpha_{\mathbf{A}}) = \alpha_{\mathbf{A}'}.$$

The next theorem, the main result of this section, shows that the nature of canonical fixed points in $\mathcal{A}$ is completely understood when $\mathcal{A}$ has an initial object.

9 Canonical Fixed Point Theorem. Let $\mathcal{A}$ be a category of recursion schemes with initial object $\bar{\mathbf{A}}$.
Then there is a bijective correspondence between fixed points of $\bar{\mathbf{A}}$ and canonical fixed points of $\mathcal{A}$. In particular, if $\bar{\mathbf{A}}$ has a unique fixed point then $\mathcal{A}$ has a unique canonical fixed point.
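PROOF. Write $\bar{\mathbf{A}} = (\bar A, \bar\sigma, \bar\psi)$. Let $\bar\psi(\bar a) = \bar a$ be a fixed point of $\bar{\mathbf{A}}$. Define $\alpha_{\mathbf{A}} = \,!(\bar a)$, where $!: \bar{\mathbf{A}} \to \mathbf{A}$ is the unique $\mathcal{A}$-morphism. Because

$$\begin{array}{ccc} \bar A & \xrightarrow{\ !\ } & A \\ {\scriptstyle\bar\psi}\downarrow & & \downarrow{\scriptstyle\psi} \\ \bar A & \xrightarrow{\ !\ } & A \end{array}$$

we have $\psi(\alpha_{\mathbf{A}}) = \psi\,!(\bar a) = \,!\,\bar\psi(\bar a) = \,!(\bar a) = \alpha_{\mathbf{A}}$ so $\alpha_{\mathbf{A}}$ is a fixed point of $\mathbf{A}$. If $f: \mathbf{A} \to \mathbf{A}'$ then, since $\bar{\mathbf{A}}$ is initial, the composite $\bar{\mathbf{A}} \xrightarrow{!} \mathbf{A} \xrightarrow{f} \mathbf{A}'$ is the unique morphism $!: \bar{\mathbf{A}} \to \mathbf{A}'$, so that $f(\alpha_{\mathbf{A}}) = f\,!(\bar a) = \,!(\bar a) = \alpha_{\mathbf{A}'}$ and $\alpha$ is a canonical fixed point. It is obvious that every canonical fixed point arises this way because if $\alpha$ is any canonical fixed point, $\alpha_{\mathbf{A}} = \,!(\bar a)$ if $\bar a = \alpha_{\bar{\mathbf{A}}}$. A special case of this is $\mathbf{A} = \bar{\mathbf{A}}$ showing that $\bar a = \alpha_{\bar{\mathbf{A}}}$ since $!: \bar{\mathbf{A}} \to \bar{\mathbf{A}}$ is the identity map, and this means that different fixed points $\bar a$ of $\bar{\mathbf{A}}$ give rise to distinct canonical fixed points. □

We now apply this result to show that the Kleene semantics of 6.2.3 is canonical.

10 Theorem. For the category of recursive specifications of Example 6, the Kleene semantics $\bigvee_{n \ge 0} \psi^n(\bot)$ is the unique canonical fixed point.

PROOF. Let $(\bar N, \le)$ be the partially ordered set of natural numbers with adjoined greatest element $\infty$, $\bar N = N + \{\infty\}$, and let $s(n) = n + 1$, $s(\infty) = \infty$. Then $s$ has unique fixed point $\infty$. Moreover, $\bar{\mathbf{N}} = (\bar N, \le, s)$ is initial with the unique homomorphism $!: \bar{\mathbf{N}} \to \mathbf{A}$ defined by

$$!(n) = \,!(s^n 0) = \psi^n\,!(0) = \psi^n(\bot)$$

while

$$!(\infty) = \,!\Big(\bigvee n\Big) = \bigvee\,!(n) = \bigvee \psi^n(\bot).$$

Hence, these recursion schemes have a unique canonical fixed point given by

$$\alpha_{\mathbf{A}} = \,!(\infty) = \bigvee_{n \ge 0} \psi^n(\bot). \qquad \Box$$

EXERCISES FOR CHAPTER 7

1. Let $C$ be a partially additive category. In 3.2.24 we showed that if $f: X \to X + Y$ then for the unique $f_1: X \to X$, $f_2: X \to Y$ with $f = in_1 f_1 + in_2 f_2$, the iterate $f^\dagger: X \to Y$ of $f$ is given by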
$$f^\dagger = \sum_{n=0}^{\infty} f_2 f_1^n.$$

In this exercise we show that $f^\dagger$ arises as a canonical fixed point for the following category $\mathcal{A}$ of recursion schemes.

Objects: $(C(X, Y), f, \psi_f)$ where $f: X \to X + Y$, $\psi_f(g) = g f_1 + f_2$.

Morphisms: $\Gamma: (C(X, Y), f, \psi_f) \to (C(\bar X, \bar Y), \bar f, \psi_{\bar f})$ is a function $\Gamma: C(X, Y) \to C(\bar X, \bar Y)$ satisfying the following three properties.
(i) Whenever $(g_i)$ is summable in $C(X, Y)$ then $(\Gamma g_i)$ is summable in $C(\bar X, \bar Y)$ and $\Gamma(\sum g_i) = \sum(\Gamma g_i)$.
(ii) $\Gamma(f_2) = \bar f_2$.
(iii) $\Gamma(g f_1) = (\Gamma(g))\bar f_1$ for all $g \in C(X, Y)$.

Show that $\mathcal{A}$ is a category of recursion schemes and that $f^\dagger$ is a canonical fixed point.

2. Create a version of Exercise 1 in which objects have the form $(Pfn(X, X)', (A, B), \psi_{A,B})$ for which

$$f_{A,B} = \sum_{m=0}^{\infty} B A^m,$$

as in Exercise 6.2.11, is a canonical fixed point.

3. Create a category of recursion schemes with objects of the form $((2^{X^*})^n, G, \psi_G)$ with $G$ a context-free grammar with $n$ nonterminals so that $L(G)$ is a canonical fixed point.

Notes and References for Chapter 7

The canonical fixed point theorem is due to the authors: "The pattern-of-calls expansion is the canonical fixed point for recursive definitions," Journal of the Association for Computing Machinery, 29, 1982, pp. 557-602.
CHAPTER 8

Partially Additive Semantics of Recursion

8.1 PAR Schemes
8.2 The Canonical Fixed Point for PAR Schemes
8.3 Additive Domains
8.4 Proving Correctness
8.5 Power Series and Products

In Section 5.2, we used partially additive semantics in Pfn to describe a number of examples of recursive specification as "power-series" maps

$$\psi(h) = H_0 + H_1(h) + H_2(h, h) + \cdots$$

in which the Kleene semantics could alternatively be given by the pattern-of-calls expansion.

In Sections 8.1 and 8.2 we define recursive specifications and their pattern-of-calls expansion on general partially additive monoids, and will show that the pattern-of-calls expansion, like the Kleene semantics, may be regarded as a unique canonical fixed point in the sense of Chapter 7. In Section 8.3 we define ordered partially additive categories in which it can be shown that each power-series specification is also continuous and that, moreover, the Kleene semantics and pattern-of-calls expansion coincide. It follows that both order semantics and partially additive semantics apply to important semantic categories. In Section 8.4 we briefly illustrate some rules for correctness which use both ordered and partially additive semantics. Section 8.5 caps the theory of the first two sections to provide tools needed to define the semantics of recursive specification in a programming language using partially additive ideas.

8.1 PAR Schemes

Generalizing $(M, \Sigma) = Pfn(X, Y)$, we seek to formulate recursive definitions in a partially additive monoid $(M, \Sigma)$. We have seen examples in Section 5.2 in which recursive specifications $\psi: M \to M$ can take the form $\psi(h) =$
$\sum H_m(h, \ldots, h)$ for suitable maps $H_m: M^m \to M$. In these motivating examples, $H_m(h_1, \ldots, h_m)$ was defined as follows: For each distinct $m$-substitution path in a recursive call, replace the $j$th occurrence of a variable by $h_j$ ($1 \le j \le m$) and compose the partial functions along the path; then sum over all these paths. There are, in fact, more general ways of combining $m$ functions than composing, as was seen in the definition of recursive specification for FPF in Section 6.3 where, for example, $H_m = \mathrm{constr}_m$ as in 6.3.16 is important. The goal of this section is to give an abstract definition of suitable $H_m$ to introduce a general theory of partially additive recursive specifications. (Some of this theory is postponed for Section 8.5.) The main property required of $H_m$ is $m$-additivity as is now defined.

1 Definition. Let $(M_1, \Sigma_1), \ldots, (M_m, \Sigma_m)$, $(M, \Sigma)$ be partially additive monoids. A function $L: M_1 \to M$ is additive $(M_1, \Sigma_1) \to (M, \Sigma)$ if for all summable families $(h_i)$ in $(M_1, \Sigma_1)$, $(Lh_i)$ is summable in $(M, \Sigma)$ and

$$L(\Sigma_1 h_i) = \Sigma\, Lh_i.$$

More generally, a function $L: M_1 \times \cdots \times M_m \to M$ is $m$-additive if whenever all but the $j$th variable is fixed the resulting function $M_j \to M$ is additive, that is,

$$L(h_1, \ldots, h_{j-1}, \Sigma_j c_i, h_{j+1}, \ldots, h_m) = \Sigma\, L(h_1, \ldots, h_{j-1}, c_i, h_{j+1}, \ldots, h_m)$$

for all $j$, summable families $(c_i)$, and all choices of fixed $h_t \in M_t$ ($t \ne j$). Obviously, 1-additive is the same as additive. For completeness we define a 0-additive map $1 \to (M, \Sigma)$ to be any element of $M$. (The one-element set 1, the empty product (2.3.10) $M_1 \times \cdots \times M_m$ when $m = 0$, should be thought of as the trivial partially additive monoid whose only element is 0.)

2 Observations. Any composition of additive maps is additive. The identity map $id_M: (M, \Sigma) \to (M, \Sigma)$ is additive. If $L$ is additive then $L(0) = 0$.
To see this, if $L: (M, \Sigma) \to (M', \Sigma')$, $L': (M', \Sigma') \to (M'', \Sigma'')$ are additive, then

$$L'L(\Sigma h_i) = L'(\Sigma' Lh_i) = \Sigma'' L'(Lh_i) = \Sigma'' (L'L)h_i$$

shows $L'L$ is additive. This argument may be iterated to see that a finite composition of additive maps is additive. That the identity map is additive is obvious and an additive map preserves 0 by definition since 0 is the empty sum.

3 Example. Let $m \ge 2$ and let $X_0, \ldots, X_m$ be objects in a partially additive category $C$. Then the composition map

$$C(X_0, X_1) \times \cdots \times C(X_{m-1}, X_m) \xrightarrow{\ L\ } C(X_0, X_m), \qquad (h_1, \ldots, h_m) \mapsto h_m \cdots h_1$$
is $m$-additive. For by the axioms on a partially additive category, specifically 3.2.1, we have (writing $\Sigma$ for $\Sigma_j$ to avoid notational confusion)

$$\begin{aligned} L(h_1, \ldots, h_{j-1}, \Sigma c_i, h_{j+1}, \ldots, h_m) &= (h_m \cdots h_{j+1})[(\Sigma c_i)(h_{j-1} \cdots h_1)] \\ &= (h_m \cdots h_{j+1})\, \Sigma\, c_i (h_{j-1} \cdots h_1) \\ &= \Sigma\, (h_m \cdots h_{j+1}) c_i h_{j-1} \cdots h_1. \end{aligned}$$

4 Proposition. Let $(M_1, \Sigma_1), \ldots, (M_m, \Sigma_m)$, $(M, \Sigma)$ be partially additive monoids and let $D_t: M_1 \times \cdots \times M_m \to M$ ($t \in T$) be a family of $m$-additive maps such that the sum

$$D(h_1, \ldots, h_m) = \Sigma\, D_t(h_1, \ldots, h_m)$$

is defined for all $h_1, \ldots, h_m$. Then $D$ is $m$-additive.

PROOF. For fixed $h_1, \ldots, h_{j-1}, h_{j+1}, \ldots, h_m$ let

$$L_t(c) = D_t(h_1, \ldots, h_{j-1}, c, h_{j+1}, \ldots, h_m), \qquad L(c) = D(h_1, \ldots, h_{j-1}, c, h_{j+1}, \ldots, h_m).$$

We are given that each $L_t$ is additive and we must prove that $L$ is additive. But clearly $L(c) = \Sigma L_t(c)$. Hence, all reduces to showing that a sum $L$ of additive maps $L_t$ is additive. Indeed, if $(c_i \mid i \in I)$ is summable (we write $\Sigma c_i$ instead of $\Sigma_j c_i$ to avoid notational confusion) we have

$$\begin{aligned} L(\Sigma(c_i \mid i \in I)) &= \Sigma(L_t(\Sigma(c_i \mid i \in I)) \mid t \in T) \\ &= \Sigma(\Sigma(L_t c_i \mid i \in I) \mid t \in T) && (L_t \text{ is additive}) \\ &= \Sigma(\Sigma(L_t c_i \mid t \in T) \mid i \in I) && \text{(partition associativity)} \\ &= \Sigma(L c_i \mid i \in I). \qquad \Box \end{aligned}$$

From 3 and 4 it follows that if $H_m$ arises from summing composition paths as in our motivating examples in Section 5.2, then $H_m$ is $m$-additive.

5 Example. As in Section 6.3, let $D = Pfn(DTN, DTN)$ and for $m \ge 1$ consider

$$\mathrm{constr}_m: D^m \to D$$

so that $\mathrm{constr}_m(h_1, \ldots, h_m): \langle t_1, \ldots, t_m\rangle = [h_1 t_1, \ldots, h_m t_m]$. Then $\mathrm{constr}_m$ is $m$-additive. This is because both

$$\mathrm{constr}_m(h_1, \ldots, h_{j-1}, \Sigma c_i, h_{j+1}, \ldots, h_m): \langle t_1, \ldots, t_m\rangle$$
and

$$\Sigma(\mathrm{constr}_m(h_1, \ldots, h_{j-1}, c_i, h_{j+1}, \ldots, h_m) \mid i \in I): \langle t_1, \ldots, t_m\rangle$$

mean

$$[h_1 t_1, \ldots, h_{j-1} t_{j-1}, c_i t_j, h_{j+1} t_{j+1}, \ldots, h_m t_m]$$

for the unique (if any) $i$ with $t_j \in DD(c_i)$.

We have set the stage for the main definition of this section:

6 Definition. A partially additive recursive scheme, PAR scheme for short, is $(M, \Sigma, H)$, where $(M, \Sigma)$ is a partially additive monoid and $H = (H_m : m = 0, 1, 2, \ldots)$, where $H_m: M^m \to M$ is $m$-additive for all $m = 0, 1, 2, \ldots$ subject to the requirement that for each $x \in M$, $\Sigma(H_m(x, \ldots, x) \mid m = 0, 1, 2, \ldots)$ exists so that the function

7  $\psi_H: M \to M, \qquad x \mapsto \sum_{m=0}^{\infty} H_m(x, \ldots, x)$

is defined.

Such $\psi_H$ is a "power-series" map (but the formal definition is postponed for 8.5.1). A "polynomial" is the case where $H_m = 0$ for $m \ge m_0$ for some $m_0$.

In practice, a recursive specification arises in terms of a function $\psi: M \to M$. In order semantics, $M$ is a poset and it is a matter of showing $\psi$ is continuous. The PAR-scheme approach has a new complication, namely, that once $M$ is a partially additive monoid it is necessary to find $H$ with $\psi = \psi_H$ as in 7. It may not be obvious how to find such $H$, and we saw in Exercise 5.2.6 that $H$ need not be unique. On the other hand, the partially additive approach has advantages. The $H_m$ used in a PAR scheme relate directly to the constructions used to build recursive specifications in practice. The pattern-of-calls semantics of the next section will provide a semantics in $M$ for each PAR scheme $(M, \Sigma, H)$ in the form of a sum whose terms deal with individual computation paths at a finer level than the Kleene approximants $\psi_H^n(\bot)$.

The examples of Section 5.2 yielded specifications described by polynomials of degree at most 2. The following example is a nonpolynomial PAR scheme for a specification to compute the determinant of a square matrix.

8 Example. The following recursive algorithm computes the determinant of an $n \times n$ matrix by cofactor expansion along the first column.

0.
Define function DET with input matrix MAT, output number Z, and additional local variables I and N.
1. Let N be the number of rows in MAT.
2. If N = 1 go to END.
3. Z := 0; I := 0.
4. LOOP: I := I + 1; if updated I > N, exit.
5. If $a_{ij}$ is the $i$-$j$ entry of MAT and if $B_{ij}$ denotes the submatrix of MAT obtained by deleting row $i$ and column $j$, $Z := Z + (-1)^{I+1} a_{I1}\, DET(B_{I1})$.
6. go to LOOP
7. END: $Z := a_{11}$

To emphasize the practical reality of this algorithm, we give an APL program which implements it line-by-line. (The reader need not be familiar with APL since the original description is equivalent.)

    ∇ Z ← DET MAT; I; N
    [1] N ← (⍴MAT)[1]
    [2] → (N = 1)/END
    [3] Z ← I ← 0
    [4] LOOP: → (N < I ← I + 1)/0
    [5] Z ← Z + ((¯1) * 1 + I) × MAT[I; 1] × DET MAT[(I ≠ ⍳N)/⍳N; 1 + ⍳(N - 1)]
    [6] → LOOP
    [7] END: Z ← MAT[1; 1]
    ∇

The desired function is an element of the partially additive monoid $(M, \Sigma)$ with $M$ the set of all partial functions from the set of all square matrices with real entries to the set of reals. In defining $H_n$ below, we must specify, for each $m_1, \ldots, m_n$, what $H_n(m_1, \ldots, m_n)$ is as an element of $M$. We do that by exhibiting the number it returns when given a matrix MAT as input. Then define

$H_0 \in M$ by $H_0$ = if MAT = $[a_{11}]$ is $1 \times 1$ then $a_{11}$ else undefined;

$H_1: M \to M$ is the always-undefined function;

$H_2: M^2 \to M$ by $H_2(m_1, m_2)$ = if MAT = $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ is $2 \times 2$ then $a_{11} m_1([a_{22}]) - a_{21} m_2([a_{12}])$ else undefined;

$H_3: M^3 \to M$ by $H_3(m_1, m_2, m_3)$ = if MAT = $\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$ is $3 \times 3$ then $a_{11} m_1\!\left(\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}\right) - a_{21} m_2\!\left(\begin{bmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{bmatrix}\right) + a_{31} m_3\!\left(\begin{bmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{bmatrix}\right)$ else undefined.
Similarly, $H_n(m_1,\dots,m_n)$ is defined to yield a result only for a MAT that is $n \times n$. Then $\psi: M \to M$, defined by

$$\psi(m) = \sum_{n \ge 0} H_n(m,\dots,m),$$

is the sought recursive specification. The desired semantics is equally given by the Kleene semantics or the pattern-of-calls expansion to be defined in the next section. In fact, the least fixed point is total, and so is the only fixed point, and we may thus use the fixed point equation DET = $\psi$(DET) to compute the determinant of a $2 \times 2$ matrix:

$$\mathrm{DET}\!\left(\begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}\right) = \psi(\mathrm{DET})\!\left(\begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}\right) = H_2(\mathrm{DET},\mathrm{DET})\!\left(\begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}\right)$$
$$= 2\,\mathrm{DET}([5]) - 4\,\mathrm{DET}([3]) = 2\,\psi(\mathrm{DET})([5]) - 4\,\psi(\mathrm{DET})([3]) = 2H_0([5]) - 4H_0([3]) = 2\cdot 5 - 4\cdot 3 = -2.$$

EXERCISES FOR SECTION 8.1

1. Use the algorithm of 8 to compute DET[~ o 1] 1 0 3 4

2. In this exercise we briefly discuss polynomial maps on vector spaces to indicate the analogy with the partially additive "polynomials" defined after 7. Say that a function $f: V^m \to V$, $V$ a vector space, is $m$-linear if when all but one of the $m$ variables are fixed with arbitrary elements of $V$, the resulting map $V \to V$ is linear. Thus, 1-linear is linear. We define 0-linear to mean constant.
(i) Let $R$ be the one-dimensional vector space of reals. Show that $f: R \to R$ is linear if and only if there exists a constant $b$ with $f(x) = bx$. [Hint: $b = f(1)$.]
(ii) Show that $f: R \to R$ has form $f(x) = cx^2$ with $c$ constant if and only if there exists 2-linear $H_2: R^2 \to R$ with $f(x) = H_2(x,x)$. [Hint: $c = H_2(1,1)$, $H_2(t,u) = tu$.]
(iii) Show that $f: R \to R$ has form $dx^n$ with $d$ constant if and only if there exists $n$-linear $H_n: R^n \to R$ with $f(x) = H_n(x,\dots,x)$.
It follows that the familiar polynomial function $p: R \to R$, $p(x) = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n$ is just $p(x) = H_0 + H_1(x) + H_2(x,x) + \cdots + H_n(x,\dots,x)$ with each $H_k: R^k \to R$ $k$-linear. The latter generalizes immediately to define polynomials $p: V \to V$ in arbitrary vector spaces $V$. For example,
(iv) Let $V = R^2$ be the Cartesian plane, a two-dimensional vector space. Define
$p: V \to V$ by $p(x,y) = (x^2 + 2xy - y,\; y^2 + 10 - 2x)$; show that $p(x,y)$ has the form $H_0 + H_1(x,y) + H_2((x,y),(x,y))$ with $H_n: V^n \to V$ $n$-linear, $n = 0, 1, 2$.

3. Let $j_1,\dots,j_k \ge 0$ and let $H_t: N_{t1} \times \cdots \times N_{tj_t} \to N_t$ be $j_t$-additive for $1 \le t \le k$. Let $L: N_1 \times \cdots \times N_k \to N$ be $k$-additive. Define $m = j_1 + \cdots + j_k$. Show that
$$M: N_{11} \times \cdots \times N_{1j_1} \times N_{21} \times \cdots \times N_{kj_k} \to N, \qquad M(h_{11},\dots,h_{kj_k}) = L(H_1(h_{11},\dots,h_{1j_1}),\dots,H_k(h_{k1},\dots,h_{kj_k}))$$
is $m$-additive. [Hint: Despite the cumbersome notation, show that if all but one of the variables is fixed, the resulting function of one variable is the composition of two additive maps.]

4. Let $(M,\Sigma) = \mathrm{Pfn}(DTN, DTN)$. Show that the map if-then-else: $M^3 \to M$ is not 3-additive. (Further discussion to resolve this situation is given in 8.5.15; no fair peeking.)

8.2 The Canonical Fixed Point for PAR Schemes

The PAR schemes of 8.1.6 are the objects of a category of recursion schemes (7..1) whose canonical fixed point exists uniquely as an application of the canonical fixed point theorem 7..9 and coincides with the pattern-of-calls expansion 5. In addition to being a useful result for later work, this establishes that the pattern-of-calls expansion is always a fixed point solution. We begin by making PAR schemes into a category.

1 Definition. Let $(M,\Sigma,H)$, $(M',\Sigma',H')$ be PAR schemes. A homomorphism $\varphi: (M,\Sigma,H) \to (M',\Sigma',H')$ of PAR schemes is an additive map $\varphi: (M,\Sigma) \to (M',\Sigma')$ which also satisfies
$$H'_n(\varphi m_1,\dots,\varphi m_n) = \varphi H_n(m_1,\dots,m_n)$$
for all $n \ge 0$ and $m_1,\dots,m_n \in M$.

It is obvious (using 8.1.2) that the composition of homomorphisms is a homomorphism and that the identity function is a homomorphism $(M,\Sigma,H) \to (M,\Sigma,H)$. To conform to Definition 7..1, the category of partially additive recursive schemes, call it $\mathcal{P}$, has
Objects: $(M,a,\psi)$ where $a = (\Sigma,H)$ with $(M,\Sigma,H)$ a PAR scheme and $\psi = \psi_H$ as in 8.1.7.
Morphisms: Homomorphisms $\varphi$ as above.
By remarks already made, this is clearly a category. We will treat a PAR scheme $(M,\Sigma,H)$ as the object $(M,(\Sigma,H),\psi_H)$ of $\mathcal{P}$ without comment in the sequel.
According to Definition 7..1, to ensure that $\mathcal{P}$ is a category of recursion schemes we must show for any homomorphism $\varphi: (M,\Sigma,H) \to (M',\Sigma',H')$ that the square $\varphi\psi_H = \psi_{H'}\varphi$ commutes. This is verified by
$$\varphi\psi_H h = \varphi \sum H_n(h,\dots,h)$$
$$= {\textstyle\sum'}\, \varphi H_n(h,\dots,h) \quad \text{since } \varphi \text{ is additive}$$
$$= {\textstyle\sum'}\, H'_n(\varphi h,\dots,\varphi h) \quad \text{since } \varphi \text{ is a homomorphism}$$
$$= \psi_{H'}\varphi h.$$

To apply Theorem 7..9 to $\mathcal{P}$ we will need to construct an initial object. This requires considerable discussion. We begin by considering trees with $n$-branch nodes labeled $\omega_n$ such as

[Figure: example trees with nodes labeled $\omega_n$.]

Each such tree can then be thought of as the abstract specification of a pattern of calls. Given a PAR scheme $H$, we interpret such a tree by evaluating from leaf to root, replacing $\omega_n$ with $H_n$ as we go. We formalize with the following:

2 Definition. The abstract syntax for PAR-scheme semantics is the set $e$ of all trees defined inductively as follows:
Basis Step: The one-node tree $\omega_0$ is in $e$.
Induction Step: If $t_1,\dots,t_n \in e$, $n \ge 1$, then the tree with root $\omega_n$ and immediate subtrees $t_1,\dots,t_n$ is in $e$.

In short, $e$ consists of all finite-depth finitely branching trees with a node labeled $\omega_k$ if there are $k$ branches from that node. In particular, each leaf is labeled $\omega_0$. Sometimes linear notation for elements of $e$ is more useful. We achieve this by writing $\omega_n[t_1,\dots,t_n]$ for the tree with root $\omega_n$ and subtrees $t_1,\dots,t_n$, and $\omega_0$ for the one-node tree,
so that the first two of the trees above have the linear form
$$\omega_1[\omega_1[\omega_1[\omega_0]]], \qquad \omega_2[\omega_0, \omega_3[\omega_0,\omega_0,\omega_1[\omega_0]]].$$

The trees in $e$ abstractly represent the result of all possible iterated substitutions or patterns of calls, starting with $\omega_0$ which represents $H_0$. Our hope is to continue to develop the theory at a level of abstraction which relieves us from keeping track of special path structure when it plays no role. Such details will, of course, be necessary when analyzing specific examples.

3 Definition. Given a PAR scheme $(M,\Sigma,H)$, the interpretation $s_H$ of the tree $s$ in $e$ is the element of $M$ obtained by "running $H$ on $s$."
Basis Step: $(\omega_0)_H = H_0$.
Induction Step: $(\omega_n[t_1,\dots,t_n])_H = H_n(t_1^H,\dots,t_n^H)$.

For example, the determinant of a $3 \times 3$ matrix $[a_{ij}]$ is $s_3^H([a_{ij}])$ for $(M,\Sigma,H)$ as in Example 8.1.8, where
$$s_3 = \omega_3[\omega_2[\omega_0,\omega_0],\ \omega_2[\omega_0,\omega_0],\ \omega_2[\omega_0,\omega_0]],$$
because a $3 \times 3$ determinant via cofactor expansion reduces to three $2 \times 2$ determinants each of which in turn reduces to two $1 \times 1$ determinants. Not every $s \in e$ corresponds to a possible pattern of calls in this example and for such $s$, $s_H = 0$. For example, consider
$$s = \omega_2[\omega_2[\omega_0,\omega_0],\ \omega_0].$$
Here
$$s_H = H_2((\omega_2[\omega_0,\omega_0])_H,\ \omega_0^H) = H_2(H_2(H_0,H_0), H_0).$$
By definition of $H_2$, this is defined only for $2 \times 2$ input $[a_{ij}]$, where it would be
$$a_{11}\,(H_2(H_0,H_0)([a_{22}])) - a_{21}\,H_0([a_{12}]).$$
But this is undefined since $H_2(H_0,H_0)$ is undefined on a $1 \times 1$ matrix.
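The interpretation of Definition 3 for the determinant scheme of Example 8.1.8 can be sketched in Python. This is a minimal illustration under assumptions that are not in the text: a tree $\omega_n[t_1,\dots,t_n]$ is encoded as the tuple of its subtrees (so $\omega_0$ is the empty tuple), a partial function is a Python function returning `None` where it is undefined, and the name `interpret` is hypothetical.

```python
W0 = ()  # the one-node tree omega_0; omega_n[t1..tn] is the tuple (t1, ..., tn)


def interpret(tree):
    """Return s_H for the determinant scheme as a partial function on matrices.

    The returned function yields None where s_H is undefined.
    """
    n = len(tree)                        # root label omega_n
    subs = [interpret(t) for t in tree]  # interpretations t_i^H of the subtrees

    def s_H(mat):
        if n == 0:                       # H_0: defined only on 1 x 1 input
            return mat[0][0] if len(mat) == 1 else None
        if n == 1 or len(mat) != n:      # H_1 is always undefined; H_n needs n x n
            return None
        total = 0
        for i in range(n):
            minor = [row[1:] for k, row in enumerate(mat) if k != i]
            value = subs[i](minor)       # m_i applied to the i-th minor
            if value is None:            # an undefined call makes the result undefined
                return None
            total += (-1) ** i * mat[i][0] * value
        return total

    return s_H


s2 = (W0, W0)
s3 = (s2, s2, s2)
print(interpret(s3)([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))
print(interpret((s2, W0))([[2, 3], [4, 5]]))
```

The second call uses the tree $\omega_2[\omega_2[\omega_0,\omega_0],\omega_0]$ discussed above; it returns `None` because the inner $H_2(H_0,H_0)$ is applied to a $1 \times 1$ matrix.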
We need one technical result.

4 Lemma. Let $(M,\Sigma)$ be a partially additive monoid. Given summable families $(a_i)$, $(b_j)$ in $M$ and 2-additive $H_2: M^2 \to M$, $\sum_{i,j} H_2(a_i,b_j)$ exists (the sum is countable; see Exercise 7) and
$$\sum_{i,j} H_2(a_i,b_j) = H_2\Big(\sum_i a_i, \sum_j b_j\Big).$$
More generally, given an $m$-additive map $H_m: M^m \to M$ and $m$ summable families $(a^1_i),\dots,(a^m_i)$, the following equation holds (including the assertion that the right-hand side exists):
$$H_m\Big(\sum_{i_1} a^1_{i_1},\dots,\sum_{i_m} a^m_{i_m}\Big) = \sum_{i_1,\dots,i_m} H_m(a^1_{i_1},\dots,a^m_{i_m}).$$

PROOF IDEA. For example,
$$H_2(a+b,\ c+d+e) = H_2(a,\ c+d+e) + H_2(b,\ c+d+e)$$
$$= H_2(a,c) + H_2(a,d) + H_2(a,e) + H_2(b,c) + H_2(b,d) + H_2(b,e).$$
The general proof is similar. □

We are now ready to define the pattern-of-calls expansion.

5 Definition. If $(M,\Sigma,H)$ is a PAR scheme, its pattern-of-calls expansion $e_H$ is given by
$$e_H = \sum(s_H : s \in e),$$
where $s_H$ is the interpretation of $s$ as in 3.

Of course, we need to be sure this sum exists. This is always so by the following theorem:

6 Theorem. For any PAR scheme $(M,\Sigma,H)$ the family $(s_H : s \in e)$ is summable so that the pattern-of-calls expansion $e_H$ exists.

PROOF. By basic set theory the sum is indeed a countable one (see Exercises 6-10). By the limit axiom for partially additive monoids (3.1.2), we may deduce that the sum $\sum(s_H : s \in e)$ exists if we can show that every finite subfamily is summable. Since any subfamily of a summable family is summable by 3.1.7, it suffices to show that there exists an ascending sequence of subsets of $e$,
$$S_0 \subset S_1 \subset S_2 \subset \cdots,$$
such that $\bigcup_{k \ge 0} S_k = e$ and each $\sum(s_H : s \in S_k)$ exists. Define $S_0 = \{\omega_0\}$ and
$$S_{k+1} = \{\omega_n[t_1,\dots,t_n] : n \ge 0, \text{ each } t_i \text{ is in } S_k\}.$$
Then certainly $e = \bigcup_{k \ge 0} S_k$. Also, $S_k \subset S_{k+1}$: the basis holds since $\omega_0 \in S_1$, and for the inductive step, if $t \in S_k$ then either $t = \omega_0$ and $t \in S_{k+1}$, or $t = \omega_n[s_1,\dots,s_n]$ with $n > 0$, $s_j \in S_{k-1}$, so that $s_j \in S_k$ by the induction assumption and, hence, $t \in S_{k+1}$. It only remains to show that $\sum(s_H : s \in S_k)$ exists. For $k = 0$ use the unary sum axiom. Given that $\sum(s_H : s \in S_k)$ exists, we may deduce that
$$\sum(s_H : s \in S_{k+1}) = \sum(H_n(t_1^H,\dots,t_n^H) : n \ge 0, \text{ each } t_i \in S_k)$$
exists, being just $\psi_H(\sum(s_H : s \in S_k))$, by Lemma 4. □

We are now ready to return to our goal of showing that the category $\mathcal{P}$ of PAR schemes of 1 has an initial object.

7 Definition. The initial PAR scheme $(A,\hat\Sigma,\hat H)$ of $\mathcal{P}$ is defined by
$A$ = the set of all subsets of the abstract syntax $e$ of 2;
$\hat\Sigma(S_i : i \in I)$ is defined when the sets $S_i$ are disjoint ($i \ne j$ implies $S_i \cap S_j = \emptyset$) and is then $\bigcup(S_i : i \in I)$;
$\hat H_n: A^n \to A$, $(S_1,\dots,S_n) \mapsto \{\omega_n[t_1,\dots,t_n] : t_i \in S_i\}$.

For example,
$$\hat H_2(\{\omega_0, \omega_2[\omega_0,\omega_0]\}, \{\omega_1[\omega_0]\}) = \{\omega_2[\omega_0,\omega_1[\omega_0]],\ \omega_2[\omega_2[\omega_0,\omega_0],\omega_1[\omega_0]]\}.$$

There is work to do if we are to show this is indeed the desired initial object. We begin with the following:

8 Proposition. $(A,\hat\Sigma,\hat H)$ is a PAR scheme.

PROOF. It is obvious that $(A,\hat\Sigma)$ is a partially additive monoid. We must show $\hat H_n$ is additive in each variable. For notational convenience we show additivity in the first variable. Let $S_2,\dots,S_n \in A$ be fixed and consider a summable family $(T_i : i \in I)$. Then an element of $\hat H_n(T_i,S_2,\dots,S_n)$ has form $\omega_n[t,s_2,\dots,s_n]$ with $t \in T_i$, $s_j \in S_j$. Since $T_i \cap T_j = \emptyset$ if $i \ne j$,
$$\hat H_n(T_i,S_2,\dots,S_n) \cap \hat H_n(T_j,S_2,\dots,S_n) = \emptyset.$$
It is then clear that
$$\hat H_n\Big(\hat\sum_i T_i, S_2,\dots,S_n\Big) = \hat\sum_i \hat H_n(T_i,S_2,\dots,S_n).$$
Finally, we must show that for each $S \in A$, $\psi_{\hat H}(S) = \hat\sum_{n \ge 0} \hat H_n(S,\dots,S)$ is defined, that is, that
$$\hat H_n(S,\dots,S) \cap \hat H_m(S,\dots,S) = \emptyset \quad \text{if } m \ne n.$$
This is clear since each tree in $\hat H_n(S,\dots,S)$ has root $\omega_n$. □
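The maps $\hat H_n$ of Definition 7 are easy to experiment with on finite sets of trees. The following Python sketch is illustrative only and rests on assumptions not made in the text: a tree $\omega_n[t_1,\dots,t_n]$ is encoded as the tuple of its subtrees, and the names `H_hat` and `step` are hypothetical. Because $\psi_{\hat H}$ sums over all $n \ge 0$, `step` truncates the sum at a chosen maximum arity.

```python
from itertools import product

W0 = ()  # omega_0; omega_n[t1..tn] is encoded as the tuple (t1, ..., tn)


def H_hat(*sets):
    """H^_n(S_1..S_n) = { omega_n[t_1..t_n] : t_i in S_i }, with n = len(sets)."""
    return {tuple(ts) for ts in product(*sets)}


def step(S, max_arity):
    """Truncation of psi(S) = sum over n of H^_n(S, ..., S), using only n <= max_arity."""
    result = set()
    for n in range(max_arity + 1):
        part = H_hat(*([S] * n))
        assert result.isdisjoint(part)  # trees with different root arity differ
        result |= part
    return result


print(H_hat({W0, (W0, W0)}, {(W0,)}))
print(step({W0}, 2))
```

The first call reproduces the example following Definition 7, and the disjointness assertion inside `step` mirrors the last step of the proof of Proposition 8.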
9 Proposition. $(A,\hat\Sigma,\hat H)$ has a unique fixed point, namely, $e$.

PROOF. The fixed point equation is
$$S = \psi_{\hat H}(S) = \hat\sum_{n \ge 0} \hat H_n(S,\dots,S)$$
and is certainly satisfied by $e$. To see that there is no other fixed point, note that the equation implies that $\{\omega_0\} \subset S$, and it then follows inductively that any $\omega$-tree is in $S$, so $S = e$. □

This sets the stage for a major objective:

10 Theorem. $(A,\hat\Sigma,\hat H)$ is an initial object of the category $\mathcal{P}$ of PAR schemes. For each PAR scheme $(M,\Sigma,H)$ the unique homomorphism $!: (A,\hat\Sigma,\hat H) \to (M,\Sigma,H)$ is defined by

11  $$!(S) = \sum(s_H \mid s \in S)$$

for each subset $S$ of $e$ (i.e., each $S$ in $A$). We call $!$ the canonical homomorphism to $(M,\Sigma,H)$.

PROOF. By Theorem 6 and 3.1.7, the sum $\sum(s_H \mid s \in S)$ is defined for each subset $S$ of $e$. Thus, $!(S)$ is well defined. To show that $!$ is a homomorphism we proceed as follows.

(i) Let $S_1,\dots,S_n$ be $n$ subsets of $e$. Then
$$!\hat H_n(S_1,\dots,S_n) = \sum((\omega_n[t_1,\dots,t_n])_H \mid t_i \in S_i)$$
$$= \sum(H_n(t_1^H,\dots,t_n^H) \mid t_i \in S_i)$$
$$= H_n\Big(\sum(t_1^H \mid t_1 \in S_1),\dots,\sum(t_n^H \mid t_n \in S_n)\Big) \quad \text{by } n\text{-additivity of } H_n \text{ and 4}$$
$$= H_n(!(S_1),\dots,!(S_n)).$$

(ii) Let the $S_i$ be disjoint subsets of $e$ so that $\hat\sum_i S_i$ is defined. Then
$$!\Big(\hat\sum_i S_i\Big) = \sum\Big(s_H : s \in \bigcup_i S_i\Big)$$
$$= \sum_i \sum(s_H : s \in S_i) \quad \text{by partition associativity since the } S_i \text{ are disjoint}$$
$$= \sum_i\, !(S_i)$$
so that $!$ is additive.

It only remains to show that $!$ is unique. But suppose that $\varphi$ is any homomorphism $(A,\hat\Sigma,\hat H) \to (M,\Sigma,H)$. Then $S$ is the sum of its one-element subsets and
$$\varphi(S) = \sum(\varphi(\{s\}) \mid s \in S)$$
for each $S \subset e$, by additivity, while
$$\varphi(\{\omega_0\}) = H_0 \quad \text{and} \quad \varphi(\{\omega_n[t_1,\dots,t_n]\}) = H_n(\varphi(\{t_1\}),\dots,\varphi(\{t_n\}))$$
for any $n$ $\omega$-trees $t_i$. But these last two equations together imply that $\varphi(\{s\}) = s_H$ for each $s$ in $e$, and so
$$\varphi(S) = \sum(s_H : s \in S) = {!}(S).$$
Hence $!$ is unique. □

We then have our main theorem on PAR schemes:

12 The Canonical Expansion Theorem. The assignment
$$(M,\Sigma,H) \mapsto e_H = \sum(s_H \mid s \in e)$$
of the pattern-of-calls expansion to each PAR scheme provides the unique canonical fixed point for the category of PAR schemes and their homomorphisms.

PROOF. The Canonical Fixed Point Theorem 7..9 tells us that if there is an initial PAR scheme $(A,\hat\Sigma,\hat H)$, with $!$ the unique scheme homomorphism to $(M,\Sigma,H)$, and if this initial scheme has a unique fixed point $a_0$, then there is a unique canonical fixed point, namely, that given by $(M,\Sigma,H) \mapsto {!}(a_0)$. Applying this to the present circumstances, we have that $a_0 = e$ by 9, while $!(S) = \sum(s_H : s \in S)$ by 11. □

EXERCISES FOR SECTION 8.2

1. Prove in detail the following claims left to the reader in Definition 1:
(i) $\mathrm{id}_M: (M,\Sigma,H) \to (M,\Sigma,H)$ is a homomorphism.
(ii) If $\varphi: (M,\Sigma,H) \to (M',\Sigma',H')$, $\varphi': (M',\Sigma',H') \to (M'',\Sigma'',H'')$ are homomorphisms, so is $\varphi'\varphi: (M,\Sigma,H) \to (M'',\Sigma'',H'')$.

2. Verify in detail the claim made in Proposition 8 that $(A,\hat\Sigma)$ is a partially additive monoid.

3. Let $X$ be a set and consider the set $L_X = 2^{X^*}$ of languages on $X$.
(i) Show that $(L_X,\Sigma,\cdot,1)$ is a partially additive semiring (Exercise 3.3.14) if $\Sigma$ is union, $\cdot$ is setwise concatenation $AB = \{wv \mid w \in A, v \in B\}$, and $1 = \{\Lambda\}$ with $\Lambda$ the empty string.
(ii) Show that $(L_X,\Sigma,H)$ with $H_0 = 1$, $H_1(S) = aSb$ (which we write for the more tedious $\{a\}S\{b\}$), other $H_m$ identically 0, is a PAR scheme with $\psi_H(S) = 1 + aSb$ and $e_H = \{a^nb^n \mid n = 0,1,2,\dots\}$. Observe that $e_H = L(G)$ for $G$ the grammar: $S \to \Lambda$, $S \to aSb$.
4. Generalizing the example of $s_3$ for the determinant algorithm of 8.1.8 as discussed following 3, describe $s_m \in e$ so that $s_m^H$ evaluates determinants of $m \times m$ matrices.

5. Use 4 to expand $H_3(a+b,\ c+d,\ e+f+g)$.

In this section we have considered sums of the form
$$\sum_{i_1,\dots,i_m} b_{i_1,\dots,i_m},$$
which denotes a sum of the form $\sum(b_i \mid i \in I_1 \times \cdots \times I_m)$ with each $I_t$ countable. Hence, it is essential to know that a finite product of countable sets is countable, since we have only considered summing countable families in a partially additive monoid. Exercises 6-10 review the necessary set theory, culminating with the verification that the pattern-of-calls expansion is a countable sum.

6. Show that there exists a bijection $\alpha: N \times N \to N$. [Hint: $\alpha(0,0) = 0$, $\alpha(0,1) = 1$, $\alpha(1,0) = 2$, $\alpha(0,2) = 3$, $\alpha(1,1) = 4$, ....]

7. Let $I = \{i_1,i_2,i_3,\dots\}$, $J = \{j_1,j_2,j_3,\dots\}$ be countable. Show that $I \times J$ is countable. [Hint: For $\alpha$ as in Exercise 6, show that $f: I \times J \to N$, $f(i_m,j_n) = \alpha(m,n)$ is bijective; use a subset argument if one of $I$, $J$ is finite.]

8. Show that any finite product of countable sets is countable. [Hint: Use Exercise 7 and induction.]

9. Let $(I_j \mid j \in J)$ be a family of sets with each $I_j$ countable and $J$ countable. Show that $\bigcup I_j$ is countable. [Hint: As $I_j$ is countable there exists an injective function $f_j: I_j \to N$. Write $J = \{j_1,j_2,j_3,\dots\}$ and for $x \in \bigcup I_j$ let $j(x) = j_k$ for the smallest $k$ with $x \in I_{j_k}$. Define $f: \bigcup I_j \to N \times N$ by $f(x) = (f_{j(x)}(x), j(x))$. For $\alpha$ as in Exercise 6, prove that $\alpha f: \bigcup I_j \to N$ is injective.]

10. Use Exercise 9 to prove $e$ is countable. [Hint: Use the sets $S_k$ of the proof of Theorem 6; show that each $S_k$ is finite and hence countable.] Hence, $e_H$ is a countable sum.

11. Show that $f^\dagger$ as in Exercise 7..1 is an instance of the pattern-of-calls expansion.

12. Show that the semantics $\sum BA^n$ of an abstract iterative program as in Exercises 6.2.11 and 7..2 is an instance of the pattern-of-calls expansion.
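As a concrete check on the scheme of Exercise 3(ii), the following Python sketch compares the first few Kleene iterates of $\psi_H(S) = 1 + aSb$ with the terms of the pattern-of-calls expansion. It is only an illustration under assumptions not in the text: languages are finite Python sets of strings, the empty string stands for $\Lambda$, and the names `psi`, `kleene`, and `chain_term` are hypothetical. Since $H_m = 0$ for $m \ge 2$, the only trees with a nonzero interpretation are the chains $\omega_1[\omega_1[\cdots[\omega_0]]]$.

```python
def psi(S):
    """psi_H(S) = 1 + aSb, where 1 = {empty string} and the sum is union."""
    return {""} | {"a" + w + "b" for w in S}


def kleene(n):
    """The n-th Kleene iterate psi^n(0), starting from the empty language."""
    S = set()
    for _ in range(n):
        S = psi(S)
    return S


def chain_term(k):
    """Interpretation of the chain with k unary nodes omega_1 above omega_0."""
    S = {""}                                  # (omega_0)_H = H_0 = 1
    for _ in range(k):
        S = {"a" + w + "b" for w in S}        # apply H_1
    return S


partial_sum = set()
for k in range(4):
    partial_sum |= chain_term(k)
print(sorted(kleene(4), key=len))
print(partial_sum == kleene(4))
```

In this small case the partial sums of the expansion over chains of length less than $n$ agree with $\psi_H^n(0)$, which is consistent with $e_H = \{a^nb^n\}$.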
8.3 Additive Domains

In this section we define ordered partially additive categories which include, as far as we know, all semantic categories of interest which are partially additive. Such a category $C$ has the property that each $C(X,Y)$ is a domain under the sum-ordering $f \le g$ if $g = f + h$ for some $h$ (cf. 3.3.16) and for each PAR scheme $(M,\Sigma,H)$, $\psi_H$ is continuous so that the
scheme has both its Kleene semantics and its pattern-of-calls expansion, and we prove these are equal by demonstrating that they are both canonical fixed points in a situation where the canonical fixed point theorem 7..9 guarantees that the canonical fixed point is unique.

1 Definition. Let $(M,\Sigma)$ be a partially additive monoid. The sum-ordering on $(M,\Sigma)$ is the relation
$$a \le b \quad \text{if } b = a + h \text{ for some } h.$$
This relation is always reflexive and transitive. It is reflexive, $a \le a$, because $a = a + 0$, and it is transitive, in that $a \le b$ and $b \le c$ implies $a \le c$, because if $b = a + h$, $c = b + k$ then $c = (a + h) + k = a + (h + k)$. We say $(M,\Sigma)$ is a sum-ordered partially additive monoid if the sum-ordering is antisymmetric, $a = b$ if $a \le b$ and $b \le a$, so that $(M,\le)$ is a poset.

2 Example. $(M,\Sigma)$ is not sum-ordered if $M = \{0, 1, \bot, \infty\}$ and $\Sigma(a_n \mid n \in I)$ is defined by
$\infty$ if some $a_n = \infty$ or $a_n \ne \bot$ infinitely often;
$\bot$ if all $a_n = \bot$ or $I$ is empty;
$0$ if no $a_n = \infty$, $\{n : a_n \ne \bot\}$ is finite and nonempty, and the number of $n$ with $a_n = 1$ is even;
$1$ if no $a_n = \infty$, $\{n : a_n \ne \bot\}$ is finite and nonempty, and the number of $n$ with $a_n = 1$ is odd.
Such $(M,\Sigma)$ is a partially additive monoid with $\Sigma$ totally defined and $\bot$ as additive zero. Since $0 + 1 = 1$, $1 + 1 = 0$ we have $0 \le 1$ and $1 \le 0$ even though $0 \ne 1$.

3 Definition. An additive domain (see 8 below) is a sum-ordered partially additive monoid $(M,\Sigma)$ satisfying the additional property that whenever $(a_i \mid i \in I)$ is summable and $b \in M$ has the property that $\sum(a_i \mid i \in F) \le b$ for each finite subset $F$ of $I$ then also $\sum(a_i \mid i \in I) \le b$. An ordered partially additive category is a partially additive category $C$ for which each partially additive monoid $C(X,Y)$ is an additive domain.

4 Example. Not every sum-ordered partially additive monoid $(M,\Sigma)$ is an additive domain.
Set $M = \{0, 1, \infty\}$ with $\Sigma(a_n \mid n \in I)$ equal to
$\infty$ if some $a_n = \infty$ or $\{n \mid a_n = 1\}$ is infinite;
$1$ if no $a_n = \infty$ and $\{n \mid a_n = 1\}$ is finite and nonempty;
$0$ if all $a_n = 0$ or $I$ is empty.
Partition associativity is easily verified. This is a sum-ordered partially additive monoid with $0 < 1 < \infty$. However, if $a_k = 1$ for $k = 0, 1, 2, \dots$ then
$\sum(a_k \mid k \in F) = 1$ for each nonempty finite subset $F$ of $N$ even though $\sum(a_k \mid k \in N) = \infty \not\le 1$. Thus, $(M,\Sigma)$ is not an additive domain.

The counterexamples 2 and 4 seem rather artificial. We now show that Pfn and Mfn are ordered and investigate further examples in the exercises.

5 Example. Pfn is an ordered partially additive category. To prove that $\mathrm{Pfn}(X,Y)$ is sum-ordered it is enough to observe the following.

6 Example. Any partially additive monoid in which $c + c$ is not defined unless $c = 0$ is sum-ordered. For if $a \le b$, $b \le a$ then $b = a + h$, $a = b + k$ so $a = b + k = a + (h + k) = a + (h + k) + (h + k)$ so that $h + k = 0$; hence, $h = 0 = k$ and $a = b$.

Indeed the sum-ordering on $\mathrm{Pfn}(X,Y)$ is just the extension ordering of 2.1.9 as is clear from the definitions. Hence, if $\sum a_i$ exists, $\sum a_i = \bigvee a_i$ (see 6.1.5) so that the axiom in 3 is obviously true.

7 Example. Mfn is an ordered partially additive category. Here the sum-ordering on $\mathrm{Mfn}(X,Y)$ is just the usual ordering, $f \le g$ if $f(x) \subset g(x)$ for all $x \in X$, and $\sum a_i = \bigcup a_i$, so the axiom of 3 is clear.

We now establish some theory for arbitrary additive domains, beginning by showing that these are domains.

8 Theorem. An additive domain $(M,\Sigma)$ is a domain under the sum-ordering $\le$.

PROOF. By definition, $(M,\le)$ is a poset. 0 is the least element since $a = 0 + a$ for all $a$. Now let $a_0 \le a_1 \le a_2 \le \cdots$ be an ascending chain. Then there exist $x_k$ ($k = 0, 1, 2, \dots$) with

9  $$a_{k+1} = a_k + x_k \quad (k \ge 0).$$

It follows that

10  $$a_{k+1} = a_0 + \sum_{i=0}^{k} x_i \quad (k \ge 0)$$

since this is clear for $k = 0$ and the inductive step is
$$a_{k+2} = a_{k+1} + x_{k+1} = a_0 + \sum_{i=0}^{k} x_i + x_{k+1}.$$
But then

11  $$a = a_0 + \sum_{i=0}^{\infty} x_i$$
exists by the limit axiom since every finite subsum is a subsum of a sum of form 10. We will show $a = \bigvee a_k$. That each $a_k \le a$ is clear from 10 since
$$a = a_{k+1} + \sum_{i=k+1}^{\infty} x_i.$$
Now suppose $a_k \le b$ for all $k$. Thus, by 10, every finite subsum of 11 is $\le b$ so that $a \le b$ by Definition 3. □

We next prove two important results that relate additive constructions to the continuity of morphisms.

12 Theorem. Any sum of continuous maps is continuous, that is, if $(M,\Sigma)$, $(M',\Sigma')$ are additive domains, if $h_n: (M,\le) \to (M',\le)$ is continuous for $n \in I$, and if $h(a) = \sum(h_n(a) \mid n \in I)$ is defined for all $a \in M$, then $h$ is continuous.

PROOF. If $(a_n)$ is an ascending chain with supremum $a$ in $(M,\Sigma)$, it follows from the continuity of each $h_n$ and 10 and 11 that there exist $x_{n,i}$ with
$$h_n(a_{k+1}) = h_n(a_0) + \sum_{i=0}^{k} x_{n,i}, \qquad h_n(a) = h_n(a_0) + \sum_{i=0}^{\infty} x_{n,i}.$$
Thus,
$$h(a_{k+1}) = \sum_{n \in I} h_n(a_{k+1}) = \sum_{n \in I} h_n(a_0) + \sum_{n \in I}\sum_{i=0}^{k} x_{n,i} \quad \text{(partition associativity)}$$
$$= h(a_0) + \sum_{i=0}^{k} x_i,$$
where we define $x_i$ to be the subsum
$$x_i = \sum_{n \in I} x_{n,i}$$
of $\sum_{n \in I}\sum_{i=0}^{\infty} x_{n,i}$. Hence, by the proof of Theorem 8,
$$\bigvee h(a_k) = h(a_0) + \sum_{i=0}^{\infty} x_i.$$
But then
$$h(a) = \sum_{n \in I} h_n(a) = \sum_{n \in I} h_n(a_0) + \sum_{n \in I}\sum_{i=0}^{\infty} x_{n,i} = h(a_0) + \sum_{i=0}^{\infty} x_i = \bigvee h(a_k)$$
as desired. □

This leads to the following theorem:

13 Theorem. Let $(M,\Sigma,H)$ be a PAR scheme for which $(M,\Sigma)$ is an additive domain. Then $\psi_H: (M,\le) \to (M,\le)$ is continuous if $\le$ is the sum-ordering.

PROOF. By definition, $\psi_H(x) = \sum H_m(x,\dots,x)$ with each $H_m$ $m$-additive. By Theorem 12 it suffices to show that $g_m(a) = H_m(a,\dots,a)$ is continuous. For $m = 0$ this amounts to observing that a constant map is continuous, which is clear. Now assume $m \ge 1$. Let $a = \bigvee a_k$ in $(M,\le)$ so that, with minor notational changes in 10 and 11, there exist $x_k$ with

14  $$a_k = a_{k-1} + x_k \quad (k \ge 1), \qquad a = a_0 + \sum_{k=1}^{\infty} x_k.$$

To prove $g_m$ continuous we must find $y_k$ with

15  $$g_m(a_k) = g_m(a_{k-1}) + y_k \quad (k \ge 1), \qquad g_m(a) = g_m(a_0) + \sum_{k=1}^{\infty} y_k.$$

First, observe that 10 takes the form

16  $$a_k = \sum_{i=0}^{k} x_i$$

if $x_0$ is defined to be $a_0$. To discover $y_k$, evaluate $g_m(a_k)$ using 16 and invoking 8.2.4. For example,

17  $$g_m(a_1) = H_m(x_0 + x_1,\dots,x_0 + x_1) = H_m(x_0,\dots,x_0) + \sum(H_m(x_{i_1},\dots,x_{i_m}) \mid (i_1,\dots,i_m) \in I_1) = g_m(a_0) + y_1,$$

where $y_1$ is the $I_1$-indexed sum, $I_1$ being the set of all $(i_1,\dots,i_m)$ with $0 \le i_j \le 1$ and at least one $i_j = 1$. It is then not hard to guess that we should try
18  $$y_k = \sum(H_m(x_{i_1},\dots,x_{i_m}) \mid (i_1,\dots,i_m) \in I_k)$$

with $I_k = \{(i_1,\dots,i_m) \mid 0 \le i_j \le k, \text{ at least one } i_j = k\}$. The existence of the sum in 18 is clear since, using 16 and 8.2.6, it is a subsum of the expansion of $\psi_H(a_k)$. We turn now to showing that the $y_k$ of 18 satisfy 15. We first show $g_m(a_k) = g_m(a_{k-1}) + y_k$. The case $k = 1$ was handled in 17. Proceeding inductively, assume $g_m(a_n) = g_m(a_{n-1}) + y_n$ holds for all $1 \le n \le k$. By 10,
$$g_m(a_k) = g_m(a_0) + \sum_{t=1}^{k} y_t.$$
We then have
$$g_m(a_{k+1}) = H_m\Big(\sum_{u=0}^{k+1} x_u,\dots,\sum_{u=0}^{k+1} x_u\Big)$$
$$= g_m(a_0) + \sum(H_m(x_{i_1},\dots,x_{i_m}) \mid 0 \le i_j \le k+1, \text{ not all } i_j = 0)$$
$$= g_m(a_0) + \sum(H_m(x_{i_1},\dots,x_{i_m}) \mid 0 \le i_j \le k, \text{ not all } i_j = 0) + \sum(H_m(x_{i_1},\dots,x_{i_m}) \mid (i_1,\dots,i_m) \in I_{k+1})$$
$$= \Big(g_m(a_0) + \sum_{t=1}^{k} y_t\Big) + y_{k+1}.$$
Finally, noting $a = \sum_{k=0}^{\infty} x_k$ as $a_0 = x_0$,
$$g_m(a) = H_m\Big(\sum_{k=0}^{\infty} x_k,\dots,\sum_{k=0}^{\infty} x_k\Big) = g_m(a_0) + \sum_{k=1}^{\infty} y_k.$$
□

We conclude the section with a general result which, when applied to recursive specifications on $\mathrm{Pfn}(X,Y)$, guarantees that the pattern-of-calls expansion always gives the Kleene semantics.

19 Theorem. Let $(M,\Sigma,H)$ be a PAR scheme (8.1.6) with $(M,\Sigma)$ an additive domain (as in 3). Then the pattern-of-calls semantics

20  $$e_H = \sum_{s \in e} s_H$$

of 8.2.5 coincides with the Kleene semantics

21  $$\bigvee_{n=0}^{\infty} \psi_H^n(0)$$

of 6.2.3.
PROOF. Let $\mathcal{P}$ be the category of PAR schemes of 8.2.1 and let $\mathcal{P}_\le$ be the full subcategory of all $(M,\Sigma,H)$ in $\mathcal{P}$ with $(M,\Sigma)$ an additive domain. Now the initial object $(A,\hat\Sigma,\hat H)$ of $\mathcal{P}$ of 8.2.7-10 is clearly in $\mathcal{P}_\le$ since the sum-ordering is inclusion of subsets of $e$ whereas $\hat\Sigma$ is union. Thus, $(A,\hat\Sigma,\hat H)$ is the initial object of $\mathcal{P}_\le$ so, by the same proof as in Section 8.2, based on Theorem 7..9, 20 provides $\mathcal{P}_\le$ with a unique canonical fixed point. But 21 is a fixed point on $(M,\Sigma,H)$ by Theorem 6.2.13 since, by Theorem 13, $\psi_H$ is continuous. Furthermore, 21 is a canonical fixed point of $\mathcal{P}_\le$ since for each morphism $\varphi: (M,\Sigma,H) \to (M',\Sigma',H')$,
$$\varphi\Big(\bigvee_{n=0}^{\infty} \psi_H^n(0)\Big) = \bigvee_{n=0}^{\infty} \varphi\psi_H^n(0) \quad \text{(Theorem 13)}$$
$$= \bigvee_{n=0}^{\infty} \psi_{H'}^n\varphi(0) \quad \text{(8.2.1: } \varphi\psi_H = \psi_{H'}\varphi\text{)}$$
$$= \bigvee_{n=0}^{\infty} \psi_{H'}^n(0),$$
where $\varphi(0) = 0$ because, since $\varphi$ is additive, it preserves all sums including the empty one. Since 20 is the unique canonical fixed point, whereas 21 is a canonical fixed point, 20 and 21 coincide. □

EXERCISES FOR SECTION 8.3

1. Show that the partially additive category FwR(M,o,e) of Exercise 3.2.11 is ordered.

2. Show that the partially additive category $\mathrm{Pfn}_Y$ of Exercise 4.2.13 is ordered.

3. Show that $\psi: \mathrm{Pfn}(N,N) \to \mathrm{Pfn}(N,N)$, $\psi(h)$ = if $n = 0$ then 0 else $h^n(n - 1)$, is a power series specification with $e_H(n) = 0$ for all $n$. [Hint: $DD(H_n(a_1,\dots,a_n))$ is a one-element set; $\psi$ is not a polynomial.]

4. Show that if $X$ is the poset with Hasse diagram

[Figure: Hasse diagram of $X$.]

and $(a_i)$ is summable if $\bigvee a_i$ exists with $\sum a_i = \bigvee a_i$, then $(X,\Sigma)$ is not a partially additive monoid.

5. Let $(P,\le)$ be a consistently complete poset (Exercise 6.1.7) and say that $(a_i)$ is summable if $(a_i)$ is consistent with $\sum a_i = \bigvee a_i$. Show that $(P,\Sigma)$ is an additive domain whose sum-ordering coincides with the original one.
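The agreement asserted by Theorem 19 can be observed numerically on a small polynomial scheme in $\mathrm{Pfn}(N,N)$. The following Python sketch is a hedged illustration, not an example from the text: the specification $\psi(h)(n)$ = if $n = 0$ then 1 else $n \cdot h(n-1)$ (so $H_0$ is defined only at 0 and $H_1(h)$ only at $n > 0$) is a hypothetical choice, partial functions are represented as dictionaries, and the names `psi`, `kleene`, and `chain_term` are assumptions. The sum of partial functions with disjoint domains is the union of the dictionaries.

```python
def psi(h, bound):
    """psi(h)(n) = 1 if n == 0 else n * h(n - 1), restricted to 0 <= n < bound.

    The partial function h is a dict; a missing key means undefined.
    """
    out = {0: 1}                               # H_0
    for n in range(1, bound):
        if n - 1 in h:                         # H_1(h) is defined only where h(n - 1) is
            out[n] = n * h[n - 1]
    return out


def kleene(k, bound):
    """The k-th Kleene approximant psi^k(0), starting from the empty function."""
    h = {}
    for _ in range(k):
        h = psi(h, bound)
    return h


def chain_term(k):
    """s_H for the chain with k unary nodes omega_1 above omega_0."""
    h = {0: 1}                                 # (omega_0)_H = H_0
    for _ in range(k):                         # apply H_1 alone, without adding H_0
        h = {n + 1: (n + 1) * v for n, v in h.items()}
    return h


expansion = {}
for k in range(4):
    term = chain_term(k)
    assert not set(term) & set(expansion)      # domains are disjoint, so the sum exists
    expansion.update(term)
print(expansion)
print(expansion == kleene(4, 10))
```

Each chain contributes the value at exactly one argument, and the partial sums over chains of length less than $k$ coincide with the approximant $\psi^k(0)$, in line with the equality of 20 and 21.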
8.4 Proving Correctness

In this brief section we specialize to the additive domain $\mathrm{Pfn}(X,Y)$, establishing and illustrating some proof rules for specifications

1  $$\psi_H: \mathrm{Pfn}(X,Y) \to \mathrm{Pfn}(X,Y)$$

for a PAR scheme $(\mathrm{Pfn}(X,Y),\Sigma,H)$ which use both the ordered and the partially additive structure on $\mathrm{Pfn}(X,Y)$. We use the notations of previous sections of this chapter and 1 without further comment.

2 Tree Induction Rule (Partial Correctness). Let $g \in \mathrm{Pfn}(X,Y)$. To prove $e_H \le g$ it is necessary and sufficient to prove that for all $n \ge 0$ and $s_1,\dots,s_n \in e$ with $s_i^H \le g$ for $i = 1,\dots,n$ we have $(\omega_n[s_1,\dots,s_n])_H \le g$.

PROOF. If $e_H \le g$ then $s_H \le e_H \le g$ for all $s \in e$, for $s = \omega_n[s_1,\dots,s_n]$ in particular. Conversely, setting $n = 0$ yields $\omega_0^H \le g$ so that, by induction, $s_H \le g$ for all $s \in e$. Using the special fact (see Exercise 1) 6.1.9 that sum and supremum coincide in $\mathrm{Pfn}(X,Y)$,
$$e_H = \sum(s_H : s \in e) = \bigvee(s_H : s \in e) \le g.$$
□

Note that the tree induction rule requires a guess $g$ for the semantics to be given first. To be useful, $g$ should be in "closed form." It seems unlikely that the tree induction rule would contribute useful information, say, about Ackermann's function.

3 Disjointness Lemma. If $x \in X$, then $s_H(x)$ is defined for at most one $s \in e$.

4 Termination Lemma. If for each $x$ in $X$ there exists $s \in e$ with $s_H(x)$ defined, then $e_H$ is total.

5 Example (Partial Correctness by Tree Induction). The "91 function" given by the recursive definition
$$f(x) := \text{if } x > 100 \text{ then } x - 10 \text{ else } f(f(x + 11))$$
was analyzed in Exercise 5.1.9 where ordered semantics was used to show that the Kleene semantics is
$$g(x) := \text{if } x > 100 \text{ then } x - 10 \text{ else } 91.$$
We here illustrate the tree induction rule 2 by showing that if $f(x)$ is defined then $f(x) = g(x)$. In other words, we must prove $f \le g$. This is a partial correctness proof since we say nothing about those $x$ not in $DD(f)$. The partially additive fixed point equation on $\mathrm{Pfn}(N,N)$ is
$$f = \psi(f) = H_0 + H_2(f,f),$$
where
$H_0$ = if $x > 100$ then $x - 10$ else undefined,
$H_2(s,t) = stu$ where $u$ = if $x \le 100$ then $x + 11$ else undefined.

We must apply 2 for $n = 0$ or $n = 2$. For $n = 0$, $H_0 \le g$ is clear. For $n = 2$, we assume $s$, $t$ in PC(H) satisfy $s, t \le g$ and show that $stu \le g$, that is, we show that if $stu(x)$ is defined, then $g(x) = stu(x)$. Now, if $stu(x)$ is defined then both $u(x)$ and $tu(x)$ are defined and we have $u(x) = x + 11$ and $x \le 100$; and, because $t \le g$,
$$tu(x) = \begin{cases} (x + 11) - 10 = x + 1 & \text{if } x + 11 > 100 \\ 91 & \text{else.} \end{cases}$$
Because $s \le g$,
$stu(x) = (x + 1) - 10$ and $x + 1 > 100$, or $stu(x) = 91$,
that is,
$stu(x) = x - 9$ and $x = 100$, or $stu(x) = 91$,
that is,
$stu(x) = 91$.
Thus, $stu \le g$, as was to be shown, and hence $f \le g$.

6 Example (Total Correctness by Exhaustion). Consider again $f$, $g$, $H$, $H_0$, $u$ as in 5. We will show directly that $g = \sum(s_H : s \in e)$ so that, in fact, $f(x) = g(x)$ for all $x$. This is now a total correctness proof, since we characterize the behavior of $f$ for all $x$. We must thus apply 4, showing that for each $x$ there exists $s \in e$ with $s_H(x) = g(x)$.

Case 1. $x > 100$. Then $H_0(x) = g(x)$.

Case 2. $90 \le x \le 100$. To begin, observe
$H_0u(x) = H_0(\text{if } x \le 100 \text{ then } x + 11 \text{ else } \bot)$
$= \text{if } x \le 100 \wedge x + 11 > 100 \text{ then } x + 11 - 10 \text{ else } \bot$
$= \text{if } 90 \le x \le 100 \text{ then } x + 1 \text{ else } \bot.$
Next, claim that if $t_k = H_0(H_0u)^k$ then
$$t_k(x) = \text{if } x = 101 - k \text{ then } 91 \text{ else } \bot$$
for $1 \le k \le 11$. For $k = 1$,
$t_1(x) = H_0(H_0u)(x)$
$= H_0(\text{if } 90 \le x \le 100 \text{ then } x + 1 \text{ else } \bot)$
$= \text{if } 90 \le x \le 100 \wedge x + 1 > 100 \text{ then } x + 1 - 10 \text{ else } \bot$
$= \text{if } x = 100 \text{ then } 91 \text{ else } \bot.$
For the inductive step,
$t_{k+1}(x) = t_k(H_0u)(x)$
$= t_k(\text{if } 90 \le x \le 100 \text{ then } x + 1 \text{ else } \bot)$
$= \text{if } 90 \le x \le 100 \wedge x + 1 = 101 - k \text{ then } 91 \text{ else } \bot$
$= \text{if } x + 1 = 101 - k \text{ then } 91 \text{ else } \bot$ (as $1 \le k \le 11$)
$= \text{if } x = 101 - (k + 1) \text{ then } 91 \text{ else } \bot.$
But $t_k = s_H$ if $s = s_k$, where $s_1 = \omega_2[\omega_0,\omega_0]$ and $s_{k+1} = \omega_2[s_k,\omega_0]$ (since $t_{k+1} = t_kH_0u = H_2(t_k,H_0)$), and for $90 \le x \le 100$, $g(x) = t_{101-x}(x)$.

Case 3. $0 \le x < 90$. For any such $x$ there exists a unique $a$ with $1 \le a \le 9$, $x + 11a \le 100$, $x + 11(a + 1) > 100$. It is clear that $u^a(x) = x + 11a$. As $90 \le x + 11a \le 100$, writing $x + 11a = 101 - k$ with $1 \le k \le 11$, we have
$$t_ku^a(x) = 91.$$
But as $t_{10}(91) = 91$, $t_{10}^m t_k u^a(x) = 91$ for any $m$. But $t_{10}^a t_k u^a$ has form $s_H$ since
$$t_{10}^a t_k u^a = H_2(t_{10}, H_2(t_{10}, \dots, H_2(t_{10}, t_k)\dots)) \quad (a\ H_2\text{'s}).$$

EXERCISES FOR SECTION 8.4

1. Show that the tree induction rule 2 generalizes straightforwardly to any ordered partially additive category. Specifically relate the "special fact" discussed in 2 to the definition of an additive domain.

2. Consider the power series $H: \mathrm{Pfn}(N,N) \to \mathrm{Pfn}(N,N)$ of Exercise 5.2.7 for the recursive specification
$$\psi_H(h) = \text{if } n = 0 \text{ then } 0 \text{ else } 1 + h(h(n - 1)).$$
Let $f$ be the pattern-of-calls expansion of $H$. Use the tree induction rule 2 to show $f \le \mathrm{id}_N$. Then use the termination lemma 4 to conclude $f = \mathrm{id}_N$.

3. In 5.2.17 it was shown that the pattern-of-calls expansion $f$ for the fixed point equation
$$f(x) = \text{if } p(x) \text{ then } f(f(g(x))) \text{ else } x$$
is
$$f(x) = \text{while } p(x) \text{ do } g(x).$$
Establish that $f \le$ while $p$ do $g$ using the tree induction rule 2.

4. Establish the results of 6.2.17 using the proof rules of this section.

5. Establish the result of Exercise 8.3.3 by using the proof rules of this section.

8.5 Power Series and Products

In this final section we address some technical problems that would arise naturally in using PAR schemes to define the formal semantics of recursion for a programming language. The results established here are adequate to provide the semantics of FPF in Pfn as is established in Exercises 6-11. The proof of the corresponding results for Kleene semantics is a good deal simpler. Nonetheless, the PAR-scheme approach does provide a tighter setting because Theorems 8.3.13 and 8.3.19 guarantee that PAR-scheme semantics is Kleene semantics whereas no converse result is known at the current time.

The first new idea is that of a "power-series" scheme which is a PAR scheme with an additional property. All the PAR schemes arising, say, in FPF are power-series schemes. The initial PAR scheme of 8.2.7 is a power-series scheme so that the canonical expansion Theorem 8.2.12 goes through restricted to power-series schemes. Hence, power-series schemes are not unduly restrictive.

A "power-series map" is $\psi_H: M \to M$ for $(M,\Sigma,H)$ a power-series scheme. In FPF we would surely require for the map $\mathrm{constr}_m$ of 8.1.5 that if $\psi_{H_1},\dots,\psi_{H_m}$ are power-series maps $D \to D$ then so is $\mathrm{constr}_m(\psi_{H_1},\dots,\psi_{H_m}): D \to D$.
In general, we will define a "strongly $m$-additive" map to be one which converts $m$ power-series-map inputs to a power-series output, and we will establish a workable criterion to prove that an $m$-additive map is strongly so. In practice, it appears that programming language constructors are strongly $m$-additive maps. We note that a PAR scheme $(M,\Sigma,H)$ for which all countable families are summable is necessarily a power-series scheme and then all $m$-additive maps $M^m \to M$ are necessarily strongly $m$-additive. Thus, the technical issues addressed by power-series schemes and strongly $m$-additive maps have to do with summability.
The third concept needed is a suitable product of partially additive monoids analogous to the product domain of 6.1.17. This is easily given and is useful for simultaneous recursions such as 5.2.18. But a new use for products arises. For domains $D_1, \ldots, D_m, D$ a function $F: D_1 \times \cdots \times D_m \to D$ is continuous (viewing the product $D_1 \times \cdots \times D_m$ as a single domain) if and only if it is "m-continuous" (= separately continuous, see Exercise 6.3.7), which explains why "m-continuity" was not a needed concept earlier. The corresponding result for partially additive monoids is false. Thus if, say, $M_1, M_2, M_3, M$ are partially additive monoids, a considerable number of distinct possibilities arise for a function $f: M_1 \times M_2 \times M_3 \to M$. Not only may it be additive or 3-additive but it may, say, be 2-additive considered as $M_1 \times (M_2 \times M_3) \to M$. This third possibility is exactly what happens for the if-then-else map of 6.3.18, as we show below in 15.

To motivate power-series schemes, recall the opening remarks of Section 8.1. In a PAR scheme $(M, \Sigma, H)$ we think of $H_m(h_1, \ldots, h_m)$ as the sum of all composition m-substitution paths with $h_j$ replacing the $j$th occurrence of a function variable. Thus, while we have heretofore required only that $\sum H_m(x, \ldots, x)$ exists for all $x$, it would be just as reasonable to require the existence of $\sum H_m(x_{m1}, \ldots, x_{mm})$ no matter how $x_{11}$; $x_{21}, x_{22}$; $x_{31}, x_{32}, x_{33}$; $\ldots$ are chosen in $M$. For if $j \ne k$, the j-substitution paths are guarded from overlap with the k-substitution paths regardless of which functions replace the variables. This stronger assumption is the basis of the definitions to which we now turn.

1 Definitions. Let $(M, \Sigma)$, $(M', \Sigma')$ be partially additive monoids.
A power series $(M, \Sigma) \to (M', \Sigma')$ is a family $H = (H_m \mid m \ge 0)$ of m-additive maps $H_m: M^m \to M'$ such that for all $m \ge 0$, the sum

$$\sum_{0 \le j \le m} H_j(-, \ldots, -) \tag{2}$$

exists, regardless of how the $j$ arguments for each $H_j$ (indicated by $(-)$) are chosen. The power-series map of $H$ is then

$$\psi_H(x) = \sum_{m \ge 0} H_m(x, \ldots, x), \tag{3}$$

which sum necessarily exists by the limit axiom on $(M', \Sigma')$ since each finite subsum is a subsum of a sum of form 2. A power series $H$ is a polynomial if $H_m = 0$ except for finitely many $m$. In this case the largest $m$ with $H_m \ne 0$ is the degree. If $H$ is a polynomial, $\psi_H$ is the polynomial map of $(H_m)$.

4 A power-series scheme is $(M, \Sigma, H)$, where $H: (M, \Sigma) \to (M, \Sigma)$ is a
power series, and for such $H$, $\psi_H: M \to M$ is called a power-series recursive specification.

The reader should pause to spot-check that our previous examples of PAR schemes are all power-series schemes. The promised definition of strong m-additivity is then as follows.

5 Definition. Let $(M', \Sigma')$, $(M_1, \Sigma_1), \ldots, (M_m, \Sigma_m)$, $(M, \Sigma)$ be partially additive monoids and let $L: M_1 \times \cdots \times M_m \to M$. Then $L$ is strongly m-additive if $L$ is m-additive and if whenever $H_t = (H_{tj} \mid j \ge 0)$ are power series $(M', \Sigma') \to (M_t, \Sigma_t)$ $(1 \le t \le m)$ then there exists a power series $H: (M', \Sigma') \to (M, \Sigma)$ with

$$\psi_H(h') = L(\psi_{H_1}(h'), \ldots, \psi_{H_m}(h')). \tag{6}$$

We know of no "natural" examples of m-additive maps which are not strongly m-additive, although we conjecture that counterexamples do exist. The fact that one is hard to find is a good sign for the power-series approach. We now turn to develop a criterion (8 below) which makes it possible to prove that many m-additive maps are strongly m-additive. The following is a mild generalization of Lemma 8.2.4, so the proof is omitted.

7 Lemma. Let $(M_1, \Sigma_1), \ldots, (M_m, \Sigma_m)$, $(M, \Sigma)$ be partially additive monoids and let $L: M_1 \times \cdots \times M_m \to M$ be m-additive. Then whenever $(h_{ij} \mid j \in J_i)$ is a summable family in $(M_i, \Sigma_i)$, $(L(h_{1j_1}, \ldots, h_{mj_m}) \mid j_i \in J_i)$ is a summable family in $(M, \Sigma)$ and

$$L\Big(\sum_{j_1 \in J_1} h_{1j_1}, \ldots, \sum_{j_m \in J_m} h_{mj_m}\Big) = \sum_{j_i \in J_i} L(h_{1j_1}, \ldots, h_{mj_m})$$

(where we write $\Sigma$ for $\Sigma_i$ to avoid notational confusion).

8 Theorem. Let $(M_1, \Sigma_1), \ldots, (M_m, \Sigma_m)$, $(M, \Sigma)$ be partially additive monoids and let $L: M_1 \times \cdots \times M_m \to M$ be m-additive. Suppose that whenever $H_t = (H_{tj} \mid j \ge 0)$ is a power series $(M', \Sigma') \to (M_t, \Sigma_t)$ $(1 \le t \le m)$ then for all $k \ge 0$, the sum

$$\sum_{j_1 + \cdots + j_m \le k} L(H_{1j_1}(-), \ldots, H_{mj_m}(-)) \tag{9}$$

exists, regardless of how the arguments, indicated by $(-)$, are chosen. (Different arguments may be chosen in different terms.) Then $L$ is strongly m-additive.

PROOF. Define
$$D_k(-) = \sum_{j_1 + \cdots + j_m = k} L(H_{1j_1}(-), \ldots, H_{mj_m}(-)).$$

This sum exists, being a subsum of a sum of form 9. $D = (D_k)$ is a power series because

$$\sum_{0 \le j \le k} D_j(-) = \sum_{0 \le j \le k} \; \sum_{j_1 + \cdots + j_m = j} L(H_{1j_1}(-), \ldots, H_{mj_m}(-)) = \sum_{j_1 + \cdots + j_m \le k} L(H_{1j_1}(-), \ldots, H_{mj_m}(-))$$

is a sum of form 9. Finally,

$$\begin{aligned}
L(\psi_{H_1}(h'), \ldots, \psi_{H_m}(h')) &= L\Big(\sum_{j_1 \ge 0} H_{1j_1}(h', \ldots, h'), \ldots, \sum_{j_m \ge 0} H_{mj_m}(h', \ldots, h')\Big) \\
&= \sum_{j_1 + \cdots + j_m \ge 0} L(H_{1j_1}(h', \ldots, h'), \ldots, H_{mj_m}(h', \ldots, h')) \quad \text{(by 7)} \\
&= \sum_{k \ge 0} D_k(h', \ldots, h') = \psi_D(h'). \qquad \Box
\end{aligned}$$

11 Corollary. If in $(M, \Sigma)$ every countable family is summable, every m-additive map $M_1 \times \cdots \times M_m \to M$ is strongly m-additive.

12 Example. Let C be a partially additive category such as Pfn or Mfn in which any family in which each two-element subfamily is summable is itself summable. Let $(h_1, \ldots, h_m) \mapsto h_m \cdots h_1$ be the composition map of 8.1.3. This is already known to be m-additive. It is in fact strongly m-additive. To check 9, we must verify that

$$\sum_{j_1 + \cdots + j_m \le k} f_{j_m} \cdots f_{j_1}$$

exists, where $f_{j_t}$ has the form $H_{tj_t}(-)$. It suffices to show that $f_{j_m} \cdots f_{j_1} + f_{u_m} \cdots f_{u_1}$ is defined whenever $(j_1, \ldots, j_m) \ne (u_1, \ldots, u_m)$. Let $t$ be least with $f_{j_t} \ne f_{u_t}$. Then $f_{j_{t-1}} \cdots f_{j_1} = g = f_{u_{t-1}} \cdots f_{u_1}$ ($= \mathrm{id}_{X_0}$ if $t = 1$). Then $f_{j_t} + f_{u_t}$ exists because $H_t$ is a power series. By 3.2.1 and 3.2.20,

$$f_{j_m} \cdots f_{j_1} + f_{u_m} \cdots f_{u_1} = (f_{j_m} \cdots f_{j_{t+1}}) f_{j_t} g + (f_{u_m} \cdots f_{u_{t+1}}) f_{u_t} g$$

exists, as desired. The hypothesis on C is not necessary. See Exercise 5.

The promised definition of the product of partially additive monoids is as follows.

13 Definition. Let $(M_1, \Sigma_1), \ldots, (M_m, \Sigma_m)$ be partially additive monoids, $m > 0$. Then their product partially additive monoid $(M, \Sigma)$ is defined as follows: $M = M_1 \times \cdots \times M_m$ (the product set, 2.3.1, 2.3.11).
A family $((h_{1i}, \ldots, h_{mi}) \mid i \in J)$ is summable in $(M, \Sigma)$ if for each $j \in \{1, \ldots, m\}$, $(h_{ji} \mid i \in J)$ is summable in $(M_j, \Sigma_j)$, and then

$$\sum (h_{1i}, \ldots, h_{mi}) = \Big(\sum\nolimits_1 (h_{1i}), \ldots, \sum\nolimits_m (h_{mi})\Big),$$

that is, "sum independently on each coordinate."

To see that $(M, \Sigma)$ satisfies the limit axiom, suppose that every finite subfamily of $((h_{1i}, \ldots, h_{mi}) \mid i \in J)$ is summable. Then if $F$ is any finite subset of $J$, $\sum((h_{1i}, \ldots, h_{mi}) \mid i \in F)$ exists, so that for any $j$, $\sum_j (h_{ji} \mid i \in F)$ exists. As $F$ is arbitrary and $(M_j, \Sigma_j)$ satisfies the limit axiom, $(h_{ji} \mid i \in J)$ is summable. But then, as $j$ is arbitrary, $((h_{1i}, \ldots, h_{mi}) \mid i \in J)$ is summable. The remainder of the verification that $(M, \Sigma)$ is a partially additive monoid is similar and is left as an exercise.

We have already warned the reader that there are numerous additivity possibilities for a map $L: M_1 \times \cdots \times M_m \to M$. We now explore some examples.

14 Example. The 2-additive composition map $\mathrm{Pfn}(X, Y) \times \mathrm{Pfn}(Y, Z) \to \mathrm{Pfn}(X, Z)$ of 8.1.3 is not additive, since it is false that $(g + g')(f + f') = gf + g'f'$ as additivity with respect to the product monoid structure would require. Indeed, by 2-additivity we have

$$(g + g')(f + f') = g(f + f') + g'(f + f') = gf + gf' + g'f + g'f',$$

which differs from $gf + g'f'$ if $gf' \ne 0$ or $g'f \ne 0$.

15 Example. The map if-then-else of 6.3.18 is 2-additive considered as $D \times D^2 \to D$, where $D^2 = D \times D$ is the product partially additive monoid. This amounts to

$$\text{if } f \text{ then } g_1 + g_2 \text{ else } h_1 + h_2 = (\text{if } f \text{ then } g_1 \text{ else } h_1) + (\text{if } f \text{ then } g_2 \text{ else } h_2) \tag{16}$$

and

$$\text{if } (f_1 + f_2) \text{ then } g \text{ else } h = (\text{if } f_1 \text{ then } g \text{ else } h) + (\text{if } f_2 \text{ then } g \text{ else } h). \tag{17}$$

To see that 16 holds, observe that if $f$ is true both sides will apply whichever of $g_1$, $g_2$ (if either) is defined, and similarly if $f$ is false. That 17 holds is obvious since at most one of $f_1$, $f_2$ is defined.
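The identities in Examples 14 and 15 can be spot-checked on small finite partial functions. The following is a minimal Python sketch, not part of the formal development. It assumes partial functions are modeled as dictionaries and that, as in Pfn, a sum exists exactly when the domains of definition are disjoint. The helper names `psum`, `compose`, and `if_then_else` are hypothetical, and the sample data are invented for illustration.

```python
def psum(f, g):
    """Sum of two partial functions (dicts); None if the domains overlap."""
    if f.keys() & g.keys():
        return None  # the sum is undefined
    return {**f, **g}


def compose(g, f):
    """Composite 'g after f' of partial functions given as dicts."""
    return {x: g[y] for x, y in f.items() if y in g}


def if_then_else(f, g, h):
    """f is a partial truth-valued function; apply g where f is true, h where false."""
    out = {}
    for x, truth in f.items():
        branch = g if truth else h
        if x in branch:
            out[x] = branch[x]
    return out


# Example 14: composition is 2-additive but not additive.
f1, f2 = {0: 0}, {1: 1}          # f and f' (disjoint domains)
g1, g2 = {1: "a"}, {0: "b"}      # g and g' (disjoint domains)

lhs = compose(psum(g1, g2), psum(f1, f2))
four_terms = psum(psum(compose(g1, f1), compose(g1, f2)),
                  psum(compose(g2, f1), compose(g2, f2)))
two_terms = psum(compose(g1, f1), compose(g2, f2))

# Identity 16 of Example 15 on sample data.
p = {0: True, 1: False}
a1, a2 = {0: "u"}, {1: "v"}
b1, b2 = {1: "w"}, {0: "z"}
left_16 = if_then_else(p, psum(a1, a2), psum(b1, b2))
right_16 = psum(if_then_else(p, a1, b1), if_then_else(p, a2, b2))
```

Here `lhs` agrees with the four-term expansion but not with the two-term sum, because the cross terms $gf'$ and $g'f$ are nonzero for this data.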
If if-then-else were 3-additive $D^3 \to D$ then

$$\text{if } f \text{ then } g_1 + g_2 \text{ else } h = (\text{if } f \text{ then } g_1 \text{ else } h) + (\text{if } f \text{ then } g_2 \text{ else } h)$$

would hold. This cannot be true since if $f$ is false and $h$ is defined there is domain overlap on the right-hand side (both summands apply $h$), so that the sum is not defined.

EXERCISES FOR SECTION 8.5

1. If $(M, \Sigma) = (M_1, \Sigma_1) \times \cdots \times (M_m, \Sigma_m)$ as in 13 and $f: N_1 \times \cdots \times N_m \to M$ has components $f_i: N_1 \times \cdots \times N_m \to M_i$, where $(N_1, \Sigma'_1), \ldots, (N_m, \Sigma'_m)$ are partially additive monoids, show that $f$ is m-additive if and only if each $f_i$ is m-additive.

2. Show that 5.2.19 provides a power-series scheme whose polynomial map on the product partially additive monoid $\mathrm{Pfn}(X, X) \times \mathrm{Pfn}(X, X)$ is the recursive specification for the simultaneous recursion of 5.2.18.

3. Show that for additive domains, the product of 13 provides the domain product of 6.1.17.

4. Let $X = \{\cdot, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ and let $(M, \Sigma)$ be the product partially additive monoid $(L_X, \Sigma)^3$ for $L_X$ as in Exercise 8.2.3. Define a suitable power series $H$ so that $\psi_H: (L_X, \Sigma)^3 \to (L_X, \Sigma)^3$ is such that
(i) The fixed point equation $(E, N, D) = \psi_H(E, N, D)$ is
$E = \cdot EE + {-}E + N$,
$N = D + ND$,
$D = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$,
that is, "a digit is one of 0, ..., 9, a number is a digit or a number followed by a digit, an expression is a number, $-e$ or $\cdot e_1 e_2$ where $e$, $e_1$, $e_2$ are expressions."
(ii) $e_H = (\bar{E}, \bar{N}, \bar{D})$, where $\bar{D} = \{0, \ldots, 9\}$, $\bar{N}$ = nonempty strings in $\bar{D}^*$, and $\bar{E}$ is the set of prefix form arithmetic expressions in binary $\cdot$ and unary $-$ (prefix form means $e_1 \cdot (e_2 - (e_3 \cdot e_4))$ is written $\cdot e_1 - e_2 \cdot e_3 e_4$). [Hint: $H$ is a second-degree polynomial.]

5. Generalize Example 12 to any partially additive category. [Hint: For $m = 2$, using the notations of 12, show that $f_0(f_0 + \cdots + f_k) + \cdots + f_{k-1}(f_0 + f_1) + f_k f_0$ exists. Then use induction on $m$.]

Exercises 6-11 outline a proof that each FPF recursive specification $D^n \to D^n$, with $D = \mathrm{Pfn}(DTN, DTN)$ and $D^n$ the product partially additive monoid, is a power-series specification.
We follow the inductive definition of 6.3.10-21.
6. Show that $(h_1, \ldots, h_n) \mapsto h_i$ is additive $D^n \to D$ for each $i$.

7. Show that $\mathrm{constr}_m: D^m \to D$ is strongly m-additive.

8. Show that if $\psi_H, \psi_L, \psi_M: D^n \to D$ are power-series maps then so is if $\psi_H$ then $\psi_L$ else $\psi_M$. [Hint: if $\psi_H$ then $\psi_L$ else $\psi_M$ equals the $\psi_N$ whose $m$th term, $m \ge 0$, is given by […] and $h \in D^n$ is vector notation $h = (h_1, \ldots, h_n)$.]

9. If $\psi_H: D^n \to D$ is a power-series map show that $(\alpha\psi_H(h))\langle t_1, \ldots, t_k\rangle = \langle \psi_H(h)t_1, \ldots, \psi_H(h)t_k\rangle$ is a power-series map. [Hint: Consider […] where $\mathrm{pr}_i\langle t_1, \ldots, t_k\rangle = t_i$.]

10. Using Exercise 1, show that if $\psi_j: D^n \to D$ is a power-series map for $1 \le j \le k$ then $\psi: D^n \to D^k$, $\psi(h) = (\psi_1(h), \ldots, \psi_k(h))$ is also a power-series map.

11. Using the above exercises, Exercise 6.3.11, and the theory of this section, conclude that $\psi_P: D^n \to D^n$ of 6.3.21 is a power-series map. It follows from Theorem 8.3.19 above that the Kleene semantics of the $\psi_P$ (based on Exercises 6.3.4-10) and power-series semantics produce the same semantics for FPF in Pfn.

12. Show that power-series semantics and Kleene semantics for FPF in Mfn (Exercise 1.4.3) coincide. [Hint: The proofs used for Pfn are often trivialized because every countable family in $\mathrm{Mfn}(X, Y)$ is summable.]

13. Show that the initial PAR scheme of 8.2.7 is a power-series scheme. [Hint: The original proof that $\Omega_n \cap \Omega_m = \emptyset$ if $m \ne n$ works for arbitrary arguments.] Conclude that the canonical fixed point theorem 7.•9 applies to power-series schemes. [Hint: Review the proof of 8.2.12.]

Notes and References for Chapter 8

The theory of this chapter is due to the authors. The material of Sections 1-4 is adapted from their paper on the pattern-of-calls expansion as cited in the notes for Chapter 7. Section 5 is new. We thank Dana Scott for Exercise 8.3.5. A flawed version was attributed to him in the pattern-of-calls paper; the errors were ours, not his.
CHAPTER 9

Fixed Points in Metric Spaces

9.1 Contractions on Complete Metric Spaces
9.2 Differential Equations
9.3 Metrics on Trees
9.4 Context-Free Languages as Metric Fixed Points

A metric space is a set equipped with a function which assigns a numerical "distance" to each pair of points. These objects arise naturally in many areas of mathematics. In Section 9.1 we prove a classical result due (essentially) to S. Banach that under appropriate hypotheses, an endomorphism $\psi$ of a nonempty metric space has a unique fixed point $x$; indeed $x$ is the limit (in the sense of ever decreasing distance) of the sequence $x_0, \psi x_0, \psi^2 x_0, \ldots$ with any starting $x_0$. To illustrate the scope of this theorem we devote each of the remaining sections to an application. Section 9.2 establishes that an initial-value problem of ordinary differential equations has a unique solution. Section 9.3 introduces a tree-syntax for recursive specification in which repeated execution produces an infinite (but nonrecursive) syntax tree. The independence of the resulting infinite tree from "calling strategy" follows from the uniqueness assertion in the Banach theorem. Finally, we show in Section 9.4 that the language defined by a context-free grammar often arises as the unique fixed point of the Banach theorem.

The advantage of the Banach-theorem approach is that fixed points are unique. This also limits the scope of applicability since we have seen many fixed point equations in semantics (such as 5.1.4) which have more than one fixed point.

9.1 Contractions on Complete Metric Spaces

A "metric space" is a set equipped with a specific notion of "distance" between its elements. The concept of "limit," familiar for numbers from a calculus course, extends readily to metric spaces since "approaching x" can
be expressed in numerical terms by "distance to x approaches 0." If a sequence $x_1, x_2, x_3, \ldots$ approaches $x$ then in particular $x_n$, $x_m$ will approach each other as $m$, $n$ get large; a sequence with this latter property is called "Cauchy." In a "complete" metric space, every Cauchy sequence is required to approach a limit. The main result of this section is that in a complete metric space $X$ every function $\psi: X \to X$ which "shrinks distances" in the suitably precise sense of 19 has a unique fixed point.

1 Definition. Let $[0, \infty)$ denote the set of all real numbers $\ge 0$. A metric on a set $X$ is a function $d: X \times X \to [0, \infty)$ which satisfies the following four axioms for all $x, y, z \in X$:
(i) (Symmetry axiom) $d(x, y) = d(y, x)$.
(ii) (Triangle inequality) $d(x, y) + d(y, z) \ge d(x, z)$.
(iii) $d(x, x) = 0$.
(iv) If $d(x, y) = 0$ then $x = y$.
A metric space is a pair $(X, d)$ where $X$ is a set and $d$ is a metric on $X$. In a metric space, $d(x, y)$ is called "the distance between x and y."

2 Example. Let $X$ be the Earth's surface. The "neutrino metric" on $X$ is $d(x, y)$ = length of a straight line segment connecting $x$, $y$. (Neutrinos are subatomic particles which are so small and inert that they typically pass right through the Earth without interacting with any of its atoms.) The well-known "great-circle" metric used by airplanes is $e(x, y)$ = length of any great-circle arc connecting $x$ and $y$ "the short way."

3 Example. Let $X$ be the Euclidean plane $R \times R$ of all ordered pairs $(x, y)$ with $x$, $y$ real numbers. The Euclidean metric on $X$ is the usual distance function

$$d((x_1, x_2), (y_1, y_2)) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}.$$

The Manhattan metric on $X$, $|x_1 - y_1| + |x_2 - y_2|$, is another metric, one which reflects the distance along the streets of a city with east-west and north-south streets.

Examples 2 and 3 are of a "geometric" character with the elements of $X$ being "points."
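As a quick illustration of Definition 1 and Example 3, the two plane metrics can be compared on a handful of points. This is a minimal Python sketch under stated assumptions: the function names and the sample points are hypothetical, and checking the axioms on finitely many points is only a sanity test, not a proof.

```python
import itertools
import math


def euclid(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Manhattan distance: travel only along the two coordinate directions."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def satisfies_axioms(d, points, tol=1e-12):
    """Check axioms (i)-(iv) of Definition 1 on a finite sample of points."""
    for x, y, z in itertools.product(points, repeat=3):
        if abs(d(x, y) - d(y, x)) > tol:          # (i) symmetry
            return False
        if d(x, y) + d(y, z) < d(x, z) - tol:     # (ii) triangle inequality
            return False
        if d(x, x) != 0:                          # (iii)
            return False
        if d(x, y) == 0 and x != y:               # (iv)
            return False
    return True


sample = [(0, 0), (3, 4), (-1, 2), (5, -2)]
euclid_ok = satisfies_axioms(euclid, sample)
manhattan_ok = satisfies_axioms(manhattan, sample)
```

For the points (0, 0) and (3, 4) the Euclidean distance is 5 while the Manhattan distance is 7, so the two metrics on the same set genuinely differ.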
While such examples motivate much of the terminology used in metric spaces, they are by no means the only important examples. Two distinctly different types are provided by 4 and 6 below.

4 Example. Let $t_0 < t_1$ be real numbers and let $[t_0, t_1] = \{t \mid t_0 \le t \le t_1\}$. Define $X$ to be the set of all continuous functions $x: [t_0, t_1] \to R$. By the maximum value theorem, for each $x, y \in X$ there exists a number $t$ in $[t_0, t_1]$
maximizing $|x(t) - y(t)|$. Call this maximum value the distance $d(x, y)$ between $x$ and $y$, that is, define

$$d(x, y) = \operatorname{Max} |x(t) - y(t)|.$$

It is not hard to prove $(X, d)$ is a metric space. This example is studied further in Section 2 in connection with differential equations. Note that complex objects, namely, functions, are treated as mere "points" by this abstraction.

5 Definition. A non-Archimedean metric on $X$ is a function $d: X \times X \to [0, \infty)$ satisfying (i), (iii), and (iv) of 1 as well as

$$\operatorname{Max}(d(x, y), d(y, z)) \ge d(x, z).$$

Since $d(x, y) + d(y, z) \ge \operatorname{Max}(d(x, y), d(y, z))$ it is obvious that a non-Archimedean metric is a metric. None of the examples so far is non-Archimedean. The following proposition provides a non-Archimedean metric which we apply to the theory of context-free languages in Section 4.

6 Proposition. Let $X^+$ be the set of nonempty words on the alphabet $X$. Given two languages $L, M \subset X^+$, we say they differ on $w$ if $w$ is in one language but not the other. We let $l(L, M)$ be the length of the shortest word on which $L$ and $M$ differ, so that $l(L, M) = \infty$ if and only if $L = M$. The map $d: 2^{X^+} \times 2^{X^+} \to [0, \infty)$ defined by

$$d(L, M) = 2^{-l(L, M)}$$

is a non-Archimedean metric on $2^{X^+}$.

PROOF. (i) $d(L, M) = 0$ if and only if $l(L, M) = \infty$ if and only if $L = M$. (ii) Symmetry is obvious. (iii) Given three languages $L$, $M$, and $N$, if $L$, $N$ differ on $w$ of length $l(L, N)$ then, say, $w \in L$, $w \notin N$. If $w \in M$ then $l(M, N) \le l(L, N)$, else $w \notin M$ and $l(L, M) \le l(L, N)$. We therefore have $l(L, N) \ge \operatorname{Min}(l(L, M), l(M, N))$. Thus,

$$d(L, N) \le \operatorname{Max}(d(L, M), d(M, N)),$$

as desired. $\Box$

7 Example. We saw in Exercise 3.3.3 that "any subset of a poset is a poset." The same principle holds for metric spaces. If $(X, d)$ is a metric space and $A$ is any subset of $X$ then $d_A: A \times A \to [0, \infty)$ defined by restricting $d$ to $A \times A$, that is, $d_A(x, y) = d(x, y)$ for $x, y \in A$, is a metric. That axioms 1 (i)-(iv) hold is obvious.
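The language metric of Proposition 6 is easy to experiment with when the languages are finite. The following Python sketch is an illustration only: it assumes languages are given as finite sets of nonempty strings, the name `lang_dist` is hypothetical, and exact rational arithmetic is used so that powers of 2 compare exactly. It checks the non-Archimedean inequality by brute force over all sublanguages of a four-word set.

```python
from fractions import Fraction
from itertools import combinations


def lang_dist(L, M):
    """d(L, M) = 2^(-l), l = length of the shortest word on which L and M differ."""
    differ = set(L) ^ set(M)          # words in exactly one of the two languages
    if not differ:
        return Fraction(0)            # l(L, M) is infinite, so d(L, M) = 0
    shortest = min(len(w) for w in differ)
    return Fraction(1, 2 ** shortest)


# All 16 languages contained in a small (hypothetical) word set.
words = ["a", "b", "aa", "ab"]
languages = [frozenset(c)
             for r in range(len(words) + 1)
             for c in combinations(words, r)]

# Brute-force check of d(L, N) <= Max(d(L, M), d(M, N)).
ultrametric_ok = all(
    lang_dist(L, N) <= max(lang_dist(L, M), lang_dist(M, N))
    for L in languages for M in languages for N in languages
)


def powers_of_a(n):
    """The language {a, a^2, ..., a^n} over the one-letter alphabet."""
    return {"a" * k for k in range(1, n + 1)}
```

For instance, `lang_dist(powers_of_a(50), powers_of_a(3))` is 1/16, since the shortest word on which the two languages differ is `aaaa`.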
Usually, we would just write $(A, d)$ rather than $(A, d_A)$ since this is rarely confusing. We say $(A, d)$ is a metric subspace of $(X, d)$.

8 Definition. Let $(X, d)$ be a metric space, let $(x_n \mid n = 1, 2, 3, \ldots)$ be a sequence of elements of $X$, and let $x \in X$. Then $x$ is a limit of $(x_n)$, in symbols $\lim(x_n) = x$ or $x_n \to x$, if the sequence $d(x, x_n)$ of real numbers approaches 0. (Some readers may have seen rigorous "epsilon-delta" definitions for limits of real numbers whereas others may have seen only intuitive definitions. We prefer to avoid further discussion of this here, our main point being that whatever the reader knows about limits for real numbers extends easily to arbitrary metric spaces.)

9 Proposition. Limits in a metric space $(X, d)$ are unique, that is, if $x_n \to x$ and $x_n \to y$ then $x = y$.

PROOF. As $d(x, y) \le d(x, x_n) + d(y, x_n)$ for any $n$ by the axioms of 1, $d(x, y) \to 0$. As $d(x, y)$ is independent of $n$, $d(x, y) = 0$. By 1, again, $x = y$. $\Box$

Thus, while a sequence $(x_n)$ may have no limit, if a limit exists it is the limit and we denote it by $\lim(x_n)$.

10 Example. Let $f: [0, 1] \to R$, $f(x) = e^x$. As is well known from calculus,

$$e^x = 1 + x + \frac{x^2}{2} + \frac{x^3}{3!} + \cdots.$$

If $f_n: [0, 1] \to R$ is defined by

$$f_n(x) = 1 + x + \frac{x^2}{2} + \cdots + \frac{x^n}{n!}$$

then $f_n \to f$ in the metric space of Example 4.

11 Example. In the metric space $(2^{X^+}, d)$ of 6 with $X = \{a\}$ let $L_n = \{a, a^2, \ldots, a^n\}$. Then $X^+ = \lim(L_n)$ because $d(X^+, L_n) = 2^{-(n+1)}$.

12 Example. Let $(X, d)$ be a metric space. If $X$ possesses two distinct elements $y$, $z$ then there exists a sequence $(x_n)$ with no limit. Define

$$x_n = \begin{cases} y & \text{if } n \text{ is odd} \\ z & \text{if } n \text{ is even.} \end{cases}$$

To see that $x_n \to x$ is impossible for all $x$, observe that $d(x_n, x_m) \le d(x_n, x) + d(x, x_m)$, so that as $n$, $m$ get large $d(x_n, x_m) \to 0$, which says that $d(y, z) = 0$ (as we can have $y = x_n$, $z = x_m$ for arbitrarily large $n$, $m$). This contradicts the axioms of 1 as $y \ne z$.
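The convergence claimed in Example 10 can be observed numerically. The sketch below is only an approximation under stated assumptions: the maximum over [0, 1] in the metric of Example 4 is replaced by a maximum over a finite grid, and the names `partial_sum` and `grid_distance` are hypothetical.

```python
import math


def partial_sum(n, x):
    """f_n(x) = 1 + x + x^2/2! + ... + x^n/n!."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))


def grid_distance(n, steps=1000):
    """Approximate d(f_n, f) = Max |f_n(t) - e^t| by sampling t on a grid in [0, 1]."""
    return max(abs(math.exp(i / steps) - partial_sum(n, i / steps))
               for i in range(steps + 1))


# Distances d(f_1, f), ..., d(f_7, f), approximated on the grid.
distances = [grid_distance(n) for n in range(1, 8)]
```

The computed distances shrink rapidly toward 0, in line with the assertion that the partial sums approach the exponential function in this metric.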
Example 12 shows that it is unreasonable to expect all sequences to have a limit. A sequence which has a limit must at least be "Cauchy." This is explained in the next definition and proposition.

13 Definition. Let $(x_n)$ be a sequence in a metric space $(X, d)$. Then $(x_n)$ is Cauchy if $d(x_m, x_n) \to 0$ as $n$, $m$ get large.

14 Proposition. In any metric space, a sequence with a limit must be Cauchy.

PROOF. The reasoning is the same as in 12. If $d(x, x_n) \to 0$ then $d(x_n, x_m) \le d(x, x_m) + d(x, x_n) \to 0$ also. $\Box$

Our overall objective is to develop conditions on a metric space $(X, d)$ and a total function $\psi: X \to X$ which guarantee that $\psi$ has a distinguished fixed point. It is useful to contrast this situation with that for domains and continuous maps. There we "started" with $\bot$ and applied $\psi$ successively to get the sequence $\bot, \psi(\bot), \psi^2(\bot), \ldots$. Monotonicity of $\psi$ forced this sequence to be an ascending chain. The definition of "domain" provided a "limit" $x = \bigvee \psi^n(\bot)$ to this sequence, and continuity was used to prove $\psi(x) = x$. In a metric space there is no natural "$\bot$" to start with, so let us start with any $x_0$. We may then form the sequence $x_0, \psi(x_0), \psi^2(x_0), \ldots$. This sequence will hopefully have a limit $x$ (so the sequence itself must at least be Cauchy by 14) and hopefully for this $x$ we can prove $\psi(x) = x$. We will in fact find conditions on $(X, d)$ and $\psi$ to carry out this program in such a way that $\psi$ has a unique fixed point (so that any starting $x_0$ may be used). While this property is striking, it is also clear that "metric semantics" will not work for recursive specifications with multiple fixed points (e.g., 5.1.4). The uniqueness property stems from the following proposition.

15 Proposition. Let $(X, d)$ be a metric space and let $\psi: X \to X$ satisfy

$$d(\psi(x), \psi(y)) < d(x, y) \quad \text{whenever } x \ne y. \tag{16}$$

Then $\psi$ has at most one fixed point.

PROOF. Let $\psi(x) = x$, $\psi(y) = y$.
Then if $x \ne y$, $d(x, y) = d(\psi(x), \psi(y)) < d(x, y)$ and this is impossible. $\Box$

17 Example. A map satisfying 16 need not have any fixed points. For example, let $R$ have its usual metric $d(x, y) = |x - y|$ and let $(X, d)$ be the metric subspace as in 7 with $X$ the subset $\{x_1, x_2, x_3, \ldots\}$ with $x_n = 1/n$. Define $\psi: X \to X$ by $\psi(x_n) = x_{n+1}$. That 16 holds is easily verified, but clearly $\psi$ has no fixed points.
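The verification that the map of Example 17 satisfies 16 can be made concrete. The following is a small Python sketch using exact fractions. The helper names `x` and `ratio` are hypothetical. It computes the quotient of the distance after applying the shift to the distance before.

```python
from fractions import Fraction


def x(n):
    """The point x_n = 1/n of the subspace X."""
    return Fraction(1, n)


def ratio(n, m):
    """d(psi x_n, psi x_m) / d(x_n, x_m) for psi(x_n) = x_{n+1}, with n != m."""
    return abs(x(n + 1) - x(m + 1)) / abs(x(n) - x(m))


# Condition 16 on an initial segment of X: every quotient is strictly below 1.
strictly_shrinks = all(ratio(n, m) < 1
                       for n in range(1, 30)
                       for m in range(1, 30) if n != m)
```

A short calculation shows the quotient equals $nm / ((n+1)(m+1))$, so for example `ratio(1, 2)` is 1/3 while `ratio(1000, 1001)` is 500/501: the distances always shrink, but by a factor that creeps up toward 1.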
A related example holds for the subset $Y = \{y_1, y_2, y_3, \ldots\}$ of $R$ with $y_n = n - 1/n$. Here, $\varphi: Y \to Y$, $\varphi(y_n) = y_{n+1}$ again satisfies 16 and has no fixed points, but whereas $x_1, \psi(x_1), \psi^2(x_1), \ldots$ was a Cauchy sequence, $y_1, \varphi(y_1), \varphi^2(y_1), \ldots$ is not.

The following two definitions patch the holes created by Example 17 and lead to our main result, Theorem 20.

18 Definition. A metric space is complete if every Cauchy sequence has a limit.

As is shown in the exercises and in the balance of the chapter, while it may take work to establish it, many examples of complete metric spaces exist. It is an issue of foundations to establish that the real line is complete; see the end of chapter notes.

19 Definition. Let $(X, d)$ be a metric space. A map $\psi: X \to X$ is a contraction if there exists $0 \le K < 1$ with

$$d(\psi(x), \psi(y)) \le K\, d(x, y)$$

for all $x \ne y$.

As $K < 1$, a contraction must satisfy 16. The reason 16 is a weaker condition is that the quotient

$$\frac{d(\psi(x), \psi(y))}{d(x, y)},$$

while $< 1$, may climb arbitrarily close to 1 (as happens for the $\varphi$ of 17).

20 Theorem (Banach). Let $(X, d)$ be a complete metric space, let $x_0 \in X$ be arbitrary, and let $\psi: X \to X$ be a contraction. Then $\psi$ has a unique fixed point, namely,

$$\lim(\psi^n(x_0)). \tag{21}$$

PROOF. We must prove that the limit exists, and that it is a fixed point. Uniqueness will then follow by 15. To prove 21 is a fixed point, we first observe the following:

22 Lemma. For any two metric spaces $(X_1, d_1)$, $(X_2, d_2)$, if $f: X_1 \to X_2$ satisfies

$$d_2(f(x), f(y)) \le d_1(x, y) \tag{23}$$

for all $x, y \in X_1$, then whenever $x = \lim(x_n)$ in $(X_1, d_1)$, $\lim(f x_n)$ exists in $(X_2, d_2)$ and coincides with $f(x)$. Thus, $f(\lim(x_n)) = \lim(f(x_n))$. $\Box$
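The limit 21 in Theorem 20 suggests an obvious computational procedure: iterate the map from any starting point until successive iterates are close. The following Python sketch illustrates this on the real line with its usual metric. It is an illustration, not the proof: the function name `iterate_to_fixed_point`, the stopping tolerance, and the sample contraction $\psi(x) = x/2 + 1$ (with $K = 1/2$ and fixed point 2) are all assumptions chosen for the example.

```python
def iterate_to_fixed_point(psi, x0, dist, tol=1e-12, max_iter=10_000):
    """Form x0, psi(x0), psi(psi(x0)), ... until consecutive iterates are within tol."""
    x = x0
    for _ in range(max_iter):
        nxt = psi(x)
        if dist(x, nxt) <= tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence within max_iter steps")


def usual_metric(a, b):
    """The usual metric on the real line."""
    return abs(a - b)


def psi(x):
    """A sample contraction on R: distances are halved, and psi(2) = 2."""
    return x / 2 + 1


from_zero = iterate_to_fixed_point(psi, 0.0, usual_metric)
from_hundred = iterate_to_fixed_point(psi, 100.0, usual_metric)
```

Both starting points lead to (numerically) the same value 2, as the uniqueness part of the theorem predicts.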
Returning to the proof of 20, if the limit $x$ of 21 exists then applying the lemma with $f = \psi$ yields

$$\psi(x) = \psi(\lim(\psi^n(x_0))) = \lim(\psi(\psi^n(x_0))) = \lim(\psi^{n+1}(x_0)) = x$$

(since it is obvious that whenever $\lim(y_1, y_2, y_3, \ldots) = y$ then also $\lim(y_2, y_3, y_4, \ldots) = y$). Thus, we need only show the limit of 21 exists. As $(X, d)$ is complete, this is equivalent to showing that $(\psi^n(x_0))$ is a Cauchy sequence. To see this, write $x$ for $x_0$ and note that for any $m < n$,

$$d(\psi^m x, \psi^n x) = d(\psi^m x, \psi^m(\psi^{n-m} x)) \le K^m d(x, \psi^{n-m} x).$$

By repeated application of the triangle inequality,

$$d(x, \psi^{n-m} x) \le d(x, \psi x) + d(\psi x, \psi^2 x) + \cdots + d(\psi^{n-m-1} x, \psi^{n-m} x) \le (1 + K + \cdots + K^{n-m-1})\, d(x, \psi x).$$

But since $K < 1$, we know that $\sum_{j \ge 0} K^j$ converges to the limit $1/(1 - K)$. Thus,

$$d(x, \psi^{n-m} x) \le \frac{1}{1 - K}\, d(x, \psi x)$$

and so

$$d(\psi^m x, \psi^n x) \le \frac{K^m}{1 - K}\, d(x, \psi x).$$

Since $K < 1$, we can make the right-hand side as small as we please simply by requiring that $m$ exceed some sufficiently large $N$. But then $(\psi^m x \mid m \ge 1)$ is Cauchy, and we are done. $\Box$

We observe that Theorem 20 is an instance of the canonical fixed point theorem of Chapter 7, applied to the category whose objects are all $(X, d, \psi)$ with $\psi$ a contraction $(X, d) \to (X, d)$ and whose morphisms $f: (X, d, \psi) \to (X', d', \psi')$ are total functions $f: X \to X'$ satisfying $d'(fx, fy) \le d(x, y)$ and $\psi' f = f \psi$. The initial object is the object whose set has one element, from which the unique morphism to $(X, d, \psi)$ maps the single element to the unique fixed point of $(X, d, \psi)$. Proving that this is an initial object requires Theorem 20, but the canonical fixed point theorem then establishes that morphisms preserve the unique fixed points. See Exercise 10.

EXERCISES FOR SECTION 9.1

1. Prove that the metric in 4 satisfies the axioms of 1. [Warning: in the triangle inequality, if $d(x, y) = |x(t) - y(t)|$, $d(y, z) = |y(u) - z(u)|$ you can not assume $t = u$; show $|x(t) - z(t)| \le d(x, y) + d(y, z)$ for all $t$.]
Give a specific example of functions x, y, z with d(x, z) > Max(d(x, y), d(y, z)) to show that this metric fails to be non-Archimedean. [Hint: Consider parallel curves.]
2. For any set $X$ define $d: X \times X \to [0, \infty)$ by

$$d(x, y) = \begin{cases} 1 & x \ne y \\ 0 & x = y. \end{cases}$$

Show that $d$ is a non-Archimedean metric. It is called the discrete metric on $X$. In this metric space, prove that if $(x_n)$ is Cauchy then there exists $N$ such that $x_n = x_m$ if $n, m \ge N$. Conclude that $(X, d)$ is complete.

3. Prove that $f_n \to f$ in 10. [Hint: Look up Taylor's Theorem.]

4. Prove that every nonempty metric space $(X, d)$ admits a map $f: X \to X$ with $d(f(x), f(y)) \le d(x, y)$ such that $f$ has a fixed point but is not a contraction. [Hint: Very easy!]

5. Let $(A, d)$ be a metric subspace of $(X, d)$ as in 7. Say that $A$ is closed if whenever $a_n \to x$ in $(X, d)$ with each $a_n \in A$ then also $x \in A$, that is, "A is closed under limits." Prove that a closed metric subspace of a complete metric space is again complete. Give an example to show that "closed" is necessary.

6. In the Euclidean plane, the set of all points equidistant from $x$, $y$ ($x \ne y$) is the perpendicular bisector of the segment connecting $x$, $y$ and, in particular, is a set of zero area. Prove, however, that if the segment connecting $x$, $y$ has slope 1 then the set of all points Manhattan-equidistant from $x$, $y$ has infinite area.

7. A metric space $(X, d)$ is finite if $X$ is finite. Prove that every finite metric space is complete.

8. Let $(X, d)$ be the Euclidean plane. The map $\psi(x, y) = (\tfrac{1}{2}x, \tfrac{1}{2}y)$ is a contraction whose unique fixed point is $(0, 0)$. Let $(A, d)$ be a nonempty metric subspace as in 7 such that $(0, 0) \notin A$. By 20, if $\psi$ maps $A$ into $A$, $(A, d)$ is not complete. Give an example of such an $A$ and show precisely where $A$ fails to be complete. Similarly, give an example of a complete $(A, d)$ with $(0, 0) \notin A$ and show that for some $a \in A$, $\psi(a) \notin A$.

9. In Exercise 2.1.7 we saw that "a poset is a category." A similar result holds for metric spaces. For a metric space $(X, d)$ define a category $C_{(X,d)}$ as follows:

$$\mathrm{ob}(C_{(X,d)}) = X, \qquad C_{(X,d)}(x, y) = \{t \in [0, \infty) : d(x, y) \le t\},$$

that is, we have a morphism $x \xrightarrow{\;t\;} y$ just in case $d(x, y) \le t$. Define composition by

$$(x \xrightarrow{\;t\;} y \xrightarrow{\;u\;} z) = x \xrightarrow{\;t+u\;} z$$

and define identities by $\mathrm{id}_x = 0$.
(i) Verify $C_{(X,d)}$ is a category and that $(X, d) = (Y, e)$ if $C_{(X,d)} = C_{(Y,e)}$.
(ii) Show that $C_{(X,d)}$ is literally equal to its dual category $C_{(X,d)}^{\mathrm{op}}$.
(iii) Show that for any $x, y \in C_{(X,d)}$ the product $x \times y$ does not exist. [Hint: if $x \xleftarrow{\;a\;} p \xrightarrow{\;b\;} y$
were a product, consider [diagram] and show such $v$ can not exist.]

10. Without relying on the canonical fixed point theorem, prove that if $\psi: (X, d) \to (X, d)$, $\psi': (X', d') \to (X', d')$ are contractions with unique fixed points $x_0$, $x'_0$ and if $f: X \to X'$ is a total function with $d'(fx, fy) \le d(x, y)$ and $\psi' f = f \psi$ then $f x_0 = x'_0$.

9.2 Differential Equations

The contraction theorem 20 has many applications. In this section we sketch a proof of a standard application to the solution of differential equations. While the material is not needed in later sections and is not directly connected with program semantics, the mathematics used is similar in many respects and we felt it to be a good idea to give the reader the option of making the comparison.

While the theory holds in great generality, we shall work out the one-dimensional case for simplicity. We are given a function $f: [t_0, t_1] \times R \to R$, and seek to find a trajectory $x: [t_0, t_1] \to R$ which specifies the value $x(t)$ for each $t$ with $t_0 \le t \le t_1$ in such a way as to satisfy the differential equation

$$\dot{x}(t) = f(t, x(t)) \tag{1}$$

and the initial condition

$$x(t_0) = x_0 \tag{2}$$

for some specified $x_0 \in R$. We emphasize that this classical problem is a recursive definition of $x$, since the derivative of $x$ is defined in terms of the $x$ which is to be found.

Assuming sufficient continuity, we may integrate both sides of 1 to obtain

$$\int_{t_0}^{t} \dot{x}(s)\, ds = \int_{t_0}^{t} f(s, x(s))\, ds$$

or

$$x(t) - x(t_0) = \int_{t_0}^{t} f(s, x(s))\, ds.$$

Using the specified value $x(t_0) = x_0$, we see that we may replace our initial-value problem 1 and 2 by the integral equation

$$x(t) = x_0 + \int_{t_0}^{t} f(s, x(s))\, ds, \tag{3}$$
which is a fixed point equation. In more detail, let us use $X$ to denote the space of all continuous functions $[t_0, t_1] \to R$, which includes the sought-for trajectory $x$. Consider the integral operator $\psi: X \to X$, where for each $x \in X$, that is, $x: [t_0, t_1] \to R$, the value $\psi x: [t_0, t_1] \to R$ is defined for each $t$ in $[t_0, t_1]$ by

$$(\psi x)(t) = x_0 + \int_{t_0}^{t} f(s, x(s))\, ds. \tag{4}$$

We see that $x$ is a solution of the integral equation 3 if and only if $x$ is a fixed point of the $\psi$ defined by 4: $x(t) = (\psi x)(t)$ for all $t$ in $[t_0, t_1]$.

All that remains is to show how to consider $X$ as a complete metric space, and to give conditions on $f$ which allow us to apply the contraction theorem to guarantee that $\psi$ has a unique fixed point and thus, for such $f$, conclude that our initial-value problem has a unique solution. Indeed, let $d$ be the metric on $X$ introduced in 9.1.4. We leave it to the reader (Exercise 1) to prove that $(X, d)$ is complete. We next consider an appropriate condition on $f$.

5 Definition. We say that $f: [t_0, t_1] \times R \to R$ satisfies a Lipschitz condition on $x$ uniformly with respect to $t$ in $[t_0, t_1]$ if there exists a number $K > 0$ such that

$$|f(t, x) - f(t, y)| \le K\, |x - y|$$

for all $x$, $y$ in $R$ and $t$ in $[t_0, t_1]$.

Note the (at first surprising) fact that we do not require $K < 1$ as in the Banach Theorem. This will be important when the proof of our concluding theorem takes a surprising turn below.

6 Theorem. If $f: [t_0, t_1] \times R \to R$ satisfies a Lipschitz condition with constant $K > 0$, then the corresponding operator $\psi: X \to X$ of 4 has a unique fixed point.

PROOF. We have to compare $d(\psi x, \psi y)$ with $d(x, y)$. In the calculations below, each Max ranges over $t$ in $[t_0, t_1]$.

$$\begin{aligned}
d(\psi x, \psi y) &= \operatorname{Max} |\psi x(t) - \psi y(t)| \\
&= \operatorname{Max} \left| \int_{t_0}^{t} \big(f(s, x(s)) - f(s, y(s))\big)\, ds \right| \\
&\le \operatorname{Max} \int_{t_0}^{t} |f(s, x(s)) - f(s, y(s))|\, ds \\
&\le \operatorname{Max} \int_{t_0}^{t} K\, |x(s) - y(s)|\, ds \\
&\le \operatorname{Max}\big(K \cdot d(x, y) \cdot (t - t_0)\big) \quad \text{since } |x(s) - y(s)| \le d(x, y) \\
&= K (t_1 - t_0) \cdot d(x, y).
\end{aligned}$$
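The estimate just obtained can be tried on a discretised version of the operator of 4. The Python sketch below is only a numerical illustration under stated assumptions: functions on $[t_0, t_1]$ are replaced by their values on a uniform grid, the integral is approximated by the trapezoid rule, and the names `make_operator` and `sup_distance` as well as the sample data ($f(t, x) = 3x$ on $[0, 1]$, so $K = 3$) are hypothetical choices.

```python
def make_operator(f, x0, ts):
    """Discretised form of (psi x)(t) = x0 + integral of f(s, x(s)) ds from ts[0] to t."""
    def psi(xs):
        out = [x0]
        for i in range(1, len(ts)):
            h = ts[i] - ts[i - 1]
            # trapezoid rule on the subinterval [ts[i-1], ts[i]]
            out.append(out[-1] + 0.5 * h * (f(ts[i - 1], xs[i - 1]) + f(ts[i], xs[i])))
        return out
    return psi


def sup_distance(xs, ys):
    """Grid version of the metric of 9.1.4: the largest pointwise difference."""
    return max(abs(a - b) for a, b in zip(xs, ys))


ts = [i / 100 for i in range(101)]              # grid on [0, 1]
K = 3.0
psi = make_operator(lambda t, x: K * x, 2.0, ts)

x = [1.0] * len(ts)                             # the constant function 1
y = [0.0] * len(ts)                             # the constant function 0

before = sup_distance(x, y)                     # d(x, y)
after = sup_distance(psi(x), psi(y))            # d(psi x, psi y)
bound = K * (ts[-1] - ts[0]) * before           # K (t1 - t0) d(x, y)
```

For this data the distance after applying the operator is (up to rounding) 3, which meets the bound $K(t_1 - t_0)\, d(x, y) = 3$ and exceeds the original distance 1.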
We seem to be in trouble! There is no reason to expect that $K \cdot (t_1 - t_0) < 1$, and so $\psi$ need not be a contraction. However, all is not lost: we shall see that there is some $n \ge 1$ for which $\psi^n$ is a contraction. Just as the n-fold integral of the function constantly 1 is $t^n / n!$, so do we have

$$d(\psi^n x, \psi^n y) \le \frac{[K \cdot (t_1 - t_0)]^n}{n!}\, d(x, y), \tag{7}$$

as is readily proved by induction on $n$. But we can choose $n$ so large that $[K \cdot (t_1 - t_0)]^n < n!$ and conclude that, for this $n$, $\psi^n$ is a contraction. But then $\psi^n$ has a unique fixed point $x$ in $X$, with $\psi^n x = x$. Consider, now, that

$$\psi^n(\psi x) = \psi(\psi^n x) = \psi x.$$

This says that $\psi x$ is also a fixed point of $\psi^n$. Since $x$ was the unique fixed point of $\psi^n$, it follows that $\psi x = x$. To see that $x$ is the unique fixed point of $\psi$, just note that any fixed point of $\psi$ is also a fixed point of $\psi^n$. $\Box$

EXERCISES FOR SECTION 9.2

1. Prove that $(X, d)$ of 9.1.4 is complete.

2. The unique solution $x: [0, \infty) \to R$ satisfying $\dot{x}(t) = 3x(t)$, $x(0) = 2$ is $x(t) = 2e^{3t}$. Verify this using the theory of this section by showing that $f(t, x) = 3x$ satisfies a Lipschitz condition and that $2e^{3t}$ is a fixed point of $\psi$. [Hint: The latter, recall, just says that $2e^{3t}$ solves the differential equation.]

3. Repeat Exercise 2 for $\dot{x}(t) = 2t\,x(t)$, $x(0) = 5$.

9.3 Metrics on Trees

Informally, the specification $W(x) = \text{while } p(x) \text{ do } f(x)$ unwraps to the recursive specification

$$W(x) = \text{if } p(x) \text{ then } W(f(x)) \text{ else } x$$

as may be seen from the flowchart equivalence
[Figure: flowchart equivalence between the while loop and the recursive specification of W.]

Of course, we can substitute the right-hand expression for W in this expression and obtain a larger flowchart, and this process could be continued, getting larger and larger flowcharts. Each successive flowchart may evaluate more and more arguments without calling W itself. This suggests that we think of the semantics of such a recursive specification as being the semantics of the limit of the corresponding sequence of flowcharts.

In this section we formalize this concept, but replacing the informal treatment of flowcharts by a formal treatment of trees. More specifically, we consider recursive specification at a purely syntactic level in the form of a finite tree which "calls itself on specified arguments." "Execution" is a process of iterated substitution which produces a sequence of ever deeper finite trees which, in the limit, defines an infinite tree which represents the specification in nonrecursive form at the syntactic level. We show in this section that this infinite tree arises as the unique fixed point of the specification by applying the Banach fixed point theorem 9.1.20. This is the starting point of a number of theories discussed further in the notes to this chapter.

So as not to obscure the underlying simplicity of the concepts involved we will avoid being too formal. The interested reader may consult the references in the end-of-chapter notes to learn more about the formal theory of infinite trees.

1 Example. A "tree-specification" for W = while p(x) do f(x) is (each tree written in term notation, root first, with its subtrees listed left to right)

2    $W(x) = \mathrm{ifp}(x,\ W(f(x)),\ x).$

The interpretation we have in mind is

3    $\mathrm{ifp}(x, y, z) = \begin{cases} y & \text{if } p(x) \text{ is true} \\ z & \text{if } p(x) \text{ is false.} \end{cases}$
Here the "formal argument" is specified on the left-hand side as $x$ whereas the argument of $W$ on the right-hand side is the tree for $f(x)$. Hence, a single "execution" of 2 substitutes $f(x)$ for each $x$ on the right-hand side to give (in term notation)

$\mathrm{ifp}\bigl(x,\ \mathrm{ifp}(f(x),\ W(f(f(x))),\ f(x)),\ x\bigr)$

which, when semantically interpreted from the root down, gives "if p(x) then (if p(f(x)) then W(f(f(x))) else f(x)) else x." Repeated call leads to the infinite tree

4    $\mathrm{ifp}\bigl(x,\ \mathrm{ifp}(f(x),\ \mathrm{ifp}(f^2(x),\ \ldots,\ f^2(x)),\ f(x)),\ x\bigr)$
where $f^2$ abbreviates the chain $f$ above $f$ above $x$, that is, $f(f(x))$, and so forth.

5 Example. A "tree-specification" for the factorial function

FACT(x) = if x = 0 then 1 else x * FACT(pred(x))

is (in term notation)

6    $\mathrm{FACT}(x) = \mathrm{ifzero}\bigl(x,\ \mathrm{one},\ \mathrm{times}(x,\ \mathrm{FACT}(\mathrm{pred}(x)))\bigr).$

Here, ifzero is as in 3 for $p$ the test for equality to zero, one(x) = 1, and pred is the predecessor function. As the argument to FACT is pred(x) on the right-hand side of 6, the resulting infinite tree by repeated "execution" is

7    $\mathrm{ifzero}\Bigl(x,\ \mathrm{one},\ \mathrm{times}\bigl(x,\ \mathrm{ifzero}\bigl(\mathrm{pred}(x),\ \mathrm{one},\ \mathrm{times}(\mathrm{pred}(x),\ \mathrm{ifzero}(\mathrm{pred}^2(x),\ \mathrm{one},\ \mathrm{times}(\mathrm{pred}^2(x),\ \ldots)))\bigr)\bigr)\Bigr).$

The reader should check that "semantic interpretation from the root down" is correct.
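The "execution" step of Examples 1 and 5 can be tried out mechanically. The following is only a minimal sketch under assumptions of our own: trees are encoded as nested Python tuples `(label, child_1, ..., child_k)`, the function-variable symbol is unary, and the helper names `substitute_leaf`, `execute`, `depth`, and `show` are hypothetical, not notation from the text. Calls nested inside the argument of another call are left untouched by a single step.

```python
# Trees as nested tuples: (label, child_1, ..., child_k); a leaf is (label,).
X = ("x",)

# Right-hand side of specification 6 for FACT(x) (our own encoding).
FACT_BODY = ("ifzero", X, ("one",),
             ("times", X, ("FACT", ("pred", X))))


def substitute_leaf(tree, formal, arg):
    """Replace every leaf labelled `formal` in `tree` by the subtree `arg`."""
    if tree == (formal,):
        return arg
    return (tree[0],) + tuple(substitute_leaf(c, formal, arg) for c in tree[1:])


def execute(tree, fvar="FACT", body=FACT_BODY, formal="x"):
    """One "execution": replace each unary call fvar(arg) by body[formal := arg]."""
    if tree[0] == fvar:
        return substitute_leaf(body, formal, tree[1])
    return (tree[0],) + tuple(execute(c, fvar, body, formal) for c in tree[1:])


def depth(tree):
    """Maximum node depth, with the root at depth 0."""
    return 0 if len(tree) == 1 else 1 + max(depth(c) for c in tree[1:])


def show(tree):
    """Render a tree in term notation, e.g. times(x, FACT(pred(x)))."""
    if len(tree) == 1:
        return tree[0]
    return tree[0] + "(" + ", ".join(show(c) for c in tree[1:]) + ")"


t1 = execute(("FACT", X))   # the right-hand side of 6
t2 = execute(t1)            # one further "execution"
print(show(t1))
print(show(t2))
print(depth(t1), depth(t2))
```

Each further call of `execute` pushes the single remaining FACT node deeper, which is the behaviour that the limiting infinite tree 7 records.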
8 Example. To illustrate how multiple arguments and calls can be handled, consider the two-variable Ackermann function of 5.1.3:

9    $a(m, n) = \begin{cases} n + 1 & \text{if } m = 0 \\ a(m - 1,\ 1) & \text{if } m \ne 0,\ n = 0 \\ a(m - 1,\ a(m, n - 1)) & \text{else.} \end{cases}$

A suitable "tree-specification" is

10    [Figure: tree-specification for ACK(m, n), with root ifzero.]

Though perhaps not obvious, if "all" substitutions are made, the resulting "ACK-free" infinite tree is independent of "substitution strategy." It is crucial here that the root ifzero of the right-hand side of 10 is not ACK. This guarantees that we can force the depth of all ACKs to be arbitrarily large by repeated substitutions. Theorem 21 below provides a formal proof. To check their understanding, readers should now pause to verify the following:

11 The depth-9 tree obtained from

$\mathrm{ACK}\bigl(\mathrm{pred}(m),\ \mathrm{ACK}(m,\ \mathrm{pred}(n))\bigr)$

by substituting 10 in both ACKs is independent of which one is substituted first.

Our definition of trees will be informal. The examples above give the general idea. Node labels come from a fixed set and are either formal arguments $x, y, z, \ldots$ or given function symbols $f$, $g$, ifzero, ... or function-variable symbols FACT, ACK, .... Each function symbol has a definite arity
in $\{0, 1, 2, \ldots\}$ (we call a 0-ary symbol constant, a 1-ary symbol unary, and a 2-ary symbol binary). Trees are drawn upside down with a root at the top and leaves at the bottom. Trees used in specifications shall have a unique root and each leaf shall be either a formal argument or a constant. (We consider subtrees in 13 with more arbitrary leaves, however.) In 7, the root is ifzero and there are infinitely many leaves, each either $x$ or one. The given function symbols are one, pred, times, and ifzero of arities 0, 1, 2, and 3. In 6, FACT is a unary function-variable symbol.

Properties 12 and 14 below for trees seem intuitively evident. Any formal definitions that yield these properties will suffice for the needs of the theory below.

12 Property. Each node in a tree connects to the root by a unique upward path of finite length. The number of nodes above the given node in this path is called the depth of the node. If $t$ is finite, the depth of $t$ is its maximum node depth.

Thus, in the tree of 6, FACT has depth 2, ifzero has depth 0, and the tree itself has depth 4. In the infinite tree of 7, times occurs with depths 1, 3, 5, ....

13 Definition. For any tree $t$ and integer $n \ge 0$, define $t[n]$ to be the subtree of all nodes of depth $\le n$.

Thus, if $t$ is the tree of 6, then in term notation

$t[0] = \mathrm{ifzero}, \qquad t[1] = \mathrm{ifzero}(x,\ \mathrm{one},\ \mathrm{times}), \qquad t[2] = \mathrm{ifzero}(x,\ \mathrm{one},\ \mathrm{times}(x,\ \mathrm{FACT})),$

and $t = t[4] = t[5] = \cdots$.

14 Property. If $t_0, t_1, t_2, \ldots$ is a sequence of finite trees such that $t_n$ has depth $n$ and such that $t_n[m] = t_m$ whenever $m \le n$, then there exists an (obviously infinite) tree $t$ with $t[n] = t_n$ for all $n$.

The tree $t$ is unique in 14 since 12 implies that for any two trees $t$, $u$, $t = u$ if and only if $t[n] = u[n]$ for all $n$. We now have the following:

15 Theorem. Let $T$ be a set of trees each of which satisfies 12 and which satisfies 14. Define $d: T \times T \to [0, \infty)$ by
16    $d(t, u) = \begin{cases} 0 & \text{if } t = u \\ 2^{-k(t,u)} & \text{if } t \ne u, \end{cases}$

where

17    for $t \ne u$, $k(t, u)$ is the least $k$ with $t[k] \ne u[k]$.

Then $d$ is a non-Archimedean metric and $(T, d)$ is complete.

PROOF. That $d(t, u) = d(u, t)$ is obvious and $d(t, u) = 0$ if and only if $t = u$ by 12. To see $d(t, v) \le \operatorname{Max}(d(t, u), d(u, v))$ for all $t$, $u$, $v$ observe that if $k = \operatorname{Min}(k(t, u), k(u, v))$ then for $p < k$, $t[p] = u[p] = v[p]$ so that $k \le k(t, v)$, hence $d(t, v) \le 2^{-k}$. But $2^{-k}$ is one of $d(u, t)$, $d(u, v)$, so we have shown that $d$ is a non-Archimedean metric.

Now let $t_1, t_2, t_3, \ldots$ be a Cauchy sequence. For each $n$, it follows from the definition of a Cauchy sequence that $d(t_r, t_s) < 2^{-n}$ whenever $r$, $s$ are suitably large. In view of 12 and 13, this means that there exists an integer $N_n$ depending only on $n$ so that

18    $t_r[n] = t_s[n]$ whenever $r, s \ge N_n$

for each $n$. Define

19    $u_n = t_r[n]$ for any $r \ge N_n$.

Then $u_n$ is a finite tree of depth at most $n$. If $m \le n$ choose any $r \ge N_m, N_n$ so that $u_n = t_r[n]$, $u_m = t_r[m]$; hence, $u_n[m] = (t_r[n])[m] = t_r[m] = u_m$ (it being obvious that for any tree $t$, $(t[n])[m] = t[m]$ if $m \le n$). Hence, the sequence $(u_n)$ of 19 satisfies the hypotheses of 14 so that there exists a tree $u$ with

20    $u[n] = u_n$ for all $n$.

To finish the proof we must show that $d(u, t_r) \to 0$ as $r$ gets large, thereby establishing completeness by showing that the arbitrary Cauchy sequence $(t_n)$ has $u$ as limit. But this is easy as follows. For any fixed $n$, $u[n] = u_n = t_r[n]$ for all $r \ge N_n$ by 20 and 19. Thus, for $r \ge N_n$, $k(u, t_r) \ge n$ and $d(u, t_r) \le 2^{-n}$. Since $n$ is arbitrary (so that $2^{-n}$ may be arbitrarily small) this says $d(u, t_r) \to 0$ as $r \to \infty$. □

We next show that a wide class of recursive tree-specifications are contractions. (Sets of such specifications are discussed in Exercises 4 and 5.)
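For finite trees the metric of 16 and 17 is easy to compute, which gives a quick way to test the strong triangle inequality on examples. The sketch below is illustrative only and rests on our own assumptions: trees are nested tuples `(label, child_1, ..., child_k)`, the names `truncate`, `k`, and `d` are hypothetical helpers mirroring $t[n]$, $k(t, u)$, and $d(t, u)$, and only finite trees are handled (so the search for the least $k$ terminates).

```python
from fractions import Fraction


def truncate(tree, n):
    """t[n]: keep only the nodes of depth <= n (root has depth 0)."""
    if n == 0:
        return (tree[0],)
    return (tree[0],) + tuple(truncate(child, n - 1) for child in tree[1:])


def k(t, u):
    """Least k with t[k] != u[k]; None when t == u. Finite trees only."""
    if t == u:
        return None
    n = 0
    while truncate(t, n) == truncate(u, n):
        n += 1
    return n


def d(t, u):
    """The metric of 16: 0 if t == u, else 2 ** (-k(t, u))."""
    least = k(t, u)
    return Fraction(0) if least is None else Fraction(1, 2 ** least)


X, Y = ("x",), ("y",)
t = ("f", ("f", X))   # f(f(x))
u = ("f", ("g", X))   # f(g(x))
v = ("f", ("f", Y))   # f(f(y))

print(d(t, u), d(t, v), d(u, v))
# Strong (non-Archimedean) triangle inequality on this triple:
print(d(t, v) <= max(d(t, u), d(u, v)))
```

Exact fractions are used instead of floats so that comparisons of distances are not affected by rounding.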
21 Theorem. For fixed sets of given function symbols, $n$-ary function-variable symbol $F$ and formal arguments $x_1, \ldots, x_n$ ($n \ge 1$), let $T$ be the set of all trees whose leaves are formal arguments or nullary given functions. We consider $T$ a complete metric space as in Theorem 15. Let $t$ be a finite tree in $T$ in which $F$ appears but not at the root. Let $\psi: T \to T$ be any function for which $\psi(u)$ is obtained from $u$ by substituting $t$ for one or more $F$ with arguments respected, that is, each leaf $x_i$ in $t$ is replaced with the $i$th-branch subtree of the $F$. Then $\psi$ is a contraction.

PROOF. In the notation of 17, for $u, v \in T$ it is clear that $k(\psi u, \psi v) \ge p + k(u, v)$ if $p$ is the length of the shortest path from an $F$ to the root in $t$. As $p > 0$,

$\dfrac{d(\psi u, \psi v)}{d(u, v)} = \dfrac{2^{-k(\psi u, \psi v)}}{2^{-k(u, v)}} \le \dfrac{2^{-(p + k(u, v))}}{2^{-k(u, v)}} = 2^{-p} < 1.$ □

According to the proof of the Banach theorem 9.1.20, the unique fixed point of the $\psi$ of Theorem 21 is obtained by iterating $\psi$ on any starting tree (which might as well be called "F"). The reader who has worked through 11, let alone its infinite iteration, should appreciate how the theory of contractions on metric spaces has provided a powerful test to prove uniqueness!

We conclude this section with a brief comparison with earlier discussions of recursive call. Consideration of Example 1 should suffice. Semantically, ifp in 3 and $f$ take the forms $\mathrm{ifp} \in \mathbf{Pfn}(X^3, X)$, $f \in \mathbf{Pfn}(X, X)$ for an appropriate set $X$. At this level, 2 may be represented by

22    $\psi: \mathbf{Pfn}(X, X) \to \mathbf{Pfn}(X, X), \qquad \psi(W)(x) = \mathrm{ifp}(x,\ W(f(x)),\ x).$

It is readily verified that $\psi$ is continuous and that its Kleene sequence

23    $\psi(\bot) = \mathrm{ifp}(x, \bot, x)$ = if not p(x) then x else $\bot$,
      $\psi^2(\bot) = \mathrm{ifp}(x, \mathrm{ifp}(fx, \bot, fx), x)$ = if not p(x) then x else if not p(f(x)) then f(x) else $\bot$,
      $\psi^3(\bot) = \mathrm{ifp}(x, \mathrm{ifp}(fx, \mathrm{ifp}(f^2x, \bot, f^2x), fx), x)$, ...

has the intended semantics while p(x) do f(x) as its least upper bound. A similar analysis obtains replacing $\mathbf{Pfn}$ with $\mathbf{Mfn}$.
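The Kleene sequence of 23 can be run directly. The following is a small sketch under our own assumptions: a partial function is modelled as a Python function that returns `None` where it is undefined (so `None` plays the role of $\bot$ and must not be a legitimate value), and the test `p` and step `f` are a hypothetical instance chosen for illustration.

```python
BOTTOM = None  # stands for "undefined"; assumes None is never a real value


def bottom(x):
    """The everywhere-undefined partial function."""
    return BOTTOM


def psi(W, p, f):
    """psi(W)(x) = ifp(x, W(f(x)), x), as in 22."""
    def next_W(x):
        return W(f(x)) if p(x) else x
    return next_W


def kleene(n, p, f):
    """The n-th approximation psi^n(bottom)."""
    W = bottom
    for _ in range(n):
        W = psi(W, p, f)
    return W


# Hypothetical instance: while x < 10 do x := x + 3
p = lambda x: x < 10
f = lambda x: x + 3

approximations = [kleene(n, p, f)(4) for n in range(5)]
print(approximations)
```

Starting from 4 the loop body runs twice (4, 7, 10), so the approximations stay undefined until the third one and agree with the while loop from then on.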
The main idea of this section is that trees capture the process at a syntactic level which unifies an entire class of semantic interpretations. In broadest terms, a possible comparison between syntax and semantics would take the following form:
24
(i) Choose a semantic category, $\mathbf{C}$.
(ii) Interpret the given function symbols $f, g, \ldots$ as morphisms in $\mathbf{C}$.
(iii) Impose axioms on $\mathbf{C}$ to allow the interpretation of any tree as a $\mathbf{C}$-morphism.
(iv) For $\psi: T \to T$ as in Theorem 21 induce a corresponding $\psi_0: \mathbf{C}(X, X) \to \mathbf{C}(X, X)$.
(v) Prove that the interpretation of the unique fixed point of $\psi$ is an appropriate canonical fixed point of $\psi_0$.

The example just discussed has these features, that is, the interpretation of the infinite tree 4 is also while p(x) do f(x). The literature cited at the end of the chapter certainly carries out this program for, at least, $\mathbf{C} = \mathbf{Pfn}$ and $\mathbf{C} = \mathbf{Mfn}$. We have resisted developing 24 further and leave this as an open-ended research problem for the interested reader.

EXERCISES FOR SECTION 9.3

1. Express Example 5.2.12 as a tree-specification and obtain the infinite tree of call. Does it clarify that the semantics is the identity function as was seen in Exercise 5.2.5?
2. Consider the infinite tree of call for appropriate tree-specification of the Fibonacci functions of Exercises 5.1.2 and 5.2.6. Is there any ambiguity?
3. Express 5.2.13 as a tree-specification. How clear is it that the infinite tree of call is semantically equivalent to that for while p do g, as was shown in 5.2.17?
4. Let $(X_1, d_1), \ldots, (X_n, d_n)$ be metric spaces. Let $X = X_1 \times \cdots \times X_n$. Show that $(X, d)$ is a metric space if $d((x_i), (y_i)) = \operatorname{Max}(d_i(x_i, y_i))$ and that $(X, d)$ is complete if each $(X_i, d_i)$ is.
5. Use Exercise 4 to state and prove a generalization of Theorem 21 for simultaneously recursive tree-specifications. [Hint: the $i$th metric space is the set of all trees as in 21, but the number of formal arguments may depend on $i$.] Test your result on the specification of Example 5.2.15.
6. Find appropriate forms of 22 and 23 for Example 5.
9.4 Context-Free Languages as Metric Fixed Points

In Section 6.4, we associated with each context-free grammar $G$ a function $\psi_G: (2^{X^*})^n \to (2^{X^*})^n$ which maps $n$-tuples of languages to $n$-tuples of languages. We saw that
$(2^{X^*})^n$ could be given a domain structure with respect to which $\psi_G$ was continuous, and that $L(G)$ was then the first component of the least fixed point of $\psi_G$. In this section, we restrict $G$ slightly by requiring that $G$ be $\Lambda$-free, that is, that no production be of the form $v \to \Lambda$. (It is well known that for any context-free grammar $G$ there exists a $\Lambda$-free context-free grammar $G'$ such that $L(G') = L(G) - \{\Lambda\}$.) Recall that $X^+ = X^* - \{\Lambda\}$. We may then regard $\psi_G$ as a map $(2^{X^+})^n \to (2^{X^+})^n$. We give $(2^{X^+})^n$ the structure of a metric space and show that $\psi_G$ is a contraction. It thus has a unique fixed point, which must then equal the least fixed point, and so has $L(G)$ as its first component. We begin with the following:

1 Theorem. The metric space $(2^{X^+}, d)$ of 9.1.6 is complete.

PROOF. Let $L_1, L_2, L_3, \ldots$ be a Cauchy sequence. For each $k$ there exists $N_k$ such that

$d(L_m, L_n) < 2^{-k}$ for $m, n > N_k$.

But this just says that $L_m$ and $L_n$ do not differ on words of length $\le k$ for $m, n > N_k$. We can thus define the language $L$ by stipulating that $w \in L$ if $w \in L_m$ for any (and thus all) $m > N_{|w|}$, where, recall, $|w|$ is the length of $w$. It is then clear that $(L_m \mid m \ge 1)$ converges to $L$. □

It is seen in Exercise 9.3.4 that whenever $(X_1, d_1), \ldots, (X_n, d_n)$ are complete metric spaces then $(X, d)$ is again a complete metric space if $X = X_1 \times \cdots \times X_n$ and $d((x_1, \ldots, x_n), (y_1, \ldots, y_n)) = \operatorname{Max}_{1 \le i \le n} d_i(x_i, y_i)$. It then follows that $(2^{X^+})^n$ is a complete metric space with the metric

$e((V_1, \ldots, V_n), (W_1, \ldots, W_n)) = \operatorname{Max}_{1 \le i \le n} d(V_i, W_i).$

To see that $\psi_G$ has a unique fixed point for $\Lambda$-free $G$, we need simply verify that $\psi_G$ is a contraction with respect to this metric.

2 Theorem. Let $G$ be a $\Lambda$-free context-free grammar, and let $\psi_G$ be the functional $(2^{X^+})^n \to (2^{X^+})^n$ derived from $G$ essentially as in Section 6.4. Then $\psi_G$ is a contraction with respect to the metric $e$ on $(2^{X^+})^n$.

PROOF.
$e(V, W) = \operatorname{Max}_{1 \le i \le n} d(V_i, W_i) = 2^{-|w|}$, where $|w|$ is the length of $w$, the shortest word on which any $V_i$ differs from the corresponding $W_i$. Let $w'$ be the shortest word on which any $\psi_G(V)_j$ differs from the corresponding $\psi_G(W)_j$. Then, for $\psi_G$ to be a contraction, we require that $|w'| > |w|$, no matter what the choice of distinct $V$ and $W$, for then
$e(\psi_G(V), \psi_G(W)) = 2^{-|w'|} \le \tfrac{1}{2} \cdot 2^{-|w|} = \tfrac{1}{2}\, e(V, W).$

The shortest word on which $w_{l0} V_{l1} \cdots V_{lr} w_{lr}$ differs from $w_{l0} W_{l1} \cdots W_{lr} w_{lr}$ is clearly longer (if it exists) than the shortest word in which any of the $V_j$'s and $W_j$'s recurring therein differ, unless each $w_{lj} = \Lambda$. But then we are only in trouble if one of the $V_{lj}$ contains $\Lambda$. But we have forbidden this, and so we are done. □

EXERCISES FOR SECTION 9.4

1. Let $G$ be the grammar [display: productions of $G$]. Construct a $\Lambda$-free grammar $G'$ with $L(G') = L(G) - \{\Lambda\}$.
2. Corresponding to the grammar $G$ of Exercise 1 define $\psi_G: 2^{X^+} \to 2^{X^+}$, $X = \{a\}$, by $\psi_G(V) = \Lambda + aV$. Show that $\psi_G$ is a contraction even though $G$ is not $\Lambda$-free.
3. Study the proof of Theorem 2 to give an example of a grammar which is not $\Lambda$-free for which $\psi_G$ is not a contraction.
4. In the context of Example 6.4.3, define $\psi_G$ and compute $\psi_G^n(V_1, V_2, V_3)$ for $n = 1, 2, 3$, for arbitrary $V_1$, $V_2$, $V_3$. Convince yourself that as $n$ gets large the sequence converges to the same languages as the Kleene sequence of 6.4.6.

Notes and References for Chapter 9

To prove that many metric spaces, including those of Examples 9.1.3-4, are complete, it is necessary to know that the set $\mathbf{R}$ of real numbers with the usual metric $d(x, y) = |y - x|$ is complete. This is equivalent to the "least upper bound axiom" about $\mathbf{R}$ as a totally ordered set which asserts that every nonempty set $A$ of reals which has an upper bound has a least upper bound. An "algorithm" to prove that the least upper bound axiom is true would construct LUB($A$) as follows. As $A$ has an upper bound there exists a smallest integer $n$ with $a < n + 1$ for all $a \in A$. To continue, choose decimal digits $d_1, d_2, d_3, \ldots$ so that $d_1$ is as small as possible with $a < n + .d_1 + .1$ for all $a$, $d_2$ is as small as possible with $a < n + .d_1 d_2 + .01$ for all $a$, and so on. The desired least upper bound is $n + .d_1 d_2 d_3 \ldots$. This is not a constructive algorithm because, say, to choose $d_1$ requires "searching" through all of $A$ even though $A$ may be infinite.
Ultimately, then, one must look harder at the foundations of the real numbers, and this is why the least upper bound axiom is an axiom. To underscore the overall importance of this issue, we refer the reader to a rigorous calculus text for proofs that each of the theorems in the list below requires the statements beneath it:
Taylor's theorem
|
if $df/dx = 0$ then $f$ is constant
|
the mean value theorem
|
the maximum value theorem
|
$\mathbf{R}$ satisfies the least upper bound axiom

For more about the Manhattan metric see E. Krause, Taxicab Geometry, Addison-Wesley, 1975, who calls it the taxicab metric.

A more leisurely and more general exposition of the material in Section 9.2 may be found in L. Padulo and M. A. Arbib, System Theory, Saunders, 1974.

The theory of Section 3 was introduced by M. Nivat in the early 1970s, building on the ideas of M. Schützenberger. Many others contributed later. For references we refer the reader to S. L. Bloom, "All solutions of a system of recursion equations in infinite trees and other contraction theories," Journal of Computer and System Sciences, 27, 1983, pp. 225-255 (which also carries out the program of 9.3.24 for many categories) and to I. Guessarian, Algebraic Semantics, Lecture Notes in Computer Science, Vol. 99, Springer-Verlag, 1981. We note that Guessarian's approach uses order semantics with the Kleene sequence, rather than the Banach fixed point theorem. (Associate a function $f(t)$ with each tree $t$, setting all variables to $\bot$. Given the sequence $t_1, t_2, t_3, \ldots$ of trees with functions $f(t_1), f(t_2), \ldots$ one shows that $f(t_1) \le f(t_2) \le \cdots$ provides the desired Kleene sequence.) For a more rigorous treatment of infinite trees, see the book by Guessarian just cited as well as C. C. Elgot, S. L. Bloom, and R. Tindell, "On the algebraic structure of rooted trees," Journal of Computer and System Sciences, 16, 1978, pp. 362-399.

Our treatment in Section 4 is motivated by the general theory of formal power series in noncommuting variables as initiated by M. Schützenberger in such papers as "On a theorem of Jungen," Proceedings of the American Mathematical Society, 13, 1962, pp. 885-890. A useful exposition and further references are given in Chapter 9 of G.
Lallement, Semigroups and Combinatorial Applications, John Wiley & Sons, 1979.
PART 3 DATA TYPES
CHAPTER 10
Functors

10.1 Data Types Lead to Functors
10.2 Fixed Points of Functors

Modern programming languages employ a variety of data structures which are (at least) sets of "structured elements" equipped with functions that allow combining, manipulating, and querying of such elements. A very rich supply of data types may be built, either directly or recursively, from finite sets using finite products and coproducts. Thus, if $A$ is a character alphabet, $A \times A \times A$ is "length 3 character arrays." If $\mathrm{pr}_i: A \times A \times A \to A$ is the $i$th projection, $\mathrm{pr}_i(w)$ evaluates the $i$th coordinate of $w$, providing the semantics of what would be written as w[i] in many programming languages. A precise definition of length 3 character arrays should specify exactly which maps such as the $\mathrm{pr}_i$ should belong to the structure. For a recursive example, define "natural number" by

0 is a natural number
if n is a natural number, n0 is a natural number

which is captured by the recursive specification

$N ::= \{0\} + N \times \{0\},$

where we represent each number $n$ by a string of zeros of length $n + 1$. A central concern will be to provide precise semantics to such specifications.

We build new data types, then, by applying such building operations to old data types. But the old data types are not necessarily just sets; it may be advantageous to remember some of their associated maps in the building process. As such, the building operations we use have to act on sets and on maps. The appropriate notion here is "functor."

In the first section we motivate how data types lead to functors. The second section introduces the least and greatest fixed points of an endofunctor and presents examples of data types which may be described as such fixed points.

10.1 Data Types Lead to Functors

Consider the program flowcharted in Figure 1. This program receives a natural number $n \ge 1$ as input and outputs the $n$th prime, 2 being the first. This program employs lists of primes as data structures. At any time, PL holds the list of primes found so far, while NP holds the odd integer which is a candidate for the next prime. This next prime is found as the first odd number not yet on the list PL which is not divisible by any of the primes already on the list.

In this section we explore how the need for lists of integers is related to the notion of functor. Let $N$ be the set of integers. (Ultimately, we will discover which properties of $N$ we need and shall "derive these from scratch.") The set $N^+$ of (nonempty) lists of integers has as elements all finite tuples $\langle n_1, \ldots, n_k \rangle$ with each $n_i \in N$ and $k > 0$. The program of Figure 1 makes explicit use of the following functions:

1    [display missing]

which we use to treat the length 4 list $\langle 2, 3, 5, 7 \rangle$ as dynamic, in that its length may change.

2    $\langle m, \langle n_1, \ldots, n_k \rangle \rangle \mapsto \begin{cases} n_m & \text{if } 1 \le m \le k \\ \bot & \text{else,} \end{cases}$

which we use to extract one entry at a time from PL for comparison purposes.

3    $\langle n_1, \ldots, n_k \rangle \mapsto n_k,$

which we use to test the end of the list.

4    $\langle \langle n_1, \ldots, n_k \rangle, m \rangle \mapsto \langle n_1, \ldots, n_k, m \rangle,$

which we use to add a new element at the end of the list.

5    $\langle n_1, \ldots, n_k \rangle \mapsto k,$

which we use to find the length of the list.
[Figure 1: flowchart not reproduced.]

Figure 1 A flowchart for finding the nth prime: PL is "prime list" and NP is "next prime candidate."

Let us now attempt a recursive definition of $N^+$ which will also, ultimately, let us define the maps 1-5 (in 10.2.16 and 11.1.14 below). Define $L$ (hopefully $= N^+$) recursively by

6    $L ::= N + L \times N.$

What does this mean? Let us create an analogy with our earlier work in order
semantics and write

7    $\psi(L) = N + L \times N.$

The general form of such $\psi$ is as a "mapping" from the category $\mathbf{Set}$ to itself. As seen from Exercise 2.1.7, a category may be regarded as a generalization of a poset. Briefly, consider the elements of the poset $P$ as the objects of a new category $\mathbf{C}_{(P, \le)}$; and let the assertions that $x \le y$ become the morphisms $x \to y$ of $\mathbf{C}_{(P, \le)}$. In other words, given objects $x$ and $y$ of $\mathbf{C}_{(P, \le)}$ there is at most one morphism $x \to y$, and there is such a morphism if and only if $x \le y$ in $P$. It can then be checked that morphisms compose appropriately ($x \to y$ and $y \to z$ yield $x \to z$ by transitivity) and identity morphisms are defined ($x \to x$ exists by reflexivity).

In what follows, we will gain much motivation by lifting posets from $(P, \le)$ to $\mathbf{C}_{(P, \le)}$ and then studying an analogous concept that is available in every category $\mathbf{C}$. As a first example, the least element $\bot$ of a poset ($\bot \le x$ for all $x$ in $P$) generalizes to the initial object 0 for categories. (There is a unique arrow $0 \to X$ for every object $X$ in $\mathbf{C}$.) The ascending chain

$\bot \le \psi(\bot) \le \psi^2(\bot) \le \cdots$

generalizes to a suitable diagram of morphisms

8    $0 \xrightarrow{\ !\ } \psi(0) \xrightarrow{\ \psi(!)\ } \psi^2(0) \xrightarrow{\ \psi^2(!)\ } \cdots$

We shall see in Chapter 11 that this diagram has a "colimit," unique up to isomorphism, which generalizes the least upper bound $\bigvee(\psi^n(\bot))$. It will also be seen in Chapter 11 that the Kleene fixed point theorem 6.2.13 generalizes and that the $\psi$ of 7 is "continuous" and so has a "least fixed point" which, moreover, is $N^+$.

For the present, we proceed intuitively and anticipate these later results. Regard 7 as asserting "a list is a natural number or a list followed by a natural number." Thus, a natural number is a list, and stage 1 yields: $N = \psi(\emptyset) \subset L$. At the $(n + 1)$st stage, a list is either in $N$ or is $\langle l, n \rangle$, where $l$ is created at the $n$th stage. Clearly, then, $\psi^n(\emptyset)$ represents the lists created after $n$ stages. Now observe the following:

9 Distributive Law of Set Theory. $(A_1 + \cdots + A_k) \times B = (A_1 \times B) + \cdots + (A_k \times B)$ $(k \ge 2)$.

Indeed, a typical element of the left side is of the form $\langle \langle i, a \rangle, b \rangle$ with $a \in A_i$, $b \in B$ whereas a typical element of the right side is $\langle i, \langle a, b \rangle \rangle$ with $a \in A_i$, $b \in B$ so that $\langle \langle i, a \rangle, b \rangle \mapsto \langle i, \langle a, b \rangle \rangle$ is the desired isomorphism. Using 9, we compute
10
$$\begin{aligned}
\psi(\emptyset) &= N \\
\psi^2(\emptyset) &= N + N \times N = N + N^2 \\
\psi^3(\emptyset) &= N + (N + N \times N) \times N = N + [(N \times N) + (N \times N \times N)] = N + N^2 + N^3 \\
\psi^4(\emptyset) &= N + (N + N^2 + N^3) \times N = N + N^2 + N^3 + N^4 \\
&\ \ \vdots \\
\psi^n(\emptyset) &= N + N^2 + \cdots + N^n.
\end{aligned}$$

If 7 is interpreted via 8 as saying that lists are exactly those objects that arise from natural numbers by finite application of $\psi$, then the set $L$ we seek is the union of the $\psi^n(\emptyset)$ which, up to isomorphism, is indeed $N^+$. The derivation of the maps 1, ..., 5 we postpone for later sections when it will have become more clear which tools we can use. We next consider the following construction.

11 Whenever $f: A \to B$ is a total function, there is an induced total function $\psi(A) \to \psi(B)$ which we shall call $\psi(f)$ defined by

$\psi(f): N + A \times N \to N + B \times N, \qquad x \mapsto \begin{cases} x & \text{if } x \in N \\ \langle f(a), n \rangle & \text{if } x = \langle a, n \rangle \in A \times N. \end{cases}$

This construction is a generalization of the notion of monotone map, namely, one for which $x \le y$ implies $\psi(x) \le \psi(y)$. When a poset is viewed as a category, we write this as saying that $\psi$ is monotone if $x \to y$ implies $\psi(x) \to \psi(y)$. The difference in 11 is that there are many functions $\psi(A) \to \psi(B)$ so a specific one had to be singled out. The definition of the Kleene sequence generalizes immediately to 11 to yield the diagram

12    $\emptyset \xrightarrow{\ !\ } \psi(\emptyset) = N \xrightarrow{\ \psi(!)\ } \psi^2(\emptyset) = N + N \times N \xrightarrow{\ \psi^2(!)\ } \cdots,$

which is essentially the series of inclusions

$\emptyset \subset N \subset N + N^2 \subset \cdots \subset N + N^2 + \cdots + N^n \subset \cdots,$

since it is easily checked that the composition

$N + N^2 + \cdots + N^n \xrightarrow{\ \alpha\ } \psi^n(\emptyset) \xrightarrow{\ \psi^n(!)\ } \psi^{n+1}(\emptyset) \xrightarrow{\ \beta\ } N + \cdots + N^{n+1},$

where $\alpha$, $\beta$ are the isomorphisms of 10, is just the inclusion map. Up to isomorphism, $\psi^n(!)$ is the inclusion map. In a general category with an initial object, a construction such as 11 will give rise to a diagram like 12. It is time then for the formal definition.
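Before turning to that definition, the stages of 10 and the maps of 11 and 12 can be checked on a small example. The sketch below is only illustrative and makes several assumptions of its own: a two-element set stands in for $N$ so that everything is finite, the coproduct is encoded with string tags `"N"` and `"AxN"`, and `psi_obj`, `psi_map`, `flatten`, and `bang` are hypothetical helper names.

```python
from itertools import product

N = frozenset({0, 1})  # small finite stand-in for N (an assumption)


def psi_obj(A):
    """psi(A) = N + A x N, encoded as a tagged disjoint union."""
    return frozenset({("N", n) for n in N} |
                     {("AxN", (a, n)) for a in A for n in N})


def psi_map(f):
    """psi(f) of 11: identity on the N summand, f on the A component."""
    def lifted(x):
        tag, value = x
        if tag == "N":
            return x
        a, n = value
        return ("AxN", (f(a), n))
    return lifted


def flatten(x):
    """Read an element of psi^n(empty set) as a nonempty tuple over N."""
    tag, value = x
    if tag == "N":
        return (value,)
    a, n = value
    return flatten(a) + (n,)


def bang(x):
    """The unique map out of the empty set; it is never actually called."""
    raise ValueError("the empty set has no elements")


# Stages psi^0(empty), psi^1(empty), psi^2(empty), psi^3(empty).
stages = [frozenset()]
for _ in range(3):
    stages.append(psi_obj(stages[-1]))

sizes = [len(s) for s in stages]
lists_up_to_3 = {w for k in (1, 2, 3) for w in product(sorted(N), repeat=k)}
print(sizes)
print({flatten(x) for x in stages[3]} == lists_up_to_3)

# psi^2(!) : psi^2(empty) -> psi^3(empty) acts as an inclusion.
psi2_bang = psi_map(psi_map(bang))
print(all(psi2_bang(x) == x for x in stages[2]))
```

With this encoding the inclusions of 12 hold on the nose, which is a convenience of the chosen tags rather than something the construction requires.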
13 Definition. Let $\mathbf{C}$, $\mathbf{D}$ be categories. A functor $\psi: \mathbf{C} \to \mathbf{D}$ is given by the following data and axioms.

Datum i. For each object $C$ in $\mathbf{C}$, $\psi$ specifies an object $\psi C$ of $\mathbf{D}$.
Datum ii. For each morphism $f: C_1 \to C_2$ in $\mathbf{C}$, $\psi$ specifies a morphism $\psi f: \psi C_1 \to \psi C_2$ of $\mathbf{D}$.
Axiom a. If $f: C_1 \to C_2$, $g: C_2 \to C_3$ in $\mathbf{C}$, so that $\psi f: \psi C_1 \to \psi C_2$, $\psi g: \psi C_2 \to \psi C_3$, $\psi(gf): \psi C_1 \to \psi C_3$ in $\mathbf{D}$, then $\psi(gf) = (\psi g)(\psi f)$.
Axiom b. For each object $C$ of $\mathbf{C}$, $\psi\,\mathrm{id}_C = \mathrm{id}_{\psi C}$.

Axioms a and b are called the functoriality axioms.

We saw in 2.1.14 that a monoid homomorphism $f: (M, *, e) \to (M', *', e')$ is a map $f: M \to M'$ with the additional properties

$f(m_1 * m_2) = f(m_1) *' f(m_2), \qquad f(e) = e',$

the first of which generalizes to axiom a, while the second generalizes to axiom b. To make this generalization more explicit, associate with the monoid $(M, *, e)$ the category $\mathbf{C}_{(M, *, e)}$ which has only one object, call it $X$, while $M$ is the set of morphisms $X \to X$, with composition

$(X \xrightarrow{\ m_1\ } X \xrightarrow{\ m_2\ } X) = (X \xrightarrow{\ m_2 * m_1\ } X).$

It can be easily checked that a map $f: M \to M'$ satisfies the conditions for a homomorphism $(M, *, e) \to (M', *', e')$ if and only if it satisfies the axioms for a functor $\mathbf{C}_{(M, *, e)} \to \mathbf{C}_{(M', *', e')}$.

A functor is thus a "homomorphism of categories," being the obvious notion of "structure-preserving mapping" between categories. The notion is central in category theory generally. An endofunctor is a functor of form $\mathbf{C} \to \mathbf{C}$, mapping a category $\mathbf{C}$ to itself, and it is this sort of functor that arises in recursive specification.

Axioms a and b are both assertions that two appropriately constructed morphisms with common domain and codomain are equal. This is automatically true in any poset since there is at most one morphism between any two objects when a poset is viewed as a category. The following is then obvious:

14 When posets are regarded as categories, "monotone map" is the same concept as "functor."
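The monoid reading of axioms a and b given above can be spot-checked on samples. The following sketch is an illustration under our own choice of example: strings under concatenation and integers under addition are the two monoids, `length` is the candidate functor between their one-object categories, and `shifted` is a hypothetical non-example. Checking finitely many samples is of course evidence, not a proof.

```python
# Two monoids viewed as one-object categories (our own illustrative choice):
# strings under concatenation and integers under addition.
# Composing m1 first and then m2 gives m2 * m1, as in the text.
def compose_str(m2, m1):
    return m2 + m1


def compose_int(k2, k1):
    return k2 + k1


ID_STR, ID_INT = "", 0


def length(m):
    """Candidate functor on morphisms: a string goes to its length."""
    return len(m)


def shifted(m):
    """A map that is NOT a functor: it fails axiom b."""
    return len(m) + 1


def satisfies_axioms(F, samples):
    """Check axioms a and b of Definition 13 on the given sample morphisms."""
    axiom_a = all(F(compose_str(m2, m1)) == compose_int(F(m2), F(m1))
                  for m1 in samples for m2 in samples)
    axiom_b = F(ID_STR) == ID_INT
    return axiom_a, axiom_b


samples = ["", "a", "ab", "pred"]
print(satisfies_axioms(length, samples))
print(satisfies_axioms(shifted, samples))
```

The second result shows both axioms failing for `shifted`, since the extra 1 is counted once on the left of axiom a and twice on the right.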
In the discussion following 2.2.6, we stated that a major aspect of the philosophy of category theory is that "isomorphism" formalizes "abstractly the same." Thus, if $\psi$ is to be regarded as an "abstract construction" (which constructs $\psi(A)$ from $A$) then at the very least $\psi(A)$ and $\psi(B)$ should be isomorphic whenever $A$ and $B$ are. This is a consequence of axioms a and b as follows.
15 Theorem. If $\psi: \mathbf{C} \to \mathbf{D}$ is a functor and if $f: C_1 \to C_2$ is an isomorphism in $\mathbf{C}$, then $\psi f: \psi C_1 \to \psi C_2$ is an isomorphism in $\mathbf{D}$.

PROOF. Let $g = f^{-1}$. Since $gf = \mathrm{id}_{C_1}$, axioms a and b yield $(\psi g)(\psi f) = \psi(gf) = \psi(\mathrm{id}) = \mathrm{id}$. Similarly, $(\psi f)(\psi g) = \mathrm{id}$. Thus, $\psi g = (\psi f)^{-1}$. □

We close this section with a number of examples of functors as well as some tools to construct new such functors from old ones. Unprovided verifications are easy exercises for the reader.

16 Constant Functors. Let $\mathbf{C}$, $\mathbf{D}$ be categories and let $D$ be an object of $\mathbf{D}$. Define a functor

$\hat{D}: \mathbf{C} \to \mathbf{D}, \qquad (f: C_1 \to C_2) \mapsto (\mathrm{id}_D: D \to D)$

(the notation is shorthand for $\hat{D}(C) = D$ while $\hat{D}(f) = \mathrm{id}_D$ for every $f: C_1 \to C_2$). $\hat{D}$ is a functor called the functor constantly $D$. A functor of form $\hat{D}$ is called a constant functor.

17 Identity Functors. If $\mathbf{C}$ is any category, the identity functor of $\mathbf{C}$ is given by

$\mathrm{id}_{\mathbf{C}}: \mathbf{C} \to \mathbf{C}, \qquad (f: C_1 \to C_2) \mapsto (f: C_1 \to C_2).$

18 Products of Functors. Let $\mathbf{D}$ be a category with finite products. Given $f_i: D_i \to E_i$ in $\mathbf{D}$, define $f_1 \times \cdots \times f_n$ to be the unique morphism as shown

$\mathrm{pr}_i\,(f_1 \times \cdots \times f_n) = f_i\,\mathrm{pr}_i \qquad (i = 1, \ldots, n).$
Now let $\mathbf{C}$ be any category, and let $\psi_1, \ldots, \psi_n$ be functors $\mathbf{C} \to \mathbf{D}$. Define a functor $\psi_1 \times \cdots \times \psi_n: \mathbf{C} \to \mathbf{D}$ by

$(f: C_1 \to C_2) \mapsto (\psi_1 f \times \cdots \times \psi_n f:\ \psi_1 C_1 \times \cdots \times \psi_n C_1 \to \psi_1 C_2 \times \cdots \times \psi_n C_2).$

That functoriality follows from that of each $\psi_i$ is a straightforward exercise on Cartesian products. (But note that we must choose a definite product $D_1 \times \cdots \times D_n$ in $\mathbf{D}$, at least whenever $D_i = \psi_i C$.)

We may now present the dual constructions involving coproducts.

19 Coproducts of Functors. Let $\mathbf{D}$ be a category with finite coproducts. Given $f_i: D_i \to E_i$ in $\mathbf{D}$, define $f_1 + \cdots + f_n$ to be the unique morphism as shown:

20    $(f_1 + \cdots + f_n)\,\mathrm{in}_i = \mathrm{in}_i\,f_i \qquad (i = 1, \ldots, n).$

(Recall that, working in the context of flow schemes in 2.3.22, we called this the parallel construction and denoted it by $f_1 \parallel \cdots \parallel f_n$.) In $\mathbf{Set}$, we would have $(f_1 + \cdots + f_n)(\langle d_i, i \rangle) = \langle f_i(d_i), i \rangle$. In the dual category $\mathbf{D}^{op}$ this is just $f_1 \times \cdots \times f_n: E_1 \times \cdots \times E_n \to D_1 \times \cdots \times D_n$. It is easily seen that $f_1 + \cdots + f_n$ is the identity when each $f_i$ is an identity and, if $g_i: E_i \to F_i$, then $(g_1 + \cdots + g_n)(f_1 + \cdots + f_n) = (g_1 f_1) + \cdots + (g_n f_n)$, together with the dual results for $\times$. Thus, dualizing 18, given functors $\psi_1, \ldots, \psi_n: \mathbf{C} \to \mathbf{D}$ from any category $\mathbf{C}$, we may define $\psi_1 + \cdots + \psi_n: \mathbf{C} \to \mathbf{D}$ by

$(f: C_1 \to C_2) \mapsto (\psi_1 f + \cdots + \psi_n f:\ \psi_1 C_1 + \cdots + \psi_n C_1 \to \psi_1 C_2 + \cdots + \psi_n C_2)$

and it is functorial. (Warning: Do not confuse $+$ with partially additive sum!)

21 Composition of Functors. Let $\psi_1: \mathbf{C} \to \mathbf{D}$, $\psi_2: \mathbf{D} \to \mathbf{E}$ be arbitrary functors. Their composition $\psi_2 \psi_1: \mathbf{C} \to \mathbf{E}$ is defined by

$(\psi_2 \psi_1)(C) = \psi_2(\psi_1(C)), \qquad (\psi_2 \psi_1)(f) = \psi_2(\psi_1(f)).$

Functoriality of $\psi_2 \psi_1$ is routinely established.
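For functors $\mathbf{Set} \to \mathbf{Set}$, constructions 16 through 21 can be mimicked by combinators acting on the morphism part alone. The sketch below is a simplification under stated assumptions: only the action on functions is represented (objects are left implicit), elements of a product are Python tuples, elements of a coproduct are pairs `(i, d)` with a 0-based tag written first, and the names `constant`, `identity`, `product`, `coproduct`, and `compose` are our own.

```python
def constant():
    """16: the morphism part of a constant functor sends every f to an identity."""
    return lambda f: (lambda d: d)


def identity():
    """17: the morphism part of the identity functor."""
    return lambda f: f


def product(*fmaps):
    """18: (psi_1 x ... x psi_n)(f) acts componentwise on tuples."""
    def fmap(f):
        lifted = [m(f) for m in fmaps]
        return lambda xs: tuple(g(x) for g, x in zip(lifted, xs))
    return fmap


def coproduct(*fmaps):
    """19: (psi_1 + ... + psi_n)(f) sends a tagged value (i, d) to (i, psi_i(f)(d))."""
    def fmap(f):
        lifted = [m(f) for m in fmaps]
        return lambda tagged: (tagged[0], lifted[tagged[0]](tagged[1]))
    return fmap


def compose(outer, inner):
    """21: (psi_2 psi_1)(f) = psi_2(psi_1(f))."""
    return lambda f: outer(inner(f))


psi1 = product(identity(), constant())     # A |-> A x N
psi2 = coproduct(constant(), identity())   # B |-> N + B
psi = compose(psi2, psi1)                  # A |-> N + A x N

double = lambda a: 2 * a
print(psi(double)((0, 7)))        # an element of the N summand is left alone
print(psi(double)((1, (5, 3))))   # <a, n> in A x N goes to <double(a), n>
```

On the second summand the result agrees with the map $\psi(f)$ of 11, which sends $\langle a, n \rangle$ to $\langle f(a), n \rangle$ and fixes the elements of $N$.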
In algebra, we speak of a polynomial as a function of the form $a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$. It is built up from constants and $x$ using repeated addition and multiplication. We may think of $x$ as a notation for the identity function. Then we can generalize the notion of polynomial to obtain the following:

22 Polynomial Functors. Let $\mathbf{C}$ be a category with finite products and coproducts. A polynomial functor $\mathbf{C} \to \mathbf{C}$ is any functor which can be constructed from constant functors or the identity functor through the use of the product, coproduct, or composition operations of 18, 19, or 21. More formally, we have the following:

Basis Step: The identity functor $\mathbf{C} \to \mathbf{C}$ and any constant functor $\mathbf{C} \to \mathbf{C}$ are polynomial functors $\mathbf{C} \to \mathbf{C}$.
Induction Step: If $\psi_1$ and $\psi_2$ are polynomial functors $\mathbf{C} \to \mathbf{C}$, so too are $\psi_2 \psi_1$, $\psi_2 + \psi_1$ and $\psi_2 \times \psi_1$.

23 Example. $\psi: \mathbf{Set} \to \mathbf{Set}$ with $\psi(L) = N + L \times N$ as in 7 is a polynomial functor. Define $\psi_1 = \mathrm{id} \times N$, $\psi_2 = N + \mathrm{id}$. Then, $\psi = \psi_2 \psi_1$:

$A \ \mapsto\ A \times N \ \mapsto\ N + (A \times N).$

When a distributive law such as 9 holds in the category $\mathbf{C}$, a polynomial functor $\psi: \mathbf{C} \to \mathbf{C}$ may be put into the canonical form

$\psi(A) \cong C_0 + (C_1 \times A) + (C_2 \times A^2) + \cdots + (C_n \times A^n),$

but a formal proof requires a formal definition of "isomorphism of functors" $\cong$. See Exercises 5 and 7.

We also offer some examples of functors $\mathbf{Set} \to \mathbf{Set}$ which are, at least apparently, not polynomial.

24 Example. Define $\psi: \mathbf{Set} \to \mathbf{Set}$ by $\psi A = 2^A$, $\psi f: \psi A \to \psi B$, $S \mapsto \{f(x) : x \in S\}$. Similarly, $\Gamma: \mathbf{Set} \to \mathbf{Set}$, $\Gamma A = \{S : S \text{ is a finite subset of } A\}$, $\Gamma f: \Gamma A \to \Gamma B$, $S \mapsto \{f(x) : x \in S\}$. See Exercise 10.

25 Example. Two functors can be the same on objects yet differ on morphisms. Define $\psi_1: \mathbf{Set} \to \mathbf{Set}$ by $\psi_1 A = 2^A$, but $\psi_1 f: \psi_1 A \to \psi_1 B$, $S \mapsto \{y \in B : f^{-1}(y) \subset S\}$.

26 Example. Define $(-)^+: \mathbf{Set} \to \mathbf{Set}$ by $A^+ = \{\langle a_1, \ldots, a_k \rangle : k \ge 1,\ a_i \in A\}$, the set of finite nonempty lists in $A$, and define $f^+: A^+ \to B^+$ by $f^+\langle a_1, \ldots, a_k \rangle = \langle f(a_1), \ldots, f(a_k) \rangle$.

EXERCISES FOR SECTION 10.1

1. Construct an isomorphism $N^+ \cong N + N^+ \times N$. It is in this sense that $N^+$ is a "fixed point" of the $\psi$ of 7.
2. Let $N^{++}$ be the set of all nonempty finite or infinite lists of natural numbers. Show that there exists an isomorphism $N^{++} \cong N + N^{++} \times N$. Thus, $N^+$ is not the only fixed point of the $\psi$ of 7.

3. Describe $\psi f$ explicitly for the polynomial functor $\psi: \mathbf{Set} \to \mathbf{Set}$ of 7.

4. Prove that the composition of functors as in 21 is again functorial, so is a functor.

5. Let $\phi, \psi: \mathbf{C} \to \mathbf{D}$ be functors. A natural transformation $\eta: \phi \to \psi$ is a family of $\mathbf{D}$-morphisms of form $\eta C: \phi C \to \psi C$, one for each object $C$ of $\mathbf{C}$, such that for each $\mathbf{C}$-morphism $f: C_1 \to C_2$ the following "naturality square" commutes:

$$(\psi f)(\eta C_1) = (\eta C_2)(\phi f).$$

The functor category $\mathbf{D}^{\mathbf{C}}$ has as objects all functors $\mathbf{C} \to \mathbf{D}$ and natural transformations as morphisms, with composition $(\eta'\eta)C = (\eta' C)(\eta C)$ (the right-hand side being $\mathbf{D}$-composition) and identities $\mathrm{id}_\psi: \psi \to \psi$, $(\mathrm{id}_\psi)C = \mathrm{id}_{\psi C}$.
(i) Prove that the composition of natural transformations is a natural transformation and that $\mathrm{id}_\psi$ above is always a natural transformation.
(ii) Complete the proof that $\mathbf{D}^{\mathbf{C}}$ is a category.
(iii) Two functors $\mathbf{C} \to \mathbf{D}$ are isomorphic or naturally equivalent if they are isomorphic in $\mathbf{D}^{\mathbf{C}}$. Prove that $\eta: \phi \to \psi$ is an isomorphism in $\mathbf{D}^{\mathbf{C}}$ if and only if each $\eta C$ is an isomorphism in $\mathbf{D}$. [Hint: Prove that $(\eta C)^{-1}$ is a natural transformation.]

6. Show that $\mathrm{pr}_i: \psi_1 \times \cdots \times \psi_n \to \psi_i$, defined as the projection $\mathrm{pr}_i C: \psi_1 C \times \cdots \times \psi_n C \to \psi_i C$, defines a natural transformation.

7. Show that $\phi, \psi: \mathbf{Set} \to \mathbf{Set}$ are isomorphic if $\phi(A) = C_0 + (A \times (C_1 + (C_2 \times A)))$ and $\psi(A) = C_0 + (C_1 \times A) + (C_2 \times A^2)$.

8. Let 1 be a one-element set and let $\psi: \mathbf{Set} \to \mathbf{Set}$ be a functor.
(i) For each $x \in \psi 1$ show that $\eta_x: \mathrm{id} \to \psi$ is a natural transformation if $\eta_x C: C \to \psi C$ maps each $c \in C$ to $\psi(c)(x)$. Here, we regard $c \in C$ as the function $1 \to C$ mapping the unique element of 1 to $c$, so that $\psi(c): \psi(1) \to \psi(C)$.
(ii) Show that $x \mapsto \eta_x$ is a bijection between $\psi 1$ and $\mathbf{Set}^{\mathbf{Set}}(\mathrm{id}, \psi)$. [Hint: The inverse to $x \mapsto \eta_x$ is $\eta \mapsto \eta 1$.]

9. Let $X$ be any set, and let $f: X \to 2^X$ be any function. Define $A \in 2^X$ by $A = \{x \in X: x \notin f(x)\}$. Prove that no $x_0 \in X$ exists with $f(x_0) = A$. This establishes Cantor's diagonal argument: no surjection exists from $X$ to $2^X$. It follows that there is no largest set: every set has more subsets than elements.

10. It is a basic fact of set theory that the polynomial functor $\psi$ of 23 admits an isomorphism $A \cong \psi(A)$ for every infinite set $A$ bigger than $C_0 + \cdots + C_n$. Assuming this, use Exercise 9 to show that $\psi(A) = 2^A$ as in 24 is not a polynomial functor.
11. Generalizing 3.2.24, a functorial iterate on a category $\mathbf{C}$ with finite coproducts assigns to each $(A, B, f)$ with $f: A \to A + B$ a morphism $f^\dagger: A \to B$ subject to two axioms:
(a) Given $f: A \to A + B$, $g: C \to C + D$, $h: A \to C$ and $k: B \to D$ with $(h + k)f = gh$, then $k f^\dagger = g^\dagger h$.
(b) Given $f: A \to (A + B) + B$, [equation lost in extraction: it equates an iterate with a composite $A \to A + B \to B$ whose second factor has the form $\langle (\cdot)^\dagger, \mathrm{id}_B\rangle$].
(i) Define appropriate categories so that $f \mapsto f^\dagger$ describes the action on objects of a functor. This explains the terminology.
(ii) Show that the partially additive iterate of 3.2.24 is indeed a functorial iterate. Draw flowschemes to express axioms (a) and (b). [Hint: See Exercise 1.5.12.]
(iii) Show that any functorial iterate satisfies the Elgot iteration equation
$$f^\dagger = A \xrightarrow{\ f\ } A + B \xrightarrow{\ \langle f^\dagger,\ \mathrm{id}_B\rangle\ } B.$$
(iv) Show that if a functorial iterate exists, $\mathbf{C}$ has zero maps with $0: A \to B$ being $(\mathrm{in}_1: A \to A + B)^\dagger$.
(v) Prove that the usual iterate is the unique functorial one in $\mathbf{Pfn}$. [Hint: If $f^\dagger$ is the usual one and $f^*$ is another, $f^\dagger \leq f^*$ by Kleene semantics. To show $DD(f^*) \subset DD(f^\dagger)$, show that $f^* h = 0$ if $h: A \to A$ is the guard function for the complement of $DD(f^\dagger)$.]
It is an open question at the time of this writing if (v) holds for an arbitrary partially additive category.

10.2 Fixed Points of Functors

In the previous section, we introduced recursive specification of data types by considering the set $N^+$ of nonempty lists of integers to be a solution $L$ (see Exercise 10.1.1) of the equation

$$L ::= N + L \times N.$$

This led us to introduce the general concept of a functor, of which an example $\psi: \mathbf{Set} \to \mathbf{Set}$ was defined by $\psi(L) = N + L \times N$, while for $f: A \to B$,

$$\psi(f): \psi(A) \to \psi(B), \qquad n \mapsto n, \quad (a, n) \mapsto (f(a), n).$$

Thus, $N^+$ could be seen as a fixed point in some sense of the endofunctor $\psi$.
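To make the action of $\psi(L) = N + L \times N$ and the sense in which $N^+$ is a fixed point concrete, here is a small sketch. The encoding is an assumption made for illustration only: elements of $N + L \times N$ are tagged pairs `("num", n)` or `("pair", (a, n))`, nonempty lists are nonempty Python tuples, and the names `psi_map`, `pack` and `unpack` are hypothetical.

```python
from typing import Any, Callable, Tuple

Tagged = Tuple[str, Any]


def psi_map(f: Callable[[Any], Any]) -> Callable[[Tagged], Tagged]:
    """Action of psi(L) = N + L x N on a function f: A -> B.

    An element of the summand N is left alone; a pair (a, n) goes to (f(a), n).
    """
    def h(t: Tagged) -> Tagged:
        tag, v = t
        if tag == "num":
            return t
        a, n = v
        return ("pair", (f(a), n))
    return h


def pack(t: Tagged) -> tuple:
    """A candidate map N + N^+ x N -> N^+ (nonempty lists as nonempty tuples)."""
    tag, v = t
    if tag == "num":
        return (v,)          # a single number becomes a one-element list
    w, n = v
    return w + (n,)          # append n on the right of the list w


def unpack(w: tuple) -> Tagged:
    """Candidate inverse N^+ -> N + N^+ x N: split off the last entry."""
    if len(w) == 1:
        return ("num", w[0])
    return ("pair", (w[:-1], w[-1]))
```

On sample values, `unpack(pack(t)) == t` and `pack(unpack(w)) == w`, which is what an isomorphism $N + N^+ \times N \cong N^+$ requires; likewise `psi_map` sends the identity to the identity and respects composition on the inputs one tries. These checks only illustrate the definitions on examples.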
In this section we define least and greatest fixed points for endofunctors generally. The definitions generalize definitions familiar on posets to categories. It will be proved in the next chapter that every polynomial endofunctor of $\mathbf{Set}$ has special such fixed points, but some examples will be explored herein to introduce the idea that any solution of the fixed point equation comes equipped with associated functions.

1 Definition. Let $\mathbf{C}$ be an arbitrary category and let $\psi: \mathbf{C} \to \mathbf{C}$ be an endofunctor of $\mathbf{C}$. A fixed point of $\psi$ is a pair $(A, \delta)$ with $\delta: \psi A \to A$ an isomorphism.

There are two notions of "pre-fixed point" that we emphasize. A $\psi$-algebra is a pair $(A, \delta)$ with $\delta: \psi A \to A$ an arbitrary morphism, whereas a $\psi$-coalgebra is a pair $(A, \Delta)$ with $\Delta: A \to \psi A$ any morphism. If $(A, \delta)$, $(B, \gamma)$ are $\psi$-algebras, a morphism of $\psi$-algebras $f: (A, \delta) \to (B, \gamma)$ is a $\mathbf{C}$-morphism $f: A \to B$ such that

$$f\,\delta = \gamma\,(\psi f).$$

Then $\mathrm{id}_A: (A, \delta) \to (A, \delta)$ is a morphism because $\psi\,\mathrm{id}_A = \mathrm{id}_{\psi A}$, and if $f: (A, \delta) \to (B, \gamma)$, $g: (B, \gamma) \to (C, \theta)$ are morphisms then $gf: (A, \delta) \to (C, \theta)$ is a morphism as $\psi(gf) = (\psi g)(\psi f)$. This gives rise to the category $\psi$-$\mathbf{Alg}$ of $\psi$-algebras. Note the crucial role of the functoriality of $\psi$.

Similarly, if $(A, \Delta)$, $(B, \Gamma)$ are $\psi$-coalgebras, a morphism $f: (A, \Delta) \to (B, \Gamma)$ of $\psi$-coalgebras is a $\mathbf{C}$-morphism $f: A \to B$ with

$$(\psi f)\,\Delta = \Gamma\, f,$$

and this gives rise to the category $\psi$-$\mathbf{coAlg}$ of $\psi$-coalgebras.

In the case $\mathbf{C} = \mathbf{C}(P, \leq)$, we already know that a functor $\psi: \mathbf{C}(P, \leq) \to \mathbf{C}(P, \leq)$ is just a monotone map $\psi: (P, \leq) \to (P, \leq)$. A fixed point of $\psi$ in the sense of 1 is just an $x$ with $\psi(x) = x$. To see this, observe that an isomorphism $x \to y$ is just an assertion of equality (for if $x \to y$ is an isomorphism, it has inverse $y \to x$, but then, by antisymmetry of $\leq$, we conclude $x = y$ from the assertions $x \leq y$ and $y \leq x$). What are algebras and coalgebras in $\mathbf{C}(P, \leq)$?
A $\psi$-algebra is an $x$ with $\psi x \leq x$, while a $\psi$-coalgebra is an $x$ with $x \leq \psi x$. Given two $\psi$-algebras $\psi x \leq x$ and $\psi y \leq y$, a morphism $f: x \to y$ of $\psi$-algebras is just an assertion $x \leq y$ (the first square above is meaningful since $\psi$ is monotone, and commutes because in a poset all diagrams do!); similarly for morphisms of $\psi$-coalgebras.

Now consider the set of all $\psi$-algebras (still considering the case of a monotone $\psi: (P, \leq) \to (P, \leq)$). This is just a subset of $P$ and so inherits the
ordering of $P$. Suppose it has a least element $x_0$ (recall that least element in $(P, \leq)$ generalizes to initial object in a category), that is, $\psi x_0 \leq x_0$ and $x_0 \leq x$ for any $x$ with $\psi x \leq x$. It is a well-known result of poset theory (see Exercise 6.2.4) that this $x_0$ is a fixed point of $\psi$ and so is the least fixed point of $\psi$. This result generalizes immediately to the following:

2 Theorem. An initial object of $\psi$-$\mathbf{Alg}$ is a fixed point of $\psi$. A terminal object of $\psi$-$\mathbf{coAlg}$ is a fixed point of $\psi$.

PROOF. Let $(L, \mu)$ be an initial $\psi$-algebra. Applying $\psi$ to $\mu: \psi L \to L$, $(\psi L, \psi\mu)$ is a $\psi$-algebra. Obviously $\mu: (\psi L, \psi\mu) \to (L, \mu)$ is a morphism (the needed commutativity is $\mu(\psi\mu) = \mu(\psi\mu)$). As $(L, \mu)$ is initial there exists $f: (L, \mu) \to (\psi L, \psi\mu)$ and, as $\mathrm{id}_L$ is the only morphism $(L, \mu) \to (L, \mu)$, $\mu f = \mathrm{id}_L$. But then

$$f\mu = (\psi\mu)(\psi f) = \psi(\mu f) = \psi(\mathrm{id}_L) = \mathrm{id}_{\psi L}.$$

This shows $\mu$ is an isomorphism with inverse $f$. Dually, if $(G, M)$ is a terminal coalgebra and $f: (\psi G, \psi M) \to (G, M)$ is the unique morphism, then $f$ is inverse to $M$. $\Box$

We have motivated the following:

3 Definition. Let $\psi: \mathbf{C} \to \mathbf{C}$ be an endofunctor of $\mathbf{C}$. The least fixed point of $\psi$ is the initial object (if it exists) of $\psi$-$\mathbf{Alg}$. The greatest fixed point of $\psi$ is the terminal object (if it exists) of $\psi$-$\mathbf{coAlg}$. In the event that $\psi$ has both a least fixed point $(L, \mu)$ and a greatest fixed point $(G, M)$, there exist unique $f, g: L \to G$ with

$$f\mu = M^{-1}(\psi f), \qquad M g = (\psi g)\mu^{-1},$$

that is, $f$ is the unique $\psi$-algebra morphism $(L, \mu) \to (G, M^{-1})$ and $g$ is the unique $\psi$-coalgebra morphism $(L, \mu^{-1}) \to (G, M)$. Noting, however, that $Mf = Mf\mu\mu^{-1} = MM^{-1}(\psi f)\mu^{-1} = (\psi f)\mu^{-1}$, we see that $f = g$. The morphism $f: L \to G$ is called the comparison map.

We underscore that Definition 3 defines what we mean by "least" and
"greatest" in the general case. There is no pre-established ordering. Note that, in general, the obvious way to attempt to generalize from $\mathbf{C}(P, \leq)$ to $\mathbf{C}$ via the relation $R$ on objects defined by $A\,R\,B$ if there exists $f: A \to B$ in $\mathbf{C}$ is not antisymmetric and is often not interesting. For example, in $\mathbf{Set}$, $A\,R\,B$ always holds unless $A$ is nonempty while $B = \emptyset$.

What makes Definition 3 important is that many data structures can be constructed as least or greatest fixed points of functors which arise naturally from recursive definitions. As will be clarified by the examples, the commutative squares in Definition 3 are not just technicalities needed to generalize from posets to categories, but rather embody in highly conceptual form a framework for recursive definition (of $f$ and of $g$). The principle here is that a recursively defined data type should automatically have the capability to define functions on (or to) it recursively.

We now develop some examples. But first, some useful notation:

4 Language Theory Notations. If $A$ is a set, we have already introduced

$$A^+ = \{a_1 \cdots a_n: n \geq 1,\ a_i \in A\} = A + (A \times A) + (A \times A \times A) + \cdots,$$
$$A^* = \{\Lambda\} + A^+ = \{a_1 \cdots a_n: n \geq 0,\ a_i \in A\}.$$

We think of elements of $A^+$, $A^*$ as "words on the alphabet $A$." If $S$ is a set of words on alphabet $A$ and $T$ is a set of words on alphabet $B$, then define their set concatenation as

$$ST = \{st: s \in S,\ t \in T\},$$

which is a set of words on the alphabet $A \cup B$. (It is not always the case that $ST \cong S \times T$. For example, if $S = \{a, ab\}$, $T = \{ab, bab\}$ then $ST = \{aab, abab, abbab\}$ has only three elements whereas $S \times T$ has four.) Note, however, that our use of the coproduct in constructing $A^+$, $A^*$ means that an element of $A^*$ belongs to $A^n$ for a unique $n$. See Exercise 2. Define

$$A^\infty = \{(a_i: i = 1, 2, 3, \ldots): \text{each } a_i \in A\}.$$

We write elements of $A^\infty$ as "infinitely long words," so that it is possible to concatenate finite words on the left. For example, $A^n A^\infty = A^\infty$ for all $n$ and $A^* A^\infty = A^\infty$, but $A^\infty A$ is not defined.

5 Example. Let $A$, $B$ be disjoint sets. Consider the specification "data structure = something in $B$ or else something in $A$ followed by data structure." To be more precise, let $\psi: \mathbf{Set} \to \mathbf{Set}$ be the polynomial functor $\psi(C) = B + (A \times C)$.
Thus, if $f: C \to D$, $\psi(f)$ maps $b$ to $b$ and maps $(a, c)$ to $(a, f(c))$:

$$\psi(f) = \mathrm{id}_B + (\mathrm{id}_A \times f).$$

While

$$A^*B = (\{\Lambda\} + A^+)B = B + A^+B = B + (A^+ \times B) \text{ (see Exercise 2)} = \psi(A^*B)$$

is a fixed point of $\psi$ (see 9 below), there is a fixed point of $\psi$ possessing infinite words. Indeed, $\psi$ has a greatest fixpoint $(A^*B + A^\infty, M)$ as follows. Define $M$ to be the isomorphism $M: A^*B + A^\infty \to B + A \times (A^*B + A^\infty)$ given by

$$M(w) = \begin{cases} b & \text{if } w = \Lambda b \in A^*B, \\ (a_1,\ a_2 \cdots a_n b) & \text{if } w = a_1 \cdots a_n b \in A^*B,\ n \geq 1, \\ (a_1,\ a_2 a_3 \cdots) & \text{if } w = a_1 a_2 a_3 \cdots \in A^\infty. \end{cases}$$

To see this, let $\Delta: C \to B + A \times C$ be an arbitrary function. We must show that there exists a unique $g: C \to A^*B + A^\infty$ with

6   $M g = (\mathrm{id}_B + \mathrm{id}_A \times g)\,\Delta.$

Clearly, 6 is equivalent to

7   If $\Delta(c) \in B$ then $g(c) = \Delta(c)$. If $\Delta(c) = \langle a_1, c_1\rangle \in A \times C$ then $Mg(c) = (a_1, g(c_1))$, that is, $g(c) = M^{-1}(a_1, g(c_1)) = a_1 g(c_1)$.

To see that 7 truly defines a unique $g$, classify $c \in C$ into two cases:

Case 1. There exists $k$ with $\Delta(c) = \langle a_1, c_1\rangle$, $\Delta(c_1) = \langle a_2, c_2\rangle$, ..., $\Delta(c_k) = \langle a_{k+1}, c_{k+1}\rangle$, $\Delta(c_{k+1}) \in B$.

Case 2. Otherwise, $\Delta(c) = \langle a_1, c_1\rangle$, $\Delta(c_1) = \langle a_2, c_2\rangle$, $\Delta(c_2) = \langle a_3, c_3\rangle$, .... (Remember: $\Delta$ is a total function.)

In case 1, $g(c) = a_1 g(c_1) = a_1 a_2 g(c_2) = \cdots = a_1 \cdots a_{k+1}\, g(c_{k+1}) = a_1 \cdots a_{k+1}\, \Delta(c_{k+1})$. In case 2, $g(c) = a_1 \cdots a_n\, g(c_n)$ for every $n$. But then $a_1 \cdots a_n$ is an initial subword of $g(c)$ for every $n$, so $g(c) = a_1 a_2 a_3 \cdots \in A^\infty$.

To apply the greatest fixed point of Example 5 we are going to look at the trace semantics of the iteration of the partial function $f: A \to A + B_0$:
By this we mean the actual sequence of values produced through the successive applications of $f$, which will thus be of the form $(a_0, a_1, \ldots, a_n, b)$ if iteration is successfully completed with result $b$; $(a_0, a_1, \ldots, a_n, \mathrm{error})$ if iteration eventually produces an $a_n$ for which $f(a_n)$ is undefined; or $(a_0, a_1, \ldots, a_n, \ldots)$ if iteration never terminates. We define $B = B_0 + \{\mathrm{error}\}$ to simplify notation. It will then be useful to have a function

$$\mathrm{last}: A^*B + A^\infty \to B$$

with $\mathrm{last}(a_1 \cdots a_n b) = b$ while $\mathrm{last}(a_1 \cdots a_n \cdots)$ is undefined, which can determine the final result in $B$ of the iteration from the string produced by the trace semantics.

We define the total function $\bar{f}: A \to A + B$ by

$$\bar{f}(a) = \begin{cases} f(a) & \text{if } a \in DD(f), \\ \mathrm{error} & \text{else.} \end{cases}$$

Such $\bar{f}$ induces a $\psi$-coalgebra structure $\Delta: A \to B + A \times A$ on $A$,

$$\Delta(a) = \begin{cases} \bar{f}(a) & \text{if } \bar{f}(a) \in B, \\ \langle \bar{f}(a), \bar{f}(a)\rangle & \text{else.} \end{cases}$$

The map $g: A \to A^*B + A^\infty$ of 6 is recursively specified as in 7 by

8   $g(a) = \begin{cases} \bar{f}(a) & \text{if } \bar{f}(a) \in B, \\ \bar{f}(a) \cdot g(\bar{f}(a)) & \text{if } \bar{f}(a) \in A. \end{cases}$

It follows that $g$ provides the trace semantics of $f$. For if the iteration halts then

$$g(a) = f(a) f^2(a) \cdots f^{n-1}(a)\, \bar{f}^n(a),$$

where $f^k(a) \in A$ for $k \leq n - 1$, and either $f^n(a) \in B_0$ or $f^n(a)$ is undefined (so that $\bar{f}^n(a) = \mathrm{error} \in B$), whereas if iteration does not halt then $g(a)$ is the infinite sequence $f(a) f^2(a) f^3(a) \cdots$.

We shall apply this trace semantics map to derive a formula for $f^\dagger$ of 3.2.24 in 14 below, after we have defined last. But first we observe that the least set of finite strings satisfying $C = B + (A \times C)$ is given by the least fixed point of our functor.

9 Example. The functor $\psi: \mathbf{Set} \to \mathbf{Set}$, $\psi(C) = B + (A \times C)$ of 5 also has a least fixed point $(A^*B, \mu)$, where $\mu$ is the isomorphism $\mu: B + (A \times A^*B) \to A^*B$,

$$\mu(w) = \begin{cases} w & \text{if } w \in B, \\ av & \text{if } w = \langle a, v\rangle \in A \times A^*B. \end{cases}$$

To see this, given a function $\delta: B + A \times C \to C$, we must show that
10   $f\mu = \delta\,(\mathrm{id}_B + \mathrm{id}_A \times f)$, where $f: A^*B \to C$,

defines $f$ uniquely. This is clear since 10 is equivalent to

11   $f(a_1 \cdots a_n b) = \begin{cases} \delta(b) & \text{if } n = 0, \\ \delta(a_1, f(a_2 \cdots a_n b)) & \text{if } n > 0. \end{cases}$

We illustrate the results obtained by defining the partial function

12   $\mathrm{first}: A^*B \to A, \qquad a_1 \cdots a_n b \mapsto a_1$ if $n > 0$, undefined else,

and the total function

13   $\mathrm{last}: A^*B \to B, \qquad a_1 \cdots a_n b \mapsto b.$

The function first is defined by unpacking the isomorphism $\mu^{-1}$, using the quasi projection $PR_2$ (3.2.6) to read items occurring in the second term of the coproduct below:

$$\mathrm{first} = A^*B \xrightarrow{\ \mu^{-1}\ } B + A \times (A^*B) \xrightarrow{\ PR_2\ } A \times (A^*B) \xrightarrow{\ p_A\ } A.$$

To define the last function, consider the unique morphism $f$ to $(B, \delta)$ for $\delta = \langle \mathrm{id}_B, p_B\rangle: B + A \times B \to B$, where $p_B(a, b) = b$, so that $f\mu = \delta\,(\mathrm{id}_B + \mathrm{id}_A \times f)$. By 11,

$$f(a_1 \cdots a_n b) = f(a_2 \cdots a_n b) = f(a_3 \cdots a_n b) = \cdots = f(b) = b = \mathrm{last}(a_1 \cdots a_n b),$$

so that $f = \mathrm{last}$.

Now let $g: A \to A^*B + A^\infty$ be the trace semantics of the partial function $f: A \to A + B_0$ as in Example 5. Then the iterate of $f$, as defined earlier in 3.2.24, is equivalently defined by

14   $f^\dagger = A \xrightarrow{\ g\ } A^*B + A^\infty \xrightarrow{\ PR_1\ } A^*B \xrightarrow{\ \mathrm{last}\ } B.$

We now also compute the comparison map for this functor $\psi$. By 3 it is that $h: A^*B \to A^*B + A^\infty$ determined by
$$h\mu = M^{-1}\,(\mathrm{id}_B + \mathrm{id}_A \times h).$$

By 11,

$$h(a_1 \cdots a_n b) = \begin{cases} M^{-1}(b) & \text{if } n = 0, \\ M^{-1}(a_1, h(a_2 \cdots a_n b)) & \text{if } n > 0, \end{cases}$$

from which we see

$$h(a_1 \cdots a_n b) = M^{-1}(a_1, M^{-1}(a_2, \ldots, M^{-1}(a_n, M^{-1}(b)) \cdots)) = a_1 \cdots a_n b,$$

and $h = \mathrm{in}_{A^*B}: A^*B \to A^*B + A^\infty$ is, as was to be expected, simply the inclusion map.

15 Example (Simple Recursion). The category $\mathbf{Srd}$ of simple recursion data of 2.2.27 has $(N, 0, s)$ as initial object, where $s(n) = n + 1$. Let us introduce the polynomial functor $\psi: \mathbf{Set} \to \mathbf{Set}$ with $\psi(X) = 1 + X$, the coproduct of $X$ with a one-element set denoted here simply as 1. An object in $\mathbf{Srd}$ is then just a $\psi$-algebra, since $\delta: \psi X \to X$ must have the form $\langle x_0, f\rangle$ with $x_0: 1 \to X$ an element of $X$, and $f: X \to X$. It is easily checked that a morphism of $\psi$-algebras is just an $\mathbf{Srd}$-morphism. In this way, the natural numbers arise as the least fixed point of the data type specification $C ::= 1 + C$, and the fixed point isomorphism unpacks to yield 0 and the successor function. Moreover, the least fixed point property embodies the principle of simple recursion. This example is really a special case of 9: let $B = 1 = A$.

16 Example (Finite Lists of Natural Numbers). The set $N^+$ of finite lists of natural numbers, considered in 10.1.1-5, arises as the least fixed point of $\psi: \mathbf{Set} \to \mathbf{Set}$, $\psi(C) = N + C \times N$. This is essentially the same as Example 9 with $A = B = N$, but we have written $N + C \times N$ instead of $N + N \times C$ to accommodate the form of the push map in 10.1.4. This gives $N^+$ the form $NN^*$ instead of $N^*N$ but, of course, these are the same. Thus, the isomorphism $\mu: N + N^+ \times N \to N^+$ is given by

$$\mu(w) = \begin{cases} w & \text{if } w \in N, \\ vn & \text{if } w = (v, n) \in N^+ \times N, \end{cases}$$

and given any total function $\delta: N + C \times N \to C$, the equation

$$f\mu = \delta\,(\mathrm{id} + f \times \mathrm{id}), \qquad f: N^+ \to C,$$

defines $f$ since

17   $f(n_1 \cdots n_k) = \begin{cases} \delta(n_1) & \text{if } k = 1, \\ \delta(f(n_1 \cdots n_{k-1}), n_k) & \text{if } k > 1. \end{cases}$

The maps last, push, and $\rho$ of 10.1.3-5 arise as follows. The last map is constructed similarly to the map of 13 by applying 17 to a $\delta$ as above, where $\delta = \langle \mathrm{id}_N, p_2\rangle$. Push is just

$$N^+ \times N \xrightarrow{\ \mathrm{in}_2\ } N + N^+ \times N \xrightarrow{\ \mu\ } N^+.$$

The "length map" $\rho: N^+ \to N$ of 10.1.5 is defined inductively by

$$\mathrm{length}(n) = 1 \quad \text{for } n \in N, \qquad \mathrm{length}(wn) = \mathrm{length}(w) + 1 = s(\mathrm{length}(w)) \quad \text{for } w \in N^+,\ n \in N.$$

But this just says that length is the $f$ induced by 17 with $\delta: N + N \times N \to N$ defined as $\langle \alpha, \beta\rangle$, where $\alpha(n) = 1$ for all $n$ and

$$\beta = N \times N \xrightarrow{\ p_1\ } N \xrightarrow{\ s\ } N.$$

The maps $\mathrm{create}_k$ and spec of 10.1.1 and 10.1.2 will be discussed in the next chapter.

18 Example (Automata Theory). An automaton is given by a set $Q$ of states, an initial state $q_0 \in Q$, an input alphabet $A$, a next-state function $\gamma: Q \times A \to Q$, and an output function $\beta: Q \to Y$. Such an automaton processes an input word $a_1 \cdots a_n \in A^*$ by beginning at the initial state $q_0$, processing the input letters from the left to pass through the sequence of states $q_1 = \gamma(q_0, a_1)$, $q_2 = \gamma(q_1, a_2)$, ..., $q_n = \gamma(q_{n-1}, a_n)$, and emitting $\beta(q_n)$ as the output in response to the input word. We now show that two different constructions, one a least fixed point and the other a greatest fixed point, may be used to describe this response function. Here $A$, $Y$ are regarded as fixed sets.

Consider the polynomial functor $\psi: \mathbf{Set} \to \mathbf{Set}$, $\psi(C) = 1 + C \times A$. This is very similar to 9, and the least fixed point of $\psi$ is $A^*$. If the unique element of 1 is called $\Lambda$, the fixpoint isomorphism $\mu: 1 + A^* \times A \to A^*$ is essentially the identity function, that is, "$\{\Lambda\} + A^+ = A^*$." A $\psi$-algebra $\delta: 1 + Q \times A \to Q$ takes the form $\langle q_0, \gamma\rangle$, where
$q_0 \in Q$ (identifying elements of $Q$ with functions from 1) is an initial state, and $\gamma: Q \times A \to Q$ is a next-state function, so that $\psi$-algebra = automaton without output function. The unique $r: A^* \to Q$ with

$$r\mu = \delta\,(\mathrm{id}_1 + r \times \mathrm{id}_A)$$

is defined by

$$r(\Lambda) = q_0, \qquad r(wa) = \gamma(r(w), a),$$

and is precisely the final state reached from $q_0$ in processing $w$. We call $r$ the reachability map of the automaton. The input/output response function with initial state $q_0$ is, then,

$$A^* \xrightarrow{\ r\ } Q \xrightarrow{\ \beta\ } Y.$$

We now turn to a greatest fixed point construction. Let $C$, $D$ be sets. Let $[C \to D]$ denote the set of all functions from $C$ to $D$, that is, $[C \to D] = \mathbf{Set}(C, D)$.

19 Observation. $[C \to (-)]: \mathbf{Set} \to \mathbf{Set}$ is a functor, where for $f: D_1 \to D_2$,

$$[C \to f]: [C \to D_1] \to [C \to D_2], \qquad (C \xrightarrow{\ g\ } D_1) \mapsto (C \xrightarrow{\ g\ } D_1 \xrightarrow{\ f\ } D_2).$$

Thus, $[C \to f] = f \circ -$. Functoriality is easy: if $f = \mathrm{id}_D$ then $g \mapsto \mathrm{id}_D\, g = g$, so $[C \to f] = \mathrm{id}_{[C \to D]}$; and, if $f: D_1 \to D_2$, $f': D_2 \to D_3$,

$$[C \to f'f](g) = (f'f)g = f'(fg) = [C \to f']([C \to f](g)).$$

Note that $[C \to (-)]$ is a polynomial functor if $C$ is finite (for then $[C \to D] = D \times \cdots \times D$, $n$ times if $C$ has $n$ elements), but this seems doubtful otherwise.

Above, we saw that the input/output response function of an automaton is of the form $A^* \to Y$. To motivate what follows, given an automaton introduce the notation $h_q$ for the response function if the initial state were $q$. Thus, the actual response is $h_{q_0}$. We immediately observe that

$$h_q(\Lambda) = \beta(q),$$

while

$$h_{\gamma(q,a)}(w) = h_q(aw).$$

These equations may be expressed in the form

$$K(h_q) = (a \mapsto h_{\gamma(q,a)},\ \beta(q))$$

if
$K: [A^* \to Y] \to [A \to [A^* \to Y]] \times Y$ is defined by

$$K(h) = (S, h(\Lambda)), \qquad \text{where } S: A \to [A^* \to Y],\ a \mapsto (w \mapsto h(aw)).$$

But such $K$ has the general appearance of a coalgebra! To formalize this, define $\psi: \mathbf{Set} \to \mathbf{Set}$ by $\psi = [A \to (-)] \times Y$. Thus, $\psi(C) = [A \to C] \times Y$, and if $f: C \to D$, $\psi(f)(\alpha, y) = (f\alpha, y)$. We now show that $\psi$ has greatest fixed point $([A^* \to Y], K)$. Let $\Delta: Q \to [A \to Q] \times Y$ be a $\psi$-coalgebra. We must show that

20   $K\sigma = ([A \to \sigma] \times \mathrm{id}_Y)\,\Delta$, where $\sigma: Q \to [A^* \to Y]$,

defines $\sigma$ uniquely. For each $q \in Q$, $\Delta(q)$ has the form $(\gamma_q, \beta_q)$ with $\gamma_q: A \to Q$, $\beta_q \in Y$. Chasing $q$ to $[A \to [A^* \to Y]] \times Y$ in the two ways shown in 20 leads to two equations, one for each coordinate. These are, with the $Y$-coordinate shown first,

21   $\sigma(q)(\Lambda) = \beta_q, \qquad \sigma(q)(aw) = \sigma(\gamma_q(a))(w).$

Thus, $\sigma(q)$ is defined on $\Lambda$ for all $q$, and $\sigma(q)$ is defined on all words of length $n + 1$ for all $q$ providing $\sigma(q)$ is defined on all words of length $n$ for all $q$, so 21 is an inductive definition of $\sigma$.

But more is true! A $\psi$-coalgebra is just a way of coding an automaton without initial state. The formal principle involved is the following:

22   If $C$, $D$, $E$ are sets then $f(c, d) = g(c)(d)$ describes both directions of a bijection $[C \times D \to E] \cong [C \to [D \to E]]$.

The proof of 22 is obvious. But then given $\Delta: Q \to [A \to Q] \times Y$, first $\Delta = \langle \tilde{\gamma}, \beta\rangle$ with $\tilde{\gamma}: Q \to [A \to Q]$, $\beta: Q \to Y$. Then, by 22, such $\tilde{\gamma}$ are in bijective correspondence with functions $\gamma: Q \times A \to Q$, so that $\Delta$ is a way of coding $(\gamma, \beta)$ = automaton without initial state. From this perspective, 21 becomes

23   $\sigma(q)(\Lambda) = \beta(q), \qquad \sigma(q)(aw) = \sigma(\gamma(q, a))(w),$

from which it follows at once that $\sigma(q): A^* \to Y$ is the response of the automaton with $\gamma: Q \times A \to Q$, $\beta: Q \to Y$ and initial state $q$. Such $\sigma$ is called the observability map.

The reachability and observability maps are important in automata theory. An automaton is reachable if $r$ is surjective, so that every state can be reached from the initial state and so "every state is necessary." An automaton is observable if $\sigma$ is injective. This says that whenever two states are different, their input/output responses differ on at least one input word. A fundamental result of automata theory is that given arbitrary $f: A^* \to Y$ there exists a reachable and observable automaton whose input/output response is $f$ and, moreover, it is unique up to isomorphism.

EXERCISES FOR SECTION 10.2

1. Specialize the proof of Theorem 2 to provide an alternate approach to Exercise 6.2.4.

2. Let $S$, $T$ be disjoint subsets of $S^*$. Show that $ST \cong S \times T$. How is $\{\Lambda\} + B + (B \times B) + (B \times B \times B) + \cdots$ different from $\{\Lambda\} \cup B \cup BB \cup BBB \cup \cdots$ if $B$ has form $A^*$?

3. Specialize the results of this section to compute the least and greatest fixed points and their comparison map for the polynomial functor $\psi: \mathbf{Set} \to \mathbf{Set}$, $\psi(C) = B + C$.

4. The polynomial functor of Exercise 3 is much less interesting on a poset. Let $(P, \leq)$ be a poset with greatest element 1 and assume that $b \vee x$ exists for all $x$. Define $\psi: P \to P$ by $\psi(x) = b \vee x$. Show that $\psi$ has $b$ as least fixed point and has 1 as greatest fixed point.

5. Using the structure of $A^*B$ as a least fixed point as in 9, define

$$A^*B \to A, \qquad a_1 \cdots a_n b \mapsto \begin{cases} a_2 & \text{if } n > 1, \\ \bot & \text{else,} \end{cases}$$

analogously to the definition of first and last in 12 and 13.

6. In any category $\mathbf{C}$, given [diagram: a square whose vertical morphisms are $a$ and $b$] with $a$, $b$ isomorphisms, prove that [diagram: the same square with $a^{-1}$, $b^{-1}$ in place of $a$, $b$]. Conclude that in the context of 3, if there exists a $\psi$-algebra morphism $(G, M^{-1}) \to (L, \mu)$ then the comparison map is an isomorphism.
7. Let $\phi, \psi: \mathbf{C} \to \mathbf{C}$ be functors and let $\eta: \phi \to \psi$ be a natural equivalence (Exercise 10.1.5). Let $(A, \delta)$ be a $\psi$-algebra. Show that $(A, \delta)$ is a least fixed point of $\psi$ if and only if $(A, \delta \cdot \eta A)$ is a least fixed point of $\phi$. State the dual result for greatest fixed points (hence, of course, no proof is needed).

8. Let $E$ be an object in $\mathbf{C}$ and let $+$, $\oplus$ both assign to each two $\mathbf{C}$-objects $A$, $B$ definite coproducts. Show that the endofunctors $\phi(C) = E + C$, $\psi(C) = E \oplus C$ are isomorphic. More generally (thereby raising a technical problem previously swept under the rug), show by induction that up to isomorphism of functors (Exercise 10.1.5) it does not matter which products and coproducts are chosen for a given polynomial functor. The resulting least and greatest fixed points are then also unique up to isomorphism by Exercise 7.

Notes and References for Chapter 10

Functors and natural transformations between them play a central role in the founding paper of category theory [S. Eilenberg and S. Mac Lane, "General theory of natural equivalences," Transactions of the American Mathematical Society, 58, 1945, pp. 231-294]. Quoting from the introduction of the book of Freyd cited in the notes to Chapter 2:

"It is not too misleading, at least historically, to say that categories are what one must define in order to define functors, and that functors are what one must define in order to define natural transformations."

The reader may wish to look up "representable functors" and the "Yoneda lemma" about them; Exercise 10.1.8 is a special case. For more on functors see the references cited in Chapter 2.

The functorial iterate of Exercise 10.1.11 was introduced by the authors in their paper cited in the Chapter 3 notes. The Elgot iteration equation was studied earlier by C. C. Elgot. (See his paper cited in the notes to Chapter 3.)

In the 1970s a number of workers including G. Plotkin, M. B. Smyth, and M.
Wand considered least fixed points of functors in connection with the recursive specification of data types, both in $\mathbf{Set}$ and in categories of domains such as those considered in Chapter 13 [see D. J. Lehmann and M. B. Smyth, "Algebraic specification of data types," Mathematical Systems Theory, 14, 1981, pp. 97-139]. Much of this work had been done earlier in a more abstract setting. Theorem 10.2.2 appears in M. Barr, "Coequalizers and free triples," Mathematische Zeitschrift, 116, 1970, pp. 307-322, where the result is attributed to Lambek. Endofunctors of $\mathbf{Set}$ were the object of intensive study by a number of mathematicians in Prague including J. Adámek, V. Koubek, and V. Trnková in a series of papers beginning in 1971; see J. Adámek and V. Koubek, "Remarks on Fixed Points of Functors," in Lecture Notes in Computer Science, Vol. 56, Springer-Verlag, 1977, pp. 199-205.

The use of greatest fixed points to specify data types is due to the authors in "Parametrized data types do not need highly constrained parameters," Information and Control, 52, 1982, pp. 139-158.

For a category-theoretic treatment of reachability and observability see the authors' paper "Adjoint machines, state-behavior machines and duality," Journal of Pure and Applied Algebra, 6, 1975, pp. 313-344.
CHAPTER 11

Recursive Specification of Data Types

11.1 From Least Upper Bounds to Least Fixed Points
11.2 Co-continuous Functors
11.3 Continuous Functors and Greatest Fixed Points

Just as posets generalize to categories, the Kleene fixed point theorem of 6.2.13 generalizes to provide a "co-continuous" endofunctor $\psi: \mathbf{C} \to \mathbf{C}$ with a least fixed point as the "colimit" of the "right chain" $\bot \to \psi(\bot) \to \psi^2(\bot) \to \psi^3(\bot) \to \cdots$. The details are developed in the first two sections. The dual theory is used to get greatest fixed points in Section 3. In 11.3.18 we introduce the notations $C ::= \psi(C)$ and $C =:: \psi(C)$ for the least and greatest fixed points of $\psi$, respectively. These exist for the wide class of "polynomial" functors $\mathbf{Set} \to \mathbf{Set}$ and in many other cases as well.

11.1 From Least Upper Bounds to Least Fixed Points

In 10.1.12 we observed, for the polynomial functor $\psi: \mathbf{Set} \to \mathbf{Set}$ defined by $\psi(C) = N + C \times N$, that

1   $\bot \to \psi(\bot) \to \psi^2(\bot) \to \psi^3(\bot) \to \cdots$

is, up to isomorphism, the ascending chain of sets

2   $\emptyset \subset N \subset N + N^2 \subset N + N^2 + N^3 \subset \cdots$

whose union $N^+$ arises, as we have seen in 10.2.16, as the least fixed point of $\psi$.

We have not been very precise about what it might mean for 1 and 2 to "coincide up to isomorphism." Moreover, to seek the proper level of generality for our theory we should like, at least at first, to consider 1 for a wide class of functors $\psi: \mathbf{C} \to \mathbf{C}$ for $\mathbf{C}$ a category with initial object, but not necessarily $\mathbf{Set}$. In an arbitrary category, what do the "inclusions" mentioned in 2 mean? Our approach, however, is to avoid answering such questions (which we regard as red herrings, though answerable), preferring to attack 1 directly through a "colimit" construction. While such exists in many categories, we pay special attention in this section to the details in $\mathbf{Set}$.

We turn to defining colimits. As in the previous chapter, we proceed by finding the appropriate concept in a poset $(P, \leq)$, lifting it to $\mathbf{C}(P, \leq)$, and then generalizing it to an arbitrary category $\mathbf{C}$. It is clear that an ascending chain $x_0 \leq x_1 \leq x_2 \leq \cdots$ generalizes to a chain of morphisms $C_0 \to C_1 \to C_2 \to \cdots$. A least upper bound is an element $x$ of $P$ such that not only is $x_n \leq x$ for all $n$, but if also $x_n \leq y$ for all $n$, then $x \leq y$. Replacing each $\leq$ by $\to$ motivates the following definitions.

3 Definitions. Let $\mathbf{C}$ be a category. A right chain in $\mathbf{C}$ is an arbitrary sequence $(c_n \mid n \geq 0)$ of morphisms of the form

$$C_0 \xrightarrow{\ c_0\ } C_1 \xrightarrow{\ c_1\ } C_2 \xrightarrow{\ c_2\ } \cdots.$$

(The term "right" makes reference to the convention that morphisms are usually written with the domain on the left and the codomain on the right. Thus, a right chain proceeds to infinity toward the right, whereas the "left chain" to be introduced in 11.3.1 comes in from infinity from the left, $\cdots \to C_3 \to C_2 \to C_1 \to C_0$.) An upper bound of a right chain $(c_n)$ as above is a pair $(U, \alpha)$ with $\alpha = (\alpha_n \mid n \geq 0)$, $\alpha_n: C_n \to U$, such that

4   $\alpha_{n+1}\, c_n = \alpha_n \qquad (n \geq 0).$

For a fixed chain $(c_n)$ we may form a category whose objects are the upper bounds of $(c_n)$ and in which a morphism $f: (U, \alpha) \to (V, \beta)$ is a $\mathbf{C}$-morphism $f: U \to V$ such that

5   $f\,\alpha_n = \beta_n \qquad (n \geq 0).$

Composition and identities are as in $\mathbf{C}$.
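That this makes sense is clear from a glance at the diagrams [diagrams: the identity $\mathrm{id}_U: U \to U$ and a composite $U \xrightarrow{\ f\ } V \xrightarrow{\ g\ } W$, each drawn under the maps from $C_n$].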
As in 10.2.3, we then use an initial object to generalize "least": a colimit of $(c_n)$ is an initial object in the category of upper bounds of $(c_n)$. Thus, if $(U, \alpha)$ is a colimit of $(c_n)$ and $(V, \beta)$ is an arbitrary upper bound of $(c_n)$, there exists a unique $f$ as in 5. If $(V, \beta)$ is also a colimit, this $f$ is an isomorphism in $\mathbf{C}$ as well as in the category of upper bounds of $(c_n)$. The category $\mathbf{C}$ has colimits of right chains if every right chain has a colimit.

Colimits of right chains are dual to limits of left chains, as discussed in Section 11.3. Although the decision as to "which is the original and which is the co" is arbitrary, this terminology is well established in category theory. We now check that colimits of right chains do truly generalize least upper bounds of ascending chains in a poset.

6 Example (Least Upper Bounds in a Poset). Let $\mathbf{C} = \mathbf{C}(P, \leq)$ be a poset considered as a category as in Section 2.1. Then a right chain coincides with the notion of ascending chain $C_0 \leq C_1 \leq C_2 \leq \cdots$. If $U$ is an upper bound of $(C_n)$ in the sense of the theory of posets, that is, if $U \in P$ with $C_n \leq U$ for all $n$, then using $\alpha_n$ to name the unique $C_n \to U$, we have that $(U, \alpha)$ is an upper bound of $(C_n)$ in the sense of 3, because all diagrams commute in a poset, those of 4 in particular. Pursuing the same reasoning a little further, $U = \bigvee C_n$ if and only if $U$ is the colimit of $(C_n)$ (and, for posets, colimits are unique, not just up to isomorphism).

7 Example (Colimits of Right Chains in Set). While somewhat involved, this construction is basic and deserves the reader's detailed attention. Let

$$C_0 \xrightarrow{\ c_0\ } C_1 \xrightarrow{\ c_1\ } C_2 \xrightarrow{\ c_2\ } \cdots$$

be a right chain in $\mathbf{Set}$, so that each $c_n: C_n \to C_{n+1}$ is a total function. We will construct a colimit $(U, \alpha)$ of $(c_n)$ explicitly. The idea is a simple one: think of the functions $c_n$ as "inclusions" and "take the union." The analogy falters a little because the $c_n$ are not required to be injective.
The more precise way of thinking of "$C_n \subset U$" is that, for each $n$, an $x \in C_n$ is represented by the "ultimate value" of the sequence

8  $x,\ c_n(x),\ c_{n+1}c_n(x),\ c_{n+2}c_{n+1}c_n(x),\ \ldots$

(which is the constant sequence $x$ when the $c_n$ are inclusions). Since the $C_n$ are sets without further structure there is no reasonable sense in which 8 should "converge," so we let the "ultimate value" be represented by the sequence itself. But we will regard two sequences as "having the same ultimate value" if they agree on all but finitely many values. Here, then, are the precise definitions. For $m \le n$ let $c_{mn}\colon C_m \to C_n$ be the composition $c_{n-1}c_{n-2}\cdots c_m$ ($= \mathrm{id}$ if $m = n$). Set

$A = \coprod (C_n : n \ge 0) = \{(n, x) : x \in C_n,\ n \ge 0\}.$

Define an equivalence relation $\sim$ on $A$ by

9  $(n, x) \sim (m, y)$ if and only if $c_{nk}(x) = c_{mk}(y)$ for some $k \ge m, n$.

That $\sim$ is reflexive and symmetric is obvious. That $\sim$ is transitive is easy since if $c_{nk}(x) = c_{mk}(y)$ and $c_{ml}(y) = c_{tl}(z)$, choose $u \ge k, l$ so that

$c_{nu}(x) = c_{ku}c_{nk}(x) = c_{ku}c_{mk}(y) = c_{mu}(y) = c_{lu}c_{ml}(y) = c_{lu}c_{tl}(z) = c_{tu}(z).$

Define $U$ to be the set of $\sim$-equivalence classes of $A$, so that $U = \{[n, x] : x \in C_n\}$, where $[n, x]$ is the equivalence class of $(n, x)$, the set of all $(m, y)$ with $(n, x) \sim (m, y)$. Thus, $[n, x] = [m, y] \Leftrightarrow (n, x) \sim (m, y)$. Since $[n, x] = [n+1, c_n(x)] = [n+2, c_{n+1}c_n(x)] = \cdots$, this captures the intuitive context of 8. Define

10  $\alpha_n\colon C_n \to U, \quad x \mapsto [n, x].$

We must show that $(U, \alpha)$ is an upper bound of $(c_n)$, that is, $\alpha_{n+1}c_n = \alpha_n$. But if $x \in C_n$, $\alpha_{n+1}(c_n(x)) = [n+1, c_n(x)] = [n, x] = \alpha_n(x)$. Finally, we must show that if $(V, \beta)$ is another upper bound then

11  $[n, x] \mapsto \beta_n(x)$

is the unique function $f$ with $f\alpha_n = \beta_n$ $(n \ge 0)$.

First of all, is 11 well-defined, that is, if $[n, x] = [m, y]$ is $\beta_n(x) = \beta_m(y)$? Well, as $(n, x) \sim (m, y)$ there exists $k \ge m, n$ with $c_{nk}(x) = c_{mk}(y)$. But as $(V, \beta)$ is an upper bound we have $\beta_k c_{nk} = \beta_n$ and $\beta_k c_{mk} = \beta_m$,
so that $\beta_n(x) = \beta_k c_{nk}(x) = \beta_k c_{mk}(y) = \beta_m(y)$. It is then obvious that $f\alpha_n = \beta_n$. Furthermore, if also $g\colon U \to V$ satisfies $g\alpha_n = \beta_n$ then $g[n, x] = g\alpha_n(x) = \beta_n(x) = f[n, x]$, so $g = f$.

12 Observation. Let the right chain $(c_n)$ in a category C have colimit $(U, \alpha)$ and let $f\colon U \to V$ be any C-morphism. Then $(V, f\alpha)$ is an upper bound of $(c_n)$, where $(f\alpha)_n = f\alpha_n$, as is immediate from $f\alpha_{n+1}c_n = f\alpha_n$. It follows that if $g\colon U \to V$ satisfies $g\alpha_n = f\alpha_n$ for all $n$, then $g = f$.

We have seen that posets generalize to categories, and monotone maps generalize to functors. In the next section, we shall see that continuous maps generalize to co-continuous functors, and that a generalized Kleene fixed point theorem is available for co-continuous functors in a suitable category C (11.2.13). We shall also prove that every polynomial functor Set $\to$ Set is co-continuous (11.2.12). As an immediate corollary, we then have:

13 Generalized Kleene Fixed Point Theorem for Polynomial Functors. Every polynomial functor $\psi\colon \mathbf{Set} \to \mathbf{Set}$ has as least fixed point the colimit $(L, \alpha)$ for the right chain

$\bot \xrightarrow{\ !\ } \psi(\bot) \xrightarrow{\psi(!)} \psi^2(\bot) \xrightarrow{\psi^2(!)} \psi^3(\bot) \to \cdots.$

Since this result can be applied without study of the proof, we now provide examples for readers who do not wish to study the technical details of Section 11.2.

14 Finite Lists of Natural Numbers. We now reopen the discussion of 10.2.16. There we showed directly that the polynomial functor $\psi\colon \mathbf{Set} \to \mathbf{Set}$, $\psi(C) = N + C \times N$ has a least fixed point of the form $(N^+, \mu)$. But we have yet to see how to define the maps $\mathrm{create}_k$ and spec of 10.1.1 and 10.1.2, and this we do now. By Theorem 13, $(N^+, \alpha)$ is a colimit of $\psi^n(!)$ for appropriate $\alpha$. By the discussion of 10.1.10 we may think of $\alpha_k$ as the inclusion of $N + \cdots + N^k$ in $N^+$. But then we have

15

To define spec, first define for each $k$ the maps
16  $\mathrm{spec}_k\colon N \times N^k \to N, \quad (m, (n_1, \ldots, n_k)) \mapsto \begin{cases} n_m & \text{if } 1 \le m \le k \\ \bot & \text{else.} \end{cases}$

If we are to play fair in defining $\mathrm{spec}_k$ (and, eventually, spec), we should only use properties of $N$ that we can deduce by creating $N$ as in 10.2.15. There, a least fixed point isomorphism decomposed $N$ as the coproduct

17  $1 \to N \leftarrow N.$

But if $C = A + B$ and $B = D + E$ we should have $C = A + D + E$. This is true, but we need to be more precise. Thus, if

$A \xrightarrow{\ t\ } C \xleftarrow{\ u\ } B$ and $D \xrightarrow{\ v\ } B \xleftarrow{\ w\ } E$

are coproducts, claim that

$t\colon A \to C, \quad uv\colon D \to C, \quad uw\colon E \to C$

is a coproduct. For, given $f_A\colon A \to F$, $f_D\colon D \to F$, $f_E\colon E \to F$, define $f_B$ by $f_B v = f_D$, $f_B w = f_E$ and then $f$ by $ft = f_A$, $fu = f_B$. Then $ft = f_A$, $fuv = f_B v = f_D$, $fuw = f_B w = f_E$. Furthermore, if $gt = f_A$, $guv = f_D$, $guw = f_E$ then $gu = f_B$ so that $g = f$. Applying this to 17 we have that

[Diagram: the three injections $1 \to N$, $1 \to N$, $N \to N$.]

is a ternary coproduct. Applying very similar ideas, for any $k \ge 1$,
is a $(k - 1)$-ary coproduct. By the distributive law 10.1.9, $N \times N^k$ decomposes as a $(k + 1)$-ary coproduct and we may regard this as a coproduct in Pfn by 2.3.18. Thus, $\mathrm{spec}_k$ is defined in Pfn by

18  [Diagram: $\mathrm{spec}_k\colon N \times N^k \to N$ is the coproduct-induced map whose $i$th component is $\bot$ if $i = 0$ and $p_i$ if $1 \le i \le k$.]

where $p_i(n_1, \ldots, n_k) = n_i$. Applying the polynomial functor $N \times -$ to the colimit $N^+$ of $N^k \subset N^{k+1}$ we see that $N \times N^k \to N \times N^{k+1}$ has colimit $N \times N^+$, at least in Set. Our objective is to define $\mathrm{spec}\colon N \times N^+ \to N$ by the colimit property, so we had best pause for the following:

19 Observation. If $(U, \alpha)$ is a colimit of $(c_n)$ in Set then $(U, \alpha)$ is a colimit of $(c_n)$ in Pfn too.

PROOF: Let $(V, \beta)$ be an upper bound of $(c_n)$ in Pfn. Thus, $\beta_{n+1}c_n = \beta_n$, where the $c_n$ are total functions and the $\beta_n$ are partial. Define $W = V + \{*\}$ and define total functions $\gamma_n\colon C_n \to W$ by

$\gamma_n(x) = \begin{cases} \beta_n(x) & \text{if } x \in DD(\beta_n) \\ * & \text{else.} \end{cases}$

Then $\gamma_{n+1}c_n = \gamma_n$ in Set, so there exists unique $f_0\colon U \to W$ with $f_0\alpha_n = \gamma_n$. Define $f\colon U \to V$ by
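$f(u) = \begin{cases} f_0(u) & \text{if } f_0(u) \ne * \\ \bot & \text{else.} \end{cases}$

Then $f\alpha_n = \beta_n$. Moreover, if $g\alpha_n = \beta_n$ in Pfn then $g_0\alpha_n = \gamma_n$ in Set if we define $g_0(u) = *$ when $u \notin DD(g)$, $g_0(u) = g(u)$ else, so that $g_0 = f_0$ and hence $g = f$. □

To conclude the definition of spec, observe that the $\mathrm{spec}_k$ are compatible with the chain maps $N \times N^k \to N \times N^{k+1}$ in Pfn, so that $(N, (\mathrm{spec}_k \mid k \ge 0))$ is an upper bound, inducing the desired map $\mathrm{spec}\colon N \times N^+ \to N$ from the colimit.

20 Example (s-Expressions). In the programming language LISP, an s-expression is either an atom or a pair of s-expressions. This recursive specification corresponds to the polynomial functor $\psi\colon \mathbf{Set} \to \mathbf{Set}$, $\psi(C) = A + C \times C$, $A$ being the set of "atoms". By Theorem 13, the least fixed point $L$ is obtained as the colimit of $\psi^n(\emptyset)$. We compute

$\psi(\emptyset) = A$
$\psi^2(\emptyset) = A + A \times A$
$\psi^3(\emptyset) = A + (A + (A \times A)) \times (A + (A \times A))$
$\qquad = A + A \times A + A \times (A \times A) + (A \times A) \times A + (A \times A) \times (A \times A)$

so that $L$ is the set of all binary trees with labels only at the leaves, where each label is an element of $A$. Each s-expression that is not an atom decomposes into two trees, a "head" and a "tail". In LISP notation, the head function is denoted CAR, and the tail function CDR. The fixed point isomorphism

$\mu\colon A + L \times L \cong L$

uncouples to give the inclusion $\mu\,\mathrm{in}_1$ of the atoms, and the (partial) CAR and CDR functions:

[Diagram: $\mathrm{CAR} = p_1 \cdot PR \cdot \mu^{-1}$ and $\mathrm{CDR} = p_2 \cdot PR \cdot \mu^{-1}$, where $\mu^{-1}\colon L \to A + (L \times L)$, $PR\colon A + (L \times L) \to L \times L$ is the partial projection onto the second summand, and $p_1, p_2\colon L \times L \to L$ are the product projections.]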
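The computation of Example 20 can be tried out for a small finite set of atoms. The following Python fragment is only a sketch under stated assumptions: atoms are encoded as strings and pairs as 2-tuples (so that the union models the coproduct), `None` stands for "undefined", and the names `psi`, `stage`, `car` and `cdr` are hypothetical helpers rather than notation from the text. Because the chain maps are inclusions in this encoding, the colimit is simply the union of the stages.

```python
from itertools import product

def psi(atoms, c):
    """psi(C) = A + C x C. Atoms are strings and pairs are 2-tuples,
    so the union below is disjoint and models the coproduct."""
    return frozenset(atoms) | frozenset(product(c, repeat=2))

def stage(atoms, n):
    """psi^n applied to the empty set: the n-th object of the right chain."""
    c = frozenset()
    for _ in range(n):
        c = psi(atoms, c)
    return c

def car(s):
    """Partial head function; None stands for 'undefined' on atoms."""
    return s[0] if isinstance(s, tuple) else None

def cdr(s):
    """Partial tail function; None stands for 'undefined' on atoms."""
    return s[1] if isinstance(s, tuple) else None

A = {"a", "b"}
sizes = [len(stage(A, n)) for n in range(4)]  # cardinalities of psi^n(empty set)
tree = ("a", ("b", "a"))                      # a binary tree with atoms at the leaves
```

With two atoms the stages have 0, 2, 6 and 38 elements, matching $|A|$, $|A| + |A|^2$ and $|A| + (|A| + |A|^2)^2$, and each stage is contained in the next.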
EXERCISES FOR SECTION 11.1

1. Taking note of 2.3.18 show that 19 generalizes to Mfn and ANMfn.

2. State more precisely and prove: In any category, the colimit of the right chain $C \to C \to C \to C \to \cdots$ exists and is $C$ itself.

3. Let $X = \{1, 2, 3, \ldots\}$ and let $f(x) = 2x$. Consider the colimit of $X \xrightarrow{f} X \xrightarrow{f} X \xrightarrow{f} X \to \cdots$ as constructed by 7. Characterize the equivalence relation $(m, x) \sim (n, y)$ in terms of the prime factorizations of $x$, $y$ and show that there are infinitely many equivalence classes in the colimit.

4. Let $X = \{0, 1, \ldots, 9\}$. Let $d_i \in X$ be the $i$th digit after the decimal point in the expansion of $\pi$, $d_1 = 1$, $d_2 = 4$, $d_3 = 1$, $d_4 = 5$, $d_5 = 9$, .... Define $f_i\colon X \to X$ by

$f_i(d) = \begin{cases} d & \text{if } d \ge d_i \\ 0 & \text{if } d < d_i. \end{cases}$

Is the colimit of $X \xrightarrow{f_1} X \xrightarrow{f_2} X \xrightarrow{f_3} X \to \cdots$ finite or infinite?

11.2 Co-continuous Functors

We now define "co-continuous functors" to generalize continuous maps between domains and prove that all polynomial functors Set $\to$ Set are co-continuous. We shall then prove the generalized Kleene fixed point theorem promised in the previous section.

1 Observation. Let $\psi\colon \mathbf{C} \to \mathbf{D}$ be a functor. Then if

$(c_n) = C_0 \xrightarrow{c_0} C_1 \xrightarrow{c_1} C_2 \xrightarrow{c_2} C_3 \to \cdots$

is a right chain in C,

$(\psi c_n) = \psi C_0 \xrightarrow{\psi c_0} \psi C_1 \xrightarrow{\psi c_1} \psi C_2 \xrightarrow{\psi c_2} \psi C_3 \to \cdots$

is a right chain in D. Furthermore, if $(U, \alpha)$ is any upper bound of $(c_n)$ then $(\psi U, \psi\alpha)$, where $(\psi\alpha)_n = \psi\alpha_n$, is an upper bound of $(\psi c_n)$, since the functoriality of $\psi$ yields

$(\psi\alpha_{n+1})(\psi c_n) = \psi(\alpha_{n+1}c_n) = \psi\alpha_n.$

Thus, any functor preserves right chains and upper bounds of right chains.
2 Definition. A functor $\psi\colon \mathbf{C} \to \mathbf{D}$ is co-continuous if whenever $c_n\colon C_n \to C_{n+1}$ is a right chain in C and $(U, \alpha)$ is a colimit for $(c_n)$ then $(\psi U, \psi\alpha)$ as in 1, automatically an upper bound for $(\psi c_n)$, is a colimit for $(\psi c_n)$.

3 Example. By 11.1.6, if C and D are domains regarded as categories then a continuous map is just a co-continuous functor. (The awkwardness of the terminology is unfortunate; the dual notion of "continuous functor" will be introduced in 11.3.7 in connection with greatest fixed points, and here the terminology in category theory is too standard to change; and the dual notion to continuous maps of posets seems not to play a role in the order semantics of control structures owing, for example, to the failure of Pfn(A, B) to be a "codomain.") Definition 2 generalizes, in particular, to arbitrary posets as opposed to domains.

4 Example. Let $\psi\colon \mathbf{Set} \to \mathbf{Set}$ be a functor such that $\psi(A) = 2^A$ (two examples are given in 10.1.24 and 10.1.25). Then $\psi$ is not co-continuous. For the right chain

$\{0\} \subset \{0, 1\} \subset \{0, 1, 2\} \subset \cdots$

has $(N, \alpha)$ as colimit where $\alpha_n(x) = x$, as is clear from 11.1.7. Again applying 11.1.7 to construct the colimit of

$\psi\{0\} \to \psi\{0, 1\} \to \psi\{0, 1, 2\} \to \cdots,$

even though the functions $\psi\{0, \ldots, n\} \to \psi\{0, \ldots, n+1\}$ are not definite, we can still assert that the colimit is obtained by partitioning a countable union of finite sets into equivalence classes, and so is a countable set (see Exercise 1). Since $\psi(N) = 2^N$ is not countable (see Exercise 10.1.9), the argument is complete.

5 Theorem. The constant functor $D\colon \mathbf{C} \to \mathbf{D}$ of 10.1.16 is co-continuous.

PROOF: If $\alpha_n = \mathrm{id}_D$, $(D, \alpha)$ is an upper bound of

$D \xrightarrow{\mathrm{id}} D \xrightarrow{\mathrm{id}} D \xrightarrow{\mathrm{id}} \cdots$

and is, in fact, a colimit since if $(V, \beta)$ is another upper bound then $\beta_{n+1}\,\mathrm{id} = \beta_n$ shows all $\beta_n$ have a common value; hence, $f\alpha_n = \beta_n$ has a unique solution $f = \beta_0$. □
6 Theorem. The identity functor $\mathrm{id}_{\mathbf{C}}\colon \mathbf{C} \to \mathbf{C}$ is co-continuous.

7 Theorem. Let D have finite coproducts and let $\psi_1, \ldots, \psi_k\colon \mathbf{C} \to \mathbf{D}$ be co-continuous. Then their coproduct $\psi_1 + \cdots + \psi_k\colon \mathbf{C} \to \mathbf{D}$ as in 10.1.19 is co-continuous.

PROOF. Let $(c_n)$ be a chain in C with colimit $(U, \alpha)$ and let $(D, \beta)$ be an upper bound of $(\psi_1 + \cdots + \psi_k)(c_n)$. Consult the diagram

[Diagram: the chain map $\psi_1 c_n + \cdots + \psi_k c_n\colon \psi_1 C_n + \cdots + \psi_k C_n \to \psi_1 C_{n+1} + \cdots + \psi_k C_{n+1}$, the maps $\psi_1\alpha_n + \cdots + \psi_k\alpha_n$ into $\psi_1 U + \cdots + \psi_k U$, the maps $\beta_n$, $\beta_{n+1}$ into $D$, and the injections $u_i\colon \psi_i C_n \to \psi_1 C_n + \cdots + \psi_k C_n$, $v_i\colon \psi_i C_{n+1} \to \psi_1 C_{n+1} + \cdots + \psi_k C_{n+1}$, $t_i\colon \psi_i U \to \psi_1 U + \cdots + \psi_k U$.]

where $1 \le i \le k$ and $t_i$, $u_i$, $v_i$ are coproduct injections. We must show there exists unique $f$ with $f(\psi_1\alpha_n + \cdots + \psi_k\alpha_n) = \beta_n$ for all $n \ge 0$. As the diagram shows, for each $i$, $(D, \beta u_i)$, with $(\beta u_i)_n = \beta_n u_i$, is an upper bound of $(\psi_i c_n)$ so that, as $\psi_i$ is co-continuous, there exists unique $f_i$ with $f_i(\psi_i\alpha_n) = \beta_n u_i$ for all $n$. By the definition of a coproduct, there exists unique $f$ with $ft_i = f_i$. Then for each $i$,

$f(\psi_1\alpha_n + \cdots + \psi_k\alpha_n)u_i = ft_i(\psi_i\alpha_n) = f_i(\psi_i\alpha_n) = \beta_n u_i$

so that, by the uniqueness property of coproduct-induced maps, $f(\psi_1\alpha_n + \cdots + \psi_k\alpha_n) = \beta_n$. Now suppose $g(\psi_1\alpha_n + \cdots + \psi_k\alpha_n) = \beta_n$ for all $n$. Then for each $i$,

$gt_i(\psi_i\alpha_n) = g(\psi_1\alpha_n + \cdots + \psi_k\alpha_n)u_i = \beta_n u_i$

so that, recalling the uniqueness proviso in the definition of $f_i$, $gt_i = f_i$. But then $g = f$. □

8 Theorem. Let $\psi_1\colon \mathbf{C} \to \mathbf{D}$, $\psi_2\colon \mathbf{D} \to \mathbf{E}$ be co-continuous. Then their composition $\psi_2\psi_1\colon \mathbf{C} \to \mathbf{E}$ is co-continuous.

The astute reader may have observed that in 10.1.19 and Theorem 7 above, the finiteness of the coproducts was only a notational convenience. Any coproduct of co-continuous functors is co-continuous.

We have so far not proved that a finite product of co-continuous functors $\mathbf{C} \to \mathbf{D}$ is co-continuous and, indeed, this is true only for special categories D. The next theorem establishes this for D = Set and here the finiteness of the
product is crucial. This result is important, if difficult, and merits the reader's careful scrutiny.

9 Theorem. Let $\psi_1, \ldots, \psi_k\colon \mathbf{C} \to \mathbf{Set}$ be co-continuous. Then their product $\psi_1 \times \cdots \times \psi_k\colon \mathbf{C} \to \mathbf{Set}$ as in 10.1.18 is again co-continuous.

PROOF: It is best to isolate the following:

10 Lemma. For $i = 1, \ldots, k$ let

$C^i_0 \xrightarrow{c^i_0} C^i_1 \xrightarrow{c^i_1} C^i_2 \to \cdots$

be a right chain in Set with colimit $(U^i, \alpha^i)$. Define $(U, \alpha)$ by $U = U^1 \times \cdots \times U^k$, $\alpha_n = \alpha^1_n \times \cdots \times \alpha^k_n\colon C^1_n \times \cdots \times C^k_n \to U$. Then $(U, \alpha)$ is a colimit of

$C^1_0 \times \cdots \times C^k_0 \xrightarrow{c^1_0 \times \cdots \times c^k_0} C^1_1 \times \cdots \times C^k_1 \xrightarrow{c^1_1 \times \cdots \times c^k_1} C^1_2 \times \cdots \times C^k_2 \to \cdots.$

PROOF OF LEMMA. We leave it as an isomorphism-chasing exercise to show that if the lemma holds for any choice of colimits $(U^i, \alpha^i)$ then it holds for all such choices. This granted, we assume $(U^i, \alpha^i)$ is constructed as in 11.1.7. Now consider the diagram

[Diagram: the chain map $c^1_n \times \cdots \times c^k_n$, the maps $\alpha^1_n \times \cdots \times \alpha^k_n$ into $U$, the maps $\beta_n$ into $V$, and $f\colon U \to V$.]

with $(V, \beta)$ an upper bound for $(c^1_n \times \cdots \times c^k_n)$. We must show that there exists unique $f$ with $f(\alpha^1_n \times \cdots \times \alpha^k_n) = \beta_n$ for all $n \ge 0$. Recall the notations of 11.1.7. A typical element of $U$ has the form $([n_1, x_1], \ldots, [n_k, x_k])$ with $x_i \in C^i_{n_i}$. But (and here is where finiteness comes in) there exists $l$ with $l \ge n_i$ for all $i = 1, \ldots, k$. Then $[n_i, x_i] = [l, y_i]$ if $y_i = c^i_{n_i l}(x_i)$. Thus, if $f$ exists we must have

$f([n_1, x_1], \ldots, [n_k, x_k]) = f(\alpha^1_l(y_1), \ldots, \alpha^k_l(y_k)) = \beta_l(y_1, \ldots, y_k)$

which proves the uniqueness of $f$. To prove existence define $f$ this way, that is, define

11  $f([n_1, x_1], \ldots, [n_k, x_k]) = \beta_l(c^1_{n_1 l}(x_1), \ldots, c^k_{n_k l}(x_k))$

for any $l \ge n_1, \ldots, n_k$. Since $\beta_{t+1}(c^1_t \times \cdots \times c^k_t) = \beta_t$ for all $t$, this definition is independent of $l$. In particular, for any $n$,

$f(\alpha^1_n(x_1), \ldots, \alpha^k_n(x_k)) = f([n, x_1], \ldots, [n, x_k]) = \beta_n(x_1, \ldots, x_k)$

as desired.

The main theorem then follows from the lemma by starting with a colimit $(W, \gamma)$ on a right chain $(c_n)$ in C and setting $(c^i_n) = (\psi_i c_n)$, $(U^i, \alpha^i) = (\psi_i W, \psi_i\gamma)$. □
12 Corollary. Any polynomial functor Set $\to$ Set is co-continuous.

We now generalize the Kleene fixed point theorem to provide least fixed points of co-continuous endofunctors. Note that the conditions on C simply generalize the conditions on a poset which make it a domain.

13 Generalized Kleene Fixed Point Theorem. Let C be a category with an initial object $\bot$ and such that every right chain has a colimit. Then every functor $\psi\colon \mathbf{C} \to \mathbf{C}$ which is co-continuous has as least fixed point the colimit $(L, \alpha)$ for the right chain

$\bot \xrightarrow{\ !\ } \psi(\bot) \xrightarrow{\psi(!)} \psi^2(\bot) \xrightarrow{\psi^2(!)} \psi^3(\bot) \to \cdots.$

PROOF. This right chain $(c_n)$, $c_n = \psi^n(!)$, has been constructed so that $\psi c_n = c_{n+1}$. Hence, if $\beta_n = \alpha_{n+1}$, $(L, \beta)$ is an upper bound of the right chain $(\psi c_n)$, that is, $\beta_{n+1}(\psi c_n) = \beta_n$. But as $\psi$ is co-continuous, $(\psi L, \psi\alpha)$ is a colimit of $(\psi c_n)$. Hence, there exists a unique $\mu$:

14  $\mu\colon \psi L \to L$ with $\mu(\psi\alpha_n) = \alpha_{n+1}$ $(n \ge 0)$,

so that $(L, \mu)$ is a $\psi$-algebra. To show that $(L, \mu)$ is the least fixed point of $\psi$, we must (by 10.2.3) show that it is the initial $\psi$-algebra. So let $(A, \delta)$ be any $\psi$-algebra. We must find a unique $f$ with $f\mu = \delta(\psi f)$.

15  [Diagram: the triangle $\mu(\psi\alpha_n) = \alpha_{n+1}$ of 14 on top of the square $f\mu = \delta(\psi f)$, with $\psi f\colon \psi L \to \psi A$, $\delta\colon \psi A \to A$ and $f\colon L \to A$.]

Define $f_n\colon \psi^n(\bot) \to A$ recursively by $f_0 = {!}$,

16  $f_{n+1} = \psi^{n+1}(\bot) = \psi(\psi^n(\bot)) \xrightarrow{\psi f_n} \psi A \xrightarrow{\ \delta\ } A$

and claim that
the triangle $f_{n+1}c_n = f_n$ (with $c_n\colon \psi^n(\bot) \to \psi^{n+1}(\bot)$) commutes for all $n \ge 0$. This is clear for $n = 0$ as $\bot$ is initial. For the inductive step,

$f_{n+2}c_{n+1} = \delta(\psi f_{n+1})(c_{n+1}) = \delta(\psi f_{n+1})(\psi c_n) = \delta(\psi(f_{n+1}c_n)) = \delta(\psi f_n) = f_{n+1}.$

Hence, there exists a unique $f$

17  $f\colon L \to A$ with $f\alpha_n = f_n$ $(n \ge 0)$.

By 15 and Observation 11.1.12, $f\mu$ is the only morphism $g\colon \psi L \to A$ with $g(\psi\alpha_n) = f\alpha_{n+1}$ for all $n \ge 0$. In view of 17, $f\mu$ is the only $g$ with $g(\psi\alpha_n) = f_{n+1}$. But also $(\delta(\psi f))(\psi\alpha_n) = \delta(\psi(f\alpha_n)) = \delta(\psi f_n) = f_{n+1}$ so that $f\mu = \delta(\psi f)$. If also $h\mu = \delta(\psi h)$ then $h\alpha_n = f_n$ by induction, since both paths are $!$ when $n = 0$, and the inductive step is, consulting 15 (with $h$ instead of $f$),

$h\alpha_{n+1} = h\mu(\psi\alpha_n) = \delta(\psi h)(\psi\alpha_n) = \delta(\psi(h\alpha_n)) = \delta(\psi f_n) = f_{n+1}.$

Then by the uniqueness of $f$ in 17, $h = f$. □

18 Corollary. The Kleene fixed point theorem 6.2.13 is the special case that C is the category $\mathbf{C}(P, \le)$ of a domain $(P, \le)$.

EXERCISES FOR SECTION 11.2

1. Let $X = X_1 \cup X_2 \cup X_3 \cup \cdots$, where each $X_i$ is countable. Prove that $X$ is countable. [Hint: Use the coproduct property to construct a surjective function $\coprod X_i \to X$ and conclude that it suffices to prove $\coprod X_i$ is countable. To this end let $p_1 < p_2 < p_3 < \cdots$ be prime numbers. List the elements of $X_i$ and let $f_i\colon X_i \to N$ map the $j$th element of $X_i$ to $p_{2i}\,p_{2j+1}$. Show that the induced function $\coprod X_i \to N$ is injective.]

2. Let $\psi\colon \mathbf{Set} \to \mathbf{Set}$ be the functor $\psi(A) = A^*$, where for $f\colon A \to B$, $\psi(f)(a_1, \ldots, a_n) = (fa_1)\cdots(fa_n) \in B^*$. Verify that $\psi$ is a functor and show that $\psi$ is co-continuous. [Hint: Write $\psi$ as an infinite coproduct of polynomial functors.] It follows that the functor $A \mapsto B + A^*$ is co-continuous; its least fixed point formalizes the semantics of "a widget is an element of B or a list of widgets."

3. In the spirit of Exercise 2, formalize "a widget is an element of B or a finite set of widgets" as the least fixed point of a co-continuous functor Set $\to$ Set.

4. In the spirit of Exercises 2 and 3 use 8 to formalize "a widget is an element of B or a list of finite sets of lists of widgets" as the least fixed point of a co-continuous functor Set $\to$ Set.
11.3 Continuous Functors and Greatest Fixed Points

The dual of the Generalized Kleene Fixed Point Theorem (11.2.13) asserts that if $\psi$ is continuous and $T$ is terminal then the limit of the left chain

$\cdots \to \psi^3 T \xrightarrow{\psi^2(!)} \psi^2 T \xrightarrow{\psi(!)} \psi T \xrightarrow{\ !\ } T$

provides $\psi$ with its greatest fixed point. The definitions and general theory are dual to those in the preceding two sections and so have already been done. The main task of this section is to explore limits in Set and to show that polynomial endofunctors of Set are not only co-continuous but continuous as well and thus are "bicontinuous."

1 Definition. Let C be a category. Notations for the dual category $\mathbf{C}^{op}$ were given in 2.2.13. A left chain in C is a right chain in $\mathbf{C}^{op}$ and so has the form

$\cdots \to C_3 \xrightarrow{c_2} C_2 \xrightarrow{c_1} C_1 \xrightarrow{c_0} C_0.$

Thus, left chains proceed from infinity from the left. Let $(c_n)$ be a left chain. A lower bound of $(c_n)$ is $(U, \alpha)$ with $\alpha = (\alpha_n \mid n \ge 0)$, $\alpha_n\colon U \to C_n$ such that $(U, \alpha)$ is an upper bound of $(c_n)$ in $\mathbf{C}^{op}$:

$c_n\alpha_{n+1} = \alpha_n \quad (n \ge 0).$

A limit of $(c_n)$ is a lower bound of $(c_n)$ which, in $\mathbf{C}^{op}$, is a colimit of $(c_n)$. Thus, if $(V, \beta)$ is a lower bound of $(c_n)$ in C, there exists unique $f\colon V \to U$ with

$\alpha_n f = \beta_n \quad (n \ge 0).$

C has limits of left chains if every left chain has a limit.

2 Example. Let $\mathbf{C} = \mathbf{C}(P, \le)$ be a poset qua category as discussed following 10.1.7. A left chain is a descending chain

$\cdots \le x_3 \le x_2 \le x_1 \le x_0.$

A lower bound is an element $x$ with $x \le x_n$ for every $n$, and a limit is the
greatest element of the set of lower bounds, that is, $x$ is a limit of $(x_n)$ if $x \le x_n$ for all $n$ and if for every $y$ with $y \le x_n$ for all $n$, $y \le x$. This is a special case of the greatest lower bound introduced earlier in 6.1.1.

In the theory of domains, ascending chains represent approximating sequences as discussed following 5.1.16, in the notes for Chapter 6, and in Section 13.1. While this interpretation fares poorly for right chains in Set, this is not so for left chains as we next explore.

3 Example (Limits of Left Chains in Set). Let

$\cdots \to C_3 \xrightarrow{c_2} C_2 \xrightarrow{c_1} C_1 \xrightarrow{c_0} C_0$

be a left chain in Set. We think of the elements of $C_n$ as "structures of depth $n$," and $c_n(x)$ as "the highest $n$ levels of the depth $n + 1$ structure $x$;" $c_n(x)$ "approximates" $x$. An example to keep in mind is $A$ an alphabet, $C_n = A^n$ = words of length $n$, $c_n(a_1 \cdots a_{n+1}) = a_1 \cdots a_n$. An approximating sequence is a sequence $(x_n \mid n \ge 0)$ with $x_n \in C_n$ such that $c_n(x_{n+1}) = x_n$ for all $n$. In the example above, an approximating sequence amounts to an infinite list $a_1 a_2 a_3 \cdots$ represented as the sequence of its finite approximations $a_1$, $a_1 a_2$, $a_1 a_2 a_3$, ....

In general, given the left chain $(c_n)$, set $U$ to be the set of all approximating sequences of $(c_n)$ and define $\alpha_n\colon U \to C_n$ by $\alpha_n(x_0, x_1, x_2, \ldots) = x_n$. If $(x_0, x_1, x_2, \ldots) \in U$ then $c_n(x_{n+1}) = x_n$ so that $c_n\alpha_{n+1} = \alpha_n$ and $(U, \alpha)$ is a lower bound of $(c_n)$. We will show it is in fact a limit. Let $(V, \beta)$ be another lower bound of $(c_n)$. We must show there exists unique $f\colon V \to U$ with $\alpha_n f = \beta_n$. Thus, if such $f$ exists the $n$th coordinate of $f(v)$, being $\alpha_n f(v)$, must be $\beta_n(v)$, that is,

4  $f(v) = (\beta_0(v), \beta_1(v), \beta_2(v), \ldots)$

is the only possible way of defining $f$. The only thing left to check is that $f(v)$ in 4 is an approximating sequence, that is, that $c_n(\beta_{n+1}(v)) = \beta_n(v)$. But this is precisely the condition that $(V, \beta)$ be a lower bound for $(c_n)$.
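The construction of Example 3 can be explored on finite prefixes for the alphabet example $C_n = A^n$. The Python fragment below is a sketch under stated assumptions only: words are modelled as strings, just finitely many coordinates of an approximating sequence are inspected, and the lower bound used for illustration ($V$ = nonempty strings, $\beta_n(v)$ = the first $n$ symbols of $v$ repeated forever) is a hypothetical choice that does not come from the text.

```python
def c(n, word):
    """Chain map c_n: A^(n+1) -> A^n, dropping the last symbol."""
    assert len(word) == n + 1
    return word[:n]

def is_approximating(xs):
    """Check x_n in A^n and c_n(x_{n+1}) == x_n on a finite prefix (x_0, ..., x_N)."""
    right_lengths = all(len(x) == n for n, x in enumerate(xs))
    return right_lengths and all(
        c(n, xs[n + 1]) == xs[n] for n in range(len(xs) - 1)
    )

def mediating(betas, v, depth):
    """First `depth` coordinates of f(v) = (beta_0(v), beta_1(v), ...)."""
    return tuple(beta(v) for beta in betas[:depth])

# Hypothetical lower bound: beta_n(v) is the first n symbols of v repeated forever.
betas = [lambda v, n=n: (v * n)[:n] for n in range(6)]
prefix = mediating(betas, "ab", 5)
```

Here `prefix` is the start of the approximating sequence that the unique map $f$ assigns to the seed `"ab"`; the lower-bound condition on the $\beta_n$ is exactly what makes `is_approximating(prefix)` hold.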
It was observed in 11.1.19 that colimits of right chains of total functions are the same in Set and Pfn. Such is not the case for limits of left chains. Limits of left chains in Pfn are explored further in Exercises 1-4.

5 Example. Let $A$ be a set ("the alphabet") and let $(c_n)$ be the left chain

6  $\cdots \to A^3 \xrightarrow{c_2} A^2 \xrightarrow{c_1} A \xrightarrow{c_0} A^0 = \{\Lambda\}$

where $c_n(a_1 \cdots a_{n+1}) = a_1 \cdots a_n$. Then $(A^\infty, \alpha)$ with $\alpha_n(a_1 a_2 a_3 \cdots) = a_1 \cdots a_n$ is the limit of $(c_n)$. It is a lower bound since $c_n\alpha_{n+1}(a_1 a_2 a_3 \cdots) = c_n(a_1 \cdots a_{n+1}) = a_1 \cdots a_n = \alpha_n(a_1 a_2 a_3 \cdots)$. If $(V, \beta)$ is an arbitrary lower bound define $\gamma_n(v)$, for $n \ge 1$, to be the last symbol in $\beta_n(v)$. We shall prove that $\beta_n(v) = \gamma_1(v) \cdots \gamma_n(v)$ for all $n \ge 0$ by induction. This is clear for $n = 0$. For the inductive step, since $c_n$ keeps all but the last symbol of a string of $A^{n+1}$ we have that

$\beta_{n+1}(v) = (c_n\beta_{n+1}(v))\gamma_{n+1}(v) = \beta_n(v)\gamma_{n+1}(v) = \gamma_1(v) \cdots \gamma_n(v)\gamma_{n+1}(v).$

Define $f\colon V \to A^\infty$ by $f(v) = \gamma_1(v)\gamma_2(v)\gamma_3(v) \cdots$. Then $(\alpha_n f)(v) = \alpha_n(\gamma_1(v)\gamma_2(v)\gamma_3(v) \cdots) = \gamma_1(v) \cdots \gamma_n(v) = \beta_n(v)$. If also $g\colon V \to A^\infty$ satisfies $\alpha_n g = \beta_n$ for all $n$ then $g(v) = x_1 x_2 x_3 \cdots$, where $x_1 \cdots x_n = \alpha_n g(v) = \beta_n(v) = \gamma_1(v) \cdots \gamma_n(v)$ so that $f = g$. It then follows that $(A^\infty, \alpha)$ is a limit of $(c_n)$.

An alternate way to prove $(A^\infty, \alpha)$ is a limit is to show that it is isomorphic to the construction of Example 3; we leave this as an exercise for the reader.

7 Definition. A functor $\psi\colon \mathbf{C} \to \mathbf{D}$ is continuous if whenever $c_n\colon C_{n+1} \to C_n$ is a left chain in C and $(U, \alpha)$ is a limit for $(c_n)$ then $(\psi U, \psi\alpha)$, automatically a lower bound for $(\psi c_n)$ (see the diagram below), is a limit for $(\psi c_n)$.

[Diagram: $(\psi c_n)(\psi\alpha_{n+1}) = \psi\alpha_n$.]

8 Theorem. A constant functor and the identity functor are continuous. A composition of continuous functors is continuous. If D has finite products and $\psi_1, \ldots, \psi_k$ are continuous functors $\mathbf{C} \to \mathbf{D}$ then their product $\psi_1 \times \cdots \times \psi_k$ (as in 10.1.18) is continuous.

PROOF. This follows by duality from Theorems 11.2.5-8. Here a crucial observation is that $\psi\colon \mathbf{C} \to \mathbf{D}$ "is" also a functor $\psi^{op}\colon \mathbf{C}^{op} \to \mathbf{D}^{op}$ defined by

9  $\psi^{op}(C) = \psi C, \quad \psi^{op}(C_1 \xrightarrow{f} C_2) = \psi C_1 \xrightarrow{\psi f} \psi C_2,$

functoriality being clear. (In understanding $\psi^{op}$, the reader may wish to consider the case where C, D are posets; here, to say $f$ is monotone one could say "$x_1 \le x_2 \Rightarrow fx_1 \le fx_2$" or, alternatively, "$x_1 \ge x_2 \Rightarrow fx_1 \ge fx_2$," the latter description being $f^{op}$.) Also, dualizing the remark following 11.2.8, any product of continuous functors is continuous. □
The dual of Theorem 11.2.9 applies to functors $\mathbf{Set}^{op} \to \mathbf{Set}^{op}$, not to $\mathbf{Set} \to \mathbf{Set}$. The following, then, is a new theorem whose proof must be supplied:

10 Theorem. Let $\psi_1, \ldots, \psi_k\colon \mathbf{C} \to \mathbf{Set}$ be continuous. Then their coproduct $\psi_1 + \cdots + \psi_k\colon \mathbf{C} \to \mathbf{Set}$ (as in 10.1.19) is continuous.

PROOF. Just as in the proof of 11.2.9, it is useful to first prove the following:

11 Lemma. For $i = 1, \ldots, k$ let

$\cdots \to C^i_2 \xrightarrow{c^i_1} C^i_1 \xrightarrow{c^i_0} C^i_0$

be a left chain in Set, with limit $(U^i, \alpha^i)$. Define $(U, \alpha)$ by $U = U^1 + \cdots + U^k$; $\alpha_n = \alpha^1_n + \cdots + \alpha^k_n\colon U \to C^1_n + \cdots + C^k_n$. Then $(U, \alpha)$ is a limit of

$\cdots \to C^1_2 + \cdots + C^k_2 \xrightarrow{c^1_1 + \cdots + c^k_1} C^1_1 + \cdots + C^k_1 \xrightarrow{c^1_0 + \cdots + c^k_0} C^1_0 + \cdots + C^k_0.$

PROOF OF LEMMA. As in the proof of Lemma 11.2.10, it is left as an exercise to show that we may assume each $(U^i, \alpha^i)$ is constructed as in Example 3. Now consider

[Diagram: the chain map $c^1_n + \cdots + c^k_n\colon C^1_{n+1} + \cdots + C^k_{n+1} \to C^1_n + \cdots + C^k_n$, the maps $\alpha^1_n + \cdots + \alpha^k_n$ out of $U$, the maps $\beta_n$ out of $V$, and $f\colon V \to U$.]

with $(V, \beta)$ a lower bound for $(c^1_n + \cdots + c^k_n)$. We must construct unique $f$ with $(\alpha^1_n + \cdots + \alpha^k_n)f = \beta_n$ for all $n \ge 0$ as shown. Let $v \in V$. There exists unique $i \in \{1, \ldots, k\}$ such that $\beta_0(v)$ has form $(i, x_0)$, $x_0 \in C^i_0$. Since $(i, x_0) = \beta_0(v) = (c^1_0 + \cdots + c^k_0)\beta_1(v)$, $\beta_1(v)$ must have the form $(i, x_1)$ with $x_1 \in C^i_1$, $c^i_0(x_1) = x_0$. Continuing in this way, $\beta_{n+1}(v)$ has the form $(i, x_{n+1})$ with $c^i_n(x_{n+1}) = x_n$. Define $f(v) = (i, (x_0, x_1, x_2, \ldots))$. Since $(x_0, x_1, \ldots)$ is indeed an approximating sequence for the left chain $(c^i_n)$, it is in $U^i$ so that $f$ is well defined. The remaining details are clear. □

For the proof of Theorem 10, let $c_n\colon C_{n+1} \to C_n$ be a left chain in C and set $c^i_n = \psi_i(c_n)\colon \psi_i C_{n+1} \to \psi_i C_n$. □

Unlike 11.2.9, the finiteness of the coproduct in Theorem 10 is not crucial. The same proof shows that an arbitrary coproduct of continuous set-valued functors is continuous.

12 Definition. A functor $\psi\colon \mathbf{C} \to \mathbf{D}$ is bicontinuous if it is continuous and co-continuous.

Combining the theorems of this section and 11.2.12, we have a major result:
13 Theorem. A polynomial functor $\psi\colon \mathbf{Set} \to \mathbf{Set}$ is bicontinuous.

Next, we spell out the dual of 11.2.13 as an exercise in using duality.

14 Theorem. Let C be a category with a terminal object $T$ and such that every left chain has a limit. Then every continuous $\psi\colon \mathbf{C} \to \mathbf{C}$ has as greatest fixed point the limit of the left chain

$\cdots \to \psi^3(T) \xrightarrow{\psi^2(!)} \psi^2(T) \xrightarrow{\psi(!)} \psi(T) \xrightarrow{\ !\ } T.$

PROOF. $\mathbf{C}^{op}$ has an initial object, namely, $T$; $\psi^{op}\colon \mathbf{C}^{op} \to \mathbf{C}^{op}$ as in 9 is co-continuous and the right chain

$T \xrightarrow{\ !\ } \psi(T) \xrightarrow{\psi(!)} \psi^2(T) \xrightarrow{\psi^2(!)} \cdots$

(we write $\psi$ rather than $\psi^{op}$ to avoid tedium) has a colimit, call it $(L, \alpha)$, in $\mathbf{C}^{op}$ so that, by the proof of Theorem 11.2.13, there exists unique $M$ such that, in $\mathbf{C}^{op}$,

$M(\psi\alpha_n) = \alpha_{n+1} \quad (n \ge 0)$

and $(L, M)$ is the least fixed point of $\psi^{op}$. It follows that there exists unique $M$ in C such that

15  $M\colon L \to \psi L$ with $(\psi\alpha_n)M = \alpha_{n+1}$ $(n \ge 0)$.

Given an arbitrary $\psi$-coalgebra $(A, \Delta)$, $(A, \Delta)$ is a $\psi^{op}$-algebra so that there exists a unique $f$ in $\mathbf{C}^{op}$ with $fM = \Delta(\psi f)$ (composition in $\mathbf{C}^{op}$). That is, there exists a unique $f$ in C with

16  $f\colon A \to L$ such that $Mf = (\psi f)\Delta$, where $\Delta\colon A \to \psi A$ and $\psi f\colon \psi A \to \psi L$.

Thus, $(L, M)$ is the greatest fixed point of $\psi$. □

17 Example (Infinite Lists). The limit of Example 5 arises from the polynomial functor $\psi\colon \mathbf{Set} \to \mathbf{Set}$, $\psi(C) = A \times C$ whose least fixed point is empty but whose greatest fixed point is the limit of the left chain
$\cdots \to A \times (A \times (A \times T)) \xrightarrow{\mathrm{id} \times (\mathrm{id} \times !)} A \times (A \times T) \xrightarrow{\mathrm{id} \times !} A \times T \xrightarrow{\ !\ } T = \{\Lambda\}$

which, up to isomorphism, is just 6. The greatest fixed point is $(A^\infty, M)$ with $M$ being the isomorphism

$M\colon A^\infty \to A \times A^\infty, \quad a_1 a_2 a_3 \cdots \mapsto (a_1, a_2 a_3 \cdots).$

18 Notation for Recursive Specification. If $\psi\colon \mathbf{C} \to \mathbf{C}$ is an endofunctor,

$C ::= \psi(C), \qquad C =:: \psi(C)$

specify, respectively, the least and greatest fixed points of $\psi$. Criteria to ensure existence have been developed in this chapter. The presence of both possibilities encourages careful thought in advance as to whether one wants to define functions from or to the data type to be specified.

EXERCISES FOR SECTION 11.3

Let C, D be categories. A functor $\psi\colon \mathbf{C} \to \mathbf{D}$ is an isomorphism of categories if $\psi$ is bijective on objects and if for every $C_1$, $C_2$, $f \mapsto \psi f$ is bijective from $\mathbf{C}(C_1, C_2)$ to $\mathbf{D}(\psi C_1, \psi C_2)$. C, D are isomorphic categories if such an isomorphism $\psi$ exists.

1. Show that "isomorphism of categories" is an equivalence relation.

2. Show that if C, D are isomorphic categories then C has limits of left chains if and only if D does. Prove a similar statement for colimits of right chains, products, and coproducts. [Hint: Use duality!]

3. A set with base point is $(X, x_0)$ where $X$ is a set and $x_0 \in X$. These form the objects of a category if a morphism $f\colon (X, x_0) \to (Y, y_0)$ is a total function $f\colon X \to Y$ with $f(x_0) = y_0$. Verify that this is a category if composition and identities are the usual ones for total functions. Then prove that this category is isomorphic to Pfn. [Hint: If C is the category of sets with base point define $\psi\colon \mathbf{Pfn} \to \mathbf{C}$ by $\psi X = (X \cup \{\bot\}, \bot)$, where $\bot \notin X$. For $f\colon X \to Y$ in Pfn define

$(\psi f)(x) = \begin{cases} f(x) & \text{if } x \in DD(f) \\ \bot & \text{else.} \end{cases}$

Then $\psi$ is an isomorphism.]

4. Prove that Pfn has limits of left chains. [Hint: the constructions of 3 generalize directly to the category of sets with base point.]

5. Show that the functor $\psi(A) = B + A^*$ of Exercise 11.2.2 is bicontinuous.

6. Define $\psi\colon \mathbf{Set} \to \mathbf{Set}$ by $\psi(A) = A^\infty$, $(\psi f)(a_1 a_2 a_3 \cdots) = (fa_1)(fa_2)(fa_3) \cdots$. Verify that $\psi$ is a functor. Show that $\psi$ is continuous. [Hint: $\psi$ is an infinite product of polynomial functors.] Show that $\psi$ is not co-continuous. Conclude that $\psi$ is not a polynomial functor.
7. Let C be a category. Define a new category $\mathbf{C} \times \mathbf{C}$ with objects all pairs $(C_1, C_2)$ with $C_1$, $C_2$ C-objects, morphisms $(f_1, f_2)\colon (C_1, C_2) \to (D_1, D_2)$ pairs $f_1\colon C_1 \to D_1$, $f_2\colon C_2 \to D_2$ of C-morphisms and composition and identities as in C on each coordinate. Verify that $\mathbf{C} \times \mathbf{C}$ is a category. Also show that if C has finite products, finite coproducts, colimits of right chains, or limits of left chains then $\mathbf{C} \times \mathbf{C}$ does too by performing the corresponding C-construction on each coordinate.

8. For $\mathbf{Set} \times \mathbf{Set}$ as in Exercise 7, the semantics of the simultaneous recursive specification

L := A + (A × L) + M
M := L + (B × M)

arises by computing the least fixed point of

$\mathbf{Set} \times \mathbf{Set} \to \mathbf{Set} \times \mathbf{Set}, \quad (L, M) \mapsto (A + (A \times L) + M,\ L + (B \times M)).$

Show that this functor is co-continuous so that 11.2.13 applies. Show that the object part of the least fixed point is

$(A^*A + A^*(B + A^*)^*A^*A,\ (B + A^*)^*A^*A)$

by using substitution and 10.2.9.

Notes and References for Chapter 11

The Generalized Kleene Theorem 11.2.13, clearly stated as a generalization of the Kleene fixed point theorem 6.2.13 from posets to categories, is due to M. B. Smyth, "Category-theoretic solution of recursive domain equations," Theory of Computation Report No. 14, Department of Computer Science, University of Warwick, 1976.

The reader may wish to consult a text in category theory to learn more about limits and colimits. Colimits of right chains and coproducts are examples of colimits. A simple colimit, not otherwise discussed in this book, is that of a coequalizer of a pair $f, g\colon X \to Y$, this being a morphism of form $h\colon Y \to Z$ such that $hf = hg$ and with the property that (see the diagram below)

[Diagram: the parallel pair $f, g\colon X \to Y$, the maps $h\colon Y \to Z$ and $h'\colon Y \to Z'$, and the induced $\alpha\colon Z \to Z'$.]

whenever $h'f = h'g$ there exists a unique $\alpha$ with $\alpha h = h'$. An important theorem is that all colimits can be constructed from coproducts and coequalizers. Dual statements, of course, hold for limits.
Greatest fixed points in sets and their utility for the semantics of data types are due to the authors in their paper on parametrized data types cited in the notes to Chapter 10.
CHAPTER 12

Parametric Specification

12.1 Arrays
12.2 Stacks and Queues
12.3 A Functional Programming Fragment Revisited

In Section 2 we will define "lists of E" in terms of the least fixed point specification $E^* ::= 1 + E \times E^*$ (i.e., "a list is the empty list ($1 = \{\Lambda\}$) or an element of E followed by a list"). But the parenthetical explication just given works only in Set, whereas the specification works in any category in which polynomial functors have least fixed points. In this sense E is a "parameter" for the "lists-of-" specifier. The specific parameter E may itself arise from another data type specification and may live in a category of arbitrary complexity subject to the technical needs of semantics.

We do not wish to address the general theory of "computable parametric specification" in this book. We are content to give a leisurely discussion of three parametric specifiers: arrays-of-, lists-of-, and DTNs-of-. Arrays are considered in Section 1. No recursion is used here but products are required to distribute over coproducts in order to define "array-handling" functions familiar to programmers. In Section 2 we construct lists and regard these alternatively as stacks or as queues depending on which morphisms defined on lists are regarded as focal. Here recursive specification is central. Section 3 formalizes the data type DTN of Section 1.3 and makes it clear that the functional programming fragment FPF, which received a formal semantics in Pfn in Section 6.3, could in fact be given a formal semantics in a broader class of categories.
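For readers who think in programs, the Set reading of $E^* ::= 1 + E \times E^*$ can be sketched as follows. This is an illustrative Python fragment, not the formal development of Section 2: the encoding (the empty tuple for the single element of 1, a pair `(e, rest)` for an element followed by a list) and the names `NIL`, `cons` and `fold` are assumptions of the sketch. The function `fold` plays the role of the unique map from the least fixed point to any other algebra for $C \mapsto 1 + E \times C$.

```python
NIL = ()  # the single element of 1 = {Lambda}: the empty list

def cons(e, rest):
    """Second summand E x E*: an element of E followed by a list."""
    return (e, rest)

def fold(nil_value, step, xs):
    """Map lists into the algebra given by nil_value (for 1) and step (for E x C)."""
    elements = []
    while xs != NIL:          # unwind the nested pairs without recursion
        e, xs = xs
        elements.append(e)
    result = nil_value
    for e in reversed(elements):
        result = step(e, result)
    return result

xs = cons(1, cons(2, cons(3, NIL)))
length = fold(0, lambda e, acc: 1 + acc, xs)
total = fold(0, lambda e, acc: e + acc, xs)
```

Folding with `NIL` and `cons` themselves returns the list unchanged, which reflects the fact that the identity is the only algebra map from the least fixed point to itself.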
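12.1 Arrays

The Pascal type declaration
\[ \textbf{array } 1..n \textbf{ of } C \tag{1} \]
is a simple data type; that is, it involves no recursive specification. If $C$ is a set, the informal semantics of 1 is the $n$-fold Cartesian product set $C^n = C \times \cdots \times C$ together with the functions
\[ \mathrm{spec}\colon C^n \times \{1, \ldots, n\} \to C, \qquad ((c_1, \ldots, c_n), i) \mapsto c_i \tag{2} \]
\[ \mathrm{assign}\colon C^n \times C \times \{1, \ldots, n\} \to C^n, \qquad ((c_1, \ldots, c_n), c, i) \mapsto (c_1, \ldots, c_{i-1}, c, c_{i+1}, \ldots, c_n) \tag{3} \]
which express the Pascal operations
\[ X[i], \qquad X[i] := c \tag{4} \]
if $X$ is of type 1. The choice of 2 and 3 as "functions belonging to an array" is somewhat arbitrary, but suitable to illustrate a number of ideas.

For the balance of this section we work in a category $\mathbf{C}$ which satisfies the following axioms, which are also required in Section 2:

5 $\mathbf{C}$ has a terminal object, call it 1.
6 $\mathbf{C}$ has finite coproducts. The coproduct of $1 + \cdots + 1$ ($n$ times) will be denoted $[n]$.
7 $\mathbf{C}$ has finite products.
8 For each object $A$, $A \times -$ preserves finite coproducts, that is, if $\mathrm{in}_i\colon B_i \to B_1 + \cdots + B_n$ is a coproduct then $\mathrm{id}_A \times \mathrm{in}_i\colon A \times B_i \to A \times (B_1 + \cdots + B_n)$ is again a coproduct, so that $A \times (B_1 + \cdots + B_n)$ serves as $(A \times B_1) + \cdots + (A \times B_n)$.

These axioms all hold in Set (verify!). We need one technical result:

9 Observation. For any object $A$, $\mathrm{pr}_1\colon A \times 1 \to A$ is an isomorphism.

PROOF. Consider
the morphism $[\mathrm{id}_A, !]\colon A \to A \times 1$, where $!$ denotes the unique morphism to the terminal object 1. We see that $[\mathrm{id}_A, !]\,\mathrm{pr}_1 = [\mathrm{pr}_1, !] = \mathrm{id}_{A \times 1}$. But $\mathrm{pr}_1 [\mathrm{id}_A, !] = \mathrm{id}_A$. This shows $\mathrm{pr}_1$ is an isomorphism with inverse $[\mathrm{id}_A, !]$. □

We are now ready to define the semantics of 1 in $\mathbf{C}$, where $C$ is any object of $\mathbf{C}$. The "object part" is $C^n$, the $n$-fold product. Then
\[ \mathrm{id} \times \mathrm{in}_j\colon C^n \times 1 \to C^n \times [n] \]
is a coproduct by axiom 8, which with 9 allows the following definition of spec: $\mathrm{spec}\colon C^n \times [n] \to C$ is the unique morphism such that, for $j = 1, \ldots, n$,
\[ C^n \xrightarrow{\ \mathrm{pr}_1^{-1}\ } C^n \times 1 \xrightarrow{\ \mathrm{id} \times \mathrm{in}_j\ } C^n \times [n] \xrightarrow{\ \mathrm{spec}\ } C \quad = \quad C^n \xrightarrow{\ \mathrm{pr}_j\ } C. \tag{10} \]
This clearly yields 2 if $\mathbf{C}$ = Set. To define assign, we make use of the following:

11 Theorem (Bi-index Principle). Let $A_1, \ldots, A_m$, $B_1, \ldots, B_n$ be objects and let $f_{ij}\colon A_i \to B_j$, $i = 1, \ldots, m$, $j = 1, \ldots, n$. Then there exists unique $f\colon A_1 + \cdots + A_m \to B_1 \times \cdots \times B_n$ such that
\[ \mathrm{pr}_j \, f \, \mathrm{in}_i = f_{ij} \qquad (\text{all } i, j). \]

PROOF. For fixed $j$ there exists unique $f_j\colon A_1 + \cdots + A_m \to B_j$ with $f_j \mathrm{in}_i = f_{ij}$ for all $i$. Let $f$ be the unique map with $\mathrm{pr}_j f = f_j$. Then $\mathrm{pr}_j f \mathrm{in}_i = f_j \mathrm{in}_i = f_{ij}$. If also $\mathrm{pr}_j g \mathrm{in}_i = f_{ij}$ then, for each $j$, $\mathrm{pr}_j g$ must be $f_j$ so that $g = f$. (Alternatively, first construct $h_i\colon A_i \to B_1 \times \cdots \times B_n$ with $\mathrm{pr}_j h_i = f_{ij}$ for all $j$, and then define $f$ by $f \mathrm{in}_i = h_i$.) □

The map assign is then defined using 8 and 11: $\mathrm{assign}\colon C^n \times C \times [n] \to C^n$ is the unique morphism such that, for all $i, j$,
\[ C^n \times C \times 1 \xrightarrow{\ \mathrm{id} \times \mathrm{in}_i\ } C^n \times C \times [n] \xrightarrow{\ \mathrm{assign}\ } C^n \xrightarrow{\ \mathrm{pr}_j\ } C \quad = \quad f_{ij}, \tag{12} \]
where
\[ f_{ij} = \begin{cases} C^n \times C \times 1 \to C \ \ (\text{projection onto the factor } C) & \text{if } i = j, \\ C^n \times C \times 1 \to C^n \xrightarrow{\ \mathrm{pr}_j\ } C \ \ (\text{projection onto } C^n \text{ followed by } \mathrm{pr}_j) & \text{if } i \neq j. \end{cases} \]
This recaptures 3 when $\mathbf{C}$ = Set.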
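To see how 10 and 12 specialize when $\mathbf{C}$ = Set, here is a small Python sketch. It is an illustration only: arrays are modelled as tuples, indices run from 1 to $n$ as in the text, and the helper name `f_ij` is a hypothetical stand-in for the family of maps used in the Bi-index Principle.

```python
def spec(X, i):
    """spec: C^n x {1..n} -> C, the Pascal expression X[i] (1-based index)."""
    return X[i - 1]


def f_ij(i, j):
    """The component maps of 12: given (X, c), return c if i == j, else X[j]."""
    if i == j:
        return lambda X, c: c
    return lambda X, c: X[j - 1]


def assign(X, c, i):
    """assign: C^n x C x {1..n} -> C^n, the Pascal statement X[i] := c.

    Component j of the result is f_ij applied to (X, c), mirroring the
    requirement pr_j . assign . (id x in_i) = f_ij.
    """
    n = len(X)
    return tuple(f_ij(i, j)(X, c) for j in range(1, n + 1))
```

With `X = (10, 20, 30)`, the call `assign(X, 99, 2)` replaces only the second component, and `spec` reads it back.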
To show that these constructions are not just exercises in formalism we introduce explicit elements and functions (but see Exercise 1).

13 Definition. For $C$ any object, an element of $C$ is a morphism $x\colon 1 \to C$. We write $x \in C$.

This corresponds to the usual notion in Set. In general, "morphisms are functions" in that if $x \in C$ and $f\colon C \to D$ in $\mathbf{C}$, the composition $fx$ indeed is an element of $D$. Note also that the product property (2.3.11) asserts that the elements of a product $C_1 \times \cdots \times C_n$ are just the Cartesian product set of the sets of elements of the $C_i$.

14 Definition. We now generalize 4. If $X \in C^n$ and if $1 \le i \le n$, $X[i] \in C$ is defined by
\[ X[i] = 1 \xrightarrow{\ X\ } C^n \xrightarrow{\ \mathrm{pr}_i\ } C. \]
If also $a \in C$ then "$X[i] := a$" should change $X$ to the $Y\colon 1 \to C^n$ defined as
\[ 1 \xrightarrow{\ [X, a, \mathrm{in}_i]\ } C^n \times C \times [n] \xrightarrow{\ \mathrm{assign}\ } C^n. \]

The reader should note that the theory of the present section does not work in Pfn or Mfn (see Exercise 1 for more details). For example, in Mfn the categorical product construction yields the disjoint union, with projections
\[ \mathrm{pr}_i(a, j) = \begin{cases} \{a\} & \text{if } i = j, \\ \emptyset & \text{else,} \end{cases} \]
which hardly yields the semantics of arrays.

This leads us to make a point which appears counterintuitive but which, once grasped, avoids a great deal of confusion. The appropriate category for the theory of program semantics is not, in general, the appropriate category for the semantics of data types. For example, we have seen that Pfn and Mfn provide appropriate categories for the semantics of large classes of deterministic and nondeterministic programs, respectively, whereas Set provides the setting for defining many data types. The functions associated with data types through constructions in Set are, of course, then available as morphisms in both Pfn and Mfn (cf. the discussion of $\mathbf{C}_{TOT}$ preceding 12.2.7). In fact, morphisms defined in Set can then yield familiar interpretations of partial isomorphisms, as will be seen to be the case for pop and push of stacks in the next section.
No matter which category we choose for our semantics, the general theory is still available to guide our analysis and constructions.
EXERCISES FOR SECTION 12.1

1. Show that Pfn satisfies axioms 5-7 but not 8. [Hint: See Exercises 2.3.8 and 11.3.3.] Do the same for Mfn. [Hint: See Exercise 2.3.7.] In both categories, show that every object has exactly one element! [Hint: What is the terminal object 1 of Mfn and Pfn?] Hence, using elements and functions gives a degenerate view of these categories. There are many important categories, for example, sheaves over a topological space, which satisfy axioms 5-8 but for which elements give an incomplete view of the structure of the objects. It is not clear that such categories would arise in program semantics, however.

2. Given $X \in C^n$ and $1 \le i \le n$, use spec to define $X[i] \in C$.

3. Given an "array of arrays" $X \in (C^n)^m$ and $1 \le i \le n$, $1 \le j \le m$, define $X[i, j] \in C$.

12.2 Stacks and Queues

Given a set $E$ of elements, we may imagine situations in which we construct words $x_n \cdots x_1$ in $E^*$ in such a way that our access to the $x_i$ in a word is restricted. Stacks and queues provide two such examples. An instance of a stack is a stack of bills on a spike at a restaurant cash register: if $x_n \cdots x_1$ is such a stack (with $x_1$ the first put on the spike, $x_2$ the second, etc.) then immediate access is limited to $x_n$ which, if removed, leaves $x_{n-1} \cdots x_1$. Hence, when $E^*$ is regarded as "stacks of $E$" we will want to include with this data type the following partial functions. In this section total functions will be written $A \to B$ and (possibly but not necessarily total) partial functions will be denoted with half-arrows $A \rightharpoonup B$.
\[ \mathrm{top}\colon E^* \rightharpoonup E, \qquad DD(\mathrm{top}) = E^+ = \text{nonempty words}, \qquad \mathrm{top}(x_n \cdots x_1) = x_n \tag{1} \]
\[ \mathrm{pop}\colon E^* \rightharpoonup E^*, \qquad DD(\mathrm{pop}) = E^+, \qquad \mathrm{pop}(x_n \cdots x_1) = x_{n-1} \cdots x_1 \tag{2} \]
An example of a queue is a queue of people buying tickets. The ticket seller deals with individuals in the order in which they entered the queue, just the reverse of the stack case.
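Hence, relevant queue functions are
\[ \mathrm{bottom}\colon E^* \rightharpoonup E, \qquad DD(\mathrm{bottom}) = E^+, \qquad \mathrm{bottom}(x_n \cdots x_1) = x_1 \tag{3} \]
\[ \mathrm{rest}\colon E^* \rightharpoonup E^*, \qquad DD(\mathrm{rest}) = E^+, \qquad \mathrm{rest}(x_n \cdots x_1) = x_n \cdots x_2 \tag{4} \]
Both stacks and queues have the empty word, which we express formally as a function on a one-element set as in 12.1.13 by
\[ \Lambda\colon 1 \to E^* \tag{5} \]
as well as the function
\[ \mathrm{push}\colon E \times E^* \to E^*, \qquad (x, w) \mapsto xw. \tag{6} \]
Both stacks and queues are built up from the left (choice of left over right is arbitrary) by successive pushes beginning with the empty word, so that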
\[ \mathrm{push}(x_1, \Lambda) = x_1, \qquad \mathrm{push}(x_2, x_1) = x_2x_1, \qquad \mathrm{push}(x_3, x_2x_1) = x_3x_2x_1, \]
and so on. The task of this section is to define 1-6 in a broad class of semantic categories. Our approach will be as follows. As long as $\mathbf{C}$ has zero maps we may consider a subcategory, call it $\mathbf{C}_{TOT}$, of all $\mathbf{C}$-objects and (not necessarily all) total $\mathbf{C}$-morphisms (see 2.2.21-22). Examples to keep in mind are $\mathbf{C}$ = Pfn and $\mathbf{C}$ = Mfn with $\mathbf{C}_{TOT}$ = Set in both cases (though not all total morphisms in Mfn are in Set). $E^*$ arises as the least fixed point of the endofunctor $C \mapsto 1 + E \times C$ of Set (cf. 10.2.9). We shall impose axioms on $\mathbf{C}$ so that polynomial endofunctors on $\mathbf{C}_{TOT}$ exist and have least fixed points, and use this not only to define $E^*$ but to define 1-6 in $\mathbf{C}$ in a way that makes natural reference to all of the recursive properties involved.

7 Definition. For the balance of this section we let $\mathbf{C}$ be a category and let $\mathbf{C}_{TOT}$ be a subcategory of $\mathbf{C}$ subject to the following properties:
(i) $\mathbf{C}$ has zero morphisms (2.2.16).
(ii) All $\mathbf{C}$-objects are objects of $\mathbf{C}_{TOT}$ and every morphism in $\mathbf{C}_{TOT}$ is total in $\mathbf{C}$. Furthermore, $\mathbf{C}_{TOT}$ satisfies axioms 12.1.5-8, that is, $\mathbf{C}_{TOT}$ has a terminal object 1, has finite products and coproducts, and is such that $A \times -\colon \mathbf{C}_{TOT} \to \mathbf{C}_{TOT}$ preserves coproducts for every object $A$.
(iii) If $\mathrm{in}_j\colon A_j \to A$ is a finite coproduct in $\mathbf{C}_{TOT}$ it is a coproduct in $\mathbf{C}$ as well.
(iv) $\mathbf{C}_{TOT}$ has colimits of right chains and limits of left chains and every polynomial functor $\mathbf{C}_{TOT} \to \mathbf{C}_{TOT}$ is bicontinuous.

Notational convention: $A \to B$ for a $\mathbf{C}_{TOT}$-morphism, $A \rightharpoonup B$ for a $\mathbf{C}$-morphism (which may be, but is not necessarily, total).

As already noted, main examples are $\mathbf{C}$ = Pfn or Mfn with $\mathbf{C}_{TOT}$ = Set. Note that $\mathbf{C}$ is not required to have products. While the full strength of these axioms will not be applied in this section, experience dictates that at least this much should be assumed. Many stronger assumptions are suggested in the exercises. Let $E$ be an object in $\mathbf{C}$.
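We then define $E^*$ as follows.

8 Definition. Let $\psi\colon \mathbf{C}_{TOT} \to \mathbf{C}_{TOT}$ be the polynomial functor
\[ \psi(C) = 1 + E \times C. \tag{9} \]
By 11.2.13 $\psi$ has a least fixed point $(E^*, \mu)$ with $\mu\colon 1 + E \times E^* \xrightarrow{\ \cong\ } E^*$. Thus, by the definition of a coproduct, $\mu$ decomposes as $\mu = \langle \Lambda, \mathrm{push} \rangle$, which not only defines the desired generalizations of 5 and 6 but gives
\[ 1 \xrightarrow{\ \Lambda\ } E^* \xleftarrow{\ \mathrm{push}\ } E \times E^* \tag{10} \]
is a coproduct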
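because $\mu$ is an isomorphism (see Exercise 1). Note that 10 is a coproduct in both $\mathbf{C}$ and $\mathbf{C}_{TOT}$ by 7 so we need not specify which. If $\mathbf{C}$ = Pfn then 10 is given by the functions of 5 and 6. That this is a coproduct is clear. The least fixed point property is
\[
\begin{array}{ccc}
1 + E \times E^* & \xrightarrow{\ \mathrm{id} + \mathrm{id}_E \times f\ } & 1 + E \times D \\
{\scriptstyle \langle \Lambda, \mathrm{push} \rangle} \downarrow & & \downarrow {\scriptstyle \langle d, \delta \rangle} \\
E^* & \xrightarrow{\qquad f \qquad} & D
\end{array} \tag{11}
\]
wherein $f$ exists uniquely given $d\colon 1 \to D$, $\delta\colon E \times D \to D$. Clearly, 11 is equivalent to
\[ f(\Lambda) = d, \qquad f(xw) = \delta(x, f(w)), \tag{12} \]
which indeed does define $f$ recursively. In general, the definition of the stack morphisms in $\mathbf{C}$ is immediate, using the quasi projections of 3.2.6 (here $PR$ denotes the quasi projection onto the summand $E \times E^*$):
\[ \mathrm{top} = E^* \xrightarrow{\ \mu^{-1}\ } 1 + E \times E^* \overset{PR}{\rightharpoonup} E \times E^* \xrightarrow{\ \mathrm{pr}_1\ } E; \tag{13} \]
\[ \mathrm{pop} = E^* \xrightarrow{\ \mu^{-1}\ } 1 + E \times E^* \overset{PR}{\rightharpoonup} E \times E^* \xrightarrow{\ \mathrm{pr}_2\ } E^*. \tag{14} \]
That this gives 1 and 2 when $\mathbf{C}$ = Pfn is easily checked.

Our strategy for defining bottom and rest in $\mathbf{C}$ by generalizing 3 and 4 is to use the least fixed point property. We begin by showing how 3 is a consequence of 11 when $\mathbf{C}$ = Pfn. First extend bottom to $b\colon E^* \to E^*$ by
\[ b(w) = \begin{cases} \Lambda & \text{if } w = \Lambda, \\ \mathrm{bottom}(w) & \text{else.} \end{cases} \]
Then
\[ b(\Lambda) = \Lambda, \qquad b(x) = x \ \ (x \in E), \qquad b(xw) = b(w) \ \ (x \in E,\ w \neq \Lambda) \tag{15} \]
defines $b$ recursively. For example,
\[ b(x_3x_2x_1) = b(x_2x_1) = b(x_1) = x_1. \]
But 15 is a special case of 12 if $D = E^*$, $d = \Lambda$, and
\[ \delta(x, v) = \begin{cases} xv & \text{if } v = \Lambda, \\ v & \text{else.} \end{cases} \tag{16} \]
To do this in general $\mathbf{C}$ then amounts to finding a suitable definition of the $\delta$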
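of 16. The key fact here is that $E \times (-)$ preserves the coproduct 10 in $\mathbf{C}_{TOT}$ by 7. Thus,
\[ E \times 1 \xrightarrow{\ \mathrm{id}_E \times \Lambda\ } E \times E^* \xleftarrow{\ \mathrm{id}_E \times \mathrm{push}\ } E \times (E \times E^*) \tag{17} \]
is a coproduct. Hence, $\delta\colon E \times E^* \to E^*$ is uniquely defined in $\mathbf{C}_{TOT}$ by
\[ \delta \circ (\mathrm{id}_E \times \Lambda) = \mathrm{push} \circ (\mathrm{id}_E \times \Lambda), \qquad \delta \circ (\mathrm{id}_E \times \mathrm{push}) = \mathrm{push} \circ \mathrm{pr}_2, \tag{18} \]
where $\mathrm{pr}_2\colon E \times (E \times E^*) \to E \times E^*$. For $\mathbf{C}$ = Pfn, 18 reduces to
\[ \delta(x, \Lambda) = \mathrm{push}(x, \Lambda) = x, \qquad \delta(x, yw) = \mathrm{push}(y, w) = yw, \tag{19} \]
that is, $\delta(x, v) = v$ if $v \neq \Lambda$, which is just 16.

For general $\mathbf{C}$, there exists a unique $b$ in $\mathbf{C}_{TOT}$ defined by the least fixed point property
\[
\begin{array}{ccc}
1 + E \times E^* & \xrightarrow{\ \mathrm{id} + \mathrm{id}_E \times b\ } & 1 + E \times E^* \\
{\scriptstyle \langle \Lambda, \mathrm{push} \rangle} \downarrow & & \downarrow {\scriptstyle \langle \Lambda, \delta \rangle} \\
E^* & \xrightarrow{\qquad b \qquad} & E^*
\end{array} \tag{20}
\]
which gives 15 when $\mathbf{C}$ = Pfn. The desired generalization of 3 in $\mathbf{C}$ is then
\[ \mathrm{bottom} = E^* \xrightarrow{\ b\ } E^* \overset{PR}{\rightharpoonup} E \times E^* \overset{PR}{\rightharpoonup} E \times 1 \xrightarrow{\ \mathrm{pr}_1\ } E, \tag{21} \]
where the quasi projections refer to 10 and 17. This clearly recaptures 3 if $\mathbf{C}$ = Pfn, as $E^* \rightharpoonup E$ in 21 is $x \mapsto x$ with domain of definition $E$.

The general construction of $\mathrm{rest}\colon E^* \rightharpoonup E^*$ uses the same concepts that led to 20 and so is left as Exercise 2.

EXERCISES FOR SECTION 12.2

1. Let $f\colon C \to D$ be an isomorphism and let $A \xrightarrow{\ a\ } C \xleftarrow{\ b\ } B$ be a coproduct. Show that $A \xrightarrow{\ fa\ } D \xleftarrow{\ fb\ } B$ is also a coproduct.

2. For $\mathbf{C}$ = Pfn extend rest to a total function $r\colon E^* \to E^*$ by $r(\Lambda) = \Lambda$, $r(w) = \mathrm{rest}(w)$ ($w \neq \Lambda$). Show that
\[ r(\Lambda) = \Lambda, \qquad r(xw) = \delta(x, r(w)) \]
if $\delta\colon E \times E^* \to E^*$ is defined by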
\[ \delta(x, w) = \begin{cases} \Lambda & w = \Lambda, \\ xw & w \neq \Lambda. \end{cases} \]
Generalize this to the $\mathbf{C}$ of 7 and define
\[ \mathrm{rest} = E^* \overset{PR}{\rightharpoonup} E \times E^* \xrightarrow{\ \mathrm{push}\ } E^* \xrightarrow{\ r\ } E^*. \]

3. If 7(ii) is extended to countable coproducts a much more explicit construction of $E^*$ can be given.
(i) Using the general theory of products, construct an isomorphism $\alpha_n\colon E \times E^n \to E^{n+1}$ which, in Set, is $\alpha_n(x, (x_1, \ldots, x_n)) = (x, x_1, \ldots, x_n)$. Letting $E^0$ mean 1, show that $\alpha_0$ recaptures 12.1.9.
(ii) Define $E^*$ as the infinite coproduct $1 + E + E^2 + \cdots$. Define $\Lambda = \mathrm{in}_0\colon 1 \to E^*$. As $\mathrm{id}_E \times \mathrm{in}_n\colon E \times E^n \to E \times E^*$ is a coproduct, define $\mathrm{push}\colon E \times E^* \to E^*$ by
\[ \mathrm{push} \circ (\mathrm{id}_E \times \mathrm{in}_n) = \mathrm{in}_{n+1} \circ \alpha_n. \]
Prove that
\[ 1 \xrightarrow{\ \Lambda\ } E^* \xleftarrow{\ \mathrm{push}\ } E \times E^* \]
is a coproduct and that $(E^*, \langle \Lambda, \mathrm{push} \rangle)$ is a least fixed point of the $\psi$ of 9.

4. In Set, the reverse map $\mathrm{rev}\colon E^* \to E^*$ is defined by $\mathrm{rev}(\Lambda) = \Lambda$, $\mathrm{rev}(x_n \cdots x_1) = x_1 \cdots x_n$.
(i) Define the queue functions in terms of the stack functions and rev.
(ii) Define rev if $E^*$ is constructed in $\mathbf{C}_{TOT}$ as in Exercise 3. (We do not see how to do this in the context of 7 without additional assumptions.)

5. Define $E^+$ by the least fixed point specification
\[ E^+ ::= \varphi(E^+) = E + (E \times E^+). \]
For $\mathbf{C}$ as in Exercise 3 show that $E^+ \cong E \times E^*$. [Hint: Paralleling Exercise 3, show directly that a least fixed point can be built on $E + E^2 + E^3 + \cdots$.] Conclude that there exists a coproduct
\[ 1 \xrightarrow{\ \Lambda\ } E^* \longleftarrow E^+. \]
See Exercise 13.3.3 for a related result.

6. Define $E^+$ as in Exercise 5. Prove, assuming only 7, that there exists a coproduct of the form
\[ 1 \xrightarrow{\ \Lambda\ } E^* \longleftarrow E^+ \]
(although $E^+ \cong E \times E^*$ is not clear) by completing the following outline.
(i) Show that there is a coproduct
\[ E \longrightarrow E \times (1 + E^+) \longleftarrow E \times E^+ \]
so that (recall Exercises 10.2.7-8) we may write the least fixed point isomorphism as $\nu\colon E \times (1 + E^+) \to E^+$.
(ii) Explain why, given $d\colon 1 \to D$, $\delta\colon E \times D \to D$, it suffices to show that there exists unique $c\colon 1 \to D$, $g\colon E^+ \to D$ with
\[
\begin{array}{ccc}
1 + (E \times (1 + E^+)) & \xrightarrow{\ \mathrm{id} + (\mathrm{id}_E \times \langle c, g \rangle)\ } & 1 + (E \times D) \\
{\scriptstyle \mathrm{id} + \nu} \downarrow & & \downarrow {\scriptstyle \langle d, \delta \rangle} \\
1 + E^+ & \xrightarrow{\qquad \langle c, g \rangle \qquad} & D
\end{array}
\]
(iii) The morphism
\[ h = E \xrightarrow{\ \mathrm{pr}_1^{-1}\ } E \times 1 \xrightarrow{\ \mathrm{id}_E \times d\ } E \times D \xrightarrow{\ \delta\ } D \]
depends only on $d$ and $\delta$. Complete the proof by showing that $c$, $g$ satisfy the square of (ii) if and only if $c = d$ and $g$ is the unique $\varphi$-algebra morphism $(E^+, \nu) \to (D, \langle h, \delta \rangle)$.

7. Give an example of a total morphism in Mfn which is not in Set.

12.3 A Functional Programming Fragment Revisited

In Section 1.3 we introduced the data type DTN of dynamic trees of natural numbers and described a number of building operations (composition, conditional, construction, apply-to-all, and insertion) to convert old functions $DTN \to DTN$ into new ones. In this section we apply the theory in the intervening chapters to present the semantics of this functional programming fragment in a broad class of semantic categories. While we do not pursue the details, the requirements are roughly those of 12.2.7, where $\mathbf{C}(X, Y)$ has suitable ordered structure or partially additive structure to model iteration and recursion.

In the following treatment, we shall make use of a number of results on functorial constructions. These results are easy to state and apply, but their proof would unduly burden the body of this volume. (Interested readers will not find it difficult to provide their own proofs and are encouraged to do so.) The results are the following:

1 Theorem. Let $\mathbf{C}$, $\mathbf{D}$ have colimits of right chains and let $\mathbf{C} \times \mathbf{D}$ be the product category with $\mathrm{ob}(\mathbf{C} \times \mathbf{D}) = \mathrm{ob}(\mathbf{C}) \times \mathrm{ob}(\mathbf{D})$, and $(\mathbf{C} \times \mathbf{D})((C_1, D_1), (C_2, D_2)) = \mathbf{C}(C_1, C_2) \times \mathbf{D}(D_1, D_2)$, with composition and identities those of $\mathbf{C}$, $\mathbf{D}$ on each coordinate. Then a functor $\Gamma\colon \mathbf{C} \times \mathbf{D} \to \mathbf{E}$ is co-continuous if and only if $\Gamma$ is separately co-continuous, that is, $\Gamma(C, -)\colon \mathbf{D} \to \mathbf{E}$ and $\Gamma(-, D)\colon \mathbf{C} \to \mathbf{E}$ are co-continuous for each $C$ in $\mathrm{ob}(\mathbf{C})$ and each $D$ in $\mathrm{ob}(\mathbf{D})$.

2 Theorem.
Let $\mathbf{C}$ have initial object $\bot$ and colimits of right chains, and let $\Gamma\colon \mathbf{C} \times \mathbf{C} \to \mathbf{C}$ be co-continuous (equivalently, separately co-continuous). Denote $\Gamma(C, -)\colon \mathbf{C} \to \mathbf{C}$ by $\Gamma_C$. Then the right chain
\[ \bot \longrightarrow \Gamma_C(\bot) \longrightarrow \Gamma_C^2(\bot) \longrightarrow \cdots \tag{3} \]
has a colimit which we denote
\[ \alpha_{C,n}\colon \Gamma_C^n(\bot) \to \psi C, \tag{4} \]
from which we derive the least fixed point $\mu_C\colon \Gamma_C(\psi C) \to \psi C$ by the colimit property. It can then be shown that for all $C$, $D$ and $f\colon C \to D$ in $\mathbf{C}$, there exists a unique $\psi f$ such that
\[ \psi f \circ \alpha_{C,n} = \alpha_{D,n} \circ \Gamma_f^n, \tag{5} \]
where
\[ \Gamma_f^1 = \Gamma(C, \bot) \xrightarrow{\ \Gamma(f, \mathrm{id})\ } \Gamma(D, \bot), \]
\[ \Gamma_f^{n+1} = \Gamma_C\Gamma_C^n(\bot) \xrightarrow{\ \Gamma(f, \mathrm{id})\ } \Gamma_D\Gamma_C^n(\bot) \xrightarrow{\ \Gamma_D(\Gamma_f^n)\ } \Gamma_D\Gamma_D^n(\bot). \]
With this definition on morphisms, $\psi$ is a functor $\mathbf{C} \to \mathbf{C}$. It can in fact then be shown that $\psi$ is co-continuous, that is,

6 $C \mapsto$ least fixed point of $\Gamma(C, -)$ is a co-continuous functor.

Dually, if $\mathbf{C}$ has a terminal object and limits of left chains, then the entire discussion applies to $\mathbf{C}^{op}$, yielding the dual result that if $\Gamma$ is continuous, then

7 $C \mapsto$ greatest fixed point of $\Gamma(C, -)$ is a continuous functor.

The functor $\Gamma\colon \mathbf{Set} \times \mathbf{Set} \to \mathbf{Set}$, $\Gamma(A, B) = 1 + A \times B$ is separately bicontinuous, being polynomial in each variable, and so is bicontinuous by Theorem 1 and its dual. As discussed in Section 2, $\Gamma(A, -)\colon \mathbf{Set} \to \mathbf{Set}$ has a least fixed point of the form $(A^*, \mu_A)$. This gives rise to a co-continuous functor $A \mapsto A^*$ by 6. We leave it as an exercise for the reader to carefully unravel 5 to prove that, for $f\colon A \to B$,
\[ f^*\colon A^* \to B^* \quad \text{with} \quad f^*(\langle a_1, \ldots, a_n \rangle) = \langle fa_1, \ldots, fa_n \rangle, \tag{8} \]
where, in accordance with the notation of Section 1.3, we write $\langle a_1, \ldots, a_n \rangle$ instead of $a_1 \cdots a_n$ for elements of $A^*$. (Actually, for this special case Exercise 11.2.2 provides a more direct route to the functoriality and co-continuity of $A \mapsto A^*$.)

We then define the data type $DT_A$ of dynamic trees of $A$ by the least fixed point specification
\[ DT_A ::= A + DT_A^*. \tag{9} \]
Except that we are using an arbitrary set $A$ rather than the set of natural numbers for the length-1 trees, this specification agrees with the basis and induction steps originally used to define the set of DTNs in Section 1.3. The least fixed point specified by 9 exists because $\psi C = A + C^*$ is co-continuous, being a coproduct of a constant functor and $C^*$. Indeed, it may be shown that $A \mapsto DT_A$ is co-continuous and that
\[ Q ::= A + DT_Q \]
is a valid least fixed point specification, but that is another story.

We turn now to building suitable functions on $DT_A$. The fixed point isomorphism takes the form of a coproduct
\[ A \xrightarrow{\ \mathrm{in}_1\ } DT_A \xleftarrow{\ \mathrm{in}_2\ } DT_A^*. \tag{10} \]
Given $f\colon DT_A \to DT_A$, "apply-to-all $f$" $\alpha f\colon DT_A \to DT_A$ is then defined using this coproduct by
\[
\begin{array}{ccccc}
A & \xrightarrow{\ \mathrm{in}_1\ } & DT_A & \xleftarrow{\ \mathrm{in}_2\ } & DT_A^* \\
{\scriptstyle \mathrm{id}} \downarrow & & \downarrow {\scriptstyle \alpha f} & & \downarrow {\scriptstyle f^*} \\
A & \xrightarrow{\ \mathrm{in}_1\ } & DT_A & \xleftarrow{\ \mathrm{in}_2\ } & DT_A^*
\end{array} \tag{11}
\]
The following general discussion will be useful shortly. In a least fixed point specification $Q ::= \psi(Q)$ for co-continuous $\psi$, the least fixed point $(L, \mu)$ arises (as in 4) via a colimit $\alpha_n\colon \psi^n\bot \to L$. Question: Is it valid to use the $\alpha_n$ as maps associated with the data type? Answer: Yes, since they are easily derived in a finite way from the least fixed point itself. Indeed, define $\beta_n\colon \psi^n\bot \to L$ for $n \ge 0$ as follows:
\[ \beta_0 = \bot \longrightarrow L, \qquad \beta_{n+1} = \psi^{n+1}\bot \xrightarrow{\ \psi(\beta_n)\ } \psi L \xrightarrow{\ \mu\ } L. \tag{12} \]
Then it may be shown that the $\beta_n$ constitute colimit injections, that is, "$\beta_n = \alpha_n$ up to isomorphism." But even without pausing to prove this here, it is clear that the $\alpha_{C,n}$ of 5 can be constructed directly from the least fixed point isomorphism $\mu$. For the case $\psi C = 1 + B \times C$ it may be checked directly that $\beta_n$ is, up to isomorphism, the inclusion $1 + B + \cdots + B^{n-1} \to B^*$. By composing such an inclusion with the appropriate injection, the inclusion
\[ \gamma\colon B^n \to B^*, \qquad \langle b_1, \ldots, b_n \rangle \mapsto \langle b_1, \ldots, b_n \rangle \tag{13} \]
is available, as is our old friend
\[ \langle\,\rangle\colon 1 \to B^*, \tag{14} \]
the empty word (written $\Lambda$ in previous sections but $\langle\,\rangle$ in Section 1.3). Recall that the initial object property of $B^*$ amounts to: given $q_0$ [remaining data and diagram not recoverable], there exists unique $r$ with [diagram not recoverable]. Hence, we may define, given $f\colon DT_A \to DT_A$, a map $g\colon DT_A^* \to DT_A$ by [diagram not recoverable], and it is then easily seen that the insertion map $If\colon DT_A \to DT_A$ is given by

[Diagram 15: the coproduct $A \xrightarrow{\ \mathrm{in}_1\ } DT_A \xleftarrow{\ \mathrm{in}_2\ } DT_A^*$ with induced map $If\colon DT_A \to DT_A$; remaining labels not recoverable.]

The construction map is easy using the $\gamma$ of 13. Given $f_1, \ldots, f_n\colon DT_A \to DT_A$, define $f$ by [diagram not recoverable] and then

[Display 16: definition of the construction $[f_1, \ldots, f_n]$; not recoverable.]

Composition and conditional are dealt with as in earlier chapters.

EXERCISES FOR SECTION 12.3

The results of this section provide the semantics of FPF in Mfn. Note that DTN does not change if we define $\mathbf{Mfn}_{TOT}$ = Set. In Exercises 1 and 2 evaluate in Mfn using Kleene semantics in the usual way.

1. while $p$ do $f$ (6.3.25) for $p, f \in \mathbf{Mfn}(DTN, DTN)$.

2. $\mathit{foreach} \circ [f, g]$ where $\mathit{foreach}$ is defined in Exercise 1.4.3 and $f, g \in \mathbf{Mfn}(DTN, DTN)$.

3. Verify in detail that $\alpha f$, $If$, and $[f_1, \ldots, f_n]$ as defined in this section take their expected meanings in the context of Section 1.3.
Notes and References for Chapter 12

For a much more detailed discussion of the issues raised in Exercise 12.1.1 see the book of R. Goldblatt cited in the notes to Chapter 2. The omitted proofs of 12.3.1-6 follow easily from results in the Lehmann-Smyth paper cited in the notes to Chapter 10. The constructions in Sections 1 and 2 refine those given by the authors in their paper on parametrized data types cited in the notes to Chapter 10.
CHAPTER 13
Order Semantics of Data Types

13.1 Introduction
13.2 Constructions with Domains
13.3 Cartesian-Closed Categories
13.4 Solving Function Space Equations

Work initiated by D. S. Scott and C. Strachey in the 1960s, and contributed to by many up to the present writing, yields a framework for program semantics in which every data type is a domain and every computed function is continuous. We provide a critique of these basic assumptions in Section 1, but then proceed to develop an introduction to this theory of ordered semantics in the remaining sections.

13.1 Introduction

In this section we offer a critique of the motivations for ordered semantics. We hope the reader will not misconstrue our claims that other avenues of approach are possible as arguing against the merits of the ordered approach. Rather, we are reacting to the acceptance, certainly championed by some, of ordered semantics as the only mathematical foundation for the theory of program semantics.

1 A Basic Claim of Ordered Semantics. All computable functions are continuous.

Defense of 1. Let $D$ and $D'$ be domains and let $f\colon D \to D'$ be computable (in some reasonable sense). In these domains the approximation relationship $x \le y$ is interpreted to mean that the "information content" of $x$ is included in that of $y$ (i.e., $y$ has more and better information). If there is to be some notion of computability, information will have to be determined by "finite" approximation, and a function will only be able to compute in terms of these
finite approximations. To provide a precise notion of "finite approximation," suppose that an increasing chain
\[ x_0 \le x_1 \le x_2 \le \cdots \]
of elements of $D$ is given. Information here is increasing monotonically, and since $D$ is a domain we can form the least upper bound
\[ x = \bigvee_{n \ge 0} x_n. \]
Suppose then that $f$ is computable. We then ought to be able to write
\[ fx \le fy \quad \text{if } x \le y, \]
for if $x$ approximates $y$ then $fx$ should approximate $fy$. Moreover, we should have
\[ f\Bigl(\bigvee_{n \ge 0} x_n\Bigr) = \bigvee_{n \ge 0} f(x_n), \]
since the finite amounts of information are the same on both sides and the basic assumption is that an element is determined by its finite information content. But the above says precisely that $f$ is continuous.

Critique of the Necessity of 1. In our study of the order semantics of recursive definitions in Section 6.2, we did not require that the functions so defined be continuous: they were simply partial functions $f\colon A \to B$, say, where $A$ and $B$ were ordinary sets, not domains. We did see that the functions
\[ \psi\colon \mathbf{Pfn}(A, B) \to \mathbf{Pfn}(A, B) \]
associated with recursive specifications were continuous maps, but this continuity was with respect to the partial ordering on $\mathbf{Pfn}(A, B)$ and had nothing to do with requiring $A$ and $B$ to be domains, or with requiring that the least fixed point of $\psi$ be continuous.

There is a simple technical device to counter our objection, using the flat domain construction of Example 6.1.16. Let $f\colon A \to B$ be a partial function. Define $f_\bot\colon (A, =)_\bot \to (B, =)_\bot$ by
\[ f_\bot(a) = \begin{cases} b & \text{if } f(a) \text{ is defined and } f(a) = b, \\ \bot & \text{if } a = \bot \text{ or } f(a) \text{ not defined.} \end{cases} \tag{2} \]
Then $f \mapsto f_\bot$ establishes a bijective correspondence between $\mathbf{Pfn}(A, B)$ and continuous functions $(A, =)_\bot \to (B, =)_\bot$. But we still object, for we know from computability theory that computability of $f$ does not guarantee computability of $f_\bot$. For if $f_\bot$ were computable, we could solve the halting problem for any program computing $f$ by setting
\[ \mathrm{halt}(a) = \begin{cases} 1 & \text{if } f_\bot(a) \neq \bot, \\ 0 & \text{if } f_\bot(a) = \bot. \end{cases} \]
We thus feel that the above principle should be weakened to:
3 If $D$ and $D'$ are domains and if the orderings $\le$ are viewed as "approximation of information content" then all computable functions $f\colon D \to D'$ can be argued to be continuous.

Thus, the theory of recursive definitions does not force computable functions to be continuous. However, could it be that the theory of data types offers reasons to structure the sets $D$, $D'$ underlying data types as domains and to interpret the underlying relations as "approximation of information content"? We consider a number of issues to further this discussion.

4 A Basic Claim of Ordered Semantics. A data type has an approximation ordering.

Defense of 4. In the real world, we can in general only compute finite approximants to computable functions. Data types require an approximation ordering so as to express this.

Critique of the Necessity of 4. The defense has many justifications but does not apply to all functions. There are many simple functions (consider Boolean and arithmetical operations) which we compute completely, not just approximately. More complicated functions arise through iteration and recursion. Thus, if $\psi\colon \mathbf{Pfn}(A, B) \to \mathbf{Pfn}(A, B)$ is the continuous function associated with a recursive specification, the semantics $\bigvee \psi^k(\bot)$ of 6.2.3 is approximated by the $\psi^k(\bot)$, but $A$, $B$ are just sets and need not themselves be partially ordered. Furthermore, this is not an approximation in the sense that if $a$ approximates $a'$, then $f(a)$ approximates $f(a')$. It is the approximation of successive extensions of a partial function.

The trace semantics map $g\colon A \to A^*B + A^\infty$ of 10.2.8 tracing the (perhaps infinite) iterated application of $f\colon A \to A + B$ is approximated by the maps
\[ A \xrightarrow{\ g\ } A^*B + A^\infty \xrightarrow{\ \alpha_n\ } B + AB + \cdots + A^{n-1}B + A^n, \]
where $(\alpha_n)$ is the limit of the greatest fixed point construction for the polynomial functor $\psi C = B + A \times C$ (see 11.3.13). Again, no ordered sets were needed.
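The remark that the $\psi^k(\bot)$ are successive extensions of a partial function, with no order on $A$ or $B$ involved, can be made concrete. The sketch below is an illustration only, under stated assumptions: partial functions are modelled as Python dictionaries (the totally undefined function $\bot$ is the empty dictionary), the recursive specification chosen is the familiar factorial one, and the names `psi` and `kleene_chain` are hypothetical.

```python
def psi(f, universe):
    """One unfolding of: fact(n) = 1 if n == 0 else n * fact(n - 1).

    f is a partial function given as a dict; the result is defined at n
    exactly when the right-hand side can be evaluated using f.
    """
    g = {}
    for n in universe:
        if n == 0:
            g[n] = 1
        elif (n - 1) in f:
            g[n] = n * f[n - 1]
    return g


def kleene_chain(universe, steps):
    """Return [bottom, psi(bottom), psi^2(bottom), ...] with steps + 1 entries."""
    f = {}                      # bottom: the nowhere-defined partial function
    chain = [f]
    for _ in range(steps):
        f = psi(f, universe)
        chain.append(f)
    return chain
```

Each member of the chain agrees with its predecessor wherever the predecessor is defined and is defined on one more argument; the ordering in play is extension of partial functions, while the arguments and values themselves are plain integers.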
In short, what is fuzzy about the defense of 4 is the leap from the fact that recursive definitions of computable functions yield sequences of "approximations" (which equal the function restricted to ever larger subsets of its domain of definition) to the suggested need for an approximation ordering on the domains of that function. The ordering of subsets of a set in no way requires an ordering on elements of the set. 5 A Basic Claim of Ordered Semantics. Every reasonable recursive specification of a data type is solved by taking the least fixed point of a continuous functor on domains.
Defense of 5. For then every recursive specification written by the programmer will have a meaning. The category $\mathbf{Dom}_{adj}$ of Section 4 below allows fixed points not only for polynomial functors but for function-space specifications that have no solution in Set.

Critique of 5. Mathematically, this is a beautiful idea which has profoundly altered research directions in the foundations. In the situations we have examined in Set, however, the domain semantics of a recursive specification may not be what the programmer had in mind. In many cases, specifications important to the programmer can be solved directly in Set rather than in some category of domains, and for some of these (e.g., Example 13.2.10 below) the answer is closer to the programmer's intuition. Section 4 below expands the discussion of 5.

In Section 2 we introduce constructions for domains which give rise to polynomial functors as well as the "function-space" domain $[D \to E]$ of continuous maps from $D$ to $E$. Section 3 introduces Cartesian-closed categories to formalize the function-space construction. Then Section 4 introduces "reflexive domain equations" such as $D \cong A + [D \to D]$ which have nontrivial solutions in $D$ for domains but not for sets.

13.2 Constructions with Domains

1 Definition. Given domains $D$ and $D'$, let $[D \to D']$ be the set of all continuous functions $f\colon D \to D'$, partially ordered by
\[ f \le g \iff f(x) \le g(x) \quad \text{for all } x \text{ in } D. \]

2 Observation. $[D \to D']$ is a domain under $\le$. We call $[D \to D']$ a function-space domain.

PROOF. The bottom element is $\bot$ with $\bot(x) = \bot \in D'$ for each $x$ in $D$. Given an ascending chain $f_0 \le f_1 \le f_2 \le \cdots$, define $F\colon D \to D'$ by $F(x) = \bigvee f_n(x)$. Then $F$ is continuous as follows. If $d \le e$ in $D$, $f_n(d) \le f_n(e) \le F(e)$ for all $n$ so that $F(e)$ is an upper bound of the $f_n(d)$ and $F(d) \le F(e)$. This shows $F$ is monotone. Now let $d_0 \le d_1 \le d_2 \le \cdots$ be an ascending chain in $D$ and let $d = \bigvee d_n$.
To show $F(d) = \bigvee F(d_n)$, note first that $F(d)$ is an upper bound of the $F(d_n)$ because $F$ is monotone. Now let $F(d_n) \le d'$ for all $n$ and show $F(d) \le d'$. Fix $m$. For all $n$, $f_m(d_n) \le F(d_n) \le d'$. As $f_m$ is continuous, $f_m(d) \le d'$. This shows $F(d) \le d'$. □
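For finite posets every ascending chain is eventually constant, so continuity reduces to monotonicity and the function-space construction of Definition 1 can be enumerated directly. The following Python sketch is an illustration under that finiteness assumption only; the names `monotone_maps` and `pointwise_leq`, and the encoding of a poset as a list of elements with an order predicate, are hypothetical choices made for this example.

```python
from itertools import product


def monotone_maps(D, leq_D, E, leq_E):
    """All monotone maps D -> E between finite posets, each as a dict.

    On finite posets these are exactly the continuous maps, so the result
    is the underlying set of the function space [D -> E].
    """
    D = list(D)
    maps = []
    for values in product(list(E), repeat=len(D)):
        f = dict(zip(D, values))
        if all(leq_E(f[a], f[b]) for a in D for b in D if leq_D(a, b)):
            maps.append(f)
    return maps


def pointwise_leq(f, g, leq_E):
    """The ordering of Definition 1: f <= g iff f(x) <= g(x) for all x."""
    return all(leq_E(f[x], g[x]) for x in f)
```

Taking $D = D'$ to be the two-element chain $0 \le 1$, the sketch finds three monotone maps, and the constant map to $0$ lies below each of them in the pointwise order, matching the bottom element described in the proof of 2.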
3 Definition. Given domains $D_1, \ldots, D_n$ we define the following:
(i) $D_1 \times \cdots \times D_n$ is the set of all ordered $n$-tuples $(x_1, \ldots, x_n)$ with $x_i \in D_i$ under the ordering $(x_1, \ldots, x_n) \le (y_1, \ldots, y_n)$ iff $x_i \le y_i$ in $D_i$, $i = 1, \ldots, n$ (cf. 6.1.17).
(ii) $D_1 + \cdots + D_n = \{\bot\} \cup \{1\} \times D_1 \cup \cdots \cup \{n\} \times D_n$ under the ordering $\bot \le z$ for all $z$ in $D_1 + \cdots + D_n$, while $(i, x) \le (j, y)$ if $i = j$ and $x \le y$ in $D_i$.

Observation. $D_1 \times \cdots \times D_n$ and $D_1 + \cdots + D_n$ are both domains.

4 Categories of Domains. To define data types recursively we are led, as in Section 11.1, to consider colimits of right chains and, as in Section 11.3, limits of left chains. This is meaningless unless we pin down which category of domains we are in. There exists a wealth of possible definitions of such categories in the literature. In this section we introduce the category $\mathbf{Dom}_c$ of domains and continuous maps. The categories Dom and $\mathbf{Dom}_{adj}$ are studied in Section 4. The reader should understand that different categories are introduced to solve different problems. Any claim that domains provide the necessary objects in semantics should clarify that different types of morphism are required at different times.

The interest in $\mathbf{Dom}_c$ springs from the philosophy of 13.1.1 that "computable maps are continuous." (No one claims that all continuous maps are computable.)

5 Definition. The category $\mathbf{Dom}_c$ ($c$ for "continuous") has domains for objects and continuous maps for morphisms. (Morphisms are not necessarily strict: we do not require $f(\bot) = \bot$.)

6 Observation. $\mathbf{Dom}_c$ has products. The construction 3(i) with the usual projections (6.1.17) works.

The reader might well expect us to say that the construction of 3(ii) is the coproduct in $\mathbf{Dom}_c$. This is not so. $\mathbf{Dom}_c$ is a bit out of kilter because its morphisms do not preserve all the structure the objects possess in ignoring $\bot$.
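Thus, in attempting to define $\langle f, g \rangle\colon \{\bot\} \cup \{1\} \times D \cup \{2\} \times D' \to E$ given $f\colon D \to E$, $g\colon D' \to E$ there is no unique way to define $\langle f, g \rangle(\bot)$. In fact,

7 Example. $\mathbf{Dom}_c$ does not have coproducts. Let $1 = \{\bot\}$ be the one-element domain and suppose
\[ 1 \xrightarrow{\ d\ } D \xleftarrow{\ e\ } 1 \]
were a coproduct. Define domains $E$, $F$ as shown: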
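[Hasse diagrams: $E$ has elements $x$, $y$, $\bot$ with $\bot \le x$ and $\bot \le y$; $F$ has elements $x$, $y$, $z$, $\bot$ with $\bot \le z$, $z \le x$ and $z \le y$.]

Define continuous maps
\[ f\colon E \to F, \quad f(x) = x,\ f(y) = y,\ f(\bot) = z, \]
\[ g\colon E \to F, \quad g(x) = x,\ g(y) = y,\ g(\bot) = \bot. \]
Viewing $x$, $y$ as (continuous) maps $1 \to F$, we can use the supposed coproduct property to form $\alpha = \langle x, y \rangle\colon D \to F$.

[Diagram: the coproduct $1 \xrightarrow{\ d\ } D \xleftarrow{\ e\ } 1$ with $x, y\colon 1 \to F$ and the induced map $\alpha\colon D \to F$.]

Then $\alpha(d) = x$ and $\alpha(e) = y$. But $\alpha$ must be continuous if indeed we have a coproduct in $\mathbf{Dom}_c$. Hence, we cannot have $d \le e$ in $D$ since $\alpha(d) \not\le \alpha(e)$. Similarly, $e \not\le d$. Thus, the least element $\bot$ of $D$ is distinct from $d$, $e$. Similarly, let $\beta = \langle x, y \rangle\colon D \to E$. As $f\beta$ maps $d$ to $x$ and $e$ to $y$, $f\beta = \alpha$, by the uniqueness property of coproducts. Similarly, $g\beta = \alpha$. This is possible only if $\alpha(\bot) \in \{x, y\}$, since $f(E) \cap g(E) = \{x, y\}$. But then consider
\[ \sigma = \langle e, d \rangle\colon D \to D, \qquad \tau\colon F \to F, \quad \tau x = y,\ \tau y = x,\ \tau z = z,\ \tau\bot = \bot. \]
Since $\sigma$ is an isomorphism owing to generalities about coproducts (why?) and all isomorphisms in $\mathbf{Dom}_c$ preserve $\bot$ (why?), $\sigma(\bot) = \bot$. As $\gamma = \tau\alpha\sigma\colon D \to F$ maps $d$ to $x$ and $e$ to $y$, $\gamma = \alpha$. But this is a contradiction since if $\alpha(\bot) = x$, $\gamma(\bot) = y$ whereas if $\alpha(\bot) = y$, $\gamma(\bot) = x$.

The situation is not as bad as would appear at first glance because the construction of 3(ii) does have all the properties we need. For example,

8 Definition. Given $f_i\colon D_i \to E_i$ ($i = 1, \ldots, n$) in $\mathbf{Dom}_c$ define
\[ f_1 + \cdots + f_n\colon D_1 + \cdots + D_n \to E_1 + \cdots + E_n \]
by
\[ (f_1 + \cdots + f_n)(x) = \begin{cases} f_i(x) & \text{if } x \in D_i, \\ \bot & \text{if } x = \bot. \end{cases} \]
It is easily checked that $\mathrm{id}_{D_1} + \cdots + \mathrm{id}_{D_n} = \mathrm{id}_{D_1 + \cdots + D_n}$ and, given $f_i\colon D_i \to E_i$, $g_i\colon E_i \to F_i$ ($i = 1, \ldots, n$), then
\[ (g_1 + \cdots + g_n)(f_1 + \cdots + f_n) = g_1f_1 + \cdots + g_nf_n. \]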
Thus, just as in 10.1.19, if F1, ..., Fn: Dom_c → Dom_c are functors, so is F1 + ... + Fn (with + as in 3(ii)). And it is then clear how 10.1.22 is interpreted to define polynomial functors in Dom_c.

9 Fact. Dom_c has colimits of right chains. In the Set-construction of 11.1.7, if each Cn is a poset with LUBs of ascending chains, the colimit U is partially ordered by [n, x] ≤ [m, y] iff there exist l ≥ m, n and x' ≤ y' in Cl with [n, x] = [l, x'], [m, y] = [l, y']. If such U happens to be a domain, it is the desired colimit in Dom_c. In general, the desired colimit exists as the "completion" of U obtained by adding, in a suitably precise sense, LUBs which were missing in U. Although we avoid further discussion of completion, the following example provides intuition.

10 Example. One is tempted to define the natural numbers N by N ::= N + {0}. But in which category? In Set, this yields the colimit of

    ∅ → 0̄ → 1̄ → 2̄ → ...,

where n̄ = {0, ..., n} and the unlabeled maps are inclusions. If we regard each n̄ as a poset in the usual way, the ordering on N obtained as discussed above is the usual one. But in N, the ascending chain 0 ≤ 1 ≤ 2 ≤ ... has no LUB. The colimit in Dom_c is N ∪ {∞} with ∞ = LUB(0, 1, 2, ...). This illustrates a general slogan: "with domains, one may get infinite elements in the data structures, like it or not."

Dom_c shares a very pleasant property with Set:

11 Fact. Any polynomial functor Dom_c → Dom_c is co-continuous and so has a least fixed point.

The proof will be outlined in 13.3.10.

EXERCISES FOR SECTION 13.2

1. Let P_X,Y = Pfn(X, Y) + {T}. For (fi | i ∈ I) a family in P_X,Y, we define the operator ∧ by

    ∧fi = T  if no i exists with fi ≠ T,
          f  else,

where DD(f) = {x ∈ X : fi(x) = fj(x) for all i, j with fi ≠ T ≠ fj} and, for x in DD(f), f(x) is the common value of all such fi(x). Show that ∧ is the infimum operation of a complete poset. What is the supremum operation? (See Exercise 6.1.6.)
Show that f ∨ g = T if there exists x with f(x) ≠ g(x). Hence, T is the "overdefined" element.

2. Let C be the full subcategory of Dom_c of all complete posets.
(i) Show that [D → D'] is complete if D' is.
(ii) Show that C has products using the construction of 6.1.17.
(iii) Show that if Definition 3 of D1 + ... + Dn is modified to add a greatest element as well then D1 + ... + Dn is complete. Hence, polynomial functors may be defined C → C.
(iv) Show that C does not have coproducts. [Hint: Modify Example 7 by giving E, F greatest elements.]

3. Give an example of a domain with least and greatest elements which is not complete. [Hint: Six elements suffice.]

13.3 Cartesian-Closed Categories

In this section, we introduce the notion of a Cartesian-closed category for two reasons: it supports the notion of a "function-space object" which generalizes the function-space domain [D → D'] of 13.2.2, thus allowing a categorical version of the notion of "Currying" from the λ-calculus; and it provides insight into intuitionistic generalizations of Boolean logic.

1 Definition. A category C is Cartesian-closed if

(i) C has a terminal object, 1;
(ii) C has finite products; and
(iii) for each B, C in C there exists a "function-space object" [C → B] and an "evaluation morphism" e: [C → B] × C → B with the property that for all f: C' × C → B there exists a unique f″: C' → [C → B] with

    e(f″ × id_C) = f,

that is, the diagram formed by f″ × id_C: C' × C → [C → B] × C, e: [C → B] × C → B and f: C' × C → B commutes.

2 Example. Set is Cartesian-closed. Define [C → B] to be the set of all functions from C to B. Define e(g, c) = g(c). The unique f″ is given by f″(c')(c) = f(c', c).

This property of functions is familiar to users of LISP or the λ-calculus as Currying or lambda-abstraction, the operation which converts the function f: C' × C → B to the function f″: C' → [C → B]. Here, the function we write as x ↦ g(x) would be written λx.g(x). Then f: C' × C → B has λ-notation λ(c', c).f(c', c), and f″: C' → [C → B] has λ-notation λc'.(λc.f(c', c)). That is, f″ takes the argument c' to return λc.f(c', c), which is in [C → B].
The commutativity of the diagram of 1, expressed in λ-notation, is that for all a' in C' and a in C,

    (λc'.λc.f(c', c))(a')(a) = (λ(c', c).f(c', c))(a', a)

since both sides evaluate to f(a', a) in B.
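The Currying correspondence of Example 2 can be checked mechanically on small finite sets. The following is a minimal sketch in Python, not part of the original text: the names `curry`, `evaluate` and `uncurry`, and the sample function `f`, are illustrative assumptions.

```python
from itertools import product

def curry(f):
    """Send f: C' x C -> B to its abstraction f'': C' -> [C -> B],
    where f''(c')(c) = f(c', c)."""
    return lambda c_prime: (lambda c: f(c_prime, c))

def evaluate(g, c):
    """The evaluation map e: [C -> B] x C -> B, e(g, c) = g(c)."""
    return g(c)

def uncurry(h):
    """Recover f from h = f'' by composing with evaluation."""
    return lambda c_prime, c: evaluate(h(c_prime), c)

# Hypothetical finite example: C' = {0,1,2,3}, C = {0,1,2}, B = integers.
C_PRIME = range(4)
C = range(3)
f = lambda c_prime, c: 10 * c_prime + c

f_abstracted = curry(f)

# Commutativity of the diagram of 1: e(f''(a'), a) = f(a', a) for all a', a.
diagram_commutes = all(
    evaluate(f_abstracted(a_prime), a) == f(a_prime, a)
    for a_prime, a in product(C_PRIME, C)
)
print(diagram_commutes)
```

Here `uncurry(curry(f))` agrees with `f` on every pair, which is the bijection between maps C' × C → B and maps C' → [C → B] in Set.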
We now show that if a poset P, considered as a "set of generalized truth values," is Cartesian-closed when viewed as a category, then we capture what is known as a Heyting algebra in intuitionistic logic. Classical Boolean logic is recaptured when P = {F, T} with F < T.

3 Example. Let C = C_(P,≤) for a poset (P, ≤). What does it mean to say that C is Cartesian-closed? Think of elements of P as "propositions" and interpret p ≤ q as "if p is assumed then q can be proved." The terminal object of C is the greatest element of P, which we then write as T for "true." A product p × q is characterized by

    p × q ≤ p, p × q ≤ q,
    if r ≤ p, r ≤ q then r ≤ p × q,

which coincides with the greatest lower bound p ∧ q as defined in 3.3.3; so we write p ∧ q for p × q and think of it as "p and q." We shall write [p → q] as p ⇒ q and think of it as "p implies q." Let us now reexamine the diagram of 1(iii) in the form

    r ∧ p ≤ (p ⇒ q) ∧ p ≤ q,

in which the first inequality is f″ × id, the second is e, and the composite is f. Then e is just the deduction rule known as "modus ponens" to logicians, being the assertion

    ((p ⇒ q) ∧ p) ≤ q.

The commutativity of the diagram is of no interest in this example since all diagrams commute in a poset. What is really asserted (modus ponens, that is, e, having been given) is that f exists if and only if f″ does, that is,

    r ∧ p ≤ q if and only if r ≤ (p ⇒ q)

for all p, q, r. Evidently, this property requires p ⇒ q to be LUB{r | r ∧ p ≤ q} and so it is uniquely determined by p, q.

A Cartesian-closed poset is called a Heyting algebra and is well known as the appropriate generalization of Boolean algebra in the area known as intuitionistic set theory. Classical Boolean logic, {F, T} with F < T, provides a Heyting algebra with p ∧ q = T if and only if p = T = q and p ⇒ q = T if and only if p ≤ q. Another Heyting algebra is the unit interval [0, 1] of real numbers with ≤ the usual ordering. Here, T = 1, p ∧ q = Min{p, q}, so that

    p ⇒ q = 1  if p ≤ q,
            q  if p > q.

Both of these examples have a least element, which we shall denote F for
"false." Define the complement of p, ¬p, by

    ¬p = (p ⇒ F).

In the Boolean case, ¬F = T, ¬T = F is the classical complement. Here, ¬¬p = p for all p. In the unit interval example, however,

    ¬p = 1  if p = 0,        ¬¬p = 1  if p > 0,
         0  if p > 0,              0  if p = 0,

so that p ≤ ¬¬p but p ≠ ¬¬p in general. It is, however, true in general that ¬p = ¬¬¬p.

The uniqueness of [C → B] is not special to posets:

4 Observation. In any category with finite products, if C, B are fixed and ([C → B], e) as in 1(iii) exists, then it is unique up to isomorphism. For it is a terminal object in the obvious category with objects (C', f), where f: C' × C → B.

5 Fact. Dom_c is Cartesian-closed.

PROOF OUTLINE. 1 = {⊥} is a terminal object. As noted in Exercise 6.2.5, the construction of 6.1.17 provides products in Dom_c. Define [C → B] as in 13.2.2. Useful technical results, not hard to prove, are the following:

6 f: C' × C → B is continuous if and only if for each c' ∈ C', c ∈ C, f(c', −): C → B and f(−, c): C' → B are continuous. (See Exercise 6.3.7.)

7 e: [C → B] × C → B defined by e(g, c) = g(c) is continuous.

Defining e as in 7, then, given f as in 1(iii), the unique function f″ defined as in 2 by f″(c')(c) = f(c', c) is continuous by 6. More is true:

8 The "abstraction map" α: [C' × C → B] → [C' → [C → B]], α(f) = f″ is not just a bijection as in 10.2.22 but is an isomorphism in Dom_c (cf. Exercise 4 for the generalization). □

9 Discussion. We pause to discuss this notion of Cartesian-closed category. First, a category is or is not Cartesian-closed: the function-space objects, if they exist, do so in only one way by 4. If objects are not sets neither are the function-space objects (cf. Example 3) so that function-space objects are not sets of functions in general. (We might, however, associate sets to objects by
defining an element of an object X to be a map x: 1 → X from the terminal object as we did in 12.1.13. Then an element x: 1 → [C → B] is of the form f″ for a unique f: 1 × C → B. But using the canonical isomorphism C ≅ 1 × C (cf. 12.1.9), we then see that elements x correspond bijectively to actual morphisms C → B.)

Cartesian-closed categories have many nice properties. Some elementary ones are considered in the exercises. It may be shown that any Cartesian-closed category which has colimits of right chains satisfies Lemma 11.2.10. This is the hard part of

10 Fact. Any polynomial functor Dom_c → Dom_c is co-continuous and so has a least fixed point.

PROOF OUTLINE. Constant functors and the identity functor are co-continuous and any composition of co-continuous functors is again so. As remarked above, the proof of 11.2.9 that a product of co-continuous functors is co-continuous goes through in any Cartesian-closed category. While Theorem 11.2.7 asserts that a coproduct of co-continuous functors is co-continuous, this does not apply since the + for the current polynomial functors is not a coproduct. Nonetheless, the proof of 11.2.7 goes through with minor changes. □

EXERCISES FOR SECTION 13.3

1. In any Cartesian-closed category prove the following:
(i) [1 → A] ≅ A for all A.
(ii) [A → B × C] ≅ [A → B] × [A → C] for all A, B, C.
(iii) [A → 1] ≅ 1. [Hint: Show [A → 1] is a terminal object.]

2. Given A and f: B → B' in a Cartesian-closed category C, define [A → f]: [A → B] → [A → B'] as the unique map for which the square

    e'([A → f] × id_A) = fe: [A → B] × A → B'

commutes, where e: [A → B] × A → B and e': [A → B'] × A → B' are the evaluation morphisms. Show that this renders [A → (−)] a functor C → C. Give an explicit description of [A → f] in Set and in Dom_c.

3. Let C be a Cartesian-closed category and let C, C_TOT satisfy the axioms of 12.2.7. Define (E*, μ) as the least fixed point of ψ(C) = 1 + E × C and define (E+, ν) as the least fixed point of φ(C) = E + E × C.
Then, as in Exercise 12.2.5 but without assuming countable coproducts, prove that E+ ≅ E × E*. [Hint: Use the least fixed point property of E* to show that φ has a least fixed point with object E × E*. The needed f: E × E* → D corresponds to a suitable f″: E* → [E → D].]

4. Recall from Exercise 2.3.2 that in any category with products there is an isomorphism θ: A × (B × C) → (A × B) × C. In a Cartesian-closed category, given B, C, C' define γ, β, α by
the following three commutative diagrams, written here as equations (e denotes the appropriate evaluation morphism in each case):

    γ: [C' × C → B] × C' → [C → B] with
        e(γ × id_C) = eθ⁻¹: ([C' × C → B] × C') × C → [C' × C → B] × (C' × C) → B;

    β: [C' → [C → B]] → [C' × C → B] with
        e(β × id_(C' × C)) = e(e × id_C)θ: [C' → [C → B]] × (C' × C) → ([C' → [C → B]] × C') × C → [C → B] × C → B;

    α: [C' × C → B] → [C' → [C → B]] with
        e(α × id_C') = γ.

Generalizing 8, show that α is an isomorphism with inverse β. [Hint: Use uniqueness of f″ in 1.]

5. Let C be a Cartesian-closed category with an initial object 0 and finite coproducts. Prove the following:
(i) 0 × A ≅ 0. [Hint: Show 0 × A is initial.]
(ii) [0 → A] ≅ 1. [Hint: Show [0 → A] is terminal.]
(iii) [A + B → C] ≅ [A → C] × [B → C].
(iv) Show that if A is not initial then no f: A → 0 exists. [Hint: The composite A → 0 → A of f with the unique map 0 → A is not id_A if A is not initial; this contradicts (i).]

6. Establish the Lawvere diagonal argument in a Cartesian-closed category C: Given a: J → J in C which has no fixed points in the sense that ax ≠ x for all elements x: 1 → J of J, there exists no f: X → [X → J] which is surjective on elements. [Hint: For any such f, consider g: X × X → J with g″ = f and let g0: 1 → [X → J] be h″, where

    h = X --Δ--> X × X --g--> J --a--> J

and Δ is defined by pr1Δ = id_X = pr2Δ. Show that if x0: 1 → X existed with fx0 = g0 then a would fix the element g(x0, x0) of J; hence, no such x0 exists.] Show in detail that the proof outlined in the above hint when C = Set and J = {true, false} with a(true) = false, a(false) = true amounts to Cantor's diagonal argument of Exercise 10.1.9.

7. Interpret the identities of Exercises 1 and 5 in a Heyting algebra. Similarly, interpret the functor [A → (−)] of Exercise 2. (For Exercise 5 assume x ∨ y exists.)

8. Let C be any Cartesian-closed category and let in_i: Ai → A be a coproduct in C. Show that

    id_E × in_i: E × Ai → E × A

is again a coproduct in C. Hence, axiom 12.1.8 always holds in Cartesian-closed categories. [Hint: Given fi: E × Ai → B the desired f: E × A → B corresponds to a suitable g: A → [E → B] induced by the coproduct property of A; recall that S × T ≅ T × S as in Exercise 2.3.2.]
9. Let H be a Heyting algebra. Show that H is a Boolean algebra (3.3.12) if and only if ¬¬x = x for all x. [Hint: If ¬¬x = x define 0 = ¬1, x ∨ y = ¬((¬x) ∧ (¬y)). H is a distributive lattice by Exercise 8. Show that x ∧ (¬x) = 0 in any Heyting algebra with 0; hence, if ¬¬x = x, x ∨ (¬x) = 1.]

13.4 Solving Function Space Equations

We have shown how to solve a wide variety of recursive equations to define useful data types in Set. We shall consider an example due to Scott and Strachey which suggests that it may be useful to solve recursive specifications in which the function space construction can appear on the right-hand side. But before considering this example, let us see, if we accept it, why it would force us to go beyond Set as a setting for the construction of data types. Look at the simple example of the isomorphism

1    β: D ≅ A + [D → D],

where A is a fixed object of "atoms," so that 1 asserts "a datum is either an atomic datum or a function from data to data." (Nontrivial solutions of D ≅ [D → D] are discussed in the exercises and arise by a mild extension of the theory of this section. The advantage of 1 is that D cannot be the one-element domain and so must be infinite (if A ≠ ∅), whereas D ≅ [D → D] is true for the one-element domain.) The point here is that, in the Cartesian-closed category Set, isomorphism 1 with A nonempty has no solutions since, by Cantor's diagonal argument (Exercise 13.3.6, but see also Exercise 10.1.9), the cardinality of Set(D, D) is strictly greater than that of D for any set D with at least two elements. It was a striking discovery of Scott to show that this isomorphism could be solved for D a suitable infinite domain and [D → D] the function-space domain of 13.2.1.

Scott and Strachey considered the following approach to the formal semantics of a programming environment. We let the store have a given domain, location, as the set of locations.
Each location can hold any value from some domain value. Thus, a state is given by assigning a value from value to each location in location:

    state = [location → value].

Next, a procedure is to be regarded as a procedure for mapping values to values but also changing one state into another ("side effects") and so we represent procedures by the domain

    procedure = [value × state → value × state].

But Scott and Strachey require that the values which can be stored in a location include elements from any of the given domains V1, ..., Vn, the specification of some location, the specification of a procedure, or a list of values. Formally, this leads them to the equation
    value = V1 + ... + Vn + location + procedure + list(value).

The main task of this section will be to introduce a category Dom_adj in which each of the three functors Γi: (Dom_adj)³ → Dom_adj involved in the above recursive specification,

    Γ1(state, procedure, value) = [location → value],
    Γ2(state, procedure, value) = [value × state → value × state],
    Γ3(state, procedure, value) = V1 + ... + Vn + location + procedure + list(value),

is continuous. The proof, which is a categorical refinement of an argument made by Scott in the setting of lattice theory, is a major achievement, giving one type of demonstration that the existence of a mathematical space of such values in which "procedures can call themselves as arguments" can be made precise without internal inconsistencies.

Nonetheless, we deny that it is necessary to forsake Set for Dom_adj, since the specification procedure = [state → state] is not a necessary part of semantics. In actual programming languages, procedures are built up, as in Parts 1 and 2 of the present volume, in such a way that they form a proper subset of, for example, Pfn(state, state) forming, in fact, a denumerable subset of the nondenumerable space of arbitrary maps of the denumerable set of states to itself.

To provide another motivation for the theory presented in this section, but to also extend the above critique, we may note that Scott's development of order semantics of programs went hand in hand with his work on the λ-calculus, developed by Church as an alternative formulation of the syntax of computable functions. In the "type-free Church-Curry λ-calculus," a so-called "λ-expression" (cf. the discussion following 13.3.2) may be interpreted semantically either as a function or as a piece of data, so that the concatenation MN of two such λ-expressions M and N is to be interpreted as "apply the function denoted by M to the datum denoted by N."
High-level programming languages pass functions as arguments so that in one context the semantics of a function is a function whereas in another context the function may be viewed as a datum. Thus, many workers have felt that providing formal models of the type-free λ-calculus is a necessary step in demonstrating the mathematical consistency of certain programming languages. To this we now turn.

Let E denote the set of λ-expressions, and let D be the space where their values lie. (Precise definitions of E, D will not be needed to make the motivational point we require.) We need to interpret each M in two ways, both as a data element in D and as a function D → D. This can be accomplished with a
pair of maps

    E --α--> D --β--> [D → D],

where α tells us how to interpret a λ-expression as an element of D while β tells us how to reinterpret each x in D as a map β(x): D → D. For consistency, then, we require that

    α(MN) = β(α(M))(α(N)).

The reader familiar with computability theory may recognize β as related to Gödel numbering. In computability theory, we take D to be N and take [N → N] to be the set of all partial functions from N to N. We call n the index or Gödel number for β(n), which is defined to be the partial function N → N computed by the program (or other formal specification) encoded by the number n. Note that in this setting the map β is neither one-to-one nor onto. Thus, despite our discussion of the Scott-Strachey example above, it may seem surprising that Scott sought conditions under which D and β could be chosen with β an isomorphism β: D ≅ [D → D], the limiting case of 1 in which A takes the value ∅. This contrast with computability theory, where (unavoidably) each function has infinitely many Gödel numbers, is intentional since a denotational theory should deal directly with the computable functions themselves. The ability of computability theory to use nonisomorphic Gödel numberings reinforces our suggestion that the solution of D ≅ A + [D → D] is not a necessary part of the formal semantics of programming languages.

Nonetheless, the Scott-Strachey approach has been so influential that we shall focus this section on constructing a nontrivial isomorphism as in 1. While our construction is very close to the original one of Scott, we couch it in terms of our earlier work with functorial fixed points, eventually solving D ≅ A + [D → D] as the greatest fixed point of a suitable functor ψD = A + [D → D].
(We could, of course, regard any greatest fixed point of ψ: C → C as the least fixed point of the same functor considered as C^op → C^op, and in this way other workers have viewed the construction below as a least fixed point. The choice is a matter of taste. We support our choice by virtue of a close comparison with the greatest fixed point construction in Set.)

Several obstacles need to be overcome. We are not sure of what the + in A + [D → D] means until the category we work in is stabilized. Such a category must have a function-space construction [D → D] which is a functor in D. But in even so nice a Cartesian-closed category as Set or Dom_c there is no obvious way that a morphism f: D → E induces a morphism [D → D] → [E → E].

Let us begin, then, by considering how f: D → E should induce such a map. If there were also a map g: E → D then given h ∈ [D → D] (and here we are not in an arbitrary Cartesian-closed category but are dealing with a function-space object which is truly a set of functions so that h is a function D → D) f and g induce a map

    E --g--> D --h--> D --f--> E
hopefully in [E → E]. One approach might be to insist f is an isomorphism and set g = f⁻¹. But a subcategory which has only isomorphisms will not have enough interesting maps. For example, the least fixed point colimit of 11.1.13 would just be ⊥! What is needed is a broad class of maps which induce a map in the opposite direction. It turns out that the following definition will work:

2 Definition. Let D, E be domains and let f: D → E be continuous. An adjoint of f is a monotone function f*: E → D satisfying

    ff*(e) = e for all e in E,    f*f(d) ≤ d for all d in D.

3 Theorem.
(i) A continuous map has at most one adjoint. (Hence, if f has an adjoint f*, f* is called the adjoint of f.)
(ii) If f has adjoint f*, f and f* are strict. In particular, f* is continuous.
(iii) If f: D → E, g: E → F have adjoints so does gf and (gf)* = f*g*.
(iv) If f is an isomorphism, f* = f⁻¹ is the adjoint of f.

PROOF. (i) If ff* = fg = id_E, while f*f, gf ≤ id_D then for e ∈ E, f*e = f*(fg)e = (f*f)ge ≤ ge. Symmetrically, ge ≤ f*e. Thus, f* = g.

(ii) As ⊥ ≤ f*⊥ and f is monotone, f⊥ ≤ ff*⊥ = ⊥ which implies f⊥ = ⊥ so f is strict. Let ⋁en = e. As f* is assumed monotone, to show f* is continuous it suffices to verify that if f*en ≤ d for all n, then f*e ≤ d holds. As f is monotone, en = ff*en ≤ fd for all n so that e ≤ fd. As f* is monotone, f*e ≤ f*fd ≤ d, so f* is indeed continuous. Finally, as ⊥ ≤ f⊥, f*⊥ ≤ f*f⊥ ≤ ⊥ so that f*⊥ = ⊥ and f* is strict.

(iii) (gf)(f*g*) = g(ff*)g* = gg* = id_F. For d ∈ D, g*gfd ≤ fd and, as f* is monotone, f*g*gfd ≤ f*fd ≤ d.

(iv) Obvious. □

4 Definition. Define Dom to be the category whose objects are all domains and whose morphisms Dom(D, E) are the continuous functions f: D → E which are strict, that is, f(⊥) = ⊥. Dom is a subcategory of Dom_c. We then define the category Dom_adj as the subcategory of Dom of all domains and of maps which have an adjoint. By 3(ii), if f has an adjoint, f is in Dom.
By 3(iii), Dom_adj is closed under composition, and identity maps of Dom are in Dom_adj by 3(iv). Thus, Dom_adj is indeed a subcategory of Dom.

5 Example. In Dom, each projection function pr_i: D1 × ... × Dn → Di has an adjoint. Define pr_i*(x) = (⊥, ..., ⊥, x, ⊥, ..., ⊥) with x in the ith coordinate.

6 Example. In Dom, !: D → 1 has an adjoint, namely, ⊥: 1 → D.
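The two laws of Definition 2 can be verified exhaustively for the projection of Example 5 on small domains. The sketch below is an illustration only: it assumes two hypothetical finite flat domains, encodes ⊥ as Python's `None`, and the helper names (`flat_leq`, `tuple_leq`, `pr1`, `pr1_star`) are not from the text.

```python
from itertools import product

BOTTOM = None  # stands for the least element ⊥ of a flat domain

def flat_leq(x, y):
    """Order of a flat domain: x <= y iff x is ⊥ or x equals y."""
    return x is BOTTOM or x == y

def tuple_leq(xs, ys):
    """Componentwise order on a product, as in 13.2.3(i)."""
    return all(flat_leq(x, y) for x, y in zip(xs, ys))

# Two hypothetical flat domains and their product D1 x D2.
D1 = [BOTTOM, "a", "b"]
D2 = [BOTTOM, 0, 1]
D = list(product(D1, D2))

def pr1(pair):
    """The projection pr_1: D1 x D2 -> D1."""
    return pair[0]

def pr1_star(x):
    """The candidate adjoint of Example 5: x goes to (x, ⊥)."""
    return (x, BOTTOM)

# f f*(e) = e for every e in D1.
section_law = all(pr1(pr1_star(x)) == x for x in D1)
# f* f(d) <= d for every d in D1 x D2.
deflation_law = all(tuple_leq(pr1_star(pr1(d)), d) for d in D)
# f* is monotone.
adjoint_monotone = all(
    tuple_leq(pr1_star(x), pr1_star(y))
    for x in D1 for y in D1 if flat_leq(x, y)
)
# Both maps are strict, as Theorem 3(ii) predicts.
both_strict = (pr1((BOTTOM, BOTTOM)) is BOTTOM
               and pr1_star(BOTTOM) == (BOTTOM, BOTTOM))
print(section_law, deflation_law, adjoint_monotone, both_strict)
```

Since the domains here are finite, monotone maps are automatically continuous, so the check covers every condition of Definition 2 for this instance.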
7 Example. For any two sets D, E, the map f: Mfn(D, E) → Pfn(D, E) defined by

    f(g)(d) = e          if g(d) = {e},
              undefined  else

has an adjoint, namely,

    f*(h)(d) = ∅        if h(d) is not defined,
               {h(d)}   else.

8 Example. Let D be any domain. Define f: [D → D] → D by f(g) = g(⊥). Then f has an adjoint, namely, f*(d)(e) = d, that is, f*(d): D → D is constantly d.

9 Example. If f: D → E is in Dom_adj then the equation ff* = id_E implies that f is surjective and f* is injective. This implies, in particular, that Dom_adj cannot have an initial object. Let D be any domain and let E be the flat domain 6.1.16 on the set E of subsets of D. By Cantor's theorem (Exercise 13.3.6) there is no surjection D → E and so no Dom_adj-morphism D → E. Thus, D is not initial.

The Appendix to this section (which may be omitted by readers who are content to simply apply the following result) shows that Dom_adj meets a number of important criteria for a "category for recursive specification of data types as domains." In particular, it will have the property which motivated this section:

10 Theorem. There is, for each domain A, a domain D defined as the greatest fixed point of the functor ψ: Dom_adj → Dom_adj, ψ(C) = A + [C → C],

    D ≅ A + [D → D].

Furthermore, each polynomial functor Dom_adj → Dom_adj is continuous and has a greatest fixed point. As a result, sets of recursive specifications have greatest fixed point solutions in Dom_adj.

The details are established in the Appendix below. We conclude the body of this section with examples, starting with the example due to Scott and Strachey that introduced this section.

11 Example. For a given set location and sets V1, ..., Vn of atomic values, value is recursively defined in Dom_adj by the specification

    state ::= [location⊥ → value]
    procedure ::= [value × state → value × state]
    value ::= V1⊥ + ... + Vn⊥ + location⊥ + procedure + list(value).
Here, if Γ: Dom_adj × Dom_adj → Dom_adj is the functor Γ(A, B) = 1 + A × B then via 12.3.7, list(C) = greatest fixed point of Γ(C, −) is continuous (though as we mentioned earlier, and will discuss below, list(C) will of necessity contain infinite lists). Since this makes the constituent functors continuous, these specifications may be solved in Dom_adj as indicated in Section 12.3. Since value has at least two elements (unless V1⊥ + ... + Vn⊥ + location⊥ is trivial) and [value → value] is embedded as a subset of value, these specifications have no solutions in Set.

12 Example. Consider an attempt to form A*B as a data type in some category of domains, the only technical requirement being that the concatenation map

13    c: A × A*B → A*B, (a, w) ↦ aw

be continuous. (In Set, c arises from the isomorphism μ: B + A × A*B → A*B by composing with in_(A × A*B); presumably a similar construction would work to justify the continuity of 13.) As a result, for fixed a ∈ A,

14    ψ: A*B → A*B, w ↦ aw

is continuous, being c(a, −). By the Kleene fixed point theorem 6.2.13, ψ has a fixed point v. Thus,

    v = av = aav = aaav = ...

so that "A*B" has an infinitely long word after all.

We see in particular that to define f†: A → B given f: A → A + B in Dom_adj the approach of 10.2.S (which used not only the terminal object property of the greatest fixed point A*B + A^∞ of ψC = B + (A × C) to define the trace semantics A → A*B + A^∞ of f but also the initial object property of the least fixed point A*B of ψ to define last: A*B → B) would require modification.

It is possible to define f† in Dom_adj in a different way. Observe that

15    ψ: [A → B] → [A → B],

    ψ(g)(a) = g(f(a))  if f(a) ∈ A,
              f(a)     if f(a) ∈ B,
              ⊥        if f(a) = ⊥

is continuous if f is, and so has a least fixed point by the Kleene fixed point theorem. This least fixed point is f†.
This can be done without domains using essentially the same ψ from Pfn(A, B) to itself, but what is mathematically nicer about the situation in Dom_adj is that [A → B] is itself a data type which is "computable" when A, B are, a statement not true for Pfn(N, N).
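The Kleene construction of f† from the operator ψ of 15 can be run directly when A and B are finite. The following Python sketch is a hedged illustration: it assumes A and B are disjoint finite sets regarded as flat domains, encodes ⊥ as `None`, represents maps as dictionaries, and uses a made-up step function `f`; none of these names or choices come from the text.

```python
BOTTOM = None  # stands for ⊥ ("undefined")

def psi(f, A, B, g):
    """One application of the operator of 15:
    psi(g)(a) = g(f(a)) if f(a) is in A, f(a) if f(a) is in B, ⊥ if f(a) = ⊥."""
    new_g = {}
    for a in A:
        step = f[a]
        if step is BOTTOM:
            new_g[a] = BOTTOM
        elif step in B:
            new_g[a] = step
        else:  # step lies in A, so the iteration continues from there
            new_g[a] = g[step]
    return new_g

def kleene_least_fixed_point(f, A, B):
    """Iterate psi from the everywhere-⊥ map until it stabilizes.
    On finite A this terminates because the Kleene sequence is ascending."""
    g = {a: BOTTOM for a in A}
    while True:
        next_g = psi(f, A, B, g)
        if next_g == g:
            return g
        g = next_g

# Hypothetical example: 0 -> 1 -> 2 -> "halt", 3 loops forever, 4 is undefined.
A = {0, 1, 2, 3, 4}
B = {"halt"}
f = {0: 1, 1: 2, 2: "halt", 3: 3, 4: BOTTOM}

f_dagger = kleene_least_fixed_point(f, A, B)
print(f_dagger)
```

In this example the states that eventually exit through B receive the exit value, while the looping state 3 and the undefined state 4 stay at ⊥, which is the behavior expected of the iterate f†.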
Appendix to Section 13.4

Our work in this Appendix will fall into two parts. The first will establish general criteria for a category C of domains to enable functors like ψ(C) = A + [C → C] to be continuous, so that they do have greatest fixed points. The second half of the Appendix will establish that Dom_adj meets these criteria. These technical details will not be used elsewhere in the book, and so may be omitted by those readers uninterested in the proof of Theorem 10.

16 Desiderata for a data type category C:

(i) C has limits of left chains and a terminal object.
(ii) C has "polynomial" endofunctors, all of which are continuous.
(iii) C has a "function-space endofunctor," given by C ↦ [C → C] on objects, which is continuous. Unfortunately, we shall see that the function-space belongs not to C but to a larger category of which C is a subcategory. Putting this another way, we start with the category with the desired function-space, find it has too many morphisms for C ↦ [C → C] to be made into a functor, and thus restrict the set of morphisms in C to obtain functoriality.
(iv) The greatest fixed points arising from the functors of (ii) and (iii) accord with the programmer's intuition.

The point of the emphasized statement in (iii) is that, although Dom_c has the morphisms we need for program specification, we have to find a subcategory with fewer morphisms if we are to meet the above desiderata for data type specification.

We begin by studying the category Dom of domains and strict maps of 4.

17 Observation. Dom has products. The construction of 13.2.3(i) works with the usual projection functions. Infinite products are constructed in the same way.

18 Observation. Dom has coproducts. D1 + ... + Dn is constructed as the disjoint union with least elements identified, that is,

    D1 + ... + Dn = {⊥} + {1} × (D1\{⊥1}) + ... + {n} × (Dn\{⊥n})

(where B\A = {x: x ∈ B, x ∉ A}) with ordering ⊥ ≤ z for all z, whereas (i, x) ≤ (j, y) if and only if i = j and x ≤ y in Di. This is clearly a domain, and the injection in_i: Di → D1 + ... + Dn mapping ⊥i to ⊥ and mapping x ≠ ⊥i to (i, x) is strict. Given fi: Di → E strict, the map f: D1 + ... + Dn → E defined by f(⊥) = ⊥E, f(i, x) = fi(x) is strict, and continuous since each nontrivial chain is in one of the Di, and f in_i = fi because fi(⊥i) = ⊥E.
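The coproduct of Observation 18 and its mediating map can be modeled concretely for finite flat domains. The sketch below is an assumption-laden illustration: ⊥ is encoded as `None`, and `coalesced_sum`, `injection` and `copair` are hypothetical helper names. It also shows where the triangle f in_i = fi breaks when some fi is not strict, which is why the construction lives in Dom rather than Dom_c.

```python
BOTTOM = None  # stands for the identified least element ⊥

def coalesced_sum(*domains):
    """D1 + ... + Dn of Observation 18: one shared ⊥ plus tagged
    copies (i, x) of the non-bottom elements of each Di."""
    elements = [BOTTOM]
    for i, domain in enumerate(domains, start=1):
        elements.extend((i, x) for x in domain if x is not BOTTOM)
    return elements

def injection(i):
    """in_i: Di -> D1 + ... + Dn, sending ⊥ to ⊥ and x to (i, x)."""
    return lambda x: BOTTOM if x is BOTTOM else (i, x)

def copair(*maps):
    """The mediating map f with f(⊥) = ⊥ and f(i, x) = f_i(x)."""
    def f(z):
        if z is BOTTOM:
            return BOTTOM
        i, x = z
        return maps[i - 1](x)
    return f

# Hypothetical flat domains and strict maps into a flat domain of strings.
D1 = [BOTTOM, "a", "b"]
D2 = [BOTTOM, "c"]
f1 = lambda x: BOTTOM if x is BOTTOM else x.upper()  # strict
f2 = lambda x: BOTTOM if x is BOTTOM else x * 2      # strict

total = coalesced_sum(D1, D2)
f = copair(f1, f2)
triangles_commute = (all(f(injection(1)(x)) == f1(x) for x in D1)
                     and all(f(injection(2)(x)) == f2(x) for x in D2))

# A non-strict map cannot be recovered: the triangle fails at ⊥.
non_strict = lambda x: "k"
g = copair(non_strict, f2)
fails_at_bottom = g(injection(1)(BOTTOM)) != non_strict(BOTTOM)
print(total, triangles_commute, fails_at_bottom)
```

The failure at ⊥ in the last check is the same obstruction noted in 13.2: once the least elements are identified, a non-strict component leaves no consistent value for the mediating map at ⊥.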
Recall, by contrast, that Dom_c, in which the continuous maps need not be strict, does not have coproducts. However, this should not tempt us to use Dom as the setting for program semantics. A continuous f: D → D in Dom is strict, so that f(⊥) = ⊥. But then its least fixed point is ⊥. But this means that in Dom every recursive specification has the undefined function for its semantics. So, we must reject the strictness condition if we are to use the Kleene sequence for the semantics of recursive program specification.

We shall see that order semantics requires a different category of domains for program specification from that for data type specification: recall the emphasized remark in 16(iii) above. This is an important observation which must be understood if confusion is to be avoided. However, it does not invalidate the ordered approach, for we have seen that the approach we offered in Chapter 12 used Set as the setting for data type specification, but Pfn or Mfn for program specification. In any case, we must further study Dom to prepare for the study of Dom_adj, which serves as the category of domains which meets the criteria 16 for recursive specification of data types, if it is required (contra our critique of Section 13.1) that all data types be domains.

19 Proposition. Dom has limits of left chains and a terminal object.

PROOF. The terminal object is the one-element domain 1 = {⊥} with ⊥ ≤ ⊥. That !: D → 1 is strict for any domain D is clear. Given a left chain dn: Dn+1 → Dn in Dom, construct the limit αn: D → Dn in Set as in 11.3.3. Thus,

    D = {(xn: n = 0, 1, 2, ...) | xn ∈ Dn, dn(xn+1) = xn}

with αm((xn)) = xm. Define (xn) ≤ (yn) to mean xn ≤ yn for all n. If (x¹n) ≤ (x²n) ≤ (x³n) ≤ ... is an ascending chain in D then for each n, x¹n ≤ x²n ≤ x³n ≤ ... is an ascending chain in Dn and so has least upper bound yn. As dn is continuous,

    dn yn+1 = dn(⋁m xᵐn+1)
            = ⋁m dn xᵐn+1
            = ⋁m xᵐn    (as (xᵐn | n = 0, 1, 2, ...) ∈ D)
            = yn,

so that (yn | n = 0, 1, 2, ...) ∈ D and hence is the least upper bound of (x¹n) ≤ (x²n) ≤ (x³n) ≤ .... Clearly, αn: D → Dn is strict. Given another lower bound βn: V → Dn, the unique function f is strict since it is obvious from the construction of D that any function f with αn f strict for all n is itself strict. □

20 Proposition. Any polynomial functor ψ: Dom → Dom (defined exactly as in 10.1.22) is continuous and hence has a greatest fixed point by the dual of 11.2.13.
13.4 Solving Function Space Equations

PROOF. This is proved just as in $\mathbf{Set}$. The general result 11.3.8 applies. All that remains is that a coproduct of continuous functors be continuous, and here only minor modifications of the proof of 11.3.10 are needed. We leave them to the reader. □

21 Strategy. We seek a subcategory $C$ of $\mathbf{Dom}$ with $\mathrm{ob}(C) = \mathrm{ob}(\mathbf{Dom})$ and with the following virtues:

(i) $D \mapsto [D \to D]$, where $[D \to D]$ is defined as in 13.2.1, extends to a functor $C \to C$.
(ii) The terminal object 1 of $\mathbf{Dom}$ is terminal in $C$.
(iii) If $d_n: D_{n+1} \to D_n$ is a left chain in $C$ with limit $\alpha_n: D \to D_n$ in $\mathbf{Dom}$ as in 19 then $\alpha_n$ is in $C$ and, moreover, if $(V, \beta_n)$ is a lower bound with $\beta_n$ in $C$ then the unique strict $f$ with $\alpha_n f = \beta_n$ is in $C$. It follows that $\alpha_n: D \to D_n$ is the limit in $C$.
(iv) The functor $[D \to D]$ of (i) is continuous.
(v) If $f_i: D_i \to E_i$, $i = 1, \ldots, n$, are in $C$ then $f_1 \times \cdots \times f_n: D_1 \times \cdots \times D_n \to E_1 \times \cdots \times E_n$ and $f_1 + \cdots + f_n: D_1 + \cdots + D_n \to E_1 + \cdots + E_n$, computed in $\mathbf{Dom}$, are in fact in $C$. It follows that every polynomial functor $\mathbf{Dom} \to \mathbf{Dom}$ maps $C$ into $C$.

Such a subcategory would meet the desiderata of 16 rather well. The "polynomial" functors of 16(ii) are those of 21(v) and these all have greatest fixed points because of 20 and 21(ii, iii). Similarly, 16(iii) is met by 21(i, iv). The aesthetic criterion 16(iv) is argued by pointing out that the underlying set of the greatest fixed point of a polynomial functor is almost the same as if computed in $\mathbf{Set}$ (since products and limits of left chains are constructed as in $\mathbf{Set}$ whereas the coproduct is very close to the disjoint union), whereas the function-space domain $[D \to D]$ has been motivated by 13.2.2. Note, however, that $C[D, D]$ for such a $C$ is a subset of the strict maps from $D$ to $D$ and so is much smaller than $[D \to D]$, which equals the function space $\mathbf{Dom}_c[D, D]$ of all continuous maps from $D$ to $D$, whether strict or not. We will in fact satisfy 21 with the subcategory $\mathbf{Dom}_{adj}$ of $\mathbf{Dom}$.
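The gap between strict maps and all continuous maps, and the earlier remark that a strict map has least fixed point $\bot$, can be checked by brute force on a very small example. The following is a minimal sketch under our own assumptions: the domain is the flat domain $\{\bot, 0, 1\}$ (not a domain singled out by the text), and on a finite poset continuity is taken to coincide with monotonicity. All names in the code are hypothetical.

```python
from itertools import product

# Assumed example: the flat domain {BOT, 0, 1}. BOT lies below everything;
# 0 and 1 are incomparable.
BOT = "bot"
ELEMS = [BOT, 0, 1]

def leq(a, b):
    """Order of the flat domain: a <= b iff a is BOT or a equals b."""
    return a == BOT or a == b

def is_monotone(f):
    # On a finite poset every ascending chain is eventually constant,
    # so continuity coincides with monotonicity.
    return all(leq(f[a], f[b]) for a in ELEMS for b in ELEMS if leq(a, b))

def is_strict(f):
    return f[BOT] == BOT

def kleene_lfp(f):
    """Least fixed point as the limit of BOT, f(BOT), f(f(BOT)), ...

    Only call this on monotone maps; the iteration then stabilizes.
    """
    x = BOT
    while f[x] != x:
        x = f[x]
    return x

# Every function ELEMS -> ELEMS, represented as a dict.
all_maps = [dict(zip(ELEMS, values)) for values in product(ELEMS, repeat=len(ELEMS))]
continuous = [f for f in all_maps if is_monotone(f)]
strict = [f for f in continuous if is_strict(f)]

print(len(continuous), "continuous maps,", len(strict), "of them strict")
print("least fixed points of strict maps:", {kleene_lfp(f) for f in strict})
```

Even here the non-strict continuous maps (the two constant maps at 0 and at 1) are exactly the ones whose Kleene sequence leaves $\bot$, which is why program semantics cannot be restricted to strict maps.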
What we have said so far might well apply to subcategories of categories other than $\mathbf{Dom}$. We now show that $\mathbf{Dom}_{adj}$ satisfies the five conditions of 21, beginning with the following:

22 Definition. $D \mapsto [D \to D]$ as in 13.2.2 extends to a functor $\psi: \mathbf{Dom}_{adj} \to \mathbf{Dom}_{adj}$ as discussed prior to 2, namely, for $f: D \to E$ define $\psi f: [D \to D] \to [E \to E]$ by $(\psi f)(D \xrightarrow{h} D) = E \xrightarrow{f^*} D \xrightarrow{h} D \xrightarrow{f} E$. By 3(ii), $(\psi f)h \in [E \to E]$. Recalling how $[D \to D]$, $[E \to E]$ are domains from 13.2.2, the continuity of $f$ implies that of $\psi f$ since if $h \le l$ in $[D \to D]$ and $e \in E$ then $((\psi f)h)e = f h f^* e \le f l f^* e = ((\psi f)l)e$, whereas if $h_0 \le h_1 \le h_2 \le \cdots$ is an
ascending chain in $[D \to D]$ then $((\psi f)(\bigvee h_n))e = f(\bigvee h_n)f^* e = \bigvee f h_n f^* e = (\bigvee (\psi f)(h_n))e$. Define $(\psi f)^*$ by $(\psi f)^*(E \xrightarrow{t} E) = D \xrightarrow{f} E \xrightarrow{t} E \xrightarrow{f^*} D$. Then $(\psi f)^*$ is monotone because $f^*$ is. $(\psi f)(\psi f)^* t = f(f^* t f)f^* = (f f^*)t(f f^*) = t$, whereas $(\psi f)^*(\psi f)h = f^*(f h f^*)f = (f^* f)h(f^* f) \le h(f^* f) \le h$. Finally, applying 3(iii), if $g: E \to F$, $\psi(gf)(h) = g f h (gf)^* = g f h f^* g^* = (\psi g)(\psi f)(h)$ and by 3(iv), $(\psi\,\mathrm{id}_D)h = \mathrm{id}_D\, h\, \mathrm{id}_D^* = h$. This shows $\psi$ is functorial.

23 Proposition. The terminal object $1 = \{\bot\}$ of $\mathbf{Dom}$ is terminal in $\mathbf{Dom}_{adj}$.

PROOF. This is immediate from 6. □

We turn next to establishing 21(iii) for $\mathbf{Dom}_{adj}$, namely, that it forms limits in $\mathbf{Dom}$. To this end let $d_n: D_{n+1} \to D_n$ be a left chain in $\mathbf{Dom}_{adj}$. Define

24 $d_{mn}: D_m \to D_n$ by $d_{mn} = d_n \cdots d_{m-1}$ if $m > n$; $d_{mn} = \mathrm{id}$ if $m = n$; $d_{mn} = d_{n-1}^* \cdots d_m^*$ if $m < n$.

25 Lemma. For all $m$, $n$, $d_n d_{m(n+1)} = d_{mn}$, that is, the triangle $D_m \xrightarrow{d_{m(n+1)}} D_{n+1} \xrightarrow{d_n} D_n$ with third side $d_{mn}$ commutes.

PROOF. If $m > n$, $d_n d_{m(n+1)} = d_n d_{n+1} \cdots d_{m-1} = d_{mn}$. If $m = n$, $d_n d_{m(n+1)} = d_n d_n^* = \mathrm{id} = d_{mn}$. If $m < n$ then $d_n d_{m(n+1)} = d_n d_n^* \cdots d_m^* = d_{n-1}^* \cdots d_m^* = d_{mn}$. □

26 Definition. Let $\alpha_n: D \to D_n$ be the limit of $d_n: D_{n+1} \to D_n$ in $\mathbf{Dom}$, as in 19.

27 Lemma. Let $(x_n) \in D$ so that $d_n x_{n+1} = x_n$ for all $n$. Then for any fixed $m$, $d_{mn} x_m \le x_n$ for all $n$.

PROOF. If $m \ge n$, $d_{mn} x_m = d_n \cdots d_{m-1} x_m = x_n$. If $m < n$, $d_{mn} x_m = d_{n-1}^* \cdots d_m^* x_m = d_{n-1}^* \cdots d_m^* d_m x_{m+1} \le d_{n-1}^* \cdots d_{m+1}^* x_{m+1} \le \cdots \le d_{n-1}^* d_{n-1} x_n \le x_n$. □

28 Proposition. The conditions of 21(iii) hold, namely, that the $\mathbf{Dom}$ limit of a chain in $\mathbf{Dom}_{adj}$ has limit projection in $\mathbf{Dom}_{adj}$ and the unique strict map induced by a lower bound in $\mathbf{Dom}_{adj}$ is again in $\mathbf{Dom}_{adj}$.

PROOF. The maps $d_{mn}$ are in $\mathbf{Dom}$ by 3(ii). Hence, the content of Lemma 25 is that $(D_m, (d_{mn} : n = 0, 1, 2, \ldots))$ is a lower bound of $d_n: D_{n+1} \to D_n$ in $\mathbf{Dom}$,
inducing a unique strict $\alpha_m^*: D_m \to D$ with $\alpha_n \alpha_m^* = d_{mn}$ for all $n$. Then $\alpha_m \alpha_m^* = d_{mm} = \mathrm{id}$. To show $\alpha_m^*$ is the adjoint of $\alpha_m$ let $(x_n) \in D$ and show $\alpha_m^* \alpha_m (x_n) \le (x_n)$ or, equivalently, for each $n$ that $\alpha_n \alpha_m^* x_m \le x_n$ (since $\alpha_k(x_l) = x_k$). But as $\alpha_n \alpha_m^* = d_{mn}$ this reduces to $d_{mn} x_m \le x_n$, which is precisely Lemma 27. This shows that $\alpha_m \in \mathbf{Dom}_{adj}$.

Now let $(V, (\beta_n))$ be a lower bound of $d_n: D_{n+1} \to D_n$ in $\mathbf{Dom}_{adj}$, so that

29 $d_n \beta_{n+1} = \beta_n$ for all $n$.

As $(V, (\beta_n))$ is certainly a lower bound in $\mathbf{Dom}$, there exists unique strict $f: V \to D$ with $\alpha_n f = \beta_n$ for all $n$. We must show that $f$ has an adjoint $f^*: D \to V$. Let $(x_n) \in D$. Claim that $\beta_m^* \alpha_m (x_n) = \beta_m^* x_m$ is an ascending chain. Indeed, $\beta_m^* x_m = \beta_{m+1}^* d_m^* x_m$ (as $d_m \beta_{m+1} = \beta_m$; use 3(iii)) $= \beta_{m+1}^* d_m^* d_m x_{m+1} \le \beta_{m+1}^* x_{m+1}$. So define

30 $f^*(x_n) = \bigvee_m \beta_m^* x_m$.

By 3(ii), $\beta_m^*$ is strict. It follows from the proof of 13.2.2 that $f^*$ is continuous, monotone in particular. To show $f^* f \le \mathrm{id}_V$: $f^* f v = \bigvee \beta_m^* \alpha_m f v$ (as $x = (\alpha_m x : m = 0, 1, 2, \ldots)$ for all $x$ in $D$) $= \bigvee \beta_m^* \beta_m v$. As $\beta_m^* \beta_m v \le v$, $v$ is an upper bound of $(\beta_m^* \beta_m v)$ so that $f^* f v \le v$. Finally, we show $f f^* = \mathrm{id}$. We must show $\alpha_n f f^*(x_m) = x_n$ for all $n$, $(x_m) \in D$. To do this we exploit

$d_{mn} \beta_m = \beta_n$ $(m \ge n)$,

which is immediate from 24 and 29. Computing, $\alpha_n f f^*(x_m) = \beta_n \bigvee \beta_m^* x_m = \bigvee_{m \ge 0} \beta_n \beta_m^* x_m$ (continuity of $\beta_n$). But for any $n$, an ascending chain $y_0 \le y_1 \le y_2 \le \cdots$ has the same set of upper bounds as $y_n \le y_{n+1} \le y_{n+2} \le \cdots$ so that both have the same least upper bound. Thus, $\alpha_n f f^*(x_m) = \bigvee_{m \ge n} \beta_n \beta_m^* x_m = \bigvee_{m \ge n} d_{mn} \beta_m \beta_m^* x_m = \bigvee_{m \ge n} d_{mn} x_m = x_n$, as each $d_{mn} x_m = x_n$. □

31 Lemma. In $\mathbf{Dom}_{adj}$ let $d_n: D_{n+1} \to D_n$ be a left chain and let $\beta_n: V \to D_n$ be a lower bound. Then $\beta_m^* \beta_m v$ is an ascending chain in $V$ for every $v \in V$, and $(V, (\beta_n))$ is a limit of $(d_n)$ if and only if $\bigvee \beta_m^* \beta_m = \mathrm{id}_V$.
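PROOF. Let $\alpha_n: D \to D_n$ be the limit as in 19, which is then a limit in $\mathbf{Dom}_{adj}$ by 28. Let $g: V \to D$ be the unique map in $\mathbf{Dom}_{adj}$ with $\alpha_n g = \beta_n$ for all $n$. By 30, $g^*(x_n) = \bigvee \beta_m^* x_m$ so that $g^* g v = g^*(\alpha_n g v \mid n \ge 0) = \bigvee \beta_m^* \alpha_m g v = \bigvee \beta_m^* \beta_m v$. Thus, $g^* g = \bigvee \beta_m^* \beta_m$, and this is always defined. But then $(V, (\beta_n))$ is a limit if and only if $g$ is an isomorphism if and only if $g^* g = \mathrm{id}_V$. □

32 Lemma. Let $(P, \le)$ be a poset, let $A \subset P$ be such that $h = \mathrm{LUB}(A)$ exists and suppose $B \subset A$ is such that for all $a \in A$ there exists $b \in B$ with $a \le b$. Then $\mathrm{LUB}(B)$ exists and $\mathrm{LUB}(B) = h$.

PROOF. That $h$ is an upper bound for $B$ is clear. Now let $u$ be any upper bound for $B$. If $a \in A$ there exists $b \in B$ with $a \le b$. As $b \le u$, $a \le u$; thus, $u \in \mathrm{UB}(A)$ and $h \le u$. □

33 Proposition. The functor $\psi: \mathbf{Dom}_{adj} \to \mathbf{Dom}_{adj}$, $\psi D = [D \to D]$, is continuous.

PROOF. Let $d_n: D_{n+1} \to D_n$ be a left chain in $\mathbf{Dom}_{adj}$ with limit $\alpha_n: D \to D_n$ as in 28. Let $\beta_n = \psi\alpha_n: [D \to D] \to [D_n \to D_n]$ so that $\beta_n h = \alpha_n h \alpha_n^*$, $\beta_n^* t = \alpha_n^* t \alpha_n$, $\beta_n^* \beta_n h = \alpha_n^* \alpha_n h \alpha_n^* \alpha_n$. By Lemma 31 we must show $\bigvee \beta_n^* \beta_n h = h$. Lemma 31 guarantees that $\bigvee \alpha_n^* \alpha_n = \mathrm{id}_D$. Thus, $h = h\,\mathrm{id}_D = h \bigvee \alpha_n^* \alpha_n = \bigvee h \alpha_n^* \alpha_n$ ($h$ is continuous) $= \mathrm{id}_D \bigvee_n h \alpha_n^* \alpha_n = \bigvee_m \alpha_m^* \alpha_m \bigvee_n h \alpha_n^* \alpha_n = \bigvee_m \bigvee_n \alpha_m^* \alpha_m h \alpha_n^* \alpha_n$ ($\alpha_m^* \alpha_m$ is continuous). Now if $m \le n$, $\alpha_m^* \alpha_m h \alpha_n^* \alpha_n \le \alpha_n^* \alpha_n h \alpha_n^* \alpha_n$ since $\alpha_k^* \alpha_k$ is an ascending chain whereas, if $m > n$, $\alpha_m^* \alpha_m h \alpha_n^* \alpha_n \le \alpha_m^* \alpha_m h \alpha_m^* \alpha_m$ because $\alpha_k^* \alpha_k$ is ascending and $\alpha_m^* \alpha_m h$ is monotone. By Lemma 32, $h = \bigvee_n \alpha_n^* \alpha_n h \alpha_n^* \alpha_n = (\bigvee_n \beta_n^* \beta_n)h$ as desired. □

34 Proposition. The conditions of 21(v) hold in $\mathbf{Dom}_{adj}$, namely, every polynomial functor on $\mathbf{Dom}$ maps $\mathbf{Dom}_{adj}$ into $\mathbf{Dom}_{adj}$.

PROOF. Since $\times$, $+$ are the product and coproduct in $\mathbf{Dom}$ as in 17 and 18, 10.1.18 and its dual apply. Using 3(iii) we have $(f_1 \times \cdots \times f_n)(f_1^* \times \cdots \times f_n^*) = f_1 f_1^* \times \cdots \times f_n f_n^* = \mathrm{id} \times \cdots \times \mathrm{id} = \mathrm{id}$, whereas $(f_1^* \times \cdots \times f_n^*)(f_1 \times \cdots \times f_n)$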
$= f_1^* f_1 \times \cdots \times f_n^* f_n$. But in a product domain, $(x_1, \ldots, x_n) \le (y_1, \ldots, y_n)$ if and only if $x_i \le y_i$. Hence, $f_i^* f_i \le \mathrm{id}$ implies $f_1^* f_1 \times \cdots \times f_n^* f_n \le \mathrm{id}$. Thus, $(f_1 \times \cdots \times f_n)^* = f_1^* \times \cdots \times f_n^*$. The proof that $(f_1 + \cdots + f_n)^* = f_1^* + \cdots + f_n^*$ is similar. □

Notes and References for Chapter 13

The Scott-Strachey approach to formal semantics was set forth in D. S. Scott and C. Strachey, "Towards a mathematical semantics for computer languages," in Proceedings of the Symposium on Computers and Automata (J. Fox, ed.), Polytechnic Institute of Brooklyn Press, 1971, pp. 19-46. The relation of this to the λ-calculus can be seen from D. S. Scott, "Models for various type-free calculi," Proceedings of the IVth International Congress for Logic, Methodology and Philosophy of Science IV (P. Suppes, L. Henkin, A. Joja, and G. C. Moisil, eds.), North-Holland, 1973, pp. 157-187. The (pre-categorical) statement of the formation of data types by inverse limits in some space of data types is given by D. S. Scott, "Data types as lattices," SIAM Journal on Computing, 5, 1976, pp. 522-587. All these matters are treated in textbook form by J. E. Stoy, Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory, MIT Press, 1977. By contrast, for an informal use of the Scott-Strachey approach without attention to the mathematical issues involved see M. J. C. Gordon, The Denotational Description of Programming Languages, An Introduction, Springer-Verlag, 1979.

Gödel introduced his numbering in "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme, I," Monatshefte für Mathematik und Physik, 38, 1931, pp. 173-198. (An English translation appears in M. Davis, The Undecidable, Raven Press, 1965.) An exposition of computability theory, including the use of Gödel numbering therein, is given by A. J. Kfoury, R. N. Moll, and M. A. Arbib, A Programming Approach to Computability, Springer-Verlag, 1982.
The notion of an adjoint map as generalizing a concept of Scott's is due to Gordon Plotkin; the way in which this concept is used here in making clear the relation between different categories of domains for programming language semantics is due to the authors.

For an elementary introduction to Cartesian-closed categories and Heyting algebras see the book by Goldblatt cited in the Chapter 2 notes. Exercise 13.3.6 is an unpublished observation of F. W. Lawvere from the late 1960s.

One of the problems in writing an introductory textbook is that many exciting topics have to be omitted. One such topic is that of the semantics of concurrent programs. For order semantics, the key notion is that of a power domain, designed to generalize the construction of the set $2^D$ of subsets of a set, while avoiding the paradoxes associated with a recursive specification of the type $D ::= 2^D$. Power domains were introduced by G. D. Plotkin, "A power domain construction," SIAM Journal on Computing, 5, 1976, pp. 452-487. Our own approach to concurrency, avoiding order semantics, is given by M. Steenstrup, M. A. Arbib, and E. G. Manes, "Port automata and the algebra of concurrent processes," Journal of Computer and System Sciences, 27, 1983, pp. 29-50. For a well-received approach, see R. Milner, A Calculus of Communicating Systems, Springer-Verlag, 1980.
CHAPTER 14
Equational Specification

14.1 Initial Algebras
14.2 Sur-reflections

A collection of sets $X_1, \ldots, X_n$ together with various functions of the form $X_{i_1} \times \cdots \times X_{i_k} \to X_{i_{k+1}}$ constitutes a "many-sorted algebra." Section 1 gives examples of data types which arise as many-sorted algebras. An "equational specification" for a data type posits a many-sorted algebraic structure subject to a finite set of equations. What is attractive about this idea is that equational specifications are easily formalized within programming languages and have been partially implemented in experimental languages such as CLEAR, ACT ONE, CLU, and others. This provides a tool to define data types useful in programming and additionally promises to make available a useful research aid for pure mathematicians who study equationally defined algebraic structures.

It is an unavoidable fact that each consistent equational specification has many different models. The pure mathematician mentioned above has this in mind from the beginning. The programmer interested in defining a definite data type, however, seeks "the minimal structure satisfying the given equations and no other equations." This is formalized as the initial object of the category of all models of the specification.

In Section 1 we offer basic examples and introduce "sorted functors," which are simply endofunctors $X$ on a suitable category. Appropriate subcategories of $X$-algebras lead to a rapprochement with the least fixed points introduced in Chapter 10. These subcategories are characterized in Section 2, where it is also proved that the desired initial objects exist. In general, the proof is nonconstructive and uses "very high orders of infinity." This nonconstructiveness is well known to be unavoidable.

Many proponents of equational specification feel that almost all data types should be specified equationally. As discussed at the end of Section 1
and elsewhere in the chapter, we feel that this approach creates more problems than it solves. It remains for future language designers to find the proper role for equational specification. We have introduced other methods in the preceding chapters.

14.1 Initial Algebras

Our goal in this section is to give some basic examples of equational specifications, to formalize their semantics as initial single- or many-sorted algebras, and to introduce their generalization as algebras over "sorted functors."

1 Example. An equational specification of the data type "Boolean" is as follows:

$1 \xrightarrow{F,\,T} B, \qquad B \xrightarrow{\neg} B, \qquad B^2 \xrightarrow{\vee} B$

(i.e., there is a set $B$ with two constants $F$ (for "false"), $T$ (for "true"), a unary operation $\neg$ ("negation"), and a binary operation $\vee$ ("or")) subject to the six equations:

2 $\neg F = T$, $\neg T = F$, $T \vee T = T \vee F = F \vee T = T$, $F \vee F = F$.

What we have in mind is the set $B = \{T, F\}$ for which the above equations would force the usual definitions of "negation" and "or." The additional property

3 "if $x \in B$ then $x = T$ or $x = F$"

would guarantee this, but 3 is not in the form of an equation. Well-known theorems of "first-order model theory" guarantee that 3 cannot be equivalently reexpressed using equations, even if infinitely many are allowed. (See 15 below for a definition of equation.) The point of limiting constraints to equations is simplicity, both from the point of view of syntax and efficiency of implementation.

As it stands there are many models of these equations, for example, $B = 2^X$ for any set $X$, with $F = \emptyset$, $T = X$, $\neg$ = complement, and $\vee$ = union. The "intended" model is the initial object in a suitable category of models. To make this precise, given two models $(B, T, F, \neg, \vee)$, $(B', T', F', \neg', \vee')$
satisfying (the appropriately tagged version of) 2, define a morphism $f: (B, T, F, \neg, \vee) \to (B', T', F', \neg', \vee')$ between two models to be a total function $f: B \to B'$ subject to

4 $f(T) = T'$, $f(F) = F'$, $f(\neg x) = \neg'(f(x))$, $f(x \vee y) = f(x) \vee' f(y)$.

The conditions 4 are natural. They may be paraphrased as "$f$ preserves the structure." It is easily seen that this yields a category under the usual composition of total functions and with the usual identity functions. Furthermore, the intended model $(B, T, F, \neg, \vee)$ with $B = \{T, F\}$ is an initial object in this category since for arbitrary $(B', T', F', \neg', \vee')$, $f(T) = T'$, $f(F) = F'$ is the requisite unique morphism.

5 Example. An equational specification of "the natural numbers" is

$1 \xrightarrow{0} Q \xrightarrow{s} Q$

(so far we have an arbitrary set $Q$, an element $0 \in Q$, and a function $s: Q \to Q$) subject to the empty set of equations. What we have in mind is $Q = \mathbf{N}$ with 0 the usual zero and $s(n) = n + 1$ the successor map. By iterated composition we can generate $s^n: Q \to Q$ and we can apply such to 0 to get $s^n 0$. But in the intended model no equations such as

$s^n = s^m$ $(m \ne n)$, $\quad s^n 0 = s^m 0$ $(m \ne n)$

hold, and it is hard to think of any equations to impose. Many other properties come to mind, such as

if $n \ne m$ then $s^m 0 \ne s^n 0$

or

if $q \in Q$ then $q = s^n 0$ for some $n$,

but these statements are not equations. However, as has already been shown in 2.2.27, the intended model is the initial object in the category of all models of the specification.

6 Example. Let $E = \{e_1, \ldots, e_p\}$ be a finite alphabet. An equational specification for the data type "stacks of $E$" analogous to 12.2.1,2,5,6 is as in 7 together with equations to be discovered.
7 [Diagram: constants $e_1, \ldots, e_p$, aerr in $A$; constants $\Lambda$, serr in $S$; operations top: $S \to A$, pop: $S \to S$, push: $A \times S \to S$.]

Here the intended model is $A = E + \{\mathrm{aerr}\}$, $S = E^* + \{\mathrm{serr}\}$ where aerr and serr represent "alphabet error" and "stack error" constants which are introduced so that all functions may be total. Given two such models 7 and 7' (where in 7' we write $A'$, $e_i'$, top', etc.) a morphism from the model of 7 to that of 7' is a pair $f: A \to A'$, $g: S \to S'$ of total functions which preserves the structure in the expected sense:

8 $f(e_i) = e_i'$ $(i = 1, \ldots, p)$, $f(\mathrm{aerr}) = \mathrm{aerr}'$, $g(\Lambda) = \Lambda'$, $g(\mathrm{serr}) = \mathrm{serr}'$, $g(\mathrm{pop}(w)) = \mathrm{pop}'(g(w))$, $\mathrm{top}'(g(w)) = f(\mathrm{top}(w))$, $g(\mathrm{push}(x, w)) = \mathrm{push}'(f(x), g(w))$.

Now consider a model 7. There is an obvious candidate for a morphism from the intended interpretation to this model, namely,

9 $f: E + \{\mathrm{aerr}\} \to A$, $f(e_i) = e_i$, $f(\mathrm{aerr}) = \mathrm{aerr}$

and $g: E^* + \{\mathrm{serr}\} \to S$ inductively defined by

10 $g(\mathrm{serr}) = \mathrm{serr}$, $g(\Lambda) = \Lambda$, $g(xw) = \mathrm{push}(x, g(w))$ $(x \in E, w \in E^*)$.

Appropriate equations, which together with 7 define an equational specification of "stack of $E$," are discovered by exploring what is needed to make 9 and 10 define a unique morphism. Since $\mathrm{push}(x, w) = xw$ in the intended model, any morphism must satisfy 9 and 10. Hence, $f$, $g$ are unique without imposing any equations. But certain equations are suggested if $f$, $g$ are to be morphisms. Consider the need for $g(\mathrm{pop}(w)) = \mathrm{pop}(g(w))$. The inductive step would
be for $xw$. Now $\mathrm{pop}(xw)$ is $w$ in 12.2.2, whereas by 10 $\mathrm{pop}(g(xw)) = \mathrm{pop}(\mathrm{push}(x, g(w)))$. These are the same provided that we impose the equation $\mathrm{pop}(\mathrm{push}(x, w)) = w$, which is certainly true in the intended model. Proceeding in this way, we leave it to the reader to show that the intended model is the initial object in the category of models 7 satisfying the equations 11.

11 For all $x \in A$, $w \in S$,
pop(push(x, w)) = w,
top(push(x, w)) = x,
pop(Λ) = pop(serr) = serr,
top(Λ) = top(serr) = aerr,
push(aerr, w) = push(x, serr) = serr.

With these examples in tow we may now give the general definitions needed to define equational specifications. First of all, data types may require several sets. In Example 6 there were the sets $A$, $S$. Operations take the form $X_1 \times \cdots \times X_k \to X_{k+1}$, where each $X_i$ is one of $A$, $S$ and if $k = 0$ the operation is a constant of $X_{k+1}$. Now imagine the syntax of a programming language that declared, for example, 7. First a set of sorts would be declared, for the two sorts of set. Here {alpha, stack} would be appropriate. We then might declare 7 with the self-evident syntax

[ → alpha] = $x_1, \ldots, x_p$, aerr,
[ → stack] = Λ, serr,
[stack → alpha] = top,
[stack → stack] = pop,
[alpha × stack → stack] = push.

The mathematical definition of a signature is precisely this sort of declaration save that we write $\Omega_{x_1 \cdots x_n x}$ instead of $[x_1 \times \cdots \times x_n \to x]$. (Note that the use of $S$ for the set of sorts in 12 should not be confused with $S$ for stacks in 7.)

12 Definition. A signature is $(S, \Omega)$ where $S$ is a finite nonempty set (of sorts) and $\Omega$ is a family $(\Omega_w \mid w \in S^+)$ (recall $S^+$ is the set of nonempty strings in $S$) of sets with $\Omega_w \cap \Omega_v = \emptyset$ if $w \ne v$.

13 Definition. An algebra over a signature $(S, \Omega)$ (usually called an $\Omega$-algebra for short) is a structure $(Q, \delta)$ where $Q$ is a family $(Q_x \mid x \in S)$ of sets, that is, there is one set of each sort, and $\delta$ is a family of operations $(\delta_\omega \mid \omega \in \bigcup \Omega_w)$
where if $\omega \in \Omega_{x_1 \cdots x_n x}$, $\delta_\omega$ has the form $Q_{x_1} \times \cdots \times Q_{x_n} \to Q_x$ (this being a constant in $Q_x$ if $n = 0$).

A signature or any of its algebras is called many-sorted if there is more than one sort and single-sorted if there is only one sort. Examples 1 and 5 involve single-sorted algebras. Example 6, as already discussed, has two sorts.

14 Definition. The set of terms of a signature $(S, \Omega)$ and the type of each term are mutually defined inductively as follows:

(i) If $x \in S$ and $i$ is an integer $\ge 1$ then $v_{x,i}$ is a term and the type of $v_{x,i}$ is $x$. (Such $v_{x,i}$ is the $i$th variable of type $x$.)
(ii) If $\omega \in \Omega_{x_1 \cdots x_n x}$ and $t_1, \ldots, t_n$ are terms such that the type of $t_i$ is $x_i$ then $\omega(t_1, \ldots, t_n)$ is a term of type $x$.

Intuitively, terms codify what appears on one side of an equation. Thus, 11 provides several examples. For instance, for brevity write $x = v_{\mathrm{alph},1}$, $w = v_{\mathrm{stack},1}$. Then top(push(x, w)) is a term of type alph as follows: $x$ is a term of type alph and $w$ is a term of type stack by (i); push(x, w) is a term of type stack by (ii); top(push(x, w)) is a term of type alph by (ii). The "variables" in top(push(x, w)) are $x$, $w$. If $(Q, \delta)$ is any algebra this term is interpreted as a function $Q_{\mathrm{alph}} \times Q_{\mathrm{stack}} \to Q_{\mathrm{alph}}$. The type of the term is the sort where this function takes values. We leave the precise inductive definition to the reader as Exercise 2. We then have

15 Definition. Let $(S, \Omega)$ be a signature. An $\Omega$-equation is any pair of terms of the same type; such an equation is written $t_1 = t_2$ rather than $(t_1, t_2)$. An $\Omega$-algebra $(Q, \delta)$ satisfies the equation $t_1 = t_2$ if the interpreted functions of $t_1$, $t_2$ on $(Q, \delta)$ are equal.

The following is then the central definition of this section.

16 Definition. An equational specification is $(S, \Omega, E)$, where $(S, \Omega)$ is a signature and $E$ is a finite set of $\Omega$-equations.
The data type specified by $(S, \Omega, E)$ is the initial object of the full subcategory of all $\Omega$-algebras satisfying all equations in $E$, where a morphism $f: (Q, \delta) \to (Q', \delta')$ of $\Omega$-algebras is a family $f = (f_x \mid x \in S)$ where each $f_x: Q_x \to Q'_x$ is a total function and the algebra structure is preserved in the sense that

17 $\delta'_\omega(f_{x_1}(q_1), \ldots, f_{x_n}(q_n)) = f_x(\delta_\omega(q_1, \ldots, q_n))$

for all $q_i \in Q_{x_i}$. (See Exercise 3.)

The existence of this initial object is proved as Corollary 14.2.25 below. One special case is worthy of note here.

18 Theorem. For an equational specification $(S, \Omega, \emptyset)$ with no equations, the specified data type $(Q, \delta)$ may be constructed with $Q_x$ the subset of terms of type $x$ which have no variables.

PROOF. See Exercise 7 for an outline. □

Theorem 18 applies to Example 5. If $S = \{x\}$, all terms (necessarily of type $x$) are one of $s^n(0)$, $s^n(v)$ with $v$ a variable, and so the terms with no variables are the $s^n(0)$ as expected.

We now wish to develop the idea that $\Omega$-algebras are really $X_\Omega$-algebras in the sense of Chapter 10. This idea is simple. Consider the $\Omega$-algebra appropriate to Example 6, so that such a $(Q, \delta)$ has the form

19 [Diagram: $p + 1$ constants $1 \to Q_{\mathrm{alph}}$; two constants $1 \to Q_{\mathrm{stack}}$; top: $Q_{\mathrm{stack}} \to Q_{\mathrm{alph}}$; pop: $Q_{\mathrm{stack}} \to Q_{\mathrm{stack}}$; push: $Q_{\mathrm{alph}} \times Q_{\mathrm{stack}} \to Q_{\mathrm{stack}}$.]

Using the device that a family of morphisms $A_i \to B$ amounts to a single morphism $\coprod A_i \to B$, all of 19 can be expressed as a single morphism of the form

$([p + 1] + Q_{\mathrm{stack}},\ [2] + Q_{\mathrm{stack}} + (Q_{\mathrm{alph}} \times Q_{\mathrm{stack}})) \to (Q_{\mathrm{alph}}, Q_{\mathrm{stack}})$.
Here a morphism $(A, B) \to (A', B')$ is defined to be a pair of total functions $A \to A'$, $B \to B'$ with no further conditions, and, recalling the notation $[n] = \{1, \ldots, n\}$, we define

20 $X_\Omega(Q_{\mathrm{alph}}, Q_{\mathrm{stack}}) = ([p + 1] + Q_{\mathrm{stack}},\ [2] + Q_{\mathrm{stack}} + (Q_{\mathrm{alph}} \times Q_{\mathrm{stack}}))$,

noting that a morphism $[n] \to Q$ amounts to a list of $n$ elements of $Q$. Viewing things from this point of view allows us, in the next section, to put equational specifications and functorial least fixed points in a common framework. This generalizes both theories in a useful way.

The idea of this chapter works for a much larger class of categories than we actually introduce. This limitation keeps the basic simplicity of the ideas in focus. Some further developments are suggested in Exercise 9 and some of the exercises to Section 2.

21 Definition. If $S$ is any nonempty set, the category $\mathbf{Set}^S$ is defined as follows. An object $A$ is an $S$-indexed family of sets, $A = (A_x : x \in S)$ with each $A_x$ a set. A morphism $f: A \to B$ is a family $f = (f_x)$ of total functions of the form $f_x: A_x \to B_x$ for all $x \in S$. Composition is the usual one on each coordinate, that is, for $g: B \to C$, $(gf)_x(a) = g_x(f_x(a))$ for all $x \in S$, $a \in A_x$. The identity morphism $\mathrm{id}_A: A \to A$ is defined by $(\mathrm{id}_A)_x(a) = a$. That this constitutes a category is obvious.

22 Definition. If $(S, \Omega)$ is any signature we define a functor $X_\Omega: \mathbf{Set}^S \to \mathbf{Set}^S$ so that an $X_\Omega$-algebra in the sense of 10.2.1 is essentially the same thing as an $\Omega$-algebra as defined in 13. To this end, if $(A_x)$ is an object in $\mathbf{Set}^S$ and $w \in S^*$ let

23 $A_w = A_{x_1} \times \cdots \times A_{x_n}$ if $w = x_1 \cdots x_n$.

(If $w = \Lambda$, define $A_w = \{\Lambda\}$.) Define $X_\Omega$ on objects by

24 $X_\Omega(A_x) = (B_x)$, $\quad B_x = \coprod (A_w \mid w \in S^*, \omega \in \Omega_{wx})$.

This says that for each $w$ in $S^*$, $B_x$ contains a separate copy of $A_w$ for each $\omega$ in $\Omega_{wx}$. (Recall that Definition 12 guarantees that each $\omega$ labels an element of at most one $\Omega_{wx}$, so we can write $(B_x)_\omega = A_w$ for the corresponding copy.
Thus, $B_x$ contains as many copies of $A_w$ as there are elements of $\Omega_{wx}$; if $\Omega_{wx}$ is empty for all $w$, $B_x = \emptyset$.) The reader should now pause to verify that 20 is a special case of 23 providing we identify $[n]$ with an $n$-fold coproduct of singletons. We then already see that an $X_\Omega$-algebra $\delta: X_\Omega Q \to Q$ has the form of a family $(\delta_x \mid x \in S)$ with $\delta_x$ of the form $\coprod (Q_w \mid w \in S^*, \omega \in \Omega_{wx}) \to Q_x$. By the coproduct property, each such $\delta_x$ has the form $\langle \delta_\omega \mid w \in S^*, \omega \in \Omega_{wx} \rangle$ where each such $\delta_\omega$ has the form
$\delta_\omega: Q_w \to Q_x$. The resulting family of $\delta_\omega$ is then exactly the same as in 13, so an $X_\Omega$-algebra "is" an $\Omega$-algebra under the bijective correspondence $\delta \leftrightarrow (\delta_\omega)$ induced by the definition of a $\mathbf{Set}^S$-morphism and the coproduct property. There is no difficulty in defining $X_\Omega$ on morphisms to make it a functor in such a way that an $X_\Omega$-algebra morphism corresponds to an $\Omega$-algebra morphism as in 17. We leave proof details to the reader, but the definition is as follows: Given $f: A \to A'$ in $\mathbf{Set}^S$, define $X_\Omega f = (g_x)$ where, in the notation of 23, $g_x: B_x \to B'_x$ is defined by 25:

25 $g_x$ maps the copy $(B_x)_\omega = A_w$, $w = x_1 \cdots x_n$, to the copy $(B'_x)_\omega = A'_w$ by $(a_1, \ldots, a_n) \mapsto (f_{x_1}(a_1), \ldots, f_{x_n}(a_n))$.

26 Example. For $X_\Omega$ as in 20, if $f: A \to A'$ then $X_\Omega f = (g_{\mathrm{alph}}, g_{\mathrm{stack}})$. Here, $g_{\mathrm{alph}}: [p + 1] + A_{\mathrm{stack}} \to [p + 1] + A'_{\mathrm{stack}}$ maps $1, \ldots, p + 1$ to themselves and maps $A_{\mathrm{stack}}$ to $A'_{\mathrm{stack}}$ using $f_{\mathrm{stack}}$, whereas $g_{\mathrm{stack}}: [2] + A_{\mathrm{stack}} + A_{\mathrm{alph}} \times A_{\mathrm{stack}} \to [2] + A'_{\mathrm{stack}} + A'_{\mathrm{alph}} \times A'_{\mathrm{stack}}$ is the map

$1 \mapsto 1$, $\quad 2 \mapsto 2$,
$a \mapsto f_{\mathrm{stack}}(a)$ $(a \in A_{\mathrm{stack}})$,
$(a, b) \mapsto (f_{\mathrm{alph}}(a), f_{\mathrm{stack}}(b))$ $(a \in A_{\mathrm{alph}}, b \in A_{\mathrm{stack}})$.

Having presented the basic definitions, we now assess some of the problems associated with defining data types by equational specification. In situations such as Examples 1 and 5, where the desired equations seem clear at the beginning, the method works well. In a situation such as Example 6, where the intended model seems clearer than the set of equations, serious problems may arise:

Problem (i). More than finitely many equations may be needed.

Problem (ii). It may be difficult to reconcile the initial object condition with intuitively correct equations of the intended model.

A different problem, but a crucial one from the point of view of implementation of equational specification, is the following:

Problem (iii). Given an equational specification $(S, \Omega, E)$ there may be no effective algorithm to decide whether or not a given $\Omega$-equation holds in the specified data type.
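Returning briefly to 20 and Example 26, the action of $X_\Omega$ for the stack signature can be sketched on finite sets. This is only an illustration under our own assumptions: coproducts are modelled as sets of tagged pairs, the tags `"const"`, `"top"`, `"pop"`, `"push"` and the value `P = 2` are hypothetical choices, and the final lines check one instance of functoriality rather than prove it.

```python
P = 2  # hypothetical alphabet size p; any p >= 1 would do

def x_omega_obj(a_alph, a_stack):
    """X_Omega on objects, following 20.

    Coproduct elements are tagged pairs (tag, value) so that summands stay
    disjoint; each non-constant summand is tagged by the operation it feeds.
    """
    b_alph = {("const", i) for i in range(1, P + 2)} | {("top", s) for s in a_stack}
    b_stack = (
        {("const", i) for i in (1, 2)}
        | {("pop", s) for s in a_stack}
        | {("push", (a, s)) for a in a_alph for s in a_stack}
    )
    return b_alph, b_stack

def x_omega_map(f_alph, f_stack):
    """X_Omega on a morphism (f_alph, f_stack), following Example 26."""
    def g_alph(elem):
        tag, value = elem
        # constants are fixed; the A_stack summand is mapped by f_stack
        return elem if tag == "const" else ("top", f_stack(value))

    def g_stack(elem):
        tag, value = elem
        if tag == "const":
            return elem
        if tag == "pop":
            return ("pop", f_stack(value))
        a, s = value
        return ("push", (f_alph(a), f_stack(s)))

    return g_alph, g_stack

# A small object (A_alph, A_stack) and two composable morphisms.
a_alph, a_stack = {"a", "b"}, {"", "a"}
b_alph, b_stack = x_omega_obj(a_alph, a_stack)
double = lambda s: s + s
g_alph, g_stack = x_omega_map(str.upper, str.upper)
k_alph, k_stack = x_omega_map(double, double)
c_alph, c_stack = x_omega_map(lambda s: double(s.upper()), lambda s: double(s.upper()))

# One instance of X(k . f) = X(k) . X(f), checked elementwise.
functorial = (all(c_alph(e) == k_alph(g_alph(e)) for e in b_alph)
              and all(c_stack(e) == k_stack(g_stack(e)) for e in b_stack))
print(functorial)
```

Tagging each summand by its operation is one way to realize the remark after 24 that $B_x$ contains a separate copy of $A_w$ for each $\omega$ in $\Omega_{wx}$.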
Problem (iii) creates situations where nonconstructive mathematical proofs of the existence of a specified data type exist, but no effective construction of these data types can be given.

A practical illustration of some of these difficulties arises by grappling with the relationship between stacks and queues. Our earlier approach in Section 12.2 seemed natural to us. The fundamental set is the set of lists. This is then interpreted as stacks, or as queues, depending on which functions one is allowed to use. From the point of view of using equational specification, we have defined stacks in Example 6 and it is still necessary to find appropriate equations for queues. We have left this to the reader as Exercise 10. Once this is achieved, it would be a nice victory for equational specification to mathematically provide natural reasons why the same set (lists) underlies the data types stacks and queues. Unfortunately, we know of no proof other than to verify that both initial objects are built from lists. In short, our criticism is that we have put the cart before the horse in that what we already knew about stacks and queues is used to build an equational specification, whereas equational specification has not helped us to better understand these data types.

EXERCISES FOR SECTION 14.1

1. Show explicitly that for $f$, $g$ in 9 and 10 the equations of 11 force $\mathrm{top}(g(w)) = f(\mathrm{top}(w))$, $g(\mathrm{push}(x, w)) = \mathrm{push}(f(x), g(w))$.
2. For any signature $(S, \Omega)$ as discussed just prior to 15 define "the variables of term $t$" and define "the function interpreting a term on an $\Omega$-algebra" by induction on how the term is built.
3. Prove that with the morphisms of 17 and with composition $(gf)_x = Q_x \xrightarrow{f_x} Q'_x \xrightarrow{g_x} Q''_x$ and identities $(\mathrm{id}_Q)_x = \mathrm{id}_{Q_x}$, $\Omega$-algebras form a category.
4. Verify in detail that the single-sorted data type $\{T, F\}$ of Example 1 is the data type specified by an equational specification in the sense of 16.
5. Repeat Exercise 4 for Example 5.
6. Repeat Exercise 4 for Example 6.
7. Prove Theorem 18. [Hint: For $\omega \in \Omega_{x_1 \cdots x_n x}$, let 14(ii) define $\delta_\omega$. Exercise 2 provides the initial object property.]
8. Prove carefully that $X_\Omega$ is functorial and that the bijective correspondence between $X_\Omega$-algebras and $\Omega$-algebras is in fact an isomorphism of categories in the sense of the exercises of Section 11.3.
9. Generalize Definition 21 to $C^S$ for any category $C$.
10. Give an equational specification for queues.
14.2 Sur-reflections

In Section 1 we saw that the equational approach to abstract specification of data types is to define a finite set $E$ of equations over some signature $(S, \Omega)$, and then define the data type so specified as the initial object of the corresponding category of $\Omega$-algebras. In this section we develop the proof that such an initial object exists. The reader willing to accept this fact without proof may omit this section.

We begin by studying the structure of the category $\mathbf{Set}^S$ of 14.1.21.

1 Observation. $\mathbf{Set}^S$ has products, coproducts, colimits of right chains, and limits of left chains, and these are constructed independently on each coordinate as in $\mathbf{Set}$. Thus, if $A_i = (A_{i,x} \mid x \in S)$ in $\mathbf{Set}^S$, define $A$ by $A_x = \prod_i A_{i,x}$. Then with projections $\mathrm{pr}_i: A \to A_i$, the usual projection $\mathrm{pr}_{i,x}: A_x \to A_{i,x}$ for each $x$, this is a product in $\mathbf{Set}^S$. Coproducts are constructed similarly. Let us outline most of the details for colimits of right chains. (Limits of left chains are then similar.) Let $A_n \to A_{n+1}$ be a right chain in $\mathbf{Set}^S$. Thus, for each $x \in S$, $A_{n,x} \to A_{n+1,x}$ is a right chain in $\mathbf{Set}$ and so has a colimit $\alpha_{n,x}: A_{n,x} \to U_x$ as in 11.1.7. Thus, $\alpha_n: A_n \to U$ is a morphism in $\mathbf{Set}^S$ if $U = (U_x \mid x \in S)$. The triangle formed by $\alpha_n$, $\alpha_{n+1}$, and the chain map $A_n \to A_{n+1}$ commutes in $\mathbf{Set}^S$ because the corresponding triangle in $\mathbf{Set}$ commutes for each $x \in S$. The rest of the proof that $(U, \alpha)$ is a colimit follows a similar pattern: do whatever is needed for each $x$ separately.

2 Proposition. For every signature $(S, \Omega)$, $X_\Omega: \mathbf{Set}^S \to \mathbf{Set}^S$ is bicontinuous.

PROOF. This is an easy consequence of 11.2.5-9 and 11.3.8,10. □

In particular, we recapture 14.1.18, the existence of an initial $\Omega$-algebra, by applying the Generalized Kleene theorem 11.2.13.

Of course, we have extra work to do to establish an initial object among
those algebras satisfying a given set of equations. According to Definition 14.1.16, we seek the initial object of certain subcategories of $\Omega$-algebras or, equivalently, of $X_\Omega$-algebras. The subcategories are characterized by the satisfaction of equations. We shall first show that such subcategories must be closed under products and subalgebras and then go on to see why any such subcategory must have an initial object. We begin with the relevant definitions.

3 Definition. Let $(S, \Omega)$ be a signature and $((Q_i, \delta_i) \mid i \in I)$ be a family of $X_\Omega$-algebras ($I$ is not necessarily countable). Let $Q = \prod Q_i$ in $\mathbf{Set}^S$. By the product property there exists unique $\delta: X_\Omega Q \to Q$ with $\mathrm{pr}_i\,\delta = \delta_i\,(X_\Omega \mathrm{pr}_i)$ for all $i \in I$. Then $(Q, \delta)$ is an $X_\Omega$-algebra. From the $\Omega$-algebra point of view, if $\omega \in \Omega_{x_1 \cdots x_n x}$ then it is routinely checked that $\delta_\omega: Q_{x_1} \times \cdots \times Q_{x_n} \to Q_x$ is defined by $\delta_\omega((q_{i,x_1}, \ldots, q_{i,x_n}) \mid i \in I) = (r_i)$, $r_i = \delta_{i,\omega}(q_{i,x_1}, \ldots, q_{i,x_n})$. Such $(Q, \delta)$ is called the product algebra. See Exercise 3. When $I = \emptyset$ the definition is satisfied with the obvious terminal algebra with each $Q_x = 1$ a one-element set and with the unique morphism $X_\Omega Q \to Q$.

4 Definition. Let $(S, \Omega)$ be a signature and let $(Q, \delta)$ be an $X_\Omega$-algebra. For each $x \in S$ let $A_x \subset Q_x$ with inclusion map $\mathrm{inc}_x: A_x \to Q_x$, $\mathrm{inc}_x(a) = a$. Then $A = (A_x)$ is a subalgebra of $(Q, \delta)$ if there exists $\gamma: X_\Omega A \to A$ with $\mathrm{inc}\,\gamma = \delta\,(X_\Omega \mathrm{inc})$. Evidently, such $\gamma$ exists in at most one way, and exists if and only if $A$ is "closed under the operations," that is, for $\omega \in \Omega_{x_1 \cdots x_n x}$ and $a_1 \in A_{x_1}, \ldots, a_n \in A_{x_n}$, $\delta_\omega(a_1, \ldots, a_n)$, necessarily in $Q_x$, is in fact in $A_x$. Note that $(A, \gamma)$ is an algebra in its own right. We also say "$(A, \gamma)$ is a subalgebra of $(Q, \delta)$."

5 Observation. Let $(S, \Omega)$ be a signature and let $E$ be any set of $\Omega$-equations.
Then the class of all $\Omega$-algebras satisfying all equations in $E$ is closed under products and subalgebras, that is, (i) if $(Q_i, \delta_i)$ satisfies $E$ ($i \in I$) then their product algebra $(Q, \delta)$ also satisfies $E$, and (ii) if $(Q, \delta)$ satisfies $E$, and $(A, \gamma)$ is a subalgebra of $(Q, \delta)$ then $(A, \gamma)$ also satisfies $E$.
PROOF IDEA. Let $t$ be a term of type $x$ with variables $x_1, \ldots, x_n$. Then the interpretation of $t$ on $Q$ is the $I$-tuple of interpretations on the $Q_i$, that is, $t$ maps $(q_{ij} \mid i \in I) \in Q_{x_j}$ ($j = 1, \ldots, n$) to $(r_i \mid i \in I) \in Q_x$, where $r_i$ is the interpreted value of $t$ in $(Q_i)_x$ of $(q_{i1}, \ldots, q_{in})$. This is because the $\Omega$-operations in a product are performed on each coordinate separately. Similarly, if $A$ is a subalgebra of $(Q, \delta)$ and $a_i \in A_{x_i}$, $t$ maps $(a_i)$ to the same value of $A_x$ regardless of whether $t$ is the interpretation in $(A, \gamma)$ or $(Q, \delta)$ since the operations on a subalgebra just restrict those of the ambient algebra. The conclusions (i) and (ii) above are then clear. □

6 Example. Let $(S, \Omega, E)$ be the equational presentation of stacks as in 14.1.6, 11. Let $(Q, \delta)$ be the initial stack $Q_{\mathrm{alph}} = \Sigma + \{aerr\}$, $Q_{\mathrm{stack}} = \Sigma^* + \{serr\}$ discussed earlier. Then the product $(R, \gamma) = (Q, \delta) \times (Q, \delta)$ has $R_{\mathrm{alph}} = (\Sigma + \{aerr\}) \times (\Sigma + \{aerr\})$, $R_{\mathrm{stack}} = (\Sigma^* + \{serr\}) \times (\Sigma^* + \{serr\})$. In $(R, \gamma)$, $push((x, aerr), (w, v)) = (push(x, w), push(aerr, v)) = (xw, serr)$, so that $top(push((x, aerr), (w, v))) = top(xw, serr) = (x, aerr)$. This is expected as $(R, \gamma)$ satisfies all equations in 11. $(R, \gamma)$ is very different from the intended model $(Q, \delta)$. For example, the constants $\sigma_i$ are $(\sigma_i, \sigma_i) \in R_{\mathrm{alph}}$ but $R_{\mathrm{alph}}$ has more arbitrary elements of form $(\sigma_i, \sigma_j)$. This illustrates how the class of $\Omega$-algebras satisfying $E$ has many models that differ from any "intended" one and that some device such as initiality is needed to specify a particular model.

For the balance of this section we fix a finite set $S$, an arbitrary cocontinuous endofunctor $X: \mathbf{Set}^S \to \mathbf{Set}^S$ and a full subcategory $\mathbf{B}$ of the category (10.2.1) $X$-$\mathbf{Alg}$ of $X$-algebras. Our main objective is to discover conditions on $\mathbf{B}$ which, on the one hand, hold when $X$ has the form $X_\Omega$ and $\mathbf{B}$ is the subcategory of all $\Omega$-algebras satisfying a set of equations, and which, on the other hand, guarantee that $\mathbf{B}$ has an initial object.
To this end we require that every $X$-algebra has a "sur-reflection" in $\mathbf{B}$. The proof that $\mathbf{B}$ has an initial object requires some study of surjective functions and "image factorization" in $\mathbf{Set}$. We present these ideas in a way that leads naturally to generalization to more arbitrary categories than $\mathbf{Set}^S$, as is explored in the end-of-chapter notes.

7 Definition. Let $(Q, \delta)$ be an $X$-algebra. A reflection of $(Q, \delta)$ in $\mathbf{B}$ is a morphism $\theta: (Q, \delta) \to (B, \gamma)$ in $X$-$\mathbf{Alg}$ such that $(B, \gamma)$ is in $\mathbf{B}$ with the property that whenever $\theta': (Q, \delta) \to (B', \gamma')$ is an $X$-algebra morphism
with $(B', \gamma')$ in $\mathbf{B}$ then there exists unique $f: (B, \gamma) \to (B', \gamma')$ in $X$-$\mathbf{Alg}$ with $f\theta = \theta'$, as shown in the triangle

8   $\theta: (Q, \delta) \to (B, \gamma)$, $\quad \theta': (Q, \delta) \to (B', \gamma')$, $\quad f: (B, \gamma) \to (B', \gamma')$, $\quad f\theta = \theta'$.

Clearly, $(B, \gamma, \theta)$ are unique up to isomorphism when they exist. See Exercise 11. Our interest in reflections lies in the following.

9 Proposition. If the initial $X$-algebra $(Q, \delta)$ has a reflection $\theta: (Q, \delta) \to (B, \gamma)$ in $\mathbf{B}$, $(B, \gamma)$ is an initial object of $\mathbf{B}$.

PROOF. (Such initial $(Q, \delta)$ exists by the Generalized Kleene theorem 11.2.13 in view of 1 and the assumed co-continuity of $X$.) Now let $(B', \gamma')$ in $\mathbf{B}$ be arbitrary. As $(Q, \delta)$ is initial, there exists unique $\theta'$ as in 8 and hence unique $f$. If $g: (B, \gamma) \to (B', \gamma')$ is arbitrary then as $(Q, \delta)$ is initial necessarily $g\theta = \theta'$ so that, by the definition of a reflection, $g = f$. □

10 Example. Consider the single-sorted signature with two constants $c$, $d$. An algebra is simply a set with two constants. Let $\mathbf{B}$ be the full subcategory of all algebras satisfying the equation $c = d$. If $(Q, \delta)$ is an arbitrary algebra with constants $\delta_c$, $\delta_d$ its reflection in $\mathbf{B}$ is obtained by identifying $\delta_c$ and $\delta_d$, that is, let $\theta: Q \to B$ be the surjection that sends both $\delta_c$ and $\delta_d$ to a single element $e$ and is injective elsewhere. That $(B, \gamma)$ is in $\mathbf{B}$ if $\gamma_c = \gamma_d = e$ is clear and that $\theta$ is a morphism is obvious. Now consider $\theta'$ in 8. To see that $f$ as in 8 is induced, it pays to first establish the following fact from set theory which we shall need later as well.

11 Lemma. Let $\theta: Q \to B$ be a surjective function and let $\theta': Q \to B'$ be an arbitrary function. Then there exists $f: B \to B'$ with $f\theta = \theta'$
if and only if for all $q, r \in Q$, $\theta' q = \theta' r$ whenever $\theta q = \theta r$. Furthermore, if such $f$ exists it is the only $f$ with $f\theta = \theta'$.

PROOF. If $f\theta = \theta'$ then if $\theta q = \theta r$, $\theta' q = f(\theta q) = f(\theta r) = \theta' r$. Conversely, define $f$ as follows. Given $b \in B$, as $\theta$ is surjective choose $q_0$ with $\theta q_0 = b$ and define $fb = \theta' q_0$. Let $q \in Q$. Then by the definition of $f$, $f\theta q = \theta' q_0$ where $\theta q_0 = \theta q$. By hypothesis, $\theta' q_0 = \theta' q$ so $f\theta q = \theta' q$ and this shows $f\theta = \theta'$. Finally, $f$ is unique because if also $g\theta = \theta'$ then if $fb = \theta' q_0$ as above, $gb = g\theta q_0 = \theta' q_0 = fb$. □

Returning to Example 10, the only situation in which $\theta q = \theta r$ with $q \neq r$ is when $\{q, r\} = \{\delta_c, \delta_d\}$ and in this instance, because $\theta'$ is a morphism and $(B', \gamma')$ is in $\mathbf{B}$, $\theta' q = \theta' r$. It follows from Lemma 11 that there exists a unique function $f$ with $f\theta = \theta'$. The argument is complete if such $f$ is an algebra morphism. This is established by the following lemma which is also needed later.

12 Lemma. Let $\theta: (Q, \delta) \to (B, \gamma)$, $\theta': (Q, \delta) \to (B', \gamma')$ be $X$-algebra morphisms and let $f: B \to B'$ be a morphism in $\mathbf{Set}^S$ such that the triangle $f\theta = \theta'$ commutes. Then if $\theta_x: Q_x \to B_x$ is surjective for each $x \in S$, $f: (B, \gamma) \to (B', \gamma')$ is an $X$-algebra morphism.

PROOF. Let $x \in S$. As $\theta_x: Q_x \to B_x$ is surjective there exists $d_x: B_x \to Q_x$ with $\theta_x d_x = \mathrm{id}_{B_x}$. (If $Q_x = \emptyset$ necessarily $B_x = \emptyset$ so let $d_x = \mathrm{id}_\emptyset$, else let $d_x(b)$ be any $q$ with $\theta_x q = b$.) This defines $d: B \to Q$ in $\mathbf{Set}^S$ with $\theta d = \mathrm{id}_B$. As $X$ is a functor, $(X\theta)(Xd) = \mathrm{id}_{XB}$. It follows that each $(X\theta)_x: (XQ)_x \to (XB)_x$ is surjective since for any $t \in (XB)_x$, $t = (X\theta)_x u$ if $u = (Xd)_x t$. Now consider the diagram with top row $XQ \xrightarrow{X\theta} XB \xrightarrow{Xf} XB'$, bottom row $Q \xrightarrow{\theta} B \xrightarrow{f} B'$, and vertical maps $\delta$, $\gamma$, $\gamma'$, in which the right-hand square is marked ?. The square marked ? is what we need to establish as commutative. But as $f\theta = \theta'$, $(Xf)(X\theta) = X(f\theta) = X\theta'$, the outer rectangle commutes by our assumption that $\theta'$ is an algebra morphism.
Thus, $(\gamma'(Xf))X\theta = \gamma'((Xf)(X\theta)) = f\theta\delta = (f\gamma)X\theta$, where the last equality uses the assumption that $\theta$ is an algebra morphism.
Hence, for $x \in S$, $(\gamma'(Xf))_x$, $(f\gamma)_x$ are two functions that agree when preceded by the surjective function $(X\theta)_x$, and so these are equal. □

We now introduce the central definitions of this section.

13 Definition. A sur-reflection of $(Q, \delta)$ in $\mathbf{B}$ is a reflection $\theta: (Q, \delta) \to (B, \gamma)$ of $(Q, \delta)$ in $\mathbf{B}$ such that $\theta_x$ is surjective for all $x \in S$. Generalizing Definitions 3 and 4 where $X = X_\Omega$, the product $(Q, \delta)$ of a family $((Q_i, \delta_i) \mid i \in I)$ of $X$-algebras is defined by $Q = \prod Q_i$ (the product in $\mathbf{Set}^S$ as in 1) and $\mathrm{pr}_i\,\delta = \delta_i\,(X\mathrm{pr}_i)$ ($i \in I$). Furthermore, if $(Q, \delta)$ is an $X$-algebra and $R_x \subset Q_x$ for all $x \in S$ then $R = (R_x)$ is a subalgebra of $(Q, \delta)$ if there exists (necessarily unique) $\delta_0: XR \to R$ with $\mathrm{inc}\,\delta_0 = \delta\,X(\mathrm{inc})$, where $\mathrm{inc}_x(r) = r$ are the inclusion maps. $\mathbf{B}$ is a quasivariety if the product of any family of algebras in $\mathbf{B}$ is again in $\mathbf{B}$ and if every subalgebra of an algebra in $\mathbf{B}$ is again in $\mathbf{B}$, that is, "$\mathbf{B}$ is closed under products and subalgebras." The empty product is included, that is, we require that the terminal algebra $X1 \to 1$ (where $1_x$ has one element for all $x \in S$) is in $\mathbf{B}$. It follows from 5 that the class of all $\Omega$-algebras satisfying a set of equations is a quasivariety. The main result relating quasivarieties and sur-reflections is the following:

14 Theorem. $\mathbf{B}$ is a quasivariety if and only if every $X$-algebra has a sur-reflection in $\mathbf{B}$, and every $X$-algebra isomorphic to an algebra in $\mathbf{B}$ is in $\mathbf{B}$.

Before proving (half of) this, we note the following:

15 Corollary. Every quasivariety has an initial object.

PROOF. This is immediate from Proposition 9 and Theorem 14. □

As Corollary 15 is our main focus we shall only prove the needed half of Theorem 14. The converse, that sur-reflections imply quasivariety, will be
left for Exercise 18. We need a few preliminary results about products, subalgebras, and image factorizations.

16 Lemma. The product algebra is the product in the category of $X$-algebras.

PROOF. In the notation following 13, let $f_i: (R, \gamma) \to (Q_i, \delta_i)$ be morphisms and let $f: R \to Q$ be the unique morphism in $\mathbf{Set}^S$ with $\mathrm{pr}_i f = f_i$. Then in the diagram with top row $XR \xrightarrow{Xf} XQ \xrightarrow{X\mathrm{pr}_i} XQ_i$ (composite $Xf_i$), bottom row $R \xrightarrow{f} Q \xrightarrow{\mathrm{pr}_i} Q_i$ (composite $f_i$), and vertical maps $\gamma$, $\delta$, $\delta_i$, with the left-hand square marked ?, we must prove that (?) commutes given that the three indicated subdiagrams and the outer rectangles commute. But for each $i$ we have $\mathrm{pr}_i(\delta\,Xf) = (\mathrm{pr}_i\,\delta)Xf = \delta_i\,X\mathrm{pr}_i\,Xf = \delta_i\,Xf_i = f_i\gamma = \mathrm{pr}_i(f\gamma)$. By the uniqueness property of products, $\delta\,Xf = f\gamma$. □

17 Definition. Let $f: A \to B$ be a total function. The image factorization of $f$ is $(I, p, \mathrm{inc})$ where $A \xrightarrow{p} I \xrightarrow{\mathrm{inc}} B$ with $I = \{b \in B : b = fa \text{ for some } a \in A\}$, inc is the inclusion map $\mathrm{inc}(c) = c$, and $p(a) = f(a)$. Note that $f = \mathrm{inc}\,p$ with $p$ surjective and inc injective. Uniqueness properties and categorical generalizations are explored in the exercises. The definition extends in the obvious way to $\mathbf{Set}^S$. If $f: A \to B$ is a morphism in $\mathbf{Set}^S$ the image factorization of $f$ is $(I, p, \mathrm{inc})$ where, for $x \in S$, $(I_x, p_x, \mathrm{inc}_x)$ is the image factorization of $f_x$.

Image factorizations provide subalgebras:

18 Proposition. If $f: (Q, \delta) \to (R, \gamma)$ is an $X$-algebra morphism and if $(I, p, \mathrm{inc})$ is the image factorization of $f$ in $\mathbf{Set}^S$ then $I$ is a subalgebra $(I, \beta)$ of $(R, \gamma)$ and $p: (Q, \delta) \to (I, \beta)$ and $\mathrm{inc}: (I, \beta) \to (R, \gamma)$ are algebra morphisms.

PROOF. To see this it is useful to establish the diagonal fill-in property: Given a commutative square $us = it$ of functions as in 19
19   (the square $us = it$, with diagonal $\beta$ from the codomain of $s$ to the domain of $i$)

with $s$ surjective and $i$ injective there exists a unique $\beta$ with $\beta s = t$, $i\beta = u$. To construct a unique $\beta$ with $\beta s = t$ use Lemma 11: if $sa = sa'$ then $ita = usa = usa' = ita'$ and, as $i$ is injective, $ta = ta'$. To see $i\beta = u$ observe that each $b \in B$ has the form $sa$ so that $ub = usa = ita = i\beta sa = i\beta b$. See Exercise 15. To apply the diagonal fill-in property, consider 20:

20   the diagram with top row $XQ \xrightarrow{Xp} XI \xrightarrow{X(\mathrm{inc})} XR$, bottom row $Q \xrightarrow{p} I \xrightarrow{\mathrm{inc}} R$, and vertical maps $\delta$, $\beta$ (to be constructed), $\gamma$.

Then for each $x \in S$, $(Xp)_x$ is surjective; this was established in the proof of Lemma 12. Thus, 19 applies with $s = (Xp)_x$, $i = \mathrm{inc}_x$ to produce a unique $\beta$ in $\mathbf{Set}^S$ with 20 commutative. In particular, $I$ is a subalgebra and $p: (Q, \delta) \to (I, \beta)$, $\mathrm{inc}: (I, \beta) \to (R, \gamma)$ are algebra morphisms. □

We are finally ready to prove the desired half of Theorem 14:

21 Theorem. If $\mathbf{B}$ is a quasivariety, every $X$-algebra has a sur-reflection in $\mathbf{B}$.

PROOF. Let $(Q, \delta)$ be an $X$-algebra. Fix an algebra morphism $f_0: (Q, \delta) \to (B_0, \gamma_0)$ with $(B_0, \gamma_0)$ in $\mathbf{B}$; such always exists since we can choose $f_0$ the unique morphism to the terminal algebra. For each $x \in S$, $q, r \in Q_x$ we define an algebra morphism

22   $f_{xqr}: (Q, \delta) \to (B_{xqr}, \gamma_{xqr})$,

as follows. If 22 exists with $f_{xqr}(q) \neq f_{xqr}(r)$ choose one. Else for all $f: (Q, \delta) \to (B, \gamma)$ with $(B, \gamma)$ in $\mathbf{B}$, $f(q) = f(r)$; in this case define 22 by $f_{xqr} = f_0$. Now let $(\bar{B}, \bar{\gamma})$ be the product of all the $(B_{xqr}, \gamma_{xqr})$. Then as $\mathbf{B}$ is a quasivariety, $(\bar{B}, \bar{\gamma})$ is in $\mathbf{B}$. By Lemma 16 there exists a unique algebra morphism $f$ as shown in 23:

23   $f: (Q, \delta) \to (\bar{B}, \bar{\gamma})$ with $\mathrm{pr}_{xqr} f = f_{xqr}$.

Let $(B, \theta, \mathrm{inc})$ be the image factorization of $f$. Then $B$ is a subalgebra $(B, \gamma)$ of $(\bar{B}, \bar{\gamma})$ and $\theta: (Q, \delta) \to (B, \gamma)$ is an algebra morphism by Proposition 18. As $\mathbf{B}$ is a quasivariety and $(\bar{B}, \bar{\gamma})$ is in $\mathbf{B}$, $(B, \gamma)$ is in $\mathbf{B}$. We shall show that $\theta$ is the desired sur-reflection.
To see that $\theta$ is a reflection, let $\theta': (Q, \delta) \to (B', \gamma')$ be an algebra morphism with $(B', \gamma')$ in $\mathbf{B}$. To construct $g$ as in 24, we rely on Lemma 11. Let $x \in S$, $q, r \in Q_x$

24   $\theta: (Q, \delta) \to (B, \gamma)$, $\quad \mathrm{inc}: (B, \gamma) \to (\bar{B}, \bar{\gamma})$, $\quad \theta': (Q, \delta) \to (B', \gamma')$, $\quad g: (B, \gamma) \to (B', \gamma')$, $\quad g\theta = \theta'$

and suppose $\theta_x q = \theta_x r$. Then

$f_{xqr}(q) = (\mathrm{pr}_{xqr} f)q$   (by 23)
$\quad = (\mathrm{pr}_{xqr}\,\mathrm{inc}_x\,\theta_x)q$   (as $f = \mathrm{inc}\,\theta$)
$\quad = (\mathrm{pr}_{xqr}\,\mathrm{inc}_x\,\theta_x)r$   (as $\theta_x q = \theta_x r$)
$\quad = f_{xqr}(r)$   (similarly).

So that, by the definition of the $f_{xqr}$, no morphism exists to an algebra in $\mathbf{B}$ distinguishing $q$ and $r$ and, in particular, in 24 $\theta' q = \theta' r$. Thus, unique $g$ in 24 exists and such $g$ is an algebra morphism by Lemma 12. □

As mentioned earlier, we have the following important result in the theory of equational specification:

25 Corollary. The data type specified by an equational specification as in 14.1.16 always exists.

PROOF. The full subcategory $\mathbf{B}$ of all $\Omega$-algebras (thought of as $X_\Omega$-algebras by 14.1.22-25) satisfying the set of equations $E$ is a quasivariety by 5. Hence, $\mathbf{B}$ has an initial object by Theorem 21 and its Corollary 15. □

EXERCISES FOR SECTION 14.2

1. In the context of Exercise 14.1.9, show that Observation 1 holds in $\mathbf{C}^S$ providing it holds in $\mathbf{C}$.

2. What difficulties, if any, arise in generalizing Proposition 2 to $\mathbf{C}^S$, if (i) $\mathbf{C} = \mathbf{Dom}$? (ii) $\mathbf{C} = \mathbf{Dom}_c$? (iii) $\mathbf{C} = \mathbf{Dom}_{adj}$?

3. Show that the product algebra of 3 is indeed the product in the category of algebras (in either the $\Omega$- or $X_\Omega$-sense; it does not matter which because of Exercises 14.1.8 and 11.3.2).

4. Show that vector spaces constitute the full subcategory of algebras of a single-sorted signature with nullary 0, a unary $r$ for each real number $r$ (for the operations of scalar multiplication, $q \mapsto rq$), and binary + which satisfy a finite set of equations (the usual ones). Show that the concept of a "vector subspace" coincides with that of a subalgebra as in 4. What is the initial vector space?
5. Let $(Q, \delta)$ be the data type specified by $(S, \Omega, E)$ as in 14.1.16. Prove that the only subalgebra $A$ of $(Q, \delta)$ is $A = Q$. [Hint: Use the initial algebra property to prove that $\mathrm{inc}: A \to Q$ is surjective. Where is Observation 5(ii) used?]

Two equational specifications $(S, \Omega, E)$, $(S, \Omega', E')$ on the same sort set are equivalent if for each $Q$ in $\mathbf{Set}^S$ there is a bijection $\delta \mapsto \delta'$ between $\Omega$-algebras $(Q, \delta)$ satisfying $E$ and $\Omega'$-algebras $(Q, \delta')$ satisfying $E'$ such that for $f: Q \to R$ in $\mathbf{Set}^S$, $f$ is an $\Omega$-algebra morphism $(Q, \delta) \to (R, \gamma)$ if and only if $f$ is an $\Omega'$-algebra morphism $(Q, \delta') \to (R, \gamma')$.

6. Show that the two categories of algebras for equivalent specifications are isomorphic categories in the sense of the exercises to Section 11.3.

7. Show that the equational specification for Boolean algebras of Example 14.1.1 is equivalent to the specification with constants $T$, $F$, unary $\neg$ and binary $\wedge$, with equations $\neg F = T$, $\neg T = F$, $T \wedge T = T$, $T \wedge F = F \wedge T = F \wedge F = F$.

8. The concept of a group (Exercise 2.2.11) may be captured by the single-sorted specification with operations $e: 1 \to G$, $(\;)^{-1}: G \to G$, $\cdot: G^2 \to G$ with equations $a \cdot (b \cdot c) = (a \cdot b) \cdot c$, $a \cdot e = a = e \cdot a$, $a \cdot a^{-1} = e = a^{-1} \cdot a$. Show that this specification is equivalent to the single binary operation $\div: G^2 \to G$ subject to the single equation $a \div ((((a \div a) \div b) \div c) \div (((a \div a) \div a) \div c)) = b$. [Hint: The bijection $\delta \leftrightarrow \delta'$ is given by $a \div b = a \cdot b^{-1}$; $e = a \div a$ (you must prove independence of choice of $a$), $a^{-1} = e \div a$, $a \cdot b = a \div b^{-1}$.] This example shows that it may be far from obvious that two specifications are equivalent.

9. Let $\mathbf{C}$ be the category of join-semilattices (3.3.3) whose morphisms are defined by the condition $f(x \vee y) = f(x) \vee f(y)$. Let $\mathbf{B}$ be the class of all single-sorted
algebras with signature consisting of a single binary operation $\vee$, subject to the equations $q \vee (r \vee s) = (q \vee r) \vee s$, $q \vee r = r \vee q$, $q \vee q = q$. Show that $\mathbf{B}$, $\mathbf{C}$ are isomorphic categories. [Hint: $(Q, \leq) \mapsto (Q, \delta)$ via the usual $q \vee r$ whereas $(Q, \delta) \mapsto (Q, \leq)$ if $q \leq r$ means $q \vee r = r$.] This shows that a nonequational structure (e.g., a semilattice qua poset) may be reexpressible in equational form.

10. Prove that any coproduct of functors of the form $X_\Omega$ also has this form. Conclude that $X: \mathbf{Set} \to \mathbf{Set}$, $XQ = A + Q^*$ as in Exercise 11.2.2 has the form $X_\Omega$. Find a suitable $\Omega$ explicitly.

11. Prove that reflections, as defined in 7, are unique up to isomorphism. [Hint: A reflection is just an initial object in a suitable category. If $(B', \gamma', \theta')$ is another reflection, $f$ in 8 will be an isomorphism.]

12. Let $(S, \Omega)$ be a signature and let $E$ be the set of all equations $v_{x1} = v_{x2}$ ($x \in S$). Let $\mathbf{B}$ be the full subcategory of all $\Omega$-algebras satisfying $E$. Give a direct construction of the sur-reflection of each $\Omega$-algebra in $\mathbf{B}$ (bypassing the existence proof of this section). [Hint: If $(B, \gamma)$ is in $\mathbf{B}$, $B_x$ has at most one element.]

13. Let $X: \mathbf{Set} \to \mathbf{Set}$ be the polynomial functor $XQ = 1 + A \times Q$ with $A$ finite. This has the form $X_\Omega$ for the single-sorted signature with $S = \{x\}$, $\Omega_x = 1$, $\Omega_{xx} = A$. The initial $\Omega$-algebra is then the least fixed point $A^*$ as in 12.2.9. A subset of $A$ may be viewed as a list in $A$ for which order and repetition are not important. This may be expressed by the set $E$ of equations $a(b(v)) = b(a(v))$, $a(a(v)) = a(v)$, where $a, b \in \Omega_{xx}$ and $v = v_{x1}$ is a variable. Show that the data type specified by $(S, \Omega, E)$ is $2^A$. [Hint: Directly construct a sur-reflection $\theta: A^* \to 2^A$ where $\theta(w)$ = set of symbols occurring in $w$.]

14. Let $(R, \gamma)$ be a subalgebra of $(Q, \delta)$ with inclusions $\mathrm{inc}: R \to Q$. Let $(U, \alpha)$ be an algebra and let $f: U \to R$ be functions such that $\mathrm{inc}\,f: (U, \alpha) \to (Q, \delta)$ is an algebra morphism. Prove that $f: (U, \alpha) \to (R, \gamma)$ is an algebra morphism.

15.
Let $\theta: Q \to B$ be a function. Prove that $\theta$ is surjective if and only if for arbitrary $\theta': Q \to B'$ there exists at most one $f: B \to B'$ with $f\theta = \theta'$.
Let $i: B \to R$ be a function. Prove that $i$ is injective if and only if for arbitrary $i': B' \to R$ there exists at most one $g: B' \to B$ with $ig = i'$.

16. In an arbitrary category, a morphism $\theta$ is an epimorphism if, as in Exercise 15, for all $\theta'$ there is at most one $f$ with $f\theta = \theta'$. Dually, $i$ is a monomorphism if, as in Exercise 15, for all $i'$ there exists at most one $g$ with $ig = i'$. Prove the following: (i) If $f: A \to B$, $g: B \to C$ are epimorphisms, so is $gf$. (ii) If $f: A \to B$, $g: B \to C$ then if $gf$ is an epimorphism, so is $g$. (iii) $\mathrm{id}_A: A \to A$ is an epimorphism. State (and hence there is no need to prove) the dual result for monomorphisms.

17. Consider the single-sorted signature of Exercise 10 so that $X = X_\Omega: \mathbf{Set} \to \mathbf{Set}$, $XQ = A + Q^*$. Let $\mathbf{B}$ be the full subcategory of all $X$-algebras $(Q, \delta)$ for which there exists a monoid structure on $Q$ for which the restriction of $\delta$ to $Q^*$ is the map $\Lambda \mapsto$ monoid unit, $q_1 \cdots q_n \mapsto$ monoid product $q_1 \cdots q_n$. Prove that $\mathbf{B}$ is a quasivariety and that its initial object is $(A^*, \mu)$ where $\mu: A + (A^*)^* \to A^*$ maps elements of $A$ to length-1 words and words of words to single long words by merging, for example, $(aba)(bc)$ of length 2 in $(A^*)^*$ to $ababc$ of length 5 in $A^*$.

18. Prove the converse part of Theorem 21 by showing that if every object has a sur-reflection in $\mathbf{B}$ then the reflection maps $\theta$ of a product of $\mathbf{B}$-objects or of a subalgebra of a $\mathbf{B}$-object are isomorphisms.

19. Let $\mathbf{B}$ be a full subcategory of $X$-algebras. Say that $\mathbf{B}$ is closed under images if whenever $f: (B, \gamma) \to (Q, \delta)$ is an algebra morphism with each $f_x$ surjective then if $(B, \gamma)$ is in $\mathbf{B}$, also $(Q, \delta)$ is in $\mathbf{B}$. Show that if $X = X_\Omega$ and $\mathbf{B}$ is the set of all $X$-algebras satisfying a set of equations then $\mathbf{B}$ is closed under images.

Notes and References for Chapter 14

Initial algebras with data types in mind are due to J. A. Goguen, J. W. Thatcher, E. G. Wagner, and J. B.
Wright, "Initial algebra semantics and continuous algebras," Journal of the Association for Computing Machinery, 24, 1977, pp. 68-95. This paper is worth the attention of the reader who has made it this far in the book. Many-sorted algebras were studied earlier by universal algebraists. For much more detail and further topics and references in the area of equational specification, see H. Ehrig and B. Mahr, Fundamentals of Algebraic Specification 1, Springer-Verlag, 1985. This book discusses, and provides references for, the difficulties mentioned at the end of Section 1. See pages 305-306 for references to languages such as CLEAR, ACT ONE, and CLU as mentioned in the chapter introduction. Theorem 14.2.14 may be generalized to full subcategories of $X$-algebras for any
category $\mathbf{C}$ with products and "image factorization system" $(E, M)$ as long as $X$ maps morphisms in $E$ into $E$. The necessary definitions and results may be found in the authors' text cited in the notes to Chapter 2. The interested reader may wish to pursue this and explore its applicability to $\mathbf{Dom}^S$, $(\mathbf{Dom}_c)^S$, and $(\mathbf{Dom}_{adj})^S$.

The term "quasivariety" is used because a more central concept in universal algebra (the study of operations and equations; see G. Grätzer, Universal Algebra, Van Nostrand, 1968) is that of a variety, which is a quasivariety that is additionally closed under homomorphic images as defined in Exercise 14.2.19. This exercise establishes that if $\mathbf{B}$ is a full subcategory of $X_\Omega$-algebras then if $\mathbf{B}$ is the class satisfying a set of equations, $\mathbf{B}$ is a variety. The converse is true by a celebrated theorem of Garrett Birkhoff (proved in 1935 in the single-sorted case). For a discussion of the many-sorted case see G. Birkhoff and J. D. Lipson, "Heterogeneous algebras," Journal of Combinatorial Theory, 8, 1970, pp. 115-133, and Chapter 4 of the Ehrig-Mahr book cited above. A variety need not be describable by only finitely many equations, however, and the proof that the set of equations exists must be regarded as nonconstructive in general. Exercise 14.1.17 illustrates that the needed set of equations may not be obvious.

The term "reflection" was introduced by Freyd in the exercises to the book cited in Chapter 2. All the texts on category theory cited there prove Freyd's "General adjoint functor theorem." Readers familiar with this result could obtain a much quicker proof of Theorem 14.2.21.
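The sur-reflection of Theorem 14.2.21 can be made concrete on the small case of Exercise 14.2.13. The following Python sketch is illustrative only and is not taken from the text: the two-letter alphabet, the word-length bound, and every function name are assumptions introduced here, and a finite fragment of $A^*$ stands in for the whole initial algebra. It checks that $\theta(w)$ = set of symbols occurring in $w$ is a surjective morphism from $A^*$ onto $2^A$, and that $2^A$ satisfies the two equations of the exercise.

```python
from itertools import combinations, product

A = ("a", "b")  # hypothetical finite alphabet (an assumption for illustration)


def words(max_len):
    """Yield every word over A of length <= max_len, a finite fragment of A*."""
    for n in range(max_len + 1):
        for letters in product(A, repeat=n):
            yield "".join(letters)


def op_word(a, w):
    """The unary operation a(-) on the initial algebra A*: prepend a."""
    return a + w


def op_set(a, s):
    """The corresponding operation a(-) on 2^A: insert a into the subset."""
    return s | {a}


def theta(w):
    """Candidate sur-reflection A* -> 2^A: the set of symbols occurring in w."""
    return frozenset(w)


def subsets():
    """All subsets of A, that is, the carrier 2^A."""
    return {frozenset(c) for n in range(len(A) + 1) for c in combinations(A, n)}


def is_morphism(max_len):
    """theta preserves the constant (empty word) and each operation a(-)."""
    return theta("") == frozenset() and all(
        theta(op_word(a, w)) == op_set(a, theta(w))
        for a in A
        for w in words(max_len)
    )


def is_surjective(max_len):
    """Every subset of A is theta(w) for some word w in the fragment."""
    return {theta(w) for w in words(max_len)} == subsets()


def satisfies_equations():
    """2^A satisfies a(b(v)) = b(a(v)) and a(a(v)) = a(v)."""
    return all(
        op_set(a, op_set(b, s)) == op_set(b, op_set(a, s))
        and op_set(a, op_set(a, s)) == op_set(a, s)
        for a in A
        for b in A
        for s in subsets()
    )
```

With a length bound at least the size of the alphabet, `is_morphism`, `is_surjective`, and `satisfies_equations` all succeed. The sketch only tests the morphism, surjectivity, and equation conditions; the universal property of a reflection (unique factorization through $\theta$, as in Lemma 11) is not checked here.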
Epilogue

A number of major mathematical structures were introduced in this book in the context of their application to program semantics. The reader interested in further pursuit will discover that a voluminous body of theory and open problems exists for each structure. The most important concepts discussed were:

1. Partially Ordered Sets (Posets). These are sets equipped with an order relation $\leq$ subject to axioms. In general, there may be $x$, $y$ such that neither $x \leq y$ nor $y \leq x$ holds. Domains are posets in which there is a least element and each ascending chain has a least upper bound. The Kleene fixed point theorem asserts that each function $f$ from a domain to itself which is continuous, that is, preserves least upper bounds of ascending chains, has a least fixed point $x$, that is, $fx = x$ and if $fy = y$ then $x \leq y$. Boolean algebras are special posets in which the standard Boolean operations such as or, and, not, and if-then-else are defined.

2. Categories. These have objects and morphisms. Mathematically (but not conceptually from the point of view of this book), categories are generalizations of posets. Least upper bounds and greatest lower bounds generalize to coproducts and products. Similarly, least upper bounds of ascending chains generalize to colimits of right chains with limits of left chains as the dual concept. Isomorphic objects in a category are "abstractly the same." An initial object admits a unique morphism to each object and a terminal object admits a unique morphism from every object. Zero morphisms generalize the totally undefined partial function. In a Cartesian-closed category, "lambda conversion" of morphisms is possible.

3. Metric Spaces. These are sets equipped with a distance function $d(x, y)$ subject to axioms. In a complete metric space each sequence $x_1, x_2, x_3, \ldots$
whose elements approach each other (i.e., $d(x_m, x_n) \to 0$) converges to a limit $x$ (i.e., $d(x, x_n) \to 0$). The Banach fixed point theorem states that if $f$ is a function from a complete metric space to itself which is a contraction in that there exists $K < 1$ with $d(fx, fy) \leq K\,d(x, y)$ for all $x$, $y$, then $f$ has a unique fixed point $x = fx$.

4. Functors. Roughly speaking, these are structure-preserving functions between categories. The generalized Kleene fixed point theorem establishes that a functor from an appropriate category to itself which is co-continuous by virtue of preserving colimits of right chains has a least fixed point. The dual theorem provides greatest fixed points for continuous functors. Bicontinuous functors are both continuous and cocontinuous, and polynomial functors, which are built up from constant functors and the identity functor by using product and coproduct, are bicontinuous on appropriate categories.

Additionally, we discussed the partially additive monoids introduced by the authors in 1980. These are sets equipped with a sum operation that applies to some but not all countable families, subject to axioms. In a partially additive category the set of morphisms between two objects is a partially additive monoid (and there are other axioms as well).

The fundamental problem addressed in the first two parts of the book was how to give the overall semantics of a recursive or iterative program given the semantics of the pieces. We achieved this by considering "semantic categories" $\mathbf{C}$ with enough structure so that the infinitary process of repeated call found suitable expression, important examples being $\mathbf{Pfn}$ (sets and partial functions) and $\mathbf{Mfn}$ (sets with multivalued functions). We emphasized two approaches to determine the desired element $f$ of the set $\mathbf{C}(X, Y)$ of $\mathbf{C}$-morphisms from $X$ to $Y$ as the semantics of a recursive specification.
In the first, $f$ is the least fixed point of an appropriate continuous function $\psi$ mapping $\mathbf{C}(X, Y)$ to itself, as constructed by the Kleene fixed point theorem. Here, the semantic category must be designed so that $\mathbf{C}(X, Y)$ is a domain. In the second, $f$ is the pattern-of-calls expansion which arises when $\mathbf{C}$ is a partially additive category as a pertinent fixed point of a requisite power-series map $\psi$ from $\mathbf{C}(X, Y)$ to itself. In the most important cases, both approaches are applicable, and they yield the same $f$, but in "orthogonal" ways: the least fixed point is, roughly speaking, the limit of "up to $n$ calls" as $n$ increases whereas the pattern-of-calls expansion is the sum of each of the possible (perhaps infinite) computation paths in the tree-of-call. At the level of abstract trees, the infinite tree-of-call just mentioned arises as the unique fixed point of a contraction on a suitable complete metric space of trees, as constructed by the Banach fixed point theorem. All three fixed points are examples of a canonical fixed point. This definition formalizes the idea that, in each case, the fixed point assigned to each member of a class of functions is constructed "the same way in all cases." Because the definition uses only morphisms (rather than specific structure on the objects
such as domain, partially additive, metric) it offers the mathematician more leeway in seeking useful semantic categories.

In the third part of the text we considered the problem of data type specification. In our opinion, the concepts we are trying to formalize are not as crystallized at this time in the computer science community as those involved in program flow and recursion, so that we have resisted any attempt at a definitive treatment. We have not, for example, formally defined "data type." Our philosophy, however, is that a data type is an object or finitely many objects in a semantic category $\mathbf{C}$ together with a specified finite collection of morphisms between them. We have taken the stand that a number of different tools can be employed to construct these objects and morphisms (and, implicitly, that such tools be considered for implementation in programming languages). The simplest data types such as arrays should be constructed directly (in this case using products in $\mathbf{C}$). Recursively defined data types such as lists may be defined as functorial fixed points using the generalized Kleene theorem or its dual. Many data types are usefully defined using equational specification. We also discussed the work of Dana Scott in constructing a domain isomorphism $D \cong A + [D \to D]$. While emphasizing that the specification of such data types is not likely to arise in practical semantics, the existence of such isomorphisms is mathematically important in establishing the consistency of mathematical frameworks in which functions can be passed as arguments to procedures.

The above mathematical structures and applications provide the reader with a varied set of algebraic techniques and frameworks useful in the study of the semantics of current and future programming languages. Progress in concurrency, parallel-processing, and many other areas will profoundly affect the design of programming languages.
Yet it is our hope that basic concepts of program and data type construction can be formalized in such new environments by choosing an appropriate category for denotational semantics and then applying theory, such as that of this text, which works for a broad class of categories.
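The Kleene fixed point construction summarized above, in which the least fixed point is the limit of "up to n calls," can be sketched concretely. The following Python fragment is a minimal illustration and is not taken from the text: partial functions on a finite set of natural numbers are modelled as dictionaries ordered by extension, and the functional `psi` for the factorial recursion, the `universe` bound, and all names are assumptions chosen for the example.

```python
def psi(f, universe):
    """One unfolding of the recursion fact(n) = 1 if n == 0 else n * fact(n - 1).

    f is a partial function given as a dict; the result is defined at n
    exactly when the recursive call it needs is already defined in f.
    """
    g = {}
    for n in universe:
        if n == 0:
            g[n] = 1
        elif n - 1 in f:
            g[n] = n * f[n - 1]
    return g


def kleene_sequence(universe):
    """Return the ascending chain bottom, psi(bottom), psi(psi(bottom)), ...

    The chain starts at the everywhere-undefined function {} and stops at
    the first fixed point, which is reached here because universe is finite.
    """
    chain = [{}]
    while True:
        nxt = psi(chain[-1], universe)
        if nxt == chain[-1]:
            return chain
        chain.append(nxt)


def extends(g, f):
    """True if g extends f in the extension ordering on partial functions."""
    return all(k in g and g[k] == v for k, v in f.items())


chain = kleene_sequence(range(6))
least_fixed_point = chain[-1]
```

In this sketch `chain[k]` is defined exactly on the arguments whose evaluation needs fewer than `k` recursive calls, each member of the chain extends the previous one, and the last member is unchanged by `psi`. The finite `universe` is what makes the chain stabilize after finitely many steps; on all of the natural numbers the least fixed point would only be the least upper bound of an infinite chain.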
Author Index A F Ackennann, W. 145 Adamek, J. 70,257 Alagic, S. 37, 114 Arbib, M.A. 37,70,96, 114, 145, 179, 209,231,257,317 Arden, D.N. 175 Floyd, R. 37, 39 Freyd, P. 70, 114,257,340 B Backus, J. 1,37 Barr, M. 257 Benson, D.B. 97 Birkhoff, G. 340 Bloom, S.L. 231 C Church, A. 306 G Gooei, K. 317 Goguen, J.A. 339 Goldblatt, R. 70, 292, 317 Gordon, M.J.C. 317 Gratzer, G. 340 Gries, D. 114 Guessarian, I. 175, 231 H Halmos, P.R. 96 Herrlich, H. 70 Hilbert, D. 145 Hoare, C.A.R. 37, 99 D Davis, M. 317 De Bakker, J.W. 37 De Roever, W.P. Jr. 37 Dijkstra, E.J. 37,99, 114 J E Ehrig, H. 339, 340 Eilenberg, S. 257 Elgot, C.C. 70, 97, 231, 257 K Karp, R.M. 37 Kfoury, A.J. 37, 317 Kleene, S.C. 145, 175 Jacobson, N. 37 Jensen, K. 37
346 Index Knaster, B. 175 Koubek, V. 257 Krause, E. 231 p Padulo, L. 231 Plotkin, G. 257,317 R Rogers, H. Jr. L Lagarias, J .C. 145 Lallement, G. 231 Lambek, J. 257 Lawvere, F.W. 70,317 Lehmann, D. 257, 292 Lipson, J.D. 340 M Mac Lane, S. 70, 257 Mahr, B. 339,340 Manes, E.G. 70, %, 97, 114, 145, 179, 209,257,317 Manna, Z. 145 McCarthy, J. 145 Meertens, L.G. 37 Milner, R. 317 Mitchell, B. 70, 114 Moll, R.N. 37,317 N Nivat, M. 231 145 S Schutzenberger, M. 231 Scott,D.S. 37, 145, 175,209,293,305, 317,343 Smyth, M.B. 257,278 Steenstrup, M.E. 96, 317 Stoy, J. 37,317 Strachey, C. 37,293,305,317 Strecker, G.E. 70 T Tarski, A. 175 Thatcher, J.W. 339 Tindell, R. 231 TrnkovR, V. 257 W Wagner, E.G. 317 Wand, M. 257 Wirth, N. 37 Wright, J.B. 339
Subject Index A abstraction map 302 abstract iterative program 159 abstract syntax 187 Ackermann function 121, 137, 145,224 Adamczyk, K. 145 additive domain 194 additive function 181 m- 181 strongly m- 205 adjoint 308 -Alg 246 algebra of endofunctor 246 over a signature 322 alternative construct 33, 93 ANMfn 40 APL 25,184 approximating sequence 273 approximation ordering 149,295 array 280 assertion 4 associativity 23, 72 automaton 174, 253 B Banach fixed point theorem bicontinuous functor 275 bi-index principle 280 Boolean algebra 88 215 C CIM ,.,,) 240 C IP ,") 45, 238 Cantor diagonal argument 244 CAR 265 cartesian product 57 case statement 32, 93 category 39 cartesian-closed 300 discrete 46 ordered partially additive 194 of PAR-schemes 186 partially additive 79 of recursion schemes 176 Cauchy sequence 214 CDR 265 center 96 chain ascending 149 descending 272 left 272 right 259 co- 50 -coAlg 246 coalgebra of endofunctor 246 co-continuous functor 267 separately 288 codomain 35 coequalizer 278 colimit of right chain 260 comparison map 247
348 Index complement 87 composition in a category 39 of functors 242 of multifunctions 22 of partial functions 14 concatenation 43, 248 concurrent programs 317 conditional (if-then-else) 33, 82, 93 generalized 82 context-free grammar 169 continuous function 154 continuous functor 274 contraction 215 coproduct 62 of functors 242 D determinant 183 deterministic morphism 114 diagonal fill-in 334 direct sum system 101 disjoint union 58, 282 distributive law 30,31,75,95 Dom 308 Dom"'j 308 Dom, 297 domain (of morphism) 39 domain (poset) 149 additive 194 coproduct 297, 311 fiat 150 function-space 296 power 317 product 151, 158, 297, 311 domain of definition 13, 95 DTN 12,288 DTN 12 dual category 50 dynamic tree 289 of numerals 12 E e (= abstract syntax) 187 If' (= pattern-of-calls expansion) element 282, 303 Elgot iteration equation 245 empty family 72 empty string 43 endofunctor 240 epimorphism 339 =.bb 17 189 equation 323 equational specification 323 equivalent s 336 evaluation morphism 300 extension ordering 42, 90, 95, 114, 125, 148, 149 F family notation 58, 72 fibonacci function 128, 139 fixed point 124, 153 canonical 177 of endofunctor 246 greatest 247 least 153, 247 FPF (= functional programming fragment) 12, 203, 291 multifunction semantics for 25 full subcategory 43 function additive 181 bijective 47 continuous 154 everywhere undefined 20,30, 121, 125 guard 32,89 inclusion 32, 89 injective 20 m-additive 181 multi- 21 partial 13 primitive recursive 128, 144 semantically equivalent s II, 20 separately continuous 168 strict 177 surjective 20 total 14 functional programming languages function space domain 296 object 300 functor 240 bicontinuous 275 category 244 co-continuous 267 constant 241 continuous 274 coproduct of s 242 identity 241 polynomial 243 product of s 241 separately co-continuous 288 functoriality axioms 240 FwR 46
G
generalized conditional 82
Gödel number 307
greatest element 86
greatest lower bound (see Infimum)
group 56, 337
Guard(X) 89
guarded command 32
guard transformer 113

H
Hasse diagram 42
Heyting algebra 301
homomorphism
  canonical - of PAR schemes 191
  of monoids 43, 240
  of PAR schemes 186

I
identity morphism 39
if-then-else (see Conditional) 95, 186, 207
image factorization 334
infimum
  arbitrary 147
  binary 86, 94
initial object 48
initial value problem 218
injection morphism 63
integral operator 219
interpretation of syntax tree 188
inverse
  in a category 47
  in a group 56
invertible 56
isomorphic 48
  functors 244
isomorphism 47
  of categories 277
iterate 83, 85
  functorial 245

J
join (see Supremum)
join-semilattice 86, 337

K
kernel 107
kernel-domain
  decomposition 104
  system 104
Kleene fixed point theorem 154
  generalized 270
Kleene semantics 125, 152, 198
Kleene sequence 125, 152

L
L(G) 169
lambda-abstraction 300
language theory notations 248
lattice 86
  distributive 88
Lawvere diagonal argument 304
least element 55, 86
least upper bound (see Supremum)
least upper bound axiom 230
left chain 272
limit
  of left chain 272
  in a metric space 213
line tying morphism 66
Lipschitz condition 219
LISP 12, 139, 265, 300
lower bound 94, 147, 272

M
many-sorted 323
matrix
  of languages 173
  over Pfn(X,X) 144, 159
meet (see Infimum)
meet-semilattice 86
metric 211
  discrete 217
  Manhattan 211, 231
  non-Archimedean 212
metric space 211
  complete 215
metric subspace 213
  closed 217
Mfn 40
Mfn(X,Y) 21
Mfn~ 46
modus ponens 301
Mon 43
monoid 43
monoid homomorphism 43
monomorphism 339
monotone map 43
morphism 39
  reliable 46
  total 52
N
naturally equivalent 244
natural transformation 244

O
ob(C) 39
observability map 256
(-)^op 50
opposite category 50

P
𝒫(-) 22
PAR scheme 183
parallel construction 66, 242
partial correctness 99, 200
  specification 99
partial function 13
  with reliability 46
partially additive
  category 79
  monoid 72
  ordered category 194
  recursive scheme 183
  semiring 95
  structure 75
partially ordered set (= poset) 41
partition 72
partition-associativity 72
Pascal 37
pattern of calls 131, 134
  expansion 134, 189, 198
polynomial 204, 243
  functor 243, 312
  map 183, 185, 204
poset (= partially ordered set) 41
  complete 147
  consistently complete 152, 199
  discretely-ordered 150
Poset 43
postcondition 99
power series 204
  map 183, 204
  recursive specification 205
  scheme 204
power set 22
precondition 99
primitive recursion 128
product
  of algebras 329, 333, 336
  of domains 151, 158
  of functors 241
  of partially additive monoids 206
projection function 59
projection morphism 60

Q
quasi projection 76
quasivariety 333
queue 283

R
reachability map 254
recursive specification
  on Pfn(X,Y) 125
  on a domain 152
reflection 330, 340
repetitive construct 35, 93

S
SC(X,Y) 26
semantically equivalent 11, 20
semantic category 26
semantics 3
  assertion 3, 4, 98
  denotational 3
  operational 3, 4, 69
  partially additive 26
Set 40
s-expression 265
signature 322
simple recursion 53, 252
single-sorted 323
sorts 322
stack 283, 321, 330
strict continuous map 297
subalgebra 329, 333
subcategory 44
sum
  of multifunctions 28
  of partial functions 29
summable 72
sum-ordering 90, 95, 194
supremum (= least upper bound)
  arbitrary 147
  ascending chain 260
  binary 86, 94
sur-reflection 333

T
terminal object 49
terms 323
test 32
total correctness 99, 201
total morphism 52
Tot(X,Y) 14
totalizer 56, 111
totally ordered set 42
trace semantics 249
tree 224
tree induction rule 200

U
unit interval 96
upper bound 94, 147, 259

V
variety 340
Vect 44

W
weakest liberal precondition (wlp(S,-)) 99
weakest precondition (wp(S,-)) 100
while-do 34, 84, 85, 93, 96
  generalized 83

Z
zero
  morphisms 51
  object 52
  in partially additive monoid 73