Author: Meyer Y.   Jaffard S.   Ryan R.D.  

Tags: mathematics   physics   signal processing   signals  

ISBN: 0-89871-448-6

Year: 2001

                    Wavelets
Tools for Science & Technology
Stephane Jaffard
Universite Paris XII
Institut Universitaire de France
Yves Meyer
Ecole Normale Superieure de Cachan
Academie des Sciences
Robert D. Ryan
Paris, France
siam
Society for Industrial and Applied Mathematics
Philadelphia

Copyright ©2001 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

Library of Congress Cataloging-in-Publication Data

Jaffard, Stephane, 1962-
Wavelets : tools for science & technology / Stephane Jaffard, Yves Meyer, Robert D. Ryan.
p. cm.
"This new book began as a one-chapter revision of Wavelets: algorithms & applications, (SIAM, 1993), which is based on lectures Yves Meyer delivered at the Spanish Institute in Madrid in February 1991" --Pref.
Includes bibliographical references and indexes.
ISBN 0-89871-448-6
1. Wavelets (Mathematics) I. Meyer, Yves. II. Ryan, Robert D. (Robert Dean), 1933- III. Title.
QA403.3 J34 2001
515'.2433-dc21
00-051607

siam is a registered trademark.
Contents

Preface to Revised Edition
Preface from the First Edition

Chapter 1. Signals and Wavelets
   1.1 What is a signal?
   1.2 The language and goals of signal and image processing
   1.3 Stationary signals, transient signals, and adaptive coding
   1.4 Grossmann-Morlet time-scale wavelets
   1.5 Time-frequency wavelets from Gabor to Malvar and Wilson
   1.6 Optimal algorithms in signal processing
   1.7 Optimal representation according to Marr
   1.8 Terminology
   1.9 Reader's guide

Chapter 2. Wavelets from a Historical Perspective
   2.1 Introduction
   2.2 From Fourier (1807) to Haar (1909), frequency analysis becomes scale analysis
   2.3 New directions of the 1930s: Paul Levy and Brownian motion
   2.4 New directions of the 1930s: Littlewood and Paley
   2.5 New directions of the 1930s: The Franklin system
   2.6 New directions of the 1930s: The wavelets of Lusin
   2.7 Atomic decompositions from 1960 to 1980
   2.8 Stromberg's wavelets
   2.9 A first synthesis: Wavelet analysis
   2.10 The advent of signal processing
   2.11 Conclusions

Chapter 3. Quadrature Mirror Filters
   3.1 Introduction
   3.2 Subband coding: The case of ideal filters
   3.3 Quadrature mirror filters
   3.4 Trend and fluctuation
   3.5 The time-scale algorithm of Mallat and the time-frequency algorithm of Galand
   3.6 Trends and fluctuations with orthonormal wavelet bases
   3.7 Convergence to wavelets
   3.8 The wavelets of Daubechies
   3.9 Conclusions

Chapter 4. Pyramid Algorithms for Numerical Image Processing
   4.1 Introduction
   4.2 The pyramid algorithms of Burt and Adelson
   4.3 Examples of pyramid algorithms
   4.4 Pyramid algorithms and image compression
   4.5 Pyramid algorithms and multiresolution analysis
   4.6 The orthogonal pyramids and wavelets
   4.7 Biorthogonal wavelets

Chapter 5. Time-Frequency Analysis for Signal Processing
   5.1 Introduction
   5.2 The collections Q of time-frequency atoms
   5.3 Mallat's matching pursuit algorithm
   5.4 Best-basis search
   5.5 The Wigner-Ville transform
   5.6 Properties of the Wigner-Ville transform
   5.7 The Wigner-Ville transform and pseudodifferential calculus
   5.8 Return to the definition of time-frequency atoms
   5.9 The Wigner-Ville transform and instantaneous frequency
   5.10 The Wigner-Ville transform of asymptotic signals
   5.11 Instantaneous frequency and the matching pursuit algorithm
   5.12 Matching pursuit and the Wigner-Ville transform
   5.13 Several spectral lines
   5.14 Conclusions
   5.15 Historical remarks

Chapter 6. Time-Frequency Algorithms Using Malvar-Wilson Wavelets
   6.1 Introduction
   6.2 Malvar-Wilson wavelets: A historical perspective
   6.3 Windows with variable lengths
   6.4 Malvar-Wilson wavelets and time-scale wavelets
   6.5 Adaptive segmentation and the split-and-merge algorithm
   6.6 The entropy of a vector with respect to an orthonormal basis
   6.7 The algorithm for finding the optimal Malvar-Wilson basis
   6.8 An example where this algorithm works
   6.9 The discrete case
   6.10 Modulated Malvar-Wilson bases
   6.11 Examples
   6.12 Conclusions

Chapter 7. Time-Frequency Analysis and Wavelet Packets
   7.1 Heuristic considerations
   7.2 The definition of basic wavelet packets
   7.3 General wavelet packets
   7.4 Splitting algorithms
   7.5 Conclusions

Chapter 8. Computer Vision and Human Vision
   8.1 Marr's program
   8.2 The theory of zero-crossings
   8.3 A counterexample to Marr's conjecture
   8.4 Mallat's conjecture
   8.5 The two-dimensional version of Mallat's algorithm
   8.6 Conclusions

Chapter 9. Wavelets and Turbulence
   9.1 Introduction
   9.2 The statistical theory of turbulence and Fourier analysis
   9.3 Multifractal probability measures and turbulent flows
   9.4 Multifractal modeling of the velocity field
   9.5 Coherent structures
   9.6 Couder's experiments
   9.7 Marie Farge's numerical experiments
   9.8 Modeling and detecting chirps in turbulent flows
   9.9 Wavelets, paraproducts, and Navier-Stokes equations
   9.10 Hausdorff measure and dimension

Chapter 10. Wavelets and Multifractal Functions
   10.1 Introduction
   10.2 The Weierstrass function
   10.3 Regular points in an irregular background
   10.4 The Riemann function
      10.4.1 Holder regularity at irrationals
      10.4.2 Riemann's function near x0 = 1
   10.5 Conclusions and comments

Chapter 11. Data Compression and Restoration of Noisy Images
   11.1 Introduction
   11.2 Nonlinear approximation and sparse wavelet expansions
   11.3 Denoising
   11.4 Modeling images
   11.5 Ridgelets
   11.6 Conclusions

Chapter 12. Wavelets and Astronomy
   12.1 The Hubble Space Telescope and deconvolving its images
      12.1.1 The model
      12.1.2 Discovering and fixing the problem
      12.1.3 IDEA
   12.2 Data compression
      12.2.1 hcompress
      12.2.2 Smooth restoration
      12.2.3 Comments
   12.3 The hierarchical organization of the universe
      12.3.1 A fractal universe
   12.4 Conclusions

Appendix A. Filter Fundamentals
   A.1 The ℓ²(Z) theory and definitions
   A.2 The general two-channel filter bank

Appendix B. Wavelet Transforms
   B.1 The L² theory
   B.2 Inversion formulas
      B.2.1 L² inversion
      B.2.2 Inversion with the Lusin wavelet
   B.3 Generalizations

Appendix C. A Counterexample
   C.1 Introduction
   C.2 The function θ
   C.3 Representations of f0 * θp and its derivatives
   C.4 Hunting the zeros of (f0 * θp)'
   C.5 The functions R, R * θp, (R * θp)', and (R * θp)''
   C.6 (R * θp)'' and (R * θp)' vanish at the zeros of (f0 * θp)'
   C.7 The behavior of (R * θp)''/(f0 * θp)''
   C.8 Remarks
   C.9 A case of perfect reconstruction

Appendix D. Holder Spaces and Besov Spaces
   D.1 Holder spaces
   D.2 Besov spaces
   D.3 Examples

Bibliography
Author Index
Subject Index
Preface to Revised Edition

Wavelet analysis is a branch of applied mathematics that has produced a collection of tools designed to process certain signals and images. This new book is devoted to describing some of these tools, their applications, and their history. We will trace several of the technical roots of wavelet analysis, going back to the 1930s and before. These are examples of where the mathematical techniques that we now codify as wavelet analysis first appeared. They are for the most part concerned with the internal structure of mathematics itself. We judge that the applied point of view began after World War II and was embedded in a more general philosophical context exemplified by an ambitious program called The Institute for the Unity of Science. This “institute without walls” was a vision, a vision that was shared by such prominent scientists as John von Neumann, Claude Shannon, and Norbert Wiener. It was the time when Claude Shannon discovered the laws that govern the coding and transmission of signals and images. It was the time when Norbert Wiener and John von Neumann unveiled the relationships between mathematical logic, electronics, and neurophysiology. This led to the design of the first computers. It was the time when Dennis Gabor proposed that speech signals should be decomposed into a series of time-frequency atoms he named “logons.” It was the time when Eugene P. Wigner and Leon Brillouin introduced the time-frequency plane. These pioneering scientists opened new avenues in science, and one of these avenues is called time-frequency analysis. Time-frequency analysis, which is based on Gabor wavelets, will be one of the main topics of this book. Gabor wavelets were improved by Kenneth Wilson, Henrique Malvar, and finally by Ingrid Daubechies, Stephane Jaffard, and Jean-Lin Journe. In contrast with this established line of research, time-scale analysis has had a harder time.
Indeed, time-frequency analysis yields the musical score, the notes with their frequencies and durations, of the music we hear. Time-scale analysis focuses on the transients, the attack of the trumpet, which lasts a few milliseconds, and similar nonstationary signals. While time-frequency analysis was born in the 1940s, time-scale analysis emerged in the late 1970s in completely distinct areas such as image processing (E. H. Adelson and P. J. Burt), neurophysiology (David Marr), quantum field theory (Roland Seneor, Jacques Magnen, Guy Battle, Paul Federbush, James Glimm, and Arthur Jaffe), and geophysics (Jean Morlet). The outstanding collaboration between Alex Grossmann and Jean Morlet gave birth to a new vision that emerged in the 1980s, and the message was the following: While stationary or quasi-stationary signals are adequately decomposed into a series of time-frequency atoms or Gabor-like wavelets, signals with strong transients are
better analyzed with the time-scale wavelets developed by Grossmann and Morlet. A spectacular example where time-frequency analysis and time-scale analysis have been able to compete is the new JPEG-2000 compression standard for still images. This new standard is based on time-scale wavelets. The old JPEG standard was based on an algorithm called the discrete cosine transform, which is a kind of windowed Fourier transform. This algorithm belongs to the time-frequency group. (Here one ought to say “space-frequency,” since an image is a two-dimensional signal.) In the case of JPEG-2000, and in similar compression problems, time-scale wavelets have been preferred over time-frequency wavelets. This success story was not available when the original book first appeared. This new book began as a one-chapter revision of Wavelets: Algorithms & Applications (SIAM, 1993), which is based on lectures Yves Meyer delivered at the Spanish Institute in Madrid in February 1991. While Yves Meyer and Robert Ryan were working on the translation and revision of the new chapter, which ultimately became Chapter 11 of the current book, it became clear, based on the many developments in both the theory and applications since 1993, that an extensive revision of the original book was needed. Since Stephane Jaffard already had suggested a number of changes and additions, particularly in the sections involving the analysis of multifractal functions, where he is a recognized expert, he was invited to join the project. The result of our collaboration is an almost completely new book, and thus we have given it a new title. Although we have retained the core of the first four chapters, many parts of these chapters have been rewritten and expanded, particularly Chapters 1 and 2. Appendix A has been added as an introduction to some basic filter concepts and hence as a complement to Chapter 3.
Chapter 5 has been completely rewritten; it contains new material on chirps that was not known when the first edition was published. Chapters 6 and 7 have been slightly expanded, but they generally follow the original texts. Rather than expanding Chapter 8, we have added Appendix C, which is devoted to a complete discussion of a counterexample to a conjecture of Stephane Mallat on zero-crossings. This counterexample was outlined in the first edition, but this is the first time the details have been published. Chapters 9 and 10, although based on the first edition, are considerably expanded and hence essentially new. Chapter 9 (formerly Chapter 10) tells a much more complete and up-to-date story about the use of wavelets for the study of turbulence. Chapter 10 (based on the former Chapter 9) contains a complete analysis of the Weierstrass and Riemann functions, plus a general discussion about the use of wavelets to analyze multifractal functions. Appendix B complements Chapter 10 by providing key results (with proofs) about some wavelet transforms and their inverses. The treatment here is perhaps slightly different from other developments of this now-classical theory. Chapter 11 is the original motivation for this new book, and we consider it the centerpiece. Here we discuss the intriguing interaction between wavelets and nonlinear analysis and the applications of this line of research to image compression and denoising. Since this chapter involves concepts that may not be familiar to some readers, we have added Appendix D to introduce Holder and Besov spaces, plus results on their characterizations in terms of wavelet coefficients. The original edition contained two pages about the then-emerging use of wavelets in astronomy. It was written at a time when the applications of wavelets to astronomy were received with skepticism. Wavelets are today recognized as an essential tool in astronomy. This story has been expanded in Chapter 12, where we have
written a detailed analysis of how wavelets are used in two specific algorithms. We also discuss the use of wavelets to understand the hierarchical structure of the universe and its evolution. This is embedded in a historical context going back to the eighteenth century. The bibliography has been considerably expanded to include research papers from each of the applications discussed, as well as many books and papers of general or historical interest. We have not listed any of the many websites that exist. Instead, we encourage the reader to visit the “official” wavelet site, www.wavelet.org, which is edited by Wim Sweldens with support from Lucent Technologies. Here one will find lists of regularly updated references, a calendar of events, links to homepages of researchers, and links to sites from which wavelet software can be downloaded. Given the scope of the applications in this book, it is clear that we are not experts in each, and thus we have relied on the help of others. We wish to thank specifically several individuals for their time, patience, and thoughtful comments: Richard Baraniuk, Guy Battle, Albert Bijaoui, Yves Bobichon, Albert Cohen, Joseph L. Gerver, Hamid Krim, John Rayner, Sylvie Roques, Marc Tajchman, Bruno Torresani, and Eva Wesfreid.

Stephane Jaffard
Yves Meyer
Robert D. Ryan
Preface from the First Edition

The “theory of wavelets” stands at the intersection of the frontiers of mathematics, scientific computing, and signal processing. Its goal is to provide a coherent set of concepts, methods, and algorithms that are adapted to a variety of nonstationary signals and that are also suitable for numerical signal processing. This book results from a series of lectures that Mr. Miguel Artola Gallego, Director of the Spanish Institute, invited me to give on wavelets and their applications. I have tried to fulfill, in the following pages, the objective the Spanish Institute set for me: to present to a scientific audience coming from different disciplines the prospects that wavelets offer for signal and image processing. A description of the different algorithms used today under the name “wavelets” (Chapters 2-7) will be followed by an analysis of several applications of these methods: to numerical image processing (Chapter 8), to fractals (Chapter 9), to turbulence (Chapter 10), and to astronomy (Chapter 11). This will take me out of my domain; as a result, the last two chapters are merely resumes of the original articles on which they are based. I wish to thank the Spanish Institute for its generous hospitality as well as its Director for his warm welcome. Additionally, I note the excellent organization by Mr. Pedro Corpas. My thanks go also to my Spanish friends and colleagues who took the time to attend these lectures.
CHAPTER 1

Signals and Wavelets

The purpose of this chapter is to give the reader a fairly clear idea about the scientific content of the book. All of the themes that will be developed in this study, using the necessary mathematical formalism, already appear in this overture. It is written with a concern for simplicity and clarity and avoids as much as possible the use of formulas and symbols. Signal and image processing ultimately involve a collection of numerical techniques, or algorithms. But like all other scientific disciplines, signal and image processing assume certain preliminary scientific conventions. We have sought in this first chapter to describe the intellectual architecture underlying the algorithmic constructions that will be presented in other parts of the book.

1.1 What is a signal?

Signal processing has become an essential and ubiquitous part of contemporary scientific and technological activity, and the signals that need to be processed appear in most sectors of modern life. Signal processing is used in telecommunications (telephone and television), in the transmission and analysis of satellite images, and in medical imaging (echography, tomography, and nuclear magnetic resonance), all of which involve the analysis, storage or transmission, and synthesis of complex time series. Signal processing occurs in most late-model automobiles, typically for some monitoring or control function. The record of a stock price is a signal, and so is a record of temperature readings that permit the analysis of climatic variations and the study of global warming. Does there exist a definition of a signal that is appropriate for the field of scientific activity called signal processing? We will not be mathematically precise on this point; instead, we provide a working definition.
A needlessly broad definition of signal could include the sequence of letters, spaces, and punctuation marks appearing in Montaigne’s Essays, but the tools we present do not apply to such a signal. We note, however, that the structuralist analysis done by Roland Barthes on literary texts shares some interesting similarities with multiresolution analysis (Chapter 4). The point of contact is the notion of scale. Barthes used the idea of scale in his analysis of literary texts, where different scales are represented, for example, by book, chapter, paragraph, sentence, and word. We will see that the definition of multiresolution analysis is built on the concept of scale. The signals we study will always be sequences of numbers and not sequences of letters, words, or phrases. These numbers often come from measurements, which are typically made using some recording device. We think of these signals as being functions of time, like music and speech, or, in some cases, as functions of position. For example, by properly associating numbers with the four bases of a DNA molecule, one obtains a signal that can be analyzed by the methods we describe in Chapter 9. Here we are thinking of one-dimensional signals, functions of a single time or space variable. It is equally important to consider two-dimensional signals, which we call images. Here again, image processing is done on the numerical representation of the image. For a black and white image, the numerical representation is created by covering the image with a sufficiently fine grid and by assigning a numerical gray scale, denoted by f(x, y), to each grid point (x, y). The value of f(x, y) is an average of the gray scales of the image in a neighborhood of (x, y). The image thus becomes a large matrix, and image processing is done on this matrix. These arrays can be enormous, and as soon as one deals with a sequence of images, such as in television, the volume of numerical data that must be processed becomes immense. Is it possible to reduce this volume by discovering hidden laws, or correlations, that exist between the different pieces of numerical information representing the image? This question leads us naturally to consider some of the goals of the scientific discipline called signal processing.

1.2 The language and goals of signal and image processing

The subjects we are going to study appear in the scientific landscape where parts of mathematical physics, mathematics, and signal processing intersect, and consequently they share language from these disciplines. This can be confusing, so it is useful to explain some of the terms that we will be using. In so doing, we will introduce the signal processing tasks that appear throughout the rest of the book. “Analysis” has the same meaning in science that it has in ordinary language.
The standard dictionary definition of “analyze” is to separate the whole (of either a physical substance or an abstract idea) into its essential parts to examine the relationships between these parts as well as their relationship to the whole. The concept of analysis provides a program of work based on this hypothesis: Behind the apparent complexity of the world there is a hidden order that is accessible through analysis. The complexity is due to the mixture, to the combination of simple entities. The objective of analysis is to discover the nature of these constituents and how they relate to one another. This program is one of the pillars of modern science. In chemistry, this approach led to the preparation of pure substances and to the discovery of molecules and atoms, and it continues today in particle physics. The synthesis of urea by Friedrich Wohler in 1828 was preceded by, and based on, its analysis. Analysis often has the same meaning in mathematics. Take, for example, Fourier analysis and assume that the complex object to be studied is a continuous, 2π-periodic function of a real variable. One tries to decompose the function into its structural elements. These are the simplest of the 2π-periodic functions, namely, the sines and cosines. The analysis furnishes the Fourier coefficients. The analysis is validated by a synthesis, and here the synthesis is additive. It amounts to representing the analyzed function by its Fourier series. The synthesis is successful, however, only after the rules for combining the components are established. In our example, this amounts to finding a summation process that ensures the convergence of the Fourier series furnished by the analysis. In contrast to chemistry, where the constituent parts are well defined, Fourier analysis is not the only way to study the properties of continuous, 2π-periodic functions. For example, by reinterpreting work by G. H. Hardy on a series
attributed to Bernhard Riemann, Matthias Holschneider and Philippe Tchamitchian have shown that wavelet analysis is more sensitive and efficient than Fourier analysis for studying the differentiability of the Riemann function at a given point. In Fourier analysis, the structural elements are unique; they are sines and cosines. However, in wavelet analysis we will encounter many kinds of wavelets and other objects, such as wavelet packets. Unlike Fourier analysis, wavelet analysis favors no particular set of analyzing functions. There are many analyses, and we are led to the concept of a “box of tools” containing different analytic methods. Each of these methods provides a different way to view complexity. The choice of analytic method is justified by the goal of the analysis. These remarks apply particularly to signal processing. To analyze a signal means, in this book, to look for the constituent elements. These constituent elements are the elementary signals, the simplest signals into which the given signal can be decomposed. But an analysis makes sense only if it enables one to understand the properties of the object being analyzed and to understand its complexity. We will return to this aspect of signal analysis in section 1.3, where we introduce atomic decompositions. The term “coding” comes from information theory and signal processing, where it, like “analysis,” has many uses. “Transform coding” is a general term that refers to taking a linear transform of a signal or image. Fourier analysis is a form of transform coding, as are the algorithms discussed in Chapters 4 and 5. Note that “coding” and “analysis” do not always refer to linear processes. The coding by zero-crossings discussed in Chapter 8 is nonlinear.
However, in each case, coding involves methods to transform the recorded numerical signal into another representation that is—depending on the nature of the signals studied—more convenient for some task or further processing. Decoding is simply the inverse of coding, and it means the same thing here as synthesis, or reconstruction. “Transmission” and “storage” have their ordinary meanings, but in the context of signal processing, these terms can involve layers of complexity. Every transmission channel, whether an old telegraph line or a modern satellite link, has a definite bandwidth and a computable cost for its use. Similarly, every storage medium has performance limitations and a price tag. The costs of information storage and transmission account for much of the economic motivation behind signal processing: The goals are to provide transmission and storage at a given level of performance for the lowest cost. Transmission and storage are often interrelated, in the sense that what is stored must be accessed and transmitted. These ideas will be illustrated later with examples, including the storage of fingerprints by the Federal Bureau of Investigation (FBI) and the storage and transmission of astronomical images (Chapter 12). The constraints placed on transmission and storage require that information be compressed. For example, it is too slow and too expensive to transmit raw images over the Internet. Before being transmitted, images are compressed using one of several schemes such as Joint Photographic Experts Group (JPEG) and Graphic Interchange Format (GIF). Very roughly, this is how the compression we will discuss works: A digital signal is analyzed, or coded. Either by design or luck, many of the coefficients that come from the coding are either zero or close to zero, and the other coefficients contain the “important information” or “significant features” of the signal.
The small coefficients are set equal to zero, and the others are “quantized” and transmitted. These are received at the end of the channel and are used to decode, or synthesize, the signal.
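The analyze-threshold-synthesize loop just described can be sketched in a few lines of code. This is our own minimal illustration, not something taken from the book: it uses a one-level Haar transform, and the names haar_forward, haar_inverse, and compress are hypothetical.

```python
# Illustrative sketch of transform coding by thresholding (hypothetical names).
# A one-level Haar transform splits a signal into averages ("trend") and
# differences ("fluctuation"); small fluctuation coefficients are zeroed
# before synthesis, which is where the compression comes from.

import math

def haar_forward(signal):
    # Signal length is assumed even; pair consecutive samples.
    s = 1 / math.sqrt(2)
    trend = [(a + b) * s for a, b in zip(signal[::2], signal[1::2])]
    detail = [(a - b) * s for a, b in zip(signal[::2], signal[1::2])]
    return trend, detail

def haar_inverse(trend, detail):
    s = 1 / math.sqrt(2)
    signal = []
    for t, d in zip(trend, detail):
        signal.extend([(t + d) * s, (t - d) * s])
    return signal

def compress(signal, threshold):
    trend, detail = haar_forward(signal)
    # "Coding": keep the trend, zero the small detail coefficients.
    detail = [d if abs(d) > threshold else 0.0 for d in detail]
    return trend, detail

x = [4.0, 4.1, 8.0, 8.2, 1.0, 1.1, 5.0, 5.3]
trend, detail = compress(x, threshold=0.5)
y = haar_inverse(trend, detail)  # reconstruction from the kept coefficients
```

With the threshold set to zero the reconstruction is exact; raising it discards detail coefficients and trades a small, often imperceptible, loss of fidelity for fewer numbers to transmit.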
It is important to note that information typically is lost when small coefficients are set equal to zero and when the other coefficients are quantized. The trick, however, is to do the compression in such a way that the lost information is not noticed. If all of this is done cleverly, the reconstructed signal is, for the purposes at hand, as good as the original. A one-dimensional example is the digital telephone: The compression and transmission must be compatible with the 64 Kbit/second standard, which limits without recourse the quantity of information that can be transmitted in one second. At the same time, the quality must be such that the person at the receiver can recognize the voice at the other end. The compression we have just described should not be confused with another kind of compression that is well known to Internet users, namely, the compression of application files. Here there must be absolutely no loss of information, and the decompressed file must be bit-for-bit the same as the original. This kind of compression is an example of what is called entropy coding, which is another use of “coding.” Most, but not all, uses of “coding” refer to either transform coding or entropy coding. Quantization is an unavoidable (and undesirable) part of this process. Theoretically, the coefficients given by a coding algorithm are arbitrary real or complex numbers, but practically, processors have finite precision, and they produce rational numbers whose dyadic expansions have a fixed length. The desired quality of the restored image, the channel capacity, and the cost dictate the length of the dyadic numbers that will be transmitted. Mapping the coefficients from the coding algorithm into a finite number of “bins” is called quantization or, more precisely, scalar quantization. A more sophisticated process called vector quantization maps vectors of coefficients into “bins” in ℝⁿ (n-dimensional Euclidean space).
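As a toy illustration of scalar quantization, here is our own sketch (the bin width delta and the midpoint decoding rule are choices we made for the example, not something prescribed by the text): each real coefficient is mapped to an integer bin index, and decoding returns the bin's representative value.

```python
# Minimal uniform scalar quantizer (illustrative only).
# Each coefficient c is mapped to the index of a bin of width delta;
# decoding returns the bin center, so the error is at most delta / 2.

def quantize(coefficients, delta):
    # Map each coefficient to an integer bin index (this loses information).
    return [round(c / delta) for c in coefficients]

def dequantize(indices, delta):
    # Every coefficient that fell in a bin comes back as the same value.
    return [i * delta for i in indices]

coeffs = [0.11, -2.47, 0.02, 3.98]
bins = quantize(coeffs, delta=0.5)      # small integers, cheap to transmit
restored = dequantize(bins, delta=0.5)  # error at most 0.25 per coefficient
```

The integer bin indices are what would then be entropy coded and transmitted; the systematic error between coeffs and restored is exactly the quantization noise discussed in the text.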
We will not be discussing quantization, but we wish to emphasize how important quantization is to the overall efficiency of the process. Quantization is an art, and the way it is done can “make or break” an algorithm. In most of the cases to be discussed, the analysis and synthesis (coding and decoding, or reconstruction) are theoretically invertible processes: There is no loss of information, and one obtains perfect reconstruction of the original signal. Quantization, however, is not an invertible process and, unfortunately, it introduces systematic errors known as quantization noise. The algorithms used for coding should—taking into account the nature of the signals—reduce the effects of quantization noise. One of the advantages of quadrature mirror filters is that they “trap” this quantization noise inside well-defined frequency channels. These filters will be studied in Chapter 3. There is another aspect of the coding-transmission-decoding process that needs to be mentioned: Having quantized the coefficients into bins, it is customary to code the bins before transmission. This coding is entropy coding, and as indicated above, it is completely reversible. The idea is to transmit the information as efficiently as possible, using the statistical structure of the information to be transmitted. Perhaps the best-known example of entropy coding is the Morse Code, which codes the most frequently used letters with the simplest sequences of dots and dashes. The total efficiency of a compression scheme depends on the analysis, quantization, and entropy coding and on how they work together. In addition to transmission and storage, there is a collection of signal processing tasks called diagnostics. Roughly speaking, this is like asking and answering a question about a signal. For example, Does a given sample of speech belong to one of several speakers? Or, Is an underwater acoustic signal coming from a submarine
or a ship? For the most part, this book does not deal with diagnostics; however, a few comments are in order. A diagnostic often depends on extracting a small number of significant parameters from a signal whose complexity and size are overwhelming. Some scientists believe that diagnostics would be easier if the signal or image had first been correctly analyzed and compressed. From this point of view, analysis and the diagnostic are naturally related to data compression, and clearly, if this compression is done inappropriately, it can falsify the diagnostic. In the first edition of this book, we took the position that proper compression was relevant, or even necessary, for a given diagnostic task. Our position has changed, based mainly on a series of lectures by David Mumford delivered at the Institut Henri Poincare in the fall of 1998.¹ We now feel that most diagnostic tasks are related to statistical modeling of a given collection of signals or images. Statistical modeling is an important field of research that is based on a fascinating set of tools. However, a discussion of statistical modeling lies well beyond the scope of this book. Finally, we mention restoration. Signal restoration is analogous to the restoration of old paintings. It amounts to ridding the signal of artifacts and errors, which we call noise, and to enhancing certain aspects of the signal that have undergone attenuation, deterioration, or degradation. We will discuss an application of wavelets to signal restoration in Chapter 11. So what are the goals of signal and image processing? Experts in signal processing are asked to develop, for a given class of signals, algorithms that perform certain tasks or operations. These algorithms should lead to the construction of microprocessors, like those that exist in cell phones and automobiles, that execute these tasks automatically.
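The entropy coding mentioned earlier can also be made concrete. The sketch below builds a Huffman code, the classical embodiment of the Morse-code idea: frequent symbols get short codewords, and the coding is exactly reversible. (The message and the implementation details are ours, for illustration only.)

```python
import heapq
from collections import Counter

def huffman_code(message):
    """Build a prefix code in which frequent symbols get short codewords
    (the Morse-code idea); decoding is exactly reversible."""
    freq = Counter(message)
    # Heap entries: (weight, tie-breaker, {symbol: codeword-so-far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, i, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, i, merged))
    return heap[0][2]

msg = "abracadabra"
code = huffman_code(msg)
encoded = "".join(code[s] for s in msg)

# Decoding reads bits until they match a codeword; no information is lost.
inverse = {v: s for s, v in code.items()}
decoded, buffer = "", ""
for bit in encoded:
    buffer += bit
    if buffer in inverse:
        decoded += inverse[buffer]
        buffer = ""
```

Here the 11-letter message costs 23 bits instead of the 33 a fixed 3-bit code would need, and decoding recovers it bit-for-bit.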
Some of the important tasks have been described above: coding, diagnostics, quantization and compression, transmission or storage, decoding, and restoration. We will use several examples to illustrate the nature of these operations and the difficulties they present. It will become clear that no “universal algorithm” is appropriate for the extreme diversity of the situations encountered. Thus, a large part of this work is devoted to describing coding or analysis algorithms that can be adapted to the particular classes of signals that one needs to process. Our first example illustrates restoration and diagnostics. One is interested in splitting a signal into the sum of two terms: The first term contains the information one wishes to recover, and the second term is the noise one wishes to erase. The problem is the study of climatic variations and global warming. This problem was discussed in detail by Professor Jacques-Louis Lions at the Spanish Institute in 1990 [174]. In this example, one has fairly precise temperature measurements from different points in the northern hemisphere that were taken over the last two centuries, and one tries to discover if industrial activity has caused global warming. The extreme difficulty of the problem stems from the existence of significant natural temperature fluctuations. Moreover, these fluctuations and the corresponding climatic changes have always existed, as we have learned from paleoclimatology [250]. Thus, to have access to the “artificial” heating of the planet resulting from human activity and to develop a diagnostic, it is essential to analyze, and then to “erase,” these natural fluctuations, which play the role of noise. A more surprising example appears in neurophysiology. The optic nerve’s capacity to transmit visual information is clearly less than the volume of information

¹The ideas presented in these lectures can be found in [215] and [216].
collected by all the retinal cells. Thus, there must be low-level processing of information before it transits the optic nerve. David Marr developed a theory to understand the purpose and performance of this low-level processing, which is a type of coding and compression [198]. We present this theory in Chapter 8. The problems encountered in archiving data—as well as problems of transmission and reconstruction—are illustrated by the FBI’s task of storing the American population’s fingerprints. Over 200 million fingerprint records must be stored, and the use of inked impressions on paper cards is no longer practical. The FBI began digitizing fingerprint records some years ago as part of a modernization program, but due to the massive amount of data (10 megabytes per record), it was decided that some form of compression was needed. In addition to efficient storage, it was also important to access the fingerprint files quickly and to transmit them electronically throughout the world. The goal was to be able to reconstruct the received image on a laptop computer, and the quality of the reconstructed image had to be such that the end user, whether a fingerprint expert or an automated fingerprint feature extractor, would have no difficulty interpreting the image. It was decided that coding and compression offered the only solution. Different image-compression algorithms were tested, and a wavelet-based algorithm, a variant of one described in Chapter 6, gave the best results, where “best” involved the speed of the algorithm as well as the compression ratio and the quality of the reconstructed image. This established the standard for fingerprint compression and reconstruction that is used today. (For further details, see Christopher Brislawn’s paper [41].)
We have just described and illustrated some of the more important goals of signal and image processing that focus on compression, transmission and storage, and the attendant algorithms for coding and reconstruction. It is important to note, however, that there are many other significant problems in signal processing that will not be discussed. In particular, there is a vast area of signal processing based on probability and statistics that is beyond the scope of our work. As mentioned above, statistical modeling is crucial for high-level signal processing tasks like feature or pattern analysis and diagnostics. This is not to say that wavelets do not or will not play a role in this expanded arena; it is rather that here we limit ourselves, for the most part, to a deterministic theory. The few exceptions include some notes on Brownian motion and the appearance of noise in some of the examples. Before leaving this section, we believe it is important to reiterate a theme hinted at above: For the most part, we will be discussing coding algorithms and the role wavelets play in these algorithms. These techniques are clearly important in today’s technology, but they are only a part of the overall process. The quality of the total process depends on blending analysis, quantization, entropy coding, transmission, and decoding—all of which are interdependent—and ultimately, on implementing these processes in hardware.

1.3 Stationary signals, transient signals, and adaptive coding

We have just defined a set of tasks, or operations, to be performed on signals or images. These tasks form a coherent collection. The purpose of this book is to describe a group of coding algorithms that have been shown, during the last few years, to be particularly effective for compression and for analyzing certain signals that are not stationary. We also will describe several “meta-algorithms” that allow one to choose the coding algorithm best suited to a given signal.
To approach this problem of choosing an adaptive algorithm, we briefly classify signals
by distinguishing stationary signals, quasi-stationary signals, and transient signals. A signal is stationary if its properties are statistically invariant over time. A well-known stationary signal is white noise, which in its sampled form appears as a series of independent drawings. A stationary signal can exhibit unexpected events, but we know in advance the probabilities of these events. These are the statistically predictable unknowns. The ideal tool for studying stationary signals is the Fourier transform. In other words, stationary signals decompose canonically into linear combinations of “waves,” that is, into sines and cosines. In the same way, some interesting classes of signals that are not stationary decompose more naturally into linear combinations of wavelets. These heuristics should not be taken too literally, since the full class of signals that are not stationary is too large to be processed by a single methodology. The study of nonstationary signals, where transient events appear that cannot be predicted, even statistically with knowledge of the past, necessitates techniques different from Fourier analysis. These techniques, which are specific to the nonstationary character of the signal, include wavelets of the time-frequency type and wavelets of the time-scale type. Time-frequency wavelets are suited, most specifically, to the analysis of quasi-stationary signals, while time-scale wavelets are adapted to signals exhibiting complicated geometrical features. Examples are edges in images and fractal or multifractal signals. Before defining time-frequency wavelets and time-scale wavelets, we will indicate their common points. They belong to a more general class of algorithms that are encountered in mathematics and in speech processing. Mathematicians speak of atomic decompositions, while speech specialists speak of decompositions in time-frequency atoms. The scientific reality is the same in both cases.
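The contrast between the two situations can be seen numerically. In the sketch below (our own illustration, not taken from the text), a pure “wave” is captured almost entirely by a handful of Fourier coefficients, while a single transient “click” spreads its energy over every frequency:

```python
import numpy as np

def energy_in_top_coefficients(signal, m):
    """Fraction of the signal's energy captured by its m largest
    Fourier coefficients (a crude measure of how well "waves" fit)."""
    power = np.abs(np.fft.fft(signal))**2
    top = np.sort(power)[::-1][:m]
    return top.sum() / power.sum()

t = np.arange(256) / 256
wave = np.sin(2 * np.pi * 8 * t)   # stationary: a pure "wave"
click = np.zeros(256)
click[100] = 1.0                   # transient: a single "click"

print(energy_in_top_coefficients(wave, 10))   # ≈ 1.0: two bins suffice
print(energy_in_top_coefficients(click, 10))  # ≈ 0.039: energy is smeared
```

Ten Fourier coefficients describe the wave essentially perfectly, but capture only 10/256 of the click’s energy; this is the sense in which transients call for tools other than Fourier analysis.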
As we have already mentioned, an atomic decomposition consists in extracting the simple constituents that make up a complicated mixture. Contrary to what happens in chemistry, the “atoms” that are discovered in a signal have no physical reality; they will depend on the point of view adopted for the analysis. These “atoms” will be time-frequency atoms when we study quasi-stationary signals, but they could, in other situations, be replaced by time-scale wavelets, which also are called Grossmann-Morlet wavelets. These “atoms” or “wavelets” have no more physical existence than a specific number system used to do some numerical computation. Each number system has an internal coherence, but no scientific law asserts that multiplication must necessarily be done in base 10 rather than base 2. On the other hand, we feel the number system used by the Romans is excluded for practical reasons, since it is not particularly suitable for multiplication. Having different algorithms that allow us to code a signal by decomposing it into time-frequency atoms is a somewhat similar situation. The decision to use one or the other of these algorithms will be made by considering their “performance.” How well they perform must be judged in terms of one of the anticipated goals of signal processing. An algorithm that is optimal for compression can be disastrous for analysis: A standard L² energy criterion for the compression could cause details that are important for the analysis to be systematically neglected. These thoughts will be developed and clarified in sections 1.6 and 1.7. At this point, however, we need to be more specific and define wavelets, which we do in the next two sections.
1.4 Grossmann-Morlet time-scale wavelets

Time-scale analysis—which should be called space-scale analysis in the image case, and which is closely related to multiresolution analysis—involves using a vast range of scales. This notion of scale, which appropriately reminds us of cartography, implies that the signal (or image) is replaced, at a given scale, by the best possible approximation that can be drawn at that scale. By “traveling” from the large scales toward the fine scales, one “zooms in” and arrives at more and more precise representations of the given signal. The analysis is then done by calculating the change from one scale to the next. This produces the details that allow one, by correcting a rather crude approximation, to move toward a better quality representation. This algorithmic scheme is called multiresolution analysis and is developed in Chapters 3 and 4. Multiresolution analysis is equivalent to an atomic decomposition where the atoms are wavelets. We define these wavelets by starting with a function ψ of the real variable t. This function is called a mother wavelet if it is well localized and oscillating. (It resembles a wave because it oscillates, and it is a wavelet because it is localized.) The localization condition is expressed in the usual way by saying that the function decreases rapidly to zero as |t| tends to infinity. The second condition suggests that ψ vibrates like a wave. Mathematically, we require that the integral of ψ be zero and that the first m moments of ψ vanish. This is expressed by the relations

    ∫ tⁿ ψ(t) dt = 0  for  n = 0, 1, ..., m − 1.    (1.1)

The mother wavelet ψ generates the other wavelets of the family ψ_(a,b), a > 0, b ∈ ℝ, by change of scale and translation in time. (The scale of ψ is conventionally one, and that of ψ_(a,b) is a > 0; the function ψ is conventionally centered around zero, and ψ_(a,b) is then centered around b.)
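Condition (1.1) is easy to check numerically. As an illustration (the specific wavelet is our choice, not one singled out in the text), the “Mexican hat” ψ(t) = (1 − t²)e^(−t²/2), the second derivative of a Gaussian, is well localized, oscillates, and satisfies (1.1) with m = 2:

```python
import numpy as np

def moment(psi, n, t):
    """Approximate the n-th moment  ∫ tⁿ ψ(t) dt  on a fine grid."""
    dt = t[1] - t[0]
    return np.sum(t**n * psi(t)) * dt

# "Mexican hat" mother wavelet: well localized and oscillating.
mexican_hat = lambda t: (1 - t**2) * np.exp(-t**2 / 2)

t = np.linspace(-10, 10, 20001)
print(moment(mexican_hat, 0, t))  # ≈ 0: the integral vanishes
print(moment(mexican_hat, 1, t))  # ≈ 0: the first moment vanishes
print(moment(mexican_hat, 2, t))  # ≈ -5.01: the second moment does not
```

The tails of ψ are negligible beyond |t| = 10, so the sums approximate the integrals to high accuracy.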
Thus we have

    ψ_(a,b)(t) = (1/√a) ψ((t − b)/a),  a > 0, b ∈ ℝ.    (1.2)

Alex Grossmann and Jean Morlet showed in the early 1980s that this collection can be used as if it were an orthonormal basis when ψ is real-valued [133]. This means that any signal of finite energy can be represented as a linear combination of wavelets ψ_(a,b) and that the coefficients of this combination are, up to a normalizing factor, the scalar products ∫ f(t) ψ_(a,b)(t) dt. These scalar products measure, in some sense, the fluctuations of the signal f around the point b, at the scale given by a > 0. It required uncommon scientific intuition to assert, as Grossmann and Morlet did, that this new method of time-scale analysis was suitable for the analysis and synthesis of transient signals. Signal processing experts were at first annoyed by the intrusion of these two poachers on their preserve and made fun of their claims. This polemic had a short life, and in fact, the argument should never have arisen, because the methods of time-scale or multiresolution analysis had already existed for five or six years under various disguises: in signal analysis under the name of quadrature mirror filters and in image analysis under the name of pyramid algorithms. The first to report on this was Stephane Mallat. He constructed a guide that allowed the same signal analysis method to be recognized under very different presentations, including wavelets, pyramid algorithms, quadrature mirror filters, and Littlewood-Paley analysis. Mallat’s brilliant observations led to the mathematical definition of multiresolution analysis, which provides a theoretical umbrella for our subject. Ingrid Daubechies discovered orthonormal wavelet bases having preselected regularity and compact support [71] (see also [73]). The only previously known case was the Haar system (1909), which is not regular. Thus almost 80 years separated Alfred Haar’s work and its natural extension by Daubechies. On the other hand, the wavelets invented by Daubechies—or more precisely the biorthogonal versions developed slightly later—have taken less than 10 years to enter the mainstream of technology. The construction of Daubechies wavelets will be discussed in Chapter 3 and biorthogonal wavelets will be discussed in Chapter 4. The relevance of having smooth wavelets will be explained in Chapter 2.

1.5 Time-frequency wavelets from Gabor to Malvar and Wilson

Dennis Gabor, in 1946, was the first to introduce time-frequency wavelets [124], and the functions he used are called Gabor wavelets. He had the idea to divide a wave—whose mathematical representation is cos(ωt + φ)—into segments and to use one of these segments as the analyzing function. This was a piece of a wave, or a wavelet, which had a beginning and an end. To use a musical analogy, a wave corresponds to a note (A 440, for example) that has been emitted since the origin of time and continues indefinitely, without attenuation, until the end of time. A wavelet then corresponds to the same A 440 that is struck at a certain moment, say, on a piano, and is later muffled by the pedal. In other words, a Gabor wavelet carries (at least) three pieces of information: a beginning, an end, and a specific frequency in between. Difficulties appeared when it was necessary to decompose a signal using Gabor wavelets.
As long as one does only continuous decompositions (using all frequencies and all time), Gabor wavelets can be used as if they formed an orthonormal basis, in the same sense described above for the Grossmann-Morlet wavelets. There are problems, however, with a direct discrete version of the Gabor decomposition. In the late 1940s, a number of investigators, including Leon Brillouin, Dennis Gabor, and John von Neumann, felt that the system e^(2πikx) g(x − l), k, l ∈ ℤ, where g is the Gaussian g(x) = π^(−1/4) e^(−x²/2), could be used as a basis to decompose any function in L²(ℝ) (see [40], for example). Two physicists, Roger Balian (1981 in [17]) and Francis Low (1985 in [177]), proved independently that this is not the case. Furthermore, the Balian-Low theorem shows that the particular choice of g is not the problem and that the result cannot be true for any smooth, well-localized function. It is only recently, by abandoning Gabor’s approach, that two scientists working in different fields and in different parts of the world—Henrique Malvar in signal processing in Brasilia and Kenneth Wilson in physics at Cornell University—have discovered time-frequency wavelets having good algorithmic qualities. These special time-frequency wavelets, which we call Malvar-Wilson wavelets, are particularly well suited for coding speech and music. The decomposition of a signal in an orthonormal basis of Malvar-Wilson wavelets imitates writing music using a musical score. But this comparison is misleading because a piece of music can be written in only one way, whereas there exists a nondenumerable infinity of orthonormal bases of Malvar-Wilson wavelets. Choosing
one of these is equivalent to segmenting the given signal and then doing a traditional Fourier analysis on the delimited pieces. What is the best way to choose this segmentation? This question leads us naturally to the next section.

1.6 Optimal algorithms in signal processing

Which wavelet to choose? This question has often been posed at meetings held since 1985 on wavelets and their applications. But this question needs to be sharpened. What freedom of choice is at our disposal? What are the objectives of the choices we make? Can we make better use of the choices offered to us by considering the anticipated goals? These are several of the questions we will try to answer. The goal we have in mind is aptly illustrated by a remark Benoit Mandelbrot made in an interview on the French radio program France Culture. He noted that “the world around us is very complicated” and that “the tools at our disposal to describe it are very weak.” It is notable that Mandelbrot used the word “describe” and not “explain” or “interpret.” We are going to follow him in this, ostensibly, very modest approach. This is our answer to the question about the objectives of the choices: Wavelets, whether they are of the time-scale or time-frequency type, will be chosen to describe as well as possible the reality around us. This description may lead to scientific understanding and the formulation of scientific laws, but once the laws are formulated, the wavelets themselves disappear. We have no reason to believe that there are scientific laws that are written in terms of wavelets. Thus our task is to optimize the description. This means that we must make the best use of the resources allocated to us (for example, the number of available bits) to obtain the most precise possible description. To resolve this problem, we must first indicate how the quality of the description will be judged. Most often, the criteria used are mathematical and do not have much to do with the user’s point of view.
For example, in image processing, most calculations for judging the quality of the description use the quadratic mean value of gray levels. It is clear, however, that our eye is much more sensitive and selective than this quadratic measure. Thus, in the last analysis, we should submit the performance of an “optimal algorithm” to the users, since the average approximation criterion that leads to this algorithm will often be inadequate. The case of speech (telephonic communication) or music is similar. The systematic research that optimizes the reception quality is based on an L² criterion that is mathematically convenient, but it is surely not the criterion used by the human ear. Ideally, we should have a two-stage program: the first stage based on mathematical criteria and the second based on user satisfaction. For the most part, the only stage we describe is the “objective search” for an optimal algorithm, even though its optimality is defined in terms of a debatable energy criterion. The search for mathematically tractable criteria that capture the performance of the human eye or ear continues as an open problem at the interface between mathematics and physiology. Progress has been made in this area, and we will discuss in Chapter 11 some new criteria that seem to be closer to the user’s point of view (at least for image processing) than the classical energy criteria. For example, in image synthesis, these criteria favor the reconstruction of sharp edges, which the eye is very quick to discern. Rather than formulate ad hoc algorithms for each signal or each class of signals, we will construct, once and for all, a vast collection called a library of algorithms.
We also will construct a meta-algorithm whose function will be to find the particular algorithm in the library that best serves the given signal, given the criterion for the quality of the description. The number of signals recorded on 2¹⁰ = 1024 points that take only the two values zero and one is 2¹⁰²⁴. It would be absurd to store all of these possible signals in our library. We will use a very large “library” to describe the signals, but we exclude this “library of Babel,” which would contain all the books, or all the signals in our case. But as everyone knows, the search for a specific book in the library of Babel is an insurmountable task. The “ideal library” must be sufficiently rich to suit all transient signals, but the “books” must be easily accessible. While a single algorithm, Fourier analysis, is appropriate for all stationary signals, the transient signals are so rich and complex that a single analysis method, whether of time-scale or time-frequency type, cannot serve them all. If we stay in the relatively narrow environment of Grossmann-Morlet wavelets, also called time-scale algorithms, we have only two ways to adapt the algorithm to the signal being studied: We can choose one or another analyzing wavelet, and we can use either the continuous or the discrete version of the wavelets. For example, we can require the analyzing wavelet ψ to be an analytic signal, which means that its Fourier transform ψ̂(ξ) is zero for negative frequencies. In this case, all the wavelets ψ_(a,b), a > 0, b ∈ ℝ, generated by ψ also will have this property, and the linear combination given by the algorithm will be the analytic signal F associated with the real signal f. (For information about analytic signals, see sections 2.6 and 2.7 or [222].)
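For a sampled signal, the analytic signal F can be computed with the discrete Fourier transform by suppressing the negative frequencies and doubling the positive ones (this is the standard construction behind “Hilbert transform” routines; the test signal below is our own illustration):

```python
import numpy as np

def analytic_signal(f):
    """Analytic signal F of a real signal f: keep the positive
    frequencies of f (doubled) and discard the negative ones."""
    n = len(f)
    spectrum = np.fft.fft(f)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0          # the Nyquist bin is kept once
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

t = np.arange(256) / 256
f = np.cos(2 * np.pi * 5 * t)    # a real-valued "wave"
F = analytic_signal(f)
# F = exp(2πi·5t): its real part is f, and its modulus (the envelope) is 1.
```

Recovering the real signal is immediate (f = Re F), and the modulus |F| gives the instantaneous envelope.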
Similarly, we can follow Daubechies and, for a given r > 1, choose for ψ a real-valued function in the class C^r with compact support such that the collection 2^(j/2) ψ(2^j x − k), j, k ∈ ℤ, is an orthonormal basis for L²(ℝ). In this discrete version of the algorithm, a = 2^(−j) and b = k 2^(−j), j, k ∈ ℤ. In spite of this, the choices that can be made from the set of time-scale wavelets remain limited. The search for optimal algorithms leads us on some remarkable algorithmic adventures, where time-scale wavelets and time-frequency wavelets are in competition, and where they also are compared with intermediate algorithms that mix the two extreme forms of analysis. These considerations are developed in Chapters 6 and 7, and the question asked some years ago—Which wavelet to choose?—seems no longer relevant. The choices that we can and must consider no longer involve only the analyzing instrument, which is the wavelet. They also involve the methodology employed, which can be a time-scale algorithm, a time-frequency algorithm, or an intermediate algorithm. Today, the competing algorithms, time-scale and time-frequency, are included in a whole universe of intermediate algorithms. An entropy criterion permits us to choose the algorithm that optimizes the description of the given signal within the given bit allocation. Each algorithm is presented in terms of a particular orthogonal basis. We can compare searching for the optimal algorithm to searching for the best point of view, or best perspective, to look at a statue in a museum. Each point of view reveals certain parts of the statue and obscures others. We change our point of view to find the best one by going around the statue. In effect, we make a rotation; we change the orthonormal basis of reference to find the optimal basis. These reflections lead us quite naturally to the scientific thoughts of David Marr.
1.7 Optimal representation according to Marr

David Marr was fascinated by the complex relations that exist between the choice of a representation of a signal and the nature of the operations or transformations that such a representation permits. He wrote [198, pp. 20-21]:

A representation is a formal system for making explicit certain entities or types of information, together with a specification of how the system does this. And I shall call the result of using a representation to describe a given entity a description of the entity in that representation. For example, the Arabic, Roman and binary numerical systems are all formal systems for representing numbers. The Arabic representation consists of a string of symbols drawn from the set (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), and the rule for constructing the description of a particular integer n is that one decomposes n into a sum of multiples of powers of 10... . A musical score provides a way of representing a symphony; the alphabet allows the construction of a written representation of words; and so forth. ... A representation, therefore, is not a foreign idea at all—we all use representations all the time. However, the notion that one can capture some aspect of reality by making a description of it using a symbol and that to do so can be useful seems to me a fascinating and powerful idea. But even the simple examples we have discussed introduce some rather general and important issues that arise whenever one chooses to use one particular representation. For example, if one chooses the Arabic numerical representation, it is easy to discover whether a number is a power of 10 but difficult to discover whether it is a power of 2. If one chooses the binary representation, the situation is reversed.
Thus, there is a trade-off; any particular representation makes certain information explicit at the expense of information that is pushed into the background and may be quite hard to recover.² This issue is important, because how information is presented can greatly affect how easy it is to do different things with it. This is evident even from our numbers example: It is easy to add, to subtract, and even to multiply if the Arabic or binary representations are used, but it is not at all easy to do these things—especially multiplication—with Roman numerals. This is a key reason why the Roman culture failed to develop mathematics in the way the earlier Arabic cultures had.

²These last italics are ours.

There is an essential difference between Marr’s considerations and the algorithms that we develop in the first six chapters. The difference is that the choice of the best representation, according to Marr, is tied to an objective goal. For the problem posed by vision, one goal is to extract the contours, recognize the edges of objects, delimit them, and understand their three-dimensional organization. In contrast, the algorithms we present in this book are aimed only at reducing the amount of data. They were not designed to extract patterns or solve important scientific issues, although sometimes they do. One can argue that compression is a necessary first step toward feature extraction and, conversely, that obtaining the “important features”
of an image is indeed a form of compression. We, however, strongly believe that pattern recognition is not related to the kind of compression being discussed here. This position is based on our understanding of work by David Mumford. What we have said so far concerns the use of wavelets for signal and image processing, and indeed this is the major theme of our book. There is, however, a slightly different point of view that focuses on wavelet techniques (analysis and synthesis) as tools within mathematics. This aspect will appear in Chapter 10, where we illustrate the power of wavelet techniques by analyzing two examples of fractal functions: the Weierstrass function and the Riemann function.

1.8 Terminology

The elementary constituents used for signal analysis and synthesis will be called, depending on the circumstances, wavelets, time-frequency atoms, or wavelet packets. The wavelets used will be either the Grossmann-Morlet wavelets of the form

    ψ_(a,b)(t) = (1/√a) ψ((t − b)/a),  a > 0, b ∈ ℝ,    (1.3)

the orthonormal wavelet bases that have the form

    ψ_(j,k)(t) = 2^(j/2) ψ(2^j t − k),  j, k ∈ ℤ,    (1.4)

or the local Fourier bases of the form

    w_(k,l)(t) = w(t − l) cos[π(k + 1/2)(t − l)],  k ∈ ℕ, l ∈ ℤ.    (1.5)

In the first two cases, we will speak of time-scale algorithms; in the last case, we will speak of time-frequency algorithms. Later we will mix the two points of view and subject the local Fourier bases to dyadic dilations. One thus encounters generalized time-frequency atoms. We will see in Chapter 10 (and in Appendix D) that the orthonormal wavelet bases of the form (1.4) have special properties that are not found in other decomposition algorithms. We will use only two very large “libraries.” The first consists of orthonormal bases whose elements are wavelet packets. In the second, the wavelet packets are replaced by the generalized time-frequency atoms that we have just described.
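The family (1.4) can be made concrete with the Haar system (the wavelet of 1909 mentioned earlier; our choice for illustration, since its inner products can be computed exactly). The sketch below samples a few members of the family and verifies their orthonormality:

```python
import numpy as np

def haar(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    return np.where((0 <= x) & (x < 0.5), 1.0,
                    np.where((0.5 <= x) & (x < 1.0), -1.0, 0.0))

def psi_jk(j, k, x):
    """The family (1.4): 2^(j/2) ψ(2^j x − k)."""
    return 2.0**(j / 2) * haar(2.0**j * x - k)

# On a fine dyadic grid, inner products of these piecewise-constant
# functions are computed exactly by a Riemann sum.
x = np.arange(0.0, 1.0, 1.0 / 1024)
ip = lambda f, g: np.sum(f * g) / 1024

print(ip(psi_jk(0, 0, x), psi_jk(0, 0, x)))  # → 1.0: unit norm
print(ip(psi_jk(0, 0, x), psi_jk(1, 0, x)))  # → 0.0: orthogonal across scales
print(ip(psi_jk(2, 1, x), psi_jk(2, 2, x)))  # → 0.0: disjoint supports
```

Here j indexes the scale a = 2^(−j) and k the position b = k 2^(−j), exactly as in the discrete version of the algorithm described in section 1.6.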
1.9 Reader's guide

In Chapters 2 through 7, we present the time-scale algorithms (Chapters 3 and 4) and time-frequency algorithms (Chapters 5, 6, and 7). Chapter 2 has a special status. We have tried to retrace some of the paths that led from Fourier analysis (Fourier, 1807) to wavelet analysis (Calderón, 1960, and Strömberg, 1981) and to the core of contemporary mathematics.

Quadrature mirror filters are studied in Chapter 3 in the context of problems posed by the digital telephone. For this revised edition, we have added an appendix that contains elementary information about filters and thus complements Chapter 3.

The pyramid algorithms described in Chapter 4 concern numerical image processing. They use precisely the quadrature mirror filters of Chapter 3, and they lead either to orthogonal wavelets or to biorthogonal wavelets.

In Chapters 5 through 7, we will study time-frequency algorithms. The Wigner-Ville transform enables the signal to be "displayed in the time-frequency plane."
After indicating the main properties of the Wigner-Ville transform, we show that it leads to an algorithm that allows us to decompose a signal into new time-frequency atoms named "chirplets," which are a kind of frequency-modulated Gabor wavelet. Two other algorithms that provide access to these "atomic decompositions" are presented in Chapter 6 (local Fourier bases) and Chapter 7 (wavelet packets).

The first seven chapters form a coherent unit. This is not the case for the last five chapters; each of them treats a special application of wavelets and time-scale methods. Chapter 8 deals with the possibility of coding an image using the zero-crossings of its wavelet transform. In Chapter 9 we discuss turbulence and some of the recent contributions wavelet analysis has made to this still-unsolved problem. This chapter also serves as an introduction to multifractal analysis; indeed, this subject was initially introduced as a tool for studying turbulence. Multifractal analysis is continued in Chapter 10, where we show how wavelet analysis can be used to determine the Hölder exponents, as a function of position, of a multifractal function. Chapter 10 contains analyses of the Weierstrass and Riemann functions. In Chapter 11 we describe the use of wavelets for denoising signals and images. This chapter also provides a quick look at the connections between wavelets, nonlinear approximation, and Besov spaces, a mixture of seemingly unrelated techniques that is producing surprising and promising results. Chapter 12 is devoted to describing some of the wavelet-based techniques that are being used in astronomy.

Four appendices contain complementary material. Appendix A provides a brief introduction to the language and theory of filters. Classical results on the continuous wavelet transform and its inversion are presented in Appendix B.
The results in Appendix B apply in particular to the inversion formula used in Chapter 10 for the analysis of Riemann's function. Appendix C contains a presentation of a counterexample to a conjecture about zero-crossings; thus it is properly an appendix to Chapter 8. Although this counterexample has been known for some time, this is the first time that a complete account has been published. Appendix D contains the definitions and a few basic results about Hölder spaces and Besov spaces. These spaces are used elsewhere in the book, particularly in Chapters 9 and 11.
CHAPTER 2

Wavelets from a Historical Perspective

2.1 Introduction

Time-frequency wavelets, which began with work by Dennis Gabor and by John von Neumann in the late 1940s, have a relatively long history in signal processing. Many of the fundamental contributions were subsequently achieved by physicists, and here we are thinking of work by Francis Low, Roger Balian, and Kenneth Wilson. Time-frequency wavelets have been widely used in speech processing, as will be shown in Chapter 6. Mathematicians did not pay much attention to this field.

In contrast, the use of time-scale wavelets for signal and image processing is relatively recent, dating from the 1980s. However, in looking back over the history of mathematics, we will uncover at least seven different origins of wavelet analysis. Most of this work was done around the 1930s, and at that time the separate efforts did not appear to be parts of a coherent theory. Only today do we see how this work fits into the history of the theory of wavelets.

We feel that it is important to describe these sources in some detail. Each of them corresponds to a specific point of view and a particular technique, which only now are we able to view from a common scientific perspective. What's more, these specific techniques were rediscovered several years ago by physicists and mathematicians working on wavelets. Matthias Holschneider used, without knowing it, Lusin's technique (1930) to analyze Riemann's function (sections 2.6 and 10.4). Grossmann and Morlet rediscovered Alberto Calderón's identity (1960) 20 years later. And to spare no one, Yves Meyer was not the first to construct a regular, well-localized orthonormal wavelet basis having the algorithmic structure of Haar's system (1909): J.-O. Strömberg had done the same thing five years earlier [245]. Does this mean that everything had already been written? Not at all.
More significantly, by rediscovering a number of known results, the "modern" wavelet investigators gave them new life and authority. Our debt to Grossmann and Morlet is not so much for having rediscovered Calderón's identity as it is for having used it to analyze nonstationary signals. This early application of wavelets to signal processing certainly encountered resistance, and Calderón himself found this use of his work incongruous.

The recent history of wavelets has been characterized by another phenomenon that we find scientifically important and sociologically interesting. From the beginning in the early 1980s, the "wavelet group" has consisted of researchers from several quite different disciplines, having different cultures and problems. The exchanges within this group created a dynamic environment that we believe accounts for the rapid advances seen on at least two fronts: the synthesis and structuring of previous and new knowledge to produce a coherent theory of wavelets, and the
rapid adoption of wavelet techniques in diverse disciplines outside mathematics. Not surprisingly, the most active interface has been between mathematics and signal processing, and it can be fairly said that most applications in other fields have been through signal or image processing. But we wish to emphasize that the "flow" has been in both directions, and that mathematics has greatly profited by input from the other sciences. The most spectacular example (which is described in section 2.10) is the construction by Ingrid Daubechies of her celebrated orthonormal bases. As will be explained, this construction benefited from work in signal processing. Another example is Stéphane Jaffard's work on the analysis of multifractal functions, which was influenced by work on turbulence by Alain Arnéodo and his team.

The history of wavelets is reminiscent of the recent history of fractals. Fractal objects, long before the name was coined, appeared in mathematics more than a century ago. Georg Cantor's triadic set is a prime example. In the nineteenth century, nobody would have suspected that fractals could be used to model natural phenomena in physics and chemistry, as Mandelbrot later initiated. The success of this modeling led some scientists to speak about "a theory of fractals," a theory that has been met with skepticism by certain mathematicians. Nevertheless, one must acknowledge that Mandelbrot's scientific vision has been a source of inspiration in contemporary mathematics. At a more artificial level, both fractals and wavelets involve "scale," and so they enjoy a natural relationship, as we hope to show in Chapters 9 and 10.

Our immediate objective in this chapter is to describe the links between signal processing and the different mathematical efforts that developed outside the "theory of wavelets"; a larger objective is to show how wavelet-based techniques in signal processing have been applied to disciplines outside mathematics.
2.2 From Fourier (1807) to Haar (1909), frequency analysis becomes scale analysis

We first go back to the origins, that is, to Joseph Fourier. As is well known, Fourier asserted in 1807 that any $2\pi$-periodic function $f$ could be represented by the sum

$$a_0 + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx),$$

which is called its Fourier series. The coefficients $a_0$, $a_k$, and $b_k$ ($k \ge 1$) are given by

$$a_0 = \frac{1}{2\pi} \int_0^{2\pi} f(x)\,dx$$

and by

$$a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx\,dx, \qquad b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx\,dx.$$

When Fourier announced his surprising results, neither the notion of function nor that of integral had yet received a precise definition. We can say conservatively that the mathematical justification of Fourier's statement played an essential role in the evolution of the ideas mathematicians have had about these concepts.³

³ We recommend Fourier Series and Wavelets by J.-P. Kahane and P. G. Lemarié-Rieusset [164] for the history of Fourier series.
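The coefficient formulas above can be tested directly. The sketch below is ours (the test function and tolerances are illustrative): it approximates $a_0$, $a_k$, $b_k$ by Riemann sums and checks them against the classical sawtooth series, for which $a_0 = \pi$, $a_k = 0$, and $b_k = -2/k$.

```python
import numpy as np

def fourier_coefficients(f, K, N=200000):
    # Approximate a_0, a_k, b_k (k = 1..K) of a 2*pi-periodic function
    # by Riemann sums over N equally spaced points of [0, 2*pi).
    x = np.arange(N) * (2.0 * np.pi / N)
    h = 2.0 * np.pi / N
    y = f(x)
    a0 = np.sum(y) * h / (2.0 * np.pi)
    a = [np.sum(y * np.cos(k * x)) * h / np.pi for k in range(1, K + 1)]
    b = [np.sum(y * np.sin(k * x)) * h / np.pi for k in range(1, K + 1)]
    return a0, a, b

# Sawtooth f(x) = x on [0, 2*pi): the exact values are
# a_0 = pi, a_k = 0, b_k = -2/k.
a0, a, b = fourier_coefficients(lambda x: x, K=5)
print(a0, a, b)
```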
Before Fourier's work, power series were used to represent and manipulate functions, and thus the most general functions that could be constructed were endowed with very special properties. Furthermore, these properties were unconsciously associated with the notion of function itself. By passing from a representation of the form

$$a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots$$

to one of the form

$$a_0 + (a_1 \cos x + b_1 \sin x) + (a_2 \cos 2x + b_2 \sin 2x) + \cdots,$$

Fourier discovered, without knowing it, a new functional universe.

In 1873, Paul Du Bois-Reymond constructed a continuous, $2\pi$-periodic function of the real variable $x$ whose Fourier series diverged at a given point. If Fourier's assertion were true, it could not be so in the naive sense imagined by Fourier. At that time, three new avenues were opened to mathematicians, and all three have led to important results:

(1) They could modify the notion of function and find one that is adapted, in a certain sense, to Fourier series.

(2) They could modify the definition of convergence of Fourier series.

(3) They could find other orthogonal systems for which the divergence phenomenon discovered by Du Bois-Reymond in the case of the trigonometric system cannot happen.

The functional concept best suited to Fourier series was created by Henri Lebesgue. It involves the space $L^2[0,2\pi]$ of functions that are square integrable on the interval $[0,2\pi]$. The sequence

$$\frac{1}{\sqrt{2\pi}},\ \frac{1}{\sqrt{\pi}}\cos x,\ \frac{1}{\sqrt{\pi}}\sin x,\ \frac{1}{\sqrt{\pi}}\cos 2x,\ \frac{1}{\sqrt{\pi}}\sin 2x,\ \ldots \qquad (2.1)$$

is an orthonormal basis for this space. Furthermore, the coefficients of the decomposition in this orthonormal basis form a square-summable series, and this expresses the conservation of energy: The quadratic mean value of the expanded function $f$ is (up to a normalization factor) the sum of the squares of the coefficients. Finally, the Fourier series of $f$ converges to $f$ in the sense of the quadratic mean.
The second way that was followed to avoid the difficulty raised by Du Bois-Reymond was to modify the notion of convergence. If the partial sums $s_n$ are replaced by the Cesàro sums $\sigma_n = (s_0 + \cdots + s_{n-1})/n$, then everything falls into place: The Fourier series of a continuous function $f$ converges uniformly to $f$.

The third route leads to wavelets. This was followed by Haar, who asked himself this question in his thesis: Does there exist another orthonormal system $h_0, h_1, \ldots, h_n, \ldots$ of functions defined on $[0,1]$ such that for any continuous function $f$ defined on $[0,1]$, the series

$$\langle f, h_0\rangle h_0(x) + \langle f, h_1\rangle h_1(x) + \cdots + \langle f, h_n\rangle h_n(x) + \cdots$$

converges to $f(x)$ uniformly on $[0,1]$? Here we have written

$$\langle u, v\rangle = \int_0^1 u(x)\,\overline{v(x)}\,dx,$$
where $\overline{v(x)}$ is the complex conjugate of $v(x)$, and we have chosen the interval $[0,1]$ for convenience. As we will see, this problem has an infinite number of solutions. In 1909, Haar discovered the simplest solution and at the same time opened one of the routes leading to wavelets [134].

Haar begins with the function $h$ such that $h(x) = 1$ for $x \in [0, \frac12)$, $h(x) = -1$ for $x \in [\frac12, 1)$, and $h(x) = 0$ for $x \notin [0,1)$. For $n \ge 1$, he writes $n = 2^j + k$, $j \ge 0$, $0 \le k < 2^j$, and defines $h_n(x) = 2^{j/2} h(2^j x - k)$. The support of $h_n$ is exactly the dyadic interval $I_n = [k2^{-j}, (k+1)2^{-j})$, which is included in $[0,1)$ when $0 \le k < 2^j$. To complete the set, define $h_0(x) = 1$ on $[0,1)$. Then the sequence $h_0, h_1, \ldots, h_n, \ldots$ is an orthonormal basis (also called a Hilbert basis) for $L^2[0,1]$. The uniform approximation of $f(x)$ by the sequence

$$S_n(f)(x) = \langle f, h_0\rangle h_0(x) + \cdots + \langle f, h_n\rangle h_n(x)$$

is nothing more than the classical approximation of a continuous function by step functions whose values are the mean values of $f(x)$ on the appropriate dyadic intervals.

We can criticize the Haar construction on a couple of points. On one hand, the atoms $h_n$ used to construct the continuous function $f$ are not themselves continuous functions, and thus there is a lack of coherence. But there is a more serious criticism. Suppose that instead of being continuous on the interval $[0,1]$, $f$ is a $C^1$ function, which means that $f$ is continuous and has a continuous derivative. Then the approximation of $f$ by step functions would be completely inappropriate. In this case, a suitable approximation would be the one created from the graph of $f$ by inscribing polygonal lines.

These two defects of the Haar system and the idea of approximating the graph of $f(x)$ with inscribed polygonal lines led Faber and Schauder to replace the functions $h_n$ of the Haar system by their primitives. This research began in 1910 and continued until 1920. Define the "triangle function" $\Delta$ by $\Delta(x) = 0$ if $x \notin [0,1]$, $\Delta(x) = 2x$ if $0 \le x \le \frac12$, and $\Delta(x) = 2(1-x)$ if $\frac12 \le x \le 1$. Faber [100] and Schauder [234] considered the sequence $\Delta_n$, $n \ge 1$, defined by $\Delta_n(x) = \Delta(2^j x - k)$ for $n = 2^j + k$, $j \ge 0$, $0 \le k < 2^j$. The support of $\Delta_n$ is the dyadic interval $I_n = [k2^{-j}, (k+1)2^{-j}]$, and on this interval, $\Delta_n$ is the primitive of $h_n$ multiplied by $2 \cdot 2^{j/2}$. For $n = 0$, we set $\Delta_0(x) = x$, and we add the function $\Delta_{-1}(x) = 1$ to complete the set of functions. Then the sequence $\Delta_{-1}, \Delta_0, \ldots, \Delta_n, \ldots$ is a Schauder basis for the Banach space $E$ of continuous functions on $[0,1]$. This means that every continuous function $f$ on $[0,1]$ can be written as

$$f(x) = a + bx + \sum_{n=1}^{\infty} \alpha_n \Delta_n(x) \qquad (2.2)$$

and that the series has the following properties: The convergence is uniform on $[0,1]$ and the coefficients are unique. We note that the Haar system is not a Schauder basis of $E$ because a Schauder basis of a Banach space must be made up of vectors of that space, and the functions $h_n$ are not continuous.
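Haar's construction is simple enough to reproduce in a few lines. This sketch (ours) samples $h_0, \ldots, h_7$ on a fine dyadic grid and checks that their Gram matrix is the identity, i.e., that the system is orthonormal.

```python
import numpy as np

def haar(n, x):
    # Haar functions on [0,1): h_0 = 1, and for n = 2**j + k
    # (j >= 0, 0 <= k < 2**j), h_n(x) = 2**(j/2) h(2**j x - k),
    # where h = +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere.
    if n == 0:
        return np.ones_like(x)
    j = int(np.floor(np.log2(n)))
    k = n - 2**j
    u = 2.0**j * x - k
    h = np.where((0 <= u) & (u < 0.5), 1.0,
                 np.where((0.5 <= u) & (u < 1), -1.0, 0.0))
    return 2.0**(j / 2) * h

x = np.linspace(0.0, 1.0, 2**16, endpoint=False)
dx = x[1] - x[0]
# Gram matrix of h_0, ..., h_7: orthonormality means G is the identity.
H = np.array([haar(n, x) for n in range(8)])
G = (H @ H.T) * dx
print(np.round(G, 8))
```

Because the grid is dyadic and the $h_n$ are constant on each grid cell, the discrete inner products here coincide with the exact integrals.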
The coefficients in (2.2) can be calculated directly by induction. We have $f(0) = a$ and $f(1) = a + b$, which gives $a$ and $b$. This allows us to consider the function $f(x) - a - bx$, which is zero at $x = 0$ and $x = 1$ (and which we again denote by $f$). Once this reduction is made, we have $f(\frac12) = \alpha_1$, which allows us to consider a function equal to zero at $x = 0$, $x = \frac12$, and $x = 1$. The calculation continues with $f(\frac14) = \alpha_2$ and $f(\frac34) = \alpha_3$, and so on. If we do not wish to "peel" $f$ this way, the coefficients $\alpha_n$ can be computed directly by the formula

$$\alpha_n = f\big((k + \tfrac12)2^{-j}\big) - \tfrac12\big[f(k2^{-j}) + f((k+1)2^{-j})\big], \qquad (2.3)$$

where $n = 2^j + k$, $j \ge 0$, and $0 \le k < 2^j$.

We can give a further interpretation to (2.2). If, instead of being continuous, $f$ were in $C^1$, then we could differentiate (2.2) term by term and obtain the expansion of $f'$ in the Haar basis. If $f$ is in $C^1$, the series (2.2) converges uniformly to $f$ and the series differentiated term by term converges uniformly to $f'$. Does this mean that the functions $\Delta_n$, $n \ge 0$, with the added function 1, constitute a Schauder basis for the Banach space $C^1[0,1]$? As before, this is not the case, because the functions $\Delta_n$ do not belong to the space in question.

Following Hölder, we define the space $C^h[0,1]$, for $0 < h < 1$, by the relation $|f(x) - f(y)| \le C|x - y|^h$ for some constant $C > 0$ and for all $x, y \in [0,1]$. Then it is clear from (2.3) that $|\alpha_n| \le C' 2^{-jh}$ if $f$ belongs to $C^h$. Since $2^j \le n < 2^{j+1}$, we can also write $|\alpha_n| \le C' n^{-h}$, $n \ge 1$. The converse, although much less evident, is nevertheless true when $0 < h < 1$. It is not true if $h = 1$.

Physicists are interested in the Hölder spaces $C^h$ because they occur naturally in the study of fractal structures. In fact, physicists wish to know more. They are interested in functions $f$ whose Hölder exponents $h(x_0)$ vary from one point to another.
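Formula (2.3) can be tested numerically. In the sketch below (ours; the test function and range of scales are arbitrary choices), $f(x) = \sqrt{|x - 1/3|}$ is Hölder with exponent $h = 1/2$ at $x = 1/3$ and smooth elsewhere, so the level-$j$ maximum of $|\alpha_n|$ should decay roughly like $2^{-j/2}$, in line with the bound $|\alpha_n| \le C' 2^{-jh}$.

```python
import numpy as np

def schauder_coeff(f, j, k):
    # alpha_n for n = 2**j + k, by the finite-difference formula (2.3):
    # alpha_n = f((k + 1/2) 2**-j) - [f(k 2**-j) + f((k+1) 2**-j)] / 2.
    return f((k + 0.5) * 2.0**-j) - 0.5 * (f(k * 2.0**-j) + f((k + 1) * 2.0**-j))

# f is Hoelder of exponent 1/2 at x = 1/3 and smooth elsewhere, so the
# largest level-j coefficient should shrink roughly like 2**(-j/2).
f = lambda x: np.sqrt(abs(x - 1.0 / 3.0))
levels = range(4, 14)
ms = [max(abs(schauder_coeff(f, j, k)) for k in range(2**j)) for j in levels]
slope = (np.log2(ms[-1]) - np.log2(ms[0])) / (len(ms) - 1)
print(ms[0], ms[-1], slope)   # the log2-slope is close to -1/2
```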
This pointwise definition is slightly different: We say that $f$ satisfies a Hölder condition of exponent $h$, $0 < h < 1$, at a point $x_0$ if

$$|f(x) - f(x_0)| \le C|x - x_0|^h. \qquad (2.4)$$

More generally, if $m < h < m+1$, $m \in \mathbb{N}$, then this definition should read

$$|f(x) - P_m(x - x_0)| \le C|x - x_0|^h, \qquad (2.5)$$

where $P_m$ is a polynomial of degree $m$. Then the Hölder exponent of $f$ at $x_0$ is denoted by $h(x_0)$ and defined to be the supremum of the $h$ that satisfy (2.5).

Contemporary science deals with numerous physical phenomena having multifractal structures. By this we mean that the Hölder exponents of the function representing the structure vary from point to point in a particularly erratic way. To be more precise, we consider the set of points $E_\alpha$ where the Hölder exponent $h(x_0)$ takes the value $\alpha$. If these $E_\alpha$ are fractal sets, we say that $f$ is multifractal. In this case, scientists are interested in determining the Hausdorff dimension $d(\alpha)$ of $E_\alpha$ as a function of $\alpha$. (Hausdorff dimension is defined in section 9.10.)

An example from mathematics of a multifractal object is the celebrated "nondifferentiable" function attributed to Bernhard Riemann, which is defined by $\sum_{n=1}^{\infty} \sin(\pi n^2 x)/(\pi n^2)$. This example illustrates the point that the Fourier series of a function provides no directly accessible information about the function's multifractal structure. By using the wavelets of Lusin (which we present in section 2.6), Holschneider and Tchamitchian obtained a new proof of Gerver's theorem
[127, 128], which states that Riemann's function is differentiable at certain rational multiples of $\pi$ [144]. More recently, Stéphane Jaffard has provided a complete analysis of the multifractal nature of Riemann's function using wavelet techniques [153]. We describe this work in Chapter 10.

A second example is the signal coming from fully developed turbulence. The multifractal structure of this signal has been studied by Alain Arnéodo and his collaborators. We present this example in Chapter 9.

Conceivably, the pointwise Hölder exponents could be computed by going back to the definition. However, the example of the Riemann function shows that such an approach is too crude to yield practical results. Furthermore, for applications outside mathematics, this approach offers no way to take into consideration the inevitable noise that alters a signal. The Schauder basis presents the same difficulties because the calculation of the coefficients $\alpha_n$ (according to (2.3)) calls directly on explicit values of the signal. Today, we are fortunate to have much more subtle ways to attack this problem. Specifically, the pointwise Hölder exponents are now determined using wavelet analysis. The wavelet coefficients replace those given by formula (2.3). They are less sensitive to noise because they measure, at different scales, the average fluctuations of the signal. These methods will be described in Chapters 9 and 10.

2.3 New directions of the 1930s: Paul Lévy and Brownian motion

Brownian motion is a random process. We will limit our discussion to the one-dimensional case. We thus write $X(t,\omega)$ for the Brownian motion: $t$ denotes time, $\omega$ belongs to a probability space $\Omega$, and $X(t,\omega)$ is regarded as a real-valued function of time depending on the parameter $\omega$. To obtain a realization of Brownian motion, we choose a particular orthonormal basis $e_i(t)$, $i \in I$, for the usual Hilbert space $L^2(\mathbb{R})$.
Then we know that the derivative (in the sense of distributions) $\frac{d}{dt}X(t,\omega)$ is written as

$$\frac{d}{dt}X(t,\omega) = \sum_{i \in I} \xi_i(\omega)\,e_i(t),$$

where the $\xi_i(\omega)$, $i \in I$, are independent, identically distributed Gaussian random variables with zero mean. The problem is to choose the best possible representation of Brownian motion. As in all signal processing problems, it is certainly advisable to have in mind what we wish to study.

If we wish to examine the spectral properties of Brownian motion, we are led to select the Fourier representation. The real line is cut into intervals $[2l\pi, 2(l+1)\pi]$, $l \in \mathbb{Z}$, and the trigonometric system is used on each of the intervals. In its real form, this is the trigonometric system (2.1). However, if we wish to highlight the local regularity of Brownian motion, Fourier analysis is inadequate. On the other hand, the analysis using the Schauder basis immediately reveals the Hölder regularity $C^\alpha$, $\alpha < \frac12$, of the Brownian motion trajectories.

We start with the Haar basis for $L^2(\mathbb{R})$ composed of the functions $h_n(t - l)$, $n \ge 0$, $l \in \mathbb{Z}$, and expand the white noise $\frac{d}{dt}X(t,\omega)$ in this orthonormal basis. By taking primitives, we obtain the development of Brownian motion in the Schauder basis. To simplify matters, we restrict the discussion to Brownian motion on the
interval $[0,1]$. For this case, $l = 0$, and

$$X(t,\omega) = g_0(\omega)\,t + \frac12 \sum_{n=1}^{\infty} 2^{-j/2} g_n(\omega)\,\Delta_n(t),$$

where the $g_n(\omega)$ are independent, identically distributed Gaussian random variables with mean zero and variance one. This expansion often is called the "midpoint displacement construction." This refers to the specific geometry of the "error term" $\alpha(j,k)\Delta(2^j t - k)$ in the Schauder basis expansion. Adding this term amounts to moving the midpoint of the preceding (piecewise affine) approximation. This midpoint displacement is precisely $\alpha(j,k)$. In the case of Brownian motion, these displacements are $\frac12\,2^{-j/2} g_n(\omega)$.

To verify that the function $X(t,\omega)$ belongs to the Hölder space $C^\alpha$ for almost all $\omega \in \Omega$, it is sufficient to show that $2^{-j/2}|g_n(\omega)| \le C(\omega)\,2^{-j\alpha}$. If, for almost all $\omega \in \Omega$, one had $\sup_{n \ge 0} |g_n(\omega)| < \infty$, then the trajectories of the Brownian motion would almost surely belong to the space $C^{1/2}$. But this is not the case, and instead we have $\sup_{n \ge 2} \big(|g_n(\omega)|/\sqrt{\log n}\big) < \infty$ for almost all $\omega \in \Omega$. Then the criterion for Hölder regularity gives

$$|X(t+h,\omega) - X(t,\omega)| \le C(\omega)\sqrt{h \log(1/h)},$$

where $C(\omega) < \infty$ for almost all $\omega \in \Omega$. We see from this theorem of Paul Lévy how a representation in a particular basis can provide easy access to certain aspects of a problem. In this case, the Schauder basis provides quick access to local regularity properties of Brownian trajectories. As we were told by Gérard Kerkyacharian, this elegant proof was not given by Lévy, although the tools we are using were available to him. Zbigniew Ciesielski [53] was the first to relate the midpoint displacement construction of Brownian motion to its global regularity.

Fabrice Sellan [2] has extended this analysis to the case of fractional Brownian motion, as it was proposed by Mandelbrot and J. W. van Ness to model certain noise (see also [111]). He has found wavelets that, when suitably normalized, do for fractional Brownian motion what the Schauder basis did for ordinary Brownian motion.
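The midpoint displacement construction described above translates directly into a simulation. The following sketch is ours: starting from the chord $g_0(\omega)t$, each level-$j$ pass displaces the midpoints of the current piecewise-affine path by independent Gaussians of size $\frac12\,2^{-j/2}$, and the increments of the result behave like those of Brownian motion (variance proportional to the step).

```python
import numpy as np

rng = np.random.default_rng(0)

def brownian_midpoint(levels):
    # Levy/Ciesielski midpoint displacement construction on [0, 1]:
    # start from the chord g_0(w) * t, then at level j displace each
    # midpoint by an independent Gaussian of size (1/2) * 2**(-j/2).
    t = np.array([0.0, 1.0])
    X = np.array([0.0, rng.standard_normal()])   # X(0) = 0, X(1) = g_0(w)
    for j in range(levels):
        mid_t = 0.5 * (t[:-1] + t[1:])
        disp = 0.5 * 2.0**(-j / 2) * rng.standard_normal(mid_t.size)
        mid_X = 0.5 * (X[:-1] + X[1:]) + disp
        idx = np.arange(1, t.size)
        t = np.insert(t, idx, mid_t)
        X = np.insert(X, idx, mid_X)
    return t, X

t, X = brownian_midpoint(12)
h = t[1] - t[0]
# Brownian increments over steps of length h have variance close to h.
print(h, np.var(np.diff(X)) / h)
```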
The coefficients in Sellan's basis are uncorrelated Gaussians. This representation allows one to simulate precisely and efficiently the long-range correlations found in fractional Brownian motion. (For more about this, see the note at the end of Chapter 4.) Albert Benassi, Stéphane Jaffard, and Daniel Roux have generalized these ideas to the "elliptic Gaussian fields" [31]. This work demonstrates that multiresolution methods are well adapted to the analysis and synthesis of some Gaussian processes.

2.4 New directions of the 1930s: Littlewood and Paley

We have shown with the example of Brownian motion how the Schauder basis provides direct and easy access to local regularity properties. On the other hand, the analysis of these properties using the trigonometric system is considerably more involved.

Similar difficulties are encountered when we try to localize the energy of a function. To be more precise, we first focus on $2\pi$-periodic functions $f$ and their Fourier
series expansions. The integral $\frac{1}{2\pi}\int_0^{2\pi} |f(x)|^2\,dx$, which is the mean value of the energy, is given directly by the sum of the squares of the Fourier coefficients. However, it is often important to know if the energy is concentrated around a few points or if it is distributed over the whole interval $[0,2\pi]$. This determination can be made by calculating $\frac{1}{2\pi}\int_0^{2\pi} |f(x)|^4\,dx$ or, more generally, $\frac{1}{2\pi}\int_0^{2\pi} |f(x)|^p\,dx$ for $2 < p < \infty$. When the energy is concentrated around a few points, this integral will be much larger than the mean value of the energy, while it will be the same order of magnitude when the energy is evenly distributed. We write $\|f\|_p = \big(\frac{1}{2\pi}\int_0^{2\pi} |f(x)|^p\,dx\big)^{1/p}$ and, for obvious reasons of homogeneity, we compare the norms $\|f\|_p$ to determine if the energy is concentrated or dispersed.

But if $p$ is different from 2, we can neither calculate nor even estimate these norms $\|f\|_p$ by examining the Fourier coefficients of $f$. The information needed for this calculation is hidden in the Fourier series of $f$; to reveal it, it is necessary to subject the series to manipulations that were discovered by Littlewood and Paley as long ago as 1930.

Littlewood and Paley define the dyadic blocks $\Delta_j f$ by

$$\Delta_j f(x) = \sum_{2^j \le k < 2^{j+1}} (a_k \cos kx + b_k \sin kx),$$

where $a_0 + \sum_{k \ge 1}(a_k \cos kx + b_k \sin kx)$ denotes the Fourier series of $f$. Then

$$f(x) = a_0 + \sum_{j=0}^{\infty} \Delta_j f(x),$$

and the fundamental result of Littlewood and Paley is that there exist, for each $p$, $1 < p < \infty$, two constants $C_p \ge c_p > 0$ such that

$$c_p \|f\|_p \le \Big\| \Big( |a_0|^2 + \sum_{j=0}^{\infty} |\Delta_j f|^2 \Big)^{1/2} \Big\|_p \le C_p \|f\|_p. \qquad (2.6)$$

If $p = 2$, $C_p = c_p = 1$, and there is equality in (2.6).

Up to this point, wavelets have not yet appeared. The path that leads from the work of Littlewood and Paley to wavelet analysis passes through the research done by Antoni Zygmund's group at the University of Chicago. Zygmund and the mathematicians around him sought to extend to $n$-dimensional Euclidean space the results obtained in the one-dimensional periodic case by Littlewood and Paley. It was at this point that a "mother wavelet" $\psi$ appeared.
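For trigonometric polynomials the dyadic blocks are easy to form explicitly. In this sketch (ours; the random coefficients are illustrative), we group the frequencies $2^j \le k < 2^{j+1}$, check that the blocks reassemble $f$, and verify the $p = 2$ case, where the square function (with the constant term included) has exactly the same quadratic mean as $f$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2**12
x = 2.0 * np.pi * np.arange(N) / N
K = 2**7                                  # highest frequency present

a = rng.standard_normal(K + 1)
b = rng.standard_normal(K + 1)
a0 = a[0]
f = a0 + sum(a[k] * np.cos(k * x) + b[k] * np.sin(k * x) for k in range(1, K + 1))

# Dyadic blocks: Delta_j f collects the frequencies 2**j <= k < 2**(j+1).
blocks = [sum(a[k] * np.cos(k * x) + b[k] * np.sin(k * x)
              for k in range(2**j, min(2**(j + 1), K + 1)))
          for j in range(8)]

recon = a0 + sum(blocks)                  # f = a_0 + sum_j Delta_j f
g = np.sqrt(a0**2 + sum(B**2 for B in blocks))   # square function
print(np.allclose(f, recon), np.mean(f**2), np.mean(g**2))
```

The two quadratic means agree to machine precision, since for $p = 2$ the blocks are mutually orthogonal.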
It is an infinitely differentiable, rapidly decreasing function, defined on the Euclidean space $\mathbb{R}^n$, whose Fourier transform $\hat\psi$ satisfies the following four conditions, where $\alpha$ is chosen by hypothesis in the interval $(0, \frac13]$:

(1) $\hat\psi(\xi) = 1$ if $1 + \alpha \le |\xi| \le 2 - 2\alpha$.

(2) $\hat\psi(\xi) = 0$ if $|\xi| \le 1 - \alpha$ or $|\xi| \ge 2 + 2\alpha$.

(3) $\hat\psi(\xi)$ is infinitely differentiable on $\mathbb{R}^n$.

(4) $\sum_{j=-\infty}^{\infty} |\hat\psi(2^{-j}\xi)|^2 = 1$ for all $\xi \ne 0$.

Condition (4) is not as complicated as it appears. It is sufficient to verify it for $1 - \alpha \le |\xi| \le 2 - 2\alpha$, and then only two cases arise: If $1 - \alpha \le |\xi| \le 1 + \alpha$,
condition (4) reduces to $|\hat\psi(\xi)|^2 + |\hat\psi(2\xi)|^2 = 1$, while if $1 + \alpha \le |\xi| \le 2 - 2\alpha$, it is automatically satisfied since one term is equal to one and all the others are zero. Condition (4) implies that the analysis of Littlewood-Paley-Stein (whose definition will be given in a moment) conserves $L^2$ energy. In the one-dimensional case, this same condition is satisfied by every mother wavelet $\psi$ having the property that $2^{j/2}\psi(2^j x - k)$, $j, k \in \mathbb{Z}$, is an orthonormal basis for $L^2(\mathbb{R})$. It also anticipates similar conditions shared by the quadrature mirror filters (Chapter 3) and the Malvar-Wilson wavelets (Chapter 6).

The theory for $\mathbb{R}^n$ proceeds by setting $\psi_j(x) = 2^{nj}\psi(2^j x)$ and replacing the dyadic blocks of Littlewood and Paley with the convolutions $\Delta_j(f) = f * \psi_j$. The Littlewood-Paley-Stein function is defined by

$$g(x) = \Big( \sum_{j=-\infty}^{\infty} |\Delta_j f(x)|^2 \Big)^{1/2}.$$

If $f$ belongs to $L^2(\mathbb{R}^n)$, the same is true for $g$, and $\|f\|_2 = \|g\|_2$ (the conservation of energy). If $1 < p < \infty$, there exist two constants $C_p \ge c_p > 0$ such that for all functions $f$ belonging to $L^p(\mathbb{R}^n)$,

$$c_p \|g\|_p \le \|f\|_p \le C_p \|g\|_p, \qquad (2.7)$$

where

$$\|f\|_p = \Big( \int_{\mathbb{R}^n} |f(x)|^p\,dx \Big)^{1/p}.$$

The Littlewood-Paley-Stein function $g$ provides a method for analyzing $f$ in which a major role is played by the ability to vary arbitrarily the scales used in the analysis; by the same token, the notion of frequency plays a minor role. The dilations of size $2^j$ are present in the definition of the operators $\Delta_j$. Nevertheless, conditions (1) and (2) endow these operators with a frequency content. The sequence of operators $\Delta_j$, $j \in \mathbb{Z}$, constitutes a bank of band-pass filters, oriented on frequency intervals covering approximately one octave. Littlewood-Paley techniques have been extensively developed by Stein and his collaborators. We refer to [242], [243], and [114] for detailed descriptions of their applications in analysis.
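Condition (4) can be realized by a standard telescoping trick. The sketch below is ours and does not reproduce the exact profile of conditions (1) and (2) (our $|\hat\psi|^2$ lives on $1 < |\xi| < 4$); it sets $|\hat\psi(\xi)|^2 = \theta(|\xi|/2) - \theta(|\xi|)$ for a smooth cutoff $\theta$, so that the dyadic sum in condition (4) telescopes to 1.

```python
import numpy as np

def theta(r):
    # Smooth cutoff: theta = 1 for r <= 1, theta = 0 for r >= 2,
    # glued by the classical C-infinity bump in between.
    out = np.zeros_like(r)
    out[r <= 1] = 1.0
    mid = (r > 1) & (r < 2)
    s = r[mid] - 1.0
    bump = np.exp(-1.0 / s) / (np.exp(-1.0 / s) + np.exp(-1.0 / (1.0 - s)))
    out[mid] = 1.0 - bump
    return out

def psi_hat_sq(xi):
    # |psi_hat(xi)|^2 = theta(|xi|/2) - theta(|xi|) >= 0, supported
    # on 1 < |xi| < 4; the dyadic sum in condition (4) telescopes.
    r = np.abs(xi)
    return theta(r / 2.0) - theta(r)

xi = np.linspace(0.1, 50.0, 1000)
total = sum(psi_hat_sq(2.0**(-j) * xi) for j in range(-30, 31))
print(total.min(), total.max())   # both are 1 up to rounding
```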
Thanks to the work of Marr and Mallat (which we describe in Chapter 8), the Littlewood-Paley analysis provides an effective algorithm for numerical image processing.

2.5 New directions of the 1930s: The Franklin system

In 1927, Philip Franklin, who was a professor at the Massachusetts Institute of Technology, had the idea to create an orthonormal basis from the Schauder basis by using the Gram-Schmidt process. This produces a sequence $(f_n)_{n \ge -1}$ beginning with $f_{-1}(x) = 1$, $f_0(x) = 2\sqrt{3}\,(x - \frac12), \ldots$, which is an orthonormal basis for $L^2[0,1]$. This sequence $(f_n)_{n \ge -1}$ is called the Franklin system and satisfies

$$\int_0^1 f_n(x)\,dx = \int_0^1 x f_n(x)\,dx = 0 \quad \text{for } n \ge 1.$$

The Franklin system has advantages over both the Schauder basis and the Haar basis. It can be used to decompose any function $f$ in $L^2[0,1]$, which the Schauder
basis does not allow, and it can be used to characterize the spaces $C^\alpha$, $0 < \alpha < 1$, by the relation $|\langle f, f_n\rangle| \le C n^{-1/2-\alpha}$, which the Haar basis does not allow. Thus the Franklin system works as well in relatively regular situations as it does in relatively irregular situations.

The weakness of the Franklin basis is that it no longer has a simple algorithmic structure. The functions of the Franklin basis, unlike those of the Haar basis or those of the Schauder basis, are not derived from a fixed function by integer translations and dyadic dilations. This defect caused the Franklin system to be unattractive for applications. Zbigniew Ciesielski revived the forgotten Franklin system in 1963 by showing that it is localized [54, 55]. There exist an exponent $\gamma > 0$ and a constant $C > 0$ such that

$$|f_n(x)| \le C 2^{j/2} \exp(-\gamma|2^j x - k|)$$

if $0 \le x \le 1$, $n = 2^j + k$, $0 \le k < 2^j$, and

$$|f_n'(x)| \le C 2^{3j/2} \exp(-\gamma|2^j x - k|).$$

Thus, on a mathematical level, everything works as if $f_n(x) = 2^{j/2}\varphi(2^j x - k)$, where $\varphi$ is a Lipschitz function having exponential decay.

Today we are aware of a much closer relationship between the Franklin system and wavelets (see [150]). Asymptotically the functions of the Franklin system become arbitrarily close to the orthonormal wavelet basis discovered by Strömberg in 1980. In fact, for $n = 2^j + k$, $0 \le k < 2^j$,

$$f_n(x) = 2^{j/2}\psi(2^j x - k) + r_n(x)$$

where, for a certain constant $C$,

$$\|r_n\|_2 \le C(2 - \sqrt{3})^{d(n)}, \qquad d(n) = \inf(k, 2^j - k). \qquad (2.8)$$

The function $\psi$, which was discovered in 1980 by Strömberg, is completely explicit. It has the following three properties:

(1) $\psi$ is continuous on the whole real line, it is linear on the intervals $[1,2], [2,3], \ldots, [l, l+1], \ldots$, and it is linear on the intervals $[\frac{l}{2}, \frac{l+1}{2}]$, $l \le 1$.

(2) $|\psi(x)| \le C(2 - \sqrt{3})^{|x|}$.

(3) $2^{j/2}\psi(2^j x - k)$, $j, k \in \mathbb{Z}$, is an orthonormal basis for $L^2(\mathbb{R})$.

Note that $(2 - \sqrt{3}) < 1$; hence condition (2) means that $\psi$ decreases rapidly at infinity, and (2.8) means that $\|r_n\|_2$ is small when $d(n)$ is large.
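Franklin's construction can be imitated numerically. In this sketch (ours), the $L^2[0,1]$ inner products are replaced by trapezoid-rule sums on a fine grid; Gram-Schmidt applied to $1, x, \Delta_1, \Delta_2, \ldots$ then reproduces $f_0(x) = 2\sqrt{3}\,(x - \frac12)$ and the vanishing moments $\int_0^1 f_n = \int_0^1 x f_n = 0$ for $n \ge 1$.

```python
import numpy as np

N = 4096
x = np.linspace(0.0, 1.0, N + 1)
w = np.full(N + 1, 1.0 / N)
w[0] = w[-1] = 0.5 / N                      # trapezoid-rule weights on [0, 1]
inner = lambda u, v: float(np.sum(u * v * w))

def tent(j, k):
    # Schauder function Delta_n, n = 2**j + k: a tent of height 1
    # supported on the dyadic interval [k 2**-j, (k+1) 2**-j].
    return np.maximum(0.0, 1.0 - 2.0 * np.abs(2.0**j * x - k - 0.5))

system = [np.ones_like(x), x.copy()]        # 1, x, then the tents
for j in range(4):
    for k in range(2**j):
        system.append(tent(j, k))

franklin = []                               # Gram-Schmidt, as in 1927
for v in system:
    u = v.copy()
    for e in franklin:
        u = u - inner(u, e) * e
    franklin.append(u / np.sqrt(inner(u, u)))

err = np.max(np.abs(franklin[1] - 2.0 * np.sqrt(3.0) * (x - 0.5)))
moments = [max(abs(inner(fn, np.ones_like(x))), abs(inner(fn, x)))
           for fn in franklin[2:]]
print(err, max(moments))   # f_0 = 2*sqrt(3)(x - 1/2); moments vanish
```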
2.6 New directions of the 1930s: The wavelets of Lusin

This section is in the right place historically, but scientifically it should come after the next section, since Lusin's work is an example of continuous wavelet expansions. The interpretation of Lusin's work in terms of the theory of wavelets would probably astonish its author. But it is certainly the best reading, the one that gives the greatest beauty to Lusin's work.

We begin by introducing the object of Lusin's study, namely, the Hardy spaces $H^p(\mathbb{R})$, where $1 < p < \infty$. Let $P$ denote the open, upper half-plane defined by $z = x + iy$ and $y > 0$. A function $f(x+iy)$ belongs to $H^p(\mathbb{R})$ if it is holomorphic in the half-plane $P$ and if

$$\sup_{y>0} \Big( \int_{-\infty}^{\infty} |f(x+iy)|^p\,dx \Big)^{1/p} < \infty. \qquad (2.9)$$

When this condition is satisfied, the upper bound, taken over $y > 0$, is also the limit as $y$ tends to zero. Furthermore, $f(x+iy)$ converges to a function denoted by $f(x)$ when $y$ tends to zero, where convergence is in the sense of the $L^p$ norm. The space $H^p(\mathbb{R})$ can thus be identified with a closed subspace of $L^p(\mathbb{R})$, which explains the notation.

Hardy spaces play a fundamental role in signal processing. One associates with a real-valued signal $f$, defined for all $t \in \mathbb{R}$ and of finite energy, the analytic signal $F$ for which $f$ is the real part. By hypothesis, the energy of $f$ is $\int_{-\infty}^{\infty} |f(t)|^2\,dt < \infty$, and we require that $F$ have finite energy as well. This implies that $F$ belongs to the Hardy space $H^2(\mathbb{R})$. Then $F(t) = f(t) + ig(t)$, and the function $g$ is the Hilbert transform of $f$. For further information about analytic signals, the reader may refer to [222]. One may also consult the remarkable exposition by Jean Ville [254].

Read in the light of the theory of wavelets, Lusin's work concerns the analysis and synthesis of functions in $H^p(\mathbb{R})$ using "atoms" or "basis elements," which are the elementary functions of $H^p(\mathbb{R})$.
In fact, these are the functions (z − ζ)^{−2}, where the parameter ζ = u + iv belongs to P. In Lusin's work, the Hardy space H^p(ℝ) was used as a tool to provide a better understanding of the space L^p(ℝ). More specifically, singular operators were shown to be bounded on L^p(ℝ) by first studying their action on H^p(ℝ). In the latter case, such an operator is understood through its action on the building blocks (z − ζ)^{−2}, ζ ∈ P. Thus one wishes to obtain effective and robust representations of the functions f in H^p(ℝ) of the form

f(z) = ∬_P (z − ζ)^{−2} α(ζ) du dv, (2.10)

where ζ = u + iv and where α(ζ) plays the role of the coefficients. These coefficients should be simple to calculate, and their order of magnitude should provide an estimate of the norm of f in H^p(ℝ). Furthermore, we are interested in relating the decomposition of f given by (2.10) to a wavelet decomposition as defined in the next section. The synthesis is obtained by the following rule. We start with an arbitrary measurable function α(ζ), subject only to the following condition introduced by Lusin: The quadratic functional A defined by

A(x) = ( ∬_{Γ(x)} |α(u + iv)|² v^{−2} du dv )^{1/2},
where Γ(x) = {(u, v) ∈ ℝ² | v > |u − x|}, must be such that ∫_{−∞}^{∞} (A(x))^p dx is finite. Note that this condition involves only the modulus of the coefficients α(ζ). If the integral ∫_{−∞}^{∞} (A(x))^p dx is finite, then necessarily f(z) = ∬ (z − ζ)^{−2} α(ζ) du dv belongs to H^p(ℝ), and if 1 < p < ∞,

‖f‖_p ≤ C(p) ( ∫_{−∞}^{∞} (A(x))^p dx )^{1/p}. (2.11)

The left member of (2.11) is the norm of f in H^p(ℝ), as defined by (2.9). The estimate given by the right member of (2.11) is sometimes very crude. If, for example, f(x) = (x + i)^{−2} and if one makes the natural choice of the Dirac measure at the point i for α(ζ), then the second member of (2.11) is infinite. This paradox arises because the representation (2.10) is not unique. To obtain a unique decomposition, which we call the natural decomposition, we restrict the choice to α(ζ) = (2i/π) v f′(u + iv). When we do this, the two norms ‖f‖_p and ‖A‖_p become equivalent if 1 < p < ∞. Today this natural choice of coefficients has an interesting explanation. This interpretation, which depends on the contemporary formalism of wavelet theory, is given in the following section.

2.7 Atomic decompositions from 1960 to 1980

Guido Weiss, in collaboration with Ronald Coifman, was the first to interpret, as we have just done, Lusin's theory in terms of atoms and atomic decompositions. The atoms are the simplest elements of a function space, and the objective of the theory is to find, for the usual function spaces, the atoms and the "assembly rules" that allow one to reconstruct all the elements of the function space using these atoms. In the case of the holomorphic Hardy spaces of the last section, the atoms were the functions (z − ζ)^{−2}, ζ ∈ P, and the assembly rules were given by the condition on Lusin's function A. For the spaces L^p[0, 2π], 1 < p < ∞, the atoms cannot be the functions cos kx and sin kx, k ≥ 1, because this choice does not lead to assembly rules that are sufficiently simple and explicit to be useful in practice.
Marcinkiewicz showed in 1938 that the simplest atomic decomposition for the spaces L^p[0,1], 1 < p < ∞, is given by the Haar system. The Franklin basis would have served as well, and from the scientific perspective given by wavelet theory, the Franklin basis and Littlewood-Paley analysis are naturally related. One of the approaches to atomic decompositions is given by Calderón's identity. To explain Calderón's identity, we start with a function ψ belonging to L²(ℝⁿ). (Later in this history, Grossmann and Morlet called this function an analyzing wavelet.) Its Fourier transform ψ̂(ξ) is subject to the condition that

∫_0^∞ |ψ̂(tξ)|² dt/t = 1 (2.12)

for almost all ξ ∈ ℝⁿ. If ψ belongs to L¹(ℝⁿ), condition (2.12) implies that

∫ ψ(x) dx = 0.
We write ψ̃(x) = ψ̄(−x), ψ_t(x) = t^{−n} ψ(x/t), and ψ̃_t(x) = t^{−n} ψ̃(x/t). Let Q_t denote the operator defined as convolution with ψ_t; its adjoint Q_t^* is the operator defined as convolution with ψ̃_t. Calderón's identity is a decomposition of the identity operator, written symbolically as

I = ∫_0^∞ Q_t^* Q_t dt/t.

This means that, for all f ∈ L²(ℝⁿ),

f = ∫_0^∞ Q_t^*[Q_t(f)] dt/t,

where the limit of this improper integral is to be taken in the sense of L²(ℝⁿ). Grossmann and Morlet rediscovered this identity in 1980, 20 years after the work of Calderón. However, with this rediscovery, they gave it a different interpretation by relating it to the coherent states of quantum mechanics [133]. They defined wavelets (generated from the analyzing wavelet ψ) by

ψ_{(a,b)}(x) = a^{−n/2} ψ((x − b)/a),  a > 0, b ∈ ℝⁿ.

In the analysis and synthesis of an arbitrary function f belonging to L²(ℝⁿ), these wavelets ψ_{(a,b)} are going to play the role of an orthonormal basis. The wavelet coefficients W(a, b) are defined by

W(a, b) = ⟨f, ψ_{(a,b)}⟩, (2.13)

where ⟨u, v⟩ = ∫ u(x) v̄(x) dx. The function f is analyzed by (2.13). The synthesis of f is given by

f(x) = ∫_0^∞ ∫_{ℝⁿ} W(a, b) ψ_{(a,b)}(x) db da/a^{n+1}. (2.14)

This is a linear combination of the original wavelets using the coefficients given by the analysis. We return to the specific case of the Hardy spaces H^p(ℝ) for 1 ≤ p < ∞. The analyzing wavelet ψ(x) = (1/π)(x + i)^{−2} chosen by Lusin is the restriction to the real axis of the function (1/π)(z + i)^{−2}; it is holomorphic in P and belongs to all of the Hardy spaces. The Fourier transform of ψ is ψ̂(ξ) = −2ξe^{−ξ} for ξ ≥ 0 and ψ̂(ξ) = 0 if ξ < 0. Condition (2.12) is not satisfied; however, we have

∫_0^∞ |ψ̂(tξ)|² dt/t = 1 if ξ > 0 and = 0 if ξ < 0. (2.15)

Condition (2.15) implies that the wavelets ψ_{(a,b)} generate H²(ℝ) instead of L²(ℝ) when a > 0, b ∈ ℝ.
The wavelet coefficients of a function f belonging to the Hardy space H²(ℝ) are then

W(a, b) = ⟨f, ψ_{(a,b)}⟩ = (a√a/π) ∫_{−∞}^{∞} f(x)/(x − b − ia)² dx.

By Cauchy's formula, this is equal to 2ia√a f′(b + ia), since f is holomorphic in P. Thus the representation (2.14) of a function in the Hardy space H²(ℝ) coincides with the natural representation that we defined in the preceding section.
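The Cauchy-formula evaluation above is easy to verify numerically. The sketch below (an illustration added here, not part of the original text; the function name `W` and the quadrature parameters are our own choices) takes f(x) = (x + i)^{−2}, a = 1, b = 0, so that the integrand reduces to (x² + 1)^{−2} and W(1, 0) should equal 2i f′(i) = 1/2:

```python
import cmath

def W(f, a, b, half_width=200.0, n=40_000):
    """W(a,b) = (a*sqrt(a)/pi) * integral of f(x)/(x - b - ia)^2 dx,
    computed by plain midpoint quadrature on [-half_width, half_width]."""
    step = 2 * half_width / n
    total = 0j
    for i in range(n):
        x = -half_width + (i + 0.5) * step
        total += f(x) / (x - b - 1j * a) ** 2
    return (a * a ** 0.5 / cmath.pi) * total * step

f = lambda z: (z + 1j) ** -2          # f is holomorphic in the upper half-plane
fprime = lambda z: -2 * (z + 1j) ** -3

w_numeric = W(f, a=1.0, b=0.0)
w_cauchy = 2j * 1.0 * fprime(0.0 + 1j)  # 2i a*sqrt(a) f'(b + ia) = 1/2
```

The two values agree to quadrature accuracy, confirming that the contour-integral evaluation of W(a, b) matches direct numerical integration.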
2.8 Strömberg's wavelets

The real version of the holomorphic Hardy space H¹(ℝ) is denoted by ℋ¹(ℝ). It is composed of the real-valued functions u(x) for which there exists a real-valued function v(x) such that u(x) + iv(x) belongs to H¹(ℝ). In other words, u belongs to ℋ¹(ℝ) if and only if u and its Hilbert transform ũ belong to L¹(ℝ). Research on atomic decompositions for the functions in the Hardy space ℋ¹(ℝ) takes two completely different approaches: One involves the atomic decomposition of Coifman and Weiss, and the other concerns the search for unconditional bases for the space ℋ¹. Here is an outline of these theories. Coifman and Weiss showed that any function f of ℋ¹ can be written as

f(x) = Σ_{k=0}^{∞} λ_k a_k(x), (2.16)

where the coefficients λ_k are such that Σ_{k=0}^{∞} |λ_k| < ∞ and where the functions a_k(x) are atoms of ℋ¹. The conditions for a function to be an atom are the following: For each a_k, there exists an interval I_k such that a_k(x) = 0 outside of I_k, |a_k(x)| ≤ 1/|I_k| (|I_k| is the length of I_k), and ∫ a_k(x) dx = 0. These three conditions imply that the norms of the a_k in ℋ¹ are bounded by a fixed constant. The price to pay for this extraordinary decomposition is that it is not given by a linear algorithm, and this naturally raises the problem of finding one. Finding an unconditional basis means constructing, once and for all, a sequence of functions b_k of ℋ¹ that are linearly independent, in a very strong sense, and such that any function f of ℋ¹ can be decomposed in the form

f(x) = Σ_{k=0}^{∞} β_k b_k(x),

where the scalars β_k are defined explicitly by the formulas

β_k = ∫ f(x) g_k(x) dx.

Here the g_k are specific functions in the dual of ℋ¹; that is, they are BMO functions. The strong independence property is this: There exists a constant C such that if two sequences of coefficients β_k and λ_k satisfy |β_k| ≤ |λ_k| for all k, then

‖ Σ_{k=0}^{∞} β_k b_k ‖ ≤ C ‖ Σ_{k=0}^{∞} λ_k b_k ‖,

where ‖·‖ is the norm of the function space ℋ¹.
Wojtaszczyk proved that the Franklin system {f_n}_{n∈ℕ}, without the constant function f(x) = 1, is an unconditional basis for the subspace of ℋ¹(ℝ) composed of functions that vanish outside the interval [0,1] [262]. Strömberg showed that the orthonormal wavelet basis 2^{j/2}ψ(2^j x − k), j, k ∈ ℤ, defined in section 2.5 is, in fact, an unconditional basis for the space ℋ¹(ℝ) [245]. Does there exist a relation between these two types of atomic decompositions? We first point out the main difference: The decomposition (2.16) of a function is not unique, and in some sense the atoms a_k must be fitted to the function f. Thus the decomposition algorithm is not linear. On the other hand, one way to construct
the atoms for (2.16) is to start with the expansion of f in an orthonormal basis of compactly supported wavelets and to group the wavelets to form the atoms. These groups of wavelets are a little like the dyadic blocks of Littlewood and Paley; however, in this case, they are defined by considering the moduli of the coefficients α_{j,k} of this series. The interested reader is referred to [203]. (The construction of wavelets with compact support will be developed in Chapter 3.)

2.9 A first synthesis: Wavelet analysis

Thanks to the historical perspective that we enjoy today, we can relate the Haar system, the Littlewood-Paley decomposition (1930), the version of Franklin's basis given by Strömberg (1981), and Calderón's identity (1960) to one another. This first synthesis will be followed by a more inclusive synthesis that encompasses the techniques of numerical signal and image processing. This second synthesis will lead to Daubechies's orthonormal bases. This first synthesis is based on the definition of the "wavelet" and on the concept of "wavelet analysis." We will see that the success of this synthesis depends on a certain lack of specificity in the original definition. When wavelets were first defined, mathematicians had not created a general formalism covering all of the examples we presented above. A physicist and a geophysicist, Grossmann and Morlet, provided a definition and a way of thinking based on physical intuition that was flexible enough to cover all these cases. Starting with the Grossmann-Morlet definition, we will present two other definitions and indicate how they are related. The first definition of a wavelet, which is due to Grossmann and Morlet, is quite broad. A wavelet is a function ψ in L²(ℝ) whose Fourier transform ψ̂(ξ) satisfies the condition

∫_0^∞ |ψ̂(tξ)|² dt/t = 1

almost everywhere. The second definition of a wavelet is adapted to the Littlewood-Paley-Stein theory.
A wavelet is a function ψ in L²(ℝⁿ) whose Fourier transform ψ̂(ξ) satisfies the condition

Σ_{j=−∞}^{∞} |ψ̂(2^{−j}ξ)|² = 1

almost everywhere. If ψ is a wavelet in this sense, then (log 2)^{−1/2} ψ satisfies the Grossmann-Morlet condition. The third definition refers to the work of Haar and Strömberg. A wavelet is a function ψ in L²(ℝ) such that 2^{j/2}ψ(2^j x − k), j, k ∈ ℤ, is an orthonormal basis for L²(ℝ). Such a wavelet ψ necessarily satisfies the second condition. This shows that in going from the first to the third definition we are adding more conditions and thus narrowing the choice of functions that will be wavelets. What we gain is a more economical (less redundant) representation of the analyzed function. In the general Grossmann-Morlet theory—which is identical to Calderón's theory—the wavelet analysis of a function f yields a function W(a, b) of n + 1 variables a > 0 and b ∈ ℝⁿ. This function is defined by (2.13): W(a, b) = ⟨f, ψ_{(a,b)}⟩, where ψ_{(a,b)}(x) = a^{−n/2} ψ((x − b)/a), a > 0, b ∈ ℝⁿ. In the Littlewood-Paley theory, a is replaced by 2^{−j}, while b is denoted by x. Thus, if Γ is the multiplicative group {2^{−j}, j ∈ ℤ}, then the Littlewood-Paley analysis is obtained by restricting the Grossmann-Morlet analysis to Γ × ℝⁿ. In the Franklin-Strömberg theory, a is replaced by 2^{−j} and b is replaced by k2^{−j}, where j, k ∈ ℤ. In other words, the analysis of f in the Franklin-Strömberg basis is obtained by restricting the Littlewood-Paley analysis to the "hyperbolic lattice" S in (0, ∞) × ℝ consisting of the points (2^{−j}, k2^{−j}), j, k ∈ ℤ. The logical relations between these wavelet analyses are easy to verify. We start with the Grossmann-Morlet analysis, which is equivalent to the Calderón identity. This is written I = ∫_0^∞ Q_t^* Q_t dt/t, where Q_t(f) = f * ψ_t. This
becomes I = Σ_{j=−∞}^{∞} Δ_j^* Δ_j in the Littlewood-Paley theory. Indeed, if t = 2^{−j}, then Q_t(f) = Δ_j(f). Replacing t by 2^{−j} and the integral ∫_0^∞ u(t) dt/t by the sum Σ_{j=−∞}^{∞} u(2^{−j}) is completely classic. To relate Littlewood-Paley analysis to the analysis that is obtained using the orthogonal wavelets of Franklin and Strömberg, we write ψ_j(x) = 2^j ψ(2^j x) and ψ̃_j(x) = 2^j ψ̃(2^j x), where ψ̃(x) = ψ̄(−x). We let Δ_j denote the convolution product f ↦ f * ψ_j and Δ_j^* denote the adjoint of the operator Δ_j : L²(ℝ) → L²(ℝ). Then Δ_j^*(f) = f * ψ̃_j. The coefficients α(j, k) of the decomposition of f in Strömberg's orthonormal basis are then given by

α(j, k) = 2^{j/2} ∫ f(x) ψ(2^j x − k) dx.

Thus we see that the wavelet coefficients are obtained by sampling the dyadic blocks Δ_j^*(f) on the grid 2^{−j}ℤ. This sampling is consistent with Shannon's theorem. In all three cases, wavelet analysis is followed by a synthesis that reconstructs f from its wavelet transform. In the case of Grossmann-Morlet wavelets, this synthesis is given by the identity (2.14), which we rewrite here:

f(x) = ∫_0^∞ ∫_{ℝⁿ} W(a, b) ψ_{(a,b)}(x) db da/a^{n+1}. (2.17)

In the case of the Littlewood-Paley analysis, the integral is replaced, as we have already mentioned, by the sum over {2^{−j}, j ∈ ℤ}, and (2.17) becomes

f(x) = Σ_{j=−∞}^{∞} ∫_{ℝⁿ} (Δ_j f)(b) ψ̃_j(x − b) db. (2.18)

Finally, in the case of Strömberg's orthogonal wavelets in one dimension, the last integral becomes a sum, and (2.18) becomes

f(x) = Σ_j Σ_k α(j, k) 2^{j/2} ψ(2^j x − k).

The preceding arguments may seem less than exciting, since the hypotheses on ψ are designed specifically for the analysis of the space L² of square-summable functions. This is the setting in which Grossmann and Morlet wrote their theoretical work. But this is evidently a sort of regression, for we have just shown that across a century of mathematical history, wavelet analysis was created specifically to analyze function spaces other than L². Fourier analysis serves admirably for the analysis of L².
If we want wavelets to be useful for the analysis of other function spaces, it is necessary to impose conditions on the wavelets in addition to those we have already given. Until now we have required only that the analysis preserves energy or, equivalently, that the synthesis gives an exact reconstruction (although this equivalence is not immediately obvious). These new conditions concern the regularity of the wavelet ψ, the decay at infinity of ψ, and the number of vanishing moments of ψ. For example, we can require that ψ belongs to the Schwartz class and that all of its
moments vanish. Or, in the case of Daubechies's wavelets, we can require that ψ has m continuous derivatives, that it has compact support, and that its first r + 1 moments vanish. The properties of the Strömberg wavelet are intermediate: It has exponential decay, as does its first derivative, and ∫ ψ(x) dx = ∫ xψ(x) dx = 0. These new wavelets are particularly useful. For example, the Daubechies wavelets just mentioned can be used to analyze the functions in L^{s,p}, the space of functions in L^p whose derivatives of order s ≤ inf(r, m) are also in L^p.

2.10 The advent of signal processing

If history had stopped with this first synthesis, the Daubechies orthonormal bases, which improve the rudimentary Haar basis, would never have been discovered. A new start was made in 1985 by Stéphane Mallat when he was still a graduate student. Mallat discovered the similarities between the following objects:

(a) the quadrature mirror filters, which were invented by Croisier, Esteban, and Galand for the digital telephone;

(b) the pyramid algorithms of Burt and Adelson, which are used in the context of numerical image processing;

(c) the orthonormal wavelet bases discovered by Strömberg and Meyer.

The relations between these concepts will be explained in the next two chapters. By using the relation between wavelets and quadrature mirror filters, Daubechies was able to complete Haar's work. For each integer r, Daubechies constructs an orthonormal basis for L²(ℝ) of the form 2^{j/2}ψ_r(2^j x − k), j, k ∈ ℤ, having the following properties:

(a) The support of ψ_r is the interval [0, 2r + 1].

(b) ∫ x^n ψ_r(x) dx = 0 for 0 ≤ n ≤ r.

(c) ψ_r has qr continuous derivatives, where the constant q is about 1/5.

When r = 0, this reduces to the Haar system. Daubechies's wavelets provide a much more effective analysis and synthesis than that obtained with the Haar system.
If the function being analyzed has m continuous derivatives, where 0 ≤ m ≤ r + 1, then the coefficients α(j, k) of its decomposition in the Daubechies basis will be of the order of magnitude 2^{−(m+1/2)j}, while they would be of the order 2^{−3j/2} with the Haar system. This means that as soon as the analyzed function is regular, the coefficients one keeps (those exceeding the machine precision) will be much fewer than in the case of the Haar system. Thus one speaks of signal compression. Furthermore, this property has a purely local aspect because Daubechies's wavelets have compact support. Synthesis using Daubechies's wavelets also gives better results than the Haar system. In the latter case, a regular function is approximated by functions that have strong discontinuities. This produces an annoying "blocking effect" when images are compressed using Haar wavelets, as the reader can verify by referring to the images of Jean Baptiste Joseph Fourier in Figure 2.1. These remarkable qualities of Daubechies's bases explain their undisputed success.
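The 2^{−3j/2} decay of the Haar coefficients of a smooth function is easy to observe numerically. The following Python sketch (an illustration added here, not part of the original text; the helper name and quadrature step are our own choices) computes c(j, 0) = 2^{j/2} ∫ f(x) h(2^j x) dx for the Haar wavelet h (equal to 1 on [0, 1/2), −1 on [1/2, 1)) by midpoint quadrature, and checks that successive magnitudes shrink by roughly 2^{−3/2} ≈ 0.354:

```python
import math

def haar_coeff(f, j, k, n=2048):
    """c(j,k) = 2^{j/2} * integral of f(x) h(2^j x - k) dx for the Haar wavelet h.
    The support of h(2^j x - k) is [k 2^-j, (k+1) 2^-j]."""
    a = k * 2.0 ** -j
    mid = a + 2.0 ** -(j + 1)
    b = a + 2.0 ** -j
    def integral(lo, hi):           # midpoint quadrature with n panels
        step = (hi - lo) / n
        return step * sum(f(lo + (i + 0.5) * step) for i in range(n))
    return 2.0 ** (j / 2) * (integral(a, mid) - integral(mid, b))

f = math.sin                        # any smooth function with f'(0) != 0 will do
coeffs = [abs(haar_coeff(f, j, 0)) for j in range(3, 9)]
ratios = [coeffs[i + 1] / coeffs[i] for i in range(len(coeffs) - 1)]
# the successive ratios approach 2^{-3/2} ~ 0.3536, i.e. |c(j,0)| ~ 2^{-3j/2}
```

A Daubechies wavelet with r vanishing moments would instead show ratios near 2^{−(m+1/2)} for an m-times differentiable function, which is the compression advantage described above.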
Fig. 2.1. Jean Baptiste Joseph Fourier (1768-1830): The image on the right was produced by analyzing the original image on the left using Haar wavelets. It was reconstructed from the 600 largest wavelet coefficients and shows the characteristic blocking effect that is the signature of Haar compression. Courtesy of Académie des Sciences - Paris and Jean-Loup Charmat.

2.11 Conclusions

The status of wavelet analysis within mathematics is unique. Indeed, mathematicians have been working on various forms of wavelet decompositions for a fairly long time. Their goal was to provide direct and easy access to various function spaces. But during this period, which stretches from 1909 to 1980, from Haar to Strömberg, there was very little scientific interchange between mathematicians (of the "Chicago School"), physicists, and experts in signal processing. Not knowing about the mathematical developments and faced with the pressure of specific needs within their disciplines, the last two groups were led to rediscover wavelets. For example, Marr did not know about Calderón's work on wavelets (dating from 1960) when he announced the hypothesis that we analyze in detail in Chapter 8. Similarly, G. Battle and P. Federbush were not aware of Strömberg's basis when they needed it to do renormalization computations in quantum field theory [108] (see also [27] and [29]). As was stressed by Battle [28, p. 87], "The physics community was intuitively aware of wavelets years before anything better than the Haar basis was mathematically known to exist. This cultural knowledge dates back to a paper by Wilson [261] on the renormalization group." In the numerous fields of science and technology where wavelets appeared at the end of the 1970s, they were handcrafted by the scientists and engineers themselves. Their use has never resulted from proselytism by mathematicians.
Battle’s comment raises another point: We have given a brief historical review of some of the mathematical origins of what is now known as the theory of time-scale wavelets. This historical perspective is, however, incomplete for two reasons. First, we focused the discussion on mathematics. We are sure that diligent detectives could write a similar story about the appearance of wavelet techniques in physics—
and perhaps in other fields of science. Indeed, as we have mentioned, David Marr built his own wavelets in an image processing context, while several groups of physicists working in quantum field theory designed ad hoc orthonormal wavelet bases (see, for example, [108], [130], and [261]). Second, we focused on time-scale wavelets. Furthermore, we note that time-frequency wavelets were, for a rather long time, neglected by mathematicians. They were, however, popular in signal processing. Indeed, computing the scalar product between a given signal f and a Gabor wavelet g(t − τ)e^{iωt} amounts to performing a windowed Fourier analysis. Moreover, Gabor wavelets were immediately welcomed in physics and signal processing, while time-scale wavelets had a harder time. Finally, Gabor wavelets can be described as an orbit under the action of the Weyl-Heisenberg group, which plays a key role in quantum mechanics. This discussion will be postponed until Chapters 5 and 6, where the interaction between quantum mechanics and time-frequency analysis will be studied in more detail. Today the boundaries between mathematics and signal and image processing have faded, and mathematics has benefited from the rediscovery of wavelets by experts from other disciplines. The detour through signal and image processing was the most direct path leading from the Haar basis to Daubechies's wavelets.
CHAPTER 3

Quadrature Mirror Filters

3.1 Introduction

In his thesis, "Codage en sous-bandes: théorie et applications à la compression numérique du signal de parole," Claude Galand carefully described the quadrature mirror filters (which he invented in collaboration with Esteban and Croisier [68]) and their anticipated applications [125]. He also posed some very important problems that would lead to the discovery of wavelet packets (Chapter 7) and Malvar-Wilson wavelets (Chapter 6). Galand's work was motivated by the possibility of improving the digital telephone, a technology that involves transmitting speech signals as sequences of 0's and 1's. However, as Galand remarked, these techniques extend far beyond digital speech, since facsimile, video, databases, and many other forms of information travel over telephone lines. At present, the bit allocation used for telephone transmission is the well-known 64 kilobits per second. Galand sought, by using coding methods tailored to speech signals, to transmit speech well below this standard. To validate the method he proposed, Galand compared it with two other techniques for coding sampled speech: predictive coding and transform coding. Linear prediction coding amounts to looking for the correlations between successive values of the sampled signal. These correlations are likely to occur on intervals of the order of 20 to 30 milliseconds. This leads one to cut the sampled signal x(n) into blocks defined by 1 ≤ n ≤ N, N + 1 ≤ n ≤ 2N, etc., and then to seek, for each block, coefficients a_k, 1 ≤ k ≤ p, that minimize the quadratic mean Σ_n |e(n)|² of the prediction errors defined by

e(n) = x(n) − Σ_{k=1}^{p} a_k x(n − k).

In general, p is much smaller than N. To transmit a block x(n), it suffices to transmit the first p values x(1), …, x(p), the p coefficients a_1, …, a_p, and the prediction errors e(n). The method is efficient if most of the prediction errors are near zero.
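For illustration, the least-squares prediction just described can be written in a few lines of Python (a sketch added here, not from Galand's thesis; the order p = 1, the test signal, and the function name are our own choices). With p = 1, minimizing Σ|e(n)|² gives a closed-form coefficient:

```python
def lpc_order1(x):
    """Order-1 linear prediction: choose a minimizing sum |x(n) - a*x(n-1)|^2
    over the block, then return a and the prediction errors e(n)."""
    num = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    den = sum(x[n - 1] ** 2 for n in range(1, len(x)))
    a = num / den                       # normal equation for p = 1
    errors = [x[n] - a * x[n - 1] for n in range(1, len(x))]
    return a, errors

# A geometric block is perfectly predicted by a single coefficient:
x = [0.9 ** n for n in range(64)]
a, e = lpc_order1(x)
# a is (up to rounding) 0.9 and every e(n) is essentially zero, so only
# x(0) and the coefficient a need to be transmitted
```

For speech, p is larger (normal equations in p unknowns) and the errors are merely small rather than zero, which is exactly where the thresholding and compression described above come in.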
When they fall below a certain threshold, they are not transmitted, and significant compression can result. A form of transform coding consists of cutting the sampled signal into successive blocks of length N, as we have just done, and then using a unitary transformation A to transform each block (denoted by X) into another block (denoted by Y). The block Y is then quantized, with the hope that, for a suitable linear transformation, the Y blocks will have a simpler structure than the X blocks. Subband coding will be presented in the next section.
For a stationary Gaussian signal, the theoretical limits of the minimal distortion that can be obtained by the three methods are the same. However, as Galand showed, this assumes, in the case of subband coding, that the width of the frequency channels tends to zero and that their number tends to infinity. In Galand's work, these frequency channels are obtained through a treelike arrangement of quadrature mirror filters. This construction leads precisely to wavelet packets, which we discuss in detail in Chapter 7. Today we know that wavelet packets based on filters with finite length do not enjoy the frequency localization that Galand had hoped for. In the cases of linear prediction coding and transform coding, the theoretical limits of the minimal distortion are calculated as the lengths of the blocks tend to infinity, while conserving the stationarity hypothesis. If the three types of coding yield asymptotically the same quality of compression, why introduce subband coding? Galand saw two advantages: the simplicity of the algorithm and the possibility that subband coding would reduce the unpleasant effects of quantization noise as perceived at the receiver. By quantizing inside each subband, the signal would tend to mask the quantization noise, and it would be less apparent. The same argument has been repeated by Adelson, Hingorani, and Simoncelli for numerical image processing [3]. The use of pyramid algorithms and wavelets allows aspects of the human visual system to be taken into account so that the signal masks the noise. The perceptual quality of the reconstructed image is improved even though the theoretical compression calculations do not distinguish this method from the others. It should be observed, however, that these theoretical compression calculations are based on a very specific hypothesis that is clearly not fulfilled in the case of images, which are not well modeled by Gaussian stationary processes (see, for example, [95]).
Readers not familiar with the theory of filters may wish to look at Appendix A; it provides an elementary introduction to the language and notation used in this chapter.

3.2 Subband coding: The case of ideal filters

To illustrate the ideas, we follow Galand and begin with a simplistic example. For a fixed m ≥ 2, let I denote an interval of length 2π/m within [0, 2π], and let l²_I denote the Hilbert space of sequences (c_k)_{k∈ℤ} satisfying Σ_{−∞}^{∞} |c_k|² < ∞ and such that f(θ) = Σ_{−∞}^{∞} c_k e^{ikθ} is zero outside the interval I. This subspace l²_I will be called a frequency channel. If (c_k)_{k∈ℤ} is a sequence belonging to l²_I, the subsequence (c_{km})_{k∈ℤ} provides an optimal, compact representation of the original. In fact, for θ ∈ I,

Σ_{k=−∞}^{∞} c_{km} e^{ikmθ} = (1/m) Σ_{l=0}^{m−1} f(θ + 2πl/m) = (1/m) f(θ),

since f(θ) = 0 for θ ∉ I. Thus we have Σ_{−∞}^{∞} |c_{km}|² = (1/m) Σ_{−∞}^{∞} |c_k|², and this relation expresses the redundancy contained in the original sequence (c_k), which is strongly correlated. This means that the original sequence contains m times the numerical data needed to reconstruct f on I, knowing that f vanishes outside of I. This observation is a form of Shannon's theorem, and the critical sampling c_{km} is done at the Shannon-Nyquist rate. The ideal subband coding scheme consists of first filtering the incoming signal into m frequency channels associated with the intervals [2πl/m, 2π(l+1)/m), 0 ≤ l ≤ m − 1,
and then subsampling the corresponding outputs, retaining only one point in m. This operation, which consists of restricting a sequence defined on ℤ to mℤ, is called decimation and is denoted by m↓1. This ideal subband coding scheme is illustrated in Figure 3.1.

Fig. 3.1. Subband coding scheme.

The scheme for reconstructing the original signal is the dual of the analysis scheme. We begin by extending the sequences y_1(nm), …, y_m(nm) by inserting 0's at all integers that are not multiples of m. Next we filter this "absurd decision" by using the adjoint filters F_1^*, …, F_m^*. The output returns the original signal (x(n)). The reconstruction is illustrated in Figure 3.2.

Fig. 3.2. Reconstruction.

One can, as Galand did, hope for the best and try to replace the index functions of the intervals [2πl/m, 2π(l+1)/m] with more regular functions of the form w(mx − 2πl), 0 ≤ l ≤ m − 1. If w(x) = w_m(x) were a finite trigonometric sum, then the filters F_1, …, F_m would have finite length, which is essential for applications. But the Balian-Low theorem (Chapter 6) tells us that such a function w cannot be constructed if we demand that it be regular and well localized (uniformly in m). Consequently, it is not possible to realize the ideal subband coding scheme just described if we require that the filters F_1, …, F_m have finite length and, at the same time, provide good frequency definition.

3.3 Quadrature mirror filters

Faced with the impossibility of realizing subband coding using m bands covering the frequency space regularly and having finite-length filters—whose length must be Cm, for some C > 0, as required by the Heisenberg uncertainty principle—Galand limited himself to the case m = 2. He then had the idea to effect a finer frequency
tiling by suitably iterating the two-band process. We will see in Chapter 7 that this arborescent scheme leads directly to wavelet packets, but we will also see that these wavelet packets do not have the desired spectral properties. Subband coding using two frequency channels works perfectly. We are going to describe it in detail. The input signals are arbitrary sequences (x(n))_{n∈ℤ} with finite energy: Σ_{−∞}^{∞} |x(n)|² < ∞, which means that x ∈ l²(ℤ). In the context of the digital telephone, we assume that the original speech signals have been sampled to give the signals (x(n))_{n∈ℤ}. Let D denote the decimation operator D : l²(ℤ) → l²(2ℤ) that consists of retaining only the terms with even index in a sequence (x(n))_{n∈ℤ}. (D is also denoted by 2↓1.) The adjoint operator E = D* : l²(2ℤ) → l²(ℤ) is the crudest possible extension operator. It consists, starting with a sequence (x(2n))_{n∈ℤ}, in constructing the sequence defined on ℤ obtained by inserting 0's at the odd indices. Thus we get the sequence

(…, 0, x(−4), 0, x(−2), 0, x(0), 0, x(2), 0, x(4), 0, …).

To simplify the notation we write X in place of (x(n))_{n∈ℤ}. These input signals X are first filtered using two filters F_0 and F_1. Later, we will require that F_0 be a low-pass filter (in a sense that will be made precise), and, consequently, F_1 will be a high-pass filter. However, there is no distinction between the two filters at this point. The outputs X_0 = F_0(X) and X_1 = F_1(X) are two signals (x_0(n))_{n∈ℤ} and (x_1(n))_{n∈ℤ} with finite energy. X_0 and X_1 are subsampled with the decimation operator D = 2↓1. Then we have Y_0 = D(X_0) = (x_0(2n))_{n∈ℤ} and Y_1 = D(X_1) = (x_1(2n))_{n∈ℤ}. We write

‖Y_0‖ = ( Σ_{−∞}^{∞} |x_0(2n)|² )^{1/2} and ‖Y_1‖ = ( Σ_{−∞}^{∞} |x_1(2n)|² )^{1/2}.

The two filters F_0 and F_1 are called quadrature mirror filters if, for all signals X of finite energy, one has

‖Y_0‖² + ‖Y_1‖² = ‖X‖². (3.1)

Denote the operator DF_0 : l²(ℤ) → l²(2ℤ) by T_0, and similarly let T_1 denote the operator DF_1 : l²(ℤ) → l²(2ℤ).
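The adjoint relation E = D* can be checked directly on finite sequences. Here is a small Python sketch (an illustration added here; finite lists stand in for l² sequences, and the helper names are our own):

```python
def D(x):
    """Decimation 2|1: keep x(2n), i.e. every other sample starting at index 0."""
    return x[::2]

def E(y):
    """Extension E = D*: reinsert zeros at the odd positions."""
    out = [0.0] * (2 * len(y))
    out[::2] = y
    return out

x = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0]
y = [2.0, 7.0, 1.0]
lhs = sum(a * b for a, b in zip(D(x), y))   # <Dx, y> computed in l2(2Z)
rhs = sum(a * b for a, b in zip(x, E(y)))   # <x, Ey> computed in l2(Z)
# lhs == rhs for every pair (x, y), which is exactly the adjoint property
```

Zeroing the odd samples looks wasteful, but as the perfect reconstruction theorem below shows, the adjoint filters recover what the extension step leaves out.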
It can be shown that (3.1) is equivalent to

I = T_0^* T_0 + T_1^* T_1. (3.2)

In (3.2), I : l²(ℤ) → l²(ℤ) is the identity operator. What is much less evident is that the vectors T_0^*T_0(X) and T_1^*T_1(X) are always orthogonal, which is a consequence of the following theorem.

Theorem 3.1. Let F_0(θ) and F_1(θ) denote the transfer functions of the filters F_0 and F_1. Then the following two properties are equivalent to each other and to the property expressed by (3.1):
(i) The matrix

(1/√2) [ F_0(θ)    F_1(θ)
         F_0(θ+π)  F_1(θ+π) ]

is unitary for almost all θ ∈ [0, 2π].

(ii) The operator (T_0, T_1) : l²(ℤ) → l²(2ℤ) × l²(2ℤ) is an isometric isomorphism.

Recall that the sequence of Fourier coefficients of the 2π-periodic function F_0(θ) is the impulse response of the filter F_0, and similarly for F_1(θ) and F_1. Condition (3.2) is called the perfect reconstruction property. The input signal X is the sum of the two orthogonal signals T_0^*T_0(X) and T_1^*T_1(X), where the signals T_0(X) and T_1(X) were given by the analysis. The operators T_0^* = F_0^*E and T_1^* = F_1^*E are applied to two sequences sampled on the even integers. These are first extended in the crudest way, which is by replacing the missing values with 0's. Next, this seemingly nonsensical step is corrected by passing the sequences through the filters F_0^* and F_1^*, which are the adjoints of F_0 and F_1. The correct result is read at the output (Figure 3.3).

Fig. 3.3. The complete scheme, analysis and synthesis.

Condition (ii) means that quadrature mirror filters constitute orthogonal transformations of a particular type, while (i) allows us to construct quadrature mirror filters that have finite impulse response. To see this, we start with a trigonometric polynomial m_0(θ) = a_0 + a_1 e^{iθ} + ⋯ + a_N e^{iNθ} such that

|m_0(θ)|² + |m_0(θ + π)|² = 1

for all θ. Next, we write F_0(θ) = √2 m_0(θ) and F_1(θ) = √2 e^{−iθ} m̄_0(θ + π). Then it follows directly that (i) is satisfied. The following five examples illustrate the definition of quadrature mirror filters. The first example is essentially a counterexample because it is never used, for a reason that will become clear later. It consists of bypassing the operators F_0 and F_1 and defining T_0 and T_1 directly. Define T_0(X) to be the restriction of the sequence X = (x(n))_{n∈ℤ} to the even integers, and define T_1(X) to be the restriction of this sequence to the odd integers.
This is equivalent, in our notation, to taking $F_0$ equal to the identity $I$, and to taking $F_1$ to be the shift operator defined by $(F_1X)(n) = x(n-1)$. Condition (3.1) is trivially satisfied, and the unitary matrix in (i) is
$$\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & e^{i\theta} \\ 1 & -e^{i\theta} \end{pmatrix}.$$
40 CHAPTER 3

The second example is more interesting. The filter operators are defined by
$$(F_0X)(n) = \frac{x(n) + x(n-1)}{\sqrt{2}}, \qquad (F_1X)(n) = \frac{x(n) - x(n-1)}{\sqrt{2}}.$$
The unitary matrix in (i) is
$$\frac{1}{2}\begin{pmatrix} 1 + e^{i\theta} & 1 - e^{i\theta} \\ 1 - e^{i\theta} & 1 + e^{i\theta} \end{pmatrix},$$
and the orthonormal wavelet basis associated with this choice will be the Haar system.

The third example recaptures the ideal filters presented at the beginning of the chapter. The $2\pi$-periodic function $m_0(\theta)$ is one on $[0, \pi)$ and zero on $[\pi, 2\pi)$, and $m_1(\theta) = 1 - m_0(\theta)$. As above, define $F_0(\theta) = \sqrt{2}\,m_0(\theta)$ and $F_1(\theta) = \sqrt{2}\,m_1(\theta)$.

In the fourth example, $m_0(\theta)$ becomes the characteristic function of the interval $[-\frac{\pi}{2}, \frac{\pi}{2})$ when it is restricted to $[-\pi, \pi)$, and $m_1(\theta) = 1 - m_0(\theta)$.

The last example is a smooth modification of the preceding one. With $0 < \alpha < \frac{\pi}{2}$, we ask that $m_0(\theta)$ be $2\pi$-periodic, equal to one on the interval $[-\frac{\pi}{2} + \alpha, \frac{\pi}{2} - \alpha]$, equal to zero on $[\frac{\pi}{2} + \alpha, \frac{3\pi}{2} - \alpha]$, even, and infinitely differentiable. In addition, we impose the condition
$$|m_0(\theta)|^2 + |m_0(\theta + \pi)|^2 = 1. \qquad (3.3)$$
Then write $m_1(\theta) = e^{-i\theta}\,\overline{m_0(\theta + \pi)}$, $F_0(\theta) = \sqrt{2}\,m_0(\theta)$, $F_1(\theta) = \sqrt{2}\,m_1(\theta)$, and we obtain two quadrature mirror filters.

3.4 Trend and fluctuation

Let $H$ denote the Hilbert space $\ell^2(\mathbb{Z})$ of all sequences $(x(n))_{n \in \mathbb{Z}}$ such that $\sum_{-\infty}^{+\infty} |x(n)|^2 < \infty$. Write $H_0$ and $H_1$ for the two subspaces $T_0^*T_0(H)$ and $T_1^*T_1(H)$. If $F_0$ and $F_1$ are quadrature mirror filters, then by Theorem 3.1, $H$ will be the direct orthogonal sum of $H_0$ and $H_1$.

Write $m_0(\theta) = 2^{-1/2} F_0(\theta)$ and assume that $m_0(\pi) = 0$ and that this zero has order $q \geq 1$. Then $|m_0(\theta)|^2 = 1 + O(|\theta|^{2q})$ and $m_1(\theta) = O(|\theta|^q)$ as $\theta$ tends to zero. Under these conditions, we say that $F_0$ is a low-pass filter and that $F_1$ is a high-pass filter, even though this terminology may not always be strictly justified. When these conditions are satisfied, the trend and the fluctuation around this trend of a signal $X$ are defined by $X_0 = T_0^*T_0(X)$ and $X_1 = T_1^*T_1(X)$, respectively. Note that the trend and fluctuation are defined in terms of a given pair of filters. They are not intrinsic properties of the function $X$, but they are handy heuristics.
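For the Haar pair of the second example, the trend and fluctuation are easy to compute, and the orthogonal decomposition $X = X_0 + X_1$ can be verified numerically. The following is a minimal sketch (Python with NumPy, which is of course not part of the book; the circular indexing is our assumption, standing in for bi-infinite sequences):

```python
import numpy as np

def analyze(x):
    """T0 = D F0 and T1 = D F1 for the Haar pair:
    (F0 x)(n) = (x(n) + x(n-1))/sqrt(2),
    (F1 x)(n) = (x(n) - x(n-1))/sqrt(2),
    followed by the decimation D (keep the even-indexed samples).
    Circular indexing stands in for bi-infinite sequences."""
    xm1 = np.roll(x, 1)                       # x(n-1), circularly
    return ((x + xm1) / np.sqrt(2))[::2], ((x - xm1) / np.sqrt(2))[::2]

def adjoint(t, sign):
    """T0* (sign=+1) and T1* (sign=-1): extend with 0's on the odd
    integers, then pass through the adjoint filter F0* or F1*."""
    e = np.zeros(2 * len(t))
    e[::2] = t                                # crude extension by 0's
    return (e + sign * np.roll(e, -1)) / np.sqrt(2)

def trend_and_fluctuation(x):
    """X0 = T0* T0 (X) and X1 = T1* T1 (X)."""
    t0, t1 = analyze(x)
    return adjoint(t0, +1), adjoint(t1, -1)
```

The sum $X_0 + X_1$ reproduces $X$ exactly, which is the perfect reconstruction property (3.2), and the inner product of $X_0$ with $X_1$ vanishes, as Theorem 3.1 asserts.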
The trend $X_0 = T_0^*T_0(X)$ is generally "smoother" than $X$, in the sense that the low-pass filter $F_0$ removes high frequencies from $X$. In fact, one often says that $X_0$ is "twice as smooth as $X$," which is another useful heuristic. These heuristics are consistent with a theorem of S. Bernstein that relates the smoothness of a function to the size of the support of its Fourier transform.

3.5 The time-scale algorithm of Mallat and the time-frequency algorithm of Galand

It is amazing to reread Galand's thesis in the light of present understanding. Indeed, Galand's goal was to obtain finer and finer frequency resolutions by appropriately
iterating the quadrature mirror filters. This is possible, however, only in the case of the ideal filters in our third example, but we cannot use these ideal filters because they have an infinite impulse response. In spite of this criticism, we will return to Galand's point of view in Chapter 7, and it will lead us to wavelet packets. Thus we see that Galand was looking for time-frequency algorithms. But his fundamental discovery, quadrature mirror filters, was diverted from that end by Mallat, who used quadrature mirror filters to construct time-scale algorithms using a hierarchical scheme.

Mallat considers an increasing sequence $\Gamma_j = 2^{-j}\mathbb{Z}$ of nested grids that go from the "fine grid" $\Gamma_N$, $N \geq 1$, to the "coarse grid" $\Gamma_0$. The signal to be analyzed has been sampled on the fine grid (we will come back to the sampling technique when studying the convergence problem), and our starting point is thus a sequence $f = f_0$ belonging to $\ell^2(\Gamma_N)$. In addition, two quadrature mirror filters $F_0$ and $F_1$ are given. (We will see later what conditions they must satisfy.) These same filters will be used throughout the discussion.

We process the signal $f$ by decomposing it into $T_0 f$ and $T_1 f$, which we also call the trend and fluctuation. The trend $T_0 f = DF_0 f$ has been downsampled and "lives" on the coarser grid $\Gamma_{N-1}$; it represents a new signal that is decomposed again into a trend and a fluctuation. The fluctuations are never analyzed in this scheme, and the algorithm follows a "herringbone" pattern illustrated in Figure 3.4. (To be precise, the operators $T_0 = DF_0$ and $T_1 = DF_1$ should have another index $k$ to indicate that they "live" on the grid $\Gamma_{N-k}$. We have used a simplified notation to emphasize that the filters are always the same; they are just expanded at each step to fit the coarser grid.)

Fig. 3.4. Mallat's algorithm.

The input signal $f \in \ell^2(\Gamma_N)$ is finally represented by the sequence $r_1, \ldots$
, $r_N$ of fluctuations and by the last trend $f_N \in \ell^2(\Gamma_0)$. The transformation that maps $f$ onto $(r_1, \ldots, r_N, f_N)$ is composed of a sequence of transformations, each of which is invertible because of the perfect reconstruction property of the quadrature mirror filters. Thus $f$ can be computed directly from $(r_1, \ldots, r_N, f_N)$.

The significance of Mallat's algorithm stems from the following observation: For an appropriate choice of the filters $F_0$ and $F_1$, there are numerous cases where the fluctuations $r_1, \ldots, r_N$ are, at different steps, extremely small. Coding the signal thus comes down to coding the last trend $f_N$ as well as those coefficients of the fluctuations that are above the threshold fixed by the quantization. Notice that the last trend contains $2^{-N}$ times the data of the input signal. If, in addition, many of the terms in the fluctuations are essentially zero, then the amount of data that must be stored or transmitted can be appreciably less than the data representing
40 CHAPTER 3

the original signal. In other words, there can be good compression. This remark is the starting point of Donoho's denoising algorithms described in Chapter 11.

3.6 Trends and fluctuations with orthonormal wavelet bases

We propose to describe the asymptotic behavior of Mallat's algorithm as the number of stages $N$ tends to infinity. To do this, it is first necessary to present the continuous version of this algorithm. This involves orthonormal wavelet bases in the following "complete form," which means that we have a wavelet plus a multiresolution analysis. This will be explained here for $L^2(\mathbb{R})$, and it will be discussed again in the next chapter for $L^2(\mathbb{R}^n)$, where we formally define a multiresolution analysis.

We begin with a function $\varphi$ belonging to $L^2(\mathbb{R})$ that has the following property:
$$\varphi(x - k),\ k \in \mathbb{Z}, \text{ is an orthonormal sequence in } L^2(\mathbb{R}). \qquad (3.4)$$
Let $V_0$ denote the closed linear subspace of $L^2(\mathbb{R})$ generated by this sequence. For the other $j \in \mathbb{Z}$, define the spaces $V_j$ in terms of $V_0$ by simply changing scale. This means that
$$f(x) \in V_0 \iff f(2^j x) \in V_j. \qquad (3.5)$$
The other hypotheses are these: The $V_j$, $j \in \mathbb{Z}$, form a nested sequence; their intersection reduces to $\{0\}$; and their union $\bigcup_{-\infty}^{+\infty} V_j$ is dense in $L^2(\mathbb{R})$.

We then write $\varphi_{j,k}(x) = 2^{j/2}\varphi(2^j x - k)$, $j, k \in \mathbb{Z}$, and define the trend $f_j$, at scale $2^{-j}$, of a function $f \in L^2(\mathbb{R})$ by
$$f_j(x) = \sum_{k \in \mathbb{Z}} \langle f, \varphi_{j,k} \rangle\, \varphi_{j,k}(x).$$
The fluctuations (or details, in the case of an image) are denoted by $d_j$ and defined by $d_j(x) = f_{j+1}(x) - f_j(x)$. To analyze these details further, we let $W_j$ denote the orthogonal complement of $V_j$ in $V_{j+1}$, so that $V_{j+1} = V_j \oplus W_j$. Then there exists at least one function $\psi$ belonging to $W_0$ such that $\psi(x - k)$, $k \in \mathbb{Z}$, is an orthonormal basis of $W_0$.
Such a function $\psi$, called the mother wavelet, has the following properties:
$$\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k), \quad j, k \in \mathbb{Z}, \qquad (3.6)$$
is an orthonormal basis for $L^2(\mathbb{R})$, and, more precisely, for all $j \in \mathbb{Z}$, we have
$$d_j(x) = \sum_{k \in \mathbb{Z}} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(x). \qquad (3.7)$$
The details at a given scale are thus linear combinations of the "elementary fluctuations," which are the wavelets related to that scale.

Given two functions $\varphi$ and $\psi$ that satisfy the conditions just described (which are called the father wavelet and mother wavelet, respectively), it is possible to define two quadrature mirror filters $F_0$ and $F_1$ by using the operators $T_0 = DF_0$ and $T_1 = DF_1$. This is done by relating the approximation of the function space $L^2(\mathbb{R})$ that is given by the nested sequence of subspaces $V_j$ to the approximation of the real line $\mathbb{R}$ that is given by the nested sequence of grids $\Gamma_j = 2^{-j}\mathbb{Z}$.
To do this, we consider that the function $\varphi_{j,k}(x) = 2^{j/2}\varphi(2^j x - k)$ is centered around the point $k2^{-j}$, which would be the case if $\varphi$ were an even function. We associate the point $k2^{-j}$ with the function $\varphi_{j,k}$. This gives a correspondence between $\Gamma_j$ and the orthonormal basis $\{\varphi_{j,k},\ k \in \mathbb{Z}\}$ of $V_j$. At the same time, $\ell^2(\Gamma_j)$ is identified isometrically with $V_j$.

To define the operator $T_0 : \ell^2(\Gamma_{j+1}) \to \ell^2(\Gamma_j)$, it is sufficient to define its adjoint $T_0^* : \ell^2(\Gamma_j) \to \ell^2(\Gamma_{j+1})$. It is constructed by starting with the isometric embedding $V_j \subset V_{j+1}$ and by identifying $V_j$ with $\ell^2(\Gamma_j)$ and $V_{j+1}$ with $\ell^2(\Gamma_{j+1})$, as explained above. This adjoint $T_0^*$ is a partial isometry. The orthonormal basis $\psi_{j,k}$ of $W_j$ allows us to identify $W_j$ with $\ell^2(\Gamma_j)$ in the same way. The isometric embedding $W_j \subset V_{j+1}$, interpreted with this identification, becomes the partial isometry $T_1^* : \ell^2(\Gamma_j) \to \ell^2(\Gamma_{j+1})$. (A mapping $T : H_1 \to H_2$ from one Hilbert space to another is called a partial isometry if $\|Tx\|_{H_2} = \|x\|_{H_1}$ for all $x \in H_1$, which is equivalent to saying $T$ preserves inner products; that is, $\langle Tx, Ty \rangle_{H_2} = \langle x, y \rangle_{H_1}$ for all $x, y \in H_1$. "Partial" means there is no assumption that the mapping is onto.)

Finally, the couple $(\varphi, \psi)$ is represented by the couple $(T_0, T_1)$ or, which amounts to the same thing, by the pair $(F_0, F_1)$ of the two quadrature mirror filters. This crucial observation is due to Mallat. Mallat also posed the converse problem: Given two quadrature mirror filters $F_0$ and $F_1$, is it possible to associate with them two functions $\varphi$ and $\psi$ having properties (3.4), (3.5), and (3.6)? Although the converse is incorrect in general, it is correct in numerous cases, and this led to the construction of Daubechies's wavelets. Our first and third examples of quadrature mirror filters show that the converse is generally false. There are no functions $\varphi$ and $\psi$ behind these numerical algorithms. The second example is related to the Haar system.
The function $\varphi$ is the characteristic function of $[0, 1)$, and $V_j$ is composed of step functions that are constant on each interval $[k2^{-j}, (k+1)2^{-j})$, $k \in \mathbb{Z}$. The fourth example leads to Shannon's wavelets. The function $\varphi$ is the cardinal sine defined by $\varphi(x) = \frac{\sin \pi x}{\pi x}$. Finally, the last example is more interesting because both of the functions $\varphi$ and $\psi$ belong to the Schwartz class $\mathcal{S}(\mathbb{R})$, which consists of the infinitely differentiable functions that decrease rapidly at infinity.

In the next section, we are going to pass from the analysis of sampled functions to the analysis of functions defined on $\mathbb{R}$ by passing to the limit in the discrete algorithms. To do this, we will give sufficient conditions on the transfer function $F_0(\theta) = \sqrt{2}\,m_0(\theta)$ to construct a multiresolution analysis starting with two quadrature mirror filters $F_0$ and $F_1$.

3.7 Convergence to wavelets

Before one can restrict a very irregular function $f$ belonging to $L^2(\mathbb{R})$ to a grid $\Gamma = h\mathbb{Z}$, $h > 0$, it is often necessary to smooth the function by filtering. This filtering ought to be done according to specific rules. These rules are designed so that, in the event $f$ is very regular, $f$ can be reconstructed from the sampled version with good accuracy using interpolation. The proper sampling technique is a direct consequence of Shannon's work.
We filter $f$ by forming the convolution $f * g_h$, where $g_h(x) = h^{-1} g(h^{-1}x)$ and where $g$ is chosen so that it and its Fourier transform $\hat{g}$ satisfy the following three conditions:

(1) $g \in C^r$ and $g, g', \ldots, g^{(r)}$ all decrease rapidly at infinity.
(2) $\int g(x)\,dx = 1$.
(3) $\hat{g}(2k\pi) = 0$ for $k \in \mathbb{Z}$, $k \neq 0$.

One can then restrict the filtered signal $f * g_h$ to the grid $h\mathbb{Z}$. We assume that these conditions are satisfied throughout the discussion, and we begin by reconsidering Mallat's "herringbone" algorithm. Start by fixing $f$ in $L^2(\mathbb{R})$ and sample $f$ on the grid $\Gamma_N$ using the preconditioning filter $f \mapsto f * g_N$, where $g_N(x) = 2^N g(2^N x)$. We wish to study the asymptotic behavior of Mallat's algorithm as $N$ tends to infinity.

The limit we are looking for is defined as follows: Fix the index $j$ of the grid $\Gamma_j$. (Starting with $\Gamma_0$, we will look at $\Gamma_1, \Gamma_2, \Gamma_3, \ldots$.) Then we seek the (simple) limits of the sequences $f_N(k), r_N(k), r_{N-1}(2^{-1}k), \ldots, r_{N-j}(2^{-j}k), \ldots$ as $N$ tends to infinity, $j$ and $k$ being fixed. Refer to the "herringbone" scheme (section 3.5 and Figure 3.4) for the definitions of $f_N, r_N, r_{N-1}, \ldots$. Here are the results [210], which do not depend on the choice of $g$ as long as it satisfies the conditions stated above. (We note that this last point will come up again in Chapter 4 when presenting the two-dimensional version of this theorem.)

Theorem 3.2. Assume that the impulse responses of the quadrature mirror filters $F_0$ and $F_1$ decrease rapidly at infinity and that the transfer function $F_0(\theta)$ of $F_0$ satisfies $F_0(0) = \sqrt{2}$ and $F_0(\theta) \neq 0$ if $-\frac{\pi}{2} \leq \theta \leq \frac{\pi}{2}$. Then Mallat's "herringbone" algorithm, applied to $f * g_N$ as indicated above, converges to the analysis of $f$ in an orthonormal wavelet basis. More precisely,
$$\lim_{N\to\infty} f_N(k) = \int f(x)\,\varphi(x - k)\,dx,$$
$$\lim_{N\to\infty} r_N(k) = \int f(x)\,\psi(x - k)\,dx,$$
$$\lim_{N\to\infty} r_{N-j}(2^{-j}k) = \int f(x)\,2^{j/2}\,\psi(2^j x - k)\,dx.$$

Observe that $F_0(0) = \sqrt{2}$ means that $F_0$ is a low-pass filter and that $F_1$ is a high-pass filter.
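The herringbone algorithm that Theorem 3.2 concerns can also be sketched concretely. The following minimal version (Python with NumPy; the Haar pair and the circular boundary handling are our assumptions, made only so the example is self-contained) decomposes a sampled signal into its fluctuations $r_1, \ldots, r_N$ and last trend:

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def step(f):
    """One stage of the herringbone scheme with the Haar pair:
    trend T0 f and fluctuation T1 f, both on the next coarser grid.
    Circular indexing stands in for an infinite grid."""
    fm1 = np.roll(f, 1)
    return ((f + fm1) / SQRT2)[::2], ((f - fm1) / SQRT2)[::2]

def mallat(f, levels):
    """Decompose f into ([r_1, ..., r_levels], last trend)."""
    fluctuations = []
    for _ in range(levels):
        f, r = step(f)
        fluctuations.append(r)
    return fluctuations, f
```

Because each stage is an isometry, the sums of squares of the fluctuations and the last trend add up to the energy of the input; and for a constant signal every fluctuation vanishes, which is the compression phenomenon described in section 3.5.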
The functions $\varphi$ and $\psi$ are, respectively, the father and the mother of the orthonormal wavelet basis, as explained in the preceding section. We will not prove this theorem, but we will discuss the hypotheses and how they relate to this analysis. Assuming that we have arrived at an orthonormal wavelet basis, we know that the function $\varphi$ must satisfy the functional equation
$$\varphi(x) = \sqrt{2}\,\sum_{k \in \mathbb{Z}} h_k\,\varphi(2x - k), \qquad (3.8)$$
where the sequence $\{h_k\}_{k\in\mathbb{Z}}$ is in $\ell^2(\mathbb{Z})$. This follows from the inclusion $V_0 \subset V_1$ and the fact that $\sqrt{2}\,\varphi(2x - k)$, $k \in \mathbb{Z}$, is an orthonormal basis for $V_1$. Another requirement is that $|\hat{\varphi}(0)| = |\int \varphi(x)\,dx| = 1$. This is equivalent to $\bigcup_j V_j$ being dense in $L^2(\mathbb{R})$, although this is not obvious. (The proof of this and of the other assertions in this section can be found in [60].) We assume (multiplying by a constant with modulus one if necessary) that $\int \varphi(x)\,dx = 1$.

Taking the Fourier transform of both sides of (3.8) shows that
$$\hat{\varphi}(\xi) = \Big( 2^{-1/2} \sum_{k \in \mathbb{Z}} h_k\,e^{-ik\xi/2} \Big)\,\hat{\varphi}(2^{-1}\xi). \qquad (3.9)$$
Let $m_0(\xi) = 2^{-1/2} \sum_{k \in \mathbb{Z}} h_k\,e^{-i\xi k}$. Then $m_0$ is a $2\pi$-periodic function in $L^2(0, 2\pi)$. To relate this to the filter, take $f \in L^2(\mathbb{R})$ and write $f_{j,k} = \langle f, \varphi_{j,k} \rangle$. The relation (3.8) implies that
$$f_{j,k} = \sum_{n \in \mathbb{Z}} h_{n-2k}\,f_{j+1,n}. \qquad (3.10)$$
The right-hand side of (3.10) can be interpreted as follows: Pass $f_{j+1,\cdot}$ through the filter $(h_{-n})_{n\in\mathbb{Z}}$ (take the discrete convolution) and then save only the even terms. But this is exactly the original operator $T_0$, and $F_0$ is the filter $(h_{-n})_{n\in\mathbb{Z}}$. Thus $m_0(\xi)$ and the transfer function $F_0(\xi)$ are related as before by $F_0(\xi) = \sqrt{2}\,m_0(\xi)$, and $F_0(\xi) = \sum_k h_{-k}\,e^{ik\xi}$.

The hypothesis that $F_0$ and $F_1$ are quadrature mirror filters means that
$$|m_0(\xi)|^2 + |m_0(\xi + \pi)|^2 = 1 \qquad (3.11)$$
for almost every $\xi$. But the condition in Theorem 3.2 that the impulse response of $F_0$ decreases rapidly at infinity implies that $m_0$ is infinitely differentiable, so (3.11) holds everywhere. The hypothesis $F_0(0) = \sqrt{2}$ implies that $m_0(0) = 1$. Taken together, these hypotheses imply that the infinite product $\prod_{k=1}^{\infty} m_0(2^{-k}\xi)$ converges uniformly on compact sets to an infinitely differentiable function. By iterating (3.9) we have
$$\hat{\varphi}(\xi) = \hat{\varphi}(2^{-N}\xi) \prod_{k=1}^{N} m_0(2^{-k}\xi), \qquad (3.12)$$
and from this it follows that
$$\hat{\varphi}(\xi) = \prod_{k=1}^{\infty} m_0(2^{-k}\xi). \qquad (3.13)$$
Using (3.13) and the fact that $m_0$ is infinitely differentiable, it is not difficult to show that $\varphi$ belongs to $L^2(\mathbb{R})$.
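The infinite product (3.13) is easy to explore numerically. A sketch (Python with NumPy; our choice of the Haar symbol $m_0(\xi) = (1 + e^{-i\xi})/2$ is an assumption made because its scaling function is known in closed form, namely the characteristic function of $[0,1)$ with Fourier transform $(1 - e^{-i\xi})/(i\xi)$):

```python
import numpy as np

def m0_haar(xi):
    """Low-pass symbol of the Haar pair: m0(xi) = (1 + e^{-i xi})/2."""
    return 0.5 * (1.0 + np.exp(-1j * xi))

def phi_hat(xi, terms=50):
    """Truncation of the infinite product (3.13):
    phi_hat(xi) ~ prod_{k=1..terms} m0(2^{-k} xi).
    Since m0(0) = 1, the remaining factors tend to 1 rapidly."""
    p = np.ones_like(xi, dtype=complex)
    for k in range(1, terms + 1):
        p *= m0_haar(xi / 2.0 ** k)
    return p
```

Fifty factors already reproduce the closed-form transform to near machine precision on moderate frequencies, illustrating the uniform convergence on compact sets noted above.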
On the other hand, showing that $\varphi(x - k)$, $k \in \mathbb{Z}$, is an orthonormal sequence depends on the hypothesis that $F_0(\xi)$ does not vanish on $-\frac{\pi}{2} \leq \xi \leq \frac{\pi}{2}$. We note that the condition that $m_0(\xi)$ does not vanish on $[-\frac{\pi}{2}, \frac{\pi}{2}]$ is sufficient but not necessary. Necessary and sufficient conditions were discovered by Albert Cohen and are given in [60]. A beautiful application of this result is the construction of the celebrated bases of Ingrid Daubechies.
3.8 The wavelets of Daubechies

These wavelets depend on an integer $N \geq 1$ that is related to the size of the support of the functions $\varphi$ and $\psi$, which is $[0, 2N-1]$. The Hölder regularity of these functions is also determined by $N$: $\varphi$ and $\psi$ belong to $C^r$, where $r = r(N)$ and $\lim_{N\to+\infty} N^{-1} r(N) = \gamma > 0$. The value of $\gamma$ is about $1/5$. This implies that if a wavelet $\psi$ is to have 10 continuous derivatives, the length of its support must be about 100. The functions $\varphi$ and $\psi$, which ought to be written as $\varphi_N$ and $\psi_N$, are the father and mother of the orthonormal wavelet basis.

To construct this orthonormal basis, Daubechies applies the method of the last section. One starts with the nonnegative trigonometric sum
$$P_N(t) = 1 - c_N \int_0^t (\sin u)^{2N-1}\,du = \sum_{|k| \leq 2N-1} \gamma_k\,e^{ikt},$$
with the constant $c_N > 0$ chosen so that $P_N(\pi) = 0$. There exists (at least one) finite trigonometric sum $m_0(t) = \sum_{k=0}^{2N-1} h_k\,e^{-ikt}$ with real $h_k$ such that $|m_0(t)|^2 = P_N(t)$ and $m_0(0) = 1$. This classical result is known as the Fejér-Riesz lemma; a proof can be found in [73]. The coefficients $h_{-k}$ are the impulse response of the filter $F_0$. Under these conditions, we know from Theorem 3.2 that the functions $\varphi$ and $\psi$ exist and that they form a multiresolution analysis. We now use these results to construct $\varphi$ and $\psi$ explicitly.

The function $\varphi$, which we seek to construct, is given by
$$\hat{\varphi}(\xi) = \prod_{k=1}^{\infty} m_0(2^{-k}\xi). \qquad (3.14)$$
One then shows that $\hat{\varphi}$ and all of its derivatives are in $L^2(\mathbb{R})$. By inverting (3.14) as an infinite convolution of distributions, it is almost obvious that the support of $\varphi$ is in $[0, 2N-1]$. The fact that $\varphi(x - k)$, $k \in \mathbb{Z}$, is an orthonormal sequence is a direct consequence of Theorem 3.2 and the fact that $m_0(t) \neq 0$ on $[-\frac{\pi}{2}, \frac{\pi}{2}]$.

To determine the Fourier transform $\hat{\psi}$ of the wavelet $\psi$, we first define $m_1$ as $m_1(t) = e^{i(1-2N)t}\,\overline{m_0(t + \pi)}$. Then
$$\hat{\psi}(\xi) = m_1(2^{-1}\xi)\,\hat{\varphi}(2^{-1}\xi) = m_1(2^{-1}\xi) \prod_{k=2}^{\infty} m_0(2^{-k}\xi), \qquad (3.15)$$
and the support of $\psi$ is the interval $[0, 2N-1]$.
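For $N = 2$ the resulting taps are known in closed form: $h = (1+\sqrt{3},\ 3+\sqrt{3},\ 3-\sqrt{3},\ 1-\sqrt{3})/(4\sqrt{2})$, the "D4" filter. The quadrature mirror condition (3.11) can then be checked directly (a Python sketch; the normalization $m_0(t) = 2^{-1/2}\sum_k h_k e^{-ikt}$, with the taps summing to $\sqrt{2}$, follows the convention of section 3.7 and is our choice):

```python
import numpy as np

s3 = np.sqrt(3.0)
# Daubechies N = 2 ("D4") filter taps, normalized so that sum(h) = sqrt(2)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))

def m0(t):
    """m0(t) = 2^{-1/2} * sum_k h_k e^{-ikt}, so that m0(0) = 1."""
    k = np.arange(len(h))
    return np.sum(h * np.exp(-1j * k * t)) / np.sqrt(2.0)
```

One finds $|m_0(t)|^2 + |m_0(t+\pi)|^2 = 1$ for every $t$, together with $m_0(0) = 1$ and the zero at $t = \pi$ that makes $F_0$ a low-pass filter.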
If $N = 1$, $\varphi$ is the indicator function of $[0, 1)$, while $\psi(x) = 1$ on $[0, \frac{1}{2})$, $\psi(x) = -1$ on $[\frac{1}{2}, 1)$, and $\psi(x) = 0$ elsewhere. The orthonormal basis $2^{j/2}\psi(2^j x - k)$, $j, k \in \mathbb{Z}$, is then the Haar system.

3.9 Conclusions

The functions $\psi = \psi_N$ used by Daubechies to construct the orthonormal bases named for her are new "special functions." These functions had not appeared in previous work, and their only definition is provided by (3.14) and (3.15). This means that the detour by way of quadrature mirror filters and the corresponding transfer
functions was nearly indispensable. In other words, it would hardly have been possible to discover Daubechies's wavelets by trying to solve directly the existence problem: Is there, for each integer $r \geq 0$, a compactly supported function $\psi$ of class $C^r$ such that $2^{j/2}\psi(2^j x - k)$, $j, k \in \mathbb{Z}$, is an orthonormal basis for $L^2(\mathbb{R})$?

On the other hand, the fast convergence of the wavelet decomposition of a function is directly related to the smoothness of the wavelet used [60]. The "good" quadrature mirror filters are those that lead to smooth wavelets, and this leads to a criterion for the selection of filters that would have been difficult to obtain without the detour through wavelets and functional analysis. Quadrature mirror filters will appear again in the algorithms for numerical image processing, which we describe in the next chapter.

Mallat's algorithm for computing the wavelet coefficients of a function in the case of filters of finite length has come to be known as the fast wavelet transform. If the original signal $X$ has $N$ terms, then the cost of computing its fast wavelet transform, measured by the number of additions and multiplications, is of the order $N$. In contrast, the cost of the fast Fourier transform is of the order $N \log N$.
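The order-$N$ cost follows from the geometric series: each stage works on a signal half as long as the previous one, so the total work is bounded by $N + N/2 + N/4 + \cdots < 2N$ filter applications. A counting sketch (Python; the tap count 4 is an arbitrary stand-in for a finite filter length):

```python
def fwt_ops(n, taps=4):
    """Multiplications used by the cascade on a signal of length n:
    each stage produces n/2 trend and n/2 fluctuation samples, each
    costing 'taps' multiplications, then recurses on the trend."""
    ops = 0
    while n > 1:
        ops += taps * n
        n //= 2
    return ops
```

For any signal length the count stays below $2 \cdot \mathrm{taps} \cdot N$, a linear bound, whereas the fast Fourier transform costs a multiple of $N \log N$.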
CHAPTER 4

Pyramid Algorithms for Numerical Image Processing

4.1 Introduction

In his book Vision [198, p. 51], David Marr wrote that "although the basic elements in our image are the intensity changes, the physical world imposes on these raw intensity changes a wide variety of spatial organizations, roughly independently at different scales," and later [p. 54] we read that "intensity changes occur at different scales in an image, and so their optimal detection requires the use of operators of different sizes." Adelson, Hingorani, and Simoncelli used the same language in [3]: "Images contain information at all scales."

Cartography illustrates this concept very well. Maps contain different information at different scales. For example, it is impossible to plan a trip to visit the Romanesque churches in the Poitou-Charentes region using the map of France found on a globe of the earth. Indeed, the villages where these churches are found do not appear on the global representation, whose scale is of the order 1 to $10^7$. One can only find these villages on maps whose scale is 1 to 200,000 or smaller.

Cartographers have developed conventions for dealing with geographic information by partitioning it into categories that correspond to the different scales, for example, the scales typically used for a city, a department, a region, a country, a continent, and the whole globe, which may range from 1 to 15,000 to 1 to $10^7$. These categories are not entirely independent, and the more important features existing at a given scale are repeated at the next larger scale. Thus, it is sufficient to specify the relations between information given at two adjacent scales to define unambiguously the embedding of the different representations at different scales. Naturally these embedding relations (such as which department belongs to which province, and which province belongs, in turn, to which country, and so on ...)
are available to us from our knowledge of geography; however, they could be discovered by merely examining the maps. We can see from this example the fundamental idea of representing an image by a tree. In the cartographic example, the trunk would be the map of the world. By traveling toward the branches, the twigs, and the leaves, we reach successive maps that cover smaller regions and give more details, details that do not appear at lower levels. To interpret this cartographic representation using the pyramid algorithm, it will be necessary to reverse the roles of top and bottom, since the pyramid algorithm progresses from “fine to coarse.” In cartography, usage and certain conventions determine which details are deleted in going from one scale to another and which significant structures persist across a succession of scales. In this chapter, we are going to describe the pyramid algorithms of Burt and
Adelson, as well as two important modifications derived from them. The purpose of these algorithms is to provide an automatic process, in the context of digital imagery, to calculate the image at scale $2^{j+1}$ from the image at scale $2^j$. If the original image corresponds to a fine grid with $1024 \times 1024$ points, the pyramid algorithm first yields a $512 \times 512$ image, then one $256 \times 256$, next a $128 \times 128$ image, and so on until reaching the absurd (in practice) $1 \times 1$ limit. The interest in pyramid algorithms derives from their iterative structure, which uses results from a given scale $2^j$ to move to the next scale $2^{j+1}$.

Returning to our cartographic example, we suppose that we already have maps of the French departments at a scale of 1 to 200,000. Then it is of no value to refer to the new satellite images to construct a map of France at a scale of 1 to 2,000,000. The information needed to make this new map is already contained in the maps of the departments. The point is that one uses judiciously the work already done without going back to the raw data.

We have just outlined the general philosophy of the pyramid algorithms without, however, describing the algorithms that are used to change scale. How, starting with a very precise representation of the Brittany coast at a scale of 1 to 200,000, can we arrive at a more schematic description at a scale of 1 to 2,000,000 without smoothing or softening too much the myriad details and roughness that characterize the Brittany coastline? The pyramid algorithms of Burt and Adelson [42] and their variants (orthogonal and biorthogonal pyramids) deal with this type of problem. In all cases, this will involve calculating (at each scale) an approximation of a given image by using an iterative algorithm to go from one scale to the next.

4.2 The pyramid algorithms of Burt and Adelson

For the rest of the discussion, $\Gamma_j = 2^{-j}\mathbb{Z}^2$ will denote the sequence of nested grids used for image processing.
It often happens that the image is bounded by the unit square, in which case we will speak of a $512 \times 512$ image to indicate that $j = 9$; similarly, a $1024 \times 1024$ image will correspond to $j = 10$. At this point, we are working with images that are already digitized and appear as numerical functions. The raw image that provided these digital images will be a function $f(x, y)$. This function can be very irregular, either because of noise or because of discontinuities in the image itself. For example, discontinuities are often due to the edges of objects in the image. The sampled images $f_j$ are defined on the corresponding grids $\Gamma_j = 2^{-j}\mathbb{Z}^2$. These sampled images are obtained from the original physical image $f(x, y)$ by the restriction operators $R_j : L^2(\mathbb{R}^2) \to \ell^2(\Gamma_j)$. These operators $R_j$ will be defined below. They are the same type as those used in numerical analysis to discretize an irregular function or distribution.

The fundamental discovery of Burt and Adelson is the existence of restriction operators $R_j$ with the property that, for all initial images $f$, the sampled images $f_j = R_j(f)$ are related to each other by extremely simple algorithms. These algorithms, of the type "fine to coarse," allow $f_{j-1}$ to be calculated directly from $f_j$ without having to go back to the original physical image $f(x, y)$.

To define the restriction operators $R_j$, we first consider the case of a grid given by $x = hk$, $y = hl$, where $h > 0$ is the sampling step and $(k, l) \in \mathbb{Z} \times \mathbb{Z}$. Very irregular functions should not be sampled directly, and, therefore, the image may have to be smoothed before it is discretized. This leads to the classic scheme illustrated in Figure 4.1.
PYRAMID ALGORITHMS FOR NUMERICAL IMAGE PROCESSING 51

Fig. 4.1. $F$ is a low-pass filter prior to the sampling $E$.

To determine the characteristics of the filter $F$, first consider the special case $f(x, y) = \cos(mx + ny + \varphi)$, $m, n \in \mathbb{N}$. To sample this function correctly on $h\mathbb{Z}^2$, the Nyquist condition must be satisfied. This means that $h$ must be less than $\min\{\frac{\pi}{m}, \frac{\pi}{n}\}$ if we wish to be able to reconstruct $f$. Another way to interpret the Nyquist condition is that sampling on $h\mathbb{Z}^2$ will lose all information about frequencies higher than $\frac{\pi}{h}$. For the case at hand, the Nyquist condition comes down to suppressing, through the action of the filter $F$, all the frequencies in $f$ that are greater than $\frac{\pi}{h}$. This is done by smoothing the signal through convolution with $\frac{1}{h^2}\, g\!\left(\frac{x}{h}, \frac{y}{h}\right)$, where $g$ is a sufficiently regular function concentrated around zero. The filtering/sampling scheme maps the physical image $f$ onto a numerical image defined by
$$c(k, l) = \frac{1}{h^2} \iint g\!\left(k - \frac{x}{h},\, l - \frac{y}{h}\right) f(x, y)\,dx\,dy. \qquad (4.1)$$
By writing $\varphi(x, y) = \bar{g}(-x, -y)$ and $\varphi_h(x, y) = \frac{1}{h^2}\,\varphi\!\left(\frac{x}{h}, \frac{y}{h}\right)$, we have
$$c(k, l) = \langle f, \varphi_h(\cdot - kh, \cdot - lh) \rangle, \qquad (4.2)$$
where $\langle u, v \rangle = \iint u(x, y)\,\overline{v(x, y)}\,dx\,dy$ and where $\cdot$ denotes the (dummy) variable of integration. The operator that maps $f$ onto $c(k, l)$ is called the restriction operator and is denoted by $R_h$.

The extension operator enables us to extend a sequence $c(k, l)$ defined on $h\mathbb{Z}^2$ to a regular function on $\mathbb{R}^2$. In this sense, it is inverse to the filtering/sampling operation. We define the extension operator to be the adjoint of the restriction operator; thus it is given by
$$c(k, l) \mapsto \sum_{(k,l) \in \mathbb{Z}^2} c(k, l)\,\varphi(h^{-1}x - k,\, h^{-1}y - l). \qquad (4.3)$$
This is an interpolation operator, which will be denoted by $P_h$.

The simplest examples are given by spline functions. We consider the one-dimensional case to simplify the notation. If we let $\varphi$ be the triangle function $T(x) = \sup(1 - |x|, 0)$, then (4.3) yields the familiar piecewise linear interpolation of a discrete sequence. A second choice is given by $\varphi = T * T$, which is the basic cubic spline.
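In one dimension, the extension operator (4.3) with $\varphi = T$ can be sketched as follows (Python with NumPy; restricting the sum to the finitely many given samples is our simplification of the bi-infinite sum):

```python
import numpy as np

def triangle(x):
    """The triangle function T(x) = sup(1 - |x|, 0)."""
    return np.maximum(1.0 - np.abs(x), 0.0)

def extend(c, h, x):
    """One-dimensional extension operator with phi = T:
    (P_h c)(x) = sum_k c(k) T(x/h - k), i.e. piecewise-linear
    interpolation of the samples c(k) placed at the points kh."""
    k = np.arange(len(c))
    return np.array([np.sum(c * triangle(xi / h - k))
                     for xi in np.atleast_1d(x)])
```

At the grid points $kh$ the interpolant reproduces the samples exactly, and between two neighboring grid points it is the straight-line (linear) interpolation, which is the "familiar" behavior noted above.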
Returning to the general case, we require that the operator $P_h R_h$, composed of the restriction operator followed by the extension operator, has this property: For all functions $f \in L^2(\mathbb{R}^2)$,
$$\|P_h R_h(f) - f\|_{L^2(\mathbb{R}^2)} \to 0 \quad \text{as } h \to 0. \qquad (4.4)$$
By assuming, for example, that $\varphi$ is a continuous function that decreases rapidly at infinity, it is not difficult to show that (4.4) is equivalent to $P_h R_h(\mathbf{1}) = \mathbf{1}$, where $\mathbf{1}$ represents the function identically equal to one. One can also verify that this is equivalent to the Fix and Strang condition [244]:
$$|\hat{\varphi}(0, 0)| = 1, \qquad \hat{\varphi}(2k\pi, 2l\pi) = 0 \quad \text{if } (0, 0) \neq (k, l) \in \mathbb{Z}^2. \qquad (4.5)$$
In what follows, we assume that $\iint \varphi(x, y)\,dx\,dy = 1$, after possibly multiplying $\varphi$ by a constant of modulus one.

We return to the fundamental problem posed by Burt and Adelson. Thus we consider the nested sequence $\Gamma_j = 2^{-j}\mathbb{Z}^2$. These grids become finer as $j \to +\infty$ and coarser as $j \to -\infty$. We begin with a function $\varphi$ that is continuous on $\mathbb{R}^2$ and decreases rapidly at infinity. We also assume, as above, that $\hat{\varphi}(0, 0) = 1$. Denote by $R_j$ and $P_j$ the restriction and extension operators associated with this choice of $\varphi$ and the grid $\Gamma_j$. This means that $h = 2^{-j}$ and that the operators $R_h$ and $P_h$ are denoted by $R_j$ and $P_j$. (From now on, we will mostly use vector notation for the variables in $\mathbb{R}^2$ and $\mathbb{Z}^2$. Thus, $x \in \mathbb{R}^2$ means $x = (x_1, x_2)$, $k \in \mathbb{Z}^2$ means $k = (k_1, k_2)$, and $k \cdot x = k_1 x_1 + k_2 x_2$, and so on. This will be more compact, and it will better reveal the connection with the one-dimensional case.)

Burt and Adelson's basic idea is that, for certain choices of the function $\varphi$, the different sampled images $R_j(f) = f_j$ derived from the same physical image $f$ are necessarily related by extremely simple relations. The dynamic of these relations is from "fine to coarse," which means that a function defined on a fine grid is mapped to one on a coarse grid. To make these relations explicit, we denote by $T_j$ the operators that will eventually be defined by these relations, that is, by $T_j(f_j) = f_{j-1}$, where $f_j = R_j(f)$ and $f_{j-1} = R_{j-1}(f)$. We can summarize this with the two conditions
$$T_j : \ell^2(\Gamma_j) \to \ell^2(\Gamma_{j-1}), \qquad (4.6)$$
$$R_{j-1} = T_j R_j. \qquad (4.7)$$
One might naively think that the operator $T_j$ could be defined by inverting the operator $R_j$ in (4.7). But the operator $R_j$ is a smoothing operator, and its inverse is not defined. In terms of images, it is not generally possible to go from a blurred image back to the original image. This means that, in general, we cannot solve (4.7) by elementary algebra.
On the other hand, once $R_j$ is restricted to an appropriate closed subspace $V_j$ of $L^2(\mathbb{R}^2)$, $R_j : V_j \to \ell^2(\Gamma_j)$ becomes, in certain cases, an isomorphism. Then we can solve (4.7) directly.

Burt and Adelson asked how to determine the functions $\varphi$ such that (4.6) and (4.7) are satisfied. Stated this way, the problem is very difficult, for most of the usual choices of smoothing functions do not have these properties. To resolve this difficulty, Burt and Adelson proceeded the other way around: They sought to construct $\varphi$ from the operators $T_j$. For this it is necessary to derive some consequences of (4.7). The first is that the operator $T_0 : \ell^2(\mathbb{Z}^2) \to \ell^2(2\mathbb{Z}^2)$ can be written as $T_0 = DF_0$, where $F_0 : \ell^2(\mathbb{Z}^2) \to \ell^2(\mathbb{Z}^2)$ is a filter operator and where $D : \ell^2(\mathbb{Z}^2) \to \ell^2(2\mathbb{Z}^2)$ restricts a function defined on $\mathbb{Z}^2$ to $2\mathbb{Z}^2$. $D$ is the decimation operator, which we have already encountered in Chapter 3. That $T_0$ has this form is a consequence of the fact that $T_0$ commutes with all even translations (see Theorem A.1). Thus, if $X = (x(k))_{k \in \mathbb{Z}^2}$ is in $\ell^2(\mathbb{Z}^2)$, we can write
$$T_0(X)(2k) = \sum_{l \in \mathbb{Z}^2} \omega(2k - l)\,x(l), \qquad k \in \mathbb{Z}^2, \qquad (4.8)$$
where $\omega(k)$ is the impulse response of the filter $F_0$. For convenience, we assume (as Burt and Adelson did) that $\omega(k)$ is real.
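Formula (4.8) in one dimension can be sketched as follows (Python with NumPy; the symmetric 5-tap kernel $(\frac14 - \frac{a}{2}, \frac14, a, \frac14, \frac14 - \frac{a}{2})$ with $a = 0.4$ is a standard choice associated with Burt and Adelson's paper, normalized to sum 1, and the circular indexing is our assumption):

```python
import numpy as np

# Symmetric 5-tap kernel with free parameter a = 0.4 (an assumption),
# normalized so that sum(w) = 1; w[j] plays the role of omega(j - 2).
a = 0.4
w = np.array([0.25 - a / 2, 0.25, a, 0.25, 0.25 - a / 2])

def reduce1d(f):
    """One 'fine to coarse' step in the spirit of (4.8): filter with
    the kernel w and keep every other sample. Circular indexing
    stands in for an infinite grid."""
    n = len(f)
    out = np.empty(n // 2)
    for k in range(n // 2):
        out[k] = sum(w[j] * f[(2 * k + j - 2) % n] for j in range(5))
    return out
```

Because the kernel sums to 1, a constant image is mapped to the same constant on the coarser grid; iterating the step yields the sequence of half-size images described in the introduction.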
If we apply $T_0$ to $x(k) = \int f(t)\,\varphi(t - k)\,dt$, then (4.7) and (4.8) imply that
$$\frac{1}{4}\int f(t)\,\varphi(2^{-1}t - k)\,dt = \sum_{l \in \mathbb{Z}^2} \omega(2k - l)\int f(t)\,\varphi(t - l)\,dt = \int f(t)\Big(\sum_{l \in \mathbb{Z}^2} \omega(2k - l)\,\varphi(t - l)\Big)dt.$$
By taking $k = 0$, we conclude that
$$\varphi(t) = 4 \sum_{l \in \mathbb{Z}^2} \omega(l)\,\varphi(2t + l). \qquad (4.9)$$
For practical applications, Burt and Adelson were particularly interested in filters with finite length. This means that $\omega(k) = 0$ if $|k| > N$ for some $N$. By taking Fourier transforms of both sides, (4.9) becomes
$$\hat\varphi(\xi) = m_0(2^{-1}\xi)\,\hat\varphi(2^{-1}\xi), \qquad (4.10)$$
where
$$m_0(\xi) = \sum_{k \in \mathbb{Z}^2} \omega(k)\, e^{i k \cdot \xi}. \qquad (4.11)$$
By iterating (4.10) and passing to the limit (which is possible because $\hat\varphi(0) = 1$ and the filter is finite), we see that
$$\hat\varphi(\xi) = \prod_{j=1}^{\infty} m_0(2^{-j}\xi). \qquad (4.12)$$
The second consequence that we derive from (4.7) is that these conditions for different $j$ are in fact equivalent. This can be seen by making the change of variables $t \mapsto 2^j t$ in (4.9) and integrating both sides against the function $f$. We then have
$$R_{j-1}(f)(2^{-j+1}k) = \sum_{l \in \mathbb{Z}^2} \omega(2k - l)\, R_j(f)(2^{-j}l). \qquad (4.13)$$
In other words, under our assumptions, the operators $T_j$ are defined by
$$T_j(X)(2^{-j+1}k) = \sum_{l \in \mathbb{Z}^2} \omega(2k - l)\, x(2^{-j}l) \qquad (4.14)$$
when $X = (x(2^{-j}l))$ belongs to $l^2(\Gamma_j)$. The point is that the sequence $\omega(k)$, $k \in \mathbb{Z}^2$, is the same for all the operators $T_j$. Working backward, Burt and Adelson began with a finite sequence of coefficients $\omega(k)$ with the property that $\sum \omega(k) = 1$. They defined $m_0$ by (4.11) and then $\hat\varphi$ by (4.12). The first question to arise is whether the right-hand side of (4.12) defines a square-integrable function. If it does, it is taken as the Fourier transform of $\varphi$; the restriction operators $R_j$ are then defined in terms of $\varphi$, and the transition operators $T_j$ are defined by (4.14). In this case, $R_{j-1} = T_j R_j$ for all $j \in \mathbb{Z}$. We have been describing the pyramid algorithms of Burt and Adelson, and much of this description closely resembles what was done in Chapter 3, particularly in sections 3.5, 3.6, and 3.7.
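The infinite product (4.12) can be explored numerically. As a sketch (in one dimension, with our own function names), take the triangle function $\varphi(x) = \max(1 - |x|, 0)$, for which $m_0(\xi) = \cos^2(\xi/2)$ and the product has the closed form $\hat\varphi(\xi) = (\sin(\xi/2)/(\xi/2))^2$; a truncation of (4.12) converges to it very quickly.

```python
import math

def m0(xi):
    # Transfer function for the triangle (hat) function: cos^2(xi/2),
    # which is 2*pi-periodic with m0(0) = 1, as required.
    return math.cos(xi / 2.0) ** 2

def phi_hat(xi, terms=40):
    # Truncation of the infinite product (4.12):
    # phi_hat(xi) ~ prod_{j=1}^{terms} m0(2^{-j} xi)
    prod = 1.0
    for j in range(1, terms + 1):
        prod *= m0(xi / 2.0 ** j)
    return prod

# Closed form for the hat function, used here only as a check:
xi = 3.0
exact = (math.sin(xi / 2.0) / (xi / 2.0)) ** 2
```

The fast convergence reflects the fact that $m_0(2^{-j}\xi) \to m_0(0) = 1$ as $j \to \infty$.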
However, to avoid confusion between the concepts and notation in the two chapters, we list explicitly some of the similarities and differences:
(a) In both chapters, the mappings $T_j : l^2(\Gamma_j) \to l^2(\Gamma_{j-1})$ are all the same except for a change of scale. This is seen explicitly in equation (4.14).
(b) The mappings $T_j$ were all denoted by $T_0$ in Chapter 3.
(c) There were two filters, $F_0$ and $F_1$, and two corresponding mappings, $T_0$ and $T_1$, in Chapter 3. Here, in Chapter 4, only one filter, $F_0$, has appeared, and while the $T_0$ of Chapter 3 corresponds to the $T_0$ of Chapter 4, the same is not true for the two $T_1$'s.
(d) The pyramid algorithm is only a "partial multiresolution analysis." The missing ingredient is orthogonality; so far, we have not encountered the equivalent of equation (3.11). This will appear in section 4.6.

4.3 Examples of pyramid algorithms

Before continuing the presentation of the Burt and Adelson algorithms, we give examples of functions $\varphi$ that illustrate both the existence and the nonexistence of the transition operators. We also give examples of sequences $\omega(k)$ illustrating the existence and nonexistence of the associated function $\varphi$. We begin with two examples where the transition operators do not exist. The first example is the Gaussian $\varphi(x,y) = \frac{1}{\pi}\exp(-x^2 - y^2)$, which plays an important role in Marr's theory of vision (Chapter 8). There are no transition operators in this case because (4.10) implies that $m_0(\xi,\eta) = \exp(-\frac{3}{4}(\xi^2 + \eta^2))$, which is clearly not $2\pi$-periodic in $\xi$ and $\eta$. In the same way, the transition operators do not exist if $\varphi(x,y) = \frac{1}{4}\exp(-|x| - |y|)$. One senses, justifiably, that the existence of transition operators is exceptional. Here, however, is an example where the operators do exist. To simplify the discussion, this example (the spline functions) is presented in one dimension. Let $m > 0$ be an integer and let $\chi$ be the characteristic function of the interval $[0,1]$. Define $\varphi$ to be the convolution product $\chi * \cdots * \chi$, with $m$ convolutions and $m + 1$ factors. Then
$$\hat\varphi(\xi) = \Big(\frac{1 - e^{-i\xi}}{i\xi}\Big)^{m+1},$$
and (4.10) is satisfied with
$$m_0(\xi) = \Big(\frac{1 + e^{-i\xi}}{2}\Big)^{m+1},$$
which is indeed $2\pi$-periodic.
Clearly, there is little chance of finding appropriate $\varphi$ by guessing; the efficient way is to approach the problem from the other direction. Thus, we begin with a sequence of transition operators $(T_j)$, which amounts to a sequence $\omega(k)$, $k \in \mathbb{Z}^2$, and we propose to reconstruct $\varphi$. All the examples that we consider are constructed with separable sequences $\omega(k)$, that is, sequences of the form $\bar\omega(k_1)\bar\omega(k_2)$. The associated function $\varphi$ will then necessarily be of the form $\bar\varphi(x_1)\bar\varphi(x_2)$. We will be discussing $\bar\omega$ and $\bar\varphi$ in the following examples; we are thus in the one-dimensional case, and these are functions of the variables $k \in \mathbb{Z}$ and $x \in \mathbb{R}$, respectively. For the first example, take $\bar\omega(k) = 0$ if $k \neq 0$ and $\bar\omega(0) = 1$. In this case the function $\bar\varphi$ defined by (4.12) is the Dirac measure at $x = 0$, and the restriction operators $R_j : L^2(\mathbb{R}^2) \to l^2(\Gamma_j)$ are no longer defined. In the second example, take $\bar\omega(k) = 0$ except for $k = \pm 1$, and $\bar\omega(\pm 1) = \frac{1}{2}$. From this we can deduce that $m_0(\xi) = \cos\xi$, $\bar\varphi(x) = \frac{1}{2}$ on the interval $[-1,1]$, and
$\bar\varphi(x) = 0$ elsewhere. This choice of $\bar\varphi$, which is (for the moment) perfectly reasonable, will be excluded when we introduce the concept of multiresolution analysis. Burt and Adelson proposed a very original sequence $\bar\omega$, and this will be our third example. Take $\bar\omega(0) = 0.6$, $\bar\omega(\pm 1) = 0.25$, $\bar\omega(\pm 2) = -0.05$, and $\bar\omega(k) = 0$ for $|k| \geq 3$. The corresponding function $\bar\varphi(x)$ is continuous, its support is $[-2,2]$, and it resembles $C\exp(-c|x|)$, $C > 0$, $c > 0$, on this interval. The corresponding algorithm is called a Laplacian pyramid. We shall see this example again when we introduce biorthogonal wavelets at the end of the chapter. The purpose of our last example is to show that the existence of the function $\varphi$ defined by (4.12) is not a stable property, even in the simplest cases. We limit our discussion to sequences $\bar\omega(k)$ that are zero except at $k = 0$ and $k = -1$, with $\bar\omega(0) = p$, $\bar\omega(-1) = q$, $0 < p < 1$, $0 < q < 1$, $p + q = 1$. The choice $p = q = \frac{1}{2}$ leads to a function $\bar\varphi$ that is the characteristic function of $[0,1]$. All other choices imply that the mathematical object on the right-hand side of (4.12) is the Fourier transform of a probability measure $\mu$ that is singular with respect to Lebesgue measure. The support of this probability measure $\mu$ is the interval $[0,1]$. The measure is defined by the following property: if $I$ is a dyadic interval in $[0,1]$, and if $I'$ is the left half of $I$ and $I''$ is the right half, then $\mu(I') = p\,\mu(I)$ and $\mu(I'') = q\,\mu(I)$. This measure is multifractal (see, for example, [7] and Chapters 9 and 10). We drop for the moment the problem of choosing an optimal filter $\omega(k)$, $k \in \mathbb{Z}^2$. Indeed, such a choice must take into consideration the overall objective. Burt and Adelson's objective was image compression. We present their compression algorithm in the next section; after that, we return to the problem of choosing the sequence $\omega(k)$.
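The multiplicative cascade defining $\mu$ is simple to simulate. The following sketch (our own code, not from the text) computes the masses of the dyadic subintervals of $[0,1]$ at a given depth; for $p = q = \frac{1}{2}$ the masses are uniform (Lebesgue measure), while any other $p$ concentrates mass on a small set of intervals, which is the singular behavior described above.

```python
def mu_dyadic(p, depth):
    """Masses mu(I) of the 2**depth dyadic subintervals of [0,1],
    ordered left to right, for the cascade with mu(I') = p*mu(I) on the
    left half and mu(I'') = q*mu(I) on the right half, q = 1 - p."""
    q = 1.0 - p
    masses = [1.0]
    for _ in range(depth):
        # Split each interval: left child gets fraction p, right gets q.
        masses = [m * w for m in masses for w in (p, q)]
    return masses

uniform = mu_dyadic(0.5, 4)     # every interval has mass 1/16
skewed = mu_dyadic(0.7, 8)      # mass piles up near the left endpoint
```

Since $p + q = 1$, the total mass is preserved at every generation; only its distribution degenerates as $p$ moves away from $\frac{1}{2}$.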
4.4 Pyramid algorithms and image compression

Image compression is one of the uses of the pyramid algorithms. Burt and Adelson's algorithm, which we describe in this section, will later be compared with other algorithms (orthogonal pyramids and biorthogonal wavelets) that perform better. All of the pyramid algorithms act on images that are already sampled, never on the original physical image. In other words, the function $\varphi$ we have tried to construct using the sequence $\omega(k)$ is never used. Then why have we investigated its properties? The brief answer is that the regularity (or smoothness) of $\varphi$ influences the efficiency of the compression. More precisely, the regularity is related to the behavior of $m_0$ at $\xi = 0$, and we will see below how this influences compression. (For a full discussion, see [60].) The Burt and Adelson pyramid algorithms use only the transition operators $T_j : l^2(\Gamma_j) \to l^2(\Gamma_{j-1})$. All of these operators are the same, except for a change of scale; therefore, we assume that $j = 0$. The discussion of the algorithm begins with the definition of the trend, and of the fluctuations around this trend, for a sequence $f$ belonging to $l^2(\Gamma_0)$. This trend cannot be $T_0(f)$, because $T_0(f)$ "lives in a different universe" and cannot be compared with $f$ to obtain a fluctuation. To define the trend, it is necessary to leave the coarse grid $2\Gamma_0$, where $T_0(f)$ is defined, and return to the fine grid $\Gamma_0$, where $f$ is defined. This is done by using the adjoint operator $T_0^* : l^2(2\Gamma_0) \to l^2(\Gamma_0)$, and the trend of $f$ is defined by $T_0^* T_0(f)$. We clearly want the trend of very regular functions, such as constants and polynomials of low degree, to coincide with these functions. This leads to the
requirements that $T_0^* T_0(1) = 1$ and, more generally, that $T_0^* T_0(x^p y^q) = x^p y^q$ for all $p, q$ with $p + q \leq N$, some fixed integer $N$. This condition is equivalent to the following: the function $m_0$, defined by (4.11), must vanish, along with all of its derivatives of order less than or equal to $N$, at the points $\varepsilon\pi$, $\varepsilon = (\varepsilon_1, \varepsilon_2)$, $\varepsilon_1, \varepsilon_2 \in \{0,1\}$, with the exception of the origin. At the origin, one must have
$$m_0(\xi) = 1 + \sum_{p+q \geq N+1} c_{p,q}\, \xi_1^p \xi_2^q;$$
that is, $m_0 - 1$ vanishes to order $N$ at the origin. The price that must be paid for these regularity conditions is that the length of the filter $\omega(k)$ must be at least proportional to $N$. Another observation is that the conditions we have just imposed on $m_0$ imply, by (4.12), that $\hat\varphi(2k\pi) = 0$ if $k \in \mathbb{Z}^2$ and $k \neq 0$. But this last condition is the same as (4.5), which, as we have seen, is necessary and sufficient to have $P_h R_h(f) \to f$ in $L^2(\mathbb{R}^2)$ as $h \to 0$. Since $T_0$ is the discrete analogue of the restriction operator $R_h$, and since $T_0^*$ corresponds to the extension operator $P_h$, $T_0^* T_0$ is the "discrete approximation" operator corresponding to the continuous approximation operator $P_h R_h$. The fluctuation around the trend is $f - T_0^* T_0(f)$ when $f$ belongs to $l^2(\Gamma_0)$. This fluctuation is zero whenever $f$ is a polynomial of degree no greater than $N$, and one can easily deduce from this that the fluctuation will be very weak in all regions where the image is very regular, since "regular" means being close to a polynomial (recall (2.5)). As we will see, this last property is the key to the success of the Burt and Adelson algorithm. The trend and the fluctuation of a sequence $f$ belonging to $l^2(\Gamma_j)$ are defined by a simple change of scale: the trend is $T_j^* T_j(f)$, and the fluctuation is $f - T_j^* T_j(f)$. If the sequence $f$ is the restriction to the grid $\Gamma_j$ of a function $F$ that has $N+1$ continuous derivatives in some open region $\Omega$, then
$$|f - T_j^* T_j(f)| \leq C\, 2^{-(N+1)j} \qquad (4.15)$$
at all the points of this region. This means that the Burt and Adelson algorithm becomes more effective as $N$ increases.
To define the coding and compression algorithm of Burt and Adelson, we begin with the fine grid $\Gamma_m = 2^{-m}\mathbb{Z}^2$ and a numerical image $f_m$ sampled on this fine grid. This numerical image is, in fact, the restriction to $\Gamma_m = 2^{-m}\mathbb{Z}^2$ of a physical image $f \in L^2(\mathbb{R}^2)$. This means that $f_m$ is the restriction, in the usual sense, of the convolution product $f * g_m$, where $g_m(x) = 4^m g(2^m x)$. The properties of the function $g$ were indicated in section 4.2. However, it is not necessary to return to the "physical image" $f$ to use the algorithm. Burt and Adelson replace $f_m$ by the couple (trend, fluctuation). But the trend, which is given by $T_m^* T_m(f_m)$, is completely determined by $T_m(f_m)$. This means that the trend $T_m^* T_m(f_m)$ can be coded by retaining one pixel in four, and this coding is given by $T_m(f_m)$. In summary, Burt and Adelson code $f_m$ with the couple $[T_m(f_m),\, f_m - T_m^* T_m(f_m)]$. The fluctuation, denoted by $r_m$, is not processed further. They write $f_{m-1} = T_m(f_m)$ and iterate the procedure: $f_{m-1}$ is coded by $(f_{m-2}, r_{m-1})$, where $f_{m-2} = T_{m-1}(f_{m-1})$ and $r_{m-1} = f_{m-1} - T_{m-1}^* T_{m-1}(f_{m-1})$. If we suppose that the starting image $f_m$ is defined on a square of side 1, then the algorithm stops on reaching the summit of the pyramid, which is the grid $\Gamma_0$.
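One coding step, and its inverse, can be sketched in one dimension with the two-tap averaging filter $\bar\omega(0) = \bar\omega(-1) = \frac{1}{2}$ (an example from section 4.3). The code below is ours, not the book's; in particular, our normalization of the adjoint carries a factor of 2 for the change of grid density, so that the trend of a constant signal is that same constant. Conventions for the adjoint vary.

```python
def analyze(f):
    """One Burt-Adelson step: return (coarse, fluctuation) for an
    even-length signal f, using the two-tap averaging filter."""
    coarse = [(f[2 * k] + f[2 * k + 1]) / 2.0 for k in range(len(f) // 2)]
    trend = []
    for c in coarse:            # adjoint step: spread each coarse sample
        trend += [c, c]         # back over the pair it came from
    fluct = [a - b for a, b in zip(f, trend)]
    return coarse, fluct

def synthesize(coarse, fluct):
    """Exact reconstruction: f = (adjoint of coarse) + fluctuation."""
    trend = []
    for c in coarse:
        trend += [c, c]
    return [t + r for t, r in zip(trend, fluct)]

f = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
coarse, fluct = analyze(f)
```

Note the bookkeeping: a signal of length $n$ is replaced by $n/2$ coarse samples plus $n$ fluctuation values, which is exactly the data expansion criticized in section 4.6; the scheme pays off only when most fluctuation values quantize to zero.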
The image $f_m$ is coded by the sequence $(f_0, r_1, \ldots, r_m)$, where $f_0$, defined on $\Gamma_0$, is a scalar and where the $r_j = (I - T_j^* T_j) f_j$, $1 \leq j \leq m$, are the different fluctuations. The diagram in Figure 4.2 gives a schematic description of the algorithm.

Fig. 4.2. Burt and Adelson's algorithm. [Schematic: $f_m \xrightarrow{T_m} f_{m-1} \xrightarrow{T_{m-1}} f_{m-2} \to \cdots \xrightarrow{T_1} f_0$; at each stage the fluctuation $r_j = (I - T_j^* T_j) f_j$ is split off and retained.]

Interest in this coding scheme is based on the following two properties: (1) Going from $f_j$ to $f_{j-1}$ reduces the data one must deal with by a factor of four. Indeed, $f_j$ is defined on $\Gamma_j$ and $f_{j-1}$ on $\Gamma_{j-1}$, which has one-fourth as many points. (2) In many cases, the terms of the fluctuation vectors $r_j$ are so small that, after quantization, they are replaced by zero. Condition (2) is satisfied in regions where the image is sufficiently regular. Running the algorithm the other way, that is, reconstructing $f_m$ from the code, is easy. Begin with $f_0$ and $r_1$, and compute $f_1 = T_1^* f_0 + r_1$. In the same way, reconstruct $f_2$ by $f_2 = T_2^* f_1 + r_2$, and continue until $f_m$ is recovered. As we will see in section 4.6, this algorithm provides good compression only if most of the fluctuations are small, and hence quantized to zero. Otherwise it is inefficient, since essential information about the image is represented by too much data.

4.5 Pyramid algorithms and multiresolution analysis

Before leaving the first version of Burt and Adelson's algorithms, we describe a continuous version. The interplay between the discrete algorithms and their continuous versions, which is implicit in the work of Burt and Adelson, was made explicit by Stephane Mallat and Yves Meyer. We consider the general case, because dimension two plays no particular role in the following definition.
A multiresolution analysis of $L^2(\mathbb{R}^n)$ is an increasing sequence $(V_j)_{j \in \mathbb{Z}}$ of closed subspaces of $L^2(\mathbb{R}^n)$ having the following three properties:
(1) $\bigcap_{-\infty}^{+\infty} V_j = \{0\}$ and $\bigcup_{-\infty}^{+\infty} V_j$ is dense in $L^2(\mathbb{R}^n)$.
(2) For all functions $f \in L^2(\mathbb{R}^n)$ and all integers $j \in \mathbb{Z}$, $f(x) \in V_0$ is equivalent to $f(2^j x) \in V_j$.
(3) There exists a function $\varphi(x) \in V_0$ such that the sequence $\varphi(x - k)$, $k \in \mathbb{Z}^n$, is a Riesz basis for $V_0$.
Recall that a Riesz basis of a Hilbert space $H$ is, by definition, the image $(e_j)_{j \in J}$ of a Hilbert basis $(f_j)_{j \in J}$ of $H$ under an isomorphism $T : H \to H$. (Note that $T$ is not necessarily an isometry.) Each vector $x \in H$ then has a unique series decomposition
$$x = \sum_{j \in J} \alpha_j e_j, \quad \text{where} \quad \sum_{j \in J} |\alpha_j|^2 < \infty. \qquad (4.16)$$
Furthermore, $\alpha_j = (x, e_j^*)$, where $e_j^* = (T^*)^{-1}(f_j)$ is the dual basis of $e_j$, and this dual basis is itself a Riesz basis. The two systems $(e_j)$ and $(e_j^*)$ are said to be biorthogonal. This is the abstract concept that leads to the development of biorthogonal wavelets (section 4.7). The regularity of a multiresolution analysis is given by the regularity of the functions belonging to $V_0$. To measure this regularity, we introduce an integer $r$ that can take the values $0, 1, 2, \ldots$, and even $+\infty$. The multiresolution analysis is said to be $r$-regular if it is possible to choose the function $\varphi$ in (3) so that
$$|\partial^\alpha \varphi(x)| \leq C_m (1 + |x|)^{-m} \qquad (4.17)$$
for all integers $m \geq 0$ and all $x \in \mathbb{R}^n$, where $\alpha = (\alpha_1, \ldots, \alpha_n)$ is a multi-index satisfying $\alpha_1 + \cdots + \alpha_n \leq r$ and where
$$\partial^\alpha = \Big(\frac{\partial}{\partial x_1}\Big)^{\alpha_1}\Big(\frac{\partial}{\partial x_2}\Big)^{\alpha_2}\cdots\Big(\frac{\partial}{\partial x_n}\Big)^{\alpha_n}.$$
We return to the two-dimensional case. Here, a multiresolution analysis is, in a certain sense, a particular case of a pyramid algorithm. Indeed, suppose that the function $\varphi$, which is defined by (4.12), has the following additional property: there exist two constants $C_2 \geq C_1 > 0$ such that for all scalar sequences $(a_k)_{k \in \mathbb{Z}^2}$,
$$C_1 \Big(\sum_{k \in \mathbb{Z}^2} |a_k|^2\Big)^{1/2} \leq \Big\| \sum_{k \in \mathbb{Z}^2} a_k\, \varphi(x - k) \Big\|_2 \leq C_2 \Big(\sum_{k \in \mathbb{Z}^2} |a_k|^2\Big)^{1/2}. \qquad (4.18)$$
Let $V_0$ denote the closed linear subspace of $L^2(\mathbb{R}^2)$ generated by the functions $\varphi(x - k)$, $k \in \mathbb{Z}^2$. Relation (4.18) implies that $\varphi(x - k)$, $k \in \mathbb{Z}^2$, is a Riesz basis for $V_0$. One can verify that the conditions in (1) hold and that $V_j \subset V_{j+1}$ when the $V_j$ are defined by (2). The pyramid algorithms associated with multiresolution analyses are the only ones that we will study in the following sections. They have some remarkable properties.
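Before continuing, we note that the Riesz bounds in (4.18) can be computed in practice: they are the extreme values of the periodized symbol $\sum_k |\hat\varphi(\xi + 2k\pi)|^2$. The sketch below (our code, in one dimension) does this for the triangle function, whose symbol is known in closed form to be $(2 + \cos\xi)/3$, so the bounds are $C_1^2 = \frac{1}{3}$ and $C_2^2 = 1$.

```python
import math

def phi_hat(xi):
    # Fourier transform of the hat function phi(x) = max(1 - |x|, 0).
    if xi == 0.0:
        return 1.0
    return (math.sin(xi / 2.0) / (xi / 2.0)) ** 2

def gram_symbol(xi, terms=2000):
    # Truncation of sum_k |phi_hat(xi + 2*pi*k)|^2; its infimum and
    # supremum over xi give the squared Riesz bounds C1^2, C2^2 in (4.18).
    return sum(phi_hat(xi + 2.0 * math.pi * k) ** 2
               for k in range(-terms, terms + 1))
```

Since the symbol stays between $\frac{1}{3}$ and $1$, the integer translates of the hat function form a Riesz basis (but not an orthonormal one, since the symbol is not constant).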
For example, the restriction operator $R_j$ is an isomorphism between $V_j$ and $l^2(\Gamma_j)$, and the equation $R_{j-1} = T_j R_j$ can be solved directly. In fact, it is sufficient to restrict the two sides of $R_{j-1} = T_j R_j$ to $V_j$ to invert $R_j$. Not all pyramid algorithms are related to a multiresolution analysis. A counterexample is given by one of the pyramid algorithms presented in section 4.3. In this example, $\varphi(x) = \frac{1}{2}$ on $[-1,1]$ and $\varphi(x) = 0$ elsewhere. Thus,
$$\big\| \varphi(x) - \varphi(x-1) + \varphi(x-2) - \cdots + (-1)^N \varphi(x - N) \big\|_2 = \frac{1}{\sqrt{2}},$$
whereas, according to (4.18), it should be of the order of $\sqrt{N}$.

4.6 The orthogonal pyramids and wavelets

Shortly after the discovery of quadrature mirror filters by Esteban and Galand, Woods and O'Neil had the idea of applying this technique to image processing [263].
They thus obtained the first example of an orthogonal pyramid. We set aside, for the moment, the specific construction carried out by Woods and O'Neil using separable filters. We will first present the notion of an orthogonal pyramid in complete generality, and then we will return to the particular case where quadrature mirror filters appear in the construction. The Burt and Adelson algorithm is not particularly efficient, because it replaces information coded on $N^2$ pixels by new information whose description requires $\frac{5}{4}N^2$ pixels. This criticism, which we will analyze in a moment, is not always justified in applications: in many examples of real images, most of the fluctuation values are in fact small, and thus quantized to zero, so the unfavorable pixel count, where $N^2$ becomes $\frac{5}{4}N^2$, rarely matters. Let us examine, however, why information has been wasted or, more precisely, where the inefficient coding occurs. At the start, the image $f$ is coded on $N^2$ pixels. Next, we replace this by the couple $[T_0(f),\,(I - T_0^* T_0)(f)]$, which is composed of the coding of the trend and the complete description of the fluctuations around the trend. The description of $T_0(f)$ requires $\frac{1}{4}N^2$ pixels, whereas the description of $f - T_0^* T_0(f)$ still requires $N^2$ pixels. In all, we use $N^2 + \frac{1}{4}N^2$ pixels. At the next step, the pixel count becomes $N^2 + \frac{1}{4}N^2 + \frac{1}{16}N^2$, and so on. At the end, we will have used $N^2 + \frac{1}{4}N^2 + \frac{1}{16}N^2 + \cdots + 1$, or approximately $\frac{4}{3}N^2$ pixels. The "wasted" pixels appear because the fluctuations $f - T_j^* T_j(f)$ have not been coded efficiently. Orthogonal pyramids are a particular class of pyramid algorithms that code the fluctuations with $\frac{3}{4}N^2$ pixels. With this scheme, there is no waste.
When the original image $f$ is replaced by the coded trend and fluctuations, the required pixel counts are $\frac{1}{4}N^2$ and $\frac{3}{4}N^2$, respectively, and the volume of data remains constant. A pyramid algorithm is said to be orthogonal if the trend $T_0^* T_0(f)$ and the fluctuation $f - T_0^* T_0(f)$ around this trend are orthogonal for each image $f \in l^2(\Gamma_0)$. Let $H = l^2(\Gamma_0)$, $H_0 = T_0^* T_0(H)$, and $H_1 = (I - T_0^* T_0)H$. If the pyramid algorithm is orthogonal, then $H = H_0 \oplus H_1$. Since the dimension of $H_0$ is a quarter that of $H$, the dimension of $H_1$ is $\frac{3}{4}\dim H$, as mentioned. An equivalent definition of orthogonal pyramids requires the adjoint $T_0^*$ of the operator $T_0$ to be a partial isometry, which means that $\|T_0^*(g)\| = \|g\|$ for all $g \in l^2(\Gamma_{-1})$. (Recall that $T_0^*$ is defined on $l^2(\Gamma_{-1})$ with values in $l^2(\Gamma_0)$.) This takes us back to one of the characteristic properties, in dimension one, of the low-pass filter $T_0$ in a pair of quadrature mirror filters $(T_0, T_1)$. This observation prompts us, in dimension two, to look for the corresponding second filter, $T_1$. We will see in a moment that three filters are necessary in two dimensions. But first, we show how to construct some orthogonal pyramid algorithms. We return to the transfer function $m_0$ defined by (4.11). The pyramid algorithm is orthogonal if and only if
$$|m_0(\xi,\eta)|^2 + |m_0(\xi + \pi, \eta)|^2 + |m_0(\xi, \eta + \pi)|^2 + |m_0(\xi + \pi, \eta + \pi)|^2 = 1.$$
This condition is completely analogous to the one on the transfer function $m_0$ in the case of two quadrature mirror filters (section 3.3). Continuing this comparison, we consider the function $\varphi$ in $L^2(\mathbb{R}^2) \cap L^1(\mathbb{R}^2)$ defined by (4.12) and normalized by $\iint \varphi(x)\,dx = 1$. We might expect that the sequence $\varphi(x - k)$, $k \in \mathbb{Z}^2$, is orthonormal, and this is true in many cases. However, the proof involves a delicate limit process, passing from the discrete to the
continuous, and some orthogonal pyramids do not lead to orthonormal sequences of functions $\varphi(x - k)$, $k \in \mathbb{Z}^2$. This difficulty already appeared in dimension one for the quadrature mirror filters. The condition we assume here on $m_0$, which is sufficient to allow passage from the discrete to the continuous, is the analogue of the condition we used in dimension one. It is sufficient to assume that $m_0$ is smooth and that $m_0(\xi,\eta) \neq 0$ if $-\frac{\pi}{2} \leq \xi \leq \frac{\pi}{2}$ and $-\frac{\pi}{2} \leq \eta \leq \frac{\pi}{2}$. Then $\varphi(x - k)$, $k \in \mathbb{Z}^2$, is an orthonormal basis of a closed subspace $V_0$ of $L^2(\mathbb{R}^2)$. By dilation, we see that $2^j \varphi(2^j x - k)$, $k \in \mathbb{Z}^2$, is an orthonormal basis for the subspace $V_j$. Furthermore, the extension operator $P_j : l^2(\Gamma_j) \to V_j$ is an isometric isomorphism, and the restriction operator $R_j : L^2(\mathbb{R}^2) \to l^2(\Gamma_j)$ decomposes into the orthogonal projection from $L^2(\mathbb{R}^2)$ onto $V_j$ followed by the inverse isomorphism $P_j^{-1} : V_j \to l^2(\Gamma_j)$. (Recall that the extension operator $P_j$ and the restriction operator $R_j$ were defined on page 51.) This allows us to define the transition operators $T_j : l^2(\Gamma_j) \to l^2(\Gamma_{j-1})$ explicitly. (They had been defined implicitly by $T_j R_j = R_{j-1}$.) Use the operator $P_j$ to identify $l^2(\Gamma_j)$ with $V_j$, and similarly use $P_{j-1}$ to identify $l^2(\Gamma_{j-1})$ with $V_{j-1}$. Having made these identifications, the transition operator $T_j : l^2(\Gamma_j) \to l^2(\Gamma_{j-1})$ corresponds to the orthogonal projection $\Pi_{j-1}$ of $V_j$ onto $V_{j-1}$; that is, $T_j = P_{j-1}^{-1}\,\Pi_{j-1}\,P_j$ in our notation. We define $W_j$ to be the orthogonal complement of $V_j$ in $V_{j+1}$. Thus, $V_{j+1} = V_j \oplus W_j$. It is easy to verify, by once again using the isometric identifications given by $P_j : l^2(\Gamma_j) \to V_j$ and $P_{j+1} : l^2(\Gamma_{j+1}) \to V_{j+1}$, that this orthogonal decomposition corresponds precisely to the orthogonal decomposition of a function into its trend and fluctuation, and this latter decomposition is the definition of orthogonal pyramids. We come now to the two-dimensional generalization of the quadrature mirror filters. In dimension two, we consider four operators $T_0$, $S_1$, $S_2$, and $S_3$.
All four are defined on $l^2(\mathbb{Z}^2)$ with values in $l^2(2\mathbb{Z}^2)$. We require that these four operators commute with the even translations $\tau \in 2\mathbb{Z}^2$ and that
$$\|f\|^2 = \|T_0(f)\|^2 + \|S_1(f)\|^2 + \|S_2(f)\|^2 + \|S_3(f)\|^2 \qquad (4.19)$$
for all $f$ belonging to $l^2(\mathbb{Z}^2)$. The left-hand side is of course computed in $l^2(\mathbb{Z}^2)$, whereas each term on the right is computed in $l^2(2\mathbb{Z}^2)$. One of the important results in the theory of orthogonal pyramids is the existence of the operators $S_1$, $S_2$, and $S_3$ and the ability to construct them. Furthermore, if the impulse response $\omega(k)$ of $T_0$ decreases rapidly at infinity, the operators $S_1$, $S_2$, and $S_3$ can be constructed to have this same property. Once $S_1$, $S_2$, and $S_3$ are constructed, we can construct the corresponding wavelets $\psi_1$, $\psi_2$, and $\psi_3$. Assuming that $m_0(\xi,\eta) \neq 0$ if $|\xi| \leq \frac{\pi}{2}$ and $|\eta| \leq \frac{\pi}{2}$, we define these three wavelets by
$$\psi_j(x) = 4 \sum_{k \in \mathbb{Z}^2} \omega_j(k)\, \varphi(2x + k), \qquad j = 1, 2, \text{ or } 3, \qquad (4.20)$$
where $\omega_j(k)$ denotes the impulse response of $S_j$. Thus, under quite general conditions, the orthogonal pyramids lead to orthonormal wavelet bases, and this development proceeds by way of the two-dimensional generalization of quadrature mirror filters.
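The orthogonality condition on $m_0$ stated above can be checked numerically for a concrete filter. As a sketch (our code), take the separable "Haar" choice $m_0(\xi,\eta) = m(\xi)\,m(\eta)$ with $m(\xi) = (1 + e^{i\xi})/2$; the sum of $|m_0|^2$ over the four quadrant shifts is then identically 1.

```python
import math

def m0(xi, eta):
    # Separable Haar-type transfer function m(xi) * m(eta),
    # with m(t) = (1 + e^{it}) / 2, so m(0) = 1 and |m(t)|^2 = cos^2(t/2).
    m = lambda t: complex(0.5 + 0.5 * math.cos(t), 0.5 * math.sin(t))
    return m(xi) * m(eta)

def quadrant_sum(xi, eta):
    # Left-hand side of the orthogonality condition:
    # sum over the shifts (0,0), (pi,0), (0,pi), (pi,pi).
    pi = math.pi
    return sum(abs(m0(xi + a, eta + b)) ** 2
               for a in (0.0, pi) for b in (0.0, pi))
```

The identity follows here from $\cos^2(t/2) + \sin^2(t/2) = 1$ in each variable; for a general filter it is a genuine constraint on the coefficients $\omega(k)$.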
We move on to the two-dimensional generalization of Mallat's algorithm. The exact reconstruction identity
$$I = T_0^* T_0 + S_1^* S_1 + S_2^* S_2 + S_3^* S_3 \qquad (4.21)$$
is deduced from (4.19). Identity (4.21) provides a particularly elegant solution to the problem of coding the fluctuation $f - T_0^* T_0(f)$. This fluctuation is exactly $S_1^* S_1(f) + S_2^* S_2(f) + S_3^* S_3(f)$. The three operators $S_1^*$, $S_2^*$, and $S_3^*$ are partial isometries, and this allows us to code the fluctuation $f - T_0^* T_0(f)$ with the three sequences $S_1(f)$, $S_2(f)$, $S_3(f)$. These three sequences belong to $l^2(2\Gamma_0)$ when $f \in l^2(\Gamma_0)$, and thus the coding of each of them uses only one pixel out of four. Hence, three-fourths of the pixels are used to code the fluctuation, whereas one-fourth of them are used to code the trend. Consequently, there are no longer any wasted pixels. We can now return to the algorithm and give it a much more precise formulation. This is illustrated in Figure 4.3, where $T_j(f_j) = f_{j-1}$ and $S_i(f_j) = s_{j-1,i}$, with $i = 1, 2,$ or $3$.

Fig. 4.3. Two-dimensional generalization of Mallat's algorithm. [Schematic: at each step, $f_j$ is mapped to the coarser trend $f_{j-1}$ and to the three detail sequences $s_{j-1,1}$, $s_{j-1,2}$, $s_{j-1,3}$.]

The wavelets appear in the asymptotic limit of this scheme, which is the two-dimensional analogue of Figure 3.4. The limit is taken on the number of steps $m$, which must tend to infinity. We start with a fixed function $f$ belonging to $L^2(\mathbb{R}^2)$. We restrict $f$ to the fine grid $\Gamma_m$ using the classic scheme. This means we have a fixed regular function $g$ that decreases rapidly at infinity and whose Fourier transform $\hat g$ satisfies $\hat g(0) = 1$ and $\hat g(2k\pi) = 0$ if $k \neq 0$. We write $g_m(x) = 4^m g(2^m x)$. Finally, $f_m$ is the restriction to the (fine) grid $\Gamma_m$ of the (filtered) function $f * g_m$. We emphasize that what follows does not depend on the particular $g$ that is used. (Recall that for the development of the pyramid algorithms of Burt and Adelson in section 4.2, we had $g(x,y) = \varphi(-x,-y)$. That is not the case here; $g$ and $\varphi$
are completely independent, as were $g$ and the filters in Theorem 3.2.) If we still assume that $m_0$ does not vanish on $[-\frac{\pi}{2}, \frac{\pi}{2}] \times [-\frac{\pi}{2}, \frac{\pi}{2}]$ and that the pyramid is orthogonal as defined by (4.19), then Mallat's algorithm converges as the number of steps tends to infinity. The limit of this process is another algorithm, namely, the decomposition of the original function $f$ in the orthonormal basis composed of the following four families:
$$\varphi(x - k), \quad 2^j \psi_1(2^j x - k), \quad 2^j \psi_2(2^j x - k), \quad 2^j \psi_3(2^j x - k),$$
where $x \in \mathbb{R}^2$, $k \in \mathbb{Z}^2$, $j \in \mathbb{N}$.
This means that if we fix the index $j$ of the grid $\Gamma_j$, and if we examine the "outputs" of Mallat's algorithm that are defined on this grid, then the limits of their coefficients are, respectively,
$$2^j\!\int f(x)\,\varphi(2^j x - k)\,dx, \quad 2^j\!\int f(x)\,\psi_1(2^j x - k)\,dx, \quad 2^j\!\int f(x)\,\psi_2(2^j x - k)\,dx, \quad 2^j\!\int f(x)\,\psi_3(2^j x - k)\,dx.$$
Albert Cohen established this result under very general hypotheses [60]; of these, the most convenient is that $m_0(\xi,\eta) \neq 0$ if $|\xi| \leq \frac{\pi}{2}$ and $|\eta| \leq \frac{\pi}{2}$. The beauty of this theory leads one to think that it provides the correct response to the image-processing problem. Indeed, the image is decomposed by wavelet analysis into information that is independent (orthogonal) from one scale to another, and this agrees with the general philosophy expressed in the introduction. These independent packets of information are represented by the trend in $V_0$ and the fluctuations $f_j \in W_j$, whose orthogonal sum is equal to $f$. The characteristic scale of $W_j$ is $2^{-j}$, and each $f_j \in W_j$ is itself decomposed into orthogonal components according to the basis $2^j \psi(2^j x - k)$, $k \in \mathbb{Z}^2$, $\psi = \psi_1, \psi_2,$ or $\psi_3$. The Haar system provides the simplest example of two-dimensional orthogonal wavelets. This version is constructed as follows: let $\varphi$ and $\psi$ be the one-dimensional Haar functions; then $\varphi(x,y) = \varphi(x)\varphi(y)$, $\psi_1(x,y) = \varphi(x)\psi(y)$, $\psi_2(x,y) = \psi(x)\varphi(y)$, and $\psi_3(x,y) = \psi(x)\psi(y)$. This system has been used for image processing for a long time, and it is still used in astronomy (Chapter 12). However, the Haar system has the disadvantage that, following quantization, it introduces rather harsh edge effects, producing unpleasant images (see Figure 2.1). This prompts us to say a few words about the quantization problem. If we stay in the $L^2$ setting, all orthonormal bases allow the signal to be reconstructed exactly. This is not the point of view of the numerical analyst or image specialist. In practice, the coefficients of the decomposition must be quantized, whether we like it or not.
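One step of the orthogonal pyramid built on the two-dimensional Haar system can be sketched directly: on each $2 \times 2$ block, the four outputs are the trend and the horizontal, vertical, and diagonal fluctuations, and the energy identity (4.19) holds exactly. The code below is our own blockwise implementation, with the orthonormal $\frac{1}{2}$ normalization.

```python
def haar2d_step(img):
    """One level of the 2D orthogonal Haar pyramid. img is a list of
    rows with even dimensions; returns (trend, horiz, vert, diag)."""
    a, h, v, d = [], [], [], []
    for i in range(0, len(img), 2):
        ra, rh, rv, rd = [], [], [], []
        for j in range(0, len(img[0]), 2):
            p, q = img[i][j], img[i][j + 1]
            r, s = img[i + 1][j], img[i + 1][j + 1]
            ra.append((p + q + r + s) / 2.0)   # trend (T0)
            rh.append((p - q + r - s) / 2.0)   # horizontal detail (S1)
            rv.append((p + q - r - s) / 2.0)   # vertical detail (S2)
            rd.append((p - q - r + s) / 2.0)   # diagonal detail (S3)
        a.append(ra); h.append(rh); v.append(rv); d.append(rd)
    return a, h, v, d

def energy(m):
    return sum(x * x for row in m for x in row)

img = [[1.0, 2.0], [3.0, 4.0]]
a, h, v, d = haar2d_step(img)
```

Each of the four outputs lives on a grid with one-fourth as many points, so the total data volume is unchanged: one-fourth for the trend, three-fourths for the fluctuations, with no wasted pixels.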
These approximations arise from the machine accuracy or are imposed by a desire to compress the data. If it is true that $f(x,y) = \sum \alpha_\lambda \psi_\lambda(x,y)$, what happens to $f$ if the $\alpha_\lambda$ are replaced by coefficients $\alpha_\lambda'$ satisfying $|\alpha_\lambda - \alpha_\lambda'| \leq \varepsilon$, where $\varepsilon > 0$ is related to the machine precision? If we use a discontinuous wavelet, one bad thing that happens is that spurious edges appear; even though the $L^2$ error is small, the visual effect can be very disturbing (see Figure 2.1). The use of smooth wavelets produces a much better result. In spite of this, orthogonal wavelets (and the corresponding pyramid algorithms) have not completely satisfied the experts in image processing. One criticism is the lack of symmetry: the function $\varphi$ ought to be even, while the function $\psi$ ought to be symmetric in the sense that $\psi(1 - x) = \psi(x)$. These properties are satisfied by certain orthogonal wavelets, but they do not hold for wavelets with compact support; the Haar system, which is antisymmetric about $x = \frac{1}{2}$, is the only exception. This lack of symmetry leads to visible defects, again following quantization. These defects do not appear when one uses symmetric, biorthogonal wavelets having compact support. We introduce these wavelets in the next section.
4.7 Biorthogonal wavelets

Following the pioneering work of Philippe Tchamitchian [246], Albert Cohen, Ingrid Daubechies, and Jean-Christophe Feauveau [57] studied a remarkable generalization of the notion of orthonormal wavelet bases, namely, biorthogonal systems of wavelets (see also [107]). We begin with the one-dimensional case. In place of an orthonormal basis of the form $2^{j/2}\psi(2^j x - k)$, $j,k \in \mathbb{Z}$, we use two Riesz bases, each the dual of the other, denoted by $\psi_{j,k}$ and $\tilde\psi_{j,k}$. The first is used for synthesis, and the second is used for analysis. This means that for all $f$ belonging to $L^2(\mathbb{R})$,
$$f(x) = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \alpha_{j,k}\,\psi_{j,k}(x), \qquad (4.22)$$
where $\|f\|_2$ and $\big(\sum_{j=-\infty}^{\infty}\sum_{k=-\infty}^{\infty} |\alpha_{j,k}|^2\big)^{1/2}$ are equivalent norms on $L^2(\mathbb{R})$ and where the coefficients are defined by
$$\alpha_{j,k} = \int f(x)\,\tilde\psi_{j,k}(x)\,dx. \qquad (4.23)$$
As before, we define $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$ and $\tilde\psi_{j,k}(x) = 2^{j/2}\tilde\psi(2^j x - k)$. Up to this point we have only weakened the definition of the orthonormal wavelet bases. But the flexibility we gain by not requiring that $\tilde\psi = \psi$ allows us to make considerably stronger demands on $\psi$. For example, we can require that $\psi$ be the function in Figure 4.4.

Fig. 4.4. An example of $\psi$ with $\psi(1 - x) = \psi(x)$.

The general theory of Cohen, Daubechies, and Feauveau tells us that, for this choice of $\psi$, the family $\psi_{j,k}$ is a Riesz basis for $L^2(\mathbb{R})$ and the dual basis has the same structure, namely $2^{j/2}\tilde\psi(2^j x - k)$, $j,k \in \mathbb{Z}$. In this special case, the dual wavelet $\tilde\psi$ is not a continuous function. This is not necessarily a problem, but if we want more regularity, we need to take a more general approach. We will select $\psi$ from a set of functions that are continuous, have compact support, are linear on each interval $[\frac{k}{2}, \frac{k+1}{2}]$, $k \in \mathbb{Z}$, and are symmetric with respect to
$x = \frac{1}{2}$, that is, $\psi(1 - x) = \psi(x)$. We can do this so that the dual wavelet $\tilde\psi$ is a function in the class $C^r$ and has compact support. We now outline how $\psi$ and $\tilde\psi$ are constructed. Start with the triangle function $\varphi(x) = \sup(1 - |x|, 0)$, which was mentioned in section 4.2. Then define $m_0(\xi) = (\cos 2^{-1}\xi)^2$; by construction, $\hat\varphi(\xi) = m_0(2^{-1}\xi)\,\hat\varphi(2^{-1}\xi)$. Next, consider
$$g_N(\xi) = c_N \int_\xi^\pi (\sin t)^{2N+1}\,dt,$$
where $c_N > 0$ is chosen so that $g_N(0) = 1$. If $\tilde m_0$ is defined by $m_0(\xi)\,\tilde m_0(\xi) = g_N(\xi)$, then
$$m_0(\xi)\,\tilde m_0(\xi) + m_0(\xi + \pi)\,\tilde m_0(\xi + \pi) = 1. \qquad (4.24)$$
(In the construction of the Daubechies wavelets, one imposed the condition $|m_0(\xi)|^2 = g_N(\xi)$.) Define $\tilde\varphi \in L^2(\mathbb{R})$ by its Fourier transform
$$\hat{\tilde\varphi}(\xi) = \prod_{j=1}^{\infty} \tilde m_0(2^{-j}\xi). \qquad (4.25)$$
Then the identity (4.24) is equivalent to
$$\int \tilde\varphi(x)\,\varphi(x - k)\,dx = \begin{cases} 0 & \text{if } k \neq 0, \\ 1 & \text{if } k = 0. \end{cases} \qquad (4.26)$$
The function $\tilde\varphi$ is even, its support is the interval $[-2N, 2N]$, and $\tilde\varphi$ is in the Hölder space $C^r$ for all sufficiently large $N$. It is clear that
$$\Big\|\sum a_k\,\tilde\varphi(x - k)\Big\|_2 \leq C\Big(\sum |a_k|^2\Big)^{1/2},$$
but (4.26) implies that the inverse inequality also holds. Thus one can consider the closed subspace $\tilde V_0 \subset L^2(\mathbb{R})$ for which $\tilde\varphi(x - k)$, $k \in \mathbb{Z}$, is a Riesz basis. If the subspaces $\tilde V_j$ are defined by $f(x) \in \tilde V_0 \Leftrightarrow f(2^j x) \in \tilde V_j$, then this sequence forms a multiresolution analysis of $L^2(\mathbb{R})$. In the same way, let $V_0$ be the closed subspace of $L^2(\mathbb{R})$ for which $\varphi(x - k)$, $k \in \mathbb{Z}$, is a Riesz basis, and construct the $V_j$ similarly. The two multiresolution analyses $(V_j)$ and $(\tilde V_j)$ are duals of each other. This duality is used to define the subspaces $W_j$ and $\tilde W_j$: $f$ belongs to $W_j$ if $f$ belongs to $V_{j+1}$ and if $\int f(x)\,u(x)\,dx = 0$ for all $u \in \tilde V_j$. The wavelets $\psi$ and $\tilde\psi$ will be constructed so that $\psi(x - k)$, $k \in \mathbb{Z}$, is a Riesz basis for $W_0$ and, similarly, $\tilde\psi(x - k)$, $k \in \mathbb{Z}$, is a Riesz basis for $\tilde W_0$. For this, we define $m_1(\xi) = e^{-i\xi}\,\overline{\tilde m_0(\xi + \pi)}$ and $\tilde m_1(\xi) = e^{-i\xi}\,\overline{m_0(\xi + \pi)}$, and we define the Fourier transforms of $\psi$ and $\tilde\psi$ by $\hat\psi(\xi) = m_1(2^{-1}\xi)\,\hat\varphi(2^{-1}\xi)$ and $\hat{\tilde\psi}(\xi) = \tilde m_1(2^{-1}\xi)\,\hat{\tilde\varphi}(2^{-1}\xi)$. Write $\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k)$, and define $\tilde\psi_{j,k}(x)$ similarly.
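The duality identity (4.24) can be verified numerically for an explicit pair. The sketch below (our code) uses $m_0(\xi) = \cos^2(\xi/2)$, as above, together with the dual choice $\tilde m_0(\xi) = \cos^2(\xi/2)\,(1 + 2\sin^2(\xi/2))$, which corresponds to the case $N = 1$; up to normalization conventions, this is the spline pair familiar from the LeGall 5/3 filters of JPEG 2000.

```python
import math

def m0(xi):
    # Transfer function of the triangle function: cos^2(xi/2).
    return math.cos(xi / 2.0) ** 2

def m0_dual(xi):
    # A dual transfer function satisfying (4.24); this is the N = 1 case
    # of the construction in the text (our explicit closed form).
    c2 = math.cos(xi / 2.0) ** 2
    s2 = math.sin(xi / 2.0) ** 2
    return c2 * (1.0 + 2.0 * s2)

def pr_identity(xi):
    # Left-hand side of (4.24); it should be identically 1.
    return m0(xi) * m0_dual(xi) + m0(xi + math.pi) * m0_dual(xi + math.pi)
```

Writing $c = \cos^2(\xi/2)$ and $s = 1 - c$, the left-hand side is $c^2(1 + 2s) + s^2(1 + 2c) = (c + s)^2 = 1$, which the numerical check confirms.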
The only properties that are difficult to prove are that the family $\psi_{j,k}$, $j,k \in \mathbb{Z}$, is a Riesz basis for $L^2(\mathbb{R})$ and that the same is true for the $\tilde\psi_{j,k}$. These Riesz bases are the duals of each other. This means that $\int \psi_{j,k}(x)\,\overline{\tilde\psi_{j',k'}(x)}\,dx = \delta_{j,j'}\delta_{k,k'}$ and that $f \in L^2(\mathbb{R})$ can be represented as

$$f(x) = \sum_{j,k\in\mathbb{Z}} \langle f, \tilde\psi_{j,k}\rangle\,\psi_{j,k}(x)$$

and as

$$f(x) = \sum_{j,k\in\mathbb{Z}} \langle f, \psi_{j,k}\rangle\,\tilde\psi_{j,k}(x).$$

Furthermore, the function $\psi$ is as simple as it is explicit: It is continuous; it has compact support; it is linear on each interval $[\frac{k}{2}, \frac{k+1}{2}]$; and the values $\psi(k/2)$ are explicit rational numbers. Finally, we have $\psi(1-x) = \psi(x)$, and the symmetry, which Daubechies's wavelets lack, is reestablished.

In dimension two, we use the wavelets $\varphi(x)\psi(y)$, $\varphi(y)\psi(x)$, and $\psi(x)\psi(y)$, as in the orthogonal case. Then the dual wavelets are $\tilde\varphi(x)\tilde\psi(y)$, $\tilde\varphi(y)\tilde\psi(x)$, and $\tilde\psi(x)\tilde\psi(y)$.

While the JPEG committee is still working on developing the upcoming JPEG-2000 standard for still image compression, at the time of writing, it is very likely that the JPEG-2000 standard will be based on biorthogonal filters and bitplane coding [197].

The flexibility offered by biorthogonal wavelet expansions is not limited to filter applications but extends to other areas. For instance, if $D^s$ denotes the fractional powers of $D = -\frac{d}{dx}$, we can require the two Riesz bases $\psi_{j,k}$ and $\tilde\psi_{j,k}$, $j,k \in \mathbb{Z}$, to be orthogonal with respect to the scalar product

$$(f,g) = \int (D^s f)(x)\,\overline{(D^s g)(x)}\,dx.$$

Fabrice Sellan has shown that such wavelets "decorrelate" fractional Brownian motion (fBm) of order $H = s - \frac{1}{2}$. This means that this process can be written as $\sum_{j,k} 2^{-sj} g_{j,k}\,\psi_{j,k}(x)$, where the $g_{j,k}$ are independent, identically distributed Gaussian random variables with mean 0 and variance 1. This decomposition involves small scales ($j \to +\infty$) and large scales ($j \to -\infty$), and it is easily seen that this second half of the series is divergent whenever $H = s - \frac{1}{2} > 0$. This divergence must be fixed. One option is to replace $\psi_{j,k}(x)$ by $\psi_{j,k}(x) - \psi_{j,k}(0)$.
A second option consists in introducing a scaling function $\varphi$ such that

$$\sum_{j<0}\sum_{k} 2^{-sj} g_{j,k}\,\psi_{j,k}(x) = \sum_{k} c(k,H)\,\varphi(x-k),$$

where $c(k,H)$ is a FARIMA process. This is currently the best way to simulate accurately the long-range correlations in fBm (see [2]).

P. G. Lemarie-Rieusset used the same idea to construct divergence-free biorthogonal wavelets in $\mathbb{R}^3$, which may prove to be useful in the study of turbulence [171]. (There will be much more about wavelets and turbulence in Chapter 9.)
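As a numerical aside, symmetric biorthogonal spline filters of the kind adopted for JPEG-2000 can be exercised directly. The sketch below implements one level of the 5/3 ("LeGall") wavelet transform in lifting form; the even-length input and the whole-point symmetric extension are our illustrative choices, not details taken from the text.

```python
# One level of the 5/3 ("LeGall") biorthogonal spline wavelet transform,
# written in lifting form. Illustrative sketch: even-length input and
# whole-point symmetric boundary extension are assumed.

def forward_53(x):
    """Split x into (approximation, detail) with the 5/3 lifting steps."""
    assert len(x) % 2 == 0
    s, d = list(x[0::2]), list(x[1::2])
    for i in range(len(d)):                 # predict: odd - mean of even neighbors
        right = s[i + 1] if i + 1 < len(s) else s[i]
        d[i] -= (s[i] + right) / 2
    for i in range(len(s)):                 # update: preserve the running mean
        left = d[i - 1] if i > 0 else d[0]
        s[i] += (left + d[i]) / 4
    return s, d

def inverse_53(s, d):
    """Undo the lifting steps in reverse order; exact reconstruction."""
    s, d = list(s), list(d)
    for i in range(len(s)):                 # undo update
        left = d[i - 1] if i > 0 else d[0]
        s[i] -= (left + d[i]) / 4
    for i in range(len(d)):                 # undo predict
        right = s[i + 1] if i + 1 < len(s) else s[i]
        d[i] += (s[i] + right) / 2
    x = [0.0] * (len(s) + len(d))
    x[0::2], x[1::2] = s, d
    return x

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
approx, detail = forward_53(x)
assert inverse_53(approx, detail) == x      # perfect reconstruction
```

Because each lifting step is inverted by the identical arithmetic with the sign flipped, reconstruction is exact — the property that makes this filter pair attractive for compression.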
CHAPTER 5

Time-Frequency Analysis for Signal Processing

5.1 Introduction

Time-frequency analysis for signal processing is an active field of research. Here, as in many domains, heuristic concepts structure and guide the work. The heuristic notions that will serve us in this and the following three chapters are (1) time-frequency atoms, (2) the optimal decomposition of a signal into time-frequency atoms, (3) instantaneous frequency, (4) the time-frequency plane, (5) the optimal representation of a signal in the time-frequency plane, and (6) optimal partitioning of the time-frequency plane. In this and the following chapters, we will try to give precise scientific meaning to these heuristic ideas. We add, however, that this is a large field of research and that our exposition is by no means exhaustive.

Dennis Gabor [124] and Jean Ville [254] both addressed the problem of developing a mixed representation of a signal in terms of a double sequence of elementary signals, each of which occupies a certain domain in the time-frequency plane. In the following sections we will define what is meant by time-frequency plane and mixed representation, and we will suggest several choices for the elementary signals, or atoms. Roger Balian tackled the same problem and expressed the motivation for his work in these terms [17, p. 1357]:

One is interested, in communication theory, in representing an oscillating signal as a superposition of elementary wavelets, each of which has a rather well defined frequency and position in time. Indeed, useful information is often conveyed by both the emitted frequencies and the signal's temporal structure (music is a typical example). The representation of a signal as a function of time provides a poor indication of the spectrum of frequencies in play, while, on the other hand, its Fourier analysis masks the point of emission and the duration of each of the signal's elements.
An appropriate representation ought to combine the advantages of these two complementary descriptions; at the same time, it should be discrete so that it is better adapted to communication theory. (Here and elsewhere, the translations from French are ours.)

Similar criticism of the usual Fourier analysis, as applied to acoustic signals, is found in the celebrated work of Ville [254, p. 63]:

If we consider a passage [of music] containing several measures (which is the least that is needed) and if a note, la for example, appears once in
68 CHAPTER 5

the passage, harmonic analysis will give us the corresponding frequency with a certain amplitude and a certain phase, without localizing the la in time. But it is obvious that there are moments during the passage when one does not hear the la. The [Fourier] representation is nevertheless mathematically correct because the phases of the notes near the la are arranged so as to destroy this note through interference when it is not heard and to reinforce it, also through interference, when it is heard; but if there is in this idea a cleverness that speaks well for mathematical analysis, one must not ignore the fact that it is also a distortion of reality: indeed, when the la is not heard, the true reason is that the la is not emitted.

Thus it is desirable to look for a mixed definition of a signal of the sort advocated by Gabor: at each instant, a certain number of frequencies are present, giving volume and timbre to the sound as it is heard; each frequency is associated with a certain partition of time that defines the intervals during which the corresponding note is emitted. One is thus led to define an instantaneous spectrum as a function of time, which describes the structure of the signal at a given instant; the spectrum of the signal, in the usual sense of the term, which gives the frequency structure of the signal based on its total duration, is then obtained by putting together all of the instantaneous spectrums in a precise way by integrating them with respect to time. In a similar way, one is led to a distribution of frequencies with respect to time; by integrating these distributions, one reconstructs the signal.

Ville thus proposed to unfold the signal in the time-frequency plane in such a way that this development would lead to a mixed representation in time-frequency atoms. The choice of these time-frequency atoms would be guided by an energy distribution of the signal in the time-frequency plane.
The time-frequency atoms proposed by Gabor are constructed from the Gaussian $g(t) = \pi^{-1/4}e^{-t^2/2}$ and are defined by

$$w(t) = h^{-1/2}\,e^{i\omega t}\,g\Big(\frac{t-t_0}{h}\Big). \qquad (5.1)$$

The parameters $\omega$ and $t_0$ are arbitrary real numbers, whereas $h$ is positive. The meaning of these three parameters is the following: $\omega$ is the average frequency of $w$, $h > 0$ is the duration of $w$, and $t_0 - h$ and $t_0 + h$ are the start and finish of the "note" $w$. Naturally, this depends on the convention used to define the width of $g$. The essential problem is to describe an algorithm that allows a given signal to be decomposed, in an optimal way, as a linear combination of judiciously chosen time-frequency atoms. The set of all time-frequency atoms (with $\omega$ and $t_0$ varying arbitrarily in the time-frequency plane and $h > 0$ covering the whole scale axis) is a collection of elementary signals that is much too large to provide a unique representation of a signal as a linear combination of time-frequency atoms. Each signal admits an infinite number of representations, and this leads us to choose the best among them according to some criterion.

A similar program (the definition of time-frequency atoms, analysis, and synthesis) was proposed by Jean-Sylvain Lienard in [173, pp. 948, 949], where he wrote:

We consider the speech signal to be composed of elementary waveforms, wf (windowed sinusoids), each one defined by a small number of parameters.
TIME-FREQUENCY ANALYSIS FOR SIGNAL PROCESSING 69

A waveform model (wfm) is a sinusoidal signal multiplied by a windowing function. It is not to be confused with the signal segment, wf, that it is supposed to approximate. Its total duration can be decomposed into attack (before the maximum of the envelope) and decay. In order to minimize spectral ripples, the envelope should present no 1st or 2nd order discontinuity. The initial discontinuity is removed through the use of an attack function (raised sinusoid) such that the total envelope is null at the origin, and maximum after a short time. Although exponential damping is natural in the physical world, we choose to model the decaying part of the wfs with another raised sinusoid. Actually we see the wf as a perceptual unit, and not necessarily as the response of a formant filter to a voicing impulse.

Lienard's time-frequency atoms (Figure 5.1) are different from those used by Gabor. They are, however, based on analogous principles. The Lienard atoms are of the form $w(t) = A(t)\cos(\omega t + \varphi)$, where $\omega$ represents the average frequency of the emitted "note" and where the envelope $A$ incorporates the attack and decay. The principal difference is that, in the atoms of Lienard, the duration of the attack and that of the decay are independent. Thus Lienard's atoms depend on four independent parameters, and the optimal representation of a speech signal as a linear combination of time-frequency atoms is more difficult to obtain. Some empirical methods exist, and they lead to wonderful results for synthesizing the singing voice. For example, the Queen of the Night's grand aria from Mozart's Magic Flute has been interpreted by time-frequency atoms. This was not a copy of the human voice; it involved the creation of a purely numerical (superhuman) voice. This was commissioned by Pierre Boulez, the director of the Institut de Recherches Coordonnees Acoustique-Musique, and achieved by X.
Rodet of that institute (see [231]).

Fig. 5.1. A Lienard time-frequency atom.

5.2 The collections $\Omega$ of time-frequency atoms

The time-frequency atoms of Gabor defined by (5.1) (which are also called Gabor wavelets and Gaborlets) and the waveforms of Lienard are two examples of what we call a collection $\Omega$ of time-frequency atoms. This concept will play an essential role in this and the next two chapters. Mathematically, a collection of time-frequency atoms $\Omega$ is a subset of $L^2(\mathbb{R})$ that is complete. This means that the finite linear combinations $\sum \alpha_j w_j$, $w_j \in \Omega$, are dense in $L^2(\mathbb{R})$. We also assume that if $w \in \Omega$, then $\|w\|_2 = 1$. But this definition
is much too general to serve in practice for signal processing. Thus, in addition, we require that the elements $w$ of $\Omega$ have a simple algorithmic structure and that the elements $w \in \Omega$ are optimally localized in the time-frequency plane. Obviously these last two requirements have not yet been made mathematically precise; instead, they will be illustrated by the various examples we discuss. We will use the Wigner-Ville transform (section 5.5) to study localization in the time-frequency plane.

Here are some of the collections $\Omega$ that are available to us today:
(1) The Gabor wavelets, where $\omega$ and $t_0$ are arbitrary but $h = 1$.
(2) The complete collection of Gabor time-frequency atoms, where $\omega$, $t_0$, and $h > 0$ are arbitrary.
(3) The waveforms of J.-S. Lienard and X. Rodet.
(4) Malvar-Wilson wavelets (Chapter 6).
(5) Chirplets (Chapter 6).
(6) Wavelet packets (Chapter 7).

Given that we have these collections at our disposal, two problems arise:
(a) What collection $\Omega$ should one choose to study a given signal or a given class of signals?
(b) Having chosen $\Omega$, how is one to decompose a signal $f$ optimally in a series $\alpha_0 w_0 + \alpha_1 w_1 + \cdots + \alpha_j w_j + \cdots$, where $w_j \in \Omega$ and the $\alpha_j$ are scalars?

There is no general answer to the first question. A current point of view in signal analysis could be called a resemblance criterion. It holds that the time-frequency atom should "look like" the signal (or pieces of the signal) that is being analyzed. This is the point of view taken by Lienard, but intuition can be misleading. As an example, we mention the problem of storing fingerprints. The first compression algorithm depended on the optimal use of wavelet packets. This choice seemed natural, since the structure of fingerprints exhibits certain textures that one feels ought to be analyzed by a time-frequency algorithm rather than by a time-scale algorithm.
However, to general surprise, it appears that the biorthogonal wavelets of Cohen, Daubechies, and Feauveau provide the best results. This conclusion was not obtained by theoretical considerations; it resulted from experimentation [41].

The fingerprint example brings us back to statistical modeling, which was briefly mentioned in section 1.2. If one is studying a large collection of signals or images that exhibit common features and if one wishes to choose a collection $\Omega$, then we believe one should first develop a statistical model of the collection. Such a model should include random variables that model the intrinsic variability within the data set. The goal is to find an algorithm that produces signals or images that have the same "look and feel" as the ones in the data set. This in turn should point the way to choosing an appropriate collection $\Omega$.

We move on to question (b), for which we have several pieces of an answer. Having chosen a collection of time-frequency atoms $\Omega$, we must find a decomposition

$$f = \alpha_0 w_0 + \alpha_1 w_1 + \cdots + \alpha_j w_j + \cdots, \qquad w_j \in \Omega, \qquad (5.2)$$

that is in some sense optimal. In signal processing, the notion of optimality should be defined in terms of some goal, and as discussed in Chapter 1, the most important ones currently are analysis, compression, transmission, storage, restoration,
denoising, and some specific diagnostic. We conclude by emphasizing once again that when dealing with a large data set a reliable diagnosis depends on the efficacy of the statistical model of the data.

5.3 Mallat's matching pursuit algorithm

One of the most elegant of the algorithms that lead to an optimal decomposition of the form (5.2) is Mallat's matching pursuit algorithm. Its goal is analysis or diagnosis. Mallat's algorithm can be applied to any collection of time-frequency atoms $\Omega$ that satisfies a certain compactness property. In all of the cases we have in mind, the time-frequency atoms $w_\lambda$ are functions of a parameter $\lambda \in \Lambda$, and the forms $\langle f, w_\lambda\rangle$ are continuous functions of $\lambda$. For example, in the case of the Gabor wavelets with arbitrary duration, $\Lambda = \mathbb{R} \times \mathbb{R} \times (0,\infty)$. Furthermore, in this case, the functions $\langle f, w_\lambda\rangle$ tend to zero when $\lambda$ is in the complement of the compact set defined by $[-N,N] \times [-N,N] \times [1/N, N]$ and $N \to +\infty$. Another way to say this is that the functions $\langle f, w_\lambda\rangle$ are continuous on the one-point compactification $\bar\Lambda = \Lambda \cup \{\infty\}$ of $\Lambda$ and vanish at the ideal point $\infty$. It follows that for each $f \in L^2(\mathbb{R})$, the function $|\langle f, w_\lambda\rangle|$ attains its maximum value at some point in $\Lambda$. The general property we require is that the functions "vanish at infinity" or, more precisely, that they are continuous on the one-point compactification of $\Lambda$ and vanish at the ideal point. This property can be verified for all of the examples discussed.

Mallat's algorithm consists in solving the optimization problem

$$\sup_{w \in \Omega} |\langle f, w\rangle|. \qquad (5.3)$$

This problem has at least one solution $w_0 \in \Omega$ by virtue of the assumption. We write $\alpha_0 = \langle f, w_0\rangle$, and assuming that $\|w\|_2 = 1$ for all $w \in \Omega$, we define $f_1(t) = f(t) - \alpha_0 w_0(t)$. Note that there is no reason for $w_0$ to be unique, and thus there is no reason for $f_1$ to be unique.
By iterating the process we obtain

$$f_{j+1}(t) = f_j(t) - \alpha_j w_j(t),$$

where $\alpha_j$ and $w_j$ are obtained by applying the same search to $f_j$. It is proved in [183] that this algorithm converges and gives a representation (5.2). For an elementary discussion of this algorithm and pursuit algorithms in general, we suggest Mallat's book [181]. One learns there that these algorithms were used in statistics in the early 1980s [117], and that convergence in case $\Omega$ is infinite was first proved by L. K. Jones in 1987 [162] (see also [75] and [163]). DeVore's paper [79] contains a review of pursuit algorithms in the context of nonlinear approximation, which is to say, with an emphasis on the rate at which these algorithms converge.

The optimization problem (5.3) is inherently unstable, and a solution can be costly. These shortcomings have inspired the development of faster, more robust algorithms, whose names typically include "pursuit." This is an active field of research. As an example of the kind of work being done, we recommend the recent thesis by Remi Gribonval, where pursuit algorithms are used to analyze acoustic signals [132].
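The pursuit iteration described above can be sketched numerically on a small, finite dictionary of Gabor-like atoms. The parameter grid, atom shape, and fixed iteration count below are illustrative assumptions, not the setting of [183].

```python
# A minimal sketch of matching pursuit over a finite dictionary of
# Gabor-like atoms (Gaussian-windowed cosines). Illustrative choices
# throughout: grid of parameters, signal length, stopping rule.

import math

N = 64

def atom(t0, freq, scale):
    """Discrete Gabor-like atom, normalized to unit energy."""
    w = [math.exp(-((n - t0) / scale) ** 2 / 2) * math.cos(freq * n)
         for n in range(N)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# A small dictionary over a coarse (t0, freq, scale) grid.
dictionary = [atom(t0, f, s)
              for t0 in range(0, N, 8)
              for f in (0.2, 0.5, 1.0)
              for s in (2.0, 4.0, 8.0)]

def matching_pursuit(f, n_iter=10):
    residual = list(f)
    expansion = []                      # pairs (coefficient, atom index)
    for _ in range(n_iter):
        # greedy step: the atom best correlated with the residual (cf. (5.3))
        idx = max(range(len(dictionary)),
                  key=lambda k: abs(dot(residual, dictionary[k])))
        c = dot(residual, dictionary[idx])
        expansion.append((c, idx))
        residual = [r - c * w for r, w in zip(residual, dictionary[idx])]
    return expansion, residual

# When the signal is itself a dictionary atom, the first greedy step
# recovers it and the residual energy collapses.
signal = atom(16, 0.5, 4.0)
expansion, residual = matching_pursuit(signal, n_iter=5)
assert sum(r * r for r in residual) < 1e-12
```

Because each step subtracts the orthogonal projection onto the chosen atom, the residual energy is nonincreasing — the elementary fact behind the convergence results cited above.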
5.4 Best-basis search

Those who use Malvar-Wilson wavelets or wavelet packets have adopted a different point of view toward optimality. In both of these cases, the time-frequency atoms in $\Omega$ can be regrouped to form a set $A$ of orthonormal bases $a$. In other words, $\Omega$ is replaced by a "library" $A$ whose "books" $a$ are orthonormal bases. In this case, Mallat's matching pursuit algorithm is replaced by an algorithm that looks for the best basis. Said differently, in place of looking for "the best atom" $w_0 \in \Omega$, one looks for "the best basis" $a_0 \in A$. This optimal basis is defined in terms of an entropy criterion that leads to "the most compact representation" of the given signal. We note that this algorithm is not iterative.

However, there is a "Mallat" version of this algorithm that is used for denoising. One looks for the best basis $a_0 \in A$, but one retains in the corresponding decomposition of the signal $f$ only those terms whose energy exceeds a certain threshold (which is related to the assumed level of the noise). If the sum of these terms is called $f_0$, one considers the function $f - f_0$ and repeats the process. As an example, this denoising technique has been applied to the 1889 recording of Johannes Brahms performing his Hungarian Dance no. 1 in G Minor [34].

The connections between time-frequency atoms, the time-frequency plane, and the optimal representation of the analyzed signal in the time-frequency plane will be developed in the following sections. But at this point we are going to pause and discuss the special case of the Gabor wavelets. For the moment, the time-frequency plane will be the usual $\mathbb{R}^2$ plane. The idea behind calling this the time-frequency plane is based on the following heuristic: One looks for an algorithm that allows one to "write," in the time-frequency plane, a "partition" of a given signal $f$. The "notes" used to write this partition should be the time-frequency atoms found in one of the decompositions (5.2).
We hope that these "notes" are simple and convenient; this means that they are accessible via a simple algorithm working in real time and that they are optimally localized in the time-frequency plane. This localization will be defined in the following sections using the Wigner-Ville transform. In the case of Gabor wavelets, the disc

$$\{(t,\xi) \mid (\xi-\omega)^2 + (t-t_0)^2 \le 1\}$$

is associated with the wavelet $e^{i\omega t}g(t-t_0)$, and, more generally, the elliptical domain

$$E = \Big\{(t,\xi) \;\Big|\; h^2(\xi-\omega)^2 + \frac{(t-t_0)^2}{h^2} \le 1\Big\}$$

is associated with the wavelet $e^{i\omega t}g_h(t-t_0)$, where $g_h(t) = h^{-1/2}g(t/h)$. Later we will replace these elliptical domains with the corresponding rectangles, which are called Heisenberg boxes. We note that these Heisenberg boxes can be horizontal, vertical, or square depending on the value of $h$. The localization of the time-frequency atoms depends on the Wigner-Ville transform, which we discuss in the next few sections.

5.5 The Wigner-Ville transform

In work that is still stimulating to read [254], Ville set himself the task of studying three topics and of relating them to each other: (1) the distribution of energy of
a signal in the time-frequency plane, (2) the definition of instantaneous frequency, and (3) the optimal decomposition of a signal in a series of Gabor wavelets. In this and the following sections we will study the Wigner-Ville transform. Later we will indicate its use for studying the problems posed by Ville. We begin by presenting the point of view of Ville. We will then indicate how to interpret the results in terms of the theory of pseudodifferential operators as expressed in Hermann Weyl's formalism. This will bring us back to work done by the physicist Eugene Wigner in the 1930s.

Ville, searching for an "instantaneous spectrum," wanted to display the energy of a signal in the time-frequency plane and to obtain an energy density $W(t,\xi)$ having (at least) the following properties:

$$\int_{-\infty}^{\infty} W(t,\xi)\,d\xi = 2\pi|f(t)|^2, \qquad (5.4)$$

$$\int_{-\infty}^{\infty} W(t,\xi)\,dt = |\hat f(\xi)|^2, \qquad (5.5)$$

where $\hat f$ denotes the Fourier transform of $f$. The heuristic behind this research is the following: $W(t,\xi)$ should represent "the square of the modulus of the instantaneous Fourier transform of $f$ at the instant $t$," so that if a theory of the instantaneous Fourier transform existed, (5.4) would look like the Plancherel identity. Similarly, (5.5) would mean that the various instantaneous spectral contributions are summed to form the square of the modulus of the Fourier transform. An exhaustive description of densities $W(t,\xi)$ that satisfy (5.4) and (5.5) can be found in [112]. Ville made the following choice, which is now called the Wigner-Ville transform:

$$W(t,\xi) = \int_{-\infty}^{\infty} f\Big(t+\frac{\tau}{2}\Big)\,\overline{f\Big(t-\frac{\tau}{2}\Big)}\,e^{-i\tau\xi}\,d\tau. \qquad (5.6)$$

We now look at how well $W(t,\xi)$ fulfills the notion of an energy density. If the signal $f$ has finite energy ($\int_{-\infty}^{\infty}|f(t)|^2\,dt < \infty$), then $W(t,\xi)$ exists as a real-valued, continuous function in the time-frequency plane. As we will see in a moment, the converse is completely false.
Even if rather restrictive conditions are imposed, such as belonging to the Schwartz class, a real function $W(t,\xi)$ is not in general the Wigner-Ville transform of a signal with finite energy. If $\|f\|_2 = 1$, it is clear from either (5.4) or (5.5) that $\iint W(t,\xi)\,dt\,d\xi = 2\pi$, but it is not true that $W(t,\xi)$ is always nonnegative.

Another property of $W(t,\xi)$ concerns localization in the time-frequency plane. If $f$ vanishes outside an interval $[t_0,t_1]$, then the same is true for $W(t,\xi)$. Similarly, if the Fourier transform of $f$ is zero outside $[\omega_0,\omega_1]$, then $W(t,\xi) = 0$ if $\xi \notin [\omega_0,\omega_1]$. If—and this is of course impossible—$f$ were zero outside $[t_0,t_1]$ while its Fourier transform $\hat f$ were zero outside $[\omega_0,\omega_1]$, then the Wigner-Ville transform of $f$ would be zero outside the rectangle $[t_0,t_1] \times [\omega_0,\omega_1]$. It is this "property" that speaks in favor of Ville's interpretation of $W(t,\xi)$.

Unfortunately, we immediately encounter a trap. If the Fourier transform of $f$ is supported on $\omega_0 \le |\xi| \le \omega_1$, where $0 < \omega_0 < \omega_1$, the same is not true for
$W(t,\xi)$, which, in general, is supported on $|\xi| \le \omega_1$. The formerly empty interval $|\xi| < \omega_0$ can be filled with $W(t,\xi)$. This phenomenon causes $W(t,\xi)$ to be difficult to interpret: The Wigner-Ville transform can take nonzero values in regions of the time-frequency plane having nothing to do with the spectral properties of the signal. In spite of these artifacts and the fact that $W(t,\xi)$ can take negative values—and thus is an imperfect energy density—the Wigner-Ville transform has an important role in signal processing.

It is interesting to note that the Wigner-Ville transform did not originate in signal processing but rather in quantum mechanics, and the technology transfer to signal processing was begun by Ville in the 1950s. At the time Ville did his work, there were no heuristics originating from signal processing that would lead to this specific quadratic transformation. Today, however, the Wigner-Ville transform appears naturally in signal processing because it is related to the ambiguity function of a signal $f$, which is defined by

$$A(\tau,\omega) = \int_{-\infty}^{\infty} f\Big(t+\frac{\tau}{2}\Big)\,\overline{f\Big(t-\frac{\tau}{2}\Big)}\,e^{-i\omega t}\,dt.$$

The ambiguity function is a two-dimensional Fourier transform of the Wigner-Ville transform of $f$, and it is widely used in signal processing for radar (see, for example, [37]). On the other hand, if we abandon signal processing and instead move to the theory of pseudodifferential operators (section 5.7), then the Wigner-Ville transform appears naturally in quantum mechanics. (We find it remarkable that so much progress in signal processing has been realized by experts in quantum mechanics.)

5.6 Properties of the Wigner-Ville transform

We begin with the case of signals with finite energy. If $W(f;t,\xi)$ denotes the Wigner-Ville transform of $f$, we need to compute $W(Tf;t,\xi)$ when $T$ is a linear operator. Since the mapping $f \mapsto W(Tf;t,\xi)$ is quadratic, it is not clear that, given $T$, there will exist a linear operator $\tilde T$ such that $W(Tf;t,\xi) = \tilde T[W(f;\cdot,\cdot)](t,\xi)$. This is the case, however, in a number of important examples.
We consider the problem for the following operators: the Fourier transform $\mathcal{F}$, the unitary dilation $D_a$, $a > 0$, the symmetry $S$, the translation $R_b$, $b \in \mathbb{R}$, the modulation $M_\omega$, $\omega \in \mathbb{R}$, and multiplication by the chirp $e^{i\omega t^2}$, $\omega \in \mathbb{R}$:

$$\mathcal{F}f(\xi) = \hat f(\xi) = \int_{-\infty}^{\infty} f(t)\,e^{-it\xi}\,dt;$$
$$D_a f(t) = \frac{1}{\sqrt{a}}\,f\Big(\frac{t}{a}\Big);$$
$$Sf(t) = f(-t);$$
$$R_b f(t) = f(t-b);$$
$$M_\omega f(t) = e^{i\omega t}f(t).$$
With this notation we have the following relations:

$$W(\hat f;t,\xi)/2\pi = W(f;-\xi,t), \qquad (5.7)$$
$$W(D_a f;t,\xi) = W\Big(f;\frac{t}{a},a\xi\Big), \qquad (5.8)$$
$$W(Sf;t,\xi) = W(f;-t,-\xi), \qquad (5.9)$$
$$W(R_b f;t,\xi) = W(f;t-b,\xi), \qquad (5.10)$$
$$W(M_\omega f;t,\xi) = W(f;t,\xi-\omega), \qquad (5.11)$$
$$W(e^{i\omega t^2}f;t,\xi) = W(f;t,\xi-2\omega t). \qquad (5.12)$$

Moreover, $W(\bar f;t,\xi) = W(f;t,-\xi)$, and

$$W(e^{i\omega H}f;t,\xi) = W(f;\,t\cos 2\omega + \xi\sin 2\omega,\;\xi\cos 2\omega - t\sin 2\omega),$$

where $H = -\frac{d^2}{dt^2} + t^2 - 1$ (the harmonic oscillator) and $\omega$ is any real number.

A consequence of these relations is that the collection of all Wigner-Ville transforms of signals with finite energy is invariant under the Euclidean group of the time-frequency plane. This observation has some crucial consequences. If one truly trusts signal processing based on the Wigner-Ville transform, it implies that one should pave the time-frequency plane with Heisenberg boxes with arbitrary directions and eccentricities. We will return to this point later.

We list two more properties of the Wigner-Ville transform. If

$$f(t) = \int g(t-s)\,h(s)\,ds,$$

where $g$ and $h$ are signals (or functions) for which the integrals make sense, then

$$W(f;t,\xi) = \int W(g;t-s,\xi)\,W(h;s,\xi)\,ds.$$

This is easily checked, but it is not intuitive since the Wigner-Ville transform is quadratic. The Moyal identity for functions $f$ and $g$ having finite energy is

$$\iint W(f;t,\xi)\,W(g;t,\xi)\,dt\,d\xi = 2\pi\,\Big|\int f(t)\,\overline{g(t)}\,dt\Big|^2. \qquad (5.13)$$

We now indicate how some of these properties can be used. Suppose that

$$Q(t,\xi) = p\,\xi^2 + 2r\,t\xi + q\,t^2, \qquad p > 0,\quad q > 0,\quad pq > r^2.$$

Then $W(t,\xi) = 2\exp(-Q(t,\xi))$ is the Wigner-Ville transform of a signal $f$ such that $\int|f(t)|^2\,dt = 1$ if and only if $\iint W(t,\xi)\,dt\,d\xi = 2\pi$. This last condition is obviously necessary since, by (5.4),

$$\iint W(t,\xi)\,dt\,d\xi = 2\pi\int|f(t)|^2\,dt.$$

To prove the result in the other direction, we first write $Q$ as

$$Q(t,\xi) = p\Big(\xi + \frac{r}{p}t\Big)^2 + \frac{pq-r^2}{p}\,t^2.$$
Using this, it is easy to compute $\iint W(t,\xi)\,dt\,d\xi$ and thus to see that the condition $\iint W(t,\xi)\,dt\,d\xi = 2\pi$ implies that $pq - r^2 = 1$. Thus, $Q(t,\xi) = p(\xi + \frac{r}{p}t)^2 + \frac{1}{p}t^2$. If there is a function $f$ such that $W(f;t,\xi) = 2\exp(-Q(t,\xi))$, then by (5.12), $W(f_1;t,\xi) = 2\exp(-p\xi^2 - \frac{1}{p}t^2)$, where $f_1(t) = f(t)\exp(i\frac{r}{2p}t^2)$. Similarly, using (5.8) we see that $W(f_2;t,\xi) = 2\exp(-\xi^2 - t^2)$ with $f_2(t) = p^{1/4}f_1(\sqrt{p}\,t)$. Another computation shows that the Wigner-Ville transform of the Gaussian $g(t) = \pi^{-1/4}\exp(-\frac{1}{2}t^2)$ is $W(g;t,\xi) = 2\exp(-\xi^2 - t^2)$. By taking the transformations (5.8) and (5.12) in the other direction, we see that the transform of the function

$$f(t) = (\pi p)^{-1/4}\exp\Big(-\frac{1+ir}{2p}\,t^2\Big)$$

is exactly $W(f;t,\xi) = 2\exp(-Q(t,\xi))$.

We have already stressed that the Wigner-Ville transform $W(f;t,\xi)$ is not always a nonnegative function. In fact, the only cases where $W(f;t,\xi)$ is nonnegative are $W(t-t_0,\xi-\xi_0)$, where $W(t,\xi) = 2\exp(-Q(t,\xi))$ is defined as above with $pq - r^2 = 1$ [112]. Finally, there are several averaging procedures that allow one to eliminate the negative values of Wigner-Ville transforms. It suffices to consider

$$\iint W(t-\tau,\xi-\eta)\,\exp(-Q(\tau,\eta))\,d\tau\,d\eta, \qquad (5.14)$$

where $pq - r^2 = 1$. Roughly speaking, (5.14) amounts to averaging $W(t,\xi)$ over generalized Heisenberg boxes with arbitrary directions and eccentricities.

5.7 The Wigner-Ville transform and pseudodifferential calculus

The following considerations allow us to relate the Wigner-Ville transform to quantum mechanics and the work of Wigner. We are going to forget signal processing for the moment and go directly to dimension $n$. The analogue of the time-frequency plane is the phase space $\mathbb{R}^n \times \mathbb{R}^n$ whose elements are pairs $(x,\xi)$, where $x$ is a position and $\xi$ is a frequency.

We start with a symbol $\sigma(x,\xi)$ defined on phase space. Certain technical hypotheses have to be made about this symbol to ensure convergence of the following integral when $f$ belongs to a reasonable class of test functions. We will deal with this point in a moment.
Following the formalism of Weyl, we associate with the symbol $\sigma(x,\xi)$ the pseudodifferential operator $\sigma(x,D)$ defined by

$$(2\pi)^n\,\sigma(x,D)[f](x) = \iint \sigma\Big(\frac{x+y}{2},\xi\Big)\,e^{i(x-y)\cdot\xi}\,f(y)\,dy\,d\xi, \qquad (5.15)$$

where the integral is over $\mathbb{R}^n \times \mathbb{R}^n$. Define the kernel $K(x,y)$ associated with the symbol $\sigma(x,\xi)$ by

$$(2\pi)^n K(x,y) = \int \sigma\Big(\frac{x+y}{2},\xi\Big)\,e^{i(x-y)\cdot\xi}\,d\xi, \qquad (5.16)$$

so that, writing $L(x,u) = (2\pi)^{-n}\int \sigma(x,\xi)\,e^{iu\cdot\xi}\,d\xi$, we have $K(x,y) = L\big(\frac{x+y}{2},\,x-y\big)$.
The symbol $\sigma(x,\xi)$ is thus the partial Fourier transform, in the variable $u$, of the function $L(x,u)$, and the kernel $K(x,y)$ that interests us is $L(\frac{x+y}{2}, x-y)$. We can also write

$$L(x,u) = K\Big(x+\frac{u}{2},\,x-\frac{u}{2}\Big),$$

and this allows us to recover the symbol $\sigma(x,\xi)$ by writing

$$\sigma(x,\xi) = \int K\Big(x+\frac{u}{2},\,x-\frac{u}{2}\Big)\,e^{-iu\cdot\xi}\,du. \qquad (5.17)$$

Thus we are led to hypotheses about the symbols that are the reflections, through the partial Fourier transform, of hypotheses that we may wish to make about the kernels. If we admit all the distribution kernels $K(x,y)$ belonging to the space of tempered distributions on $\mathbb{R}^n \times \mathbb{R}^n$ (which we denote by $\mathcal{S}'(\mathbb{R}^n \times \mathbb{R}^n)$), then there will be no restrictions on $\sigma(x,\xi)$ other than the condition that $\sigma(x,\xi) \in \mathcal{S}'(\mathbb{R}^n \times \mathbb{R}^n)$. An immediate consequence of (5.17) is this: If $\sigma(x,\xi)$ is the symbol for the operator $T$, then $\bar\sigma(x,\xi)$ is the symbol for the adjoint operator $T^*$.

Finally, we consider a function $f$ belonging to $L^2(\mathbb{R}^n)$ and satisfying $\|f\|_2 = 1$. Let $P_f$ denote the orthogonal projection operator that maps $L^2(\mathbb{R}^n)$ onto the linear span of $f$. Then the kernel $K(x,y)$ of $P_f$ is $f(x)\overline{f(y)}$ and the corresponding Weyl symbol is

$$\sigma(x,\xi) = \int f\Big(x+\frac{u}{2}\Big)\,\overline{f\Big(x-\frac{u}{2}\Big)}\,e^{-iu\cdot\xi}\,du. \qquad (5.18)$$

Returning to dimension one, we have the following result: The Wigner-Ville transform of the function $f$ is the Weyl symbol of the orthogonal projection operator onto the linear span of $f$. From this it is clear that the Wigner-Ville transform of $f$ characterizes $f$, up to multiplication by a constant of modulus one.

The following result is an important consequence of the preceding remarks.

Theorem 5.1. Let $f_j$, $j \in \mathbb{N}$, be a sequence of functions in $L^2(\mathbb{R})$ and let $W_j(t,\xi)$ be the Wigner-Ville transforms of the $f_j$. Then the following two properties are equivalent:

$$f_j,\ j \in \mathbb{N},\ \text{is an orthonormal basis for } L^2; \qquad (5.19)$$

$$\sum_{j=0}^{\infty} W_j(t,\xi) = 1 \quad\text{and}\quad \iint W_j(t,\xi)\,W_{j'}(t,\xi)\,dt\,d\xi = 2\pi\,\delta_{j,j'}. \qquad (5.20)$$

If $P_j$ denotes the projection associated with $f_j$, then (5.19) amounts to writing $\langle f_j, f_{j'}\rangle = \delta_{j,j'}$ and $\sum P_j = I$. Since $W_j(t,\xi)$ is the Weyl symbol of $P_j$, $\sum P_j = I$ is equivalent to the first equation of (5.20).
More precisely, $\sum_j W_j(t,\xi)$ should converge to one in the sense of distributions. On the other hand, Moyal's identity yields the second equation of (5.20).

This simple and elegant theorem led to the following heuristic: Orthonormal bases for $L^2(\mathbb{R})$ consisting of time-frequency atoms $a_j$ are in one-to-one correspondence with partitionings of the time-frequency plane with horizontal or vertical Heisenberg boxes. For example, orthonormal wavelet bases correspond to the now-familiar paving of the time-frequency plane (Figure 5.2) that James Glimm and
Fig. 5.2. Dyadic paving of the time-frequency plane.

Arthur Jaffe were using in quantum physics before wavelets existed [130]. Such pictures should not be taken too literally, however. Consider, for example, the Lemarie-Meyer wavelets, where the mother wavelet $\psi$ belongs to the Schwartz class and satisfies

$$\hat\psi(\xi) = 0 \quad\text{if}\quad |\xi| \le \frac{2\pi}{3} \quad\text{or}\quad |\xi| \ge \frac{8\pi}{3}. \qquad (5.21)$$

The corresponding orthonormal basis is $\psi_{j,k}(t) = 2^{j/2}\psi(2^j t - k)$, $j,k \in \mathbb{Z}$. We then observe that the Fourier transform of $\psi_{j,k}$ is $2^{-j/2}e^{-ik2^{-j}\xi}\,\hat\psi(2^{-j}\xi)$, so it is natural to label $\psi_{j,k}$ by the Heisenberg boxes $R^{\varepsilon}_{j,k}$ defined as

$$R^{\varepsilon}_{j,k} = \{(t,\xi) \mid k2^{-j} \le t \le (k+1)2^{-j},\ \pi 2^j \le \varepsilon\xi \le 2\pi 2^j\},$$

where $\varepsilon = \pm 1$. However, this labeling is not consistent with the definition of the Wigner-Ville transform of $\psi_{j,k}$. Indeed, the Wigner-Ville transform of $\psi_{j,k}$ is not supported by $R^{+}_{j,k} \cup R^{-}_{j,k}$. This is clear in the time domain, but of course the rapid decay of $\psi_{j,k}$ is a substitute for the lack of compact support. The situation in the frequency domain is more serious. Here, the transform $W_{j,k}(t,\xi)$ of $\psi_{j,k}$ is supported on $|\xi| \le \frac{8\pi}{3}2^j$, but it takes large values on $|\xi| < \frac{2\pi}{3}2^j$, where it should vanish. These large values are related to the artifacts that prevent the interpretation of $W_{j,k}$ as an energy density. Also, the sums in the fundamental identity

$$\sum_j \sum_k W_{j,k}(t,\xi) = 1 \qquad (5.22)$$

converge only in the sense of distributions: The series $\sum_{j,k} |W_{j,k}(t,\xi)|$ does not converge, but the large oscillations of $W_{j,k}(t,\xi)$ cancel each other in the left-hand
side of (5.22). This bad news means that the supports of the W_{j,k} in the time-frequency plane are not "almost disjoint Heisenberg boxes," as was expected from the classical representations.

5.8 Return to the definition of time-frequency atoms

Consider the following problem: For functions f ∈ L²(ℝ) with ‖f‖₂ = 1, we would like to have a "measure" of how W(f; t, ξ) is distributed in the time-frequency plane. Is W(f; t, ξ) concentrated, or is it spread out? Since |W(f; t, ξ)| ≤ 1 whenever ‖f‖₂ = 1, the maximum of W(f; t, ξ) is not a good measure. Although f may have most of its energy concentrated in frequencies around ξ₀ near time t₀, |W(f; t₀, ξ₀)| is always bounded by one. Similarly, the Moyal identity (5.13) implies that

∫∫ W²(f; t, ξ) dt dξ = 2π. (5.23)

This means that the concentration of W in the time-frequency plane cannot be measured in the L² norm. The L¹ norm of W is not always finite, and since W is bounded, this is due to the behavior of W at infinity. For a fixed t, W(f; t, ξ) is the Fourier transform of the L¹ function f(t + τ/2) f̄(t − τ/2), and, as such, it is sensitive to the smoothness of f. Thus, if f contains many high frequencies, then we may expect ∫∫ |W(f; t, ξ)| dt dξ to be large. For example, if f = χ_{[0,1]}, then ∫∫ |W(f; t, ξ)| dt dξ = +∞.

In view of these considerations, it is reasonable to ask that the time-frequency atoms w ∈ Q be such that

∫∫ |W(w; t, ξ)| dt dξ ≤ C (5.24)

for some constant C, uniformly in w. This relation will hold when the time-frequency atoms are derived from a smooth "mother function" ψ by the operations (5.7) through (5.11) in section 5.6. This holds for Daubechies's wavelets, but it is not true for the Haar wavelets because they are not smooth.
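The quantities discussed above can be experimented with numerically. The following minimal sketch (the function name, discretization, and normalization are our own, not the book's) computes a discrete Wigner–Ville transform of a finite signal; with this normalization the frequency marginal is exact.

```python
import numpy as np

def wigner_ville(f):
    """Discrete Wigner-Ville transform W[n, k] of a finite signal f.

    A minimal sketch of W(t, xi) = ∫ f(t + u/2) conj(f(t - u/2)) e^{-i u xi} du,
    discretized on an N-point grid; the discretization is our own convention.
    """
    f = np.asarray(f, dtype=complex)
    N = len(f)
    W = np.zeros((N, N))
    for n in range(N):
        # admissible lags keep both n+u and n-u inside the signal
        lag = min(n, N - 1 - n)
        u = np.arange(-lag, lag + 1)
        r = f[n + u] * np.conj(f[n - u])          # f(t+u/2) conj(f(t-u/2))
        k = np.arange(N)
        # Fourier transform in the lag variable gives the frequency axis;
        # r has Hermitian symmetry in u, so the transform is real
        W[n] = (np.exp(-2j * np.pi * np.outer(k, u) / N) @ r).real
    return W
```

Summing W[n, :] over the frequency index returns N |f[n]|², a discrete analogue of the continuous marginal property; the quadratic, sign-changing nature of W is what makes the L¹ bound (5.24) a nontrivial requirement.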
With this motivation, we are ready to give a more precise definition of time-frequency atoms: A collection Q of functions w is a collection of time-frequency atoms if the finite linear combinations of the functions w ∈ Q (‖w‖₂ = 1) are dense in L²(ℝ) and there exists a constant C such that (5.24) holds uniformly for w ∈ Q.

5.9 The Wigner–Ville transform and instantaneous frequency

In Ville's fundamental work, which has essentially been the source for this chapter, he makes a careful distinction between the instantaneous frequency of a signal (assumed to be real) and the instantaneous spectrum of frequencies given by the Wigner–Ville transform. More precisely, let f be a real-valued signal with finite energy. Then Ville writes f(t) = Re F(t), where F is the corresponding analytic signal: F(t) is the restriction to the real axis of a function F(z) that is holomorphic in the upper half-plane Im z > 0 and belongs to the Hardy space H²(ℝ). Ville writes

F(t) = A(t) e^{iφ(t)}, (5.25)

where A(t) is the modulus of F(t) and φ(t) is its argument. (If F vanishes at some [isolated] points, one needs to add conditions to preserve smoothness of the
functions A and φ, but we will set aside this issue for the moment.) Ville then defines the instantaneous frequency of f at t to be

ω(t) = (1/2π) φ′(t), (5.26)

and the local pseudoperiod is its inverse, 2π/φ′(t). The idea we wish to capture is that of a slowly varying envelope A(t) inside which the "true" oscillations are modeled by e^{iφ(t)}. For this model to make sense, we need to introduce conditions to ensure that the variations in A(t) do not interfere with the determination of the instantaneous frequency. Specifically, we require that A(t) change very little on a scale given by the local pseudoperiod 2π/φ′(t). We express this semiquantitatively by the relation

|(log A(t))′| ≪ |φ′(t)|, or |A′(t)| / (A(t)|φ′(t)|) ≪ 1. (5.27)

Furthermore, the pseudoperiod should vary slowly:

|φ″(t)| ≪ (φ′(t))². (5.28)

These two conditions are precisely those given for the definition of a chirp in signal processing (see [49]). In the same spirit, the instantaneous spectrum of f is defined as the Wigner–Ville transform of the analytic signal F. Ville discovered a beautiful relation between these two concepts. In fact, the instantaneous frequency is the weighted average of the frequency when the weight is the instantaneous spectrum:

φ′(t) |F(t)|² = (1/2π) ∫_{−∞}^{+∞} ξ W(t, ξ) dξ, where |F(t)|² = (1/2π) ∫_{−∞}^{+∞} W(t, ξ) dξ. (5.29)

If W(t, ξ) were nonnegative, then (1/2π) W(t, ξ)/|F(t)|² would be a probability density, and the instantaneous frequency would be the expectation of the frequency ξ with respect to this probability density.

As many authors have stressed (see, for example, [112]), the definition of instantaneous frequency proposed by Ville works well only for very special signals. These are the asymptotic signals, whose precise analytic definition is given by f_λ(t) = a(t) cos(λφ(t)), where a and φ are regular, real-valued functions of time and where λ is a large parameter, which, mathematically speaking, tends to infinity. We will write λ ≫ 1 to express this notion.
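Ville's recipe — pass to the analytic signal, then differentiate its phase — can be sketched directly. The code below (an FFT-based construction of our own; the book does not prescribe an algorithm) computes (1/2π) φ′(t) for a sampled real signal.

```python
import numpy as np

def instantaneous_frequency(f, dt=1.0):
    """Ville's instantaneous frequency (1/2pi) phi'(t) of a real signal f.

    Sketch under our own discretization: build the analytic signal
    F = (I + iH)f with an FFT-based Hilbert transform, then differentiate
    the unwrapped phase of F.
    """
    N = len(f)
    spec = np.fft.fft(f)
    # analytic signal: keep DC, double positive frequencies, drop negatives
    h = np.zeros(N)
    h[0] = 1.0
    h[1:(N + 1) // 2] = 2.0
    if N % 2 == 0:
        h[N // 2] = 1.0
    F = np.fft.ifft(spec * h)          # analytic signal A(t) e^{i phi(t)}
    phase = np.unwrap(np.angle(F))     # phi(t), made continuous in t
    return np.gradient(phase, dt) / (2 * np.pi)
```

For a pure cosine cos(2π f₀ t) the estimate returns f₀ everywhere; applied to a two-line signal such as (5.30), it returns a single curve with no physical meaning, which is exactly the failure discussed below.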
The application of Ville's program to asymptotic signals will be studied in the following sections. We will show, in contrast to what Ville thought, that the time-frequency atoms adapted to this analysis are not the Gabor wavelets. Nor are they the Malvar–Wilson wavelets discussed in Chapter 6; they are the chirplets introduced for this purpose by Richard Baraniuk and Steve Mann. We will return to this in Chapter 6. We stress here, however, that the definition of instantaneous frequency given by Ville loses all meaning when the signal f_λ consists of two spectral lines, that is, when

f_λ(t) = a₁(t) cos(λφ₁(t)) + a₂(t) cos(λφ₂(t)), (5.30)

where λ ≫ 1. In this case, if we follow Ville, we are led to assign a single instantaneous frequency to f_λ, which is absurd. Later, in section 5.13, we will look at what the instantaneous spectrum f ↦ W(f_λ; t, ξ) provides in this case.
5.10 The Wigner–Ville transform of asymptotic signals

The purpose of this section is to show exactly how well one can determine the instantaneous frequency of a signal f (as defined by (5.26)) from the Wigner–Ville transform of the signal. We wish to answer this question: To what extent does the Wigner–Ville transform reveal the instantaneous frequency of a signal f? As indicated in the last section, Ville's definition of instantaneous frequency works well only for asymptotic signals. Thus our analysis is limited to signals of the form

f_λ(t) = A(t) cos(λφ(t)), (5.31)

where A and φ are real-valued and belong to C^∞, and where φ′(t) ≥ 1. This last assumption, which has not been mentioned, will play an essential role. To simplify the computations, we will also assume that A is in the Schwartz class S(ℝ). We will study the asymptotic behavior as λ → +∞. (For readers unfamiliar with this kind of analysis, it is perhaps useful to visualize the function as oscillating within the envelope A. By letting λ become large, the influence of the envelope on the "frequency" becomes negligible.)

With these assumptions, it is possible to show that the analytic signal associated with f_λ, namely, F_λ = (I + iH) f_λ, has the form

F_λ(t) = A(t) e^{iλφ(t)} + R_λ(t), (5.32)

where R_λ(t) = O(λ^{−N}) for all N ≥ 1. It goes without saying that if we change φ to −φ, the original signal is unchanged. Thus, if the assumption about φ were φ′(t) ≤ −1, the corresponding analytic signal would be F_λ(t) = A(t) e^{−iλφ(t)} + R_λ(t). Ville's definition of the instantaneous frequency gives

ω(t) = λφ′(t) + O(λ^{−N}) if A(t) ≠ 0,

which agrees with our intuition. As a consequence of Rouché's theorem, the frequencies appearing in the analytic signal must have a positive average, and hence we come back to the constraint φ′ ≥ 1. If A(t₀) = 0, where t₀ is an isolated zero of A, computing the instantaneous frequency according to Ville no longer makes sense.
However, it is natural to compute the instantaneous frequencies at neighboring points and to pass to the limit, defining the instantaneous frequency at t₀ as ω = lim_{t→t₀} λφ′(t) = λφ′(t₀). On the other hand, if A(t) = 0 in a neighborhood of t₀, then this computation no longer makes sense, since the signal does not exist; it has vanished.

We are now going to compare this direct approach with the analysis of the analytic signal F_λ using the Wigner–Ville transform. We wish to determine whether this Wigner–Ville transform is essentially concentrated on the curve Γ defined by ξ = λφ′(t) in the time-frequency plane. The Wigner–Ville transform of F_λ is given by the oscillatory integral

W(t, ξ) = ∫ a(t, τ) e^{i[λψ(t,τ) − ξτ]} dτ + O(λ^{−N}), (5.33)

where

a(t, τ) = A(t + τ/2) A(t − τ/2)
and where ψ(t, τ) = φ(t + τ/2) − φ(t − τ/2). We used the asymptotic expansion (5.32), and we wish to find the asymptotic behavior of W(t, ξ) when λ is large. For this, we will use the stationary phase method. To simplify the discussion, we assume that φ′(t) is strictly convex on the whole real line and that lim_{t→+∞} φ′(t) = +∞. The stationary phase method proceeds by supposing that ξ = λp, where p is a constant, and by solving the equation

p = (1/2)[φ′(t + τ/2) + φ′(t − τ/2)] (5.34)

for τ when t and p are fixed. It is necessary to distinguish three separate cases:

(a) If p > φ′(t), then (5.34) has two solutions τ and −τ, and this leads to an asymptotic expansion whose dominant term is O(λ^{−1/2}), which will be explained in a moment.

(b) If p = φ′(t), the unique solution of (5.34) is τ = 0, and the dominant term is O(λ^{−1/3}).

(c) If p < φ′(t), the dominant term is O(λ^{−N}) for all N ≥ 1.

In the first case, the dominant term of the asymptotic expansion is

4λ^{−1/2} B(t, τ) cos{λ[φ(t + τ/2) − φ(t − τ/2) − pτ] + π/4}, (5.35)

where τ is defined by (5.34) and where

B(t, τ) = A(t + τ/2) A(t − τ/2) |φ″(t + τ/2) − φ″(t − τ/2)|^{−1/2}.

As often happens in applied mathematics, we have solved an academic problem: ξ and λ tend to infinity simultaneously while ξλ^{−1} = p is constant. The real problem is different: λ ≫ 1 is fixed, and (t, ξ) ranges over the time-frequency plane. In this case, the situation is quite different, and it is not discussed in the classic texts. In fact, the three cases (a), (b), and (c) must be modified. The new regimes (a′), (b′), and (c′) are defined by

(a′) ξ ≥ λφ′(t) + λ^{1/3}; (b′) |ξ − λφ′(t)| ≤ λ^{1/3}; (c′) ξ ≤ λφ′(t) − λ^{1/3}.

In the first case, the asymptotic term (5.35) can be used. One observes that |τ| ≥ cλ^{−1/3}, where c > 0 is a constant. This implies that |B(t, τ)| ≤ Cλ^{1/6}, and we have |W(t, ξ)| ≤ C′λ^{−1/3}. In the second case, |W(t, ξ)| is of the order λ^{−1/3}. Finally, in the third case, we have

W(t, ξ) = λ^{−1/3} ω[λ^{−1/3}(ξ − λφ′(t))], (5.36)

where |ω(x)| ≤ C_N(1 + |x|)^{−N} for all integers N ≥ 1.
These three behaviors agree at the boundaries of the three regions.

Here is a simple example where the three regimes arise. If f(t) = exp(iλt³), λ ≫ 1, then the Wigner–Ville transform of f is

W(t, ξ) = 2π (3λ/4)^{−1/3} A((3λ/4)^{−1/3}(ξ − 3λt²)), (5.37)
where A is the Airy function defined by

A(ξ) = (1/2π) ∫_{−∞}^{+∞} e^{i(ξs − s³/3)} ds. (5.38)

The Airy function decreases exponentially as ξ → −∞ and oscillates within an envelope of order O(|ξ|^{−1/4}) as ξ → +∞. In this case, one finds all three regimes ξ ≥ λφ′(t) + λ^{1/3}, |ξ − λφ′(t)| ≤ λ^{1/3}, and ξ ≤ λφ′(t) − λ^{1/3}, as indicated in the general discussion.

The conclusion is this: The investigation of the large values taken by the modulus of the Wigner–Ville transform of an asymptotic signal f(t) = A(t)e^{iλφ(t)} does not allow one to isolate the instantaneous frequency ξ = λφ′(t) with a precision better than λ^{1/3}. The best one can hope to obtain is |ξ − λφ′(t)| ≤ λ^{1/3}.

5.11 Instantaneous frequency and the matching pursuit algorithm

Mallat's matching pursuit algorithm provides a third approach to the instantaneous frequency. This is the reason: If ω₀ is the instantaneous frequency of a signal f when t = t₀, this means that there is a "confidence interval" [t₀ − h, t₀ + h] = I in which the analytic signal F associated with f behaves like A₀(t) exp(iω₀(t − t₀)), where A₀(t) is a regular function of the auxiliary variable s = (t − t₀)/h. This assumes that ω₀h ≫ 1. A function behaving this way is strongly correlated with a Gabor wavelet whose average frequency ω is near the instantaneous frequency ω₀ that one is trying to evaluate. The Mallat algorithm amounts to optimizing |⟨F, w⟩| over the Gabor wavelets w parameterized by h > 0 and ω ∈ ℝ. The hope is that the pair (h₀, ω₀) where the maximum is attained will provide the length of the confidence interval and the instantaneous frequency. However, by going back to the case of asymptotic signals, it is easy to show that this approach yields a value h₀ of the order λ^{−1/2}. Thus the precision with which ω is determined is no better than λ^{1/2}, which is less precise than that given by the algorithm based on the Wigner–Ville transform.
To bring the performance of the matching pursuit algorithm up to that of the Wigner–Ville transform, it is necessary to enlarge the collection of time-frequency atoms. We define Q as the set of linear chirps. These are the functions of the form

w(t) = h^{−1/2} g((t − t₀)/h) exp(i[α(t − t₀) + β(t − t₀)²]), (5.39)

where α, β, and t₀ are three arbitrary real numbers and where h > 0 is also arbitrary. The function g is still the Gaussian g(t) = π^{−1/4} e^{−t²/2}. Applying the Mallat algorithm using this extended collection of time-frequency atoms, one finds, for t₀ fixed, an optimal value h = λ^{−1/3}. This value is much larger than the value h = λ^{−1/2} that is obtained when the atoms are limited to the Gabor wavelets. At the same time, with this extended set of atoms, the frequency resolution in this case is O(λ^{1/3}) rather than O(λ^{1/2}).
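The greedy structure of matching pursuit is easy to sketch. The toy version below works over a finite dictionary of discrete Gabor wavelets (the dictionary, parameter values, and function names are our own; the text's version optimizes over the continuous parameters t₀, ω, h instead).

```python
import numpy as np

def gabor_atom(N, t0, w, h):
    """Unit-norm discrete Gabor wavelet: Gaussian envelope of scale h at t0,
    modulated at frequency w (a toy dictionary element of our own making)."""
    t = np.arange(N)
    g = np.exp(-((t - t0) / h) ** 2 / 2) * np.exp(1j * w * t)
    return g / np.linalg.norm(g)

def matching_pursuit(f, atoms, steps):
    """Greedy matching pursuit: at each step pick the atom most correlated
    with the residual and subtract its projection (a bare-bones sketch of
    Mallat's algorithm over a finite dictionary)."""
    r = np.asarray(f, dtype=complex).copy()
    picks = []
    for _ in range(steps):
        corr = [abs(np.vdot(a, r)) for a in atoms]       # |<a, r>|
        k = int(np.argmax(corr))
        c = np.vdot(atoms[k], r)                         # projection coefficient
        r = r - c * atoms[k]
        picks.append((k, c))
    return picks, r
```

Replacing the finite argmax by an optimization over (t₀, ω, h) gives the algorithm discussed in the text; the λ^{−1/2} law for h₀ is a property of that continuous optimization, not of this toy dictionary.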
5.12 Matching pursuit and the Wigner–Ville transform

We are going to interpret the matching pursuit algorithm in terms of the Wigner–Ville transform. More precisely, we start with the Moyal identity,

|⟨f, w⟩|² = (1/2π) ∫∫ W(f; t, ξ) W(w; t, ξ) dt dξ. (5.40)

If w is defined by (5.39), then W(w; t, ξ) is essentially the characteristic function of the oblique Heisenberg box B defined by

|t − t₀| ≤ h, |ξ − [α + 2β(t − t₀)]| ≤ 1/h. (5.41)

One can then write w_B in place of w because the definition of B provides all of the parameters used to define w. Thus, to optimize |⟨f, w⟩|², one must skew B so that W(f; t, ξ) is, on average, as large as possible on B. But W(f; t, ξ) attains its maximum when |ξ − λφ′(t)| ≤ λ^{1/3}. This leads to an oblique Heisenberg box that is aligned with the instantaneous frequency of the signal; it is defined by α = λφ′(t₀), β = (1/2)λφ″(t₀), and h = λ^{−1/3}.

These choices have a couple of explanations. A purely geometric explanation is furnished by the following problem: Fix t₀, and let α, β ∈ ℝ and h > 0 vary arbitrarily. Find the largest value of h, and the corresponding pair (α, β), such that the Heisenberg box B defined by these parameters contains the arc defined by |t − t₀| ≤ h, ξ = λφ′(t). Figure 5.3 illustrates this problem. It is not difficult to see that the solution to this problem is again given by the orders of magnitude found before, namely, h = λ^{−1/3}, and the slope of the Heisenberg box B is λφ″(t₀).

We will show in Chapter 6 that the search for the optimal decomposition of the signal f in an adapted modulated Malvar–Wilson basis comes down to finding the optimal covering of the graph Γ of ξ = λφ′(t) by oblique Heisenberg boxes B. In a practical, nonacademic situation, one studies the signal f on a given interval [−T, T]. In this case, the optimal covering of Γ by oblique Heisenberg boxes B can also be described as the covering that minimizes the number of boxes used.
5.13 Several spectral lines

All of the preceding discussion is based on the fundamental assumption that f(t) = A(t) cos(λφ(t)), where A and φ are real-valued, regular functions and where λ ≫ 1 is a large parameter. The case f(t) = A(t) cos(λφ(t)) + B(t) sin(λφ(t)) reduces to the former one. If we define α(t) by

A(t) = √(A² + B²) cos α(t),  −B(t) = √(A² + B²) sin α(t),

then f(t) = √(A² + B²) cos(λθ(t)), where θ(t) = φ(t) + λ^{−1}α(t) is as regular as φ. On the other hand, consideration of the finite sum

f(t) = A₁(t) cos(λφ₁(t)) + ⋯ + A_n(t) cos(λφ_n(t)), (5.42)

where φ₁, …, φ_n, A₁, …, A_n are smooth, real-valued functions, is a step in another direction. Here we speak of "several spectral lines," and the task of the algorithm we wish to describe is to extract each of these spectral lines from the noisy signal f + σz, where z is a white noise and σ > 0 is a small parameter.

As Patrick Flandrin has explained in [112], looking for the instantaneous frequency of a signal having several spectral lines does not make sense, and the search must be abandoned. If we again assume that φ₁′(t) ≥ 1, …, φ_n′(t) ≥ 1, the analytic signal F associated with f is

F(t) = A₁(t)e^{iλφ₁(t)} + ⋯ + A_n(t)e^{iλφ_n(t)} + O(λ^{−N}), (5.43)

where N ≥ 1 is arbitrary. Then the Wigner–Ville transform of F can be written

Σ_{j=1}^{n} W_j(t, ξ) + Σ_{1≤j<k≤n} W_{j,k}(t, ξ), (5.44)

where W_j(t, ξ) is the Wigner–Ville transform of A_j(t)e^{iλφ_j(t)} and the W_{j,k}(t, ξ) represent the "cross terms." From what we have seen in section 5.10, we know that W_j(t, ξ) is "essentially" concentrated on the curvilinear band |ξ − λφ_j′(t)| ≤ λ^{1/3}. This first part of the discussion leads us to assume that these bands are disjoint. By doing computations based on the stationary phase method, one can show that the "cross terms" W_{j,k}(t, ξ) are "concentrated" in the bands

|ξ − (λ/2)[φ_j′(t) + φ_k′(t)]| ≤ λ^{1/3}.

These cross terms play the role of artifacts and should be eliminated.
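The cross terms just described can be exhibited in a small computation. The sketch below (a common periodic discretization of the Wigner–Ville transform; the convention and names are ours) takes two pure spectral lines and shows the midpoint-frequency artifact and its oscillation in time.

```python
import numpy as np

def circular_wv(f):
    """Discrete Wigner-Ville transform in the periodic (circular) convention;
    indices are taken mod N, and the frequency axis comes out doubled."""
    N = len(f)
    n = np.arange(N)
    # lag kernel f(n+u) conj(f(n-u)) for all times n (rows) and lags u (cols)
    R = f[(n[:, None] + n[None, :]) % N] * np.conj(f[(n[:, None] - n[None, :]) % N])
    return np.fft.fft(R, axis=1).real

N, k1, k2 = 64, 10, 22
n = np.arange(N)
f = np.exp(2j * np.pi * k1 * n / N) + np.exp(2j * np.pi * k2 * n / N)
W = circular_wv(f)
# auto terms: constant ridges at the (doubled) bins 2*k1 and 2*k2;
# cross term: bin k1 + k2, oscillating as 2N cos(2 pi (k1 - k2) n / N),
# so averaging in time kills it while the auto terms survive
```

The farther apart the two lines, the faster the cross term oscillates, which is why the time-smoothing described next suppresses it.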
They act like noise in the image of the signal represented in the time-frequency plane. To eliminate this noise, one takes advantage of the fact that these cross terms oscillate
as a function of time. The farther the curves Γ_j (defined by ξ = λφ_j′(t)) are separated from each other, the greater these oscillations will be. These parasitic terms are eliminated by appropriately averaging the Wigner–Ville transform W(t, ξ) of F. One can prove that this averaging algorithm is equivalent to using Mallat's algorithm, as in the last section. Of course, this averaging entails a loss of localization, and in practice, the Wigner–Ville transform is used mainly to detect a single spectral line in the presence of noise. This is precisely the setting for the detection of gravitational waves. We will say more about gravitational waves in section 6.11.

5.14 Conclusions

We currently have several tools for doing time-frequency analysis. These tools are of three kinds: (a) the Wigner–Ville transform, (b) Mallat's matching pursuit algorithm, and (c) the best-basis algorithms of Coifman and Wickerhauser, which we will discuss in Chapter 6.

The scientific problem that has motivated the development of these algorithms is the search for an "instantaneous Fourier transform" or an "instantaneous frequency," or for an optimal decomposition of the signal in time-frequency atoms, or, finally, for an optimal representation of the signal in the time-frequency plane.

Today we are faced with a paradox. The three algorithms (a), (b), and (c) provide three responses to the scientific problem. However, we cannot decide whether these responses are pertinent to the problem since, in fact, the scientific problem has no precise meaning.

Even if we stay within the context of the Wigner–Ville transform, there are infinitely many choices, because for certain applications it is necessary to "smooth" this transform either in time, in frequency, or in both variables simultaneously. These smoothings "erase" the undesirable artifacts (for example, the cross terms that appear in (5.44)).
The choices of the windows used for smoothing and of the sizes of these windows will clearly depend on the signal being studied, and we do not yet have an algorithm that leads to objective choices.

The situation is even worse. The Wigner–Ville transform is one among a vast collection of quadratic transforms known as the Cohen class. If the Wigner–Ville transform ideally compresses the linear chirps of the form f(t) = exp(i(αt + βt²)), α, β ∈ ℝ, one can hope to have a quadratic transform that ideally compresses hyperbolic chirps, which are the signals of the form f(t) = exp(iλ log t), λ real. This problem is treated in [112], and one finds there the classical vicious circle: The analytic tool depends on the a priori information one has. If one wishes to analyze the sounds that bats emit, which are essentially hyperbolic chirps, then the Wigner–Ville transform is probably not the optimal tool.

The other two algorithms are just as nonobjective. It goes without saying that Mallat's matching pursuit algorithm depends critically on the choice of time-frequency atoms in Q, and similarly the Coifman–Wickerhauser algorithm depends on the bases one chooses for the library of bases.

Thus it appears that things are in a state of disorder and confusion. The asymptotic signals have served as a test case to clarify the relations between the different algorithms.

5.15 Historical remarks

Eugene Wigner was motivated by problems in quantum mechanics when he introduced what is called the Wigner transform of a function ψ [260]. J. E. Moyal
elaborated on Wigner's work, proved the identity (5.12) that bears his name, and obtained Theorem 5.1 [214]. The connection between signal processing and quantum mechanics was discovered by Jean Ville [254, p. 65]. He tells us that "la fréquence est, à proprement parler, un opérateur" ("frequency is, properly speaking, an operator"), and of course he meant the operator (2πi)^{−1} d/dt. N. G. de Bruijn [76, p. 59] also noted the connection between signal processing and quantum mechanics:

Both in music and in quantum mechanics we have the situation of a function of a single variable, which appears to be a function of two variables as long as the observation is not too precise. The parallel between quantum mechanics and music can be carried a little further by comparing the composer to the classical physicist. The way the composer writes an isolated note as a dot, and thinks of it as being completely determined in time and frequency, is similar to the classical physicist's conception of a particle with a well-determined position and momentum.

These quotations are to support our observations concerning time-frequency analysis versus time-scale analysis: The former is a result of cross-fertilization between signal processing and quantum mechanics, and mathematicians took little interest in these ideas until recently. The latter was pioneered by mathematicians long before it was adopted in physics and signal processing.

The situation is different today, and time-frequency analysis is widely used in mathematics under the name of microlocal analysis and its variants. We mention, as examples, the theory of wave packets by Córdoba and Fefferman [67] and the Fourier–Bros–Iagolnitzer transform [77]. The Córdoba–Fefferman wave packets generalize Gabor wavelets. In the n-dimensional case they are defined by

h^{−n/2} g((x − x₀)/h) e^{ix·ξ₀},

where g(x) = e^{−|x|²/2} and h = |ξ₀|^{−1/2}. One is interested in the asymptotics as h tends to zero.
Córdoba and Fefferman then study the action of Fourier integral operators on such wave packets. This action mimics the well-known action on Gabor wavelets of the unitary group generated by the harmonic oscillator. The Córdoba–Fefferman wave packets motivated the construction of wavelet packets by Coifman, Meyer, and Wickerhauser. Anticipating the discussion in Chapter 7, we note that the orthonormal basis for L²(ℝ) consisting of the functions w₀(x − k), k ∈ ℤ, together with 2^{j/2} w_n(2^j x − k), 2^j ≤ n < 2^{j+1}, j ≥ 0, n ≥ 1, k ∈ ℤ, mimics the Córdoba–Fefferman wave packets. If we accept that w_n(x − k), k ∈ ℤ, is centered around the frequency n, then the "main frequency" of w_n(2^j x − k) will be 4^j = h^{−2}, where h = 2^{−j} is the "length" of w_n(2^j x − k). These remarks illustrate the interactions between quantum mechanics, signal processing, and, eventually, mathematics.
CHAPTER 6

Time-Frequency Algorithms Using Malvar-Wilson Wavelets

6.1 Introduction

This chapter continues the time-frequency analysis of Chapter 5. We will introduce algorithms that allow us to decompose a given signal s into a linear combination of time-frequency atoms. The time-frequency atoms that we use are denoted by f_R and are "coded" by Heisenberg rectangles R with sides parallel to the axes and with area 1 or 2π, depending on the normalization. If R = [a, b] × [α, β], we require that the function f_R be essentially supported on the interval [a, b] and that its Fourier transform f̂_R be essentially supported on [α, β] and the opposite frequencies [−β, −α]. We also want the algorithmic structure of f_R to be simple and explicit to facilitate numerical processing in real time. The decomposition

s(t) = Σ_{j≥0} α_j f_{R_j}(t) (6.1)

cannot be unique, and we take advantage of this flexibility by looking for optimal decompositions, which for our purposes means that they contain the fewest possible terms.

The point of view of Ville (and of numerous other signal-processing experts) is that it is first necessary to understand the physics of the process and that "the algorithms will follow." A careful reading of Ville's fundamental paper [254] suggests the following algorithm for finding the optimal decomposition (6.1): (1) Compute the Wigner–Ville transform W(t, ξ) of f; (2) define the domains Ω_j of the time-frequency plane by 2^{−j−1} < |W(t, ξ)| ≤ 2^{−j}, j ≥ 0; and (3) optimally cover Ω_j with Heisenberg boxes R_{j,k}. One should then use these boxes to write the optimal decomposition (6.1). This program appears unrealistic, and one of the main objections is this: The domains Ω_j may have complicated structures, and thus the Heisenberg boxes may provide poor coverings for the Ω_j. This situation can be improved if the set of horizontal and vertical Heisenberg boxes is enlarged to include oblique boxes, but this means that we will need other time-frequency atoms.
These new atoms will be introduced in section 6.11. For the time being we will be less ambitious and stay with horizontal and vertical Heisenberg boxes. The time-frequency atoms that we use are completely explicit. They are either Malvar-Wilson wavelets or wavelet packets, and we will immedi- ately write down the atomic decompositions of the type (6.1). This means that the synthesis will be direct, whereas the analysis will consist of choosing—with the use of an entropy criterion—the most effective synthesis, which is the one that leads to
optimal compression. Thus the analysis proceeds according to algorithmic criteria and not according to physics, and it is not at all clear that this approach leads to a signal analysis that reveals physical properties having real significance. For example, Marie Farge had the idea of applying the algorithm to simulated two-dimensional turbulence. The algorithm extracted various coherent structures, beginning with the larger ones and continuing down the scale to a cutoff point. This shows that coherent structures can be well represented using a few wavelets. However, this remarkable work calls for an interpretation: What does this tell us about coherent structures?

After these general remarks, it is time to specify the algorithms. There are two options: Malvar-Wilson wavelets and wavelet packets. With the first option, the signal is segmented adaptively and optimally, and then the segments are analyzed using classical Fourier analysis. The second option, wavelet packets, reverses the order of these operations: The signal is first filtered adaptively; then the analysis in the time variable is imposed by the algorithm. Ville proposed two types of analysis [254, p. 64]: "We can either first cut the signal into slices (in time) with a switch and then pass these different slices through a system of filters to analyze them, or we can first filter different frequency bands and then cut these bands into slices (in time) to study their energy variations." The first approach leads to Malvar-Wilson wavelets and the second to wavelet packets. As mentioned above, a third option will be proposed in section 6.11. This option provides a better fit between the Heisenberg boxes R_j, which appear in (6.1), and the level sets Ω_j associated with the Wigner–Ville transform of f.
6.2 Malvar-Wilson wavelets: A historical perspective

The scientific program that led to adaptive Malvar-Wilson wavelets was initiated by the physicist Kenneth Wilson (Nobel laureate in physics, 1982) [261]. These time-frequency wavelets were later discovered independently by the signal-processing expert Henrique Malvar [185] (see also [186], [187], [188], and [189]). Malvar-Wilson wavelets fall within the general framework of windowed Fourier analysis. The window is denoted by w, and it allows the signal s to be cut into "slices" that are regularly spaced in time: w(t − bl)s(t), l ∈ ℤ. The parameter b > 0 is the nominal length of these slices. Next, following Ville, one does a Fourier analysis on these slices, which reduces to calculating the coefficients ∫ e^{−iakt} w(t − bl)s(t) dt, where a > 0 must be related to b and where k ∈ ℤ. This is thus the same as taking the scalar products of the signal s with the "wavelets" w_{k,l}(t) = e^{iakt} w(t − bl). This analysis technique was proposed by Gabor [124], in which case the window w was the Gaussian. The Gabor wavelets lead to serious algorithmic difficulties.

More generally, Low and Balian showed in the early 1980s that if w is sufficiently regular and well localized, then the functions w_{k,l}, k, l ∈ ℤ, can never be an orthonormal basis for L²(ℝ) [17]. More precisely, if the two integrals ∫_ℝ (1 + |t|)² |w(t)|² dt and ∫_ℝ (1 + |ξ|)² |ŵ(ξ)|² dξ are both finite, the functions w_{k,l}, k, l ∈ ℤ, cannot be an orthonormal basis of L²(ℝ). The crude window defined by w(t) = 1 on the interval [0, 2π) and w(t) = 0 elsewhere escapes this criterion. By choosing a = 1 and b = 2π, the windowed analysis consists of restricting the signal to each interval [2lπ, 2(l+1)π) and using the Fourier transform (in this case, Fourier series) to analyze each of the corresponding
functions. But even if one starts with a smooth signal, the functions obtained by this crude segmentation are not the restrictions of smooth 2π-periodic functions, and the Fourier analysis will highlight this lack of periodicity and interpret it as a discontinuity in the signal. One way to attenuate these numerical artifacts, which does not eliminate them completely, is to use the discrete cosine transform (DCT). We will describe the continuous version of this transform. On each interval [2lπ, 2(l+1)π), the signal s(t) is analyzed using the orthonormal basis composed of the constant function 1/√(2π) and the functions (1/√π) cos(k(t − 2lπ)/2), k ∈ ℕ*. If s is a very regular function, this segmentation introduces discontinuities only in the derivative of the signal, and the numerical artifacts produced by the segmentation are reduced from the order of magnitude 1/k to 1/k².

Wilson was the first to have the idea that one could get around the obstruction presented by the Balian–Low theorem by imitating the DCT and using a segmentation created with very regular windows. Wilson proposed to alternate the DCT with the discrete sine transform (DST) according to whether l is even or odd, where l denotes the position of the interval. The DST uses the orthonormal basis consisting of the functions (1/√π) sin(k(t − 2lπ)/2), k ∈ ℕ*.

Wilson's ideas have been the point of departure for numerous efforts, the most notable of which is due to Ingrid Daubechies, Stéphane Jaffard, and Jean-Lin Journé [74]. They used a window w having the property that both it and its Fourier transform decay exponentially, and they constructed w so that the functions u_{k,l}, k ∈ ℕ*, l ∈ ℤ, and u_{0,l}, l ∈ 2ℤ, defined by

u_{k,l}(t) = (1/√π) w(t − 2lπ) cos(kt/2), l ∈ 2ℤ, k = 1, 2, …, (6.2)

u_{0,l}(t) = (1/√(2π)) w(t − 2lπ), l ∈ 2ℤ, k = 0, (6.3)

u_{k,l}(t) = (1/√π) w(t − 2lπ) sin(kt/2), l ∈ 2ℤ + 1, k = 1, 2, …, (6.4)

constitute an orthonormal basis for L²(ℝ).
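The 1/k versus 1/k² decay of the segmentation artifacts can be observed numerically. The sketch below (our own illustration; the test function and parameters are not from the text) expands a smooth nonperiodic slice both in Fourier series and, via an even mirror extension, in a DCT-type cosine series, and fits the decay exponents of the coefficient magnitudes.

```python
import numpy as np

N = 2048
t = (np.arange(N) + 0.5) / N
s = np.exp(t)                          # smooth on [0, 1) but not periodic
four = np.abs(np.fft.fft(s))           # Fourier coefficients of the raw slice
# cosine (DCT-II-type) coefficients via the even mirror extension [s, reversed s]
dct = np.abs(np.fft.fft(np.concatenate([s, s[::-1]])))
ks = np.arange(4, 64)
slope = lambda y: np.polyfit(np.log(ks), np.log(y[ks]), 1)[0]
slope_four, slope_dct = slope(four), slope(dct)
# slope_four is close to -1 (jump at the slice boundary),
# slope_dct is close to -2 (only the derivative jumps after mirroring)
```

The mirror extension removes the jump in the function itself and leaves only a corner in the derivative, which is exactly the mechanism the text invokes for the DCT.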
Exponential decay of both w and ŵ is an essential requirement for the applications that Wilson had in mind in renormalization theory.

Malvar did not know about Wilson's work. He discovered a family of orthonormal bases whose algorithmic structure is the same as that described by (6.2), (6.3), and (6.4), but where the choice of the window w is simpler and more explicit. In fact, Malvar had only these hypotheses:

w(t) = 0 if t ≤ −π or t ≥ 3π; (6.5)

0 ≤ w(t) ≤ 1 and w(2π − t) = w(t); (6.6)

w²(t) + w²(−t) = 1 if −π ≤ t ≤ π. (6.7)

Then the construction is the same, and the sequence u_{k,l} defined by (6.2), (6.3), and (6.4) is an orthonormal basis for L²(ℝ). In Malvar's construction, the window w can be very regular (infinitely differentiable, for example), but the Fourier transform of w cannot have exponential decay. Condition (6.5) prevents it, and this condition plays an essential role in the proofs.

The Malvar basis can be incorporated into a general framework developed by Daubechies, Jaffard, and Journé. It appears there as a simple example in a systematic construction. It can, however, be developed directly, and in this way Malvar's construction happens to be more flexible than the Daubechies–Jaffard–Journé approach. This remark will become clear in the next section.
6.3 Windows with variable lengths

Coifman and Meyer modified the preceding constructions to create windows with arbitrary, variable lengths [64]. The construction by Daubechies, Jaffard, and Journe does not extend to this context, while that of Malvar generalizes to the case of arbitrary windows without the slightest difficulty.

We begin with an arbitrary partition of the real line into adjacent intervals [a_j, a_{j+1}], where ... < a_{−1} < a_0 < a_1 < a_2 < ..., lim_{j→+∞} a_j = +∞, and lim_{j→−∞} a_j = −∞. Write l_j = a_{j+1} − a_j and let α_j > 0 be positive numbers that are small enough so that l_j ≥ α_j + α_{j+1} for all j ∈ Z. The windows w_j that we use will be essentially the characteristic functions of the intervals [a_j, a_{j+1}]; the role played by the disjoint intervals (a_j − α_j, a_j + α_j) is to allow the windows to overlap, which is necessary if we want the windows to be regular (Figure 6.1). More precisely, we impose the following conditions:

0 ≤ w_j(t) ≤ 1 for all t ∈ R, (6.8)

w_j(t) = 1 if a_j + α_j ≤ t ≤ a_{j+1} − α_{j+1}, (6.9)

w_j(t) = 0 if t ≤ a_j − α_j or t ≥ a_{j+1} + α_{j+1}, (6.10)

w_j²(a_j + t) + w_j²(a_j − t) = 1 if |t| ≤ α_j, (6.11)

w_{j−1}(a_j + t) = w_j(a_j − t) if |t| ≤ α_j. (6.12)

Note that these conditions allow the windows w_j to be infinitely differentiable. It is clear that Σ_{j=−∞}^{+∞} w_j²(t) = 1 identically on the whole real line.

Finally, we come to the Malvar-Wilson wavelets. They appear in two distinct forms. The first is given by

u_{j,k}(t) = √(2/l_j) w_j(t) cos[(π/l_j)(k + 1/2)(t − a_j)], j ∈ Z, k ∈ N. (6.13)

The second form consists of alternating the cosines and sines according to whether j is even or odd. Thus we have three distinct expressions for the second form:

u_{j,k}(t) = √(2/l_j) w_j(t) cos[(kπ/l_j)(t − a_j)], j ∈ 2Z, k = 1, 2, ..., (6.14)

u_{j,k}(t) = √(1/l_j) w_j(t), j ∈ 2Z, k = 0, (6.15)

u_{j,k}(t) = √(2/l_j) w_j(t) sin[(kπ/l_j)(t − a_j)], j ∈ 2Z + 1, k = 1, 2, .... (6.16)

The functions u_{j,k}, j ∈ Z, k ∈ N, given by (6.13) are an orthonormal basis for L²(R), and so are the functions defined by (6.14), (6.15), and (6.16).
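Conditions (6.8)-(6.12) and the orthonormality of the family (6.13) can be checked numerically. In the sketch below (the partition points and the sine-arch profile are our own illustrative choices), the windows are built from a profile ρ with ρ²(s) + ρ²(−s) = 1, and a few inner products of the u_{j,k} are computed by Riemann sums:

```python
import math

def rho(s):
    # Rising profile on [-1, 1] with rho(s)^2 + rho(-s)^2 = 1.
    if s <= -1: return 0.0
    if s >= 1: return 1.0
    return math.sin((math.pi / 4) * (1 + s))

# Partition a_0 < a_1 < a_2 and overlap radii alpha_j with
# l_j >= alpha_j + alpha_{j+1} (illustrative values).
a = [0.0, 1.0, 2.5]
alpha = [0.3, 0.3, 0.3]

def window(j, t):
    # w_j rises near a_j, is 1 on the plateau, falls near a_{j+1};
    # this product form satisfies (6.8)-(6.12).
    return rho((t - a[j]) / alpha[j]) * rho((a[j + 1] - t) / alpha[j + 1])

def u(j, k, t):
    # Malvar-Wilson wavelet (6.13).
    l = a[j + 1] - a[j]
    return (math.sqrt(2 / l) * window(j, t)
            * math.cos((math.pi / l) * (k + 0.5) * (t - a[j])))

def inner(f, g, lo=-1.0, hi=4.0, n=40000):
    # midpoint-rule approximation of the L^2 inner product
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) * g(lo + (i + 0.5) * h)
               for i in range(n)) * h

# Orthonormality: unit norms, orthogonal within and across windows.
assert abs(inner(lambda t: u(0, 0, t), lambda t: u(0, 0, t)) - 1) < 1e-3
assert abs(inner(lambda t: u(0, 0, t), lambda t: u(0, 1, t))) < 1e-3
assert abs(inner(lambda t: u(0, 2, t), lambda t: u(1, 3, t))) < 1e-3
```

The cross-interval orthogonality depends on the compatibility conditions (6.11)-(6.12) in the overlap region around a_1; if the two windows were chosen independently, the last assertion would fail.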
Two Malvar-Wilson wavelets of the form (6.13) with k = 8 are shown in Figure 6.2. Note the similarity between these wavelets and the time-frequency atoms proposed by Lienard: The Malvar-Wilson wavelets are constructed with an attack (whose duration is 2α_j), a stationary period (which lasts l_j − α_j − α_{j+1}), and then a decay (which lasts 2α_{j+1}). The ability to choose, arbitrarily and independently, the
Fig. 6.1. A typical Malvar window.

Fig. 6.2. Two Malvar-Wilson wavelets.

duration of the attack, then that of the stationary section, and finally the duration of the relaxation is precisely what differentiates the Malvar-Wilson wavelets from the preceding constructions (Gabor or Daubechies-Jaffard-Journe). It is, of course, important to make good use of the choices at our disposal, and we will see how to do this in the following sections.
6.4 Malvar-Wilson wavelets and time-scale wavelets

In 1985, Yves Meyer constructed a function ψ belonging to the Schwartz class S(R) such that 2^{j/2}ψ(2^j t − k), j, k ∈ Z, is an orthonormal basis for L²(R). In addition, the Fourier transform of ψ is zero outside the intervals [−8π/3, −2π/3] and [2π/3, 8π/3]. We will see that these wavelets 2^{j/2}ψ(2^j t − k), j, k ∈ Z, constitute a particular case of the general Malvar construction. This is quite surprising because the Lemarie-Meyer wavelets constitute a time-scale algorithm, whereas the Malvar-Wilson wavelets are a time-frequency algorithm. There is thus an apparent incompatibility. In fact, it is by analyzing the Fourier transform of an arbitrary function f in an appropriate Malvar-Wilson basis that we arrive at the analysis by Lemarie-Meyer wavelets.

We begin with the following observation: The Malvar-Wilson wavelets let us analyze functions defined on a half-line. The segmentation of (0, ∞) we use is the "natural" division into dyadic intervals [2^j, 2^{j+1}], j ∈ Z. Then it is natural to choose the windows w_j, associated with these intervals, to be of the form w_j(x) = w(2^{−j}x). Thus the whole construction rests on the precise choice for the function w. For this, we make the following choices in accordance with conditions (6.8)-(6.12): w(x) = 0 outside the interval [2/3, 8/3], w(2x) = w(2 − x) for 2/3 ≤ x ≤ 4/3, and w²(x) + w²(2 − x) = 1 on the same interval. Then a_j = 2^j, α_j = (1/3)2^j, and l_j = 2^j = α_j + α_{j+1}. This is illustrated in Figure 6.3.

Using these parameters, the Malvar-Wilson wavelets of type (6.13) are, up to an irrelevant power of −1,

u_{j,k}(x) = √2 2^{−j/2} w(2^{−j}x) sin[π(k + 1/2)2^{−j}x]. (6.17)

If we replace the cosines in (6.13) by sines, we obtain a second orthonormal basis for L²[0, ∞) of the form

v_{j,k}(x) = √2 2^{−j/2} w(2^{−j}x) cos[π(k + 1/2)2^{−j}x]. (6.18)

We next extend w to the whole line by making it an even function: w(−x) = w(x).
This gives a natural odd extension for the functions u_{j,k} and an even extension for the v_{j,k}. Finally, the complete collection of extended functions

(1/√2) u_{j,k}(x), (1/√2) v_{j,k}(x), j ∈ Z, k ∈ N, (6.19)
is an orthonormal basis for L²(R). It follows that the set of functions (1/√2)(v_{j,k} − iu_{j,k}), (1/√2)(v_{j,k} + iu_{j,k}) (where u_{j,k} and v_{j,k} now denote the normalized extensions) is also an orthonormal basis for L²(R). Next, we observe that

(1/√2)(v_{j,k} + iu_{j,k})(x) = 2^{−(j+1)/2} w(2^{−j}x) e^{iπ(k+1/2)2^{−j}x} (6.20)

and that by letting k* = −1 − k, we have

(1/√2)(v_{j,k} − iu_{j,k})(x) = 2^{−(j+1)/2} w(2^{−j}x) e^{iπ(k*+1/2)2^{−j}x}. (6.21)

The conclusion is that the sequence

2^{−(j+1)/2} w(2^{−j}x) exp[iπ(k + 1/2)2^{−j}x], j, k ∈ Z, (6.22)

is an orthonormal basis for L²(R). Denote the Fourier transform of the function (1/√2)w(x)e^{iπx/2} by θ. This function is real-valued and satisfies θ(π − t) = θ(t). Then the sequence

(1/√(2π)) 2^{j/2} θ(2^j t − kπ), j, k ∈ Z, (6.23)

is an orthonormal basis for L²(R). By defining ψ(t) = (1/√2)θ(πt) we regain the usual form, and 2^{j/2}ψ(2^j t − k), j, k ∈ Z, is also an orthonormal basis for L²(R). It is clearly possible to require that w be an infinitely differentiable function, in which case ψ will be a function in the Schwartz class S(R).

Recall the program of Ville. There were two possible approaches: Either segment the signal appropriately and follow this by Fourier analysis, or pass the signal through a bank of filters and then study the individual outputs of the filter banks. Here we have taken the second approach. The filter bank was defined by the transfer functions w(2^{−j}ω), where w is the even window used above.

6.5 Adaptive segmentation and the split-and-merge algorithm

From now on, we will give up trying to find an optimal segmentation. Instead, we will only consider a quite specific collection of segmentations and find the optimal segmentation within this collection. This collection will be fixed, and we note that there is no reason to believe that the solution within this collection will be related to the "physics" of the problem. For example, there is no reason to believe that this segmentation of a speech signal will have any relation to objects intrinsic to speech such as phonemes.
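Returning for a moment to the dyadic window of section 6.4: w is determined on [2/3, 4/3] by any profile with w²(x) + w²(2 − x) = 1 and then on [4/3, 8/3] by the matching rule w(2x) = w(2 − x). A sketch that builds such a w (the sine arch is our arbitrary choice of profile) and checks the resulting partition of unity Σ_j w²(2^{−j}x) = 1 on (0, ∞):

```python
import math

def w(x):
    """Window for the dyadic segmentation [2^j, 2^(j+1)] of (0, inf).

    Supported on [2/3, 8/3]; rises on [2/3, 4/3] with
    w(x)^2 + w(2 - x)^2 = 1, and falls on [4/3, 8/3] via the
    matching rule w(2x) = w(2 - x), i.e. w(y) = w(2 - y/2).
    """
    if x <= 2 / 3 or x >= 8 / 3:
        return 0.0
    if x <= 4 / 3:   # rising part: sine arch (one convenient choice)
        return math.sin((math.pi / 4) * (1 + 3 * (x - 1)))
    return w(2 - x / 2)   # falling part

# Partition of unity: sum_j w(2^-j x)^2 = 1 for every x > 0.
for x in [0.01, 0.5, 1.0, 1.7, 3.0, 100.0]:
    s = sum(w(2.0 ** -j * x) ** 2 for j in range(-20, 20))
    assert abs(s - 1.0) < 1e-12
```

At any x > 0 at most two consecutive windows are nonzero, and the overlap identities force their squares to sum to 1; this is exactly the mechanism behind conditions (6.11) and (6.12).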
We are not going to create the best segmentation all at once. We will modify an existing segmentation to produce a new one, and by iterating this procedure we approach an optimal segmentation. The modification operation is described in this section. A segmentation is modified by adjusting the partition (a_j) that defines the segmentation, and this is done by iterating the following elementary modifications: An elementary modification consists of suppressing a point a_j of the partition; this means that the two intervals [a_{j−1}, a_j] and [a_j, a_{j+1}] are combined into a single interval, namely, [a_{j−1}, a_{j+1}]. The other intervals remain unchanged. This operation is called merging. The inverse operation consists of adding an extra point a between the points a_j and a_{j+1}, which results in replacing the interval
[a_j, a_{j+1}] by the two intervals [a_j, a] and [a, a_{j+1}]. This inverse operation is called splitting, but, in fact, we will be using only the merging half of the algorithm. A split-and-merge algorithm provides a criterion to decide when and where to use one or the other of these elementary operations.

We are going to examine the effect of these operations on a Malvar-Wilson basis. We will show that an elementary operation induces an elementary modification of the basis that is easy to calculate. The following observation is the point of departure for this discussion. For each fixed j, let W_j denote the closed subspace of L²(R) generated by the functions u_{j,k}, k ∈ N, described by (6.13). Then f belongs to W_j if and only if f(t) = w_j(t)q(t), where q belongs to L²[a_j − α_j, a_{j+1} + α_{j+1}] and satisfies the following two conditions:

q(a_j + τ) = q(a_j − τ) if |τ| ≤ α_j,

q(a_{j+1} + τ) = −q(a_{j+1} − τ) if |τ| ≤ α_{j+1}.

There are no conditions that need to be satisfied on the interval [a_j + α_j, a_{j+1} − α_{j+1}].

From here, the merging algorithm is quite simple. Removing the point a_j of the partition amounts to replacing the two subspaces W_{j−1} and W_j by their orthogonal direct sum W_{j−1} ⊕ W_j without disturbing any of the other spaces W_{j'}, j' ≠ j − 1 and j' ≠ j. But this, in turn, comes down to replacing the two windows w_{j−1} and w_j by the new window w̃_j defined by w̃_j(t) = (w²_{j−1}(t) + w²_j(t))^{1/2}. The two lengths l_{j−1} and l_j are replaced by l̃_j = l_{j−1} + l_j, which changes the fundamental frequency in (6.13).

We consider a simple example to fix our ideas. Start with a segmentation with intervals of length 1, a_j = j, and choose w_j(t) = w(t − j) with α_j = 1/3. We wish to examine the windows that can appear as a result of the merging algorithm. These windows and the corresponding wavelets will look like centipedes (see the second Malvar-Wilson wavelet in Figure 6.2). The localization of these centipedes in the time-frequency plane is not optimal.
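The merging rule w̃_j = (w²_{j−1} + w²_j)^{1/2} can be checked directly: for windows built from a common profile, the merged window is exactly the window one would have written down for the union interval. A sketch with unit intervals and α = 1/3 (our sine-arch profile again stands in for an unspecified smooth choice):

```python
import math

ALPHA = 1.0 / 3.0

def rho(s):
    # rising profile with rho(s)^2 + rho(-s)^2 = 1
    if s <= -1: return 0.0
    if s >= 1: return 1.0
    return math.sin((math.pi / 4) * (1 + s))

def window(a_left, a_right, t):
    # Window for [a_left, a_right] obeying (6.8)-(6.12) with alpha = 1/3.
    return rho((t - a_left) / ALPHA) * rho((a_right - t) / ALPHA)

def merged(t):
    # Merging [0, 1] and [1, 2]: new window (w_0^2 + w_1^2)^(1/2).
    w0, w1 = window(0, 1, t), window(1, 2, t)
    return math.sqrt(w0 * w0 + w1 * w1)

# The merged window coincides with the window of the interval [0, 2] ...
for t in [ALPHA, 0.5, 1.0, 1.5, 2 - ALPHA]:
    assert abs(merged(t) - window(0, 2, t)) < 1e-12
# ... and still satisfies the overlap identity (6.11) at the endpoint 0.
for t in [x * 0.01 for x in range(-33, 34)]:
    assert abs(merged(0 + t) ** 2 + merged(0 - t) ** 2 - 1.0) < 1e-12
```

Inside the old overlap region around t = 1 the two squared windows sum to 1, so the square root flattens into the new plateau; nothing changes near the outer endpoints, which is why the α_j are never updated by merging.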
This is because, in using the merging algorithm, we never change the values of the numbers α_j. In our example, we always keep α_j = 1/3.

We must now provide the criterion that allows us to decide when to use the dynamic split-and-merge algorithm. This means that we need to establish a numerical value to measure what is gained or lost by adding or deleting a point in the subdivision. This is the purpose of the next section.

6.6 The entropy of a vector with respect to an orthonormal basis

Let H denote a Hilbert space and let (e_j)_{j∈J} be an orthonormal basis for H. Let x be a vector of H of norm 1 and write x = Σ a_j e_j. The entropy of x relative to the basis (e_j) is defined by exp(−Σ |a_j|² log |a_j|²). Roughly, this entropy measures the number of significant terms in this decomposition. In information theory it measures the quantity of information needed to store these coefficients. Note that it is minimal in the simplest case where x is one of the e_j, and it becomes large when many of the a_j are of the same order of magnitude. If we have a collection (e_j^ω)_{j∈J} of orthonormal bases where ω ranges over a set Ω, we will choose for the analysis of x the particular basis (indexed by ω_0) that yields the minimum entropy. This point of view poses three problems:
(1) Does an optimal basis exist?

(2) It is not clear that a compression algorithm whose only objective is efficiency can also be used for diagnostic tasks.

(3) The underlying energy criterion (the square of the norm in the Hilbert space H) can cause certain information in the signal to be given low priority, and this information can subsequently disappear in the compression even though it may be crucial for the diagnostic.

Until recently, the algorithms used in image analysis were based on an energy function that is defined as the quadratic mean value of the gray levels. The algorithm used to search for an optimal basis for compression does not escape this difficulty. The search for a norm that is better adapted to the structure of images is still an open problem, but some progress has been made using Besov spaces. This will be discussed in Chapter 11.

6.7 The algorithm for finding the optimal Malvar-Wilson basis

We will examine in detail the particular case where the Hilbert space H is the space of signals f with finite energy, which is defined by ∫|f(t)|² dt. The quality of the compression will be measured only by this L² criterion. The algorithm looks for "the best basis"; this is the one that optimizes compression based on the reduction of transmitted data. The search is done by comparing the scores of a whole family of orthonormal bases of L²(R). These are Malvar-Wilson bases, and they are obtained from segmentations of the real line into dyadic intervals. The decision to use only dyadic intervals is a poor man's limitation to save search time. Indeed, it would be impossible to scan all Malvar-Wilson bases. Note, however, that the decision to limit the search to dyadic intervals may introduce artifacts. For example, in speech processing, one goal of optimal segmentation is to extract the phonemes. It is clear that phonemes are not subject to the condition that they begin and end on dyadic intervals.
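The entropy defined in section 6.6 is straightforward to compute, and its two extreme cases can be checked directly: a basis vector has entropy 1, while N coefficients of equal size give entropy N. A minimal sketch:

```python
import math

def entropy(coeffs):
    """exp(-sum |a_j|^2 log |a_j|^2) for a unit vector x = sum a_j e_j.

    (The usual convention 0 * log 0 = 0 is used.)
    """
    s = 0.0
    for a in coeffs:
        p = a * a
        if p > 0.0:
            s -= p * math.log(p)
    return math.exp(s)

# Minimal case: x is one of the basis vectors -> entropy 1.
assert abs(entropy([1.0, 0.0, 0.0, 0.0]) - 1.0) < 1e-12

# Spread case: N coefficients of equal magnitude -> entropy N.
N = 64
uniform = [1.0 / math.sqrt(N)] * N
assert abs(entropy(uniform) - N) < 1e-9
```

For the uniform vector the exponent is exactly log N, so the entropy equals N: the quantity really does count the significant terms of the decomposition.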
It is rather surprising that this limited search for a best basis has proved to be interesting for speech processing [258]. (While on this subject, we mention that X. Fang has developed a segmentation algorithm that can be used to partition a speech signal so that the signal in each segment is "almost" a phoneme. This is not a wavelet algorithm, but once Fang's algorithm is used for preprocessing the signal, a wavelet algorithm can be used to analyze the individual phonemes. This is discussed in [257]. We note that the best-basis algorithm also played a role in the development of the standard for fingerprint compression [41].)

The dyadic intervals are systematically constructed in a scheme that moves from "fine" to "coarse." One begins with a segmentation having intervals of length 2^{−q}, where q ≥ 0 is large enough to capture the finest details appearing in the signal. By a change of scale, we may assume that q = 0. The process consists of removing, if necessary, certain points in the segmentation and in replacing, at the same time, two contiguous dyadic intervals I' and I'' (appearing in the former segmentation) with the dyadic interval I = I' ∪ I''. The desire to have a fast algorithm dictates that the merge algorithm be limited to situations where [a_{j−1}, a_j] and [a_j, a_{j+1}] are the left (I') and right (I'') halves of a dyadic interval I. For example, [2, 3] and [3, 4] can become [2, 4] with the disappearance of 3, but [3, 4] and [4, 5] can never become [3, 5]. For the point 4 to disappear, it would be necessary to wait for the possible merging of the intervals [0, 4] and [4, 8].
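The constraint on legal merges is purely arithmetic: [a, b] and [b, c] may merge only when they are the two halves of a dyadic interval [k·2^n, (k + 1)·2^n]. A short sketch of this test (the function name is ours):

```python
def can_merge(a, b, c):
    """True if [a, b] and [b, c] are the left and right halves
    of a dyadic interval [k * 2^n, (k + 1) * 2^n]."""
    if b - a != c - b:                 # halves must have equal length
        return False
    length = c - a                     # candidate dyadic length 2^n
    if length & (length - 1) != 0:     # must be a power of two
        return False
    return a % length == 0             # must start on a multiple of 2^n

# The examples from the text:
assert can_merge(2, 3, 4) is True      # [2,3] + [3,4] -> [2,4]
assert can_merge(3, 4, 5) is False     # [3,5] is not dyadic
assert can_merge(0, 4, 8) is True      # 4 disappears only at this level
```

This restriction is what turns the search into a binary tree traversal, and hence into the fast pyramid algorithm described below.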
98 CHAPTER 6 Having set q — 0, we start with the segmentation where the “fine grid” is Z. The intervals [а7-,а7+1] of section 6.5 are now [j, j + 1], and the first orthonormal basis to participate in the competition will be = V2w(t — j) cos ’rp + l) (i-j) . (6-24) where j G Z, к G N. The other orthonormal bases that participate in the competi- tion will all be obtained from this first one by merging. The algorithm that merges two orthonormal bases into one was described in section 6.5. Each partition of the real line into dyadic intervals of length greater than or equal to one defines one of the orthonormal bases that are allowed to participate in the competition. One reaches all of these partitions by iterating those elementary oper- ations that combine the left and right halves of a dyadic interval and by traversing this tree structure, starting from the “fine grid” Z. We will show how the competition proceeds in a moment, but first we establish a handy notation and make some simplifying assumptions. The collection of all the dyadic intervals I of length \I\ > 1 will be denoted by Z, and if I = [aj, aj+i] is one of these dyadic intervals, wj denotes the window that was denoted by Wj in section 6.5. In the same way, Wj denotes the closed subspace of Z2(R) that was denoted by Wj\ denotes the orthonormal sequence defined by (6.13), which is now an orthonormal basis for Wj. If I' and I" are, respectively, the left and right halves of the dyadic interval Z, then Wj — Wj> Ф Wj", and this direct sum is orthogonal. The signal f that we wish to analyze optimally is normalized by Ц/Ц2 = 1- To simplify the following discussion, we assume in addition that f(t) is zero outside the interval [1, T] for some sufficiently large T. Then f belongs to Wl if L = [0, 2Z] and I is large enough. It can be shown that if m tends to infinity, the entropy of f in the orthonormal basis of И/, I = [0, 2m], also tends to infinity. 
Thus there exists some value of m after which L = [0, 2^m] no longer enters into the competition. In other words, the dyadic partitions that come into play will, in fact, be the partitions of L = [0, 2^m] (for sufficiently large m) into dyadic intervals I of length |I| ≥ 1. The number of partitions is thus finite, but it can be incredibly large, the order of magnitude being 2^{2^{m−1}}.

It remains to find a fast algorithm to search for the "best basis." This is the algorithm that we are now going to describe. If I belongs to D, then we will write

ε(I) = −Σ_{k=0}^{∞} |⟨f, u_{I,k}⟩|² log |⟨f, u_{I,k}⟩|² (6.25)

and

ε*(I) = inf Σ_p ε(J_p), (6.26)

where the lower bound is taken over all the partitions (J_p) of the interval I into dyadic intervals J_p belonging to D. If I = [j, j + 1], then clearly ε*(I) = ε(I).

The problem that we must solve is thus reduced to finding the optimal partition (J_p) when I = L = [0, 2^m], the largest of the dyadic intervals involved in the competition. The calculation of ε*(L) and the determination of the optimal partition
cannot be done directly because the number of cases to be considered is too large. We will calculate ε*(I) for |I| = 2^n by induction on n. For n = 0, we must calculate ε*(I) = ε(I) for all intervals I = [j, j + 1] in [0, 2^m]. Next we proceed by induction on n, assuming that we have calculated ε*(I) for |I| = 2^n and that we have determined the corresponding covering (J_p). Suppose that |I| = 2^{n+1} and let I' and I'' be the left and right halves of I. There are two cases: If ε(I) ≤ ε*(I') + ε*(I''), keep I and forget all the preceding information about I' and I''; define ε*(I) = ε(I), and the partition of I is the trivial partition consisting of only I. If ε(I) > ε*(I') + ε*(I''), set ε*(I) = ε*(I') + ε*(I''), and the partition of I is obtained by combining the partitions of I' and I'' that were used to calculate ε*(I') and ε*(I''). Arriving at the "summit of the pyramid," that is to say, at L, we expect to have found the minimal entropy and the optimal partition of L, which leads to the optimal basis.

We have just described the dyadic version of the optimal basis search, but as we indicated above, the restriction to certain dyadic intervals is unrealistic. A translation-invariant algorithm that avoids this restriction has been developed by Coifman and Donoho [62].

6.8 An example where this algorithm works

Consider a signal f(t) = g(t) + h^{−1/2} e^{iωt} g((t − t₀)/h), where g(t) = e^{−t²/2}, the real number ω can be arbitrarily large, and 0 < h < 1. We will be concerned with the limiting situation where h is very small.

If f is analyzed using the Malvar-Wilson basis associated with a regularly segmented grid (a_j = ja), then the entropy of the decomposition is necessarily greater than C log(1/h). Indeed, if the grid mesh is of order 1, the term h^{−1/2} e^{iωt} g((t − t₀)/h) is very poorly represented, whereas if the mesh is of order h, the term g(t) is very poorly represented. The entropy of a decomposition of f can decrease to C (a constant) by using the adaptive segmentation of the last section.
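The pyramid search of section 6.7 is a bottom-up dynamic program over the dyadic tree. The sketch below (with invented toy cost values standing in for the entropies ε(I)) computes ε*(L) and the optimal partition:

```python
def best_basis(eps, m):
    """Bottom-up search for the minimal-cost dyadic partition of
    [0, 2^m].  `eps[(a, n)]` is the cost eps(I) of the dyadic interval
    I = [a, a + 2^n] (in the text, the entropy of f in the basis of W_I).
    Returns (eps_star, partition)."""
    best = {}   # (a, n) -> (eps*, optimal partition of that interval)
    for a in range(0, 2 ** m):                    # n = 0: eps* = eps
        best[(a, 0)] = (eps[(a, 0)], [(a, a + 1)])
    for n in range(1, m + 1):
        for a in range(0, 2 ** m, 2 ** n):
            half = 2 ** (n - 1)
            e1, p1 = best[(a, n - 1)]             # left half I'
            e2, p2 = best[(a + half, n - 1)]      # right half I''
            whole = eps[(a, n)]
            if whole <= e1 + e2:                  # keep I itself
                best[(a, n)] = (whole, [(a, a + 2 ** n)])
            else:                                 # combine the halves
                best[(a, n)] = (e1 + e2, p1 + p2)
    return best[(0, m)]

# Toy costs on [0, 4] (m = 2); values invented for illustration.
eps = {(0, 0): 1.0, (1, 0): 1.0, (2, 0): 0.2, (3, 0): 0.2,
       (0, 1): 1.5, (2, 1): 1.0, (0, 2): 3.0}
e_star, partition = best_basis(eps, 2)
assert abs(e_star - 1.9) < 1e-12
assert partition == [(0, 2), (2, 3), (3, 4)]
```

The work is linear in the number of dyadic intervals (about 2^{m+1}), which is why the search over the astronomically many partitions is feasible.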
Assume that h = 2^{−q} and that the initial grid is 2^{−q}Z. The optimal partition in dyadic intervals is then formed from the sequence of nested dyadic intervals J_q ⊂ J_{q−1} ⊂ ... ⊂ J_0 containing t₀ and having lengths 2^{−q}, 2·2^{−q}, ..., 1. To each J_n we associate the two contiguous intervals of the same length to the left and right of J_n. The extremities of the dyadic intervals thus defined constitute the optimal segmentation for f. It is not difficult to show that the entropy of f in the Malvar-Wilson basis corresponding to this segmentation does not exceed a certain constant C. The adaptive segmentation has allowed us to "zoom in" on the singularity of f, which is located at t = t₀. Thus, in this example, the optimal segmentation algorithm has provided an interesting analysis of the signal f.

6.9 The discrete case

We replace the real line R by the grid hZ, where h > 0 is the sampling step. Thus the signal f is given by a sampling denoted f(hk), k ∈ Z, but we will not discuss
here the technique used to arrive at this sampling. We will forget h in all that follows and assume that f is sampled on Z.

A partition of Z is defined by the intervals [a_j, a_{j+1}], where a_j − 1/2 is an integer (so that a_j itself is not an integer). (This construction often has been adopted for the DCT.) Denote the number of points belonging to [a_j, a_{j+1}] ∩ Z by l_j = a_{j+1} − a_j, and let the numbers α_j > 0 be small enough so that α_j + α_{j+1} ≤ l_j.

The windows w_j will be subject to exactly the same conditions as in the continuous case. This means that

w_j(t) = 0 outside the interval [a_j − α_j, a_{j+1} + α_{j+1}]; (6.27)

w_j(t) = 1 on the interval [a_j + α_j, a_{j+1} − α_{j+1}]; (6.28)

0 ≤ w_j(t) ≤ 1 and w_{j−1}(a_j + τ) = w_j(a_j − τ) if |τ| ≤ α_j; (6.29)

w_j²(a_j + τ) + w_j²(a_j − τ) = 1 if |τ| ≤ α_j. (6.30)

Then the double sequence u_{j,k}, given by (6.13) with t restricted to Z, is an orthonormal basis for l²(Z).

Nothing prevents us from considering a finite interval of integers and replacing l²(Z) by l²{1, ..., N}. Start with a_0 = 1/2 and end with a_{J₀+1} = N + 1/2. We require that w_0(t) be equal to 1 on [1/2, a_1 − α_1], and there is no other constraint on this interval. Similarly, w_{J₀}(t) = 1 on [a_{J₀} + α_{J₀}, N + 1/2] with no other constraint on the interval.

This shows that the Malvar-Wilson bases exist in very different algorithmic settings, and it is this that makes them more flexible than other analytic techniques such as Gabor wavelets and Grossmann-Morlet wavelets, for example.

6.10 Modulated Malvar-Wilson bases

As indicated in the introduction, the use of Malvar-Wilson bases comes down to covering the time-frequency plane with Heisenberg boxes whose sides are parallel to the coordinate axes. More precisely, the boxes are defined by an adaptive segmentation of the time axis; once this is done, the partition of the frequency axis follows automatically from the uncertainty principle (the area of each box being 2π).
The use of wavelet packets is based on a similar, but inverse, approach; in this case the adaptive filtering precedes the segmentation.

Unfortunately, these two options are incompatible with the use of more elaborate time-frequency algorithms such as the Wigner-Ville transform. This incompatibility was stressed in the first edition of this book. Since then the situation has changed considerably, and today we have orthonormal bases that are adapted to frequency-modulated signals. These new results were reported in [63], and the rest of this section is based on that article.

Our story begins with work by Richard Baraniuk, Simon Haykin, Douglas Jones, and Steve Mann (see [18], [19], [20], [21], [194], [195], and [196]). The time-frequency atoms that are adapted to frequency-modulated signals are called chirplets by these authors. Their chirplets are Gabor-type wavelets with an extra frequency modulation that is given by a linear chirp. The weakness of the original approach
is the lack of specific orthonormal chirplet bases that are flexible enough to be used in the context of the Coifman-Wickerhauser best-basis algorithm. To achieve this goal, the original chirplets will be reshaped in a way that mimics the construction of the Malvar-Wilson bases. We will now show how to construct orthonormal chirplet bases.

The signals for which such a construction might be useful are quasi-stationary signals that can be partitioned into a sequence of pieces with specific frequency modulation laws. This segmentation is provided by an arbitrary increasing sequence t_j, j ∈ Z, of real numbers. The best-basis algorithm will be looking for the optimal partition.

Since we want to avoid abrupt discontinuities, the segmentation of the signal is given by a sequence of bell-shaped functions w_j that mimic the characteristic functions of the intervals [t_j, t_{j+1}]. More precisely, we assume that lim_{j→±∞} t_j = ±∞, and we choose α_j > 0 such that

α_j + α_{j+1} ≤ l_j = t_{j+1} − t_j, j ∈ Z. (6.32)

We require that the bell-shaped functions w_j have the following properties:

0 ≤ w_j(t) ≤ 1 and w_j ∈ C^∞(R), (6.33)

w_j(t) = 0 if t ≤ t_j − α_j or t ≥ t_{j+1} + α_{j+1}, (6.34)

w_{j−1}(t_j + s) = w_j(t_j − s) if |s| ≤ α_j, (6.35)

Σ_{j=−∞}^{+∞} w_j²(t) = 1 for all t. (6.36)

These are exactly the conditions we used for constructing the Malvar-Wilson bases. We can now introduce the modulation "law" that did not exist in the standard Malvar-Wilson bases. The functions that provide the frequency modulation are real-valued quadratic spline functions whose knots are exactly the segmentation points t_j, j ∈ Z. In other words, φ(t) = (a_j/2)t² + b_j t + c_j if t_j ≤ t ≤ t_{j+1}, and φ is continuously differentiable on the real line. The orthonormal bases we will construct are adapted to frequency-modulated signals of the type f(t) = A(t) exp(iφ(t)) where A is smooth. Let Γ be the graph of ξ = φ'(t) in the time-frequency plane.
The class of signals we want to treat is illustrated by Γ in the time-frequency plane (see Figure 5.3).

Theorem 6.1. The collection of functions

W_{j,k}(t) = √(2/l_j) w_j(t) sin[(π/l_j)(k + 1/2)(t − t_j) + φ(t)], (6.37)

where j ∈ Z and k ∈ N, is an orthonormal basis for L²(R).

This is proved in [63]. Note that we have not defined W_{j,k} as e^{iφ(t)}u_{j,k}(t), where u_{j,k} is the standard Malvar-Wilson basis. We have chosen (6.37) instead because we wish to mimic the standard linear chirps, which are the functions e^{i(ut + vt²/2)}g_h(t − t₀), where g_h(t) = h^{−1/2}g(t/h) and where g is the Gaussian g(t) = π^{−1/4}e^{−t²/2}. Recall that these are the only functions for which the Wigner-Ville transform is nonnegative. Note, however, that the functions e^{iφ(t)}u_{j,k}(t) also form an orthonormal basis, since they are obtained from the Malvar-Wilson basis by a unitary mapping.
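The modulation law φ is determined by its chirp rates a_j and one initial slope and value: continuity of φ and φ' at the knots then fixes every b_j and c_j. A sketch of this bookkeeping (the knots and rates below are invented values, and the function names are ours):

```python
def build_phase(knots, rates, slope0=0.0, value0=0.0):
    """Quadratic spline phi with knots t_j: on [t_j, t_{j+1}],
    phi(t) = (a_j/2) t^2 + b_j t + c_j, glued so that phi is C^1.
    `rates` holds the chirp rates a_j, one per interval."""
    a0, t0 = rates[0], knots[0]
    b = slope0 - a0 * t0
    c = value0 - (a0 / 2) * t0 ** 2 - b * t0
    pieces = []
    for j, a in enumerate(rates):
        pieces.append((a, b, c))
        if j + 1 < len(rates):
            t = knots[j + 1]
            val = (a / 2) * t * t + b * t + c    # value at the knot
            der = a * t + b                      # slope at the knot
            a2 = rates[j + 1]                    # next chirp rate
            b = der - a2 * t
            c = val - (a2 / 2) * t * t - b * t

    def phi(t, d=0):
        # evaluate phi (d=0) or phi' (d=1) at t
        j = max(i for i in range(len(rates)) if knots[i] <= t)
        a, bj, cj = pieces[j]
        return a * t + bj if d else (a / 2) * t * t + bj * t + cj
    return phi

phi = build_phase(knots=[0.0, 1.0, 2.0, 3.0], rates=[2.0, -1.0, 0.5])
# C^1 at the knots: values and slopes agree from both sides.
for t in (1.0, 2.0):
    d = 1e-7
    assert abs(phi(t - d) - phi(t + d)) < 1e-5
    assert abs(phi(t - d, 1) - phi(t + d, 1)) < 1e-5
```

The instantaneous frequency φ' is then the continuous, piecewise-linear curve Γ that the chirplet basis is adapted to.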
6.11 Examples

Frequency-modulated signals play an important role in signal processing. One of the more scientifically interesting examples is given by the gravitational waves that are predicted by Einstein's general relativity. Although these waves have not yet been observed, two international programs were launched to obtain evidence of their existence. One process that is predicted to produce these waves is the collapse of binary stars. In this case, the analytic description is given explicitly by

f(t) = (t₀ − t)^{−1/4} cos[ω(t₀ − t)^{5/8} + θ], (6.38)

where t₀ is the time when the collapse occurred, θ is a parameter, and ω is a large constant that depends on the masses of the two stars. Since there is currently so much scientific interest in detecting gravitational waves, these signals are ideal for testing and comparing various time-frequency algorithms. Recalling the definition of a chirp that was given in section 5.9, we see that the two conditions |A'/(Aφ')| ≪ 1 and |φ''/(φ')²| ≪ 1 become

|t − t₀| ≫ (1/ω)^{8/5}.

The signals that experiments seek to measure are considerably corrupted by noise. If one follows Donoho's paradigm (discussed in Chapter 11), one is led, in the ideal case of Gaussian white noise, to build orthonormal bases in which these gravitational waves have a minimal description length [89]. This is what we are going to do now.

We begin with a textbook example. Define f(t) = w(t) exp(iλφ(t)) where φ is a smooth, real-valued function with φ'' ≥ 1, λ is a large parameter, and the window w is a smooth function with compact support. Then we use the best-basis algorithm. However, we will be looking for a suboptimal basis since the optimal one is out of reach. A suboptimal basis is a basis for which the entropy is of the same order of magnitude as the absolute minimum that would be reached as λ tends to infinity. The search for a suboptimal basis inside the unmodified Malvar-Wilson library leads to a segmentation of w(t) exp(iλφ(t)) with a uniform step size h = cλ^{−1/2}, c > 0.
On the other hand, if the chirplet library is used, the segmentation is still uniform but with a larger step size h = cλ^{−1/3}, c > 0, and this means better compression. The constant c is the order of magnitude of the inverse of the cube root of the third derivative of φ. This implies that c is infinite if the signal happens to be a linear chirp. This discussion leads to the following conclusion: For the class of frequency-modulated signals we are studying, a Wigner-Ville transform performs no better than a best-basis search inside the chirplet library. In both cases, the frequency resolution is O(λ^{1/3}).

The second example is perhaps more interesting, since the optimal segmentation is no longer uniform. We consider a signal of the form f(t) = w(t) cos(ωt^{1/2}), where w is again a smooth window with compact support and ω is a large parameter. To find the suboptimal segmentation in the chirplet library, we use a new variable x = ω²t. This leads to the segmentation of the function cos(x^{1/2}) over the large interval [0, ω²]. Then a suboptimal segmentation is given by x_k = ck⁶ where c is a positive constant. The values of the integers k are 0, 1, ..., k₀ where k₀ = c^{−1/6}ω^{1/3}. Returning to the t variable, we see that this nonuniform segmentation becomes t_k = cω^{−2}k⁶, 1 ≤ k ≤ k₀.

Finally, we consider gravitational waves. We assume that the parameter ω is large, that we are using the chirplet library, and that we are looking for a suboptimal
basis. In this situation, the suboptimal segmentation is no longer uniform, and in fact it becomes finer and finer as one approaches the blowup of the instantaneous frequency, which is the time when the binary star collapses. The segmentation of the signal f(t) on [t₀ − 1, t₀] is highly nonuniform. Without loss of generality, we are assuming that t₀ = 0. Then (up to an obvious sign change), the segmentation is given by t_k = cω^{−8/5}k^{24/5}, where 1 ≤ k ≤ k₀ and k₀ is of the order of ω^{1/3}. This means that the size of the segmentation step ranges from ω^{−1/3} down to ω^{−8/5} when one reaches 0, which is when the star collapses.

If we are looking for gravitational waves, the Wigner-Ville transform does not lead to very sharp time-frequency localization. However, in this case, it is possible to take advantage of the knowledge of the exact form of the chirp being sought to construct a quadratic transform, different from the Wigner-Ville transform, that is fashioned to detect optimally this particular kind of chirp. This transform is chosen from a large collection of quadratic transforms called Cohen's class. (For more information see [112] and [49].)

To complete this discussion, we will outline a wavelet technique proposed by J. M. Innocent and B. Torresani [148] for detecting the chirps described by (6.38). Their technique is based on a "ridge" detection. Consider the half-plane a > 0, b ∈ R where the continuous wavelet transform is defined. The "ridge" is the region near b = t₀ where the wavelet transform of a chirp will be large. This is explained informally as follows: Consider the chirp f(t) = A(t)e^{iφ(t)}. Its wavelet transform

W(f; a, b) = (1/a) ∫ f(t) ψ̄((t − b)/a) dt (6.39)

will be small due to cancellations if the chirp and the wavelet do not oscillate at the same frequency. By the same reasoning, the wavelet transform will be large if the pseudoperiod of the chirp, 2π/φ'(t), coincides with the pseudoperiod of the wavelet, which is a.
Thus the wavelet transform will be large near the curve defined by

a = 2π/φ'(b).

There is no cancellation on this curve, and the computation of the integral in (6.39) looks like this:

|W(f; a, b)| ≈ (1/a) ∫ |f(t)| |ψ((t − b)/a)| dt = (1/a) ∫ A(t) |ψ((t − b)/a)| dt.

In view of condition (5.27), we expect that A(t) does not vary much on the support of the wavelet, so that

(1/a) ∫ A(t) |ψ((t − b)/a)| dt ≈ ‖ψ‖₁ A(b).

This argument leads to the following heuristic: The continuous wavelet transform of a chirp is large in a neighborhood of the curve a = 2π/φ'(b), where

|W(f; a, b)| ≈ ‖ψ‖₁ A(b).
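This heuristic is easy to test numerically: for a chirp cos φ(t) with φ'(b) known, the modulus of (6.39) should peak near the scale a = 2π/φ'(b). A sketch with a Gaussian wavelet of pseudoperiod 1 (all numerical values here are illustrative):

```python
import math

OMEGA0 = 2 * math.pi   # wavelet pseudoperiod 1: psi(t) = exp(-t^2/2) e^{i 2 pi t}

def cwt_modulus(f, a, b, half_width=4.0, n=4000):
    # |W(f; a, b)| with W as in (6.39), by a midpoint Riemann sum
    lo, hi = b - half_width * a, b + half_width * a
    h = (hi - lo) / n
    re = im = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        x = (t - b) / a
        g = math.exp(-x * x / 2)             # Gaussian envelope
        # conj(psi((t - b)/a)) = g(x) e^{-i OMEGA0 x}
        re += f(t) * g * math.cos(OMEGA0 * x) * h
        im -= f(t) * g * math.sin(OMEGA0 * x) * h
    return math.hypot(re, im) / a

# Chirp with phi(t) = 20 t + t^2 / 2, so phi'(0) = 20.
f = lambda t: math.cos(20 * t + t * t / 2)
scales = [0.20 + 0.005 * i for i in range(60)]
best = max(scales, key=lambda a: cwt_modulus(f, a, b=0.0))
# Predicted ridge: a = 2 pi / phi'(0) = pi / 10, about 0.314.
assert abs(best - 2 * math.pi / 20) < 0.05
```

Away from the ridge the oscillations of the chirp and of the wavelet fall out of step and the integral collapses by cancellation, which is exactly the mechanism described above.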
This idea was first developed by Tchamitchian and Torresani in [247] and independently by Hunt, Kevlahan, Vassilicos, and Farge in [147]. In the case of chirps generated by the collapse of binary stars, φ(t) = ω(t₀ − t)^{5/8} and A(t) = (t₀ − t)^{−1/4}, and the ridge is located near the curve

a = (16π / 5ω) (t₀ − b)^{3/8}.

If we take ‖ψ‖₁ = 1, then |W(a, b)| ∼ (t₀ − b)^{−1/4} near this curve. This shows that the ridge depends on only two parameters. Innocent and Torresani propose a parametric statistical test to identify these two parameters, and thus to locate the ridge: find t₀ and determine the characteristic mass parameter ω.

6.12 Conclusions

The examples mentioned above suggest the following heuristic: If the Wigner-Ville transform W(t, ξ) of a given signal f is sharply concentrated in the time-frequency plane, then f has a compact decomposition in a suitable modulated Malvar-Wilson basis. This is too ambitious as stated, but it is an idea that opens the way to further study. At the present stage of research, a best-basis decomposition in either a Malvar-Wilson library or a wavelet packet library (Chapter 7) is a quick and efficient way to process a given signal or image. If one considers this analysis as a preprocessing of an image, one is led to the concept of a multilayered analysis. The idea is that the initial processing with, say, a Malvar-Wilson basis reveals aspects of the image, such as textures, that can be further analyzed with more refined tools. This idea of multilayered analysis is illustrated by the denoising of the Brahms live recording. (See section 5.4 and [34].) We will return to this subject in the next chapter, where once again the available analytic tools will be expanded, this time to include wavelet packet bases.
CHAPTER 7

Time-Frequency Analysis and Wavelet Packets

7.1 Heuristic considerations

A time-frequency analysis of a signal is a representation of the signal as a linear combination of time-frequency atoms. These time-frequency atoms are essentially characterized by an arbitrary duration t₂ − t₁ and an arbitrary frequency ω. The instant t₁ is the moment when the signal is first heard (if it is a speech signal, for example), and t₂ is the instant when it ceases to be heard. The frequency ω is an average frequency: it is the frequency of the emitted tone in the case of a musical signal, while the frequency spectrum given by Fourier analysis takes into consideration the parasitic frequencies created by the note's attack and decay. We also think of a time-frequency atom as occupying a symbolic region in the time-frequency plane (Figure 7.1). This symbolic region is a rectangle R with area 2π, which expresses the Heisenberg uncertainty principle.

Fig. 7.1. A Heisenberg box in the time-frequency plane.

The most famous example of time-frequency atoms is that of the Gabor wavelets. For these we have f_R(t) = e^{iω₀t} g_h(t − t₀), where t₀ = ½(t₁ + t₂) is the center of the time-frequency atom and g_h(t) = h^{−1/2} g(t/h), g(t) = π^{−1/4} e^{−t²/2}. In this case, the "size" of the time-frequency atom is approximately h, and h is approximately equal to the duration t₂ − t₁. To say that the time-frequency atom f_R occupies the symbolic region R of the time-frequency plane means that f_R is essentially supported by the interval [t₁, t₂] and that the Fourier transform f̂_R of f_R is essentially supported by the interval [ω₀ − π/h, ω₀ + π/h]. It is well known that there does not exist a function with compact
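As a numerical aside (ours, not the book's), the localization of the Gabor atom can be checked directly: the left sides of (7.1) and (7.2) below evaluate to h²/2 and h⁻²/2 for the Gabor atom, so the product of the time and frequency spreads attains the Heisenberg bound ½ for every h. The sketch samples the atom on a grid; the values w0 = 5, t0 = 0, and the grid sizes are arbitrary.

```python
import numpy as np

def spreads(h, w0=5.0, t0=0.0):
    """Time and frequency variances of the Gabor atom
    f(t) = exp(i*w0*t) * h**-0.5 * g((t - t0)/h),  g(t) = pi**-0.25 * exp(-t**2/2)."""
    t = np.linspace(-40.0, 40.0, 2**14)
    dt = t[1] - t[0]
    g = np.pi**-0.25 * np.exp(-((t - t0) / h)**2 / 2)
    f = np.exp(1j * w0 * t) * h**-0.5 * g
    # time spread: int (t - t0)^2 |f(t)|^2 dt, cf. the left side of (7.1)
    var_t = np.sum((t - t0)**2 * np.abs(f)**2) * dt
    # frequency spread: (1/2pi) int (xi - w0)^2 |fhat(xi)|^2 dxi, cf. (7.2)
    xi = 2 * np.pi * np.fft.fftfreq(t.size, d=dt)
    fhat = np.fft.fft(f) * dt
    dxi = 2 * np.pi / (t.size * dt)
    var_xi = np.sum((xi - w0)**2 * np.abs(fhat)**2) * dxi / (2 * np.pi)
    return var_t, var_xi

for h in (0.5, 1.0, 2.0):
    vt, vx = spreads(h)
    # vt ~ h^2/2, vx ~ 1/(2 h^2); the product of standard deviations is 1/2
    print(h, vt, vx, np.sqrt(vt * vx))
```

The product √(vt·vx) is ½ for every h, which is why the Gabor wavelet minimizes the constant C in (7.1)–(7.2).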
support whose Fourier transform also has compact support. This leads one to consider the following, less stringent conditions:

∫ (t − t₀)² |f_R(t)|² dt ≤ C² h²,   (7.1)

∫_{−∞}^{∞} (ξ − ω₀)² |f̂_R(ξ)|² dξ ≤ 2π C² h^{−2}.   (7.2)

The time-frequency atom that optimizes this criterion (that is, for which the constant C is the smallest possible) is precisely the Gabor wavelet, and the Gabor wavelet owes its success to this optimal localization in the time-frequency plane. On the other hand, we will see below that the Gabor wavelets have a disagreeable property that makes them unsuitable for time-frequency signal analysis.

If the time-frequency atoms f_R were actually concentrated on rectangles R in the time-frequency plane, they would enjoy the following property: If R₁ and R₂ are disjoint rectangles in the time-frequency plane, then

∫ f_{R₁}(t) conj(f_{R₂}(t)) dt = 0.   (7.3)

We indicate the "proof" of this property. If R₁ and R₂ are disjoint, then either the horizontal sides of the rectangles are disjoint or the vertical sides are disjoint. In the first case, the supports of f_{R₁} and f_{R₂} (in t) are disjoint, and the integral (7.3) is zero. In the second case, the supports of the Fourier transforms f̂_{R₁} and f̂_{R₂} (in ξ) are disjoint, and the integral (7.3) is still zero, as we see by applying Parseval's identity.

We know, in fact, that this cannot happen, and if f_{R₁} and f_{R₂} are Gabor wavelets, the integral (7.3) is never zero. But this integral is small if R₁ and R₂ are "remote," that is, if the rectangles mR₁ and mR₂ are disjoint. Here m ≥ 1 is an integer, and mR is the rectangle that has the same center as R and whose sides are m times the length of the sides of R. If m is large, remoteness becomes a very strong condition. Eric Sere has shown that remoteness of the rectangles R₀, R₁, R₂, ... does not imply that the corresponding Gabor wavelets f_{R₀}, f_{R₁}, f_{R₂}, ... are well separated from each other [235]. More precisely, for every m (no matter how large), there exist rectangles R₀, R₁, ...
in the time-frequency plane such that the rectangles mR_j are pairwise disjoint, and coefficients α₀, α₁, ..., such that

Σ_{j=0}^{∞} |α_j|² = 1  and  ∫_{−∞}^{∞} |Σ_{j=0}^{∞} α_j f_{R_j}(t)|² dt = +∞.   (7.4)

Thus remoteness of the rectangles in the time-frequency plane does not even imply that the corresponding Gabor wavelets are almost orthogonal, and consequently the apparent heuristic simplicity of the time-frequency plane is completely misleading. This phenomenon results from the arbitrariness of the parameters h > 0 that are used in the definition of the time-frequency atoms. The rectangles R₀, R₁, ... in Sere's result have arbitrarily large eccentricity. When h = 1 all is well, and the corresponding situation has been studied extensively. This is then a form of windowed Fourier analysis where the sliding window is a Gaussian [72].

Once we abandon the Gabor wavelets, we have two options: Malvar-Wilson wavelets and wavelet packets. We will briefly indicate the advantages and disadvantages of these two options.
If we use Malvar-Wilson wavelets, then, by their nature, the duration of the attack or of the decay is not necessarily related to the duration of the stationary part. We can, for example, have a Malvar-Wilson wavelet for which the durations of the attack and of the decay are of order 1, while the stationary part lasts T ≫ 1. If ω₀ is the frequency corresponding to this stationary part, then the Fourier transform of the wavelet will be, at best, of the form

sin T(ξ − ω₀) / (T(ξ − ω₀)),

and it cannot satisfy (7.2) because h is of the order of magnitude of T. Furthermore, this is true even if we allow in (7.2) the concentration around two frequencies of the same amplitude but opposite signs, that is, if we replace (7.2) with

∫_{−∞}^{∞} (|ξ| − ω₀)² |f̂_R(ξ)|² dξ ≤ 2π C² h^{−2}.

This wavelet looks like the second wavelet in Figure 6.2, the centipede referred to in the text on page 96. On the other hand, the Malvar-Wilson wavelets are constructed to be exactly orthogonal. The implication of these observations is that the orthogonality of the Malvar-Wilson wavelets has been won at the price of their frequency localization, a localization that no longer guarantees the "minimal conditions" (7.1) and (7.2).

One last remark about the Malvar-Wilson wavelets is obvious but significant: Although they are given by a simple formula, they are not obtained by translation, change of scale, and modulation (multiplication by e^{iωt}) of a fixed function g.

The option we propose in this chapter is that of wavelet packets. Here are a few advantages of wavelet packets:

(a) Daubechies's orthogonal wavelets (Chapter 3) are a particular case of wavelet packets.

(b) Wavelet packets are organized naturally into collections, and each collection is an orthonormal basis for L²(ℝ).

(c) One can compare the advantages and disadvantages of the various possible decompositions of a given signal in these orthonormal bases and select the optimal collection of wavelet packets for representing the given signal.
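A quick computation (ours, not the book's) makes the obstruction concrete. By Parseval's identity, the frequency spread of f(t) = A(t)e^{iω₀t} around ω₀ equals ∫|A′(t)|² dt, so for a window with attack and decay of length 1 and a plateau of length T, the energy-normalized spread decays only like 1/T, while (7.2) with h of order T would require 1/T². The trapezoidal envelope below is our stand-in for such a Malvar window.

```python
import numpy as np

def freq_spread_sq(T, dt=0.01):
    # trapezoidal envelope: attack and decay of length 1, plateau of length T
    t = np.arange(-1.0, T + 1.0 + dt, dt)
    A = np.clip(np.minimum(t + 1.0, T + 1.0 - t), 0.0, 1.0)
    # By Parseval, (1/2pi) int (xi - w0)^2 |fhat(xi)|^2 dxi = int |A'(t)|^2 dt
    # for f(t) = A(t) exp(i*w0*t), independently of w0.
    dA = np.gradient(A, dt)
    # normalize by the energy of the window
    return np.sum(dA**2) * dt / (np.sum(A**2) * dt)

for T in (16, 64, 256):
    # exact value is 2/(T + 2/3): decays like 1/T, far from the 1/T^2 of (7.2)
    print(T, freq_spread_sq(T))
```

Quadrupling T divides the spread by about 4, not 16: the frequency localization is governed by the attack and decay, not by the total duration, which is exactly why (7.2) fails with h ∼ T.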
(d) Wavelet packets are described by a simple algorithm: they are the functions 2^{j/2} w_n(2^j x − k), where j, k ∈ ℤ, n ∈ ℕ, and where the supports of the w_n are all contained in the same fixed interval [0, L]. The integer n plays the role of a frequency, and it can be compared with the integer k that occurs in the definition of the Malvar wavelets.

The price paid for these advantages is the same as that associated with the Malvar-Wilson wavelets. Indeed, if to facilitate intuition we associate with the wavelet packet 2^{j/2} w_n(2^j t − k) the rectangle R defined by k 2^{−j} ≤ t ≤ (k + 1) 2^{−j} and 2^j n ≤ ξ ≤ 2^j (n + 1) in the time-frequency plane, then this choice does not meet conditions (7.1) and (7.2). Furthermore, we cannot do better by assigning to w_n a frequency different from n, for although ‖w_n‖₂ = 1,

limsup_{n→+∞} [ inf_{ω∈ℝ} ∫_{−∞}^{∞} (ξ − ω)² |ŵ_n(ξ)|² dξ ] = +∞.   (7.5)

The frequency localization of wavelet packets is relatively poor, except for certain values of n, and hence the "lim sup" in (7.5) (see [102]).
7.2 The definition of basic wavelet packets

We begin by defining a special sequence of functions w_n, n ∈ ℕ, supported by the interval [0, 2N − 1], where N ≥ 1 is fixed at the outset. If N = 1, these functions w_n constitute the Walsh system, which is a well-known orthonormal basis for L²[0, 1]. (The Walsh system is discussed below.) If N ≥ 2, the functions w_n are no longer supported by [0, 1]; however, the double sequence

w_n(x − k), n ∈ ℕ, k ∈ ℤ,   (7.6)

will be an orthonormal basis for L²(ℝ). This orthonormal basis will allow us to do an orthogonal windowed Fourier analysis. Thus, for the moment, this construction is similar to the Malvar-Wilson wavelets. The difference occurs when the dilations enter, the changes of variable of the form x ↦ 2^j x.

We start with an integer N ≥ 1 and consider two finite trigonometric sums,

m₀(ξ) = (1/√2) Σ_{k=0}^{2N−1} h_k e^{−ikξ}  and  m₁(ξ) = (1/√2) Σ_{k=0}^{2N−1} g_k e^{−ikξ},   (7.7)

that satisfy the following familiar conditions:

g_k = (−1)^{k+1} h_{2N−1−k}, or m₁(ξ) = e^{−i(2N−1)ξ} conj(m₀(ξ + π)),   (7.8)

m₀(0) = 1 and m₀(ξ) ≠ 0 for ξ ∈ [−π/2, π/2],   (7.9)

|m₀(ξ)|² + |m₀(ξ + π)|² = 1.   (7.10)

One choice, which leads to Daubechies's wavelets (section 3.8), is given by

|m₀(ξ)|² = 1 − c_N ∫₀^ξ (sin t)^{2N−1} dt,   (7.11)

where

c_N ∫₀^π (sin t)^{2N−1} dt = 1,

but other choices are possible [65]. As a first example, take m₀(ξ) = ½(e^{−iξ} + 1) and m₁(ξ) = ½(e^{−iξ} − 1). Condition (7.10) reduces to

cos²(ξ/2) + sin²(ξ/2) = 1.

A second choice is given by

√2 h₀ = ¼(1 + √3), √2 h₁ = ¼(3 + √3), √2 h₂ = ¼(3 − √3), √2 h₃ = ¼(1 − √3).
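These conditions are easy to verify numerically. The sketch below (ours, not the book's) builds the filter of the second choice, derives the g_k from (7.8), and checks (7.9), (7.10), and the cross-orthogonality that makes the 2×2 matrix (m₀(ξ), m₁(ξ); m₀(ξ+π), m₁(ξ+π)) unitary, on a grid of frequencies.

```python
import numpy as np

# Daubechies N = 2 filter from the "second choice":
# sqrt(2)*h = ((1+s)/4, (3+s)/4, (3-s)/4, (1-s)/4), s = sqrt(3)
s = np.sqrt(3)
h = np.array([1 + s, 3 + s, 3 - s, 1 - s]) / (4 * np.sqrt(2))
N = 2
g = np.array([(-1)**(k + 1) * h[2 * N - 1 - k] for k in range(2 * N)])  # (7.8)

def m(c, xi):
    # m(xi) = 2**-0.5 * sum_k c_k * exp(-i*k*xi), cf. (7.7)
    k = np.arange(len(c))
    return np.sum(c * np.exp(-1j * k * xi[:, None]), axis=1) / np.sqrt(2)

xi = np.linspace(0, 2 * np.pi, 101)
m0, m0pi = m(h, xi), m(h, xi + np.pi)
m1, m1pi = m(g, xi), m(g, xi + np.pi)

print(np.abs(m(h, np.array([0.0]))[0] - 1))                  # m0(0) = 1, cf. (7.9)
print(np.max(np.abs(np.abs(m0)**2 + np.abs(m0pi)**2 - 1)))   # (7.10)
# cross-orthogonality: m0*conj(m1) + m0(.+pi)*conj(m1(.+pi)) = 0
print(np.max(np.abs(m0 * np.conj(m1) + m0pi * np.conj(m1pi))))
```

All three printed quantities vanish to machine precision, which is exactly the unitarity required in section 7.4 below.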
Having selected the coefficients h_k, we define the wavelet packets w_n by induction on n = 0, 1, 2, ... using the two identities

w_{2n}(x) = √2 Σ_{k=0}^{2N−1} h_k w_n(2x − k),   (7.12)

w_{2n+1}(x) = √2 Σ_{k=0}^{2N−1} g_k w_n(2x − k),   (7.13)

and the condition w₀ ∈ L¹(ℝ) with ∫ w₀(x) dx = 1.

We explain the roles of these two identities. Identity (7.12), with n = 0, is

w₀(x) = √2 Σ_{k=0}^{2N−1} h_k w₀(2x − k),   (7.14)

and the function φ = w₀ is a fixed point of the operator T : L¹(ℝ) → L¹(ℝ) that is defined by

Tf(x) = √2 Σ_{k=0}^{2N−1} h_k f(2x − k).   (7.15)

By taking the Fourier transform, this equation becomes

(Tf)^(ξ) = m₀(ξ/2) f̂(ξ/2).   (7.16)

If f is normalized by ∫ f(x) dx = 1, then the fixed point is unique, and it is given by

φ̂(ξ) = Π_{k=1}^{∞} m₀(ξ/2^k),   (7.17)

a relation we have now seen several times. On the other hand, the function φ can be constructed directly in "time" space by iterating T. Figure 7.2 illustrates the iterative scheme for constructing φ using the characteristic function of [0, 1] for the initial value f₀. Here, f_{j+1} = T(f_j), and we have drawn the first few functions f₀, f₁, and f₂. The coefficients h₀, h₁, h₂, and h₃ are approximately those given in the second example mentioned above. The sequence f_j converges uniformly to the fixed point φ.

Once the function φ = w₀ is constructed, we use (7.13) with n = 0 to obtain ψ = w₁. (The function ψ is the "mother" wavelet in the construction of the "ordinary" orthonormal wavelet bases, and φ is the "father" wavelet.) Next, we use (7.12) and (7.13) with n = 1 and obtain w₂ and w₃. By repeating this process, we generate, two at a time, all of the wavelet packets. The support of φ is exactly the interval [0, 2N − 1] (see [65]), and it is easy to show that the supports of the w_n, n ∈ ℕ, are included in [0, 2N − 1].

The central result about the basic wavelet packets we have constructed is that the double sequence

w_n(x − k), n ∈ ℕ, k ∈ ℤ,   (7.18)
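The iteration f_{j+1} = T f_j of Figure 7.2 can be carried out exactly on a dyadic grid, since T maps grid samples to grid samples. The following is a minimal sketch (ours, not the book's) for the Daubechies filter of the second example; the grid depth L and the number of iterations are arbitrary choices.

```python
import numpy as np

# Iterate T f(x) = sqrt(2) * sum_k h_k f(2x - k), cf. (7.15), starting from
# f0 = characteristic function of [0, 1), on the grid 2**-L * Z over [0, 2N-1] = [0, 3].
s = np.sqrt(3)
h = np.array([1 + s, 3 + s, 3 - s, 1 - s]) / (4 * np.sqrt(2))   # Daubechies, N = 2
L = 8
x = np.arange(0, 3 + 2.0**-L, 2.0**-L)

f = ((x >= 0) & (x < 1)).astype(float)      # f0 = chi_[0,1)

def T(f):
    out = np.zeros_like(f)
    for k, hk in enumerate(h):
        # f(2x - k) on the grid: sample index 2n - k*2**L, zero outside [0, 3]
        idx = 2 * np.arange(f.size) - k * 2**L
        ok = (idx >= 0) & (idx < f.size)
        out[ok] += np.sqrt(2) * hk * f[idx[ok]]
    return out

for _ in range(20):
    f_prev, f = f, T(f)

print(np.sum(f) * 2.0**-L)          # T preserves the integral: stays close to 1
print(np.max(np.abs(f - f_prev)))   # successive iterates converge uniformly
```

Once φ is computed this way, one obtains ψ = w₁ by a single application of the g-filter, as in (7.13) with n = 0.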
Fig. 7.2. Iterating T starting with f₀ = χ_{[0,1]}.

is an orthonormal basis for L²(ℝ). To be more precise, the subsequence derived from (7.18) by taking 2^j ≤ n < 2^{j+1} is an orthonormal basis for the orthogonal complement W_j of V_j in V_{j+1}. Recall that, in the language of multiresolution analysis, V_j is the closed subspace of L²(ℝ) spanned by the orthonormal basis 2^{j/2} φ(2^j x − k), k ∈ ℤ, and similarly, 2^{j/2} ψ(2^j x − k), k ∈ ℤ, is an orthonormal basis for W_j. Thus, the construction of wavelet packets appears as a change of orthonormal basis inside each W_j.

An interesting observation concerns the case where the filter has length one and h₀ = h₁ = g₀ = −g₁ = 1/√2. This brings us back to the Walsh system mentioned at the beginning of the section. We let r denote the one-periodic function that equals 1 on the interval [0, ½) and −1 on the interval [½, 1). To define the Walsh
system W_n, n ∈ ℕ, let χ denote the characteristic function of the interval [0, 1) and, for n = ε₀ + 2ε₁ + ⋯ + 2^j ε_j, where each ε_i = 0 or 1, write

W_n(x) = [r(x)]^{ε₀} [r(2x)]^{ε₁} ⋯ [r(2^j x)]^{ε_j} χ(x).

Then it is not difficult to verify that

W_{2n}(x) = W_n(2x) + W_n(2x − 1)

and that

W_{2n+1}(x) = W_n(2x) − W_n(2x − 1).

This shows that, in the case of filters of length one, the construction of wavelet packets leads to the Walsh system. The Walsh system W_n, n ∈ ℕ, is an orthonormal basis for L²[0, 1], and it follows immediately that the double sequence W_n(x − k), n ∈ ℕ, k ∈ ℤ, is an orthonormal basis for L²(ℝ). (For more about the Walsh system and its connection with quadrature mirror filters, see [44].)

In the general case of basic wavelet packets (filters longer than one), the supports of w_n(x − k) and w_{n′}(x − k′) are not necessarily disjoint when k ≠ k′, and proving the orthogonality of the double sequence w_n(x − k), n ∈ ℕ, k ∈ ℤ, is more subtle. We will return to this orthogonality issue in section 7.4.

7.3 General wavelet packets

The basic wavelet packets are the functions w_n, n ∈ ℕ (which are derived from a filter {h_k}), and the sequence w_n(x − k), n ∈ ℕ, k ∈ ℤ, is an orthonormal basis for L²(ℝ). This orthonormal basis is analogous to the Walsh system, but for filters longer than one, it is more regular. That is, the frequency localization of the functions w_n is better than the frequency localization of the functions in the Walsh system. Nevertheless, this frequency localization does not yield an estimate of the type

inf_{ω∈ℝ} ∫_{−∞}^{∞} (ξ − ω)² |ŵ_n(ξ)|² dξ ≤ C   (7.19)

uniformly in n (see [65]).

The general wavelet packets are the functions

2^{j/2} w_n(2^j x − k), n ∈ ℕ, j, k ∈ ℤ.   (7.20)

These are much too numerous to form an orthonormal basis. In fact, we can extract several different orthonormal bases from the collection (7.20).
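The Walsh case can be checked by direct computation. The sketch below (ours, not the book's) builds W_n from its definition as a product of Rademacher functions, verifies the orthonormality of W₀, ..., W₁₅ on [0, 1), and confirms the two recursions. Sampling at step 2⁻⁸ is exact here because all the functions involved are piecewise constant on coarser dyadic intervals.

```python
import numpy as np

L = 8                                   # 2**L samples of [0, 1)
x = np.arange(2**L) / 2**L

def r(y):
    # the 1-periodic Rademacher function: +1 on [0, 1/2), -1 on [1/2, 1)
    return np.where((y % 1.0) < 0.5, 1.0, -1.0)

def W(n):
    # W_n(x) = prod_i r(2**i * x)**eps_i for the binary digits n = sum eps_i 2**i
    out = np.ones_like(x)
    i = 0
    while n >> i:
        if (n >> i) & 1:
            out = out * r(2**i * x)
        i += 1
    return out

# orthonormality of W_0, ..., W_15 on [0, 1): the Gram matrix is the identity
G = np.array([[np.mean(W(p) * W(q)) for q in range(16)] for p in range(16)])
print(np.max(np.abs(G - np.eye(16))))

def scaled(n, shift):
    # W_n(2x - shift) sampled on the same grid, zero outside its support
    y = 2 * x - shift
    inside = (y >= 0) & (y < 1)
    out = np.zeros_like(x)
    out[inside] = W(n)[np.round(y[inside] * 2**L).astype(int)]
    return out

# the recursions W_{2n} = W_n(2x) + W_n(2x-1), W_{2n+1} = W_n(2x) - W_n(2x-1)
for n in range(8):
    assert np.array_equal(W(2 * n), scaled(n, 0) + scaled(n, 1))
    assert np.array_equal(W(2 * n + 1), scaled(n, 0) - scaled(n, 1))
print("recursions verified")
```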
The choice j = 0, n ∈ ℕ, k ∈ ℤ, leads to the orthonormal basis described in the previous section, while the choice n = 1, j, k ∈ ℤ, leads to an orthonormal wavelet basis, as described in Chapter 3.

There is another way to select a basis from the functions in (7.20). Associate with each of the wavelet packets (7.20) the "frequency interval" I(j, n) defined by 2^j n ≤ ξ < 2^j (n + 1). The following result describes certain sets of wavelet packets that constitute orthonormal bases for L²(ℝ).

THEOREM 7.1. Let E be a set of pairs (j, n), j ∈ ℤ, n ∈ ℕ, such that the corresponding frequency intervals I(j, n) constitute a partition of [0, ∞), up to a countable set. Then the subsequence

2^{j/2} w_n(2^j x − k), (j, n) ∈ E, k ∈ ℤ,   (7.21)

is an orthonormal basis for L²(ℝ).
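The hypothesis of Theorem 7.1 is a purely combinatorial condition on E, so it can be checked mechanically. A small sketch (ours, not the book's), using exact dyadic arithmetic; the example set E, mixing two fine channels with dyadic wavelet bands, is an arbitrary illustration restricted to a finite frequency range.

```python
from fractions import Fraction

def I(j, n):
    # the "frequency interval" of the packet 2**(j/2) * w_n(2**j x - k)
    return (Fraction(2)**j * n, Fraction(2)**j * (n + 1))

def is_partition(E, limit):
    """Check that the intervals I(j, n), (j, n) in E, tile [0, limit) exactly:
    sorted intervals must abut with no gaps and no overlaps."""
    ivals = sorted(I(j, n) for (j, n) in E)
    if ivals[0][0] != 0 or ivals[-1][1] != limit:
        return False
    return all(a[1] == b[0] for a, b in zip(ivals, ivals[1:]))

# two fine channels at scale 2**-2, then dyadic wavelet bands up to 8:
# [0,1/4), [1/4,1/2), [1/2,1), [1,2), [2,4), [4,8)
E = [(-2, 0), (-2, 1), (-1, 1), (0, 1), (1, 1), (2, 1)]
print(is_partition(E, 8))

# replacing (0, 1) by (0, 2) leaves a gap at [1, 2): not an admissible E
print(is_partition([(-2, 0), (-2, 1), (-1, 1), (0, 2), (1, 1), (2, 1)], 8))
```

The first choice tiles [0, 8) and hence (extended dyadically to all of [0, ∞)) satisfies the theorem; the second does not.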
Notice that choosing E is choosing a partition of the frequency axis. This partitioning is "active," whereas the corresponding sampling with respect to the variable x (or t) is passive and is dictated by Shannon's theorem. Going back to Ville, we see that wavelet packets lead to a signal analysis technique where the process is "first filter different frequency bands; then cut these bands into slices (in time) to study their energy variations." Similarly, we refer to the methodology developed by Lienard: "The proposed analysis process contains the following steps: filtering with a zero-phase filterbank, and modeling the output signals into successive waveforms (channel-to-channel modeling)."

When we have at our disposal a "library" of orthonormal bases, each of which can be used to analyze a given signal of finite energy, we are necessarily faced with the problem of knowing which basis to choose. We settle this problem with the same approach that we used for the Malvar-Wilson wavelets: The optimal choice is given by the entropy criterion that we have already used in the preceding chapter. This entropy criterion provides an adaptive filtering of the given signal.

7.4 Splitting algorithms

Let (α_k) and (β_k), k ∈ ℤ, be two sequences of coefficients that satisfy the following conditions:

Σ |α_k|² < ∞, Σ |β_k|² < ∞,

and, by defining m₀(θ) = Σ α_k e^{−ikθ} and m₁(θ) = Σ β_k e^{−ikθ}, the matrix

U(θ) = ( m₀(θ)     m₁(θ)
         m₀(θ+π)   m₁(θ+π) )

is unitary. Consider a Hilbert space H with an orthonormal basis (e_k), k ∈ ℤ, and define the sequence f_k, k ∈ ℤ, of vectors in H by

f_{2k} = √2 Σ_l α_{2k−l} e_l,  f_{2k+1} = √2 Σ_l β_{2k−l} e_l.   (7.22)

Then the sequence (f_k), indexed by k ∈ ℤ, is also an orthonormal basis for the Hilbert space H. Next, let H₀ be the closed subspace of H generated by the vectors f_{2k}, k ∈ ℤ; similarly, H₁ will be generated by the f_{2k+1}, k ∈ ℤ.
Nothing prevents us from repeating on (H₀, f_{2k}) the operation we have done on (H, e_k) and from iterating these decompositions while keeping the same coefficients (α_k) and (β_k) at each step.

An elementary example is useful for understanding the nature of this splitting algorithm. The initial Hilbert space is L²[0, 2π] with the usual orthonormal basis e_k = (1/√2π) e^{ikθ}, k ∈ ℤ. The (2π-periodic) functions m₀ and m₁ are (when restricted to [0, 2π)) the characteristic functions of [0, π) and [π, 2π). Then the vectors f_{2k} are, up to normalization, e^{i2kθ} m₀(θ), and they constitute a Fourier basis for the interval [0, π), while the vectors f_{2k+1} constitute a Fourier basis for the interval [π, 2π). Finally, the subspace H₀ of H is composed of the functions supported on the interval [0, π), while H₁ is composed of the functions supported on [π, 2π).

Iterating the splitting algorithm leads to subspaces that are naturally denoted by H_{(ε₁, ..., ε_j)}, where each ε_i = 0 or 1, or even by H_I, where I denotes the dyadic interval of length 2^{−j} and origin ε₁ 2^{−1} + ⋯ + ε_j 2^{−j}. In the example we have just studied, H_I is exactly the subspace of L²[0, 2π) consisting of the functions that vanish outside the interval 2πI.
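One step of the splitting algorithm is a filter-and-downsample operation. The sketch below (ours, not the book's) applies (7.22) with the Haar-type filters of the first example of section 7.2, on a periodized finite window (a convenience not in the text), and checks that the change of basis from (e_k) to (f_k) preserves energy, as unitarity of U(θ) guarantees.

```python
import numpy as np

# One splitting step (7.22) in l2(Z), truncated to a periodized window of length 64.
# The new coordinates of a signal c are its correlations with f_{2k} and f_{2k+1}.
alpha = np.array([0.5, 0.5])     # m0(theta) = (1 + e^{-i theta})/2, a low-pass filter
beta = np.array([-0.5, 0.5])     # m1(theta) = (e^{-i theta} - 1)/2, a high-pass filter

def split(c):
    a = np.zeros(c.size // 2)
    b = np.zeros(c.size // 2)
    for k in range(a.size):
        for j in range(alpha.size):
            # <c, f_{2k}> = sqrt(2) * sum_l alpha_{2k-l} c_l  (indices mod the window)
            a[k] += np.sqrt(2) * alpha[j] * c[(2 * k - j) % c.size]
            b[k] += np.sqrt(2) * beta[j] * c[(2 * k - j) % c.size]
    return a, b

rng = np.random.default_rng(0)
c = rng.standard_normal(64)
a, b = split(c)
# the f_k form an orthonormal basis, so ||c||^2 = ||a||^2 + ||b||^2
print(np.sum(c**2) - (np.sum(a**2) + np.sum(b**2)))
```

Iterating `split` on the channel `a` (or `b`) reproduces the maze of channels H_{(ε₁, ε₂, ...)} described next.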
This example has guided the intuition of scientists working in signal processing. Assuming that the signal is sampled on ℤ, they have considered the situation where (α_k) and (β_k) are two finite sequences and where m₀ resembles the transfer function of a low-pass filter while m₁ resembles that of a high-pass filter. One requires, at least, that m₀(0) = 1 and that m₀(θ) ≠ 0 for θ ∈ [−π/2, π/2]. By analogy with the preceding example, these scientists were led to believe that the iterative scheme, which we have called the splitting algorithm, would provide a finer and finer frequency definition as one wanders through the maze of "channels" illustrated in Figure 7.3.

Fig. 7.3. An illustration of the splitting algorithm.

The initial Hilbert space H is the direct sum of various combinations of these subspaces. In particular, H is the direct sum of all the subspaces at the same "splitting level": at the first level there are 2 subspaces, at the next level there are 4, then 8, 16, and so on.

To give a better understanding of the construction of wavelet packets and the exact nature of the splitting algorithm, consider the case where the initial Hilbert space is the space V_j, j ≥ 1 (in the language of multiresolution analysis), with the orthonormal basis 2^{j/2} φ(2^j x − k), k ∈ ℤ. Next, suppose that the splitting algorithm has operated j times. Then we arrive exactly at the sequence of functions w_n(x − k), k ∈ ℤ, 0 ≤ n < 2^j, and n = ε₀ + 2ε₁ + ⋯ + 2^{j−1} ε_{j−1} is the index of the "frequency channel" H_{(ε₀, ε₁, ..., ε_{j−1})}.

The frequency localization of wavelet packets does not conform to the intuition of the scientists who introduced these algorithms, and the only case where there is a precise relation between the integer n and a frequency in the sense of Fourier analysis is the case where m₀ and m₁ are the transfer functions of "ideal filters."
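The failure of the naive frequency intuition is already visible in the Walsh case. The sketch below (ours, not the book's) counts the sign changes (the "sequency") of the Walsh functions W_n for n = 0, ..., 7: the count is not monotone in n, so the channel index n produced by the splitting algorithm is not a Fourier frequency.

```python
import numpy as np

L = 8
x = np.arange(2**L) / 2**L

def r(y):
    # the 1-periodic Rademacher function: +1 on [0, 1/2), -1 on [1/2, 1)
    return np.where((y % 1.0) < 0.5, 1.0, -1.0)

def W(n):
    # Walsh function in the wavelet-packet (Paley) ordering, as in section 7.2
    out = np.ones_like(x)
    i = 0
    while n >> i:
        if (n >> i) & 1:
            out = out * r(2**i * x)
        i += 1
    return out

# "sequency" = number of sign changes of W_n on [0, 1)
sequency = [int(np.sum(W(n)[1:] != W(n)[:-1])) for n in range(8)]
print(sequency)
```

The sequency list is a permutation of 0, ..., 7 but not the identity: the channel index n and the oscillation rate are related only through a reshuffling, which disappears only for ideal filters.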
7.5 Conclusions

It remains for us to indicate how to use wavelet packets. We begin by selecting, for use throughout the discussion, two sequences h_k, g_k, 0 ≤ k ≤ 2N − 1, that satisfy the conditions for constructing wavelet packets. The choice of these sequences results from a compromise between the length (2N − 1) of the filters and the quality of the frequency resolution. Once the filters are selected, we set in motion the algorithm for constructing the wavelet packets. We obtain a huge collection of orthonormal bases for L²(ℝ) from this process. It is then a question of determining, for a given signal, the optimal basis. And again, the optimal basis is the one (among all those given by the wavelet packets) that yields the most compact decomposition of the signal.

We determine this optimal basis by using a "fine-to-coarse" strategy and the method of merging. We start from the finest frequency channels H_I; these are associated with the dyadic intervals I of length |I| = 2^{−m}. The integer m is taken to be as large as necessary to be consistent with the chosen precision. The algorithm proceeds by making the following decision: It combines the left and right halves, I′ and I″, of a dyadic interval I whenever the orthonormal basis of H_I yields a more compact representation than that obtained by using the two orthonormal bases of H_{I′} and H_{I″}.

The discrete version of wavelet packets also can be used and is immediately available. It is obtained by starting with the Hilbert space H = ℓ²(ℤ) of a signal sampled on ℤ and the canonical orthonormal basis (e_k), k ∈ ℤ, where e_k(n) = 1 when n = k and 0 elsewhere. Here there is perfect resolution in position but no resolution in the frequency variable. Next, we systematically apply the splitting algorithm to improve the frequency definition until we reach the spaces H_I associated with the dyadic intervals I of length |I| = 2^{−m}. Finally, we apply the algorithm to choose the best basis (section 6.7).
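The fine-to-coarse merging can be sketched in a few lines. The code below (ours, not the book's) uses Haar-type splitting and the additive entropy-type cost −Σ c_i² log c_i²; it is written as a top-down recursion, which computes the same minimum as the bottom-up merging described above. The cost function, the depth, and the test signal are illustrative choices.

```python
import numpy as np

def split(c):
    # one Haar splitting step (the filters of the first example of section 7.2)
    return (c[0::2] + c[1::2]) / np.sqrt(2), (c[0::2] - c[1::2]) / np.sqrt(2)

def cost(c):
    # additive entropy-type cost: -sum c_i^2 log c_i^2 (smaller = more compact)
    e = c[c != 0]**2
    return float(-np.sum(e * np.log(e)))

def best_basis(c, depth):
    """Keep the channel whole, or split it and recurse, whichever is cheaper.
    Returns (best cost, chosen tree), where the tree is 'leaf' or a pair."""
    here = cost(c)
    if depth == 0 or c.size < 2:
        return here, "leaf"
    (ca, t0), (cb, t1) = (best_basis(s, depth - 1) for s in split(c))
    return (ca + cb, (t0, t1)) if ca + cb < here else (here, "leaf")

# a pure tone is compact in frequency, so the search should split repeatedly
n = np.arange(256)
tone = np.cos(2 * np.pi * 32 * n / 256)
c_tone, tree_tone = best_basis(tone, 4)
print(tree_tone != "leaf")       # the tone is assigned to finer frequency channels
print(c_tone <= cost(tone))      # the chosen basis is at least as compact
```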
Wavelet packets offer a technique that is dual to the one given by the Malvar-Wilson wavelets. In the case of wavelet packets, we effect an adaptive filtering, whereas the Malvar-Wilson wavelets are associated with an adaptive segmentation of the time (or space) axis.

As was the case with wavelets, wavelet packet orthonormal bases exist in two dimensions, where they have interesting applications to the efficient coding of textured images. We quote from [200], where the basic ideas are developed and interesting examples of the compression of textured images are presented. Having pointed out that wavelets provide good compression for smooth images, François Meyer writes:

Wavelets, however, are ill suited to represent oscillatory patterns. Rapid variations of intensity can only be described by the small scale wavelet coefficients. Long oscillatory patterns thus require many such fine scale coefficients. Unfortunately those small scale coefficients carry very little energy, and are often quantized to zero, even at low compression rates. In order to describe long oscillatory patterns, much larger libraries of waveforms, called wavelet packets, have been developed.

After presenting several examples where the best wavelet packet basis outperforms wavelet coding (both visually and in terms of the quadratic mean), the author offers this criticism:

We realize that when coding images that contain a mixture of smooth and textured features, the best-basis algorithm is always trying to find a compromise between two conflicting goals: describe the large scale
smooth regions, and describe the local oscillatory patterns. The best basis is chosen in order to minimize the entropy, but such a choice may not always yield "visually pleasant" images. In fact we sometimes notice ringing artifacts on the border of smooth regions, when the basis is mostly composed of oscillatory patterns.

These comments highlight one of the fundamental problems in image processing, which is that no single basis (that we are aware of today) is well suited to compress all images. A recent approach proposed by François Meyer and his collaborators is reminiscent of color separation in the printing industry [201]. An image is separated into several layers, such as the smooth-regions layer and the textures layer. Each layer is coded differently, using the transform, or basis, most appropriate for the layer. This is done in such a way that the compressed layers can be restored and put back together to produce a good image. The process is similar to the denoising algorithm used in [34], which we alluded to in sections 5.4 and 6.12. We also note the work of Jacques Froment for another approach to the problem of separating natural images into smooth regions and textured regions [122].

The sparse representation of images is a subtle and controversial issue. Wavelet packets offer an interesting option and perform better than ordinary wavelet expansions when one wants to represent textured features accurately. However, the same remark applies to Malvar-Wilson bases, and one must decide which of these options to use. As if this were not complicated enough, highly textured images are well represented with brushlets [202]. Brushlets provide an improvement on wavelet packets by having better frequency localization. In both cases, one is trying to fit the frequency channels to the signal.
We have mentioned that wavelet packets do not enjoy the desired frequency localization; in the Fourier domain, their decay is not ideal. Using brushlets amounts to decomposing the Fourier transform of a given signal with an adapted Malvar-Wilson basis.
CHAPTER 8

Computer Vision and Human Vision

We propose to describe and comment on a small part of David Marr's work. We limit our discussion to Marr's analysis of the "low-level" processing of luminous information by the retinal cells. Marr suggested that the coding of this luminous information was based on the zero-crossings of an operator that is now called a wavelet transform. This hypothesis leads us to state the famous "Marr conjecture" and then to state its precise form as conjectured by Mallat. This precise form yields a remarkably effective algorithm. We will see, however, that Mallat's conjecture is not generally correct, and this poses some fascinating new problems.

8.1 Marr's program

Marr's book, Vision, A Computational Investigation into the Human Representation and Processing of Visual Information [198], appeared in 1982. Stylistically it is reminiscent of Descartes's Discours de la methode. Exactly as Descartes did, Marr takes us into his confidence and speaks to us as if we were a friend or colleague from his laboratory. Marr confides in us his intellectual progress and tells us about his doubts, his hopes, and his enthusiasms. He gives a lively description of the theories he has struggled with and rejected, and he explains his own research with an infectious enthusiasm.

We recall that the goal of Marvin Minsky's group at the Massachusetts Institute of Technology (MIT) artificial intelligence laboratory was to solve the problem of artificial vision for robots. The challenge was to construct a robot endowed with a perception of its environment that enabled it to perform specific tasks. It turned out that the first attempts to construct a robot capable of understanding its surroundings were completely unsuccessful. These surprising setbacks showed that the problem of artificial vision was much more difficult than it seemed. The idea then occurred to imitate, within the limits imposed by robot technology, certain solutions found in nature.
Marr, who was an expert on the human visual system, was invited to leave Cambridge, England, for Cambridge, Massachusetts, to join the MIT group. According to Marr, the disappointments of the robot scientists were due to having skipped a step. They had tried to go directly from the statement of the problem to its solution without having at hand the basic scientific understanding that is necessary to construct effective algorithms.

Marr's first premise is that there exists a science of vision, that it must be developed, and that once there has been sufficient progress, the problems posed by vision for robots can be solved.
Marr's second premise is that the science of human vision is no different from the science of robot (or computer) vision.

Marr's third premise is that it is as vain to imitate nature in the case of vision as it would have been to construct an airplane by imitating the form of birds and the structure of their feathers. On the other hand, he notes that the laws of aerodynamics explain the flight of birds and enable us to build airplanes. Thus it is important, as much for human vision as for computer vision, to establish scientific foundations rather than blindly to seek solutions.

To develop this basic science, one must carefully define the scope of inquiry. In the case of human vision, one must clearly exclude everything that depends on training, culture, taste, and similar "conditioning." For instance, the ability to distinguish the canvas of a master from that of an imitator has nothing to do with the science of basic human vision. One retains only the mechanical or involuntary aspects of vision, that is, those aspects that enable us to move around, to drive a car, and so on. Thus we limit the following discussion to low-level vision. This is the aspect of vision that enables us to re-create the three-dimensional organization of the physical world around us from the luminous excitations that stimulate the retina.

The notion that low-level vision functions according to universal scientific algorithms seemed to be an implausible idea to some scientists, and it encountered two kinds of opposition. In the first place, neurophysiologists had discovered certain cells having specific visual functions. But Marr was opposed to this reductionist approach to the problems of vision, and he offered two criticisms on this subject:

(a) After several very stimulating discoveries, neurophysiologists had not made sufficient progress to enable them to explain the action of the human visual system based on a collection of ad hoc cells.
(b) It would be absurd to look for the cell that lets you immediately recognize your grandmother.

On another front, Marr was opposed to attempts by psychologists to relate the performance of the human visual system to a learning process. Roughly, the idea is that we recognize the familiar objects of our environment by dint of having seen and touched them simultaneously. In fact, Bela Julesz made a fundamental discovery that eliminated this as a working hypothesis.

Julesz made a systematic study of the response of the human visual system when it was presented with completely artificial images (synthetic images having no significance) that were computer-generated, random-dot stereograms. If these synthetic images presented a certain "formal structure" that stimulated stereovision reflexes, the eye deduced, in several milliseconds and without the slightest hesitation, a three-dimensional organization of the image. This organization "in relief" is clearly only a mirage in which the mechanism of stereovision finds itself trapped. This mechanism acts with the same speed, the same quality, and the same precision as if it were a matter of recognizing familiar objects. The conclusion is that familiarity with the objects one sees plays no role in the primary mechanisms of vision. Marr set out to understand the algorithmic architecture of these low-level mechanisms.

This venture can be likened to that of the seventeenth-century physiologists who studied the human body by comparing it to a complex and subtle machine, an assembly of bones, joints, and nerves whose functioning could be explained, calculated, and predicted by the same laws that applied to winches and pulleys. A century and a half later, Claude Bernard made a similar connection between the
organic functioning of the human body and results from the nascent field of organic chemistry. The synthesis of urea (Wohler, 1828) again reduced the gap between the chemistry of life and organic chemistry.

In their scientific approach, these researchers relied on solid, well-founded knowledge that came either from mechanics or from chemistry. They then tried to effect a technology transfer and to apply results acquired in the study of matter to the life sciences. But what Marr set out to do was much more difficult because the relevant knowledge base, namely, an understanding of robots, was too tenuous to serve as the nucleus for an explanation of the human visual system.

Marr asserted that the problems posed by human vision or by computer vision are of the same kind and that they are part of a coherent and rigorous theory, of an articulate and logical doctrine. It is necessary, at the outset, to set aside any consideration of whether the results will ultimately be implemented with copper wires or nerve cells and to limit the investigation to the following four properties of human vision that we wish to imitate or reproduce in robots:

(a) The recognition of contours of objects. These are the contours that delimit objects and structure the environment into distinct objects.
(b) The sense of the third dimension from two-dimensional retinal images and the ability to arrive at a three-dimensional organization of physical space.
(c) The extraction of relief from shadows.
(d) The perception of movement in an animated scene.

The fundamental questions posed by Marr are the following:

(a) How is it scientifically possible to define the contours of objects from the variations of their light intensity?
(b) How is it possible to sense depth?
(c) How is movement sensed? How do we recognize that an object has moved by examining a succession of images?
Marr opened a very active area of contemporary scientific research by giving each of these problems a precise algorithmic formulation and by furnishing parts of the solution in the form of algorithms. Marr's working hypothesis is that human vision and computer vision face the same problems. Thus, the algorithmic solutions can and must be tested within the framework of robot technology and artificial vision. In case of success, it is necessary to investigate whether these algorithms are physiologically realistic. For example, Marr did not believe that human neuronal circuits used iterative loops, which are an essential aspect of the existing algorithms.

This discussion raises the basic problem of knowing the nature of the representation on which the algorithms act. Marr used a simple comparison to help us understand the implications of a representation. If the problem at hand was adding integers, then the representation of the integers could be given in the Roman system, in the decimal system, or in the binary system. These three systems provide three representations of the integers. But the algorithms used for addition will be different in the three cases, and they will vary greatly in difficulty. This shows that the choice of this or that representation involves significant consequences. (See the quotation on page 12.)
8.2 The theory of zero-crossings

Marr felt that image processing in the human visual system has a complex hierarchical structure, involving several layers of processing. The "low-level processing" furnishes a representation that is used by later stages of visual information processing. Based on a very precise analysis of the functioning of the ganglion cells, Marr was led to this hypothesis: The basic representation ("the raw primal sketch") furnished by the retinal system is a succession of sketches at different scales, and these scales are in geometric progression. These sketches are made with lines, and these lines are the zero-crossings that Marr uses in the following argument [198, p. 54]:

The first of the three stages described above concerns the detection of intensity changes. The two ideas underlying their detection are (1) that intensity changes occur at different scales in an image, and so their optimal detection requires the use of operators of different sizes; and (2) that a sudden intensity change will give rise to a peak or trough in the first derivative or, equivalently, to a zero-crossing in the second derivative .... These ideas suggest that in order to detect intensity changes efficiently, one should search for a filter that has two salient characteristics. First and foremost, it should be a differential operator, taking either a first or second derivative of the image. Second, it should be capable of being tuned to act at any desired scale, so that large filters can be used to detect blurry shadow edges, and small ones to detect sharply focused fine details in the image.

Marr and Hildreth [199] argued that the most satisfactory operator fulfilling those conditions is the filter $\Delta G$, where $\Delta$ is the Laplacian operator $(\partial^2/\partial x^2 + \partial^2/\partial y^2)$ and $G$ stands for the two-dimensional Gaussian distribution
$$G(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2+y^2)/2\sigma^2},$$
which has standard deviation $\sigma$.
$\Delta G$ is a circularly symmetric Mexican-hat-shaped operator whose distribution in two dimensions may be expressed in terms of the radial distance $r$ from the origin by the formula^5
$$\Delta G(r) = -\frac{1}{\pi\sigma^4}\left(1 - \frac{r^2}{2\sigma^2}\right)e^{-r^2/2\sigma^2}.$$

Marr is computing the two-dimensional wavelet transform of the image using the wavelet $\psi$, which is the Laplacian of the Gaussian $G$. Today, $\psi$ is known as Marr's wavelet. If a black and white image is defined by the gray levels $f(x,y)$, the zero-crossings of Marr's theory are the lines defined by the equation $(f * \psi_\sigma)(x,y) = 0$. Since the function $\psi$ is even, the values of the convolution product $f * \psi_\sigma$ are (up to a proportionality factor) the wavelet coefficients of $f$, analyzed with the wavelet $\psi$. Hence, the zero-crossings are defined by the vanishing of the wavelet coefficients.

The values of $\sigma$ remain to be specified. The values used in human vision are in geometric progression, and they were discovered by Campbell, Robson, Wilson, Giese, and Bergen, based on neurophysiological experiments. These experiments led to the values $\sigma_j = (1.75)^j \sigma_0$. Marr's conjecture is that the original image $f$ is completely determined by the sequence of lines defined by $(f * \psi_{\sigma_j})(x,y) = 0$. Interest in this representation of an

^5 We have changed the notation and corrected typos.
image stems from its invariance under translations, rotations, and dilations. Here are some of Marr's thoughts about this representation [198, p. 67]:

Zero-crossings provide a natural way of moving from an analog or continuous representation like the two-dimensional image intensity values $I(x,y)$ to a discrete, symbolic representation. A fascinating thing about this representation is that it probably incurs no loss of information. The arguments supporting this are not yet secure.

In the following pages, we propose to study Marr's conjecture. We will show first of all that it is incorrect for periodic images covering an unbounded area. In particular, we will construct a whole family of periodic functions that have the same zero-crossings. However, since our counterexample is unbounded, it does not exclude the possibility that the conjecture is true for images having finite extent. We will then examine Mallat's conjecture, which is a version of Marr's conjecture. Mallat's conjecture leads to an explicit algorithm for reconstructing the image. This algorithm works very well in spite of the fact that Mallat's conjecture is in general false. Although the algorithm is not widely used in practice, there is continuing research interest in this technique. The counterexample that we construct is, in a certain sense, more realistic than the one we present in the case of Marr's conjecture.

8.3 A counterexample to Marr's conjecture

We begin with a counterexample in one dimension. It will then be easy to transform it into a two-dimensional counterexample. This counterexample has the property of being periodic in $x$ (or in $x$ and $y$ in the two-dimensional case). We do not know how to construct other counterexamples.
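Before turning to the counterexample, the zero-crossing representation of section 8.2 can be sketched numerically. The fragment below is our own illustration, not code from the text: it applies SciPy's Laplacian-of-Gaussian filter (that is, convolution with Marr's wavelet) to a hypothetical disk image, at the scales $\sigma_j = (1.75)^j\sigma_0$; the test image, the number of scales, and the small threshold used to ignore numerically flat regions are all assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# Hypothetical test image: gray levels f(x, y) with a sharp circular edge.
n = 128
yy, xx = np.mgrid[0:n, 0:n]
f = ((xx - n / 2) ** 2 + (yy - n / 2) ** 2 < (n / 4) ** 2).astype(float)

# gaussian_laplace computes the Laplacian-of-Gaussian response, i.e., the
# wavelet coefficients of f analyzed with Marr's wavelet at scale sigma.
sigma0 = 1.0
for j in range(4):
    sigma = 1.75 ** j * sigma0          # the scales sigma_j = (1.75)^j sigma_0
    w = gaussian_laplace(f, sigma=sigma)
    # Zero-crossings: strict sign changes between adjacent pixels, ignoring
    # regions where the response is numerically zero.
    s = np.where(np.abs(w) > 1e-6 * np.abs(w).max(), np.sign(w), 0.0)
    zc = (s[:, :-1] * s[:, 1:] < 0)[:-1, :] | (s[:-1, :] * s[1:, :] < 0)[:, :-1]
    print(f"sigma = {sigma:5.2f}: {int(zc.sum())} zero-crossing pixels")
```

For this synthetic image, the zero-crossings at every scale trace a curve near the boundary of the disk, which is exactly the contour information Marr's representation is meant to retain.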
Consider all the functions $f$ of the real variable $x$, having real values, and given by the series
$$f(x) = \sin x + \sum_{k=2}^{\infty} a_k \sin kx, \qquad (8.1)$$
where we require that
$$\sum_{k=2}^{\infty} k^3 |a_k| < 1. \qquad (8.2)$$
We are going to show that all choices of the coefficients $a_k$ lead to the same zero-crossings. For example, $\sin x$ and $\sin x + \frac{1}{9}\sin 2x$ have the same zero-crossings.

We prove this assertion by applying the following simple observation: If $u$ and $v$ are two continuous functions of $x$, and if, for some constant $r \in [0,1)$, $|v(x)| \le r|u(x)|$ for all $x$, then $u(x) + v(x) = 0$ is equivalent to $u(x) = 0$. Returning to (8.1), define $g_\delta(x) = \frac{1}{\delta\sqrt{2\pi}}\, e^{-x^2/2\delta^2}$. Then
$$f * g_\delta(x) = e^{-\delta^2/2}\sin x + \sum_{k=2}^{\infty} a_k e^{-k^2\delta^2/2}\sin kx.$$
It follows from this that
$$-\frac{d^2}{dx^2}(f * g_\delta) = e^{-\delta^2/2}\sin x + \sum_{k=2}^{\infty} k^2 a_k e^{-k^2\delta^2/2}\sin kx = u(x) + v(x).$$
Since $|\sin kx| \le k|\sin x|$, we have $|v(x)| \le r|u(x)|$, where $r = \sum_{k=2}^{\infty} k^3|a_k| < 1$. Thus the zero-crossings of all the functions $f$ are $x = m\pi$, $m \in \mathbb{Z}$.

If we wish to have $0 < f(x) < 1$, it is sufficient to add a suitable constant to $f(x)$ (defined by (8.1)) and then to renormalize the result by multiplication with a suitable positive constant. These two operations do not change the positions of the zero-crossings. A nontrivial two-dimensional counterexample is given by the function
$$f(x,y) = \sin x \sin y + \sum_{k=2}^{\infty} a_k \sin kx \sin ky,$$
where we now require that
$$2\sum_{k=2}^{\infty} k^4 |a_k| < 1.$$

8.4 Mallat's conjecture

The existence of these counterexamples and several remarks Marr made in his book led Stephane Mallat to a more precise version of Marr's conjecture. Mallat observed that numerical image processing using certain kinds of pyramid algorithms (quadrature mirror filters) and Marr's approach represented two particular examples of wavelet analysis of an image. In fact, one has $\Delta(f * g_\delta) = \delta^{-2} f * \psi_\delta$, where
$$\psi(x,y) = -\frac{1}{\pi}\left(1 - \frac{x^2+y^2}{2}\right)e^{-(x^2+y^2)/2}$$
is Marr's wavelet. With this in mind, Mallat took up a promising approach: The idea was to give Marr's conjecture a precise numerical and algorithmic formulation by taking advantage of the progress that had been made in image processing in the early 1980s using pyramid algorithms.

We start with the one-dimensional case. Mallat replaced the Gaussian $\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$ with the basic cubic spline $\theta$, whose support is the interval $[-2,2]$. Recall that $\theta = T * T$, where $T$ is the triangle function whose value is $1 - |x|$ if $|x| \le 1$ and $0$ if $|x| > 1$. Let $f$ be the function we wish to analyze by the method of zero-crossings, and write $\theta_\delta(x) = \delta^{-1}\theta(\delta^{-1}x)$. Then the zero-crossings are the values of $x$ where the second derivative $\frac{d^2}{dx^2}(f * \theta_\delta)$ is zero and changes sign. To use the pyramid algorithms, Mallat assumes that $\delta = 2^{-j}$, $j \in \mathbb{Z}$.
He then proposes to code the signal $f$ with the double sequence $(x_{q,j}, y_{q,j})$, where

(a) $x = x_{q,j}$ is (for $\delta = 2^{-j}$) a zero of $\frac{d^2}{dx^2}(f * \theta_\delta)$ where this function changes sign, and
(b) $y_{q,j} = \frac{d}{dx}(f * \theta_\delta)(x_{q,j})$.

In other words, Mallat considers the values of $x = x_{q,j}$ where $\frac{d}{dx}(f * \theta_\delta)$ has an extremum, and he keeps the values of these local extrema in memory. Certain of these local extrema are related to points where the signal $f$ changes rapidly; this is the case for the points $x_1$ and $x_2$ in Figure 8.1. Other extrema are related to points where the function changes very little. Mallat had the idea
to consider only the first of these and thus to retain only the local maxima of $|\frac{d}{dx}(f * \theta_\delta)|$. This will not change the critical analysis that follows. Coding $f$ with the double sequence $(x_{q,j}, y_{q,j})$ meets two objectives: It is invariant under translation, and it corresponds to a precise form of Marr's conjecture. Here is what Marr wrote [198, p. 68]:

On the other hand, we do have extra information, namely, the values of the slopes of the curves as they cross zero, since this corresponds roughly to the contrast of the underlying edge in the image. An analytic approach to the problem seems to be difficult, but in an empirical investigation, Nishihara (1981) found encouraging evidence supporting the view that a two-dimensional filtered image can be reconstructed from its zero-crossings and their slopes.
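Mallat's coding by local extrema can be sketched on a discrete grid. The implementation below is our own illustration, not Mallat's algorithm: the step signal, the discretization, and the 5% threshold used to retain only the significant maxima of $|\frac{d}{dx}(f * \theta_\delta)|$ are assumptions.

```python
import numpy as np

def cubic_spline(x):
    """The basic cubic spline theta = T * T, supported on [-2, 2]."""
    ax = np.abs(x)
    return np.where(ax <= 1, (4 - 6 * ax ** 2 + 3 * ax ** 3) / 6,
                    np.where(ax <= 2, (2 - ax) ** 3 / 6, 0.0))

def mallat_code(f, dx, j):
    """(x_{q,j}, y_{q,j}): positions of the significant local maxima of
    |d/dx (f * theta_delta)| at scale delta = 2^-j, with the slopes there."""
    delta = 2.0 ** (-j)
    m = int(np.ceil(2 * delta / dx))
    kernel = cubic_spline(np.arange(-m, m + 1) * dx / delta) * dx / delta
    d1 = np.gradient(np.convolve(f, kernel, mode="same"), dx)
    a = np.abs(d1)
    peaks = np.where((a[1:-1] >= a[:-2]) & (a[1:-1] > a[2:])
                     & (a[1:-1] > 0.05 * a.max()))[0] + 1
    return [(i * dx, d1[i]) for i in peaks]

# Hypothetical step signal: the retained extrema localize the two jumps,
# and the recorded slopes carry the sign and size of each jump.
dx = 1e-3
x = np.arange(0.0, 1.0, dx)
f = ((x >= 0.3) & (x < 0.7)).astype(float)
for j in (3, 4, 5):
    code = mallat_code(f, dx, j)
    print(f"j = {j}: extrema near", [round(p, 3) for p, _ in code])
```

For step functions this coding behaves exactly as the text explains below: the extrema sit at the discontinuities and the recorded slopes determine the jumps.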
We are going to show that this conjecture is in general incorrect. However, this assertion must be tempered, since our counterexample depends on a specific choice for the function $\theta$. If $\theta$ is the cubic spline, then we have a counterexample. If, on the other hand, the cubic spline $\theta$ is replaced with the function that is equal to $1 + \cos x$ if $|x| \le \pi$ and to $0$ if $|x| > \pi$ (which is the Tukey window), then, for all signals $f$ with compact support, reconstruction is theoretically possible but unstable (see Appendix C). In this case, it comes down to determining a function with compact support from the knowledge of its Fourier transform in the neighborhood of zero, and this is an unstable process.

Appendix C contains a complete description of our counterexample, but for those who wish to skip the details, we provide an outline here in the main text. We begin by making a change of scale so that the values of $\delta$ are $2\pi 2^{-j}$ rather than $2^{-j}$, $j \in \mathbb{Z}$. (This is a convenience rather than an essential point.) We then define $f_0(x) = 1 + \cos x$ if $-\pi \le x \le \pi$ and $f_0(x) = 0$ if $|x| > \pi$. The first step consists in finding the zeros of $\frac{d^2}{dx^2}(f_0 * \theta_\delta)$, which are the inflection points of $f_0 * \theta_\delta$. We note that $f_0 * \theta_\delta(x) = 0$ whenever $|x| > \pi + 2\delta$, and thus we search for the other zeros. Since $(f_0 * \theta_\delta)'' = f_0'' * \theta_\delta$, $\theta_\delta$ is even, and $\cos(\frac{\pi}{2} + t) = -\cos(\frac{\pi}{2} - t)$, it is clear just by examining the integral that $(f_0 * \theta_\delta)''(\frac{\pi}{2}) = 0$ for all $\delta \le \frac{\pi}{4}$. When $\delta$ is large, the roles played by $f_0$ and $\theta_\delta$ are interchanged, and we write $(f_0 * \theta_\delta)'' = f_0 * \theta_\delta''$. Then when $\delta > 3\pi$, we see that $(f_0 * \theta_\delta'')(\frac{2\delta}{3}) = 0$, again by just examining the integral.

We introduce a perturbation $R$ that belongs to $C^\infty(\mathbb{R})$, is even, and is supported in a region $a \le |x| \le b$ bounded away from the origin (the precise bounds are specified in Appendix C). We also require that the first three moments of $R$ vanish. Having fixed such an $R$, the perturbation of $f_0$ is $f = f_0 + \varepsilon R$, where $\varepsilon > 0$ is small. We then prove—and this is all in Appendix C—that $(f * \theta_\delta)''(x) = 0$ implies $(R * \theta_\delta)''(x) = 0$ and $(R * \theta_\delta)'(x) = 0$.
A stronger statement is actually needed: There exists a constant $C$ such that
$$\left|\frac{d^2}{dx^2}(R * \theta_\delta)(x)\right| \le C\left|\frac{d^2}{dx^2}(f_0 * \theta_\delta)(x)\right|$$
uniformly for all $\delta = 2\pi 2^{-j}$. Once this is proved, it is clear from the argument given above for the Marr counterexample that $f$ and $f_0$ have the same zero-crossings. The fact that $(f * \theta_\delta)'(x) = (f_0 * \theta_\delta)'(x)$ at these zero-crossings follows from the definition of $R$.

If the function $f$ that we wish to analyze by Mallat's algorithm is a step function (with an arbitrarily large number of discontinuities), then Mallat's conjecture is correct. In fact, thanks to the symmetry of the function $\theta$, the zero-crossings occur (for sufficiently small $\delta > 0$) at the points of discontinuity, while the values of the first derivatives of the smoothed signal furnish the jumps in the signal at these discontinuities. In this case, we have perfect reconstruction of the signal. All this explains, without doubt, why Mallat's algorithm works in practice with such excellent precision, no matter which signals are treated. The signals in question have more in common with step functions than with the subtle functions described in the counterexamples.

8.5 The two-dimensional version of Mallat's algorithm

We start with a two-dimensional image $g$. From this we create the increasingly blurred versions at scales $\delta = 2^{-j}$, $j \in \mathbb{Z}$, by taking the various convolution products
$g * \Theta_\delta$, where, in two dimensions, $\Theta_\delta(x,y) = \theta_\delta(x)\theta_\delta(y)$. The function $\theta$ is the basic cubic spline used in one dimension. Next we consider the local maxima of the modulus of the gradient of $g * \Theta_\delta$. We keep in memory the positions of these local maxima as well as the gradients at these points. The conjecture is that this data, computed for $\delta = 2^{-j}$, characterizes the image whose gray levels are given by $g(x,y)$.

We will show that this conjecture is incorrect in this general form. This does not exclude the possibility of its being true if (1) more restrictive assumptions are made about the function $g$ or (2) the definition of the smoothing operator is changed. The counterexample in two dimensions will not be compactly supported. Finding a counterexample whose support is a square is an unsolved problem. Our counterexample will be $g(x,y) = f(x) + f(y)$, where $f$ is the counterexample in one dimension. Then
$$g(x,y) * \theta_\delta(x)\theta_\delta(y) = f(x) * \theta_\delta(x) + f(y) * \theta_\delta(y),$$
and the gradient of this function is the vector
$$\left(\frac{d}{dx}(f * \theta_\delta)(x),\ \frac{d}{dy}(f * \theta_\delta)(y)\right).$$
Its length is $\left(|\frac{d}{dx}(f * \theta_\delta)|^2 + |\frac{d}{dy}(f * \theta_\delta)|^2\right)^{1/2}$, and it has a maximum if and only if $|\frac{d}{dx}(f * \theta_\delta)|$ and $|\frac{d}{dy}(f * \theta_\delta)|$ are at a maximum. But the set of functions $f$ has been constructed so that the positions of the maxima of $|\frac{d}{dx}(f * \theta_\delta)|$ are independent of the choice of $f$, and the same is true for the values of $\frac{d}{dx}(f * \theta_\delta)$ at these points when $\delta = 2\pi 2^{-j}$, $j \in \mathbb{Z}$.

8.6 Conclusions

All of this shows that Marr's conjecture is doubtful. Nevertheless, the underlying heuristics are playing a key role in signal processing. A successful example is the signal analysis being done by Alain Arneodo and his group to reveal the complex nature of certain signals, particularly velocity signals from fully developed turbulence. This processing has been used on other signals, including signals derived from DNA and financial time series, with impressive results. (The application to turbulence will be described in detail in the next chapter.)
We note that the problem of reconstruction is irrelevant for these applications: One wishes to extract some meaningful characteristics of the signal, but one is not interested in reconstructing the signal. Thus, even if Marr's conjecture is doubtful, its spirit is alive.

Regarding Mallat's conjecture, one must distinguish between the problem of unique representation and that of stable reconstruction. In our opinion, the reconstruction is never stable (unless the class of images to which the algorithm is applied is seriously limited). But it is, in certain cases, a representation that defines the image uniquely (see [166] and Appendix C).
CHAPTER 9

Wavelets and Turbulence

9.1 Introduction

Studying turbulence with wavelets is a controversial scientific program. This is not surprising, since we are attacking one of the most difficult, and itself controversial, problems in science with a rather simple tool. Criticism arose originally when a few scientists announced that spectacular results had been obtained by wavelet methods. It was highly unlikely, however, that one of the oldest fundamental problems of classical physics—a problem whose solution has eluded some of the outstanding scientists of the twentieth century—would suddenly be resolved by the mere introduction of a new tool.

Similar criticisms arose when wavelet methods were first applied to image processing. Today we have a much better understanding of how low-level processing can benefit from wavelet methods. We also understand that some aspects of image processing, such as pattern recognition, are not directly accessible through wavelet methods. In our report on wavelets and turbulence, we hope to draw similarly balanced conclusions by indicating what is working and what is not.

Wavelets have been applied to at least three problems in fluid dynamics during the past 15 years. The first one concerns a line of research that was introduced by Benoit Mandelbrot and developed by Uriel Frisch and Giorgio Parisi; it is the program that has led to the recent results by Alain Arneodo and his coworkers in Bordeaux, France. These programs seek to unravel the intricate fine-scale geometrical structure of fully developed turbulence by analyzing time series obtained from wind-tunnel experiments. One wishes to know if a fractal or multifractal model is appropriate. A second problem concerns the detection and modeling of the coherent structures that are found in turbulent flows. The third problem mentioned in this chapter deals with the mathematical and numerical treatment of the Navier-Stokes equations.
The question here is, Do wavelet-based algorithms perform better than conventional numerical schemes?

Turbulence was studied, described, and modeled long before wavelets existed. As in image processing—and in many other scientific fields—the most readily available and widely used tool was the Fourier transform. We indicate the successes and the limitations of this methodology in the next section. The bulk of the chapter is devoted to discussing the multifractal formalism and the role of wavelets in the continuing development of this approach. Next, we will indicate how wavelets have been used for studying the coherent structures in turbulence. As the last application, we will indicate how wavelets are being used to study the Navier-Stokes equations. (We suggest the book [118] by Uriel Frisch as a general reference on turbulence that discusses the concepts introduced in this chapter.)
9.2 The statistical theory of turbulence and Fourier analysis

The purpose of statistical modeling is to provide useful descriptions of large data sets that originate from complex phenomena. Statistical modeling contrasts sharply with the nineteenth-century approach to science, which culminated with accomplishments like the work of Albert Einstein and the axiomatization of quantum mechanics. Physicists were looking for a few fundamental equations (indeed, partial differential equations) that would describe all the laws of the universe. They sought beauty and simplicity, and, measured by technological accomplishments, this approach has been remarkably successful. This very success has led us in the late twentieth century to attack increasingly complex problems, which for one reason or another have resisted a purely deterministic approach. Statistical modeling is often an appropriate intellectual approach when faced with large data sets that are generated by a specific procedure and that present similarities that must be accurately described and understood. This approach is appropriate whenever a purely deterministic attack on a problem is impossible or impractical.

The study of fluid dynamics is an intermediate case. The mathematical equations that govern the evolution of fluid flows have been known for a century; they form a system of nonlinear partial differential equations known as the Navier-Stokes equations. In principle, everything is written in these equations, but in practice we are faced with three monumental problems, which are surely related: We do not understand the mathematics of the Navier-Stokes equations, we cannot efficiently compute the solutions of these equations, and it is difficult to access experimentally the full space-time complexity of fully developed turbulence.
In some practical situations, simplifications can be introduced that lead to tractable numerical simulations: They are of great importance in aerodynamics, for instance, where they replace costly wind-tunnel experiments. In the nontractable situations, stochastic modeling is often used. Once this is accepted, another quandary must be faced: Should the model be determined by a data-fitting procedure, or should it be based on plausible assumptions compatible with physical principles? Initially, the second approach was taken in the case of fully developed turbulence. However, data fitting has gained importance as instrumentation and experimental techniques have improved. The theoretical successes of the mid-twentieth century have been followed by the challenge of creating more sophisticated models to fit the accurate experimental data of the latter part of the century.

The statistical theory of turbulence was introduced more or less simultaneously by Kolmogorov in 1941 [167], [168], Obukhov in 1941 [220], Onsager in 1945 [221], Heisenberg in 1948 [141], and von Weizsacker in 1948 [255]. This work involved applying the statistical tools used for studying stationary processes to understand the partition of energy at different scales in the solutions of the Navier-Stokes equations. According to Leray, this statistical point of view could be justified by the loss of stability and uniqueness of the solutions for very large Reynolds numbers and for large values of time [172]. More recently, with the advent of computers, we have become even more aware of how sensitive the Navier-Stokes equations are to small errors (such as the inevitable computer round-off errors), to the point that computing deterministic solutions at high Reynolds numbers does not make sense. An example of this problem in the context of weather prediction is known as the "butterfly effect," the idea that a butterfly's passage can change the prediction.
This implies that only statistical averages are relevant in many situations.
For modeling fully developed turbulence, we need to distinguish three roughly defined scale regions. The intermediate scales (the inertial zone) lie between the smallest scales (where, through viscosity, the dynamic energy is dissipated as heat) and the largest scales (where exterior forces supply the energy). In this inertial zone, the theory of Kolmogorov stipulates that energy is neither produced nor dissipated but only transferred from one scale to another at a constant rate $\varepsilon$. The statistical modeling of turbulence applies only to the inertial zone.

Other assumptions are that turbulence is statistically homogeneous (invariant under translation), isotropic (invariant under rotation), and self-similar (invariant under dilations when considering scales in the inertial zone). The velocity components are treated as random variables, and the statistical description is derived from the corresponding correlation functions. In view of the space homogeneity, the Fourier transform is the mathematical tool adapted to this statistical approach. Kolmogorov and Obukhov used dimensional analysis to show that the average spectral distribution of energy must scale like $\varepsilon^{2/3}|k|^{-5/3}$, where $k$ is the vector variable of the Fourier transform of the three-dimensional velocity. This means that a log-log plot of the energy versus $|k|$ has a slope equal to $-5/3$. This scaling law is very well verified experimentally for a large range (roughly three decades) of $|k|$.

Perhaps the simplest statistical process that has the same power spectrum is fractional Brownian motion (fBm) with scaling exponent $H = 1/3$. This process is denoted by $B_H(t)$ and is defined by the following three properties: $B_H(t)$ is Gaussian, $B_H(t)$ has stationary increments, and $B_H(t)$ satisfies the scaling law $\lambda^{-H}B_H(\lambda t) \sim B_H(t)$ for $\lambda > 0$. The second requirement means that, for each increment $h$, $B_H(t+h) - B_H(t)$ is a stationary process, and the last one means that $\lambda^{-H}B_H(\lambda t)$ and $B_H(t)$ have the same statistics. The structure functions of fractional Brownian motion are $E[\,|B_H(t+\tau) - B_H(t)|^p\,]$, where $E$ is the expectation, and they satisfy the identity
$$E[\,|B_H(t+\tau) - B_H(t)|^p\,] = c_p|\tau|^{Hp}, \qquad (9.1)$$
where $0 < p < \infty$ and $c_p$ is a constant. Structure functions are used as a classification tool. However, in many cases, such as turbulence, we do not have access to the expectation or ensemble average. We are then forced to make an ergodic assumption and to replace this expectation with an integral with respect to the space variable.

Another problem makes life even more complicated. As will be explained later, the experimental data is the velocity of the flow measured at a given point $x_0$ as a function of time $t$. The $|k|^{-5/3}$ law concerns the Fourier transform of the velocity at a given time as a function of the space variable. An assumption, called the Taylor hypothesis, is needed to obtain $u(x_1, x_2, x_3, t)$ as a function of the longitudinal variable $x_1$. We will never have access to the full space-time information. This issue will be discussed further in the next section.

We end this section by noting that fractional Brownian motion had to be abandoned as a model of turbulence. Precise wind-tunnel measurements have shown that the exponent in the structure functions of fully developed turbulence (9.3) is not a linear function of $p$. We will be looking at this nonlinear behavior later in the chapter. It is often interpreted as the signature of intermittency, which is an informal term meaning that certain quantities, particularly energy dissipation, vary greatly in time and space.
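The linearity of the exponent in (9.1) is easy to illustrate numerically. The sketch below is our own: it uses ordinary Brownian motion ($H = 1/2$), because that process can be synthesized exactly by summing independent Gaussian increments, and it replaces the expectation $E$ by a spatial average, exactly the ergodic substitution described above; the sample size and the set of lags are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Brownian motion (fBm with H = 1/2) sampled at t = k/n on [0, 1].
n = 2 ** 18
B = np.cumsum(rng.standard_normal(n)) / np.sqrt(n)

def zeta(p, lags):
    """Empirical structure-function exponent: the slope of log S_p(tau) versus
    log tau, with the expectation replaced by a spatial average."""
    tau = np.array(lags) / n
    logS = [np.log(np.mean(np.abs(B[m:] - B[:-m]) ** p)) for m in lags]
    return np.polyfit(np.log(tau), logS, 1)[0]

lags = [2 ** k for k in range(4, 12)]
for p in (1, 2, 3, 4):
    print(f"p = {p}:  zeta(p) = {zeta(p, lags):.2f}   (H p = {0.5 * p:.1f})")
```

For this monofractal signal the fitted exponents line up with $\zeta(p) = Hp$; the nonlinearity of $\zeta(p)$ observed in wind-tunnel data is precisely what forces the abandonment of fBm as a model of turbulence.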
9.3 Multifractal probability measures and turbulent flows

We will describe in section 9.4 the multifractal signal processing that has been proposed by Frisch and Parisi, but first we wish to mention the pioneering work of Mandelbrot. Mandelbrot wished to model the rate of energy dissipation in a turbulent flow. This dissipation rate is defined as
$$\varepsilon(x,t) = \frac{\nu}{2}\sum_{i,j=1}^{3}\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right)^2,$$
where $u_1, u_2, u_3$ are the three components of the velocity field and $\nu$ is the viscosity.

Before describing Mandelbrot's ideas for modeling $\varepsilon(x,t)$, we indicate how $\varepsilon(x,t)$ is measured. Our knowledge of the small-scale structure of a turbulent flow is derived from wind-tunnel experiments. A small wire is placed in a tunnel where the flow is turbulent, and the wire is heated at some point. The thermal decay is related to the fluid velocity $u(x_0,t)$ at this point, as a function of time. For computing the energy dissipation rate, we need instead $u(x,t_0)$ as a function of the space variable $x$. The Taylor hypothesis, which says that the time variations are equivalent to the space variations, applies to wind-tunnel experiments, and thus $\varepsilon(x,t)$ is computed as $c\left(\frac{\partial u}{\partial t}(x_0,t)\right)^2$, where $c$ is a constant.

Various wind-tunnel experiments, going back to Batchelor and Townsend in the mid-1940s, have suggested that the energy dissipation at the smallest scales is not uniformly distributed [26]. More recently, Alain Arneodo and his group [15] have analyzed very accurate data that were obtained by Gagne and Hopfinger and colleagues in a large wind tunnel in Modane, France [5]. These wind-tunnel measurements confirm that the energy dissipation rate associated with the small scales of a turbulent flow is spatially intermittent. Observations like these led Mandelbrot to propose a random multiplicative model for the energy dissipation $\varepsilon(x,t)$ [190, 191]. Mandelbrot's model and the ones that have followed describe how energy cascades from the large scales to the smallest scales, where it is dissipated as heat.
Thus, speaking informally, there are two aspects to these models: the rule that governs how the energy is partitioned from scale to scale and the set on which the dissipation occurs. To gain some intuition about these concepts, we describe the construction of the multifractal Bernoulli measure $\mu_p$. This is perhaps the simplest mathematical construct that models these ideas. This probability measure depends on a parameter $p \in (0,1)$. Let $q = 1 - p$, and inductively define $\mu = \mu_p$ on the dyadic subintervals $I$ of $[0,1)$, in other words, on the intervals $I = [k2^{-j}, (k+1)2^{-j})$ with $0 \le k < 2^j$, $j \ge 0$. If $I' = [k2^{-j}, (k+\frac{1}{2})2^{-j})$ and $I'' = [(k+\frac{1}{2})2^{-j}, (k+1)2^{-j})$ are the "sons" of $I$, then the three measures $\mu(I)$, $\mu(I')$, and $\mu(I'')$ of these intervals are related by $\mu(I') = p\,\mu(I)$ and $\mu(I'') = q\,\mu(I)$. When $p = q = \frac{1}{2}$, $\mu$ is the ordinary Lebesgue measure on $[0,1]$, and when $0 < p < q < 1$, $\mu$ is a singular measure. The function $F(x) = \mu([0,x])$ is an example of a devil's staircase. Indeed, $F$ is a continuous, strictly increasing function whose derivative $F'$ vanishes almost everywhere. The velocity vanishes almost everywhere, but we are still moving!

The construction of these measures serves as the model for what we call a multiplicative cascade, and the analysis of these measures serves as a model for the multifractal formalism. If $\mu$ is a given probability measure on $[0,1]$, its local scaling exponents $\alpha(x_0)$ are defined by comparing $\mu(I)$ with $|I|^\alpha$, where $x_0 \in I$, as its length $|I|$ tends to zero. Mathematicians ask if $\mu(I) \le C|I|^\alpha$ when $x_0 \in I$, and if so define $\alpha(x_0)$ to
be the upper bound of these exponents $\alpha$. Physicists are more optimistic and find $\alpha(x_0)$ by making a log-log plot of the mass $\mu(I)$ carried by $I$ versus the length of $I$. If the log-log plot is close to a straight line with slope $\alpha$, they write $\mu(I) \sim |I|^\alpha$. (The notation $\mu(I) \sim |I|^\alpha$ means that $\frac{\log \mu(I)}{\alpha \log |I|} \to 1$ as $|I| \to 0$. Elsewhere, we write expressions like $A(x) \sim B(x)$ to mean that $\frac{A(x)}{B(x)} \to 1$ as $x \to 0$.)

Returning to the special case of Bernoulli measures, $\alpha(x_0)$ can be computed explicitly, and it depends only on the average number of 0's in the dyadic expansion of $x_0$. This is the reason that $\alpha(x_0)$ is highly unstable as a function of $x_0$: It is discontinuous at every $x_0$. In general, it is impossible to compute $\alpha(x_0)$, and if one wants information about $\mu$, it is necessary to look elsewhere. As is often the case, another function, such as an average, is more useful than the function itself. Here is what is done for measures: For a given exponent $h$ in $[0,1]$, denote by $E(h)$ the set of $x_0$ such that $\alpha(x_0) = h$ and denote by $D(h)$ its Hausdorff dimension. Then $D(h)$ measures how likely it is that the scaling exponent is $h$. This function $D(h)$ is called the spectrum of singularities of $\mu$; it has a characteristic concave shape, and it plays an important role in what follows. (The notion of Hausdorff dimension has become an indispensable tool for characterizing fractal sets, and it is used throughout this chapter and the next. For completeness, we include the definition and a brief discussion in section 9.10.)

Mandelbrot generalized the construction of the Bernoulli measures to provide a better model for the rate of energy dissipation in fully developed turbulence. In this new construction, $p$ is replaced by a random variable $p(I,\omega)$ that depends on the dyadic interval $I$, which will be subdivided into two subintervals $I'$ and $I''$, as in the Bernoulli construction.
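The Bernoulli cascade described above is easy to simulate. The sketch below is our own; the parameter $p = 0.3$ and the depth are arbitrary choices. It generates the masses $\mu_p(I)$ of all dyadic intervals at the finest level and exhibits the spread of the local exponents between $-\log_2 q$ and $-\log_2 p$, the instability of $\alpha(x_0)$ made visible.

```python
import numpy as np

def bernoulli_measure(p, depth):
    """Masses mu_p(I) of the dyadic intervals I = [k 2^-depth, (k+1) 2^-depth):
    each interval passes the fraction p of its mass to its left son and
    q = 1 - p to its right son."""
    masses = np.array([1.0])
    for _ in range(depth):
        masses = np.repeat(masses, 2) * np.tile([p, 1.0 - p], masses.size)
    return masses

mu = bernoulli_measure(p=0.3, depth=12)
print(mu.sum())            # the total mass is preserved at every level

# Finite-scale exponents alpha = log mu(I) / log |I|: they depend on the
# proportion of 0's among the dyadic digits of the interval's position.
alpha = np.log2(mu) / (-12)
print(alpha.min(), alpha.max())
```

The extreme exponents are attained on the leftmost and rightmost intervals, whose digits are all 0's or all 1's; the full histogram of the `alpha` values is a finite-scale shadow of the concave spectrum $D(h)$.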
These random variables all belong to [0,1] and are independent and identically distributed as I runs over dyadic intervals. The existence and properties of these random measures μ are discussed by J.-P. Kahane and J. Peyrière in [165]. This model has been extensively studied by mathematicians, and its generalizations form an active area of research ([22] and [212] contain some of the latest results). An obvious drawback of this model is the dyadic partitioning, which has no reason to appear in fluid mechanics. Continuous-scale cascade models introduced by B. Castaing and his coworkers avoid this problem, since these models favor no particular scale [47].

9.4 Multifractal modeling of the velocity field

The goal of this research is to study the three-dimensional velocity field of fully developed turbulence, but as we have emphasized above, our knowledge about fully developed turbulence comes to us from a one-dimensional velocity signal. This signal is not a probability measure, nor is there any reason to believe that it should be modeled as the primitive of a probability measure. The analysis outlined for measures does not apply, so if we want a similar analysis that applies to functions, it is necessary to extend the ideas. This step was taken by Parisi and Frisch [223]. The starting point for this analysis is the following observation: μ(I) ~ |I|^α as I contains x0 and as its length |I| tends to zero can be rewritten as |F(x) − F(x0)| ~ |x − x0|^α, where F is a primitive (indefinite integral) of μ. This suggests computing the pointwise Hölder exponents. Recall the definition given in Chapter 2: If f is continuous at x0 and if 0 < α < 1, then one writes f ∈ C^α(x0) when |f(x) − f(x0)| ≤ C|x − x0|^α for some constant C. If α > 1, then f(x) − f(x0) is replaced by the error term in the Taylor expansion of f at x0. The scaling exponent α(x0) is defined as the supremum of those α for
which f belongs to C^α(x0). Mathematicians then write

    α(x0) = liminf_{x→x0} log|f(x) − f(x0)| / log|x − x0|.    (9.2)

Physicists might expect this liminf to be a limit, in which case α(x0) can be measured by a log-log plot. Once α(x0) is defined, E(h) is defined to be the set of all x0 for which α(x0) = h, h ∈ [0,1]. Finally, D(h) is the Hausdorff dimension of E(h), and the multifractal spectrum of singularities of f is precisely this function D(h), which is defined on [0,1]. Note that this procedure can be applied to any function f of the real variable x. (In fact, it can be generalized to functions of several variables.) Note also that D(h) can be defined on the whole real line if Hölder exponents h > 1 and h < 0 are used. In the case h < 0, one uses the weak scaling exponent defined in [208]. We will discuss the computational problems of determining D(h) later in this section. Multifractal signal processing consists in computing the spectrum of singularities D(h) and using it as a classification tool. We will use two examples from mathematics to illustrate the power of this tool. It allows us to distinguish between regularly irregular functions and irregularly irregular functions. In the former case, irregularity can be anticipated, while in the latter case, this wild behavior cannot be anticipated at all. Strong, unexpected transients are responsible for this erratic behavior. The first situation is illustrated by the Weierstrass function W(t) = Σ B^n cos(A^n t), where 0 < B < 1 and AB > 1. In this case α(x0) = log(1/B)/log A = h0 everywhere, and the singularity spectrum D(h) is trivial: D(h) = 0 if h ≠ h0, while D(h0) = 1. This situation is similar to that of fractional Brownian motion (B_H(t) in (9.1)), where α(x0) = H for all x0. The second case is illustrated by the function R(x) = Σ n^{−2} sin(πn²x), which is attributed to Bernhard Riemann, who is said to have suggested it as an example of a continuous, nowhere-differentiable function.
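The uniform exponent of the Weierstrass function can be recovered numerically from the log-log slope suggested by (9.2), averaged over t rather than taken pointwise. The sketch below regresses the mean first-order increment against the lag; the truncation level, grid, and lag range are our choices, not prescriptions from the text:

```python
import numpy as np

def weierstrass(t, A=2.0, B=2 ** -0.5, terms=30):
    """Truncated Weierstrass sum W(t) = sum_n B^n cos(A^n t)."""
    n = np.arange(terms)
    return np.sum(B ** n * np.cos(np.outer(t, A ** n)), axis=1)

def holder_slope(f, t, lags):
    """Slope of log E|f(t+y) - f(t)| versus log y over the given lags."""
    logs = []
    for k in lags:
        incr = np.abs(f[k:] - f[:-k])
        logs.append(np.log(incr.mean()))
    y = (t[1] - t[0]) * np.asarray(lags)
    return np.polyfit(np.log(y), logs, 1)[0]

t = np.linspace(0, 2 * np.pi, 2 ** 15)
W = weierstrass(t)
# theory: h0 = log(1/B)/log(A), which is 0.5 for A = 2, B = 2^(-1/2)
h_est = holder_slope(W, t, lags=[2 ** j for j in range(1, 8)])
```

The regression slope comes out close to the theoretical h0 = 1/2 for these parameters, up to the log-periodic oscillations that self-similar functions superimpose on the log-log plot.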
Its complexity eluded mathematicians for more than a century, for R is truly an erratic function. Its pointwise Hölder exponent α(x0) is everywhere discontinuous. However, the spectrum of singularities D(h) of the Riemann function is amazingly simple: D(h) = 0 if h < 1/2 or h > 3/4, and D(h) = 4h − 2 if 1/2 ≤ h ≤ 3/4 [153]. (All of this is explained in more detail in Chapter 10.) The spectrum of singularities of other "special" functions can be computed explicitly, and it is amazing to see how many of these functions have nontrivial spectra (see [155] and [159]). We are now going to look at what the physicists have been doing. They make measurements; they measure the velocity of a turbulent flow or of other complicated functions. They then wish to know if the complexity of an experimental function can be explained by a multiplicative cascade or by some other hidden dynamical system. For example, in the case of the Bernoulli measure μ_p, or of the corresponding devil's staircase F, one would like to recover p and rules for constructing μ_p from the knowledge of F. This is an inverse problem. Looking for a hidden multiplicative cascade that would generate a given signal is a great scientific challenge. This challenge is called multiscale system theory by Albert Benveniste and Alan S. Willsky. The goal is to recognize and analyze phenomena occurring at different scales. One wants to build "multiscale autoregressive processes" that will play the same role when one zooms across scales as ARMA processes do when one moves across time. The desire is to have an algorithm for detecting transients
across scales. These would be the unexpected events that appear as one zooms across scales. Neither algorithms nor software are yet available, and multifractal signal processing can be viewed as a limited attempt to achieve a multiscale system theory. However, some promising results on multiscale autoregressive models can be found, for example, in [23], [24], [25], and [70]. An initial piece of information that would help the search for the multiscale autoregressive system or some other multiscale dynamic that explains the data set is the spectrum of singularities. Conversely, finding the spectrum of singularities becomes much easier if we know a priori that the signal has some simple multiscale structure. This is indeed fortunate, since determining D(h) directly from the definition clearly requires an infinite amount of computing: The pointwise scaling exponent α(x0) must be computed at every point, which is obviously impossible. Furthermore, computing Hausdorff dimensions is also not feasible in practice, since it involves all possible coverings of the set being analyzed. The way around this impasse that Frisch and Parisi proposed involves the use of the structure functions I(y,p) = ∫ |f(x + y) − f(x)|^p dx, and it is based on the following heuristic reasoning. To speak of a multifractal structure means that for every positive exponent h, there is a set of singular points with Hausdorff dimension D(h) on which the increment |f(x + y) − f(x)| acts like |y|^h. The contribution of these "singularities of exponent h" to I(y,p) = ∫ |f(x + y) − f(x)|^p dx is the order of magnitude of the product |y|^{ph} |y|^{1−D(h)}, where the second factor is the probability that an interval of length |y| intersects a fractal set of dimension D(h). As y tends to zero, the dominant term in I(y,p) is the one with the smallest possible exponent.
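For a monofractal signal such as a Brownian-like random walk (H = 1/2), this heuristic predicts a single scaling exponent pH for I(y, p), which makes the structure functions easy to sanity-check numerically. A discrete sketch (lag range, sample size, and regression are our choices):

```python
import numpy as np

def structure_function(f, lag, p):
    """Discrete analogue of I(y, p) = integral |f(x+y) - f(x)|^p dx."""
    return np.mean(np.abs(f[lag:] - f[:-lag]) ** p)

def zeta(f, p, lags):
    """Scaling exponent of I(y, p) ~ y^zeta(p), by log-log regression."""
    logI = [np.log(structure_function(f, k, p)) for k in lags]
    return np.polyfit(np.log(lags), logI, 1)[0]

rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(2 ** 16))   # Brownian-like signal, H = 1/2
lags = [2 ** j for j in range(1, 9)]
# monofractal prediction: zeta(p) = p H = p / 2
z1, z2 = zeta(walk, 1, lags), zeta(walk, 2, lags)
```

For a genuinely multifractal signal the estimated ζ(p) would bend away from a straight line in p, which is exactly the signature the heuristic above is designed to capture.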
This leads to the relation

    I(y, p) ~ |y|^{ζ(p)},    (9.3)

where

    ζ(p) = inf_{h>0} {ph + 1 − D(h)}.    (9.4)

The exponent ζ(p) is thus given by the Legendre transform of the codimension 1 − D(h), where D(h) is the Hausdorff dimension of the exceptional points x where |f(x + y) − f(x)| behaves like |y|^h. If (9.4) is valid and if D is a concave function, then the spectrum of singularities can be recovered by the Legendre inversion formula

    D(h) = inf_p {ph + 1 − ζ(p)}.    (9.5)

We say that the multifractal formalism applies to a given function f if (9.5) holds. This is not true in general. In fact, it was proved in [154] that any continuous function D that is defined on [0,1] with values in [0,1] is the spectrum of singularities of some function f. The function D is not necessarily concave, so it cannot always be computed by (9.5). One might expect that (9.5) yields the concave hull of the spectrum of singularities. Unfortunately, this more conservative result is again too optimistic, since there are other reasons that cause the multifractal formalism to fail. One of these is the presence of chirps in the signal. This will be discussed in the next section.
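Relations (9.4) and (9.5) are straightforward to evaluate on grids. The sketch below runs the Legendre transform and its inversion on a toy concave spectrum (the quadratic D is our test case, not one from the text); for a concave D the round trip reproduces D up to grid error, while a non-concave D would come back as its concave hull:

```python
import numpy as np

def legendre_zeta(h, D, p_grid):
    """zeta(p) = inf_h { p h + 1 - D(h) }, computed on a grid (9.4)."""
    return np.array([np.min(p * h + 1.0 - D) for p in p_grid])

def legendre_spectrum(p_grid, zeta, h_grid):
    """Inversion (9.5): D(h) = inf_p { p h + 1 - zeta(p) }."""
    return np.array([np.min(p_grid * h + 1.0 - zeta) for h in h_grid])

# a concave, bell-shaped toy spectrum with maximum D = 1 at h = 0.5
h = np.linspace(0.1, 0.9, 401)
D = 1.0 - 8.0 * (h - 0.5) ** 2
p = np.linspace(-30, 30, 1201)   # negative p's recover the right-hand branch
zeta_p = legendre_zeta(h, D, p)
D_back = legendre_spectrum(p, zeta_p, h)
```

Note that restricting p to positive values would truncate the decreasing branch of D_back, which foreshadows the role of negative p's discussed below.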
Alain Arneodo and his team modified the definition of the structure functions [16]. They replaced the crude increments f(x + y) − f(x) with a smooth average of these increments, namely, with the wavelet coefficients

    W(f; y, x) = ∫ f(x + yt) ψ(t) dt = (1/y) ∫ f(u) ψ((u − x)/y) du.    (9.6)

As observed in [154], writing

    ∫ |f(x + y) − f(x)|^p dx ≤ C|y|^{αp},    (9.7)

where 0 < α < 1 and 1 ≤ p < ∞, is equivalent to

    ∫ |W(f; y, x)|^p dx ≤ C′|y|^{αp},    (9.8)

or to f being in the Besov space B_p^{α,∞}. (The relation (9.8) is a possible definition of this Besov space. Besov spaces will appear again in Chapter 11. See Appendix D for a definition and discussion of Besov spaces.) It is actually possible to derive (9.3) and (9.4) from (9.8). Indeed, if f is in C^α(x0), then we will see in Chapter 10 that the wavelet transform of f near x0 satisfies the relation

    |W(f; a, b)| ≤ C a^α (1 + |b − x0|/a)^α.    (9.9)

Thus, in the cones |b − x0| ≤ C′a we expect that

    |W(f; a, b)| ~ a^h,    (9.10)

where h is the Hölder exponent of f at x0. The derivation of (9.3) and (9.4) then follows the same argument that was used to derive these relations from the definition of I(y, p). The relation (9.8) makes perfectly good sense when α is either negative or greater than one, whereas the structure functions ∫ |f(x + y) − f(x)|^p dx do not offer the same flexibility. When the given function f is corrupted by noise, (9.8) can still be used on a range of scales, while the ordinary structure functions do not have a meaning. At this point, the program proposed by Frisch and Parisi needs to be reformulated. One starts with an arbitrary function f of a real variable and defines

    τ(p) = sup{s | f ∈ B_p^{s/p,∞}}.    (9.11)

One then asks if

    D(h) = inf_p {hp + 1 − τ(p)}.    (9.12)

But (9.12) is not always true. Even if D is a concave function, this concave function is not generally given by an infimum of affine functions with positive slopes. If D(h) is bell shaped, negative values of p also are needed to obtain the decreasing part (to the right of the maximum) of D(h) in (9.12).
The best result, which is given in [154], is this: If f ∈ C^ε(ℝ) for some ε > 0, then

    D(h) ≤ inf_{0<p≤pc} {hp + 1 − τ(p)},    (9.13)
where pc is the only value of p for which τ(p) = 1. Since p is the slope of the D(h) curve, we cannot expect to reconstruct the decreasing part of the curve without using negative values of p. Negative p's are needed, but Besov spaces with negative p's do not make sense. Indeed, the defining relation (9.8) does not make sense for negative p's: Estimating the integral in (9.8) for negative p's is clearly a totally unstable calculation because the integral will diverge whenever the wavelet transform W(f; a, b) vanishes. To proceed with the program, it is necessary to "renormalize" this divergence by eliminating most locations where W(f; a, b) is very small. To do this cleverly, one must keep in mind the point of the computations: If the spectrum of singularities has a bell shape, we want to obtain the decreasing part of D(h). This part corresponds to the (relatively) small sets of points where the function has a large Hölder exponent. Equation (9.9) shows that |W(f; a, b)| must be uniformly small on the cones above x0 if α(x0) is large, and, conversely, (9.9) shows that if |W(f; a, b)| is small at some places and large at others in these cones, then f will not be smooth at x0. This observation provides the clue to the needed renormalization. The idea is to eliminate from the computation of the integral in (9.8) small values of |W(f; a, b)| that are surrounded by large ones and to keep only those values for which the wavelet transform is uniformly smaller around x0. In other words, one takes into account only the local maxima of the wavelet transform. This is what Arneodo and his group do. They implement Mallat's idea of using only the local maxima of the wavelet transform.6 The technique is known as the wavelet transform modulus maxima (WTMM) algorithm, and this is how they apply it: Let f be the function to be analyzed and let W(f; a, b) = ∫ f(b + at) ψ(t) dt be its wavelet transform, where a > 0 and b ∈ ℝ.
The first step is to find, for a fixed scale a, the values of b where |W(f; a, b)| attains a local maximum. These values are denoted by b_j(a), j = 0, 1, 2, ..., but one retains only those arguments b for which the points (b_j(a), a) belong to a continuous curve that eventually reaches the horizontal axis, a = 0. These curves often bifurcate as a tends to zero. It is believed that the pattern of these branchings is a symbolic representation of a multiplicative cascade. This means that we are looking for a multiscale autoregressive process that would yield the data set. The next step consists of computing the partition functions

    Z(p, a) = Σ_j |W(f; a, b_j(a))|^p,    (9.14)

where the sum runs over the skeleton of the function f, that is, over those connected lines of local maxima. Finally, one hopes to find a power-law behavior that reads

    Z(p, a) ~ a^{τ(p)}.    (9.15)

This optimistic search is made by looking at a plot of log Z(p, a) versus log a. The anticipated result is a linear plot whose slope τ(p) does not depend on the choice of analyzing wavelet that is used to compute W(f; a, b).

6See [7] and [16] for a more complete account of Arneodo's work. More about the WTMM and Mallat's work can be found in [181], [182], and [184].
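A heavily simplified sketch of the first steps follows: we use a Mexican-hat wavelet, detect the local maxima of |W(f; a, ·)| at each scale, and form Z(p, a) by summing over all of them. The chaining of maxima into connected lines reaching a = 0, which the full WTMM algorithm requires, is omitted, and all names are ours:

```python
import numpy as np

def mexican_hat(t):
    """Second derivative of a Gaussian, up to sign and normalization."""
    return (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def cwt_row(f, a, dt=1.0):
    """W(f; a, b) on the sample grid, with the 1/a normalization of (9.6)."""
    t = np.arange(-5 * a, 5 * a + dt, dt)
    psi = mexican_hat(t / a) / a
    return np.convolve(f, psi[::-1], mode="same") * dt

def wtmm_partition(f, scales, p):
    """Z(p, a): sum of |W|^p over local maxima of |W(f; a, .)| as in (9.14),
    but without the chaining of maxima lines used in the full algorithm."""
    Z = []
    for a in scales:
        w = np.abs(cwt_row(f, a))
        interior = w[1:-1]
        is_max = (interior > w[:-2]) & (interior >= w[2:]) & (interior > 1e-12)
        Z.append(np.sum(interior[is_max] ** p))
    return np.array(Z)

# demo: a cusp |x|^(1/2), whose transform at the cusp scales like a^(1/2)
f = np.abs(np.arange(-2048, 2048, dtype=float)) ** 0.5
Z = wtmm_partition(f, scales=[4.0, 8.0, 16.0], p=2.0)
```

On the cusp signal, doubling the scale roughly doubles |W| at the singularity, i.e. |W(f; a, x0)| grows like a^{1/2}, which is the behavior (9.10) that the maxima lines are meant to track.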
The use of partition functions such as (9.14) leads to intractable mathematical difficulties (see [154] for a discussion), and although the technique yields sharp numerical results, the numerical algorithms must be handled very carefully. There is, however, a variant of (9.14) that is mathematically robust. Again, the idea is to eliminate small values of the wavelet transform that carry no information. This idea was discussed above, but here the recipe is slightly different. We replace |W(f; y, x)| in (9.8) by

    d(a, b) = sup_{(a′, b′)} |W(f; a′, b′)|,

where the supremum is taken over the box [0, a] × [b − a, b + a]. It can be shown that even for p < 0, the new exponent η(p) that we obtain from the relation

    ∫ d^p(a, b) db ~ a^{η(p)}    (9.16)

is independent of the wavelet ψ. The exponent η(p) is not altered by the addition of a smooth perturbation, and using η(p) in (9.5) instead of ζ(p) actually yields the correct part of the decreasing spectrum for many mathematical functions on which we can test the validity of these formulas (see [157]). We can reformulate these results by saying that the condition ∫ d^p(a, b) db ≤ C a^{sp} defines a natural extension of the Besov spaces B_p^{s,∞} for negative values of p. We now return to fully developed turbulence. The very accurate data obtained from experiments made at the Modane wind tunnel led Frisch and Parisi to make the conjecture that D(h) is a "universal" function, which means that it is independent of the specific medium, the boundaries, and other details of a fully developed turbulent flow. If this were true, the determination of D(h) would yield important information about the nature of turbulence. It became clear to Arneodo and his colleagues that ζ(p) cannot be defined precisely using (9.15) [11]. The log-log plot of Z(p, a) versus a was not a straight line over the full range of scales that represent the inertial range.
Something even worse happened: When the scales are restricted to a range where the behavior is approximately linear, the slope depends on the analyzing wavelet. These considerations have led some investigators to question the definition of the inertial range. This consists of the scales that lie between two extremes: the largest scale where the flow is created and the smallest scale where energy is dissipated as heat due to the viscosity. Fitting data to a modified form of Castaing's continuous-scale models led Arneodo to suggest that the scale at which dissipation occurs might fluctuate throughout the signal. There is both good news and bad news. The bad news is that it is not possible to apply the multifractal formalism in the strict sense given by (9.15) to turbulent flows. The good news is that this failure has opened a new line of research whose objective is to gain a better understanding of the inertial range in fully developed turbulence. The dissipation scale is not constant but instead depends on the dynamical properties of the flow (see [233]). We mentioned in Chapter 1 and again in Chapter 8 that wavelet techniques have been used to analyze DNA sequences, and this is another application of the
WTMM algorithm. One first associates a sequence of real numbers x_n with a DNA sequence consisting of the four nucleotides A, C, G, and T as follows: Select four real numbers v_A, v_C, v_G, and v_T to represent A, C, G, and T, respectively; if the ith nucleotide is j(i), where j(i) ∈ {A, C, G, T}, define

    x_n = Σ_{i=1}^{n} v_{j(i)}.    (9.17)

One expects that the statistical properties of this sequence will yield pertinent information about the corresponding genome, and indeed this is the case. Arneodo and his coworkers have successfully applied the WTMM algorithm to compare this sequence with an fBm [13]. This technique allowed them to distinguish coding sequences (where (9.17) is statistically similar to regular Brownian motion) from noncoding sequences (where (9.17) is statistically similar to an fBm with index different from 1/2). A more involved analysis of the long-range correlations in (9.17) and of its multifractal properties can be found in [12] and [8]. Finally, we note that the maxima lines of the wavelet transform are being used by Nicolleau and Vassilicos to characterize the intermittency in turbulence [219].

9.5 Coherent structures

Anyone who has seen experiments done in a water tunnel or seen one of the educational films about turbulence has surely noticed that the flow is not "completely chaotic" and that it exhibits a sort of organization at large scales, at least at relatively low Reynolds numbers. The objects we see are called coherent structures, and technically they represent local condensations of the vorticity field that last longer than other characteristic times associated with the flow [105]. Unfortunately, this is about all we can say, and an initial problem is that there is not a precise mathematical definition for the objects called coherent structures. Coherent structures cannot be observed directly in wind-tunnel experiments at high Reynolds numbers.
However, if one accepts the calculation of D(h) performed by Arneodo and his team on the Modane signal, then one has a first hint of the existence of coherent structures in fully developed turbulence. An examination of the data shows the existence of negative Hölder exponents, and this implies the existence of extremely strong velocity gradients. If one does not dismiss these negative exponents α(x0) as artifacts, then one interpretation of them is the rare passage of a strong vortex filament past the probe. These vortex filaments are coherent structures; they are rare and elusive, but they are thought by many experts to be one of the keys to understanding turbulence. While coherent structures are not visible in high Reynolds number wind-tunnel experiments, they are accessible in two-dimensional simulations, and this will bring us to another application of wavelets in the field of turbulence. We have already seen how wavelets are used to analyze experimental turbulence data. In section 9.7, we will describe how they are used to analyze simulated turbulence. The detection of coherent structures and the multifractal analysis of fully developed turbulence are two completely different wavelet-based investigations. First, there is a great difference in scales: Multifractal analysis is concerned with the smallest scales, while the coherent structures that are studied are, for the most part, large-scale objects. Second, the analyses are performed on different signals: Multifractal analysis is done on the very precise one-dimensional signals obtained in
wind-tunnel experiments, whereas wavelet analysis "looks for" coherent structures that are generated by two- and three-dimensional numerical simulation. In fact, numerical simulations are not capable of producing useful small-scale data, and this will probably be the case for some time to come. On the other hand, trying to understand coherent structures by analyzing one-dimensional wind-tunnel data may be like trying to understand a symphony by hearing only one instrument. The use of digital computers to simulate turbulent flows was anticipated at the time the first stored-memory computers were being built. This is what Herman H. Goldstine and John von Neumann wrote in 1946 [131, p. 4]:

The phenomenon of turbulence was discovered physically and is still largely unexplored by mathematical techniques. At the same time, it is noteworthy that the physical experimentation which leads to these and similar discoveries is a quite peculiar form of experimentation; it is very different from what is characteristic in other parts of physics. Indeed, to a great extent, experimentation in fluid dynamics is carried out under conditions where the underlying physical principles are not in doubt, where the quantities to be observed are completely determined by known equations. The purpose of the experiment is not to verify a proposed theory but to replace a computation from an unquestioned theory by direct measurements. Thus wind tunnels are, for example, used at present, at least in large part, as computing devices of the so-called analogy type (or, to use a less widely used, but more suggestive, expression proposed by Wiener and Caldwell: of the measurement type) to integrate the nonlinear partial differential equations of fluid dynamics. Thus it was to a considerable extent a somewhat recondite form of computation which provided, and is still providing, the decisive mathematical ideas in the field of fluid dynamics.
It is an analogy (i.e., measurement) method, to be sure. It seems clear, however, that digital (in the Wiener-Caldwell terminology: counting) devices have more flexibility and more accuracy, and could be made much faster under present conditions. We believe, therefore, it is now time to concentrate on effecting the transition to such devices, and that this will increase the power of the approach in question to an unprecedented extent.

In spite of the enormous progress made since these comments, the best supercomputers cannot solve the three-dimensional Navier-Stokes equations with enough resolution to capture the small scales of the velocity field accurately enough to verify the multifractal hypothesis. What supercomputers can do is give a good sketch of the solution at (relatively) low Reynolds numbers. One of the chief advocates for studying turbulence in physical space and, in particular, for studying coherent structures has been Norman Zabusky, who is perhaps best known for his discovery (in collaboration with Kruskal) of solitons. In 1977, he wrote (quoted from [264, p. 41]):

In the last decade we have experienced a conceptual shift in our view of turbulence. For flows with strong velocity shear... or other organizing characteristics, many now feel that the spectral or wavenumber-space description has inhibited fundamental progress. The next "El Dorado" lies in the mathematical understanding of coherent structures in weakly
dissipative fluids: the formation, evolution and interaction of metastable vortex-like solutions of nonlinear partial differential equations.

The study of coherent structures is pursued both experimentally and computationally, but before we describe some of the latter work we have a few more comments on coherent structures themselves. We mentioned that what we call coherent structures are condensations (or concentrations) of vorticity. These structures include vorticity tubes, those thin, swirling miniature tornadoes that are seen in water-tunnel experiments. (For a precise description of vorticity tubes, the reader is referred to [52].) Recall that vorticity is defined as the curl of the velocity, and hence vorticity concentrations corresponding to coherent structures are sources of low pressure. This fact is crucial in the experiments of Yves Couder, which are described in the next section. But before going on, we return for a moment to the negative Hölder exponents found by Arneodo and his team. They could be dismissed as artifacts, but to do so in science can lead to "missing the gold ring." We prefer to regard them as rare but important events, as suggested in [15]: "One tentative interpretation could be the occasional passage near the probe of slender vortex filaments of the sort observed in numerical simulations .... At first sight it seems that this interpretation should be rejected on the ground that the probe is measuring along a line and such a line has almost surely an empty intersection with the vortex filaments which, on inertial-range scales, appear as one dimensional objects."

9.6 Couder's experiments

If seeing is believing, then the very clever experiments done by Couder and his collaborators provide convincing evidence for the existence of vortex filaments.
We will describe these experiments, but first we need to return to the Navier-Stokes equations and relate the pressure to the vorticity and to the quantity

    σ² = (1/2) Σ_{i,j} (∂u_i/∂x_j + ∂u_j/∂x_i)².

Note that the energy dissipation rate was νσ² = ε. Now we have

    Δp = μ(|ω|² − σ²),    (9.18)

which explains why large values of the length |ω| of the vorticity ω correspond to minima of the pressure. Here, μ = ρ/2, where ρ is the fluid density. The identity tells us that p will be recovered as a Coulomb potential of |ω|² − σ². This is an averaged quantity, and this implies more regularity and stability. It means that the regions where the pressure is low are well defined and easier to detect than the places where |ω| is large. In the experiments, Couder measures the pressure p(x, t) at a given space location x = x0, as a function of the time variable, and he also images the regions of low pressure, which according to (9.18) correspond to vortex tubes. The pressure is measured by a piezoelectric probe. The low-pressure regions are imaged by injecting microbubbles into the flow. These microbubbles migrate toward the regions of low pressure and accumulate in regions of strong vorticity. The low-pressure vortex filaments can thus be visualized. The pressure is recorded as a function of time, as is the image, and the visualization of the depressions can be correlated with the low peaks in the recorded pressure signal.
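The identity (9.18) can be checked numerically on a flow whose pressure is known in closed form. The sketch below uses the two-dimensional Taylor-Green field u = sin x cos y, v = −cos x sin y with steady pressure p = (ρ/4)(cos 2x + cos 2y); this test case is our choice, and in two dimensions the vorticity is the scalar ω = ∂v/∂x − ∂u/∂y while the identity takes the same form:

```python
import numpy as np

N = 128                       # periodic grid
h = 2 * np.pi / N
x = np.arange(N) * h
X, Y = np.meshgrid(x, x, indexing="ij")
rho = 1.0

def ddx(f):  # centered differences on the periodic grid
    return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2 * h)

def ddy(f):
    return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2 * h)

# Taylor-Green velocity field and its steady pressure
u = np.sin(X) * np.cos(Y)
v = -np.cos(X) * np.sin(Y)
p = (rho / 4.0) * (np.cos(2 * X) + np.cos(2 * Y))

omega = ddx(v) - ddy(u)                         # scalar vorticity
sigma2 = 0.5 * ((2 * ddx(u)) ** 2 + (2 * ddy(v)) ** 2
                + 2 * (ddy(u) + ddx(v)) ** 2)   # strain term of the text
lhs = ddx(ddx(p)) + ddy(ddy(p))                 # Laplacian of the pressure
rhs = (rho / 2.0) * (omega ** 2 - sigma2)       # right side of (9.18)
err = np.max(np.abs(lhs - rhs))
```

The two sides agree to finite-difference accuracy, and the maxima of ω² indeed coincide with the minima of p for this flow.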
Similar experiments have been done by S. Fauve and C. Laroche [106], and the data from these experiments have been analyzed by Patrice Abry [1] using wavelet techniques. Wavelet analysis is shown to be particularly useful in detecting and analyzing the low-pressure peaks; wavelets are used to split the signal into two components, where the first consists of these well-defined low-pressure peaks and the remainder is treated as noise. Abry and his collaborators use a statistical modeling of the "background noise." An abrupt change from this statistical model shows that a vorticity filament has been detected. They then make a statistical decision on the wavelet coefficients to determine the coefficients that are due to the vorticity filament and those that are background. They are then able to do a cascade-type analysis on the "cleaned background." Once the coefficients have been separated, Abry and his collaborators do an analysis similar to Arneodo's (see, for example, [16] and [10]). This original approach borrows ideas from two quite different points of view: cascade models and coherent structures. This kind of processing will be met again in the next section and in a more systematic way when we discuss Donoho's work in Chapter 11.

9.7 Marie Farge's numerical experiments

Marie Farge wished to detect and extract coherent structures in two-dimensional simulated turbulence. Farge explained how and why she was led to use wavelet analysis in her study of numerical simulations of two-dimensional fully developed turbulence [103, p. 289]:

It is important to realize that the wavelet transform is not being used to study turbulence simply because it is currently fashionable; but rather because we have been searching for a long time for a technique capable of decomposing turbulent flows in both space and scale simultaneously.
If, under the influence of the statistical theory of turbulence, we had lost in the past the habit of considering the flow evolution in physical space, we have now recovered it thanks to the advent of supercomputers and their associated means of visualization. They have revealed to us a menagerie of turbulent flow patterns, namely, the existence of coherent structures and their elementary interactions ... for which the present statistical theory is not adequate.

What Farge asks of wavelet analysis (or of any other form of time-frequency analysis) is to decouple the dynamics of the coherent structures from the residual flow. The residual flow would play only a passive role in an action whose protagonists would be the coherent structures; these "protagonists" clash or join forces according to their "sign." One should keep in mind, however, that the Navier-Stokes equations are nonlinear, and the interaction between the coherent structures and the residual flow is one of the main difficulties in Farge's program. The decoupling is a first approximation, which is believed to be valid for only a short time. In particular, it should be noted that, unlike solitons, when two coherent structures meet, there can be a strong interaction that leads to their fusion into a new coherent structure. Farge, after having tried several methods to extract the coherent structures from the residual flow, decided to use Victor Wickerhauser's algorithm (discussed in Chapter 7), which provides a decomposition in a basis adapted to the signal. The
results are surprisingly good and are discussed in [104]. However, this methodology relies heavily on the algorithm being used. The problem here is similar to that in image processing. We are claiming, if we accept the use of the best-basis search, that a compression-oriented method can detect patterns. Coherent structures in turbulence are intricate features, and it seems remarkable that they could be extracted by using such a general tool as a best-basis search. The appropriate scientific explanation of this phenomenon remains to be found.

9.8 Modeling and detecting chirps in turbulent flows

Chirps have been mentioned in several places and contexts. In Chapter 5, we defined a chirp to be a function f of the form f(t) = A(t) cos φ(t), where A and φ satisfy the conditions

    |A′(t)| / (A(t) φ′(t)) ≪ 1  and  |φ″(t)| / (φ′(t))² ≪ 1.    (9.19)

Linear chirps and hyperbolic chirps were also defined in Chapter 5 in the discussion of the Wigner-Ville transform. Chirps appeared again in Chapter 6 in the construction of chirplets and chirplet bases, and we mentioned the specific chirp, f(t) = (t0 − t)^{−1/4} cos[ω(t0 − t)^{5/8} + φ0], in a brief discussion of gravitational waves. We are now going to discuss chirps in turbulent flows. Determining if there are chirps in fully developed turbulence is a current research issue. Superficially, it seems to be related to the problem of understanding coherent structures. In fact, it is possible to imagine coherent structures as being two- and three-dimensional chirps. Thus, being able to say something about the existence and distribution of chirps might have implications about coherent structures and their distribution. Again we emphasize that our "window on turbulence" is a one-dimensional velocity signal, so the immediate question is whether or not this signal contains chirps. We mentioned in section 9.4 that chirps in the signal can cause the multifractal formalism to fail.
This problem is another motivation for studying chirps in the context of wavelets and the multifractal formalism. Indeed, to establish the multifractal formalism, we made the assumption that the wavelet transform satisfies |W(a, b)| ∼ a^h in the cones |b − t₀| ≤ Ca above a point where the Hölder exponent is h. This contains implicitly the very specific assumption that there exist only "vertical" ridges in the signal, and such ridges correspond to a cusp singularity like |t − t₀|^h. This means that if we wish to detect chirps with a multifractal formalism, then the standard multifractal formalism must be modified. This has been done, and we will describe this "grand canonical" multifractal formalism after some preliminary comments.
It is unrealistic to expect to find an algorithm to detect the general (nonparametric) behavior described by (9.19). If we wish to detect chirps, then it is necessary to be more specific about the objects we are trying to detect. What we want is a parametric model of a chirp. Several mathematical models have been proposed. Perhaps the simplest way to make (9.19) more specific is to require power-law behaviors as t → t₀, and the simplest way to do this is to model chirps by the functions f_{h,β} defined by

    f_{h,β}(t) = |t − t₀|^h sin(1/|t − t₀|^β),    (9.20)
where h is the usual Hölder exponent and β > 0 is called the oscillation exponent. We wish to find an algorithm that yields h and β when f_{h,β} is the analyzed function, but we also want the algorithm to yield h and β if the analyzed function "looks like" f_{h,β} near t₀. This is similar to the situation where an algorithm yields the Hölder exponent when the function f(t) = |t − t₀|^h is analyzed, but it also detects any function whose Hölder exponent is h. Thus, we see that there is the problem of defining the class of functions that "look like" (9.20) at t₀. Defining functions whose Hölder exponent is h at t₀ was relatively simple (recall (2.4) and (2.5)); saying what it means for a function to have a chirp at t₀ is more subtle because it must involve the oscillation exponent β, and there is not yet a universally accepted definition.
Yves Meyer observed that if one integrates (9.20) n times, then the Hölder exponent at t₀ becomes h + n(1 + β). (This is easily checked by repeated integration by parts.) This means that the oscillation exponent β causes the Hölder exponent to increase by 1 + β after each integration rather than by 1, as might be expected. This led to the following definition.
Definition 9.1. f has a chirp of type (h, β) at t₀ if f is C^h(t₀) and if the iterated primitives f^{(−n)} are C^{h+n(1+β)}(t₀).
Note that this definition cannot be compared with (9.19) because there is no differentiability assumption in a neighborhood of t₀. This definition is, however, consistent with the "ridge heuristic": Meyer proved that this behavior can be characterized by precise decay estimates of W(a, b) as (a, b) moves away from the ridge a = |b − t₀|^{1+β} (see [160]). We will see in the next chapter that Riemann's function Σ n^{−2} sin(πn²t) has a chirp at t₀ = 1 according to this definition. Although the definition is well adapted to the study of mathematical examples like Riemann's function, it is not well adapted to practical signal analysis.
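The cusp assumption |W(a, b)| ∼ a^h recalled above can be checked with a short numerical sketch (everything here is our own illustrative choice: the Mexican-hat wavelet, the L¹ normalization of Chapter 10's formula (10.6), the grid, and h = 1/2):

```python
import numpy as np

# Cusp signal f(t) = |t|^h with Holder exponent h = 0.5 at t0 = 0.
h = 0.5
dt = 1e-3
t = np.arange(-2.0, 2.0, dt)
f = np.abs(t) ** h

def psi(u):
    # Mexican-hat wavelet (zero mean), an illustrative choice.
    return (1.0 - u**2) * np.exp(-u**2 / 2.0)

# W(a, 0) = (1/a) * integral of f(t) psi(t/a) dt, by Riemann sum.
scales = np.geomspace(0.02, 0.3, 12)
W = np.array([(dt / a) * np.sum(f * psi(t / a)) for a in scales])

# Above the cusp, |W(a, 0)| should scale like a^h: fit the log-log slope.
slope = np.polyfit(np.log(scales), np.log(np.abs(W)), 1)[0]
print(slope)   # close to h = 0.5
```

The fitted exponent reproduces h because the singularity is a cusp; for an oscillating singularity like (9.20) this simple fit inside the cone fails, which is exactly why the formalism must be modified.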
Indeed, a minimal requirement for a definition to be useful for real data processing is invariance under the superposition of smooth noise of small amplitude. This is not the case here, as shown by the following example. Let

    f(t) = t^h sin(1/t^β) + A t^H,    (9.21)

where H > h and A is small. This models a chirp (the first term) plus smooth noise of small amplitude (the second term). For A = 0, f has a chirp of type (h, β) at t = 0, but as soon as A ≠ 0, the Hölder exponent of f^{(−n)} at 0 is inf{h + n(1 + β), H + n}. Thus if n is large enough, the Hölder exponent increases by one at each integration, and the oscillation exponent of (9.21) is not β, as it should be by Definition 9.1, but 0.
An alternative definition, which is robust with respect to the addition of smooth noise, was proposed by Arneodo, Bacry, Jaffard, and Muzy [9]. It is based on the experimental observation that one can see the oscillations of the chirp in the graph of (9.21) as long as H > h and on the belief that in this case, the oscillation exponent should reflect these oscillations. The next definition meets this requirement.
Definition 9.2. Let (I − Δ)^{−s/2} denote the fractional integral of order s, and let h_s(t) be the Hölder exponent of (I − Δ)^{−s/2} f. The oscillation exponent of f at t₀ is defined to be

    lim_{s→0⁺} (h_s(t₀) − h₀(t₀))/s − 1,

where h₀(t₀) = h(t₀) is the ordinary Hölder exponent of f at t₀.
(Note that the operator (I − Δ)^{−s/2} amounts to multiplying the Fourier transform of the signal by (1 + ξ²)^{−s/2}.) To apply Definition 9.2 one needs only to check how much the Hölder exponent increases under fractional integration of infinitesimal order, while Definition 9.1 requires one to check infinitely many integrations. Although Definition 9.2 is even further from the original definition of a chirp (9.19), it also has a characterization in terms of decay away from the ridge; however, in this case, the decay is much slower than for Definition 9.1.
Having a robust definition is only the first step to detecting chirps in turbulence, but it leaves unsolved the fundamental problem: It is not at all clear what these elusive chirps look like. Whatever they are, we expect them to be three-dimensional objects; in fact, we expect them to be coherent structures, perhaps like very fine vortex threads. The available signals are one-dimensional cuts, and it is not clear that one-dimensional cuts of "three-dimensional chirps" (whatever they are) are chirps as we have defined them. This problem has been studied by J.-M. Aubry in [14], and the situation is far from clear. Aubry developed a menagerie of three-dimensional chirps, and for some of them, almost every cut is a chirp. Nevertheless, as things now stand, we have no theoretical or numerical reason to believe that the possible chirps in turbulence belong to one category or another.
If one is determined to go on a snark hunt, then it is necessary to have a bag. To look for chirps in turbulence, this means that we must have an algorithm that can be applied to the one-dimensional data. There are two approaches: If one expects to detect a few isolated spiral-like structures in a noisy environment, like the problem of detecting gravitational waves, then ridge identification could be considered (see section 6.11 and [148]).
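The Fourier-multiplier remark above makes Definition 9.2 easy to apply in practice. Here is a minimal sketch (our own discretization choices: a periodic grid and NumPy's real FFT; not the authors' implementation):

```python
import numpy as np

def fractional_integral(f, s, dt):
    """Apply (I - Laplacian)^(-s/2) on a periodic grid: multiply the
    Fourier transform by (1 + xi^2)^(-s/2)."""
    n = len(f)
    xi = 2 * np.pi * np.fft.rfftfreq(n, d=dt)    # angular frequencies
    return np.fft.irfft(np.fft.rfft(f) * (1 + xi**2) ** (-s / 2), n=n)

# Sanity check on a pure tone: sin(5t) is an eigenfunction, scaled by
# the eigenvalue (1 + 25)^(-s/2).
n = 256
dt = 2 * np.pi / n
t = np.arange(n) * dt
f = np.sin(5 * t)
g = fractional_integral(f, s=1.0, dt=dt)
print(np.allclose(g, f / np.sqrt(26)))   # True
```

Repeating this for several small values of s and estimating h_s pointwise is exactly the "infinitesimal order" computation that Definition 9.2 asks for.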
However, if one believes that chirps are so pervasive that only a statistical approach makes sense, then it is reasonable to look for an extension of the multifractal formalism. Such an extension has been proposed by Stéphane Jaffard [156], and it is currently being implemented as a numerical algorithm by Alain Arneodo and his group in Bordeaux. The goal is much more ambitious than in the classical multifractal formalism: We now wish to determine the Hausdorff dimension of the set of points where there is a chirp with Hölder exponent h and oscillation exponent β. We denote this dimension by D(h, β). This function is called the spectrum of oscillating singularities.
We are going to describe this extension, but first we wish to show by an example why the classical formalism fails. We mentioned that the derivation of the standard multifractal formalism makes the assumption that all Hölder singularities are cusp-like. An even more radical way to see the limitation of this multifractal formalism is to compare its behavior on the devil's staircase, which we call F, and on the chirp

    f(t) = |t|^h sin(1/|t|^β).    (9.22)

We are going to compare the behavior of W(F; a, b) and W(f; a, b) at a small fixed scale a. For such a fixed a, there are positive constants C₁ and C₂ such that

    C₁ a^{log 2/log 3} ≤ |W(F; a, b)| ≤ C₂ a^{log 2/log 3}    (9.23)

on intervals of length a around the lines where |W(F; a, b)| attains its maxima (as a function of b), and |W(F; a, b)| decays rapidly outside these intervals. There are
about a^{−log 2/log 3} such lines at the scale a, so that the total length of the region where (9.23) holds is about a^{1−log 2/log 3}. Similarly, for the chirp,

    C₁ a^{h/(1+β)} ≤ |W(a, b)| ≤ C₂ a^{h/(1+β)}

on intervals of length a^{1/(1+β)} around the ridges defined by a = |b|^{1+β}. This means that the statistics based on the wavelet coefficients at a given scale will be exactly the same for the devil's staircase F and the chirp (9.22) if we choose

    h/(1+β) = log 2/log 3   and   1/(1+β) = 1 − log 2/log 3,

which amounts to having h = β = log 2/(log 3 − log 2).
The message from this example is this: To differentiate two such behaviors using a multifractal formalism, it is necessary to capture more information than is carried by the wavelet transform on the lines a = constant. This touches on a recurrent theme associated with the use of the wavelet transform for analyzing the local behavior of functions: It is necessary to use all of the information about the wavelet transform contained in a full neighborhood of the point t₀. This is the heuristic: One must examine "some function" of W(a, b) in neighborhoods (0, a] × [b − a, b + a] as a → 0. The "function" varies depending on the task. Recall that when faced with the problem of making sense of ∫ |W(a, b)|^p db for p < 0, we were led to consider the function

    d(a, b) = sup |W(f; a′, b′)|,

where the supremum is taken over (a′, b′) ∈ (0, a] × [b − a, b + a]. To extend the multifractal formalism to one that will detect chirps, we use a slight variation of this idea. We consider the function

    d_s(a, b) = sup |(a′)^s W(a′, b′)|,

where the supremum is taken over (a′, b′) ∈ (0, a] × [b − a, b + a], and we define η(p, s) by

    ∫ d_{s/p}^p(a, b) db ∼ a^{η(p,s)}.    (9.24)

We now use a heuristic argument similar to the one used to derive (9.3) and (9.5) to derive D(h, β) from (9.24). We begin by estimating the contribution of the chirps with Hölder exponent h and oscillation exponent β to d_s(a, b).
If the box (0, a] × [b − a, b + a] is centered above such a chirp, the ridge intersects the box at a′ = a^{1+β}, and at this point |W(a′, b′)| ∼ a^h. Thus d_s(a, b) ∼ a^{(1+β)s} a^h and d_{s/p}^p(a, b) ∼ a^{(1+β)s+ph}, and we now follow exactly the argument presented in section 9.4. The total contribution of these chirps to the left-hand side of (9.24) is thus

    a^{(1+β)s+ph+1−D(h,β)},

so that

    η(p, s) = inf_{h,β} {(1 + β)s + ph + 1 − D(h, β)}.

η(p, s) is obtained from 1 − D(h, β) by a two-dimensional Legendre transformation. Thus, if D(h, β) is concave, then

    D(h, β) = inf_{p,s} {(1 + β)s + ph + 1 − η(p, s)}.
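The two-dimensional Legendre inversion can be made concrete with a small numerical sketch (entirely our own construction: the model spectrum D(h, β), the grids, and the tolerance are illustrative assumptions, not taken from [156]):

```python
import numpy as np

# Illustrative concave model spectrum of oscillating singularities.
def D(h, beta):
    return 1.0 - (h - 0.5) ** 2 - (beta - 0.5) ** 2

hs = np.linspace(0.0, 1.0, 101)
H, B = np.meshgrid(hs, hs, indexing="ij")
ps = np.arange(-2.0, 2.0001, 0.1)

# eta(p, s) = inf over (h, beta) of (1 + beta) s + p h + 1 - D(h, beta).
eta = np.array([[np.min((1 + B) * s + p * H + 1 - D(H, B)) for s in ps]
                for p in ps])

# Inversion: D(h, beta) = inf over (p, s) of (1 + beta) s + p h + 1 - eta(p, s).
P, S = np.meshgrid(ps, ps, indexing="ij")
def D_rec(h, beta):
    return np.min((1 + beta) * S + h * P + 1 - eta)

err = max(abs(D_rec(h0, b0) - D(h0, b0))
          for h0, b0 in [(0.5, 0.5), (0.3, 0.6), (0.7, 0.4)])
print(err)   # small: the double Legendre transform returns the concave spectrum
```

As in the one-parameter case, concavity of D is essential: the double transform returns the concave hull, so any nonconcave part of the true spectrum would be lost.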
The validity of this formula has been successfully tested on functions that display fractal sets of chirps [156], and, as indicated above, its numerical implementation is now being undertaken by Arneodo and his team.

9.9 Wavelets, paraproducts, and Navier–Stokes equations

This section is devoted to discussing the use of wavelets for solving the Navier–Stokes equations numerically. Divergence-free orthonormal wavelet bases were first found by Battle and Federbush [30], and the construction was later improved by P. G. Lemarié-Rieusset [171]. It is natural to try using these bases in place of finite element methods in numerical schemes. By doing so, the Galerkin method is consistent with the invariance of the Navier–Stokes equations under certain transformations. We begin by writing these equations:

    ∂u/∂t = νΔu − (u₁∂₁ + u₂∂₂ + u₃∂₃)u − ∇p,
    ∂₁u₁ + ∂₂u₂ + ∂₃u₃ = 0,    (9.25)
    u(x, 0) = u₀(x),

where x = (x₁, x₂, x₃) belongs to ℝ³. In our model problem, there is no boundary, the fluid fills the space ℝ³, and there are no external forces. The system (9.25) contains four unknown functions u₁, u₂, u₃, and p and consists of four equations (plus an initial condition), so the balance is correct. The transformations we consider are the group actions defined by

    u(x, t) ↦ λu(λx, λ²t),  λ > 0,
    u(x, t) ↦ u(x − y, t),  y ∈ ℝ³.

The Battle–Federbush basis provides a Galerkin scheme into which these affine group actions can be incorporated. The Battle–Federbush basis is 2^{3j/2} ψ(2^j x − k), j ∈ ℤ, k ∈ ℤ³, ψ ∈ A, where A is a collection of 14 divergence-free wavelets. This basis spans the closed subspace of L²(ℝ³) × L²(ℝ³) × L²(ℝ³) defined by u = (u₁, u₂, u₃), u_j ∈ L²(ℝ³), and ∂₁u₁ + ∂₂u₂ + ∂₃u₃ = 0. We have ψ = (ψ₁, ψ₂, ψ₃), and these three functions belong to the Schwartz class. Furthermore, in one of the constructions [207], the Fourier transform of each of these functions is supported by the annulus defined by

    (2/3)π ≤ sup{|ξ₁|, |ξ₂|, |ξ₃|} ≤ (8/3)π.
The numerical scheme that follows is aimed at decoupling the Navier–Stokes equations into a sequence of equations. This idea was proposed by several authors, including P. Frick and V. Zimin, whom we quote [116, p. 265]:⁷

Ideas, like the ones used to create wavelet analysis, were proposed by Zimin (1981) for construction of a hierarchical model of turbulence. In a paper by Zimin (1981) a special functional basis has been presented. Functions of this basis are related to a hierarchical system of vortices of different sizes. The number of vortices in a unit volume increases with decreasing size and each function is well localized both in

⁷References cited here are in the original article.
Fourier and physical spaces. The product of the characteristic scales of localization in Fourier and coordinate spaces satisfies the uncertainty condition. The cascade equations, written for the quantities A_i, each define the velocity oscillations in the interval of wave numbers and describe the principal characteristics of energy redistribution processes between different scales. The cascade equations minimize the dimensionality of systems, which describe the turbulent flows in a wide range of wave numbers, and have a form

    d_t A_i = Σ_{j,k} Λ_{ijk} A_j A_k + ν_i A_i + F_i,

where F_i characterize the energy sources in corresponding interval of spectrum ....

The hierarchical model of turbulence is based on the natural assumption that turbulence is an ensemble of vortices of progressively diminishing scales. The hierarchical basis for two-dimensional (2D) turbulence describes the ensemble of the vortices, in which any vortex of the given size consists of four vortices of half size and so on. The ensemble of vortices of the same size forms a "level." The functions of the hierarchical basis are constructed in such a way that Fourier-images of vortices of [a] single level occupy only [a] single octave in the wave-number space and regions of localization of different levels in the Fourier space do not overlap. The wave-number space is divided into ring zones such that π2ⁿ < |k| < π2^{n+1}.

This quotation from Frick and Zimin implies that these authors are using the Shannon wavelet basis. This remark is made explicitly in their paper. The scaling function of the one-dimensional Shannon basis is the sinc function φ(t) = sin(πt)/(πt), while the corresponding mother wavelet is given by ψ(t) = 2φ(2t) − φ(t). These functions have poor localization in the coordinate space, although they have an ideal localization in the frequency domain.
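Both properties of the Shannon wavelet can be seen in a quick numerical sketch (the grid, tolerances, and energy measure are our own arbitrary choices):

```python
import numpy as np

# Shannon scaling function phi(t) = sin(pi t)/(pi t) and mother wavelet
# psi(t) = 2 phi(2t) - phi(t), sampled on a long grid.
dt = 0.25
t = np.arange(-500.0, 500.0, dt)
psi = 2 * np.sinc(2 * t) - np.sinc(t)     # np.sinc(x) = sin(pi x)/(pi x)

# Ideal frequency localization: the energy sits in one octave pi <= |xi| <= 2 pi.
spec = np.abs(np.fft.fft(psi)) ** 2
xi = 2 * np.pi * np.fft.fftfreq(len(t), d=dt)
band = (np.abs(xi) >= np.pi - 0.05) & (np.abs(xi) <= 2 * np.pi + 0.05)
frac = spec[band].sum() / spec.sum()

# Poor time localization: the envelope decays only like 1/|t|.
tail = np.max(np.abs(psi[np.abs(t) > 100]))
print(frac, tail)   # frac is essentially 1; tail is still of order 1e-3
```

The slow 1/|t| decay is exactly what makes the associated filters "ideal" but numerically unusable, as discussed next.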
Shannon's wavelets correspond to ideal filters in signal processing, and these ideal filters are unrealistic because their numerical support is infinite. For the same reason, Shannon's wavelets cannot be used in numerical analysis. Indeed, the nonlinear terms that appear in the Galerkin scheme do not have a rapid off-diagonal decay, and the Navier–Stokes equations are not decoupled in the Shannon basis.
The obvious question is, What happens if the Shannon basis is replaced with the Battle–Federbush basis? The corresponding Galerkin scheme looks like this:

    u(x, t) = Σ_λ a_λ(t) ψ_λ(x),    (9.26)

where ψ_λ is a condensed notation for the Battle–Federbush basis, and

    (d/dt) a_λ(t) = Σ_{λ′} η(λ, λ′) a_{λ′}(t) + Σ_{λ′,λ″} β(λ, λ′, λ″) a_{λ′}(t) a_{λ″}(t),    (9.27)
where η(λ, λ′) = ν⟨Δψ_{λ′}, ψ_λ⟩, β(λ, λ′, λ″) = b(ψ_{λ′}, ψ_{λ″}, ψ_λ), and

    b(u, v, w) = Σ_{k=1}^{3} Σ_{l=1}^{3} ∫ u_k(x) (∂_k v_l)(x) w_l(x) dx.    (9.28)

The Navier–Stokes equations are decoupled by this Galerkin scheme if and only if both η(λ, λ′) and β(λ, λ′, λ″) have a rapid off-diagonal decay. Concerning η(λ, λ′), we observe that η(λ, λ′) = 0 whenever |j′ − j| ≥ 3. (The wavelet ψ_λ is located at k2^{−j}, and the corresponding scale is 2^{−j}.) If |j′ − j| ≤ 2, then these coefficients η(λ, λ′) have a rapid off-diagonal decay, since the wavelets ψ_λ belong to the Schwartz class. If the Shannon wavelets were used, this would not be the case, which rules out this basis.
Now for the bad news. Even if the Battle–Federbush basis is used, β(λ, λ′, λ″) takes large off-diagonal values. Indeed, if λ′ = λ″ and j → −∞, then β(λ, λ′, λ″) does not decay rapidly. This problem appears whenever one considers the product fg of two functions f and g whose Fourier transforms are supported by the ring R ≤ |ξ| ≤ 2R, where R is large. In this situation, the product fg may generate large low-frequency terms.
This remark serves as an introduction to the so-called paraproduct algorithms that apply to the pointwise product of two nonsmooth functions f and g. Paraproduct algorithms are a way to analyze the application of nonlinear operators to highly oscillating functions; they rewrite the result as a hierarchy of terms that are easier to analyze. These techniques have been used successfully in the mathematical resolution of fluid dynamics equations (see [45], [50], [51], and [207]).
Taking a broader perspective, we note that the pertinence of wavelet methods for the numerical solution of partial differential equations remains an unclear issue. One significant drawback is the lack of flexibility in constructing wavelets adapted to complicated geometry.
Equally significant is the fact that multigrid algorithms, which share many of the desirable properties of wavelet algorithms, attained a mature development before wavelets were introduced. One point where wavelets seem to be competitive is in local refinements: It is easy to add a few new wavelets where "something" seems to be happening, whereas local refinements of meshes of finite elements are more complicated to handle. (We suggest [56] by A. Cohen, W. Dahmen, and R. DeVore, where these questions are discussed.)

9.10 Hausdorff measure and dimension

Hausdorff dimension is a mathematical tool that allows one to quantify the fractal behavior of the functions and measures that have been mentioned in this chapter. It is a key tool in the mathematical development of multifractal analysis, and thus this last section also provides background for the next chapter. We are mainly interested in Hausdorff dimension, but to get there it is necessary to pass through the definition of Hausdorff measure. (Hausdorff measure appears once in section 11.4.) Our discussion follows that of Falconer in [101], and we recommend this book to anyone who wishes to learn more about these ideas.
For any nonempty subset U ⊂ ℝⁿ, the diameter of U is defined to be |U| = sup{|x − y| : x, y ∈ U}. An ε-cover of a subset A ⊂ ℝⁿ is any countable (or finite) collection of sets {U_i} such that A ⊂ ∪_i U_i and 0 < |U_i| ≤ ε.
For any subset A ⊂ ℝⁿ, any s ≥ 0, and any ε > 0, define

    H^s_ε(A) = inf Σ_{i=1}^{∞} |U_i|^s,    (9.29)

where the infimum is taken over all ε-covers of A. If ε′ < ε, then every ε′-cover is an ε-cover, and hence H^s_{ε′}(A) ≥ H^s_ε(A). Thus H^s_ε(A) tends to a limit as ε → 0, and we write

    H^s(A) = lim_{ε→0} H^s_ε(A).    (9.30)

H^s(A), which is often infinite, is called the s-dimensional Hausdorff measure of A ⊂ ℝⁿ. It can be shown that H^s is a measure, and, in fact, n-dimensional Hausdorff measure is, up to a constant multiple, Lebesgue measure. We are not concerned here with Hausdorff measure, so we will move directly to the definition of Hausdorff dimension.
Observe that if t > s and {U_i} is an ε-cover of A, then

    Σ_{i=1}^{∞} |U_i|^t ≤ ε^{t−s} Σ_{i=1}^{∞} |U_i|^s.

Taking the infimum of both sides shows that H^t_ε(A) ≤ ε^{t−s} H^s_ε(A). If H^s(A) < ∞, then by taking the limit as ε → 0 we see that H^t(A) = 0. In short, t > s and H^s(A) < ∞ imply that H^t(A) = 0. A direct consequence is that H^t(A) = 0 for all A ⊂ ℝⁿ whenever t > n: Since H^n is Lebesgue measure (up to a factor), the H^n measure of the unit ball in ℝⁿ is finite; it follows that H^t(ℝⁿ) = 0 if t > n. The Hausdorff dimension of A is then defined as

    dim_H(A) = inf{s | H^s(A) = 0}.    (9.31)

If H^0(A) < ∞, then it follows from the definition that A is finite. Thus, with this exception, it is clear from the discussion that

    dim_H(A) = inf{s | H^s(A) = 0} = sup{s | H^s(A) = ∞}.    (9.32)

Thus the Hausdorff dimension of A is the point where the graph of H^s(A), as a function of s, "jumps" from infinity to zero. The Hausdorff dimension agrees with the ordinary definition of dimension for smooth objects: A smooth curve in ℝⁿ has Hausdorff dimension one, a smooth surface has Hausdorff dimension two, and, in general, a smooth m-dimensional manifold has Hausdorff dimension m. In particular, the unit sphere in ℝⁿ has Hausdorff dimension n − 1. On the other hand, the Hausdorff dimension of Cantor's triadic set is log 2/log 3, as shown by Hausdorff in [140].
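The Cantor-set value can be recovered by a small box-counting sketch (an illustration of the scaling behind (9.31), not of Hausdorff measure itself: box counting gives the box dimension, which happens to coincide with dim_H for the triadic Cantor set):

```python
import numpy as np

# Left endpoints of the 2^depth intervals of the triadic Cantor construction.
def cantor_points(depth):
    pts = np.array([0.0])
    for _ in range(depth):
        pts = np.concatenate([pts / 3.0, pts / 3.0 + 2.0 / 3.0])
    return pts

pts = cantor_points(10)
ks = np.arange(2, 9)
# N(3^-k) = number of triadic boxes of side 3^-k meeting the set; here 2^k.
# (The 1e-9 guards against floating-point endpoints landing just below a box edge.)
counts = [len(np.unique(np.floor(pts * 3.0**k + 1e-9))) for k in ks]
dim = np.polyfit(ks * np.log(3.0), np.log(counts), 1)[0]
print(dim)   # log 2 / log 3 = 0.6309...
```

Since N(3^{-k}) = 2^k exactly, the log-log slope is log 2/log 3 up to floating-point error.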
CHAPTER 10

Wavelets and Multifractal Functions

10.1 Introduction

We presented the conjecture of Frisch and Parisi concerning the multifractal nature of the velocity of a turbulent fluid in the previous chapter on wavelets and turbulence. They introduced the hypothesis that there is a set of points with Hausdorff dimension D(h) where the velocity increments satisfy |v(x + Δx, t) − v(x, t)| ∼ |Δx|^h, and from this they argued that

    ∫_ℝ |v(x + Δx, t) − v(x, t)|^p dx ∼ |Δx|^{ζ(p)}    (10.1)

as |Δx| → 0, where

    ζ(p) = inf_h {hp + 1 − D(h)}.    (10.2)

We are interested in D(h) because it tells us about the fractal or multifractal nature of fully developed turbulence, but ζ(p) is the quantity we can compute numerically. Fortunately, under the assumption that D(h) is concave, it can be recovered from ζ(p) by a classical Legendre inversion formula:

    D(h) = inf_p {hp + 1 − ζ(p)}.    (10.3)

Since Hölder exponents and Hausdorff dimensions cannot be reasonably computed numerically, (10.3) is the only way to obtain the spectrum of singularities of a signal. Unfortunately, our understanding of this formula is quite poor; there are examples and counterexamples of its validity. (See [154] for a discussion.) The good news is that we can test (10.3) on several mathematically defined functions for which both sides of the equality can be computed independently, and this provides some intuition about the range of validity and the limitations of (10.3).
We present two examples of functions that are fractal or multifractal, and we show how wavelet methods can be used to compute their Hölder exponents and their spectra of singularities. The two functions we study are the Weierstrass function

    W(t) = Σ_{n=0}^{∞} Bⁿ cos(Aⁿ t),
where 0 < B < 1 and AB > 1, and the Riemann function

    R(t) = Σ_{n=1}^{∞} (1/n²) sin(πn² t).

In addition to exhibiting an example for which (10.3) holds, the analysis of the Riemann function will provide an opportunity to compare the performances of wavelets and Fourier analysis in the context of "multifractal analysis."
This chapter is more technical than the others, in the sense that we have chosen to present the proofs of certain results. The proofs have been selected to illustrate techniques that we feel are basic to wavelet analysis. On the other hand, we warn the reader that this does not imply that the chapter is self-contained. To obtain a balance between telling the story and avoiding too much detail, we refer to other sources for certain key results and proofs. This chapter differs from the others in another respect: The emphasis is on the use of wavelets to analyze the detailed structure of functions, and thus it illustrates the use of wavelets "within mathematics," as alluded to at the end of section 1.7.

10.2 The Weierstrass function

Historians tell us that Karl Weierstrass mentioned the function R in a talk to the Academy of Sciences in Berlin on 18 July 1872 and indicated that Riemann had introduced this function to warn mathematicians that a continuous function need not have a derivative [97]. This function, which first appeared in print in 1875 in [96], has come to be known as Riemann's function, although there seems to be no written evidence, other than that given by Weierstrass, that connects Riemann directly with this function. (See [43] for a fascinating discussion of the mystery surrounding the origin of R.)
Weierstrass was not able to analyze R. Instead, he introduced the much more lacunary series W(t) = Σ Bⁿ cos(Aⁿ t), 0 < B < 1, and showed that if A is an odd integer and if AB is sufficiently large, then W is nowhere differentiable. We will see that the result is true if AB > 1. (Weierstrass's proof can be found in his collected works [256].)
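The role of the condition AB > 1 can be illustrated numerically before the proof (a sketch with our own parameter choices; the increment-based estimate below is a stand-in for the Littlewood–Paley argument that follows, not part of it). With A = 2 and B = 2^{−1/2}, the Hölder exponent is h = −log B/log A = 1/2, so the sup of the increments shrinks like δ^{1/2} rather than like δ:

```python
import numpy as np

# Weierstrass function with A = 2, B = 2^(-1/2): AB = sqrt(2) > 1 and
# Holder exponent h = -log B / log A = 1/2 (illustrative parameters).
A, B = 2.0, 2.0 ** -0.5
N = 2 ** 18
t = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
W = sum(B**n * np.cos(A**n * t) for n in range(26))

# sup over t of |W(t + delta) - W(t)| for dyadic lags delta = 2 pi 2^-k
# (W is 2 pi periodic, so a circular shift gives the exact increment).
ks = np.arange(3, 12)
deltas = 2 * np.pi * 2.0 ** (-ks)
sup_inc = [np.max(np.abs(np.roll(W, -(N >> k)) - W)) for k in ks]
slope = np.polyfit(np.log(deltas), np.log(sup_inc), 1)[0]
print(slope)   # roughly 1/2, in any case bounded well below 1
```

Since the increments are not o(δ) at any point, W cannot have a derivative anywhere, which is exactly what the Littlewood–Paley argument below proves rigorously.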
We intend to show that the function W(t) = Σ_{j=0}^{∞} B^j cos(A^j t) is nowhere differentiable and that the same is true for the function W̃(t) = Σ_{j=0}^{∞} B^j sin(A^j t). These proofs will use wavelet analysis, which in this example appears in general outline as a form of Littlewood–Paley analysis. The method we follow is due to Géza Freud [115]. The proof is quite simple, but it is based on an important aspect of wavelet analysis: Analyzing wavelets abound, and success follows from a judicious choice.
We begin by defining the wavelet ψ in terms of its Fourier transform ψ̂. We first require that ψ̂ satisfy the following three conditions:
(a) ψ̂(ξ) = 0 if ξ ≤ A^{−1}, A > 1 (in particular on (−∞, 0]).
(b) ψ̂(ξ) = 0 if ξ ≥ A.
(c) ψ̂(1) = 1.
Since there is no problem in doing so, we will assume that ψ̂ is infinitely differentiable. By construction, ψ̂^{(k)}(0) = 0, so ∫ t^k ψ(t) dt = 0 for k ∈ ℕ. Furthermore, since ψ̂ is infinitely differentiable and has compact support, ψ is in the Schwartz class, and, in particular, |t|^k |ψ(t)| → 0 as |t| → +∞ for all k ∈ ℕ. This is more than is needed for the proof, but it is there for the asking.
Write ψ_j(t) = A^j ψ(A^j t), j ∈ ℕ, and denote the convolution operators f ↦ f * ψ_j by Δ_j. These operators constitute a sequence of bandpass filters. The analysis of a real function f using the sequence Δ_j resembles a Littlewood–Paley analysis that would be carried out on the analytic signal whose real part is f. Freud's method is based on the following lemma.
Lemma 10.1. Let f be a bounded, continuous function of the real variable t. If f is differentiable at t₀, then Δ_j f(t₀) = A^{−j} ε_j, where ε_j → 0 as j → +∞.
Proof. By definition, Δ_j f(t₀) = A^j ∫ f(t₀ − t) ψ(A^j t) dt. We can write f(t₀ − t) = f(t₀) − t f′(t₀) + t ε(t), where ε(t) → 0 as t → 0 and |ε(t)| ≤ C for some C > 0. This gives three terms for Δ_j f(t₀). The first two are zero because ∫ ψ(t) dt = ∫ t ψ(t) dt = 0. The third term is

    A^j ∫ t ε(t) ψ(A^j t) dt = A^{−j} ∫ ε(A^{−j} t) t ψ(t) dt.

But we have |ε(A^{−j} t)| ≤ C, lim_{j→+∞} ε(A^{−j} t) = 0 (simple convergence), and ∫ |t||ψ(t)| dt < ∞. From this it follows that ε_j = ∫ ε(A^{−j} t) t ψ(t) dt → 0 as j → +∞. □
To prove Weierstrass's result, we apply the operators Δ_j to the two functions

    W(t) = Σ_{j=0}^{∞} B^j cos(A^j t)   and   W̃(t) = Σ_{j=0}^{∞} B^j sin(A^j t).

By direct computation,

    (Δ_j W)(t) = (1/2) A^{−j} (AB)^j e^{iA^j t}   and   (Δ_j W̃)(t) = −(i/2) A^{−j} (AB)^j e^{iA^j t}.

Lemma 10.1 applies, and since (AB)^j e^{iA^j t} does not tend to 0 as j → +∞, the conclusion is that W and W̃ are nowhere differentiable.
We pause to make an observation about the choice of the analyzing wavelet ψ. If we had initially chosen ψ̂ to be real-valued and even, with ψ̂(ξ) = 0 for |ξ| ≤ A^{−1} and for |ξ| ≥ A, then the analyzing wavelet ψ would have been real-valued and even. This would have led to (Δ_j W̃)(t) = B^j sin(A^j t), and we could not have concluded from Lemma 10.1 that W̃ is not differentiable at t = pA^{−q}π whenever A is an integer. From this example we see the merit of choosing an analyzing wavelet that is analytic: The information contained in Δ_j f(t) is more specific.
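The direct computation above can be replayed discretely (a sketch; the triangular log-frequency bump for ψ̂, the grid, and A = 2 are our own choices, not the text's): an analytic band-pass filter supported in (A^{j−1}, A^{j+1}) with ψ̂(A^j) = 1 isolates the single term (1/2)B^j e^{iA^j t}, whose modulus B^j/2 = (AB)^j A^{−j}/2 decays much more slowly than A^{−j} when AB > 1.

```python
import numpy as np

A, B = 2.0, 2.0 ** -0.5
N = 2 ** 14
t = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
W = sum(B**n * np.cos(A**n * t) for n in range(12))

def psi_hat(r):
    # A bump with psi_hat(1) = 1, vanishing outside (1/2, 2) (our choice;
    # any profile with this support acts the same on W's dyadic spectrum).
    r = np.where(r > 0, r, 1e-300)
    return np.maximum(0.0, 1.0 - np.abs(np.log2(r)))

def delta_j(f, j):
    # Analytic band-pass filter: keep positive frequencies only, weighted
    # by psi_hat(xi / A^j).
    c = np.fft.fft(f)
    k = np.fft.fftfreq(N, d=1.0 / N)          # integer frequencies
    return np.fft.ifft(c * np.where(k > 0, psi_hat(k / A**j), 0.0))

ok = all(np.allclose(np.abs(delta_j(W, j)), B**j / 2) for j in (3, 5, 7))
print(ok)   # True: |Delta_j W| = B^j / 2 at every point
```

Because the filter is analytic, |Δ_j W| is constant in t, and A^j |Δ_j W| = (AB)^j/2 → ∞, contradicting the criterion of Lemma 10.1 at every point.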
This choice of analyzing wavelet loses its importance if we rephrase Lemma 10.1 in the following, more precise form.
Lemma 10.2. Let f be a bounded, continuous function of the real variable t. If f is differentiable at t₀, then there exists a function η defined for x ≥ 0 such that η is increasing, η is continuous at 0 with η(0) = 0, and

    |Δ_j f(t₁)| ≤ |t₁ − t₀| η(|t₁ − t₀|) + A^{−j} η(A^{−j})    (10.4)

for all j ≥ 0 and all real t₁.
Proof. The proof is similar to that of Lemma 10.1, but here it is necessary to do some tinkering to separate the parameters t₀ − t₁ and A^{−j} in the error term. As before, we write

    f(t) = f(t₀) + (t − t₀) f′(t₀) + (t₀ − t) ε(t₀ − t),
and then

    Δ_j f(t₁) = ∫ (t₀ − t₁ + A^{−j} t) ε(t₀ − t₁ + A^{−j} t) ψ(t) dt.    (10.5)

Define the function β on [0, +∞) by β(h) = sup_{|t|≤h} |ε(t)|. Then β has the following properties:
(i) β is continuous and bounded on [0, +∞) and β(0) = 0.
(ii) β is monotonically nondecreasing, that is, β(h₁) ≤ β(h₂) when h₁ ≤ h₂.
(iii) (u + v) β(u + v) ≤ 2u β(2u) + 2v β(2v) whenever u ≥ 0 and v ≥ 0.
Property (iii) follows from (ii): If u ≥ v, then (u + v) β(u + v) ≤ 2u β(2u) ≤ 2u β(2u) + 2v β(2v).
Returning to (10.5), we have

    |Δ_j f(t₁)| ≤ ∫ |t₀ − t₁ + A^{−j} t| |ε(t₀ − t₁ + A^{−j} t)| |ψ(t)| dt
               ≤ ∫ (|t₀ − t₁| + |A^{−j} t|) β(|t₀ − t₁| + A^{−j}|t|) |ψ(t)| dt
               ≤ 2|t₀ − t₁| β(2|t₀ − t₁|) ∫ |ψ(t)| dt + 2A^{−j} ∫ β(2A^{−j}|t|) |t| |ψ(t)| dt.

By taking η to be the function defined by

    η(h) = 2 sup{ β(2h) ∫ |ψ(t)| dt, ∫ β(2h|t|) |t| |ψ(t)| dt },

we arrive at the statement of the lemma. □
The proofs of the two lemmas use only the following properties of the wavelet ψ: ∫ ψ(t) dt = ∫ t ψ(t) dt = 0 and t ψ(t) ∈ L¹(ℝ). This leaves plenty of room for choosing a wavelet to fit the task at hand.
To see the advantage of Lemma 10.2 over Lemma 10.1, suppose that we had made the "bad choice" of a real, even wavelet, in which case we ended up with (Δ_j W̃)(t) = B^j sin(A^j t), and we were not able, using Lemma 10.1, to reach the desired conclusion. The result follows from Lemma 10.2, however. For example, for t₀ = 0, take t₁ = (π/2)A^{−j} so that sin(A^j t₁) = 1; then (10.4) fails for large j, since B^j ≫ A^{−j} when AB > 1. The statement of Lemma 10.2 comes close to being a necessary and sufficient condition for differentiability at t₀. (The sharpest results about computing the regularity of a function using the wavelet transform can be found in [160] and [208].)

10.3 Regular points in an irregular background

We now propose to determine the points x₀ where a function, which may be very irregular at other points, has a given Hölder regularity.
This form of regularity is expressed by the following condition: For 0 < α < 1, f is said to be C^α(x₀) if there exists a C > 0 such that

    |f(x) − f(x₀)| ≤ C|x − x₀|^α.
If there exists a constant C such that this relation holds uniformly for all x₀ ∈ ℝ, we say that f is in the Hölder space C^α(ℝ) and write f ∈ C^α(ℝ). (Note that these definitions are consistent with those given in section 2.2.)
We discussed the Grossmann–Morlet analysis of a function f in L²(ℝ) in section 2.7. There we introduced the notation

    W(a, b) = ⟨f, ψ_{(a,b)}⟩,   where   ψ_{(a,b)}(x) = a^{−1/2} ψ((x − b)/a),  a > 0, b ∈ ℝ.

The factor a^{−1/2} was chosen so that ‖ψ_{(a,b)}‖₂ = ‖ψ‖₂, since we were interested in an L² analysis. In the current chapter, we are interested in the analysis of functions in L^∞, and we change the normalizing factor to a^{−1} so that ‖ψ_{(a,b)}‖₁ = ‖ψ‖₁. Thus,

    W(a, b) = (1/a) ∫_ℝ f(x) ψ((x − b)/a) dx.    (10.6)

This transform makes sense if, for example, f is bounded and ψ ∈ L¹(ℝ). We also mentioned in Chapter 2 the reconstruction formula

    f(x) = ∫_{a>0} ( ∫_{b∈ℝ} W(a, b) ψ((x − b)/a) db/a ) da/a,    (10.7)

which converges in the sense of L²(ℝ) when the wavelet ψ satisfies appropriate conditions. If f is bounded, then under suitable conditions on f and ψ, the inversion formula (10.7) holds at all points where f is continuous. (Precise statements and technical details for two inversion theorems are given in Appendix B.)
Wavelet analysis provides direct and rather easy access to the pointwise behavior of functions and signals. This statement often has been used as an advertisement for wavelet analysis. We wish to back up this claim with a precise mathematical formulation when the pointwise behavior is measured by the Hölder exponent α(x₀). This goal will be reached with Theorem 10.1.
The first result we will prove is a simple generalization of Lemma 10.2. It states that if f ∈ C^α(x₀), then |W(a, b)| ≤ C(a^α + |b − x₀|^α), where a^α has replaced A^{−j}η(A^{−j}) and |b − x₀|^α has replaced |t₁ − t₀|η(|t₁ − t₀|). (Note that t₀ is now x₀, t₁ is b, and A^{−j} is a.) Here, and elsewhere in this chapter, we require the analyzing wavelet ψ to satisfy at least the condition |ψ(x)| ≤ C(1 + |x|²)^{−1}. Of course, we require the usual condition ∫ ψ(x) dx = 0. Other conditions will be added as needed. As shown in section 10.2, there are plenty of wavelets with these properties.
We are going to state and prove the next few results under the assumption that α < 1. This hypothesis is not essential, and the results extend to α > 1 (see [151]).
Lemma 10.3. If f is bounded and in C^α(x₀), then its wavelet transform satisfies the following condition: There exists a constant C > 0 such that, if a ≤ 1 and |b − x₀| ≤ 1, then

    |W(a, b)| ≤ C(a^α + |b − x₀|^α).    (10.8)
Here, and elsewhere in this chapter, we require the analyzing wavelet $\psi$ to satisfy at least the condition $|\psi(x)| \le C(1+|x|^2)^{-1}$. Of course, we require the usual condition $\int \psi(x)\,dx = 0$. Other conditions will be added as needed. As shown in section 10.2, there are plenty of wavelets with these properties. We are going to state and prove the next few results under the assumption that $\alpha < 1$. This hypothesis is not essential, and the results extend to $\alpha > 1$ (see [151]).

Lemma 10.3. If $f$ is bounded and in $C^\alpha(x_0)$, then its wavelet transform satisfies this condition: There exists a constant $C > 0$ such that, if $a \le 1$ and $|b-x_0| \le 1$, then
$$|W(a,b)| \le C(a^\alpha + |b-x_0|^\alpha). \tag{10.8}$$
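Estimate (10.8) is easy to observe numerically. The sketch below is our own illustration (not from the text): it discretizes the $L^\infty$-normalized transform (10.6) with a Mexican-hat wavelet, which satisfies the decay and zero-mean conditions above, and checks that for the cusp $f(x) = |x - x_0|^{1/2}$ the transform at $b = x_0$ scales like $a^{1/2}$.

```python
import numpy as np

def cwt_linf(f, psi, a, b, half=10.0, n=200001):
    # W(a,b) = (1/a) * integral of f(x) * psi((x-b)/a) dx, eq. (10.6),
    # approximated by a Riemann sum on [b-half, b+half]
    x = np.linspace(b - half, b + half, n)
    dx = x[1] - x[0]
    return np.sum(f(x) * psi((x - b) / a)) * dx / a

# Mexican-hat wavelet: zero mean, decay much faster than (1+x^2)^(-1)
psi = lambda x: (1 - x**2) * np.exp(-x**2 / 2)

alpha, x0 = 0.5, 0.3
f = lambda x: np.abs(x - x0)**alpha

# at b = x0, |W(a, x0)| = a^alpha * |integral |u|^alpha psi(u) du|  (Lemma 10.3)
scales = [0.4, 0.2, 0.1, 0.05]
W = [abs(cwt_linf(f, psi, a, x0)) for a in scales]
slopes = np.diff(np.log(W)) / np.diff(np.log(scales))
print(slopes)   # log-log slopes close to alpha = 0.5
```

The measured log-log slope recovers the Hölder exponent of the cusp, exactly the mechanism behind Theorem 10.2 below.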
Proof. The proof is simpler than that of Lemma 10.2. Since $\int \psi(x)\,dx = 0$, we can write
$$W(a,b) = \frac{1}{a}\int \big[f(x) - f(x_0)\big]\,\psi\Big(\frac{x-b}{a}\Big)\,dx.$$
Then
$$|W(a,b)| \le \frac{1}{a}\int |f(x) - f(x_0)|\,\Big|\psi\Big(\frac{x-b}{a}\Big)\Big|\,dx,$$
and by making the change of variable $u = \frac{x-b}{a}$, we have
$$|W(a,b)| \le C\int |au + b - x_0|^\alpha\,|\psi(u)|\,du \le C a^\alpha\int |u|^\alpha|\psi(u)|\,du + C|b-x_0|^\alpha\int |\psi(u)|\,du.$$
The result follows from the observation that
$$\int |u|^\alpha|\psi(u)|\,du \le C\int \frac{|u|^\alpha}{1+|u|^2}\,du < +\infty,$$
since $0 < \alpha < 1$. $\square$

Note that if $f \in C^\alpha(\mathbb{R})$, then the lemma implies that $|W(a,b)| \le C a^\alpha$.

We will state and prove a converse to Lemma 10.3, but first we wish to comment on this estimate. The "cone of influence" $\Gamma(x_0)$ of $x_0$ is defined by $a \ge |b-x_0|$. If $(b,a) \in \Gamma(x_0)$, then (10.8) becomes $|W(a,b)| \le 2C a^\alpha$; however, if $(b,a)$ is not in $\Gamma(x_0)$, then $|W(a,b)| \le 2C|b-x_0|^\alpha$. Some scientists thought at first that the Hölder exponent $\alpha(x_0)$ of $f$ at $x_0$ could be computed by estimating $|W(a,b)|$ inside the cone of influence $\Gamma(x_0)$. This belief is based on the following reasoning: Assume that the support of $\psi$ is contained in $[-1,1]$ and that we compute $W(a,b)$ when $(b,a) \notin \Gamma(x_0)$. Then $\varepsilon = |b-x_0| - a > 0$ and
$$W(a,b) = \int f(x)\,\psi_{(a,b)}(x)\,dx = \int_{|x-x_0| \ge \varepsilon} f(x)\,\psi_{(a,b)}(x)\,dx.$$
This computation led some to believe that the behavior of $f$ near $x_0$ did not influence the wavelet coefficients of $f$ outside the cone of influence of $x_0$. An example that supports this idea is given by the function $f(x) = |x-x_0|^\alpha$. In this case, $W(a,b) = a^\alpha\,\psi_\alpha\big(\frac{b-x_0}{a}\big)$, where $\widehat{\psi_\alpha}(\xi) = c(\alpha)\,|\xi|^{-1-\alpha}\,\widehat{\psi}(\xi)$. If $\psi_\alpha(\lambda) \ne 0$, then it suffices to read $W(a,b)$ on the half-line $a = \lambda^{-1}(b-x_0) > 0$ to recover the exponent $\alpha$. A counterexample is the chirp $f(x) = |x-x_0|^\alpha \exp\big(i(x-x_0)^{-1}\big)$. Integration by parts shows that $|W(a,b)| \le C_N a^N$ when $a \ge \beta|b-x_0|$, $\beta > 0$. If estimates inside the cone of influence were sufficient for determining the Hölder regularity, then $f$ would belong to $C^N(x_0)$ for every integer $N$.
But this is not the case, and thus this counterexample shows that examining the wavelet coefficients inside the cone of influence is not sufficient for determining the Hölder regularity of a function
at a given point. Furthermore, the inequality $|W(a,b)| \le C(a^\alpha + |b-x_0|^\alpha)$ is not sufficient either, and Lemma 10.3 does not yield a necessary and sufficient condition for $f \in C^\alpha(x_0)$. Nevertheless, the sufficient condition is only an epsilon away from (10.8), and the following theorem comes close to being the converse of Lemma 10.3.

Theorem 10.1. Assume that $0 < \alpha' < \alpha < 1$. If the wavelet transform $W(a,b)$ of a bounded function $f$ satisfies
$$|W(a,b)| \le C a^\alpha\Big(1 + \frac{|b-x_0|}{a}\Big)^{\alpha'} \tag{10.9}$$
in some neighborhood $0 < a \le a_0$, $|b-x_0| \le b_0$, then $f$ belongs to $C^\alpha(x_0)$.

We will prove this theorem, but, before doing so, we wish to comment on some of its implications. The first observation is that the estimate (10.9) implies that $|W(a,b)| \le C a^{\alpha-\alpha'}$, and this implies that $f \in C^{\alpha-\alpha'}(\mathbb{R})$. Applying Theorem 10.1 means looking for points $x_0$ where $f$ is more regular than its "average" regularity. The global regularity is given by $\alpha - \alpha'$, and we are looking for points where the regularity is given by $\alpha$. The next observation is that this theorem yields an algorithm for computing pointwise Hölder exponents.

Theorem 10.2. Assume that $f$ is a bounded function that belongs to the Hölder space $C^\beta(\mathbb{R})$ for some $\beta$, $0 < \beta < 1$. Then for every point $x_0 \in \mathbb{R}$, the Hölder exponent $\alpha(f,x_0)$ is given by
$$\alpha(f,x_0) = \liminf_{a\to0,\ b\to x_0} \frac{\log|W(a,b)|}{\log(a + |b-x_0|)}. \tag{10.10}$$

Proof. Recall that $\alpha(f,x_0) = \sup\{\alpha \mid f \in C^\alpha(x_0)\}$ and write (10.8) as $|W(a,b)| \le C(a + |b-x_0|)^\alpha$. Then it follows from Lemma 10.3 that
$$\alpha(f,x_0) \le \liminf_{a\to0,\ b\to x_0} \frac{\log|W(a,b)|}{\log(a + |b-x_0|)}. \tag{10.11}$$
To prove the result in the other direction, suppose that (10.11) is not an equality. Then there is an $\alpha$ such that
$$\alpha(f,x_0) < \alpha < \liminf_{a\to0,\ b\to x_0} \frac{\log|W(a,b)|}{\log(a + |b-x_0|)},$$
and we have $|W(a,b)| \le C(a + |b-x_0|)^\alpha$. The assumption that $f \in C^\beta(\mathbb{R})$ implies that $\beta \le \alpha(f,x_0) < \alpha$ and (by Lemma 10.3) that $|W(a,b)| \le C a^\beta$.
By interpolating between these two estimates, we obtain
$$|W(a,b)| \le C a^\gamma\Big(1 + \frac{|b-x_0|}{a}\Big)^{\theta\alpha},$$
where the exponent $\alpha'$ of Theorem 10.1 is $\theta\alpha$ and $\gamma = \theta\alpha + (1-\theta)\beta$, $0 < \theta < 1$. By applying Theorem 10.1, we see that $f \in C^\gamma(x_0)$. Since $\gamma$ is any real number in $(\beta,\alpha)$, we conclude that $\alpha(f,x_0) \ge \alpha$. Thus (10.11) is an equality, which proves the result. $\square$

A nice example where Theorem 10.2 applies is given by the Riemann function $\mathcal{R}(x) = \sum_{n=1}^{\infty} \frac{1}{n^2}\sin(\pi n^2 x)$. To see that this is true, we show that $\mathcal{R}$ belongs to the global Hölder space $C^{1/2}(\mathbb{R})$. We write $\mathcal{R}(x) = \sum_{j=0}^{\infty} R_j(x)$, where
$$R_j(x) = \sum_{2^j \le n < 2^{j+1}} \frac{1}{n^2}\sin(\pi n^2 x).$$
We immediately have $\|R_j\|_\infty \le 2^{-j}$ and $\|R_j'\|_\infty \le \pi 2^j$. Hence,
$$|\mathcal{R}(x+h) - \mathcal{R}(x)| \le \sum_{j=0}^{N} |h|\,\|R_j'\|_\infty + 2\sum_{j=N+1}^{\infty} \|R_j\|_\infty \le 2\pi|h|\,2^N + 2\cdot2^{-N}.$$
The optimal choice of $N$ is determined by $|h| \le 4^{-N} \le 4|h|$. For this $N$ we have $|\mathcal{R}(x+h) - \mathcal{R}(x)| \le C|h|^{1/2}$, which means that $\mathcal{R} \in C^{1/2}(\mathbb{R})$.

The proof of Theorem 10.1 will follow a similar strategy. We will give the full details, but first we need to set up some notation and prove another lemma. Since we will be estimating separately the contributions of each scale $a$ in (10.7), we introduce the following notation:
$$(\Delta_a f)(x) = \int_{b\in\mathbb{R}} W(a,b)\,\psi\Big(\frac{x-b}{a}\Big)\,\frac{db}{a}.$$

Lemma 10.4. Assume that $|\psi(x)| + |\psi'(x)| \le C(1+|x|^2)^{-1}$. If the wavelet transform of $f$ satisfies the inequality
$$|W(a,b)| \le C a^\alpha\Big(1 + \frac{|b-x_0|}{a}\Big)^{\alpha'} \tag{10.12}$$
for some $C > 0$ and some $\alpha' < \alpha$, then
$$|(\Delta_a f)(x)| \le C a^\alpha\Big(1 + \frac{|x-x_0|^{\alpha'}}{a^{\alpha'}}\Big) \tag{10.13}$$
and
$$|(\Delta_a f)'(x)| \le C a^{\alpha-1}\Big(1 + \frac{|x-x_0|^{\alpha'}}{a^{\alpha'}}\Big). \tag{10.14}$$

Proof. Using (10.12) and the localization of $\psi$, we see that
$$|(\Delta_a f)(x)| \le C a^\alpha\int \Big(1 + \frac{|b-x_0|^{\alpha'}}{a^{\alpha'}}\Big)\,\frac{1}{1 + \big|\frac{x-b}{a}\big|^2}\,\frac{db}{a}.$$
By introducing the new variable $u = \frac{x-b}{a}$ and noting that $|x+y|^{\alpha'} \le |x|^{\alpha'} + |y|^{\alpha'}$, we have
$$|(\Delta_a f)(x)| \le C a^\alpha\Big(\int \frac{1 + |u|^{\alpha'}}{1+u^2}\,du + \frac{|x-x_0|^{\alpha'}}{a^{\alpha'}}\int \frac{du}{1+u^2}\Big).$$
Since $\alpha' < 1$, (10.13) follows immediately. The proof of (10.14) is similar, since
$$(\Delta_a f)'(x) = \int_{b\in\mathbb{R}} W(a,b)\,\psi'\Big(\frac{x-b}{a}\Big)\,\frac{db}{a^2}. \qquad \square$$

Proof of Theorem 10.1. We use (10.7) to write
$$f(x) - f(x_0) = \int_{a>0} \big[(\Delta_a f)(x) - (\Delta_a f)(x_0)\big]\,\frac{da}{a}.$$
For $a \ge |x-x_0|$, using the mean value theorem and (10.14), we have
$$\Big|\int_{a \ge |x-x_0|} \big[(\Delta_a f)(x) - (\Delta_a f)(x_0)\big]\,\frac{da}{a}\Big| \le C|x-x_0|^\alpha.$$
For $a < |x-x_0|$, we estimate $(\Delta_a f)(x)$ and $(\Delta_a f)(x_0)$ separately using (10.13), so that
$$\Big|\int_{a<|x-x_0|} \big[(\Delta_a f)(x) - (\Delta_a f)(x_0)\big]\,\frac{da}{a}\Big| \le C\int_{a<|x-x_0|} |x-x_0|^{\alpha'}\,a^{\alpha-\alpha'}\,\frac{da}{a} \le C|x-x_0|^\alpha.$$
Note that this is the point in the proof where it is crucial to have $\alpha' < \alpha$. $\square$

Observe that (10.12) is stronger than (10.8) since $\alpha' < \alpha$. It often happens, however, that the large wavelet coefficients that determine the regularity at $x_0$ are in a cone $\frac{|b-x_0|}{a} \le C$, in which case the right-hand sides of (10.8) and (10.12) are of the same order of magnitude, and the wavelet criterion is sharp. This is true for the Weierstrass function, as will be shown below.

Having established Theorems 10.1 and 10.2, we can easily prove a result mentioned in section 9.4, namely, that the Hölder exponent of the Weierstrass function
$$\mathcal{W}(t) = \sum_{n=0}^{\infty} B^n\cos(A^n t), \qquad 0 < B < 1,\ AB > 1,$$
is $\frac{\log 1/B}{\log A}$ everywhere. The first step is to compute the wavelet transform of $\mathcal{W}$ with the wavelet that was used in Lemma 10.1. This is a straightforward computation, and we have
$$W_{\mathcal{W}}(a,b) = \frac{1}{2}\sum_{n=0}^{\infty} B^n\,\widehat{\psi}(A^n a)\,e^{iA^n b}. \tag{10.15}$$
Since the support of $\widehat{\psi}$ is contained in $[A^{-1}, A]$, the only nonzero terms in the sum occur when $-1 - \frac{\log a}{\log A} < n < 1 - \frac{\log a}{\log A}$, and from this it follows that $|W_{\mathcal{W}}(a,b)| \le C a^{\frac{\log 1/B}{\log A}}$. This proves that $\mathcal{W} \in C^{\frac{\log 1/B}{\log A}}(\mathbb{R})$ (by Theorem 10.1) and that Theorem 10.2 applies. Define $a_n = A^{-n}$ for $n \ge 1$. Then for $a = a_n$ there is only one term in the right-hand side of (10.15), and
$$W_{\mathcal{W}}(a_n,b) = \frac{1}{2}B^n\,\widehat{\psi}(1)\,e^{iA^n b}.$$
Using Theorem 10.2, we have
$$\alpha(x_0) = \liminf_{a\to0,\ b\to x_0} \frac{\log|W_{\mathcal{W}}(a,b)|}{\log(a + |b-x_0|)} \le \lim_{n\to\infty} \frac{\log|W_{\mathcal{W}}(a_n,b)|}{\log a_n} = \frac{\log B}{\log A^{-1}} = \frac{\log 1/B}{\log A}.$$
Since $\mathcal{W} \in C^{\frac{\log 1/B}{\log A}}(\mathbb{R})$, this proves that $\alpha(x_0) = \frac{\log 1/B}{\log A}$ everywhere.

10.4 The Riemann function

In 1916, G. H. Hardy proved that the Riemann function
$$\mathcal{R}(x) = \sum_{n=1}^{\infty} \frac{1}{n^2}\sin(\pi n^2 x)$$
is at best $C^{3/4}(x_0)$ in the following three cases [136]:⁸

⁸This proof uses results from a paper Hardy wrote with J. E. Littlewood in 1914 [137].
(a) $x_0$ is irrational.
(b) $x_0 = \frac{p}{q}$ with $p \equiv 0 \pmod 2$ and $q \equiv 1 \pmod 4$.
(c) $x_0 = \frac{p}{q}$ with $p \equiv 1 \pmod 2$ and $q \equiv 2 \pmod 4$.

Hardy's proof is a precursor of Lemma 10.3. To obtain the irregularity of $\mathcal{R}$ at a given point, Hardy showed that a wavelet transform of $\mathcal{R}$ is "large" near that point. Of course, Hardy did not use wavelet language, but the "ancestor" of the wavelet transform he used is a perfectly good one, namely, the derivative of the Poisson kernel. Two problems remained open after Hardy's work: the question of differentiability at the rationals $\frac{p}{q}$ where $p$ and $q$ are odd, and the determination of the exact Hölder exponents for all $x$. Serge Lang suggested the first of these problems to an undergraduate class in December 1967, and to the general surprise of the mathematics world, Joseph L. Gerver, one of Lang's sophomore students, resolved the problem by proving the following unexpected result: If $x_0 = \frac{p}{q}$, where $p$ and $q$ are odd, then $\mathcal{R}$ is differentiable at $x_0$ and $\mathcal{R}'(x_0) = -\frac{1}{2}$. He then showed that $\mathcal{R}$ is differentiable at no other points, and the problem of the differentiability of the Riemann function was completely settled (see [127] and [128]).

We will follow Itatsu [149] and give a direct proof, based on Fourier analysis, of Gerver's result. This method will actually give us a very precise description of the oscillating behavior of $\mathcal{R}$ near these rationals.

For the irrationals, we will reformulate Hardy's method and, following Duistermaat [97], obtain the best possible "irregularity" at the irrationals. Hardy's method cannot yield information about the "regularity" at those points, since this necessitates Theorem 10.1, which was first proved in 1988 [151]. But we will see that Hardy's method and Theorem 10.1 give the exact Hölder exponent at every point.

10.4.1 Hölder regularity at irrationals

Following a variant of Hardy's method, we use the wavelet analysis proposed by Lusin (section 2.6). Thus we take
$$\psi(x) = \frac{1}{\pi}\,\frac{1}{(x+i)^2}$$
to be our analyzing wavelet.
It is easy to check that $\psi \in L^1(\mathbb{R})$ with $\int |\psi(x)|\,dx = 1$, that $\int \psi(x)\,dx = 0$, and that $\psi$ satisfies the conditions of Theorem 10.1. We begin by computing the wavelet transform $W_{\mathcal{R}}(a,b) = \langle \mathcal{R}, \psi_{(a,b)}\rangle$. For this, we define the function $T(x) = \mathcal{R}(x) - iS(x)$, where $S(x) = \sum_{n=1}^{\infty} \frac{1}{n^2}\cos(\pi n^2 x)$. $T$ has an analytic extension
$$T(z) = -i\sum_{n=1}^{\infty} \frac{1}{n^2}\,e^{i\pi n^2 z}$$
in the upper half-plane $z = x+iy$, $y > 0$, where it is uniformly bounded by $\sum_{n=1}^{\infty}\frac{1}{n^2}$. Furthermore, $\overline{T(x)} = \mathcal{R}(x) + iS(x)$. Thus,
$$W_{\mathcal{R}} = \frac{1}{2}\big(W_T + W_{\overline{T}}\big).$$
It is particularly easy to compute the wavelet transform of $T$. In fact,
$$W_T(a,b) = \frac{a}{\pi}\int_{-\infty}^{\infty} \frac{T(x)\,dx}{(x - (b+ia))^2} = 2ia\,T'(b+ia), \qquad a > 0.$$
This is just the form of Cauchy's theorem that says
$$f'(\zeta) = \frac{1}{2\pi i}\int \frac{f(z)\,dz}{(z-\zeta)^2}$$
whenever $f$ is holomorphic and bounded in the upper half-plane and $\operatorname{Im}\zeta > 0$. A similar argument shows that
$$W_{\overline{T}}(a,b) = \frac{a}{\pi}\int_{-\infty}^{\infty} \frac{\overline{T(x)}\,dx}{(x-(b+ia))^2} = 0, \qquad a > 0.$$
Thus we have
$$W_{\mathcal{R}}(a,b) = \frac{1}{2}W_T(a,b) = ia\,T'(b+ia).$$
Term-by-term differentiation shows that $T'(z) = \pi\sum_{n=1}^{\infty} e^{i\pi n^2 z}$, so we have
$$W_{\mathcal{R}}(a,b) = ia\,T'(b+ia) = \frac{i\pi a}{2}\big(\theta(b+ia) - 1\big), \tag{10.16}$$
where $\theta$ is Jacobi's Theta function defined by
$$\theta(z) = \sum_{n\in\mathbb{Z}} e^{i\pi n^2 z}, \qquad \operatorname{Im} z > 0. \tag{10.17}$$
We know from Lemma 10.3, Theorem 10.1, and Theorem 10.2 that one way to determine the regularity of $\mathcal{R}$ at $x_0$ is to investigate the behavior of $\theta(z)$ in a neighborhood of $x_0$. To carry out this program, it is necessary to understand how $\theta(z)$ is transformed under a group of transformations $z \mapsto \gamma(z)$ known as the theta modular group. This group is defined by
$$\gamma(z) = \frac{rz+s}{qz-p}, \tag{10.18}$$
where $rp + sq = -1$, $r, s, p, q$ are integers, and the matrix
$$\begin{pmatrix} r & s \\ q & p \end{pmatrix} \text{ is of the form } \begin{pmatrix} \text{even} & \text{odd} \\ \text{odd} & \text{even} \end{pmatrix} \text{ or } \begin{pmatrix} \text{odd} & \text{even} \\ \text{even} & \text{odd} \end{pmatrix}.$$

A discussion of the theta modular group and its action on the Jacobi Theta function would be too much of a detour from our main objective. We will quote the needed results and refer the reader to the paper [97] by Duistermaat for a complete development. It is easy to see that $\gamma : \mathbb{C} \to \mathbb{C}$ maps the upper half-plane into itself. It is slightly more involved to establish the following result: When $\gamma$ belongs to the theta modular group, $\theta$ is transformed as follows:
$$\theta(z) = \theta(\gamma(z))\,e^{im\pi/4}\,q^{-1/2}\Big(z - \frac{p}{q}\Big)^{-1/2}, \tag{10.19}$$
where $m$ is an integer that is a rather complicated function of $\gamma$. This formula, which is the cornerstone for the study of the Theta function, can be proved by first showing that the theta modular group is generated by the translation $z \mapsto z+2$ and by the inversion $z \mapsto -\frac{1}{z}$. With this established, it is only necessary to verify (10.19) for these two transforms. The first transformation just expresses the periodicity of $\theta$. The second can be obtained by applying Poisson's summation formula
$$\sum_{n\in\mathbb{Z}} f(n) = \sum_{n\in\mathbb{Z}} \widehat{f}(2\pi n)$$
(which holds at least for all $f$ in the Schwartz class) to the Gaussian $x \mapsto e^{-\pi yx^2}$, $y > 0$. This yields
$$\sqrt{y}\sum_{n\in\mathbb{Z}} e^{-\pi n^2 y} = \sum_{n\in\mathbb{Z}} e^{-\pi n^2/y}.$$
By extending this relation analytically to all of the upper half-plane, $z = x+iy$, $y > 0$, we have
$$\theta\Big(-\frac{1}{z}\Big) = \Big(\frac{z}{i}\Big)^{1/2}\theta(z).$$
We will be using the fact that $\theta(z) \to 1$ as $y \to +\infty$, uniformly in $x$, where $z = x+iy$. In fact,
$$|\theta(z) - 1| \le 2\sum_{n=1}^{\infty} e^{-\pi n^2 y} \le \frac{2e^{-\pi y}}{1 - e^{-\pi y}}. \tag{10.20}$$
Our first result is that $\alpha(\mathcal{R},x_0) = \frac{1}{2}$ when $x_0 = \frac{p}{q}$ and $p$ and $q$ are not both odd. In this case, it is easy to show that there is a $\gamma$ in the theta modular group that maps $x_0$ to infinity. Take $b + ia = \frac{p}{q} + ia$. We are going to examine the behavior of $|W_{\mathcal{R}}(a,b)|$ as $a \to 0$. From equations (10.16) and (10.19), we see that
$$|W_{\mathcal{R}}(a,b)| = \frac{\pi}{2}\,a^{1/2}\Big|\theta\Big(\frac{r}{q} + \frac{i}{q^2a}\Big)\,e^{im\pi/4}(iq)^{-1/2} - a^{1/2}\Big|. \tag{10.21}$$
The estimate (10.20) implies that
$$\theta\Big(\frac{r}{q} + \frac{i}{q^2a}\Big)\,e^{im\pi/4}(iq)^{-1/2} - a^{1/2} \longrightarrow e^{im\pi/4}(iq)^{-1/2} \quad\text{as } a \to 0,$$
and this implies that
$$\lim_{a\to0} \frac{\log|W_{\mathcal{R}}(a,b)|}{\log a} = \frac{1}{2}.$$
The result follows from Theorem 10.2: We proved before that $\mathcal{R} \in C^{1/2}(\mathbb{R})$ and, hence, that Theorem 10.2 applies. We have just shown that
$$\liminf_{a\to0,\ b\to x_0} \frac{\log|W_{\mathcal{R}}(a,b)|}{\log(a+|b-x_0|)} \le \frac{1}{2},$$
so from Theorem 10.2, $\alpha(\mathcal{R},x_0) \le \frac{1}{2}$. On the other hand, $\mathcal{R} \in C^{1/2}(\mathbb{R})$ implies that $\alpha(\mathcal{R},x_0) \ge \frac{1}{2}$. Thus, $\alpha(\mathcal{R},x_0) = \frac{1}{2}$ whenever $x_0 = \frac{p}{q}$ and $p$ and $q$ are not both odd. The case where $p$ and $q$ are both odd will be treated separately, but first we are going to determine $\alpha(\mathcal{R},x_0)$ when $x_0$ is irrational.
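Formula (10.16) makes the behavior of $W_{\mathcal{R}}$ easy to explore numerically, since the theta series converges very fast for $\operatorname{Im} z > 0$. The sketch below is our own check (not from the text): it truncates (10.17) and confirms that $\log|W_{\mathcal{R}}(a,1/2)|/\log a \to \frac{1}{2}$ at the rational $x_0 = \frac{1}{2}$, where $p = 1$ and $q = 2$ are not both odd.

```python
import numpy as np

def theta(z, nmax=2000):
    # Jacobi theta function (10.17), truncated; terms decay like exp(-pi*n^2*Im z)
    n = np.arange(-nmax, nmax + 1)
    return np.exp(1j * np.pi * n**2 * z).sum()

def W_R(a, b):
    # wavelet transform of Riemann's function with the Lusin wavelet, eq. (10.16)
    return 0.5j * np.pi * a * (theta(b + 1j * a) - 1)

ratios = [np.log(abs(W_R(a, 0.5))) / np.log(a) for a in (1e-3, 1e-4, 1e-5)]
print(ratios)   # approaches 1/2 as a -> 0
```

The slow drift toward $\frac{1}{2}$ reflects the constant $\frac{\pi}{2}q^{-1/2}$ in front of $a^{1/2}$, which is irrelevant in the logarithmic limit.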
If $x_0$ is irrational, then it cannot be mapped to infinity with an element of the theta modular group. However, it is possible to choose a rational $\frac{p}{q}$ very close to $x_0$ and map it to infinity. This will provide an estimate of $\theta(z)$ for points $z$ near $\frac{p}{q}$ and hence an estimate of $\theta(z)$ for points near $x_0$. But what do we mean by rationals "very close to $x_0$"? In this case, we mean the rationals given by the continued fraction⁹ expansion of $x_0$. This is a sequence of rationals $\frac{p_n}{q_n}$ such that
$$\Big|x_0 - \frac{p_n}{q_n}\Big| \le \frac{1}{q_n^2}. \tag{10.22}$$
A result from the theory of continued fractions states that no rationals other than those in this sequence approximate $x_0$ better. However, some irrational numbers are much better approximated by their continued fraction expansion than is indicated by (10.22). The exponent 2 of $q_n$ in (10.22) is the "worst possible." A degree of approximation to $x_0$ by rationals can be defined by considering the set
$$T(x_0) = \Big\{\tau \ \Big|\ \Big|x_0 - \frac{p_n}{q_n}\Big| \le \frac{1}{q_n^\tau}\Big\}, \tag{10.23}$$
where the inequality in (10.23) must hold for infinitely many $n$ such that $p_n$ and $q_n$ are not both odd. (We are only interested in these $\frac{p_n}{q_n}$, since they are the ones that can be mapped to infinity.) Then $\tau(x_0)$ is defined by
$$\tau(x_0) = \sup_{\tau\in T(x_0)} \tau.$$
Note that $\tau(x_0)$ can be $+\infty$. This is the case, for example, when $x_0 = \sum_{n=1}^{\infty} 2^{-n!}$. On the other hand, $\tau(x_0) \ge 2$. (The reference for this and other results about continued fractions will always be [138].)

We are now going to show that
$$\alpha(\mathcal{R},x_0) \le \frac{1}{2} + \frac{1}{2\tau(x_0)} \tag{10.24}$$
whenever $x_0$ is irrational. With what we have already shown, this proves that $\frac{1}{2} \le \alpha(\mathcal{R},x_0) \le \frac{3}{4}$. The proof is similar to the one given for $x_0 = \frac{p}{q}$, $p$ and $q$ not both odd. The first step is to choose a $\gamma_n$ in the theta modular group that maps $\frac{p_n}{q_n}$ to infinity when $p_n$ and $q_n$ are not both odd. (In what follows we are only considering $\frac{p_n}{q_n}$ where $p_n$ and $q_n$ are not both odd.) A simple computation using the fact that $r_np_n + s_nq_n = -1$ shows that this is always possible. Now define
$$z_n = b_n + ia_n = \frac{p_n}{q_n} + i\Big|x_0 - \frac{p_n}{q_n}\Big|.$$
We are going to examine the behavior of $|W_{\mathcal{R}}(a,b)|$ at the points $z_n = b_n + ia_n$. For this, it is convenient to define $\tau_n$ by
$$\Big|x_0 - \frac{p_n}{q_n}\Big| = \frac{1}{q_n^{\tau_n}}.$$

⁹For information about continued fractions we recommend An Introduction to the Theory of Numbers by G. H. Hardy and E. M. Wright [138].
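The convergents $p_n/q_n$ and the exponents $\tau_n$ are easy to compute. The sketch below is an illustration of ours, not taken from the text: it generates the convergents of $\sqrt{2}$ by the standard recursion $p_n = k_n p_{n-1} + p_{n-2}$, $q_n = k_n q_{n-1} + q_{n-2}$ and verifies inequality (10.22); for $\sqrt{2}$ the exponents $\tau_n$ approach the worst-possible value 2.

```python
import math

def convergents(x, count):
    # partial quotients k_n via the Gauss map, then the p/q recursion
    ks, t = [], x
    for _ in range(count):
        k = math.floor(t)
        ks.append(k)
        t = 1.0 / (t - k)
    p, q = [ks[0], ks[1] * ks[0] + 1], [1, ks[1]]
    for k in ks[2:]:
        p.append(k * p[-1] + p[-2])
        q.append(k * q[-1] + q[-2])
    return list(zip(p, q))

x0 = math.sqrt(2)
cs = convergents(x0, 8)          # [(1, 1), (3, 2), (7, 5), ...]
errs = [abs(x0 - p / q) for p, q in cs]
# tau_n defined by |x0 - p_n/q_n| = q_n^(-tau_n), for q_n > 1
taus = [-math.log(e) / math.log(q) for (p, q), e in zip(cs[1:], errs[1:])]
print(cs[-1], taus[-1])
```

A Liouville-type number such as $\sum 2^{-n!}$ would instead produce exponents $\tau_n$ growing without bound, which is how $\tau(x_0) = +\infty$ arises.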
Since $|x_0 - \frac{p_n}{q_n}| \le q_n^{-2}$ for all $n$, it is clear that $\tau_n \ge 2$ for all $n$. Armed with this notation and equation (10.21), we have
$$|W_{\mathcal{R}}(a_n,b_n)| = \frac{\pi}{2}\,a_n^{1/2}\Big|\theta\Big(\frac{r_n}{q_n} + \frac{i}{q_n^2a_n}\Big)\,e^{im_n\pi/4}(iq_n)^{-1/2} - a_n^{1/2}\Big| = \frac{\pi}{2}\,q_n^{-(\tau_n+1)/2}\Big|\theta\Big(\frac{r_n}{q_n} + iq_n^{\tau_n-2}\Big)\,e^{im_n\pi/4}(i)^{-1/2} - q_n^{(1-\tau_n)/2}\Big|.$$
It follows from (10.20) ($|\theta(z)-1| \le \frac{1}{2}$ if $\operatorname{Im} z \ge 1$) and the fact that $\tau_n \ge 2$ that
$$\frac{1}{4} \le \Big|\theta\Big(\frac{r_n}{q_n} + iq_n^{\tau_n-2}\Big)\,e^{im_n\pi/4}(i)^{-1/2} - q_n^{(1-\tau_n)/2}\Big| \le \frac{7}{4}$$
for all sufficiently large $n$. We now wish to estimate
$$\liminf_{n\to\infty} \frac{\log|W_{\mathcal{R}}(a_n,b_n)|}{\log(a_n + |b_n - x_0|)}.$$
In our notation, $\log(a_n + |b_n - x_0|) = \log 2q_n^{-\tau_n}$, so we have
$$\frac{\log|W_{\mathcal{R}}(a_n,b_n)|}{\log(a_n + |b_n - x_0|)} = \Big(\frac{1}{2} + \frac{1}{2\tau_n}\Big)\Big(1 + \frac{\log 2}{\log q_n^{-\tau_n}}\Big)^{-1} + \frac{\log\Big|\frac{\pi}{2}\Big(\theta\big(\frac{r_n}{q_n} + iq_n^{\tau_n-2}\big)\,e^{im_n\pi/4}(i)^{-1/2} - q_n^{(1-\tau_n)/2}\Big)\Big|}{\log 2q_n^{-\tau_n}}.$$
The second term on the right-hand side of this equation tends to zero as $n \to \infty$, and we conclude that
$$\liminf_{n\to\infty} \frac{\log|W_{\mathcal{R}}(a_n,b_n)|}{\log(a_n + |b_n-x_0|)} \le \frac{1}{2} + \frac{1}{2\tau(x_0)}.$$
Theorem 10.2 applies, and since
$$\liminf_{a\to0,\ b\to x_0} \frac{\log|W_{\mathcal{R}}(a,b)|}{\log(a+|b-x_0|)} \le \liminf_{n\to\infty} \frac{\log|W_{\mathcal{R}}(a_n,b_n)|}{\log(a_n + |b_n-x_0|)},$$
we have
$$\frac{1}{2} \le \alpha(\mathcal{R},x_0) \le \frac{1}{2} + \frac{1}{2\tau(x_0)}, \tag{10.25}$$
which is what we wished to prove. Thus $\mathcal{R}$ is certainly not smoother than $\frac{1}{2} + \frac{1}{2\tau(x_0)}$ at an irrational $x_0$. It takes more work, involving the investigation of several cases, but it can be shown using Theorem 10.1 that $\alpha(\mathcal{R},x_0) = \frac{1}{2} + \frac{1}{2\tau(x_0)}$. This was proved by Stéphane Jaffard, and the details can be found in [152].

We are going to investigate in the next section the behavior of $\mathcal{R}(x)$ at the rationals $x = \frac{p}{q}$ where both $p$ and $q$ are odd. But before doing so, we wish to return to the multifractal formalism, which was mentioned at the beginning of this chapter and elsewhere. A classical result from number theory known as Jarník's theorem (see [101] for a proof) gives the exact Hausdorff dimension of the set of points having a given order $\tau$ of approximation by rationals, namely, $\frac{2}{\tau}$.
Thus, the spectrum of singularities of $\mathcal{R}$ is
$$d(h) = 4h - 2 \quad\text{if } h \in \Big[\frac{1}{2},\frac{3}{4}\Big], \qquad d\Big(\frac{3}{2}\Big) = 0. \tag{10.26}$$
For all other values of $h$, the set is empty, and hence its Hausdorff dimension is zero. The exponent $\frac{3}{2}$ corresponds to the rationals where $\mathcal{R}$ is differentiable. The Hölder exponent is actually $\frac{3}{2}$ at these points, as will be seen below. The increasing part of the spectrum (corresponding to $h \in [\frac{1}{2},\frac{3}{4}]$) can be recovered by the multifractal formalism, which is thus valid for Riemann's function [153]. This is significant, since $\mathcal{R}$ contains chirps.

10.4.2 Riemann's function near $x_0 = 1$

The last task in this chapter is to study Riemann's function near the points that were not discussed in the previous section, namely, the rationals $\frac{p}{q}$ with $p$ and $q$ both odd. Recall that Gerver was the first to show that $\mathcal{R}' = -\frac{1}{2}$ at these points. To simplify the notation, we will discuss only the case $\frac{p}{q} = 1$. The study near the other rationals can be related to this case by mapping $\frac{p}{q}$ onto 1 with a member of the theta modular group. Also, instead of $\mathcal{R}$, we work with
$$S(x) = \sum_{n=1}^{\infty} \frac{e^{in^2x}}{n^2}.$$
We can take the imaginary part later, and to simplify notation we have dropped the factor $\pi$. Thus we are going to study $S(x)$ at $x = \pi$. This function satisfies the following recursion relation:
$$S(x+\pi) = \frac{1}{2}S(4x) - S(x). \tag{10.27}$$
We are now going to obtain an asymptotic expansion of $S(x)$ as $x \to 0$. We can restrict the values of $x$ to $x > 0$, since $S(-x) = \overline{S(x)}$. We have
$$S(x) = \sum_{n=1}^{\infty} \frac{e^{in^2x} - 1}{n^2} + \sum_{n=1}^{\infty} \frac{1}{n^2}. \tag{10.28}$$
Let $v(x) = \sum_{n=1}^{\infty} \frac{e^{in^2x}-1}{n^2}$, so that $v(x) = x\sum_{n=1}^{\infty} f(n\sqrt{x})$ with $f(t) = \frac{e^{it^2}-1}{t^2}$. The Fourier transform of $f$ has the following asymptotic expansion at infinity: For each fixed $K \ge 1$,
$$\widehat{f}(\xi) = e^{-i\xi^2/4}\Big(\sum_{k=1}^{K} \frac{c_k}{\xi^{2k}} + \frac{\varepsilon_K(\xi)}{\xi^{2K}}\Big),$$
where $\varepsilon_K(\xi)$ is bounded and $\varepsilon_K(\xi) \to 0$ as $\xi \to \infty$. Using Poisson's summation formula, we have
$$\sum_{n\in\mathbb{Z}} f(n\sqrt{x}) = \frac{1}{\sqrt{x}}\sum_{n\in\mathbb{Z}} \widehat{f}\Big(\frac{2\pi n}{\sqrt{x}}\Big) = \frac{1}{\sqrt{x}}\widehat{f}(0) + \frac{1}{\sqrt{x}}\sum_{n\ne0} e^{-i\pi^2n^2/x}\Big(\sum_{k=1}^{K} \frac{c_k\,x^k}{(2\pi n)^{2k}} + \varepsilon_K\Big(\frac{2\pi n}{\sqrt{x}}\Big)\frac{x^K}{(2\pi n)^{2K}}\Big), \tag{10.29}$$
and this is valid for each $K \ge 1$. By using equations (10.27), (10.28), and (10.29), we obtain the following asymptotic expansion for $S(\pi+x)$ as $x \to 0$:
$$S(\pi+x) = -\frac{\pi^2}{12} - \frac{ix}{2} + \sum_{k=1}^{K} z_k\,x^{k+1/2}\,g_k\Big(\frac{1}{x}\Big) + o\big(x^{K+1/2}\big), \tag{10.30}$$
where the $z_k$ are constants and each $g_k$ collects the oscillating terms $e^{-i\pi^2n^2/(4x)}$ and $e^{-i\pi^2n^2/x}$ coming from $\frac{1}{2}S(4x)$ and $S(x)$, respectively. This proves that $\mathcal{R}'(1) = -\frac{1}{2}$, which was first proved by Gerver. But the technique used here tells us more. It is clear from (10.30) that $\mathcal{R} \in C^{3/2}(1)$ and, in fact, that the Hölder exponent of $\mathcal{R}$ at $x = 1$ is exactly $\frac{3}{2}$. This technique also yields precise information about the oscillatory behavior of $\mathcal{R}$ near 1. Equation (10.30) shows that near $x = 1$, $\mathcal{R}$ "looks like" the chirp $x^{3/2}\sin\frac{1}{x}$ superimposed on a straight line with slope $-\frac{1}{2}$.

The $g_k$ have several interesting properties: They are periodic functions that belong to $C^{k-1/2}(\mathbb{R})$, and $\int g_k(x)\,dx = 0$ over a period. Perhaps more remarkable is their direct relation to Riemann's function. For example, for $k = 1$, $g_1$ is a constant multiple of
$$4S\Big(-\frac{\pi^2x}{4}\Big) - S(-\pi^2x).$$
The $g_k$ for $k > 1$ are similarly related to primitives of $S$.

We have recently received a paper from Joseph L. Gerver wherein he studies the differentiability, and the chirp behavior at rational points, of a related function [129]. Gerver's technique is similar to Itatsu's. We refer to it as a Fourier-type technique because it uses the Poisson formula. In fact, it is a variant of the Poisson formula that was found by Hardy and Littlewood.

10.5 Conclusions and comments

Our analysis of $\mathcal{R}$ for $x = 1$ is a direct Fourier method inspired by the paper [149] by Seiichi Itatsu.
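That analysis rests on the functional equation $S(x+\pi) = \frac{1}{2}S(4x) - S(x)$, which follows from splitting the series into even and odd $n$ (the even part is $\frac{1}{4}S(4x)$, and $e^{in^2\pi} = (-1)^n$). It is elementary to verify numerically from partial sums; the following sanity check is ours, not the book's:

```python
import numpy as np

def S(x, nmax=200000):
    # partial sum of S(x) = sum_{n>=1} exp(i n^2 x) / n^2; the tail is O(1/nmax)
    n = np.arange(1, nmax + 1, dtype=np.float64)
    return np.sum(np.exp(1j * n**2 * x) / n**2)

# check S(x + pi) = (1/2) S(4x) - S(x) at a few points
residuals = [abs(S(x + np.pi) - (0.5 * S(4 * x) - S(x))) for x in (0.3, 1.1, 2.5)]
print(residuals)   # each residual is at the level of the truncation error
```

The same partial sums can be used to visualize the chirp behavior of $\operatorname{Im} S$ near $\pi$, though the convergence there is too slow to read off the exponent $\frac{3}{2}$ reliably.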
This work leads us to the following comparison of wavelet and Fourier methods: The wavelet transform gives a general method for estimating pointwise Hölder regularity, but, in specific cases, a direct Fourier method may be more efficient and provide more information.

We mention, moreover, a general setting where wavelet methods fail: Condition (10.9) implies that $f$ has a positive uniform regularity in a neighborhood of
$x_0$. This excludes all instances of functions that have a dense set of discontinuities. Such functions are not just curiosities; they include a large and important class of stochastic processes, namely, the Lévy processes. These are processes with stationary, independent increments. They are multifractal and they satisfy the multifractal formalism, but wavelets offer no help for their analysis. In this case, one must return to a direct classical method (see [158]).

We indicated above that the multifractal formalism is valid for Riemann's function [153]. However, it was not easy to prove this result, and this example underlines a problem in this area: The derivation of the spectrum of singularities for a signal using the multifractal formalism will never be completely satisfactory, because it is necessary to verify that this formalism is valid for the signal or class of signals being analyzed. For Riemann's function and for a handful of other functions, it is possible to compute the spectrum of singularities directly. In the case of turbulence, one can dream of deriving the spectrum of singularities mathematically from the Navier-Stokes equations, but as anyone slightly familiar with the field knows, we have very few results about general solutions of these equations, so it seems that we are very far from being able to reach this goal.

A more modest and realistic program is to investigate the fractal nature of solutions of nonlinear partial differential equations that are mathematically simpler than the Navier-Stokes equations but that are "related" to these equations. Again results are scarce; however, there is at least one notable exception: The one-dimensional Burgers equation
$$\frac{\partial u}{\partial t} + \frac{\partial}{\partial x}\Big(\frac{u^2}{2}\Big) = 0 \qquad \big(u(x,t) : \mathbb{R}\times\mathbb{R}^+ \to \mathbb{R}\big)$$
has been suggested as a greatly simplified model of the Navier-Stokes equations in one dimension. J.
Bertoin proved that if the initial condition $u(x,0)$ is a Brownian motion, then the solution at time $t$ is a Lévy process, which, as noted above, is multifractal [35]. Thus, we have an example of a nonlinear partial differential equation that can develop multifractal solutions starting with a monofractal initial condition. This is the only example of this kind that we know of, and so the degree of generality of this phenomenon is not at all clear. We believe that it would be very instructive to generalize this result to three dimensions, since the Burgers equation in three dimensions is used to model the evolution of matter in the universe. Indeed, if one could prove that solutions of the three-dimensional Burgers equation are "generically" multifractal, this would provide a theoretical foundation for the many discussions about the multifractal nature of the distribution of matter in the universe (see, for example, [252]).

Finally, we note that only time-scale wavelets have been used in this chapter. This remark also applies to the analysis of function spaces. It is remarkable that, though many different kinds of wavelet expansions are available and used in signal and image processing, only the time-scale wavelets have the "right" mathematical properties that allow their use for applications inside mathematics, namely, the characterization of function spaces and the analysis of multifractal functions.
CHAPTER 11
Data Compression and Restoration of Noisy Images

11.1 Introduction

Wavelets have often been promoted as being the correct tool for processing nonstationary signals having strong transients. In contrast, Fourier analysis is the appropriate tool for studying stationary Gaussian processes. However, as Patrick Flandrin pointed out [110], being nonstationary is a negatively defined concept, and it is too broad to be mathematically useful; it is a jungle, a terra incognita waiting for proper exploration and clarification. Does this mean that our advertisement about wavelets and nonstationary signals belongs to the collection of unfulfilled claims made by the pioneers of the wavelet saga? Should we be pessimistic and conclude that wavelets have nothing to do with nonstationary signals?

Not at all. Thanks to the work of a group at the University of South Carolina, this debate was settled when they found the following result: There exist well-defined classes of signals that are characterized by the fact that their wavelet expansions are sparse. If the wavelet expansion of a signal is sparse, then an efficient approximation of the signal requires only a few terms of its wavelet expansion. This paves the way to efficient compression and transmission. Moreover, these classes are also characterized by optimal nonlinear rational approximation. Ronald DeVore, Björn Jawerth, P. Petrushev, and V. Popov delimited some precisely defined territories inside the jungle of nonstationary signals. These territories are new function spaces, and they happen to be nicely related to certain Besov spaces. (See Appendix D for the definition of Besov spaces and their characterization in terms of wavelets. See [203] for more complete details.) This fundamental discovery supports some of our pioneers' claims, and at the same time, DeVore's theorem draws a boundary line, which we illustrate with an example.
An otherwise smooth function with isolated singularities of the form $|t-t_0|^\alpha$ has a sparse wavelet expansion. (This example, which is true for arbitrarily small $\alpha > 0$, was used to support the original claim.) However, singularities along a curve in $\mathbb{R}^2$ are forbidden by the two-dimensional version of Theorem 11.1, since a function as simple as $\sup\{1 - x_1^2 - x_2^2,\,0\}$ does not have a sparse wavelet expansion in the strict sense of Theorem 11.1. Furthermore, we note that oscillating singularities such as $t\sin\frac{1}{t}$ are excluded from Theorem 11.1, since they too do not have sparse wavelet expansions in the strict sense.

These remarkable results will be described in the next section. Theorems 11.1, 11.2, and 11.3 characterize functions with sparse wavelet expansions. These characterizations are in terms of the ladder of Besov spaces and depend on several degrees of sparsity. We will see that they provide the background for David Donoho's work (section 11.3), much of which was done in collaboration with Iain Johnstone, Gérard
Kerkyacharian, and Dominique Picard (see [94]). Donoho's work is based on the DeVore model. What we mean is that the object $X$ (function, signal, or image) to be recovered can be modeled efficiently by a function belonging to a specified ball in a Besov space. The problem to be addressed is to recover $X$ from noisy data modeled by $Y = X + \sigma Z$, where $\sigma > 0$ is a small parameter and $Z$ is a standard white noise. This problem leads to a much harder one: The data are given by $Y = AX + \sigma Z$, where $A$ is a compact operator. In image processing, $A$ models the optics of the instrument used to obtain the image. This model is used in astronomy, as we will see in Chapter 12. An estimator $\hat{X}$ of $X$ is given by a linear or nonlinear functional $\Phi$ acting on the data $Y$: $\hat{X} = \Phi(Y)$. The expected discrepancy between $\hat{X}$ and $X$ will be compared with a power law $C\sigma^\alpha$ as $\sigma$ tends to zero. The optimal estimator is defined to be the one for which $\alpha$ is largest, irrespective of the constant $C$. The fundamental discovery made by Donoho and his coworkers is the following: Wavelet shrinkage (to be defined) yields an optimal estimator $\hat{X} = \Phi(Y)$, where optimality is challenged over all linear and nonlinear functionals $\Phi$ acting on the data. This result relies crucially on the fact that the wavelet series expansion of $X$ is sparse. In this sense, sparsity is responsible for optimal denoising. (Selected references to Donoho's work include [88], [90], [92], and [93].)

Some of the models that are currently used in image processing are discussed in section 11.4. These models amount to writing an image $f$ as a sum $u + v$, where $u$ is supposed to represent the important features of the image, while $v$ is intended to include everything else, such as the noise and textures. But what are these important features? Edges are strong candidates.
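Wavelet shrinkage itself is defined in section 11.3; as a preview, here is a minimal sketch of our own (not the book's algorithm in detail). It applies an orthonormal Haar transform to noisy samples $Y = X + \sigma Z$ of a piecewise-constant signal, soft-thresholds the detail coefficients at Donoho and Johnstone's universal threshold $\sigma\sqrt{2\log N}$, and inverts; the sparsity of the Haar expansion of $X$ is what makes the estimate accurate.

```python
import numpy as np

def haar(u):
    # orthonormal Haar analysis; returns coarse coefficient and detail bands (coarsest first)
    details = []
    while len(u) > 1:
        s = (u[0::2] + u[1::2]) / np.sqrt(2)
        d = (u[0::2] - u[1::2]) / np.sqrt(2)
        details.append(d)
        u = s
    return u, details[::-1]

def ihaar(s, details):
    # inverse transform: rebuild from the coarsest band to the finest
    for d in details:
        u = np.empty(2 * len(s))
        u[0::2] = (s + d) / np.sqrt(2)
        u[1::2] = (s - d) / np.sqrt(2)
        s = u
    return s

rng = np.random.default_rng(0)
N = 1024
t = np.linspace(0, 1, N)
X = np.sign(t - 0.3) + 0.5 * np.sign(t - 0.7)      # piecewise constant: sparse in Haar
sigma = 0.1
Y = X + sigma * rng.standard_normal(N)

s, det = haar(Y)
lam = sigma * np.sqrt(2 * np.log(N))               # universal threshold
det = [np.sign(d) * np.maximum(np.abs(d) - lam, 0) for d in det]   # soft shrinkage
X_hat = ihaar(s, det)

rmse_noisy = float(np.sqrt(np.mean((Y - X)**2)))
rmse_hat = float(np.sqrt(np.mean((X_hat - X)**2)))
print(rmse_noisy, rmse_hat)    # shrinkage reduces the error substantially
```

Only the few coefficients carrying the jumps survive the threshold; everything else, which is essentially pure noise, is set to zero.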
According to David Marr, evolution shaped the human visual system so that it is very sensitive to edges: We immediately recognize the shape of a shirt, but not necessarily the pattern drawn on it. The human eye needs a much longer time to distinguish one texture from another. Marr's scientific program led to the following conjecture: A correctly tuned wavelet shrinkage applied to an image $f$ yields the $u$ component and eliminates the $v$ component. In other words, wavelet shrinkage should be an edge detector. This conjecture will be discussed in section 11.4. Unfortunately, the class of functions (signals or images) whose wavelet expansions are sparse does not contain images, and this is certainly a limitation on the power of wavelet shrinkage in image processing. The good news is that a new basis called ridgelets shows promise of being able to yield representations of cartoon images that are sparser than those given by wavelet representations. A cartoon image is defined to be a piecewise smooth function with possible jump discontinuities across smooth Jordan curves. The construction of ridgelets and this new research are discussed in section 11.5.

11.2 Nonlinear approximation and sparse wavelet expansions

Historically, nonlinear approximation developed from the work of several mathematicians in Central and Eastern Europe on rational approximation. Let $f$ be a function defined on a closed and bounded (compact) interval $I$. To fix our ideas, assume that $f$ belongs to $L^2(I)$. For each positive integer $N$, one looks for a rational fraction $g_N(x) = P(x)/Q(x)$ of degree $\le N$ (defined as the maximum of the degrees of the polynomials $P$ and $Q$) that gives the best approximation to $f$ in the $L^2(I)$ norm. Thus one seeks, for each value of $N$, to minimize $\|f - g_N\|_2$ with the constraints $g_N = P/Q$, $\deg P \le N$, and $\deg Q \le N$. No hypothesis is
DATA COMPRESSION AND RESTORATION OF NOISY IMAGES 169

made about the position of the poles of g_N. Since the set R_N of rational fractions g_N = P/Q that are examined in seeking the minimum is not a linear subspace of L²(I), the algorithm defining the best approximation is not linear. Furthermore, the function g_N is not unique; it is, however, unique if the approximation is measured in the uniform (L^∞) norm. (See [226] for a complete discussion of rational approximation.)

The goal is to represent rather complicated functions with only a few numbers, namely, the 2N + 1 coefficients of the polynomials P and Q. For this to make sense, it is necessary to know how to control the approximation error. Thus one tries to estimate r_N(f) = ||f − g_N||₂ as a function of N for large N when g_N provides the best rational approximation of f. Rational approximation will offer an advantage over polynomial approximation only if (for an interesting set of functions) r_N(f) → 0 as N → ∞ much more rapidly in the case of rational approximation than in the case of polynomial approximation. When this is the case, one can represent the function f, with an acceptable error, using very few coefficients. This problem is also studied when the L² norm is replaced with other functional norms such as the L^p norm or the uniform norm. Thus, we are concerned with data compression based on a representation adapted to the problem. In contrast to what happens in polynomial approximation, the sequence of errors r_N(f) = ||f − g_N||_p can decrease rapidly as N → ∞ without f being regular on I in the usual sense.

We are going to consider an instructive example studied by D. J. Newman in 1964 [218]. If one tries to approximate the function f(x) = |x| on [−1,1] by a polynomial P_N of degree N, the best possible uniform approximation yields

sup_{x∈[−1,1]} |f(x) − P_N(x)| ≤ γ/N    (11.1)

for a γ > 0. The order of approximation cannot be better because of the corner in the graph of f.
Newman made the remarkable observation that if we allow rational fractions P_N/Q_N of degree N, the best order of approximation¹⁰ becomes

sup_{x∈[−1,1]} | |x| − P_N(x)/Q_N(x) | ≤ C e^{−π√N},    (11.2)

while the number of parameters is only doubled. Thus, to transmit this very simple signal (the graph of f), rational fractions are much better than polynomials. Approximation by polynomials is linear: Polynomials of degree N form a linear space, and the best approximation of the sum of two functions is the sum of the approximations. Approximation by rational fractions of degree N is not linear: The sum of two rational fractions of degree N usually has degree 2N. The function f(x) = |x| is an example where rational approximation accelerates convergence; another example, which was mentioned in the introduction, is the function |x|^α, α > 0. An example where rational approximation offers no decisive advantage is given by the chirp f(x) = x sin(1/x). The proofs and discussion of these striking phenomena have been presented in the work of J. Peetre, V. Peller, A. Pekarskii, P. Petrushev, and V. Popov. Results on nonlinear approximation were later extended to the multidimensional case by DeVore, Jawerth, and Popov [81].

¹⁰This particular upper bound was obtained by N. S. Vjacheslavov in 1975; D. J. Newman proved (11.2) with an exponent different from π. See [226] for a discussion of these results.
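The contrast between (11.1) and (11.2) is easy to test numerically. The sketch below (our illustration, not part of the text) implements the explicit rational approximant from Newman's 1964 paper, r(x) = x(p(x) − p(−x))/(p(x) + p(−x)) with p(x) = ∏_{k=0}^{N−1}(x + ξ^k) and ξ = e^{−1/√N}, whose uniform error is at most 3e^{−√N}:

```python
import math

def newman_r(x, N):
    # Newman's rational approximant to |x| on [-1, 1]:
    # r(x) = x * (p(x) - p(-x)) / (p(x) + p(-x)),
    # with p(x) = prod_{k=0}^{N-1} (x + xi^k) and xi = exp(-1/sqrt(N)).
    xi = math.exp(-1.0 / math.sqrt(N))
    p = lambda t: math.prod(t + xi ** k for k in range(N))
    return x * (p(x) - p(-x)) / (p(x) + p(-x))

def max_error(N, samples=2001):
    # Sample the uniform error | |x| - r(x) | on a grid of [-1, 1].
    grid = (-1.0 + 2.0 * i / (samples - 1) for i in range(samples))
    return max(abs(abs(x) - newman_r(x, N)) for x in grid)

# The rational error decays like exp(-sqrt(N)); the best polynomial of
# the same degree cannot beat the gamma/N rate of (11.1).
for N in (16, 36, 64):
    print(N, max_error(N), 3 * math.exp(-math.sqrt(N)))
```

Note that r has degree N in the sense used in the text, so doubling the parameter count (numerator plus denominator) buys an exponential improvement in the rate.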
Peller obtained his pioneering results for the periodic case [224]. Let R_N be the collection of all rational functions P_N/Q_N, where deg P_N ≤ N, deg Q_N ≤ N, and Q_N(z) does not vanish on the unit circle z = e^{iθ}. We also denote by R_N the restrictions of these P_N/Q_N to the unit circle. For a continuous function f(e^{iθ}), we write

r_N(f) = dist_{L^∞}(f, R_N) = inf_{g∈R_N} ||f − g||_∞.    (11.3)

Peller's theorem says that r_N(f) = O(N^{−q}) for every q > 1 if and only if f belongs to all the Besov spaces B^{1/p,p}(L^p), 0 < p < 1. (A precise definition of these Besov spaces is given in Appendix D.) Roughly speaking, this condition means that the function f is absolutely continuous, that f′ belongs to L¹, that f″ belongs to L^{1/2}, that f‴ belongs to L^{1/3}, and so on. In some sense, f is infinitely differentiable, but its derivatives are measured in weaker and weaker norms. An example of such a function is f(θ) = |sin(θ − θ₀)|^α, where α > 0. This is a periodic version of the example mentioned in the introduction.

We are now going to discuss a variant of Peller's result in which one is approximating functions defined on ℝ. For this, we define R_N to be the collection of rational functions P/Q, where deg P ≤ deg Q ≤ N and Q(x) does not vanish on the real line. We wish to characterize those functions f of the real variable x for which

r_N(f) = O(N^{−q})    (11.4)

for every q > 1. This turns out to be equivalent to the wavelet expansion of f being sparse.

It is now time to define what we mean by sparse. Since we do not want the smoothness of the analyzing wavelet to be a restriction on the result, we will only consider the specific orthogonal wavelet basis 2^{j/2}ψ(2^j x − k), j, k ∈ ℤ, where ψ belongs to the Schwartz class. We denote by ψ_{j,k}(x) the function ψ(2^j x − k) and warn the reader that we are using the L^∞ normalization ||ψ_{j,k}||_∞ = ||ψ||_∞. (This is the same normalization that is used in Appendix D.)
Then the wavelet expansion

f(x) = Σ_{j,k} a(j,k) ψ_{j,k}(x)    (11.5)

of a function f that is continuous on ℝ and vanishes at infinity is said to be sparse if and only if

Σ_{j,k} |a(j,k)|^p < ∞  for all 0 < p < 1.    (11.6)

Observe that the smaller the exponent p, the stronger the requirement. In fact, at the limit p = 0 (which is not considered) there would be only a finite number of nonzero terms. Note also that (11.6) implies that the series in (11.5) converges uniformly to f. If condition (11.6) holds, then the absolute values of the wavelet coefficients, when arranged in decreasing order, form a sequence {c_n}, n ∈ ℕ*, that decreases rapidly as n → ∞. Indeed, if p = 1/k, k ∈ ℕ*, the two conditions (11.6) and c_{n+1} ≤ c_n imply that n c_n^{1/k} ≤ C_k for some C_k > 0. Thus c_n ≤ (C_k/n)^k for all n ∈ ℕ*, which means that the sequence decays rapidly.

The first variant of Peller's theorem follows an approach that was proposed by DeVore, Jawerth, and Popov [81].
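The rapid-decay argument in the preceding paragraph can be checked mechanically. In the sketch below (our illustration, with a hypothetical coefficient sequence a_n = e^{−n}), the nonincreasing rearrangement c_n satisfies n c_n^{1/k} ≤ C_k, hence c_n ≤ (C_k/n)^k, for every k:

```python
import math

# Hypothetical sparse coefficient sequence: a_n = exp(-n), so that
# sum |a_n|^p converges for every p > 0, as condition (11.6) requires.
coeffs = [math.exp(-n) for n in range(1, 200)]
c = sorted((abs(a) for a in coeffs), reverse=True)  # nonincreasing rearrangement

for k in (1, 2, 3):
    p = 1.0 / k
    C_k = sum(cn ** p for cn in c)  # the constant in condition (11.6)
    for n, cn in enumerate(c, start=1):
        # n * c_n^{1/k} is at most the sum of the n largest p-th powers,
        # so c_n <= (C_k / n)^k: decay faster than any polynomial rate.
        assert n * cn ** p <= C_k * (1 + 1e-12)
print("rearranged coefficients decay faster than n^{-k} for every k")
```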
Theorem 11.1. Let f be a continuous function defined on ℝ and assume that f vanishes at infinity. Then condition (11.4) holds if and only if the wavelet expansion of f is sparse.

We are going to outline a proof of half of the result, namely, that sparsity implies (11.4). The proof would be trivial if the wavelet ψ were a rational function P/Q, since a rearrangement of the partial sums of (11.5) would yield (11.4). Our tasks are thus (a) to write the wavelet as a series ψ(x) = Σ_{n=0}^∞ γ_n P_n(x)/Q_n(x), where, for q fixed, deg P_n ≤ deg Q_n ≤ q, and where Σ |γ_n|^p < ∞ for p > 0, and (b) to substitute this expansion of ψ in (11.5). Then we have

f(x) = Σ_{n=0}^∞ γ_n′ P_n(x)/Q_n(x),    (11.7)

where deg P_n ≤ deg Q_n ≤ q, and where Σ |γ_n′|^p < ∞ for p > 0. It takes only a moment of reflection to see that (11.7) yields (11.4). Finally, the decomposition of the wavelet is not a difficulty: Take P_n = 1 and Q_n(x) = 1 + (a_n x − b_n)², where a_n > 0 and b_n ∈ ℝ.

This part of the proof can be generalized to any dimension. However, the implication in the other direction is deeper, and it is not true in dimensions greater than one. The converse statement for one dimension relies on some beautiful estimates on rational approximation obtained by A. Pekarskii: For 0 < p < 1, there exists a constant C(p) such that for every pair of polynomials P, Q with deg P ≤ deg Q ≤ N, we have ||P/Q||_{B^{1/p,p}(L^p)} ≤ C(p) N^{1/p} ||P/Q||_∞. The reader should observe the similarity with Bernstein's inequalities.¹¹ Here, it is necessary to use the homogeneous Besov norms (see Appendix D).

Before moving on, we present two examples. If α₁, …, α_m are m positive exponents and g(x) = exp(−x²), for example, then the function

f(x) = (c₁|x − x₁|^{α₁} + ⋯ + c_m|x − x_m|^{α_m}) g(x)

has a sparse wavelet expansion and Theorem 11.1 applies. This function has a finite number of isolated singularities and is smooth elsewhere. These properties are not sufficient to ensure that a function has a sparse expansion. A counterexample is the chirp f(x) = x sin(1/x).
Oscillating singularities (chirps) prevent sparse wavelet expansions.

An explanation of the ability of rational functions to mimic strong transients is given by an example. If q ≫ 1, then f_q(x) = (1 + iqx)^{−2} has a sharp peak at zero and almost vanishes away from the origin. This rational function is quite simple. However, f_q has one strong localized oscillation, just as a wavelet does.

We are now going to change direction slightly and discuss another kind of approximation. Instead of approximating f by elements of the set R_N of rational fractions with degrees less than or equal to N and with poles in arbitrary positions, we can approximate f by splines with free knots. In the simplest case, these are continuous, piecewise linear functions having N − 1 linear pieces. The N end points

¹¹We suggest G. G. Lorentz's book [175] for an introduction to Bernstein's results.
of the linear pieces are called knots; they are free because they can be positioned arbitrarily. Instead of using linear splines (which are only continuous) we can use cubic splines (which will be C²), or splines of arbitrary regularity. We must assume that the order (of regularity) r of the splines is sufficiently large, given the rate at which we want the error r_N between the signal and the best spline approximation to converge to zero.

As a first step, Petrushev compared the quality of rational approximation with that given by spline approximation using N free knots t₁ < t₂ < ⋯ < t_N in the interval I. These N free knots play the role of the N poles of P_N/Q_N. (The norm used for this approximation will be specified a little later when we describe DeVore's algorithm explicitly.) This search for the optimal positions of the N knots t₁, …, t_N is related to the problem of optimally segmenting a given signal (or function) on the interval I. One wants to determine where there are "natural" changes in a signal: We want to segment a function f into N − 1 functions f₁, f₂, …, f_{N−1} defined on intervals I₁, I₂, …, I_{N−1} forming a partition of I. Each f_j must be well approximated by a polynomial P_j on I_j, where the degrees of the P_j must be ≤ r + 1. A suitably truncated wavelet expansion gives this kind of approximation, if the wavelets are constructed with splines. The interested reader is invited to consult [86] for a more precise formulation.

If the function to be segmented is strongly oscillating, such as e^{iωx} for a large ω, it is clear that the optimal segmentation is a delusion: It amounts to decomposing the sinusoid into a sequence of restrictions to intervals of length 2π/ω, and this destroys the information given by the periodicity. The same remark is true for a chirp of the form |x|^α sin(1/x): It is poorly approximated by rational fractions or by free-knot splines.
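Once the number of pieces is fixed, the one-dimensional segmentation problem just described can be solved exactly by dynamic programming. The sketch below is our illustration, not the authors' algorithm: it fits degree-0 polynomials (constants) on each piece; higher-degree fits would follow the same pattern.

```python
def segment(samples, m):
    """Optimal split of `samples` into m pieces, each approximated by its
    mean (a degree-0 polynomial), minimizing the total squared error."""
    n = len(samples)
    pref, pref2 = [0.0], [0.0]
    for s in samples:
        pref.append(pref[-1] + s)
        pref2.append(pref2[-1] + s * s)

    def cost(i, j):  # squared error of the best constant fit on samples[i:j]
        k, mean = j - i, (pref[j] - pref[i]) / (j - i)
        return (pref2[j] - pref2[i]) - k * mean * mean

    INF = float("inf")
    best = [[INF] * (m + 1) for _ in range(n + 1)]
    cut = [[0] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for j in range(1, n + 1):
        for pieces in range(1, min(j, m) + 1):
            for i in range(pieces - 1, j):
                e = best[i][pieces - 1] + cost(i, j)
                if e < best[j][pieces]:
                    best[j][pieces], cut[j][pieces] = e, i
    knots, j = [], n
    for pieces in range(m, 0, -1):
        j = cut[j][pieces]
        knots.append(j)
    return best[n][m], sorted(knots[:-1])  # interior knots only

# The "natural" changes of a piecewise-constant signal are recovered as knots.
signal = [0.0] * 10 + [1.0] * 5 + [-1.0] * 10
error, knots = segment(signal, 3)
print(error, knots)  # prints: 0.0 [10, 15]
```

As the text warns, this search is only meaningful for signals with genuine transitions; on a pure sinusoid it merely chops the period into arbitrary slices.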
This second reading of the best approximation problem allows it to be formulated in any dimension. For instance, in two dimensions, an important problem in numerical analysis is to obtain optimal meshes, say, of the surface of an airplane to study its simulated flight. It is clear that the optimal mesh has to be strongly inhomogeneous. For example, one takes relatively few samples on large flat surfaces. One can, for a given two-dimensional image, look for the optimal triangulation using N vertices and construct the approximation g_N to f that minimizes ||f − g_N||₂ and that is piecewise affine with respect to this triangulation. However, in this case, we do not know how to relate the regularity of f to the rate at which r_N(f) = ||f − g_N||₂ tends to zero, and this difficulty comes from the fact that the eccentricities of the triangles can be arbitrarily large. It is only by limiting a priori the eccentricities of the triangles used in the adaptive triangulation that DeVore and Popov show us how to determine a suboptimal solution.

They went around this problem by proposing a definition of what could be an optimal "segmentation" for a function f of n real variables. Such a segmentation is provided by N disjoint n-cubes Q₁, …, Q_N, which play the role of the intervals [t_j, t_{j+1}) used by Petrushev. Starting with a fixed "bump" function θ on the unit cube, we let θ_Q be a translated and dilated copy of θ living on Q. Then, for a given function f belonging to L^p(ℝⁿ) and for each integer N ≥ 1, DeVore, Popov, and Jawerth were looking for an optimal choice of cubes Q₁, …, Q_N and constants c₁, …, c_N such that

ρ_N(f) = ||f − c₁θ_{Q₁} − ⋯ − c_Nθ_{Q_N}||_p

is as small as possible. They observe that a suboptimal solution can be obtained in any dimension by expanding f in a series of wavelets and simply retaining the N
largest coefficients. The corresponding partial sum gives the suboptimal solution. In other words, after having written this series, one determines, in a sense, the histogram of the coefficients, and one uses this histogram to realize an a posteriori compression. We thus return to the point of view expressed by Donoho, which is no longer to respect the natural order given by a particular series development. Rearranging the order of the terms can accelerate convergence of a wavelet series expansion.

We will go into a little more detail about DeVore's algorithm by describing it in two important cases: (1) The error is measured in the L² norm (mean-square error), and the functions we wish to approximate are not, a priori, even bounded. (2) The error is measured in the L^∞ norm (uniform approximation), and the functions we wish to approximate have a certain a priori regularity.

In the first case, we begin with an irregular function f belonging to L²(ℝⁿ), although the problem we wish to solve is, in fact, local. We try to estimate, for an arbitrary dimension n, ρ_N(f) = inf ||f − f_N||₂ in these two cases: (a) f_N belongs to the set Σ_N of free-knot splines, and (b) f_N(x) = Σ_{λ∈Λ_N} c(λ)ψ_λ(x), where here the index N means that the sum contains at most N terms and the ψ_λ are wavelets. The set Σ_N is defined by partitioning the domain D where one is working into N dyadic cubes Q₁, …, Q_N and by considering all linear combinations of the basic splines φ_{Q_j} fitted to the cubes Q_j. The quality of the approximation is measured in the L² norm using a given positive exponent β ∈ (0, 1/2). We wish to characterize those functions f in L²(ℝⁿ) for which ρ_N(f) is of the order N^{−β}. This property should be equivalent to some kind of smoothness property of f. Here is the precise statement of the result.

Theorem 11.2 (DeVore, Jawerth, Popov [81]). Assume that β is fixed with 0 < β < 1/2. Let q be defined by 1/q − 1/2 = β and write α = nβ.
Then the following three properties of a function f in L²(ℝⁿ) are equivalent.
(1) The function f belongs to the Besov space B^{α,q}(L^q).
(2) The wavelet coefficients c(λ) of f satisfy the condition Σ |c(λ)|^q < ∞. (The wavelets are assumed to form an orthonormal basis.)
(3) The errors ρ_N(f) in the nonlinear approximation satisfy ρ_N(f) = N^{−β} ε_N, where ε_N^q is summable.

Since 0 < β < 1/2, we have 1 < q < 2, which means that Σ |c(λ)|^q < ∞ is stronger than the obvious condition Σ |c(λ)|² = ||f||₂². In other words, B^{α,q}(L^q) is contained in L². Furthermore, the best approximation (using N wavelets) is given by the nonlinear thresholding rule whereby one saves only the N largest wavelet coefficients. This is an approximation scheme where the natural order of the wavelet series is upset, and the terms are rearranged in order of decreasing L² norms. If linear approximation were used, then ρ_N(f) = N^{−β} s_N with s_N ∈ l^q would imply that f belongs to the Sobolev space H^β. However, the Besov space B^{α,q}(L^q) is not contained in H^β; it is only in L². Here, nonlinear approximation allows one to "cheat" and pretend that everything works as if β derivatives of f belonged to L². (Recall that functions in H^β have β derivatives in L².)

This brings us to a remark about sparsity. Assuming that f ∈ L²(ℝⁿ), the condition Σ |c(λ)|^q < ∞, where 1 < q < 2, means that the wavelet expansion f(x) = Σ c(λ)ψ_λ(x) is "sparser" than we would expect from just knowing that f ∈ L²(ℝⁿ). This heuristic is based on the following weak inequalities, which we have already met in a slightly different form. For 0 < τ < 1, let N_τ be the number
of wavelet coefficients c(λ) such that |c(λ)| > τ. Then Σ |c(λ)|^q = C^q < ∞ implies that N_τ ≤ C^q τ^{−q}. This inequality is stronger than N_τ ≤ ||f||₂² τ^{−2}, which is the best we have knowing only that f is in L². Note, however, that this sparsity condition, which we will call q-sparse, is not nearly as strong as the one used in Theorem 11.1.

An interesting application of Theorem 11.2 is given by the characteristic function f = χ_Ω of a smooth bounded region Ω in ℝ². In this case, f belongs to all of the Besov spaces B^{α,q}(L^q) for α < 1/q and 1 ≤ q < ∞. Thus by Theorem 11.2, the errors in the nonlinear approximation satisfy ρ_N(f) = O(N^{−β}) for any 0 < β < 1/2. In this particular case, a direct check shows that ρ_N(f) = O(N^{−1/2}).

We are now going to describe the case where f is "regular" and the error for the nonlinear approximation is measured in the L^∞ norm rather than in the L² norm. Since the theorem is mainly used in image processing, we will give only the two-dimensional version of the result.

Theorem 11.3 (DeVore, Jawerth, Popov [81]). Assume that f ∈ B^{α,q}(L^q), where α > 2/q and 1 ≤ q < ∞. Then the optimal error ε_N(f) = inf ||f − f_N||_∞ measured in the uniform norm satisfies the inequality

ε_N(f) ≤ C N^{−α/2}.    (11.8)

This is a striking result, since (11.8) would characterize the Hölder space C^α if linear approximation were used. But B^{α,q}(L^q) is contained in C^β, where β = α − 2/q, and not in C^α. The weaker assumption about f is compensated by nonlinear approximation to give the decay (11.8).

The function f(x) = |x| exp(i|x|^{−1} − |x|²) provides an illustration of Theorem 11.3. This function belongs to the Hölder space C^{1/2} but not to C^α for α > 1/2. However, f belongs to B^{α,q}(L^q) for 1 ≤ q < ∞ and α < 3/2. This function is a chirp at zero, and it is better compressed by a wavelet series expansion than by a Fourier series expansion. Here is a sketch of the proof of Theorem 11.3.
To obtain a uniform approximation of f with an error less than or equal to δ, one defines the threshold j₀ to be α^{−1} log₂ δ^{−1}, and one keeps, in the first place, all of the terms in the orthogonal wavelet decomposition of f(x, y) that correspond to scales 0 ≤ j ≤ j₀. One assumes that the first approximation, at scale one, is given by a function of V₀ and that one is looking for an approximation on a bounded set. Then this first step amounts to keeping C 2^{2j₀} terms. If j > j₀, one applies at each scale an explicit threshold to the wavelet coefficients: The coefficients satisfying |c(j,k,l)| ≤ ε_j are replaced by zero, the thresholds ε_j being chosen so that the discarded terms contribute at most O(δ) to the uniform error. The wavelet series is written as

Σ_j Σ_{k,l} c(j,k,l) ψ(2^j x − k, 2^j y − l),

and, with β = α − 2/q, the hypothesis is that the sequence c(j,k,l) 2^{βj} belongs to l^q. The number N_j of coefficients retained at the scale 2^{−j} is estimated by observing that the condition of belonging to l^q implies the corresponding weak inequality. We have N_j ε_j^q ≤ Σ_{k,l} |c(j,k,l)|^q, and hence N ≤ C 2^{2j₀} + Σ_{j>j₀} N_j ≤ C′ 2^{2j₀}. The error is no greater than Cδ.

This approximation, where N terms are sufficient to obtain (in two dimensions) an error less than N^{−α/2}, is surprising because the global regularity is given by the Hölder exponent β = α − 2/q. By using a linear algorithm, the error would be of the order N^{−β/2}, which is significantly larger. We remind the reader that this error is measured in the uniform (L^∞) norm.

The scientific message contained in the preceding proof is more important than the proof itself. It is this: A wavelet thresholding provides a near optimal nonlinear
approximation. The same conclusion was reached by T. Lyche and K. Mørken working in Oslo on computer-aided design [178]. Their compression algorithm uses a multigrid (fine to coarse) scheme. It mimics the Schauder basis expansion, which is a successive approximation of a continuous function by piecewise affine functions. More precisely, as shown in Chapter 2, one has f(x) = Σ_j Σ_{0≤k<2^j} a(j,k) θ(2^j x − k), where θ is the triangle function centered at 1/2 on the interval [0,1] and where a(j,k) = f((k + 1/2)2^{−j}) − (1/2)[f(k2^{−j}) + f((k + 1)2^{−j})]. The one-dimensional case of the Oslo algorithm would then consist of setting a(j,k) = 0 for all a(j,k) that fall below a given threshold. This procedure looks like wavelet thresholding to the extent that the a(j,k) resemble wavelet coefficients. In the two-dimensional case, the Oslo algorithm erases each pixel whose gray level can be computed by averaging the gray levels of the neighboring pixels. This is the reason the algorithm is called knot removal in [178]. In [80], DeVore, Jawerth, and Lucier translated the Oslo algorithm into the language of wavelets.

We reiterate that nonlinear approximation is indeed more efficient than linear approximation for many functions. One can with very few terms represent rather irregular functions using nonlinear approximation, while if one wished to obtain the same quality of approximation using a linear scheme, one would be obliged to use significantly more terms in the series (or impose much more regularity on the functions that one seeks to represent). In the context of image processing, the goal of nonlinear approximation is to obtain clean edges while optimizing the bit allocation.
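The "retain the N largest coefficients" rule is easy to see in action with the Haar basis, the simplest case. In the sketch below (our illustration: a discrete orthonormal Haar transform on 2^10 samples of a jump function), nonlinear selection captures the handful of coefficients created by the jump, while keeping the same number of terms in the natural coarse-to-fine order leaves an error of order one:

```python
import math

def haar(samples):
    """Orthonormal discrete Haar transform of a length-2^J vector,
    returned as [mean, coarsest details, ..., finest details]."""
    out, s = [], list(samples)
    while len(s) > 1:
        h = len(s) // 2
        d = [(s[2*i] - s[2*i+1]) / math.sqrt(2) for i in range(h)]
        s = [(s[2*i] + s[2*i+1]) / math.sqrt(2) for i in range(h)]
        out = d + out
    return s + out

n = 1024
f = [1.0 if i < n // 3 else 0.0 for i in range(n)]   # a single jump
c = haar(f)

def error_keeping(kept_indices):
    # The basis is orthonormal, so the l2 error of a partial sum is the
    # l2 norm of the discarded coefficients (Parseval).
    kept = set(kept_indices)
    return math.sqrt(sum(x * x for i, x in enumerate(c) if i not in kept))

N = 20
linear = list(range(N))                                     # coarsest N terms
nonlinear = sorted(range(n), key=lambda i: -abs(c[i]))[:N]  # N largest terms
print(error_keeping(linear), error_keeping(nonlinear))
```

The jump produces roughly one significant coefficient per scale, about log₂ n in all, so N = 20 nonlinear terms reproduce the signal almost exactly; the linear scheme misses the fine-scale coefficients that straddle the jump.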
As is often the case, these nonlinear techniques seem, a posteriori, very natural in analysis because they amount to classifying things in order of importance rather than confining oneself to the conventional order, like the order of the terms of a series fixed in advance. These remarks might lead to the optimistic belief that nonlinear approximation would yield a solution to the problem of feature extraction. Since "feature" has not been given a precise meaning, we illustrate this concept with an example. Wavelets have already been used for mammogram segmentation, enhancement, and compression [84] (see also [87] and [85]). The goal is to detect biopsy-proven malignant clusters of calcifications superimposed on ordinary tissues of varying density. These clusters are the features to be enhanced. Moreover, these features should not be degraded by a compression algorithm. Indeed, telediagnostics and teletherapy rely crucially on transmitting medical images, and compression is a key ingredient in efficient transmission. The good news for wavelet enthusiasts is that a wavelet-based algorithm is the only lossy compression algorithm to receive FDA approval for use in medical devices (see http://www.jpg.com). We believe that better suited methods will eventually outperform wavelets. This is based on our belief that proper statistical modeling of the class of images to be compressed will lead to the development of algorithms adapted to the images.

Before moving to the second theme of this chapter, we wish to illustrate with an example the compression issue that concerns wavelet expansions versus Fourier expansions. As already mentioned, Peller's theorem applies to the simple function

f(x) = (c₁|x − x₁|^{α₁} + ⋯ + c_m|x − x_m|^{α_m}) e^{−x²},

where the exponents α_j are positive real numbers. If α = inf{α₁, …
, α_m}, the Fourier transform of f decays like O(|ξ|^{−1−α}). Once f is made 2π-periodic, N = ε^{−1/α} terms of the Fourier series are needed to ensure an error that is uniformly less than ε. If α is small, then N is a large power of 1/ε. If one uses wavelets
to expand this particular function, then O(α^{−1} log ε^{−1}) terms will suffice. This example supported the intuition of the pioneers, but it is only since the work described above that we have had a systematic approach to these compression issues.

We end this section with a more systematic study of the singularities of functions that have sparse wavelet expansions. This problem can be studied in any dimension, but we will focus on the two-dimensional case and applications to image processing. A first step is to extend the definition of a sparse wavelet expansion to functions of n real variables. In ℝⁿ, 2ⁿ − 1 wavelets ψ_i are needed to obtain orthonormal wavelet bases of the form 2^{nj/2}ψ_i(2^j x − k), 1 ≤ i ≤ 2ⁿ − 1, j ∈ ℤ, k ∈ ℤⁿ. Here again we want these wavelets to belong to the Schwartz class S(ℝⁿ). As before, we say that the function f has a sparse wavelet expansion

f(x) = Σ_{i,j,k} a(i,j,k) ψ_i(2^j x − k)    (11.9)

if and only if

Σ_{i,j,k} |a(i,j,k)|^p < ∞    (11.10)

for all 0 < p < 1. The following result was recently obtained by adapting an argument due to Stephane Jaffard in [154].

Theorem 11.4 (Y. Meyer). If f has a sparse wavelet expansion, then there exists a set E ⊂ ℝⁿ with Hausdorff dimension zero such that the pointwise Hölder exponent α(f, x) = +∞ for x ∉ E.

As an application of this result, we know immediately that the function f defined by f(x) = sup{0, 1 − |x|²}, x ∈ ℝⁿ, does not have a sparse wavelet expansion because it is not smooth across the unit sphere, which has Hausdorff dimension n − 1. More precisely, the Hölder exponent α(f, x) is 1 for |x| = 1.

We will outline the proof of Theorem 11.4, since it has not been published elsewhere. The hypothesis is that the coefficients of f in equation (11.9) satisfy (11.10). Write ε = N^{−1}, N ∈ ℕ*, and construct the exceptional set E_N as follows. Let U_j^ε be the union over k ∈ ℤⁿ and 1 ≤ i ≤ 2ⁿ − 1 of the closed balls |x − k2^{−j}| ≤ |a(i,j,k)|^ε, and let E_N be the limsup as j → +∞ of U_j^ε.
Finally, let E be the union of all of these E_N. Then the Hausdorff dimension of E_N is zero because Σ_{i,j,k} |a(i,j,k)|^{εη} < ∞ for every η > 0, and hence the Hausdorff dimension of E is also zero. If x ∉ E, then it is not difficult to apply Jaffard's criterion (Theorem 10.1) to prove that α(f, x) = +∞, as announced.

Roughly speaking, this theorem tells us that sparse wavelet expansions model signals with isolated singularities. In the two-dimensional case, images have jump discontinuities across lines, and this is excluded by the theorem. Does this mean that the achievements of DeVore and his collaborators do not apply to images? The situation is more complicated than a "yes" or "no" answer. Indeed, Besov spaces are being used to model images. The Besov space chosen by DeVore and Lucier is B^{1,1}(L¹(ℝ²)). Unfortunately, the characteristic functions of smooth bounded domains do not belong to B^{1,1}(L¹). This is why the larger space BV (for "bounded variation") is currently preferred. We are going to say more about the spaces that are used to model images in section 11.4. To prepare for this, we pause here to introduce two concepts that play key roles in current research: the space BV and weak l^p.
The space BV(ℝ²) is defined to be those functions f whose partial derivatives ∂f/∂x₁ and ∂f/∂x₂ (taken in the sense of distributions) are Radon measures with finite total mass. The BV norm of f is ∫_{ℝ²} |∇f| dx₁ dx₂, where |∇f| is the length of the gradient of f. The characteristic functions χ_Ω of smooth domains Ω belong to BV, and ||χ_Ω||_{BV} is the length of the boundary ∂Ω of Ω. Although these characteristic functions do not belong to B^{1,1}(L¹(ℝ²)), we have the embeddings

B^{1,1}(L¹(ℝ²)) ⊂ BV(ℝ²) ⊂ B^{1,∞}(L¹(ℝ²)),

and these embeddings play a key role in Donoho's denoising strategy, which is described in the next section.

The definition of weak l^p is this: A sequence c_n is said to belong to weak l^p if the nonincreasing rearrangement c*_n of |c_n| satisfies the condition c*_n ≤ C n^{−1/p} for some constant C > 0 and all n ≥ 1. This condition is implied by Σ |c_n|^p < ∞, as was already noted following the definition of "sparse." There is a remarkable connection between these two concepts that was discovered by A. Cohen, R. DeVore, P. Petrushev, and H. Xu in the case of the Haar system [59] and generalized to other wavelet bases by Y. Meyer.

Theorem 11.5. If f belongs to BV(ℝ²), then the wavelet coefficients of f, c(λ) = ⟨f, ψ_λ⟩, belong to weak l¹.

The wavelets ψ_λ are assumed to form an orthonormal basis for L²(ℝ²), and, to be precise about the normalization, the wavelet coefficients are those that appear when f is expanded as

f(x) = Σ_{i,j,k} a(i,j,k) 2^j ψ_i(2^j x − k),  i = 1, 2, 3,  j ∈ ℤ, k ∈ ℤ².

This condition is sharp. In fact, if f is the characteristic function of the unit disc and if |a(λ_n)| denotes the nonincreasing rearrangement of |a(λ)|, λ ∈ Λ, then there is a positive γ such that |a(λ_n)| ≥ γ n^{−1}. In spite of this, having wavelet coefficients in weak l¹ does not imply that f is in BV. Being weak l¹ or, more generally, weak l^p for 0 < p ≤ 1 is a form of sparsity, although it is clearly weaker than having Σ |a(λ)|^p < ∞ for all 0 < p < 1.
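Weak-l^p membership can be probed numerically on a finite section of a sequence: sort the absolute values, and bound n^{1/p} c*_n. A small sketch (our illustration) with the harmonic sequence c_n = 1/n, which lies in weak l¹ but not in l¹:

```python
def weak_lp_constant(seq, p):
    """Smallest C (on this finite section) with c*_n <= C * n^(-1/p),
    where c* is the nonincreasing rearrangement of |c_n|."""
    c = sorted((abs(x) for x in seq), reverse=True)
    return max(n ** (1.0 / p) * cn for n, cn in enumerate(c, start=1))

a = [1.0 / n for n in range(1, 10001)]
print(weak_lp_constant(a, 1.0))   # stays bounded (here about 1): weak l^1
print(sum(a))                     # but the l^1 partial sums grow like log n
```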
These ideas will appear again in section 11.4. For the moment, we simply note that the connection between sparse representations of functions and image processing is an extremely active line of research.

11.3 Denoising

We begin with the simplest example. One wants to recover X from the given data Y, where we assume that Y = X + σZ. The term σZ is considered to be noise; typically Z will be standard white noise and σ > 0 is a small parameter. In Donoho's work, the object X is a function f of a real variable t or an image, in which case we will assume that t belongs to the unit square. To develop an algorithm for recovering X, it is necessary to make some mathematical assumptions about the nature of f. These assumptions should reflect our a priori knowledge about the object X. Making assumptions about f based on our knowledge of what X should be is called modeling, and this issue will be addressed again in section 11.4. For the moment, we are going to follow Donoho, so our modeling of f says that f should be smooth or should belong to some ball B in a given function space. We
will argue in the next section that images naturally belong to the space BV(ℝ²) of functions of bounded variation in the plane. For convenience of notation, we will use X to denote both the object we wish to recover and the function that models this object.

Our goal is to construct an estimator X̂ of X. More precisely, we wish to build a nonlinear mapping Φ that takes Y (which are the data at our disposal) into a good candidate for X. We denote by || · || the norm that will be used to measure the error between X̂ and X, and we let E denote the expectation operator taken with respect to the noise. We then consider the average risk E[||X − X̂||²] and compute it for the worst case. This yields the quantity

sup_{X∈B} E[||X − X̂||²],    (11.11)

where B is the ball containing all the functions we wish to recover. Finally, we would like the estimator to be optimal among all possible linear and nonlinear candidates. This means that we need to solve the following minimax problem:

inf_Φ sup_{X∈B} E[||X − X̂||²].    (11.12)

This ambitious program is out of reach in most of the interesting cases, and we must thus be content with a near-optimal (or suboptimal) estimator Φ. Suboptimal is defined as follows: Let α be the largest power of σ such that for every ε > 0 the estimate

inf_Φ sup_{X∈B} E[||X − X̂||²] ≤ C_ε σ^{α−ε}    (11.13)

is true as σ → 0. An estimator X̂ is suboptimal if

sup_{X∈B} E[||X − X̂||²] = O(σ^{α−ε})

for all ε > 0 as σ → 0, where the exponent α is the same as in the optimal case.

Roughly speaking, Donoho's theorem tells us the following: If the risk is measured in the L² norm, then the first thing to do for finding a near-optimal estimator is to construct an orthonormal basis for L² in which the functions belonging to B have sparse expansions.

Following Donoho, we illustrate this statement with an almost trivial example. In the example, X, Y, and Z will be sequences {x_n}, {y_n}, and {z_n}, n ≥ 1.
When we return to more realistic situations, these sequences will be the coordinates of f and the other objects in some suitable basis. The noise {z_n} is not stochastic, but we assume that |z_n| ≤ 1. In other words, each coordinate x_n is corrupted by an error that does not exceed σ. The error between the sequence {x_n} we wish to recover and the estimated sequence {x̂_n} will be measured in the l²(ℕ) norm. We are going to model our a priori knowledge about the solution by the condition |x_n| ≤ C n^{−β}, n ≥ 1, where C is a given constant and β > 1/2 is a given exponent. We denote this collection of sequences by B.

The estimator we construct for the example is based on a slightly different definition of risk. We do not average over the noise, but instead we focus on the worst case. This leads us to define the risk to be

sup_{X∈B} sup_Z ||X − X̂||²    (11.14)
DATA COMPRESSION AND RESTORATION OF NOISY IMAGES

and to construct an estimator that minimizes (11.14). Constructing this estimator is an exercise. It is sufficient to do it separately for each coordinate. One first considers the case Cn^(−β) ≥ σ and then the case Cn^(−β) < σ. The resulting decision rule, which constitutes the estimator, depends on C and β.

David Donoho improved this algorithm with a much more intuitive decision rule that does not depend on β. This decision rule is called thresholding. It is defined as follows: If σ ≥ C, then it is assumed that the signal is entirely buried in the noise, and we set X̂ = 0. If 0 < σ < C, we first consider those indices n for which |y_n| ≤ 2σ. For such an n, |x_n| ≤ 3σ, and this coordinate of the signal is considered to be buried in the noise. For these cases we set x̂_n = 0. If |y_n| > 2σ, then Cn^(−β) ≥ |x_n| ≥ σ, and we set x̂_n = y_n − σ sign(y_n), which implies that |x̂_n| ≤ |x_n| ≤ Cn^(−β). A simple computation shows that the worst risk is of the order σ^α, where α = 2 − 1/β. Observe that this risk becomes smaller as β increases. This thresholding algorithm is near optimal because it yields the same exponent α as the optimal estimator.

Since the thresholding estimator does not depend on β, the converse problem can be addressed: Given a sequence {x_n}, n ∈ N*, under what condition is the worst risk Σ_{n≥1} |x̂_n − x_n|² of the order σ^α as σ tends to zero? Here, as before, y_n = x_n + σz_n, where |z_n| ≤ 1, and x̂_n is the estimator given by the previously defined thresholding. The answer is that Σ |x̂_n − x_n|² = O(σ^α) if and only if the nonincreasing rearrangement of |x_n| decays like O(n^(−β)), where α = 2 − 1/β.

We are now going to leave this simple example and address more realistic situations where the object we wish to recover is modeled by a function f defined on the interval [0,1] and belonging to some ball B in a function space E.
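Before moving on, the thresholding decision rule of the sequence example can be sketched in a few lines of numpy. The constants C = 1 and β = 1 are hypothetical choices, and the worst deterministic noise z_n = 1 is used for illustration; the computed risk should then scale roughly like σ^(2−1/β) = σ:

```python
import numpy as np

def threshold_estimate(y, sigma, C):
    """Donoho's decision rule from the sequence example:
    coordinates with |y_n| <= 2*sigma are declared buried in the noise."""
    if sigma >= C:
        return np.zeros_like(y)
    x_hat = y - sigma * np.sign(y)          # pull kept coordinates toward 0
    x_hat[np.abs(y) <= 2 * sigma] = 0.0     # kill small coordinates
    return x_hat

C, beta, N = 1.0, 1.0, 10_000
n = np.arange(1, N + 1)
x = C * n**-beta                            # |x_n| <= C n^{-beta}

for sigma in (0.1, 0.01):
    # worst-case deterministic noise has |z_n| <= 1; take z_n = 1 throughout
    y = x + sigma * np.ones(N)
    risk = np.sum((threshold_estimate(y, sigma, C) - x) ** 2)
    print(sigma, risk)
```

With β = 1 the printed risk shrinks by roughly a factor of 10 each time σ does, in agreement with the exponent α = 2 − 1/β = 1.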
The noise is assumed to be standard Gaussian white noise,¹² and we wish to estimate f(t) from the noisy data

    Y(t) = f(t) + σZ(t),  0 ≤ t ≤ 1.    (11.15)

We assume that the risk is evaluated in L²[0,1]. With these assumptions and with what we have learned from the simple example, we are naturally led to look for an orthonormal basis {e_n} for L²[0,1] that ensures a fast decay of the coefficients ⟨f, e_n⟩, n ≥ 1, when f ∈ B. More precisely, we are led to search for a "best basis" among all orthonormal bases for L²[0,1], where "best" means the one for which the decay of ⟨f, e_n⟩ is the fastest in the worst case, f running over B. As Paul Lévy pointed out, if {e_n} is any orthonormal basis, then the coordinates z_n = ⟨Z, e_n⟩ of a standard white noise are independent, identically distributed standard Gaussian variables (which we abbreviate by i.i.d. N(0,1)).

Before going further with the general theory, it is useful to illustrate these ideas with an example. If, for instance, B is the unit ball of the Hölder space C^α, α > 0, then the Fourier coefficients c_n of f ∈ B decay like O(n^(−α)), which is optimal. The corresponding wavelet coefficients decay like O(n^(−α−1/2)), which is clearly better. Furthermore, the space C^α is characterized by this decay of the wavelet coefficients. Hölder spaces are embedded in the larger family of Besov spaces B^(α,q)(L^p). These remarks shed light on the deep relations between denoising and finding sparse expansions for certain classes of function spaces.

We are now going to reformulate (11.15). Since the signal we are looking for is a smooth function defined on [0,1], our denoising problem can be restated as follows:

¹²See [145] for recent results on wavelet thresholding where the noise is not Gaussian.
The data y_0, ..., y_(N−1), N = 2^q, that are collected are given by

    y_k = f(k/N) + σz_k,  0 ≤ k < N,    (11.16)

where the z_k are i.i.d. N(0,1). The wavelet transform that is needed yields an isometric isomorphism between L²[0,1] and ℓ²(N). Here we are looking for its discrete version. In this discrete version, L²[0,1] is identified with ℓ²{0, 1/N, ..., (N−1)/N}, where each point k/N is given the mass 1/N. With these conventions, the wavelet transform of (11.16) is

    Y_m = X_m + (σ/√N) Z_m,  0 ≤ m ≤ N − 1,    (11.17)

where the X_m are the wavelet coefficients of f(k/N) and the Z_m are i.i.d. N(0,1). Here, the index m plays the role of the pair (j,k) that is usually used in the wavelet transform. If the smoothness assumption about f appears as the condition |X_m| ≤ Cm^(−β), we are not too far from our first example. However, in the case at hand, the Z_m are not uniformly bounded by 1, but rather by √(2 log N). This is why the threshold τ used below in Donoho's wavelet shrinkage is not σ/√N, as our simple example might lead one to believe, but rather τ = (σ/√N)√(2 log N).

One of Donoho's most interesting algorithms has the following remarkable property: Its application does not depend on the exponents α, p, and q of the Besov space B^(α,q)(L^p) used to model the data. We consider the case of a noisy image and try to reconstruct f(t_i) from the noisy data d_i = f(t_i) + σz_i, where z_i is normalized white noise and where the points t_i = (k₁/N, k₂/N) belong to the fine grid defining the image. This is how the algorithm works: Starting with the noisy data, one computes the corresponding empirical wavelet coefficients. (We will say something about how these are computed in a moment.) Then one applies the following wavelet shrinkage to these empirical coefficients: All the coefficients with modulus less than or equal to τ = (σ/√N)(2 log N)^(1/2) are replaced with zero. Those whose modulus is greater than τ are displaced toward zero by an amount equal to τ. In other words, each wavelet coefficient x is replaced by y = θ(x) = x − τ sign(x)
if |x| > τ, and by y = 0 otherwise. Donoho proved that this estimator has the following properties: (1) Each time that one has a priori "Besov" knowledge about the signal or image, the algorithm is suboptimal. (2) The algorithm preserves regularity, that is, the a priori knowledge about the signal. (3) If the signal is zero, the algorithm returns zero. We must note, however, that the threshold used in the algorithm depends on the a priori knowledge of the noise level. The suboptimal nature of the algorithm is again defined by the rate at which ‖f − f̂‖₂ tends to zero as the noise level σ tends to zero. Here, f̂ is the estimate of f given by Donoho's algorithm.

In Donoho's algorithm one must compute the wavelet coefficients in a situation that is different from the usual case of a function defined on the whole real line. Here, we have only discrete data defined on an interval. Wavelets tailored to an interval have been constructed by I. Daubechies, A. Cohen, and P. Vial [58]. Roughly
speaking, one defines the approximation spaces V_j by first using all of the scaling functions φ(2^j x − k) having support in the interval I and by then adjoining other special scaling functions that take care of the ends of I. This is done so as to generate all of the polynomials of degree < N (in the case one is using wavelets with N zero moments). Then the construction of the wavelets follows the usual process. Having done this, Daubechies and her collaborators constructed the filters needed to pass from one scale to the next. These are the same filters that are used to process the data in Donoho's algorithm.

This new algorithm is called soft thresholding. We will see in Chapter 12 how this technique is used to improve an image reconstruction algorithm used in astronomy. This supports our theme that "specific problems call for tailored solutions."

11.4 Modeling images

Image processing is an important application of Donoho's discoveries. This work concerns geometric-type images, and here is what it is about. A real image, like that of a classroom, is composed (approximately) of geometric forms that are outlined by rather simple contours. These geometric forms are "filled in" with variations in the luminous intensity called textures. For example, some students wear pullover sweaters, and a close examination of these sweaters reveals periodic, or almost periodic, patterns that have high spatial frequency with weak intensity. This is to say that the variations in luminous intensity may be very rapid but are weak when compared with the much more pronounced variations at the edges of the students' silhouettes. If we asked a talented draftsperson to make a sketch of this classroom, the lines representing the contours would be very distinct, while the textures would be reproduced with much less fidelity. These textures, like those created by hair, would be suggested rather than carefully drawn.
This is indeed how an artist works. For example, A. Dürer was famous for being able to create, with a single brush stroke, hair that appeared to be drawn hair by hair. Such ideas led to the concept of simulating natural textures automatically using algorithmic techniques that imitate Dürer's brush. Currently, two-dimensional versions of fractional Brownian motion can be used to simulate some kinds of textures. These simulations are made by representing a fractional Brownian motion as a series of appropriate wavelets with i.i.d. Gaussian coefficients. This technique was initiated by Fabrice Sellan, and the details can be found in [2]. For more recent work on the synthesis of fractional Brownian motion, see [211]. However, our focus is on the contours, and here we imitate Ingres and his pictorial vision.

Marr suggested that the low-level processing in the human visual system is based on some kind of wavelet analysis. Indeed, Marr wanted to explain the extraordinary ability of the human visual system to detect edges. Marr's explanations are based on the following model. Consider a piecewise smooth function u with jump discontinuities across the boundaries ∂D₁, ..., ∂D_N of the domains D₁, ..., D_N in which u is smooth. The given image f is modeled by f = u + v, where u is defined as above and v contains the noise and texture. Indeed, if the function u is smooth inside D₁, ..., D_N with jump discontinuities across the boundaries ∂D₁, ..., ∂D_N, then the wavelet coefficients ∫ u ψ_λ dx are either rather small or rather large. They are small whenever the support of ψ_λ does not hit one of the boundaries, and they are large when the support of ψ_λ intersects these boundaries. Wavelet thresholding retains only the large wavelet coefficients, and it can thus be interpreted as an edge detector. This is where Donoho's wavelet shrinkage can be presented as
an algorithm for finding contours. One can say that Donoho's thinking extends the ideas initiated by Marr.

The basic working hypothesis is that the noise and textures are indistinguishable and that the algorithm should extract the design, that is, the contours, while ignoring the textures and noise. Donoho's algorithm can be compared to the patient and meticulous work of an archaeologist who, faced with broken and weathered fragments of pottery, reconstructs the missing pieces based on thought and experience, and from this deduces the eating habits of a civilization. The kind of information used by the archaeologist is not available to the lay person; it is accessible only to a specialist who is armed with a priori knowledge about what is being sought. Then the piece of broken pottery allows the archaeologist to choose one path among several from a universe of possibilities that has been sufficiently restricted by this a priori knowledge.

It is clear that the paradigm according to which contours and textures are the only components of an image is a simplification. Jean-Michel Morel, for example, describes an image as an ordered collection of level lines. The ordering is defined by the intensity of the gray level. Such a representation is more robust than the one given by contours (see [213]).

The u + v model we have introduced is quite general. We are now going to add some refinements. These new models will come equipped with "denoising algorithms" that are designed to extract the u component from the sum f = u + v. This problem of extracting u has been pursued by several authors. We will first describe the approach taken by Mumford and Shah. We then consider work by Osher and Rudin, followed by that of DeVore and Lucier, and finally we return to Donoho's contribution. Most of these authors propose a variational approach: They minimize a functional over a collection of candidates for the cartoon sketch u.
In the Mumford-Shah approach, one is looking for a pair (u, K), where K is a compact set and the function u is smooth on the complement of K. We assume for the sake of simplicity that f(x), x = (x₁, x₂), belongs to L²(Q), where Q is the unit square. The Mumford-Shah functional J, which is to be minimized, is defined by

    J(u) = ∫_Q |f(x) − u(x)|² dx + α ∫_{Q\K} |∇u(x)|² dx + β H¹(K),

where α > 0 and β > 0 are two parameters that need to be adjusted for the class of images being processed and H¹(K) is the one-dimensional Hausdorff measure (total length) of K. We are looking for a cartoon sketch u with jump discontinuities across K. These discontinuities prevent the distributional derivative ∇u from being square integrable, and this is the reason that ∇u is only computed on the complement of K. Note that J is the sum of three terms in competition. The first term measures the quality of the approximation; the second term says how smooth we want u to be outside K; and the third term measures a price or penalty to be paid for this approximation. As indicated above, α and β need to be tuned to the class of images being processed. If β is quite small, this choice might lead to finding too many edges (and objects) in the image. On the other hand, if β is relatively large, some objects will be eliminated along with the additive noise. The optimal value of β depends on the class of images. A similar discussion applies to α.

We now turn to the Osher-Rudin model. The first term, which measures the error, is the same, while the penalty function is α ∫_Q |∇u(x)| dx. This term is the BV norm of u. We say that a function u defined on R² has bounded variation if its
gradient, in the sense of distributions, is a signed Radon measure with finite total mass. By an obvious abuse of language, this finite mass is denoted by ∫ |∇u(x)| dx. Observe that ∫ |∇u(x)| dx is the sum of two terms. Indeed, ∇u = f + μ, where f ∈ L¹(Q) and where μ is singular with respect to Lebesgue measure. Then ∫ |f(x)| dx corresponds to ∫ |∇u(x)|² dx in the Mumford-Shah model, while ‖μ‖ corresponds to β H¹(K). More precisely, if u is smooth on finitely many domains D₁, ..., D_N of Q with jump discontinuities across their boundaries, then K = ∂D₁ ∪ ··· ∪ ∂D_N, ∫ |f(x)| dx = ∫_{Q\K} |∇u(x)| dx, and ‖μ‖ = Σ_j ∫_{∂D_j} j(u) dσ, where j(u) is the jump discontinuity of u across ∂D_j and dσ is the arc length. If j(u) = 1 identically, then ‖μ‖ = H¹(K). This discussion shows that the Mumford-Shah model and the Osher-Rudin model have much in common.

In the DeVore-Lucier model, the penalty function is further simplified. The BV norm is replaced by a Besov norm, and the functional that is minimized becomes ‖f − u‖₂² + β Σ |α(λ)|, where u(x) = Σ α(λ) ψ_λ(x) is an orthonormal wavelet expansion of u. As is mentioned in [83], this optimization problem is trivial in the wavelet domain and leads to wavelet shrinkage. To see this, let f(x) = Σ γ(λ) ψ_λ(x) be the wavelet expansion of f. Then the functional becomes Σ [(α(λ) − γ(λ))² + β|α(λ)|], and this can be minimized by finding the minimum of (α(λ) − γ(λ))² + β|α(λ)| for each λ. Assume that γ(λ) > 0. Then a simple computation shows that the minimum occurs at α(λ) = 0 if γ(λ) ≤ β/2 and at α(λ) = γ(λ) − β/2 if γ(λ) > β/2. Similarly, if γ(λ) < 0, the minimum occurs at α(λ) = 0 if γ(λ) ≥ −β/2 and at α(λ) = γ(λ) + β/2 if γ(λ) < −β/2. But this is just wavelet shrinkage with the threshold τ equal to β/2.

The last model we consider is the one defined by Donoho. The assumptions are slightly different.
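Before describing it, the per-coefficient minimization above can be checked numerically. The sketch below (illustrative, not from the text) compares the closed-form soft-threshold solution, with threshold β/2, against a brute-force search over a fine grid of candidate values of α(λ):

```python
import numpy as np

def soft_threshold(gamma, beta):
    """Closed-form minimizer of (alpha - gamma)^2 + beta * |alpha|."""
    return np.sign(gamma) * max(abs(gamma) - beta / 2, 0.0)

def brute_force_min(gamma, beta):
    """Minimize the same scalar functional on a fine grid of alphas."""
    alphas = np.linspace(-2.0, 2.0, 400_001)
    values = (alphas - gamma) ** 2 + beta * np.abs(alphas)
    return alphas[np.argmin(values)]

beta = 0.3
for gamma in (-1.0, -0.1, 0.0, 0.1, 0.2, 1.0):
    assert abs(soft_threshold(gamma, beta) - brute_force_min(gamma, beta)) < 1e-3
print("soft thresholding with tau = beta/2 matches the brute-force minimum")
```

Coefficients with |γ(λ)| below β/2 are sent to zero, and the rest are displaced toward zero by β/2, exactly the shrinkage rule derived in the text.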
Donoho wishes to recapture u from f = u + v, where v is white noise and where u is subject to an a priori constraint of the form ‖u‖_B ≤ C. Here B is also a Besov space, and this a priori knowledge plays the role of the penalty function. In the Mumford-Shah model or the DeVore-Lucier model, the decomposition f = u + v is a solution of a variational problem. This appears to be an objective search, but it depends on the small parameters α and β that must be adjusted to the class of images we wish to process. In the DeVore-Lucier model, the decomposition also depends on the specific choice of the wavelet basis. This is due to the fact that Σ |α(λ)| is not the Besov norm of Σ α(λ)ψ_λ; it is an equivalent norm.

Donoho's algorithm was developed in a stochastic setting. However, wavelet thresholding makes sense for any function f. One can ask if the u component in f = u + v can be reconstructed from the wavelet coefficients of f that exceed a given threshold. The resulting function u is then the same as the one obtained from the DeVore-Lucier approach.

One should also compare the Osher-Rudin model with wavelet shrinkage. As mentioned above, Stan Osher and Leonid Rudin defined the cartoon sketch u of a given image f to be the solution of the variational problem inf J(u), where

    J(u) = ‖f − u‖₂² + λ ‖u‖_BV    (11.18)

and λ is a small parameter. In the paper [59] cited in section 11.2, A. Cohen, R. DeVore, P. Petrushev, and H. Xu addressed the issue of solving the variational problem (11.18) using a wavelet shrinkage algorithm. They proved this result: If, instead of a smooth wavelet basis, one uses the Haar system, then wavelet shrinkage yields a cartoon sketch û such that J(û) ≤ C inf J(u). One then says that û is suboptimal. Here C is a fixed
constant, and the threshold in the shrinkage must be determined. Theorem 11.5 is a crucial piece of information that is used in this algorithm.

Modeling geometric images with BV functions gives better results than modeling them with Besov spaces, but when Donoho wrote his fundamental papers, nothing better than the embedding B^(1,1)(L¹) ⊂ BV ⊂ B^(1,∞)(L¹) was known. These embeddings play a key role in Donoho's denoising strategy. The same wavelet shrinkage is suboptimal for both of these Besov spaces, and thus it is suboptimal for BV(R²). Furthermore, at that time, information about wavelet coefficients of functions of bounded variation was rather poor, while the characterization of Besov spaces using wavelet coefficients was quite simple. For example, f belongs to B^(1,1)(L¹) if and only if its wavelet coefficients, suitably normalized, are absolutely summable. Thus it was natural to use the space B^(1,1)(L¹) rather than BV.

11.5 Ridgelets

One can argue that functions of bounded variation do not adequately model images. Indeed, a function of bounded variation is either the characteristic function of a domain whose boundary has a finite length, or it is an average of such functions. This atomic decomposition is provided by the co-area identity. Modeling objects with characteristic functions of domains with finite-length boundaries may be inappropriate, since the objects we have in mind are probably not that complicated. Donoho decided to describe an image as a collection of objects delimited by smooth boundaries instead of merely rectifiable ones.

If one wants to efficiently represent (or compress) smooth domains, standard isotropic wavelets are not optimal. A better algorithm relies on an efficient description of the boundary, and this calls for orthonormal bases that can efficiently represent elongated objects, such as the arc of a circle. No one knew how to do this until Donoho constructed a remarkable orthonormal basis that was designed to provide a sparse representation for objects having arbitrarily large eccentricities.
Donoho's construction improved previous work by E. Candès. We are going to describe this basis, and we begin with one of our main themes. When constructing a wavelet basis, we should return to the issue raised by Jean Ville: Should we first segment the frequency domain, or should we use bases that are built on a segmentation of the time (or space) domain? The construction of Donoho's basis uses both strategies.

Let ξ = (ξ₁, ξ₂) be the frequency vector, which will be written in polar coordinates as ξ = (ρ cos θ, ρ sin θ), −∞ < ρ < ∞, θ ∈ [0, 2π). We let ρ take negative values and identify (ρ, θ) with (−ρ, θ + π). Then L²(R², dξ) is identified with the closed subspace H of L²(R × [0, 2π), |ρ| dρ dθ) defined by f ∈ H if and only if

    f(ρ, θ) = f(−ρ, θ + π).    (11.19)

An orthonormal basis for L²(R², dξ) will be written as an orthonormal basis for H through this representation in polar coordinates.

We return to the segmentation issue. The first segmentation is reminiscent of the Littlewood-Paley decomposition. The frequency plane is partitioned into dyadic annuli Γ_j defined by 2^j ≤ |ξ| < 2^(j+1), j ∈ Z. To build an orthonormal basis in the ρ variable that is consistent with this segmentation, one uses Malvar-Wilson wavelets. Let w(ρ) be an even function of the real variable ρ with the following properties: w(ρ) is C^∞, w(ρ) = 0 if |ρ| ≤ 1/2 or |ρ| ≥ 3, and |ρ|^(1/2) w(ρ) satisfies the Malvar-Wilson conditions (section 6.3). The orthonormal basis we will use is the set of functions 2^(j/2) w(2^j ρ) exp[iπ(k + 1/2) 2^j ρ], j, k ∈ Z.
Next, we treat the angular variable θ, and here the segmentation is performed in the frequency domain. We are dealing with 2π-periodic functions, and the corresponding frequencies are integers. The orthonormal basis for L²[0, 2π) that is used is the periodized version of the orthonormal wavelet basis 2^(j/2) ψ(2^j t − k), j ≥ 0, k ∈ Z, and φ(t − k), k ∈ Z, where both φ and ψ belong to the Schwartz class, φ(−t) = φ(t), and ψ(1 − t) = ψ(t). This wavelet basis is indexed by the dyadic subintervals I of [0, 2π). We write ψ_I(θ), I ∈ I. Finally, the ridgelets ρ_λ, λ ∈ Λ, are defined by their Fourier transforms. By definition, in the polar coordinates (ρ, θ),

    ρ̂_λ(ρ, θ) = 2^(j/2) w(2^j ρ) exp[iπ(k + 1/2) 2^j ρ] ψ_I(θ) + 2^(j/2) w(2^j ρ) exp[−iπ(k + 1/2) 2^j ρ] ψ_I(θ + π),    (11.20)

where the second term ensures that ρ̂_λ satisfies the symmetry condition (11.19).

Donoho's original paper on this subject treated the ridgelet expansion of the characteristic function of a half-plane [91]. Since then it has been shown that the ridgelet expansion of the characteristic function of a smooth domain is weak ℓ^(1/2), whereas the best one can do with wavelets is weak ℓ¹. Thus ridgelets provide better compression for this class of images than do wavelets.

11.6 Conclusions

Several problems have been raised in this chapter. The first consisted of defining the class of functions (signals, images) whose wavelet expansions are sparse, in one sense or another. These functions are adequately compressed with wavelets. Depending on the norm that was used to measure the approximation, several characterizations in terms of Besov spaces have been presented.

The second message of this chapter seems to be a success story for wavelet analysis: Whenever the a priori information on a given class of signals or images can be formulated as a bound on a Besov norm, then wavelet shrinkage provides an optimal denoising.
On the other hand, if u is a smooth function inside finitely many domains with jump discontinuities across their boundaries, then one should shrink the ridgelet coefficients of f = u + σv to recover u (v is a standard Gaussian white noise). These two statements seem to be contradictory, but they become consistent if one returns to the definition of the worst risk. This worst risk is the supremum of E[‖f̂ − u‖²] taken over the Besov ball ‖u‖_B ≤ C. Such a supremum can be attained for certain intricate functions u that do not correspond to our notion of a cartoon image. Besov balls are indeed very large sets. With the availability of ridgelets, new algorithms for optimal denoising should soon be available.

Another message is that there continues to be a need for new function spaces "adapted to edges," and this provides new goals for functional analysis.
CHAPTER 12

Wavelets and Astronomy

This final chapter is about the use of wavelets in astronomy and astrophysics. Wavelets are being applied in many fields of science and technology. We have selected astronomy as an example for several reasons: There are diverse applications within the field, and although they all involve some form of signal or image processing, the techniques vary from one application to another. Astronomy is driven by sophisticated technology for both ground-based and space-based observations, and this technology has led to problems that appear to be well suited to wavelet techniques. Finally, there is widespread popular interest in astronomy and cosmology, an interest that has been kindled by the richness of recent discoveries.

The chapter is based on our interpretation of the literature and on discussions with two groups of astronomers, the one directed by Albert Bijaoui (Observatoire de la Côte d'Azur, Nice) and the other led by André Lannes (Observatoire Midi-Pyrénées, Toulouse). In his review article on the uses of wavelets in astrophysics [36], Bijaoui discusses a number of problems where wavelet-based techniques are being applied; these include the analysis of solar time series; image compression; the detection and analysis of astronomical sources; and data fusion, as well as the study of the large-scale structure of the universe. We have selected three examples that illustrate different problems and techniques. In each case, wavelets are used in complex algorithms that are handcrafted by experts in astronomy to deal with specific problems. Roughly speaking, astronomical applications of wavelets differ from other applications because of the nature of astronomical images and signals.

12.1 The Hubble Space Telescope and deconvolving its images

Long in planning, greatly over budget, and fraught with management and scientific problems, the Hubble Space Telescope (HST) is today one of the scientific wonders of the world.
It is not necessary to be an astronomer to be impressed with the images it produces. This was not always the case. Shortly after launch in April 1990, it came close to being the scientific laughingstock of the century. The first images were very disappointing, and the experts soon determined that the 2.4-meter primary mirror of the Ritchey-Chrétien telescope had a serious spherical aberration. We will discuss this problem and its correction, but first we need to introduce the model and language astronomers use to describe the process of obtaining astronomical images.

12.1.1 The model

Suppose that f_i is a digital image received by an astronomer, say, by downloading it from the database at the Space Telescope Science Institute. (This is the agency
that coordinates the use of the HST and the distribution of its data.) Although we are focusing on the HST, f_i could be a digitized image from other sources; the model applies to many situations. The astronomer's working assumption is that f_i is related to the "original object" f_o by the equation

    f_i(x) = p * f_o(x) + n(x).    (12.1)

The function p is called the point-spread function. It is determined experimentally as the image of a "point source" star. For ground-based astronomy, p is determined, if possible, during each observing session; it includes the condition of the atmosphere and other parameters that can vary from observation to observation. The situation with the HST is different because there is no atmosphere, and in this case p is quite stable. A "good" p closely approximates a delta function: Its support is concentrated around zero, and it decays rapidly to zero away from the origin. The width of the central spike is determined by the diffraction limitation of the optical system. A "bad" p will have serious side lobes, or wings, and it spreads the energy from a point source over a relatively large area.

The function n denotes noise. In fact, n is a catch-all term that includes both random and systematic errors (errors in determining p, errors resulting from the linearity assumption, image sampling, etc.) and random noise not correlated with the signal (from the telescope, the detectors, the atmosphere, the pointing system, etc.). We write "original object" in quotes because trying to say exactly what it is leads to a philosophical debate. For our purposes, it is an element of the Hilbert space L²(R²). The mathematical problem is to recover f_o from the data f_i.

12.1.2 Discovering and fixing the problem

After the discovery of the aberration, the user community turned to deconvolution to restore the images. It soon became clear, however, that this approach was too costly and had limited success and that a hardware solution would be required.
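Before continuing the story, note that the image-formation model (12.1) is easy to simulate. The following numpy sketch is a hypothetical one-dimensional toy, not HST data: it blurs two point sources with a Gaussian point-spread function and adds noise, showing how a wide p spreads the energy of each source:

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy 1-D "original object": two point sources on a grid (hypothetical data).
N = 256
f0 = np.zeros(N)
f0[80], f0[90] = 1.0, 0.6

# A Gaussian point-spread function; a wide PSF smears the sources together.
x = np.arange(N) - N // 2
psf = np.exp(-(x / 4.0) ** 2)
psf /= psf.sum()                      # unit total flux

# f_i = p * f_0 + n, with the convolution done circularly via the FFT
fi = np.real(np.fft.ifft(np.fft.fft(psf) * np.fft.fft(f0)))
fi = np.roll(fi, N // 2)              # undo the centering shift of the PSF
fi += 0.001 * rng.standard_normal(N)  # additive noise term n

print(fi.max())   # the peaks are now far below the original intensity 1.0
```

The maximum of the blurred image is only about 0.14 instead of 1.0: the PSF has redistributed the flux of each point source over many pixels, which is exactly why a "bad" p degrades the signal-to-noise ratio.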
Nevertheless, these initial deconvolution efforts did produce useful data, and the analyses of the point-spread functions, which varied with the position of the point source in the field, provided information that helped to uncover the original manufacturing mistake. The mirror had been perfectly ground and polished, but to the wrong function: The mirror was too flat. The problem was traced to an error in setting up the device, called a null corrector, used to test the shape of the mirror as it was being polished.

By knowing the exact nature of the problem, it was possible to design optical systems to compensate for the aberrated mirror. The best known of these is COSTAR, which stands for Corrective Optics Space Telescope Axial Replacement. It is an optical device that intercepts the "aberrated" light just after it passes through the hole in the primary mirror and "corrects" it for use by the spectrographs and the Faint Object Camera. The original High Speed Photometer was removed to make space for COSTAR. Other corrective optics were built into a new Wide Field/Planetary Camera. These replacements, as well as other repairs, were done in December 1993 during the first servicing mission. The optical corrections proved to be wildly successful, and the overall performance was as good "as if the mirror were perfect."

One of the missions assigned to the HST is to explore the outer limits of the universe. We know, based on the time it takes the light to reach earth, that the most distant galaxies observed are relatively young. These distant galaxies are in the
process of developing their geometric complexity, and the structure of these distant objects provides hints about the development of the universe. Unfortunately, these objects are extremely faint (low intensity), and the received images are particularly noisy. The signal-to-noise ratio is indeed poor. Noise is always a problem in observational astronomy; in fact, it is not an exaggeration to say that it is the central problem. Furthermore, a bad point-spread function leads to a poor signal-to-noise ratio.

In spite of the profound disappointment with the first images and the realization that the mirror was aberrated, the telescope provided some useful scientific information between 1990 and 1993. This was possible because the images could, to a certain degree, be deconvolved. Several algorithms have been used to reconstruct images from the HST, both before and after the installation of corrective optics. Two of these algorithms, the Richardson-Lucy method and the maximum entropy method, are probabilistic. We are going to describe how wavelets are being used to improve the performance of a deterministic approach called interactive deconvolution with error analysis (IDEA). This algorithm was developed in the late 1980s by Lannes and his colleagues [169]. As stressed by the astronomy community, the main advantage of IDEA over competing algorithms is the fact that it provides precise error bounds.

12.1.3 IDEA

The problem is to extract an image from the data f_i, which is modeled by (12.1). This happens to be an ill-posed inverse problem; it does not satisfy the three conditions of Hadamard, namely, the existence, uniqueness, and stability of the solution. To get a feeling for this situation, take the Fourier transform of both sides of (12.1). Then

    f̂_i(ξ) = p̂(ξ) f̂_o(ξ) + n̂(ξ),    (12.2)

and recovering f̂_o(ξ) means dividing both f̂_i(ξ) and n̂(ξ) by p̂(ξ). It is clear that problems arise where p̂(ξ) vanishes or where |p̂(ξ)| ≪ |n̂(ξ)|.
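The instability of naive Fourier inversion can be demonstrated directly. In the sketch below (illustrative parameters throughout, not an HST simulation), dividing by p̂(ξ) at frequencies where it is tiny amplifies the noise catastrophically, while simply refusing to invert at those frequencies already gives a usable estimate:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 256

# Smooth toy object and a Gaussian PSF centered at index 0 (hypothetical).
t = np.arange(N)
f0 = np.exp(-((t - 128) / 12.0) ** 2)
psf = np.exp(-(((t + N // 2) % N - N // 2) / 3.0) ** 2)
psf /= psf.sum()

p_hat = np.fft.fft(psf)
fi_hat = p_hat * np.fft.fft(f0) + np.fft.fft(1e-6 * rng.standard_normal(N))

# Naive inversion: divide by p_hat everywhere, including where it is tiny.
naive = np.real(np.fft.ifft(fi_hat / p_hat))
naive_err = np.max(np.abs(naive - f0))

# Truncated inversion: invert only where |p_hat| is safely above the noise.
mask = np.abs(p_hat) > 1e-3
reg = np.real(np.fft.ifft(np.where(mask, fi_hat / p_hat, 0.0)))
reg_err = np.max(np.abs(reg - f0))

print(naive_err, reg_err)   # the naive error dwarfs the truncated one
```

This crude truncation is itself a first, rough form of regularization: it trades a small bias (the discarded high frequencies) for an enormous reduction in noise amplification.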
IDEA is a fairly complex algorithm designed to circumvent these problems. To apply IDEA, one must bring to the process information that does not reside in equation (12.1), so-called a priori information. This a priori information is used to force a solution of the ill-posed problem. We will outline the main features of IDEA, which existed as a stand-alone algorithm before wavelets entered the picture, and then we will show how wavelet techniques are being used to improve the performance of the original algorithm. We emphasize that IDEA is not a wavelet algorithm: Using IDEA means working with the Fourier transform and not the wavelet transform. (A detailed description of IDEA can be found in [169].) Since IDEA is a regularization algorithm, we begin with a few words about Tikhonov's regularization of ill-posed problems (see [248]). As above, the problem to be solved is described by an equation of the form

Y = TX + αZ, (12.3)

where T is a compact operator acting on some Hilbert space H, X is the object we wish to recover, Z is an additive noise, α > 0 is a parameter, and Y is the observed data. Tikhonov's regularization can be described in the context of operator theory
190 CHAPTER 12

or in a more concrete form. In the abstract setting, the regularization depends on a positive number η > 0 and reads

X_η = (T*T + ηI)^{-1} T*Y, (12.4)

where T* is the adjoint of T and I is the identity operator. Observe that T*T + ηI has an inverse if η > 0. At a formal level, X_η = T^{-1}Y if η = 0. But this inverse may not exist, and (12.4) provides us with an approximate inverse. The second version of Tikhonov's regularization uses a singular-value decomposition. There exists an orthonormal basis e_0, e_1, ..., e_n, ... for the Hilbert space H that consists of the eigenfunctions of the compact self-adjoint operator T*T. Let λ_n² > 0 be the corresponding eigenvalues. In both versions of Tikhonov's algorithm, T*Y is decomposed as

T*Y = α_0 e_0 + α_1 e_1 + ... + α_n e_n + ...,

and in the first version we have

X_η = α_0 w_0 λ_0^{-2} e_0 + α_1 w_1 λ_1^{-2} e_1 + ... + α_n w_n λ_n^{-2} e_n + ..., (12.5)

where the weights w_n = λ_n²/(λ_n² + η) are in the interval (0, 1). These weights serve to regularize the divergent series α_0 λ_0^{-2} e_0 + α_1 λ_1^{-2} e_1 + ... + α_n λ_n^{-2} e_n + .... We can go further and introduce other weights w_n, w_n ∈ (0, 1), in (12.5). The data are the α_n, n ≥ 0, and the weights indicate our trust in the data. When T is a convolution operator, it is diagonalized by the Fourier transform. This transform plays the role of the eigenfunction expansion we have seen above. In this form, the weighting coefficients w_n are replaced by a weighting function g, which will appear in IDEA and plays a similar role. However, the IDEA algorithm is an improvement over pure Tikhonov regularization. Tikhonov's regularization is a linear algorithm, and it does not offer the possibility to use the specific (or a priori) information we may have about the object to be recovered. Once the small parameter η or the weights w_n, n ≥ 0, are fixed, they determine a smoothing operator W with the property that W(e_n) = w_n e_n.
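The second, singular-value version of Tikhonov's regularization is easy to sketch numerically. Everything below (the finite-dimensional operator T, the object X, the noise level, and the value of η) is our own toy setup, not from the book.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical finite-dimensional stand-in for the compact operator T:
# a severely ill-conditioned Gaussian convolution-like kernel.
m = 40
i, j = np.meshgrid(np.arange(m), np.arange(m), indexing="ij")
T = np.exp(-0.5 * ((i - j) / 2.0) ** 2) / m

X = np.zeros(m)
X[10:15] = 1.0                                   # object to recover
Y = T @ X + 0.001 * rng.standard_normal(m)       # data, as in (12.3)

# SVD: the eigenvalues of T*T are the squared singular values lam**2,
# and the rows of Vt play the role of the eigenbasis e_n.
U, lam, Vt = np.linalg.svd(T)
alpha = Vt @ (T.T @ Y)                           # coefficients of T*Y

eta = 1e-6
w = lam**2 / (lam**2 + eta)                      # Tikhonov weights, in (0, 1)

# (12.5): X_eta = sum_n alpha_n * w_n * lam_n**(-2) * e_n, computed here
# as alpha_n / (lam_n**2 + eta) to avoid dividing by tiny lam_n.
X_eta = Vt.T @ (alpha / (lam**2 + eta))
X_naive = Vt.T @ (alpha / lam**2)                # unweighted series: blows up

print(np.linalg.norm(X_eta - X), np.linalg.norm(X_naive - X))
```

The weighted sum stays close to the object, while the unweighted series is destroyed by the noise carried on the small singular values, which is the divergence the weights w_n are there to suppress.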
This smoothing or averaging serves to "kill" the noise and to compensate for the "bad" behavior of the unbounded operator T^{-1}. In image processing, this smoothing introduces a systematic blurring of the image and destroys the sharp localization of the edges. The weighting function in the IDEA algorithm is defined in the Fourier domain, and it is determined by preprocessing the given image. Furthermore, IDEA uses geometric information about the image to be reconstructed that we introduce as a priori information. The IDEA algorithm acts in the Fourier domain, but at the same time, it keeps track of the a priori information, which is known in the space domain. We have described Tikhonov's regularization to provide general background about regularization algorithms, but we wish to stress again that the regularization used in IDEA has a different meaning. Here the regularization of the ill-posed problem involves imposing a priori constraints on the object we wish to reconstruct. With this background, we are ready to be more specific about the algorithm itself. IDEA depends crucially on a function σ_t defined in the Fourier domain that provides a pointwise upper bound on the error function n̂, that is,

|f̂_i(ξ) − p̂(ξ)f̂_o(ξ)| ≤ σ_t(ξ). (12.6)
The quality of the performance of IDEA depends on the quality of this estimate, and it is here that wavelets enter the picture. More precisely, wavelets are used to determine σ_t. We will explain how this is done in a moment, after describing IDEA. The first step in the IDEA algorithm is to "regularize" the support of the transfer function p̂. If P is the essential support of p̂, choose P_r to be a disc of radius r that contains P. (We are assuming two-dimensional optical images, although IDEA can be formulated more generally [169].) P_r will be the synthetic aperture of the system. Because of the practical limitations of telescopes and other technology involved in modern astronomy, it is hopeless to expect that f_o can be reconstructed at "its highest level of resolution." The object to be reconstructed is thus defined to be a smoothed version of f_o, namely,

f̂_s(ξ) = ŝ(ξ)f̂_o(ξ). (12.7)

The main conditions on ŝ are that most of its energy is concentrated in P_r and that ŝ(0) = 1. One also wants the support of s to be as small as possible, concentrated around x = 0. It is shown in [169] that s can be taken to be a prolate spheroidal function. The support V of f_s, whose size and shape are determined interactively in a wavelet-assisted application of IDEA, plays an important role in the algorithm. We stress that at this point neither f_s nor f_o is known. Our first approximation of f_s will be f_t, which is defined below. This first "guess" mimics (12.7), but it also takes into account the fact that the data are noisy. The idea is that information buried in the noise should be discarded. This leads to the following procedure, which relies on the computation of σ_t, to be described in a moment. The function

SNR(ξ) = |f̂_i(ξ)| / σ_t(ξ) (12.8)

defines a pointwise signal-to-noise ratio in the frequency space. This function is used to decide where the information given by f̂_i should be retained and where it should be discarded as being too noisy.
To this end, one chooses a threshold value a_t that is greater than one, but of order one, and defines

f̂_t(ξ) = f̂_i(ξ) if SNR(ξ) ≥ a_t, and f̂_t(ξ) = 0 otherwise. (12.9)

It is f_t that is now used to find the "reconstructed object" that we call f_r, and once again SNR enters the picture. This time SNR is used to define a weight function g(ξ). Having defined g, f_r is defined to be the function that minimizes the functional

Φ(f) = ∫ g²(ξ) |f̂_t(ξ) − f̂(ξ)|² dξ. (12.10)

The minimum is taken over all f ∈ L²(V), where V is the support of f_s. V is determined interactively and is part of the a priori information. The initial choice of V is described below.
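The pointwise signal-to-noise ratio (12.8) and the truncation (12.9) can be sketched in a few lines. The one-dimensional frequency-domain data, the constant bound σ_t, and the threshold a_t below are our own illustrative assumptions; in IDEA, σ_t is estimated from the data with wavelets, as described later.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 1-D frequency-domain data: a smooth "true" spectrum with
# its energy at low frequencies, plus a complex noise floor.
xi = np.fft.fftfreq(512)
fhat_true = np.exp(-200.0 * xi**2)
noise = 0.01 * (rng.standard_normal(512) + 1j * rng.standard_normal(512))
fhat_i = fhat_true + noise

# Assumed pointwise bound on the error, playing the role of sigma_t in (12.6).
sigma_t = 0.02 * np.ones(512)

# Pointwise signal-to-noise ratio (12.8) and the truncation (12.9).
snr = np.abs(fhat_i) / sigma_t
a_t = 3.0                        # threshold of order one, greater than one
fhat_t = np.where(snr >= a_t, fhat_i, 0.0)

print(np.count_nonzero(fhat_t), "of", fhat_t.size, "frequencies retained")
```

Only the low frequencies, where the data rise well above the noise floor, survive; the rest of the spectrum is discarded as too noisy, which is exactly the role of f̂_t as the first "guess" for f̂_s.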
One has some freedom in defining g. It should be a nondecreasing function of SNR such that 0 ≤ g(ξ) ≤ 1. In addition, g must vanish on the parts of P_r where SNR(ξ) < a_t and be equal to one outside P_r. One way to define g is to select a threshold value a'_t with a_t < a'_t < sup_ξ SNR(ξ) and let

g(ξ) = 1 if SNR(ξ) ≥ a'_t,
g(ξ) = (SNR(ξ) − a_t) / (a'_t − a_t) if a_t ≤ SNR(ξ) < a'_t, (12.11)
g(ξ) = 0 if SNR(ξ) < a_t.

It is clear that g measures the confidence that can be attributed to the spectral information furnished by the Fourier transform of the noisy image. This is but a brief outline of the principal objects that are used in the IDEA algorithm. The algorithm itself is iterative, and we encourage interested readers to consult the cited papers for a detailed description. The purpose of this discussion has been to present just enough background so that one can show how wavelet techniques have been incorporated in IDEA, which, as mentioned above, existed as an effective algorithm before being wavelet assisted. In particular, we hope it is clear that the function SNR plays a key role in IDEA and that a good estimate for the function σ_t should contribute to the quality of the results. In all versions of IDEA, it is necessary to estimate σ_t, and indeed there are prewavelet techniques for doing this. In the wavelet-assisted version of IDEA, σ_t is estimated using the denoising technique described in Chapter 11 called soft thresholding. This is how it is applied by Roques and her collaborators [232]:

Step 1: Compute the empirical wavelet coefficients z of the scaled noisy data n^{-1/2} f_i, where n is the number of data points or pixels. This transform is computed using the two-dimensional version of the wavelets adapted to an interval introduced by Cohen, Daubechies, and Vial [58].

Step 2: Apply wavelet shrinkage (soft thresholding) to these empirical wavelet coefficients:

g_t(z) = sign(z) max{0, |z| − t}, with the threshold t = σ √(2 log n / n).

Here σ² is the variance of the noise; we address it below. Note that this operation "shrinks" the wavelet coefficients toward zero by t and sets the coefficients with modulus less than t equal to zero.

Step 3: Invert the wavelet transform to produce a denoised image f_d and define σ_t by

σ_t(ξ) = |f̂_i(ξ) − f̂_d(ξ)|.

The variance σ² used in the Donoho algorithm is estimated by analyzing a part of the field defined by f_i that contains no image. Recall that the denoising described in Chapter 11 is based on two assumptions: The noise is Gaussian and the image is geometric. The latter of these assumptions is clearly not satisfied for astronomical
images, and the former is often violated. In particular, one of the components of noise in experimental astronomy may come from photon counters, where the noise is Poisson. In this case, astronomers transform the noisy data to make the noise "look" Gaussian and proceed to act as if it were Gaussian. They replace f_i by 2√f_i; a more complicated transformation is used for mixed noise (see [4]). There is another point, in addition to estimating σ², where wavelets "assist" IDEA: The denoised image f_d is used to choose the initial value of V, which is the support for the deconvolution and thus an important piece of a priori information. In the actual algorithm, the set V is improved dynamically. V also appears in the interpolation parameter ν = (∫ v(x) dx)(∫ (1 − g²(ξ)) dξ), where v is the characteristic function of V. The value of ν provides information about the stability of the reconstruction process. An obvious question is, Why not just use the image f_d? As Roques and her colleagues show in [232], f_d is indeed a low-noise image, but the resolution has not been improved: The image has been denoised but not deblurred. The companion question is, Why not do the deconvolution using an estimate of the noise similar to the one used to apply shrinking? Again, it is shown in [232] that the combined processes produce better images, at least for the very faint images obtained with the HST. Of course, this brings up the question, What is a good image? Astronomers must judge the quality of the image based on experience. They also have more objective (mathematical) ways to measure the photometric and astrometric13 quality of the restored image. One naturally wants to have as high a resolution as possible without introducing artifacts, but it is the astronomer who must differentiate artifact from image.
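Steps 1 to 3 above can be sketched in one dimension. This is our own toy version: it uses a plain orthonormal Haar transform instead of the two-dimensional interval wavelets of [58], applied to unscaled data, for which the universal threshold reads t = σ√(2 log n); with the n^{-1/2} scaling used in the text it becomes σ√(2 log n / n).

```python
import numpy as np

def haar_fwd(x):
    """Orthonormal 1-D Haar analysis; returns coarsest value and details."""
    details = []
    while len(x) > 1:
        s = (x[0::2] + x[1::2]) / np.sqrt(2.0)   # smooth part
        d = (x[0::2] - x[1::2]) / np.sqrt(2.0)   # detail part
        details.append(d)
        x = s
    return x, details

def haar_inv(s, details):
    """Inverse of haar_fwd."""
    for d in reversed(details):
        x = np.empty(2 * len(s))
        x[0::2] = (s + d) / np.sqrt(2.0)
        x[1::2] = (s - d) / np.sqrt(2.0)
        s = x
    return s

def soft(z, t):
    """Donoho's shrinkage g_t(z) = sign(z) * max(0, |z| - t)."""
    return np.sign(z) * np.maximum(0.0, np.abs(z) - t)

rng = np.random.default_rng(3)
n = 1024
clean = np.where((np.arange(n) >= 300) & (np.arange(n) < 600), 1.0, 0.0)
sigma = 0.2
noisy = clean + sigma * rng.standard_normal(n)

t = sigma * np.sqrt(2.0 * np.log(n))             # threshold for unscaled data
s, details = haar_fwd(noisy)
f_d = haar_inv(s, [soft(d, t) for d in details])  # denoised signal

print(np.linalg.norm(noisy - clean), np.linalg.norm(f_d - clean))
```

Almost every pure-noise coefficient falls below t and is set to zero, while the few large edge coefficients are merely shrunk, so the denoised signal f_d is much closer to the clean one; as the text explains, it is denoised but not deblurred.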
As stressed before, IDEA has the advantage over competing algorithms of providing an estimate for the relative error

‖f_r − f_s‖ / ‖f_r‖.

We note that the people who invented IDEA have benefited from unforeseen good luck: They have been able to compare their deconvolved images with those obtained by the HST after the COSTAR correction was made. This comparison has led to these conclusions: (1) The IDEA algorithm produces corrected images that are closer to the "true" images than the images obtained by denoising methods traditionally used in astronomy. (2) The "true" images obtained after the COSTAR correction are better than those obtained by IDEA, which is not surprising. (3) The IDEA algorithm allows one to improve further the images obtained by the corrected telescope. Tests leading to these conclusions were made on images of the supernova SN1987A, which are particularly simple and spectacular. There is a bright core together with a well-delimited extended object, the ring (see [39]).

13 Photometric refers to the local conservation of photons, and astrometric refers to the preservation of the geometry of the image.
12.2 Data compression

We are speaking about the problems of storing and transmitting the data acquired by the world's astronomical observatories. As in the last section, we are looking at a technologically driven problem: The overall quality of telescopes is much greater today than it was 50 years ago. Astronomers were able to capture ten million galaxies in 1950; today they can examine 100 million galaxies. The very nature of the images coming from these instruments has undergone a revolution. Charge-coupled devices (CCDs) have replaced silver salts, and chemical photography is almost a thing of the past. We read in [38] that telescopes typically use 2048 x 2048 CCDs at their focus. With 16 bits per pixel this leads to an 8-megabyte image. As an example of the amount of data generated, the Canada-France-Hawaii Telescope generates about 100 images each night, which translates to as much as 800 megabytes per night [251]. Planned future telescopes will generate about 10 gigabytes per night. All of this data must be stored, preferably in a form that offers reasonably easy access. These problems are reminiscent of those posed by the storage of fingerprints. For comparison, the FBI database contains about 200 million fingerprint records, and they receive on the order of 30,000 new cards per day, which is about 300 gigabytes each day [41]. (A set of fingerprints amounts to around 10 megabytes.) The comparison does not end there. With the advent of computers and communication networks, both astronomical images and fingerprints are now transmitted around the world, and compression is an economic necessity for both storage and transmission. Wavelets have been used to compress astronomical images since the late 1970s. G. M. Richter and others used Haar functions to compress astronomical data, which at that time came mainly from Schmidt plates (see [230] and [120]). These were scanned automatically and the data were compressed.
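The image and per-night volumes quoted above can be checked in a couple of lines (taking 1 megabyte = 2^20 bytes):

```python
# Back-of-the-envelope check of the data volumes quoted in the text.
pixels = 2048 * 2048          # one CCD frame
bytes_per_pixel = 16 // 8     # 16 bits per pixel
image_mb = pixels * bytes_per_pixel / 2**20
night_mb = 100 * image_mb     # about 100 images per night

print(image_mb, night_mb)     # -> 8.0 800.0
```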
This was before modern wavelet theory, in particular, before the introduction of multiresolution analysis, and Richter's transform differed from the two-dimensional Haar transform related to a multiresolution analysis. Since then the transform has been revised to be a "true" wavelet transform associated with a multiresolution analysis. The Space Telescope Science Institute uses an algorithm called hcompress to compress the Digital Sky Survey, which is a database of images covering the whole sky. This algorithm consists in taking the two-dimensional Haar transform (called the H-transform by astronomers) of the digitized image and then quantizing the wavelet coefficients w_{j,k} (called H-coefficients) using an arbitrary threshold. We will describe a more elaborate version of this algorithm called ht_compress that was developed by Yves Bobichon and Albert Bijaoui [38]. The ht_compress scheme differs from hcompress in the way the thresholds for the wavelet coefficients are determined. After describing the compression scheme, we will describe a regularized decompression algorithm also proposed by Bobichon and Bijaoui [38]. As in the case of IDEA, the regularization is not pure Tikhonov, since the Bobichon-Bijaoui scheme uses a priori information to provide a smooth restored image.

12.2.1 ht_compress

We illustrate the algorithm in one dimension. Thus, assume that f is a (noisy) signal defined on the integers l = 0, 1, 2, ..., N − 1, where N = 2^p for some positive integer p. The first step is to estimate the standard deviation σ_0 of the noise in the original signal f. If the noise is not Gaussian, it is transformed as indicated in the last
section so that it can be treated as Gaussian [4]. Knowing σ_0, and assuming uncorrelated Gaussian noise, one can deduce the standard deviation σ_j of the noise in the wavelet coefficients w_{j,k} at scale j. With these assumptions, the standard deviation at scale j + 1 is related to that at scale j by the relation σ_{j+1} = σ_j/√2. The second step is to compute the Haar transform:

f_{j+1,k} = Σ_l f_{j,l} h(l − 2k), w_{j+1,k} = Σ_l f_{j,l} g(l − 2k).

The two-term filters are defined by

h(n) = 1/2 if n = 0 or n = 1, and h(n) = 0 otherwise, (12.12)
g(n) = 1/2 if n = 0, g(n) = −1/2 if n = 1, and g(n) = 0 otherwise. (12.13)

Note that the original function f can be recovered using the equations

f_{j,k} = 2 Σ_l [f_{j+1,l} h̃(k − 2l) + w_{j+1,l} g̃(k − 2l)], (12.14)

where h̃ = h and g̃ = g for the Haar transform. In the next step, the Haar coefficients w_{j,k} are replaced with the w'_{j,k} defined by

w'_{j,k} = 0 if |w_{j,k}| < κσ_j, and w'_{j,k} = w_{j,k} if |w_{j,k}| ≥ κσ_j.

The positive parameter κ controls the compression ratio, once σ_0 is determined. The coefficients w'_{j,k} are quantized by forming the quotient

q_{j,k} = w'_{j,k} / (κσ_j) (12.15)

and defining q'_{j,k} to be the integer nearest to q_{j,k}. (To avoid ambiguity, shrink q_{j,k} toward zero when it falls exactly between two integers.) Finally, the coefficients q'_{j,k} are coded using a lossless 4-bit hierarchical coding scheme (see [146]). The coded coefficients can now be stored or transmitted. For example, it is possible to buy the complete Digital Sky Survey on 102 CD-ROMs compressed by a factor of 10 or on 18 CD-ROMs (8 for the northern sky and 10 for the southern sky) compressed by a factor of 100. These are available from the Space Telescope Science Institute, and, as indicated above, the compression algorithm is hcompress. Astronomers can also download images from the Space Telescope Science Institute. Furthermore, because the compression is based on a multiresolution analysis, the images can be downloaded and restored scale by scale, beginning with the largest scale.
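One analysis level of this scheme can be sketched as follows. The test signal, κ, and σ_0 are our own illustrative choices, and we use numpy's rint (which rounds exact ties to even) rather than the shrink-toward-zero tie rule of the text.

```python
import numpy as np

rng = np.random.default_rng(4)

# One level of the H-transform (12.12)-(12.13), followed by the
# ht_compress thresholding and quantization steps.
N = 512
clean = np.zeros(N)
clean[101:141] = 10.0                            # a crude "source"
sigma0 = 1.0
f0 = clean + sigma0 * rng.standard_normal(N)

# Analysis: averages and differences; each level halves the noise variance.
f1 = (f0[0::2] + f0[1::2]) / 2.0                 # f_{j+1,k}, filter h
w1 = (f0[0::2] - f0[1::2]) / 2.0                 # w_{j+1,k}, filter g
sigma1 = sigma0 / np.sqrt(2.0)                   # sigma_{j+1} = sigma_j/sqrt(2)

# Threshold at kappa*sigma_j, then quantize in units of kappa*sigma_j.
kappa = 3.0
w1_thr = np.where(np.abs(w1) >= kappa * sigma1, w1, 0.0)
q = np.rint(w1_thr / (kappa * sigma1)).astype(int)   # integers q'_{j,k}

# Decoder side: dequantize and apply the inverse transform (12.14).
w1_deq = kappa * sigma1 * q
rec = np.empty(N)
rec[0::2] = f1 + w1_deq
rec[1::2] = f1 - w1_deq

print(np.count_nonzero(q), "of", q.size, "detail coefficients kept")
```

In the full scheme the averages f_{j+1} are themselves transformed again, so the analysis, thresholding, and quantization are repeated scale by scale; it is this multiresolution structure that allows the progressive, coarse-to-fine restoration just mentioned.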
This means that an astronomer can stop the process once it is determined that the image is good enough for the task at hand. Bijaoui points out that it is
essential to have a correct idea of transmitted images as fast as possible for control during astronomical observations [36]. Unfortunately, the direct restoration of the transmitted (or stored) data using (12.14) can lead to some unpleasant images. To obtain reasonable compression ratios, many of the original wavelet coefficients are set equal to zero, and others are quantized as multiples of κσ_j. The result is that the restored image contains relatively large fields of pixels having the same value with abrupt discontinuities between the fields. These blocking effects are the signature of Haar compression (see Figure 2.1). One might expect that the use of smooth wavelets would give better results; however, in this case, going to a longer filter does not seem to be the solution. As Bijaoui remarked [36, p. 85]:

Press [227] has introduced the Daubechies filter of length 4. The compression and uncompression algorithms take more time than hcompress and the quality of the resulting measurements is generally less than those obtained with the simple Haar transform for astronomical images. This could be due to the specificities of these images, mainly compound of peaks due to the stars. The correlation length is very short, and it is not relevant to process the data with long filters.

This may be a victory for the Haar transform, but if a longer filter is not the answer, what is? Several solutions for producing a smoother image have been proposed; see, for example, [176] where Kalman filtering is applied and [259] where interpolation is used. We will outline a solution proposed by Bobichon and Bijaoui [38]; it is an inverse for their ht_compress algorithm.

12.2.2 Smooth restoration

Recall that the final coding was lossless, which means that we can recover the coefficients q'_{j,k} exactly. We can also multiply the q'_{j,k} by κσ_j to obtain a new set of wavelet coefficients w̃_{j,k} = κσ_j q'_{j,k}.14
If the inverse Haar transform (12.14) is applied directly to the truncated and quantized coefficients w̃_{j,k}, the resulting image will certainly have unpleasant blocking effects. Bobichon and Bijaoui produce a smooth restored image scale by scale, beginning with the largest scale j = p. We speak of images, but for simplicity, we continue to illustrate the algorithm in the one-dimensional case. The Bobichon-Bijaoui algorithm produces a smooth restored image f_j at each scale j by minimizing the energy of the gradient of f_j subject to certain constraints. To see how this works, we write (12.14) as the operator equation

f_j = H f_{j+1} + G w_{j+1} (12.16)

and let D denote the first derivative (difference) operator. The restored image at scale j is defined to be the solution of the minimization problem

inf ‖D(H f_{j+1} + G v_{j+1})‖², (12.17)

14 In this section, the tilde does not indicate the Fourier transform.
subject to the following constraints. The first constraint is that f_{j,k} ≥ 0. This is the a priori information that the image (without noise) is given by a positive function that measures gray levels. The second constraint limits the values v_{j,k} can take in (12.17). If the coefficient w̃_{j,k} = 0, we know that the original wavelet coefficient with index j, k satisfied the condition |w_{j,k}| < κσ_j, and this condition is imposed on v_{j,k} as it competes in the minimization (12.17). Similarly, if w̃_{j,k} = κσ_j q'_{j,k} ≠ 0, we know that

κσ_j (q'_{j,k} − 1/2) ≤ w_{j,k} ≤ κσ_j (q'_{j,k} + 1/2),

and the same condition is imposed on v_{j,k}. The algorithm used to solve this minimization problem is an iterative process that passes back and forth between physical space and wavelet space, using the constraints in the two spaces. It would take us too far afield to go further into the details of the algorithm; we encourage the interested reader to consult [38].

12.2.3 Comments

We emphasize that this algorithm proceeds scale by scale, and as pointed out above, this is important to the astronomer: It can save time and money. We mentioned in the last section that astronomers have ways to measure the astrometric and photometric qualities of a restored image. The restoration algorithm we have outlined scores well on both points. The reader surely has noted the similarities between this restoration algorithm and IDEA. In both algorithms there were constraints imposed on the restored image (positivity in the Bobichon-Bijaoui algorithm and the support of the restored image in IDEA) and constraints imposed on the transform (P and s in IDEA and constraints on the v_{j,k} in the Bobichon-Bijaoui algorithm). We note that there are several other compression and decompression algorithms being proposed and used in astronomy. We mention, in particular, the pyramidal median transform developed by Jean-Luc Starck and his colleagues [241].
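To make the constrained minimization (12.16)-(12.17) concrete, here is a toy one-level version using projected gradient descent. This is our own simplification, not the iterative scheme of [38]: it handles a single scale, keeps each coefficient v_{j,k} inside its quantization interval by clipping, and omits the positivity constraint on f_j.

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy setup: a restored coarse image f_{j+1} and decoded integers q'_{j,k}.
m = 64
coarse = rng.random(m) + 1.0
kappa_sigma = 0.5
q = rng.integers(-2, 3, size=m)

# Constraint intervals for v_{j,k}: |v| <= kappa*sigma if q = 0, else
# kappa*sigma*(q - 1/2) <= v <= kappa*sigma*(q + 1/2).
lo = np.where(q == 0, -kappa_sigma, kappa_sigma * (q - 0.5))
hi = np.where(q == 0, kappa_sigma, kappa_sigma * (q + 0.5))

def reconstruct(v):
    """One inverse Haar step, playing the role of H f_{j+1} + G v_{j+1}."""
    x = np.empty(2 * m)
    x[0::2] = coarse + v
    x[1::2] = coarse - v
    return x

def grad_energy(v):
    """Gradient of sum((D x)**2) with respect to v, by the chain rule."""
    d = np.diff(reconstruct(v))       # D applied to the reconstruction
    g = np.zeros(2 * m)
    g[:-1] -= 2 * d
    g[1:] += 2 * d
    return g[0::2] - g[1::2]

v = np.clip(kappa_sigma * q.astype(float), lo, hi)    # start at w~_{j,k}
e0 = np.sum(np.diff(reconstruct(v)) ** 2)
for _ in range(200):
    v = np.clip(v - 0.05 * grad_energy(v), lo, hi)    # projected step
e1 = np.sum(np.diff(reconstruct(v)) ** 2)
print(e0, e1)
```

Each iteration lowers the gradient energy of the reconstruction while keeping every coefficient inside the interval that the quantized data allow, which is the essential idea behind the smooth restoration.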
A study comparing compression algorithms using Schmidt plate data done at the Strasbourg Data Center has shown that a compression ratio of 260 to 1 can be obtained with acceptable quality using the pyramidal median transform algorithm. The same study showed that the limit for the JPEG standard with the same quality was only about 40 to 1. Readers interested in the details of these techniques can consult the recent book by Starck, Murtagh, and Bijaoui [240]; another source is the website www-dapnia.cea.fr.

12.3 The hierarchical organization of the universe

This last section concerns a much more ambitious program that demands considerable computing power as well as new ideas about how to deal with the information. The program, initiated and developed by Albert Bijaoui, seeks to determine the hierarchical structure in the universe. For example, our planetary system is part of the Milky Way, which itself is included in a much larger structure called the Local Group. According to Hubert Reeves [229, pp. 40, 41]:

The Local Group consists of around twenty galaxies in the neighborhood of our own, within a radius of about five million light years. Andromeda and the two clouds of Magellan are part of this cluster.
The galactic clusters, are they themselves organized into larger units? It seems indeed to be the case. One then speaks of a supercluster. Our Local Group is part of the Virgo supercluster. A supercluster contains several thousand galaxies in a volume whose dimensions are measured in tens of millions of light years.

Ideas rarely have well-defined beginnings; consider wavelets and the historical account in Chapter 2. This is definitely the case for the idea of a hierarchically structured universe. Edward Harrison in his delightful book Darkness at Night, a Riddle of the Universe [139] cites several authors and sources where the notion of a hierarchical, or even fractal, structure is suggested more or less explicitly:

Emanuel Swedenborg, 1734, Principia Rerum Naturalium.
Immanuel Kant, 1755, Universal Natural History and Theory of the Heavens.
Johann Lambert, 1761, Cosmological Letters.
Edward Fournier d'Albe, 1907, Two New Worlds.
Charles Charlier, 1922, How an Infinite World May Be Built Up.

(The twentieth-century references are [113] and [48]. Detailed references to the eighteenth-century works can be found in [139].) Harrison referred to these sources in the context of his book, which is devoted to a historical and scientific account of the riddle: Why is the sky dark at night? We mention these sources to emphasize that the notion that the large-scale structure of the universe might be hierarchical goes back to at least the eighteenth century. By contrast, it is only as recently as 1924 that Edwin Hubble firmly established that ours is not the only galaxy. The history of cosmology is the history of competing views of the cosmos, and the idea that the Milky Way was the only galaxy was a popular model in the nineteenth century.
Harrison points out that the famous astronomer William Herschel, who at one time supported the idea of many galaxies, later in life, "lost his confidence, renounced the many-island universe of Wright and Kant, and adopted a one-island universe. Following his lead, the one-island universe was widely adopted in astronomical circles in the nineteenth century." The situation is vastly different today. The fact that the universe is expanding, as predicted by the Russian physicist Alexander Friedmann in 1922, has been well established since the 1930s, and the controversy that thrived in the middle of the twentieth century between proponents of a steady-state universe and those who supported the notion of a big bang tilted definitively in favor of the latter with the discovery of the residual background radiation by Arno Penzias and Robert Wilson in 1965. This discovery led to the serious study of the implications of a big bang and the development of cosmological scenarios to explain how the universe got from t = 0 to what is observed today, which at certain scales is a rather lumpy universe. As noted by Slezak and others [237, p. 517]:

The complexity of the distribution of galaxies and of clusters of galaxies is now clearly established up to scales of 50 h⁻¹ Mpc .... The main feature of the galaxy distribution is the departure from homogeneity at all scales within reach. The topology of the distribution is characterized by a complex network of sharp structures, one-dimensional filaments (Giovanelli et al. 1986) or two-dimensional sheets (de Lapparent et al. 1986) suggesting a cell-like geometry .... The high-density structures appear to connect clusters of galaxies and delineate
large spherical regions which are devoid of bright galaxies (de Lapparent et al. 1986; Pellegrini, da Costa, & de Caravalho 1989).15

Qualitative observations like these lead to one of the outstanding problems in modern cosmology, which roughly stated is this: How, starting from a relatively homogeneous initial state, has the universe evolved into a structure that "departs from homogeneity at all scales within reach"? Particle physicists who speculate on the origins of the big bang tell us that the "initial conditions," or at least conditions at, say, t = 10^{-30}, were never homogeneous. At the top of the scientific hit parade for 1989 were the results provided by the Cosmic Background Explorer satellite, known as COBE, which showed that indeed there were very small variations (1 part in 100,000) in the residual radiation from the big bang. This evidence supports, and does not contradict, the big bang theory, but it does not change the problem stated above. It does, however, provide limits within which the problem is to be resolved. Given the problem, what experimental data exist with which one can start work? It is easier to say what does not exist: We do not have a nice three-dimensional map of the universe! The first data available were two-dimensional maps of galaxies. For example, in [236], Slezak, Bijaoui, and Mars identified about 7600 galaxies up to magnitude 19 from Schmidt plates in a 6° x 6° field at the eastern end of the Coma supercluster. The data for [237] is a redshift survey that comes from the Center for Astrophysics. Each strip is 135° wide in right ascension and 6° thick in declination. This is again basically two-dimensional data, where the redshift measures the distance from earth.
Forget for the moment that the data are not ideal (there are probably many low-surface-brightness galaxies that have been missed, and the distance measurements are not perfect) and assume provisionally that we have a good three-dimensional map of the galaxies in a chunk of the universe. Ideally we would like to use this map to check various theories (scenarios) describing the evolution of the universe. One way to do this is to run numerical simulations of different scenarios, for example, the classical cold dark matter (CDM) model or the hot dark matter (HDM) model. This has been done, and one ends up with a simulated universe in a box 192 Mpc on a side. And indeed the results from the two scenarios look different (see [170] and [36]). But clearly it is not enough to look different; one wishes to have an objective measure, and this is one place where wavelet analysis can make a contribution. Given real data, or even our ideal three-dimensional data, it is very difficult to use the data to define clusters, superclusters, and other perceived objects. To "see" these nested structures with objectivity is an extremely difficult problem. Slezak, Bijaoui, and Mars tell us some of the history of this research [236, p. 301]:

After the visual identification of clusters on the POSS plates by Abell (1958) and Zwicky et al. (1961-1968), many computer algorithms were introduced to avoid a personal judgment, like cluster analysis (Materne, 1978; Huchra and Geller, 1982; Tago et al., 1984) or contrast analysis (Dodd and Mac Gillivray, 1986). In particular, with these tools or similar ones, the existence of substructures in clusters would be established for an important fraction of rich clusters (Geller and Beers, 1982; Baier, 1983; but see also West et al., 1988; Katgert et al., 1988). So, the distribution of galaxies cannot be reduced to the isolate groups identified

15 References cited here are given in the original article.
by the cluster analysis, but to a fuzzy hierarchic structure for which the same galaxy can belong to many entities.16

We hope with this background on the astronomical problem and with the other applications presented in the book, particularly the work of Marie Farge, that the reader sees the introduction of wavelet analysis as a natural step. Bijaoui tells us that in the late 1980s, when he first heard about wavelets, their use was not so clear. It was, he says, after he heard a lecture that Alain Arneodo gave in Nice in 1987 that he decided to try wavelet analysis for studying the large-scale structure of the universe. Since then he and his group have pioneered the application of wavelet techniques in astronomy, including innovative mixes of wavelet and statistical techniques. In addition to showing that these techniques can be used to identify clusters and superclusters of galaxies, they have shown how to identify voids, which may ultimately prove to be more significant for differentiating cosmological scenarios [237]. Furthermore, they have introduced objective parameters to measure these voids. So far the results favor an intermediate scenario, somewhere between the CDM and the HDM models.

12.3.1 A fractal universe

We have talked about using wavelet techniques to determine hierarchical structures, but the complexity of the distribution of galaxies in the universe leads one naturally to think of a fractal structure. This is the path followed by Mandelbrot [192, 193], although, as we have seen, it had been suggested earlier by Fournier d'Albe and Charlier. In fact, they were quite specific in describing possible fractal arrangements that could lead to a dark sky at night. Although Mandelbrot suggested a distribution of matter leading to a fractal universe, it seems that a multifractal approach corresponds with reality [161].
We believe that wavelets are today the best tool for analyzing fractal and multifractal structures; in addition, there is some evidence that wavelet-based techniques have the potential to reveal the rules by which complex multifractal structures were constructed. This is a much more ambitious program than "simple analysis." We mentioned this kind of program in connection with turbulence in Chapter 9. We illustrate the idea with the following simple example. The ideas come from Arneodo and his group at Bordeaux. They have been trying to elucidate the dynamical processes that generate complex fragmented structures like the Cantor triadic set. Here, briefly, is the proposed method, illustrated for the Cantor set [7]. One considers the canonical probability measure μ supported by the Cantor set. One then computes the wavelet transform of μ:

$$ W(a,b) = \frac{1}{a}\int \psi\Big(\frac{t-b}{a}\Big)\,d\mu(t). $$

The set defined by |W(a,b)| > λ, where λ is a certain threshold, is represented in the half-plane a > 0, b ∈ ℝ. One can also, for each a > 0, determine the values b at which |W(a,b)| attains a local maximum. When these maximal values are plotted, they are seen to be organized into more or less vertical lines with breaks and bifurcations.
In the two representations that we have just defined, the dynamics of the fragmentation appear in full force. Starting with the largest values of a, one moves
16. References cited here are given in the original article.
toward the small values of a. One then observes a cascade of bifurcations that constitutes a symbolic representation of the Cantor triadic set. Using the maximal lines, Arneodo has been able to reconstruct the process that leads to the measure μ; he has also been able to do this for more complex measures having support on more complex Cantor sets. We believe that in many cases all the information necessary to reconstruct the process is contained in these maximal skeletons. Can one in a similar way unravel the secrets of the fragmentation processes that have led to the structure of the galaxies? This surely seems to be an overly ambitious program. But is it today any more farfetched than were the ideas of Kant and others in the eighteenth century?
12.4 Conclusions
Wavelet-based techniques are being widely applied in astronomy and astrophysics. The review article [36] by Bijaoui cites 114 references. By now there must be well over 200 papers dealing with wavelets applied to astronomy. We believe that this work illustrates the flexibility of the ideas found in wavelet and multiresolution analysis. Astronomers have been particularly inventive in using wavelets to deal with the ubiquitous problem of noise. We have described the use of thresholding, but there are other techniques whereby the noise is dealt with in wavelet space rather than in the Fourier domain or in the original space. The number of different techniques invented is witness to the richness of the method.
While the wavelet transform itself plays an important role in the astronomer's algorithms, the general notion of multiscale processing seems to us to be more pervasive. We have noted the usefulness of progressive reconstruction in practical astronomy and the fact that the popular reconstruction algorithms, which are nonlinear regularization schemes, proceed scale by scale.
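As an aside, the maxima-lines analysis of section 12.3.1 is easy to experiment with numerically. The following sketch (ours, not from the book) approximates the Cantor measure μ by 2⁸ equally weighted atoms, computes W(a,b) with a Mexican-hat wavelet standing in for ψ, and counts the modulus maxima at each scale; the cascade of bifurcations shows up as maxima splitting when a decreases.

```python
import numpy as np

# Atoms of the canonical Cantor measure after 8 subdivision steps
pts = np.array([0.0])
for _ in range(8):
    pts = np.concatenate([pts / 3, pts / 3 + 2 / 3])
w = 1.0 / len(pts)                       # equal weights 2**-8

def mexican_hat(t):
    return (1 - t**2) * np.exp(-t**2 / 2)

b = np.linspace(0.0, 1.0, 2001)
scales = [0.5 / 3**k for k in range(5)]  # a = 0.5, 0.5/3, ..., 0.5/81

def modulus_maxima_count(a):
    # W(a,b) = (1/a) * sum_i w * psi((t_i - b)/a): a discrete stand-in for the integral
    W = (w / a) * mexican_hat((pts[None, :] - b[:, None]) / a).sum(axis=1)
    m = np.abs(W)
    return int(np.sum((m[1:-1] > m[:-2]) & (m[1:-1] >= m[2:])))

counts = [modulus_maxima_count(a) for a in scales]
```

Plotting the maxima positions against log a reproduces the branching skeleton described above: as a shrinks by factors of 3, each line of maxima splits in two, mirroring the triadic fragmentation.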
Another divergence from classical wavelet theory is the use of redundant algorithms rather than, for example, algorithms with decimation. As Bijaoui notes [36, p. 78]:
The wavelet transform is a tool widely used today by astrophysicists, but they do not apply only the discrete transform resulting from a multiresolution analysis but a large range of discrete transforms: Morlet's transform for time-frequency analysis, the à trous algorithm and the pyramidal transform for image restoration and analysis, and the pyramidal transform with Fourier transform for aperture synthesis imaging. Physical constraints generally play an important part in applying the discrete transform.
And we emphasize again the unique character of astronomical images and the need to tune the algorithms to the image and the task at hand.
Our last remark is a prediction. We have seen in Chapter 11 the intimate relations that exist between wavelet theory and nonlinear approximation. We have also noted the use of nonlinear techniques applied scale by scale in astronomy. Considering the strong interest and inventiveness astronomers have shown so far in using wavelets and multiresolution analysis, we expect more innovative applications of nonlinear techniques to follow.
APPENDIX A
Filter Fundamentals
This appendix is written for readers who are not familiar with the basic concepts and language of filters. It also provides a larger context for parts of Chapter 3.
A.1 The ℓ²(ℤ) theory and definitions
We begin with a general result about linear operators on ℓ²(ℤ).
Theorem A.1. If F : ℓ²(ℤ) → ℓ²(ℤ) is a continuous linear operator that commutes with translations, then there exists a sequence (h_k)_{k∈ℤ} ∈ ℓ²(ℤ) such that

$$ (Fx)_k = \sum_{n\in\mathbb{Z}} h_{k-n}\,x_n \qquad (A.1) $$

for all x = (x_k)_{k∈ℤ} ∈ ℓ²(ℤ). Furthermore, the function H(ω) = Σ_{k∈ℤ} h_k e^{ikω} is in L^∞(0,2π), and ‖F‖ = ‖H‖_∞. Conversely, if (h_k)_{k∈ℤ} ∈ ℓ²(ℤ) is such that H ∈ L^∞(0,2π), then (A.1) defines a continuous linear operator that commutes with translations, and ‖F‖ = ‖H‖_∞.
Proof. Assume that F : ℓ²(ℤ) → ℓ²(ℤ) is a continuous linear operator that commutes with translations, and let {e_k}_{k∈ℤ} denote the canonical basis for ℓ²(ℤ) defined by e_k(n) = 0 if n ≠ k and e_k(k) = 1. Then Fe₀ is an element of ℓ²(ℤ), which we denote by h = (h_k). Since F commutes with translations, we have

$$ Fe_k = \sum_{n\in\mathbb{Z}} h_{n-k}\,e_n \qquad (A.2) $$

for all k ∈ ℤ. We go to the spectral domain and define the operator F̂ : L²(0,2π) → L²(0,2π) in the obvious way: For X(ω) = Σ_{k∈ℤ} x_k e^{ikω} in L²(0,2π), define

$$ \hat F X(\omega) = Y(\omega) = \sum_{k\in\mathbb{Z}} y_k e^{ik\omega}, $$

where (y_k) = F((x_k)). Since the Fourier transform is an isometry, F̂ is a bounded linear operator with the same norm as F, that is, ‖F̂‖ = ‖F‖. Taking the Fourier transform of both sides of (A.2) shows that
$$ \widehat{Fe_k}(\omega) = \sum_{n\in\mathbb{Z}} h_{n-k}\,e^{in\omega} = H(\omega)\,e^{ik\omega}, $$

so by the definition of F̂, F̂(e^{ikω}) = H(ω)e^{ikω}. By linearity, F̂X_N = H X_N for any finite trigonometric sum X_N. We wish to show that this relation is true for all X ∈ L²(0,2π). This follows directly from the continuity of F̂: Assume that X_N is a finite trigonometric sum such that X_N → X in L²(0,2π). Then by continuity, F̂X_N → F̂X in L²(0,2π). Since H ∈ L²(0,2π), HX ∈ L¹(0,2π), and we have the following inequality:

$$ \|H(X_N - X)\|_1 \le \|H\|_2\,\|X_N - X\|_2. $$

The right-hand side tends to zero as N → ∞, so HX_N → HX in L¹(0,2π). This and the fact that HX_N = F̂X_N → F̂X in L²(0,2π) imply that the two functions F̂X and HX are equal almost everywhere, that is, (F̂X)(ω) = H(ω)X(ω) for almost every ω ∈ (0,2π).
At this point, it is purely a matter of measure, integration, and functional analysis to prove that H ∈ L^∞(0,2π) and that ‖H‖_∞ = ‖F‖. The general result is this: Let (X,μ) be a measure space and assume that g ∈ L²(X,μ) is such that gf ∈ L²(X,μ) for all f ∈ L²(X,μ). Then (i) the mapping G : L²(X,μ) → L²(X,μ) defined by f ↦ gf is a bounded linear transformation, and (ii) the function g is in L^∞(X,μ). Furthermore, ‖G‖ = ‖g‖_∞. However, for the case at hand, one has the richness of the group structure of the integers and its dual group 𝕋, and there is a more elegant way to proceed. To prove that H ∈ L^∞(0,2π), consider the special unit vectors

$$ X_N(\omega-\xi) = \frac{1}{\sqrt N}\Big(1 + e^{i(\omega-\xi)} + \cdots + e^{i(N-1)(\omega-\xi)}\Big). $$

Since ‖X_N‖₂ = 1 and F̂X_N = H X_N, we have ‖H X_N‖₂ ≤ ‖F‖. When we compute the norm of H X_N, we get

$$ \|H X_N(\cdot-\xi)\|_2^2 = \frac{1}{2\pi}\int_0^{2\pi} K_N(\omega-\xi)\,|H(\omega)|^2\,d\omega, $$

where

$$ K_N(\omega-\xi) = |X_N(\omega-\xi)|^2 = \frac{1}{N}\left(\frac{\sin\frac{N(\omega-\xi)}{2}}{\sin\frac{\omega-\xi}{2}}\right)^2 $$

is the Fejér kernel. Hence K_N * |H|²(ξ) = ‖H X_N(·−ξ)‖₂² ≤ ‖F‖², so K_N * |H|² is bounded in L^∞(0,2π) uniformly in N. Since |H|² belongs to L¹(0,2π), K_N * |H|² tends to |H|² in L¹(0,2π), and we conclude that |H|² belongs to L^∞(0,2π).
To recapitulate: knowing that F̂X_N = H X_N for trigonometric polynomials X_N, we have shown that H is in L^∞(0,2π) and that ‖H‖_∞ ≤ ‖F‖. Once we know that H ∈ L^∞(0,2π), it follows directly that F̂X = HX for all X ∈ L²(0,2π) and that ‖F‖ ≤ ‖H‖_∞, which proves the result in one direction.
To prove the result in the other direction, we assume that the sequence (h_k) in ℓ²(ℤ) is such that H ∈ L^∞(0,2π). For x ∈ ℓ²(ℤ), define the mapping F by

$$ (Fx)_k = \sum_{n\in\mathbb{Z}} h_{k-n}\,x_n = (h * x)_k. \qquad (A.3) $$

(Note that Σ_{n∈ℤ} h_{k−n} x_n is often called the discrete convolution of h and x.) We need to show that Fx ∈ ℓ²(ℤ) and that F is bounded with ‖F‖ = ‖H‖_∞. (It is clearly linear, and it commutes with translations.) But this follows directly from the fact that

$$ \frac{1}{2\pi}\int_0^{2\pi} H(\omega)\,X(\omega)\,e^{-ik\omega}\,d\omega = \sum_{n\in\mathbb{Z}} h_{k-n}\,x_n \qquad (A.4) $$

and the arguments that were given for the proof in the other direction. □
This theorem provides the basis for the ℓ² theory of discrete filters, and, in fact, we define a filter to be a continuous linear mapping F : ℓ²(ℤ) → ℓ²(ℤ) that commutes with translations. There are other definitions for filters that involve different domains, ranges, and topologies, but whatever the setting, filters are always translation invariant and continuous. The ℓ² context suits our objectives.
The impulse response of a filter F is defined to be Fe₀ = h, and the sequence h = (h_k)_{k∈ℤ} also is called the filter. If all but a finite number of the h_k are zero, we say that the filter has finite impulse response (FIR) and that it is an FIR filter. If not, it has infinite impulse response (IIR). In practice, filters are finite. This does not mean that IIR filters are of no interest; they are important theoretically, and they can often be approximated by finite filters for applications. There are also filters that are finite by design, such as the finite filters associated with compactly supported wavelets. For the moment we will stay with the general case and only assume that (h_k)_{k∈ℤ} ∈ ℓ²(ℤ). We define the transfer function of F to be the 2π-periodic function

$$ H(\omega) = \sum_{k\in\mathbb{Z}} h_k e^{ik\omega}. $$

For convenience, the transfer function of F is more often denoted by F(ω). If T is a bounded linear operator on ℓ²(ℤ), its adjoint T* is defined in the usual way by ⟨Tx, y⟩ = ⟨x, T*y⟩ for all x, y ∈ ℓ²(ℤ).
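Theorem A.1 is easy to check numerically. In the sketch below (our illustration; the four-tap impulse response is an arbitrary choice, not from the text), the filter is applied as a discrete convolution, H(ω) = Σ h_k e^{ikω} is sampled on a fine grid, and one sees both the bound ‖Fx‖₂ ≤ ‖H‖_∞‖x‖₂ and its near-saturation by a long pure oscillation at the frequency where |H| peaks.

```python
import numpy as np

rng = np.random.default_rng(0)
h = np.array([0.6, 0.25, -0.1, 0.05])     # arbitrary FIR impulse response (h_k)
omega = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
H = np.exp(1j * np.outer(omega, np.arange(len(h)))) @ h   # H(w) = sum_k h_k e^{ikw}
Hmax = np.abs(H).max()                    # numerical ||H||_inf

def ratio(x):
    # ||h * x||_2 / ||x||_2 for a finitely supported sequence x
    return np.linalg.norm(np.convolve(h, x)) / np.linalg.norm(x)

# random inputs never exceed the bound ||H||_inf
ratios_random = [ratio(rng.standard_normal(256)) for _ in range(20)]

# a pure oscillation e^{ik w*} at the argmax frequency nearly attains it
wstar = omega[np.abs(H).argmax()]
ratio_peak = ratio(np.exp(1j * wstar * np.arange(4096)))
```

The gap between `ratio_peak` and `Hmax` comes only from the finite length of the oscillating input, in line with the proof, which uses longer and longer trigonometric sums concentrated near the peak of |H|.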
A simple computation shows that if F is the filter (h_k), then F* is the filter (\overline{h_{-k}}). Thus, F*(ω) = \overline{F(ω)}.
A.2 The general two-channel filter bank
The general two-channel filter bank is illustrated in Figure A.1.
We are not concerned here with the quantization and transmission, which are assumed to be perfect, although we wish to emphasize that these are serious problems in practice. We assume that the outputs of the analyzing filters F₀ and F₁, followed by downsampling (operator D in section 3.3), go directly to the upsampling
(operator D* in section 3.3) and then to the synthesizing (or reconstruction) filters G₀ and G₁.
If Y(ω) = F₀(ω)X(ω) is the output of the filter F₀, then after downsampling, the signal is represented by Y₀(ω) = Σ_{k∈ℤ} y_{2k} e^{2ikω}. This can be written as

$$ Y_0(\omega) = \tfrac{1}{2}\{F_0(\omega)X(\omega) + F_0(\omega+\pi)X(\omega+\pi)\}, $$

and similarly

$$ Y_1(\omega) = \tfrac{1}{2}\{F_1(\omega)X(\omega) + F_1(\omega+\pi)X(\omega+\pi)\}. $$

Thus, the output X′ is given by

$$ X'(\omega) = \tfrac{1}{2}\big\{\big(F_0(\omega)G_0(\omega) + F_1(\omega)G_1(\omega)\big)X(\omega) + \big(F_0(\omega+\pi)G_0(\omega) + F_1(\omega+\pi)G_1(\omega)\big)X(\omega+\pi)\big\}. $$

Note that this output involves two forms of the input: the original signal X(ω) plus X(ω+π), which is called an aliased version of X(ω). Experts in signal processing tell us that this part of the output is undesirable, so the first step toward perfect reconstruction is to set the coefficient of X(ω+π) equal to zero:

$$ F_0(\omega+\pi)G_0(\omega) + F_1(\omega+\pi)G_1(\omega) = 0. \qquad (A.6) $$

Then to have the output exactly equal the input, we must have

$$ F_0(\omega)G_0(\omega) + F_1(\omega)G_1(\omega) = 2. \qquad (A.7) $$

This requirement is loosened in practice to

$$ F_0(\omega)G_0(\omega) + F_1(\omega)G_1(\omega) = 2e^{-in\omega}, \qquad n \in \mathbb{Z}, \qquad (A.8) $$

which means that the original signal is allowed to be delayed. The relations (A.6) and (A.8) are now classic, and they have been "solved" in various ways over the last two decades. We will describe several of these solutions. Esteban and Galand [99] took

$$ F_1(\omega) = F_0(\omega+\pi), \qquad G_0(\omega) = F_0(\omega), \qquad G_1(\omega) = -F_0(\omega+\pi). $$
It is easy to see that these choices satisfy (A.6) and that condition (A.8) becomes

$$ F_0(\omega)^2 - F_0(\omega+\pi)^2 = 2e^{-in\omega}, $$

where n must now be odd, since ω ↦ ω + π changes the sign of the left-hand side. Esteban and Galand called these filters quadrature mirror filters (QMFs). The name "mirror" comes about as follows: If we extend the function F₀ to a holomorphic function in an annulus about the unit circle by defining

$$ F_0(z) = \sum_{k\in\mathbb{Z}} h_k z^k, $$

then F₁(z) = F₀(−z), and the filters are mirrored through the origin by the transformation z ↦ −z. The idea behind "quadrature" is only slightly more complicated: Esteban and Galand were interested in real, symmetric FIR filters of the form

$$ F_0(\omega) = \sum_{k=0}^{2N+1} h_k e^{ik\omega} $$

with h_{N+k+1} = h_{N−k} for 0 ≤ k ≤ N. These conditions imply that the phases of the filters F₀ and F₁ differ by ±π/2; hence the phases are in quadrature. Unfortunately, these conditions cannot be satisfied except for the Haar filter (see [253]).
To fix this situation, Smith and Barnwell introduced the following conditions (for real filters) [238], [239]:

$$ F_1(\omega) = -e^{-in\omega}\,\overline{F_0(\omega+\pi)}, \qquad n \text{ odd}, $$
$$ G_0(\omega) = \overline{F_0(\omega)}, \qquad G_1(\omega) = -e^{in\omega}F_0(\omega+\pi) = \overline{F_1(\omega)}. $$

These filters are often called conjugate quadrature filters (CQFs). Relation (A.6) is satisfied, and relation (A.8) becomes

$$ |F_0(\omega)|^2 + |F_0(\omega+\pi)|^2 = 2. $$

The problem reduces to finding a filter F₀ that satisfies this relation. In practice, one would like F₀ to be finite (FIR) and "causal." Causal means that there is no output before there is an input or, formally, that x_k = 0 for k < 0 implies that (Fx)_k = 0 for k < 0. Then it is easy to see that a finite causal filter must be of the form

$$ F_0(\omega) = \sum_{k=0}^{N} h_k e^{ik\omega}. $$

The development in Chapter 3 proceeds slightly differently. We begin by specifying that G₀ = \overline{F₀} and G₁ = \overline{F₁}. Then the problem is to find F₀ and F₁ that satisfy (A.6) and (A.8). But these are now

$$ F_0(\omega+\pi)\,\overline{F_0(\omega)} + F_1(\omega+\pi)\,\overline{F_1(\omega)} = 0, \qquad (A.6') $$
$$ |F_0(\omega)|^2 + |F_1(\omega)|^2 = 2, \qquad (A.8') $$

which are exactly the conditions for the matrix in Theorem 3.1 to be unitary.
These conditions figure prominently in the proof of Theorem 3.1. A straightforward
computation shows that (A.6′) and (A.8′) imply (3.1). The implication in the other direction is a bit more technical.
Finally, the novice is warned that there are many conventions in this business. Some authors (see [60], for example) save the odd coefficients for F₁(ω) instead of the even ones, in which case

$$ Y_1(\omega) = \tfrac{1}{2}\{F_1(\omega)X(\omega) - F_1(\omega+\pi)X(\omega+\pi)\}. $$

There are also conventions about the definition of the transfer function: Sometimes it is defined to be F₀(ω) = Σ_k h_k e^{−ikω}. These differences can be confusing, but they do not alter the fundamental results.
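The alias-cancellation and perfect-reconstruction relations can be verified concretely with the Haar filter bank, the one case singled out above. The sketch below (our illustration) runs a signal through analysis filters, downsampling, upsampling, and synthesis filters, using circular convolution to sidestep boundary effects, and also checks (A.6) and (A.8) on a frequency grid. Note that it uses the signal-processing convention H(ω) = Σ h_k e^{−ikω} mentioned at the end of this section; the identities are unaffected.

```python
import numpy as np

s = 1 / np.sqrt(2)
f0 = np.array([s, s])        # analysis low-pass (Haar)
f1 = np.array([s, -s])       # analysis high-pass
g0 = np.array([s, s])        # synthesis filters
g1 = np.array([-s, s])

def circ(h, x):
    # circular convolution of a short kernel h with x, via the FFT
    N = len(x)
    return np.real(np.fft.ifft(np.fft.fft(np.concatenate([h, np.zeros(N - len(h))])) * np.fft.fft(x)))

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
y0, y1 = circ(f0, x), circ(f1, x)
u0, u1 = np.zeros(64), np.zeros(64)
u0[::2], u1[::2] = y0[::2], y1[::2]      # downsample, then upsample with zeros
xr = circ(g0, u0) + circ(g1, u1)         # reconstruction: x delayed by one sample

w = np.linspace(0, 2 * np.pi, 512, endpoint=False)
def dtft(h, w):
    return np.exp(-1j * np.outer(w, np.arange(len(h)))) @ h

alias = dtft(f0, w + np.pi) * dtft(g0, w) + dtft(f1, w + np.pi) * dtft(g1, w)  # (A.6)
pr = dtft(f0, w) * dtft(g0, w) + dtft(f1, w) * dtft(g1, w)                     # (A.8), n = 1
```

Here `alias` vanishes identically and `pr` equals 2e^{−iω}, so the reconstructed signal is the input delayed by one sample, exactly the relaxation of (A.7) to (A.8).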
APPENDIX B
Wavelet Transforms
The purpose of this appendix is to present several of the basic theorems about wavelet transforms that have been used in the text but that have not been proved. The techniques used to establish these results are typical of those used in wavelet theory, and the proofs will illustrate where the different assumptions about the analyzing wavelet are used.
B.1 The L² theory
To simplify the notation, we present the results in one dimension, although the results are true for ℝⁿ. We assume throughout this section that the analyzing wavelet ψ is in L²(ℝ) and that the wavelets are defined by

$$ \psi_{(a,b)}(x) = \frac{1}{\sqrt a}\,\psi\Big(\frac{x-b}{a}\Big), \qquad a > 0,\; b \in \mathbb{R}. \qquad (B.1) $$

For future reference, we note that the mapping (a,b) ↦ ψ_{(a,b)} is continuous from ℝ₊* × ℝ to L²(ℝ). The wavelet transform is defined for f ∈ L²(ℝ) by

$$ Wf(a,b) = \int_{-\infty}^{\infty} f(x)\,\overline{\psi_{(a,b)}(x)}\,dx. \qquad (B.2) $$

Then |Wf(a,b)| ≤ ‖f‖‖ψ‖, and thus by our remark about the continuity of (a,b) ↦ ψ_{(a,b)} it is clear that Wf(a,b) is continuous on ℝ₊* × ℝ.
We wish to prove that the mapping f ↦ Wf(a,b) is a partial isometry from L²(ℝ) into L²(ℝ₊* × ℝ, db da/a²). This is not true in general, so additional assumptions must be made about ψ. The assumption we make, which is called an admissibility condition, is that

$$ \int_0^{\infty} |\hat\psi(a\xi)|^2\,\frac{da}{a} = 1 \qquad (B.3) $$

for almost all ξ ∈ ℝⁿ. We have written the admissibility condition so that it is clear how the results generalize to ℝⁿ. For our case, we expect (B.3) to hold for ξ = ±1, which means that it holds for all ξ ≠ 0. One can normalize either the admissibility constant C_ψ or the norm of ψ, but not necessarily both. We have chosen to normalize ψ so that C_ψ = 1, as in (B.3). Note that the factor a^{−1/2} in the definition of ψ_{(a,b)} is chosen so that ‖ψ_{(a,b)}‖₂ = ‖ψ‖₂. One may prefer different normalizations in other settings, but these choices do not
affect the substance of the results. In fact, in the following sections we replace the factor a^{−1/2} by a^{−1} and have ‖ψ_{(a,b)}‖₁ = ‖ψ‖₁.
The first result shows that if ψ satisfies (B.3), then the mapping f ↦ Wf(a,b) is a partial isometry (that is, ‖f‖ = ‖Wf‖) from L²(ℝ) into L²(ℝ₊* × ℝ, db da/a²).
Theorem B.1. Assume that the analyzing wavelet ψ ∈ L²(ℝ) satisfies the admissibility condition (B.3). If f, g ∈ L²(ℝ), then

$$ \int_0^{\infty}\!\!\int_{-\infty}^{\infty} Wf(a,b)\,\overline{Wg(a,b)}\,db\,\frac{da}{a^2} = \langle f, g\rangle. \qquad (B.4) $$

Proof. Since both f and ψ are in L²(ℝ), we can use Parseval's identity to express Wf(a,b) as

$$ Wf(a,b) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat f(\xi)\,\sqrt a\,\overline{\hat\psi(a\xi)}\,e^{ib\xi}\,d\xi. \qquad (B.5) $$

If we let F_a(ξ) = \hat f(ξ)\sqrt a\,\overline{\hat\psi(aξ)}, then F_a(ξ) ∈ L¹(ℝ, dξ) and

$$ Wf(a,b) = \frac{1}{2\pi}\,\hat F_a(-b) \quad \text{for all } a > 0. \qquad (B.6) $$

Consider the integral

$$ I = \int_0^{\infty}\!\!\int_{-\infty}^{\infty} |F_a(\xi)|^2\,d\xi\,\frac{da}{a^2}. \qquad (B.7) $$

By the definition of F_a and property (B.3), we have

$$ I = \int_0^{\infty}\!\!\int_{-\infty}^{\infty} |\hat f(\xi)|^2\,|\hat\psi(a\xi)|^2\,d\xi\,\frac{da}{a} = \|\hat f\|^2 = 2\pi\|f\|^2 < +\infty. $$

Since I is finite, Fubini's theorem implies that F_a(ξ) is an element of L²(ℝ) for almost every (a.e.) a > 0. Thus, Wf(a,b) = (1/2π)\hat F_a(−b) is an element of L²(ℝ) for a.e. a > 0, and

$$ 2\pi\int_{-\infty}^{\infty} |F_a(\xi)|^2\,d\xi = \int_{-\infty}^{\infty} |\hat F_a(b)|^2\,db = (2\pi)^2\int_{-\infty}^{\infty} |Wf(a,b)|^2\,db \qquad (B.8) $$

for a.e. a > 0. It follows by integrating both sides of (B.8) against da/a² that

$$ \int_0^{\infty}\!\!\int_{-\infty}^{\infty} |Wf(a,b)|^2\,db\,\frac{da}{a^2} = \|f\|^2. \qquad (B.9) $$

Equation (B.9) means that the mapping W : L²(ℝ) → L²(ℝ₊* × ℝ, db da/a²) defined by f ↦ Wf is a partial isometry. Since (B.4) is the inner product form of (B.9), this proves the result. □
B.2 Inversion formulas
We are going to prove two inversion formulas. The first one is an L² formula, and it is a direct consequence of Theorem B.1. In fact, we will prove a more general L² result that has nothing to do with wavelets; the wavelet result follows from
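Both ingredients of the proof can be checked numerically. The sketch below (ours; the Mexican-hat wavelet is used for convenience and is not normalized to make the constant in (B.3) equal to 1 — for this ψ the admissibility integral equals π) compares the direct definition (B.2) of Wf(a,b) with the Fourier-side formula (B.5) for a Gaussian f, and evaluates the admissibility integral on a logarithmic grid.

```python
import numpy as np

psi = lambda t: (1 - t**2) * np.exp(-t**2 / 2)                      # Mexican hat (real, even)
psihat = lambda xi: np.sqrt(2 * np.pi) * xi**2 * np.exp(-xi**2 / 2) # its transform, f^(xi) = int f e^{-ix xi} dx
f = lambda t: np.exp(-t**2 / 2)
fhat = lambda xi: np.sqrt(2 * np.pi) * np.exp(-xi**2 / 2)

x = np.linspace(-15, 15, 3001)
xi = np.linspace(-12, 12, 2401)

def integ(y, t):           # simple Riemann sum on a uniform grid
    return np.sum(y) * (t[1] - t[0])

def W_direct(a, b):        # (B.2): int f(x) a^{-1/2} psi((x-b)/a) dx
    return integ(f(x) * psi((x - b) / a) / np.sqrt(a), x)

def W_fourier(a, b):       # (B.5): (1/2pi) int f^(xi) sqrt(a) psi^(a xi) e^{ib xi} dxi
    return integ(fhat(xi) * np.sqrt(a) * psihat(a * xi) * np.exp(1j * b * xi), xi).real / (2 * np.pi)

pairs = [(0.5, -1.3), (0.5, 0.7), (2.0, 0.0), (2.0, 2.5)]
err = max(abs(W_direct(a, b) - W_fourier(a, b)) for a, b in pairs)

# admissibility integral int_0^inf |psi^(a)|^2 da/a, via the substitution a = e^u
u = np.linspace(np.log(1e-4), np.log(30), 4000)
c_psi = integ(np.abs(psihat(np.exp(u)))**2, u)    # equals pi for this psi
```

Dividing the energy identity (B.9) by `c_psi` recovers the normalized statement of Theorem B.1 for this unnormalized wavelet.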
the general result. The second inversion formula is special because the analyzing wavelet is the Lusin wavelet

$$ \psi(x) = \frac{1}{\pi}\,\frac{1}{(x+i)^2}. $$

This is the inversion formula that is needed in Chapter 10 for studying the wavelet transform of Riemann's function. To be specific, Theorem 10.1 is essential for our analysis of Riemann's function, and the proof of Theorem 10.1 uses (10.7). Lusin's wavelet satisfies the hypothesis of Theorem 10.1, and Theorem B.5 establishes (10.7) when f is Riemann's function. Thus, Theorem 10.1 applies when f is Riemann's function and ψ is the Lusin wavelet.
B.2.1 L² inversion
We begin with an abstract setting, not for the sake of abstraction, but because the setting reveals the essentials and simplicity of the result. Thus, let Ω be a locally compact Hausdorff space and let μ be a positive Radon measure on Ω. It is possible that μ(Ω) = +∞, but we require that μ(K) < +∞ on compact subsets K of Ω. We will be dealing with the two Hilbert spaces L²(ℝ) and L²(Ω, dμ). The next assumption is that there is a family of functions ψ_ω ∈ L²(ℝ), ω ∈ Ω, such that the mapping ω ↦ ψ_ω is continuous from Ω to L²(ℝ). We will state and prove two theorems within this setting, and then we will interpret the results in the context of section B.1.
Theorem B.2. Define the operator T : L²(ℝ) → L²(Ω, dμ) by Tf = ⟨f, ψ_ω⟩. If T is a partial isometry, that is, if

$$ \int_\Omega |\langle f, \psi_\omega\rangle|^2\,d\mu(\omega) = \|f\|_2^2, \qquad (B.10) $$

then

$$ f(x) = \int_\Omega \langle f, \psi_\omega\rangle\,\psi_\omega(x)\,d\mu(\omega) = \lim_{j\to\infty}\int_{K_j} \langle f, \psi_\omega\rangle\,\psi_\omega(x)\,d\mu(\omega), \qquad (B.11) $$

where K_j ⊂ Ω is any sequence of increasing compact sets such that ∪_j K_j = Ω. The right-hand limit is in the sense of L²(ℝ).
The general theoretical basis for establishing this is the following: If T : H₀ → H₁ is a partial isometry from one Hilbert space H₀ to another H₁, then the adjoint operator T* : H₁ → H₀ is such that ‖T*‖ ≤ 1 and T*T = I, where I denotes the identity.
The bilinear form of (B.10) is

$$ \langle f, g\rangle = \int_\Omega \langle f, \psi_\omega\rangle\,\overline{\langle g, \psi_\omega\rangle}\,d\mu(\omega), \qquad (B.12) $$

and formally we have

$$ \langle f, g\rangle = \int_{\mathbb{R}} \Big(\int_\Omega \langle f, \psi_\omega\rangle\,\psi_\omega(x)\,d\mu(\omega)\Big)\,\overline{g(x)}\,dx. \qquad (B.13) $$

Equation (B.13) is the weak form of (B.11), but to arrive at (B.13) we had to interchange the order of integration. This and more will be justified by the following stronger result.
Theorem B.3. If F is a continuous function on Ω with compact support, then f(x) = ∫_Ω F(ω)ψ_ω(x) dμ(ω) satisfies

$$ \|f\|_{L^2(\mathbb{R})} \le \Big(\int_\Omega |F(\omega)|^2\,d\mu(\omega)\Big)^{1/2}. \qquad (B.14) $$

Proof. The continuity of the mapping ω ↦ ψ_ω and the assumptions on F ensure that the function f is well defined as an element of L²(ℝ). The inequality

$$ \int_{\mathbb{R}}\int_\Omega |F(\omega)\,\psi_\omega(x)\,g(x)|\,d\mu(\omega)\,dx \le \mu(\operatorname{supp} F)\,\sup_{\omega}\|F(\omega)\psi_\omega\|_{L^2(\mathbb{R})}\,\|g\|_{L^2(\mathbb{R})}, \qquad g \in L^2(\mathbb{R}), $$

allows us to invoke Fubini's theorem and interchange the order of integration in the following formula:

$$ \big\langle F,\ \overline{\langle g, \psi_\omega\rangle}\big\rangle = \Big\langle \int_\Omega F(\omega)\,\psi_\omega\,d\mu(\omega),\ g\Big\rangle. $$

Since ⟨T*F, g⟩ = ⟨F, Tg⟩, it follows that

$$ f(x) = T^*F(x) = \int_\Omega F(\omega)\,\psi_\omega(x)\,d\mu(\omega). $$

With this representation, (B.14) is a restatement of ‖T*‖ ≤ 1. □
With the establishment of (B.14), the proof of Theorem B.2 is straightforward. The estimate (B.14) implies that the representation

$$ T^*F(x) = \int_\Omega F(\omega)\,\psi_\omega(x)\,d\mu(\omega) \qquad (B.15) $$

is true for all F ∈ L²(Ω, dμ). Indeed, since the continuous functions with compact support are dense in L²(Ω, dμ), (B.14) implies that the representation (B.15) extends to all F ∈ L²(Ω, dμ). By taking F = Tf, f ∈ L²(ℝ), we see that equation (B.11) is just a restatement of T*T = I.
An algorithm for computing the integral in (B.15) is equally easily established. Let Ω be the union of an increasing sequence of compact sets K_j, j ∈ ℕ. Then ∫_{K_j} F(ω)ψ_ω(x) dμ(ω) = ∫_Ω F_j(ω)ψ_ω(x) dμ(ω), where F_j(ω) = F(ω) if ω ∈ K_j and F_j(ω) = 0 if ω ∉ K_j. Clearly, the sequence f_j(x) = ∫_{K_j} F(ω)ψ_ω(x) dμ(ω) tends to f(x) = ∫_Ω F(ω)ψ_ω(x) dμ(ω) in L²(ℝ).
Finally, we interpret Theorem B.2 in the context of Theorem B.1. Thus, let Ω = ℝ₊* × ℝ, ω = (a,b), and dμ(ω) = db da/a². It is not difficult to show that the mapping (a,b) ↦ ψ_{(a,b)} from ℝ₊* × ℝ to L²(ℝ) is continuous. Thus, in view of Theorem B.1, Theorem B.2 applies, and we have the following result for the continuous wavelet transform.
Theorem B.4. With the same hypotheses as Theorem B.1,

$$ f(x) = \int_0^{\infty}\!\!\int_{-\infty}^{\infty} Wf(a,b)\,\psi_{(a,b)}(x)\,db\,\frac{da}{a^2}. \qquad (B.16) $$

The integral converges strongly in L²(ℝ) in the sense indicated in Theorem B.2.
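Theorem B.4 can be tested directly by discretizing the reconstruction integral. In the sketch below (ours), the analyzed function is the Mexican-hat wavelet itself, a convenient zero-mean choice that makes the truncation of the scale interval to [0.1, 10] harmless; since this ψ is not normalized to make the constant in (B.3) equal to 1, the integral is divided by its admissibility constant, which is π for this wavelet.

```python
import numpy as np

psi = lambda t: (1 - t**2) * np.exp(-t**2 / 2)
f = psi                                   # analyze the wavelet itself (zero mean)
C = np.pi                                 # admissibility constant for this psi

x = np.linspace(-10, 10, 1001); dx = x[1] - x[0]
b = np.linspace(-20, 20, 801);  db = b[1] - b[0]
loga = np.linspace(np.log(0.1), np.log(10), 80)
a_s = np.exp(loga); dla = loga[1] - loga[0]

fx = f(x)
W = np.empty((len(a_s), len(b)))
for i, a in enumerate(a_s):               # (B.2): W(a,b) = int f(x) a^{-1/2} psi((x-b)/a) dx
    W[i] = (psi((x[None, :] - b[:, None]) / a) / np.sqrt(a)) @ fx * dx

xt = np.array([-2.0, -1.0, 0.0, 0.5, 1.5])
frec = np.empty(len(xt))
for j, xv in enumerate(xt):               # (B.16): (1/C) int int W(a,b) psi_(a,b)(x) db da/a^2
    slices = np.array([np.sum(W[i] * psi((xv - b) / a) / np.sqrt(a)) * db
                       for i, a in enumerate(a_s)])
    frec[j] = np.sum(slices / a_s) * dla / C   # da/a^2 = d(log a)/a

err = np.max(np.abs(frec - f(xt)))
```

The per-scale sums `slices` are exactly the components f_a(x) of (B.17) below, and summing them against da/a² is the discrete form of (B.18).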
We can state a slightly different form of Theorem B.4. A consequence of Theorem B.1 (and Fubini's theorem) is that Wf(a,b) is in L²(ℝ, db) for a.e. a > 0. Thus, for a.e. a > 0, the function

$$ f_a(x) = \int_{-\infty}^{\infty} Wf(a,b)\,\psi_{(a,b)}(x)\,db \qquad (B.17) $$

is well defined. This function f_a can be interpreted as the component of f at scale a for the decomposition given by the wavelet ψ. Another application of Theorem B.3 shows that

$$ f(x) = \int_0^{\infty} f_a(x)\,\frac{da}{a^2}, \qquad (B.18) $$

which says that f is the weighted sum of its components at scale a.
B.2.2 Inversion with the Lusin wavelet
The next result is a reconstruction theorem when the analyzing wavelet is the Lusin wavelet

$$ \psi(x) = \frac{1}{\pi}\,\frac{1}{(x+i)^2}. $$

Lusin's wavelet is the restriction to the real line of the function Ψ(z) = (1/π)(z+i)^{−2}, which is holomorphic in the open half-plane Ω = {z = x + iy | y > 0}. The same remark applies to the functions ψ_{(a,b)}(x) = a^{−1}ψ((x−b)/a), where a > 0 and b ∈ ℝ. (Note the normalization ‖ψ_{(a,b)}‖₁ = ‖ψ‖₁.) Thus, we cannot expect a function f to belong to the closed linear span of the functions ψ_{(a,b)}, a > 0, b ∈ ℝ, unless f has a holomorphic extension in Ω. The reconstruction theorem is the converse of this statement.
Theorem B.5. Let f be a bounded continuous function defined on ℝ, and assume that there exists a bounded holomorphic function F defined in Ω with the following properties:
(1) F(x+iy) → f(x) uniformly for x ∈ ℝ as y → 0.
(2) F(x+iy) → 0 uniformly for x ∈ ℝ as y → +∞.
Then for x ∈ ℝ,

$$ f(x) = \lim_{\rho\to 0,\ R\to+\infty}\int_\rho^R\!\!\int_{-\infty}^{\infty} Wf(a,b)\,\psi_{(a,b)}(x)\,db\,\frac{da}{a}, \qquad (B.19) $$

where Wf(a,b) is the wavelet transform

$$ Wf(a,b) = \langle f, \psi_{(a,b)}\rangle = \int_{-\infty}^{\infty} f(x)\,\overline{\psi_{(a,b)}(x)}\,dx. \qquad (B.20) $$

Furthermore, the convergence is uniform on compact subsets of ℝ.
The proof is an exercise in classical complex analysis, and it follows directly from the following two lemmas.
Lemma B.1. yF′(x+iy) → 0 uniformly for x ∈ ℝ as y → +∞.
Let Γ denote the circle Γ = {ζ = z + (y/2)e^{iθ} | 0 ≤ θ < 2π}, where z = x + iy, y > 0. We use Cauchy's formula to write F′ as

$$ F'(x+iy) = \frac{1}{2\pi i}\int_\Gamma \frac{F(\zeta)}{(\zeta - x - iy)^2}\,d\zeta, $$

which implies that y|F′(x+iy)| ≤ 2 sup_{ζ∈Γ} |F(ζ)|. The result follows from this estimate and property (2) of Theorem B.5.
Lemma B.2. yF′(x+iy) → 0 uniformly on compact subsets of ℝ as y → 0.
Cauchy's formula and a simple limiting argument show that

$$ F'(x+iy) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{f(t)}{(t-x-iy)^2}\,dt, \qquad y > 0. $$

Since ∫_{−∞}^{∞} (t−x−iy)^{−2} dt = 0, we can write

$$ \int_{-\infty}^{\infty} \frac{f(t)}{(t-x-iy)^2}\,dt = \int_{-\infty}^{\infty} \frac{f(t)-f(x)}{(t-x-iy)^2}\,dt = \frac{1}{y}\int_{-\infty}^{\infty} \frac{f(x+yu)-f(x)}{(u-i)^2}\,du. $$

Write the last integral as

$$ \int_{|u|\le R} \frac{f(x+yu)-f(x)}{(u-i)^2}\,du + \int_{|u|>R} \frac{f(x+yu)-f(x)}{(u-i)^2}\,du, $$

and fix R large enough so that

$$ \Big|\int_{|u|>R} \frac{f(x+yu)-f(x)}{(u-i)^2}\,du\Big| < \varepsilon. $$

With R fixed, f(x+yu) − f(x) → 0 uniformly for x ∈ K ⊂ ℝ and |u| ≤ R as y → 0, where K is any compact subset of ℝ. This shows that

$$ \Big|\int_{|u|\le R} \frac{f(x+yu)-f(x)}{(u-i)^2}\,du\Big| < \varepsilon $$

uniformly for x ∈ K whenever y is sufficiently small. Combining these estimates proves Lemma B.2.
We now return to the proof of the theorem. The first step is to observe that 2iaF′(b+ia) = ⟨f, ψ_{(a,b)}⟩ = Wf(a,b). Then Cauchy's formula yields

$$ \int_{-\infty}^{\infty} Wf(a,b)\,\psi_{(a,b)}(x)\,db = \frac{2ia^2}{\pi}\int_{-\infty}^{\infty} \frac{F'(b+ia)}{(x-b+ia)^2}\,db = -4a^2\,F''(x+2ia), $$

and our task is to show that

$$ f(x) = -\lim_{\rho\to 0,\ R\to+\infty}\int_\rho^R F''(x+ia)\,a\,da. $$

Integration by parts yields

$$ \int_\rho^R F''(x+ia)\,a\,da = \big[-iF'(x+ia)\,a\big]_\rho^R + i\int_\rho^R F'(x+ia)\,da. $$
The first term on the right-hand side tends to zero uniformly on compact subsets of ℝ by the lemmas. The second term is F(x+iR) − F(x+iρ), which converges by hypothesis to −f(x) uniformly on ℝ as ρ → 0 and R → +∞.
B.3 Generalizations
The wavelet analysis of functions that are not square integrable is an important issue. It is not an academic problem, since it concerns many stochastic processes such as white noise and fractional Brownian motion. Ordinary Brownian motion is also an example. We assume that the analyzing wavelet ψ belongs to the Schwartz class S(ℝ). If we wish to analyze a tempered distribution f ∈ S′(ℝ), we first compute its wavelet coefficients

$$ Wf(a,b) = \langle f, \psi_{(a,b)}\rangle, \qquad a > 0,\ b \in \mathbb{R}, \qquad (B.21) $$

where ψ_{(a,b)}(x) = a^{−1}ψ((x−b)/a). We then hope to recover f through the inversion formula

$$ f(x) = \int_0^{\infty}\!\!\int_{-\infty}^{\infty} Wf(a,b)\,\psi_{(a,b)}(x)\,db\,\frac{da}{a} \qquad (B.22) $$

whenever ψ satisfies the admissibility condition (B.3), which in this case is

$$ \int_0^{\infty} |\hat\psi(\pm u)|^2\,\frac{du}{u} = 1. \qquad (B.23) $$

However, equation (B.22) is not true in general if f is not square integrable. More precisely, the validity of (B.22) is related to the behavior of f at infinity. This is rather surprising. One might have thought that (B.22) could fail because f was too irregular. This is not the case, and, in fact, complicated distributions can be represented locally thanks to the oscillating nature of the wavelet. A counterexample to (B.22) is simply the function f(x) = 1, or, more generally, f(x) = P(x), where P is any polynomial. On the other hand, the Dirac mass at x = x₀, or any compactly supported distribution, can be recovered from its wavelet coefficients using the inversion formula (B.22).
We are going to discuss these facts in a slightly more general setting. The analysis of f, which is the computation of the wavelet coefficients, will be done using an analyzing wavelet ψ, but the synthesis will be achieved with a second wavelet θ. (The usefulness of this generalization was stressed by Matthias Holschneider.
This generalization also paves the way to discrete biorthogonal wavelet expansions; see Chapter 4.) The first step is to rewrite the admissibility condition as

$$ \int_0^{\infty} \hat\theta(su)\,\overline{\hat\psi(su)}\,\frac{du}{u} = 1, \qquad s = \pm 1. \qquad (B.24) $$

If we write η = θ * ψ̃, where ψ̃(x) = \overline{ψ(−x)}, then (B.24) is equivalent to the two conditions

$$ \int_{-\infty}^{0} \hat\eta(u)\,\frac{du}{u} = -1 \qquad \text{and} \qquad \int_0^{\infty} \hat\eta(u)\,\frac{du}{u} = 1. \qquad (B.25) $$
We will assume that f is a tempered distribution and that both θ and ψ belong to the Schwartz class S(ℝ); however, in certain cases it is sufficient to assume that η ∈ S(ℝ). With these assumptions, we have the following result.
Theorem B.6. There exists a function φ in S(ℝ) such that ∫_{−∞}^{∞} φ(x) dx = 1 and

$$ f(x) = \int_0^{1}\!\!\int_{-\infty}^{\infty} Wf(a,b)\,\theta_{(a,b)}(x)\,db\,\frac{da}{a} + f * \varphi(x). \qquad (B.26) $$

We will prove this result, but first a few comments are in order. If, for instance, f(x) = 1 identically, we certainly have Wf(a,b) = 0, and (B.22) is not true. However, (B.26) is true; it reads 1 = 1.
Identity (B.26) appears several times in the book in various disguises. In the context of a multiresolution analysis, (B.26) corresponds to writing

$$ f = g + \sum_{j=0}^{\infty} f_j, \qquad (B.27) $$

where g belongs to V₀ and f_j belongs to W_j. More precisely,

$$ g(x) = \sum_{k=-\infty}^{\infty} \langle f, \varphi(\cdot - k)\rangle\,\varphi(x-k), \qquad (B.28) $$

which mimics the convolution product f * φ. In other words, (B.26) amounts to writing f as the sum of a trend, given by f * φ, and small-scale details, given by the integral.
Proof of Theorem B.6. We will be considering the function

$$ F_\varepsilon(x) = \int_\varepsilon^{1}\!\!\int_{-\infty}^{\infty} Wf(a,b)\,\theta_{(a,b)}(x)\,db\,\frac{da}{a}. \qquad (B.29) $$

Since Wf(a,b) is infinitely differentiable on (0,+∞) × (−∞,+∞) and since it grows no faster than a polynomial as |b| → ∞, the integral in (B.29) raises no convergence issues and is well defined. As in the L² theory, we use Plancherel's identity (which is in fact used to define the Fourier transform of a tempered distribution) to write Wf(a,b) as

$$ Wf(a,b) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat f(\xi)\,\overline{\hat\psi(a\xi)}\,e^{ib\xi}\,d\xi. \qquad (B.30) $$

Now fix x and integrate with respect to b to obtain

$$ \int_{-\infty}^{\infty} Wf(a,b)\,\theta_{(a,b)}(x)\,db = \frac{1}{2\pi}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \hat f(\xi)\,\overline{\hat\psi(a\xi)}\,\theta_{(a,b)}(x)\,e^{ib\xi}\,d\xi\,db. \qquad (B.31) $$

Since f can be viewed as a tempered distribution on ℝ², that is, f ∈ S′(ℝ²), and since for each fixed x the integrand is in S(ℝ²), we can interchange the order of "integration" in (B.31). Thus, (B.31) can be written as

$$ \int_{-\infty}^{\infty} Wf(a,b)\,\theta_{(a,b)}(x)\,db = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat f(\xi)\,\overline{\hat\psi(a\xi)}\,\hat\theta(a\xi)\,e^{ix\xi}\,d\xi. \qquad (B.32) $$
We end the proof by doing the integration with respect to a. This becomes simpler once we introduce the function H defined by

$$ H(\xi) = -\int_{-\infty}^{\xi} \hat\eta(u)\,\frac{du}{u}. $$

Observe that η̂(0) = \overline{ψ̂(0)}θ̂(0) = 0 because ψ has at least one vanishing moment; thus η̂(u)/u is in the Schwartz class. Note also that (B.25) implies that H(+∞) = 0. It follows from these facts that H is in the Schwartz class. Furthermore, (B.25) implies that H(0) = 1. Then we have

$$ \int_\varepsilon^{1} \hat\eta(a\xi)\,\frac{da}{a} = H(\varepsilon\xi) - H(\xi), \qquad (B.33) $$

and

$$ F_\varepsilon(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat f(\xi)\,\big(H(\varepsilon\xi) - H(\xi)\big)\,e^{ix\xi}\,d\xi. \qquad (B.34) $$

The function H is the Fourier transform of a function φ that also belongs to the Schwartz class. Thus we can write (B.34) as

$$ F_\varepsilon = f * \varphi_\varepsilon - f * \varphi, \qquad (B.35) $$

where φ_ε(x) = ε^{−1}φ(x/ε). The inversion formula follows directly from (B.35): Since H(0) = ∫ φ(x) dx = 1, f * φ_ε converges to f in the sense of distributions as ε tends to zero. This completes the proof of the theorem. As a final remark, note that it is also possible to define φ directly in terms of η, without passing through the Fourier transform H. □
One might suspect that

$$ F_{\varepsilon,R}(x) = \int_\varepsilon^{R}\!\!\int_{-\infty}^{\infty} Wf(a,b)\,\theta_{(a,b)}(x)\,db\,\frac{da}{a} \qquad (B.36) $$

would converge to f as ε tends to 0 and R tends to ∞. A calculation that is almost identical to the one above yields

$$ F_{\varepsilon,R} = f * \varphi_\varepsilon - f * \varphi_R. \qquad (B.37) $$

As we have already seen, f * φ_ε → f in the sense of distributions as ε → 0. If f has compact support, then f * φ_R → 0 as R → ∞, but this is not true in general if f does not have compact support. Thus we see that it is the behavior of f at infinity that accounts for the failure of (B.22).
We have stated and proved Theorem B.6 in the context of tempered distributions and wavelets in the Schwartz class. The strategy of the proof can be used to establish similar results under different assumptions about the analyzed object f and the wavelets ψ and θ.
In many problems concerning pointwise regularity, it is not necessary to have an inversion formula like (B.22) that includes all of the wavelet coefficients. Formula (B.26) is often sufficient. For example, the first term contains all of the information necessary to compute Hölder exponents at a given point, while the term f * φ is the "smooth" part of the function.
Matthias Holschneider made the interesting observation that the flexibility offered by the choice of the second wavelet allows one to "cheat." For example, assume that the analyzing wavelet ψ satisfies only ∫_{−∞}^{∞} ψ(x) dx = 0 and that the wavelet θ used for the synthesis belongs to S(ℝ) and is such that η satisfies the admissibility condition. Now assume that the wavelet coefficients Wf(a,b) of the function f that we wish to analyze satisfy

$$ |Wf(a,b)| \le C\,a^{\alpha}\Big(1 + \frac{|b-x_0|}{a}\Big)^{\alpha'} \qquad (B.38) $$

for all 0 < a < 1 and |b − x₀| < 1, where 0 < α′ < α and α > 1. Then it is a straightforward application of Theorem B.6 to show that f belongs to C^α(x₀). The point is that since α > 1 and ψ has only one vanishing moment (∫ xψ(x) dx may not exist, or, if it exists, it may not vanish), ψ cannot be used alone to conclude that f ∈ C^α(x₀).
An application of this strategy is provided by the Riemann function

$$ \mathcal{R}(x) = \sum_{n=1}^{\infty} \frac{1}{n^2}\,\sin(\pi n^2 x). $$

For analyzing ℛ, it is convenient to use the Lusin wavelet ψ(x) = (1/π)(x+i)^{−2}, because the wavelet coefficients are the values of the Jacobi theta function in the upper half-plane. In principle, the Lusin wavelet cannot be used to prove that ℛ belongs to C^{3/2}(x₀) when x₀ = 1: Although ∫_{−∞}^{∞} ψ(x) dx = 0, the integral ∫_{−∞}^{∞} xψ(x) dx is not even finite. One can navigate around this obstacle by choosing a synthesizing wavelet θ ∈ S(ℝ) such that

$$ \int_0^{\infty} \hat\theta(u)\,\overline{\hat\psi(u)}\,\frac{du}{u} = 1. $$

(The original paper is [144].) Another example of the flexibility provided by the "biorthogonal" continuous wavelet analysis was provided by Holschneider to invert the Radon transform [143].
APPENDIX C

A Counterexample

C.1 Introduction

A counterexample to Mallat's conjecture about zero-crossings (section 8.4) was found by Yves Meyer in the early 1990s. It appeared in conference notes, but it has never been published. Since there is continuing interest in analyzing signals using zero-crossings, we have elected to present a complete discussion rather than the outline given in the first edition of this book.

The following counterexample is based on the one announced by Meyer. The development given here is more "constructive" than that presented by Meyer, but the price paid is that the proof requires considerable computation.

The counterexample for two dimensions follows rather easily from the one-dimensional case, where the real work must be done. The construction given here is reminiscent of the one given for the counterexample to Marr's conjecture in section 8.3. However, in the case of Mallat's conjecture, there are other conditions to be satisfied, since both the zero-crossings and the first derivatives of the functions must agree. Fortunately, these conditions must be met only for dyadic values of the scaling parameter $p$. This makes it possible to construct a smooth, compactly supported counterexample. The other difference between the two conjectures is that in Marr's case the kernel is the Gaussian and in Mallat's case the kernel is the basic cubic spline.

We begin with the function $f_0$ defined by

$$f_0(t) = \begin{cases} 1 + \cos t & \text{if } |t| \le \pi, \\ 0 & \text{if } |t| > \pi. \end{cases} \qquad (C.1)$$

We will show that there are infinitely many functions of the form

$$f(t) = f_0(t) + R(t) \qquad (C.2)$$

such that $(f_0 * \theta_p)''$ and $(f * \theta_p)''$ have the same zeros when $p = 2\pi 2^{-j}$, $j \in \mathbb Z$, and such that at these zeros, $(f_0 * \theta_p)'(t) = (f * \theta_p)'(t)$. The function $R$ will be a "small" $C^\infty$ function whose support is $\frac{3\pi}{4} \le |t| \le \frac{7\pi}{8}$. To keep things symmetric, we will define $R$ so that it is even. The function $\theta_p(t) = p^{-1}\theta(p^{-1}t)$, where $\theta$ is the basic cubic spline (Figure C.1).
The analysis centers on locating the zeros of $(f_0 * \theta_p)''$. We will show that there are only two simple zeros in the interval $(-\pi - 2p, \pi + 2p)$ for each value of $p = 2\pi 2^{-j}$. We will also show that, at these zeros, both $(R * \theta_p)''$ and $(R * \theta_p)'$ vanish. This proves that at these points the derivatives of $f_0 * \theta_p$ and $f * \theta_p$ agree. Observe that both $f_0 * \theta_p$ and $f * \theta_p$ vanish identically outside $(-\pi - 2p, \pi + 2p)$.
Fig. C.1. The cubic spline $\theta = T * T$.

The last step will be to show that the functions $(f_0 * \theta_p)''$ and $(f * \theta_p)''$ have the same zeros. For this we will argue that there is a constant $M$ such that

$$|(R * \theta_p)''(t)| \le M\,|(f_0 * \theta_p)''(t)| \qquad (C.3)$$

for all $t \in \mathbb R$, uniformly in $p = 2\pi 2^{-j}$. Then for some $\lambda > 0$, we can replace $R$ by $\lambda R$ and have

$$|(\lambda R * \theta_p)''(t)| \le r\,|(f_0 * \theta_p)''(t)|, \qquad (C.4)$$

where $r < 1$. The conclusion that $(f_0 * \theta_p)''(t)$ and $(f * \theta_p)''(t)$ have the same zeros follows from the following lemma.

Lemma C.1. If $u$ and $v$ are two continuous functions on $\mathbb R$ such that $|v(t)| \le r|u(t)|$ for some $0 \le r < 1$, then $u + v$ and $u$ have the same zeros.

We begin by establishing several representations for the convolutions, and their derivatives, of the functions $f_0$, $\theta$, and $R$. Once we have established these representations, the proof follows rather easily. Our approach is to develop explicit representations for the various functions. The objective is to reveal the geometry of the situation, which we hope leads to an understanding of how the example works. We begin with observations about the kernel $\theta$.

C.2 The function $\theta$

As before, $\theta$ is the cubic spline $\theta = T * T$, where $T$ is the triangular function that is equal to $1 - |t|$ if $|t| \le 1$ and is equal to zero if $|t| > 1$. Recall that $\theta_p(t) = p^{-1}\theta(p^{-1}t)$. The support of $\theta$, and of its derivatives, is $[-2, 2]$, and thus the support of $\theta_p(t) = p^{-1}\theta(p^{-1}t)$ is $[-2p, 2p]$. In what follows, we will change scale and let $p = 2\pi 2^{-j}$, $j \in \mathbb Z$, rather than having $p = 2^{-j}$. This makes the supports of the functions $\theta_p$ commensurate with the support of $f_0$. Note that $\theta$ and $\theta''$ are even functions. Since it is useful to visualize $\theta''$, which is the analyzing wavelet, it is shown explicitly in Figure C.2.

The fourth derivative of $\theta_p$, which is the distribution

$$\theta_p^{(4)} = p^{-4}\big[\delta_{-2p} - 4\delta_{-p} + 6\delta - 4\delta_{p} + \delta_{2p}\big], \qquad (C.5)$$

plays a featured role in our analysis. Here, $\delta$ is the usual "delta function," and we write $\delta_a$ to indicate that the "Dirac mass" is at the point $a$. We use the notation
Fig. C.2. The second derivative of $\theta$.

$\tau_t\theta_p^{(4)}$ to denote the distribution $\theta_p^{(4)}$ shifted to the right by $t$. Thus by definition, $\varphi * \theta_p^{(4)}(t) = \langle \tau_t\theta_p^{(4)}, \varphi\rangle$, and for any continuous function $\varphi$,

$$\varphi * \theta_p^{(4)}(t) = p^{-4}\big[\varphi(t - 2p) - 4\varphi(t - p) + 6\varphi(t) - 4\varphi(t + p) + \varphi(t + 2p)\big].$$

Note that the "filter" $\theta_p^{(4)}$ has the following important property:

$$P * \theta_p^{(4)}(t) = 0 \qquad (C.6)$$

for all $t$ whenever $P$ is a polynomial of degree $\le 3$. Also note that $\int \theta_p(t)\,dt = 1$.

C.3 Representations of $f_0 * \theta_p$ and its derivatives

Although the counterexample depends on the discrete values $p = 2\pi 2^{-j}$, the functions $f_0 * \theta_p$ and $(f_0 * \theta_p)''$ can be analyzed for all $p > 0$. Thus, in this and the following section we will consider ranges of $p$ rather than ranges of $j$. More will be said about this distinction as we progress through the demonstration. We note once and for all that $f_0$, $\theta_p$, and $f_0 * \theta_p$ are even functions, as are their even derivatives. We use this fact freely without further comment.

Here are several expressions for $f_0 * \theta_p$ and $(f_0 * \theta_p)''$ that will be used in what follows:

$$f_0 * \theta_p(t) = p^{-1}\int f_0(t - s)\,\theta(p^{-1}s)\,ds, \qquad (C.7)$$

$$(f_0 * \theta_p)''(pt) = p^{-2}\int f_0(p(s - t))\,\theta''(s)\,ds, \qquad (C.8)$$

$$(f_0 * \theta_p)''(t) = F_0 * \theta_p^{(4)}(t), \qquad (C.9)$$

where $F_0$ is any $C^2$ function such that $F_0''(t) = f_0(t)$ for all $t \in \mathbb R$. In particular, an obvious definition for $F_0$ is this:

$$F_0(t) = \begin{cases} 0 & \text{if } t \le -\pi, \\ \tfrac12(t + \pi)^2 - (1 + \cos t) & \text{if } |t| \le \pi, \\ 2\pi t & \text{if } t \ge \pi. \end{cases} \qquad (C.10)$$
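Both ingredients introduced so far—the five-point filter (C.5) and the antiderivative (C.10)—are concrete enough to test numerically. The following sketch is ours, not the book's; the helper names are invented, and the asserts are spot checks rather than proofs.

```python
import math

def f0(t):
    # the window (C.1)
    return 1.0 + math.cos(t) if abs(t) <= math.pi else 0.0

def F0(t):
    # the particular second antiderivative of f0 chosen in (C.10)
    if t <= -math.pi:
        return 0.0
    if t <= math.pi:
        return 0.5 * (t + math.pi)**2 - (1.0 + math.cos(t))
    return 2.0 * math.pi * t

def filter4(phi, t, p):
    # phi * theta_p^(4)(t), the five-point filter of (C.5)
    return (phi(t - 2*p) - 4*phi(t - p) + 6*phi(t) - 4*phi(t + p) + phi(t + 2*p)) / p**4

# (C.6): the filter annihilates every polynomial of degree <= 3.
cubic = lambda t: 2.0 - t + 3.0*t**2 + 0.5*t**3
assert all(abs(filter4(cubic, t, 1.3)) < 1e-9 for t in (-2.0, -0.7, 0.0, 1.1, 5.0))

# F0'' = f0, checked with a centered second difference.
eps = 1e-4
for t in (-2.0, -0.5, 0.0, 1.0, 2.5):
    second_diff = (F0(t - eps) - 2*F0(t) + F0(t + eps)) / eps**2
    assert abs(second_diff - f0(t)) < 1e-4
print("(C.6) and F0'' = f0 confirmed numerically")
```

The filter kills cubics exactly (to rounding), which is why only the $-\cos$ part of $F_0$ survives in the representation used on the next page.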
However, another useful function is $F_0(-t)$; the choice of using $F_0(t)$ or $F_0(-t)$ in (C.9) depends on the computation one is doing, which in turn depends on the value of $p$. We emphasize that relations (C.9) and (C.10) can be used to give an explicit representation of $(f_0 * \theta_p)''$ for all $p$ and all $t$, and hence that the values of the function and its zeros can be computed to any degree of accuracy for a given $p$.

When the support of $\tau_t\theta_p^{(4)}$ is in $[-\pi, \pi]$, that is, when the five points $t - 2p$, $t - p$, $t$, $t + p$, and $t + 2p$ lie in $[-\pi, \pi]$, there is another useful representation for $(f_0 * \theta_p)''$. In this case, $F_0 * \theta_p^{(4)} = (-\cos) * \theta_p^{(4)}$, since the filter $\theta_p^{(4)}$ "kills" the quadratic part of $F_0$, as (C.6) shows. Thus from (C.9),

$$p^4(f_0 * \theta_p)''(t) = -\cos(t - 2p) + 4\cos(t - p) - 6\cos t + 4\cos(t + p) - \cos(t + 2p) = -2^4\Big(\sin\frac p2\Big)^4\cos t,$$

and we have

$$(f_0 * \theta_p)''(t) = -\left(\frac{\sin\frac p2}{\frac p2}\right)^4\cos t. \qquad (C.11)$$

This representation holds for all $p \le \frac{\pi}{2}$ and $|t| \le \pi - 2p$.

Since the theme of our program is to understand the behavior of the functions involved, we list for future reference the explicit representation of $(f_0 * \theta_p)''(t)$ for $p \ge 2\pi$. This expansion is based on (C.9) and (C.10). Since $(f_0 * \theta_p)''$ is an even function, we consider only positive values of $t$. For $0 \le t \le \pi$,

$$p^4(f_0 * \theta_p)''(t) = 3t^2 + 3\pi^2 - 4\pi p - 6(1 + \cos t). \qquad (C.12)$$

For $\pi \le t \le p - \pi$,

$$p^4(f_0 * \theta_p)''(t) = 6\pi t - 4\pi p. \qquad (C.13)$$

For $p - \pi \le t \le p + \pi$,

$$p^4(f_0 * \theta_p)''(t) = -2(t - p + \pi)^2 + 6\pi t - 4\pi p + 4(1 + \cos(t - p)). \qquad (C.14)$$

For $p + \pi \le t \le 2p - \pi$,

$$p^4(f_0 * \theta_p)''(t) = -2\pi t + 4\pi p. \qquad (C.15)$$

For $2p - \pi \le t \le 2p + \pi$,

$$p^4(f_0 * \theta_p)''(t) = \tfrac12(t - 2p - \pi)^2 - (1 + \cos(t - 2p)). \qquad (C.16)$$

Using this representation of $p^4(f_0 * \theta_p)''$ (and the representation of its derivative) it is an easy piece of analysis to show that $p^4(f_0 * \theta_p)''$ has the following properties (see Figure C.3):

• $p^4(f_0 * \theta_p)''$ has a minimum value of $3\pi^2 - 12 - 4\pi p$ at $t = 0$.

• $p^4(f_0 * \theta_p)''$ has a zero at $t = \frac23 p$ when $p \ge 3\pi$.
• $p^4(f_0 * \theta_p)''$ is monotonic increasing from $t = 0$ to $t = t_m$, where $t_m = p + \mu$ and where $\mu$ is the unique solution of $t + \sin t = \frac{\pi}{2}$ in the interval $(-\pi, \pi)$.

• $p^4(f_0 * \theta_p)''$ is monotonic decreasing from $t_m$ to $t = 2p + \pi$, after which it vanishes identically.

The point $t_m = p + \mu$ will appear again in section C.7.
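The piecewise formulas above can be checked mechanically against the defining representation (C.9). The sketch below is our own verification for the single value $p = 4\pi$; the function names are invented.

```python
import math

def F0(t):
    # the antiderivative (C.10)
    if t <= -math.pi:
        return 0.0
    if t <= math.pi:
        return 0.5 * (t + math.pi)**2 - (1.0 + math.cos(t))
    return 2.0 * math.pi * t

def p4_second_deriv(t, p):
    # p^4 (f0 * theta_p)''(t) = p^4 F0 * theta_p^(4)(t), by (C.9) and (C.5)
    return F0(t - 2*p) - 4*F0(t - p) + 6*F0(t) - 4*F0(t + p) + F0(t + 2*p)

def piecewise(t, p):
    pi = math.pi
    if 0 <= t <= pi:                 # (C.12)
        return 3*t**2 + 3*pi**2 - 4*pi*p - 6*(1 + math.cos(t))
    if pi <= t <= p - pi:            # (C.13)
        return 6*pi*t - 4*pi*p
    if p - pi <= t <= p + pi:        # (C.14)
        return -2*(t - p + pi)**2 + 6*pi*t - 4*pi*p + 4*(1 + math.cos(t - p))
    if p + pi <= t <= 2*p - pi:      # (C.15)
        return -2*pi*t + 4*pi*p
    if 2*p - pi <= t <= 2*p + pi:    # (C.16)
        return 0.5*(t - 2*p - pi)**2 - (1 + math.cos(t - 2*p))
    return 0.0                       # vanishes identically beyond 2p + pi

p = 4 * math.pi                      # j = -1 in the dyadic family
ts = [0.01 * k for k in range(int((2*p + math.pi) / 0.01))]
assert max(abs(p4_second_deriv(t, p) - piecewise(t, p)) for t in ts) < 1e-9

# For p >= 3*pi the unique positive zero is t = (2/3)p, inside the linear piece (C.13).
assert abs(piecewise(2*p/3, p)) < 1e-9
print("piecewise representation confirmed for p = 4*pi")
```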
Fig. C.3. Plots of $p^4(f_0 * \theta_p)''$ for three values of $p$.

C.4 Hunting the zeros of $(f_0 * \theta_p)''$

There are two problems: to find the zeros and to show that they are simple. The method used depends on the size of $p$. It is relatively easy to do the analysis for $p \le \frac{\pi}{4}$ and for $p \ge 3\pi$. The intermediate cases require more computation. We will not present all of the computations involved in this analysis, but we will indicate at least one way to proceed in each case. As before, we will consider only $t \ge 0$. In each case, we will prove that there is one simple zero between $0$ and $\pi + 2p$. All of the functions vanish identically for $t \ge \pi + 2p$. We will consider three cases.

Case 1: $p \le \frac{\pi}{4}$

The representation (C.11) shows that $t = \frac{\pi}{2}$ is a simple zero if $p < \frac{\pi}{4}$. It also shows that $\frac{\pi}{2}$ is a zero when $p = \frac{\pi}{4}$; that it is a simple zero then follows from the continuity of the function $(f_0 * \theta_p)'''$. Moreover, there are no other zeros in $[0, \pi + 2p)$. The fact that there are no other zeros between $\frac{\pi}{2}$ and $\pi + 2p$ is easily established for $p \le \frac{\pi}{8}$ if one writes

$$(f_0 * \theta_p)''(t) = \int f_0''(s)\,\theta_p(s - t)\,ds = -\int_{-\pi}^{\pi}\cos(s)\,\theta_p(s - t)\,ds. \qquad (C.17)$$

When $p \le \frac{\pi}{8}$, the last integral equals $-\int_{t-2p}^{\pi}\cos(s)\,\theta_p(s - t)\,ds$, which is strictly positive for $\frac{\pi}{2} < t < \pi + 2p$. The case $p = \frac{\pi}{4}$ also follows from (C.17), but the argument is less obvious. There is no problem when $t \ge \pi$, since for these values the integrand is positive. Thus,
we consider $\frac{\pi}{2} < t < \pi$. In this case, the integral is

$$\int_{t-\pi/2}^{\pi} -\cos(s)\,\theta_{\pi/4}(s - t)\,ds = \left[\int_{t-\pi/2}^{\pi/2} + \int_{\pi/2}^{t} + \int_{t}^{\pi}\right](-\cos(s))\,\theta_{\pi/4}(s - t)\,ds.$$

The second and third integrals are positive, but the first is negative. We compare the values of the first and third integrals to show that the first is smaller in absolute value than the third. For this we write

$$A(t) = \int_{t-\pi/2}^{\pi/2} -\cos(s)\,\theta_{\pi/4}(s - t)\,ds = \int_{t}^{\pi} -\cos\Big(s - \frac{\pi}{2}\Big)\,\theta_{\pi/4}\Big(s - \frac{\pi}{2} - t\Big)\,ds$$

and

$$B(t) = \int_{t}^{\pi} -\cos(s)\,\theta_{\pi/4}(s - t)\,ds = \int_{t}^{\pi} -\cos(s - \pi - t)\,\theta_{\pi/4}(s - \pi)\,ds.$$

Since $\frac{\pi}{2} < t < s < \pi$, it is not difficult to see that $|{-\cos(s - \frac{\pi}{2})}| \le -\cos(s - \pi - t)$ and that $\theta_{\pi/4}(s - \frac{\pi}{2} - t) \le \theta_{\pi/4}(s - \pi)$. Hence $|A(t)| \le B(t)$, and since the second integral is positive, the sum is positive.

It is not necessary for the counterexample (and thus they are not included), but arguments similar to the last one can be used to show that $(f_0 * \theta_p)''$ is strictly positive for $\frac{\pi}{2} < t < \pi + 2p$ when $\frac{\pi}{8} < p < \frac{\pi}{4}$. This establishes that for $p \le \frac{\pi}{4}$, $(f_0 * \theta_p)''$ has only one simple zero when $0 \le t < \pi + 2p$.

Case 2: $p \ge 3\pi$

We have essentially dealt with this case in section C.3. It is clear from the analysis of the representation (C.12)–(C.16) that $p^4(f_0 * \theta_p)''$ has only one zero, $t = \frac23 p$, in $[0, \pi + 2p)$ and that it is simple.

Case 3: $\frac{\pi}{4} < p < 3\pi$

This is the no-man's-land case where the supports of $f_0$ and $\theta$ are about the same size, and consequently the computations become more difficult. We have done specific computations for $p = \frac{\pi}{2}$, $\pi$, and $2\pi$ using the representation (C.9) and developing explicit formulas similar to (C.12)–(C.16). The result, as expected, is that there is exactly one simple zero in the interval $[0, \pi + 2p)$ in each case. We also have located these zeros well enough for the task at hand, which is to show that these zeros are also zeros of $(R * \theta_p)''$. These computations, while perhaps tedious, are completely elementary.

The results on the zeros of $(f_0 * \theta_p)''$ are summarized as a lemma.

Lemma C.2.
For each $p = 2\pi 2^{-j}$, $j \in \mathbb Z$, $(f_0 * \theta_p)''$ has only two (symmetric) zeros in the interval $(-\pi - 2p, \pi + 2p)$, and these zeros are simple.

• For $j \le -1$, the zeros are $\pm\frac23 p$.

• For $j = 0$, the zeros are located in the intervals $\frac{5\pi}{4} < |t| < \frac{11\pi}{8}$.

• For $j = 1$, the zeros are located in the intervals $\frac{5\pi}{8} < |t| < \frac{3\pi}{4}$.

• For $j = 2$, the zeros are located in the intervals $\frac{\pi}{2} < |t| < \frac{5\pi}{8}$.

• For $j \ge 3$, the zeros are $\pm\frac{\pi}{2}$.
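The hand computations behind the intermediate cases can be reproduced by bisection on the representation (C.9). The sketch below is ours, not the book's; the brackets asserted for $j = 0, 1, 2$ are the ones we verified numerically for the zero summary.

```python
import math

def F0(t):
    # the antiderivative (C.10)
    if t <= -math.pi:
        return 0.0
    if t <= math.pi:
        return 0.5*(t + math.pi)**2 - (1.0 + math.cos(t))
    return 2.0*math.pi*t

def G(t, p):
    # p^4 (f0 * theta_p)''(t), via (C.9) and the five-point filter (C.5)
    return F0(t - 2*p) - 4*F0(t - p) + 6*F0(t) - 4*F0(t + p) + F0(t + 2*p)

def zero(p, lo, hi):
    # bisection; the bracket must straddle the sign change
    assert G(lo, p) < 0 < G(hi, p)
    for _ in range(80):
        mid = 0.5*(lo + hi)
        lo, hi = (mid, hi) if G(mid, p) < 0 else (lo, mid)
    return 0.5*(lo + hi)

pi = math.pi
t0 = zero(2*pi, pi, 2*pi)        # j = 0
assert 5*pi/4 < t0 < 11*pi/8
t1 = zero(pi, pi/2, pi)          # j = 1
assert 5*pi/8 < t1 < 3*pi/4
t2 = zero(pi/2, pi/2, pi)        # j = 2
assert pi/2 < t2 < 5*pi/8
print(t0/pi, t1/pi, t2/pi)       # roughly 1.34, 0.73, 0.52
```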
C.5 The functions $R$, $R * \theta_p$, $(R * \theta_p)'$, and $(R * \theta_p)''$

The function $R$ is defined in terms of a small $C^\infty$ function whose support is contained in $\frac{3\pi}{4} \le |t| \le \frac{7\pi}{8}$. Thus let $h$ be an arbitrary function in $C^\infty$ with support in $[\frac{3\pi}{4}, \frac{7\pi}{8}]$. Define

$$g(t) = \begin{cases} h(t) & \text{if } 0 \le t \le \pi, \\ 0 & \text{if } t > \pi, \\ -h(-t) & \text{if } t < 0. \end{cases} \qquad (C.18)$$

Define $R(t) = g'''(t)$. Then $R$ is an even $C^\infty$ function defined on the whole real line. The function $R$ has been defined as a third derivative so that the representations

$$(R * \theta_p)''(t) = g' * \theta_p^{(4)}(t), \qquad (C.19)$$

$$(R * \theta_p)'(t) = g * \theta_p^{(4)}(t) \qquad (C.20)$$

hold for all $p$ and all $t \in \mathbb R$. The utility of these representations, which play a central role in our arguments, is based on the following two facts: First, the supports of both $g'$ and $g$ are contained in the set $K = [-\frac{7\pi}{8}, -\frac{3\pi}{4}] \cup [\frac{3\pi}{4}, \frac{7\pi}{8}]$. Second (which will be proved later), the support of the filter $\tau_{t_0}\theta_p^{(4)}$, when $t_0$ is an isolated zero of $(f_0 * \theta_p)''$, does not intersect the interior of $K$. This means that $g' * \theta_p^{(4)}(t_0) = 0$ and $g * \theta_p^{(4)}(t_0) = 0$, and thus, by (C.19) and (C.20), that $(R * \theta_p)''$ and $(R * \theta_p)'$ vanish at these zeros. This shows that the fact that $(R * \theta_p)''$ and $(R * \theta_p)'$ vanish at the zeros of $(f_0 * \theta_p)''$ depends only on the support of $h$.

C.6 $(R * \theta_p)''$ and $(R * \theta_p)'$ vanish at the zeros of $(f_0 * \theta_p)''$

The idea of the proof is to locate the zeros of $(f_0 * \theta_p)''$ and then to use the representations (C.19) and (C.20) to show that these zeros are also zeros of $(R * \theta_p)''$ and $(R * \theta_p)'$. The motivation for this approach is to see explicitly the conditions that must be imposed on $h$ to make the counterexample work. In this part of the proof, it is only the support of $h$ that counts.

Having located the zeros of $(f_0 * \theta_p)''$, it is a simple exercise to show that they are zeros of $(R * \theta_p)''$ and $(R * \theta_p)'$. In fact, if $t_0 > 0$ is an isolated zero of $(f_0 * \theta_p)''$, then the points $t_0 - 2p$, $t_0 - p$, $t_0$, $t_0 + p$, and $t_0 + 2p$ do not intersect the interior of the support of $g$ or $g'$, which is $K$ in both cases.
In only two cases, $j = 3$ and $j = 4$, does one of these points even intersect the boundary of $K$ (at $\frac{3\pi}{4}$), and both $g$ and $g'$ vanish (along with all of their derivatives) at this point. For $j \ge 5$, the five points completely miss $K$. If $t_0 = \frac23 p$ for $j \le -1$, then the five points in question are $-\frac43 p$, $-\frac13 p$, $\frac23 p$, $\frac53 p$, and $\frac83 p$, and none of these points come even close to $K$. The cases $j = 0, 1, 2$ are easily checked, although here one must check that the five points of the filter miss $K$ for the range of values indicated in the "zero summary." In short, the support of $\tau_{t_0}\theta_p^{(4)}$ misses $K$ whenever $t_0$ is a zero of $(f_0 * \theta_p)''$. We note that the isolated zeros of $(f_0 * \theta_p)''$ are all zeros of "infinite order" of $(R * \theta_p)''$. Since the supports of $g$ and $g'$ are contained in $K$, the zeros of $(f_0 * \theta_p)''$ are also zeros of $(R * \theta_p)'$. This proves that having $(R * \theta_p)''$ and $(R * \theta_p)'$ vanish at the zeros of $(f_0 * \theta_p)''$ depends only on the support of $h$. Proving that the zeros of $(f_0 * \theta_p)''$ and $(f * \theta_p)''$ are the same for the discrete values $p = 2\pi 2^{-j}$, $j \in \mathbb Z$, is another matter.
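The avoidance claims of this section can be spot-checked numerically. In the sketch below (ours, not the book's), we take $[\frac{3\pi}{4}, \frac{7\pi}{8}]$ for the positive half of $K$ as in this presentation, approximate the intermediate zeros by the values computed earlier, and verify that no filter point lands in the interior of $K$; boundary contact (the cases $j = 3, 4$) is permitted, so the comparison uses a small tolerance.

```python
import math
pi = math.pi

def in_K_interior(x):
    # K = [-7pi/8, -3pi/4] U [3pi/4, 7pi/8]; boundary contact is allowed,
    # hence the small tolerance in the strict comparisons.
    return 3*pi/4 + 1e-9 < abs(x) < 7*pi/8 - 1e-9

approx = {0: 1.337*pi, 1: 0.731*pi, 2: 0.522*pi}   # hand-computed zeros t0(p)
for j in range(-4, 9):
    p = 2*pi*2.0**(-j)
    if j <= -1:
        t0 = 2*p/3          # exact zero for p >= 4*pi
    elif j >= 3:
        t0 = pi/2           # exact zero for p <= pi/4
    else:
        t0 = approx[j]
    points = [t0 - 2*p, t0 - p, t0, t0 + p, t0 + 2*p]
    assert not any(in_K_interior(x) for x in points), (j, points)
print("the five filter points avoid the interior of K for j = -4..8")
```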
C.7 The behavior of $(R * \theta_p)''/(f_0 * \theta_p)''$

As indicated in the introduction, we wish to show that

$$\frac{(R * \theta_p)''(t)}{(f_0 * \theta_p)''(t)}$$

is bounded uniformly in $p = 2\pi 2^{-j}$, $j \in \mathbb Z$. But before getting into the details, we make some preliminary observations. The range for $t$ will always be $[-\pi - 2p, \pi + 2p]$, but because of the symmetry we only look at $0 \le t \le \pi + 2p$. For each $p$, $(R * \theta_p)''(t) = 0$ for all $t > \frac{7\pi}{8} + 2p$ and $(f_0 * \theta_p)''(t) = 0$ for all $t > \pi + 2p$. For each fixed $p = 2\pi 2^{-j}$, the function $(R * \theta_p)''/(f_0 * \theta_p)''$ is continuous on the interval $0 \le t < \pi + 2p$. This follows from the fact that the isolated, simple zero of $(f_0 * \theta_p)''$ is a zero of $(R * \theta_p)''$. The "other" zero of $(f_0 * \theta_p)''$, that is, $t_0 = \pi + 2p$, offers no problem, since $(R * \theta_p)''$ is identically zero in a neighborhood of $t_0$. Thus, for each $j$, there is a constant $M_j > 0$ such that

$$\left|\frac{(R * \theta_p)''(t)}{(f_0 * \theta_p)''(t)}\right| \le M_j$$

for $0 \le t < \pi + 2p$. Our immediate goal is to show that there is an $M$ such that $M_j \le M$ for all $j$. Again, we consider cases.

For $j \ge 5$, we use the representations (C.11) and (C.19). We know from (C.19) that $(R * \theta_p)''(t) = 0$ for $t > \frac{7\pi}{8} + 2p$ for all $p$; since $2p \le \frac{\pi}{8}$ when $j \ge 5$, the function $(R * \theta_p)''$ is supported in $\frac{5\pi}{8} \le t \le \pi$. On this interval, it is clear from (C.11) that

$$|(f_0 * \theta_p)''(t)| \ge \Big(\frac{2}{\pi}\Big)^4\cos\Big(\frac{3\pi}{8}\Big) = C > 0.$$

Thus, whenever $(R * \theta_p)''(t) \ne 0$,

$$\left|\frac{(R * \theta_p)''(t)}{(f_0 * \theta_p)''(t)}\right| \le C^{-1}\,|(R * \theta_p)''(t)|.$$

Since $(R * \theta_p)''(t) = R'' * \theta_p(t)$, and since $|R'' * \theta_p(t)| \le \max|R''(t)| = \max|g^{(5)}(t)|$, we have

$$\left|\frac{(R * \theta_p)''(t)}{(f_0 * \theta_p)''(t)}\right| \le C^{-1}\max|g^{(5)}(t)|. \qquad (C.21)$$

Hence, there is an $M_{j\ge5} > 0$ such that

$$\left|\frac{(R * \theta_p)''(t)}{(f_0 * \theta_p)''(t)}\right| \le M_{j\ge5}$$

for all $j \ge 5$.

When $j \le -1$, we continue with our program of "explicit representation" and use the representation (C.12)–(C.16) to show that $|p^4(f_0 * \theta_p)''(t)|$ is bounded away from zero, uniformly in $p$, for any $t$ where $(R * \theta_p)''(t) \ne 0$. First, use (C.19) to deduce that $(R * \theta_p)''(t) = 0$ for $\pi \le t \le p - \pi$ and use (C.13) to see that

$$p^4(f_0 * \theta_p)''(\pi) = 6\pi^2 - 4\pi p \le -10\pi^2,$$

and

$$p^4(f_0 * \theta_p)''(p - \pi) = 2\pi p - 6\pi^2 \ge 2\pi^2.$$
This takes care of any possibility that $p^4(f_0 * \theta_p)''(t)$ comes close to zero on the support of $(R * \theta_p)''$ between $t = 0$ and $t = t_m$, where $p^4(f_0 * \theta_p)''(t)$ is maximum. Next, we must see what happens for $t > t_m$. Recall that $(R * \theta_p)''(t) = 0$ for all $t > \frac{7\pi}{8} + 2p$. Thus, we wish to investigate the value of $p^4(f_0 * \theta_p)''(t)$ at $t = \frac{7\pi}{8} + 2p$. For this, we use (C.16) and discover that

$$p^4(f_0 * \theta_p)''\Big(\frac{7\pi}{8} + 2p\Big) = \frac{\pi^2}{128} - 1 + \frac{\sqrt{2 + \sqrt2}}{2} > 0.$$

This means that $|p^4(f_0 * \theta_p)''(t)|$ is bounded away from zero uniformly on the support of $(R * \theta_p)''$. The representation (C.19) and the fact that only one point of the filter can intersect $K$ when $p \ge 4\pi$ ($j \le -1$) imply that $|p^4(R * \theta_p)''(t)| \le 6\max|g'(t)|$. Hence,

$$\left|\frac{(R * \theta_p)''(t)}{(f_0 * \theta_p)''(t)}\right| \le M_{j\le-1}, \qquad (C.22)$$

where $M_{j\le-1}$ is controlled by $6\max|g'(t)|$ and the lower bounds just obtained. For each $j = 0, 1, 2, 3, 4$ there is an $M_j$ such that

$$\left|\frac{(R * \theta_p)''(t)}{(f_0 * \theta_p)''(t)}\right| \le M_j.$$

By taking $M = \max\{M_{j\le-1}, M_0, M_1, M_2, M_3, M_4, M_{j\ge5}\}$, we see that (C.3) is satisfied uniformly in $p$. The result follows as indicated in the introduction by choosing $\lambda$ so that $\lambda M < 1$ and using Lemma C.1.

C.8 Remarks

We have analyzed the zeros of the function $(f_0 * \theta_p)''$ in considerable detail. To summarize, let $t_0(p)$ denote the unique zero of $(f_0 * \theta_p)''$ in the interval $[0, \pi + 2p)$. Then

$$t_0(p) = \begin{cases} \frac{\pi}{2} & \text{if } 0 < p \le \frac{\pi}{4}, \\ \frac23 p + \varepsilon(p) & \text{if } \frac{\pi}{4} < p < 3\pi, \\ \frac23 p & \text{if } 3\pi \le p < \infty. \end{cases} \qquad (C.23)$$

We have not analyzed the function $\varepsilon(p)$, but we know that it is relatively small and positive. For example, $\varepsilon(2\pi) \le (0.005)\pi$.

This behavior of $t_0(p)$ is qualitatively typical in the following sense: If $\theta$ is replaced by any symmetric kernel $\eta$ in $C^3$ having compact support $[-T, T]$ and having the property that $\eta''$ has exactly one simple zero, say, at $t = \tau$, $0 < \tau < T$, then $\frac{\pi}{2}$ is the unique simple zero of $(f_0 * \eta_p)''(t)$ for all sufficiently small $p$, and $t_0(p)$ is asymptotic to the linear function $\tau p$ as $p \to +\infty$. This is the case, for example, if one takes $\eta = f_0$. That the zeros behave as claimed can be seen by using the representation

$$(f_0 * \eta_p)''(pt) = p^{-2}\int f_0(p(t - s))\,\eta''(s)\,ds$$
for large $p$ and the representation

$$(f_0 * \eta_p)''(t) = p^{-1}\int f_0''(t - s)\,\eta(p^{-1}s)\,ds$$

for small $p$. The point here is that this part of the counterexample is not particularly sensitive to the kernel $\theta$. However, the fact that $\theta_p^{(4)}$ is a linear combination of delta functions makes it easy to evaluate the zeros for the finite number of intermediate values that we needed to locate for our particular construction.

On the other hand, the nature of $\theta^{(4)}$ was critical for other parts of the counterexample. We constructed the perturbation $R$ so that $\tau_{t_0}\theta_p^{(4)}$ would not intersect the (interior of the) support of $R$ for $t_0 = t_0(2\pi 2^{-j})$, $j \in \mathbb Z$. This depended on the fact that $\tau_{t_0}\theta_p^{(4)}$ is concentrated at five points, and it was easy to avoid these points for the discrete values $p = 2\pi 2^{-j}$, $j \in \mathbb Z$. Note that this cannot happen if we consider all values of $p$, which means that we do not have a counterexample for the continuous case. In fact, it is clear from (C.23) that the five points $t_0(p) - 2p$, $t_0(p) - p$, $t_0(p)$, $t_0(p) + p$, $t_0(p) + 2p$ sweep out the whole real line $\mathbb R$ as $p$ traverses the real axis, and thus there is no place to "hide" the support of a perturbation $R$.

Assuming that we have a kernel $\eta$ satisfying the conditions indicated above, there is no problem in having the support of $R$ miss the support of $\eta_p$ for small $p$. The problem arises when $p$ is large. For example, if the support of $\tau_{t_0}\eta_p$ includes all of $[t_0 - pT, t_0 + pT]$, then this support will ultimately cover all of $\mathbb R$, and again there is no place to hide a perturbation. This is the case, for example, if $\eta = f_0$, the Tukey kernel. Having said this, it is conceivable that there are other combinations of perturbations $R$ and kernels $\eta$ such that $R * \eta_{p_j}(t_0) = 0$ for some sequence $p_j \to +\infty$. In our example, we have attributed this to the fact that the supports of $R$ and $\tau_{t_0}\theta_p^{(4)}$ do not intersect.
This, however, is just a reflection of the fact that $R$ was constructed as the third derivative of a function $g$ with compact support, and hence that $\int t^n R(t)\,dt = 0$ for $0 \le n \le 3$ (the case $n = 3$ uses the fact that $g$ is odd). By now it should be clear that the assumption $p = 2\pi 2^{-j}$ is not necessary for our construction. We could have used any sequence $p_j$ such that $p_j \to +\infty$ as $j \to -\infty$ and $p_j \to 0$ as $j \to +\infty$. The essential point is that there are only a finite number of $p_j$, $j \in J$, that must be checked "by hand." In fact, it is not necessary to do the computations, as we have done. One can argue as follows: For each $j \in J$, the function $(f_0 * \theta_{p_j})''(t)$ has only a finite number of isolated zeros. This follows from the fact that $F_0 * \theta_{p_j}^{(4)}$ agrees with an entire function on each subinterval on which it has an analytic form, and a nonzero entire function has only finitely many zeros on a compact interval. Ensuring that $(R * \theta_{p_j})''$ and $(R * \theta_{p_j})'$ vanish at these zeros amounts to writing finitely many linear equations of the form $l_1(R) = 0, \dots, l_N(R) = 0$. Since the vector space of our perturbations $R$ is infinite dimensional, there are infinitely many nontrivial $R$ that satisfy the conditions.

Ensuring that the zeros of $(f_0 * \theta_p)''$ are zeros of $(R * \theta_p)''$ and $(R * \theta_p)'$ is only part of the problem. The other part is to guarantee that the perturbation $R$ does not introduce new zeros. In our example, we were able to bound

$$\frac{(R * \theta_p)''(t)}{(f_0 * \theta_p)''(t)}$$

uniformly for large $p$ and for small $p$, and we were faced with only a finite number of intermediate values. In this part of the argument, the size of $R''$, or the fifth
derivative of $h$, enters the picture. In fact, it is not difficult to see that a large value of $g^{(5)}(t)$ can introduce new zeros in the cases where $p$ is small. On the other hand, we see from (C.22) that controlling $g'$ is sufficient when $p$ is large. But since $g$ has compact support, we have $\sup|g'(t)| \le (\frac{\pi}{8})^4\sup|g^{(5)}(t)|$, and it is fairly clear that the counterexample depends on having $|g^{(5)}(t)|$ sufficiently small. We have not, however, carried the analysis to the point where we can say exactly how small $|g^{(5)}(t)|$ must be.

C.9 A case of perfect reconstruction

We mentioned in Chapter 8 that perfect reconstruction is possible if the analyzed function $f$ has compact support and if the kernel $\theta$ is the Tukey window. Here is the precise statement and proof of that case of perfect reconstruction.

Theorem C.1. Assume that $f$ is a real-valued function in $L^1(\mathbb R)$ with compact support. If $\theta$ is the kernel

$$\theta(t) = \begin{cases} 1 + \cos t & \text{for } |t| \le \pi, \\ 0 & \text{for } |t| > \pi, \end{cases}$$

then $f$ is uniquely determined by knowing the location of the zeros of $(f * \theta_{p_j})''$ and the values of $(f * \theta_{p_j})'$ at these zeros for any sequence $p_j$ such that $p_j \to +\infty$ as $j \to +\infty$.

Proof. The proof depends on the fact that the Fourier transform $\hat f$ is the restriction to the real line of the entire function $\hat f(z) = \int f(x)e^{-izx}\,dx$, and thus that $\hat f$ is uniquely determined by the values of $\hat f$ at a sequence of points that tends to zero as $j \to +\infty$. We are going to assume that we know a value of $R > 0$ such that $f(t) = 0$ for $|t| \ge R$.

The first step is to compute $(f * \theta_p)''(t)$ and $(f * \theta_p)'(t)$, and for this we assume that $|t| \le \pi p - R$. For these values of $t$, we have

$$(f * \theta_p)''(t) = -p^{-3}\int_{-\pi p}^{\pi p} f(t - s)\cos(p^{-1}s)\,ds = -p^{-3}\int_{-\infty}^{\infty} f(t - s)\cos(p^{-1}s)\,ds. \qquad (C.24)$$

Write $\hat f(x) = A(x)e^{i\alpha(x)}$ with the conventions that

$$A(x) \ge 0, \qquad (C.25)$$

$$-\pi < \alpha(x) \le \pi. \qquad (C.26)$$

Since $f$ is real-valued, $\hat f(-x) = \overline{\hat f(x)} = A(x)e^{-i\alpha(x)}$, and thus from (C.24) we have

$$(f * \theta_p)''(t) = -\frac{1}{p^3}A(p^{-1})\cos\big[tp^{-1} + \alpha(p^{-1})\big].$$
(C.27)

A similar computation shows that

$$(f * \theta_p)'(t) = -\frac{1}{p^2}A(p^{-1})\sin\big[tp^{-1} + \alpha(p^{-1})\big]. \qquad (C.28)$$
These expressions hold for $t \in [-\pi p + R, \pi p - R]$. In particular, they hold in the interval $I_p = [-\frac{\pi}{2}p, \frac{\pi}{2}p)$ whenever $p \ge \frac{2}{\pi}R$. The next point is central to the argument, and we state it as a lemma.

Lemma C.3. Either $(f * \theta_p)''$ vanishes identically on $I_p$ or $(f * \theta_p)''$ has exactly one zero, $t = t_p$, in $I_p$. In the latter case, $t_p p^{-1} + \alpha(p^{-1}) = \pm\frac{\pi}{2}$.

It is clear from (C.27) that $(f * \theta_p)''$ vanishes identically on $I_p$ if and only if $A(p^{-1}) = 0$. Thus, assume that $A(p^{-1}) \ne 0$. We consider the linear function $l(t) = tp^{-1} + \alpha(p^{-1})$. $l$ maps $I_p$ onto the half-open interval

$$\Big[-\frac{\pi}{2} + \alpha(p^{-1}),\ \frac{\pi}{2} + \alpha(p^{-1})\Big). \qquad (C.29)$$

$l(I_p)$ contains one, and only one, number of the form $\frac{\pi}{2} + k\pi$, $k \in \mathbb Z$—irrespective of the value of $\alpha(p^{-1})$. Furthermore, since we require that $\alpha(x)$ be contained in $(-\pi, \pi]$, the only points of this form that can appear in $l(I_p)$ are $\pm\frac{\pi}{2}$. This proves the lemma.

One part of the hypothesis is that we "know" the zeros of $(f * \theta_p)''$. Thus, given $p$ (sufficiently large) we know if $(f * \theta_p)''$ vanishes identically on $I_p$ or not. If it vanishes, then $A(p^{-1}) = 0$, and we know that $\hat f(p^{-1}) = 0$. If $(f * \theta_p)''$ does not vanish identically on $I_p$, then we know from the lemma that there is exactly one $t_p \in I_p$ where $(f * \theta_p)''(t_p) = 0$ and that

$$t_p p^{-1} + \alpha(p^{-1}) = \pm\frac{\pi}{2}. \qquad (C.30)$$

The other part of the hypothesis is that we know the value of $(f * \theta_p)'(t_p)$. In particular, we know if $(f * \theta_p)'(t_p) > 0$ or if $(f * \theta_p)'(t_p) < 0$. If $(f * \theta_p)'(t_p) > 0$, then we know from (C.28) that

$$t_p p^{-1} + \alpha(p^{-1}) = -\frac{\pi}{2}.$$

Similarly, if $(f * \theta_p)'(t_p) < 0$, then we know that

$$t_p p^{-1} + \alpha(p^{-1}) = +\frac{\pi}{2}.$$

Thus we know unambiguously that

$$\alpha(p^{-1}) = -\big[\operatorname{sign}(f * \theta_p)'(t_p)\big]\frac{\pi}{2} - t_p p^{-1} \qquad (C.31)$$

and that

$$A(p^{-1}) = p^2\,\big|(f * \theta_p)'(t_p)\big|. \qquad (C.32)$$

This completes the proof, but it is perhaps useful to state what we have done as an algorithm. Assume that we are given a sequence of positive numbers $\{p_j\}_{j\in\mathbb N}$ such that $p_{j+1} > p_j$ and $p_j \to +\infty$ as $j \to +\infty$.
The algorithm reads as follows:

Step 1: For all sufficiently large $p_j$ ($p_j \ge \frac{2}{\pi}R$ if we know $R$), examine the zeros of $(f * \theta_{p_j})''$ in the interval $[-\frac{\pi}{2}p_j, \frac{\pi}{2}p_j)$. If $(f * \theta_{p_j})''$ vanishes identically on this interval, then $\hat f(p_j^{-1}) = 0$, and we know the value of $\hat f(p_j^{-1})$. In this case, go to
the next value of $j$ and repeat Step 1. If $(f * \theta_{p_j})''$ does not vanish identically on $I_{p_j}$, go to Step 2.

Step 2: Denote the unique zero of $(f * \theta_{p_j})''$ in the interval $[-\frac{\pi}{2}p_j, \frac{\pi}{2}p_j)$ by $t_j$. If $(f * \theta_{p_j})'(t_j) > 0$, then

$$\alpha(p_j^{-1}) = -\frac{\pi}{2} - t_j p_j^{-1} \quad\text{and}\quad A(p_j^{-1}) = p_j^2\,(f * \theta_{p_j})'(t_j),$$

in which case we know $\hat f(p_j^{-1})$. If $(f * \theta_{p_j})'(t_j) < 0$, then

$$\alpha(p_j^{-1}) = \frac{\pi}{2} - t_j p_j^{-1} \quad\text{and}\quad A(p_j^{-1}) = -p_j^2\,(f * \theta_{p_j})'(t_j),$$

and again we know the value of $\hat f(p_j^{-1})$. Go to the next value of $j$ and return to Step 1.

This algorithm produces the sequence $\{\hat f(p_j^{-1})\}_{j\in\mathbb N}$, and this sequence determines the entire function $\hat f(z) = \int f(x)e^{-izx}\,dx$ in the following sense: If $g_1$ and $g_2$ are entire functions and if $g_1(p_j^{-1}) = g_2(p_j^{-1})$ for infinitely many $j \in \mathbb N$, then $g_1 = g_2$. There is another way in which the $\hat f(p_j^{-1})$ determine $\hat f$: If $\hat f(z) = \sum_{n=0}^{\infty} a_n z^n$, then the coefficients $a_n$ can be computed inductively by the relation

$$a_N = \lim_{j\to\infty}\Big(\hat f(p_j^{-1}) - \sum_{n=0}^{N-1} a_n p_j^{-n}\Big)p_j^{N}.$$

Finally, since the Fourier transform $f \mapsto \hat f$ is one-to-one, $f$ is uniquely determined by $\hat f$. $\square$

Strict constructionists may find these "determinations" or "reconstructions" less than satisfactory, and, indeed, as it is stated, all we have is a uniqueness theorem. Mallat's conjecture is proved for this specific window. We note, however, that large values of $p$ played a key role, since we needed to have information for an infinite number of points $p_j$ that tend to infinity. It would be interesting to know if having the same information for a sequence $p_j$ that tends to zero would also guarantee uniqueness.
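The recovery step of the algorithm is easy to exercise on synthetic data. In the sketch below (ours; the helper names and the chosen numbers are hypothetical), we fabricate $(f * \theta_p)'$ and $(f * \theta_p)''$ from the expressions (C.27)–(C.28) for one pair $A$, $\alpha$, locate $t_p$ by bisection, and recover $A(p^{-1})$ and $\alpha(p^{-1})$ via (C.31)–(C.32).

```python
import math

p = 40.0
A_true, alpha_true = 2.5, 1.1          # A(p^-1) >= 0, alpha(p^-1) in (-pi, pi]

def second(t):   # (f * theta_p)''(t) = -p^-3 A cos(t/p + alpha), valid on I_p
    return -A_true * math.cos(t/p + alpha_true) / p**3

def first(t):    # (f * theta_p)'(t)  = -p^-2 A sin(t/p + alpha)
    return -A_true * math.sin(t/p + alpha_true) / p**2

# Locate the unique zero of `second` in I_p = [-pi*p/2, pi*p/2) by bisection.
lo, hi = -math.pi*p/2, math.pi*p/2
if second(lo) * second(hi) > 0:        # cannot happen when A != 0, by Lemma C.3
    raise ValueError("no sign change in I_p")
for _ in range(200):
    mid = 0.5*(lo + hi)
    lo, hi = (mid, hi) if second(lo)*second(mid) > 0 else (lo, mid)
t_p = 0.5*(lo + hi)

sgn = 1.0 if first(t_p) > 0 else -1.0
alpha_rec = -sgn * math.pi/2 - t_p/p   # (C.31)
A_rec = p**2 * abs(first(t_p))         # (C.32)
assert abs(alpha_rec - alpha_true) < 1e-6 and abs(A_rec - A_true) < 1e-6
print("recovered A and alpha:", A_rec, alpha_rec)
```

This is, of course, only the single-scale step; the theorem then feeds the recovered values $\hat f(p_j^{-1})$ into the uniqueness argument for entire functions.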
APPENDIX D

Hölder Spaces and Besov Spaces

This appendix contains the definitions and some fundamental results about Besov spaces and related spaces that are mentioned in Chapter 9 and again prominently in Chapter 11.

D.1 Hölder spaces

We begin by defining the homogeneous Hölder spaces because they are simple and they lead naturally to the Besov spaces. For a given $\alpha$, $0 < \alpha < 1$, $\dot C^\alpha(\mathbb R^n)$ is defined to be the set of all continuous functions $f$ such that

$$|f(y) - f(x)| \le C|y - x|^\alpha$$

for all $x, y \in \mathbb R^n$. If we let

$$\|f\|_{\dot C^\alpha} = \sup_{x \ne y}\frac{|f(y) - f(x)|}{|y - x|^\alpha},$$

then $\|\cdot\|_{\dot C^\alpha}$ is a norm and $\dot C^\alpha(\mathbb R^n)$ is a Banach space in this norm, modulo the constant functions. This definition can be reformulated using the modulus of continuity $\omega_\infty(f, h)$, which is defined as follows:

$$\omega_\infty(f, h) = \sup_{x \in \mathbb R^n,\ |y| \le h}|f(x + y) - f(x)|.$$

Then $f$ belongs to $\dot C^\alpha(\mathbb R^n)$ if and only if $\omega_\infty(f, h) \le Ch^\alpha$. It is easy to see that

$$\sup_{h > 0}\frac{\omega_\infty(f, h)}{h^\alpha} = \|f\|_{\dot C^\alpha}.$$

If $1 < \alpha < 2$, the definition is similar, but $[\Delta_y f](x) = f(x + y) - f(x)$ is replaced by $[\Delta_y^2 f](x) = f(x + 2y) - 2f(x + y) + f(x)$. The space $\dot C^\alpha(\mathbb R^n)$ is again defined by the condition $\omega_\infty(f, h) \le Ch^\alpha$. It is a Banach space, but now the elements are taken modulo the affine functions. For $N < \alpha < N + 1$, $[\Delta_y f](x)$ is replaced by the iterated difference $[\Delta_y^{N+1} f](x)$. $\dot C^\alpha(\mathbb R^n)$ is then a Banach space of functions modulo polynomials $P_N$, where $\deg P_N \le N$.

The spaces $\dot C^\alpha(\mathbb R^n)$ (with the dot) are said to be homogeneous for the following reason: If $0 < \lambda < \infty$ and $f_\lambda(x) = f(\lambda x)$, then $\|f_\lambda\|_{\dot C^\alpha} = \lambda^\alpha\|f\|_{\dot C^\alpha}$. The nonhomogeneous Hölder spaces $C^\alpha(\mathbb R^n)$ (without the dot) are defined by the relation
$C^\alpha(\mathbb R^n) = \dot C^\alpha(\mathbb R^n) \cap L^\infty(\mathbb R^n)$. To be more precise, we should say that a function $f$ is in $C^\alpha(\mathbb R^n)$ if and only if it is a bounded representative of an element of $\dot C^\alpha(\mathbb R^n)$. The norm of $f \in C^\alpha(\mathbb R^n)$ is defined by $\|f\|_{C^\alpha} = \|f\|_\infty + \|f\|_{\dot C^\alpha}$, and $C^\alpha(\mathbb R^n)$ is a Banach space in this norm, but the norm does not satisfy the homogeneous property. If the spaces are defined on a compact subset $K \subset \mathbb R^n$, then $\dot C^\alpha(K) = C^\alpha(K)$, and there is no distinction. In this case, $f_\lambda$ does not make sense, and the homogeneous property is lost.

Both $\dot C^\alpha(\mathbb R^n)$ and $C^\alpha(\mathbb R^n)$ have advantages in analysis. The advantage of $C^\alpha(\mathbb R^n)$ is that it is an algebra: If $f, g \in C^\alpha(\mathbb R^n)$, then $fg \in C^\alpha(\mathbb R^n)$. This is not true for $\dot C^\alpha(\mathbb R^n)$. On the other hand, there are examples where the large-scale behavior (or complete self-similarity) is important, and where it is necessary to admit functions that are unbounded at infinity. Examples include fractional Brownian motion or, more generally, $1/f$ processes. In other situations, it is only necessary to focus on small scales.

D.2 Besov spaces

We will move from the homogeneous Hölder spaces $\dot C^\alpha(\mathbb R^n)$ to the homogeneous Besov spaces $\dot B^{\alpha,q}(L^p)(\mathbb R^n)$ (which are often denoted by $\dot B_p^{\alpha,q}(\mathbb R^n)$) in two steps: The modulus of continuity $\omega_\infty(f, h)$ is replaced by

$$\omega_p(f, h) = \sup_{|y| \le h}\|f(x + y) - f(x)\|_{L^p(\mathbb R^n)},$$

and the condition $\omega_\infty(f, h) \le Ch^\alpha$ is replaced by $\omega_p(f, h) \le \varepsilon(h)h^\alpha$, where $\varepsilon(h)$ must satisfy the condition

$$\int_0^\infty \varepsilon(h)^q\,\frac{dh}{h} < \infty.$$

The norm of $f \in \dot B^{\alpha,q}(L^p)$ is naturally defined as

$$\|f\|_{\dot B_p^{\alpha,q}} = \left(\int_0^\infty \frac{\omega_p(f, h)^q}{h^{\alpha q}}\,\frac{dh}{h}\right)^{1/q},$$

and this norm is homogeneous with $\|f_\lambda\|_{\dot B_p^{\alpha,q}} = \lambda^{\alpha - \frac np}\|f\|_{\dot B_p^{\alpha,q}}$.

This new definition is for the case $1 \le p \le \infty$, $1 \le q \le \infty$, and $0 < \alpha < 1$. For $N < \alpha < N + 1$, $n \ge 1$, it is necessary to replace $[\Delta_y f](x)$ by $[\Delta_y^{N+1} f](x)$. If $1 \le p \le \infty$, $1 \le q \le \infty$, and $0 < \alpha < \infty$, then $\dot B_p^{\alpha,q}(\mathbb R^n)$ is a Banach space of functions modulo polynomials of degree $N$, where $N$ is the integer part of $\alpha - \frac np$. (There are no polynomials when $\alpha < \frac np$.)
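As a concrete illustration of the moduli of continuity just defined (ours, not the book's), the $L^1$ modulus of a characteristic function grows linearly: $\omega_1(\chi, h) = 2h$ for small $h$, the borderline $\alpha = 1$ behavior that underlies the Besov memberships exploited in section D.3. The grid and tolerances below are arbitrary choices.

```python
import math

a, b = 0.0, 1.0
chi = lambda x: 1.0 if a < x < b else 0.0   # the "edge" signal of section D.3

def omega_1(h, n=50000, lo=-2.0, hi=3.0):
    # midpoint Riemann sum for sup_{|y| <= h} || chi(.+y) - chi ||_{L^1};
    # for an indicator the sup is attained at |y| = h.
    dx = (hi - lo) / n
    xs = [lo + (k + 0.5) * dx for k in range(n)]
    return max(sum(abs(chi(x + y) - chi(x)) for x in xs) * dx for y in (h, -h))

for h in (0.05, 0.1, 0.2):
    assert abs(omega_1(h) - 2*h) < 1e-2     # omega_1(chi, h) = 2h
print("omega_1(chi, h) = 2h verified on a grid")
```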
In parallel with what was done for Hölder spaces, the nonhomogeneous Besov space $B_p^{\alpha,q}(\mathbb R^n)$ is by definition the intersection $\dot B_p^{\alpha,q}(\mathbb R^n) \cap L^p(\mathbb R^n)$, and $\|f\|_{B_p^{\alpha,q}} = \|f\|_{L^p} + \|f\|_{\dot B_p^{\alpha,q}}$.

Besov spaces are easily characterized by size estimates on wavelet coefficients. Let $\psi_{j,k}^{(i)} = 2^{nj/2}\psi^{(i)}(2^j x - k)$, $j \in \mathbb Z$, $k \in \mathbb Z^n$, $i = 1, \dots, 2^n - 1$, be an orthonormal wavelet basis for $L^2(\mathbb R^n)$, where each $\psi^{(i)}$ belongs to the Schwartz class $\mathcal S(\mathbb R^n)$. Then all of the moments of the wavelets vanish, and hence $\int P(x)\psi^{(i)}(x)\,dx = 0$ for all
polynomials $P$. As is customary, we simplify the notation by dropping the index $i$. We also change the normalization of the wavelet coefficients of $f$ and write

$$c(j,k) = 2^{nj}\int f(x)\,\psi(2^j x - k)\,dx. \qquad (D.1)$$

The integral in (D.1) is unambiguously defined for elements of the Besov space $\dot B^{\alpha,q}(L^p)$, since any two representatives $f$ and $g$ of the same element differ by a polynomial of degree less than or equal to $\alpha - \frac np$. With these conventions, we have the following result.

Theorem D.1. If $f$ belongs to $\dot B^{\alpha,q}(L^p)$, then the sequence $\varepsilon_j$ defined by

$$\Big(\sum_{k\in\mathbb Z^n}|c(j,k)|^p\Big)^{1/p} = 2^{-j(\alpha - \frac np)}\,\varepsilon_j \qquad (D.2)$$

belongs to $l^q(\mathbb Z)$. Conversely, if the wavelet coefficients of a function $f$ satisfy this condition, then $f = g + P$, where $g \in \dot B^{\alpha,q}(L^p)$ and $P$ is a polynomial. (There is no restriction on the degree of $P$.)

A simplification occurs when $\alpha = \frac np$ and $p = q$. In this case, condition (D.2) becomes

$$\sum_{j\in\mathbb Z}\sum_{k\in\mathbb Z^n}|c(j,k)|^p < \infty, \qquad (D.3)$$

and the Besov space $\dot B^{n/p,p}(L^p)(\mathbb R^n)$ is isomorphic to the sequence space $l^p$. When $0 < p \le 1$, $p = q$, and $\alpha = \frac np$, the corresponding Besov space can be defined either by (D.3) or by the following growth property of the dyadic blocks $\Delta_j(f)$ that occur in the Littlewood–Paley expansion of $f$. This growth property reads

$$\|\Delta_j(f)\|_p \le \varepsilon_j 2^{-j\alpha}, \qquad \varepsilon_j \in l^q(\mathbb Z), \qquad (D.4)$$

and it characterizes the Besov space $\dot B_p^{\alpha,q}(\mathbb R^n)$. In the particular case $p = q$ and $\alpha = \frac np$, condition (D.4) becomes

$$\sum_{j=-\infty}^{\infty} 2^{nj}\|\Delta_j(f)\|_p^p < \infty. \qquad (D.5)$$

The wavelet coefficients $c(j,k)$ of $f$ can be interpreted as a sample of $\Delta_j(f)$ on the grid $2^{-j}\mathbb Z^n$, and this heuristic leads to replacing $\|\Delta_j(f)\|_p^p$ by the Riemann sum $\sum_{k\in\mathbb Z^n}|c(j,k)|^p 2^{-nj}$. By carrying out this program, one can show that (D.5) is equivalent to $\sum_j\sum_k|c(j,k)|^p < \infty$. The details can be found in [203].

D.3 Examples

We are going to illustrate the use of Besov spaces for modeling and denoising with a textbook example.
The signal we wish to denoise is written as the sum of two terms $s(x) = \theta(x) + \lambda g(x)\cos(\omega x)$, where the signal $\theta$ is the characteristic function of an interval $(a,b)$, the noise is a modulated Gaussian $g_\omega(x) = g(x)\cos(\omega x)$ ($\omega$ large), and the coefficient $\lambda$ is a small parameter. Clearly, this noise is an academic simplification, but the following discussion also applies to more realistic situations.
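The disparity between the two terms can be previewed directly on discrete wavelet coefficients. The sketch below is a numerical illustration only: the Haar wavelet and the parameter values ($N = 1024$, the grid $[-4,4]$, $\omega = 40$, $p = 1/2$) are our own choices, not taken from the text.

```python
import numpy as np

def haar_details(x):
    """Orthonormal discrete Haar transform: detail coefficients per scale."""
    a, details = x.astype(float), []
    while len(a) > 1:
        details.append((a[0::2] - a[1::2]) / np.sqrt(2.0))
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return details

def lp_sum(details, p):
    """Discrete analogue of criterion (D.3): sum over j, k of |c(j,k)|^p."""
    return sum(float(np.sum(np.abs(d) ** p)) for d in details)

N = 1024
x = np.linspace(-4.0, 4.0, N)
theta = ((x > -1.0) & (x < 1.0)).astype(float)  # indicator of (a, b) = (-1, 1)
noise = np.exp(-x**2) * np.cos(40.0 * x)        # modulated Gaussian, omega = 40

p = 0.5  # 0 < p < 1: the smaller p, the sharper the discrimination
lp_theta = lp_sum(haar_details(theta), p)
lp_noise = lp_sum(haar_details(noise), p)
print(lp_theta, lp_noise)  # the edge scores far lower than the oscillation
```

The indicator has only a handful of nonzero Haar coefficients (a few per scale, located at the jumps), so its $\ell^p$ sum stays small, while the oscillating term spreads its energy over many coefficients, which a small $p$ penalizes heavily.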
We will try to extract the signal by using a regularity criterion. From the usual point of view, the signal $\theta$ is not a regular function, whereas the noise $g_\omega$ is infinitely differentiable. In trying to extract the signal $\theta$ from the sum $\theta(x) + \lambda g(x)\cos(\omega x)$, which represents the noisy signal, we have a problem where the "good" function is irregular and the "bad" function is regular. We will see that the judicious use of Besov spaces lets us reverse the order of things. From the point of view of the Sobolev spaces $H^s$, the relative regularity of $\theta$, compared with that of the product $g(x)\cos(\omega x)$, increases with $\omega$. If, for example, the exponent $s$ of the Sobolev space is less than $\frac{1}{2}$, the Sobolev norm of $\theta$ is bounded while that of $g(x)\cos(\omega x)$ tends to infinity like $\omega^s$ as $\omega \to \infty$. In this sense, the Sobolev norm knows how to distinguish edges from textures. This contrast is even greater if one uses the Besov spaces $B^{s,q}(L^p)$. In fact, if $0 < p < 1$, $s = 1/p$, and $q = \infty$, then $\theta$ belongs to the corresponding Besov space while the norm of $g(x)\cos(\omega x)$ is of the order $\omega^{1/p}$. (With Sobolev spaces, the best one can do is $\omega^{1/2-\varepsilon}$.) This means that if $\lambda$ is small enough and $\omega$ is large enough, one can extract the signal $\theta$ from the noisy signal $\theta(x) + \lambda g(x)\cos(\omega x)$ using a criterion based on the optimization of a Besov norm. In this case, one is using the Besov space $B^{1/p,\infty}(L^p)$, and the closer $p$ is to zero, the sharper is the discrimination between the main term $\theta(x)$ and the error term $\lambda g(x)\cos(\omega x)$. We note that the jump discontinuities of $\theta(x)$ do not prevent it from belonging to the Besov space $B^{1/p,\infty}(L^p)$. We see then how this approach is preferable to low-pass filtering, which would indeed eliminate $\lambda g(x)\cos(\omega x)$ but at the same time would blur the edges of $\theta(x)$. A similar example in two dimensions is given by $f = u + v$, where $u$ is the characteristic function of the unit disc and $v$ is a Gaussian white noise.
Measured in the Besov norm $B^{s,q}(L^p)$ where $s < 1/p$, the function $u$ has a relatively small norm, while the norm of $v$ is infinite whenever $s > -1$. The discrepancy between $u$ and $v$ becomes more apparent as $p$ tends to zero. This leads us to a program that discriminates the edges from the textures (and from the noise) by using a criterion given by the Besov norm and that requires the final result to have a "small" Besov norm. This is indeed the viewpoint adopted by Donoho. For Donoho, the a priori knowledge is modeled by membership in certain Besov spaces.
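Donoho's program can be illustrated with a minimal numerical sketch. The version below is a simplification under stated assumptions: Haar wavelets, hard thresholding with a hand-picked threshold, and our own parameter values. It is not the Besov-norm optimization itself, but it shows the coefficient-domain picture behind it: thresholding separates an edge from an oscillating perturbation.

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar analysis: coarse approximation plus details per scale."""
    a, details = x.astype(float), []
    while len(a) > 1:
        details.append((a[0::2] - a[1::2]) / np.sqrt(2.0))
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a, details

def haar_idwt(a, details):
    """Exact inverse of haar_dwt."""
    for d in reversed(details):
        out = np.empty(2 * len(d))
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a

N = 1024
x = np.linspace(-4.0, 4.0, N)
theta = ((x > -1.0) & (x < 1.0)).astype(float)       # clean edge signal
s = theta + 0.2 * np.exp(-x**2) * np.cos(40.0 * x)   # noisy signal

a, det = haar_dwt(s)
t = 2.0  # hand-picked: between the oscillation's and the jumps' coefficients
det = [np.where(np.abs(d) > t, d, 0.0) for d in det]  # hard thresholding
denoised = haar_idwt(a, det)

print(np.max(np.abs(s - theta)))         # error before thresholding (~0.2)
print(np.max(np.abs(denoised - theta)))  # error after: edges kept, noise gone
```

All Haar coefficients of the oscillation stay well below the threshold, while the few coefficients carrying the jumps lie far above it, so thresholding removes the perturbation without blurring the edges, exactly the behavior that low-pass filtering cannot achieve.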
Bibliography

Numbers in brackets following an entry indicate the pages where the reference is cited. Several books and articles, which are not cited in the text, have also been listed.

[1] P. Abry, Ondelettes et turbulences: Multirésolutions, algorithmes de décomposition, invariance d'échelle et signaux de pression, Diderot Éditeur, Arts et Sciences, Paris, 1997. [140]
[2] P. Abry and F. Sellan, The wavelet-based synthesis for the fractional Brownian motion proposed by F. Sellan and Y. Meyer: Remarks and fast implementation, Appl. Comput. Harmon. Anal., 3 (1996), pp. 377-383. [21, 65, 181]
[3] E. H. Adelson, E. Simoncelli, and R. Hingorani, Orthogonal pyramid transforms for image coding, in Proc. SPIE Conf. Visual Comm. Image Process. II, vol. 845, 1987, pp. 50-58. Reprinted in Selected Papers in Image Coding and Compression, M. Rabbani, ed., SPIE Milestone Series, SPIE Press, Bellingham, WA, 1992, pp. 331-339. [36, 49]
[4] F. Anscombe, The transformation of Poisson, binomial and negative-binomial data, Biometrika, 35 (1948), pp. 246-254. [193, 195]
[5] F. Anselmet, Y. Gagne, E. J. Hopfinger, and R. A. Antonia, High-order velocity structure functions in turbulent shear flow, J. Fluid Mech., 140 (1984), pp. 63-89. [130]
[6] A. Antoniadis and G. Oppenheim, eds., Wavelets and Statistics, Lecture Notes in Statistics 103, Springer-Verlag, New York, 1995.
[7] A. Arneodo, F. Argoul, E. Bacry, J. Elezgaray, and J.-F. Muzy, Ondelettes, multifractales et turbulences: De l'ADN aux croissances cristallines, Diderot Éditeur, Arts et Sciences, Paris, 1995. [55, 135, 200]
[8] A. Arneodo, B. Audit, E. Bacry, S. Manneville, J.-F. Muzy, and S. G. Roux, Thermodynamics of fractal signals based on wavelet analysis: Applications to fully developed turbulence data and DNA sequences, Phys. A, 254 (1998), pp. 24-45. [137]
[9] A. Arneodo, E. Bacry, S. Jaffard, and J.-F. Muzy, Oscillating singularities on Cantor sets: A grand-canonical multifractal formalism, J. Statist.
Phys., 87 (1997), pp. 179-209. [142]
[10] A. Arneodo, E. Bacry, and J.-F. Muzy, The thermodynamics of fractals revisited with wavelets, Phys. A, 213 (1995), pp. 232-275. [140]
[11] ——, Random cascades on wavelet dyadic trees, J. Math. Phys., 39 (1998), pp. 4142-4164. [136]
[12] A. Arneodo, Y. d'Aubenton-Carafa, B. Audit, E. Bacry, J.-F. Muzy, and C. Thermes, What can we learn with wavelets about DNA sequences?, Phys. A, 249 (1998), pp. 439-448. [137]
[13] A. Arneodo, Y. d'Aubenton-Carafa, E. Bacry, P. V. Graves, J.-F. Muzy, and C. Thermes, Wavelet based fractal analysis of DNA sequences, Phys. D, 96 (1996), pp. 291-320. [137]
[14] J.-M. Aubry, Traces of oscillating functions, J. Fourier Anal. Appl., 5 (1999), pp. 331-345. [143]
[15] E. Bacry, A. Arneodo, U. Frisch, Y. Gagne, and E. Hopfinger, Wavelet analysis of fully developed turbulence data and measurement of scaling exponents, in Turbulence and Coherent Structures, O. Métais and M. Lesieur, eds., Kluwer Academic Press, Norwell, MA, 1991, pp. 203-215. [130, 139]
[16] E. Bacry, J.-F. Muzy, and A. Arneodo, Singularity spectrum of fractal signals from wavelet analysis: Exact results, J. Statist. Phys., 70 (1993), pp. 635-674. [134, 135, 140]
[17] R. Balian, Un principe d'incertitude fort en théorie du signal ou en mécanique quantique, C. R. Acad. Sci. Paris Sér. II, 292 (1981), pp. 1357-1361. [9, 90, 67]
[18] R. G. Baraniuk and D. L. Jones, New dimensions in wavelet analysis, in Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., IEEE Press, Piscataway, NJ, 1992. [100]
[19] ——, New signal-space orthonormal bases via the metaplectic transform, in IEEE-SP Internat. Symposium on Time-Frequency and Time-Scale Analysis, IEEE Press, Piscataway, NJ, 1992. [100]
[20] ——, Unitary equivalence: A new twist on signal processing, IEEE Trans. Signal Process., 43 (1995), pp. 2269-2282. [100]
[21] ——, Wigner-based formulation of the chirplet transform, IEEE Trans. Signal Process., 44 (1996), pp. 3129-3135. [100]
[22] J. Barral, Moments, continuité et analyse multifractale des martingales de Mandelbrot, Probab. Theory Related Fields, 113 (1999), pp. 535-570. [131]
[23] M. Basseville, A. Benveniste, K. Chou, S. Golden, R. Nikoukhah, and A. Willsky, Modeling and estimation of multiresolution stochastic processes, Special issue of IEEE Trans. Inform. Theory on Wavelet Transforms and Multiresolution Signal Analysis, 38 (1992), pp. 766-784. [133]
[24] M. Basseville, A. Benveniste, and A. Willsky, Multiscale autoregressive processes, part I: Schur-Levinson parametrizations, IEEE Trans. Signal Process., 40 (1992), pp. 1915-1934.
[133]
[25] ——, Multiscale autoregressive processes, part II: Lattice structures for whitening and modeling, IEEE Trans. Signal Process., 40 (1992), pp. 1935-1954. [133]
[26] G. Batchelor and A. A. Townsend, The nature of turbulent motion at large wave numbers, Proc. Roy. Soc. London Ser. A, 199 (1949), pp. 238-255. [130]
[27] G. Battle, A block spin construction of ondelettes, part II: The QFT connection, Comm. Math. Phys., 114 (1988), pp. 93-102. [32]
[28] ——, Wavelet refinement of the Wilson recursion formula, in Recent Advances in Wavelet Analysis, L. Schumaker and G. Webb, eds., Academic Press, Norwell, MA, 1994, pp. 87-118. [32]
[29] G. Battle and P. Federbush, Ondelettes and phase cluster expansions, a vindication, Comm. Math. Phys., 109 (1987), pp. 417-419. [32]
[30] ——, Divergence-free vector wavelets, Michigan Math. J., 40 (1993), pp. 181-195. [145]
[31] A. Benassi, S. Jaffard, and D. Roux, Analyse multi-échelle des champs gaussiens markoviens d'ordre p indexés par [0,1], C. R. Acad. Sci. Paris Sér. I, (1991), pp. 403-406. [21]
[32] P. Bendjoya, E. Slezak, and C. Froeschlé, The wavelet transform, a new tool for asteroid family determination, Astronom. Astrophys., 251 (1991), pp. 312-330.
[33] J. J. Benedetto and M. W. Frazier, eds., Wavelets: Mathematics and Applications, CRC Press, Boca Raton, FL, 1993.
[34] J. Berger, R. R. Coifman, and M. J. Goldberg, Removing noise from music using local trigonometric bases and wavelet packets, J. Audio Eng. Soc., 42 (1994), pp. 808-818. [72, 104, 105]
[35] J. Bertoin, The inviscid Burgers equation with Brownian initial velocity, Comm. Math. Phys., 193 (1998), pp. 397-406. [165]
[36] A. Bijaoui, Wavelets and astrophysical applications, in Wavelets in Physics, H. C. van den Berg, ed., Cambridge Univ. Press, Cambridge, U.K., 1997, pp. 77-115. [187, 196, 199, 201]
[37] R. E. Blahut, W. Miller Jr., and C. H. Wilcox, eds., Radar and Sonar, Part I, Springer-Verlag, New York, 1991. [74]
[38] Y. Bobichon and A.
Bijaoui, A regularized image restoration algorithm for lossy compression in astronomy, Experiment. Astronom., 7 (1997), pp. 239-255. [194, 196, 197]
[39] K. Bouyoucef, D. Fraix-Burnet, and S. Roques, Interactive deconvolution with error analysis (IDEA) in astronomical imaging: Application to aberrated HST images on SN1987A, M87 and 3C66B, Astronom. Astrophys. Suppl. Ser., 121 (1997), pp. 575-585. [193]
[40] L. Brillouin, Science and Information Theory, Academic Press, New York, 1956. [9]
[41] C. M. Brislawn, Fingerprints go digital, Notices Amer. Math. Soc., 42 (1995), pp. 1278-1283. [6, 70, 97, 194]
[42] P. J. Burt and E. H. Adelson, The Laplacian pyramid as a compact image code, IEEE Trans. Comm., 31 (1983), pp. 532-540. [50]
[43] P. L. Butzer and E. L. Stark, "Riemann's example" of a continuous nondifferentiable function in the light of two letters (1865) of Christoffel to Prym, Bull. Soc. Math. Belg., 38 (1986), pp. 45-73. [150]
[44] J. S. Byrnes, Quadrature mirror filters, low crest factor arrays, functions achieving optimal uncertainty principle bounds, and complete orthonormal sequences—A unified approach, Appl. Comput. Harmon. Anal., 1 (1994), pp. 261-266. [111]
[45] M. Cannone, Ondelettes, paraproduits et Navier-Stokes, Diderot Éditeur, Arts et Sciences, Paris, 1995. [147]
[46] R. Carmona, W.-L. Hwang, and B. Torrésani, Practical Time-Frequency Analysis, vol. 9 of Wavelet Analysis and Its Applications, Academic Press, San Diego, CA, 1998.
[47] B. Castaing and B. Dubrulle, Fully developed turbulence: A unifying point of view, J. Physique II (Paris), 5 (1995), p. 895. [131]
[48] C. V. L. Charlier, How an infinite world may be built up, Ark. Mat. Astron. Fys., 16 (1922), pp. 1-34. [198]
[49] E. Chassande-Mottin and P. Flandrin, On the time-frequency detection of chirps, Appl. Comput. Harmon. Anal., 6 (1999), pp. 252-281. [80, 103]
[50] J.-Y. Chemin, Calcul paradifférentiel précisé et application à des équations aux dérivées partielles non semi-linéaires, Duke Math. J., 56 (1988), pp. 431-469.
[147] [51] -----, Persistance de structures geometriques dans les fluides incompressibles bidimen- sionnels, Ann. Ecole Normale Superieure, 26 (1993), pp. 1-26. [147] [52] A. J. Chorin and J. E. Marsden, A Mathematical Introduction to Fluid Mechanics, Springer-Verlag, New York, 1979. [139] [53] Z. Ciesielski, Holder conditions for realizations of Gaussian processes, Trans. Amer. Math. Soc., 99 (1961), pp. 403-413. [21] [54] -----, Properties of the orthonormal Franklin system, Studia Math., 23 (1963), pp. 141- 157. [24] [55] -----, Properties of the orthonormal Franklin system, II, Studia Math., 27 (1966), pp. 289- 323. [24] [56] A. Cohen, W. Dahmen, and R. DeVore, Adaptive wavelet methods for elliptic operator equations: Convergence rates, Math. Comput., to appear. [147] [57] A. Cohen, I. Daubechies, and J.-C. Feauveau, Biorthogonal bases of compactly supported wavelets, Comm. Pure Appl. Math., 44 (1992), pp. 485-560. [63] [58] A. Cohen, I. Daubechies, and P. Vial, Wavelets and fast wavelet transform on an inter- val, Appl. Comput. Harmon. Anal., 1 (1993), pp. 54-81. [180, 192] [59] A. Cohen, R. DeVore, P. Petrushev, and H. Xu, Nonlinear approximation and the space BV(R2), Amer. J. Math., 121 (1999), pp. 587-628. [177, 183] [60] A. Cohen and R. D. Ryan, Wavelets and Multiscale Signal Processing, Chapman &; Hall, London, 1995. [45, 47, 55, 62, 208] [61] R. R. Coifman, Adapted multiresolution analysis, computation, signal processing and op- erator theory, in Proc. Internat. Congr. Math., Kyoto, Japan, 1990, vol. II, Springer-Verlag, New York, 1991, pp. 879-887. [62] R. R. Coifman and D. Donoho, Translation-invariant de-noising, in Wavelets in Statistics, A. Antoniadis and G. Oppenheim, eds., Springer-Verlag, New York, 1995, pp. 125-150. [99] [63] R. R. Coifman, G. Matviyenko, and Y. Meyer, Modulated Malvar-Wilson bases, Appl. Comput. Harmon. Anal., 4 (1997), pp. 58-61. [100]
[64] R. R. Coifman and Y. Meyer, Remarques sur l'analyse de Fourier à fenêtre, C. R. Acad. Sci. Paris Sér. I Math., 312 (1991), pp. 259-261. [92]
[65] R. R. Coifman, Y. Meyer, and V. Wickerhauser, Size properties of wavelet packets, in Wavelets and Their Applications, M. B. Ruskai et al., eds., Jones and Bartlett, Boston, MA, 1992, pp. 453-470. [108, 109, 111]
[66] R. R. Coifman, Y. Meyer, S. Quake, and V. Wickerhauser, Signal processing and compression with wavelet packets, in Progress in Wavelet Analysis and Applications, Y. Meyer and S. Roques, eds., Editions Frontières, Gif-sur-Yvette, France, 1993, pp. 77-93.
[67] A. Córdoba and C. Fefferman, Wave packets and Fourier integral operators, Comm. Partial Differential Equations, 3 (1978), pp. 979-1005. [87]
[68] A. Croisier, D. Esteban, and C. Galand, Perfect channel splitting by use of interpolation/decimation/tree decomposition techniques, in Internat. Conf. Inform. Sci. Systems, Patras, Greece, 1976, pp. 443-446. [35]
[69] M. Dæhlen and T. Lyche, Decomposition of splines, in Mathematical Methods in CAGD and Image Processing, T. Lyche and L. Schumaker, eds., Academic Press, Boston, MA, 1992, pp. 135-160.
[70] K. Daoudi, A. Frakt, and A. Willsky, Multiscale autoregressive models and wavelets, Special issue of IEEE Trans. Inform. Theory on Multiscale Statistical Signal Analysis and its Applications, 45 (1999), pp. 828-845. [133]
[71] I. Daubechies, Orthonormal bases of compactly supported wavelets, Comm. Pure Appl. Math., 41 (1988), pp. 909-996. [9]
[72] ——, The wavelet transform, time-frequency localization and signal analysis, IEEE Trans. Inform. Theory, 36 (1990), pp. 961-1005. [106]
[73] ——, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992. [9, 46]
[74] I. Daubechies, S. Jaffard, and J.-L. Journé, A simple Wilson orthonormal basis with exponential decay, SIAM J. Math. Anal., 22 (1991), pp. 554-573. [91]
[75] G. Davis, S. G. Mallat, and M.
Avellaneda, Adaptive greedy approximations, Constr. Approx., 13 (1997), pp. 57-98. [71]
[76] N. G. de Bruijn, Uncertainty principles in Fourier analysis, in Inequalities, O. Shisha, ed., Academic Press, New York, 1967, pp. 57-71. [87]
[77] J.-M. Delort, FBI-Transformation, Second Microlocalization and Semilinear Caustics, Lecture Notes in Math. 1522, Springer-Verlag, New York, 1992. [87]
[78] R. DeVore, Adaptive wavelet bases for image compression, in Curves and Surfaces in Geometric Design, P.-J. Laurent, A. Le Méhauté, and L. Schumaker, eds., A K Peters, Natick, MA, 1994, pp. 1-16.
[79] ——, Nonlinear approximation, Acta Numer., 7 (1998), pp. 51-150. [71]
[80] R. DeVore, B. Jawerth, and B. Lucier, Surface compression, Comput. Aided Geom. Design, 9 (1992), pp. 219-239. [175]
[81] R. DeVore, B. Jawerth, and V. Popov, Compression of wavelet decompositions, Amer. J. Math., 114 (1992), pp. 737-785. [169, 170, 174]
[82] R. DeVore and G. G. Lorentz, Constructive Approximation, Springer-Verlag, New York, 1993.
[83] R. DeVore and B. Lucier, Fast wavelet techniques for near-optimal image processing, in Proc. 1992 IEEE Military Comm. Conf., IEEE Press, Piscataway, NJ, 1992, pp. 1129-1135. [183]
[84] R. DeVore, B. Lucier, M. Kallergi, W. Qian, R. Clark, E. Saff, and L. P. Clarke, Wavelet compression and segmentation of mammographic images, J. Digital Imag., 7 (1994), pp. 27-38. [175]
[85] R. DeVore, B. Lucier, and Z. Yang, Feature extraction in digital mammography, in Wavelets in Biology and Medicine, A. Aldroubi and M. Unser, eds., CRC Press, Boca Raton, FL, 1996, pp. 145-156. [175]
[86] R. DeVore and V. Popov, Interpolation spaces and non-linear approximation, in Function Spaces and Applications, 1986, M. Cwikel et al., eds., Lecture Notes in Math. 1302, Springer-Verlag, New York, 1988. [172]
[87] R. DeVore, Z. Yang, M. Kallergi, B. Lucier, W. Qian, R. Clark, and L. P. Clarke, The effect of wavelet bases on the compression of digital mammograms, IEEE Engrg. Med. Biol., 15 (1995), pp. 570-577. [175]
[88] D. Donoho, Wavelet shrinkage and W.V.D.: A ten-minute tour, in Progress in Wavelet Analysis and Applications, Y. Meyer and S. Roques, eds., Editions Frontières, Gif-sur-Yvette, France, 1993. [168]
[89] ——, Denoising by soft thresholding, IEEE Trans. Inform. Theory, 41 (1995), pp. 613-627. [102]
[90] ——, Nonlinear solution of linear inverse problems by wavelet-vaguelette decomposition, Appl. Comput. Harmon. Anal., 2 (1995), pp. 101-126. [168]
[91] ——, Tight frames of k-plane ridgelets and the problem of representing objects that are smooth away from d-dimensional singularities in R^n, Proc. Nat. Acad. Sci. U.S.A., 96 (1999), pp. 1828-1833. [185]
[92] D. Donoho and I. Johnstone, Ideal denoising in an orthonormal basis chosen from a library of bases, C. R. Acad. Sci. Paris Sér. A, 319 (1994), pp. 1317-1322. [168]
[93] ——, Ideal spatial adaptation via wavelet shrinkage, Biometrika, 81 (1994), pp. 425-455. [168]
[94] D. Donoho, I. M. Johnstone, G. Kerkyacharian, and D. Picard, Wavelet shrinkage: Asymptopia?, J. Roy. Statist. Soc. Ser. B, 57 (1995), pp. 301-369. [168]
[95] D. Donoho, M. Vetterli, R. DeVore, and I. Daubechies, Data compression and harmonic analysis, IEEE Trans. Inform. Theory, 44 (1998), pp. 2435-2476. [36]
[96] P. du Bois-Reymond, Versuch einer Classification der willkürlichen Functionen reeller Argumente nach ihren Aenderungen in den kleinsten Intervallen, J. Reine Angew. Math., 79 (1875), pp. 21-37. [150]
[97] J. J. Duistermaat, Self-similarity of 'Riemann's nondifferentiable function', Nieuw Arch. Wisk., 9 (1991), pp. 303-337. [150, 158, 159]
[98] E. Escalera, E. Slezak, and A. Mazure, New evidence for subclustering in the Coma cluster using the wavelet analysis, Astronom. Astrophys., 269 (1992), pp.
379-384.
[99] D. Esteban and C. Galand, Application of quadrature mirror filters to split band voice coding systems, in Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., IEEE Press, Piscataway, NJ, 1977, pp. 191-195. [206]
[100] G. Faber, Über die orthogonalen Funktionen des Herrn Haar, Jahresber. Deutsch. Math.-Verein., 19 (1910), pp. 104-112. [18]
[101] K. Falconer, Fractal Geometry: Mathematical Foundations and Applications, John Wiley & Sons, West Sussex, U.K., 1993. [147, 162]
[102] A. Fan, Moyenne de localisation fréquentielle des paquets d'ondelettes, Rev. Mat. Iberoamericana, 14 (1998), pp. 63-70. [107]
[103] M. Farge, The continuous wavelet transform of two-dimensional turbulent flows, in Wavelets and Their Applications, M. B. Ruskai et al., eds., Jones and Bartlett, Boston, MA, 1992, pp. 275-302. [140]
[104] M. Farge, E. Goirand, Y. Meyer, F. Pascal, and V. Wickerhauser, Improved predictability of two-dimensional turbulent flows using wavelet packet compression, Fluid Dynam. Res., 10 (1992), pp. 229-250. [141]
[105] M. Farge, N. Kevlahan, V. Perrier, and E. Goirand, Wavelets and Turbulence, Proc. IEEE, 84 (1996), pp. 639-669. [137]
[106] S. Fauve, C. Laroche, and B. Castaing, Pressure fluctuations in swirling turbulent flows, J. Physique II (Paris), 3 (1993), pp. 271-278. [140]
[107] J.-C. Feauveau, Analyse multirésolution par ondelettes non orthogonales et bases de filtres numériques, Ph.D. thesis, Univ. of Paris-South, Orsay, France, 1990. [63]
[108] P. Federbush, Quantum theory in ninety minutes, Bull. Amer. Math. Soc., 17 (1987), pp. 93-103. [32, 33]
[109] C. Fefferman, The multiplier problem for the ball, Ann. of Math., 94 (1971), pp. 330-336.
[110] P. Flandrin, Some aspects of non-stationary signal processing with emphasis on time-frequency and time-scale methods, in Wavelets: Time-Frequency Methods and Phase Space, J.-M. Combes, A. Grossmann, and P. Tchamitchian, eds., Springer-Verlag, Berlin, 1989, pp. 68-98. [167]
[111] ——, Wavelet analysis and synthesis of fractional Brownian motion, IEEE Trans. Inform. Theory, 38 (1992), pp. 910-917. [21]
[112] ——, Time-Frequency/Time-Scale Analysis, Academic Press, San Diego, CA, 1998. [73, 76, 80, 85, 86, 103]
[113] E. Fournier d'Albe, Two New Worlds, Longmans, Green, London, 1907. [198]
[114] M. Frazier, B. Jawerth, and G. Weiss, Littlewood-Paley Theory and the Study of Function Spaces, AMS, Providence, RI, 1991. [23]
[115] G. Freud, Über trigonometrische Approximation und Fouriersche Reihen, Math. Z., 78 (1962), pp. 252-262. [150]
[116] P. Frick and V. Zimin, Hierarchical models of turbulence, in Wavelets, Fractals, and Fourier Transforms, M. Farge, J. C. R. Hunt, and J. C. Vassilicos, eds., vol. 43 of Inst. Math. Appl. Conf. Ser. New Ser., The Clarendon Press, Oxford, U.K., 1993, pp. 265-283. [145]
[117] J. Friedman and W. Stuetzle, Projection pursuit regression, J. Amer. Statist. Assoc., 76 (1981), pp. 817-823. [71]
[118] U. Frisch, Turbulence: The Legacy of A. N. Kolmogorov, Cambridge Univ. Press, Cambridge, U.K., 1995. [127]
[119] U. Frisch, P. L. Sulem, and M. Nelkin, A simple dynamical model of intermittent fully developed turbulence, J. Fluid Mech., 87 (1978), pp. 719-736.
[120] K. Fritze, M. Lange, H. Oleak, and G. M. Richter, A scanning microphotometer with an on-line data reduction for large field Schmidt plates, Astron. Nach., 298 (1977), pp. 189-196. [194]
[121] J. Froment, Traitement d'images et applications de la transformée en ondelettes, Ph.D. thesis, Univ. of Paris-Dauphine, Paris, 1990.
[122] ——, A functional analysis model for natural images permitting structured compression, ESAIM: Control, Optimisation and Calculus of Variations, 4 (1999), pp. 473-495. Available on-line at http://www.emath.fr. [115]
[123] J. Froment and J.-M. Morel, Analyse multiéchelle, vision stéréo et ondelettes, in Les ondelettes en 1989, P. G. Lemarié, ed., Lecture Notes in Math. 1438, Springer-Verlag, Berlin, 1990, pp.
51-80.
[124] D. Gabor, Theory of communication, J. IEE, 93 (1946), pp. 429-457. [9, 67, 90]
[125] C. Galand, Codage en sous-bandes: théorie et applications à la compression numérique du signal de parole, Ph.D. thesis, Univ. of Nice, Nice, France, 1983. [35]
[126] C. Gasquet and P. Witomski, Fourier Analysis and Applications: Filtering, Numerical Computation, Wavelets, Springer-Verlag, New York, 1998.
[127] J. Gerver, The differentiability of the Riemann function at certain rational multiples of π, Amer. J. Math., 92 (1970), pp. 33-55. [20, 158]
[128] ——, More on the differentiability of the Riemann function, Amer. J. Math., 93 (1970), pp. 33-41. [20, 158]
[129] ——, On Cubic Lacunary Fourier Series, Rutgers Univ., Camden, NJ, preprint, 1999. [164]
[130] J. Glimm and A. Jaffe, Quantum Physics: A Functional Integral Point of View, 2nd ed., Springer-Verlag, New York, 1987. [33, 78]
[131] H. H. Goldstine and J. von Neumann, On the principles of large scale computing machines, in John von Neumann: Collected Works, vol. 5, A. Taub, ed., Pergamon Press, Oxford, U.K., 1963, pp. 1-32. [This paper was never published elsewhere. It contains material presented by von Neumann in a number of lectures, in particular, one at a meeting on 15 May 1946 of the Mathematical Computing Advisory Board, Office of Research and Inventions, Navy Department, which in 1947 became the Office of Naval Research.] [138]
[132] R. Gribonval, Approximations non-linéaires pour l'analyse des signaux sonores, Ph.D. thesis, Univ. of Paris-Dauphine, Paris, 1999. [71]
[133] A. Grossmann and J. Morlet, Decomposition of Hardy functions into square integrable wavelets of constant shape, SIAM J. Math. Anal., 15 (1984), pp. 723-736. [8, 27]
[134] A. Haar, Zur Theorie der orthogonalen Funktionensysteme, Math. Ann., 69 (1910), pp. 331-371. [18]
[135] W. Härdle, G. Kerkyacharian, D. Picard, and A. Tsybakov, eds., Wavelets, Approximation, and Statistical Applications, Lecture Notes in Statistics 129, Springer-Verlag, New York, 1998.
[136] G. H. Hardy, Weierstrass's non-differentiable function, Trans. Amer. Math. Soc., 17 (1916), pp. 301-325. [157]
[137] G. H. Hardy and J. E. Littlewood, Some problems in Diophantine approximation II, Acta Math., 37 (1914), pp. 194-238. [157]
[138] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers, 4th ed., Oxford Univ. Press, London, 1962. [161]
[139] E. Harrison, Darkness at Night: A Riddle of the Universe, Harvard Univ. Press, Cambridge, MA, 1987. [198]
[140] F. Hausdorff, Dimension und äusseres Mass, Math. Ann., 79 (1919), pp. 157-179. [148]
[141] W. Heisenberg, Zur statistischen Theorie der Turbulenz, Z. Phys., 124 (1948), pp. 628-657. [128]
[142] E. Hernández and G. Weiss, A First Course on Wavelets, CRC Press, Boca Raton, FL, 1996.
[143] M. Holschneider, Inverse Radon transforms through inverse wavelet transforms, Inverse Problems, 7 (1991), pp. 853-861. [218]
[144] M. Holschneider and P. Tchamitchian, Pointwise analysis of Riemann's "nondifferentiable" function, Invent. Math., 105 (1991), pp. 157-176. [20, 218]
[145] C. Houdré and R. Averkamp, Wavelet Thresholding for Non (necessarily) Gaussian Noise: Idealism, Georgia Institute of Technology, Atlanta, GA, preprint, 1999. [179]
[146] L. Huang and A. Bijaoui, Astronomical image data compression by morphological skeleton transformations, Experiment. Astronom., 1 (1991), pp. 311-327. [195]
[147] J. C. R. Hunt, N. K.-R. Kevlahan, J. C. Vassilicos, and M. Farge, Wavelets, fractals and Fourier transforms: Detection and analysis of structures, in Wavelets, Fractals, and Fourier Transforms, M. Farge, J. C. R. Hunt, and J. C. Vassilicos, eds., vol. 43 of Inst. Math. Appl. Conf. Ser. New Ser., The Clarendon Press, Oxford, U.K., 1993, pp. 1-38. [104]
[148] J.-M. Innocent and B.
Torrésani, Wavelets and binary coalescences detection, Appl. Comput. Harmon. Anal., 4 (1997), pp. 113-116. [103, 143]
[149] S. Itatsu, The differentiability of the Riemann function, Proc. Japan Acad. Ser. A Math. Sci., 57 (1981), pp. 492-495. [158, 164]
[150] S. Jaffard, Propriétés des matrices "bien localisées" près de leur diagonale et quelques applications, Ann. Inst. H. Poincaré Anal. Non Linéaire, 7 (1990), pp. 461-476. [24]
[151] ——, Pointwise smoothness, two-microlocalization and wavelet coefficients, Publ. Mat., 35 (1991), pp. 155-168. [153, 158]
[152] ——, Local behavior of Riemann's function, Contemp. Math., 189 (1995), pp. 278-307. [162]
[153] ——, The spectrum of singularities of Riemann's function, Rev. Mat. Iberoamericana, 12 (1996), pp. 441-460. [20, 132, 163, 165]
[154] ——, Multifractal formalism for functions, Part 1: Results valid for all functions, Part 2: Self-similar functions, SIAM J. Math. Anal., 28 (1997), pp. 944-998. [133, 134, 136, 149, 176]
[155] ——, Old friends revisited: The multifractal nature of some classical functions, J. Fourier Anal. Appl., 3 (1997), pp. 1-22. [132]
[156] ——, Oscillation spaces: Properties and applications to fractal and multifractal functions, J. Math. Phys., 39 (1998), pp. 4129-4141. [143, 145]
[157] ——, Beyond Besov Spaces, Univ. of Paris XII, Créteil, France, preprint, 1999. [136]
[158] ——, The multifractal nature of Lévy processes, Probab. Theory Related Fields, 114 (1999), pp. 207-227. [165]
[159] S. Jaffard and B. Mandelbrot, Peano-Polya motion, when time is intrinsic or binomial (uniform or multifractal), Math. Intelligencer, 19 (1997), pp. 21-26. [132]
[160] S. Jaffard and Y. Meyer, Wavelet methods for pointwise regularity and local oscillations of functions, Mem. Amer. Math. Soc. 123, No. 587, AMS, Providence, RI, 1996. [142, 152]
[161] B. J. T. Jones, V. J. Martinez, E. Saar, and J. Einasto, Multifractal description of the large-scale structure of the universe, Astrophys. J., 332 (1988), pp. 1-5. [200]
[162] L. Jones, On a conjecture of Huber concerning the convergence of projection pursuit regression, Ann. Statist., 15 (1987), pp. 880-882. [71]
[163] ——, A simple lemma on greedy approximation in Hilbert space and convergence results for projection pursuit regression and neural network training, Ann. Statist., 20 (1992), pp. 608-613. [71]
[164] J.-P. Kahane and P. G. Lemarié-Rieusset, Fourier Series and Wavelets, vol. 3 of Stud. Devel. Modern Math., Gordon and Breach, London, 1995. [16]
[165] J.-P. Kahane and J. Peyrière, Sur certaines martingales de Benoit Mandelbrot, Adv. Math., 22 (1976), pp. 131-145. [131]
[166] C. J. Kicey and C. J. Lennard, Unique reconstruction of band-limited signals by a Mallat-Zhong wavelet transform algorithm, J. Fourier Anal. Appl., 3 (1997), pp. 63-82. [125]
[167] A. N. Kolmogorov, The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers, Dokl. Akad. Nauk SSSR, 30 (1941). Reprinted in Proc. Roy. Soc. London Ser. A, 434 (1991), pp. 9-13. [128]
[168] ——, A refinement of previous hypotheses concerning the local structure of turbulence in viscous incompressible fluid at a high Reynolds number, J. Fluid Mech., 13 (1962), pp. 82-85. [128]
[169] A. Lannes, S. Roques, and M. J. Casanove, Resolution and robustness in image processing: A new regularization principle, J. Opt. Soc. Amer., 4 (1987), pp. 189-199. [189, 191]
[170] E. Lega, H. Scholl, J. M. Alimi, A. Bijaoui, and P. Bury, A parallel algorithm for structure detection based on wavelet and segmentation algorithm, Parallel Comput., 21 (1995), pp. 265-285. [199]
[171] P. G. Lemarié-Rieusset, Analyses multi-résolutions non orthogonales, commutation entre projecteurs et dérivation et ondelettes vecteurs à divergence nulle, Rev. Mat. Iberoamericana, 8 (1992), pp.
221-237. [65, 145]
[172] J. Leray, Étude de diverses équations intégrales non linéaires et de quelques problèmes que pose l'hydrodynamique, J. Math. Pures Appl., 9 (1933), pp. 1-82. [128]
[173] J.-S. Liénard, Speech analysis and reconstruction using short-time, elementary waveforms, in Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., IEEE Press, Piscataway, NJ, 1987, pp. 948-951. [68]
[174] J.-L. Lions, El Planeta Tierra: El papel de las matemáticas y de los superordenadores, Espasa Calpe, Madrid, 1990. [Lectures given at the Instituto de España.] [5]
[175] G. G. Lorentz, Approximation of Functions, 2nd ed., Chelsea Publishing Co., New York, 1986. [171]
[176] H. Lorenz, G. M. Richter, M. Capaccioli, and G. Longo, Adaptive filtering in astronomical image processing, Astronom. Astrophys., 277 (1993), pp. 321-330. [196]
[177] F. Low, Complete sets of wave packets, in A Passion for Physics—Essays in Honor of Geoffrey Chew, World Scientific, Singapore, 1985, pp. 17-22. [9]
[178] T. Lyche and K. Mørken, Knot removal for parametric B-spline curves and surfaces, Comput. Aided Geom. Design, 4 (1987), pp. 217-230. [175]
[179] S. G. Mallat, Multifrequency channel decompositions of images and wavelet models, IEEE Trans. Acoust. Speech Signal Process., 37 (1989), pp. 2091-2110.
[180] ——, A theory for multiresolution signal decomposition: The wavelet representation, IEEE Trans. Patt. Anal. Mach. Intell., 11 (1989), pp. 674-693.
[181] ——, A Wavelet Tour of Signal Processing, Academic Press, New York, 1998. [71, 135]
[182] S. G. Mallat and W.-L. Hwang, Singularity detection and processing with wavelets, IEEE Trans. Inform. Theory, 38 (1992), pp. 617-643. [135]
[183] S. G. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Process., 41 (1993), pp. 3397-3415. [71]
[184] S. G. Mallat and S. Zhong, Characterization of signals from multiscale edges, IEEE Trans. Patt. Anal. Mach. Intell., 14 (1992), pp. 710-732. [135]
[185] H. S.
Malvar, Lapped transforms for efficient transform/subband coding, IEEE Trans. Acoust. Speech Signal Process., 38 (1990), pp. 969-978. [90]
[186] ———, Fast algorithm for modulated lapped transform, Electron. Lett., 27 (1991), pp. 775-776. [90]
[187] ———, Signal Processing with Lapped Transforms, Artech House, Norwood, MA, 1991. [90]
[188] H. S. Malvar and D. H. Staelin, Reduction of blocking effects in image coding with a lapped orthogonal transform, in Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., IEEE Press, Piscataway, NJ, 1988, pp. 781-784. [90]
[189] ———, The LOT: Transform coding without blocking effects, IEEE Trans. Acoust. Speech Signal Process., 37 (1989), pp. 553-559. [90]
[190] B. Mandelbrot, Possible refinement of the lognormal hypothesis concerning the distribution of energy dissipation in intermittent turbulence, in Statistical Models and Turbulence, M. Rosenblatt and C. W. Van Atta, eds., Lecture Notes in Physics 12, Springer-Verlag, Berlin, 1972, pp. 333-351. [130]
[191] ———, Intermittent turbulence in self-similar cascades: Divergence of high moments and dimension of carrier, J. Fluid Mech., 62 (1974), pp. 331-358. [130]
[192] ———, The Fractal Geometry of Nature, Freeman, San Francisco, 1982. [200]
[193] ———, Les objets fractals, Flammarion, Paris, 1995. [200]
[194] S. Mann and S. Haykin, The chirplet transform—A generalization of Gabor's logon transform, in Vision Interface '91, Canadian Inform. Process. Society, Toronto, Canada, 1991. [100]
[195] ———, Adaptive chirplet transform: An adaptive generalization of the wavelet transform, Optical Engineering, 31 (1992), pp. 1243-1256. [100]
[196] ———, Time-frequency perspectives: The chirplet transform, in Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., IEEE Press, Piscataway, NJ, 1992. [100]
[197] M. W. Marcellin, private communication, October 1999. [Prof. Marcellin is a member of the JPEG-2000 committee.] [65]
[198] D. Marr, Vision: A Computational Investigation into the Human Representation and Processing of Visual Information, W. H. Freeman and Co., New York, 1982. [6, 12, 49, 117, 120, 121, 123]
[199] D.
Marr and E. Hildreth, Theory of edge detection, Proc. Roy. Soc. London Ser. B, 207 (1980), pp. 187-217. [120]
[200] F. G. Meyer, Image compression in libraries of bases, 1998. [Lecture notes for a course given at the Institut Henri Poincare, Paris.] [114]
[201] F. G. Meyer, A. Z. Averbuch, J.-O. Stromberg, and R. R. Coifman, Multi-layered image representation: Application to image compression, in Internat. Conf. Image Process., ICIP'98, Chicago, IL, IEEE Press, Piscataway, NJ, 1998. [115]
[202] F. G. Meyer and R. R. Coifman, Brushlets: A tool for directional image analysis and image compression, Appl. Comput. Harmon. Anal., 4 (1997), pp. 147-187. [115]
[203] Y. Meyer, Ondelettes et Operateurs I: Ondelettes, Hermann, Paris, 1990 (in French). Wavelets and Operators, Cambridge Univ. Press, Cambridge, U. K., 1992 (in English). [29, 167, 235]
[204] ———, Ondelettes et Operateurs II: Operateurs de Calderon-Zygmund, Hermann, Paris, 1990 (in French). Wavelets, Cambridge Univ. Press, Cambridge, U. K., 1997 (in English).
[205] ———, L'analyse par ondelettes d'un objet multifractal: La fonction Σ n^{-2} sin(n^2 x) de Riemann, Math. Colloquium of the Univ. of Rennes, Rennes, France, 1991.
[206] ———, Ondelettes et algorithmes concurrents, Hermann, Paris, 1992.
[207] ———, Wavelets, paraproducts, and Navier-Stokes equations, in Current Developments in Mathematics 1996, International Press, Cambridge, MA, 1997. [145, 147]
[208] ———, Wavelets, Vibrations and Scalings, CRM Monogr. Ser. 9, AMS, Providence, RI, 1998. [132, 152]
[209] Y. Meyer and R. R. Coifman, Ondelettes et Operateurs III: Operateurs multilineaires, Hermann, Paris, 1991 (in French). Wavelets, Cambridge Univ. Press, Cambridge, U. K., 1997 (in English).
[210] Y. Meyer and F. Paiva, Convergence de l'algorithme de Mallat, J. Anal. Math., 60 (1993), pp. 227-240. [44]
[211] Y. Meyer, F. Sellan, and M. Taqqu, Wavelets, generalized white noise and fractional integration: The synthesis of fractional Brownian motion, J. Fourier Anal. Appl., 5 (1999), pp. 465-494. [181]
[212] G. M. Molchan, Scaling exponents and multifractal dimensions for independent random cascades, Comm. Math. Phys., 179 (1996), pp. 681-702. [131]
[213] J.-M. Morel and S. Solimini, Variational Methods in Image Segmentation, Birkhauser, Boston, MA, 1995. [182]
[214] J. E. Moyal, Quantum mechanics as a statistical theory, Proc. Cambridge Philos. Soc., 45 (1949), pp. 99-124. [87]
[215] D. Mumford and A. Desolneux, Pattern Theory through Examples. Forthcoming. [5]
[216] D. Mumford and B. Gidas, Stochastic models for generic images, Quart. Appl. Math., to appear. [5]
[217] J.-F. Muzy, E. Bacry, and A. Arneodo, The multifractal formalism revisited with wavelets, Internat. J. Bifur. Chaos Appl. Sci. Engrg., 4 (1994), pp. 245-302.
[218] D. J. Newman, Rational approximation of |x|, Michigan Math. J., 11 (1964), pp. 11-14. [169]
[219] F. Nicolleau and C. Vassilicos, The Topology of Intermittency, tech. report, Department of Applied Mathematics and Theoretical Physics, Cambridge Univ., Cambridge, U. K., 1999. [137]
[220] A. M. Obukhov, On the distribution of energy in the spectrum of turbulent flow, Dokl. Akad. Nauk SSSR, 32 (1941), pp. 22-24. [128]
[221] L. Onsager, The distribution of energy in turbulence, Phys. Rev., 68 (1945), p. 286. [128]
[222] A. Papoulis, Signal Analysis, 4th ed., McGraw-Hill, New York, 1988. [25]
[223] G. Parisi and U. Frisch, On the singularity structure of fully developed turbulence, in Turbulence and Predictability in Geophysical Fluid Dynamics, Proc. Internat. School of Physics "E. Fermi," 1983, Varenna, Italy, M. Ghil, R. Benzi, and G. Parisi, eds., North-Holland, Amsterdam, 1985, pp. 84-87. [131]
[224] V.
Peller, A description of Hankel operators of class S_p for p > 0, an investigation of the rate of rational approximation, and other applications, Math. USSR Sbornik, 50 (1985), pp. 465-492. [The Russian version was published in 1983.] [170]
[225] P. Petrushev, Direct and converse theorems for spline and rational approximation and Besov spaces, in Function Spaces and Applications, M. Cwikel et al., eds., Lecture Notes in Math. 1302, Springer-Verlag, New York, 1988.
[226] P. Petrushev and V. Popov, Rational Approximation of Real Functions, Cambridge Univ. Press, Cambridge, U. K., 1988. [169]
[227] W. L. Press, Wavelet-based compression software for FITS images, in Astronomical Data Analysis Software and Systems I, APS Conference Series, vol. 25, Astronom. Soc. Pacific, San Francisco, 1992. [196]
[228] H. Queffelec, Derivabilite de certaines sommes de series de Fourier lacunaires, C. R. Acad. Sci. Paris Ser. A, 273 (1971), pp. 291-293.
[229] H. Reeves, Patience dans l'azur, Seuil, Paris, 1981. [197]
[230] G. M. Richter, Zur Auswertung astronomischer Aufnahmen mit dem automatischen Flachenphotometer, Astronom. Nachr., 299 (1978), pp. 283-303. [194]
[231] X. Rodet, Time-domain formant-wave-function synthesis, Comput. Music J., 8 (1985). [69]
[232] S. Roques, F. Bourzeix, and K. Bouyoucef, Soft-thresholding technique and restoration of the 3C273 jet, Astrophys. Space Sci., 239 (1986), pp. 297-304. [192, 193]
[233] S. Roux, Analyse en ondelettes de l'auto-similarite de signaux en turbulence pleinement developpee, Ph.D. thesis, Univ. of Aix-Marseille, Marseille, France, 1996. [136]
[234] J. Schauder, Zur Theorie stetiger Abbildungen in Funktionalraumen, Math. Z., 26 (1927), pp. 47-65. [18]
[235] E. Sere, Localisation frequentielle des paquets d'ondelettes, Rev. Mat. Iberoamericana, 11 (1995), pp. 334-354. [106]
[236] E. Slezak, A. Bijaoui, and G. Mars, Identification of structures from galaxy counts: Use of the wavelet transform, Astronom. Astrophys., 227 (1990), pp. 301-316. [199]
[237] E. Slezak, V. de Lapparent, and A. Bijaoui, Objective detection of voids and high-density structures in the first CfA redshift survey slice, Astrophys. J., 409 (1993), pp. 517-529. [198, 199, 200]
[238] M. J. T. Smith and T. P. Barnwell III, A procedure for designing exact reconstruction filter banks for tree structured subband coders, in Proc. IEEE Internat. Conf. Acoust. Speech Signal Process., IEEE Press, Piscataway, NJ, 1984. [207]
[239] ———, Exact reconstruction techniques for tree structured coders, IEEE Trans. Acoust. Speech Signal Process., 34 (1986), pp. 434-441. [207]
[240] J.-L. Starck, F. Murtagh, and A. Bijaoui, Image Processing and Data Analysis: The Multiscale Approach, Cambridge Univ. Press, Cambridge, U. K., 1998. [197]
[241] J.-L. Starck, F. Murtagh, B. Pirenne, and M. Albrecht, Astronomical image compression based on noise suppression, Pub. Astronom. Soc. Pacific, 108 (1996), pp. 446-455. [197]
[242] E. M. Stein, Singular Integrals and Differentiability Properties of Functions, Princeton Univ. Press, Princeton, NJ, 1970. [23]
[243] ———, Topics in Harmonic Analysis Related to the Littlewood-Paley Theory, Princeton Univ. Press, Princeton, NJ, 1970. [23]
[244] G. Strang and G. Fix, An Analysis of the Finite Element Method, Prentice-Hall, Englewood Cliffs, NJ, 1973. [51]
[245] J.-O. Stromberg, A modified Franklin system and higher-order spline systems on R^n as unconditional bases for Hardy spaces, in Conference on Harmonic Analysis in Honor of Antoni Zygmund, vol. II, W. Beckner et al., eds., Wadsworth, Belmont, CA, 1983, pp. 475-494. [15, 28]
[246] P. Tchamitchian, Biorthogonalite et theorie des operateurs, Rev. Mat. Iberoamericana, 3 (1987), pp. 163-189. [63]
[247] P. Tchamitchian and B. Torresani, Ridge and skeleton extraction from the wavelet transform, in Wavelets and Their Applications, M. B. Ruskai et al., eds., Jones and Bartlett, Boston, MA, 1992, pp. 123-151. [104]
[248] A. N.
Tikhonov, Regularization of incorrectly posed problems, Soviet Math. Dokl., 4 (1963), pp. 1624-1627. [189]
[249] B. Torresani, Analyse continue par ondelettes, InterEditions/CNRS Editions, Paris, 1995.
[250] R. Vautard and M. Ghil, Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series, Phys. D, 35 (1989), pp. 359-424. [5]
[251] J. P. Veran and J. R. Wright, Compression software for astronomical images, in Astronomical Data Analysis Software and Systems III, ASP Conference Series, vol. 61, Astronom. Soc. Pacific, San Francisco, 1994. [194]
[252] M. Vergassola, B. Dubrulle, U. Frisch, and A. Noullez, Burgers' equation, devil's staircases and the mass distribution for large-scale structures, Astronom. Astrophys., 289 (1994), pp. 325-356. [165]
[253] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, Prentice-Hall, Englewood Cliffs, NJ, 1995. [207]
[254] J. Ville, Theorie et applications de la notion de signal analytique, Cables et Transmissions, Laboratoire de Telecommunications de la Societe Alsacienne de Construction Mecanique, 2A (1948), pp. 61-74. [25, 67, 72, 87, 89, 90]
[255] C. F. von Weizsacker, Das Spektrum der Turbulenz bei grossen Reynoldsschen Zahlen, Z. Phys., 124 (1948), pp. 614-627. [128]
[256] K. Weierstrass, Uber continuierliche Functionen eines reellen Arguments, die fur keinen Werth des letzteren einen bestimmten Differentialquotienten besitzen, in Mathematische Werke II, Abhandlung 2, Georg Olms Verlagsbuchhandlung, Hildesheim; Johnson Reprint Corp., New York, 1967, pp. 71-74. [150]
[257] E. Wesfreid, Vocal command signal segmentation and phonemes classification, in Proc. Second Symposium on Artificial Intelligence, Havana, Cuba, A. Ochoa, M. Ortiz, and R. Santana, eds., Editorial Academia Cuba, Havana, 1999, pp. 45-50. [97]
[258] E. Wesfreid and V. Wickerhauser, Adapted local trigonometric transform and speech processing, IEEE Trans. Signal Process., 41 (1993), pp. 3596-3600. [97]
[259] R. L. White, High-Performance Compression of Astronomical Images, tech. report, Space Telescope Science Institute, Baltimore, MD, 1992. [196]
[260] E. P. Wigner, On the quantum correction for thermodynamic equilibrium, Phys. Rev., 40 (1932), pp. 749-759. [86]
[261] K. G. Wilson, Renormalization group and critical phenomena II: Phase-space cell analysis of critical behavior, Phys. Rev. B, 4 (1971), pp. 3184-3205. [32, 33, 90]
[262] P. Wojtaszczyk, The Franklin system is an unconditional basis in H_1, Ark. Mat., 20 (1982), pp. 293-300. [28]
[263] J. W. Woods and S. O'Neil, Subband coding of images, IEEE Trans. Acoust. Speech Signal Process., 34 (1986), pp. 1278-1288. [58]
[264] N. Zabusky, Computational synergetics, Phys. Today, July (1984), pp. 36-46. [138]
Author Index
Abry, Patrice, 140
Adelson, E. H., ix, 31, 36, 49, 50, 52, 55, 57
Arneodo, Alain, 16, 20, 125, 127, 130, 134-137, 139, 140, 142, 143, 145, 200
Aubry, J.-M., 143
Bacry, Emmanuel, 142
Balian, Roger, 9, 15, 67, 90
Baraniuk, Richard, xi, 80, 100
Barnwell, T. P., 207
Barthes, Roland, 1
Batchelor, G. K., 130
Battle, Guy, ix, xi, 32, 145
Benassi, Albert, 21
Benveniste, Albert, 132
Bernard, Claude, 118
Bertoin, J., 165
Bijaoui, Albert, xi, 187, 194-197, 199, 201
Bobichon, Yves, xi, 194, 196
Boulez, Pierre, 69
Brillouin, Leon, ix, 9
Brislawn, Christopher, 6
Burt, P. J., ix, 31, 50, 52, 55, 57
Calderon, Alberto, 13, 15, 27, 32
Candes, E., 184
Castaing, B., 131
Charlier, Charles, 198, 200
Ciesielski, Zbigniew, 21, 24
Cohen, Albert, xi, 45, 62, 63, 147, 177, 180, 183
Coifman, Ronald, 26, 86, 92
Cordoba, A., 87
Couder, Yves, 139
Croisier, A., 31, 35
Dahmen, W., 147
Daubechies, Ingrid, ix, 9, 11, 16, 31, 46, 63, 91, 180
De Bruijn, N. G., 87
DeVore, Ronald, 71, 147, 167, 169, 170, 172-175, 177, 182, 183
Donoho, David, 140, 167, 173, 177, 179, 180, 183, 184, 236
Du Bois-Reymond, Paul, 17
Duistermaat, J., 158, 159
Einstein, Albert, 128
Esteban, D., 31, 35, 58, 207
Faber, G., 18
Falconer, Kenneth, 147
Fang, X., 97
Farge, Marie, 90, 104, 140, 200
Fauve, S., 140
Feauveau, Jean-Christophe, 63
Federbush, Paul, ix, 32, 145
Fefferman, C., 87
Flandrin, Patrick, 85, 167
Fourier, Joseph, 13, 16, 31
Fournier d'Albe, Edward, 198, 200
Franklin, Philip, 23
Friedmann, Alexander, 198
Freud, Geza, 150
Frick, P., 145
Frisch, Uriel, 127, 130, 131, 133, 134, 136, 149
Froment, Jacques, 115
Gabor, Dennis, ix, 9, 15, 67, 90
Gagne, Y., 130
Galand, Claude, 31, 35, 36, 41, 58, 207
Gerver, Joseph L., xi, 158, 163, 164
Glimm, James, ix, 77
Goldstine, Herman H., 138
Gribonval, Remi, 71
Grossmann, Alex, ix, 8, 15, 27
Haar, Alfred, 9, 17, 31
Hardy, G. H., 2, 157, 158, 164
Harrison, Edward, 198
Hausdorff, F., 148
Haykin, Simon, 100
Heisenberg, Werner, 128
Herschel, William, 198
Hingorani, R., 36, 49
Holschneider, Matthias, 3, 15, 19, 215, 218
Hopfinger, E. J., 130
Hubble, Edwin, 198
Hunt, J. C. R., 104
Innocent, J. M., 103, 104
Itatsu, Seiichi, 158, 164
Jaffard, Stephane, ix, x, 16, 20, 21, 91, 142, 143, 176
Jaffe, Arthur, ix, 78
Jawerth, Bjorn, 167, 169, 170, 172-175
Johnstone, Iain, 167
Jones, Douglas, 100
Jones, L. K., 71
Journe, Jean-Lin, ix, 91
Julesz, Bela, 118
Kahane, Jean-Pierre, 16, 131
Kant, Immanuel, 198
Kerkyacharian, Gerard, 21, 168
Kevlahan, N. K.-R., 104
Kolmogorov, A. N., 128, 129
Krim, Hamid, xi
Kruskal, M. D., 138
Lambert, Johann, 198
Lang, Serge, 158
Lannes, Andre, 187, 189
Laroch, C., 140
Lebesgue, Henri, 17
Lemarie-Rieusset, Pierre Gilles, 16, 65, 145
Leray, J., 128
Levy, Paul, 20, 21, 179
Lienard, Jean-Sylvain, 68, 92, 112
Lions, Jacques-Louis, 5
Littlewood, J. E., 22, 157, 164
Low, Francis, 9, 15, 90
Lucier, B., 175, 182
Lusin, N., 25, 158
Lyche, T., 175
Magnen, Jacques, ix
Mallat, Stephane, 8, 23, 31, 41, 43, 57, 122, 135
Malvar, Henrique, ix, 9, 90, 91
Mandelbrot, Benoit, 10, 16, 21, 127, 130, 131, 200
Mann, Steve, 80, 100
Marcinkiewicz, J., 26
Marr, David, ix, 6, 11, 12, 23, 32, 49, 117-120, 122, 168, 181
Mars, G., 199
Meyer, Francois, 114, 115
Meyer, Yves, x, 15, 31, 57, 92, 94, 142, 176, 177, 219
Minsky, Marvin, 117
Morel, Jean-Michel, 182
Morken, K., 175
Morlet, Jean, ix, 8, 15, 27
Moyal, J. E., 86
Mumford, David, 5, 13, 182
Muzy, Jean-Francois, 142
Newman, D. J., 169
Nicolleau, F., 137
Obukhov, A. M., 128, 129
O'Neil, S., 58
Onsager, L., 128
Osher, S., 182, 183
Paley, R. E. A.
C., 22
Parisi, Giorgio, 127, 130, 131, 133, 134, 136, 149
Peetre, J., 169
Pekarskii, A., 169, 171
Peller, V., 169, 170
Penzias, Arno, 198
Petrushev, P., 167, 169, 172, 177, 183
Peyriere, Jacques, 131
Picard, Dominique, 168
Popov, V. A., 167, 169, 170, 172-174
Rayner, John, xi
Reeves, Hubert, 197
Richter, G. M., 194
Riemann, Bernhard, 3, 19
Rodet, X., 69
Roques, Sylvie, xi, 192, 193
Roux, Daniel, 21
Rudin, L. I., 182, 183
Ryan, Robert, x
Schauder, J., 18
Sellan, Fabrice, 21, 65, 181
Seneor, Roland, ix
Sere, Eric, 106
Shah, J., 182
Shannon, Claude, ix, 43
Simoncelli, E., 36, 49
Slezak, E., 198, 199
Smith, M. J. T., 207
Starck, Jean-Luc, 197
Stromberg, J.-O., 13, 15, 24, 31
Swedenborg, Emanuel, 198
Tajchman, Marc, xi
Tchamitchian, Philippe, 3, 19, 63, 104
Torresani, Bruno, xi, 103, 104
Townsend, A. A., 130
van Ness, J. W., 21
Vassilicos, J. C., 104, 137
Vial, P., 180
Ville, Jean, 25, 67, 68, 72, 73, 79, 87, 89, 90, 95, 112, 184
Vjacheslavov, N. S., 169
von Neumann, John, ix, 9, 15, 138
von Weizsacker, C. F., 128
Weierstrass, Karl, 150
Weiss, Guido, 26
Wesfreid, Eva, xi
Weyl, Hermann, 73
Wickerhauser, Victor, 86, 140
Wiener, Norbert, ix
Wigner, Eugene, ix, 73, 76, 86
Willsky, Alan S., 132
Wilson, Kenneth, ix, 9, 15, 32, 90, 91
Wilson, Robert, 198
Wohler, Friedrich, 2, 119
Wojtaszczyk, P., 28
Woods, J., 58
Xu, H., 177, 183
Zabusky, Norman, 138
Zimin, V., 145
Zygmund, Antoni, 22
Subject Index
admissibility condition, 209, 215
aliasing, 206
ambiguity function, 74
analytic signal, 11, 25, 79
  associated with an asymptotic signal, 81
approximation of irrationals by continued fractions, 161
astronomical data, 194
asymptotic signals, 81
atomic decomposition, 3, 7, 8, 26
atoms, 18, 26, 67
Balian-Low theorem, 9, 37, 91
bases
  chirplet, 101
  local Fourier, 13
  wavelet, 13
Bernoulli measures, 130, 131
Bernstein's theorem, 40
Besov spaces
  characterized by wavelet coefficients, 234
  homogeneous, 234
  nonhomogeneous, 234
best-basis algorithm, 72, 97, 101, 114
big bang, 198
biorthogonal wavelets, 63, 70
  divergence-free, 65
Bobichon-Bijaoui algorithm, See ht_compress
Brownian motion, 20
  fractional (fBm), 21, 65, 129, 181
  realization of, 20
  regularity of, 20, 21
Burgers's equation, 165
Burt and Adelson's algorithms, See pyramid algorithms
Calderon's identity, 15, 26, 27, 29
cartography, an illustration of scale, 49-50
cartoon image, 168
chirplets, 80, 100
chirps, 80
  first definition, 142
  hyperbolic, 86
  linear, 83
  second definition, 142
  three-dimensional, 143
  in turbulence, 141
coding textures, 114
coherent structures, 127, 137
conjugate quadrature filters, 207
Couder's experiment, 139
Daubechies's wavelets, 31, 79
  construction, 46
decimation operator, 37, 38, 52
devil's staircase, 130
DeVore-Lucier model, 183
discrete cosine transform, 91
discrete sine transform, 91
DNA, 2, 125, 136
dyadic blocks, 22, 30
entropy criterion, 89, 112
entropy of a vector, 96
estimator, 168, 178, 179
  optimal, 168, 178
  suboptimal, 178
extension operator, 51, 60
fast Fourier transform, 47
fast wavelet transform, 47
filter bank, general two-channel, 205
filter, definition of, 205
fingerprints, storage by FBI, 6, 70
FIR, See impulse response
Fix and Strang condition, 51
fluctuation, 40, 41, 55, 56
  continuous, 42
Fourier analysis, 2
Fourier-Bros-Iagolnitzer transform, 87
Fourier series, 2, 16, 19
Franklin system, 23, 24, 26
Freud's method, 151
functions of bounded variation (BV), 176, 178, 184
Gabor wavelets, 9, 33, 69, 105
  optimal localization of, 106
geometric images, 181
Gerver's theorem, 19
global warming, 5
gravitational waves, 86, 102, 103
Grossmann-Morlet wavelets, 7, 9, 11, 13, 29, 100
Holder condition, 19
Holder exponents, 19, 20
  algorithm for computing, 155
Holder spaces, 19
  C^s(R), 153
  homogeneous, 233
  nonhomogeneous, 233
Haar system, 9, 15, 18, 31, 46, 62, 79
Haar wavelets, See Haar system
Hardy spaces, 25
  real version, 28
Hausdorff dimension, 19, 147, 148
Hausdorff measure, 148
hcompress, 194, 195
Heisenberg boxes, 72, 78
  associated with level sets of the Wigner-Ville transform, 89, 90
Heisenberg uncertainty principle, 105
Hilbert basis, 18
ht_compress, 194, 196
Hubble Space Telescope, 187
IDEA, a deconvolution algorithm, 189-193
IIR, See impulse response
image, See signal, two-dimensional
image processing, See signal processing
  fundamental problem, 115
impulse response, 39, 205
  finite (FIR), 205
  infinite (IIR), 205
inertial zone, 129
instantaneous frequency, 67, 79, 85
  of an asymptotic signal, 81
  relation with instantaneous spectrum, 80
  via matching pursuit, 83
  Ville's definition, 80
instantaneous spectrum, 73, 79
intermittency, 129
inversion formulas for the wavelet transform, 153, 210
  generalized, 215
Jacobi's Theta function, 159
Jarnik's theorem, 162
JPEG-2000, x
|k|^{-5/3} law, 129
knot removal, 175
Legendre inversion formula, 133
Lemarie-Meyer wavelets, 78, 94
Levy processes, 165
linear chirps, See chirps
Littlewood-Paley analysis, 9, 22, 23, 26, 29, 151
Littlewood-Paley-Stein function, 23
Littlewood-Paley-Stein theory, 29
Lusin's wavelet, 27, 211, 213
Mallat's algorithm, 41-42, 47, 124
  continuous version, 42
Mallat's conjecture, 117, 121, 122, 125
  a case where it is true, 229
  a counterexample, 219
Mallat's matching pursuit algorithm, See pursuit algorithms
Mallat's theorem
  convergence to wavelet analysis, 44
Malvar-Wilson bases, 100, 101, 115
  optimal, 97
Malvar-Wilson wavelets, 9, 23, 35, 89, 90, 92, 94, 107, 114
mammogram analysis, 175
Marr's conjecture, 117, 120, 122, 123, 125
  a counterexample to, 121
Marr's wavelet, 120, 122
microlocal analysis, 87
models for image processing, 168
Moyal's identity, 75, 84
multifractal analysis, 137
multifractal formalism, 127, 130, 133, 165
  an extension of, 143, 144
  failure of, 136, 141
multifractal objects, 19
multilayered analysis, 104
multiplicative cascade, 130
multiresolution analysis, 8, 9, 42, 57, See also pyramid algorithms
  regularity of, 58
multiscale system theory, 132
Mumford-Shah model, 182, 183
Navier-Stokes equations, 128, 145
nonlinear approximation, 175
Nyquist condition, 51
optimal algorithms, the search for, 10, 11
orthogonal pyramids, See pyramid algorithms
orthonormal basis, 18
oscillation exponent, 142
Osher-Rudin model, 182, 183
Oslo algorithm, 175
paraproduct algorithms, 147
partial isometry, 43, 210
partition functions, 135
perfect reconstruction, 39, 206
point-spread function, 188
pseudodifferential calculus, See Wigner-Ville transform
pursuit algorithms, 71
  Mallat's, 71
pyramid algorithms, 8, 31, 36
  Burt and Adelson's, 50-53, 55, 56, 59
  coding scheme, 57
  examples, 54-55
  image compression, 55
  orthogonal pyramids, 58-60
  relation with multiresolution analyses, 58
Q-sparse, 174
quadrature mirror filters, 8, 31, 36, 38, 41, 59, 111, 207
  examples, 39-40, 43
quantization, See signal processing
quantization noise, 4, 36
rational approximation, 168-171
  versus spline approximation, 172
representation, Marr's ideas, 12
restriction operator, 50, 51, 60
ridgelets, 168, 184, 185
Riemann's function, 3, 13, 15, 132, 142, 150, 165, 211, 218
  belongs to C^{1/2}(R), 155
  spectrum of singularities of, 132, 163
Riesz basis, 58
Schauder basis, 18, 20, 23
  to represent Brownian motion, 20
segmentation, 10
Shannon's theorem, 30, 36, 112
Shannon's wavelets, 43
signal, 1-2
  frequency modulated, 100-102
  nonstationary, 7
  stationary, 7
  two-dimensional, 2
signal processing, 2
  analysis, 2
  coding
    entropy, 4
    linear prediction, 35
    transform, 3, 35
    by zero-crossings, 3
  compression, 3, 31
  diagnostics, 4
  quantization, 4, 62
  restoration, 5
  storage, 3
  transmission, 3
sparse wavelet expansion, 167, 170, 171, 173, 176
sparsity, See sparse wavelet expansion
spectrum of oscillating singularities, 143
spectrum of singularities, 131, 133, 165
spline approximation, 171
spline function, 51
  basic cubic, 51, 219
split-and-merge algorithm, 96
splitting algorithms, 112
statistical modeling, 5, 6, 70, 128
Stromberg's wavelets, 24, 28
structure functions, 129, 133, 134
  of fractional Brownian motion, 129
subband coding
  ideal filters, 36-37
  two channels, 38
subsampling, See decimation operator
Taylor hypothesis, 129, 130
textures, 181, 182
theta modular group, 159
thresholding, 173, 174, 179, 181
  soft, 181, 192
Tikhonov regularization, 189
time-frequency algorithms, 13
time-frequency analysis, 67, 86, 87, 89, 105
time-frequency atoms, 68, 72, 89, 105
  a collection Q, 69
  Gabor's, 68, 69
  Lienard's, 69
  precise definition, 79
time-frequency plane, 67, 72
time-frequency wavelets, 7, 9, 15
time-scale algorithms, 13
time-scale wavelets, 7, 15
transfer function, 205
transition operator, 60
trend, 40, 41, 55, 56
  continuous, 42
Tukey's window, 124, 229
unconditional basis, 28
vortex filaments, 137
Walsh system, 110, 111
wavelet analysis, 3
wavelet coefficients, 27
wavelet methods for PDEs, 147
wavelet packets, 35, 36, 38, 89, 90, 107, 114, 115
  basic, 108, 109
  general, 111
wavelet shrinkage, 168, 180, 181, 183
wavelet thresholding, See thresholding
wavelet transform modulus maxima algorithm, 135, 137
wavelets
  divergence-free, 145
wave packets of Cordoba-Fefferman, 87
weak l^p, 176
Weierstrass's function, 13, 132, 149
  belongs to C^{ln(1/a)/ln b}(R), 157
Weyl-Heisenberg group, 33
Weyl symbol, 76
Wigner-Ville transform, 72, 73
  of an asymptotic signal, 81-83
  cross terms, 85
  properties, 74, 75
  pseudodifferential calculus, 76
  quantum mechanics, 74
  relation with ambiguity function, 74
  relation with Weyl symbol, 77
WTMM algorithm, See wavelet transform modulus maxima algorithm
zero-crossings, 117, 120