Handbook of Neural Network Signal Processing P1
Handbook of NEURAL NETWORK SIGNAL PROCESSING
THE ELECTRICAL ENGINEERING AND APPLIED SIGNAL PROCESSING SERIES
Edited by Alexander Poularikas

The Advanced Signal Processing Handbook: Theory and Implementation for Radar, Sonar, and Medical Imaging Real-Time Systems (Stergios Stergiopoulos)
The Transform and Data Compression Handbook (K.R. Rao and P.C. Yip)
Handbook of Multisensor Data Fusion (David Hall and James Llinas)
Handbook of Neural Network Signal Processing (Yu Hen Hu and Jenq-Neng Hwang)
Handbook of Antennas in Wireless Communications (Lal Chand Godara)

Forthcoming Titles

Propagation Data Handbook for Wireless Communications (Robert Crane)
The Digital Color Imaging Handbook (Gaurav Sharma)
Applications in Time Frequency Signal Processing (Antonia Papandreou-Suppappola)
Noise Reduction in Speech Applications (Gillian Davis)
Signal Processing in Noise (Vyacheslav Tuzlukov)
Electromagnetic Radiation and the Human Body: Effects, Diagnosis, and Therapeutic Technologies (Nikolaos Uzunoglu and Konstantina S. Nikita)
Digital Signal Processing with Examples in MATLAB® (Samuel Stearns)
Smart Antennas (Lal Chand Godara)
Pattern Recognition in Speech and Language Processing (Wu Chou and Biing-Hwang Juang)
Handbook of NEURAL NETWORK SIGNAL PROCESSING
Edited by YU HEN HU and JENQ-NENG HWANG

CRC PRESS
Boca Raton  London  New York  Washington, D.C.
Library of Congress Cataloging-in-Publication Data

Handbook of neural network signal processing / editors, Yu Hen Hu, Jenq-Neng Hwang.
p. cm. — (Electrical engineering and applied signal processing (series))
Includes bibliographical references and index.
ISBN 0-8493-2359-2
1. Neural networks (Computer science)—Handbooks, manuals, etc. 2. Signal processing—Handbooks, manuals, etc. I. Hu, Yu Hen. II. Hwang, Jenq-Neng. III. Electrical engineering and signal processing series.
QA76.87 H345 2001
006.3′2—dc21 2001035674

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

All rights reserved. Authorization to photocopy items for internal or personal use, or the personal or internal use of specific clients, may be granted by CRC Press LLC, provided that $1.50 per page photocopied is paid directly to Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA. The fee code for users of the Transactional Reporting Service is ISBN 0-8493-2359-2/01/$0.00+$1.50. The fee is subject to change without notice. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying. Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© 2002 by CRC Press LLC
No claim to original U.S. Government works
International Standard Book Number 0-8493-2359-2
Library of Congress Card Number 2001035674
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper
Preface

The field of artificial neural networks has made tremendous progress in the past 20 years in terms of theory, algorithms, and applications. Notably, the majority of real world neural network applications have involved the solution of difficult statistical signal processing problems. Compared to conventional signal processing algorithms that are mainly based on linear models, artificial neural networks offer an attractive alternative by providing nonlinear parametric models with universal approximation power, as well as adaptive training algorithms. The availability of such powerful modeling tools motivated numerous research efforts to explore new signal processing applications of artificial neural networks. During the course of the research, many neural network paradigms were proposed. Some of them are merely reincarnations of existing algorithms formulated in a neural network-like setting, while others provide new perspectives toward solving nonlinear adaptive signal processing problems. More importantly, there are a number of emergent neural network paradigms that have found successful real world applications.

The purpose of this handbook is to survey recent progress in artificial neural network theory and algorithms (paradigms), with a special emphasis on signal processing applications. We invited a panel of internationally well known researchers who have worked on both theory and applications of neural networks for signal processing to write each chapter. There are a total of 12 chapters plus one introductory chapter in this handbook. The chapters are categorized into three groups. The first group contains in-depth surveys of recent progress in neural network computing paradigms. It contains five chapters, including the introduction, that deal with multilayer perceptrons, radial basis functions, kernel-based learning, and committee machines. The second part of this handbook surveys the neural network implementations of important signal processing problems. This part contains four chapters, dealing with a dynamic neural network for optimal signal processing, blind signal separation and blind deconvolution, a neural network for principal component analysis, and applications of neural networks to time series predictions. The third part of this handbook examines signal processing applications and systems that use neural network methods. This part contains chapters dealing with applications of artificial neural networks (ANNs) to speech processing, learning and adaptive characterization of visual content in image retrieval systems, applications of neural networks to biomedical image processing, and a hierarchical fuzzy neural network for pattern classification.

The theory and design of artificial neural networks have advanced significantly during the past 20 years. Much of that progress has a direct bearing on signal processing. In particular, the nonlinear nature of neural networks, the ability of neural networks to learn from their environments in supervised and/or unsupervised ways, as well as the universal approximation property of neural networks make them highly suited for solving difficult signal processing problems. From a signal processing perspective, it is imperative to develop a proper understanding of basic neural network structures and how they impact signal processing algorithms and applications.
A challenge in surveying the field of neural network paradigms is to distinguish those neural network structures that have been successfully applied to solve real world problems from those that are still under development or have difficulty scaling up to solve realistic problems. When dealing with signal processing applications, it is critical to understand the nature of the problem formulation so that the most appropriate neural network paradigm can be applied. In addition, it is also important to assess the impact of neural networks on the performance, robustness, and cost-effectiveness of signal processing systems and develop methodologies for integrating neural networks with other signal processing algorithms.
We would like to express our sincere thanks to all the authors who contributed to this handbook: Michael T. Manry, Hema Chandrasekaran, and Cheng-Hsiung Hsieh (Chapter 2); Andrew D. Back (Chapter 3); Klaus-Robert Müller, Sebastian Mika, Gunnar Rätsch, Koji Tsuda, and Bernhard Schölkopf (Chapter 4); Volker Tresp (Chapter 5); Jose C. Principe (Chapter 6); Scott C. Douglas (Chapter 7); Konstantinos I. Diamantaras (Chapter 8); Yuansong Liao, John Moody, and Lizhong Wu (Chapter 9); Shigeru Katagiri (Chapter 10); Paisarn Muneesawang, Hau-San Wong, Jose Lay, and Ling Guan (Chapter 11); Tülay Adali, Yue Wang, and Huai Li (Chapter 12); and Jinshiuh Taur, Sun-Yuan Kung, and Shang-Hung Lin (Chapter 13).

Many reviewers have carefully read the manuscript and provided many constructive suggestions. We are most grateful for their efforts. They are Andrew D. Back, David G. Brown, Laiwan Chan, Konstantinos I. Diamantaras, Adriana Dumitras, Mark Girolami, Ling Guan, Kuldip Paliwal, Amanda Sharkey, and Jinshiuh Taur.

We would like to thank the editor-in-chief of this series of handbooks, Dr. Alexander D. Poularikas, for his encouragement. Our most sincere appreciation goes to Nora Konopka at CRC Press for her infinite patience and understanding throughout this project.
Editors

Yu Hen Hu received a B.S.E.E. degree from National Taiwan University, Taipei, Taiwan, in 1976. He received M.S.E.E. and Ph.D. degrees in electrical engineering from the University of Southern California in Los Angeles, in 1980 and 1982, respectively. From 1983 to 1987, he was an assistant professor in the electrical engineering department of Southern Methodist University in Dallas, Texas. He joined the department of electrical and computer engineering at the University of Wisconsin in Madison as an assistant professor in 1987, and he is currently an associate professor. His research interests include multimedia signal processing, artificial neural networks, fast algorithms and design methodology for application specific micro-architectures, as well as computer aided design tools for VLSI using artificial intelligence. He has published more than 170 technical papers in these areas. His recent research interests have focused on image and video processing and human computer interface.

Dr. Hu is a former associate editor for the IEEE Transactions on Acoustics, Speech, and Signal Processing in the areas of system identification and fast algorithms. He is currently an associate editor of the Journal of VLSI Signal Processing. He is a founding member of the Neural Network Signal Processing Technical Committee of the IEEE Signal Processing Society and served as committee chair from 1993 to 1996. He is a former member of the VLSI Signal Processing Technical Committee of the Signal Processing Society. Recently, he served as the secretary of the IEEE Signal Processing Society (1996–1998). Dr. Hu is a fellow of the IEEE.

Jenq-Neng Hwang holds B.S. and M.S. degrees in electrical engineering from the National Taiwan University, Taipei, Taiwan. After completing two years of obligatory military service after college, he enrolled as a research assistant at the Signal and Image Processing Institute of the department of electrical engineering at the University of Southern California, where he received his Ph.D. degree in December 1988. He was also a visiting student at Princeton University from 1987 to 1989.

In the summer of 1989, Dr. Hwang joined the Department of Electrical Engineering of the University of Washington in Seattle, where he is currently a professor. He has published more than 150 journal and conference papers and book chapters in the areas of image/video signal processing, computational neural networks, and multimedia system integration and networking. He received the 1995 IEEE Signal Processing Society's Annual Best Paper Award (with Shyh-Rong Lay and Alan Lippman) in the area of neural networks for signal processing.

Dr. Hwang is a fellow of the IEEE. He served as the secretary of the Neural Systems and Applications Committee of the IEEE Circuits and Systems Society from 1989 to 1991, and he was a member of the Design and Implementation of Signal Processing Systems Technical Committee of the IEEE Signal Processing Society. He is also a founding member of the Multimedia Signal Processing Technical Committee of the IEEE Signal Processing Society. He served as the chairman of the Neural Networks Signal Processing Technical Committee of the IEEE Signal Processing Society from 1996 to 1998, and he is currently the Society's representative to the IEEE Neural Network Council.
He served as an associate editor for IEEE Transactions on Signal Processing from 1992 to 1994 and currently is an associate editor for IEEE Transactions on Neural Networks and IEEE Transactions on Circuits and Systems for Video Technology. He is also on the editorial board of the Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology. Dr. Hwang was the conference program chair of the 1994 IEEE Workshop on Neural Networks for Signal Processing held in Ermioni, Greece in September 1994. He was the general co-chair of the International Symposium on
Artificial Neural Networks held in Hsinchu, Taiwan in December 1995. He also chaired the tutorial committee for the IEEE International Conference on Neural Networks held in Washington, D.C. in June 1996. He was the program co-chair of the International Conference on Acoustics, Speech, and Signal Processing in Seattle, Washington in 1998.
Contributors

Tülay Adali, University of Maryland, Baltimore, Maryland
Andrew D. Back, Windale Technologies, Brisbane, Australia
Hema Chandrasekaran, U.S. Wireless Corporation, San Ramon, California
Konstantinos I. Diamantaras, Technological Education Institute of Thessaloniki, Sindos, Greece
Scott C. Douglas, Southern Methodist University, Dallas, Texas
Ling Guan, University of Sydney, Sydney, Australia
Cheng-Hsiung Hsieh, Chien Kou Institute of Technology, Changwa, Taiwan, China
Yu Hen Hu, University of Wisconsin, Madison, Wisconsin
Jenq-Neng Hwang, University of Washington, Seattle, Washington
Shigeru Katagiri, Intelligent Communication Science Laboratories, Kyoto, Japan
Sun-Yuan Kung, Princeton University, Princeton, New Jersey
Jose Lay, University of Sydney, Sydney, Australia
Huai Li, University of Maryland, Baltimore, Maryland
Yuansong Liao, Oregon Graduate Institute of Science and Technology, Beaverton, Oregon
Shang-Hung Lin, EPSON Palo Alto Laboratories ERD, Palo Alto, California
Michael T. Manry, University of Texas, Arlington, Texas
Sebastian Mika, GMD FIRST, Berlin, Germany
John Moody, Oregon Graduate Institute of Science and Technology, Beaverton, Oregon
Klaus-Robert Müller, GMD FIRST and University of Potsdam, Berlin, Germany
Paisarn Muneesawang, University of Sydney, Sydney, Australia
Jose C. Principe, University of Florida, Gainesville, Florida
Gunnar Rätsch, GMD FIRST and University of Potsdam, Berlin, Germany
Bernhard Schölkopf, Max-Planck-Institut für Biologische Kybernetik, Tübingen, Germany
Jinshiuh Taur, National Chung-Hsing University, Taichung, Taiwan, China
Volker Tresp, Siemens AG Corporate Technology, Munich, Germany
Koji Tsuda, AIST Computational Biology Research Center, Tokyo, Japan
Yue Wang, The Catholic University of America, Washington, DC
Hau-San Wong, University of Sydney, Sydney, Australia
Lizhong Wu, HNC Software, Inc., San Diego, California
Contents

1. Introduction to Neural Networks for Signal Processing (Yu Hen Hu and Jenq-Neng Hwang)
2. Signal Processing Using the Multilayer Perceptron (Michael T. Manry, Hema Chandrasekaran, and Cheng-Hsiung Hsieh)
3. Radial Basis Functions (Andrew D. Back)
4. An Introduction to Kernel-Based Learning Algorithms (Klaus-Robert Müller, Sebastian Mika, Gunnar Rätsch, Koji Tsuda, and Bernhard Schölkopf)
5. Committee Machines (Volker Tresp)
6. Dynamic Neural Networks and Optimal Signal Processing (Jose C. Principe)
7. Blind Signal Separation and Blind Deconvolution (Scott C. Douglas)
8. Neural Networks and Principal Component Analysis (Konstantinos I. Diamantaras)
9. Applications of Artificial Neural Networks to Time Series Prediction (Yuansong Liao, John Moody, and Lizhong Wu)
10. Applications of Artificial Neural Networks (ANNs) to Speech Processing (Shigeru Katagiri)
11. Learning and Adaptive Characterization of Visual Contents in Image Retrieval Systems (Paisarn Muneesawang, Hau-San Wong, Jose Lay, and Ling Guan)
12. Applications of Neural Networks to Image Processing (Tülay Adali, Yue Wang, and Huai Li)
13. Hierarchical Fuzzy Neural Networks for Pattern Classification (Jinshiuh Taur, Sun-Yuan Kung, and Shang-Hung Lin)
1
Introduction to Neural Networks for Signal Processing

Yu Hen Hu, University of Wisconsin
Jenq-Neng Hwang, University of Washington

1.1 Introduction
1.2 Artificial Neural Network (ANN) Models — An Overview
    Basic Neural Network Components • Multilayer Perceptron (MLP) Model • Radial Basis Networks • Competitive Learning Networks • Committee Machines • Support Vector Machines (SVMs)
1.3 Neural Network Solutions to Signal Processing Problems
    Digital Signal Processing
1.4 Overview of the Handbook
References

1.1 Introduction

The theory and design of artificial neural networks have advanced significantly during the past 20 years. Much of that progress has a direct bearing on signal processing. In particular, the nonlinear nature of neural networks, the ability of neural networks to learn from their environments in supervised as well as unsupervised ways, as well as the universal approximation property of neural networks make them highly suited for solving difficult signal processing problems.

From a signal processing perspective, it is imperative to develop a proper understanding of basic neural network structures and how they impact signal processing algorithms and applications. A challenge in surveying the field of neural network paradigms is to distinguish those neural network structures that have been successfully applied to solve real world problems from those that are still under development or have difficulty scaling up to solve realistic problems. When dealing with signal processing applications, it is critical to understand the nature of the problem formulation so that the most appropriate neural network paradigm can be applied. In addition, it is also important to assess the impact of neural networks on the performance, robustness, and cost-effectiveness of signal processing systems and develop methodologies for integrating neural networks with other signal processing algorithms. Another important issue is how to evaluate neural network paradigms, learning algorithms, and neural network structures and identify those that do and do not work reliably for solving signal processing problems.

This chapter provides an overview of the topic of this handbook — neural networks for signal processing. The chapter first discusses the definition of a neural network for signal processing and why it is important. It then surveys several modern neural network models that have found successful signal processing applications. Examples are cited relating to how to apply these nonlinear
computation paradigms to solve signal processing problems. Finally, this chapter highlights the remaining contents of this book.

1.2 Artificial Neural Network (ANN) Models — An Overview

1.2.1 Basic Neural Network Components

A neural network is a general mathematical computing paradigm that models the operations of biological neural systems. In 1943, McCulloch, a neurobiologist, and Pitts, a statistician, published a seminal paper titled "A logical calculus of the ideas immanent in nervous activity" in the Bulletin of Mathematical Biophysics [1]. This paper inspired the development of the modern digital computer, or the electronic brain, as John von Neumann called it. At approximately the same time, Frank Rosenblatt was also motivated by this paper to investigate the computation of the eye, which eventually led to the first generation of neural networks, known as the perceptron [2].

This section provides a brief overview of ANN models. Many of these topics will be treated in greater detail in later chapters. The purpose of this chapter, therefore, is to highlight the basic concepts of these neural network models to prepare the readers for later chapters.

1.2.1.1 McCulloch and Pitts' Neuron Model

Among the numerous neural network models that have been proposed over the years, all share a common building block known as a neuron and a networked interconnection structure. The most widely used neuron model is based on McCulloch and Pitts' work and is illustrated in Figure 1.1.

Figure 1.1  McCulloch and Pitts' neuron model.

In Figure 1.1, each neuron consists of two parts: the net function and the activation function. The net function determines how the network inputs {y_j; 1 ≤ j ≤ N} are combined inside the neuron. In this figure, a weighted linear combination is adopted:

    u = Σ_{j=1}^{N} w_j y_j + θ        (1.1)

{w_j; 1 ≤ j ≤ N} are parameters known as synaptic weights. The quantity θ is called the bias (or threshold) and is used to model the neuron's firing threshold. In the literature, other types of network input combination methods have been proposed as well. They are summarized in Table 1.1.
TABLE 1.1  Summary of Net Functions

Linear:        u = Σ_{j=1}^{N} w_j y_j + θ
               Most commonly used.
Higher order:  u = Σ_{j=1}^{N} Σ_{k=1}^{N} w_{jk} y_j y_k + θ   (second-order formula shown)
               u is a weighted linear combination of higher order polynomial terms of the input variables. The number of input terms equals N^d, where d is the order of the polynomial.
Delta (Σ−Π):   u = Π_{j=1}^{N} w_j y_j
               Seldom used.

The output of the neuron, denoted by a_i in this figure, is related to the network input u_i via a linear or nonlinear transformation called the activation function:

    a = f(u) .        (1.2)

In various neural network models, different activation functions have been proposed. The most commonly used activation functions are summarized in Table 1.2.

TABLE 1.2  Neuron Activation Functions

Sigmoid:               f(u) = 1 / (1 + e^{−u/T})
                       Derivative: df/du = f(u)[1 − f(u)]/T
                       Commonly used; the derivative can be computed from f(u) directly.
Hyperbolic tangent:    f(u) = tanh(u/T)
                       Derivative: df/du = [1 − f(u)^2]/T
                       T is the temperature parameter.
Inverse tangent:       f(u) = (2/π) tan^{−1}(u/T)
                       Derivative: df/du = (2/(πT)) · 1/[1 + (u/T)^2]
                       Less frequently used.
Threshold:             f(u) = 1 if u > 0; −1 if u < 0
                       The derivative does not exist at u = 0.
Gaussian radial basis: f(u) = exp(−‖u − m‖^2 / σ^2)
                       Derivative: df/du = −2(u − m) · f(u)/σ^2
                       Used for the radial basis neural network; m and σ^2 are parameters to be specified.
Linear:                f(u) = au + b
                       Derivative: df/du = a

Table 1.2 lists both the activation functions and their derivatives (provided they exist). In both the sigmoid and hyperbolic tangent activation functions, the derivative can be computed directly from the knowledge of f(u).

1.2.1.2 Neural Network Topology

In a neural network, multiple neurons are interconnected to form a network to facilitate distributed computing. The configuration of the interconnections can be described efficiently with a directed graph. A directed graph consists of nodes (in the case of a neural network, neurons as well as external inputs) and directed arcs (in the case of a neural network, synaptic links). The topology of the graph can be categorized as either acyclic or cyclic. Referring to Figure 1.2a, a neural network with acyclic topology contains no feedback loops. Such an acyclic neural network is often used to approximate a nonlinear mapping between its inputs and outputs. As shown in Figure 1.2b, a neural network with cyclic topology contains at least one cycle formed by directed arcs. Such a neural network is also known as a recurrent network. Due to the feedback loop, a recurrent network leads to a nonlinear dynamic system model that contains internal memory. Recurrent neural networks often exhibit complex behaviors and remain an active research topic in the field of artificial neural networks.
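As a concrete illustration of this building block, the short MATLAB sketch below evaluates the linear net function of Equation (1.1) and passes it through a few of the activation functions of Table 1.2. It is not taken from the handbook's demonstration m-files; the input values, weights, bias, and variable names are illustrative only.

```matlab
% Minimal sketch of a McCulloch-Pitts neuron (Figure 1.1): a weighted linear
% net function, Equation (1.1), followed by activation functions from
% Table 1.2.  Inputs, weights, and the bias value below are arbitrary.
y     = [0.5; -1.2; 0.3];          % neuron inputs y_j, j = 1..N
w     = [0.8;  0.1; -0.4];         % synaptic weights w_j
theta = 0.2;                       % bias (threshold) term
T     = 1.0;                       % temperature parameter

u = w' * y + theta;                % net function u, Equation (1.1)

a_sigmoid = 1 / (1 + exp(-u/T));   % sigmoid activation
a_tanh    = tanh(u/T);             % hyperbolic tangent activation
a_linear  = 2*u + 0.1;             % linear activation f(u) = au + b with a = 2, b = 0.1

fprintf('u = %.4f  sigmoid = %.4f  tanh = %.4f  linear = %.4f\n', ...
        u, a_sigmoid, a_tanh, a_linear);
```

Note that the sigmoid's derivative, f(u)[1 − f(u)]/T, can be formed from a_sigmoid alone; this is the property exploited later in this chapter by the back-propagation rule.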
Figure 1.2  Illustration of (a) an acyclic graph and (b) a cyclic graph. The cycle in (b) is emphasized with thick lines.

1.2.2 Multilayer Perceptron (MLP) Model

The multilayer perceptron [3] is by far the most well known and most popular neural network among all the existing neural network paradigms. To introduce the MLP, let us first discuss the perceptron model.

1.2.2.1 Perceptron Model

An MLP is a variant of the original perceptron model proposed by Rosenblatt in the 1950s [2]. In the perceptron model, a single neuron with a linear weighted net function and a threshold activation function is employed. The input to this neuron x = (x_1, x_2, ..., x_n) is a feature vector in an n-dimensional feature space. The net function u(x) is the weighted sum of the inputs:

    u(x) = w_0 + Σ_{i=1}^{n} w_i x_i        (1.3)

and the output y(x) is obtained from u(x) via a threshold activation function:

    y(x) = 1 if u(x) ≥ 0;  y(x) = 0 if u(x) < 0 .        (1.4)

Figure 1.3  A perceptron neural network model.

The perceptron neuron model can be used for detection and classification. For example, the weight vector w = (w_1, w_2, ..., w_n) may represent the template of a certain target. If the input feature vector x closely matches w such that their inner product exceeds a threshold −w_0, then the output will become +1, indicating the detection of a target.

The weight vector w needs to be determined in order to apply the perceptron model. Often, a set of training samples {(x(i), d(i)); i ∈ I_r} and testing samples {(x(i), d(i)); i ∈ I_t} are given. Here, d(i) (∈ {0, 1}) is the desired output value of y(x(i)) if the weight vector w is chosen correctly, and I_r and I_t are disjoint index sets. A sequential online perceptron learning algorithm can be applied to iteratively estimate the correct value of w by presenting the training samples to the perceptron
neuron in a random, sequential order. The learning algorithm has the following formulation:

    w(k + 1) = w(k) + η (d(k) − y(k)) x(k)        (1.5)

where y(k) is computed using Equations (1.3) and (1.4). In Equation (1.5), the learning rate η (0 < η < 1/|x(k)|_max) is a parameter chosen by the user, where |x(k)|_max is the maximum magnitude of the training samples {x(k)}. The index k is used to indicate that the training samples are applied sequentially to the perceptron in a random order. Each time a training sample is applied, the corresponding output of the perceptron y(k) is compared with the desired output d(k). If they are the same, meaning the weight vector w is correct for this training sample, the weights will remain unchanged. On the other hand, if y(k) ≠ d(k), then w will be updated with a small step along the direction of the input vector x(k). It has been proven that if the training samples are linearly separable, the perceptron learning algorithm will converge to a feasible solution of the weight vector within a finite number of iterations. On the other hand, if the training samples are not linearly separable, the algorithm will not converge with a fixed, nonzero value of η.

MATLAB Demonstration  Using the MATLAB m-files perceptron.m, datasepf.m, and sline.m, we conducted a simulation of a perceptron neuron model to distinguish two separable sets of data samples in a two-dimensional unit square. Sample results are shown in Figure 1.4. A minimal sketch in the same spirit appears after the list below.

Figure 1.4  Perceptron simulation results. The figure on the left-hand side depicts the data samples and the initial position of the separating hyperplane, whose normal vector contains the weights of the perceptron. The right-hand side illustrates that the learning is successful, as the final hyperplane separates the two classes of data samples.

1.2.2.1.1 Applications of the Perceptron Neuron Model

There are several major difficulties in applying the perceptron neuron model to solve real world pattern classification and signal detection problems:

1. The nonlinear transformation that extracts the appropriate feature vector x is not specified.
2. The perceptron learning algorithm will not converge for a fixed value of the learning rate η if the training feature patterns are not linearly separable.
3. Even when the feature patterns are linearly separable, it is not known how long it will take for the algorithm to converge to a weight vector that corresponds to a hyperplane separating the feature patterns.
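The handbook's perceptron.m, datasepf.m, and sline.m are not reproduced in this excerpt. The MATLAB sketch below is a minimal stand-in that applies the learning rule of Equation (1.5) to randomly generated, linearly separable samples in the unit square; the data, labels, learning rate, and variable names are illustrative assumptions, not the handbook's code.

```matlab
% Minimal sketch of the perceptron learning rule, Equation (1.5), on randomly
% generated, linearly separable 2-D samples.  This is a stand-in for (not a
% copy of) the handbook's perceptron.m demonstration.
K  = 100;
X  = rand(K, 2);                               % samples in the two-dimensional unit square
d  = double(X(:,1) + X(:,2) > 1);              % desired outputs d(k) in {0, 1}
Xa = [ones(K,1) X];                            % augmented inputs x(k) = [1 x1 x2]'

w   = zeros(3, 1);                             % weight vector w = [w0 w1 w2]'
eta = 0.1;                                     % learning rate, 0 < eta < 1/|x(k)|_max

for epoch = 1:100
    nerr = 0;
    for k = randperm(K)                        % present samples in a random order
        y = double(Xa(k,:) * w >= 0);          % Equations (1.3) and (1.4)
        if y ~= d(k)
            w = w + eta * (d(k) - y) * Xa(k,:)';   % Equation (1.5)
            nerr = nerr + 1;
        end
    end
    if nerr == 0, break; end                   % all training samples classified correctly
end
fprintf('stopped after %d epochs, w = [%.3f %.3f %.3f]\n', epoch, w);
```

Because the samples generated this way are linearly separable, the loop terminates with a zero-error epoch; with nonseparable data it would simply stop at the epoch limit, consistent with the convergence discussion above.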
1.2.2.2 Multilayer Perceptron

A multilayer perceptron (MLP) neural network model consists of a feed-forward, layered network of McCulloch and Pitts' neurons. Each neuron in an MLP has a nonlinear activation function that is often continuously differentiable. Some of the most frequently used activation functions for the MLP include the sigmoid function and the hyperbolic tangent function.

A typical MLP configuration is depicted in Figure 1.5. Each circle represents an individual neuron. These neurons are organized in layers, labeled as hidden layer #1, hidden layer #2, and the output layer in this figure. While the inputs at the bottom are also labeled as the input layer, there is usually no neuron model implemented in that layer. The name hidden layer refers to the fact that the output of these neurons will be fed into upper layer neurons and, therefore, is hidden from the user, who only observes the output of neurons at the output layer. Figure 1.5 illustrates a popular configuration of the MLP where interconnections are provided only between neurons of successive layers in the network. In practice, any acyclic interconnections between neurons are allowed.

Figure 1.5  A three-layer multilayer perceptron configuration.

An MLP provides a nonlinear mapping between its input and output. For example, consider the MLP structure of Figure 1.6, where the input samples are two-dimensional grid points and the output is the z-axis value. Three hidden nodes are used, and the sigmoid function has a parameter T = 0.5. The mapping is plotted on the right side of Figure 1.6; its nonlinear nature is quite clear from the figure. The MATLAB m-files used in this demonstration are mlpdemo1.m and mlp2.m, and a forward-pass sketch in the same spirit is given below. It has been proven that with a sufficient number of hidden neurons, an MLP with as few as two hidden layers of neurons is capable of approximating an arbitrarily complex mapping within a finite support [4].

1.2.2.3 Error Back-Propagation Training of MLP

A key step in applying an MLP model is to choose the weight matrices. Assuming a layered MLP structure, the weights feeding into each layer of neurons form the weight matrix of that layer (the input layer does not have a weight matrix, as it contains no neurons). The values of these weights are found using the error back-propagation training method.
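The back-propagation procedure derived next assumes the layered forward computation just described, in which each layer applies its own weight matrix followed by a nonlinear activation. The MATLAB sketch below makes that forward pass concrete for a network like the one in Figure 1.6. It is written in the spirit of mlpdemo1.m and mlp2.m but is not the handbook's code; the network sizes, weight values, and variable names are arbitrary choices for illustration.

```matlab
% Minimal forward-pass sketch of a small MLP: two inputs, three sigmoid hidden
% neurons (T = 0.5, as in the Figure 1.6 demonstration), and one linear output
% neuron.  The weight and bias values are arbitrary, chosen only to produce a
% visibly nonlinear surface; this is not the handbook's mlp2.m.
T  = 0.5;                                    % sigmoid temperature parameter
W1 = [ 1.2 -0.7  0.3;                        % hidden-layer weights: column j feeds hidden neuron j
      -0.5  0.9  1.1];
b1 = [0.1 -0.2  0.05];                       % hidden-layer biases
W2 = [0.8; -1.1; 0.6];                       % output-layer weights
b2 = 0.2;                                    % output-layer bias

[x1, x2] = meshgrid(linspace(0, 1, 21));     % two-dimensional grid of input samples
X = [x1(:) x2(:)];                           % one input vector per row

H = 1 ./ (1 + exp(-(X * W1 + repmat(b1, size(X,1), 1)) / T));   % hidden-layer outputs
z = H * W2 + b2;                             % network output (the z-axis value)

surf(x1, x2, reshape(z, size(x1)));          % plot the nonlinear input-output mapping
xlabel('x_1'); ylabel('x_2'); zlabel('z');
```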
Figure 1.6  Demonstration of the nonlinear mapping property of the MLP.

1.2.2.3.1 Finding the Weights of a Single Neuron MLP

For convenience, let us first consider a simple example consisting of a single neuron to illustrate this procedure. For clarity of explanation, Figure 1.7 represents the neuron in two separate parts: a summation unit to compute the net function u, and a nonlinear activation function z = f(u).

Figure 1.7  MLP example for back-propagation training: the single neuron case.

The output z is to be compared with a desired target value d, and their difference, the error e = d − z, will be computed. There are two inputs [x_1 x_2] with corresponding weights w_1 and w_2. The input labeled with a constant 1 represents the bias term θ shown in Figures 1.1 and 1.5 above. Here, the bias term is labeled w_0. The net function is computed as:

    u = Σ_{i=0}^{2} w_i x_i = Wx        (1.6)

where x_0 = 1, W = [w_0 w_1 w_2] is the weight matrix, and x = [1 x_1 x_2]^T is the input vector. Given a set of training samples {(x(k), d(k)); 1 ≤ k ≤ K}, the error back-propagation training begins by feeding all K inputs through the MLP network and computing the corresponding outputs {z(k); 1 ≤ k ≤ K}. Here we use an initial guess for the weight matrix W. Then a sum of square
error will be computed as:

    E = Σ_{k=1}^{K} [e(k)]^2 = Σ_{k=1}^{K} [d(k) − z(k)]^2 = Σ_{k=1}^{K} [d(k) − f(Wx(k))]^2 .        (1.7)

The objective is to adjust the weight matrix W to minimize the error E. This leads to a nonlinear least squares optimization problem. There are numerous nonlinear optimization algorithms available to solve this problem. Basically, these algorithms adopt a similar iterative formulation:

    W(t + 1) = W(t) + ΔW(t)        (1.8)

where ΔW(t) is the correction made to the current weights W(t). Different algorithms differ in the form of ΔW(t). Some of the important algorithms are listed in Table 1.3.

TABLE 1.3  Iterative Nonlinear Optimization Algorithms to Solve for MLP Weights

Steepest descent gradient method:  ΔW(t) = −η g(t) = −η dE/dW
    g is known as the gradient vector; η is the step size or learning rate. This is also known as error back-propagation learning.
Newton's method:  ΔW(t) = −H^{−1} g(t) = −[d^2E/dW^2]^{−1} (dE/dW)
    H is known as the Hessian matrix. There are several different ways to estimate it.
Conjugate-gradient method:  ΔW(t) = η p(t), where p(t + 1) = −g(t + 1) + β p(t)

This section focuses on the steepest descent gradient method, which is also the basis of the error back-propagation learning algorithm. The derivative of the scalar quantity E with respect to the individual weights can be computed as follows:

    ∂E/∂w_i = Σ_{k=1}^{K} ∂[e(k)]^2/∂w_i = Σ_{k=1}^{K} 2[d(k) − z(k)] (−∂z(k)/∂w_i)   for i = 0, 1, 2        (1.9)

where

    ∂z(k)/∂w_i = (∂f(u)/∂u)(∂u/∂w_i) = f′(u) ∂/∂w_i ( Σ_{j=0}^{2} w_j x_j ) = f′(u) x_i .        (1.10)

Hence,

    ∂E/∂w_i = −2 Σ_{k=1}^{K} [d(k) − z(k)] f′(u(k)) x_i(k) .        (1.11)

With δ(k) = [d(k) − z(k)] f′(u(k)), the above equation can be expressed as:

    ∂E/∂w_i = −2 Σ_{k=1}^{K} δ(k) x_i(k) .        (1.12)

δ(k) is the error signal e(k) = d(k) − z(k) modulated by the derivative of the activation function f′(u(k)) and hence represents the amount of correction that needs to be applied to the weight w_i for the
given input x_i(k). The overall change Δw_i is thus the sum of such contributions over all K training samples. Therefore, the weight update formula has the format:

    w_i(t + 1) = w_i(t) + η Σ_{k=1}^{K} δ(k) x_i(k) .        (1.13)

If a sigmoid activation function as defined in Table 1.2 is used, then δ(k) can be computed as:

    δ(k) = [d(k) − z(k)] · z(k) · [1 − z(k)] .        (1.14)

Note that the derivative f′(u) can be evaluated exactly without any approximation. Each time the weights are updated is called an epoch. In this example, K training samples are applied to update the weights once; thus, we say the epoch size is K. In practice, the epoch size may vary between one and the total number of samples.

1.2.2.3.2 Error Back-Propagation in a Multiple Layer Perceptron

So far, this chapter has discussed how to adjust the weights (training) of an MLP with a single layer of neurons. This section discusses how to perform training for a multiple layer MLP. First, some new notations are adopted to distinguish neurons at different layers. In Figure 1.8, the net function and output corresponding to the kth training sample of the jth neuron of the (L − 1)th layer are denoted by u_j^{L−1}(k) and z_j^{L−1}(k), respectively. The input layer is the zeroth layer; in particular, z_j^0(k) = x_j(k). The output is fed into the ith neuron of the Lth layer via a synaptic weight denoted by w_{ij}^L(t) or, for simplicity, w_{ij}^L, since we are concerned with the weight update formulation within a single training epoch.

Figure 1.8  Notations used in a multiple-layer MLP neural network model.

To derive the weight adaptation equation, ∂E/∂w_{ij}^L must be computed:

    ∂E/∂w_{ij}^L = −2 Σ_{k=1}^{K} δ_i^L(k) · ∂u_i^L(k)/∂w_{ij}^L
                 = −2 Σ_{k=1}^{K} δ_i^L(k) · ∂/∂w_{ij}^L ( Σ_m w_{im}^L z_m^{L−1}(k) )
                 = −2 Σ_{k=1}^{K} δ_i^L(k) · z_j^{L−1}(k) .        (1.15)

In Equation (1.15), the output z_j^{L−1}(k) can be evaluated by applying the kth training sample x(k) to the MLP with the weights fixed at w_{ij}^L. However, the delta error term δ_i^L(k) is not readily available and has to be computed.

Recall that the delta error is defined as δ_i^L(k) = ∂E/∂u_i^L(k). Figure 1.9 is now used to illustrate how to iteratively compute δ_i^L(k) from δ_m^{L+1}(k) and the weights of the (L + 1)th layer.
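The excerpt breaks off before the recursion around Figure 1.9 is completed, so the following MATLAB sketch should be read as an illustration rather than as the handbook's own demonstration code. It trains a small network in the notation of Figure 1.8 using the batch updates of Equations (1.13) and (1.15), with the output-layer delta of Equation (1.14); the step that propagates deltas back to the hidden layer is the standard back-propagation recursion and is an assumption here, since that derivation is not shown in the excerpt. The data, network sizes, learning rate, and variable names are illustrative.

```matlab
% Illustrative sketch (not from the handbook's m-files): batch steepest-descent
% training of an MLP with one hidden layer of sigmoid neurons and one sigmoid
% output neuron, T = 1.  Output delta follows Equation (1.14); the hidden-layer
% deltas use the standard back-propagation recursion (assumed, not quoted).
K  = 200;
X  = [ones(K,1) rand(K,2)];                  % rows are [1 x1(k) x2(k)], bias input first
d  = double(sum(X(:,2:3), 2) > 1);           % illustrative desired outputs d(k)

W1 = 0.5 * randn(4, 3);                      % hidden-layer weights w_ij^1 (4 hidden neurons)
W2 = 0.5 * randn(1, 5);                      % output-layer weights w_ij^2 (4 hidden outputs + bias)
eta = 0.5;                                   % learning rate

for epoch = 1:2000                           % epoch size K: all samples per weight update
    Z1  = 1 ./ (1 + exp(-(X * W1')));        % hidden outputs z_j^1(k), K-by-4
    Z1b = [ones(K,1) Z1];                    % append the bias input seen by the output layer
    z2  = 1 ./ (1 + exp(-(Z1b * W2')));      % network outputs z(k), K-by-1

    delta2 = (d - z2) .* z2 .* (1 - z2);             % output delta, Equation (1.14)
    delta1 = (Z1 .* (1 - Z1)) .* (delta2 * W2(2:5)); % hidden deltas (standard recursion)

    W2 = W2 + eta * (delta2' * Z1b) / K;     % batch update ~ sum_k delta_i(k) z_j(k),
    W1 = W1 + eta * (delta1' * X)   / K;     % cf. Equations (1.13) and (1.15), averaged over K
end
E = sum((d - z2).^2);                        % sum-of-squares error, Equation (1.7)
fprintf('final sum-of-squares error E = %.4f\n', E);
```

With a single output neuron and no hidden layer, the same loop reduces to the single-neuron update of Section 1.2.2.3.1.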