# Recurrent Neural Networks for Prediction, Part 2

Shared by: Do Xon Xon | Date: | File type: PDF | Pages: 21


## Document Description

Fundamentals. Adaptive systems are at the very core of modern digital signal processing. There are many reasons for this, foremost amongst these is that adaptive filtering, prediction or identification do not require explicit a priori statistical knowledge of the input data. Adaptive systems are employed in numerous areas such as biomedicine, communications, control, radar, sonar and video processing (Haykin 1996a).

## Text Content: Recurrent Neural Networks for Prediction, Part 2

### 2.4 A General Class of Learning Algorithms

To introduce a general class of learning algorithms and explain, in very crude terms, the relationships between them, we follow the approach from Guo and Ljung (1995). Let us start from the linear regression equation

$$y(k) = x^T(k)w(k) + \nu(k), \qquad (2.5)$$

where $y(k)$ is the output signal, $x(k)$ is a vector comprising the input signals, $\nu(k)$ is a disturbance or noise sequence, and $w(k)$ is an unknown time-varying vector of weights (parameters) of the adaptive system. The variation of the weights at time $k$ is denoted by $n(k)$, and the weight change equation becomes

$$w(k) = w(k-1) + n(k). \qquad (2.6)$$

Adaptive algorithms can track the weights only approximately, hence for the following analysis we use the symbol $\hat{w}$. A general expression for the weight update in an adaptive algorithm is

$$\hat{w}(k+1) = \hat{w}(k) + \eta\,\Gamma(k)\bigl(y(k) - x^T(k)\hat{w}(k)\bigr), \qquad (2.7)$$

where $\Gamma(k)$ is the adaptation gain vector and $\eta$ is the step size. To assess how far an adaptive algorithm is from the optimal solution, we introduce the weight error vector $\tilde{w}(k)$ and a sample input matrix $\Sigma(k)$ as

$$\tilde{w}(k) = w(k) - \hat{w}(k), \qquad \Sigma(k) = \Gamma(k)x^T(k). \qquad (2.8)$$

Equations (2.5)–(2.8) yield the following weight error equation:

$$\tilde{w}(k+1) = (I - \eta\Sigma(k))\tilde{w}(k) - \eta\Gamma(k)\nu(k) + n(k+1). \qquad (2.9)$$

For different gains $\Gamma(k)$, the following three well-known algorithms can be obtained from (2.7).³

1. The least mean square (LMS) algorithm:
   $$\Gamma(k) = x(k). \qquad (2.10)$$

2. The recursive least-squares (RLS) algorithm:
   $$\Gamma(k) = P(k)x(k), \qquad (2.11)$$
   $$P(k) = \frac{1}{1-\eta}\left[P(k-1) - \eta\,\frac{P(k-1)x(k)x^T(k)P(k-1)}{1-\eta+\eta x^T(k)P(k-1)x(k)}\right]. \qquad (2.12)$$

3. The Kalman filter (KF) algorithm (Guo and Ljung 1995; Kay 1993):
   $$\Gamma(k) = \frac{P(k-1)x(k)}{R + \eta x^T(k)P(k-1)x(k)}, \qquad (2.13)$$
   $$P(k) = P(k-1) - \frac{\eta P(k-1)x(k)x^T(k)P(k-1)}{R + \eta x^T(k)P(k-1)x(k)} + \eta Q. \qquad (2.14)$$

³ Notice that the role of $\eta$ in the RLS and KF algorithms is different to that in the LMS algorithm. For RLS and KF we may put $\eta = 1$ and introduce a forgetting factor instead.
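As an illustration, the generic update (2.7) with the three gain choices (2.10)–(2.14) can be sketched in a few lines of NumPy. This is a sketch only: the function name, interface and the default values of `R` and `q` are assumptions made here, not taken from the text.

```python
import numpy as np

def sequential_estimate(X, y, gain="lms", eta=0.01, R=1.0, q=0.0):
    """Track the weight vector with the generic update (2.7),
    w_hat(k+1) = w_hat(k) + eta * Gamma(k) * (y(k) - x(k)^T w_hat(k)),
    with the gain Gamma(k) chosen per (2.10)-(2.14)."""
    n = X.shape[1]
    w = np.zeros(n)
    P = np.eye(n)                          # inverse-correlation / covariance estimate
    for x, d in zip(X, y):
        if gain == "lms":                  # (2.10)
            Gamma = x
        elif gain == "rls":                # (2.11)-(2.12): update P, then Gamma = P(k) x(k)
            Px = P @ x
            denom = 1 - eta + eta * (x @ Px)
            P = (P - eta * np.outer(Px, Px) / denom) / (1 - eta)
            Gamma = P @ x
        elif gain == "kf":                 # (2.13)-(2.14): Gamma uses P(k-1)
            Px = P @ x
            denom = R + eta * (x @ Px)
            Gamma = Px / denom
            P = P - eta * np.outer(Px, Px) / denom + eta * q * np.eye(n)
        w = w + eta * Gamma * (d - x @ w)  # generic update (2.7)
    return w
```

Note how the RLS gain uses the already-updated $P(k)$, while the KF gain is built from $P(k-1)$, mirroring (2.11) and (2.13).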
The KF algorithm is the optimal algorithm in this setting if the elements of $n(k)$ and $\nu(k)$ in (2.5) and (2.6) are Gaussian noises with a covariance matrix $Q > 0$ and a scalar value $R > 0$, respectively (Kay 1993). All of these adaptive algorithms can be referred to as sequential estimators, since they refine their estimate as each new sample arrives. On the other hand, block-based estimators require all the measurements to be acquired before the estimate is formed.

Although the most important measure of quality of an adaptive algorithm is generally the covariance matrix of the weight tracking error, $E[\tilde{w}(k)\tilde{w}^T(k)]$, due to the statistical dependence between $x(k)$, $\nu(k)$ and $n(k)$, precise expressions for this covariance matrix are extremely difficult to obtain. To undertake statistical analysis of an adaptive learning algorithm, the classical approach is to assume that $x(k)$, $\nu(k)$ and $n(k)$ are statistically independent. Another assumption is that the homogeneous part of (2.9),

$$\tilde{w}(k+1) = (I - \eta\Sigma(k))\tilde{w}(k), \qquad (2.15)$$

and its averaged version,

$$E[\tilde{w}(k+1)] = (I - \eta E[\Sigma(k)])E[\tilde{w}(k)], \qquad (2.16)$$

are exponentially stable in stochastic and deterministic senses (Guo and Ljung 1995).

#### 2.4.1 Quasi-Newton Learning Algorithm

The quasi-Newton learning algorithm utilises the second-order derivative of the objective function⁴ to adapt the weights. If the change in the objective function between iterations in a learning algorithm is modelled with a Taylor series expansion, we have

$$\Delta E(w) = E(w + \Delta w) - E(w) \approx (\nabla_w E(w))^T \Delta w + \tfrac{1}{2}\Delta w^T H \Delta w. \qquad (2.17)$$

After setting the differential with respect to $\Delta w$ to zero, the weight update equation becomes

$$\Delta w = -H^{-1}\nabla_w E(w). \qquad (2.18)$$

The Hessian $H$ in this equation determines not only the direction but also the step size of the gradient descent.

To conclude: adaptive algorithms mainly differ in their form of adaptation gains. The gains can be roughly divided into two classes: gradient-based gains (e.g.
LMS, quasi-Newton) and Riccati equation-based gains (e.g. KF and RLS).

### 2.5 A Step-by-Step Derivation of the Least Mean Square (LMS) Algorithm

Consider a set of input–output pairs of data described by a mapping function $f$:

$$d(k) = f(x(k)), \qquad k = 1, 2, \ldots, N. \qquad (2.19)$$

⁴ The term objective function will be discussed in more detail later in this chapter.
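The two preceding ideas can be combined in a short sketch: for a squared-error objective over input–output pairs as in (2.19) with a linear model, the objective is quadratic, its Hessian is constant, and the quasi-Newton step (2.18) reaches the minimiser in a single iteration. The data and variable names below are assumptions made for illustration.

```python
import numpy as np

# Quasi-Newton step (2.18), dw = -H^{-1} grad E(w), applied to the
# squared-error objective E(w) = sum_k (d(k) - x(k)^T w)^2 over the
# pairs of (2.19). For this quadratic E, the gradient is 2 X^T (Xw - d)
# and the Hessian is the constant matrix H = 2 X^T X.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))      # rows are the input vectors x(k)^T
w_true = np.array([2.0, 0.0, -1.0])    # assumed "unknown" mapping f
d = X @ w_true                         # desired outputs d(k) = f(x(k))

w = np.zeros(3)                        # initial weights
grad = 2 * X.T @ (X @ w - d)           # gradient of E at w
H = 2 * X.T @ X                        # Hessian of E (constant here)
w = w - np.linalg.solve(H, grad)       # quasi-Newton step (2.18)
```

Solving the linear system avoids forming $H^{-1}$ explicitly; for non-quadratic objectives the same step would be iterated, with $H$ re-estimated as it changes.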
*Figure 2.5: Structure of a finite impulse response (FIR) filter: a tapped delay line of inputs $x(k), x(k-1), \ldots, x(k-N+1)$ produced by unit delays $z^{-1}$, weighted by $w_1(k), \ldots, w_N(k)$ and summed to form the output $y(k)$.*

The function $f(\,\cdot\,)$ is assumed to be unknown. Using the concept of adaptive systems explained above, the aim is to approximate the unknown function $f(\,\cdot\,)$ by a function $F(\,\cdot\,, w)$ with adjustable parameters $w$, in some prescribed sense. The function $F$ is defined on a system with a known architecture or structure. It is convenient to define an instantaneous performance index,

$$J(w(k)) = [d(k) - F(x(k), w(k))]^2, \qquad (2.20)$$

which represents an energy measure. In that case, the function $F$ is most often just the inner product $F = x^T(k)w(k)$ and corresponds to the operation of a linear FIR filter structure.

As before, the goal is to find an optimisation algorithm that minimises the cost function $J(w)$. The common choice of the algorithm is motivated by the method of steepest descent, and generates a sequence of weight vectors $w(1), w(2), \ldots$, as

$$w(k+1) = w(k) - \eta g(k), \qquad k = 0, 1, 2, \ldots, \qquad (2.21)$$

where $g(k)$ is the gradient vector of the cost function $J(w)$ at the point $w(k)$,

$$g(k) = \left.\frac{\partial J(w)}{\partial w}\right|_{w=w(k)}. \qquad (2.22)$$

The parameter $\eta$ in (2.21) determines the behaviour of the algorithm:

- for $\eta$ small, algorithm (2.21) converges towards the global minimum of the error performance surface;
- if the value of $\eta$ approaches some critical value $\eta_c$, the trajectory of convergence on the error performance surface is either oscillatory or overdamped;
- if the value of $\eta$ is greater than $\eta_c$, the system is unstable and does not converge.

These observations can only be visualised in two dimensions, i.e. for only two parameter values $w_1(k)$ and $w_2(k)$, and can be found in Widrow and Stearns (1985). If the approximation function $F$ in the gradient descent algorithm (2.21) is linear, we call such an adaptive system a linear adaptive system. Otherwise, we describe it as a nonlinear adaptive system.
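The effect of the step size on iteration (2.21) can be demonstrated on a small quadratic error surface. The cost, its parameters and the starting point below are an assumed example, not from the text; for $J(w) = w^T R w$ with $R = \mathrm{diag}(1, 4)$, the Hessian is $2R$ with largest eigenvalue 8, so the critical step size is $\eta_c = 2/8 = 0.25$.

```python
import numpy as np

def steepest_descent(grad, w0, eta, n_steps):
    """Iterate the steepest-descent update w(k+1) = w(k) - eta * g(k), (2.21)."""
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        w = w - eta * grad(w)
    return w

R = np.diag([1.0, 4.0])                 # assumed quadratic cost J(w) = w^T R w
grad = lambda w: 2 * R @ w              # its gradient g(w)

w_conv = steepest_descent(grad, [1.0, 1.0], eta=0.05, n_steps=200)  # below eta_c: converges to the minimum at the origin
w_div = steepest_descent(grad, [1.0, 1.0], eta=0.3, n_steps=200)    # above eta_c: diverges
```

Along each eigendirection the error is multiplied by $(1 - 2\eta\lambda_i)$ per step, which is why the iteration is stable exactly when $\eta < \eta_c$.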
Neural networks belong to this latter class.

#### 2.5.1 The Wiener Filter

Suppose the system shown in Figure 2.1 is modelled as a linear FIR filter (shown in Figure 2.5), so that $F(x, w) = x^T w$, dropping the index $k$ for convenience. Consequently, the instantaneous cost function $J(w(k))$ is a quadratic function of the
weight vector. The Wiener filter is based upon minimising the ensemble average of this instantaneous cost function, i.e.

$$J_{\text{Wiener}}(w(k)) = E[[d(k) - x^T(k)w(k)]^2], \qquad (2.23)$$

assuming $d(k)$ and $x(k)$ are zero mean and jointly wide-sense stationary. To find the minimum of the cost function, we differentiate with respect to $w$ and obtain

$$\frac{\partial J_{\text{Wiener}}}{\partial w} = -2E[e(k)x(k)], \qquad (2.24)$$

where $e(k) = d(k) - x^T(k)w(k)$. At the Wiener solution, this gradient equals the null vector $\mathbf{0}$. Solving (2.24) for this condition yields the Wiener solution,

$$w = R_{x,x}^{-1} r_{x,d}, \qquad (2.25)$$

where $R_{x,x} = E[x(k)x^T(k)]$ is the autocorrelation matrix of the zero mean input data $x(k)$ and $r_{x,d} = E[x(k)d(k)]$ is the crosscorrelation between the input vector and the desired signal $d(k)$. The Wiener formula has the same general form as the block least-squares (LS) solution, when the exact statistics are replaced by temporal averages. The RLS algorithm, as in (2.12), with the assumption that the input and desired response signals are jointly ergodic, approximates the Wiener solution and asymptotically matches it. More details about the derivation of the Wiener filter can be found in Haykin (1996a, 1999a).

#### 2.5.2 Further Perspective on the Least Mean Square (LMS) Algorithm

To reduce the computational complexity of the Wiener solution, which is a block solution, we can use the method of steepest descent for a recursive, or sequential, computation of the weight vector $w$. Let us derive the LMS algorithm for an adaptive FIR filter, the structure of which is shown in Figure 2.5. In view of a general adaptive system, this FIR filter becomes the filter structure within Figure 2.1. The output of this filter is

$$y(k) = x^T(k)w(k). \qquad (2.26)$$

Widrow and Hoff (1960) utilised this structure for adaptive processing and proposed instantaneous values of the autocorrelation and crosscorrelation matrices to calculate the gradient term within the steepest descent algorithm.
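A short sketch of estimating the Wiener solution (2.25) from data, with the ensemble averages replaced by temporal averages as described above. The filter length, noise level and "true" weights below are assumptions chosen for the example.

```python
import numpy as np

# Wiener solution w = R_xx^{-1} r_xd, (2.25), with temporal averages
# standing in for the ensemble statistics (jointly ergodic signals).
rng = np.random.default_rng(42)
w_true = np.array([0.5, -1.0, 0.25])                # assumed unknown system
x = rng.standard_normal(10_000)                     # zero-mean WSS input

# Tap-input vectors x(k) = [x(k), x(k-1), x(k-2)]^T; drop wrap-around rows
X = np.column_stack([np.roll(x, i) for i in range(3)])[3:]
d = X @ w_true + 0.1 * rng.standard_normal(len(X))  # desired signal plus noise

R_xx = X.T @ X / len(X)          # temporal-average autocorrelation matrix
r_xd = X.T @ d / len(X)          # temporal-average crosscorrelation vector
w_wiener = np.linalg.solve(R_xx, r_xd)
```

This is exactly the block LS flavour of the Wiener formula mentioned in the text: all samples are collected first, then one linear system is solved.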
The cost function they proposed was

$$J(k) = \tfrac{1}{2}e^2(k), \qquad (2.27)$$

which is again based upon the instantaneous output error $e(k) = d(k) - y(k)$. In order to derive the weight update equation we start from the instantaneous gradient

$$\frac{\partial J(k)}{\partial w(k)} = e(k)\frac{\partial e(k)}{\partial w(k)}. \qquad (2.28)$$
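Since $e(k) = d(k) - x^T(k)w(k)$, the gradient term in (2.28) is $\partial e(k)/\partial w(k) = -x(k)$, and substituting into the steepest-descent step (2.21) gives the familiar LMS update $w(k+1) = w(k) + \eta e(k)x(k)$. A minimal sketch of the resulting adaptive FIR filter follows; the system-identification example and parameter values are assumptions for illustration.

```python
import numpy as np

def lms_filter(x, d, n_taps, eta):
    """Adaptive FIR filter trained with the LMS rule
    w(k+1) = w(k) + eta * e(k) * x(k), obtained from (2.21), (2.27), (2.28)."""
    w = np.zeros(n_taps)
    y = np.zeros(len(x))
    e = np.zeros(len(x))
    for k in range(n_taps - 1, len(x)):
        xk = x[k - n_taps + 1:k + 1][::-1]   # tap-input vector [x(k), ..., x(k-N+1)]
        y[k] = xk @ w                        # filter output, (2.26)
        e[k] = d[k] - y[k]                   # instantaneous output error
        w = w + eta * e[k] * xk              # LMS weight update
    return w, y, e

# Identify an assumed unknown 3-tap FIR system from its input and output
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
w_true = np.array([0.5, -1.0, 0.25])
d = np.convolve(x, w_true)[:len(x)]          # desired signal d(k)
w, y, e = lms_filter(x, d, n_taps=3, eta=0.05)
```

The per-sample cost is $O(N)$ in the filter length, which is the computational saving over the block Wiener solution that motivated this sequential form.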