ICA Algorithms: 13 Practical Considerations

Practical Considerations

In the preceding chapters, we presented several approaches for the estimation of

the independent component analysis (ICA) model. In particular, several algorithms

were proposed for the estimation of the basic version of the model, which has a

square mixing matrix and no noise. Now we are, in principle, ready to apply those

algorithms on real data sets. Many such applications will be discussed in Part IV.

However, when applying the ICA algorithms to real data, some practical considerations

arise and need to be taken into account. In this chapter, we discuss

different problems that may arise, in particular, overlearning and noise in the data.

We also propose some preprocessing techniques (dimension reduction by principal

component analysis, time filtering) that may be useful and even necessary before the

application of the ICA algorithms in practice.

13.1 PREPROCESSING BY TIME FILTERING

The success of ICA for a given data set may depend crucially on performing some

application-dependent preprocessing steps. In the basic methods discussed in the

previous chapters, we always used centering in preprocessing, and often whitening

was done as well. Here we discuss further preprocessing methods that are not

necessary in theory, but are often very useful in practice.


Independent Component Analysis. Aapo Hyvärinen, Juha Karhunen, Erkki Oja

ISBNs: 0-471-40540-X (Hardback); 0-471-22131-7 (Electronic)


13.1.1 Why time filtering is possible

In many cases, the observed random variables are, in fact, time signals or time series, which means that they describe the time course of some phenomenon or system. Thus the sample index $t$ in $x_i(t)$ is a time index. In such a case, it may be very useful to filter the signals. In other words, this means taking moving averages of the time series. Of course, in the ICA model no time structure is assumed, so filtering is not always possible: If the sample points $\mathbf{x}(t)$ cannot be ordered in any meaningful way with respect to $t$, filtering is not meaningful, either.

For time series, any linear filtering of the signals is allowed, since it does not change the ICA model. In fact, if we linearly filter the observed signals $x_i(t)$ to obtain new signals, say $x_i^*(t)$, the ICA model still holds for $x_i^*(t)$, with the same mixing matrix. This can be seen as follows. Denote by $\mathbf{X}$ the matrix that contains the observations $\mathbf{x}(1), \ldots, \mathbf{x}(T)$ as its columns, and similarly for $\mathbf{S}$. Then the ICA model can be expressed as:

$$\mathbf{X} = \mathbf{A}\mathbf{S} \tag{13.1}$$

Now, time filtering of $\mathbf{X}$ corresponds to multiplying $\mathbf{X}$ from the right by a matrix, let us call it $\mathbf{M}$. This gives

$$\mathbf{X}^* = \mathbf{X}\mathbf{M} = \mathbf{A}\mathbf{S}\mathbf{M} = \mathbf{A}\mathbf{S}^* \tag{13.2}$$

which shows that the ICA model still remains valid. The independent components are filtered by the same filtering that was applied on the mixtures. They are not mixed with each other in $\mathbf{S}^*$ because the matrix $\mathbf{M}$ is by definition a component-wise filtering matrix.
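The identity in (13.2) can be checked numerically on a toy example. The following is only a minimal sketch: the mixing matrix `A`, the source signals `S`, and the helper names `matmul` and `differencing_matrix` are hypothetical choices made for illustration, and differencing is used as one possible filter.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def differencing_matrix(T):
    """T x T matrix M such that column t of X M equals x(t) - x(t-1)."""
    M = [[0.0] * T for _ in range(T)]
    for t in range(T):
        M[t][t] = 1.0
        if t > 0:
            M[t - 1][t] = -1.0
    return M


# Hypothetical 2 x 2 mixing matrix and two short source signals (rows of S).
A = [[1.0, 0.5],
     [0.3, 2.0]]
S = [[0.0, 1.0, 4.0, 9.0, 16.0, 25.0],
     [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]]
T = len(S[0])
M = differencing_matrix(T)

X = matmul(A, S)                      # mixtures, X = A S
X_filtered = matmul(X, M)             # filter the observed mixtures
S_filtered = matmul(S, M)             # filter each source on its own
A_times_S_filtered = matmul(A, S_filtered)

# Largest entrywise gap between X M and A (S M); only rounding error is expected.
max_gap = max(abs(X_filtered[i][t] - A_times_S_filtered[i][t])
              for i in range(len(A)) for t in range(T))
print(max_gap)
```

Because `M` acts on the time index only (from the right), each row of `S` is filtered separately, which is why the same `A` relates the filtered mixtures to the filtered sources.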

Since the mixing matrix remains unchanged, the filtered data needs to be used only in the ICA estimation step. After estimating the mixing matrix, we can apply its inverse (the separating matrix) to the original, unfiltered data to obtain the independent components.

The question then arises what kind of filtering could be useful. In the following,

we consider three different kinds of filtering: high-pass and low-pass filtering, as

well as their compromise.


13.1.2 Low-pass filtering

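Basically, low-pass filtering means that every sample point is replaced by a weighted average of that point and the points immediately before it.¹ This is a form of smoothing the data. Then the matrix $\mathbf{M}$ in (13.2) would be something like

$$
\mathbf{M} =
\begin{pmatrix}
\cdots & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & \cdots
\end{pmatrix}
\tag{13.3}
$$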

Low-pass filtering is often used because it tends to reduce noise. This is a well-

known property in signal processing that is explained in most basic signal processing

textbooks.
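The noise-reducing effect can be illustrated with a short experiment. This is a sketch under stated assumptions: the noise is simulated white gaussian noise, the window is three points wide with equal weights 1/3 (a normalized variant of the pattern in (13.3)), and the function names are hypothetical.

```python
import random


def moving_average(x, width=3):
    """Causal moving average: mean of x[t] and up to width-1 points before it."""
    out = []
    for t in range(len(x)):
        window = x[max(0, t - width + 1): t + 1]
        out.append(sum(window) / len(window))
    return out


def variance(x):
    """Sample variance (divides by the number of samples)."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


random.seed(0)  # fixed seed so that the run is repeatable
noise = [random.gauss(0.0, 1.0) for _ in range(5000)]
smoothed = moving_average(noise, width=3)

var_before = variance(noise)
var_after = variance(smoothed)
# For uncorrelated noise, averaging three points should leave roughly
# one third of the original variance.
print(var_before, var_after)
```

A slowly varying signal would pass through the same filter almost unchanged, so the signal-to-noise ratio improves as long as the signal of interest lives at low frequencies.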

In the basic ICA model, the effect of noise is more or less neglected; see Chapter 15

for a detailed discussion. Thus basic ICA methods work much better with data that

does not have much noise, and reducing noise is thus useful and sometimes even

necessary.

A possible problem with low-pass filtering is that it reduces the information in the

data, since the fast-changing, high-frequency features of the data are lost. It often

happens that this leads to a reduction of independence as well (see next section).

13.1.3 High-pass filtering and innovations

High-pass filtering is the opposite of low-pass filtering. The point is to remove slowly

changing trends from the data. Thus a low-pass filtered version is subtracted from

the signal. A classic way of doing high-pass filtering is differencing, which means

replacing every sample point by the difference between the value at that point and

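the value at the preceding point. Thus, the matrix $\mathbf{M}$ in (13.2) would be

$$
\mathbf{M} =
\begin{pmatrix}
\cdots & 1 & -1 & 0 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 1 & -1 & 0 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 1 & -1 & 0 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 1 & -1 & 0 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 1 & -1 & 0 & \cdots \\
\cdots & 0 & 0 & 0 & 0 & 0 & 1 & -1 & \cdots
\end{pmatrix}
\tag{13.4}
$$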

¹ To have a causal filter, points after the current point may be left out of the averaging.


High-pass filtering may be useful in ICA because in certain cases it increases the independence of the components. It often happens in practice that the components have slowly changing trends or fluctuations, in which case they are not very independent. If these slow fluctuations are removed by high-pass filtering, the filtered components are often much more independent. A more principled approach to high-pass filtering is to consider it in the light of innovation processes.
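The effect of slow trends can be seen in a small simulation. This is only a sketch: the two signals, their drift slopes, and the helper names are hypothetical, and sample correlation is used as a crude proxy for dependence (zero correlation does not by itself imply independence).

```python
import random


def difference(x):
    """High-pass filter by differencing: y[t] = x[t] - x[t-1]."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def correlation(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


random.seed(1)
n = 2000
# Two components whose fast fluctuations are generated independently,
# but which both carry a slow upward drift (hypothetical slopes).
s1 = [0.01 * t + random.uniform(-1.0, 1.0) for t in range(n)]
s2 = [0.02 * t + random.uniform(-1.0, 1.0) for t in range(n)]

corr_raw = correlation(s1, s2)
corr_diff = correlation(difference(s1), difference(s2))
# The drifts make the raw signals strongly correlated; after differencing
# the drift becomes a constant offset and the correlation should be small.
print(corr_raw, corr_diff)
```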

Innovation processes

Given a stochastic process $\mathbf{s}(t)$, we define its innovation process $\tilde{\mathbf{s}}(t)$ as the error of the best prediction of $\mathbf{s}(t)$, given its past. Such a best prediction (in the mean-square sense) is given by the conditional expectation of $\mathbf{s}(t)$ given its past, that is, the expected value of the conditional distribution of $\mathbf{s}(t)$ given its past. Thus the innovation process of $\mathbf{s}(t)$ is defined by

$$\tilde{\mathbf{s}}(t) = \mathbf{s}(t) - E\{\mathbf{s}(t) \mid \mathbf{s}(t-1), \mathbf{s}(t-2), \ldots\} \tag{13.5}$$

The expression “innovation” describes the fact that $\tilde{\mathbf{s}}(t)$ contains all the new information about the process that can be obtained at time $t$ by observing $\mathbf{s}(t)$.
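As a concrete illustration of definition (13.5), consider a scalar first-order autoregressive process, which is an assumption made here for the example and not a model used in the text. If $s(t) = a\,s(t-1) + e(t)$ with zero-mean noise $e(t)$ independent of the past, the conditional expectation of $s(t)$ given its past is $a\,s(t-1)$, so the innovation is exactly the driving noise. A minimal sketch, with the coefficient `a` assumed known:

```python
import random

random.seed(2)
a = 0.9      # hypothetical AR(1) coefficient, assumed known for this sketch
n = 1000

# Driving noise e(t): zero-mean, i.i.d., nongaussian (uniform).
e = [random.uniform(-1.0, 1.0) for _ in range(n)]

# AR(1) process: s(t) = a * s(t-1) + e(t), started at s(0) = e(0).
s = [e[0]]
for t in range(1, n):
    s.append(a * s[t - 1] + e[t])

# For this model the best prediction of s(t) from its past is a * s(t-1),
# so the innovation is the prediction error s(t) - a * s(t-1).
innovation = [s[t] - a * s[t - 1] for t in range(1, n)]

# The innovation should reproduce the driving noise up to rounding error.
max_error = max(abs(innovation[t - 1] - e[t]) for t in range(1, n))
print(max_error)
```

In practice the coefficient would have to be estimated from the data, and for `a` close to 1 the computation is close to plain differencing.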

The concept of innovations can be utilized in the estimation of the ICA model due

to the following property:

Theorem 13.1 If $\mathbf{x}(t)$ and $\mathbf{s}(t)$ follow the basic ICA model, then the innovation processes $\tilde{\mathbf{x}}(t)$ and $\tilde{\mathbf{s}}(t)$ follow the ICA model as well. In particular, the components $\tilde{s}_i(t)$ are independent from each other.
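A brief sketch of why the first claim holds, assuming (as in the basic model) that the mixing matrix $\mathbf{A}$ is square and invertible, so that conditioning on the past of $\mathbf{x}$ is the same as conditioning on the past of $\mathbf{s}$; the second and third steps use $\mathbf{x}(t) = \mathbf{A}\mathbf{s}(t)$ and the linearity of conditional expectation:

```latex
\begin{aligned}
\tilde{\mathbf{x}}(t)
  &= \mathbf{x}(t) - E\{\mathbf{x}(t) \mid \mathbf{x}(t-1), \mathbf{x}(t-2), \ldots\} \\
  &= \mathbf{A}\mathbf{s}(t) - E\{\mathbf{A}\mathbf{s}(t) \mid \mathbf{A}\mathbf{s}(t-1), \mathbf{A}\mathbf{s}(t-2), \ldots\} \\
  &= \mathbf{A}\bigl[\mathbf{s}(t) - E\{\mathbf{s}(t) \mid \mathbf{s}(t-1), \mathbf{s}(t-2), \ldots\}\bigr] \\
  &= \mathbf{A}\,\tilde{\mathbf{s}}(t)
\end{aligned}
```

For the independence claim, one additionally assumes that the component processes $s_i(t)$ are mutually independent as whole processes. Then the prediction of $s_i(t)$ depends only on the past of $s_i$ itself, so each $\tilde{s}_i(t)$ is a function of the $i$th process alone.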

On the other hand, independence of the innovations does not imply the independence of the $s_i(t)$. Thus, the innovations are more often independent from each other than the original processes. Moreover, one could argue that the innovations are usually more nongaussian than the original processes. This is because $s_i(t)$ is a kind of moving average of the innovation process, and sums tend to be more gaussian than the original variables. Together these mean that the innovation process is more likely to be independent and nongaussian, and thus to fulfill the basic assumptions in ICA.
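The claim that a moving average is closer to gaussian than its innovations can be checked with excess kurtosis, which is zero for a gaussian variable. This is a minimal sketch under assumed settings: uniformly distributed innovations, an equal-weight moving sum of five terms, and hypothetical helper names.

```python
import random


def excess_kurtosis(x):
    """Sample excess kurtosis; zero for a gaussian variable."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2 - 3.0


random.seed(3)
n = 20000
width = 5

# Nongaussian innovations: uniform noise has theoretical excess kurtosis -1.2.
innov = [random.uniform(-1.0, 1.0) for _ in range(n)]

# The "original process" is modeled here as a moving sum of the innovations.
process = [sum(innov[t - width + 1: t + 1]) for t in range(width - 1, n)]

kurt_innovation = excess_kurtosis(innov)
kurt_process = excess_kurtosis(process)
# The process value should be much closer to zero than the innovation value.
print(kurt_innovation, kurt_process)
```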

Innovation processes were discussed in more detail in [194], where it was also

shown that using innovations, it is possible to separate signals (images of faces) that

are otherwise strongly correlated and very difficult to separate.

The connection between innovations and ordinary filtering techniques is that the

computation of the innovation process is often rather similar to high-pass filtering.

Thus, the arguments in favor of using innovation processes apply at least partly in

favor of high-pass filtering.

A possible problem with high-pass filtering, however, is that it may increase noise

for the same reasons that low-pass filtering decreases noise.

13.1.4 Optimal filtering

Both of the preceding types of filtering have their pros and cons. The optimum would

be to find a filter that increases the independence of the components while reducing


noise. To achieve this, some compromise between high- and low-pass filtering may

be the best solution. This leads to band-pass filtering, in which the highest and the

lowest frequencies are filtered out, leaving a suitable frequency band in between.

What this band should be depends on the data and general answers are impossible to

give.
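One crude way to build such a compromise is to subtract a long moving average from a short one: the short average suppresses the fastest fluctuations, and the subtraction removes the slowest ones. This is only a sketch of the idea, not a recommended filter design; the window lengths (2 and 8) and the function names are hypothetical.

```python
import math


def moving_average(x, width):
    """Causal moving average over the last `width` points (shorter at the start)."""
    out = []
    for t in range(len(x)):
        window = x[max(0, t - width + 1): t + 1]
        out.append(sum(window) / len(window))
    return out


def band_pass(x, short=2, long=8):
    """Short moving average minus long moving average."""
    fast = moving_average(x, short)
    slow = moving_average(x, long)
    return [f - s for f, s in zip(fast, slow)]


def amplitude(x, skip=8):
    """Largest absolute value after the start-up transient."""
    return max(abs(v) for v in x[skip:])


n = 200
constant = [5.0] * n                                          # lowest frequency
fastest = [(-1.0) ** t for t in range(n)]                     # highest frequency
middle = [math.sin(2 * math.pi * t / 16) for t in range(n)]   # in between

amp_constant = amplitude(band_pass(constant))
amp_fastest = amplitude(band_pass(fastest))
amp_middle = amplitude(band_pass(middle))
# The constant and the fastest oscillation are removed,
# while the mid-frequency sinusoid largely survives.
print(amp_constant, amp_fastest, amp_middle)
```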

In addition to simple low-pass/high-pass filtering, one might also use more sophisticated

techniques. For example, one might take the (1-D) wavelet transforms of

the data [102, 290, 17]. Other time-frequency decompositions could be used as well.

13.2 PREPROCESSING BY PCA

A common preprocessing technique for multidimensional data is to reduce its dimension

by principal component analysis (PCA). PCA was explained in more detail in

Chapter 6. Basically, the data is projected linearly onto a subspace

$$\tilde{\mathbf{x}} = \mathbf{E}_n \mathbf{x} \tag{13.6}$$

so that the maximum amount of information (in the least-squares sense) is preserved.

Reducing dimension in this way has several benefits which we discuss in the next

subsections.

13.2.1 Making the mixing matrix square

First, let us consider the case where the number of independent components $n$ is smaller than the number of mixtures, say $m$. Performing ICA on the mixtures directly can cause big problems in such a case, since the basic ICA model does not hold anymore. Using PCA we can reduce the dimension of the data to $n$. After such a reduction, the number of mixtures and the number of ICs are equal, the mixing matrix is square, and the basic ICA model holds.

The question is whether PCA is able to find the subspace correctly, so that the ICs can be estimated from the reduced mixtures. This is not true in general, but in a special case it turns out to be the case. If the data consists of $n$ ICs only, with no noise added, the whole data is contained in an $n$-dimensional subspace. Using PCA for dimension reduction clearly finds this $n$-dimensional subspace, since the eigenvalues corresponding to that subspace, and only those eigenvalues, are nonzero. Thus reducing dimension with PCA works correctly. In practice, the data is usually not exactly contained in the subspace, due to noise and other factors, but if the noise level is low, PCA still finds approximately the right subspace; see Section 6.1.3. In the general case, some “weak” ICs may be lost in the dimension reduction process, but PCA may still be a good idea for optimal estimation of the “strong” ICs [313].
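The noise-free case can be checked numerically: with two ICs mixed into three observed signals, the covariance matrix of the mixtures should have only two eigenvalues that differ from zero. The following is a sketch under stated assumptions: the 3 x 2 mixing matrix and the uniformly distributed sources are hypothetical, and a small Jacobi rotation routine written for this example stands in for a library eigenvalue solver.

```python
import math
import random


def covariance(X):
    """Covariance matrix of data X given as a list of rows (one row per mixture)."""
    T = len(X[0])
    centered = []
    for row in X:
        mean = sum(row) / T
        centered.append([v - mean for v in row])
    m = len(X)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / T
             for j in range(m)] for i in range(m)]


def jacobi_eigenvalues(C, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in C]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.copysign(math.pi / 4, a[p][q])
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):      # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


random.seed(4)
T = 500
# n = 2 hypothetical independent sources, m = 3 noise-free mixtures.
S = [[random.uniform(-1.0, 1.0) for _ in range(T)] for _ in range(2)]
A = [[1.0, 0.5], [0.3, 2.0], [-1.0, 1.0]]
X = [[sum(A[i][j] * S[j][t] for j in range(2)) for t in range(T)]
     for i in range(3)]

eigenvalues = jacobi_eigenvalues(covariance(X))
# Count eigenvalues that are not negligible relative to the largest one.
n_significant = sum(1 for v in eigenvalues if v > 1e-9 * eigenvalues[0])
print(eigenvalues, n_significant)
```

With noise added to `X`, the smallest eigenvalue would no longer be at rounding level, and the choice of the cutoff becomes a judgment call that depends on the noise level.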

Performing first PCA and then ICA has an interesting interpretation in terms of

factor analysis. In factor analysis, it is conventional that after finding the factor

subspace, the actual basis vectors for that subspace are determined by some criteria
