Infrared spectroscopy as a tool for discrimination between sensitive
and multiresistant K562 cells
Anthoula Gaigneaux, Jean-Marie Ruysschaert and Erik Goormaghtigh
Laboratory of Structure and Function of Biological Membranes, Free University of Brussels, Belgium
Fourier transform infrared spectroscopy was performed on human leukemic daunorubicin-sensitive K562 cells and on their multiresistant counterpart derived by selection. Statistical analysis, including variable reduction and linear discriminant analysis, was performed on the spectra of sensitive and multiresistant cells in order to establish a diagnostic tool for the multiresistant phenotype. With each of the two data-reduction methods tested [genetic algorithm or principal component analysis (PCA)], discrimination between the two cell lines was possible. The best results, obtained with PCA reduction, showed an accuracy of 93% on a separate test set of spectra. These results demonstrate the efficiency of Fourier transform infrared spectroscopy for classification. Further analysis of the spectral differences indicated that discrimination between resistant and sensitive cells was based on variations in all cellular components: the relative lipid and nucleic acid contents decreased, whereas the protein content increased.
Keywords: multiresistance; infrared spectroscopy; multivariate statistics; K562; leukemia.
In recent years, infrared spectroscopy has become a powerful tool for biodiagnostics [1]. A major advantage of infrared spectroscopy over more classical investigation techniques is that neither sample staining nor the addition of chemical reagents is necessary. A few minutes and a few µL of cell suspension are sufficient to obtain a spectrum representative of all cell constituents.
This technique is based on the absorption of infrared light by vibrational transitions of covalent bonds. Band intensities provide quantitative information, while band frequencies give qualitative information about the nature of these bonds, their structure and their molecular environment.
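The quantitative use of band intensities rests, for dilute absorbers, on the standard Beer–Lambert law. The relation below is textbook background added for illustration, not a formula stated by the authors; the second form, in which a cell spectrum is treated as a concentration-weighted sum of component spectra (index k over proteins, lipids, nucleic acids), is an idealization that ignores scattering and intermolecular interactions.

```latex
A(\tilde{\nu}) = -\log_{10}\frac{I(\tilde{\nu})}{I_0(\tilde{\nu})} = \varepsilon(\tilde{\nu})\, c\, \ell
\qquad\Longrightarrow\qquad
A_{\mathrm{cell}}(\tilde{\nu}) \approx \ell \sum_{k} \varepsilon_k(\tilde{\nu})\, c_k
```

Here A is the absorbance at wavenumber \tilde{\nu}, I and I_0 are the transmitted and incident intensities, \varepsilon the molar absorption coefficient, c the concentration and \ell the path length (film thickness).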
In complex systems such as cells, the main absorptions arise from the N–H, C=O, C–H and P=O bonds of the proteins, lipids and nucleic acids present in the cells. An infrared spectrum of cells is the sum of all these contributions. A classical group-frequency approach can be used to interpret changes in one of the cell components, as previously done on leukemic cell lines [2]. Another way to analyse infrared spectra is to use the spectral signature to correlate spectral patterns with biological properties. Rigas [3] showed that IR spectroscopy could detect features of normal and malignant cultured human colonocytes. When unsupervised (cluster analysis or principal component analysis), the multivariate statistical methods known as ‘pattern recognition techniques’ classify spectra into intrinsic groups. Naumann et al. [4] successfully used cluster analysis to characterize hundreds of bacterial strains. The same approach was also used to clearly distinguish between normal and chronic lymphocytic leukemia cells [5]. Supervised multivariate methods, such as linear discriminant analysis (LDA) or partial least squares regression, are powerful tools for building discrimination rules that are later used to identify new samples. Such methods have been successfully applied to skin tumours [6] and to lymph cells and tissues [7].
The multiresistant phenotype is a significant problem in cancer chemotherapy. It is characterized by cell resistance to multiple, structurally unrelated drugs [8] and may be expressed by cells selected for resistance to a single agent. Many of these multiresistant cells differ from their sensitive counterparts by overexpression of a 170-kDa membrane protein named P-glycoprotein (P-gp) [9]. Although the presence of P-gp alone has been shown to confer the multidrug resistance phenotype in some cell lines [10], previous studies have shown that molecular changes in the lipid and nucleic acid fractions of the cells accompany P-gp overexpression [11,12].
In this study, we worked with sensitive (K562/DNS) and multiresistant (K562/DNR) human chronic myelogenous leukemia K562 cells. First, we examined whether infrared spectroscopy, combined with data reduction techniques and multivariate statistics, can identify the multidrug resistant phenotype in these cells with high accuracy. Second, we sought to learn more about the biological origin of the spectral differences between the multiresistant K562 cell line and its sensitive counterpart.
MATERIALS AND METHODS
Cell culture
K562 is a human chronic myelogenous leukemia cell line. In this study, two different K562 lines were used. The first cell line (cell line A) has been described previously [13]. A second cell line (cell line B) was obtained from A. Delforge (Bordet Hospital, Brussels). From each sensitive cell line (K562/DNS), a multiresistant subline (K562/DNR) was derived by selection on daunorubicin. All cell lines were kept in exponential growth in RPMI 1640 medium supplemented with 10% fetal bovine serum, L-glutamine (2%) and 1% antibiotic/antimycotic solution, at 37 °C in a humidified atmosphere of 5% CO₂. All growth media and supplements were purchased from Life Technologies (Paisley, Scotland). To maintain the resistance phenotype, K562/DNR cells were cultured in medium containing 1 µM daunorubicin or doxorubicin for 1 week every 2 months. All infrared measurements were carried out at least one week after removal of the selection agent. The cell lines were maintained at the same cell density and harvested in the same (exponential) phase of growth for IR measurement.

For harvesting, cells were centrifuged for 3 min at 300 g and the pellet was washed twice in 0.9% NaCl to remove all growth medium.

Correspondence to E. Goormaghtigh, Laboratory of Structure and Function of Biological Membranes, Free University of Brussels, CP 206/2, Boulevard du Triomphe, B-1050 Brussels, Belgium.
Fax: + 32 2 650 5382, Tel.: + 32 2 650 5386,
E-mail: egoor@ulb.ac.be
Abbreviations: P-gp, P-glycoprotein; K562/DNS, sensitive K562 cells; K562/DNR, daunorubicin resistant K562 cells; PCA, principal component analysis; LDA, linear discriminant analysis; MDR, multidrug resistant.
(Received 4 January 2002, accepted 21 January 2002)
Eur. J. Biochem. 269, 1968–1973 (2002) © FEBS 2002 doi:10.1046/j.1432-1033.2002.02841.x
FTIR spectroscopy
An aliquot of the cell pellet was deposited on a germanium crystal (2–5 × 10⁵ cells per smear). The sample was rapidly dried under a stream of N₂ to obtain a homogeneous film of whole cells. IR spectra were recorded between 4000 and 800 cm⁻¹ on a Bruker Equinox spectrophotometer (Bruker, Karlsruhe, Germany) equipped with a liquid-N₂-cooled mercury cadmium telluride detector. Each spectrum was obtained by averaging 256 scans at a resolution of 4 cm⁻¹. The spectra were baseline corrected and normalized to equal area between 1711 and 1485 cm⁻¹. Spectra were encoded with one data point every 1 cm⁻¹.
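The equal-area normalization can be sketched as follows. This is a minimal illustration, not the authors' in-house MATLAB software: it assumes spectra sampled on a shared 1 cm⁻¹ grid with baseline correction already applied, and the function names (`band_area`, `normalize_area`) and the target area of 1.0 are hypothetical choices.

```python
# Minimal sketch: rescale a spectrum so that its area between 1485 and
# 1711 cm^-1 equals a common target. Assumes baseline correction was done
# beforehand; the target value of 1.0 is arbitrary.

def band_area(wavenumbers, absorbances, lo=1485.0, hi=1711.0):
    """Trapezoidal integral of absorbance over lo <= wavenumber <= hi."""
    pts = sorted((w, a) for w, a in zip(wavenumbers, absorbances) if lo <= w <= hi)
    area = 0.0
    for (w0, a0), (w1, a1) in zip(pts, pts[1:]):
        area += 0.5 * (a0 + a1) * (w1 - w0)
    return area


def normalize_area(wavenumbers, absorbances, lo=1485.0, hi=1711.0, target=1.0):
    """Return a rescaled copy of the spectrum whose area in [lo, hi] equals target."""
    area = band_area(wavenumbers, absorbances, lo, hi)
    if area <= 0:
        raise ValueError("non-positive area in the normalization window")
    scale = target / area
    return [a * scale for a in absorbances]


# Usage: two spectra differing only by an overall factor (e.g. a thicker
# cell film) become identical, up to rounding, after normalization.
grid = list(range(4000, 799, -1))          # 4000 ... 800 cm^-1 in 1 cm^-1 steps
thin = [0.1 + 1e-5 * w for w in grid]
thick = [3.0 * a for a in thin]
diff = max(abs(x - y) for x, y in zip(normalize_area(grid, thin), normalize_area(grid, thick)))
print(diff < 1e-12)                        # True
```

Normalizing on the amide region removes differences in film thickness (number of cells per smear), so that the remaining spectral differences reflect composition rather than sample amount.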
Data analysis
All spectra were processed with in-house software running in a MATLAB environment (MATLAB 6, Mathworks Inc., Natick, USA). Spectra were divided into a training set of 48 spectra of cell line A and a test set of 30 spectra of cell lines A and B. Only the training set was used for model calculations (PCA, genetic algorithm and LDA).
Data reduction by principal component analysis (PCA). Each IR spectrum is a sample described by about 3000 variables. To reduce this number, PCA was performed. PCA is a variable-reduction method that builds linear combinations of variables (wavenumbers) that vary together. The first linear combination is called the first principal component and accounts for almost 98% of the variance. The second principal component is the linear combination of wavenumbers that explains the maximum residual variance while being orthogonal to the first one. The following principal components obey the same rules. This method allows each spectrum to be reduced to 10 variables (the first 10 principal components), which explain almost 100% of the variance.
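The PCA reduction described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses power iteration with deflation on the sample covariance matrix, and the function name `pca` and its parameters are hypothetical. For ~3000 wavenumbers a p × p covariance matrix is impractical in pure Python; a real implementation would use a singular value decomposition (for example MATLAB's `svd` or NumPy's `numpy.linalg.svd`).

```python
import math
import random


def pca(X, k, iters=300, seed=0):
    """Project rows of X (one spectrum per row) onto the first k principal components.

    Returns (scores, components, explained_ratio). Power iteration with
    deflation on the sample covariance matrix.
    """
    n, p = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - mean[j] for j in range(p)] for row in X]      # mean-centred data
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(p)] for i in range(p)]
    total_var = sum(C[i][i] for i in range(p))
    rng = random.Random(seed)
    components, ratios = [], []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]                                  # random unit start vector
        for _ in range(iters):
            w = [sum(c * x for c, x in zip(row, v)) for row in C]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:                                        # no variance left
                break
            v = [x / norm for x in w]
        Cv = [sum(c * x for c, x in zip(row, v)) for row in C]
        lam = sum(a * b for a, b in zip(v, Cv))                    # Rayleigh quotient = eigenvalue
        components.append(v)
        ratios.append(lam / total_var)
        # Deflation: remove the variance captured by v so the next
        # iteration finds the next orthogonal direction.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(a * b for a, b in zip(row, comp)) for comp in components] for row in Xc]
    return scores, components, ratios


# Toy data: 11 "spectra" with 3 variables; nearly all variance lies along (1, 1, 0).
X = [[t, t, 0.1 * (-1) ** t] for t in range(-5, 6)]
scores, comps, ratios = pca(X, k=2)
print([round(r, 4) for r in ratios])    # [0.9995, 0.0005]
```

The toy example mirrors the situation reported in the text: the first component carries almost all of the variance, while later components with small variance can still carry the discriminating information.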
Selection of wavenumbers with a genetic algorithm. The genetic algorithm is a supervised method that uses mutation/selection principles to solve optimization problems [14]. Many parameters can be adjusted to increase the efficiency of the algorithm. The data were analysed with a window of five wavenumbers, on the assumption that adjacent wavenumbers are highly correlated. A population of 32 solutions was built and evaluated at each generation. The algorithm stopped at generation 100 or when the solutions reached 50% convergence. The mutation rate was 0.005, with double crossover, and the data were divided into nine subsets to cross-validate the models.
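A toy version of such a genetic algorithm is sketched below. Only the window width (5), population size (32), generation limit (100), mutation rate (0.005) and double crossover come from the text; the random initialization, tournament selection, elitism and the convergence measure (share of the population identical to the best solution) are assumptions, and `genetic_select` and `windows_to_indices` are hypothetical names. In the study the fitness would be a cross-validated (nine subsets) classification score; here any callable scoring a mask can be passed in.

```python
import random


def windows_to_indices(mask, width=5):
    """Expand a window mask (one gene per block of `width` wavenumbers) to variable indices."""
    return [w * width + k for w, on in enumerate(mask) if on for k in range(width)]


def genetic_select(n_windows, fitness, pop_size=32, max_gen=100,
                   p_mut=0.005, converge=0.5, seed=0):
    """Toy genetic algorithm over binary window masks; returns the fittest mask found."""
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_windows)] for _ in range(pop_size)]
    for _ in range(max_gen):
        best = max(pop, key=fitness)
        # Assumed convergence test: share of the population identical to the best mask
        if sum(ind == best for ind in pop) / pop_size >= converge:
            break

        def pick():
            # Tournament selection between two random individuals (an assumption)
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        children = [best[:]]  # elitism: carry the current best over unchanged (an assumption)
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            # Double crossover: take the segment between two random cut points from p2
            i, j = sorted(rng.sample(range(n_windows + 1), 2))
            child = p1[:i] + p2[i:j] + p1[j:]
            # Point mutation: flip each gene with probability p_mut
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


# Usage with a hypothetical fitness that rewards matching a known "informative" mask.
target = [i % 3 == 0 for i in range(20)]
best = genetic_select(20, lambda m: sum(a == b for a, b in zip(m, target)))
print(windows_to_indices(best)[:10])
```

Because each gene switches a block of five adjacent wavenumbers, the search space shrinks fivefold, which is the practical benefit of the windowing assumption mentioned in the text.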
Because the algorithm is not deterministic, running it several times yields a more precise solution. Only the wavenumbers selected in more than 80% of all models built were kept in the final model.
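The consensus rule (keep only wavenumbers selected in more than 80% of the models) amounts to a simple frequency count. The sketch below assumes each model is given as a list of selected variable indices; `consensus` is a hypothetical name. Note that "more than 80%" is a strict inequality: with the 185 models reported in the Results, 0.8 × 185 = 148, so a wavenumber must appear in at least 149 models.

```python
def consensus(selections, n_vars, threshold=0.8):
    """Indices chosen in strictly more than `threshold` of the models."""
    counts = [0] * n_vars
    for sel in selections:
        for i in set(sel):        # count each variable at most once per model
            counts[i] += 1
    n_models = len(selections)
    return [i for i, c in enumerate(counts) if c / n_models > threshold]


models = [[0, 1, 2], [0, 1], [0, 2], [0, 1, 3], [0, 1]]
print(consensus(models, n_vars=4))   # [0]  (variable 1 is in exactly 80%, not more)
```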
Linear discriminant analysis (LDA). This multivariate statistical method is supervised. It seeks the linear combination of variables that maximizes the interclass (between-class) variance while minimizing the intraclass (within-class) variance, and uses it to discriminate between the classes. The rule is constructed with the training set and then tested with the test set. We performed LDA with the standard method, i.e. including all the variables in the model.
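A two-class Fisher LDA with all variables entered at once can be sketched as follows. This is an illustration, not the software used in the study: it assumes equal class priors (threshold midway between the projected class means), and `fit_lda`, `predict` and the helper functions are hypothetical names. The sketch also shows why variable reduction is needed: with fewer spectra than variables the pooled within-class covariance matrix is singular and the discriminant cannot be computed.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _solve(A, b, tol=1e-12):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < tol:
            raise ValueError("singular within-class covariance (too few spectra for the variables?)")
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_lda(X0, X1):
    """Fisher's two-class LDA: w = S_w^-1 (m1 - m0), threshold midway between projected means."""
    p = len(X0[0])
    m0 = [sum(r[j] for r in X0) / len(X0) for j in range(p)]
    m1 = [sum(r[j] for r in X1) / len(X1) for j in range(p)]
    S = [[0.0] * p for _ in range(p)]
    for X, m in ((X0, m0), (X1, m1)):
        for r in X:
            d = [r[j] - m[j] for j in range(p)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
    dof = len(X0) + len(X1) - 2
    S = [[s / dof for s in row] for row in S]          # pooled within-class covariance
    w = _solve(S, [a - b for a, b in zip(m1, m0)])
    threshold = 0.5 * (_dot(w, m0) + _dot(w, m1))      # equal priors assumed
    return w, threshold


def predict(w, threshold, x):
    """Return 1 (class of X1) if the projection exceeds the threshold, else 0."""
    return 1 if _dot(w, x) > threshold else 0


# Usage: two well-separated toy classes in two variables.
X0 = [[0, 0], [1, 0], [0, 1], [1, 1]]
X1 = [[3, 3], [4, 3], [3, 4], [4, 4]]
w, t = fit_lda(X0, X1)
print(predict(w, t, [0.2, 0.3]), predict(w, t, [3.8, 3.9]))   # 0 1
```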
RESULTS
Spectral information contained in a cell IR spectrum
Figure 1 (line A) shows a representative spectrum of K562/DNS cells, which can be divided into three regions. The absorption between 3000 and 2800 cm⁻¹ is dominated by the stretching vibrations of CH₂ and CH₃ groups, mainly from the fatty acids of the cell. The band at 2963 cm⁻¹ can be assigned to the asymmetric stretching of CH₃ and the band at 2873 cm⁻¹ to its symmetric mode. The bands at 2926 and 2853 cm⁻¹ can be assigned to the asymmetric and symmetric stretching modes of CH₂, respectively [1]. The shoulder at 1740 cm⁻¹ can be assigned to the ester C=O stretching of phospholipids [15,16], which is absent from DNA and proteins. Between 1700 and 1300 cm⁻¹, contributions are primarily due to proteins, with some absorption from lipids. The stretching of protein amide C=O bonds occurs at 1650 cm⁻¹ (amide I), and the deformation of the protein amide N–H bond appears at 1540 cm⁻¹ (amide II) [15]. The 1450 and 1400 cm⁻¹ bands arise from protein side chains [15], but the C–H bending vibrations of fatty acids at 1467 and 1450 cm⁻¹ [3] and the carboxylate vibration of fatty acids at 1400 cm⁻¹ [17] are superimposed on them. Absorptions between 1300 and 900 cm⁻¹ arise mainly from the phosphate groups of nucleic acids, i.e. DNA and RNA. The bands at 1245 and 1087 cm⁻¹ are characteristic of the asymmetric and symmetric phosphodiester vibrations of nucleic acids [15]. In glycogen-poor cells such as lymphocytes, Benedetti et al. assigned the shoulders at 1117 and 1020 cm⁻¹ to RNA and DNA, respectively [18].

Fig. 1. K562 cell spectrum and spectral areas selected by genetic algorithm. A smear of about 2 × 10⁵ cells was dried on an area of 2 cm² on the germanium surface as explained in Materials and methods. Wavenumbers selected by the genetic algorithm are shaded.
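The band assignments given in this section can be collected into a small lookup structure, for instance to annotate the wavenumbers picked by the genetic algorithm or the extrema of a principal component. The sketch below only encodes the assignments and regions stated in the text; the names `BANDS`, `REGIONS` and `assign`, and the ±5 cm⁻¹ matching tolerance (chosen to be close to the 4 cm⁻¹ resolution), are assumptions.

```python
# Band assignments quoted in the text (cm^-1).
BANDS = {
    2963: "CH3 asymmetric stretching (mainly fatty acids)",
    2926: "CH2 asymmetric stretching (mainly fatty acids)",
    2873: "CH3 symmetric stretching (mainly fatty acids)",
    2853: "CH2 symmetric stretching (mainly fatty acids)",
    1740: "ester C=O stretching of phospholipids",
    1650: "amide I (protein C=O stretching)",
    1540: "amide II (protein N-H deformation)",
    1467: "C-H bending of fatty acids",
    1450: "protein side chains; C-H bending of fatty acids",
    1400: "protein side chains; carboxylate of fatty acids",
    1245: "asymmetric phosphodiester vibration (nucleic acids)",
    1117: "RNA (glycogen-poor cells)",
    1087: "symmetric phosphodiester vibration (nucleic acids)",
    1020: "DNA (glycogen-poor cells)",
}

# Broad spectral regions described in the text: (low, high, dominant contribution).
REGIONS = [
    (2800, 3000, "lipids (CH2/CH3 stretching)"),
    (1300, 1700, "mainly proteins, some lipids"),
    (900, 1300, "mainly nucleic acid phosphates"),
]


def assign(wavenumber, tol=5):
    """Return (region or None, list of band assignments within tol cm^-1)."""
    region = next((name for lo, hi, name in REGIONS if lo <= wavenumber <= hi), None)
    bands = [BANDS[b] for b in sorted(BANDS) if abs(b - wavenumber) <= tol]
    return region, bands


print(assign(1652))   # ('mainly proteins, some lipids', ['amide I (protein C=O stretching)'])
```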
Classification by LDA
LDA was applied to discriminate between the two cell lines. The large number of variables in an infrared spectrum (3000) is a problem for this approach, which requires more observations than variables. We reduced this number with two distinct methods: a genetic algorithm (supervised) and PCA (unsupervised).
Classification by LDA on spectra restricted by genetic algorithm
The genetic algorithm was run on the training set, composed of K562/DNR (22 spectra) and K562/DNS (26 spectra) cells. Each spectrum was obtained from a different cell culture, and the 48 spectra were accumulated over a period of eight months. The region between 2800 and 1800 cm⁻¹, which contains no chemical information apart from atmospheric CO₂, was discarded. After 16 runs of the algorithm, we selected the wavenumbers present in more than 80% of the 185 models built. They were distributed in 10 regions of the spectrum (Fig. 1), including several areas in the lipid and nucleic acid regions and one area associated with proteins (amide II). The training set spectra, restricted to these wavenumbers, were used to build the LDA model. The model was tested on the 30 test spectra (not included in the training set), on which the overall accuracy was 73% (Table 1). About half of the resistant spectra were classified in the sensitive class.
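Discarding the 2800–1800 cm⁻¹ region is a simple mask on the wavenumber axis. The sketch below assumes the 4000–800 cm⁻¹, 1 cm⁻¹ grid given in Materials and methods and excludes only points strictly inside the interval; how the authors treated the endpoints is not stated, so that choice is an assumption, as is the name `discard_region`.

```python
def discard_region(wavenumbers, absorbances, lo=1800, hi=2800):
    """Drop variables strictly between lo and hi cm^-1 (endpoint handling is an assumption)."""
    kept = [(w, a) for w, a in zip(wavenumbers, absorbances) if not (lo < w < hi)]
    return [w for w, _ in kept], [a for _, a in kept]


grid = list(range(4000, 799, -1))          # 3201 points at 1 cm^-1
w, a = discard_region(grid, [0.0] * len(grid))
print(len(grid), len(w))                   # 3201 2202
```

Removing this region before the genetic algorithm prevents it from selecting wavenumbers that only track variable atmospheric CO₂ between measurements.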
Classification by LDA on spectra reduced by PCA
PCA was performed on the training set. At this stage, only two or three principal components were sufficient to obtain a partial separation between the two cell lines; Fig. 2 shows the PCA-reduced spectra projected on vectors 2 and 4. Each of these two vectors (Fig. 3) has features at wavenumbers characteristic of nucleic acids, lipids and proteins. Interestingly, in the second vector, a negative loading at 1625 cm⁻¹ (attributed to β-sheet secondary structure of proteins) is associated with a positive loading at 1667 cm⁻¹ (α-helix secondary structure). This may reflect a change in the overall secondary structure composition of the cellular proteins.
The reduced training set spectra were used to build the LDA model. The training set was classified with 100% accuracy. For the test set (Table 2), the overall accuracy was 93%.
Table 1. Results of LDA for the test-set spectra when spectra were reduced by the genetic algorithm. Overall accuracy was 100% on the training set and 73% on the test set. Actual classes in rows, LDA-predicted classes in columns.

Actual class          Predicted K562/DNS   Predicted K562/DNR   Accuracy
K562/DNS, line A              7                    0               100%
K562/DNS, line B              6                    0               100%
K562/DNR, line A              6                    5                45%
K562/DNR, line B              2                    4                67%
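The accuracies in Table 1 follow directly from the counts. The sketch below reads each row as an actual class and each column as a predicted class, which is what the per-row Accuracy column implies; the variable names are illustrative only.

```python
# Counts from Table 1: (actual class, cell line) -> predicted-class counts.
table1 = {
    ("K562/DNS", "A"): {"K562/DNS": 7, "K562/DNR": 0},
    ("K562/DNS", "B"): {"K562/DNS": 6, "K562/DNR": 0},
    ("K562/DNR", "A"): {"K562/DNS": 6, "K562/DNR": 5},
    ("K562/DNR", "B"): {"K562/DNS": 2, "K562/DNR": 4},
}


def row_accuracy(actual, counts):
    """Fraction of spectra in one row that were assigned to their actual class."""
    return counts[actual] / sum(counts.values())


correct = sum(counts[actual] for (actual, _), counts in table1.items())
total = sum(sum(counts.values()) for counts in table1.values())
for (actual, line), counts in table1.items():
    print(f"{actual} line {line}: {row_accuracy(actual, counts):.0%}")
print(f"overall: {correct}/{total} = {correct / total:.0%}")   # overall: 22/30 = 73%
```

The same counts show that 8 of the 17 resistant test spectra (6 + 2) were assigned to the sensitive class, i.e. about half, as stated in the text.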
Fig. 2. Two-dimensional plot of PCA-reduced spectra of K562 cells. Resistant (39 spectra, black stars) and sensitive (39 spectra, circles) K562 cells from the training set (filled symbols) and the test set (open symbols). The percentage of variance represented by each component is indicated on the axes.
Fig. 3. Principal components that allow a partial separation between resistant and sensitive K562 cells. (A) Second vector. (B) Fourth vector. All components are shown on the same scale.