MINISTRY OF EDUCATION AND TRAINING
VIETNAMESE ACADEMY OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
NGUYEN VAN TRUONG
IMPROVING SOME ARTIFICIAL IMMUNE ALGORITHMS FOR NETWORK INTRUSION DETECTION
THE THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN MATHEMATICS
Hanoi - 2019
THE THESIS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN MATHEMATICS
Major: Mathematical foundations for Informatics
Code: 62 46 01 10
Scientific supervisor:
1. Assoc. Prof., Dr. Nguyen Xuan Hoai
2. Assoc. Prof., Dr. Luong Chi Mai
Hanoi - 2019
Acknowledgments
First of all, I would like to thank my principal supervisor, Assoc. Prof., Dr.
Nguyen Xuan Hoai, for introducing me to the field of Artificial Immune Systems.
He guided me step by step through research activities such as seminar presentations
and paper writing. His genius has been a constant source of help, and I have benefited
from his constructive criticism throughout my PhD journey. I also wish to thank my
co-supervisor, Assoc. Prof., Dr. Luong Chi Mai. She was always very enthusiastic in
our discussions of promising research questions. It has been a pleasure and a luxury for me
to work with her. This thesis would not have been possible without my supervisors' support.
I gratefully acknowledge the support of the Institute of Information Technology,
Vietnamese Academy of Science and Technology, and of Thai Nguyen University
of Education. I also thank the National Foundation for Science and Technology
Development (NAFOSTED) and the ASEAN-European Academic University
Network (ASEA-UNINET) for their financial support.
I thank M.Sc. Vu Duc Quang, M.Sc. Trinh Van Ha and M.Sc. Pham Dinh
Lam, my co-authors on published papers. I thank Assoc. Prof., Dr. Tran Quang
Anh and Dr. Nguyen Quang Uy for many helpful insights into my research. I thank
my colleagues, especially my labmate Mr. Nguyen Tran Dinh Long, at the IT Research
& Development Center, Hanoi University.
Finally, I thank my family for their endless love and steady support.
Certificate of Originality
I hereby declare that this submission is my own work, carried out under my scientific
supervisors, Assoc. Prof., Dr. Nguyen Xuan Hoai and Assoc. Prof., Dr. Luong Chi Mai. I
declare that it contains no material previously published or written by another person,
except where due reference is made in the text of the thesis. In addition, I certify that
all my co-authors have permitted me to present our joint work in this thesis.
Hanoi, 2019
PhD. student
Nguyen Van Truong
Contents
List of Figures v
List of Tables vii
Notation and Abbreviation viii
INTRODUCTION 1
Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Problem statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Outline of thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1 BACKGROUND 5
1.1 Detection of Network Anomalies . . . . . . . . . . . . . . . . . . . . . . 5
1.1.1 Host-Based IDS . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.2 Network-Based IDS . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.3 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.4 Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 A brief overview of human immune system . . . . . . . . . . . . . . . . 8
1.3 AIS for IDS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.1 AIS model for IDS . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.2 AIS features for IDS . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Selection algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.4.1 Negative Selection Algorithms . . . . . . . . . . . . . . . . . . . 12
1.4.2 Positive Selection Algorithms . . . . . . . . . . . . . . . . . . . 15
1.5 Basic terms and definitions . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.1 Strings, substrings and languages . . . . . . . . . . . . . . . . . 16
1.5.2 Prefix trees, prefix DAGs and automata . . . . . . . . . . . . . 17
1.5.3 Detectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5.4 Detection in r-chunk detector-based positive selection . . . . . . 20
1.5.5 Holes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5.6 Performance metrics . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5.7 Ring representation of data . . . . . . . . . . . . . . . . . . . . 23
1.5.8 Frequency trees . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.6 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.6.1 The DARPA-Lincoln datasets . . . . . . . . . . . . . . . . . . . 27
1.6.2 UT dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.6.3 Netflow dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.4 Discussions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2 COMBINATION OF NEGATIVE SELECTION AND POSITIVE SELECTION 30
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3 New Positive-Negative Selection Algorithm . . . . . . . . . . . . . . . . 31
2.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3 GENERATION OF COMPACT DETECTOR SET 43
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3 New negative selection algorithm . . . . . . . . . . . . . . . . . . . . . 45
3.3.1 Detectors set generation under rcbvl matching rule . . . . . . 45
3.3.2 Detection under rcbvl matching rule . . . . . . . . . . . . . . . 48
3.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4 FAST SELECTION ALGORITHMS 51
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.3 A fast negative selection algorithm based on r-chunk detector . . . . . . 52
4.4 A fast negative selection algorithm based on r-contiguous detector . . . 57
4.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5 APPLYING HYBRID ARTIFICIAL IMMUNE SYSTEM FOR NETWORK SECURITY 66
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.2 Related works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Hybrid positive selection algorithm with chunk detectors . . . . . . . . 69
5.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.4.1 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.4.2 Data preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.4.3 Performance metrics and parameters . . . . . . . . . . . . . . . 72
5.4.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
CONCLUSIONS 78
Contributions of this thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Published works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
BIBLIOGRAPHY 81
List of Figures
1.1 Classification of anomaly-based intrusion detection methods . . . . . . 7
1.2 Multi-layered protection and elimination architecture . . . . . . . . . . 9
1.3 Multi-layer AIS model for IDS . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Outline of a typical negative selection algorithm. . . . . . . . . . . . . . 13
1.5 Outline of a typical positive selection algorithm. . . . . . . . . . . . . . 15
1.6 Example of a prefix tree and a prefix DAG. . . . . . . . . . . . . . . . . 18
1.7 Existence of holes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.8 Negative selections with 3-chunk and 3-contiguous detectors. . . . . . . 23
1.9 A simple ring-based representation (b) of a string (a). . . . . . . . . . . 25
1.10 Frequency trees for all 3-chunk detectors. . . . . . . . . . . . . . . . . . 26
2.1 Binary tree representation of the detectors set generated from S. . . . . 33
2.2 Conversion of a positive tree to a negative one. . . . . . . . . . . . . . . 33
2.3 Diagram of the Detector Generation Algorithm. . . . . . . . . . . . . . 35
2.4 Diagram of the Positive-Negative Selection Algorithm. . . . . . . . . . 37
2.5 One node is reduced in a tree: a compact positive tree has 4 nodes (a) and its conversion (a negative tree) has 3 nodes (b). . . . . . . 38
2.6 Detection time of NSA and PNSA. . . . . . . . . . . . . . . . . . . . . 40
2.7 Nodes reduction on trees created by PNSA on Netflow dataset. . . . . . 41
2.8 Comparison of nodes reduction on Spambase dataset. . . . . . . . . . . 41
3.1 Diagram of an algorithm to generate perfect rcbvl detectors set. . . . . 47
4.1 Diagram of the algorithm to generate positive r-chunk detectors set. . . 55
4.2 A prefix DAG G and an automaton M . . . . . . . . . . . . . . . . . . 57
4.3 Diagram of the algorithm to generate negative r-contiguous detectors set. 61
4.4 An automaton represents 3-contiguous detectors set. . . . . . . . . . . . 62
4.5 Comparison of ratios of runtime of r-chunk detector-based NSA to runtime of Chunk-NSA . . . . . . . . 63
4.6 Comparison of ratios of runtime of r-contiguous detector-based NSA to runtime of Cont-NSA . . . . . . . . 64
List of Tables
1.1 Performance comparison of NSAs on linear strings and ring strings. . . 24
2.1 Comparison of memory and detection time reductions. . . . . . . . . . 39
2.2 Comparison of nodes generation on Netflow dataset. . . . . . . . . . . . 40
3.1 Data and parameters distribution for experiments and results comparison. 49
4.1 Comparison of our results with the runtimes of previously published algorithms. . . . . . . . 53
4.2 Comparison of Chunk-NSA with r-chunk detector-based NSA. . . . . . 63
4.3 Comparison of proposed Cont-NSA with r-contiguous detector-based NSA. 64
5.1 Features for NIDS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2 Distribution of flows and parameters for experiments. . . . . . . . . . . 73
5.3 Comparison between PSA2 and other algorithms. . . . . . . . . . . . . 74
5.4 Comparison between ring string-based PSA2 and linear string-based PSA2. 76
Notation and Abbreviation
Notation

ℓ              Length of data samples
Sr             Set of ring representations of all strings in S
|X|            Cardinality of set X
Σ              An alphabet, a nonempty and finite set of symbols
Σ^k            Set of all strings of length k on alphabet Σ, where k is a positive integer
Σ*             Set of all strings on alphabet Σ, including the empty string
r              Matching threshold
Dpi            Set of all positive r-chunk detectors at position i
Dni            Set of all negative r-chunk detectors at position i
CHUNKp(S, r)   Set of all positive r-chunk detectors
CHUNK(S, r)    Set of all negative r-chunk detectors
CONT(S, r)     Set of all r-contiguous detectors
L(X)           Set of all nonself strings detected by X
rcbvl          r-contiguous bit with variable length

Abbreviation

AIS        Artificial Immune System
ACC        Accuracy Rate
ACO        Ant Colony Optimization
ANIDS      Anomaly Network Intrusion Detection System
BBNN       Block-Based Neural Network
Chunk-NSA  Chunk Detector-Based Negative Selection Algorithm
Cont-NSA   Contiguous Detector-Based Negative Selection Algorithm
DR         Detection Rate
DAG        Directed Acyclic Graph
FAR        False Alarm Rate
GA         Genetic Algorithm
HIS        Human Immune System
HIDS       Host Intrusion Detection System
IDS        Intrusion Detection System
ML         Machine Learning
MLP        Multilayer Perceptron
NIDS       Network Intrusion Detection System
NS         Negative Selection
NSA        Negative Selection Algorithm
NSM        Negative Selection Mutation
PNSA       Positive-Negative Selection Algorithm
PSA        Positive Selection Algorithm
PSA2       Two-class Positive Selection Algorithm
PSO        Particle Swarm Optimization
PSOGSA     Particle Swarm Optimization-Gravitational Search Algorithm
RNSA       Real-valued NSA
SVM        Support Vector Machines
TCP        Transmission Control Protocol
VNSA       Variable length detector-based NSA
INTRODUCTION
Motivation
Internet users and computer networks are suffering from a rapidly increasing
number of attacks. To keep them safe, there is a need for effective security monitoring
systems, such as Intrusion Detection Systems (IDSs). However, intrusion detection
has to face a number of different problems, such as large network traffic volumes,
imbalanced data distributions, the difficulty of drawing decision boundaries between
normal and abnormal actions, and the requirement for continuous adaptation to a
constantly changing environment. As a result, many researchers have attempted to
use different types of approaches to build reliable intrusion detection systems.
Computational intelligence techniques, known for their adaptability, fault
tolerance, high computational speed and resilience against noisy information, are
promising alternative approaches to this problem.
One of the promising computational intelligence methods for intrusion detection
that has emerged recently is the artificial immune system (AIS), inspired by the
biological immune system. The negative selection algorithm (NSA), a dominant model
of AIS, is widely used in intrusion detection systems (IDSs) [55, 52]. Despite its
successful application, NSA has some weaknesses: (1) high false positive rates (false
alarm rates) and false negative rates; (2) high training and testing times; (3) an
exponential relationship between the size of the training data and the number of
detectors possibly generated for testing; and (4) changeable definitions of "normal data"
and "abnormal data" in dynamic network environments [55, 79, 92]. To overcome these
limitations, recent works have concentrated on complex structures of immune detectors,
matching methods and hybrid NSAs [11, 94, 52].
Following the trends mentioned above, in this thesis we investigate the ability of
NSA to combine with other classification methods, and we propose more effective data
representations to mitigate some of NSA's weaknesses.
Scientific meaning of the thesis: to provide further background for improving the
performance of the AIS-based computer security field in particular and IDSs in general.
Practical meaning of the thesis: to assist computer security practitioners and experts
in implementing their IDSs with new AIS-based features.
The major contributions of this research are: to propose a new representation of
data for better IDS performance; to propose a combination of existing algorithms as
well as some statistical approaches in a uniform framework; and to propose a complete
and non-redundant detector representation to achieve optimal time and memory
complexity.
Objectives
Since data representation is one of the factors that affect the training and testing
time, a compact and complete detector generation algorithm is investigated.
The thesis investigates optimal algorithms to generate detector set in AIS. They
help to reduce both training time and detecting time of AIS-based IDSs.
Also, it is regarded to propose and investigate an AIS-based IDS that can
promptly detect attacks, either if they are known or never seen before. The proposed
system makes use of AIS with statistics as analysis methods and flow-based network
traffic as experimental data.
Problem statements
Since the NSA has some limitations as listed in the first section, this thesis
concentrates on three problems:
1. The first problem is to find compact representations of data. Objectives of this
problem’s solution is not only to minimize memory storage but also to reduce
testing time.
2. The second problem is to propose algorithms that can reduce training time
and testing time in compared with all existing related algorithms.
3. The third problem is to improve detection performance by reducing false alarm
rates while keeping the detection rate and accuracy rate as high as possible.
Solutions to these problems can partly alleviate the first three weaknesses listed in the
first section. Regarding the last weakness of NSAs, the changeable definitions of
"normal data" and "abnormal data" in dynamic network environments, we consider it
a risk in our proposed algorithms and leave it for future work.
Logically, it is impossible to find a single optimal algorithm that both reduces time
and memory complexity and obtains the best detection performance; these aspects are
always in conflict with each other. Thus, in each chapter, we propose algorithms
that address each problem relatively independently.
The intrusion detection problem addressed in this thesis can be informally
stated as follows:
Given a finite set S of network flows, each labeled as self (normal) or nonself
(abnormal), the objective is to build classification models on S that can correctly label an
unlabeled network flow s.
Outline of thesis
The first chapter introduces the background knowledge necessary to discuss the
algorithms proposed in the following chapters. First, detection of network anomalies is
briefly introduced. Following that, the human immune system, artificial immune
systems, machine learning and their relevance are reviewed and discussed. Then, the
popular datasets used for the experiments in the thesis are examined.
In Chapter 2, a method combining selection algorithms is presented. The
proposed technique helps to reduce the storage of detectors generated in the training
phase. Testing time, an important measurement in IDS, is also reduced as a direct
consequence of the smaller memory complexity. A tree structure is used in this chapter
(and in Chapter 5) to improve time and memory complexity.
A complete and non-redundant detector set, also called a perfect detector set,
is necessary to achieve acceptable self and nonself coverage of classifiers. A selection
algorithm that generates a perfect detector set is investigated in Chapter 3. Each detector
in the set is a string concatenated from overlapping classical ones. Unlike the
approaches in the other chapters, the discrete structure of the string-based detectors in
this chapter is suitable for detection in distributed environments.
Chapter 4 presents two selection algorithms for a fast training phase. These optimal
algorithms can generate a detector set in linear time with respect to the size of the
training data. The experimental results and theoretical proofs show that the proposed
algorithms outperform all existing ones in terms of training time. In terms of detection
time, the first algorithm is linear and the second is polynomial.
Chapter 5 introduces a hybrid approach that combines a positive selection algorithm
with statistics for a more effective NIDS. Frequencies of self and nonself data (strings) are
stored in the leaves of the trees representing detectors. This information plays an important
role in improving the performance of the proposed algorithms. The hybrid approach
results in a new positive selection algorithm for two-class classification that can be
trained with samples from both self and nonself data types.
Chapter 1
BACKGROUND
The human immune system (HIS) successfully protects our bodies against
attacks from various harmful pathogens, such as bacteria, viruses, and parasites. It
distinguishes pathogens from self tissue and then eliminates them. This provides a rich
source of inspiration for computer security systems, especially intrusion detection
systems [92]. Hence, applying theoretical immunology, observed immune functions, and
their principles and models to IDS has gradually developed into a new research field,
called artificial immune systems (AIS).
How to apply the remarkable features of the HIS to achieve scalable and robust IDSs
is considered a research gap in the field of computer security. In this chapter, we
introduce the background knowledge necessary to discuss the algorithms proposed in
the following chapters, which can partly fill this gap.
First, a brief introduction to network anomaly detection is presented. We
then give an overview of the HIS. Next, immune selection algorithms, detectors,
performance metrics and their relevance are reviewed and discussed. Finally, some
popular datasets are examined.
1.1 Detection of Network Anomalies
The idea of intrusion detection is predicated on the belief that an intruder's
behavior is noticeably different from that of a legitimate user and that many
unauthorized actions are detectable [65]. Intrusion detection systems (IDSs) are deployed as a
second line of defense along with other preventive security mechanisms, such as user
authentication and access control. Based on its deployment, an IDS can act either as
a host-based or as a network-based IDS.
1.1.1 Host-Based IDS
A Host-Based IDS (HIDS) monitors and analyzes the internals of a computing
system. A HIDS may detect internal activity, such as which program accesses what
resources and attempts illegitimate access, for example, an activity that modifies the
system password database. Similarly, a HIDS may examine the state of a system and
its stored information, whether in RAM, in the file system, in log files or
elsewhere. Thus, one can think of a HIDS as an agent that monitors whether anything
or anyone, internal or external, has circumvented the security policy that the operating
system tries to enforce [12].
1.1.2 Network-Based IDS
A Network-Based IDS (NIDS) detects intrusions in network data. Intrusions
typically appear as anomalous patterns. Most techniques model the data sequentially
and detect anomalous subsequences. The primary cause of these anomalies
is attacks launched by outside attackers who want to gain unauthorized access to
the network in order to steal information or to disrupt it. In a typical setting, a
network is connected to the rest of the world through the Internet. The NIDS reads
all incoming packets or flows, trying to find suspicious patterns. For example, if a
large number of TCP connection requests to a very large number of different ports are
observed within a short time, one can assume that someone is performing a
port scan against some of the computers in the network. A NIDS can likewise detect
incoming shellcodes in the same manner that an ordinary intrusion detection system
does. In addition to inspecting incoming traffic, a NIDS also provides valuable
information about intrusions in outgoing or local traffic. Some attacks might even be
staged from inside the monitored network or network segment, and are therefore not
seen as incoming traffic at all. The data available to intrusion detection systems
can be at different levels of granularity, such as packet-level traces or Cisco NetFlow data.
The data is high dimensional, typically with a mix of categorical and continuous
numeric attributes. Misuse-based NIDSs attempt to search for known intrusive patterns,
while an anomaly-based intrusion detector searches for unusual patterns. Today,
intrusion detection research is mostly concentrated on anomaly-based network intrusion
detection because it can detect both known and unknown attacks [12].
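The port-scan example above can be expressed as a crude sliding-window heuristic. The sketch below is only illustrative: the event format, window and threshold are our own assumptions, not a mechanism described in this thesis.

```python
from collections import defaultdict

def port_scan_suspects(events, window, port_threshold):
    # events: iterable of (timestamp, src_ip, dst_port) TCP connection
    # requests. A source is flagged if it contacts more than
    # port_threshold distinct ports within any window-second span.
    by_src = defaultdict(list)
    for t, src, port in events:
        by_src[src].append((t, port))
    suspects = set()
    for src, evs in by_src.items():
        evs.sort()
        for i, (t0, _) in enumerate(evs):
            # Distinct ports contacted within `window` seconds of t0.
            ports = {p for t, p in evs[i:] if t - t0 <= window}
            if len(ports) > port_threshold:
                suspects.add(src)
                break
    return suspects
```

A real NIDS would of course work on live packet or flow captures rather than an in-memory event list, but the windowed distinct-port count is the essence of the heuristic.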
1.1.3 Methods
On the basis of the availability of prior knowledge, the detection mechanism
used, the mode of performance and the ability to detect attacks, existing anomaly
detection methods are categorized into six broad categories [41] as shown in Fig. 1.1.
This figure is adapted from [12].
Figure 1.1: Classification of anomaly-based intrusion detection methods
AIS is a fairly new research subfield of computational intelligence, which has
been characterized as the study of systems that act intelligently: what such a system
does is appropriate for its circumstances and its goals; it is flexible to changing
environments and changing goals; it learns from experience; and it makes appropriate
choices given perceptual limitations and finite computation [68].
1.1.4 Tools
IDS tools are used for purposes such as information gathering, victim
identification, packet capture, network traffic analysis and visualization of traffic behavior.
Examples of such tools, both commercial and free, include Snort, Suricata, Bro,
OSSEC, Samhain, Cisco Secure IDS, CyberCop, and RealSecure. Some
immune-related IDS tools include LISYS [10], which is based on TCP packets, and
MILA [26], a multilevel immune learning algorithm proposed for novel pattern
recognition.
However, despite their initially promising and influential properties,
immune-based IDSs never made it beyond the prototype stage [83]. Two main issues that
impeded the progress of immune algorithms have been identified: the large computational
cost of achieving acceptable coverage of the potentially anomalous region [54], and the
failure of these algorithms to generalize properly beyond the training set [79].
1.2 A brief overview of human immune system
Inspired mainly by the human immune system, researchers have developed
AISs intellectually and innovatively. The HIS comprises a multi-layered protection
architecture whose main components are physical barriers, physiological barriers, an
innate immune system, and an adaptive immune system. Among these, the adaptive
immune system, which is capable of adaptively recognizing specific types of pathogens
and memorizing them for accelerated future responses, is a complex of a variety of
molecules, cells, and organs spread all over the body [46]. Pathogens are foreign
substances such as viruses, parasites and bacteria that attack the body. Figure 1.2,
adapted from [77], presents this multi-layered protection and elimination architecture.
T cells and B cells cooperate to distinguish self from nonself. On the one hand,
T cells recognize antigens with the help of major histocompatibility complex (MHC)
molecules. Antigen presenting cells ingest and fragment antigens to peptides. MHC
molecules transport these peptides to the surface of antigen presenting cells. T cells,
whose receptors bind with these peptide-MHC combinations, are said to recognize
antigens.
Figure 1.2: Multi-layered protection and elimination architecture
On the other hand, B cells recognize antigens by binding their receptors
directly to antigens. The bindings actually are chemical bonds between receptors and
epitopes. The more complementary the structure and the charge between receptors and
epitopes are, the more likely binding will occur. The strength of the bond is termed
affinity. To avoid autoimmunity, T cells and B cells must pass a negative selection
stage, where lymphocytes matching self cells are killed.
Prior to negative selection, T cells undergo positive selection. This is because in
order to bind to the peptide-MHC combinations, they must recognize self MHC first.
Thus, the positive selection will eliminate T cells with weak bonds to self MHC. T cells
and B cells, which survive the negative selection, become mature, and enter the blood
stream to perform the detection task. Since these mature lymphocytes have never
encountered antigens, they are naive. Naive T cells and B cells can possibly auto-react
with self cells, because some peripheral self proteins are never presented during the
negative selection stage. To prevent self-attack, naive cells need two signals in order
to be activated: one occurs when they bind to antigens, and the other is from other
sources as a confirmation. Naive T helper cells receive the second signal from innate
system cells. In the event that they are activated, T cells begin to clone. Some of
the clones will send out signals to stimulate macrophages or cytotoxic T cells to kill
antigens, or send out signals to activate B cells. Others will form memory T cells. The
activated B cells migrate to a lymph node. In the lymph node, a B cell will clone itself.
Meanwhile, somatic hypermutation is triggered, whose rate is 10 times higher than
that of the germ-line mutation, and is inversely proportional to the affinity. Mutation
changes the receptor structures of offspring; hence the offspring have to bind to pathogenic
epitopes captured within the lymph nodes. If they do not bind, they will simply die
after a short time. If they succeed in binding, however, they will leave the lymph
node and differentiate into plasma or memory B cells.
In summary, the HIS is a distributed, self-organizing and lightweight defense
system for the body. These remarkable features fulfill and benefit the design goals of
an intrusion detection system, thus resulting in a scalable and robust system [53].
1.3 AIS for IDS
1.3.1 AIS model for IDS
Figure 1.3 illustrates the steps necessary to obtain an AIS solution for a
security problem, as first envisioned by de Castro and Timmis [27] and later adopted
by Fernandes et al. [35]. First, the security domain of the system to be modeled needs
to be identified. Second, the immune entities that best fit the needs of the system
should be picked from the immunological theories; this should make it easier to point out
the representation of the entities. In the affinity-measure step, one should adopt
a matching rule that outputs whether two elements should bind.
Figure 1.3: Multi-layer AIS model for IDS
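As a concrete example of an affinity measure, the r-contiguous matching rule used later in this thesis (Section 1.5) declares that two equal-length strings bind if they agree in at least r contiguous positions. A minimal sketch (the function name is our own):

```python
def r_contiguous_match(d, x, r):
    # d, x: equal-length strings over the same alphabet.
    # They match if they agree on at least r contiguous positions.
    run = 0
    for a, b in zip(d, x):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False
```

Raising r makes the rule more specific (fewer pairs bind), which is the knob an AIS designer turns in the affinity-measure step.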
1.3.2 AIS features for IDS
According to Kim et al. [55], AIS features can be illustrated and summarized
as follows.
Firstly, a distributed IDS supports robustness, configurability, extendibility and
scalability. It is robust since the failure of one local intrusion detection process does
not cripple the overall IDS. It is also easy to configure a system since each intrusion
detection process can be simply tailored for the local requirements of a specific host.
The addition of new intrusion detection processes running on different operating
systems does not require modification of existing processes and hence it is extensible. It
can also scale better, since the high volume of audit data is distributed amongst many
local hosts and is analyzed by those hosts.
Secondly, a self-organizing IDS provides adaptability and global analysis.
Without external management or maintenance, a self-organizing IDS automatically detects
intrusion signatures which are previously unknown and/or distributed, and eliminates
and/or repairs compromised components. Such a system is highly adaptive because
there is no need for manual updates of its intrusion signatures as network environments
change. Global analysis emerges from the interactions among a large number of varied
intrusion detection processes.
Next, a lightweight IDS supports efficiency and dynamic features. A lightweight
IDS does not impose a large overhead on a system or place a heavy burden on CPU
and I/O. It places minimal work on each component of the IDS. The primary functions
of hosts and networks are not adversely affected by the monitoring. It also dynamically
covers intrusion and non-intrusion pattern spaces at any given time, rather than
maintaining entire intrusion and non-intrusion patterns.
Another important feature is multi-layering: a multi-layered IDS increases robustness.
The failure of one layer of defense does not necessarily allow an entire system to be
compromised. While a distributed IDS allocates intrusion detection processes across
several hosts, a multi-layered IDS places different levels of sensors at one monitoring
place.
Additionally, a diverse IDS provides robustness. A variety of different intrusion
detection processes spread across hosts will slow an attack that has successfully
compromised one or more hosts. This is because an understanding of the intrusion process
at one site provides limited or no information about intrusion processes at other sites.
Finally, a disposable IDS increases robustness, extendibility and configurability.
A disposable IDS does not depend on any single component. Any component
can be easily and automatically replaced with other components. These properties are
important in an effective IDS, as well as being established properties of the HIS.
1.4 Selection algorithms
The main developments within AIS have focused on three immunological theories:
clonal selection, immune networks and negative selection. Negative selection
approaches are based on self-nonself discrimination in biological systems. This property
makes them attractive to computer and network security researchers. A survey by G. C.
Silva and D. Dasgupta [71] showed that in the five-year period 2008-2013, NSAs
predominated over all other AIS models in terms of published papers relating to both network
security and anomaly detection. This trend motivates much of the research work in
this thesis.
Another AIS model, the positive selection algorithm (PSA), is also investigated. Under
some conditions, we will prove in a following section that PSA is equivalent to NSA in terms
of anomaly detection performance.
1.4.1 Negative Selection Algorithms
Negative selection is a mechanism employed to protect the body against self-
reactive lymphocytes. Such lymphocytes can occur because the building blocks of
antibodies are different gene segments that are randomly composed and undergo a fur-
ther somatic hypermutation process. Therefore, this process can produce lymphocytes
which are able to recognise self-antigens [85].
NSAs are among the most popular and extensively studied techniques in ar-
tificial immune systems that simulate the negative selection process of the biological
immune system. Stephanie Forrest et al. [38] proposed an algorithmic model of this
process, which can be considered as a classifier that learns from only self samples
(negative examples).
A typical NSA comprises two phases: detector generation and detection [7,
50]. In the detector generation phase (Fig. 1.4.a), the detector candidates are generated
by some random processes and censored by matching them against given self samples
taken from a set S (representing the system components). The candidates that match
any element of S are eliminated and the rest are kept and stored in the detector
set D. In the detection phase (Fig. 1.4.b), the collection of detectors is used to
distinguish self (system components) from nonself (outliers, anomalies, etc.). If an incoming
data instance matches any detector, it is claimed as nonself or an anomaly. Figure 1.4 is
adapted from [38].
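The two phases can be sketched generically in Python; here `matches` is a placeholder for any matching rule, and the toy run uses exact equality (a minimal illustrative sketch, not the thesis implementation):

```python
def generate_detectors(candidates, self_set, matches):
    """Censoring phase (Fig. 1.4.a): keep every candidate that
    matches no sample in the self set."""
    return [d for d in candidates
            if not any(matches(d, s) for s in self_set)]

def classify(x, detectors, matches):
    """Detection phase (Fig. 1.4.b): any matching detector
    claims the instance as nonself (anomalous)."""
    return 'nonself' if any(matches(x, d) for d in detectors) else 'self'

# Toy run with exact-equality matching over all 2-bit strings
exact = lambda a, b: a == b
detectors = generate_detectors(['00', '01', '10', '11'], ['00', '11'], exact)
print(detectors)                         # ['01', '10']
print(classify('01', detectors, exact))  # nonself
```

Any of the matching rules discussed below (rcb, r-chunk, Euclidean distance, etc.) can be plugged in for `matches` without changing the structure of the two phases.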
Figure 1.4: Outline of a typical negative selection algorithm.
The concept of matching, or recognition, is used both in the detector generation phase
and in the anomaly detection phase. Regardless of representation, a matching rule on
a detector d and a data sample s can be informally defined as a distance measure
between d and s within a threshold. The matching threshold embodies the concept of partial
matching: two points do not have to be exactly the same to be considered matching.
(At the time of writing this thesis, the paper by Forrest et al. [38] had been cited more than 2,300 times.)
A partial matching rule can support an approximation or a generalization in the algo-
rithms. The choice of the matching rule or the threshold in a matching rule must be
application specific and representation dependent [51]. For real-valued representations,
popular rules include the Euclidean distance and the Manhattan distance. For string
representations, the rcb (r-contiguous bits) matching rule and the r-chunk matching rule
are the most widely used, and they are formally presented in the following section.
Since its introduction, NSA has had many applications such as in computer virus
detection [37, 5], monitoring UNIX processes [36], anomaly detection [22, 26], intrusion
detection [19, 54, 46, 59, 18, 93], scheduling [64], fault detection and diagnosis [45, 72],
negative database [33, 98], negative authentication [25, 20]. Moreover, NSA has been
quite successfully applied in immunology where they are used as models to provide
insight into fundamental principles of immunity and infection [15], and to illustrate
the immunological processes such as HIV infection [56, 57].
The most significant characteristics of an NSA, which account for its uniqueness and
strength, are:
• No prior knowledge of nonself is required [29].
• It is inherently distributable; no communication between detectors is needed [30].
• It can hide the self concept [33].
• Compared with other change detection methods, NSAs do not depend on knowledge
of a predefined normal. Consequently, the checking activity at each site can
be based on a signature unique to that site, while the same algorithm is used over
multiple sites.
• The quality of the check can be traded off against the cost of performing a check
[38].
• Symmetric protection is provided, so malicious manipulation of the detector set
can be detected through the normal behavior of the system [38].
• If the process of generating detectors is costly, it can be distributed to multiple
sites because of its inherent parallel characteristics.
• Detection is tunable to balance between coverage (matching probability) and the
number of detectors [29].
1.4.2 Positive Selection Algorithms
Contrary to NSAs, PSAs have been less studied in the literature. PSAs are
mainly developed and applied in intrusion detection [23, 73, 44, 66], malware detection
[39], spam detection [81], and classification [40, 67]. Stibor et al. [80] argue that
positive selection might have better detection performance than negative selection.
However, for problems and applications where the number of detectors generated by NSAs is
much smaller than the number of self samples, negative selection is obviously the better choice [51].
Similar to NSA, a PSA contains two phases: detector generation and detection.
In the detector generation phase (Fig. 1.5.a), the detector candidates are generated by
some random processes and matched against the given self sample set S. The candi-
dates that do not match any element in S are eliminated and the rest are kept and
stored in the detector set D. In the detection phase (Fig. 1.5.b), the collection of
detectors is used to distinguish self from nonself. If an incoming data instance matches any
detector, it is claimed as self.
Figure 1.5: Outline of a typical positive selection algorithm.
In other words, detector modeling involves generating a
set of strings (patterns) that do not match any string in a training dataset too strongly
(negative selection) or weakly match at least one string from the same dataset (positive
selection). Having obtained the detectors, one examines a testing dataset
(i.e., "antigens"), for which we search for one or all matching detectors to perform classification.
1.5 Basic terms and definitions
In selection algorithms, an essential component is the matching rule, which determines
the similarity between detectors and self samples (in the detector generation
phase) or incoming data instances (in the detection phase). Obviously, the matching
rule depends on the detector representation. In this thesis, both self and nonself cells
rule is dependent on detector representation. In this thesis, both self and nonself cells
are represented as strings of fixed length. This representation is a simple and popular
representation for detectors and data in AIS, and other representations (such as real
valued) could be reduced to binary, a special case of string [42, 51].
1.5.1 Strings, substrings and languages
An alphabet Σ is a nonempty, finite set of symbols. A string s ∈ Σ∗ is a
sequence of symbols from Σ, and its length is denoted by |s|. A string is called the empty
string if its length equals 0. Given an index i ∈ {1, . . . , |s|}, s[i] is the symbol
at position i in s. Given two indices i and j: whenever j ≥ i, s[i . . . j] is the
substring of s of length j − i + 1 that starts at position i; whenever j < i,
s[i . . . j] is the empty string. If i = 1, then s[i . . . j] is a prefix of s and, if j = |s|,
then s[i . . . j] is a suffix of s. For a proper prefix or suffix s′ of s, we have in addition
|s′| < |s|. Given a string s ∈ Σℓ, another string d ∈ Σr with 1 ≤ r ≤ ℓ, and an index
i ∈ {1, . . . , ℓ − r + 1}, we say that d occurs in s at position i if s[i . . . i + r − 1] = d.
Moreover, the concatenation of two strings s and s′ is written s + s′.
A set of strings S ⊆ Σ∗ is called a language. For two indices i and j, we define
S[i . . . j] = {s[i . . . j] | s ∈ S}.
1.5.2 Prefix trees, prefix DAGs and automata
A prefix tree T is a rooted directed tree with edge labels from Σ where for all
c ∈ Σ, every node has at most one outgoing edge labeled with c. For a string s, we write
s ∈ T if there is a path from the root of T to a leaf such that s is the concatenation
of the labels on this path. The language L(T ) described by T is defined as the set of
all strings that have a nonempty prefix s ∈ T. For example, for T as in Fig. 1.6.a we
have 0 ∈ T and 10 ∈ T, but 1 ∉ T. Furthermore, 0 ∈ L(T) and 01 ∈ L(T) since 0 ∈ T, and
11 ∉ L(T) since no prefix of 11 lies in T. Trees for a self dataset and a nonself dataset are
called positive trees and negative trees, respectively.
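The membership tests s ∈ T and s ∈ L(T) can be sketched with nested dictionaries (an illustrative sketch; it assumes no stored string is a proper prefix of another, which holds for sets of equal-length detector strings):

```python
def build_prefix_tree(strings):
    """Prefix tree as nested dicts; a leaf is an empty dict."""
    root = {}
    for s in strings:
        node = root
        for c in s:
            node = node.setdefault(c, {})
    return root

def in_tree(tree, s):
    """s ∈ T: the path labeled by s runs from the root to a leaf."""
    node = tree
    for c in s:
        if c not in node:
            return False
        node = node[c]
    return node == {}          # the path must end exactly at a leaf

def in_language(tree, s):
    """s ∈ L(T): some nonempty prefix of s lies in T."""
    return any(in_tree(tree, s[:i]) for i in range(1, len(s) + 1))

# A tree containing 0 and 10, consistent with the facts stated for Fig. 1.6.a
T = build_prefix_tree(['0', '10'])
print(in_tree(T, '1'), in_language(T, '01'), in_language(T, '11'))
# False True False
```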
A prefix DAG D is a directed acyclic graph with edge labels from Σ, where
again for all c ∈ Σ, every node has at most one outgoing edge labeled with c. As with
prefix trees, the terms root and leaf refer to a node without incoming and
outgoing edges, respectively. We write s ∈ D if there is a path from a root node to
a leaf node in D that is labeled by s. Given n ∈ D, the language L(D, n) contains
all strings that have a nonempty prefix that labels a path from n to some leaf. For
instance, if D is the DAG in Fig. 1.6.b and n is its lower left node, then L(D, n) consists
of all strings starting with 11. Moreover, we define L(D) = ∪n L(D, n), where n ranges
over the roots of D.
A finite automaton is a tuple M = (Q, q0, Qa, Σ, ∆), where Q is a set of states
with a distinguished initial state q0 ∈ Q, Qa ⊆ Q is the set of accepting states, Σ is the
alphabet of M, and ∆ ⊆ Q × Σ × Q is the transition relation. Furthermore, we assume
that the transition relation is unambiguous: for every q ∈ Q and every c ∈ Σ there is
at most one q′ ∈ Q with (q, c, q′) ∈ ∆. It is common to represent the transition relation
as a graph with node set Q (with the initial state and the accepting states highlighted
properly) and labeled edges (a c-labeled edge from q to q′ whenever (q, c, q′) ∈ ∆).
An automaton M is said to accept a string s if its graph contains a path from q0 to
some q ∈ Qa whose concatenated edge labels equal s (note that this path may contain
loops). The language L(M) contains all strings accepted by M.
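Because the transition relation is unambiguous, it can be stored as a partial function from (state, symbol) pairs to states, and acceptance reduces to a single walk along the input (an illustrative sketch with a small hypothetical automaton, not one from the thesis):

```python
def accepts(delta, q0, accepting, s):
    """Walk the unambiguous transition relation; accept if we end in Qa."""
    q = q0
    for c in s:
        if (q, c) not in delta:   # no outgoing edge: the path cannot be extended
            return False
        q = delta[(q, c)]
    return q in accepting

# Hypothetical automaton over {0, 1} accepting strings of 0s followed by a single 1
delta = {('A', '0'): 'A', ('A', '1'): 'B'}
print(accepts(delta, 'A', {'B'}, '001'))  # True
```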
A prefix DAG can be turned into a finite automaton to decide the membership
of strings in languages. The detailed steps of this process are presented in Chapter 4.
Figure 1.6: Example of a prefix tree and a prefix DAG.
1.5.3 Detectors
As noted at the beginning of this section, the matching rule depends on the detector
representation. For string-based AIS, the r-chunk and r-contiguous matching rules are
among the most common. The r-chunk matching rule can be seen as a generalisation
of the r-contiguous matching rule, which helps
AIS to achieve better results on data where adjacent regions of the input data sequence
are not necessarily semantically correlated, such as in network data packets [9].
An important difference between the rcb and r-chunk matching rules lies in the holes, or
undetectable strings, that they may induce. This concept is presented in Section 1.5.5.
Given a nonempty, finite alphabet Σ, positive and negative r-chunk detectors,
r-contiguous detectors, and rcbvl detectors can be defined as follows:
Definition 1.1 (Positive r-chunk detectors). Given a self set S ⊆ Σℓ, a tuple (d, i) of
a string d ∈ Σr, where r ≤ ℓ, and an index i ∈ {1, ..., ℓ − r + 1} is a positive r-chunk
detector if there exists an s ∈ S such that d occurs in s at position i.
Definition 1.2 (Negative r-chunk detectors). Given a self set S ⊆ Σℓ, a tuple (d, i) of
a string d ∈ Σr, r ≤ ℓ, and an index i ∈ {1, ..., ℓ − r + 1} is a negative r-chunk detector
if d does not occur at position i in any s ∈ S.
Although the approaches proposed in the following chapters can be implemented
over any finite alphabet, all examples use binary strings, Σ = {0, 1}, for ease of
understanding.
Example 1.1. Let ℓ = 6, r = 3. Given a set S of five self strings: s1 = 010101, s2
= 111010, s3 = 101101, s4 = 100011, s5 = 010111. A set of some positive r-chunk
detectors is {(010,1), (111,1), (101,2), (110,2), (010,3), (101,3), (101,4), (010,4),
(111,4)}. A set of some negative r-chunk detectors is {(000,1), (001,1), (011,1),
(001,2), (010,2), (100,2), (000,3), (100,3), (000,4), (001,4), (100,4)}.
Definition 1.3. Given a self set S ⊆ Σℓ, a string d ∈ Σℓ is an r-contiguous detector if
d[i, . . . , i + r − 1] does not occur at position i in any s ∈ S, for all i ∈ {1, ..., ℓ − r + 1}.
Example 1.2. Let ℓ = 5, r = 3. Given a set of seven self strings S = {01111, 00111,
10000, 10001, 10010, 10110, 11111}. The set of all 3-contiguous detectors is {01011,
11011}. This example is adapted from [32].
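For small ℓ, the set of r-contiguous detectors can be computed exhaustively; the sketch below (illustrative Python, not the thesis implementation) reproduces the detector set of Example 1.2:

```python
from itertools import product

def rcb_match(d, s, r):
    """r-contiguous rule: equal-length strings d and s match if they
    agree in at least r contiguous positions."""
    run = 0
    for a, b in zip(d, s):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False

def cont_detectors(S, length, r):
    """CONT(S, r): every binary string of the given length that
    matches no self string under the r-contiguous rule."""
    out = []
    for tup in product('01', repeat=length):
        cand = ''.join(tup)
        if not any(rcb_match(cand, s, r) for s in S):
            out.append(cand)
    return out

S = ['01111', '00111', '10000', '10001', '10010', '10110', '11111']
print(cont_detectors(S, 5, 3))  # ['01011', '11011']
```

The exhaustive enumeration is exponential in ℓ, which is exactly the scalability issue that motivates the compact detector representations studied later in the thesis.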
We also use the following notations:
• Dpi = {(d, i) | (d, i) is a positive r-chunk detector} is the set of all positive r-chunk
detectors at position i, i = 1, . . . , ℓ − r + 1.
• Dni = {(d, i) | (d, i) is a negative r-chunk detector} is the set of all negative r-chunk
detectors at position i, i = 1, . . . , ℓ − r + 1.
• CHUNKp(S, r) = ∪_{i=1}^{ℓ−r+1} Dpi is the set of all positive r-chunk detectors.
• CHUNK(S, r) = ∪_{i=1}^{ℓ−r+1} Dni is the set of all negative r-chunk detectors.
• CONT(S, r) is the set of all r-contiguous detectors, that is, all strings that do not
match any string in S under the r-contiguous rule.
• For a given detector set X, L(X) is the set of all nonself strings detected by X.
We also say that Σℓ \ L(X) is the set of all self strings detected by X.
Example 1.3. Let ℓ = 5 and matching threshold r = 3. Suppose that we have the set S
of six self strings s1 = 00000, s2 = 00010, s3 = 10110, s4 = 10111, s5 = 11000, s6 =
11010. Then Dp1 = {(000,1), (101,1), (110,1)} (Dp1 is the set of all leftmost substrings of
length r of all s ∈ S), Dn1 = {(001,1), (010,1), (011,1), (100,1), (111,1)}, Dp2 =
{(000,2), (001,2), (011,2), (100,2), (101,2)}, Dn2 = {(010,2), (110,2), (111,2)}, Dp3
= {(000,3), (010,3), (110,3), (111,3)}, Dn3 = {(001,3), (011,3), (100,3), (101,3)}
(note that Dpi ∪ Dni = Σ3, i = 1, 2, 3).
The self space covered by the set of CHUNKp(S, 3) is {0, 1}5\ L(CHUNKp(S, 3))
= {00000, 00001, 00010, 00011, 00110, 00111, 01000, 01001, 01010, 01011, 01110,
01111, 10000, 10001, 10010, 10011, 10100, 10101, 10110, 10111, 11000, 11001, 11010,
11011, 11110, 11111}. Set of all strings detected by CHUNK(S,3) is L(CHUNK(S,3))
= {00001, 00011, 00100, 00101, 00110, 00111, 01000, 01001, 01010, 01011, 01100,
01101, 01110, 01111, 10000, 10001, 10010, 10011, 10100, 10101, 11001, 11011, 11100,
11101, 11110, 11111}.
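The per-position detector sets of Example 1.3 follow directly from Definitions 1.1 and 1.2, and can be computed mechanically (an illustrative sketch; `chunk_detectors` is a name chosen here, not from the thesis):

```python
from itertools import product

def chunk_detectors(S, r):
    """Positive (Dp) and negative (Dn) r-chunk detector sets per 1-based position."""
    l = len(S[0])
    universe = {''.join(t) for t in product('01', repeat=r)}
    Dp, Dn = {}, {}
    for i in range(1, l - r + 2):
        occurring = {s[i - 1:i - 1 + r] for s in S}   # chunks occurring at position i
        Dp[i] = occurring                             # positive detectors
        Dn[i] = universe - occurring                  # negative detectors
    return Dp, Dn

S = ['00000', '00010', '10110', '10111', '11000', '11010']
Dp, Dn = chunk_detectors(S, 3)
print(sorted(Dp[1]), sorted(Dn[2]))  # ['000', '101', '110'] ['010', '110', '111']
```

Since Dn[i] is the complement of Dp[i] within Σ^r, the code also makes the remark Dpi ∪ Dni = Σ³ of Example 1.3 explicit.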
Definition 1.4. Given a self set S ⊆ Σℓ, a triple (d, i, j) of a string d ∈ Σk, 1 ≤ k ≤ ℓ,
an index i ∈ {1, ..., ℓ − r + 1} and an index j ∈ {i, ..., ℓ − r + 1} is called a negative
detector under the rcbvl matching rule if d does not occur in any s ∈ S.
In other words, a triple (d, i, j) is a rcbvl detector if there exist (j − i + 1) r-chunk
detectors (d1, i), ..., (d_{j−i+1}, j) such that dk and dk+1 are two (r − 1)-symbol overlapping
strings, for k = 1, ..., j − i.
Example 1.4. Given ℓ, r and the set S of self strings as in Example 1.1, S = {010101,
111010, 101101, 100011, 010111}. The triple (0001,1,2) is a rcbvl detector because there
exist two 3-chunk detectors (000,1) and (001,2) such that 000 and 001 are two 2-bit
overlapping strings. A detector set under the rcbvl matching rule contains 5 variable-length
detectors: {(0001,1,2), (00100,1,4), (100,4,4), (011110,1,4), (11000,1,3)}. It is a minimum
detector set (23 bits) that covers the entire detector space of the r-chunk detector set in
Example 1.1 (45 bits).
Matching threshold r plays an important role in selection algorithms. The
value of r can be used to balance between underfitting and overfitting. Our proposed
methods in Chapter 5 investigate this value in combination with simple statistics for
better detection performance.
1.5.4 Detection in r-chunk detector-based positive selection
It can be seen from Example 1.3 that L(CHUNKp(S, r)) = {00100, 00101,
01100, 01101, 11100, 11101} ≠ L(CHUNK(S, r)), so the detection coverage of Dn
is not the same as that of Dp. This is undesirable for the combination of PSA and
NSA. Hence, to combine PSA and NSA in a unified framework, we have to change the
original semantics of positive selection in the detection phase as follows.
Definition 1.5 (Detection in positive selection). If a new instance matches ℓ − r + 1
positive r-chunk detectors (di, i), i = 1, . . . , ℓ − r + 1, i.e., one at every position, it is
claimed as self; otherwise it is claimed as nonself.
With this new detection semantic, the following theorem on the equivalence of
detection coverage of r-chunk type PSA and NSA could be stated.
Theorem 1.1 (Detection Coverage). The detection coverages of the positive and negative
selection algorithms coincide:
L(CHUNKp(S, r)) = L(CHUNK(S, r)).    (1.1)
Proof. From the description of NSAs (see Fig. 1.4), if a new data instance matches
a negative r-chunk detector, then it is claimed as nonself, otherwise it is claimed as
self. Obviously, it is dual to the detection of new data instances in positive selection
as given in Definition 1.5.
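Theorem 1.1 can be checked exhaustively on the data of Example 1.3: the sketch below (illustrative code, not from the thesis) rebuilds the r-chunk detectors and verifies that, under Definition 1.5, the two algorithms classify every string of {0, 1}^5 identically:

```python
from itertools import product

def chunk_sets(S, r):
    """Per-position positive (Dp) and negative (Dn) r-chunk detector sets."""
    l = len(S[0])
    universe = {''.join(t) for t in product('01', repeat=r)}
    Dp = {i: {s[i:i + r] for s in S} for i in range(l - r + 1)}   # 0-based positions
    Dn = {i: universe - Dp[i] for i in Dp}
    return Dp, Dn

def psa_self(x, Dp, r):
    """Definition 1.5: self iff x matches a positive detector at every position."""
    return all(x[i:i + r] in Dp[i] for i in Dp)

def nsa_nonself(x, Dn, r):
    """NSA detection: nonself iff x matches some negative detector."""
    return any(x[i:i + r] in Dn[i] for i in Dn)

S = ['00000', '00010', '10110', '10111', '11000', '11010']
Dp, Dn = chunk_sets(S, 3)
agree = all(psa_self(''.join(x), Dp, 3) != nsa_nonself(''.join(x), Dn, 3)
            for x in product('01', repeat=5))
print(agree)  # True: the two detection coverages coincide (Theorem 1.1)
```

The check succeeds because Dp[i] and Dn[i] partition Σ^r at every position, so "matches a positive detector everywhere" is exactly the negation of "matches a negative detector somewhere".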
This theorem lays the foundation for our novel Positive-Negative Selection Algorithm
(PNSA) proposed in the next chapter.
1.5.5 Holes
The generalization performed by selection algorithms corresponds to the strings
that are labeled self even though they do not occur in the self sample S. These strings
are called holes. In the rcb matching rule literature, there are two types of holes:
crossover holes and length-limit holes. A crossover hole is a crossover of certain self strings, in
which all substrings of length r occur in self strings. A length-limit hole is one that has
at least one substring of length r that exists in CHUNK(S, r). The r-chunk matching
rule eliminates the problem of length-limit holes.
Fig. 1.7 illustrates the existence of holes in the self and nonself spaces formed
by self and detector strings. The string universe Σℓ is depicted as a square region. Each dark
circle represents a detector, and the grid-shaped region in the middle is self. The universe is
Figure 1.7: Existence of holes.
classified by the detector set as self (grid region and holes - white region) and nonself
(dark region covered by all circles).
In fact, holes are not a "problem"; as pointed out by Stibor et al. [78], they
are a necessary property of selection algorithms. Without holes, the algorithms
would do nothing but naively memorize the training data for classification.
Fig. 1.8 shows the set of seven self strings from Example 1.2, S ⊂ {0, 1}^5 (left),
along with CHUNK(S, 3) (middle) and CONT(S, 3) (right). For both detector types,
the induced bipartitionings of the shape space {0, 1}^5 are illustrated, with strings that
are classified as nonself shown on a gray background and strings that are classified as self
on a white background. Bold strings are members of the self set. Holes are the
strings that are classified as self but do not occur in the self set S (non-bold, non-shaded
strings). This figure is adapted from [32].
1.5.6 Performance metrics
We used three metrics to evaluate each machine learning technique. The detection
rate (DR), accuracy rate (ACC) and false alarm rate (FAR) are defined as:
DR = TP / (TP + FN)    (1.2)
ACC = (TP + TN) / (TP + TN + FP + FN)    (1.3)
FAR = FP / (FP + TN)    (1.4)
Figure 1.8: Negative selections with 3-chunk and 3-contiguous detectors.
Here TP (True Positive) is the number of true positives (correctly classified
as nonself), TN (True Negative) is the number of true negatives (correctly classified
as self), FP (False Positive) is the number of false positives (classified as nonself,
actually self), and FN (False Negative) is the number of false negatives (classified as
self, actually nonself).
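The three metrics translate directly into code (a trivial helper, shown only for concreteness):

```python
def metrics(tp, tn, fp, fn):
    """Detection rate (1.2), accuracy rate (1.3) and false alarm rate (1.4)."""
    dr = tp / (tp + fn)
    acc = (tp + tn) / (tp + tn + fp + fn)
    far = fp / (fp + tn)
    return dr, acc, far

print(metrics(80, 90, 10, 20))  # (0.8, 0.85, 0.1)
```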
We used the 10-fold cross-validation technique and the holdout method to evaluate our
approaches in experiments. Regarding the former, the dataset was randomly partitioned
into 10 subsets. Of the 10 subsets, a single subset was retained for testing, and the
others were used as training data. The process was then repeated 10 times, with each
of the 10 subsets used exactly once as the testing data. The 10 results from the folds
were then averaged to produce a single performance estimate. Regarding the latter, the
dataset was split into two groups: a training set used to train the classifier and a test set
(or holdout set) used to estimate the performance of the classifier.
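The 10-fold procedure can be sketched as follows (illustrative only: indices are assigned round-robin here, whereas the experiments use a random partition):

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

splits = list(k_fold_indices(20))
print(len(splits), len(splits[0][1]))  # 10 2
```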
1.5.7 Ring representation of data
As is well known, most AIS-based applications use two types of data representation:
strings and real-valued vectors. For both of these popular types, the representations are
linear structures of symbols or numbers. They may omit information at the edges (the
beginning and the end) of these structures. For example, to detect a new instance s as self or
nonself in the detection phases of PSA and NSA, it takes ℓ − r + 1 steps in the worst case
to match s against each detector at positions 1, ..., ℓ − r + 1. Therefore, for a given detector
d, the symbols d[1] and s[1] are used in only one match, the symbols d[2] and s[2] are
used in two matches, etc. In other words, the positions in linear structures are not
equal in terms of the number of matches they participate in.
Our early experimental implementation of NSA on binary ring-based strings
motivated us to address this problem. A set of 50,000 random self strings and a set of
10,000 random nonself strings, all of length 50, were used in the experiment.
Table 1.1 shows the experimental results for values of r ranging from 10 to 16 under
the 10-fold cross-validation technique. The results show that both the detection rate and the
accuracy rate of the ring-based NSA are higher than those of the linear-based one, while the
false alarm rates are relatively similar. Accordingly, we could use ring structures instead of
linear ones for more accurate classification.
Table 1.1: Performance comparison of NSAs on linear strings and ring strings.

 r  |  NSA on linear strings  |   NSA on ring strings
    |  FAR     DR      ACC    |  FAR     DR      ACC
 10 | 0.0008  0.0102  0.8343  | 0.0010  0.0123  0.8345
 11 | 0.0051  0.0535  0.8380  | 0.0063  0.0655  0.8390
 12 | 0.0177  0.1817  0.8488  | 0.0212  0.2193  0.8522
 13 | 0.0399  0.4054  0.8677  | 0.0465  0.4704  0.8723
 14 | 0.0640  0.6456  0.8875  | 0.0719  0.7184  0.8932
 15 | 0.0829  0.8293  0.9023  | 0.0888  0.8888  0.9075
 16 | 0.0934  0.9340  0.9112  | 0.0964  0.9662  0.9139
For string-based detector sets, a simple technique for this approach
is to concatenate each string representing a detector with its first k symbols. Each new
linear string is a ring representation of its original string. Fig. 1.9 shows a ring
representation (b) of an original string (a) with k = 3.
Given a set of strings S ⊂ Σℓ, the set Sr ⊂ Σℓ+r−1 consists of the ring representations
of all strings in S, obtained by concatenating each string s ∈ S with its first r − 1 symbols.
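The ring construction is a one-liner (a sketch; `ring` is an illustrative name), and applied to the strings of Example 1.5 it reproduces the set Sr given there:

```python
def ring(s, r):
    """Ring representation: append the first r - 1 symbols of s to s itself,
    so every position takes part in the same number of r-length windows."""
    return s + s[:r - 1]

print([ring(s, 3) for s in ['00000', '10110', '10111', '11111']])
# ['0000000', '1011010', '1011110', '1111111']
```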
Figure 1.9: A simple ring-based representation (b) of a string (a).
Note that we can easily apply the idea of ring strings to other data representations
in AIS. One way to do this, for instance, is to create ring representations of other
structures such as trees, and automata, from set Sr instead of S as usual.
1.5.8 Frequency trees
Given a set D of equal-length strings, a tree T on D, denoted TD, is a rooted
directed tree with edge labels from Σ where, for all c ∈ Σ, every node has at most one
outgoing edge labeled with c. For a string s, we write s ∈ T if there is a path from the
root of T to a leaf such that s is the concatenation of the labels on this path. Each
leaf is associated with an integer, namely the frequency of the string s ∈ D formed by
concatenating the labels on the path ending at this leaf. This tree structure is
a compact representation of r-chunk detectors in our algorithm in Chapter 5.
Example 1.5. Let ℓ = 5 and matching threshold r = 3. Suppose that we have the
set S of four strings: s1 = 00000, s2 = 10110, s3 = 10111, s4 = 11111. Then Sr =
{0000000, 1011010, 1011110, 1111111}. S1 = {(000,1), (101,1), (111,1)}, S2 =
{(000,2), (011,2), (111,2)}, S3 = {(000,3), (110,3), (111,3)}, S4 = {(000,4), (101,4),
(111,4)}, S5 = {(000,5), (010,5), (110,5), (111,5)}.
Assume that S = N ∪ A, where the set of normal data is N = {s1, s2} and the set of
abnormal data is A = {s3, s4}.
Nr = {0000000, 1011010} (ring representations of all strings in N ). N1 =
{(000,1), (101,1)}, N2 = {(000,2), (011,2)}, N3 = {(000,3), (110,3)}, N4= {(000,4),
(101,4)}, N5 = {(000,5), (010,5)}.
Ar = {1011110, 1111111}. A1 = {(101,1), (111,1)}, A2 = {(011,2), (111,2)},
A3 = {(111,3)}, A4= {(111,4)}, A5 = {(110,5), (111,5)}.
The ten trees representing all 3-chunk detectors are shown in Fig. 1.10. The five trees
TNi (TAi), i = 1, . . . , 5, appear in the first (second) row, from left to right, respectively.
Figure 1.10: Frequency trees for all 3-chunk detectors.
Recall that there are some strings that belong to both positive trees and
negative trees. For example, for the substring s2[1 . . . 3] = 101 of s2, we have s2[1 . . . 3] ∈ TN1
and s2[1 . . . 3] ∈ TA1. This situation could lead to errors in the detection phase. Therefore,
frequencies of matches will be used to improve the detection performance of the algorithms.
The detailed technique of using these frequencies is presented in Chapter 5.
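The leaf frequencies can be computed per position with a counter (an illustrative sketch using the data of Example 1.5; `chunk_frequencies` is a name chosen here):

```python
from collections import Counter

def chunk_frequencies(ring_strings, r, i):
    """Frequencies of the r-chunks at 1-based position i; these counts
    label the leaves of the frequency tree for that position."""
    return Counter(s[i - 1:i - 1 + r] for s in ring_strings)

Nr = ['0000000', '1011010']   # ring strings of the normal set N
Ar = ['1011110', '1111111']   # ring strings of the abnormal set A
fN1, fA1 = chunk_frequencies(Nr, 3, 1), chunk_frequencies(Ar, 3, 1)
print(fN1['101'], fA1['101'])  # 1 1: the chunk 101 occurs in both trees
```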
1.6 Datasets
There are two basic types of NIDS, depending on the source of the data to be analyzed:
packet-based NIDSs and flow-based ones. M. H. Bhuyan et al. observed in their review [13]
that most NIDSs are packet-based. However, we concentrate only on
flow-based NIDSs for three reasons: 1) they can detect some special attacks,
such as DDoS or RDP brute force, more efficiently and faster than payload-based ones,
since less information needs to be analyzed [87, 76]; 2) flow-based anomaly
detection methods process only packet headers and therefore reduce the data volume and
processing time, enabling high-speed detection on large networks and addressing the
scalability problem under increasing network usage and load [76]; and 3) flow-based NIDSs
raise fewer privacy issues than packet-based ones because of the absence
of payload [76].
1.6.1 The DARPA-Lincoln datasets
To evaluate the performance of different intrusion detection methodologies,
MIT's Lincoln Laboratory, sponsored by DARPA ITO and the Air Force Research
Laboratory, gathered the DARPA Lincoln datasets, which consist of nine weeks of data
collected in 1998: seven weeks for training and the two remaining weeks for testing. The
datasets are collected and stored in Tcpdump form, and they are the source from which
other datasets, such as KDD99 [3] and NetFlow [86], are extracted.
There were more than 300 instances of 38 different attacks launched against
victim UNIX hosts, each falling into one of four categories:
Denial of Service (DoS), Probe, User to Root (U2R), and Remote to Local (R2L).
For each week, inside and outside network traffic data, audit data recorded by the
Basic Security Module on Solaris hosts, and file system dumps from UNIX hosts were
collected. In 1999, another series of datasets containing three weeks of training data
and two weeks of test data was collected. More than 200 instances of 58 attack types
were launched against victim UNIX and Windows NT hosts and a Cisco router. In 2000,
three additional scenario-specific datasets were generated to address distributed
DoS and Windows NT attacks. Detailed descriptions of these datasets can be found
at [1].
1.6.2 UT dataset
A public labeled flow-based dataset is provided in [75]. This dataset was captured
by monitoring a honeypot hosted in the University of Twente network, so we
call it the UT dataset. The dataset has three categories of traffic: malicious, unknown
and side-effect. It contains 14,170,132 flows, which are mostly of a malicious nature.
Only a small number of flows, 5,968 (0.042%), are not labeled (unknown); these are
considered normal data in our experiments in the following chapters. Each flow in the
dataset has 13 fields: id: the ID of the flow; src_ip: anonymized source IP address
(encoded as a 32-bit number); dst_ip: anonymized destination IP address (encoded as
a 32-bit number); packets: number of packets in the flow; octets: number of bytes in the
flow; start_time: UNIX start time (seconds); start_msec: start time (milliseconds
part); end_time: UNIX end time (seconds); end_msec: end time (milliseconds
part); src_port: source port number; dst_port: destination port number; tcp_flags:
TCP flags of the flow; and prot: IP protocol number.
Examples of a normal flow and an attack flow are (393, 3145344965, 2463760020,
3, 168, 1222173606, 974, 1222173610, 239, 0, 769, 0, 1) and (1, 2463760020, 3752951033,
1, 60, 1222173605, 985, 1222173605, 985, 4534, 22, 2, 6), respectively. The ID of a flow
is used to distinguish attack flows from the others.
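For concreteness, a UT flow record can be mapped onto named fields as below (an illustrative sketch that assumes comma-separated records in the field order listed above; `Flow` and `parse_flow` are hypothetical helpers, not part of the dataset tooling):

```python
from collections import namedtuple

Flow = namedtuple('Flow', 'id src_ip dst_ip packets octets start_time '
                          'start_msec end_time end_msec src_port dst_port '
                          'tcp_flags prot')

def parse_flow(record):
    """Parse a comma-separated UT flow record into integer fields."""
    return Flow(*(int(v) for v in record.split(',')))

# The normal-flow example from the text
normal = parse_flow('393,3145344965,2463760020,3,168,1222173606,974,'
                    '1222173610,239,0,769,0,1')
print(normal.id, normal.packets, normal.prot)  # 393 3 1
```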
1.6.3 Netflow dataset
The packet-based DARPA dataset [1] was used in [86] to generate a flow-based DARPA
dataset, called NetFlow. This dataset focuses only on flows to a specific port and an IP
address that receives the highest number of attacks. It includes all 129,571 flows
(including attacks) to and from the victims. Each flow in the dataset has 10 fields: Source
IP, Destination IP, Source Port, Destination Port, Packets, Octets, Start Time, End
Time, Flags, and Proto. All 24,538 attack flows are labeled with text labels, such as
neptune, portsweep, ftpwrite, etc.
Examples of a normal flow and an attack flow are (172.16.112.20, 172.16.112.50,
53, 32961, 1, 161, 1999-03-05T08:17:10, 1999-03-05T08:17:10, 17, 00) and (209.167.99.71,
172.16.112.50, 10353, 4288, 1, 46, 1999-03-12T17:23:19, 1999-03-12T17:23:19, 6,
02:::portsweep), respectively.
1.6.4 Discussions
This thesis does not address NIDS techniques or algorithms on the full variety of
available datasets. Although there exist many other well-known datasets for IDS, such as
the KDD99 datasets [3], the NSL-KDD dataset [4], and the FDFA datasets [2], they are all
out of the scope of this netflow-related thesis.
There have been a number of different viewpoints and studies questioning
the DARPA datasets. McHugh [62] offered one of the most important assessments
of the DARPA datasets, and it was deeply critical. The assessment
gives examples, such as normal and attack data having unrealistic data rates, the
lack of training datasets for anomaly detection for its intended purpose, and the absence
of validation efforts, to show that some evaluation methodologies are questionable and may
have biased the results. Furthermore, the findings of Mahoney and Chan [60], which
can be seen as a confirmation of McHugh's observations, revealed
that many attributes had small and fixed ranges in simulation but large and growing
ranges in real traffic.
Despite these criticisms, the DARPA-Lincoln datasets play a vital public role
and remain the most sophisticated benchmark for researchers
evaluating intrusion detection or machine learning
algorithms [92]. An important role of this data is to serve as a proxy for developing,
testing and evaluating detection algorithms, rather than as a solid dataset for a real-time
system. If a detection algorithm achieves high performance on the DARPA
data, it is more likely to perform well in a real network environment [88].
1.7 Summary
This chapter presents the background topics used in this thesis, including HIS, IDS,
and AIS. Some terms and definitions that will be used throughout the thesis are also
stated clearly. Two important data structures, ring-based and frequency-based, can be
used to improve the classification rate of selection algorithms. Some popular performance
metrics and three well-known datasets for NIDS are presented and discussed.
These datasets will be used for experiments in the other chapters. Besides, under the new
semantics of detection in the r-chunk based PSA (Section 1.5.4), we proved an important
theorem on the coincidence of the detection coverage of PSA and that of NSA. This
theorem leads to one contribution of the thesis, discussed in the next chapter.
Chapter 2
COMBINATION OF NEGATIVE SELECTION AND POSITIVE SELECTION
It can be seen from Theorem 1.1 in Chapter 1 that the r-chunk based PSA and
NSA are dual in terms of detection. This motivates our approach of combining the two
selection algorithms in a way that does not affect the detection performance of either.
2.1 Introduction
NSA and PSA are computational models inspired by the negative and positive
selection processes of the biological immune system. Of the two, NSA has been
studied more extensively, resulting in more variants and applications [51]. However,
all existing string-based NSAs have worst-case exponential memory complexity for
storing the detector set, which limits their practical applicability [31]. In this chapter,
we introduce a novel selection algorithm that employs binary representation and the
r-chunk matching rule for detectors. The new algorithm combines negative and positive
selection to reduce both detector storage complexity and detection time, while
maintaining the same detection coverage as that of NSAs (PSAs).
In the following section, we review some related works. Section 2.3 presents in
detail a new r-chunk type NSA, called PNSA, that combines positive and negative
selection. In our proposed approach, binary trees are used as the data structure for
storing the detector set to reduce memory complexity and, therefore, the time complexity
of the detection phase. Section 2.4 details preliminary experimental results.
A summary of the chapter is given in the final section.
2.2 Related works
Both PSA and NSA achieve quite similar performance for detecting novelty
in data patterns [24]. Dasgupta et al. [21] conducted one of the earliest experiments
on combining positive with negative selection. The combined process is embedded in
a genetic algorithm using a fitness function that assigns a weight to each bit based
on domain knowledge. Their method aims to reduce neither detector storage
complexity nor detection time. Esponda et al. [34] proposed a generic NSA for anomaly
detection problems. Their model of normal behavior is constructed from an observed
sample of normally occurring patterns. Such a model could represent either the set
of allowed patterns (positive detection) or the set of anomalous patterns (negative
detection). However, their NSA is not concerned with combining positive and
negative selection in the detection phase, as the algorithm proposed here is. Stibor et al.
[80] argued that positive selection might have better detection performance than negative
selection. However, the choice between positive and negative selection obviously depends
on the representation used by the AIS-based application.
To the best of our knowledge, there has been no published attempt at
combining r-chunk type PSA and NSA for the purpose of reducing detector storage
complexity and detection time complexity.
2.3 New Positive-Negative Selection Algorithm
Our algorithm first constructs ℓ − r + 1 binary trees (called positive trees)
corresponding to the ℓ − r + 1 positive r-chunk detector sets Dpi, i = 1, . . . , ℓ − r + 1. Then,
all complete subtrees of these trees are removed to achieve a compact representation of
the positive r-chunk detector sets while maintaining the detection coverage. Finally, for
every ith positive tree, we decide whether or not it should be converted to the negative
tree, which covers the negative r-chunk detector set Dni. The decision depends on
which tree is more compact. When this process is done, we have ℓ − r + 1 compact
binary trees, some of which represent positive r-chunk detectors while the others
represent negative ones.
The r-chunk matching rule on binary trees is implemented as follows: a given
sample s matches the ith positive (negative) tree if s[i . . . i + k] is a path from the
root to a leaf, i = 1, . . . , ℓ − r + 1, k < r. The detection phase is conducted by
traversing the compact binary trees iteratively one by one: a sample s is claimed as
nonself if it matches a negative tree or if it does not match some positive tree; otherwise
it is considered as self.
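As an informal illustration of this rule (not the thesis implementation), the chunk sets Dpi can be stood in for by plain Python sets: membership of a chunk in a set below corresponds to a root-to-leaf path in the corresponding compact tree.

```python
def chunk_sets(S, ell, r):
    """Positive r-chunk detector sets Dp_i: the chunks observed in the self set."""
    return [{s[i:i + r] for s in S} for i in range(ell - r + 1)]

def classify(s, trees, r):
    """trees: list of (kind, chunks) with kind 'pos' or 'neg'.
    s is nonself if it matches a negative tree or fails some positive tree."""
    for i, (kind, chunks) in enumerate(trees):
        chunk = s[i:i + r]
        if kind == "pos" and chunk not in chunks:
            return "nonself"
        if kind == "neg" and chunk in chunks:
            return "nonself"
    return "self"

S = ["00000", "00010", "10110", "10111", "11000", "11010"]   # Example 2.1
Dp = chunk_sets(S, ell=5, r=3)
trees = [("pos", d) for d in Dp]
```

For instance, classify("10100", trees, 3) returns "nonself", since the chunk 010 at position 2 does not occur in Dp2, while every string of S is classified self. Replacing each positive set by its negated complement leaves the classification unchanged, which is the duality the combination exploits.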
Example 2.1. For the set of six self strings from Example 1.3, S = {00000, 00010,
10110, 10111, 11000, 11010}, where ℓ = 5 and r = 3, the six binary trees (the left
and right child are labeled 0 and 1, respectively) represent the detector sets of six 3-
chunk detectors (Dpi and Dni, i = 1, 2, 3) as depicted in Fig. 2.1. In the figure, dashed
arrows in some positive trees mark the complete subtrees that will be removed to achieve
the compact tree representation. The positive trees for Dp1, Dp2 and Dp3 are in (a), (c)
and (e), respectively; the negative trees for Dn1, Dn2 and Dn3 are in (b), (d) and (f),
respectively.
The numbers of nodes of the trees in Figures 2.1.a - 2.1.f (after deleting complete
subtrees) are 9, 10, 7, 6, 8 and 8, respectively. Therefore, the chosen final trees are
those in Figures 2.1.a (9 nodes), 2.1.d (6 nodes) and 2.1.e or 2.1.f (8 nodes). In a real
implementation, it is unnecessary to generate both positive trees and negative trees.
Since each Dpi could dually be represented either by a positive or a negative tree,
we only need to generate (compact) positive trees. If a compact positive tree T has
more leaves than internal nodes with a single child, the corresponding negative tree T′
has fewer nodes than T. Therefore, T′ should be used instead of T to represent Dni
more compactly. Figure 2.3 presents a diagram of the algorithm. The following example
illustrates this observation.
Figure 2.1: Binary tree representation of the detector sets generated from S.

Example 2.2. Consider again the set of six self strings S from Example 1.3, S =
{00000, 00010, 10110, 10111, 11000, 11010}. The compact positive tree for the positive
3-chunk detector set Dp2 = {(000,2); (001,2); (011,2); (100,2); (101,2)} is shown
in Fig. 2.2.a. This tree has three leaves and two nodes that have only one child (in
dotted circles), so it should be converted to the corresponding negative tree, as illustrated
in Fig. 2.2.b.
Figure 2.2: Conversion of a positive tree to a negative one.
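The counting criterion of Example 2.2 can be checked mechanically. The sketch below is an independent illustration rather than the thesis implementation: trees are nested Python dicts, complete subtrees are collapsed into single leaves, and the negative tree is obtained by building the compact tree of the complement chunk set.

```python
def build_tree(chunks):
    """Prefix tree of a set of equal-length binary strings (nested dicts)."""
    root = {}
    for c in chunks:
        node = root
        for bit in c:
            node = node.setdefault(bit, {})
    return root

def prune_complete(node, h):
    """Collapse every complete subtree (holding all 2^h strings) into a leaf."""
    if h == 0:
        return {}
    if set(node) == {"0", "1"}:
        node = {b: prune_complete(node[b], h - 1) for b in "01"}
        if node["0"] == {} and node["1"] == {}:
            return {}
        return node
    return {b: prune_complete(t, h - 1) for b, t in node.items()}

def count_nodes(node):                       # counts every node, root included
    return 1 + sum(count_nodes(t) for t in node.values())

def leaves_and_single(node):
    """(number of leaves, number of internal nodes with exactly one child)."""
    if not node:
        return 1, 0
    ls, ss = zip(*(leaves_and_single(t) for t in node.values()))
    return sum(ls), sum(ss) + (1 if len(node) == 1 else 0)

Dp2 = {"000", "001", "011", "100", "101"}          # Example 2.2
Dn2 = {format(v, "03b") for v in range(8)} - Dp2   # complement chunk set
pos = prune_complete(build_tree(Dp2), 3)
neg = prune_complete(build_tree(Dn2), 3)
```

With these definitions, the compact positive tree has 7 nodes with 3 leaves and 2 one-child internal nodes, and the negative tree has 6 nodes; the saving equals leaves minus one-child nodes, matching trees (c) and (d) of Fig. 2.1.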
Algorithm 2.1 Detector Generation Algorithm
1: procedure DetectorGeneration(S, r, T)
Input: A set of self strings S ⊆ Σℓ, a matching threshold r ∈ {1, . . . , ℓ}.
Output: A set T of ℓ − r + 1 prefix trees representing all r-chunk detectors.
2: T = ∅
3: for i = 1, . . . , ℓ − r + 1 do
4:   create an empty prefix positive tree Ti
5:   for all s ∈ S do
6:     insert every s[i . . . i + r − 1] into Ti
7:   end for
8:   for all internal nodes n ∈ Ti do
9:     if n is the root of a complete binary subtree then delete this subtree
10:    end if
11:  end for
12:  if (number of leaves of Ti) > (number of nodes of Ti that have only one child) then
13:    for all internal nodes n ∈ Ti do
14:      if n has only one child then
15:        if the child is a leaf then delete the child
16:        end if
17:        create the other child of n
18:      end if
19:    end for
20:    mark Ti as a negative tree
21:  end if
22:  T = T ∪ {Ti}
23: end for
24: end procedure
The proposed technique is summarized in Algorithm 2.1 and Algorithm 2.2.
The first algorithm, Algorithm 2.1, generates ℓ − r + 1 trees, each of which is
marked as positive or negative. The process of generating the compact binary (positive
and negative) trees representing the complete r-chunk detector set is conducted in the
outer "for" loop. First, each binary positive tree Ti is constructed by the first inner loop.
Then, the compactification of each Ti is conducted by the second one, i = 1, . . . , ℓ − r + 1.
The conversion of a positive tree to a negative one takes place in the "if" statement after
the second inner "for" loop. The procedure for recognizing a given cell string s as self or
nonself is carried out by the "while . . . do" and "if . . . then . . . else" statements of
Algorithm 2.2; Figure 2.4 presents a diagram of that algorithm.
Figure 2.3: Diagram of the Detector Generation Algorithm.
The detection phase is carried out by the second algorithm, Algorithm 2.2, and
is illustrated by the following example.
Example 2.3. Given S and r as in Example 1.3, S = {00000, 00010, 10110, 10111,
11000, 11010}, and s = 10100 as the input of the algorithm, three binary trees are
constructed as the detector set in Figures 2.1.a, 2.1.d and 2.1.e. The output of
the algorithm is "s is nonself" because s matches the negative tree T2 of Fig. 2.1.d:
s[2 . . . 4] = 010 is one of its root-to-leaf paths.
Algorithm 2.2 Positive-Negative Selection Algorithm
1: procedure PNSA(T, r, s)
Input: A set T of ℓ − r + 1 prefix trees representing all r-chunk detectors, a matching
threshold r ∈ {1, . . . , ℓ}, an unlabeled string s ∈ Σℓ.
Output: A label of s (as self or nonself).
2: flag = true                ▷ A temporary boolean variable
3: i = 1
4: while (i ≤ ℓ − r + 1) and (flag = true) do
5:   if (Ti is a positive tree) and (s ∉ Ti) then
6:     flag = false
7:   end if
8:   if (Ti is a negative tree) and (s ∈ Ti) then
9:     flag = false
10:  end if
11:  i = i + 1
12: end while
13: if flag = false then
14:   output s is nonself
15: else
16:   output s is self
17: end if
18: end procedure
From the description of DetectorGeneration, it is easy to show that it takes
|S|(ℓ − r + 1)·r steps to generate all necessary trees (detector generation time complexity)
and (ℓ − r + 1)·r steps to verify a cell string as self or nonself in the worst case (worst-
case detection time complexity). These time complexities are similar to those of popular
NSAs (PSAs) such as the one proposed in [31]. However, by using compact positive
and negative binary trees for storing the detector set, PNSA could reduce the storage
complexity of the detector set in comparison with other single r-chunk type NSAs
or PSAs that store detectors as binary strings. This storage complexity reduction could
potentially lead to better detection time complexity in real and average cases. To see
this, we first state the following theorem:
Theorem 2.1 (PNSA detector storage complexity). Given a self set S and an integer
ℓ, the procedure DetectorGeneration produces a detector (binary) tree set that has in
total at most (ℓ − r + 1)·2^(r−2) fewer nodes than the detector tree set created by a
PSA or NSA alone, where r ∈ {2, . . . , ℓ − r + 1}.

Figure 2.4: Diagram of the Positive-Negative Selection Algorithm.
Proof. We only prove the theorem for the PSA case; the NSA case can be proven in a
similar way. Since ℓ − r + 1 positive trees can be built from the self set S, the theorem
is proved if we show that at most 2^(r−2) nodes can be removed from a single positive
tree. The proof is by induction on r (which is also the height of the binary trees).
Note that when converting a positive tree to a negative tree, the reduction
in the number of nodes is exactly the number of leaf nodes minus the number of
internal nodes that have only one child.
When r = 2, there are 16 possible positive trees of height 2. By examining all
16 cases, we have found that the maximum reduction in the number of nodes is 1 =
2^(r−2). One example of these cases is the positive tree that has 2 leaf nodes after
compactification, as in Fig. 2.5.a. Since it has two leaf nodes and one one-child internal
node, converting it to the corresponding negative tree reduces the number of nodes by
2 − 1 = 1.
Figure 2.5: One node is reduced in a tree: a compact positive tree has 4 nodes (a) and its conversion (a negative tree) has 3 nodes (b).
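The base case of the induction can be verified exhaustively. The sketch below is an independent check, under the conventions that a complete subtree is collapsed into a single leaf and that the empty tree has 0 nodes; it enumerates all 16 subsets of 2-bit chunks and confirms that the maximum node reduction is 1.

```python
from itertools import combinations

STRINGS = ["00", "01", "10", "11"]          # all binary strings of length r = 2

def tree_size(chunks, h):
    """Node count of the compact prefix tree of `chunks` (strings of length h),
    where every complete subtree is collapsed into a single leaf."""
    if not chunks:
        return 0
    if len(chunks) == 2 ** h:               # complete subtree -> one leaf
        return 1
    left = [c[1:] for c in chunks if c[0] == "0"]
    right = [c[1:] for c in chunks if c[0] == "1"]
    return 1 + tree_size(left, h - 1) + tree_size(right, h - 1)

# Reduction achieved by replacing a positive tree with its negative counterpart.
reductions = []
for k in range(len(STRINGS) + 1):
    for subset in combinations(STRINGS, k):
        pos = tree_size(list(subset), 2)
        neg = tree_size([s for s in STRINGS if s not in subset], 2)
        reductions.append(pos - neg)
```

Over all 16 cases the reductions range from −1 to 1, so the maximum saving at height 2 is indeed a single node.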
Suppose that the theorem's conclusion holds for all r < k; we shall prove
that it also holds for k. This follows from the observation that among the positive
trees of height k, there is at least one tree whose left subtree and right subtree (of
height k − 1) can each be reduced by at least 2^((k−1)−2) nodes after conversion, for a
total of 2^(k−2) nodes.
An experiment on a real network intrusion dataset, presented at the end of the
following section, shows that the storage reduction achieved in practice is only about
0.35% of this maximum.
2.4 Experiments
Next, we investigate the possible impact of PNSA's reduction in detector storage
complexity on its real (average-case) detection time, in comparison with a single
NSA (PSA). All experiments are performed on a laptop computer with Windows 8 Pro
64-bit, Intel Core i5-3210M CPU 2.50GHz (4 CPUs), and 4GB RAM.
Table 2.1 shows the results on detector memory storage and detection time of
PNSA compared to a popular NSA proposed in [31] for some combinations of S, ℓ and
r. The training self set S contains randomly generated binary strings. The
memory reduction is measured as the ratio of the reduction in the number of nodes of
the binary tree detectors generated by PNSA compared to the binary tree detectors
generated by the NSA in [31]. The comparative results show that when ℓ and r are
sufficiently large, the detector storage and the detection time of PNSA are
significantly smaller than those of the NSA in [31] (36% and 50% less, respectively).
Table 2.1: Comparison of memory and detection time reductions.

|S|     ℓ    r    Memory (%)   Time (%)
1,000   50   12   0            0
2,000   30   15   2.5          5
2,000   40   17   25.9         42.7
2,000   50   20   36.3         50
We have conducted another experiment choosing ℓ = 40 and |S| = 20,000 (S being
a set of randomly generated binary strings of length ℓ) and varying r from 15 to
40. Then, ℓ − r + 1 trees were created using the single NSA and another ℓ − r + 1 compact
trees were created using PNSA. Next, both detector sets were used to detect every
s ∈ S. Fig. 2.6 depicts the detection times of PNSA and NSA in this experiment. The
results show that the PNSA detection time is significantly smaller than that of NSA. For
instance, when r ranges from 20 to 34, detection in PNSA is about 4.46 times faster than
in NSA.
Figure 2.6: Detection time of NSA and PNSA.

The next experiment is conducted on the NetFlow dataset, a conversion of the
Tcpdump data of the well-known DARPA dataset to NetFlow [86]. We use all 105,033
normal flows as self samples. This self set is first converted to binary strings of length
104, and we then run our algorithm with r changing from 5 to 45. Table 2.2 shows
some of the experiment steps; the percentage of node reduction is given in the final
column. Fig. 2.7 depicts the reduction of nodes in the trees created by PNSA in
comparison to those of NSA for all r = 3, . . . , 45. It shows that the reduction is more
than one third when the matching threshold is greater than 19.
Table 2.2: Comparison of node generation on the NetFlow dataset.

r    NSA          PNSA         Reduction (%)
5    727          706          2.89
10   33,461       31,609       5.53
15   1,342,517    1,154,427    14.01
20   9,428,132    6,157,766    34.68
25   18,997,102   11,298,739   40.52
30   29,668,240   17,080,784   42.42
35   42,596,987   24,072,401   43.48
40   58,546,497   32,841,794   43.90
45   79,034,135   44,194,012   44.08
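The reduction column of Table 2.2 follows directly from the two node counts; the check below recomputes it as 100·(NSA − PNSA)/NSA and confirms agreement with the table to within last-digit rounding.

```python
# (NSA nodes, PNSA nodes, tabulated reduction %) per matching threshold r.
rows = {
    5: (727, 706, 2.89),
    10: (33_461, 31_609, 5.53),
    15: (1_342_517, 1_154_427, 14.01),
    20: (9_428_132, 6_157_766, 34.68),
    25: (18_997_102, 11_298_739, 40.52),
    30: (29_668_240, 17_080_784, 42.42),
    35: (42_596_987, 24_072_401, 43.48),
    40: (58_546_497, 32_841_794, 43.90),
    45: (79_034_135, 44_194_012, 44.08),
}
recomputed = {r: 100 * (nsa - pnsa) / nsa for r, (nsa, pnsa, _) in rows.items()}
```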
An experiment on the Spambase dataset [8], presented in our work [A4], shows
that the maximum node reduction percentages of PNSA in comparison to PSA and
NSA for some r are 20% and 40%, respectively.
2.5 Summary
In this chapter, we have proposed a novel approach to combining PSA and
NSA. The new algorithm, PNSA, uses a compact representation of the detector set as
positive and negative binary trees. PNSA was theoretically proven to achieve better
detector storage complexity than a single NSA or PSA while maintaining the
detection coverage, detector generation time complexity, and worst-case detection time
complexity. It can, in fact, balance the availability of both self samples and nonself ones.
This could potentially lead to lower detection time in real cases (i.e., smaller average
detection time complexity), as preliminarily confirmed in our experiments.

Figure 2.7: Node reduction in trees created by PNSA on the NetFlow dataset.

Figure 2.8: Comparison of node reduction on the Spambase dataset.
The main contents of this chapter were published in a Vietnamese journal [A4].
This article is changed by about 30% in comparison to its previous version in the ISI-
indexed proceedings of the IEEE International Conference on Computing and Commu-
nication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013.
Chapter 3
GENERATION OF COMPACT DETECTOR SET
In this chapter, we present a new NSA that generates a complete and nonredundant
detector set using an extension of the r-chunk matching rule. This allows both
detector storage and classification time to be reduced. Experimental results on two
datasets show the effectiveness of the proposed algorithm.
3.1 Introduction
A complete and nonredundant detector set, also called a perfect detector set, is
necessary to achieve the maximum detection coverage with the minimum number of
detectors [30]. In general, a larger detector set does not always contribute more
coverage, because of overlap. With a nonredundant representation, however, the
overlap problem never arises.
To date, there have been some computational models of binary detectors that can
generate a complete and non-redundant detector set, called a perfect detector set (a
set of minimum detectors with the same detection ability as that of all
possible detectors), such as those based on prefix trees [31] or automata [32]. In these
models, detectors are represented as a whole structure (tree or automaton) rather than
as a set of individual strings. While they provide a more compact representation of the
detector sets for AIS, and therefore achieve a better detection time complexity, these
models of binary-based AIS are hard to deploy in distributed environments. Naturally,
one desirable property of an NSA is its ability to be implemented in a distributed manner:
each detector might detect different kinds of nonself, which is desirable for many
applications such as computer security systems. Therefore, the focus of this chapter
is on binary-based NSAs that employ a discrete set of detectors (strings) so that
they can be implemented in distributed environments. We can, for example, randomly
divide the discrete set of detectors into subsets, each of which serves a site in a
distributed environment.
The rest of the chapter is organized as follows. In the next section, we briefly
review some related works. In Section 3.3, we detail our new NSA that can generate
perfect detector sets based on the rcbvl matching rule. Section 3.4 briefly describes our
experiments in generating perfect detector sets. Section 3.5 summarizes the chapter and
discusses some possible future works.
3.2 Related works
With respect to binary-based AIS using a discrete detector set, to the best of
our knowledge, the only algorithms for generating a perfect and discrete set of r-chunk-
based (rcb-based) detectors were proposed by T. Stibor in [78] and by Wierzchoń in [90];
they have frequently been cited, compared, and applied in the literature1.
The algorithm in [78] uses a hash table data structure to efficiently insert, delete
and search boolean values, which are indexed with a composite key of the r-chunk
string concatenated with the detector index i. The algorithm in [90] uses discrete and
partial detectors to find their overlapping patterns (strings). It also takes time to find
ineffective patterns in a preprocessing step. In spite of generating perfect detector sets,
neither algorithm uses a compact representation of discrete detectors.
In this chapter, we introduce a new deterministic algorithm to generate a perfect
and discrete set of rcbvl-based detectors, which is equivalent to a full set of r-chunk-based
detectors in terms of anomaly detection. Moreover, the compact string-based detector set
can achieve better memory and time complexities compared to conventional string-
based algorithms. Note that, as discussed in Chapter 1, a perfect detector set cannot
solve the problem of holes.
1At the time of writing this chapter, the work in [78] (in [90]) has had 44 (47) citations on Google
Scholar.
3.3 New negative selection algorithm
Given a non-empty set S of self strings of length ℓ and an integer r ∈ {1, . . . , ℓ},
this section presents a new NSA based on the rcbvl matching rule. Prefix
trees are first used to generate a perfect detector set from S, and this set is then used
to decide whether a new sample is self or nonself.
3.3.1 Detectors set generation under rcbvl matching rule
Algorithm 3.1 summarizes the first phase of new NSA, called RNSA. In the
algorithm, we first construct for every position i ∈ {1, . . . , (cid:96) − r + 1} a prefix tree
Ti. These trees are temporary data structures for detectors generation. Each prefix
tree Ti can be constructed as follows: start with an empty prefix tree and insert every
s[i . . . i + r − 1], s ∈ S, into it (lines 4-6). Next, for every non-leaf node n and every
c ∈ Σ where no edge with label c starts at n, create a new leaf n(cid:48) and an edge (n, n(cid:48))
labeled with c. Finally, delete every node from which none of the newly created leaves
is reachable (lines 9-15). Detectors set D is first created by all s ∈ T1 in line 18. The
rest of the algorithm, lines 19-42, updates partial detectors in D by identifying their
right overlapping strings in prefix trees.
From the description of the algorithm, it takes |S|·(ℓ − r + 1)·r steps to generate
the ℓ − r + 1 prefix trees and |D|·(ℓ − r + 1)·2r steps to generate the perfect detector set D.
Figure 3.1 presents a diagram of the algorithm.
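The complement step of the tree-construction phase (lines 9-15) can be illustrated on explicit chunk sets: for each position i, the r-chunks that occur in no self string are exactly the negative chunks that the pruned tree encodes. The sketch below is a simplified set-based illustration only; the algorithm itself keeps prefix trees and merges overlapping chunks into variable-length rcbvl detectors.

```python
from itertools import product

def negative_chunks(S, ell, r):
    """For each position i, the set of r-chunks that occur in no self string."""
    universe = {"".join(bits) for bits in product("01", repeat=r)}
    return [universe - {s[i:i + r] for s in S} for i in range(ell - r + 1)]

S = ["010101", "111010", "101101", "100011", "010111"]   # Example 1.1
neg = negative_chunks(S, ell=6, r=3)
```

For position 1 this yields {000, 001, 011, 110}, which is exactly the set of chunks abbreviated by the initial partial detectors (00,1,1), (011,1,1) and (110,1,1) of Example 3.1, where (00,1,1) stands for the two chunks 000 and 001.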
Example 3.1. Given ℓ, r and the set of self strings S as in Example 1.1, S = {010101,
111010, 101101, 100011, 010111}, some steps of Algorithm 3.1 in generating the perfect
detector set {(0001,1,2), (00100,1,4), (100,4,4), (011110,1,4), (11000,1,3)} of Example
1.4 are:
The set D is first created as {(00,1,1), (011,1,1), (110,1,1)}. Then the for loop (lines
19-42) calculates D and D1 as follows:
For i = 2: D = {(0001,1,2), (0010,1,2), (0111,1,2), (1100,1,2)} and D1 = ∅.
For i = 3: D = {(00100,1,3), (01111,1,3), (11000,1,3)} and D1 = {(0001,1,2)}.
Algorithm 3.1 Algorithm to generate a perfect rcbvl detector set
1: procedure GenerationDetectors(S, r, D)
Input: A set of self strings S ⊆ Σℓ, a matching threshold r ∈ {1, . . . , ℓ}.
Output: A set D of rcbvl detectors.
2: for i = 1, . . . , ℓ − r + 1 do create an empty prefix tree Ti end for
3: for all s ∈ S do
4:   for i = 1, . . . , ℓ − r + 1 do
5:     insert s[i . . . i + r − 1] into Ti
6:   end for
7: end for
8: for i = 1, . . . , ℓ − r + 1 do
9:   for all non-leaf nodes n ∈ Ti and all c ∈ Σ do
10:    if no edge with label c starts at n then
11:      create a new leaf n′ and an edge (n, n′) labeled with c
12:    end if
13:  end for
14:  delete every node n ∈ Ti from which none of the newly created leaves is reachable
15: end for
16: D1 = ∅
17: D2 = ∅
18: D = {(s, 1, 1) | s ∈ T1}
19: for i = 2, . . . , ℓ − r + 1 do
20:   D2 = ∅
21:   for all (s, k, j) ∈ D do
22:     if there exists an s′ ∈ Ti such that s[i − k + 1 . . . |s|] is a prefix of it then
23:       D2 = D2 ∪ {(s + s′[|s| − j + k . . . |s′|], k, i)}
24:       delete every node n ∈ Ti from which only nodes of s′ are reachable
25:       for all s′ ∈ Ti such that s[i − k + 1 . . . |s|] is a prefix of it do
26:         if |s| − i + k < r then
27:           D2 = D2 ∪ {(s[|s|] + s′, i − 1, i)}
28:         else
29:           D2 = D2 ∪ {(s′, i, i)}
30:         end if
31:         delete every node n ∈ Ti from which only nodes of s′ are reachable
32:       end for
33:     else
34:       D1 = D1 ∪ {(s, k, j)}
35:     end if
36:   end for
37:   for all s′ ∈ Ti do
38:     D2 = D2 ∪ {(s′, i, i)}
39:   end for
40:   D = D2
41: end for
42: D = D ∪ D1
43: end procedure
Figure 3.1: Diagram of the algorithm to generate a perfect rcbvl detector set.
For i = 4: D = {(00100,1,4), (011110,1,4)} and D1 = {(0001,1,2), (11000,1,3),
(100,4,4)}.
The final step, D = D ∪ D1 in line 42, generates the perfect detector set {(0001,1,2),
(00100,1,4), (100,4,4), (011110,1,4), (11000,1,3)}.
3.3.2 Detection under rcbvl matching rule
To detect if a given string s is self or nonself, we simply check our rcbvl matching
rule on s against every detector in D. If it is the case, output s is nonself, otherwise
s is self. The function min used in Algorithm 3.2 to return the smallest number from
two values. It is clear to see that this algorithm has the same time complexity with
Algorithm 3.1.
Algorithm 3.2 Algorithm to detect if a given string s is self or nonself
1: procedure VNSA(S, s, r, D)
Input: A set of self strings S ⊆ Σℓ, an unlabeled string s ∈ Σℓ, a matching threshold
r ∈ {1, . . . , ℓ}.
Output: A set D of rcbvl detectors, a label of s (self or nonself).
2: GenerationDetectors(S, r, D)
3: for all (s′, n, m) ∈ D do
4:   for i = n, . . . , m do
5:     if s′[i . . . min(i + r − 1, |s′|)] occurs in s at position i then
6:       output s is nonself
7:       exit procedure
8:     end if
9:   end for
10: end for
11: output s is self
12: end procedure
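The detection loop can be sketched as follows. This is an illustration under one indexing assumption: the chunk of detector (s′, n, m) tested at position i is taken relative to the detector's first position n (i.e. starting at offset i − n within s′), which is how the worked detectors of Example 3.1 behave.

```python
def is_nonself(s, detectors, r):
    """detectors: rcbvl triples (d, n, m) with 1-based positions n..m.
    Returns True if some (possibly end-clipped) chunk of a detector
    occurs in s at its position i."""
    for d, n, m in detectors:
        for i in range(n, m + 1):
            chunk = d[i - n : i - n + r]          # clipped at the end of d
            if s[i - 1 : i - 1 + len(chunk)] == chunk:
                return True                       # s is nonself
    return False                                  # s is self

D = {("0001", 1, 2), ("00100", 1, 4), ("100", 4, 4),
     ("011110", 1, 4), ("11000", 1, 3)}           # Example 3.1, r = 3
S = ["010101", "111010", "101101", "100011", "010111"]
```

With the perfect detector set of Example 3.1, every string of S is classified self, while strings such as 000101 (whose first 3-chunk, 000, occurs in no self string) are detected as nonself.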
3.4 Experiments
In our experiments, we use the popular flow-based NetFlow dataset [86] and a
random dataset. The flow-based NetFlow data is used for Experiment 1. Similar to
previous studies [86], we select the same 4 features, Packets, Octets, Duration and
Flags, from the NetFlow dataset as the input of Experiment 1. A randomly created
dataset is used for Experiment 2; it contains 50,000 binary strings of length 100.
Flows in NetFlow are converted into binary strings in two steps. The first step
maps every feature to a binary string feature; after this step, string features are
constructed for both normal and abnormal data. The second step concatenates the
binary string features of every flow; after this step, the dataset contains binary strings
of length 49. The distributions of the training and testing datasets, as well as the
parameters r and ℓ for the 4 experiments, are described in Table 3.1.
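The two-step conversion can be sketched as follows. The per-feature bit widths below are hypothetical, chosen only so that the concatenated string has length 49 as stated in the text; the thesis does not specify the exact widths.

```python
# Hypothetical per-feature bit widths; chosen only so the total is 49 bits.
WIDTHS = {"Packets": 16, "Octets": 16, "Duration": 11, "Flags": 6}

def to_bits(value, width):
    """Step 1: fixed-width binary encoding, clipped to the representable range."""
    return format(min(value, 2 ** width - 1), "0{}b".format(width))

def encode_flow(packets, octets, duration, flags):
    """Step 2: concatenate the binary string features of one flow."""
    return "".join([to_bits(packets, WIDTHS["Packets"]),
                    to_bits(octets, WIDTHS["Octets"]),
                    to_bits(duration, WIDTHS["Duration"]),
                    to_bits(flags, WIDTHS["Flags"])])

encoded = encode_flow(packets=1, octets=161, duration=0, flags=0b10)
```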
Table 3.1: Data and parameter distribution for the experiments and comparison of results.

                             Size (in bits)          Time (in minutes)
        ℓ   r   Train    Test     r-chunk    rcbvl      r-chunk  rcbvl
Case 1  49  10  119,571  10,000   206,810    42,704     1        0.2
        49  8   79,571   50,000   31,672     8,096      0.8      0.18
Case 2  30  12  25,000   25,000   367,092    79,222     4.1      0.75
        30  14  40,000   10,000   2,324,056  392,815    8.65     1.4
The results in Table 3.1 show that our proposed algorithm reduces both the size
(in bits) of the detectors and the time (in minutes) needed to classify the testing
dataset, in comparison with the best algorithm in [78].
All experiments are performed on a laptop computer with Windows 8 Pro 64-bit,
Intel Core i5-3210M CPU 2.50GHz (4 CPUs), and 4GB RAM.
3.5 Summary
In this chapter, we have introduced a novel NSA to generate perfect detector
sets for string-based AIS. We developed the rcbvl matching rule as an extension of the
traditional rcb rule. Our new algorithm has a much lower time complexity than the
previously proposed algorithm for generating perfect detector sets in [78]. More
importantly, the proposed algorithm always generates complete and non-redundant
detector sets for string-based AIS. Experimental results show that the proposed
algorithm can reduce both the runtime of detection and the storage of detectors.
Varying the detector length via the parameter r in the rcbvl matching rule can
balance specialisation and generalisation in classification systems; this will be the next
step of our study. Moreover, improving our approach to generate perfect detector sets
under the rcb matching rule is an interesting research direction.
According to the discussion on holes in Section 1.5.5, the number of holes can be
affected by the matching rule. We anticipate that the rcbvl matching rule, with the
varying lengths of its detectors, may create an acceptable number of holes. This will be
one of our next research directions.
A small contribution of this chapter is partly published in an international journal
[A6].
Chapter 4
FAST SELECTION ALGORITHMS
4.1 Introduction
Although NSAs have been studied the most in the AIS literature and have been
successfully applied not only to computer science problems but also to biological research
questions, their runtime has limited their practical usefulness for problems with large-
scale data. We therefore investigate methods to compute the classification outcome
efficiently. Specifically, we reduce the computational complexity of this task using
techniques from string processing and data compression. We obtain linear-time
algorithms for two important matching rules, the r-contiguous rule and the r-chunk
rule, for which negative selection had recently been conjectured to require polynomial
time [32, 82, 84].
In this chapter, we present fast NSAs for the r-contiguous rule and the r-chunk
rule, denoted Cont-NSA and Chunk-NSA respectively, which generate compressions of
complete and non-redundant detector sets in linear time with respect to the size of the
training data. The algorithms comprise two main phases: generating a graph as a
representation of all detectors and adjusting it into an automaton. We will prove that
the proposed algorithms are r times faster in data training than the one in [32]. The
prefix DAG is an important data structure used in the algorithms.
In the following section, we review the most closely related works. The two
subsequent sections present the r-chunk detector-based NSA and the r-contiguous
detector-based NSA in detail, respectively. Section 4.5 details preliminary experimental
results. The summary of the chapter is in the final section.
4.2 Related works
Table 4.1 compares our results with the runtimes of previously published
algorithms. All runtimes for the training phase and the classification phase are
given for a binary alphabet (|Σ| = 2), since not all algorithms are applicable to arbitrary
alphabets. The parameter K = min{|S[i..i+r−1]|, i = 1, ..., ℓ−r+1}. The parameter
|D|, the desired number of detectors, is only applicable to algorithms that generate
detectors explicitly.
The early work of D'haeseleer et al. [29] proposed algorithms that generate the
detector set by exhaustive search. However, these algorithms have a runtime
exponential in r. Other algorithms with the same complexities were later proposed by
Wierzchoń [89], Ayara et al. [6], and Stibor et al. [78]. Our algorithm and the
algorithm in [32] produce the results that would be obtained with the maximal number
of generated detectors. We propose an NSA that yields a substantial performance
improvement in the training phase compared to the most effective published NSA in [32].
Recently, Textor [82] proposed a sampling NSA. However, the algorithm is
inherently inefficient, as it typically needs exponentially many patterns to achieve
sufficient coverage of the universe. This NSA was later generalized to finite automata-
based models that perform several related algorithmic tasks and are not complicated
to implement and customize to real-world problems [84]. Nevertheless, the runtimes of
these NSAs are still polynomial.
4.3 A fast negative selection algorithm based on r-chunk detectors
Each self string of S is assumed to have an associated index, S = {s_i, i =
1, ..., |S|}. We also use the following notations:
• An array Q, where Q[s][c] is a pointer used for creating new nodes in the tree,
s ∈ Σ^(r−1) and c ∈ Σ. This data structure is used for expanding the partial DAG
gradually.
• An array P, where P[i] is a structure of two fields, a pointer P[i].end and a string
Table 4.1: Comparison of our results with the runtimes of previously published algorithms.

Matching rule   Algorithm                          Training                  Classification
r-chunk         Stibor et al. [78]                 (2^r + |S|)(ℓ − r + 1)    |D|ℓ
r-chunk         Elberfeld, Textor [31]             r^2 |S|(ℓ − r)            |S|ℓ^2 r
r-chunk         Elberfeld, Textor [32]             |S|ℓr                     ℓ
r-chunk         Present thesis                     |S|ℓ                      ℓ
r-contiguous    D'haeseleer et al. [29] (linear)   (2^r + |S|)(ℓ − r)        |D|ℓ
r-contiguous    D'haeseleer et al. [29] (greedy)   2^r |S|(ℓ − r)            |D|ℓ
r-contiguous    Wierzchoń [89]                     2^r (|D|(ℓ − r) + |S|)    |D|ℓ
r-contiguous    Elberfeld, Textor [31]             |S|^3 ℓ^3 r^3             |S|^2 ℓ^3 r^3
r-contiguous    Elberfeld, Textor [32]             |S|ℓr                     ℓ
r-contiguous    Present thesis                     |S|ℓ + (2^r − K)·ℓ        ℓr
P[i].str ∈ Σ^(r−1), i = 1, ..., |S|. Each P[i] is associated with s_i: the first field
P[i].end points to the end node of the path of s_i in the prefix DAG, and
the second field P[i].str is a suffix of the path ended by P[i].end.
The two arrays Q and P are used to expand the prefix DAG step by step. The final
automaton deciding the membership of strings in Σ^ℓ is constructed in two phases.
The first phase creates a DAG G such that L(G) ∩ Σ^ℓ = Σ^ℓ \ L(CHUNK(S, r)) by
Algorithm 4.1; in the second phase, this DAG is turned by Algorithm 4.2 into an
automaton M such that L(M) ∩ Σ^ℓ = L(CHUNK(S, r)). In Algorithm 4.1, the notations
NULL and new() are used with semantics similar to those in the C programming language.
The key idea behind our construction is that, instead of generating all ℓ − r + 1
prefix trees T1, ..., T(ℓ−r+1) as in [32], we only create one tree T1 for S[1..r] explicitly and
then enlarge it by adding nodes and edges level by level to obtain a prefix DAG. After
this process, the DAG encodes all positive detectors S[i..i+r−1], i = 1, ..., ℓ−r+1.
It is then inverted to construct a compression of CHUNK(S, r).
Figure 4.1 presents a diagram of the algorithm.
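The first step, building T1 explicitly, can be sketched in a few lines of Python. This is our own illustrative rendering with nested dictionaries as tree nodes, not the pointer-based structure of Algorithm 4.1:

```python
def build_t1(S, r):
    """Prefix tree T1 over the first windows S[1..r]; Algorithm 4.1
    then grows this tree level by level into the prefix DAG."""
    root = {}
    for s in S:
        node = root
        for c in s[:r]:
            node = node.setdefault(c, {})  # walk or create the path for s[1..r]
    return root

def count_leaves(node):
    """Number of distinct length-r paths, i.e. distinct first windows."""
    return 1 if not node else sum(count_leaves(child) for child in node.values())
```

For the self set of Example 4.1 below, T1 has 5 leaves, one per distinct first window in {011, 001, 100, 101, 111}.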
Example 4.1. Let ℓ = 5, r = 3. Given a set of 7 self strings as in Example 1.2,
S = {01111, 00111, 10000, 10001, 10010, 10110, 11111}. The set CHUNK(S, 3) is
{(000, 1), (010, 1), (110, 1), (010, 2), (100, 2), (101, 2), (110, 2), (011, 3), (100, 3),
(101, 3)}. The set CONT(S, 3) is {01011, 11011}.
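The sets in this example can be cross-checked with a brute-force sketch (ours, purely for illustration; it is exponential in r and ℓ, so it is only usable on tiny instances):

```python
from itertools import product

def chunk_detectors(S, r):
    """All pairs (d, i): d in {0,1}^r that matches no self string s
    at window position i, i.e. d != s[i..i+r-1] for every s in S."""
    l = len(next(iter(S)))
    D = set()
    for i in range(l - r + 1):
        windows = {s[i:i + r] for s in S}   # self windows at position i
        for bits in product("01", repeat=r):
            d = "".join(bits)
            if d not in windows:
                D.add((d, i + 1))           # positions are 1-based
    return D

def cont_detectors(S, r):
    """CONT(S, r): all length-l strings, every window of which is an
    r-chunk detector at its position."""
    l = len(next(iter(S)))
    D = chunk_detectors(S, r)
    out = set()
    for bits in product("01", repeat=l):
        t = "".join(bits)
        if all((t[i:i + r], i + 1) in D for i in range(l - r + 1)):
            out.add(t)
    return out

S = {"01111", "00111", "10000", "10001", "10010", "10110", "11111"}
print(len(chunk_detectors(S, 3)))    # 10
print(sorted(cont_detectors(S, 3)))  # ['01011', '11011']
```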
Algorithm 4.1 To generate the positive r-chunk detectors set
1: procedure PositiveR-chunkDetector(S, r, G)
Input: A set of self strings S ⊆ Σ^ℓ, a matching threshold r ∈ {1, ..., ℓ}.
Output: A prefix DAG G presenting the positive r-chunk detectors set.
2:  G = ∅
3:  for i = 1, ..., |S| do
4:      insert s_i[1..r] into G and assign P[i].end to the leaf node in path s_i[1..r]
5:      P[i].str = s_i[2..r]
6:  end for
7:  for i = r + 1, ..., ℓ do
8:      for j = 1, ..., |S| do
9:          if Q[P[j].str][s_j[i]] = NULL then
10:             Q[P[j].str][s_j[i]] = new()
11:         end if
12:     end for
13:     for j = 1, ..., |S| do
14:         p = P[j].end
15:         for c ∈ Σ do
16:             if Q[P[j].str][c] ≠ NULL and no edge labeled c starts from p then
17:                 create an edge labeled c from p to Q[P[j].str][c]
18:             end if
19:         end for
20:         P[j].end = the end node of the edge labeled s_j[i] starting from p
21:     end for
22:     for j = 1, ..., |S| do
23:         Q[P[j].str][s_j[i]] = NULL
24:         P[j].str = P[j].str[2..r−1] + s_j[i]
25:     end for
26: end for
27: end procedure
Example 4.2. Let ℓ = 5, r = 3 and let S be the self set from Example 4.1. The prefix DAG
generated by Algorithm 4.1 is illustrated in Fig. 4.2.a. After adjusting by Algorithm 4.2,
this DAG can be turned into an automaton as in Fig. 4.2.b. They have the properties
that L(G) ∩ {0,1}^5 = {0,1}^5 \ L(CHUNK(S, 3)) and L(M) ∩ {0,1}^5 = L(CHUNK(S, 3)).
Algorithm 4.3 summarizes the proposed Chunk-NSA as a whole. In the algorithm,
we use a self set S, an integer r ∈ {1, ..., ℓ}, a cell string s to be examined, and a
prefix DAG G. Chunk-NSA detects s as self or nonself.
Figure 4.1: Diagram of the algorithm to generate positive r-chunk detectors set.

Theorem 4.1. Given any S ⊆ Σ^ℓ, s ∈ Σ^ℓ and r ∈ {1, ..., ℓ}, algorithm Chunk-NSA
constructs an automaton M with L(M) ∩ Σ^ℓ = L(CHUNK(S, r)) in time O(|S|·ℓ·|Σ|)
and checks if s ∈ L(M) in time O(ℓ).
Proof. We first construct a prefix DAG G as follows: start with an empty prefix DAG
(at this stage also a prefix tree), then insert every prefix s[1..r], s ∈ S, into it. This is
done by the first for loop in Algorithm 4.1 in time |S|·r·|Σ|.
Then, for each s[i], s ∈ S and i = r + 1, ..., ℓ, we add nodes and corresponding
Algorithm 4.2 To generate the negative r-chunk detectors set
1: procedure NegativeR-chunkDetector(S, r, G)
Input: A set of self strings S ⊆ Σ^ℓ, a matching threshold r ∈ {1, ..., ℓ}.
Output: A prefix DAG G presenting the negative r-chunk detectors set.
2:  PositiveR-chunkDetector(S, r, G)
3:  create a special node n′
4:  for each non-leaf node n ∈ G do
5:      for c ∈ Σ do
6:          if no edge with label c starts at n then
7:              create a new edge (n, n′) labeled with c
8:          end if
9:      end for
10: end for
11: for each node n ∈ G do
12:     if n′ is not reachable from n then
13:         delete n
14:     end if
15: end for
16: end procedure
edges to G by using the linking pointers of P and Q, as in lines 7-26 of Algorithm 4.1.
The first inner for loop (line 8) creates new pointers to expand G to the next level.
Note that P[i] is associated with s_i, as mentioned at the beginning of this section. The
second inner for loop (line 13) adds new nodes and corresponding edges to G. The
for loop in line 22 frees all pointers in Q and updates all strings in P. It takes
(ℓ − r) iterative steps to generate the DAG G presenting all positive r-chunk detectors.
There are three inner for loops (lines 8, 13, 22), each with |S| steps. Therefore, the
time complexity of Algorithm 4.1 is |S| + 3·|S|·(ℓ − r + 1)·|Σ|.¹
Algorithm 4.2 inverts the DAG G as follows. First, a special node n′ is created.
Then, for every non-leaf node n and every c ∈ Σ for which no edge with label c starts
at n, we create a new edge (n, n′) labeled with c. Finally, we delete every node n from
which n′ is not reachable. There are no more than |S|·ℓ·|Σ| nodes in the DAG G, so the
time complexity of Algorithm 4.2 is |S|·ℓ·|Σ|.
The DAG G generated by Algorithm 4.2 is turned into a finite automaton M by
¹ Assuming that we can access each element of the arrays P and Q in unit time. This can be done,
for example, using operators in a base-|Σ| numeral system.
making the leaf node n′ an accepting state with self-loops for all c ∈ Σ and setting
Figure 4.2: A prefix DAG G and an automaton M
Algorithm 4.3 A fast r-chunk detector-based NSA
1: procedure Chunk-NSA(s, S, r, G, M)
Input: A set of self strings S ⊆ Σ^ℓ, an unlabeled string s ∈ Σ^ℓ, a matching threshold r ∈ {1, ..., ℓ}.
Output: A prefix DAG G presenting the negative r-chunk detectors set, an automaton M, a label of s (self or nonself).
2:  NegativeR-chunkDetector(S, r, G)
3:  turn G into an automaton M
4:  if s ∈ L(M) then
5:      output s is nonself
6:  else
7:      output s is self
8:  end if
9: end procedure
the initial state to the root of G (line 3 in Algorithm 4.3). We now obtain an automaton
M that has the properties claimed by the theorem. Moreover, it is easy to see that it
takes O(ℓ) time to check whether s ∈ L(M) or not. So we obtain the claimed runtimes.
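Functionally, the automaton M answers one window lookup per position during a single left-to-right scan of s. The following stand-in (our simplification: hash sets of self windows in place of DAG states) classifies exactly as Chunk-NSA does, with one lookup per position:

```python
def self_windows(S, r):
    """One set of self windows per position; these play the role of
    the automaton's states for the membership test."""
    l = len(next(iter(S)))
    return [{s[i:i + r] for s in S} for i in range(l - r + 1)]

def chunk_classify(s, W, r):
    """s is nonself iff some window of s matches a negative r-chunk
    detector, i.e. avoids every self window at that position."""
    if any(s[i:i + r] not in W[i] for i in range(len(W))):
        return "nonself"
    return "self"

S = {"01111", "00111", "10000", "10001", "10010", "10110", "11111"}
W = self_windows(S, 3)
print(chunk_classify("01011", W, 3))  # nonself
print(chunk_classify("01111", W, 3))  # self
```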
4.4 A fast negative selection algorithm based on r-contiguous detector
In this section, the two arrays P and Q are used with the same meaning as in the
previous section. In addition, we use two sets P1 and P2 of pointers for chunk detectors;
they swap their roles at the end of each step.
Theorem 4.2. There exists an algorithm that, given any S ⊆ Σ^ℓ and r ∈ {1, ..., ℓ},
constructs a finite automaton M with L(M) ∩ Σ^ℓ = CONT(S, r) in time O(|S|·ℓ·|Σ| +
(|Σ|^r − K)·ℓ), where K = min{|S[i..i+r−1]|, i = 1, ..., ℓ−r+1}.
Algorithm 4.4 Algorithm to generate the negative r-contiguous detectors set
1: procedure Cont-NSA(S, r, G)
Input: A set of self strings S ⊆ Σ^ℓ, a matching threshold r ∈ {1, ..., ℓ}.
Output: An automaton G presenting the negative r-contiguous detectors set.
2:  G = P1 = ∅
3:  for s ∈ Σ^r \ S[1..r] do
4:      insert s into G and create p.end pointing to the leaf node in path s
5:      p.str = s[2..r]
6:      P1 = P1 ∪ {p}
7:  end for
8:  for i = r + 1, ..., ℓ do
9:      for j = 1, ..., |S| do
10:         if Q[P[j].str][s_j[i]] = NULL then
11:             Q[P[j].str][s_j[i]] = new()
12:         end if
13:     end for
14:     P2 = ∅
15:     for each p ∈ P1 do
16:         for c ∈ Σ do
17:             if Q[p.str][c] = NULL then
18:                 Q[p.str][c] = new()
19:                 p.end = Q[p.str][c]
20:                 p.str = p.str[2..r−1] + c
21:                 P2 = P2 ∪ {p}
22:             else
23:                 if Q[p.str][c] was newly created in this inner for loop then
24:                     p.end = Q[p.str][c]
25:                     p.str = p.str[2..r−1] + c
26:                     P2 = P2 ∪ {p}
27:                 end if
28:             end if
29:         end for
30:     end for
31:     for j = 1, ..., |S| do
32:         Q[P[j].str][s_j[i]] = NULL
33:         P[j].str = P[j].str[2..r−1] + s_j[i]
34:     end for
35:     for each p ∈ P1 do
36:         for c ∈ Σ do
37:             Q[p.str][c] = NULL
38:         end for
39:     end for
40:     P1 = P2
41: end for
42: for each node n ∈ G do
43:     if n cannot reach a leaf node in P1 then
44:         delete n
45:     end if
46: end for
47: turn G into an automaton
48: end procedure
Proof. We first construct a prefix DAG G as follows: start with an empty prefix DAG
(at this stage also a prefix tree), then insert every s ∈ Σ^r \ S[1..r] into it. We also
create P1 containing pointers to all the leaves of G. This is done by the first for
loop in Algorithm 4.4 in time (|Σ|^r − |S[1..r]|)·r. After this step, we have L(G) ∩ Σ^r =
CHUNK(S[1..r], r).
Then, for each s[i], s ∈ S and i = r + 1, ..., ℓ, we add nodes and corresponding
edges to G by using the linking pointers of P and Q, as in lines 8-39 of Algorithm 4.4. The
first inner for loop (line 9) hashes every s ∈ S into Q. The second inner for loop (line
15) expands G to the next level. The main idea of this step is to lengthen each path in G
whose corresponding pointer p is not hashed by Q. In other words, each path linked
by p in G is concatenated with its right-overlapping r-chunk detector if this detector
exists, i.e., the one corresponding to a NULL pointer in Q.
The next two loops (lines 31 and 35) free all pointers in Q and update
all strings in P. The last for loop (line 42) deletes all partial paths in G that do not
reach a leaf node in P1. In other words, after this step, every leaf in G has level ℓ.
It takes (ℓ − r) main iterative steps (line 8) to generate the DAG G presenting
CONT(S, r). Two inner for loops (lines 9 and 31) take |S| steps each. Two other inner
for loops (lines 15 and 35) take |P1| steps each. We have |P1| ≤ |Σ|^r − |S[i..i+r−1]|, i =
1, ..., ℓ−r+1. The worst case, |P1| = |Σ|^r − |S[i..i+r−1]|, occurs if there exists
an index i such that all r-chunk detectors of Dni take part in expanding the DAG G.
Therefore, the time complexity of Algorithm 4.4 is 2·|S|·ℓ·|Σ| + 2·(|Σ|^r − K)·ℓ,
where K = min{|S[i..i+r−1]|, i = 1, ..., ℓ−r+1}.
Finally, we turn the DAG G into a finite automaton with the claimed property by
making all leaves accepting states with self-loops for all c ∈ Σ and setting the initial
state to the root of G. An example of this construction for the dataset in Example 4.1 is
shown in Fig. 4.4.

Figure 4.3: Diagram of the algorithm to generate negative r-contiguous detectors set.
Figure 4.4: An automaton representing the 3-contiguous detectors set.
Corollary 4.1. Given any S ⊆ Σ^ℓ, s ∈ Σ^ℓ and r ∈ {1, ..., ℓ}. After O(|S|·ℓ·|Σ| +
(|Σ|^r − K)·ℓ) preprocessing time, where K = min{|S[i..i+r−1]|, i = 1, ..., ℓ−r+1}, we
can check if s ∈ L(CONT(S, r)) in time O(ℓ·r).

Proof. First, a DAG G representing all r-contiguous detectors is created by Algo-
rithm 4.4 in time O(|S|·ℓ·|Σ| + (|Σ|^r − K)·ℓ). Second, to check if s ∈ L(CONT(S, r)),
we match each substring s[i..i+r−1], i = 1, ..., ℓ−r+1, against the corresponding
detectors in G level by level.
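The O(ℓ·r) check of the corollary amounts to verifying that every window of s avoids the self windows at its position. The sketch below is our illustration, with per-position lookup tables standing in for the level-by-level DAG matching:

```python
S = {"01111", "00111", "10000", "10001", "10010", "10110", "11111"}
W = [{s[i:i + 3] for s in S} for i in range(5 - 3 + 1)]  # self windows per position

def is_cont_detector(s, W, r):
    """s in L(CONT(S, r)) iff every window of s avoids the self windows
    at its position: l - r + 1 lookups on r-character keys, i.e. O(l*r)."""
    return all(s[i:i + r] not in W[i] for i in range(len(W)))

print([t for t in ("01011", "11011", "01111") if is_cont_detector(t, W, 3)])
# ['01011', '11011']
```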
Figure 4.3 presents a diagram of the algorithm.
4.5 Experiments
In our experiments, we use the popular flow-based NetFlow dataset [86] and a
random dataset. The flow-based NetFlow dataset, generated from the packet-based DARPA
dataset [1], is used for Experiment 1. It was encoded as a set of 105,033 binary strings
of length 49. A randomly created dataset containing 50,000 strings of length
100 is used for Experiment 2. The distribution of parameter values and the running times (in
milliseconds) of the experiments are shown in Table 4.2 and Table 4.3. All experiments
are performed on a laptop computer with Windows 8 Pro 64-bit, an Intel Core i5-3210M
CPU at 2.50 GHz (4 CPUs), and 4 GB RAM.
In Table 4.2, the runtimes of the r-chunk detector-based NSA in [32] on Experiments
1 and 2 are in columns a and c, respectively, and the runtimes of the proposed
Chunk-NSA on Experiments 1 and 2 are in columns b and d, respectively.
As can be seen from the table, there is a positive correlation between the
threshold r and the ratio of the runtime of the NSA in [32] to the runtime of Chunk-NSA
(columns a/b and c/d).
Table 4.2: Comparison of Chunk-NSA with r-chunk detector-based NSA.

        Experiment 1              Experiment 2
 r      a      b    a/b          c      d    c/d
10  1,330    454    2.9      1,490    482    3.1
11  1,395    439    3.2      1,633    472    3.5
12  1,564    454    3.4      1,637    360    4.5
13  1,767    435    4.1      2,134    453    4.7
14  1,771    418    4.2      2,276    451    5.0
15  2,092    486    4.3      2,793    450    6.2
16  1,985    437    4.5      3,086    365    8.5
17  2,071    391    5.3      4,079    427    9.6
18  2,249    410    5.5      4,509    422   10.7
19  2,345    375    6.3      5,312    470   11.3
20  2,859    359    7.0      6,796    437   15.6
Fig. 4.5 shows that the ratios get closer to r as r increases, which relatively fits
the theoretical analysis in the previous section. In the figure, ratio 1 and ratio 2 are the ratios
of the runtime of the r-chunk detector-based NSA to the runtime of Chunk-NSA in Experiments
1 and 2, respectively. As presented in the bottom row of Table 4.2,
when r = 20, the training of Chunk-NSA on Experiments 1 and 2 is 7.0 and
15.6 times faster than that of the 20-chunk detector-based NSA in [32], respectively.
Figure 4.5: Comparison of ratios of runtime of r-chunk detector-based NSA to runtime of Chunk-NSA
Similarly, in Table 4.3, the runtimes of the r-contiguous detector-based
NSA in [32] on Experiments 3 and 4 are in columns a and c, respectively,
and the runtimes of the proposed Cont-NSA on Experiments 3 and 4 are in
Table 4.3: Comparison of proposed Cont-NSA with r-contiguous detector-based NSA.

        Experiment 3               Experiment 4
 r       a      b    a/b          c      d    c/d
10    1453   1125    1.3       1500    953    1.6
11    1437   1032    1.4       1704   1109    1.5
12    3082   1203    2.6       1906   1063    1.8
13    2344   1047    2.2       2157   1015    2.1
14    2688   1156    2.3       3188   1031    3.1
15    4578   1188    3.9       4203   1235    3.4
16    7181   1254    5.7       6484   1125    5.8
17    9985   1313    7.6       8594   1156    7.4
18   12157   1235    9.8      12923   1047   12.3
19   19652   1500   13.1      18547   1625   11.4
20   24156   1797   13.4      24282   1922   12.6
Figure 4.6: Comparison of ratios of runtime of r-contiguous detector-based NSA to runtime of Cont-NSA
columns b and d, respectively. According to the results described above, there is a positive
correlation between the threshold r and the ratio of the runtime of the NSA in [32] to
the runtime of Cont-NSA (columns a/b and c/d).
Fig. 4.6 shows that the ratios get closer to r as r increases, which relatively
fits the theoretical analysis in the previous section. In the figure, ratio 3 and ratio 4 are
the ratios of the runtime of the NSA to the runtime of Cont-NSA in Experiments 3 and
4, respectively. The results in the bottom row of Table 4.3 show that when r = 20, the
training of Cont-NSA on Experiments 3 and 4 is 13.4 and 12.6 times faster than
that of the 20-contiguous detector-based NSA in [32], respectively.
4.6 Summary
In this chapter, we have introduced a novel fast NSA that generates the r-chunk detec-
tors set in linear time. Empirical results and theoretical proofs show that the proposed
algorithm Chunk-NSA outperforms the benchmark one in [32].
A limitation of Chunk-NSA is that it temporarily uses the two arrays Q and P to
expand the prefix DAG representing all positive detectors. The sizes of Q and P are |Σ|^r
and |S|·r, respectively. However, this drawback may not be a serious problem, because
modern systems provide huge internal memory; it is especially feasible in the binary
case, Σ = {0, 1}.
The main limitation of Cont-NSA is that it takes ℓ·r time in the detection phase.
However, the generation of a DAG presenting all r-contiguous detectors takes only
linear time with respect to the cardinality of S (Theorem 4.2). This therefore provides
additional benefits for operations such as counting the number of detectors or listing
detectors explicitly for distributed detection.
To verify the effectiveness of the proposed approaches, two datasets are adopted
for validation. The results indicate that the proposed approaches greatly
improve the training time. We anticipate that Cont-NSA can be further
developed by using more sophisticated graphs for better detection time; this will be
discussed specifically in our future research. The main contribution of this chapter has been
submitted to an ISI-indexed journal.
Chapter 5
APPLYING HYBRID ARTIFICIAL IMMUNE SYSTEM FOR NETWORK SECURITY
In this chapter, we present our algorithm, called PSA2, a hybrid algorithm
that combines a two-class PSA with a statistical technique to achieve better
intrusion detection performance.
5.1 Introduction
This chapter presents a hybrid algorithm that combines a two-class PSA and a sta-
tistical technique to achieve better intrusion detection performance. This improved
PSA is trained with both self and nonself samples, while a typical PSA is trained with
only one type of samples. Two types of detectors are generated by PSA2 to cover
areas of both the self space and the nonself space. The first type is generated by the original PSA.
Considering nonself as self, we apply the PSA to the new set of self samples to obtain the
second type of detectors. Both types of detectors are used to detect normal and
abnormal data instances in our NIDS.
To the best of our knowledge, there has not been any published attempt at
modeling the PSA or its variants for flow-based NIDS. Our algorithm PSA2 would be the
first example of a flow-based AIS-NIDS. Recently, Fernandes et al. have
commented in their survey that AIS-NIDS should be combined with network traffic
statistics to minimize the obstacle of the amount of network traffic and diminish the
computational burden [35]. Therefore, we believe that our proposed algorithm in this
chapter is the first attempt at combining selection algorithms with statistical techniques.
The rest of the chapter is organized as follows. In the next section, we review
the works most closely related to ours. Section 5.3 presents PSA2 in detail. The
experiments on two datasets are introduced in Section 5.4. The final section concludes the
chapter and discusses some possible future works.
5.2 Related works
Recently, artificial intelligence-based intrusion detection systems have attracted con-
siderable interest from researchers [43]. Various artificial intelligence algorithms are
deployed to build NIDSs (flow-based, network-based, or hybrid ones) or their com-
ponents. Support vector machines (SVMs), fuzzy systems, and rule-based systems are
becoming popular in the context of producing software components for IDSs [96, 61, 91].
We have also seen recent developments in this area with natural computation, such
as AIS [58, 74, 97, 47], and evolutionary algorithms, like particle swarm optimization
(PSO) [28], genetic algorithms (GA) [14], and ant colony optimization (ACO) [63, 95].
Statistical techniques have attracted a lot of attention from researchers in the network
security field over the last three decades [88]. Simple statistical techniques can be used
to describe network traffic and how it varies from normal behavior to
anomalous behavior [16]. However, our review shows that the approach in this chapter
is the first attempt at combining selection algorithms with statistics for the purpose of
intrusion detection.
As mentioned in Section 1.6, one major characteristic of the network flow is that a flow
does not provide any payload. This method processes only packet headers; therefore,
it causes a significant drop in the volume of traffic to be processed. For this reason,
flow-based NIDS currently attracts much attention from researchers, especially for high-
speed networks [99]. For example, a system that operates on lightweight network flows
and uses one-class SVMs for analysis is proposed in [91]. Differently from traditional
anomaly detection systems, the system is trained with malicious rather than with
benign network data. The system is suited to the load of large-scale networks and is
less affected by the typical problems of ordinary anomaly detection systems. A method
that focuses on the traffic flow statistics of attackers who send flows to both the object
port and generally closed ports in the global network is proposed in [69]. This flow-
based attacker detection method accurately identifies hosts sending flows to the object port
as attackers, without any deep packet inspection. An improved block-based neural
network (BBNN) is introduced in [86]. The BBNN is evolved by modifying the original
Genetic Algorithm. The experiment on a packet-based DARPA dataset shows that
the enhanced BBNN outperforms some other algorithms. A centroid-based partitioning
strategy for data preprocessing is proposed in [43]. The method can select a
high-quality subset with far fewer instances from the original training data to build
lightweight NIDSs. Experimental results on the flow-based datasets introduced in [76, 91]
show that it can achieve admirable efficiency in training time and comparable detection
performance. A multilayer perceptron (MLP) neural network and the Gravitational Search
Algorithm (GSA) are used in a NIDS to classify benign and malicious flows with a very
high accuracy rate [70]. A hybrid of two metaheuristic algorithms, PSO and GSA,
called the PSOGSA algorithm, is examined to optimize the interconnection weights of an
MLP neural network [48]. This metaheuristic algorithm helps an MLP-based anomaly
detector to improve its accuracy and false alarm rate. M. J. Chapple et al. [17] and
M. Zolotukhin et al. [99] consider network traffic as a time series, a powerful tool to
describe network traffic evolution patterns, to build effective NIDSs. An extended
version of inductive supervised SVMs, called S4VM, was recently proposed in [49]. This
semi-supervised method is used to develop flow-based NIDSs that can be trained with
both labelled and unlabelled data. It is very effective at detecting anomalies with a small
amount of labelled data.
The above works achieved remarkable results; however, their FARs are relatively
high. Our proposed approach in this chapter, combining NSA, PSA and simple statis-
tical techniques, outperforms all of the mentioned works in terms of DR, ACC and FAR.
5.3 Hybrid positive selection algorithm with chunk detectors
Given ℓ, r, a normal dataset N ⊂ Σ^ℓ, and an abnormal dataset A ⊂ Σ^ℓ, Algo-
rithm 5.1 and Algorithm 5.2 create positive trees and negative trees, respectively. A
new data instance s ∈ Σ^ℓ is detected as self or nonself by Algorithm 5.3.
Algorithm 5.1 Algorithm to generate positive trees
1: procedure SelfTreesGeneration(N, r, TN)
Input: A set of self strings N ⊆ Σ^ℓ, a matching threshold r ∈ {1, ..., ℓ}.
Output: A set TN of ℓ prefix trees presenting positive trees.
2:  for i = 1, ..., ℓ do
3:      create an empty tree TNi
4:  end for
5:  for all s ∈ Nr do       ▷ Nr: the ring representations of the strings in N
6:      for i = 1, ..., ℓ do
7:          insert s[i..i+r−1] into TNi
8:      end for
9:  end for
10: end procedure
Algorithm 5.2 Algorithm to generate negative trees
1: procedure NonselfTreesGeneration(A, r, TA)
Input: A set of nonself strings A ⊆ Σ^ℓ, a matching threshold r ∈ {1, ..., ℓ}.
Output: A set TA of ℓ prefix trees presenting negative trees.
2:  for i = 1, ..., ℓ do
3:      create an empty tree TAi
4:  end for
5:  for all s ∈ Ar do       ▷ Ar: the ring representations of the strings in A
6:      for i = 1, ..., ℓ do
7:          insert s[i..i+r−1] into TAi
8:      end for
9:  end for
10: end procedure
In Algorithm 5.3, Leaf(s, T) is a function that returns the frequency value correspond-
ing to a string s ∈ Σ^r; this value is stored in the leaf of the path labeled by s in
tree T. The counters d1 and d2 count the windows of s that are absent from the positive
trees TN and the negative trees TA, respectively. Four other parameters, t1, t2, t3 and
t4, are also used in PSA2 to control its performance.
Algorithm 5.3 Algorithm PSA2 to detect whether a new data instance s ∈ Σ^ℓ is self or nonself
1: procedure PSA2(N, A, s, r, TN, TA)
Input: A set of self strings N ⊆ Σ^ℓ, a set of nonself strings A ⊆ Σ^ℓ, an unlabeled string s ∈ Σ^ℓ, a matching threshold r ∈ {1, ..., ℓ}.
Output: A set TN of ℓ prefix trees presenting positive trees, a set TA of ℓ prefix trees presenting negative trees, a label of s (self or nonself).
2:  SelfTreesGeneration(N, r, TN)
3:  NonselfTreesGeneration(A, r, TA)
4:  d1 = d2 = d3 = 0
5:  create a string sr as the ring representation of the string s
6:  for i = 1, ..., ℓ do
7:      s′ = sr[i..i+r−1]
8:      if s′ ∉ TNi then
9:          d1 = d1 + 1
10:     end if
11:     if s′ ∉ TAi then
12:         d2 = d2 + 1
13:     end if
14:     if Leaf(s′, TNi) < Leaf(s′, TAi)·t1 then
15:         d3 = d3 + 1
16:     end if
17: end for
18: if d1 > t2 then output s is nonself
19: else if d2 > t3 then output s is self
20: else if d3·t4 > ℓ then output s is nonself
21: else output s is self
22: end if
23: end procedure
Example 5.1. Given t1 = t2 = t3 = t4 = 2, s = 00001, ℓ = 5, r = 3, a self
string set N and a nonself string set A as in Example 1.5: N = {00000, 10110} and
A = {10111, 11111}. The ring form of s is sr = 0000100. Since only two substrings of sr
are found in the positive trees, sr[1..3] ∈ TN1 and sr[2..4] ∈ TN2, we have d1 = 3 > t2
and d2 = 5. As a result, Algorithm 5.3 classifies s as nonself (output in line 18).
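The whole of PSA2 can be sketched compactly. The sketch below is ours: it replaces the ℓ prefix trees with per-position frequency tables, which is functionally equivalent for the membership tests and the Leaf() frequencies:

```python
from collections import Counter

def ring(s, r):
    """Ring representation: wrap the first r-1 characters around."""
    return s + s[:r - 1]

def build_trees(data, r, l):
    """Per-position window-frequency tables; the l prefix trees of
    Algorithms 5.1/5.2 collapsed into Counters (leaf frequencies)."""
    trees = [Counter() for _ in range(l)]
    for s in data:
        sr = ring(s, r)
        for i in range(l):
            trees[i][sr[i:i + r]] += 1
    return trees

def psa2(s, TN, TA, r, t1, t2, t3, t4):
    """Classify s via counts of windows unseen in the self trees (d1),
    unseen in the nonself trees (d2), and statistically nonself-like (d3)."""
    l = len(TN)
    sr = ring(s, r)
    d1 = d2 = d3 = 0
    for i in range(l):
        w = sr[i:i + r]
        if w not in TN[i]:
            d1 += 1
        if w not in TA[i]:
            d2 += 1
        if TN[i][w] < TA[i][w] * t1:
            d3 += 1
    if d1 > t2:
        return "nonself"
    if d2 > t3:
        return "self"
    return "nonself" if d3 * t4 > l else "self"

N, A = {"00000", "10110"}, {"10111", "11111"}
TN, TA = build_trees(N, 3, 5), build_trees(A, 3, 5)
print(psa2("00001", TN, TA, 3, 2, 2, 2, 2))  # nonself
```

Running the sketch on Example 5.1 reproduces d1 = 3 and the nonself verdict.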
5.4 Experiments
All experiments are performed on a laptop computer with Windows 8 Pro 64-bit,
Intel Core i5-3210M, CPU 2.50GHz (4 CPUs), 4GB RAM.
5.4.1 Datasets
In our experiments, we use two popular flow-based datasets: NetFlow [86] and
UT [75]. Similarly to previous studies [86, 49, 48], we select the same 4 features from
the NetFlow dataset as the input of Experiments 1, 5, 6, and 7; 7 features from the
NetFlow dataset as the input of Experiments 2 and 3; and 4 features from the UT dataset as
the input of Experiment 4. Table 5.1 shows detailed information about these features.
Note that the feature Duration is the number of seconds from the start time to the end
time of a flow.
Table 5.1: Features for NIDS.
Experiment   Dataset   Features
1            NetFlow   Packets, Octets, Duration, Flags
2, 3         NetFlow   Packets, Octets, Duration, Src port, Dst port, Flags, IP protocol
4            UT        Packets, Octets, Duration, Flags
5, 6, 7      NetFlow   Packets, Octets, Duration, Flags
The large number of flows in the UT dataset causes a time-consuming training
phase. Therefore, in Experiment 4 we only use all 5,968 unknown-traffic flows as normal
data and a fraction, 10,000 flows, of the other traffic as abnormal data. The ratios of the
training dataset to the testing dataset for Experiments 5, 6, and 7 are 66.6:33.3, 1:1, and
33.3:66.6, respectively. The distribution of flows in the experiments is described in Table 5.2.
5.4.2 Data preprocessing
The preprocessing of the features of the training dataset consists of two steps.
The first step is to map all features to binary string features. After this step, all
string features are constructed for both normal and anomalous data. The second step
is to concatenate the binary string features of every flow. After this step, the training
dataset contains binary strings of equal length. For the testing dataset, each feature is
converted to a binary string using the map of the training data. If a feature value is
not in the range of the corresponding training feature, its binary form is selected
randomly.
Example 5.2. Given a training dataset containing numerical two-feature flows S =
{N, A}, where the set of normal data is N = {(1, 8), (0.5, 6)} and the set of abnormal
data is A = {(1, 9), (1, 6), (0.5, 9)}.
The ranges of the first feature and the second one are {0.5, 1} and {6, 8, 9},
respectively. Therefore, we can use one bit (two bits) to encode the first (the second)
feature. A map of the numeric ranges {0.5, 1} and {6, 8, 9} can be binary string ranges
{0, 1} and {00, 01, 10}, respectively. Finally, the training dataset contains strings of
length 3, S = {N, A}, where N = {101, 000} and A = {110, 100, 010}.
Assume a flow s = (2, 8) is in the testing dataset. The first feature (2 ∉ {0.5, 1})
is randomly encoded by 0 or 1, while the second feature (8 ∈ {6, 8, 9}) is encoded by 01.
The binary form of s is 001 or 101.
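This preprocessing can be sketched as follows. The sketch is our illustration; in particular, choosing the smallest bit width that distinguishes the observed training values is our assumption about the encoding:

```python
import math
import random

def build_maps(rows):
    """Per-feature map value -> fixed-width binary code, using the
    smallest width that distinguishes the observed training values."""
    maps = []
    for j in range(len(rows[0])):
        vals = sorted({row[j] for row in rows})
        width = max(1, math.ceil(math.log2(len(vals))))
        maps.append({v: format(k, f"0{width}b") for k, v in enumerate(vals)})
    return maps

def encode(row, maps):
    """Concatenate the per-feature codes; an unseen test value gets a
    random code of the right width, as described above."""
    out = []
    for v, m in zip(row, maps):
        out.append(m[v] if v in m else random.choice(list(m.values())))
    return "".join(out)

train = [(1, 8), (0.5, 6), (1, 9), (1, 6), (0.5, 9)]
maps = build_maps(train)
print(encode((1, 8), maps))  # 101
```

On the data of Example 5.2 this reproduces the maps {0.5 → 0, 1 → 1} and {6 → 00, 8 → 01, 9 → 10}.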
5.4.3 Performance metrics and parameters
We use three metrics to evaluate each machine learning technique: the detection
rate (DR), the accuracy rate (ACC) and the false alarm rate (FAR).
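Assuming the standard confusion-matrix definitions of these metrics, with attacks as the positive class (the metrics themselves are defined earlier in the thesis; this sketch is ours):

```python
def metrics(tp, tn, fp, fn):
    """DR = TP/(TP+FN), ACC = (TP+TN)/(TP+TN+FP+FN), FAR = FP/(FP+TN),
    with attacks counted as the positive class."""
    dr = tp / (tp + fn)
    acc = (tp + tn) / (tp + tn + fp + fn)
    far = fp / (fp + tn)
    return dr, acc, far

# e.g. 90 attacks caught, 10 missed, 5 false alarms on 100 normal flows:
print(metrics(90, 95, 5, 10))  # (0.9, 0.925, 0.05)
```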
We use a 10-fold cross-validation technique to evaluate our approach in Exper-
iments 1 and 4. In Experiments 2, 3 and 5-7, hold-out validation is used to evaluate
our approach with subsets of the NetFlow dataset, as described in Table 5.2.
Algorithm PSA2 requires 5 integer-valued parameters, r, t1, t2, t3 and t4, which
control the detection performance of the NIDS. To obtain optimal parameter values
for a given dataset, we first predict possible value ranges for them. Then we test
every quintuple (r, t1, t2, t3, t4) in these ranges on a randomly sampled subset,
usually one-tenth, of the testing dataset. Finally, we retest the system on the whole
testing dataset in the neighborhood of the chosen quintuple. Regarding the 10-fold
cross-validation, the quintuple chosen in fold 1 is reused for the other folds
to reduce training time. Additionally, the value of ℓ is determined by the encoding technique
in the data preprocessing. The distribution of parameter values and the number of flows for all
experiments are shown in Table 5.2.
It is challenging to decide which objective function is best for choosing
an optimal quintuple. More specifically, the optimization problem in this research is a
Table 5.2: Distribution of flows and parameters for experiments.
Exp.   ℓ   r  t1  t2  t3  t4  Normal flows                             Attack flows
1     49  10   5   7   8   9  105,033                                  24,538
2     63  10   7   9  14   7  59,980 for training, 45,053 for testing  5,952 for training, 18,586 for testing
3     53   9  12  16   7   8  59,980 for training, 45,053 for testing  5,952 for training, 18,586 for testing
4     41   7   1   7   2   4  5,968                                    10,000
5     49  14   3  37  11   4  70,022 for training, 35,011 for testing  16,359 for training, 8,179 for testing
6     48  12   4  19   7   5  52,516 for training, 52,517 for testing  12,269 for training, 12,269 for testing
7     46  11   2  30   8   3  35,011 for training, 70,022 for testing  8,179 for training, 16,359 for testing
multi-objective optimization problem: we aim to increase the DR while reducing the
FAR at the same time. We select our objective function to reach the highest DR while
keeping FAR as low as possible.
5.4.4 Performance
Table 5.3 illustrates the results of the experiments. Note that some algorithms
in [86] were run with several parameter settings; we select only the run with the
highest detection rate for comparison. We compare the performance of PSA2 with
algorithms from recently published works and with algorithms run in WEKA 3, whose
parameters were optimized by the meta-classifier CVParameterSelection.
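The three reported metrics are the standard confusion-matrix rates, with attack flows as the positive class; a minimal helper makes the definitions explicit:

```python
def metrics(tp, fn, fp, tn):
    """ACC, DR and FAR from confusion-matrix counts (attack = positive class)."""
    acc = (tp + tn) / (tp + fn + fp + tn)   # correctly classified flows
    dr = tp / (tp + fn)                     # detected attacks / all attacks
    far = fp / (fp + tn)                    # false alarms / all normal flows
    return acc, dr, far
```

For example, `metrics(90, 10, 5, 95)` returns (0.925, 0.9, 0.05).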
Table 5.3: Comparison between PSA2 and other algorithms.

Experiment 1
Algorithm                ACC     DR      FAR
PSA2                     0.9879  0.9977  0.0146
BBNN [86]                0.9682  0.9992  0.0514
SVM (linear) [86]        0.9273  0.9992  0.1223
SVM (polynomial) [86]    0.9344  0.9975  0.1071
SVM (RBF) [86]           0.9468  0.9974  0.0839
SVM (sigmoid) [86]       0.9614  0.9974  0.0871
Naive Bayes [86]         0.6777  0.9975  0.5116
Random Forest            0.9943  0.9976  0.0064
Random Tree              0.9941  0.9977  0.0067

Experiment 2
PSA2                     0.9896  0.9676  0.0013
PSOGSA-based MLP [48]    0.9812  0.9645  0.0119
Cuckoo-based MLP [48]    0.9796  0.9622  0.0133
GSA-based MLP [48]       0.9763  0.9564  0.0155
PSO-based MLP [48]       0.9672  0.9369  0.0203
EBP-based MLP [49]       0.9655  —       0.0315
Random Forest            0.9309  0.7634  0
Random Tree              0.7596  0.1771  0.0001

Experiment 3
PSA2 (10% labelled data)                       0.9781  0.9626  0.0157
S4VM (10% labelled, 90% unlabelled data) [49]  0.9196  —       0.0384
Random Forest (10% labelled data)              0.7157  0.0421  0.0063
Random Tree (10% labelled data)                0.7151  0.0400  0.0065

Experiment 4
PSA2                     0.9757  0.9634  0.0035
Naive Bayes              0.6832  0.9972  0.8668
SVM (linear)             0.7315  1.0000  0.7185
SVM (polynomial)         0.7143  0.8788  0.5613
SVM (RBF)                0.6263  1.0000  0.9998
SVM (sigmoid)            0.6263  1.0000  1.0000
Random Forest            0.9777  0.9675  0.0052
Random Tree              0.9766  0.9664  0.0062

Experiment 5
PSA2                     0.9845  0.9979  0.0186
Random Forest            0.9818  0.9997  0.0868

Experiment 6
PSA2                     0.9961  0.9915  0.0028
Random Forest            0.9870  0.9998  0.0636

Experiment 7
PSA2                     0.9847  0.9979  0.0183
Random Forest            0.9901  0.9999  0.0490
The results of Experiment 1 show that PSA2 reaches its best performance when
the parameter quintuple (r, t1, t2, t3, t4) is set to (10, 5, 7, 8, 9), giving a
remarkable FAR of 1.46% and an ACC of 98.79%. Moreover, its DR of 99.77% is ranked
second. BBNN and SVM (linear) have the highest DR, but their FARs are not comparable
with that of PSA2. Both Random Forest and Random Tree perform excellently, with the
highest ACC, admirable DR and very low FAR.
As in the corresponding studies [48, 49], the results of PSA2 in Experiments 2
and 3 are averaged over ten repeated runs. The results of Experiment 2 show that
PSA2 outperforms the four algorithms proposed in [48] in terms of all three metrics,
DR, ACC and FAR. PSA2 is also better than both Random Forest and Random Tree in
terms of ACC and DR.
PSA2 cannot train with unlabelled data, so in Experiment 3 we use only the 10%
labelled portion of the dataset, 5,998 normal flows and 595 attack flows, for the
training phase. Meanwhile, the S4VM algorithm uses 100% of the training dataset, of
which 90% is unlabelled and 10% labelled. The results of this experiment strongly
confirm the efficiency of PSA2 in comparison with the methods proposed in [49]. Both
Random Forest and Random Tree perform badly, with very low ACC and DR.
In Experiment 4, PSA2 is competitive with both Random Forest and Random
Tree. Some SVMs achieve a perfect detection rate of 100%, but their FARs are also
at or near 100%. The poor FARs of these algorithms mean that they cannot recognize
the benign traffic in the dataset, while PSA2, Random Forest and Random Tree can.
In Experiments 5-7, the ACCs and DRs of PSA2 and Random Forest are similar.
However, the FARs of PSA2 are far lower than those of Random Forest.
In all experiments, the FAR of PSA2 is the lowest among the compared
algorithms, except against Random Forest and Random Tree in Experiments 1-3. This
result partly demonstrates the consistency of PSA2 for NIDS.
PSA2 has two drawbacks. First, it must store a map to encode each new sample
into a binary string in the detection phase. The size of the map depends on the
distribution of the flow features' domains; however, in all experiments the map
storage does not exceed 1% of the size of the training dataset. Second, the time
needed to tune the parameters is expensive. Depending on the training data, choosing
optimal parameters took about 2-5 hours in the experiments.
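The stored encoding map of the first drawback can be pictured as one lookup table per flow feature. The sketch below is only an assumed shape, not the thesis's exact mapping or bit widths: each distinct value seen in training gets an index, and values unseen in training fall back to index 0.

```python
def build_maps(training_flows):
    """One table per feature: each distinct raw value observed in
    training is assigned a small integer index."""
    maps = [dict() for _ in training_flows[0]]
    for flow in training_flows:
        for table, value in zip(maps, flow):
            table.setdefault(value, len(table))
    return maps

def encode(flow, maps, width=8):
    """Concatenate the per-feature indices as fixed-width bit strings;
    values unseen in training map to index 0."""
    return "".join(format(table.get(value, 0), "0{}b".format(width))
                   for table, value in zip(maps, flow))
```

The map's footprint grows with the number of distinct feature values in the training data, which is why its size tracks the distribution of the features' domains.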
Another experiment compares PSA2's performance with ring-based and
linear-based string representations. Line 2 of Table 5.4 repeats the performance
metrics of ring string-based PSA2 from Experiment 1 (line 3 in Table 5.3). The best
performance metrics for linear string-based PSA2, obtained under the same
conditions, are given in line 3 of the table. In this case, ℓ = 40 and the optimal
values for the parameters r, t1, t2, t3, and t4 are 9, 4, 8, 7, and 10,
respectively. The results show that ring string-based PSA2 is better than linear
string-based PSA2 in terms of all three metrics, ACC, DR, and FAR.
Table 5.4: Comparison between ring string-based PSA2 and linear string-based PSA2.
Algorithm                                    ACC     DR      FAR
Ring string-based PSA2 (from Experiment 1)   0.9879  0.9977  0.0146
Linear string-based PSA2                     0.9833  0.9973  0.0199
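The practical difference between the two representations is the number of length-r windows that a string of length ℓ exposes to detectors: a linear string has ℓ − r + 1 of them, while a ring string wraps past the end and has ℓ. A small illustration:

```python
def windows(s, r, ring=False):
    """All contiguous length-r substrings of s; a ring wraps around."""
    src = s + s[:r - 1] if ring else s
    return [src[i:i + r] for i in range(len(src) - r + 1)]

print(windows("10110", 3))             # ['101', '011', '110']
print(windows("10110", 3, ring=True))  # ['101', '011', '110', '101', '010']
```

With ring=True the string "10110" yields five windows instead of three; the two wrap-around windows are the extra coverage the ring representation offers.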
5.5 Summary
In this chapter, we present a new PSA, called PSA2, for two-class classification,
developed through a series of works. It has four important features that make it
unique and alleviate some issues in NSAs. Firstly, it uses a ring representation
instead of a linear one for better performance in terms of both detection and
accuracy rates. Secondly, the proposed algorithm applies positive selection to both
types of data, normal samples and abnormal ones, in a uniform framework, while other
PSAs use only one type of sample. This results in good coverage of both the self
space and the nonself space. Thirdly, the process of optimizing the parameters
(r, t1, t2, t3, t4), as well as the method of using the three frequency-related
parameters d1, d2 and d3, plays an important role in improving overall performance.
Fourthly, the algorithm uses a new method to map integer values into binary strings.
To verify the effectiveness of the proposed approach, two different datasets are
used for validation. The experimental results indicate that the proposed approach
produces competitive and consistent classification performance on real datasets.
Moreover, the results of Experiment 3, which uses only 10% of the training dataset,
confirm that PSA2 can detect anomalies with only a small amount of labelled data.
In the future, we plan to combine our algorithms with some machine learning
methods to achieve better detection performance as well as to reduce training time.
Moreover, it would be interesting to further develop techniques for choosing optimal
parameters and for integrating them into new objective functions. The main
contribution of this chapter has been accepted for publication in the proceedings of
a National Conference on Fundamental and Applied IT Research.
CONCLUSIONS
Applying computational intelligence-based techniques is an inevitable trend in
building smart IDSs. This approach helps computer networks become more adaptable to
continuously changing environments with increasingly sophisticated attacks. In this
thesis, we show that AIS, a subfield of computational intelligence, is relatively
successful in building NIDSs, at least at the simulation stage. A series of works is
proposed and investigated to improve NSAs for two popular matching rules, r-chunk
and r-contiguous.
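For reference, the two rules can be sketched directly from their definitions; this is an illustrative implementation, not the optimized algorithms developed in the thesis. An r-chunk detector (a position plus a string of length r) matches a sample whose substring at that position equals the chunk; an r-contiguous detector of full length ℓ matches a sample if the two strings agree in at least r consecutive positions.

```python
def r_chunk_match(sample, pos, chunk):
    """r-chunk rule: detector (pos, chunk) matches if the sample carries
    the chunk verbatim at position pos."""
    return sample[pos:pos + len(chunk)] == chunk

def r_contiguous_match(sample, detector, r):
    """r-contiguous rule: sample and detector (of equal length) agree
    on at least r consecutive positions."""
    run = 0
    for a, b in zip(sample, detector):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False
```

For example, the chunk "011" at position 1 matches "10110", and "10110" matches "10011" under the 2-contiguous rule because the first two symbols agree.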
Contributions of the thesis
The major contributions of this research are:
1. Propose a ring representation of data instead of a linear one for better
performance in terms of both detection rate and accuracy rate.
2. Propose an algorithm, PNSA, that combines two selection algorithms in a uniform
framework for compact representation of data. The performance of the algorithm is
supported by both experimental results and theoretical proofs.
3. Propose an NSA with variable-length detectors, VNSA, to generate complete and
non-redundant detector sets as well as to reduce detector storage and
classification time.
4. Propose an r-chunk detector-based NSA, Chunk-NSA, and show both experimentally
and theoretically that it is r times faster in data training than the most
recently published algorithms.
5. Propose PSA2, a hybrid algorithm that combines PSA with some statistical
approaches to achieve better intrusion detection performance compared with some
recently published works.
6. Propose a data conversion technique that transforms data into a suitable binary format.
One minor contribution of this thesis is the algorithm Cont-NSA, based on the
r-contiguous matching rule, together with a proof that it is approximately r times
faster in data training than a recently published algorithm.
Future works
In the future, we would like to:
• Combine our algorithms with some machine learning methods to achieve better
detection performance.
• Further develop techniques that can choose optimal parameters and integrate them
into new objective functions for hybrid NIDSs.
• Improve the proposed algorithms to apply them to other data types; different data
representations and matching rules are also future research directions.
• Further optimize Cont-NSA towards a better detection time of O(ℓ) and an optimal
training time of O(|S|ℓ).
In a nutshell, this thesis has reviewed important works related to the research
topic, has proposed some improvements of selection algorithms, and has verified the
effectiveness of the proposed algorithms by experiments and proofs. The obtained
results satisfy the stated research objectives. However, they are still modest and
should be improved further in the future. The author would welcome comments from
scientists and other readers interested in the subject so that this work can be
further refined.
Published works
A1. N. V. Truong and P. D. Lam, “Improving negative selection algorithm in artificial
immune systems for computer virus detection,” Journal of Science and Technology,
Thai Nguyen University, 72(06):53–58, 2010.
A2. N. V. Truong, V. D. Quang and T. V. Ha, “A fast r-chunk detector-based negative se-
lection algorithm,” Journal of Science and Technology, Thai Nguyen University, 90(02):55–
58, 2012.
A3. N. V. Truong and T. V. Ha, “Another look at r-chunk detector-based negative
selection algorithm,” Journal of Science and Technology, Thai Nguyen University,
102(02):45–50, 2013.
A4. N. V. Truong, N. X. Hoai, and L. C. Mai, “A Novel Combination of Negative
and Positive Selection in Artificial Immune Systems,” Vietnam National Univer-
sity, Hanoi Journal of Science: Computer Science and Communication Engineering,
31(1):22–31, 2015.
A5. N. V. Truong, P. D. Lam, and V. D. Quang, “Some Improvements of Selection
Algorithms for Spam Email Filtering,” Journal of Science and Technology, Thai
Nguyen University, 151(06):85–91, 2016.
A6. N. V. Truong and N. X. Hoai, “An improved positive selection algorithm for
flow-based intrusion detection,” Proceedings of the 2nd National Conference on
Fundamental and Applied IT Research (FAIR), 2019 (Accepted).
Bibliography
[1] DARPA Dataset. https://www.ll.mit.edu/r-d/datasets. [accessed 20-Mar-2019].
[2] ADFA Datasets. http://www.unsw.adfa.edu.au/australian-centre-for-cyber-
security/cybersecurity/ADFA-IDS-Datasets/. [accessed 20-July-2018].
[3] KDD99 Dataset. http://kdd.ics.uci.edu/databases/kddcup99. [accessed 20-Mar-
2018].
[4] NSL-KDD Dataset. https://www.unb.ca/cic/datasets/nsl.html. [accessed 25-
July-2019].
[5] S. Afaneha, R. A. Zitarb, and A. A. Hamamic. Virus detection using clonal selec-
tion algorithm with genetic algorithm (VDC algorithm). Applied Soft Computing,
13:239–246, 2013.
[6] M. Ayara, J. Timmis, R. de Lemos, L. N. de Castro, and R. Duncan. Nega-
tive selection: How to generate detectors. In Proceedings of the 1st International
Conference on Artificial Immune Systems (ICARIS), pages 89–98, 2002.
[7] A. S. A. Aziz, M. Salama, A. ella Hassanien, and S. E. O. Harafi. Detectors gen-
eration using genetic algorithm for a negative selection inspired anomaly network
intrusion detection system. In Proceedings of the FedCSIS, pages 597–602, 2012.
[8] K. Bache and M. Lichman. UCI Machine Learning Repository.
http://archive.ics.uci.edu/ml. [accessed 20-July-2016].
[9] J. Balthrop, F. Esponda, S. Forrest, and M. Glickman. Coverage and generaliza-
tion in an artificial immune system. In Proceedings of Genetic and Evolutionary
Computation Conference (GECCO), pages 3–10, 2002.
[10] J. Balthrop, S. Forrest, and M. Glickman. Revisiting LISYS: Parameters and
normal behavior. In Proceedings of the Congress on evolutionary computation,
pages 1045–1050, 2002.
[11] F. Barani. A hybrid approach for dynamic intrusion detection in ad hoc networks
using genetic algorithm and artificial immune system. In Proceedings of the Iranian
Conference on Intelligent Systems (ICIS), pages 1–6, 2014.
[12] D. K. Bhattacharyya and J. K. Kalita. Network anomaly detection: A machine
learning perspective. CRC Press, 2013.
[13] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita. Network anomaly de-
tection: methods, systems and tools. IEEE communications surveys & tutorials,
16(1):303–336, 2014.
[14] R. Bronte, H. Shahriar, and H. M. Haddad. A signature-based intrusion detection
system for web applications based on genetic algorithm. In Proceedings of the
International Conference on Security of Information and Networks, pages 32–39,
2016.
[15] T. C. Butler, M. Kardar, and A. K. Chakraborty. Quorum sensing allows T cells
to discriminate between self and nonself. Proceedings of the National Academy of
Sciences, 110(29):11833–11838, 2013.
[16] C. Callegari and N. Cyprus. Statistical approaches for network anomaly detection.
In Proceedings of the 4th International Conference on Internet Monitoring and
Protection (ICIMP), pages 24–28, 2009.
[17] M. J. Chapple, T. E. Wright, and R. M. Winding. Flow anomaly detection in
firewalled networks. In Proceedings of the Securecomm and Workshops, pages 1–6,
2006.
[18] S. Chen. Optimized multilevel immune learning algorithm in abnormal detection.
Information Technology Journal, 12(3):514–517, 2013.
[19] D. Dasgupta. Artificial Immune Systems and Their Applications. Springer-Verlag,
Berlin Heidelberg, 1998.
[20] D. Dasgupta and R. Azeem. An investigation of negative authentication systems.
In Proceedings of the 3rd International Conference on Information Warfare and
Security, pages 117–126, 2008.
[21] D. Dasgupta and Y. Cao. An immunogenetic approach to spectra recognition. In
Proceedings of the Genetic and Evolutionary Computation Conference (GECCO),
pages 149–155, 1999.
[22] D. Dasgupta and S. Forrest. Novelty detection in time series data using ideas
from immunology. In Proceedings of the International Conference on Intelligent
Systems, pages 82–87, 1996.
[23] D. Dasgupta and F. Gonzalez. An immunity-based technique to characterize in-
trusions in computer networks. IEEE Transactions on Evolutionary Computation,
6:281–291, 2002.
[24] D. Dasgupta and F. Nino. A comparison of negative and positive selection algo-
rithms in novel pattern detection. In Proceedings of the International Conference
on Systems, Man, and Cybernetics, pages 125–130, 2000.
[25] D. Dasgupta and S. Saha. Password security through negative filtering. In Proceed-
ings of the International Conference on Emerging Security Technologies (EST),
pages 83–89, 2010.
[26] D. Dasgupta, S. Yu, and N. S. Majumdar. MILA-multilevel immune learning al-
gorithm. In Proceedings of the Genetic and Evolutionary Computation Conference
(GECCO), pages 183–194. Springer, 2003.
[27] L. N. de Castro and J. Timmis. Artificial Immune Systems: A New Computational
Intelligence Approach. Springer-Verlag New York, Inc. Secaucus, NJ, USA, 2002.
[28] K. S. Desale and R. Ade. Genetic algorithm based feature selection approach for
effective intrusion detection system. In Proceedings of the International Conference
on Computer Communication and Informatics (ICCCI), pages 1–6, 2015.
[29] P. D'haeseleer. An immunological approach to change detection: Theoretical re-
sults. In Proceedings of the IEEE Computer Security Foundations Workshop, pages
18–26, 1996.
[30] P. D’haeseleer, S. Forrest, and P. Helman. An immunological approach to change
detection: algorithms, analysis and implications. In Proceedings of IEEE Sympo-
sium on Security and Privacy, pages 110–119, 1996.
[31] M. Elberfeld and J. Textor. Efficient algorithms for string-based negative selec-
tion. In Proceedings of the International Conference on Artificial Immune Systems,
pages 109–121, 2009.
[32] M. Elberfeld and J. Textor. Negative selection algorithms on strings with efficient
training and linear-time classification. Theoretical Computer Science, 412(6):534
– 542, 2011.
[33] F. Esponda, E. S. Ackley, and S. Forrest. Online negative databases. In Proceedings
of the International Conference on Artificial Immune Systems (ICARIS), pages
175–188. Springer, 2004.
[34] F. Esponda, S. Forrest, and P. Helman. A formal framework for positive and neg-
ative detection schemes. IEEE Transactions on Systems, Man, and Cybernetics,
Part B (Cybernetics), 34(1):357–373, 2004.
[35] D. A. Fernandes, M. M. Freire, P. A. Fazendeiro, and P. R. Inácio. Applications of
artificial immune systems to computer security: A survey. Journal of Information
Security and Applications, 35:138 – 159, 2017.
[36] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff. A sense of self for
UNIX processes. In Proceedings of the IEEE Symposium on Research in Security
and Privacy, pages 120–128, 1996.
[37] S. Forrest, B. Javornik, R. E. Smith, and A. S. Perelson. Using genetic algorithms
to explore pattern recognition in the immune system. Evolutionary Computation,
1:191–211, 1993.
[38] S. Forrest, A. S. Perelson, L. Allen, and R. Cherukuri. Self-nonself discrimination
in a computer. In Proceedings of the IEEE Symposium on Security and Privacy, pages
202–212, 1994.
[39] Z. Fuyong and Q. Deyu. Run-time malware detection based on positive selection.
Journal in Computer Virology, 7:267–277, 2011.
[40] Z. Fuyong and Q. Deyu. A positive selection algorithm for classification. Journal
Computational Information Systems, 7:207–215, 2012.
[41] A. A. Ghorbani, W. Lu, and M. Tavallaee. Network intrusion detection and pre-
vention: concepts and techniques. Springer Science & Business Media, 2009.
[42] F. González, D. Dasgupta, and J. Gómez. The effect of binary matching rules in
negative selection. In Proceedings of the Genetic and Evolutionary Computation
Conference (GECCO), pages 195–206, 2003.
[43] C. Guo, Y.-J. Zhou, Y. Ping, S.-S. Luo, Y.-P. Lai, and Z.-K. Zhang. Efficient
intrusion detection using representative instances. Computers & Security, 39,
Part B:255 – 267, 2013.
[44] X. Hang and H. Dai. Applying both positive and negative selection to supervised
learning for anomaly detection. In Proceedings of the Conference on Genetic and
Evolutionary Computation (GECCO), pages 345–352, 2005.
[45] P. K. Harmer, P. D. Williams, G. H. Gunsch, and G. B. Lamont. An artificial im-
mune system architecture for computer security applications. IEEE Transactions
on Evolutionary Computation, 6(3):252–280, 2002.
[46] S. Hofmeyr. An immunological model of distributed detection and its application
to computer security. PhD thesis, The University of New Mexico, ALbuquerque,
NM, 1999.
[47] I. Dutt, S. Borah, and I. K. Maitra. Intrusion detection system using artificial
immune system. International Journal of Computer Applications, 144(12):19–22,
2016.
[48] Z. Jadidi, V. Muthukkumarasamy, and E. Sithirasenan. Metaheuristic algorithms
based flow anomaly detector. In Proceedings of the Asia-Pacific Conference on
Communications (APCC), pages 717–722, 2013.
[49] Z. Jadidi, V. Muthukkumarasamy, E. Sithirasenan, and K. Singh. Flow-based
anomaly detection using semisupervised learning. In Proceedings of the Interna-
tional Conference on Signal Processing and Communication Systems (ICSPCS),
pages 1–5, 2015.
[50] Z. Ji. Negative Selection Algorithms: from the Thymus to V-detector. PhD thesis,
The University of Memphis, 2006.
[51] Z. Ji and D. Dasgupta. Revisiting negative selection algorithms. Evolutionary
Computation, 15:223–251, 2007.
[52] L. Jim and M. Gregory. A review of artificial immune system based security
frameworks for manet. International Journal of Communications, Network and
System Sciences, 9(1):1–18, 2016.
[53] J. Kim. Integrating Artificial Immune Algorithms for Intrusion Detection.
PhD thesis, University College London, 2002.
[54] J. Kim and P. J. Bentley. An Evaluation of Negative Selection in an Artificial
Immune System for Network Intrusion Detection. In Proceedings of the Genetic
and Evolutionary Computation Conference (GECCO), pages 1330–1337, 2001.
[55] J. Kim, P. J. Bentley, U. Aickelin, J. Greensmith, G. Tedesco, and J. Twycross.
Immune system approaches to intrusion detection – a review. Natural Computing,
6(4):413–466, 2007.
[56] A. Košmrlj, A. K. Jha, E. S. Huseby, M. Kardar, and A. K. Chakraborty. How
the thymus designs antigen-specific and self-tolerant T cell receptor sequences.
Proceedings of the National Academy of Sciences, 105(43):16671–16676, 2008.
[57] A. Košmrlj, E. L. Read, Y. Qi, T. M. Allen, M. Altfeld, S. G. Deeks, F. Pereyra,
M. Carrington, B. D. Walker, and A. K. Chakraborty. Effects of thymic selection
of the T-cell repertoire on HLA class I-associated control of HIV infection. Nature,
465(7296):350–354, 2010.
[58] V. D. Kotov and V. Vasilyev. Immune model based approach for network in-
trusion detection. In Proceedings of the International Conference on Security of
Information and Networks (SIN), pages 233–237, 2010.
[59] W. Ma, D. Tran, and D. Sharma. Negative selection with antigen feedback in
intrusion detection. In Proceedings of the International Conference on Artificial
Immune Systems (ICARIS), pages 200–209, 2008.
[60] M. V. Mahoney and P. K. Chan. An analysis of the 1999 DARPA/Lincoln Lab-
oratory evaluation data for network anomaly detection. In Proceedings of the In-
ternational Workshop on Recent Advances in Intrusion Detection, pages 220–237,
2003.
[61] C. A. Martínez, G. I. Echeverri, and A. G. C. Sanz. Malware detection based on
cloud computing integrating intrusion ontology representation. In Proceedings of
the IEEE Latin-American Conference on Communications, pages 1–6, 2010.
[62] J. McHugh. Testing intrusion detection systems: A critique of the 1998 and 1999
DARPA intrusion detection system evaluations as performed by Lincoln Labo-
ratory. ACM Transactions on Information and System Security, 3(4):262–294,
2000.
[63] T. Mehmod and H. B. M. Rais. Ant colony optimization and feature selection
for intrusion detection. In Advances in Machine Learning and Signal Processing,
pages 305–312. Springer, 2016.
[64] R. Murugesan and V. N. Kumar. A Fast Algorithm for Solving JSSP. European
Journal of Scientific Research, 64:579–586, 2011.
[65] P. Ning and S. Jajodia. Intrusion detection techniques. The Internet Encyclopedia,
2003.
[66] L. X. Peng and Y. F. Chen. Positive selection-inspired anomaly detection model
with artificial immune. In Proceedings of the International Conference on Cyber-
Enabled Distributed Computing and Knowledge Discovery (CyberC), pages 56–59,
2014.
[67] P. H. Pisani, A. C. Lorena, and A. C. Carvalho. Adaptive positive selection for
keystroke dynamics. J. Intell. Robotics Syst., 80(1):277–293, 2015.
[68] D. Poole, A. Mackworth, and R. Goebel. Computational Intelligence: A Logical
Approach. Oxford University Press, Oxford, UK, 1997.
[69] Y. Sawaya, A. Kubota, and Y. Miyake. Detection of attackers in services using
anomalous host behavior based on traffic flow statistics. In Proceedings of the
International Symposium on Applications and the Internet (SAINT), pages 353–
359, 2011.
[70] M. Sheikhan and Z. Jadidi. Flow-based anomaly detection in high-speed links us-
ing modified GSA-optimized neural network. Neural Computing and Applications,
24(3-4):599–611, 2014.
[71] G. C. Silva and D. Dasgupta. A Survey of Recent Works in Artificial Immune
Systems, pages 547–586. World Scientific, 2016.
[72] G. C. Silva, R. M. Palhares, and W. M. Caminhas. Immune inspired fault detection
and diagnosis: A fuzzy-based approach of the negative selection algorithm and
participatory clustering. Expert Systems with Applications, 39:12474–12486, 2012.
[73] K. B. Sim and D. W. Lee. Modeling of Positive Selection for the Development
of a Computer Immune System and a Self-Recognition Algorithm. International
Journal of Control, Automation, and Systems, 1:453–458, 2003.
[74] T. S. Sobh and W. M. Mostafa. A cooperative immunological approach for de-
tecting network anomaly. Applied Soft Computing, 11(1):1275 – 1283, 2011.
[75] A. Sperotto, R. Sadre, F. Vliet, and A. Pras. A labeled data set for flow-based
intrusion detection. In Proceedings of the IEEE International Workshop on IP
Operations and Management, pages 39–50, 2009.
[76] A. Sperotto, G. Schaffrath, R. Sadre, C. Morariu, A. Pras, and B. Stiller. An
Overview of IP Flow-Based Intrusion Detection. IEEE Communications Surveys &
Tutorials, 12(3):343–356, 2010.
[77] T. Stibor. On the appropriateness of negative selection for anomaly detection and
network intrusion detection. PhD thesis, TU Darmstadt, 2006.
[78] T. Stibor, K. M. Bayarou, and C. Eckert. An investigation of R-chunk detector
generation on higher alphabets. In Proceedings of the Genetic and Evolutionary
Computation Conference (GECCO), pages 299–307, 2004.
[79] T. Stibor, P. Mohr, J. Timmis, and C. Eckert. Is negative selection appropriate for
anomaly detection? In Proceedings of the Genetic and Evolutionary Computation
Conference (GECCO), pages 321–328, 2005.
[80] T. Stibor, J. Timmis, and C. Eckert. A comparative study of real-valued negative
selection to statistical anomaly detection techniques. Lecture notes in Computer
science, 3627:262–275, 2005.
[81] Y. Tan. Anti-Spam Techniques Based on Artificial Immune System. CRC Press,
2016.
[82] J. Textor. Efficient negative selection algorithms by sampling and approximate
counting. In Proceedings of the International Conference on Parallel Problem
Solving from Nature, pages 32–41. Springer Berlin Heidelberg, 2012.
[83] J. Textor. Search and learning in the immune system: models of immune surveil-
lance and negative selection. PhD thesis, University of Lübeck, 2012.
[84] J. Textor, K. Dannenberg, and M. Liśkiewicz. A generic finite automata based
approach to implementing lymphocyte repertoire models. In Proceedings of the
Conference on Genetic and Evolutionary Computation (GECCO), pages 129–136,
2014.
[85] J. Timmis, A. Hone, T. Stibor, and E. Clark. Theoretical advances in artificial
immune systems. Theoretical Computer Science, 403(1):11–32, 2008.
[86] Q. A. Tran, F. Jiang, and J. Hu. A real-time netFlow-based intrusion detec-
tion system with improved BBNN and high-frequency field programmable gate
arrays. In Proceedings of the IEEE International Conference on Trust, Security
and Privacy in Computing and Communications, pages 201–208, 2012.
[87] J. Vykopal. Flow-based Brute-force Attack Detection in Large and High-speed
Networks. Dissertations, Masaryk University, 2013.
[88] Y. Wang. Statistical Techniques for Network Security: Modern Statistically-Based
Intrusion Detection and Protection. IGI Global, 2008.
[89] S. T. Wierzchoń. Discriminative power of the receptors activated by k-contiguous
bits rule. Journal of Computer Science and Technology, 1(3):1–13, 2000.
[90] S. T. Wierzchoń. Generating optimal repertoire of antibody strings in an artifi-
cial immune system. In Proceedings of the Symposium on Intelligent Information
Systems, pages 119–133, 2000.
[91] P. Winter, E. Hermann, and M. Zeilinger. Inductive intrusion detection in flow-
based network data using one-class support vector machines. In Proceedings of the
International Conference on New Technologies, Mobility and Security (NTMS),
pages 1–5, 2011.
[92] S. X. Wu and W. Banzhaf. The use of computational intelligence in intrusion
detection systems: A review. Applied Soft Computing, 10(1):1–35, 2010.
[93] B. Xu, W. Luo, X. Pei, M. Zhang, and X. Wang. On average time complexity of
evolutionary negative selection algorithms for anomaly detection. In Proceedings
of the First ACM/SIGEVO Summit on Genetic and Evolutionary Computation,
pages 631–638. ACM, 2009.
[94] H. Yang, T. Li, X. Hu, F. Wang, and Y. Zou. A survey of artificial immune system
based intrusion detection. The Scientific World Journal, 2014.
[95] X. Yang and Z. Hui. Intrusion detection alarm filtering technology based on ant
colony clustering algorithm. In Proceedings of the International Conference on
Intelligent Systems Design and Engineering Applications (ISDEA), pages 470–
473, 2015.
[96] D. Y. Yeung and Y. Ding. Host-based intrusion detection using dynamic and
static behavioral models. Pattern Recognition, 36:229–243, 2003.
[97] J. Zeng, X. Liu, T. Li, G. Li, H. Li, and J. Zeng. A novel intrusion detection
approach learned from the change of antibody
concentration in biological immune response. Applied Intelligence, 35(1):41–62,
2011.
[98] D. Zhao and W. Luo. Real-valued negative databases. In Proceedings of the
European Conference on Artificial Life (ECAL), pages 884–890, 2013.
[99] M. Zolotukhin, T. Hämäläinen, T. Kokkonen, and J. Siltanen. Online detection of
anomalous network flows with soft clustering. In Proceedings of the International
Conference on New Technologies, Mobility and Security (NTMS), pages 1–5, 2015.