Nguyễn Văn Trường và đtg<br />
<br />
Tạp chí KHOA HỌC & CÔNG NGHỆ<br />
<br />
80(04): 121 - 125<br />
<br />
A NEW APPROACH FOR GENERATING A COMPLETE DETECTOR<br />
REPERTOIRE IN ARTIFICIAL IMMUNE SYSTEMS<br />
Nguyen Van Truong1, Pham Dinh Lam2<br />
1<br />
<br />
College of Education – TNU; 2 Board of Information Technology - TNU<br />
<br />
ABSTRACT<br />
Generation of the Detector set is a key problem in Artificial Immune Systems (AIS) due to cost of<br />
time and space. A weakness in many detector generating algorithms was random generation of<br />
detectors so that many candidate detectors may be discarded. We present a novel data structure for<br />
extracting data from protected self set. That helps to produce all possible detectors, or a complete<br />
detector repertoire, with lower time and space complexities compare to recent novel approaches.<br />
Theoretical analysis and experimental results show that our approach is effective and feasible.<br />
These new valuable characteristics can make complex AIS have the highest ability to detect data<br />
changes and to reduce false detection.<br />
Keywords: Artificial immune system, Intrusion detection system, self set, detector set, complete<br />
detector repertoire, false detection<br />
<br />
INTRODUCTION*<br />
It is impractical for security software to detect<br />
all intrusions in computer networks. Many<br />
network technologies on security came into<br />
being. The more successful one is based on<br />
AIS, which illustrates biology immune system<br />
on computers. It is evaluated as a new and<br />
effective soft computing method in the field<br />
of network security and information security.<br />
Especially, it is suitable for building up the<br />
Intrusion Detection Systems (IDS) that can<br />
protect computer systems against intruders<br />
and the destruction of computer viruses or<br />
other malicious software system.<br />
This paper is concerned with one aspect of<br />
computer security: ability of detecting all<br />
kinds of changes or attacks, both known and<br />
unknown ones. The method explored involves<br />
the most discussed immune models: Negative<br />
selection algorithm. In the algorithm,<br />
generating the detector set plays a very<br />
important role and helps to increase the<br />
overall performance of AIS. Our method is<br />
to concentrate on producing complete<br />
repertoire for negative selection in not only<br />
security systems, but also in problem<br />
requiring some tolerance of noise, or<br />
involving dynamic streams of data (such as<br />
system call sequences).<br />
*<br />
<br />
Tel: 0915016063; Email: nvtruongtn@gmail.com<br />
<br />
There are many researches on intrusion<br />
detection generating, such as Exhaustive<br />
Detector Generating Algorithm, Linear Time<br />
Detector generating algorithm, the Greedy<br />
Detector Generating Algorithm [2], [7]. Our<br />
algorithm involves a special data structure<br />
called Location table. It plays an important<br />
part in our algorithm and leads to reduce both<br />
time and space complexities. Furthermore,<br />
our approach has many advantages when<br />
applying in dynamic environment as<br />
mentioned in the 5th section.<br />
The remaining of the paper is organized as<br />
follows. In the 2nd section, we give a brief<br />
literature review of AIS. Our technique of<br />
generating complete repertoire, the main part of<br />
the paper, is presented in the 3rd section. Some<br />
analyses and experiments are given in the 4th<br />
section. The 5th section concludes the paper.<br />
LITERATURE REVIEW<br />
In this section, we first described negative<br />
selection algorithm, very commonly used<br />
algorithm in AISs. After that, we present a bit<br />
some recent researches in the field.<br />
Artificial<br />
negative<br />
selection<br />
is<br />
a<br />
computational imitation of self/nonself<br />
discrimination, firstly designed as a change<br />
detection method. This mechanism is first<br />
given by Forrest et al. [1]. The outline of a<br />
typical negative selection algorithm contains<br />
two stages [10].<br />
121<br />
<br />
Nguyễn Văn Trường và đtg<br />
<br />
Tạp chí KHOA HỌC & CÔNG NGHỆ<br />
<br />
Figure 1. Original model of detector generation<br />
<br />
In the generation stage (Fig. 1), the detectors<br />
are generated by some random process and<br />
censored by trying to match self samples<br />
taken from set S. Those candidates that match<br />
are eliminated and the rest are kept as<br />
detectors in set D. In the detection stage (Fig.<br />
2), the collection of detectors (or detector set)<br />
is used to check whether an incoming data<br />
instance is self or nonself. If it matches any<br />
detector, it is claimed as nonself or an<br />
anomaly. This description is limited to some<br />
extent, but conveys the essential idea.<br />
<br />
Figure 2. Detection of new instances<br />
<br />
There have never been Vietnamese<br />
individuals or groups have studied<br />
systematically artificial immune system and<br />
its application, except that NC Group (Natural<br />
Computing) of the Military Technical<br />
Academy. However, it only studied some<br />
theory aspects without experiments [5]. Our<br />
122<br />
<br />
80(04): 121 - 125<br />
<br />
recent research evolves negative selection<br />
algorithm, but it only concentrates on how to<br />
detect effectively by using given location<br />
table without generating detectors [6]. The<br />
approach therefore is not suitable for<br />
distributed environment. There are ongoing<br />
studies of many foreign researches in the field<br />
[7]. But their algorithms’ complexities are<br />
much lager than our ones. The details of our<br />
algorithm are described in the following<br />
section.<br />
GENERATING COMPLETE REPERTOIRE<br />
In the Fig. 3, we illustrate the process for<br />
generating a detector set by using the number<br />
of five contiguous matches required for a<br />
match. The string to be protected is logically<br />
segmented into five equal-length “self”<br />
strings. To generate the detector, random<br />
strings are produced and matched against<br />
each of the self strings. The first two strings,<br />
00000111 and 00000010, are eliminated<br />
because they both match self string 00000110<br />
at at least five contiguous positions. The<br />
string 11101001 fails to match any string in<br />
the self at at least five contiguous positions,<br />
so it is accepted into the detector set.<br />
<br />
Figure 3. Illustration of generating Detector set<br />
<br />
Our approach is derived from previous work<br />
[6]. Data space is often huge, so AIS need<br />
huge memory storage. This also leads to<br />
increase<br />
of<br />
time<br />
complexity.<br />
Our<br />
improvement has lower complexity than that<br />
of current fastest algorithm. Detail notations<br />
are described in the following before giving<br />
our algorithm.<br />
<br />
Nguyễn Văn Trường và đtg<br />
<br />
Tạp chí KHOA HỌC & CÔNG NGHỆ<br />
<br />
- U is the set of all possible binary strings of<br />
length l, and r is the matching threshold.<br />
- s is the binary string, s ∈ U.<br />
- D is the detector set, D ⊂ U.<br />
- A is two dimensions matrix presenting<br />
location table with the size 2r.(l-r+1)<br />
The pseudo code of the first part of the<br />
algorithm to create table A is described as<br />
follow.<br />
Initialize all cells of A to 0;<br />
For each s string to be protected, s ∈ U<br />
For j ∈ [1, l - r+1]<br />
Begin for<br />
i = is an integer number of sub-string of s<br />
whose length is r and starting position is j<br />
within the string s;<br />
A[i, j] = 1;<br />
End for<br />
For example, given self set S containing five<br />
strings to be protected S = {s1 = 01011; s2 =<br />
11001; s3 = 10110; s4 = 00101; s5 = 01010}; l<br />
= 5, r = 3. Applying our algorithm we had the<br />
table A below.<br />
0<br />
1<br />
2<br />
3<br />
<br />
1<br />
0<br />
1<br />
1<br />
0<br />
<br />
2<br />
0<br />
0<br />
1<br />
1<br />
<br />
3<br />
0<br />
1<br />
1<br />
1<br />
<br />
4<br />
<br />
0<br />
<br />
1<br />
<br />
0<br />
<br />
5<br />
<br />
1<br />
<br />
1<br />
<br />
1<br />
<br />
6<br />
<br />
1<br />
<br />
0<br />
<br />
1<br />
<br />
7<br />
<br />
0<br />
<br />
0<br />
<br />
0<br />
<br />
For s1 = 01011: with j = 1 we calculate i =<br />
(010)2 = (2)10. Thus, A[2,1] equals to 1. A[6,2]<br />
equals to 0 because there are no sub-string<br />
110 (or 6 in integer form) of all five strings<br />
starting in 2nd position. Other values in the<br />
table are calculated in a similar way.<br />
The pseudo code of the second part of the<br />
algorithm for generating complete repertoire<br />
is as follow. In the algorithm, the function<br />
right(d, r-1) returns a r – 1 bit string from the<br />
end of the bit string d.<br />
D=∅;<br />
<br />
80(04): 121 - 125<br />
<br />
Function Try(d,i)<br />
begin<br />
if (i = l)D ← d;<br />
else<br />
begin<br />
d1=right (d,r-1) + “1”;<br />
d0 = right (d,r-1) + “0”;<br />
k = integer form of d1;<br />
if (A[k,i] = 0) Call Try(d1,i+1);endif<br />
k = integer form of d0;<br />
if (A[k,i] = 0) Call Try(d0,i+1);endif<br />
end<br />
endif<br />
end<br />
for i satisfying A[i,1] = 0<br />
d = binary form of integer i having r bit;<br />
Call Try(d,r);<br />
endfor<br />
For example, with A[4,1] = 0, we initialize d<br />
= 100 as first three bits of candidate detectors.<br />
We calculate d1 = 001 and d0 = 000.<br />
With d1 = 001, we have k = 1, A[1,2] = 0 so<br />
that d1 will be used in next step. Similarly, in<br />
the next step, we calculate d0 = 010 and d1 =<br />
011. These two strings have values 2 and 3 in<br />
integer form, respectively. Both A[2,3] and<br />
A[3,3] equal to 1, so that we cannot find out<br />
any detector in this step.<br />
With d0 = 000, we have k = 0, A[0,2] = 0 so<br />
that d0 will be used in next step. It is similar<br />
and easy to find out one detector d = 10001 in<br />
the step.<br />
By examining A[0,1], A[3,1] and A[7,1] in<br />
similar way, we can find out a complete<br />
detector repertoire D containing six detectors<br />
D = {d1 = 00000; d2 = 01100; d3 = 01111; d4<br />
= 10000; d5 = 11111; d6 = 11100}.<br />
ANALYSIS AND EXPERIMENT<br />
The principle data structure used in this<br />
algorithm consists of the large (l − r) × 2r A<br />
array to save data extracted from self. This<br />
has an impact on the time and space<br />
complexities of the algorithm: time: O((l –<br />
r).|S| + O(2l-r)), and space: O(2r.(l-r)).<br />
123<br />
<br />
Nguyễn Văn Trường và đtg<br />
<br />
Tạp chí KHOA HỌC & CÔNG NGHỆ<br />
<br />
The above algorithm runs in time linear in the<br />
size of the self set (for given parameters l and<br />
r). This is much lower than algorithm presents<br />
in [7]. Our algorithm’s complexities are 18<br />
times lower than that of the algorithms in [7]<br />
in some cases. The detailed comparison is in<br />
table 1 below.<br />
We now present our experiment on AIS for<br />
detecting real viruses. Our viruses are quite<br />
strange to some well known antivirus<br />
software; and they can affect .COM and .EXE<br />
file types in the current directory only. The<br />
experiments were carried out by compared the<br />
viruses detecting ability of our AIS-IDS and<br />
that of famous commercial antivirus software<br />
KIS, Kaspersky Internet Security. The<br />
software was licensed and updated on Feb 15,<br />
2011. The software can not detect our virus<br />
and can only detect the viruses after a few<br />
days. This can be explained as follows: 1.<br />
Database of the software has not had or has<br />
not updated signatures of the viruses; 2.<br />
Mechanism of virus detection based on<br />
behavior is not good enough. In contrast, our<br />
approach allows to detect changes even one<br />
bit of data immediately. Our system can alert<br />
100% accurately, it means that the rate of<br />
false positive reduces to zero and the rate of<br />
false negative reduces to minimum.<br />
<br />
80(04): 121 - 125<br />
<br />
Picture 1. Our system found 3 files infected with virus<br />
<br />
Picture 2. KIS can not detect virus<br />
<br />
Table 1. The result comparison<br />
<br />
|S|<br />
<br />
Our alg.’s<br />
space comp.:<br />
O(2r.(l-r))<br />
<br />
Space comp. of<br />
the algo. in [7]:<br />
O(2r.(l-r)2)<br />
<br />
Our algo.’s time<br />
complexity:<br />
O((l – r).|S| +<br />
O(2l-r))<br />
<br />
Time comp. of<br />
the algo. in [7]:<br />
O((l – r).|S| +<br />
O(2r(l-r))<br />
<br />
l<br />
<br />
R<br />
<br />
250<br />
<br />
16<br />
<br />
10<br />
<br />
6,144<br />
<br />
36,864<br />
<br />
1,564<br />
<br />
7,644<br />
<br />
500<br />
<br />
16<br />
<br />
11<br />
<br />
10,240<br />
<br />
51,200<br />
<br />
2,532<br />
<br />
12,740<br />
<br />
500<br />
<br />
32<br />
<br />
14<br />
<br />
294,912<br />
<br />
5,308,416<br />
<br />
271,144<br />
<br />
303,912<br />
<br />
1,000<br />
<br />
16<br />
<br />
12<br />
<br />
16,384<br />
<br />
65,536<br />
<br />
4,016<br />
<br />
20,384<br />
<br />
1,000<br />
<br />
32<br />
<br />
14<br />
<br />
294,912<br />
<br />
5,308,416<br />
<br />
280,144<br />
<br />
312,912<br />
<br />
1,000,000<br />
<br />
32<br />
<br />
15<br />
<br />
557,056<br />
<br />
9,469,952<br />
<br />
17,000,362<br />
<br />
17,557,056<br />
<br />
124<br />
<br />
Nguyễn Văn Trường và đtg<br />
<br />
Tạp chí KHOA HỌC & CÔNG NGHỆ<br />
<br />
CONCLUSIONS<br />
The key of IDS based on artificial immune is<br />
generating the Detector set. Our approach<br />
helps to reduce both time and space<br />
complexities. This is really new result in the<br />
area. In the future, we plan to report more<br />
detail experimental data about the algorithm<br />
as well as to apply them for dynamic data<br />
space in our next papers. Because we believe<br />
that location table A can contain not only bit<br />
data, but also integer data for dynamic S. It<br />
means that A[i,j] will increase or decrease one<br />
when adding a self to S or deleting a self from<br />
S, respectively. Furthermore, it is our belief<br />
that our method can be used for variable<br />
matching length r. This new idea will be<br />
suitable for various problems in the field of<br />
computer science.<br />
ACKNOWLEDGMENT<br />
The authors would like to thank Dr. Nguyen<br />
Xuan Hoai and many colleagues in IT R&D<br />
Center - HNU, for their useful suggestions<br />
when authors presented the research in seminar<br />
series on computational intelligence last year.<br />
REFERENCES<br />
[1].Forrest et al, Self-Nonself Discrimination in a<br />
Computer, in Proceedings of 1994 IEEE<br />
Symposium on Research in Security and Privacy,<br />
Oakland, CA, 202-212, 1994.<br />
[2].F. Liu, Q. Wang and X. Gao, Survey of<br />
Artificial Immune System, in Proceedings of The<br />
First IEEE International Symposium on Systems<br />
and Control in Aerospace and Astronautics, 985989, 2006.<br />
<br />
80(04): 121 - 125<br />
<br />
[3].Jamie Twycross at al., Detecting Anomalous<br />
Process Behavior using Second Generation<br />
Artificial<br />
Immune<br />
Systems,<br />
Journal<br />
of<br />
Unconventional Computing, Vol. 1, pp. 1–26,<br />
2010.<br />
[4].L. N de Castro and J. Timmis, Artificial<br />
Immune Systems: A New Computational<br />
Intlligence Approach, Springer-Verlag, 2002.<br />
[5].Nguyễn Xuân Hoài, Nguyễn Văn Trường, Vũ<br />
Mạnh Xuân, “Hệ miễn dịch nhân tạo và ứng<br />
dụng”, Tạp chí Khoa học và Công nghệ - ĐH Thái<br />
Nguyên, số 2, 2007, 13-18.<br />
[6].Nguyen Van Truong, Pham Dinh Lam,<br />
Improving negative selection algorithm in<br />
artificial immune systems for computer virus<br />
detection, Journal of Science and Technology,<br />
Thai Nguyen University, 72(10), 2010, 53-58.<br />
[7].Patrik D’haeseleer et al., An Immunological<br />
Approach to Change Detection: Algorithms,<br />
Analysis and Implications, IEEE Symposium on<br />
Security and Privacy, 1996.<br />
[8].U. Aickelin, J. Greensmith and J. Twycross,<br />
Immune System Approaches to Intrusion<br />
Detection - A Review, in The Proceedings of the<br />
Third International Conference on Artificial<br />
Immune Systems, LNCS 3239, 316-329, 2004.<br />
[9].T. Lin et al. Research on The Network<br />
Intrusion Detection Based on The Immune<br />
System, in Proceedings of the Fifth IEEE<br />
International Conference on Machine Learning<br />
and Cybernetics, 4479-4482, 2006.<br />
[10]. Zhou Ji, Dipankar Dasgupta, Revisiting<br />
Negative Selection Algorithms, Evolutionary<br />
Computation 15(2), 2007.<br />
<br />
TÓM TẮT<br />
MỘT CÁCH TIẾP CẬN MỚI ĐỂ SINH RA MỘT KHO ĐẦY ĐỦ CÁC BỘ DÒ<br />
TRONG HỆ MIỄN DỊCH NHÂN TẠO<br />
Nguyễn Văn Trường1*, Phạm Đình Lâm2<br />
1<br />
<br />
Trường Đại học Sư phạm – ĐH Thái Nguyên; 2 Đại học Thái Nguyên<br />
<br />
Vấn đề chủ yếu trong các Hệ miễn dịch nhân tạo là việc sinh ra tập bộ dò tốn nhiều chi phí về thời<br />
gian và bộ nhớ. Điểm yếu trong nhiều thuật toán sinh bộ dò là do các bộ dò được sinh ra ngẫu<br />
nhiên nên nhiều ứng cử viên cho bộ dò bị loại bỏ. Chúng tôi trình bày một cấu trúc dữ liệu mới để<br />
trích rút dữ liệu từ tập tế bào cần bảo vệ. Nó cho phép sinh ra tất cả các bộ dò, hay kho đầy đủ các<br />
bộ dò, với chi phí thấp hơn về thời gian và bộ nhớ so với những nghiên cứu gần đây. Những phân<br />
tích lý thuyết và kết quả thực nghiệm chứng tỏ cách tiếp cận của chúng tôi hiệu quả và khả thi.<br />
Những đặc điểm giá trị này cho phép các hệ thống AIS phức tạp có khả năng cao nhất để phát hiện<br />
những thay đổi trong dữ liệu và làm giảm phát hiện âm cực.<br />
Từ khóa: Hệ miễn dịch nhân tạo, Hệ phát hiện xâm nhập, tập tế bào, tập bộ dò, kho đầy đủ các<br />
bộ dò, phát hiện âm cực<br />
*<br />
<br />
Tel: 0915016063; Email: nvtruongtn@gmail.com<br />
<br />
125<br />
<br />