Tạp chí Tin học và Điều khiển học (Journal of Computer Science and Cybernetics), Vol. 24, No. 1 (2008), 32–41<br />
<br />
A MACHINE LEARNING MECHANISM FOR DIAGNOSING COMPUTER VIRUSES<br />
(CƠ CHẾ MÁY HỌC CHẨN ĐOÁN VIRUS MÁY TÍNH)<br />
<br />
HOÀNG KIẾM¹, TRƯƠNG MINH NHẬT QUANG²<br />
<br />
¹ University of Information Technology, Vietnam National University, Ho Chi Minh City (Trường Đại học Công nghệ Thông tin, ĐHQG TP.HCM)<br />
² Cần Thơ In-Service University Training Center (Trung tâm Đào tạo Đại học Tại chức Cần Thơ)<br />
<br />
Abstract. As computer viruses spread widely around the world, anti-virus software needs to improve its identification methods to perform better. In this paper, we introduce a new method for diagnosing computer viruses. First, we analyse the data characteristics of viruses and define virus classes using object-oriented methods. Second, we study a machine learning mechanism for each virus class. Finally, we apply these forms of learning to the data processing stage of a machine learning anti-virus expert system. Experimental results show that the machine learning approach is suitable for identifying computer viruses and offers a new perspective on anti-virus technology.<br />
<br />
Tóm tắt (Vietnamese abstract). As computer systems come under frequent virus attack, anti-virus systems need to improve their recognition methods and strengthen their diagnostic effectiveness. In this paper we introduce a new method for diagnosing computer viruses. First, computer viruses are defined in an object-oriented way according to their data characteristics. Next, a suitable learning model is built for each virus class. Finally, the learning problems are applied to the processing stage of an anti-virus system based on machine learning and expert systems. Experimental results show that the method is suitable for the computer virus recognition problem and opens a new research direction in present-day anti-virus technology.<br />
<br />
1. INTRODUCTION<br />
The Internet is a favourable environment for computer viruses to spread on a wide scale. Although anti-virus (AV) software is continually updated and developed, computer systems are still regularly penetrated by viruses that steal and destroy data. It is therefore necessary to improve the mechanisms for recognising computer viruses in order to protect the data of information technology (IT) systems. Unlike known methods, we address the virus recognition problem with a machine learning approach. First, we give an object-oriented definition of five basic virus classes, A, B, C, D and E, based on their data characteristics. We then construct learning problems for these classes using inductive learning, learning by instruction, rote learning, learning by analogy and case-based learning. To evaluate the approach, we integrate the five learning problems into MAV (Machine Learning Approach to Anti-virus Expert System), an anti-virus system built on machine learning and expert systems. Experimental results show that the system accurately recognises the viruses already in its database and predicts more than 91% of new virus variants.<br />
<br />
2. OVERVIEW<br />
2.1. The concept of a computer virus<br />
A computer virus (hereafter simply "virus") is a computer program designed to execute its own instructions behind another program [1]. By secretly copying itself into computer systems, a virus spreads from one machine to another, degrades system performance and compromises user data. According to Bordera [2], a computer virus is "any instruction, information, data or program that degrades the integrity of a computer resource, disables, endangers or destroys it, or attaches itself to a resource of another computer and executes when a computer program executes".<br />
We classify viruses by their data characteristics into the following five classes:<br />
- Class A (stand Alone program): worms in the format of a stand-alone application.<br />
- Class B (Boot record): viruses that infect the system boot record.<br />
- Class C (asCii text): viruses and worms whose source code is a script.<br />
- Class D (Document): macro viruses that infect Microsoft Office documents.<br />
- Class E (Executable): viruses that infect executable files.<br />
2.2. Overview of the computer virus recognition and prediction problem<br />
Recognising a computer virus is the process of searching the set under diagnosis for the virus feature descriptions held in a sample library [3]. In 1995, Lo et al. [4] introduced a method for filtering malicious code based on the analysis of features and attributes. The method has the advantage of simplicity, but its ability to predict new viruses is limited. In 1996, IBM proposed a statistical method for extracting recognition strings automatically [5]. Because its output is only a set of extracted code strings, it cannot predict whether an object is malicious. In 1998, Spafford introduced a method that analyses the spread of an Internet worm using a database of executable code, infection techniques and the locations of attacked network nodes, in order to predict similar situations on other nodes [6]. This method is slow and costly, and is easily overloaded as the list of network nodes and worms grows. In 2000, IBM used an artificial neural network (ANN) model to classify boot records. It recognised 80–85% of unknown viruses with an error below 1% on positive samples [7]. However, when ANNs were applied to Win32 executables, the IBM experts gave no convincing evidence for this line of research [8]. In 2001, G. Matthew et al. published results on recognising Win32 malicious code with the Find-S inductive learning technique (87.35%) and a Naïve Bayes classifier (96.7%) [9]. However, because of its complex data normalisation algorithms, the method needs up to 1 GB of memory for 4266 test samples (3265 malicious programs and 1001 applications) and performs poorly on objects that have not been classified, which limits its practical use.<br />
3. MACHINE LEARNING MECHANISMS FOR DIAGNOSING COMPUTER VIRUSES<br />
3.1. Organising the knowledge base<br />
Machine learning is the theory of building programs that discover knowledge on their own by means of special data structures and algorithms, helping to analyse, process, extract and refine data and to support decisions that draw on human experience. Some learning techniques can generate expert rules, which suits cases that call for expert opinion in specific, specialised fields [10]. The raw material of a learning system is the knowledge base, which contains facts describing the data together with recognition rules.<br />
In the machine learning approach, virus knowledge comprises information about the type of virus to be handled, descriptions of the virus's behaviour on the target object, recognition rules and the data format that the virus targets. MAV uses a class model in which each class contains the viruses that share the same data characteristics. Each virus class corresponds to one class of diagnostic data [11] and is defined in an object-oriented way, as shown in Figure 1.<br />
<br />
The second form of knowledge described in the knowledge base is the set of recognition rules. MAV uses a library that describes virus features as a set of vectors $V_K = \{v_1, v_2, \dots, v_k\}$ and queries the vectors $v_i$ in the data set $S$ using derivation rules of the form $R: p_1 \wedge p_2 \wedge \dots \wedge p_n \rightarrow q$, where the $p_i$ characterise the set of virus attributes and $q$ is the conclusion of the inference process.<br />
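A derivation rule of the form $R: p_1 \wedge \dots \wedge p_n \rightarrow q$ can be read as "if every premise holds for the object, draw the conclusion q". The Python sketch below illustrates that reading only. The `Rule` type, the two premise functions and the byte pattern are assumptions made for this example and are not taken from MAV.<br />

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Premise = Callable[[bytes], bool]  # p_i: a test on the object's data


@dataclass(frozen=True)
class Rule:
    premises: Sequence[Premise]  # p_1 ... p_n
    conclusion: str              # q

    def fire(self, data: bytes) -> Optional[str]:
        """Return q when all premises hold for data, otherwise None."""
        if all(p(data) for p in self.premises):
            return self.conclusion
        return None


# Hypothetical premises, for illustration only.
def has_marker(data: bytes) -> bool:
    return b"MARK" in data


def is_short(data: bytes) -> bool:
    return len(data) < 64


rule = Rule(premises=(has_marker, is_short), conclusion="suspected sample")
```

Calling `rule.fire(data)` yields the conclusion only when both premises hold, which mirrors the conjunction on the left-hand side of the rule.<br />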
3.2. Partitioning the computer virus diagnosis problem<br />
Based on the recognition features of the data classes, the computer virus diagnosis problem is divided into sub-problems that use learning techniques ranging from simple to complex, as follows:<br />
- Problem 1: diagnose class C (asCii text files) by rote learning.<br />
- Problem 2: diagnose class D (Document files) by learning by analogy.<br />
- Problem 3: diagnose class B (Boot record) by learning by instruction.<br />
- Problem 4: diagnose class E (Executable files) by case-based learning.<br />
- Problem 5: diagnose class A (stand Alone program) by inductive learning.<br />
Each problem uses the sample virus database specific to its class:<br />
$S = \{S_A, S_B, S_C, S_D, S_E\}$<br />
<br />
where $S_A$, $S_B$, $S_C$, $S_D$ and $S_E$ are the sample virus databases of the classes, and aObject, bObject, cObject, dObject and eObject are, in that order, the data points in the diagnosis space of each problem.<br />
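The partition above amounts to a lookup from a virus class to the learner and sample database that handle it. A minimal Python sketch of that lookup follows. The table contents restate the list in Section 3.2, while the names `SUBPROBLEMS` and `select_subproblem` are hypothetical.<br />

```python
# Class letter -> (target data, learning mechanism), in the order of Section 3.2.
SUBPROBLEMS = {
    "C": ("ASCII text files", "rote learning"),
    "D": ("document files", "learning by analogy"),
    "B": ("boot records", "learning by instruction"),
    "E": ("executable files", "case-based learning"),
    "A": ("stand-alone programs", "inductive learning"),
}


def select_subproblem(virus_class: str) -> str:
    """Describe which sample database S_X and which learner handle a class."""
    target, mechanism = SUBPROBLEMS[virus_class]
    return f"S_{virus_class}: diagnose {target} by {mechanism}"
```

Keeping the mapping in one table means a further class could be added without touching the dispatch code, although the paper itself fixes the set at five classes.<br />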
3.3. The computer virus diagnosis problems<br />
3.3.1. Problem 1: diagnosing C-class viruses<br />
Class C viruses infect an object by inserting script statements into it or by creating new ones. Let:<br />
$T = \{a_i, c \mid i = 32, \dots, 127;\ c \in N\}$ be the object under diagnosis,<br />
$V = \{b_j, m \mid j = 32, \dots, 127;\ m \in N\}$ be the infecting object (the virus),<br />
where $a_i$ is the set of characters of $T$, $c$ is the size (number of characters) of $T$, $b_j$ is the set of characters of the virus $V$, $m$ is the size of $V$, and $N$ is the set of positive integers. $T$ is infected with virus $V$ if and only if $V \subseteq T$.<br />
Let $S_C = \{V_1, V_2, \dots, V_n\}$ be the class C database. For each object $T$ under diagnosis, determine:<br />
• Case 1: $T \supset V_i$ for some $i = 1..n$. Conclude that $T$ is infected with virus $V_i$ (that is, $T = T_0 \cup V_i$):<br />
- Determine $T_0 = C_T(V_i) = T \setminus V_i$, where $C_T(V_i)$ is the complement of $V_i$ in $T$.<br />
- Remove the virus: $V_i \leftarrow \{\varphi\}$ (the empty set).<br />
• Case 2: $T = V_i$ for some $i = 1..n$. Conclude that the object $T$ is the worm $V_i$. Because a worm has no host ($T_0 = \{\varphi\}$), perform $V_i \leftarrow \{\varphi\}$.<br />
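The two cases can be sketched in Python as a lookup against stored samples. This is an illustration under a stated assumption: the relation $V_i \subseteq T$ is treated as a contiguous substring test, which the paper does not spell out. The function name `diagnose_c_class` and the sample strings used with it are hypothetical.<br />

```python
from typing import List, Optional, Tuple


def diagnose_c_class(text: str, samples: List[str]) -> Tuple[str, Optional[str], str]:
    """Compare text T against each sample V_i in S_C.

    Returns (verdict, matched sample or None, cleaned text T_0).
    """
    for sample in samples:
        if not sample:
            continue  # an empty pattern would match everything
        if text == sample:
            # Case 2: T = V_i, a worm with no host, so T_0 is empty.
            return "worm", sample, ""
        if sample in text:
            # Case 1: T strictly contains V_i; T_0 is T with V_i removed.
            return "infected", sample, text.replace(sample, "")
    # Rote learning cannot say anything about samples it has never seen.
    return "no known virus", None, text
```

The loop visits every stored sample once, so the cost grows with both the size of the text and the number of samples in $S_C$.<br />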
The C-class diagnosis problem is, in essence, rote learning. Virus knowledge is supplied by experts in the form [...]. The algorithm is simple, with complexity $O(n)$ proportional to the size of the data and to the number of virus samples in $S_C$. However, the algorithm cannot give a positive verdict when a new virus appears. Because text viruses have a limited instruction set and are not widespread, rote learning is a suitable choice at present. In future, once the number of text viruses is large enough, it could be replaced by probabilistic learning models for text data, such as Naïve Bayes.<br />
3.3.2. Problem 2: diagnosing D-class viruses<br />
<br />
D-class is the class of macro viruses, which use the VBA (Visual Basic for Applications) instruction set to spread in the MS Office environment [12]. Unlike ordinary macros, which are executed with the Run command, macro viruses execute themselves through trigger procedures (such as AutoExec). Only documents that use macros are at risk of containing a macro virus (Figure 2).<br />
In the similarity-detection learning model, the recognition functions $R$ have the form:<br />
$(X_i = V_i) \wedge \dots \wedge (X_k = V_k)$<br />
<br />
where each $X_j$ is a variable and $V_j$ is a possible value of that variable, a disjunction of possible values, or a set of such values.<br />
A function $R$ is TRUE for an object under diagnosis, dObject, when the values of dObject's variables are among those admitted by the function; otherwise it returns FALSE. In a diagnosis space of $N$ objects, when $R$ recognises more than one object, the subset that it recognises is said to be recognised by $R$. Conversely, given a subset of objects, a recognition function generated by that subset can be built by taking the disjunction of the values of their variables [13].<br />
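The two directions described here, testing an object against a recognition function and generating a function from a subset of objects, can be sketched in Python as follows. The representation (a dictionary from variable names to sets of admitted values) is an assumption made for illustration, as are the variable names `has_macro` and `trigger` used in the test data.<br />

```python
from typing import Dict, Iterable, Set

Obj = Dict[str, object]              # variable name -> value
Recognizer = Dict[str, Set[object]]  # variable name -> admitted values


def recognizes(r: Recognizer, obj: Obj) -> bool:
    """R is TRUE when every variable of R takes one of its admitted values."""
    return all(var in obj and obj[var] in allowed for var, allowed in r.items())


def generate(objects: Iterable[Obj]) -> Recognizer:
    """Build the recognizer generated by a subset of objects: for each
    variable, take the disjunction (set union) of the values observed."""
    r: Recognizer = {}
    for obj in objects:
        for var, value in obj.items():
            r.setdefault(var, set()).add(value)
    return r
```

A recognizer generated from two documents that both carry macros but differ in trigger procedure admits either trigger, yet still rejects a document without macros.<br />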
In the space $S_D$, the system builds the functions $R$ for each object dObject. If $R$ recognises $V_j$ (corresponding to the leaf node "Virus macro"), the conclusion is that dObject is infected with a known virus:<br />
$R: (X_1 = \text{true}) \wedge (X_2 = \text{true}) \wedge (X_3 = \text{true}) \wedge (X_4 = \text{true}) \wedge (X_{4+i} = \text{true})\ \forall i = 1..n.$<br />
<br />
Otherwise, dObject may be concluded to be infected with a new kind of macro virus.<br />
Figures 3a and 3b describe the rules for recognising old and new macro viruses under the learning-by-analogy mechanism. The D-class diagnosis problem can recognise up to 98% of unknown macros (the 2% of failures are due to user passwords). However, this technique cannot detect viruses interleaved among user-created macros. The remedy is to provide a rule tuner in the form of options that control the state of the propositions "dObject has no user-created macros" and "Agree to delete macros".<br />
<br />
Figure 2. Classification of MS Office documents and the functions R that recognise macro viruses<br />
<br />
3.3.3. Problem 3: diagnosing B-class viruses<br />
Class B contains the boot viruses, which infect the boot records on the first sector of the disk layout. The B-class diagnosis problem is solved by behaviour analysis [14] as follows:<br />
• Organise two databases containing the known boot viruses and the common clean boot records of the various operating systems.<br />
• Provide two domain theories that define the behaviour of boot viruses and of clean boot records. For example:<br />
Bootvirus ← GetMemSize, DecMemSize, SetMemSize, SetMemVi, MovViCode<br />
GetMemSize ← ReadMem, GetValue<br />
DecMemSize ← SetNewSize, WriteMem(...)<br />
<br />
• Load bObject into the search space, a binary tree whose root node specifies the entry point of the code. Branches represent sequential instructions, child nodes are branch and jump instructions, and leaf nodes are halting points. Loop instructions are handled as sequential instructions that enter and leave a local subtree (Figure 4).<br />
• Apply the search algorithm and collect the behaviours of bObject into a task list:<br />
- If the list fully reflects the descriptions of the first domain theory, report that bObject is infected, disinfect it, report the result and end the process.<br />
- If the list reflects the descriptions of the second domain theory, conclude that bObject is safe.<br />
- Otherwise, bObject is in an abnormal state (new virus, bad sector, unknown format, ...).<br />
• At the end of the process, update the corresponding database with information about the object.<br />
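The matching step of the procedure above can be sketched in Python: expand a domain theory into the primitive actions it requires, then compare them with the task list collected from bObject. The virus rules restate the example in the text, with the elided arguments of `WriteMem(...)` left out. The clean-boot theory, the action names `ReadSector` and `JumpToLoader`, and the function names are invented for this sketch, and the expansion assumes the theory contains no cyclic rules.<br />

```python
from typing import Dict, List, Set

# Rules quoted in the text; the arguments of WriteMem(...) are elided there too.
VIRUS_THEORY: Dict[str, List[str]] = {
    "Bootvirus": ["GetMemSize", "DecMemSize", "SetMemSize", "SetMemVi", "MovViCode"],
    "GetMemSize": ["ReadMem", "GetValue"],
    "DecMemSize": ["SetNewSize", "WriteMem"],
}
# Hypothetical clean-boot theory, invented for this example only.
CLEAN_THEORY: Dict[str, List[str]] = {"CleanBoot": ["ReadSector", "JumpToLoader"]}


def primitives(theory: Dict[str, List[str]], goal: str) -> Set[str]:
    """Expand goal into the primitive actions it ultimately requires."""
    if goal not in theory:
        return {goal}  # no rule expands it, so treat it as primitive
    result: Set[str] = set()
    for part in theory[goal]:
        result |= primitives(theory, part)
    return result


def diagnose_boot(tasks: Set[str]) -> str:
    """Classify the task list collected from a boot record (bObject)."""
    if primitives(VIRUS_THEORY, "Bootvirus") <= tasks:
        return "infected"
    if primitives(CLEAN_THEORY, "CleanBoot") <= tasks:
        return "safe"
    return "abnormal"
```

The virus theory is checked first, following the order of the three outcomes in the list, so a record that shows both behaviours is reported as infected.<br />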
Compared with the neural network model [7], diagnosing boot viruses by learning by instruction is fast (comparable to the time taken to boot from an empty floppy disk) and more accurate (it recognises 96% of unknown boot viruses) [15]. However, the method has the drawback of being complex to implement [16].<br />
<br />