MINISTRY OF EDUCATION
AND TRAINING
VIETNAM ACADEMY OF
SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY OF SCIENCE AND TECHNOLOGY
-------------------------------
NGUYEN TUAN ANH
RESEARCH ON DEVELOPING METHOD OF
MINING FUZZY ASSOCIATION RULES BASED
ON LINGUISTIC INFORMATION AND ITS
APPLICATION
Major: Mathematical Foundation for informatics
Code: 62 46 01 10
SUMMARY OF COMPUTER AND INFORMATION
TECHNOLOGY DOCTORAL THESIS
HA NOI, 2020
List of works of author
1
Trần Thái Sơn, Nguyễn Tuấn Anh, Nâng cao hiệu quả khai phá luật kết
hợp mờ theo hướng tiếp cận đại số gia tử", Kỷ yếu hội nghị quốc gia lần VI
về nghiên cứu bản ứng dụng công nghệ thông tin (Fair) - Huế,
6/2013.
2
Tran Thai Son, Nguyen Tuan Anh, “Improve efficiency fuzzy association
rule using hedge algebra approach, Journal of Computer Science and
Cybernetics, Vol 30, No 4, 2014.
3
Tran Thai Son, Nguyen Tuan Anh, Hedges Algebras and fuzzy partition
problem for qualitative attributes, Journal of Computer Science and
Cybernetics, V.32, N.4, 2016.
4
Tran Thai Son, Nguyen Tuan Anh, Partition fuzzy domain with multi-
granularity representation of data based on Hedge Algebra approach,
Journal of Computer Science and Cybernetics, vol. 33, pp. 63-76, 2017.
1
INTRODUCTION
Nowadays, there is an important research direction on which the problem of mining
association rules (AR) is concerned and is soon developed in the direction of data mining.
Specially, many algorithms have been developed in different directions but mainly
focused on two main directions:
(i) Improve the average speed of rule mining algorithms because this is an
exponentially complex problem due to repeated database (DB) scans.
(ii) Further research on the meaning of mining rules means as not all mining rules
have implications for users.
Fuzzy AR take the form: "If X is A, then Y is B". "X is A" is called Premise, "Y is B"
is called the conclusion of the rule. 𝑋={𝑥1,𝑥2,,𝑥𝑝}, Y={𝑦1,𝑦2,,𝑦𝑞} is a subsection
of a set of attributes I of the DB. 𝐴={𝑓𝑥1,𝑓𝑥2,,𝑓𝑥𝑝}, B={𝑓𝑦1,𝑓𝑦2,,𝑓𝑦𝑞} are the
corresponding fuzzy sets of attributes X and Y. Dividing the domain of the attribute is an
important first step for an information processing process. Recently, researchers have paid
attention to the construction of such membership functions (MF) because of the obvious
influence of this step on the next step.
The thesis studies the methods of exploiting legal knowledge in fuzzy combination
with linguistic information (linguistic rule) from DB or digital data stores. Hedge algebras
are used (HA) instead of fuzzy set theory to study some problems of AR mining:
(i) The fuzzy AR is studied and there are some disadvantages, including in
constructing algorithms to increase processing speed as well as in the problem of fuzzy
partition planning of the attributes in order to find out the meaningful ARs.
(ii) With different data representations, HA gives a simple unified approach that is
highly effective in processing.
Research purposes:
- Study methods of semantically expressing fuzzy concepts based the MF or other
mathematical methods providing that they express the semantics of the most suitable
concepts.
- Studying methods of knowledge exploitation in general and fuzzy rules in particular.
- Research different data representations of information so that it can be exploited in
ARs in a diverse and meaningful way. The thesis uses single-granular and multi-granular
data representation, consistent with the increasing attention of this research direction.
CHAPTER 1. SOME BASIC KNOWLEDGE
1.1. Fuzzy set and operations on fuzzy sets
1.1.1. Fuzzy set
Definition 0.1: Let U be the universe of objects. The fuzzy set A on U is a set of
ordered pairs (𝑥,𝜇𝐴(𝑥)), where 𝜇𝐴(𝑥) is a function from U to [0, 1] assigned to each x of
U a value 𝜇𝐴(𝑥) reflects degree of dependence x to the fuzzy set A.
1.1.2. Linguistic variable
1.1.3. Fuzzy partition
There are several methods for partitioning domain values using fuzzy logic as follows:
1) Definition 1.3: Given m fixed items 𝑝1,𝑝2,,𝑝𝑚 that belongs to 𝑈 = [𝑎,𝑏] 𝑅 .
It is the reference space of the basic variable 𝑢 of the llinguistic variable 𝑋. Then a set 𝑇 of
m fuzzy sets 𝐴1,𝐴2,,𝐴𝑚 defined on 𝑈 (with corresponding MFs 𝜇𝐴1,𝜇𝐴2,..., 𝜇𝐴𝑚 are
2
called a fuzzy partition of 𝑈 if the following conditions are met. complacent, ∀𝑘 =
1,,𝑚:𝜇𝐴𝑘(𝑝𝑘) (𝑝𝑘 belongs to the part called core of 𝐴𝑘);
2) If x [𝑝𝑘−1,𝑝𝑘+1] then 𝜇𝐴𝑘(𝑥)=0
3) 𝜇𝐴𝑘(𝑥) consecutively;
4) 𝜇𝐴𝑘(𝑥) monotonous rose on[𝑝𝑘−1,𝑝𝑘];
5) ∀𝑥𝑈, ∃𝑘, so that 𝜇𝐴𝑘(𝑥)>0; If the fuzzy partition satisfies the following
conditions 6), it is called a strong fuzzy partition.
6) ∀𝑥𝜖𝑈,𝜇𝐴𝑘(𝑥)=1
𝑚
𝑘=1 ; If a fuzzy partition satisfies the following conditions 7), 8),
9), it is called a uniform partition.
7) If 𝑘 𝑚 then 𝑘=𝑝𝑘+1 𝑝𝑘= a constant
8) The fuzzy set 𝜇𝐴𝑘(𝑥) is the symmetric function.
1.2. The fuzzy sets 𝝁𝑨𝒌(𝒙) have the same geometry - Hedge algebra
1.3. The concept of hedge algebra
1.3.1. The concept of hedge algebra
Definition 1.4: A HA is denoted as a set of 4 components denoted 𝐴𝑋 = (X,G,H,)
in which G is a set of generating elements, H is a set of hedges, and "≤" is a semantically
inductant relationship on X. Suppose that there are constant elements 0, W, 1 in G
corresponding the smallest element, the largest element and the neutral element in X. We
call each linguistic value 𝑥 𝑋 a term in HA. The set of H contains H={ℎ−1 <−2 <
<−𝑞} and 𝐻+={ℎ1<2<<𝑝}.
1.3.2. Quantitative semantics of linguistic values
Definition 1.5: Suppose AX=(𝑋,𝐺,𝐻,) is a linear HA. The mapping 𝑣𝔵: 𝑋
[0,1] is called a semantic quantitative function of AX if:
𝑣𝔵 is the light one to one from the set X on the fragment [0,1] and preserves the order
on X, ie x, yX, x <y 1vv (x) <v_x (y) 0) = 0, v_x (1) = 1.
v (X) consecutively: truncates in [0,1], ie (a, b) and (a, b) [0,1], (a, b) ∩v_x
(X) ≠ .
(i) 𝑣𝔵 is the light one to one from the set X on the fragment [0,1] and preserves the
order on X, ie x, yX, x <y 1vv (x) <v_x (y) 0) = 0, v_x (1) = 1.
(ii) 𝑣(𝑿)
consecutively
: truncates in [0,1], i.e. (𝑎,𝑏) and (𝑎,𝑏)[0,1],
(𝑎,𝑏)𝑣𝔵(𝑿).
Definition 1.6: A sign function 𝑆𝑖𝑔𝑛 X
{−1,0,1} is a recursively defined map as
follows, in which ℎ,ℎ′
H𝑐
{𝒄,𝒄+}:
(1) 𝑆𝑖𝑔𝑛(𝑐) = −1, 𝑆𝑖𝑔𝑛(𝑐+) = 1;
(2) 𝑆𝑖𝑔𝑛(ℎ𝑐) = −𝑆𝑖𝑔𝑛(𝑐) if h is nnegation to c; 𝑆𝑖𝑔𝑛(ℎ𝑐) = 𝑆𝑖𝑔𝑛(𝑐) if h is
positive to c;
(3) 𝑆𝑖𝑔𝑛(ℎ′ℎ𝑥) = 𝑆𝑖𝑔𝑛(ℎ𝑥), if ℎ′ℎ𝑥
ℎ𝑥 and ℎ′ is negative to ; 𝑆𝑖𝑔𝑛(ℎ′ℎ𝑥) =
𝑆𝑖𝑔𝑛(ℎ𝑥), if ℎ′ℎ𝑥
ℎ𝑥 and ℎ′ is positive to ;
(4) 𝑆𝑖𝑔𝑛(ℎ′ℎ𝑥) = 0, if ℎ′ℎ𝑥 = ℎ𝑥.
Definition 1.7: Let AX be a full linear HA and 𝑓𝑚 is a fuzzy measure on X. It is
clearly shown that the mapping 𝔳𝔵: 𝑋
[0,1] is induced by the fuzzy measure 𝑓𝑚 if is
defined recursively as after:
(1) 𝑣𝔵(𝑊) =
= 𝑓𝑚(𝑐), 𝑣𝔵(𝑐) =
.𝑓𝑚(𝑐) =
.𝑓𝑚(𝑐), 𝑣(𝑐+) =
+
.𝑓𝑚(𝑐+);
3
(2) 𝑣𝔵(ℎ𝑗𝑥)=𝑣𝔵(𝑥)+𝑆𝑖𝑔𝑛(ℎ𝑗𝑥){𝜇(𝑖)𝑓𝑚(𝑥)𝜔(ℎ𝑗𝑥)𝜇(𝑗)𝑓𝑚(𝑥)
𝑖−𝑠𝑖𝑔𝑛(𝑗)
𝑖=𝑠𝑖𝑔𝑛(𝑗) },
where j, −𝑞𝑗𝑝 and 𝑗
0, and 𝜔(ℎ𝑗𝑥)=1
2[1+𝑆𝑖𝑔𝑛(ℎ𝑗𝑥) 𝑆𝑖𝑔𝑛(ℎ𝑝𝑗𝑥) (𝛽 𝛼)]
{𝛼,𝛽};
1.4. Mining association rules
1.4.1. Some concepts
Suppose that 𝐼 = {𝐼1,𝐼2,..,𝐼𝑚} is a set of m distinct properties. Assume D is a DB
whose records contain a subset T of properties, each record has its own index. An AR is an
associated clause of the form 𝑋𝑌, where 𝑋,𝑌 𝐼, satisfying the condition 𝑋𝑌=.
The sets X and Y are called itemsets.
Definition 1.10: An AR is an associated clause of the form 𝑋𝑌, where 𝑋,𝑌
𝐼 and
X, Y are called itemsets, satisfying the condition 𝑋𝑌 =. The set X is called the cause,
and Y is the consequence one
There are 2 important measurements for AR: Support and confidence.
Definition 0.11: The support of the itemset X is the rate of the number of records in D
containing the itemsets X to the number of records in D.
𝑆𝑢𝑝𝑝(𝑋)=|𝑋|
|𝐷|
Definition 1.12: The support of the rule 𝑋 𝑌: The support of an association rule
𝑋𝑌 is the ratio between the number of records containing the set 𝑋 𝑌, compared to
the total number of records in D.
𝑆𝑢𝑝𝑝(𝑋 𝑌)=𝑃(𝑋𝑌)=|𝑋∪𝑌|
|𝐷|
(1.2)
Definition 1.13: The confidence of the rule 𝑋𝑌: The confidence of an association
rule X Y is the ratio between the number of records in D containing 𝑋 𝑌 and the
number of records in D containing the set X.
𝑐𝑜𝑛𝑓(𝑋𝑌) = 𝑆𝑢𝑝𝑝(𝑋 𝑌)
𝑆𝑢𝑝𝑝(𝑋)
(1.3)
1.4.2. The problem of fuzzy association rule
Let 𝐷𝑇 = {𝑡1,𝑡2,,𝑡𝑛} be the transaction database, n is the total number of records in
D. Let 𝐼 = {𝑖1,𝑖2,,𝑖𝑚} be the items, each item 𝑖𝑗 (1𝑗 𝑚) is an item attribute or a
quantitative attribute. A fuzzy attribute set is a pair of 𝑍,𝐶 where Z corresponds to a set
of attributes zj and C corresponds to a set of fuzzy sets 𝑐𝑗. If fuzzy association rule
𝑋 𝑖𝑠 𝐴 𝑌 𝑖𝑠 𝐵, it is called trust if the support of 𝐹(𝑍,𝐶) and confidence 𝐹𝐶((𝑋,𝐴),(𝑌,𝐵)),
with 𝑍 =𝑋𝑌, 𝐶 =𝐴𝐵.
The support of a itemsets 𝑍,𝐶 denoted by 𝑓𝑠(〈𝑍,𝐶〉)is determined by the following
formula:
𝑓𝑠(〈𝑍,𝐶〉)= (𝑡𝑖[(𝑥𝑗,𝑎𝑗)])
𝑚
𝑗=1
𝑛
𝑖=1 𝑛
(1.4)
Where m is the number of itemsets in the itemset (𝑍,𝐶).
Fuzzy confidence is determined by the following formula:
𝐹𝐶((𝑋,𝐴),(𝑌,𝐵))= 𝑓𝑠(𝑍,𝐶)
𝑓𝑠(<𝑋,𝐴>)
(1.5)
The algorithm of fuzzy association rule mining based on Apriori algorithm: