Summary of Computer doctoral thesis: Mining fuzzy association rules and fuzzy sequential patterns in temporal quantitative databases
lượt xem 4
download
The objective of the thesis: Mining association rules with time-interval between events in temporal quantitative databases called fuzzy time-interval association rules; mining sequential patterns with time-interval between events in temporal quantitative sequential databases called fuzzy sequential patterns with fuzzy time intervals; mining common sequential rules with time-interval between events in temporal quantitative sequential databases called fuzzy common sequential rules with fuzzy time intervals.
Bình luận(0) Đăng nhập để gửi bình luận!
Nội dung Text: Summary of Computer doctoral thesis: Mining fuzzy association rules and fuzzy sequential patterns in temporal quantitative databases
- MINISTRY OF EDUCATION AND VIETNAM ACADEMY TRAINING OF SCIENCE AND TECHNOLOGY GRADUATE UNIVERSITY SCIENCE AND TECHNOLOGY ---------------------------- TRUONG DUC PHUONG MINING FUZZY ASSOCIATION RULES AND FUZZY SEQUENTIAL PATTERNS IN TEMPORAL QUANTITATIVE DATABASES Major: Information Systems Code: 9 48 01 04 SUMMARY OF COMPUTER DOCTORAL THESIS Ha Noi – 2021
- The thesis has been completed at Graduate University Of Science And Technology - Vietnam Academy Of Science And Technology Supervisor 1. Assoc. Prof. Dr. Do Van Thanh Supervisor 2. Assoc. Prof. Dr. Nguyen Duc Dung Review 1: Review 2: Review 3: The thesis will be defended at the Board of Examiners of Graduate University Of Science And Technology - Vietnam Academy Of Science And Technology, at……………….on…………………………. The thesis can be explored at: - Library of Graduate University Of Science And Technology - National Library of Vietnam
- INTRODUCTION 1. Motivation of the thesis (Phương and Thành, 2013) Mining association rules and sequential patterns, sequential rules are some of the most important domains in data mining. Up to now, a lot of research related to them. Association rules and sequential patterns, sequential rules are proposed in many forms such as transaction/quantitative; weighted / unweighted; with / without time; etc. Rekesh Agrawal et al first introduced an association rule mining problem in transaction databases in 1993 (Agrawal, Imieliński and Swami, 1993) and up to now, there have been many proposed algorithms according to many different approaches to mining the rules in transaction databases such as APRIORI (Agrawal, Srikant and others, 1994), PARTITION (Savasere, Omiecinski and Navathe, 1995), A-CLOSE (Pasquier et al., 1999a), A-CLOSE+ (Shekofteh, Rahmani and Dezfuli, 2008), CLOSE (Pasquier et al., 1999b), CLOSET (Pei et al., 2000), CLOSET+ (Wang, Han and Pei, 2003), CHARM (Zaki and Hsiao, 2002), MAFIA (Burdick, Calimlim and Gehrke, 2001), GENMAX (Gouda and Zaki, 2005), ECLAT (Ogihara et al., 1997), DIC (Brin et al., 1997), FP-GROWTH (Han et al., 2004), CFPMINE (Qin, Luo and Shi, 2004), ETARM (Nguyen et al., 2018), LRM (Saravanan and Sree, 2011), PARM (Sumathi and Kirubakaran, 2012), NEGFIN (Aryabarzan, Minaei-Bidgoli and Teshnehlab, 2018). However, there are many databases in which values of attributes are numeric or categorical called quantitative databases. Mining association rules in a quantitative database often uses one of two ways: discretized (Srikant and Agrawal, 1996a; Lent, Swami and Widom, 1997; Fukuda et al., 1999; Rastogi and Shim, 2002) and fuzzy values of quantitative attributes (Chan and Au, 1997; Kuok, Fu and Wong, 1998; T.-P. Hong, Kuo and Chi, 1999; Hong, Kuo and Chi, 2001; Hong, Chiang and Wang, 2002; Hong, 2003). The essence of the first approach is transform quantitative database into the transaction database by converting the qualitative attributes into a number of corresponding items and then applying one of the mining association rules algorithms in transaction databases. The second approach solves the disadvantages of the first, but they require need to be improved the algorithms. Temporal database is a database that stores information about the time of transactions (Tansel et al., 1993) (Aydin and Angryk, 2018). In 1998, Lu et al. (Lu, Han and Feng, 1998) proposed association rule with time interval between transactions in the temporal databases. The rules had a form → where a, b are itemsets. In work (Lu, Han and Feng, 1998), two algorithms, E-Apriori and EH-Apriori, were proposed. On the main idea, the two algorithms E-Apriori, EH-Apriori are based on the idea of the Apriori algorithm and use a sliding window for the time interval. For mining association rules with time interval, further algorithms are proposed such as FITI (Tung et al., 2003), ITARM (Qin and Shi, 2006), ITP-Miner (Lee and Wang, 2007), IAR Miner (Nandagopal, Arunachalam and Karthik, 2012), CITP-Miner (Nguyen et al., 2019), NCITPS-MINER (Nguyen et al., 2020). Mining association rule with time interval has only applied for temporal transaction databases, but has not yet been done for temporal quantitative databases. This is a research gap that the thesis wishes to solve. Sequential rule, sequential pattern are known as the classical sequential rule, classical sequential pattern to distinguish them from a kind of new sequential rule, sequential pattern proposed in recent years. The classical sequential patterns (referred to as the sequential pattern) are classical sequences (in short, sequences) that are frequent in sequential databases. The sequential patterns show relationships between transactions in the sequence. Mining sequential pattern in sequential databases was the first introduced in 1995 (Agrawal, Srikant and others, 1995) and continues to receive a lot of attention. There are many algorithms to mine sequential patterns in sequential transaction databases such as GSP (Srikant and Agrawal, 1996b), SPIRIT (Garofalakis, Rastogi and Shim, 1999), SPADE (Zaki, 2001), 1
- SPAM (Ayres et al., 2002), FAST (Salvemini et al., 2011), CM-SPADE (Fournier-Viger, Gomariz, Campos, et al., 2014), MAXSP (Fournier-Viger, Wu and Tseng, 2013), GENMINER (Lo, Khoo and Li, 2008), FREESPAN (Han et al., 2000), PREFIXSPAN (Pei et al., 2004), CLOSPAN (Yan, Han and Afshar, 2003), MSPIC-DBV (Van, Vo and Le, 2018), HSPREC (Bhatta, Ezeife and Butt, 2019),... Temporal sequential database is a sequential database which include infomation of transaction time. In 2000, Yoshida et al (Yoshida et al., 2000) proposed time-interval sequential patterns with time-interval in temporal sequential databases, the sequential patterns such as 〈 〉 where a, b, c were items, [1−4] and [5−9] were predetermined time ranges. Delta-Pattern algorithm (Yoshida et al., 2000) was porposed to find these sequential patterns. The time-interval sequential patterns were continues found by some algorithms such as I-Apriori and I-PrefixSpan (Chen, Chiang and Ko, 2003), TAS (Giannotti et al., 2006). In 2005, to solve sharp boundary problems when a time interval is near the boundary of two time ranges, Chen và Huang (Chen and Huang, 2005) used fuzzy theory to apply to time intervals for mining fuzzy time-interval sequential patterns which are such as 〈 〉 where Short, Long are fuzzy sets. In (Chen and Huang, 2005), two algorithm, FTI-Apriori and FTI-PrefixSpan, were proposed to find the fuzzy time-interval sequential. The fuzzy time-interval sequential patterns were also found by FP Growth- PrefixSpan algorithm (Mukhlash, Yuanda and Iqbal, 2018). The common sequential rules have just been mentioned in recent years (Fournier-Viger et al., 2010, 2017). The common sequential rules present the relationship between unordered items in which all items in the antecedent part have to occur before any item in the consequent part. The first algorithm to find common sequential rules was CMRules (Fournier-Viger et al., 2010) and continues solved by later algorithms such as Rule Growth (Fournier-Viger, Nkambou and Tseng, 2011), ERMiner (Fournier-Viger, Gueniche, et al., 2014). They are actually useful and have been applied in practice (Çelebi et al., 2014). The works of common sequential rules mining was resolved only in the case of transactional sequence databases, there are no works for mining these rules in quantitative sequence databases, where attributes in the sequence databases receive numeric and/or categorical values. This is also a research gap that the thesis wishes to solve. This thesis aims to solve the those gaps identified above. The research to solve those problems is really necessary not only in terms of theoretical development but also in terms of practical application. That is the motivation for the author of the dissertation to do research on the topic “Phát hiện association rule và luật chuỗi mờ trong cơ sở dữ liệu định lượng có yếu tố thời gian”. Specifically, the thesis proposes and solves the problems of the discovery association rules and sequential patterns, common sequential rules with time interval between events in temporal quantitative databases and temporal quantitative sequential databases. 2. The objective of the thesis Mining association rules with time-interval between events in temporal quantitative databases called fuzzy time-interval association rules. Mining sequential patterns with time-interval between events in temporal quantitative sequential databases called fuzzy sequential patterns with fuzzy time intervals. Mining common sequential rules with time-interval between events in temporal quantitative sequential databases called fuzzy common sequential rules with fuzzy time intervals 3. Research method The thesis used the following research methods: 2
- Methods of synthesis and analysis: were used to synthesize and analyze researches on related issues to detect research gaps and identify research problems that the thesis needs to solve. The analytical method was also often used when proposing new concepts related to the dissertation's research problem so that new concepts were developed based on as much as possible of related concepts. Comparative method: was used to compare proposed techniques and algorithms to solve related research problems, thereby forming ideas for new algorithms for research problems. Designing and evaluating computational complexity: were used to design algorithms to solve problems set out in the thesis and estimate the computational complexity of these algorithms. Experimental method: The proposed algorithms are all tested on real data sets to evaluate the correctness and feasibility of the algorithm. 4. The major contributions of the thesis The main contributions of the thesis are proposing and solving the following problems: Propose problem and algorithm to mine association rules with time interval between transactions in temporal quantitative databases, where the quantative attributes and the time interval between transactions are converted to fuzzy attributes and fuzzy time intervals [CT4]. Propose problem and algorithm to mine sequential patterns with time interval between transactions in temporal quantitative sequence databases, where the quantative attributes and the time interval between transactions are converted to fuzzy attributes and fuzzy time intervals [CT5]. Propose problem and algorithm to mine common sequential patterns with time interval between transactions in temporal quantitative sequence databases, where the quantative attributes and the time interval between transactions are converted to fuzzy attributes and fuzzy time intervals [CT9]. 5. Thesis organization The thesis includes the introduction, 04 content chapters and the conclusion: Introduction: Presenting the need and motivation of the topic research; research objectives, subjects, scope; research methods; main contributions and structure of the thesis. Chapter 1: Overview of association rule and sequential pattern, common sequential rule mining. Chapter 2: Mining fuzzy association rules with time interval in temporal quantitative databases Chapter 3: Mining fuzzy sequential patterns with time interval in temporal quantitative sequence databases. Chapter 4: Mining fuzzy common sequential rules with time interval in temporal quantitative sequence. Conclusion: Presenting some conclusions about the significance, contribution of the thesis and future research. 3
- CHAPTER 1. OVERVIEW OF ASSOCIATION RULE AND SEQUENTIAL PATTERN, COMMON SEQUENTIAL RULE MINING This chapter presents an overview of issues related to mining association rules and sequential patterns, common sequential rules in transaction / quantitative databases with or without time interval. This chapter also shows the unresolved gaps to determine the research problem of the thesis. 1.1. Association rule 1.1.1. Association rule mining in transaction databases Definition 1.1 Transaction database (Agrawal, Srikant and others, 1994): Let I = { } be a set of items, D = { } be a set of transactions, where (1jm) is itemset with I, is the value of the item in the transaction . In this case, D is a transaction database. Definition 1.2 Association rule (Agrawal, Imieliński and Swami, 1993): Let X be a itemset, the transaction T contains X if and only if XT. Association rule is a form X Y where XI, YI and XY=. In which, X is called the antecedence, Y is the consequence of the rule. Definition 1.3 The support and the confidence (Agrawal, Imieliński and Swami, 1993) The support of X, denoted sup(X), is defined as follows: |{ | }| (1.1) | | The support of the rule , denoted sup( ), is defined as follows: (1.2) | | The confidence of the rule , denoted conf( ), is defined as follows: (1.3) The mining association rules is usually divided into two phases: (Agrawal, Imieliński and Swami, 1993; Kotsiantis and Kanellopoulos, 2006): Phase 1: Finding all of frequent itemsets with its support are more than or equal to the pre- user threshold (min_sup); Phase 2: Generating association rules that has its confidence satified min_conf form the frequent itemsets which were found in phase 1. 1.1.2. Mining association rule in quantitative databases Definition 1.4 Quantitative database (Chan and Au, 1997): Let I = { } be a set of attributes, D = { } be a set of transactions, (1jm) be a set of value of attributes and I, is the value of attribute (1kn) in transaction (1jm) and it can be a numeric or categorical. Then, D is called a quantitative database. 1.1.3. Mining association rule with time interval in temporal database Definition 1.5 Temporal database is a database in which a attribute describes time occur of each transaction. Bảng 1.1. Some works on mining association rule with time interval Algorithm Dataset Rule Description EH-Apriori (Lu, Han and Feng, Temporal If the item a sold, then → 4
- 1998), database the item b will be sold FITI (Tung et al., 2003), in 2 days later. ITARM (Qin and Shi, 2006), ITP-Miner (Lee and Wang, 2007), IAR Miner (Nandagopal, Arunachalam and Karthik, 2012), NCITPS-Miner (Nguyen et al., 2020) 1.2. Sequential pattern 1.2.1. Mining sequential patterns in sequence databases Definition 1.6 A sequence database (Agrawal, Srikant and others, 1995): Let I ={ } be a set of items. A sequence s =〈 〉 is a list of items in order with I (1km). A sequence database SD is a set of sequences SD = { }. Definition 1.7 Length of a sequence (Agrawal, Srikant and others, 1995): Length of a sequence 〈 〉 is the number of items in the sequence. A sequence whose length is k referred to k-sequence. Definition 1.8 Subsequence (Agrawal, Srikant and others, 1995): A sequence 〈 〉 is a subsequence of a sequence 〈 〉 if and only if there exist an interger k with satified , denoted . Definition 1.9 The support (Agrawal, Srikant and others, 1995): The support of a sequence in a sequence database SDB, denoted sup( ), is the number of sequences containing divided by the number of sequences in the database: |{ | }|/|SDB| (1.4) A sequence is said to be a frequent sequence or a sequential pattern if and only if sup( ) min_sup, for a threshold min_sup set by the. 1.2.2. Mining sequential pattern in quantitative sequence databases Definition 1.10 Quantitative sequence databases: Let I = { } be a set of attributes. A quantitative sequence s = 〈 〉 is an order list of sets of attributes I (1km) and a attribute a can be a numeric or categorical. A quantitative sequence databases is a set of all quantitative sequences{ }. 1.2.3. Mining sequential pattern with time interval in temporal sequence databases Definition 1.11 A temporal (quantitative) sequence database (Guyet, 2020): is a (quantitative) sequence database in which a attribute describes time occur of each transaction in the sequences. Let I = { } be a set of items. A temporal sequence 〈 〉 , where is the time occur of an item I (1 n). The sequence s is also showed as the form s = 〈 〉 where (1≤ j≤ k) and is the time that the purchase of the item occurs in A temporal sequence database is a set of all temporal sequences { }. If the values of items in I are 0 or 1 then the database will be a temporal sequence database If the values of items in I are numeric or categorical then the database will be a temporal quantitative sequence database. 5
- Bảng 1.2. Some works on mining sequential patterns with time interval Algorithm Dataset Pattern Description TAS (Giannotti et Temporal 〈 〉 If a customer buys a item a and later al., 2006) sequence a item b an interval of 3 days then he databases will buy a item c an interval of 5 days. Delta-Pattern Temporal 〈 〉 If a customer buys a item a and later (Yoshida et al., sequence a item b an interval of [0, 3 days] 2000) databases then he will buy a item c an interval of [0, 5 days]. I-Apriori algorithm, Temporal 〈 〉 If a customer buys a item a and later I-PrefixSpan (Chen, sequence a item b an interval of I1 then he will Chiang and Ko, databases buy a item c an interval of I2. 2003) FTI-Apriori, FTI- Temporal 〈 〉 If a customer buys a item a and later PrefixSpan (Chen sequence a item b an interval of Short then he and Huang, 2005) databases will buy a item c an interval of Long. FP-Growth- PrefixSpan (Mukhlash, Yuanda and Iqbal, 2018) SPFTI (Chang, Temporal 〈 〉 If a customer buys a item a and later Chueh and Lin, sequence a item b an interval of then he 2009), databases will buy a item c an interval of . ISPFTI (Chang, Chueh and Luo, 2012) 1.3. Common sequential rule 1.3.1. Common sequential rule Definition 1.12 Common sequential rule (Fournier-Viger et al., 2012): Let I = { } be a set of items, SD be a sequence database, a common sequential rule X⟹Y be a relationship between two unordered itemsets X, Y I satisfy X Y=, X, Y ≠ and the items in Y must appear after the items in X. 1.3.2. Mining common sequential rule Common sequential rule was proposed in the last few years (Fournier-Viger et al., 2010). Bảng 1.3. Some works on mining common sequential rules Algorithm Dataset Rule Description CMRules (Fournier-Viger Sequence Common If a customer bought et al., 2010), databases sequential rule: items a and b then he will Rule Growth (Fournier- {a, b} ⟹ {c} buy a item c later Viger, Nkambou and Tseng, 2011), ERMiner (Fournier-Viger, Gueniche, et al., 2014) Definition 1.13 A left/right equivalence class (Fournier-Viger, Gueniche, et al., 2014): Let be the set of all frequent sequential rules and I be the set of all items. A left equivalence class. A left equivalence class is the set of frequent rules = {W ⟹ Y | Y I |Y| = i} such that W I and i is an integer. Similarly, a right equivalence class is the set of frequent rules = {X ⟹ W | X I |X| = i} such that W I and i is an integer. 6
- Definition 1.14 left/right merges (Fournier-Viger, Gueniche, et al., 2014): Let be a left equivalence class and two rules r = W ⟹ X and s = W ⟹ Y such that r, s and | | | – |. A left merge of r, s is the process of merging r, s to obtain ⟹ . Similarly, let be a right equivalence class and two rules r = ⟹ and s = ⟹ such that r, s and | | | – |. A right merge of r, s is the process of merging r, s to obtain the rule ⟹ Conclusion of Chapter 1 Chapter 1 has presented an overview, summarizing the issues related to mining association rules and sequential patterns, common sequential rules in (transactional, quatitative) databases (transactional, quantitative) and (transactional, quatitative) sequential databases with time interval The thesis will focus on researching proposals and solutions to solve the following 3 problems: Problem 1: Mining association rules with time interval between transactions in temporal quantitative databases. Problem 2: Mining sequential patterns with time interval between transactions in temporal quantitative sequence databases. Problem 3: Mining common sequential rules with time interval between transactions in temporal quantitative sequence databases. The next three content chapters in the thesis will present solutions for those 3 research problems. 7
- CHAPTER 2. MINING ASSOCIATION RULES WITH TIME INTERVAL IN TEMPORAL QUANTITATIVE DATABASES In Chapter 1, the thesis has pointed out the gaps that need to be studied about mining association rules with time intervals. This chapter, the thesis will present solutions to solve that research problem. At that time, a new type of association rule called fuzzy association rule with time interval is proposed. The research results of this Chapter have been published in the Indian Journal of Science and Technology [CT4]. This chapter mainly presents the problem of mining fuzzy association rules with time intervels in quantitative databases. 2.1. Introduction Association rule mining problems are an important object in data mining research. Association rule mining in transaction database was first introduced in 1993 by Rakesh Agrawal et al (Agrawal, Imieliński and Swami, 1993) and continue up to now (Agrawal, Srikant and others, 1994; Savasere, Omiecinski and Navathe, 1995; Zaki and Hsiao, 2002; Sumathi and Kirubakaran, 2012; Aryabarzan, Minaei-Bidgoli and Teshnehlab, 2018). The time distinct among items in mining association rules were been attentive (Lu, Han and Feng, 1998; Tung et al., 2003; Qin and Shi, 2006; Lee and Wang, 2007; Nandagopal, Arunachalam and Karthik, 2012) và khoảng cách thời gian giữa các giao dịch được mờ hóa trong nghiên cứu (Chen and Huang, 2005). Main idea of the work in (Chen and Huang, 2005) is brief: Time-interval items were applied by fuzzy sets; Frequently sequential patterns with forms 〈 〉, where a, b, c are attributes, Short, Long are linguistic term of time-interval, were mined; The FTI-Apriori algorithm and FTI-Prefix Span algorithm were proposed to mining the fuzzy time-interval sequential patterns. However, the research in (Chen and Huang, 2005) only applied for sequence databases without quantitative attributes, it not used with temporal quantitative databases. An instance of the result of rules like that “If a customer buy an item a and later an item b an interval of Short then he will buy an item c an interval of Long”. The research [CT2] proposed and solved the problem of mining association rules with time interval in temporal transaction database. These rules are as form “If an item a is bought then an item b will be bought an interval of Short”. The FTITS algorithm was porposed in [CT2]. The FTITS algorithm was based on the idea of FTI-Apriori algorithm (Chen and Huang, 2005). The main objective of this chapter is to mine fuzzy association rules with fuzzy time interval in temporal quantitative databases. 2.2. Problem definition Definition 2.1 Let I={ } be set of all attributes, D = { }, where (1jp) is a set of attributes satisfy I at the time ( ≥0), is value of attribute (numeric or categorical) at (1≤k≤n). Then, D is called a temporal quantitative database. Definition 2.2 Let T be a set of transactions, I be a set of attributes and { } be a set of fuzzy sets, which link to each of attributes in I, { } be fuzzy sets link to (k=1, .., n), where hk is the number of the fuzzy sets of the attribute , be a jth fuzzy set of attribute (1≤ j≤ ). DF = (T, I, FE) is called a temporal fuzzy database and fuzzy set is called a fuzzy attribute. Each fuzzy set has a membership : X[0,1]. Definition 2.3 : P is a fuzzy sequence with a form 〈 〉 where pj is a fuzzy attribute and tj is time of pj (1≤j≤n; tj-1≤tj with 2≤j≤n). Sequence A is sorted by 8
- alphabet in case of events occurs at the same time. Then the interval-time values between two continuous fuzzy items in a sequence is tij=tj+1-tj (1≤j≤n-1). Definition 2.4 (mở rộng từ (Chen and Huang, 2005)) Let FE be a set of fuzzy sets, which link to attributes in a temporal quantitative database; LT={ | j=1,2,...,p} be linguistic terms. The sequence β = is a fuzzy time-interval sequence if FE and LT for 1≤j≤r-1 and FE. The sequence β is length of r. Definition 2.5 (mở rộng từ (Chen and Huang, 2005)) A fuzzy time-interval sequence = is a subsequence of fuzzy time-interval sequence β = if and only if exists a integer value w satisfied with with i|1≤i≤k-1 and . In case of k=1, . Definition 2.6 (mở rộng từ (Chen and Huang, 2005)) Given two fuzzy time-interval sequences are 1 = and 2 = . 1 joins 2 to a fuzzy time-interval sequence = . Definition 2.7 A fuzzy time-interval association rule has a form , where X, Y are fuzzy time-interval sequences, LT. Let P = 〈 〉 be a fuzzy sequence and α = be a fuzzy time-interval sequence, with 1≤i≤r is a value of at the time . The support of P in α is defined as follows: { { } { } (2.1) Given a temporal fuzzy database DF with N transactions, we have definitions: Support of α in DF is defined as follows: ∑ (2.2) Confidence of rule in DF is defined as follows: ( ) (2.3) Support of rule is defined as follows: ( ) (2.4) A fuzzy time-interval sequence is called frequently if it has its support more than or equal to min_sup (min_sup is the user‟s threshold). From Definition 2.5 and Error! Reference source not found., we obtain the following properties: Property 1: A subsequence of frequent fuzzy time-interval sequence is also frequent fuzzy time-interval sequence. Property 2: All frequent fuzzy time-interval sequences with length of k are result of joining two fuzzy time-interval sequences with length of k-1. 9
- 2.3. Fuzzy time-interval in temporal quantitative database algorithm 2.3.1. Problem Given a temporal quantitative database D, and a min_sup, a min_conf value threshold, set LT of linguistic terms, fuzzy membership functions of each of linguistic terms, FE set of fuzzy sets of quantitative attributes in D with their fuzzy membership functions. The main objective is to determine in database all fuzzy time-interval association rules whose supports are more than or equal to the min_sup and whose confidences are more than or equal to the min_conf 2.3.2. Main idea First, we transform the temporal quantitative database D to a temporal fuzzy database DF by using fuzzy sets and their membership functions for quantitative attributes in D. Then, all frequently fuzzy time-interval sequences are found. This process is based on Apriori algorithm: two phases in generating frequently fuzzy time-interval sequences are repeated until all one found. In the first phase, , set of candidates with length of k was built from , set of frequently fuzzy time-interval sequences with length of k-1. In the second phase, all frequent sequences in are added into . The process of finding frequently fuzzy time- interval sequences stops when =. Fuzzy time-interval association rules are generated by extracting from frequently fuzzy time-interval sequences with length of more than or equal to two. Then we computed the confidence of the rules by formula Error! Reference source not found. and results are all rules which have their confidences more than and equal to min_conf 2.3.3. FTQ Algorithm Algorithm 2.1. FTQ Algorithm Input: - A temporal quantitative database D; - min_sup, min_conf; - FE is sets of quantitative attributes in D with their fuzzy membership functions D; - LT is set of linguistic terms and fuzzy membership functions of each of linguistic terms. Output: Fuzzy time-interval association rules have their confidences more than or equal to min_conf and their supports more than or equal to min_sup. FTQ{ 1. Transform D to DF 2. ={set of all attributes of DF} 3. ={α |Supp(α)≥min_sup} 4. =; 5. for each do 6. for each do 7. for each ltd LT do { 8. α= *ltd* ; 9. add α to ; 10. } 11. for each α do 12. α.count=Supp(α); 10
- 13. ={α |α.count ≥min_sup} 14. for (k>2; ≠;k++){ 15. =fuzzy_apriori_gen( ); 16. for each α do 17. α.count=Supp(α); 18. ={α |α.count ≥min_sup} 19. } 20. return Generating_rules( ); 21.} 2.3.4. The correctness and completeness of the FTQ algorithm Theorem 2.1. The FTQ algorithm is correct and complete. 2.3.5. Special cases of fuzzy time interval association rules Theorem 2.2: FTQ algorithm can find the rules as form và 2.4. Experimental results 2.5. Datasets Table 2.1. Datasets Number of Number of Datasets attributes transactions ISTANBUL STOCK EXCHANGE 8 537 VNINDEX 11 1161 2.6. Results a) ISTANBUL STOCK EXCHANGE Figure 2.1 shows the relationship between the number of result rules and the min_conf threshold with different min_sup Figure 2.1. Relationship between the number of rules and the min_conf with different min_sup Figure 2.2 and Figure 2.3 show the results comparing the number of rules and the execution time of the fuzzy time interval method (A) with the time interval division method (B). The time interval in the interval division method (B) is equally divided into 3 intervals, the values of the time interval are equal to 1 if they are in the range, otherwise they have the value 0. 11
- Figure 2.2. The results comparing the number of rules of the fuzzy time interval method (A) with the time interval division method (B) when FTQ algorithm executes Figure 2.3. The results comparing the number of execution time of the fuzzy time interval method (A) with the time interval division method (B) when FTQ algorithm executes Conclusion of Chapter 2 In this chapter, the thesis presents the solution of mining association rules with time interval between transactions in temporal quantitative databases by proposing an algorithm to mining fuzzy time interval association rules in such databases. That algorithm is called FTQ. According to this algorithm, the quantitative attributes and the time interval that occurs between transactions are both transformed into fuzzy sets. The FTQ algorithm is developed from the idea of the Apriori algorithm (Agrawal, Srikant and others, 1994), a sequence of length k is obtained by join two sequences of length k-1. This chapter also presents the experimental results of the FTQ algorithm on real datasets, comparing with the corresponding time interval division method and analyzing the meaning of the obtained rules. 12
- CHAPTER 3. MINING FUZZY SEQUENTIAL PATTERNS WITH FUZZY TIME- INTERVALS IN QUANTITATIVE SEQUENCE DATABASES In Chapter 1, the thesis has pointed out the gaps that need to be studied about mining sequential patterns with time intervals. This chapter, the thesis will present solutions to solve that research problem. At that time, a new type of sequential pattern called fuzzy sequential patterns with time intervals is proposed. The research results of this Chapter have been published in the journal of Cybernetics and Information Technologies [CT5]. This chapter mainly presents the problem of mining sequential patterns with time interval in quantitative databases. 3.1. Introduction Mining sequential patterns is one of the most important domains in data mining. Mining sequential patterns from transaction databases (events present or not) is the first introduced in (Agrawal, Srikant and others, 1995). In many cases, the values of time distances among the events in a sequence are preferred. The time-interval sequential patterns in transaction databases were investigated by authors such as (Yoshida et al., 2000; Chen, Chiang and Ko, 2003). In (Chen, Chiang and Ko, 2003), time-interval sequential patterns were presented such as 〈 〉, where a, b, c are items, were predetermined time ranges. To solve sharp boundary problems when a time interval is near the boundary of two time ranges in (Chen, Chiang and Ko, 2003), the fuzzy theory was applied to time intervals (Chen and Huang, 2005). The fuzzy time-interval sequential patterns in (Chen and Huang, 2005) showed the relationship among events such as 〈 〉, where Short and Long were linguistic terms for time intervals. The main objective of this chapter is to mine fuzzy sequential patterns with fuzzy time- intervals in quantitative sequence databases. The fuzzy sequential patterns with fuzzy time- intervals as form 〈 〉, where are fuzzy sets associated with attributes a, b, c and Short, Long are fuzzy sets associated with time intervals. 3.2. Problem definition Definition 3.1 Let I={ } be a set of attributes, s = 〈 〉 be a quantitative sequence, where I is an attribute (1kn), ( ≥0) is time of , with 2kn and ( ) = , is numeric or categorical. A quantitative sequence database QSD is a set of all transaction sequences. Definition 3.2 Let { } be a set of fuzzy sets of attributes of I, { } be a set of fuzzy sets of the attribute (k=1, 2..., u), where is the jth fuzzy set j (1≤ j≤ ), is the number of fuzzy sets of , is called a fuzzy attribute. Each fuzzy set has its membership function : X[0,1]. A sequence fs = 〈 〉 is called a fuzzy one, where (1≤ i≤ n) is a fuzzy set, is the value of the membership function of ( ). A fuzzy sequence database FSD is a set of all fuzzy transaction sequences. Definition 3.3 Let LT={ | j=1,2,...,p} be fuzzy sets of time-intervals, be the membership function of the fuzzy set (Hu, Tzeng and Chen, 2004). Then, α = 〈 〉 is called a fuzzy sequence with fuzzy time-intervals if (1≤i≤r) is a fuzzy set and LT (1≤i≤r-1). ). A fuzzy sequence with fuzzy time- intervals whose length is r referred to r-fuzzy sequential pattern with fuzzy time-intervals. 13
- Definition 3.4 A sequence = 〈 〉 is a subsequence of β = 〈 〉 if and only if there exists an integer w satify with i|1≤i≤k-1 và . In case of k=1, . Definition 3.5 Given a fuzzy sequence S = 〈 〉 and a fuzzy sequence with fuzzy time- intervals α = 〈 〉, we define: The support of S for α, denoted by , is defined as follows: { (3.1) (∏ ) { ( )} A fuzzy sequence B = 〈 〉 is a subsequence of S=〈 〉 if and only if there exists r integers để | và . The support of the fuzzy transaction sequence S for α, denoted by SupS(α), is the maximum of the supports of B which are belong to S for α: ( ) (3.2) The support of a fuzzy sequence with fuzzy time-intervals α, denoted Supp(α), is the average of supports of all fuzzy transaction sequences in FSD for α. (∑ ) (3.3) where is the ith fuzzy transaction sequence in FSD, NS is the number of fuzzy transaction sequences in FSD. A fuzzy sequential pattern with fuzzy time-intervals is the fuzzy sequence with fuzzy time-intervals whose support is not less than a user-defined threshold min_sup. Tính chất 1: A subsequence of frequent fuzzy time-interval sequence is also a frequent fuzzy time-interval. Tính chất 2: All frequent fuzzy time-interval sequences with length of k are result of joining two fuzzy time-interval sequences with length of k-1. 3.3. Algorithm of Mining Fuzzy Sequential Patterns with Fuzzy Time-Intervals - FSPFTIM 3.3.1. The problem Input: QSD is a quantitative sequence database, min_sup is a user-defined threshold, FE is a set of fuzzy sets with its membership functions of attributes in QSD, LT is a set of fuzzy sets with its membership functions of time. Output: k- fuzzy sequential patterns with fuzzy time intervals, k >1. 3.3.2. Proposed approach First, all the quantitative attributes in QSD are partitioned into fuzzy sets, then the transaction sequences in QSD are transformed into fuzzy transaction sequences based on these fuzzy sets and their corresponding membership functions.. And we get a fuzzy sequence database, called FSD Next, the FSPFTIM algorithm is applied to find out fuzzy sequential patterns with fuzzy time-intervals. This algorithm is improved from the Apriori algorithm. The algorithm is an iterative process consisting of several steps. At the kth step, the algorithm generates a set denoted Ck of candidate k-fuzzy sequences with fuzzy time intervals 14
- and then calculates the support of these candidate sequences. The sequences having the support greater than min_sup will be added into Lk as a set of k-fuzzy sequential patterns with fuzzy time-intervals. This process will be ended when it is unable to generate any new set of fuzzy sequential patterns with fuzzy time-intervals The result of the algorithm is a set of all k- fuzzy sequential patterns with fuzzy time- intervals for k>1. 3.3.3. FSPFTIM algorithm Pseudo code of the algorithm is shown in Error! Reference source not found.. Algorithm 3.1. FSPFTIM algorithm Input: - Quantitative sequence databases QSD; - a user-defined threshold min_sup; - FE is a set of fuzzy sets with its membership functions of attributes in QSD; - LT is a set of fuzzy sets with its membership functions of time. Output: Set of all k- fuzzy sequential patterns with fuzzy time intervals, k >1. FSPFTIM { 1. Creating a fuzzy sequence database FSD from the QSD 2. {fe| fe is an attribute of FSD} 3. { |Sup()≥min_sup} 4. ; 5. for each do { 6. for each do { 7. for each ltd LT do { 8. *ltd* ; 9. add to ; 10. } 11. } 12. } 13. for each do { 14. Tính độ hỗ trợ của (Sup()); 15. } 16. { |Sup() ≥min_sup} 17. for (k>2; ≠;k++){ 18. fuzzy_apriori_gen( ); 19. for each do { 20. Computing the (Sup()); 21. } 22. { |Sup() ≥min_sup} 23. } 24. return ⋃ ; 25. } 3.3.4. The correctness and completeness of the FTQ algorithm Theorem 3.1. The FSPFTIM algorithm is correct and complete. 15
- 3.3.5. The complexity of the FSPFTIM algorithm O(N.l.h +M.N.l.h2 + | |2. ( ) |LT| + ∑ | | ( ) | |) 3.3.6. Special cases of fuzzy sequential patterns with fuzzy time interval. Theorem 3.2: FSPFTIM algorithm can find sequential patterns such as 〈 〉 and 〈 〉 3.4. Experimental results 3.4.1. Datasets Table 3.1. Datasets Number of Number of Number of Average of Average of transaction Datasets attributes transactions sequences transactions transaction (I) (D) (T) sequences (S) S100I1000T3D34 1000 341 100 29.3 3.41 1K Online 1523 365 87 22.8 4.20 Retail_France 3.4.2. Results 3.4.2.1. Relationships between the number of fuzzy sequential patterns with fuzzy time- intervals and min_sup, and between the execution time and min_sup according to the different numbers of partitions of quantitative attributes. . (a) (b) Online Retail_France Online Retail_France Figure 3.1. Relationship between the number of fuzzy sequential patterns with fuzzy time- intervals and min_sup (a), Relationship between the execution time and min_sup (b) according to the different numbers of partitions of quantitative attributes (K) for the S100I1000T3D341K . 3.4.2.2. Relationships between the number of fuzzy sequential patterns with fuzzy time- intervals and min_sup, and between the execution time and min_sup according to the different numbers of partitions of time-intervals 16
- (a) (b) S100I1000T3D341K S100I1000T3D341K Figure 3.2. Relationship between the number of fuzzy sequential patterns with fuzzy time- intervals and min_sup (a), Relationship between the execution time and min_sup (b) according to the different numbers of partitions of quantitative attributes (K) for the S100I1000T3D341K dataset 3.4.2.3. Comparing the number of sequential patterns of the fuzzy time interval method with the time interval division method. Figure 3.3. The results comparing the number of sequential patterns of the fuzzy time interval method (A) with the time interval division method (B) when FSPFTIM algorithm executes Conclusion of Chapter 3 In this chapter, the thesis presents the solution of mining sequential patterns with time interval in temporal quantitative sequence databases by proposing an algorithm, called FSPFTIM, to mining fuzzy sequential patterns with fuzzy time intervals. In the algorithm, the quantitative attributes and the time intervals that occurs between transactions are both transformed into fuzzy sets. The FSPFTIM algorithm is developed from the idea of the Apriori algorithm. This chapter also presents the experimental results of the FSPFTIM algorithm on real datasets, comparing with the corresponding time interval division method and analyzing the meaning of the obtained rules. 17
- CHAPTER 4. MINING FUZZY COMMON SEQUENTIAL RULES WITH FUZZY TIME-INTERVAL IN QUANTITATIVE SEQUENCE DATABASES. In Chapter 1, the thesis has pointed out the gaps that need to be studied about mining common sequential rules with time intervals. This chapter, the thesis will present solutions to solve that research problem. a new type of common sequential rule called fuzzy common sequential rules with time intervals is proposed. The research results of this Chapter have been published in the journal of International Journal of Uncertainty, Fuzziness and Knowledge- Based Systems [CT9]. This chapter mainly presents the problem of mining sequential rules with time interval in quantitative databases 4.1. Introduction Mining sequential rules is one of the important domains in data mining. There are two kinds of sequential rules: classical sequential rules and common sequential rules (Fournier- Viger et al., 2017). A sequential rule of the first kind expresses a relationship between two series of events happening one after another, in which the antecedent and consequent parts are in a same sequential pattern (Agrawal, Srikant and others, 1995). Like the process of mining association rules, in general the process of mining sequential rules of the first kind also consists of two phases: the first phase is to discover sequential patterns and the second phase is to generate sequential rules from found sequential patterns. The common sequential rules (or the sequential rules of the second kind) have just been mentioned later (Fournier-Viger et al., 2010; Fournier-Viger, Nkambou and Tseng, 2011; Fournier-Viger, Gueniche, et al., 2014). The common sequential rules present the relationship between unordered items (Fournier-Viger et al., 2010; Fournier-Viger, Nkambou and Tseng, 2011; Fournier-Viger, Gueniche, et al., 2014), in which all items in the antecedent part have to occur before any item in the consequent part. A common sequential rule is as form { } ⟹ { }, that mean “If a customer has bought items a and b then the customer will buy items c and d later”, in which items a and b are not chronologically ordered and items a and b are not chronologically ordered too but items c and d have to be bought after items a and b. As known, most sequence databases in the real world are quantitative sequense ones, where the attributes receive numerical or categorical values whereas the existing algorithms. Thus, mining fuzzy common sequential rules with time interval is practical. The main objective of this chapter is to mine fuzzy common sequential rules with fuzzy time-intervals in quantitative sequence databases. The fuzzy common sequential rules with fuzzy time-intervals as form { } ⇒ { } where are fuzzy sets associated with attributes a, b, c, d and Long is a fuzzy set associated with time interval. 4.2. Problem definition Definition 4.1 Let I={ } be a set of attributes, be a total order relation of attributes in I and i1 i2 …, iu, s = 〈 〉 is a quantitative sequence, where I (1kn), is the value of at ( is numeric or categorical), với 2kn; ( , , ) is called a element of the quantitative sequence s. In a sequence, elements occurred in a same time are sorted by the relation of the attributes. A quantitative sequence s can be presented in an alternate form s = 〈 〉, where = {( , ), ( , ), ( , ),…, ( , )} and all attributes in occur at the same time . is called a transaction. 18
CÓ THỂ BẠN MUỐN DOWNLOAD
-
Summary of Computer doctoral thesis: Mining decision laws on the data block
26 p | 10 | 5
-
Summary of Computer and Information technology doctoral thesis: Research on developing method of mining fuzzy association rules based on linguistic information and its application
26 p | 19 | 4
-
Summary of Computer sciense Doctoral Thesis: Mining decision laws on the data block
26 p | 12 | 3
-
Summary of Computer doctoral thesis: Mining weighted sequential patterns in sequence database
25 p | 15 | 3
Chịu trách nhiệm nội dung:
Nguyễn Công Hà - Giám đốc Công ty TNHH TÀI LIỆU TRỰC TUYẾN VI NA
LIÊN HỆ
Địa chỉ: P402, 54A Nơ Trang Long, Phường 14, Q.Bình Thạnh, TP.HCM
Hotline: 093 303 0098
Email: support@tailieu.vn