Handbook of Multimedia for Digital Entertainment and Arts- P12

Chia sẻ: Cong Thanh | Ngày: | Loại File: PDF | Số trang:30

0
44
lượt xem
4
download

Handbook of Multimedia for Digital Entertainment and Arts- P12

Mô tả tài liệu
  Download Vui lòng tải xuống để xem tài liệu đầy đủ

Handbook of Multimedia for Digital Entertainment and Arts- P12: The advances in computer entertainment, multi-player and online games, technology-enabled art, culture and performance have created a new form of entertainment and art, which attracts and absorbs their participants. The fantastic success of this new field has influenced the development of the new digital entertainment industry and related products and services, which has impacted every aspect of our lives.

Chủ đề:
Lưu

Nội dung Text: Handbook of Multimedia for Digital Entertainment and Arts- P12

  1. 322 M. Furini and M. Montangero two other policies, the revenue of a customer depends on the number of customers that receive the song directly and indirectly from him/her. Thus, the way in which the song spreads among customers greatly influences their revenue; e.g., a customer might have a big income with little effort if the customers to whom the song is delivered put a lot of effort in reselling the song; customers might get an unexpected reward also after a long period of time from the moment he/she sold the song. The proportional reward policy produces better results than the equal distribution, as a customer needs a smaller number of customers in its subtree, even if to get a full refund of CI , the minimum number of customers is reasonable in both cases. Moreover, although we analyzed the worst-case scenario, this is unlikely to hap- pen in reality, especially if we think that music is distributed according to social relationships. Hence, with a multi-channel distribution strategy, in average, even a smaller number of customers has to be reached and it is likely that many customers (the higher they are in the tree, the better) might have a revenue grater than CI . From the store point of view, the choice of the policy depends on which, among the customers, the store wants to favor: the selfish policy favors the ones that buy from the store; the equal favors the customers that join the music distribution earlier; the proportional favors the customers that actually distribute the song. Related Work Content distribution in a mobile environment is a subject investigated in recent liter- ature: some are experiencing the development of ad-hoc P2P networks in a mobile environment [13, 14], others are proposing to disseminate contents in Wi-Fi based ad-hoc networks through epidemic algorithms [18, 29]. The multi-channel distribution outlined in this paper does not require a real P2P network, as the song delivery simply requires the cooperation of two customers, making the operation more similar to what happens when a friend text-messages or sends an MMS to another friend. In this case, the message content is the song and the network used is other than the cellphone network. Many of the approaches present in literature are designed to stimulate users cooperation in a P2P networks [2, 3, 25, 28]; for example, peers are asked to route queries or are limited in the use of bandwidth according to the amount of bandwidth they provided to the system. For scalability reasons, most of these mechanisms are distributed, requiring only local information available at each peer. This might lead to malicious modification of peer local information by the peer itself, and hence tamper resistant software should be employed at the user side. Our mechanism is simple to implement because it comes with a centralized control mechanism at no cost: the store can always keep track of the spreading of the song and is always able to correctly assign revenues. Whenever a user wants to play a song he/she just bought, he/she needs to buy the license from the store. At this stage, the store can easily record who sold and who bought the song, updating the information about song spreading.
  2. 14 Incentive Mechanisms for Mobile Music Distribution 323 We are not aware of any distribution strategy that couples cellphone networks and free-of-charge technologies to distribute contents, and that makes use of incentive mechanisms to stimulate the customer cooperation in content distribution. Conclusions In this paper we analyzed the characteristics of the current mobile music scenario, investigating the communication infrastructure, the pricing strategy and the copy- right protection scheme currently used. The analysis highlighted that a replication of the strategy used to distribute contents in the Internet-based music market is not worth applying in the mobile scenario, as it presents critical problems (excessive download time and high cost). To mitigate such problems, we show that a multi-channel distribution strategy can be successful. In such a strategy, customers can re-distribute the song acquired by using the free-of-charge communication technologies provided in cellphones. We showed that by using a smart protection scheme, music sharing could avoid piracy. We also present an incentive mechanism, coupled with three different reward poli- cies, which stimulates customers cooperation by providing a financial compensation to those customers who help distributing music files. The evaluation of the multi-channel distribution strategy equipped with the pro- posed incentive mechanism showed that considerable benefits may be received by all the entities involved in the mobile music distribution, from music stores to cus- tomers, to cellphone network providers. References 1. K. C. Almeroth and A. Garyfalos. Coupons: Wide scale information distribution for wireless ad hoc networks. In Proc. of IEEE Globecom, December 2004. 2. K. G. Anagnostakis and M. B. Greenwald, Exchange-based incentive mechanisms for peer-to- peer file sharing. In International Conference on Distributed Computing Systems, 2004. 3. K. G. Anagnostakis, M. B. G. Yang, T. Condie, S. Kamvar, and H. Garcia-Molina, Non- cooperation in competitive p2p networks. In International Conference on Distributed Com- puting Systems, 2004. 4. P. Antoniadis, C. Courcoubetis and B. Strulo, Incentives for content availability in memory- less peer-to- peer file sharing systems. SIGecom Exch., Vol.5, No. 4, pp. 11–20, 2005, ACM press. 5. K. Biddle, P. England, M. Peinado, and B. Willman. The darknet and the future of content distribution. In Proc. of the ACM Workshop on DRM, 2002. 6. B. Brown, A. J. Sellen, and E. Geelhoed. Music sharing as a computer supported collabora- tive application. In Proceedings of ECSCW 2001, Bohn, Germany, 2001. Kluwer academic publishers. 7. L. Buttyan and J. P. Hubaux. Stimulating cooperation in selforganizing mobile ad hoc net- works. ACM/Kluwer Mobile Networks and Applications (MONET), 8(5):579–592, October 2003.
  3. 324 M. Furini and M. Montangero 8. C. J. Cobb and W. D. Hoyer. Planned versus impulse purchase behavior. Journal of Retailing, 62, April 1986. 9. R. Dingledine and P. Syverson. Reliable mix cascade networks through reputation. In Pro- ceedings of the Sixth International Financial Cryptography Conference (FC02), March 2002. 10. M. Feldman, K. Lai, I. Stoica and J. Chuang, Robust incentive techniques for peer-to-peer net- works, Proceedings of the 5th ACM conference on Electronic commerce, pp. 102–111, 2004. 11. M. Furini, M. Montangero, “The Impact of Incentive Mechanisms in Multi-Channel Mobile Music Distribution”, Multimedia Tools and Applications, Vol. 37. No. 3, pp. 365–382, March 2008. Springer Netherlands Editor. 12. M. Furini, M. Montangero, “The Use of Incentive Mechanisms in Multi-Channel Mobile Music Distribution”, Proceedings of 2nd IEEE International Conference on Automated Pro- duction of Cross Media Content for Multi-Channel Distribution (AXMEDIS 2006), Leeds, UK, December 12–15. IEEE Computer Press 2006. 13. E. Harjula, M. Ylianttila, J. Ala-Kurikka, J. Riekki, J. Sauvola, Plug-and-play application platform: towards mobile peer-to-peer, Proceedings of the 3rd international conference on Mobile and ubiquitous multimedia MUM, October 2004. 14. N. Hatt BlueFramework - Application Framework for Bluetooth Enabled Mobile Phones, TIK- ˜ MA-2005-16, ETH ZA1=4rich, Switzerland, 2005. 15. T. H.-T. Hu, K. Wongrujira, and A. Seneviratne. Reputation in peer-to-peer networks. In Proceedings of the IEEE International Conference on Communications (ICC 2004), pages ˆ 1411A–1415, Paris, France, June 2004. 16. D. Hughes, G. Coulson, J. Walkerdine, Free Riding on Gnutella Revisited: The Bell Tolls?, IEEE Distributed Systems online, Vol. 6, No. 6, June 2005. 17. IFIP. Digital music report 2008 – Summary. Research report, International Federation of the Phonographic Industry, 2008. [on-line] Available at http://www.ifpi.org/content/ library/DMR2008-summary.pdf 18. A. Khelil, C. Becker, J. Tian, K. Rothermel, An epidemic model for information diffusion in MANETs, Proceedings of the 5th ACM International Workshop on Modeling, Analysis and Simulation of Wireless and Mobile Systems, September 2002. 19. J. Liang, R. Kumar, Y. Xi, and K. Ross, Pollution in P2P File Sharing Systems. In IEEE Infocom, Miami, FL, USA, March 2005. 20. T.S. Messerges and E.A. Dabbish, Digital Rights Management in a 3G Mobile Phone and Beyond Proceedings of Digital Right Management (DRM), Washington, October 2003. ACM Press. 21. Microsoft Corporation, Microsoft PlayReady powers next-generation media experiences on mobile networks, http://www.microsoft.com/presspass/press/2007/feb07/02-123GSMNew TechnologyPR.mspx ˆ 22. M. J. O’Grady and G. M. P. OA’Hare Just-In-Time Multimedia Distribution in a Mobile Com- puting Environment. IEEE Multimedia, 62–74, 2004. 23. G. P. Premkumar. Alternative distribution strategies for digital music. Communication of the ACM, 9(9):89–95, September 2003. 24. P. Resnick, K. Kuwabara, R. Zeckhauser, and E. Friedman. Reputation systems. Communica- tions of the ACM, 43(12):45–48, 2000. 25. A. Roczniak and A. El Saddik, Impact of incentive mechanisms on quality of experience. Proceedings of the 13th annual ACM international conference on Multimedia, pp. 311–314, 2005. 26. D. Rook. The buying impulse. Journal of Consumer research, 14(2):189–199, September 1987. 27. N. B. Salem, M. J. L. Buttyan, and J. P. Hubaux. A charging and rewarding scheme for packet forwarding. In Proceedings of MobiHoc, June 2003. 28. Q. Sun and H. Garcia-Molina, Slic: A selfish link based incentive mechanism for unstructured peer-to-peer networks. In International Conference on Distributed Computing Systems, 2004.
  4. 14 Incentive Mechanisms for Mobile Music Distribution 325 29. Paul Tennent, Malcolm Hall, Barry Brown, Matthew Chalmers, Scott Sherwood, Toward social mobility: Three applications for mobile epidemic algorithms Proceedings of the 7th international conference on Human computer interaction with mobile devices & services MobileHCI ’05. 30. Wireless-Intelligence. World cellular connection. Research report, Wireless Intelligence, 2005. [on-line] Available at https://www.wirelessintelligence.com
  5. Chapter 15 Pattern Discovery and Change Detection of Online Music Query Streams Hua-Fu Li Introduction In recent years, many applications generate large amount of data streams in real time. For example, sensor data generated from sensor networks, online transaction flows in retail chains, stream Web click-sequences and records in Web services and applications, performance measurement in network monitoring and traffic manage- ment, call records in telecommunications. Mining data streams differs from mining traditional static data sets in two main aspects [2]: The volume of a continuous data stream over its lifetime could be huge and fast changing. The queries require timely answers, and the response time is short. Hence, it is not possible to store all the streaming data in main memory or even in secondary storage. This motivates the design for in-memory summary data struc- ture with small memory footprints that can support both one-time and continuous queries. Furthermore, online approach of mining such data has to sacrifice the cor- rectness of their analysis results by allowing some counting errors, i.e., it generates approximate results, and only has single pass over the data [8]. Recently, online music downloading is a hot Web service. Many companies, such as Apple’s iTunes [18], Napster [16], Loudeye, Yahoo’s MusicMatch [17], Kuro [20], KKBox [19], and EasyMusic.com, provide this Web service. Accord- ing to the reports of IFPI (International Federation of the Phonographic Industry; IFPI: http://www.ifpi.org/), there are more than 60 hundred millions of online mu- sic downloads at 2006. For example, the amount of online music downloading of Apple’s iTune is about 50 hundred millions from 2004 to 2007. Hence, knowledge discovery of such online music downloading behaviors of customers is an important research and a practical issue for data mining. H.-F. Li ( ) Department of Computer Science, Kainan University, Taoyuan, Taiwan e-mail: hfli@mail.knu.edu.tw B. Furht (ed.), Handbook of Multimedia for Digital Entertainment and Arts, 327 DOI 10.1007/978-0-387-89024-1 15, c Springer Science+Business Media, LLC 2009
  6. 328 H.-F. Li Fig. 1 Computation model for music query streams The issue comes from the context of online music-downloading services (such as Apple’s iTunes, Napster, Loudeye, Yahoo’s MusicMatch, Kuro, KKBox, and EasyMusic. com), where the stream in question are streams of queries, i.e., music- downloading requests, sent to the server, and we are interested in finding the useful music melody structures requested by most customers during some period of time. The discovered patterns can be used to predict the future trend of online music styles and to personalize the Web services of online music downloading. With the processing model of music query streams presented in Fig. 1 [11, 12], the melody stream processor and the summary data structure are two major components in such a streaming environment. The user query processor receives user queries in the form of , and then transforms the queries into music data (i.e., melody sequences) in the form of by querying the music database. Note that the com- ponent Buffer can be optionally set for temporary storage of recent music melody sequences from the music query streams. Among various techniques of data mining, frequent pattern mining is one of the most popular data mining approaches used to discover the customers’ behaviors from large data sets. However, traditional data mining techniques for discovering frequent patterns are not feasible for mining frequent patterns from such application of online music downloading. Because such data characteristic is streaming, the pro- posed methods for mining such streaming data need new capabilities such as using limited memory to maintain the essential information embedded in an unbound data stream, one-pass data scan and real time processing of each incoming data element. Mining music data is one of the most important research issues in multimedia data mining. Although several techniques have been developed for discovering and analyzing the content of static music data [3, 9, 10, 14, 15], new techniques are needed to analyze and discover the content of streaming music data. Recently, two
  7. 15 Pattern Discovery and Change Detection of Online Music Query Streams 329 efficient one-pass mining algorithms, FCS-stream [11] and MMSLMS [12], were pro- posed by Li et al. for discovering the closed frequent melody structures and the maximal frequent melody structures over the entire history of a continuous music query stream. Both algorithms are stream mining methods of a landmark window. However, the knowledge embedded in streaming data is likely to be changed as time goes by. Identifying the recent changes of data streams quickly can provide valuable information for the analysis of the streaming data [5]. Hence, we need new single-pass approaches for mining frequent patterns from the streaming online mu- sic downloading requests within a sliding window. Several one-pass mining methods [5, 6] were proposed for finding frequent patterns over data streams within a sliding window. The baseline method, called SWFI-stream, for mining frequent patterns over transaction data streams within a sliding window was proposed by Chang and Lee [5]. In the framework of SWFI- stream algorithm, there are two phases for mining frequent patterns over stream sliding windows. One is a window initialization phase. The phase is activated while the number of incoming data transactions generated so far is less than or equal to a predefined window size. The other is a window sliding phase. The second phase is activated after the window becomes full. The SWFI-stream algorithm is composed of four steps. First, all sub-patterns of a transaction are extracted. Second, these sub- patterns are inserted into a prefix-tree lattice structure. Third, all in-frequent patterns are pruned from the lattice structure. Finally, all frequent patterns are generated form the lattice structure. The first two steps are performed in the window initialization phase and the last two steps are performed in the window sliding phase. There are several performance bottlenecks of the typical solution. First, SWFI- stream needs the extra memory to maintain the original window in a temporal list and a prefix-tree lattice structure for storing the frequent patterns and semi-frequent patterns. Second, the processing complexity of enumerating each incoming trans- action is exponential, i.e., O.k 2 /, where k is the length of transaction. Third, the cost of maintaining the prefix-tree lattice structure of this typical solution is also exponential. Chi et al. [6] proposed a sliding window based algorithm, called Moment, which might be the first method to find frequent closed itemsets from transaction data streams. A summary data structure, called CET (Closed Enumeration Tree), is used in the Moment algorithm to maintain a dynamically selected set of itemsets over a sliding window. These selected itemsets consist of closed frequent itemsets and a boundary between the closed frequent itemsets and the rest of the itemsets. CET covers all necessary information because any status changes of itemsets (e.g. from infrequent to frequent or from frequent to infrequent) must be through the boundary in CET. Whenever a sliding occurs, it updates the counts of the related nodes in CET and modifies CET. Experiments of Moment algorithm show that the boundary in CET is stable so the update cost is little. However, Moment must maintain huge CET nodes for a closed frequent itemset. The ratio of CET nodes and closed frequent itemsets is about 30:1. If there are a large number of closed frequent itemsets, the memory requirement of Moment algorithm will be inefficient.
  8. 330 H.-F. Li In this paper, an efficient stream mining algorithm, called FTP-stream (Frequent Temporal Pattern mining of streams), is proposed to find the frequent temporal pat- terns over melody sequence streams. In the framework of our proposed algorithm, an effective bit-sequence representation is used to reduce the time and memory needed to slide the windows. The FTP-stream algorithm can calculate the support thresh- old in only a single pass based on the concept of bit-sequence representation. It takes the advantage of “left” and “and” operations of the representation. Experi- ments show that the proposed algorithm only scans the music query stream once, and runs significant faster and consumes less memory than existing algorithms, such as SWFI-stream and Moment. The proposed FTP-stream algorithm is an exact stream mining method. That means the FTP-stream algorithm can generate the set of frequent patterns over music query streams without any information loss. It is because that the proposed algo- rithm uses bit-sequence representation of chord-sets to record the exact frequency of each chord-set. Then, the algorithm constructs the set of frequent patterns by us- ing these bit-sequence representations of chord-sets. Consequently, generating exact results is one of the benefits of our proposed algorithm. After mining frequent temporal patterns from online music query streams, the next issue of this work is that how to use these frequent patterns to predict the future trend of online music styles and to personalize the Web services of online music downloading. Hence, we need new information, i.e., changes of patterns, to as- sist the domain experts to predict the Web user behaviors and personalize the Web services. The second research issue of this paper is change detection of frequent patterns across data streams. With data streams, people are more often interested in mining queries such like “Compared to the history, what are the distinct features of the cur- rent status?”, “What are the most popular melody structures in the last four hours?” and “What are the relatively stable factors over time?” To answer such queries, we have to examine the changes of streaming data to assist the domain experts to predict the future trend of popular online music styles [7, 13]. Therefore, a sim- ple single-pass algorithm, called MQS-change (changes of Music Query Streams), is proposed to detect the changes of frequent patterns across music query streams. Experiments show that the proposed MQS-change algorithm is an effective method to detect the changes of data streams efficiently. Based on our best knowledge, the proposed MSQ-change algorithm is the first stream mining algorithm for discover- ing the changes of frequent patterns over music query data streams. Furthermore, for answering such above example query “What are the most popular melody struc- tures in the last four hours?”, the definitions of MFI (maximal frequent itemset), MFS (maximal frequent item-string), ICI (increasing changed itemset) and ICS (in- creasing changed item-string) can be used as popular melody structures in this paper although there are many other definitions of most popular melody structures depend on domain knowledge of experts. Note that MFI, MFS, ICI, ICS are defined in con- cluding Section. Hence, if the sliding window can be modified to contain the melody sequences generated from last four hours, we can use the proposed MQS-change to mine the most popular melody structures from last four hours.
  9. 15 Pattern Discovery and Change Detection of Online Music Query Streams 331 Problem Definition of Pattern Discovery of Music Query Streams In this section, several features of music data are described and the problem definition of pattern discovery of music query streams is described. The basic ter- minologies on music used in this paper are referred to [9, 10, 14]. A chord is the sounding combination of three or more notes at the same time. A note is a single symbol on a musical score, indicating the pitch and duration of what is to be sung and played. A chord-set is a set of chords. Let « D fi1 ; i2 ; : : : ; in g be a set of chord-sets, called items for simplicity, where n is the total number of chord-sets used for pattern mining. An itemset is a subset of items, i.e., a set of chord-sets. A k-itemset is an itemset with k items, denoted as .x1 ; x2 ; : : : ; xk /, where k is the length of that itemset. For brevity, the commas are omitted. For example, a 3-itemset (a, b, c) is written as (abc), where a, b, c are chord-sets, and the length of (a, b, c) is 3. A melody sequence stream (MSS) is a sequence of incoming melody sequences, Œm1 ; m2 ; : : : ; mN /, where a melody sequence mi is an itemset and N is an unknown large number of melody sequences that will arrive. Note that, in the representation of Œm1 ; m2 ; : : : ; mN /, the symbol “[” is the starting point of incoming melody sequence of the data stream and the symbol “)” is the current point of the data stream. Hence, it means that m1 is the first melody sequence and mN is latest incoming melody sequence of the data stream. The sequence of w recent melody sequences of MSS is called the sliding window (SW) of MSS, where w is the size of the SW. The support of an itemset X , denoted as sup.X /, is the number of melody sequences in SW containing X as a subset. An itemset X is a frequent temporal pattern (FTP), if and only if sup.X / s w, where s is a user-defined minimum support threshold in the range of [0, 1]. An itemset X is called infrequent temporal pattern (ITP), if and only if sup.X / < s w. A frequent temporal pattern is called maximal frequent temporal pattern (MFTP) if and only if it is not a subset of any other frequent temporal patterns. Definition of Problem 1 Given a melody sequence stream MSS and the size of slid- ing window w, the problem of online mining of user-centered music query streams is to discover the set of frequent temporal patterns by one scan of the w recent melody sequences of MSS with an adjustable user-defined minimum support threshold s in the range of [0, 1]. Example 1. Let the first four melody sequences of a stream of melody sequences be ; ; , and , where m1 ; m2 ; m3 , and m4 are the identifiers of melody sequences and a, b, c, d and e are the identifiers of chord-sets, i.e., item identifiers. Let the size w of sliding window be 3 and the user-specified minimum support threshold s be 0.5. The stream of first four melody sequences is composed of two sliding windows, i.e., SW1 D Œm1 ; m2 ; m3  and SW2 D Œm2 ; m3 ; m4 , where first window SW1 contains the sequences m1 ; m2 and m3 , and the second window SW2 contains the sequences m2 ; m3 and m4 . Thus,
  10. 332 H.-F. Li A Melody Sequence Stream FTPs of SW1 FTPs of SW2 (a), (b), (c), (e) (b), (c), (e) (ac), (bc), (be), (ce) (bc), (be), (ce) (bce) (bce) A melody sequence stream is formed by melody sequences arriving in series Fig. 2 An example melody sequence stream and the frequent temporal patterns in two consecutive sliding windows SW1 and SW2 the number of melody sequences of FTPs must have at least two sequences (0.5 of w D 3 is 1.5). The result of example 1 is shown in the right side of Fig. 2, and the explanation on how to get the FTPs in Fig. 2 is given in Section “The Proposed Algorithm FTP-stream”. In Fig. 2, the discovered FTPs in SW1 are four 1-itemsets, f.a/; .b/; .c/; .e/g, four 2-itemsets, f.ac/; .bc/; .be/; .ce/g, and one 3-itemset, f(bce)g. The dis- covered FTPs in SW2 are three 1-itemsets, f.b/; .c/; .e/g, three 2-itemsets, f.bc/; .be/; .ce/g, and one 3-itemset, f(bce)g. In this example, we can find that f.a/; .ac/g are FTPs in SW1 , but are not FTPs in SW2 . Mining of Frequent Temporal Patterns in Music Query Streams Data Processing: Bit-sequence Representation In the proposed algorithm, for each item X in the current sliding window, a bit- sequence with w bits, denoted as Bit.X /, is constructed. If an item X is in the i -th music sequence of current sliding window, the i -th bit of Bit.X / is set to be 1; otherwise, it is set to be 0. The process is called bit-sequence transform. Example 2. Consider an example melody sequence stream in Fig. 2 and assume that sliding window is composed of three melody sequences. Five items (chord-sets), a, b, c, d, and e, are used in this example. The first window SW1 consists of three con- secutive melody sequences: ; , and . Because the item a appears in the 1st and 3rd melody sequences of SW1 , the bit- sequence of a is 101, i.e., Bit.a/ D 101. Finally, a set of bit-sequences of 1-itemsets, i.e., Bit.b/ D 011; Bit.c/ D 111; Bit.d / D 100 and Bit.e/ D 011, is generated by using bit-sequence transform for each new item of SW1 .
  11. 15 Pattern Discovery and Change Detection of Online Music Query Streams 333 The Proposed Algorithm FTP-stream In this section, based on the representations of appearing items, an efficient steam mining algorithm, called FTP-stream (Frequent Temporal Pattern mining of streams), is introduced. In the framework of FTP-stream algorithm, an effective list structure, called 2C-list (a list of 2-C andidates), is constructed after performing bit-sequence transform of each incoming item. Therefore, each item has its unique 2C-list. Each entry in the 2C-list consists of two fields: X and sup.X /, where X is the item identifier of the item being inserted and sup.X / registers the number of sequences containing the item X. The process of stream mining is composed of three phases: window initialization phase, window sliding phase, and frequent temporal pattern generation phase. Window Initialization Phase of FTP-stream Algorithm The window initialization phase is activated while the number of melody sequences generated so far in a melody sequence stream is less than or equal to a user- predefined sliding window size w. In this phase, each item in the new incoming melody sequence is transformed into its bit sequence representation by using the bit-sequence transform. Example 3. Consider the melody sequence stream in Fig. 2. The first sliding win- dow SW1 contains three melody sequences: m1 ; m2 , and m3 . The bit-sequences of items and 2C-lists of sliding window SW1 in the initialization phase of FTP-stream algorithm are shown in Fig. 3. Fig. 3 Bit-sequences and 2C-lists of items of SW1 after window initialization phase (from m1 to m3 )
  12. 334 H.-F. Li Window Sliding Phase of FTP-stream Algorithm The window sliding phase is activated after the sliding window becomes full. A new incoming melody sequence is appended to the current sliding window, and the oldest melody sequence is removed from the window. For removing oldest information, an effective pruning method is used in the FTP- stream algorithm. Based on the bit-sequence representation, FTP-stream uses the bitwise-left-shift operation to remove the aged melody sequences from the set of items in the current sliding window. After sliding the window, an effective pruning method, called Item-Prune, is used to improve the memory requirement of FTP- stream. The pruning approach is that an item X in the current sliding window is dropped if and only if sup.X / D 0. Note that the support of an item(-set) is com- puted by counting the number of ones from its bit sequence representation. For example, sup.a/ is 2 and sup.c/ is 3 since Bit.a/ D 101 and Bit.c/ D 111. Example 4. Consider the melody sequence stream in Fig. 2. Before the fourth melody sequence is processed, the first melody sequence m1 must be removed from the current sliding window using the bitwise-left-shift operation on the set of items. Hence, Bit .a/ is modified from 101 to 010 by using bitwise- left-shift operation. After performing bitwise-left-shift operation for each item, we got Bit.c/ D 110; Bit.d / D 000; Bit.b/ D 110, and Bit.e/ D 110. Then, the new melody sequence is processed by using bit-sequence transform. The result is shown in Fig. 4. Note that the item d is dropped since Bit.d / D 000, i.e., sup.d / D 0, based on Item-Prune method. Frequent Temporal Pattern Generation Phase of FTP-stream The frequent temporal pattern phase is performed only when the up-to-date set of FTPs is requested. In this phase, the FTP-stream algorithm uses a level-wise Fig. 4 Bit-sequences and 2C-lists of items after sliding SW1 to SW2
  13. 15 Pattern Discovery and Change Detection of Online Music Query Streams 335 method to generate the set of candidate temporal patterns CTPk (candidate temporal patterns with k items) from the pre-known frequent temporal patterns FTPk 1 (frequent temporal patterns with k 1 items) according to the Apriori property, where k > 2. The step is called CTP-Gen-W2C (Candidate Temporal Pattern Generation Without 2-Candidates). Then, FTP-stream uses the bitwise AND op- eration to compute the supports of these candidates in order to find the frequent ones FTPk . The generation-then-test process is stopped until no new candidates with k C 1 items .CTPkC1 / are generated. One of the benefits of the proposed al- gorithm is that the FTP-stream algorithm generates candidates from 3-candidates to l-candidates, where l is the size of largest itemset. It is because the set of frequent 2-itemsets can be determined by the 2C-lists and its bit-sequences. Consequently, it overcomes the performance bottleneck of 2-candidate generation of the level-wise frequent pattern mining algorithms. Example 5. Consider the bit-sequences and 2C-lists of items of SW2 in Fig. 4, and let the minimum support threshold s be 0.5. Hence, a temporal pattern X is frequent if sup.X / 0:5 3 D 1:5. In the following, we discuss the mining steps of frequent temporal patterns of SW2 . First, the FTP-stream algorithm checks the bit-sequence of each item to find the frequent 1-itemsets, i.e., .b/; .c/ and .e/. Then, FTP-stream generates frequent 2- itemsets, .bc/; .be/ and .ce/, by combining frequent 1-item .c/ with item e where sup.e/ D 2 in 2C-list of item e and frequent 1-item .b/ with items c and e in 2C- list of item b. After that we find that sup.ce/ D 2; sup.bc/ D 2, and sup.be/ D 3. Then, the algorithm generates one candidate 3-itemset (bce) from 2C-list of item b according to Apriori property. After that FTP-stream uses bitwise AND operation to count the sup.bce/ D 2, i.e., Bit.b/ AND Bit.c/ AND Bit.e/ D 110. Because no new candidates are generated, the generation-then-test process is stopped. Hence, there are six frequent temporal patterns, .b/; .c/; .bc/; .be/; .ce/; .bce/, generated by FTP-stream in SW2 . Experimental Evaluation of Pattern Discovery of Music Query Streams In this section, the experiments are performed to compare the proposed algorithm FTP-stream with the algorithms SWFI-stream [5] and Moment [6]. The source code of Moment algorithm, denoted as MomentFP, is provided by Dr. Yun Chi [5]. In this paper, we focus on the problem of mining frequent itemsets from stream slid- ing window, but MomentFP algorithm is a closed frequent itemset mining method. Hence, we modified the MomentFP algorithm by enumerating each closed frequent itemset into a subset of frequent itemsets. The new MomentFP algorithm for mining frequent itemsets is called MomentFPC in this paper.
  14. 336 H.-F. Li All the programs are implemented using Microsoft Visual C C C Version 6.0 and performed on a 1.80 GHz Pentium R PC machine with 512 MB memory running on Windows 2000. For testing frequent temporal patterns mining of melody sequence streams, we generate melody sequence streams using IBM synthetic data genera- tor proposed by Agrawal and Srikant [1]. One synthetic melody sequence stream, denoted by T5.I4. D1000K, of size 1 million melody sequences each are used to evaluate the performance of the proposed FTP-stream algorithm. T5.I4.D1000K, with 1,000 unique items, has an average melody sequence size of five items with an average maximal frequent temporal pattern size of four items. The minimum support threshold s used in the following experiments is set to 0.001. The comparisons of memory usages of the existing algorithms with the pro- posed FTP-stream algorithm are shown in Figs. 5, 6 and 7. Figure 5 shows the memory usage of the window initialization phase. As shown in Fig. 5, FTP-stream consumes only about 2.9 MB in window initialization phase, but the memory con- sumption of original data is increased linearly from 0.3 MB to 4.2 MB. From the figure, we can see that as the number of incoming melody sequences increases, the memory requirements of all algorithms grow. However, the memory requirement of Original Data FTP-stream SWFI-stream MomentFP+ 120 Memory Usage (MB) 100 80 60 40 20 0 2K 4K 6K 8K 10K 12K 14K 16K 18K 20K Incoming Melody Sequences (Window Size = 20K) Fig. 5 Comparisons of memory usages in the window initialization phase Original Data FTP-stream SWFI-stream MomentFP+ 120 Memory Usage (MB) 100 80 60 40 20 0 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th Incoming Sliding Windows (Window Size = 20K) Fig. 6 Comparisons of memory usages in the window sliding phase
  15. 15 Pattern Discovery and Change Detection of Online Music Query Streams 337 FTP-stream SWFI-stream MomentFP+ Memory Usage (MB) 120 100 80 60 40 20 0 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th Incoming Sliding Windows (Window size = 20K) Fig. 7 Comparisons of memory usages in frequent temporal patterns generation phase FTP-stream SWFI-stream MomentFP+ 500 Processing Time (seconds) 400 300 200 100 0 10K 20K 40K 60K 80K 100K Window Size (1K = 1,000) Fig. 8 Comparisons of processing time of window initialization phase under different window sizes our FTP-stream algorithm is significant less than that of algorithms SWFI-stream and MomentFPC. Figure 6 shows the memory usage of the window sliding phase. In the window sliding phase, the memory usage of FTP-stream is approximately a half of original data. The memory requirements of algorithms SWFI-stream and MomentFPC are significant larger than that of FTP-stream algorithm. Figure 7 shows the memory usage of the frequent temporal pattern generation phase. In this phase, the memory usage of FTP-stream is between 43 MB to 48 MB but the memory requirement of FTP-stream is still less than that of algorithms SWFI-stream and MomentFPC. Con- sequently, the memory requirement of the proposed FTP-stream algorithm is less than that of the existing well-known algorithms SWFI-stream and MomentFPC. The comparisons of processing time of these algorithms are given in Figs. 8, 9, and 10. Figure 8 shows the processing time of window initialization phase of algorithms, FTP-stream, SWFI-stream and MomentFPC, under different window sizes from 10 K melody sequences to 100 K melody sequences. From the figure,
  16. 338 H.-F. Li FTP-stream SWFI-stream MomentFP+ Processing Time (seconds) 120 100 80 60 40 20 0 10K 20K 40K 60K 80K 100K Window Size (1K = 1,000) Fig. 9 Comparisons of mining time of frequent temporal patterns under different window sizes FTP-stream SWFI-stream MomentFP+ 600 Total Processing Time 500 400 (seconds) 300 200 100 0 10K 20K 40K 60K 80K 100K Window Size (1K = 1,000) Fig. 10 Comparisons of total processing time (Dwindow initialization timeCwindow sliding timeCpattern generation time) under different window sizes the processing time of FTP-stream is less than that of algorithms SWFI-stream and MomentFPC. Figure 9 shows the mining time of frequent temporal patterns under different window sizes from 20,000 melody sequences to 100,000 melody sequences. From this figure, we can find that the frequent pattern generation time of FTP-stream is greater than that of algorithms SWFI-stream and MomentFPC. The reason is that the FTP-stream algorithm uses a level-wise method to generate all frequent pat- terns in this phase. Although the processing time of FTP-stream in this phase is greater than the existing algorithms, the total processing time (included window ini- tialization time, window sliding time and pattern generation time) is less than that of algorithms SWFI-stream and MomentFPC. The result is shown in Fig. 10. Con- sequently, the proposed FTP-stream algorithm is faster than the existing algorithms SWFI-stream and MomentFPC.
  17. 15 Pattern Discovery and Change Detection of Online Music Query Streams 339 Change Detection of Online Music Query Streams Problem Definition In this section, we describe several features of music data used in this paper and define the problem of detecting changes of user-centered music query streams. For the basic terminologies on music, we refer to [8, 12]. Definition 1. The type I melody structure is represented as a set of chord-sets. Definition 2. The type II melody structure is represented as a string of chord-sets. Note that the difference between sets and strings is the order of elements. The order of chord-sets in a set (type I melody structure) is not necessary, but it is important in a string (type II melody structure). For example, given a set of chord- sets fa; b; cg, there are one set of chord-set (abc) with 3 chord-sets, but there are six strings of chord-sets with 3 chord-sets, i.e., ; ; ; ; , and . Definition 3. A string Z is called an item-string, i.e., a string of chord-sets. A k-item-string is represented by , where xi 2 «; 8i D 1; 2; : : : ; k. The support of an item-string Z, denoted as sup.Z/, is the number of melody sequences containing Z as a substring in the MSS so far. An item-string is frequent if sup.Z/ s jMSSj. Definition 4. A frequent itemset is called a maximal frequent itemset (MFI) if it is not a subset of any other frequent itemsets. Definition 5. A frequent item-string is called a maximal frequent item-string (MFS) if it is not a substring of any other item-strings. Definition 6. A maximal frequent itemset P is called positive itemset burst (PIB) if its sup.P /z sup.P /z 1 @MFI , where @MFI is a user-specified itemset burst threshold in the range of [0, 1], sup.P /z is the estimated support of P from window w1 to window wz , and z is the window identifier of current window. Definition 7. A maximal frequent itemset P is called negative itemset burst (NIB) if its sup.P /z 1 sup.P /z @MFI . Definition 8. A maximal frequent item-string Q is called positive item-string burst (PSB) if its sup.Q/z sup.Q/z 1 @MFS , where @MFS is a user-specified item- string burst threshold in the range of [0, 1], sup.Q/z is the estimated support of Q from window w1 to window wz . Definition 9. A maximal frequent item-string Q is called negative item-string burst (NSB) if its sup.Q/z 1 sup.Q/z @MFS .
  18. 340 H.-F. Li Definition 10. A maximal frequent itemset P is called increasing changed itemset (ICI) if @MFI > .sup.P /iC1 sup.P/i / "MFI ; 8i; i D z h1 C1; z h1 C2; : : : ; z, where "MFI is a user-specified increasing changed itemset threshold in the range of [0, 1], and h1 is a number of basic windows defined by user. Definition 11. A maximal frequent item-string Q is called increasing changed item-string (ICS) if @MFS > .sup.Q/j C1 sup.Q/j / "MFS ; 8j; j D z h2 C 1; z h2 C 2; : : : ; z, where "MFS is a user-specified increasing changed item-string threshold in the range of [0, 1], and h2 is a number of basic windows defined by user. Definition 12. A maximal frequent itemset P is called decreasing changed itemset (DCI) if @MFI > .sup.Q/j sup.Q/j C1 / MFI ; 8j; j D z h1 C 1; z h1 C 2; : : : ; z, where MFI is a user-specified decreasing changed item-string threshold in the range of [0, 1], and h1 is a number of basic windows defined by user. Definition 13. A maximal frequent item-string Q is called decreasing changed item-string (DCS) if @MFS > .sup.Q/j sup.Q/j C1 / MFS ; 8j; j D z h2 C1; z h2 C 2; : : : ; z, where MFS is a user-specified decreasing changed item-string threshold in the range of [0, 1], and h2 is a number of basic windows defined by user. Definition of Problem 2 Given a MSS, s; @MFI ; @MFS ; "MFI ; "MFS ; MFI , and MFS , the problem of detecting changes in user-centered music query streams is to main- tain the set of MFI and MFS, and to detect the set of PIB, NIB, PSB, NSB, ICI, ICS, DCI, and DCS, by one scan of a continuous user-centered music query stream. Detecting Changes from User-centered Music Query Streams In this section, we developed an effective algorithm, called MQS-change (changes of Music Query Streams), to detect the changes of maximal frequent melody struc- tures in current user-centered music query streams. Two music melody structures (set of chord-sets and string of chord-sets) are maintained and four melody structure changes (positive burst, negative burst, increasing change and decreasing change) are monitored in a new summary data structure, called MSC-list (a list of Music Structure Changes). The Proposed Summary Data Structure MSC-list The proposed summary data structure, called MSC-list, consists of two temporal lists, MFI-list and MFS-list, where MFI-list is a list of entries which contains cur- rent maximal frequent itemsets, and MFS-list is a list of entries which maintains maximal frequent item-strings so far. Each entry of MFI-list consists of two fields: pattern-id Y and support-list Y.support-list, where pattern-id is a unique identifier of this maximal frequent itemset, and support-list is composed of a list of .sup.Y /; i /, where i is the window identifier of window wi containing the itemset e.
  19. 15 Pattern Discovery and Change Detection of Online Music Query Streams 341 For example, an entry of MFI- list indicates that the itemset abcd is a maximal frequent itemset and its estimated support is 30% in window w1 , 37% in w2 , 46% in w3 , and 70% in w4 . Assume that the @MFI is 0.2 (i.e., 20%), "MFI D 0:05 (i.e., 5%), and h1 be 3 (i.e., three consecutive windows). Hence, the pattern abcd is an increasing changed itemset from windows w1 to w3 , and has a positive itemset burst in window w4 . Each entry of MFS-list also consists of two fields: pattern-id Z and support-list Z. support-list, where pattern-id is a unique identifier of this maximal frequent item- string, and support-list is composed of a list of .sup.Z/; i /, where i is the identifier of window wi containing the item-string Z. In the following, we use the term maximal frequent pattern (MFP) to substitute the maximal frequent itemset and maximal frequent item-string. Two operations are used to maintain the MSC-list: 1. Update MSC-list: For each entry of MSC-list, MQS- change algorithm updates the support-list of this entry, i.e., append a new support record to the support-list. If an entry e is not a MFP, i.e., sup.e/ < s jMSSj, the entry is deleted from the current MSC-list. 2. New MSC-list: if MQS-change finds a maximal frequent pattern P from the cur- rent window wz and P … MSC-list, and sup.P / s w, where s is the minimum support threshold in the range of [0, 1], and w is the window size, a new entry of the form , where z is the current window identifier, is created in the current MSC-list. In this section, a new summary data structure, called MSC-list (a list of Music Structure Changes), is developed to maintain the essential information about MFI, MFS, PIB, NIB, PSB, NSB, ICI, ICS, DCI, and DCS with their supports embedded in the individual window of the current MSS. A simple single-pass algorithm, called MQS-change (changes of Music Query Streams), is proposed to mine the changes from user-centered music query streams. The Proposed MQS-change Algorithm The proposed MQS-change (changes of Music Query Streams) algorithm is com- posed of four steps. First, MQS-change repeatedly reads a window of melody sequences into available main memory. Second, the maximal frequent itemsets and maximal frequent item-strings in the current window are mined using the proposed FTP-stream algorithm, and added into MSC-list with their potential supports com- puted. Third, the set of MFIs and MFSs are maintained in the current MSC-list, and the changes are verified by MQS-change. Finally, MQS-change will return the changed patterns immediately if the user-centered music query stream has a change. For discovering each change of patterns, the proposed MQS-change algorithm is divided into five mining procedures, called MQS-change-MFIS (MQS-change for MFI and MFS, as shown in Fig. 11), MQS-change-PNIB (MQS-change for PNI and
  20. 342 H.-F. Li Procedure MQS-change-MFIS (maximal frequent itemsets and item-string mining of MQS-change algorithm) Input: (1) MSS, (2) s. Output: a MSC-list which is composed of MFI and MFS. Begin MSC-list = NULL; Repeat: foreach window wi in MSS do /*∀i =1,2, …, z, and z is the current window id */ Mine MFPs from wi by using FTP-stream algorithm; foreach MFP of wi do if MFP∈MSC-listthen Update MSC-list; else New MSC-list; endif endfor foreach entry e in the MSC-list do /* Pruning of MSC-list */ foreach sup (e) in e.support-list do if sup(e)i < s • (z−i+1) then /* i is the window identifier and sup (e)i is the estimated support of e from w1 to wi */ Delete the entry (sup (e), i) from e.support-list; endif endfor if sup (e) < s • | MSS | then /* entry e is not a MFP */ Delete e from MSC-list; endif endfor endfor End Fig. 11 Procedure MQS-change-MFIS of MQS-change algorithm − − Fig. 12 Procedure MQS-change-PNIB of MQS-change algorithm PNS, as shown in Fig. 12), MQS-change-PNSB (MQS-change for PNS and PNB, as shown in Fig. 13), MQS-change-ICIS (MQS-change for ICI and ICS, as shown in Fig. 14), and MQS-change-DCIS (MQS-change for DCI and DCS, as shown in
Đồng bộ tài khoản