Handbook of Multimedia for Digital Entertainment and Arts- P10

Shared by: Cong Thanh | Date: | File type: PDF | Pages: 30


Handbook of Multimedia for Digital Entertainment and Arts- P10: The advances in computer entertainment, multi-player and online games, technology-enabled art, culture and performance have created a new form of entertainment and art, which attracts and absorbs their participants. The fantastic success of this new field has influenced the development of the new digital entertainment industry and related products and services, which has impacted every aspect of our lives.


11 Hack-proof Synchronization Protocol for Multi-player Online Games 261

Fig. 16 The path of the local avatar Q (thicker line) and the path of the non-local avatar P (thinner line) rendered on Q's local machine, zoomed into the last 60 s

... synchronization (AS). Using AS, each host advances in time asynchronously from the other players but enters lockstep mode when interaction occurs. In lockstep mode, in every timeframe t each involved player must wait for all packets from the other players before advancing to timeframe t + 1. Because this is a stop-and-wait protocol, extrapolation cannot be used to smooth out any delay caused by network latency. In [12], the authors improve the performance of the lockstep protocol by adding pipelines. Extrapolation is still not allowed under the pipelined lockstep protocol. Therefore, if network latency increases and packets are delayed, the game stalls. In [10], the authors propose a sliding pipeline protocol that dynamically adjusts the pipeline depth to reflect current network conditions. The authors also introduce a send buffer to hold the commands generated while the size of the pipeline is adjusted. The sliding pipeline protocol allows extrapolation to smooth out jitter.

Although these protocols are designed to defend against the suppress-correct cheat, they can also prevent speed-hacks in lockstep mode, because players are forced to synchronize within a bounded number of timeframes.
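The stop-and-wait rule of lockstep mode described above can be illustrated with a small sketch. It is not the AS protocol of the cited work: the class name `LockstepSession`, its methods and the packet representation are hypothetical, and networking, pipelining and retransmission are left out. It only shows why a single delayed packet stalls every player in the timeframe.

```python
from dataclasses import dataclass, field


@dataclass
class LockstepSession:
    """Hypothetical tracker of packets received for the current timeframe."""
    players: frozenset
    frame: int = 0
    received: dict = field(default_factory=dict)  # player -> command for `frame`

    def deliver(self, player, frame, command):
        # Packets from unknown players or for another timeframe are ignored here.
        if player in self.players and frame == self.frame:
            self.received[player] = command

    def try_advance(self):
        """Advance to frame + 1 only when every player's packet has arrived."""
        if set(self.received) != set(self.players):
            return None  # stall: stop-and-wait, no extrapolation allowed
        commands = dict(self.received)
        self.received.clear()
        self.frame += 1
        return commands


session = LockstepSession(players=frozenset({"P", "Q"}))
session.deliver("P", 0, "move")
stalled = session.try_advance()      # Q's packet is still missing
session.deliver("Q", 0, "jump")
advanced = session.try_advance()     # both packets are in, frame becomes 1
```

In this sketch the timeframe counter cannot move until the slowest packet arrives, which is the behaviour that ties the game speed to the slowest connection.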
262 Y.S. Fung and J.C.S. Lui

However, speed-hack can still be effective when lockstep mode is not activated. Moreover, since these protocols do not allow packets to be dropped, any lost packet must be retransmitted until it is delivered and acknowledged. Therefore, the minimum timeframe of the game cannot be shorter than the maximum latency of the player with the slowest connection, and all clients must run the game at a speed that even the slowest client can support. Furthermore, any sudden increase in latency causes jitter for all players.

Our protocol does not impose any lockstep requirement on game clients, while the advantage of loose synchronization in the conventional dead-reckoning protocol is completely preserved. Thus, smooth gameplay can be ensured. As we have proved in Section "Proof of Invulnerability", a cheater can only cheat by generating malicious timestamps, and this can be detected easily and immediately. Therefore, the speed-hack invulnerability of our protocol is enforced throughout the whole game session, so that any act of cheating can be detected immediately. Moreover, the AS protocol requires a game client to enter lockstep mode when interaction occurs, which requires a major modification of the client code. In contrast, existing games can be modified easily to adopt our proposed protocol. One can simply add a plugin routine to convert a dead-reckoning vector to the synchronization parameters before sending out the update packets, and add another plugin routine to convert the synchronization parameters back to a dead-reckoning vector on receiving the packets.

The NEO protocol [13] is based on [2]; its authors describe five forms of cheating and claim that NEO prevents all of them. In [17], the authors show that, of the five forms of cheating that [13] was designed to prevent, it prevents only three.
They propose the Secure Event Agreement (SEA) protocol, which prevents all five forms of cheating and whose performance is at worst equal to that of NEO and in some cases better. In [19], the authors show that both NEO and SEA suffer from the undo cheat. Let P_H denote an honest player and P_C denote a cheater, and let M_H, K_H and M_C, K_C represent the message and its key from P_H and P_C respectively. The cheater P_C performs the undo cheat as follows: both players send their encrypted game moves (M_H and M_C) normally in the commit phase. Then, P_H sends key K_H in the reveal phase. However, P_C delays K_C until K_H is received and M_H is revealed. If P_C finds that M_C is poor against M_H, P_C purposely drops K_C, thereby undoing the move M_C.

The authors then propose another anti-cheat scheme for P2P games called RACS, which relies on the existence of a trusted referee. The referee is responsible for (T1) receiving player updates, (T2) simulating game play, (T3) validating and resolving conflicts in the simulation, (T4) disseminating updates to clients and (T5) storing the current game state. The referee used in RACS works much like a traditional game server in a conventional client-server architecture. The security of RACS depends completely on the referee. For example, speed-hack can be prevented by having the referee validate every state update. Although RACS is more scalable than the client-server architecture, it suffers from the same problem: the involvement of a trusted third party is required.
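The commit and reveal phases, and the way the undo cheat exploits them, can be sketched as follows. This is an illustration under stated assumptions, not the NEO or SEA wire format: a SHA-256 hash commitment stands in for the encrypted move, and the function names `commit`, `verify` and `resolve_round` are hypothetical.

```python
import hashlib
import secrets


def commit(move: bytes):
    """Return (commitment, key); the commitment hides `move` until the key is revealed."""
    key = secrets.token_bytes(16)
    return hashlib.sha256(key + move).digest(), key


def verify(commitment: bytes, key: bytes, move: bytes) -> bool:
    """Check that a revealed (key, move) pair matches the earlier commitment."""
    return hashlib.sha256(key + move).digest() == commitment


def resolve_round(reveals: dict, commitments: dict) -> dict:
    """Return verified moves; a player who withholds the reveal contributes no move."""
    moves = {}
    for player, commitment in commitments.items():
        revealed = reveals.get(player)
        if revealed is not None and verify(commitment, *revealed):
            moves[player] = revealed[1]
    return moves


# Commit phase: both players publish commitments.
c_h, k_h = commit(b"attack")
c_c, k_c = commit(b"retreat")
commitments = {"PH": c_h, "PC": c_c}

# Honest round: both keys are revealed.
honest = resolve_round({"PH": (k_h, b"attack"), "PC": (k_c, b"retreat")}, commitments)

# Undo cheat: the cheater sees the honest move first, then drops its own key.
undone = resolve_round({"PH": (k_h, b"attack")}, commitments)
```

The commitment prevents the cheater from changing its move after the fact, but nothing in the two phases forces the cheater to reveal, so the move simply disappears from the round. This is the gap that a referee, as in RACS, is meant to close.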
Conclusion

In this paper, we presented a synchronization protocol for multi-player online games that support dead-reckoning. At the same time, the protocol is invulnerable to a very common type of cheat called speed-hack. The general idea is that the server or peer players can use the legal speed of an avatar to compute its position from a set of update parameters. This eliminates the need to state the avatar's position directly in the update packets. Even if the cheater is able to modify the data in the update packets, the cheater cannot trick other players into rendering a faster-moving avatar, because the displacement an avatar can travel is now bounded by the legal speed of the player, which is authorized by the server (in a client-server architecture) or agreed among all peers (in a P2P architecture). We have used various examples to illustrate our protocol and proved the security property of our proposal. We have carried out simulations to demonstrate the feasibility of our protocol.

References

1. Banavar H, Aggarwal S, Khandelwal A (2004) Accuracy in dead-reckoning based distributed multi-player games. In: Proceedings of NetGames 2004, Portland, August 2004, pp 161–165
2. Baughman NE, Levine BN (2001) Cheat-proof playout for centralized and distributed online games. In: Proceedings of IEEE INFOCOM. IEEE, Piscataway, pp 104–113
3. Counter Hack (2007) Types of Hacks. http://wiki.counter-hack.net/CategoryGeneralInfo
4. DeLap M et al (2004) Is runtime verification applicable to cheat detection. In: Proceedings of NetGames 2004, Portland, August 2004, pp 134–138
5. Diot C, Gautier L (1999) A distributed architecture for multiplayer interactive applications on the internet. In: IEEE Networks magazine, Jul–Aug 1999
6. Diot C, Gautier L, Kurose J (1999) End-to-end transmission control mechanisms for multiparty interactive applications on the internet. In: Proceedings of IEEE INFOCOM, IEEE, Piscataway
7.
Even Balance (2007) Official PunkBuster website. http://www.evenbalance.com
8. Feng WC, Feng WC, Chang F, Walpole J (2005) A traffic characterization of popular online games. IEEE/ACM Trans Netw 13(3):488–500
9. Gautier L, Diot C (1998) Design and evaluation of mimaze, a multiplayer game on the Internet. In: Proceedings of IEEE Multimedia (ICMCS'98). IEEE, Piscataway
10. Jamin S, Cronin E, Filstrup B (2003) Cheat-proofing dead reckoned multiplayer games (extended abstract). In: Proc. of 2nd international conference on application and development of computer games, Hong Kong, 6–7 January 2003
11. Lee FW, Li L, Lau R (2006) A trajectory-preserving synchronization method for collaborative visualization. IEEE Trans Vis Comput Graph 12:989–996 (special issue on IEEE Visualization'06)
12. Lenker S, Lee H, Kozlowski E, Jamin S (2002) Synchronization and cheat-proofing protocol for real-time multiplayer games. In: International Workshop on Entertainment Computing, Makuhari, May 2002
13. Lo V, GauthierDickey C, Zappala D, Marr J (2004) Low latency and cheatproof event ordering for peer-to-peer games. In: ACM NOSSDAV'04, Kinsale, June 2004
14. Mills DL (1992) Network time protocol (version 3) specification, implementation and analysis. In: RFC-1305, March 1992
15. MPC Forums (2007) Multi-Player Cheats. http://www.mpcforum.com
16. Pantel L, Wolf L (2002) On the impact of delay on real-time multiplayer games. In: ACM
NOSSDAV'02, Miami Beach, May 2002
17. Schachte P, Corman AB, Douglas S, Teague V (2006) A secure event agreement (sea) protocol for peer-to-peer games. In: Proceedings of ARES'06, Vienna, 20–22 April 2006, pp 34–41
18. Simpson ZB (2008) A stream based time synchronization technique for networked computer games. http://www.mine-control.com/zack/timesync/timesync.html
19. Soh S, Webb S, Lau W (2007) Racs: a referee anti-cheat scheme for p2p gaming. In: Proceedings of NOSSDAV'07, Urbana-Champaign, 4–5 June 2007, pp 34–42
20. The Z Project (2007) Official HLGuard website. http://www.thezproject.org
21. Wikipedia (2007) Category: Anti-cheat software. http://en.wikipedia.org/wiki/Category:Anti-cheat software
Chapter 12
Collaborative Movie Annotation

Damon Daylamani Zad and Harry Agius

Introduction

Web 2.0 has enjoyed great success over the past few years by providing users with a rich application experience through the reuse and amalgamation of different Web services. For example, YouTube integrates video streaming and forum technologies with Ajax to support video-based communities. Online communities and social networks such as these lie at the heart of Web 2.0. However, while the use of Web 2.0 to support collaboration is becoming common in areas such as online learning [1], operating systems coding [2], e-government [3], and filtering [4], there has been very little research into the use of Web 2.0 to support multimedia-based collaboration [5], and very little understanding of how users behave when undertaking multimedia content-based activities collaboratively, such as content analysis, semantic content classification, annotation, and so forth. At the same time, spurred on by falling resource costs which have reduced limits on how much content users can upload, online communities and social networking sites have grown rapidly in popularity. With this growth has come an increase in the production and sharing of multimedia content between members of the community, particularly users' self-created content, such as song recordings, home movies, and photos. This makes it even more imperative to understand user behaviour.

In this paper, we focus on metadata for self-created movies like those found on YouTube and Google Video, the duration of which is increasing in line with falling upload restrictions. While simple tags may have been sufficient for most purposes for traditionally very short video footage that contains a relatively small amount of semantic content, this is not the case for movies of longer duration, which embody more intricate semantics.
Creating metadata is a time-consuming process that takes a great deal of individual effort; however, this effort can be greatly reduced by harnessing the power of Web 2.0 communities to create, update and maintain it.

D.D. Zad and H. Agius
School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge, Middlesex, UK
e-mail: damon.zad@brunel.ac.uk; harryagius@acm.org

B. Furht (ed.), Handbook of Multimedia for Digital Entertainment and Arts, DOI 10.1007/978-0-387-89024-1_12, © Springer Science+Business Media, LLC 2009
Consequently, we consider the annotation of movies within Web 2.0 environments, such that users create and share that metadata collaboratively, and propose an architecture for collaborative movie annotation. This architecture arises from the results of an empirical experiment in which users created movie metadata with two metadata creation tools, YouTube and an MPEG-7 modelling tool. The next section discusses related work in the areas of collaborative retrieval and tagging. Then, we describe the experiments that were undertaken on a sample of 50 users. Next, the results are presented, which provide some insight into how users interact with existing tools and systems for annotating movies. Based on these results, the paper then develops an architecture for collaborative movie annotation.

Collaborative Retrieval and Tagging

We now consider research in collaborative retrieval and tagging within three areas: research that centres on a community-based approach to data retrieval or data ranking, collaborative tagging of non-video files, and collaborative tagging of videos. The research in each of these areas tries to simplify and reduce the size of a vast problem by using collaboration among members of a community. This idea lies at the heart of the architecture presented in this paper.

Collaborative Retrieval

Retrieval is a core focus of contemporary systems, particularly Web-based multimedia systems. To improve retrieval results, a body of research has focused on adopting the collaborative approach of social networks. One area in which collaboration has proven beneficial is that of reputation-based retrieval, where retrieval results are weighted according to the reputation of the sources. This approach is employed by Chen et al. [4], who propose adaptive community-based multimedia retrieval using an agent reputation model that is based on social network analysis methods.
Sub-group analysis is conducted for better support of collaborative ranking and community-based search. In social network analysis, relational data is represented using 'sociograms' (directed and weighted graphs), where each participant is represented as a node and each relation is represented as an edge. The value of a node represents an importance factor that forms the corresponding participant's reputation. Peers who have higher reputations should affect other peers' reputations to a greater extent. The quality of data retrieval of each peer database can differ significantly, as can the quality of the data stored in it. Therefore, the returned results are weighted according to the reputations of the sources. Communities of peers are created through clustering.

Koru [6] is a search engine that exploits Web 2.0 collaboration in order to provide knowledge bases automatically, by replacing professional experts with thousands or
even millions of amateur contributors. One example is Wikipedia, which can be directly exploited to provide manually-defined yet inexpensive knowledge bases, specifically tailored to expose the topics, terminology and semantics of individual document collections. Koru is evaluated according to how well it assists real users in performing realistic and practical information retrieval tasks.

Collaboration in filtering is common. For example, Chen et al. [7] provide a framework for collaborative filtering that circumvents the problems of traditional memory-based and model-based approaches by applying orthogonal nonnegative matrix tri-factorization (ONMTF). Their algorithm first applies ONMTF to simultaneously cluster the rows and columns of the user-item matrix, and then adopts the user-based and item-based clustering approaches respectively to attain individual predictions for an unknown test rating. Finally, these ratings are fused with a linear combination. Simultaneously clustering users and items improves on the scalability problem of such systems, while fusing user-based and item-based approaches can improve performance further. As another example, Yang and Li [8] propose a collaborative filtering approach based on heuristically formulated inferences. This is based on the fact that any two users may have some common interest genres as well as different ones. Their approach introduces a more reasonable similarity measure, considers users' preferences and rating patterns, and promotes rational individual prediction, thus more comprehensively measuring the relevance between user and item. Their results demonstrate that the proposed approach improves the prediction quality significantly over several other popular methods.

Collaborative Tagging of Non-Video Media

Collaborative tagging has been used to create metadata and semantics for different media.
In this section, we review some examples of research concerning collaborative tagging of non-video media. SweetWiki [9] revisits the design rationale of wikis, taking into account the wealth of new Web standards available, such as for the wiki page format (XHTML), for the macros included in pages (JSPX/XML tags), for the semantic annotations (RDFa, RDF), and for the ontologies it manipulates (OWL Lite). SweetWiki improves access to information with faceted navigation, enhanced search tools and awareness capabilities, and identification of acquaintance networks. It also provides a single WYSIWYG editor for both metadata and content editing, with assisted annotation tools (auto-completion and checkers for embedded queries or annotations). SweetWiki allows metadata to be extracted and exploited externally.

There is a growing body of research regarding the collaborative tagging of photos. An important impetus for this is the popularity of photo sharing sites such as Flickr. Flickr groups are increasingly used to facilitate the explicit definition of communities sharing common interests, which translates into large amounts of content (e.g. pictures and associated tags) about specific subjects [10]. The users of Flickr have created a vast amount of metadata on pictures and photos. This large number
of images has been carefully annotated because the images were accessible to all users; the collaboration of these users has produced an amount of metadata that would be inconceivable without such collaboration. ZoneTag [11] is a prototype mobile application that uploads camera phone photos to Flickr and assists users with context-based tag suggestions derived from multiple sources. A key source of suggestions is the collaborative tagging activity on Flickr, based on the user's own tagging history and the tags associated with the location of the user. Combining these two sources, a prioritized suggested tag list is generated. The authors use several heuristics that take into account the tags' social and temporal context, and other measures that weight the tag frequency to create a final score. These heuristics draw on spatial, social and temporal characteristics: all tags used in a certain location are gathered regardless of the exact location; tags that users themselves applied in a given context are more likely to apply to their current photo than tags used by others; and tags are more likely to apply to a photo if they have been used recently. CONFOTO [12] is a browsing and annotation service for conference photos which exploits sharing and collaborative tagging through RDF (Resource Description Framework) to gain advantages like unrestricted aggregation and ontology re-use. Finally, Bentley et al. [13] performed two separate experiments: one asking users to socially share and tag their personal photos and one asking users to share and tag their purchased music. They discovered multiple similarities between the two in terms of how users interacted and annotated the media, which have implications for the design of future music and photo applications.

Collaborative Tagging of Video Media

We now review some examples of research concerning collaborative tagging of video media. Yamamoto et al.
[14] present an approach for video scene annotation based on social activities associated with the content of video clips on the Web. This approach has been demonstrated by helping users of online forums to associate video scenes with user comments and by helping users of Weblog communications to generate entries that quote video scenes. The system automatically extracts deep-content-related information about video contents as annotations, allowing users to view any video, submit and view comments about any scene, and edit a Weblog entry to quote scenes using an ordinary Web browser. These user comments and the links between comments and video scenes are stored in annotation databases. An annotation analysis block produces tags from the accumulated annotations, while an application block has a tag-based, scene-retrieval system.

IBM's Efficient Video Annotation (EVA) system [15] is a server-based tool for semantic concept annotation of large video and image collections, optimised for collaborative annotation. It includes features such as workload sharing and support in conducting inter-annotator analysis. Aggregate-level user data may be collected
during annotation, such as time spent on each page, number and size of thumbnails, and statistics about the usage of keyboard and mouse. EVA returns visual feedback on the annotation. Annotation progress is displayed for the given concept during annotation and overall progress is displayed on the start page.

Ulges et al. [16] present a system that automatically tags videos by detecting high-level semantic concepts, such as objects or actions. They use videos from online portals like YouTube as a source of training data, while tags provided by users during upload serve as ground truth annotations.

Elliot and Ozsoyoglu [17] present a system that shows how semantic metadata about social networks and family relationships can be used to improve semantic annotation suggestions. This includes up to 82% recall for people annotations as well as improvements of 20-26% in tag annotation recall when no annotation history is available. In addition, utilising relationships among people while searching can provide at least 28% higher recall and 55% higher precision than keyword search while still being up to 12 times faster. Their approach to speeding up the annotation process is to build a real-time suggestion system that uses the available multimedia object metadata such as captions, time, an incomplete set of related concepts, and additional semantic knowledge such as people and their relationships.

Finally, Li and Lu [18] suggest that there are five major methods for collaborative tagging and that all systems and applications fit into one of these five categories:

Ontology approaches: FolksAnnotation, a system that extracts tags from del.icio.us and maps them to various ontology concepts, has helped to demonstrate that semantics can be derived from tags. However, before any ontological mapping can occur, the vocabulary usually must be converted to a consistent format for string comparison.
Statistical and pattern approaches: These approaches allow researchers to control and manipulate inconsistency and ambiguity in collaborative tagging. Statistical and pattern methodologies work well in general Internet indexing and searching, such as Google's PageRank or Amazon's collaborative filtering system.

Social network approaches: These approaches attempt to incorporate social network knowledge into collaborative tagging to improve the understanding of tag behaviours.

Visualization approaches: Some researchers have incorporated the help of visualization, such as showing a navigation map or displaying the social network relations of the users.

User consensus formation approaches: These approaches focus on the inconsistency and ambiguity issues associated with collaborative tagging which stem from a lack of user consensus. Prominent applications, such as those offered by Wikipedia that ask users to contribute more extensive information than tags, have placed more focus on this issue. Given the complexity of the content being contributed, collaborative control and consensus formation is vital to the usability of a wiki and is driving extensive research.
Summary

This section considered example research related to collaborative retrieval and tagging. There is a great deal of research focused on retrieval that exploits user collaboration to improve results. Mostly, user activity is utilised rather than information explicitly contributed or annotated; consequently, there tends to be less useful, general purpose metadata produced that could be exploited by other systems. There is also a rising amount of research being carried out on collaborative annotation of non-video media, especially photos, spurred on by websites such as Flickr and del.icio.us. Such sites provide the means for users to collaborate within a community to produce extensive and comprehensive annotations. However, the static nature of the media makes it less complicated and time-consuming to annotate than video, where there is a much greater number of semantic elements to consider, which can be intricately interconnected due to temporality. There is far less understanding of how users behave collaboratively when annotating video; consequently, a body of research is starting to emerge here, some examples of which were reviewed above, where user comments in blogs and other Web resources, tags in YouTube, sample data sets, and power user annotations have been the source for annotating the videos. Since the majority of systems rely on automatic annotation or manual annotation from power users, the power of collaboration from more typical 'everyday' users, who are far greater in number, to tackle this enormous amount of data is underexplored. As a result, we undertook an experiment with a number of everyday users in order to ascertain their typical behaviour and preferences when annotating video, in particular when annotating user-created movies (e.g. those found on sites like YouTube). The experiment design and results are described in the following sections.
Experiment Design

In order to better understand how users collaborate when annotating movies, we undertook an experiment with 50 users. This experiment is now described and the results are presented in the subsequent section.

Users were asked to undertake a series of tasks using two existing video metadata tools and their interactions were tracked. The users were chosen from a diverse population in order to produce results from typical users, similar to the ZoneTag [11] average user approach. The users were unsupervised, but communicated with other users via an instant messaging application, e.g. Windows Live Messenger, so that transcripts of all conversations could be recorded for later analysis. These transcripts contain important information about the behaviour of users in a collaborative community and contain metadata information if they are considered as comments on the videos. This is similar to the approach of Yamamoto et al. [14], who tried to utilise user comments and blog entries as sources for annotations. Users were also interviewed after they completed all tasks.
Video Metadata Tools and Content

The two video metadata tools used during the experiment were:

YouTube: This tool provides a community for sharing video content on the Web. YouTube enables users to upload their videos, set age ratings for the videos, enter a description of the video, and also enter keywords.

COSMOSIS: This system provides the means for more advanced content-based annotation with MPEG-7. With this system, users can model video content and define the semantics of their content such as objects, events, temporal relations and spatial relations [19, 20].

The video content used in the experiment was categorised according to the most popular types of self-created movies found on sites such as YouTube and Google Video. The categories were as follows:

Personal content: This type of content is personal to users, e.g. videos of family, friends and work colleagues. Content is typically based around the people, occasion or location.

Business content: This type of content has been created and is used for commercial purposes. It mainly includes videos created for advertising and promotion, such as video virals.

Academic content: This type of content serves academic purposes, e.g. teaching and learning or research.

Recreational content: This type of content has been created and is used for purposes other than personal, business or academic, such as faith, hobbies, amusement or filling free time.

In addition, the video content exhibits certain content features. We consider the key content features in this experiment as follows:

Objects: People, animals, inanimate objects, and properties of these objects.

Events: Visual or aural occurrences within the video, e.g. a car chase, a fight, an explosion, a gunshot, a type of music. Aural occurrences include music, noises and conversations.
Relationships: Temporal, spatial, causer (causes another event or object to occur), user (uses another object or event), part (is part of another object or event), specialises (a sub-classification of an object or event), and location (occurs or is present in a certain location).

The video content used in the experiment was chosen for its ability to richly exhibit one or more of these features within one or more of the above content categories. Each segment of video contained one or more of these features but was rich in a particular feature, e.g. one video might be people-rich while another is noise-rich. In this way, all the features are present throughout the entire experiment, and participants' responses and modelling preferences when presented with audiovisual content that includes these features can be discovered.
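The three kinds of content feature listed above (objects, events and the relationships between them) suggest a simple data model, sketched below. This is an assumption-laden illustration only: it is neither the COSMOSIS data model nor MPEG-7 syntax, and the names `Entity`, `RelationType`, `Relationship` and `SegmentAnnotation` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class RelationType(Enum):
    """The seven relationship kinds named in the text."""
    TEMPORAL = "temporal"
    SPATIAL = "spatial"
    CAUSER = "causer"
    USER = "user"
    PART = "part"
    SPECIALISES = "specialises"
    LOCATION = "location"


@dataclass(frozen=True)
class Entity:
    """An object or an event observed in a video segment."""
    name: str
    kind: str  # "object" or "event"
    properties: tuple = ()


@dataclass(frozen=True)
class Relationship:
    source: Entity
    relation: RelationType
    target: Entity


@dataclass
class SegmentAnnotation:
    """Hypothetical container for the annotations of one video segment."""
    entities: list = field(default_factory=list)
    relationships: list = field(default_factory=list)

    def relate(self, source, relation, target):
        # Register both endpoints once, then record the relationship.
        for entity in (source, target):
            if entity not in self.entities:
                self.entities.append(entity)
        self.relationships.append(Relationship(source, relation, target))

    def feature_counts(self):
        """Count objects, events and relationships in the segment."""
        counts = {"object": 0, "event": 0}
        for entity in self.entities:
            counts[entity.kind] += 1
        counts["relationship"] = len(self.relationships)
        return counts


car = Entity("car", "object", ("red",))
chase = Entity("car chase", "event")
explosion = Entity("explosion", "event")

segment = SegmentAnnotation()
segment.relate(chase, RelationType.USER, car)          # the chase uses the car
segment.relate(chase, RelationType.CAUSER, explosion)  # the chase causes the explosion
```

Counting features per segment in this way is one possible route to deciding whether a segment is, say, event-rich or object-rich, although the text does not state how richness was judged in the experiment.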
User Groups and Tasks

Users were given a series of tasks, requiring them to tag and model the content of the video using the tools above. Users were assigned to groups (12-13 per group), one for each of the four different content categories above, but were not informed of this. Within these category groups, users worked together in smaller experiment groups of 3-6 users to ease the logistics of all users in the group collaborating at the same time. Members of the same group were instructed to communicate with other group members while they were undertaking the tasks, using an instant messaging application, e.g. Windows Live Messenger. The collaborative communication transcripts were returned for analysis using grounded theory [21]. Consequently, group membership took into account users' common interests and backgrounds, since this was likely to increase the richness and frequency of the communication. The importance of user communication during the experiment was stressed to users.

The four user category groups were given slightly different goals as a result of differences between the categories. The personal category group (Group 1) was asked to use their own videos, the business category group (Group 2) was provided with business-oriented videos, the academic category group (Group 3) was provided with videos of an academic nature, and the recreational category group (Group 4) was provided with a set of recreational videos. The videos for each category group differed in which features they were rich in, with other features also exhibited. Table 1 summarises the relationships between the content categories, user category groups and content rich features.

Each user was required to tag and model the content of 3-5 mins worth of videos in YouTube and COSMOSIS.
This could be one 5 min long video or a number of videos that together totalled 5 mins. This ensured that users need not take more than about 15 mins to complete the tasks, since more time than this would greatly discourage them from participating, either initially or in completing all tasks. At the same time, the video duration is sufficient to accommodate meaningful semantics. Users did not have to complete all the tasks in one session and were given a two week period to do so. YouTube tags, COSMOSIS metadata and collaborative communication transcripts were collected post experiment.

Table 1 Mapping of content categories to user category groups to content features
Content Category: Personal, Business, Academic, Recreation
User Category Group: 1, 2, 3, 4
Content Features:
People: X X X X
Animals: X X X X
Inanimate Objects: X X X X
Properties: X X X X
Events: X X X X
Music: X X
Noise: X X X
Conversation: X X
Temporal Relations: X
Spatial Relations: X X
Causer Relations: X X
User Relations: X X
Part Relations: X X X
Specialises Relations: X X
Location Relations: X X

After the users had undertaken the required tasks, a short, semi-structured interview was performed with each user. The focus of the interviews was on the users’ experiences with, and opinions regarding, the tools.

Experiment Results

This section presents the results from the experiment described in the above section. The experiment produced three types of data from four different sources: the metadata from tagging videos in YouTube, the MPEG-7 metadata created by COSMOSIS, the collaborative communication transcripts, and the interview transcripts. The vast amount of textual data generated by these sources called for the use of a suitable qualitative research method to enable a thorough but manageable analysis of all the data to be performed.

Research Method: Grounded Theory

A grounded theory is defined as theory which has been “systematically obtained through social research and is grounded in data” [22]. Grounded theory methodology is comprised of systematic techniques for the collection and analysis of data, exploring ideas and concepts that emerge through analytical writing [23]. Grounded theorists develop concepts directly from data through its simultaneous collection and analysis [24]. The process of using this method starts with open coding which includes theoretical comparison and constant comparison of the data, up to the point where conceptual saturation is reached.
This provides the concepts, otherwise known as codes, that will build the means to tag the data in order to properly memo it and thus provide meaningful data (dimensions, properties, relationships) to form a theory. Conceptual saturation is reached when no more codes can be assigned to the data and all the data can be categorised under one of the codes already available, with no room for more codes. In our approach, we include an additional visualisation stage after memoing in order to assist with the analysis and deduction of the grounded theory. Figure 1 illustrates the steps taken in our data analysis approach.

Fig. 1 Grounded theory as applied to the collected data in this experiment

As can be seen in the figure, the MPEG-7 metadata and the metadata gathered from YouTube tagging, along with the collaborative communication transcripts and interviews, form the basis of the open coding process. The memoing process is then performed on a number of levels. The process commences on the individual level where all the data from the individual users is processed independently. Then the data from users within the same experiment group are memoed. Following this, the data for entire user category groups is considered (personal, academic, business and recreational) so that the data from all the users who were assigned to the same category are memoed together to allow further groupings to emerge. Finally, all the collected data is considered as a whole. All of the dimensions, properties and relationships that emerge from these four memoing stages are then combined together and visualised. Finally, the visualised data is analysed to provide a grounded theory concerning movie content metadata creation and system feature requirements.
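The four memoing levels described above can be pictured as successive regroupings of the same coded records. The following is only a minimal sketch, not the tooling used in the experiment: the `CodedItem` record, its field names and the sample values are all hypothetical, and memoing is reduced here to collecting the set of codes observed at each level.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedItem:
    """One coded piece of data (hypothetical structure, for illustration only)."""
    user: str              # individual participant
    experiment_group: str  # small group of 3-6 collaborating users
    category_group: str    # personal, business, academic or recreational
    code: str              # concept assigned during open coding


def codes_by_level(items):
    """Collect the codes seen at each of the four memoing levels."""
    levels = {
        "individual": defaultdict(set),
        "experiment_group": defaultdict(set),
        "category_group": defaultdict(set),
    }
    total = set()
    for item in items:
        levels["individual"][item.user].add(item.code)
        levels["experiment_group"][item.experiment_group].add(item.code)
        levels["category_group"][item.category_group].add(item.code)
        total.add(item.code)
    result = {name: dict(groups) for name, groups in levels.items()}
    result["total"] = total  # all collected data considered as a whole
    return result


# Invented sample records, not data from the experiment.
items = [
    CodedItem("u1", "g1", "personal", "people"),
    CodedItem("u1", "g1", "personal", "time"),
    CodedItem("u2", "g1", "personal", "events"),
    CodedItem("u3", "g2", "business", "people"),
]
summary = codes_by_level(items)
print(sorted(summary["experiment_group"]["g1"]))  # ['events', 'people', 'time']
```

Keeping every record tagged with all three group identifiers means each level is a simple regrouping, so codes that only become visible at a coarser level can be compared against the finer ones.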
The most important results are presented in the following two sub-sections and are then used to form the basis of an architecture for a collaborative movie annotation system.

Movie Content Metadata Creation

This section presents the key metadata results from the grounded theory approach. We first consider the most commonly used tags; then we discuss the relationships between the tags.

Most Commonly Used Tags

According to Li and Lu [18], recognising the most common tags used by different users when modelling a video can assist with combining the ontology approach with the social networking approach (described earlier) when designing a collaborative annotation system. Our results indicate that there were only minor differences in the use of tags for movies in different content categories and that, overall, the popularity of tags remains fairly consistent irrespective of these categories. Figures 2 to 5 visualise the tags used in YouTube in the different categories and show all of the popular tags. The four most commonly used tags in YouTube concerned:

1. inanimate objects
2. events
3. people
4. locations

Fig. 2 Overall use of tags in YouTube for movies in the personal category
Fig. 3 Overall use of tags in YouTube for movies in the business category

Fig. 4 Overall use of tags in YouTube for movies in the academic category

Figure 6 to Figure 9 illustrate the tags used in COSMOSIS within each category and their popularity. In this case, the four most commonly used tags overall concerned:

1. time
2. events
3. inanimate objects
4. people
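Rankings such as the two above come down to counting how often each tag category occurs and sorting by frequency. The sketch below illustrates this under stated assumptions: the function name `rank_tag_categories` is hypothetical and the sample counts are invented, not taken from the experiment.

```python
from collections import Counter


def rank_tag_categories(tag_categories, top_n=4):
    """Return the top_n most frequent tag categories with their counts."""
    return Counter(tag_categories).most_common(top_n)


# Invented sample: the category of each tag applied by a group of users.
sample = (
    ["inanimate object"] * 5
    + ["event"] * 4
    + ["people"] * 3
    + ["location"] * 2
    + ["music"]
)

for rank, (category, count) in enumerate(rank_tag_categories(sample), start=1):
    print(rank, category, count)
```

Running the same function separately on the tags of each content category would show whether the ranking is stable across categories.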
Fig. 5 Overall use of tags in YouTube for movies in the recreational category

Fig. 6 Overall use of tags in COSMOSIS for movies in the personal category

The peak use of time in COSMOSIS is explained by the fact that it allows tags to be associated with time points (start points and/or end points), which is not possible in YouTube, and asks users if they wish to add time points after each tag is added. As a consequence, users added time points to most tags. This suggests that users will add time points for tags if the means to do so is easily provided.

Consequently, a collaborative movie annotating system should fully support these commonly used tags and prioritise their accessibility.
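A tag with optional start and end time points, as discussed above, can be modelled with a small record type. This is a sketch only: `TimedTag` and its fields are hypothetical and do not reflect how COSMOSIS or MPEG-7 represent time points, and treating an untimed tag as applying to the whole video is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimedTag:
    label: str
    category: str
    start_s: Optional[float] = None  # start point in seconds, if given
    end_s: Optional[float] = None    # end point in seconds, if given

    def __post_init__(self):
        # Reject intervals whose end point precedes the start point.
        if (self.start_s is not None and self.end_s is not None
                and self.end_s < self.start_s):
            raise ValueError("end point must not precede start point")

    def has_time_point(self) -> bool:
        return self.start_s is not None or self.end_s is not None

    def active_at(self, t: float) -> bool:
        """True if the tag applies at time t; a missing bound is open-ended."""
        after_start = self.start_s is None or t >= self.start_s
        before_end = self.end_s is None or t <= self.end_s
        return after_start and before_end


# Invented example tags.
tags = [
    TimedTag("dog", "animal", start_s=12.0, end_s=30.0),
    TimedTag("beach", "location"),  # no time points: applies throughout
]
print([tag.label for tag in tags if tag.active_at(20.0)])  # ['dog', 'beach']
print([tag.label for tag in tags if tag.active_at(40.0)])  # ['beach']
```

Making both time points optional mirrors the "start points and/or end points" wording: a user can add a tag first and attach time points afterwards without changing the tag itself.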
Fig. 7 Overall use of tags in COSMOSIS for movies in the business category

Fig. 8 Overall use of tags in COSMOSIS for movies in the academic category

Fig. 9 Overall use of tags in COSMOSIS for movies in the recreational category

Relationships between Tags

Another set of key results from the experiment concerned relationships between the tags. This shows which tags are used with each other more often; that is, if an object is tagged in a scene, which other tags tend to be used in conjunction with it. The bar diagrams in Figure 10 and Figure 11 show the relationships between tags for YouTube and COSMOSIS respectively (tags that were not used at all have been removed to improve readability). One immediate observation is that, as users are not able to provide time points in YouTube, the relationship between time and other tags is very weak there, while for COSMOSIS it is extremely strong, for the reasons stated above. Overall, the most common relationships between tags discovered from the experiment data were:

Inanimate Object – Time
Inanimate Object – Property
Inanimate Object – People
Event – Time
Event – Property
Event – Inanimate Object
People – Time
People – Property
People – Event

The strong relationships between time and other tags suggest that a collaborative movie annotating system should allow and encourage users to add time points for their tags and make the process as simple as possible. Similarly, users tend to add properties for inanimate objects, events and people quite frequently; therefore it is imperative that this process be supported in an accessible fashion.
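Relationships of the kind listed above can be derived by counting how often two tag categories are applied to the same scene. The sketch below is one possible way to do this, not the analysis procedure used in the experiment: representing a scene as a list of tag categories is an assumption, and the sample scenes are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(scenes):
    """Count how often two distinct tag categories appear in the same scene."""
    counts = Counter()
    for categories in scenes:
        # Sort so that each unordered pair always maps to the same key.
        for pair in combinations(sorted(set(categories)), 2):
            counts[pair] += 1
    return counts


# Invented sample: the tag categories applied to each of three scenes.
scenes = [
    ["people", "time", "event"],
    ["inanimate object", "time", "property"],
    ["people", "time"],
]
counts = cooccurrence_counts(scenes)
print(counts[("people", "time")])  # 2
print(counts.most_common(1))       # [(('people', 'time'), 2)]
```

Pairs with the highest counts correspond to the strongest relationships, which is the information a bar diagram of tag relationships would display.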
Fig. 10 Overall relationships between tags used in YouTube

Fig. 11 Overall relationships between tags used in COSMOSIS