
Master thesis Computer Science: Object movement modeling

File type: PDF | Pages: 72


This thesis develops a flexible customer behavior analysis system, including essential head pose estimation or F-formation modules. This system will be evaluated in an actual retail store. Further, after studying the system, realizing the mentioned problems of the head pose problem, we also propose a process to collect the head pose dataset and multi-task deep neural network model, fusing face detection and head pose estimation to yield face position and head pose at the same time.


Text content: Master thesis Computer Science: Object movement modeling

  1. VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF ENGINEERING AND TECHNOLOGY NGUYEN DINH TUAN OBJECT MOVEMENT MODELING Master thesis Major: Computer Science Ha Noi - 2021
  2. VIETNAM NATIONAL UNIVERSITY, HANOI UNIVERSITY OF ENGINEERING AND TECHNOLOGY NGUYEN DINH TUAN OBJECT MOVEMENT MODELING Major: Computer Science Supervisor: Dr. Tran Quoc Long HA NOI - 2021
  3. ABSTRACT

Artificial intelligence has advanced in recent years, enabling numerous applications that perceive the physical world and assist humans with a variety of activities. Among the many video understanding problems, object movement analysis is one of the more challenging branches. To understand object movement, we must rely on the context of the activity to indicate the object's state, such as shopping, going to the hospital, participating in sports, or crowd behavior. Despite the number of studies, most focus on specific behaviors, and there is currently no comprehensive model for this topic.

This thesis establishes a framework for modeling object movement in the setting of customers in a retail store. To be more precise, we begin by modeling the individual and group behavior of the store's customers. Second, we design and implement this model using a distributed approach, which enables efficient deployment of the system. Finally, we installed and assessed this system in a physical store in Vietnam.

After conducting experiments with the system, we discovered that head pose estimation is a significant module in the customer behavior analysis problem. However, models that perform well on benchmark datasets do not always perform well when deployed. This happens because the data collection systems behind current benchmark datasets are complex, so those datasets are generated only in laboratory settings. As a result, this thesis also provides a novel technique for acquiring a dataset with a collection system that is easy to set up, in order to obtain more diverse data. Additionally, we propose a multi-task model that learns face detection and head pose estimation simultaneously, which reduces latency compared with the traditional approach to head pose estimation, in which face detection and head pose estimation run independently.
  4. ACKNOWLEDGMENTS

There are many people I must thank for contributing to my two wonderful years as a Master's student.

First, I want to express my gratitude to my adviser, Dr. Tran Quoc Long, for his continuous support of my study and research, and for his patience, motivation, enthusiasm, and immense knowledge. His advice was invaluable to me during my undergraduate and master's years. He has provided me with numerous opportunities to participate in a variety of projects, both production and research, through which I have learned many fundamental and significant lessons. When I pitched an idea to him, he extracted its essence, uncovered the more fundamental story, and placed it in a larger context. He gives me advice and helps me see things more clearly when I'm in trouble. I am very grateful to Dr. Long not only for educating me about computer vision and the art of communication, but also for teaching me how to think.

I've been very fortunate to learn from many other amazing people in MEMS LAB (Micro Electronic Mechanical System Lab), led by Prof. Chu Duc Trinh, who always gave me opportunities, insightful remarks, feedback, and guidance. I would like to thank my close collaborator, MSc. Phan Hoang Anh, whom I've had the distinct pleasure of working with and learning from through robotics projects and thoughtful discussions. He is a wonderful and motivating leader who always knows how to lift his team out of trouble and push the project to its milestones.

I also have to thank Assoc. Prof. Le Thanh Ha, who directs the Human Machine Interaction (HMI) laboratory, where I have participated for more than three years since I was a freshman. Assoc. Prof. Le Thanh Ha, Dr. Nguyen Chi Thanh, Dr. Nguyen Do Van, Assoc. Prof. Nguyen Thi Thuy, and the members of HMI Lab enthusiastically shared their experience and provided feedback on my computer vision projects. I'd like to thank Mr. Nguyen Viet Hung, a former member of HMI Lab, for introducing me to the lab during this time. He was the first person to introduce me to the field of research and always encouraged me to learn new things. I also appreciate the four years he was my roommate; he is like a brother to me.

I have to thank the many people who have contributed to my daily life and made my experience at the University of Engineering and Technology, VNU so pleasant. From the UET AILab: Nguyen Minh Tuan, Nguyen Van Phi, Tran Minh Duc, Hoang Giang, Le Pham Van Linh, Le Duy Son, Hoang Tung, Nguyen Phuc Hai. From MEMS lab: Nguyen Duc Tien, Tran Huu Quoc Dong, Bui Duy Nam, Dam Phuong Nam. I'd like to express my thanks to Nguyen Viet Linh and Nguyen Viet Hoang of the UET AI Lab, who collaborated with me on the work reported in Chapters 2 and 3.

I am grateful to the funding agency, DNP company, for supporting my research, especially Keisuke Hihara, Yumeka Utada, Akihiko Torii, and Naoki Izumi, who discussed the research project with me every week. In addition, I want to thank Gem Vietnam company, particularly Dr. Vu Xuan Tung and Vu Xuan Tuan, who spent six months working with me on the research project and evaluating the system in a real-world setting.

I'm very fortunate to have my two best friends, Nguyen Khac Hung and Nguyen Dinh Huy, who always care about me and are a safe place for me in difficult times and a source of joy in happy times. Nguyen Tien Dat, Nguyen Khanh Linh, and Nguyen Thien Viet are my cousins, who have been supportive of me since I first started as a student.

A big thank-you to my parents, who sacrificed many things for me and my younger brother, and who always care about us and support us spiritually throughout our lives.
  6. Contents

ABSTRACT
ACKNOWLEDGMENTS
1 Introduction
    1.1 Overview
    1.2 Related Work
    1.3 Contributions and Outline
2 Behavior Analysis System: Design and Implementation
    2.1 Introduction
        2.1.1 Related work
        2.1.2 Aim of the work
    2.2 System
        2.2.1 Modelling
        2.2.2 System Design and Implementation
    2.3 Validation and Behavior Visualization
        2.3.1 State validation
        2.3.2 Personal Behavior Analysis Visualization
        2.3.3 Group Behavior Analysis Visualization
    2.4 Concluding remarks
3 UET-Headpose: A sensor-based top-view head pose dataset
    3.1 Introduction
    3.2 Related works
        3.2.1 Head pose estimation
        3.2.2 Head pose dataset
    3.3 Proposed Method
        3.3.1 Structure Design
        3.3.2 Analyze UET-Headpose dataset and baseline model
    3.4 Experiments
        3.4.1 Dataset
        3.4.2 Metric evaluation
        3.4.3 Results
    3.5 Conclusion
4 Simultaneous face detection and 360 degree head pose estimation
    4.1 Introduction
    4.2 Related works
    4.3 Proposed Method
        4.3.1 Propose model
        4.3.2 Rotation matrix
        4.3.3 Multitask loss
    4.4 Experiment
        4.4.1 Training strategy
        4.4.2 Dataset and evaluation protocols
        4.4.3 Results
    4.5 Conclusion
Conclusion
Publications
  8. List of Figures

2.1 Human Behavior Analysis needs of retail store
2.2 Overview of system architecture
2.3 Modeling a person H_i as state machine with the set of states Q = {I, A, L, P}.
2.4 State transition of customer H_2: I → A → P → L
2.5 Layer-based system architecture
2.6 (a) Description for system work follow Message-based process scheme. (b) Detail for each process work in ROS environment.
2.7 Message definition for information of Approach, Pick and Interact node
2.8 Statistic for state A and P for people in store with and without forming group (F-formation) condition
2.9 Statistic duration time of state A and P for people in the store with and without forming group (F-formation) condition
2.10 Statistic number of state A and P for people in store with and without forming group (F-formation) condition
2.11 (a) 2D position of customer state in purchase hour 9 in 5/21/2021. (b) 2D position of customer having id 37, 38, 41, 94 with and without forming group condition, the red dot represents staff position.
2.12 Statistic for F-formation group type in one day.
2.13 Statistic for duration time (second) for F-formation group type for all customers in one day. Note: Due to too many humans, some id_i are not shown.
2.14 Statistic for duration time (second) of Pick, Approach state and forming in F-formation for all customer with condition having approach or pick state
2.15 Statistic duration time (second) for state and forming in Customer-Staff group for customer forming F-formation group.
3.1 The detailed architecture system of new approach.
3.2 The pipeline of pose data and image data synchronization
3.3 Comparison between CMU, 300W-LP and UET-Headpose distribution of the yaw angle
3.4 Evaluation models in the AFLW2000 dataset, loss of angle (MAWE) measured in degree
3.5 Cost comparison with other datasets
4.1 The detailed architecture of Multitask-Net model.
4.2 Loss graphs of Multitask-Net model when training all branch. In this figure, confidence loss graph of face detection branch is on the left and Yaw loss of head pose estimation branch is on the right.
4.3 Evaluate our models trained in CMU dataset
  10. List of Tables

2.1 Log for state transition of customer having id 2
2.2 Method for each attribute recognition module
2.3 Personal state evaluation
2.4 Store-staff classification evaluation, using Mobilenet[55]
2.5 F-formation group recognition evaluation
3.1 Evaluation models in two datasets AFLW2000 and UET-Headpose-Val
4.1 Evaluation models which are trained in 300WLP datasets in the all BIWI dataset
4.2 Compare our different version models which are trained in 70% BIWI dataset, test in the 30% BIWI dataset
  11. CHAPTER 1 Introduction

1.1 Overview

Nowadays, artificial intelligence enables computers to see and understand the enormous visual world around us and endows them with the ability to interact with and analyze humans. Humans find it easy to understand the actions of an object or another human in a video, even for tasks that involve a sequence of actions. For instance, a glance at a video is sufficient for a human to determine what actions are being performed. Similarly, in a retail store, employees can quickly determine a customer's expectations from the customer's actions and interactions with items in the store.

Challenges. Although this ability feels natural and effortless to us, it poses many challenges for a computer. We have sub-modules that detect and classify objects in each image, but we still lack methods to recognize actions in a video and combine them to predict the next state of the object in that video. Moreover, a video can contain various complicated human activities, and one activity may be associated with others, which confuses the machine. In the retail store setting, customers perform a wide variety of actions before making a purchase decision. Because surveillance systems are now widely used, customer behavior analysis is a natural way to capture and analyze customer expectations. In particular, such a system acquires image streams from a wide range of cameras, with each sub-module analyzing one aspect of human action; the system then has enough information to summarize the attitude of customers as well as agents and to analyze the correlation between human actions and revenue. Likewise, head pose estimation plays an essential role in human behavior analysis, but most current methods build algorithms for laboratory environments, so this module also deserves attention in this area.

Encouraging progress. Despite the difficulty of this task, several benchmark datasets for human behavior analysis have recently appeared, such as [28], [19], [12]. However, these benchmarks are limited to laboratory environments, so methods built on them are difficult to apply to practical situations. Some human analysis studies experiment on benchmarks, such as [49], which uses a Bayesian Model Network to formulate the system; [35], [30], [31] also address human behavior analysis. Our related system design is somewhat similar to [49]. Head pose estimation has also been investigated extensively, both in benchmark datasets [79], [26], [17] and in methodologies [77], [54], [14]. Together, these advances have enabled many real-world applications, including human behavior analysis, human-robot interaction, robotic perception, and face recognition.

Remaining challenges. [49] provides a basic design for analyzing human behavior, but it stays at an abstract level: it neither offers a flexible way to establish a design nor evaluates the design in real-life situations. Furthermore, the system's modules are fairly simple and exclude head pose estimation and F-formation, which are among the complex problems in analyzing human behavior. On the other hand, current head pose datasets are biased toward the lab environment, and the pose estimation methods [77], [54], [14] are separate from the face recognition model, so a deployed system must do more processing, which increases latency.

Outline of contributions. This thesis develops a flexible customer behavior analysis system that includes the essential head pose estimation and F-formation modules. The system is evaluated in an actual retail store. Further, having studied the system and identified the head pose problems mentioned above, we also propose a process to collect a head pose dataset and a multi-task deep neural network model that fuses face detection and head pose estimation to yield face position and head pose at the same time.

Long-term motivations.
This thesis sets out two long-term goals. First, we want to provide a basic system for analyzing user behavior in retail stores. In the future, when analyzing human behavior in more complex situations, we can extend this system by quickly adding recognition functions. Moreover, with the system implementation suggested in the thesis, we can conveniently distribute the functions across many edge devices, which improves scalability for large systems. Second, with the proposal to create a head pose dataset by a sensor-based method, collecting data becomes more affordable, and we can collect data in various environments.

Challenges of this approach. This approach also has some challenges. The most obvious difficulty is that evaluating the system is quite expensive. We need to evaluate both each sub-module and the abstract results of the system. The sub-modules range from small ones, such as head pose estimation, human pose estimation, and human tracking, to higher-level ones such as F-formation detection. Meanwhile, the abstract question is: how can we use the results of the system to analyze customer behavior? To address these two problems, we evaluate the accuracy of each sub-module on a human-labeled dataset. For the abstract question, we analyze the relationship between the amount of behavior performed and the number of products and product codes sold during and around that period. These analytics allow us to see which behavior is most relevant to purchasing. Our research can determine what behavior is related to a particular customer's purchase, but due to the limited number of cameras in the system and the store's privacy, the system cannot know which behavior has led to the most purchases.

Putting the model into practice also faces performance-related challenges. Specifically, the system requires one camera; we use a Realsense D435 camera, a Jetson D435 computer, and a PC with a 1080Ti graphics card, so a framework is required to exchange information between these devices. Moreover, in practice, when the number of people in the store is large or the connection between devices is slow, the system needs a mechanism to synchronize the results of many modules to avoid data loss.

1.2 Related Work

Human Behavior Analysis. This thesis is inspired by many works whose goal is to build a system for analyzing human behavior. Previous studies have used statistical frameworks that aim to maximize the probability of actions occurring together [49], [35], [31]; even for each small module in the system, these studies use statistical algorithms or traditional methods. For example, in [35] the authors use traditional image processing techniques to identify human skeletons. Recently, research in this area has focused on identifying more specific behaviors instead of studying systems for deploying these recognition modules. For example, F-formation is one of the most critical modules for analyzing the behavior of a group of people, since it shows the degree of interaction between the individuals in the image.
The study [58] uses a sampling method to find the O-space, which is the communication area of a group of people, based on the position and head pose of each person in the image. The limitation of this method is that the head pose data is modeled with a fixed distribution, namely a normal distribution, so the method may not generalize in practice. Similarly, the study [23] provides a scheme that uses optimization algorithms and a classification model to find subgroups in a crowd. The latter method uses more modern algorithms than the former, but both assume that the coordinates and head pose are known inputs. In other words, these studies have not placed small behavior recognition modules into a larger system where they link to other modules. Most importantly, the problem of analyzing human behavior is very diverse, so putting these modules into practice is necessary to expose the algorithms' weaknesses. In general, each human behavior analysis problem expects different final analysis results for each requirement, but each sub-module within a larger system should cover all possible cases.
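The idea that an O-space can be located from each person's position and head direction can be illustrated with a small geometric sketch. This is not the sampling method of [58] nor the pipeline of [23]; it is a simplified illustration under stated assumptions: positions are 2D ground-plane coordinates in metres, yaw is measured counterclockwise from the x-axis, and the names `o_space_center`, `group_by_o_space`, `stride`, and `radius` are hypothetical. Each person "votes" for a point a fixed distance ahead of their head direction, and people whose votes land close together are placed in the same group.

```python
import math


def o_space_center(x, y, yaw_deg, stride=1.0):
    """Return the point `stride` metres ahead of a person along their head direction."""
    yaw = math.radians(yaw_deg)
    return (x + stride * math.cos(yaw), y + stride * math.sin(yaw))


def group_by_o_space(people, stride=1.0, radius=0.5):
    """Group people whose projected O-space centers lie within `radius` of each other.

    `people` is a list of (x, y, yaw_deg) tuples. Returns a sorted list of groups,
    each group being a list of indices into `people`. Grouping is single-link:
    if A is close to B and B is close to C, all three share a group.
    """
    centers = [o_space_center(x, y, yaw, stride) for x, y, yaw in people]
    parent = list(range(len(people)))  # union-find forest over person indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if math.dist(centers[i], centers[j]) <= radius:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(people)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two people facing each other two metres apart, plus one person looking elsewhere.
example = [(0.0, 0.0, 0.0), (2.0, 0.0, 180.0), (5.0, 5.0, 90.0)]
print(group_by_o_space(example))
```

The sketch makes the dependency discussed above concrete: the grouping is only as good as the position and head pose inputs, which is why errors in those upstream modules propagate directly into group detection.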
  14. Head Pose Estimation. Head pose plays a significant role in problems related to analyzing human behavior. Based on a person's head pose, we can determine the area in which that person is interested or the object with which that person is interacting. Many studies create benchmark datasets, such as [79], [17], which use intermediate data such as depth-camera data or face keypoints. To obtain highly accurate annotations, these works therefore need to restrict people to a specific space or distance from the camera. [26] describes a system of many cameras mounted around a room and combines the data from these cameras to produce annotations, of which head pose is one. Because [26] designs its dataset for social interaction analysis, it collects many annotations other than head pose, so the system is relatively complex and hard to deploy when collecting data in real settings. Meanwhile, many low-cost sensors can measure the change in an object's rotation angle. Applying such a sensor to the head pose problem promises a new, cheaper method that is less constrained than depth cameras.

Modeling approach. Most studies related to this problem describe only part of it. For example, [49], [30], [35] only designed a system capable of analyzing customer behavior in the store, but did not show a relationship between the results of the analysis system and the store's sales figures. Therefore, we do not have a logical view of the system's output. Similarly, [58] introduces a method to detect F-formations, but this method is entirely separate from the system, and it assumes that the head pose and the 3D position of each person are known. Our work proposes a system with complex modules such as tracking, head pose estimation, and human pose estimation, and higher-level modules such as F-formation. Moreover, these modules are also evaluated for accuracy and relevance to the time of purchase.
From there, we gain better insight into the system and the results it recognizes. On the other hand, current methods use statistical methods or traditional image processing in their sub-modules. Recently, deep learning has become a well-suited tool for this problem. For example, our work uses convolutional neural networks (CNNs) for sub-modules: [9] for human pose recognition, [65] for customer tracking, and [67] for customer face recognition. In addition to modules that use pretrained or fine-tuned models, the thesis also uses CNNs for new problems in the system, such as customer and employee classification.

Finally, head pose plays a very important role in analyzing customer behavior, because customers mostly show interest in a product through their head pose. Most of the works mentioned above either do not use head pose or handle it with traditional methods. After running the customer behavior analysis system in the actual store, we found that the accuracy of the head pose module is not high in real conditions. So we decided to investigate how to create a dataset that generalizes better rather than being confined to the laboratory environment. The most important step is to design a system that generates head pose data without depth cameras; our work uses angle sensors instead. On the other hand, face analysis currently needs two phases: face detection and then head pose estimation. In our work, we propose a multi-task model that returns the results of face detection and head pose estimation simultaneously.

1.3 Contributions and Outline

In this thesis, we develop a complete system to analyze customer behavior in the store. Specifically, we propose a design capable of analyzing many low- or high-level human behaviors during shopping, including group behavior. We also introduce an evaluation framework to see how the system's results correlate with purchase metrics by hour of the day. Moreover, we propose a data acquisition procedure and a feasible head pose prediction method for real problems.

In Chapter 2 we model the whole scheme mathematically and describe the system in detail: how it is organized in layers, and how it was improved through the stages of real-life testing. With the system design described in this chapter, we set up the system in a physical store on multiple devices. We also provide assessments of the system's sub-modules and of the relevance of behaviors to purchases. To stabilize the system, we used the work “Employing Extended Kalman Filter with Indoor Positioning System for Robot Localization Application” by Dong and Tuan et al., in which we built a robotic system and the algorithm used to stabilize the robot's position.

In Chapter 3 we propose a new way to generate head pose datasets in general; this method is sensor-based, so it can reduce the limitations of the depth camera. This work has been published as “UET-Headpose: A sensor-based top-view head pose dataset” by Linh and Tuan et al., where Tuan and Linh are equal co-authors.
In Chapter 4 we propose a multi-task model capable of returning the results of face detection and head pose estimation at the same time. This work has been published as “Simultaneous face detection and 360-degree head pose estimation” by Hoang and Tuan et al. Finally, in Chapter 5 we identify the remaining challenges and discuss the path forward.
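The evaluation idea introduced in this chapter, relating how often a behavior occurs in each hour to how many products are sold in that hour, can be made concrete with a correlation coefficient. The thesis text above does not say which statistic is used, so the Pearson correlation below is an assumption chosen for illustration, and the function name `pearson` and the hourly numbers are hypothetical toy values, not measurements from the store.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: number of Pick events and items sold in four consecutive hours.
picks_per_hour = [1, 2, 3, 4]
sales_per_hour = [2, 4, 6, 8]
print(pearson(picks_per_hour, sales_per_hour))
```

A value near 1 would suggest that hours with more of the behavior also see more sales, a value near 0 no linear relationship. With only a handful of hourly bins from a single store, such a coefficient is an indication rather than evidence of causation.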
  16. CHAPTER 2 Behavior Analysis System: Design and Implementation

2.1 Introduction

The demands and requirements of a large number of individuals are visible in their behavior, their interactions with other customers or employees, and their activities related to shop items. Understanding customer behavior inside a store is critical for any company that wants to provide a more personal and compelling shopping experience, enhance store operations, and ultimately improve user experience, sales conversion rates, and revenue. The store's staff are responsible for most of the study of customer behavior and sentiment. However, as the number of customers grows, analysis by employees is no longer flexible or responsive enough. As a result, there is a need to assess customer behavior automatically and with minimal delay, as well as to track customer behavior over time.

2.1.1 Related work

Research on predicting customer behavior in stores is still scarce in the present literature. While a customer is in the store, they make several gestures, and modeling gestures mathematically lets us develop an efficient system that can expand to other gestures in different situations. Furthermore, modeling customer behavior from a mathematical perspective makes it easier to decompose this challenge into more recognizable sub-problems, such as tracking and detection, and to link them in the system. Some works in this field have been introduced, such as [49] and [66]; however, these systems do not generalize to other problems, and very little of them can be reused when analyzing behavior in settings other than the exhibition or hospital. These works also do not structure the modules in a distributed fashion, and they offer few empirical analyses of real-world behavior. Moreover, these systems use traditional machine learning or image processing methods to recognize the related behavior. For example, [49] uses the Mean Shift algorithm for the human tracking module, which is sensitive to complicated backgrounds such as those in a retail store, and [66] uses morphological processing and the HOG algorithm to detect people.

In recent years, as a result of advances in deep learning for computer vision, such as [37], [74], [25], [42], [75], related problems can be handled more efficiently and accurately. Numerous studies on surveillance camera systems have been published, such as tracking [72], [65], and [64], in which algorithms capture the trajectories of people such as store customers. In detection [6], [16], [11], the algorithms take an image as input and create a bounding box around an object such as a human, car, dog, or cat; from this information we can determine the location of the object. Additionally, deep learning is extremely effective at recognizing some critical customer characteristics, such as head pose [67], [14], [54]. These works estimate the three angles roll, pitch, and yaw from a face image cropped by face detection in the preceding phase. These data help customer behavior systems determine the customer's attention zone in the store. In human pose estimation [9], [63], [59], these works employ a complex neural network to determine the pose of a human skeleton.

The data used to analyze customer behavior is derived from the appearance, gestures, and interactions of customers with other people or objects in the shop; these data are gathered almost entirely from camera images. Today, most stores are equipped with surveillance cameras, which has led to the publication of various studies on monitoring customer behavior in-store, including [4], [34], [20], [70], [36].
However, most of these publications focus on specific sub-modules of the customer behavior problem and have not yet developed a generic technique or a system with a high capacity for module integration. For instance, [70] focuses only on face analysis to ascertain customer interest, [4] investigates an approach for determining a customer's browsing behavior, and [36] focuses on customer pose estimation through a bi-directional recurrent neural network. Additionally, these works' experiments are conducted mostly in the laboratory and lack data on actual customer behavior.

Customer behavior in the store is expressed not only through individual actions, such as picking up an item, glancing at the area surrounding the item [35], or approaching this area, but also through group behavior. Group behavior is a very effective way for customers to express their interest in other objects, such as items or employees. F-formation is a well-known technique for describing group behavior [48], [27], [13]. In [27], the author divides F-formation groups into several varieties, including the L-shaped group, the Vis-Vis group, the Circular group, the Side-by-Side group, and so on. In this work, we discuss three configurations: L-shape, Vis-Vis, and Side-by-Side. The first phase of determining an F-formation group is group detection; this task requires segmenting the crowd into small groups.
  18. The second phase uses the head pose, body pose, and position of each member to identify the group type. There are numerous research publications on the first phase of F-formation [57], [58]; nevertheless, these works assume prior knowledge of each person's 3D position and face orientation, which are the two most difficult pieces of information to obtain in practice.

The most recent state-of-the-art research on the F-formation problem is [23], which, like previous works, assumes that the 3D coordinates and face orientation are available from the SALSA benchmark dataset [3]. The authors employ a pipeline of three distinct steps: a data deconstruction step, a pairwise classification step that constructs a correlation matrix for the people in the image, and a reconstruction step that clusters people into groups from the correlation matrix. The F-formation module in our work is based on this method. Moreover, we produce F-formation results from the head pose estimation, human pose estimation, and object detection modules in order to gain insight into the connection between the outputs of these modules and the F-formation result. Because there is little work on the classification of F-formations in general, we classify them using rules based on the pose and location of the group's members.

As shown in Figure 2.1, a thorough system should comprise three components: (a) Behavior Modelling, which enables us to present our designs mathematically; (b) Behavior System, which enables us to use the design from the first part to architect and implement the system; and (c) Behavior Visualization, which, once the system is implemented, enables us to gain insight into our behavior data.
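
The rule-based classification of F-formation types mentioned above is not spelled out at this point in the text. As a rough illustration only, the following Python sketch assigns one of the three configurations considered here (L-shape, Vis-à-Vis, Side-by-Side) to a pair of people from the angle between their facing directions. The function names, the use of yaw alone, and the 30-degree tolerance are all assumptions made for this example; the actual rules in this work also take member location into account, which this sketch omits.

```python
# Hypothetical tolerance (degrees) around the ideal angle of each arrangement.
ANGLE_TOLERANCE_DEG = 30.0


def heading_difference(yaw_a_deg: float, yaw_b_deg: float) -> float:
    """Smallest absolute angle between two headings, in the range [0, 180]."""
    diff = abs(yaw_a_deg - yaw_b_deg) % 360.0
    return 360.0 - diff if diff > 180.0 else diff


def classify_pair(yaw_a_deg: float, yaw_b_deg: float,
                  tol_deg: float = ANGLE_TOLERANCE_DEG) -> str:
    """Label a pair by the angle between the directions the two people face.

    Assumed rules (illustrative, not taken from the thesis):
      about 0 degrees   -> facing the same way  -> "side-by-side"
      about 90 degrees  -> perpendicular        -> "L-shape"
      about 180 degrees -> facing each other    -> "vis-a-vis"
    """
    diff = heading_difference(yaw_a_deg, yaw_b_deg)
    if diff <= tol_deg:
        return "side-by-side"
    if abs(diff - 90.0) <= tol_deg:
        return "L-shape"
    if diff >= 180.0 - tol_deg:
        return "vis-a-vis"
    return "unknown"
```

For instance, `classify_pair(0.0, 180.0)` falls under the Vis-à-Vis rule, while two headings 45 degrees apart match none of the rules at this tolerance. A fuller version would presumably also check the members' positions, for example that two people labeled Vis-à-Vis actually stand in front of one another.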
In practice, existing works on behavior systems visualize only abstract summaries of behavior data; for example, [31] describes only the average visit time, visitor count, and percentage of interactions, and [35] visualizes only the trajectories of various arm actions. In our work, we describe in detail both individual and group behavior during a single day in which our system runs in a real store.

2.1.2 Aim of the work

This work aims to propose a comprehensive framework in which the modelling part explains the system. This part provides us with better context for designing the system and for integrating new modules or modifying the structure of the system in the future. After developing a model of customer behavior in the retail store, we construct the system using a layered architecture and implement its modules in a distributed manner, which enables the system to be deployed across various devices. Finally, our methodology is implemented in a real-world retail environment, and behavior data is visualized in depth at both the individual and group level.

To our knowledge, after a review of the current state of the art, the main contributions of our framework are the following:
  19. • Building an approach to modeling customer behavior in the store. With this tool, we can decompose this large problem into smaller ones and generalize it to other challenges.
• Building a behavior analysis system from the model. The system can be distributed across a large number of devices, which lets it optimize speed and exploit the benefits of distributed processing.
• Finally, evaluating the system in-store, where it collects and visualizes data about the behavior of individual customers and groups. This provides insight into the behavior of our customers.

Figure 2.1: Human Behavior Analysis needs of retail store. Panels: (a) Behavior Modelling, (b) Behavior System, (c) Behavior Visualization.

The rest of the work is organized as follows: Section 2.2 presents the modeling and design of the system; Section 2.3 reports the performance analysis for each module in the system together with the visualization of behavior data; finally, Section 2.4 presents conclusions.

2.2 System

The customer behavior analysis system makes use of data from sensors, specifically a camera with depth data in this case. These data enable behavior recognition modules to recognize behavior and store it in two forms: transition data and interaction data. Transition and interaction data are depicted as a component in Fig. 2.2, which represents the general system.
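
To make the two stored forms more concrete, the following Python sketch shows one way transition data and interaction data might be represented. It is an illustration under assumptions: the record types, their field names, and the helper `path_length_m` are hypothetical and are not taken from the system described in this work. Here transition data is assumed to mean timestamped positions of a tracked person, and interaction data is assumed to mean timestamped contacts with an item or another person.

```python
from dataclasses import dataclass
from math import hypot
from typing import List


@dataclass(frozen=True)
class TransitionRecord:
    """Hypothetical record: where a tracked person was at a given time."""
    person_id: int
    timestamp_s: float
    x_m: float  # position on the store floor plan, in metres (assumed)
    y_m: float


@dataclass(frozen=True)
class InteractionRecord:
    """Hypothetical record: a person interacting with an item or a person."""
    person_id: int
    timestamp_s: float
    target_kind: str  # e.g. "item", "employee", "customer" (assumed labels)
    target_id: str


def path_length_m(records: List[TransitionRecord], person_id: int) -> float:
    """Total distance walked by one person, summed over consecutive positions."""
    track = sorted((r for r in records if r.person_id == person_id),
                   key=lambda r: r.timestamp_s)
    return sum((hypot(b.x_m - a.x_m, b.y_m - a.y_m)
                for a, b in zip(track, track[1:])), 0.0)
```

Keeping the two record types separate mirrors the split between movement and interaction described above; a visualization module could then read either stream independently.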
  20. Figure 2.2: Overview of system architecture. Components shown include: Depth Camera, Detection, Action Recognition, Behavior Recognition Modules, Transition & Interaction Data, Visualization Module.

2.2.1 Modelling

When it comes to customer behavior analysis in the retail business, there are three major questions about the customer's activity that the system should be able to answer:

1. Where do customers go in the store?
2. Which item piques the customer's interest or attention?
3. Who do customers interact with during the decision-making process?

To answer these questions, a customer behavior analysis system should track each person's location and distinguish him or her from other customers from the moment the customer enters the store. Additionally, the system must be aware of the person's field of view or area of interest, as well as what the customer picks up in the store, since a customer who is interested in an item will pick it up to inspect it. Moreover, interactions between customers and employees, or between customers and other customers, are critical for assessing customer behavior since they provide insight into the customer's level of interest. The system can also track the amount of time employees spend serving customers. [31] has developed a system but has not yet developed a model of customer behavior.

As can be seen, the analysis of a person's purchasing behavior can be classified into two categories: individual behavior and group behavior. This section therefore attempts to model these two types mathematically.
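
One quantity mentioned above, the amount of time an employee spends serving customers, can be illustrated with a short Python sketch. It assumes, purely for the example, that each detected customer-employee interaction is available as a (start, end) interval in seconds for a given employee. The function name and the interval representation are hypothetical. Overlapping intervals, such as an employee serving two customers at once, are merged so that the time is counted only once.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s); hypothetical representation


def total_serving_time_s(intervals: List[Interval]) -> float:
    """Length of the union of intervals, so overlapping interactions count once."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # A gap: close the previous merged interval and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap or touch: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total
```

For example, the intervals (0, 10), (5, 15), and (20, 30) merge into two blocks of 15 and 10 seconds. Simply summing the interval lengths would overstate the serving time whenever interactions overlap.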