VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
NGUYEN DINH TUAN
OBJECT MOVEMENT MODELING
Master thesis
Major: Computer Science
Ha Noi - 2021
Supervisor: Dr. Tran Quoc Long
ABSTRACT
Artificial intelligence has advanced in recent years, enabling the development of numerous
applications that perceive the physical world and assist humans with a variety of activities.
Among the many video understanding problems, one of the more challenging branches is
object movement analysis. To understand object movement, we must rely on the context of
the activity to indicate the object's states, such as shopping, going to the hospital,
participating in sports, or crowd behavior. Despite the number of studies on this topic, most
focus on specific behaviors, and there is currently no comprehensive model.
This thesis establishes a framework for modeling object movement in the setting of
customers in a retail store. More precisely, we first model the individual and group behavior
of the store's customers. Second, we design and implement this model using a distributed
approach, which enables efficient deployment of the system. Finally, we install and
evaluate the system in a physical store in Vietnam.
After conducting experiments with the system, we found that head pose estimation is
a significant module in the consumer behavior analysis problem. However, models that perform
well on benchmark datasets do not always perform well when deployed. This occurs because
the data collection systems behind current benchmark datasets have complex architectures,
so those datasets are generated only in laboratory settings. As a result, this thesis also
proposes a novel technique for building a dataset with a collection system that is easy to set
up and operate, in order to obtain more diverse data. Additionally, we propose a multi-task
model that learns face detection and head pose estimation simultaneously, which reduces
latency compared with the traditional approach to head pose estimation, in which face
detection and head pose estimation are performed independently.
ACKNOWLEDGMENTS
There are many people I must thank for contributing to the two wonderful years of my
experience as a master's student.
First, I want to express my gratitude to my adviser, Dr. Tran Quoc Long, for his continuous
support of my study and research, and for his patience, motivation, enthusiasm, and immense
knowledge. His advice was invaluable to me during my undergraduate and master's years. He
has provided me with numerous opportunities to participate in a variety of projects, both
production and research, through which I have learned many fundamental and significant
lessons. When I pitched an idea to him, he extracted its essence, uncovered the more
fundamental story, and placed it in a larger context. He gives me advice and helps me see
things more clearly when I'm in trouble. I am very grateful to Dr. Long not only for educating
me about computer vision and the art of communication, but also for teaching me how to think.
I've been very fortunate to learn from many other amazing people in MEMS Lab (Micro
Electronic Mechanical System Lab), led by Prof. Chu Duc Trinh. They have always given me
opportunities, insightful remarks, feedback, and guidance. I would like to thank my close
collaborator, MSc. Phan Hoang Anh, whom I've had the distinct pleasure of working with and
learning from through a robotics project and many thoughtful discussions. He is a wonderful
and motivating leader who always knows how to lift his team out of trouble and push the
project to its milestones.
I also have to thank Assoc. Prof. Le Thanh Ha, who directs the Human Machine Interaction
(HMI) laboratory, where I have participated for more than three years since I was a
freshman. Assoc. Prof. Le Thanh Ha, Dr. Nguyen Chi Thanh, Dr. Nguyen Do Van, Assoc.
Prof. Nguyen Thi Thuy, and the members of HMI Lab enthusiastically shared their experience
and provided feedback on my computer vision project. I'd like to thank Mr. Nguyen Viet Hung,
a former member of HMI Lab, for introducing me to the lab during this time. He was the first
person to introduce me to the field of research and always encouraged me to learn new things.
I also appreciate the four years he was my roommate; he is like a brother to me.
I have to thank many people who have contributed to my daily life and who have made
my experience at the University of Engineering and Technology, VNU so pleasant. From the UET
AI Lab: Nguyen Minh Tuan, Nguyen Van Phi, Tran Minh Duc, Hoang Giang, Le Pham Van
Linh, Le Duy Son, Hoang Tung, Nguyen Phuc Hai. From MEMS Lab: Nguyen Duc Tien,
Tran Huu Quoc Dong, Bui Duy Nam, Dam Phuong Nam. I'd like to express my thanks to
Nguyen Viet Linh and Nguyen Viet Hoang of the UET AI Lab, who collaborated with me on
the work reported in Chapters 2 and 3.
I am grateful to the funding agency, DNP company, for supporting my research, especially
Keisuke Hihara, Yumeka Utada, Akihiko Torii, and Naoki Izumi, who held weekly discussions
with me about the research project. In addition, I want to thank Gem Vietnam company,
particularly Dr. Vu Xuan Tung and Vu Xuan Tuan, who spent six months working with me on
the research project and evaluating the system in a real-world setting.
I'm very fortunate to have my two best friends, Nguyen Khac Hung and Nguyen Dinh
Huy, who always care about me and are a safe place for me in difficult times and a source of
joy in happy times. I also thank my cousins Nguyen Tien Dat, Nguyen Khanh Linh, and Nguyen
Thien Viet, who have always been supportive of me since I first started as a student.
A big thank-you to my parents, who sacrificed many things for me and my younger brother, and
who have always cared about us and supported us spiritually throughout our lives.