VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
Nguyen Minh Trang
ADVANCED DEEP LEARNING METHODS
AND APPLICATIONS IN
OPEN-DOMAIN QUESTION ANSWERING
MASTER THESIS
Major: Computer Science
Supervisor: Assoc.Prof. Ha Quang Thuy
Ph.D. Nguyen Ba Dat
HA NOI - 2019
Abstract
Since the Internet became ubiquitous, the amount of data accessible to
information retrieval systems has increased exponentially. For information
consumers, being able to obtain a short and accurate answer to any query is one
of the most desirable features. This motivation, along with the rise of deep
learning, has led to a boom in open-domain Question Answering (QA) research. An
open-domain QA system usually consists of two modules, a retriever and a reader,
each developed to solve a particular task. While document comprehension has
achieved notable success thanks to large training corpora and the emergence of
attention mechanisms, document retrieval in open-domain QA has not made much
progress. In this thesis, we propose a novel encoding method for learning
question-aware self-attentive document representations. These representations
are then used in a pairwise ranking approach. The resulting model is a Document
Retriever, called QASA, which is integrated with a machine reader to form a
complete open-domain QA system. Our system is thoroughly evaluated on the
QUASAR-T dataset and outperforms other state-of-the-art methods.
Keywords: Open-domain Question Answering, Document Retrieval, Learning to
Rank, Self-attention mechanism.
Acknowledgements
First and foremost, I would like to express my sincere gratitude to my
supervisor, Assoc. Prof. Ha Quang Thuy, for his continuous support of my Master's
study and research, and for his patience, motivation, enthusiasm, and immense
knowledge. His guidance helped me throughout the research and writing of this
thesis.
I would also like to thank my co-supervisor, Ph.D. Nguyen Ba Dat, who has
not only provided me with valuable guidance but also generously funded my
research.
My sincere thanks also go to Assoc. Prof. Chng Eng-Siong and M.Sc. Vu
Thi Ly for offering me summer internship opportunities at NTU, Singapore,
and for guiding my work on diverse and exciting projects.
I thank my fellow labmates in KTLab: M.Sc. Le Hoang Quynh, B.Sc. Can
Duy Cat, B.Sc. Tran Van Lien for the stimulating discussions, and for all the fun
we have had in the last two years.
Last but not least, I would like to thank my parents for giving birth to me
in the first place and supporting me spiritually throughout my life.
Declaration
I declare that this thesis has been composed by myself and that the work has not
been submitted for any other degree or professional qualification. I confirm that
the work submitted is my own, except where work which has formed part of
jointly-authored publications has been included.
My contribution and those of the other authors to this work have been
explicitly indicated below. I confirm that appropriate credit has been given
within this thesis where reference has been made to the work of others. The work
presented in Chapter 3 was previously published in Proceedings of the 3rd ICMLSC
as “QASA: Advanced Document Retriever for Open Domain Question Answering
by Learning to Rank Question-Aware Self-Attentive Document Representations”
by Trang M. Nguyen (myself), Van-Lien Tran, Duy-Cat Can, Quang-Thuy Ha
(my supervisor), Ly T. Vu, Eng-Siong Chng. This study was conceived by all of
the authors. My contributions include: proposing the method, carrying out the
experiments, and writing the paper.
Master student
Nguyen Minh Trang