
Abstract
Ever since the Internet has become ubiquitous, the amount of data accessible by
information retrieval systems has increased exponentially. As for information con-
sumers, being able to obtain a short and accurate answer for any query is one of
the most desirable features. This motivation, along with the rise of deep learning,
has led to a boom in open-domain Question Answering (QA) research. An open-
domain QA system usually consists of two modules: retriever and reader. Each
is developed to solve a particular task. While the problem of document compre-
hension has received multiple success with the help of large training corpora and
the emergence of attention mechanism, the development of document retrieval in
open-domain QA has not gain much progress. In this thesis, we propose a novel
encoding method for learning question-aware self-attentive document represen-
tations. Then, these representations are utilized by applying pair-wise ranking
approach to them. The resulting model is a Document Retriever, called QASA,
which is then integrated with a machine reader to form a complete open-domain
QA system. Our system is thoroughly evaluated using QUASAR-T dataset and
shows surpassing results compared to other state-of-the-art methods.
Keywords: Open-domain Question Answering, Document Retrieval, Learning to
Rank, Self-attention mechanism.
iii