
Data Analytics MSc Dissertation MTH775P, 2019/20
Predicting the prices for breakfasts and beds
Hai Nam Nguyen, ID 161136118
Supervisor: Dr. Martin Benning
A thesis presented for the degree of
Master of Science in Data Analytics
School of Mathematical Sciences
Queen Mary University of London

Declaration of original work
This declaration is made on August 17, 2020.
Student’s Declaration: I, Hai Nam Nguyen, hereby declare that the work
in this thesis is my original work. I have not copied from any other students’
work, work of mine submitted elsewhere, or from any other sources except
where due reference or acknowledgement is made explicitly in the text, nor
has any part been written for me by another person.
Referenced text has been flagged by:
1. Using italic fonts, and
2. Using quotation marks “. . . ”, and
3. explicitly mentioning the source in the text.

This work is dedicated to my niece Nguyen Le Tue An (Mochi), who has
recently been a great source of joy to me and my family.

Abstract
Setting and estimating the right prices is vital for both hosts and renters on
home-sharing platforms run by internet-based companies. To contribute to the
growing interest in, and the immense literature on, applying Artificial
Intelligence to the prediction of rental prices, this paper attempts to build
machine learning models for that purpose using the Luxstay listings in Hanoi.
The R² score is used as the main criterion for model performance, and the
results show that Extreme Gradient Boosting (XGB) is the best-performing model,
with R² = 0.62, beating the most sophisticated machine learning model
considered: Neural Networks.

Contents
Declaration of original work
Abstract
1 Introduction
2 Literature Review
3 Experimental Design
3.1 Dataset
3.2 K-Fold Cross Validation
3.3 Measuring Model Accuracy
4 Methods
4.1 LASSO
4.1.1 FISTA
4.2 Random Forest
4.3 Gradient Boosting
4.4 Extreme Gradient Boosting
4.5 LightGBM
4.5.1 Gradient-based One-sided Sampling
4.5.2 Exclusive Feature Bundling
4.6 Neural Networks
4.6.1 Adam Algorithm
4.6.2 Backpropagation

