
Journal of Water Resources & Environmental Engineering - No. 87 (12/2023)
Double Deep Q-Network algorithm for solving traffic congestion on one-way highways

Nguyen Tuan Thanh Le¹*, Dat Tran-Anh¹, Quoc-Bao Bui², Linh Manh Pham³
Abstract: Reducing traffic congestion on highways is one of the conundrums that the transport industry and governments would like to solve. With the rapid development of advanced technologies, especially in the fields of deep learning and reinforcement learning, systems using multi-agent deep reinforcement learning (MADRL) have become an effective way to address this problem. MADRL is a method that combines reinforcement learning with multi-agent modeling and simulation approaches. In this article, we apply the Double Deep Q-Network (DDQN) algorithm to a multi-agent model of traffic congestion and compare it with two other algorithms.
Keywords: Traffic congestion problem, multi-agent deep reinforcement learning, agent-based simulation, autonomous vehicles

¹ Thuy loi University
² School of Applied Mathematics and Informatics, Hanoi University of Science and Technology, Hanoi, Vietnam
³ VNU University of Engineering and Technology
* Corresponding author, ORCID ID: 0000-0002-3527-4066
Received 14th Aug. 2023; Accepted 3rd Dec. 2023; Available online 31st Dec. 2023
1. Introduction
Reinforcement learning (RL) has become an area of great interest with the advent of deep reinforcement learning (DRL), when Mnih et al. (2015) used an architecture called the Deep Q-Network (DQN) to create an agent capable of outperforming a professional player across a series of 49 classic Atari games. Reinforcement learning has also confirmed its position through a number of achievements: the AlphaFold tool, developed by DeepMind (a subsidiary of Google), successfully predicted the structure of proteins, and self-play game agents have been trained to beat professional human players at the game of Go. In addition, RL has been used to train vehicles to drive autonomously, to control robots in space environments, and to enable robots to recognize and interact with humans.
In reinforcement learning, an agent interacts
with an environment and performs actions to
achieve some goals. The system receives rewards from the environment based on its actions, and the goal of reinforcement learning is to maximize the total reward accumulated over the interaction. RL can thus be seen as an extension of machine learning in which the system learns not only from static data but also from dynamic interaction with the environment.
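This interaction loop can be made concrete with a short sketch. The code below illustrates plain tabular Q-learning against a hypothetical environment exposing `reset()`, `step()` and a list of `actions`; it is only an illustration of the learning loop, not the traffic model studied later in this article.

```python
import random
from collections import defaultdict

# Minimal sketch of the RL interaction loop with tabular Q-learning.
# The `env` object (reset()/step()/actions) is a hypothetical
# placeholder, not the traffic environment used in this article.
def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(float)  # Q[(state, action)] -> estimated return

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])

            next_state, reward, done = env.step(action)

            # Move the estimate toward the bootstrapped target
            # r + gamma * max_a' Q(s', a').
            best_next = max(q[(next_state, a)] for a in env.actions)
            target = reward + (0.0 if done else gamma * best_next)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q
```

DQN and DDQN replace the table `q` with a neural network, but the interaction loop and the update target have the same shape.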
Key challenges include learning instability and the curse of dimensionality. Independent Q-learning divides the state-action value function into independent learning tasks performed by individual agents in order to mitigate the curse of dimensionality. However, because agents can change their policies concurrently, each agent effectively faces a dynamic environment, and the learning process can become volatile.
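A short sketch of this independent scheme is given below, assuming each agent keeps its own Q-table over its local observation; the multi-agent environment interface is again a hypothetical placeholder.

```python
import random
from collections import defaultdict

# Sketch of independent Q-learning: each agent learns its own Q-table
# from its local observation and treats all other agents as part of
# the environment. The multi-agent `env` interface (per-agent
# observations and rewards) is a hypothetical placeholder.
def independent_q_learning(env, n_agents, episodes=500,
                           alpha=0.1, gamma=0.99, epsilon=0.1):
    q = [defaultdict(float) for _ in range(n_agents)]  # one table per agent

    for _ in range(episodes):
        obs = env.reset()  # list of local observations, one per agent
        done = False
        while not done:
            actions = []
            for i in range(n_agents):
                if random.random() < epsilon:
                    actions.append(random.choice(env.actions))
                else:
                    actions.append(max(env.actions,
                                       key=lambda a: q[i][(obs[i], a)]))

            next_obs, rewards, done = env.step(actions)

            # Each agent updates independently from its own reward;
            # because the others also change policy, the dynamics
            # each agent sees are non-stationary.
            for i in range(n_agents):
                best_next = max(q[i][(next_obs[i], a)] for a in env.actions)
                target = rewards[i] + (0.0 if done else gamma * best_next)
                q[i][(obs[i], actions[i])] += alpha * (
                    target - q[i][(obs[i], actions[i])])

            obs = next_obs
    return q
```

Each per-agent table only has to cover that agent's local observation space, which is how this factorization sidesteps the explosion of the joint state-action space.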
To enable information sharing between agents, a suitable communication mechanism is required. This is important because it determines the content and amount of information that each agent can observe and learn from its neighbors, which directly affects how much uncertainty can be reduced. Common approaches include allowing neighboring agents to i) exchange their information with each other and directly use partial observations during the