Received 14 June 2025, accepted 13 July 2025, date of publication 28 July 2025, date of current version 14 August 2025.
Digital Object Identifier 10.1109/ACCESS.2025.3593136
An Efficient Model for Real-Time Traffic Density
Analysis and Management Using Visual Graph
Networks
NIKHIL NIGAM1, DHIRENDRA PRATAP SINGH1,
JAYTRILOK CHOUDHARY1, AND SURENDRA SOLANKI2
1Department of Computer Science and Engineering, MANIT, Bhopal 462003, India
2Department of Artificial Intelligence and Machine Learning, Manipal University Jaipur, Jaipur, Rajasthan 303007, India
Corresponding author: Surendra Solanki (Surendra.solanki@jaipur.manipal.edu)
ABSTRACT Real-time traffic management systems are needed to manage urbanization’s impact on traffic
conditions. Traffic dynamics in cities are complex, and traditional signal timing methods and simple vehicle
detection cannot handle them. The paper describes a method for improving urban traffic studies using
Real-time Dense Analysis and Management using Visual Graph Networks (RDAMVGN) that utilizes
deep learning techniques along with Visual Graph Networks based on visualizations. This study aims to
develop a robust, dynamic, and accurate approach to traffic density analysis, vehicle classification, and
dynamic signal control in order to achieve high accuracy in traffic flow analysis. The proposed RDAMVGN
framework incorporates both a LACF-YOLO detection model and a Faster Region-Based Convolutional
Neural Network (Faster RCNN) detection model for high-speed and high-accuracy vehicle identification.
This framework applies transfer learning to adapt pre-trained features to new traffic environments. This
enhances vehicle classification in complex urban scenes and improves the model’s ability to distinguish
vehicles from non-vehicle objects. Traffic flow optimization is achieved by using Mask RCNN and
LSTM. The comparative analysis includes Fine-Tuned YOLOv8 with Fine-Tuned Faster RCNN, Fine-
Tuned YOLOv3 with Fine-Tuned Faster RCNN, YOLOv5 with Faster RCNN, YOLOv3 with Faster RCNN,
standalone Faster RCNN, YOLOv5, YOLOv3, Reinforcement Learning (RL), and Deep Reinforcement
Learning (DRL) models, evaluated on precision, recall, and accuracy under pre-emption and non-pre-emption
scenarios. The RDAMVGN-based detection component outperforms these baselines on all evaluation
metrics for real-time traffic management. It achieves a precision of 97.75%,
indicating accurate vehicle identification with few false positives. Its recall of 96.17% reflects
strong detection capability with few missed vehicles. The overall accuracy is 98.48%, indicating
robust classification and localization. In the pre-emption scenario, the model maintains its lead with a
precision of 94.79%, a recall of 98.66%, and an accuracy of 97.43%, showing its
reliability in real-time traffic prioritization.
INDEX TERMS Deep learning, real-time traffic management, signal optimization, traffic density/spatial
occupancy, vehicle detection.
The associate editor coordinating the review of this manuscript and
approving it for publication was Guangjie Han.
VOLUME 13, 2025
2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License.
For more information, see https://creativecommons.org/licenses/by/4.0/
N. Nigam et al.: Efficient Model for Real-Time Traffic Density Analysis and Management
I. INTRODUCTION
A growing urban population and rising vehicle numbers
create pressure on existing road infrastructure, demanding
efficient and intelligent traffic management systems for
smoother flow and reduced congestion. Vehicles have diverse
features such as shape, color, edges, shadows, and texture,
which makes accurate identification and classification
challenging. Classification and localization are key steps in the
object detection process [1], [2]. Classification identifies the
type of vehicle, while localization defines bounding boxes
around vehicles in images or videos; together, these processes
enable modern traffic systems to analyze real-time vehicle
data from high-quality cameras, allowing optimization of
green signal timings at intersections based on the current
traffic density. This optimization reduces vehicle wait times
and lowers traffic congestion. Early systems relied on inductive
loops, introduced in the 1960s, in which wire loops embedded
in the pavement detect passing vehicles. While effective for
basic vehicle detection, loops provide limited information.
Radar systems were also used; they rely on radio waves to
determine the range, direction, and speed of vehicles.
Nowadays, advances in imaging technology have
made cameras more affordable, portable, and reliable. There
are three types of vision-based vehicle detection algorithms:
motion-based, handcrafted feature-based, and CNN-based.
Motion-based methods, such as optical flow and background
subtraction, are effective for detecting moving vehicles but
rely on static images or videos. Handcrafted feature-based
methods like Histogram of Oriented Gradients (HOG) [3] and
SIFT [4] provide low-level feature representations but require
expert knowledge for feature extraction. In contrast, CNN-
based techniques leverage deep learning to automatically
learn high-level features from large datasets, simplifying the
feature extraction process but necessitating powerful compu-
tational resources due to their reliance on extensive matrix
operations optimized by Graphics Processing Units (GPUs).
In urban traffic networks, traffic signals regulate vehicle
movement at intersections, helping manage flow and enhance
safety while reducing congestion through optimized timing
strategies [5],[6].
Modern urban traffic flows are highly dynamic and com-
plex. Static signal timing plans and basic vehicle detection
are inadequate for managing traffic. The use of deep learning
technologies makes real-time, accurate, and adaptive traffic
analysis and management possible. Models such as You Only
Look Once (YOLO) and Faster R-CNN mark a paradigm shift.
YOLO identifies and localizes different vehicle types in a
single network pass, which makes it fast. This supports real-time
traffic monitoring and allows quick responses to changing
conditions. In contrast, Faster R-CNN is known for its high
precision and accuracy in object detection, making it suitable
for tasks where detection accuracy is critical. This ability
is particularly advantageous when dealing with dense traffic
scenarios. Transfer learning improves object classification,
especially in traffic scenarios, by leveraging knowledge from
previously trained models. Pre-trained models increase detec-
tion accuracy and reduce training time and data requirements.
By utilizing extensive datasets, these models enhance the
system’s ability to recognize various elements in complex
environments, making them well-suited for real-time traf-
fic analysis. Mask R-CNN, an extension of Faster R-CNN,
introduces a dedicated branch for predicting segmentation
masks alongside class labels and bounding boxes, enabling
more precise localization and identification of objects within
images. This advancement allows for detailed size and
shape analyses of different vehicle types, which is essential
for comprehensive spatial occupancy/traffic density analy-
sis. By accurately segmenting each vehicle, Mask R-CNN
facilitates precise measurement of vehicle dimensions and
positions in complex traffic scenes, enhancing the assessment
of how much space vehicles occupy on the road. Further-
more, Long Short-Term Memory (LSTM) networks are used
for sequential data processing to predict and manage traffic
patterns. Through the analysis of time-series data based on
spatial occupancy from different signal phases, these models
are capable of accurately predicting traffic flow patterns and,
using polynomial regression, optimizing signal timing by
identifying the relationship between spatial occupancy and
green signal duration to improve overall traffic efficiency.
However, integrating all these capabilities into a single
framework optimized for traffic management remains a major
challenge. Despite these advances, existing traffic management
methods lack adaptability to real-world conditions.
Reinforcement learning (RL) provides adaptive signal control,
but at a high computational cost, which limits real-time
deployment in complex urban environments. CNN-based object
classification improves detection accuracy, but current systems
can rarely predict traffic changes in advance. To enhance urban
mobility and traffic signal efficiency, an integrated model is
required. This model should combine rapid object detection,
advanced classification, and predictive pattern analysis.
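The relationship between spatial occupancy and green signal duration described above can be illustrated with a small regression sketch. The calibration pairs, the polynomial degree, and the 10 to 60 second bounds below are hypothetical values chosen for illustration; they are not taken from the paper, and `green_time` is an assumed helper name.

```python
import numpy as np

# Hypothetical calibration data (NOT from the paper): spatial occupancy of an
# approach (fraction of road area covered by vehicles) and the green duration
# in seconds that was sufficient to clear the queue.
occupancy = np.array([0.05, 0.15, 0.30, 0.45, 0.60, 0.75, 0.90])
green_s = np.array([10.0, 14.0, 22.0, 31.0, 42.0, 52.0, 60.0])

DEGREE = 2                              # assumed polynomial degree
MIN_GREEN_S, MAX_GREEN_S = 10.0, 60.0   # assumed operational bounds

# Least-squares polynomial fit: green_s ~ c2*occ^2 + c1*occ + c0
coeffs = np.polyfit(occupancy, green_s, deg=DEGREE)
model = np.poly1d(coeffs)


def green_time(predicted_occupancy: float) -> float:
    """Map a predicted occupancy to a green duration within the bounds."""
    occ = float(np.clip(predicted_occupancy, 0.0, 1.0))
    return float(np.clip(model(occ), MIN_GREEN_S, MAX_GREEN_S))


if __name__ == "__main__":
    for occ in (0.1, 0.5, 0.9):
        print(f"occupancy={occ:.2f} -> green={green_time(occ):.1f} s")
```

In a full pipeline, the occupancy passed to `green_time` would come from the sequence model's forecast rather than from a fixed value; clipping keeps the fitted curve from producing unusable durations outside the calibrated range.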
To bridge these gaps, the study proposes RDAMVGN,
an advanced integrated framework for real-time traffic den-
sity analysis, accurate vehicle detection, and adaptive traffic
signal optimization. In this study, LACF-YOLO is used for
fast vehicle detection, Faster RCNN for accurate object
identification, Mask RCNN for detailed vehicle-level
segmentation, and LSTM for predicting traffic patterns.
Polynomial regression then predicts green signal timing
from the traffic patterns predicted by the LSTM. This integration
significantly improves traffic flow and safety and is intended
as a reference point for intelligent transport systems in cities.
The proposed model’s precision, accuracy, and recall are
evaluated on the SUMO simulator in this study. The results
mark a significant advance toward optimal urban mobility and
demonstrate the model’s superiority over existing traffic
management methodologies.
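For reference, the three reported metrics can be computed from detection counts as sketched below. The sketch uses the standard textbook definitions; the exact evaluation protocol (for example, how true negatives are counted for a detector) is not specified in this section, and the counts shown are made up for illustration.

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, and accuracy from confusion-matrix counts.

    tp: vehicles correctly detected      fp: non-vehicles reported as vehicles
    fn: vehicles that were missed        tn: non-vehicles correctly rejected
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


# Hypothetical counts, for illustration only.
metrics = detection_metrics(tp=90, fp=10, fn=5, tn=95)

if __name__ == "__main__":
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

Precision penalizes false alarms, recall penalizes missed vehicles, and accuracy summarizes both kinds of error, which is why the three values are reported together.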
A. MOTIVATION & CONTRIBUTION
The growth of urbanization intensifies traffic congestion,
posing critical challenges to mobility, environmental health,
and public safety. Congestion reduces the efficiency of
transportation networks and contributes to commuter stress, fuel
consumption, and greenhouse gas emissions. The
economic costs associated with lost time and energy waste
are substantial, placing a heavy burden on urban infrastruc-
ture and resources. Traditional traffic management systems,
originally designed for simpler and more predictable traffic
scenarios, are increasingly inadequate at coping with the
complexity and dynamic nature of modern urban traffic.
As cities continue to expand, there is an urgent need for
intelligent, adaptive traffic management solutions capable of
addressing these multifaceted challenges in real time.
This study presents a novel, real-time traffic density
analysis framework designed to support adaptive traffic sig-
nal control in complex urban environments. The proposed
framework, RDAMVGN, integrates advanced deep learning
techniques through a multi-layered architecture that includes
rapid vehicle detection, detailed instance segmentation, and
predictive analytics. Unlike conventional static timing-based
traffic control systems or simplistic vehicle detection models,
RDAMVGN leverages a data-intensive, modular approach to
provide accurate and responsive traffic management. The key
contributions of this work are:
1. Speed and Precision Integration: The paper presents
RDAMVGN, a novel dual-model framework combining the
rapid detection capability of LACF-YOLO with the high-
precision classification strength of Faster R-CNN, specif-
ically designed for managing high-density urban traffic.
This integration ensures both speed and accuracy, which
are key requirements for adaptive traffic signal control in
complex environments. The system employs transfer learning
by leveraging pre-trained networks and fine-tuning them on
traffic-specific datasets to maintain consistently high clas-
sification accuracy even with limited training data. This
approach improves vehicle localization and classification
while effectively filtering out non-vehicle objects, enhancing
traffic density estimation. RDAMVGN’s scalable design and
robust performance across diverse traffic conditions establish
it as a significant advancement over traditional traffic man-
agement models.
2. Instance-Level Segmentation Using Mask RCNN
for Detailed Traffic Monitoring: This study introduces
an improved intelligent traffic management approach by
integrating Mask R-CNN for detailed vehicle segmentation
within the RDAMVGN framework. Unlike traditional models
that only detect vehicles, this method classifies them accord-
ing to size, type, and shape. It enables more precise estimation
of road space occupied by each vehicle. For example, the
system recognizes that a bus occupies the space of several
cars instead of counting vehicles equally. This leads to a more
realistic and effective measurement of traffic congestion and
road capacity, supporting improved traffic signal control and
overall traffic flow management.
3. Predictive Signal Optimization with LSTM Net-
works: This study improves traffic management by inte-
grating LSTM networks within the RDAMVGN framework
to analyze traffic patterns in real time. Unlike traditional
systems that respond only to current traffic conditions, this
method predicts future traffic flow and proactively adjusts
signal timings using polynomial regression. As a result, it
helps prevent congestion before it occurs and facilitates
smoother traffic movement.
4. Detailed, Real-Time Response with Quantifiable
Impacts: The RDAMVGN framework enhances traffic man-
agement by integrating speed, accuracy, and traffic flow
prediction within a unified model. Evaluation using the
SUMO simulator shows that RDAMVGN reduces congestion
by 10% and lowers average vehicle wait time to 1.05 seconds.
Unlike conventional systems that address only one or two
tasks, RDAMVGN combines detection, classification, seg-
mentation, and prediction in a single architecture. It responds
effectively to real-time traffic conditions and demonstrates
significant improvements in traffic efficiency.
5. Prioritizing emergency vehicles: Perhaps most impor-
tantly, this paper recognizes the critical need for priori-
tizing emergency vehicles in traffic management systems.
By proposing the use of custom CNN models and sound
classification techniques for emergency vehicle detection,
this research contributes to reducing response times for
emergency services, thereby enhancing public safety and
potentially saving lives.
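As an illustration of contribution 2, the following sketch contrasts a plain vehicle count with an area-based occupancy computed from instance masks. It assumes that a segmentation model such as Mask R-CNN has already produced one boolean mask per vehicle; the grid size, mask shapes, and the function name `spatial_occupancy` are hypothetical.

```python
import numpy as np


def spatial_occupancy(masks, road_roi) -> float:
    """Fraction of the road region covered by the union of vehicle masks.

    masks:    iterable of boolean arrays, one per detected vehicle
    road_roi: boolean array marking pixels that belong to the road
    """
    roi = np.asarray(road_roi, dtype=bool)
    road_area = int(roi.sum())
    if road_area == 0:
        return 0.0
    covered = np.zeros_like(roi)
    for mask in masks:
        covered |= np.asarray(mask, dtype=bool)  # union, so overlaps count once
    covered &= roi                               # ignore pixels off the road
    return float(covered.sum()) / road_area


# Toy 10 x 20 pixel scene in which every pixel belongs to the road.
road = np.ones((10, 20), dtype=bool)

bus = np.zeros_like(road)
bus[2:5, 2:14] = True   # 3 x 12 = 36 pixels

car = np.zeros_like(road)
car[6:8, 3:7] = True    # 2 x 4 = 8 pixels

occ_bus = spatial_occupancy([bus], road)  # scene with one bus
occ_car = spatial_occupancy([car], road)  # scene with one car

if __name__ == "__main__":
    print(f"one bus: count=1, occupancy={occ_bus:.2f}")
    print(f"one car: count=1, occupancy={occ_car:.2f}")
```

Both toy scenes contain exactly one vehicle, yet the bus covers several times more road area than the car, which is the distinction a count-based density estimate cannot express.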
This paper is organized as follows: Section II reviews
existing literature on advances in traffic signal control.
Section III discusses the methodology and model architecture.
Section IV presents empirical results and a comparative
study with related models. Finally, Section V presents the
conclusions of the study and the scope for future research.
II. LITERATURE REVIEW
A. CNN BASED DETECTOR
Significant advancements in deep learning architectures for
object detection have been documented over the years.
Krizhevsky et al. [7] introduced AlexNet, a deep convolu-
tional neural network with 650,000 neurons and 60 million
parameters, consisting of five convolutional and three fully
connected layers. This model employed the ReLU activation
function, which made training approximately six times
faster than with the traditional tanh activation, and used
dropout to mitigate overfitting. AlexNet demonstrated the feasibility
of training large models on GPUs, marking a milestone
in deep learning. Simonyan and Zisserman [8] developed
VGGNet, which increased network depth by stacking multi-
ple small convolutional layers and used max-pooling layers to
reduce the spatial dimensionality of feature maps. However,
VGGNet suffered from slow training speeds and a high num-
ber of parameters, limiting its efficiency despite improved
accuracy. Girshick et al. [2],[9] introduced RCNN, which
generated region proposals for object detection but suffered
from long training and testing times, complicating optimiza-
tion. Later, Fast RCNN improved upon RCNN by integrating
RoI pooling to enhance accuracy and speed in feature
extraction. He et al. [1] demonstrated that SPP-net improved
detection by extracting fixed-length feature vectors from the
entire image feature maps, enabling multi-scale object han-
dling. Szegedy et al. [10] enhanced detection by incorporating
inception modules in GoogleNet, which efficiently combined
features at multiple scales. Ren et al. [11] introduced Region
Proposal Networks (RPN) to generate object proposals within
the model, although detecting small objects remained
challenging. He et al. [12] addressed training difficulties with
ResNet, utilizing residual learning to enable effective training
of substantially deeper networks. Dai et al. [13] proposed
Feature Pyramid Networks (FPN) to integrate multi-scale
features for improved detection across different object sizes.
Redmon et al. [14] proposed YOLO, which reframes object
detection as a regression problem; this enables real-time detection
but struggles with small and crowded objects. Liu et al.
[15] introduced SSD, which employed hard negative mining
for accuracy improvement and supported real-time inference.
Further innovations include RetinaNet by Lin et al. [16], which
introduced focal loss to handle class imbalance by down-weighting
easy examples so that training focuses on hard ones. Huang et al. [17] proposed DenseNet,
emphasizing feature reuse and improved information flow by
concatenating inputs with residual outputs. Chen et al. [18]
presented Dual Path Network (DPN), combining strengths of
ResNet and DenseNet to enhance feature reusability along-
side exploring new features. Howard et al. [19] optimized
computational costs for mobile platforms with MobileNet;
Cai et al. introduced Cascade RCNN, which refines proposals
in a cascaded manner [20]; and Law and Deng proposed
CornerNet, which detects objects as pairs of corners [21].
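To make the focal loss idea concrete, the following is a minimal scalar sketch of the binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The defaults gamma = 2 and alpha = 0.25 are the values commonly cited for RetinaNet; the function name and the example probabilities are assumptions made for illustration.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class
    y: ground-truth label (1 = object, 0 = background)
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy background example (confidently rejected) versus a hard one.
easy_negative = binary_focal_loss(p=0.01, y=0)
hard_negative = binary_focal_loss(p=0.90, y=0)

if __name__ == "__main__":
    print(f"easy negative: {easy_negative:.6f}")
    print(f"hard negative: {hard_negative:.6f}")
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check.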
Recent advancements in YOLOv8-based architectures
have demonstrated high efficacy in object detection, par-
ticularly for vehicle detection in complex and dynamic
environments. In [22], the YOLOv8 model was applied to car
detection tasks under varying conditions, including diverse
image resolutions, vehicle scales, and densely populated
scenes. The model achieved approximately 98% accuracy on
two datasets and exhibited strong capabilities in both vehi-
cle recognition and speed estimation. Notably, it performed
well in multi-directional vehicle counting at intersections,
indicating its suitability for dynamic traffic signal control.
Despite these strengths, challenges such as detecting small,
multi-scale, and occluded objects—especially in remote sens-
ing imagery—persist. To address these limitations, Liu et
al. [23] proposed an enhanced YOLOv8 variant by replac-
ing the conventional C2f module with a DIBLayer, which
expanded the receptive field while minimizing feature loss.
The integration of a TA attention module further improved
focus on small targets without significantly increasing model
complexity. A cross-layer feature fusion mechanism was also
introduced to boost detection accuracy while maintaining
lightweight architecture. Additionally, Focal-EIOU loss was
employed to accelerate training convergence and enhance
bounding box prediction precision. Further progress was
reported in [24], where the CSD-YOLO model was devel-
oped specifically for detecting small targets on water surfaces
under challenging conditions. It utilized a WODown down-
sampling module featuring a three-branch fusion structure
to better retain fine-grained details. The C2f-GC module,
incorporating gated convolution, improved object differen-
tiation. Moreover, a dynamic detection head (Dyhead) was
integrated to adaptively regulate the receptive field, which led
to significant performance gains.
To further overcome persistent issues such as missed
detections and false positives, Wang et al. [25] introduced
architectural modifications to the YOLOv8n model. These
included the AFP module (Attention Scale Sequence Fusion
with the P2 layer) to enhance small object feature extraction.
A lightweight convolutional module (LWConv) was also proposed
and used to convert the standard C2f block into LW_C2f, reducing
model size and computational cost. Additionally, Wise-IoU
loss was incorporated to improve bounding box accuracy and
mitigate the effects of low-quality samples.
Together, these studies highlight continuous progress in
YOLOv8-based object detection, especially for small and
complex targets, through ongoing innovations in network
architecture, attention mechanisms, feature fusion, and loss
functions. These improvements are crucial for practical appli-
cations in traffic management, remote sensing, and other
real-world environments.
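The bounding box losses mentioned above (Focal-EIOU and Wise-IoU) build on the intersection-over-union overlap measure. The sketch below computes plain IoU only, not either of those loss variants, and assumes boxes are given as (x1, y1, x2, y2) corner coordinates.

```python
def iou(box_a, box_b) -> float:
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(f"IoU = {iou(predicted, ground_truth):.4f}")
```

A simple IoU-based regression loss is 1 - IoU; the cited variants add further terms or weighting on top of this quantity.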
B. REINFORCEMENT LEARNING (RL)-BASED OBJECT
DETECTION
RL has increasingly been explored as an alternative to tradi-
tional object detection methods. Initial investigations demon-
strated the feasibility of RL through Deep Q-learning and
hierarchical search strategies, achieving competitive results
with methods like RCNN [26],[27]. However, these early
approaches encountered challenges in detecting a variable
number of objects. To address this, Li et al. [28] incorporated
prior knowledge using restricted Edge Boxes [29], enhancing
candidate box quality and improving accuracy and recall.
These methods frequently employed an inhibition-of-return
(IoR) mechanism to manage the detection of multiple objects.
Later advancements in RL-based object detection intro-
duced more sophisticated models. Ba et al. [30] developed
a deep recurrent attention model (RAM), trained via RL, for
multi-object recognition. This was further refined by integrat-
ing clues or constraints in the clued RAM to guide the agent’s
search and improve efficiency. Current research explores
diverse applications of RL in object detection [31], includ-
ing efficient detection, weakly supervised detection [32],
and object segmentation [33]. The integration of RL with
rapid proposal generation techniques, such as YOLO-V3, and
multi-agent systems using Independent Q-Learners has also
been investigated for object tracking [32].
Choi and Ha [34] proposed a novel approach that sim-
plifies labeling by using image-level object counts instead
of bounding boxes. Their method, utilizing an actor-critic
framework, learns to generate bounding boxes. However, this
approach is susceptible to counting errors, potentially impact-
ing detection performance. Furthermore, its generalization
to completely novel environments may require additional
training. The reliance on a pre-trained evaluation model also
limits its performance to the quality and diversity of the
training data. Additionally, the method does not identify object
classes and relies on accurate object counts, which makes it
difficult to calculate the total space occupied by vehicles.
While achieving comparable performance to transformer-
based approaches, a more thorough comparison with diverse
existing methods is needed.
Zhou et al. [35] introduced ReinforceNet, a framework
designed to overcome limitations in existing RL-based object
detection methods. It features an enhanced reward function
and incorporates a region selection network (RS-net) and
a bounding box refinement network (BBR-net) to improve
accuracy. However, the complexity of ReinforceNet, involv-
ing multiple RL agents and CNN-based features, may require
substantial computational resources and expertise. While
promising on datasets like PASCAL VOC, its scalability and
generalization to real-world scenarios require further evalu-
ation. The method’s reliance on pre-trained models, such as
VGG16, and the challenges in optimizing the reward function
and network components also present limitations. Further-
more, its computational demands raise concerns about its
suitability for real-time applications.
C. TRAFFIC SIGNAL MANAGEMENT BASED ON DIFFERENT
APPROACHES
This subsection discusses recent advances in traffic signal
control, focusing on methodologies that employ LSTM and
machine learning, particularly DRL and RL, to enhance
traffic efficiency and responsiveness.
Several studies have explored the application of LSTM
networks in traffic management and prediction. For instance,
research presented in [36] utilizes LSTM models to forecast
the transition time of actuated traffic signals, specifically the
shift from green to red. This approach leverages existing
signal timing parameters as inputs, rather than attempting to
estimate them. Furthermore, LSTM networks have demon-
strated effectiveness in travel time estimation [37]. These
models, designed to handle sequential data, contribute to
improved navigation and logistics by providing more
accurate travel time predictions. This body of research, along
with the comprehensive reviews in [38], [39], and [40],
provides a foundational understanding for our investigation.
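Sequence models such as LSTMs are usually trained on fixed-length windows of past observations paired with a future target. The sketch below shows one common way to frame a traffic time series this way; the function name, the lookback length, and the sample occupancy values are assumptions for illustration and are not drawn from the cited studies.

```python
def make_windows(series, lookback: int, horizon: int = 1):
    """Split a time series into (input window, target) pairs.

    Each input holds `lookback` consecutive observations; the target is the
    value `horizon` steps after the end of that window.
    """
    inputs, targets = [], []
    last_start = len(series) - lookback - horizon
    for start in range(last_start + 1):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback + horizon - 1])
    return inputs, targets


# Hypothetical occupancy readings, one per signal cycle.
occupancy_series = [0.20, 0.25, 0.40, 0.55, 0.50, 0.35]
X, y = make_windows(occupancy_series, lookback=3)

if __name__ == "__main__":
    for window, target in zip(X, y):
        print(window, "->", target)
```

Each pair can then be fed to a sequence model as one training example, with the window as the input sequence and the target as the value to forecast.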
Recent advancements in DRL, multi-agent systems, and
connected vehicle technologies have spurred significant
progress in intelligent traffic signal control. Several models
have emerged, each providing unique approaches to opti-
mize traffic flow. Kodama et al. [41] introduced a DRL
system within the SUMO simulation environment, focusing
on reinforcing successful traffic signal control strategies. This
methodology demonstrated a substantial 33% reduction in
waiting times compared to traditional systems. However, the
computational demands of this approach present a potential
barrier to real-time deployment. In another development, the
PV-TSC system [42] leverages RL to manage traffic signals
for both pedestrians and vehicles, particularly within the
context of 6G networks. This system integrates pedestrian
and vehicle traffic, enabling comprehensive control of both
flows. While this system effectively addresses pedestrian and
vehicle interaction, its primary focus may limit its appli-
cability in environments with diverse transportation modes.
To address network-wide traffic management, a multi-agent
reinforcement learning (MARL) system utilizing a hierar-
chical Nash-Stackelberg game model was proposed [43].
This approach aims to improve coordination across multi-
ple traffic signals. However, the computational complexity
of this model scales with network size, potentially hinder-
ing real-time performance. Further exploring network-level
control, a conflict graph approach was developed to model
interactions between traffic signals in SUMO [44]. This
method facilitates network-wide traffic control by explicitly
considering signal conflicts. Nevertheless, its practical imple-
mentation requires significant infrastructure coordination and
computational resources, and its validation with real-world
data remains limited. Addressing mixed-autonomy traffic,
a coupling control system was designed for isolated intersec-
tions, employing pseudo-platoons [45]. This system adapts
to the dynamic nature of traffic, including both automated
and human-driven vehicles. However, its implementation
necessitates substantial infrastructure coordination. Dai et al.
[46] presented a neighborhood cooperative MARL system
for adaptive traffic signal control, particularly targeting epi-
demic regions. This system, validated in both simulated and
real-world scenarios, prioritizes intersections based on their
importance and fosters cooperative behavior among signal
agents. However, the system’s reliance on extensive data
sharing raises potential privacy concerns.
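The RL-based controllers surveyed above share a common core: a value estimate for each (state, action) pair that is updated from observed rewards. The sketch below shows the tabular Q-learning update on a toy signal-control state. It is a deliberately simplified stand-in for the deep and multi-agent variants in the cited works; the state encoding, action names, reward, and hyperparameters are all hypothetical.

```python
from collections import defaultdict

ACTIONS = ("keep_phase", "switch_phase")  # hypothetical action set


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step.

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Q-table with a default value of 0.0 for unseen (state, action) pairs.
q_table = defaultdict(float)

# Toy transition: state = (active phase, queue length on the waiting approach),
# reward = negative queue length, so shorter queues are preferred.
first = q_update(q_table, ("NS_green", 8), "switch_phase",
                 reward=-8.0, next_state=("EW_green", 3))
second = q_update(q_table, ("NS_green", 8), "switch_phase",
                  reward=-8.0, next_state=("EW_green", 3))

if __name__ == "__main__":
    print(f"after 1 update: {first:.2f}, after 2 updates: {second:.2f}")
```

Deep RL replaces the table with a neural network, which is one source of the computational cost noted for several of the surveyed systems.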
Grumert and Pereira [47] proposed a heads-up green sys-
tem in connected traffic signals to enhance traffic efficiency,
which includes information-sharing capabilities between
traffic signals and connected vehicles. This system could
improve travel times by up to 15% during peak demand,
particularly in queue situations, with connected vehicles
experiencing the most significant benefits; even low
connected-vehicle shares (5-15%) showed notable improvements. However, this
approach relies heavily on infrastructure support for effec-
tive communication among connected vehicles. Wang et al.
[48] developed a network-wide traffic signal control model
using bilinear system modeling and adaptive optimization
within the VISSIM traffic simulation environment, specifi-
cally for a 35-intersection network in Bellevue, Washington.
This model effectively integrates bilinear system modeling
to account for interactions between traffic delays and signal
timing splits, although practical implementation may neces-
sitate calibration and tuning. Ma et al. [49] introduced a
novel deep neural network model that employs an actor-critic
architecture to extract and represent recent traffic condi-
tions through temporally sequential image representations
of intersections. Utilizing temporal traffic patterns, this deep
actor-critic method consistently outperforms existing DRL
techniques; however, its computational complexity may be
high for real-time applications. Hussain et al. [50] proposed a
method to improve traffic flow efficiency during yellow inter-
vals at signalized intersections by combining a countdown
strategy with a green LED light to inform drivers of upcom-
ing signal changes. The study collected various data points,
including vehicle positions, travel speeds, and instances of
traffic violations, but implementation may require integra-
tion with existing signal systems. Xu et al. [51] conducted