Journal of Water Resources & Environmental Engineering - No. 87 (12/2023)
Application of deep learning in water surface detection
for Dong Hoi city using Sentinel-1 images
Nguyen Cam Van¹, Dinh Viet Tu², Van Ngoc An², Dinh Nhat Quang¹*
Abstract:
Efficient water resource management is a critical mandate for governmental authorities, as it
directly impacts the effective utilization of this invaluable natural resource. The rapid and
accurate extraction of water surfaces significantly influences governmental decision-making.
Leveraging the advanced capabilities of high-resolution satellite imagery and precise orbital data,
this study employs state-of-the-art deep learning techniques to enhance the efficiency of water
surface detection. Specifically, Sentinel-1 data acquired from Google Earth Engine is used as the
primary input for the proposed machine-learning models. With satellite images covering the whole of
Quang Binh province, the analysis detects 15.96 km of water surfaces along the Nhat Le River and
2.8 km² of surface area for the Phu Vinh reservoir. The evaluation metrics, i.e., Overall Accuracy
and Kappa, are approximately 0.9, indicating the robustness and potential of the results.
Keywords: Deep learning, Dong Hoi city, Google Earth Engine, Sentinel-1, water surfaces.
1. Introduction
Water resources hold immense significance,
especially for agrarian nations like Vietnam,
playing a pivotal role in agricultural practices
such as the regulation of irrigation water as well
as urban development planning. Quang Binh
province has diverse water resources, consisting
of an extensive river network (with 5 main river
systems, i.e. Roon, Gianh, Ly Hoa, Dinh, and
Nhat Le), 153 lakes and reservoirs, and a long
coastline (Figure 1). The Phu Vinh reservoir
in Dong Hoi city serves as a linchpin for
supplying irrigation water to the surrounding
agricultural zones. Additionally, the Nhat Le river
courses through the city, with rapid urbanization
along its banks. Consequently, monitoring
water surfaces in Quang Binh in general and
Dong Hoi city in particular is crucial for
ensuring the sustainable growth of urban areas,
establishing safe flood escape routes when
floods occur, and helping local authorities
manage water distribution from the Phu Vinh
reservoir effectively.

¹ Thuyloi University
² ARS Vietnam Company Limited, LePARC center
* Corresponding author
Received 17th Oct. 2023
Accepted 6th Dec. 2023
Available online 31st Dec. 2023

Over the years, different methods have been
proposed for detecting and extracting water
surfaces. Some researchers have employed band
ratio image analysis on optical-sensor imagery
to delineate water surfaces (Fisher et al., 2016;
Quang et al., 2021). This approach primarily
relies on discerning the spectral disparities
between water surfaces and other features.
However, its effectiveness hinges on image
quality and the clarity of spectral characteristics,
rendering it unsuitable for images obscured by
over 20% cloud cover. Alternatively, the single
threshold segmentation method, which mainly
leverages spectral differences between water
surfaces and other objects (e.g. land, vegetation,
and urban features) in specific spectral bands, is
also employed. While effective for larger water
surfaces, it proves less efficient in regions
where pixels exhibit a mix of water and non-
water. Additionally, several studies have
harnessed Machine Learning (ML) techniques
to extract water surfaces by implementing
some traditional algorithms such as Support
Vector Machine (Liu et al., 2020) and
Decision Tree (Chen et al., 2018) with
favorable outcomes. These methods, however,
necessitate expertise in ML model
construction. Another study deployed a Deep
Learning model based on the U-Net
architecture on a Landsat-8 optical imagery
dataset, using labels derived from the
Normalized Difference Water Index (NDWI)
(Ch et al., 2022). While this approach yielded
commendable results, it is also contingent on
cloud-free coverage and favorable weather
conditions.
Figure 1. Location of the case study in Quang Binh province
Given the challenges posed by existing
methodologies, this study employs Deep
Learning (DL) techniques on Sentinel-1 radar
imagery acquired from the Google Earth Engine
(GEE) platform for water surface detection.
This research focuses on a case study
involving the Phu Vinh reservoir, the Nhat Le
river, and the seaside at Nhat Le beach in Dong
Hoi city, Quang Binh province. The primary
objectives encompass: (1) leveraging radar
images to enhance temporal resolution for
precise monitoring of three kinds of water
surfaces: reservoir, river, and sea; (2)
substituting optical imagery, which is often
obscured by cloud cover, with radar images; and
(3) employing the U-Net model in conjunction
with automatic labels to train models to extract
water surfaces quickly and effectively.
2. Case study and data collection
2.1. Case study
Situated in the North Central coastal region
of Vietnam, Quang Binh province covers
a natural area of 8,066 km². Its complex
topography is characterized by narrow terrain
that descends from west to east. The
hydrological network is dense, with stream
density averaging 0.6 to 1.85 km/km². Five
main river systems, namely Roon, Gianh, Ly Hoa,
Dinh and Nhat Le, cover an area of 7,944 km²,
with a total length of 367 km. Notably, the
Gianh and Nhat Le river basins together account
for 92% of the total area of the five basins.
This study focuses on three distinct water
surfaces within Dong Hoi city, i.e., the Phu
Vinh reservoir, the Nhat Le river, and the
seaside at Nhat Le beach (Figure 1).
2.2. Data collection
Sentinel-1 is a satellite constellation
developed by the European Space Agency to
monitor natural disasters through the
acquisition of high-resolution data. The
Sentinel-1 data used here are Level-1 Ground
Range Detected (GRD) products, which consist
of focused Synthetic Aperture Radar (SAR)
data. SAR has the notable advantage of
operating at wavelengths unaffected by cloud
cover or low illumination.
Furthermore, radar images are acquired from
the cloud-based image processing platform
known as Google Earth Engine (GEE). GEE
offers users access to an extensive repository of
satellite imagery and facilitates the processing
of large-scale geospatial data, making it
accessible even to non-experts in information
technology (Gorelick et al., 2017).
The evaluation data were gathered throughout
2022, on the dates when Sentinel-1 passed over
the area encompassing the Phu Vinh reservoir
and the Nhat Le River, as illustrated in
Figure 1. For training and validation, three
additional regions were selected, namely the
Gianh basin and its adjacent area (in Quang
Binh province) and Thua Thien Hue and Da Nang
provinces, which exhibit similar features and
weather conditions (see Table 1 and Figure 2).
Table 1. List of Sentinel-1 images used for training and validation
| Product | Date | Polarization | Region |
|---|---|---|---|
| S1A_IW_GRDH_1SDV_20221202T105652_20221202T105717_046152_05866A_23A6 | December 2nd, 2022 | VH + VV | Thua Thien Hue and Da Nang |
| S1A_IW_GRDH_1SDV_20221207T110518_20221207T110543_046225_058903_EC81 | December 7th, 2022 | VH + VV | Gianh basin and its adjacent area |
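
The product names in Table 1 follow the standard Sentinel-1 naming convention, so the acquisition time and polarization can be read directly from the name. A minimal sketch of such a parser is given below. The function name `parse_s1_product_name` and the keys of the returned dictionary are illustrative choices and are not part of the workflow described in this paper.

```python
from datetime import datetime

# Polarization codes used in Sentinel-1 product names.
POLARIZATION_CODES = {
    "SH": ["HH"],
    "SV": ["VV"],
    "DH": ["HH", "HV"],
    "DV": ["VV", "VH"],
}


def parse_s1_product_name(name):
    """Split a Sentinel-1 product name into its underscore-separated fields."""
    parts = name.split("_")
    if len(parts) != 9:
        raise ValueError(f"unexpected Sentinel-1 product name: {name!r}")
    mission, mode, type_res, level_class_pol, start, stop, orbit, datatake, uid = parts
    time_format = "%Y%m%dT%H%M%S"
    return {
        "mission": mission,                      # e.g. S1A
        "mode": mode,                            # e.g. IW (Interferometric Wide swath)
        "product_type": type_res[:3],            # e.g. GRD
        "resolution": type_res[3],               # e.g. H (high)
        "processing_level": level_class_pol[0],  # e.g. 1
        "polarizations": POLARIZATION_CODES[level_class_pol[2:]],
        "start": datetime.strptime(start, time_format),
        "stop": datetime.strptime(stop, time_format),
        "absolute_orbit": int(orbit),
        "datatake_id": datatake,
        "unique_id": uid,
    }


info = parse_s1_product_name(
    "S1A_IW_GRDH_1SDV_20221202T105652_20221202T105717_046152_05866A_23A6"
)
print(info["start"].date(), info["polarizations"])
```

The `DV` code in both product names corresponds to the dual VV + VH polarization listed in the table.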
Figure 2. Region of interest for training and validation:
(a) Gianh basin and its adjacent; (b) Thua Thien Hue; (c) Da Nang
3. Methodology
3.1. Deep Learning models
Deep Learning models, or neural network
(NN) models, are a category of machine
learning models that draw inspiration from
biological systems, particularly the intricate
processes of the human brain. Among image
segmentation techniques, the U-Net architecture
stands out as one of the most renowned
networks, built upon the principles of fully
convolutional neural networks (CNNs). A
standard CNN architecture typically
incorporates various components, including
convolutional layers, activation layers (such as
the Rectified Linear Unit, or ReLU), pooling
layers (e.g., max pooling), and fully connected
layers. Figure 3a illustrates a common CNN
architecture applied to the well-known MNIST
handwritten digit recognition problem.
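
To make the roles of the CNN components listed above concrete, the following sketch chains a convolution, a ReLU activation, and a 2x2 max pooling on a tiny image using plain Python lists. The 5x5 image and the vertical-edge kernel are made-up illustrative values. A real model would learn its kernels and use a deep learning framework rather than nested lists.

```python
def conv2d_valid(image, kernel):
    """2-D cross-correlation without padding, the core of a convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Rectified Linear Unit: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block (odd edges are dropped)."""
    h = len(feature_map) // 2 * 2
    w = len(feature_map[0]) // 2 * 2
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


# Toy image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical kernel that responds to a left-to-right increase in brightness.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = max_pool_2x2(relu(conv2d_valid(image, kernel)))
print(features)
```

The convolution responds only where the window straddles the dark-to-bright boundary, and pooling then condenses that response into a smaller feature map.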
Image segmentation is a widely recognized
task in computer vision that partitions an image
into multiple distinct segments or regions, each
of which corresponds to a different object or
part of an object. The goal of image
segmentation is to assign a label or category to
every pixel in the image, so that pixels with
similar attributes are grouped together.
Ronneberger et al. (2015) introduced the U-Net
architecture, which was developed for biomedical
image segmentation. The network consists of a
contracting path and an expansive path, which
together give it a U-shaped configuration
(Figure 3b). This network is now widely applied
in other domains, not just biomedical imaging.
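
The U shape can be illustrated by tracing how the feature-map size and channel count change along the two paths. The sketch below is only a bookkeeping exercise under stated assumptions: a 256x256 input tile, two input channels (VV and VH), 16 base filters, four levels, and "same" padding so that spatial size changes only through pooling and up-sampling. None of these values are reported in this paper, and the original U-Net of Ronneberger et al. (2015) uses unpadded convolutions.

```python
def unet_shapes(size=256, in_channels=2, base_filters=16, depth=4):
    """Trace (stage, height/width, channels) through a U-Net with 'same' padding."""
    if size % (2 ** depth) != 0:
        raise ValueError("size must be divisible by 2**depth")
    shapes = [("input", size, in_channels)]
    skips = []
    filters = base_filters

    # Contracting path: convolutions, then 2x2 pooling halves the spatial size.
    for level in range(depth):
        shapes.append((f"down{level}", size, filters))
        skips.append((size, filters))  # kept for the skip connection
        size //= 2
        filters *= 2

    shapes.append(("bottleneck", size, filters))

    # Expansive path: up-sampling doubles the size and meets the stored skip.
    for level in reversed(range(depth)):
        size *= 2
        filters //= 2
        assert (size, filters) == skips[level]  # shapes must match to concatenate
        shapes.append((f"up{level}", size, filters))

    shapes.append(("output", size, 1))  # one channel: water / non-water
    return shapes


for stage, hw, channels in unet_shapes():
    print(f"{stage:<10} {hw:>4} x {hw:<4} channels={channels}")
```

Each level of the expansive path recovers exactly the resolution of the matching level of the contracting path, which is what allows the skip connections to be concatenated.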
Figure 3. (a) A typical CNN architecture; (b) U-Net architecture
3.2. Evaluation metrics
Table 2 shows the confusion matrix-based
approach used in this study to estimate the
Overall Accuracy (OA) and Kappa coefficient
(Kappa) of the classified images. Comparing
classified images with reference data yields
four types of classified pixels: True positive
(TP), False negative (FN), False positive (FP),
and True negative (TN). From these four types,
the OA and Kappa coefficient are calculated with
formulas (1) and (2).
Table 2. Confusion matrix
| Classified data | Reference data: Water | Reference data: Non-water |
|---|---|---|
| Water | TP | FP |
| Non-water | FN | TN |
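$$OA = \frac{TP + TN}{T} \tag{1}$$

$$Kappa = \frac{T \times (TP + TN) - \Sigma}{T \times T - \Sigma} \tag{2}$$

where $\Sigma$ is the chance accuracy represented
by (TP + FP)*(TP + FN) + (FN + TN)*(FP +
TN), and T is the total number of pixels in the
accuracy assessment.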
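
As a worked example of formulas (1) and (2), the sketch below computes both metrics from the four confusion-matrix counts, using the chance-accuracy term defined above. The counts are invented for illustration and are not results from this study.

```python
def overall_accuracy(tp, fp, fn, tn):
    """Share of correctly classified pixels among all assessed pixels."""
    total = tp + fp + fn + tn
    return (tp + tn) / total


def kappa_coefficient(tp, fp, fn, tn):
    """Kappa from confusion-matrix counts, correcting OA for chance agreement."""
    total = tp + fp + fn + tn
    chance = (tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)
    return (total * (tp + tn) - chance) / (total * total - chance)


# Hypothetical assessment of 100 pixels.
tp, fp, fn, tn = 45, 5, 5, 45
print(overall_accuracy(tp, fp, fn, tn))   # 0.9
print(kappa_coefficient(tp, fp, fn, tn))  # 0.8
```

With these counts the chance term is 50*50 + 50*50 = 5000, so Kappa = (100*90 - 5000) / (100*100 - 5000) = 0.8, which is lower than the OA of 0.9 because part of the agreement is expected by chance.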
3.3. Application of Deep learning models
3.3.1. Reference (label) data creation
SAR images suffer from a noise-like
phenomenon called speckle. The Lee filter is a
technique that suppresses speckle in radar image
processing, as described by formula (3). Once
speckle has been reduced by applying the Lee
filter, Otsu binary thresholding is applied to
obtain the reference image (Figure 4).
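$$Y_{ij} = \bar{K} + W \times (C - \bar{K}), \quad W = \frac{\sigma^2}{\sigma^2 + \sigma_n^2} \tag{3}$$

where $Y_{ij}$ is the output image, $\bar{K}$ denotes
the local mean in the scanning window centered
on the ij-th pixel, $C$ denotes the central element
in the window, $\sigma^2$ is the variance of the pixel
values in the current window, and $\sigma_n^2$ is the
speckle noise variance in the standard Lee formulation.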
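
The label-creation steps described in this subsection (Lee filtering followed by Otsu thresholding) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the window size of 3, the estimate of the noise variance as the mean of the local variances, the 256-bin histogram, and all function names are hypothetical choices. Pixels below the threshold are labeled as water because open water typically returns low backscatter.

```python
def local_stats(image, i, j, radius):
    """Mean and variance of the window centered on (i, j), truncated at borders."""
    h, w = len(image), len(image[0])
    vals = [
        image[a][b]
        for a in range(max(0, i - radius), min(h, i + radius + 1))
        for b in range(max(0, j - radius), min(w, j + radius + 1))
    ]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var


def lee_filter(image, window=3, noise_var=None):
    """Lee speckle filter: blend each pixel with its local mean."""
    h, w = len(image), len(image[0])
    radius = window // 2
    stats = [[local_stats(image, i, j, radius) for j in range(w)] for i in range(h)]
    if noise_var is None:
        # Assumption: estimate the noise variance as the mean local variance.
        noise_var = sum(var for row in stats for _, var in row) / (h * w)
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            mean, var = stats[i][j]
            denom = var + noise_var
            weight = var / denom if denom > 0 else 0.0
            row.append(mean + weight * (image[i][j] - mean))
        out.append(row)
    return out


def otsu_threshold(values, bins=256):
    """Threshold that maximizes the between-class variance of a histogram."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(k * c for k, c in enumerate(hist))
    best_k, best_between = 0, -1.0
    count0, sum0 = 0, 0.0
    for k in range(bins - 1):
        count0 += hist[k]
        sum0 += k * hist[k]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue
        between = count0 * count1 * (sum0 / count0 - (sum_all - sum0) / count1) ** 2
        if between > best_between:
            best_between, best_k = between, k
    return lo + (best_k + 1) * width  # upper edge of the best bin


def make_reference_label(image, window=3):
    """Despeckle, threshold, and return a binary mask (1 = water, 0 = non-water)."""
    filtered = lee_filter(image, window)
    threshold = otsu_threshold([v for row in filtered for v in row])
    return [[1 if v < threshold else 0 for v in row] for row in filtered]


# Toy backscatter values in dB: water on the left, land on the right.
toy = [[-22.0, -21.0, -23.0, -8.0, -9.0, -7.0] for _ in range(4)]
for row in make_reference_label(toy):
    print(row)
```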
Figure 4. Reference data creation
3.3.2. Training data creation
The SAR image is processed with two
methods before being fed to the U-Net model:
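- Method 1: Scale the SAR image to range
from 0 to 255, as shown by formula (4).

$$x_{scaled} = 255 \times \frac{x - x_{min}}{x_{max} - x_{min}} \tag{4}$$

where $x$ is the SAR pixel value, and $x_{min}$ and $x_{max}$ are the minimum and maximum pixel values.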
- Method 2: The threshold value for
separating water and non-water pixels
depends on the specific dataset, the time of
acquisition, atmospheric conditions, and the
nature of the water bodies being analyzed. The
raw Sentinel-1 images from 2022 indicate that a
suitable value for separating water and
non-water pixels is -18 dB. The SAR values were
therefore clipped to the range from -18 dB to
0 dB by formula (5), then scaled to the range
from 0 to 255 by formula (4).
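
The two preprocessing methods can be sketched as follows, assuming that the scaling to 0-255 is a min-max rescaling and that Method 2 rescales over the fixed clipping range of -18 dB to 0 dB. Both are assumptions made for illustration, and the function names are hypothetical.

```python
def scale_to_uint8(values, lo=None, hi=None):
    """Min-max scale backscatter values (dB) to integers in the range 0..255."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def clip(values, lo=-18.0, hi=0.0):
    """Limit every value to the closed interval [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def method_1(values):
    """Method 1: scale the raw values using their own minimum and maximum."""
    return scale_to_uint8(values)


def method_2(values, lo=-18.0, hi=0.0):
    """Method 2: clip to [-18 dB, 0 dB], then scale that range to 0..255."""
    return scale_to_uint8(clip(values, lo, hi), lo=lo, hi=hi)


# Hypothetical backscatter samples in dB.
samples = [-30.0, -18.0, -4.5, 0.0, 5.0]
print(method_1(samples))
print(method_2(samples))
```

Under Method 2 every value at or below -18 dB maps to 0, so the full 0-255 range is spent on the interval above the water and non-water separation value instead of being stretched by outliers.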
Figure 5. (a) Raw Sentinel-1 image; (b) its pixel value statistics