VNU INTERNATIONAL SCHOOL
Pose Estimation of Surgical Instruments
using Convolutional Neural Networks for
MIS Applications
Author:
Tran Duc Vinh
A thesis submitted in fulfillment of the requirements
for the degree of MICE
in the
VNU International School
June 22, 2025
Declaration of Authorship
I, Tran Duc Vinh, declare that this thesis titled, “Pose Estimation of Surgical Instruments using Convolutional Neural Networks for MIS Applications”, and the work presented in it are my own. I confirm that:
• This work was done wholly or mainly while in candidature for a research degree at this University.
• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.
• Where I have consulted the published work of others, this is always clearly attributed.
• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work.
• I have acknowledged all main sources of help.
• Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.
Signed:
Date:
Abstract
VNU International School
Pose Estimation of Surgical Instruments using Convolutional Neural Networks
for MIS Applications
by Tran Duc Vinh
Accurate detection and pose estimation of surgical instruments are critical for
computer-assisted interventions (CAI) and robotic-assisted surgeries. This research
proposes an innovative method for detecting and estimating the pose of multiple
surgical tools using the YOLOv8-pose model. A comprehensive dataset comprising
images of clippers, irrigators, and scissors was meticulously collected and annotated to train the model, facilitating precise localization and orientation estimation of these instruments during laparoscopic procedures.
The performance of the model was assessed using a test dataset across four variants of YOLOv8-pose. Notably, the YOLOv8n variant, characterized by its lightweight architecture with only 3 million parameters, exhibited superior performance in both
pose estimation and object detection tasks. For pose estimation, it achieved a Preci-
sion of 91%, Recall of 93%, mean Average Precision at IoU 0.5 (mAP@0.5) of 97.9%,
and mAP@0.5-0.95 of 88.7%, underscoring its capability to reliably track surgical in-
struments. In terms of object detection, the model recorded a Precision of 97.9%, Re-
call of 96.0%, mAP@0.5 of 99.2%, and mAP@0.5-0.95 of 64.6%, demonstrating robust
identification and real-time tracking of multiple instruments in surgical settings.
These findings affirm YOLOv8n as an exceptionally efficient model for real-time
surgical instrument tracking and pose estimation, rendering it highly suitable for
integration into robotic-assisted and minimally invasive surgical systems. Further-
more, this study establishes a foundation for extending the methodology to encom-
pass additional surgical instruments, thereby advancing automation and precision
in AI-driven surgical technologies.
Contents

Declaration of Authorship
Abstract
1 Introduction
  1.1 Rationale
  1.2 Aim and Objectives of the Study
    1.2.1 Aim
    1.2.2 Objectives
  1.3 Research Questions
  1.4 Methods of the Study
  1.5 Scope of the Study
  1.6 Main Contributions
2 Minimally Invasive Surgery and Pose Estimation
  2.1 Minimally Invasive Surgery
  2.2 Surgical Instruments and Pose Estimation
  2.3 Pose Estimation Methods
    2.3.1 Traditional Methods in Medical Pose Estimation
    2.3.2 Modern Deep Learning-Based Methods
    2.3.3 Key Considerations for Pose Estimation in Healthcare
  2.4 Deep Learning Models for Pose Estimation
    2.4.1 Convolutional Neural Networks (CNNs)
    2.4.2 Graph Convolutional Networks (GCNs)
    2.4.3 Hybrid Deep Learning Models
  2.5 YOLO Model and Its Applications
    2.5.1 Mechanism of YOLO
    2.5.2 Applications in Healthcare
    2.5.3 Advantages of YOLO
    2.5.4 Limitations of YOLO
3 Methodology
  3.1 YOLOv8 Model
    3.1.1 Architecture Overview
    3.1.2 Loss Functions
    3.1.3 YOLOv8-Pose: Extending to Surgical Tool Pose Estimation
  3.2 Collecting Datasets
    3.2.1 Surgical Instrument Dataset
    3.2.2 Data Labeling and Preprocessing
  3.3 Model Setup and Training
    3.3.1 Hyperparameter Configuration
    3.3.2 Optimization Strategy and Loss Function
  3.4 Model Performance Evaluation
4 Results and Discussions
  4.1 Model Training Results
    4.1.1 Training Configuration and Process
    4.1.2 Performance Across YOLOv8 Variants
    4.1.3 Discussion of Results
  4.2 Qualitative Results
    4.2.1 Detection and Pose Estimation Outcomes
    4.2.2 Pose Estimation with Keypoint Connection
    4.2.3 Trajectory Tracking and Movement Analysis
  4.3 Comparison with Other Methods
    4.3.1 Overview of Comparative Analysis
    4.3.2 Description of Compared Models
    4.3.3 Performance Metrics and Results
    4.3.4 Discussion
  4.4 Discussion on Model Strengths and Limitations
    4.4.1 Strengths
    4.4.2 Limitations
  4.5 Practical Applications and Future Prospects
    4.5.1 Practical Applications
    4.5.2 Future Prospects
5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work