Doctoral dissertation in Mathematics and informatics: Research on development of methods of graph theory and automata in steganography and searchable encryption

Chia sẻ: _ _ | Ngày: | Loại File: PDF | Số trang:98

Thêm vào BST

Báo xấu

20
lượt xem 3
download

Download Vui lòng tải xuống để xem tài liệu đầy đủ

The purpose of the dissertation is to research on the development of new and quality solutions using graph theory and automata, suggesting their applications in, and applying them to steganography and searchable encryption.

Chủ đề:

Bình luận(0) Đăng nhập để gửi bình luận!

Lưu

Nội dung Text: Doctoral dissertation in Mathematics and informatics: Research on development of methods of graph theory and automata in steganography and searchable encryption

MINISTRY OF EDUCATION AND TRAINING HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY —————————— Nguyen Huy Truong RESEARCH ON DEVELOPMENT OF METHODS OF GRAPH THEORY AND AUTOMATA IN STEGANOGRAPHY AND SEARCHABLE ENCRYPTION DOCTORAL DISSERTATION IN MATHEMATICS AND INFORMATICS Hanoi - 2020
MINISTRY OF EDUCATION AND TRAINING HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY —————————— Nguyen Huy Truong RESEARCH ON DEVELOPMENT OF METHODS OF GRAPH THEORY AND AUTOMATA IN STEGANOGRAPHY AND SEARCHABLE ENCRYPTION Major: Mathematics and Informatics Major code: 9460117 DOCTORAL DISSERTATION IN MATHEMATICS AND INFORMATICS SUPERVISORS: 1. Assoc. Prof. Dr. Sc. Phan Thi Ha Duong 2. Dr. Vu Thanh Nam Hanoi - 2020
DECLARATION OF AUTHORSHIP I hereby certify that I am the author of this dissertation, and that I have completed it under the supervision of Assoc. Prof. Dr. Sc. Phan Thi Ha Duong and Dr. Vu Thanh Nam. I also certify that the dissertation’s results have not been published by other authors. Hanoi, February 03, 2020 PhD. Student Nguyen Huy Truong Supervisors Assoc. Prof. Dr. Sc. Phan Thi Ha Duong Dr. Vu Thanh Nam
ACKNOWLEDGMENTS I am extremely grateful to Assoc. Prof. Dr. Sc. Phan Thi Ha Duong. I want to thank Dr. Vu Thanh Nam. I would also like to extend my deepest gratitude to Late Assoc. Prof. Dr. Phan Trung Huy. I would like to thank my co-workers from School of Applied Mathematics and Informatics, Hanoi University of Science and Technology for all their help. I also wish to thank members of Seminar on Mathematical Foundations for Computer Science at Institute of Mathematics, Vietnam Academy of Science and Technology for their valuable comments and helpful advice. I give thanks to PhD students of Late Assoc. Prof. Dr. Phan Trung Huy for sharing and exchanging information in steganography and searchable encryption. Finally, I must also thank my family for supporting all my work.
CONTENTS Page LIST OF SYMBOLS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii LIST OF ABBREVIATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 CHAPTER 1 PRELIMINARIES . . . . . . . . . . . . . . . . . . . . . . . 4 1.1 Basic Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.1 Strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.2 Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1.3 Deterministic Finite Automata . . . . . . . . . . . . . . . . . . . . 6 m 1.1.4 The Galois Field GF (p ) . . . . . . . . . . . . . . . . . . . . . . . 7 1.2 Digital Image Steganography . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3 Exact Pattern Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 1.4 Longest Common Subsequence . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.5 Searchable Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 CHAPTER 2 DIGITAL IMAGE STEGANOGRAPHY BASED ON THE GALOIS FIELD USING GRAPH THEORY AND AUTOMATA . . . . . 16 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2 The Digital Image Steganography Problem . . . . . . . . . . . . . . . . . . 18 2.3 A New Digital Image Steganography Approach . . . . . . . . . . . . . . . . 19 2.3.1 Mathematical Basis based on The Galois Field . . . . . . . . . . . . 19 2.3.2 Digital Image Steganography Based on The Galois Field GF (pm ) Using Graph Theory and Automata . . . . . . . . . . . . . . . . . . 21 2.4 The Near Optimal and Optimal Data Hiding Schemes for Gray and Palette Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 2.5 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 CHAPTER 3 AN AUTOMATA APPROACH TO EXACT PATTERN MATCHING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.2 The New Algorithm - The MRc Algorithm . . . . . . . . . . . . . . . . . . 41 3.3 Analysis of The MRc Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 47 3.4 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 CHAPTER 4 AUTOMATA TECHNIQUE FOR THE LONGEST COMMON SUBSEQUENCE PROBLEM . . . . . . . . . . . . . . . . . . 56 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 i
4.2Mathematical Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 4.3Automata Models for Solving The LCS Problem . . . . . . . . . . . . . . . 61 4.4Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.5Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 CHAPTER 5 CRYPTOGRAPHY BASED ON STEGANOGRAPHY AND AUTOMATA METHODS FOR SEARCHABLE ENCRYPTION . . 68 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 5.2 A Novel Cryptosystem Based on The Data Hiding Scheme (2, 9, 8) . . . . . 70 5.3 Automata Technique for Exact Pattern Matching on Encrypted Data . . . 74 5.4 Automata Technique for Approximate Pattern Matching on Encrypted Data 76 5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 LIST OF PUBLICATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 ii
LIST OF SYMBOLS Σ An alphabet Σ∗ The set of all strings on Σ ∅ The empty set The empty string |S| The number of elements of a set S |u| The length of a string u GF (p )m The Galois field is constructed from the polynomial ring Zp [x], where p is prime and m is a positive integer (GF (p ), +, ·) A vector space over the field GF (pm ) n m LCS(p, x) A longest common subsequence of p and x lcs(p, x) The length of a LCS(p, x) LeftID(u) The least element the leftmost location of u Rmp (u) The last component of LeftID(u) in p (I, M, K, Em, Ex) A data hiding scheme I A set of all image blocks with the same size and image format M A finite set of secret elements K A finite set of secret keys Em An embedding function embeds a secret element in an image block Ex An extracting function extracts an embedded secret element from an image block qcolour The number of different ways to change the colour of each pixel in an arbitrary image block I An image block M A secret element K A secret key Adjacent(cp , a) An adjacent vertex of cp c block A string of length c Pos p (z) The last position of appearance of z in p Mp An automaton accepting the pattern p Config(p) The set of all the configurations of p Wp (u) The weight of u in p Wp (C) The weight of C WConfig(p) The set of the weights of all the configurations of p i Wp (a) The weight of a at the location i in p Wmp (a) The heaviest weight of a in p W (a) The weight of a in p iii
LIST OF ABBREVIATIONS AOSO Average Optimal Shift Or BF Brute Force BFS Breadth First Search BMH Boyer Moore Horspool BNDM Backward Nondeterministic Dawg Matching CTL Chang Tseng Lin EBOM Extended Backward Oracle Matching ER Embedding Rate FJS Franek Jennings Smyth FOPA Fastest Optimal Parity Assignment FSBNDM Forward SBNDM HASH Hashing HCIH High Capacity of Information Hiding LBNDM Long BNDM LCS Longest Common Subsequence LSB Least Significant Bit MSDR Maximal Secret Data Ratio MSE Mean Square Error NP Nondeterministic Polynomial OPA Optimal Parity Assignment PA Parity Assignment PCT Pan Chen Tseng PSNR Peak Signal to Noise Ratio RGB Red Green Blue SA Shift Add SAE Searchable Asymmetric Encryption SBNDM Simplified BNDM SE Searchable Encryption SSE Searchable Symmetric Encryption TVSBS Thathoo Virmani Sai Balakrishnan Sekar WF Wagner Fischer WL Wu Lee iv
LIST OF FIGURES Figure 1.1. A simple graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Figure 1.2. A spanning tree of the graph given in Figure 1.1 . . . . . . . . . . . 6 Figure 1.3. The transition diagram of A in Example 1.3 . . . . . . . . . . . . . 7 Figure 1.4. The basic diagram of digital image steganography . . . . . . . . . . . 9 Figure 1.5. The degree of appearance of the pattern p . . . . . . . . . . . . . . . 12 Figure 2.1. The nine commonly used 8-bit gray cover images sized 512 × 512 pixels 35 Figure 2.2. The nine commonly used 8-bit palette cover images sized 512 × 512 pixels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Figure 2.3. The binary cover image sized 2592 × 1456 pixels . . . . . . . . . . . 36 Figure 3.1. Sliding window mechanism . . . . . . . . . . . . . . . . . . . . . . . 40 Figure 3.2. The basic idea of the proposed approach . . . . . . . . . . . . . . . . 44 Figure 3.3. The transition diagram of the automaton Mp , p = abcba . . . . . . . 46 v
LIST OF TABLES Table 1.1. An adjacency list representation of the simple graph given in Figure 1.1 5 Table 1.2. The performing steps of the BF algorithm . . . . . . . . . . . . . . . . 11 Table 1.3. The dynamic programming matrix L . . . . . . . . . . . . . . . . . . 13 Table 2.1. Elements of the Galois field GF (22 ) represented by binary strings and decimal numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 Table 2.2. Operations + and · on the Galois field GF (22 ) . . . . . . . . . . . . . 30 Table 2.3. The representation of E and the arc weights of G for the gray image . 31 Table 2.4. The payload, ER and PSNR for the optimal data hiding scheme (1, 2n − 1, n) for palette images with qcolour = 1 . . . . . . . . . . . . . 36 Table 2.5. The payload, ER and PSNR for the near optimal data hiding scheme (2, 9, 8) for gray images with qcolour = 3 . . . . . . . . . . . . . . . . . . 37 Table 2.6. The payload, ER and PSNR for the near optimal data hiding scheme (2, 9, 8) for palette images with qcolour = 3 . . . . . . . . . . . . . . . . 37 Table 2.7. The comparisons of embedding and extracting time between the chapter’s and Chang et al.’s approach for the same optimal data hiding scheme (1, N, blog2 (N + 1)c), where N = 2n − 1, for the binary image with qcolour = 1. Time is given in second unit . . . . . . . . . . . . . . 37 Table 3.1. The performing steps of the MR1 algorithm . . . . . . . . . . . . . . . 46 Table 3.2. Experimental results on rand4 problem . . . . . . . . . . . . . . . . . 51 Table 3.3. Experimental results on rand8 problem . . . . . . . . . . . . . . . . . 51 Table 3.4. Experimental results on rand16 problem . . . . . . . . . . . . . . . . . 52 Table 3.5. Experimental results on rand32 problem . . . . . . . . . . . . . . . . . 52 Table 3.6. Experimental results on rand64 problem . . . . . . . . . . . . . . . . . 53 Table 3.7. Experimental results on rand128 problem . . . . . . . . . . . . . . . . 53 Table 3.8. Experimental results on rand256 problem . . . . . . . . . . . . . . . . 54 Table 3.9. Experimental results on a genome sequence (with |Σ| = 4) . . . . . . . 54 Table 3.10. Experimental results on a protein sequence (with |Σ| = 20) . . . . . . 55 Table 4.1. The Refp of p = bacdabcad . . . . . . . . . . . . . . . . . . . . . . . . 59 Table 4.2. The comparisons of the lcs(p, x) computation time for n = 50666 . . . 66 Table 4.3. The comparisons of the lcs(p, x) computation time for n = 102398 . . 67 vi
INTRODUCTION In the modern life, when the use of computer and Internet is more and more essential, digital data (information) can be copied as well as accessed illegally. As a result, information security becomes increasingly important. There are two popular methods to provide security, which are cryptography and data hiding [2, 5, 6, 20, 56, 62, 81]. Cryptography is used to encrypt data in order to make the data unreadable by a third party [5]. Data hiding is used to embed data in digital media. Based on the purpose of the application, data hiding is generally divided into steganography that hides the existence of data to protect the embedded data and watermarking that protects the copyright ownership and authentication of the digital media carrying the embedded data. Steganography can be used as an alternative way to cryptography. However, steganography will become weak if attackers detect existence of hidden data. Hence integrating cryptography with steganography is as a third choice for data security [2, 5, 6, 12, 19, 61, 62, 81, 86, 93]. With the rapid development of applications based on Internet infrastructure, cloud computing becomes one of the hottest topics in the information technology area. Indeed, it is a computing system based on Internet that provides on-demand services from application and system software, storage to processing data. For example, when cloud users use the storage service, they can upload information to the servers and then access it on the Internet online. Meanwhile, enterprises can not spend big money on maintaining and owning a system consisting of hardware and software. Although cloud computing brings many benefits for individuals and organizations, cloud security is still an open problem when cloud providers can abuse their information and cloud users lose control of it. Thus, guaranteeing privacy of tenants’ information without negating the benefits of cloud computing seems necessary [28, 38, 40, 41, 60, 95, 102]. In order to protect cloud users’ privacy, sensitive data need to be encoded before outsourcing them to servers. Unfortunately, encryption makes the servers perform search on ciphertext much more difficult than on plaintext. To solve this problem, many searchable encryption techniques have been presented since 2000. Searchable encryption does not only store users’ encrypted data securely but also allows information search over ciphertext [26, 28, 29, 38, 40, 60, 71, 85, 102]. Searchable encryption for exact pattern matching is a new class of searchable encryption techniques. The solutions for this class have been presented based on algorithms for [26] or approaches to [41, 89] exact pattern matching. As in retrieving information from plaintexts, the development of searchable encryption with approximate string matching capability is necessary, where the search string can be a keyword determined, encrypted and stored in cloud servers or an arbitrary pattern [28, 40, 71]. From the above problems, together with methods using graph theory and automata proposed by P. T. Huy et al. of solving problems of exact pattern matching (2002), longest common subsequence (2002) and steganography (2011, 2012 and 2013), and their potential applications in steganography and searchable encryption, as well as under the direction of supervisors, the dissertation title assigned is research on development of methods of 1
graph theory and automata in steganography and searchable encryption. The purpose of the dissertation is to research on the development of new and quality solutions using graph theory and automata, suggesting their applications in, and applying them to steganography and searchable encryption. Based on the results and suggestions introduced by P. T. Huy et al., the dissertation will focus on following four problems in steganography and searchable encryption: - Digital image steganography; - Exact pattern matching; - Longest common subsequence; - Searchable encryption. The first problem is stated newly in Chapter 2, the three remaining problems are recalled and clarified in Chapter 1. For the first three problems, the dissertation’s work is to find new and efficient solutions using graph theory and automata. Then they will be used and applied to solve the last problem. The dissertation has been completed with structure as follows. Apart from Introduction at the beginning and Conclusion at the end of the dissertation, the main content of it is divided into five chapters. Chapter 1. Preliminaries. This chapter recalls basic knowledge indicated throughout the dissertation (strings, graph, deterministic finite automata, digital images, the basic model of digital image steganography, some parameters to determine the quality of digital image steganography, the exact pattern matching problem, the longest common subsequence problem, and searchable encryption), re-presents important concepts and results used and researched on development in remaining chapters of the dissertation (adjacency list, breadth first search, Galois field, the fastest optimal parity assignment method, the module method and the concept of the maximal secret data ratio, the concept of the degree of fuzziness (appearance), the Knapsack Shaking approach, and the definition of a cryptosystem). Chapter 2. Digital image steganography based on the Galois field using graph theory and automata. Firstly, from some proposed concepts of optimal and near optimal secret data hiding schemes, this chapter states the interest problem in digital image steganography. Secondly, the chapter proposes a new approach based on the Galois field using graph theory and automata to design a general form of steganography in binary, gray and palette images, shows sufficient conditions for existence and proves existence of some optimal and near optimal secret data hiding schemes, applies the proposed schemes to the process of hiding a finite sequence of secret data in an image and gives security analyses. Finally, the chapter presents experimental results to show the efficiency of the proposed results. Chapter 3. An automata approach to exact pattern matching. This chapter proposes a flexible approach using automata to design an effective algorithm for exact pattern matching in practice. In given cases of patterns and alphabets, the efficiency of the proposed algorithm is shown by theoretical analyses and experimental results. Chapter 4. Automata technique for the longest common subsequence problem. This chapter proposes two efficient sequential and parallel algorithms for computing the length of a longest common subsequence of two strings in practice, using automata technique. Theoretical analysis of parallel algorithm and experimental results 2
confirm that the use of the automata technique in designing algorithms for solving the longest common subsequence problem is the best choice. Chapter 5. Cryptography based on steganography and automata methods for searchable encryption. This chapter first proposes a novel cryptosystem based on a data hiding scheme proposed in Chapter 2 with high security. Additionally, ciphertexts do not depend on the input image size as existing hybrid techniques of cryptography and steganography, encoding and embedding are done at once. The chapter then applies results using automata technique of Chapters 3 and 4 to constructing two algorithms for exact and approximate pattern matching on secret data encrypted by the proposed cryptosystem. These algorithms have O(n) time complexity in the worst case, together with an assumption that the approximate algorithm uses d(1 − )me processors, where , m and n are the error of the string similarity measure proposed in this chapter and lengths of the pattern and secret data, respectively. In searchable encryption, the cryptosystem can be used to encode and decode secret data on users side and pattern matching algorithms can be used to perform pattern search on cloud providers side. The contents of the dissertation are written based on the paper [T1] published in, the revised manuscript [T4] submitted to KSII Transactions on Internet and Information Systems (ISI), and the papers [T2, T3] published in Journal of Computer Science and Cybernetics in 2019. The main results of the dissertation have been presented at: - Seminar on Mathematical Foundations for Computer Science at Institute of Mathematics, Vietnam Academy of Science and Technology, - The 9th Vietnam Mathematical Congress, Nha Trang, August 14-18, 2018, - Seminar at School of Applied Mathematics and Informatics, Hanoi University of Science and Technology. 3
CHAPTER 1 PRELIMINARIES This chapter will attempt to recall terminologies, concepts, algorithms and results which are really needed in order to present the dissertation’s new results clearly and logically, as well as help readers follow the content of the dissertation easily. The background knowledge re-presented here consists of basic structures (Section 1.1: strings (Subsection 1.1.1), graph (Subsection 1.1.2), deterministic finite automata (Subsection 1.1.3), and the Galois field GF (pm ) (Subsection 1.1.4)), digital image steganography (Section 1.2), exact pattern matching (Section 1.3 ), longest common subsequence (Section 1.4) and searchable encryption (Section 1.5). 1.1 Basic Structures 1.1.1 Strings In this dissertation, secret data are considered as strings. So, some terms related to strings will be recalled here [11, 24, 83]. A finite set Σ is called an alphabet. The number of elements of Σ is denoted by |Σ|. An element of Σ is called a letter. A string x of length n on the alphabet Σ is a finite sequence of letters of Σ and we write x = x[1]x[2]..x[n], x[i] ∈ Σ, 1 ≤ i ≤ n, where n is a positive integer. A special string is the empty string having no letters, denoted by . The length of the string x is the number of letters in it, denoted by |x|. Then || = 0. Notice that for the string x = x[1]x[2]..x[n], we can also write x = x[1..n] in short. The set of all strings on the alphabet Σ is denoted by Σ∗ . The operator of strings is concatenation that writes strings as a compound. The concatenation of the two strings u1 and u2 is denoted by u1 u2 . Let x be a string. A string p is called a substring of the string x, if x = u1 pu2 for some strings u1 and u2 . In case u1 = (resp. u2 = ), the string p is called a prefix (resp. suffix) of the string x. The prefix (resp. suffix) p is called proper if p 6= x. Note that the prefix or the suffix can be empty. 1.1.2 Graph Besides some basic concepts in graph theory, this subsection recalls the way representing a graph by adjacency lists and breadth first search [82]. These are used in Chapter 2. A finite undirected graph (hereafter, called a graph for short) G = (V, E) consists of a nonempty finite set of vertices V and a finite set of edges, where each edge has either one or two vertices associated with it. A graph with weights assigned to their edges is called a weighted graph. 4
Communication Channel Cover Cover Stego Stego Image Embed Extract Image Image Image Send An edge connecting a vertex to itself to is called a loop. Multiple edges are edges connecting the same vertices. A graph having no loops and no multiple edges is called a simple graph. In a simple graph, the edge associated to an unordered pair of vertices {i, j} is called the Secret edge {i, j}. Key Secret Key Two vertices i and j in a graph G are called adjacent if they are vertices of an edge of G. Sender Receiver A graph without multiple edges can be described by using adjacency lists, which specify adjacent vertices of any vertex of the graph. Example 1.1. Using adjacency lists, the simple graph given in Figure 1.1 can be represented as in Table 1.1. 1 2 4 2 1 3 5 4 Figure 1.1. A simple graph Table 1.1. An adjacency list representation of the simple graph given in Figure 1.1 Vertex Adjacent vertices 1 2, 3 2 1, 3, 4 3 1, 2, 4, 5 4 2, 3, 5 5 3, 4 Given a simple graph G, a subgraph of G that is a tree including every vertex of G is called a spanning tree of G. A spanning tree of a connected simple graph can be built by using breadth first search (BFS). This algorithm is shown in pseudo-code as follows. Breadth First Search: Input: A connected simple graph G with vertices ordered as i1 , i2 , . . . , in . Output: A spanning tree T . Set T to be a tree consisting only i1 ; Set L to be an empty list; Put i1 in L While (L is not empty) { Remove the first vertex i from L; 5
Secret Data Communication For each adjacent vertex j of i Channel StegoIf (j is not in L and T ) Cover Extract Image Image{ Send to Add j to the end of L; Add j and the edge {i, j} to T ; } Secret Key } Return T ; Receiver End. Example 1.2. For a graph given in Figure 1.1, a spanning tree of this graph is found by using BFS as in Figure 1.2. 1 2 4 2 3 3 5 4 5 Figure 1.2. A spanning tree of the graph given in Figure 1.1 A graph with directed edges (or arcs) is called a directed graph. Each arc is associated with the ordered pair of vertices. In a simple directed graph, the arc associated with the ordered pair (i, j) called the arc (i, j). And the vertex i is said to be adjacent to the vertex j and the vertex j is said to be adjacent from the vertex i. 1.1.3 Deterministic Finite Automata Study on the problem of the construction and the use of deterministic finite automata is one of objectives of the dissertation. Hence, this subsection will clarify this model of computation [44, 82]. Definition 1.1 ([44]). Let Σ be a finite alphabet. A deterministic finite automaton (hereafter, called an automaton for short) A = (Σ, Q, q0 , δ, F ) over Σ consists of: • A finite set Q of elements called states, • An initial state q0 , one of the states in Q, • A set F of final states. The set F is a subset of Q, • A state transition function (or simply, transition function), denoted by δ, that takes as arguments a state and a letter, and returns a state, so that δ : Q × Σ → Q, • The transition function δ can be extended so that it takes a state and a string, and returns a state. Formally, this extended transition function δ can be defined recursively by δ : Q × Σ∗ → Q such that ∀q ∈ Q, δ(q, ) = q, ∀s ∈ Σ∗ , ∀a ∈ Σ, δ(q, as) = δ(δ(q, a), s). 6
An alternative and simple way presenting an automaton is to use the notation “transition diagram”. A transition diagram of an automaton A = (Σ, Q, q0 , δ, F ) is a directed graph given as follows [44]. a) Each state of Q is a vertex. b) Let q 0 = δ(q, a), where q is a state of Q and a is a letter of Σ. Then the transition diagram has an arc (q 0 , q) labeled a. If there are several letters that cause transitions from q 0 to q, then the arc (q 0 , q) is labeled by a list of these letters. c) There is an arrow into the initial state q0 . This arrow does not originate at any vertex. d) States not in F have a single circle. Vertices corresponding to final states are marked by a double circle. Example 1.3. Consider an automaton A = (Σ, Q, q0 , δ, F ) over Σ = {a, b}, where Q = {q0 , q1 , q2 }, F = {q2 }, and δ is given by the following table. Then the transition diagram of A is shown in Figure 1.3. δ a b q0 q0 q1 q1 q2 q1 q2 q2 q2 a a, b q0 b q1 a q2 b Figure 1.3. The transition diagram of A in Example 1.3 Definition 1.2 ([82]). A string p is said to be accepted by the automaton A = (Σ, Q, q0 , δ, F ) Secret Datathe initial state q0 to a final state, it means that if it takes δ(qData Secret 0 , p) is a state in F . Communication 1.1.4 The Galois Field GF (pm ) Channel Cover Cover Embed how to Stego Stego Extract This Image subsection describes construct a finite field with pm elements, called the Image m Image Image Galois field GF (p ), where p is prime and m ≥ 1 is an integer [88]. The algebraic structure will be used in Chapter 2. Send to Let p be a prime number. Define Zp [x] to be the set of all polynomials with the variable x, whose coefficients belong to the field Zp . Addition and multiplication in Zp [x] are defined in the usual way andSecret Key the coefficients modulo p at the end. then reduce Secret Key For f (x) ∈ Zp [x], the degree of f (x), denoted by deg(f ), is the largest exponent of Sender x in f (x). A polynomial f (x) ∈ Zp [x] is called to be irreducible if there does Receiver not exist 7
polynomials f1 (x), f2 (x) ∈ Zp [x] such that f (x) = f1 (x)f2 (x), where deg(f1 ) > 0 and deg(f2 ) > 0. Let f (x) ∈ Zp [x] be an irreducible polynomial with deg(f ) = m ≥ 1. Define Zp [x]/(f (x)) to be the set of pm polynomials of degree at most m − 1 in Zp [x]. Addition and multiplication in Zp [x]/(f (x)) are given as in Zp [x], followed by a reduction modulo f (x). Then Zp [x]/(f (x)) with these operations is a field having pm elements, called the Galois field GF (pm ). Note that for p is prime and m ≥ 1, the Galois field GF (pm ) is unique. 1.2 Digital Image Steganography The interest problem in Chapter 2 is digital image steganography. This section will recall the concept of digital images, the basic model of digital image steganography, some parameters to determine the efficiency of digital image steganography and lastly re-present results researched on development and used in Chapter 2 such as the fastest optimal parity assignment (FOPA) method, the module method and the concept of the maximal secret data ratio (MSDR) [18, 20, 21, 39, 49, 50, 51, 53, 61, 63, 65, 76, 78, 104]. A digital image is a matrix of pixels. Each pixel is represented by a non negative integer number in the form of a string of binary bits. This value indicates the colour of the pixel [39]. Note that based on the way representing of colours of pixels, digital images can be divided into following different types [78]. 1. Binary image: Each pixel is represented by one bit. In this image type, the colour of a pixel is white, “1” value, or black, “0” value. 2. Gray image: Each pixel is typically represented by eight bits (called 8-bit gray image). Then the colour of any pixel is a shade of gray, from black corresponding to colour value “0” to white corresponding to colour value “255”. 3. Red green blue image: Each pixel is usually represented by a string of 24 bits (called 24-bit RGB image), where the first 8 bits, the next 8 bits and the last 8 bits corresponds to shades of red, green and blue, specifying the red, green and blue colour components of the pixel, respectively. Then the colour of the pixel is a combination of these three components. 4. Palette image: The colour of each pixel is not shown directly by the number representing the pixel as for RGB images. Instead, this number is a colour index of the colour of the pixel existed in the colour table (the palette), an ordered set of values (strings of 24 bits) which represent all colours as in RGB images used in the image and contained in the file with the image. The size of the palette is the same as the length of a bit string representing a pixel and is limited by 8 bits. For a string of 8 bits, call palette images 8-bit palette images. The objective of digital image steganography is to protect data by hiding the data in a digital image well enough so that unauthorized users will not even be aware of their existence [21, 18]. Figure 1.4 shows the basic model of digital image steganography, where the cover image is a digital image used as a carrier to embed secret data into, the stego image is digital image obtained after embedding secret data into the cover image by the 8
function block Embed with the secret key on the Sender side. For steganography generally, a a, b the secret data needs to be extracted fully by the block Extract with the secret key on the Receiver side [20, 61, 63, 76]. The total number of the secret data b sequence abits embedded in the cover image is called q q q a Payload. Corresponding to a certain Payload, to measure the embedding capacity of the 0 1 2 cover image, the embedding rate (ER) is used and defined as follows [104]. b Payload ER = (bpp), (1.1) W ×H where W and H are the cover image’s width and height. Secret Data Secret Data Communication Channel Cover Cover Stego Stego Image Embed Extract Image Image Image Send to Secret Key Secret Key Sender Receiver Figure 1.4. The basic diagram of digital image steganography The peak signal to noise ratio (PSNR) is used to evaluate quality of stego image. Based on the value of PSNR, we can know the degree of similarity between the cover image and stego image. If the PSNR value is high, then quality of stego image is high. Conversely, quality of stego image is low. In general, for the digital image, PSNR is defined by the following formula [20, 53] 2 4 2552 2 PSNR = 10 log10 (dB), (1.2) MSE where 1 PW −1 PH−1 i=0 i=0 ((B(i, j) − B 0 (i, j))2 + (G(i, j) 0 2 5 j) − R0 (i, j))2 ) 3 − G (i, j)) + (R(i, 4 MSE = , 3×W ×H where B(i, j), G(i, j), R(i, j), B 0 (i, j), G0 (i, j) and R0 (i, j) are the colour value of the Blue, Green and Red components of a pixel at position (i, j) in the cover and stego image, respectively. For human’s eyes, the threshold value of PSNR value is 30dB [20, 53, 65, 104], it means that the PSNR value is higher than 30dB, it is hard to distinguish between the cover image and its stego image. Let G be a palette image and P = {c1 , c2 , . . . , cn } be its palette, where ci is the colour of a pixel of G corresponding to the colour index i. Each colour c in P is considered as a vector consisting of red, green and blue components. Suppose d is a distance function on P . The FOPA method [50] tries to get functions Next, Next: P → P , and Val, Val: P → Z2 , where two conditions are satisfied for all c ∈ P as follows. 9
1. d(c, Next(c)) = minv6=c∈P d(c, v), 2. Val(c) =Val(Next(c)) + 1 on the field Z2 . Call GP = (VP , EP ) a weighted complete undirected graph of the palette image G, where VP = P and the weight of the edge {c, c0 } is d(c, c0 ). The function Nearest, Nearest: P → P , is given by Nearest(c) = c0 holding d(c, c0 ) = minv6=c∈P d(c, v). A rho forest F = (V, E) is a directed graph with vertices weighted by the functionVal, where V = VP , E is a set of all arcs (v, Next(v)), the vertex v has the weightVal(v) for all v ∈ V . The construction of a algorithm determining F is the essence of the FOPA method. Algorithm for FOPA: Input: A weighted complete undirected graph GP , the function Nearest. Output: A rho forest F = (V, E). Choose a vertext c ∈ P , set V = {c}, and set C = P \{c}; SetVal(c) = 0; // Or 1 randomly While (C is not empty) // Update F { a) Take one element v ∈ C; b) Initialize v0 = v, setVal(v0 ) = 0 (or 1 randomly), by a finite loop, find a longest sequence of k + 1 different elements in P consecutively, v0 , v1 , . . . , vk , such that Nearest(vi ) = vi+1 for ∀i = 0, k − 1, vi ∈ C, vk ∈ C or vk ∈ V , and set Next(vi ) = vi+1 , i = 0, k − 1; b1) Case vk ∈ C: SetVal(vi ) = 1+Val(vi−1 ), i = 1, k and Next(vk ) = vk−1 ; Set V = V ∪ {v0 , v1 , . . . , vk } and C = C\{v0 , v1 , . . . , vk }; b2) Case vk ∈ F : SetVal(vi ) = 1+Val(vi+1 ), i = k − 1, . . . , 1, 0; Set V = V ∪ {v0 , v1 , . . . , vk−1 } and C = C\{v0 , v1 , . . . , vk−1 }; } Return F ; End. Definition 1.3 ([51]). Let M be a module over the ring Zm , k > 0 be a natural number, and U be a subset of M \{0}. Call U a k-base of M if for any v in M \{0}, there exist t elements v1 , v2 , . . . , vt ∈ U, t ≤ k, together with a1 , a2 , . . . , at ∈ Zm such that v = v1 a1 + v2 a2 + .. + vt at . Let G be a digital image, call CG the set of all colours of pixels in G. Consider the case m = 2 and G is a binary image. Then CG = {0, 1}, and for n is a positive integer, the set M = Z2n = {(x1 , x2 , . . . , xn )|xi ∈ Z2 , i = 1, n} with element addition and scalar multiplication defined as usual is a module over the ring Z2 [49]. For k = 1, the set U = M \{0} is an unique 1-base of M [51]. Two functions Next, Next: CG → CG , and Val, Val: CG → Z2 , satisfying the condition Val(c) =Val(Next(c)) + 1 on the ring Z2 , are defined in [49]. Suppose that for N ≥ |U |, I = {I1 , I2 , . . . , IN } is an arbitrary image block of G, K = {K1 , K2 , . . . , KN |Ki ∈ Z2 , i = 1, N } is a secret key, d is any element in M , and h is a surjective function from I to U . In the module method, d is considered as a secret data, embedded in and extracted from the image block I with the key K by the blocks Embed and Extract as follows [49, 51]. 10