UNIVERSITY OF INFORMATION TECHNOLOGY
FACULTY OF INFORMATION SYSTEMS
Lecture notes: DATA MINING IS252
Chapter 2: DATA PREPROCESSING
MSc. Dương Phi Long Email: longdp@uit.edu.vn
LESSON CONTENT
01 Introduction
02 Data cleaning
03 Data integration
04 Data reduction
05 Data transformation and encoding
Introduction
1. Types of data sets
2. Data objects
3. Attributes
4. Data collection
5. Data quality
6. Data preprocessing
7. Data preprocessing techniques
Data
-Unstructured: texts in websites, emails, articles, tweets; 2D/3D images, videos + meta; spectrograms, DNAs, …
-Structured: relational (table-like)
1. Types of data sets
-Record
  Relational records
  Data matrix: numerical matrix, crosstabs
  Document data: text documents, term-frequency vector
  Transaction data
-Graph and network
  World Wide Web
  Social or information networks
  Molecular structures
Figure: examples of record data: a) Record, b) Data matrix, c) Transaction data, d) Document-term matrix
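To make the link between document data and term-frequency vectors concrete, here is a minimal sketch of how a document-term matrix can be built. The toy corpus, the whitespace tokenization, and the function name `document_term_matrix` are assumptions made for illustration only; they do not come from the lecture.

```python
from collections import Counter

# Hypothetical toy corpus, made up for this illustration.
documents = [
    "data mining finds patterns in data",
    "data cleaning removes noise",
    "mining graph data",
]


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer here.
    counts = [Counter(doc.lower().split()) for doc in docs]
    # The vocabulary is the sorted set of all terms seen in any document.
    vocabulary = sorted(set().union(*counts))
    # Each row is the term-frequency vector of one document.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


vocabulary, matrix = document_term_matrix(documents)
for doc, row in zip(documents, matrix):
    print(row, "<-", doc)
```

Each row is the term-frequency vector of one document, so the whole result is also a data matrix: one row per data object, one column per attribute (term).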