Trịnh Tấn Đạt
Faculty of Information Technology, Saigon University
Email: trinhtandat@sgu.edu.vn
Website: https://sites.google.com/site/ttdat88/
Contents
Introduction
Common Techniques in Data Classification
Handling Different Data Types
Variations on Data Classification
Introduction
Definition: Given a set of training data points along with associated training
labels, determine the class label for an unlabeled test instance.
Classification algorithms typically contain two phases:
Training Phase: a model is constructed from the training instances.
Testing Phase: the model is used to assign a label to an unlabeled test instance.
The output for a test instance may be presented in one of two ways:
Discrete Label
Numerical Score
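The two phases and the two output forms can be illustrated with a minimal sketch. It uses a nearest-centroid classifier, chosen here only because it is short; the slides do not prescribe any particular algorithm. The class name NearestCentroidClassifier, its method names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class NearestCentroidClassifier:
    """Illustrative classifier: one centroid (mean point) per class."""

    def fit(self, X, y):
        # Training phase: build the model (class centroids) from labeled data.
        sums = {}
        counts = defaultdict(int)
        for point, label in zip(X, y):
            if label not in sums:
                sums[label] = [0.0] * len(point)
            sums[label] = [s + v for s, v in zip(sums[label], point)]
            counts[label] += 1
        self.centroids_ = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }
        return self

    def scores(self, point):
        # Numerical score per class: negative Euclidean distance to the
        # class centroid, so a higher score means a closer match.
        return {
            label: -math.dist(point, centroid)
            for label, centroid in self.centroids_.items()
        }

    def predict(self, point):
        # Discrete label: the class with the highest score.
        class_scores = self.scores(point)
        return max(class_scores, key=class_scores.get)


# Toy training data (hypothetical): two features, two classes.
train_X = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]]
train_y = ["low", "low", "high", "high"]

# Training phase.
model = NearestCentroidClassifier().fit(train_X, train_y)

# Testing phase on an unlabeled instance.
test_point = [2.0, 1.0]
label = model.predict(test_point)         # discrete label
score_by_class = model.scores(test_point)  # numerical score per class
```

The numerical scores make it possible to rank test instances by how strongly they belong to a class, while the discrete label commits to a single class.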
Application domains
Customer Target Marketing
Medical Disease Diagnosis
Multimedia Data Analysis
Document Categorization and Filtering
Work in the data classification area typically falls into the following categories:
Technique-centered: Numerous classes of techniques are studied, such as decision
trees, neural networks, SVM methods, and probabilistic methods.
Data-type-centered: Different applications produce many different data types,
such as text, multimedia, uncertain data, time series, and discrete
sequences.
Variations on Classification Analysis: Numerous variations on the standard
classification problem exist, which deal with more challenging scenarios such as
rare class learning, transfer learning, or semi-supervised learning.