
Trịnh Tấn Đạt
Faculty of Information Technology, Saigon University
Email: trinhtandat@sgu.edu.vn
Website: https://sites.google.com/site/ttdat88/

Contents
Introduction
Common Techniques in Data Classification
Handling Different Data Types
Variations on Data Classification

Introduction
Definition: Given a set of training data points along with associated training
labels, determine the class label for an unlabeled test instance.
Classification algorithms contain two phases:
Training Phase: a model is constructed from the training instances.
Testing Phase: the model is used to assign a label to an unlabeled test instance.
The output for a test instance may be presented in one of two ways:
Discrete Label
Numerical Score
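
The two phases and the two output forms can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier, chosen here only because it is short, and is not a method prescribed by these slides. The function names (train, score, predict) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def train(points, labels):
    """Training phase: build a model (here, one centroid per class)."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(points, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def score(model, x):
    """Numerical score per class: negative Euclidean distance to the centroid."""
    return {y: -math.dist(c, x) for y, c in model.items()}


def predict(model, x):
    """Discrete label: the class with the highest score."""
    scores = score(model, x)
    return max(scores, key=scores.get)


# Toy training data, made up for illustration only.
train_points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
train_labels = ["a", "a", "b", "b"]

# Training phase
model = train(train_points, train_labels)

# Testing phase on an unlabeled instance
test_instance = (1.5, 1.5)
print(predict(model, test_instance))  # discrete label
print(score(model, test_instance))    # numerical score for each class
```

The numerical scores carry more information than the discrete label: they allow test instances to be ranked by how strongly they belong to a class, and the label is recovered by taking the highest-scoring class.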

Introduction
Application domains
Customer Target Marketing
Medical Disease Diagnosis
Multimedia Data Analysis
Document Categorization and Filtering
…

Introduction
Work in the data classification area falls into three categories:
Technique-centered: Numerous classes of techniques are studied such as decision
trees, neural networks, SVM methods, probabilistic methods, …
Data-type-centered: Many different data types are created by different
applications, such as text, multimedia, uncertain data, time series, and discrete
sequences.
Variations on Classification Analysis: Numerous variations on the standard
classification problem exist, which deal with more challenging scenarios such as
rare class learning, transfer learning, or semi-supervised learning.

