
N.L.P.
NATURAL LANGUAGE PROCESSING
Teacher: Lê Ngọc Tấn
Email: letan.dhcn@gmail.com
Blog: http://lengoctan.wordpress.com
Industrial University of Ho Chi Minh City (Trường Đại học Công nghiệp Tp. HCM)
Faculty of Information Technology

Chapter 4
Computational Linguistics
NLP. p.2

What is computational linguistics?
It is an interdisciplinary field dealing with the statistical
or rule-based modeling of natural language from a
computational perspective.
Corpus, Corpora
Pre-processing: normalization, tokenization, …
Alignment Methods
Programming
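
The two pre-processing steps named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the course: the function names `normalize` and `tokenize`, the choice of lowercasing plus Unicode NFC normalization, and the regular expression used for splitting are all assumptions made for this example.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical input sentence with irregular spacing and capitalization.
sentence = "  Corpora are  collections of texts. "
tokens = tokenize(normalize(sentence))
print(tokens)
```

Real systems often need language-specific rules (for example for clitics, abbreviations, or languages written without spaces), so a regular expression like this one is a starting point only.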

Corpus Definitions
What is a corpus?
–A corpus is a large collection of texts
–Corpora: the plural of corpus (several such collections)
Gold-standard (reference) corpora
–Brown Corpus
–Susanne Corpus
–EUROPARL Corpus
A corpus can be annotated, for example with part-of-speech (POS) tags
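
To make the idea of a POS-tagged corpus concrete, here is a minimal Python sketch. It assumes a simple `word/TAG` text format and an invented tag set (`DET`, `NOUN`, `VERB`, `PUNCT`); the sample sentences, the function name `parse_tagged_sentence`, and the tag names are hypothetical and are not taken from any of the corpora listed above.

```python
from collections import Counter


def parse_tagged_sentence(line: str) -> list[tuple[str, str]]:
    """Turn 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for item in line.split():
        # Split on the last slash so that a word containing '/' is kept whole.
        word, tag = item.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


# Hypothetical two-sentence corpus in word/TAG format.
tagged_lines = [
    "The/DET dog/NOUN barks/VERB ./PUNCT",
    "A/DET cat/NOUN sleeps/VERB ./PUNCT",
]

corpus = [parse_tagged_sentence(line) for line in tagged_lines]

# Count how often each tag occurs across the whole corpus.
tag_counts = Counter(tag for sentence in corpus for _, tag in sentence)
print(tag_counts)
```

Storing each sentence as a list of (word, tag) pairs keeps the annotation attached to its token, which makes simple statistics such as tag frequencies easy to compute.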

Corpus Categories (1)
Schema of corpus evolution