  • Brian P. Cleary is the creator of the best-selling Words Are CATegorical(tm) series, now a 13-volume set with more than 2 million copies in print. He is also the author of the Math Is CATegorical(tm) series and the single titles Rainbow Soup: Adventures in Poetry, Rhyme and PUNishment: Adventures in Wordplay, Eight Wild Nights: A Family Hanukkah Tale, Peanut Butter and Jellyfishes: A Very Silly Alphabet Book and The Laugh Stand: Adventures in Humor. Mr. Cleary lives in Cleveland, Ohio.

  • Almost all research in the social and behavioral sciences, and also in economic and marketing research, criminological research, and social medical research deals with the analysis of categorical data. Categorical data are quantified as either nominal or ordinal variables. This volume is a collection of up-to-date studies on modern categorical data analysis methods, emphasizing their application to relevant and interesting data sets.

  • Words and character-bigrams are both used as features in Chinese text processing tasks, but no systematic comparison or analysis of their values as features for Chinese text categorization has been reported heretofore.

  • Most text message normalization approaches are based on supervised learning and rely on human labeled training data. In addition, the nonstandard words are often categorized into different types and specific models are designed to tackle each type. In this paper, we propose a unified letter transformation approach that requires neither pre-categorization nor human supervision.

  • In text categorization, feature selection (FS) is a strategy that aims at making text classifiers more efficient and accurate. However, when dealing with a new task, it is still difficult to quickly select a suitable method from the many FS methods proposed in previous studies. In this paper, we propose a theoretical framework of FS methods based on two basic measurements: frequency measurement and ratio measurement. Six popular FS methods are then discussed in detail under this framework.

  • Text categorization is a crucial and well-proven method for organizing large-scale document collections. In this paper, we propose a hierarchical multi-class text categorization method with global margin maximization. We not only maximize the margins among leaf categories, but also maximize the margins among their ancestors. Experiments show that the performance of our algorithm is competitive with the recently proposed hierarchical multi-class classification algorithms.

  • We address the rating-inference problem, wherein rather than simply decide whether a review is “thumbs up” or “thumbs down”, as in previous sentiment analysis work, one must determine an author’s evaluation with respect to a multi-point scale (e.g., one to five “stars”). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, “three stars” is intuitively closer to “four stars” than to “one star”.

  • Automatic detection of general relations between short texts is a complex task that cannot be carried out only relying on language models and bag-of-words. Therefore, learning methods to exploit syntax and semantics are required. In this paper, we present a new kernel for the representation of shallow semantic information along with a comprehensive study on kernel methods for the exploitation of syntactic/semantic structures for short text pair categorization.

  • In this paper, we encode topic dependencies in hierarchical multi-label Text Categorization (TC) by means of rerankers. We represent reranking hypotheses with several innovative kernels considering both the structure of the hierarchy and the probability of nodes. Additionally, to better investigate the role of category relationships, we consider two interesting cases: (i) traditional schemes in which node-fathers include all the documents of their child-categories; and (ii) more general schemes, in which children can include documents not belonging to their fathers. ...

  • This paper presents an approach to text categorization that i) uses no machine learning and ii) reacts on-the-fly to unknown words. These features are important for categorizing Blog articles, which are updated on a daily basis and filled with newly coined words. We categorize 600 Blog articles into 12 domains. As a result, our categorization method achieved an accuracy of 94.0% (564/600).

  • This paper presents a study on if and how automatically extracted keywords can be used to improve text categorization. In summary we show that a higher performance — as measured by micro-averaged F-measure on a standard text categorization collection — is achieved when the full-text representation is combined with the automatically extracted keywords. The combination is obtained by giving higher weights to words in the full-texts that are also extracted as keywords.

  • Cross-language Text Categorization is the task of assigning semantic classes to documents written in a target language (e.g. English) while the system is trained using labeled documents in a source language (e.g. Italian). In this work we present many solutions according to the availability of bilingual resources, and we show that it is possible to deal with the problem even when no such resources are accessible. The core technique relies on the automatic acquisition of Multilingual Domain Models from comparable corpora. ...

  • A wide range of supervised learning algorithms has been applied to Text Categorization. However, the supervised learning approaches have some problems. One of them is that they require a large, often prohibitive, number of labeled training documents for accurate learning. Generally, acquiring class labels for training data is costly, while gathering a large quantity of unlabeled data is cheap. We here propose a new automatic text categorization method for learning from only unlabeled data using a bootstrapping framework and a feature projection technique.

  • We introduce two novel methods of text categorization in which documents are split into fragments. We conducted experiments on English, French and Czech. In all cases, the task was binary document classification. We find that both methods increase the accuracy of text categorization. For the Naïve Bayes classifier this increase is significant.

  • In this fun-filled book, playful puns and comical cartoon cats combine to show, not tell, readers what prepositions are all about. Each preposition in the text, like under, over, by the clover, about, throughout, and next to Rover, is highlighted in color for easy identification. This is the newest addition to the Words Are CATegorical(tm) series, which has sold over 450,000 copies.

  • Text classification is an important problem in the field of language processing. The task is to assign text documents to a set of predefined topics.

  • "Word-Nerd" Brian P. Cleary and highly acclaimed illustrator Brian Gable collaborate to clarify the concept of synonyms for young readers with playful, lively, and whimsical rhymes and humorous, comical, and amusing illustrations. For easy identification, synonyms are printed in color, and key words are illustrated on each page. This funny, best-selling series shows, not tells, each part of speech.

  • Reference book 'A Mink, a Fink, a Skating Rink: What Is a Noun? (Words Are CATegorical)', general-education English material to support effective study, research, and work.

  • This paper extends a novel Vietnamese segmentation approach for text categorization. Instead of using an annotated training corpus or lexicon, which are still lacking for Vietnamese, we use statistical information extracted directly from a commercial search engine and a genetic algorithm to find the most reasonable segmentation. The extracted information is the document frequency of segmented words. We conduct thorough experiments to find the most appropriate mutual information formula for the word segmentation step.

  • A collection of international scientific research reports in chemistry, offered as reference material for chemistry enthusiasts. Topic: Research Article, Robust Object Categorization and Segmentation Motivated by Visual Contexts in the Human Visual System
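
The keyword-based entry above describes combining a full-text representation with automatically extracted keywords by giving higher weights to full-text words that are also keywords. The following is a minimal sketch of that idea only, not the authors' implementation: the function name `weighted_term_counts`, the multiplicative `boost` factor, and the example document and keyword set are all assumptions made for illustration.

```python
from collections import Counter


def weighted_term_counts(tokens, keywords, boost=2.0):
    """Return term weights for one document.

    Each term's weight is its raw count; terms that also appear in
    `keywords` have their count multiplied by `boost` (an assumed,
    illustrative weighting scheme).
    """
    counts = Counter(tokens)
    return {
        term: (n * boost if term in keywords else float(n))
        for term, n in counts.items()
    }


# Hypothetical document and keyword set, for illustration only.
doc = "the kernel method uses a kernel over parse trees".split()
keywords = {"kernel", "trees"}
vec = weighted_term_counts(doc, keywords)
```

Here `kernel` occurs twice and is a keyword, so its weight is doubled, while non-keyword terms keep their raw counts. Setting `boost` to 1.0 recovers the plain full-text representation.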

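The Chinese text categorization entry above contrasts words and character bigrams as features. As a rough illustration of what the two feature types look like, the sketch below extracts overlapping character bigrams from a short string. The helper `char_bigrams`, the sample sentence, and the hand-written word segmentation are assumptions for illustration; real word features would come from a word segmenter, which is not shown here.

```python
def char_bigrams(text):
    """Return overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(pair) for pair in zip(chars, chars[1:])]


# Illustrative sentence ("Chinese text classification").
sentence = "中文文本分类"

# Character-bigram features need no segmenter.
bigram_features = char_bigrams(sentence)

# Word features: this segmentation is written by hand as an assumption.
word_features = ["中文", "文本", "分类"]
```

The bigram list contains every adjacent character pair, including pairs such as `文文` that straddle word boundaries, which is one reason the two feature sets can behave differently in a classifier.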
