Sentiment analysis of citations in scientiﬁc papers and articles is a new and interesting problem due to the many linguistic differences between scientiﬁc texts and other genres. In this paper, we focus on the problem of automatic identiﬁcation of positive and negative sentiment polarity in citations to scientiﬁc papers. Using a newly constructed annotated citation sentiment corpus, we explore the effectiveness of existing and novel features, including n-grams, specialised science-speciﬁc lexical features, dependency relations, sentence splitting and negation features. ...
Sentiment classiﬁcation refers to the task of automatically identifying whether a given piece of text expresses positive or negative opinion towards a subject at hand. The proliferation of user-generated web content such as blogs, discussion forums and online review sites has made it possible to perform large-scale mining of public opinion. Sentiment modeling is thus becoming a critical component of market intelligence and social media technologies that aim to tap into the collective wisdom of crowds.
One of the central challenges in sentimentbased text categorization is that not every portion of a document is equally informative for inferring the overall sentiment of the document. Previous research has shown that enriching the sentiment labels with human annotators’ “rationales” can produce substantial improvements in categorization performance (Zaidan et al., 2007).
Joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required by JST model learning is domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modifying the topic-word Dirichlet priors.
Most previous work on multilingual sentiment analysis has focused on methods to adapt sentiment resources from resource-rich languages to resource-poor languages. We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data. We rely on the intuition that the sentiment labels for parallel sentences should be similar and present a model that jointly learns improved monolingual sentiment classifiers for each language. ...
Although Subjectivity and Sentiment Analysis (SSA) has been witnessing a ﬂurry of novel research, there are few attempts to build SSA systems for Morphologically-Rich Languages (MRL). In the current study, we report efforts to partially ﬁll this gap. We present a newly developed manually annotated corpus of Modern Standard Arabic (MSA) together with a new polarity lexicon.
Social networking and micro-blogging sites are stores of opinion-bearing content created by human users. We describe C-Feel-It, a system which can tap opinion content in posts (called tweets) from the micro-blogging website, Twitter. This web-based system categorizes tweets pertaining to a search string as positive, negative or objective and gives an aggregate sentiment score that represents a sentiment snapshot for a search string.
The lack of Chinese sentiment corpora limits the research progress on Chinese sentiment classification. However, there are many freely available English sentiment corpora on the Web. This paper focuses on the problem of cross-lingual sentiment classification, which leverages an available English corpus for Chinese sentiment classification by using the English corpus as training data.
In this paper, we deﬁne a new type of summary for sentiment analysis: a singlesentence summary that consists of a supporting sentence that conveys the overall sentiment of a review as well as a convincing reason for this sentiment. We present a system for extracting supporting sentences from online product reviews, based on a simple and unsupervised method. We design a novel comparative evaluation method for summarization, using a crowdsourcing service.
The amount of labeled sentiment data in English is much larger than that in other languages. Such a disproportion arouse interest in cross-lingual sentiment classification, which aims to conduct sentiment classification in the target language (e.g. Chinese) using labeled data in the source language (e.g. English).
Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classiﬁcation, but their performance varies greatly depending on the model variant, features used and task/ dataset.
This paper describes a system for real-time analysis of public sentiment toward presidential candidates in the 2012 U.S. election as expressed on Twitter, a microblogging service. Twitter has become a central site where people express their opinions and views on political parties and candidates. Emerging events or news are often followed almost instantly by a burst in Twitter volume, providing a unique opportunity to gauge the relation between expressed public sentiment and electoral events.
Most sentiment analysis approaches use as baseline a support vector machines (SVM) classiﬁer with binary unigram weights. In this paper, we explore whether more sophisticated feature weighting schemes from Information Retrieval can enhance classiﬁcation accuracy. We show that variants of the classic tf.idf scheme adapted to sentiment analysis provide signiﬁcant increases in accuracy, especially when using a sublinear function for term frequency weights and document frequency smoothing.
We propose a novel algorithm for sentiment summarization that takes account of informativeness and readability, simultaneously. Our algorithm generates a summary by selecting and ordering sentences taken from multiple review texts according to two scores that represent the informativeness and readability of the sentence order. The informativeness score is deﬁned by the number of sentiment expressions and the readability score is learned from the target corpus. We evaluate our method by summarizing reviews on restaurants. ...
The translation of sentiment information is a task from which sentiment analysis systems can beneﬁt. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.
We describe a sentiment classiﬁcation method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automatically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to ﬁnd the association between words that express similar sentiments in different domains.
Unsupervised vector-based approaches to semantics can model rich lexical meanings, but they largely fail to capture sentiment information that is central to many word meanings and important for a wide range of NLP tasks. We present a model that uses a mix of unsupervised and supervised techniques to learn word vectors capturing semantic term–document information as well as rich sentiment content.
Sentiment analysis on Twitter data has attracted much attention recently. In this paper, we focus on target-dependent Twitter sentiment classification; namely, given a query, we classify the sentiments of the tweets as positive, negative or neutral according to whether they contain positive, negative or neutral sentiments about that query. Here the query serves as the target of the sentiments. The state-ofthe-art approaches for solving this problem always adopt the target-independent strategy, which may assign irrelevant sentiments to the given target. ...
Recent advances in Machine Translation (MT) have brought forth a new paradigm for building NLP applications in low-resource scenarios. To build a sentiment classiﬁer for a language with no labeled resources, one can translate labeled data from another language, then train a classiﬁer on the translated text. This can be viewed as a domain adaptation problem, where labeled translations and test data have some mismatch.
We derive two variants of a semi-supervised model for ﬁne-grained sentiment analysis. Both models leverage abundant natural supervision in the form of review ratings, as well as a small amount of manually crafted sentence labels, to learn sentence-level sentiment classiﬁers. The proposed model is a fusion of a fully supervised structured conditional model and its partially supervised counterpart. This allows for highly efﬁcient estimation and inference algorithms with rich feature deﬁnitions. ...