Shiitake mushrooms exhibit several therapeutic actions, such as antioxidant and antimicrobial properties, owing to the diversity of
their components. In the present work, extracts from shiitake mushrooms were obtained using different extraction techniques: high-pressure
operations and low-pressure methods. The high-pressure technique was applied to obtain shiitake extracts using pure CO2 and
CO2 with a co-solvent at pressures up to 30 MPa.
Automatic key phrase extraction is fundamental to the success of many recent digital library applications and semantic information retrieval techniques, and it is a difficult and essential problem in Vietnamese natural language processing (NLP). In this work, we propose a novel method for key phrase extraction from Vietnamese text that uses the Vietnamese Wikipedia as an ontology and exploits specific characteristics of the Vietnamese language in the key phrase selection stage.
Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping.
Hidden Markov models (HMMs) are powerful statistical models that have found successful applications in Information Extraction (IE). In current approaches to applying HMMs to IE, an HMM is used to model text at the document level. This modelling might cause undesired redundancy in extraction in the sense that more than one filler is identified and extracted. We propose to use HMMs to model text at the segment level, in which the extraction process consists of two steps: a segment retrieval step followed by an extraction step. ...
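As a rough illustration of how an HMM labels the tokens of a single retrieved segment, the sketch below runs standard Viterbi decoding over a toy two-state model. It covers only the decoding part of an extraction step; the segment retrieval step is not shown. The state names, the probabilities, and the `unk` smoothing constant are all assumptions made for this example and are not taken from the paper.

```python
import math

def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most likely state sequence for `tokens` under a toy HMM.

    All start and transition probabilities must be > 0, because scores are
    kept in log space. Unseen emissions fall back to the small constant `unk`.
    """
    # Initialise with the first token.
    score = {
        s: math.log(start_p[s]) + math.log(emit_p[s].get(tokens[0], unk))
        for s in states
    }
    back = []  # one back-pointer table per token after the first
    for tok in tokens[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor state for landing in state s at this token.
            best = max(states, key=lambda p: prev[p] + math.log(trans_p[p][s]))
            score[s] = (
                prev[best]
                + math.log(trans_p[best][s])
                + math.log(emit_p[s].get(tok, unk))
            )
            pointers[s] = best
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical model: "bg" is background text, "speaker" is the slot filler.
STATES = ("bg", "speaker")
START = {"bg": 0.9, "speaker": 0.1}
TRANS = {"bg": {"bg": 0.8, "speaker": 0.2}, "speaker": {"bg": 0.4, "speaker": 0.6}}
EMIT = {
    "bg": {"talk": 0.3, "by": 0.3, "at": 0.3},
    "speaker": {"jane": 0.4, "doe": 0.4},
}

segment = ["talk", "by", "jane", "doe", "at"]
labels = viterbi(segment, STATES, START, TRANS, EMIT)
filler = [tok for tok, lab in zip(segment, labels) if lab == "speaker"]
```

Running the decoder on one short segment rather than on a whole document is what limits the number of candidate fillers in this toy setting; how segments are chosen is left open here.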
In this paper we compare different approaches to extracting definitions of four types using a combination of a rule-based grammar and machine learning. We collected a Dutch text corpus containing 549 definitions and applied a grammar to it. Machine learning was then applied to improve the results obtained with the grammar. Two machine learning experiments were carried out. In the first experiment, a standard classifier and a classifier designed specifically to deal with imbalanced datasets are compared.
This paper describes an approach to extracting the aspectual information of Japanese verb phrases from a monolingual corpus. We classify verbs into six categories by means of aspectual features, which are defined on the basis of their possible co-occurrence with aspectual forms and adverbs. A unique category could be identified for 96% of the target verbs. To evaluate the result of the experiment, we examined the meaning of -teiru, which is one of the most fundamental aspectual markers in Japanese, and obtained a correct recognition score of 71% on 200 sentences. ...
Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner’s precision and recall. ...
Classical Information Extraction (IE) systems fill slots in domain-specific frames. This paper reports on SEQ, a novel open IE system that leverages a domain-independent frame to extract ordered sequences such as presidents of the United States or the most common causes of death in the U.S. SEQ leverages regularities about sequences to extract a coherent set of sequences from Web text. SEQ nearly doubles the area under the precision-recall curve compared to an extractor that does not exploit these regularities. ...
The joint sentiment-topic (JST) model was previously proposed to detect sentiment and topic simultaneously from text. The only supervision required for JST model learning is a set of domain-independent polarity word priors. In this paper, we modify the JST model by incorporating word polarity priors through modification of the topic-word Dirichlet priors.
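One plausible reading of "modifying the topic-word Dirichlet priors" is that a word listed in a polarity lexicon keeps its prior mass under the matching sentiment label and has it reduced under the others. The sketch below builds such a prior table in plain Python. The function name, the `base` and `damp` values, and the toy lexicon are assumptions for illustration only and are not taken from the paper.

```python
def build_polarity_priors(vocab, lexicon, labels=("neg", "pos"), base=0.01, damp=0.0):
    """Return {label: [prior for each word in vocab]}.

    Words absent from `lexicon` keep the symmetric prior `base` under every
    label. A lexicon word keeps `base` under its own polarity label and gets
    `base * damp` under every other label.
    """
    priors = {label: [base] * len(vocab) for label in labels}
    for idx, word in enumerate(vocab):
        polarity = lexicon.get(word)
        if polarity is None:
            continue  # neutral or unknown word: leave the symmetric prior
        for label in labels:
            if label != polarity:
                priors[label][idx] = base * damp
    return priors

# Hypothetical vocabulary and polarity lexicon.
vocab = ["good", "bad", "movie"]
lexicon = {"good": "pos", "bad": "neg"}
priors = build_polarity_priors(vocab, lexicon)
```

With `damp=0.0` a lexicon word can never be generated under the opposite sentiment label; a small positive `damp` would soften that constraint.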
We learn a joint model of sentence extraction and compression for multi-document summarization. Our model scores candidate summaries according to a combined linear model whose features factor over (1) the n-gram types in the summary and (2) the compressions used. We train the model using a margin-based objective whose loss captures end summary quality. Because of the exponentially large set of candidate summaries, we use a cutting-plane algorithm to incrementally detect and add active constraints efficiently. ...
In this paper, we observe that there exists a second dimension to the relation extraction (RE) problem that is orthogonal to the relation type dimension. We show that most of these second dimensional structures are relatively constrained and not difficult to identify. We propose a novel algorithmic approach to RE that starts by first identifying these structures and then, within these, identifying the semantic type of the relation.
We present the first known result of high-precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We incorporate two features, a context-vector similarity and a co-occurrence model between words in aligned documents, into a machine learning approach. We test our hypothesis on different pairs of languages and corpora.
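A context-vector similarity feature of this kind is commonly computed by counting the words around each candidate, mapping the source-side counts into the target language with a seed dictionary, and taking the cosine of the two vectors. The sketch below follows that common recipe; the window size, the seed dictionary, and the toy sentences are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter

def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

def translate(vec, seed_dict):
    """Map a source-language context vector into the target language.

    Context words missing from the seed dictionary are dropped.
    """
    out = Counter()
    for word, count in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += count
    return out

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical French-English example with a two-entry seed dictionary.
seed = {"le": "the", "noir": "black"}
src = context_vector("le chat noir dort".split(), "chat", window=1)
tgt = context_vector("the black cat sleeps".split(), "cat", window=1)
similarity = cosine(translate(src, seed), tgt)
```

For rare words the context vectors are sparse, which is one reason a second feature such as co-occurrence in aligned documents can help a classifier.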
Recently, several latent topic analysis methods such as LSI, pLSI, and LDA have been widely used for text analysis. However, those methods assign topics to words and do not account for the events in a document. Against this background, we propose a latent topic extraction method that assigns topics to events.
In my thesis, I propose to build a system that would enable extraction of social interactions from texts. To date I have defined a comprehensive set of social events and built a preliminary system that extracts social events from news articles. I plan to improve the performance of my current system by incorporating semantic information. Using domain adaptation techniques, I propose to apply my system to a wide range of genres.
We present a mobile touchable application for online topic graph extraction and exploration of web content. The system has been implemented for operation on an iPad. The topic graph is constructed from N web snippets which are determined by a standard search engine. We consider the extraction of a topic graph as a specific empirical collocation extraction task where collocations are extracted between chunks.
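To make the idea of collocations between chunks concrete, the sketch below builds a small graph whose edges link chunks that co-occur in the same snippet and weights each edge by pointwise mutual information (PMI). PMI is swapped in here as a generic collocation measure; the abstract does not say which measure the system uses. The `min_count` threshold and the toy snippets are likewise assumptions.

```python
import math
from collections import Counter
from itertools import combinations

def topic_graph(snippets, min_count=2):
    """Build {(chunk_a, chunk_b): pmi} from snippets given as lists of chunks.

    An edge is kept only if the two chunks co-occur in at least `min_count`
    snippets. Each edge key is ordered alphabetically.
    """
    n = len(snippets)
    chunk_df = Counter()  # number of snippets containing each chunk
    pair_df = Counter()   # number of snippets containing each chunk pair
    for chunks in snippets:
        unique = sorted(set(chunks))
        chunk_df.update(unique)
        pair_df.update(combinations(unique, 2))
    edges = {}
    for (a, b), joint in pair_df.items():
        if joint >= min_count:
            # PMI = log( P(a, b) / (P(a) * P(b)) ) with snippet-level estimates.
            edges[(a, b)] = math.log((joint * n) / (chunk_df[a] * chunk_df[b]))
    return edges

# Hypothetical chunked snippets for an ambiguous query.
snippets = [
    ["jim clark", "formula one"],
    ["jim clark", "formula one", "lotus"],
    ["jim clark", "netscape"],
    ["netscape", "browser"],
]
edges = topic_graph(snippets)
```

Nodes and edges of such a graph could then be rendered as touchable elements, although the rendering side is outside the scope of this sketch.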
The applicability of many current information extraction techniques is severely limited by the need for supervised training data. We demonstrate that for certain field structured extraction tasks, such as classified advertisements and bibliographic citations, small amounts of prior knowledge can be used to learn effective models in a primarily unsupervised fashion. Although hidden Markov models (HMMs) provide a suitable generative model for field structured text, general unsupervised HMM learning fails to learn useful structure in either of our domains.
In this paper we address the problem of extracting key pieces of information from voicemail messages, such as the identity and phone number of the caller. This task differs from the named entity task in that the information we are interested in is a subset of the named entities in the message, and consequently, the need to pick the correct subset makes the problem more difficult. Also, the caller’s identity may include information that is not typically associated with a named entity.
This paper proposes an approach to full parsing suitable for Information Extraction from texts. Sequences of cascades of rules deterministically analyze the text, building unambiguous structures. Initially basic chunks are analyzed; then argumental relations are recognized; finally modifier attachment is performed and the global parse tree is built. The approach was proven to work for three languages and different domains. It was implemented in the IE module of FACILE, an EU project for multilingual text classification and IE. ...
Lexicon definition is one of the main bottlenecks in the development of new applications in the field of Information Extraction from text. Generic resources (e.g., lexical databases) are promising for reducing the cost of specific lexica definition, but they introduce lexical ambiguity. This paper proposes a methodology for building application-specific lexica by using WordNet. Lexical ambiguity is kept under control by marking synsets in WordNet with field labels taken from the Dewey Decimal Classification.
Aspect extraction is a central problem in sentiment analysis. Current methods either extract aspects without categorizing them, or extract and categorize them using unsupervised topic modeling. By categorizing, we mean that synonymous aspects should be clustered into the same category. In this paper, we solve the problem in a different setting where the user provides some seed words for a few aspect categories and the model extracts and clusters aspect terms into categories simultaneously.