Scientific report: "A Tool for Deep Semantic Encoding of Narrative Texts"


We have developed a novel, publicly available annotation tool for the semantic encoding of texts, especially those in the narrative domain. Users can create formal propositions to represent spans of text, as well as temporal relations and other aspects of narrative. A built-in natural-language generation component regenerates text from the formal structures, which eases the annotation process. We have run collection experiments with the tool and shown that non-experts can easily create semantic encodings of short fables.


A Tool for Deep Semantic Encoding of Narrative Texts

David K. Elson, Columbia University, New York City, delson@cs.columbia.edu
Kathleen R. McKeown, Columbia University, New York City, kathy@cs.columbia.edu

Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, pages 9–12, Suntec, Singapore, 3 August 2009. © 2009 ACL and AFNLP

Abstract

We have developed a novel, publicly available annotation tool for the semantic encoding of texts, especially those in the narrative domain. Users can create formal propositions to represent spans of text, as well as temporal relations and other aspects of narrative. A built-in natural-language generation component regenerates text from the formal structures, which eases the annotation process. We have run collection experiments with the tool and shown that non-experts can easily create semantic encodings of short fables. We present this tool as a stand-alone, reusable resource for research in semantics in which formal encoding of text, especially in a narrative form, is required.

1 Introduction

Research in language processing has benefited greatly from the collection of large annotated corpora such as Penn PropBank (Kingsbury and Palmer, 2002) and Penn Treebank (Marcus et al., 1993). Such projects typically involve a formal model (such as a controlled vocabulary of thematic roles) and a corpus of text that has been annotated against the model. One persistent tradeoff in building such resources, however, is that a model with a wider scope is more challenging for annotators. For example, part-of-speech tagging is an easier task than PropBank annotation. We believe that careful user interface design can alleviate difficulties in annotating texts against deep semantic models. In this demonstration, we present a tool we have developed, SCHEHERAZADE, for deep annotation of text (available at http://www.cs.columbia.edu/~delson).

We are using the tool to collect semantic representations of narrative text. This domain occurs frequently, yet is rarely studied in computational linguistics. Narrative occurs with every other discourse type, including dialogue, news, blogs and multi-party interaction. Given the volume of narrative prose on the Web, a system competent at understanding narrative structures would be instrumental in a range of text processing tasks, such as summarization or the generation of biographies for question answering.

In the pursuit of a complete and connected representation of the underlying facts of a story, our annotation process involves the labeling of verb frames, thematic roles, temporal structure, modality, causality and other features. This type of annotation allows for machine learning on the thematic dimension of narrative, that is, the aspects that unite a series of related facts into an engaging and fulfilling experience for a reader. Our methodology is novel in its synthesis of several annotation goals and its focus on content rather than expression. We aim to separate the narrative's fabula, the content dimension of the story, from the rhetorical presentation at the textual surface (sjužet) (Bal, 1997). To this end, our model incorporates formal elements found in other discourse-level annotation projects such as Penn Discourse Treebank (Prasad et al., 2008) and temporal markup languages such as TimeML (Mani and Pustejovsky, 2004). We call the representation a story graph, because these elements are embodied by nodes and connected by arcs that represent relationships such as temporal order and motivation.

More specifically, our annotation process involves the construction of propositions to best approximate each of the events described in the textual story. Every element of the representation is formally defined from controlled vocabularies: the verb frames, with their thematic roles, are adapted from VerbNet (Kipper et al., 2006), the largest verb lexicon available in English. When the verb frames are filled in to construct action propositions, the arguments are either themselves propositions or noun synsets from WordNet, the largest available noun lexicon (Fellbaum, 1998). Annotators can also write stative propositions and modifiers (with adjectives and adverbs culled from WordNet), and distinguish between goals, plans, beliefs and other "hypothetical" modalities. The representation supports connectives including causality and motivation between these elements. Finally, and crucially, each proposition is bound to a state (time slice) in the story's main timeline (a linear sequence of states). Additional timelines can represent multi-state beliefs, goals or plans. In the course of authoring actions and statives, annotators create a detailed temporal framework to which they attach their propositions.

2 Description of Tool

The collection process is amenable to community and non-expert annotation by means of a graphical encoding tool. We believe this resource can serve a range of experiments in semantics and human text comprehension.

As seen in Figure 1, the process of creating a proposition with our tool involves selecting an appropriate frame and filling the arguments indicated by the thematic roles of the frame. Annotators are guided through the process by a natural-language generation component that is able to realize textual equivalents of all possible propositions. A search in the interface for "flatter," for example, offers a list of relevant frames, such as one for "flatters." Upon selecting this frame, an annotator is able to supply arguments by choosing actors from a list of declared characters. "The fox flatters the crow," for example, would be internally represented with the proposition flatters([Fox_1], [Crow_1]), where flatters, Fox and Crow are not snippets of surface text, but rather selected WordNet and VerbNet records. (The subscript indicates that the proposition is invoking a particular [Fox] instance that was previously declared.) In this manner an entire story can be encoded.

Figure 1: Screenshot from our tool showing the process of creating a formal proposition. On the left, the user is nesting three action propositions together; on the right, the user selects a particular frame from a searchable list. The resulting propositions are regenerated in rectangular boxes.

Figure 2 shows a screenshot from our interface in which propositions are positioned on a timeline to indicate temporal relationships. On the right side of the screen are the original text (used for reference) and the entire story as regenerated from the current state of the formal model. It is also possible from this screen to invoke modalities such as goals, plans and beliefs, and to indicate links between propositions. Annotators are instructed to construct propositions until the resulting textual story, as realized by the generation component, is as close to their own understanding of the story as permitted by the formal representation.

Figure 2: The main screen of our tool features a graphical timeline, as well as boxes for the reference text and the story as regenerated by the system from the formal model.

The tool includes annotation guidelines for constructing the best propositions to approximate the content of the story. Depending on the intended use of the data, annotators may be instructed to model just the stated content in the text, or include the implied content as well. (For example, causal links between events are often not articulated in a text.) The resulting story graph is a unified representation of the entire fabula, without a story's beginning or end. In addition, the tool allows annotators to select spans of text and link them to the corresponding proposition(s). By indicating which propositions were stated in the original text, and in what order, the content and presentation dimensions of a story are cross-indexed.

3 Evaluation

We have conducted several formative evaluations and data collection experiments with this interface. In one, four annotators each modeled four of the fables attributed to Aesop. In another, two annotators each modeled twenty fables. We chose to model stories from the Aesop corpus due to several key advantages: the stories are mostly built from simple declaratives, which are within the expressive range of our semantic model, yet are rich in thematic targets for automatic learning (such as dilemmas where characters must choose between competing values).

In the latter collection, both annotators were undergraduates in our engineering school and native English speakers, with little background in linguistics. For this experiment, we instructed them to only model stated content (as opposed to including inferences), and skip the linking to spans of source text. On average, they required 35-45 minutes to encode a fable, though this decreased with practice. The 40 encodings include 574 propositions, excluding those in hypothetical modalities. The fables average 130 words in length (so the annotators created, on average, one proposition for every nine words).

Both annotators became comfortable with the tool after a period of training; in surveys that they completed after each task, they gave Likert-scale usability scores of 4.25 and 4.30 (averaged over all 20 tasks, with a score of 5 representing "easiest to use"). The most frequently cited deficiencies in the model were abstract concepts such as fair (in the sense of a community event), which we plan to support in a future release.

4 Results and Future Work

The end result from a collection experiment is a collection of story graphs which are suitable for machine learning. An example story graph, based on the state of the tool seen in Figure 2, is shown in Figure 3. Nodes in the graph represent states, declared objects and propositions (actions and statives). Each of the predicates is linked to its corresponding VerbNet and WordNet records.

Figure 3: A portion of a story graph representation as created by SCHEHERAZADE.

We are currently experimenting with approaches for data-driven analysis of narrative content along the "thematic" dimension as described above. In particular, we are interested in the automatic discovery of deep similarities between stories (such as analogous structures and prototypical characters). We are also interested in investigating the selection and ordering of content in the story's telling (that is, which elements are stated and which remain implied), especially as they pertain to the reader's affectual responses. We plan to make the annotated corpus publicly available in addition to the tool.

Overall, while more work remains in expanding the model as well as the graphical interface, we believe we are providing to the community a valuable new tool for eliciting semantic encodings of narrative texts for machine learning purposes.

5 Script Outline

Our demonstration involves a walk-through of the SCHEHERAZADE tool. It includes:

1. An outline of the goals of the project and the innovative aspects of our formal representation compared to other representations currently in the field.
2. A tour of the timeline screen (equivalent to Figure 2) as configured for a particular Aesop fable.
3. The procedure for reading a text for important named entities, and formally declaring these named entities for the story graph.
4. The process for constructing propositions in order to encode actions and statives in the text, as seen in Figure 1.
5. Other features of the software package, such as the setting of causal links and the ability to undo/redo.
6. A review of the results of our formative evaluations and data collection experiments, including surveys of user satisfaction.

References

Mieke Bal. 1997. Narratology: Introduction to the Theory of Narrative. University of Toronto Press, Toronto, second edition.

Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.

Paul Kingsbury and Martha Palmer. 2002. From Treebank to PropBank. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-02), Canary Islands, Spain.

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2006. Extensive classifications of English verbs. In Proceedings of the 12th EURALEX International Congress, Turin, Italy.

Inderjeet Mani and James Pustejovsky. 2004. Temporal discourse models for narrative structure. In Proceedings of the ACL Workshop on Discourse Annotation, Barcelona, Spain.

Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19.

Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi, and Bonnie Webber. 2008. The Penn Discourse Treebank 2.0. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).
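
As a rough illustration of the representation described in Section 2, the following Python sketch models declared characters, action propositions whose arguments are characters or nested propositions, and a timeline of states from which text is regenerated. It is a toy under stated assumptions and not the SCHEHERAZADE implementation: the class names (`Entity`, `Proposition`, `Timeline`), the plain strings standing in for WordNet and VerbNet records, the naive subject-verb-object realization, and the second example proposition ("drops") are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A declared story object, e.g. the instance Fox_1 of the noun 'fox'."""
    noun: str   # hypothetical stand-in for a WordNet noun synset
    index: int  # instance number, like the subscript in [Fox_1]

    def realize(self) -> str:
        return f"the {self.noun}"


@dataclass(frozen=True)
class Proposition:
    """An action whose arguments are entities or nested propositions."""
    verb: str  # hypothetical stand-in for a VerbNet frame
    args: Tuple[Union[Entity, "Proposition"], ...]

    def realize(self) -> str:
        # Naive realization: first argument is the subject, the rest follow.
        subject, *rest = self.args
        objects = " ".join(a.realize() for a in rest)
        return f"{subject.realize()} {self.verb} {objects}".strip()


@dataclass
class Timeline:
    """A linear sequence of states (time slices), each holding propositions."""
    states: List[List[Proposition]] = field(default_factory=list)

    def add(self, state: int, prop: Proposition) -> None:
        # Grow the timeline as needed, then bind the proposition to its state.
        while len(self.states) <= state:
            self.states.append([])
        self.states[state].append(prop)

    def regenerate(self) -> str:
        # Walk the states in temporal order and realize each proposition.
        sentences = []
        for props in self.states:
            for p in props:
                text = p.realize()
                sentences.append(text[0].upper() + text[1:] + ".")
        return " ".join(sentences)


fox = Entity("fox", 1)
crow = Entity("crow", 1)
cheese = Entity("cheese", 1)

timeline = Timeline()
timeline.add(0, Proposition("flatters", (fox, crow)))
timeline.add(1, Proposition("drops", (crow, cheese)))  # illustrative only

story = timeline.regenerate()
print(story)  # The fox flatters the crow. The crow drops the cheese.
```

Binding each proposition to a state index mirrors the paper's requirement that every proposition attach to a time slice, and regenerating text from the structure gives an annotator immediate feedback on what has been encoded. A fuller model would add stative propositions, modalities, and causal links between nodes.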