Scientific report: "A Tool for Deep Semantic Encoding of Narrative Texts"
A Tool for Deep Semantic Encoding of Narrative Texts

David K. Elson, Columbia University, New York City, delson@cs.columbia.edu
Kathleen R. McKeown, Columbia University, New York City, kathy@cs.columbia.edu

Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, pages 9–12, Suntec, Singapore, 3 August 2009. © 2009 ACL and AFNLP

Abstract

We have developed a novel, publicly available annotation tool for the semantic encoding of texts, especially those in the narrative domain. Users can create formal propositions to represent spans of text, as well as temporal relations and other aspects of narrative. A built-in natural-language generation component regenerates text from the formal structures, which eases the annotation process. We have run collection experiments with the tool and shown that non-experts can easily create semantic encodings of short fables. We present this tool as a stand-alone, reusable resource for research in semantics in which formal encoding of text, especially in a narrative form, is required.

1 Introduction

Research in language processing has benefited greatly from the collection of large annotated corpora such as Penn PropBank (Kingsbury and Palmer, 2002) and Penn Treebank (Marcus et al., 1993). Such projects typically involve a formal model (such as a controlled vocabulary of thematic roles) and a corpus of text that has been annotated against the model. One persistent tradeoff in building such resources, however, is that a model with a wider scope is more challenging for annotators. For example, part-of-speech tagging is an easier task than PropBank annotation. We believe that careful user interface design can alleviate difficulties in annotating texts against deep semantic models. In this demonstration, we present a tool we have developed, SCHEHERAZADE, for deep annotation of text.¹

¹ Available at http://www.cs.columbia.edu/~delson.

We are using the tool to collect semantic representations of narrative text. This domain occurs frequently, yet is rarely studied in computational linguistics. Narrative occurs with every other discourse type, including dialogue, news, blogs and multi-party interaction. Given the volume of narrative prose on the Web, a system competent at understanding narrative structures would be instrumental in a range of text processing tasks, such as summarization or the generation of biographies for question answering.

In the pursuit of a complete and connected representation of the underlying facts of a story, our annotation process involves the labeling of verb frames, thematic roles, temporal structure, modality, causality and other features. This type of annotation allows for machine learning on the thematic dimension of narrative – that is, the aspects that unite a series of related facts into an engaging and fulfilling experience for a reader. Our methodology is novel in its synthesis of several annotation goals and its focus on content rather than expression.

We aim to separate the narrative's fabula, the content dimension of the story, from the rhetorical presentation at the textual surface (sjužet) (Bal, 1997). To this end, our model incorporates formal elements found in other discourse-level annotation projects such as Penn Discourse Treebank (Prasad et al., 2008) and temporal markup languages such as TimeML (Mani and Pustejovsky, 2004). We call the representation a story graph, because these elements are embodied by nodes and connected by arcs that represent relationships such as temporal order and motivation.

More specifically, our annotation process involves the construction of propositions to best approximate each of the events described in the textual story. Every element of the representation is formally defined from controlled vocabularies: the verb frames, with their thematic roles, are adapted from VerbNet (Kipper et al., 2006), the largest verb lexicon available in English. When the verb frames are filled in to construct action propositions, the arguments are either themselves propositions or noun synsets from WordNet (the largest available noun lexicon (Fellbaum, 1998)). Annotators can also write stative propositions and modifiers (with adjectives and adverbs culled from WordNet), and distinguish between goals, plans, beliefs and other "hypothetical" modalities. The representation supports connectives including causality and motivation between these elements. Finally, and crucially, each proposition is bound to a state (time slice) in the story's main timeline (a linear sequence of states). Additional timelines can represent multi-state beliefs, goals or plans. In the course of authoring actions and statives, annotators create a detailed temporal framework to which they attach their propositions.

2 Description of Tool

The collection process is amenable to community and non-expert annotation by means of a graphical encoding tool. We believe this resource can serve a range of experiments in semantics and human text comprehension.

As seen in Figure 1, the process of creating a proposition with our tool involves selecting an appropriate frame and filling the arguments indicated by the thematic roles of the frame. Annotators are guided through the process by a natural-language generation component that is able to realize textual equivalents of all possible propositions. A search in the interface for "flatter," for example, offers a list of relevant frames such as the flatters frame. Upon selecting this frame, an annotator is able to supply arguments by choosing actors from a list of declared characters. "The fox flatters the crow," for one, would be internally represented with the proposition flatters([Fox1], [Crow1]), where flatters, Fox and Crow are not snippets of surface text, but rather selected WordNet and VerbNet records. (The subscript indicates that the proposition is invoking a particular [Fox] instance that was previously declared.) In this manner an entire story can be encoded.

Figure 1: Screenshot from our tool showing the process of creating a formal proposition. On the left, the user is nesting three action propositions together; on the right, the user selects a particular frame from a searchable list. The resulting propositions are regenerated in rectangular boxes.

Figure 2 shows a screenshot from our interface in which propositions are positioned on a timeline to indicate temporal relationships. On the right side of the screen are the original text (used for reference) and the entire story as regenerated from the current state of the formal model. It is also possible from this screen to invoke modalities such as goals, plans and beliefs, and to indicate links between propositions. Annotators are instructed to construct propositions until the resulting textual story, as realized by the generation component, is as close to their own understanding of the story as permitted by the formal representation.

Figure 2: The main screen of our tool features a graphical timeline, as well as boxes for the reference text and the story as regenerated by the system from the formal model.

The tool includes annotation guidelines for constructing the best propositions to approximate the content of the story. Depending on the intended use of the data, annotators may be instructed to model just the stated content in the text, or include the implied content as well. (For example, causal links between events are often not articulated in a text.) The resulting story graph is a unified representation of the entire fabula, without a story's beginning or end. In addition, the tool allows annotators to select spans of text and link them to the corresponding proposition(s). By indicating which propositions were stated in the original text, and in what order, the content and presentation dimensions of a story are cross-indexed.

3 Evaluation

We have conducted several formative evaluations and data collection experiments with this interface. In one, four annotators each modeled four of the fables attributed to Aesop. In another, two annotators each modeled twenty fables. We chose to model stories from the Aesop corpus due to several key advantages: the stories are mostly built from simple declaratives, which are within the expressive range of our semantic model, yet are rich in thematic targets for automatic learning (such as dilemmas where characters must choose between competing values).

In the latter collection, both annotators were undergraduates in our engineering school and native English speakers, with little background in linguistics. For this experiment, we instructed them to only model stated content (as opposed to including inferences), and skip the linking to spans of source text. On average, they required 35-45 minutes to encode a fable, though this decreased with practice. The 40 encodings include 574 propositions, excluding those in hypothetical modalities. The fables average 130 words in length (so the annotators created, on average, one proposition for every nine words).

Both annotators became comfortable with the tool after a period of training; in surveys that they completed after each task, they gave Likert-scale usability scores of 4.25 and 4.30 (averaged over all 20 tasks, with a score of 5 representing "easiest to use"). The most frequently cited deficiencies in the model were abstract concepts such as fair (in the sense of a community event), which we plan to support in a future release.

4 Results and Future Work

The end result from a collection experiment is a collection of story graphs which are suitable for machine learning. An example story graph, based on the state of the tool seen in Figure 2, is shown in Figure 3. Nodes in the graph represent states, declared objects and propositions (actions and statives). Each of the predicates is linked to its corresponding VerbNet or WordNet record.

Figure 3: A portion of a story graph representation as created by SCHEHERAZADE.

We are currently experimenting with approaches for data-driven analysis of narrative content along the "thematic" dimension as described above. In particular, we are interested in the automatic discovery of deep similarities between stories (such as analogous structures and prototypical characters). We are also interested in investigating the selection and ordering of content in the story's telling (that is, which elements are stated and which remain implied), especially as they pertain to the reader's affectual responses. We plan to make the annotated corpus publicly available in addition to the tool.

Overall, while more work remains in expanding the model as well as the graphical interface, we believe we are providing to the community a valuable new tool for eliciting semantic encodings of narrative texts for machine learning purposes.

5 Script Outline

Our demonstration involves a walk-through of the SCHEHERAZADE tool. It includes:

1. An outline of the goals of the project and the innovative aspects of our formal representation compared to other representations currently in the field.

2. A tour of the timeline screen (equivalent to Figure 2) as configured for a particular Aesop fable.

3. The procedure for reading a text for important named entities, and formally declaring these named entities for the story graph.

4. The process for constructing propositions in order to encode actions and statives in the text, as seen in Figure 1.

5. Other features of the software package, such as the setting of causal links and the ability to undo/redo.

6. A review of the results of our formative evaluations and data collection experiments, including surveys of user satisfaction.

References

Mieke Bal. 1997. Narratology: Introduction to the Theory of Narrative. University of Toronto Press, Toronto, second edition.

Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA.

Paul Kingsbury and Martha Palmer. 2002. From Treebank to PropBank. In Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-02), Canary Islands, Spain.

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2006. Extensive classifications of English verbs. In Proceedings of the 12th EURALEX International Congress, Turin, Italy.

Inderjeet Mani and James Pustejovsky. 2004. Temporal discourse models for narrative structure. In Proceedings of the ACL Workshop on Discourse Annotation, Barcelona, Spain.

Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19.

Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi, and Bonnie Webber. 2008. The Penn Discourse Treebank 2.0. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).
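As an illustration of the representation described in the Introduction and in the Description of Tool section, the following is a minimal sketch of how a proposition, a state and a story graph might be modeled. It is not the SCHEHERAZADE implementation: the class names (Entity, Proposition, State, StoryGraph) and the realize function are hypothetical, plain strings stand in for the WordNet and VerbNet records, and the toy realizer handles only a two-argument action frame.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A declared story instance such as Fox1 (the string is a stand-in for a WordNet noun record)."""
    noun: str
    index: int

    def __str__(self) -> str:
        return f"[{self.noun.capitalize()}{self.index}]"


@dataclass(frozen=True)
class Proposition:
    """A filled verb frame (the string is a stand-in for a VerbNet frame record)."""
    frame: str
    # Arguments are declared entities or nested propositions.
    args: Tuple[Union["Entity", "Proposition"], ...]

    def __str__(self) -> str:
        return f"{self.frame}({', '.join(str(a) for a in self.args)})"


@dataclass
class State:
    """One time slice on the main timeline; propositions are bound to it."""
    propositions: List[Proposition] = field(default_factory=list)


@dataclass
class StoryGraph:
    """A linear sequence of states plus labeled arcs between propositions."""
    timeline: List[State] = field(default_factory=list)
    # Each arc is (source, relation label, target), e.g. a causal link.
    arcs: List[Tuple[Proposition, str, Proposition]] = field(default_factory=list)


def realize(p: Proposition) -> str:
    """Toy text regeneration; assumes exactly two Entity arguments."""
    subject, obj = p.args
    return f"The {subject.noun} {p.frame} the {obj.noun}."


fox, crow = Entity("fox", 1), Entity("crow", 1)
flatters = Proposition("flatters", (fox, crow))
graph = StoryGraph(timeline=[State([flatters])])

print(flatters)           # flatters([Fox1], [Crow1])
print(realize(flatters))  # The fox flatters the crow.
```

Keeping entities and propositions immutable lets the same declared instance (here Fox1) be reused across several propositions and states, which mirrors the paper's point that a proposition invokes a previously declared instance rather than a snippet of surface text.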