Ph.D. Theses

Moving from News to Social Media: Unsupervised Knowledge Enrichment for Event Extraction

By Hao Li
Advisor: Heng Ji
November 23, 2015

Event extraction is an important task in Information Extraction (IE), a subfield of Natural Language Processing (NLP). It has been applied to different genres (e.g., news articles, web blogs, and tweets) and to various applications (e.g., question answering and information retrieval). The goal of event extraction is to extract structured information about events of interest from unstructured documents. Being able to detect and extract such events automatically and effectively would be extremely valuable.

However, identifying and classifying events is a challenging problem, mainly for three reasons. The first challenge is the lack of training data across genres, so traditional supervised systems cannot be easily adapted to new genres. For example, we found that event extraction performed notably worse on web blogs than on newswire texts. Adapting an existing event extractor to another genre usually requires additional annotations. The second challenge comes from informal genres such as social media. A social media message is usually short and its context incomplete (e.g., each tweet is limited to 140 characters). Lacking context, a single tweet usually cannot provide a complete picture of the corresponding events. The third challenge is the informal nature of social media. Social media messages are written in an informal style, which degrades the performance of NLP tools designed for more formal genres.

This thesis focuses on tackling these challenges for event extraction in various genres, in which inter-dependencies among components and subtasks can be found. The main theme of this thesis is to incorporate within-genre knowledge and cross-genre knowledge as two types of background knowledge to boost event extraction performance, instead of conducting event extraction on each single document in isolation (e.g., a sentence from a news article or a social media message). We use three genres (news articles, tweets, and Facebook messages) as case studies to demonstrate the effectiveness and efficiency of knowledge enrichment techniques for event extraction.
