Bootstrapping for Fun: Web-based Construction of Large Data Sets for Humor Recognition

Mihalcea, R.; Strapparava, Carlo

Humor is one of the most interesting and puzzling aspects of human behavior. Despite the attention it has received in fields such as philosophy, linguistics, and psychology, there have been only a few attempts to create computational models for humor recognition or generation. As in many other applications in natural language processing, the availability of large amounts of data is crucial for the development of automatic methods for humor recognition. In this paper we show that it is possible to bootstrap a very large and relatively clean corpus of humorous sentences starting with a handful of manually selected seeds, and we show how various stylistic features, inspired by theoretical studies of humor, can be applied to automatically distinguish between humorous and non-humorous examples.
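
The seed-based bootstrapping idea can be illustrated with a minimal sketch. The abstract does not specify the procedure, so everything below is an assumption made for illustration: the function name bootstrap_corpus, the representation of the Web as an in-memory dictionary of pages, and the rule that a page containing a known humorous sentence contributes its other sentences as new candidates. A realistic pipeline would also need filtering to keep the corpus relatively clean, which this sketch omits.

```python
from typing import Dict, Iterable, List, Set


def bootstrap_corpus(
    seeds: Iterable[str],
    pages: Dict[str, List[str]],
    max_iterations: int = 3,
) -> Set[str]:
    """Grow a set of sentences from a few seeds (hypothetical sketch).

    `pages` maps a page identifier to the sentences found on that page.
    It stands in for Web search results and is an assumption of this
    example, not something described in the abstract.
    """
    corpus: Set[str] = set(seeds)
    for _ in range(max_iterations):
        new_items: Set[str] = set()
        for sentences in pages.values():
            # A page that contains any known sentence is assumed to be
            # a collection of similar sentences, so harvest the rest.
            if any(s in corpus for s in sentences):
                new_items.update(s for s in sentences if s not in corpus)
        if not new_items:
            break  # nothing new was found, so stop early
        corpus |= new_items
    return corpus


# Toy data, invented purely for illustration.
toy_pages = {
    "page_a": ["seed one", "joke a"],
    "page_b": ["joke a", "joke b"],
    "page_c": ["unrelated x"],
}
grown = bootstrap_corpus(["seed one"], toy_pages)
```

In this toy run, the first iteration reaches "joke a" through page_a and the second reaches "joke b" through page_b, while page_c is never harvested because it shares no sentence with the growing corpus.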