Welcome to the shared task HAHA - Humor Analysis based on Human Annotation, a task to classify tweets in Spanish as humorous or not, and to determine how funny they are. This task is part of IberEval2018.
News
- September 19, 2018. Training and annotated test data are available in the data section.
- July 31, 2018. The proceedings of the IberEval 2018 workshop are online, including the proceedings of the HAHA challenge (track 4). Access here.
- May 17, 2018. The final results of the HAHA tasks are shown in the results page.
- May 10, 2018. New deadline for working notes submission: May 28th, 2018.
Working notes must be written in English and must use the Springer style.
The minimum length of papers should be 5 pages.
- May 7, 2018. The submission and results pages are available. Results will be uploaded twice a day with the most recent submissions during the submission period.
Introduction
While humor has historically been studied from psychological, cognitive and linguistic standpoints, its study from a computational perspective is an area yet to be explored in Machine Learning and Computational Linguistics. Some previous work exists (Mihalcea & Strapparava, 2005; Sjöbergh & Araki, 2007; Castro et al., 2016), but a characterization of humor that allows its automatic recognition and generation is far from being specified. The aim of this task is to gain better insight into what is humorous and what causes laughter.
There is related past work: SemEval-2015 Task 11 addressed figurative language such as metaphors and irony, but focused on Sentiment Analysis, and SemEval-2017 Task 6 presented a task similar to this one.
The HAHA evaluation campaign proposes different subtasks related to automatic humor detection. In order to carry out the tasks, an annotated corpus of tweets in Spanish will be provided.
Corpus
We provide a corpus of 20,000 crowd-annotated tweets based on (Castro et al., 2017), divided into 16,000 tweets for training and 4,000 tweets for testing. The annotation was done using a voting scheme in which users could select one of six options: the tweet does not contain humor, or the tweet contains humor with a rating of one to five stars.
All tweets are classified as humorous or not humorous. Humorous tweets received at least three votes indicating a star rating and at least five votes in total. Non-humorous tweets received at least three "not humor" votes (they may have fewer than five votes in total).
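As a minimal sketch of how these thresholds translate into labels (the function and vote representation are illustrative, not the official annotation pipeline):

```python
def label_tweet(not_humor_votes, star_votes):
    """Apply the corpus labeling rules to one tweet's raw votes.

    not_humor_votes: number of "not humor" votes.
    star_votes: list of star ratings (1-5), one per annotator who
        marked the tweet as humorous.
    Returns (is_humorous, average_stars), with average_stars only
    defined for humorous tweets, or None if neither rule applies.
    """
    total_votes = not_humor_votes + len(star_votes)
    if len(star_votes) >= 3 and total_votes >= 5:
        # Humorous: at least three star votes and five votes overall.
        return True, sum(star_votes) / len(star_votes)
    if not_humor_votes >= 3:
        # Not humorous: at least three "not humor" votes
        # (the total may still be below five).
        return False, None
    return None  # not enough agreement either way

# The example tweet below: one "not humor" vote plus star votes
# 2, 3, 3, 5 -> humorous, average stars (2 + 3 + 3 + 5) / 4 = 3.25.
print(label_tweet(1, [2, 3, 3, 5]))  # (True, 3.25)
```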
The corpus contains annotated tweets such as the following:
| Field | Value |
| --- | --- |
| Text | – La semana pasada mi hijo hizo un triple salto mortal desde 20 metros de altura. – ¿Es trapecista? – Era :( |
| Is humorous | True |
| Votes: Not humor | 1 |
| Votes: 1 star | 0 |
| Votes: 2 stars | 1 |
| Votes: 3 stars | 2 |
| Votes: 4 stars | 0 |
| Votes: 5 stars | 1 |
| Average stars | 3.25 |

(English translation of the tweet: "– Last week my son did a triple somersault from a height of 20 meters. – Is he a trapeze artist? – He was :(")
Task description
Three subtasks are proposed, based on tweets written in Spanish:
- Humor Detection: determining whether a tweet is a joke or not (i.e., whether the author intended it as humor). The results of this task will be measured using the F-measure for the humorous category and accuracy; F-measure is the main measure for this task.
Baselines for this task over the test data:
- baseline1: Decide randomly with 50% probability: F1 0.42, Acc 0.49
- baseline2: Classify tweets that start with a hyphen as humorous: F1 0.17, Acc 0.66
- Funniness Score Prediction: predicting a funniness score (the average number of stars) for a tweet on a 5-star scale, assuming it is a joke. The results of this task will be measured using the root-mean-squared error (RMSE).
Baseline for this task over the test data:
- Choose the value 3 (the middle of the scale) for all tweets: RMSE 1.14
- Funniness Distribution Prediction (experimental): this final subtask goes beyond the previous one by asking for a prediction of the distribution of votes for a tweet (i.e., what percentage of votes each of the five star ratings receives). A code sketch of the baselines and evaluation measures follows this list.
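To make the baselines and measures concrete, here is a minimal sketch using scikit-learn; the function names and the tweet representation (a list of strings) are illustrative, not part of the official evaluation code:

```python
import random
from sklearn.metrics import accuracy_score, f1_score, mean_squared_error

def baseline_random(tweets):
    # baseline1: flip a fair coin for every tweet.
    return [random.random() < 0.5 for _ in tweets]

def baseline_hyphen(tweets):
    # baseline2: predict "humor" for tweets that start with a hyphen
    # (dialogue-style jokes, as in the corpus example above).
    return [t.lstrip().startswith(("-", "–")) for t in tweets]

def detection_scores(y_true, y_pred):
    # Subtask 1: F1 over the humorous (positive) class is the main
    # measure; accuracy is also reported.
    return f1_score(y_true, y_pred), accuracy_score(y_true, y_pred)

def funniness_rmse(stars_true, stars_pred):
    # Subtask 2: root-mean-squared error over the average-star values.
    return mean_squared_error(stars_true, stars_pred) ** 0.5

def vote_distribution(star_vote_counts):
    # Subtask 3: turn raw per-star vote counts into a percentage
    # distribution over the five star ratings.
    total = sum(star_vote_counts)
    return [count / total for count in star_vote_counts]

# Subtask 2 baseline: predict the middle of the scale for every tweet.
# rmse = funniness_rmse(stars_true, [3] * len(stars_true))
```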
Important Dates
- March 26th, 2018: release of the 16,000 training tweets.
- April 23rd, 2018: release of the 4,000 test tweets.
- May 7th, 2018 (extended from April 30th): results submission.
- May 14th, 2018 (extended from May 7th): publication of results.
- May 28th, 2018 (extended from May 21st): working notes paper submission.
- June 18th, 2018: notification of acceptance.
- June 27th, 2018: camera ready paper submission.
- September 18th, 2018: IberEval Workshop.
Data
The training and test data can be downloaded here. If you use this corpus, please cite (Castro et al., 2018a) or (Castro et al., 2018b).
Test data with gold annotations
You can find more information about this corpus here.
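As a rough sketch of loading the corpus, assuming a CSV layout whose columns mirror the fields in the example above (the file name and column names are guesses; check the downloaded files for the actual schema):

```python
import pandas as pd

# Hypothetical file and column names; adjust them to the real schema.
train = pd.read_csv("haha_2018_train.csv")
texts = train["text"].tolist()                      # tweet text
is_humor = train["is_humor"].astype(bool).tolist()  # subtask 1 labels
avg_stars = train["average_stars"].tolist()         # subtask 2 targets
```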
Contact
If you want to participate in this task, please join the Google Group hahaibereval2018; we will share news and important information about the task in that group. If you have any questions you would rather ask privately, contact us at hahapln@fing.edu.uy.
The organizers of the task are:
- Santiago Castro. Facultad de Ingeniería, Universidad de la República, Uruguay. Areas of research: Subjectivity Analysis and Humor Detection.
- Luis Chiruzzo. Facultad de Ingeniería, Universidad de la República, Uruguay. Areas of research: Subjectivity Analysis, Syntactic Analysis.
- Aiala Rosá. Facultad de Ingeniería, Universidad de la República, Uruguay. Areas of research: Subjectivity Analysis, Event and Temporal Analysis.
We are part of the NLP research group at Instituto de Computación, Facultad de Ingeniería, Universidad de la República, Uruguay.
Bibliography
(Castro et al., 2018a) Castro, S., Chiruzzo, L., Rosá, A., Garat, D., & Moncecchi, G. (2018). A Crowd-Annotated Spanish Corpus for Humor Analysis. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media (pp. 7-11).
(Castro et al., 2018b) Castro, S., Chiruzzo, L., & Rosá, A. (2018). Overview of the HAHA Task: Humor Analysis based on Human Annotation at IberEval 2018. In Proceedings of the IberEval 2018 Workshop.
(Castro et al., 2016) Castro, S., Cubero, M., Garat, D., & Moncecchi, G. (2016). Is This a Joke? Detecting Humor in Spanish Tweets. In Ibero-American Conference on Artificial Intelligence (pp. 139-150). Springer International Publishing.
(Castro et al., 2017) Castro, S., Cubero, M., Garat, D., & Moncecchi, G. (2017). HUMOR: A Crowd-Annotated Spanish Corpus for Humor Analysis. arXiv preprint arXiv:1710.00477.
(Fleiss, 1971) Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5), 378.
(Mihalcea & Strapparava, 2005) Mihalcea, R., & Strapparava, C. (2005). Making Computers Laugh: Investigations in Automatic Humor Recognition. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT '05 (pp. 531–538). Association for Computational Linguistics, Vancouver, British Columbia, Canada.
(Sjöbergh & Araki, 2007) Sjöbergh, J., & Araki, K. (2007). Recognizing Humor Without Recognizing Meaning. In WILF (pp. 469–476). Springer.