Welcome to the 2019 edition of the shared task HAHA - Humor Analysis based on Human Annotation, a task to classify tweets in Spanish as humorous or not, and to determine how funny they are. This task is part of IberLEF 2019.

Introduction

While humor has historically been studied from psychological, cognitive and linguistic standpoints, its study from a computational perspective remains a largely unexplored area in Machine Learning and Computational Linguistics. There is some previous work (Mihalcea & Strapparava, 2005; Sjöbergh & Araki, 2007; Castro et al., 2016), but a characterization of humor that allows its automatic recognition and generation is far from being specified. The aim of this task is to gain better insight into what is humorous and what causes laughter.

There is past work on this topic. SemEval-2015 Task 11 addressed figurative language, such as metaphors and irony, but focused on Sentiment Analysis. SemEval-2017 Task 6 also presented a task similar to this one. This is the second edition of the HAHA task; the results of last year's edition are also available (Castro et al., 2018b).

The HAHA evaluation campaign proposes different subtasks related to automatic humor detection. In order to carry out the tasks, an annotated corpus of tweets in Spanish will be provided.

Corpus

We provide a corpus of crowd-annotated tweets based on (Castro et al., 2018a), split into 80% for training and 20% for testing. The annotation followed a voting scheme in which users could select one of six options: the tweet is not humorous, or the tweet is humorous and a score is given from one (not funny) to five (excellent).

All tweets are classified as humorous or not humorous. Humorous tweets received at least three votes indicating a number of stars, and at least five votes in total. Not humorous tweets received at least three votes for not humor (they might have fewer than five votes in total).

The corpus contains annotated tweets such as the following:

Text: – La semana pasada mi hijo hizo un triple salto mortal desde 20 metros de altura. – ¿Es trapecista? – Era :(
(English: – Last week my son did a triple somersault from a height of 20 meters. – Is he a trapeze artist? – He was :( )
Is humorous: True
Votes (not humor): 1
Votes (1 star): 0
Votes (2 stars): 1
Votes (3 stars): 2
Votes (4 stars): 0
Votes (5 stars): 1
Funniness score: 3.25
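The voting scheme described above can be sketched in Python. This is an illustrative sketch, not part of the official corpus tools; the function name and vote representation are assumptions.

```python
def label_tweet(not_humor_votes, star_votes):
    """Apply the corpus labeling rules to raw vote counts.

    not_humor_votes: number of "not humorous" votes.
    star_votes: list of five counts, star_votes[i] = votes for (i + 1) stars.
    Returns (is_humorous, funniness_score); the score is None when the
    tweet is not labeled humorous.
    """
    n_star_votes = sum(star_votes)
    total_votes = not_humor_votes + n_star_votes

    # Humorous: at least three star votes and at least five votes in total.
    if n_star_votes >= 3 and total_votes >= 5:
        weighted = sum(count * (stars + 1) for stars, count in enumerate(star_votes))
        return True, weighted / n_star_votes
    # Not humorous: at least three "not humor" votes.
    if not_humor_votes >= 3:
        return False, None
    # Otherwise the tweet meets neither threshold.
    return None, None

# The example tweet above: 1 "not humor" vote and star votes [0, 1, 2, 0, 1].
print(label_tweet(1, [0, 1, 2, 0, 1]))  # (True, 3.25)
```

For the example tweet, the funniness score is the average of the star votes: (2 + 3 + 3 + 5) / 4 = 3.25, matching the annotation shown above.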

Task description

Based on tweets written in Spanish, the following subtasks are proposed:

Humor Detection (Task 1): determining whether a tweet is a joke or not. Systems are ranked by F1 score on the humorous class; precision, recall and accuracy are also reported.

Funniness Score Prediction (Task 2): predicting the funniness score (average number of stars) of a humorous tweet. Systems are ranked by root mean squared error (RMSE).

Data

The training and test data can be downloaded here. If you use this corpus, please cite (Castro et al., 2018a) or (Chiruzzo et al., 2019).

Training data

Test data used in the competition (without annotations and with autogenerated tweet ids)

Test data with gold annotations

You can find more information about this corpus here.
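The exact file format is documented with the download. Assuming a CSV layout with one tweet per row (the column names below are illustrative, not the official ones), the data can be read with Python's standard library:

```python
import csv
import io

# A two-row stand-in for the training file; the real columns may differ.
sample = io.StringIO(
    "id,text,is_humor,funniness_average\n"
    '1,"- ¿Es trapecista? - Era :(",1,3.25\n'
    '2,"Hoy llueve en Montevideo.",0,\n'
)

rows = list(csv.DictReader(sample))
texts = [row["text"] for row in rows]
labels = [int(row["is_humor"]) for row in rows]
# A funniness score is only defined for humorous tweets (Task 2).
scores = [float(row["funniness_average"]) for row in rows if row["is_humor"] == "1"]

print(labels)  # [1, 0]
print(scores)  # [3.25]
```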

Results

The following are the results for Task 1:
Team             F1    Precision  Recall  Accuracy
adilism          82.1  79.1       85.2    85.5
Kevin & Hiromi   81.6  80.2       83.1    85.4
bfarzin          81.0  78.2       83.9    84.6
jamestjw         79.8  79.3       80.4    84.2
INGEOTEC         78.8  75.8       81.9    82.8
BLAIR GMU        78.4  74.5       82.7    82.2
UO UPV2          77.3  78.0       76.5    82.4
vaduvabogdan     77.2  72.9       82.0    81.1
UTMN             76.0  75.6       76.5    81.2
LaSTUS/TALN      75.9  77.4       74.5    81.6
Taha             75.7  81.0       71.1    82.2
LadyHeidy        72.5  74.4       70.8    79.1
Aspie96          71.1  67.8       74.9    76.3
OFAI–UKP         66.0  58.8       75.3    69.8
acattle          64.0  68.3       60.2    73.6
jmeaney          63.6  61.3       66.1    70.5
garain           59.3  49.1       74.8    59.9
Amrita CEN       49.5  47.8       51.4    59.1
random baseline  44.0  39.4       49.7    50.5
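As a reference for how the Task 1 columns are computed, the four metrics can be obtained from gold labels and system predictions as follows (the toy data below is illustrative, not actual system output):

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for the positive (humorous) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(y_true)
    return precision, recall, f1, accuracy

# Toy example: 6 gold labels vs. 6 predicted labels.
p, r, f1, acc = binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
print(round(p, 3), round(r, 3), round(f1, 3), round(acc, 3))  # 0.667 0.667 0.667 0.667
```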
The following are the results for Task 2:
Team             RMSE
adilism          0.736
bfarzin          0.746
Kevin & Hiromi   0.769
jamestjw         0.798
INGEOTEC         0.822
BLAIR GMU        0.910
LaSTUS/TALN      0.919
UTMN             0.945
acattle          0.963
Amrita CEN       1.074
garain           1.653
Aspie96          1.673
OFAI–UKP         1.810
random baseline  2.455
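Task 2 is scored with root mean squared error (RMSE) between the predicted and gold funniness scores; a minimal reference implementation (with toy scores, not actual system output):

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between gold and predicted funniness scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Predicting the gold scores exactly gives RMSE 0; a constant prediction
# of 3.0 against these gold scores gives about 0.661.
print(rmse([3.25, 2.5, 4.0], [3.25, 2.5, 4.0]))            # 0.0
print(round(rmse([3.25, 2.5, 4.0], [3.0, 3.0, 3.0]), 3))   # 0.661
```

Lower RMSE is better, which is why the Task 2 table above is sorted in ascending order.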

Contact

If you want to participate in this task, please join the Google Group hahaiberlef2019, where we will share news and important information about the task. If you have any questions you would prefer to ask privately, contact us at hahapln@fing.edu.uy.

The organizers of the task are:

We are part of the NLP research group at Instituto de Computación, Facultad de Ingeniería, Universidad de la República, Uruguay.

Bibliography

(Chiruzzo et al., 2019) Chiruzzo, L., Castro, S., Etcheverry, M., Garat, D., Prada, J. J., & Rosá, A. (2019). Overview of HAHA at IberLEF 2019: Humor Analysis based on Human Annotation. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019). CEUR Workshop Proceedings, CEUR-WS, Bilbao, Spain (9 2019).

(Castro et al., 2018a) Castro, S., Chiruzzo, L., Rosá, A., Garat, D., & Moncecchi, G. (2018). A crowd-annotated spanish corpus for humor analysis. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media (pp. 7-11).

(Castro et al., 2018b) Castro, S., Chiruzzo, L., & Rosá, A. (2018). Overview of the HAHA Task: Humor Analysis based on Human Annotation at IberEval 2018.

(Castro et al., 2016) Castro, S., Cubero, M., Garat, D., & Moncecchi, G. (2016). Is This a Joke? Detecting Humor in Spanish Tweets. In Ibero-American Conference on Artificial Intelligence (pp. 139-150). Springer International Publishing.

(Castro et al., 2017) Castro, S., Cubero, M., Garat, D., & Moncecchi, G. (2017). HUMOR: A Crowd-Annotated Spanish Corpus for Humor Analysis. arXiv preprint arXiv:1710.00477.

(Fleiss, 1971) Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5), 378.

(Mihalcea & Strapparava, 2005) Mihalcea, R., & Strapparava, C. (2005). Making Computers Laugh: Investigations in Automatic Humor Recognition. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. HLT '05 (pp. 531–538). Association for Computational Linguistics, Vancouver, British Columbia, Canada.

(Sjöbergh & Araki, 2007) Sjöbergh, J., & Araki, K. (2007). Recognizing Humor Without Recognizing Meaning. In WILF (pp. 469–476). Springer.