Welcome to the 2021 edition of the shared task HAHA (Humor Analysis based on Human Annotation), a task to classify tweets in Spanish as humorous or not, and to deepen the analysis by determining different characteristics of the tweets considered humorous. This task is part of IberLEF 2021.

News

Introduction

While humor has historically been studied from psychological, cognitive, and linguistic standpoints, its study from a computational perspective is an active area of research in Machine Learning and Computational Linguistics. There exist some previous works (Mihalcea & Strapparava, 2005; Sjöbergh & Araki, 2007; Castro et al., 2016), but a characterization of humor that allows its automatic recognition and generation is far from being specified. The aim of this task is to gain better insight into what is humorous and what causes laughter, and we propose to go further in the direction of analyzing humor structure and content. The task also aims to advance the field of Computational Humor by providing a large dataset for humor researchers to explore theories of humor in Spanish. The expected target audience is NLP researchers interested in providing understanding and advances in highly subjective tasks, though anybody is welcome to participate.

Figurative language, and humor specifically, has been a productive area for shared tasks for several years. SemEval-2015 Task 11 addressed one of the challenges posed by figurative language such as metaphor and irony: its impact on Sentiment Analysis. SemEval-2017 Task 6 presented humorous tweets submitted to a comedy program, and asked competitors to predict the ranking that the program's audience and producers gave the tweets. Grupo PLN-InCo has organized two previous editions of the HAHA task: at IberEval 2018 (Castro et al., 2018b) and at IberLEF 2019 (Chiruzzo et al., 2019). Both consisted of two subtasks: Humor Detection and Funniness Score Prediction. SemEval-2021 Task 7 (Meaney et al., 2021) is an ongoing task that combines humor detection with offense detection. It proposes the same subtasks as HAHA 2018 and 2019, and adds two additional tasks: Offense Score Prediction and Controversial Humor Classification.

The HAHA evaluation campaign proposes different subtasks related to automatic humor detection. In order to carry out the tasks, an annotated corpus of tweets in Spanish is provided. This year's edition also proposes two new subtasks with the aim of deepening our understanding of computational humor.

Task description

We propose four subtasks: two are analogous to the subtasks of HAHA 2018 and 2019, while the other two are new. Based on tweets written in Spanish, the following subtasks are proposed:

- Task 1, Humor Detection: determine whether a tweet is humorous.
- Task 2, Funniness Score Prediction: predict the average funniness score of a humorous tweet.
- Task 3, Humor Mechanism Classification: predict the mechanism by which a tweet conveys humor (e.g. wordplay, irony).
- Task 4, Humor Content Classification: predict the content categories a humorous tweet draws on (e.g. ethnicity/origin).

Corpus

For tasks 1 and 2, we provide a corpus of crowd-annotated tweets separated into three subsets: training (24,000 tweets), development (6,000 tweets), and testing (6,000 tweets). The annotation uses a voting scheme in which annotators could select one of six options (Castro et al., 2018a; Chiruzzo et al., 2020): the tweet is not humorous, or the tweet is humorous and receives a score from one (not funny) to five (excellent). A part of each subset is also manually annotated with the information used in tasks 3 and 4.

The corpus contains annotated tweets such as the following:

| | Example 1 | Example 2 |
|---|---|---|
| Text | Tips para que tu cita salga bien: —Ponerla entre comillas. —Pon el nombre del autor al final. ¿Qué esperaban? ¿Consejos de amor? | Gracias por comunicarse con INADI: Si es negro marque el 1. Si es chino marque el 2. Si es puto marque el 3 |
| Is humorous | True | True |
| Votes: Not humor | 1 | 1 |
| Votes: 1 star | 0 | 1 |
| Votes: 2 stars | 0 | 0 |
| Votes: 3 stars | 1 | 1 |
| Votes: 4 stars | 2 | 2 |
| Votes: 5 stars | 1 | 0 |
| Average score | 4 | 3 |
| Humor mechanism | wordplay | irony |
| Humor content | none | ethnicity/origin, lgbt |
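Under this voting scheme, the average score can be read as the mean of the star votes, with "not humor" votes excluded; a minimal sketch (the function name is ours, not part of the released data):

```python
def average_score(star_votes):
    """Mean funniness over the 1-5 star votes; 'not humor' votes are excluded."""
    total = sum(stars * count for stars, count in star_votes.items())
    n = sum(star_votes.values())
    return total / n if n else 0.0

# Votes for the first example tweet above: one 3-star, two 4-star, one 5-star.
print(average_score({1: 0, 2: 0, 3: 1, 4: 2, 5: 1}))  # → 4.0
# Second example: one 1-star, one 3-star, two 4-star.
print(average_score({1: 1, 2: 0, 3: 1, 4: 2, 5: 0}))  # → 3.0
```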

Important Dates

Data

The data is available and can be downloaded from the Codalab page. If you use this data, please cite the following publication:

Chiruzzo, L., Castro, S., Góngora, S., Rosá, A., Meaney, J. A., & Mihalcea, R. (2021). Overview of HAHA at IberLEF 2021: Detecting, Rating and Analyzing Humor in Spanish. Procesamiento del Lenguaje Natural, 67, 257-268.
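The released files can be read with standard CSV tooling. The sketch below parses a two-row in-memory sample mirroring the fields shown in the corpus example; the actual column names in the Codalab files may differ, so treat them as assumptions:

```python
import csv
import io

# Hypothetical layout: one row per tweet, with vote counts and average score.
# Column names here are illustrative, not the official schema.
sample = '''id,text,is_humor,votes_no,votes_1,votes_2,votes_3,votes_4,votes_5,average_score
1,"Tips para que tu cita salga bien: ...",1,1,0,0,1,2,1,4.0
2,"Gracias por comunicarse con INADI: ...",1,1,1,0,1,2,0,3.0
'''

rows = list(csv.DictReader(io.StringIO(sample)))
humorous = [r for r in rows if r["is_humor"] == "1"]
mean_score = sum(float(r["average_score"]) for r in humorous) / len(humorous)
print(len(humorous), mean_score)  # 2 humorous tweets, mean average score 3.5
```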

Results

The following are the final results for the competition:
| User | Team | Task 1 F1 | Task 2 RMSE | Task 3 M-F1 | Task 4 M-F1 |
|---|---|---|---|---|---|
| TanishqGoel | Jocoso | 0.8850 (1) | 0.6296 (3) | 0.2916 (2) | 0.3578 (2) |
| icc | icc | 0.8716 (2) | 0.6853 (9) | 0.2522 (3) | 0.3110 (4) |
| moradnejad | ColBERT | 0.8696 (3) | 0.6246 (2) | 0.2060 (7) | 0.3099 (5) |
| kuiyongyi | | 0.8681 (4) | 0.6797 (8) | 0.2187 (5) | 0.2836 (6) |
| jgcarrasco | noda risa | 0.8654 (5) | - | - | - |
| Neakail | BERT4EVER | 0.8645 (6) | 0.6587 (4) | 0.3396 (1) | 0.4228 (1) |
| Mjason | RoMa | 0.8583 (7) | 1.1975 (11) | - | - |
| JAGD | UMUTeam | 0.8544 (8) | 0.6226 (1) | 0.2087 (6) | 0.3225 (3) |
| skblaz | | 0.8156 (9) | 0.6668 (6) | 0.2355 (4) | 0.2295 (7) |
| sgp55 | humBERTor | 0.8115 (10) | - | - | - |
| antoniorv6 | RoBERToCarlos | 0.7961 (11) | 0.8602 (10) | 0.0128 (10) | 0.0000 (9) |
| sarasmadi | N&&N | 0.7693 (12) | - | 0.0404 (9) | - |
| ayushnanda14 | TECHSSN | 0.7679 (13) | 0.6639 (5) | - | - |
| kdehumor | KdeHumor | 0.7441 (14) | 1.5164 (12) | - | - |
| baseline | baseline | 0.6619 (15) | 0.6704 (7) | 0.1001 (8) | 0.0527 (8) |
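The three metrics used for ranking (F1 for Task 1, RMSE for Task 2, macro-averaged F1 for Tasks 3 and 4) can be sketched in plain Python; the toy gold labels and predictions below are illustrative only, not taken from the competition data:

```python
import math

def f1(gold, pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def rmse(gold, pred):
    """Root mean squared error for funniness score prediction."""
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold))

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all observed classes."""
    labels = set(gold) | set(pred)
    return sum(f1(gold, pred, positive=c) for c in labels) / len(labels)

# Toy examples:
print(f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))                     # → 0.8
print(rmse([4.0, 3.0, 2.5], [3.5, 3.0, 2.0]))                   # ≈ 0.408
print(macro_f1(["wordplay", "irony", "wordplay"],
               ["wordplay", "wordplay", "wordplay"]))           # → 0.4
```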

Contact

We use the Codalab platform to manage participants and submissions. If you have any questions, contact us at hahapln@fing.edu.uy.

The organizers of the task are:

Bibliography

(Attardo & Raskin, 1991) Attardo, S., & Raskin, V. (1991). Script Theory Revis(it)ed: Joke similarity and joke representation model. Humor: International Journal of Humor Research.

(Attardo et al., 2002) Attardo, S., Hempelmann, C. F., & Di Maio, S. (2002). Script oppositions and logical mechanisms: Modeling incongruities and their resolutions. Humor, 15(1), 3–46.

(Castro et al., 2016) Castro, S., Cubero, M., Garat, D., & Moncecchi, G. (2016). Is This a Joke? Detecting Humor in Spanish Tweets. In Ibero-American Conference on Artificial Intelligence (pp. 139-150). Springer International Publishing.

(Castro et al., 2018a) Castro, S., Chiruzzo, L., Rosá, A., Garat, D., & Moncecchi, G. (2018). A crowd-annotated spanish corpus for humor analysis. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media (pp. 7-11).

(Castro et al., 2018b) Castro, S., Chiruzzo, L., & Rosá, A. (2018). Overview of the HAHA Task: Humor Analysis based on Human Annotation at IberEval 2018.

(Chiruzzo et al., 2019) Chiruzzo, L., Castro, S., Etcheverry, M., Garat, D., Prada, J. J., & Rosá, A. (2019). Overview of HAHA at IberLEF 2019: Humor Analysis based on Human Annotation. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019). CEUR Workshop Proceedings, CEUR-WS, Bilbao, Spain (9 2019).

(Chiruzzo et al., 2020) Chiruzzo, L., Castro, S., & Rosá, A. (2020, May). HAHA 2019 Dataset: A Corpus for Humor Analysis in Spanish. In Proceedings of The 12th Language Resources and Evaluation Conference (pp. 5106-5112).

(Fleiss, 1971) Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5), 378.

(Mihalcea & Strapparava, 2005) Mihalcea, R., & Strapparava, C. (2005). Making Computers Laugh: Investigations in Automatic Humor Recognition. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. HLT ’05, (pp. 531–538). Association for Computational Linguistics, Vancouver, British Columbia, Canada

(Sjöbergh & Araki, 2007) Sjöbergh, J., & Araki, K. (2007). Recognizing Humor Without Recognizing Meaning. In WILF (pp. 469–476). Springer.