HAHA - Humor Analysis based on Human Annotation

Welcome to the 2026 edition of the shared task HAHA - Humor Analysis based on Human Annotation and Automatic Humor Generation, a task to classify, analyze, generate, and detect automatic humor in Spanish. This task is part of IberLEF 2026.

News

June 10, 2026. The final results for task 3 are published on the results page.
June 4, 2026. The final results for tasks 1 and 2 are published on the results page.
May 27, 2026. The evaluation phase begins. The test data has been published, and the development phase results can be seen here.
May 21, 2026. All participants who submitted systems are invited to write their working notes. The instructions for writing the papers are here.
April 8, 2026. The data to be used during the development phase is released. You can download it from the CodaBench page.
March 18, 2026. The CodaBench page for the competition is available: https://www.codabench.org/competitions/14700/. Registration is open!

Introduction

While humor has historically been studied from psychological, cognitive, and linguistic perspectives, its computational study is an active area of research in Machine Learning and Computational Linguistics that has been gaining traction in recent years. Starting with early work (Mihalcea & Strapparava, 2005; Sjöbergh & Araki, 2007; Castro et al., 2016), and continuing with the HAHA, HaHackathon, MWAHAHA and related tasks, progress has been made mainly in automatic humor detection and classification, but a characterization of humor that allows its automatic recognition and generation is still far from being achieved. This task aims to gain better insight into what is humorous and what causes laughter, and to take some steps forward by assessing the capabilities of current LLMs to generate actual humorous content in Spanish and by examining whether it is possible to automatically distinguish between computer-generated humor and humor written by humans. The target audience is NLP researchers interested in providing understanding and advances in highly subjective tasks, though anybody is welcome to participate.

Figurative language, and humor in particular, has been a productive area of research for shared tasks for several years. SemEval-2015 Task 11 addressed one of the challenges posed by figurative language such as metaphors and irony: its impact on Sentiment Analysis. SemEval-2017 Task 6 presented humorous tweets submitted to a comedy program, and asked competitors to predict the ranking that the comedy program’s audience and producers gave the tweets. Grupo PLN-InCo has organized three editions of the HAHA task: at IberEVAL 2018 (Castro et al., 2018b), at IberLEF 2019 (Chiruzzo et al., 2019) and at IberLEF 2021 (Chiruzzo et al., 2021). All three editions consisted of subtasks related to humor detection and analysis. SemEval-2021 Task 7 (Meaney et al., 2021) combined humor detection with offense detection. Later on, Labadie Tamayo et al. (2023) organized the HUHU task at IberLEF 2023, focusing specifically on detecting and classifying hurtful humor. Interest in computational humor generation is starting to gain traction: MWAHAHA, SemEval 2026 Task 1 on Humor Generation, is the first task focused exclusively on generation, and another lab later began organizing a similar task for Arabic called ARHAHA.

This year the HAHA evaluation campaign proposes different subtasks related to automatic humor detection and generation, with the aim of deepening our understanding of computational humor.

Task description

There are three subtasks for this track. One of them is similar to the first subtasks proposed in HAHA 2018, 2019, and 2021, but over a new domain, while the other two are new:

Humor Detection: determining if a news headline is a joke or not (satirical news headlines vs real news headlines). The main performance metric of this subtask will be the F1 score of the 'humorous' class. This subtask is similar to the first subtask proposed in previous editions of the HAHA shared task, but this time it's applied to a particular domain where humorous and non-humorous content might sometimes be difficult to tell apart.
LLM-generated humor detection: determining if a joke inspired by a news headline was generated by an LLM or written by a human. The main performance metric for this subtask will be the F1 score of the 'automatic' class.
Humor Generation: generating jokes from a news headline using computational methods. This subtask will be evaluated through human preference judgments, employing LLM arena-style battles between pairs of generated jokes, and ranking the systems using an Elo-based leaderboard.
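
As a rough illustration of the metrics mentioned above, the following Python sketch computes the F1 score of a single target class (as in subtasks 1 and 2) and builds Elo ratings from pairwise battles (as in subtask 3). It is not the official scorer: the function names, the 'not_humorous' label, the K-factor of 32, the initial rating of 1000, and the absence of tie handling are all assumptions made for the example, and the actual evaluation on CodaBench may differ.

```python
from typing import Dict, Iterable, Sequence, Tuple


def f1_for_class(gold: Sequence[str], pred: Sequence[str], positive: str) -> float:
    """F1 of one class: harmonic mean of that class's precision and recall."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def elo_ratings(
    battles: Iterable[Tuple[str, str]],
    k: float = 32.0,          # assumed K-factor, not taken from the task page
    initial: float = 1000.0,  # assumed starting rating for every system
) -> Dict[str, float]:
    """Update Elo ratings from (winner, loser) pairs, processed in order."""
    ratings: Dict[str, float] = {}
    for winner, loser in battles:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        # Probability the winner was expected to win before this battle.
        expected_w = 1.0 / (1.0 + 10 ** ((r_l - r_w) / 400.0))
        delta = k * (1.0 - expected_w)
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings


if __name__ == "__main__":
    # Hypothetical labels; only 'humorous' is named in the task description.
    gold = ["humorous", "humorous", "not_humorous", "not_humorous"]
    pred = ["humorous", "not_humorous", "humorous", "not_humorous"]
    print(f1_for_class(gold, pred, positive="humorous"))

    # Hypothetical systems and human preference outcomes.
    print(elo_ratings([("system_a", "system_b"), ("system_a", "system_c")]))
```

Because sequential Elo updates depend on the order in which battles are processed, arena-style leaderboards often shuffle or resample the battles, or fit the ratings jointly; the sketch above keeps the simplest online update for readability.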

Participants may take part in any number of subtasks; submitting results for all of them is not mandatory.

Corpus

To download the data, you must register on the CodaBench page.

Important Dates

March 18th, 2026: team registration page opens.
~~April 1st,~~ April 8th, 2026: development sets released and open for dev submissions.
May 27th, 2026: test sets released and open for test submissions.
June 3rd, 2026: end of test submissions, publication of results of subtasks 1 and 2.
June 10th, 2026: publication of results of subtask 3.
June 12th, 2026: paper submission.
June 23rd, 2026: notification of acceptance.
July 1st, 2026: camera-ready paper submission.
September, 2026: IberLEF 2026 Workshop.

Data

To download the data, you must register on the CodaBench page.

Results

The final results for the shared task can be seen here.

Contact

The organizers of the task are:

Luis Chiruzzo. Facultad de Ingeniería, Universidad de la República, Uruguay.
Santiago Castro. Facultad de Ingeniería, Universidad de la República, Uruguay.
Santiago Góngora. Facultad de Ingeniería, Universidad de la República, Uruguay.
Guillermo Moncecchi. Facultad de Ingeniería, Universidad de la República, Uruguay.
Aiala Rosá. Facultad de Ingeniería, Universidad de la República, Uruguay.
Ignacio Sastre. Facultad de Ingeniería, Universidad de la República, Uruguay.
Agustín Martínez. Facultad de Ingeniería, Universidad de la República, Uruguay.
Guillermo Rey. Facultad de Ingeniería, Universidad de la República, Uruguay.
Juan Pablo Conde. Facultad de Ingeniería, Universidad de la República, Uruguay.
Victoria Amoroso. Facultad de Ingeniería, Universidad de la República, Uruguay.
Juan José Prada. Facultad de Ingeniería, Universidad de la República, Uruguay.

Previous Editions

Bibliography

(Attardo & Raskin, 1991) Attardo, S., & Raskin, V. (1991). Script Theory Revis(it)ed: Joke similarity and joke representation model. Humor: International Journal of Humor Research.

(Attardo et al., 2002) Attardo, S., Hempelmann, C. F., & Di Maio, S. (2002). Script oppositions and logical mechanisms: Modeling incongruities and their resolutions. Humor, 15(1), 3–46.

(Castro et al., 2016) Castro, S., Cubero, M., Garat, D., & Moncecchi, G. (2016). Is This a Joke? Detecting Humor in Spanish Tweets. In Ibero-American Conference on Artificial Intelligence (pp. 139-150). Springer International Publishing.

(Castro et al., 2018a) Castro, S., Chiruzzo, L., Rosá, A., Garat, D., & Moncecchi, G. (2018). A Crowd-Annotated Spanish Corpus for Humor Analysis. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media (pp. 7-11).

(Castro et al., 2018b) Castro, S., Chiruzzo, L., & Rosá, A. (2018). Overview of the HAHA Task: Humor Analysis based on Human Annotation at IberEval 2018.

(Chiruzzo et al., 2019) Chiruzzo, L., Castro, S., Etcheverry, M., Garat, D., Prada, J. J., & Rosá, A. (2019). Overview of HAHA at IberLEF 2019: Humor Analysis based on Human Annotation. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019). CEUR Workshop Proceedings, CEUR-WS, Bilbao, Spain, September 2019.

(Chiruzzo et al., 2020) Chiruzzo, L., Castro, S., & Rosá, A. (2020, May). HAHA 2019 Dataset: A Corpus for Humor Analysis in Spanish. In Proceedings of The 12th Language Resources and Evaluation Conference (pp. 5106-5112).

(Chiruzzo et al., 2021) Chiruzzo, L., Castro, S., Góngora, S., Rosá, A., Meaney, J. A., & Mihalcea, R. (2021). Overview of HAHA at IberLEF 2021: Detecting, Rating and Analyzing Humor in Spanish. Procesamiento del Lenguaje Natural, 67, 257-268.

(Fleiss, 1971) Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological bulletin, 76(5), 378.

(Mihalcea & Strapparava, 2005) Mihalcea, R., & Strapparava, C. (2005). Making Computers Laugh: Investigations in Automatic Humor Recognition. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, HLT ’05 (pp. 531–538). Association for Computational Linguistics, Vancouver, British Columbia, Canada.

(Sjöbergh & Araki, 2007) Sjöbergh, J., & Araki, K. (2007). Recognizing Humor Without Recognizing Meaning. In WILF (pp. 469–476). Springer.