Welcome to the 2021 edition of the shared task HAHA (Humor Analysis based on Human Annotation), a task to classify tweets in Spanish as humorous or not and to deepen the analysis of humor by determining different characteristics of the tweets considered humorous. This task is part of IberLEF 2021.



While humor has historically been studied from psychological, cognitive, and linguistic standpoints, its study from a computational perspective is an active area of research in Machine Learning and Computational Linguistics. Some previous works exist (Mihalcea & Strapparava, 2005; Sjöbergh & Araki, 2007; Castro et al., 2016), but a characterization of humor that allows its automatic recognition and generation is still far from established. The aim of this task is to gain better insight into what is humorous and what causes laughter, going further in the direction of analyzing humor structure and content. It also aims to advance the field of Computational Humor by providing a large dataset for humor researchers to explore theories of humor in Spanish. The expected target audience is NLP researchers interested in advancing the understanding of highly subjective tasks, though anybody is welcome to participate.

Figurative language, and humor in particular, has been a productive area of research for shared tasks for several years. SemEval-2015 Task 11 addressed one of the challenges posed by figurative language such as metaphor and irony: its impact on Sentiment Analysis. SemEval-2017 Task 6 presented humorous tweets submitted to a comedy program and asked participants to predict the ranking that the program's audience and producers gave them. Grupo PLN-InCo has organized two previous editions of the HAHA task: at IberEval 2018 (Castro et al., 2018b) and at IberLEF 2019 (Chiruzzo et al., 2019). Both consisted of two subtasks: Humor Detection and Funniness Score Prediction. SemEval-2021 Task 7 (Meaney et al., 2021) is an ongoing task that combines humor detection with offense detection. It proposes the same subtasks as HAHA 2018 and 2019 and adds two more: Offense Score Prediction and Controversial Humor Classification.

The HAHA evaluation campaign proposes different subtasks related to automatic humor detection. In order to carry out the tasks, an annotated corpus of tweets in Spanish is provided. This year's edition also proposes two new subtasks with the aim of deepening our understanding of computational humor.

Task description

We propose four subtasks: two of them are analogous to the subtasks proposed in HAHA 2018 and 2019, while the other two are new. Based on tweets written in Spanish, the following subtasks are proposed:
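
1. Humor Detection: determining whether a tweet is humorous or not.
2. Funniness Score Prediction: predicting the average funniness score, from one to five, that annotators gave a tweet.
3. Humor Mechanism Classification: predicting the mechanism by which a humorous tweet conveys humor, such as wordplay or irony.
4. Humor Target Classification: predicting the content or target of the joke, such as ethnicity/origin or lgbt. A tweet can have more than one.
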
For tasks 1 and 2, we provide a corpus of crowd-annotated tweets split into three subsets: training (24,000 tweets), development (6,000 tweets), and testing (6,000 tweets). The annotation uses a voting scheme in which users could select one of six options (Castro et al., 2018a; Chiruzzo et al., 2020). Either the tweet is not humorous, or it is humorous and receives a score from one (not funny) to five (excellent). Part of each subset is also manually annotated with the information used in tasks 3 and 4.

The corpus contains annotated tweets such as the following:

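Tweet 1:
Tips para que tu cita salga bien:
—Ponerla entre comillas.
—Pon el nombre del autor al final.
¿Qué esperaban? ¿Consejos de amor?
(English: "Tips for making your cita go well: Put it in quotation marks. Put the author's name at the end. What did you expect? Love advice?" The joke plays on "cita", which means both "date" and "quotation".)

Tweet 2:
Gracias por comunicarse con INADI:
Si es negro marque el 1. Si es chino marque el 2. Si es puto marque el 3
(English: "Thank you for contacting INADI: If he is black, press 1. If he is Chinese, press 2. If he is gay, press 3." Here "puto" is a homophobic slur. INADI is Argentina's national institute against discrimination.)

Field | Tweet 1 | Tweet 2
Is humorous | True | True
Votes: not humor | 1 | 1
Votes: 1 star | 0 | 1
Votes: 2 stars | 0 | 0
Votes: 3 stars | 1 | 1
Votes: 4 stars | 2 | 2
Votes: 5 stars | 1 | 0
Average score | 4 | 3
Humor mechanism | wordplay | irony
Humor content | none | ethnicity/origin, lgbt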
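
In both examples the average score equals the mean of the star votes alone, with the "not humor" votes left out. For the first tweet this is (3×1 + 4×2 + 5×1) / 4 = 4, and for the second (1×1 + 3×1 + 4×2) / 4 = 3. The minimal Python sketch below reproduces that arithmetic. The function name `average_score` and the five-entry vote list are illustrative assumptions, not the official data format or tooling.

```python
from typing import Sequence


def average_score(star_votes: Sequence[int]) -> float:
    """Mean funniness score computed from star vote counts.

    star_votes[i] is the number of annotators who gave i + 1 stars,
    so the sequence has five entries (1 to 5 stars). "Not humor"
    votes are deliberately excluded, matching the example table.
    """
    if len(star_votes) != 5:
        raise ValueError("expected vote counts for 1 to 5 stars")
    total_votes = sum(star_votes)
    if total_votes == 0:
        raise ValueError("no star votes, so the score is undefined")
    # Weight each vote count by its star value (1..5).
    weighted_sum = sum(stars * count for stars, count in enumerate(star_votes, start=1))
    return weighted_sum / total_votes


# Star vote counts (1 to 5 stars) copied from the two example tweets.
tweet_1 = [0, 0, 1, 2, 1]
tweet_2 = [1, 0, 1, 2, 0]
print(average_score(tweet_1))  # 4.0
print(average_score(tweet_2))  # 3.0
```

Including the "not humor" votes as zeros would give 16 / 5 = 3.2 for the first tweet, which does not match the reported score. Excluding them is therefore the reading used here.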

Important Dates


The data is available and can be downloaded from the Codalab page. If you use this data, please cite the following publication:

Chiruzzo, L., Castro, S., Góngora, S., Rosá, A., Meaney, J. A., & Mihalcea, R. (2021). Overview of HAHA at IberLEF 2021: Detecting, Rating and Analyzing Humor in Spanish. Procesamiento del Lenguaje Natural, 67, 257-268.


The following are the final results for the competition:
User | Team | Task 1 F1 | Task 2 RMSE | Task 3 Macro-F1 | Task 4 Macro-F1
TanishqGoel | Jocoso | 0.8850 (1) | 0.6296 (3) | 0.2916 (2) | 0.3578 (2)
icc | icc | 0.8716 (2) | 0.6853 (9) | 0.2522 (3) | 0.3110 (4)
moradnejad | ColBERT | 0.8696 (3) | 0.6246 (2) | 0.2060 (7) | 0.3099 (5)
kuiyongyi | | 0.8681 (4) | 0.6797 (8) | 0.2187 (5) | 0.2836 (6)
jgcarrasco | noda risa | 0.8654 (5) | - | - | -
Neakail | BERT4EVER | 0.8645 (6) | 0.6587 (4) | 0.3396 (1) | 0.4228 (1)
Mjason | RoMa | 0.8583 (7) | 1.1975 (11) | - | -
JAGD | UMUTeam | 0.8544 (8) | 0.6226 (1) | 0.2087 (6) | 0.3225 (3)
skblaz | | 0.8156 (9) | 0.6668 (6) | 0.2355 (4) | 0.2295 (7)
sgp55 | humBERTor | 0.8115 (10) | - | - | -
antoniorv6 | RoBERToCarlos | 0.7961 (11) | 0.8602 (10) | 0.0128 (10) | 0.0000 (9)
sarasmadi | N&&N | 0.7693 (12) | - | 0.0404 (9) | -
ayushnanda14 | TECHSSN | 0.7679 (13) | 0.6639 (5) | - | -
kdehumor | KdeHumor | 0.7441 (14) | 1.5164 (12) | - | -
baseline | baseline | 0.6619 (15) | 0.6704 (7) | 0.1001 (8) | 0.0527 (8)

The rank within each task is given in parentheses, and "-" marks a task with no submission. For RMSE, lower is better.
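
The results use three metrics:

- F1 for the binary humor detection of Task 1.
- Root-mean-square error (RMSE) for the funniness scores of Task 2.
- M-F1, read here as macro-averaged F1, for Tasks 3 and 4.

The Python sketch below computes them from scratch using common textbook definitions, under these assumptions:

- F1 is taken on the positive (humorous) class.
- Macro-F1 is the unweighted mean of per-class F1 over the labels seen in the gold or predicted data.

It is not the official scorer, whose exact conventions may differ. In particular, it only covers the single-label case, and how the multi-label Task 4 was averaged is not specified here.

```python
import math
from typing import Hashable, Sequence


def _f1(tp: int, fp: int, fn: int) -> float:
    # F1 = 2TP / (2TP + FP + FN), taken as 0.0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def _check_lengths(gold: Sequence, pred: Sequence) -> None:
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")


def f1_binary(gold: Sequence[bool], pred: Sequence[bool]) -> float:
    """Task 1 style: F1 of the positive (humorous) class."""
    _check_lengths(gold, pred)
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    return _f1(tp, fp, fn)


def rmse(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Task 2 style: root-mean-square error between gold and predicted scores."""
    _check_lengths(gold, pred)
    if not gold:
        raise ValueError("empty input")
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold))


def macro_f1(gold: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Tasks 3/4 style: unweighted mean of per-class F1 (single-label case)."""
    _check_lengths(gold, pred)
    labels = list(dict.fromkeys([*gold, *pred]))  # de-duplicated, stable order
    if not labels:
        raise ValueError("empty input")
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        scores.append(_f1(tp, fp, fn))
    return sum(scores) / len(scores)


# Toy inputs for illustration only (not competition data).
print(f1_binary([True, True, False, False], [True, False, True, False]))  # 0.5
print(rmse([2.0, 4.0], [1.0, 5.0]))  # 1.0
print(round(macro_f1(["wordplay", "irony", "irony"],
                     ["wordplay", "irony", "wordplay"]), 4))  # 0.6667
```

Macro-F1 gives every class the same weight regardless of its frequency. This may help explain why scores for Tasks 3 and 4 are much lower than Task 1 F1: rare mechanisms or targets that a system misses pull the average down.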


We use the Codalab platform to manage participants and submissions. If you have any questions, contact us at hahapln@fing.edu.uy.

The organizers of the task are:


References

(Attardo & Raskin, 1991) Attardo, S., & Raskin, V. (1991). Script Theory Revis(it)ed: Joke similarity and joke representation model. Humor: International Journal of Humor Research.

(Attardo et al., 2002) Attardo, S., Hempelmann, C. F., & Di Maio, S. (2002). Script oppositions and logical mechanisms: Modeling incongruities and their resolutions. Humor, 15(1), 3–46.

(Castro et al., 2016) Castro, S., Cubero, M., Garat, D., & Moncecchi, G. (2016). Is This a Joke? Detecting Humor in Spanish Tweets. In Ibero-American Conference on Artificial Intelligence (pp. 139–150). Springer International Publishing.

(Castro et al., 2018a) Castro, S., Chiruzzo, L., Rosá, A., Garat, D., & Moncecchi, G. (2018). A crowd-annotated Spanish corpus for humor analysis. In Proceedings of the Sixth International Workshop on Natural Language Processing for Social Media (pp. 7–11).

(Castro et al., 2018b) Castro, S., Chiruzzo, L., & Rosá, A. (2018). Overview of the HAHA Task: Humor Analysis based on Human Annotation at IberEval 2018.

(Chiruzzo et al., 2019) Chiruzzo, L., Castro, S., Etcheverry, M., Garat, D., Prada, J. J., & Rosá, A. (2019). Overview of HAHA at IberLEF 2019: Humor Analysis based on Human Annotation. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019). CEUR Workshop Proceedings, CEUR-WS, Bilbao, Spain (September 2019).

(Chiruzzo et al., 2020) Chiruzzo, L., Castro, S., & Rosá, A. (2020, May). HAHA 2019 Dataset: A Corpus for Humor Analysis in Spanish. In Proceedings of the 12th Language Resources and Evaluation Conference (pp. 5106–5112).

(Fleiss, 1971) Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378.

(Mihalcea & Strapparava, 2005) Mihalcea, R., & Strapparava, C. (2005). Making Computers Laugh: Investigations in Automatic Humor Recognition. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing (HLT '05) (pp. 531–538). Association for Computational Linguistics, Vancouver, British Columbia, Canada.

(Sjöbergh & Araki, 2007) Sjöbergh, J., & Araki, K. (2007). Recognizing Humor Without Recognizing Meaning. In WILF (pp. 469–476). Springer.