SemEval-2025 Task 3: Mu-SHROOM, the Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes

The shared task focuses on detecting hallucinations and other overgeneration mistakes in the output of instruction-tuned large language models. Mu-SHROOM addresses general-purpose LLMs in 14 languages, and frames the hallucination detection problem as a span labeling task.
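The description above does not give the data format or scoring rule. As an illustration of what "span labeling" can mean here, the following is a minimal sketch. It assumes that each hallucinated span is a half-open `(start, end)` character offset into the model's output, and that a prediction is compared to the gold annotation by character-level intersection-over-union. The helper names `spans_to_chars` and `span_iou`, and the example sentence, are hypothetical and are not taken from the task.

```python
def spans_to_chars(spans):
    """Expand half-open (start, end) character spans into a set of character indices."""
    chars = set()
    for start, end in spans:
        chars.update(range(start, end))
    return chars


def span_iou(pred, gold):
    """Character-level intersection-over-union between predicted and gold spans."""
    p, g = spans_to_chars(pred), spans_to_chars(gold)
    if not p and not g:
        # Assumption: if neither side marks anything, the two labelings agree perfectly.
        return 1.0
    return len(p & g) / len(p | g)


# Hypothetical model output: the completion year is wrong (the tower was finished in 1889).
answer = "The Eiffel Tower was completed in 1925 in Paris."
gold = [(34, 38)]   # annotators mark only "1925"
pred = [(30, 38)]   # a detector marks " in 1925"

print(answer[34:38])          # 1925
print(span_iou(pred, gold))   # 0.5
```

Because the labels are character offsets instead of tokens, the same representation applies to all languages in the task, regardless of how each model tokenizes its output.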
