The Mu-SHROOM shared task focuses on detecting hallucinations and other overgeneration mistakes in the output of instruction-tuned large language models (LLMs). It covers general-purpose LLMs in 14 languages and frames hallucination detection as a span-labeling task.
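To make the span-labeling framing concrete, here is a minimal sketch in Python. It assumes hallucinations are marked as half-open character offsets `(start, end)` into the model output. The function names, the example sentence, and the use of intersection-over-union as an agreement score are illustrative assumptions, not the task's confirmed data format or official metric.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end); assumed format


def spans_to_labels(text: str, spans: List[Span]) -> List[int]:
    """Turn character spans into one binary label per character (1 = hallucinated)."""
    labels = [0] * len(text)
    for start, end in spans:
        # Clip each span to the bounds of the text before marking it.
        for i in range(max(start, 0), min(end, len(text))):
            labels[i] = 1
    return labels


def span_iou(text: str, pred_spans: List[Span], gold_spans: List[Span]) -> float:
    """Character-level intersection-over-union between predicted and gold spans."""
    pred = {i for i, flag in enumerate(spans_to_labels(text, pred_spans)) if flag}
    gold = {i for i, flag in enumerate(spans_to_labels(text, gold_spans)) if flag}
    if not pred and not gold:
        return 1.0  # both agree that nothing is hallucinated
    return len(pred & gold) / len(pred | gold)


# Hypothetical model output in which "Berlin" is the hallucinated span.
text = "The Eiffel Tower is located in Berlin."
start = text.index("Berlin")
gold_spans = [(start, start + len("Berlin"))]

# A hypothetical prediction that over-extends the span to "in Berlin".
pred_spans = [(text.index("in Berlin"), start + len("Berlin"))]

labels = spans_to_labels(text, gold_spans)
score = span_iou(text, pred_spans, gold_spans)
```

Representing spans as character offsets keeps the labels independent of any tokenizer, which is convenient when the same scheme has to work across many languages.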
Year: 2025

