AuTexTification: Automated Text Identification

This shared task was proposed to study: (i) the automatic detection of machine-generated text (MGT), (ii) the generalization capabilities of MGT detectors to new domains, and (iii) the feasibility of fine-grained MGT attribution to one of many generation models. Furthermore, a multi-domain annotated dataset of human-authored text and MGT generated by various large language models (LLMs) is provided, which is a valuable resource for exploratory linguistic analysis of machine-generated and human-authored texts. AuTexTification consists of two subtasks. In Subtask 1, participants had to determine whether a text is human-authored or has been generated by an LLM. In Subtask 2, participants had to attribute a machine-generated text to one of six different text generation models. Our AuTexTification 2023 dataset contains more than 160,000 texts across two languages (English and Spanish) and five domains (tweets, reviews, news, legal, and how-to articles).