(Named) entity recognition

MultiCoNER-ES

MultiCoNER is a large multilingual dataset for Named Entity Recognition (NER) that covers three domains (Wikipedia sentences, questions, and search queries) across 11 languages, plus multilingual and code-mixed subsets. It is designed to reflect contemporary challenges in NER: low-context scenarios (short, uncased text), syntactically complex entities such as movie titles, and long-tail entity distributions.
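The low-context challenge is easiest to see on a short, uncased query where a multi-word title has no capitalization cues. Below is a minimal sketch of turning token-level tags into entity spans. It assumes the annotations use the BIO tagging scheme over whitespace-separated tokens, which is common for NER corpora; check the actual file format of the release before relying on it. The function name `bio_to_spans`, the example query, and the tag names `CW` (creative work) and `CORP` (corporation) are illustrative assumptions, not read from the dataset files.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, surface_text) pairs from BIO-tagged tokens.

    Assumes tags look like "B-TYPE", "I-TYPE" or "O" (hypothetical scheme
    for illustration; verify against the real annotation format).
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def close_span() -> None:
        # Emit the open entity, if any, as a single space-joined string.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A B- tag, or an I- tag that does not continue the open span,
            # starts a new entity (lenient decoding of malformed sequences).
            close_span()
            current_type = tag[2:]
            current_tokens = [token]
        elif tag.startswith("I-"):
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" closes any open entity.
            close_span()
            current_type = None
            current_tokens = []

    close_span()
    return spans


# Hypothetical uncased Spanish search query containing a movie title.
query = ["ver", "el", "señor", "de", "los", "anillos", "en", "netflix"]
labels = ["O", "B-CW", "I-CW", "I-CW", "I-CW", "I-CW", "O", "B-CORP"]
print(bio_to_spans(query, labels))
# [('CW', 'el señor de los anillos'), ('CORP', 'netflix')]
```

Treating a stray `I-` tag as the start of a new entity is one possible design choice; a stricter decoder could instead discard such spans, which changes span-level scores on noisy predictions.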

DIANN-2023-ES

The corpus contains abstracts of scientific articles from biomedical Elsevier journals, collected between 2017 and 2018. It is split into two partitions: a training partition and an evaluation partition. The training partition contains 500 texts, which correspond to the training and evaluation partitions released for the DIANN competition at IberLEF 2018. The evaluation partition is a private test set of 100 texts.