DIANN-2023-EN

The corpus consists of abstracts of biomedical scientific articles from Elsevier journals, collected between 2017 and 2018. It is provided in two partitions: a training partition and an evaluation (test) partition. The training partition contains 500 texts, which correspond to the training and evaluation partitions made public for the DIANN competition at IberLEF 2018. The test partition contains 100 texts and is private: because it is used to evaluate systems on the ODESIA Leaderboard, it will not be made public. All disabilities mentioned in the texts are annotated in the corpus.

Language(s)
English
Year
2023
Domain
Health
Text types
Abstracts of scientific articles
Format
JSON

Number of units
600
Type of units
Documents
Tokens
108412
Documents
600
Training set size
500
Test set size
100

If you have published a result better than those on the list, send a message to odesia-comunicacion@lsi.uned.es stating the result and the DOI of the article, and attach a copy of the article if it is not openly available.