EvALL

EvALL 2.0 (Evaluate ALL 2.0) is an evaluation tool for information systems. It offers a wide set of metrics covering many evaluation contexts, including classification, ranking, and LeWiDi (learning with disagreements).

Persistence

The user can save evaluations, as well as retrieve past evaluations.

Replicability

All evaluations are conducted using the same methodology, making them strictly comparable.

Effectiveness

All metrics are grounded in measurement theory, and each has been implemented twice, with the results of the two implementations compared against each other.

Generalization

Generalization is achieved through a standardized input format that lets the user run evaluations in any of the supported evaluation contexts.

What can I do with EvALL?

Evaluation against repository

Evaluate your predictions against any of the tasks included in the EvALL 2.0 repository.

Evaluation against your own Gold Standard

Evaluate your predictions against your own Gold Standard in two simple steps: upload your files and select your metrics.

Evaluation Dashboard

View your results on a comprehensive evaluation dashboard where you can compare all your results: past, present, and future.

Metrics

Select from a wide range of metrics and evaluation contexts.

Analyze your results

Graphically view your results from different perspectives and capture the images to include them in your articles or projects.

Analysis Console

Analyze your results in detail through the PyEvALL console, where you can see errors in formats, analysis of metric preconditions, and much more.

Publish your results

Publish your best results on the leaderboard for each task in the EvALL 2.0 repository so that others can compare their systems against yours.

Publish your Gold Standard

Do you want your task to appear in the EvALL 2.0 repository? Send us the necessary information, and we will include it so that everyone can evaluate against it.

Evaluation Contexts

Mono-label classification

Accuracy, System Precision, Kappa, Precision, Recall, FMeasure, ICM, ICM Norm
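
To make two of the mono-label metrics listed above concrete, here is a minimal Python sketch of Accuracy and Kappa using their textbook definitions (Kappa taken here to mean Cohen's kappa). It is not EvALL's own implementation, and the function names `accuracy` and `cohen_kappa` are hypothetical, chosen only for this illustration.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of instances whose predicted label equals the gold label."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def cohen_kappa(gold, pred):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(gold)
    observed = accuracy(gold, pred)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Chance agreement: probability that both sides pick the same label
    # if each drew labels independently from its own label distribution.
    expected = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when chance agreement is total
    return (observed - expected) / (1 - expected)


gold = ["A", "A", "B", "B"]
pred = ["A", "B", "B", "B"]
print(accuracy(gold, pred), cohen_kappa(gold, pred))
```

Kappa is lower than Accuracy in this example because part of the agreement would be expected by chance given how often each label occurs.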

Hierarchical mono-label classification

ICM, ICM Norm

Multi-label classification

Precision, Recall, FMeasure

Hierarchical multi-label classification

ICM, ICM Norm

Ranking

Precision at k, R Precision, MRR, MAP, DCG, nDCG
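
As an illustration of the ranking metrics listed above, the following Python sketch computes Precision at k, the reciprocal rank (which MRR averages over queries), and nDCG for a single ranked list. It uses the common definitions, with DCG taken as gain divided by log2(position + 1); other DCG variants exist, and the formulation EvALL uses is not stated here. All function names are hypothetical.

```python
import math


def precision_at_k(ranking, relevant, k):
    """Share of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def reciprocal_rank(ranking, relevant):
    """1 / position of the first relevant item, or 0.0 if none is ranked."""
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def dcg(gains):
    """Discounted cumulative gain of a list of gains in ranked order."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))


def ndcg(ranking, gain_by_doc, k=None):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    k = len(ranking) if k is None else k
    gains = [gain_by_doc.get(doc, 0) for doc in ranking[:k]]
    ideal = sorted(gain_by_doc.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


ranking = ["d3", "d1", "d2"]
gain_by_doc = {"d1": 1, "d2": 1}  # d3 is not relevant
relevant = set(gain_by_doc)
print(precision_at_k(ranking, relevant, 2))
print(reciprocal_rank(ranking, relevant))
print(ndcg(ranking, gain_by_doc))
```

Averaging `reciprocal_rank` over a set of queries gives MRR.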

LeWiDi

Cross Entropy, ICM-Soft, ICM-Soft Norm
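
In a learning-with-disagreements setting, gold labels are typically distributions over classes rather than a single class. The sketch below shows the standard cross entropy between a gold soft label and a predicted distribution for one instance. It assumes natural logarithms and a small clipping constant `eps`, both of which are choices made for this example and not confirmed details of EvALL; the function name is hypothetical.

```python
import math


def cross_entropy(gold_dist, pred_dist, eps=1e-12):
    """Cross entropy H(gold, pred) for one instance, in nats.

    Both arguments map class labels to probabilities. Predicted
    probabilities are clipped at eps (an assumption of this sketch)
    so that log(0) never occurs.
    """
    return -sum(
        p * math.log(max(pred_dist.get(label, 0.0), eps))
        for label, p in gold_dist.items()
        if p > 0
    )


gold = {"YES": 0.5, "NO": 0.5}  # annotators split evenly
print(cross_entropy(gold, {"YES": 0.5, "NO": 0.5}))
print(cross_entropy(gold, {"YES": 0.9, "NO": 0.1}))
```

A prediction that matches the annotators' disagreement scores lower (better) than an overconfident one, which is the behavior a soft-label metric is meant to reward.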

Evaluation Dashboard

The EvALL 2.0 Dashboard provides an intuitive interface for exploring and comparing the results of the selected metrics computed on your systems' predictions. Dynamic, customizable charts let you analyze the data from different perspectives and adjust the visualizations to your research needs. You can also zoom in on the charts and capture them as images, so you can document and share your findings in articles or research projects.

EvALL console

The PyEvALL console lets you inspect and fix the format errors detected in your prediction files, from duplicate instance identifiers to incorrect formats and inconsistent data types. It also lets you explore the errors raised while checking metric preconditions, helping you understand and correct inconsistencies in your systems.
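
To give a feel for the kind of format check described above, here is a small Python sketch that finds duplicate instance identifiers in a list of prediction records. The record layout (dictionaries with `test_case`, `id`, and `value` keys) and the function name are assumptions made for this illustration. This is not PyEvALL's validation code.

```python
from collections import Counter


def find_duplicate_ids(records):
    """Return the (test_case, id) pairs that occur more than once.

    `records` is assumed to be a list of dicts with "test_case" and "id"
    keys; this layout is hypothetical and used only for the example.
    """
    counts = Counter((r["test_case"], r["id"]) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)


predictions = [
    {"test_case": "task1", "id": "I1", "value": "A"},
    {"test_case": "task1", "id": "I2", "value": "B"},
    {"test_case": "task1", "id": "I1", "value": "B"},  # duplicate of I1
]
print(find_duplicate_ids(predictions))
```

Running a check like this before submitting a file catches a common cause of rejected or misleading evaluations.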