2017

The Dagstuhl-15512 ArgQuality Corpus
An English corpus for studying the assessment of argumentation quality. It contains 320 online debate portal arguments, annotated for 15 different quality dimensions by three annotators. [zip v1 1mb] [zip v2 1mb]

In version 2, the annotated XMI files have been changed according to a new underlying type system in which each quality dimension is represented by its own annotation. This annotation contains not only the majority score of the respective dimension (as in version 1), but also the mean score and the scores of all annotators. We recommend using version 2.
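
To picture what a version 2 annotation carries for one quality dimension, the following minimal sketch derives a majority score and a mean score from the scores of three annotators. It is an illustration only, not the tooling used to build the corpus: the function name `aggregate_scores`, the integer example scores, and the fallback to the median when no score has a strict majority are all assumptions made here.

```python
from collections import Counter
from statistics import mean, median


def aggregate_scores(scores):
    """Return (majority, mean) for the annotator scores of one dimension.

    Hypothetical helper: the corpus documentation does not specify this
    procedure, in particular not how ties are resolved.
    """
    top_score, top_count = Counter(scores).most_common(1)[0]
    if top_count > len(scores) / 2:
        # A strict majority of annotators agree on this score.
        majority = top_score
    else:
        # Assumption: without a strict majority, use the median score.
        majority = median(scores)
    return majority, mean(scores)


# Example with made-up scores from three annotators.
majority_score, mean_score = aggregate_scores([3, 2, 3])
print(majority_score, round(mean_score, 2))
```

Keeping the individual scores next to the aggregates, as version 2 does, lets users apply a different aggregation than the one sketched here.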

In case you publish any results related to the Dagstuhl-15512 ArgQuality corpus, please cite our EACL 2017 paper on argumentation quality. [pdf] [bib]

The Webis-ArgRank-17 Dataset
An English benchmark dataset for studying argument relevance. It contains 32 rankings as well as a ground-truth argument graph with more than 30,000 argument units. In addition, we provide the source code to reproduce our ranking experiments based on the dataset. [zip 13mb]

In case you publish any results related to the Webis-ArgRank-17 dataset, please cite our EACL 2017 paper on argument relevance. [pdf] [bib]

2016

The Webis-Editorials-16 Corpus
An English corpus with 300 news editorials from three online news portals, annotated for the types of all argumentative discourse units. [zip 5mb]

In case you publish any results related to the Webis-Editorials-16 corpus, please cite our COLING 2016 paper on argumentation strategies. [pdf] [bib]

2014

The ArguAna TripAdvisor Corpus
An English corpus for studying local sentiment flows and aspect-based sentiment analysis. It contains 2100 hotel reviews balanced with respect to the reviews’ sentiment scores. All reviews are segmented into subsentence-level statements, each of which has been manually classified as a fact, a positive opinion, or a negative opinion. In addition, all hotel aspects mentioned in the reviews have been annotated as such. [zip v1 with software 10mb] [zip v1 8mb] [zip v2 8mb]

We also provide nearly 200k further hotel reviews without manual annotations. [zip v1 265mb] [zip v2 265mb]

The corpus is free to use for scientific purposes, but not for commercial applications. In version 2, the annotated XMI files have been changed according to a new underlying type system that is easier to extend. Note that the software of version 1 needs some adaptations to work with version 2.

In case you publish any results related to the ArguAna TripAdvisor corpus, please cite our CICLing 2014 paper. [pdf] [bib]