The datasets below were obtained from the UCI Machine Learning Repository or StatLog. They were converted to a standard format (attributes followed by outputs), and incomplete examples were removed. When a dataset provides both training and test examples, they are combined in one file, with training examples followed by test examples. Note that attributes are not scaled. Users who scale the attributes should do so after the dataset has been split into training and test parts.
Dataset | Attributes (D) | Examples (N) | Origin | Comments |
---|---|---|---|---|
australian* | 14 | 690 | StatLog | see also UCI |
breast-cancer | 9 | 683 | UCI | aka wisconsin, ID# removed, 16 incomplete examples removed |
cleveland* | 13 | 297 | UCI | converted from the 5-class dataset, 6 incomplete examples removed |
diabetes | 8 | 768 | UCI | aka pima-indians |
german | 24 | 1000 | StatLog | see also UCI |
heart* | 13 | 270 | StatLog | see also UCI, a subset of cleveland |
ionosphere | 34 | 351 | UCI | |
sonar | 60 | 208 | UCI | |
votes84 | 16 | 435 | UCI | |
wdbc | 30 | 569 | UCI | ID# removed |
* Datasets marked with an asterisk contain some categorical attributes.
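
For the datasets marked with an asterisk, one common way to handle a categorical attribute is to expand it into 0/1 indicator columns before scaling or training. The sketch below is only an illustration: the helper name `one_hot_column`, the toy rows, and the column index are hypothetical, and this page does not say which attributes are categorical, so that must be checked against the original UCI or StatLog documentation.

```python
def one_hot_column(rows, col):
    """Replace the attribute at index `col` with 0/1 indicator columns.

    `rows` is a list of lists in the layout described above
    (attributes followed by the output). Assumption: `col` is a
    categorical attribute chosen by the user, not given by this page.
    """
    # Collect the distinct category values in a fixed, sorted order.
    values = sorted({r[col] for r in rows})
    encoded = []
    for r in rows:
        indicators = [1.0 if r[col] == v else 0.0 for v in values]
        # Keep every other column in place, including the trailing output.
        encoded.append(r[:col] + indicators + r[col + 1:])
    return encoded, values


# Toy example: the second attribute (index 1) takes the categories 1, 2, 3.
rows = [[0.5, 2, 1], [0.1, 1, 0], [0.9, 3, 1]]
encoded, values = one_hot_column(rows, col=1)
```

Here the categories are collected from all rows for simplicity; if the data is split first, it may be preferable to collect them from the training part only and decide separately how to treat unseen values.
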
Dataset | Classes (K) | Attributes (D) | Examples (N) | Test examples (NT) | Origin | Comments |
---|---|---|---|---|---|---|
dna | 3 | 180 | 2000 | 1186 | StatLog | i.e., UCI/splice with 4 ambiguous examples removed |
glass | 6 | 9 | 214 | | UCI | class 4 is absent |
iris | 3 | 4 | 150 | | UCI | |
letter | 26 | 16 | 16000 | 4000 | UCI | StatLog has different order and training/test split |
pendigits | 10 | 16 | 7494 | 3498 | UCI | |
satimage | 6 | 36 | 4435 | 2000 | StatLog | see also UCI, class 6 is absent |
segment | 7 | 18 | 2310 | | StatLog | see also UCI |
shuttle | 7 | 9 | 43500 | 14500 | StatLog | see also UCI |
vehicle | 4 | 18 | 846 | | StatLog | see also UCI |
vowel | 11 | 10 | 528 | 462 | UCI | |
wine | 3 | 13 | 178 | | UCI | |
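
Because training examples precede test examples in the combined files, a dataset with a predefined split can be divided by row position, and the scaling statistics can then be taken from the training part alone. The following is a minimal sketch under stated assumptions: each row holds the attributes followed by a single output value, the rows are already parsed into numbers, and N is read as the number of training rows. The function name `split_and_scale` and the toy data are hypothetical, and standardization to zero mean and unit variance is just one possible scaling choice.

```python
from statistics import mean, pstdev


def split_and_scale(rows, n_train):
    """Split rows by position, then standardize attributes.

    The first `n_train` rows are treated as training examples and the
    rest as test examples. Assumption: the last column is the output.
    """
    train, test = rows[:n_train], rows[n_train:]
    n_attr = len(rows[0]) - 1

    # Mean and standard deviation come from the training rows only,
    # so no information from the test part leaks into the scaling.
    mus = [mean(r[j] for r in train) for j in range(n_attr)]
    # A constant attribute has zero deviation; divide by 1.0 instead.
    sds = [pstdev([r[j] for r in train]) or 1.0 for j in range(n_attr)]

    def scale(part):
        return [
            [(r[j] - mus[j]) / sds[j] for j in range(n_attr)] + [r[-1]]
            for r in part
        ]

    return scale(train), scale(test)


# Toy data: two attributes followed by the output; first 2 rows are training.
rows = [[1.0, 10.0, 0], [3.0, 10.0, 1], [5.0, 10.0, 0], [7.0, 10.0, 1]]
train_s, test_s = split_and_scale(rows, n_train=2)
```

For a dataset such as letter, `n_train` would be 16000 under this reading, leaving the remaining 4000 rows as the test part. For datasets without a predefined split, the rows would first need to be divided by some other scheme chosen by the user.
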