********************
* Evaluation Data *
********************

FileA: document
FileB: outPorterJavaPre.txt

The mean number of words per conflation class: 1.5939909
The Index Compression Factor: 0.37264386
The number of words and stems that differ: 194970

The mean number of characters removed: 1.467316
The median number of characters removed: 1.0
The mode number of characters removed: 0
The characters removed table:
    Number of words with 0 Chars Removed 138040
    Number of words with 1 Chars Removed 52648
    Number of words with 2 Chars Removed 40422
    Number of words with 3 Chars Removed 32266
    Number of words with 4 Chars Removed 25152
    Number of words with 5 Chars Removed 11314
    Number of words with 6 Chars Removed 3350
    Number of words with 7 Chars Removed 4635
    Number of words with 8 Chars Removed 1156
    Number of words with 9 Chars Removed 428
    Number of words with 10 Chars Removed 63
    Number of words with 11 Chars Removed 26
    Number of words with 12 Chars Removed 10

The mean Hamming distance: 1.5871054
The median Hamming distance: 1.0
The mode Hamming distance: 0
The Hamming distance table:
    Number of words with 0 Hamming distance 114540
    Number of words with 1 Hamming distance 75924
    Number of words with 2 Hamming distance 40082
    Number of words with 3 Hamming distance 32578
    Number of words with 4 Hamming distance 24970
    Number of words with 5 Hamming distance 10874
    Number of words with 6 Hamming distance 2879
    Number of words with 7 Hamming distance 4382
    Number of words with 8 Hamming distance 1049
    Number of words with 9 Hamming distance 514
    Number of words with 10 Hamming distance 247
    Number of words with 11 Hamming distance 275
    Number of words with 12 Hamming distance 294
    Number of words with 13 Hamming distance 265
    Number of words with 14 Hamming distance 241
    Number of words with 15 Hamming distance 173
    Number of words with 16 Hamming distance 116
    Number of words with 17 Hamming distance 54
    Number of words with 18 Hamming distance 33
    Number of words with 19 Hamming distance 18
    Number of words with 20 Hamming distance 2

The Fox and Frakes Similarity Metric: 0.63007784
The Chris O'Neill Similarity Metric: 84.65270657650963%
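
The summary figures above can be cross-checked against the two tables. The following is a minimal Python sketch, not the evaluation tool's own code: the names chars_removed, hamming, value_at and summarize are hypothetical, and the counts are copied by hand from the report. It also tests two relationships that are assumptions about how the tool derives its figures: that the Fox and Frakes Similarity Metric is the inverse of the mean Hamming distance, and that the Index Compression Factor equals 1 - 1/(mean words per conflation class). Both match the reported numbers here, but the report itself does not state the formulas. The Chris O'Neill Similarity Metric is left out because its definition is not given.

```python
# Histograms copied from the report: list index = value, entry = number of words.
chars_removed = [138040, 52648, 40422, 32266, 25152, 11314, 3350,
                 4635, 1156, 428, 63, 26, 10]
hamming = [114540, 75924, 40082, 32578, 24970, 10874, 2879, 4382, 1049,
           514, 247, 275, 294, 265, 241, 173, 116, 54, 33, 18, 2]


def value_at(counts, rank):
    """Return the value of the observation at 0-based `rank` in sorted order."""
    seen = 0
    for value, n in enumerate(counts):
        seen += n
        if rank < seen:
            return value
    raise IndexError("rank out of range")


def summarize(counts):
    """Return (total, mean, median, mode) of a histogram stored as a list."""
    total = sum(counts)
    mean = sum(value * n for value, n in enumerate(counts)) / total
    # Average the two middle observations (they coincide when total is odd).
    median = (value_at(counts, (total - 1) // 2) + value_at(counts, total // 2)) / 2
    mode = max(range(len(counts)), key=counts.__getitem__)
    return total, mean, median, mode


cr_total, cr_mean, cr_median, cr_mode = summarize(chars_removed)
hd_total, hd_mean, hd_median, hd_mode = summarize(hamming)

# Words whose stem differs from the original = all words minus Hamming distance 0.
words_that_differ = hd_total - hamming[0]

# ASSUMPTION: similarity metric = 1 / mean Hamming distance.
similarity = 1.0 / hd_mean

# ASSUMPTION: ICF = 1 - 1 / (mean words per conflation class).
mean_words_per_class = 1.5939909  # value reported above
icf = 1.0 - 1.0 / mean_words_per_class

print(f"total words:        {cr_total} / {hd_total}")
print(f"chars removed:      mean={cr_mean:.6f} median={cr_median} mode={cr_mode}")
print(f"Hamming distance:   mean={hd_mean:.7f} median={hd_median} mode={hd_mode}")
print(f"words that differ:  {words_that_differ}")
print(f"1 / mean Hamming:   {similarity:.8f}")
print(f"1 - 1/mean class:   {icf:.8f}")
```

Both tables sum to the same word total, and the recomputed means, medians and modes agree with the reported values to within rounding. Note that more words have 0 characters removed than have Hamming distance 0, which is consistent with stems that change a character without changing length.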