INCITEST 2019 Conference

Correction Spelling Error On File Conversion Results In Information Extraction
Abdul Rohim, Ken Kinanti Purnamasari

Universitas Komputer Indonesia


Text produced by file conversion can contain spelling errors of two kinds: non-word errors and word-boundary errors. A non-word error is a token that does not appear in the dictionary and therefore does not match the standard Indonesian dictionary (KBBI). Non-word errors are handled with two spelling correction methods: Soundex generates candidate corrections, and Damerau-Levenshtein distance selects the best correction from those candidates. Soundex is a phonetic string matching technique, while Damerau-Levenshtein distance is an approximate string matching technique based on similarity of spelling. Soundex retrieves dictionary words that share the phonetic code of the misspelled word, and Damerau-Levenshtein distance then picks the candidate with the smallest edit distance to the misspelled word. A word-boundary error occurs when the space between two words is lost, so that the two words are merged into one. Word-boundary errors are handled by searching KBBI for two words that, when concatenated, equal the token suspected of being a word-boundary error. In testing, misspelled-word detection achieved an average recall of 100% and an average precision of 80.6%, and correction achieved an average recall of 100% and an average precision of 65.1%.
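The two correction paths described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the Soundex function uses the classic English letter groups (the abstract does not say whether an Indonesian-specific variant was used), the edit distance is the restricted "optimal string alignment" form of Damerau-Levenshtein, ties are broken alphabetically, and `KBBI_SAMPLE` is a hypothetical toy word list standing in for the real KBBI. All function names are illustrative.

```python
from typing import AbstractSet, Optional, Tuple

# Classic English Soundex letter groups (assumption: the paper may use a variant).
_SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a word ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def correct_non_word(word: str, dictionary: AbstractSet[str]) -> Optional[str]:
    """Soundex proposes candidates; the smallest edit distance selects one."""
    if word in dictionary:
        return word
    code = soundex(word)
    candidates = [w for w in dictionary if soundex(w) == code]
    if not candidates:
        return None
    # Ties are broken alphabetically (an assumption, for determinism).
    return min(candidates, key=lambda w: (osa_distance(word, w), w))


def split_word_boundary(
    word: str, dictionary: AbstractSet[str]
) -> Optional[Tuple[str, str]]:
    """Find two dictionary words whose concatenation equals the given token."""
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in dictionary and right in dictionary:
            return left, right
    return None


# Hypothetical toy word list standing in for KBBI.
KBBI_SAMPLE = {"makan", "makin", "minum", "buku", "baca", "saya", "nasi"}

print(correct_non_word("mkaan", KBBI_SAMPLE))         # makan
print(split_word_boundary("makannasi", KBBI_SAMPLE))  # ('makan', 'nasi')
```

In this sketch "mkaan" shares the Soundex code M250 with both "makan" and "makin", and the edit distance (1 versus 2) selects "makan". Because Soundex keeps the first letter of the word, a misspelling whose first letter is wrong produces no candidates under this scheme.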

Keywords: correction spelling error, soundex, damerau-levenshtein distance, information extraction, non-word, word-boundary

Topic: Informatic and Information System


Corresponding Author: Abdul Rohim