Fast and Accurate Misspelling Correction in Large Corpora

Popescu, Octavian; Ngoc Phuoc An, Vo
2014

Abstract

There are several NLP systems whose accuracy depends crucially on finding misspellings fast. However, the classical approach is based on a quadratic time algorithm with 80% coverage. We present a novel algorithm for misspelling detection, which runs in constant time and improves the coverage to more than 96%. We use this algorithm together with a cross document coreference system in order to find proper name misspellings. The experiments confirmed significant improvement over the state of the art.
ISBN: 9781937284961

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11582/251822