This post introduces a semi-supervised approach to word tokenization and POS tagging that enables support for resource-poor languages. Hmong is a resource-poor language [1] for which corpora of POS-tagged data were previously unavailable, precluding supervised approaches. At the same time, Hmong has an unusually high number of homonyms and features syllable-based spacing.