natural language processing – Hmong Medical Corpus

A semi-supervised combined word tokenizer and POS tagger for Hmong

This post introduces a semi-supervised approach to word tokenization and POS tagging that enables support for resource-poor languages. Hmong is a resource-poor language [1] for which POS-tagged corpora were previously unavailable, precluding supervised approaches. At the same time, Hmong has an unusually high number of homonyms and features syllable-based spacing…

Hmong Medical Corpus Blog: The Rationale

The Hmong Medical Corpus (currently hosted here: http://corpus.ap-southeast-2.elasticbeanstalk.com/hminterface/) was launched in August 2019 with the goal of making Hmong medical information readily available in a single, searchable location, both to members of the Hmong community and to members of the linguistic research community who need greater access to material in Hmong. This project involves natural language…
