Question classification with limited annotated data

For resource-poor languages such as Hmong, large datasets of annotated questions are unavailable, which makes building an automated question classifier challenging. A dataset of 411 annotated Hmong questions is currently publicly available, and the challenge is to build a question classifier with adequate accuracy from this dataset. What we…

Using Word Embeddings for Semantic Analysis of Nominal Classifiers

Word embeddings created by Word2Vec can be used to examine the semantic distributions of nouns associated with nominal classifiers. In this post, we explore dendrogram analysis and k-means clustering of word embeddings as a way to form hypotheses for research on these distributions. Nominal classifiers are known to have a range of semantic values…
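As a rough illustration of the k-means half of that workflow, here is a minimal, stdlib-only Python sketch. It assumes each noun already has an embedding vector; the nouns (English glosses) and three-dimensional vectors below are hypothetical toy values, not Word2Vec output and not results from the post. Vectors are unit-normalized so that Euclidean k-means roughly tracks cosine similarity, and centroids are seeded with a deterministic farthest-first rule, an assumption made here only to keep the example reproducible. In practice, scikit-learn's `KMeans` and SciPy's `scipy.cluster.hierarchy` (for dendrograms) would be the usual tools.

```python
import math

# Hypothetical toy embeddings: NOT real Word2Vec vectors, just illustrative values.
TOY_EMBEDDINGS = {
    "dog": [0.90, 0.10, 0.00],
    "cat": [0.85, 0.15, 0.05],
    "pig": [0.80, 0.20, 0.10],
    "ball": [0.10, 0.90, 0.20],
    "moon": [0.05, 0.95, 0.10],
    "egg": [0.15, 0.80, 0.30],
}


def normalize(v):
    """Scale a vector to unit length so Euclidean distance reflects cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def farthest_first(vectors, k):
    """Deterministic seeding (an assumption of this sketch): start with the first
    vector, then repeatedly add the vector farthest from all chosen centroids."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means; returns one cluster index per input vector."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels


nouns = list(TOY_EMBEDDINGS)
labels = kmeans([normalize(v) for v in TOY_EMBEDDINGS.values()], k=2)

# Group nouns by cluster; each group is a candidate hypothesis about which
# nouns pattern together semantically (e.g. under the same classifier).
clusters = {}
for noun, lab in zip(nouns, labels):
    clusters.setdefault(lab, []).append(noun)
print(clusters)  # → {0: ['dog', 'cat', 'pig'], 1: ['ball', 'moon', 'egg']}
```

With real embeddings, the number of clusters would have to be chosen by inspection (for example, by looking at where a dendrogram splits), and the resulting groups are best read as starting hypotheses rather than conclusions.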

A semi-supervised combined word tokenizer and POS tagger for Hmong

This post introduces a semi-supervised approach to word tokenization and POS tagging that supports resource-poor languages. Hmong is a resource-poor language [1] for which no POS-tagged corpora were previously available, precluding supervised approaches. At the same time, Hmong has an unusually high number of homonyms and uses syllable-based spacing…

Hmong Medical Corpus Blog: The Rationale

The Hmong Medical Corpus (currently hosted here: http://corpus.ap-southeast-2.elasticbeanstalk.com/hminterface/) was launched in August 2019 with the goal of making Hmong medical information readily available in a single, searchable location, both to members of the Hmong community and to members of the linguistic research community who need greater access to material in Hmong. This project involves natural language…
