Word Vectorization – Hmong Medical Corpus

Question classification with limited annotated data

For resource-poor languages such as Hmong, large datasets of annotated questions are unavailable, which makes building an automated question classifier challenging. Currently, a dataset of 411 annotated Hmong questions is publicly available. The challenge is to produce a question classifier with adequate accuracy using only this dataset. What we…

Using Word Embeddings for Semantic Analysis of Nominal Classifiers

Word embeddings created by Word2Vec can be used to explore the semantic distributions of nouns associated with nominal classifiers. In this post, we explore dendrogram analysis and k-means clustering of word embeddings as a way to form hypotheses for research on these distributions. Nominal classifiers are known to have a range of semantic values…
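
As a rough illustration of the k-means idea mentioned in this excerpt, the sketch below clusters a handful of nouns by their vectors. It is not the method from the post itself: the two-dimensional vectors are made-up toy values standing in for real Word2Vec embeddings, the English nouns are placeholders, and the `kmeans` helper with fixed starting centroids is a hypothetical, minimal implementation written only with the Python standard library.

```python
import math

# Toy 2-D "embeddings": made-up values for illustration, NOT real Word2Vec output.
toy_vectors = {
    "dog": (0.9, 0.1),
    "cat": (0.8, 0.2),
    "horse": (0.85, 0.15),
    "knife": (0.1, 0.9),
    "spoon": (0.2, 0.8),
    "axe": (0.15, 0.95),
}


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(vectors, initial_centroids, max_iter=100):
    """Plain k-means: assign each word to its nearest centroid, then recompute centroids."""
    centroids = list(initial_centroids)
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            word: min(range(len(centroids)), key=lambda c: euclidean(vec, centroids[c]))
            for word, vec in vectors.items()
        }
        if new_assignment == assignment:
            break  # assignments stopped changing, so the clustering has converged
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [vectors[w] for w, a in assignment.items() if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment, centroids


# Fixed starting centroids keep the toy example deterministic (an assumption for the sketch).
assignment, centroids = kmeans(toy_vectors, [toy_vectors["dog"], toy_vectors["knife"]])

# Group words by cluster index for inspection.
clusters = {}
for word, c in sorted(assignment.items()):
    clusters.setdefault(c, []).append(word)

for c in sorted(clusters):
    print(c, clusters[c])
```

With real embeddings, the groups that emerge would only be a starting point: each cluster suggests a hypothesis about which nouns pattern together, which then has to be checked against the classifiers those nouns actually take.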
