A Stanford CoreNLP POS Tagger model for Hmong

A new Stanford CoreNLP POS Tagger model for Hmong is now available. The model file and corresponding props files are available here: https://github.com/nathanmwhite/hmong-medical-corpus/tree/master/Stanford-CoreNLP

This model is trained and tested on the files created in the previous post, derived from the Hmong Medical Corpus:

The training data file: hmcorpus_train.conllu
The test data file: hmcorpus_test.conllu
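For readers who want to inspect the training and test files before using the model, here is a minimal sketch of reading word/tag pairs from a CoNLL-U file in plain Python. The function name read_conllu_tags is hypothetical, and the sketch assumes the tags of interest sit in the fourth (UPOS) column of the standard ten-column CoNLL-U layout; if the corpus stores its tags in the XPOS column instead, the column index would need to change.

```python
def read_conllu_tags(lines):
    """Return a list of sentences, each a list of (form, tag) pairs.

    Assumption: tags are read from the UPOS column (index 3).
    """
    sentences, current = [], []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines carry sentence-level metadata; skip them.
            continue
        cols = line.split("\t")
        # Skip malformed rows, multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if len(cols) < 4 or not cols[0].isdigit():
            continue
        current.append((cols[1], cols[3]))
    if current:
        sentences.append(current)
    return sentences

# Possible usage, assuming the file has been downloaded locally:
# with open("hmcorpus_train.conllu", encoding="utf-8") as f:
#     sentences = read_conllu_tags(f)
```

The function accepts any iterable of lines, so it can be tried on an open file handle or on a short list of strings.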

Using a SQL database for corpus development and management

Corpora are useful tools both for analyzing human language and for NLP application development. However, finding a good platform for building a corpus is not always straightforward. Using the sqlite3 package to create a SQL database to manage our corpus data is an excellent solution, as it provides a means both to maintain the internal…
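As a rough illustration of the idea, here is a minimal sketch of a corpus database built with Python's standard sqlite3 module. The table names (documents, tokens), the column names, and the helper functions are all hypothetical and are not taken from the Hmong Medical Corpus itself; the placeholder tokens in the usage comment are not real corpus data.

```python
import sqlite3

def build_corpus_db(path=":memory:"):
    """Create (or open) a small corpus database. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            doc_id INTEGER PRIMARY KEY,
            title  TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS tokens (
            token_id INTEGER PRIMARY KEY,
            doc_id   INTEGER NOT NULL REFERENCES documents(doc_id),
            position INTEGER NOT NULL,
            form     TEXT NOT NULL,
            pos      TEXT
        );
    """)
    return conn

def add_document(conn, title, tagged_tokens):
    """Insert one document and its (form, pos) tokens; return the new doc_id."""
    cur = conn.execute("INSERT INTO documents (title) VALUES (?)", (title,))
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO tokens (doc_id, position, form, pos) VALUES (?, ?, ?, ?)",
        [(doc_id, i, form, pos) for i, (form, pos) in enumerate(tagged_tokens)],
    )
    conn.commit()
    return doc_id

def pos_counts(conn):
    """Count tokens per part-of-speech tag across the whole corpus."""
    rows = conn.execute("SELECT pos, COUNT(*) FROM tokens GROUP BY pos")
    return dict(rows.fetchall())

# Possible usage with placeholder tokens:
# conn = build_corpus_db()
# add_document(conn, "sample", [("w1", "NOUN"), ("w2", "VERB")])
# print(pos_counts(conn))
```

Keeping documents and tokens in separate tables means each token row points back to its source document, so queries over the whole corpus and over a single document use the same data.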

Hmong Medical Corpus Blog: The Rationale

The Hmong Medical Corpus (currently hosted here: http://corpus.ap-southeast-2.elasticbeanstalk.com/hminterface/) was launched in August 2019 with a goal of making Hmong medical information readily available to members of the Hmong community in a single, searchable location and to members of the linguistic research community who need greater access to material in Hmong. This project involves natural language…
