Hmong Medical Corpus Blog

Discussing NLP approaches for resource-poor languages

  • A Stanford CoreNLP POS Tagger model for Hmong

    May 4, 2020

    A new Stanford CoreNLP POS Tagger model for Hmong is now available. The model file and corresponding props files are available here: https://github.com/nathanmwhite/hmong-medical-corpus/tree/master/Stanford-CoreNLP

    This model is trained and tested on the files created in the previous post, derived from the Hmong Medical Corpus:

      The training data file: hmcorpus_train.conllu
      The test data file: hmcorpus_test.conllu
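    For readers who want to inspect the training or test data before using the model, here is a minimal sketch of reading CoNLL-U text into (word, tag) pairs. It is not taken from the post: the function name read_conllu and the sample rows are hypothetical, and it assumes the tag of interest is in the UPOS column (the fourth column), which may not match how the corpus files are laid out.

```python
def read_conllu(text):
    """Parse CoNLL-U text into a list of sentences of (FORM, UPOS) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines carry metadata such as sent_id; skip them here.
            continue
        cols = line.split("\t")
        # Skip multiword token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in cols[0] or "." in cols[0]:
            continue
        current.append((cols[1], cols[3]))
    if current:
        sentences.append(current)
    return sentences


# Placeholder rows for illustration only; not taken from the corpus files.
sample = "\n".join([
    "# sent_id = 1",
    "\t".join(["1", "kuv", "_", "PRON", "_", "_", "_", "_", "_", "_"]),
    "\t".join(["2", "mus", "_", "VERB", "_", "_", "_", "_", "_", "_"]),
    "",
    "# sent_id = 2",
    "\t".join(["1", "zoo", "_", "ADJ", "_", "_", "_", "_", "_", "_"]),
    "",
])

parsed = read_conllu(sample)
```

    To apply this to the real data, the text could be read from hmcorpus_train.conllu or hmcorpus_test.conllu with open(..., encoding="utf-8") and passed to the same function.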

  • Converting text data from SQL tables to CoNLL-U format

    April 23, 2020

    The Hmong Medical Corpus stores its tagged text data in a SQL database. To use this data with Stanford CoreNLP, it must first be converted into CoNLL-U format. This post shows how this is done. First, let’s import the libraries needed:

        from itertools import groupby
        import os
        import sqlite3
        import pandas as pd

    Next, let’s…
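    As a rough illustration of the kind of conversion this post describes, the sketch below pulls token rows out of a SQLite table and writes them as simplified CoNLL-U. It is not the code from the full post. The table name tokens, its columns (sent_id, token_index, form, upos), and the sample rows are all assumptions made for the example; the real corpus schema may differ. The sketch uses only sqlite3 and itertools.groupby and leaves pandas out to stay self-contained.

```python
from itertools import groupby
import sqlite3


def rows_to_conllu(rows):
    """Render (sent_id, token_index, form, upos) rows as simplified CoNLL-U text."""
    blocks = []
    # groupby only merges adjacent rows, so sort by sentence and token position first.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for sent_id, sent_rows in groupby(ordered, key=lambda r: r[0]):
        lines = [f"# sent_id = {sent_id}"]
        for _, token_index, form, upos in sent_rows:
            # Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
            # Fields this sketch has no data for are filled with "_".
            fields = [str(token_index), form, "_", upos, "_", "_", "_", "_", "_", "_"]
            lines.append("\t".join(fields))
        blocks.append("\n".join(lines))
    # Each sentence, including the last one, ends with a blank line.
    return "".join(block + "\n\n" for block in blocks)


# In-memory database with a hypothetical schema, standing in for the real corpus.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tokens (sent_id INTEGER, token_index INTEGER, form TEXT, upos TEXT)"
)
conn.executemany(
    "INSERT INTO tokens VALUES (?, ?, ?, ?)",
    [(1, 1, "kuv", "PRON"), (1, 2, "mus", "VERB"), (2, 1, "zoo", "ADJ")],
)
rows = conn.execute("SELECT sent_id, token_index, form, upos FROM tokens").fetchall()
conn.close()

conllu_text = rows_to_conllu(rows)
```

    Writing conllu_text to a file opened with encoding="utf-8" would give a file in the same general shape as a .conllu training or test file, though a full conversion would also need any lemma or language-specific tag columns the corpus provides.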
