Converting text data from SQL tables to CoNLL-U format

The Hmong Medical Corpus stores its tagged text data in a SQL database. To use this data with Stanford CoreNLP, it must first be converted into CoNLL-U format. This post shows how this is done. First, let’s import the libraries needed.

    from itertools import groupby
    import os
    import sqlite3
    import pandas as pd

Next, let’s…
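To give a rough idea of what such a conversion can look like, here is a minimal sketch using two of the libraries named above (itertools.groupby and sqlite3). It is not the corpus's actual code: the `tokens` table, its columns (`sent_id`, `token_index`, `form`, `upos`), the helper names, and the toy English rows are all assumptions made up for illustration. Only the ID, FORM, and UPOS columns are filled in; the other seven CoNLL-U fields are left as underscores.

```python
from itertools import groupby
import sqlite3


def build_toy_db():
    """Create an in-memory database with a hypothetical `tokens` table (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tokens ("
        "sent_id INTEGER, token_index INTEGER, form TEXT, upos TEXT)"
    )
    conn.executemany(
        "INSERT INTO tokens VALUES (?, ?, ?, ?)",
        [
            (1, 1, "Dogs", "NOUN"),
            (1, 2, "bark", "VERB"),
            (2, 1, "Stop", "VERB"),
        ],
    )
    return conn


def fetch_rows(conn):
    """Return token rows sorted so that groupby sees each sentence contiguously."""
    query = (
        "SELECT sent_id, token_index, form, upos "
        "FROM tokens ORDER BY sent_id, token_index"
    )
    return conn.execute(query).fetchall()


def rows_to_conllu(rows):
    """Turn sorted (sent_id, token_index, form, upos) rows into CoNLL-U text."""
    blocks = []
    for sent_id, group in groupby(rows, key=lambda row: row[0]):
        lines = [f"# sent_id = {sent_id}"]
        for _, index, form, upos in group:
            # Column order: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
            fields = [str(index), form, "_", upos, "_", "_", "_", "_", "_", "_"]
            lines.append("\t".join(fields))
        blocks.append("\n".join(lines))
    # Every sentence, including the last, is followed by a blank line.
    return "\n\n".join(blocks) + "\n\n"


if __name__ == "__main__":
    print(rows_to_conllu(fetch_rows(build_toy_db())), end="")
```

The ORDER BY clause matters here: groupby only merges adjacent rows with the same key, so unsorted rows would split one sentence into several blocks.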

Using a SQL database for corpus development and management

Corpora are useful tools both for analyzing human language and for NLP application development. However, finding a good platform for building a corpus is not always straightforward. Using the sqlite3 package to create a SQL database to manage our corpus data is an excellent solution, as it provides a means both to maintain the internal…
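As a rough illustration of managing corpus data with sqlite3, the sketch below sets up a small two-table database. The schema (`documents` and `tokens`), the column names, and the `create_corpus_db` helper are hypothetical and are not taken from the post itself. It shows one thing a SQL database can do for a corpus: reject token rows that point at a document that does not exist.

```python
import sqlite3


def create_corpus_db(path=":memory:"):
    """Create a small corpus database with a hypothetical two-table schema."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE documents (
            doc_id INTEGER PRIMARY KEY,
            title  TEXT NOT NULL
        );
        CREATE TABLE tokens (
            token_id INTEGER PRIMARY KEY,
            doc_id   INTEGER NOT NULL REFERENCES documents(doc_id),
            position INTEGER NOT NULL,
            form     TEXT NOT NULL,
            tag      TEXT,
            UNIQUE (doc_id, position)
        );
        """
    )
    return conn


if __name__ == "__main__":
    conn = create_corpus_db()
    conn.execute("INSERT INTO documents (doc_id, title) VALUES (1, 'toy document')")
    conn.execute(
        "INSERT INTO tokens (doc_id, position, form, tag) VALUES (1, 1, 'Dogs', 'NOUN')"
    )
    try:
        # doc_id 99 does not exist, so the foreign key check rejects this row.
        conn.execute(
            "INSERT INTO tokens (doc_id, position, form, tag) VALUES (99, 1, 'x', 'NOUN')"
        )
    except sqlite3.IntegrityError as error:
        print("rejected:", error)
```

Passing a file path instead of ":memory:" would store the same schema on disk.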
