A TextGrid file contains data about the intervals, segments, times, etc. of the corresponding signal file (audio in wav, mp3, aif…). Because TextGrids are plain text, they can be parsed and then analysed, checked or mined for data automatically.
If you are a linguist or phonetician, you are probably using Praat, a small but very powerful programme for phonetic analysis. Chances are you have a lot of speakers and recordings. You will probably segment the signals in Praat and save the segmentation in TextGrids.
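To make the plain-text point concrete, here is a minimal sketch that pulls the intervals out of a TextGrid in Praat's long text format using nothing but a regular expression. The sample text, the `INTERVAL_RE` pattern and the `read_intervals` function are my own illustrations, not part of Praat or any library. The sketch also ignores tier boundaries and Praat's doubled-quote escaping inside labels, so treat it as a demonstration rather than a full parser.

```python
import re

# A tiny hand-written sample in Praat's long text format (not real data).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.7
            text = "dice"
        intervals [2]:
            xmin = 0.7
            xmax = 1.5
            text = ""
'''

# Matches one "intervals [n]:" block and captures its start, end and label.
INTERVAL_RE = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = ([\d.]+)\s*'
    r'xmax = ([\d.]+)\s*'
    r'text = "([^"]*)"'
)


def read_intervals(textgrid_text):
    """Return (start, end, label) tuples for every interval in the text."""
    return [
        (float(start), float(end), label)
        for start, end, label in INTERVAL_RE.findall(textgrid_text)
    ]


intervals = read_intervals(SAMPLE)
for start, end, label in intervals:
    print(f"{start:.2f}-{end:.2f}  {label!r}")
```

For real work a proper parser is the safer choice, since it keeps track of which tier each interval belongs to.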
Thanks to Margaret Mitchell and Steven Bird, who contributed a parser for Praat TextGrids to the Natural Language Toolkit (NLTK), automated analysis is now much easier.
I am grateful to the authors, because they saved me a lot of time during segmentation checks. All I needed was a Python script that uses their parser to load the TextGrid content and then runs a set of checks on each file/speaker.
Checking file 03-speaker-im.TextGrid
Checking proper tier names...
Checking if tiers contain 32 items...
Checking if all tiers have valid text...
Checking if the diphthongs have pairs...
Checking if all words are present...
Checking if the words and diphthongs match...
Mismatch: "ay_l" not allowed in "dice", at position 24. It should say "ay_s".
Here, for example, my script warned me that I had a wrong diphthong label in file number 3. Spotting that “manually” would have required a lot of time and attention.
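To illustrate the idea, a check of this kind might look like the following minimal sketch. It is not the script itself: the word-to-diphthong table, the function name and the sample labels are all made up for illustration, and the labels are assumed to be already loaded from the TextGrid tiers into plain Python lists.

```python
# Hypothetical rule table: which diphthong label each word should carry.
# These pairings are invented for the example.
EXPECTED_DIPHTHONG = {"dice": "ay_s", "dial": "ay_l"}


def check_words_and_diphthongs(words, diphthongs):
    """Compare two parallel label lists and return a list of messages."""
    problems = []
    if len(words) != len(diphthongs):
        problems.append(
            f"Tier lengths differ: {len(words)} words, "
            f"{len(diphthongs)} diphthongs."
        )
        return problems
    # Positions are reported starting from 1, as in the log output.
    for position, (word, label) in enumerate(zip(words, diphthongs), start=1):
        expected = EXPECTED_DIPHTHONG.get(word)
        if expected is None:
            problems.append(f'Unknown word "{word}" at position {position}.')
        elif label != expected:
            problems.append(
                f'Mismatch: "{label}" not allowed in "{word}", '
                f'at position {position}. It should say "{expected}".'
            )
    return problems


# Made-up labels standing in for two tiers of one file.
words = ["dial", "dice", "dice"]
diphthongs = ["ay_l", "ay_l", "ay_s"]

problems = check_words_and_diphthongs(words, diphthongs)
for message in problems:
    print(message)
```

Returning the messages as a list, rather than printing inside the function, makes it easy to run the same check over every file and collect the results per speaker.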
I hope this post helps other researchers. Here is the Python script I wrote for my phonetic research.