This tool was developed for internal use to guide us in revising our manuscripts, but we thought it might also be of use to others. Please keep in mind that it is experimental and still in development.
The heart of the tool is a set of roughly 38,000 words to which a TAP phonics level has been assigned. When you enter a text for analysis, the tool simply breaks it into words and non-words. We look up each word in the levelled list and, if it is found, adjust the appearance of that word in the results according to its level.
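The lookup step can be sketched as follows. This is a minimal Python illustration, not the tool's actual code (which is Perl); the tokenizing regex, the `analyse` function name, and the tiny three-word sample of the levelled list are all assumptions for the example.

```python
import re

# Hypothetical fragment of the levelled word list (word -> TAP level);
# the real tool loads roughly 38,000 entries from a prebuilt list.
LEVELLED_WORDS = {"cat": 1, "ship": 2, "night": 3}

def analyse(text):
    """Break text into word tokens and pair each word with its TAP
    level, or None if the word is not in the levelled list."""
    words = re.findall(r"[a-zA-Z']+", text)
    return [(w, LEVELLED_WORDS.get(w.lower())) for w in words]

print(analyse("The cat saw a ship at night."))
```

Words not found in the list (here, "The", "saw", "a", "at") come back with no level and would be styled as unknowns in the results.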
Did we assign the TAP levels to each of those 38,000 words by hand? Not on your life! A separate tool builds that list using two public domain resources: the Carnegie Mellon University (CMU) Pronouncing Dictionary and the English-language word list from the Moby Project by Grady Ward, which is mirrored at Project Gutenberg. The latter list contains over 130,000 words. We begin by looking up the pronunciation of each word in the CMU Pronouncing Dictionary. These are North American English pronunciations, but the TAP level will be the same for lower-level words, which is what we're focused on for now.
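For readers unfamiliar with it, the CMU Pronouncing Dictionary is distributed as a plain-text file with one entry per line (e.g. `CAT  K AE1 T`), comment lines beginning with `;;;`, and alternate pronunciations marked like `CAT(1)`. A rough Python sketch of a parser for that format (our actual builder is a Perl script, and keeping only the first pronunciation is a simplifying assumption here):

```python
def parse_cmu_dict(lines):
    """Parse lines in the CMU Pronouncing Dictionary's plain-text
    format into a word -> phoneme-list mapping."""
    prons = {}
    for line in lines:
        if not line.strip() or line.startswith(";;;"):
            continue  # skip blanks and comment lines
        head, *phones = line.split()
        word = head.split("(")[0].lower()  # 'CAT(1)' -> 'cat'
        prons.setdefault(word, phones)     # keep the first pronunciation
    return prons

sample = [
    ";;; a comment line",
    "CAT  K AE1 T",
    "CAT(1)  K AE2 T",
    "SHIP  SH IH1 P",
]
print(parse_cmu_dict(sample))
# {'cat': ['K', 'AE1', 'T'], 'ship': ['SH', 'IH1', 'P']}
```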
If the pronunciation is found, we try to reconstruct the spelling of the word from its pronunciation (its phoneme sequence) using lookup tables of spelling-pronunciation pairs, one table for each TAP level. (What goes in each table above level 3 or 4 is still fairly fluid; for now the collection available through our app consists of Level 2 novels.) The level assigned to a word is the highest-level table we have to consult to find spellings that account for the word's full sound. All of this is done in a Perl script. Did we mention that this is experimental and a work in progress? If you've made it this far and really want the nitty-gritty of the recursive algorithm we use to recover the spelling of the target word, you'll find all the rough-and-ready code for the decodability analysis tool in this public repository and the Perl script itself here.
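The recursive idea can be sketched in Python, under stated assumptions: the `min_level` function name, the toy per-level tables, and the treatment of stress digits are all illustrative, not our Perl implementation. The recursion peels a spelling-pronunciation pair off the front of the word, recurses on the remainder, and keeps the decomposition whose worst-case table level is lowest.

```python
# Hypothetical per-level tables of (spelling, phoneme-sequence) pairs;
# the real tables are far larger and their upper levels are still fluid.
TABLES = {
    1: [("c", ("K",)), ("a", ("AE",)), ("t", ("T",))],
    2: [("sh", ("SH",)), ("i", ("IH",)), ("p", ("P",))],
}

def strip_stress(phones):
    """CMU vowels carry stress digits (AE1 -> AE); drop them."""
    return tuple(p.rstrip("012") for p in phones)

def min_level(spelling, phones):
    """Return the lowest TAP level whose tables (that level and below)
    can jointly account for the whole spelling/pronunciation pair, or
    None if no decomposition is found."""
    if not spelling and not phones:
        return 0  # sentinel: nothing left to explain
    best = None
    for level, pairs in TABLES.items():
        for graph, phon in pairs:
            if spelling.startswith(graph) and phones[:len(phon)] == phon:
                rest = min_level(spelling[len(graph):], phones[len(phon):])
                if rest is not None:
                    cand = max(level, rest)  # highest table this path needs
                    if best is None or cand < best:
                        best = cand
    return best

print(min_level("cat", strip_stress(("K", "AE1", "T"))))   # 1
print(min_level("ship", strip_stress(("SH", "IH1", "P"))))  # 2
```

Here "cat" is fully covered by level 1 pairs, while "ship" needs the level 2 table for "sh", so its assigned level is 2.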