Text normalization refers to methods used to convert text from written form (e.g., "$2.50", "5:15pm") to spoken form ("two dollars and fifty cents", "five fifteen PM"), primarily, though not exclusively, as part of a text-to-speech synthesis engine. Unlike most of modern natural language processing, this process is still largely accomplished with hand-written, language-specific grammars rather than general-purpose machine learning, which hinders scaling to new languages. In the first half of the talk, I describe experiments on learning text normalization for cardinal and ordinal numbers (Gorman & Sproat 2016), a problem that accounts for much of the complexity of these language-specific grammars. The first model I describe uses an end-to-end recurrent neural network. The second model, drawing inspiration from the linguistics literature, uses finite-state transducers constructed from a minimal amount of training data. While both models achieve near-perfect performance, the latter can be trained with several orders of magnitude less data than the former, making it particularly useful for low-resource languages. I also describe some extensions of this work to other classes such as currency and time expressions. In the second half of the talk, I discuss Pynini (Gorman 2016), an open-source Python library for grammar compilation, which is used to implement minimally supervised induction of text normalization grammars. I will review the design of this library and provide several tutorial examples drawn from morphophonology.
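
To give a flavor of the kind of grammar compilation Pynini supports, the sketch below encodes a toy morphophonological rule (word-final obstruent devoicing) as a context-dependent rewrite rule and applies it by finite-state composition. This example is illustrative only and is not taken from the talk; it assumes a recent Pynini release, in which names such as cross and accep postdate the 2016 version discussed here.

    import pynini

    # Sigma*: closure over a small lowercase alphabet; the rewrite-rule
    # compiler requires it to cover every input and output symbol in the rule.
    sigma_star = pynini.union(*"abcdefghijklmnopqrstuvwxyz").closure()

    # Toy morphophonological rule: word-final obstruent devoicing
    # (b -> p, d -> t, g -> k before the end of the string).
    devoicing = pynini.cdrewrite(
        pynini.union(pynini.cross("b", "p"),
                     pynini.cross("d", "t"),
                     pynini.cross("g", "k")),
        "",         # no left-context restriction
        "[EOS]",    # right context: end of string
        sigma_star)

    # Apply the rule by composing an input acceptor with the rule transducer
    # and reading off the single surviving output string.
    print(pynini.shortestpath(pynini.accep("hund") @ devoicing).string())  # hunt

Composition followed by shortest-path extraction is the standard way to apply such a rule transducer to an input; since obligatory rewrite rules are functional, the result here is a single output string.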
For more information about Kyle Gorman, see https://research.google.com/pubs/KyleGorman.html