Retrieving High-Quality IPA for Many Languages

English Transcription of “IPA” (Wikipedia)

When learning a new language, mastering the pronunciation early usually speeds things up considerably. However, just listening to native speakers often doesn’t provide enough input to really distinguish between the sounds of the new language. To get around this, looking at the transcription in the International Phonetic Alphabet (IPA) provides a good cue.

Unfortunately, many Anki decks and Memrise courses still have no IPA transcriptions. This is very likely due to the lack of machine-readable resources from which the transcription for a given entry could be generated automatically.

As a workaround for private use, for example for a small Anki deck, one can query the websites of established dictionaries. For example, the German publisher PONS provides an online dictionary for many (mostly European) languages that includes IPA transcriptions.

To extract these transcriptions, a simple shell script based on xidel can be used. The following script takes the French word “possibilité” and returns the transcription “[pɔsibilite]”.

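#!/bin/bash

WORD="possibilité"

# urlencode <string>
# Percent-encode every byte that is not an unreserved URL character.
urlencode() {
  local length="${#1}" i c hex
  for (( i = 0; i < length; i++ )); do
    c="${1:i:1}"
    case "$c" in
      # Unreserved characters are passed through unchanged.
      [a-zA-Z0-9.~_-]) printf '%s' "$c" ;;
      # Everything else is dumped as hex, one byte per line, and prefixed with %.
      *) printf '%s' "$c" | xxd -p -c1 |
           while read -r hex; do printf '%%%s' "$hex"; done ;;
    esac
  done
}

# Build the dictionary URL and extract the text of the "phonetics" span.
URL="http://en.pons.com/translate/french-english/$(urlencode "${WORD}")"
IPA="$(xidel "${URL}" -e '<span class="phonetics">{text()}</span>')"

echo "${IPA}"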

This example code can easily be adjusted to obtain the IPA transcription for any other language supported by the PONS dictionary. By wrapping it in a loop, one can batch-process a small number of entries, e.g. a vocabulary list for an Anki deck.
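
Such a loop could look roughly like the following sketch. It is not part of the original script: the names lookup_ipa, batch_ipa and DICT are made up for illustration, urlencode() is assumed to be defined as in the script above, and the idea that other language pairs follow the same URL pattern as "french-english" is an assumption that should be checked on the PONS website. Taking only the first match per word and pausing between requests are design choices of this sketch, not requirements.

```shell
#!/bin/bash

# Assumed dictionary segment of the URL; other language pairs are
# presumed (not verified) to follow the same pattern.
DICT="french-english"

# Hypothetical wrapper around the lookup shown above.
# Assumes urlencode() from the script above is defined and xidel is installed.
lookup_ipa() {
  local url="http://en.pons.com/translate/${DICT}/$(urlencode "$1")"
  xidel "$url" -e '<span class="phonetics">{text()}</span>'
}

# batch_ipa [lookup-command] [delay-in-seconds]
# Reads one word per line from stdin and prints "word<TAB>IPA" lines.
# The lookup command is a parameter so that it can be swapped or stubbed.
batch_ipa() {
  local lookup="${1:-lookup_ipa}" delay="${2:-1}" word ipa
  while IFS= read -r word || [ -n "$word" ]; do
    [ -z "$word" ] && continue   # skip empty lines
    # Keep only the first transcription; stdin is detached so the
    # lookup cannot swallow the rest of the word list.
    ipa="$("$lookup" "$word" < /dev/null | head -n 1)"
    printf '%s\t%s\n' "$word" "$ipa"
    sleep "$delay"               # pause between requests
  done
}

# Usage: batch_ipa < words.txt > ipa.tsv
```

The resulting tab-separated file has one line per input word, which makes it straightforward to merge with a deck exported as plain text.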

Conclusion

Now that there is an easy way to retrieve machine-readable IPA transcriptions for the languages supported by the PONS Online Dictionary, we will hopefully see more and more Anki decks and Memrise courses that include phonetic transcriptions on their flashcards, leading to more fun and success in language learning.

3 thoughts on “Retrieving High-Quality IPA for Many Languages”

  1. acutia

    This looks useful. But how do you implement this bash script? Do you compile it or run it via some other app? Could it be used within Anki itself to search for IPA for all the entries in a deck?

    1. Timo Horstschäfer Post author

      You can run it directly on a terminal in Bash. I used it to generate a list of IPA transcriptions for my French deck. To get the list into Anki, I exported the deck as Plain Text, used LibreOffice to merge the two files and imported back into Anki.
