Dirty Phrasebook – Part 6

On 1st April 2015 I published a joke app to Google Play named Dirty Phrasebook, which is based upon Monty Python’s Dirty Hungarian Phrasebook sketch. In this series of articles we’ll take a look into the code (which will be open-sourced along with the final article). In this article we’ll look at how Text-To-Speech was implemented in the app.

The last remaining aspect of Dirty Phrasebook for us to look at is the TextToSpeech engine. TextToSpeech support has been in Android since API 4 (Donut) but is worthy of a little discussion because the API changed quite a bit in API 21 (Lollipop), and it looks as though it has changed considerably internally as well. We’ll discuss how I decided to handle these changes in due course, but let’s begin with a quick discussion of our requirements.

Firstly, as we saw in the previous article, we need to change the visibility of the TextToSpeech indicator depending on whether the selected language is available in the TextToSpeech engine, so we need a mechanism to check this. Also, because I was in a particularly cruel mood one morning, following a suggestion from Sebastiano Poggi I decided to increase the audio volume to maximum just prior to TextToSpeech playback, and return it to its previous level at the end. This was with the intention of embarrassing people trying it for the first time who took the precaution of lowering their volume before trying the TTS. Someone who tried it at work told me that their phone spoke “Drop your panties, Sir William, I cannot wait till lunchtime.” very loudly (despite the volume having been turned down low), eliciting lots of accusatory looks from colleagues. That fully justified the decision 🙂

Let’s begin by looking at the class which performs this volume setting:

final class VolumeController extends android.speech.tts.UtteranceProgressListener {
    private final AudioManager audioManager;
    // Volume that was in force before each utterance, keyed by utteranceId.
    private final Map<String, Integer> volumeMap = new HashMap<>();

    VolumeController(AudioManager audioManager) {
        this.audioManager = audioManager;
    }

    @Override
    public void onStart(String utteranceId) {
        // Nothing to do: the volume has already been raised in setVolume().
    }

    @Override
    public void onDone(String utteranceId) {
        resetVolume(utteranceId);
    }

    @Override
    public void onError(String utteranceId) {
        resetVolume(utteranceId);
    }

    public void setVolume(String utteranceId, float volume) {
        int currentVolume = audioManager.getStreamVolume(AudioManager.STREAM_MUSIC);
        // volume is a fraction (0.0 to 1.0) of the stream's maximum.
        int newVolume = (int) (volume * audioManager.getStreamMaxVolume(AudioManager.STREAM_MUSIC));
        volumeMap.put(utteranceId, currentVolume);
        setMusicVolume(newVolume);
    }

    private void resetVolume(String utteranceId) {
        Integer volume = volumeMap.remove(utteranceId);
        if (volume != null) {
            setMusicVolume(volume);
        }
    }

    private void setMusicVolume(int volume) {
        audioManager.setStreamVolume(AudioManager.STREAM_MUSIC, volume, 0);
    }
}
We’ll discover more about utteranceId later on, but it uniquely identifies the particular instance of a phrase being spoken. So what we do is get the existing volume and map it to the utteranceId. When we get an onError() or onDone() callback signifying that a specific utterance has completed, we reset the volume back to its original value.
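The save-and-restore idea can be tried without a device. The following is a minimal sketch, not the app's code: StreamVolume, FakeStream and VolumeRestorer are hypothetical names, with StreamVolume standing in for the three AudioManager calls used by VolumeController.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the AudioManager calls used by VolumeController. */
interface StreamVolume {
    int get();
    int max();
    void set(int volume);
}

/** Remembers the volume in force before each utterance and restores it afterwards. */
final class VolumeRestorer {
    private final StreamVolume stream;
    private final Map<String, Integer> saved = new HashMap<>();

    VolumeRestorer(StreamVolume stream) {
        this.stream = stream;
    }

    /** Raises the volume to a fraction (0.0 to 1.0) of the maximum for one utterance. */
    void raiseFor(String utteranceId, float fraction) {
        saved.put(utteranceId, stream.get());
        stream.set((int) (fraction * stream.max()));
    }

    /** Restores the saved volume; a second call for the same ID does nothing. */
    void finished(String utteranceId) {
        Integer previous = saved.remove(utteranceId);
        if (previous != null) {
            stream.set(previous);
        }
    }
}

/** In-memory StreamVolume for running the sketch off-device. */
final class FakeStream implements StreamVolume {
    private int volume;
    private final int max;

    FakeStream(int volume, int max) {
        this.volume = volume;
        this.max = max;
    }

    @Override public int get() { return volume; }
    @Override public int max() { return max; }
    @Override public void set(int volume) { this.volume = volume; }
}
```

One thing this arrangement makes visible: if a second utterance is queued before the first has finished, the value saved for the second is the already-raised volume, so the restore order matters. For a single tap-to-speak button that is unlikely to be a problem in practice.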

The API changes I mentioned earlier consist of a number of new methods which replace existing methods that are deprecated as of API 21. For me this poses a quandary as I dislike using deprecated methods. Therefore I elected to create my own compat wrapper which provides a standard interface, while the individual implementations call the relevant methods on TextToSpeech for the host OS:

public abstract class TextToSpeechCompat {
    private static final String UTTERANCE_ID_FORMAT = "com.stylingandroid.dirtyphrasebook.tts.TextToSpeechCompat-%d";
    private static int currentUtteranceId = 0;
    private final TextToSpeech textToSpeech;

    private final VolumeController volumeController;

    public static TextToSpeechCompat newInstance(Context context, TextToSpeech.OnInitListener initListener) {
        TextToSpeech textToSpeech = new TextToSpeech(context, initListener);
        TextToSpeechCompat textToSpeechCompat;
        AudioManager audioManager = (AudioManager) context.getSystemService(Context.AUDIO_SERVICE);
        VolumeController volumeController = new VolumeController(audioManager);
        // Choose the implementation that matches the host OS.
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
            textToSpeechCompat = new LollipopTextToSpeech(textToSpeech, volumeController);
        } else {
            textToSpeechCompat = new LegacyTextToSpeech(textToSpeech, volumeController);
        }
        return textToSpeechCompat;
    }

    private static String getUtteranceId() {
        return String.format(Locale.UK, UTTERANCE_ID_FORMAT, currentUtteranceId++);
    }

    protected TextToSpeechCompat(TextToSpeech textToSpeech, VolumeController volumeController) {
        this.textToSpeech = textToSpeech;
        this.volumeController = volumeController;
        // Assumed: the controller has to be registered somewhere for its
        // onDone()/onError() callbacks to fire; doing it here is one option.
        textToSpeech.setOnUtteranceProgressListener(volumeController);
    }

    protected TextToSpeech getTextToSpeech() {
        return textToSpeech;
    }

    public boolean isLanguageAvailable(Locale locale) {
        int availability = textToSpeech.isLanguageAvailable(locale);
        return availability != TextToSpeech.LANG_NOT_SUPPORTED;
    }

    public int setLanguage(Locale locale) {
        return textToSpeech.setLanguage(locale);
    }

    public int speak(CharSequence text, int queueMode, Float volume) {
        // Tag the utterance so that the volume can be restored when it completes.
        String utteranceId = getUtteranceId();
        volumeController.setVolume(utteranceId, volume);
        return speak(text, queueMode, utteranceId);
    }

    protected abstract int speak(CharSequence text, int queueMode, String utteranceId);

    public void shutdown() {
        textToSpeech.shutdown();
    }
}
This works by holding a reference to a TextToSpeech instance, but this is actually an abstract class and there are distinct concrete implementations for the Lollipop APIs and legacy APIs – the appropriate one is created in the newInstance() factory method.
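The factory decision can be sketched on its own, without any Android classes. This is only an illustration: SpeakerCompat, LegacySpeaker and LollipopSpeaker are hypothetical names, and the SDK level is passed in as a plain int rather than read from Build.VERSION.SDK_INT.

```java
/** Hypothetical, Android-free stand-in for the compat base class. */
abstract class SpeakerCompat {
    /** Same value as Build.VERSION_CODES.LOLLIPOP. */
    static final int LOLLIPOP = 21;

    /** Picks the implementation once, so callers never branch on the OS version. */
    static SpeakerCompat newInstance(int sdkInt) {
        return sdkInt >= LOLLIPOP ? new LollipopSpeaker() : new LegacySpeaker();
    }

    /** Describes which TextToSpeech overload this implementation would call. */
    abstract String targetMethod();
}

final class LegacySpeaker extends SpeakerCompat {
    @Override
    String targetMethod() {
        return "speak(String, int, HashMap)";
    }
}

final class LollipopSpeaker extends SpeakerCompat {
    @Override
    String targetMethod() {
        return "speak(CharSequence, int, Bundle, String)";
    }
}
```

Calling SpeakerCompat.newInstance(19) yields a LegacySpeaker and SpeakerCompat.newInstance(21) a LollipopSpeaker. Code holding the returned reference only ever sees the abstract type.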

We also have a method to generate a unique utteranceId, plus some standard methods which wrap some of the existing methods of TextToSpeech which are common to both Lollipop and older versions of TextToSpeech. In the best interests of only implementing what I need, I only have a single abstract method which represents the only method I need which actually changes between Legacy and Lollipop versions:

class LegacyTextToSpeech extends TextToSpeechCompat {
    protected LegacyTextToSpeech(TextToSpeech textToSpeech, VolumeController volumeController) {
        super(textToSpeech, volumeController);
    }

    @Override
    public int speak(CharSequence text, int queueMode, String utteranceId) {
        // Before API 21 the utterance ID travels inside the params map.
        HashMap<String, String> params = new HashMap<>();
        if (utteranceId != null) {
            params.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, utteranceId);
        }
        return getTextToSpeech().speak(text.toString(), queueMode, params);
    }
}

public class LollipopTextToSpeech extends TextToSpeechCompat {
    protected LollipopTextToSpeech(TextToSpeech textToSpeech, VolumeController volumeController) {
        super(textToSpeech, volumeController);
    }

    @Override
    public int speak(CharSequence text, int queueMode, String utteranceId) {
        // From API 21 the utterance ID is a parameter in its own right.
        return getTextToSpeech().speak(text, queueMode, null, utteranceId);
    }
}

The differences in these two abstract method implementations show the delta in the APIs.
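To see the shape of that delta in isolation, here is a small Android-free sketch. All of the names are hypothetical, the "utteranceId" key is a stand-in for TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, and an AtomicInteger is used for the counter as one way of keeping the IDs unique if calls ever came from more than one thread.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

/** Generates the ID in one place and leaves only the delivery step abstract. */
abstract class UtteranceSender {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    /** Creates a fresh ID, then hands over to the version-specific delivery. */
    final String send(String text) {
        String utteranceId = String.format(Locale.ROOT, "utterance-%d", NEXT_ID.getAndIncrement());
        deliver(text, utteranceId);
        return utteranceId;
    }

    abstract void deliver(String text, String utteranceId);
}

/** Legacy style: the ID is packed into a map of string parameters. */
final class MapSender extends UtteranceSender {
    /** Stand-in for TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID. */
    static final String KEY_UTTERANCE_ID = "utteranceId";

    Map<String, String> lastParams = new HashMap<>();

    @Override
    void deliver(String text, String utteranceId) {
        Map<String, String> params = new HashMap<>();
        params.put(KEY_UTTERANCE_ID, utteranceId);
        lastParams = params;
    }
}

/** Lollipop style: the ID is passed directly as its own argument. */
final class DirectSender extends UtteranceSender {
    String lastUtteranceId;

    @Override
    void deliver(String text, String utteranceId) {
        lastUtteranceId = utteranceId;
    }
}
```

Either way the caller gets the same ID back from send(), which is what lets a listener such as VolumeController match a completion callback to the utterance that triggered it.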

That’s pretty much it as far as the code is concerned, but it is worth sharing some issues that I encountered on Lollipop devices. The isLanguageAvailable() method in TextToSpeech seems to be a little unreliable as it reports that certain languages are available when they are not. Some examples of languages that it incorrectly reports as being available are Swedish, Norwegian, and Danish. Anyone using Dirty Phrasebook may have found that these simply do not play back (certainly for me on a device in the UK). This is a known issue, but one for which I have been unable to find a solution. I have tried calling getAvailableLanguages() (introduced in API 21) instead of using the older isLanguageAvailable() (which is what is reporting incorrectly for some languages) but it simply crashes with a NullPointerException, so I have little option but to stick with the existing solution. If anyone has a solution for this then please let me know!
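One possibility that may be worth ruling out: the check in TextToSpeechCompat.isLanguageAvailable() only rejects LANG_NOT_SUPPORTED, so a result of LANG_MISSING_DATA (the language is known to the engine but its voice data is not installed) is also treated as available. The sketch below mirrors the TextToSpeech result codes as local constants so that it runs off-device, and compares the two checks. LanguageAvailability and its method names are hypothetical, and whether this accounts for the Swedish, Norwegian and Danish behaviour described above is untested.

```java
/** Compares a lenient and a strict reading of isLanguageAvailable() results. */
final class LanguageAvailability {
    // Local mirrors of the android.speech.tts.TextToSpeech result codes.
    static final int LANG_COUNTRY_VAR_AVAILABLE = 2;
    static final int LANG_COUNTRY_AVAILABLE = 1;
    static final int LANG_AVAILABLE = 0;
    static final int LANG_MISSING_DATA = -1;
    static final int LANG_NOT_SUPPORTED = -2;

    private LanguageAvailability() {
    }

    /** The check as written in TextToSpeechCompat: only NOT_SUPPORTED fails. */
    static boolean isLenientlyAvailable(int code) {
        return code != LANG_NOT_SUPPORTED;
    }

    /** Stricter variant: only codes that report usable voice data pass. */
    static boolean isStrictlyAvailable(int code) {
        return code >= LANG_AVAILABLE;
    }
}
```

With the strict variant a LANG_MISSING_DATA result would hide the TextToSpeech indicator rather than show a button that stays silent. If the engine really does return LANG_AVAILABLE for those languages, this changes nothing.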

That concludes our look at Dirty Phrasebook. I encourage you to download it from Google Play (it’s free and contains no advertisements) and have a play with it to see how the UX works and quite seamlessly hides any need to perform on the fly translation as the user types.

One final note: If anyone would like to contribute additional translations then please take a look at the project source where there are full instructions for how to put together a pull request containing new translations.

The source code for this series is available here.

I am deeply indebted to my fantastic team of volunteer translators who generously gave their time and language skills to make this project sooo much better. They are Sebastiano Poggi (Italian), Zvonko Grujić (Croatian), Conor O’Donnell (Gaelic), Stefan Hoth (German), Hans Petter Eide (Norwegian), Wiebe Elsinga (Dutch), Imanol Pérez Iriarte (Spanish), Adam Graves (Malay), Teo Ramone (Greek), Mattias Isegran Bergander (Swedish), Morten Grouleff (Danish), George Medve (Hungarian), Anup Cowkur (Hindi), Draško Sarić (Serbian), Polson Keeratibumrungpong (Thai), Benoit Duffez (French), and Vasily Sochinsky (Russian).

© 2015, Mark Allison. All rights reserved.

Copyright © 2015 Styling Android. All Rights Reserved.