
Dirty Phrasebook – Part 2

On 1st April 2015 I published a joke app to Google Play named Dirty Phrasebook, which is based upon Monty Python’s Dirty Hungarian Phrasebook sketch. In this series of articles we’ll take a look into the code (which will be open-sourced along with the final article). In this article we’ll look at the remainder of the translation mechanism used to (hopefully) reliably and repeatably translate the user’s input into one of the phrases from the sketch.

Previously we looked at how to simplify a string to make string comparisons easier, so the next thing to look at is how we actually perform the ‘translations’. We saw previously that we have a set of nine target phrases, and we take the hash code of the sanitised string modulo 9 to get a number between 0 and 8, which selects the appropriate phrase.

For each language into which we’re going to ‘translate’, we have a copy of these nine strings translated into that language. Each copy is stored in a string array resource whose name is the ISO 639-1 code for that language. In some cases we may add a country code as well; we’ll cover this later. For example, the English, Italian and Croatian translations are:

<?xml version="1.0" encoding="utf-8"?>
<resources xmlns:tools="http://schemas.android.com/tools" tools:ignore="Typos,UnusedResources">
  <string-array name="en">
    <item>I will not buy this record, it is scratched.</item>
    <item>My hovercraft is full of eels.</item>
    <item>Do you want to come back to my place, bouncy bouncy?</item>
    <item>If I said you had a beautiful body, would you hold it against me?</item>
    <item>I am no longer infected.</item>
    <item>You have beautiful thighs.</item>
    <item>Drop your panties, Sir William, I cannot wait till lunchtime.</item>
    <item>Please fondle my bum.</item>
    <item>My nipples explode with delight.</item>
  </string-array>

  <string-array name="it">
    <item>Non comprerò questo album, è graffiato.</item>
    <item>Il mio hovercraft è pieno di anguille.</item>
    <item>Ti va di venire a casa mia, bum bum?</item>
    <item>Se ti dicessi che hai un corpo magnifico, poi me lo rinfacceresti?</item>
    <item>Non sono più infetto.</item>
    <item>Hai delle bellissime cosce.</item>
    <item>Cala le mutande, Ser Guglielmo, non posso attendere fino a pranzo.</item>
    <item>Per favore, accarezzami il sedere.</item>
    <item>I miei capezzoli esplodono di delizia.</item>
  </string-array>

  <string-array name="hr">
    <item>Neću kupiti ovu ploču, ogrebana je.</item>
    <item>Moja je lebdjelica puna jegulja.</item>
    <item>Hoćete li doći sa mnom doma, hopa cupa?</item>
    <item>Da li biste se naljutili kad bih rekao da imate lijepo tijelo?</item>
    <item>Nisam više zaražen.</item>
    <item>Imate lijepe bokove.</item>
    <item>Skidajte gaće Sir William, ne mogu čekati do ručka.</item>
    <item>Molim Vas pomilujte mi dupe.</item>
    <item>Moje bradavice pršte od užitka.</item>
  </string-array>
</resources>
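Because the phrase is chosen by its index, a ‘translation’ only works if every language array holds the same nine phrases in the same order. The article doesn’t show such a check, but a cheap fail-fast sanity test could at least confirm that the array lengths agree. This is a plain-Java sketch; the class and method names are hypothetical, and a Map stands in for the Android string array resources:

```java
import java.util.Map;

// Hypothetical helper: verifies that every language has the same number of phrases.
final class TranslationTable {
    private TranslationTable() {}

    // Throws if any language's phrase array differs in length from the others,
    // because index N must refer to the same sketch phrase in every language.
    static void checkConsistent(Map<String, String[]> translationsByCode) {
        int expected = -1;
        for (Map.Entry<String, String[]> entry : translationsByCode.entrySet()) {
            int length = entry.getValue().length;
            if (expected == -1) {
                // The first language seen sets the expected length
                expected = length;
            } else if (length != expected) {
                throw new IllegalStateException("Language " + entry.getKey()
                        + " has " + length + " phrases, expected " + expected);
            }
        }
    }
}
```

It cannot detect phrases that are present but in the wrong order; that still relies on the translators.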

In addition to this we need a mapping from these ISO 639-1 codes to human-readable names for use in the UI, so we have a further pair of string arrays which perform this mapping:

<?xml version="1.0" encoding="utf-8"?>
<resources xmlns:tools="http://schemas.android.com/tools" tools:ignore="Typos">
  <string-array name="languages">
    <item>en</item>
    <item>it</item>
    <item>hr</item>
  </string-array>

  <string-array name="readable_languages">
    <item>English</item>
    <item>Italiano</item>
    <item>Hrvatski</item>
  </string-array>
</resources>

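Note that the two arrays must list the languages in the same order for the following code to work correctly: the item at each index of languages must correspond to the item at the same index of readable_languages.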

So we can now turn our attention to the Translator class, which actually performs the ‘translations’. We create a Translator instance by building a map with the readable form as the key and the ISO 639-1 code as the value:

import android.content.Context;
import android.content.res.Resources;

import java.util.HashMap;
import java.util.Map;

public class Translator {

    private final Resources resources;
    private final String packageName;
    // Readable language name (e.g. "Italiano") -> ISO 639-1 code (e.g. "it")
    private final Map<String, String> languagesMap;

    public static Translator getInstance(Context context) {
        Resources resources = context.getResources();
        String packageName = context.getPackageName();
        String[] languages = resources.getStringArray(R.array.languages);
        String[] readableLanguages = resources.getStringArray(R.array.readable_languages);
        // The two arrays are parallel, so a length mismatch is a developer error
        if (languages.length != readableLanguages.length) {
            throw new DeveloperException("R.array.languages and R.array.readable_languages need to be the same size. Bad developer!");
        }
        Map<String, String> languagesMap = new HashMap<>(readableLanguages.length);
        for (int index = 0; index < readableLanguages.length; index++) {
            languagesMap.put(readableLanguages[index], languages[index]);
        }
        return new Translator(resources, packageName, languagesMap);
    }

    Translator(Resources resources, String packageName, Map<String, String> languagesMap) {
        this.resources = resources;
        this.packageName = packageName;
        this.languagesMap = languagesMap;
    }
}

DeveloperException represents a case which should never be hit in a production environment; it is designed to fail fast when I make stupid mistakes during development. DeveloperException simply subclasses java.lang.RuntimeException.
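The DeveloperException class itself isn’t listed in the article. Going only on the description, a minimal sketch might look like the following; the single String-message constructor is an assumption based on how it is called in Translator:

```java
// Unchecked exception for "this should never happen" developer mistakes.
// Extending RuntimeException means callers aren't forced to catch or declare
// it, so a misconfiguration crashes immediately during development.
class DeveloperException extends RuntimeException {
    DeveloperException(String message) {
        super(message);
    }
}
```

Making it unchecked is the point: a checked exception would have to be declared or caught all the way up the call chain for an error that should never occur in a shipped build.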

Next we need to be able to set the target language that will be translated to:

public class Translator {
    private static final String RESOURCE_TYPE = "array";

    private String[] translations;
    // ...

    public void setLanguage(String language) {
        translations = getTranslations(languagesMap.get(language));
    }

    private String[] getTranslations(String language) {
        final int resourceId = resources.getIdentifier(language, RESOURCE_TYPE, packageName);
        if (resourceId == 0) {
            throw new DeveloperException("The dev is cray cray. Cannot find string array for language " + language);
        }
        return resources.getStringArray(resourceId);
    }
}

So when the language is set using one of the readable language names, setLanguage() looks up the corresponding ISO 639-1 code and loads the matching translation string array.
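To see the two-step lookup (readable name, then ISO code, then phrase array) without an Android device, here is a plain-Java sketch. It assumes a Map<String, String[]> standing in for Resources.getIdentifier() and getStringArray(), and an IllegalStateException in place of DeveloperException; the LanguageSelector class and its methods are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java stand-in for Translator's language handling.
final class LanguageSelector {
    private final Map<String, String> codeByReadableName;
    private final Map<String, String[]> arraysByCode;
    private String[] translations;

    private LanguageSelector(Map<String, String> codeByReadableName,
                             Map<String, String[]> arraysByCode) {
        this.codeByReadableName = codeByReadableName;
        this.arraysByCode = arraysByCode;
    }

    // Mirrors getInstance(): zips the two parallel arrays into a map.
    static LanguageSelector create(String[] codes, String[] readableNames,
                                   Map<String, String[]> arraysByCode) {
        if (codes.length != readableNames.length) {
            throw new IllegalArgumentException("codes and readableNames must be the same size");
        }
        Map<String, String> map = new HashMap<>(readableNames.length);
        for (int i = 0; i < readableNames.length; i++) {
            map.put(readableNames[i], codes[i]);
        }
        return new LanguageSelector(map, arraysByCode);
    }

    // Mirrors setLanguage(): readable name -> ISO code -> phrase array.
    void setLanguage(String readableName) {
        String code = codeByReadableName.get(readableName);
        String[] array = code == null ? null : arraysByCode.get(code);
        if (array == null) {
            // Plays the role of getIdentifier() returning 0
            throw new IllegalStateException("No translations for " + readableName);
        }
        translations = array;
    }

    String[] getTranslations() {
        return translations;
    }
}
```

Selecting "Italiano" resolves to the "it" array; an unknown name fails immediately rather than leaving translations unset.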

With all of this in place, performing the actual translation for the currently set language is pretty easy:

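public class Translator {
    // ...
    public String getTranslation(String source) {
        String sanitised = StringSanitser.sanitise(source);
        int hashcode = Math.abs(sanitised.hashCode());
        // Math.abs(Integer.MIN_VALUE) is still negative, so also take the
        // absolute value of the remainder to guarantee a valid index
        int index = Math.abs(hashcode % translations.length);
        return translations[index];
    }
    // ...
}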

The string that we wish to translate is source, and we begin by applying the sanitisation that we discussed previously. We then get the hash code of the sanitised string (which can be negative, so we apply Math.abs() to it) and take it modulo the number of translation strings to get the index of the string that we’ll translate to. Because of this, if the user repeatedly enters the same string (or different strings which are identical after sanitisation), the translation will always be the same.
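The determinism described above is easy to check in plain Java. The real StringSanitser comes from Part 1 and isn’t reproduced here; the sanitise() method below is a hypothetical stand-in that simply lower-cases the input and keeps only letters and digits. The index is taken as the absolute value of the remainder, which gives the same index as Math.abs(hash) % length for every hash except Integer.MIN_VALUE, where it avoids a negative index:

```java
import java.util.Locale;

// Hypothetical, simplified version of the translation step.
final class TranslationDemo {
    private TranslationDemo() {}

    // Stand-in for StringSanitser.sanitise(): lower-case, letters and digits only.
    static String sanitise(String source) {
        StringBuilder builder = new StringBuilder(source.length());
        for (char c : source.toLowerCase(Locale.ROOT).toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                builder.append(c);
            }
        }
        return builder.toString();
    }

    static String translate(String source, String[] translations) {
        int hashcode = sanitise(source).hashCode();
        // The remainder lies in (-length, length), so its absolute value is a valid index
        int index = Math.abs(hashcode % translations.length);
        return translations[index];
    }
}
```

With this sanitiser, "Where is the station?" and "  where IS the station " both reduce to "whereisthestation", so they always produce the same phrase.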

That seems like it should be it, but there’s a requirement that I set which complicates things slightly: I wanted an identity translation. In other words, if the user enters one of the target phrases as the input, we should always translate to the equivalent phrase in the target language, even if the input is in a different language from the currently selected one.

To achieve this we need a map in which each entry has the hash code of one of the translation strings as its key, and that string’s index in its string array as its value. We build this for every string in every target language during initialisation, and we use a SparseArray rather than a Map because it maps int keys directly, avoiding auto-boxing the keys and the extra entry object a HashMap allocates for each mapping:

import android.util.SparseArray;

import java.util.Collection;

public class Translator {
    // ...
    // Sanitised hash code of every phrase in every language -> index of that phrase
    private final SparseArray<Integer> hashLookup = new SparseArray<>();

    public static Translator getInstance(Context context) {
        Resources resources = context.getResources();
        String packageName = context.getPackageName();
        String[] languages = resources.getStringArray(R.array.languages);
        String[] readableLanguages = resources.getStringArray(R.array.readable_languages);
        if (languages.length != readableLanguages.length) {
            throw new DeveloperException("R.array.languages and R.array.readable_languages need to be the same size. Bad developer!");
        }
        Map<String, String> languagesMap = new HashMap<>(readableLanguages.length);
        for (int index = 0; index < readableLanguages.length; index++) {
            languagesMap.put(readableLanguages[index], languages[index]);
        }
        Translator translator = new Translator(resources, packageName, languagesMap);
        translator.updateHashLookup();
        return translator;
    }
    // ...
    private void updateHashLookup() {
        Collection<String> languages = languagesMap.values();
        hashLookup.clear();
        for (String language : languages) {
            String[] allTranslations = getTranslations(language);
            for (int index = 0; index < allTranslations.length; index++) {
                String sanitised = StringSanitser.sanitise(allTranslations[index]);
                int sanitisedHashCode = Math.abs(sanitised.hashCode());
                if (hashLookup.get(sanitisedHashCode) != null) {
                    throw new DeveloperException("Dude, we have duplicate hash codes for translation strings");
                }
                hashLookup.put(sanitisedHashCode, index);
            }
        }
    }
}

This simply iterates through all of the strings in all of the translations and stores each string’s index, keyed by the hash code of its sanitised form.
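For experimenting off-device, the same reverse lookup can be sketched in plain Java. Here HashMap<Integer, Integer> stands in for SparseArray (which is Android-only), the sanitiser is passed in as a Function, and the HashLookupBuilder name is hypothetical. Two deliberate differences from updateHashLookup(): keys are raw hash codes (a map key doesn’t need Math.abs(), and dropping it stops h and -h from sharing a key), and a clash is only treated as an error when it points at a different phrase index, so two languages with identical wording for the same phrase don’t trigger a false alarm:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Plain-Java sketch of building the identity-translation lookup.
final class HashLookupBuilder {
    private HashLookupBuilder() {}

    static Map<Integer, Integer> build(Collection<String[]> allLanguages,
                                       Function<String, String> sanitiser) {
        Map<Integer, Integer> lookup = new HashMap<>();
        for (String[] phrases : allLanguages) {
            for (int index = 0; index < phrases.length; index++) {
                int hash = sanitiser.apply(phrases[index]).hashCode();
                Integer existing = lookup.put(hash, index);
                // Same hash for a different phrase index would make the lookup ambiguous
                if (existing != null && existing != index) {
                    throw new IllegalStateException("Hash clash between phrase indices "
                            + existing + " and " + index);
                }
            }
        }
        return lookup;
    }
}
```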

Next we need to modify our translation code slightly:

public class Translator {
    // ...
    public String getTranslation(String source) {
        String sanitised = StringSanitser.sanitise(source);
        int hashcode = Math.abs(sanitised.hashCode());
        // Identity translation: is the input one of the known phrases in any language?
        Integer index = hashLookup.get(hashcode);
        if (index == null) {
            // Not a known phrase, so fall back to the modulus lookup
            index = Math.abs(hashcode % translations.length);
        }
        return translations[index];
    }
    // ...
}

This first performs a hash lookup for any of the strings that we’ve just mapped, and uses the matching index if one is found; otherwise it falls back to the usual modulus lookup.
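The lookup-then-fallback logic can be exercised in plain Java too. In this sketch the hash lookup (a Map<Integer, Integer> keyed by raw sanitised hash codes, standing in for SparseArray) and the sanitiser are both passed in as assumptions, and the IdentityLookup name is hypothetical. Entering the Italian hovercraft phrase while English is selected returns the English hovercraft phrase, because both share index 1:

```java
import java.util.Map;
import java.util.function.Function;

// Plain-Java sketch of getTranslation() including the identity-translation step.
final class IdentityLookup {
    private IdentityLookup() {}

    static String translate(String source, String[] translations,
                            Map<Integer, Integer> hashLookup,
                            Function<String, String> sanitiser) {
        int hashcode = sanitiser.apply(source).hashCode();
        // 1. Is the input one of the known phrases, in any language?
        Integer index = hashLookup.get(hashcode);
        if (index == null) {
            // 2. Otherwise fall back to the hash-modulus selection
            index = Math.abs(hashcode % translations.length);
        }
        return translations[index];
    }
}
```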

So that's the translation engine working. In the next article we'll start looking at the UI and see how we present this to the user.

The source code for this series is available here.

I am deeply indebted to my fantastic team of volunteer translators who generously gave their time and language skills to make this project sooo much better. They are Sebastiano Poggi (Italian), Zvonko Grujić (Croatian), Conor O'Donnell (Gaelic), Stefan Hoth (German), Hans Petter Eide (Norwegian), Wiebe Elsinga (Dutch), Imanol Pérez Iriarte (Spanish), Adam Graves (Malay), Teo Ramone (Greek), Mattias Isegran Bergander (Swedish), Morten Grouleff (Danish), George Medve (Hungarian), Anup Cowkur (Hindi), Draško Sarić (Serbian), Polson Keeratibumrungpong (Thai), Benoit Duffez (French), and Vasily Sochinsky (Russian).

© 2015, Mark Allison. All rights reserved.

Copyright © 2015 Styling Android. All Rights Reserved.
Information about how to reuse or republish this work may be available at http://blog.stylingandroid.com/license-information.
