Dirty Phrasebook – Part 2

On 1st April 2015 I published a joke app to Google Play named Dirty Phrasebook, which is based upon Monty Python’s Dirty Hungarian Phrasebook sketch. In this series of articles we’ll take a look into the code (which will be open-sourced along with the final article). In this article we’ll look at the remainder of the translation mechanism used to (hopefully) reliably and repeatably translate the user’s input into one of the phrases from the sketch.

Previously we looked at how to simplify a string to make string comparisons easier, so the next thing to look at is how we actually perform the ‘translations’. We saw previously that we have a set of nine target phrases, and we’ll take the hash code of the string modulo nine to generate a number between 0 and 8 which selects the appropriate phrase.

For each language to which we’re going to ‘translate’ we have a copy of these nine strings translated into that language and stored in a string array resource, and the name of each string array is the ISO 639-1 code for that language. In some cases we may add a country code as well – we’ll cover this later. For example, the English, Italian and Croatian translations are:

In addition to this we need a mapping of these ISO 639-1 codes to human-readable forms to use in the UI, so we have another couple of string arrays which will be used to perform this mapping:

Note that the order of these two arrays must match for the following code to work correctly.

So we can now turn our attention to the Translator class which will actually perform the ‘translations’. We create the Translator class by building the map with the readable form as the key and the ISO 639-1 form as the value:
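As a rough illustration of this step, here is a minimal sketch in plain Java rather than the app’s actual code. It assumes the two parallel arrays have already been loaded (on Android they would come from string-array resources), and the class name LanguageMapSketch and the method buildLanguageMap are hypothetical names chosen for this example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the article's DeveloperException: an unchecked exception
// used to fail fast on programming mistakes.
class DeveloperException extends RuntimeException {
    DeveloperException(String message) {
        super(message);
    }
}

// Hypothetical sketch class, not the app's real Translator.
class LanguageMapSketch {

    // Builds a map of readable language name -> ISO 639-1 code from two
    // parallel arrays whose order must match.
    static Map<String, String> buildLanguageMap(String[] readableNames, String[] isoCodes) {
        if (readableNames.length != isoCodes.length) {
            throw new DeveloperException("Language name and code arrays differ in length");
        }
        Map<String, String> languages = new LinkedHashMap<>();
        for (int i = 0; i < readableNames.length; i++) {
            languages.put(readableNames[i], isoCodes[i]);
        }
        return languages;
    }

    public static void main(String[] args) {
        String[] names = {"English", "Italian", "Croatian"};
        String[] codes = {"en", "it", "hr"};
        System.out.println(buildLanguageMap(names, codes).get("Croatian")); // prints "hr"
    }
}
```

The length check is one place where a fail-fast exception is useful: if the two arrays ever get out of step, the problem shows up immediately during development.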

The DeveloperException here represents a case which should never be hit in a production environment and is designed as a fail fast for me making stupid mistakes during development. DeveloperException simply subclasses java.lang.RuntimeException.

Next we need to be able to set the target language that will be translated to:

So when the language is set with one of the readable languages, it looks up the corresponding ISO 639-1 name, and loads the appropriate translation string array.
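The lookup just described might be sketched like this. It is a plain-Java approximation, not the app’s code: a Map of ISO code to phrase array stands in for the string array resources (on Android the array would more likely be resolved by name through Resources), and LanguageSelectorSketch and its members are hypothetical names:

```java
import java.util.Map;

// Hypothetical sketch class, not the app's real Translator.
class LanguageSelectorSketch {
    // Readable language name -> ISO 639-1 code.
    private final Map<String, String> isoCodesByName;
    // ISO 639-1 code -> phrases; a stand-in for Android string-array resources.
    private final Map<String, String[]> phrasesByIsoCode;
    // Phrases for the currently selected target language.
    private String[] currentPhrases;

    LanguageSelectorSketch(Map<String, String> isoCodesByName,
                           Map<String, String[]> phrasesByIsoCode) {
        this.isoCodesByName = isoCodesByName;
        this.phrasesByIsoCode = phrasesByIsoCode;
    }

    // Resolves the readable name to its ISO code, then loads that language's phrases.
    void setLanguage(String readableName) {
        String isoCode = isoCodesByName.get(readableName);
        if (isoCode == null) {
            throw new IllegalArgumentException("Unknown language: " + readableName);
        }
        String[] phrases = phrasesByIsoCode.get(isoCode);
        if (phrases == null) {
            throw new IllegalStateException("No translations found for: " + isoCode);
        }
        currentPhrases = phrases;
    }

    String[] getCurrentPhrases() {
        return currentPhrases;
    }
}
```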

With all of this in place, performing the actual translation for the currently set language is pretty easy:

The string that we wish to translate is source, and we begin by performing the sanitisation that we discussed previously. We then get the hash code of the sanitised string (which can be negative, so we Math.abs() it to ensure that it’s non-negative) and take the modulus with respect to the number of translation strings to get the index of the string that we’ll translate to. Using this technique means that if the user repeatedly enters the same string (or similar strings which are the same after sanitisation) then the translation will always be the same.
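A minimal sketch of the index calculation follows. The sanitise method here is only a hypothetical stand-in for the sanitisation from the previous article (it lower-cases and strips everything except letters and digits), and the class and method names are invented for this example. One deliberate difference from the description above: the sketch takes the modulus first and then the absolute value. This gives the same index for every hash except Integer.MIN_VALUE, where Math.abs() overflows and returns a negative number.

```java
import java.util.Locale;

// Hypothetical sketch class, not the app's real Translator.
class HashIndexSketch {

    // Assumed stand-in for the sanitisation covered in the previous article.
    static String sanitise(String source) {
        return source.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // Maps any int hash onto an index in the range 0..phraseCount-1.
    // Taking the remainder before Math.abs() keeps the Integer.MIN_VALUE
    // case safe, because the remainder is always small in magnitude.
    static int indexForHash(int hash, int phraseCount) {
        return Math.abs(hash % phraseCount);
    }

    // The same input (after sanitisation) always yields the same index.
    static int indexFor(String source, int phraseCount) {
        return indexForHash(sanitise(source).hashCode(), phraseCount);
    }

    public static void main(String[] args) {
        // Both inputs sanitise to the same string, so the indices match.
        System.out.println(indexFor("Hello, World!", 9) == indexFor("hello world", 9)); // prints "true"
    }
}
```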

That seems like it should be it, but there’s a requirement that I set which complicates things slightly. I wanted an identity translation. In other words if the user enters one of the target phrases as the input, we should always translate to that phrase even if the currently selected target language is not the one containing the matching string.

To achieve this we need to create a map with each entry having a key of the hash code of one of the translation strings, and the value being its index in the string array. We create this for all of the strings in all of the target languages during initialisation, and we’ll use a SparseArray rather than a Map as it’s more memory-efficient for int keys (it avoids autoboxing them):

This simply iterates through all of the strings in all of the translations and stores their index mapped to their hash code.
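The iteration described above could look roughly like this. To keep the sketch runnable outside Android it uses a HashMap in place of the SparseArray, and it assumes that each phrase is sanitised before hashing so that it will match sanitised user input. The sanitise method and the class name IdentityIndexSketch are hypothetical:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch class, not the app's real Translator.
class IdentityIndexSketch {

    // Assumed stand-in for the sanitisation covered in the previous article.
    static String sanitise(String source) {
        return source.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    // Maps the hash code of every phrase in every language to that
    // phrase's index within its own array.
    static Map<Integer, Integer> buildIdentityIndex(Collection<String[]> allTranslations) {
        Map<Integer, Integer> indexByHash = new HashMap<>(); // SparseArray on Android
        for (String[] phrases : allTranslations) {
            for (int i = 0; i < phrases.length; i++) {
                indexByHash.put(sanitise(phrases[i]).hashCode(), i);
            }
        }
        return indexByHash;
    }
}
```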

Next we need to modify our translation code slightly:

This will first perform a hash lookup for any of the strings that we’ve just mapped and use the appropriate index if one was found, otherwise we perform the usual modulus lookup.
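Putting the two lookups together, a sketch of the modified translation might look like the following. As before this is an approximation under stated assumptions: a HashMap stands in for the SparseArray, sanitise is a hypothetical stand-in for the sanitisation from the previous article, and the phrases in the usage example are placeholders rather than the app’s real strings:

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch class, not the app's real Translator.
class IdentityTranslatorSketch {
    // Hash of each sanitised phrase (all languages) -> index in its array.
    private final Map<Integer, Integer> indexByHash = new HashMap<>();
    // Phrases for the currently selected target language.
    private final String[] currentPhrases;

    IdentityTranslatorSketch(Collection<String[]> allTranslations, String[] targetPhrases) {
        for (String[] phrases : allTranslations) {
            for (int i = 0; i < phrases.length; i++) {
                indexByHash.put(sanitise(phrases[i]).hashCode(), i);
            }
        }
        this.currentPhrases = targetPhrases;
    }

    // Assumed stand-in for the sanitisation covered in the previous article.
    static String sanitise(String source) {
        return source.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    String translate(String source) {
        int hash = sanitise(source).hashCode();
        // Identity lookup first: is the input one of the known phrases?
        Integer identityIndex = indexByHash.get(hash);
        // Otherwise fall back to the usual modulus lookup.
        int index = identityIndex != null
                ? identityIndex
                : Math.abs(hash % currentPhrases.length);
        return currentPhrases[index];
    }
}
```

With placeholder arrays {"Alpha one", "Beta two", "Gamma three"} and {"Uno", "Dos", "Tres"}, and the second array as the target, translating "beta two!" returns "Dos" because the input matches the phrase at index 1 in the first array.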

So that’s the translation engine working. In the next article we’ll start looking at the UI and see how we present this to the user.

The source code for this series is available here.

I am deeply indebted to my fantastic team of volunteer translators who generously gave their time and language skills to make this project sooo much better. They are Sebastiano Poggi (Italian), Zvonko Grujić (Croatian), Conor O’Donnell (Gaelic), Stefan Hoth (German), Hans Petter Eide (Norwegian), Wiebe Elsinga (Dutch), Imanol Pérez Iriarte (Spanish), Adam Graves (Malay), Teo Ramone (Greek), Mattias Isegran Bergander (Swedish), Morten Grouleff (Danish), George Medve (Hungarian), Anup Cowkur (Hindi), Draško Sarić (Serbian), Polson Keeratibumrungpong (Thai), Benoit Duffez (French), and Vasily Sochinsky (Russian).

© 2015, Mark Allison. All rights reserved.

CC BY-NC-SA 4.0 Dirty Phrasebook – Part 2 by Styling Android is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at http://blog.stylingandroid.com/license-information.
