
Christmas Voice – Part 2

On 23rd December 2016 I released Christmas Voice to Google Play. It is a voice changer app which allows you to sound like either Santa Claus or, if you prefer, an Elf. The app is completely free, with no adverts, and it's also open-source (the link is at the end of the article). The app is only available for devices running Marshmallow and later (API23+) for reasons which will become obvious as we look at the technique used to perform the audio transformation. In this short series of articles we'll take a look at how it works.

Previously we looked at how we record audio to a file on the device, and now we'll turn our attention to audio playback, and to applying the all-important audio transformations when we do so.

To play back we use a sister class to AudioRecord named AudioTrack. In many ways this is a mirror image of AudioRecord. Whereas AudioRecord allows us to read audio data from an audio source such as the microphone, AudioTrack allows us to write audio to the audio sink.

We’ll start by looking at how we create the AudioTrack instance in MediaToolsProvider:

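import android.media.AudioAttributes;
import android.media.AudioFormat;
import android.media.AudioTrack;
import android.media.audiofx.PresetReverb;

public class MediaToolsProvider {
    private PresetReverb presetReverb = null;
    // ... other members (including getAudioFormat()) omitted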

    AudioTrack getAudioTrack(long bufferSize) {
        AudioFormat audioFormat = getAudioFormat(AudioFormat.CHANNEL_OUT_MONO);
        AudioAttributes attributes = new AudioAttributes.Builder()
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .build();
        AudioTrack track = new AudioTrack.Builder()
                .setAudioFormat(audioFormat)
                .setBufferSizeInBytes((int) bufferSize)
                .setAudioAttributes(attributes)
                // Stream mode: static mode caused problems with PlaybackParams
                .setTransferMode(AudioTrack.MODE_STREAM)
                .build();
        PresetReverb presetReverb = getPresetReverb();
        track.attachAuxEffect(presetReverb.getId());
        track.setAuxEffectSendLevel(1.0f);
        return track;
    }

    private PresetReverb getPresetReverb() {
        if (presetReverb == null) {
            presetReverb = createPresetReverb();
        }
        return presetReverb;
    }

    private PresetReverb createPresetReverb() {
        PresetReverb presetReverb = new PresetReverb(1, 0);
        presetReverb.setPreset(PresetReverb.PRESET_PLATE);
        presetReverb.setEnabled(true);
        return presetReverb;
    }
}

Once again we create an AudioFormat object; the only difference from the one used for recording is the channel mask of CHANNEL_OUT_MONO, because this is an output channel rather than an input channel. It is important that the AudioFormat matches the format we captured with, which is why the responsibility for creating both the AudioRecord and the AudioTrack is held within MediaToolsProvider.
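To see why the formats must agree, it helps to work through the arithmetic of raw PCM. The sketch below is plain Java rather than Android code, and it assumes 16-bit PCM at 44,100 Hz; the real sample rate and encoding are chosen in `getAudioFormat()`, which is not shown here. `PcmMath` and its methods are hypothetical helpers, not part of Christmas Voice.

```java
/**
 * Hypothetical helper illustrating raw PCM arithmetic. The sample rate and
 * sample size used in main() are assumptions, not values from Christmas Voice.
 */
public final class PcmMath {

    private PcmMath() {
    }

    /** Bytes occupied by one frame: one sample for every channel. */
    public static int frameSizeInBytes(int channelCount, int bytesPerSample) {
        return channelCount * bytesPerSample;
    }

    /** Number of whole frames contained in a buffer of the given size. */
    public static long frameCount(long byteCount, int channelCount, int bytesPerSample) {
        return byteCount / frameSizeInBytes(channelCount, bytesPerSample);
    }

    /** Playback duration, in seconds, of a buffer interpreted with the given format. */
    public static double durationSeconds(long byteCount, int sampleRate,
                                         int channelCount, int bytesPerSample) {
        return (double) frameCount(byteCount, channelCount, bytesPerSample) / sampleRate;
    }

    public static void main(String[] args) {
        int sampleRate = 44_100;   // assumed sample rate
        int bytesPerSample = 2;    // assumed 16-bit PCM
        long recorded = 88_200L;   // one second of mono 16-bit audio at 44.1 kHz

        // Played back with the same (mono) format: the original duration
        System.out.println(durationSeconds(recorded, sampleRate, 1, bytesPerSample));
        // Misread as stereo: frames are twice as large, so there are half as many
        System.out.println(durationSeconds(recorded, sampleRate, 2, bytesPerSample));
    }
}
```

In this simple model, mono data misread as stereo lasts half as long: alternate samples are treated as left and right, so the clip plays back at roughly twice the speed.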

For AudioAttributes we use some sensible defaults which match our use-case, and we then attach a PresetReverb as an auxiliary effect. This gives an echo-like reverb to the output, adding warmth and depth during playback. It is an addition to the main voice transformation effect which, in my opinion, enhances things a little.
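The `setAuxEffectSendLevel(1.0f)` call matters: by default an AudioTrack's send level is 0.0f, so the reverb would be attached but have no effect. The level is a linear scalar, and the Android documentation for `setAuxEffectSendLevel()` suggests that a UI control should be mapped onto it logarithmically, because the framework's gain ranges from about -72 dB to 0 dB: for a control position x in 0..R, level = 10^(72(x - R) / (20R)), with x = 0 giving silence. The plain-Java sketch below implements that mapping; `SendLevel` and its methods are hypothetical names, and Christmas Voice itself simply uses full send.

```java
/**
 * Hypothetical helper mapping a linear UI control onto an aux effect send level,
 * following the logarithmic conversion suggested in the AudioTrack documentation.
 */
public final class SendLevel {

    /** Dynamic range, in dB, assumed by the suggested conversion. */
    private static final double RANGE_DB = 72.0;

    private SendLevel() {
    }

    /**
     * Converts a control position x in [0, maxPosition] into a linear send level.
     * x == 0 maps to 0 (no send); x == maxPosition maps to 1.0 (0 dB, full send).
     */
    public static float fromControl(int x, int maxPosition) {
        if (maxPosition <= 0 || x < 0 || x > maxPosition) {
            throw new IllegalArgumentException("x must be in [0, maxPosition]");
        }
        if (x == 0) {
            return 0f;
        }
        double exponent = RANGE_DB * (x - maxPosition) / 20.0 / maxPosition;
        return (float) Math.pow(10.0, exponent);
    }

    /** Gain in dB for a linear level (negative infinity for a level of 0). */
    public static double toDecibels(float level) {
        return 20.0 * Math.log10(level);
    }

    public static void main(String[] args) {
        int max = 100;
        for (int x : new int[] {0, 50, 100}) {
            float level = fromControl(x, max);
            System.out.printf("x=%d level=%.5f gain=%.1f dB%n", x, level, toDecibels(level));
        }
    }
}
```

With this mapping the midpoint of the control corresponds to -36 dB rather than to half the linear level, which is closer to how loudness is perceived.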

The only other thing worth mentioning is that the buffer size is passed in as an argument. This ties in with how we're actually using the AudioTrack. AudioTrack has two modes of operation: MODE_STATIC and MODE_STREAM. Static mode is designed for small, fixed-size audio chunks, and stream mode is designed for longer audio. Static mode would be best suited to our use here, but I hit issues using PlaybackParams (more on this later) with static mode, so I was forced to use stream mode instead.

The buffer size will actually be the size of the audio file that we created previously, and we'll simply write the contents of the file into the AudioTrack.

Some people may be wondering why I used an intermediate file rather than reading from the AudioRecord object and writing directly to the AudioTrack. There are two reasons for this. Firstly, because we're using the default audio encoding, the data may be compressed and therefore of variable length; in that case we would be unable to predict in advance the size of the buffer needed for the AudioTrack. Secondly, I wanted to store the last recording so that it is still available if the user exits and re-opens the app, which would not be possible if we held it in memory.

Next we’ll look at the AudioPlayer class which actually plays back the audio file:

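import android.media.AudioTrack;
import android.media.PlaybackParams;

import java.io.File;

import timber.log.Timber;

class AudioPlayer implements Player {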
    private final AudioTrack audioTrack;
    private final File file;

    private float speed = 1f;

    private Thread playerThread;

    AudioPlayer(AudioTrack audioTrack, File file) {
        this.audioTrack = audioTrack;
        this.file = file;
    }

    @Override
    public boolean isPlaying() {
        return audioTrack.getPlayState() == AudioTrack.PLAYSTATE_PLAYING;
    }

    @Override
    public void startPlaying() {
        if (isPlaying()) {
            audioTrack.stop();
        }
        PlaybackParams playbackParams = audioTrack.getPlaybackParams();
        playbackParams.setPitch(speed);
        audioTrack.setPlaybackParams(playbackParams);
        audioTrack.setPlaybackPositionUpdateListener(positionListener);
        audioTrack.play();
        AudioPlayerTask playerTask = new AudioPlayerTask(audioTrack, file);
        playerThread = new Thread(playerTask);
        playerThread.start();
    }

    @Override
    public void setSpeed(float speed) {
        this.speed = speed;
    }

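    @Override
    public void stopPlaying() {
        // flush() only discards queued data once the track is stopped or paused,
        // so stop first, then flush, then release the native resources
        audioTrack.stop();
        audioTrack.flush();
        audioTrack.release();
        playerThread = null;
    }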

    private AudioTrack.OnPlaybackPositionUpdateListener positionListener = new AudioTrack.OnPlaybackPositionUpdateListener() {
        @Override
        public void onMarkerReached(AudioTrack track) {
            track.flush();
            track.release();
            Timber.d("Playback Complete");
        }

        @Override
        public void onPeriodicNotification(AudioTrack track) {
            //NO-OP
        }
    };
}

Most of this should be pretty easy to understand, but there are a couple of areas worthy of explanation.

The first is the OnPlaybackPositionUpdateListener, which receives callbacks during playback. We're only interested in when a marker is reached. We'll see how the marker is set later on, but it is set at the end of playback, so we use this callback as the trigger to clean up the AudioTrack once playback is complete.

The other important area is PlaybackParams. This is what we use to perform the pitch shifting of the audio, which creates the voice-changing effect we're after: shifting the pitch lower gives the Santa voice, and shifting it higher gives the elf voice. PlaybackParams is only supported on API23 and later, and is the reason that the app is API23+. While it is possible to implement a custom pitch-shift algorithm, it is not easy to get right, and that's why I opted for the easy option. It's a completely free app without any advertising, after all!
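To illustrate why a hand-rolled pitch shift is harder than it looks, here is the naive approach in plain Java: resampling the audio by a pitch factor using linear interpolation. This is not what Christmas Voice does (it relies on PlaybackParams, which exposes pitch and speed as separate settings), and `NaiveResampler`, its methods and the example factors are hypothetical. Resampling alone changes the duration along with the pitch; keeping the duration unchanged needs a time-stretching technique such as a phase vocoder or an overlap-add method like WSOLA on top, which is where the difficulty lies. The sketch also converts a pitch factor into semitones, where a factor of 2 is one octave.

```java
/**
 * Hypothetical, deliberately naive pitch shifter: resamples 16-bit PCM by a
 * pitch factor using linear interpolation. Unlike PlaybackParams.setPitch(),
 * this changes the duration as well as the pitch.
 */
public final class NaiveResampler {

    private NaiveResampler() {
    }

    /**
     * Returns a resampled copy of the input. A factor above 1 raises the pitch
     * (and shortens the audio); a factor below 1 lowers it (and lengthens it).
     */
    public static short[] shiftPitch(short[] input, double factor) {
        if (factor <= 0) {
            throw new IllegalArgumentException("factor must be positive");
        }
        int outputLength = (int) Math.floor(input.length / factor);
        short[] output = new short[outputLength];
        for (int i = 0; i < outputLength; i++) {
            double sourcePos = i * factor;              // position in the input
            int index = (int) sourcePos;
            double fraction = sourcePos - index;
            int next = Math.min(index + 1, input.length - 1);
            // Linear interpolation between neighbouring input samples
            double value = input[index] * (1.0 - fraction) + input[next] * fraction;
            output[i] = (short) Math.round(value);
        }
        return output;
    }

    /** Converts a pitch factor into semitones: a factor of 2.0 is +12 (one octave). */
    public static double toSemitones(double factor) {
        return 12.0 * Math.log(factor) / Math.log(2.0);
    }

    public static void main(String[] args) {
        short[] ramp = new short[1000];
        for (int i = 0; i < ramp.length; i++) {
            ramp[i] = (short) i;
        }
        // Hypothetical factors; the values Christmas Voice uses are not given here
        double santa = 0.75;
        double elf = 1.5;
        System.out.println("Santa: " + shiftPitch(ramp, santa).length + " samples, "
                + toSemitones(santa) + " semitones");
        System.out.println("Elf: " + shiftPitch(ramp, elf).length + " samples, "
                + toSemitones(elf) + " semitones");
    }
}
```

Running it shows the lowered "Santa" clip growing longer and the raised "Elf" clip getting shorter, which is exactly the side effect PlaybackParams avoids.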

All that’s left is the AudioPlayerTask which reads the audio data from the file, and writes it to the AudioTrack:

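import android.media.AudioTrack;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

import timber.log.Timber;

class AudioPlayerTask implements Runnable {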
    private static final int BUFFER_SIZE = 1024;

    private final AudioTrack audioTrack;
    private final File inputFile;

    AudioPlayerTask(AudioTrack audioTrack, File inputFile) {
        this.audioTrack = audioTrack;
        this.inputFile = inputFile;
    }

    @Override
    public void run() {
        InputStream inputStream = getInputStream();
        if (inputStream == null) {
            return;
        }
        byte[] buffer = new byte[BUFFER_SIZE];
        int total = 0;
        int size = (int) inputFile.length();
        while (total < size) {
            int read;
            try {
                read = inputStream.read(buffer, 0, BUFFER_SIZE);
            } catch (IOException e) {
                Timber.e(e, "Error reading audio file");
                break;
            }
            if (read == -1) {
                // End of file reached sooner than expected: nothing more to write
                break;
            }
            audioTrack.write(buffer, 0, read, AudioTrack.WRITE_BLOCKING);
            total += read;
        }
        try {
            inputStream.close();
        } catch (IOException e) {
            Timber.e(e, "Error closing audio file");
        }
        int totalFrames = audioTrack.getBufferSizeInFrames();
        audioTrack.setNotificationMarkerPosition(totalFrames);
        Timber.d("Complete");
    }

    private InputStream getInputStream() {
        InputStream inputStream;
        try {
            inputStream = new FileInputStream(inputFile);
        } catch (FileNotFoundException e) {
            Timber.e(e, "Error opening audio file for reading");
            return null;
        }
        return inputStream;
    }
}

Once again, this is pretty straightforward. The only area of note is that we determine the number of frames in the buffer after we’ve written all of the audio data, and then use this to set the marker position which will trigger the cleanup when the playback completes.
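Outside Android, the same read-and-write loop can be sketched with plain streams. In the hypothetical `ChunkedCopy` class below, an `OutputStream` stands in for the AudioTrack; the loop ends when `read()` reports end-of-stream, and any I/O error propagates instead of being written with a stale byte count. It also returns the number of bytes copied, from which a frame count can be derived. Deriving the marker position from the bytes actually written, rather than from `getBufferSizeInFrames()`, is one possible alternative and not the app's approach; the 2-byte frame size in `main()` assumes 16-bit mono PCM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/**
 * Hypothetical plain-Java analogue of AudioPlayerTask: copies audio data in
 * fixed-size chunks from an input stream to a sink standing in for AudioTrack.
 */
public final class ChunkedCopy {

    private static final int BUFFER_SIZE = 1024;

    private ChunkedCopy() {
    }

    /** Copies until end of stream and returns the number of bytes written. */
    public static long copy(InputStream in, OutputStream sink) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int read;
        // read() returns -1 at end of stream, which ends the loop
        while ((read = in.read(buffer, 0, BUFFER_SIZE)) != -1) {
            sink.write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    /** Number of whole frames represented by a byte count. */
    public static long toFrames(long bytes, int frameSizeInBytes) {
        return bytes / frameSizeInBytes;
    }

    public static void main(String[] args) throws IOException {
        byte[] audio = new byte[5000];              // dummy "audio" data
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // try-with-resources closes the input even if copying fails
        try (InputStream in = new ByteArrayInputStream(audio)) {
            long written = copy(in, sink);
            // Assumes 16-bit mono PCM: 2 bytes per frame
            System.out.println("Wrote " + written + " bytes = "
                    + toFrames(written, 2) + " frames");
        }
    }
}
```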

I toyed with including some examples of the audio effects that Christmas Voice produces but decided against it. It’s a free app, after all – I encourage you to try it for yourself.

Many thanks to my testers: Darren, Ben, Jenny, Roberto, Seb, Dario, Donn, Naresh, Gyuri, Kenton, Wiebe, Mike, & Danny. Additional thanks to Sebastiano Poggi for proof-reading – any remaining typos are all mine, but there would have been many more without Seb’s keen eye!

The source code for Christmas Voice is available here.

© 2016, Mark Allison. All rights reserved.

Copyright © 2016 Styling Android. All Rights Reserved.
Information about how to reuse or republish this work may be available at http://blog.stylingandroid.com/license-information.
