Making your own assistant with Qt/QML and Google Speech-to-Text

Petar Koretić
7 min read · Jan 19, 2020

In previous articles we explored Qt/QML in different ways, so in this one we continue with more cloud integration. We are going to use the Google Cloud Speech-to-Text API to recognize basic voice commands. You could put this on an embedded device at home, such as a Raspberry Pi, with a nice graphical interface as part of a DIY automation system… or you can use it simply as a way to learn more about different parts of Qt/QML.

Setup

We start as usual: you need Qt/QML installed, and for this article you also need the Google Cloud Speech-to-Text API set up, since we will call it to transcribe speech to text.

In this article we are going to use an API key for simplicity, so make sure you understand what that implies and follow the security recommendations in the official documentation if you plan to test the code!

Never distribute your app with the API key built in, and if you are using it only at home, restrict the key by IP address.

Speech to Text API

There are a number of ways to communicate with Google Cloud APIs, and with the Speech API in particular. We can go with the Google client libraries, the HTTP REST API, or the gRPC API.

Given that HTTP is supported in Qt without adding extra libraries, that will be our choice, as it makes cross-platform support easy.

Note, however, that the REST interface doesn’t support streaming recognition, so if that is your requirement you should go with another interface.

REST API method

The Google Cloud Speech-to-Text REST interface is nicely documented, so one just has to decide which methods to call. As we are going to make just one call at a time, we are going to use the recognize method, so our call will look like:

POST https://speech.googleapis.com/v1/speech:recognize
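To make the call a bit more concrete, here is a minimal sketch of how the request could be assembled in plain JavaScript, which is also what QML uses for scripting. The function names buildRecognizeRequest and sendRecognizeRequest, the options argument and the default audio settings (16-bit linear PCM at 16 kHz, en-US) are assumptions for illustration only; adjust them to whatever your recording actually produces. The API key is passed as the key query parameter, and the audio goes into the body as a Base64 string.

```javascript
// Sketch only: helper names and default values are illustrative assumptions.

// Build the URL and JSON body for a speech:recognize call.
function buildRecognizeRequest(apiKey, base64Audio, options) {
    var opts = options || {};
    var url = "https://speech.googleapis.com/v1/speech:recognize?key=" +
              encodeURIComponent(apiKey);
    var body = {
        config: {
            // Assumed audio format; must match the recorded audio
            encoding: opts.encoding || "LINEAR16",
            sampleRateHertz: opts.sampleRateHertz || 16000,
            languageCode: opts.languageCode || "en-US"
        },
        audio: {
            // Audio bytes encoded as Base64
            content: base64Audio
        }
    };
    return { method: "POST", url: url, body: JSON.stringify(body) };
}

// Send the prepared request with XMLHttpRequest, which QML provides.
// onDone is a hypothetical callback receiving (httpStatus, responseText).
function sendRecognizeRequest(request, onDone) {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function () {
        if (xhr.readyState === XMLHttpRequest.DONE) {
            onDone(xhr.status, xhr.responseText);
        }
    };
    xhr.open(request.method, request.url);
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.send(request.body);
}
```

Keeping the request building separate from the sending makes it easy to check the payload without touching the network, and the key never has to appear anywhere except where you pass it in.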

The recognize method is marked as synchronous in the API, which in this case means that the audio is transcribed during the usual HTTP request-response flow, so the response immediately contains the recognized audio content as text. This fits our…
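Because the text comes back in the same response, reading it is mostly a matter of walking the JSON. The sketch below assumes the documented response shape, a results array where each entry carries an alternatives list with a transcript field, and simply joins the first alternative of each result. The name extractTranscript is made up for this example, and an empty response (nothing recognized) yields an empty string.

```javascript
// Sketch only: extractTranscript is an illustrative helper name.
// Takes the raw response text of a speech:recognize call and returns
// the first alternative of every result joined into one string.
function extractTranscript(responseText) {
    var response = JSON.parse(responseText);
    // "results" is missing entirely when nothing was recognized
    var results = response.results || [];
    var parts = [];
    for (var i = 0; i < results.length; i++) {
        var alternatives = results[i].alternatives || [];
        if (alternatives.length > 0 && alternatives[0].transcript) {
            parts.push(alternatives[0].transcript.trim());
        }
    }
    return parts.join(" ");
}
```

The returned string is what you would then compare against your list of voice commands.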
