How to create your own speech recognition application with Tasker

Oct 02 2013


Tasker is an awesome Android app which lets you create and execute deep-level tasks based on context, in user-defined profiles or widgets.

What captured my attention is its JavaScript API, which lets you interact with many phone functions through JavaScript, so you can imagine how many nice jobs you can accomplish with this app.

Here we'll see how you can implement your own speech recognition application, so that your phone responds to the commands you define!

First we'll build it through the Tasker interface, then we'll see that a JavaScript(let) action can be used to enhance our application.

Create the speech recognition task

OK, let's start by creating our main task. Here is its description:

A1: Get Voice [ Title:What do you want, babe? Language Model:Free Form Maximum Results:3 Timeout (Seconds):30 ]
A2: Variable Split [ Name:%VOICE Splitter:, Delete Base:Off ]
A3: For [ Variable:%voice Items:%VOICE(1:) ]
A4: If [ %voice ~R call John ]
A5: Call [ Number:+39xxxxxxxxxx Auto Dial:On ]
A6: Stop [ With Error:Off Task:speech recognition ]
A7: End If
A8: If [ %voice ~R take photo ]
A9: Take Photo [ Camera:Rear Filename:speech Naming Sequence:Chronological Insert In Gallery:On Discreet:On Resolution:3264x2176 Scene Mode:Auto White Balance:Auto Flash Mode:Auto ]
A10: Notify [ Title:Speech recognition Text:photo taken Icon:null Number:0 Permanent:Off Priority:3 ]
A11: Stop [ With Error:Off Task:speech recognition ]
A12: End If
A13: End For
A14: Show Scene [ Name:speech popup Display As:Overlay, Blocking Horizontal Position:100 Vertical Position:100 Show Exit Button:On Continue Task Immediately:On ]

Let's look at the task actions in detail. Basically, we try to recognize the speech, then we cycle through every recognition result, checking whether it matches any of our conditions. If a condition is met, its actions are executed and the task is stopped. If no condition is met, we show a scene offering a retry or cancel action.

Get Voice uses a speech recognizer to convert speech into text. The text is stored in a variable named %VOICE. Because speech recognition is imperfect, the stored text may be a comma-separated list of candidate results; the Maximum Results option sets how many candidates are returned.
Variable Split splits a variable on the given separator. In this case we split the %VOICE variable on the comma and get a set of new variables %VOICE1...%VOICE3 containing the single parts of the original string, e.g. if %VOICE = 'man,bad,sad' then %VOICE1 = 'man', %VOICE2 = 'bad', %VOICE3 = 'sad'.
For starts a loop which cycles through all the newly created %VOICEn variables, storing each value in turn in the local %voice variable.
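
As a rough illustration only, the split-and-loop steps above can be mimicked in plain JavaScript. The splitLikeTasker helper is a hypothetical name used for this sketch and is not part of Tasker; it only shows how the pieces end up numbered from 1.

```javascript
// Mimic Tasker's Variable Split on a comma-separated %VOICE value.
// splitLikeTasker is a hypothetical helper, not part of Tasker's API.
function splitLikeTasker(baseName, value, splitter) {
  var parts = value.split(splitter);
  var vars = {};
  for (var i = 0; i < parts.length; i++) {
    // Tasker numbers the pieces starting from 1: VOICE1, VOICE2, ...
    vars[baseName + (i + 1)] = parts[i];
  }
  return vars;
}

var vars = splitLikeTasker('VOICE', 'man,bad,sad', ',');

// The For action then visits each piece in order, like this loop does.
var visited = [];
for (var n = 1; vars['VOICE' + n] !== undefined; n++) {
  visited.push(vars['VOICE' + n]);
}
```

Here vars.VOICE2 holds 'bad', and visited lists the three candidates in the order the For loop would see them.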

Next comes the conditional block, in which we perform different operations based on the %voice value.

If the recognized text matches regexp 'call John' then...
...John's number is Called with auto-dial option...
...and the task is Stopped
End If
If the recognized text matches regexp 'take photo' then...
...Take Photo with the rear camera, in discreet mode, etc...
...Notify that the photo has been taken...
...Stop the task
End If

The end of the loop block:

End For

What if no matches were found?

Show Scene displays the speech popup scene.

About this last point: I've created a Scene which is actually a layer with two buttons, a retry button that runs the task again when tapped, and a cancel button which simply destroys the scene.

How to run this task

Profiles and widgets are beyond the scope of this entry, but clearly each task can be activated by a profile (an event, an application being opened, a gesture...), by an action (for example a tap on a scene button), or by a widget (using Zoom); you can also create your own application using Tasker App Factory.

Considerations

There's really nothing complex here! So what could the problem be? It is the large amount of logic (If conditions) we have to write to listen for all the desired commands.

There's also room for improvement: we could group similar actions into arrays, to avoid repeating code for similar operations such as calling different contacts, or we could match using complex regular expressions, and so on...

So the natural evolution of this approach is to use the power of JavaScript to write our conditional block!

Let's use javascript

In this case we can use the JavaScriptlet action available in Tasker to execute our conditional block, this way:

A1: Get Voice [ Title:What do you want, babe? Language Model:Free Form Maximum Results:3 Timeout (Seconds):30 ]
A2: Variable Split [ Name:%VOICE Splitter:, Delete Base:Off ]
A3: For [ Variable:%voice Items:%VOICE(1:) ]
A4: Javascriptlet [ Code:.... Libraries: Auto Exit: On Timeout(Seconds):45 ]
A5: End For
A6: If [ %nope ~ 1 ]
A7: Show Scene [ Name:speech popup Display As:Overlay, Blocking Horizontal Position:100 Vertical Position:100 Show Exit Button:On Continue Task Immediately:On ]
A8: End If

As we can see, actions A4-A12 of the first task have been replaced by a single JavaScriptlet action. Inside our JavaScript code the local %voice variable is available as voice, so we can match it against expressions; the original %VOICE variable can also be read, this way:
var string = global('VOICE');
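
If the script works on the whole %VOICE string rather than on a single candidate, it may help to tidy the list first. The following is a minimal sketch under that assumption; parseCandidates is a hypothetical helper name, and the sample string stands in for what global('VOICE') might return inside Tasker.

```javascript
// Turn the raw comma-separated recognition results into a clean list.
// parseCandidates is a hypothetical helper, not part of Tasker's API.
function parseCandidates(raw) {
  var out = [];
  var parts = String(raw || '').split(',');
  for (var i = 0; i < parts.length; i++) {
    // Normalize spacing and case so that later matching is simpler.
    var text = parts[i].trim().toLowerCase();
    // Skip empty pieces and candidates we have already seen.
    if (text !== '' && out.indexOf(text) === -1) {
      out.push(text);
    }
  }
  return out;
}

// Inside Tasker the argument would be global('VOICE');
// a literal sample is used here so the sketch runs anywhere.
var candidates = parseCandidates('Call John, call john,take photo ');
```

With this sample input, candidates contains 'call john' and 'take photo', each exactly once.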

Let's consider an example:

// Do we meet a condition? Becomes 1 only when nothing matches.
var nope = 0;

// We can create a people dictionary ;)
var dict = [
    {name: 'John', number: '3278372893'},
    {name: 'Pino', number: '9789389349'},
    {name: 'James', number: '89434834343'}
];

// Call him plz!
for (var i = 0, l = dict.length; i < l; i++) {
    var p = dict[i];
    // Let's use regular expressions! No "g" flag, so test() keeps no state.
    var rexp = new RegExp("call.*" + p.name, "i");
    if (rexp.test(voice)) {
        call(p.number, true);
        exit();
    }
}

// Why don't we take a discreet photo?
if (/take.*photo/i.test(voice)) {
    // Arguments: camera (0 = rear), file name, resolution, insert in gallery.
    takePhoto(0, '/path/to/file.jpg', '640x840', true);
    exit();
}

// Can't find a match: no exit() was reached before this line.
nope = 1;

If a condition is met, the JavaScript execution exits early with the nope variable still set to 0, so no scene is shown; otherwise the nope variable is set to 1 and the scene is shown to the user. Keep in mind that the JavaScriptlet runs once for every candidate, so %nope holds the outcome of the last candidate checked.

This was just an example of what you can do, but if you think about the power that JavaScript puts in your hands, you can easily imagine the opportunities that Tasker opens up for you.
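
The idea of grouping similar commands can be pushed a little further with a command table. This is only a sketch under stated assumptions: buildCommands, findCommand and escapeRegExp are hypothetical helper names, the table layout is invented for illustration, and the Tasker calls are left as comments so the code also runs outside Tasker.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so that a contact name is always matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one table entry per contact, plus one for the photo command.
function buildCommands(contacts) {
  var commands = [];
  for (var i = 0; i < contacts.length; i++) {
    commands.push({
      pattern: new RegExp('call.*' + escapeRegExp(contacts[i].name), 'i'),
      action: 'call',
      arg: contacts[i].number
    });
  }
  commands.push({ pattern: /take.*photo/i, action: 'photo', arg: null });
  return commands;
}

// Return the first entry whose pattern matches the text, or null.
function findCommand(text, commands) {
  for (var i = 0; i < commands.length; i++) {
    if (commands[i].pattern.test(text)) {
      return commands[i];
    }
  }
  return null;
}

var commands = buildCommands([
  { name: 'John', number: '3278372893' },
  { name: 'Pino', number: '9789389349' }
]);
var hit = findCommand('please call pino now', commands);
// Inside Tasker one could then run, for example: call(hit.arg, true);
```

Adding a new voice command then means adding one entry to the table rather than another if block, and a null result plays the same role as nope = 1 in the example above.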

So what about you? Do you think you'll write your own speech recognition application?

abidibo.net
