abidibo.net

How to create your own speech recognition application with tasker

android tasker tips

Tasker is an awesome Android app which lets you create and execute deep-level tasks based on context in user-defined profiles or widgets.

What captured my attention is its JavaScript API, which lets you interact with many phone functions through JavaScript, so you can imagine how many nice things you can accomplish with this app.

Here we'll see how you can implement your own speech recognition application so that your phone responds to the commands you define!

We'll first see how to build it through the Tasker interface, and then how JavaScript (a Javascriptlet action) can enhance our application.

Create the speech recognition task

OK, let's start by creating our main task. Here is its description:

A1: Get Voice [ Title:What do you want, babe? Language Model:Free Form Maximum Results:3 Timeout (Seconds):30 ]
A2: Variable Split [ Name:%VOICE Splitter:, Delete Base:Off ]
A3: For [ Variable:%voice Items:%VOICE(1:) ]
A4: If [ %voice ~R call John ]
A5: Call [ Number:+39xxxxxxxxxx Auto Dial:On ]
A6: Stop [ With Error:Off Task:speech recognition ]
A7: End If
A8: If [ %voice ~R take photo ]
A9: Take Photo [ Camera:Rear Filename:speech Naming Sequence:Chronological Insert In Gallery:On Discreet:On Resolution:3264x2176 Scene Mode:Auto White Balance:Auto Flash Mode:Auto ]
A10: Notify [ Title:Speech recognition Text:photo taken Icon:null Number:0 Permanent:Off Priority:3 ]
A11: Stop [ With Error:Off Task:speech recognition ]
A12: End If
A13: End For
A14: Show Scene [ Name:speech popup Display As:Overlay, Blocking Horizontal Position:100 Vertical Position:100 Show Exit Button:On Continue Task Immediately:On ]

Let's look at the task actions in detail. Basically, we try to recognize the speech, then we cycle through every recognition result, checking whether it matches any condition. If a condition is met, its actions are executed and the task is stopped. If no condition is met, we show a scene asking the user to retry or cancel.

  1. Get Voice uses a speech recognizer to convert speech into text. The text is stored in a variable named %VOICE. Because speech recognition is imperfect, the stored text can be a comma-separated list of candidate results; you can choose how many candidates to get by changing the Maximum Results option.
  2. Variable Split splits a variable on the given separator. In this case we split the %VOICE variable on the comma and get a set of new variables %VOICE1...%VOICE3 containing the single parts of the original string, e.g. if %VOICE = 'man,bad,sad' then %VOICE1 = 'man', %VOICE2 = 'bad', %VOICE3 = 'sad'.
  3. We start a For loop which cycles through all the newly created %VOICEn variables, storing each value in turn in the local %voice variable.
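To make the splitting step concrete, here is a minimal sketch in plain JavaScript of what happens to the recognized text. The helper name splitResults is hypothetical, and the trimming and empty-entry filtering are extras that the Variable Split action itself does not perform.

```javascript
// Hypothetical helper: roughly mimics Variable Split with the comma splitter.
// Trimming and dropping empty parts are additions made for this sketch.
function splitResults(raw) {
  return raw
    .split(',')
    .map(function (part) { return part.trim(); })
    .filter(function (part) { return part.length > 0; });
}

var results = splitResults('man,bad,sad');
// results[0] plays the role of %VOICE1, results[1] of %VOICE2, and so on.
```

Each element of results corresponds to one value that the For loop would store in %voice.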

Then comes the conditional block, in which we perform different operations based on the %voice value.

  1. If the recognized text matches regexp 'call John' then...
  2. ...John's number is Called with auto-dial option...
  3. ...and the task is Stopped
  4. End If
  5. If the recognized text matches regexp 'take photo' then...
  6. ...Take Photo with the rear camera, discreet mode etc...
  7. ...Notify that the photo has been taken...
  8. ...Stop the task
  9. End If

The end of the loop block:

  1. End For

What if no matches were found?

  1. Show Scene speech popup Display

About the last point: I've created a Scene which is simply a layer with two buttons: a retry button that runs the task again when tapped, and a cancel button which simply destroys the scene.

How to run this task

It is beyond the scope of this entry to talk about profiles or widgets, but clearly each task can be activated by a profile (an event, an application being opened, a gesture...), by an action (for example when tapping a scene button), or by a widget (using Zoom); you can also create your own application using Tasker App Factory.

Considerations

There's really nothing complex here! So what could the problem be? The problem is the large amount of logic (if conditions) we have to write to listen for all the desired commands.

There's also something we can enhance: we may group similar actions into arrays, so that we can avoid code repetition for similar operations, like calling different contacts; or we can match using complex regular expressions and so on...
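As a rough illustration of the grouping idea, the sketch below uses one regular expression with a capture group to cover several contacts at once. The phoneBook object, its placeholder numbers and the numberFor helper are all hypothetical names introduced for this example.

```javascript
// Hypothetical phone book: lower-case names as keys, placeholder numbers as values.
var phoneBook = { john: '0000001', pino: '0000002', james: '0000003' };

// One case-insensitive pattern built from the keys,
// equivalent to /call\s+(john|pino|james)\b/i
var callPattern = new RegExp(
  'call\\s+(' + Object.keys(phoneBook).join('|') + ')\\b', 'i');

// Returns the number of the contact named in the phrase, or null if none is.
function numberFor(phrase) {
  var m = callPattern.exec(phrase);
  return m ? phoneBook[m[1].toLowerCase()] : null;
}
```

Adding a contact then means adding one entry to phoneBook rather than one more If block in the task.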

So the natural evolution of this approach is to use the power of JavaScript to write our conditional block!

Let's use javascript

In this case we can use the Javascriptlet action available in tasker to execute our conditional block this way:

A1: Get Voice [ Title:What do you want, babe? Language Model:Free Form Maximum Results:3 Timeout (Seconds):30 ]
A2: Variable Split [ Name:%VOICE Splitter:, Delete Base:Off ]
A3: For [ Variable:%voice Items:%VOICE(1:) ]
A4: Javascriptlet [ Code:.... Libraries: Auto Exit:On Timeout (Seconds):45 ]
A5: End For
A6: If [ %nope ~ 1 ]
A7: Show Scene [ Name:speech popup Display As:Overlay, Blocking Horizontal Position:100 Vertical Position:100 Show Exit Button:On Continue Task Immediately:On ]
A8: End If

As we can see, we have replaced the A4-A12 actions with a single Javascriptlet action. Inside our JavaScript code we can match the voice variable against expressions, or use the original %VOICE variable, which we can access this way:
var string = global('VOICE');

Let's consider an example:

// do we meet a condition?
var nope = 0;
// we can create a people dictionary ;)
var dict = [
  {name: 'John', 'number': '3278372893'},
  {name: 'Pino', 'number': '9789389349'},
  {name: 'James', 'number': '89434834343'}
];
// call him plz!
for(var i = 0, l = dict.length; i < l; i++) {
  var p = dict[i];
  // let's use regular expressions!
  // RegExp objects expose test(), not match(); "i" makes the match case-insensitive
  var rexp = new RegExp("call.*" + p.name, "i");
  if(rexp.test(voice)) {
    call(p.number, true);
    exit();
  }
}
// Why don't we take a discreet photo?
if(/take.*photo/.test(voice)) {
  // arguments: camera (0 = rear), file name, resolution, insert in gallery
  takePhoto(0, '/path/to/file.jpg', '640x840', true);
  exit();
}
// can't find a match, no exit before
nope = 1;

If a condition is met, the JavaScript execution exits with the nope variable still set to 0, so no scene is shown; otherwise the nope variable is set to 1 and the scene is shown to the user.
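One possible variation is to check every recognition candidate in a single pass, so that one flag covers the whole list. The sketch below runs in plain JavaScript. The handleCandidates function, the commands table and the log array are hypothetical, and the actions are stand-ins that only record what would happen. Inside Tasker they would use the built-in call and takePhoto functions instead.

```javascript
// Hypothetical dispatcher: tries each candidate against each command pattern.
// Returns 0 as soon as something matches, 1 if nothing does (the nope flag).
function handleCandidates(candidates, commands) {
  for (var i = 0; i < candidates.length; i++) {
    for (var j = 0; j < commands.length; j++) {
      if (commands[j].pattern.test(candidates[i])) {
        commands[j].run(candidates[i]);
        return 0;
      }
    }
  }
  return 1;
}

// Stand-in actions: they only log, so the sketch can run outside Tasker.
var log = [];
var commands = [
  { pattern: /call.*john/i, run: function () { log.push('call John'); } },
  { pattern: /take.*photo/i, run: function () { log.push('take photo'); } }
];

var nope = handleCandidates(['bake proto', 'take a photo'], commands);
```

Here the second candidate matches the photo command, so the flag stays at 0 and the retry scene would not be shown.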

This was just an example of what you can do, but if you think about the power that JavaScript puts in your hands you can easily imagine the opportunities that Tasker opens up for you.

So what about you? Do you think you'll write your own speech recognition application?
