It’s almost midnight and I finally give up for the day as I pack my bag and head home. In hurried motions, I sweep my laptop, cables and notebook into my bag and trundle down to the car park. A few motions then follow – I plug in the iPhone, set Waze to drive me home, launch Zomato to see what’s open on my homeward route and check the clock one last time.

With the spur of the engine turning over, I descend out of the lot to have the chain of streetlights point me home.

Why do I bring up this rather inane sequence?

Because many of you follow the same routine, give or take a few steps, before you set off. Almost all the actions above could have been handled by a voice-first device without me having to use any of my appendages.

And yet, I didn’t speak to Siri. Here’s why.



Have you ever tried talking to Siri? The harsh abruptness in her voice almost ensures that these interactions are nothing more than small talk designed to perform very simple and direct functions. After a point, I just got plain bored with it. Does that make me a hard-ass and unyielding to change? Perhaps. But for Voice UI to really take off, it needs to work harder at prolonging conversations. If you’ve ever owned a Mitsubishi Pajero (Shogun in the UK) and tried to use voice to pair your phone via Bluetooth, you know what I’m talking about.


Let’s get real for a moment. When would you really use a voice-UI device? Thronged by a group of people, perhaps even one person? Most likely, you’d do it alone. And that’s where things go awry for me.

I don’t like talking to my phone, just through it.

Add to that the terse, pointed interactions, and suddenly the ensuing silence haunts me even further. No, thank you! I’d rather type and be launched from app to window and back again as I procrastinate on and on.


Even if I could somehow get used to talking to my device, there is the matter of support for my favorite apps. While Alexa seems to be gaining skill after skill, support for Siri in the Middle East is limited to just these apps on my phone: Careem, Facebook, LinkedIn, Skype, Uber and WhatsApp. There’s no support to pick up my last watched video on YouTube, no link to show me a recent story on Instagram, and it even thought I worked in Vietnam when my contact address is set to Dubai, UAE.


Keep talking to me, in seductive and persuasive tones.

William Bernbach (of DDB fame) couldn’t have said it better – for his time. Some of my analytical peers will argue that advertising today is more scientific. However, let’s not spoil the story over what matters here – persuasion.

Voice UI devices need to build in persuasion as part of the chain of communication. Chatbots seem to be leading the charge with this at the moment as they are asking us questions rather than the other way around. And that’s precisely why advertisers are loving it so goddamn much. Voice interaction too needs persuasive powers – perhaps with a seductive tone to really help us roll over and empty our wallets or time.

With persuasive powers, the devices will be able to prolong conversations beyond the monosyllabic responses to the ear that reverberates – “Hey Mayank, how may I serve you?”

Speak to me, in my language.

One of the biggest challenges for voice-UI in the Middle East is its inability to discern dialects. From Lebanese to Maghrabi, Fusha, Misri and even Nabati, Arabic has many layers of phonetic intonation. They have been rolled and carved in different pots and oases across the sands.

And this too is its biggest opportunity. There are two ways to build up on this front:


A colleague often jokes about how interacting with Cortana for him is like being in a client meeting where he must enunciate every word extremely carefully and clearly. For voice-UI to work and work right, it must fix the first barrier – how does it understand speech as second nature? Just getting Siri to not confuse “Dubai” with “to buy” took me a few minutes. And I’m afraid if you’re going to take that long, I will just type it out.

I’m not saying that voice-first devices have not yet picked up Arabic. They’re getting smarter every day, and if you look at Alexa Skills finally beating the 10k mark, there is appetite and adoption for it. However, how does a user even learn about all the new skills that Alexa possesses?! Unless they capture a colloquial phrase or become a natural part of how we talk on a day-to-day basis, these skills need to be learnt as much by the speaker as by Alexa.

Training ourselves to speak a certain language takes time. Stop reading this post for a minute and think about all the jargon you use in your day-to-day work life. Now, rewind to the time when you didn’t know any of it. Would you inherently know what requests to make if you didn’t know what to ask for? That’s exactly why voice-UI devices need to take a leap of faith and figure us out rather than us figuring them out.


The Middle East has shown great affinity towards voice-heavy consumption. The common joke about how lazy we are, phoning our baqqalas (neighborhood grocery stores) to order everything from a bottle of water to detergent and even tea, is a great little insight into the way we communicate. Google India put this idea to best use in their “What is Fennel?” video.



A recent study by Accenture on the “Dynamic Digital Consumers” of today also found that…

84% of 14-to-17 year olds currently use or are interested in using the voice-enabled digital assistant in their smartphone.

With these two currents in mind, what I’d like startups in the Middle East to do is focus on Arabic-first Voice UI. Don’t wait for the tech to be built elsewhere and then, as we usually do, adapt it. This is something that entrepreneurs in our countries should dive into.

Votek caught my attention a few months ago and I’m keen to understand all the work they’re doing in this space. They’ve taken the basic premise from hits such as Toymail and driven it further with Loujee, which is a very interesting idea indeed, especially because cognitive learning begins early on, and that is really where we should focus on driving the adoption of voice-based interactions.

In conclusion, the voice-first arena holds considerable promise for the tech industry. While 2017 may not be the year of “Voice”, it will certainly be the time when great strides are made in making it more user-friendly. It needs to shed its gimmicky value and do more than simple commands that tell us how hot it is. And most importantly, just like Alexa, these devices need to mature quickly and learn us before we are forced to learn them. In the meantime, I will be trying to enunciate as clearly as possible and working on making a better friend out of Siri.