siri

Apple's virtual assistant Siri revolutionised the world of artificially intelligent assistants in smart devices. Several tech companies now offer alternatives to Siri, but there is one area where Siri still leads the pack: localisation.

Siri is the only virtual assistant that supports 24 languages across 36 country dialects. In contrast, Google Assistant can only understand five languages, and Amazon's Alexa only two: English and German. Now Apple is upping its game, as iOS 10.3 introduces another language that will extend its international advantage further: Shanghainese.

In a recent interview with Reuters, Alex Acero explained how Siri is taught an entirely new language. Acero joined Apple in 2013 and now heads the company's speech team.

Siri's voice recognition was previously powered by Nuance, which the tech giant replaced a couple of years ago with a custom-built in-house voice platform that relies heavily on machine learning to improve its understanding of words.

To pick up a new language, Acero says, Apple first brings in real people who speak the language to read various paragraphs and word lists in different dialects and accents. Their speech is recorded to form a canonical representation of the words and how they sound when spoken aloud. The whole process is handled by people with good knowledge of the language, to ensure accuracy. This raw training data is then fed into a machine learning model.

After this initial step, the model can improve automatically over time as it is trained with more data.

In the next step, Apple releases the new language as a feature of iOS and macOS dictation, available on the iPhone keyboard by pressing the microphone key next to the spacebar. This allows Apple to gather more speech samples (sent anonymously) from a much wider base of people. Apple has humans transcribe these samples, then uses each newly verified pairing of audio and text as further input data for the language model. The report says this secondary process cuts the dictation error rate in half.
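The report does not say how Apple measures the dictation error rate. One common yardstick in speech recognition is the word error rate (WER): the number of word-level edits needed to turn the machine's output into the human transcript, divided by the length of the human transcript. The sketch below is only an illustration of that metric, not Apple's method; the function name and the sample sentences are made up for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the same human-verified transcript compared with the
# recogniser's output before and after retraining on more samples.
human_transcript = "find nearby restaurants please"
before = word_error_rate(human_transcript, "fine nearby restaurant please")
after = word_error_rate(human_transcript, "find nearby restaurant please")
print(before, after)  # 0.5 0.25
```

In this toy case the error rate drops from two wrong words in four to one in four, which is what "cut in half" would look like under this metric.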

Apple keeps repeating this step until it judges the accuracy good enough for the language to be rolled out as a full Siri feature.

The language is then released with a software update, just as Shanghainese will be part of iOS 10.3 and macOS 10.12.4. Siri is seeded with preset answers to the 'most common queries'; this enables Siri to respond to requests like 'tell me a joke'. Requests like 'find nearby restaurants' are handled dynamically, of course.
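The split between preset and dynamic answers can be pictured as a simple lookup followed by a fallback. The sketch below is purely illustrative and assumes nothing about how Siri is actually built: the table names, the `answer` function, the joke and the restaurant handler are all hypothetical placeholders.

```python
from typing import Callable, Dict

# Hypothetical table of canned replies for very common requests.
PRESET_ANSWERS: Dict[str, str] = {
    "tell me a joke": "Why did the phone wear glasses? It lost its contacts.",
}


def find_nearby_restaurants() -> str:
    # Placeholder: a real assistant would query a location service here.
    return "Searching for restaurants near you..."


# Hypothetical table of requests that need a live lookup.
DYNAMIC_HANDLERS: Dict[str, Callable[[], str]] = {
    "find nearby restaurants": find_nearby_restaurants,
}


def normalise(query: str) -> str:
    """Trim edge punctuation, lower-case, and collapse repeated spaces."""
    return " ".join(query.strip(" ?!.").lower().split())


def answer(query: str) -> str:
    key = normalise(query)
    if key in PRESET_ANSWERS:        # seeded, fixed reply
        return PRESET_ANSWERS[key]
    if key in DYNAMIC_HANDLERS:      # computed at request time
        return DYNAMIC_HANDLERS[key]()
    return "Sorry, I didn't understand that."


print(answer("Tell me a joke!"))
print(answer("Find nearby restaurants"))
```

The point of the design is that canned replies only need translating once per language, while dynamic requests depend on data that changes with every query.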