Thanks to advances in speech and natural language processing, we hope that one day you’ll be able to ask your virtual assistant what the best salad ingredients are. Currently, it is possible to ask your home gadget to play music or turn on by voice command, a feature already available in some devices.
It’s a different story if you speak Moroccan, Algerian, Egyptian, Sudanese, or any of the other dialects of Arabic, which differ dramatically from region to region and some of which are mutually unintelligible. If your mother tongue is Arabic, Finnish, Mongolian, Navajo, or another language with a high level of morphological complexity, you may feel left out.
These complex structures led Ahmed Ali to seek a solution. He is the chief engineer in the Arabic Language Technologies group at the Qatar Computing Research Institute (QCRI), part of the Qatar Foundation’s Hamad Bin Khalifa University, and the founder of ArabicSpeech, “a community that exists for the benefit of Arabic speech science and speech technologies.”

When Ali was at IBM years ago, he was fascinated by the idea of talking to cars, gadgets, and appliances. “Can we make a machine that can understand different dialects – an Egyptian pediatrician automating a prescription, a Syrian teacher helping kids pick up the essentials from their classes, or a Moroccan chef describing the best couscous recipe?” he says. However, the algorithms powering these machines cannot yet distinguish among the roughly 30 varieties of Arabic, let alone make sense of them. Today, most speech recognition tools work only in English and a handful of other languages.
The coronavirus pandemic has intensified an already growing reliance on voice technologies, as natural language processing has helped people comply with stay-at-home guidelines and physical distancing measures. However, while we have been using voice commands to assist with e-commerce purchases and to manage our households, more applications lie ahead.
Millions of people around the world use massive open online courses (MOOCs) for their open access and unlimited participation. Speech recognition, one of the main features of MOOCs, lets students search within the spoken content of courses and enables translation via subtitles. Speech technology makes it possible to digitize lectures, displaying the words spoken in college classrooms as text.

According to a recent article in Speech Technology magazine, the voice and speech recognition market is predicted to reach $26.8 billion by 2025, as millions of consumers and companies worldwide come to rely on voice bots not only to interact with their devices or cars, but also to improve customer service, drive healthcare innovation, and improve accessibility and inclusion for those with hearing, speech, or motor disabilities.
In a 2019 survey, Capgemini predicted that by 2022, more than two out of three consumers would prefer voice assistants to visiting stores or bank branches; a share that may well increase, given the home-centered, physically distanced life and commerce that the pandemic has imposed on the world for more than a year and a half.
However, these devices fail to deliver for large swaths of the world. For those roughly 30 varieties of Arabic and millions of people, that is a largely missed opportunity.
Arabic for machines
Voice bots that speak English or French are far from perfect. Still, teaching machines to understand Arabic is particularly difficult, for a number of reasons. These are three commonly accepted challenges:
- Lack of diacritics. Arabic dialects are vernaculars, primarily spoken rather than written. Most available text is non-diacritized, meaning it lacks the accent marks, such as the acute (´) or grave (`), that indicate the sound values of letters. Therefore, it is difficult to determine where the vowels go.
- Lack of resources. There is a shortage of labeled data for the different Arabic dialects. Collectively, they lack the norms and standardized orthography that dictate how a language is written, including spelling, hyphenation, word endings, and stress. These resources are essential for training computer models, and their scarcity has hindered the development of Arabic speech recognition.
- Morphological complexity. Arabic speakers engage in a lot of code switching. For example, in areas colonized by the French – North Africa, Morocco, Algeria, and Tunisia – the dialects include many borrowed French words. As a result, the number of out-of-vocabulary words, which speech recognition technologies cannot understand because they are not Arabic, is quite high.
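The out-of-vocabulary problem described above can be made concrete with a small sketch. The function and words below are hypothetical illustrations, not part of any QCRI system: a recognizer whose vocabulary covers only Arabic words has no entry for French loanwords, and the OOV rate measures how often that happens.

```python
# Minimal sketch (hypothetical vocabulary and utterance): measuring the
# out-of-vocabulary (OOV) rate of a transcript against a recognizer's
# vocabulary. Code-switched French words fall outside an Arabic-only list.
def oov_rate(transcript_words, vocabulary):
    """Return the fraction of words the recognizer has never seen."""
    unknown = [w for w in transcript_words if w not in vocabulary]
    return len(unknown) / len(transcript_words)

# Toy North African utterance mixing Arabic with French loanwords.
vocab = {"salam", "kayn", "bzaf"}                   # Arabic-only vocabulary
utterance = ["salam", "ça", "va", "kayn", "bzaf"]   # "ça va" is French
print(oov_rate(utterance, vocab))  # 2 of 5 words are unknown -> 0.4
```

A real system works on acoustic units rather than written words, but the arithmetic is the same: every code-switched word the vocabulary lacks is a word the recognizer cannot emit.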
“But the field is moving at lightning speed,” says Ali. It is a collaborative effort among many researchers to make it move even faster. Ali’s Arabic Language Technologies lab is leading the ArabicSpeech project to bring together Arabic transcriptions with the dialects native to each region. For example, Arabic dialects can be divided into four regional groups: North African, Egyptian, Gulf, and Levantine. However, given that dialects do not respect borders, the divisions can be as fine-grained as one dialect per city; a native Egyptian, for example, can distinguish the Alexandrian dialect from that of a compatriot from Aswan (a distance of 1,000 kilometers on the map).
Building a technology-aware future for all
At this point, machines are almost as accurate as human transcribers, thanks in large part to advances in deep neural networks, a subfield of machine learning in artificial intelligence that relies on algorithms inspired by how the human brain works, biologically and functionally. However, until recently, speech recognition was somewhat stitched together. The technology has a history of relying on separate modules for acoustic modeling, pronunciation dictionary building, and language modeling, all of which have to be trained separately. More recently, researchers have been training models that convert acoustic features directly into text transcriptions, potentially optimizing the entire pipeline for the final task.
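The contrast between the two approaches can be sketched in a few lines. The functions below are placeholders invented for illustration, not the actual components of any speech engine: the point is only that the classic design chains separately trained stages, while an end-to-end model maps audio straight to text.

```python
# Conceptual sketch (hypothetical placeholder functions): the classic
# modular ASR pipeline versus an end-to-end model.

def acoustic_model(audio):
    """Placeholder: map audio frames to phoneme-like units."""
    return ["h", "e", "l", "o"]

def pronunciation_lexicon(phonemes):
    """Placeholder: map phoneme sequences to candidate words."""
    return ["hello"]

def language_model(words):
    """Placeholder: rescore word sequences for fluency."""
    return " ".join(words)

def modular_asr(audio):
    # Three separately trained stages, chained together; errors in one
    # stage propagate to the next.
    return language_model(pronunciation_lexicon(acoustic_model(audio)))

def end_to_end_asr(audio):
    # A single neural network maps acoustics straight to text, so the
    # whole pipeline can be optimized for the final transcription task.
    return "hello"  # placeholder for the network's output

print(modular_asr(b"..."))     # prints "hello"
print(end_to_end_asr(b"..."))  # prints "hello"
```

Both sketches produce the same toy transcript; the difference that matters in practice is where the training signal flows, which is exactly what the end-to-end architectures mentioned above exploit.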
Despite these improvements, Ali still cannot give voice commands to most devices in his native Arabic. “The year is 2021, and I still can’t speak to many machines in my dialect,” he comments. “So, now I have a device that can understand my English, but machine recognition of multi-dialect Arabic speech hasn’t happened yet.”
Making that happen is the focus of Ali’s work, which culminated in the first transformer for Arabic speech recognition and its dialects, one that has achieved unmatched performance so far. The technology, called the QCRI Advanced Transcription System, is currently used by the broadcasters Al-Jazeera, DW, and the BBC to transcribe content online.
There are several reasons why Ali and his team have succeeded at building these speech engines now. First, as he puts it, “Resources are needed across all the dialects. We need to build the resources to train the model.” Advances in computer processing mean that computationally intensive machine learning now runs on graphics processing units, which can quickly process and display complex graphics. As Ali says, “We have great architecture, good modules, and data that represents reality.”
Researchers from QCRI and Kanari AI have recently built models that achieve human parity in Arabic broadcast news. The system demonstrates the impact of captioning Aljazeera’s daily reports. While the English human error rate (HER) is about 5.6%, the research revealed that the Arabic HER is significantly higher and can reach 10%, owing to the morphological complexity of the language and the lack of standard orthography in dialectal Arabic. Thanks to the latest advances in deep learning and end-to-end architectures, the Arabic speech recognition engine manages to outperform native speakers on broadcast news.
While Modern Standard Arabic speech recognition seems to work well, researchers from QCRI and Kanari AI are busy testing the limits of dialectal processing and achieving strong results. Since no one speaks Modern Standard Arabic at home, dialect is what we need to attend to if our voice assistants are to understand us.
This content was written by the Qatar Computing Research Institute, a member of Hamad Bin Khalifa University, Qatar Foundation. It was not written by the editorial staff of MIT Technology Review.
