Babbel Speech Recognition

Summary

We're pretty big on Babbel, the online language learning platform, and wanted to let Free Language readers know that in June they released a new browser-based speech recognition feature.

This unique tool encourages users to practice the language they are learning out loud and gives them the opportunity to fine-tune their pronunciation. Some traditional e-learning software includes this sort of tool, but none of them do it online in quite the same way. It is also one of the first applications to incorporate the new Adobe Flash Player 10.1.

Language Learning System Babbel Introduces Speech Recognition

  • Foreign language pronunciation training online
  • Real-time technology based on the new Adobe Flash Player 10.1
  • Speech recognition with Babbel iPhone apps also planned

Babbel integrates a unique speech recognition tool into its language learning system. The feature encourages practice and gives the opportunity to fine-tune pronunciation skills. This adds yet another dynamic dimension to online language learning.

The browser-based speech analysis gives learners an instant evaluation, letting them know how close their pronunciation is to that of a native speaker. Included in all Babbel courses, the feature will take effect automatically. No installation is necessary beyond the latest Flash Player.

[Image: Babbel speech recognition waveform]

Active speech

Many students of foreign languages, regardless of their level, lack speaking practice and often shy away from direct communication. Breaking out of the shell can take courage that is not always at hand. The idea of the speech recognition feature is to give them the confidence to open their mouths. "We're encouraging learners to speak and improve their pronunciation with a technical tool before they have to face real-life situations", says Markus Witte, Managing Director of Babbel.

In a new kind of exercise integrated into the Babbel courses, learners hear a word or phrase and are prompted to repeat it back. The quality of the pronunciation is then rated on a scale of 0 to 100. A result of 50 or higher means that the utterance is generally understandable. Beyond that point, users can keep polishing their pronunciation until they are satisfied with it.

New technology from audio software experts

The Babbel founders have an extensive background in audio technology. Among them are the original developers of TRAKTOR, the world-renowned DJ software. This professional-level technology has been channeled into the online language learning system Babbel and made accessible to anyone online.

The new speech recognition tool marks a milestone in online language learning and underscores Babbel’s technological lead. It works directly with the popular Flash Player and is the first to make use of its new capabilities.

Another implementation of speech recognition technology is planned for the recently released Babbel iPhone apps.

[Image: Babbel foreign language iPhone view]

About Babbel

Babbel is an online language learning system. Along with the website, there are apps for iPhone and a downloadable vocabulary trainer. Both beginners and returning learners will find interactive exercises for studying English, German, French, Spanish, Italian, Brazilian Portuguese and Swedish. Among diverse kinds of courses there are Basic and Advanced Vocabulary, grammatical exercises and pronunciation training. More than 700,000 people from over 200 countries have already registered with Babbel, while the iPhone apps have had at least 100,000 downloads.

We have reviewed Babbel for each of the seven languages it currently offers. Have a look at our articles to find out much more about this unique language-learning platform:

Babbel English
Babbel French
Babbel German
Babbel Italian
Babbel Portuguese
Babbel Spanish
Babbel Swedish

More In-Depth on Babbel's Speech Recognition

Here's an interview with Babbel's Technical Director, Thomas Holl, which goes into more depth on their unique speech recognition technology.

Speech recognition is the exciting new feature at Babbel. It’s not only fun – it’s also amazingly efficient for learning a new language. But how does it work? Here's the lowdown from our Technical Director Thomas.

Crisi: What does the new speech recognition tool do?

Thomas: Basically, we use pronunciation samples recorded by our native speaking course editors and compare your pronunciation to theirs. As always with Babbel, you get instant feedback. The closer your pronunciation is to this example, the more points you get on a scale from 0 to 100. If you get more than 50 points, you’re good enough to be generally understood.

Crisi: But if you just compare two sounds, is that really speech recognition?

Thomas: Sure, we recognize what you say. We’re now sitting in front of the screen and we are talking but you see that the score is 0 all the time. Now, try saying arrivederci.

Crisi: Arrivederci

Thomas: Nice, 78 points. Better than Aldo Raine in “Inglourious Basterds” (see details here). Remember the hilarious scene where Brad Pitt is trying to speak Italian? We ran his pronunciation through our analysis and as you might expect he scored pretty low. But I’m digressing, sorry. Back to our little test. Your pronunciation is about a 78% match with our reference sample. That’s pretty good.

Crisi: Still, it’s only about comparing sounds, not about understanding what I say.

Thomas: Well, there are different sub-types of speech recognition. One is speech-to-text or voice control. That’s what you’d use to enter text or commands if you can’t use a keyboard. Recognizing words and evaluating their pronunciation is another sub-type, and that’s the technology that makes sense for language learning. We can use it for pronunciation training and for building new interactive exercises.

Crisi: So, what’s the technical challenge in this sub-type of speech recognition?

Thomas: Well, it’s not as easy as it sounds – no pun intended. It’s actually not enough to just compare two sounds. It’s a little like telling how similar two people look in two different photos. The audio samples are usually pretty different: a woman has a higher voice than a man and the tempo of speech also differs a lot. And then you have a number of artifacts...

Crisi: Artifacts?

Thomas: Noises and characteristics that are caused by the environment or the technical setup: rumbling, hissing, other sounds mixing into the voice. Most people don’t have a high-end microphone connected to their computer and in our case we just use the built-in mic on my laptop. The audio quality of what the system is hearing is pretty poor.

Crisi: So to make the speech recognition work properly, our users need to have a good mic and be in a quiet room?

Thomas: No, that’s the point: we can also work with cheap microphones and filter out noise in the immediate environment. That’s part of the challenge.
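An aside on the tempo problem Thomas mentions: the interview does not say which matching algorithm Babbel uses, but a common textbook way to compare two recordings spoken at different speeds is dynamic time warping (DTW). The sketch below is only an illustration under that assumption. The names `dtw_distance` and `similarity_score`, the `scale` parameter, and the exponential mapping onto a 0 to 100 score are all hypothetical, and the inputs are plain lists of numbers standing in for per-frame audio features.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Either sequence may "wait" while the other advances,
            # which is what absorbs differences in speaking tempo.
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def similarity_score(a, b, scale=1.0):
    """Map the average DTW cost onto 0..100 (hypothetical mapping, not Babbel's)."""
    average_cost = dtw_distance(a, b) / max(len(a), len(b))
    return round(100.0 * math.exp(-average_cost / scale))


reference = [0, 1, 2, 3, 2, 1, 0]
slow_copy = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]  # same shape, half speed
monotone = [3, 3, 3, 3, 3, 3, 3]                         # very different shape

print(similarity_score(reference, slow_copy))
print(similarity_score(reference, monotone))
```

In this toy example the half-speed copy aligns perfectly with the reference, while the flat sequence lands well below the 50-point mark. A real system would compare spectral features of the audio and not raw numbers like these.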

Crisi: Sounds like a lot of filtering and levelling...

Thomas: Yes, that also, but there’s more: We have to distil the “core” of the voice sample and then match that to the original. To do that, the system needs to figure out when you start and stop speaking. You don’t have to press any key to start and stop recording; we do the matching in real-time.
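To give a rough idea of what "figuring out when you start and stop speaking" can involve, here is a minimal sketch of energy-based endpoint detection, one of the simplest approaches to that problem. It is not Babbel's implementation, which the interview does not describe. The function names, the frame size and the threshold are all assumptions chosen for the example.

```python
def frame_energies(samples, frame_size):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def find_speech_bounds(samples, frame_size=4, threshold=0.01):
    """Return (start, end) sample indices spanning the loud frames, or None.

    A frame counts as speech when its energy exceeds `threshold`;
    both default values are illustrative assumptions.
    """
    energies = frame_energies(samples, frame_size)
    loud = [i for i, energy in enumerate(energies) if energy > threshold]
    if not loud:
        return None  # nothing louder than the threshold: treat as silence
    return loud[0] * frame_size, (loud[-1] + 1) * frame_size


silence = [0.0] * 8
speech = [0.5, -0.5, 0.4, -0.4] * 2
signal = silence + speech + silence

print(find_speech_bounds(signal))
print(find_speech_bounds(silence))
```

A real detector also has to cope with background noise, for example by adapting the threshold to the room, which is part of the challenge Thomas describes.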

Crisi: So everything we say into the system here is somehow analyzed?

Thomas: Right. Just look at the level: every sound input is analyzed and matched to the sound we’re looking for. In this case, arrivederci.

Crisi: 55 points

Thomas: Ok, yours is better than mine. But you see that the word was recognized among all the other things we said.

Crisi: Is this technology unique? Are there other software products that do this?

Thomas: There are a number of software products that do have speech recognition. Some of them are also of decent quality.

Crisi: So what’s so special about the Babbel speech recognition?

Thomas: Well, it’s online and works in your browser.

Crisi: Does this mean that everything we say here is sent to the Babbel servers and analyzed there?

Thomas: No, the whole audio processing is done instantly, directly in the browser. We don’t have to send the audio to the server and that’s why we can give instant feedback.

Crisi: Do I have to install a plugin or something?

Thomas: You don’t. It’s all done in Flash. 97% of all browsers have the Flash plugin pre-installed. As we use the latest version, you might have to do an update, but that’s very quick. Other than that, you just need a microphone like the one that’s built into my laptop.

Crisi: Babbel has been online since January 2008. Why did it take so long to add this feature?

Thomas: We needed the new Flash Player 10.1 because before that it wasn’t possible to do audio processing locally. It would have been necessary to either send all the audio to the server for analysis or to use a custom browser plugin.

Crisi: What’s wrong with a custom browser plugin?

Thomas: First of all, you have to install new software on your computer. And then you have compatibility issues. There are some rare solutions that offer real-time speech recognition in a browser plugin, but most of them won’t work on your Mac and none of them are compatible with all browsers. Flash is already there, the plugin works fine and it’s available for all platforms.

Crisi: How about the iPhone? You can’t use Flash technology on that platform, can you?

Thomas: No, but the Babbel iPhone apps work natively on the iPhone anyway.

Crisi: Natively?

Thomas: The Babbel apps are built specifically for the iPhone and don’t need a browser or plugin to work. That’s called a “native” application. We can build our algorithm directly into the app.

Crisi: That’s not related to Native Instruments, the software company you used to work for?

Thomas: (laughs): No, not directly. But for being an audio software company, Native Instruments definitely is a great name because the software works natively on the computer.

Crisi: I guess we don’t have to understand that completely. But speaking of audio software: has your audio expertise (along with that of the other Babbel founders) been crucial for this new feature or is it something entirely different than building DJ tools?

Thomas: Both. Of course working on beat detection and time stretching for music and building a speech recognition tool are two different things. On the other hand, we couldn’t have done this in-house without our background.

Crisi: So who actually implemented the new feature?

Thomas: Most of it was done by Toine Diepstraten, one of the Babbel founders. He and I started working together on audio software in our first company, d-lusion, more than 10 years ago. Toine is one of the best developers and audio specialists I’ve ever met. It’s fantastic to have him on board for this project. He did have to do quite a bit of research, but without his expertise this would never have been possible. This way we have state-of-the-art technology that can compare with any other implementation.

Crisi: You sound very convinced.

Thomas: From a technical point of view, this is a great piece of software. We actually got some recognition from Adobe, the makers of the Flash Player. They were pretty impressed by our solution.

Crisi: Will this be a focus for Babbel from now on, or do you plan to work on other types of features?

Thomas: It is a very important feature because now we can do everything online that traditional e-learning software can do locally. And we don’t need installation or updates and we have a very lively online community that goes together with the self-directed learning...

Crisi: But?

Thomas: It’s important but it’s not the end. We’ll keep working and adding new features.

Crisi: Can you say what’s next for Babbel?

Thomas: Sorry, but for that we’ll have to turn off the mic.

Crisi: No problem.

Go ahead, try Babbel!

Submitted by polyglot on Mon, 08/23/2010 - 15:12