VoiceXML
Check out Wirelessdevnet.com for the full text of this article.
Interactive Voice Response (IVR) applications on the telephone have traditionally relied on DTMF (touch-tone) input and pre-recorded speech output, because automatic speech recognition (ASR) and text-to-speech (TTS) technologies were still being developed. Now that these technologies have improved significantly, voice input and output are being heavily promoted for the wireless phone market as a way for consumers to tap into the exploding self-service information and transaction resources available from the World Wide Web.
A voice portal is the interface between a caller and an information source - it's the point of entry for a person using an IVR or speech recognition system. When augmented with VoiceXML, the voice portal can host a much wider variety of information, literally funneling any web-based data from your servers out to callers.
VoiceXML is a technology standard developed and managed by the VoiceXML Forum (www.voicexml.org). It builds upon the work of earlier technologies such as VoxML from Motorola and SpeechML from IBM to create a standardized way to interact with services through a voice interface. The VoiceXML Forum aims to drive the market for voice- and phone-enabled Internet access by promoting a standard specification for VoiceXML (VXML), a computer language used to create Web content and services that can be accessed by phone. Release 1.0 of the VoiceXML Specification can be downloaded from the VoiceXML Forum website.
Perhaps the first question that arises is: "Why do we need a markup language for voice commands?" The answer is becoming increasingly obvious as some members of the technology community express their displeasure with textual wireless interfaces such as WAP. Wireless communication devices have small screens, limited input capabilities, and limited processing power. They have obviously been huge successes as voice communication conduits; however, it remains to be seen how the public will accept them as data delivery vehicles. One alternative to the textual interface offered by technologies such as WAP is what was originally known as an IVR, or Interactive Voice Response, system. Historically, these systems have been very proprietary and therefore unsuitable for allowing access to Web-based content.
VoiceXML basically allows you to define a "tree" of voice dialogs that steps the user through a selection process. The user interacts with these voice dialogs through the oldest interface known to mankind: the voice! Powerful speech recognition software resides on the server to convert the user's spoken selection (e.g. "Yes" or "No") into a textual selection. This process is akin to selecting a hyperlink on a traditional Web page. Dialog selections result in the playback of audio responses (either prerecorded files or speech generated dynamically by server-side text-to-speech conversion).
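To make the "tree" of voice dialogs concrete, here is a minimal sketch of a single menu node, assuming a VoiceXML 1.0 interpreter. The menu id, the prompts, and the example.com URLs are hypothetical and only illustrate how a spoken choice behaves like a hyperlink.

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- One node of the dialog tree: the caller picks a branch by saying its name. -->
  <menu id="main">
    <!-- <enumerate/> reads the available choices back to the caller. -->
    <prompt>Welcome. Say one of: <enumerate/></prompt>
    <!-- Each choice acts like a hyperlink to another document (hypothetical URLs). -->
    <choice next="http://www.example.com/vxml/weather.vxml">Weather</choice>
    <choice next="http://www.example.com/vxml/news.vxml">News</choice>
    <!-- Played if the caller says nothing. -->
    <noinput>Please say one of: <enumerate/></noinput>
  </menu>
</vxml>
```

Each document reached through a choice can contain its own menus or forms, which is how the tree grows deeper.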
From a business viewpoint, voice applications open up a host of new revenue opportunities. Perhaps the most obvious revenue opportunity comes from the increased number of minutes we will all be spending on our wireless phones. In addition, advertising will become as commonplace through these services as it currently is on traditional media (Web, TV, radio, etc.). As voice services are added to your traditional carrier plan, there will clearly be a market for pay-as-you-go premium services (information lookups, email, contact databases, etc.). It's not hard to imagine most consumers opting to listen to a 15-second ad in exchange for free access to these premium services! Because VoiceXML is XML-based, it is yet another technology driving the move towards content distribution and management in XML. See an article on this topic at the Wireless Developer Network (where most of this page was taken from). Within two years, it is very likely that content providers will offer both WAP- and voice-accessible sites for their wireless customers. Clearly, by that point, a manageable architecture using XML will be required.
VoiceXML is an XML application that defines a tree-like structure that the user can traverse using voice commands. The VoiceXML Document Type Definition (DTD) is the document that defines the "grammar" of a valid VoiceXML application. An integral component of every VoiceXML application is the text-to-speech and speech-to-text processing engine that runs on the server. These products are available from a variety of vendors including IBM, Motorola, and SpeechWorks. Readers familiar with WML will find VoiceXML vaguely familiar as well, because both are XML-based markup languages that define a group of elements enabling a user to traverse information. For instance, common tags supported by VoiceXML include <form>, <var>, and <menu>. All VoiceXML "documents" begin and end with the <vxml> tag. Before diving in, we'd be remiss if we didn't at least kick things off with a simple "Hello World!" example.
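The listing below is a minimal sketch of such a document, modeled on the hello-world in the VoiceXML 1.0 specification rather than taken from the original article, and assuming a VoiceXML 1.0 interpreter with a text-to-speech engine:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <!-- A form is one dialog; this one has a single block and collects no input. -->
  <form>
    <!-- The text inside the block is spoken to the caller via text-to-speech. -->
    <block>Hello World!</block>
  </form>
</vxml>
```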
"<"pre">"As you can see, this is a very simple example that uses the