IVR Answering System Speech Solutions Technical Library: Speech Technology including IVR Outsourcing Services - IVR Hosting Answering Services and Voice Broadcasting

IVR Technology Solutions

This section of our technical library presents information and documentation relating to IVR suppliers and custom IVR software and products. Business phone systems and toll-free answering systems (generally 800 numbers and their equivalents) are very popular with service and sales organizations, allowing customers and prospects to call your organization from anywhere in the country. The PACER and WIZARD IVR system is just one of many DSC call center phone system features.

What is Interactive Voice Response? An Interactive Voice Response (IVR) system processes inbound phone calls, plays recorded messages including information extracted from databases and the internet, and can route calls either to in-house service agents or to an outside extension.

Contact DSC today to learn more about our IVR services and IVR application development software.

Speech – Ready for Mainstream

Andrew Kozminski, Vice President, R&D at Pronexus

A Developer’s Perspective on the Microsoft Speech Platform

Executive Summary

Speech recognition and speech synthesis are truly disruptive technologies (in a positive sense) with great potential. Despite the significant and verifiable return on investment that speech applications deliver, speech has not yet seen the wide adoption that was once forecast. High cost and complexity of development have been significant market barriers. This is about to change.

In March 2004, Microsoft officially enters the market for speech-enabled telephony (and multimodal) applications by releasing its integrated speech platform: the Microsoft Speech Server (MSS) and Microsoft Speech Application SDK. We believe that this is a very significant development, which will change the rules in our industry, from both technical and business perspectives. The Microsoft Speech Server sets new standards for price-performance and introduces new tools that address much of the complexity of speech development. This opens speech application development and deployment to a much larger audience than ever before. As a result, we expect speech to finally become a mainstream technology.

This paper presents the Microsoft speech platform from the perspective of a developer or project manager, who is responsible for implementing a new speech-based application. We will start by taking a quick look at the state of the industry today. We will then proceed to analyze how the Microsoft Speech Server changes the picture and point out some important technical aspects that need to be considered to take full advantage of this new technology.

Today - Speech is Still a Niche

After decades of research and innovation, the two core speech technologies, Automatic Speech Recognition (ASR) and Text-to-Speech (TTS), are ready for many practical applications. Today’s state-of-the-art speech recognition engines, like the one being released by Microsoft, achieve 95 to 99 percent accuracy. This is better than most people can type! At the same time, continuous advances in microelectronics, as predicted by Moore’s Law, have increased the power of both desktop and server computers to the point where they can handle even the most sophisticated algorithms involved in speech processing. In short, speech technology is ready today. When properly applied, it can significantly enhance internal and external business processes in most organizations.

However, even the best technology is not enough to guarantee successful deployment in real-life applications or widespread acceptance by the general public. To truly become part of our everyday lives, a technology has to be ubiquitous, simple to use and implement, and affordable. Unfortunately, quite the opposite has been true in the case of speech technology. It has been the domain of a few providers and its application remains complex and expensive. As a result, only relatively few early adopters (typically big corporations with big development budgets) have been able to benefit from speech-enabling their applications. For the vast majority of businesses, speech hasn’t arrived yet.

We can compare the speech industry today to the computer industry of 30 years ago. The computer market was dominated by complex and expensive mainframes and minicomputers and only large companies had the financial resources to buy and support them. Then came the PC and set new standards for affordability and ease of use. The rest, as they say, is history…

Similarly, today’s speech industry is still hampered by proprietary and expensive hardware and software, spotty and inconsistent standards, lack of a wide developer base and broad product distribution, and inadequate development tools.

All of these factors combine to make the job of speech-enabling an application (or developing a speech application from scratch) a difficult and often expensive proposition. A case in point: in our professional services practice, we see customers almost every day walking away from great speech applications and settling for old-style touchtone solutions once they are quoted prices and realize the complexity of deployment and maintenance. Similarly, developers looking to enhance their own applications with speech capabilities have shied away from doing just that, in turn losing out on potential business opportunities.

Tomorrow – Microsoft Speech as a Disruptive Technology

We believe that Microsoft’s platform, like the Personal Computer of 30 years ago, has the potential for becoming a disruptive force that will finally make speech a truly mainstream technology. It addresses all factors that have so far limited the application of speech: it dramatically lowers the total cost of ownership and the cost of core licenses, eliminates much of the complexity, shortens development cycles (by applying Microsoft Visual Studio, one of the most powerful and widely used programming environments) and finally, and perhaps most importantly, opens the platform for easy integration with third-party extensions and add-ons.

From a programmer’s perspective, the main strength of the Microsoft approach is its integration. The Microsoft speech platform is the first to affordably combine the complete set of technologies required to build real life speech-based business applications:

Speech Application Language Tags (SALT): an XML and HTML based open standard markup language, specifically designed for the definition of speech-based application logic. SALT already enjoys wide industry support – the SALT Forum currently represents over 70 member companies, including many of the industry leaders. The SALT specification has been submitted to the W3C.
Windows Server 2003: the latest generation of the Microsoft OS with a whole set of new features designed for enterprise deployment.
Microsoft Visual Studio: one of the most powerful programming environments and a de-facto industry standard with a complete set of development and debugging tools.
Speech Application SDK: a library of tools and ready-to-use speech-enabled building blocks for common processes such as credit card number recognition.
Microsoft Speech Server: a common hardware and software platform offered in a few well-defined, pre-configured options. This approach greatly reduces the effort involved in assembling, debugging, tuning and administering an otherwise very complex system composed of many subcomponents from different vendors.
.NET Framework: a set of technologies for building applications in a distributed and modular fashion, well suited to integration with third-party components (such as databases, POS, credit card payment systems, billing, and OA&M), which are an integral part of every voice application.

Together, these components deliver the most complete development platform for speech, second to none in the industry. The only comparable alternative, a VoiceXML-based system, falls short in many areas: lack of integration (engines, tools, libraries, etc. all come from different vendors), lack of industrial strength programming and debugging tools, incomplete support for telephony (which initially led to many proprietary extensions, defeating the standard), and of course the dramatically higher costs.

In short, we believe that the Microsoft speech solution offers a compelling value proposition, particularly for organizations in the currently under-served SME (Small and Medium Enterprises) space.

Is This Picture Perfect?

Unfortunately, the world of technology is never perfect and developers know that firsthand! The introduction of the Microsoft Speech platform makes the development of speech applications much easier, but this doesn’t mean it has become trivial. Speech applications are at the intersection of many different technologies such as telephony, speech, databases, Internet, call-centers, business applications, just to name a few, and are inherently complex.

Consider the following factors when planning a new speech-based solution on the Microsoft platform:

Limitations of SALT: The SALT standard has been specifically designed for speech and multimodal functionality. However, its first version still has some limitations. For example, there is currently no support for call bridging or fax functionality.
Limited telephony hardware support: Only two hardware vendors, Intel and Intervoice, currently support the Microsoft Speech Server. Although both of their implementations are fairly feature rich, they also have their own limitations. For example, today’s Intel implementation does not support Voice over IP or any of the more specialized protocols, like TBCT or Q.SIG. Before deciding on a platform, a system architect should thus carefully review all planned functionality to ensure that it can actually be implemented.
Unilingual speech implementation: today, MSS supports only (American) English.
New Architecture: Microsoft Speech Server is designed around the web-programming model, where the application logic is implemented as a web page and executes in the context of a web server (in this case - IIS). This approach has many advantages, widely discussed in the literature, but it does not serve all applications equally well. For example, a more sophisticated call control (such as different transfer scenarios) may not be possible. In general, applications that are heavily dependent on switch integration (CTI) may not (yet) be good candidates for SALT and the Microsoft Speech platform.
In addition, the model currently lacks features taken for granted by telephony and speech programmers, especially those used to the rapid application development tools traditionally used in IVR development. For example: MSS does not offer support for synchronizing access to global data, exchanging information between telephony channels or implementing custom state machines to handle call progress. All these problems, classic in multithreaded call processing, have to be solved by developers.
New Environment: SALT and the Microsoft Speech Server successfully eliminate much of the complexity and many of the problems plaguing speech projects. But the environment is still far from simple and a significant learning curve should be expected. For instance, developers have to become very familiar with the low-level details of IIS, HTML, ASP.NET, JScript and SMEX messaging. These technologies will most likely be quite foreign to a typical telephony expert coming from the “old school” of embedded systems and native APIs. On the other hand, how many web developers really know telephony and real-time call processing? The point is that these sets of skills rarely overlap (at least today) and therefore, project managers should add some extra time to their schedules for “re-tooling” their development teams for full productivity.
No Guidance for Best Practices: As any experienced speech developer would attest, building a good voice user interface (VUI) is an art that stretches far beyond simple programming. Several volumes have been written on how things should and should not be done, and it is common knowledge in the industry that an uninitiated developer has very little chance of “getting it perfect” the first time around. Concepts like context-sensitive help, prompt escalation, mixed initiative or friendly error handling are not intuitive for programmers and often require the assistance of a language specialist. In this situation, providing even the most basic set of “best practices” can go a long way towards improving results. While the current Speech SDK provides a number of pre-built ASP.NET controls that encapsulate VUI design, it doesn’t offer much help with the concepts mentioned above.
No Call-Flow Design Tool to Organize the Application Logic: particularly in bigger applications with hundreds of dialog steps, developers may find it difficult to navigate through the controls spread across many .ASPX pages to find a specific section of the logic. This is in sharp contrast to the existing industry standards set by the well-structured design and authoring tools for IVR.
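To make the multithreaded call-processing problems mentioned in the list above more concrete, here is a minimal sketch in Python of two of them: a custom state machine for call progress and lock-protected access to global data shared by several channels. It is an illustration only and does not use any Microsoft Speech Server or SALT API. The state names, event names, and the `SharedStats` and `Channel` classes are all hypothetical, invented for this example.

```python
import threading
from enum import Enum, auto


class CallState(Enum):
    IDLE = auto()
    RINGING = auto()
    CONNECTED = auto()
    TRANSFERRING = auto()
    DISCONNECTED = auto()


# Allowed transitions: (current state, event) -> next state.
# The event names are hypothetical, chosen only for this sketch.
TRANSITIONS = {
    (CallState.IDLE, "incoming"): CallState.RINGING,
    (CallState.RINGING, "answer"): CallState.CONNECTED,
    (CallState.RINGING, "hangup"): CallState.DISCONNECTED,
    (CallState.CONNECTED, "transfer"): CallState.TRANSFERRING,
    (CallState.CONNECTED, "hangup"): CallState.DISCONNECTED,
    (CallState.TRANSFERRING, "transfer_done"): CallState.DISCONNECTED,
    (CallState.TRANSFERRING, "transfer_failed"): CallState.CONNECTED,
}


class SharedStats:
    """Global data shared by all channels, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.calls_answered = 0

    def record_answer(self):
        # Only one channel thread at a time may update the counter.
        with self._lock:
            self.calls_answered += 1


class Channel:
    """One telephony channel with its own call-progress state machine."""

    def __init__(self, channel_id, stats):
        self.channel_id = channel_id
        self.state = CallState.IDLE
        self.stats = stats

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(
                f"event {event!r} not allowed in state {self.state.name}"
            )
        self.state = TRANSITIONS[key]
        if event == "answer":
            self.stats.record_answer()
        return self.state


def simulate_call(channel):
    # A scripted call: answered, transfer attempted and failed, then hung up.
    for event in ("incoming", "answer", "transfer", "transfer_failed", "hangup"):
        channel.handle(event)


def run_demo(num_channels=8):
    """Run one simulated call per channel, each on its own thread."""
    stats = SharedStats()
    channels = [Channel(i, stats) for i in range(num_channels)]
    threads = [
        threading.Thread(target=simulate_call, args=(ch,)) for ch in channels
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats, channels
```

Keeping the transitions in a table means an unexpected event fails loudly instead of leaving a channel in an undefined state. Exchanging information between channels, the third problem named above, is left out of this sketch; a thread-safe queue per channel would be one common way to approach it.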

In summary, as impressive as the Microsoft Speech platform is, it can still be improved in a number of areas. We certainly believe that Microsoft is working hard to address all of these issues in future releases. In the meantime, developers can turn to add-on tools that overcome many of these limitations.

Accelerate To Speech Success

As stated above, we believe that the Microsoft Speech Server is a compelling platform for developing speech-enabled business applications that deliver cost savings, new business opportunities and other measurable bottom line results. For both product planners and programmers, the Microsoft Speech Server offers the best combination of price and performance, together with the most sophisticated and near complete development environment.

However, the Microsoft value proposition can be further improved by augmenting the Speech Server with a properly selected rapid application development (RAD) tool. An example of such a tool is VBSALT™, a complete speech and telephony development environment specifically designed to complement Microsoft Speech Server.

By addressing several of the shortcomings listed above, VBSALT lets developers rapidly create sophisticated speech applications using programming concepts and environments familiar to any developer who has ever worked with Microsoft Visual Studio. This, in turn, opens the exciting opportunities of the speech market to the largest developer base imaginable.

A Picture is Worth a Thousand Words

As you can see in the picture below, VBSALT is integrated on top of the Microsoft Speech Application SDK, which in turn drives the Microsoft Speech Server engine. This integration offers several advantages to the application developer.

Complete graphical design environment for call-flows
Comprehensive set of high level telephony and speech building blocks
Controls that encapsulate best practices for voice user interface (VUI) design such as prompt escalation and error handling.
Access to the most important speech components of the Microsoft Speech SDK directly from a call-flow
Transparent generation and interpretation of SALT
Event-driven customization of call-flows, in any of the languages supported by the CLR (VB.NET, C#, J#, C++, etc.)
Full power of Visual Studio .NET for programming and debugging
Familiar environment of a “classic” Windows application
Multithreaded, event-driven architecture
Out-of-the-box framework for dealing with common programming problems in telephony like thread synchronization, state machines, inter-channel communication
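As an illustration of one VUI practice named in the list above, prompt escalation, the following Python sketch shows the general idea: each failed recognition attempt is followed by a more explicit prompt, and the dialog gives up gracefully after the last one. This is not VBSALT or Microsoft Speech SDK code. The prompt wording, the `recognize` callable, and the `scripted_recognizer` helper are assumptions made up for this example, with the recognizer standing in for a real speech engine.

```python
ESCALATING_PROMPTS = [
    "Which department would you like?",
    "Sorry, I didn't catch that. Please say sales, support, or billing.",
    "You can say sales, support, or billing, or press 1, 2, or 3 on your keypad.",
]

VALID_CHOICES = {"sales", "support", "billing"}


def ask_with_escalation(recognize, prompts=ESCALATING_PROMPTS, valid=VALID_CHOICES):
    """Ask a question, giving more guidance after each failed attempt.

    `recognize` is a hypothetical callable standing in for a speech engine:
    it takes the prompt text and returns the recognized word, or None on
    silence or no match.

    Returns (choice, attempts) on success, or (None, attempts) when every
    prompt has been used and the caller should be handed to an agent.
    """
    for attempt, prompt in enumerate(prompts, start=1):
        result = recognize(prompt)
        if result is not None and result.strip().lower() in valid:
            return result.strip().lower(), attempt
    return None, len(prompts)


def scripted_recognizer(responses):
    """Build a fake recognizer that replays canned responses (for testing)."""
    remaining = iter(responses)
    played = []

    def recognize(prompt):
        played.append(prompt)  # remember which prompts were played
        return next(remaining, None)

    return recognize, played
```

A caller that stays silent, then mumbles, then says "Billing" would hear all three prompts in order and be routed on the third attempt. A caller who never gives a valid answer gets `None` back, which the application can treat as the cue to transfer to a live agent.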

Start Your (Microsoft Speech) Engines

Look around your organization: opportunities for speech-enabling business processes abound! Many of you may also realize the tremendous business opportunities that the Microsoft speech initiative presents to application developers, integrators and ISVs today.

As this paper has hopefully shown, the Microsoft Speech Server is the most compelling speech platform yet. As telephony and speech developers, we believe that the speech market is finally ready for the mainstream. Are YOU ready?

More Information

For more information about the Microsoft Speech Server, visit the Microsoft Speech Server website.

About the Author

Andrew Kozminski is the Vice President, R&D at Pronexus. With over 19 years of experience in information technology and telecommunications, Andrew has developed extensive technical expertise in the design and development of telephony and speech systems and applications.
