EasyIVR IVR Technical Library including IVR Outsourcing Services: Linux and Voice XML in the Call Center - IVR Answering Services and Voice Broadcasting

IVR Solutions

This section of our technical library presents information and documentation relating to IVR Solutions and custom IVR software and products. Business phone systems and toll free answering systems (generally 800 numbers and their equivalent) are very popular for service and sales organizations, allowing customers and prospects to call your organization from anywhere in the country. The PACER and WIZARD IVR System is just one of many DSC call center phone system features.

Contact DSC today to learn more about our IVR services and IVR application development software.

Linux and VoiceXML in the Call Center

by Tony Mancill, senior IVR engineer with Vesta Corp., formerly systems programmer for BellSouth and Bank of America. He has been doing production Linux support since 1996 and has been a volunteer developer for the Debian GNU/Linux distribution since 1998. He has written several magazine articles on Linux and open source and is the author of Linux Routers, 2/e (Prentice Hall, 2002).

The following is an article entitled "Linux and VoiceXML in the Call Center" by Tony Mancill.

A decade ago, Linux was a hobbyist's operating system. I distinctly remember loading it for the first time, floppy by floppy (there were 75 or so), onto a 386SX so that I could have a C compiler without having to use the computers in the college labs. Today, Linux is the heart of my company's interactive voice response (IVR) platform, where we depend upon it to answer and process 60,000 phone calls a day, 24×7, without even questioning its suitability for the task.

Linux supports both the call-flow and the computer telephony integration (CTI) tiers of our IVR. Together they automate a substantial portion of those 60,000 phone calls via tight integration with our customer relationship management (CRM) application.

In the call-flow tier, the voice response units (VRUs) -- VoiceGenie VoiceXML browsers -- run on Linux. In the CTI tier, Linux applications provide screen pops into our call center apps, launch the appropriate Web-based applications on the customer service representative (CSR) desktop when the call arrives, pull up the correct customer record and propagate all the IVR-collected information on to the CSR. All of which makes the bean counters happy, because the use of Linux instead of "name brand" applications reduces costs. (Our VoiceXML IVR deployment paid for itself in less than six months.)

And using Linux makes me happy as well, in multiple respects. When I wear my operations hat, I see everything running on commodity hardware -- just a rack of regular 1U Web-server boxes, a few of which have telephony cards in them. No fossils with special maintenance contracts, no one-off security concerns, and no difficulties integrating with any number of network management systems. (We use an open-source monitoring system called Nagios.)

When real-world things happen to Linux servers, like hard drive failures, we hot-swap a disk, just as we would in any other server in a datacenter. At the time of writing, several of the production IVR systems have been running for more than 490 days without so much as a reboot. And those aren't idyllic days spent running SETI@Home: besides the constant call traffic, we also do batch reporting jobs, install security patches, create new call flow deployments, etc.

When I wear my programmer hat, Linux and VoiceXML make me glad I am not beholden to any specific technology to develop my call flow applications or to integrate with my business tier. Call flows can be deployed with literally any Web-server technology, from the simplest of static pages to complex dynamic pages created by server-side Web application engines. In our case, JavaServer Pages (JSP) run on our BEA (Nasdaq: BEAS) WebLogic J2EE cluster, which in turn runs on Debian GNU/Linux. In short, if the technology works in a Web environment, it's suitable for our VoiceXML environment.

Finally, sometimes I have to wear the product development hat, the one that covers integrating various technologies into our environment to offer new services (and, occasionally, just to keep up with the Joneses). To do this successfully over time, the core technologies must be flexible and open, yet stable. Equally important, there must be innovators and vendors out there providing new features and products. Again, Linux -- and the newer VoiceXML IVR technology -- fill the bill.

Voice XM-What?

While Linux has already "crossed the chasm" and now enjoys mainstream acceptance in most IT shops, VoiceXML is still proving itself as a versatile and capable alternative to its proprietary predecessors. The language got its start in the mid-1990s when industry leaders started developing "markup" languages to describe caller/computer telephone interactions.

In 1998, the W3C (World Wide Web Consortium) held its first conference on the topic, and in 2000 the 1.0 version of the standard was released. The current generation of browsers all support the VoiceXML 2.0 specification, which finally reached the formal W3C "Recommendation" status in March of 2004.

XML stands for "extensible markup language," which refers to a type of computer language that is based on plain text and tags, also known as markup tags. In VoiceXML's case, the set of tags can be used to describe dialogues that take place between a caller and a computer.

These dialogues are much like the interactions HTML describes between a Web browser and a Web server. In fact, you can think of a VoiceXML browser pretty much as you would a regular Web browser, but instead of mouse clicks and keyboard strokes, VoiceXML takes touchtone and/or human speech as input, generally in response to menu options. And instead of displaying text or graphics to the user, it plays pre-recorded or synthesized "on-the-fly" speech responses.

As for the actual code, VoiceXML looks pretty much like the Web page Hyper Text Markup Language, HTML, with a new set of tags and the addition of event handlers to catch certain events. These handlers allow you to continue to control the flow of dialogue when, for example, the caller doesn't respond to the menu within a certain number of seconds, enters an invalid menu choice, or hangs up in mid-dialogue. The other tags let you perform normal IVR tasks such as playing audio and collecting caller input.
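To make the description above concrete, here is a minimal sketch of what such a page might look like, wrapped in a few lines of Python that parse it and list the menu choices and event handlers. The menu wording, the `#billing` and `#support` targets, and the helper names are assumptions made for illustration; they are not taken from the deployment described in this article.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# A hypothetical VoiceXML 2.0 menu: prompts, touchtone choices, and
# handlers for silence, invalid input, and a mid-dialogue hang-up.
MENU_PAGE = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="main">
    <prompt>For billing, press 1. For support, press 2.</prompt>
    <choice dtmf="1" next="#billing">billing</choice>
    <choice dtmf="2" next="#support">support</choice>
    <noinput>Sorry, I did not hear anything. <reprompt/></noinput>
    <nomatch>That is not a valid choice. <reprompt/></nomatch>
    <catch event="connection.disconnect.hangup"><exit/></catch>
  </menu>
  <form id="billing"><block>Connecting you to billing.</block></form>
  <form id="support"><block>Connecting you to support.</block></form>
</vxml>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


def summarize(page):
    """Return (dtmf -> target mapping, list of handler tag names) for the menu."""
    root = ET.fromstring(page)
    menu = root.find(f"{{{VXML_NS}}}menu")
    choices = {c.get("dtmf"): c.get("next")
               for c in menu.findall(f"{{{VXML_NS}}}choice")}
    handlers = [local_name(el.tag) for el in menu
                if local_name(el.tag) in ("noinput", "nomatch", "catch")]
    return choices, handlers


choices, handlers = summarize(MENU_PAGE)
print(choices)
print(handlers)
```

The `noinput`, `nomatch`, and `catch` elements are the event handlers mentioned above; the `prompt` and `choice` elements do the ordinary IVR work of playing audio and collecting input.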

The protocol used to exchange VoiceXML requests and deliver VoiceXML pages is standard HTTP. VoiceXML is the content of those HTTP fetches, just as HTML is the content of a Web browser's HTTP fetch.

The "XML" part of VoiceXML is of note because it allows the source to be validated before it is fed to the VoiceXML interpreter to be executed. This is particularly important in voice applications. When a Web browser doesn't understand a page it's accessing, it might display a meaningful error message, simply misinterpret the page, or it may even crash. But it would be a Very Bad Thing if a VoiceXML browser handling a phone call didn't understand its pages and subsequently caused the IVR to read out gibberish, or even worse, hang up on the caller.

Think of page validation as a way for the browser to make an educated decision in advance about whether it should execute code. If the VoiceXML page doesn't conform, then the browser can throw an exception, which any well-written VoiceXML application will catch and take appropriate action. In almost every case, the best course of action is to transfer the caller to a live operator.
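The same "check before you execute" idea can be sketched on the server side. The snippet below is only an illustration, under the assumption that a call flow server wants to sanity-check a generated page before returning it: it tests XML well-formedness and the root element, which is weaker than the DTD validation a VoiceXML browser can perform. The function name and the operator number in the fallback page are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fallback page: hand the caller to a live operator.
FALLBACK_PAGE = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <transfer name="operator" dest="tel:+15555550100" bridge="false"/>
  </form>
</vxml>
"""


def page_or_fallback(source):
    """Return source if it looks like a VoiceXML page, else the fallback page.

    This checks well-formedness and the root element only; it does not
    validate against the VoiceXML DTD.
    """
    try:
        root = ET.fromstring(source)
    except ET.ParseError:
        return FALLBACK_PAGE
    root_name = root.tag.split("}", 1)[-1]
    if root_name != "vxml" or root.get("version") is None:
        return FALLBACK_PAGE
    return source


good = '<vxml version="2.0"><form><block>Hello.</block></form></vxml>'
broken = '<vxml version="2.0"><form><block>Hello.</form></vxml>'
print(page_or_fallback(good) == good)
print(page_or_fallback(broken) == FALLBACK_PAGE)
```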

In addition to page validation, XML affords a structured method for vendor implementations to extend the VoiceXML specification. Because VoiceXML and the page validation mechanism (called DTD, for Document Type Definition) are openly defined and available to all, every platform vendor is on equal footing when it comes to offering a fully compliant VoiceXML browser.

This is important, not just to provide choice to customers, but also to foster growth of application houses and VoiceXML browser extensions. An example of this growth is the wide variety of speech recognition and synthesis engines that are now available from companies like Speechworks, Scansoft, Nuance, Phonetic Systems and Loquendo.

Another significant example of the openness of the VoiceXML specification is the release of voice portal modules for CRMs from Oracle (Nasdaq: ORCL) and SAP (NYSE: SAP). Both are written to the VXML 2.0 spec, which allows users to choose any standards-compliant VoiceXML gateway to access the applications.

Touring A VoiceXML IVR

If I'm throwing around a lot of unfamiliar jargon, maybe a picture will help. Customers reach us by dialing toll-free numbers over the PSTN. These calls are delivered to the PBX via T1 tie lines, then trunked from the PBX to the VoiceGenie VRUs via more T1s.

The VoiceGenies are commodity 1U server PCs running the VoiceGenie software on top of Red Hat Linux. They are also outfitted with Intel (Nasdaq: INTC) Dialogic telephony cards that terminate the T1s and clean up the DTMF and speech before handing it to the VoiceXML browser.

The browser starts the call flow by fetching pages from the call flow servers via HTTP. In our case, these servers are Debian Linux systems running BEA's WebLogic J2EE. Although we speak of the browser and servers conducting a VoiceXML dialogue, recall that VoiceXML is the content of those HTTP fetches, just as HTML is the content of a Web browser's HTTP fetch.
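Because the exchange is plain HTTP, a call flow server can be sketched with nothing more than a standard Web server. The example below is a minimal illustration using Python's built-in `http.server` rather than the WebLogic setup described here; the page path, its content, and the port are assumptions.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical call flow pages, keyed by URL path.
PAGES = {
    "/welcome.vxml": (
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
        "<form><block>Thank you for calling.</block></form>"
        "</vxml>"
    ),
}


def build_response(path):
    """Return (status, content_type, body_bytes) for a requested path."""
    page = PAGES.get(path)
    if page is None:
        return 404, "text/plain", b"not found"
    return 200, "application/voicexml+xml", page.encode("utf-8")


class CallFlowHandler(BaseHTTPRequestHandler):
    """Answer a VoiceXML browser's GET exactly as a Web server would."""

    def do_GET(self):
        status, content_type, body = build_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the sketch server until interrupted (not called automatically)."""
    HTTPServer(("127.0.0.1", port), CallFlowHandler).serve_forever()
```

Calling `serve()` and pointing a VoiceXML browser (or an ordinary HTTP client) at `/welcome.vxml` would return the page; a production call flow server would generate pages dynamically instead of reading them from a dictionary.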

Operationally, the fact that the interaction between the VoiceGenies and the call flow servers is HTTP-based is very elegant: It automatically confers interoperability on any additional high-availability network components that speak HTTP. For example, let's say you already use a load-balancer to front-end your Web servers. With VoiceXML, a phone call is no different than a Web session, complete with a session ID, so your load-balancer can manage these sessions to the call flow servers.
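As a rough illustration of session affinity, the sketch below maps a session ID to one of several call flow servers so that every fetch in the same call lands on the same machine. It uses a simple hash, which is an assumption made for brevity; real load balancers often use cookies or a session table instead. The host names are hypothetical.

```python
import hashlib

# Hypothetical pool of call flow servers behind the load balancer.
CALL_FLOW_SERVERS = [
    "callflow1.example.com",
    "callflow2.example.com",
    "callflow3.example.com",
]


def pick_server(session_id, servers=CALL_FLOW_SERVERS):
    """Choose a server deterministically from the call's session ID."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Every request carrying the same session ID goes to the same server.
first = pick_server("call-0001")
second = pick_server("call-0001")
print(first == second)
```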

This detail may seem trivial for those of you who don't have to deal with operations, but it is significant. The undesirable alternative is to deal with proprietary IVR systems that may be more difficult (and costly!) to harden for 24×7 availability. To this point, note that the VoiceGenies themselves are essentially data-less, which makes them simple to maintain and replace in case of a hardware failure. In addition, we don't have to make data backups for them, and we can configure them to take most any type of call.

Call Flows Meet the Business Tier

All the VoiceGenie needs in order to know how to handle a phone call is the URL to fetch when the phone call begins. When the conversation requires business-tier data (such as customer or product information) or results in a transaction, the call flow application interacts directly with the business tier, in our case with the online transaction processing (OLTP) system of record.

In most companies, there is already some mechanism in place -- such as Java Transaction API (JTA), Enterprise JavaBeans (EJB), remote procedure calls (RPC), Microsoft's (Nasdaq: MSFT) .NET or MQSeries -- to provide access for existing Web and CSR applications. Because the call flow server in the IVR can run on the same technology as existing Web apps, adding VoiceXML-based IVR may mean no change at all to the business tier.

Beyond cost savings during the initial integration efforts, this arrangement greatly increases the chances that new functionality can be offered simultaneously across multiple customer contact channels -- the Web site and the CSRs, say, or the IVR, Web and CSRs. That makes business and marketing folks happy; they don't want to hear "we can do it on the Web in two weeks, but the IVR will take three months."

If, for whatever reason, a caller reaches the IVR, but then needs to talk to a CSR, the call is transferred from the VoiceGenie to the ACD. When the CSR answers the call, the call flow systems are accessed again, this time to deliver the screen pop information to the CSR application.

This is another area where the flexibility of Linux (and open systems in general) makes a difference; all the CSR screen pops are a combination of EJB calls (via WebLogic) and common gateway interface (CGI) calls running under Apache. The CGIs, written in Perl, allow for easy maintenance. We can quickly respond to changing call center needs without having to develop and release new J2EE applications.

That said, some CTI implementation details can be hair-raisingly intricate, due to a couple of things. First, you're taking an outside event, a phone call, and tying it to an event on a computer, a screen pop. Unless you're running softphones, there are some synchronization issues to address, because the computer may think it's busy when the call arrives, and an outside source generally has to inform it of everything that happens to the call -- hang-up, transfer, put on hold, etc. Second, most existing mechanisms to correlate phone and computer activities are dictated by the underlying PBX and ACD technology (and even the vendor).

Why Call Control Is Still Problematic

Most of us in the call center IVR business contend with complex CTI issues. We don't want to increase talk time or annoy customers by asking repeatedly for the same information -- even if we are running multiple contact centers, based on different applications and using different telco hardware. This gets even more complicated in a service-provider or outsourcing environment, where the call might be passed multiple times from IVR to IVR over the PSTN.

For CTI to work, you need to be able to do two basic things:

1.) Transfer the call from the IVR to the CSR. Ideally, you'd like to release the IVR channel the call was on when you do this, so that the IVR port is free to accept another inbound call.

2.) Associate the call that is eventually delivered to the CSR with what has occurred prior in the IVR. Without this, the customer will have to provide the CSR with the same information that they already gave the IVR.

Of course, there are solutions to both of these problems, but for much of what is deployed today, the mechanisms are proprietary or of limited use.

Let's first look at the need to move calls off the IVR. In our experience, the de facto CTI trunking standard, ISDN PRI, leaves a lot to be desired. Not only are there multiple ISDN transfer mechanisms -- Explicit Call Transfer (ECT), Two B-Channel Transfer (TBCT), RLT (a Nortel (NYSE: NT) variant of TBCT), AT&T's (NYSE: T) TransferConnect, Q.SIG Path Replacement, and so on -- but different vendors support different sets of these mechanisms.

So if you're planning to implement a VoiceXML platform with ISDN PRI trunking, be prepared for some sophisticated integration work. At my company, we ran into one of these integration snags and were impressed both by our VoiceXML platform vendor's efforts to provide the functionality for our equipment, and by our PBX vendor's reluctance to supply even basic information about their implementation of the transfer mechanism in question.

Our options were to either initiate an outbound call on another IVR port (also known as a "hairpin" or "trombone" transfer) or purchase a proprietary call-control application from a third party and integrate that into the call flow application. Hairpinning ties up additional ports, while the proprietary call-control software only works with robbed-bit signaling, which has the nasty side effect of losing all the caller and call information (carried in ANI, DNIS, and UUI). Our solution varies with the call -- if we need that information, we use hairpins, otherwise we save on ports and use the robbed-bit signaling. (Aren't you glad you asked?)

Along with the issue of the transfer mechanisms comes the challenge of passing session information with the call. Most ISDN implementations provide for User-to-User Information (UUI, the same field used to display calling party name), which allows between 90 and 240 or so bytes of data to be forwarded with the call. This can be used to pass small amounts of information between call centers and internally (say, in conjunction with a database) to pass richer information about the state of the call. The basic idea is that system A updates the database with the results from its processing of the call, sets the UUI with a database identifier, and then propagates the call to system B. System B then queries this information from the database and processes the call, updating the database with its results, etc.
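The database handoff described above can be sketched in a few lines. This is an illustration only: it uses an in-memory SQLite database in place of a shared production database, and the table layout, function names, and the 90-byte limit (the low end of the range quoted above) are assumptions.

```python
import json
import sqlite3
import uuid

MAX_UUI_BYTES = 90  # assumed limit: the conservative end of the quoted range


def open_store():
    """Create the shared call-state table (in memory, for the sketch)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE call_state (call_key TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return db


def system_a_handoff(db, results):
    """System A: store its results and return the key to carry in the UUI."""
    call_key = uuid.uuid4().hex  # 32 ASCII characters
    db.execute("INSERT INTO call_state VALUES (?, ?)",
               (call_key, json.dumps(results)))
    db.commit()
    uui = call_key.encode("ascii")
    if len(uui) > MAX_UUI_BYTES:
        raise ValueError("identifier does not fit in the UUI field")
    return uui


def system_b_receive(db, uui, more_results):
    """System B: look up the call by its UUI key and add its own results."""
    call_key = uui.decode("ascii")
    row = db.execute("SELECT state FROM call_state WHERE call_key = ?",
                     (call_key,)).fetchone()
    if row is None:
        raise KeyError(call_key)
    state = json.loads(row[0])
    state.update(more_results)
    db.execute("UPDATE call_state SET state = ? WHERE call_key = ?",
               (json.dumps(state), call_key))
    db.commit()
    return state


db = open_store()
uui = system_a_handoff(db, {"account": "12345"})
print(system_b_receive(db, uui, {"intent": "billing"}))
```

Only the short key travels with the call; the richer call state stays in the database, which is why the size limit of the UUI field matters less than it first appears.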

If your switch environment is homogeneous, you can probably find CTI software that helps simplify this for you, basically alleviating the need to set the UUI because it keeps track of the call as it moves around the network. However, the first time you pass the call to something outside the domain of the CTI software (cough...like a VoiceXML browser), all bets are off.

So if you haven't already noticed, let me be direct: all this complexity can be frustrating and may also cost you some extra effort explaining to internal and external business partners why you can't provide feature X or Y when you transfer the call, or why it'll take a few months of development effort to be able to do so.

The limitations imposed by ISDN, including its great abundance of disparate and incompatible "standards," give me and my IVR crew pause to reconsider ISDN completely. Fortunately, it seems that we are not alone, and there is hope.

How VoIP and SIP May Help

Although the IETF's Session Initiation Protocol (SIP) did not start out to solve these call center call control problems, it turns out that the signaling protocol, designed to set up and tear down voice over IP (VoIP) calls, may provide us some relief. It is focused on many of these exact issues, like transfer and session information, and some others not yet mentioned, like redundancy and ease of provisioning.

With SIP, as with VoiceXML, folks are taking what works in the Web paradigm and extending it into the telephony space. For example, SIP allows you to pass a block of call information in the headers of various requests, not unlike the session IDs passed in HTTP headers.
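To show how HTTP-like this is, the sketch below builds a schematic SIP REFER request (the transfer method defined in RFC 3515) that carries a session reference in a header, then parses the headers back out. It is not something a SIP stack would accept as-is: the addresses, tag and branch values are placeholders, and `X-Session-Ref` is a hypothetical extension header invented for this example.

```python
def build_refer(target_uri, refer_to_uri, call_id, session_ref):
    """Format a schematic REFER request with a session reference header."""
    lines = [
        f"REFER {target_uri} SIP/2.0",
        "Via: SIP/2.0/UDP ivr.example.com;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        "From: <sip:ivr@ivr.example.com>;tag=1928301774",
        f"To: <{target_uri}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 REFER",
        "Contact: <sip:ivr@ivr.example.com>",
        f"Refer-To: <{refer_to_uri}>",       # where the call should go next
        f"X-Session-Ref: {session_ref}",     # hypothetical header, see above
        "Content-Length: 0",
        "",
        "",
    ]
    return "\r\n".join(lines)


def parse_headers(message):
    """Return (request_line, headers dict) from a SIP message's header section."""
    head = message.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return lines[0], headers


message = build_refer(
    "sip:gateway@pbx.example.com",
    "sip:agent@pbx.example.com",
    "a84b4c76e66710@ivr.example.com",
    "abc123",
)
request_line, headers = parse_headers(message)
print(request_line)
print(headers["Refer-To"], headers["X-Session-Ref"])
```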

Of course, SIP and VoIP are getting a lot of media hype, some of which would suggest we should already be migrating our entire call center to these protocols by now. To that I would reply that I've got a fully functioning call center IVR to support, and I can't just drop the technology I have for something new.

On the other hand, my PBX footprint is bound to increase as I add IVR trunks. If a fair percentage of the calls end up being serviced completely in the IVR, this is wasteful (read: expensive) -- and the problem is exacerbated when the IVR has to hairpin its transfers back to the PBX. So before things get out of hand, I've started looking for a less costly way to get calls from the PSTN into my IVR and then back out of it and over into the PBX.

That gives me another good reason to get away from ISDN -- but our otherwise-still-useful PBX doesn't support SIP. It supports H.323, but it costs $10,000 just to enable it, and then you have to buy port licenses, etc. -- typical legacy vendor stuff. At this point, we are dead-set against spending any money to expand this switch or to enable it for VoIP -- but we aren't quite ready to part with it.

The idea is to have one device accept calls from the PSTN via T1 trunks and then convert those bound for the IVR into VoIP for transport over Ethernet to the VoiceGenies (which speak SIP or H.323-based VoIP). Calls that need to be transferred wouldn't have to hairpin, thanks to the SIP REFER method.

There are a number of commercial VoIP gateways on the market, but we were low on budget money and didn't relish writing a business case for what amounted to an experiment. So we started looking around for an open-source SIP gateway that we could use to test our ideas. That's when we came across Asterisk, the open-source PBX. Asterisk provides a serious number of telephony connectivity options for a piece of freeware you can install on a Linux system in an afternoon.

OK, so from a geek perspective, Asterisk is pretty amazing, but what about using it with the IVR? Well, Asterisk let us exercise the VoiceGenie SIP stack to our satisfaction without having to sink a lot of money into a commercial VoIP gateway that we'd end up having to sell on eBay (Nasdaq: EBAY) if our experiment didn't work.

More importantly, we were able to build a proof-of-concept environment to demonstrate how we might transition to VoIP trunking for the IVR and still integrate, via the PBX, with the existing ACD and CTI software. We're not quite ready to run the call center on Asterisk, but it's a very useful tool (and we still use it for teleconferencing).

Conclusion

I was so excited about Asterisk that I purchased Digium FXO/FXS cards to play with at home. Watch out, you telemarketers!

Seriously, I can't overstate how refreshing it is to have a chance of solving telco interoperability issues -- without having to contact a vendor, without having to install a protocol analyzer on the circuit, and without having to wait (potentially months) for a vendor response. We have no plans to abandon anything so far -- except our reliance on the PBX and ISDN trunking. Linux has proven itself rock-solid, and without a doubt a suitable choice for the call center IVR. We're still looking at new and interesting ways to use the VoiceGenie platform. Some of these include outbound dialing (not telemarketing!) and call recording.

It would be nice to think about using VoIP trunking to transfer calls from a partner call center, but the costs and security concerns of linking my company with another via IP over frame relay decrease the likelihood of that happening. I suppose I'd like to see my LEC knocking down my door trying to get me to sign up for a fiber local loop that runs VoIP and SIP natively. Then I could ditch PRI ISDN and use this Public IP Telephone Network (PIPTN) to offer enhanced IVR services via tight call flow integration.

In the meantime, I'm going to be reaping the benefits of open source and open standards in the call center, and continue to look for opportunities to leverage these benefits in other facets of the environment.