The Web-CTI Revolution
Voice-over-IP solutions are typically marketed and sold under a variety of short-term premises, but the most significant potential for VoIP is the long-term capability for integrating voice services into your data services architecture. Simply put, once you have converted your telephony services into packetized network services, the voice services that used to operate outside the data network become just another distributed application, with the same potential for integration and development as any other data service.
Of course, there are several Computer-Telephony Integration (CTI) interfaces that are already in widespread use across a variety of different platforms (see sidebar), although most of the legacy APIs are not generally useful for modern networked applications, given that they tend to be platform specific. Over the last few years, vendors and advocates have been pitching SIP as a fresh solution to this space, since it already uses a control syntax that is ostensibly platform-neutral, and already provides raw access to the kind of functions that are most often needed by business applications. But while SIP does work well for many things, it also approaches the data-integration problem from the wrong direction—it requires integrating your data-centric applications into the voice network, while most of us want to bring the communications services into our data-oriented network application space.
This is where Web service interfaces to voice networks offer the most promise. Theoretically, wrapping the traditional communication services into well-defined XML messages which are then transferred across standardized SOAP protocols will allow organizations to bring telephony and messaging services directly to their applications, without requiring the applications to become peer devices on the telephony network. Better yet, Web services also have the potential to provide an abstract interface into the telephony service that is independent of the underlying telephony protocols (they can provide interfaces to devices on SIP and H.323 networks, and even PSTN networks), and applications only have to become peer devices of those networks if the application itself needs to be voice-enabled.
More to the point, a handful of IP-PBX vendors are already shipping Web service interfaces for their systems, so this is possible right now, at least in limited form. Unfortunately, most of what is available today is very rough first-generation stuff, and the industry as a whole is many years away from seamless interoperability between telephony systems and user applications via Web services interfaces. There is also some ongoing standards activity that is worth watching. IT folk should be aware of these efforts and use this information for planning purposes, but should not expect to be able to buy cross-system, standardized solutions today.
The Promise
On the surface, Web service interfaces to traditional telephony systems (referred to as “Web-CTI” in this article) don’t appear to be all that different from existing interfaces. All of them provide a collection of functions that computer-based applications can tap into, and the only real visible difference is in the transport mechanisms. For example, the platform-specific interfaces usually rely on local transports (such as plain old RS-232 serial connections), while network-oriented interfaces use network protocols to extend the interface across a data network, and from a cursory examination the same appears to be true for Web-CTI interfaces too.
However, the real difference with Web-CTI interfaces is in their additional layering. Whereas the traditional interfaces merely extend the telephony API outwards (but still require the data-oriented applications to conform to that interface), Web-CTI interfaces provide an abstract service-control layer that is separate from the telephony layer. Moreover, the application interface looks and feels like any other application-oriented interface, and does not require the connected applications to become telephony devices. Executing a telephony task is the same as any other kind of task: you can initiate a phone call just as easily as you can issue a database lookup (or anything else that is available through a parallel Web service), and you can do so without having to become a peer device on the telephony network.
This concept is illustrated in Figure 1. In the Web-CTI model, a simple WSDL interface exposes a variety of telephony and communication functions to the application plane, thereby allowing the applications to make use of whatever communication services are needed. Meanwhile, the back-end systems do whatever is needed to make the task actually succeed, whether that be placing a call through one of the available telecommunications interfaces, or communicating with an IVR system through a serial line. In other words, the Web-CTI model provides an abstract interface for applications to use which is entirely separate from the telephony and communications infrastructure. This is a significant difference from the traditional interfaces, which simply extend the telephony infrastructure outwards without any separation.
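The separation described above can be sketched in a few lines of Python. This is only an illustration of the layering idea, not any vendor's interface: the class names, the make_call function, and the naive routing rule are all assumptions made up for the example, and the back ends are fakes that return a string instead of placing a call.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from itertools import count


class TelephonyBackend(ABC):
    """Hypothetical back end: anything that can actually place a call."""

    @abstractmethod
    def place_call(self, caller: str, callee: str) -> str:
        """Start a call and return a backend-specific call reference."""


class FakeSipBackend(TelephonyBackend):
    def place_call(self, caller, callee):
        return f"sip:{caller}->{callee}"


class FakePstnBackend(TelephonyBackend):
    def place_call(self, caller, callee):
        return f"pstn:{caller}->{callee}"


@dataclass
class CallControlService:
    """The abstract service layer that an application would see."""
    backends: dict
    calls: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def make_call(self, caller: str, callee: str) -> str:
        # Deliberately naive routing rule (an assumption for this sketch):
        # numbers starting with "+" go to the PSTN, everything else to SIP.
        network = "pstn" if callee.startswith("+") else "sip"
        reference = self.backends[network].place_call(caller, callee)
        call_id = f"call-{next(self._ids)}"
        self.calls[call_id] = reference
        return call_id
```

The application only ever calls make_call and receives an opaque call identifier; it never learns which network carried the call, and it never becomes a device on that network.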
But while this separation makes integrating control functions into data-oriented applications simpler, it’s also important to recognize that it imposes a wall between the two worlds as well. In particular, applications do not have access to the voice media through the Web-CTI interface. While they can manipulate calls all day long, they cannot actually participate in the call through this interface. If you need to have your users and/or systems participate in a voice call, you will still have to bring them to the telephony network (whether that be SIP/RTP, a POTS line, or whatever).
It’s also important to recognize that this area is in its early stages, and the industry as a whole is a long way from seeing this kind of promise in a widely-available, standardized form. In particular, different kinds of functionality and implementation models are still being fleshed out, and several more years of work will be needed before the industry can coalesce around a set of practical and functionally-delineated standards.
On the one hand, different kinds of users are bringing different kinds of demands to the table, and vendors are having to address those needs differently. For example, some organizations will be most interested in call-control features for customer-relationship applications, while other organizations will want to integrate presence and instant messaging functions into their corporate applications. Meanwhile, developers of third-party products will likely want to have low-level access to telephony functions as well as media services (such as are usually needed for third-party voicemail and IVR systems), while desktop users may just want to integrate their contact management application into the overall telephony service or simply be able to manipulate basic features like presence and call-forwarding through some kind of high-level interface.
On the other side of the coin, vendors and the standards bodies are also pursuing their own functionality targets. For example, most of the first-wave products from the IP PBX vendors are focusing on high-level call-control functions that are abstracted from the underlying technology, with the intention of expanding into related technologies over time. Meanwhile, the current crop of standards is generally focused on particular segments of the telecom industry, and it’s only through serendipity if those standards also provide the kind of functionality that’s needed by broader markets.
All told, it is highly unlikely that a single monolithic standard will emerge anytime soon that addresses all of the desired functions, especially one that has a universally-applicable level of granularity. Instead, multiple specifications are likely to evolve that each provide targeted functionality, with consolidation around a handful of standards happening three or four years from now. Simply put, things are going to get a lot more complicated before they get simpler.
The Players
About a dozen IP PBX vendors are currently shipping Web service interfaces to their systems, but most of those interfaces are aimed at administrative tasks, such as managing the users and phones attached to those systems, or configuring the PBX itself. We could only find three products that are capable of performing rudimentary call-management tasks through a Web services interface to their IP PBX today: Avaya’s Application Enablement Services, Sphere Communications’ Sphericall, and Siemens’ HiPath 8000. All of the other vendors we spoke to were unable to meet our minimum requirements, or were unwilling to discuss their implementation in detail.
For example, Cisco’s line of IP PBX systems does not yet have the ability to manage calls through a general Web-CTI interface, although they do make use of Web services for some configuration and administrative tasks. It is theoretically possible to use some of these interfaces to emulate a phone device in software and achieve some rudimentary integration, but this is not documented and probably would not provide sufficient functionality. Furthermore, Cisco representatives that we spoke to said that their short-term strategy was to continue consolidating their various acquisitions around common local interfaces, while relying on third-party vendors like Metreos to provide additional development tools and services. But while Metreos does indeed have a compelling CTI development platform, they do not have a suitable Web-CTI interface as of yet. Meanwhile, BlueNote Networks says that they are developing a Web-CTI interface to their IP PBX line, but we were not allowed to examine the interface or its documentation.
Curiously, while vendors have been cautious with implementation, there has been some significant standards development in this area (usually it’s the other way around). In particular, the ECMA Web services interface for CSTA and the Parlay Web services interface for OSA are both relatively complete, and both of them have been available for a couple of years now. Unfortunately, standards only matter when they are implemented, and we could not find any IP PBX vendors that had fully embraced these efforts.
The ECMA Standards
The most noteworthy of the existing Web-CTI standards is the collection of specifications from the European Computer Manufacturers Association (ECMA). The ECMA specifications are focused on call-management functions, and are already widely implemented in private exchange gear typically found on enterprise networks (this includes Avaya and Siemens IP PBXs, which both use a subset of the ECMA collection).
At the heart of the relevant ECMA standards is CSTA (Computer Supported Telecommunications Applications) as defined by ECMA-269, which essentially describes a generalized API for telephony applications to use when communicating with other services and devices. ECMA-269 defines over 130 functions, ranging from basic call-control tasks to operational features such as putting a device into a Do-Not-Disturb condition, and also describes ASN.1 encoding rules for those functions. However, these specifications are heavily focused on call- and device-management, and are missing many important non-telephony functions, such as presence and instant messaging.
The CSTA specification was subsequently supplemented by ECMA-323, which defines XML encoding rules as an alternative to the ASN.1 encoding rules, and also provides examples for use with different SOAP bindings. ECMA-323 was further supplemented by ECMA-348, which defines a standard WSDL definition for the XML encoding, and provides examples for use with SOAP/HTTP.
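To give a feel for what an XML-encoded CSTA request wrapped in a SOAP envelope might look like, here is a minimal Python sketch that builds one using only the standard library. The MakeCall, callingDevice, and calledDirectoryNumber element names and the CSTA namespace URI are assumptions based on commonly published CSTA XML examples, and should be checked against the edition of ECMA-323 actually in use. The build_make_call helper is hypothetical, and the sketch only constructs the message; it does not send it anywhere.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumed CSTA XML namespace; verify against your edition of ECMA-323.
CSTA_NS = "http://www.ecma-international.org/standards/ecma-323/csta/ed3"


def build_make_call(calling_device: str, called_number: str) -> bytes:
    """Return a SOAP envelope (as UTF-8 bytes) carrying a CSTA-style MakeCall."""
    ET.register_namespace("soap", SOAP_NS)
    ET.register_namespace("csta", CSTA_NS)

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")

    # The request itself: who is calling, and which number to dial.
    request = ET.SubElement(body, f"{{{CSTA_NS}}}MakeCall")
    ET.SubElement(request, f"{{{CSTA_NS}}}callingDevice").text = calling_device
    ET.SubElement(request, f"{{{CSTA_NS}}}calledDirectoryNumber").text = called_number

    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_make_call("1001", "1002").decode("utf-8"))
```

In a real deployment the message layout would normally come from the vendor's WSDL or schema files rather than being assembled by hand.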
We do not know of any vendors that implement ECMA-348. Avaya and Siemens both implement portions of ECMA-269 and ECMA-323 (usually implemented as XML-over-TCP), but that’s as close as we’ve seen. Furthermore, many of the vendors we spoke to have expressed the opinion that the ECMA standards are too low-level and complex for wide-scale adoption outside the vendor community. Given the broad adoption of CSTA, however, we feel it is highly probable that these standards will continue to be adopted in some form.
The Parlay Standards
The other significant set of standards is the Parlay collection of specifications, as published by the vendor consortium of the same name. The Parlay standards are frequently used to provide application interfaces to carrier networks, and are therefore somewhat common in carrier-class systems and the associated application-development platforms, but they are not at all common on enterprise telephony gear as of this moment. However, IT organizations that want to integrate public-network telephony devices and services into their CTI applications as peers of local devices will likely need to work with Parlay at some point. Furthermore, enterprise application platform vendors like BEA, IBM and Oracle do support Parlay in their “carrier” product lines, and it’s likely that one or more of those tools will bring some of that functionality into the corporate space, dragging the Parlay interfaces along.
The core Parlay specifications were developed by the Parlay consortium in conjunction with the European Telecommunications Standards Institute (ETSI) and the 3rd Generation Partnership Project (3GPP is the oversight body for the 3G digital-cellular technology). These interfaces form the API layer of the 3GPP Open Service Access (OSA) architecture, and are generally referred to as the Parlay/OSA APIs. These APIs are intended to be portable across multiple development environments, and the specification describes their use with CORBA and JAIN, and also includes a WSDL definition. Separately, there is also a subset specification called Parlay/X which describes a lighter and higher-level set of APIs that is optimized for use with Web service interfaces in particular. Whereas Parlay/OSA provides asynchronous access to numerous low-level functions, Parlay/X provides synchronous access to a much smaller number of functions.
However, the Parlay/X dictionary maps quite well to the kinds of functions that corporate CTI developers might want to use, with high-level functions for call-control, conferencing, presence, messaging, address book management, and so forth (there are also functions that are more suitable for traditional carrier networks too, such as functions to manage ring-tones and billing information). This makes Parlay/X an interesting specification, even if it is not widely used in corporate telephony environments yet.
One potential problem with the Parlay model is that it is heavily layered. For example, in those cases where OSA provides the network-native application interface (as is the case with 3G cellular networks), Parlay/OSA simply exists as a programmable service, but in other cases the Parlay support has to be provided by a gateway of some kind. Since Parlay/X represents a subset of the Parlay/OSA APIs, it is usually implemented as a gateway as well. This means that the full Parlay stack often requires two gateways: one between Parlay/X and Parlay/OSA, and another between Parlay/OSA and the native telephony network.
Worse though is that there is no real support for Parlay/X in the IP PBX market—we do not know of any vendors who offer it at the current time. However, if application vendors begin pushing into this space, or if enterprise IT developers start clamoring to expand their applications into cellular networks, there is some likelihood that the market will adapt to those demands.
Sphericall Web Services
The most comprehensive Web-CTI interface available today is the Web Services SDK for Sphere Communications’ Sphericall IP PBX offering. The Web services component exposes the Sphericall IP PBX services through a SOAP-compliant WSDL interface, and provides a lightweight, synchronous messaging interface that is optimized for end-user development. It provides functions for third-party call-control, conferencing, call-recording, presence and status, instant messaging, number lookup, and call-history lookups. Sphericall Web Services also provides some administrative and event notification functions, which further rounds out the interface. Multiple Sphericall PBX systems can be installed in a clustering environment, and third-party devices can also be connected through the TAPI and SMDI local interfaces.
The call-handling, conferencing, call-recording, and IM/presence functions appear to be more than suitable for most purposes. Moreover, Sphericall is the only Web services offering that has a sufficiently comprehensive interface at this time, and none of the other implementations that we looked at were as broadly usable.
Sphere has also implemented the most complete session management model, with support for asynchronous bi-directional communications over the SOAP channel. Sphere uses semi-permanent session identifiers to maintain long-term state across transactions, coupled with a client-side “fetchEvents” function. In this model, the client opens a connection, asks for any new events that are associated with the session, and then waits (up to a timeout) for event messages to arrive. If no notifications are received within a specified interval, the client will eventually time out, and then reconnect with the server to restart the process. Cumulatively, this provides for bi-directional asynchronous session-level event-messaging over SOAP, which none of the other implementations offer.
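The ask-wait-reconnect pattern described above is essentially long polling, and can be sketched in Python as follows. This is not Sphere’s API: the FakeEventServer class, its method names, and the poll helper are hypothetical stand-ins that run in a single process, with an in-memory queue playing the role of the SOAP channel.

```python
import queue
import threading


class FakeEventServer:
    """Hypothetical stand-in for the server side: one event queue per session."""

    def __init__(self):
        self._sessions = {}

    def open_session(self, session_id):
        self._sessions[session_id] = queue.Queue()

    def publish(self, session_id, event):
        self._sessions[session_id].put(event)

    def fetch_events(self, session_id, timeout):
        """Block until an event arrives or the timeout expires."""
        pending = self._sessions[session_id]
        try:
            events = [pending.get(timeout=timeout)]
        except queue.Empty:
            return []  # timed out; the client is expected to ask again
        # Drain anything else that is already queued for this session.
        while True:
            try:
                events.append(pending.get_nowait())
            except queue.Empty:
                return events


def poll(server, session_id, handler, stop, timeout=0.2):
    """Client loop: ask for events, wait, handle them, then ask again."""
    while not stop.is_set():
        for event in server.fetch_events(session_id, timeout):
            handler(event)


if __name__ == "__main__":
    server = FakeEventServer()
    server.open_session("s1")
    stop = threading.Event()
    worker = threading.Thread(target=poll, args=(server, "s1", print, stop))
    worker.start()
    server.publish("s1", "ringing")
    server.publish("s1", "connected")
    stop.set()
    worker.join()
```

The session identifier is what ties successive requests together, so an event that is published between two polls stays queued until the client’s next request picks it up.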
There are a couple of other interesting features in the Sphericall Web services implementation worth noting. For one, presence and status information can be set through the Web services interface, meaning that you can have your application change the user’s call-status automatically (such as changing an operator’s status to reflect the fact that they are talking to a customer whenever they release a call from an incoming queue). Also, the Sphericall IP PBX has a feature called “forwarding profiles” which allows for user-defined call-routing, and that feature is also partially exposed through the Web services interface. Finally, Sphere also provides a simulation server which can be used for off-line development and testing, which will probably prove to be extremely useful for most in-house developers. Overall, the Sphericall Web services interface is pretty comprehensive, and is by far the most complete offering available today.
Avaya Application Enablement Services
Avaya’s Web service interfaces are part of their Application Enablement Services (AES) offering, which is an add-on gateway to Avaya’s IP PBX products. Actually, Avaya has two different interfaces in AES that are of interest to us: there is a high-level first-generation Web services interface based on WSDL and SOAP, and there is also an XML-over-TCP interface based on ECMA-323.
AES also has a handful of classic interfaces (including JTAPI, TSAPI, and CSTA over ASN.1, among others), as well as Avaya’s own proprietary interfaces, all of which are implemented on the public side of the gateway for applications to tap into. On the back side, the gateway uses Avaya’s proprietary CLAN protocol to communicate with the Avaya IP PBX systems, which in turn implement the local signaling protocol(s) needed for the canonical telephony functions to work.
The WSDL/SOAP Web services interface has functions for managing user accounts and settings, functions for managing the system and devices, and functions for managing call-related activities. At the present time, the range of telephony-related interfaces is pretty small, and is limited to functions for creating a call, answering an incoming call, conferencing and transferring calls, and managing sessions. There are no functions for presence or instant messaging, voice-recording, call-history lookups, or much else.
However, the XML-over-TCP interface, which Avaya refers to as the Communications Manager API (CMAPI) XML SDK, is much more comprehensive. For example, the current XML SDK contains 238 CSTA-specific XSD files and 52 Avaya-specific XSD files, ranging from call-control and device-management features down to ancillary features like call-recording and playback. As with the WSDL/SOAP interface, there are no functions for presence or instant messaging in the XML SDK as of yet.
Siemens HiPath 8000
Siemens’ HiPath 8000 is billed as a carrier-grade, software-based IP PBX solution that is generally sold into very large networks. Siemens also has a line of add-on products, including the OpenScape presence and collaboration platform, the Xpressions unified messaging system, and the ProCenter call-center platform. Currently these layered products use CSTA or SIP to talk to the IP PBX, but Siemens says that their strategic plan is to eventually provide SOA interfaces that can support all of these kinds of products directly.
The current HiPath 8000 v2 software release (which just started shipping in April) is somewhat below that target objective, but is a good indication of their strategic direction. At the moment, the HiPath 8000 provides some basic administrative interfaces for device and user configuration (these were also present in the v1 release), and also has some rudimentary call management functions for call-setup and disconnect, call-history lookups, and address book management tasks.
However, Siemens says that all of the HiPath 8000 internal functions are already represented in XML, and they are continuing to productize the low-level interfaces into high-level WSDL. In particular, the 2.1 release due out this summer is likely to have advanced call-control functions, and may also include presence and messaging interfaces, although Siemens would not commit to product details or a release schedule.
Overall, the HiPath 8000 architecture seems to be well designed for scalability purposes, given that it already uses SIP and CSTA for most of its internal functionality. If Siemens is able to productize these functions into a usable WSDL/SOAP interface, they will have a very strong, standards-driven offering in this space.