One of the key metrics of a satisfying call experience is the end-to-end one-way delay (sometimes referred to as “audio latency”), which is the time elapsed between the customer saying something and the agent hearing it. Not being able to interact smoothly with an agent, an interactive voice response (IVR), or an intelligent virtual agent (IVA) makes for a poor customer experience, and we strive to minimize that delay. According to the ITU-T G.114 recommendation, the user’s perception of call quality deteriorates as the one-way delay exceeds 200 milliseconds. Beyond 350 milliseconds, holding a conversation is difficult and the delay becomes very annoying. Thus the target for most IP telephony systems is to keep the one-way delay under 200 milliseconds.
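The G.114 bands above can be sketched as a small helper; the function name and band labels are our own, but the thresholds come straight from the recommendation as cited:

```python
def rate_one_way_delay(delay_ms: float) -> str:
    """Classify a one-way audio delay against the G.114 bands.

    Bands (labels are illustrative; thresholds per ITU-T G.114):
      <= 200 ms -> "acceptable"  (the usual IP-telephony target)
      <= 350 ms -> "degraded"    (perceived quality deteriorates)
      >  350 ms -> "annoying"    (conversation becomes difficult)
    """
    if delay_ms <= 200:
        return "acceptable"
    if delay_ms <= 350:
        return "degraded"
    return "annoying"
```

For example, `rate_one_way_delay(150)` returns `"acceptable"`, while `rate_one_way_delay(400)` returns `"annoying"`.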
In December 2018, Amazon introduced a new feature in Amazon Connect that allows an external application to consume the caller’s audio stream via Amazon Kinesis Video Streams. This feature opens up new opportunities, such as running analytics on what the caller is saying during the call. However, we wanted to know whether it could also be used for real-time processing, such as plugging in an alternative speech recognition engine or an external IVR. That is why we designed this little experiment.