July 04, 2008

Great intro to Speech Server!

 
 

Sent to you by Brandon Tyler via Google Reader:

 
 

via Search results for '"Speech Server"' by dszabo on 7/3/08

Office Communications Server 2007 is Microsoft's IP communication solution. It allows companies to leverage their network infrastructure for voice and video communication, instant messaging, audio/video calls and much more. You may ask what the benefit is of using the computer network as opposed to the internal telephony network, since you don't pay by the minute on either of them. However, on the computer network it is only a matter of software to integrate telephony, video and IM with desktop applications. By routing the IP packets through a server in the DMZ, users can call each other at no cost wherever they are. Presence is also important to mention: users can see each other's status (whether they are online, busy, away, in a meeting with their laptop, out of the office, etc.). The presence icon is integrated into every piece of the Office System (SharePoint and the Office client products), and it tells the caller in advance whether the other party is likely to answer the call or whether it's not the right time to call. Online users will stop using mobiles; there's a whole change in the communication culture. This article outlines OCS 2007 Speech Server, which is an additional server role for Office Communications Server.

OCS 2007 Speech Server

Every software client/device is a UC endpoint in OCS - whether it's an IP phone, Office Communicator (the client of OCS, like Messenger), a video camera in an A/V meeting room, etc. Imagine that you have not only these endpoints, but also non-human endpoints connected to your OCS/telephony infrastructure. These endpoints are software-driven and can communicate with callers on the phone. An example of such an endpoint is Exchange Server Voice Access, where you can get Exchange to read out your emails and do other clever things (say "Clear my calendar for today", which sends a cancellation to every attendee of your meetings for today). You can write these applications using managed code, and they can be deployed and enabled in your OCS infrastructure. Their phone numbers can even be enabled for callers outside of your organization (this is how Exchange Server Voice Access works at Microsoft).

How to write programs for Speech Server?

There are 3 important areas in a voice-enabled system:

  1. Speech: the quality of the speech engine
  2. Voice recognition: the quality of the voice recognition engine
  3. Programmability: how easy it is to develop voice-enabled applications on this platform

I'll start with programmability and let you judge the other two. There are two programming models that you can use. In the web programming model, the voice application is hosted in IIS as a web page and the dialog is represented by a set of postbacks. The other programming model uses Windows Workflow Foundation to design the conversation's flow. I'll focus on the latter today and skip the web-based one. For the workflow programming model, you need Visual Studio 2005 SP1, IIS, MSMQ and Speech Server installed on your PC (see the Prerequisites section).

Fire up Visual Studio: once you have installed the development components, there's a project template called "Voice Response Workflow Application". You can start dragging and dropping workflow activities into your workflow designer right away to describe the conversation's flow. There are many workflow activities that you can use: the Statement activity, the QuestionAnswer activity, the GetAndConfirm activity (this one won't step to the next activity until the caller has confirmed his/her answer), Menu, etc. When your workflow asks something, you define the question for the activity, like "Can I have your employee ID please?", and then you need to define what format you expect the answer in - this definition is called the "Grammar". The grammar is a pre-defined pattern that defines the different ways the answer can be said. For example, "yes, it's 1234", or "my employee id is 1234", or "1234", or "it is 1234" and so on. We define a placeholder in this pattern for the number because that's the only thing that we are interested in, and we define the different ways the answer can be phrased around it. There's a designer that helps you create the grammar.

The grammar has an output variable which you can read once the caller has answered your question. Here, you need to write code: when the caller has answered the question, you get the employee ID in a variable that you can convert to a numeric value, and you can act on this number - for example, you can look up this employee ID in Active Directory, etc.
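To make the idea of a pattern with a placeholder more concrete, here is a minimal sketch in Python. It is not the Speech Server grammar format or API: the regular expressions, the `_PATTERNS` list and the `extract_employee_id` function are all hypothetical names made up for illustration, and a real grammar would be built in the Visual Studio grammar designer. The sketch only shows the principle: several ways of saying the answer, one placeholder for the number, and a conversion of that placeholder to a numeric value.

```python
import re
from typing import Optional

# Hypothetical carrier phrases around the placeholder (the named group "id").
# These mirror the example answers above; they are not a Speech Server grammar.
_PATTERNS = [
    r"(?:yes,? )?it(?:'s| is) (?P<id>\d+)",   # "yes, it's 1234", "it is 1234"
    r"my employee id is (?P<id>\d+)",         # "my employee id is 1234"
    r"(?P<id>\d+)",                           # "1234"
]


def extract_employee_id(utterance: str) -> Optional[int]:
    """Return the employee ID said in the utterance, or None if nothing matches."""
    text = utterance.strip().lower().rstrip(".!")
    for pattern in _PATTERNS:
        match = re.fullmatch(pattern, text)
        if match:
            # Only the placeholder matters; the words around it are discarded.
            return int(match.group("id"))
    return None
```

Calling `extract_employee_id("yes, it's 1234")` gives the integer 1234, which is the point where application code (for example, a directory lookup) would take over. An answer that fits none of the patterns gives `None`, which roughly corresponds to the case where the workflow would have to ask again.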

I've prepared a small application just to show how this works. It calls you up, asks you about the number of computers and persons in your household, and submits the answers to a database. I could have written a more intelligent application, but this will be enough to understand how it works.

Prerequisites

To be able to play with the product, you need to install all of its components in your development environment.

In order to install the Speech Recognition Server component, you need to enable a few features if you don't have them already enabled. You need to re-start the installer every time you have enabled a feature - it won't refresh automatically. To save some time, copy my features list (Vista):

Enable for OCS VR

You also need Visual Studio 2005 with SP1 (VS 2005 RTM is a no-go), and the Visual Studio 2005 extensions for .NET Framework 3.0 (Windows Workflow Foundation) package in order to be able to install the Development Tools component of the product. Installing the Visual Studio 2005 Service Pack 1 Update for Windows Vista is also recommended if you run Visual Studio 2005 on Vista.

After the product is set up, at least one language pack needs to be installed for the Windows services to start (you can find the packs on the installation DVD or you can download them from the Internet). I've installed the English/UK pack. US and Australian English packs are also available on the DVD, along with 11 additional languages. The full list is:

  • Chinese (People's Republic of China)
  • Chinese (Taiwan)
  • English (Australia)
  • English (United Kingdom)
  • English (United States)
  • French (Canada)
  • French (France)
  • German (Germany)
  • Italian (Italy)
  • Japanese (Japan)
  • Korean (Korea)
  • Portuguese (Brazil)
  • Spanish (Spain)
  • Spanish (United States)

What is a Grammar? What is a Rule?

The "grammar" is a collection of "rules". A rule is a mini workflow where you describe the structure of the expected sentence. In my case, I expect the answer "I have X computers at home" or something similar from the end user. I designed my rule to accept more sophisticated answers as well, like "I have only 2 computers at my household" or "I have got no computers". It's up to you how fine-grained and resilient you make your rules. The result of the rule is a value, which in my case is the number of computers. The following is a screenshot of my rule from the Visual Studio Rule Editor:

image

The green shapes are called Lists, the white ones are Phrases. Only one of the Phrases inside a List shape applies. The pink shapes are Rule references (RuleRefs); they are used to reference other rules. The two RuleRefs in my case are references to numeric rules: they recognize the number 0 and the numbers 1 to 999, and convert the recognized words to a numeric value. Looking into those rules, they have several lines where they combine the recognized words and calculate the numeric result. The result is written to the $$._value member variable, which is then copied to the $._value member variable by a Script Tag (blue). The value in $._value can then be referenced in the Voice workflow and used for further tasks (in my case, confirming the number of computers to the end user). After compiling the grammar, the outcome is an XML file with a .grxml extension.
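The numeric rules themselves are not reproduced in this article, so the following is only a rough Python sketch of the kind of work they do: combining recognized English number words into one numeric value for 0 and for 1 to 999. The function names `words_to_number` and `computers_in_answer`, the word tables, and the treatment of "no" as zero are assumptions made for illustration; they are not the rules that ship with Speech Server.

```python
from typing import List, Optional

# Hypothetical word tables for English; a real grammar ships its own numeric rules.
_UNITS = {w: i for i, w in enumerate(
    "one two three four five six seven eight nine ten eleven twelve thirteen "
    "fourteen fifteen sixteen seventeen eighteen nineteen".split(), start=1)}
_TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}


def words_to_number(words: List[str]) -> Optional[int]:
    """Combine number words into a value in 0..999, or None if they do not fit."""
    if words == ["zero"]:
        return 0
    rest = [w for w in words if w != "and"]
    total = 0
    # Optional hundreds part: "<one..nine> hundred".
    if len(rest) >= 2 and rest[1] == "hundred" and _UNITS.get(rest[0], 99) <= 9:
        total = _UNITS[rest[0]] * 100
        rest = rest[2:]
    # Either "<tens> [<one..nine>]" or a single word from one to nineteen.
    if rest and rest[0] in _TENS:
        total += _TENS[rest[0]]
        rest = rest[1:]
        if rest and _UNITS.get(rest[0], 99) <= 9:
            total += _UNITS[rest[0]]
            rest = rest[1:]
    elif rest and rest[0] in _UNITS:
        total += _UNITS[rest[0]]
        rest = rest[1:]
    # Leftover words or an empty match mean the words were not a valid number.
    if rest or total == 0:
        return None
    return total


def computers_in_answer(utterance: str) -> Optional[int]:
    """Pick the number out of a sentence such as 'I have two computers at home'."""
    tokens = utterance.lower().replace("-", " ").split()
    if "no" in tokens:  # assumption: "I have got no computers" means zero
        return 0
    number_words = [t for t in tokens
                    if t in _UNITS or t in _TENS or t in ("zero", "hundred")]
    return words_to_number(number_words)
```

The return value of `computers_in_answer` plays the role that $._value plays in the rule: it is the single number the rest of the application works with, while the carrier words ("I have", "at home") are ignored.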

What is the Voice Workflow?

After designing the rule, let's work on the workflow part, which is the one that controls the main flow. The Rule that I've described above is evaluated in the HowManyComputers QuestionAnswer activity.

image

How can I start?

I recommend installing the developer samples. Once they are installed, I suggest opening the HelloWorld project from the C:\Program Files\Microsoft Office Communications Server 2007 Speech Server\Samples\Workflow\HelloWorld folder and playing with it.

You can test your application by pressing the F5 key - you'll get the Voice Response Debugging window, where you need to click on the Call button:

image

When the workflow starts and the server asks you a question, you are redirected to the second tab and need to click on the Start Recording button. Say your answer and Speech Server will recognize it.

image

When your answer is recognized, click on the Submit button to post back your input to your workflow.

Questions?

Don't hesitate to ask!


 
 



July 02, 2008

As I’ve written about before, we’re fans of the SIPit interoperability events that are sponsored by the SIP Forum, as they provide a great way to test how well different vendors’ SIP implementations interoperate. We recently attended SIPit 22 at the University of New Hampshire and the feedback was extremely helpful in our continual effort to improve our products.

Anyway, SIPit 23 was recently announced for October 13-17 in Lannion, France. The event is hosted by the ETSI Interopolis Service and France Telecom-Orange Labs. ETSI has a website for the SIPit 23 event that is full of information about the event.

I don’t honestly know yet whether we’ll be attending, but I do encourage vendors to seriously take a look at attending. It’s a great place to learn how well your SIP implementation plays nice with others.



July 01, 2008

I found this in my email inbox this morning.

Dear Marshall Harrison,

Congratulations! We are pleased to present you with the 2008 Microsoft® MVP Award! The MVP Award is our way to say thank you for promoting the spirit of community and improving people’s lives and the industry’s success every day. We appreciate your extraordinary efforts in Communications Server technical communities during the past year....

 

I'm sure that GotSpeech and the impact it has on the Speech Server community had a lot to do with me being rewarded again (3rd year now). GS has become a very active and thriving community and I am always thrilled to see new members come on board. I am really excited when I see members that started out with lots of questions progress to the point that they are now answering other people's questions. That is what it is all about.

So, thanks to each of you for the time you spend on GotSpeech and the contributions that you make to the site and the community.

 

You can find more info on the Microsoft MVP program by visiting https://mvp.support.microsoft.com/.



June 27, 2008

I originally interviewed Mike Wehrs, Nuance’s vice president of evangelism and industry affairs, for an FYI about vSearch in the July/Aug issue of Speech Technology Magazine. Unfortunately, the time crunch was such that we weren’t able to slot the quotes into the story. (Editor: You really need to meet your deadlines. Ryan: I’ll work on that later.) Mike gave [...]



June 26, 2008

I've been noticing that the amount of activity on the forums has been increasing with lots of new posters. That is encouraging as it means the new people are trying out Speech Server. Interest in Speech Server is building and that is good for all of us.

One of the things I do quite often is look at the stats towards the bottom of the Forums page. Yesterday when I looked, this is what I saw -

GotSpeech Stats

5 new threads, 50 new posts and 13 new users in a 24 hour period. That is the sign of an active and thriving community. My heartfelt thanks to everyone who contributes to this community.

It just goes to prove what I've been telling everyone - GotSpeech is the place to go for information on OCS 2007 Speech Server.



June 24, 2008

Last week we announced LogSearch Beta for our Evolution developer portal. As part of the beta process we will be improving it with a number of new features based on feedback from our customers. 

The first round of changes adds several new features including: message categories, filtering, easy reversing and a new black color theme.

The first thing you will now notice when you log in is a toolbar at the top with a bunch of checkboxes in it:

These boxes don’t do much until you run a search, but once you search for something you will start to see some of the power they provide.

To start with, let’s run a search for a session and see what we get:

As you can see, you get the full set of log messages here. What is new is that the log messages are now colorized, making it easier to pick out the important messages in the log stream.

The first thing you might want to try is clicking on one of the filters to see what happens. For example, if you click “Playback” you will filter out everything but the browser playback messages:

The next thing you might want to try is clicking on the “User Filter” button to narrow the results down to only user-facing log messages:

Another new feature is the button to easily reverse the results into chronological order instead of most recent first. This can make it a lot easier to read things like session logs, where you are trying to understand the flow of what happened. Do be aware, however, that if your search spans a large amount of time it may take longer to display the initial search results in this mode, since we need to search over the entire time block and gather all the results before we display them in the UI:

Lastly, one other thing you may like to try is switching to the “black” theme we now include. To enable this, open your preferences dialog and select VoxeoBlack on the theme menu on the General tab:

You will then get your UI rendered in proper old-school programmer colors:

Hopefully these improvements will help make the tool more useful. As always, we are looking for ways we can improve our offerings, so please leave us feedback either in the comments below or in our LogSearch feedback forum on the Evolution developer site.



June 23, 2008

A quick note to alert you about Voice Secure, our latest partnership integration, this one with Voice Verified.

Below is a blurb from our solution landing page:

Angel.com combines award winning IVR and call center technologies with voice authentication solutions to provide a way for callers to quickly, conveniently and securely access personal information over the phone. By simply repeating a few random digits in the IVR, caller identity is automatically verified through the composition of their voice.

  • Secure – No more revealing private information over the phone, risking having your personal information fall into the wrong hands.
  • Convenient – No more forgotten passwords or PINs.
  • Saves Time – No more time consuming process of routing calls to agents for verification of identity or password resets.

Bank account balances. Medical test results. Order status. Customers call your IVR for many reasons, but what they all have in common is they want their information quickly and they want it to be secure. Current security measures include passwords, PINs, challenge questions or personal information to confirm identity, creating a lengthy process for a caller to get the information they need over the phone. Not to mention the burden this places on your call center agents simply to identify the caller.

With simple and accurate voice verification through your Angel.com IVR or call center solution, your callers and agents are immediately able to focus on call resolution rather than identity confirmation.



Last updated: July 05, 2008 02:01 AM All times are UTC.
Powered by: Planet

Speech Connection and Logos are Trademarks of Nu Echo Inc. Copyright 2006.