Introduction to Microsoft Speech Server

Posted February 5, 2004

Chat Date: November 19, 2003

Please note: Portions of this transcript have been edited for clarity

Introduction

Moderator: John_P (Microsoft)
Greetings, everybody, and welcome to today’s Chat: Introduction to Microsoft Speech Server.

Moderator: John_P (Microsoft)
I'll now ask the hosts to introduce themselves...

Host: Chris (Microsoft)
Hi, I'm Chris, Technical Lead in development on the Microsoft Speech Server, focusing primarily on the Telephony Application Services.

Host: Ian (Microsoft)
Hi, my name is Ian and I'm a developer on the telephony application server team.

Host: Kundana (Microsoft)
Hi, I'm Kundana and I work as a software test engineer on the Telephony Application Services team.

Host: Salva (Microsoft)
Hi, I'm Salvador Neto. I'm a Program Manager in MSS, working specifically with SES.

Moderator: John_P (Microsoft)
And I'm John Perry, MVP Lead for Windows SDK, among others.

Moderator: John_P (Microsoft)
Let's get started! Fire away with your questions for our hosts.

Start of Chat.

Host: Kundana (Microsoft)
Microsoft Speech Application Platform enables developers to use Visual Studio .NET 2003, ASP.NET and Windows Server 2003 to create and deploy distributed speech-enabled Web applications.

Host: Chris (Microsoft)
Q: Where can I get an evaluation copy of MS Speech Server?
A: The evaluation copy of the Microsoft Speech Server is not generally available. The beta2 will be available soon. In the meantime you can download the SDK beta 3, which includes a Telephony Simulator application as you may well know. The download location for the SDK is https://www.microsoft.com/speech.

Host: Ian (Microsoft)
Q: We are having a tough time getting hang ups to be detected properly between a Nortel switch and a Dialogic 4port analog card.
A: When you say that the hangup is not detected, what do you mean exactly? Does the SALT app receive the SMEX messages?

Host: Kundana (Microsoft)
Q: what is speech server all about
A: Microsoft Speech Application Platform enables developers to use Visual Studio .NET 2003, ASP.NET and Windows Server 2003 to create and deploy distributed speech-enabled Web applications. The Speech Platform uses Speech Application Language Tags (SALT) and runs on the Microsoft Windows Server 2003 family of operating systems.

Host: Chris (Microsoft)
Q: do I need an active x applet to "Listen" to the mss content on the web
A: This depends on the client. If you're running a desktop client (typically IE), you need the ActiveX control that ships with the SDK. For telephony, the interpreter doesn't need (or, for that matter, allow) ActiveX controls.

Host: Salva (Microsoft)
Tomgee - I am not clear what you are looking for. You use the same SDK. The pages run on the Pocket IE browser, and the speech processing can be done in MSS (in some cases on the PPC itself). Is there anything specific you want clarified?

Host: Salva (Microsoft)
Q: Could you say a little bit about how multimodal apps for pocket IE will be developed and run on MSS?

Host: Salva (Microsoft)
Q: for Salva_MS - when do you estimate that we will have this? (This is a requirement for us to deploy MSS.)
A: Only in our next version, I am afraid. I can't give you any date at this point.

Host: Salva (Microsoft)
Q: I'm using the Speech SDK, does it have the Speech Server built in?
A: In short, no. It does have a voice browser that is a very close approximation of what you will find on the server.

Host: Ian (Microsoft)
Q: reggie : I'm not even sure the dialogic card is receiving the hangup. If that is the case, then the TIM wouldn't get anything right? Therefore TAS certainly wouldn't get any SMEX message
A: Yes, if the card doesn't receive the hangup, the TIM certainly won't, nor the app. Unfortunately, no one here really has experience with anything below the TIM.

Host: Chris (Microsoft)
Q: I'm not Microsoft Speech Partner. Is there any way I can get an evaluation copy of Microsoft Speech Server?
A: You can apply to be a partner. There is a link on https://www.microsoft.com/speech. Realistically, though, I can tell you that we're overwhelmed with interest and will not likely be able to include everyone in the beta. If you have telephony expertise, or are interested primarily in multimodal deployment, you should get the beta2 when it becomes available for general download.

Host: Chris (Microsoft)
Q: I'm using the Speech SDK, does it have the Speech Server built in?
A: The SDK does not include the Speech Server per se. It has an emulator, which you can use through the Telephony Simulator.

Host: Salva (Microsoft)
Q: Are you using ScanSoft technology in MSS for both TTS and ASR?
A: In short, yes. ScanSoft's Speechify TTS engine ships as part of MSS. ScanSoft's OSR recognition engine can also be used with MSS, although it does not ship with it.

Host: Chris (Microsoft)
Q: I'd like to make a web form always listen, and if I say a field name, then a value in the dropdown list, it will fill it in. Do you have any suggestions? I'm trying to modify the tapandtalk sample app
A: This is more of an SDK question than an MSS question. In theory you could set your various timeouts (silence, in particular) to something really long. In practice, you probably don't want to leave the engine in a listening state longer than necessary, as this will consume CPU needlessly, and you're more prone to false recognitions.

Host: Ian (Microsoft)
Q: would it be possible to host asp.net in a process, use this hosted module to process speech enabled aspx pages and then push the resulting salt pages to an instance of MSS for processing?
A: What do you mean by "host asp.net" in a process?

Host: Ian (Microsoft)
Q: I am just looking for a way to turn an asp.net page into the corresponding salt
A: What do you mean by an asp.net page? Do you mean the aspx pages produced by the SDK? I'm still not clear on what you want to do.

Host: Salva (Microsoft)
Q: Does MS Speech SDK Beta3 include ScanSoft TTS?
A: No, not for Beta 3 or Beta 4, to my knowledge.

Host: Kundana (Microsoft)
Q: When would someone use the Speech Server versus whatever the SDK comes with?
A: The SDK is used for developing speech-enabled web applications, and it comes with a voice browser emulator for testing them. Speech Server has to be used for deploying these applications so that they can be accessed via a telephone (client).

Host: Chris (Microsoft)
Q: what's the situation with using mss and Intel HMP
A: The media-related processing is abstracted in a layer called the Telephony Interface Manager (TIM). TIMs are not authored by Microsoft, but rather by the vendor of the hardware, or somebody working with this vendor. In other words, only Intel can answer that question adequately.

Host: Salva (Microsoft)
Q: Salva: the SDK doesn't include ScanSoft TTS but MSS does or neither?
A: The SDK does not include ScanSoft TTS. MSS does include it.

Host: Chris (Microsoft)
Q: Is Beta 4 available or out?
A: The Beta 4 of the .NET Speech SDK will be available at the same time as the MSS Beta 2.

Moderator: John_P (Microsoft)
For those of you just joining us, the hour's chat topic is: Introduction to Microsoft Speech Server

Host: Ian (Microsoft)
Q: yes I am talking about the aspx produced by the SDK. I'm just wondering if it would be possible to gather input from a switch and pass that to MSS for recognition
A: Your web server side script can communicate with the switch and modify the SALT that gets produced. You would have to code this communication yourself. Is this what you mean?

Host: Salva (Microsoft)
Q: Slva_MS: Is Beta 4 available or out?
A: The SDK Beta 4 (and server Beta 2) are not out quite yet, but will be really soon.

Host: Ian (Microsoft)
Q: can I push prompt or reco requests at the lobby using soap and drive it manually
A: So you want to just use the SES by itself?? In theory, you could talk directly to the SES, but we don't make the SOAP protocol public. I'm still not sure why you would want to do this.

Host: Chris (Microsoft)
Q: What is the difference between Speech SDK and .NET Speech SDK ???
A: The Speech SDK is an older SDK, based on SAPI. This is a programming API for general speech applications, including dictation. The .NET Speech SDK is an authoring framework specifically targeting speech-enabled Web applications, and it uses Speech Application Language Tags (SALT) for markup. In principle you can author applications using the .NET Speech SDK and run them on any SALT-compliant interpreter, running against any speech recognition engine/technology.

Host: Kundana (Microsoft)
Q: Thank you Kundana_MS. I'm not interested in the telephony aspect at this point. Would I still need the Speech Server? Would I need it for "deploying" the TapandTalk sample app for instance?
A: Speech Engine Services (SES), which is a part of Microsoft Speech Server, is required for running the tapandtalk app on a multimodal client.

Moderator: John_P (Microsoft)
Q: rvmey : How much will Speech Server cost?
A: We really won't be able to answer that until closer to the release time...sorry.

Host: Chris (Microsoft)
Q: I assume IE is a multi modal client. I keep seeing that phrase multi-modal but I don't know what it means.
A: In this context we're referring to input modality, and clients like IE or Pocket PC have multiple input modalities such as speech, keyboard, mousing, etc. By contrast, a plain-old telephone is _not_ a multimodal client. By this definition IE is a multimodal client. However, when referring to multimodal in MSS, we're more often referring to the Pocket PC client. In an earlier question there was some confusion about the tap-n-talk sample requiring SES. If you're on a PPC client doing remote recognition, you will definitely need SES. If you're running on IE and doing local recognition, you will not need SES. You can run the tap-n-talk demo in IE using the SDK beta 3 today.

Host: Ian (Microsoft)
Q: do you have any hammer stress testing scripts that might be public?
A: We use hammer to a limited extent in testing, but we don't have any publicly available test scripts.

Host: Ian (Microsoft)
Q: what are some effective stress testing techniques? at 4 or 8 ports, just get a bunch of people on phones
A: Hammer is a good tool, if you have it. We have our own internal test tool that works somewhat like hammer, but this is not publicly available.

Host: Ian (Microsoft)
A: For low port numbers you can get a bunch of people on phones, but I doubt you can get them to talk for hours!

Host: Ian (Microsoft)
A: You could in theory get one MSS to provide the load for another MSS installation.

Host: Chris (Microsoft)
Q: Can we use multiple voice boards on MSS?
A: In general there needs to be a one-to-one correspondence with the TAS and the TIM. If the TIM can handle multiple voice boards, then everything should work fine.

Host: Chris (Microsoft)
Q: what is the status of blind transfers in beta 1/2?
A: This should work fine.

Host: Kundana (Microsoft)
Q: this is a high-level question about features. Will MSS be capable of answering a call, interpreting input from that call (keyed or spoken), and then passing that in the form of xml (SALT) for storage or manipulation?
A: Yes. MSS uses a component called the Telephony Interface Manager (TIM) to answer the call; it then recognizes the user input and sends the result to the web server. MSS supports the Speech Application Language Tags (SALT) markup language.

Host: Ian (Microsoft)
Q: what is the most effective way to modify the grammar for a QA at runtime? Is creating a new grammar object and adding it to the grammars collection of the QA the preferred way?
A: You can activate and deactivate rules. Alternatively, you can change the grammar associated with a Listen, or edit the inline content (if it is inline).
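The first option in the answer above, activating and deactivating rules, can be pictured with a small toy model. This is only a sketch: `ToyGrammar` and its methods are hypothetical names invented for illustration and are not part of the .NET Speech SDK or MSS object model. It shows the idea that the grammar stays loaded while only the set of active rules changes at runtime.

```python
class ToyGrammar:
    """Hypothetical stand-in for a grammar: named rules mapping to accepted phrases.

    This is NOT the SDK's grammar object; it only illustrates rule activation.
    """

    def __init__(self, rules):
        # rules: dict of rule name -> iterable of phrases the rule accepts
        self.rules = {name: set(phrases) for name, phrases in rules.items()}
        self.active = set(self.rules)  # every rule starts out active

    def deactivate(self, name):
        """Stop matching against the named rule without unloading it."""
        self.active.discard(name)

    def activate(self, name):
        """Resume matching against a rule that is already loaded."""
        if name not in self.rules:
            raise KeyError(name)
        self.active.add(name)

    def match(self, utterance):
        """Return the name of the first active rule accepting the utterance, else None."""
        for name in sorted(self.active):
            if utterance in self.rules[name]:
                return name
        return None


if __name__ == "__main__":
    grammar = ToyGrammar({"city": ["seattle", "boston"], "color": ["red", "blue"]})
    print(grammar.match("red"))   # matched by the "color" rule
    grammar.deactivate("color")
    print(grammar.match("red"))   # no active rule accepts it now
```

The point of the design is that toggling a rule is cheap compared with building and loading a new grammar, which may be why it is listed first in the answer.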

Host: Chris (Microsoft)
Q: Are there tools for programmatically producing grammars from a database?
A: There is a thread covering this in the newsgroup. You should be able to find some information on microsoft.public.netspeechsdk.
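As a rough illustration of the question above, a grammar can be generated from database rows with a few lines of script. The sketch below assumes the target format is a W3C SRGS XML grammar (the chat does not say which format to produce), and the table name, column name, and helper function names are hypothetical. It uses only the Python standard library, with an in-memory SQLite database standing in for the real data source.

```python
import sqlite3
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"  # W3C SRGS namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"


def phrases_from_db(conn, query):
    """Run the query and return the first column of each row as a list of strings."""
    return [str(row[0]) for row in conn.execute(query)]


def build_grammar(phrases, rule_id, lang="en-US"):
    """Return an SRGS XML grammar string with one public rule listing the phrases."""
    ET.register_namespace("", SRGS_NS)  # serialize SRGS as the default namespace
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {"version": "1.0", "root": rule_id, f"{{{XML_NS}}}lang": lang},
    )
    rule = ET.SubElement(
        grammar, f"{{{SRGS_NS}}}rule", {"id": rule_id, "scope": "public"}
    )
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for phrase in phrases:
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = phrase  # ElementTree escapes special characters on output
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical data source: an in-memory table of city names.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cities (name TEXT)")
    conn.executemany("INSERT INTO cities VALUES (?)", [("Seattle",), ("Boston",)])
    phrases = phrases_from_db(conn, "SELECT name FROM cities ORDER BY name")
    print(build_grammar(phrases, "city"))
```

Regenerating the grammar file on a schedule, or when the underlying table changes, keeps recognition in step with the data without hand-editing the grammar.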

Host: Chris (Microsoft)
Q: Follow-up question. When you say TIM recognizes the call and sends it to the webserver, what do you mean by sends it to the web server? Is MSS capable of persisting the call into a sql database or dropping it into a queue (MSMQ) for later processing?
A: What Kundana meant was that when the call comes through, your application page will typically navigate to another application page, which resides on a web server. I'm not sure about what you mean by persisting the call, but if you mean record the audio transcript, this could be done. MSS will not, however, do simultaneous record and recognition, so you cannot do this at the app level. You will need to get the audio information from SES.

Host: Salva (Microsoft)
Ok guys. I have to run to a meeting. Nice talking to you. Bye.

Moderator: John_P (Microsoft)
This has been a GREAT chat. Thank you to everyone. Unfortunately, it is time to go.

Moderator: John_P (Microsoft)
To find out more visit https://www.msdn.microsoft.com/chats/

For further information on this topic please visit the following:

Newsgroups: microsoft.public.netspeechsdk

MSI Transcripts: Read the archive of past Windows chats.

Website: Visit the Microsoft Speech Technologies site.
