Microsoft Speech API - Cognitive Speech STT iOS : Language not changing

Question

I start recognition using:

_micClient = [SpeechRecognitionServiceFactory createMicrophoneClient:SpeechRecognitionMode_ShortPhrase withLanguage:locale withKey:API_KEY withProtocol:(self)];

Everything worked as intended.

But when I create the client a second time with a different locale, recognition still happens only in the first language.

For example, the app launches and starts recognition with "hi-IN":

Application Name: com.XXXX.XXXX/1.0.1 STS: https://api.cognitive.microsoft.com/sts/v1.0/issueToken Refreshing token /sts/v1.0/issueToken Initializing Audio Services Initializing Speech Services No application id provided to controller GetIdentityPropertyValue 3 Useragent Value iOS Assistant (iOS; 11.2.6;Mobile;ProcessName/AppName=com.XXXX.XXXX/1.0.1;DeviceType=Near;SpeechClient=1.0.161216) Url: 'https://websockets.platform.bing.com/ws/speech/recognize' Locale: 'hi-IN' Application Id: '' Version: 4.0.150429 UserAuthorizationToken: ServerLoggingLevel: 1 Initiating websocket connection. m_connection=0x0 host=websockets.platform.bing.com port=443 Auth token status: 200 Authorization token hr 0 'Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJzY29wZSI6Imh0dHBzOi8vc3BlZWNoLnBsYXRmb3JtLmJpbmcuY29tIiwic3Vic2NyaXB0aW9uLWlkIjoiMGZhNGQ5NmZjODc5NDA1ZmIyZDc3ZGVmY2NiOTc0MzUiLCJwcm9kdWN0LWlkIjoiQmluZy5TcGVlY2guUzAiLCJjb2duaXRpdmUtc2VydmljZXMtZW5kcG9pbnQiOiJodHRwczovL2FwaS5jb2duaXRpdmUubWljcm9zb2Z0LmNvbS9pbnRlcm5hbC92MS4wLyIsImF6dXJlLXJlc291cmNlLWlkIjoiL3N1YnNjcmlwdGlvbnMvZjJmNWJmMGYtZTRlOC00NDY1LTg4ZDQtYmMyMGFiYTNmMTIzL3Jlc291cmNlR3JvdXBzL1NwZWVjaFJlY29nbml0aW9uL3Byb3ZpZGVycy9NaWNyb3NvZnQuQ29nbml0aXZlU2VydmljZXMvYWNjb3VudHMvU3BhcmtsaW5nU3BlZWNoIiwiaXNzIjoidXJuOm1zLmNvZ25pdGl2ZXNlcnZpY2VzIiwiYXVkIjoidXJuOm1zLnNwZWVjaCIsImV4cCI6MTUyMjkyODc1OH0.PTBvhZ18q__-PCJRtWLr-KkQ99yt4c-mnrd2kdyOn1c' Successfully initialized client connection Create ImpressionId: fff94b5814ae9a097f0d749c137069d9 Create ImpressionId: 01eb6b249fc1d90e37ba61a1a2d64fe9 Reset

Create ImpressionId: e69685c047daf66ef0887614b2a35fc4 ImpressionId: b53b312c6dfd13609e5b1cf2952f0af6 Adding requestId: 'cadbb055d5d4ef5c669d210a5fed2bf7' for 'text/cu.client.context' Subscribing request [cadbb055d5d4ef5c669d210a5fed2bf7] Audio stream created Adding requestId: 'e9012ec9fe3d9ee9e8a075e6274eda06' for 'audio/x-wav' Subscribing request [e9012ec9fe3d9ee9e8a075e6274eda06] Audio Stream Created Creating transcoder 2

Upgrade request returned with HTTP status code: 101 Web socket handshake completed CU Client connected ConnectionStateChanged Microphone permissions: 0 Sent first chunk of audio stream, requestId='e9012ec9-fe3d-9ee9-e8a0-75e6274eda06' Speech recording started Speech recording started OnDataAvailable: 81 => type 1 Received message: 'audio.stream.response' Response request id: 'e9012ec9-fe3d-9ee9-e8a0-75e6274eda06' Response impression: 'b53b312c6dfd13609e5b1cf2952f0af6'

LanguageGeneration OK Partial : आप OnDataAvailable: 81 => type 1 Received message: 'audio.stream.response' Response request id: 'e9012ec9-fe3d-9ee9-e8a0-75e6274eda06' Response impression: 'b53b312c6dfd13609e5b1cf2952f0af6'

LanguageGeneration OK Partial : आपके OnDataAvailable: 81 => type 1 Received message: 'audio.stream.response' Response request id: 'e9012ec9-fe3d-9ee9-e8a0-75e6274eda06' Response impression: 'b53b312c6dfd13609e5b1cf2952f0af6'

LanguageGeneration OK Partial : आप किस OnDataAvailable: 81 => type 1 Received message: 'audio.stream.response' Response request id: 'e9012ec9-fe3d-9ee9-e8a0-75e6274eda06' Response impression: 'b53b312c6dfd13609e5b1cf2952f0af6'

LanguageGeneration OK Partial : आप कैसे OnDataAvailable: 01 => type 1 Received message: 'audio.stream.response' Response request id: 'e9012ec9-fe3d-9ee9-e8a0-75e6274eda06' Response impression: 'b53b312c6dfd13609e5b1cf2952f0af6'

LanguageGeneration OK

Sending audio stream endpoint, requestId='e9012ec9-fe3d-9ee9-e8a0-75e6274eda06' Sent audio stream endpoint, requestId='e9012ec9-fe3d-9ee9-e8a0-75e6274eda06' signaling OnAudioEvent(AUDIO_EVENT_RECORD_STOP)

The app then initialises a new microphone client with "en-US".

Now, when recognition starts:

Create ImpressionId: 0eed72b0b8019f0d7647b4d5d1adc8c6 Reset Canceling request [cadbb055d5d4ef5c669d210a5fed2bf7] Canceling request [e9012ec9fe3d9ee9e8a075e6274eda06]

Create ImpressionId: ff9306c014eba5a9da0fa5979269bced ImpressionId: 04bfd4c2fce0e631c6b6d9f3d16877f2 Adding requestId: 'b9148688143a0a9526df6bd9e31110d1' for 'text/cu.client.context' Subscribing request [b9148688143a0a9526df6bd9e31110d1] Audio stream created Adding requestId: '158b5857d7f60759687076b3bfa9d2bc' for 'audio/x-wav' Subscribing request [158b5857d7f60759687076b3bfa9d2bc] Audio Stream Created Creating transcoder 2

Microphone permissions: 0 Speech recording started Speech recording started Sent first chunk of audio stream, requestId='158b5857-d7f6-0759-6870-76b3bfa9d2bc'

Sending audio stream endpoint, requestId='158b5857-d7f6-0759-6870-76b3bfa9d2bc' Sent audio stream endpoint, requestId='158b5857-d7f6-0759-6870-76b3bfa9d2bc' signaling OnAudioEvent(AUDIO_EVENT_RECORD_STOP) Speech recording stopped Speech recording stopped OnDataAvailable: 81 => type 1 Received message: 'audio.stream.response' Response request id: '158b5857-d7f6-0759-6870-76b3bfa9d2bc' Response impression: '04bfd4c2fce0e631c6b6d9f3d16877f2'

LanguageGeneration OK Partial : तो OnDataAvailable: 81 => type 1 Received message: 'audio.stream.response' Response request id: '158b5857-d7f6-0759-6870-76b3bfa9d2bc' Response impression: '04bfd4c2fce0e631c6b6d9f3d16877f2'

LanguageGeneration OK originating error 0x80070057 ERROR: No Reco originating error 0x80070057

The locale does not appear in the log the second time, and the partial responses are still in "hi-IN". Is there any way to clear the old language configuration?

ios

objective-c

azure

speech-recognition

microsoft-cognitive

asked on Stack Overflow Apr 19, 2018 by

LordHari

1 Answer

To change the language, the websocket connection must be closed between one utterance and the next. The connection closes on its own after 3 minutes with no activity between utterances. Calling AudioStop() should also close the connection. If you have already tried calling AudioStop() and it did not work, we will ensure this is fixed in upcoming versions of the released API.
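A minimal sketch of what "stop, then recreate" could look like on the app side. It assumes ARC, that self conforms to SpeechRecognitionProtocol, that _micClient and API_KEY are the ones from the question, and that the SDK header is imported as shown. The method name restartRecognitionWithLocale: is hypothetical. It is also an assumption that endMicAndRecognition is the public counterpart of the AudioStop() call mentioned above; whether it actually closes the websocket is not confirmed here.

```objective-c
#import <SpeechSDK/SpeechRecognitionService.h>  // assumed import path

// Hypothetical helper on the class that owns _micClient.
- (void)restartRecognitionWithLocale:(NSString *)locale {
    if (_micClient != nil) {
        // Stop the microphone and end the current recognition turn.
        [_micClient endMicAndRecognition];
        // Release the old client (ARC). Assumption: this lets the SDK
        // drop any state tied to the previous locale.
        _micClient = nil;
    }

    // Create a fresh client for the new locale, exactly as in the question.
    _micClient = [SpeechRecognitionServiceFactory createMicrophoneClient:SpeechRecognitionMode_ShortPhrase
                                                            withLanguage:locale
                                                                 withKey:API_KEY
                                                            withProtocol:self];
    [_micClient startMicAndRecognition];
}
```

If the log for the second run still shows no new "Initiating websocket connection" line or "Locale:" entry after this, the old connection is probably being reused, and waiting out the 3 minute idle timeout is the remaining workaround described above.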

answered on Stack Overflow Apr 24, 2018 by

chrisbasoglu

User contributions licensed under CC BY-SA 3.0