I am trying to throw together a proof-of-concept project, just to see how good Microsoft's Cognitive Services Speech Transcription is.
I have followed all the examples on their site, but have so far been unsuccessful. Initially I was unable to get it to run at all inside one of my existing code bases, which is built as x86. It threw this error:
An attempt was made to load a program with an incorrect format
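That message is the text .NET uses for a BadImageFormatException, which commonly shows up when a 32-bit process tries to load a 64-bit native DLL (or the reverse). A minimal sketch for confirming which bitness the process is actually running as follows; the class name `BitnessCheck` is made up for illustration and is not part of the Speech SDK.

```csharp
using System;

// Hypothetical helper: reports whether the current process is 32-bit or 64-bit.
public static class BitnessCheck
{
    public static string Describe()
    {
        string process = Environment.Is64BitProcess ? "64-bit" : "32-bit";
        string os = Environment.Is64BitOperatingSystem ? "64-bit" : "32-bit";
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        return $"{process} process (pointer size {IntPtr.Size} bytes) on a {os} OS";
    }
}
```

Calling `Console.WriteLine(BitnessCheck.Describe());` at the top of `Main` shows whether the host process matches the native binaries that were restored for the package.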
Then I created a brand-new .NET Framework x64 console app. It would start, then crash internally with version 1.4.0 (as well as a few other versions I tried), writing this error to my event log:
Faulting application name: dotnet.exe, version: 2.1.27415.1, time stamp: 0x5c672873
Faulting module name: Microsoft.CognitiveServices.Speech.core.dll, version: 1.3.1.28, time stamp: 0x5c764ab1
Exception code: 0xc0000094
Fault offset: 0x000000000007567c
Faulting process id: 0x6200
Faulting application start time: 0x01d4f1518c240c4b
Faulting application path: C:\Program Files\dotnet\dotnet.exe
Faulting module path: C:\Users\username.nuget\packages\microsoft.cognitiveservices.speech\1.3.1\runtimes\win-x64\native\Microsoft.CognitiveServices.Speech.core.dll
Finally I found that version 1.1.0 would actually start (with version 1.0.0 the app would not even compile). Now the SessionStarted and SessionStopped events fire instantly, but no transcription ever takes place, and Fiddler shows no calls leaving my machine.
Unless Cognitive Services is really buggy, there must be something simple I am missing. Can anyone point it out?
My goal is to transcribe an audio file of five minutes or less that is stored on my local network. Here is the code I am using:
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello World!");
        var file = @"U:\path\file.wav";
        ContinuousRecognitionAsync(file).Wait();
        Console.WriteLine("End!");
    }

    public static async Task ContinuousRecognitionAsync(string audiopath)
    {
        // Subscription key and service region. Replace with your own subscription key
        // and service region (e.g., "westus").
        var config = SpeechConfig.FromSubscription("<my free test key>", "westus");
        var audio = Microsoft.CognitiveServices.Speech.Audio.AudioConfig.FromWavFileInput(audiopath);

        // Creates a continuous speech recognizer using WAV input.
        using (var recognizer = new SpeechRecognizer(config, audio))
        {
            // Subscribes to events.
            recognizer.Recognizing += (s, e) =>
            {
                Console.WriteLine($"\n Recognizing: {e.Result.Text}.");
            };
            recognizer.Recognized += (s, e) =>
            {
                Console.WriteLine($"\n Recognized: {e.Result.Text}.");
            };
            recognizer.SessionStarted += (s, e) =>
            {
                Console.WriteLine($"\n SessionStarted: {e.SessionId}.");
            };
            recognizer.SessionStopped += (s, e) =>
            {
                Console.WriteLine($"\n SessionStopped: {e.SessionId}.");
            };
            recognizer.SpeechEndDetected += (s, e) =>
            {
                Console.WriteLine($"\n SpeechEndDetected: {e.SessionId}.");
            };
            recognizer.SpeechStartDetected += (s, e) =>
            {
                Console.WriteLine($"\n SpeechStartDetected: {e.SessionId}.");
            };
            recognizer.Canceled += (s, e) =>
            {
                // Also log why recognition was canceled; the session id alone hides the cause.
                Console.WriteLine($"\n Canceled: {e.SessionId}. Reason: {e.Reason}. ErrorDetails: {e.ErrorDetails}");
            };

            // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
            Console.WriteLine("Say something...");
            await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

            Console.WriteLine("Press any key to stop");
            Console.ReadKey();

            await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
        }
    }
}
EDIT: After some changes, and moving the wav file locally (it was on a mapped drive), it did briefly try to run a transcription on the file, but no valid text was ever returned, only blank strings.
Transcription via microphone is working just fine. But as soon as I throw one of my .wav files at it, Cognitive Services crashes again with exception code 0xc0000094. I even tried the code that half worked, and that now throws the same error too.
I figured out the issue: it turned out to be the .wav files themselves. As near as I could tell, they were valid wave files, with WAV listed at the top of the binary file when viewed in Notepad++. However, they consistently caused Cognitive Services to crash. The one time I got it to accept a file, it was unable to read it and just ran in an infinite loop returning blank strings.
I solved the issue by running the files through a double conversion. I converted them to .m4a files, then back to .wav files. Once I did that they all started working perfectly.
I originally thought it was because I was storing the files remotely on a mapped drive. However, access via mapped drive worked just fine once the files were fixed.
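For anyone hitting the same thing, it may help to look at what the file's header actually declares before resorting to a conversion. Exception code 0xc0000094 is the Windows status for an integer divide by zero, which would be consistent with a zero or unexpected value in the header (block align or byte rate, for example), though I never confirmed what was wrong with my files. Below is a rough sketch that reads the `fmt ` chunk of a RIFF/WAVE file. The `WavFormat` class is my own illustration, not part of the Speech SDK, and the "expected" format in `LooksLikeDefaultPcm` (mono, 16-bit PCM at 16 kHz or 8 kHz) is my understanding of the SDK's default WAV input, so treat it as an assumption.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of the Speech SDK): reads the "fmt " chunk of a WAV file.
public sealed class WavFormat
{
    public ushort FormatTag;     // 1 = uncompressed PCM
    public ushort Channels;
    public uint SampleRate;
    public uint ByteRate;
    public ushort BlockAlign;
    public ushort BitsPerSample;

    // Assumption: the SDK's default WAV input is mono, 16-bit PCM at 16 kHz or 8 kHz.
    public bool LooksLikeDefaultPcm =>
        FormatTag == 1 && Channels == 1 && BitsPerSample == 16 &&
        (SampleRate == 16000 || SampleRate == 8000) && BlockAlign != 0;

    public static WavFormat Read(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            if (ReadTag(reader) != "RIFF") throw new InvalidDataException("Missing RIFF tag.");
            reader.ReadUInt32(); // total RIFF size, not needed here
            if (ReadTag(reader) != "WAVE") throw new InvalidDataException("Missing WAVE tag.");

            // Walk the chunks; others (LIST, fact, ...) may come before "fmt ".
            while (true)
            {
                string id = ReadTag(reader);
                uint size = reader.ReadUInt32();
                if (id == "fmt ")
                {
                    if (size < 16) throw new InvalidDataException("fmt chunk is too short.");
                    var format = new WavFormat();
                    format.FormatTag = reader.ReadUInt16();
                    format.Channels = reader.ReadUInt16();
                    format.SampleRate = reader.ReadUInt32();
                    format.ByteRate = reader.ReadUInt32();
                    format.BlockAlign = reader.ReadUInt16();
                    format.BitsPerSample = reader.ReadUInt16();
                    return format;
                }
                // Skip this chunk; chunks are padded to an even number of bytes.
                stream.Seek(size + (size & 1), SeekOrigin.Current);
            }
        }
    }

    private static string ReadTag(BinaryReader reader)
    {
        byte[] bytes = reader.ReadBytes(4);
        if (bytes.Length < 4) throw new EndOfStreamException("Unexpected end of file.");
        return Encoding.ASCII.GetString(bytes);
    }
}
```

To use it, open the file with `File.OpenRead(path)`, pass the stream to `WavFormat.Read`, and print the fields. If `LooksLikeDefaultPcm` is false, or the file uses a compressed format tag, that would explain why re-encoding the file made it work.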
Hopefully Microsoft will add better error handling to the Cognitive Services wrapper and allow the API to handle more than just WAV files.