How do I use MODI in an ASP.Net Web Application?

10

I've written an OCR wrapper library around the Microsoft Office Document Imaging COM API, and in a Console App running locally, it works flawlessly, with every test.

Sadly, things start going badly when we attempt to integrate it with a WCF service running as an ASP.Net Web Application, under IIS6. We had issues around trying to free up the MODI COM Objects, and there were plenty of examples on the web that helped us.

However, problems still remain. If I restart IIS, and do a fresh deployment of the web app, the first few OCR attempts work great. If I leave it for 30 minutes or so, and then do another request, I get server failure errors like this:

The server threw an exception. (Exception from HRESULT: 0x80010105 (RPC_E_SERVERFAULT)): at MODI.DocumentClass.Create(String FileOpen)

From this point on, every request will fail to do the OCR, until I reset IIS, and the cycle begins again.

We run this application in its own App Pool, and it runs under an identity with Local Admin rights.

UPDATE: This issue can be solved by doing the OCR stuff out of process. It appears as though the MODI library doesn't play well with managed code, when it comes to cleaning up after itself, so spawning new processes for each OCR request worked well in my situation.

Here is the class that performs the OCR:

    using System;
    using System.Runtime.CompilerServices;
    using System.Runtime.InteropServices;
    using System.Threading;

    public class ImageReader : IDisposable
    {
        // COM objects are kept in fields so that Dispose can release them.
        private MODI.Document _document;
        private MODI.Images _images;
        private MODI.Image _image;
        private MODI.Layout _layout;
        private ManualResetEvent _completedOCR = new ManualResetEvent(false);

        // SNIP - Code removed for clarity

        private string PerformMODI(string fileName)
        {
            _document = new MODI.Document();
            _document.OnOCRProgress += new MODI._IDocumentEvents_OnOCRProgressEventHandler(_document_OnOCRProgress);
            _document.Create(fileName);

            _document.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
            // Wait up to 5 seconds for the progress event to report 100%.
            _completedOCR.WaitOne(5000);
            _document.Save();
            // Only the first page of the document is read.
            _images = _document.Images;
            _image = (MODI.Image)_images[0];
            _layout = _image.Layout;
            string text = _layout.Text;
            _document.Close(false);
            return text;
        }

        void _document_OnOCRProgress(int Progress, ref bool Cancel)
        {
            if (Progress == 100)
            {
                _completedOCR.Set();
            }
        }

        private static void SetComObjectToNull(params object[] objects)
        {
            for (int i = 0; i < objects.Length; i++)
            {
                object o = objects[i];
                if (o != null)
                {
                    Marshal.FinalReleaseComObject(o);
                    // Note: this only clears the local copy; the fields
                    // still reference the released wrappers.
                    o = null;
                }
            }
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public void Dispose()
        {
            // Release in reverse order of acquisition, then force a collection.
            SetComObjectToNull(_layout, _image, _images, _document);
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }

I then create an ImageReader inside a using block (which calls IDisposable.Dispose on exit).

Calling Marshal.FinalReleaseComObject should instruct the CLR to release the COM objects, and so I'm at a loss to figure out what would be causing the symptoms we have.

For what it's worth, when I run this code outside of IIS, in a Console App say, everything seems bulletproof. It works every time.

Any tips that help me diagnose and solve this issue would be an immense help and I'll upvote like crazy! ;-)

Thanks!

asp.net
ocr
modi
asked on Stack Overflow Aug 28, 2009 by Scott Ferguson • edited Sep 11, 2009 by Scott Ferguson

4 Answers

4

Have you thought of hosting the OCR portion of your app out-of-process?

Having a service can give you tons of flexibility:

  1. You can define a simple end point for your web application, and access it via remoting or WCF.
  2. If things go pear-shaped and the library is dodgy, you can have the service launch a separate process every time you need to perform OCR. This gives you extreme safety, but involves a small extra expense. I would assume that OCR is MUCH more expensive than spinning up a process.
  3. You can keep an instance of the COM object around; if memory starts leaking, you can restart yourself without impacting the web site (if you are careful).
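
Option 2 could be sketched roughly like this. This is only a minimal sketch under assumptions: the worker executable (called OcrWorker.exe in the usage note) is hypothetical, and it is assumed to take the image path as its only argument, print the recognised text to standard output, and exit with code 0 on success. The class and method names are made up for illustration, and `ReadToEndAsync` needs .NET 4.5 or later.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class OcrProcessRunner
{
    // Builds the start info for the (hypothetical) worker executable.
    // The image path is quoted so that paths containing spaces survive.
    public static ProcessStartInfo BuildStartInfo(string workerPath, string imagePath)
    {
        return new ProcessStartInfo
        {
            FileName = workerPath,
            Arguments = "\"" + imagePath + "\"",
            UseShellExecute = false,       // required for output redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };
    }

    // Runs one OCR request in its own process and returns what it printed.
    public static string Run(string workerPath, string imagePath, int timeoutMs)
    {
        using (Process process = Process.Start(BuildStartInfo(workerPath, imagePath)))
        {
            // Start reading before waiting so a full pipe cannot block the worker.
            Task<string> output = process.StandardOutput.ReadToEndAsync();

            if (!process.WaitForExit(timeoutMs))
            {
                process.Kill();
                throw new TimeoutException("OCR worker did not finish in time.");
            }
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "OCR worker failed with exit code " + process.ExitCode + ".");
            }
            return output.Result;
        }
    }
}
```

From the web application you would then call something like `OcrProcessRunner.Run("OcrWorker.exe", path, 30000)`. Whatever MODI leaks dies with the worker process, so the web app's own process never loads the COM library.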

Personally, I have found in the past that COM interop + IIS = grief.

answered on Stack Overflow Sep 9, 2009 by Sam Saffron
1

MODI is incredibly wonky when it comes to getting rid of itself, especially running in IIS. In my experience, I've found that although it slows everything down, the only way to get rid of these errors is to add a GC.WaitForPendingFinalizers() after your GC.Collect() call. If you're interested, I wrote an article about this.

answered on Stack Overflow Aug 28, 2009 by AJ.
1

Can you replicate the problem in a small console application? Perhaps by letting it sleep for 30 minutes and then coming back to it?

The best way to solve problems like this is to isolate them completely. I'd be interested to see how that works.

answered on Stack Overflow Sep 9, 2009 by Noon Silk
1

I had to deal with this error a week ago, and after testing some of the solutions given here, I finally resolved the problem. I'll explain here how I did it.

In my case I have a Windows service running and processing documents from a folder. The problem occurs when there are more than 20 documents, throwing the error: Exception from HRESULT: 0x80010105 (RPC_E_SERVERFAULT).

In my code I was calling a method each time I detected a document in the folder. Each call created a new instance of MODI document (MODI.Document _document = new MODI.Document();) and processed the file, and that was what caused the error!

The solution was to have just one global instance of MODI.Document and process all documents with it. This way I have only one instance running for my service at all times.
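
The idea can be sketched like this. It is only an illustration, not my exact service code: the `SharedOcrEngine` class and its `Use` method are made-up names, and the sketch assumes one MODI.Document can be reused across files by calling Create and Close for each one. The holder is generic so that it doesn't depend on the MODI interop assembly.

```csharp
using System;

// Holds one lazily created engine instance and serialises all access to it.
public sealed class SharedOcrEngine<TEngine> where TEngine : class
{
    private readonly object _gate = new object();
    private readonly Func<TEngine> _factory;
    private TEngine _engine;

    public SharedOcrEngine(Func<TEngine> factory)
    {
        if (factory == null) throw new ArgumentNullException("factory");
        _factory = factory;
    }

    // Runs the given work against the single shared instance.
    // The lock means only one document is processed at a time.
    public TResult Use<TResult>(Func<TEngine, TResult> work)
    {
        lock (_gate)
        {
            if (_engine == null)
            {
                _engine = _factory();   // created once, on first use
            }
            return work(_engine);
        }
    }
}
```

With MODI it would be created once for the whole service, for example `new SharedOcrEngine<MODI.Document>(() => new MODI.Document())`, and every file would go through `Use`, calling Create, OCR and Close on the shared document inside the callback.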

I hope that will help those who are facing the same problem.

answered on Stack Overflow Nov 26, 2010 by lh.anass

User contributions licensed under CC BY-SA 3.0