I wrote a code to parse some Web tables.
I get some web tables into an IHTMLElementCollection
using Internet Explorer with this code:
TabWeb = IE.document.getelementsbytagname("table")
Then I use a sub who gets an object containing the IHTMLElementCollection
and some other data:
Private Sub TblParsing(ByVal ArrVal() As Object)
Dim WTab As mshtml.IHTMLElementCollection = ArrVal(0)
'some code
End sub
My issue is: if I simply "call" this code, it works correctly:
Call TblParsing({WTab, LiRow})
but, if I try to run it into a threadpool:
ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf TblParsing), {WTab, LiRow})
the code fails and give me multiple
System.UnauthorizedAccessException
This happens on (each of) these code rows:
Rws = WTab(RifWT("Disc")).Rows.Length
If Not IsError(WTab(6).Cells(1).innertext) Then
Ogg_W = WTab(6).Cells(1).innertext
My goal is to navigate to another web page while my sub perform parsing.
I want to clarify that:
1) I've tryed to send the entire HTML to the sub and get it into a webbrowser
but it didn't work because it isn't possible to cast from System.Windows.Forms.HtmlElementCollection
to mshtml.IHTMLElementCollection
(or I wasn't able to do it);
2) I can't use WebRequest
and similar: I'm forced to use InternetExplorer
;
3) I can't use System.Windows.Forms.HtmlElementCollection
because my parsing code uses Cells
, Rows
and so on that are unavailable (and I don't want to rewrite all my parsing code)
EDIT:
Ok, I modified my code using answer hints as below:
'This in the caller sub
Dim IE As Object = CreateObject("internetexplorer.application")
'...some code
Dim IE_Body As String = IE.document.body.innerhtml
ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf TblParsing_2), {IE_Body, LiRow})
'...some code
'This is the called sub
Private Sub TblParsing_2(ByVal ArrVal() As Object)
Dim domDoc As New mshtml.HTMLDocument
Dim domDoc2 As mshtml.IHTMLDocument2 = CType(domDoc, mshtml.IHTMLDocument2)
domDoc2.write(ArrVal(0))
Dim body As mshtml.IHTMLElement2 = CType(domDoc2.body, mshtml.IHTMLElement2)
Dim TabWeb As mshtml.IHTMLElementCollection = body.getElementsByTagName("TABLE")
'...some code
I get no errors but I'm not sure that it's all right because I tryed to use IE_Body
string into webbrowser and it throws errors in the webpage (it shows a popup and I can ignore errors).
Am I using the right way to get Html
from Internet Explorer
into a string
?
EDIT2:
I changed my code to:
Dim IE As New SHDocVw.InternetExplorer
'... some code
Dim sourceIDoc3 As mshtml.IHTMLDocument3 = CType(IE.Document, mshtml.IHTMLDocument3)
Dim html As String = sourceIDoc3.documentElement.outerHTML
ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf TblParsing_2), {html, LiRow})
'... some code
Private Sub TblParsing_2(ByVal ArrVal() As Object)
Dim domDoc As New mshtml.HTMLDocument
Dim domDoc2 As mshtml.IHTMLDocument2 = CType(domDoc, mshtml.IHTMLDocument2)
domDoc2.write(ArrVal(0))
Dim body As mshtml.IHTMLElement2 = CType(domDoc2.body, mshtml.IHTMLElement2)
Dim TabWeb As mshtml.IHTMLElementCollection = body.getElementsByTagName("TABLE")
But I get an error PopUp like (I tryed to translate it):
Title:
Web page error
Text:
Debug this page?
This page contains errors that might prevent the proper display or function properly.
If you are not testing the web page, click No.
two checkboxes
do not show this message again
Use script debugger built-in Internet Explorer
It's the same error I got trying to get Html text into a WebBrowser.
But, If I could ignore this error, I think the code could work!
While the pop is showing I get error on
Dim domDoc As New mshtml.HTMLDocument
Error text translated is:
Retrieving the COM class factory for component with CLSID {25336920-03F9-11CF-8FD0-00AA00686F13} failed due to the following error: The 8,001,010th message filter indicated that the application is busy. (Exception from HRESULT: 0x8001010A (RPC_E_SERVERCALL_RETRYLATER)).
Note that I've alredy set IE.silent = True
Edit: There was confusion as to what the OP meant by "Internet Explorer". I originally assumed that it meant the WinForm Webbrowser control; however the OP is creating the COM browser directly instead of using the .Net wrapper.
To get the browser document's defining HTML, you can cast the document against the mshtml.IHTMLDocument3
interface to expose the documentElement
property.
Dim ie As New SHDocVw.InternetExplorer ' Proj COM Ref: Microsoft Internet Controls
ie.Navigate("some url")
' ... other stuff
Dim sourceIDoc3 As mshtml.IHTMLDocument3 = CType(ie.Document, mshtml.IHTMLDocument3)
Dim html As String = sourceIDoc3.documentElement.outerHTML
End Edit.
The following is based on my comment above. You use the WebBrowser.DocumentText
property to create a mshtml.HTMLDocument
.
Use this property when you want to manipulate the contents of an HTML page displayed in the WebBrowser control using string processing tools.
Once you extract this property as a String, there is no connection to the WebBrowser control and you can process the data in any thread you want.
Dim html As String = WebBrowser1.DocumentText
Dim domDoc As New mshtml.HTMLDocument
Dim domDoc2 As mshtml.IHTMLDocument2 = CType(domDoc, mshtml.IHTMLDocument2)
domDoc2.write(html)
Dim body As mshtml.IHTMLElement2 = CType(domDoc2.body, mshtml.IHTMLElement2)
Dim tables As mshtml.IHTMLElementCollection = body.getElementsByTagName("TABLE")
' ... do something
' cleanup COM objects
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(body)
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(tables)
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(domDoc)
System.Runtime.InteropServices.Marshal.FinalReleaseComObject(domDoc2)
User contributions licensed under CC BY-SA 3.0