I have a list of web pages that I want to read data from. The problem is that those pages are built with Node.js and rely on JavaScript, so I am forced to load each page into a JavaFX WebView (because WebView has a JavaScript engine) and read the page's DOM from it. I'm using a Transformer (obtained from TransformerFactory) to serialize the DOM to a string.
The code runs smoothly for a while, but after visiting a couple of pages it suddenly crashes. The log looks like this:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x000000005a1597cb, pid=10044, tid=0x00000000000028b4
#
# JRE version: Java(TM) SE Runtime Environment (8.0_172-b11) (build 1.8.0_172-b11)
The full log is here: hs_err_pid10044.log
The code snippet for page loading is below (sorry, it is in Kotlin):
class ChildPageCrawl(url: String, path: String, semaphore: Semaphore) : JFrame() {
    init {
        title = "Web View"
        setSize(800, 600)
        val panel = JFXPanel()
        this.add(panel)
        Platform.runLater({
            val browser = WebView()
            val webEngine = browser.engine
            panel.scene = Scene(browser, 700.0, 500.0)
            webEngine.load(url)
            webEngine.loadWorker.stateProperty().addListener { _, _, newState ->
                if (newState == Worker.State.SUCCEEDED) {
                    Thread.sleep(10000) //Waiting for page to fully load
                    semaphore.acquire()
                    val doc = webEngine.document
                    val transformer = TransformerFactory.newInstance().newTransformer()
                    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no")
                    transformer.setOutputProperty(OutputKeys.METHOD, "xml")
                    transformer.setOutputProperty(OutputKeys.INDENT, "yes")
                    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8")
                    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4")
                    val source = DOMSource(doc)
                    val writer = StringWriter()
                    val result = StreamResult(writer)
                    transformer.transform(source, result)
                    val str = writer.toString()
                    analyzeSML(str)
                    semaphore.release()
                }
            }
        })
        defaultCloseOperation = DISPOSE_ON_CLOSE
    }
}
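For reference, the serialization step from the listener above can be tried in isolation, without WebView. This is only a minimal sketch: domToString and sampleDocument are hypothetical helper names, the document is built by hand with DocumentBuilderFactory as a stand-in for webEngine.document, and the implementation-specific indent-amount property is left out. It shows what the Transformer part does on its own and is not meant to reproduce the crash.

```kotlin
import java.io.StringWriter
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.OutputKeys
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import org.w3c.dom.Document

// Hypothetical helper: serializes any W3C DOM document to an XML string,
// using the same output properties as the snippet above.
fun domToString(doc: Document): String {
    val transformer = TransformerFactory.newInstance().newTransformer()
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no")
    transformer.setOutputProperty(OutputKeys.METHOD, "xml")
    transformer.setOutputProperty(OutputKeys.INDENT, "yes")
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8")
    val writer = StringWriter()
    transformer.transform(DOMSource(doc), StreamResult(writer))
    return writer.toString()
}

// Assumption: a tiny hand-built document standing in for webEngine.document.
fun sampleDocument(): Document {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument()
    val html = doc.createElement("html")
    val body = doc.createElement("body")
    body.textContent = "hello"
    html.appendChild(body)
    doc.appendChild(html)
    return doc
}

fun main() {
    // Prints the XML declaration followed by the serialized tree.
    println(domToString(sampleDocument()))
}
```

In the real crawler the Document comes from webEngine.document, which as far as I understand is backed by the native WebKit DOM and should only be touched on the JavaFX Application Thread, so the sketch exercises only the javax.xml.transform part.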
I don't know where I'm going wrong. Also, the WebView and its WebEngine seem slow, since they render the web page, which is unnecessary for me.