Can't web scrape with PyQt5 more than once

-1

I am attempting to web scrape using the PyQt5 QWebEngineView. Here is the code that I got from another answer on Stack Overflow:

from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl, QEventLoop
from PyQt5.QtWebEngineWidgets import QWebEngineView
import sys

def render(url):
    class Render(QWebEngineView):
        def __init__(self, t_url):
            self.html = None
            self.app = QApplication(sys.argv)
            QWebEngineView.__init__(self)
            self.loadFinished.connect(self._loadfinished)
            self.load(QUrl(t_url))
            while self.html is None:
                self.app.processEvents(QEventLoop.ExcludeUserInputEvents | QEventLoop.ExcludeSocketNotifiers | QEventLoop.WaitForMoreEvents)
            self.app.quit()

        def _callable(self, data):
            self.html = data

        def _loadfinished(self, result):
            self.page().toHtml(self._callable)

    return Render(url).html

Then if I put the line:

print(render('http://quotes.toscrape.com/random'))

it works as expected. But if I add a second line to that so it reads:

print(render('http://quotes.toscrape.com/random'))
print(render('http://quotes.toscrape.com/tableful/'))

it gives me the error "Process finished with exit code -1073741819 (0xC0000005)" after printing out the first render correctly.

I have narrowed the error down to the line self.load(QUrl(t_url)).

python
python-3.x
web-scraping
pyqt
pyqt5
asked on Stack Overflow Oct 22, 2018 by boymeetscode • edited Oct 22, 2018 by eyllanesc

1 Answer

1

You're initializing QApplication more than once. Only one instance should exist, globally. If you need the current instance and do not have a handle to it, you can use QApplication.instance(). QApplication.quit() is meant to be called right before sys.exit; in fact, you should almost never use one without the other.

In short, you're telling Qt you're exiting the application, and then trying to run more Qt code. It's an easy fix, however...

Solution

You can do one of three things:

1. Store the app in a global variable and reference it from there:

APP = QApplication(sys.argv)
# ... Many lines elided

class SomeClass(QWidget):
    def some_method(self):
        APP.processEvents(QEventLoop.ExcludeUserInputEvents | QEventLoop.ExcludeSocketNotifiers | QEventLoop.WaitForMoreEvents)

2. Pass the app as a handle to the class:

def render(app, url):
    ...

3. Create a global instance, and use QApplication.instance():

APP = QApplication(sys.argv)
# ... Many lines elided

class SomeClass(QWidget):
    def some_method(self):
        app = QApplication.instance()
        app.processEvents(QEventLoop.ExcludeUserInputEvents | QEventLoop.ExcludeSocketNotifiers | QEventLoop.WaitForMoreEvents)

Do what's most convenient for you.
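
For reference, here is a minimal sketch of how the question's render() might look with a single, reused application object. It is not the code from the question or from this answer: get_app() is a hypothetical helper, the Render subclass is replaced by a plain QWebEnginePage, and a local QEventLoop stands in for the processEvents() loop. It assumes PyQt5 and PyQtWebEngine are installed.

```python
import sys

_app = None  # module-level reference keeps the application alive between calls


def get_app(app_cls, argv=None):
    """Return the single application object, creating it only on first use.

    app_cls is assumed to behave like QApplication: calling app_cls.instance()
    returns the live object, or None if none has been created yet.
    """
    global _app
    if _app is None:
        _app = app_cls.instance()
    if _app is None:
        _app = app_cls(sys.argv if argv is None else argv)
    return _app


def render(url):
    """Load url in an off-screen page and return its HTML as a string."""
    # Qt imports are local so that get_app() can be used without Qt installed;
    # QtWebEngineWidgets is imported before the QApplication is created.
    from PyQt5.QtCore import QEventLoop, QUrl
    from PyQt5.QtWebEngineWidgets import QWebEnginePage
    from PyQt5.QtWidgets import QApplication

    get_app(QApplication)  # reused on every call; app.quit() is never called
    page = QWebEnginePage()
    loop = QEventLoop()
    result = {}

    def on_html(html):
        result["html"] = html
        loop.quit()  # ends only this local loop, not the application

    def on_load_finished(ok):
        page.toHtml(on_html)  # asynchronous: on_html receives the HTML later

    page.loadFinished.connect(on_load_finished)
    page.load(QUrl(url))
    loop.exec_()  # block here until on_html() has stored the HTML
    return result["html"]
```

With this structure, the two consecutive render() calls from the question should share one QApplication, because the second call finds the existing instance instead of constructing a new one after quit().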

answered on Stack Overflow Oct 22, 2018 by Alexander Huszagh • edited Oct 22, 2018 by Alexander Huszagh

User contributions licensed under CC BY-SA 3.0