WebDriverException: Message: Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" while saving a large html file using Selenium Python


I'm using Selenium to scroll through the reviews of an app in the Google Play Store, given the URL of the app page. Selenium finds the reviews and scrolls down to load all of them. The scrolling part works: without the headless option I can watch Selenium reach the end of the page. What doesn't work is saving the HTML content for further analysis.

Based on other answers, I tried different methods of getting the source code.

innerHTML = DRIVER.execute_script("return document.body.innerHTML")

or

innerHTML = DRIVER.page_source

Both result in the same error message and exception.

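My code for scrolling through the page and loading all reviews:

import time

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# START_URLS and logger are defined elsewhere in scraper.py
SCROLL_PAUSE_TIME = 5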
options = Options()
options.headless = True
FP = webdriver.FirefoxProfile()
FP.set_preference("intl.accept_languages", "de")

for url in START_URLS:

    try:
        DRIVER = webdriver.Firefox(options=options, firefox_profile=FP)
        DRIVER.get(url)
        time.sleep(SCROLL_PAUSE_TIME)
        app_name = DRIVER.find_element_by_xpath('//h1[@itemprop="name"]').get_attribute('innerText')
        all_reviews_button = DRIVER.find_element_by_xpath('//span[text()="Alle Bewertungen lesen"]')
        all_reviews_button.click()
        time.sleep(SCROLL_PAUSE_TIME)
        last_height = DRIVER.execute_script("return document.body.scrollHeight")
        while True:
            DRIVER.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            try:
                DRIVER.find_element_by_xpath('//span[text()="Mehr anzeigen"]').click()
            except Exception:
                # No "Mehr anzeigen" (show more) button on this pass; keep scrolling
                pass
            time.sleep(SCROLL_PAUSE_TIME)
            new_height = DRIVER.execute_script("return document.body.scrollHeight")
            if new_height == last_height:
                logger.info('Durchlauf erfolgreich')
                innerHTML = DRIVER.execute_script("return document.body.innerHTML")
                with open(app_name + '.html', 'w', encoding='utf-8') as out:
                    # Write the variable that was actually assigned above
                    out.write(innerHTML)
                break
            last_height = new_height

    except Exception as e:
        logger.error('Exception occurred', exc_info=True)
    finally:
        DRIVER.quit()

The log file, showing that the infinite scroll reached the end of the page but the file could not be saved:

10.09.19 16:12:00 - INFO - Durchlauf erfolgreich
10.09.19 16:12:13 - ERROR - Exception occurred
Traceback (most recent call last):
  File "scraper.py", line 57, in <module>
    innerHTML = DRIVER.execute_script("return document.body.innerHTML")
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\tenscher\AppData\Local\Programs\Python\Python36\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 275"  data: no]

Last part of geckodriver.log:

...
1568124670155   Marionette  WARN    TimedPromise timed out after 500 ms: stacktrace:
bail@chrome://marionette/content/sync.js:223:64
1568124693017   Marionette  WARN    TimedPromise timed out after 500 ms: stacktrace:
bail@chrome://marionette/content/sync.js:223:64
1568124734637   Marionette  INFO    Stopped listening on port 57015
[Parent 14684, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Child 10464, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Parent 14684, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
JavaScript error: resource:///modules/sessionstore/SessionStore.jsm, line 1639: TypeError: subject.QueryInterface is not a function
A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[Child 2508, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560820494/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
[Child]

I'd like to save the page as a file and, in the next step, parse the HTML to extract the reviews. However, saving fails when the page is large. If I exit the while loop after, say, 100 steps and save the page, it works fine.

python
selenium
firefox
marionette
geckodriver
asked on Stack Overflow Sep 11, 2019 by Sven Tenscher • edited Jun 29, 2020 by DebanjanB

1 Answer


NS_ERROR_FAILURE (0x80004005)

This is the most generic error code. It is returned for any failure to which no more specific error code applies.


However this error message...

selenium.common.exceptions.WebDriverException: Message: [Exception... "Failure"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://marionette/content/proxy.js :: sendReply_ :: line 275"  data: no]

...implies that Marionette threw an error while attempting to read/store/copy the page source.

Seeing the relevant HTML DOM would have made this easier to debug. However, the likely cause is that the page source is extremely large and exceeds the maximum size Marionette can handle. Possibly it's a much bigger string than it can send back in one reply.


Solution

A quick way to narrow this down is to skip assigning the page source to a variable and print it directly, to find out where the actual issue lies.

print(DRIVER.execute_script("return document.body.innerHTML"))

Or

print(DRIVER.page_source)
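
If the size of the returned string really is the problem, one thing worth trying is to pull the HTML across in smaller pieces instead of in one reply. The following is only a sketch under that assumption. The helper name fetch_body_html_in_chunks, the chunk_size default and the window.__htmlSnapshot property are made up for illustration, and it assumes a property set on window survives between execute_script calls. It will not help if the content process already crashes while serializing innerHTML.

```python
# JavaScript run in the page: serialize the body once and remember it,
# so that every slice is taken from the same snapshot.
_SNAPSHOT_JS = (
    "window.__htmlSnapshot = document.body.innerHTML;"
    "return window.__htmlSnapshot.length;"
)

# Return one slice plus the offset at which the next slice starts.
_SLICE_JS = """
var s = window.__htmlSnapshot;
var start = arguments[0];
var end = Math.min(start + arguments[1], s.length);
// Do not cut between the two halves of a surrogate pair (e.g. an emoji).
var last = s.charCodeAt(end - 1);
if (end < s.length && last >= 0xD800 && last <= 0xDBFF) { end += 1; }
return [s.slice(start, end), end];
"""

_CLEANUP_JS = "delete window.__htmlSnapshot;"


def fetch_body_html_in_chunks(driver, chunk_size=500000):
    """Return document.body.innerHTML, fetched in slices of about chunk_size.

    driver is any object with a Selenium-style execute_script(script, *args).
    chunk_size is counted in JavaScript string units; the default is a guess,
    not a documented Marionette limit.
    """
    total = driver.execute_script(_SNAPSHOT_JS)
    parts = []
    offset = 0
    while offset < total:
        # Each call returns [chunk, next_offset]; next_offset is always > offset
        chunk, offset = driver.execute_script(_SLICE_JS, offset, chunk_size)
        parts.append(chunk)
    driver.execute_script(_CLEANUP_JS)
    return "".join(parts)
```

In the question's loop this would replace the single execute_script call, for example innerHTML = fetch_body_html_in_chunks(DRIVER), followed by writing innerHTML to the file as before.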


answered on Stack Overflow Sep 11, 2019 by DebanjanB • edited May 27, 2020 by DebanjanB

User contributions licensed under CC BY-SA 3.0