Python Threading using concurrent.futures module

Question

I am new to the principles of threading, but I think I have a general idea of how to use it. I am trying to implement it in a system I am working on, but unfortunately it crashes/stops the system as soon as it reaches the threading code I've written.

Here's my code:

    import concurrent.futures
    import re
    import time

    # soup (a parsed BeautifulSoup document), conn and c (a database connection
    # and its cursor, presumably sqlite3) are assumed to be defined elsewhere.


    def get_edi():
        edi = []
        additional_edi_list = 0
        try:
            for info in list(soup.find_all('div', id='servinfo')):
                for description_list in list(info.find_all('dl')):
                    for description_term in list(description_list.find_all('dt')):
                        for a_links_description_term in list(description_term.find_all('a')):
                            for text_term in list(a_links_description_term):
                                # case-insensitive regex for "Required Services" at a word boundary
                                pattern = re.compile(r'\bRequired Services', flags=re.IGNORECASE)
                                # search() returns a Match object (truthy) or None
                                if pattern.search(text_term):
                                    for ul in list(description_list.find_all('ul', class_='unordered-bullet')):
                                        additional_edi_list += 1
                                        if additional_edi_list > 1:
                                            break  # avoids fetching the "additional services" list in some HTML files
                                        interval_counter = 0
                                        for li in list(ul('li')):
                                            interval_counter += 1
                                            list_item = li.text.split("See")
                                            list_item = list_item[0].split("(")
                                            list_item = list_item[0].rstrip()  # remove trailing whitespace
                                            edi.append(list_item)
            print("Every Distance Interval Records: " + str(len(edi)))
            thread_test(edi)
            edi.clear()
        except (ValueError, ZeroDivisionError, RuntimeError, TypeError, NameError) as error:
            print("-----get_every_distance_interval----- function error: " + str(error))


    def insert_to_edi(every_distance_interval):
        time.sleep(0.4)
        with conn:  # (with sqlite3, this commits on success and rolls back on an exception)
            for item in every_distance_interval:
                c.execute("INSERT INTO EveryDistanceInterval (service_text) VALUES (?)",
                          (item,))  # parameters must be passed as a tuple, hence (item,)
                conn.commit()


    def thread_test(edi):
        # map() calls insert_to_edi(item) once for each item in edi on a pool of
        # worker threads; leaving the with block waits for all calls to finish.
        with concurrent.futures.ThreadPoolExecutor() as executor:
            executor.map(insert_to_edi, edi)

My code crashes as soon as it reaches the threading line, and the process exits with:

Process finished with exit code -1073741819 (0xC0000005)
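A note that may help narrow this down: exit code 0xC0000005 is the Windows status code for an access violation, i.e. a crash in native code, which no Python `try`/`except` can catch. Ordinary Python exceptions behave differently inside a thread pool: `executor.map` stores an exception raised by a worker and re-raises it only when that result is read, so a call such as `executor.map(insert_to_edi, edi)` whose results are never consumed drops those errors silently. The minimal sketch below uses a hypothetical `might_fail` worker and made-up inputs to show both the silent case and one way to surface errors with `submit()` and `as_completed()`:

```python
import concurrent.futures


def might_fail(item):
    # Hypothetical worker: raises for one input to simulate a failing insert.
    if item == "bad":
        raise ValueError(f"could not insert {item!r}")
    return item.upper()


items = ["a", "bad", "c"]

# 1) Results never consumed: the ValueError is stored in a future and dropped.
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(might_fail, items)

# 2) Submit each call and check every future, so failures are reported.
results = []
errors = []
with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = {executor.submit(might_fail, item): item for item in items}
    for future in concurrent.futures.as_completed(futures):
        try:
            results.append(future.result())  # re-raises the worker's exception
        except ValueError as error:
            errors.append((futures[future], str(error)))

print(sorted(results))  # ['A', 'C']
print(errors)           # [('bad', "could not insert 'bad'")]
```

Checking every future like this turns a silently lost worker exception into a visible error, which can make it easier to tell a Python-level failure apart from a native crash.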

My question is: is there an idea or principle behind threading that I clearly don't understand, or am I just using the module wrong? I appreciate all of y'all's help. Thank you in advance!
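On the principle side, a common approach with a database is to keep all writes on a single thread. By default (`check_same_thread=True`), Python's `sqlite3` module raises `ProgrammingError` if a connection is used from a thread other than the one that created it, and even with that check turned off the documentation leaves it to the caller to serialize writes. The sketch below assumes the database is SQLite (the `?` placeholders suggest it, but the question does not say so) and uses an in-memory database plus a hypothetical `clean_item` helper that mirrors the text cleanup in `get_edi`. `executor.map` calls `clean_item` once per string, not once with the whole list; the worker threads only compute values, and the main thread inserts every row in one transaction with `executemany()`:

```python
import concurrent.futures
import sqlite3


def clean_item(raw_text):
    # Hypothetical helper mirroring get_edi's cleanup: keep the text before
    # "See" and before "(", then strip trailing whitespace.
    return raw_text.split("See")[0].split("(")[0].rstrip()


# Made-up sample input for illustration only.
raw_items = [
    "Replace engine oil (every 5,000 miles) See note 1",
    "Rotate tires See note 2",
]

# Worker threads only compute values; map() calls clean_item once per string.
with concurrent.futures.ThreadPoolExecutor() as executor:
    cleaned = list(executor.map(clean_item, raw_items))

# All database work happens on the thread that created the connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EveryDistanceInterval (service_text TEXT)")
with conn:  # commits once if the block succeeds, rolls back on an exception
    conn.executemany(
        "INSERT INTO EveryDistanceInterval (service_text) VALUES (?)",
        [(item,) for item in cleaned],  # one 1-tuple of parameters per row
    )

query = "SELECT service_text FROM EveryDistanceInterval ORDER BY rowid"
rows = [row[0] for row in conn.execute(query)]
print(rows)  # ['Replace engine oil', 'Rotate tires']
conn.close()
```

For a list of this size, a plain `executemany()` with no thread pool is likely enough; threads tend to pay off for the parts that wait on I/O, such as downloading pages, rather than for SQLite inserts.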

Tags: python, multithreading, web-scraping, threadpool

Asked on Stack Overflow on Dec 29, 2020 by Silverbells; edited on Dec 29, 2020 by marc_s

0 Answers

User contributions licensed under CC BY-SA 3.0