Pandas drop_duplicates occasionally crashes with no exception


I'm using pandas with Python 2.7 (distributed with ArcGIS Desktop 10.7, which is why I can't use Python 3), running on Windows Server 2019.

In my code, I'm trying to remove duplicate rows from a CSV file (~100 MB):

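import pandas as pd

# Parse the four columns with explicit dtypes
df = pd.read_csv(csv_file_path, dtype={
    "COLUMN_1": float,
    "COLUMN_2": int,
    "COLUMN_3": int,
    "COLUMN_4": int
})
# Keep the first occurrence of each fully duplicated row
unique_df = df.drop_duplicates(keep="first")
unique_df.to_csv(another_csv_file_path, index=False)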
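One way to check whether the crash is tied to pandas' native code rather than to the data itself is to run the same "keep the first occurrence" deduplication in pure Python with the standard `csv` module. This is a diagnostic sketch under stated assumptions, not a fix: `dedupe_rows` and `dedupe_csv_file` are hypothetical helper names, rows are compared as raw strings (so values pandas would parse as equal, such as `1` and `1.0` in `COLUMN_1`, count as different here), and every distinct row is still held in memory.

```python
import csv


def dedupe_rows(rows):
    """Yield each row the first time it is seen (keep="first" semantics).

    Hypothetical helper: rows are compared as tuples of raw strings,
    not as parsed float/int values like pandas does after dtype conversion.
    """
    seen = set()
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            yield row


def dedupe_csv_file(in_file, out_file):
    """Copy the header, then only the first occurrence of each data row.

    Hypothetical helper: in_file and out_file are already-open file objects.
    """
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)
    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return
    writer.writerow(header)
    for row in dedupe_rows(reader):
        writer.writerow(row)
```

On Python 2.7 the `csv` module expects files opened in binary mode (`"rb"` / `"wb"`); on Python 3 open them in text mode with `newline=""`. If this version never crashes on the same input, that points toward native code in the pandas/NumPy build rather than the file contents.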

The problem is that this code sometimes works as expected, and other times it just crashes without any exception or warning: the process ends and none of the subsequent code is executed.
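A crash in native (C) code ends the process before Python can raise an exception, which would explain why nothing is reported. The standard-library `faulthandler` module (Python 3.3+) can dump the Python traceback when such a fatal error happens; on Python 3.6+ it also installs a handler for Windows exceptions such as access violations. On Python 2.7 this would require the third-party `faulthandler` backport from PyPI, and whether it catches this particular crash on Windows is an assumption, not something confirmed here. The log path is hypothetical; a file is used because a scheduled task has no console.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; a scheduled task has no visible console.
TRACE_PATH = os.path.join(tempfile.gettempdir(), "dedupe_crash_trace.log")

# Keep this file object open for the life of the process: faulthandler
# writes to its file descriptor directly when the fatal error occurs.
trace_file = open(TRACE_PATH, "w")
faulthandler.enable(file=trace_file, all_threads=True)

# ... the read_csv / drop_duplicates / to_csv code would run after this ...
```

If the process dies again, the log should show which Python line was executing (for example, whether it was `drop_duplicates` or `to_csv`).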

When I run my script from Windows Task Scheduler, the Last Run Result is 0xC0000005 (STATUS_ACCESS_VIOLATION) when it fails and 0x0 when it succeeds.
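The same exit code can show up as the unsigned value 3221225477 or the signed value -1073741819 depending on the tool reporting it; both are 0xC0000005. A minimal sketch, assuming the deduplication step is moved into its own script, runs that step as a child process so a native crash cannot take down the caller and the parent can log or retry. `STATUS_NAMES`, `describe_exit_code` and `run_step` are hypothetical names, and only the two codes from the question are mapped.

```python
import subprocess
import sys

# Exit codes seen in Task Scheduler, keyed by their unsigned 32-bit value
STATUS_NAMES = {
    0x0: "success",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def describe_exit_code(code):
    """Return a readable form of a Windows process exit code.

    Masking to 32 bits maps a signed report such as -1073741819 onto the
    same unsigned value (0xC0000005) as 3221225477.
    """
    unsigned = code & 0xFFFFFFFF
    name = STATUS_NAMES.get(unsigned, "unknown")
    return "0x%08X (%s)" % (unsigned, name)


def run_step(script_path):
    """Run a (hypothetical) step script with the current interpreter.

    A crash in the child only ends the child; the parent gets its exit code.
    """
    code = subprocess.call([sys.executable, script_path])
    return code, describe_exit_code(code)
```

This does not prevent the crash; it only makes a failed run visible and recoverable from a parent script.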

Any suggestions?

asked on Stack Overflow Dec 1, 2020 by isshp


User contributions licensed under CC BY-SA 3.0