I have searched a lot for this error, on stack overflow and other websites but I cannot seem to find a solution to my problem.
Basically, I have a program that is in python
, and I am using python's module rpy2 for communicating with some R
functions, from python.
The problem is that when I run the code, sometimes, but not always I encounter this error. I am on windows. Sometimes when I restart my PC this code runs more exercises, but then eventually this error pops up again. What should I do ?
I have python 3.6.7
, with PyCharm 2018.3.3
. However I doubt the problem is from PyCharm
because when I run my program from the cmd
the same thing happens, except that the program halts directly without notifying me with the message "Process finished with exit code -1073741819 (0xC0000005)". This message only appears in PyCharm, but still.
I have rpy2 version 2.9.5
Code description
I do know, relatively, which part of the code is doing this, but I cannot optimize it more. In other words, In this part of the code, inside cross validation, I am over populating each of the train and validation sets in a certain way, and in order to do that, I am combining both X_train and y_train back into one data frame, overpopulating this data frame, and then getting back the updated, overpopulated, X_train and y_train, and performing my analysis on these overpopulated ones. I think combining both into numpy
arrays into a pandas
dataframe and then un-combining back is creating this memory error. Also its important to note that this is happening in each fold, and I'm doing a 10-folds-10-repeats cross validation. However, even when I run this on a Desktop PC rather than on my laptop the same thing happens, knowing that I have plenty of GBs left on my own laptop. I am doubting this is a python/rpy2 error ??
Code snippet
# I am calling this function inside each fold
df_combined = self.prepare_data(X_train, y_train)
and then after calling prepare_data()
I do as follows:
# THE apply_f1(), apply_f2(), apply_f3(), and apply_f4() ARE THE FUNCTIONS
# THAT USE rpy2 INTERNALLY
if self.f1:
X_train_inner, y_train_inner = self.apply_f1(df_combined)
elif self.f2:
X_train_inner, y_train_inner = self.apply_f2(df_combined)
elif self.f3:
X_train_inner, y_train_inner = self.apply_f3(df_combined)
else:
X_train_inner, y_train_inner = self.apply_f4(df_combined)
The prepare_data()
function:
def prepare_data(self, X_train, y_train):
'''
concatenates X_train_inner and y_train_inner into one, and make them a data frame
so we are able to process the data frame by SMOGN, RandUnder, GN, or SMOTER
'''
# reshape + rename
X_train_samp = X_train
y_train_samp = y_train.reshape(-1, 1)
# combine two numpy arrays together into one numpy array
combined = np.concatenate((X_train_samp, y_train_samp), axis=1)
# transform X_train + y_train into a pandas dataframe
column_names = self.other + [self.target_variable]
df_combined = pd.DataFrame(combined, columns=column_names)
# convert the combined pandas dataframe to R Data.Frame
df_combined = pandas2ri.py2ri(df_combined)
return df_combined
I have had this same error message "Process finished with exit code -1073741819 (0xC0000005)" with PyCharm 2021.1.
It happened because I selected Python 3.9 as an interpreter, while PyCharm was actually trying to use Python 3.10. And actually I had only Python 3.8 installed.
As far as I am concerned, the error disappeared after I selected Python 3.8 as an interpreter.
User contributions licensed under CC BY-SA 3.0