I have searched a lot for this error, on Stack Overflow and other websites, but I cannot find a solution to my problem.
Basically, I have a Python program, and I am using the rpy2 module to call some R functions from Python.
The problem is that when I run the code, I sometimes (but not always) hit this error. I am on Windows. Sometimes, after I restart my PC, the code runs for longer, but eventually the error pops up again. What should I do?
I am using Python 3.6.7 with PyCharm 2018.3.3. However, I doubt the problem comes from PyCharm, because the same thing happens when I run my program from cmd, except that the program halts immediately without showing the message "Process finished with exit code -1073741819 (0xC0000005)". This message only appears in PyCharm.
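For reference, the two numbers in that message are the same value: the negative number is the signed 32-bit reading of 0xC0000005, which is the Windows STATUS_ACCESS_VIOLATION code. That points to a crash in native code rather than a Python MemoryError. A minimal sketch of the conversion (to_ntstatus is just a helper name made up for this illustration):

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit process exit code as an unsigned value."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement unsigned 32-bit representation.
    return exit_code & 0xFFFFFFFF


pycharm_exit_code = -1073741819
print(hex(to_ntstatus(pycharm_exit_code)))
```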
rpy2 version 2.9.5
I have a rough idea which part of the code triggers this, but I cannot optimize it any further. In this part of the code, inside cross-validation, I am overpopulating each of the train and validation sets in a certain way. To do that, I combine X_train and y_train back into one data frame, overpopulate this data frame, then split it back into the updated, overpopulated X_train and y_train and run my analysis on those. I suspect that combining the two NumPy arrays into a pandas DataFrame and then splitting them again is causing this memory error. It is also important to note that this happens in every fold, and I am doing 10-fold cross-validation with 10 repeats. However, the same thing happens when I run this on a desktop PC instead of my laptop, and my laptop has plenty of GB free. Could this be a Python/rpy2 bug?
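One way to narrow down which call is executing when the process dies might be Python's standard faulthandler module, which dumps the Python traceback on a fatal error such as an access violation. The sketch below is only a diagnostic aid under that assumption, not a fix; enable_crash_log and the log file name are hypothetical.

```python
import faulthandler


def enable_crash_log(path):
    """Write Python tracebacks of all threads to `path` on a fatal error."""
    # The file must stay open for as long as faulthandler is enabled,
    # so the handle is returned and should be kept by the caller.
    log_file = open(path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


# Hypothetical usage at the very start of the program:
# crash_log = enable_crash_log("crash_traceback.log")
```

The same handler can also be switched on without code changes by starting the interpreter with `python -X faulthandler`, in which case the traceback goes to stderr.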
    # I am calling this function inside each fold
    df_combined = self.prepare_data(X_train, y_train)
Then, after calling prepare_data(), I do the following:
    # THE apply_f1(), apply_f2(), apply_f3(), and apply_f4() ARE THE FUNCTIONS
    # THAT USE rpy2 INTERNALLY
    if self.f1:
        X_train_inner, y_train_inner = self.apply_f1(df_combined)
    elif self.f2:
        X_train_inner, y_train_inner = self.apply_f2(df_combined)
    elif self.f3:
        X_train_inner, y_train_inner = self.apply_f3(df_combined)
    else:
        X_train_inner, y_train_inner = self.apply_f4(df_combined)
    import numpy as np
    import pandas as pd
    from rpy2.robjects import pandas2ri

    def prepare_data(self, X_train, y_train):
        '''
        concatenates X_train_inner and y_train_inner into one, and make them a data frame
        so we are able to process the data frame by SMOGN, RandUnder, GN, or SMOTER
        '''
        # reshape + rename
        X_train_samp = X_train
        y_train_samp = y_train.reshape(-1, 1)

        # combine two numpy arrays together into one numpy array
        combined = np.concatenate((X_train_samp, y_train_samp), axis=1)

        # transform X_train + y_train into a pandas dataframe
        column_names = self.other + [self.target_variable]
        df_combined = pd.DataFrame(combined, columns=column_names)

        # convert the combined pandas dataframe to R Data.Frame
        df_combined = pandas2ri.py2ri(df_combined)

        return df_combined
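To check whether the combine/split step alone could be responsible, it can be exercised without rpy2. The sketch below repeats the NumPy-to-pandas round trip 100 times (10 folds x 10 repeats) on random data. The helper names combine and split, the array shapes, and the column names are all assumptions made up for this illustration; they are not taken from the real program.

```python
import numpy as np
import pandas as pd


def combine(X, y, feature_names, target_name):
    """Stack X (n, p) and y (n,) into one DataFrame, mirroring the pandas part of prepare_data()."""
    combined = np.concatenate((X, y.reshape(-1, 1)), axis=1)
    return pd.DataFrame(combined, columns=list(feature_names) + [target_name])


def split(df, target_name):
    """Recover X and y as NumPy arrays from the combined frame."""
    X = df.drop(columns=[target_name]).values
    y = df[target_name].values
    return X, y


# Hypothetical data: 200 rows, 5 features.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(200, 5))
y_train = rng.normal(size=200)
feature_names = ["f{}".format(i) for i in range(X_train.shape[1])]

for _ in range(10 * 10):  # 10 folds x 10 repeats
    df_combined = combine(X_train, y_train, feature_names, "target")
    X_back, y_back = split(df_combined, "target")
    # The round trip should return exactly the arrays that went in.
    assert np.array_equal(X_back, X_train)
    assert np.array_equal(y_back, y_train)
```

If a loop like this runs cleanly on the real data, the pure NumPy/pandas step is unlikely to be the cause, which would leave the conversion to an R data frame and the rpy2 calls as the remaining suspects.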