Process finished with exit code -1073741819 (0xC0000005) - Rpy2

I have searched extensively for this error, on Stack Overflow and other websites, but I cannot find a solution to my problem.

I have a Python program that uses the rpy2 module to call some R functions from Python.

The problem is that when I run the code, I sometimes, but not always, encounter this error. I am on Windows. Sometimes, after I restart my PC, the code runs for longer before failing, but eventually the error appears again. What should I do?

I have Python 3.6.7 with PyCharm 2018.3.3. However, I doubt the problem comes from PyCharm, because the same thing happens when I run my program from cmd, except that the program halts immediately without showing the message "Process finished with exit code -1073741819 (0xC0000005)". That message appears only in PyCharm.
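For reference, here is a minimal sketch of how the two numbers in that message relate. It assumes only that the exit code is a 32-bit value printed as a signed integer; the helper name to_unsigned32 is made up for illustration. On Windows, 0xC0000005 is the status code for an access violation.

```python
def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


# The signed value reported by PyCharm
signed_code = -1073741819

# Prints the same code in the hexadecimal form shown in the message
print(hex(to_unsigned32(signed_code)))
```

Both forms therefore refer to the same exit code, which suggests a crash in native code rather than an ordinary Python exception.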

I have rpy2 version 2.9.5.

Code description

I know roughly which part of the code causes this, but I cannot optimize it further. In this part of the code, inside cross-validation, I overpopulate each of the training and validation sets in a certain way. To do that, I combine X_train and y_train back into one data frame, overpopulate that data frame, then extract the updated, overpopulated X_train and y_train and perform my analysis on them. I think that combining the two numpy arrays into a pandas data frame and then splitting them again is what creates this memory error. It is also important to note that this happens in every fold, and I am doing 10-fold cross-validation with 10 repeats. However, the same thing happens when I run this on a desktop PC rather than on my laptop, and my laptop has plenty of GBs free. Could this be a Python/rpy2 error?

Code snippet

# I am calling this function inside each fold
df_combined = self.prepare_data(X_train, y_train)

After calling prepare_data(), I do the following:

# apply_f1(), apply_f2(), apply_f3(), and apply_f4() are the functions
# that use rpy2 internally
if self.f1:
    X_train_inner, y_train_inner = self.apply_f1(df_combined)
elif self.f2:
    X_train_inner, y_train_inner = self.apply_f2(df_combined)
elif self.f3:
    X_train_inner, y_train_inner = self.apply_f3(df_combined)
else:
    X_train_inner, y_train_inner = self.apply_f4(df_combined)

The prepare_data() function:

    def prepare_data(self, X_train, y_train):
        '''
        Concatenates X_train and y_train into one array and converts it to a data frame
        so that it can be processed by SMOGN, RandUnder, GN, or SMOTER.
        '''

        # reshape + rename
        X_train_samp = X_train
        y_train_samp = y_train.reshape(-1, 1)

        # combine two numpy arrays together into one numpy array
        combined = np.concatenate((X_train_samp, y_train_samp), axis=1)

        # transform X_train + y_train into a pandas dataframe
        column_names = self.other + [self.target_variable]
        df_combined = pd.DataFrame(combined, columns=column_names)

        # convert the combined pandas dataframe to an R data.frame
        df_combined = pandas2ri.py2ri(df_combined)

        return df_combined
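
To make the combine-and-split round trip described above concrete, here is a minimal sketch of just the numpy/pandas part, with the rpy2 conversion left out. The function names combine and split, the column names, and the toy data are hypothetical and are not part of my actual program.

```python
import numpy as np
import pandas as pd


def combine(X, y, feature_names, target_name):
    """Stack the features and the target into one pandas data frame."""
    y_col = np.asarray(y).reshape(-1, 1)           # target as a column vector
    combined = np.concatenate((X, y_col), axis=1)  # shape: (n_rows, n_features + 1)
    return pd.DataFrame(combined, columns=list(feature_names) + [target_name])


def split(df, target_name):
    """Recover the feature matrix and the target vector from the data frame."""
    X = df.drop(columns=[target_name]).to_numpy()
    y = df[target_name].to_numpy()
    return X, y


# Toy data standing in for one cross-validation fold (hypothetical values)
X = np.arange(6, dtype=float).reshape(3, 2)
y = np.array([10.0, 20.0, 30.0])

df = combine(X, y, ["a", "b"], "target")
X_back, y_back = split(df, "target")
```

In this sketch the round trip only allocates a few new arrays per fold, so running it on its own with the real data might help separate the pandas step from the rpy2 calls.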

Tags: python, r, pycharm, cross-validation, rpy2
asked on Stack Overflow Feb 29, 2020 by Perl • edited Feb 29, 2020 by Perl

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0