Pandas Fill 'NaN' methods not working

0

I have a section of pandas code inside a larger Python function that references some Django model objects. Most of the work is importing a CSV with pandas and then adding a new column (evidence_id) that should hold the same value in every row. That value is a string that comes from another Django table. I can create the new 'evidence_id' column from the unicode string pulled from my Django models, but the resulting Series has length one, while the rest of the CSV has about 430,000 rows. So when I bring in the new 'evidence_id' column, only the first row has the string value ('1'). I tried backfill, forward fill, and replace to get the '1' to repeat for every row in the 'evidence_id' column. I cannot hardcode '1', though, because the 'evidence_id' will change; it will just always be the same value for all rows in the final dataframe. I printed the type of extract_properties['evidence_number'] and of the Series built from it, in the hope that it helps discern my issue. Any ideas are greatly appreciated.

    '''
    Processes the extract; pulls directly from Django model tables.
    '''
evidence_obj, created = Evidence.objects.get_or_create(case=case_obj, 
evidence_number=extract_properties['evidence_number'])
evidence = pd.Series(extract_properties['evidence_number'])
print type(extract_properties['evidence_number'])
# <type 'unicode'>
print type(evidence)
# dtype: object
print str(evidence)
# 0    1

cols = ['mft_entry', 'sequence_nbr', 'parent_mft', 'parent_sequence_nbr', 'SI_mdate', 'SI_mtime', 'SI_adate', 'SI_atime', 'SI_cdate', 'SI_ctime', 'SI_bdate', 'SI_btime', 'FN_mdate', 'FN_mtime', 'FN_adate', 'FN_atime',\
    'FN_cdate', 'FN_ctime', 'FN_bdate', 'FN_btime', 'typeof', 'extension', 'size', 'nname', 'ppath', 'symbolic_link', 'object_id', 'ads_metadata', 'time_warning', 'shortfilename_mdate', 'shortfilename_mtime',\
    'shortfilename_adate', 'shortfilename_atime', 'shortfilename_cdate', 'shortfilename_ctime', 'shortfilename_bdate', 'shortfilename_btime', 'extracted_filepath']

frame = pd.read_csv('media/tmp/file.csv', delimiter='|', skiprows=6, names=cols, na_values=['', '          ', 'na'], encoding = 'latin-1', parse_dates=True, iterator=True, dayfirst=True, chunksize=1000)
df = pd.concat(frame)
df.fillna('Null')
df['evidence_id'] = evidence
df['evidence_id'].replace('NaN', evidence)
print df.head()

    #           shortfilename_bdate shortfilename_btime        extracted_filepath  \
    # 0          2014-03-20        17:58:44.625  0x002aceb4 -> 0x002aceb6
    # 1          2014-03-20        17:58:44.688  0x0009a9c6 -> 0x0009a9c8
    # 2                 NaN                      0x00002ca5 -> 0x00002ca7
    # 3                 NaN                      0x00000640 -> 0x00000642
    # 4                 NaN                      0x002a9b9f -> 0x002a9ba0

    #   evidence_id
    # 0           1
    # 1         NaN
    # 2         NaN
    # 3         NaN
    # 4         NaN
python
pandas
asked on Stack Overflow Jan 19, 2015 by PR102012

1 Answer

0
import pandas as pd
import numpy as np

test = pd.DataFrame({'a':[1,np.nan,np.nan],'b':[1,5,2]})
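# Fill every NaN in column 'a' with the value from the first row.
# .iloc[0] selects by position (the older .ix indexer has been removed from pandas).
test['a'] = test['a'].fillna(test['a'].iloc[0])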

So in your case, the last chunk of code would look like this:

df = pd.concat(frame)
df['evidence_id'] = evidence
# Fill the remaining rows with the value from the first row (selected by position).
df['evidence_id'] = df['evidence_id'].fillna(df['evidence_id'].iloc[0])
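
As a possible alternative, here is a minimal sketch of why only the first row was filled and how to avoid the fill step altogether. Assigning a Series to a column aligns on the index, so a one-element Series only reaches row 0. Assigning the plain string broadcasts it to every row. The small frame and the evidence_number variable below are hypothetical stand-ins for the real CSV and for extract_properties['evidence_number']. Note also that fillna and replace return a new object by default, so their result has to be assigned back, and that replace('NaN', ...) looks for the literal string 'NaN' rather than a missing value.

```python
import pandas as pd

# Hypothetical stand-ins for the real CSV data and the Django value.
df = pd.DataFrame({'mft_entry': [10, 11, 12]})
evidence_number = u'1'  # stands in for extract_properties['evidence_number']

# Assigning a one-element Series aligns on the index: only row 0 gets a value.
aligned = df.copy()
aligned['evidence_id'] = pd.Series(evidence_number)

# Assigning the scalar itself broadcasts it to every row.
df['evidence_id'] = evidence_number

# fillna returns a new DataFrame by default, so keep the result.
df = df.fillna('Null')
```

With this approach there is no need to build a Series from the evidence number first.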
answered on Stack Overflow Jan 20, 2015 by champagne_campaign

User contributions licensed under CC BY-SA 3.0