I have a pandas section inside a larger Python function that references some Django model objects. The bulk of the work is importing a CSV with pandas and adding a new column (evidence_id) that holds the same value as a string pulled from another Django table. I can create the new column from that unicode string, but the string becomes a Series of length one, while the rest of the CSV has about 430,000 rows. So when I add the 'evidence_id' column, only the first row gets the string value ('1'). I tried backfill, forward fill, and replace to get the '1' to repeat for every row in the 'evidence_id' column. I cannot hardcode '1', though, because the evidence_id will change; it will just always be the same value for all rows in the final dataframe. I printed the types of extract_properties['evidence_number'] and of the 'evidence' object built from it, in hopes that helps pin down the issue. Any ideas are greatly appreciated.
'''
Processes the extract; pulls directly from Django model
tables.
'''
evidence_obj, created = Evidence.objects.get_or_create(case=case_obj,
evidence_number=extract_properties['evidence_number'])
evidence = pd.Series(extract_properties['evidence_number'])
print type(extract_properties['evidence_number'])
# <type 'unicode'>
print type(evidence)
# dtype: object
print str(evidence)
# 0 1
cols = ['mft_entry', 'sequence_nbr', 'parent_mft', 'parent_sequence_nbr', 'SI_mdate', 'SI_mtime', 'SI_adate', 'SI_atime', 'SI_cdate', 'SI_ctime', 'SI_bdate', 'SI_btime', 'FN_mdate', 'FN_mtime', 'FN_adate', 'FN_atime',\
'FN_cdate', 'FN_ctime', 'FN_bdate', 'FN_btime', 'typeof', 'extension', 'size', 'nname', 'ppath', 'symbolic_link', 'object_id', 'ads_metadata', 'time_warning', 'shortfilename_mdate', 'shortfilename_mtime',\
'shortfilename_adate', 'shortfilename_atime', 'shortfilename_cdate', 'shortfilename_ctime', 'shortfilename_bdate', 'shortfilename_btime', 'extracted_filepath']
frame = pd.read_csv('media/tmp/file.csv', delimiter='|', skiprows=6, names=cols, na_values=['', ' ', 'na'], encoding = 'latin-1', parse_dates=True, iterator=True, dayfirst=True, chunksize=1000)
df = pd.concat(frame)
df.fillna('Null')
df['evidence_id'] = evidence
df['evidence_id'].replace('NaN', evidence)
print df.head()
# shortfilename_bdate shortfilename_btime extracted_filepath \
# 0 2014-03-20 17:58:44.625 0x002aceb4 -> 0x002aceb6
# 1 2014-03-20 17:58:44.688 0x0009a9c6 -> 0x0009a9c8
# 2 NaN 0x00002ca5 -> 0x00002ca7
# 3 NaN 0x00000640 -> 0x00000642
# 4 NaN 0x002a9b9f -> 0x002a9ba0
# evidence_id
# 0 1
# 1 NaN
# 2 NaN
# 3 NaN
# 4 NaN
import pandas as pd
import numpy as np
test = pd.DataFrame({'a':[1,np.nan,np.nan],'b':[1,5,2]})
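# Fill the NaNs in 'a' with the value from the first row.
# (.ix was removed in pandas 1.0; .iloc[0] selects the first row by position.)
test['a'] = test['a'].fillna(test['a'].iloc[0])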
So in your case the last chunk of code would look like this.
df = pd.concat(frame)
df['evidence_id'] = evidence
# Repeat the first row's value down the whole column (.iloc replaces the removed .ix).
df['evidence_id'] = df['evidence_id'].fillna(df['evidence_id'].iloc[0])
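A possible simplification, offered as a sketch rather than a drop-in fix: the NaNs appear because assigning a Series to a column aligns on the index, and a one-element Series only has index 0. Assigning the plain string instead should broadcast it to every row, so no fill step is needed. The three-row frame and the `evidence_number` variable below are hypothetical stand-ins for the real CSV data and for extract_properties['evidence_number'].

```python
import pandas as pd

# Hypothetical stand-in for the ~430,000-row frame read from the CSV.
df = pd.DataFrame({'mft_entry': [10, 11, 12]})

# Hypothetical stand-in for extract_properties['evidence_number'].
evidence_number = u'1'

# Assigning a one-element Series aligns on the index:
# only row 0 receives a value, the other rows become NaN.
aligned = df.copy()
aligned['evidence_id'] = pd.Series(evidence_number)

# Assigning the scalar itself broadcasts it to every row.
df['evidence_id'] = evidence_number
```

In the original function this would amount to df['evidence_id'] = extract_properties['evidence_number'], assuming that value is always a single string. Also note that fillna and replace return new objects by default, so their results have to be assigned back to take effect.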