Splitting a .csv adds an additional column to each new .csv


I use the following code to split a .csv file based on the values in column 8 of the main .csv:

import csv
import pandas as pd

def spliteCsv(input, output):
    print(input)
    # Collect the distinct values in column 8; empty cells are recorded as -1.
    data = set()
    with open(input) as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        next(file, None)  # skip the header row
        for row in file:
            if row[7] == '':
                data.add(-1)
            else:
                data.add(int(row[7]))

    data = list(data)
    ofile = pd.read_csv(input, sep=',')
    # Extra key (max + 1) that the fillna below assigns to rows with an empty column 8.
    data.append(max(data) + 1)
    for d in data:
        csv_temp = ofile[ofile['col8'].fillna(max(data)).astype(int) == d]
        csv_temp.to_csv('%s_%s.csv' % (output, d), sep=',')
    return

Here is what I need:

col1  col2  col3  col4  col5  col6  col7  col8  col9 
1     a     k8                            5 
2     j     l9                            5
3     k     o0                            5
4     l     m7                            5

And here is what the code outputs:

col0  col1  col2  col3  col4  col5  col6  col7  col8  col9 
0     1     a     k8                            5 
1     2     j     l9                            5
2     3     k     o0                            5
3     4     l     m7                            5

As you can see, it inserts an additional first column whose values equal col1 - 1.

Edit:

source.csv:

frame.number    frame.time_epoch        ip.src          ip.dst      tcp.srcport     tcp.dstport     tcp.seq     tcp.stream      frame.len       tcp.flags       _ws.col.Info
    1           1501756607          192.168.1.10    37.48.64.201        47159           7095        1               1           215             0x00000018      47159 → 7095 [PSH, ACK] Seq=1 Ack=1 Win=2235 Len=149 TSval=19928932 TSecr=2777283254
    2           1501756607          37.48.64.201    192.168.1.10        7095            47159       1               2           66              0x00000010      7095 → 47159 [ACK] Seq=1 Ack=150 Win=91 Len=0 TSval=2777285491 TSecr=19928932
    3           1501756607          37.48.64.201    192.168.1.10        7095            47159       1               1           215             0x00000018      7095 → 47159 [PSH, ACK] Seq=1 Ack=150 Win=91 Len=149 TSval=2777285491 TSecr=19928932
    4           1501756607          192.168.1.10    37.48.64.201        47159           7095        150             2           215             0x00000018      47159 → 7095 [PSH, ACK] Seq=150 Ack=150 Win=2235 Len=149 TSval=19928977 TSecr=2777285491
    5           1501756607          192.168.1.10    37.48.64.201        47159           7095        299             2           343             0x00000018      47159 → 7095 [PSH, ACK] Seq=299 Ack=150 Win=2235 Len=277 TSval=19928979 TSecr=2777285491
    6           1501756607          37.48.64.201    192.168.1.10        7095            47159       150                         66              0x00000010      7095 → 47159 [ACK] Seq=150 Ack=576 Win=91 Len=0 TSval=2777285537 TSecr=19928977

Expected output files:

file 1:

frame.number    frame.time_epoch        ip.src          ip.dst      tcp.srcport     tcp.dstport     tcp.seq     tcp.stream      frame.len       tcp.flags       _ws.col.Info
    1           1501756607          192.168.1.10    37.48.64.201        47159           7095        1               1           215             0x00000018      47159 → 7095 [PSH, ACK] Seq=1 Ack=1 Win=2235 Len=149 TSval=19928932 TSecr=2777283254
    3           1501756607          37.48.64.201    192.168.1.10        7095            47159       1               1           215             0x00000018      7095 → 47159 [PSH, ACK] Seq=1 Ack=150 Win=91 Len=149 TSval=2777285491 TSecr=19928932

file 2:

frame.number    frame.time_epoch        ip.src          ip.dst      tcp.srcport     tcp.dstport     tcp.seq     tcp.stream      frame.len       tcp.flags       _ws.col.Info
    2           1501756607          37.48.64.201    192.168.1.10        7095            47159       1               2           66              0x00000010      7095 → 47159 [ACK] Seq=1 Ack=150 Win=91 Len=0 TSval=2777285491 TSecr=19928932
    4           1501756607          192.168.1.10    37.48.64.201        47159           7095        150             2           215             0x00000018      47159 → 7095 [PSH, ACK] Seq=150 Ack=150 Win=2235 Len=149 TSval=19928977 TSecr=2777285491
    5           1501756607          192.168.1.10    37.48.64.201        47159           7095        299             2           343             0x00000018      47159 → 7095 [PSH, ACK] Seq=299 Ack=150 Win=2235 Len=277 TSval=19928979 TSecr=2777285491

file 3:

frame.number    frame.time_epoch        ip.src          ip.dst      tcp.srcport     tcp.dstport     tcp.seq     tcp.stream      frame.len       tcp.flags       _ws.col.Info
    6           1501756607          37.48.64.201    192.168.1.10        7095            47159       150             3           66              0x00000010      7095 → 47159 [ACK] Seq=150 Ack=576 Win=91 Len=0 TSval=2777285537 TSecr=19928977
python-3.x
pandas
csv
asked on Stack Overflow Aug 16, 2017 by user3806649 • edited Aug 16, 2017 by user3806649

1 Answer


Use the index=False parameter, so that to_csv does not write the DataFrame index as an extra first column:

csv_temp.to_csv ('%s_%s.csv'%(output,d),sep=',', index=False)
# NOTE:                                          ^^^^^^^^^^^
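
To see the effect in isolation, here is a minimal sketch that writes the same frame with and without the index. The two-row frame and the variable names are made up for illustration and stand in for one of the split chunks:

```python
import io
import pandas as pd

# Hypothetical two-row frame standing in for one of the split chunks.
df = pd.DataFrame({"col1": [1, 2], "col2": ["a", "j"]})

with_index = io.StringIO()
df.to_csv(with_index)                  # default index=True: row labels come first

without_index = io.StringIO()
df.to_csv(without_index, index=False)  # only the real columns are written

# Compare the header lines of both outputs.
print(with_index.getvalue().splitlines()[0])     # ,col1,col2
print(without_index.getvalue().splitlines()[0])  # col1,col2
```

The leading empty field in the first header is the unnamed index column, which is what shows up as the extra first column in the question.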

UPDATE:

df = pd.read_csv('/path/to/source/file.csv')

df['tcp.stream'] = pd.to_numeric(df['tcp.stream'], errors='coerce').fillna(-1)

# please set desired path and file name in the next line 
output_path_template = 'd:/temp/tcp.stream.{}.csv'

df.groupby('tcp.stream') \
  .apply(lambda x: x.to_csv(output_path_template.format(x.name), index=False))
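
If you prefer an explicit loop over apply, the same split could be sketched as below. This is only an illustration under some assumptions: split_by_column, the sample frame and the temporary directory are hypothetical names, not part of your data. The keys are cast to int because a column that contains missing values is float after to_numeric, which may otherwise give file names such as tcp.stream.1.0.csv:

```python
import os
import tempfile
import pandas as pd

def split_by_column(df, column, path_template):
    """Write one CSV per distinct value of `column`; return the paths written."""
    # Non-numeric or missing values become -1; the int cast keeps file names
    # as "1", "2", ... instead of "1.0", "2.0".
    keys = pd.to_numeric(df[column], errors="coerce").fillna(-1).astype(int)
    paths = []
    for key, group in df.groupby(keys):
        path = path_template.format(key)
        group.to_csv(path, index=False)  # index=False avoids the extra column
        paths.append(path)
    return paths

# Hypothetical miniature of the source data (the last row has no tcp.stream).
sample = pd.DataFrame({
    "frame.number": [1, 2, 3, 4, 5, 6],
    "tcp.stream": [1, 2, 1, 2, 2, None],
})
out_dir = tempfile.mkdtemp()
written = split_by_column(sample, "tcp.stream",
                          os.path.join(out_dir, "tcp.stream.{}.csv"))
print([os.path.basename(p) for p in written])
```

Note that this variant groups on a separate key series, so the tcp.stream column inside each output file keeps its original values; rows with a missing value land in the -1 file with an empty tcp.stream cell.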
answered on Stack Overflow Aug 16, 2017 by MaxU • edited Aug 16, 2017 by MaxU

User contributions licensed under CC BY-SA 3.0