I've collected authentication processes in a ~5 GB log file. Now I want to alter every part of the data that could reveal where the original data came from, because it will be used as training data for machine learning (and may be published).
Since the logic within the data has to be preserved, I thought of changing the IP and MAC addresses with the modulo operator. But I don't know how to replace all of them quickly with Python (re?).
My first attempt was to use re.search, split the found IP into 4 parts, and change each part with a different modulo operation. The problems that occurred were:
- it's ugly
- it's slow
- it only handles the first match
Does anybody know a decent way to solve this problem?
______EDIT_____
Example-logs:
RID: "700011"; RL: "1"; RG: "windows,authentication_failures,"; RC: "A Kerberos authentication ticket was requested: Failure."; USER: "(no user)"; SRCIP: "None"; HOSTNAME: "(boatyMcBoatface) 10.19.18.1->WinEvtLog"; LOCATION: "(boatyMcBoatface) 10.19.18.1->WinEvtLog"; EVENT: "[INIT]2018 Aug 01 01:59:40 WinEvtLog: Security: AUDIT_FAILURE(4768): Microsoft-Windows-Security-Auditing: (no user): no domain: boatyMcBoatface.haven.ssh: A Kerberos authentication ticket (TGT) was requested. Account Information: Account Name: BackupNow Supplied Realm Name: haven.ssh User ID: S-1-0-0 Service Information: Service Name: krbtgt/haven.ssh Service ID: S-1-0-0 Network Information: Client Address: ::ffff:10.15.16.166 Client Port: 53680 Additional Information: Ticket Options: 0x40810010 Result Code: 0x17 Ticket Encryption Type: 0xffffffff Pre-Authentication Type: - Certificate Information: Certificate Issuer Name: Certificate Serial Number: Certificate Thumbprint: Certificate information is only provided if a certificate was used for pre-authentication. Pre-authentication types, ticket options, encryption types and result codes are defined in RFC 4120.[END]"; ' plugin_sid='700011' proto='6' ctx='192222c3-2222-22222222-422222226754' src_host='' dst_host='' src_net='19111112c3-2222-22222222-422222226754' dst_net='333333a8-f526-1356-bbbe-005022285e074' username='BackupNow' userdata1='1' userdata2='windows,authentication_failures,' userdata3='A Kerberos authentication ticket was requested: Failure.' userdata4='krbtgt/haven.ssh' userdata5='0x17' userdata6='0xffffffff' userdata7='-' userdata9='haven.ssh' device='10.19.18.1'/>ost_dst='boatyMcBoatface' idm_mac_src='12:E4:B1:2B:B3:BB' idm_mac_dst='12:E4:B1:2B:B3:BB' device='10.19.19.23'/>
RID: "700003"; RL: "5"; RG: "windows,"; RC: "Windows Network Logon"; USER: "evservice"; SRCIP: "10.3.3.39"; HOSTNAME: "(boatyMcBoatface) 10.19.19.23->WinEvtLog"; LOCATION: "(boatyMcBoatface) 10.19.19.23->WinEvtLog"; EVENT: "[INIT]2018 Aug 01 01:59:37 WinEvtLog: Security: AUDIT_SUCCESS(4624): Microsoft-Windows-Security-Auditing: evservice: SSI-LOG: boatyMcBoatface.haven.ssh: An account was successfully logged on. Subject: Security ID: S-1-0-0 Account Name: - Account Domain: - Logon ID: 0x0 Logon Type: 3 New Logon: Security ID: S-1-5-21-88886292-694438636-1307214239-9687 Account Name: myservice Account Domain: MY-LOG Logon ID: 0x226aa299c6 Logon GUID: {0354E718-498F-039C-83C2-725752D013BE} Process Information: Process ID: 0x0 Process Name: - Network Information: Workstation Name: Source Network Address: 10.3.3.39 Source Port: 61266 Detailed Authentication Information: Logon Process: Kerberos Authentication Package: Kerberos Transited Services: - Package Name (NTLM only): - Key Length: 0 This event is generated when a logon session is created. It is generated on the computer that was accessed. [END]"; ' plugin_sid='700003' proto='6' ctx='584a8883-a333-22a6-adde-000000876224' src_host='' dst_host='aaaaaaa-2ebf-e2ea-eee-e053079999ed' src_net='555555-f226-11e6-bbbb-005056876974' dst_net='666666de-2be4-8242-1d75-45b6aaaaaaaa' username='myservice' userdata1='5' userdata2='windows,' userdata3='Windows Network Logon' userdata4='4624' userdata5='3' userdata6='MY-LOG' userdata7='0x226cb22322' userdata8='-' idm_host_dst='boatyMcBoatface' idm_mac_dst='A1:15:14:AB:1C:1D' device='10.19.19.23'/>
RID: "700014"; RL: "1"; RG: "windows,authentication_failures,"; RC: "Kerberos user pre-authentication failed."; USER: "(no user)"; SRCIP: "None"; HOSTNAME: "(my-dc02) 22.22.65.6->WinEvtLog"; LOCATION: "(my-dc02) 22.22.65.6->WinEvtLog"; EVENT: "[INIT]2018 Aug 01 09:04:50 WinEvtLog: Security: AUDIT_FAILURE(4771): Microsoft-Windows-Security-Auditing: (no user): no domain: my-dc02.my.ssh: Kerberos pre-authentication failed. Account Information: Security ID: S-1-5-21-1993962763-602162358-1801674531-2146 Account Name: sys-dobackup Service Information: Service Name: krbtgt/gb Network Information: Client Address: ::ffff:22.22.1.1 Client Port: 61391 Additional Information: Ticket Options: 0x40810010 Failure Code: 0x18 Pre-Authentication Type: 2 Certificate Information: Certificate Issuer Name: Certificate Serial Number: Certificate Thumbprint: Certificate information is only provided if a certificate was used for pre-authentication. Pre-authentication types, ticket options and failure codes are defined in RFC 4120. If the ticket was malformed or damaged during transit and could not be decrypted, then many fields in this event might not be present.[END]"; ' plugin_sid='700014' proto='6' ctx='aaaaaaa-e2cf-12a9-9c1f-288888a5c27' src_host='aaaaaa3-ff38-22e6-b718-01544442f94' dst_host='55555ec3-ff20-5515-8059-0011111a2b4' src_net='a6d1111d-7111-811d-f35-f4ea131269107' dst_net='44449bea-960c-4446-6f444-d4444f159b8' username='sys-dobackup' userdata1='1' userdata2='windows,authentication_failures,' userdata3='Kerberos user pre-authentication failed.' userdata4='4771' userdata5='2' userdata6='krbtgt/gb' userdata7='0x18' idm_host_src='do-dc01' idm_host_dst='my-dc02' idm_mac_src='11:30:22:37:33:63' idm_mac_dst='22:21:56:44:14:21' device='22.22.65.6'/>
____EDIT_2___
Example:
_____Before____
1 date time src_ip=192.168.1.1 dst_ip=192.168.1.2 msg
2 date time src_ip=192.168.1.1 dst_ip=192.168.1.3 msg
3 date time src_ip=192.168.1.9 dst_ip=192.168.1.2 msg
_____After_____
1 date time src_ip=1.168.1.2 dst_ip=1.168.1.3 msg
2 date time src_ip=1.168.1.2 dst_ip=1.168.1.4 msg
3 date time src_ip=1.168.1.10 dst_ip=1.168.1.3 msg
My_garbage_code:
import re

# Raw strings keep the backslashes in the Windows paths literal
file = r"C:\Users\Hank\Desktop\Huge.log"
file2 = r"C:\Users\Hank\Desktop\Huge2.log"
searchstring = "some_regex_magic"

with open(file) as f:
    for line in f:
        result = re.findall(searchstring, line)
        if result:
            # old_ip and anonymize_em_all are placeholders, not defined anywhere
            ip = old_ip + anonymize_em_all
            # No idea how to add them back into the string at the correct position
            # Replace them directly maybe, without writing a new file?
            res2 = "+ip+\n"
            # The with block closes the file, so no explicit close() is needed
            with open(file2, "a") as myfile:
                myfile.write(res2)
best regards
Try the code below. It's rough around the edges, but it does the replacement.
import re

lines = ["1 date time src_ip=192.168.1.1 dst_ip=192.168.1.2 msg",
         "2 date time src_ip=192.168.1.1 dst_ip=192.168.1.3 msg",
         "3 date time src_ip=192.168.1.9 dst_ip=192.168.1.2 msg"]

# re.sub replaces every IPv4-looking match in the line, not just the first one
for line in lines:
    print(re.sub(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", "x.x.x.x", line))
Sample output:
1 date time src_ip=x.x.x.x dst_ip=x.x.x.x msg
2 date time src_ip=x.x.x.x dst_ip=x.x.x.x msg
3 date time src_ip=x.x.x.x dst_ip=x.x.x.x msg
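Replacing every address with x.x.x.x throws away which hosts talk to each other. If those relationships have to survive, re.sub also accepts a function as the replacement: it is called once per match and returns the text that goes back in the same position. Here is a minimal sketch of that idea, assuming a per-octet mapping of the form (a * octet + b) % 256, in the spirit of the modulo approach from the question. The name anonymize_ip and the numbers in OCTET_KEYS are made up for illustration, and such a simple mapping is probably easy to reverse, so treat it as a starting point rather than a vetted anonymization scheme.

```python
import re

# Four dot-separated groups of 1 to 3 digits; the range check happens in the callback.
IP_RE = re.compile(r"\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b")

# Hypothetical secret (multiplier, offset) pairs, one per octet.
# Multipliers must be odd so the mapping modulo 256 stays one-to-one.
OCTET_KEYS = [(77, 13), (201, 5), (35, 240), (149, 99)]


def anonymize_ip(match):
    """Return the replacement text for one matched IPv4 address."""
    octets = [int(group) for group in match.groups()]
    if any(octet > 255 for octet in octets):
        return match.group(0)  # not a valid IPv4 address, leave it untouched
    mapped = [(a * octet + b) % 256 for octet, (a, b) in zip(octets, OCTET_KEYS)]
    return ".".join(str(octet) for octet in mapped)


line = "1 date time src_ip=192.168.1.1 dst_ip=192.168.1.2 msg"
print(IP_RE.sub(anonymize_ip, line))
```

Because each octet is mapped independently, two addresses that share a prefix before the replacement still share one afterwards, and the same input address always produces the same output.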
Hope this helps! cheers!
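For the ~5 GB file and the MAC addresses mentioned in the question, the same substitution can be applied line by line while writing to a second file, so the whole log never has to sit in memory and the output file is opened only once. The following is a rough sketch, assuming a keyed hash (HMAC-SHA256, truncated to six bytes) as the pseudonym and UTF-8 compatible input. SECRET_KEY, anonymize_mac and anonymize_file are hypothetical names chosen for this example.

```python
import hashlib
import hmac
import re

# Six colon-separated hex pairs, e.g. 12:E4:B1:2B:B3:BB (timestamps like 01:59:40 do not match).
MAC_RE = re.compile(r"\b[0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5}\b")

# Hypothetical key; keep it private and out of the published data.
SECRET_KEY = b"change-me"


def anonymize_mac(match):
    """Map one MAC address to a stable pseudonym derived from a keyed hash."""
    original = match.group(0).upper().encode("ascii")
    digest = hmac.new(SECRET_KEY, original, hashlib.sha256).digest()
    return ":".join(f"{byte:02X}" for byte in digest[:6])


def anonymize_file(src_path, dst_path):
    """Stream src_path line by line and write the rewritten lines to dst_path."""
    # surrogateescape lets undecodable bytes pass through unchanged;
    # newline="" keeps the original line endings.
    with open(src_path, encoding="utf-8", errors="surrogateescape", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", errors="surrogateescape", newline="") as dst:
        for line in src:
            dst.write(MAC_RE.sub(anonymize_mac, line))
```

A call such as anonymize_file(file, file2), with the two paths from the question, would then produce the rewritten copy. Chaining a second substitution for IP addresses inside the loop follows the same pattern.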