Below is a sample of our software's log file. I would like to analyze this data with R to gain some insight.
30-Mar-14 17:59:58.1244 (6628 6452) Module1.exe:Program1.cpp,v:854: ERROR: group 7 failed with error = 0x8004000f
30-Mar-14 17:59:58.1254 (6628 6452) Module1.exe:Program1.cpp,v:880: ERROR: group 7 failed on its 3 retry
30-Mar-14 18:00:04.8491 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 1
30-Mar-14 18:00:08.6213 ( -1 1376 13900) Module2.exe:Execute:603: Information - command 1 completed.
30-Mar-14 18:00:08.6273 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 2
Each log file contains 20k lines, and we have many log files.
I need to split each line into fields as follows.
| 30-Mar-14 | 17:59:58.1244 | (6628 6452) | Module1.exe:Program1.cpp,v | :854: | ERROR: group 7 failed with error = 0x8004000f |
I tried to import this dataset using "Import Dataset" --> "From File" in RStudio, with the different options available there, but it was unable to recognize the fields. Is there an option to split based on patterns or regular expressions?
Software environment:
Note: I have edited the log file to remove real module names.
There is no such option in the GUI itself (unlike Excel or SPSS, for instance, which might have more powerful GUI import options). You need a script for that.
You can construct a regular expression with placeholders (capture groups) that matches all lines, and call gsub to extract the values in the placeholders. For instance:
text <- readLines("log.log")
rx <- "^([0-9]+-[^-]+-[0-9]+) +([0-9]+:[0-9]+:[0-9]+[.][0-9]+) +.*$"
stopifnot(grepl(rx, text))
And then:
date <- gsub(rx, "\\1", text)
time <- gsub(rx, "\\2", text)
date.time.df <- data.frame(date, time)
Or:
date.time <- gsub(rx, "\\1\n\\2", text)
date.time.l <- strsplit(date.time, "\n")
do.call(rbind, date.time.l)
Enhance rx to match the other fields.
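One way to extend the pattern to all six fields is sketched here. It assumes that the source field (e.g. Module1.exe:Program1.cpp,v) never contains a space and that every line follows the layout of the sample; the names rx_full and log_df are illustrative only. Note that the line number is captured without its surrounding colons.

```r
# Sample lines copied from the question; in practice use readLines("log.log").
text <- c(
  "30-Mar-14 17:59:58.1244 (6628 6452) Module1.exe:Program1.cpp,v:854: ERROR: group 7 failed with error = 0x8004000f",
  "30-Mar-14 18:00:04.8491 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 1"
)

# One capture group per field (assumed layout, based on the sample only).
rx_full <- paste0(
  "^([0-9]+-[A-Za-z]+-[0-9]+) +",       # date, e.g. 30-Mar-14
  "([0-9]+:[0-9]+:[0-9]+[.][0-9]+) +",  # time, e.g. 17:59:58.1244
  "(\\([^)]*\\)) +",                    # parenthesised ids
  "([^ ]+):",                           # source (assumed to contain no spaces)
  "([0-9]+): +",                        # line number
  "(.*)$"                               # message
)

# regexec() returns the full match plus the six groups for each line.
m <- regmatches(text, regexec(rx_full, text))
stopifnot(lengths(m) == 7)  # fail early if any line does not match

fields <- do.call(rbind, m)[, -1, drop = FALSE]  # drop the full-match column
colnames(fields) <- c("date", "time", "ids", "source", "line", "message")
log_df <- as.data.frame(fields, stringsAsFactors = FALSE)
log_df$line <- as.integer(log_df$line)
```

Lines that do not match produce an empty element in m, so the stopifnot check catches malformed lines before the data frame is built.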
Here is a script that will do it:
x <- scan(text = "30-Mar-14 17:59:58.1244 (6628 6452) Module1.exe:Program1.cpp,v:854: ERROR: group 7 failed with error = 0x8004000f
30-Mar-14 17:59:58.1254 (6628 6452) Module1.exe:Program1.cpp,v:880: ERROR: group 7 failed on its 3 retry
30-Mar-14 18:00:04.8491 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 1
30-Mar-14 18:00:08.6213 ( -1 1376 13900) Module2.exe:Execute:603: Information - command 1 completed.
30-Mar-14 18:00:08.6273 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 2",
what = '', sep = '\n')
# pull off date/time
dateTime <- sapply(strsplit(x, ' '), '[', 1:2)
# piece together with "|"
dateTime <- apply(dateTime, 2, paste, collapse = "|")
newX <- sub("^[^ ]+ [^(]+", "", x)
# extract the data in parentheses
par1 <- sub("(\\([^)]+\\)).*", "\\1", newX)
newX <- sub("[^)]+\\)", "", newX) # remove data just matched
# parse the rest of the data
x <- strsplit(newX, ":")
y <- sapply(x, function(.line){
    paste(c(paste(c(.line[1], .line[2]), collapse = ":")
        , paste0(":", .line[3], ":")
        , paste(.line[-(1:3)], collapse = ":")
        ), collapse = "|")
})
# put it all back together
paste0("|"
, dateTime
, "|"
, par1
, "|"
, y
, "|"
)
Here is the output of the script:
[1] "|30-Mar-14|17:59:58.1244|(6628 6452)| Module1.exe:Program1.cpp,v|:854:| ERROR: group 7 failed with error = 0x8004000f|"
[2] "|30-Mar-14|17:59:58.1254|(6628 6452)| Module1.exe:Program1.cpp,v|:880:| ERROR: group 7 failed on its 3 retry|"
[3] "|30-Mar-14|18:00:04.8491|( -1 1376 13900)| Module2.exe:Execute|:803:| Information - Executing command 1|"
[4] "|30-Mar-14|18:00:08.6213|( -1 1376 13900)| Module2.exe:Execute|:603:| Information - command 1 completed.|"
[5] "|30-Mar-14|18:00:08.6273|( -1 1376 13900)| Module2.exe:Execute|:803:| Information - Executing command 2|"
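For time-based analysis, the separate date and time fields can be combined into a single timestamp. This is a minimal sketch, assuming the month abbreviations are English (the %b format is locale-dependent) and treating the times as UTC, which the log itself does not state; the variable names are illustrative.

```r
# Date and time fields taken from the first and third sample lines.
date <- c("30-Mar-14", "30-Mar-14")
time <- c("17:59:58.1244", "18:00:04.8491")

# %OS reads seconds including the fractional part; tz = "UTC" is an assumption.
ts <- as.POSIXct(paste(date, time),
                 format = "%d-%b-%y %H:%M:%OS", tz = "UTC")
stopifnot(!is.na(ts))  # NA here usually means a locale or format mismatch

# Elapsed seconds between the two events.
gap <- as.numeric(difftime(ts[2], ts[1], units = "secs"))

# Print with four decimal places of seconds.
format(ts, "%Y-%m-%d %H:%M:%OS4")
```

Keeping timestamps as POSIXct makes it straightforward to sort events, compute gaps between them, or aggregate errors per time interval.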