Importing an unstructured software log file in R?


Below is a sample of our software's log file. I would like to analyze this data in R to gain some insight.

30-Mar-14 17:59:58.1244 (6628 6452) Module1.exe:Program1.cpp,v:854: ERROR: group 7 failed with error = 0x8004000f

30-Mar-14 17:59:58.1254 (6628 6452) Module1.exe:Program1.cpp,v:880: ERROR: group 7 failed on its 3 retry

30-Mar-14 18:00:04.8491 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 1

30-Mar-14 18:00:08.6213 ( -1 1376 13900) Module2.exe:Execute:603: Information - command 1 completed.

30-Mar-14 18:00:08.6273 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 2

Each log file contains 20k lines, and we have many log files.

I need each line split into fields as follows:

| 30-Mar-14 | 17:59:58.1244 | (6628 6452) | Module1.exe:Program1.cpp,v | :854: | ERROR: group 7 failed with error = 0x8004000f |

I tried to import this dataset using "Import Dataset" --> "From File" in RStudio, with the different options available there, but it was unable to recognize the fields. Is there an option to split based on patterns or regular expressions?


Software environment:

  • R language v3.0.3
  • RStudio
  • Windows 7

Note: I have edited the log file to remove real module names.

r
asked on Stack Overflow Apr 17, 2014 by Jeno Karthic • edited Jun 20, 2020 by Community

2 Answers


There is no such option in the GUI itself (unlike Excel or SPSS, for instance, which might have more powerful GUI import options). You need a script for that.

You can construct a regular expression with capture groups (placeholders) that matches every line, and call gsub to extract the value of each group. For instance:

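text <- readLines("log.log")
# capture group 1 = date (e.g. 30-Mar-14), group 2 = time (e.g. 17:59:58.1244)
rx <- "^([0-9]+-[^-]+-[0-9]+) +([0-9]+:[0-9]+:[0-9]+[.][0-9]+) +.*$"
# fail early if any line does not match the pattern
stopifnot(grepl(rx, text))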

And then:

date <- gsub(rx, "\\1", text)
time <- gsub(rx, "\\2", text)
date.time.df <- data.frame(date, time)

Or:

date.time <- gsub(rx, "\\1\n\\2", text)
date.time.l <- strsplit(date.time, "\n")
do.call(rbind, date.time.l)

Extend rx with further capture groups to match the other fields.
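
As a rough sketch of what such an extended pattern might look like, the snippet below uses one capture group per field and collects the groups into a data frame with regexec and regmatches. The names log_lines, rx_full and log_df, and the column names, are made up for illustration. It assumes the source field (e.g. Module1.exe:Program1.cpp,v) never contains a space, which holds for the sample lines but may not hold for every log.

```r
# Two sample lines copied from the question; in practice use readLines("log.log").
log_lines <- c(
  "30-Mar-14 17:59:58.1244 (6628 6452) Module1.exe:Program1.cpp,v:854: ERROR: group 7 failed with error = 0x8004000f",
  "30-Mar-14 18:00:04.8491 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 1"
)

# One capture group per field (hypothetical pattern, tune it to your logs).
rx_full <- paste0(
  "^([0-9]+-[A-Za-z]+-[0-9]+) +",       # date, e.g. 30-Mar-14
  "([0-9]+:[0-9]+:[0-9]+[.][0-9]+) +",  # time with fractional seconds
  "(\\([^)]*\\)) +",                    # ids in parentheses
  "([^ ]+):([0-9]+): *",                # source (assumed to have no spaces), then :line:
  "(.*)$"                               # remaining text is the message
)

# regexec() returns the full match plus one entry per capture group.
m <- regmatches(log_lines, regexec(rx_full, log_lines))

# Keep only lines that matched (full match + 6 groups = 7 elements).
matched <- vapply(m, length, integer(1)) == 7L
fields <- do.call(rbind, lapply(m[matched], function(p) p[-1]))

log_df <- data.frame(fields, stringsAsFactors = FALSE)
names(log_df) <- c("date", "time", "ids", "source", "line", "message")
log_df$line <- as.integer(log_df$line)
```

Lines that do not match are dropped silently here; inspecting log_lines[!matched] would show what the pattern misses. For many log files, the same steps could be wrapped in a function and applied over the result of list.files with lapply.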

answered on Stack Overflow Apr 17, 2014 by krlmlr

Here is a script that will do it:

x <- scan(text = "30-Mar-14 17:59:58.1244 (6628 6452) Module1.exe:Program1.cpp,v:854: ERROR: group 7 failed with error = 0x8004000f

30-Mar-14 17:59:58.1254 (6628 6452) Module1.exe:Program1.cpp,v:880: ERROR: group 7 failed on its 3 retry

30-Mar-14 18:00:04.8491 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 1

30-Mar-14 18:00:08.6213 ( -1 1376 13900) Module2.exe:Execute:603: Information - command 1 completed.

30-Mar-14 18:00:08.6273 ( -1 1376 13900) Module2.exe:Execute:803: Information - Executing command 2",
    what = '', sep = '\n')

# pull off date/time
dateTime <- sapply(strsplit(x, ' '), '[', 1:2)
# piece together with "|"
dateTime <- apply(dateTime, 2, paste, collapse = "|")
newX <- sub("^[^ ]+ [^(]+", "", x) 
# extract the data in parentheses
par1 <- sub("(\\([^)]+\\)).*", "\\1", newX)
newX <- sub("[^)]+\\)", "", newX)  # remove data just matched

# parse the rest of the data
x <- strsplit(newX, ":")
y <- sapply(x, function(.line){
    paste(c(paste(c(.line[1], .line[2]), collapse = ":")
      , paste0(":", .line[3], ":")
      , paste(.line[-(1:3)], collapse = ":")
      ), collapse = "|")
})

# put it all back together
paste0("|"
    , dateTime
    , "|"
    , par1
    , "|"
    , y
    , "|"
    )

Here is the output of the script:

[1] "|30-Mar-14|17:59:58.1244|(6628 6452)| Module1.exe:Program1.cpp,v|:854:| ERROR: group 7 failed with error = 0x8004000f|"
[2] "|30-Mar-14|17:59:58.1254|(6628 6452)| Module1.exe:Program1.cpp,v|:880:| ERROR: group 7 failed on its 3 retry|"         
[3] "|30-Mar-14|18:00:04.8491|( -1 1376 13900)| Module2.exe:Execute|:803:| Information - Executing command 1|"              
[4] "|30-Mar-14|18:00:08.6213|( -1 1376 13900)| Module2.exe:Execute|:603:| Information - command 1 completed.|"             
[5] "|30-Mar-14|18:00:08.6273|( -1 1376 13900)| Module2.exe:Execute|:803:| Information - Executing command 2|"              
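
If a data frame is more convenient for analysis than the "|"-delimited strings, one possible follow-up is to split each string on "|" again. This is only a sketch: piped, piped_df and the column names are illustrative, and it assumes that no message text contains a "|" character.

```r
# Two of the output strings from above, stored in a hypothetical vector.
piped <- c(
  "|30-Mar-14|17:59:58.1244|(6628 6452)| Module1.exe:Program1.cpp,v|:854:| ERROR: group 7 failed with error = 0x8004000f|",
  "|30-Mar-14|18:00:04.8491|( -1 1376 13900)| Module2.exe:Execute|:803:| Information - Executing command 1|"
)

# Split on the literal "|"; the leading "|" yields an empty first element.
parts <- strsplit(piped, "|", fixed = TRUE)

# Drop the empty first element and strip leading/trailing spaces.
fields <- do.call(rbind, lapply(parts, function(p) gsub("^ +| +$", "", p[-1])))

piped_df <- data.frame(fields, stringsAsFactors = FALSE)
names(piped_df) <- c("date", "time", "ids", "source", "line", "message")

# ":854:" -> 854
piped_df$line <- as.integer(gsub(":", "", piped_df$line, fixed = TRUE))
```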
answered on Stack Overflow Apr 17, 2014 by Data Munger

User contributions licensed under CC BY-SA 3.0