How can I extract the first and last lines from multiple text blocks separated by blank lines?

Question

I have a file containing multiple tests, with the detailed actions written one beneath another. The test blocks are separated from one another by a blank line. I want to extract only the first and last lines of each block and put them on a single line per block in a new file. Here is an example:

input.txt:

[test1]
duration
summary
code=
Results= PASS

[test2]
duration
summary=x
code=
Results=FAIL

.....

[testX]
duration
summary=x
code=
Results= PASS

output.txt should look something like this:

test1 PASS
test2 FAIL
...
testX PASS

Example 2:

[Linux_MP3Enc_xffv.2_Con_37_003]
type = testcase
summary = MP3 encoder test
ActionGroup[Linux_Enc] = PASS
ActionGroup[Linux_Playb] = PASS
ActionGroup[Linux_Pause_Resume] = PASS
ActionGroup[Linux_Fast_Seek] = PASS
Duration = 230.607398987 s
Total_Result = PASS

[Composer__vtx_007]
type = testcase
summary = composer
Background[0xff000000] = PASS
Background[0xffFFFFFF] = PASS
Background[0xffFF0000] = PASS
Background[0xff00FF00] = PASS
Background[0xff0000FF] = PASS
Background[0xff00FFFF] = PASS
Background[0xffFFFF00] = PASS
Background[0xffFF00FF] = PASS
Duration = 28.3567230701 s
Total_Result = PASS


[Videox_Rotate_008]
type = testcase
summary = rotation
Rotation[0] = PASS
Rotation[1] = PASS
Rotation[2] = PASS
Rotation[3] = PASS
Duration = 14.0116529465 s
Total_Result = PASS

Thank you!

sed

asked on Stack Overflow Dec 3, 2019 by

mcm187 • edited Dec 3, 2019 by

mcm187

4 Answers

Short and simple GNU awk:

awk -F= -v RS='' '{print $1 $NF}' file
[Linux_MP3Enc_xffv.2_Con_37_003] PASS
[Composer__vtx_007] PASS
[Videox_Rotate_008] PASS

If you do not like the brackets:

awk -F'[]=[]' -v RS='' '{print $2 $NF}' file
Linux_MP3Enc_xffv.2_Con_37_003 PASS
Composer__vtx_007 PASS
Videox_Rotate_008 PASS

answered on Stack Overflow Dec 3, 2019 by

Jotne • edited Dec 3, 2019 by

Jotne

One way to solve this is using a regular expression such as:

(?<testId>test\d+)(?:.*\n){4}.*(?<outcome>PASS|FAIL)

The regex matches your sample input and stores the test id (e.g. "test1") in the capture group named "testId" and the outcome (e.g. "PASS") in the capture group named "outcome".

The regex can be used in any language whose regex engine supports named groups. The code below shows how to do it in Python.

import re

# Read from input.txt
with open('input.txt', 'r') as f:
  indata = f.read()

# Modify the regex slightly to fit Python regex syntax
# (raw string, so that \d and \n reach the regex engine unchanged)
pattern = r'(?:.*)(?P<testId>test\d+)(?:.*\n){4}.*(?P<outcome>PASS|FAIL)'

# Get an iterator which yields all matches
matches = re.finditer(pattern, indata)

# Combine the matches to a list of strings
outputs = ['{} {}'.format(m.group('testId'), m.group('outcome')) for m in matches]

# Join all rows to one string
output = '\n'.join(outputs)

# Write to output.txt
with open('output.txt', 'w') as f:
  f.write(output)

Running the above script on input.txt containing:

[test1]
duration
summary
code=
Results= PASS

[test2]
duration
summary=x
code=
Results=FAIL

[test444]
duration
summary=x
code=
Results= PASS

yields a file output.txt containing:

test1 PASS
test2 FAIL
test444 PASS
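
The `{4}` quantifier above assumes that every block has exactly five lines and that ids look like "test" followed by digits. For blocks of varying length, such as the second example in the question, one possible variant is sketched below. The names `BLOCK` and `extract_results` are hypothetical, and the pattern assumes that headers start at the beginning of a line and that blocks are separated by truly empty lines. If the final line of a block carries no PASS or FAIL, the pattern falls back to the last earlier line in that block that does.

```python
import re

# One match per block: a "[name]" header line, any number of non-empty
# lines, then a line ending in "= PASS" or "= FAIL". Because ".+" cannot
# match an empty line, a match never crosses the blank line between blocks.
BLOCK = re.compile(
    r'^\[(?P<testId>[^\]\n]+)\][ \t]*\n'        # header, e.g. [test1]
    r'(?:.+\n)*'                                 # intermediate lines (greedy)
    r'.*=[ \t]*(?P<outcome>PASS|FAIL)[ \t]*$',   # result line
    re.MULTILINE,
)


def extract_results(text):
    """Return one 'id outcome' string per block found in text."""
    return ['{} {}'.format(m.group('testId'), m.group('outcome'))
            for m in BLOCK.finditer(text)]
```

Calling `extract_results(indata)` in place of the `re.finditer` and list-comprehension steps would produce the list that is then joined and written to output.txt.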

answered on Stack Overflow Dec 3, 2019 by

k.a.ll.e • edited Dec 3, 2019 by

k.a.ll.e

Using sed, as tagged (although other tools would probably be more natural to use):

sed -nE '/^\[.*\]$/h;s/^Results= ?//;t r;b;:r;H;x;s/\n/ /;p' input.txt

Explanation:

/^\[.*\]$/h         # matches the [...] lines, put them in the hold buffer
s/^Results= ?//     # matches the Results= lines, discards the useless part
t r;b               # on lines which matched, jump to label r; 
                    # otherwise jump to the end (and start processing the next line)
:r;H;x;s/\n/ /;p    # label r; append the pattern space (which contains the end of the Results= line)
                    # to the hold buffer. Switch Hold buffer and pattern space,
                    # replace the linefeed in the pattern space by a space and print it
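
For readers less familiar with sed's hold buffer, the same control flow can be sketched line by line in Python. This is only an illustration of the logic, not a replacement for the one-liner: the variable `held` stands in for the hold buffer, and the function name `header_and_result` is hypothetical.

```python
import re


def header_and_result(lines):
    """Yield '[header] outcome' for each 'Results=' line, like the sed script."""
    held = ''                                   # stands in for sed's hold buffer
    for line in lines:
        line = line.rstrip('\n')
        if re.match(r'^\[.*\]$', line):         # /^\[.*\]$/h
            held = line
        stripped, count = re.subn(r'^Results= ?', '', line)   # s/^Results= ?//
        if count:                               # t r: only when the substitution happened
            yield held + ' ' + stripped         # H;x;s/\n/ /;p
            held = stripped                     # x leaves the old pattern space in the hold buffer
        # otherwise: b, i.e. move on to the next line without printing
```

With the first example from the question, `list(header_and_result(open('input.txt')))` should give one "[testN] outcome" string per block.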

answered on Stack Overflow Dec 3, 2019 by

Aaron • edited Dec 3, 2019 by

Aaron

To print the first and last line of each block, how about:

awk -v RS="" '{
    n = split($0, a, /\n/)
    print a[1]
    print a[n]
}' input.txt

Result for the second example:

[Linux_MP3Enc_xffv.2_Con_37_003]
Total_Result = PASS
[Composer__vtx_007]
Total_Result = PASS
[Videox_Rotate_008]
Total_Result = PASS

The awk man page says:

If RS is set to the null string, then records are separated by blank lines.

This feature makes it easy to split the input into blocks at blank lines.
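
For comparison, paragraph mode can be approximated in Python by splitting the text on runs of empty lines. The sketch below also joins the first and last line of each block onto one line, as the question asks. The function name `first_and_last` is hypothetical, and the code assumes Unix line endings.

```python
import re


def first_and_last(text):
    """Return 'first-line last-line' for every blank-line-separated block."""
    rows = []
    # Similar to awk with RS="": runs of empty lines separate records,
    # and leading or trailing newlines do not create empty records.
    for block in re.split(r'\n{2,}', text.strip('\n')):
        lines = block.split('\n')
        if lines[0]:                      # skip the empty block produced by empty input
            rows.append(lines[0] + ' ' + lines[-1])
    return rows
```

Writing `'\n'.join(first_and_last(open('input.txt').read()))` to a file would then give one line per block.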

Hope this helps.

answered on Stack Overflow Dec 3, 2019 by

tshiono

User contributions licensed under CC BY-SA 3.0