Regex matching two strings in one group

Question

I'm trying to match two strings from a text file. I wrote a function to perform the matching for both strings. Only one string is printed and the other is ignored, although the pattern itself works: I tested it with an online regex tester and it matched the string.

NVRAM_info.txt

NvRam is available, BlockSize is : 0x00001000
            Max. datasize is : 0x00040000

import logging
import re

NVRAM_INFO = "NVRAM_info.txt"

# logging.info() prints nothing at the default WARNING level
logging.basicConfig(level=logging.INFO)

def Capture(Form, File_name, Fil, index):
    B = " "
    # the Fil argument is rebound here to the opened file object
    with open(File_name) as Fil:
        p = re.compile(Form)
        for line in Fil:
            m = p.match(line)
            if m != None:
                B = m.group(1)
                if index == 2:
                    logging.info("the maximal Data size of the NVRAM is:%s", B)
                else:
                    logging.info("the NVRAM Blocksize is:%s", B)
            break
        else:
            logging.info("couldnt find the Maximal userspace memory size")
    return B

Maxsize = Capture(r"\s{1,}Max. datasize is :\s{1,}([a-zA-Z_0-9]{1,})", "NVRAM_info.txt", NVRAM_INFO, 2)
BlockSize = Capture(r"NvRam is available, BlockSize is :\s{1,}([a-zA-Z_0-9]{1,})", "NVRAM_info.txt", NVRAM_INFO, 1)

regex

python-3.x

asked on Stack Overflow Apr 14, 2016 by

ES87ME

2 Answers

Try this

NVRAM_INFO = "NVRAM_info.txt"
import re

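# read the whole file so the pattern can span both lines
with open(NVRAM_INFO, 'r') as file:
    test_str = file.read()
# raw string so \d and \n reach the regex engine unchanged; the dot in "Max." is escaped
p = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)', re.DOTALL)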

g = re.findall(p, test_str)

Maxsize = g[0][1]
BlockSize = g[0][0]
print(Maxsize)
print(BlockSize)

Output:

0x00040000
0x00001000
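As a hedged aside on why the original function reports only one value: as far as the posted indentation can be read, its break runs after the first line whether or not that line matched, so the second line is never examined. Below is a minimal sketch of a line-by-line variant that stops only once a line has matched. The names capture and SAMPLE are hypothetical, the file content is inlined as a string so the example runs on its own, and re.search is used instead of re.match so leading whitespace does not have to be spelled out in the pattern.

```python
import re

# Inlined copy of NVRAM_info.txt (assumption: same two lines as in the question)
SAMPLE = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "            Max. datasize is : 0x00040000\n"
)

def capture(pattern, lines):
    """Return group 1 of the first line that matches pattern, or None."""
    compiled = re.compile(pattern)
    for line in lines:
        m = compiled.search(line)
        if m is not None:
            # leave the loop only after a successful match
            return m.group(1)
    return None

lines = SAMPLE.splitlines()
block_size = capture(r"BlockSize is :\s+(\w+)", lines)
max_size = capture(r"Max\. datasize is :\s+(\w+)", lines)
print(block_size, max_size)
```

With a real file, the same function could be called with the open file object in place of lines, since iterating a file also yields one line at a time.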

answered on Stack Overflow Apr 14, 2016 by

Tim007

The answers given work, but they can be made more efficient as follows. If s is the text being searched (the whole file content, since the pattern spans two lines), then

reg = r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})'

In [62]: pat = re.compile(reg)

In [64]: blocksize, maxsize = pat.search(s).groups()

In [65]: blocksize, maxsize
Out[65]: ('0x00001000', '0x00040000')

Now that we know it works, let's see whether it is more efficient (comparing with @Tim007's answer).

In [66]: timeit pat.search(s).groups()
The slowest run took 8.41 times longer than the fastest. This could mean that an
intermediate result is being cached
100000 loops, best of 3: 2.38 µs per loop

In [74]: timeit  re.findall(p, s) # @Tim007's answer
The slowest run took 4.94 times longer than the fastest. This could mean that an
intermediate result is being cached
100000 loops, best of 3: 5.51 µs per loop

So it is about 2.3 times faster (5.51 µs / 2.38 µs ≈ 2.31). Using \d{8} in place of \d+ helps because a more specific pattern generally gives the engine less to try. This version is also less fragile because it does not need the re.DOTALL flag and instead matches the line break explicitly with \n.
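The timeit lines above are IPython magics. For anyone who wants to repeat the comparison in a plain script, here is a rough sketch using the standard timeit module. The sample string and the helper names with_findall and with_search are assumptions made for illustration, and the absolute numbers will differ from machine to machine, so treat the result as indicative only.

```python
import re
import timeit

# Inlined sample text (assumption: same content as NVRAM_info.txt)
s = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "            Max. datasize is : 0x00040000\n"
)

# Pattern from the first answer (needs re.DOTALL for ".*?")
dotall_pat = re.compile(
    r"BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)", re.DOTALL
)
# Pattern from this answer (no flags)
plain_pat = re.compile(
    r"BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})"
)

def with_findall():
    return dotall_pat.findall(s)[0]

def with_search():
    return plain_pat.search(s).groups()

RUNS = 100_000
for fn in (with_findall, with_search):
    seconds = timeit.timeit(fn, number=RUNS)
    print(f"{fn.__name__}: {seconds / RUNS * 1e6:.2f} us per call")
```

Both helpers return the same pair of strings, so the comparison measures only the cost of the two approaches.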

If given a choice, it’s usually better to define your regular expression pattern so that it works correctly without the need for extra flags. (Beazley)
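One caveat worth sketching: search returns None when nothing matches, so calling .groups() directly raises AttributeError on unexpected input, and \d{8} only accepts decimal digits, so a size such as 0x0000A000 would not match. The variant below is a minimal sketch under those assumptions. The names PAT and parse_sizes are hypothetical, and widening the character class to hex digits is my own change, not part of the pattern above.

```python
import re

# Flag-free pattern; [0-9A-Fa-f] also accepts hex letters (an assumption
# about what the file may contain, not something shown in the sample)
PAT = re.compile(
    r"BlockSize is : (0x[0-9A-Fa-f]{8})\n\s*Max\. datasize is : (0x[0-9A-Fa-f]{8})"
)

def parse_sizes(text):
    """Return (blocksize, maxsize) as strings, or None if the text does not match."""
    m = PAT.search(text)
    if m is None:
        return None
    return m.groups()

s = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "            Max. datasize is : 0x00040000\n"
)
print(parse_sizes(s))                # ('0x00001000', '0x00040000')
print(parse_sizes("no sizes here"))  # None
```

Returning None rather than raising keeps the caller in charge of how to report a missing value, for example with the logging call from the question.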

answered on Stack Overflow Apr 18, 2016 by

C Panda

User contributions licensed under CC BY-SA 3.0