I'm trying to match two strings from a text file. I wrote a function to perform the matching for both strings. Only one string is printed and the other is ignored, although both patterns work: I tested them against the strings with an online regex tester.
NVRAM_info.txt
NvRam is available, BlockSize is : 0x00001000
Max. datasize is : 0x00040000
import logging
import re

NVRAM_INFO = "NVRAM_info.txt"

def Capture(Form, File_name, Fil, index):
    B = " "
    with open(File_name) as Fil:
        p = re.compile(Form)
        for line in Fil:
            m = p.match(line)
            if m != None:
                B = m.group(1)
                if index == 2:
                    logging.info("the maximal Data size of the NVRAM is:%s", B)
                else:
                    logging.info("the NVRAM Blocksize is:%s", B)
                break
            else:
                logging.info("couldnt find the Maximal userspace memory size")
    return B

Maxsize = Capture(r"\s{1,}Max. datasize is :\s{1,}([a-zA-Z_0-9]{1,})", "NVRAM_info.txt", NVRAM_INFO, 2)
BlockSize = Capture(r"NvRam is available, BlockSize is :\s{1,}([a-zA-Z_0-9]{1,})", "NVRAM_info.txt", NVRAM_INFO, 1)
Try this:
import re

NVRAM_INFO = "NVRAM_info.txt"

# Read the whole file at once so the pattern can span both lines
with open(NVRAM_INFO, 'r') as file:
    test_str = file.read()

# re.DOTALL lets .*? cross the newline between the two values
p = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)', re.DOTALL)
g = re.findall(p, test_str)
Maxsize = g[0][1]
BlockSize = g[0][0]
print(Maxsize)
print(BlockSize)
Output:
0x00040000
0x00001000
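If you would rather keep the line-by-line structure from the question, one thing worth checking is that `match` only matches at the start of the string, so a pattern beginning with `\s{1,}` cannot match a line that has no leading whitespace. Below is a minimal sketch that uses `search` instead. The helper name `capture` and the in-memory `SAMPLE` text are made up for illustration, and the sample assumes the file looks exactly as shown in the question.

```python
import re

# Hypothetical stand-in for the contents of NVRAM_info.txt
SAMPLE = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "Max. datasize is : 0x00040000\n"
)

def capture(pattern, lines):
    """Return group 1 of the first line that matches pattern, or None."""
    compiled = re.compile(pattern)
    for line in lines:
        # search() scans the whole line; match() would only try position 0
        m = compiled.search(line)
        if m is not None:
            return m.group(1)
    return None

lines = SAMPLE.splitlines()
block_size = capture(r"BlockSize is :\s+(\w+)", lines)
max_size = capture(r"Max\. datasize is :\s+(\w+)", lines)
print(block_size, max_size)
```

Returning `None` when nothing matches leaves it to the caller to decide how to report a missing value, rather than logging once per non-matching line.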
Some of the answers given, though they work, can be made more efficient as follows. If s is the text being searched (the contents of the file), then
reg = r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})'
In [62]: pat = re.compile(reg)
In [64]: blocksize, maxsize = pat.search(s).groups()
In [65]: blocksize, maxsize
Out[65]: ('0x00001000', '0x00040000')
Now that we know it works, let's see if it's more efficient (comparing with @Tim007's answer).
In [66]: timeit pat.search(s).groups()
The slowest run took 8.41 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.38 µs per loop
In [74]: timeit re.findall(p, s) # @Tim007's answer
The slowest run took 4.94 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.51 µs per loop
So it is about 2.31 times faster in this measurement. Using \d{8} in place of \d+ helps because the more specific a pattern is, the faster it tends to be. Secondly, this version is less problematic because it doesn't use the re.DOTALL flag and makes do with an explicit \n instead.
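To reproduce a comparison like this outside IPython, here is a rough sketch using the standard `timeit` module. The sample text `s`, the names `specific` and `general`, and the iteration count are assumptions for illustration. Absolute numbers will differ by machine and Python version, and the two calls are not strictly equivalent (`search` stops at the first match, `findall` scans the whole text), so treat the ratio as indicative only.

```python
import re
import timeit

# Assumed sample text, matching the file shown in the question
s = ("NvRam is available, BlockSize is : 0x00001000\n"
     "Max. datasize is : 0x00040000\n")

# Fixed-width pattern without flags
specific = re.compile(r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})')
# DOTALL-based pattern in the style of the earlier answer
general = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max\. datasize is : (\dx\d+)', re.DOTALL)

# Both patterns should extract the same pair of values
assert specific.search(s).groups() == general.findall(s)[0]

n = 20_000  # arbitrary iteration count
t_specific = timeit.timeit(lambda: specific.search(s).groups(), number=n)
t_general = timeit.timeit(lambda: general.findall(s), number=n)

print(f"specific: {t_specific / n * 1e6:.2f} µs per call")
print(f"general:  {t_general / n * 1e6:.2f} µs per call")
```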
"If given a choice, it's usually better to define your regular expression pattern so that it works correctly without the need for extra flags." (Beazley)
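One caveat with `\d{8}`: it only accepts decimal digits, so a value containing hex letters, such as 0x0004A000, would not match. A flag-free sketch that allows hex digits and converts the results to integers might look like this. The function name `parse_nvram_info` and the sample text are hypothetical, and the eight-digit width is assumed from the values shown in the question.

```python
import re

# No flags: the newline and any indentation are spelled out in the pattern itself
PATTERN = re.compile(
    r'BlockSize is : (0x[0-9A-Fa-f]{8})\n'
    r'\s*Max\. datasize is : (0x[0-9A-Fa-f]{8})'
)

def parse_nvram_info(text):
    """Return (blocksize, maxsize) as ints, or None if the text does not match."""
    m = PATTERN.search(text)
    if m is None:
        return None
    blocksize, maxsize = m.groups()
    # int() with base 16 accepts the leading "0x" prefix
    return int(blocksize, 16), int(maxsize, 16)

# Made-up sample with a hex letter and an indented second line
sample = ("NvRam is available, BlockSize is : 0x00001000\n"
          "   Max. datasize is : 0x0004A000\n")
print(parse_nvram_info(sample))
```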