I'm trying to match two strings from a text file. I wrote a function that performs the matching for both strings, but only one string is printed and the other is ignored, even though the pattern itself works: it matched the string when I tested it with an online regex tester.
Contents of NVRAM_info.txt:
NvRam is available, BlockSize is : 0x00001000
Max. datasize is : 0x00040000
import logging
import re

logging.basicConfig(level=logging.INFO)  # make logging.info() messages visible

NVRAM_INFO = "NVRAM_info.txt"

def Capture(Form, File_name, Fil, index):
    B = " "
    with open(File_name) as Fil:
        p = re.compile(Form)
        for line in Fil:
            m = p.match(line)  # match() only tries at the start of the line
            if m is not None:
                B = m.group(1)
                if index == 2:
                    logging.info("the maximal Data size of the NVRAM is:%s", B)
                else:
                    logging.info("the NVRAM Blocksize is:%s", B)
                break
        else:
            # for-else: runs only if the loop finished without a break
            logging.info("couldnt find the Maximal userspace memory size")
    return B

Maxsize = Capture(r"\s{1,}Max. datasize is :\s{1,}([a-zA-Z_0-9]{1,})", "NVRAM_info.txt", NVRAM_INFO, 2)
BlockSize = Capture(r"NvRam is available, BlockSize is :\s{1,}([a-zA-Z_0-9]{1,})", "NVRAM_info.txt", NVRAM_INFO, 1)
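One likely reason only one value is captured: re.match() only tries to match at the very start of a line, so the Max. pattern, which begins with \s{1,}, cannot match a line that starts directly with "Max." (as in the sample file shown). The sketch below is illustrative, not the asker's code: the helper name capture and the in-memory SAMPLE text are assumptions. It drops the leading \s{1,}, uses re.search(), and enables INFO-level logging, which the logging module hides by default.

```python
import io
import logging
import re

# logging hides INFO messages by default (the root logger's level is WARNING).
logging.basicConfig(level=logging.INFO)

# In-memory stand-in for NVRAM_info.txt (assumption: no leading whitespace).
SAMPLE = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "Max. datasize is : 0x00040000\n"
)

# The question's pattern requires whitespace before "Max.", and match()
# only tries position 0 of the line, so this prints None.
original = re.compile(r"\s{1,}Max. datasize is :\s{1,}([a-zA-Z_0-9]{1,})")
print(original.match("Max. datasize is : 0x00040000\n"))


def capture(pattern, fileobj):
    """Hypothetical helper: return group(1) of the first matching line, or None."""
    regex = re.compile(pattern)
    for line in fileobj:
        m = regex.search(line)  # search() scans the whole line, not just its start
        if m:
            return m.group(1)
    return None


max_size = capture(r"Max\. datasize is :\s+(\w+)", io.StringIO(SAMPLE))
block_size = capture(r"BlockSize is :\s+(\w+)", io.StringIO(SAMPLE))
logging.info("the maximal Data size of the NVRAM is: %s", max_size)
logging.info("the NVRAM Blocksize is: %s", block_size)
```

With the real file, pass an open handle instead, e.g. `with open("NVRAM_info.txt") as f: max_size = capture(r"Max\. datasize is :\s+(\w+)", f)`. A file handle is exhausted after one pass, so open it once per call.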
Try this:
import re

NVRAM_INFO = "NVRAM_info.txt"

# Read the whole file so the pattern can span both lines.
with open(NVRAM_INFO, 'r') as file:
    test_str = file.read()

p = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max. datasize is : (\dx\d+)', re.DOTALL)
g = re.findall(p, test_str)
Maxsize = g[0][1]
BlockSize = g[0][0]
print(Maxsize)
print(BlockSize)
Output:
0x00040000
0x00001000
Some of the answers given work, but they can be made more efficient as follows. If s is the text being searched (both lines of the file), then
reg = r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})'
In [62]: pat = re.compile(reg)
In [64]: blocksize, maxsize = pat.search(s).groups()
In [65]: blocksize, maxsize
Out[65]: ('0x00001000', '0x00040000')
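The session above does not show where s comes from. Here is the same approach as a standalone script, assuming s holds the full file contents (with the real file, `with open("NVRAM_info.txt") as f: s = f.read()`):

```python
import re

# Whole file contents, since the pattern has to span the newline.
s = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "Max. datasize is : 0x00040000\n"
)

reg = r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})'
pat = re.compile(reg)

# search() returns the first match; groups() unpacks both captured values.
blocksize, maxsize = pat.search(s).groups()
print(blocksize, maxsize)  # 0x00001000 0x00040000
```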
Now that we know it works, let's see whether it's more efficient (comparing with @Tim007's answer):
In [66]: timeit pat.search(s).groups()
The slowest run took 8.41 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.38 µs per loop
In [74]: timeit re.findall(p, s) # @Tim007's answer
The slowest run took 4.94 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.51 µs per loop
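To repeat the comparison outside IPython, the standard timeit module can do roughly what the %timeit magic does (best of 3 repeats, averaged per loop). This is a sketch: the loop count is lowered to keep it quick, the sample s is an assumption, and the absolute numbers will depend on the machine and Python version, so none are shown here.

```python
import re
import timeit

s = (
    "NvRam is available, BlockSize is : 0x00001000\n"
    "Max. datasize is : 0x00040000\n"
)

# This answer's pattern (no flags) and @Tim007's pattern (re.DOTALL).
pat = re.compile(r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})')
p = re.compile(r'BlockSize is : (\dx\d+)\n.*?Max. datasize is : (\dx\d+)', re.DOTALL)

n = 10_000  # loops per repeat
# Best of 3 repeats, divided by n to get seconds per loop.
t_search = min(timeit.repeat(lambda: pat.search(s).groups(), number=n, repeat=3)) / n
t_findall = min(timeit.repeat(lambda: re.findall(p, s), number=n, repeat=3)) / n

print(f"pat.search(s).groups(): {t_search * 1e6:.2f} µs per loop")
print(f"re.findall(p, s):       {t_findall * 1e6:.2f} µs per loop")
```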
So it is about 2.3 times faster (2.38 µs vs. 5.51 µs per loop). Using \d{8} in place of \d+ makes it more efficient: the more specific a pattern is, the less work the regex engine generally has to do. Second, this version is less problematic because it doesn't need the re.DOTALL flag; it makes do with an explicit \n.
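One caveat about being specific: the values are hexadecimal, and \d only matches 0 to 9, so both \d{8} and \d+ fail as soon as a value contains a letter A to F. A sketch that keeps the fixed width but uses a hex character class, assuming the values are always 8 hex digits (the value 0x0000A000 is a hypothetical example):

```python
import re

# Hypothetical file contents where BlockSize contains the hex digit "A".
s = (
    "NvRam is available, BlockSize is : 0x0000A000\n"
    "Max. datasize is : 0x00040000\n"
)

digits_only = re.compile(r'BlockSize is : (0x\d{8})\n\s*Max\. datasize is : (0x\d{8})')
hex_digits = re.compile(
    r'BlockSize is : (0x[0-9A-Fa-f]{8})\n\s*Max\. datasize is : (0x[0-9A-Fa-f]{8})'
)

print(digits_only.search(s))          # None: \d cannot match "A"
print(hex_digits.search(s).groups())  # ('0x0000A000', '0x00040000')
```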
If given a choice, it's usually better to define your regular expression pattern so that it works correctly without the need for extra flags. (Beazley)
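A minimal illustration of that advice, using a shortened two-line sample as an assumption: by default "." does not match a newline, so a pattern that crosses lines either needs re.DOTALL or has to spell out the \n itself.

```python
import re

text = "BlockSize is : 0x00001000\nMax. datasize is : 0x00040000"

# Without re.DOTALL, "." stops at "\n", so this finds nothing...
print(re.search(r'BlockSize.*datasize', text))                   # None
# ...unless the flag is added,
print(bool(re.search(r'BlockSize.*datasize', text, re.DOTALL)))  # True
# or the newline is written into the pattern, with no flag needed.
print(bool(re.search(r'BlockSize.*\n.*datasize', text)))         # True
```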
User contributions licensed under CC BY-SA 3.0