Complicated Preg_Match - match 500 characters after finding a keyword but don't start until the next line break


First, to explain: I'm new to PHP and very new to preg_match, and I find it confusing. What I'm trying to do is find a keyword, exception:, and then pull out 300 characters starting from the next line.

I already have a preg_match in place for this but want to improve it. Currently I pull 300 characters starting from the keyword. The problem is that the keyword is followed by the exception name, and the code error only begins on the next line. The exception can be written in any number of languages, but the code error after it is independent of language. So I want to filter out the exception, since it varies by language, so that I can tell whether two exceptions are a 100% match when comparing them later.

Here are a few examples of an exception:

Exception: System.Runtime.InteropServices.COMException (0x800401D0): OpenClipboard Failed (Exception from HRESULT: 0x800401D0 (CLIPBRD_E_CANT_OPEN))
at System.Runtime.InteropServices.Marshal.ThrowExceptionForHRInternal(Int32 errorCode, IntPtr errorInfo)
at System.Windows.Clipboa

exception: Specified cast is not valid.
Query:Select * from TourneyData where Player_id = 1412
14:14:18.868 [SetCurrentPlayer:12 - DatabaseBase.HandleDatabaseConnectionException] 4: System.InvalidCastException: Specified cast is not valid.
at NpgsqlTypes.NpgsqlTimeStamp.op_I

Exception: System.NullReferenceException: Object reference not set to an instance of an object.
at System.Windows.Forms.Application.ThreadContext.ExitCommon(Boolean disposing)
at System.Windows.Forms.Application.ExitInternal()
at System.Windows.Forms.Application.Exit(C

So my plan for getting to the code error is to display everything starting from the line after the keyword exception:

In the last example, the output I would want is:

at System.Windows.Forms.Application.ThreadContext.ExitCommon(Boolean disposing)
at System.Windows.Forms.Application.ExitInternal()
at System.Windows.Forms.Application.Exit(C

OK, so here is the code I'm already using to gather 300 characters after the keyword:

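// Snippet length constant (the constant name must be a quoted string)
define('SNIPPET_LENGTH', 300);

// Find the keyword case-insensitively; stripos() returns false when it is absent
$pos = stripos($body, $keyword);
$snippet_pre = ($pos === false) ? '' : substr($body, $pos, SNIPPET_LENGTH);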

I also use preg_match in a few functions to pull information. For example, the body contains log info like this:

12:19:42.787 [Main:1 - Bootstrapper.LogSystemInfo] Current culture: it-IT
12:19:42.865 [Main:1 - Bootstrapper.LogSystemInfo] Operating System Name: Microsoft Windows 7 Home Premium 
12:19:42.865 [Main:1 - Bootstrapper.LogSystemInfo] Operating System Architecture: 64 bit
12:19:42.865 [Main:1 - Bootstrapper.LogSystemInfo] Operating System Service Pack: Service Pack 1

This is the preg_match code. I'm only including it because it might help show how line breaks are handled: it catches all the info BEFORE the line break, but I can't figure out how to get 300 characters AFTER the line break:

preg_match('/Current culture: (.*)/', $body, $culture_pre);
preg_match('/Operating System Name: (.*)/', $body, $os_name_pre);
preg_match('/Operating System Architecture: (.*)/', $body, $os_bit_pre);
preg_match('/Operating System Service Pack: (.*)/', $body, $os_service_pack_pre);

Let me know if you need any additional info

php
preg-match
substr
asked on Stack Overflow Dec 31, 2012 by user1547410

1 Answer


preg_match, and regex in general, can be awkward to work with when the input contains \n or \r\n, because by default . does not match a line break.

You can use the m modifier to solve some cases, but all it does is change the behaviour of the anchors $ and ^ so that they also match at the end and start of each line, as if the \n split the string into separate substrings. I don't think this will solve your problem, but you could try it.
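To make the effect of the m modifier concrete, here is a minimal sketch. The sample $body is hypothetical (its text is borrowed from the question's last example), and the pattern is only an illustration, not a solution to the question:

```php
<?php
// Hypothetical sample input; the text comes from the question's last example.
$body = "Exception: System.NullReferenceException: Object reference not set to an instance of an object.\n"
      . "at System.Windows.Forms.Application.ThreadContext.ExitCommon(Boolean disposing)\n"
      . "at System.Windows.Forms.Application.ExitInternal()\n";

// Without m, ^ only matches at the very start of the whole string,
// which begins with "Exception", so nothing matches.
$without = preg_match_all('/^at .*$/', $body, $m1);

// With m, ^ and $ also match right after and right before each "\n",
// so every line that starts with "at " is matched separately.
$with = preg_match_all('/^at .*$/m', $body, $m2);

echo $without, "\n"; // 0
echo $with, "\n";    // 2
```

Note that m only changes where the anchors match. It does not make . cross a line break, so on its own it will not capture several lines in one match.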

There are other possible ways to fix this, although not all of them are exactly clean:

1- Easy way: remove the \r\n, \n or \r before applying the regex:

$chars=array("\r\n", "\n", "\r");
$string=str_replace($chars, '', $string);

The regex will then work, but you lose the line structure of the string if you need to keep it multi-line.

2- Easy and not so clean way: replace each \n with a special character you know won't appear in the string (for example #), apply the regex, then change the special character back to \n. It's not pretty, but it works if you are short on time.

3- Not so easy, clean way: split the string on \n, read it line by line applying preg_match(), and when a line matches, save the following 2 or 3 lines (or however many you need).
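The third option could be sketched roughly as follows. The function name linesAfterKeyword and its parameters are made up for illustration, the sample $body is taken from the question's last example, and the type declarations assume PHP 7 or later:

```php
<?php
// Hypothetical helper (not part of any library): returns up to $maxLines
// lines that follow the first line containing $keyword, case-insensitively.
function linesAfterKeyword(string $body, string $keyword, int $maxLines = 3): array
{
    // Split on \r\n, \n or \r so both Windows and Unix line endings work.
    $lines = preg_split('/\r\n|\n|\r/', $body);

    foreach ($lines as $i => $line) {
        // preg_quote() escapes any regex metacharacters in the keyword.
        if (preg_match('/' . preg_quote($keyword, '/') . '/i', $line)) {
            // Skip the matching line itself and keep the lines after it.
            return array_slice($lines, $i + 1, $maxLines);
        }
    }

    return []; // keyword not found
}

// Sample input assumed for the sketch, using \r\n line endings.
$body = "Exception: System.NullReferenceException: Object reference not set to an instance of an object.\r\n"
      . "at System.Windows.Forms.Application.ThreadContext.ExitCommon(Boolean disposing)\r\n"
      . "at System.Windows.Forms.Application.ExitInternal()\r\n"
      . "at System.Windows.Forms.Application.Exit(C";

// Join the saved lines back together; $snippet holds the three "at ..." lines.
$snippet = implode("\n", linesAfterKeyword($body, 'exception:'));
```

If a character limit is needed rather than a line limit, the joined result can be trimmed afterwards with substr($snippet, 0, 300).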

answered on Stack Overflow Dec 31, 2012 by Naryl

User contributions licensed under CC BY-SA 3.0