Inverse Match Regex .NET

Started by Mr. Analog, March 12, 2015, 03:31:35 PM

Mr. Analog

I have a string that contains keywords and strings contained within quotes

"BLAH" CMD "OTHER BLAH"

and while it's trivially easy to select the things within the quotes:

"[^"]+"|(\+)

which matches "BLAH" and "OTHER BLAH" in:

"BLAH" CMD "OTHER BLAH"

I haven't found a way to invert this selection, i.e. to match only the CMD part of:

"BLAH" CMD "OTHER BLAH"

I know inverse selection is easier in some languages than others, but I'm living in Microsoft land and I haven't found anything that seems to want to work. Here's a convenient online .NET regex testing tool (Silverlight req'd):
http://regexhero.net/tester/

Anyway, there's more than one way to parse a cat, but I'm just surprised that inverse selection is such a massive pain.
By Grabthar's Hammer

Darren Dirt

grunch:

https://regex101.com/ FTMFW.

Also, I am too brain-tired to try making sense of the OP. But I use RegEx101.com all the time; it lets you build the expression in pieces, with helpful explanations of each piece, test it out on test data, etc.




post-grunch:

What about doing a "temporary" replacement of the quoted parts, i.e. using the RegEx as you described to temporarily replace...
"leftpart" middlepart "rightpart"
with...
"leftpart"`_`middlepart`_`"rightpart"
and then VOILA, you can parse the `_`middlepart`_` to your heart's content.
Afterwards you could still recover the original text if needed (if you weren't just needing to examine the middle part).


Just an idea, might be completely not-useful to your needs. But w/e, brain tired as I said.


_____________________

Strive for progress. Not perfection.
_____________________

Mr. Analog

That was my immediate thought: match/replace the content within quotes and then pick out the rest from the resultant string.

However, you lose the positions of the matched commands in the original string.
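One possible way around the lost positions, offered as a sketch and not as what the existing code does: replace each quoted span with filler of the same length, so every offset in the masked string still lines up with the original. The example is in Python for brevity (in .NET, Regex.Replace with a MatchEvaluator plays the same role). The names QUOTED and mask_quoted, the "#" filler, and the AND/OR keyword pattern are all assumptions made for illustration.

```python
import re

# Same quoted-string pattern as in the first post, without the |(\+) branch.
QUOTED = re.compile(r'"[^"]+"')


def mask_quoted(text, filler="#"):
    """Replace every quoted span with filler characters of equal length."""
    return QUOTED.sub(lambda m: filler * len(m.group(0)), text)


source = '"this" AND "this and that"'
masked = mask_quoted(source)

# Keywords are searched in the masked copy, so the "and" inside the
# second operand can no longer be found. The keyword list is hypothetical.
hits = [
    (m.start(), m.group(0))
    for m in re.finditer(r"\b(?:AND|OR)\b", masked, re.IGNORECASE)
]

print(masked)  # ###### AND ###############
print(hits)    # [(7, 'AND')]
```

Because the masked string has the same length as the source, source[7:10] is the same AND that was found in the masked copy.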

The root problem was that there are operator keywords contained within quotation marks, which the original regex didn't account for, so some invalid operands are created:

"this" AND "this and that"

Current code begins parsing conditions from left to right without considering quotation marks containing complex strings that may contain operator keywords, so the current result of this becomes:

Left Operand: this
Operator: AND
Right Operand: this and that
Left Operand: this
Operator: and
Right Operand: that

Which results in the following invalid expression:
value1 AND value2 AND value3

When what is intended is:

Left Operand: this
Operator: AND
Right Operand: this and that

value1 AND value2

Now, thankfully there are quotation marks that work as delimiters and create values from the quoted content. It would be SO EASY if I could just make an inverse selection, because I could then find the match positions and create an expression without having to build an actual expression parser :)
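For what it's worth, an inverse selection can also be built from the ordinary quoted-string matches by collecting the gaps between them. The following is a minimal sketch of that idea in Python, not the code under discussion; in .NET the same loop could walk Regex.Matches and use Match.Index and Match.Length. The helper name unquoted_spans is hypothetical.

```python
import re

# Quoted-string pattern from the first post, without the |(\+) branch.
QUOTED = re.compile(r'"[^"]+"')


def unquoted_spans(text):
    """Yield (start, substring) for each stretch of text outside the quotes."""
    pos = 0
    for m in QUOTED.finditer(text):
        if m.start() > pos:
            # Text between the previous quoted match and this one.
            yield pos, text[pos:m.start()]
        pos = m.end()
    if pos < len(text):
        # Trailing text after the last quoted match.
        yield pos, text[pos:]


spans = list(unquoted_spans('"this" AND "this and that"'))
print(spans)  # [(6, ' AND ')]
```

Each result keeps its offset in the original string, so the operator positions are available without a second pass.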

Lazybones

So do you want a single regex that repeatedly matches quoted values?

Do you want a regex that uses groups and returns groups of the values ?

I am a little confused still on how the code will use the regex.

Darren Dirt

Quote from: Lazybones on March 12, 2015, 05:32:10 PM
So do you want a single regex that repeatedly matches quoted values?

Do you want a regex that uses groups and returns groups of the values ?

I am a little confused still on how the code will use the regex.

Same.

If it's within code then you can break down the logical steps and deal with the results of multiple RegEx matchings. If it's a single-line command somewhere else then it's possibly a little tougher, but in that case you're probably just looking to focus on a piece of the whole pie, so grouping the "piece in quotes" separately from the rest should get the job done, right?


Lazybones

Also, what limits and operators are there for the input?

Can there be an unlimited number of AND/OR quote blocks?
Is input with no quotes or no operator possible?
Are there additional NOT operators?

Thorin

HOLY CRAP I KNOW THE ANSWER TO THIS.  What you want is non-capturing groups.  Non-capturing groups are indicated with a question mark followed by a colon.  So the group is indicated with parentheses, then you put the question mark and colon at the start of the group (inside the parentheses).  Here's the regex for the selection you're trying to make:


(?:^|")([^"]*)(?:$|")
Prayin' for a 20!

gcc thorin.c -pedantic -o Thorin
compile successful
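To see what the regex above actually returns, here is a small sketch run against the sample string from the first post. It uses Python's re module as a stand-in for .NET; this particular pattern uses no syntax that differs between the two, but the surrounding code is an illustration only. Group 1 holds the text between quoted blocks, and the non-capturing groups consume the quotation marks on either side.

```python
import re

# Thorin's pattern: non-capturing delimiters around one capturing group.
pattern = re.compile(r'(?:^|")([^"]*)(?:$|")')

text = '"BLAH" CMD "OTHER BLAH"'

# (offset of group 1 in the original string, captured text)
found = [(m.start(1), m.group(1)) for m in pattern.finditer(text)]
print(found)    # [(0, ''), (6, ' CMD '), (23, '')]

# The empty captures come from the very start and end of the string,
# so they are filtered out here.
outside = [(i, s) for i, s in found if s.strip()]
print(outside)  # [(6, ' CMD ')]
```

The offsets refer to the original string, which is what was missing from the replace-based approach.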

Mr. Analog

Quote from: Thorin on March 12, 2015, 05:42:04 PM
HOLY CRAP I KNOW THE ANSWER TO THIS.  What you want is non-capturing groups.  Non-capturing groups are indicated with a question mark followed by a colon.  So the group is indicated with parentheses, then you put the question mark and colon at the start of the group (inside the parentheses).  Here's the regex for the selection you're trying to make:


(?:^|")([^"]*)(?:$|")


THANK YOU

Sorry for the confusion guys, I may have given too much detail. The root problem was how to select content that does not match a pattern.

Thorin

Also, if you're doing this in .NET and trying to parse a string of operands and operators, and the operators are outside the quotation marks and the operands are inside the quotation marks, why not just split the string into an array, using the quotation marks as your splitter?

"this" AND "this and that"
would result in an array like so:
0: <string.empty>
1: this
2:  AND
3: this and that
4: <string.empty>

If there was a quotation mark at the start of your string, drop the first array item.  If there was a quotation mark at the end of your string, drop the last array item.  Or even easier, remove the first and last characters if they're quotation marks, then do your split.
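The split approach can be sketched in a few lines. This version is Python (in .NET, String.Split with '"' as the separator produces the same five pieces) and it assumes the quotation marks are balanced; the variable names are made up for the example.

```python
text = '"this" AND "this and that"'

# Splitting on the quotation mark alternates outside/inside pieces.
parts = text.split('"')
print(parts)      # ['', 'this', ' AND ', 'this and that', '']

# Odd indexes are the quoted operands.
operands = parts[1::2]

# Even indexes are the text outside the quotes; drop the empty ends.
operators = [p.strip() for p in parts[0::2] if p.strip()]

print(operands)   # ['this', 'this and that']
print(operators)  # ['AND']
```

Filtering out the empty pieces has the same effect as dropping the first and last array items by hand.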

Mr. Analog

Quote from: Thorin on March 12, 2015, 05:49:43 PM
Also, if you're doing this in .NET and trying to parse a string of operands and operators, and the operators are outside the quotation marks and the operands are inside the quotation marks, why not just split the string into an array, using the quotation marks as your splitter?

"this" AND "this and that"
would result in an array like so:
0: <string.empty>
1: this
2:  AND
3: this and that
4: <string.empty>

If there was a quotation mark at the start of your string, drop the first array item.  If there was a quotation mark at the end of your string, drop the last array item.  Or even easier, remove the first and last characters if they're quotation marks, then do your split.

Eh, six of one, half a dozen of the other. The existing code already split expressions up using Regex, and I've built expressions like the above in JavaScript regex; I just couldn't remember how.

Thorin

This might be even better:


("[^"]*")([^"]*)


It gives you two numbered groups (group "1" and group "2").  In each match, group 1 holds a quoted operand and group 2 holds the operator text that follows it.  All in a single regex statement.

edit: or you can assign it your own group names like so:


(?<operands>"[^"]*")(?<operators>[^"]*)


You can refer to these group names in backreferences and replacement patterns, as well as use the names with .NET regex code.
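A quick sketch of how the named-group version behaves on the earlier example. It is written in Python, which spells named groups (?P<name>...) where .NET uses (?<name>...); in .NET the values would be read through match.Groups["operands"] and match.Groups["operators"]. The rows variable is just for illustration.

```python
import re

# Python spelling of the named-group pattern above.
pattern = re.compile(r'(?P<operands>"[^"]*")(?P<operators>[^"]*)')

text = '"this" AND "this and that"'

# One row per match: operand, the operator that follows it, and the
# operator's offset in the original string.
rows = [
    (m.group("operands"), m.group("operators").strip(), m.start("operators"))
    for m in pattern.finditer(text)
]

for row in rows:
    print(row)
# ('"this"', 'AND', 6)
# ('"this and that"', '', 26)
```

The last operand has nothing after it, so its operator group is empty; the "and" inside the second operand never shows up as an operator.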

Mr. Analog

Nice, this'll come in handy

Darren Dirt

Quote from: Mr. Analog on March 12, 2015, 06:22:18 PM
Nice, this'll come in handy

By that brief statement I'm guessing you were "tip of the tongue" close to solving it yourself, and got reminded by our "no idea what precisely you are trying to do exactly" suggestions... But good news, apparently!  8)
