C# - Extract Multiple Occurrences of Variable Length Text Without Multiple Patterns
From the following data, .xxx[val1, val2, val3], the values val1, val2 and val3 need to be extracted.
If one uses the pattern @"\[(.*?), (.*?), (.*?)\]" the data can be extracted, but when the data string varies the pattern fails to return all of the data.
Take these variable examples: .xxx[val1] or .xxx[val1, val2, val3, val4, val5] or .xxx[{1-n},].
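The limitation described above can be reproduced with a small sketch. The class and method names (FixedPatternDemo, Capture) are hypothetical and only serve the illustration; the pattern is the one from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FixedPatternDemo
{
    // The pattern from the question: exactly three groups separated by ", ".
    public const string ThreeGroups = @"\[(.*?), (.*?), (.*?)\]";

    // Returns the values of the three groups, or an empty array when there is no match.
    public static string[] Capture(string data)
    {
        Match m = Regex.Match(data, ThreeGroups);
        if (!m.Success) return Array.Empty<string>();
        // Group 0 is the whole match, so skip it.
        return m.Groups.Cast<Group>().Skip(1).Select(g => g.Value).ToArray();
    }
}
```

With this sketch, Capture(".xxx[val1, val2, val3]") yields the three values, Capture(".xxx[val1]") yields nothing because the two ", " separators are missing, and for the five-value input the third group swallows "val3, val4, val5" as a single value.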
What single regular expression pattern can achieve the results on the sets of data provided in the examples?
What is the correct pattern for this?
The best practice is not to match the unknown, but to design the pattern after the knowns. In a similar vein, do not blindly match using .* (zero or more of anything), because the resulting backtracking can be horrendously slow; why add complexity when it is not needed?
Frankly, one should favor + (one or more) over * (zero or more); the latter should be used only when specific items may not appear.
The string can vary. Judging from your example, if you think like a compiler, the tokens are separated by either a , or the ending ]. Let us develop a pattern from that knowledge (the knowns).
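The best way to capture is to consume until a known is found. Using the "not" set, [^ ], is best for that; it says match any character that is not in the set. Then we add our quantifier +, which says one or more. That replaces the .* of the old pattern by doing the reverse: instead of matching anything, it matches everything except the known delimiters.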
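A minimal sketch of the difference between the two approaches, assuming a two-value input. The names NegatedSetDemo, WithDotStar and WithNegatedSet are hypothetical, and the two patterns are simplified illustrations rather than the final pattern.

```csharp
using System.Text.RegularExpressions;

public static class NegatedSetDemo
{
    // "Anything" version: .* runs on and backtracks to the last ']' it can find.
    public static string WithDotStar(string data) =>
        Regex.Match(data, @"\[(.*)\]").Groups[1].Value;

    // "Not" set version: consume one or more characters until a known
    // delimiter (comma or closing bracket) is found.
    public static string WithNegatedSet(string data) =>
        Regex.Match(data, @"\[([^,\]]+)").Groups[1].Value;
}
```

For ".xxx[val1, val2]" the first method returns "val1, val2" while the second returns just "val1", because the negated set stops at the first known delimiter instead of consuming the rest of the bracket.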
    using System.Linq;
    using System.Text.RegularExpressions;

    var data = ".xxx[val1, val2, val3, val4, val5]";

    var pattern = @"
    [^[]+                   # Consume anything that is *not* an opening bracket,
                            # but don't capture it (.xxx is the first anchor).
    \[                      # Opening bracket consumed.
    (                       # Start of the repeated captures.
       (?<Token>[^\s,\]]+)  # Named group `Token`: one or more characters
                            # that are not a space, comma or closing bracket.
       [\s,\]]+             # Consume the token's `,`, space or the final bracket.
    )+                      # End of the repeated captures, one or more.
                            # The final bracket is consumed by the set above,
                            # so the pattern does not require a second one.";

    // IgnorePatternWhitespace allows us to comment the pattern;
    // it does not affect how the regex parser processes it.
    var tokens = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace)
                      .Groups["Token"]
                      .Captures
                      .OfType<Capture>()
                      .Select(cp => cp.Value);

Result: val1, val2, val3, val4, val5
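To check the approach against every example in the question, the pattern can be wrapped in a small helper. This is a sketch under stated assumptions: TokenExtractor and Extract are hypothetical names, the outer group is made non-capturing, and the pattern does not require a separate trailing ] because the separator set already consumes the closing bracket.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenExtractor
{
    // Assumed variant of the commented pattern; whitespace and # comments
    // are ignored because of RegexOptions.IgnorePatternWhitespace.
    private const string Pattern = @"
        [^[]+                 # skip the prefix before the opening bracket
        \[                    # opening bracket
        (?:                   # one token plus whatever follows it
          (?<Token>[^\s,\]]+) # token: not a space, comma or closing bracket
          [\s,\]]+            # separators, or the closing bracket after the last token
        )+                    # one or more tokens";

    // Returns every token captured by the named group, in order of appearance.
    public static string[] Extract(string data) =>
        Regex.Match(data, Pattern, RegexOptions.IgnorePatternWhitespace)
             .Groups["Token"]
             .Captures
             .Cast<Capture>()
             .Select(c => c.Value)
             .ToArray();
}
```

Calling Extract on ".xxx[val1]" gives one token, on ".xxx[val1, val2, val3, val4, val5]" gives five, and on ".xxx[{1-n},]" gives the single token {1-n}. An input without brackets produces an empty array, since the Captures collection of a failed match is empty.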