regex - MATLAB regexp where the pattern length depends on its contents
Background: I am using MATLAB, opening large files that contain long strings of hexadecimal represented in ASCII text. My MATLAB script interprets this text based on patterns. I have utilized bsxfun, cellfun, and arrayfun with regular expressions to parse out large chunks of the data and remove the pieces I don't need/want. I'm now getting down to the data I need. The catch is that the pattern, and in particular the pattern length, is dependent on a value inside the pattern I'm decoding.
Here are the basic building blocks:
hexcharpat = '([0-9a-f])';
hexbytepat = ['(' hexcharpat '{2})'];
hexcharpat is a single hex character. hexbytepat is 2 hex characters, or 1 byte of information.
Now, with these building blocks, I am searching the data to match various other patterns. Many of the patterns are dependent on data within the same pattern. Here is one example pattern (note: I'm using ... to separate lines, as my patterns follow this same basic format in my code):
pattern3 = [ ...
    '(?<patnum>03)' ...
    '(?<numbytes>' hexbytepat '{1})' ...
    '(?<data>' hexbytepat '{1,125})' ...
    ];
returnvalues = regexp(datastr, pattern3, 'names');
(The bsxfun call has been removed for focus and clarity.)
returnvalues is a structure with members patnum, numbytes, and data (the named regular expression tokens). In the current pattern, data can be 1 to 125 bytes. This is causing the data token to have more data than it should have. In reality, the length of data is equal to hex2dec(numbytes) bytes.
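To make the over-capture concrete, here is a minimal sketch. The sample string datastr is a hypothetical input (two back-to-back records, not taken from the real files), and the 'names' option is assumed so that regexp returns the structure described above.

```matlab
% Hypothetical sample: two pattern-03 records back to back.
% Record 1 declares 2 data bytes ('aabb'), record 2 declares 1 ('cc').
datastr = '0302aabb0301cc';

hexcharpat = '([0-9a-f])';
hexbytepat = ['(' hexcharpat '{2})'];
pattern3 = [ ...
    '(?<patnum>03)' ...
    '(?<numbytes>' hexbytepat '{1})' ...
    '(?<data>' hexbytepat '{1,125})' ...
    ];

returnvalues = regexp(datastr, pattern3, 'names');

% Compare the declared length with what the greedy {1,125} captured.
declaredchars = 2 * hex2dec(returnvalues(1).numbytes);  % 2 hex chars per byte
capturedchars = numel(returnvalues(1).data);
fprintf('declared %d hex chars, captured %d\n', declaredchars, capturedchars);
```

Because {1,125} is greedy and every remaining character is valid hex, the second record ends up inside the first record's data token instead of being matched on its own.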
A table gets populated based on the members and values within the returnvalues structure.
I think I can "brute force" this by going ahead and letting data be too large, making a table column to put numbytes in (this is a "helper column"; otherwise I don't care about this value), then revisiting the data column with regexp and splitting it further based on the values of the numbytes column, and later in the script discarding the columns I no longer need.
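The second pass of that brute-force idea might look roughly like the sketch below for a single record. The values of numbytes and data are hypothetical stand-ins for one table row, and the names payload and tail are made up for illustration.

```matlab
% Hypothetical over-captured values for one record (one table row).
numbytes = '02';
data     = 'aabb0301cc';

% Build a second pattern whose length comes from numbytes.
nchars   = 2 * hex2dec(numbytes);   % two hex characters per byte
splitpat = sprintf('^(?<payload>[0-9a-f]{%d})(?<tail>.*)$', nchars);

parts = regexp(data, splitpat, 'names');
% parts.payload holds the bytes that belong to this record;
% parts.tail holds leftover bytes that may contain other patterns.
```

This needs one extra regexp call and one extra pattern string per distinct numbytes value, which is the overhead the question is trying to avoid.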
All of the above will cost processor time and memory space (on large files I might run out of memory). I also have a "tail" of data that gets cut off and may contain bytes for other patterns. Is there an "elegant" way to do this with a single regexp (or maybe two, without having to create a lot of variables in between)?
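One candidate worth trying is MATLAB's dynamic expression syntax, (??@cmd), which runs a MATLAB command during matching and splices its output into the pattern. The sketch below is untested against the real data and rests on several assumptions: the sample datastr is hypothetical, unnamed tokens are used so that $1 unambiguously refers to the length byte, and the command is built from sprintf and hex2dec. How $N numbering interacts with the named tokens in pattern3 is not checked here.

```matlab
% Hypothetical sample: two pattern-03 records back to back.
datastr = '0302aabb0301cc';

% Token 1 is the length byte; token 2 is built at match time from token 1.
% (??@cmd) inserts the output of cmd into the expression, so the data
% part becomes [0-9a-f]{N} with N = 2 * hex2dec(length byte).
dynpat = [ ...
    '03' ...
    '([0-9a-f]{2})' ...
    '((??@sprintf(''[0-9a-f]{%d}'', 2*hex2dec($1))))' ...
    ];

tok = regexp(datastr, dynpat, 'tokens');
% Each cell of tok is expected to hold {lengthbyte, data} for one record.
```

If this behaves as intended, each record consumes only its declared number of bytes, so the following record is matched on its own rather than lost in a tail. The command runs at match time, so it may carry a processing cost of its own on large inputs.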
Note: if this can be done within the regexp, don't include how to use it inside of bsxfun, cellfun, and arrayfun; I will sort that out. Pretend I want the regexp in a single line (or maybe two) and that, for some reason, adding more lines will cost me lots of processor time (which it will if I have to repeatedly call one or more of the functions mentioned). Of course, there may be a way within those functions to use something else - if it needs to be shown in context, go for it.