regex - regexp length dependent on contents


Background: I am using MATLAB to open large files containing long strings of hexadecimal data represented as ASCII text. My MATLAB script interprets the text based on patterns. I have used bsxfun, cellfun, and arrayfun with regular expressions to parse out large chunks of data and remove the pieces I don't need or want. I'm now getting down to the data I actually need. The catch is that the pattern, and in particular the pattern length, depends on a value inside the pattern I'm decoding.

Here are the basic building blocks:

    hexcharpat = '([0-9a-f])';
    hexbytepat = ['(' hexcharpat '{2})'];

hexcharpat matches a single hex character. hexbytepat matches 2 hex characters, or 1 byte of information.

Using these building blocks, I am searching the data to match various other patterns. Many of the patterns depend on data within the same pattern. Here is one example pattern (note: I'm using ... to separate lines because the other patterns follow the same basic format in my code):

    pattern3 = [ ...
                 '(?<patnum>03)' ...
                 '(?<numbytes>' hexbytepat '{1})' ...
                 '(?<data>' hexbytepat '{1,125})' ...
               ];
    % 'names' returns the named tokens as a structure
    returnvalues = regexp(datastr, pattern3, 'names');

(The bsxfun call has been removed for focus and clarity.)

returnvalues is a structure with members patnum, numbytes, and data (the named regular expression tokens). In the current pattern, data can be 1 to 125 bytes long. This causes the data token to capture more data than it should. In reality, the length of data is equal to hex2dec(numbytes) bytes.
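To make the mismatch concrete, here is a minimal sketch. It assumes a simplified stand-in for pattern3 that uses non-capturing groups, and a made-up input string; neither comes from my real data.

```matlab
% Simplified stand-in for pattern3 (assumption: non-capturing byte groups).
hexbyte = '(?:[0-9a-f]{2})';
pattern = ['(?<patnum>03)(?<numbytes>' hexbyte ')(?<data>' hexbyte '{1,125})'];

% Made-up sample: the record declares 2 data bytes, then 2 unrelated bytes follow.
datastr = '0302aabbccdd';
rv = regexp(datastr, pattern, 'names');

capturedbytes = numel(rv.data) / 2;     % bytes the greedy {1,125} quantifier grabbed
declaredbytes = hex2dec(rv.numbytes);   % bytes the record says it contains
```

Here the greedy quantifier swallows all four trailing bytes, while numbytes only declares two of them.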

A table gets populated based on the members and values within the returnvalues structure.

I think I can "brute force" this by going ahead and letting data be too large, making a table column to hold numbytes (a "helper column" whose value I otherwise don't care about), then revisiting the data column with another regexp and splitting it further based on the values in the numbytes column, and later in the script discarding the columns I no longer need.
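A rough sketch of that brute-force second pass, using plain indexing rather than another regexp for brevity. The structure array rv is hypothetical and only mimics what regexp(..., 'names') would return; its values are made up.

```matlab
% Hypothetical over-captured results (values are illustrative only).
rv = struct('patnum',   {'03', '03'}, ...
            'numbytes', {'02', '01'}, ...
            'data',     {'aabbccdd', 'ff03'});

% Number of hex characters each record really owns (2 characters per byte).
nchars = 2 * hex2dec({rv.numbytes});

% Keep only the declared bytes of each data token ...
trimmed = arrayfun(@(k) rv(k).data(1:nchars(k)), ...
                   1:numel(rv), 'UniformOutput', false);

% ... and collect the cut-off "tail", which may belong to other patterns.
tails = arrayfun(@(k) rv(k).data(nchars(k)+1:end), ...
                 1:numel(rv), 'UniformOutput', false);
```

This works on small inputs, but it needs the extra nchars, trimmed, and tails variables on top of the over-sized data tokens.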

All of the above costs processor time and memory (on large files I might run out of memory). I would also have a "tail" of data that gets cut off and may contain bytes belonging to other patterns. Is there an "elegant" way to do this with a single regexp (or maybe two, without having to create a lot of variables in between)?

Note: if this can be done within regexp, you don't need to include how to use it inside of bsxfun, cellfun, and arrayfun; I'll sort that out. Pretend I want the regexp in a single line (or maybe two) and that, for whatever reason, adding more lines will cost me lots of processor time (which it will if I have to repeatedly call one or more of the functions mentioned). Of course, there may be a way to do this within those functions or with something else; if it needs to be shown in context, go for it.

