python - Get XML parent tag with regular expressions -

April 15, 2011

I know regular expressions are not the best way to extract information from an XML file, but in my case it's better for me to use them, because the program is structured to extract information from many different types of files (text, program code, etc.).

Let's say we have the following XML code:

<modules>
    <orba_sheepla>
        <!-- module version -->
        <version>0.9.25</version>
    </orba_sheepla>
</modules>

What I need is "orba_sheepla" in this case. In general, I need the tag one level above the <version> tag (i.e. its parent tag). It is possible that there are other tags before and after that tag on the same level. I need to make sure that the tag (or rather, the name of the tag) containing the <version> tag is found.

I have tried different kinds of regular expressions, but I can't seem to write the right one. Can I somehow tell the expression to match "tag abc" in the following?

<tag abc>
    <version>anything</version>
</the same tag abc>

Of course, other solutions are welcome!

"tag 1 level above <version>"
"it's better for me to use regular expressions"
"can't use a parser here"

You should use an XML parser! It's easier, more robust, and shouldn't involve a great refactoring effort. Use lxml, which has a getparent() function and implements XPath 1.0.
(Thanks to stribizhev for recommending that this should be remarked.)
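For comparison, here is a minimal sketch of the parser route. It assumes the standard library's xml.etree.ElementTree rather than lxml, so it runs without extra installs; its limited XPath support includes the ".." parent step. The helper name parent_tags_of is made up for illustration. With lxml, the same idea would go through getparent() or an XPath query instead.

```python
import xml.etree.ElementTree as ET

XML_TEXT = """<modules>
    <orba_sheepla>
        <!-- module version -->
        <version>0.9.25</version>
    </orba_sheepla>
</modules>"""


def parent_tags_of(xml_text, child_tag="version"):
    """Return the tag names of all elements that directly contain child_tag.

    Hypothetical helper for illustration, not part of any library.
    """
    root = ET.fromstring(xml_text)
    # ".//version/.." selects the parent of every <version> descendant of root.
    return [parent.tag for parent in root.findall(f".//{child_tag}/..")]


print(parent_tags_of(XML_TEXT))
```

Because the parser works on the element tree, indentation and sibling tags around <version> make no difference to the result.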

Anyway, here's a workaround that will work for simple cases (and fail in many real-life examples).

If, and only if, the XML is indented, you can capture the next closing tag at a lower indentation level.

Regex:

(?smi)^([ \t]+)<version>.*?^(?!\1)[ \t]*</([^\s>]+)

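It captures the name of the closing tag in group 2.
That is:

^([ \t]+) captures the indentation before
<version>, the tag whose parent we want
.*?^ lazily skips ahead to the start of a following line
(?!\1)[ \t]* that does not begin with the same indentation (i.e. it is indented less)
</([^\s>]+) and captures the name of the closing tag on that line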
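The breakdown above can also be written as a commented pattern. This is a sketch of the same regex using re.VERBOSE, wrapped in a hypothetical helper named find_parent_tag; it still only works when the XML is indented consistently.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part is commented.
# Whitespace inside character classes such as [ \t] is still significant.
PARENT_OF_VERSION = re.compile(
    r"""
    ^([ \t]+)     # group 1: indentation in front of the version tag
    <version>     # the tag whose parent we want
    .*?^          # lazily skip ahead to the start of a later line
    (?!\1)        # that line must not begin with the same indentation
    [ \t]*        # consume whatever indentation it does have
    </([^\s>]+)   # group 2: name of the closing tag on that line
    """,
    re.VERBOSE | re.IGNORECASE | re.DOTALL | re.MULTILINE,
)


def find_parent_tag(text):
    """Return the name of the closing tag found by the pattern, or None."""
    match = PARENT_OF_VERSION.search(text)
    return match.group(2) if match else None
```

Sibling tags at the same indentation as <version> are skipped by the lookahead. XML written on a single line returns None, because the pattern needs leading whitespace in front of <version>.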

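Code:

import re

text = '''
<modules>
    <orba_sheepla>
        <!-- module version -->
        <version>0.9.25</version>
    </orba_sheepla>
</modules>'''

# Flags: case-insensitive, dot matches newlines, ^ matches at every line start.
pattern = re.compile(
    r'^([ \t]+)<version>.*?^(?!\1)[ \t]*</([^\s>]+)',
    re.I | re.S | re.M)
match = pattern.search(text)

if match:
    # Group 2 holds the name of the closing tag at the shallower indentation.
    print(match.group(2))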

Output:

orba_sheepla

