python - Get XML parent tag with regular expressions -
i know regular expressions not best way extract info out of xml file, in case it's better me use regular expressions because in structure of program used extract information out of different types of files (text, program code etc.)
let's have following xml code:
<modules> <orba_sheepla> <!-- module version --> <version>0.9.25</version> </orba_sheepla> </modules>
what need "orba_sheepla" in case. need in general tag 1 level above <version>
tag (i.e. parent tag). possible there other tags before , after tag on same level. need make sure tag (or rather: name of tag) containing <version>
tag found.
i have tried different kinds of regular expressions, can't seem write right one. can somehow tell expressions match "tag abc" following?
<tag abc> <version>anything</version> </the same tag abc>
of course, other solutions welcome!
tag 1 level above
<version>
it's better me use regular expressions
can't use parser here
you should use xml parser! it's easier, more robust , shouldn't involve great effort refactoring. use lxml
have getparent()
function , xpath 1.0 implemented.
thanks stribizhev recommending should remarked
anyway, here's workaround work simple cases (and fail in many real-life examples).
- if, , if, xml indented, capture next closing tag lower indentation level.
regex:
(?smi)^([ \t]+)<version>.*?^(?!\1)[ \t]*</([^\s>]+)
captures closing tag in group 2.
is:
^([ \t]+)
captures spaces before<version>
tag want.*?^
finds next line(?!\1)[ \t]*
less indentation</([^\s>]+)
, captures closing tag
code:
import re text = ''' <modules> <orba_sheepla> <!-- module version --> <version>0.9.25</version> </orba_sheepla> </modules>''' pattern = re.compile( r'^([ \t]+)<version>.*?^(?!\1)[ \t]*</([^\s>]+)', re.i | re.s | re.m) match = pattern.search(text) if match: print(match.group(2))
output:
orba_sheepla
Comments
Post a Comment