Python - Regular expression of a sentence
I'm trying to write a regular expression to represent a sentence with the following conditions: it starts with a capital letter, ends with a period (and only one period can appear), and is allowed to contain a comma or semicolon, but when it does, it must appear as (letter)(semicolon)(space) or (letter)(comma)(space).
I've got the capital letter and the period down. I have an idea for the code, but I think I'm not getting the syntax right...
In English, my expression for a sentence looks like this:

    (capital letter) ((lowercase letter)(space) ((lowercase letter)(comma)(space))* ((lowercase letter)(semicolon)(space)* )* (period)

I realize this ignores the case where the first letter of the sentence is immediately followed by a comma or semicolon, but it's safe to ignore that case.
Now, when I try to code this in Python, I try the following (I've added whitespace to make things easier to read):

    sentence = re.compile("^[A-Z] [a-z\\s (^[a-z];\\s$)* (^[a-z],\\s$)*]* \.$")

I feel like it's a syntax issue... I'm not sure if I'm allowed to have the semicolon and comma portions inside of parentheses.
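On the syntax question: inside square brackets, parentheses are ordinary characters rather than grouping operators, so the comma and semicolon portions need to sit outside the character class to repeat as a unit. A minimal sketch of the difference follows; the names `char_class` and `group` are illustrative only and are not part of the original attempt.

```python
import re

# Inside [...], '(' and ')' are literal members of the character class.
char_class = re.compile(r"[a(b)]+")
print(bool(char_class.fullmatch("(ab)")))   # True

# Outside the brackets, (?:...) is a real group that repeats as a unit:
# here each repetition is (letter)(comma)(space).
group = re.compile(r"(?:[a-z], )+")
print(bool(group.fullmatch("a, b, ")))      # True
print(bool(group.fullmatch("a ,b")))        # False

# '^' in the middle of a pattern is still a start-of-string anchor,
# so it cannot match after another character has been consumed.
print(re.search(r"x(^y)", "xy"))            # None
```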
Sample inputs that match the definition:

    "This sentence."
    "Hello, world."
    "Hi there; hi there."

Sample inputs that do not match the definition:

    "i ate breakfast."
    "This , sentence."
    "What time it?"
This will match what you described above:

    ^"[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]"$?
This will match:

    "This sentence."
    "Hello, world."
    "Hi there; hi there."

This won't match:

    "i ate breakfast."
    "This , sentence."
    "What time it?"
    "I ,d am."
    "I a,d am."

If you don't need the surrounding double quotes, remove them from the regex.
If you need the regex in Python, try:

    re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]$')
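The `(?=\s)` lookahead is the part that enforces the space after a comma or semicolon, and the `(?<!\s)` lookbehind rejects a space before it. The sketch below compares the pattern with and without the lookahead; the variable names and the extra test strings are illustrative assumptions, not part of the original answer.

```python
import re

# The pattern as given above.
with_lookahead = re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]$')
# The same pattern with (?=\s) dropped, for comparison only.
without_lookahead = re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,])*[.]$')

for text in ["Hello, world.", "Hello,world.", "Hello , world."]:
    print(repr(text),
          bool(with_lookahead.match(text)),
          bool(without_lookahead.match(text)))
```

Only "Hello, world." passes both patterns. "Hello,world." passes only when the lookahead is missing, and "Hello , world." fails both because of the lookbehind.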
Python demo:

    import re

    tests = ["This sentence.",
             "Hello, world.",
             "Hi there; hi there.",
             "i ate breakfast.",
             "This , sentence.",
             "What time it?"]
    # (?<!\s) rejects a space before a comma or semicolon;
    # (?=\s) requires whitespace after it.
    rex = re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]$')
    for test in tests:
        print(rex.match(test))

Output (as printed by Python 2; Python 3 shows re.Match objects instead, and addresses vary):

    <_sre.SRE_Match object at 0x7f31225afb70>
    <_sre.SRE_Match object at 0x7f31225afb70>
    <_sre.SRE_Match object at 0x7f31225afb70>
    None
    None
    None
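As an alternative, the English description from the question can be written more literally as "a capitalized word, then zero or more further words, each preceded by an optional comma or semicolon and one space, then a period". The sketch below is stricter than the pattern above: it assumes exactly one space between words and a period directly after the last word, which the question does not state explicitly. The names `SENTENCE` and `is_sentence` are hypothetical.

```python
import re

# Assumption: words are lowercase letters separated by exactly one space,
# and the period follows the last word with nothing in between.
SENTENCE = re.compile(r"[A-Z][a-z]*(?:[,;]? [a-z]+)*\.")

def is_sentence(text):
    """Return True if the whole string fits the sketch pattern."""
    # fullmatch anchors at both ends, so no ^ or $ is needed.
    return SENTENCE.fullmatch(text) is not None

should_match = ["This sentence.", "Hello, world.", "Hi there; hi there."]
should_fail = ["i ate breakfast.", "This , sentence.", "What time it?",
               "I ,d am.", "I a,d am."]

for text in should_match + should_fail:
    print(repr(text), is_sentence(text))
```

Because every repetition of the group has to start with a space (optionally preceded by one punctuation mark), there is only one way to split the input into repetitions, which avoids the nested `*` quantifiers over alternatives that can match the empty string.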