python - Regular expression of a sentence
I'm trying to write a regular expression to represent a sentence with the following conditions: it starts with a capital letter, ends with a period (and only one period can appear), and is allowed to contain a comma or semicolon, but when it does, the punctuation must appear as (letter)(semicolon)(space) or (letter)(comma)(space).
I've got the capital letter and the period down. I have an idea for the code, but I think I'm not getting the syntax right...
In English, my expression for a sentence looks like this:
(capital letter) ((lowercase letter)(space) ((lowercase letter)(comma)(space))* ((lowercase letter)(semicolon)(space))* )* (period)
I realize this ignores the case where the first letter of the sentence is immediately followed by a comma or semicolon, but it's safe to ignore that case.
Now when I try to code this in Python, I try the following (I've added whitespace to make things easier to read):
sentence = re.compile("^[A-Z] [a-z\\s (^[a-z];\\s$)* (^[a-z],\\s$)*]* \.$")
I feel like it's a syntax issue... I'm not sure if I'm allowed to have the semicolon and comma portions inside of parentheses.
Sample inputs that match the definition:
"This is a sentence." "Hello, world." "Hi there; hi there."
Sample inputs that do not match the definition:
"i ate breakfast." "This is , a sentence." "What time is it?"
This will match what you described above.
^"[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]"$
This will match:
"This is a sentence." "Hello, world." "Hi there; hi there."
This won't match:
"i ate breakfast." "This is , a sentence." "What time is it?" "I ,d am." "I a,d am."
If you don't need the surrounding " characters, remove them from the regex.
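The (?<!\s)[;,](?=\s) part of the pattern is what enforces the punctuation rule: a comma or semicolon is accepted only if the character before it is not whitespace and the character after it is whitespace. A minimal sketch that isolates just that piece; the helper name well_placed and the short test strings are made up for illustration:

```python
import re

# Only the punctuation piece of the pattern: a comma or semicolon that is
# not preceded by whitespace and is followed by whitespace.
PUNCT = re.compile(r'(?<!\s)[;,](?=\s)')

def well_placed(text):
    """Return True if text contains a comma/semicolon in an accepted position."""
    return PUNCT.search(text) is not None

for sample in ["a, b", "a ,b", "a,b", "a , b"]:
    print(repr(sample), well_placed(sample))
```

Only "a, b" is accepted here. Note that the lookbehind rules out whitespace before the punctuation rather than requiring a letter; in the full pattern that is enough, because the only other characters allowed in the sentence body are lowercase letters.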
If you need this regex in Python, try this:
re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]$')
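To make the pattern easier to read, the same expression can be spelled out with re.VERBOSE so that each part carries a comment. This is only a sketch: it uses [A-Z] for the leading capital letter, as the stated rule requires, and the names SENTENCE and is_sentence are made up for illustration.

```python
import re

# The pattern from above, split across lines; in VERBOSE mode whitespace
# and "#" comments inside the pattern are ignored.
SENTENCE = re.compile(r'''
    ^[A-Z]                  # one capital letter at the start
    [a-z]*                  # the rest of the first word
    (                       # then any number of:
        \s*                 #   whitespace
      | [a-z]*              #   lowercase letters
      | (?<!\s)[;,](?=\s)   #   comma or semicolon, no space before, space after
    )*
    [.]$                    # a period at the very end
''', re.VERBOSE)

def is_sentence(text):
    """Return True if text satisfies the sentence rules encoded above."""
    return SENTENCE.match(text) is not None

print(is_sentence("Hello, world."))          # True
print(is_sentence("This is , a sentence."))  # False
```

Because the period can only be consumed by the final [.], a second period anywhere in the text makes the match fail, which covers the "only one period" condition.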
Python demo
import re

# The first three inputs should match; the last three should not.
tests = ["This is a sentence.",
         "Hello, world.",
         "Hi there; hi there.",
         "i ate breakfast.",
         "This is , a sentence.",
         "What time is it?"]
rex = re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]$')
for test in tests:
    print(rex.match(test))
Output
<_sre.SRE_Match object at 0x7f31225afb70>
<_sre.SRE_Match object at 0x7f31225afb70>
<_sre.SRE_Match object at 0x7f31225afb70>
None
None
None
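On the syntax question in the original attempt: the likely problem is that the comma and semicolon groups were written inside square brackets. Inside a character class, ( ) * and $ are ordinary characters, and ^ is only special as the very first character, so nothing inside [...] acts as a group or an anchor. A small sketch with made-up patterns (in_class and grouped are hypothetical names, not part of the question or answer):

```python
import re

# Inside [...] the characters ( ) * are plain set members,
# so this matches any run made of the characters ( a b ) *.
in_class = re.compile(r'^[(ab)*]+$')
print(bool(in_class.match('b)(*a')))   # True

# Outside the brackets, (?:ab)* really means "the sequence ab, repeated".
grouped = re.compile(r'^(?:ab)*$')
print(bool(grouped.match('abab')))     # True
print(bool(grouped.match('b)(*a')))    # False
```

Parentheses are allowed in a regex, but they only group when they sit outside the brackets, which is how the pattern above uses them.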