python - Matching multiple words with FreqDist in nltk
import nltk
from nltk.tokenize import word_tokenize

txt = "finding common place isn't commonly available among commoners place"
fd = nltk.FreqDist()
# count every lower-cased token
for w in word_tokenize(txt.lower()):
    fd[w] += 1
I have the above script, which works fine. If I type fd['place'] I get 2, and if I type fd['common'] I get 1.
Is it possible to type something similar to fd['common*'] (which doesn't work) to obtain 3, and possibly a list of the matches? The 3 matches would be (common, commonly, commoners).
I'm assuming it has something to do with regex, but I'm not sure how to implement it with FreqDist(). If not, are there any other packages that might do that?
FreqDist is just a kind of dictionary, and dictionary keys work by exact match only.
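To use regexps for something like this, you need to do it the hard way: iterate over all the entries and add up the counts for the words that match. Of course, this needs to scan the whole list, so it will be slow if the list is large and you need to do it a lot.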
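A rough sketch of that scan, in case it helps. To keep it runnable without nltk or its tokenizer data, it uses collections.Counter (which FreqDist subclasses) and a plain str.split() in place of word_tokenize; those substitutions and the helper name matching_counts are assumptions for illustration, not part of nltk. Note that as a regex, 'common*' would mean "commo" followed by any number of "n", so the pattern you probably want is 'common.*'.

```python
import re
from collections import Counter

txt = "finding common place isn't commonly available among commoners place"

# Counter stands in for nltk.FreqDist here; split() stands in for word_tokenize.
fd = Counter(txt.lower().split())

def matching_counts(freq, pattern):
    """Hypothetical helper: return {word: count} for keys that fully match the regex."""
    rx = re.compile(pattern)
    return {w: c for w, c in freq.items() if rx.fullmatch(w)}

matches = matching_counts(fd, r"common.*")   # the matching words and their counts
total = sum(matches.values())                # combined count over all matches
```

Since the helper only relies on .items(), it should work the same way if you pass it a real FreqDist.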
If you're only after matching prefixes, use a data structure called a "prefix tree" or "trie". You can probably guess what it does. A simple work-around is to record counts in a FreqDist for each prefix of each word you see (so not just for the complete word).
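The prefix-counting work-around could look something like this. Again, Counter is used as a stand-in for FreqDist and str.split() as a stand-in for word_tokenize so the snippet runs on its own; the name prefix_fd is made up for the example.

```python
from collections import Counter

txt = "finding common place isn't commonly available among commoners place"

# Counter stands in for nltk.FreqDist; split() stands in for word_tokenize.
prefix_fd = Counter()
for w in txt.lower().split():
    # Count every prefix of the word, including the full word itself.
    for end in range(1, len(w) + 1):
        prefix_fd[w[:end]] += 1

common_count = prefix_fd["common"]   # words starting with "common"
place_count = prefix_fd["place"]     # words starting with "place"
```

A prefix lookup is then a plain exact-match key lookup, at the cost of storing one entry per distinct prefix. It only gives you the counts, though, not the list of matching words; for that you still need the scan or a real trie.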