python - Matching multiple words with FreqDist in nltk


import nltk
from nltk.tokenize import word_tokenize

txt = "finding common place isn't commonly available among commoners place"

fd = nltk.FreqDist()
for w in word_tokenize(txt.lower()):
    fd[w] += 1

I have the above script and it works fine. If I type fd['place'] I get 2, and if I type fd['common'] I get 1.

Is it possible to type something similar to fd['common*'] (which doesn't work) to obtain 3, and possibly a list of the matches? The 3 matches would be (common, commonly, commoners).

I'm assuming this has something to do with regex, but I'm not sure how to implement it with FreqDist().

If not, are there other packages that might do that?

A FreqDist is a kind of dictionary, and dictionary keys only work by exact match.

To use regexps for this, you need to do it the hard way: iterate over all the entries and add up the counts of the words that match. Of course, this scans the whole list, so it will be slow if the list is large and you need to do this a lot.
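A minimal sketch of that brute-force approach (the helper name matching_counts is just for illustration, not an nltk function):

import re
import nltk
from nltk.tokenize import word_tokenize

txt = "finding common place isn't commonly available among commoners place"
fd = nltk.FreqDist(word_tokenize(txt.lower()))

def matching_counts(fd, pattern):
    # Scan every entry and keep the keys that match the regex
    # (re.match anchors at the start, so 'common' matches 'commonly' too).
    regex = re.compile(pattern)
    matches = {w: c for w, c in fd.items() if regex.match(w)}
    return sum(matches.values()), matches

total, matches = matching_counts(fd, r'common')
print(total)    # 3
print(matches)  # {'common': 1, 'commonly': 1, 'commoners': 1}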

If you're only after matching prefixes, use a data structure called a "prefix tree" or "trie". You can guess what it does. A simple work-around is to record counts in a FreqDist for each prefix of each word you see (so not just the complete word), as sketched below.
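A rough sketch of that work-around, assuming you build the prefix counts up front:

import nltk
from nltk.tokenize import word_tokenize

txt = "finding common place isn't commonly available among commoners place"

# Count every prefix of every token, so prefix_fd['common'] accumulates
# one hit each for "common", "commonly" and "commoners".
prefix_fd = nltk.FreqDist()
for w in word_tokenize(txt.lower()):
    for i in range(1, len(w) + 1):
        prefix_fd[w[:i]] += 1

print(prefix_fd['common'])  # 3
print(prefix_fd['place'])   # 2

Note that this trades memory for lookup speed: every query is a plain dictionary access, but you store one entry per prefix rather than per word.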

