python - Matching multiple words with FreqDist in nltk -

April 15, 2015

    import nltk
    from nltk.tokenize import word_tokenize

    txt = "finding common place isn't commonly available among commoners place"

    fd = nltk.FreqDist()

    # Count every lower-cased token in the text
    for w in word_tokenize(txt.lower()):
        fd[w] += 1

I have the above script, which works fine. If I type fd['place'] I get 2, and if I type fd['common'] I get 1.

Is it possible to type something similar to fd['common*'] (which doesn't work) to obtain 3, and possibly a list of the matches? The 3 matches would be common, commonly, and commoners.

I'm assuming it has something to do with regex, but I'm not sure how to implement it with FreqDist().

If not, are there any other packages that might do that?

FreqDist is just a kind of dictionary, and dictionary keys only work by exact match.

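To use regexps for something like this, you need to do it the hard way: iterate over all the entries and add up the counts for all the words that match. Of course, this needs to scan the whole list, so it will be slow if the list is large and you need to do this a lot.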
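A minimal sketch of that scan, under a few assumptions: `collections.Counter` stands in for `FreqDist` so the snippet runs without NLTK (in NLTK 3, `FreqDist` is a `Counter` subclass, so the same loop should apply to it), the text is tokenized with `str.split()` rather than `word_tokenize`, and `count_matching` is a hypothetical helper name, not part of any library.

```python
import re
from collections import Counter

txt = "finding common place isn't commonly available among commoners place"

# Stand-in for nltk.FreqDist; whitespace splitting is an assumption made
# to keep the example free of NLTK.
fd = Counter(txt.lower().split())


def count_matching(freq, pattern):
    """Return (total count, {word: count}) for keys that fully match pattern."""
    rx = re.compile(pattern)
    matches = {word: n for word, n in freq.items() if rx.fullmatch(word)}
    return sum(matches.values()), matches


# The regex equivalent of the wished-for fd['common*']
total, matches = count_matching(fd, r"common.*")
print(total, sorted(matches))
```

Each call visits every key once, so the cost grows with the vocabulary size, which is the slowness mentioned above.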

If you're only after matching prefixes, use a data structure called a "prefix tree" or "trie". You can probably guess what it does. A simple work-around is to record counts in a FreqDist for each prefix of each word you see (so not just for the complete word).
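The prefix work-around could look roughly like this. It is a sketch, not a tuned implementation: `collections.Counter` again stands in for `FreqDist`, tokens come from `str.split()`, and `prefix_counts` is an illustrative name.

```python
from collections import Counter

txt = "finding common place isn't commonly available among commoners place"

# One counter entry per prefix of every token, so a prefix lookup becomes
# a plain exact-match dictionary lookup.
prefix_counts = Counter()
for word in txt.lower().split():
    for end in range(1, len(word) + 1):
        prefix_counts[word[:end]] += 1

print(prefix_counts["common"])  # words starting with "common"
print(prefix_counts["place"])   # words starting with "place"
```

The trade-off is memory: every word contributes one entry per character, and the counter only gives totals, not the list of matching words. If the list of matches is needed as well, a trie or the regex scan is the better fit.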
