python - Fastest way to match substring from large dict


I have a string (usually under 300 symbols long), e.g. 'aabbccdcabcbbacdaaa'.

There is a Python dictionary whose keys are strings in a similar format, e.g. 'bcccd'. Key length varies from 10 to 100 symbols. The dictionary has half a million items.

I need to match my initial string to a value in the dictionary, or find out that there are no proper values in the dictionary. Matching condition: the dictionary key should appear somewhere within the string (strict matching).

What is the best way, in terms of computational speed, to do it? I feel there should be some tricky way to hash the initial string and the dictionary keys and apply some clever method of substring search (like Rabin-Karp or Knuth-Morris-Pratt). Or could a suffix tree-like structure be a good solution?
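To make the matching condition concrete, here is a minimal baseline sketch. It is not the fast approach being asked for: it tests every key against the string, so it does one substring check per dictionary item. The function name `naive_match` and the example values are assumptions for illustration only.

```python
def naive_match(string, dict_search):
    """Return {key: value} for every dictionary key that occurs inside string.

    Hypothetical baseline: scans all keys, so the cost grows with the
    number of dictionary items (half a million in the question).
    """
    return {key: value for key, value in dict_search.items() if key in string}


# Illustrative data; the values 1, 2, 3 are made up for the example.
example = {'bccd': 1, 'adb': 2, 'abc': 3}
print(naive_match('aabbccdcabcbbacdaaa', example))
# {'bccd': 1, 'abc': 3}
```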

def search(string, dict_search):
    # If these 2 lines are expensive, calculate them once and pass them in as arguments
    max_key = max(len(x) for x in dict_search)
    min_key = min(len(x) for x in dict_search)

    # Try every substring whose length could match a key and keep the ones that are keys
    return set(
        string[x:x+i]
        for i in range(min_key, max_key+1)
        for x in range(len(string)-i+1)
        if string[x:x+i] in dict_search
    )

Running:

>>> search('aabbccdcabcbbacdaaa', {'aaa', 'acd', 'adb', 'bccd', 'cbbb', 'abc'})
{'aaa', 'abc', 'acd', 'bccd'}
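The comment in the function suggests computing the key lengths once and passing them in. A minimal sketch of that idea follows, assuming the same dictionary is queried with many strings. It also tries only the lengths that actually occur among the keys instead of every length between the minimum and maximum. The names `key_lengths` and `search_with_lengths` are hypothetical and not part of the original answer.

```python
def key_lengths(dict_search):
    """Return the sorted distinct key lengths (hypothetical helper)."""
    return sorted({len(key) for key in dict_search})


def search_with_lengths(string, dict_search, lengths):
    """Return the set of keys found in string, trying only the given lengths."""
    found = set()
    for size in lengths:
        # range() is empty when size > len(string), so overlong keys are skipped.
        for start in range(len(string) - size + 1):
            piece = string[start:start + size]
            if piece in dict_search:  # hash lookup; works for a dict or a set
                found.add(piece)
    return found


keys = {'aaa', 'acd', 'adb', 'bccd', 'cbbb', 'abc'}
lengths = key_lengths(keys)  # compute once, reuse for every query string
print(sorted(search_with_lengths('aabbccdcabcbbacdaaa', keys, lengths)))
# ['aaa', 'abc', 'acd', 'bccd']
```

With keys of 10 to 100 symbols and a string under 300 symbols, this is at most a few tens of thousands of slices and hash lookups per query, regardless of how many items the dictionary holds.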
