java - Filtering out strings from a HashSet (or other collection) -


i reading contents of txt file hashset. file contains every word in english language, , each word becomes string in hashset.

in app, characters being added string. want check whether string is, or can become equal to, of strings in hashset. is, hashset contains string apple. have string appl, , want filter out hashset until becomes set strings start appl (in case set apple).

i can iterate on entire hashset , use startswith(string) method, build new filtered hashset. initial hashset big, question is: there more efficient way (perhaps using different type of collection?)

some code of how right now:

private hashset<string> filter(string partofword){     hashset<string> filteredset = new hashset<>();      (string word : dictionary) { // dictionary full hashset         if (word.startswith(partofword)) {             filteredset.add(word);         }     }     return filteredset; } 

a trie ultimate weapon of doom task, can efficiency out of treeset:

private treeset<string> dictionary;  private treeset<string> filter(string partofword) {     return (treeset<string>)dictionary.subset(partofword, partofword + "zzz"); } 

everything start "appl" between "appl" (inclusive if it's word itself) , "applzzz" (no english word has 3 consecutive "z"'s in it), lexicographically greater words start "appl". time complexity of call subset() o(log n) find start of subset , o(m) (m = number returned) range, pretty good.

note if able reuse returned set new dictionary word grows, have more efficient code overall.

the cast treeset<string> needed because subset() method of sortedset interface , returns sortedset, it's covariant because treeset implementation returns view (another efficiency benefit), of course treeset.

for improved efficiency, uglier code, use sorted string[] , arrays.binarysearch(), once located hit, iterate along array collection hits.

note both treeset , sorted array have o(log n) look-up time, whereas hashset (although unsuitable task) o(1) time.


Comments

Popular posts from this blog

html - Outlook 2010 Anchor (url/address/link) -

javascript - Why does running this loop 9 times take 100x longer than running it 8 times? -

Getting gateway time-out Rails app with Nginx + Puma running on Digital Ocean -