java - Filtering out strings from a HashSet (or other collection) -
i reading contents of txt file hashset
. file contains every word in english language, , each word becomes string in hashset
.
in app, characters being added string. want check whether string is, or can become equal to, of strings in hashset
. is, hashset
contains string apple. have string appl, , want filter out hashset
until becomes set strings start appl (in case set apple).
i can iterate on entire hashset
, use startswith(string)
method, build new filtered hashset
. initial hashset
big, question is: there more efficient way (perhaps using different type of collection?)
some code of how right now:
private hashset<string> filter(string partofword){ hashset<string> filteredset = new hashset<>(); (string word : dictionary) { // dictionary full hashset if (word.startswith(partofword)) { filteredset.add(word); } } return filteredset; }
a trie ultimate weapon of doom task, can efficiency out of treeset
:
private treeset<string> dictionary; private treeset<string> filter(string partofword) { return (treeset<string>)dictionary.subset(partofword, partofword + "zzz"); }
everything start "appl" between "appl" (inclusive if it's word itself) , "applzzz" (no english word has 3 consecutive "z"'s in it), lexicographically greater words start "appl". time complexity of call subset()
o(log n)
find start of subset , o(m)
(m = number returned) range, pretty good.
note if able reuse returned set new dictionary word grows, have more efficient code overall.
the cast treeset<string>
needed because subset()
method of sortedset
interface , returns sortedset
, it's covariant because treeset
implementation returns view (another efficiency benefit), of course treeset
.
for improved efficiency, uglier code, use sorted string[]
, arrays.binarysearch()
, once located hit, iterate along array collection hits.
note both treeset
, sorted array have o(log n)
look-up time, whereas hashset
(although unsuitable task) o(1)
time.
Comments
Post a Comment