Unicode-safe find using Boost and standard C++

March 15, 2012

Consider the following snippet:

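    #include <boost/locale.hpp>
    #include <iostream>
    #include <string>

    namespace bl = boost::locale;

    int main() {
        static bl::generator gen;
        static auto loc = gen("en_US.UTF-8");
        // Pre-C++20 only: from C++20 on, u8 literals are char8_t-based.
        std::string foo8 = u8"föo";
        std::string deco = bl::normalize(foo8, bl::norm_nfd, loc);  // decomposed form
        std::string comp = bl::normalize(foo8, bl::norm_nfc, loc);  // composed form
        std::cout << "decomposed: " << deco.find("o")
                  << ", composed: " << comp.find("o") << "\n";
    }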

This gives: "decomposed: 1, composed: 3".
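The two offsets follow from the UTF-8 byte layout of each normalization form, since std::string::find counts bytes rather than characters. A minimal sketch using only the standard library, with the bytes written out by hand; the names deco, comp and first_o are illustrative, and the byte strings are assumed to match what bl::normalize returns for this input:

```cpp
#include <string>

// NFD of "föo": 'f', 'o', U+0308 COMBINING DIAERESIS (0xCC 0x88), 'o'
const std::string deco = "fo\xCC\x88o";

// NFC of "föo": 'f', U+00F6 (0xC3 0xB6), 'o'
const std::string comp = "f\xC3\xB6o";

// Byte offset of the first 'o' byte. std::string::find knows nothing about
// combining marks, so in the decomposed form it stops at the base letter
// of the ö.
std::string::size_type first_o(const std::string& s) {
    return s.find("o");
}
```

In the decomposed string the match at offset 1 is the base letter of ö; in the composed string the ö occupies bytes 1 and 2, so the first plain o sits at offset 3.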

Now, the correct answer depends on the collation factor, but in most cases the latter is what you want -- the first location of o, not the first part of a decomposed ö. In this example I can normalize the string to NFC to ensure the desired result, but that won't work in cases where the grapheme cluster can't be composed.

Further, x.find("ö") will have implementation-defined behavior, since there are no guarantees about how the ö is encoded in the search string.

I can implement a Unicode-safe find function by implementing the algorithm in UAX #29, or by normalizing the search strings, but I'm wondering if there is a way of doing this using the C++ standard library and Boost -- perhaps by combining locale and the string algorithms -- but I haven't found a solution.

Does anyone have a definitive answer? I'm aware that I could use ICU, and that boost::locale is a C++-friendly wrapper around the ICU library (at least if you want full Unicode support).

> Further, x.find("ö") will have implementation-defined behavior, since there are no guarantees about how the ö is encoded in the search string.

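Sadly, there isn't much you can do about this. The client of the API will have to ensure that the call is made using the u8 prefix and that the argument is normalized. One could write a find function that normalizes its input prior to searching, but there's no way to mitigate the ambiguity in encoding.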
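To illustrate the ambiguity: the same visible ö can reach find as two different byte sequences, and a byte-wise search only succeeds when both sides happen to use the same one. A small sketch, assuming UTF-8 and spelling the bytes out explicitly so that the source file's encoding plays no role; found and the two constants are hypothetical names:

```cpp
#include <string>

// Two canonically equivalent spellings of "ö" as explicit UTF-8 bytes.
const std::string o_composed   = "\xC3\xB6";   // U+00F6
const std::string o_decomposed = "o\xCC\x88";  // U+006F followed by U+0308

// Plain byte-wise substring test, which is what std::string::find does.
bool found(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}
```

Here found("f\xC3\xB6o", o_composed) is true while found("f\xC3\xB6o", o_decomposed) is false, even though both needles display as ö. Writing the needle with explicit escapes removes the dependence on how the compiler encodes the literal, but the caller still has to normalize it the same way as the text being searched.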

> I can implement a Unicode-safe find function by implementing the algorithm in UAX #29

There's no need to implement it yourself, since it is already implemented by Boost.Locale's segment_index.

> I'm wondering if there is a way of doing this using the C++ standard library and Boost -- perhaps by combining locale and the string algorithms -- but I haven't found a solution.

The standard library is borderline useless here, and as far as I know Boost.Locale doesn't have any string search facilities. ICU's string search functionality uses the notion of canonical equivalence, and that's your best bet.
