sorting_unicode [2007/07/27 13:39] (current)
|+||====== Sorting Unicode ======|
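|+||To make sorting and searching in Unicode ignore diacritical marks, first normalize the text with [[http://en.wikipedia.org/wiki/Unicode_normalization|normalized compatibility decomposition (NFKD)]], which splits each accented character into a base character followed by its combining marks. Then keep just the first character of each decomposed sequence, or only the ASCII characters. |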
|+||The [[http://www.icu-project.org/|ICU library]] provides C and Java APIs for normalization, as well as collation and other Unicode text handling. |
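|+||In Python, the ASCII-only variant looks like <code>unicodedata.normalize('NFKD',s).encode('ASCII','ignore')</code>. In Python 3 this returns ''bytes''; append ''.decode('ASCII')'' to get a string back. |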
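
The two variants described above (keep the base character of each sequence, or keep only ASCII) can be sketched as follows. This is a minimal illustration for Python 3, not a full collation algorithm; the helper names ''strip_marks'' and ''ascii_key'' and the sample word list are made up for this example.

```python
import unicodedata

def strip_marks(s):
    """Decompose with NFKD, then drop combining marks and keep base characters.

    Hypothetical helper: non-ASCII base letters (e.g. Greek) are preserved.
    """
    decomposed = unicodedata.normalize('NFKD', s)
    # unicodedata.combining() returns 0 for base characters, non-zero for marks
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def ascii_key(s):
    """Decompose with NFKD, then discard everything that is not ASCII.

    Hypothetical helper: characters with no ASCII base are dropped entirely.
    """
    decomposed = unicodedata.normalize('NFKD', s)
    return decomposed.encode('ASCII', 'ignore').decode('ASCII')

# Sample data (assumed for illustration): sort ignoring accents and case
words = ['zebra', '\u00c9mile', 'eclair', '\u00e9clair', 'Zo\u00eb']
sorted_words = sorted(words, key=lambda w: strip_marks(w).casefold())
print(sorted_words)
```

The ASCII variant is simpler but silently drops any character without an ASCII base, so text in non-Latin scripts collapses to an empty key; the combining-mark variant keeps such characters and only removes the diacritics.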