sorting_unicode [2007/07/27 13:39] (current)
====== Sorting Unicode ======
To make sorting and searching of Unicode text ignore diacritical marks, convert the text to [[http://wiki/Unicode_normalization|normalized compatibility decomposition (NFKD)]], then keep only the first character of each decomposed sequence (or only the ASCII characters).
The [[http://|ICU library]] is available for C and Java and provides normalization along with many other Unicode facilities.
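In Python this looks like the following. Because of the ''ignore'' error handler, any character that still has no ASCII form after decomposition is dropped from the result.
<code>
import unicodedata
unicodedata.normalize('NFKD', s).encode('ASCII', 'ignore')
</code>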
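As a rough sketch of the "keep only the base character" variant, the decomposed text can be filtered with ''unicodedata.combining'' instead of being encoded to ASCII, so non-Latin letters survive and only the combining marks are removed. The helper names ''strip_diacritics'' and ''sort_key'' and the sample word list are illustrative assumptions, not part of any library, and case folding is an extra choice added here for the sort key.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose text with NFKD, then drop the combining marks.

    Hypothetical helper for illustration: base characters are kept
    whether or not they are ASCII.
    """
    decomposed = unicodedata.normalize('NFKD', text)
    # unicodedata.combining() returns 0 for base characters and a
    # non-zero combining class for accents and similar marks.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_key(text):
    """Key that ignores diacritics and, as an added assumption, case."""
    return strip_diacritics(text).casefold()


# Sample data, made up for the example.
words = ['zebra', 'Éclair', 'apple', 'école', 'Zürich']
print(sorted(words, key=sort_key))
```

Letters that have no decomposition, such as ''ø'' or ''ß'', pass through this filter unchanged, whereas the ASCII-encoding approach drops them entirely. Language-specific collation rules are not handled by either approach.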
sorting_unicode.txt · Last modified: 2007/07/27 13:39 (external edit)