sorting_unicode

sorting_unicode [2007/07/27 13:39] (current)
====== Sorting Unicode ======

To make sorting and searching in Unicode ignore diacritical marks, first convert the text to [[http://en.wikipedia.org/wiki/Unicode_normalization|normalization form NFKD (compatibility decomposition)]]. This splits each accented character into a base character followed by its combining marks. Then keep only the base character of each sequence, or only the ASCII characters.
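
The approach described above can be sketched in Python 3 with the standard ''unicodedata'' module. This is a minimal sketch, not a full collation algorithm: the helper names ''strip_marks'' and ''fold_key'' are made up for illustration, and the case folding is an extra assumption that the text above does not require.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Return text with combining marks removed after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns the canonical combining class:
    # nonzero for accents and similar marks, 0 for base characters.
    return "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)


def fold_key(text: str) -> str:
    """Hypothetical sort/search key: diacritics removed, case folded."""
    return strip_marks(text).casefold()


words = ["zebra", "Éclair", "apple", "eclair", "école"]

# Sorting: accented and unaccented spellings end up next to each other.
ordered = sorted(words, key=fold_key)

# Searching: compare keys instead of the raw strings.
matches = [w for w in words if fold_key(w) == fold_key("ECLAIR")]
```

Unlike keeping only ASCII, this variant keeps base letters from non-Latin scripts and only drops the marks attached to them.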

The [[http://www.icu-project.org/|ICU library]] provides C/C++ and Java implementations of normalization, along with collation and other Unicode facilities.
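
In Python, using the standard ''unicodedata'' module and a Unicode string ''s'', this looks like <code>unicodedata.normalize('NFKD', s).encode('ASCII', 'ignore')</code> The result is a byte string that contains only the ASCII characters of ''s''. Characters that have no decomposition are dropped entirely.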
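
For Python 3, where ''encode'' returns ''bytes'', the same one-liner can be wrapped so that it returns a string again. The function name ''to_ascii'' is an assumption made for this sketch. The second call illustrates a limitation: letters such as ''Ø'' and ''ß'' have no decomposition, so they vanish rather than turning into ''O'' or ''ss''.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """NFKD-normalize text, then drop every non-ASCII character."""
    decomposed = unicodedata.normalize("NFKD", text)
    # encode(..., "ignore") silently discards anything outside ASCII,
    # including the combining marks produced by the decomposition.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("naïve café"))     # accents are removed
print(to_ascii("Ørsted Straße"))  # Ø and ß are lost completely
```

If losing such letters is not acceptable, a small explicit replacement table applied before normalization is one possible workaround.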
  
sorting_unicode.txt · Last modified: 2007/07/27 13:39 (external edit)