sorting_unicode [2020/10/10 14:13] (current) – created - external edit 127.0.0.1
====== Sorting Unicode ======

To make sorting and searching of Unicode text ignore diacritical marks, convert the text to [[http://en.wikipedia.org/wiki/Unicode_normalization|Normalization Form KD (NFKD), the compatibility decomposition]] and then keep only the base character of each decomposed sequence, dropping the combining marks that follow it (or, more crudely, keep only the ASCII characters).
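
A minimal Python sketch of the "keep only the base character" approach, assuming accent-insensitive ordering is all that is needed (a real collation, such as ICU's, also handles locale-specific rules that this ignores). The helper names ''fold_diacritics'' and ''contains'' are illustrative only, not part of any library. After NFKD, it drops every code point that ''unicodedata.combining'' reports as a combining mark:

```python
import unicodedata

def fold_diacritics(s: str) -> str:
    """Return s with diacritical marks removed (illustrative helper).

    NFKD splits a precomposed character such as 'é' into a base
    character followed by combining marks ('e' + U+0301). Keeping only
    code points whose combining class is 0 leaves the base characters.
    """
    decomposed = unicodedata.normalize('NFKD', s)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))

def contains(haystack: str, needle: str) -> bool:
    """Accent- and case-insensitive substring search (illustrative helper)."""
    return fold_diacritics(needle).casefold() in fold_diacritics(haystack).casefold()

words = ['zèbre', 'éclair', 'apple', 'Ångström', 'eclipse']

# Sort on the folded, case-folded form so accented words sort
# next to their unaccented neighbours.
print(sorted(words, key=lambda w: fold_diacritics(w).casefold()))
# → ['Ångström', 'apple', 'éclair', 'eclipse', 'zèbre']

print(contains('Crème brûlée', 'creme brulee'))
# → True
```

The sort key only affects ordering; the original, accented strings are what end up in the sorted list.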

The [[http://www.icu-project.org/|ICU library]] provides C/C++ (ICU4C) and Java (ICU4J) implementations of normalization, along with many other Unicode services such as locale-aware collation.

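In Python 3 this looks like the following; ''encode'' returns ''bytes'', so ''decode'' turns the result back into a ''str'':
<code python>
import unicodedata
unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')
</code>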
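
Wrapping the one-liner in a small helper (the name ''to_ascii'' is only illustrative) shows what the ASCII variant does beyond stripping accents: NFKD also unpacks compatibility characters such as the ligature U+FB01 into ''fi'', while letters that have no decomposition at all, such as ß or ø, are silently dropped rather than folded. Keeping the first character of each sequence, as described above, avoids that loss.

```python
import unicodedata

def to_ascii(s: str) -> str:
    # NFKD decomposes accented letters and compatibility characters
    # (ligatures, full-width forms, superscripts); encoding to ASCII
    # with 'ignore' then discards every remaining non-ASCII code point,
    # including the combining marks.
    return unicodedata.normalize('NFKD', s).encode('ascii', 'ignore').decode('ascii')

print(to_ascii('Café \ufb01nance'))  # prints "Cafe finance" (ligature U+FB01 becomes "fi")
print(to_ascii('Straße, Søren'))     # prints "Strae, Sren" (ß and ø have no decomposition)
```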