Sorting Unicode

To make sorting and searching in Unicode ignore diacritical marks, normalize the text with compatibility decomposition (NFKD) and then keep only the first character of each decomposed sequence (the base character), or only the ASCII characters.

The ICU library provides normalization, along with collation and many other Unicode services, for C/C++ and Java.

In Python this looks like:

import unicodedata
unicodedata.normalize('NFKD', s).encode('ASCII', 'ignore')
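As a slightly fuller illustration, here is a minimal sketch of using that idea as a sort key. The helper names strip_marks and ascii_fold and the sample word list are made up for this example. It assumes Python 3, where encode() returns bytes, so the ASCII variant decodes back to str.

```python
import unicodedata

def strip_marks(text):
    """Decompose with NFKD, then drop combining marks (keeps non-ASCII base letters)."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def ascii_fold(text):
    """Decompose with NFKD, then keep only the ASCII characters."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ASCII', 'ignore').decode('ASCII')

# Hypothetical sample data: sort while ignoring diacritics and case.
words = ['zebra', 'Émile', 'eclair', 'éclair', 'apple']
sorted_words = sorted(words, key=lambda w: strip_marks(w).casefold())
print(sorted_words)
```

The two helpers differ for letters that have no decomposition: a character such as ø is left unchanged by strip_marks but silently dropped by ascii_fold, so the ASCII-only variant can lose whole letters. For language-aware ordering, a real collation library such as ICU is likely a better fit than either helper.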
