The goal is to get a positive match on the strings ‘Šimánek’ and ‘šimánek’ when doing a case-insensitive comparison. Sounds like an easy task, right? It turns out it’s not that easy, because of the ‘Š/š’ national characters at the beginning of the strings. A simple:
>>> re.match(u'šimánek', u'Šimánek', re.I)
returns None. Setting the right locale or using the re.L flag doesn’t help either. After a couple of experiments, I found a way to match these strings: decode both to unicode objects and add the re.U flag, so that re.I follows Unicode case rules:
>>> re.match('šimánek'.decode('utf-8'), 'Šimánek'.decode('utf-8'), re.I | re.U)
<_sre.SRE_Match object at 0x8d82480>
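The same idea can be wrapped in a small function. This is only a minimal sketch: the helper name matches_ignoring_case is made up for illustration, and it assumes the source file is saved as UTF-8 and that both arguments are already unicode strings rather than byte strings.

```python
# -*- coding: utf-8 -*-
import re


def matches_ignoring_case(pattern, text):
    """Hypothetical helper: case-insensitive re.match on unicode strings."""
    # re.U makes re.I apply Unicode case rules to non-ASCII letters.
    # In Python 3 this is already the default for str patterns, so the
    # flag is redundant there but harmless.
    return re.match(pattern, text, re.I | re.U) is not None


print(matches_ignoring_case(u'šimánek', u'Šimánek'))
```

Note that re.match only anchors the pattern at the start of the string, so a longer string that merely begins with the name would match as well.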
Hope this helps.