[Python-projects] Pylint : Non-ASCII characters count double

Sylvain Thénault sylvain.thenault at logilab.fr
Thu Jan 31 11:10:51 CET 2008


On Thu, Jan 31, 2008 at 11:08:16AM +0100, Aurélien Campéas wrote:
> On Wed, Jan 30, 2008 at 06:43:32PM +0100, Lubin Fayolle wrote:
> > Hello,
> > 
> > I have just understood why some of my lines were deemed too long by
> > pylint. For instance, when checking the below script:
> > 
> > <BEGINNING OF SCRIPT>
> > # -*- coding: utf-8 -*-
> > 
> > """A little script to demonstrate a little bug."""
> > 
> > print "------------------------------------------------------------------------"
> > print "-----------------------------------------------------------------------é"
> > 
> > <END OF SCRIPT>
> > 
> > Pylint returns the following warning:
> > 
> >      myscript.py:6: [C] Line too long (81/80)
> > 
> > where line 6 corresponds to the second call of print. The two lines are
> > the same length though... It seems that 'é' counts double, like many (any?)
> > other non-ASCII characters.
> > 
> 
> >>> u'é'.encode('utf-8')
> '\xc3\xa9'
> >>> len(u'é'.encode('utf-8'))
> 2
> 
> UTF-8 is a variable-length encoding. Everything outside pure ASCII (7
> bits) weighs more than one byte.
> 
> > Thought it was worth mentioning even if it is only an annoyance...
> 
> By changing the encoding of your file to ISO-8859-1 (for instance) you
> would avoid this annoyance (I think).

True. Though this is still a pylint bug as well...
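
To make the arithmetic concrete, here is a small sketch of the effect. It is not pylint's code: the names line_lengths, ascii_line and accented_line are made up for illustration, it is written for Python 3 rather than the Python 2 used in the script above, and it assumes the checker measured the undecoded bytes of each line.

```python
# Illustrative sketch only (Python 3); not taken from pylint.
def line_lengths(text, encoding="utf-8"):
    """Return (character count, byte count) of text in the given encoding."""
    return len(text), len(text.encode(encoding))

# Two lines built to be exactly 80 characters long, like the script above.
ascii_line = 'print "' + "-" * 72 + '"'
accented_line = 'print "' + "-" * 71 + "\u00e9" + '"'  # ends with 'é'

print(line_lengths(ascii_line))                   # (80, 80)
print(line_lengths(accented_line))                # (80, 81)
print(line_lengths(accented_line, "iso-8859-1"))  # (80, 80)
```

Under that assumption, counting characters of the decoded line instead of bytes would report 80 for both lines, whatever the file encoding.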
-- 
Sylvain Thénault                               LOGILAB, Paris (France)
Python, Zope, Plone, Debian training:    http://www.logilab.fr/formations
Custom software development:             http://www.logilab.fr/services
Python and scientific computing:         http://www.logilab.fr/science


