Issue 29565854: Noissue - Improved accuracy of evaluated expressions for A103 and A207

Sebastian Noack

https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/tests/A103.py File flake8-eyeo/tests/A103.py (right): https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/tests/A103.py#newcode30 flake8-eyeo/tests/A103.py:30: return dir == x This was causing an A103 ...

Oct. 5, 2017, 10:07 p.m. (2017-10-05 22:07:12 UTC) #1

Vasily Kuznetsov

Hi Sebastian, I'm not completely convinced about this change, at least not yet. It seems ...

Oct. 6, 2017, 10:46 a.m. (2017-10-06 10:46:21 UTC) #2

tlucas

Hey Sebastian, Please find my comment below. Naive as i am, shouldn't a change to ...

Oct. 6, 2017, 11:41 a.m. (2017-10-06 11:41:36 UTC) #3

Sebastian Noack

https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/flake8_eyeo.py File flake8-eyeo/flake8_eyeo.py (right): https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/flake8_eyeo.py#newcode50 flake8-eyeo/flake8_eyeo.py:50: names = {'__builtins__': {'True': True, 'False': False, 'None': None}} ...

Oct. 6, 2017, 6:11 p.m. (2017-10-06 18:11:10 UTC) #4

https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/flake8_...
File flake8-eyeo/flake8_eyeo.py (right):

https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/flake8_...
flake8-eyeo/flake8_eyeo.py:50: names = {'__builtins__': {'True': True, 'False':
False, 'None': None}}
On 2017/10/06 11:41:36, tlucas wrote:
> Why are these values necessary?
> 
> From what i understand, they only seem to be needed to get A207 (duplicate
keys)
> evaluated correctly, which also seems to work out of the box for python3*.
> 
> However, providing those values produces the following output for the given
> example:
> 
> if False == foo:
>     bar()
> 
> if True == foo:
>     bar()
> 
> ./main.py:1:4: A103 yoda condition
> ./main.py:1:10: E712 comparison to False should be 'if cond is False:' or 'if
> not cond:'
> ./main.py:4:4: A103 yoda condition
> ./main.py:4:9: E712 comparison to True should be 'if cond is True:' or 'if
> cond:'
> 
> 
> If you would only set names to 'names = {'__builtins__': {}}' the following
> (IMHO more valuable) output would be produced:
> 
> ./main.py:1:10: E712 comparison to False should be 'if cond is False:' or 'if
> not cond:'
> ./main.py:4:9: E712 comparison to True should be 'if cond is True:' or 'if
cond:
> 
> 
> Since evaluate() is only called for is_const() (A103) or the A207 checks, you
> could add an argument 'names' to evaluate(), which could default to {}.
> 
> What do you think about this?

True, False and None are special. They are the only builtins which are
primitives (i.e. immutable values of scalar built-in types). Semantically they
are more similar to numeric or string literals. That the Python tokenizer treats
them as names which are then resolved as builtins, is just an implementation
detail. However, in Python 3, they are read-only (and redefining them in Python
2 would break almost every program). So it is safe to assume that any occurrence
of False, True and None in the source code refer to the original value, while
other builtins are occasionally overridden. Hence treating every built-in but
True/False/None as volatile seems logical, both for A103 and A207.

Yes, `False == foo` is both violating A103 and E712. But from the different
column they indicate, you can see that E712 is referring to the operator, while
A103 is referring to the order of operands. So it is not really redundant. Also
that means that `False is foo` only violates A103 but not E712.

https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/tests/A...
File flake8-eyeo/tests/A103.py (right):

https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/tests/A...
flake8-eyeo/tests/A103.py:30: return dir == x
On 2017/10/06 10:46:21, Vasily Kuznetsov wrote:
> On 2017/10/05 22:07:11, Sebastian Noack wrote:
> > This was causing an A103 (yoda expression), since we determine whether
> something
> > is a constant expression with eval(). So that if it can successfully be
> executed
> > without accessing any global or local variables, it is considered a constant
> > expression, and constant expressions read weird on the left hand side of a
> > comparison. However, eval('dir', {}) doesn't fail because "dir" is a
builtin.
> 
> Isn't this kind of a genuine yoda expression though, since `dir` is more or
less
> like a constant (and we even show another error if people try to redefine it)?
> Same for the other builtins.

Another way to look at it is that builtin functions are rather like default
globals than an inherit part of the programming language. Hence in the (rather
theoretical) case of `dir == orig_dir` this doesn't quite sound like Yoda speak,
while `0 == foo` or `False is foo` certainly does. Another (much more common)
case is when builtin names (like "dir") are shadowed by local variables. It took
me quite a while to figure out where the bogus A111 came from in this line:
https://hg.adblockplus.org/buildtools/file/d4ca9bfb82b5/localeTools.py#l412

Vasily Kuznetsov

On 2017/10/06 18:11:10, Sebastian Noack wrote: > https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/flake8_eyeo.py > File flake8-eyeo/flake8_eyeo.py (right): > > https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/flake8_eyeo.py#newcode50 ...

Oct. 9, 2017, 8:14 p.m. (2017-10-09 20:14:22 UTC) #5

On 2017/10/06 18:11:10, Sebastian Noack wrote:
>
https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/flake8_...
> File flake8-eyeo/flake8_eyeo.py (right):
> 
>
https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/flake8_...
> flake8-eyeo/flake8_eyeo.py:50: names = {'__builtins__': {'True': True,
'False':
> False, 'None': None}}
> On 2017/10/06 11:41:36, tlucas wrote:
> > Why are these values necessary?
> > 
> > From what i understand, they only seem to be needed to get A207 (duplicate
> keys)
> > evaluated correctly, which also seems to work out of the box for python3*.
> > 
> > However, providing those values produces the following output for the given
> > example:
> > 
> > if False == foo:
> >     bar()
> > 
> > if True == foo:
> >     bar()
> > 
> > ./main.py:1:4: A103 yoda condition
> > ./main.py:1:10: E712 comparison to False should be 'if cond is False:' or
'if
> > not cond:'
> > ./main.py:4:4: A103 yoda condition
> > ./main.py:4:9: E712 comparison to True should be 'if cond is True:' or 'if
> > cond:'
> > 
> > 
> > If you would only set names to 'names = {'__builtins__': {}}' the following
> > (IMHO more valuable) output would be produced:
> > 
> > ./main.py:1:10: E712 comparison to False should be 'if cond is False:' or
'if
> > not cond:'
> > ./main.py:4:9: E712 comparison to True should be 'if cond is True:' or 'if
> cond:
> > 
> > 
> > Since evaluate() is only called for is_const() (A103) or the A207 checks,
you
> > could add an argument 'names' to evaluate(), which could default to {}.
> > 
> > What do you think about this?
> 
> True, False and None are special. They are the only builtins which are
> primitives (i.e. immutable values of scalar built-in types). Semantically they
> are more similar to numeric or string literals. That the Python tokenizer
treats
> them as names which are then resolved as builtins, is just an implementation
> detail. However, in Python 3, they are read-only (and redefining them in
Python
> 2 would break almost every program). So it is safe to assume that any
occurrence
> of False, True and None in the source code refer to the original value, while
> other builtins are occasionally overridden. Hence treating every built-in but
> True/False/None as volatile seems logical, both for A103 and A207.
> 
> Yes, `False == foo` is both violating A103 and E712. But from the different
> column they indicate, you can see that E712 is referring to the operator,
while
> A103 is referring to the order of operands. So it is not really redundant.
Also
> that means that `False is foo` only violates A103 but not E712.
> 
>
https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/tests/A...
> File flake8-eyeo/tests/A103.py (right):
> 
>
https://codereview.adblockplus.org/29565854/diff/29565855/flake8-eyeo/tests/A...
> flake8-eyeo/tests/A103.py:30: return dir == x
> On 2017/10/06 10:46:21, Vasily Kuznetsov wrote:
> > On 2017/10/05 22:07:11, Sebastian Noack wrote:
> > > This was causing an A103 (yoda expression), since we determine whether
> > something
> > > is a constant expression with eval(). So that if it can successfully be
> > executed
> > > without accessing any global or local variables, it is considered a
constant
> > > expression, and constant expressions read weird on the left hand side of a
> > > comparison. However, eval('dir', {}) doesn't fail because "dir" is a
> builtin.
> > 
> > Isn't this kind of a genuine yoda expression though, since `dir` is more or
> less
> > like a constant (and we even show another error if people try to redefine
it)?
> > Same for the other builtins.
> 
> Another way to look at it is that builtin functions are rather like default
> globals than an inherit part of the programming language. Hence in the (rather
> theoretical) case of `dir == orig_dir` this doesn't quite sound like Yoda
speak,
> while `0 == foo` or `False is foo` certainly does. Another (much more common)
> case is when builtin names (like "dir") are shadowed by local variables. It
took
> me quite a while to figure out where the bogus A111 came from in this line:
> https://hg.adblockplus.org/buildtools/file/d4ca9bfb82b5/localeTools.py#l412

I agree that generating two errors for cases which are both yoda conditions and
incorrect comparison operator is appropriate. Also, this is current behavior and
there's no particularly good reason to change it.

However, I'm still not convinced about the mutability of other builtins. In
overwhelming majority of cases builtins won't be redefined, and we even have an
error (A302) to prevent this. Given this, it makes sense to treat them as
constants in comparisons (A103) and even more so in dictionary key and set
element evaluation (A207).

I can see how the example that you give is annoying, but the reason it ends up
with an A103 is that it also deserves an A302. When our code gets more
compliant, and error ignore lists are gone, you won't be wondering why you're
getting an A103 somewhere, because you'll also see an A302 next to it and once
that's fixed A103 will be gone too.

The reasoning from above paragraph doesn't apply for non-essential builtins,
like `file`, `apply`, etc. for which A302 will not be raised. I think relaxing
A103 (and probably A207) for those could make sense, since we seem to be
implying that it's ok to redefine those names. So basically it seems to me that
A103 (and probably also A207) should be consistent with A302. Isn't this
logical?

Sebastian Noack

On 2017/10/09 20:14:22, Vasily Kuznetsov wrote: > I agree that generating two errors for cases ...

Oct. 11, 2017, 2:52 a.m. (2017-10-11 02:52:48 UTC) #6

On 2017/10/09 20:14:22, Vasily Kuznetsov wrote:
> I agree that generating two errors for cases which are both yoda conditions
and
> incorrect comparison operator is appropriate. Also, this is current behavior
and
> there's no particularly good reason to change it.
> 
> However, I'm still not convinced about the mutability of other builtins. In
> overwhelming majority of cases builtins won't be redefined, and we even have
an
> error (A302) to prevent this. Given this, it makes sense to treat them as
> constants in comparisons (A103) and even more so in dictionary key and set
> element evaluation (A207).
> 
> I can see how the example that you give is annoying, but the reason it ends up
> with an A103 is that it also deserves an A302. When our code gets more
> compliant, and error ignore lists are gone, you won't be wondering why you're
> getting an A103 somewhere, because you'll also see an A302 next to it and once
> that's fixed A103 will be gone too.
> 
> The reasoning from above paragraph doesn't apply for non-essential builtins,
> like `file`, `apply`, etc. for which A302 will not be raised. I think relaxing
> A103 (and probably A207) for those could make sense, since we seem to be
> implying that it's ok to redefine those names. So basically it seems to me
that
> A103 (and probably also A207) should be consistent with A302. Isn't this
> logical?

Let's see it pragmatic: There is virtually no real case scenario where built-in
functions are used as dict/set keys. There is virtually no scenario in which you
get an A103 involving a built-in function on the left-hand where the A302 (which
you get as well) isn't the actual/larger issue. But on the other hand, a common
case we have in our existing code base is that A302 is ignored, and you find the
shadowed builtin on left-hand side of an operation, causing a rather confusing
A103.

Now let's look at it semantically: A Yoda condition (A103) is an expression with
a literal part on the left-hand side and a variable part on the right-hand side
of an operation, so that it reads as if Yoda would say it (as opposed to natural
English language), e.g. "10 being larger than 3 when you add to number" (i.e.
"10 > 3 + n"). There are no real constants in Python, and even if you consider
builtins constant (which they aren't) I cannot come up with a realistic example,
so let's just pretend for a moment that there would be constants in Python (or
just assume a different programming language): "The (constant) maximum number of
connections is larger than the active number of connections (i.e.
"MAX_CONNECTIONS > n_connection"). Does that sound like Yoda?

For A207 this is less an issue, since we essentially just want to avoid
duplicate keys. So even if we cannot determine the final value for sure, if the
same name is given for two keys this is a duplicate. However, in that case it
doesn't matter whether the name refers to a builtin, global or local variable.
So that {dir, dir} gives and A207 while {x, x} doesn't is so inherently
inconsistent and confusing that I consider it a positive side effect that this
fix is eliminating this inconsistency.

When I initially wrote this code, I never meant to consider built-ins (other
than True, False and None). It ever was an unintended side effect that I merely
accepted because back then, 1. I considered false positives extremely rare, 2. I
didn't know that you can control the built-ins made available by eval().

Vasily Kuznetsov

On 2017/10/11 02:52:48, Sebastian Noack wrote: > On 2017/10/09 20:14:22, Vasily Kuznetsov wrote: > > ...

Oct. 11, 2017, 4:13 p.m. (2017-10-11 16:13:09 UTC) #7

On 2017/10/11 02:52:48, Sebastian Noack wrote:
> On 2017/10/09 20:14:22, Vasily Kuznetsov wrote:
> > I agree that generating two errors for cases which are both yoda conditions
> and
> > incorrect comparison operator is appropriate. Also, this is current behavior
> and
> > there's no particularly good reason to change it.
> > 
> > However, I'm still not convinced about the mutability of other builtins. In
> > overwhelming majority of cases builtins won't be redefined, and we even have
> an
> > error (A302) to prevent this. Given this, it makes sense to treat them as
> > constants in comparisons (A103) and even more so in dictionary key and set
> > element evaluation (A207).
> > 
> > I can see how the example that you give is annoying, but the reason it ends
up
> > with an A103 is that it also deserves an A302. When our code gets more
> > compliant, and error ignore lists are gone, you won't be wondering why
you're
> > getting an A103 somewhere, because you'll also see an A302 next to it and
once
> > that's fixed A103 will be gone too.
> > 
> > The reasoning from above paragraph doesn't apply for non-essential builtins,
> > like `file`, `apply`, etc. for which A302 will not be raised. I think
relaxing
> > A103 (and probably A207) for those could make sense, since we seem to be
> > implying that it's ok to redefine those names. So basically it seems to me
> that
> > A103 (and probably also A207) should be consistent with A302. Isn't this
> > logical?

I'm a bit worried that we spend more time on this issue than what it deserves
and that it's a bit slow to discuss this over a code review, especially when
we're in very different timezones. Maybe we should find a better way to converge
to something quicker.

> Let's see it pragmatic: There is virtually no real case scenario where
built-in
> functions are used as dict/set keys. There is virtually no scenario in which
you
> get an A103 involving a built-in function on the left-hand where the A302
(which
> you get as well) isn't the actual/larger issue. But on the other hand, a
common
> case we have in our existing code base is that A302 is ignored, and you find
the
> shadowed builtin on left-hand side of an operation, causing a rather confusing
> A103.

Indeed it's hard to find a real world case where built-in functions are used in
set/dict keys, but I'm still not thrilled when I think that something like
`{'foo': 1, str('foo'): 2}` would become an acceptable dict, while before it was
correctly flagged as a somewhat broken one. This seems to be a regression, even
though it's not very significant.

> Now let's look at it semantically: A Yoda condition (A103) is an expression
with
> a literal part on the left-hand side and a variable part on the right-hand
side
> of an operation, so that it reads as if Yoda would say it (as opposed to
natural
> English language), e.g. "10 being larger than 3 when you add to number" (i.e.
> "10 > 3 + n"). There are no real constants in Python, and even if you consider
> builtins constant (which they aren't) I cannot come up with a realistic
example,
> so let's just pretend for a moment that there would be constants in Python (or
> just assume a different programming language): "The (constant) maximum number
of
> connections is larger than the active number of connections (i.e.
> "MAX_CONNECTIONS > n_connection"). Does that sound like Yoda?

Yeah, this sounds a bit Yoda.
 
> For A207 this is less an issue, since we essentially just want to avoid
> duplicate keys. So even if we cannot determine the final value for sure, if
the
> same name is given for two keys this is a duplicate. However, in that case it
> doesn't matter whether the name refers to a builtin, global or local variable.
> So that {dir, dir} gives and A207 while {x, x} doesn't is so inherently
> inconsistent and confusing that I consider it a positive side effect that this
> fix is eliminating this inconsistency.

Here the second example `{x, x}` is also broken and it's a pity that we can't
detect this automatically. However, saying that because we can't detect that, we
should also allow the first broken one, which we can detect sounds a bit
strange. I'd prefer moving in direction of detecting more errors, not less.
Otherwise yes, inconsistencies are annoying.

> When I initially wrote this code, I never meant to consider built-ins (other
> than True, False and None). It ever was an unintended side effect that I
merely
> accepted because back then, 1. I considered false positives extremely rare, 2.
I
> didn't know that you can control the built-ins made available by eval().

I'm still not very convinced about removing the check that I think makes sense
in order to make our code, which we agree could use some fixing, pass the
checker. On the other hand I don't think Yoda conditions are such a big deal
that they would be worth all the time we spent on this review. But how about
making a version that doesn't break A207?

Sebastian Noack

On 2017/10/11 16:13:09, Vasily Kuznetsov wrote: > On 2017/10/11 02:52:48, Sebastian Noack wrote: > > ...

Oct. 11, 2017, 6:31 p.m. (2017-10-11 18:31:01 UTC) #8

On 2017/10/11 16:13:09, Vasily Kuznetsov wrote:
> On 2017/10/11 02:52:48, Sebastian Noack wrote:
> > There are no real constants in Python, and even if you consider
> > builtins constant (which they aren't) I cannot come up with a
> > realistic example,
> > so let's just pretend for a moment that there would be constants in Python
> > (or just assume a different programming language): "The (constant) maximum
> > number of connections is larger than the active number of connections (i.e.
> > "MAX_CONNECTIONS > n_connection"). Does that sound like Yoda?
> 
> Yeah, this sounds a bit Yoda.

It doesn't.

> > For A207 this is less an issue, since we essentially just want to avoid
> > duplicate keys. So even if we cannot determine the final value for sure, if
> the
> > same name is given for two keys this is a duplicate. However, in that case
it
> > doesn't matter whether the name refers to a builtin, global or local
variable.
> > So that {dir, dir} gives and A207 while {x, x} doesn't is so inherently
> > inconsistent and confusing that I consider it a positive side effect that
this
> > fix is eliminating this inconsistency.
> 
> Here the second example `{x, x}` is also broken and it's a pity that we can't
> detect this automatically. However, saying that because we can't detect that,
we
> should also allow the first broken one, which we can detect sounds a bit
> strange. I'd prefer moving in direction of detecting more errors, not less.
> Otherwise yes, inconsistencies are annoying.
> 
> > When I initially wrote this code, I never meant to consider built-ins (other
> > than True, False and None). It ever was an unintended side effect that I
> merely
> > accepted because back then, 1. I considered false positives extremely rare,
2.
> I
> > didn't know that you can control the built-ins made available by eval().
> 
> I'm still not very convinced about removing the check that I think makes sense
> in order to make our code, which we agree could use some fixing, pass the
> checker. On the other hand I don't think Yoda conditions are such a big deal
> that they would be worth all the time we spent on this review. But how about
> making a version that doesn't break A207?

There you go. I now only ignore builtins other than True/False/None only for
A103, and fixed the inconsistency in A207 so that {x,x} will give you an error
as well.

Vasily Kuznetsov

Oct. 11, 2017, 7:29 p.m. (2017-10-11 19:29:03 UTC) #9

Allright, I think we can agree on this compromise :D
LGTM

https://codereview.adblockplus.org/29565854/diff/29573863/flake8-eyeo/flake8_...
File flake8-eyeo/flake8_eyeo.py (right):

https://codereview.adblockplus.org/29565854/diff/29573863/flake8-eyeo/flake8_...
flake8-eyeo/flake8_eyeo.py:339: namespace = collections.defaultdict(object,
vars(builtins))
This is clever!

Issue 29565854: Noissue - Improved accuracy of evaluated expressions for A103 and A207 (Closed)

Description

Patch Set 1 #

Patch Set 2 : Rebased #

Patch Set 3 : Changed behavior of A207 #

Messages