|
|
Created:
May 19, 2016, 9:57 p.m. by Jon Sonesen Modified:
June 3, 2016, 4:46 a.m. Visibility:
Public. |
DescriptionIssue 4044 - Added handling for __future__ unicode_literals import to check_quotes()
Repository: hg.adblockplus.org/codingtools/
Patch Set 1 #
Total comments: 6
Patch Set 2 : rebase and logic error fixes #
Total comments: 6
Patch Set 3 : handling for new warning A112 added #
Total comments: 15
Patch Set 4 : Changed yield spacing, and moved check to above pytohn 3+ check and makes A112 yield with othe codes #Patch Set 5 : Changed tests for new A109 merge #Patch Set 6 : removed 'r' prefix check #
Total comments: 2
Patch Set 7 : Updates readme for clarity, moves unicode prefix check out of single line and docstring if statemen… #
Total comments: 9
Patch Set 8 : merged edge case fix #
Total comments: 11
Patch Set 9 : changed logical if else to match suggestion from sebastian #
Total comments: 1
Patch Set 10 : removed redundant comment #
MessagesTotal messages: 37
https://codereview.adblockplus.org/29342824/diff/29342825/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29342825/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:380: is_unicode_literals = False While I didnt test this change yet I have a hard time to believe that tests actually pass. This function is called per line. And usually "from __future__ import unicode_literals" will be placed in a line prior to the actual string literal using that feature just as in the example used as test. So I think we need some persistence across lines. https://codereview.adblockplus.org/29342824/diff/29342825/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:386: if token is 'unicode_literals': It's not as simple as that. Following examples will raise false positives: unicode_literals = 1 from foo import unicode_literals def unicode_literals(): pass So you have to consider the surrounding tokens. https://codereview.adblockplus.org/29342824/diff/29342825/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:413: elif not is_unicode_literals: Just add it to the condition above using "or" rather than duplicating the block body?
https://codereview.adblockplus.org/29342824/diff/29342825/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29342825/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:386: if token is 'unicode_literals': On 2016/05/20 00:04:34, Sebastian Noack wrote: > It's not as simple as that. Following examples will raise false positives: > > unicode_literals = 1 Yes I should be checking the logical line object which the second patch contains > from foo import unicode_literals > def unicode_literals(): pass > > So you have to consider the surrounding tokens. https://codereview.adblockplus.org/29342824/diff/29342825/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:413: elif not is_unicode_literals: On 2016/05/20 00:04:34, Sebastian Noack wrote: > Just add it to the condition above using "or" rather than duplicating the block > body? Will do.
Hi Jon, The logic is right now, but the check for unicode_literals mode is not completely robust. It might give false negatives in cases of multiple future imports in the same import statement. See my comments below about how this might be fixed. When you work on making it just right, it helps to also have some tests where A110 is correctly raised because of a non-unicode literal that uses '\uXXXX` escapes. Run them with `tox -e py27` to make sure that you're correctly detecting the unicode_literals mode. These tests will always fail in Python 3 since it always has unicode_literals mode on, and modifying the test runner to make it possible to have python-version-specific tests is outside of this task, so just delete them when you're sure the check works right. Another thing: when you're uploading the new patchset, include the original changes too, it's easier to review all changes together and Rietveld can automatically calculate the differences between patchsets. https://codereview.adblockplus.org/29342824/diff/29342825/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29342825/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:380: is_unicode_literals = False On 2016/05/20 00:04:34, Sebastian Noack wrote: > While I didnt test this change yet I have a hard time to believe that tests > actually pass. This function is called per line. And usually "from __future__ > import unicode_literals" will be placed in a line prior to the actual string > literal using that feature just as in the example used as test. So I think we > need some persistence across lines. Funnily enough, the test does pass in the first version. I think it might be because `is_unicode_literals` is False from the start and then it's almost always False, so `elif not is_unicode_literals:` branch always gets executed. But anyway, the logic is not right that way. https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:386: if start[0] == 1: This `if` as well as the following one don't need to be inside the loop. https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:389: if logical_line == 'from __future__ import unicode_literals': Actually this is still not completely right. It could be something like `from __future__ import print_function, unicode_literals, division`, so to be more robust you probably want to look at the tokens array and check if the first one is 'from', the second one is '__future__', the third one is 'import' and there's 'unicode_literals' somewhere in the tail. I think this would be sufficiently robust for most purposes.
https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:414: elif is_unicode or is_unicode_literals: Perhaps we should generate a warning when using strings like u'' while using the future import at the same time. Its redundant, and certainly bad practice.
https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:414: elif is_unicode or is_unicode_literals: On 2016/05/20 13:48:01, Sebastian Noack wrote: > Perhaps we should generate a warning when using strings like u'' while using the > future import at the same time. Its redundant, and certainly bad practice. Yes, this sounds like a good idea. Unfortunately in Python 2 `repr` returns literals prefixed with `u` for unicode strings even if `unicode_literals` mode is on. This means we will deviate from our recommendation to follow `repr` (in a good and useful way, but still) and we probably need to reflect this in the style guide. In any case. It still makes sense to already implement it in `flake8-abp`. Perhaps same error code: A110 with another message.
https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:414: elif is_unicode or is_unicode_literals: On 2016/05/23 08:51:14, Vasily Kuznetsov wrote: > On 2016/05/20 13:48:01, Sebastian Noack wrote: > > Perhaps we should generate a warning when using strings like u'' while using > the > > future import at the same time. Its redundant, and certainly bad practice. > > Yes, this sounds like a good idea. Unfortunately in Python 2 `repr` returns > literals prefixed with `u` for unicode strings even if `unicode_literals` mode > is on. This means we will deviate from our recommendation to follow `repr` (in a > good and useful way, but still) and we probably need to reflect this in the > style guide. In any case. > > It still makes sense to already implement it in `flake8-abp`. Perhaps same error > code: A110 with another message. Since compatibility with Python 3 is now mandatory in our coding style guide, and since we also agreed that the future import is preferable over using strings like u'' in Python 3, that effectively means that we shouldn't ever use strings like u'' anymore, doesn't it? So I guess this should be the rule to be added to the style guide an to be implemented here.
https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:414: elif is_unicode or is_unicode_literals: On 2016/05/24 12:41:35, Sebastian Noack wrote: > On 2016/05/23 08:51:14, Vasily Kuznetsov wrote: > > On 2016/05/20 13:48:01, Sebastian Noack wrote: > > > Perhaps we should generate a warning when using strings like u'' while using > > the > > > future import at the same time. Its redundant, and certainly bad practice. > > > > Yes, this sounds like a good idea. Unfortunately in Python 2 `repr` returns > > literals prefixed with `u` for unicode strings even if `unicode_literals` mode > > is on. This means we will deviate from our recommendation to follow `repr` (in > a > > good and useful way, but still) and we probably need to reflect this in the > > style guide. In any case. > > > > It still makes sense to already implement it in `flake8-abp`. Perhaps same > error > > code: A110 with another message. > > Since compatibility with Python 3 is now mandatory in our coding style guide, > and since we also agreed that the future import is preferable over using strings > like u'' in Python 3, that effectively means that we shouldn't ever use strings > like u'' anymore, doesn't it? So I guess this should be the rule to be added to > the style guide an to be implemented here. Yep. All exactly as you say. And it will actually simplify the logic here since now we can just always complain about u'' literals. It seems that we still need to detect `from __future__ import unicode_literals` though to make sure '\uXXXX' escapes are handled correctly.
On 2016/05/24 13:02:43, Vasily Kuznetsov wrote: > https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... > File flake8-abp/flake8_abp.py (right): > > https://codereview.adblockplus.org/29342824/diff/29342828/flake8-abp/flake8_a... > flake8-abp/flake8_abp.py:414: elif is_unicode or is_unicode_literals: > On 2016/05/24 12:41:35, Sebastian Noack wrote: > > On 2016/05/23 08:51:14, Vasily Kuznetsov wrote: > > > On 2016/05/20 13:48:01, Sebastian Noack wrote: > > > > Perhaps we should generate a warning when using strings like u'' while > using > > > the > > > > future import at the same time. Its redundant, and certainly bad practice. > > > > > > Yes, this sounds like a good idea. Unfortunately in Python 2 `repr` returns > > > literals prefixed with `u` for unicode strings even if `unicode_literals` > mode > > > is on. This means we will deviate from our recommendation to follow `repr` > (in > > a > > > good and useful way, but still) and we probably need to reflect this in the > > > style guide. In any case. > > > > > > It still makes sense to already implement it in `flake8-abp`. Perhaps same > > error > > > code: A110 with another message. > > > > Since compatibility with Python 3 is now mandatory in our coding style guide, > > and since we also agreed that the future import is preferable over using > strings > > like u'' in Python 3, that effectively means that we shouldn't ever use > strings > > like u'' anymore, doesn't it? So I guess this should be the rule to be added > to > > the style guide an to be implemented here. > > Yep. All exactly as you say. And it will actually simplify the logic here since > now we can just always complain about u'' literals. It seems that we still need > to detect `from __future__ import unicode_literals` though to make sure '\uXXXX' > escapes are handled correctly. The newest changeset yield the warning when both the u prefix is used and the unicode_literals future import is included. But I just realized that it may be better to just always yield the exception if literals are prefixed with u since it is not advised in the style guide.
https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:47: IS_UNICODE_LITERALS = False Uppercase notation is only for variables that are considered constant (i.e. never change). https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:384: # --- check if this is beginning of file We generally don use --- in comments. So please remove it for consistency. https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:389: token_strings = [t[1] for t in tokens] I wonder whether we should also check for the token type to be NAME. https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:418: elif is_unicode and not IS_UNICODE_LITERALS: It seems the check for IS_UNICODE_LITERALS is incorrect. Also the check for is_unicode should probably go even above the check for Python 3. After all, the message indicates, and we agreed on, to never use strings like u''. https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:419: yield(start, 'A112 use "from __future__ import"' Please document A112 in the README. https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:419: yield(start, 'A112 use "from __future__ import"' There should be a space after "yield". Otherwise it looks like yield is a function that we call. However, it's a statement and a tuple. https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:420: 'unicode_literals instead of prefixing' Please indent long error messages like it's done in the rest of this script, i.e. align the starting quotes. Also note that you forgot to add spaces in the end. So this message will be printed as: A112 use "from __future__ import"unicode_literals instead of prefixingliterals with "u"
https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:384: # --- check if this is beginning of file On 2016/05/25 08:31:36, Sebastian Noack wrote: > We generally don use --- in comments. So please remove it for consistency. Also pep8 recommends capitalising the first letter in a sentence (see https://www.python.org/dev/peps/pep-0008/#comments). https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:389: token_strings = [t[1] for t in tokens] On 2016/05/25 08:31:35, Sebastian Noack wrote: > I wonder whether we should also check for the token type to be NAME. Do we really need to? Can you imagine a line that starts with ['from', '__future__', 'import'] and has 'unicode_literals' in the tail and they are not names? https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:391: IS_UNICODE_LITERALS = 'unicode_literals' in token_strings Won't this break if it gets a piece of code like this? from __future__ import unicode_literals from __future__ import print_function We probably only need to turn on the unicode literals flag when we see that 'unicode_literals' is imported and we never want to turn it off (save for first line of the file). https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:418: elif is_unicode and not IS_UNICODE_LITERALS: On 2016/05/25 08:31:35, Sebastian Noack wrote: > It seems the check for IS_UNICODE_LITERALS is incorrect. Also the check for > is_unicode should probably go even above the check for Python 3. After all, the > message indicates, and we agreed on, to never use strings like u''. I second that it should come before the Python 3 check. Also, even if we yield A112, we should still yield the other errors, so if we see something like u"foo", we get both A110 and A112.
https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:389: token_strings = [t[1] for t in tokens] On 2016/05/25 13:55:32, Vasily Kuznetsov wrote: > On 2016/05/25 08:31:35, Sebastian Noack wrote: > > I wonder whether we should also check for the token type to be NAME. > > Do we really need to? Can you imagine a line that starts with ['from', > '__future__', 'import'] and has 'unicode_literals' in the tail and they are not > names? I feel that checking for the token type is more correct (less code assumptions). But I guess, I don't insist, if it makes the check so much simpler. https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:418: elif is_unicode and not IS_UNICODE_LITERALS: On 2016/05/25 13:55:32, Vasily Kuznetsov wrote: > On 2016/05/25 08:31:35, Sebastian Noack wrote: > > It seems the check for IS_UNICODE_LITERALS is incorrect. Also the check for > > is_unicode should probably go even above the check for Python 3. After all, > the > > message indicates, and we agreed on, to never use strings like u''. > > I second that it should come before the Python 3 check. Also, even if we yield > A112, we should still yield the other errors, so if we see something like > u"foo", we get both A110 and A112. Just an idea, how about moving it even above the check for docstrings and merge A112 with A109? Docstrings using r or b is probably a case we can give up on. I never saw them in real code, and since Python 3 strings prefixed with b are not even recognized as docstrings.
https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:389: token_strings = [t[1] for t in tokens] On 2016/05/25 14:45:29, Sebastian Noack wrote: > On 2016/05/25 13:55:32, Vasily Kuznetsov wrote: > > On 2016/05/25 08:31:35, Sebastian Noack wrote: > > > I wonder whether we should also check for the token type to be NAME. > > > > Do we really need to? Can you imagine a line that starts with ['from', > > '__future__', 'import'] and has 'unicode_literals' in the tail and they are > not > > names? > > I feel that checking for the token type is more correct (less code assumptions). > But I guess, I don't insist, if it makes the check so much simpler. I see, it does feel a bit sloppy to just check the content of the tokens and not the type. But after more thinking about it, it seems reasonable to me in this case. What we're assuming is some facts about syntax of Python and it seems like a pretty safe assumption. So I'd probably prefer keeping the check as it is, since it's shorter and easier to read this way. https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:418: elif is_unicode and not IS_UNICODE_LITERALS: On 2016/05/25 14:45:29, Sebastian Noack wrote: > On 2016/05/25 13:55:32, Vasily Kuznetsov wrote: > > On 2016/05/25 08:31:35, Sebastian Noack wrote: > > > It seems the check for IS_UNICODE_LITERALS is incorrect. Also the check for > > > is_unicode should probably go even above the check for Python 3. After all, > > the > > > message indicates, and we agreed on, to never use strings like u''. > > > > I second that it should come before the Python 3 check. Also, even if we yield > > A112, we should still yield the other errors, so if we see something like > > u"foo", we get both A110 and A112. > > Just an idea, how about moving it even above the check for docstrings and merge > A112 with A109? Docstrings using r or b is probably a case we can give up on. I > never saw them in real code, and since Python 3 strings prefixed with b are not > even recognized as docstrings. Moving the check to above the check for docstrings seems like a good idea. As for merging, I suppose you mean replacing lines 408-410 with this new check and raising A112 in that case. This sounds good too.
On 2016/05/25 16:25:57, Vasily Kuznetsov wrote: > https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... > File flake8-abp/flake8_abp.py (right): > > https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... > flake8-abp/flake8_abp.py:389: token_strings = [t[1] for t in tokens] > On 2016/05/25 14:45:29, Sebastian Noack wrote: > > On 2016/05/25 13:55:32, Vasily Kuznetsov wrote: > > > On 2016/05/25 08:31:35, Sebastian Noack wrote: > > > > I wonder whether we should also check for the token type to be NAME. > > > > > > Do we really need to? Can you imagine a line that starts with ['from', > > > '__future__', 'import'] and has 'unicode_literals' in the tail and they are > > not > > > names? > > > > I feel that checking for the token type is more correct (less code > assumptions). > > But I guess, I don't insist, if it makes the check so much simpler. > > I see, it does feel a bit sloppy to just check the content of the tokens and not > the type. But after more thinking about it, it seems reasonable to me in this > case. What we're assuming is some facts about syntax of Python and it seems like > a pretty safe assumption. So I'd probably prefer keeping the check as it is, > since it's shorter and easier to read this way. > > https://codereview.adblockplus.org/29342824/diff/29343019/flake8-abp/flake8_a... > flake8-abp/flake8_abp.py:418: elif is_unicode and not IS_UNICODE_LITERALS: > On 2016/05/25 14:45:29, Sebastian Noack wrote: > > On 2016/05/25 13:55:32, Vasily Kuznetsov wrote: > > > On 2016/05/25 08:31:35, Sebastian Noack wrote: > > > > It seems the check for IS_UNICODE_LITERALS is incorrect. Also the check > for > > > > is_unicode should probably go even above the check for Python 3. After > all, > > > the > > > > message indicates, and we agreed on, to never use strings like u''. > > > > > > I second that it should come before the Python 3 check. Also, even if we > yield > > > A112, we should still yield the other errors, so if we see something like > > > u"foo", we get both A110 and A112. > > > > Just an idea, how about moving it even above the check for docstrings and > merge > > A112 with A109? Docstrings using r or b is probably a case we can give up on. > I > > never saw them in real code, and since Python 3 strings prefixed with b are > not > > even recognized as docstrings. > > Moving the check to above the check for docstrings seems like a good idea. > > As for merging, I suppose you mean replacing lines 408-410 with this new check > and raising A112 in that case. This sounds good too. Thanks to you guys for the feedback, I agree about checking the tokens. It is easier to read even though it is an assumptive statement since the syntax of python is well documented and not likely to change I think it is safe to do this. I will fix the other errors in addition to merging A109 with A112. To clarify I am going to just replace the A109 check for strings prefixed with 'u' or 'b' with this new check that yields A109?
Also, for this commit I didn't merge the A109 and A112 checks yet.
Looking good! A couple more things to adjust. https://codereview.adblockplus.org/29342824/diff/29344635/flake8-abp/README.md File flake8-abp/README.md (right): https://codereview.adblockplus.org/29342824/diff/29344635/flake8-abp/README.m... flake8-abp/README.md:36: * `A112`: String literals defined with 'u' prefixes and not with unicode_literals mode This line is a bit too long. Would be better to wrap it like A107 for example. Also, perhaps it would be a bit more clear what we mean if it was formulated in imperative, the same way as the error message. https://codereview.adblockplus.org/29342824/diff/29344635/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29344635/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:409: if is_unicode and not is_unicode_literals: Actually if is_unicode_literals is True, this error should still be raised because u is even more redundant. And this check can be moved out of the `if` (and another copy of it below removed) since we want to do it for all strings, single-line and multiline, docstrings or not.
https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (left): https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:391: if first_token and re.search(r'^(?:(?:def|class)\s|$)', Since this is not an else or elif block, I think the code reads a little better with a blank line above, so that it's more obvious that the logic below is independent of the logic above. https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:391: if 'unicode_literals' in token_strings: This is an superfluous level of indentation. I'd rather use an additional variable to avoid a long line here: token_strings = [t[1] for t in tokens] future_import = token_strings[:3] == ['from', '__future__', 'import'] if future_import and 'unicode_literals' in token_strings: is_unicode_literals = True https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:405: yield (start, 'A112 use "from __future__ import' A space is missing here. This message would be printed as ".. importunicode_literals .."
https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:47: is_unicode_literals = False has_unicode_literals seems to be more correct grammatically. https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:420: elif not is_bytes: Currently, the is_unicode_literals variable is never used. If I didn't miss something the check here needs to be: is_unicode or is_unicode_literals and not is_bytes https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/tests/A1... File flake8-abp/tests/A112.py (right): https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/tests/A1... flake8-abp/tests/A112.py:4: u"""This should not be a unicode docstring""" This test code can be simplified by using a function without a class or a class without a method. https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/tests/A1... flake8-abp/tests/A112.py:8: return [ This test code can b e simplified by just returning the string instead wrapping it into an array.
https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/README.md File flake8-abp/README.md (right): https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/README.m... flake8-abp/README.md:36: * `A112`: Use "from __future__ import unicode_literals" instead of One more thing, I just noticed, you should use backticks instead quotes here. Otherwise markdown would render it as "from future import unicode_literals" with "future" being underlined rather than showing the underscores. Moreover, backticks are the way to indicate code snippets in markdown anyway.
fixes for flake8 unicode literals
rebased changeset
merged changes from edge case fix, adds non global state var had to revert to some older logi to get prefix prepending to work
https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:382: checker_state['has_unicode_literals'] = False Awesome that we can avoid the global variable. However, according to the documentation, checker_state is a new dict per file. So no need to initialize/reset the value. https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:425: elif is_unicode or has_unicode_literals and not is_bytes: For better code locality please retrieve checker_state['has_unicode_literals'] where it is used, rather than unconditionally above. Also the code is more consistent, and IMO easier to read, if you avoid the is_unicode and is_bytes variables. if sys.version_info[0] >= 3: if 'b' in prefixes: prefix = 'b' else: u_literals = checker_state.get('has_unicode_literals') if 'u' in prefixes or u_literals and 'b' not in prefixes: prefix = 'u' https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/tests/A1... File flake8-abp/tests/A112.py (right): https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/tests/A1... flake8-abp/tests/A112.py:7: def foo(): The function and list seems redundant. How about that? # * A112 foo = u'yeahyeah'
See comments below. Also definitely agree with Sebastian about simplifying the A112 test, both parts of it. https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:403: if 'u' in prefixes and not has_unicode_literals: I wrote it before but maybe it got lost with all these rebases and stuff: if 'u' prefix is used it's always A112 regardless of `has_unicode_literals`. https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:413: elif start[0] != end[0]: I have a mild preference for how this looked before with `if start[0] == end[0]:` and more branching inside. I think the logic is clearer that way.
https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:413: elif start[0] != end[0]: On 2016/06/01 11:01:32, Vasily Kuznetsov wrote: > I have a mild preference for how this looked before with `if start[0] == > end[0]:` and more branching inside. I think the logic is clearer that way. I disagree. That way it's easier to understand that we explicitly ignore multiline strings (except for docstrings). Also avoiding another level of indentation, makes it easier to avoid long lines with the now more complex logic below.
https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:413: elif start[0] != end[0]: On 2016/06/01 11:05:01, Sebastian Noack wrote: > On 2016/06/01 11:01:32, Vasily Kuznetsov wrote: > > I have a mild preference for how this looked before with `if start[0] == > > end[0]:` and more branching inside. I think the logic is clearer that way. > > I disagree. That way it's easier to understand that we explicitly ignore > multiline strings (except for docstrings). Also avoiding another level of > indentation, makes it easier to avoid long lines with the now more complex logic > below. Yeah, you have a point too. Ok, let's leave it as is.
https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29344649/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:391: if 'unicode_literals' in token_strings: On 2016/05/27 12:13:55, Sebastian Noack wrote: > This is an superfluous level of indentation. I'd rather use an additional > variable to avoid a long line here: > > token_strings = [t[1] for t in tokens] > future_import = token_strings[:3] == ['from', '__future__', 'import'] > if future_import and 'unicode_literals' in token_strings: > is_unicode_literals = True I like that and will get it done. https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:403: if 'u' in prefixes and not has_unicode_literals: On 2016/06/01 11:01:32, Vasily Kuznetsov wrote: > I wrote it before but maybe it got lost with all these rebases and stuff: if 'u' > prefix is used it's always A112 regardless of `has_unicode_literals`. Thanks for the reminder I thought I had removed it but I am mistaken. I will fixy. https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:425: elif is_unicode or has_unicode_literals and not is_bytes: On 2016/06/01 09:37:52, Sebastian Noack wrote: > For better code locality please retrieve checker_state['has_unicode_literals'] > where it is used, rather than unconditionally above. Also the code is more > consistent, and IMO easier to read, if you avoid the is_unicode and is_bytes > variables. > > if sys.version_info[0] >= 3: > if 'b' in prefixes: > prefix = 'b' > else: > u_literals = checker_state.get('has_unicode_literals') > if 'u' in prefixes or u_literals and 'b' not in prefixes: > prefix = 'u' I agree with you regarding the use of the fetch happening where it is used, and will get that fixed my reasoning for using the variables was that the line was a bit long but I will gladly change it to the more preferable check without using variables.
https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:425: elif is_unicode or has_unicode_literals and not is_bytes: On 2016/06/01 09:37:52, Sebastian Noack wrote: > For better code locality please retrieve checker_state['has_unicode_literals'] > where it is used, rather than unconditionally above. Also the code is more > consistent, and IMO easier to read, if you avoid the is_unicode and is_bytes > variables. > > if sys.version_info[0] >= 3: > if 'b' in prefixes: > prefix = 'b' > else: > u_literals = checker_state.get('has_unicode_literals') > if 'u' in prefixes or u_literals and 'b' not in prefixes: > prefix = 'u' Ah, prior to reducing the indent level with the change to the elif pass the line was too long but not anymore!
simmplified the tests, fixed a112 check and removed extra vars
https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:425: elif is_unicode or has_unicode_literals and not is_bytes: On 2016/06/01 09:37:52, Sebastian Noack wrote: > For better code locality please retrieve checker_state['has_unicode_literals'] > where it is used, rather than unconditionally above. Also the code is more > consistent, and IMO easier to read, if you avoid the is_unicode and is_bytes > variables. > > if sys.version_info[0] >= 3: > if 'b' in prefixes: > prefix = 'b' > else: > u_literals = checker_state.get('has_unicode_literals') > if 'u' in prefixes or u_literals and 'b' not in prefixes: > prefix = 'u' Also I just tried to implement the if else logic as you have demonstrated above and it appeared to introduce a regression where A110 is erroneously thrown.
On 2016/06/02 01:14:10, Jon Sonesen wrote: > https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... > File flake8-abp/flake8_abp.py (right): > > https://codereview.adblockplus.org/29342824/diff/29345461/flake8-abp/flake8_a... > flake8-abp/flake8_abp.py:425: elif is_unicode or has_unicode_literals and not > is_bytes: > On 2016/06/01 09:37:52, Sebastian Noack wrote: > > For better code locality please retrieve checker_state['has_unicode_literals'] > > where it is used, rather than unconditionally above. Also the code is more > > consistent, and IMO easier to read, if you avoid the is_unicode and is_bytes > > variables. > > > > if sys.version_info[0] >= 3: > > if 'b' in prefixes: > > prefix = 'b' > > else: > > u_literals = checker_state.get('has_unicode_literals') > > if 'u' in prefixes or u_literals and 'b' not in prefixes: > > prefix = 'u' > > Also I just tried to implement the if else logic as you have demonstrated above > and it appeared to introduce a regression where A110 is erroneously thrown. ^^^ nvm i was wrong
changed logical if else to match suggestion from sebastian
It looks mostly good to me. Just one more minor comment. https://codereview.adblockplus.org/29342824/diff/29345517/flake8-abp/flake8_a... File flake8-abp/flake8_abp.py (right): https://codereview.adblockplus.org/29342824/diff/29345517/flake8-abp/flake8_a... flake8-abp/flake8_abp.py:382: # check if in unicode_literals mode I feel that comment is kinda redundant. IMO it doesn't add anything that isn't obvious from the code below.
On 2016/06/02 11:41:01, Sebastian Noack wrote: > It looks mostly good to me. Just one more minor comment. > > https://codereview.adblockplus.org/29342824/diff/29345517/flake8-abp/flake8_a... > File flake8-abp/flake8_abp.py (right): > > https://codereview.adblockplus.org/29342824/diff/29345517/flake8-abp/flake8_a... > flake8-abp/flake8_abp.py:382: # check if in unicode_literals mode > I feel that comment is kinda redundant. IMO it doesn't add anything that isn't > obvious from the code below. Same here. LMGTM ;P
removed redundant comment
On 2016/06/02 17:45:58, Jon Sonesen wrote: > removed redundant comment LGTM
LGTM |