|
|
Created:
Sept. 21, 2018, 1:35 p.m. by Tudor Avram Modified:
Oct. 17, 2018, 2:47 p.m. Reviewers:
Vasily Kuznetsov Visibility:
Public. |
DescriptionIssue #5352 - generate_static_pages cannot deal with directories being turned into regular pages
Patch Set 1 #
Total comments: 20
Patch Set 2 : Addressed comments from Patch #1 #
Total comments: 10
Patch Set 3 : Addressed comments from Patch Set #2 #Patch Set 4 : Fixed issue causing tests to fail in Patch Set #3 #Patch Set 5 : Addressed comments from Patch Set #4 #
MessagesTotal messages: 12
Hi Vasily, Here is my fix for the issue for raising an exception when turning directories into files or vice-versa. Looking forward for your feedback. Thanks, Tudor.
Hi Tudor! The overall approach looks good. I have some comments on code details though (see below). Cheers, Vasily https://codereview.adblockplus.org/29887585/diff/29887586/.gitignore File .gitignore (right): https://codereview.adblockplus.org/29887585/diff/29887586/.gitignore#newcode9 .gitignore:9: .idea/ Please include this into .hgignore too, for consistency. https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... File cms/bin/generate_static_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... cms/bin/generate_static_pages.py:42: if os.path.isfile(outfile): Are you sure if the two branches of this if statement exhaust all the possibilities? What if it's some other weird object like a named pipe? We should probably do something sensible there, like raise an error right away. https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... cms/bin/generate_static_pages.py:49: try: I think it would be good to extract this whole block into a separate function. Otherwise it's a bit too much logic altogether. This could also take care of repeated `os.path.dirname(outfile)`. We can also do a quick check: `if dir != output_dir:` and call the directory creation only when it's actually needed. After the code is in a separate method, we might also notice that the loop in line 55 is doing more or less the same thing as `os.makedirs()` and maybe we can just use `os.mkdir()` since we're already doing the iteration anyway. I leave it up to you to decide which option makes more sense. https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... File tests/test_generate_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... tests/test_generate_pages.py:1: # This file is part of the Adblock Plus web scripts, This file contains the tests specifically for this dir-to-file conversion scenario (as well as file-to-dir) -- how about we name it in a way that makes this clear? generate_pages seems too general. https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... tests/test_generate_pages.py:15: import pytest The import order should be stdlib, 3-rd party libs, project-local libs with blank lines between them (here pytest is 3-rd party and os is stdlib so they are backwards). It's also nice to leave an empty line after the top comment. https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... tests/test_generate_pages.py:21: def target_dir_with_file(temp_site): Here and in the following fixture, it seems like you don't need to create the output directory inside of `temp_site` it can just be inside of `tmpdir`. Pytest will tell you that you can't use a function scoped fixture in a session scoped fixtures but these two fixtures probably shouldn't be session scoped anyway because the tests modify them. https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... tests/test_generate_pages.py:47: print(os.listdir(os.path.join(target_dir_with_file, 'en', 'foo'))) Do we need this print once the test is passing? https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_site/pag... File tests/test_site/pages/foo/bar.md (right): https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_site/pag... tests/test_site/pages/foo/bar.md:1: test What do you think about writing here the explanation of why this file is needed here in the folder? It doesn't really matter what this file contains, and it seems like some explanation would be useful to somebody who tries to understand this later.
Hi Vasily, Thanks for your feedback! I just had a small comment on the `makedirs()` part of the code(see below). Also, from looking at the os.makedirs() code, it looks like it would never raise an exception that would satisfy `e.errno == errno.EEXIST` (it does the exact same check), so that if statement seems redundant over there. https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... File cms/bin/generate_static_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... cms/bin/generate_static_pages.py:49: try: On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > I think it would be good to extract this whole block into a separate function. > Otherwise it's a bit too much logic altogether. This could also take care of > repeated `os.path.dirname(outfile)`. > > We can also do a quick check: `if dir != output_dir:` and call the directory > creation only when it's actually needed. > > After the code is in a separate method, we might also notice that the loop in > line 55 is doing more or less the same thing as `os.makedirs()` and maybe we can > just use `os.mkdir()` since we're already doing the iteration anyway. I leave it > up to you to decide which option makes more sense. Yeah, I think it makes sense to put this into a different function. As your suggestion with replacing os.makedirs() comes, I chose to do it only if the path already exists, because: 1) It would rarely be the case for `os.path.isfile(path_so_far)` to be true and then it os.makedirs() would only be called the second time if we would have a file in the previous version of the website that now would become a directory. For example, having `en/foo` as a file in the initial version and then the new website being `en/foo/bar/index`, would be a case where that loop actually changes anything. 2) Yeah, I can replace `os.makedirs()` with multiple `os.mkdir`, but I just assumed that os.makedirs() does something more clever in the backend (making it more efficient) than my rough for loop.
Also, one more question (see below): https://codereview.adblockplus.org/29887585/diff/29887586/.gitignore File .gitignore (right): https://codereview.adblockplus.org/29887585/diff/29887586/.gitignore#newcode9 .gitignore:9: .idea/ On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > Please include this into .hgignore too, for consistency. Done. https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... File cms/bin/generate_static_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... cms/bin/generate_static_pages.py:42: if os.path.isfile(outfile): On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > Are you sure if the two branches of this if statement exhaust all the > possibilities? What if it's some other weird object like a named pipe? We should > probably do something sensible there, like raise an error right away. Also, one more thing: do we want to raise an error if the old version of the website has some kind of a weird object or just remove it altogether (and assume the new version is correct and should replace it)? https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... File tests/test_generate_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... tests/test_generate_pages.py:15: import pytest On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > The import order should be stdlib, 3-rd party libs, project-local libs with > blank lines between them (here pytest is 3-rd party and os is stdlib so they are > backwards). It's also nice to leave an empty line after the top comment. Done. https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... tests/test_generate_pages.py:21: def target_dir_with_file(temp_site): On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > Here and in the following fixture, it seems like you don't need to create the > output directory inside of `temp_site` it can just be inside of `tmpdir`. Pytest > will tell you that you can't use a function scoped fixture in a session scoped > fixtures but these two fixtures probably shouldn't be session scoped anyway > because the tests modify them. Done. https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... tests/test_generate_pages.py:47: print(os.listdir(os.path.join(target_dir_with_file, 'en', 'foo'))) On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > Do we need this print once the test is passing? oops, no. Forgot to remove it. Done. https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_site/pag... File tests/test_site/pages/foo/bar.md (right): https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_site/pag... tests/test_site/pages/foo/bar.md:1: test On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > What do you think about writing here the explanation of why this file is needed > here in the folder? It doesn't really matter what this file contains, and it > seems like some explanation would be useful to somebody who tries to understand > this later. Done.
Hi Tudor, > Also, from looking at the os.makedirs() code, it looks like it would never raise > an exception that would satisfy `e.errno == errno.EEXIST` (it does the exact > same check), so that if statement seems redundant over there. Good idea to look at the code of os.makedirs() but I think it might actually fail with this code because of the final mkdir(), so we might need to still handle it. Also, as you can see, it is just doing simple recursion that you can also implement with the additional checks that we need. Cheers, Vasily https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... File cms/bin/generate_static_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... cms/bin/generate_static_pages.py:42: if os.path.isfile(outfile): On 2018/09/24 14:04:51, Tudor Avram wrote: > On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > > Are you sure if the two branches of this if statement exhaust all the > > possibilities? What if it's some other weird object like a named pipe? We > should > > probably do something sensible there, like raise an error right away. > > Also, one more thing: do we want to raise an error if the old version of the > website has some kind of a weird object or just remove it altogether (and assume > the new version is correct and should replace it)? I think if there's some kind of weird object in there, we're probably dealing with some unexpected situation and it's probably safer to stop. It seems that this scenario is very unlikely anyway so it's probably fine. https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... cms/bin/generate_static_pages.py:49: try: On 2018/09/24 13:26:45, Tudor Avram wrote: > On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > > I think it would be good to extract this whole block into a separate function. > > Otherwise it's a bit too much logic altogether. This could also take care of > > repeated `os.path.dirname(outfile)`. > > > > We can also do a quick check: `if dir != output_dir:` and call the directory > > creation only when it's actually needed. > > > > After the code is in a separate method, we might also notice that the loop in > > line 55 is doing more or less the same thing as `os.makedirs()` and maybe we > can > > just use `os.mkdir()` since we're already doing the iteration anyway. I leave > it > > up to you to decide which option makes more sense. > > Yeah, I think it makes sense to put this into a different function. > > As your suggestion with replacing os.makedirs() comes, I chose to do it only if > the path already exists, because: > 1) It would rarely be the case for `os.path.isfile(path_so_far)` to be true and > then it os.makedirs() would only be called the second time if we would have a > file in the previous version of the website that now would become a directory. > For example, having `en/foo` as a file in the initial version and then the new > website being `en/foo/bar/index`, would be a case where that loop actually > changes anything. > 2) Yeah, I can replace `os.makedirs()` with multiple `os.mkdir`, but I just > assumed that os.makedirs() does something more clever in the backend (making it > more efficient) than my rough for loop. I don't expect many directories to be created when any website is generated and in creating directories the slowest part is the filesystem, so I don't think `os.makedirs()` can be much faster than your loop. I'm also not against using `os.makedirs()` as long as the code is simple -- I was just thinking that it might become simpler and more straightforward with `os.mkdir()`.
Hi Vasily, I addressed your comments from the previous patch. I also refactored a couple of things in generate_static_pages.py, just to make the code a little bit more modular and readable. There also is one more test, where I treat the case where we have an existent object that is neither a file, nor a directory and check if the expected exception is raised. Thanks, Tudor. https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... File cms/bin/generate_static_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... cms/bin/generate_static_pages.py:42: if os.path.isfile(outfile): On 2018/09/24 14:42:23, Vasily Kuznetsov wrote: > On 2018/09/24 14:04:51, Tudor Avram wrote: > > On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > > > Are you sure if the two branches of this if statement exhaust all the > > > possibilities? What if it's some other weird object like a named pipe? We > > should > > > probably do something sensible there, like raise an error right away. > > > > Also, one more thing: do we want to raise an error if the old version of the > > website has some kind of a weird object or just remove it altogether (and > assume > > the new version is correct and should replace it)? > > I think if there's some kind of weird object in there, we're probably dealing > with some unexpected situation and it's probably safer to stop. It seems that > this scenario is very unlikely anyway so it's probably fine. Done. https://codereview.adblockplus.org/29887585/diff/29887586/cms/bin/generate_st... cms/bin/generate_static_pages.py:49: try: On 2018/09/24 14:42:23, Vasily Kuznetsov wrote: > On 2018/09/24 13:26:45, Tudor Avram wrote: > > On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > > > I think it would be good to extract this whole block into a separate > function. > > > Otherwise it's a bit too much logic altogether. This could also take care of > > > repeated `os.path.dirname(outfile)`. > > > > > > We can also do a quick check: `if dir != output_dir:` and call the directory > > > creation only when it's actually needed. > > > > > > After the code is in a separate method, we might also notice that the loop > in > > > line 55 is doing more or less the same thing as `os.makedirs()` and maybe we > > can > > > just use `os.mkdir()` since we're already doing the iteration anyway. I > leave > > it > > > up to you to decide which option makes more sense. > > > > Yeah, I think it makes sense to put this into a different function. > > > > As your suggestion with replacing os.makedirs() comes, I chose to do it only > if > > the path already exists, because: > > 1) It would rarely be the case for `os.path.isfile(path_so_far)` to be true > and > > then it os.makedirs() would only be called the second time if we would have a > > file in the previous version of the website that now would become a directory. > > For example, having `en/foo` as a file in the initial version and then the new > > website being `en/foo/bar/index`, would be a case where that loop actually > > changes anything. > > 2) Yeah, I can replace `os.makedirs()` with multiple `os.mkdir`, but I just > > assumed that os.makedirs() does something more clever in the backend (making > it > > more efficient) than my rough for loop. > > I don't expect many directories to be created when any website is generated and > in creating directories the slowest part is the filesystem, so I don't think > `os.makedirs()` can be much faster than your loop. > > I'm also not against using `os.makedirs()` as long as the code is simple -- I > was just thinking that it might become simpler and more straightforward with > `os.mkdir()`. Done. https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... File tests/test_generate_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29887586/tests/test_generate... tests/test_generate_pages.py:1: # This file is part of the Adblock Plus web scripts, On 2018/09/24 11:58:47, Vasily Kuznetsov wrote: > This file contains the tests specifically for this dir-to-file conversion > scenario (as well as file-to-dir) -- how about we name it in a way that makes > this clear? generate_pages seems too general. Done.
Hi Tudor! Almost there. Just a few minor comments. Cheers, Vasily https://codereview.adblockplus.org/29887585/diff/29890605/cms/bin/generate_st... File cms/bin/generate_static_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29890605/cms/bin/generate_st... cms/bin/generate_static_pages.py:30: def resolve_dirs(partial_path, path_parts): It seems like resolve_dirs doesn't quite capture what this function is doing. Basically it creates directories unless they exist so it's kind of like os.makedirs() but it also handles some special cases (like stale files in the way). My first thought is "makedirs2()" but that's kind of lame. Maybe "ensure_dir" (because it ensures that the target directory exists and is a directory)? https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_generati... File tests/test_generation_exceptional_cases.py (right): https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_generati... tests/test_generation_exceptional_cases.py:49: @pytest.mark.script_launch_mode('subprocess') As we've discussed, please try to convert these generation tests to import the right function from generate_static_pages.py and run it in the same process. Doing these subprocess calls slows down the tests and we don't really need it to tests directory/file handling. https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_site/pag... File tests/test_site/pages/foo/bar.md (right): https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_site/pag... tests/test_site/pages/foo/bar.md:2: are located in some place other than root (` \ `). This thing (` \ `) confuses me. We've discussed it.
Hi Vasily, I addressed all of your comments (except removing `script_runner` for one of the tests). Doing that would require mocking `sys.exit()`, which would in turn cause some errors to arise in my code. I explained that in further detail in the comments below. I also had a small comment on one of the variable names used in the previously-existent code. Let me know what you think. Thanks, Tudor. https://codereview.adblockplus.org/29887585/diff/29890605/cms/bin/generate_st... File cms/bin/generate_static_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29890605/cms/bin/generate_st... cms/bin/generate_static_pages.py:30: def resolve_dirs(partial_path, path_parts): On 2018/09/27 11:05:07, Vasily Kuznetsov wrote: > It seems like resolve_dirs doesn't quite capture what this function is doing. > Basically it creates directories unless they exist so it's kind of like > os.makedirs() but it also handles some special cases (like stale files in the > way). My first thought is "makedirs2()" but that's kind of lame. Maybe > "ensure_dir" (because it ensures that the target directory exists and is a > directory)? Done. https://codereview.adblockplus.org/29887585/diff/29890605/cms/bin/generate_st... cms/bin/generate_static_pages.py:115: def generate_pages(repo, output_dir): Nit: Also, does `repo` really make sense here now? I know that previously, we used to have Mercurial sources. But as that's not the case, maybe `website` or `raw_dir` would make a little bit more sense? What do you think? (This would be handled in a separate ticket, obviously). https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_generati... File tests/test_generation_exceptional_cases.py (right): https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_generati... tests/test_generation_exceptional_cases.py:49: @pytest.mark.script_launch_mode('subprocess') On 2018/09/27 11:05:07, Vasily Kuznetsov wrote: > As we've discussed, please try to convert these generation tests to import the > right function from generate_static_pages.py and run it in the same process. > Doing these subprocess calls slows down the tests and we don't really need it to > tests directory/file handling. Done for first two tests. The last one requires some extra mocking that might break things. (see below). https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_generati... tests/test_generation_exceptional_cases.py:75: script_runner): For this last test, I would have to mock sys.exit() and that might break a couple of things in there (as in the function would continue to run, even though it was supposed to stop. This breaks some things along the way. https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_site/pag... File tests/test_site/pages/foo/bar.md (right): https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_site/pag... tests/test_site/pages/foo/bar.md:2: are located in some place other than root (` \ `). On 2018/09/27 11:05:08, Vasily Kuznetsov wrote: > This thing (` \ `) confuses me. We've discussed it. Done.
Hi Vasily, Really sorry, just dicovered that my previous change broke a couple of tests. It's fixed now.
Hi Tudor! Let's also convert test_generate_fifo_instead_of_file and then it's done (see details below). Cheers, Vasily https://codereview.adblockplus.org/29887585/diff/29890605/cms/bin/generate_st... File cms/bin/generate_static_pages.py (right): https://codereview.adblockplus.org/29887585/diff/29890605/cms/bin/generate_st... cms/bin/generate_static_pages.py:115: def generate_pages(repo, output_dir): On 2018/10/04 06:34:18, Tudor Avram wrote: > Nit: Also, does `repo` really make sense here now? I know that previously, we > used to have Mercurial sources. But as that's not the case, maybe `website` or > `raw_dir` would make a little bit more sense? > What do you think? > > (This would be handled in a separate ticket, obviously). Yes, makes sense, `repo` is a confusing name. Perhaps `source_dir` would work so we have `generate_pages(source_dir, output_dir)`. https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_generati... File tests/test_generation_exceptional_cases.py (right): https://codereview.adblockplus.org/29887585/diff/29890605/tests/test_generati... tests/test_generation_exceptional_cases.py:75: script_runner): On 2018/10/04 06:34:18, Tudor Avram wrote: > For this last test, I would have to mock sys.exit() and that might break a > couple of things in there (as in the function would continue to run, even though > it was supposed to stop. This breaks some things along the way. You can catch SystemExit exception in this test. Check this out: >>> try: ... import sys ... sys.exit('boom') ... except SystemExit as exit: ... pass ... >>> exit SystemExit('boom',)
Hi Vasily, Here is the change for the last test. Sorry again for all this :(. Tudor.
LGTM |