|
|
Created:
Sept. 1, 2014, 8:14 p.m. by Wladimir Palant Modified:
Sept. 9, 2014, 6:15 p.m. Visibility:
Public. |
DescriptionSee issue 170 for the specification. This implementation is mostly done, the Git pull command is currently wrong however. Also, the self-update mechanism isn`t implemented yet.
Patch Set 1 #
Total comments: 24
Patch Set 2 : Addressed comments and fixed git pull command #Patch Set 3 : Added self-update #Patch Set 4 : Renamed script #
Total comments: 13
Patch Set 5 : Addressed comments #
Total comments: 3
Patch Set 6 : Removed useless exit() call #Patch Set 7 : Do not reload when used as module #
Total comments: 6
Patch Set 8 : Better safe_join(), usage info, additional parameters #
Total comments: 4
Patch Set 9 : Addressed remaining comments #MessagesTotal messages: 32
http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:28: class Mercurial(): Since Mercurial and Git only have static functions, I would prefer to use submodules instead classes. That way you won't have to decorate any function as static, and improve the encapsulation of the code. http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:45: result, dummy = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() Any reason, not using check_output() here? http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:71: result, dummy = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() Again, any reason you don't use check_output()? http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:89: print >>sys.stderr, "Invalid line in file %s: %s" % (path, line) I prefer to use the warnings module here, instead directly printing to stderr. http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:135: return os.path.join(path, *filter(lambda f: f.count(".") != len(f), subpath.split("/"))) I prefer list comprehensions over filter(lambda: ...). They are faster, and IMO more readable: [f for f in subpath.split("/") if f.count(".") != len(f)] Also you should use os.curdir and os.pardir instead hardcoding "." and "..". http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:151: if key == parenttype or (key in repo_types and type == None): Please use "is" operator when comparing to None. type is None Or if you aren't concerned about empty sequences/strings, just check whether it evaluates to False: not type Those constructs are closer to human language, and therefore more readable. They are faster than the equals operator too.
Current patch is still work in progress but now only the self-update mechanism is missing. http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:28: class Mercurial(): On 2014/09/02 08:12:30, Sebastian Noack wrote: > Since Mercurial and Git only have static functions, I would prefer to use > submodules instead classes. That way you won't have to decorate any function as > static, and improve the encapsulation of the code. I'd rather keep it all in one file. Keep in mind that this file will need to be copied into all our repositories. http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:45: result, dummy = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() On 2014/09/02 08:12:30, Sebastian Noack wrote: > Any reason, not using check_output() here? Yes, I need to drop stderr output. "hg id" won't be silent for unknown revisions, probably worth a comment. Also, I consider this call simpler than catching exceptions because the command failed. http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:71: result, dummy = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() On 2014/09/02 08:12:30, Sebastian Noack wrote: > Again, any reason you don't use check_output()? With the --revs-only flag - no real reason, I can indeed use check_output() now. http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:89: print >>sys.stderr, "Invalid line in file %s: %s" % (path, line) On 2014/09/02 08:12:30, Sebastian Noack wrote: > I prefer to use the warnings module here, instead directly printing to stderr. I ended up overriding warning.showwarning() to avoid information overload. Not really sure whether it was worth it...
Some remarks: 1. I think we should call the script ensuredeps.py or ensure_deps.py. PEP8 says module names should be all lower case with underscores. I know the file doesn't contain a module, but I do think we want to name Python files consistently, script or module. 2. I think we should either rename the script to ensure_subs.py or the .subs file to .deps, for consistency. (I'd personally prefer deps for both, doesn't have me thinking about sandwiches.)
As discussed on IRC, the problem with .deps is a potential naming conflict with automake (there it is the dependencies folder). While we don't currently use it anywhere, it is used widely enough that we might at some point - at the very least it could cause some confusion. So I think a better choice would be "dependencies" (not that it doesn't start with a period - this file isn't autogenerated but rather edited manually).
I would also prefer a filename without camel case in regard to PEP-8. http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:28: class Mercurial(): On 2014/09/02 14:48:03, Wladimir Palant wrote: > On 2014/09/02 08:12:30, Sebastian Noack wrote: > > Since Mercurial and Git only have static functions, I would prefer to use > > submodules instead classes. That way you won't have to decorate any function > as > > static, and improve the encapsulation of the code. > > I'd rather keep it all in one file. Keep in mind that this file will need to be > copied into all our repositories. Agreed. But you could just initiate those classes below. So you don't need to decorate each method as static. http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:45: result, dummy = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() On 2014/09/02 14:48:03, Wladimir Palant wrote: > On 2014/09/02 08:12:30, Sebastian Noack wrote: > > Any reason, not using check_output() here? > > Yes, I need to drop stderr output. "hg id" won't be silent for unknown > revisions, probably worth a comment. Also, I consider this call simpler than > catching exceptions because the command failed. subprocess.check_output(stderr=open(os.devnull, "wb"), ...) http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:89: print >>sys.stderr, "Invalid line in file %s: %s" % (path, line) On 2014/09/02 14:48:03, Wladimir Palant wrote: > On 2014/09/02 08:12:30, Sebastian Noack wrote: > > I prefer to use the warnings module here, instead directly printing to stderr. > > I ended up overriding warning.showwarning() to avoid information overload. Not > really sure whether it was worth it... You should have a look at warnings.filterwarnings().
Self-update mechanism has been added, the functionality should be complete now. Note that you should review Patch Set 3, the next patch set merely renames the file. I uploaded two separate patch sets to make sure the diff is usable. http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:45: result, dummy = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() On 2014/09/03 19:23:07, Sebastian Noack wrote: > subprocess.check_output(stderr=open(os.devnull, "wb"), ...) What about exceptions? There is no subprocess.output(). http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:89: print >>sys.stderr, "Invalid line in file %s: %s" % (path, line) On 2014/09/03 19:23:07, Sebastian Noack wrote: > You should have a look at warnings.filterwarnings(). I already did, so what? I don't need to switch off warnings, I merely need them displayed in a way that a human can scan through easily. IMHO the only somewhat useful piece of information (other than the actual message) is the line number, the rest of it merely adds noise and is extremely annoying.
http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:45: result, dummy = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate() On 2014/09/03 21:57:47, Wladimir Palant wrote: > On 2014/09/03 19:23:07, Sebastian Noack wrote: > > subprocess.check_output(stderr=open(os.devnull, "wb"), ...) > > What about exceptions? There is no subprocess.output(). Sure, if you also have to ignore non-zero status codes, use communicate(). However, you could just address the first item in the sequence, instead writing stderr into a dummy variable: result = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()[0] http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:89: print >>sys.stderr, "Invalid line in file %s: %s" % (path, line) On 2014/09/03 21:57:47, Wladimir Palant wrote: > On 2014/09/03 19:23:07, Sebastian Noack wrote: > > You should have a look at warnings.filterwarnings(). > > I already did, so what? I don't need to switch off warnings, I merely need them > displayed in a way that a human can scan through easily. IMHO the only somewhat > useful piece of information (other than the actual message) is the line number, > the rest of it merely adds noise and is extremely annoying. I thought you were talking about filtering out irrelevant warnings, not cleaning up the format. However, you might want to use the logging module instead, which lets you use custom Formatter objects, for this purpose. http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:2: # coding: utf-8 I don't find any non-ascii characters below. Also according to PEP-8, this should be avoided, except you really have to use non-ascii characters in your comments or docstring, e.g. for author names. For regular strings you should always use unicode literals like u'\xe4'. http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:216: os.execv(sys.executable, [sys.executable, target] + sys.argv[1:]) I think you have to call os.exit(0) afterwards. Otherwise the script is restarted for every repo in the loop. http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:218: import importlib Is there a case where this script is actually imported as a module, or can we just ignore that case? Note that it only works as long as the name and expected arguments of resolve_deps() don't change. And since the calling module will still keep a reference to the old version of this module, it also won't work if this function is called multiple times.
http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:89: print >>sys.stderr, "Invalid line in file %s: %s" % (path, line) On 2014/09/04 10:07:30, Sebastian Noack wrote: > However, you might want to use the logging module instead, which > lets you use custom Formatter objects, for this purpose. Thank you, the logging module is indeed a lot better. http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:2: # coding: utf-8 On 2014/09/04 10:07:30, Sebastian Noack wrote: > I don't find any non-ascii characters below. Also according to PEP-8, this > should be avoided, except you really have to use non-ascii characters in your > comments or docstring, e.g. for author names. For regular strings you should > always use unicode literals like u'\xe4'. We specify this in all Python files, UTF-8 is the encoding we generally use for all source code files. I guess this can go away once we migrate to Python 3. http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:218: import importlib On 2014/09/04 10:07:30, Sebastian Noack wrote: > Is there a case where this script is actually imported as a module, or can we > just ignore that case? build.py will import it as module. And - no, it shouldn't call it multiple times. Worst-case scenario: self-update succeeds but restart fails and build needs to be started again. I think we can live with that.
http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:218: import importlib On 2014/09/04 13:32:12, Wladimir Palant wrote: > Worst-case scenario: self-update succeeds but restart fails and build needs to > be started again. I think we can live with that. I think you didn't get it. When buildtools calls resolve_deps the second time, it will run the old version. That is because when you reload a module, all references to that module that have been imported before, still point to the old version. And since we already updated the source code, the check above won't work, and the code thinks that it already is at its newest version. http://codereview.adblockplus.org/5168251361296384/diff/5661458385862656/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5661458385862656/ensu... ensure_dependencies.py:217: os.exit(0) My bad, I meant sys.exit(0), os.exit doesn't exist. However, good to see that even you don't test your changes sometimes, before uploading them for review. :D
http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:216: os.execv(sys.executable, [sys.executable, target] + sys.argv[1:]) On 2014/09/04 10:07:30, Sebastian Noack wrote: > I think you have to call os.exit(0) afterwards. Otherwise the script is > restarted for every repo in the loop. I just realized that you are using os.execv(), which doesn't return, but replaces the current process. So you don't have to terminate the process afterwards. In fact no code below will be executed anyway.
http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:218: import importlib On 2014/09/04 13:44:52, Sebastian Noack wrote: > I think you didn't get it. I did get it, but I was rather referring to the scenario where the function signature changes. As to the function being called multiple times from another module - I don't really see how that would happen. Some long-running process creating multiple builds from the same repository? If you think that this code is too dangerous - we can consider running ensure_dependencies.py via subprocess.check_call() from build.py. It will waste resources, particularly problematic on the build server, but other than that it should work the same. http://codereview.adblockplus.org/5168251361296384/diff/5661458385862656/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5661458385862656/ensu... ensure_dependencies.py:217: os.exit(0) On 2014/09/04 13:44:52, Sebastian Noack wrote: > My bad, I meant sys.exit(0), os.exit doesn't exist. However, good to see that > even you don't test your changes sometimes, before uploading them for review. :D Actually, I did test it. What I actually should have done, was checking the documentation again. Thing is, this code is never executed: os.execv() doesn't return, it *replaces* the current process.
http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:218: import importlib Actually, this doesn't seem to be that much of an issue. reload() doesn't create a new module object, it updates the existing one. This means that all references to this module object are still valid, merely references to objects or functions are outdated. So calling ensure_dependencies.resolve_deps() will still be ok even after the module was reloaded. There is only a problem if this function was imported into the global namespace or no longer exists in the module.
http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:218: import importlib On 2014/09/04 15:55:38, Wladimir Palant wrote: > Actually, this doesn't seem to be that much of an issue. reload() doesn't create > a new module object, it updates the existing one. This means that all references > to this module object are still valid, merely references to objects or functions > are outdated. So calling ensure_dependencies.resolve_deps() will still be ok > even after the module was reloaded. There is only a problem if this function was > imported into the global namespace or no longer exists in the module. That is correct. Though this might be a footgun. I think I better like the idea to call this script with the subprocess module. That approach would also be robust against changes regarding the name and arguments of resolve_deps(). http://codereview.adblockplus.org/5168251361296384/diff/5661458385862656/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5661458385862656/ensu... ensure_dependencies.py:217: os.exit(0) On 2014/09/04 14:34:45, Wladimir Palant wrote: > On 2014/09/04 13:44:52, Sebastian Noack wrote: > > My bad, I meant sys.exit(0), os.exit doesn't exist. However, good to see that > > even you don't test your changes sometimes, before uploading them for review. > :D > > Actually, I did test it. What I actually should have done, was checking the > documentation again. Thing is, this code is never executed: os.execv() doesn't > return, it *replaces* the current process. I know, see my follow-up comment: http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu...
The new version will now advise a manual restart if ensure_dependencies.py is used as a module, it will no longer attempt to do it automatically.
LGTM
I think it would be neat to have a small example dependencies file on top, or to print one when no file could be found or something like that. Currently one would have to read the source carefully or find an example. Also, where were you planning to put this script? I think we should have it under VCS in a central place. The new devtools repo if we can't come up with anything better. http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:28: class Mercurial(): On 2014/09/03 19:23:07, Sebastian Noack wrote: > On 2014/09/02 14:48:03, Wladimir Palant wrote: > > On 2014/09/02 08:12:30, Sebastian Noack wrote: > > > Since Mercurial and Git only have static functions, I would prefer to use > > > submodules instead classes. That way you won't have to decorate any function > > as > > > static, and improve the encapsulation of the code. > > > > I'd rather keep it all in one file. Keep in mind that this file will need to > be > > copied into all our repositories. > > Agreed. But you could just initiate those classes below. So you don't need to > decorate each method as static. I actually preferred the static methods :P As it was, it was completely clear that Mercurial and Git don't have any state, IMO it was obvious that they were just meant as a namespace. Now, each method is getting a self reference and you have to read the code to figure out if they have state or not. There's no point in having an instance of a class without any state. http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:2: # coding: utf-8 On 2014/09/04 13:32:12, Wladimir Palant wrote: > On 2014/09/04 10:07:30, Sebastian Noack wrote: > > I don't find any non-ascii characters below. Also according to PEP-8, this > > should be avoided, except you really have to use non-ascii characters in your > > comments or docstring, e.g. for author names. For regular strings you should > > always use unicode literals like u'\xe4'. > > We specify this in all Python files, UTF-8 is the encoding we generally use for > all source code files. I guess this can go away once we migrate to Python 3. What PEP-8 says, specifically, is this: "Files using ASCII (in Python 2) or UTF-8 (in Python 3) should not have an encoding declaration." Since we're using UTF-8 in Python 2, we need the encoding declaration. http://codereview.adblockplus.org/5168251361296384/diff/5641332169113600/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5641332169113600/ensu... ensure_dependencies.py:127: return os.path.join(path, *[f for f in subpath.split("/") if f not in (os.curdir, os.pardir)]) So we're ignoring . and .. in paths? Shouldn't we rather ignore just . and complain when we find a ..? http://codereview.adblockplus.org/5168251361296384/diff/5641332169113600/ensu... ensure_dependencies.py:187: for dir in config: Why not do something like |for dir, dep in config| here? You're using |config[dir]| a few times below.
http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:28: class Mercurial(): On 2014/09/05 08:55:53, Felix H. Dahlke wrote: > On 2014/09/03 19:23:07, Sebastian Noack wrote: > > On 2014/09/02 14:48:03, Wladimir Palant wrote: > > > On 2014/09/02 08:12:30, Sebastian Noack wrote: > > > > Since Mercurial and Git only have static functions, I would prefer to use > > > > submodules instead classes. That way you won't have to decorate any > function > > > as > > > > static, and improve the encapsulation of the code. > > > > > > I'd rather keep it all in one file. Keep in mind that this file will need to > > be > > > copied into all our repositories. > > > > Agreed. But you could just initiate those classes below. So you don't need to > > decorate each method as static. > > I actually preferred the static methods :P As it was, it was completely clear > that Mercurial and Git don't have any state, IMO it was obvious that they were > just meant as a namespace. > > Now, each method is getting a self reference and you have to read the code to > figure out if they have state or not. There's no point in having an instance of > a class without any state. Think of those classes as singletons. There is supposed to be only one instance. If there would be a state, it wouldn't make any difference whether it is stored in the instance, the class or global variables. http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:2: # coding: utf-8 On 2014/09/05 08:55:53, Felix H. Dahlke wrote: > On 2014/09/04 13:32:12, Wladimir Palant wrote: > > On 2014/09/04 10:07:30, Sebastian Noack wrote: > > > I don't find any non-ascii characters below. Also according to PEP-8, this > > > should be avoided, except you really have to use non-ascii characters in > your > > > comments or docstring, e.g. for author names. For regular strings you should > > > always use unicode literals like u'\xe4'. > > > > We specify this in all Python files, UTF-8 is the encoding we generally use > for > > all source code files. I guess this can go away once we migrate to Python 3. > > What PEP-8 says, specifically, is this: > > "Files using ASCII (in Python 2) or UTF-8 (in Python 3) should not have an > encoding declaration." > > Since we're using UTF-8 in Python 2, we need the encoding declaration. Keep reading ;P "... non-default encodings should be used only for test purposes or when a comment or docstring needs to mention an author name that contains non-ASCII characters; otherwise, using \x, \u, \U, or \N escapes is the preferred way to include non-ASCII data in string literals." http://codereview.adblockplus.org/5168251361296384/diff/5641332169113600/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5641332169113600/ensu... ensure_dependencies.py:127: return os.path.join(path, *[f for f in subpath.split("/") if f not in (os.curdir, os.pardir)]) On 2014/09/05 08:55:53, Felix H. Dahlke wrote: > So we're ignoring . and .. in paths? Shouldn't we rather ignore just . and > complain when we find a ..? It's common practice to just ignore '..' when sanitizing paths, in most FTP and HTTP file servers, as well as on UNIX. However, if you prefer to raise an exception instead, I won't mind.
http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:28: class Mercurial(): On 2014/09/05 09:29:30, Sebastian Noack wrote: > On 2014/09/05 08:55:53, Felix H. Dahlke wrote: > > On 2014/09/03 19:23:07, Sebastian Noack wrote: > > > On 2014/09/02 14:48:03, Wladimir Palant wrote: > > > > On 2014/09/02 08:12:30, Sebastian Noack wrote: > > > > > Since Mercurial and Git only have static functions, I would prefer to > use > > > > > submodules instead classes. That way you won't have to decorate any > > function > > > > as > > > > > static, and improve the encapsulation of the code. > > > > > > > > I'd rather keep it all in one file. Keep in mind that this file will need > to > > > be > > > > copied into all our repositories. > > > > > > Agreed. But you could just initiate those classes below. So you don't need > to > > > decorate each method as static. > > > > I actually preferred the static methods :P As it was, it was completely clear > > that Mercurial and Git don't have any state, IMO it was obvious that they were > > just meant as a namespace. > > > > Now, each method is getting a self reference and you have to read the code to > > figure out if they have state or not. There's no point in having an instance > of > > a class without any state. > > Think of those classes as singletons. There is supposed to be only one instance. > If there would be a state, it wouldn't make any difference whether it is stored > in the instance, the class or global variables. A singleton is arguably also the wrong choice when you need a namespace. A class with only static functions is a common workaround when there's no namespace construct - even Java developers normally don't come up with Utils singletons and the like. http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5754903989321728/ensu... ensure_dependencies.py:2: # coding: utf-8 On 2014/09/05 09:29:30, Sebastian Noack wrote: > On 2014/09/05 08:55:53, Felix H. Dahlke wrote: > > On 2014/09/04 13:32:12, Wladimir Palant wrote: > > > On 2014/09/04 10:07:30, Sebastian Noack wrote: > > > > I don't find any non-ascii characters below. Also according to PEP-8, this > > > > should be avoided, except you really have to use non-ascii characters in > > your > > > > comments or docstring, e.g. for author names. For regular strings you > should > > > > always use unicode literals like u'\xe4'. > > > > > > We specify this in all Python files, UTF-8 is the encoding we generally use > > for > > > all source code files. I guess this can go away once we migrate to Python 3. > > > > What PEP-8 says, specifically, is this: > > > > "Files using ASCII (in Python 2) or UTF-8 (in Python 3) should not have an > > encoding declaration." > > > > Since we're using UTF-8 in Python 2, we need the encoding declaration. > > Keep reading ;P > > "... non-default encodings should be used only for test purposes or when a > comment or docstring needs to mention an author name that contains non-ASCII > characters; otherwise, using \x, \u, \U, or \N escapes is the preferred way to > include non-ASCII data in string literals." You left the first part out: "In the standard library [...]" - I do think PEP8 intends to leave this up to us. That said, I really don't have a strong opinion on it, let's just be consistent. I think we should discuss this in a separate issue and then apply the resulting change to all source files - feel free to file one. http://codereview.adblockplus.org/5168251361296384/diff/5641332169113600/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5641332169113600/ensu... ensure_dependencies.py:127: return os.path.join(path, *[f for f in subpath.split("/") if f not in (os.curdir, os.pardir)]) On 2014/09/05 09:29:31, Sebastian Noack wrote: > On 2014/09/05 08:55:53, Felix H. Dahlke wrote: > > So we're ignoring . and .. in paths? Shouldn't we rather ignore just . and > > complain when we find a ..? > > It's common practice to just ignore '..' when sanitizing paths, in most FTP and > HTTP file servers, as well as on UNIX. However, if you prefer to raise an > exception instead, I won't mind. We're using this for local paths, don't we? Otherwise I wouldn't really mind.
http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:28: class Mercurial(): On 2014/09/05 09:55:01, Felix H. Dahlke wrote: > On 2014/09/05 09:29:30, Sebastian Noack wrote: > > On 2014/09/05 08:55:53, Felix H. Dahlke wrote: > > > On 2014/09/03 19:23:07, Sebastian Noack wrote: > > > > On 2014/09/02 14:48:03, Wladimir Palant wrote: > > > > > On 2014/09/02 08:12:30, Sebastian Noack wrote: > > > > > > Since Mercurial and Git only have static functions, I would prefer to > > use > > > > > > submodules instead classes. That way you won't have to decorate any > > > function > > > > > as > > > > > > static, and improve the encapsulation of the code. > > > > > > > > > > I'd rather keep it all in one file. Keep in mind that this file will > need > > to > > > > be > > > > > copied into all our repositories. > > > > > > > > Agreed. But you could just initiate those classes below. So you don't need > > to > > > > decorate each method as static. > > > > > > I actually preferred the static methods :P As it was, it was completely > clear > > > that Mercurial and Git don't have any state, IMO it was obvious that they > were > > > just meant as a namespace. > > > > > > Now, each method is getting a self reference and you have to read the code > to > > > figure out if they have state or not. There's no point in having an instance > > of > > > a class without any state. > > > > Think of those classes as singletons. There is supposed to be only one > instance. > > If there would be a state, it wouldn't make any difference whether it is > stored > > in the instance, the class or global variables. > > A singleton is arguably also the wrong choice when you need a namespace. A class > with only static functions is a common workaround when there's no namespace > construct - even Java developers normally don't come up with Utils singletons > and the like. We just need two objects, with different implementation, that behave the same. If you call them namespaces, classes or objects doesn't matter. Also state doesn't matter here. If they need to have a state that is fine, if they don't it is fine too. However, just by passing around classes instead singleton objects, we won't prevent having a state anyway. In Python, classes are first-class objects too. You can dynamically assign members to a class, as you would assign them to an instance of that class.
http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:28: class Mercurial(): On 2014/09/05 10:47:06, Sebastian Noack wrote: > On 2014/09/05 09:55:01, Felix H. Dahlke wrote: > > On 2014/09/05 09:29:30, Sebastian Noack wrote: > > > On 2014/09/05 08:55:53, Felix H. Dahlke wrote: > > > > On 2014/09/03 19:23:07, Sebastian Noack wrote: > > > > > On 2014/09/02 14:48:03, Wladimir Palant wrote: > > > > > > On 2014/09/02 08:12:30, Sebastian Noack wrote: > > > > > > > Since Mercurial and Git only have static functions, I would prefer > to > > > use > > > > > > > submodules instead classes. That way you won't have to decorate any > > > > function > > > > > > as > > > > > > > static, and improve the encapsulation of the code. > > > > > > > > > > > > I'd rather keep it all in one file. Keep in mind that this file will > > need > > > to > > > > > be > > > > > > copied into all our repositories. > > > > > > > > > > Agreed. But you could just initiate those classes below. So you don't > need > > > to > > > > > decorate each method as static. > > > > > > > > I actually preferred the static methods :P As it was, it was completely > > clear > > > > that Mercurial and Git don't have any state, IMO it was obvious that they > > were > > > > just meant as a namespace. > > > > > > > > Now, each method is getting a self reference and you have to read the code > > to > > > > figure out if they have state or not. There's no point in having an > instance > > > of > > > > a class without any state. > > > > > > Think of those classes as singletons. There is supposed to be only one > > instance. > > > If there would be a state, it wouldn't make any difference whether it is > > stored > > > in the instance, the class or global variables. > > > > A singleton is arguably also the wrong choice when you need a namespace. A > class > > with only static functions is a common workaround when there's no namespace > > construct - even Java developers normally don't come up with Utils singletons > > and the like. > > We just need two objects, with different implementation, that behave the same. > If you call them namespaces, classes or objects doesn't matter. Also state > doesn't matter here. If they need to have a state that is fine, if they don't it > is fine too. Sure, you normally don't pass namespaces around, in Java we would have to have an interface and two objects of classes that implement it. Good thing this is not Java :) But it doesn't really matter too much. Let's leave it up to Wladimir if he prefers the old version or the new one, it's not really important. > However, just by passing around classes instead singleton objects, we won't > prevent having a state anyway. In Python, classes are first-class objects too. > You can dynamically assign members to a class, as you would assign them to an > instance of that class. The point is that we're telling the compiler/interpreter (as well as anyone reading the code) that these are static methods that do not change internal state. If we don't instantiate these objects, they cannot have internal state.
http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... File ensureDeps.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... ensureDeps.py:28: class Mercurial(): On 2014/09/05 14:01:54, Felix H. Dahlke wrote: > On 2014/09/05 10:47:06, Sebastian Noack wrote: > > However, just by passing around classes instead singleton objects, we won't > > prevent having a state anyway. In Python, classes are first-class objects too. > > You can dynamically assign members to a class, as you would assign them to an > > instance of that class. > > The point is that we're telling the compiler/interpreter (as well as anyone > reading the code) that these are static methods that do not change internal > state. If we don't instantiate these objects, they cannot have internal state. Not in Python. staticmethod is just a regular function, evaluated at runtime, wrapping our function in a way that it doesn't get bound to the instance when accessed. The interpreter is entirely agnostic about static methods and other decorators.
On 2014/09/05 14:24:46, Sebastian Noack wrote: > http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... > File ensureDeps.py (right): > > http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... > ensureDeps.py:28: class Mercurial(): > On 2014/09/05 14:01:54, Felix H. Dahlke wrote: > > On 2014/09/05 10:47:06, Sebastian Noack wrote: > > > However, just by passing around classes instead singleton objects, we won't > > > prevent having a state anyway. In Python, classes are first-class objects > too. > > > You can dynamically assign members to a class, as you would assign them to > an > > > instance of that class. > > > > The point is that we're telling the compiler/interpreter (as well as anyone > > reading the code) that these are static methods that do not change internal > > state. If we don't instantiate these objects, they cannot have internal state. > > Not in Python. staticmethod is just a regular function, evaluated at runtime, > wrapping our function in a way that it doesn't get bound to the instance when > accessed. The interpreter is entirely agnostic about static methods and other > decorators. How this is implemented doesn't really matter, my point is that we're telling Python and other developers reading the code that this is a static method, and should have the semantics of a static method, i.e. should not change state.
On 2014/09/05 17:05:39, Felix H. Dahlke wrote: > On 2014/09/05 14:24:46, Sebastian Noack wrote: > > > http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... > > File ensureDeps.py (right): > > > > > http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... > > ensureDeps.py:28: class Mercurial(): > > On 2014/09/05 14:01:54, Felix H. Dahlke wrote: > > > On 2014/09/05 10:47:06, Sebastian Noack wrote: > > > > However, just by passing around classes instead singleton objects, we > won't > > > > prevent having a state anyway. In Python, classes are first-class objects > > too. > > > > You can dynamically assign members to a class, as you would assign them to > > an > > > > instance of that class. > > > > > > The point is that we're telling the compiler/interpreter (as well as anyone > > > reading the code) that these are static methods that do not change internal > > > state. If we don't instantiate these objects, they cannot have internal > state. > > > > Not in Python. staticmethod is just a regular function, evaluated at runtime, > > wrapping our function in a way that it doesn't get bound to the instance when > > accessed. The interpreter is entirely agnostic about static methods and other > > decorators. > > How this is implemented doesn't really matter, my point is that we're telling > Python and other developers reading the code that this is a static method, and > should have the semantics of a static method, i.e. should not change state. But that is wrong. Merely it means that its state is global, just like with any other object that has only one instance.
On 2014/09/05 17:10:18, Sebastian Noack wrote: > On 2014/09/05 17:05:39, Felix H. Dahlke wrote: > > On 2014/09/05 14:24:46, Sebastian Noack wrote: > > > > > > http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... > > > File ensureDeps.py (right): > > > > > > > > > http://codereview.adblockplus.org/5168251361296384/diff/5629499534213120/ensu... > > > ensureDeps.py:28: class Mercurial(): > > > On 2014/09/05 14:01:54, Felix H. Dahlke wrote: > > > > On 2014/09/05 10:47:06, Sebastian Noack wrote: > > > > > However, just by passing around classes instead singleton objects, we > > won't > > > > > prevent having a state anyway. In Python, classes are first-class > objects > > > too. > > > > > You can dynamically assign members to a class, as you would assign them > to > > > an > > > > > instance of that class. > > > > > > > > The point is that we're telling the compiler/interpreter (as well as > anyone > > > > reading the code) that these are static methods that do not change > internal > > > > state. If we don't instantiate these objects, they cannot have internal > > state. > > > > > > Not in Python. staticmethod is just a regular function, evaluated at > runtime, > > > wrapping our function in a way that it doesn't get bound to the instance > when > > > accessed. The interpreter is entirely agnostic about static methods and > other > > > decorators. > > > > How this is implemented doesn't really matter, my point is that we're telling > > Python and other developers reading the code that this is a static method, and > > should have the semantics of a static method, i.e. should not change state. > > But that is wrong. Merely it means that its state is global, just like with any > other object that has only one instance. If we were using staticmethod we wouldn't need an instance to begin with - that's my whole point. What I care about is communicating _intent_ - how it works under the hood doesn't matter too much. We prefix private variables so people know they're supposed to be private - we don't go to any lengths to ensure they are private - data hiding is just a high level concept anyway. So with making these methods static, we communicate the fact that they do not need access to any specific instance of the class (nor to any class variables). Having a class that has only static methods says it's not meant to be instantiated. Something that's not meant to be instantiated communicates that it has no state (beyond global state everything has access to anyway).
In addition to addressing the comments, the new patch set added a bunch of parameters to ensure_deps(). These can only be used when the script is used as a module which should be the case for createNightlies.py. We have a few additional requirements there: 1) No self-update (it uses its own buildtools checkout). 2) Don't check out buildtools (unnecessary). 3) Custom repository root (local, not remote). Note that these requirements are due to optimizations in our current build process on the server. We will likely drop these optimizations once we have a real build server, so I don't want to add command line parameters for this functionality. On 2014/09/05 08:55:52, Felix H. Dahlke wrote: > I think it would be neat to have a small example dependencies file on top, or to > print one when no file could be found or something like that. I chose to do both. > Also, where were you planning to put this script? buildtools repository. We discussed the pro and contra a while ago, and while buildtools isn't the cleanest approach it has the big advantage that in most cases we will be able to rely on self-update. So the script will update itself automatically once the dependency on buildtools is updated and the repository contains a newer ensure_dependencies.py.
http://codereview.adblockplus.org/5168251361296384/diff/5641332169113600/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5641332169113600/ensu... ensure_dependencies.py:127: return os.path.join(path, *[f for f in subpath.split("/") if f not in (os.curdir, os.pardir)]) This implementation was indeed overly simplistic, e.g. a backslash in the path would trick it on Windows. I changed it to consider various invalid conditions and to raise an exception. http://codereview.adblockplus.org/5168251361296384/diff/5641332169113600/ensu... ensure_dependencies.py:187: for dir in config: On 2014/09/05 08:55:53, Felix H. Dahlke wrote: > Why not do something like |for dir, dep in config| here? You're using > |config[dir]| a few times below. Done.
Still LGTM.
Looks pretty good now, just some minor suggestions. http://codereview.adblockplus.org/5168251361296384/diff/5735735550279680/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5735735550279680/ensu... ensure_dependencies.py:30: usage = """ Nit: I'd argue that this is a "constant", so I think it should be in capitals. http://codereview.adblockplus.org/5168251361296384/diff/5735735550279680/ensu... ensure_dependencies.py:146: if any(sep in normpath for sep in forbidden): I think we should do this check on subpath, before attempting to normalise it. posixpath.normpath just seems to ignore backslashes and drive letters e.g., but I still think it'd be cleaner to not call it on potential non-POSIX paths.
On 2014/09/06 20:48:32, Felix H. Dahlke wrote: > Looks pretty good now, just some minor suggestions. > > http://codereview.adblockplus.org/5168251361296384/diff/5735735550279680/ensu... > File ensure_dependencies.py (right): > > http://codereview.adblockplus.org/5168251361296384/diff/5735735550279680/ensu... > ensure_dependencies.py:30: usage = """ > Nit: I'd argue that this is a "constant", so I think it should be in capitals. > > http://codereview.adblockplus.org/5168251361296384/diff/5735735550279680/ensu... > ensure_dependencies.py:146: if any(sep in normpath for sep in forbidden): > I think we should do this check on subpath, before attempting to normalise it. > posixpath.normpath just seems to ignore backslashes and drive letters e.g., but > I still think it'd be cleaner to not call it on potential non-POSIX paths. LGTM, don't want to hold this back while on holiday. Feel free to address my comments, but it's fine as it is.
@Sebastian: Felix is on vacation so it is up to you to verify that I addressed his comments correctly. http://codereview.adblockplus.org/5168251361296384/diff/5735735550279680/ensu... File ensure_dependencies.py (right): http://codereview.adblockplus.org/5168251361296384/diff/5735735550279680/ensu... ensure_dependencies.py:30: usage = """ On 2014/09/06 20:48:32, Felix H. Dahlke wrote: > Nit: I'd argue that this is a "constant", so I think it should be in capitals. Done. http://codereview.adblockplus.org/5168251361296384/diff/5735735550279680/ensu... ensure_dependencies.py:146: if any(sep in normpath for sep in forbidden): On 2014/09/06 20:48:32, Felix H. Dahlke wrote: > I think we should do this check on subpath, before attempting to normalise it. > posixpath.normpath just seems to ignore backslashes and drive letters e.g., but > I still think it'd be cleaner to not call it on potential non-POSIX paths. Done.
Still LGTM. |