ensureDeps.py - Issue 5168251361296384: Issue 170 - Replacing Mercurial subrepositories

Unified Diff: ensureDeps.py

Issue 5168251361296384: Issue 170 - Replacing Mercurial subrepositories (Closed)

Patch Set: Created Sept. 1, 2014, 8:14 p.m.


Index: ensureDeps.py
===================================================================
new file mode 100644
--- /dev/null
+++ b/ensureDeps.py
@@ -0,0 +1,210 @@
+#!/usr/bin/env python
+# coding: utf-8
+# This file is part of the Adblock Plus build tools,
+# Adblock Plus is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 3 as
+# published by the Free Software Foundation.
+# Adblock Plus is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+# You should have received a copy of the GNU General Public License
+# along with Adblock Plus. If not, see <http://www.gnu.org/licenses/>.
+import sys
+import os
+import re
+import codecs
+import errno
+import subprocess
+import urlparse
+from collections import OrderedDict

+class Mercurial():

Sebastian Noack 2014/09/02 08:12:30

Since Mercurial and Git only have static functions, I would prefer to use submodules instead classes. That way you won't have to decorate any function as static, and improve the encapsulation of the code.

Wladimir Palant 2014/09/02 14:48:03

I'd rather keep it all in one file. Keep in mind that this file will need to be copied into all our repositories.

Sebastian Noack 2014/09/03 19:23:07

Agreed. But you could just initiate those classes below. So you don't need to decorate each method as static.

Felix Dahlke 2014/09/05 08:55:53

I actually preferred the static methods :P As it was, it was completely clear that Mercurial and Git don't have any state, IMO it was obvious that they were just meant as a namespace.

Now, each method is getting a self reference and you have to read the code to figure out if they have state or not. There's no point in having an instance of a class without any state.

Sebastian Noack 2014/09/05 09:29:30

Think of those classes as singletons. There is supposed to be only one instance. If there would be a state, it wouldn't make any difference whether it is stored in the instance, the class or global variables.

Felix Dahlke 2014/09/05 09:55:01

A singleton is arguably also the wrong choice when you need a namespace. A class with only static functions is a common workaround when there's no namespace construct - even Java developers normally don't come up with Utils singletons and the like.

Sebastian Noack 2014/09/05 10:47:06

We just need two objects, with different implementation, that behave the same. If you call them namespaces, classes or objects doesn't matter. Also state doesn't matter here. If they need to have a state that is fine, if they don't it is fine too. However, just by passing around classes instead singleton objects, we won't prevent having a state anyway. In Python, classes are first-class objects too. You can dynamically assign members to a class, as you would assign them to an instance of that class.

Felix Dahlke 2014/09/05 14:01:54

Sure, you normally don't pass namespaces around, in Java we would have to have an interface and two objects of classes that implement it. Good thing this is not Java :) But it doesn't really matter too much. Let's leave it up to Wladimir if he prefers the old version or the new one, it's not really important.

The point is that we're telling the compiler/interpreter (as well as anyone reading the code) that these are static methods that do not change internal state. If we don't instantiate these objects, they cannot have internal state.

Sebastian Noack 2014/09/05 14:24:46 Not in Python. staticmethod is just a regular func
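To make the trade-off in this thread concrete, here is a minimal sketch, not the patch's code. It assumes Python 3 syntax (the patch itself targets Python 2) and uses the hypothetical names `HgBackend`, `GitBackend`, `backends` and `detect`. It illustrates two points raised above: classes that contain only static methods can be stored and called without being instantiated, and a class object can still acquire state dynamically.

```python
import os


class HgBackend:
    """Hypothetical stand-in for the Mercurial class in the patch."""

    @staticmethod
    def istype(repodir):
        return os.path.exists(os.path.join(repodir, ".hg"))


class GitBackend:
    """Hypothetical stand-in for the Git class in the patch."""

    @staticmethod
    def istype(repodir):
        return os.path.exists(os.path.join(repodir, ".git"))


# Classes are first-class objects, so they can be stored in a dict and
# called without ever being instantiated.
backends = {"hg": HgBackend, "git": GitBackend}


def detect(repodir):
    """Return the name of the first backend that recognizes repodir."""
    for name, backend in backends.items():
        if backend.istype(repodir):
            return name
    return None


# Nothing prevents attaching state to the class object itself, so using
# static methods alone does not guarantee statelessness in Python.
HgBackend.last_checked = "/tmp/example"
```

Whether the two backends are spelled as classes with static methods, as instances, or as modules, the calling code in `detect()` stays the same, which is why the thread treats the choice as mostly a matter of readability.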

+    @staticmethod
+    def istype(repodir):
+        return os.path.exists(os.path.join(repodir, ".hg"))

+    @staticmethod
+    def clone(source, target):
+        if not source.endswith("/"):
+            source += "/"
+        subprocess.check_call(["hg", "clone", "--quiet", "--noupdate", source, target])

+    @staticmethod
+    def get_revision_id(repo, rev=None):
+        command = ["hg", "id", "--repository", repo, "--id"]
+        if rev:
+            command.extend(["--rev", rev])
+        result, dummy = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()

Sebastian Noack 2014/09/02 08:12:30 Any reason, not using check_output() here?

Wladimir Palant 2014/09/02 14:48:03 Yes, I need to drop stderr output. "hg id" won't b

Sebastian Noack 2014/09/03 19:23:07 subprocess.check_output(stderr=open(os.devnull, "w

Wladimir Palant 2014/09/03 21:57:47 What about exceptions? There is no subprocess.outp

Sebastian Noack 2014/09/04 10:07:30 Sure, if you also have to ignore non-zero status c
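The difference discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the patch's code: it uses Python 3 (where `subprocess.DEVNULL` exists), the helper names `quiet_output` and `strict_output` are hypothetical, and a small Python child process stands in for an `hg id` call that writes to stderr and exits with a non-zero status.

```python
import subprocess
import sys


def quiet_output(command):
    """Return the command's stdout, discarding stderr and the exit status."""
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
    )
    stdout, _ = process.communicate()
    return stdout.decode("utf-8").strip()


def strict_output(command):
    """Return stdout like quiet_output(), but raise on a non-zero exit."""
    stdout = subprocess.check_output(command, stderr=subprocess.DEVNULL)
    return stdout.decode("utf-8").strip()


# A child process that writes to both streams and exits with status 1.
failing = [
    sys.executable,
    "-c",
    "import sys; print('abc123'); sys.stderr.write('noise'); sys.exit(1)",
]
```

With `failing`, `quiet_output` returns whatever was written to stdout, while `strict_output` raises `subprocess.CalledProcessError`. That is the reason `Popen(...).communicate()` remains an option when a failed lookup should simply yield an empty or partial result.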

+        return result.strip()

+    @staticmethod
+    def pull(repo):
+        subprocess.check_call(["hg", "pull", "--repository", repo, "--quiet"])

+    @staticmethod
+    def update(repo, rev):
+        subprocess.check_call(["hg", "update", "--repository", repo, "--quiet", "--check", "--rev", rev])

+class Git():
+    @staticmethod
+    def istype(repodir):
+        return os.path.exists(os.path.join(repodir, ".git"))

+    @staticmethod
+    def clone(source, target):
+        source = source.rstrip("/")
+        if not source.endswith(".git"):
+            source += ".git"
+        subprocess.check_call(["git", "clone", "--quiet", "--no-checkout", source, target])

+    @staticmethod
+    def get_revision_id(repo, rev="HEAD"):
+        command = ["git", "-C", repo, "rev-parse", "--revs-only", rev]
+        result, dummy = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()

Sebastian Noack 2014/09/02 08:12:30 Again, any reason you don't use check_output()?

Wladimir Palant 2014/09/02 14:48:03 With the --revs-only flag - no real reason, I can

+        return result.strip()

+    @staticmethod
+    def pull(repo):
+        subprocess.check_call(["git", "-C", repo, "pull", "--quiet", "--all"])

+    @staticmethod
+    def update(repo, rev):
+        subprocess.check_call(["git", "-C", repo, "checkout", "--quiet", rev])

+repo_types = {
+    "hg": Mercurial,
+    "git": Git,
+}

+def parse_spec(path, line):
+    if "=" not in line:
+        print >>sys.stderr, "Invalid line in file %s: %s" % (path, line)

Sebastian Noack 2014/09/02 08:12:30 I prefer to use the warnings module here, instead

Wladimir Palant 2014/09/02 14:48:03 I ended up overriding warning.showwarning() to avo

Sebastian Noack 2014/09/03 19:23:07 You should have a look at warnings.filterwarnings(

Wladimir Palant 2014/09/03 21:57:47 I already did, so what? I don't need to switch off

Sebastian Noack 2014/09/04 10:07:30 I thought you were talking about filtering out irr

Wladimir Palant 2014/09/04 13:32:12 Thank you, the logging module is indeed a lot bett
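As a rough illustration of where this thread ends up, the sketch below reports the problem through the `logging` module instead of printing to `sys.stderr`. It assumes Python 3 syntax, and `parse_line` is a hypothetical, heavily simplified stand-in for `parse_spec()`, not the code that was eventually committed.

```python
import logging


def parse_line(path, line):
    """Split "key = value", logging a warning when the line is malformed."""
    if "=" not in line:
        # The root logger writes warnings to stderr by default, so the
        # visible behavior stays close to the original print statement.
        logging.warning("Invalid line in file %s: %s", path, line)
        return None, None
    key, value = line.split("=", 1)
    return key.strip(), value.strip()
```

Compared with the `warnings` module, `logging` does not attach source locations to each message or suppress repeated messages, and callers can redirect or silence the output by configuring handlers rather than by overriding a module-level function.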

+        return None, None
+    key, value = line.split("=", 1)
+    key = key.strip()
+    items = value.split()
+    if not len(items):
+        print >>sys.stderr, "No value specified for key %s in file %s" % (key, path)
+        return key, None
+    result = OrderedDict()
+    if not key.startswith("_"):
+        result["_source"] = items.pop(0)
+    for item in items:
+        if ":" in item:
+            type, value = item.split(":", 1)
+        else:
+            type, value = ("*", item)
+        if type in result:
+            print >>sys.stderr, "Ignoring duplicate value for type %s (key %s in file %s)" % (type, key, path)
+        else:
+            result[type] = value
+    return key, result

+def read_deps(repodir):
+    result = {}
+    deps_path = os.path.join(repodir, ".sub")
+    try:
+        with codecs.open(deps_path, "r", encoding="utf-8") as handle:
+            for line in handle:
+                # Remove comments and whitespace
+                line = re.sub(r"#.*", "", line).strip()
+                if not line:
+                    continue
+                key, spec = parse_spec(deps_path, line)
+                if spec:
+                    result[key] = spec
+        return result
+    except IOError, e:
+        if e.errno != errno.ENOENT:
+            raise
+        return None

+def safe_join(path, subpath):
+    return os.path.join(path, *filter(lambda f: f.count(".") != len(f), subpath.split("/")))

Sebastian Noack 2014/09/02 08:12:30 I prefer list comprehensions over filter(lambda: .
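For comparison, the same filtering can be written with a list comprehension. This is a sketch of the suggested style in Python 3 syntax, not necessarily the form that was committed; the local name `parts` is an assumption.

```python
import os


def safe_join(path, subpath):
    """Join subpath onto path, dropping components made up only of dots."""
    # A component consisting solely of dots satisfies count(".") == len(part).
    # The empty string satisfies it too, so doubled slashes are dropped.
    parts = [part for part in subpath.split("/") if part.count(".") != len(part)]
    return os.path.join(path, *parts)
```

The behavior matches the `filter(lambda ...)` version: components such as `.`, `..` and empty strings are removed, so the joined path cannot climb out of `path`.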

+def get_repo_type(repo):
+    for name, repotype in repo_types.iteritems():
+        if repotype.istype(repo):
+            return name
+    return None

+def ensure_repo(config, parentrepo, dir):
+    target = safe_join(parentrepo, dir)
+    if os.path.exists(target):
+        return
+    parenttype = get_repo_type(parentrepo)
+    type = None
+    for key in config.get("_root", {}):
+        if key == parenttype or (key in repo_types and type == None):

Sebastian Noack 2014/09/02 08:12:30 Please use "is" operator when comparing to None.
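The reason behind this request can be shown with a small, deliberately contrived sketch. The class `AlwaysEqual` is hypothetical and exists only to demonstrate that `==` goes through a user-definable `__eq__`, whereas `is` compares identity and cannot be overridden.

```python
class AlwaysEqual:
    """Hypothetical object whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()

# Equality is delegated to __eq__, so this comparison is misleading.
equal_to_none = (value == None)  # noqa: E711

# Identity cannot be overridden: only the None singleton itself passes.
is_none = (value is None)
```

In the patch the compared values are plain strings or `None`, so both spellings behave the same there; `is None` is the conventional and safer form.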

+            type = key
+    if type == None:
+        raise Exception("No valid source found to create %s" % dir)
+    url = urlparse.urljoin(config["_root"][type], config[dir]["_source"])
+    print "Cloning repository %s into %s" % (url, target)
+    repo_types[type].clone(url, target)

+def update_repo(config, parentrepo, dir):
+    target = safe_join(parentrepo, dir)
+    type = get_repo_type(target)
+    if type == None:
+        print >>sys.stderr, "Type of repository %s unknown, skipping update" % target
+        return
+    if type in config[dir]:
+        revision = config[dir][type]
+    elif "*" in config[dir]:
+        revision = config[dir]["*"]
+    else:
+        print >>sys.stderr, "No revision specified for repository %s (type %s), skipping update" % (target, type)
+        return
+    resolved_revision = repo_types[type].get_revision_id(target, revision)
+    if not resolved_revision:
+        print "Revision %s is unknown, downloading remote changes" % revision
+        repo_types[type].pull(target)
+        resolved_revision = repo_types[type].get_revision_id(target, revision)
+        if not resolved_revision:
+            raise Exception("Failed to resolve revision %s" % revision)
+    current_revision = repo_types[type].get_revision_id(target)
+    if resolved_revision != current_revision:
+        print "Updating repository %s to revision %s" % (target, resolved_revision)
+        repo_types[type].update(target, resolved_revision)

+def resolve_deps(repodir, level=0):
+    config = read_deps(repodir)
+    if config == None:
+        if level == 0:
+            print >>sys.stderr, "No .sub file in directory %s, nothing to do..." % repodir
+        return
+    if level >= 10:
+        print >>sys.stderr, "Too much subrepository nesting, ignoring %s" % repo
+    for dir in config:
+        if dir.startswith("_"):
+            continue
+        target = safe_join(repodir, dir)
+        ensure_repo(config, repodir, dir)
+        update_repo(config, repodir, dir)
+        resolve_deps(target, level + 1)

+if __name__ == "__main__":
+    repos = sys.argv[1:]
+    if not len(repos):
+        repos = [os.getcwd()]
+    for repo in repos:
+        resolve_deps(repo)
