|
|
Created:
June 15, 2015, 2:12 p.m. by kzar Modified:
July 16, 2015, 1:25 p.m. CC:
Felix Dahlke Visibility:
Public. |
DescriptionIssue 2625 - [cms] Crowdin synchronisation script
Patch Set 1 #
Total comments: 64
Patch Set 2 : Addressed Wladimir's feedback #
Total comments: 42
Patch Set 3 : Addressed further feedback #Patch Set 4 : Slightly simplified request exception logic #
Total comments: 39
Patch Set 5 : Addressed Sebastian's feedback #
Total comments: 6
Patch Set 6 : Addressed more feedback from Sebastian #
Total comments: 7
Patch Set 7 : Addressed even more feedback from Sebastian #
Total comments: 3
Patch Set 8 : Give query_params a default value #
MessagesTotal messages: 24
Patch Set 1 https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:128: def localize_string(self, name, default, comment, localedata, escapes): We previously threw away string comments, now we need to keep track of them to help our translators. https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:153: # Append a note about fixed strings to the string comment for translators Without this translators would have way to know what fixed strings like {1} or {2} represented in the strings they are translating. https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:370: default = self._params["source"].read_locale(self._params["locale"], page)[name] This change is required so that strings included with the get_strings global function are stored with the parent page. That way the translators have proper context for these strings and locale files for the "master" language do not have to be listed and read manually when synching with Crowdin. https://codereview.adblockplus.org/29317015/diff/29317016/cms/sources.py File cms/sources.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/sources.py#newc... cms/sources.py:173: result = OrderedDict() This is required to keep the correct order of page strings uploaded to Crowdin. Otherwise I found a lot of context was lost when viewing a page's strings in Crowdin.
https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:19: from itertools import islice Nit: I'm not a big fan of importing individual functions, other than in cases where function name and module name are identical - when looking at a function call, it becomes less obvious that the function is imported from somewhere else as opposed to being defined in this file. Writing io.BytesIO() isn't much longer than BytesIO() but makes the code more obvious. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:36: defaultlocale = None The three variables above shouldn't be globals - they refer to a specific invocation of crowdin_sync and should be passed in a parameters wherever appropriate. Note that you can encapsulate API key and project name in a CrowdinAPI class - then you only need to pass in an instance of that class, and the functions can call api.GET("info") for example without any extra parameters. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:45: break Please link to https://stackoverflow.com/questions/3992735/python-generator-that-groups-anot... which is where you've got that code it seems. IMPORTANT: *Always* indicate code that you didn't write yourself. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:51: response = requests.request(request_method, url, **kwargs) You seem to be assuming that this will not throw an exception. However, HTTPError is about HTTP response codes like 404, but there is also ConnectionError and the like - e.g. if the host name could not be resolved. I'm pretty sure that those will be raised when requests.request() is called. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:57: return response.json() This call might also raise an exception - if JSON decoding fails. You need to intercept it as well if you want to log the error consistently. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:73: def ensure_required_locales(required_locales, enabled_locales, defaultlocale): Nit: configure_locales maybe? This is more about configuring the project than anything else. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:78: skipped_locales = list(required_locales.difference(supported_locales)) Nit: why convert a set to list here? Just change skipped_locales.append() into skipped_locales.add() below and it should work. Nit: skipped_locales = required_locales - supported_locales? https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:84: required_locales = required_locales.intersection(supported_locales) Nit: required_locales = required_locales & supported_locales? Or maybe even more obvious: required_locales -= skipped_locales. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:93: data={"languages[]": list(enabled_locales.union(required_locales))} Nit: list(enabled_locales | required_locales) please. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:110: map(parse_file_node, project_info["files"]) You are misusing map() here, its callback should not have side-effects - the result returned should rather be used. I think that here the best option is to keep it simple: for node in project_info["files"]: parse_file_node(node) https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:125: path += "/" I think the following should be simpler: local_files.add(page + ".json") while "/" in page: page = page.rsplit("/", 1)[0] local_directories.add(page) https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:137: page = file_name[:-5] Use os.path.splitext()? https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:139: del page_strings[page] Why delete the strings? https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:163: "files for locale %s..." % (len(files), locale)) Uploading per locale might result in doing one request for each locale. You could compile the complete list instead: def open_files(): for locale in required_locales: for file in files: path = os.path.join(source_dir, "locales", locale, file) if os.path.isfile(path): yield ("files[%s]" % file, open(path, "r")) https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:167: f[1].close() These files should be closed regardless of whether the request fails. Meaning: try: logger.info(...) crowdin_request(...) finally: for fieldname, file in files: file.close() Note that I'd really prefer deconstructing the tuples properly in the loop instead of using f[1]. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:205: archive.extractall(locale_path) Running extractall is pretty dangerous IMHO. I'd suggest iterating over all files, checking that they are valid (not default locale, ending with .json), then extracting them individually. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:216: project_info = crowdin_request("GET", "info") Nit: The two lines above and setting enabled_locales can be done outside the with block. In fact, this seems to belong logically into the ensure_required_locales function - it should just check which locales are already configured. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:222: required_locales, skipped_locales = ensure_required_locales( Why do we need to know the skipped locales here? The warning has been shown already, and ignoring them in the download seems to make no sense - the locales are skipped because they aren't supported by Crowdin, meaning that they cannot be part of the download. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:230: logger.error("No page strings found. (Wrong project directory?) Aborting!") Nit: The settings.ini file is there, so it cannot be a wrong directory. "No existing strings found, maybe project directory not set up correctly?" https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:234: new_directories = list(local_directories.difference(remote_directories)) Nit: I don't see the point converting sets to lists here. Also, can just use the minus operator. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:239: existing_files = list(local_files.intersection(remote_files)) Nit: existing_files = local_files - new_files? https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:151: if locale == self._params["defaultlocale"]: This case is already checked above (getting translation), just move your code up there. https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:157: "[" + ", ".join(["%d: %s" % (i, s) for i, s in ", ".join("{%d}: %s" (i, s) for i, s in ...) In other words: You don't need to create a temporary list here, a generator will do. Also, we can specifically mention {1}, no need to expect the translators to make the connection from numbers to placeholders. https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:159: ) And what about comments for strings without fixed parts? https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:199: # Note: The comment, it is only relevant when uploading translations This is the wrong place for this comment, it is describing function localize_string(). https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:365: # Note: The comment, it is only relevant when uploading translations Same here, this comment should be moved into function localize_string(). https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:370: default = self._params["source"].read_locale(self._params["locale"], page)[name] On 2015/06/15 14:24:28, kzar wrote: > This change is required so that strings included with the get_strings global > function are stored with the parent page. That way the translators have proper > context for these strings and locale files for the "master" language do not have > to be listed and read manually when synching with Crowdin. This change also happens to be wrong. Each page in the web.adblockplus.org repository calls get_string("installation", "menu") - our translations have it defined in the file menu.json rather than duplicated everywhere. So, as you noticed here, merely filling up params["localedata"] doesn't work - sometimes pages use strings from multiple locale files. What we probably need is some kind of callback: if "record_default" in self._params: self._params["record_default"](name, default, comment, fixed_strings) This callback can be specified as an optional parameter to cms.utils.get_page_params(). https://codereview.adblockplus.org/29317015/diff/29317016/cms/sources.py File cms/sources.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/sources.py#newc... cms/sources.py:173: result = OrderedDict() On 2015/06/15 14:24:28, kzar wrote: > This is required to keep the correct order of page strings uploaded to Crowdin. > Otherwise I found a lot of context was lost when viewing a page's strings in > Crowdin. Crowdin isn't very good at keeping that order but it won't hurt... https://codereview.adblockplus.org/29317015/diff/29317016/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/utils.py#newcode33 cms/utils.py:33: "localecomments": {}, This should no longer be necessary if locale data is collected via callback.
Patch Set 2 : Addressed Wladimir's feedback https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:19: from itertools import islice On 2015/06/29 19:05:36, Wladimir Palant wrote: > Nit: I'm not a big fan of importing individual functions, other than in cases > where function name and module name are identical - when looking at a function > call, it becomes less obvious that the function is imported from somewhere else > as opposed to being defined in this file. Writing io.BytesIO() isn't much longer > than BytesIO() but makes the code more obvious. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:36: defaultlocale = None On 2015/06/29 19:05:37, Wladimir Palant wrote: > The three variables above shouldn't be globals - they refer to a specific > invocation of crowdin_sync and should be passed in a parameters wherever > appropriate. > > Note that you can encapsulate API key and project name in a CrowdinAPI class - > then you only need to pass in an instance of that class, and the functions can > call api.GET("info") for example without any extra parameters. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:45: break On 2015/06/29 19:05:36, Wladimir Palant wrote: > Please link to > https://stackoverflow.com/questions/3992735/python-generator-that-groups-anot... > which is where you've got that code it seems. > > IMPORTANT: *Always* indicate code that you didn't write yourself. I originally did take this function from Stackoverflow and I added a comment to indicate that https://github.com/kzar/cms/commit/4196784#diff-d9a5accf75fa7c66e418d6e9b1c78... . Since then I was unhappy with the implementation and the others I found on Stackoverflow and I wrote this one myself, mixing together parts from various examples. I was unsure weather to leave the comment in or not before posting the review as I was no longer left with code that exactly matched any of the examples. I decided to leave it out and I should have mentioned that in the review, sorry. I'm happy to either add or leave the comment out, I agree it's very important to give proper attribution. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:51: response = requests.request(request_method, url, **kwargs) On 2015/06/29 19:05:36, Wladimir Palant wrote: > You seem to be assuming that this will not throw an exception. However, > HTTPError is about HTTP response codes like 404, but there is also > ConnectionError and the like - e.g. if the host name could not be resolved. I'm > pretty sure that those will be raised when requests.request() is called. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:57: return response.json() On 2015/06/29 19:05:37, Wladimir Palant wrote: > This call might also raise an exception - if JSON decoding fails. You need to > intercept it as well if you want to log the error consistently. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:73: def ensure_required_locales(required_locales, enabled_locales, defaultlocale): On 2015/06/29 19:05:36, Wladimir Palant wrote: > Nit: configure_locales maybe? This is more about configuring the project than > anything else. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:78: skipped_locales = list(required_locales.difference(supported_locales)) On 2015/06/29 19:05:36, Wladimir Palant wrote: > Nit: why convert a set to list here? Just change skipped_locales.append() into > skipped_locales.add() below and it should work. > > Nit: skipped_locales = required_locales - supported_locales? Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:84: required_locales = required_locales.intersection(supported_locales) On 2015/06/29 19:05:36, Wladimir Palant wrote: > Nit: required_locales = required_locales & supported_locales? Or maybe even more > obvious: required_locales -= skipped_locales. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:93: data={"languages[]": list(enabled_locales.union(required_locales))} On 2015/06/29 19:05:36, Wladimir Palant wrote: > Nit: list(enabled_locales | required_locales) please. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:110: map(parse_file_node, project_info["files"]) On 2015/06/29 19:05:37, Wladimir Palant wrote: > You are misusing map() here, its callback should not have side-effects - the > result returned should rather be used. I think that here the best option is to > keep it simple: > > for node in project_info["files"]: > parse_file_node(node) Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:125: path += "/" On 2015/06/29 19:05:36, Wladimir Palant wrote: > I think the following should be simpler: > > local_files.add(page + ".json") > while "/" in page: > page = page.rsplit("/", 1)[0] > local_directories.add(page) Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:137: page = file_name[:-5] On 2015/06/29 19:05:37, Wladimir Palant wrote: > Use os.path.splitext()? Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:139: del page_strings[page] On 2015/06/29 19:05:37, Wladimir Palant wrote: > Why delete the strings? We no longer need them, and as they might be quite large I thought it was worth deleting them to avoid wasting memory. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:163: "files for locale %s..." % (len(files), locale)) On 2015/06/29 19:05:37, Wladimir Palant wrote: > Uploading per locale might result in doing one request for each locale. You > could compile the complete list instead: > > def open_files(): > for locale in required_locales: > for file in files: > path = os.path.join(source_dir, "locales", locale, file) > if os.path.isfile(path): > yield ("files[%s]" % file, open(path, "r")) This is deliberate, when uploading translations you are allowed to upload multiple files at once but they must all be for the same language. https://crowdin.com/page/api/upload-translation https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:167: f[1].close() On 2015/06/29 19:05:37, Wladimir Palant wrote: > These files should be closed regardless of whether the request fails. Meaning: > > try: > logger.info(...) > crowdin_request(...) > finally: > for fieldname, file in files: > file.close() > > Note that I'd really prefer deconstructing the tuples properly in the loop > instead of using f[1]. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:205: archive.extractall(locale_path) On 2015/06/29 19:05:37, Wladimir Palant wrote: > Running extractall is pretty dangerous IMHO. I'd suggest iterating over all > files, checking that they are valid (not default locale, ending with .json), > then extracting them individually. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:216: project_info = crowdin_request("GET", "info") On 2015/06/29 19:05:37, Wladimir Palant wrote: > Nit: The two lines above and setting enabled_locales can be done outside the > with block. In fact, this seems to belong logically into the > ensure_required_locales function - it should just check which locales are > already configured. I originally didn't request the project information until after extracting the page strings but during testing I found it to be a real pain. What would often happen was the API key, project name or something else was wrong and instead of getting immediate feedback you had to wait several minutes to find out. This way you know immediately if there's a problem. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:222: required_locales, skipped_locales = ensure_required_locales( On 2015/06/29 19:05:37, Wladimir Palant wrote: > Why do we need to know the skipped locales here? The warning has been shown > already, and ignoring them in the download seems to make no sense - the locales > are skipped because they aren't supported by Crowdin, meaning that they cannot > be part of the download. We need to avoid deleting locale files for skipped_locales, otherwise pre-existing translations for an unsupported locale will be lost. (Perhaps the locale directory just needs to be renamed so the locale matches what Crowdin expects?) https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:230: logger.error("No page strings found. (Wrong project directory?) Aborting!") On 2015/06/29 19:05:37, Wladimir Palant wrote: > Nit: The settings.ini file is there, so it cannot be a wrong directory. "No > existing strings found, maybe project directory not set up correctly?" Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:234: new_directories = list(local_directories.difference(remote_directories)) On 2015/06/29 19:05:37, Wladimir Palant wrote: > Nit: I don't see the point converting sets to lists here. Also, can just use the > minus operator. (Cool I found this one myself and later noticed your comment :) ) Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:239: existing_files = list(local_files.intersection(remote_files)) On 2015/06/29 19:05:36, Wladimir Palant wrote: > Nit: existing_files = local_files - new_files? Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:151: if locale == self._params["defaultlocale"]: On 2015/06/29 19:05:38, Wladimir Palant wrote: > This case is already checked above (getting translation), just move your code up > there. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:157: "[" + ", ".join(["%d: %s" % (i, s) for i, s in On 2015/06/29 19:05:38, Wladimir Palant wrote: > ", ".join("{%d}: %s" (i, s) for i, s in ...) > > In other words: You don't need to create a temporary list here, a generator will > do. Also, we can specifically mention {1}, no need to expect the translators to > make the connection from numbers to placeholders. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:159: ) On 2015/06/29 19:05:37, Wladimir Palant wrote: > And what about comments for strings without fixed parts? Whoops, good point. Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:199: # Note: The comment, it is only relevant when uploading translations On 2015/06/29 19:05:38, Wladimir Palant wrote: > This is the wrong place for this comment, it is describing function > localize_string(). Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:365: # Note: The comment, it is only relevant when uploading translations On 2015/06/29 19:05:38, Wladimir Palant wrote: > Same here, this comment should be moved into function localize_string(). Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/converters.py#n... cms/converters.py:370: default = self._params["source"].read_locale(self._params["locale"], page)[name] On 2015/06/29 19:05:38, Wladimir Palant wrote: > On 2015/06/15 14:24:28, kzar wrote: > > This change is required so that strings included with the get_strings global > > function are stored with the parent page. That way the translators have proper > > context for these strings and locale files for the "master" language do not > have > > to be listed and read manually when synching with Crowdin. > > This change also happens to be wrong. Each page in the http://web.adblockplus.org > repository calls get_string("installation", "menu") - our translations have it > defined in the file menu.json rather than duplicated everywhere. > > So, as you noticed here, merely filling up params["localedata"] doesn't work - > sometimes pages use strings from multiple locale files. What we probably need is > some kind of callback: > > if "record_default" in self._params: > self._params["record_default"](name, default, comment, fixed_strings) > > This callback can be specified as an optional parameter to > cms.utils.get_page_params(). Done. https://codereview.adblockplus.org/29317015/diff/29317016/cms/sources.py File cms/sources.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/sources.py#newc... cms/sources.py:173: result = OrderedDict() On 2015/06/29 19:05:38, Wladimir Palant wrote: > On 2015/06/15 14:24:28, kzar wrote: > > This is required to keep the correct order of page strings uploaded to > Crowdin. > > Otherwise I found a lot of context was lost when viewing a page's strings in > > Crowdin. > > Crowdin isn't very good at keeping that order but it won't hurt... I found that it does help in practice for our website. https://codereview.adblockplus.org/29317015/diff/29317016/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/utils.py#newcode33 cms/utils.py:33: "localecomments": {}, On 2015/06/29 19:05:38, Wladimir Palant wrote: > This should no longer be necessary if locale data is collected via callback. Done.
https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:26: import requests I see, we use "requests", because urllib doesn't have built-in support for file uploads. However, have you looked into using urllib3 directly? Note that "requests" is just a thing wrapper around urllib3. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:66: if chunk: Nit: You could get rid of the else block: if not chunk: break yield chunk https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:76: store = page_strings.setdefault(page, {}) Do we care to not unnecessarily change the order of strings, e.g. to keep diffs compact? Then we should use an OrderedDict. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:80: comment = comment + "\n" if comment else "" Nit. The ternary operator just adds uneeded complexity here. I'd go for: if comment: comment += "\n" https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:81: comment += ", ".join("{%d}: %s" % (i, s) Nit: No reason to pack/unpack the sequence here: .. % x for x in enumerate(.. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:176: for files in grouper(open_locale_files(locale, new_files), You should better first get the chunk of filenames and then open the files in that chunk, rather than opening all files in advance. https://codereview.adblockplus.org/29317015/diff/29321284/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/converters.py#n... cms/converters.py:144: if self._params["record_default_strings"]: Is there any particular reason we call that function only for the default locale? This shouldn't be necessary as we only generate the default locale when we extract translations, right? Also note that record_default_settings will also be available as variable in jinja templates, if you add it to the _params dictionary. So maybe it would be better and simpler to patch Converter.localize_string in cms.bin.translate instead passing that callback around? Note that this would be in line with how cms.bin.generate_static_files is implemented. https://codereview.adblockplus.org/29317015/diff/29321284/cms/sources.py File cms/sources.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/sources.py#newc... cms/sources.py:173: result = collections.OrderedDict() Nit: This could be simplified a little: result = collections.OrderedDict() if locale != default_locale: result.update(self.read_locale(default_locale, page))
https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:26: import requests I just realized that simply using built-in urllib2 with a handwritten request body, doesn't seem too complicated either: boundary = '--' + mimetools.choose_boundary() data = [boundary, 'Content-Disposition: form-data; name="language"', '', locale] for i, filename in enumerate(filenames): data.append(boundary) data.append('Content-Disposition: file; name="files[%d]"; filename="%d"' % (i, i)) data.append('') with open(filename, 'rb') as file: data.append(file.read()) data.append(boundary + '--') data.append('') body = '\r\n'.join(data) request = urllib2.Request(url) request.add_header('Content-Type', 'multipart/form-data; boundary=' + boundary) request.add_header('Content-Length', len(body)) request.set_data(body) response = urllib2.urlopen(request)
https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:26: import requests On 2015/07/08 14:23:00, Sebastian Noack wrote: > I just realized that simply using built-in urllib2 with a handwritten request > body, doesn't seem too complicated either: > > boundary = '--' + mimetools.choose_boundary() > data = [boundary, 'Content-Disposition: form-data; name="language"', '', locale] > for i, filename in enumerate(filenames): > data.append(boundary) > data.append('Content-Disposition: file; name="files[%d]"; filename="%d"' % > (i, i)) > data.append('') > with open(filename, 'rb') as file: > data.append(file.read()) > data.append(boundary + '--') > data.append('') > body = '\r\n'.join(data) > > request = urllib2.Request(url) > request.add_header('Content-Type', 'multipart/form-data; boundary=' + boundary) > request.add_header('Content-Length', len(body)) > request.set_data(body) > response = urllib2.urlopen(request) I discussed using the requests library with Wladimir in #adblockplus before I started writing this code. I don't think it's helpful to continue that discussion now, after weeks have passed and the code has been written, refined and tested.
https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:26: import requests On 2015/07/08 15:26:41, kzar wrote: > On 2015/07/08 14:23:00, Sebastian Noack wrote: > > I just realized that simply using built-in urllib2 with a handwritten request > > body, doesn't seem too complicated either: > > > > boundary = '--' + mimetools.choose_boundary() > > data = [boundary, 'Content-Disposition: form-data; name="language"', '', > locale] > > for i, filename in enumerate(filenames): > > data.append(boundary) > > data.append('Content-Disposition: file; name="files[%d]"; filename="%d"' % > > (i, i)) > > data.append('') > > with open(filename, 'rb') as file: > > data.append(file.read()) > > data.append(boundary + '--') > > data.append('') > > body = '\r\n'.join(data) > > > > request = urllib2.Request(url) > > request.add_header('Content-Type', 'multipart/form-data; boundary=' + > boundary) > > request.add_header('Content-Length', len(body)) > > request.set_data(body) > > response = urllib2.urlopen(request) > > I discussed using the requests library with Wladimir in #adblockplus before I > started writing this code. I don't think it's helpful to continue that > discussion now, after weeks have passed and the code has been written, refined > and tested. I'm certainly curious about the reason, you decided for to use "requests". This library seems to be an unneeded wrapper around "urllib3", which seems to be unneeded for the use case here, in the first place. Usually, we (and in particular Wladimir) are opposed to introduce third-party modules. And if we introduce a new dependency there need to be a good reason to justify that. I don't care that you discussed that with Wladimir already. Since I wasn't involved (or invited) to the discussion back then, I will revisit it.
https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:222: required_locales, skipped_locales = ensure_required_locales( On 2015/07/02 12:33:11, kzar wrote: > We need to avoid deleting locale files for skipped_locales, You can (and should) limit deleting to required_locales. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:26: import requests Doing multipart encoding manually is very awkward. While we do that in buildtools to avoid introducing a dependency, here we have something that will be run by the server rather than contributors - so a dependency is better than adding code complexity. We use requests because that's what Stack Overflow recommends :). I wasn't aware that it is a wrapper around urllib3 and indeed - using urllib3 would make more sense IMHO. While the requests API might be more convenient, just using urllib3.filepost.encode_multipart_formdata() along with built-in APIs that any Python developer knows will make this code more maintainable. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:39: self.defaultlocale = defaultlocale It doesn't look like this field is ever used - and it seems misplaced here, irrelevant for API requests. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:51: except requests.exceptions.ConnectionError: There are more exception classes - ConnectionError was merely an example but there is also Timeout for example. All of them seem to inherit from RequestError however so catching this one will work. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:76: store = page_strings.setdefault(page, {}) On 2015/07/08 13:03:20, Sebastian Noack wrote: > Do we care to not unnecessarily change the order of strings, e.g. to keep diffs > compact? Then we should use an OrderedDict. Not necessarily relevant for diffs but Crowdin will use the order we define here. So - yes, OrderedDict sounds like a good idea. However, the current code will create a dictionary instance on each call, most of the time it will be thrown away. So maybe a slightly less elegant approach is better: try: store = page_strings[page] except KeyError: store = page_strings[page] = OrderedDict() https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:150: for group in grouper(files, CrowdinAPI.FILES_PER_REQUEST): No need to assume that crowdin_api is a CrowdinAPI instance. Also no need to assume that this constant is identical for all instances. Just use crowdin_api.FILES_PER_REQUEST. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:177: CrowdinAPI.FILES_PER_REQUEST): As above, crowdin_api.FILES_PER_REQUEST please. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:229: archive.extract(member, locale_path) Please use posixpath module here rather than os.path - archive paths don't follow OS path conventions. This should also eliminate the need for calling normpath. https://codereview.adblockplus.org/29317015/diff/29321284/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/converters.py#n... cms/converters.py:144: if self._params["record_default_strings"]: On 2015/07/08 13:03:20, Sebastian Noack wrote: > Is there any particular reason we call that function only for the default > locale? This shouldn't be necessary as we only generate the default locale when > we extract translations, right? That's assumptions, the kind that might not be true tomorrow. > So maybe it would be > better and simpler to patch Converter.localize_string in cms.bin.translate > instead passing that callback around? Patch a method to memoize return value without any assumptions about parameters is one thing, wrapping such a complex function something entirely different - that would move many assumptions about its functionality into a completely unrelated piece of code. If anything, we could set the callback as an attribute on Converter *instances* from cms.utils.get_page_params - "inheriting" this callback would introduce significant complexity however and doesn't seem to be worth it. https://codereview.adblockplus.org/29317015/diff/29321284/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/utils.py#newcode21 cms/utils.py:21: record_default_strings=None): Nit: This should be record_default_string as the callback only records one string at a time.
https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:26: import requests On 2015/07/08 23:11:05, Wladimir Palant wrote: > Doing multipart encoding manually is very awkward. While we do that in > buildtools to avoid introducing a dependency, here we have something that will > be run by the server rather than contributors - so a dependency is better than > adding code complexity. Personally, I'd still prefer to go with built-in urllib2 and the implementation I suggested above. The implementation in buildtools is rather hacky, I know. However, I think the snippet I posted above would be reasonable. Also I don't think that it would be justified to use a third-party library just to avoid these few lines. However, it's not important enough to argue about. > We use requests because that's what Stack Overflow recommends :). I wasn't aware > that it is a wrapper around urllib3 and indeed - using urllib3 would make more > sense IMHO. While the requests API might be more convenient, just using > urllib3.filepost.encode_multipart_formdata() along with built-in APIs that any > Python developer knows will make this code more maintainable. Yeah, if you both disagree to go the way I suggested above, I agree that using urllib3 or maybe even better, as you suggested, just urllib/urllib2 in combination with urllib3.filepost.encode_multipart_formdata() would certainly be better than using "requests". https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:88: format=format, record_default_strings=record_string) Recording the default strings is what we do here. But the converter should be agnostic of it. For it it's just a callback to call on every string. And that is what the argument name should reflect. https://codereview.adblockplus.org/29317015/diff/29321284/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/converters.py#n... cms/converters.py:144: if self._params["record_default_strings"]: On 2015/07/08 23:11:08, Wladimir Palant wrote: > On 2015/07/08 13:03:20, Sebastian Noack wrote: > > Is there any particular reason we call that function only for the default > > locale? This shouldn't be necessary as we only generate the default locale > when > > we extract translations, right? > > That's assumptions, the kind that might not be true tomorrow. Or the other way around. As far as I see it this code shouldn't be in the default locale code path and the name shouldn't imply it. This will make the logic of that "param" less complicated to understand. > > So maybe it would be > > better and simpler to patch Converter.localize_string in cms.bin.translate > > instead passing that callback around? > > Patch a method to memoize return value without any assumptions about parameters > is one thing, wrapping such a complex function something entirely different - > that would move many assumptions about its functionality into a completely > unrelated piece of code. If anything, we could set the callback as an attribute > on Converter *instances* from cms.utils.get_page_params - "inheriting" this > callback would introduce significant complexity however and doesn't seem to be > worth it. Fair enough.
https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:26: import requests On 2015/07/09 21:26:55, Sebastian Noack wrote: > Personally, I'd still prefer to go with built-in urllib2 and the implementation > I suggested above. The implementation in buildtools is rather hacky, I know. > However, I think the snippet I posted above would be reasonable. Nope, they are both hacky and I'd definitely prefer some code for this that we don't own. https://codereview.adblockplus.org/29317015/diff/29321284/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/converters.py#n... cms/converters.py:144: if self._params["record_default_strings"]: On 2015/07/09 21:26:55, Sebastian Noack wrote: > Or the other way around. True. If we generalize the parameter name into something like "localized_string_callback" then we can just as well record all strings and let the callback choose which ones it needs. It should be called with |result| as parameter rather than |default| of course.
Patch Set 3 : Addressed further feedback https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29317016/cms/bin/translate.p... cms/bin/translate.py:222: required_locales, skipped_locales = ensure_required_locales( On 2015/07/08 23:11:05, Wladimir Palant wrote: > On 2015/07/02 12:33:11, kzar wrote: > > We need to avoid deleting locale files for skipped_locales, > > You can (and should) limit deleting to required_locales. Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:26: import requests On 2015/07/10 21:24:04, Wladimir Palant wrote: > On 2015/07/09 21:26:55, Sebastian Noack wrote: > > Personally, I'd still prefer to go with built-in urllib2 and the > implementation > > I suggested above. The implementation in buildtools is rather hacky, I know. > > However, I think the snippet I posted above would be reasonable. > > Nope, they are both hacky and I'd definitely prefer some code for this that we > don't own. Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:39: self.defaultlocale = defaultlocale On 2015/07/08 23:11:06, Wladimir Palant wrote: > It doesn't look like this field is ever used - and it seems misplaced here, > irrelevant for API requests. Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:51: except requests.exceptions.ConnectionError: On 2015/07/08 23:11:06, Wladimir Palant wrote: > There are more exception classes - ConnectionError was merely an example but > there is also Timeout for example. All of them seem to inherit from RequestError > however so catching this one will work. Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:66: if chunk: On 2015/07/08 13:03:19, Sebastian Noack wrote: > Nit: You could get rid of the else block: > > if not chunk: > break > yield chunk Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:76: store = page_strings.setdefault(page, {}) On 2015/07/08 23:11:06, Wladimir Palant wrote: > On 2015/07/08 13:03:20, Sebastian Noack wrote: > > Do we care to not unnecessarily change the order of strings, e.g. to keep > diffs > > compact? Then we should use an OrderedDict. > > Not necessarily relevant for diffs but Crowdin will use the order we define > here. So - yes, OrderedDict sounds like a good idea. > > However, the current code will create a dictionary instance on each call, most > of the time it will be thrown away. So maybe a slightly less elegant approach is > better: > > try: > store = page_strings[page] > except KeyError: > store = page_strings[page] = OrderedDict() Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:80: comment = comment + "\n" if comment else "" On 2015/07/08 13:03:20, Sebastian Noack wrote: > Nit. The ternary operator just adds uneeded complexity here. I'd go for: > > if comment: > comment += "\n" `comment` might be None, hence this logic. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:81: comment += ", ".join("{%d}: %s" % (i, s) On 2015/07/08 13:03:20, Sebastian Noack wrote: > Nit: No reason to pack/unpack the sequence here: > > .. % x for x in enumerate(.. Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:88: format=format, record_default_strings=record_string) On 2015/07/09 21:26:55, Sebastian Noack wrote: > Recording the default strings is what we do here. But the converter should be > agnostic of it. For it it's just a callback to call on every string. And that is > what the argument name should reflect. Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:150: for group in grouper(files, CrowdinAPI.FILES_PER_REQUEST): On 2015/07/08 23:11:08, Wladimir Palant wrote: > No need to assume that crowdin_api is a CrowdinAPI instance. Also no need to > assume that this constant is identical for all instances. Just use > crowdin_api.FILES_PER_REQUEST. Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:176: for files in grouper(open_locale_files(locale, new_files), On 2015/07/08 13:03:20, Sebastian Noack wrote: > You should better first get the chunk of filenames and then open the files in > that chunk, rather than opening all files in advance. `open_locale_files` is a generator and I consume the groups lazily so I don't think the files are opened in advance. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:177: CrowdinAPI.FILES_PER_REQUEST): On 2015/07/08 23:11:05, Wladimir Palant wrote: > As above, crowdin_api.FILES_PER_REQUEST please. Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/bin/translate.p... cms/bin/translate.py:229: archive.extract(member, locale_path) On 2015/07/08 23:11:08, Wladimir Palant wrote: > Please use posixpath module here rather than os.path - archive paths don't > follow OS path conventions. This should also eliminate the need for calling > normpath. Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/converters.py#n... cms/converters.py:144: if self._params["record_default_strings"]: On 2015/07/10 21:24:04, Wladimir Palant wrote: > On 2015/07/09 21:26:55, Sebastian Noack wrote: > > Or the other way around. > > True. If we generalize the parameter name into something like > "localized_string_callback" then we can just as well record all strings and let > the callback choose which ones it needs. It should be called with |result| as > parameter rather than |default| of course. Done. https://codereview.adblockplus.org/29317015/diff/29321284/cms/sources.py File cms/sources.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/sources.py#newc... cms/sources.py:173: result = collections.OrderedDict() On 2015/07/08 13:03:20, Sebastian Noack wrote: > Nit: This could be simplified a little: > > result = collections.OrderedDict() > if locale != default_locale: > result.update(self.read_locale(default_locale, page)) I think we should leave this unrelated change out of this patchset. https://codereview.adblockplus.org/29317015/diff/29321284/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/utils.py#newcode21 cms/utils.py:21: record_default_strings=None): On 2015/07/08 23:11:09, Wladimir Palant wrote: > Nit: This should be record_default_string as the callback only records one > string at a time. Done.
Patch Set 4 : Slightly simplified request exception logic
https://codereview.adblockplus.org/29317015/diff/29321284/cms/sources.py File cms/sources.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/sources.py#newc... cms/sources.py:173: result = collections.OrderedDict() On 2015/07/11 19:21:18, kzar wrote: > On 2015/07/08 13:03:20, Sebastian Noack wrote: > > Nit: This could be simplified a little: > > > > result = collections.OrderedDict() > > if locale != default_locale: > > result.update(self.read_locale(default_locale, page)) > > I think we should leave this unrelated change out of this patchset. It's not unrelated if you change the code anyway. And in particular your change is introducing the duplication here. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:45: self.project_name, api_endpoint, self.api_key Please encode the parameters properly: url = "/api/project/%s/%s?%s" % ( urllib.quote(self.project_name), urllib.quote(api_endpoint), urllib.urlencode([('key', self.api_key), ('json', '1')]) ) https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:54: fields += [(name + "[]", v) for v in value] Nit: fields.extend((name + "[]", v) for v in value) https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:56: fields += [("files[%s]" % f[0], f) for f in files] Nit: Please use .extend() here as well. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:56: fields += [("files[%s]" % f[0], f) for f in files] Note that |'%s' % f| returns it object representation like |<open file '/dev/null', mode 'r' at 0x7fc2226df780>|. How about using simply an incrementing index instead? That would seem less hacky. Also why can the part between the square brackets be empty for regular fields, but not for files? https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:61: timeout=urllib3.Timeout(connect=5) Any particular reason you specify a custom connect timeout? https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:63: if response.status < 200 or response.status >= 300: How about |response.status not in xrange(200, 299)|? https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:65: except urllib3.exceptions.HTTPError as e: Nit: Since we don't use the variable e you can omit the as-clause. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:66: logger.error("API call to %s failed:\n%s" % (url, response.data)) You can pass the values for the placeholders directly to the logging function, rather than formatting the string here: logger.error("API call to %s failed:\n%s", url, response.data) https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:70: return json.loads(response.data) How about |json.load(response)|? https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:70: return json.loads(response.data) Note that like urllib/urllib2, urllib3's repsonse objects also have a close() method that should be called (in a finally-block). https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:72: logger.error("Invalid response returned by API endpoint %s" % url) Same here: logger.error("Invalid response returned by API endpoint %s", url) https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:129: data={"languages": list(enabled_locales | required_locales)} Since .request() merely iterates over the value (if it isn't a string), and set objects are iterable by themselves, there is no need to turn it into a list. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:221: "/api/project/%s/download/all.zip?key=%s" % ( As above, please use urllib.quote and urllib.urlencode here. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:225: if response.status < 200 or response.status >= 300: How about |response.status not in xrange(200, 299)|? https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:229: with zipfile.ZipFile(io.BytesIO(response.data), "r") as archive: The response is a file-like object by itself. So if you don't use the preload_content argument above, you can parse it to ZipFile() without wrapping it in a BytesIO() object. https://codereview.adblockplus.org/29317015/diff/29322015/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29317015/diff/29322015/cms/converters.py#n... cms/converters.py:151: if self._params["localized_string_callback"]: I would rather not duplicate the key lookup: callback = self._params.get("localized_string_callback") if callback: callback(...)
Patch Set 5 : Addressed Sebastian's feedback https://codereview.adblockplus.org/29317015/diff/29321284/cms/sources.py File cms/sources.py (right): https://codereview.adblockplus.org/29317015/diff/29321284/cms/sources.py#newc... cms/sources.py:173: result = collections.OrderedDict() On 2015/07/14 11:31:04, Sebastian Noack wrote: > On 2015/07/11 19:21:18, kzar wrote: > > On 2015/07/08 13:03:20, Sebastian Noack wrote: > > > Nit: This could be simplified a little: > > > > > > result = collections.OrderedDict() > > > if locale != default_locale: > > > result.update(self.read_locale(default_locale, page)) > > > > I think we should leave this unrelated change out of this patchset. > > It's not unrelated if you change the code anyway. And in particular your change > is introducing the duplication here. Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:45: self.project_name, api_endpoint, self.api_key On 2015/07/14 11:31:05, Sebastian Noack wrote: > Please encode the parameters properly: > > url = "/api/project/%s/%s?%s" % ( > urllib.quote(self.project_name), > urllib.quote(api_endpoint), > urllib.urlencode([('key', self.api_key), ('json', '1')]) > ) Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:54: fields += [(name + "[]", v) for v in value] On 2015/07/14 11:31:05, Sebastian Noack wrote: > Nit: fields.extend((name + "[]", v) for v in value) Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:56: fields += [("files[%s]" % f[0], f) for f in files] On 2015/07/14 11:31:04, Sebastian Noack wrote: > Note that |'%s' % f| returns it object representation like |<open file > '/dev/null', mode 'r' at 0x7fc2226df780>|. How about using simply an > incrementing index instead? That would seem less hacky. > > Also why can the part between the square brackets be empty for regular fields, > but not for files? So f[0] is actually the file name and we need to populate the key name for files as that's what the API expects. (I guess it's the difference between a regular array and an associate array.) https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:56: fields += [("files[%s]" % f[0], f) for f in files] On 2015/07/14 11:31:05, Sebastian Noack wrote: > Nit: Please use .extend() here as well. Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:61: timeout=urllib3.Timeout(connect=5) On 2015/07/14 11:31:07, Sebastian Noack wrote: > Any particular reason you specify a custom connect timeout? During testing I found that by default it didn't seem to timeout at all. (I waited for several minutes!) I set the timeout to 5 seconds here as that seemed "about right", it made things behave the same as when we were using the requests library instead of urllib3 directly. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:63: if response.status < 200 or response.status >= 300: On 2015/07/14 11:31:07, Sebastian Noack wrote: > How about |response.status not in xrange(200, 299)|? I think I prefer it as is. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:65: except urllib3.exceptions.HTTPError as e: On 2015/07/14 11:31:06, Sebastian Noack wrote: > Nit: Since we don't use the variable e you can omit the as-clause. Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:66: logger.error("API call to %s failed:\n%s" % (url, response.data)) On 2015/07/14 11:31:05, Sebastian Noack wrote: > You can pass the values for the placeholders directly to the logging function, > rather than formatting the string here: > > logger.error("API call to %s failed:\n%s", url, response.data) Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:70: return json.loads(response.data) On 2015/07/14 11:31:04, Sebastian Noack wrote: > How about |json.load(response)|? This doesn't work as you would expect, even though by default decode_content should be True[1]. I did attempt to do it this way for some time, also below when we download the zip file but in the end I just couldn't get it to work and to resort to using response.data instead. [1] https://github.com/shazow/urllib3/blob/master/urllib3/response.py https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:72: logger.error("Invalid response returned by API endpoint %s" % url) On 2015/07/14 11:31:05, Sebastian Noack wrote: > Same here: logger.error("Invalid response returned by API endpoint %s", url) Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:129: data={"languages": list(enabled_locales | required_locales)} On 2015/07/14 11:31:05, Sebastian Noack wrote: > Since .request() merely iterates over the value (if it isn't a string), and set > objects are iterable by themselves, there is no need to turn it into a list. Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:221: "/api/project/%s/download/all.zip?key=%s" % ( On 2015/07/14 11:31:07, Sebastian Noack wrote: > As above, please use urllib.quote and urllib.urlencode here. Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:225: if response.status < 200 or response.status >= 300: On 2015/07/14 11:31:05, Sebastian Noack wrote: > How about |response.status not in xrange(200, 299)|? See above, I prefer it as is. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:229: with zipfile.ZipFile(io.BytesIO(response.data), "r") as archive: On 2015/07/14 11:31:04, Sebastian Noack wrote: > The response is a file-like object by itself. So if you don't use the > preload_content argument above, you can parse it to ZipFile() without wrapping > it in a BytesIO() object. I agree this _should_ work but in practice it just doesn't, I spent several hours trying to get it to work and in the end gave up. https://codereview.adblockplus.org/29317015/diff/29322015/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29317015/diff/29322015/cms/converters.py#n... cms/converters.py:151: if self._params["localized_string_callback"]: On 2015/07/14 11:31:08, Sebastian Noack wrote: > I would rather not duplicate the key lookup: > > callback = self._params.get("localized_string_callback") > if callback: > callback(...) Done.
https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:61: timeout=urllib3.Timeout(connect=5) On 2015/07/14 12:54:30, kzar wrote: > On 2015/07/14 11:31:07, Sebastian Noack wrote: > > Any particular reason you specify a custom connect timeout? > > During testing I found that by default it didn't seem to timeout at all. (I > waited for several minutes!) I set the timeout to 5 seconds here as that seemed > "about right", it made things behave the same as when we were using the requests > library instead of urllib3 directly. But differently than urllib/urllib2 which we use everywhere else, right? Unless we address a particular issue I'd rather stick to the defaults. Also I suppose the script might still hang on "read" anyway if only applying a connect timeout. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:70: return json.loads(response.data) On 2015/07/14 11:31:06, Sebastian Noack wrote: > Note that like urllib/urllib2, urllib3's repsonse objects also have a close() > method that should be called (in a finally-block). For reference, I just realized that urllib3, magically closes the response when it reaches EOF. Not sure if I like that behavior. But therefore closing the response isn't necessary as opposed to urllib/urllib2. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:70: return json.loads(response.data) On 2015/07/14 12:54:28, kzar wrote: > On 2015/07/14 11:31:04, Sebastian Noack wrote: > > How about |json.load(response)|? > > This doesn't work as you would expect, even though by default decode_content > should be True[1]. I did attempt to do it this way for some time, also below > when we download the zip file but in the end I just couldn't get it to work and > to resort to using response.data instead. > > [1] https://github.com/shazow/urllib3/blob/master/urllib3/response.py I just tested it myself. And it did work. You have to specify preload_content=False though. Otherwise the underlying file descriptor will already be at EOF. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:229: with zipfile.ZipFile(io.BytesIO(response.data), "r") as archive: On 2015/07/14 12:54:28, kzar wrote: > On 2015/07/14 11:31:04, Sebastian Noack wrote: > > The response is a file-like object by itself. So if you don't use the > > preload_content argument above, you can parse it to ZipFile() without wrapping > > it in a BytesIO() object. > > I agree this _should_ work but in practice it just doesn't, I spent several > hours trying to get it to work and in the end gave up. Ah right, ZipFile() requires a file-like object that supports seek(). https://codereview.adblockplus.org/29317015/diff/29322229/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322229/cms/bin/translate.p... cms/bin/translate.py:190: with open(path, "r") as f: Nit: mode="rb" for compatibility with Windows. https://codereview.adblockplus.org/29317015/diff/29322229/cms/bin/translate.p... cms/bin/translate.py:229: The logic above is duplicated in CrowdInApi.request(). Perhaps moving the common code into another method? https://codereview.adblockplus.org/29317015/diff/29322229/cms/bin/translate.p... cms/bin/translate.py:245: if ext == ".json" and locale in required_locales: Nit: I think we should match the file extension case-insensitive.
Patch Set 6 : Addressed more feedback from Sebastian https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:61: timeout=urllib3.Timeout(connect=5) On 2015/07/14 14:39:27, Sebastian Noack wrote: > On 2015/07/14 12:54:30, kzar wrote: > > On 2015/07/14 11:31:07, Sebastian Noack wrote: > > > Any particular reason you specify a custom connect timeout? > > > > During testing I found that by default it didn't seem to timeout at all. (I > > waited for several minutes!) I set the timeout to 5 seconds here as that > seemed > > "about right", it made things behave the same as when we were using the > requests > > library instead of urllib3 directly. > > But differently than urllib/urllib2 which we use everywhere else, right? Unless > we address a particular issue I'd rather stick to the defaults. Also I suppose > the script might still hang on "read" anyway if only applying a connect timeout. Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:70: return json.loads(response.data) On 2015/07/14 14:39:28, Sebastian Noack wrote: > On 2015/07/14 11:31:06, Sebastian Noack wrote: > > Note that like urllib/urllib2, urllib3's repsonse objects also have a close() > > method that should be called (in a finally-block). > > For reference, I just realized that urllib3, magically closes the response when > it reaches EOF. Not sure if I like that behavior. But therefore closing the > response isn't necessary as opposed to urllib/urllib2. Acknowledged. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:70: return json.loads(response.data) On 2015/07/14 14:39:27, Sebastian Noack wrote: > On 2015/07/14 12:54:28, kzar wrote: > > On 2015/07/14 11:31:04, Sebastian Noack wrote: > > > How about |json.load(response)|? > > > > This doesn't work as you would expect, even though by default decode_content > > should be True[1]. I did attempt to do it this way for some time, also below > > when we download the zip file but in the end I just couldn't get it to work > and > > to resort to using response.data instead. > > > > [1] https://github.com/shazow/urllib3/blob/master/urllib3/response.py > > I just tested it myself. And it did work. You have to specify > preload_content=False though. Otherwise the underlying file descriptor will > already be at EOF. You're right, I forgot to set preload_content when testing this change and I had wrongly assumed it was the same problem I experienced before. (I previously spent a while trying to open the zipfile in the same way, but that time I had remembered to set preload_content.) Anyway you're right that this works fine with json.read. Done. https://codereview.adblockplus.org/29317015/diff/29322015/cms/bin/translate.p... cms/bin/translate.py:229: with zipfile.ZipFile(io.BytesIO(response.data), "r") as archive: On 2015/07/14 14:39:28, Sebastian Noack wrote: > On 2015/07/14 12:54:28, kzar wrote: > > On 2015/07/14 11:31:04, Sebastian Noack wrote: > > > The response is a file-like object by itself. So if you don't use the > > > preload_content argument above, you can parse it to ZipFile() without > wrapping > > > it in a BytesIO() object. > > > > I agree this _should_ work but in practice it just doesn't, I spent several > > hours trying to get it to work and in the end gave up. > > Ah right, ZipFile() requires a file-like object that supports seek(). Glad one of us understands :p https://codereview.adblockplus.org/29317015/diff/29322229/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322229/cms/bin/translate.p... cms/bin/translate.py:190: with open(path, "r") as f: On 2015/07/14 14:39:28, Sebastian Noack wrote: > Nit: mode="rb" for compatibility with Windows. Done. https://codereview.adblockplus.org/29317015/diff/29322229/cms/bin/translate.p... cms/bin/translate.py:229: On 2015/07/14 14:39:28, Sebastian Noack wrote: > The logic above is duplicated in CrowdInApi.request(). Perhaps moving the common > code into another method? Done. https://codereview.adblockplus.org/29317015/diff/29322229/cms/bin/translate.p... cms/bin/translate.py:245: if ext == ".json" and locale in required_locales: On 2015/07/14 14:39:29, Sebastian Noack wrote: > Nit: I think we should match the file extension case-insensitive. Done.
https://codereview.adblockplus.org/29317015/diff/29322352/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322352/cms/bin/translate.p... cms/bin/translate.py:52: request_method, str(url), **kwargs Converting url to an str object seems to be unnecessary. I don't see any obvious code path where it isn't one. And even then urllib3 doesn't seem to bother whether it gets an str or unicode object here. https://codereview.adblockplus.org/29317015/diff/29322352/cms/bin/translate.p... cms/bin/translate.py:58: return response Nit: The return doesn't need to be in the try..catch block https://codereview.adblockplus.org/29317015/diff/29322352/cms/bin/translate.p... cms/bin/translate.py:60: logger.error("API call to %s failed:\n%s", url, response.data) If self.connect.request() fails, "response" wouldn't be deifned.
Patch Set 7 : Addressed even more feedback from Sebastian https://codereview.adblockplus.org/29317015/diff/29322352/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322352/cms/bin/translate.p... cms/bin/translate.py:52: request_method, str(url), **kwargs On 2015/07/15 10:31:03, Sebastian Noack wrote: > Converting url to an str object seems to be unnecessary. I don't see any obvious > code path where it isn't one. And even then urllib3 doesn't seem to bother > whether it gets an str or unicode object here. For some reason the URL was ending up being unicode, perhaps from the project name pulled out of the config? Anyway it turns out if the URL passed to httplib is unicode then the whole request becomes unicode. Then later when httplib tries to append the body of the request an encoding exception could be thrown if there are any non-ascii characters in the body (e.g. existing Russian translations). This was the simplest way I found to make things work consistently. https://codereview.adblockplus.org/29317015/diff/29322352/cms/bin/translate.p... cms/bin/translate.py:58: return response On 2015/07/15 10:31:03, Sebastian Noack wrote: > Nit: The return doesn't need to be in the try..catch block Done. https://codereview.adblockplus.org/29317015/diff/29322352/cms/bin/translate.p... cms/bin/translate.py:60: logger.error("API call to %s failed:\n%s", url, response.data) On 2015/07/15 10:31:03, Sebastian Noack wrote: > If self.connect.request() fails, "response" wouldn't be deifned. Whoops, Done.
LGTM https://codereview.adblockplus.org/29317015/diff/29322352/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322352/cms/bin/translate.p... cms/bin/translate.py:52: request_method, str(url), **kwargs On 2015/07/15 11:09:03, kzar wrote: > On 2015/07/15 10:31:03, Sebastian Noack wrote: > > Converting url to an str object seems to be unnecessary. I don't see any > obvious > > code path where it isn't one. And even then urllib3 doesn't seem to bother > > whether it gets an str or unicode object here. > > For some reason the URL was ending up being unicode, perhaps from the project > name pulled out of the config? Anyway it turns out if the URL passed to httplib > is unicode then the whole request becomes unicode. Then later when httplib tries > to append the body of the request an encoding exception could be thrown if there > are any non-ascii characters in the body (e.g. existing Russian translations). > This was the simplest way I found to make things work consistently. So far for urllib3. This is certainly a silly behavior, if not a footgun. But yeah, nothing else/better we could do about it.
Feel free to address the nit below. LGTM if you don't. https://codereview.adblockplus.org/29317015/diff/29322374/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322374/cms/bin/translate.p... cms/bin/translate.py:44: def raw_request(self, request_method, api_endpoint, query_params, **kwargs): Nit: Have query_params default to [] and remove it in the call from download_translations?
https://codereview.adblockplus.org/29317015/diff/29322374/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322374/cms/bin/translate.p... cms/bin/translate.py:44: def raw_request(self, request_method, api_endpoint, query_params, **kwargs): On 2015/07/16 12:17:23, Wladimir Palant wrote: > Nit: Have query_params default to [] and remove it in the call from > download_translations? Mutable objects in default arguments are a footgun. Feel free to run following code, to see what I mean. ;) def foo(x=[]): x.append(42) return x foo() foo() However, if we make it a tuple it should be fine.
Patch Set 8 : Give query_params a default value https://codereview.adblockplus.org/29317015/diff/29322374/cms/bin/translate.py File cms/bin/translate.py (right): https://codereview.adblockplus.org/29317015/diff/29322374/cms/bin/translate.p... cms/bin/translate.py:44: def raw_request(self, request_method, api_endpoint, query_params, **kwargs): On 2015/07/16 12:23:56, Sebastian Noack wrote: > On 2015/07/16 12:17:23, Wladimir Palant wrote: > > Nit: Have query_params default to [] and remove it in the call from > > download_translations? > > Mutable objects in default arguments are a footgun. Feel free to run following > code, to see what I mean. ;) > > def foo(x=[]): > x.append(42) > return x > foo() > foo() > > However, if we make it a tuple it should be fine. I remember that one from the interview, which is why I avoided it :p Good point about just using a tuple though. Done.
Even more LGTM
Message was sent while issue was closed.
LGTM |