Created: Aug. 16, 2017, 12:43 a.m. by rosie
Modified: May 1, 2018, 6:45 p.m.
Base URL: https://hg.adblockplus.org/cms
Visibility: Public

Description: Issue 4488 - Add support for JSON page front matter
Patch Set 1 #
Total comments: 12
Patch Set 2 : Removed JSON postprocessing and integrated the cms tests #
Total comments: 6
Patch Set 3 : Cleaned up duplication, removed unnecessary regex #
Total comments: 18
Patch Set 4 : Preserve lines in metadata, fix sort in sitemap.tmpl #
Total comments: 14
Patch Set 5 : Add support for pages with no metadata #
Total comments: 6
Patch Set 6 : Fixed .gitignore and comment placement #
Patch Set 7 : Test cases no longer use 'template = empty' #
Total comments: 6
Patch Set 8 : Add newline for code clarity #
Patch Set 9 : Removed the sitemap sort attribute #
Messages
Total messages: 29
Hi Rosie!

I think your approach in `converters.py` goes in the right direction but needs a bit more care (see below). Sorry for the ticket not being clear enough on flattening of dictionaries in the JSON structure, and for me giving you a wrong idea about the best approach to testing this. Ping me on IRC if you have questions about the comments.

Cheers,
Vasily

https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py#n...
cms/converters.py:122: def parse_json(json_data, parent_key='', sep='_'):
This was not very clear from the ticket, but actually this postprocessing was not required by it. I've checked with Julian and updated the ticket to make it clearer.

https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py#n...
cms/converters.py:137: data = data.replace('<!--', '')
We need to be more careful here: this line (together with the following one) will strip all comment tags in the whole document. That is probably not what we want.

A safer approach would be to make sure that the first non-space symbols of data are '<!--' and, if so, take what's between the comment tags and try to parse it as JSON. If this succeeds, we insert the remaining part of the first comment back between the comment tags and return the parsed metadata and updated page content. If parsing fails, we try to parse the content of the first comment as old-style metadata, remove the parsed lines, insert the rest between the comment tags and return. In case the first non-space symbols are not '<!--', we just try to parse everything as JSON, remove what's parsed and return the rest. If JSON parsing fails, we fall back to old-style parsing.
I would probably create another function (let's imagine it's called `parse_metadata`) that does the metadata parsing (first as JSON, then old-style) and returns the metadata as a dict plus the remaining string. Then in `parse_page_content` I would check if the document starts with an HTML comment, and call `parse_metadata` on the content of the first comment if it does, otherwise on the whole page. Does this make sense?

https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s...
File tests/frontmatter_samples/front_matter_json.html (right):

https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s...
tests/frontmatter_samples/front_matter_json.html:2: "title": "Awesome page",
Here we should probably have a bit more diversity of variables. Perhaps one string, one number, a dictionary and a list would be good. However, many variables of the same type are probably excessive.

https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s...
File tests/frontmatter_samples/front_matter_orig.html (right):

https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s...
tests/frontmatter_samples/front_matter_orig.html:2: title = Awesome page
Seems like having two variables here would be enough: one string and one list. Or do you think additional ones add value?

https://codereview.adblockplus.org/29516687/diff/29516688/tests/test_parse_pa...
File tests/test_parse_page_content.py (right):

https://codereview.adblockplus.org/29516687/diff/29516688/tests/test_parse_pa...
tests/test_parse_page_content.py:1: #!/usr/bin/env python2
I think we've discussed this, and I suggested having a separate file for the metadata parsing tests rather than putting the tests into another file. However, at that point I didn't realize the possibility of using the same approach as for most other CMS features: we can make the page use the variables in the content somehow and then check the rendered pages against "correct" expected outputs.
I think this approach would be preferable in this case because we would not add any new testing code, only testing data.
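For readers following along, the feature under discussion is JSON front matter in a leading HTML comment. A minimal sketch of the extraction strategy described above (the function name `extract_front_matter` is hypothetical, not the name used in the patch, and this omits the old-style fallback and line-number preservation discussed later):

```python
import json
import re


def extract_front_matter(data):
    """Sketch: pull JSON metadata out of a leading HTML comment."""
    # If the first non-space characters are '<!--', try the comment body as JSON.
    m = re.search(r'^\s*<!--(.*?)-->(.*)', data, re.S)
    if m:
        comment, tail = m.groups()
        try:
            return json.loads(comment), tail
        except ValueError:
            pass  # no JSON front matter; a real parser would try old-style here
    return {}, data


page = '<!-- {"title": "Awesome page", "tags": ["a", "b"]} -->\n<h1>Body</h1>'
metadata, body = extract_front_matter(page)
# metadata['title'] == 'Awesome page'; body contains the rest of the page
```

This mirrors the two-step flow Vasily outlines: detect the comment first, then hand its content to the metadata parser.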
Hope you guys have a great weekend! :) https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py#n... cms/converters.py:122: def parse_json(json_data, parent_key='', sep='_'): On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > This was not very clear from the ticket, but actually this postprocessing was > not required by the ticket. I've checked it with Julian and updated the ticket > to make it more clear. Done. https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py#n... cms/converters.py:137: data = data.replace('<!--', '') On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > We need to be more careful here: this line (together with the following one) > will strip all comment tags in the whole document. That is probably not what we > want. > > A safer approach would be to make sure that the first non-space symbols of data > are '<!--' and if it is so, we take what's between them and try to parse it as > JSON. If this succeeds, we insert the remaining part of the first comment back > between the comment tags and return the parsed metadata and updated page > content. If parsing fails, we try to parse the content of the first comment as > old-style metadata, remove parsed lines, insert the rest between the comment > tags and return. > In case the first non-space symbols are not '<!--', just try to parse everything > as JSON, remove what's parsed and return the rest. If JSON parsing failed, we > fall back to old-style parsing. > > I would probably create another function (let's imagine it's called > `parse_metadata`) that does the metadata parsing (first as JSON, then old-style) > and returns metadata as dict + remaining string and then in `parse_page_content` > I would check if the document starts with an HTML comment and then call > `parse_metadata` on the content of the first comment, otherwise on the whole > page. 
> > Does this make sense? Done. https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... File tests/frontmatter_samples/front_matter_json.html (right): https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... tests/frontmatter_samples/front_matter_json.html:2: "title": "Awesome page", On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > Here we should probably have a bit more diversity of variables. Perhaps one > string, one number, a dictionary and a list would be good. However, many > variables of the same type are probably excessive. Integrated this test with test_site and expected_output. Also updated the variables as suggested. https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... tests/frontmatter_samples/front_matter_json.html:2: "title": "Awesome page", On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > Here we should probably have a bit more diversity of variables. Perhaps one > string, one number, a dictionary and a list would be good. However, many > variables of the same type are probably excessive. Done. https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... File tests/frontmatter_samples/front_matter_orig.html (right): https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... tests/frontmatter_samples/front_matter_orig.html:2: title = Awesome page On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > Seems like having two variables here would be enough: one string and one list. > Or do you think additional ones add value? Sounds good. Also, integrated this test with test_site and expected_output. https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... tests/frontmatter_samples/front_matter_orig.html:2: title = Awesome page On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > Seems like having two variables here would be enough: one string and one list. > Or do you think additional ones add value? Done. 
https://codereview.adblockplus.org/29516687/diff/29516688/tests/test_parse_pa... File tests/test_parse_page_content.py (right): https://codereview.adblockplus.org/29516687/diff/29516688/tests/test_parse_pa... tests/test_parse_page_content.py:1: #!/usr/bin/env python2 On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > I think we've discussed this and I have suggested to have a separate file for > the metadata parsing tests rather than putting the tests into another file. > However, at that point I didn't realize the possibility of using the same > approach as for most other CMS features: we can make the page use the variables > in the content somehow and then check the rendered pages against "correct" > expected outputs. I think this approach would be preferable in this case because > we would not add any new testing code, only testing data. Done.
Hi Rosie,

This looks pretty good! I have no comments on the tests, all is good there. In the main code there are a couple of places where I thought things could be simplified a bit; see below if you agree.

Cheers,
Vasily

https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n...
cms/converters.py:122: def parse_metadata(page, data):
There's an implicit duplication here, where we manually assign the `page` key in the returned dictionary. Something like this:

    def parse_metadata(page, data):
        metadata = {'page': page}
        try:
            decoder = json.JSONDecoder()
            json_data, index = decoder.raw_decode(data)
            metadata.update(json_data)
            page_data = data[index:].strip()
        except ValueError:
            lines = data.splitlines(True)
            for i, line in enumerate(lines):
                if not re.search(r'^\s*[\w\-]+\s*=', line):
                    break
                name, value = line.split('=', 1)
                value = value.strip()
                if value.startswith('[') and value.endswith(']'):
                    value = [element.strip()
                             for element in value[1:-1].split(',')]
                lines[i] = ''
                metadata[name.strip()] = value
            page_data = ''.join(lines)
        return metadata, page_data

We just reuse the same initial `metadata` dictionary, so in case we need to add or change default metadata variables we can do it in one place, and we only return in one place to make the flow a bit more obvious.

What do you think about this version?

https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n...
cms/converters.py:148: start_index = re.search('<!--', data).end()
I think regular expressions are a bit of an overkill here. You can just use `data.index('<!--') + 4` as `start_index` and `data.index('-->')` as `end_index`. Although this creates more magical constants and repetition, so perhaps it would make sense to move the comment start and end strings into constants, something like `COMMENT_START` and `COMMENT_END`.
In the end it would be something like this (I also took the liberty to restructure it a bit to have only one return):

    def parse_page_content(page, data):
        """Separate page content into metadata (dict) and body text (str)"""
        # If metadata is in a comment block, extract it
        if data.lstrip().startswith(COMMENT_START):
            start_index = data.index(COMMENT_START) + len(COMMENT_START)
            end_index = data.index(COMMENT_END)
            comment = data[start_index:end_index]
            page_tail = data[end_index + len(COMMENT_END):]
            metadata, comment_data = parse_metadata(page, comment.strip())
            page_data = '{}\n{}\n{}\n\n{}'.format(
                COMMENT_START, comment_data, COMMENT_END, page_tail.strip()
            )
        else:
            metadata, page_data = parse_metadata(page, data.strip())
        return metadata, page_data

I think it would make sense to replace the regular expressions with a simple `.index(...)`, but feel free to incorporate any of the other changes too.
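The key trick in the `parse_metadata` suggestion above is `json.JSONDecoder.raw_decode`, which parses a JSON value at the start of a string and also returns the index where parsing stopped, so the remainder can be kept as the page body:

```python
import json

decoder = json.JSONDecoder()
data = '{"title": "Awesome page"}\n\nRemaining page body'

# raw_decode returns the parsed value plus the index where parsing stopped,
# so everything after the JSON prefix can be kept as the page body.
metadata, index = decoder.raw_decode(data)
body = data[index:].strip()
# metadata == {'title': 'Awesome page'}; body == 'Remaining page body'
```

Note that `raw_decode` raises ValueError when the string starts with whitespace, which is why the suggested code strips its input first (a point that comes up again later in this review).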
https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n... cms/converters.py:122: def parse_metadata(page, data): On 2017/08/21 18:19:50, Vasily Kuznetsov wrote: > There's an implicit duplication here, where we manually assign the `page` key in > the returned dictionary. Something like this: > > def parse_metadata(page, data): > metadata = {'page': page} > try: > decoder = json.JSONDecoder() > json_data, index = decoder.raw_decode(data) > metadata.update(json_data) > page_data = data[index:].strip() > except ValueError: > lines = data.splitlines(True) > for i, line in enumerate(lines): > if not re.search(r'^\s*[\w\-]+\s*=', line): > break > name, value = line.split('=', 1) > value = value.strip() > if value.startswith('[') and value.endswith(']'): > value = [element.strip() for element in > value[1:-1].split(',')] > lines[i] = '' > metadata[name.strip()] = value > page_data = ''.join(lines) > return metadata, page_data > > We just reuse the same initial `metadata` dictionary, so in case we need to add > or change default metadata variables we can do it in one place and then we only > return in one place to make the flow a bit more obvious. > > What do you think about this version? Looks good. :) https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n... cms/converters.py:122: def parse_metadata(page, data): On 2017/08/21 18:19:50, Vasily Kuznetsov wrote: > There's an implicit duplication here, where we manually assign the `page` key in > the returned dictionary. 
Something like this: > > def parse_metadata(page, data): > metadata = {'page': page} > try: > decoder = json.JSONDecoder() > json_data, index = decoder.raw_decode(data) > metadata.update(json_data) > page_data = data[index:].strip() > except ValueError: > lines = data.splitlines(True) > for i, line in enumerate(lines): > if not re.search(r'^\s*[\w\-]+\s*=', line): > break > name, value = line.split('=', 1) > value = value.strip() > if value.startswith('[') and value.endswith(']'): > value = [element.strip() for element in > value[1:-1].split(',')] > lines[i] = '' > metadata[name.strip()] = value > page_data = ''.join(lines) > return metadata, page_data > > We just reuse the same initial `metadata` dictionary, so in case we need to add > or change default metadata variables we can do it in one place and then we only > return in one place to make the flow a bit more obvious. > > What do you think about this version? Done. https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n... cms/converters.py:148: start_index = re.search('<!--', data).end() On 2017/08/21 18:19:50, Vasily Kuznetsov wrote: > I think regular expressions are a bit of an overkill here. You can just use > `data.index('<!--') + 4` as `start_index` and `data.index('-->')` as > `end_index`. Although, this creates more magical constants and repetition, so > perhaps it would make sense to move the comment start and end strings into > constants, something like `COMMENT_START` and `COMMENT_END`. 
> > In the end it would be something like this (I also took the liberty to > restructure it a bit to have only one return): > > def parse_page_content(page, data): > """Separate page content into metadata (dict) and body text (str)""" > # If metadata is in a comment block, extract it > if data.lstrip().startswith(COMMENT_START): > start_index = data.index(COMMENT_START) + len(COMMENT_START) > end_index = data.index(COMMENT_END) > comment = data[start_index:end_index] > page_tail = data[end_index + len(COMMENT_END):] > metadata, comment_data = parse_metadata(page, comment.strip()) > page_data = '{}\n{}\n{}\n\n{}'.format( > COMMENT_START, comment_data, COMMENT_END, page_tail.strip() > ) > else: > metadata, page_data = parse_metadata(page, data.strip()) > return metadata, page_data > > I think it would make sense to replace the regular expressions with a simple > `.index(...)` but feel free to incorporate any of the other changes too. Yeah, that looks cleaner and avoids regular expressions. Is it necessary to make the strings constants? It seems like the added complexity might not be worth it. https://stackoverflow.com/questions/2682745/how-do-i-create-a-constant-in-python Maybe just use `comment_start` and `comment_end` as regular variables, but don't modify them?
LGTM https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n... cms/converters.py:148: start_index = re.search('<!--', data).end() On 2017/08/23 18:13:05, rosie wrote: > On 2017/08/21 18:19:50, Vasily Kuznetsov wrote: > > I think regular expressions are a bit of an overkill here. You can just use > > `data.index('<!--') + 4` as `start_index` and `data.index('-->')` as > > `end_index`. Although, this creates more magical constants and repetition, so > > perhaps it would make sense to move the comment start and end strings into > > constants, something like `COMMENT_START` and `COMMENT_END`. > > > > In the end it would be something like this (I also took the liberty to > > restructure it a bit to have only one return): > > > > def parse_page_content(page, data): > > """Separate page content into metadata (dict) and body text (str)""" > > # If metadata is in a comment block, extract it > > if data.lstrip().startswith(COMMENT_START): > > start_index = data.index(COMMENT_START) + len(COMMENT_START) > > end_index = data.index(COMMENT_END) > > comment = data[start_index:end_index] > > page_tail = data[end_index + len(COMMENT_END):] > > metadata, comment_data = parse_metadata(page, comment.strip()) > > page_data = '{}\n{}\n{}\n\n{}'.format( > > COMMENT_START, comment_data, COMMENT_END, page_tail.strip() > > ) > > else: > > metadata, page_data = parse_metadata(page, data.strip()) > > return metadata, page_data > > > > I think it would make sense to replace the regular expressions with a simple > > `.index(...)` but feel free to incorporate any of the other changes too. > > Yeah, that looks cleaner and avoids regular expressions. Is it necessary to make > the strings constants? It seems like the added complexity might not be worth it. 
> > https://stackoverflow.com/questions/2682745/how-do-i-create-a-constant-in-python
> >
> > Maybe just use `comment_start` and `comment_end` as regular variables, but don't
> > modify them?

There's not much added complexity to the constants, just capitalize the names and move them out of the method. There's also merit to keeping them inside, though, since they are not needed anywhere else. All in all, your approach is good.
On 2017/08/25 10:12:28, Vasily Kuznetsov wrote: > LGTM > > https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py > File cms/converters.py (right): > > https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n... > cms/converters.py:148: start_index = re.search('<!--', data).end() > On 2017/08/23 18:13:05, rosie wrote: > > On 2017/08/21 18:19:50, Vasily Kuznetsov wrote: > > > I think regular expressions are a bit of an overkill here. You can just use > > > `data.index('<!--') + 4` as `start_index` and `data.index('-->')` as > > > `end_index`. Although, this creates more magical constants and repetition, > so > > > perhaps it would make sense to move the comment start and end strings into > > > constants, something like `COMMENT_START` and `COMMENT_END`. > > > > > > In the end it would be something like this (I also took the liberty to > > > restructure it a bit to have only one return): > > > > > > def parse_page_content(page, data): > > > """Separate page content into metadata (dict) and body text (str)""" > > > # If metadata is in a comment block, extract it > > > if data.lstrip().startswith(COMMENT_START): > > > start_index = data.index(COMMENT_START) + len(COMMENT_START) > > > end_index = data.index(COMMENT_END) > > > comment = data[start_index:end_index] > > > page_tail = data[end_index + len(COMMENT_END):] > > > metadata, comment_data = parse_metadata(page, comment.strip()) > > > page_data = '{}\n{}\n{}\n\n{}'.format( > > > COMMENT_START, comment_data, COMMENT_END, page_tail.strip() > > > ) > > > else: > > > metadata, page_data = parse_metadata(page, data.strip()) > > > return metadata, page_data > > > > > > I think it would make sense to replace the regular expressions with a simple > > > `.index(...)` but feel free to incorporate any of the other changes too. > > > > Yeah, that looks cleaner and avoids regular expressions. Is it necessary to > make > > the strings constants? 
It seems like the added complexity might not be worth > it. > > > > > https://stackoverflow.com/questions/2682745/how-do-i-create-a-constant-in-python > > > > Maybe just use `comment_start` and `comment_end` as regular variables, but > don't > > modify them? > > There's not much added complexity to the constants, just capitalize the names > and move them out of the method. There's also merit to keeping them inside, > though, since they are not needed anywhere. All in all, your approach is good. Hey Rosie, Thanks a lot for doing this, also LGTM
https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:38:
Adding this blank line is unrelated.

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:128: page_data = data[index:].strip()
Note that when parsing the legacy non-JSON format, we blank out the parsed lines but preserve their line endings. This is necessary for line numbers in Jinja2 tracebacks to still match. However, here you strip any newlines that might be used to format the JSON object, and only preserve any content that follows.

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:129: except ValueError:
In case the data can be interpreted as JSON but results in something other than a dict, dict.update() will raise a TypeError instead of a ValueError.

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:132: if not re.search(r'^\s*[\w\-]+\s*=', line):
There is some redundancy between the regular expression and the logic below, i.e. the fact that there has to be an equals sign and that spaces around the name are ignored is duplicated in the regular expression used for validation and in the logic that actually splits the name-value pair. Instead, we could validate and extract name and value in one go, avoiding the duplicated logic:

    m = re.search(r'^\s*([\w\-]+)\s*=\s*(.*?)\s*$', line)
    if not m:
        break
    name, value = m.groups()
    if value.startswith('[') and value.endswith(']'):
        value = [element.strip() for element in value[1:-1].split(',')]
    lines[i] = ''
    metadata[name] = value

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:147: comment_start = '<!--'
It seems much simpler to use a regular expression here:

    m = re.search(r'^\s*<!--(.*?)-->(.*)', data, re.S)
    if m:
        comment, page_tail = m.groups()

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:154: metadata, comment_data = parse_metadata(page, comment.strip())
Does stripping the comment have any effect here? It seems parse_metadata() already ignores trailing/leading spaces, or am I missing something?
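Sebastian's point about the exception type can be made concrete with a short sketch (illustrative only, not code from the patch): front matter that is valid JSON but not an object slips past `raw_decode` and only fails inside `dict.update`, with a different exception than the one being caught:

```python
import json

decoder = json.JSONDecoder()
# '42' is perfectly valid JSON, but it is a number, not an object.
value, index = decoder.raw_decode('42\nrest of the page')

metadata = {'page': 'index'}
try:
    metadata.update(value)  # not a mapping: raises TypeError, not ValueError
    error = None
except TypeError as exc:
    error = exc
```

An `except ValueError:` around this code would therefore let the TypeError propagate instead of falling back to old-style parsing.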
Thanks for those points, Sebastian! Rosie, there are a few more things to consider if we need to preserve the line breaks. See below.

Cheers,
Vasily

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:128: page_data = data[index:].strip()
On 2017/08/29 22:49:39, Sebastian Noack wrote:
> Note that when parsing the legacy non-JSON format, we blank out empty lines but
> preserve any line endings. This is necessary for line numbers in jinja2
> tracebacks to still match. However, here you strip any new lines that might be
> used to format the JSON object, and only preserve any content that follows.

This consideration is rather non-obvious; for example, I didn't know about it. I'd say there should be a comment somewhere (probably in this function) explaining the rationale for preserving the line breaks.

We probably also want to preserve the number of line breaks in the JSON case. And the tests should probably have some newlines before and in the comment and make sure that they stay in place.

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:147: comment_start = '<!--'
On 2017/08/29 22:49:39, Sebastian Noack wrote:
> It seems much simpler to use a regular expression here:
>
>     m = re.search(r'^\s*<!--(.*?)-->(.*)', data, re.S)
>     if m:
>         comment, page_tail = m.groups()

I think it was my advice to not use regexps here because I didn't know about non-greedy matching (oops :/, now I do). I agree that this approach is better, but I think we need to save the part before the comment as well if we want to be careful about not eating newlines:

    m = re.search(r'^(\s*)<!--(.*?)-->(.*)', data, re.S)
    if m:
        leading_space, comment, page_tail = m.groups()

and then reinsert it back before the comment (in which we replace the metadata with just the correct number of newlines).
https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:154: metadata, comment_data = parse_metadata(page, comment.strip())
On 2017/08/29 22:49:39, Sebastian Noack wrote:
> Does stripping the comment have any effect here? It seems parse_metadata() is
> ignoring trailing/leading spaces, or do I miss something?

JSON parsing fails on leading space, but if we want to preserve the newlines, we will need to be more careful here. It probably makes sense to do the stripping inside of `parse_metadata()` and make sure that the returned `comment_data` has the same number of lines as the original comment that was passed in.
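The line-number concern above can be sketched with a tiny helper (hypothetical name, assuming the convention that parsed metadata is replaced by blank lines so Jinja2 traceback line numbers keep matching the source file):

```python
def blank_out_metadata(metadata_text):
    # Keep one line break per original line so that everything after the
    # metadata block stays at its original line number in tracebacks.
    return '\n' * metadata_text.count('\n')


comment = 'title = Awesome page\ntags = [a, b]\n'
blanked = blank_out_metadata(comment)
assert blanked.count('\n') == comment.count('\n')
```

Stripping the comment before parsing and then returning fewer lines than went in would break exactly this invariant, which is why the stripping should happen inside `parse_metadata()` where the line count can be restored.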
Hey guys! :) It seems this functionality was moved to utils.py, so most of the changes are there. A couple of the test pages were also updated to match the new functionality. https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:38: On 2017/08/29 22:49:39, Sebastian Noack wrote: > Adding this blank line is unrelated. True. My linter was showing a warning because there were not two blank lines surrounding the function. Should I change it back? https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:128: page_data = data[index:].strip() On 2017/08/30 10:50:00, Vasily Kuznetsov wrote: > On 2017/08/29 22:49:39, Sebastian Noack wrote: > > Note that when parsing the legacy non-JSON format, we blank out empty lines > but > > preserve any line endings. This is necessary for line numbers in jinja2 > > tracebacks to still match. However, here you strip any new lines that might be > > used to format the JSON object, and only preserve any content that follows. > > This consideration is rather non-obvious, for example I didn't know about it. > I'd say there should be a comment somewhere (probably in this function) > explaining the rationale for preserving the line breaks. > > We probably also want to preserve the number line breaks in JSON case. And the > tests should probably have some newlines before and in the comment and make sure > that they stay in place. (This function was moved to utils.py.) Now, the line numbers are preserved, but if there is any text in a comment block along with the metadata, it will now also be hidden. It seems that this was the original intent, reading over the issue. The test cases have been updated accordingly. https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... 
cms/converters.py:129: except ValueError: On 2017/08/29 22:49:39, Sebastian Noack wrote: > In case the data can be interpreted as JSON, but result into something else than > a dict, a TypeError will be thrown instead of a ValueError, by dict.update(). The check to make sure a dict is returned now happens on line 78. https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:132: if not re.search(r'^\s*[\w\-]+\s*=', line): On 2017/08/29 22:49:39, Sebastian Noack wrote: > There is some redundancy in the regular expression and the logic below, i.e. the > fact that there has to be an equals sign and that spaces around the name are > ignored, is duplicated in the regular expression used for validation, and in the > logic that actually splits the name-value-pair. Instead, we could validate and > extract name and value in one go, avoiding duplicated logic: > > m = re.search(r'^\s*([\w\-]+)\s*=\s*(.*?)\s*$', line) > if not m: > break > name, value = m.groups() > if value.startswith('[') and value.endswith(']'): > value = [element.strip() for element in value[1:-1].split(',')] > lines[i] = '' > metadata[name] = value Done. https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:147: comment_start = '<!--' On 2017/08/30 10:50:00, Vasily Kuznetsov wrote: > On 2017/08/29 22:49:39, Sebastian Noack wrote: > > It seems much simpler to use a regular expression here: > > > > m = re.search(r'^\s*<!--(.*?)-->(.*)', data, re.S) > > if m: > > comment, page_tail = m.groups() > > I think it was my advice to not use regexps here because I didn't know about the > non-greedy matching (oops :/, now I do). 
I agree that this approach is better, > but I think we need to save the part before the comment as well if we want to be > careful about not eating newlines: > > m = re.search(r'^\(s*)<!--(.*?)-->(.*)', data, re.S) > if m: > leading_space, comment, page_tail = m.groups() > > and then reinsert it back before the comment (in which we replace the metadata > with just the correct number of newlines). Done. https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:154: metadata, comment_data = parse_metadata(page, comment.strip()) On 2017/08/30 10:50:01, Vasily Kuznetsov wrote: > On 2017/08/29 22:49:39, Sebastian Noack wrote: > > Does stripping the comment have any effect here? It seems parse_metadata() is > > ignoring trailing/leading spaces, or do I miss something? > > JSON parsing fails on leading space, but if we want to preserve the newlines, we > will need to be more careful here. Probably makes sense to do the stripping > inside of `parse_metadata()` and make sure that the returned `comment_data` has > the same number of lines as the original comment that what was passed in. Done. https://codereview.adblockplus.org/29516687/diff/29525608/tests/expected_outp... File tests/expected_output/en/sitemap (right): https://codereview.adblockplus.org/29516687/diff/29525608/tests/expected_outp... tests/expected_output/en/sitemap:6: <ul><li>title : filter </li><li>title : get_page_url </li><li>title : global </li><li>title : translate </li><li>title : metadata_old_comment </li><li>title : metadata_json </li><li>title : metadata_json_comment </li><li>title : sitemap </li> It seems there was a bug in sitemap.tmpl which was not properly ordering the pages alphabetically. That fix is also included in the next patch, and this test case is fixed to match the behavior. https://codereview.adblockplus.org/29516687/diff/29732555/tests/test_site/pag... 
File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29732555/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:7: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} Added the page attribute so that sorting works properly.
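Jinja2's `sort(attribute='page')` filter behaves much like Python's built-in `sorted()` with a key function. A rough stand-alone equivalent, assuming each metadata entry is a plain dict with a 'page' key (the sample data is invented for illustration):

```python
from operator import itemgetter

# Three pages in arbitrary order, as get_pages_metadata() might
# return them (hypothetical data).
pages = [{'page': 'sitemap'}, {'page': 'filter'}, {'page': 'global'}]

# Equivalent of the Jinja filter sort(attribute='page'):
ordered = sorted(pages, key=itemgetter('page'))
# ordered by 'page': filter, global, sitemap
```

Without the `attribute` argument the filter sorts the dicts themselves, which gives an essentially arbitrary and unstable order, hence the fix to the sitemap template.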
https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:38: On 2018/03/26 02:32:21, rosie wrote: > On 2017/08/29 22:49:39, Sebastian Noack wrote: > > Adding this blank line is unrelated. > > True. My linter was showing a warning because there were not two blank lines > surrounding the function. Should I change it back? Strictly, there should be two blank lines surrounding each function. Flake8 will complain about this as well; however, if you run flake8 through tox, our configuration ignores this error for this file. That doesn't mean that we disagree with the linter error, but merely that we didn't care to change all of our code at once when we started using flake8. Anyway, it seems this is no longer relevant, as the latest patch doesn't touch this file at all anymore. https://codereview.adblockplus.org/29516687/diff/29525608/tests/expected_outp... File tests/expected_output/en/sitemap (right): https://codereview.adblockplus.org/29516687/diff/29525608/tests/expected_outp... tests/expected_output/en/sitemap:6: <ul><li>title : filter </li><li>title : get_page_url </li><li>title : global </li><li>title : translate </li><li>title : metadata_old_comment </li><li>title : metadata_json </li><li>title : metadata_json_comment </li><li>title : sitemap </li> On 2018/03/26 02:32:21, rosie wrote: > It seems there was a bug in sitemap.tmpl which was not properly ordering the > pages alphabetically. That fix is also included in the next patch, and this test > case is fixed to match the behavior. Acknowledged. https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... File .pytest_cache/v/cache/lastfailed (right): https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... .pytest_cache/v/cache/lastfailed:1: { This file should not be committed.
https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... File .pytest_cache/v/cache/nodeids (right): https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... .pytest_cache/v/cache/nodeids:1: [ This file should not be committed. https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] I just noticed one more edge case, if there is an HTML comment at the beginning of the file, but it doesn't contain any metadata, we would still strip the comment. This could be accounted for with following addition to the logic here: if length > 0: cutoff = m.end() if m else length source = '\n' * source.count('\n', 0, cutoff) + source[cutoff:] return metadata, source https://codereview.adblockplus.org/29516687/diff/29732555/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29732555/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:7: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} On 2018/03/26 02:32:21, rosie wrote: > Added the page attribute for sorting functionality to work properly. Acknowledged.
Hi Rosie, I don't have much to add to Sebastian's comments, just a couple of inline comments (see below). Cheers, Vasily https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... File .pytest_cache/v/cache/lastfailed (right): https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... .pytest_cache/v/cache/lastfailed:1: { On 2018/03/26 02:57:48, Sebastian Noack wrote: > This file should not be committed. Indeed. Adding .pytest_cache to ignores is probably the best approach to get the two files out of the way. https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/03/26 02:57:48, Sebastian Noack wrote: > I just noticed one more edge case, if there is an HTML comment at the beginning > of the file, but it doesn't contain any metadata, we would still strip the > comment. This could be accounted for with following addition to the logic here: > > if length > 0: > cutoff = m.end() if m else length > source = '\n' * source.count('\n', 0, cutoff) + source[cutoff:] > > return metadata, source Good catch, Sebastian! It probably makes sense to add a test for this too, to avoid future regressions.
https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/03/26 02:57:48, Sebastian Noack wrote: > I just noticed one more edge case, if there is an HTML comment at the beginning > of the file, but it doesn't contain any metadata, we would still strip the > comment. This could be accounted for with following addition to the logic here: > > if length > 0: > cutoff = m.end() if m else length > source = '\n' * source.count('\n', 0, cutoff) + source[cutoff:] > > return metadata, source Perhaps a test case should be added as well to cover this?
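Sebastian's edge-case fix can be sketched as a self-contained function (names are illustrative, not the actual cms/utils.py code): replacing the stripped comment with the same number of newlines keeps jinja2 traceback line numbers stable, even when the leading comment contains no metadata at all.

```python
import re

def strip_metadata_comment(source):
    """Remove a leading HTML comment from a page but preserve the
    original line count, so later jinja2 tracebacks still point at
    the right source lines. Illustrative sketch only."""
    m = re.search(r'^\s*<!--(.*?)-->', source, re.S)
    if m:
        cutoff = m.end()
        # Replace the removed prefix with the same number of newlines.
        source = '\n' * source.count('\n', 0, cutoff) + source[cutoff:]
    return source

page = '<!--\nno metadata here\n-->\nactual content'
stripped = strip_metadata_comment(page)
# the comment is gone, but the number of lines is unchanged
```

This is the behavior the suggested test case would pin down: a leading comment without metadata is still stripped, and the remaining content keeps its line positions.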
Thanks for the feedback! I implemented the change, but ran into some trouble making the test case (see below). https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... File .pytest_cache/v/cache/lastfailed (right): https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... .pytest_cache/v/cache/lastfailed:1: { On 2018/03/26 09:24:28, Vasily Kuznetsov wrote: > On 2018/03/26 02:57:48, Sebastian Noack wrote: > > This file should not be committed. > > Indeed. Adding .pytest_cache to ignores is probably the best approach to get the > two files out of the way. Done. https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... File .pytest_cache/v/cache/nodeids (right): https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... .pytest_cache/v/cache/nodeids:1: [ On 2018/03/26 02:57:48, Sebastian Noack wrote: > This file should not be committed. Done. https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/03/28 03:01:36, Jon Sonesen wrote: > On 2018/03/26 02:57:48, Sebastian Noack wrote: > > I just noticed one more edge case, if there is an HTML comment at the > beginning > > of the file, but it doesn't contain any metadata, we would still strip the > > comment. This could be accounted for with following addition to the logic > here: > > > > if length > 0: > > cutoff = m.end() if m else length > > source = '\n' * source.count('\n', 0, cutoff) + source[cutoff:] > > > > return metadata, source > > Perhaps a test case should be added as well to cover this? I've implemented Sebastian's suggestion and it works as expected, but while trying to implement a test case for this, I ran into two issues. 
1) A file with no specified template uses the default.tmpl to generate the page, but the default.tmpl file is blank, so a blank page is generated. I'm not sure if this is the intended behavior for this test site. 2) The sitemap.tmpl file seems to only include pages that have some metadata when creating the sitemap output, so this new test case would never show up on the sitemap. This may be out of scope for the current issue.
https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/03/31 21:09:28, rosie wrote: > I've implemented Sebastian's suggestion and it works as expected, but while > trying to implement a test case for this, I ran into two issues. > > 1) A file with no specified template uses the default.tmpl to generate the page, > but the default.tmpl file is blank, so a blank page is generated. I'm not sure > if this is the intended behavior for this test site. So default.tmpl is empty, and empty.tmpl is not. That looks like a mistake, or at very least it's extremely confusing. But I agree that cleaning this up seems out of scope of this change. > 2) The sitemap.tmpl file seems to only include pages that have some metadata > when creating the sitemap output, so this new test case would never show up on > the sitemap. This may be out of scope for the current issue. Apparently the get_pages_metadata() function called in sitemap.tmpl explicitly ignores any pages that don't define any custom metadata: https://hg.adblockplus.org/cms/file/b683e321a9d0/cms/converters.py#l477 I cannot tell if that is intended or not, but at very least the comment there gives no sufficient explanation. https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore File .hgignore (right): https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 .hgignore:11: .pytest_cache If you modify .hgignore, please also add the changes to .gitignore (since this repository is mirrored to GitHub). Also directories that are to be ignored (only) under the project root are listed above using the regexp syntax. 
https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py#newcode91 cms/utils.py:91: # Need to preserve line numbers for jinja2 tracebacks This comment is slightly out of place here. Perhaps move it inside the block. Otherwise it looks like that the check in the next line is related to preserving line numbers too (which it is not).
Hi Rosie, Sorry about the roadblocks, fixing them! Cheers, Vasily https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/03/31 22:07:35, Sebastian Noack wrote: > On 2018/03/31 21:09:28, rosie wrote: > > I've implemented Sebastian's suggestion and it works as expected, but while > > trying to implement a test case for this, I ran into two issues. > > > > 1) A file with no specified template uses the default.tmpl to generate the > page, > > but the default.tmpl file is blank, so a blank page is generated. I'm not sure > > if this is the intended behavior for this test site. > > So default.tmpl is empty, and empty.tmpl is not. That looks like a mistake, or > at very least it's extremely confusing. But I agree that cleaning this up seems > out of scope of this change. I suppose we can say that it ended up being so for historical reasons :) Anyway, apart from being confusing this seems to make testing of this case unnecessarily complicated. I think the best approach would be to swap the contents of `default.html` and `empty.tmpl` and adjust the rest of the test suite to make sure the output doesn't change. https://issues.adblockplus.org/ticket/6546 (and corresponding review https://codereview.adblockplus.org/29741581) do this, so perhaps we can just land that before this change. > > 2) The sitemap.tmpl file seems to only include pages that have some metadata > > when creating the sitemap output, so this new test case would never show up on > > the sitemap. This may be out of scope for the current issue. 
> > Apparently the get_pages_metadata() function called in sitemap.tmpl explicitly > ignores any pages that don't define any custom metadata: > https://hg.adblockplus.org/cms/file/b683e321a9d0/cms/converters.py#l477 > I cannot tell if that is intended or not, but at very least the comment there > gives no sufficient explanation. This looks like a bug to me. I've raised it in the review (https://codereview.adblockplus.org/29472555/) but then it seems we've forgotten about it. I now created https://issues.adblockplus.org/ticket/6545 to fix this, but as far as I can see this doesn't block this issue.
On 2018/03/31 22:07:36, Sebastian Noack wrote: > https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py > File cms/utils.py (right): > > https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 > cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + > source[cutoff:] > On 2018/03/31 21:09:28, rosie wrote: > > I've implemented Sebastian's suggestion and it works as expected, but while > > trying to implement a test case for this, I ran into two issues. > > > > 1) A file with no specified template uses the default.tmpl to generate the > page, > > but the default.tmpl file is blank, so a blank page is generated. I'm not sure > > if this is the intended behavior for this test site. > > So default.tmpl is empty, and empty.tmpl is not. That looks like a mistake, or > at very least it's extremely confusing. But I agree that cleaning this up seems > out of scope of this change. > > > 2) The sitemap.tmpl file seems to only include pages that have some metadata > > when creating the sitemap output, so this new test case would never show up on > > the sitemap. This may be out of scope for the current issue. > > Apparently the get_pages_metadata() function called in sitemap.tmpl explicitly > ignores any pages that don't define any custom metadata: > https://hg.adblockplus.org/cms/file/b683e321a9d0/cms/converters.py#l477 > I cannot tell if that is intended or not, but at very least the comment there > gives no sufficient explanation. > > https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore > File .hgignore (right): > > https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 > .hgignore:11: .pytest_cache > If you modify .hgignore, please also add the changes to .gitignore (since this > repository is mirrored to GitHub). > > Also directories that are to be ignored (only) under the project root are listed > above using the regexp syntax. 
> > https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py > File cms/utils.py (right): > > https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py#newcode91 > cms/utils.py:91: # Need to preserve line numbers for jinja2 tracebacks > This comment is slightly out of place here. Perhaps move it inside the block. > Otherwise it looks like that the check in the next line is related to preserving > line numbers too (which it is not). I'm not sure why it is not clear that we ignore pages without metadata. The function is named get_pages_metadata().
On 2018/04/03 15:48:02, Jon Sonesen wrote: > On 2018/03/31 22:07:36, Sebastian Noack wrote: > > https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py > > File cms/utils.py (right): > > > > > https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 > > cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + > > source[cutoff:] > > On 2018/03/31 21:09:28, rosie wrote: > > > I've implemented Sebastian's suggestion and it works as expected, but while > > > trying to implement a test case for this, I ran into two issues. > > > > > > 1) A file with no specified template uses the default.tmpl to generate the > > page, > > > but the default.tmpl file is blank, so a blank page is generated. I'm not > sure > > > if this is the intended behavior for this test site. > > > > So default.tmpl is empty, and empty.tmpl is not. That looks like a mistake, or > > at very least it's extremely confusing. But I agree that cleaning this up > seems > > out of scope of this change. > > > > > 2) The sitemap.tmpl file seems to only include pages that have some metadata > > > when creating the sitemap output, so this new test case would never show up > on > > > the sitemap. This may be out of scope for the current issue. > > > > Apparently the get_pages_metadata() function called in sitemap.tmpl explicitly > > ignores any pages that don't define any custom metadata: > > https://hg.adblockplus.org/cms/file/b683e321a9d0/cms/converters.py#l477 > > I cannot tell if that is intended or not, but at very least the comment there > > gives no sufficient explanation. > > > > https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore > > File .hgignore (right): > > > > https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 > > .hgignore:11: .pytest_cache > > If you modify .hgignore, please also add the changes to .gitignore (since this > > repository is mirrored to GitHub). 
> > > > Also directories that are to be ignored (only) under the project root are > listed > > above using the regexp syntax. > > > > https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py > > File cms/utils.py (right): > > > > > https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py#newcode91 > > cms/utils.py:91: # Need to preserve line numbers for jinja2 tracebacks > > This comment is slightly out of place here. Perhaps move it inside the block. > > Otherwise it looks like that the check in the next line is related to > preserving > > line numbers too (which it is not). > > I'm not sure why it is not clear that we ignore pages without metadata. The > function is named get pages metadata. So after checking with Julian, he prefers current behavior so I closed #6545.
Please ignore this patch-set! I just noticed that something's wrong with the test cases... I think I had an issue with the merging & rebasing. Going to fix it asap. https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/04/03 14:29:19, Vasily Kuznetsov wrote: > On 2018/03/31 22:07:35, Sebastian Noack wrote: > > On 2018/03/31 21:09:28, rosie wrote: > > > I've implemented Sebastian's suggestion and it works as expected, but while > > > trying to implement a test case for this, I ran into two issues. > > > > > > 1) A file with no specified template uses the default.tmpl to generate the > > page, > > > but the default.tmpl file is blank, so a blank page is generated. I'm not > sure > > > if this is the intended behavior for this test site. > > > > So default.tmpl is empty, and empty.tmpl is not. That looks like a mistake, or > > at very least it's extremely confusing. But I agree that cleaning this up > seems > > out of scope of this change. > > I suppose we can say that it ended up being so for historical reasons :) Anyway, > apart from being confusing this seems to make testing of this case unnecessarily > complicated. I think the best approach would be to swap the contents of > `default.html` and `empty.tmpl` and adjust the rest of the test suite to make > sure the output doesn't change. https://issues.adblockplus.org/ticket/6546 (and > corresponding review https://codereview.adblockplus.org/29741581) do this, so > perhaps we can just land that before this change. > > > > 2) The sitemap.tmpl file seems to only include pages that have some metadata > > > when creating the sitemap output, so this new test case would never show up > on > > > the sitemap. This may be out of scope for the current issue. 
> > > > Apparently the get_pages_metadata() function called in sitemap.tmpl explicitly > > ignores any pages that don't define any custom metadata: > > https://hg.adblockplus.org/cms/file/b683e321a9d0/cms/converters.py#l477 > > I cannot tell if that is intended or not, but at very least the comment there > > gives no sufficient explanation. > > This looks like a bug to me. I've raised it in the review > (https://codereview.adblockplus.org/29472555/) but then it seems we've forgot > about it. I now created https://issues.adblockplus.org/ticket/6545 to fix this, > but as far as I can see this doesn't block this issue. Acknowledged. https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore File .hgignore (right): https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 .hgignore:11: .pytest_cache On 2018/03/31 22:07:35, Sebastian Noack wrote: > If you modify .hgignore, please also add the changes to .gitignore (since this > repository is mirrored to GitHub). > > Also directories that are to be ignored (only) under the project root are listed > above using the regexp syntax. Done. https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py#newcode91 cms/utils.py:91: # Need to preserve line numbers for jinja2 tracebacks On 2018/03/31 22:07:36, Sebastian Noack wrote: > This comment is slightly out of place here. Perhaps move it inside the block. > Otherwise it looks like that the check in the next line is related to preserving > line numbers too (which it is not). Done.
https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore File .hgignore (right): https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 .hgignore:11: .pytest_cache On 2018/04/18 02:35:32, rosie wrote: > On 2018/03/31 22:07:35, Sebastian Noack wrote: > > If you modify .hgignore, please also add the changes to .gitignore (since this > > repository is mirrored to GitHub). > > > > Also directories that are to be ignored (only) under the project root are > listed > > above using the regexp syntax. > > Done. Not quite, however, this has been addressed separately meanwhile: https://hg.adblockplus.org/cms/rev/0ae76e6fb071 So just remove your changes to .hgignore/.gitignore and rebase.
I figured out the issue. My test cases were still using 'template = empty' in order to use (what is now) the default template. That's why the tests were failing. Since Vasily's change, my test pages weren't being generated at all by the server. Now they are using the default template and it seems to be working. Please let me know if you see any other issues! :) https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore File .hgignore (right): https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 .hgignore:11: .pytest_cache On 2018/04/18 05:01:39, Sebastian Noack wrote: > On 2018/04/18 02:35:32, rosie wrote: > > On 2018/03/31 22:07:35, Sebastian Noack wrote: > > > If you modify .hgignore, please also add the changes to .gitignore (since > this > > > repository is mirrored to GitHub). > > > > > > Also directories that are to be ignored (only) under the project root are > > listed > > > above using the regexp syntax. > > > > Done. > > Not quite, however, this has been addressed separately meanwhile: > https://hg.adblockplus.org/cms/rev/0ae76e6fb071 > > So just remove your changes to .hgignore/.gitignore and rebase. Done.
Basically LGTM. I don't care if the remaining nits get addressed. They are quite unimportant. https://codereview.adblockplus.org/29516687/diff/29755796/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29755796/cms/utils.py#newcode95 cms/utils.py:95: return metadata, source Nit: I'd add a blank line above, since the return statement doesn't depend any more on this block than it on the two blocks (separated by blank lines) above. https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:6: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} Nit: The changes here seem unrelated now. Still it makes sense to sort by "page", otherwise the order is somewhat arbitrary. Either way this makes sense regardless of this change, and therefore could be committed separately.
Added the suggested newline, and had a question about the changes in sitemap.tmpl. https://codereview.adblockplus.org/29516687/diff/29755796/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29755796/cms/utils.py#newcode95 cms/utils.py:95: return metadata, source On 2018/04/18 21:12:27, Sebastian Noack wrote: > Nit: I'd add a blank line above, since the return statement doesn't depend any > more on this block than it on the two blocks (separated by blank lines) above. Done. https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:6: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} On 2018/04/18 21:12:27, Sebastian Noack wrote: > Nit: The changes here seem unrelated now. Still it makes sense to sort by > "page", otherwise the order is somewhat arbitrary. Either way this makes sense > regardless of this change, and therefore could be committed separately. Is there any advantage to committing it separately? Is it more transparent, or easier to track changes that way? If so, then no problem, this can be a separate commit.
https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:6: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} On 2018/04/18 22:46:18, rosie wrote: > On 2018/04/18 21:12:27, Sebastian Noack wrote: > > Nit: The changes here seem unrelated now. Still it makes sense to sort by > > "page", otherwise the order is somewhat arbitrary. Either way this makes sense > > regardless of this change, and therefore could be committed separately. > > Is there any advantage to committing it separately? Is it more transparent, or > easier to track changes that way? If so, then no problem, this can be a separate > commit. If this change is unrelated, i.e. can be separated from the main change this review is about, and both changes still have the intended effect without breaking anything if applied individually, then the advantages are: 1. If we ever decide to revert either change, doing so is trivial. 2. When somebody comes across this change in the future, who hasn't been part of this discussion, unrelated changes cause confusion. 3. During code review its more efficient and effective if reviewers don't have to focus on more things than necessary at the same time.
Removed the sitemap sort attribute. I'll submit it as a noissue shortly. https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:6: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} On 2018/04/19 00:22:25, Sebastian Noack wrote: > On 2018/04/18 22:46:18, rosie wrote: > > On 2018/04/18 21:12:27, Sebastian Noack wrote: > > > Nit: The changes here seem unrelated now. Still it makes sense to sort by > > > "page", otherwise the order is somewhat arbitrary. Either way this makes > sense > > > regardless of this change, and therefore could be committed separately. > > > > Is there any advantage to committing it separately? Is it more transparent, or > > easier to track changes that way? If so, then no problem, this can be a > separate > > commit. > > If this change is unrelated, i.e. can be separated from the main change this > review is about, and both changes still have the intended effect without > breaking anything if applied individually, then the advantages are: > > 1. If we ever decide to revert either change, doing so is trivial. > 2. When somebody comes across this change in the future, who hasn't been part of > this discussion, unrelated changes cause confusion. > 3. During code review its more efficient and effective if reviewers don't have > to focus on more things than necessary at the same time. Sounds good. Done.
LGTM
LGTM
LGTM |