Created: Aug. 16, 2017, 12:43 a.m. by rosie
Modified: May 1, 2018, 6:45 p.m.
Base URL: https://hg.adblockplus.org/cms
Visibility: Public

Description: Issue 4488 - Add support for JSON page front matter
Patch Set 1 #
Total comments: 12
Patch Set 2 : Removed JSON postprocessing and integrated the cms tests #
Total comments: 6
Patch Set 3 : Cleaned up duplication, removed unnecessary regex #
Total comments: 18
Patch Set 4 : Preserve lines in metadata, fix sort in sitemap.tmpl #
Total comments: 14
Patch Set 5 : Add support for pages with no metadata #
Total comments: 6
Patch Set 6 : Fixed .gitignore and comment placement #
Patch Set 7 : Test cases no longer use 'template = empty' #
Total comments: 6
Patch Set 8 : Add newline for code clarity #
Patch Set 9 : Removed the sitemap sort attribute #
Messages
Total messages: 29
Hi Rosie!

I think your approach in `converters.py` goes in the right direction but needs a bit more care (see below). Sorry for the ticket not being clear enough on flattening of dictionaries in the JSON structure, and for me giving you a wrong idea about the best approach to testing this. Ping me on IRC if you have questions about the comments.

Cheers,
Vasily

https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py#n...
cms/converters.py:122: def parse_json(json_data, parent_key='', sep='_'):
This was not very clear from the ticket, but actually this postprocessing was not required by it. I've checked with Julian and updated the ticket to make it clearer.

https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py#n...
cms/converters.py:137: data = data.replace('<!--', '')
We need to be more careful here: this line (together with the following one) will strip all comment tags in the whole document. That is probably not what we want.

A safer approach would be to make sure that the first non-space symbols of data are '<!--' and, if so, take what's between the comment tags and try to parse it as JSON. If this succeeds, we insert the remaining part of the first comment back between the comment tags and return the parsed metadata and updated page content. If parsing fails, we try to parse the content of the first comment as old-style metadata, remove the parsed lines, insert the rest between the comment tags and return. In case the first non-space symbols are not '<!--', we just try to parse everything as JSON, remove what's parsed and return the rest. If JSON parsing fails, we fall back to old-style parsing.
I would probably create another function (let's imagine it's called `parse_metadata`) that does the metadata parsing (first as JSON, then old-style) and returns the metadata as a dict plus the remaining string. Then in `parse_page_content` I would check if the document starts with an HTML comment, and call `parse_metadata` on the content of the first comment if it does, otherwise on the whole page. Does this make sense?

https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s...
File tests/frontmatter_samples/front_matter_json.html (right):

https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s...
tests/frontmatter_samples/front_matter_json.html:2: "title": "Awesome page",
Here we should probably have a bit more diversity of variables. Perhaps one string, one number, a dictionary and a list would be good. However, many variables of the same type are probably excessive.

https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s...
File tests/frontmatter_samples/front_matter_orig.html (right):

https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s...
tests/frontmatter_samples/front_matter_orig.html:2: title = Awesome page
Seems like having two variables here would be enough: one string and one list. Or do you think additional ones add value?

https://codereview.adblockplus.org/29516687/diff/29516688/tests/test_parse_pa...
File tests/test_parse_page_content.py (right):

https://codereview.adblockplus.org/29516687/diff/29516688/tests/test_parse_pa...
tests/test_parse_page_content.py:1: #!/usr/bin/env python2
I think we've discussed this, and I suggested having a separate file for the metadata parsing tests rather than putting the tests into another file. However, at that point I didn't realize the possibility of using the same approach as for most other CMS features: we can make the page use the variables in the content somehow and then check the rendered pages against "correct" expected outputs.
I think this approach would be preferable in this case because we would not add any new testing code, only testing data.
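For readers following along, the feature under discussion is JSON front matter in a leading HTML comment. A minimal sketch of the extraction strategy described above (the function name `extract_front_matter` is hypothetical, not the name used in the patch, and this omits the old-style fallback and line-number preservation discussed later):

```python
import json
import re


def extract_front_matter(data):
    """Sketch: pull JSON metadata out of a leading HTML comment."""
    # If the first non-space characters are '<!--', try the comment body as JSON.
    m = re.search(r'^\s*<!--(.*?)-->(.*)', data, re.S)
    if m:
        comment, tail = m.groups()
        try:
            return json.loads(comment), tail
        except ValueError:
            pass  # no JSON front matter; a real parser would try old-style here
    return {}, data


page = '<!-- {"title": "Awesome page", "tags": ["a", "b"]} -->\n<h1>Body</h1>'
metadata, body = extract_front_matter(page)
# metadata['title'] == 'Awesome page'; body contains the rest of the page
```

This mirrors the two-step flow Vasily outlines: detect the comment first, then hand its content to the metadata parser.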
Hope you guys have a great weekend! :) https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py#n... cms/converters.py:122: def parse_json(json_data, parent_key='', sep='_'): On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > This was not very clear from the ticket, but actually this postprocessing was > not required by the ticket. I've checked it with Julian and updated the ticket > to make it more clear. Done. https://codereview.adblockplus.org/29516687/diff/29516688/cms/converters.py#n... cms/converters.py:137: data = data.replace('<!--', '') On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > We need to be more careful here: this line (together with the following one) > will strip all comment tags in the whole document. That is probably not what we > want. > > A safer approach would be to make sure that the first non-space symbols of data > are '<!--' and if it is so, we take what's between them and try to parse it as > JSON. If this succeeds, we insert the remaining part of the first comment back > between the comment tags and return the parsed metadata and updated page > content. If parsing fails, we try to parse the content of the first comment as > old-style metadata, remove parsed lines, insert the rest between the comment > tags and return. > In case the first non-space symbols are not '<!--', just try to parse everything > as JSON, remove what's parsed and return the rest. If JSON parsing failed, we > fall back to old-style parsing. > > I would probably create another function (let's imagine it's called > `parse_metadata`) that does the metadata parsing (first as JSON, then old-style) > and returns metadata as dict + remaining string and then in `parse_page_content` > I would check if the document starts with an HTML comment and then call > `parse_metadata` on the content of the first comment, otherwise on the whole > page. 
> > Does this make sense? Done. https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... File tests/frontmatter_samples/front_matter_json.html (right): https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... tests/frontmatter_samples/front_matter_json.html:2: "title": "Awesome page", On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > Here we should probably have a bit more diversity of variables. Perhaps one > string, one number, a dictionary and a list would be good. However, many > variables of the same type are probably excessive. Integrated this test with test_site and expected_output. Also updated the variables as suggested. https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... tests/frontmatter_samples/front_matter_json.html:2: "title": "Awesome page", On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > Here we should probably have a bit more diversity of variables. Perhaps one > string, one number, a dictionary and a list would be good. However, many > variables of the same type are probably excessive. Done. https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... File tests/frontmatter_samples/front_matter_orig.html (right): https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... tests/frontmatter_samples/front_matter_orig.html:2: title = Awesome page On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > Seems like having two variables here would be enough: one string and one list. > Or do you think additional ones add value? Sounds good. Also, integrated this test with test_site and expected_output. https://codereview.adblockplus.org/29516687/diff/29516688/tests/frontmatter_s... tests/frontmatter_samples/front_matter_orig.html:2: title = Awesome page On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > Seems like having two variables here would be enough: one string and one list. > Or do you think additional ones add value? Done. 
https://codereview.adblockplus.org/29516687/diff/29516688/tests/test_parse_pa... File tests/test_parse_page_content.py (right): https://codereview.adblockplus.org/29516687/diff/29516688/tests/test_parse_pa... tests/test_parse_page_content.py:1: #!/usr/bin/env python2 On 2017/08/16 18:34:56, Vasily Kuznetsov wrote: > I think we've discussed this and I have suggested to have a separate file for > the metadata parsing tests rather than putting the tests into another file. > However, at that point I didn't realize the possibility of using the same > approach as for most other CMS features: we can make the page use the variables > in the content somehow and then check the rendered pages against "correct" > expected outputs. I think this approach would be preferable in this case because > we would not add any new testing code, only testing data. Done.
Hi Rosie,

This looks pretty good! I have no comments on the tests, all is good there. In the main code there are a couple of places where I thought things could be simplified a bit; see below if you agree.

Cheers,
Vasily

https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n...
cms/converters.py:122: def parse_metadata(page, data):
There's an implicit duplication here, where we manually assign the `page` key in the returned dictionary. Something like this:

    def parse_metadata(page, data):
        metadata = {'page': page}
        try:
            decoder = json.JSONDecoder()
            json_data, index = decoder.raw_decode(data)
            metadata.update(json_data)
            page_data = data[index:].strip()
        except ValueError:
            lines = data.splitlines(True)
            for i, line in enumerate(lines):
                if not re.search(r'^\s*[\w\-]+\s*=', line):
                    break
                name, value = line.split('=', 1)
                value = value.strip()
                if value.startswith('[') and value.endswith(']'):
                    value = [element.strip()
                             for element in value[1:-1].split(',')]
                lines[i] = ''
                metadata[name.strip()] = value
            page_data = ''.join(lines)
        return metadata, page_data

We just reuse the same initial `metadata` dictionary, so in case we need to add or change default metadata variables we can do it in one place, and we only return in one place to make the flow a bit more obvious.

What do you think about this version?

https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n...
cms/converters.py:148: start_index = re.search('<!--', data).end()
I think regular expressions are a bit of an overkill here. You can just use `data.index('<!--') + 4` as `start_index` and `data.index('-->')` as `end_index`. Although this creates more magical constants and repetition, so perhaps it would make sense to move the comment start and end strings into constants, something like `COMMENT_START` and `COMMENT_END`.
In the end it would be something like this (I also took the liberty to restructure it a bit to have only one return):

    def parse_page_content(page, data):
        """Separate page content into metadata (dict) and body text (str)"""
        # If metadata is in a comment block, extract it
        if data.lstrip().startswith(COMMENT_START):
            start_index = data.index(COMMENT_START) + len(COMMENT_START)
            end_index = data.index(COMMENT_END)
            comment = data[start_index:end_index]
            page_tail = data[end_index + len(COMMENT_END):]
            metadata, comment_data = parse_metadata(page, comment.strip())
            page_data = '{}\n{}\n{}\n\n{}'.format(
                COMMENT_START, comment_data, COMMENT_END, page_tail.strip()
            )
        else:
            metadata, page_data = parse_metadata(page, data.strip())
        return metadata, page_data

I think it would make sense to replace the regular expressions with a simple `.index(...)`, but feel free to incorporate any of the other changes too.
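The key trick in the `parse_metadata` suggestion above is `json.JSONDecoder.raw_decode`, which parses a JSON value at the start of a string and also returns the index where parsing stopped, so the remainder can be kept as the page body:

```python
import json

decoder = json.JSONDecoder()
data = '{"title": "Awesome page"}\n\nRemaining page body'

# raw_decode returns the parsed value plus the index where parsing stopped,
# so everything after the JSON prefix can be kept as the page body.
metadata, index = decoder.raw_decode(data)
body = data[index:].strip()
# metadata == {'title': 'Awesome page'}; body == 'Remaining page body'
```

Note that `raw_decode` raises ValueError when the string starts with whitespace, which is why the suggested code strips its input first (a point that comes up again later in this review).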
https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n... cms/converters.py:122: def parse_metadata(page, data): On 2017/08/21 18:19:50, Vasily Kuznetsov wrote: > There's an implicit duplication here, where we manually assign the `page` key in > the returned dictionary. Something like this: > > def parse_metadata(page, data): > metadata = {'page': page} > try: > decoder = json.JSONDecoder() > json_data, index = decoder.raw_decode(data) > metadata.update(json_data) > page_data = data[index:].strip() > except ValueError: > lines = data.splitlines(True) > for i, line in enumerate(lines): > if not re.search(r'^\s*[\w\-]+\s*=', line): > break > name, value = line.split('=', 1) > value = value.strip() > if value.startswith('[') and value.endswith(']'): > value = [element.strip() for element in > value[1:-1].split(',')] > lines[i] = '' > metadata[name.strip()] = value > page_data = ''.join(lines) > return metadata, page_data > > We just reuse the same initial `metadata` dictionary, so in case we need to add > or change default metadata variables we can do it in one place and then we only > return in one place to make the flow a bit more obvious. > > What do you think about this version? Looks good. :) https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n... cms/converters.py:122: def parse_metadata(page, data): On 2017/08/21 18:19:50, Vasily Kuznetsov wrote: > There's an implicit duplication here, where we manually assign the `page` key in > the returned dictionary. 
Something like this: > > def parse_metadata(page, data): > metadata = {'page': page} > try: > decoder = json.JSONDecoder() > json_data, index = decoder.raw_decode(data) > metadata.update(json_data) > page_data = data[index:].strip() > except ValueError: > lines = data.splitlines(True) > for i, line in enumerate(lines): > if not re.search(r'^\s*[\w\-]+\s*=', line): > break > name, value = line.split('=', 1) > value = value.strip() > if value.startswith('[') and value.endswith(']'): > value = [element.strip() for element in > value[1:-1].split(',')] > lines[i] = '' > metadata[name.strip()] = value > page_data = ''.join(lines) > return metadata, page_data > > We just reuse the same initial `metadata` dictionary, so in case we need to add > or change default metadata variables we can do it in one place and then we only > return in one place to make the flow a bit more obvious. > > What do you think about this version? Done. https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n... cms/converters.py:148: start_index = re.search('<!--', data).end() On 2017/08/21 18:19:50, Vasily Kuznetsov wrote: > I think regular expressions are a bit of an overkill here. You can just use > `data.index('<!--') + 4` as `start_index` and `data.index('-->')` as > `end_index`. Although, this creates more magical constants and repetition, so > perhaps it would make sense to move the comment start and end strings into > constants, something like `COMMENT_START` and `COMMENT_END`. 
> > In the end it would be something like this (I also took the liberty to > restructure it a bit to have only one return): > > def parse_page_content(page, data): > """Separate page content into metadata (dict) and body text (str)""" > # If metadata is in a comment block, extract it > if data.lstrip().startswith(COMMENT_START): > start_index = data.index(COMMENT_START) + len(COMMENT_START) > end_index = data.index(COMMENT_END) > comment = data[start_index:end_index] > page_tail = data[end_index + len(COMMENT_END):] > metadata, comment_data = parse_metadata(page, comment.strip()) > page_data = '{}\n{}\n{}\n\n{}'.format( > COMMENT_START, comment_data, COMMENT_END, page_tail.strip() > ) > else: > metadata, page_data = parse_metadata(page, data.strip()) > return metadata, page_data > > I think it would make sense to replace the regular expressions with a simple > `.index(...)` but feel free to incorporate any of the other changes too. Yeah, that looks cleaner and avoids regular expressions. Is it necessary to make the strings constants? It seems like the added complexity might not be worth it. https://stackoverflow.com/questions/2682745/how-do-i-create-a-constant-in-python Maybe just use `comment_start` and `comment_end` as regular variables, but don't modify them?
LGTM https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n... cms/converters.py:148: start_index = re.search('<!--', data).end() On 2017/08/23 18:13:05, rosie wrote: > On 2017/08/21 18:19:50, Vasily Kuznetsov wrote: > > I think regular expressions are a bit of an overkill here. You can just use > > `data.index('<!--') + 4` as `start_index` and `data.index('-->')` as > > `end_index`. Although, this creates more magical constants and repetition, so > > perhaps it would make sense to move the comment start and end strings into > > constants, something like `COMMENT_START` and `COMMENT_END`. > > > > In the end it would be something like this (I also took the liberty to > > restructure it a bit to have only one return): > > > > def parse_page_content(page, data): > > """Separate page content into metadata (dict) and body text (str)""" > > # If metadata is in a comment block, extract it > > if data.lstrip().startswith(COMMENT_START): > > start_index = data.index(COMMENT_START) + len(COMMENT_START) > > end_index = data.index(COMMENT_END) > > comment = data[start_index:end_index] > > page_tail = data[end_index + len(COMMENT_END):] > > metadata, comment_data = parse_metadata(page, comment.strip()) > > page_data = '{}\n{}\n{}\n\n{}'.format( > > COMMENT_START, comment_data, COMMENT_END, page_tail.strip() > > ) > > else: > > metadata, page_data = parse_metadata(page, data.strip()) > > return metadata, page_data > > > > I think it would make sense to replace the regular expressions with a simple > > `.index(...)` but feel free to incorporate any of the other changes too. > > Yeah, that looks cleaner and avoids regular expressions. Is it necessary to make > the strings constants? It seems like the added complexity might not be worth it. 
> > https://stackoverflow.com/questions/2682745/how-do-i-create-a-constant-in-python
> >
> > Maybe just use `comment_start` and `comment_end` as regular variables, but don't
> > modify them?

There's not much added complexity to the constants, just capitalize the names and move them out of the method. There's also merit to keeping them inside, though, since they are not needed anywhere else. All in all, your approach is good.
On 2017/08/25 10:12:28, Vasily Kuznetsov wrote: > LGTM > > https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py > File cms/converters.py (right): > > https://codereview.adblockplus.org/29516687/diff/29519680/cms/converters.py#n... > cms/converters.py:148: start_index = re.search('<!--', data).end() > On 2017/08/23 18:13:05, rosie wrote: > > On 2017/08/21 18:19:50, Vasily Kuznetsov wrote: > > > I think regular expressions are a bit of an overkill here. You can just use > > > `data.index('<!--') + 4` as `start_index` and `data.index('-->')` as > > > `end_index`. Although, this creates more magical constants and repetition, > so > > > perhaps it would make sense to move the comment start and end strings into > > > constants, something like `COMMENT_START` and `COMMENT_END`. > > > > > > In the end it would be something like this (I also took the liberty to > > > restructure it a bit to have only one return): > > > > > > def parse_page_content(page, data): > > > """Separate page content into metadata (dict) and body text (str)""" > > > # If metadata is in a comment block, extract it > > > if data.lstrip().startswith(COMMENT_START): > > > start_index = data.index(COMMENT_START) + len(COMMENT_START) > > > end_index = data.index(COMMENT_END) > > > comment = data[start_index:end_index] > > > page_tail = data[end_index + len(COMMENT_END):] > > > metadata, comment_data = parse_metadata(page, comment.strip()) > > > page_data = '{}\n{}\n{}\n\n{}'.format( > > > COMMENT_START, comment_data, COMMENT_END, page_tail.strip() > > > ) > > > else: > > > metadata, page_data = parse_metadata(page, data.strip()) > > > return metadata, page_data > > > > > > I think it would make sense to replace the regular expressions with a simple > > > `.index(...)` but feel free to incorporate any of the other changes too. > > > > Yeah, that looks cleaner and avoids regular expressions. Is it necessary to > make > > the strings constants? 
It seems like the added complexity might not be worth > it. > > > > > https://stackoverflow.com/questions/2682745/how-do-i-create-a-constant-in-python > > > > Maybe just use `comment_start` and `comment_end` as regular variables, but > don't > > modify them? > > There's not much added complexity to the constants, just capitalize the names > and move them out of the method. There's also merit to keeping them inside, > though, since they are not needed anywhere. All in all, your approach is good. Hey Rosie, Thanks a lot for doing this, also LGTM
https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:38:
Adding this blank line is unrelated.

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:128: page_data = data[index:].strip()
Note that when parsing the legacy non-JSON format, we blank out the parsed lines but preserve their line endings. This is necessary for line numbers in Jinja2 tracebacks to still match. However, here you strip any newlines that might be used to format the JSON object, and only preserve any content that follows.

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:129: except ValueError:
In case the data can be interpreted as JSON but results in something other than a dict, dict.update() will raise a TypeError instead of a ValueError.

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:132: if not re.search(r'^\s*[\w\-]+\s*=', line):
There is some redundancy between the regular expression and the logic below, i.e. the fact that there has to be an equals sign and that spaces around the name are ignored is duplicated in the regular expression used for validation and in the logic that actually splits the name-value pair. Instead, we could validate and extract name and value in one go, avoiding the duplicated logic:

    m = re.search(r'^\s*([\w\-]+)\s*=\s*(.*?)\s*$', line)
    if not m:
        break
    name, value = m.groups()
    if value.startswith('[') and value.endswith(']'):
        value = [element.strip() for element in value[1:-1].split(',')]
    lines[i] = ''
    metadata[name] = value

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:147: comment_start = '<!--'
It seems much simpler to use a regular expression here:

    m = re.search(r'^\s*<!--(.*?)-->(.*)', data, re.S)
    if m:
        comment, page_tail = m.groups()

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:154: metadata, comment_data = parse_metadata(page, comment.strip())
Does stripping the comment have any effect here? It seems parse_metadata() already ignores trailing/leading spaces, or am I missing something?
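Sebastian's point about the exception type can be made concrete with a short sketch (illustrative only, not code from the patch): front matter that is valid JSON but not an object slips past `raw_decode` and only fails inside `dict.update`, with a different exception than the one being caught:

```python
import json

decoder = json.JSONDecoder()
# '42' is perfectly valid JSON, but it is a number, not an object.
value, index = decoder.raw_decode('42\nrest of the page')

metadata = {'page': 'index'}
try:
    metadata.update(value)  # not a mapping: raises TypeError, not ValueError
    error = None
except TypeError as exc:
    error = exc
```

An `except ValueError:` around this code would therefore let the TypeError propagate instead of falling back to old-style parsing.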
Thanks for those points, Sebastian! Rosie, there are a few more things to consider if we need to preserve the line breaks. See below.

Cheers,
Vasily

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:128: page_data = data[index:].strip()
On 2017/08/29 22:49:39, Sebastian Noack wrote:
> Note that when parsing the legacy non-JSON format, we blank out empty lines but
> preserve any line endings. This is necessary for line numbers in jinja2
> tracebacks to still match. However, here you strip any new lines that might be
> used to format the JSON object, and only preserve any content that follows.

This consideration is rather non-obvious; for example, I didn't know about it. I'd say there should be a comment somewhere (probably in this function) explaining the rationale for preserving the line breaks.

We probably also want to preserve the number of line breaks in the JSON case. And the tests should probably have some newlines before and in the comment and make sure that they stay in place.

https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:147: comment_start = '<!--'
On 2017/08/29 22:49:39, Sebastian Noack wrote:
> It seems much simpler to use a regular expression here:
>
>     m = re.search(r'^\s*<!--(.*?)-->(.*)', data, re.S)
>     if m:
>         comment, page_tail = m.groups()

I think it was my advice to not use regexps here because I didn't know about non-greedy matching (oops :/, now I do). I agree that this approach is better, but I think we need to save the part before the comment as well if we want to be careful about not eating newlines:

    m = re.search(r'^(\s*)<!--(.*?)-->(.*)', data, re.S)
    if m:
        leading_space, comment, page_tail = m.groups()

and then reinsert it back before the comment (in which we replace the metadata with just the correct number of newlines).
https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n...
cms/converters.py:154: metadata, comment_data = parse_metadata(page, comment.strip())
On 2017/08/29 22:49:39, Sebastian Noack wrote:
> Does stripping the comment have any effect here? It seems parse_metadata() is
> ignoring trailing/leading spaces, or do I miss something?

JSON parsing fails on leading space, but if we want to preserve the newlines, we will need to be more careful here. It probably makes sense to do the stripping inside of `parse_metadata()` and make sure that the returned `comment_data` has the same number of lines as the original comment that was passed in.
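The line-number concern above can be sketched with a tiny helper (hypothetical name, assuming the convention that parsed metadata is replaced by blank lines so Jinja2 traceback line numbers keep matching the source file):

```python
def blank_out_metadata(metadata_text):
    # Keep one line break per original line so that everything after the
    # metadata block stays at its original line number in tracebacks.
    return '\n' * metadata_text.count('\n')


comment = 'title = Awesome page\ntags = [a, b]\n'
blanked = blank_out_metadata(comment)
assert blanked.count('\n') == comment.count('\n')
```

Stripping the comment before parsing and then returning fewer lines than went in would break exactly this invariant, which is why the stripping should happen inside `parse_metadata()` where the line count can be restored.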
Hey guys! :) It seems this functionality was moved to utils.py, so most of the changes are there. A couple of the test pages were also updated to match the new functionality. https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:38: On 2017/08/29 22:49:39, Sebastian Noack wrote: > Adding this blank line is unrelated. True. My linter was showing a warning because there were not two blank lines surrounding the function. Should I change it back? https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:128: page_data = data[index:].strip() On 2017/08/30 10:50:00, Vasily Kuznetsov wrote: > On 2017/08/29 22:49:39, Sebastian Noack wrote: > > Note that when parsing the legacy non-JSON format, we blank out empty lines > but > > preserve any line endings. This is necessary for line numbers in jinja2 > > tracebacks to still match. However, here you strip any new lines that might be > > used to format the JSON object, and only preserve any content that follows. > > This consideration is rather non-obvious, for example I didn't know about it. > I'd say there should be a comment somewhere (probably in this function) > explaining the rationale for preserving the line breaks. > > We probably also want to preserve the number line breaks in JSON case. And the > tests should probably have some newlines before and in the comment and make sure > that they stay in place. (This function was moved to utils.py.) Now, the line numbers are preserved, but if there is any text in a comment block along with the metadata, it will now also be hidden. It seems that this was the original intent, reading over the issue. The test cases have been updated accordingly. https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... 
cms/converters.py:129: except ValueError: On 2017/08/29 22:49:39, Sebastian Noack wrote: > In case the data can be interpreted as JSON, but result into something else than > a dict, a TypeError will be thrown instead of a ValueError, by dict.update(). The check to make sure a dict is returned now happens on line 78. https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:132: if not re.search(r'^\s*[\w\-]+\s*=', line): On 2017/08/29 22:49:39, Sebastian Noack wrote: > There is some redundancy in the regular expression and the logic below, i.e. the > fact that there has to be an equals sign and that spaces around the name are > ignored, is duplicated in the regular expression used for validation, and in the > logic that actually splits the name-value-pair. Instead, we could validate and > extract name and value in one go, avoiding duplicated logic: > > m = re.search(r'^\s*([\w\-]+)\s*=\s*(.*?)\s*$', line) > if not m: > break > name, value = m.groups() > if value.startswith('[') and value.endswith(']'): > value = [element.strip() for element in value[1:-1].split(',')] > lines[i] = '' > metadata[name] = value Done. https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:147: comment_start = '<!--' On 2017/08/30 10:50:00, Vasily Kuznetsov wrote: > On 2017/08/29 22:49:39, Sebastian Noack wrote: > > It seems much simpler to use a regular expression here: > > > > m = re.search(r'^\s*<!--(.*?)-->(.*)', data, re.S) > > if m: > > comment, page_tail = m.groups() > > I think it was my advice to not use regexps here because I didn't know about the > non-greedy matching (oops :/, now I do). 
I agree that this approach is better, > but I think we need to save the part before the comment as well if we want to be > careful about not eating newlines: > > m = re.search(r'^\(s*)<!--(.*?)-->(.*)', data, re.S) > if m: > leading_space, comment, page_tail = m.groups() > > and then reinsert it back before the comment (in which we replace the metadata > with just the correct number of newlines). Done. https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:154: metadata, comment_data = parse_metadata(page, comment.strip()) On 2017/08/30 10:50:01, Vasily Kuznetsov wrote: > On 2017/08/29 22:49:39, Sebastian Noack wrote: > > Does stripping the comment have any effect here? It seems parse_metadata() is > > ignoring trailing/leading spaces, or do I miss something? > > JSON parsing fails on leading space, but if we want to preserve the newlines, we > will need to be more careful here. Probably makes sense to do the stripping > inside of `parse_metadata()` and make sure that the returned `comment_data` has > the same number of lines as the original comment that what was passed in. Done. https://codereview.adblockplus.org/29516687/diff/29525608/tests/expected_outp... File tests/expected_output/en/sitemap (right): https://codereview.adblockplus.org/29516687/diff/29525608/tests/expected_outp... tests/expected_output/en/sitemap:6: <ul><li>title : filter </li><li>title : get_page_url </li><li>title : global </li><li>title : translate </li><li>title : metadata_old_comment </li><li>title : metadata_json </li><li>title : metadata_json_comment </li><li>title : sitemap </li> It seems there was a bug in sitemap.tmpl which was not properly ordering the pages alphabetically. That fix is also included in the next patch, and this test case is fixed to match the behavior. https://codereview.adblockplus.org/29516687/diff/29732555/tests/test_site/pag... 
File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29732555/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:7: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} Added the page attribute so that sorting works properly.
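Jinja2's `sort(attribute='page')` filter behaves much like Python's built-in `sorted()` with a key function. A rough stand-alone equivalent, assuming each metadata entry is a plain dict with a 'page' key (the sample data is invented for illustration):

```python
from operator import itemgetter

# Three pages in arbitrary order, as get_pages_metadata() might
# return them (hypothetical data).
pages = [{'page': 'sitemap'}, {'page': 'filter'}, {'page': 'global'}]

# Equivalent of the Jinja filter sort(attribute='page'):
ordered = sorted(pages, key=itemgetter('page'))
# ordered by 'page': filter, global, sitemap
```

Without the `attribute` argument the filter sorts the dicts themselves, which gives an essentially arbitrary and unstable order, hence the fix to the sitemap template.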
https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29516687/diff/29525608/cms/converters.py#n... cms/converters.py:38: On 2018/03/26 02:32:21, rosie wrote: > On 2017/08/29 22:49:39, Sebastian Noack wrote: > > Adding this blank line is unrelated. > > True. My linter was showing a warning because there were not two blank lines > surrounding the function. Should I change it back? Strictly, there should be two blank lines surrounding each function. Flake8 will complain about this as well; however, if you run flake8 through tox, our configuration ignores this error for this file. That doesn't mean that we disagree with the linter error, but merely that we didn't care to change all of our code at once when we started using flake8. Anyway, it seems this is no longer relevant, as the latest patch doesn't touch this file at all anymore. https://codereview.adblockplus.org/29516687/diff/29525608/tests/expected_outp... File tests/expected_output/en/sitemap (right): https://codereview.adblockplus.org/29516687/diff/29525608/tests/expected_outp... tests/expected_output/en/sitemap:6: <ul><li>title : filter </li><li>title : get_page_url </li><li>title : global </li><li>title : translate </li><li>title : metadata_old_comment </li><li>title : metadata_json </li><li>title : metadata_json_comment </li><li>title : sitemap </li> On 2018/03/26 02:32:21, rosie wrote: > It seems there was a bug in sitemap.tmpl which was not properly ordering the > pages alphabetically. That fix is also included in the next patch, and this test > case is fixed to match the behavior. Acknowledged. https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... File .pytest_cache/v/cache/lastfailed (right): https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... .pytest_cache/v/cache/lastfailed:1: { This file should not be committed.
https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... File .pytest_cache/v/cache/nodeids (right): https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... .pytest_cache/v/cache/nodeids:1: [ This file should not be committed. https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] I just noticed one more edge case, if there is an HTML comment at the beginning of the file, but it doesn't contain any metadata, we would still strip the comment. This could be accounted for with following addition to the logic here: if length > 0: cutoff = m.end() if m else length source = '\n' * source.count('\n', 0, cutoff) + source[cutoff:] return metadata, source https://codereview.adblockplus.org/29516687/diff/29732555/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29732555/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:7: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} On 2018/03/26 02:32:21, rosie wrote: > Added the page attribute for sorting functionality to work properly. Acknowledged.
Hi Rosie, I don't have much to add to Sebastian's comments, just a couple of inline comments (see below). Cheers, Vasily https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... File .pytest_cache/v/cache/lastfailed (right): https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... .pytest_cache/v/cache/lastfailed:1: { On 2018/03/26 02:57:48, Sebastian Noack wrote: > This file should not be committed. Indeed. Adding .pytest_cache to ignores is probably the best approach to get the two files out of the way. https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/03/26 02:57:48, Sebastian Noack wrote: > I just noticed one more edge case, if there is an HTML comment at the beginning > of the file, but it doesn't contain any metadata, we would still strip the > comment. This could be accounted for with following addition to the logic here: > > if length > 0: > cutoff = m.end() if m else length > source = '\n' * source.count('\n', 0, cutoff) + source[cutoff:] > > return metadata, source Good catch, Sebastian! It probably makes sense to add a test for this too, to avoid future regressions.
https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/03/26 02:57:48, Sebastian Noack wrote: > I just noticed one more edge case, if there is an HTML comment at the beginning > of the file, but it doesn't contain any metadata, we would still strip the > comment. This could be accounted for with following addition to the logic here: > > if length > 0: > cutoff = m.end() if m else length > source = '\n' * source.count('\n', 0, cutoff) + source[cutoff:] > > return metadata, source Perhaps a test case should be added as well to cover this?
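Sebastian's edge-case fix can be sketched as a self-contained function (names are illustrative, not the actual cms/utils.py code): replacing the stripped comment with the same number of newlines keeps jinja2 traceback line numbers stable, even when the leading comment contains no metadata at all.

```python
import re

def strip_metadata_comment(source):
    """Remove a leading HTML comment from a page but preserve the
    original line count, so later jinja2 tracebacks still point at
    the right source lines. Illustrative sketch only."""
    m = re.search(r'^\s*<!--(.*?)-->', source, re.S)
    if m:
        cutoff = m.end()
        # Replace the removed prefix with the same number of newlines.
        source = '\n' * source.count('\n', 0, cutoff) + source[cutoff:]
    return source

page = '<!--\nno metadata here\n-->\nactual content'
stripped = strip_metadata_comment(page)
# the comment is gone, but the number of lines is unchanged
```

This is the behavior the suggested test case would pin down: a leading comment without metadata is still stripped, and the remaining content keeps its line positions.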
Thanks for the feedback! I implemented the change, but ran into some trouble making the test case (see below). https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... File .pytest_cache/v/cache/lastfailed (right): https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... .pytest_cache/v/cache/lastfailed:1: { On 2018/03/26 09:24:28, Vasily Kuznetsov wrote: > On 2018/03/26 02:57:48, Sebastian Noack wrote: > > This file should not be committed. > > Indeed. Adding .pytest_cache to ignores is probably the best approach to get the > two files out of the way. Done. https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... File .pytest_cache/v/cache/nodeids (right): https://codereview.adblockplus.org/29516687/diff/29732555/.pytest_cache/v/cac... .pytest_cache/v/cache/nodeids:1: [ On 2018/03/26 02:57:48, Sebastian Noack wrote: > This file should not be committed. Done. https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/03/28 03:01:36, Jon Sonesen wrote: > On 2018/03/26 02:57:48, Sebastian Noack wrote: > > I just noticed one more edge case, if there is an HTML comment at the > beginning > > of the file, but it doesn't contain any metadata, we would still strip the > > comment. This could be accounted for with following addition to the logic > here: > > > > if length > 0: > > cutoff = m.end() if m else length > > source = '\n' * source.count('\n', 0, cutoff) + source[cutoff:] > > > > return metadata, source > > Perhaps a test case should be added as well to cover this? I've implemented Sebastian's suggestion and it works as expected, but while trying to implement a test case for this, I ran into two issues. 
1) A file with no specified template uses the default.tmpl to generate the page, but the default.tmpl file is blank, so a blank page is generated. I'm not sure if this is the intended behavior for this test site. 2) The sitemap.tmpl file seems to only include pages that have some metadata when creating the sitemap output, so this new test case would never show up on the sitemap. This may be out of scope for the current issue.
https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/03/31 21:09:28, rosie wrote: > I've implemented Sebastian's suggestion and it works as expected, but while > trying to implement a test case for this, I ran into two issues. > > 1) A file with no specified template uses the default.tmpl to generate the page, > but the default.tmpl file is blank, so a blank page is generated. I'm not sure > if this is the intended behavior for this test site. So default.tmpl is empty, and empty.tmpl is not. That looks like a mistake, or at very least it's extremely confusing. But I agree that cleaning this up seems out of scope of this change. > 2) The sitemap.tmpl file seems to only include pages that have some metadata > when creating the sitemap output, so this new test case would never show up on > the sitemap. This may be out of scope for the current issue. Apparently the get_pages_metadata() function called in sitemap.tmpl explicitly ignores any pages that don't define any custom metadata: https://hg.adblockplus.org/cms/file/b683e321a9d0/cms/converters.py#l477 I cannot tell if that is intended or not, but at very least the comment there gives no sufficient explanation. https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore File .hgignore (right): https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 .hgignore:11: .pytest_cache If you modify .hgignore, please also add the changes to .gitignore (since this repository is mirrored to GitHub). Also directories that are to be ignored (only) under the project root are listed above using the regexp syntax. 
https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py#newcode91 cms/utils.py:91: # Need to preserve line numbers for jinja2 tracebacks This comment is slightly out of place here. Perhaps move it inside the block. Otherwise it looks like that the check in the next line is related to preserving line numbers too (which it is not).
Hi Rosie, Sorry about the roadblocks, fixing them! Cheers, Vasily https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/03/31 22:07:35, Sebastian Noack wrote: > On 2018/03/31 21:09:28, rosie wrote: > > I've implemented Sebastian's suggestion and it works as expected, but while > > trying to implement a test case for this, I ran into two issues. > > > > 1) A file with no specified template uses the default.tmpl to generate the > page, > > but the default.tmpl file is blank, so a blank page is generated. I'm not sure > > if this is the intended behavior for this test site. > > So default.tmpl is empty, and empty.tmpl is not. That looks like a mistake, or > at very least it's extremely confusing. But I agree that cleaning this up seems > out of scope of this change. I suppose we can say that it ended up being so for historical reasons :) Anyway, apart from being confusing this seems to make testing of this case unnecessarily complicated. I think the best approach would be to swap the contents of `default.html` and `empty.tmpl` and adjust the rest of the test suite to make sure the output doesn't change. https://issues.adblockplus.org/ticket/6546 (and corresponding review https://codereview.adblockplus.org/29741581) do this, so perhaps we can just land that before this change. > > 2) The sitemap.tmpl file seems to only include pages that have some metadata > > when creating the sitemap output, so this new test case would never show up on > > the sitemap. This may be out of scope for the current issue. 
> > Apparently the get_pages_metadata() function called in sitemap.tmpl explicitly > ignores any pages that don't define any custom metadata: > https://hg.adblockplus.org/cms/file/b683e321a9d0/cms/converters.py#l477 > I cannot tell if that is intended or not, but at very least the comment there > gives no sufficient explanation. This looks like a bug to me. I've raised it in the review (https://codereview.adblockplus.org/29472555/) but then it seems we've forgotten about it. I now created https://issues.adblockplus.org/ticket/6545 to fix this, but as far as I can see this doesn't block this issue.
On 2018/03/31 22:07:36, Sebastian Noack wrote: > https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py > File cms/utils.py (right): > > https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 > cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + > source[cutoff:] > On 2018/03/31 21:09:28, rosie wrote: > > I've implemented Sebastian's suggestion and it works as expected, but while > > trying to implement a test case for this, I ran into two issues. > > > > 1) A file with no specified template uses the default.tmpl to generate the > page, > > but the default.tmpl file is blank, so a blank page is generated. I'm not sure > > if this is the intended behavior for this test site. > > So default.tmpl is empty, and empty.tmpl is not. That looks like a mistake, or > at very least it's extremely confusing. But I agree that cleaning this up seems > out of scope of this change. > > > 2) The sitemap.tmpl file seems to only include pages that have some metadata > > when creating the sitemap output, so this new test case would never show up on > > the sitemap. This may be out of scope for the current issue. > > Apparently the get_pages_metadata() function called in sitemap.tmpl explicitly > ignores any pages that don't define any custom metadata: > https://hg.adblockplus.org/cms/file/b683e321a9d0/cms/converters.py#l477 > I cannot tell if that is intended or not, but at very least the comment there > gives no sufficient explanation. > > https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore > File .hgignore (right): > > https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 > .hgignore:11: .pytest_cache > If you modify .hgignore, please also add the changes to .gitignore (since this > repository is mirrored to GitHub). > > Also directories that are to be ignored (only) under the project root are listed > above using the regexp syntax. 
> > https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py > File cms/utils.py (right): > > https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py#newcode91 > cms/utils.py:91: # Need to preserve line numbers for jinja2 tracebacks > This comment is slightly out of place here. Perhaps move it inside the block. > Otherwise it looks like that the check in the next line is related to preserving > line numbers too (which it is not). I'm not sure why it is not clear that we ignore pages without metadata. The function is named get_pages_metadata().
On 2018/04/03 15:48:02, Jon Sonesen wrote: > On 2018/03/31 22:07:36, Sebastian Noack wrote: > > https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py > > File cms/utils.py (right): > > > > > https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 > > cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + > > source[cutoff:] > > On 2018/03/31 21:09:28, rosie wrote: > > > I've implemented Sebastian's suggestion and it works as expected, but while > > > trying to implement a test case for this, I ran into two issues. > > > > > > 1) A file with no specified template uses the default.tmpl to generate the > > page, > > > but the default.tmpl file is blank, so a blank page is generated. I'm not > sure > > > if this is the intended behavior for this test site. > > > > So default.tmpl is empty, and empty.tmpl is not. That looks like a mistake, or > > at very least it's extremely confusing. But I agree that cleaning this up > seems > > out of scope of this change. > > > > > 2) The sitemap.tmpl file seems to only include pages that have some metadata > > > when creating the sitemap output, so this new test case would never show up > on > > > the sitemap. This may be out of scope for the current issue. > > > > Apparently the get_pages_metadata() function called in sitemap.tmpl explicitly > > ignores any pages that don't define any custom metadata: > > https://hg.adblockplus.org/cms/file/b683e321a9d0/cms/converters.py#l477 > > I cannot tell if that is intended or not, but at very least the comment there > > gives no sufficient explanation. > > > > https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore > > File .hgignore (right): > > > > https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 > > .hgignore:11: .pytest_cache > > If you modify .hgignore, please also add the changes to .gitignore (since this > > repository is mirrored to GitHub). 
> > > > Also directories that are to be ignored (only) under the project root are > listed > > above using the regexp syntax. > > > > https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py > > File cms/utils.py (right): > > > > > https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py#newcode91 > > cms/utils.py:91: # Need to preserve line numbers for jinja2 tracebacks > > This comment is slightly out of place here. Perhaps move it inside the block. > > Otherwise it looks like that the check in the next line is related to > preserving > > line numbers too (which it is not). > > I'm not sure why it is not clear that we ignore pages without metadata. The > function is named get pages metadata. So after checking with Julian, he prefers current behavior so I closed #6545.
Please ignore this patch-set! I just noticed that something's wrong with the test cases... I think I had an issue with the merging & rebasing. Going to fix it asap. https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29732555/cms/utils.py#newcode93 cms/utils.py:93: return metadata, '\n' * source.count('\n', 0, cutoff) + source[cutoff:] On 2018/04/03 14:29:19, Vasily Kuznetsov wrote: > On 2018/03/31 22:07:35, Sebastian Noack wrote: > > On 2018/03/31 21:09:28, rosie wrote: > > > I've implemented Sebastian's suggestion and it works as expected, but while > > > trying to implement a test case for this, I ran into two issues. > > > > > > 1) A file with no specified template uses the default.tmpl to generate the > > page, > > > but the default.tmpl file is blank, so a blank page is generated. I'm not > sure > > > if this is the intended behavior for this test site. > > > > So default.tmpl is empty, and empty.tmpl is not. That looks like a mistake, or > > at very least it's extremely confusing. But I agree that cleaning this up > seems > > out of scope of this change. > > I suppose we can say that it ended up being so for historical reasons :) Anyway, > apart from being confusing this seems to make testing of this case unnecessarily > complicated. I think the best approach would be to swap the contents of > `default.html` and `empty.tmpl` and adjust the rest of the test suite to make > sure the output doesn't change. https://issues.adblockplus.org/ticket/6546 (and > corresponding review https://codereview.adblockplus.org/29741581) do this, so > perhaps we can just land that before this change. > > > > 2) The sitemap.tmpl file seems to only include pages that have some metadata > > > when creating the sitemap output, so this new test case would never show up > on > > > the sitemap. This may be out of scope for the current issue. 
> > > > Apparently the get_pages_metadata() function called in sitemap.tmpl explicitly > > ignores any pages that don't define any custom metadata: > > https://hg.adblockplus.org/cms/file/b683e321a9d0/cms/converters.py#l477 > > I cannot tell if that is intended or not, but at very least the comment there > > gives no sufficient explanation. > > This looks like a bug to me. I've raised it in the review > (https://codereview.adblockplus.org/29472555/) but then it seems we've forgot > about it. I now created https://issues.adblockplus.org/ticket/6545 to fix this, > but as far as I can see this doesn't block this issue. Acknowledged. https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore File .hgignore (right): https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 .hgignore:11: .pytest_cache On 2018/03/31 22:07:35, Sebastian Noack wrote: > If you modify .hgignore, please also add the changes to .gitignore (since this > repository is mirrored to GitHub). > > Also directories that are to be ignored (only) under the project root are listed > above using the regexp syntax. Done. https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29738607/cms/utils.py#newcode91 cms/utils.py:91: # Need to preserve line numbers for jinja2 tracebacks On 2018/03/31 22:07:36, Sebastian Noack wrote: > This comment is slightly out of place here. Perhaps move it inside the block. > Otherwise it looks like that the check in the next line is related to preserving > line numbers too (which it is not). Done.
https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore File .hgignore (right): https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 .hgignore:11: .pytest_cache On 2018/04/18 02:35:32, rosie wrote: > On 2018/03/31 22:07:35, Sebastian Noack wrote: > > If you modify .hgignore, please also add the changes to .gitignore (since this > > repository is mirrored to GitHub). > > > > Also directories that are to be ignored (only) under the project root are > listed > > above using the regexp syntax. > > Done. Not quite, however, this has been addressed separately meanwhile: https://hg.adblockplus.org/cms/rev/0ae76e6fb071 So just remove your changes to .hgignore/.gitignore and rebase.
I figured out the issue. My test cases were still using 'template = empty' in order to use (what is now) the default template. That's why the tests were failing. Since Vasily's change, my test pages weren't being generated at all by the server. Now they are using the default template and it seems to be working. Please let me know if you see any other issues! :) https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore File .hgignore (right): https://codereview.adblockplus.org/29516687/diff/29738607/.hgignore#newcode11 .hgignore:11: .pytest_cache On 2018/04/18 05:01:39, Sebastian Noack wrote: > On 2018/04/18 02:35:32, rosie wrote: > > On 2018/03/31 22:07:35, Sebastian Noack wrote: > > > If you modify .hgignore, please also add the changes to .gitignore (since > this > > > repository is mirrored to GitHub). > > > > > > Also directories that are to be ignored (only) under the project root are > > listed > > > above using the regexp syntax. > > > > Done. > > Not quite, however, this has been addressed separately meanwhile: > https://hg.adblockplus.org/cms/rev/0ae76e6fb071 > > So just remove your changes to .hgignore/.gitignore and rebase. Done.
Basically LGTM. I don't care if the remaining nits get addressed. They are quite unimportant. https://codereview.adblockplus.org/29516687/diff/29755796/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29755796/cms/utils.py#newcode95 cms/utils.py:95: return metadata, source Nit: I'd add a blank line above, since the return statement doesn't depend any more on this block than it on the two blocks (separated by blank lines) above. https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:6: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} Nit: The changes here seem unrelated now. Still it makes sense to sort by "page", otherwise the order is somewhat arbitrary. Either way this makes sense regardless of this change, and therefore could be committed separately.
Added the suggested newline, and had a question about the changes in sitemap.tmpl. https://codereview.adblockplus.org/29516687/diff/29755796/cms/utils.py File cms/utils.py (right): https://codereview.adblockplus.org/29516687/diff/29755796/cms/utils.py#newcode95 cms/utils.py:95: return metadata, source On 2018/04/18 21:12:27, Sebastian Noack wrote: > Nit: I'd add a blank line above, since the return statement doesn't depend any > more on this block than it on the two blocks (separated by blank lines) above. Done. https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:6: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} On 2018/04/18 21:12:27, Sebastian Noack wrote: > Nit: The changes here seem unrelated now. Still it makes sense to sort by > "page", otherwise the order is somewhat arbitrary. Either way this makes sense > regardless of this change, and therefore could be committed separately. Is there any advantage to committing it separately? Is it more transparent, or easier to track changes that way? If so, then no problem, this can be a separate commit.
https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:6: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} On 2018/04/18 22:46:18, rosie wrote: > On 2018/04/18 21:12:27, Sebastian Noack wrote: > > Nit: The changes here seem unrelated now. Still it makes sense to sort by > > "page", otherwise the order is somewhat arbitrary. Either way this makes sense > > regardless of this change, and therefore could be committed separately. > > Is there any advantage to committing it separately? Is it more transparent, or > easier to track changes that way? If so, then no problem, this can be a separate > commit. If this change is unrelated, i.e. can be separated from the main change this review is about, and both changes still have the intended effect without breaking anything if applied individually, then the advantages are: 1. If we ever decide to revert either change, doing so is trivial. 2. When somebody comes across this change in the future, who hasn't been part of this discussion, unrelated changes cause confusion. 3. During code review its more efficient and effective if reviewers don't have to focus on more things than necessary at the same time.
Removed the sitemap sort attribute. I'll submit it as a noissue shortly. https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29516687/diff/29755796/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:6: {%- for unfiltered_page in get_pages_metadata()|sort(attribute='page') %} On 2018/04/19 00:22:25, Sebastian Noack wrote: > On 2018/04/18 22:46:18, rosie wrote: > > On 2018/04/18 21:12:27, Sebastian Noack wrote: > > > Nit: The changes here seem unrelated now. Still it makes sense to sort by > > > "page", otherwise the order is somewhat arbitrary. Either way this makes > sense > > > regardless of this change, and therefore could be committed separately. > > > > Is there any advantage to committing it separately? Is it more transparent, or > > easier to track changes that way? If so, then no problem, this can be a > separate > > commit. > > If this change is unrelated, i.e. can be separated from the main change this > review is about, and both changes still have the intended effect without > breaking anything if applied individually, then the advantages are: > > 1. If we ever decide to revert either change, doing so is trivial. > 2. When somebody comes across this change in the future, who hasn't been part of > this discussion, unrelated changes cause confusion. > 3. During code review its more efficient and effective if reviewers don't have > to focus on more things than necessary at the same time. Sounds good. Done.
LGTM
LGTM
LGTM |