Issue 29472555: Issue 4867 - Add global get_pages_metadata to template converters

Jon Sonesen

https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py#newcode490 cms/converters.py:490: def parse_page_metadata(self, data, page): This code is essentially duplicating ...

June 23, 2017, 10:09 a.m. (2017-06-23 10:09:26 UTC) #1

Vasily Kuznetsov

See my comments below. Also, the ticket number in the title is wrong. https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py File ...

June 23, 2017, 2:15 p.m. (2017-06-23 14:15:24 UTC) #2

See my comments below. Also, the ticket number in the title is wrong.

https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py#n...
cms/converters.py:490: def parse_page_metadata(self, data, page):
On 2017/06/23 10:09:26, Jon Sonesen wrote:
> This code is essentially duplicating the logic in the init function of the
> Converter class, Vasily and I discussed this and the options were to break the
> logic out into a class function of Converters, make it an utils.py function or
> use it as a function in the converters namespace. 
> 
> We chose to put it in the converters.py namespace as a function because it
makes
> no sense in utils since it is page specific logic, but it is not specific
enough
> to a given page's instance of its own converter class to be a class function.
> 
> I will break this out into its own function in the next patch set if everyone
> agrees this makes sense

As discussed, this approach sounds good.

Now looking at these 3 functions that we're adding to `TemplateConverter` it
starts looking like we should separate all the default globals out into their
own file(s). They are not really part of the converter logic but are more like a
set of services that we provide to the template -- it doesn't seem right to
pollute the converter class with this stuff. The globals often access
`self._params`, which technically is a private attribute of the converter, but
logically that thing is a rendering context and it actually becomes the context
(in jinja sense) of the templates so we will be able to get it using
`contextfunction` decorator. There's also `self._get_locale_data()` that is used
by the globals, but I'm actually wondering if `self._params['localedata']`
should be used instead (it wouldn't load file from the disk the locale every
time and it also supports locale overrides...). I guess we should ask Wladimir
why it's done this way (it's from this change:
https://hg.adblockplus.org/cms/rev/b022896ef69a).

Anyway, you can do the metadata loading refactoring already and perhaps the
separation of the globals will land as a separate change.

https://codereview.adblockplus.org/29472555/diff/29472565/tests/test_page_out...
File tests/test_page_outputs.py (right):

https://codereview.adblockplus.org/29472555/diff/29472565/tests/test_page_out...
tests/test_page_outputs.py:73: def test_revision_arg(revision, temp_site):
Is this test related to `get_pages_metadata` global?

Jon Sonesen

https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py#newcode490 cms/converters.py:490: def parse_page_metadata(self, data, page): On 2017/06/23 14:15:24, Vasily Kuznetsov ...

June 26, 2017, 7:22 a.m. (2017-06-26 07:22:43 UTC) #3

https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py#n...
cms/converters.py:490: def parse_page_metadata(self, data, page):
On 2017/06/23 14:15:24, Vasily Kuznetsov wrote:
> On 2017/06/23 10:09:26, Jon Sonesen wrote:
> > This code is essentially duplicating the logic in the init function of the
> > Converter class, Vasily and I discussed this and the options were to break
the
> > logic out into a class function of Converters, make it an utils.py function
or
> > use it as a function in the converters namespace. 
> > 
> > We chose to put it in the converters.py namespace as a function because it
> makes
> > no sense in utils since it is page specific logic, but it is not specific
> enough
> > to a given page's instance of its own converter class to be a class
function.
> > 
> > I will break this out into its own function in the next patch set if
everyone
> > agrees this makes sense
> 
> As discussed, this approach sounds good.
> 
> Now looking at these 3 functions that we're adding to `TemplateConverter` it
> starts looking like we should separate all the default globals out into their
> own file(s). They are not really part of the converter logic but are more like
a
> set of services that we provide to the template -- it doesn't seem right to
> pollute the converter class with this stuff. The globals often access
> `self._params`, which technically is a private attribute of the converter, but
> logically that thing is a rendering context and it actually becomes the
context
> (in jinja sense) of the templates so we will be able to get it using
> `contextfunction` decorator. There's also `self._get_locale_data()` that is
used
> by the globals, but I'm actually wondering if `self._params['localedata']`
> should be used instead (it wouldn't load file from the disk the locale every
> time and it also supports locale overrides...). I guess we should ask Wladimir
> why it's done this way (it's from this change:
> https://hg.adblockplus.org/cms/rev/b022896ef69a).
> 
> Anyway, you can do the metadata loading refactoring already and perhaps the
> separation of the globals will land as a separate change.

Yeah I totally agree here, and actually we talked about this in the past (not to
this extent detail wise) the fact that we could break out globals and/or filters
out of the converters file tp make it cleaner to extend in the future.

Regarding the locale_data changes I agree here, since instantiating any
converter will override the locale data with user specified parameters. But
maybe there is a side effect we are not considering, or are unaware of.

https://codereview.adblockplus.org/29472555/diff/29472565/tests/test_page_out...
File tests/test_page_outputs.py (right):

https://codereview.adblockplus.org/29472555/diff/29472565/tests/test_page_out...
tests/test_page_outputs.py:73: def test_revision_arg(revision, temp_site):
On 2017/06/23 14:15:24, Vasily Kuznetsov wrote:
> Is this test related to `get_pages_metadata` global?

Yeah, so I actually shouldn't have changed this. Did a nono, and changed the
tests to pass with the new code. But basically since I changed the metadata
parsing to be a generator I had to change the 'page_data' instantiation to
happen before calling the generator. This caused the page added by the revision
arg's test bookmark to get added to the page rendering on the filterless
get_pages_metadata calls, even though there was no metadata there the dict with
the key 'page': page name was getting through the filter. I fixy.

Vasily Kuznetsov

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#newcode145 cms/converters.py:145: for name, value, i in get_page_metadata(params['page'], data): This refactoring ...

June 27, 2017, 1:32 p.m. (2017-06-27 13:32:15 UTC) #4

Jon Sonesen

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#newcode145 cms/converters.py:145: for name, value, i in get_page_metadata(params['page'], data): On 2017/06/27 ...

June 28, 2017, 2:31 p.m. (2017-06-28 14:31:20 UTC) #5

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n...
cms/converters.py:145: for name, value, i in get_page_metadata(params['page'],
data):
On 2017/06/27 13:32:15, Vasily Kuznetsov wrote:
> This refactoring moved the code into a separate function, but the code is
still
> very interdependent: this `i` in return tuples has nothing to do with metadata
> and why are we even returning an iterator of tuples, wouldn't it be better to
> return a dictionary instead.
> I propose that we go one step further. How about we change `get_page_metadata`
> to `parse_page(data) -> (metadata, pagecontent)` (where `metadata` is a
> dictionary and `pagecontent` is a string). This function will have all parsing
> logic in it, from splitting lines to making the dictionary out of metadata.
Then
> we can just `params.update(metadata)` and `params[key] = (pagecontent,
> filename)`.

The line index is required to strip the metadata from the page data which gets
inserted to params['page_content'].

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n...
cms/converters.py:485: if not isinstance(filters, dict) and filters:
On 2017/06/27 13:32:15, Vasily Kuznetsov wrote:
> This code seems to allow things like `filters = []` or `filters = 0` (which
will
> crash inside of `filter_metadata`). The check should probably be something
like
> `if filters is not None and not isinstance(...):` instead.

Acknowledged.

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n...
cms/converters.py:502: if metadata.keys() == ['page']:
On 2017/06/27 13:32:15, Vasily Kuznetsov wrote:
> Is this a requirement that such metadata should be ignored? This is an honest
> question, I see why you might want to do this, but I can also imagine use
cases
> for including those pages.

This is done to keep a uniform api, the previous implementation does not include
files which have no user defined metadata so adding it now may cause a
regression, however this is probably best directed to @julian since he is the
user.

Jon Sonesen

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#newcode145 cms/converters.py:145: for name, value, i in get_page_metadata(params['page'], data): On 2017/06/27 ...

July 3, 2017, 9:06 a.m. (2017-07-03 09:06:07 UTC) #6

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n...
cms/converters.py:145: for name, value, i in get_page_metadata(params['page'],
data):
On 2017/06/27 13:32:15, Vasily Kuznetsov wrote:
> This refactoring moved the code into a separate function, but the code is
still
> very interdependent: this `i` in return tuples has nothing to do with metadata
> and why are we even returning an iterator of tuples, wouldn't it be better to
> return a dictionary instead.
> I propose that we go one step further. How about we change `get_page_metadata`
> to `parse_page(data) -> (metadata, pagecontent)` (where `metadata` is a
> dictionary and `pagecontent` is a string). This function will have all parsing
> logic in it, from splitting lines to making the dictionary out of metadata.
Then
> we can just `params.update(metadata)` and `params[key] = (pagecontent,
> filename)`.

Done.

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n...
cms/converters.py:485: if not isinstance(filters, dict) and filters:
On 2017/06/27 13:32:15, Vasily Kuznetsov wrote:
> This code seems to allow things like `filters = []` or `filters = 0` (which
will
> crash inside of `filter_metadata`). The check should probably be something
like
> `if filters is not None and not isinstance(...):` instead.

Done.

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n...
cms/converters.py:502: if metadata.keys() == ['page']:
On 2017/06/28 14:31:19, Jon Sonesen wrote:
> On 2017/06/27 13:32:15, Vasily Kuznetsov wrote:
> > Is this a requirement that such metadata should be ignored? This is an
honest
> > question, I see why you might want to do this, but I can also imagine use
> cases
> > for including those pages.
> 
> This is done to keep a uniform api, the previous implementation does not
include
> files which have no user defined metadata so adding it now may cause a
> regression, however this is probably best directed to @julian since he is the
> user.

Still waiting on julians opinion, i guess i get whaat you mean here since have
the pages a member regardless of their metadta could be useful

Vasily Kuznetsov

Hey Jon, Looks almost good to me, the comments are basically just naming suggestions. Cheers, ...

July 3, 2017, 5:42 p.m. (2017-07-03 17:42:45 UTC) #7

Hey Jon,

Looks almost good to me, the comments are basically just naming suggestions.

Cheers,
Vasily

https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n...
cms/converters.py:119: def get_page_metadata(page, data):
Perhaps this function should be renamed now since it's actually doing a bit more
than just getting the metadata. I can't think of anything better than something
like `parse_page_content` -- feel free to suggest a better name (we can discuss
on IRC to speed it up).

https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n...
cms/converters.py:120: """Generator which gets per page metadata and cleaned
page content"""
Whenever possible, it's best to write docstrings in the style "Doo foo" or if
we're returning something non-obvious, "Do foo, return bar" or "Do foo, yield
bars" [1]. So perhaps here we could write something like "Separate page content
into metadata (dict) and body text (str)".

[1]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings

https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n...
cms/converters.py:149: page_data, cleaned_page =
get_page_metadata(params['page'], data)
I think the variable naming is somewhat confusing here. `page_data` should
probably be called `metadata`, since we usually call this metadata elsewhere and
`cleaned_page` could be something like `body_text`. What do you think?

https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n...
cms/converters.py:495: page_data, cleaned_page = get_page_metadata(page_name,
data)
We can just take the first part of the tuple that `get_page_metadata` returns
and then I would also suggest renaming the variable for clarity. So we'd end up
with:

    metadata = get_page_metadata(...)[0]

What do you think?

juliandoucette

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#newcode502 cms/converters.py:502: if metadata.keys() == ['page']: On 2017/06/28 14:31:19, Jon Sonesen ...

July 3, 2017, 9:48 p.m. (2017-07-03 21:48:52 UTC) #8

juliandoucette

https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#newcode119 cms/converters.py:119: def get_page_metadata(page, data): On 2017/07/03 17:42:44, Vasily Kuznetsov wrote: ...

July 3, 2017, 9:55 p.m. (2017-07-03 21:55:54 UTC) #9

Vasily Kuznetsov

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#newcode502 cms/converters.py:502: if metadata.keys() == ['page']: On 2017/07/03 21:48:51, juliandoucette wrote: ...

July 4, 2017, 7:43 a.m. (2017-07-04 07:43:49 UTC) #10

juliandoucette

Thank you for clarifying Vasily. My followups below :) https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#newcode502 cms/converters.py:502: ...

July 4, 2017, 9:57 a.m. (2017-07-04 09:57:25 UTC) #11

Vasily Kuznetsov

Thanks for the clarifications, Julian. See my responses below. https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#newcode502 cms/converters.py:502: ...

July 4, 2017, 10:23 a.m. (2017-07-04 10:23:34 UTC) #12

Thanks for the clarifications, Julian. See my responses below.

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n...
cms/converters.py:502: if metadata.keys() == ['page']:
On 2017/07/04 09:57:24, juliandoucette wrote:
> On 2017/07/04 07:43:48, Vasily Kuznetsov wrote:
> > The question is: if a page has no explicit metadata (so it will only have
> 'page'
> > key in the dictionary) but matches the query (probably means that the query
> was
> > empty), 
> 
> Or that I queried by page name?
> 
> > should it be in the results? Current implementation always filters out
> > pages with no explicit metadata.
> 
> So I would get an object like this?
> 
> {
>   "page": "name"
>   "content": "..."
> }
> 
> If so, why would I want to filter this out?

The content is not in the dictionary currently. It can be added, but I would say
that should be another ticket. In any case, I was doubting that you would want
to filter such pages out, thanks for confirming that.

Jon: I would say this `if` should be removed.

https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py
File cms/converters.py (right):

https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n...
cms/converters.py:119: def get_page_metadata(page, data):
On 2017/07/04 09:57:25, juliandoucette wrote:
> On 2017/07/04 07:43:48, Vasily Kuznetsov wrote:
> > Note that this is not the function that gets exposed to the templates. Also,
> > what it does is take contents of one page and separate it into metadata and
> the
> > rest (which is then used to render the page), so the name should reflect
that.
> 
> Oh, sorry. I meant the get_pages_metadata function (the one that is exposed).
It
> does more than get pages metadata no (It includes page contents)? So why not
> rename it something more appropriate (generic)?

Currently the content is not included in metadata and I think it makes sense to
keep it so as far as this ticket is concerned.

P.S. If we include page content, it would be just what's in the file, before the
CMS processing. Not sure if that would be very useful or more confusing.

juliandoucette

Thanks again for clarifying. https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#newcode119 cms/converters.py:119: def get_page_metadata(page, data): On 2017/07/04 ...

July 4, 2017, 10:42 a.m. (2017-07-04 10:42:56 UTC) #13

Jon Sonesen

On 2017/07/04 10:42:56, juliandoucette wrote: > Thanks again for clarifying. > > https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py > File ...

July 4, 2017, 2:57 p.m. (2017-07-04 14:57:59 UTC) #14

Jon Sonesen

https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#newcode149 cms/converters.py:149: page_data, cleaned_page = get_page_metadata(params['page'], data) On 2017/07/03 17:42:44, Vasily ...

July 4, 2017, 2:58 p.m. (2017-07-04 14:58:06 UTC) #15

Jon Sonesen

https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#newcode120 cms/converters.py:120: """Generator which gets per page metadata and cleaned page ...

July 4, 2017, 3:02 p.m. (2017-07-04 15:02:39 UTC) #16

juliandoucette

Question: What is going to happen to acceptableads.com when this lands? (It already defines get_pages_metadata.)

July 6, 2017, 1:27 p.m. (2017-07-06 13:27:51 UTC) #17

Jon Sonesen

On 2017/07/06 13:27:51, juliandoucette wrote: > Question: What is going to happen to http://acceptableads.com when ...

July 6, 2017, 1:29 p.m. (2017-07-06 13:29:06 UTC) #18

Jon Sonesen

On 2017/07/06 14:37:14, Vasily Kuznetsov wrote: > LGTM Julian, what do you think? Do you ...

July 6, 2017, 3:01 p.m. (2017-07-06 15:01:06 UTC) #20

Fine by me.

Issue 29472555: Issue 4867 - Add global get_pages_metadata to template converters (Closed)

Description

Patch Set 1 : #

Patch Set 2 : refactor file metadata parsing, fix tests #

Patch Set 3 : fix interdependency, fix poor filter type checking #

Patch Set 4 : address naming, temp var assignment #

Messages