|
|
Created:
June 23, 2017, 9:51 a.m. by Jon Sonesen Modified:
Aug. 10, 2017, 8:07 p.m. Reviewers:
Vasily Kuznetsov CC:
Sebastian Noack, juliandoucette Visibility:
Public. |
DescriptionIssue 4867 - Add global get_pages_metadata to template converters
Patch Set 1 : #
Total comments: 6
Patch Set 2 : refactor file metadata parsing, fix tests #
Total comments: 13
Patch Set 3 : fix interdependency, fix poor filter type checking #
Total comments: 12
Patch Set 4 : address naming, temp var assignment #
MessagesTotal messages: 21
https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py#n... cms/converters.py:490: def parse_page_metadata(self, data, page): This code is essentially duplicating the logic in the init function of the Converter class, Vasily and I discussed this and the options were to break the logic out into a class function of Converters, make it an utils.py function or use it as a function in the converters namespace. We chose to put it in the converters.py namespace as a function because it makes no sense in utils since it is page specific logic, but it is not specific enough to a given page's instance of its own converter class to be a class function. I will break this out into its own function in the next patch set if everyone agrees this makes sense https://codereview.adblockplus.org/29472555/diff/29472565/tests/conftest.py File tests/conftest.py (right): https://codereview.adblockplus.org/29472555/diff/29472565/tests/conftest.py#n... tests/conftest.py:22: 'bar.md')]) I modified the bookmark to introduce a template rather than a page because the goal is to test the revision argument, and the fact that it selects the proper revision, this makes is easy to test that without affecting the page data assertions in the test_static and test_dyanmic methods
See my comments below. Also, the ticket number in the title is wrong. https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py#n... cms/converters.py:490: def parse_page_metadata(self, data, page): On 2017/06/23 10:09:26, Jon Sonesen wrote: > This code is essentially duplicating the logic in the init function of the > Converter class, Vasily and I discussed this and the options were to break the > logic out into a class function of Converters, make it an utils.py function or > use it as a function in the converters namespace. > > We chose to put it in the converters.py namespace as a function because it makes > no sense in utils since it is page specific logic, but it is not specific enough > to a given page's instance of its own converter class to be a class function. > > I will break this out into its own function in the next patch set if everyone > agrees this makes sense As discussed, this approach sounds good. Now looking at these 3 functions that we're adding to `TemplateConverter` it starts looking like we should separate all the default globals out into their own file(s). They are not really part of the converter logic but are more like a set of services that we provide to the template -- it doesn't seem right to pollute the converter class with this stuff. The globals often access `self._params`, which technically is a private attribute of the converter, but logically that thing is a rendering context and it actually becomes the context (in jinja sense) of the templates so we will be able to get it using `contextfunction` decorator. There's also `self._get_locale_data()` that is used by the globals, but I'm actually wondering if `self._params['localedata']` should be used instead (it wouldn't load file from the disk the locale every time and it also supports locale overrides...). I guess we should ask Wladimir why it's done this way (it's from this change: https://hg.adblockplus.org/cms/rev/b022896ef69a). Anyway, you can do the metadata loading refactoring already and perhaps the separation of the globals will land as a separate change. https://codereview.adblockplus.org/29472555/diff/29472565/tests/test_page_out... File tests/test_page_outputs.py (right): https://codereview.adblockplus.org/29472555/diff/29472565/tests/test_page_out... tests/test_page_outputs.py:73: def test_revision_arg(revision, temp_site): Is this test related to `get_pages_metadata` global?
https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29472565/cms/converters.py#n... cms/converters.py:490: def parse_page_metadata(self, data, page): On 2017/06/23 14:15:24, Vasily Kuznetsov wrote: > On 2017/06/23 10:09:26, Jon Sonesen wrote: > > This code is essentially duplicating the logic in the init function of the > > Converter class, Vasily and I discussed this and the options were to break the > > logic out into a class function of Converters, make it an utils.py function or > > use it as a function in the converters namespace. > > > > We chose to put it in the converters.py namespace as a function because it > makes > > no sense in utils since it is page specific logic, but it is not specific > enough > > to a given page's instance of its own converter class to be a class function. > > > > I will break this out into its own function in the next patch set if everyone > > agrees this makes sense > > As discussed, this approach sounds good. > > Now looking at these 3 functions that we're adding to `TemplateConverter` it > starts looking like we should separate all the default globals out into their > own file(s). They are not really part of the converter logic but are more like a > set of services that we provide to the template -- it doesn't seem right to > pollute the converter class with this stuff. The globals often access > `self._params`, which technically is a private attribute of the converter, but > logically that thing is a rendering context and it actually becomes the context > (in jinja sense) of the templates so we will be able to get it using > `contextfunction` decorator. There's also `self._get_locale_data()` that is used > by the globals, but I'm actually wondering if `self._params['localedata']` > should be used instead (it wouldn't load file from the disk the locale every > time and it also supports locale overrides...). I guess we should ask Wladimir > why it's done this way (it's from this change: > https://hg.adblockplus.org/cms/rev/b022896ef69a). > > Anyway, you can do the metadata loading refactoring already and perhaps the > separation of the globals will land as a separate change. Yeah I totally agree here, and actually we talked about this in the past (not to this extent detail wise) the fact that we could break out globals and/or filters out of the converters file tp make it cleaner to extend in the future. Regarding the locale_data changes I agree here, since instantiating any converter will override the locale data with user specified parameters. But maybe there is a side effect we are not considering, or are unaware of. https://codereview.adblockplus.org/29472555/diff/29472565/tests/test_page_out... File tests/test_page_outputs.py (right): https://codereview.adblockplus.org/29472555/diff/29472565/tests/test_page_out... tests/test_page_outputs.py:73: def test_revision_arg(revision, temp_site): On 2017/06/23 14:15:24, Vasily Kuznetsov wrote: > Is this test related to `get_pages_metadata` global? Yeah, so I actually shouldn't have changed this. Did a nono, and changed the tests to pass with the new code. But basically since I changed the metadata parsing to be a generator I had to change the 'page_data' instantiation to happen before calling the generator. This caused the page added by the revision arg's test bookmark to get added to the page rendering on the filterless get_pages_metadata calls, even though there was no metadata there the dict with the key 'page': page name was getting through the filter. I fixy.
https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:145: for name, value, i in get_page_metadata(params['page'], data): This refactoring moved the code into a separate function, but the code is still very interdependent: this `i` in return tuples has nothing to do with metadata and why are we even returning an iterator of tuples, wouldn't it be better to return a dictionary instead. I propose that we go one step further. How about we change `get_page_metadata` to `parse_page(data) -> (metadata, pagecontent)` (where `metadata` is a dictionary and `pagecontent` is a string). This function will have all parsing logic in it, from splitting lines to making the dictionary out of metadata. Then we can just `params.update(metadata)` and `params[key] = (pagecontent, filename)`. https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:485: if not isinstance(filters, dict) and filters: This code seems to allow things like `filters = []` or `filters = 0` (which will crash inside of `filter_metadata`). The check should probably be something like `if filters is not None and not isinstance(...):` instead. https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:502: if metadata.keys() == ['page']: Is this a requirement that such metadata should be ignored? This is an honest question, I see why you might want to do this, but I can also imagine use cases for including those pages.
https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:145: for name, value, i in get_page_metadata(params['page'], data): On 2017/06/27 13:32:15, Vasily Kuznetsov wrote: > This refactoring moved the code into a separate function, but the code is still > very interdependent: this `i` in return tuples has nothing to do with metadata > and why are we even returning an iterator of tuples, wouldn't it be better to > return a dictionary instead. > I propose that we go one step further. How about we change `get_page_metadata` > to `parse_page(data) -> (metadata, pagecontent)` (where `metadata` is a > dictionary and `pagecontent` is a string). This function will have all parsing > logic in it, from splitting lines to making the dictionary out of metadata. Then > we can just `params.update(metadata)` and `params[key] = (pagecontent, > filename)`. The line index is required to strip the metadata from the page data which gets inserted to params['page_content']. https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:485: if not isinstance(filters, dict) and filters: On 2017/06/27 13:32:15, Vasily Kuznetsov wrote: > This code seems to allow things like `filters = []` or `filters = 0` (which will > crash inside of `filter_metadata`). The check should probably be something like > `if filters is not None and not isinstance(...):` instead. Acknowledged. https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:502: if metadata.keys() == ['page']: On 2017/06/27 13:32:15, Vasily Kuznetsov wrote: > Is this a requirement that such metadata should be ignored? This is an honest > question, I see why you might want to do this, but I can also imagine use cases > for including those pages. This is done to keep a uniform api, the previous implementation does not include files which have no user defined metadata so adding it now may cause a regression, however this is probably best directed to @julian since he is the user.
https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:145: for name, value, i in get_page_metadata(params['page'], data): On 2017/06/27 13:32:15, Vasily Kuznetsov wrote: > This refactoring moved the code into a separate function, but the code is still > very interdependent: this `i` in return tuples has nothing to do with metadata > and why are we even returning an iterator of tuples, wouldn't it be better to > return a dictionary instead. > I propose that we go one step further. How about we change `get_page_metadata` > to `parse_page(data) -> (metadata, pagecontent)` (where `metadata` is a > dictionary and `pagecontent` is a string). This function will have all parsing > logic in it, from splitting lines to making the dictionary out of metadata. Then > we can just `params.update(metadata)` and `params[key] = (pagecontent, > filename)`. Done. https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:485: if not isinstance(filters, dict) and filters: On 2017/06/27 13:32:15, Vasily Kuznetsov wrote: > This code seems to allow things like `filters = []` or `filters = 0` (which will > crash inside of `filter_metadata`). The check should probably be something like > `if filters is not None and not isinstance(...):` instead. Done. https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:502: if metadata.keys() == ['page']: On 2017/06/28 14:31:19, Jon Sonesen wrote: > On 2017/06/27 13:32:15, Vasily Kuznetsov wrote: > > Is this a requirement that such metadata should be ignored? This is an honest > > question, I see why you might want to do this, but I can also imagine use > cases > > for including those pages. > > This is done to keep a uniform api, the previous implementation does not include > files which have no user defined metadata so adding it now may cause a > regression, however this is probably best directed to @julian since he is the > user. Still waiting on julians opinion, i guess i get whaat you mean here since have the pages a member regardless of their metadta could be useful
Hey Jon, Looks almost good to me, the comments are basically just naming suggestions. Cheers, Vasily https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:119: def get_page_metadata(page, data): Perhaps this function should be renamed now since it's actually doing a bit more than just getting the metadata. I can't think of anything better than something like `parse_page_content` -- feel free to suggest a better name (we can discuss on IRC to speed it up). https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:120: """Generator which gets per page metadata and cleaned page content""" Whenever possible, it's best to write docstrings in the style "Doo foo" or if we're returning something non-obvious, "Do foo, return bar" or "Do foo, yield bars" [1]. So perhaps here we could write something like "Separate page content into metadata (dict) and body text (str)". [1]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:149: page_data, cleaned_page = get_page_metadata(params['page'], data) I think the variable naming is somewhat confusing here. `page_data` should probably be called `metadata`, since we usually call this metadata elsewhere and `cleaned_page` could be something like `body_text`. What do you think? https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:495: page_data, cleaned_page = get_page_metadata(page_name, data) We can just take the first part of the tuple that `get_page_metadata` returns and then I would also suggest renaming the variable for clarity. So we'd end up with: metadata = get_page_metadata(...)[0] What do you think?
https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:502: if metadata.keys() == ['page']: On 2017/06/28 14:31:19, Jon Sonesen wrote: > however this is probably best directed to @julian since he is the user. I don't understand the question. Can you please explain?
https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:119: def get_page_metadata(page, data): On 2017/07/03 17:42:44, Vasily Kuznetsov wrote: > Perhaps this function should be renamed now since it's actually doing a bit more > than just getting the metadata. I can't think of anything better than something > like `parse_page_content` -- feel free to suggest a better name (we can discuss > on IRC to speed it up). [`get`, `query`, `pages`, `get_pages`, ...] I like short names.
https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:502: if metadata.keys() == ['page']: On 2017/07/03 21:48:51, juliandoucette wrote: > On 2017/06/28 14:31:19, Jon Sonesen wrote: > > however this is probably best directed to @julian since he is the user. > > I don't understand the question. Can you please explain? > The question is: if a page has no explicit metadata (so it will only have 'page' key in the dictionary) but matches the query (probably means that the query was empty), should it be in the results? Current implementation always filters out pages with no explicit metadata. https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:119: def get_page_metadata(page, data): On 2017/07/03 21:55:54, juliandoucette wrote: > On 2017/07/03 17:42:44, Vasily Kuznetsov wrote: > > Perhaps this function should be renamed now since it's actually doing a bit > more > > than just getting the metadata. I can't think of anything better than > something > > like `parse_page_content` -- feel free to suggest a better name (we can > discuss > > on IRC to speed it up). > > [`get`, `query`, `pages`, `get_pages`, ...] I like short names. Note that this is not the function that gets exposed to the templates. Also, what it does is take contents of one page and separate it into metadata and the rest (which is then used to render the page), so the name should reflect that.
Thank you for clarifying Vasily. My followups below :) https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:502: if metadata.keys() == ['page']: On 2017/07/04 07:43:48, Vasily Kuznetsov wrote: > The question is: if a page has no explicit metadata (so it will only have 'page' > key in the dictionary) but matches the query (probably means that the query was > empty), Or that I queried by page name? > should it be in the results? Current implementation always filters out > pages with no explicit metadata. So I would get an object like this? { "page": "name" "content": "..." } If so, why would I want to filter this out? https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:119: def get_page_metadata(page, data): On 2017/07/04 07:43:48, Vasily Kuznetsov wrote: > Note that this is not the function that gets exposed to the templates. Also, > what it does is take contents of one page and separate it into metadata and the > rest (which is then used to render the page), so the name should reflect that. Oh, sorry. I meant the get_pages_metadata function (the one that is exposed). It does more than get pages metadata no (It includes page contents)? So why not rename it something more appropriate (generic)?
Thanks for the clarifications, Julian. See my responses below. https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29474555/cms/converters.py#n... cms/converters.py:502: if metadata.keys() == ['page']: On 2017/07/04 09:57:24, juliandoucette wrote: > On 2017/07/04 07:43:48, Vasily Kuznetsov wrote: > > The question is: if a page has no explicit metadata (so it will only have > 'page' > > key in the dictionary) but matches the query (probably means that the query > was > > empty), > > Or that I queried by page name? > > > should it be in the results? Current implementation always filters out > > pages with no explicit metadata. > > So I would get an object like this? > > { > "page": "name" > "content": "..." > } > > If so, why would I want to filter this out? The content is not in the dictionary currently. It can be added, but I would say that should be another ticket. In any case, I was doubting that you would want to filter such pages out, thanks for confirming that. Jon: I would say this `if` should be removed. https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:119: def get_page_metadata(page, data): On 2017/07/04 09:57:25, juliandoucette wrote: > On 2017/07/04 07:43:48, Vasily Kuznetsov wrote: > > Note that this is not the function that gets exposed to the templates. Also, > > what it does is take contents of one page and separate it into metadata and > the > > rest (which is then used to render the page), so the name should reflect that. > > Oh, sorry. I meant the get_pages_metadata function (the one that is exposed). It > does more than get pages metadata no (It includes page contents)? So why not > rename it something more appropriate (generic)? Currently the content is not included in metadata and I think it makes sense to keep it so as far as this ticket is concerned. P.S. If we include page content, it would be just what's in the file, before the CMS processing. Not sure if that would be very useful or more confusing.
Thanks again for clarifying. https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:119: def get_page_metadata(page, data): On 2017/07/04 10:23:34, Vasily Kuznetsov wrote: > Currently the content is not included in metadata and I think it makes sense to > keep it so as far as this ticket is concerned. > > P.S. If we include page content, it would be just what's in the file, before the > CMS processing. Not sure if that would be very useful or more confusing. Definitely more confusing. I don't really care if the content is there (because I haven't had to use it yet) - but it does logically follow that if the processed content is not in this object then I should be able to pass this object or the page name to another function to get it (another ticket ~someday).
On 2017/07/04 10:42:56, juliandoucette wrote: > Thanks again for clarifying. > > https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py > File cms/converters.py (right): > > https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... > cms/converters.py:119: def get_page_metadata(page, data): > On 2017/07/04 10:23:34, Vasily Kuznetsov wrote: > > Currently the content is not included in metadata and I think it makes sense > to > > keep it so as far as this ticket is concerned. > > > > P.S. If we include page content, it would be just what's in the file, before > the > > CMS processing. Not sure if that would be very useful or more confusing. > > Definitely more confusing. > I agree here, I think for now we should leave it how it is. If we wanna add it later this is rather trivial
https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:149: page_data, cleaned_page = get_page_metadata(params['page'], data) On 2017/07/03 17:42:44, Vasily Kuznetsov wrote: > I think the variable naming is somewhat confusing here. `page_data` should > probably be called `metadata`, since we usually call this metadata elsewhere and > `cleaned_page` could be something like `body_text`. What do you think? Agree here, ack https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:495: page_data, cleaned_page = get_page_metadata(page_name, data) On 2017/07/03 17:42:44, Vasily Kuznetsov wrote: > We can just take the first part of the tuple that `get_page_metadata` returns > and then I would also suggest renaming the variable for clarity. So we'd end up > with: > > metadata = get_page_metadata(...)[0] > > What do you think? Acknowledged.
https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py File cms/converters.py (right): https://codereview.adblockplus.org/29472555/diff/29478561/cms/converters.py#n... cms/converters.py:120: """Generator which gets per page metadata and cleaned page content""" On 2017/07/03 17:42:44, Vasily Kuznetsov wrote: > Whenever possible, it's best to write docstrings in the style "Doo foo" or if > we're returning something non-obvious, "Do foo, return bar" or "Do foo, yield > bars" [1]. So perhaps here we could write something like "Separate page content > into metadata (dict) and body text (str)". > > [1]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings Acknowledged.
Question: What is going to happen to acceptableads.com when this lands? (It already defines get_pages_metadata.)
On 2017/07/06 13:27:51, juliandoucette wrote: > Question: What is going to happen to http://acceptableads.com when this lands? (It > already defines get_pages_metadata.) The cms globals will use the user defined global, when you remove that definition it will use the default, at least this was my experience during testing
LGTM
On 2017/07/06 14:37:14, Vasily Kuznetsov wrote: > LGTM Julian, what do you think? Do you feel comfortable with me pushing this? You can move yourself to cc if you wish as well.
Fine by me. |