|
|
Created:
Jan. 2, 2017, 9:30 a.m. by Jon Sonesen Modified:
June 23, 2017, 9:07 a.m. Visibility:
Public. |
DescriptionIssue 4867 - Add Context Function get_pages_metadata to Test Site
Repository: hg.adblockplus.org/cms
Patch Set 1 #
Total comments: 8
Patch Set 2 : addresses read_page usage, updates return data to include page key value #
Total comments: 6
Patch Set 3 : remove per line page assignment fix argument and data access errors #
Total comments: 10
Patch Set 4 : #
Total comments: 6
Patch Set 5 : #
Total comments: 2
Patch Set 6 : #
Total comments: 7
Patch Set 7 : realized that later when we add json the filter syntax will have to change anyway. #
Total comments: 4
Patch Set 8 : #
MessagesTotal messages: 51
Here is my implementation of the page metadata fetcher. I found that splitting the metadata by comma rather than adding extra logic to check whether or not the filter option is a list or having the value always be a list. However I am open to changing this.
Looks good and it looks like it works. However, should not we also add a file for `sitemap` to `expected_output` to make the whole thing execute. https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:23: page_metadata = {} Seems like one level of indentation was lost here somehow.
I looked at the code a bit more and played with the output, see the comments below. https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:12: path = context['source'].page_filename(page_name, _format) You can replace this and the following line with: data, filename = context['source'].read_page(page_name, _format) It basically does the same thing inside. https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:22: def parse_page_metadata(data, page): The return value of this function doesn't contain the `page` key that is given in the examples in the ticket. Also `page` argument to the function is not used, and it was probably intended for that. https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:14: {%- for page in get_pages_metadata({'tags': 'popular, bar'}) -%} I see you changed the API a little from what is described in the ticket. The way it is there this call would be: get_pages_metadata({'tags': ['popular', 'bar']}) I don't mind the change, it seems that it actually simplifies both the implementation and the usage. You should probably update the ticket though, to make sure it's all in sync.
Thanks for taking the time to look closer at this. My apologies on the oversights, which have been addressed in the second patch set. Additionally as per Sebastian's comments regarding where this patch should land I have cc'd Julian so he can get the global file patch once it gets the go ahead to land. https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:12: path = context['source'].page_filename(page_name, _format) On 2017/01/11 10:52:44, Vasily Kuznetsov wrote: > You can replace this and the following line with: > > data, filename = context['source'].read_page(page_name, _format) > > It basically does the same thing inside. Done. https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:22: def parse_page_metadata(data, page): On 2017/01/11 10:52:44, Vasily Kuznetsov wrote: > The return value of this function doesn't contain the `page` key that is given > in the examples in the ticket. Also `page` argument to the function is not used, > and it was probably intended for that. Done. https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:23: page_metadata = {} On 2017/01/10 19:34:25, Vasily Kuznetsov wrote: > Seems like one level of indentation was lost here somehow. Done. https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/pag... File tests/test_site/pages/sitemap.tmpl (right): https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/pag... tests/test_site/pages/sitemap.tmpl:14: {%- for page in get_pages_metadata({'tags': 'popular, bar'}) -%} On 2017/01/11 10:52:44, Vasily Kuznetsov wrote: > I see you changed the API a little from what is described in the ticket. The way > it is there this call would be: > > get_pages_metadata({'tags': ['popular', 'bar']}) > > I don't mind the change, it seems that it actually simplifies both the > implementation and the usage. You should probably update the ticket though, to > make sure it's all in sync. Acknowledged.
On 2017/01/10 19:34:25, Vasily Kuznetsov wrote: > Looks good and it looks like it works. However, should not we also add a file > for `sitemap` to `expected_output` to make the whole thing execute. > > https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... > File tests/test_site/globals/get_pages_metadata.py (right): > > https://codereview.adblockplus.org/29370597/diff/29370598/tests/test_site/glo... > tests/test_site/globals/get_pages_metadata.py:23: page_metadata = {} > Seems like one level of indentation was lost here somehow. In regards to adding the expected output to make it execute. The global actually is executed because of the way the test setup is ran but adding the expected output would just add more test cases. Since this is landing in the helpcenter repo it also may not be relevant.
Hi Jon! Looking better now but there are a few more things to correct, see below. Regarding the correct output for the `sitemap.tmpl`, I think it would be useful to have it for the purposes of testing this code during development and for making the reviewer's job a bit easier. You're right though that it's less important in this case since this change will not be landing in CMS so there's no issue of untested code being added. All in all, I'm happy either way. Cheers, Vasily https://codereview.adblockplus.org/29370597/diff/29371576/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29371576/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:30: page_metadata['page'] = page I don't think we want to execute this for each line. It seems better to start with {'page': page} on line 22 instead. https://codereview.adblockplus.org/29370597/diff/29371576/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:38: if filter_name not in metadata[page]: This change looks like a mistake. `metadata` is already the metadata of the page so doing the `[page]` part is not necessary (actually it leads to an error). https://codereview.adblockplus.org/29370597/diff/29371576/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:41: if option not in metadata[page][filter_name]: Same here regarding the `[page]` part. Also, what if the metadata doesn't contain `filter_name`?
Perhaps we should still add tests to CMS in order to guarantee the APIs used by the help center (i.e. in context['source'].list_pages and context['source'].read_page) to remain stable.
Patch Set 2 I appear to have uploaded a bad patch. https://codereview.adblockplus.org/29370597/diff/29371576/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29371576/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:30: page_metadata['page'] = page On 2017/01/12 10:37:47, Vasily Kuznetsov wrote: > I don't think we want to execute this for each line. It seems better to start > with {'page': page} on line 22 instead. Done. https://codereview.adblockplus.org/29370597/diff/29371576/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:38: if filter_name not in metadata[page]: On 2017/01/12 10:37:47, Vasily Kuznetsov wrote: > This change looks like a mistake. `metadata` is already the metadata of the page > so doing the `[page]` part is not necessary (actually it leads to an error). Yeah, was a fail. Will fix https://codereview.adblockplus.org/29370597/diff/29371576/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:41: if option not in metadata[page][filter_name]: On 2017/01/12 10:37:47, Vasily Kuznetsov wrote: > Same here regarding the `[page]` part. Also, what if the metadata doesn't > contain `filter_name You are right
Addressed previous errors and added a default template as setting the template to empty on every page seems pretty redundant. Also introduced a basic string match test for the sitemap page.
Hi Jon, Looks good. I noticed that we treat multiple filer values for the same field as if they were joined by AND. The ticket seems to not be completely clear on this (I read it as it recommends AND for filters of different fields but nothing really about several filters of the same field). I think that it might be more useful to use OR there. Julian, do you have any preference? Cheers, Vasily https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:40: if option not in metadata[filter_name]: Here if a list of options was given for the same field name, we require that all of them are present. This makes sense in some cases but I wonder if this is what we always want. The ticket is not completely clear about this in my opinion. Perhaps Julian can tell us what would be preferred.
On 2017/01/20 11:53:38, Vasily Kuznetsov wrote: > Hi Jon, > > Looks good. I noticed that we treat multiple filer values for the same field as > if they were joined by AND. The ticket seems to not be completely clear on this > (I read it as it recommends AND for filters of different fields but nothing > really about several filters of the same field). I think that it might be more > useful to use OR there. > > Julian, do you have any preference? > > Cheers, > Vasily > > https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... > File tests/test_site/globals/get_pages_metadata.py (right): > > https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... > tests/test_site/globals/get_pages_metadata.py:40: if option not in > metadata[filter_name]: > Here if a list of options was given for the same field name, we require that all > of them are present. This makes sense in some cases but I wonder if this is what > we always want. The ticket is not completely clear about this in my opinion. > Perhaps Julian can tell us what would be preferred. It seems like it may be better to have it be an OR but since we haven't heard back I will reach out via IRC.
See response below. https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:40: if option not in metadata[filter_name]: On 2017/01/20 11:53:38, Vasily Kuznetsov wrote: > Here if a list of options was given for the same field name, we require that all > of them are present. This makes sense in some cases but I wonder if this is what > we always want. The ticket is not completely clear about this in my opinion. > Perhaps Julian can tell us what would be preferred. Good question. A list of required fields is a requirement for help center. A list of optional fields or mixed required and optional fields is not required for help center. I don't have a strong opinion about if and how you should implement optional fields.
Something came to mind. https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:40: if option not in metadata[filter_name]: On 2017/01/23 16:59:31, juliandoucette wrote: > On 2017/01/20 11:53:38, Vasily Kuznetsov wrote: > > Here if a list of options was given for the same field name, we require that > all > > of them are present. This makes sense in some cases but I wonder if this is > what > > we always want. The ticket is not completely clear about this in my opinion. > > Perhaps Julian can tell us what would be preferred. > > Good question. A list of required fields is a requirement for help center. A > list of optional fields or mixed required and optional fields is not required > for help center. I don't have a strong opinion about if and how you should > implement optional fields. On that topic, I might suggest accepting a function to resolve True/False here. But, again, that is not a requirement for Help Center.
Corrections. I just realized that what I said before didn't make sense. https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:40: if option not in metadata[filter_name]: On 2017/01/23 16:59:31, juliandoucette wrote: > On 2017/01/20 11:53:38, Vasily Kuznetsov wrote: > > Here if a list of options was given for the same field name, we require that > all > > of them are present. This makes sense in some cases but I wonder if this is > what > > we always want. The ticket is not completely clear about this in my opinion. > > Perhaps Julian can tell us what would be preferred. > > Good question. A list of required fields is a requirement for help center. A > list of optional fields or mixed required and optional fields is not required > for help center. I don't have a strong opinion about if and how you should > implement optional fields. - Replace "field" with "value" in my response above - I don't have a strong opinion about if **or** how to implement optional filter values** (Sorry for the confusion)
Thanks for the answer, Julian. I think we got a bit out of sync here, I reformulated the question to make it more clear, see below. https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:40: if option not in metadata[filter_name]: On 2017/01/24 01:10:29, juliandoucette wrote: > On 2017/01/23 16:59:31, juliandoucette wrote: > > On 2017/01/20 11:53:38, Vasily Kuznetsov wrote: > > > Here if a list of options was given for the same field name, we require that > > all > > > of them are present. This makes sense in some cases but I wonder if this is > > what > > > we always want. The ticket is not completely clear about this in my opinion. > > > Perhaps Julian can tell us what would be preferred. > > > > Good question. A list of required fields is a requirement for help center. A > > list of optional fields or mixed required and optional fields is not required > > for help center. I don't have a strong opinion about if and how you should > > implement optional fields. > > - Replace "field" with "value" in my response above > - I don't have a strong opinion about if **or** how to implement optional filter > values** > > (Sorry for the confusion) Ok, I think we're not quite on the same page here so perhaps I formulated my question badly. Here's an example to illustrate what I mean: page1: topic=philosophy page2: topic=philosophy,religion Now if we invoke get_pages_metadata(topic='philosophy,religion'), do we want to get the metadata from page1 and page2 or only from page2? Implementation is easy either way, I was just wondering what would be more useful for you.
See response below. https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:40: if option not in metadata[filter_name]: On 2017/01/24 09:24:46, Vasily Kuznetsov wrote: > On 2017/01/24 01:10:29, juliandoucette wrote: > > On 2017/01/23 16:59:31, juliandoucette wrote: > > > On 2017/01/20 11:53:38, Vasily Kuznetsov wrote: > > > > Here if a list of options was given for the same field name, we require > that > > > all > > > > of them are present. This makes sense in some cases but I wonder if this > is > > > what > > > > we always want. The ticket is not completely clear about this in my > opinion. > > > > Perhaps Julian can tell us what would be preferred. > > > > > > Good question. A list of required fields is a requirement for help center. A > > > list of optional fields or mixed required and optional fields is not > required > > > for help center. I don't have a strong opinion about if and how you should > > > implement optional fields. > > > > - Replace "field" with "value" in my response above > > - I don't have a strong opinion about if **or** how to implement optional > filter > > values** > > > > (Sorry for the confusion) > > Ok, I think we're not quite on the same page here so perhaps I formulated my > question badly. Here's an example to illustrate what I mean: > > page1: > topic=philosophy > > page2: > topic=philosophy,religion > > Now if we invoke get_pages_metadata(topic='philosophy,religion'), do we want to > get the metadata from page1 and page2 or only from page2? > > Implementation is easy either way, I was just wondering what would be more > useful for you. I think I understood your question correctly. Help center requires "only from page 2".
Thanks for the clarification, Julian. LGTM https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:40: if option not in metadata[filter_name]: On 2017/01/24 19:32:55, juliandoucette wrote: > On 2017/01/24 09:24:46, Vasily Kuznetsov wrote: > > On 2017/01/24 01:10:29, juliandoucette wrote: > > > On 2017/01/23 16:59:31, juliandoucette wrote: > > > > On 2017/01/20 11:53:38, Vasily Kuznetsov wrote: > > > > > Here if a list of options was given for the same field name, we require > > that > > > > all > > > > > of them are present. This makes sense in some cases but I wonder if this > > is > > > > what > > > > > we always want. The ticket is not completely clear about this in my > > opinion. > > > > > Perhaps Julian can tell us what would be preferred. > > > > > > > > Good question. A list of required fields is a requirement for help center. > A > > > > list of optional fields or mixed required and optional fields is not > > required > > > > for help center. I don't have a strong opinion about if and how you should > > > > implement optional fields. > > > > > > - Replace "field" with "value" in my response above > > > - I don't have a strong opinion about if **or** how to implement optional > > filter > > > values** > > > > > > (Sorry for the confusion) > > > > Ok, I think we're not quite on the same page here so perhaps I formulated my > > question badly. Here's an example to illustrate what I mean: > > > > page1: > > topic=philosophy > > > > page2: > > topic=philosophy,religion > > > > Now if we invoke get_pages_metadata(topic='philosophy,religion'), do we want > to > > get the metadata from page1 and page2 or only from page2? > > > > Implementation is easy either way, I was just wondering what would be more > > useful for you. > > I think I understood your question correctly. > > Help center requires "only from page 2". Ok, in this case I misunderstood your previous answer. But now it's clear.
On 2017/01/12 11:42:11, Sebastian Noack wrote: > Perhaps we should still add tests to CMS in order to guarantee the APIs used by > the help center (i.e. in context['source'].list_pages and > context['source'].read_page) to remain stable. I agree here, source I should open a ticket for making unit tests on the mercurial source object?
On 2017/01/25 12:07:25, Jon Sonesen wrote: > On 2017/01/12 11:42:11, Sebastian Noack wrote: > > Perhaps we should still add tests to CMS in order to guarantee the APIs used > by > > the help center (i.e. in context['source'].list_pages and > > context['source'].read_page) to remain stable. > > I agree here, source I should open a ticket for making unit tests on the > mercurial source object? I'd say that it could be part of this ticket since the tests are needed for the global that we define here to work reliably. So perhaps you can just add it to this ticket and create another review with the tests.
NOT LGTM https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:28: value = tuple(value.strip().split(',')) 1. It's annoying to have single values as arrays - e.g. {{ post[title][0] }} vs {{ post[title] }} 2. We don't want to split all values by "," - e.g. we may have commas in dates or the description of a page
https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:28: value = tuple(value.strip().split(',')) On 2017/02/27 21:27:31, juliandoucette wrote: > 1. It's annoying to have single values as arrays > - e.g. {{ post[title][0] }} vs {{ post[title] }} > 2. We don't want to split all values by "," > - e.g. we may have commas in dates or the description of a page Yeah, makes sense. I suppose then not every metadata field should be treated as a list by default. Perhaps a reasonable default would be to treat them as a strings. However, there are fields that probably should be lists, for example the tags. We could come up with some kind of notation for those. One way to do it would be to say that if the value starts with a square bracket ("["), then it's a list and it should be split by comma so that: tags = [foo, bar] will produce the value which is a list: ["foo", "bar"] and things like this: date = Wed, Nov 5 will stay a string: "Wed, Nov 5". Another way to do it is to allow something like this syntax for lists: tags = foo bar baz title = My awesome page This would be a bit harder to parse and would require changes to CMS core to be compatible, so it wouldn't be my first preference. We could also adopt a different approach and treat everything as a string but replace the exact matching with substring matching. Then if we have this in the page header: tags = foo, bar and we look for pages tagged with "foo", it would be found. What I don't like about this approach is that a page with this in the header would also be matched: tags = foobar, quux but "foobar" seems like a different tag. We could probably fix that by requiring that if not the whole string is matched, the end of the match that is inside of the string should be at a word boundary or a comma or something like that. It would be more fiddly, but we can probably end up with something reasonably fool-proof. In this case all metadata fields would be returned as strings (nothing would be converted to a list), which might or might not be what you want. Do any of these solutions sound acceptable or do you have other ideas?
Hey Julian, Thanks for pointing this out, I agree here it would be annoying to always be getting [0] if it there is a singular value in the metadata field. Vasily, thanks for the suggestions. While I think they are viable I was thinking it would be easiest to just always make it a list unless the resulting list has a len less than 2 in which case we just set it back to a string. It is not elegant but also it is simple to understand and implement. I guess it is really up to us as far as implementation details since the resulting usage should be the same from Julian's perspective. https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:28: value = tuple(value.strip().split(',')) On 2017/02/28 11:24:09, Vasily Kuznetsov wrote: > On 2017/02/27 21:27:31, juliandoucette wrote: > > 1. It's annoying to have single values as arrays > > - e.g. {{ post[title][0] }} vs {{ post[title] }} > > 2. We don't want to split all values by "," > > - e.g. we may have commas in dates or the description of a page > > Yeah, makes sense. I suppose then not every metadata field should be treated as > a list by default. Perhaps a reasonable default would be to treat them as a > strings. > > However, there are fields that probably should be lists, for example the tags. > We could come up with some kind of notation for those. One way to do it would be > to say that if the value starts with a square bracket ("["), then it's a list > and it should be split by comma so that: > > tags = [foo, bar] > > will produce the value which is a list: ["foo", "bar"] and things like this: > > date = Wed, Nov 5 > > will stay a string: "Wed, Nov 5". > > Another way to do it is to allow something like this syntax for lists: > > tags = foo > bar > baz > title = My awesome page > > This would be a bit harder to parse and would require changes to CMS core to be > compatible, so it wouldn't be my first preference. > > We could also adopt a different approach and treat everything as a string but > replace the exact matching with substring matching. Then if we have this in the > page header: > > tags = foo, bar > > and we look for pages tagged with "foo", it would be found. What I don't like > about this approach is that a page with this in the header would also be > matched: > > tags = foobar, quux > > but "foobar" seems like a different tag. We could probably fix that by requiring > that if not the whole string is matched, the end of the match that is inside of > the string should be at a word boundary or a comma or something like that. It > would be more fiddly, but we can probably end up with something reasonably > fool-proof. In this case all metadata fields would be returned as strings > (nothing would be converted to a list), which might or might not be what you > want. > > Do any of these solutions sound acceptable or do you have other ideas? could we not just check that the value is not more than one and if so don't make it a list?
On 2017/02/28 11:59:18, Jon Sonesen wrote: > Hey Julian, > > Thanks for pointing this out, I agree here it would be annoying to always be > getting [0] if it there is a singular value in the metadata field. > > Vasily, > > thanks for the suggestions. While I think they are viable I was thinking it > would be easiest to just always make it a list unless the resulting list has a > len less than 2 in which case we just set it back to a string. It is not elegant > but also it is simple to understand and implement. > > I guess it is really up to us as far as implementation details since the > resulting usage should be the same from Julian's perspective. > > https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... > File tests/test_site/globals/get_pages_metadata.py (right): > > https://codereview.adblockplus.org/29370597/diff/29372661/tests/test_site/glo... > tests/test_site/globals/get_pages_metadata.py:28: value = > tuple(value.strip().split(',')) > On 2017/02/28 11:24:09, Vasily Kuznetsov wrote: > > On 2017/02/27 21:27:31, juliandoucette wrote: > > > 1. It's annoying to have single values as arrays > > > - e.g. {{ post[title][0] }} vs {{ post[title] }} > > > 2. We don't want to split all values by "," > > > - e.g. we may have commas in dates or the description of a page > > > > Yeah, makes sense. I suppose then not every metadata field should be treated > as > > a list by default. Perhaps a reasonable default would be to treat them as a > > strings. > > > > However, there are fields that probably should be lists, for example the tags. > > We could come up with some kind of notation for those. One way to do it would > be > > to say that if the value starts with a square bracket ("["), then it's a list > > and it should be split by comma so that: > > > > tags = [foo, bar] > > > > will produce the value which is a list: ["foo", "bar"] and things like this: > > > > date = Wed, Nov 5 > > > > will stay a string: "Wed, Nov 5". > > > > Another way to do it is to allow something like this syntax for lists: > > > > tags = foo > > bar > > baz > > title = My awesome page > > > > This would be a bit harder to parse and would require changes to CMS core to > be > > compatible, so it wouldn't be my first preference. > > > > We could also adopt a different approach and treat everything as a string but > > replace the exact matching with substring matching. Then if we have this in > the > > page header: > > > > tags = foo, bar > > > > and we look for pages tagged with "foo", it would be found. What I don't like > > about this approach is that a page with this in the header would also be > > matched: > > > > tags = foobar, quux > > > > but "foobar" seems like a different tag. We could probably fix that by > requiring > > that if not the whole string is matched, the end of the match that is inside > of > > the string should be at a word boundary or a comma or something like that. It > > would be more fiddly, but we can probably end up with something reasonably > > fool-proof. In this case all metadata fields would be returned as strings > > (nothing would be converted to a list), which might or might not be what you > > want. > > > > Do any of these solutions sound acceptable or do you have other ideas? > > could we not just check that the value is not more than one and if so don't make > it a list? Well, Julian above says that some non-list strings have commas in them so they would become broken by this approach. Also in some cases variables that should be lists might have only one element in them (tags for example). If the template depends on those variables being lists, it might break. So it seems like we would need something more robust.
# On one hand String values by default and list values when string starts with "[" and ends with "]". # On the other Why complicate our key=value format? We could support an alternative format e.g. JSON --- What do you think?
On 2017/03/05 21:00:29, juliandoucette wrote: > # On one hand > > String values by default and list values when string starts with "[" and ends > with "]". > > # On the other > > Why complicate our key=value format? We could support an alternative format e.g. > JSON > > --- > > What do you think? I like this idea, perhaps it could be an optional dir in the site directory which holds json files such as '<page-name>.json' there could also be an optional config option which provides a path to the json metadata directory?
> I like this idea, perhaps it could be an optional dir in the site directory > which holds json files such as '<page-name>.json' there could also be an > optional config option which provides a path to the json metadata directory? Separating meta data from page data was not what I had in mind. I was thinking more along the lines of https://issues.adblockplus.org/ticket/4488
On 2017/03/06 18:22:17, juliandoucette wrote: > > I like this idea, perhaps it could be an optional dir in the site directory > > which holds json files such as '<page-name>.json' there could also be an > > optional config option which provides a path to the json metadata directory? > > Separating meta data from page data was not what I had in mind. > > I was thinking more along the lines of > https://issues.adblockplus.org/ticket/4488 Oh I see, I think that is a good solution and an old ticket. We can certainly implement this as well. Since it is another ticket I guess I have to ask if you guys prefer that we put this ticket on hold and implement the json support prior to closing this ticket? If so then I can update the issue to be blocked by #4488 and then get started on that asap. Thanks!
On 2017/03/07 10:29:38, Jon Sonesen wrote: > On 2017/03/06 18:22:17, juliandoucette wrote: > > > I like this idea, perhaps it could be an optional dir in the site directory > > > which holds json files such as '<page-name>.json' there could also be an > > > optional config option which provides a path to the json metadata directory? > > > > Separating meta data from page data was not what I had in mind. > > > > I was thinking more along the lines of > > https://issues.adblockplus.org/ticket/4488 > > Oh I see, > > I think that is a good solution and an old ticket. We can certainly implement > this as well. Since it is another ticket I guess I have to ask if you guys > prefer that we put this ticket on hold and implement the json support prior to > closing this ticket? If so then I can update the issue to be blocked by #4488 > and then get started on that asap. > > Thanks! I would add that even if we do implement the JSON ticket, we still need to decide how to interpret the metadata of the pages using the classic syntax. It's probably also desirable to finish this ticket sooner rather than later. Julian, if you agree with the above, which of the options below would you prefer? 1. All metadata variables are interpreted as strings and `get_pages_metadata` uses exact matching. 1a. All metadata variables are interpreted as strings and `get_pages_metadata` uses substring matching. 2. Metadata variables default to strings but [values, like this] are converted to lists; exact matching is used for strings and element matching is done for list (that is, the query parameter has to be a member of the list). 3. None of the above. Then how do we interpret and match them? Or would you prefer us to put this on hold and work on JSON ticket first (although the choice between 1, 1a, 2 and 3 still needs to be made it seems)? Vasily
Let's go with option 2 for now, given that this is a global function and not a CMS change. > 2. Metadata variables default to strings but [values, like this] are converted > to lists; exact matching is used for strings and element matching is done for > list (that is, the query parameter has to be a member of the list).
On 2017/03/07 11:48:38, juliandoucette wrote: > Let's go with option 2 for now, given that this is a global function and not a > CMS change. > > > 2. Metadata variables default to strings but [values, like this] are converted > > to lists; exact matching is used for strings and element matching is done for > > list (that is, the query parameter has to be a member of the list). Great, I will do this. One question, are tags with commas something you want? If so the syntax would need to include quotes, and also the logic for parsing would be a bit more complicated. If you would like to use strings with commas in a list then I may suggest waiting for the json support.
> Great, I will do this. One question, are tags with commas something you want? If > so the syntax would need to include quotes, and also the logic for parsing would > be a bit more complicated. If you would like to use strings with commas in a > list then I may suggest waiting for the json support. Good question. No, I don't need lists containing values with commas for AAC or help center.
On 2017/03/07 13:36:43, juliandoucette wrote: > > Great, I will do this. One question, are tags with commas something you want? > If > > so the syntax would need to include quotes, and also the logic for parsing > would > > be a bit more complicated. If you would like to use strings with commas in a > > list then I may suggest waiting for the json support. > > Good question. No, I don't need lists containing values with commas for AAC or > help center. Latest patch should address the problem :)
Hey Jon, I have a couple of nits about whitespace handling and being more careful with brackets (see below) but otherwise it looks good. Cheers, Vasily https://codereview.adblockplus.org/29370597/diff/29378682/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29378682/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:28: value = value.strip(' ').strip('\n') Can we just strip everything? Or is there a special reason you want to leave tabs and exotic whitespace in there? https://codereview.adblockplus.org/29370597/diff/29378682/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:29: if value.startswith('['): Maybe add "and value.endswith(']')" here to be a bit more careful? https://codereview.adblockplus.org/29370597/diff/29378682/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:30: value = value[1:-1].split(',') We should probably filter the split value through strip again to remove the whitespace that could be around commas.
Thanks for pointing out those things :) https://codereview.adblockplus.org/29370597/diff/29378682/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29378682/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:28: value = value.strip(' ').strip('\n') On 2017/03/08 17:07:41, Vasily Kuznetsov wrote: > Can we just strip everything? Or is there a special reason you want to leave > tabs and exotic whitespace in there? Nah, we should just strip everything, will do https://codereview.adblockplus.org/29370597/diff/29378682/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:29: if value.startswith('['): On 2017/03/08 17:07:41, Vasily Kuznetsov wrote: > Maybe add "and value.endswith(']')" here to be a bit more careful? Agreed. https://codereview.adblockplus.org/29370597/diff/29378682/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:30: value = value[1:-1].split(',') On 2017/03/08 17:07:41, Vasily Kuznetsov wrote: > We should probably filter the split value through strip again to remove the > whitespace that could be around commas. no problem :)
https://codereview.adblockplus.org/29370597/diff/29378942/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29378942/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:30: value = value[1:-1].strip().split(',') It should be the other way around. We want to strip each element of the list after splitting, not the whole list before splitting. So we need something like this: value = [element.strip() for element in value[1:-1].split(',')]
On 2017/03/09 09:46:13, Vasily Kuznetsov wrote: > https://codereview.adblockplus.org/29370597/diff/29378942/tests/test_site/glo... > File tests/test_site/globals/get_pages_metadata.py (right): > > https://codereview.adblockplus.org/29370597/diff/29378942/tests/test_site/glo... > tests/test_site/globals/get_pages_metadata.py:30: value = > value[1:-1].strip().split(',') > It should be the other way around. We want to strip each element of the list > after splitting, not the whole list before splitting. So we need something like > this: > > value = [element.strip() for element in value[1:-1].split(',')] Ohhh sorry I didn't understand you meant to actually filter the element
https://codereview.adblockplus.org/29370597/diff/29378942/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29378942/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:30: value = value[1:-1].strip().split(',') On 2017/03/09 09:46:13, Vasily Kuznetsov wrote: > It should be the other way around. We want to strip each element of the list > after splitting, not the whole list before splitting. So we need something like > this: > > value = [element.strip() for element in value[1:-1].split(',')] Done.
Hey Jon, I see you changed the matching algoritm in `filter_metadata` -- good catch, I didn't even pay attention to this. However, I think there are couple more things to iron out there. Cheers, Vasilty https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:41: for option in filter_value.split(','): We should probably only split if the `metadata[filter_name]` is a list. Otherwise if we have a date field and want to match by it, it won't work. https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:45: return True But if you return `True` here, would not this ignore the following filters?
Thanks for catching these :) https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:41: for option in filter_value.split(','): On 2017/03/10 10:23:38, Vasily Kuznetsov wrote: > We should probably only split if the `metadata[filter_name]` is a list. > Otherwise if we have a date field and want to match by it, it won't work. I guess I thought that this didnt matter since julian said they wont need dates or any strings with commas inside the config fields. However I understand what you mean here as once we move to json compatibility this will be a concern, I will address this. https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:45: return True On 2017/03/10 10:23:38, Vasily Kuznetsov wrote: > But if you return `True` here, would not this ignore the following filters? I thought we were doing an 'or' selection so if one of the values matches the page is added. If this is wrong I will change it.
Just realized we will have the change the filter syntax to json see coment https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:41: for option in filter_value.split(','): On 2017/03/10 11:01:32, Jon Sonesen wrote: > On 2017/03/10 10:23:38, Vasily Kuznetsov wrote: > > We should probably only split if the `metadata[filter_name]` is a list. > > Otherwise if we have a date field and want to match by it, it won't work. > > I guess I thought that this didnt matter since julian said they wont need dates > or any strings with commas inside the config fields. However I understand what > you mean here as once we move to json compatibility this will be a concern, I > will address this. Actually in this case we will have to change the filter syntax to just accept json as well, this way you can filter on values which have commas currently it is like this filter={'tags':'popular, bar'} which is nice and simple but also lacks the robust-yness required to filter based on values once we add json support.
realized that later when we add json the filter syntax will have to change anyway.
realized that later when we add json the filter syntax will have to change anyway.
Hey Jon, I like the change in the API towards accepting lists for filter values instead of splitting them by comma. This is more unambiguous and more robust. However, there are still a couple of corner cases where it seems current logic would break. I guess we should have added some more proper tests for this from the start, otherwise hard to keep all the different possibilities in the head when changing `filter_metadata`. Sorry for being picky ;) Cheers, Vasily https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:41: for option in filter_value.split(','): On 2017/03/10 11:43:38, Jon Sonesen wrote: > On 2017/03/10 11:01:32, Jon Sonesen wrote: > > On 2017/03/10 10:23:38, Vasily Kuznetsov wrote: > > > We should probably only split if the `metadata[filter_name]` is a list. > > > Otherwise if we have a date field and want to match by it, it won't work. > > > > I guess I thought that this didnt matter since julian said they wont need > dates > > or any strings with commas inside the config fields. However I understand what > > you mean here as once we move to json compatibility this will be a concern, I > > will address this. > > Actually in this case we will have to change the filter syntax to just accept > json as well, this way you can filter on values which have commas currently it > is like this filter={'tags':'popular, bar'} which is nice and simple but also > lacks the robust-yness required to filter based on values once we add json > support. Yeah, this is better. Splitting the filter values was error-prone. https://codereview.adblockplus.org/29370597/diff/29379591/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:45: return True On 2017/03/10 11:01:32, Jon Sonesen wrote: > On 2017/03/10 10:23:38, Vasily Kuznetsov wrote: > > But if you return `True` here, would not this ignore the following filters? > > I thought we were doing an 'or' selection so if one of the values matches the > page is added. If this is wrong I will change it. Yes, the docstring of the function in the issue says "A list of metadata dicts for pages matching all the filters". BTW, perhaps it makes sense to include the docstring into the patch here as well. https://codereview.adblockplus.org/29370597/diff/29379775/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29379775/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:42: for option in filter_value: How about the situation when the field is a list but the filter value is a string? For example if tags=[foo, bar] and filters={'tags': 'foo'}. I would say the page should be included but when it iterates over 'foo' (as opposed to ['foo']), it will not find 'f' in ['foo', 'bar'] and so False will be returned. If you agree that get_pages_metadata(filters={'tags': 'foo'}) should return a page with tags=[foo, bar] in the header, perhaps the solution here would be to check if `filter_value` is an instance of `basestring` and convert it to a one-item list in this case. https://codereview.adblockplus.org/29370597/diff/29379775/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:45: if filter_value != metadata[filter_name]: Should not this be `elif` instead of `if`. It seems like executing this for listy fields would create false negatives. For example if we have a page that has tags=[foo, bar] and then we do a search with the filters={'tags': ['bar', 'foo']}, this branch would trigger and return False (and I don't think it makes sense to make tags order-dependent).
Hey Vasily, Thank you for catching those things, my bad for the oversight. I will address this. https://codereview.adblockplus.org/29370597/diff/29379775/tests/test_site/glo... File tests/test_site/globals/get_pages_metadata.py (right): https://codereview.adblockplus.org/29370597/diff/29379775/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:42: for option in filter_value: On 2017/03/15 18:02:58, Vasily Kuznetsov wrote: > How about the situation when the field is a list but the filter value is a > string? For example if tags=[foo, bar] and filters={'tags': 'foo'}. I would say > the page should be included but when it iterates over 'foo' (as opposed to > ['foo']), it will not find 'f' in ['foo', 'bar'] and so False will be returned. > > If you agree that get_pages_metadata(filters={'tags': 'foo'}) should return a > page with tags=[foo, bar] in the header, perhaps the solution here would be to > check if `filter_value` is an instance of `basestring` and convert it to a > one-item list in this case. Odd, I thought I had actually changed this. But perhaps it was in a shelved change that i forgot about. Either way yes I agree that there should be handling for the ase that the filter is just a single string value. Thanks for bringing this up. https://codereview.adblockplus.org/29370597/diff/29379775/tests/test_site/glo... tests/test_site/globals/get_pages_metadata.py:45: if filter_value != metadata[filter_name]: On 2017/03/15 18:02:58, Vasily Kuznetsov wrote: > Should not this be `elif` instead of `if`. It seems like executing this for > listy fields would create false negatives. For example if we have a page that > has tags=[foo, bar] and then we do a search with the filters={'tags': ['bar', > 'foo']}, this branch would trigger and return False (and I don't think it makes > sense to make tags order-dependent). Oh yeah, that is a good catch.
LGTM
On 2017/03/17 10:35:38, Vasily Kuznetsov wrote: > LGTM What are our plans around integrating this into adblockplus/cms in the future again?
On 2017/03/28 16:56:42, juliandoucette wrote: > On 2017/03/17 10:35:38, Vasily Kuznetsov wrote: > > LGTM > > What are our plans around integrating this into adblockplus/cms in the future > again? I would say if you're happy with how it works and you would like to use it in more than one website, we can create an issue for moving this into CMS core. We'll need to adapt it a bit to remove the repetition of the code that reads metadata from pages but otherwise it should not be too complicated.
> I would say if you're happy with how it works and you would like to use it in > more than one website, we can create an issue for moving this into CMS core. > We'll need to adapt it a bit to remove the repetition of the code that reads > metadata from pages but otherwise it should not be too complicated. Sounds good. I'll leave more feedback here after I have tested this in help center.
I'm probably going to use this to complete #5043 on acceptableads.com. I think this warrants implementing this function in adblockplus/cms.
Message was sent while issue was closed.
On 2017/06/16 19:44:14, juliandoucette wrote: > I'm probably going to use this to complete #5043 on http://acceptableads.com. I think > this warrants implementing this function in adblockplus/cms. Sure, closing this review and opening another for the merge into a default global |