|
|
Created:
July 23, 2017, 4:35 p.m. by Vasily Kuznetsov Modified:
Aug. 4, 2017, 6:17 p.m. CC:
ire, saroyanm Visibility:
Public. |
DescriptionFixes 5343 - add global function: get_canonical_url
Repository: https://hg.adblockplus.org/cms
Base revision: b83af9f98bb3
Patch Set 1 #
Total comments: 3
Patch Set 2 : Implement get_canonical_url (according to updated ticket) #Patch Set 3 : Add comment about test-type-specific expected output handling #
Total comments: 1
MessagesTotal messages: 22
Hi Jon and Julian! Here's an implementation of `get_page_url`. At the moment it has two questionable features, which we should perhaps discuss: 1. It allows generating URLs of pages that don't exist. The existence check can be added, but this would raise additional questions (e.g. if the page doesn't exist as specified, but exists in another locale(s), what should we do?), so I'd like to check first if the check is needed at all. 2. Test server overrides `siteurl` configuration variable (which we use as the base URL), so the URL generated during test server run will be different from what static generation produces. The override changes `siteurl` to "http://localhost:5000/", which is the address on which test server runs, so the URLs coming out of `get_page_url` will refer to the local site, which might be good during testing. I'm not sure if this behavior is desirable. I had to modify the test suite a bit and introduce the ability to have different expected outputs for dynamic and static run in order to accommodate #2 above. If we decide that it's better to always use `siteurl` as configured in `get_page_url`, this code won't be necessary. Looking forward to your feedback. Cheers, Vasily https://codereview.adblockplus.org/29495555/diff/29495556/tests/expected_outp... File tests/expected_output/sitemap (right): https://codereview.adblockplus.org/29495555/diff/29495556/tests/expected_outp... tests/expected_output/sitemap:6: <ul><li>title : filter </li><li>title : get_page_url </li><li>title : global </li><li>title : sitemap </li><li>title : translate </li> This change is not completely related but I needed to adjust this to make the existing test for `sitemap` pass. That test should be probably made more robust, but not in this review. https://codereview.adblockplus.org/29495555/diff/29495556/tests/test_page_out... File tests/test_page_outputs.py (right): https://codereview.adblockplus.org/29495555/diff/29495556/tests/test_page_out... tests/test_page_outputs.py:17: return_data[output_file] = f.read().strip() Increase test robustness a bit. https://codereview.adblockplus.org/29495555/diff/29495556/tests/test_page_out... tests/test_page_outputs.py:62: yield 'http://localhost:5000/en/' This was producing "root" as current locale in the test, which is not really ideal. Not completely related, but it makes `get_page_url` test output more sane.
On 2017/07/23 16:55:33, Vasily Kuznetsov wrote: > 1. It allows generating URLs of pages that don't exist. The existence check can > be added, but this would raise additional questions (e.g. if the page doesn't > exist as specified, but exists in another locale(s), what should we do?), so I'd > like to check first if the check is needed at all. Doesn't `generate_static_pages` and `runserver` throw a warning when it encounters a page that cannot resolve already? > 2. Test server overrides `siteurl` configuration variable (which we use as the > base URL), so the URL generated during test server run will be different from > what static generation produces. The override changes `siteurl` to > "http://localhost:5000/", which is the address on which test server runs, so the > URLs coming out of `get_page_url` will refer to the local site, which might be > good during testing. I'm not sure if this behavior is desirable. Does this override also override our command line options for address and port? I like that I can set a site_url in settings.ini and not worry about manually overriding it using command line options to run runserver locally (as long as I can override this behaviour using command line options). Note: We don't have a command line option for protocol? --- *thinking out loud* I don't mind moving global functions to website-defaults if CMS provides everything we need to write them conveniently using @contextfunction
Thanks for the answers, Julian, see below. On 2017/07/24 15:04:13, juliandoucette wrote: > On 2017/07/23 16:55:33, Vasily Kuznetsov wrote: > > 1. It allows generating URLs of pages that don't exist. The existence check > can > > be added, but this would raise additional questions (e.g. if the page doesn't > > exist as specified, but exists in another locale(s), what should we do?), so > I'd > > like to check first if the check is needed at all. > > Doesn't `generate_static_pages` and `runserver` throw a warning when it > encounters a page that cannot resolve already? Indeed, the code that generates URLs in CMS does that, but I'm not using that code because it does some other things like changing locales and it doesn't support using siteurl or producing canonical links. I wasn't sure if we actually want all that additional behavior, but if you would like `get_page_url` to behave consistently with other link generation (meaning that sometimes locale will be changed), then this would be a better way. If I do that, generating broken links will produce a warning. > > 2. Test server overrides `siteurl` configuration variable (which we use as the > > base URL), so the URL generated during test server run will be different from > > what static generation produces. The override changes `siteurl` to > > "http://localhost:5000/", which is the address on which test server runs, so > the > > URLs coming out of `get_page_url` will refer to the local site, which might be > > good during testing. I'm not sure if this behavior is desirable. > > Does this override also override our command line options for address and port? If you mean that when you run it with -a foo.bar -p 3372, then `siteurl` will effectively be changed to "http://foo.bar:3372/", then yes. > I like that I can set a site_url in settings.ini and not worry about manually > overriding it using command line options to run runserver locally (as long as I > can override this behaviour using command line options). `siteurl` in settings doesn't affect on which address and port the server will listen, it only affects what links would be generated, so it's not the same thing as command line options for host and port. The question that I'm pondering though is whether runserver should generate URLs that refer to itself (and then use something like http://localhost:5000/ as the base url) or to the target server (and then use the `siteurl` from the config). I'm not sure, from your answer, which one you would prefer. > Note: We don't have a command line option for protocol? No, we don't. Runserver only does HTTP, it can't do HTTPS and that would probably not be very easy to add. > --- > > *thinking out loud* I don't mind moving global functions to website-defaults if > CMS provides everything we need to write them conveniently using > @contextfunction That sounds like a direction I'd like, especially for less generic global functions. Let me know if some particular API, that would be useful for that, is missing.
CC ire, saroyanm On 2017/07/24 16:05:38, Vasily Kuznetsov wrote: > Indeed, the code that generates URLs in CMS does that, but I'm not using that > code because it does some other things like changing locales and it doesn't > support using siteurl or producing canonical links. I wasn't sure if we actually > want all that additional behavior, but if you would like `get_page_url` to > behave consistently with other link generation (meaning that sometimes locale > will be changed), then this would be a better way. If I do that, generating > broken links will produce a warning. I think that a function called `get_page_url` should throw an error if provided a page that doesn't exist or a locale that is not supported by the page given. And if implementing those checks is painful then we could rename this function to something less promising. > If you mean that when you run it with -a foo.bar -p 3372, then `siteurl` will > effectively be changed to "http://foo.bar:3372/", then yes. Good. > The question that I'm pondering though is whether runserver should generate URLs > that refer to itself (and then use something like http://localhost:5000/ as the > base url) or to the target server (and then use the `siteurl` from the config). > I'm not sure, from your answer, which one you would prefer. There should probably be an order of precedence like: 1. command line options (e.g. -a -p) 2. static configuration (e.g. settings.ini) 3. environment variables (e.g. server hostname and domainname) Note: By this logic we probably wouldn't set a siteurl in settings.ini because we could rely on environment variables and/or command line options passed in different environments. > > Note: We don't have a command line option for protocol? > > No, we don't. Runserver only does HTTP, it can't do HTTPS and that would > probably not be very easy to add. Noted. > That sounds like a direction I'd like, especially for less generic global > functions. Let me know if some particular API, that would be useful for that, is > missing. Will-do. --- I'm concerned about this function being used by mistake in the future to generate hrefs and srcs because it doesn't set hreflang like the page converter and linkify do. And I don't think it should (set hreflang).
On 2017/07/24 17:29:45, juliandoucette wrote: > I'm concerned about this function being used by mistake in the future to > generate hrefs and srcs because it doesn't set hreflang like the page converter > and linkify do. And I don't think it should (set hreflang). ignore "and srcs".
On 2017/07/24 17:29:45, juliandoucette wrote: > I think that a function called `get_page_url` should throw an error if provided > a page that doesn't exist or a locale that is not supported by the page given. > And if implementing those checks is painful then we could rename this function > to something less promising. e.g. `get_canonical_url(page)` (this should check if the page exists) `path.join([paths])` (this doesn't have to check anything)
On 2017/07/24 17:41:04, juliandoucette wrote: > On 2017/07/24 17:29:45, juliandoucette wrote: > > I think that a function called `get_page_url` should throw an error if > provided > > a page that doesn't exist or a locale that is not supported by the page given. > > And if implementing those checks is painful then we could rename this function > > to something less promising. > > e.g. > > `get_canonical_url(page)` (this should check if the page exists) > > `path.join([paths])` (this doesn't have to check anything) Note: (Sorry for confusion) #5343 requests `get_page_url`. But the issues blocked by #5343 could be unblocked by something like `get_canonical_url` or another means to generate a canonical url which consider the requirements outlined in https://issues.adblockplus.org/ticket/5343#comment:3
> I'm concerned about this function being used by mistake in the future to > generate hrefs and srcs because it doesn't set hreflang like the page converter > and linkify do. And I don't think it should (set hreflang). On the other hand, I think I prefer: <a href="{{ get_page_url(PAGE_NAME, locale) }}" hreflang="{{ locale }}">LINK</a> To: {{ PAGE_NAME | linkify }} LINK </a> for this purpose in .tmpl files.
On 2017/07/24 18:05:12, juliandoucette wrote: > > I'm concerned about this function being used by mistake in the future to > > generate hrefs and srcs because it doesn't set hreflang like the page > converter > > and linkify do. And I don't think it should (set hreflang). > > On the other hand, I think I prefer: > > <a href="{{ get_page_url(PAGE_NAME, locale) }}" hreflang="{{ locale }}">LINK</a> > > To: > > {{ PAGE_NAME | linkify }} > LINK > </a> > > for this purpose in .tmpl files. I'm thinking that perhaps it would be less confusing to have two functions instead of one: get_page_url(page[, locale]) and get_canonical_url(page). Then according to your previous comments, `get_page_url` will check that the page exists in the requested locale and will fail if it doesn't; `get_canonical_url` will check that the page exists in default (or any?) locale and will also fail if it doesn't. Does this sounds about right? If so, I will update the ticket and upload a new version of the review. Also, should the url for a page called `index` exclude the page name (e.g. "http://foo.com/" instead of "http://foo.com/index")?
On 2017/07/25 12:00:43, Vasily Kuznetsov wrote: > Does this sounds about right? If so, I will update the ticket and upload a new > version of the review. Yes. As long as `get_page_url` and `get_canonical_url` do not fail if there is no defaultlocale locale file in development (I think the defaultlocale locale file is created by generate_static_pages?). > Also, should the url for a page called `index` exclude the page name (e.g. > "http://foo.com/" instead of "http://foo.com/index")? Yes.
On 2017/07/25 12:34:54, juliandoucette wrote: > On the other hand, I think I prefer: > > <a href="{{ get_page_url(PAGE_NAME, locale) }}" hreflang="{{ locale }}">LINK</a> > > To: > > {{ PAGE_NAME | linkify }} > LINK > </a> > > for this purpose in .tmpl files. While working on the new version of `get_page_url` I realized that those two snippets above are not equivalent. If the translation is not available for `locale`, `linkify` would give you a default locale link. `get_page_url`, on the other hand would fail in this situation. Is that how you thought about this and is it the desired behavior?
On 2017/07/25 20:42:08, Vasily Kuznetsov wrote: > On 2017/07/25 12:34:54, juliandoucette wrote: > > On the other hand, I think I prefer: > > > > <a href="{{ get_page_url(PAGE_NAME, locale) }}" hreflang="{{ locale > }}">LINK</a> > > > > To: > > > > {{ PAGE_NAME | linkify }} > > LINK > > </a> > > > > for this purpose in .tmpl files. > > While working on the new version of `get_page_url` I realized that those two > snippets above are not equivalent. If the translation is not available for > `locale`, `linkify` would give you a default locale link. `get_page_url`, on the > other hand would fail in this situation. Is that how you thought about this and > is it the desired behavior? Good point. I hadn't considered this. It seems we are talking about two separate issues here: 1. Generating canonical page urls - This is a new feature that is blocking other issues and reviews 2. Generating localized page urls - This is a new feature that is not blocking other issues or reviews - linkify accomplishes this in an ugly way for anchor tags in jinja templates - page converters accomplish this well for hrefs and srcs in html and markdown templates I think the spec and scope of 1 is clear. I think that 2 could be improved in jinja templates in one of two ways: way a: ``` {% if page_has_locale(page, locale) %} <a href="{{ get_page_url(page, locale) }}" hreflang="{{ locale }}"> {% else %} <a href="{{ get_page_url(page, defaultlocale) }}" hreflang="{{ defaultlocale }}"> {% endif %} ``` Note: This solution proposes another new function `page_has_locale`. way b: ``` <a {{ get_page_href(page, locale) }}> ``` or ``` <a {{ page | href }}> ``` Note: This solution is similar to linkify except that it generates tag attributes (href and hreflang) instead of an entire tag. This is an improvement because linkify breaks editors/linters tag matching checks. --- I think that `page_has_locale` and `get_page_url` are core-like globals / functions (if you know what I mean?) and `get_page_href` and/or `href` filters are website-default-like (if you know what I mean?). Anyway, I think we should separate these issues if we can get issue 1 resolved sooner that way.
Minor point/correction: <a {{ page | href }}> should have been: <a {{ page | href(locale) }}> for consistency in my example.
On 2017/07/26 12:12:46, juliandoucette wrote: > On 2017/07/25 20:42:08, Vasily Kuznetsov wrote: > > While working on the new version of `get_page_url` I realized that those two > > snippets above are not equivalent. If the translation is not available for > > `locale`, `linkify` would give you a default locale link. `get_page_url`, on > the > > other hand would fail in this situation. Is that how you thought about this > and > > is it the desired behavior? > > Good point. I hadn't considered this. > > It seems we are talking about two separate issues here: > > 1. Generating canonical page urls > - This is a new feature that is blocking other issues and reviews > 2. Generating localized page urls > - This is a new feature that is not blocking other issues or reviews > - linkify accomplishes this in an ugly way for anchor tags in jinja templates > - page converters accomplish this well for hrefs and srcs in html and markdown > templates > > I think the spec and scope of 1 is clear. Very good analysis. I think I now understand better the motivation for different link generation features. Let's make #5343 be only about `get_canonical_link`, putting the locale-related details aside for now. I would then create another issue about localized URL generation, that would address the shortcomings of `linkify`. That one would not be ready yet, but I think we're reasonably close to understanding the desired API there as well. > I think that 2 could be improved in jinja templates in one of two ways: > > way a: > > ``` > {% if page_has_locale(page, locale) %} > <a href="{{ get_page_url(page, locale) }}" hreflang="{{ locale }}"> > {% else %} > <a href="{{ get_page_url(page, defaultlocale) }}" hreflang="{{ defaultlocale > }}"> > {% endif %} > ``` > > Note: This solution proposes another new function `page_has_locale`. This could potentially work, if `page_has_locale` is implemented in a way, that is completely aligned with how `get_page_url` would work. It is quite wordy though, I wouldn't like that too much if I were you. Note: The CMS also has some support for full page translation [1], that will override translation-string-based translation when present. More specifically, when I ask for a link to `mypage` in a non-default locale `mylocale`, and `locales/en/mypage` exists, but `locales/mylocale/mypage` doesn't exist, then the CMS will return the link to `en/mypage` and not to `mylocale/mypage`, without checking if `pages/mypage` exists and if it has normal (translation-string-based json-encoded) translations. It sort of makes sense, given that we probably don't want to use full page translation in some languages and TS-based translation in other languages for the same page. [1] https://hg.adblockplus.org/cms/file/tip/cms/sources.py#l49 > way b: > > ``` > <a {{ get_page_href(page, locale) }}> > ``` > > or > > ``` > <a {{ page | href(locale) }}> > ``` > > Note: This solution is similar to linkify except that it generates tag > attributes (href and hreflang) instead of an entire tag. This is an improvement > because linkify breaks editors/linters tag matching checks. I like way b more, because it seems like a much more compact API to use. It's probably more important what you like though. > --- > > I think that `page_has_locale` and `get_page_url` are core-like globals / > functions (if you know what I mean?) and `get_page_href` and/or `href` filters > are website-default-like (if you know what I mean?). Yes, I think I know what you mean. And it looks like this would enable both way a and way b. Ok, that also sounds pretty ok to me. > Anyway, I think we should separate these issues if we can get issue 1 resolved > sooner that way. Done, see #5452.
I implemented get_canonical_url now, as described in the updated ticket.
On 2017/07/28 10:08:07, Vasily Kuznetsov wrote: > I implemented get_canonical_url now, as described in the updated ticket. Thanks Vasily :) I'll wait for Jon's review.
On 2017/07/28 23:43:07, juliandoucette wrote: > On 2017/07/28 10:08:07, Vasily Kuznetsov wrote: > > I implemented get_canonical_url now, as described in the updated ticket. > > Thanks Vasily :) > > I'll wait for Jon's review. Hey guys, it looks pretty good to me but I'll have a chance to take a better look in an hour or so, to give it a proper response. Thanks Vasily!!
On 2017/07/28 23:43:07, juliandoucette wrote: > On 2017/07/28 10:08:07, Vasily Kuznetsov wrote: > > I implemented get_canonical_url now, as described in the updated ticket. > > Thanks Vasily :) > > I'll wait for Jon's review. Hey guys, it looks pretty good to me but I'll have a chance to take a better look in an hour or so, to give it a proper response. Thanks Vasily!!
On 2017/07/28 23:47:29, Jon Sonesen wrote: > On 2017/07/28 23:43:07, juliandoucette wrote: > > On 2017/07/28 10:08:07, Vasily Kuznetsov wrote: > > > I implemented get_canonical_url now, as described in the updated ticket. > > > > Thanks Vasily :) > > > > I'll wait for Jon's review. > > Hey guys, it looks pretty good to me but I'll have a chance to take a better > look in an hour or so, to give it a proper response. Thanks Vasily!! Hey Vasily, Thanks again, sorry for the delay here but I pulled the patch and it seems to work well. Additionally I do like the addition of the test type modifier. However, can I ask specifically why that change needed to be there?
On 2017/08/02 18:20:57, Jon Sonesen wrote: > On 2017/07/28 23:47:29, Jon Sonesen wrote: > > On 2017/07/28 23:43:07, juliandoucette wrote: > > > On 2017/07/28 10:08:07, Vasily Kuznetsov wrote: > > > > I implemented get_canonical_url now, as described in the updated ticket. > > > > > > Thanks Vasily :) > > > > > > I'll wait for Jon's review. > > > > Hey guys, it looks pretty good to me but I'll have a chance to take a better > > look in an hour or so, to give it a proper response. Thanks Vasily!! > > Hey Vasily, > > Thanks again, sorry for the delay here but I pulled the patch and it seems to > work well. Additionally I do like the addition of the test type modifier. > However, can I ask specifically why that change needed to be there? Hi Jon, I assume you mean "why do we need the test type modifier?" Indeed, this is not really explained in any comments. The thing is: test server overrides "siteurl" configuration option and returns "http://localhost:5000/" instead of the original value. This behavior is useful as it allows us to produce correct URLs that work in the local testing environment instead of always producing links to the production server, so after discussing with Julian I decided to go along with it. However, this means that the URLs generated by `get_canonical_url` are not the same in static and dynamic testing scenarios, so we need the ability to have two different "correct answers". Cheers, Vasily
On 2017/08/02 18:32:05, Vasily Kuznetsov wrote: > On 2017/08/02 18:20:57, Jon Sonesen wrote: > > On 2017/07/28 23:47:29, Jon Sonesen wrote: > > > On 2017/07/28 23:43:07, juliandoucette wrote: > > > > On 2017/07/28 10:08:07, Vasily Kuznetsov wrote: > > > > > I implemented get_canonical_url now, as described in the updated ticket. > > > > > > > > Thanks Vasily :) > > > > > > > > I'll wait for Jon's review. > > > > > > Hey guys, it looks pretty good to me but I'll have a chance to take a better > > > look in an hour or so, to give it a proper response. Thanks Vasily!! > > > > Hey Vasily, > > > > Thanks again, sorry for the delay here but I pulled the patch and it seems to > > work well. Additionally I do like the addition of the test type modifier. > > However, can I ask specifically why that change needed to be there? > > Hi Jon, > > I assume you mean "why do we need the test type modifier?" > > Indeed, this is not really explained in any comments. The thing is: test server > overrides "siteurl" configuration option and returns "http://localhost:5000/" > instead of the original value. This behavior is useful as it allows us to > produce correct URLs that work in the local testing environment instead of > always producing links to the production server, so after discussing with Julian > I decided to go along with it. However, this means that the URLs generated by > `get_canonical_url` are not the same in static and dynamic testing scenarios, so > we need the ability to have two different "correct answers". > > Cheers, > Vasily Hey, Sorry for the unclear question, ok it seems clear reading the code what it does but the reasoning is is not there in the comments. I am not hugely concerned about it although it would seem that a comment explaining this would be nice, or docstring. If you disagree im happy to leave it alone. LGTM
Hi Jon, Good point about adding a comment that explains test-type-specific expected output handling. I added it. Cheers, Vasily https://codereview.adblockplus.org/29495555/diff/29504563/tests/test_page_out... File tests/test_page_outputs.py (right): https://codereview.adblockplus.org/29495555/diff/29504563/tests/test_page_out... tests/test_page_outputs.py:25: # Move test-type-specific expected outputs (e.g. "xyz@static" -> "xyz") I added a comment to make it more clear what's going on here. |