cms/converters.py - Issue 29472555: Issue 4867 - Add global get_pages_metadata to template converters

Delta Between Two Patch Sets: cms/converters.py

Issue 29472555: Issue 4867 - Add global get_pages_metadata to template converters (Closed)

Left Patch Set: fix interdependency, fix poor filter type checking (created July 3, 2017, 9:04 a.m.)

Right Patch Set: address naming, temp var assignment (created July 4, 2017, 3:19 p.m.)

LEFT	RIGHT
1 # This file is part of the Adblock Plus web scripts,	1 # This file is part of the Adblock Plus web scripts,

2 # Copyright (C) 2006-2017 eyeo GmbH	2 # Copyright (C) 2006-2017 eyeo GmbH

3 #	3 #

4 # Adblock Plus is free software: you can redistribute it and/or modify	4 # Adblock Plus is free software: you can redistribute it and/or modify

5 # it under the terms of the GNU General Public License version 3 as	5 # it under the terms of the GNU General Public License version 3 as

6 # published by the Free Software Foundation.	6 # published by the Free Software Foundation.

7 #	7 #

8 # Adblock Plus is distributed in the hope that it will be useful,	8 # Adblock Plus is distributed in the hope that it will be useful,

9 # but WITHOUT ANY WARRANTY; without even the implied warranty of	9 # but WITHOUT ANY WARRANTY; without even the implied warranty of

10 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the	10 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

(...skipping 98 matching lines...)
109 # the document.	109 # the document.

110 self._append_text(data)	110 self._append_text(data)

111	111

112 def handle_entityref(self, name):	112 def handle_entityref(self, name):

113 self._append_text(self.unescape('&{};'.format(name)))	113 self._append_text(self.unescape('&{};'.format(name)))

114	114

115 def handle_charref(self, name):	115 def handle_charref(self, name):

116 self._append_text(self.unescape('&#{};'.format(name)))	116 self._append_text(self.unescape('&#{};'.format(name)))

117	117

118	118

119 def get_page_metadata(page, data):	119 def parse_page_content(page, data):
Vasily Kuznetsov 2017/07/03 17:42:44
Perhaps this function should be renamed now since it's actually doing a bit more than just getting the metadata. I can't think of anything better than something like `parse_page_content`; feel free to suggest a better name (we can discuss on IRC to speed it up).
juliandoucette 2017/07/03 21:55:54
[`get`, `query`, `pages`, `get_pages`, ...] I like short names.
Vasily Kuznetsov 2017/07/04 07:43:48
Note that this is not the function that gets exposed to the templates. Also, what it does is take the contents of one page and separate them into metadata and the rest (which is then used to render the page), so the name should reflect that.
juliandoucette 2017/07/04 09:57:25
Oh, sorry. I meant the get_pages_metadata function (the one that is exposed). It does more than get pages metadata, no? (It includes page contents.) So why not rename it to something more appropriate (generic)?
Vasily Kuznetsov 2017/07/04 10:23:34
Currently the content is not included in metadata, and I think it makes sense to keep it that way as far as this ticket is concerned.
P.S. If we include page content, it would be just what's in the file, before the CMS processing. Not sure if that would be very useful or more confusing.
juliandoucette 2017/07/04 10:42:56
Definitely more confusing. I don't really care if the content is there (because I haven't had to use it yet), but it does logically follow that if the processed content is not in this object, then I should be able to pass this object or the page name to another function to get it (another ticket, someday).
120 """Generator which gets per page metadata and cleaned page content"""	120 """Separate page content into metadata (dict) and body text (str)"""
Vasily Kuznetsov 2017/07/03 17:42:44
Whenever possible, it's best to write docstrings in the style "Do foo" or, if we're returning something non-obvious, "Do foo, return bar" or "Do foo, yield bars" [1]. So perhaps here we could write something like "Separate page content into metadata (dict) and body text (str)".
[1]: https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
Jon Sonesen 2017/07/04 15:02:38
Acknowledged.
121 page_data = {'page': page}	121 page_data = {'page': page}

122 lines = data.splitlines(True)	122 lines = data.splitlines(True)

123 for i, line in enumerate(lines):	123 for i, line in enumerate(lines):

124 if not re.search(r'^\s*[\w\-]+\s*=', line):	124 if not re.search(r'^\s*[\w\-]+\s*=', line):

125 break	125 break

126 name, value = line.split('=', 1)	126 name, value = line.split('=', 1)

127 value = value.strip()	127 value = value.strip()

128 if value.startswith('[') and value.endswith(']'):	128 if value.startswith('[') and value.endswith(']'):

129 value = [element.strip() for element in value[1:-1].split(',')]	129 value = [element.strip() for element in value[1:-1].split(',')]

130 lines[i] = '\n'	130 lines[i] = '\n'

131 page_data[name.strip()] = value	131 page_data[name.strip()] = value

132 return page_data, ''.join(lines)	132 return page_data, ''.join(lines)
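For readers following the rename discussion, the right-hand `parse_page_content` can be sketched as a standalone function with its indentation restored. This is a reconstruction from the diff lines, assuming the metadata pattern is `^\s*[\w\-]+\s*=` (optional whitespace around the name); the sample page text is made up. It shows that metadata lines are blanked to `'\n'` rather than deleted, so the body text keeps the same number of lines as the source file.

```python
import re


def parse_page_content(page, data):
    """Separate page content into metadata (dict) and body text (str)"""
    page_data = {'page': page}
    lines = data.splitlines(True)  # keep line endings
    for i, line in enumerate(lines):
        # Metadata is a leading run of "name = value" lines; stop at the
        # first line that does not look like one.
        if not re.search(r'^\s*[\w\-]+\s*=', line):
            break
        name, value = line.split('=', 1)
        value = value.strip()
        # "[a, b, c]" becomes a list of stripped strings.
        if value.startswith('[') and value.endswith(']'):
            value = [element.strip() for element in value[1:-1].split(',')]
        # Blank the line instead of deleting it so line numbers stay aligned.
        lines[i] = '\n'
        page_data[name.strip()] = value
    return page_data, ''.join(lines)


metadata, body_text = parse_page_content(
    'index', 'title = Home\ntags = [a, b]\n\nHello\n')
print(metadata)         # {'page': 'index', 'title': 'Home', 'tags': ['a', 'b']}
print(repr(body_text))  # '\n\n\nHello\n'
```

Note that `re.search` with `^` and no `re.M` flag only matches at the start of each line string, which is what the loop relies on.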

133	133

134	134

135 class Converter:	135 class Converter:

136 whitelist = {'a', 'em', 'sup', 'strong', 'code', 'span'}	136 whitelist = {'a', 'em', 'sup', 'strong', 'code', 'span'}

137 missing_translations = 0	137 missing_translations = 0

138 total_translations = 0	138 total_translations = 0

139	139

140 def __init__(self, params, key='pagedata'):	140 def __init__(self, params, key='pagedata'):

141 self._params = params	141 self._params = params

142 self._key = key	142 self._key = key

143 self._attribute_parser = AttributeParser(self.whitelist)	143 self._attribute_parser = AttributeParser(self.whitelist)

144 self._seen_defaults = {}	144 self._seen_defaults = {}

145	145

146 # Read in any parameters specified at the beginning of the file	146 # Read in any parameters specified at the beginning of the file

147 # and override converter defaults with page specific params	147 # and override converter defaults with page specific params

148 data, filename = params[key]	148 data, filename = params[key]

149 page_data, cleaned_page = get_page_metadata(params['page'], data)	149 page_data, body_text = parse_page_content(params['page'], data)
Vasily Kuznetsov 2017/07/03 17:42:44
I think the variable naming is somewhat confusing here. `page_data` should probably be called `metadata`, since we usually call this metadata elsewhere, and `cleaned_page` could be something like `body_text`. What do you think?
Jon Sonesen 2017/07/04 14:58:06
Agree here, ack.
150 params.update(page_data)	150 params.update(page_data)

151 params[key] = (cleaned_page, filename)	151 params[key] = (body_text, filename)
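The three lines in `Converter.__init__` amount to "merge page metadata over the converter params, then swap the raw page for its body text". A minimal sketch of that step as a free function, assuming a hypothetical helper name `apply_page_metadata`, a hypothetical stand-in parser `fake_parse`, and a made-up filename; it is meant only to show the override order, not the CMS's actual structure.

```python
def apply_page_metadata(params, parse, key='pagedata'):
    """Hypothetical helper mirroring Converter.__init__: merge page metadata
    into params and replace the raw page with its body text."""
    data, filename = params[key]
    metadata, body_text = parse(params['page'], data)
    # Page-specific values override converter defaults already in params.
    params.update(metadata)
    # Downstream rendering sees only the body text, not the metadata lines.
    params[key] = (body_text, filename)
    return params


def fake_parse(page, data):
    # Hypothetical stand-in for parse_page_content: treat the first line as
    # "title = Home" and blank it in the body text.
    return {'page': page, 'title': 'Home'}, '\n' + data.split('\n', 1)[1]


params = {'page': 'index', 'title': 'Default',
          'pagedata': ('title = Home\nHello\n', 'pages/index.md')}
apply_page_metadata(params, fake_parse)
print(params['title'])     # Home
print(params['pagedata'])  # ('\nHello\n', 'pages/index.md')
```

Because `params.update()` runs after the defaults are in place, any key a page declares in its header wins over the converter's value for the same key.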

152	152

153 def localize_string(	153 def localize_string(

154 self, page, name, default, comment, localedata, escapes):	154 self, page, name, default, comment, localedata, escapes):

155	155

156 def escape(s):	156 def escape(s):

157 return re.sub(r'.',	157 return re.sub(r'.',

158 lambda match: escapes.get(match.group(0),	158 lambda match: escapes.get(match.group(0),

159 match.group(0)),	159 match.group(0)),

160 s, flags=re.S)	160 s, flags=re.S)
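The nested `escape` helper in `localize_string` applies a per-character lookup table in a single regex pass. Here it is as a standalone sketch; `escapes` is passed as an argument instead of coming from the enclosing scope, and the HTML table is a hypothetical example, not the CMS's actual `escapes` mapping.

```python
import re


def escape(s, escapes):
    # Visit every character (re.S lets "." match newlines too) and replace it
    # with its entry in `escapes`, or leave it unchanged if there is none.
    return re.sub(r'.',
                  lambda match: escapes.get(match.group(0), match.group(0)),
                  s, flags=re.S)


html_escapes = {'<': '&lt;', '>': '&gt;', '&': '&amp;'}  # hypothetical table
print(repr(escape('a < b & c\n', html_escapes)))  # 'a &lt; b &amp; c\n'
```

Since replacements are never re-scanned, the `&` inside an inserted `&lt;` is not escaped a second time, a pitfall that chained `str.replace` calls in the wrong order can run into.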

161	161

(...skipping 323 matching lines...)
485 )))	485 )))

486	486

487 def get_pages_metadata(self, filters=None):	487 def get_pages_metadata(self, filters=None):

488 if filters is not None and not isinstance(filters, dict):	488 if filters is not None and not isinstance(filters, dict):

489 raise TypeError('Filters are not a dictionary')	489 raise TypeError('Filters are not a dictionary')

490	490

491 return_data = []	491 return_data = []

492 for page_name, _format in self._params['source'].list_pages():	492 for page_name, _format in self._params['source'].list_pages():

493 data, filename = self._params['source'].read_page(page_name,	493 data, filename = self._params['source'].read_page(page_name,

494 _format)	494 _format)

495 page_data, cleaned_page = get_page_metadata(page_name, data)	495 page_data = parse_page_content(page_name, data)[0]
Vasily Kuznetsov 2017/07/03 17:42:44
We can just take the first part of the tuple that `get_page_metadata` returns, and then I would also suggest renaming the variable for clarity. So we'd end up with:

    metadata = get_page_metadata(...)[0]

What do you think?
Jon Sonesen 2017/07/04 14:58:06
Acknowledged.
496 if self.filter_metadata(filters, page_data) is True:	496 if self.filter_metadata(filters, page_data) is True:

497 return_data.append(page_data)	497 return_data.append(page_data)

498 return return_data	498 return return_data
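The loop in `get_pages_metadata` can be exercised without a real page source. This sketch makes several assumptions: `FakeSource` and `simple_parse` are hypothetical stand-ins (the parser is simpler than the real regex), and the function takes the source and parser as arguments instead of being a method. Only the shapes of `list_pages()` (yielding `(page_name, format)` pairs) and `read_page(page_name, format)` (returning `(data, filename)`) follow the calls in the diff. The real filter matching lives in lines the diff skips, so the sketch only type-checks `filters` and keeps the "no user-defined metadata" check. That check is written as `set(metadata) != {'page'}` so it behaves the same on Python 2 and 3; on Python 3, `metadata.keys() == ['page']` is always `False`, because a keys view never compares equal to a list.

```python
class FakeSource:
    """Hypothetical in-memory stand-in for params['source']."""

    def __init__(self, pages):
        self._pages = pages  # {page_name: raw page text}

    def list_pages(self):
        # Yield (page_name, format) pairs, as the loop in the diff expects.
        for name in sorted(self._pages):
            yield name, 'md'

    def read_page(self, page_name, _format):
        # Return (data, filename), as unpacked in the diff.
        return self._pages[page_name], page_name + '.' + _format


def simple_parse(page, data):
    # Simplified stand-in for parse_page_content: leading "name = value" lines.
    metadata = {'page': page}
    lines = data.splitlines(True)
    for i, line in enumerate(lines):
        if '=' not in line:
            break
        name, value = line.split('=', 1)
        metadata[name.strip()] = value.strip()
        lines[i] = '\n'
    return metadata, ''.join(lines)


def get_pages_metadata(source, parse, filters=None):
    if filters is not None and not isinstance(filters, dict):
        raise TypeError('Filters are not a dictionary')
    return_data = []
    for page_name, _format in source.list_pages():
        data, filename = source.read_page(page_name, _format)
        # Only the metadata half of the (metadata, body_text) tuple is needed.
        metadata = parse(page_name, data)[0]
        # Skip pages whose only key is 'page' (no user-defined metadata).
        if set(metadata) != {'page'}:
            return_data.append(metadata)
    return return_data


source = FakeSource({'about': 'title = About\nBody\n',
                     'plain': 'No metadata\n'})
print(get_pages_metadata(source, simple_parse))
# [{'page': 'about', 'title': 'About'}]
```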

499	499

500 def filter_metadata(self, filters, metadata):	500 def filter_metadata(self, filters, metadata):

501 # if only the page key is in the metadata then there	501 # if only the page key is in the metadata then there

502 # was no user defined metadata	502 # was no user defined metadata

503 if metadata.keys() == ['page']:	503 if metadata.keys() == ['page']:

504 return False	504 return False

505 if filters is None:	505 if filters is None:

(...skipping 29 matching lines...)
535 stack.pop()	535 stack.pop()

536 stack[-1]['subitems'].append(item)	536 stack[-1]['subitems'].append(item)

537 stack.append(item)	537 stack.append(item)

538 return structured	538 return structured
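Only the tail of this routine (lines 535-538) is visible: a stack whose top receives each new item in its `subitems` list before the item itself is pushed. A minimal sketch of that stack-based nesting technique, assuming each item carries an integer `level` of 1 or more (for example a heading level) and that the skipped loop pops while the top of the stack is at the same or a deeper level; `nest_by_level`, `level` and `title` are hypothetical names, not taken from the diff.

```python
def nest_by_level(flat):
    """Hypothetical sketch: turn a flat list of items with an integer 'level'
    (>= 1) into a tree, appending children to each parent's 'subitems'."""
    structured = []
    # Sentinel root at level 0 collects the top-level items.
    stack = [{'level': 0, 'subitems': structured}]
    for item in flat:
        item = dict(item, subitems=[])
        # Pop until the top of the stack is shallower: that is the parent.
        while stack[-1]['level'] >= item['level']:
            stack.pop()
        stack[-1]['subitems'].append(item)
        stack.append(item)
    return structured


tree = nest_by_level([{'title': 'A', 'level': 1},
                      {'title': 'A.1', 'level': 2},
                      {'title': 'B', 'level': 1}])
print([i['title'] for i in tree])                 # ['A', 'B']
print([i['title'] for i in tree[0]['subitems']])  # ['A.1']
```

The level-0 sentinel is what keeps `stack[-1]` valid after popping; an item with level 0 or below would pop the sentinel too, hence the stated assumption.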

539	539

540 converters = {	540 converters = {

541 'html': RawConverter,	541 'html': RawConverter,

542 'md': MarkdownConverter,	542 'md': MarkdownConverter,

543 'tmpl': TemplateConverter,	543 'tmpl': TemplateConverter,

544 }	544 }
