README.md - Issue 29465720: Issue 4970 - Document the library API of python-abp

Side by Side Diff: README.md

Issue 29465720: Issue 4970 - Document the library API of python-abp (Closed)

Patch Set: Update README to match the changes from https://codereview.adblockplus.org/29465715/ Created Aug. 7, 2017, 8:28 p.m.

OLD	NEW
1 # python-abp	1 # python-abp

2	2

3 This repository contains the script that is used for building Adblock Plus	3 This repository contains a library for working with Adblock Plus filter lists

4 filter lists from the form in which they are authored into the format suitable	4 and the script that is used for building Adblock Plus filter lists from the

5 for consumption by the adblocking software.	5 form in which they are authored into the format suitable for consumption by the

	6 adblocking software.
	mathias 2017/08/08 12:24:35 For an introduction that is a bit too much. How about making two or three short sentences out of this?
6	7

7 ## Installation	8 ## Installation

8	9

9 Prerequisites:	10 Prerequisites:

10	11

11 * Linux, Mac OS X or Windows (any modern Unix should work too),	12 * Linux, Mac OS X or Windows (any modern Unix should work too),

12 * Python (2.7 or 3.5),	13 * Python (2.7 or 3.5+),

13 * pip.	14 * pip.

14	15

15 To install:	16 To install:

16	17

17 $ pip install -U python-abp	18 $ pip install -U python-abp

18	19

19 ## Rendering of filter lists	20 ## Rendering of filter lists

20	21

21 The filter lists are originally authored in relatively smaller parts focused	22 The filter lists are originally authored in relatively smaller parts focused

22 on a particular type of filters, related to a specific topic or relevant	23 on a particular type of filters, related to a specific topic or relevant

23 for particular geographical area.	24 for particular geographical area.

24 We call these parts _filter list fragments_ (or just _fragments_)	25 We call these parts _filter list fragments_ (or just _fragments_)

25 to distinguish them from full filter lists that are	26 to distinguish them from full filter lists that are

26 consumed by the adblocking software such as Adblock Plus.	27 consumed by the adblocking software such as Adblock Plus.

27	28

28 Rendering is a process that combines filter list fragments into a filter list.	29 Rendering is a process that combines filter list fragments into a filter list.

29 It starts with one fragment that can include other ones and so forth.	30 It starts with one fragment that can include other ones and so forth.

30 The produced filter list is marked with a version, a timestamp and	31 The produced filter list is marked with a version, a timestamp and

31 a [checksum](https://adblockplus.org/filters#special-comments).	32 a [checksum][1].

32	33

33 Python-abp contains a script that can do this called `flrender`:	34 Python-abp contains a script that can do this called `flrender`:

34	35

35 $ flrender fragment.txt filterlist.txt	36 $ flrender fragment.txt filterlist.txt

36	37

37 This will take the top level fragment in `fragment.txt`, render it and save into	38 This will take the top level fragment in `fragment.txt`, render it and save into

38 `filterlist.txt`.	39 `filterlist.txt`.

39	40

40 Fragments might reference other fragments that should be included into them.	41 Fragments might reference other fragments that should be included into them.

41 The references come in two forms: http(s) includes and local includes:	42 The references come in two forms: http(s) includes and local includes:

42	43

43 %include http://www.server.org/dir/list.txt%	44 %include http://www.server.org/dir/list.txt%

44 %include easylist:easylist/easylist_general_block.txt	45 %include easylist:easylist/easylist_general_block.txt%

45	46

46 The first instruction contains a URL that will be fetched and inserted at the	47 The first instruction contains a URL that will be fetched and inserted at the

47 point of reference.	48 point of reference.

48 The second one contains a path inside easylist repository.	49 The second one contains a path inside easylist repository.

49 `flrender` needs to be able to find a copy of the repository on the local	50 `flrender` needs to be able to find a copy of the repository on the local

50 filesystem. We use `-i` option to point it to the right directory:	51 filesystem. We use `-i` option to point it to the right directory:

51	52

52 $ flrender -i easylist=/home/abc/easylist input.txt output.txt	53 $ flrender -i easylist=/home/abc/easylist input.txt output.txt

53	54

54 Now the second reference above will be resolved to	55 Now the second reference above will be resolved to

(...skipping 13 matching lines...)
68 If you don't know all the source names that are needed to render some list,	69 If you don't know all the source names that are needed to render some list,

69 just run `flrender` and it will report what it's missing:	70 just run `flrender` and it will report what it's missing:

70	71

71 $ flrender easylist.txt output/easylist.txt	72 $ flrender easylist.txt output/easylist.txt

72 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener	73 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener

73 al_block.txt' from 'easylist.txt'	74 al_block.txt' from 'easylist.txt'

74	75

75 You can clone the necessary repositories to a local directory and add `-i`	76 You can clone the necessary repositories to a local directory and add `-i`

76 options accordingly.	77 options accordingly.

77	78

	79 ## Library API

	80

	81 Python-abp can also be used as a library for parsing filter lists. For example

	82 to read a filter list (we use Python 3 syntax here but the API is the same):

	83

	84 from abp.filters import parse_filterlist

	85

	86 with open('filterlist.txt') as filterlist:

	87 for line in parse_filterlist(filterlist):

	88 print(line)

	89

	90 If `filterlist.txt` contains a filter list:

	91

	92 [Adblock Plus 2.0]

	93 ! Title: Example list

	94

	95 abc.com,cdf.com##div#ad1

	96 abc.com/ad$image

	97 @@/abc\.com/

	98 ...

	99

	100 the output will look similar to the following:

	101

	102 Header(version='Adblock Plus 2.0')

	103 Metadata(key='Title', value='Example list')

	104 EmptyLine()

	105 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])])

	106 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])

	107 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])

	108 ...

	109

	110 In general `parse_filterlist` takes an iterable of strings (such as a list or

	111 an open file) and returns an iterable of parsed filter list lines. Each line

	112 will have its `.type` attribute set to a string indicating its type. It will

	113 also have a `.to_string()` method that converts it to a unicode string in the

	114 filter list format (most of the time it's the same as the string from which the

	115 filter was parsed). Further attributes depend on the type of the line.

	116

	117 Note: `parse_filterlist` returns an iterator, not a list, and only consumes

	118 the input lines when its output is iterated over. This allows much more memory

	119 efficient handling of large filter lists, however there are two things to watch

	120 out for:

	121

	122 Note: iteration over parsed lines may throw a `ParseError` exception if a

	123 line cannot be parsed. The exception will contain the information about the

	124 error and the original line that failed parsing.
	mathias 2017/08/08 12:24:35 It is not clear what bits this is about (I assume the generator that is function parse_filterlist) and the following section/list seems a bit unrelated. Also, as a side-note, I am not sure this is the desired behavior - analogous to EmptyLine there should be an InvalidLine or something that contains an .error reference to the exception object and the .raw string. Otherwise one would always abort processing a list just because of something being allegedly malformed, which could very well be something one just wants to skip over and log about.
	Vasily Kuznetsov 2017/08/08 14:31:12 Yeah, we've discussed this. But for now that change is not in yet, so I'm documenting the actual behavior.
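The skip-and-log handling raised in the comment above can be approximated by parsing one line at a time and catching the exception per line. The sketch below is only an illustration of that pattern: `ParseError` and `parse_line` here are hypothetical stand-ins defined locally so the example runs without python-abp installed, and the rule the stand-in uses to reject a line (an `%include` without a closing `%`) is an assumption, not the library's actual behavior. `parse_tolerantly` is likewise a made-up helper name.

```python
import logging


class ParseError(Exception):
    """Stand-in exception for this sketch; not imported from python-abp."""


def parse_line(text):
    """Hypothetical stand-in parser: rejects an %include without a closing %."""
    stripped = text.strip()
    if stripped.startswith('%include') and not stripped.endswith('%'):
        raise ParseError('unterminated include: {!r}'.format(stripped))
    return stripped


def parse_tolerantly(lines):
    """Parse line by line, collecting failures instead of aborting."""
    parsed, skipped = [], []
    for number, raw in enumerate(lines, start=1):
        try:
            parsed.append(parse_line(raw))
        except ParseError as error:
            # Log and remember the bad line, then carry on with the next one.
            logging.warning('line %d skipped: %s', number, error)
            skipped.append((number, raw))
    return parsed, skipped


lines = [
    'abc.com/ad$image',
    '%include easylist:easylist/easylist_general_block.txt',
    '@@/abc\\.com/',
]
parsed, skipped = parse_tolerantly(lines)
print(len(parsed), 'parsed,', len(skipped), 'skipped')
```

With the real library, the same loop shape would presumably wrap the `parse_line` function that the README says `abp.filters` exports, but that has not been verified here.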
	125

	126 - When you're parsing filters from a file, you need to complete the iteration

	127 before you close the file.

	128 - Once you iterate over the output of `parse_filterlist` once, it will be

	129 consumed and you won't be able to iterate over it again.

	130

	131 If you find that this is bothering you, you probably want to convert the output
	mathias 2017/08/08 12:24:34 Everything in this section from here on, maybe including the two items listed above, seems terribly obvious after the first Note: earlier about parse_filterlist() being a generator. Do you really think it is worth the effort documenting that stuff in explicit fashion?
	132 of `parse_filterlist` to a list:

	133

	134 lines_list = list(parse_filterlist(filterlist))

	135

	136 This will load the whole file into memory but unless you're dealing with a

	137 gigantic filter list that should not be a problem.
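Both caveats come from ordinary Python generator semantics and can be reproduced without python-abp. In the minimal sketch below, `parse_lines` is a hypothetical stand-in generator (it only strips whitespace) and `io.StringIO` stands in for an open file; neither is part of the library.

```python
import io


def parse_lines(lines):
    """Hypothetical stand-in for a lazy parser: yields one stripped line at a time."""
    for line in lines:
        yield line.strip()


source = io.StringIO('[Adblock Plus 2.0]\n! Title: Example list\nabc.com/ad$image\n')

lazy = parse_lines(source)
first_pass = list(lazy)    # consumes the generator and the underlying input
second_pass = list(lazy)   # the generator is exhausted, so nothing is left

# Closing the input before iterating fails, because no line has been read yet.
closed_source = io.StringIO('a\nb\n')
pending = parse_lines(closed_source)
closed_source.close()
try:
    list(pending)
    failed_after_close = False
except ValueError:
    failed_after_close = True

print(len(first_pass), len(second_pass), failed_after_close)
```

Materializing the result with `list(...)` while the input is still open, as the README suggests, avoids both problems at the cost of holding every parsed line in memory.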

	138

	139 ### Line types

	140

	141 As mentioned above, lines of different types have different attributes:

	142

	143 | type | attributes |
	mathias 2017/08/08 12:24:35 Are you sure this kind of table markup is supported by all the systems we want to properly render the README.md file? That'll be news to me, I was under the impression that this is basically a part of the "GitHub flavor" just loosely adopted by some libraries.
	Vasily Kuznetsov 2017/08/08 14:31:12 Indeed the table markup was not part of the original Markdown. However, it does seem to be supported by all websites, tools and libraries that I've seen.
	144 |------------|------------------------------------------------------------------------|

	145 | header | `version` - plugin version string |

	146 | emptyline | no options |

	147 | comment | `text` - text of the comment |

	148 | metadata | `key` - name of the metadata field, `value` - value of the field |

	149 | include | `target` - url/path of the file to include |

	150 | filter | `text` - text of the filter, `selector` - what to look for, `action` - what to do with selected items, `options` - filter options |

	151

	152 #### Filter attributes
	mathias 2017/08/08 12:24:35 This section mentions "Selector" but not ".selector", "Action" but not ".action", etc. - hence it's not obvious to the reader what to do with that information and what it is about.
	153

	154 Selector is a dictionary with two keys:

	155

	156 | key | meaning |

	157 |--------------|------------------------------------------------------------------|

	158 | type | 'css', 'abp-simple', 'url-pattern', 'url-regexp', 'extended-css' |

	159 | value | the selector itself, the meaning is type-dependent |

	160

	161 It's preferable to import `SELECTOR_TYPE` namespace from `abp.filters` to refer

	162 to filter types instead of using strings. `SELECTOR_TYPE` contains constants

	163 for each filter type: `SELECTOR_TYPE.CSS`, `SELECTOR_TYPE.ABP_SIMPLE`,

	164 `SELECTOR_TYPE.URL_PATTERN`, `SELECTOR_TYPE.URL_REGEXP` and

	165 `SELECTOR_TYPE.XCSS`.

	166

	167 Action instructs adblocking software on what should be done with the items

	168 matching the selector:

	169

	170 | action | meaning |

	171 |--------|------------------------------------------------------------------------|

	172 | block | block http(s) request that matches the selector |

	173 | allow | allow http(s) request that matches the filter (whitelist the resource) |

	174 | hide | hide the DOM element that matches the selector |

	175 | show | show the DOM element that matches the selector (whitelist the element) |

	176

	177 The action constants are contained in `FILTER_ACTION` namespace, which can also

	178 be imported from `abp.filters` (`FILTER_ACTION.BLOCK`, `FILTER_ACTION.ALLOW`,

	179 etc.)

	180

	181 Options is a list of tuples consisting of option name and option value. The

	182 option value is `True` or `False` for flags or, for options with a value, it's

	183 a string, list of strings or a list of `(string, boolean)` tuples. See

	184 [documentation on authoring the filter rules][2] for the list of existing

	185 options and their meanings.
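To make the shape of `.selector`, `.action` and `.options` concrete, here is a small sketch that reads them from a record shaped like the `Filter(...)` output shown earlier in the README. The `Filter` namedtuple is a local stand-in built for illustration and may differ from the library's real class, and `included_domains` and `describe` are hypothetical helper names. Plain strings are used for the type and action values to keep the example free of imports from python-abp.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the Filter(...) repr from the README example;
# the real class lives in python-abp and may differ.
Filter = namedtuple('Filter', ['text', 'selector', 'action', 'options'])

example = Filter(
    text='abc.com,cdf.com##div#ad1',
    selector={'type': 'css', 'value': 'div#ad1'},
    action='hide',
    options=[('domain', [('abc.com', True), ('cdf.com', True)])],
)


def included_domains(filter_line):
    """Return the domains whose flag is True in the 'domain' option, if any."""
    options = dict(filter_line.options)  # list of (name, value) tuples
    return [name for name, applies in options.get('domain', []) if applies]


def describe(filter_line):
    """Summarise the action and selector of a filter line as one string."""
    selector = filter_line.selector
    return '{} items matching {} selector {!r}'.format(
        filter_line.action, selector['type'], selector['value'])


print(describe(example))
print(included_domains(example))
```

In code that uses the library itself, the README recommends comparing against the `SELECTOR_TYPE` and `FILTER_ACTION` constants rather than string literals.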

	186

	187 ### Other functions

	188

	189 `abp.filters` module also exports a lower-level function for parsing individual

	190 lines of a filter list: `parse_line`. It returns a parsed line object just like

	191 the items in the iterator returned by `parse_filterlist`.

	192

78 ## Testing	193 ## Testing

79	194

80 Unit tests for `python-abp` are located in the `/tests` directory.	195 Unit tests for `python-abp` are located in the `/tests` directory.

81 [Pytest](http://pytest.org/) is used for quickly running the tests	196 [Pytest][3] is used for quickly running the tests

82 during development.	197 during development.

83 [Tox](https://tox.readthedocs.org/) is used for testing in different	198 [Tox][4] is used for testing in different

84 environments (Python 2.7, Python 3.5 and PyPy) and code quality	199 environments (Python 2.7, Python 3.5+ and PyPy) and code quality

85 reporting.	200 reporting.

86	201

87 In order to execute the tests, first create and activate development	202 In order to execute the tests, first create and activate development

88 virtualenv:	203 virtualenv:

89	204

90 $ python setup.py devenv	205 $ python setup.py devenv

91 $ . devenv/bin/activate	206 $ . devenv/bin/activate

92	207

93 With the development virtualenv activated use pytest for a quick test run:	208 With the development virtualenv activated use pytest for a quick test run:

94	209

95 (devenv) $ py.test tests	210 (devenv) $ pytest tests

96	211

97 and tox for a comprehensive report:	212 and tox for a comprehensive report:

98	213

99 (devenv) $ tox	214 (devenv) $ tox

	215

	216

	217 [1]: https://adblockplus.org/filters#special-comments

	218 [2]: https://adblockplus.org/filters#options

	219 [3]: http://pytest.org/

	220 [4]: https://tox.readthedocs.org/
