
Side by Side Diff: README.md

Issue 29968569: Issue 4014 - Publish python-abp on PyPI (Closed) Base URL: https://hg.adblockplus.org/python-abp/
Patch Set: Address comment on PS4 Created Jan. 3, 2019, 4:41 a.m.
OLDNEW
1 # python-abp 1 python-abp
2 ==========
2 3
3 This repository contains a library for working with Adblock Plus filter lists 4 This repository contains a library for working with Adblock Plus filter lists,
4 and the script that is used for building Adblock Plus filter lists from the 5 a script for rendering diffs between filter lists, and the script that is used
5 form in which they are authored into the format suitable for consumption by the 6 for building Adblock Plus filter lists from the form in which they are authored
6 adblocking software. 7 into the format suitable for consumption by the adblocking software (aka
8 rendering).
7 9
8 ## Installation 10 .. contents::
11
12
13 Installation
14 ------------
9 15
10 Prerequisites: 16 Prerequisites:
11 17
12 * Linux, Mac OS X or Windows (any modern Unix should work too), 18 * Linux, Mac OS X or Windows (any modern Unix should work too),
13 * Python (2.7 or 3.5+), 19 * Python (2.7 or 3.5+),
14 * pip. 20 * pip.
15 21
16 To install: 22 To install::
17 23
18 $ pip install -U python-abp 24 $ pip install --upgrade python-abp
19 25
20 ## Rendering of filter lists 26
27 Rendering of filter lists
28 -------------------------
21 29
22 The filter lists are originally authored in relatively smaller parts focused 30 The filter lists are originally authored in relatively smaller parts focused
23 on a particular type of filters, related to a specific topic or relevant 31 on particular types of filters, related to a specific topic or relevant for a
24 for particular geographical area. 32 particular geographical area.
25 We call these parts _filter list fragments_ (or just _fragments_) 33 We call these parts *filter list fragments* (or just *fragments*) to
26 to distinguish them from full filter lists that are 34 distinguish them from full filter lists that are consumed by the adblocking
27 consumed by the adblocking software such as Adblock Plus. 35 software such as Adblock Plus.
28 36
29 Rendering is a process that combines filter list fragments into a filter list. 37 Rendering is a process that combines filter list fragments into a filter list.
30 It starts with one fragment that can include other ones and so forth. 38 It starts with one fragment that can include other ones and so forth.
31 The produced filter list is marked with a [version and a timestamp][1]. 39 The produced filter list is marked with a `version and a timestamp <https://adblockplus.org/filters#special-comments>`_.
32 40
33 Python-abp contains a script that can do this called `flrender`: 41 Python-abp contains a script that can do this called ``flrender``::
34 42
35 $ flrender fragment.txt filterlist.txt 43 $ flrender fragment.txt filterlist.txt
36 44
37 This will take the top level fragment in `fragment.txt`, render it and save into
38 `filterlist.txt`.
39 45
40 The `flrender` script can also be used by only specifying `fragment.txt`: 46 This will take the top level fragment in ``fragment.txt``, render it and save it
41 47 into ``filterlist.txt``.
42 $flrender fragment.txt
43
44 in which case the rendering result will be sent to `stdout`. Moreover, when
45 it's run with no positional arguments:
46 48
47 $flrender 49 The ``flrender`` script can also be used by only specifying ``fragment.txt``::
48 50
49 it will read from `stdin` and send the results to `stdout`. 51 $ flrender fragment.txt
52
53
54 in which case the rendering result will be sent to ``stdout``. Moreover, when
55 it's run with no positional arguments::
56
57 $ flrender
58
59
60 it will read from ``stdin`` and send the results to ``stdout``.
50 61
51 Fragments might reference other fragments that should be included into them. 62 Fragments might reference other fragments that should be included into them.
52 The references come in two forms: http(s) includes and local includes: 63 The references come in two forms: http(s) includes and local includes::
53 64
54 %include http://www.server.org/dir/list.txt% 65 %include http://www.server.org/dir/list.txt%
55 %include easylist:easylist/easylist_general_block.txt% 66 %include easylist:easylist/easylist_general_block.txt%
56 67
57 The first instruction contains a URL that will be fetched and inserted at the 68
58 point of reference. 69 The http include contains a URL that will be fetched and inserted at the point
59 The second one contains a path inside easylist repository. 70 of reference.
60 `flrender` needs to be able to find a copy of the repository on the local 71 The local include contains a path inside the easylist repository.
61 filesystem. We use `-i` option to point it to to the right directory: 72 ``flrender`` needs to be able to find a copy of the repository on the local
73 filesystem. We use ``-i`` option to point it to to the right directory::
62 74
63 $ flrender -i easylist=/home/abc/easylist input.txt output.txt 75 $ flrender -i easylist=/home/abc/easylist input.txt output.txt
64 76
65 Now the second reference above will be resolved to 77
66 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will 78 Now the local include referenced above will be resolved to:
67 be loaded from this file. 79 ``/home/abc/easylist/easylist/easylist_general_block.txt``
80 and the fragment will be loaded from this file.
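The resolution rule described above can be illustrated with a small sketch. This is not the code used by ``flrender``; ``resolve_include`` and ``INCLUDE_RE`` are hypothetical names introduced here, and the sketch only assumes what the README states: an include has the form ``%include <target>%``, an http(s) target is fetched as a URL, and a local target is ``<source name>:<path>``, joined onto the directory given for that source via ``-i``.

```python
import posixpath
import re

# Hypothetical helper, not part of python-abp. It only mirrors the
# behaviour described in the README text for include instructions.
INCLUDE_RE = re.compile(r'^%include\s+(?P<target>.+)%$')


def resolve_include(line, sources):
    """Classify an include line and resolve local includes to a path.

    `sources` maps source names to directories, like the NAME=DIRECTORY
    pairs passed with the -i option.
    """
    match = INCLUDE_RE.match(line.strip())
    if match is None:
        raise ValueError('Not an include instruction: {!r}'.format(line))
    target = match.group('target').strip()

    # http(s) includes are identified by their URL scheme.
    if target.startswith(('http://', 'https://')):
        return 'http', target

    # Local includes: the part before ":" names the source.
    name, _, path = target.partition(':')
    if name not in sources:
        raise KeyError("Unknown source: '{}'".format(name))
    return 'local', posixpath.join(sources[name], path)


sources = {'easylist': '/home/abc/easylist'}
print(resolve_include(
    '%include easylist:easylist/easylist_general_block.txt%', sources))
```

With the source mapping from the ``-i`` example, the local include resolves to the same path the README gives, ``/home/abc/easylist/easylist/easylist_general_block.txt``.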
68 81
69 Directories that contain filter list fragments that are used during rendering 82 Directories that contain filter list fragments that are used during rendering
70 are called sources. 83 are called sources.
71 They are normally working copies of the repositories that contain filter list 84 They are normally working copies of the repositories that contain filter list
72 fragments. 85 fragments.
73 Each source is identified by a name: that's the part that comes before ":" 86 Each source is identified by a name: that's the part that comes before ":" in
74 in the include instruction and it should be the same as what comes before "=" 87 the include instruction and it should be the same as what comes before "=" in
75 in the `-i` option. 88 the ``-i`` option.
76 89
77 Commonly used sources have generally accepted names. For example the main 90 Commonly used sources have generally accepted names. For example the main
78 EasyList repository is referred to as `easylist`. 91 EasyList repository is referred to as ``easylist``.
79 If you don't know all the source names that are needed to render some list, 92 If you don't know all the source names that are needed to render some list,
80 just run `flrender` and it will report what it's missing: 93 just run ``flrender`` and it will report what it's missing::
81 94
82 $ flrender easylist.txt output/easylist.txt 95 $ flrender easylist.txt output/easylist.txt
83 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener 96 Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener
84 al_block.txt' from 'easylist.txt' 97 al_block.txt' from 'easylist.txt'
85 98
86 You can clone the necessary repositories to a local directory and add `-i` 99
100 You can clone the necessary repositories to a local directory and add ``-i``
87 options accordingly. 101 options accordingly.
88 102
89 ## Rendering diffs 103
104 Generating diffs
105 ----------------
90 106
91 A diff allows a client running ad blocking software such as Adblock Plus to 107 A diff allows a client running ad blocking software such as Adblock Plus to
92 update the filter lists incrementally, instead of downloading a new copy of a 108 update the filter lists incrementally, instead of downloading a new copy of a
93 full list during each update. This is meant to lessen the amount of resources 109 full list during each update. This is meant to lessen the amount of resources
94 used when updating filter lists (e.g. network data, memory usage, battery 110 used when updating filter lists (e.g. network data, memory usage, battery
95 consumption, etc.), allowing clients to update their lists more frequently using 111 consumption, etc.), allowing clients to update their lists more frequently
96 less resources. 112 using less resources.
97 113
98 Python-abp contains a script called `fldiff` that will find the diff between the 114 python-abp contains a script called ``fldiff`` that will find the diff between
99 latest filter list, and any number of previous filter lists: 115 the latest filter list, and any number of previous filter lists::
100 116
101 $ fldiff -o diffs/easylist easylist.txt archive/* 117 $ fldiff -o diffs/easylist/ easylist.txt archive/*
102 118
103 where `-o diffs/easylist` is the (optional) output directory where the diffs 119
104 should be written, `easylist.txt` is the most recent version of the filter list, 120 where ``-o diffs/easylist/`` is the (optional) output directory where the diffs
105 and `archive/*` is the directory where all the archived filter lists are. When 121 should be written, ``easylist.txt`` is the most recent version of the filter
106 called like this, the shell should automatically expand the `archive/*` 122 list, and ``archive/*`` is the directory where all the archived filter lists are.
123 When called like this, the shell should automatically expand the ``archive/*``
107 directory, giving the script each of the filenames separately. 124 directory, giving the script each of the filenames separately.
108 125
109 In the above example, the output of each archived `list[version].txt` will be 126 In the above example, the output of each archived ``list[version].txt`` will be
110 written to `diffs/diff[version].txt`. If the output argument is omitted, the 127 written to ``diffs/diff[version].txt``. If the output argument is omitted, the
111 diffs will be written to the current directory. 128 diffs will be written to the current directory.
112 129
113 The script produces three types of lines, as specified in the [technical 130 The script produces three types of lines, as specified in the `technical
114 specification][5]: 131 specification <https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/>`_:
115
116 * Special comments of the form `! <name>:[ <value>]`
117 * Added filters of the form `+ <filter-text>`
118 * Removed filters of the form `- <filter-text>`
119 132
120 133
121 ## Library API 134 * Special comments of the form ``! <name>:[ <value>]``
135 * Added filters of the form ``+ <filter-text>``
136 * Removed filters of the form ``- <filter-text>``
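To make the added and removed line forms concrete, here is a minimal sketch that produces them from two lists of filter texts. It is an illustration only, not the algorithm used by ``fldiff``: ``diff_lines`` is a hypothetical name, the special comment lines are left out, and the output order (removals first, each group sorted) is an assumption made for readability.

```python
def diff_lines(old_filters, new_filters):
    """Yield '- <filter-text>' and '+ <filter-text>' lines.

    Hypothetical sketch: compares the two lists as sets, so duplicates
    and ordering inside the filter lists are ignored.
    """
    old_set, new_set = set(old_filters), set(new_filters)
    # Filters present only in the old list were removed.
    for text in sorted(old_set - new_set):
        yield '- ' + text
    # Filters present only in the new list were added.
    for text in sorted(new_set - old_set):
        yield '+ ' + text


old = ['abc.com/ad$image', '@@/abc\\.com/']
new = ['abc.com/ad$image', 'abc.com,cdf.com##div#ad1']
for line in diff_lines(old, new):
    print(line)
```

A client holding the old list could apply such lines by dropping every ``-`` entry and appending every ``+`` entry, which is what makes incremental updates cheaper than downloading the full list.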
122 137
123 Python-abp can also be used as a library for parsing filter lists. For example 138
139 Library API
140 -----------
141
142 python-abp can also be used as a library for parsing filter lists. For example
124 to read a filter list (we use Python 3 syntax here but the API is the same): 143 to read a filter list (we use Python 3 syntax here but the API is the same):
125 144
145 .. code-block:: python
146
126 from abp.filters import parse_filterlist 147 from abp.filters import parse_filterlist
127 148
128 with open('filterlist.txt') as filterlist: 149 with open('filterlist.txt') as filterlist:
129 for line in parse_filterlist(filterlist): 150 for line in parse_filterlist(filterlist):
130 print(line) 151 print(line)
131 152
132 If `filterlist.txt` contains a filter list: 153
154 If ``filterlist.txt`` contains this filter list::
133 155
134 [Adblock Plus 2.0] 156 [Adblock Plus 2.0]
135 ! Title: Example list 157 ! Title: Example list
136 158
137 abc.com,cdf.com##div#ad1 159 abc.com,cdf.com##div#ad1
138 abc.com/ad$image 160 abc.com/ad$image
139 @@/abc\.com/ 161 @@/abc\.com/
140 ... 162
141 163
142 the output will look something like: 164 the output will look something like:
143 165
166 .. code-block:: python
167
144 Header(version='Adblock Plus 2.0') 168 Header(version='Adblock Plus 2.0')
145 Metadata(key='Title', value='Example list') 169 Metadata(key='Title', value='Example list')
146 EmptyLine() 170 EmptyLine()
147 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])]) 171 Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])])
148 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) 172 Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])
149 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) 173 Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])
150 ...
151 174
152 `abp.filters` module also exports a lower-level function for parsing individual
153 lines of a filter list: `parse_line`. It returns a parsed line object just like
154 the items in the iterator returned by `parse_filterlist`.
155 175
156 For further information on the library API use `help()` on `abp.filters` and 176 The ``abp.filters`` module also exports a lower-level function for parsing
157 its contents in interactive Python session, read the docstrings or look at the 177 individual lines of a filter list: ``parse_line``. It returns a parsed line
158 tests for some usage examples. 178 object just like the items in the iterator returned by ``parse_filterlist``.
159 179
160 ## Testing 180 For further information on the library API use ``help()`` on ``abp.filters`` and
181 its contents in an interactive Python session, read the docstrings, or look at
182 the tests for some usage examples.
161 183
162 Unit tests for `python-abp` are located in the `/tests` directory.
163 [Pytest][2] is used for quickly running the tests
164 during development.
165 [Tox][3] is used for testing in different
166 environments (Python 2.7, Python 3.5+ and PyPy) and code quality
167 reporting.
168 184
169 In order to execute the tests, first create and activate development 185 Testing
170 virtualenv: 186 -------
171 187
172 $ python setup.py devenv 188 Unit tests for ``python-abp`` are located in the ``/tests`` directory. `Pytest <http://pytest.org/>`_
173 $ . devenv/bin/activate 189 is used for quickly running the tests during development. `Tox <https://tox.readthedocs.org/>`_ is used for
190 testing in different environments (Python 2.7, Python 3.5+ and PyPy) and code
191 quality reporting.
174 192
175 With the development virtualenv activated use pytest for a quick test run:
176 193
177 (devenv) $ pytest tests 194 Development
195 -----------
178 196
179 and tox for a comprehensive report: 197 When adding new functionality, add tests for it (preferably first). If some
180 198 code will never be reached on a certain version of Python, it may be exempted
181 (devenv) $ tox 199 from coverage tests by adding a comment, e.g. ``# pragma: no py2 cover``.
182
183 ## Development
184
185 When adding new functionality, add tests for it (preferably first). Code
186 coverage (as measured by `tox -e qa`) should not decrease and the tests
187 should pass in all Tox environments.
188 200
189 All public functions, classes and methods should have docstrings compliant with 201 All public functions, classes and methods should have docstrings compliant with
190 [NumPy/SciPy documentation guide][4]. One exception is the constructors of 202 `NumPy/SciPy documentation guide <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_.
191 classes that the user is not expected to instantiate (such as exceptions). 203 One exception is the constructors of classes that the user is not expected to
204 instantiate (such as exceptions).
192 205
193 206
194 ## Using the library with R 207 Using the library with R
208 ------------------------
195 209
196 Clone the repo to you local machine. Then create a virtualenv and install 210 Clone the repo to your local machine. Then create a virtualenv and install
197 python abp there: 211 python-abp there::
198 212
199 $ cd python-abp 213 $ cd python-abp
200 $ virtualenv env 214 $ virtualenv env
201 $ pip install --upgrade . 215 $ pip install --upgrade .
202
203 Then import it with `reticulate` in R:
204
205 > library(reticulate)
206 > use_virtualenv("~/python-abp/env", required=TRUE)
207 > abp <- import("abp.filters.rpy")
208
209 Now you can use the functions with `abp$functionname`, e.g.
210 `abp.line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")`
211 216
212 217
213 [1]: https://adblockplus.org/filters#special-comments 218 Then import it with ``reticulate`` in R:
214 [2]: http://pytest.org/ 219
215 [3]: https://tox.readthedocs.org/ 220 .. code-block:: R
216 [4]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt 221
217 [5]: https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/ 222 > library(reticulate)
223 > use_virtualenv("~/python-abp/env", required=TRUE)
224 > abp <- import("abp.filters.rpy")
225
226 Now you can use the functions with ``abp$functionname``, e.g.
227 ``abp.line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")``.