| LEFT | RIGHT | 
|---|---|
| (no file at all) |  | 
| 1 # python-abp | 1 python-abp | 
| 2 | 2 ========== | 
| 3 This repository contains a library for working with Adblock Plus filter lists | 3 | 
| 4 and the script that is used for building Adblock Plus filter lists from the | 4 This repository contains a library for working with Adblock Plus filter lists, | 
| 5 form in which they are authored into the format suitable for consumption by the | 5 a script for rendering diffs between filter lists, and the script that is used | 
| 6 adblocking software. | 6 for building Adblock Plus filter lists from the form in which they are authored | 
| 7 | 7 into the format suitable for consumption by the adblocking software (aka | 
| 8 ## Installation | 8 rendering). | 
|  | 9 | 
|  | 10 .. contents:: | 
|  | 11 | 
|  | 12 | 
|  | 13 Installation | 
|  | 14 ------------ | 
| 9 | 15 | 
| 10 Prerequisites: | 16 Prerequisites: | 
| 11 | 17 | 
| 12 * Linux, Mac OS X or Windows (any modern Unix should work too), | 18 * Linux, Mac OS X or Windows (any modern Unix should work too), | 
| 13 * Python (2.7 or 3.5+), | 19 * Python (2.7 or 3.5+), | 
| 14 * pip. | 20 * pip. | 
| 15 | 21 | 
| 16 To install: | 22 To install:: | 
| 17 | 23 | 
| 18     $ pip install -U python-abp | 24     $ pip install --upgrade python-abp | 
| 19 | 25 | 
| 20 ## Rendering of filter lists | 26 | 
|  | 27 Rendering of filter lists | 
|  | 28 ------------------------- | 
| 21 | 29 | 
| 22 The filter lists are originally authored in relatively smaller parts focused | 30 The filter lists are originally authored in relatively smaller parts focused | 
| 23 on a particular type of filters, related to a specific topic or relevant | 31 on particular types of filters, related to a specific topic or relevant for a | 
| 24 for particular geographical area. | 32 particular geographical area. | 
| 25 We call these parts _filter list fragments_ (or just _fragments_) | 33 We call these parts *filter list fragments* (or just *fragments*) to | 
| 26 to distinguish them from full filter lists that are | 34 distinguish them from full filter lists that are consumed by the adblocking | 
| 27 consumed by the adblocking software such as Adblock Plus. | 35 software such as Adblock Plus. | 
| 28 | 36 | 
| 29 Rendering is a process that combines filter list fragments into a filter list. | 37 Rendering is a process that combines filter list fragments into a filter list. | 
| 30 It starts with one fragment that can include other ones and so forth. | 38 It starts with one fragment that can include other ones and so forth. | 
| 31 The produced filter list is marked with a [version and a timestamp][1]. | 39 The produced filter list is marked with a `version and a timestamp <https://adblockplus.org/filters#special-comments>`_. | 
| 32 | 40 | 
| 33 Python-abp contains a script that can do this called `flrender`: | 41 Python-abp contains a script that can do this called ``flrender``:: | 
| 34 | 42 | 
| 35     $ flrender fragment.txt filterlist.txt | 43     $ flrender fragment.txt filterlist.txt | 
| 36 | 44 | 
| 37 This will take the top level fragment in `fragment.txt`, render it and save into | 45 | 
| 38 `filterlist.txt`. | 46 This will take the top level fragment in ``fragment.txt``, render it and save it | 
| 39 | 47 into ``filterlist.txt``. | 
| 40 The `flrender` script can also be used by only specifying `fragment.txt`: | 48 | 
| 41 | 49 The ``flrender`` script can also be used by only specifying ``fragment.txt``:: | 
| 42     $flrender fragment.txt | 50 | 
| 43 | 51     $ flrender fragment.txt | 
| 44 in which case the rendering result will be sent to `stdout`. Moreover, when | 52 | 
| 45 it's run with no positional arguments: | 53 | 
| 46 | 54 in which case the rendering result will be sent to ``stdout``. Moreover, when | 
| 47     $flrender | 55 it's run with no positional arguments:: | 
| 48 | 56 | 
| 49 it will read from `stdin` and send the results to `stdout`. | 57     $ flrender | 
|  | 58 | 
|  | 59 | 
|  | 60 it will read from ``stdin`` and send the results to ``stdout``. | 
| 50 | 61 | 
| 51 Fragments might reference other fragments that should be included into them. | 62 Fragments might reference other fragments that should be included into them. | 
| 52 The references come in two forms: http(s) includes and local includes: | 63 The references come in two forms: http(s) includes and local includes:: | 
| 53 | 64 | 
| 54     %include http://www.server.org/dir/list.txt% | 65     %include http://www.server.org/dir/list.txt% | 
| 55     %include easylist:easylist/easylist_general_block.txt% | 66     %include easylist:easylist/easylist_general_block.txt% | 
| 56 | 67 | 
| 57 The first instruction contains a URL that will be fetched and inserted at the | 68 | 
| 58 point of reference. | 69 The http include contains a URL that will be fetched and inserted at the point | 
| 59 The second one contains a path inside easylist repository. | 70 of reference. | 
| 60 `flrender` needs to be able to find a copy of the repository on the local | 71 The local include contains a path inside the easylist repository. | 
| 61 filesystem. We use `-i` option to point it to to the right directory: | 72 ``flrender`` needs to be able to find a copy of the repository on the local | 
|  | 73 filesystem. We use the ``-i`` option to point it to the right directory:: | 
| 62 | 74 | 
| 63     $ flrender -i easylist=/home/abc/easylist input.txt output.txt | 75     $ flrender -i easylist=/home/abc/easylist input.txt output.txt | 
| 64 | 76 | 
| 65 Now the second reference above will be resolved to | 77 | 
| 66 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will | 78 Now the local include referenced above will be resolved to: | 
| 67 be loaded from this file. | 79 ``/home/abc/easylist/easylist/easylist_general_block.txt`` | 
|  | 80 and the fragment will be loaded from this file. | 
| 68 | 81 | 
| 69 Directories that contain filter list fragments that are used during rendering | 82 Directories that contain filter list fragments that are used during rendering | 
| 70 are called sources. | 83 are called sources. | 
| 71 They are normally working copies of the repositories that contain filter list | 84 They are normally working copies of the repositories that contain filter list | 
| 72 fragments. | 85 fragments. | 
| 73 Each source is identified by a name: that's the part that comes before ":" | 86 Each source is identified by a name: that's the part that comes before ":" in | 
| 74 in the include instruction and it should be the same as what comes before "=" | 87 the include instruction and it should be the same as what comes before "=" in | 
| 75 in the `-i` option. | 88 the ``-i`` option. | 
| 76 | 89 | 
| 77 Commonly used sources have generally accepted names. For example the main | 90 Commonly used sources have generally accepted names. For example the main | 
| 78 EasyList repository is referred to as `easylist`. | 91 EasyList repository is referred to as ``easylist``. | 
| 79 If you don't know all the source names that are needed to render some list, | 92 If you don't know all the source names that are needed to render some list, | 
| 80 just run `flrender` and it will report what it's missing: | 93 just run ``flrender`` and it will report what it's missing:: | 
| 81 | 94 | 
| 82     $ flrender easylist.txt output/easylist.txt | 95     $ flrender easylist.txt output/easylist.txt | 
| 83     Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener | 96     Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener | 
| 84     al_block.txt' from 'easylist.txt' | 97     al_block.txt' from 'easylist.txt' | 
| 85 | 98 | 
| 86 You can clone the necessary repositories to a local directory and add `-i` | 99 | 
|  | 100 You can clone the necessary repositories to a local directory and add ``-i`` | 
| 87 options accordingly. | 101 options accordingly. | 
| 88 | 102 | 
| 89 ## Rendering diffs | 103 | 
|  | 104 Generating diffs | 
|  | 105 ---------------- | 
| 90 | 106 | 
| 91 A diff allows a client running ad blocking software such as Adblock Plus to | 107 A diff allows a client running ad blocking software such as Adblock Plus to | 
| 92 update the filter lists incrementally, instead of downloading a new copy of a | 108 update the filter lists incrementally, instead of downloading a new copy of a | 
| 93 full list during each update. This is meant to lessen the amount of resources | 109 full list during each update. This is meant to lessen the amount of resources | 
| 94 used when updating filter lists (e.g. network data, memory usage, battery | 110 used when updating filter lists (e.g. network data, memory usage, battery | 
| 95 consumption, etc.), allowing clients to update their lists more frequently using | 111 consumption, etc.), allowing clients to update their lists more frequently | 
| 96 less resources. | 112 using fewer resources. | 
| 97 | 113 | 
| 98 Python-abp contains a script called `fldiff` that will find the diff between the | 114 python-abp contains a script called ``fldiff`` that will find the diff between | 
| 99 latest filter list, and any number of previous filter lists: | 115 the latest filter list and any number of previous filter lists:: | 
| 100 | 116 | 
| 101     $ fldiff -o diffs/easylist easylist.txt archive/* | 117     $ fldiff -o diffs/easylist/ easylist.txt archive/* | 
| 102 | 118 | 
| 103 where `-o diffs/easylist` is the (optional) output directory where the diffs | 119 | 
| 104 should be written, `easylist.txt` is the most recent version of the filter list, | 120 where ``-o diffs/easylist/`` is the (optional) output directory where the diffs | 
| 105 and `archive/*` is the directory where all the archived filter lists are. When | 121 should be written, ``easylist.txt`` is the most recent version of the filter | 
| 106 called like this, the shell should automatically expand the `archive/*` | 122 list, and ``archive/*`` is the directory where all the archived filter lists are. | 
|  | 123 When called like this, the shell should automatically expand the ``archive/*`` | 
| 107 directory, giving the script each of the filenames separately. | 124 directory, giving the script each of the filenames separately. | 
| 108 | 125 | 
| 109 In the above example, the output of each archived `list[version].txt` will be | 126 In the above example, the output of each archived ``list[version].txt`` will be | 
| 110 written to `diffs/diff[version].txt`. If the output argument is omitted, the | 127 written to ``diffs/easylist/diff[version].txt``. If the output argument is omitted, the | 
| 111 diffs will be written to the current directory. | 128 diffs will be written to the current directory. | 
| 112 | 129 | 
| 113 The script produces three types of lines, as specified in the [technical | 130 The script produces three types of lines, as specified in the `technical | 
| 114 specification][5]: | 131 specification <https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/>`_: | 
| 115 | 132 | 
| 116 * Special comments of the form `! <name>:[ <value>]` | 133 | 
| 117 * Added filters of the form `+ <filter-text>` | 134 * Special comments of the form ``! <name>:[ <value>]`` | 
| 118 * Removed filters of the form `- <filter-text>` | 135 * Added filters of the form ``+ <filter-text>`` | 
| 119 | 136 * Removed filters of the form ``- <filter-text>`` | 
| 120 | 137 | 
| 121 ## Library API | 138 | 
| 122 | 139 Library API | 
| 123 Python-abp can also be used as a library for parsing filter lists. For example | 140 ----------- | 
|  | 141 | 
|  | 142 python-abp can also be used as a library for parsing filter lists. For example | 
| 124 to read a filter list (we use Python 3 syntax here but the API is the same): | 143 to read a filter list (we use Python 3 syntax here but the API is the same): | 
|  | 144 | 
|  | 145 .. code-block:: python | 
| 125 | 146 | 
| 126     from abp.filters import parse_filterlist | 147     from abp.filters import parse_filterlist | 
| 127 | 148 | 
| 128     with open('filterlist.txt') as filterlist: | 149     with open('filterlist.txt') as filterlist: | 
| 129         for line in parse_filterlist(filterlist): | 150         for line in parse_filterlist(filterlist): | 
| 130             print(line) | 151             print(line) | 
| 131 | 152 | 
| 132 If `filterlist.txt` contains a filter list: | 153 | 
|  | 154 If ``filterlist.txt`` contains this filter list:: | 
| 133 | 155 | 
| 134     [Adblock Plus 2.0] | 156     [Adblock Plus 2.0] | 
| 135     ! Title: Example list | 157     ! Title: Example list | 
| 136 | 158 | 
| 137     abc.com,cdf.com##div#ad1 | 159     abc.com,cdf.com##div#ad1 | 
| 138     abc.com/ad$image | 160     abc.com/ad$image | 
| 139     @@/abc\.com/ | 161     @@/abc\.com/ | 
| 140     ... | 162 | 
| 141 | 163 | 
| 142 the output will look something like: | 164 the output will look something like: | 
|  | 165 | 
|  | 166 .. code-block:: python | 
| 143 | 167 | 
| 144     Header(version='Adblock Plus 2.0') | 168     Header(version='Adblock Plus 2.0') | 
| 145     Metadata(key='Title', value='Example list') | 169     Metadata(key='Title', value='Example list') | 
| 146     EmptyLine() | 170     EmptyLine() | 
| 147     Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc .com', True), ('cdf.com', True)])]) | 171     Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])]) | 
| 148     Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) | 172     Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) | 
| 149     Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) | 173     Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) | 
| 150     ... | 174 | 
| 151 | 175 | 
| 152 `abp.filters` module also exports a lower-level function for parsing individual | 176 The ``abp.filters`` module also exports a lower-level function for parsing | 
| 153 lines of a filter list: `parse_line`. It returns a parsed line object just like | 177 individual lines of a filter list: ``parse_line``. It returns a parsed line | 
| 154 the items in the iterator returned by `parse_filterlist`. | 178 object just like the items in the iterator returned by ``parse_filterlist``. | 
| 155 | 179 | 
| 156 For further information on the library API use `help()` on `abp.filters` and | 180 For further information on the library API use ``help()`` on ``abp.filters`` and | 
| 157 its contents in interactive Python session, read the docstrings or look at the | 181 its contents in an interactive Python session, read the docstrings, or look at | 
| 158 tests for some usage examples. | 182 the tests for some usage examples. | 
| 159 | 183 | 
| 160 ## Testing | 184 | 
| 161 | 185 Testing | 
| 162 Unit tests for `python-abp` are located in the `/tests` directory. | 186 ------- | 
| 163 [Pytest][2] is used for quickly running the tests | 187 | 
| 164 during development. | 188 Unit tests for ``python-abp`` are located in the ``/tests`` directory. `Pytest <http://pytest.org/>`_ |
| 165 [Tox][3] is used for testing in different | 189 is used for quickly running the tests during development. `Tox <https://tox.readthedocs.org/>`_ is used for |
| 166 environments (Python 2.7, Python 3.5+ and PyPy) and code quality | 190 testing in different environments (Python 2.7, Python 3.5+ and PyPy) and code | 
| 167 reporting. | 191 quality reporting. | 
| 168 | 192 | 
| 169 In order to execute the tests, first create and activate development | 193 Use tox for a comprehensive report of unit tests and test coverage:: | 
| 170 virtualenv: | 194 | 
| 171 | 195     $ tox | 
| 172     $ python setup.py devenv | 196 | 
| 173     $ . devenv/bin/activate | 197 | 
| 174 | 198 Development | 
| 175 With the development virtualenv activated use pytest for a quick test run: | 199 ----------- | 
| 176 | 200 | 
| 177     (devenv) $ pytest tests | 201 When adding new functionality, add tests for it (preferably first). If some | 
| 178 | 202 code will never be reached on a certain version of Python, it may be exempted | 
| 179 and tox for a comprehensive report: | 203 from coverage tests by adding a comment, e.g. ``# pragma: no py2 cover``. | 
| 180 |  | 
| 181     (devenv) $ tox |  | 
| 182 |  | 
| 183 ## Development |  | 
| 184 |  | 
| 185 When adding new functionality, add tests for it (preferably first). Code |  | 
| 186 coverage (as measured by `tox -e qa`) should not decrease and the tests |  | 
| 187 should pass in all Tox environments. |  | 
| 188 | 204 | 
| 189 All public functions, classes and methods should have docstrings compliant with | 205 All public functions, classes and methods should have docstrings compliant with | 
| 190 [NumPy/SciPy documentation guide][4]. One exception is the constructors of | 206 `NumPy/SciPy documentation guide <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt>`_. | 
| 191 classes that the user is not expected to instantiate (such as exceptions). | 207 One exception is the constructors of classes that the user is not expected to | 
| 192 | 208 instantiate (such as exceptions). | 
| 193 | 209 | 
| 194 ## Using the library with R | 210 | 
| 195 | 211 Using the library with R | 
| 196 Clone the repo to you local machine. Then create a virtualenv and install | 212 ------------------------ | 
| 197 python abp there: | 213 | 
| 198 | 214 Clone the repo to your local machine. Then create a virtualenv and install | 
| 199         $ cd python-abp | 215 python-abp there:: | 
| 200         $ virtualenv env | 216 | 
| 201         $ pip install --upgrade . | 217     $ cd python-abp | 
| 202 | 218     $ virtualenv env | 
| 203 Then import it with `reticulate` in R: | 219     $ env/bin/pip install --upgrade . | 
| 204 | 220 | 
| 205         > library(reticulate) | 221 | 
| 206         > use_virtualenv("~/python-abp/env", required=TRUE) | 222 Then import it with ``reticulate`` in R: | 
| 207         > abp <- import("abp.filters.rpy") | 223 | 
| 208 | 224 .. code-block:: R | 
| 209 Now you can use the functions with `abp$functionname`, e.g. | 225 | 
| 210 `abp.line2dict("@@\|\|g.doubleclick.net/pagead/$subdocument,domain=hon30.org")` | 226     > library(reticulate) | 
| 211 | 227     > use_virtualenv("~/python-abp/env", required=TRUE) | 
| 212 | 228     > abp <- import("abp.filters.rpy") | 
| 213  [1]: https://adblockplus.org/filters#special-comments | 229 | 
| 214  [2]: http://pytest.org/ | 230 Now you can use the functions with ``abp$functionname``, e.g. | 
| 215  [3]: https://tox.readthedocs.org/ | 231 ``abp.line2dict("@@||g.doubleclick.net/pagead/$subdocument,domain=hon30.org")``. | 
| 216  [4]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt |  | 
| 217  [5]: https://docs.google.com/document/d/1SoEqaOBZRCfkh1s5Kds5A5RwUC_nqbYYlGH72sbsSgQ/ |  | 
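
The local-include rule described in the README (`%include easylist:easylist/easylist_general_block.txt%` combined with `-i easylist=/home/abc/easylist` resolves to `/home/abc/easylist/easylist/easylist_general_block.txt`) can be illustrated with a small sketch. This is not `flrender`'s implementation: `parse_source_option` and `resolve_include` are hypothetical helper names, the sketch assumes POSIX-style paths, and http(s) includes are returned unchanged rather than fetched.

```python
import posixpath
import re

# Hypothetical helpers illustrating the README's include rules;
# they are NOT part of python-abp's API.
INCLUDE_RE = re.compile(r'^%include\s+(.+)%$')


def parse_source_option(option):
    """Split an ``-i name=directory`` value into (name, directory)."""
    name, _, directory = option.partition('=')
    return name, directory


def resolve_include(line, sources):
    """Return the URL or local path that an include instruction refers to.

    ``sources`` maps source names (the part before "=" in ``-i``) to
    directories. Raises KeyError when the source name is unknown.
    """
    match = INCLUDE_RE.match(line.strip())
    if match is None:
        raise ValueError('not an include instruction: {!r}'.format(line))
    target = match.group(1).strip()
    if target.startswith(('http://', 'https://')):
        # http(s) include: the URL would be fetched as-is.
        return target
    # Local include: the part before ":" names the source directory.
    name, _, path = target.partition(':')
    if name not in sources:
        raise KeyError('Unknown source: {!r}'.format(name))
    return posixpath.join(sources[name], path)
```

For example, with `sources = {'easylist': '/home/abc/easylist'}`, calling `resolve_include('%include easylist:easylist/easylist_general_block.txt%', sources)` returns `'/home/abc/easylist/easylist/easylist_general_block.txt'`, the same path the README gives. Keeping the sources in a dictionary mirrors passing several `-i name=directory` options.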
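
The three kinds of lines that `fldiff` produces (special comments `! <name>:[ <value>]`, added filters `+ <filter-text>`, removed filters `- <filter-text>`) can be illustrated with a set-based sketch. This is an illustration under stated assumptions, not `fldiff`'s algorithm: `split_list` and `render_diff` are hypothetical names, any single-word `! <name>: <value>` line is assumed to be a special comment, every special comment of the latest list is emitted, and the output order (comments, then removals, then additions) is chosen for readability rather than taken from the technical specification.

```python
import re

# Assumption: any "! <name>: <value>" line with a single-word name is treated
# as a special comment (e.g. "! Version: 2"); other "!" lines are ordinary
# comments and are skipped.
SPECIAL_COMMENT_RE = re.compile(r'^!\s*([\w-]+)\s*:\s*(.*)$')


def split_list(lines):
    """Return (special_comments, filters) for an iterable of list lines."""
    comments = []  # (name, value) pairs, in file order
    filters = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith('['):
            continue  # skip empty lines and the "[Adblock Plus 2.0]" header
        match = SPECIAL_COMMENT_RE.match(line)
        if match:
            comments.append((match.group(1), match.group(2)))
        elif not line.startswith('!'):
            filters.append(line)
    return comments, filters


def render_diff(base_lines, latest_lines):
    """Yield special comments of the latest list, then removed and added filters."""
    _, old_filters = split_list(base_lines)
    new_comments, new_filters = split_list(latest_lines)
    old_set, new_set = set(old_filters), set(new_filters)
    for name, value in new_comments:
        # "! <name>:[ <value>]": the value part is optional.
        yield '! {}:{}'.format(name, ' ' + value if value else '')
    for text in old_filters:
        if text not in new_set:
            yield '- ' + text
    for text in new_filters:
        if text not in old_set:
            yield '+ ' + text
```

Using sets for the membership tests keeps each comparison roughly linear in the size of the lists; a fuller tool would also need the archive handling and output directory that `fldiff -o` manages.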