| LEFT | RIGHT | 
|    1 # python-abp |    1 # python-abp | 
|    2  |    2  | 
|    3 This repository contains a library for working with Adblock Plus filter lists |    3 This repository contains a library for working with Adblock Plus filter lists | 
|    4 and the script that is used for building Adblock Plus filter lists from the |    4 and the script that is used for building Adblock Plus filter lists from the | 
|    5 form in which they are authored into the format suitable for consumption by the |    5 form in which they are authored into the format suitable for consumption by the | 
|    6 adblocking software. |    6 adblocking software. | 
|    7  |    7  | 
|    8 ## Installation |    8 ## Installation | 
|    9  |    9  | 
|   10 Prerequisites: |   10 Prerequisites: | 
|   11  |   11  | 
|   12 * Linux, Mac OS X or Windows (any modern Unix should work too), |   12 * Linux, Mac OS X or Windows (any modern Unix should work too), | 
|   13 * Python (2.7 or 3.5, 3.6), |   13 * Python (2.7 or 3.5+), | 
|   14 * pip. |   14 * pip. | 
|   15  |   15  | 
|   16 To install: |   16 To install: | 
|   17  |   17  | 
|   18     $ pip install -U python-abp |   18     $ pip install -U python-abp | 
|   19  |   19  | 
|   20 ## Rendering of filter lists |   20 ## Rendering of filter lists | 
|   21  |   21  | 
|   22 The filter lists are originally authored in relatively smaller parts focused |   22 The filter lists are originally authored in relatively smaller parts focused | 
|   23 on a particular type of filters, related to a specific topic or relevant |   23 on a particular type of filters, related to a specific topic or relevant | 
| (...skipping 23 matching lines...) | 
|   47 The first instruction contains a URL that will be fetched and inserted at the |   47 The first instruction contains a URL that will be fetched and inserted at the | 
|   48 point of reference.  |   48 point of reference.  | 
|   49 The second one contains a path inside easylist repository. |   49 The second one contains a path inside easylist repository. | 
|   50 `flrender` needs to be able to find a copy of the repository on the local |   50 `flrender` needs to be able to find a copy of the repository on the local | 
|   51 filesystem. We use the `-i` option to point it to the right directory: |   51 filesystem. We use the `-i` option to point it to the right directory: | 
|   52  |   52  | 
|   53     $ flrender -i easylist=/home/abc/easylist input.txt output.txt |   53     $ flrender -i easylist=/home/abc/easylist input.txt output.txt | 
|   54  |   54  | 
|   55 Now the second reference above will be resolved to |   55 Now the second reference above will be resolved to | 
|   56 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will |   56 `/home/abc/easylist/easylist/easylist_general_block.txt` and the fragment will | 
|   57 be read from this file. |   57 be loaded from this file. | 
|   58  |   58  | 
|   59 Directories that contain filter list fragments that are used during rendering |   59 Directories that contain filter list fragments that are used during rendering | 
|   60 are called sources. |   60 are called sources. | 
|   61 They are normally working copies of the repositories that contain filter list |   61 They are normally working copies of the repositories that contain filter list | 
|   62 fragments. |   62 fragments. | 
|   63 Each source is identified by a name: that's the part that comes before ":" |   63 Each source is identified by a name: that's the part that comes before ":" | 
|   64 in the include instruction and it should be the same as what comes before "=" |   64 in the include instruction and it should be the same as what comes before "=" | 
|   65 in the `-i` option. |   65 in the `-i` option. | 
|   66  |   66  | 
|   67 Commonly used sources have generally accepted names. For example the main |   67 Commonly used sources have generally accepted names. For example the main | 
|   68 EasyList repository is referred to as `easylist`. |   68 EasyList repository is referred to as `easylist`. | 
|   69 If you don't know all the source names that are needed to render some list, |   69 If you don't know all the source names that are needed to render some list, | 
|   70 just run `flrender` and it will report what it's missing: |   70 just run `flrender` and it will report what it's missing: | 
|   71  |   71  | 
|   72     $ flrender easylist.txt output/easylist.txt |   72     $ flrender easylist.txt output/easylist.txt | 
|   73     Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener |   73     Unknown source: 'easylist' when including 'easylist:easylist/easylist_gener | 
|   74     al_block.txt' from 'easylist.txt' |   74     al_block.txt' from 'easylist.txt' | 
|   75  |   75  | 
|   76 You can clone the necessary repositories to a local directory and add `-i` |   76 You can clone the necessary repositories to a local directory and add `-i` | 
|   77 options accordingly. |   77 options accordingly. | 
|   78  |   78  | 
|   79 ## Library API |   79 ## Library API | 
|   80  |   80  | 
|   81 Python-abp can also be used as a library for parsing filter lists. For example |   81 Python-abp can also be used as a library for parsing filter lists. For example | 
|   82 to read a filter list (we use Python 3 syntax here but the API is the same): |   82 to read a filter list (we use Python 3 syntax here but the API is the same): | 
|   83  |   83  | 
|   84     from abp.filter import parse_filterlist |   84     from abp.filters import parse_filterlist | 
|   85  |   85  | 
|   86     with open('filterlist.txt') as filterlist: |   86     with open('filterlist.txt') as filterlist: | 
|   87         for line in parse_filterlist(filterlist): |   87         for line in parse_filterlist(filterlist): | 
|   88             print(line) |   88             print(line) | 
|   89  |   89  | 
|   90 If `filterlist.txt` contains a filter list, the output will look similar to |   90 If `filterlist.txt` contains a filter list: | 
|   91 the following: |   91  | 
 |   92     [Adblock Plus 2.0] | 
 |   93     ! Title: Example list | 
 |   94  | 
 |   95     abc.com,cdf.com##div#ad1 | 
 |   96     abc.com/ad$image | 
 |   97     @@/abc\.com/ | 
 |   98     ... | 
 |   99  | 
 |  100 the output will look something like: | 
|   92  |  101  | 
|   93     Header(version='Adblock Plus 2.0') |  102     Header(version='Adblock Plus 2.0') | 
|   94     Metadata(key='Title', value='Example List') |  103     Metadata(key='Title', value='Example list') | 
|   95     EmptyLine() |  104     EmptyLine() | 
|   96     Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': |  105     Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])]) | 
|   97     'div#ad1'}, action='hide', options={'domains-include': ['abc.com', |  106     Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)]) | 
|   98     'cdf.com'], 'domains-none': True}) |  107     Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[]) | 
|   99     Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': |  | 
|  100     'abc.com/ad'}, action='block', options={'types-none': True, |  | 
|  101     'types-include': ['image']}) |  | 
|  102     Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': |  | 
|  103     'abc\\.com'}, action='allow', options={}) |  | 
|  104     ... |  108     ... | 
|  105  |  109  | 
|  106 In general `parse_filterlist` takes an iterable of strings (such as a list or |  110 `abp.filters` module also exports a lower-level function for parsing individual | 
|  107 an open file) and returns an iterable of parsed filter list lines. Each line |  111 lines of a filter list: `parse_line`. It returns a parsed line object just like | 
|  108 will have its `.type` attribute set to a string indicating its type. It will |  112 the items in the iterator returned by `parse_filterlist`.  | 
|  109 also have a `.to_string()` method that converts it to a unicode string in the |  | 
|  110 filter list format (most of the time it's the same as the string from which the |  | 
|  111 filter was parsed). Further attributes depend on the type of the line. |  | 
|  112  |  113  | 
|  113 **Note:** `parse_filterlist` returns an iterator, not a list, and only consumes |  114 For further information on the library API use `help()` on `abp.filters` and | 
|  114 the input lines when its output is iterated over. This allows much more memory |  115 its contents in interactive Python session, read the docstrings or look at the | 
|  115 efficient handling of large filter lists, however there are two things to watch |  116 tests for some usage examples. | 
|  116 out for: |  | 
|  117  |  | 
|  118 - When you're parsing filters from a file, you need to complete the iteration |  | 
|  119   before you close the file. |  | 
|  120 - Once you iterate over the output of `parse_filterlist` once, it will be |  | 
|  121   consumed and you won't be able to iterate over it again. |  | 
|  122  |  | 
|  123 If you find that any of these issues is bothering you, you probably want to |  | 
|  124 convert the output of `parse_filterlist` to a list: |  | 
|  125  |  | 
|  126     lines_list = list(parse_filterlist(filterlist)) |  | 
|  127  |  | 
|  128 This will load the whole file into memory but unless you're dealing with a |  | 
|  129 gigantic filter list that should not be a problem. |  | 
|  130  |  | 
|  131 ### Line types |  | 
|  132  |  | 
|  133 As mentioned before, lines of different types have different attributes: |  | 
|  134  |  | 
|  135 | type       | attributes                                                             | |  | 
|  136 |------------|------------------------------------------------------------------------| |  | 
|  137 | header     | `version` - plugin version string                                      | |  | 
|  138 | emptyline  | no options                                                             | |  | 
|  139 | comment    | `text` - text of the comment                                           | |  | 
|  140 | metadata   | `key` - name of the metadata field, `value` - value of the field       | |  | 
|  141 | include    | `target` - url/path of the file to include                             | |  | 
|  142 | invalid    | `text` - full text of the line, error - error message                  | |  | 
|  143 | filter     | `text` - text of the filter, `selector` - what to look for, `action` - what to do with selected items, `options` - filter options | |  | 
|  144  |  | 
|  145 #### Filter attributes |  | 
|  146  |  | 
|  147 Selector is a dictionary with two keys: |  | 
|  148  |  | 
|  149 | key          | meaning                                            | |  | 
|  150 |--------------|----------------------------------------------------| |  | 
|  151 | type         | 'css', 'abp-simple', 'url-pattern', 'url-regexp'   | |  | 
|  152 | value        | the selector itself, the meaning is type-dependent | |  | 
|  153  |  | 
|  154 Options is a dictionary with a variable set of keys. Only options that are |  | 
|  155 actually present in the filter will be stored there. The list of possible options |  | 
|  156 and their meanings can be found in [documentation on authoring the filter |  | 
|  157 rules][2]. |  | 
|  158  |  | 
|  159 There are four classes of options that are handled differently: |  | 
|  160  |  | 
|  161 - Type options (that make the rule apply or not apply to certain types of |  | 
|  162   requests and resources): |  | 
|  163     - `types-include`: List of additional types to which the rule applies. |  | 
|  164     - `types-exclude`: List of types to which the rule doesn't apply. |  | 
|  165     - `types-none`: If this is `True`, the filter only applies to the types |  | 
|  166       in `types-include`. Otherwise all types except for `document`, `popup`, |  | 
|  167       `elemhide`, `generichide` and `genericblock` are implicitly included. |  | 
|  168 - Domain options (that make the rule apply or not apply to specific domains): |  | 
|  169     - `domains-include`: List of domains to which the rule applies (it will also |  | 
|  170       apply to any subdomains unless they are excluded). |  | 
|  171     - `domains-exclude`: Excluded domains (their subdomains are also excluded |  | 
|  172       unless specifically included). |  | 
|  173     - `domains-none`: If this is `True`, all domains that are not mentioned by |  | 
|  174       `domains-include` and `domains-exclude` are excluded. Otherwise they are |  | 
|  175       included. |  | 
|  176 - `sitekeys`: List of sitekeys that can be used to activate the rule. |  | 
|  177 - Flags: `third-party`, `collapse`, `match-case`, etc. See [documentation][2] |  | 
|  178   for more information on their meaning. |  | 
|  179  |  | 
|  180 ### Other functions |  | 
|  181  |  | 
|  182 `abp.filters` module also exports two lower-level functions for parsing |  | 
|  183 individual lines of filter list or individual filters. Not very surprisingly |  | 
|  184 they are called `parse_line` and `parse_filter` respectively. Both will return |  | 
|  185 a parsed line object just like the items in the iterator returned by |  | 
|  186 `parse_filterlist`. The difference between them is that `parse_line` tries to |  | 
|  187 do line type detection and `parse_filter` will always try to interpret things |  | 
|  188 as a filter. Both functions will throw a `ParseError` exception instead of |  | 
|  189 returning a line with `type="invalid"`. |  | 
|  190  |  117  | 
|  191 ## Testing |  118 ## Testing | 
|  192  |  119  | 
|  193 Unit tests for `python-abp` are located in the `/tests` directory. |  120 Unit tests for `python-abp` are located in the `/tests` directory. | 
|  194 [Pytest][3] is used for quickly running the tests |  121 [Pytest][3] is used for quickly running the tests | 
|  195 during development. |  122 during development. | 
|  196 [Tox][4] is used for testing in different |  123 [Tox][4] is used for testing in different | 
|  197 environments (Python 2.7, 3.5, 3.6 and PyPy) and code quality |  124 environments (Python 2.7, Python 3.5+ and PyPy) and code quality | 
|  198 reporting. |  125 reporting. | 
|  199  |  126  | 
|  200 In order to execute the tests, first create and activate development |  127 In order to execute the tests, first create and activate development | 
|  201 virtualenv: |  128 virtualenv: | 
|  202  |  129  | 
|  203     $ python setup.py devenv |  130     $ python setup.py devenv | 
|  204     $ . devenv/bin/activate |  131     $ . devenv/bin/activate | 
|  205  |  132  | 
|  206 With the development virtualenv activated use pytest for a quick test run: |  133 With the development virtualenv activated use pytest for a quick test run: | 
|  207  |  134  | 
|  208     (devenv) $ pytest tests |  135     (devenv) $ pytest tests | 
|  209  |  136  | 
|  210 and tox for a comprehensive report: |  137 and tox for a comprehensive report: | 
|  211  |  138  | 
|  212     (devenv) $ tox |  139     (devenv) $ tox | 
|  213  |  140  | 
 |  141 ## Development | 
 |  142  | 
 |  143 When adding new functionality, add tests for it (preferably first). Code | 
 |  144 coverage (as measured by `tox -e qa`) should not decrease and the tests | 
 |  145 should pass in all Tox environments. | 
 |  146  | 
 |  147 All public functions, classes and methods should have docstrings compliant with | 
 |  148 [NumPy/SciPy documentation guide][5]. One exception is the constructors of | 
 |  149 classes that the user is not expected to instantiate (such as exceptions). | 
|  214  |  150  | 
|  215  [1]: https://adblockplus.org/filters#special-comments |  151  [1]: https://adblockplus.org/filters#special-comments | 
|  216  [2]: https://adblockplus.org/filters#options |  152  [2]: https://adblockplus.org/filters#options | 
|  217  [3]: http://pytest.org/ |  153  [3]: http://pytest.org/ | 
|  218  [4]: https://tox.readthedocs.org/ |  154  [4]: https://tox.readthedocs.org/ | 
 |  155  [5]: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt | 
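
The README text in this diff describes how an include target such as `easylist:easylist/easylist_general_block.txt` is resolved against a source registered with `-i easylist=/home/abc/easylist`. The following is a minimal sketch of that mapping, not `flrender`'s actual implementation. The helper names `parse_source_option` and `resolve_include` are hypothetical, and passing URL targets through unchanged is an assumption made only to keep the example self-contained.

```python
import posixpath


def parse_source_option(option):
    """Split a '-i' style value such as 'easylist=/home/abc/easylist'.

    Hypothetical helper: returns a (name, directory) tuple.
    """
    name, sep, directory = option.partition('=')
    if not sep or not name or not directory:
        raise ValueError('Expected NAME=DIRECTORY, got {!r}'.format(option))
    return name, directory


def resolve_include(target, sources):
    """Map an include target like 'name:path/in/source.txt' to a local path.

    Hypothetical helper: `sources` is a dict of source name -> directory.
    """
    # Assumption: URL targets are fetched elsewhere, so return them as-is.
    if target.startswith(('http://', 'https://')):
        return target
    # The source name is the part before ':' in the include instruction.
    name, sep, path = target.partition(':')
    if not sep:
        raise ValueError('Expected SOURCE:PATH, got {!r}'.format(target))
    if name not in sources:
        raise KeyError('Unknown source: {!r} when including {!r}'
                       .format(name, target))
    # posixpath keeps the result identical on every platform for this demo.
    return posixpath.join(sources[name], path)


sources = dict([parse_source_option('easylist=/home/abc/easylist')])
resolved = resolve_include('easylist:easylist/easylist_general_block.txt',
                           sources)
print(resolved)  # /home/abc/easylist/easylist/easylist_general_block.txt
```

The name before `=` in the option and the name before `:` in the include target must match, which is why an unregistered name raises an error in this sketch.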