README.md - Issue 29465720: Issue 4970 - Document the library API of python-abp

Unified Diff: README.md

Issue 29465720: Issue 4970 - Document the library API of python-abp (Closed)

Patch Set: Update README to match the changes from https://codereview.adblockplus.org/29465715/ Created Aug. 7, 2017, 8:28 p.m.

Index: README.md

===================================================================

--- a/README.md

+++ b/README.md

@@ -1,20 +1,21 @@

# python-abp

-This repository contains the script that is used for building Adblock Plus

-filter lists from the form in which they are authored into the format suitable

-for consumption by the adblocking software.

+This repository contains a library for working with Adblock Plus filter lists

+and the script that is used for building Adblock Plus filter lists from the

+form in which they are authored into the format suitable for consumption by the

+adblocking software.

mathias 2017/08/08 12:24:35 For an introduction that is a bit too much. How ab

## Installation

Prerequisites:

* Linux, Mac OS X or Windows (any modern Unix should work too),

-* Python (2.7 or 3.5),

+* Python (2.7 or 3.5+),

* pip.

To install:

$ pip install -U python-abp

## Rendering of filter lists

@@ -23,30 +24,30 @@

for particular geographical area.

We call these parts _filter list fragments_ (or just _fragments_)

to distinguish them from full filter lists that are

consumed by the adblocking software such as Adblock Plus.

Rendering is a process that combines filter list fragments into a filter list.

It starts with one fragment that can include other ones and so forth.

The produced filter list is marked with a version, a timestamp and

-a [checksum](https://adblockplus.org/filters#special-comments).

+a [checksum][1].

Python-abp contains a script that can do this called `flrender`:

$ flrender fragment.txt filterlist.txt

This will take the top level fragment in `fragment.txt`, render it and save into

`filterlist.txt`.

Fragments might reference other fragments that should be included into them.

The references come in two forms: http(s) includes and local includes:

%include http://www.server.org/dir/list.txt%

- %include easylist:easylist/easylist_general_block.txt

+ %include easylist:easylist/easylist_general_block.txt%

The first instruction contains a URL that will be fetched and inserted at the

point of reference.

The second one contains a path inside the easylist repository.

`flrender` needs to be able to find a copy of the repository on the local

filesystem. We use the `-i` option to point it to the right directory:

$ flrender -i easylist=/home/abc/easylist input.txt output.txt

@@ -70,30 +71,150 @@

$ flrender easylist.txt output/easylist.txt

Unknown source: 'easylist' when including 'easylist:easylist/easylist_general_block.txt' from 'easylist.txt'

You can clone the necessary repositories to a local directory and add `-i`

options accordingly.

+## Library API

+Python-abp can also be used as a library for parsing filter lists. For example,

+to read a filter list (we use Python 3 syntax here but the API is the same):

+ from abp.filters import parse_filterlist

+ with open('filterlist.txt') as filterlist:

+     for line in parse_filterlist(filterlist):

+         print(line)

+If `filterlist.txt` contains a filter list:

+ [Adblock Plus 2.0]

+ ! Title: Example list

+ abc.com,cdf.com##div#ad1

+ abc.com/ad$image

+ @@/abc\.com/

+ ...

+the output will look similar to the following:

+ Header(version='Adblock Plus 2.0')

+ Metadata(key='Title', value='Example list')

+ EmptyLine()

+ Filter(text='abc.com,cdf.com##div#ad1', selector={'type': 'css', 'value': 'div#ad1'}, action='hide', options=[('domain', [('abc.com', True), ('cdf.com', True)])])

+ Filter(text='abc.com/ad$image', selector={'type': 'url-pattern', 'value': 'abc.com/ad'}, action='block', options=[('image', True)])

+ Filter(text='@@/abc\\.com/', selector={'type': 'url-regexp', 'value': 'abc\\.com'}, action='allow', options=[])

+ ...

+In general `parse_filterlist` takes an iterable of strings (such as a list or

+an open file) and returns an iterable of parsed filter list lines. Each line

+will have its `.type` attribute set to a string indicating its type. It will

+also have a `.to_string()` method that converts it to a unicode string in the

+filter list format (most of the time it's the same as the string from which the

+filter was parsed). Further attributes depend on the type of the line.

+**Note:** `parse_filterlist` returns an iterator, not a list, and only consumes

+the input lines when its output is iterated over. This allows much more

+memory-efficient handling of large filter lists, but there are two things to

+watch out for:

+**Note:** iteration over parsed lines may throw a `ParseError` exception if a

+line cannot be parsed. The exception will contain the information about the

+error and the original line that failed parsing.

mathias 2017/08/08 12:24:35 It is not clear what bits this is about (I assume

Vasily Kuznetsov 2017/08/08 14:31:12 Yeah, we've discussed this. But for now that chang

+- When you're parsing filters from a file, you need to complete the iteration

+ before you close the file.

+- Once you have iterated over the output of `parse_filterlist`, it will be

+ consumed and you won't be able to iterate over it again.

+If you find that this is bothering you, you probably want to convert the output

mathias 2017/08/08 12:24:34 Everything in this section from here on, maybe inc

+of `parse_filterlist` to a list:

+ lines_list = list(parse_filterlist(filterlist))

+This will load the whole file into memory but unless you're dealing with a

+gigantic filter list that should not be a problem.
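The two pitfalls described in the note can be reproduced without the library. The sketch below is not part of the patch: `lazy_parse` is a hypothetical stand-in for `parse_filterlist` that only assumes the same lazy, single-pass behaviour, and `io.StringIO` plays the role of the open file.

```python
import io


def lazy_parse(lines):
    """Hypothetical stand-in for parse_filterlist: yields lines lazily."""
    for line in lines:
        yield line.strip()


# Pitfall 2: a lazy iterator can only be consumed once.
source = io.StringIO("[Adblock Plus 2.0]\n! Title: Example list\nabc.com/ad$image\n")
parsed = lazy_parse(source)
first_pass = list(parsed)   # reads all three lines
second_pass = list(parsed)  # nothing left to read

# Pitfall 1: the input must stay open until iteration is complete.
closed_source = io.StringIO("abc.com/ad$image\n")
pending = lazy_parse(closed_source)
closed_source.close()
try:
    list(pending)
    closed_error = None
except ValueError as error:
    # Reading from a closed file-like object raises ValueError.
    closed_error = error
```

Calling `list(...)` while the input is still open, as the README suggests, avoids both problems at the cost of holding every parsed line in memory.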

+### Line types

+As mentioned above, lines of different types have different attributes:

+| type | attributes |

mathias 2017/08/08 12:24:35 Are you sure this kind of table markup is supporte

Vasily Kuznetsov 2017/08/08 14:31:12 Indeed the table markup was not part of the origin

+|------------|------------------------------------------------------------------------|

+| header | `version` - plugin version string |

+| emptyline | no attributes |

+| comment | `text` - text of the comment |

+| metadata | `key` - name of the metadata field, `value` - value of the field |

+| include | `target` - url/path of the file to include |

+| filter | `text` - text of the filter, `selector` - what to look for, `action` - what to do with selected items, `options` - filter options |

+#### Filter attributes

mathias 2017/08/08 12:24:35 This section mentions "Selector" but not ".selecto

+Selector is a dictionary with two keys:

+| key | meaning |

+|--------------|------------------------------------------------------------------|

+| type | 'css', 'abp-simple', 'url-pattern', 'url-regexp', 'extended-css' |

+| value | the selector itself, the meaning is type-dependent |

+It's preferable to import the `SELECTOR_TYPE` namespace from `abp.filters` to

+refer to selector types instead of using strings. `SELECTOR_TYPE` contains

+constants for each selector type: `SELECTOR_TYPE.CSS`, `SELECTOR_TYPE.ABP_SIMPLE`,

+`SELECTOR_TYPE.URL_PATTERN`, `SELECTOR_TYPE.URL_REGEXP` and

+`SELECTOR_TYPE.XCSS`.

+Action instructs adblocking software on what should be done with the items

+matching the selector:

+| action | meaning |

+|--------|------------------------------------------------------------------------|

+| block | block http(s) request that matches the selector |

+| allow | allow http(s) request that matches the selector (whitelist the resource) |

+| hide | hide the DOM element that matches the selector |

+| show | show the DOM element that matches the selector (whitelist the element) |

+The action constants are contained in the `FILTER_ACTION` namespace, which can also

+be imported from `abp.filters` (`FILTER_ACTION.BLOCK`, `FILTER_ACTION.ALLOW`,

+etc.)
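To illustrate how the `selector` and `action` attributes might be consumed, here is a minimal sketch that is not part of the patch. It does not import the library: `Filter` is a hypothetical namedtuple stand-in whose fields mirror the sample output shown earlier, and plain strings are used where real code would preferably use the `SELECTOR_TYPE` and `FILTER_ACTION` constants.

```python
from collections import namedtuple

# Hypothetical stand-in for parsed filter lines; real objects would come
# from abp.filters.parse_filterlist.
Filter = namedtuple('Filter', 'text selector action options')

filters = [
    Filter('abc.com,cdf.com##div#ad1',
           {'type': 'css', 'value': 'div#ad1'}, 'hide',
           [('domain', [('abc.com', True), ('cdf.com', True)])]),
    Filter('abc.com/ad$image',
           {'type': 'url-pattern', 'value': 'abc.com/ad'}, 'block',
           [('image', True)]),
    Filter('@@/abc\\.com/',
           {'type': 'url-regexp', 'value': 'abc\\.com'}, 'allow', []),
]


def describe(filter_line):
    """Summarise a filter as '<action> <selector type>: <selector value>'."""
    selector = filter_line.selector
    return '{} {}: {}'.format(
        filter_line.action, selector['type'], selector['value'])


# 'block' and 'allow' apply to requests; 'hide' and 'show' apply to elements.
request_filters = [f for f in filters if f.action in ('block', 'allow')]
summaries = [describe(f) for f in filters]
```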

+Options is a list of tuples consisting of option name and option value. The

+option value is `True` or `False` for flags or, for options with a value, it's

+a string, list of strings or a list of `(string, boolean)` tuples. See

+[documentation on authoring the filter rules][2] for the list of existing

+options and their meanings.
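Because options are a list of `(name, value)` tuples, they convert directly to a dictionary for lookups. The sketch below is not part of the patch and uses a literal list assembled from the option values in the sample output above. Reading the boolean in each `domain` tuple as "the filter applies on this domain" is an assumption, not something the README states.

```python
# Illustrative options list, combining values from the sample output.
options = [
    ('domain', [('abc.com', True), ('cdf.com', True)]),
    ('image', True),
]

# A list of (name, value) tuples converts directly to a dict.
options_by_name = dict(options)

# Flags are plain booleans; a missing flag is treated as not set here.
is_image_filter = options_by_name.get('image', False)

# Assumption: True marks a domain the filter applies to.
included_domains = [
    name for name, included in options_by_name.get('domain', []) if included
]
```

Converting to a dictionary discards the original order of the options and would collapse any repeated option name, so code that needs to reproduce the filter text should keep the list.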

+### Other functions

+The `abp.filters` module also exports a lower-level function for parsing individual

+lines of a filter list: `parse_line`. It returns a parsed line object just like

+the items in the iterator returned by `parse_filterlist`.

## Testing

Unit tests for `python-abp` are located in the `/tests` directory.

-[Pytest](http://pytest.org/) is used for quickly running the tests

+[Pytest][3] is used for quickly running the tests

during development.

-[Tox](https://tox.readthedocs.org/) is used for testing in different

-environments (Python 2.7, Python 3.5 and PyPy) and code quality

+[Tox][4] is used for testing in different

+environments (Python 2.7, Python 3.5+ and PyPy) and code quality

reporting.

In order to execute the tests, first create and activate development

virtualenv:

$ python setup.py devenv

$ . devenv/bin/activate

With the development virtualenv activated use pytest for a quick test run:

- (devenv) $ py.test tests

+ (devenv) $ pytest tests

and tox for a comprehensive report:

(devenv) $ tox

+ [1]: https://adblockplus.org/filters#special-comments

+ [2]: https://adblockplus.org/filters#options

+ [3]: http://pytest.org/

+ [4]: https://tox.readthedocs.org/
