OLD | NEW |
(Empty) | |
| 1 abpcrawler |
| 2 ========== |
| 3 |
| 4 This tool loads a range of websites in Firefox and records which requests are |
| 5 blocked by the [Adblock Plus extension](http://adblockplus.org). |
| 6 |
| 7 Requirements |
| 8 ------------ |
| 9 |
| 10 * [Python 2.7](https://www.python.org) |
| 11 * [The Jinja2 module](http://jinja.pocoo.org/docs) |
| 12 * [The mozrunner module](https://pypi.python.org/pypi/mozrunner) |
| 13 * [Firefox](https://www.mozilla.org/en-US/firefox/) |
| 14 |
| 15 Running |
| 16 ------- |
| 17 |
| 18 Execute the following: |
| 19 |
| 20 ./run.py -b /usr/bin/firefox urls.txt outputdir |
| 21 |
| 22 This will run the specified Firefox binary to crawl the URLs from `urls.txt` |
| 23 (one URL per line). The resulting data and screenshots will be written to the |
| 24 `outputdir` directory. Firefox will close automatically once all URLs have been |
| 25 processed. |
| 26 |
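The input file format described above (one URL per line) can be produced with a few lines of Python. The following is a minimal sketch, not part of abpcrawler itself: the helper name `write_url_list` and the example URLs are assumptions made for illustration, and skipping blank or duplicate entries is a convenience of this sketch rather than a documented requirement of the crawler.

```python
import io


def write_url_list(urls, path):
    """Write URLs to `path`, one per line (hypothetical helper).

    Blank entries and exact duplicates are skipped; this is a choice made
    in this sketch, not something the crawler is known to require.
    """
    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            kept.append(url)
    # io.open behaves the same way on Python 2.7 and Python 3.
    with io.open(path, "w", encoding="utf-8") as handle:
        for url in kept:
            handle.write(url + u"\n")
    return kept


if __name__ == "__main__":
    # Placeholder URLs; replace them with the sites to crawl.
    write_url_list(["https://example.com/", "https://example.org/"], "urls.txt")
```

The resulting `urls.txt` can then be passed to `run.py` as shown in the command above.
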
| 27 The complete list of command line flags: |
| 28 |
| 29 -h, --help show help message and exit |
| 30 -b BINARY, --binary BINARY |
| 31 path to the Firefox binary |
| 32 -a ABPDIR, --abpdir ABPDIR |
| 33 path to the Adblock Plus repository |
| 34 -f url [url ...], --filters url [url ...] |
| 35 filter lists to install in Adblock Plus. The arguments |
| 36 can also have the format path=url, the data will be |
| 37 read from the specified path then. |
| 38 -t TIMEOUT, --timeout TIMEOUT |
| 39 Load timeout (seconds) |
| 40 -x MAXTABS, --maxtabs MAXTABS |
| 41 Maximal number of tabs to open in parallel |
| 42 |
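When several of the flags listed above are combined, the order of arguments can matter: `-f` accepts more than one value, so it should be followed by another flag before the positional arguments start. The sketch below assembles such a command line. The helper name `build_crawl_command`, its parameters, and the filter list URL are hypothetical, and the ordering concern assumes that `run.py` parses its arguments in the usual argparse style, which the help output suggests but does not confirm.

```python
def build_crawl_command(binary, url_file, output_dir, filters=None,
                        timeout=None, maxtabs=None, abpdir=None,
                        script="./run.py"):
    """Return the argument list for one crawler run (hypothetical helper)."""
    cmd = [script]
    if abpdir is not None:
        cmd += ["-a", abpdir]
    if filters:
        cmd += ["-f"] + list(filters)
    if timeout is not None:
        cmd += ["-t", str(timeout)]
    if maxtabs is not None:
        cmd += ["-x", str(maxtabs)]
    # -b is emitted last among the options so that the multi-value -f list
    # is always closed by another flag before the positional arguments.
    cmd += ["-b", binary, url_file, output_dir]
    return cmd


if __name__ == "__main__":
    command = build_crawl_command(
        "/usr/bin/firefox", "urls.txt", "outputdir",
        filters=["https://example.com/filters.txt"],  # placeholder URL
        timeout=30, maxtabs=5,
    )
    # The list can be handed to subprocess.call(command) to start a crawl.
    print(" ".join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.
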
| 43 License |
| 44 ------- |
| 45 |
| 46 This Source Code is subject to the terms of the Mozilla Public License |
| 47 version 2.0 (the "License"). You can obtain a copy of the License at |
| 48 http://mozilla.org/MPL/2.0/. |