| Index: README.md |
| =================================================================== |
| new file mode 100644 |
| --- /dev/null |
| +++ b/README.md |
| @@ -0,0 +1,48 @@ |
| +abpcrawler |
| +========== |
| + |
| +This tool loads a list of websites in Firefox and records which requests are |
| +blocked by the [Adblock Plus extension](http://adblockplus.org). |
| + |
| +Requirements |
| +------------ |
| + |
| +* [Python 2.7](https://www.python.org) |
| +* [Jinja2 module](http://jinja.pocoo.org/docs) |
| +* [mozrunner module](https://pypi.python.org/pypi/mozrunner) |
| +* [Firefox](https://www.mozilla.org/en-US/firefox/) |
| + |
| +Running |
| +------- |
| + |
| +Execute the following: |
| + |
| +    ./run.py -b /usr/bin/firefox urls.txt outputdir |
| + |
| +This will run the specified Firefox binary to crawl the URLs from `urls.txt` |
| +(one URL per line). The resulting data and screenshots will be written to the |
| +`outputdir` directory. Firefox will close automatically once all URLs have been |
| +processed. |
| + |
| +The complete list of command line flags: |
| + |
| +    -h, --help            show help message and exit |
| +    -b BINARY, --binary BINARY |
| +                          path to the Firefox binary |
| +    -a ABPDIR, --abpdir ABPDIR |
| +                          path to the Adblock Plus repository |
| +    -f url [url ...], --filters url [url ...] |
| +                          filter lists to install in Adblock Plus. An argument |
| +                          can also have the format path=url, in which case the |
| +                          data is read from the specified path. |
| +    -t TIMEOUT, --timeout TIMEOUT |
| +                          load timeout (seconds) |
| +    -x MAXTABS, --maxtabs MAXTABS |
| +                          maximum number of tabs to open in parallel |
| + |
| +License |
| +------- |
| + |
| +This Source Code is subject to the terms of the Mozilla Public License |
| +version 2.0 (the "License"). You can obtain a copy of the License at |
| +http://mozilla.org/MPL/2.0/. |