OLD | NEW |
1 abpcrawler | 1 abpcrawler |
2 ========== | 2 ========== |
3 | 3 |
4 Firefox extension that loads a range of websites and records which | 4 Firefox extension that loads a range of websites and records which |
5 elements are filtered by [Adblock Plus](http://adblockplus.org). | 5 elements are filtered by [Adblock Plus](http://adblockplus.org). |
6 | 6 |
7 Requirements | 7 Requirements |
8 ------------ | 8 ------------ |
9 | 9 |
10 * [Mercurial](https://www.mercurial-scm.org/) or [Git](https://git-scm.com/) (whichever you used to clone this repository) | 10 * [Mercurial](https://www.mercurial-scm.org/) or [Git](https://git-scm.com/) (whichever you used to clone this repository) |
11 * [Python 2.x](https://www.python.org) | 11 * [Python 2.x](https://www.python.org) |
12 * [The Jinja2 module](http://jinja.pocoo.org/docs) | 12 * [The Jinja2 module](http://jinja.pocoo.org/docs) |
13 * [mozrunner module](https://pypi.python.org/pypi/mozrunner) | 13 * [mozrunner module](https://pypi.python.org/pypi/mozrunner) |
14 | 14 |
15 Running | 15 Running |
16 ------- | 16 ------- |
17 | 17 |
18 Execute the following: | 18 Execute the following: |
19 | 19 |
20 ./run.py -b /usr/bin/firefox urls.txt outputdir | 20 ./run.py -b /usr/bin/firefox -l urls.txt -o outputdir |
21 | 21 |
22 This will run the specified Firefox binary to crawl the URLs from `urls.txt` | 22 This will run the specified Firefox binary to crawl the URLs from `urls.txt` |
23 (one URL per line). The resulting data and screenshots will be written to the | 23 (one URL per line). The resulting data and screenshots will be written to the |
24 `outputdir` directory. Firefox will close automatically once all URLs have been | 24 `outputdir` directory. Firefox will close automatically once all URLs have been |
25 processed. | 25 processed. |
26 | 26 |
| 27 In addition, parameters can be loaded from a configuration file. Pass |
| 28 `-c`/`--config` with the path to a JSON file whose property names are the long |
| 29 command line parameter names; see `config.json.example` for an example. |
| 30 Command line parameters passed alongside a config file take precedence. |
| 31 Example: |
| 32 ./run.py -c config.json.example -l example.urls |
| 33 |
| 34 This is equivalent to: |
| 35 ./run.py -b /usr/bin/firefox -l example.urls -o ../output |
| 36 |
27 Optionally, you can provide the path to the Adblock Plus repository - Adblock | 37 Optionally, you can provide the path to the Adblock Plus repository - Adblock |
28 Plus will no longer be downloaded then. | 38 Plus will no longer be downloaded then. |
29 | 39 |
30 License | 40 License |
31 ------- | 41 ------- |
32 | 42 |
33 This Source Code is subject to the terms of the Mozilla Public License | 43 This Source Code is subject to the terms of the Mozilla Public License |
34 version 2.0 (the "License"). You can obtain a copy of the License at | 44 version 2.0 (the "License"). You can obtain a copy of the License at |
35 http://mozilla.org/MPL/2.0/. | 45 http://mozilla.org/MPL/2.0/. |
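The precedence rule described in the new README text (command line parameters override values from the config file) can be illustrated with a small sketch. This is not the code from `run.py`. The long option names `--binary`, `--list` and `--outdir` are assumptions made for illustration, since the README only shows the short forms `-b`, `-l`, `-o` and the `-c`/`--config` pair.

```python
import argparse
import json


def build_parser():
    # Hypothetical parser: the short options mirror the README, but the
    # long names (--binary, --list, --outdir) are assumed, not confirmed.
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--config')
    parser.add_argument('-b', '--binary')
    parser.add_argument('-l', '--list')
    parser.add_argument('-o', '--outdir')
    return parser


def merge_settings(file_settings, cli_settings):
    # Start from the config file values, then let every option that was
    # actually given on the command line (not None) override them.
    merged = dict(file_settings)
    for name, value in cli_settings.items():
        if value is not None:
            merged[name] = value
    return merged


def load_settings(argv):
    cli_settings = vars(build_parser().parse_args(argv))
    config_path = cli_settings.pop('config')
    file_settings = {}
    if config_path is not None:
        # Property names in the JSON file are expected to match the long
        # option names, as the README text describes.
        with open(config_path) as handle:
            file_settings = json.load(handle)
    return merge_settings(file_settings, cli_settings)
```

Under these assumptions, calling `load_settings(['-c', 'config.json', '-l', 'example.urls'])` with a config file that sets `binary` and `outdir` would yield the same settings as passing `-b`, `-l` and `-o` explicitly, which matches the equivalence the README example states.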