| OLD | NEW |
| --- | --- |
| 1 crawler | 1 crawler |
| 2 ======= | 2 ======= |
| 3 | 3 |
| 4 Backend for the Adblock Plus Crawler. It provides the following URLs: | 4 Backend for the Adblock Plus Crawler. It provides the following URLs: |
| 5 | 5 |
| 6 * */crawlableSites* - Return a list of sites to be crawled | 6 * */crawlableSites* - Return a list of sites to be crawled |
| 7 * */crawlerData* - Receive data on filtered elements | 7 * */crawlerRequests* - Receive all requests made, and whether they were filtered |
| 8 | 8 |
| 9 Required packages | 9 Required packages |
| 10 ----------------- | 10 ----------------- |
| 11 | 11 |
| 12 * [simplejson](http://pypi.python.org/pypi/simplejson/) | 12 * [simplejson](http://pypi.python.org/pypi/simplejson/) |
| 13 | 13 |
| 14 Database setup | 14 Database setup |
| 15 -------------- | 15 -------------- |
| 16 | 16 |
| 17 Just execute the statements in _schema.sql_. | 17 Just execute the statements in _schema.sql_. |
| 18 | 18 |
| 19 Configuration | 19 Configuration |
| 20 ------------- | 20 ------------- |
| 21 | 21 |
| 22 Just add an empty _crawler_ section to _/etc/sitescripts_ or _.sitescripts_. | 22 Just add an empty _crawler_ section to _/etc/sitescripts_ or _.sitescripts_. |
| 23 | 23 |
| 24 If you want to import crawlable sites from easylist (see below), you |
| 25 need to make _easylist\_repository_ point to the local Mercurial |
| 26 repository of easylist. |
| 27 |
| 24 Also make sure that the following keys are configured in the _DEFAULT_ | 28 Also make sure that the following keys are configured in the _DEFAULT_ |
| 25 section: | 29 section: |
| 26 | 30 |
| 27 * _database_ | 31 * _database_ |
| 28 * _dbuser_ | 32 * _dbuser_ |
| 29 * _dbpassword_ | 33 * _dbpassword_ |
| 30 * _basic\_auth\_realm_ | 34 * _basic\_auth\_realm_ |
| 31 * _basic\_auth\_username_ | 35 * _basic\_auth\_username_ |
| 32 * _basic\_auth\_password_ | 36 * _basic\_auth\_password_ |
| 33 | 37 |
| 34 Extracting crawler sites | 38 Importing crawlable sites from easylist |
| 35 ------------------------ | 39 --------------------------------------- |
| 36 | 40 |
| 37 Make _filter\_list\_repository_ in the _crawler_ configuration section | 41 python -m sitescripts.crawler.bin.import_sites |
| 38 point to the local Mercurial repository of a filter list. | |
| 39 | |
| 40 Then execute the following: | |
| 41 | |
| 42 python -m sitescripts.crawler.bin.extract_sites > sites.sql | |
| 43 | |
| 44 Now you can execute the insert statements from _crawler.sql_. | |
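The configuration described in this README can be sketched as an INI file. This is a minimal sketch that assumes _/etc/sitescripts_ is read with Python's `configparser` (sitescripts may use Python 2's `ConfigParser`, which has the same INI syntax). In that format, keys in the `DEFAULT` section are visible from every other section. All values are placeholders. Placing _easylist\_repository_ in the _crawler_ section is an assumption, and `check_crawler_config` is a hypothetical helper that is not part of sitescripts.

```python
import configparser

# Keys the README says must be configured in the DEFAULT section.
REQUIRED_DEFAULT_KEYS = (
    "database",
    "dbuser",
    "dbpassword",
    "basic_auth_realm",
    "basic_auth_username",
    "basic_auth_password",
)

# Hypothetical contents of /etc/sitescripts or .sitescripts; every value is a placeholder.
SAMPLE_CONFIG = """
[DEFAULT]
database = crawler
dbuser = crawler
dbpassword = secret
basic_auth_realm = Crawler
basic_auth_username = crawler
basic_auth_password = secret

[crawler]
# Assumed location; only needed when importing crawlable sites from easylist.
easylist_repository = /path/to/easylist
"""


def check_crawler_config(text):
    """Return a list of problems in a sitescripts-style config (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []
    # The README asks for a crawler section to exist.
    if not parser.has_section("crawler"):
        problems.append("missing [crawler] section")
    # defaults() returns the key/value pairs of the DEFAULT section.
    defaults = parser.defaults()
    for key in REQUIRED_DEFAULT_KEYS:
        if key not in defaults:
            problems.append("missing DEFAULT key: " + key)
    return problems


if __name__ == "__main__":
    print(check_crawler_config(SAMPLE_CONFIG))
```

An empty `[crawler]` section with no `DEFAULT` keys would be reported as missing all six required keys.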
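The _basic\_auth\_realm_, _basic\_auth\_username_ and _basic\_auth\_password_ keys suggest that the backend URLs are protected by HTTP basic authentication. Assuming that, a client request to _/crawlableSites_ could be built with the standard library as sketched below. `BASE_URL` and `build_request` are hypothetical, and this README does not describe the response format. The sketch therefore only builds the request and does not parse a reply.

```python
import base64
import urllib.request

# Hypothetical base URL; the README does not say where the backend is deployed.
BASE_URL = "https://crawler.example.com"


def build_request(path, username, password, data=None):
    """Build a request to a crawler backend URL with an HTTP basic auth header (assumed scheme)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    # urllib sends GET when data is None and POST otherwise.
    request = urllib.request.Request(BASE_URL + path, data=data)
    request.add_header("Authorization", "Basic " + token)
    return request


if __name__ == "__main__":
    req = build_request("/crawlableSites", "crawler", "secret")
    print(req.full_url, req.get_method())
    # To send it: urllib.request.urlopen(req).read()
```

Passing `data` makes `urllib` issue a POST. Using POST for _/crawlerRequests_ is an assumption, and the README does not document that URL's payload format.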