crawler
=======

Backend for the Adblock Plus Crawler. It provides the following URLs:

* */crawlableSites* - Return a list of sites to be crawled
* */crawlerData* - Receive data on filtered elements
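
The following is a minimal sketch of how a crawler client might talk to
these two URLs. The host and port, the plain-text response format of
*/crawlableSites* and the JSON payload accepted by */crawlerData* are
all assumptions; check the handler code for the actual details.

    # Python 2, matching the sitescripts codebase.
    import json
    import urllib2

    BASE = "http://localhost:5000"  # assumed host and port

    # /crawlableSites - assumed to return one site per line.
    sites = urllib2.urlopen(BASE + "/crawlableSites").read().splitlines()

    # /crawlerData - assumed to accept a JSON payload describing the
    # filtered elements seen on a site (field names are hypothetical).
    payload = json.dumps({"site": sites[0], "filtered": ["#ad-banner"]})
    request = urllib2.Request(BASE + "/crawlerData", payload,
                              {"Content-Type": "application/json"})
    urllib2.urlopen(request)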

Required packages
-----------------

(...skipping 24 matching lines...)

Extracting crawlable sites
--------------------------

Make _filter\_list\_repository_ in the _crawler_ configuration section
point to the local Mercurial repository of a filter list.
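
In the sitescripts configuration file this could look as follows (the
repository path is hypothetical):

    [crawler]
    filter_list_repository = /home/user/repos/easylist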

Then execute the following:

    python -m sitescripts.crawler.bin.extract_sites > sites.sql

Now you can execute the insert statements from _sites.sql_.
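
How exactly depends on your database setup; with a MySQL database named
_crawler_ (name hypothetical), for example:

    mysql crawler < sites.sql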

Extracting domain-specific filters
----------------------------------

Make _filter\_list\_repository_ in the _crawler_ configuration section
point to the local Mercurial repository of a filter list.

You also have to set _domain\_specific\_filter\_files_ to a
comma-separated list of files in the filter list repository that
contain domain-specific rules.
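
Together with the previous setting, the configuration section could
then look like this (path and file names are hypothetical):

    [crawler]
    filter_list_repository = /home/user/repos/easylist
    domain_specific_filter_files = specific_block.txt, specific_hide.txt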

Then execute the following:

    python -m sitescripts.crawler.bin.extract_filters > filters.sql
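
As with _sites.sql_ above, the generated insert statements can then be
loaded into the database, for example:

    mysql crawler < filters.sql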