| Index: sitescripts/crawler/README.md |
| =================================================================== |
| --- a/sitescripts/crawler/README.md |
| +++ b/sitescripts/crawler/README.md |
| @@ -42,3 +42,17 @@ |
| python -m sitescripts.crawler.bin.extract_sites > sites.sql |
| Now you can execute the insert statements from _crawler.sql_. |
| + |
| +Extracting domain-specific filters |
| +---------------------------------- |
| + |
| +Make _filter\_list\_repository_ in the _crawler_ configuration section |
| +point to the local Mercurial repository of a filter list. |
| + |
| +You also have to set _domain\_specific\_filter\_files_ to a |
| +comma-separated list of files in the filter list repository that |
| +contain domain-specific rules. |
| + |
| +Then execute the following: |
| + |
| + python -m sitescripts.crawler.bin.extract_filters > filters.sql |