Unified Diff: sitescripts/crawler/README.md

Issue 8432110: sitescripts: Script to extract domain-specific filters (Closed)
Patch Set: Created Sept. 28, 2012, 2:32 p.m.
Index: sitescripts/crawler/README.md
===================================================================
--- a/sitescripts/crawler/README.md
+++ b/sitescripts/crawler/README.md
@@ -42,3 +42,17 @@
python -m sitescripts.crawler.bin.extract_sites > sites.sql
Now you can execute the insert statements from _crawler.sql_.
+
+Extracting domain-specific filters
+--------------------------------
+
+Make _filter\_list\_repository_ in the _crawler_ configuration section
+point to the local Mercurial repository of a filter list.
+
+You also have to set _domain\_specific\_filter\_files_ to a
+comma-separated list of files in the filter list repository that
+contain domain-specific rules.
+
+Then execute the following:
+
+    python -m sitescripts.crawler.bin.extract_filters > filters.sql
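
For illustration, the two settings described in the added README section could look like the sample below. This is a minimal sketch, assuming the configuration is an INI-style file with a `crawler` section and that Python 3's `configparser` is available. The repository path and filter file names are hypothetical placeholders, and `read_crawler_settings` is an illustrative helper, not part of the patch; the actual `extract_filters` script may read its configuration differently.

```python
import configparser

# Hypothetical configuration text; the path and file names are placeholders.
SAMPLE_CONFIG = """
[crawler]
filter_list_repository = /path/to/filter-list-repo
domain_specific_filter_files = specific_block.txt, specific_hide.txt
"""


def read_crawler_settings(config_text):
    """Return (repository_path, filter_files) from the crawler section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["crawler"]
    repository = section["filter_list_repository"]
    # Split the comma-separated value, trimming whitespace and
    # dropping empty entries such as those left by a trailing comma.
    files = [
        name.strip()
        for name in section["domain_specific_filter_files"].split(",")
        if name.strip()
    ]
    return repository, files


repository, files = read_crawler_settings(SAMPLE_CONFIG)
print(repository)
print(files)
```

Trimming whitespace around each entry means a value written as `a.txt, b.txt` and one written as `a.txt,b.txt` yield the same list of files.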