Rietveld Code Review Tool
Side by Side Diff: sitescripts/crawler/README.md

Issue 8432110: sitescripts: Script to extract domain-specific filters (Closed)
Patch Set: Created Sept. 28, 2012, 2:32 p.m.
@@ -1,10 +1,10 @@
 crawler
 =======
 
 Backend for the Adblock Plus Crawler. It provides the following URLs:
 
 * */crawlableSites* - Return a list of sites to be crawled
 * */crawlerData* - Receive data on filtered elements
 
 Required packages
 -----------------
@@ -35,10 +35,24 @@
 ------------------------
 
 Make _filter\_list\_repository_ in the _crawler_ configuration section
 point to the local Mercurial repository of a filter list.
 
 Then execute the following:
 
     python -m sitescripts.crawler.bin.extract_sites > sites.sql
 
 Now you can execute the insert statements from _crawler.sql_.
+
+Extracting domain-specific filters
+--------------------------------
+
+Make _filter\_list\_repository_ in the _crawler_ configuration section
+point to the local Mercurial repository of a filter list.
+
+You also have to set _domain\_specific\_filter\_files_ to a comma
+separated list of files in the filter list repository that contain
+domain-specific rules.
+
+Then execute the following:
+
+    python -m sitescripts.crawler.bin.extract_filters > filters.sql
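
The two settings described in the new README section could be read roughly as follows. This is only a sketch, not the code in `extract_filters.py`: it assumes an INI-style configuration with a `[crawler]` section and uses Python 3's standard `configparser`. The helper name `read_crawler_settings`, the repository path, and the two file names are hypothetical and chosen purely for illustration.

```python
import configparser

# Hypothetical configuration; the path and file names are placeholders.
EXAMPLE_CONFIG = """
[crawler]
filter_list_repository = /path/to/filterlist
domain_specific_filter_files = specific_block.txt, specific_hide.txt
"""


def read_crawler_settings(config_text):
    """Return (repository_path, filter_files) from the [crawler] section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)

    repository = parser.get("crawler", "filter_list_repository")
    raw_files = parser.get("crawler", "domain_specific_filter_files")

    # Split the comma-separated value, trimming whitespace around each
    # entry and dropping empty entries (for example from a trailing comma).
    files = [name.strip() for name in raw_files.split(",") if name.strip()]
    return repository, files


if __name__ == "__main__":
    repo, files = read_crawler_settings(EXAMPLE_CONFIG)
    print(repo)
    print(files)
```

Trimming whitespace around each entry means a value written as `a.txt, b.txt` behaves the same as `a.txt,b.txt`. Whether the actual script is this lenient is not shown in this diff.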