Description
In order to collect all unmatched domain-specific filters, I import all domain-specific filters from EasyList and associate them with requests sent by the crawler.
As discussed, we will probably import all filters, not just domain-specific ones, from compiled lists in the future. However, since we've decided to put the crawler on hold for now, I think we should leave that part as it is and take it from there in a few months.
The part where I actually figure out which domain-specific filters are unused is still missing. But we should be able to extract that from the database, since all the data is there.
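For illustration, the missing step could be a single anti-join over the imported data. This is only a minimal sketch: the table and column names (`filters`, `request_filters`, `filter_id`) are hypothetical and not taken from this patch, and it uses an in-memory SQLite database as a stand-in for whatever database the crawler actually writes to.

```python
import sqlite3


def unused_filters(conn):
    """Return the text of every filter that no request was associated with."""
    # LEFT JOIN keeps every filter; rows without a matching request have
    # NULL in the joined columns, which is what marks a filter as unused.
    query = """
        SELECT f.text
        FROM filters AS f
        LEFT JOIN request_filters AS rf ON rf.filter_id = f.id
        WHERE rf.filter_id IS NULL
        ORDER BY f.text
    """
    return [row[0] for row in conn.execute(query)]


# Hypothetical schema and sample data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE filters (
        id INTEGER PRIMARY KEY,
        text TEXT NOT NULL
    );
    CREATE TABLE request_filters (
        request_id INTEGER NOT NULL,
        filter_id INTEGER NOT NULL REFERENCES filters(id)
    );
    INSERT INTO filters (id, text) VALUES
        (1, '/ads/*$domain=example.com'),
        (2, '/banner.$domain=example.org'),
        (3, '/track.js$domain=example.net');
    INSERT INTO request_filters (request_id, filter_id) VALUES
        (100, 1), (101, 1), (102, 3);
""")

print(unused_filters(conn))
```

With the sample data above, only the second filter has no associated request, so it is the only one reported as unused.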
I also used the opportunity to turn the extract_crawler_sites script into one that doesn't directly import the data into the database (import_sites) and made a few improvements.