Rietveld Code Review Tool

Issue 8432110: sitescripts: Script to extract domain-specific filters (Closed)

Created:
Sept. 28, 2012, 2:32 p.m. by Felix Dahlke
Modified:
Sept. 28, 2012, 7:20 p.m.
Reviewers:
Wladimir Palant
Visibility:
Public.

Description

I need this for the crawler, which should check which domain-specific filters are no longer used. This has already been pushed to default by accident; I actually wanted to push it to a named branch.

Patch Set 1 #

Stats: +77 lines, -0 lines
M sitescripts/crawler/README.md (1 chunk, +14 lines, -0 lines, 0 comments)
A sitescripts/crawler/bin/extract_filters.py (1 chunk, +63 lines, -0 lines, 0 comments)

Messages

Total messages: 4
Felix Dahlke
Sept. 28, 2012, 2:36 p.m. (2012-09-28 14:36:39 UTC) #1
Wladimir Palant
I don't really understand the purpose of the script, you will have to explain it ...
Sept. 28, 2012, 4:53 p.m. (2012-09-28 16:53:35 UTC) #2
Felix Dahlke
On 2012/09/28 16:53:35, Wladimir Palant wrote: > I don't really understand the purpose of the ...
Sept. 28, 2012, 7:19 p.m. (2012-09-28 19:19:31 UTC) #3
Felix Dahlke
Sept. 28, 2012, 7:20 p.m. (2012-09-28 19:20:22 UTC) #4
On 2012/09/28 19:19:31, Felix H. Dahlke wrote:
> On 2012/09/28 16:53:35, Wladimir Palant wrote:
> > I don't really understand the purpose of the script, you will have to explain
> > it to me before I can review it. My understanding is that the crawler needs to
> > record *all* filter matches. If somebody wants to filter the resulting data to
> > site-specific filters only - fine. But we should still have all the data.
>
> Sorry, this is really lacking context. I wasn't planning to submit this part
> for review in isolation.
>
> I'm indeed working on having the crawler send all matched filters and the
> sites on which they matched to the backend. However, those changes aren't
> ready for review yet and I didn't accidentally push them like this one :)
>
> Now for the purpose of this script: I've discussed with Arthur that he wants
> to know which domain-specific filters aren't used on sites from that domain -
> these are apparently very likely obsolete. That's why I need to gather all the
> domain-specific filters and the domains on which they are supposed to match. I
> can then figure out which filters that should match on a specific domain
> didn't.
>
> But please don't review this just yet. I've thought some more about it and I'm
> going to approach it a bit differently. I will normalize the data and think
> about how to assign sites to domains. I will also rewrite this script (and
> extract_sites.py) to work directly on the database as you suggested.

In fact, I'll just close this review and open a new one once I'm done. In a
proper branch this time. Feel free to send me an email if you think my approach
isn't good.
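
For readers without access to the patch, the idea described in the quoted message (collect domain-specific filters together with their domains, then report which ones never matched there) could be sketched roughly as follows. This is not the extract_filters.py under review. It assumes Adblock Plus-style filter syntax (`domain##selector` for element hiding, `$domain=a.com|b.com` for blocking filters), and the function names and the `(filter, domain)` match format are hypothetical.

```python
import re

# Assumption: element hiding filters look like "example.com,example.org##.ad",
# with "#@#" marking an element hiding exception.
ELEMHIDE_RE = re.compile(r"^([^#]*)#@?#")


def extract_domains(filter_text):
    """Return the set of domains a filter is restricted to (empty if generic)."""
    text = filter_text.strip()
    if not text or text.startswith("!") or text.startswith("["):
        return set()  # blank line, comment, or list header

    domains = []
    match = ELEMHIDE_RE.match(text)
    if match:
        domains = match.group(1).split(",")
    elif "$" in text:
        # Blocking filter: look for a "domain=" entry among the options.
        for option in text.rsplit("$", 1)[1].split(","):
            if option.startswith("domain="):
                domains = option[len("domain="):].split("|")

    # Skip empty entries and negated domains ("~example.com"), which exclude
    # a domain rather than restrict the filter to it.
    cleaned = (d.strip().lower() for d in domains)
    return {d for d in cleaned if d and not d.startswith("~")}


def domain_specific_filters(lines):
    """Map each domain-specific filter to the domains it should match on."""
    result = {}
    for line in lines:
        domains = extract_domains(line)
        if domains:
            result[line.strip()] = domains
    return result


def unmatched(domain_filters, matches):
    """Return (filter, domain) pairs that were expected but never observed.

    `matches` is a set of (filter, domain) pairs, a hypothetical stand-in
    for whatever the crawler reports to the backend.
    """
    return {
        (filter_text, domain)
        for filter_text, domains in domain_filters.items()
        for domain in domains
        if (filter_text, domain) not in matches
    }
```

Generic filters (no domain restriction) are dropped on purpose, since only filters tied to a domain can be judged obsolete by visiting sites from that domain.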
