Rietveld Code Review Tool

Issue 8943045: Implemented extraction of URL Fixer data (Closed)

Created: Nov. 23, 2012, 4:36 p.m. by Wladimir Palant
Modified: Nov. 28, 2012, 12:39 p.m.
Reviewers: Thomas Greiner
Visibility: Public.

Description

Implemented extraction of URL Fixer data

Patch Set 1
Total comments: 7

Patch Set 2

Patch Set 3: Larger blacklist

Patch stats (+148 lines, --1 lines):
A sitescripts/urlfixer/bin/__init__.py: 0 chunks, +-1 lines, --1 lines, 0 comments
A sitescripts/urlfixer/bin/forceDomains.py: 1 chunk, +53 lines, -0 lines, 0 comments
A sitescripts/urlfixer/bin/topDomains.py: 1 chunk, +95 lines, -0 lines, 0 comments
M sitescripts/urlfixer/schema.sql: 1 chunk, +1 line, -0 lines, 0 comments

Messages

Total messages: 6
#1 Wladimir Palant, Nov. 23, 2012, 4:36 p.m. (2012-11-23 16:36:57 UTC)

#2 Thomas Greiner, Nov. 28, 2012, 10:05 a.m. (2012-11-28 10:05:42 UTC)
File sitescripts/urlfixer/bin/topDomains.py (right): http://codereview.adblockplus.org/8943045/diff/1/sitescripts/urlfixer/bin/topDomains.py
http://codereview.adblockplus.org/8943045/diff/1/sitescripts/urlfixer/bin/topDomains.py#newcode41
sitescripts/urlfixer/bin/topDomains.py:41: re.search(r"['\"_,<>;]|^\.|\.$|\.\.", domain)):
That's a small selection of special characters. ...

#3 Wladimir Palant, Nov. 28, 2012, 10:49 a.m. (2012-11-28 10:49:24 UTC)
File sitescripts/urlfixer/bin/topDomains.py (right): http://codereview.adblockplus.org/8943045/diff/1/sitescripts/urlfixer/bin/topDomains.py
http://codereview.adblockplus.org/8943045/diff/1/sitescripts/urlfixer/bin/topDomains.py#newcode31
sitescripts/urlfixer/bin/topDomains.py:31: def getTopDomains(count=1000):
Note that this is count=5000 on the ...

#4 Thomas Greiner, Nov. 28, 2012, 11:17 a.m. (2012-11-28 11:17:39 UTC)
File sitescripts/urlfixer/bin/topDomains.py (right): http://codereview.adblockplus.org/8943045/diff/1/sitescripts/urlfixer/bin/topDomains.py
http://codereview.adblockplus.org/8943045/diff/1/sitescripts/urlfixer/bin/topDomains.py#newcode31
sitescripts/urlfixer/bin/topDomains.py:31: def getTopDomains(count=1000):
Ok.
http://codereview.adblockplus.org/8943045/diff/1/sitescripts/urlfixer/bin/topDomains.py#newcode41
sitescripts/urlfixer/bin/topDomains.py:41: re.search(r"['\"_,<>;]|^\.|\.$|\.\.", domain)):
Ok. For ...

#5 Wladimir Palant, Nov. 28, 2012, 11:30 a.m. (2012-11-28 11:30:34 UTC)

#6 Thomas Greiner, Nov. 28, 2012, 12:23 p.m. (2012-11-28 12:23:45 UTC)
LGTM
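
The domain filter discussed in messages #2 and #4 is a blacklist: a domain is rejected if it contains one of the characters ' " _ , < > ; or starts with a dot, ends with a dot, or contains two consecutive dots. The minimal Python sketch below makes that behaviour concrete. The regular expression is copied from the patch set 1 line 41 of topDomains.py quoted in the review. Patch Set 3 is titled "Larger blacklist", but its final expression is not shown here. The helper names `is_rejected` and `is_valid_hostname` are hypothetical. The whitelist-style check is a swapped-in alternative based on the letters-digits-hyphen hostname rule and is not code from this patch.

```python
import re

# Regular expression quoted from topDomains.py:41 (patch set 1). A match means
# the domain is flagged as invalid: it contains one of ' " _ , < > ; or it
# starts with a dot, ends with a dot, or contains two consecutive dots.
INVALID_DOMAIN_RE = re.compile(r"['\"_,<>;]|^\.|\.$|\.\.")


def is_rejected(domain):
    """Return True if the quoted blacklist regex flags this domain (hypothetical helper)."""
    return INVALID_DOMAIN_RE.search(domain) is not None


# Hypothetical whitelist alternative (NOT from the patch): each dot-separated
# label must be 1-63 ASCII letters, digits or inner hyphens, and must start
# and end with a letter or digit.
LABEL_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?", re.IGNORECASE)


def is_valid_hostname(domain):
    """Return True only if every label passes the whitelist (hypothetical helper)."""
    if not domain or len(domain) > 253:
        return False
    # fullmatch (rather than match with $) also rejects a trailing newline.
    return all(LABEL_RE.fullmatch(label) for label in domain.split("."))


if __name__ == "__main__":
    for d in ["example.com", "foo_bar.com", "exa mple.com"]:
        print(d, is_rejected(d), is_valid_hostname(d))
    # example.com False True
    # foo_bar.com True False
    # exa mple.com False False
```

The last line of output shows the trade-off. Any character the blacklist does not list, such as a space, passes the blacklist. A whitelist rejects everything outside the allowed set. It would, however, also reject non-ASCII (non-punycode) domain names that the blacklist lets through.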
