Rietveld Code Review Tool
Help | Bug tracker | Discussion group | Source code

Issue 9045097: sitescripts: Unmerged changes (Closed)

Created:
Dec. 21, 2012, 9:39 a.m. by Felix Dahlke
Modified:
Dec. 22, 2012, 5:56 a.m.
Reviewers:
Wladimir Palant
Visibility:
Public.

Description

Since we aren't going to merge the branch for 8492019, there are a couple of changes and fixes missing. I've put them all in this review. The most notable change is that we're importing sites directly into the database instead of generating import statements, like we had discussed earlier.

Patch Set 1 #

Total comments: 2
Unified diffs Side-by-side diffs Delta from patch set Stats (+84 lines, -65 lines) Patch
M sitescripts/crawler/README.md View 3 chunks +8 lines, -11 lines 0 comments Download
R sitescripts/crawler/bin/extract_sites.py View 1 chunk +0 lines, -47 lines 0 comments Download
A sitescripts/crawler/bin/import_sites.py View 1 chunk +62 lines, -0 lines 2 comments Download
M sitescripts/crawler/schema.sql View 2 chunks +3 lines, -3 lines 0 comments Download
M sitescripts/crawler/web/crawler.py View 2 chunks +11 lines, -4 lines 0 comments Download

Messages

Total messages: 2
Felix Dahlke
Dec. 21, 2012, 9:42 a.m. (2012-12-21 09:42:42 UTC) #1
Wladimir Palant
Dec. 21, 2012, 2:30 p.m. (2012-12-21 14:30:30 UTC) #2
LGTM with the two comments addressed

http://codereview.adblockplus.org/9045097/diff/1/sitescripts/crawler/bin/impo...
File sitescripts/crawler/bin/import_sites.py (right):

http://codereview.adblockplus.org/9045097/diff/1/sitescripts/crawler/bin/impo...
sitescripts/crawler/bin/import_sites.py:57: cursor.execute("INSERT INTO
crawler_sites (url) VALUES (%s)", url)
Add _get_db().commit()? Otherwise this might bite us if we ever decide to call
that function multiple times.

http://codereview.adblockplus.org/9045097/diff/1/sitescripts/crawler/bin/impo...
sitescripts/crawler/bin/import_sites.py:57: cursor.execute("INSERT INTO
crawler_sites (url) VALUES (%s)", url)
Use INSERT IGNORE in case the URL is already in the database or duplicated in
the history?

Powered by Google App Engine
This is Rietveld