Rietveld Code Review Tool

Issue 11481051: Update stats processing (Closed)

Created: Aug. 23, 2013, 3:53 p.m. by Wladimir Palant
Modified: Sept. 11, 2013, 1:31 p.m.
Visibility: Public

Description

This is a large patch, sorry. It probably doesn't make sense to review this in detail - some polishing is certainly possible but this isn't really critical. At least I tried to comply with PEP 8 this time :). However, it would be great if you could verify the general approach. Also, anything that can speed up log processing would be appreciated.

In general, currently there are three steps executed daily:

1) Mirror servers process their logs and save the result in a temporary file.
2) Stats master retrieves these files and merges them into the existing data.
3) Stats master produces static HTML pages for the data.

We probably want to move 1) off the mirrors to the stats master but right now we have too many logs to process them on a single server. Also, we probably don't want to rebuild all pages in 3) but only do it if the data changed.
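Step 2) above, merging per-mirror results into the existing data, can be sketched as follows. This is only an illustration under assumptions: the actual data layout used by `datamerger.py` is not shown here, so the nested-dictionary structure, the key names, and the function name `merge_counts` are all hypothetical.

```python
def merge_counts(existing, incoming):
    """Recursively add the counters in `incoming` to `existing`, in place.

    Assumption: both arguments are nested dicts whose leaves are integer
    hit counts. This layout is hypothetical, not taken from the patch.
    """
    for key, value in incoming.items():
        if isinstance(value, dict):
            # Descend into (or create) the matching sub-dictionary.
            merge_counts(existing.setdefault(key, {}), value)
        else:
            # Leaf counter: add to the stored value, starting from zero.
            existing[key] = existing.get(key, 0) + value
    return existing


# Hypothetical data: stored totals plus one mirror's daily result.
stored = {"2013-08": {"hits": 10, "country": {"de": 4}}}
from_mirror = {
    "2013-08": {"hits": 5, "country": {"de": 1, "us": 2}},
    "2013-09": {"hits": 1},
}
merged = merge_counts(stored, from_mirror)
```

Because the merge is a plain sum over counters, results from several mirrors could be applied in any order and still yield the same totals.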

Patch Set 1

Total comments: 1

Patch Set 2 : Fixed two presentation issues

Total comments: 22

Patch Set 3 : Fixed review issues

Patch Set 4 : String concatenation fix

Total comments: 13

Patch Set 5 : Improved performance using memoization

Stats: +1872 lines, -910 lines

File | Delta from patch set | Chunks | Stats | Comments
M .sitescripts.example | - | 2 chunks | +17 lines, -4 lines | 0 comments
M sitescripts/stats/__init__.py | - | 0 chunks | +-1 lines, --1 lines | 0 comments
M sitescripts/stats/bin/__init__.py | - | 0 chunks | +-1 lines, --1 lines | 0 comments
M sitescripts/stats/bin/datamerger.py | 1 2 | 1 chunk | +65 lines, -40 lines | 0 comments
M sitescripts/stats/bin/logprocessor.py | 1 2 3 4 | 1 chunk | +343 lines, -86 lines | 0 comments
M sitescripts/stats/bin/pagegenerator.py | 1 2 3 | 1 chunk | +138 lines, -116 lines | 0 comments
A sitescripts/stats/common.py | 1 2 3 4 | 1 chunk | +147 lines, -0 lines | 0 comments
M sitescripts/stats/countrycodes.py | - | 1 chunk | +1 line, -1 line | 0 comments
M sitescripts/stats/static/flags.css | - | 0 chunks | +-1 lines, --1 lines | 0 comments
M sitescripts/stats/static/flags.png | - | Binary file | - | 0 comments
M sitescripts/stats/static/hours.css | - | 0 chunks | +-1 lines, --1 lines | 0 comments
M sitescripts/stats/static/hours.png | - | Binary file | - | 0 comments
M sitescripts/stats/static/stats.css | - | 1 chunk | +118 lines, -97 lines | 0 comments
M sitescripts/stats/template/fileOverview.html | - | 1 chunk | +76 lines, -89 lines | 0 comments
M sitescripts/stats/template/fileStats.html | 1 | 1 chunk | +202 lines, -384 lines | 0 comments
M sitescripts/stats/template/main.html | - | 1 chunk | +71 lines, -70 lines | 0 comments
A sitescripts/stats/test/__init__.py | - | 0 chunks | +-1 lines, --1 lines | 0 comments
A sitescripts/stats/test/common.py | - | 1 chunk | +44 lines, -0 lines | 0 comments
A sitescripts/stats/test/logprocessor.py | - | 1 chunk | +640 lines, -0 lines | 0 comments
M sitescripts/templateFilters.py | - | 4 chunks | +1 line, -25 lines | 0 comments
M sitescripts/utils.py | 1 2 | 1 chunk | +14 lines, -3 lines | 0 comments

Messages

Total messages: 22
Wladimir Palant
Aug. 23, 2013, 3:53 p.m. (2013-08-23 15:53:54 UTC) #1
Wladimir Palant
Aug. 24, 2013, 1:12 p.m. (2013-08-24 13:12:03 UTC) #2
Sebastian Noack
Also you use a lot of inline regular expressions. So if performance is an issue, ...
Aug. 26, 2013, 4:05 p.m. (2013-08-26 16:05:22 UTC) #3
Wladimir Palant
On 2013/08/26 16:05:22, sebastian wrote: > Also you use a lot of inline regular expressions. ...
Aug. 27, 2013, 7:34 a.m. (2013-08-27 07:34:28 UTC) #4
Sebastian Noack
On 2013/08/27 07:34:28, Wladimir Palant wrote: > On 2013/08/26 16:05:22, sebastian wrote: > > Also ...
Aug. 27, 2013, 10:05 a.m. (2013-08-27 10:05:50 UTC) #5
Wladimir Palant
Sebastian, I merely copied your reply to the right comment threads here. http://codereview.adblockplus.org/11481051/diff/4002/sitescripts/stats/bin/datamerger.py File sitescripts/stats/bin/datamerger.py ...
Aug. 27, 2013, 11:59 a.m. (2013-08-27 11:59:47 UTC) #6
Wladimir Palant
On 2013/08/27 10:05:50, sebastian wrote: > Precompiled regular expressions actually can not be slower, because ...
Aug. 27, 2013, 12:42 p.m. (2013-08-27 12:42:01 UTC) #7
Sebastian Noack
lgtm
Aug. 27, 2013, 12:44 p.m. (2013-08-27 12:44:41 UTC) #8
Felix Dahlke
I only focused on the notification parsing changes here, and optimising that in particular. I ...
Aug. 28, 2013, 5:25 p.m. (2013-08-28 17:25:32 UTC) #9
Sebastian Noack
On 2013/08/28 17:25:32, Felix H. Dahlke wrote: > I ran this with PyPy, which does ...
Aug. 29, 2013, 10:18 a.m. (2013-08-29 10:18:05 UTC) #10
Felix Dahlke
On 2013/08/29 10:18:05, sebastian wrote: > Using PyPy is a pretty good idea. But you ...
Aug. 29, 2013, 10:29 a.m. (2013-08-29 10:29:56 UTC) #11
Felix Dahlke
http://codereview.adblockplus.org/11481051/diff/23002/sitescripts/stats/bin/logprocessor.py File sitescripts/stats/bin/logprocessor.py (right): http://codereview.adblockplus.org/11481051/diff/23002/sitescripts/stats/bin/logprocessor.py#newcode29 sitescripts/stats/bin/logprocessor.py:29: match = re.search(r"\bOpera/([\d\.]+)", ua) On 2013/08/29 10:18:05, sebastian wrote: ...
Aug. 29, 2013, 10:30 a.m. (2013-08-29 10:30:09 UTC) #12
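
For reference, the inline `re.search` call quoted above can be contrasted with a precompiled pattern, which is the alternative discussed earlier in this thread. The sketch below is an assumption about how that could look, not the code from the patch; the names `OPERA_VERSION` and `opera_version` are hypothetical. Note that Python's `re` module keeps its own cache of recently compiled patterns, so the saving from precompiling is mainly the avoided cache lookup on each call.

```python
import re

# Compiled once at import time instead of on every log line.
# The pattern itself is the one quoted from logprocessor.py.
OPERA_VERSION = re.compile(r"\bOpera/([\d\.]+)")


def opera_version(ua):
    """Return the Opera version found in a user-agent string, or None."""
    match = OPERA_VERSION.search(ua)
    return match.group(1) if match else None


version = opera_version("Opera/9.80 (Windows NT 6.1) Presto/2.12.388 Version/12.16")
```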
Sebastian Noack
On 2013/08/29 10:29:56, Felix H. Dahlke wrote: > On 2013/08/29 10:18:05, sebastian wrote: > > ...
Aug. 29, 2013, 10:54 a.m. (2013-08-29 10:54:30 UTC) #13
Felix Dahlke
On 2013/08/29 10:54:30, sebastian wrote: > I don't see how a GC can have no ...
Aug. 29, 2013, 12:38 p.m. (2013-08-29 12:38:19 UTC) #14
Felix Dahlke
http://codereview.adblockplus.org/11481051/diff/23002/sitescripts/stats/bin/logprocessor.py File sitescripts/stats/bin/logprocessor.py (right): http://codereview.adblockplus.org/11481051/diff/23002/sitescripts/stats/bin/logprocessor.py#newcode29 sitescripts/stats/bin/logprocessor.py:29: match = re.search(r"\bOpera/([\d\.]+)", ua) On 2013/08/29 10:54:30, sebastian wrote: ...
Aug. 29, 2013, 12:38 p.m. (2013-08-29 12:38:26 UTC) #15
Sebastian Noack
On 2013/08/29 12:38:19, Felix H. Dahlke wrote: > On 2013/08/29 10:54:30, sebastian wrote: > > ...
Aug. 29, 2013, 1:33 p.m. (2013-08-29 13:33:46 UTC) #16
Felix Dahlke
On 2013/08/29 13:33:46, sebastian wrote: > this can speed up things. But in a CPU-bound ...
Aug. 29, 2013, 1:41 p.m. (2013-08-29 13:41:31 UTC) #17
Wladimir Palant
I added memoization in the places noted by Felix, this caused a 16% performance improvement ...
Aug. 29, 2013, 1:42 p.m. (2013-08-29 13:42:42 UTC) #18
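
The memoization mentioned in the message above could look roughly like the following. This is a minimal sketch assuming that the same user-agent strings recur many times in a log, so parsing each distinct string once is enough; the decorator `memoize` and the function `parse_opera_version` are hypothetical names, and the patch's actual implementation is not shown here.

```python
import functools
import re

OPERA_VERSION = re.compile(r"\bOpera/([\d\.]+)")


def memoize(func):
    """Cache results of `func` keyed by its positional arguments.

    The cache is unbounded, which is an assumption that the number of
    distinct inputs stays manageable during one processing run.
    """
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed for inspection
    return wrapper


@memoize
def parse_opera_version(ua):
    """Return the Opera version in a user-agent string, or None."""
    match = OPERA_VERSION.search(ua)
    return match.group(1) if match else None


ua = "Opera/9.80 (X11; Linux x86_64) Presto/2.12.388 Version/12.16"
first = parse_opera_version(ua)   # parsed with the regular expression
second = parse_opera_version(ua)  # served from the cache
```

On Python 3, `functools.lru_cache` from the standard library provides the same behavior with an optional size limit.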
Felix Dahlke
LGTM
Aug. 29, 2013, 1:58 p.m. (2013-08-29 13:58:57 UTC) #19
Felix Dahlke
On 2013/08/29 13:42:42, Wladimir Palant wrote: > However, using PyPy is non-trivial to say the ...
Aug. 29, 2013, 3:06 p.m. (2013-08-29 15:06:16 UTC) #20
Felix Dahlke
On 2013/08/29 15:06:16, Felix H. Dahlke wrote: > I tested with 10,000,000 entries and without ...
Aug. 29, 2013, 3:13 p.m. (2013-08-29 15:13:36 UTC) #21
Wladimir Palant
Aug. 29, 2013, 7:40 p.m. (2013-08-29 19:40:28 UTC) #22
Ok, you got me convinced. See http://codereview.adblockplus.org/11577044/.
