|
|
|
Created:
Sept. 21, 2015, 11:52 a.m. by mathias Modified:
Sept. 23, 2015, 9:40 a.m. Visibility:
Public. |
DescriptionIssue 3102 - Introduce log_format "data" for storing POST messages
Patch Set 1 #Patch Set 2 : Issue 3102 - Now the other way around #
Total comments: 6
Patch Set 3 : Issue 3102 - Include log_format main with data #Patch Set 4 : Issue 3102 - Fixed missing whitespace issue #MessagesTotal messages: 10
LGTM
https://codereview.adblockplus.org/29328450/diff/29328453/modules/nginx/templ... File modules/nginx/templates/nginx.conf.erb (right): https://codereview.adblockplus.org/29328450/diff/29328453/modules/nginx/templ... modules/nginx/templates/nginx.conf.erb:22: log_format data '[$time_local] "$request" $status "$http_user_agent" ' Why not have the same fields as with "main" here, and just add one for the data? https://codereview.adblockplus.org/29328450/diff/29328453/modules/nginx/templ... modules/nginx/templates/nginx.conf.erb:22: log_format data '[$time_local] "$request" $status "$http_user_agent" ' Do we have more than one use case for this? Will filter hit statistics use the same approach? If not - I would lean towards defining this in the relevant module, not globally (as we're doing it for the notification servers).
https://codereview.adblockplus.org/29328450/diff/29328453/modules/nginx/templ... File modules/nginx/templates/nginx.conf.erb (right): https://codereview.adblockplus.org/29328450/diff/29328453/modules/nginx/templ... modules/nginx/templates/nginx.conf.erb:22: log_format data '[$time_local] "$request" $status "$http_user_agent" ' On 2015/09/21 18:30:45, Felix Dahlke wrote: > Why not have the same fields as with "main" here, and just add one for the data? Because the data is meant to be anonymous. There is no need to track the IP or any of the other meta-data associated with the request, and the selected fields have been chosen together with our data scientists. https://codereview.adblockplus.org/29328450/diff/29328453/modules/nginx/templ... modules/nginx/templates/nginx.conf.erb:22: log_format data '[$time_local] "$request" $status "$http_user_agent" ' On 2015/09/21 18:30:45, Felix Dahlke wrote: > Do we have more than one use case for this? Will filter hit statistics use the > same approach? If not - I would lean towards defining this in the relevant > module, not globally (as we're doing it for the notification servers). Yes the filter hit statistics are meant to use the same logic (see my e-mail update from today). In fact there are even a few more upcoming projects that could use this exact approach. And from what we know so far, it looks like all of them would be served with this solution quite well. Note that the particular choice of fields logged has been made up with that in mind.
https://codereview.adblockplus.org/29328450/diff/29328453/modules/nginx/templ... File modules/nginx/templates/nginx.conf.erb (right): https://codereview.adblockplus.org/29328450/diff/29328453/modules/nginx/templ... modules/nginx/templates/nginx.conf.erb:22: log_format data '[$time_local] "$request" $status "$http_user_agent" ' On 2015/09/21 18:38:53, mathias wrote: > On 2015/09/21 18:30:45, Felix Dahlke wrote: > > Why not have the same fields as with "main" here, and just add one for the > data? > > Because the data is meant to be anonymous. There is no need to track the IP or > any of the other meta-data associated with the request, and the selected fields > have been chosen together with our data scientists. While it's not relevant for the Adblock Browser data backend, for something like filter hit statistics we need the IP for debugging purposes IMHO, anyone seeing this will have a very easy time fooling us with false data. Since we won't store these access logs for longer than 30 days, I see no reason why we shouldn't just have the same set of fields, or well, at least the IP.
https://codereview.adblockplus.org/29328450/diff/29328453/modules/nginx/templ... File modules/nginx/templates/nginx.conf.erb (right): https://codereview.adblockplus.org/29328450/diff/29328453/modules/nginx/templ... modules/nginx/templates/nginx.conf.erb:22: log_format data '[$time_local] "$request" $status "$http_user_agent" ' On 2015/09/21 20:15:51, Felix Dahlke wrote: > While it's not relevant for the Adblock Browser data backend, for something like > filter hit statistics we need the IP for debugging purposes IMHO, anyone seeing > this will have a very easy time fooling us with false data. Since we won't store > these access logs for longer than 30 days, I see no reason why we shouldn't just > have the same set of fields, or well, at least the IP. In fact most of the fields currently matching those in the access log have been introduced for that exact purpose (so we could at least determine a matching IP based on the assumption that other requests in the same time are valid), but I see your point. Especially the fact that this approach would be way easier to handle in aggregation. To be honest I do not like combining system log content with application log stuff (like done with the notifications, for example), which is the sole reason for the separation. But I can agree to just having two versions. I'll update the patch-set accordingly.
LGTM
|
