|
|
Created:
Dec. 27, 2013, 7:43 a.m. by Wladimir Palant Modified:
Jan. 30, 2014, 1:16 p.m. Reviewers:
Sebastian Noack Visibility:
Public. |
DescriptionPage generation currently produces an error because of a very long addon name received from some client (corresponding file name is longer than 255 characters). We need to account for that scenario properly.
Patch Set 1 #Patch Set 2 : Decreased length limit #
Total comments: 8
Patch Set 3 : Addressed nit #Patch Set 4 : Addressed remaining nit #MessagesTotal messages: 10
http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... File sitescripts/stats/common.py (right): http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... sitescripts/stats/common.py:18: import re, hashlib According to PEP-8, the import should go on an own line, see http://www.python.org/dev/peps/pep-0008/#imports. http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... sitescripts/stats/common.py:32: hash.update(result[150:]) You can pass the data directly to the md5() constructor: hashlib.md5(result[150:]).hexdigest() Also I would prefer sha1 over md5.
http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... File sitescripts/stats/common.py (right): http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... sitescripts/stats/common.py:32: hash.update(result[150:]) On 2013/12/27 13:49:58, Sebastian Noack wrote: > Also I would prefer sha1 over md5. Mind explaining why? The only property we need here is low probability of collisions, it doesn't have to be strong cryptographically. MD5 also has the advantage of being less computationally intensive than SHA1 and it produces shorter hashes.
http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... File sitescripts/stats/common.py (right): http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... sitescripts/stats/common.py:32: hash.update(result[150:]) On 2013/12/28 09:56:22, Wladimir Palant wrote: > On 2013/12/27 13:49:58, Sebastian Noack wrote: > > Also I would prefer sha1 over md5. > > Mind explaining why? The only property we need here is low probability of > collisions, it doesn't have to be strong cryptographically. MD5 also has the > advantage of being less computationally intensive than SHA1 and it produces > shorter hashes. Beside a smaller digest size, MD5 also has various vulnerabilities against collision attacks. So MD5 should be avoided in favor of SHA1 in new code. However when you don't need a cryptographic hash function, but just a short and fast checksum, why don't you use CRC?
http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... File sitescripts/stats/common.py (right): http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... sitescripts/stats/common.py:32: hash.update(result[150:]) On 2013/12/28 12:41:18, Sebastian Noack wrote: > However when you don't need a cryptographic hash function, but just a short and > fast checksum, why don't you use CRC? CRC checksums aren't meant to be unique, they should merely allow detecting changes in the original message.
http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... File sitescripts/stats/common.py (right): http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... sitescripts/stats/common.py:32: hash.update(result[150:]) On 2013/12/28 16:58:14, Wladimir Palant wrote: > On 2013/12/28 12:41:18, Sebastian Noack wrote: > > However when you don't need a cryptographic hash function, but just a short > and > > fast checksum, why don't you use CRC? > > CRC checksums aren't meant to be unique, they should merely allow detecting > changes in the original message. Cryptographic hashes aren't unique either, they just use cryptographic algorithms to distribute the collisions more randomly. However I agree that CRC-32 might have too many collisions. And since CRC-64 and CRC-128 aren't available in Python's core library, using cryptographic hashes instead is fine. I would prefer SHA1, but since collision attacks aren't a concern here, I don't mind if you prefer to stick with MD5.
I've addressed the nit. Concerning the other comment see below. http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... File sitescripts/stats/common.py (right): http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... sitescripts/stats/common.py:32: hash.update(result[150:]) On 2013/12/29 16:58:23, Sebastian Noack wrote: > I would prefer SHA1, but since collision attacks aren't a concern here, I don't > mind if you prefer to stick with MD5. As I said, MD5 has several advantages over SHA1 for us - and collision attacks are irrelevant here, we are merely using MD5 because of the very low probability of collisions. This isn't something that CRC can provide, regardless of the checksum length. So I cannot see any reason whatsoever why I shouldn't stick with MD5.
LGTM, but you still might want address the remaining nit. http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... File sitescripts/stats/common.py (right): http://codereview.adblockplus.org/5843385483001856/diff/5741031244955648/site... sitescripts/stats/common.py:32: hash.update(result[150:]) On 2014/01/30 12:18:33, Wladimir Palant wrote: > On 2013/12/29 16:58:23, Sebastian Noack wrote: > > I would prefer SHA1, but since collision attacks aren't a concern here, I > don't > > mind if you prefer to stick with MD5. > > As I said, MD5 has several advantages over SHA1 for us - and collision attacks > are irrelevant here, we are merely using MD5 because of the very low probability > of collisions. This isn't something that CRC can provide, regardless of the > checksum length. So I cannot see any reason whatsoever why I shouldn't stick > with MD5. I don't agree. The performance difference isn't even measurable in our case, since we only hash pretty small amounts of data, and the function call in Python is way slower, than hashing those data with any hash function in C. Though you might be right, that MD5's shorter hashes are sufficient and more convenient in this case. But it is best practice to don't use deprecated (and proofed to be insecure) crypto algorithms, even when it doesn't directly add a vulnerability to the system. However as I said, if you don't agree, I don't mind if you ignore my advice. But please consider addressing the nit, my first comment was mainly about, that you can pass the data to hash directly to the constructor: hashlib.md5(result[150:]).hexdigest()
Even more LGTM |