Side by Side Diff: sitescripts/subscriptions/bin/updateMalwareDomainsList.py

Issue 29821558: Issue #6707 - Make the generated malware domain filter list encode domains as Punycode (Closed)
Patch Set: Updated encoding format. Created July 6, 2018, 11:51 a.m.
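
For context on the issue title, here is a minimal Python 3 sketch of what encoding a domain as Punycode means for a filter line of the form `||domain^`. It is an illustration only: the script under review is Python 2, and the right-hand side of the diff simply drops the `.decode('idna')` call rather than encoding anything. `to_filter` is a hypothetical helper name, not part of the patch, and the example domain is made up.

```python
def to_filter(domain):
    """Build a blocking filter whose domain is in Punycode (ASCII) form.

    Hypothetical helper for illustration; not part of the reviewed patch.
    """
    # The built-in 'idna' codec converts each non-ASCII label to its
    # 'xn--' form and leaves labels that are already ASCII unchanged.
    ascii_domain = domain.encode('idna').decode('ascii')
    return '||%s^' % ascii_domain


if __name__ == '__main__':
    # A Unicode domain and its Punycode equivalent yield the same filter.
    print(to_filter('b\u00fccher.example'))
    print(to_filter('xn--bcher-kva.example'))
```

Decoding goes the other way: `b'xn--bcher-kva.example'.decode('idna')` gives back the Unicode form, which is what the left-hand side of the diff wrote to the list.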
OLD NEW
1 # This file is part of the Adblock Plus web scripts, 1 # This file is part of the Adblock Plus web scripts,
2 # Copyright (C) 2006-present eyeo GmbH 2 # Copyright (C) 2006-present eyeo GmbH
3 # 3 #
4 # Adblock Plus is free software: you can redistribute it and/or modify 4 # Adblock Plus is free software: you can redistribute it and/or modify
5 # it under the terms of the GNU General Public License version 3 as 5 # it under the terms of the GNU General Public License version 3 as
6 # published by the Free Software Foundation. 6 # published by the Free Software Foundation.
7 # 7 #
8 # Adblock Plus is distributed in the hope that it will be useful, 8 # Adblock Plus is distributed in the hope that it will be useful,
9 # but WITHOUT ANY WARRANTY; without even the implied warranty of 9 # but WITHOUT ANY WARRANTY; without even the implied warranty of
10 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 10 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
(...skipping 42 matching lines...)
53 section = 'subscriptionDownloads' 53 section = 'subscriptionDownloads'
54 repository = config.get(section, 'malwaredomains_repository') 54 repository = config.get(section, 'malwaredomains_repository')
55 mirrors = config.get(section, 'malwaredomains_mirrors').split() 55 mirrors = config.get(section, 'malwaredomains_mirrors').split()
56 56
57 tempdir = tempfile.mkdtemp(prefix='malwaredomains') 57 tempdir = tempfile.mkdtemp(prefix='malwaredomains')
58 try: 58 try:
59 subprocess.check_call(['hg', '-q', 'clone', '-U', repository, tempdir]) 59 subprocess.check_call(['hg', '-q', 'clone', '-U', repository, tempdir])
60 subprocess.check_call(['hg', '-q', 'up', '-R', tempdir, '-r', 'default']) 60 subprocess.check_call(['hg', '-q', 'up', '-R', tempdir, '-r', 'default'])
61 61
62 path = os.path.join(tempdir, 'malwaredomains_full.txt') 62 path = os.path.join(tempdir, 'malwaredomains_full.txt')
63 file = codecs.open(path, 'wb', encoding='utf-8') 63 file = codecs.open(path, 'wb')
Sebastian Noack 2018/07/06 18:23:25 codecs.open() without specifying an encoding is eq
Sebastian Noack 2018/07/09 11:59:29 That is what I'm suggesting.
Tudor Avram 2018/07/09 13:35:42 Done.
64 64
65 print >>file, FILTERLIST_HEADER 65 print >>file, FILTERLIST_HEADER
66 66
67 error_report = ['Unable to fetch malware domains list', 'Errors:'] 67 error_report = ['Unable to fetch malware domains list', 'Errors:']
68 for mirror in mirrors: 68 for mirror in mirrors:
69 error_message, data = try_mirror(mirror) 69 error_message, data = try_mirror(mirror)
70 if data is not None: 70 if data is not None:
71 break 71 break
72 error_report.append(error_message) 72 error_report.append(error_message)
73 else: 73 else:
74 sys.exit('\n'.join(error_report)) 74 sys.exit('\n'.join(error_report))
75 75
76 zf = zipfile.ZipFile(StringIO(data), 'r') 76 zf = zipfile.ZipFile(StringIO(data), 'r')
77 info = zf.infolist()[0] 77 info = zf.infolist()[0]
78 for line in str(zf.read(info.filename)).splitlines(): 78 for line in str(zf.read(info.filename)).splitlines():
79 domain = line.strip() 79 domain = line.strip()
80 if not domain: 80 if not domain:
81 continue 81 continue
82 82
83 print >>file, '||%s^' % domain.decode('idna') 83 print >>file, '||%s^' % domain
84 file.close() 84 file.close()
85 85
86 if subprocess.check_output(['hg', 'stat', '-R', tempdir]) != '': 86 if subprocess.check_output(['hg', 'stat', '-R', tempdir]) != '':
87 subprocess.check_call(['hg', '-q', 'commit', '-R', tempdir, '-A', '-u', 'hgbot', '-m', 'Updated malwaredomains.com data']) 87 subprocess.check_call(['hg', '-q', 'commit', '-R', tempdir, '-A', '-u', 'hgbot', '-m', 'Updated malwaredomains.com data'])
88 subprocess.check_call(['hg', '-q', 'push', '-R', tempdir]) 88 subprocess.check_call(['hg', '-q', 'push', '-R', tempdir])
89 finally: 89 finally:
90 shutil.rmtree(tempdir, ignore_errors=True) 90 shutil.rmtree(tempdir, ignore_errors=True)
91 91
92 92
93 if __name__ == '__main__': 93 if __name__ == '__main__':
94 main() 94 main()
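
The mirror loop at lines 68-74 relies on Python's `for`/`else`: the `else` branch runs only when the loop finishes without hitting `break`, that is, when no mirror returned data. A small Python 3 sketch of the same pattern follows. The body of `try_mirror` is not visible in this diff, so `fetch` is a stand-in that is only assumed to return an `(error_message, data)` pair, as the call site suggests. `first_successful` is a hypothetical name, and it raises `RuntimeError` where the script calls `sys.exit`.

```python
def first_successful(mirrors, fetch):
    """Return data from the first mirror for which fetch() succeeds.

    `fetch` is assumed to return (error_message, data), with data set
    to None on failure, mirroring the call site in the diff.
    """
    error_report = ['Unable to fetch malware domains list', 'Errors:']
    for mirror in mirrors:
        error_message, data = fetch(mirror)
        if data is not None:
            break  # success: skip the else branch below
        error_report.append(error_message)
    else:
        # Reached only if the loop never hit `break`, including the
        # case of an empty mirror list.
        raise RuntimeError('\n'.join(error_report))
    return data


if __name__ == '__main__':
    def fake_fetch(mirror):
        # Hypothetical fetcher: only the mirror named 'good' succeeds.
        if mirror == 'good':
            return None, b'payload'
        return 'failed: %s' % mirror, None

    print(first_successful(['bad', 'good'], fake_fetch))
```

Collecting the error messages before giving up keeps the final report useful: every mirror that was tried appears in it.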