2011-10-26

Python optimizations exercise.

From there I wanted to post cleanly all my findings. I include all my sources, so you can reproduce it. Basically, Pavel wanted to optimize a simple use case. Here is my shot at it: First the test.py, the html7 implementation and a slightly improved one where I don't force python to go back and forth between bytes and unicode + the benchmark timer :
import timeit


WHITELIST2 = set('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
WHITELIST2_UNI = set(u'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')
r = u'234782384723!#$#@!%$@#%$@#%#$%%%%%%%%%%%@#$%342583492058934028590342853490285902344#$%%*%****%7jkb6h546777776ynkk4b56byhbh5j'*500

def html7():
    s = r
    lstr = str; lord = ord; lWHITELIST2 = WHITELIST2
    return "".join([
        c if c in lWHITELIST2
        else "&#" + lstr(lord(c)) + ";"
        for c in s])

def htmlGB():
    s = r
    lstr = unicode; lord = ord; lWHITELIST2 = WHITELIST2_UNI
    return u''.join([
        c if c in lWHITELIST2
        else u'&#' + lstr(lord(c)) + u';'
        for c in s])
assert(html7() == htmlGB())

t = timeit.Timer(stmt = html7)
print 'html7:' + str(t.timeit(number=100))

t = timeit.Timer(stmt = htmlGB)
print 'htmlGB:' + str(t.timeit(number=100))
It gives me that as timing under python and pypy 1.5 :
⚫ python test.py 
html7:2.48782491684
htmlGB:2.14983606339

⚫ pypy-c1.5 test.py
html7:2.44286298752
htmlGB:1.84629392624
Note: the other implementation with caching & statistical tryouts are too convoluted and too much on the speed over memory tradeoff for this simple task IMHO. Then I wanted to go further, so I made a simple port on cython then a "bare metal" version on cython, and I have been really impressed by it ! In order to compile in cython you need a simple makefile-like setup.py file:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("encode", ["encode.pyx"])]

setup(
  name = 'Encoding Test',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)
Then here are my implementations : cython is the direct port, cython_alt is the C-like bare metal one. In the file encode.pyx put:
WHITELIST = set(u'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')

from libc.stdlib cimport malloc, free
from libc.stdio cimport sprintf

def cython(unicode s):
    return u''.join([
    c if c in WHITELIST
    else u'&#' + unicode(< long > c) + u';'
    for c in s])

def cython_alt(unicode s):
    cdef char * buffer = < char * > malloc(len(s) * 10)
    cdef int i = 0
    cdef Py_UCS4 c

    for c in s:
        if (c >= u'a' and c <= u'z') or (c >= u'A' and c <= u'Z') or (c >= u'0' and c <= u'9'):
            buffer[i] = < char > c
            i += 1
        else:
            sprintf(buffer + i,'&#%d;',c)
            while buffer[i]:
                i += 1
    result = < bytes > buffer
    free(buffer)
    return result
And the same test runner test_cython :
import timeit
from encode import cython
from encode import cython_alt


r = u'234782384723!#$#@!%$@#%$@#%#$%%%%%%%%%%%@#$%342583492058934028590342853490285902344#$%%*%****%7jkb6h546777776ynkk4b56byhbh5j'*500

def cython2():
    cython(r)

def cython_alt2():
    cython_alt(r)

assert(cython(r) == cython_alt(r))

t = timeit.Timer(stmt = cython2)
print 'cython:' + str(t.timeit(number=100))

t = timeit.Timer(stmt = cython_alt2)
print 'cython_alt:' + str(t.timeit(number=100))
To compile it just do :
python setup.py build_ext --inplace
The results :
⚫ python test_cython.py 
cython:1.70311307907
cython_alt:0.348756790161
Booya ! :)