Educating yourself does not mean that you were stupid in the first place; it means that you are intelligent enough to know that there is plenty left to 'learn'. -Melanie Joy

Saturday, 22 December 2012

Some Optimization Tips/Tricks in Python

December 22, 2012 Posted by Dinesh

Looping:
Use xrange for looping across long ranges; it uses much less memory than range, and may save time as well. Both versions are likely to be faster than a while loop:

# eager: range builds the whole list up front, so it is a memory hog
for x in range(10000):
    pass  # Do Stuff
# lazy: xrange produces each value on demand
for x in xrange(10000):
    pass  # Do Stuff

Strictly speaking, xrange is not a generator but a lazy sequence object; the benefit is the same. Values are generated on demand rather than all at once, so memory use stays small, and we do not need to wait until all the elements have been generated before we start to use them.
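
To make "generated on demand" concrete, here is a minimal sketch using a generator function. The names lazy_squares and first_square_above are made up for illustration, and the code is written so that it also runs on Python 3 (where range is already lazy). The log list records how many values were actually produced before the loop stopped.

```python
def lazy_squares(limit, log):
    """Yield squares one at a time, recording each value as it is produced."""
    for n in range(limit):
        log.append(n)          # note that this value was really generated
        yield n * n


def first_square_above(threshold, limit=10000):
    """Return the first square above threshold and how many values were generated."""
    log = []
    for sq in lazy_squares(limit, log):
        if sq > threshold:
            return sq, len(log)   # stop early; the rest is never computed
    return None, len(log)


result, generated = first_square_above(50)
print(result, generated)
```

Only the first few values are ever computed, even though the limit is 10000. An eager list would have built all 10000 squares before the loop started.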

You can often eliminate a loop by calling map instead.
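
As a small illustration of replacing a loop with map (the sample data is invented for the example):

```python
words = ["alpha", "beta", "gamma"]

# Explicit loop
upper_loop = []
for w in words:
    upper_loop.append(w.upper())

# Same result with map. On Python 2 map already returns a list;
# on Python 3 it returns a lazy iterator, hence the list() call.
upper_map = list(map(str.upper, words))

print(upper_map)
```

Whether map is actually faster depends on the function being applied; it tends to help most when that function is a built-in such as str.upper.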

Strings:

Building up strings with the concatenation operator + can be slow, because each + creates a new string and copies its operands. Formatting with the % operator builds the result in a single step, which is generally faster and uses less memory when several pieces are combined.

For example:

String = Header + Body + Footer               # slow
String = "%s%s%s" % (Header, Body, Footer)    # fast

If you are building up a string with an unknown number of components, consider collecting them in a list and combining them all with join (string.join(List, "") from the string module, or the equivalent string method "".join(List)), instead of concatenating them as you go:
# Slow way:
Str = ""
for X in range(10000):
    Str += repr(X)         # repr(X) is the portable spelling of `X`
# Fast way:
List = []
for X in range(10000):
    List.append(repr(X))
Str = "".join(List)        # same result as string.join(List, "")
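
The same idea works when the pieces are of mixed types and need a separator. The sketch below is only an illustration; make_line is a hypothetical helper, not something from the original gists.

```python
def make_line(fields, sep=","):
    """Join an arbitrary number of fields into one string in a single pass."""
    # The generator expression converts each field lazily; join then
    # assembles the final string once instead of growing it step by step.
    return sep.join(str(f) for f in fields)


print(make_line([1, "a", 2.5]))
print(make_line(range(5), sep=""))
```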


Sample code:

#!/usr/bin/python
# 1_range.py
Str = ""
for X in range(10000000):
    Str += repr(X)

#!/usr/bin/python
# 2_xrange.py
Str = ""
for X in xrange(10000000):
    Str += repr(X)

#!/usr/bin/python
# 3_list_range.py
List = []
for X in range(10000000):
    List.append('X')   # appends the literal string 'X'

#!/usr/bin/python
# list_xrange version
List = []
for X in xrange(10000000):
    List.append('X')   # appends the literal string 'X'


Here are the results for the scripts above:


        range      xrange     list_range  list_xrange
real    0m4.136s   0m2.863s   0m1.867s    0m1.569s
user    0m3.960s   0m2.804s   0m1.732s    0m1.516s
system  0m0.160s   0m0.048s   0m0.124s    0m0.048s
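
The figures above appear to be whole-process timings. If you want to compare just the two string-building strategies inside one process, the standard timeit module is one option. The following is a rough sketch, not the method used for the table; the function names and the value of n are assumptions chosen for the example.

```python
import timeit


def build_by_concat(n):
    """Grow the string with += on every iteration."""
    s = ""
    for x in range(n):
        s += str(x)
    return s


def build_by_join(n):
    """Collect the pieces in a list and join them once at the end."""
    parts = []
    for x in range(n):
        parts.append(str(x))
    return "".join(parts)


def best_time(func, n, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` single runs."""
    return min(timeit.repeat(lambda: func(n), number=1, repeat=repeat))


if __name__ == "__main__":
    n = 100000
    for func in (build_by_concat, build_by_join):
        print("%-16s %.4fs" % (func.__name__, best_time(func, n)))
```

Taking the minimum of several runs reduces noise from other processes. The absolute numbers will differ between machines and Python versions, so treat them as relative indicators only.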




File Operations: (an extension of my previous post)

# Each call to a file's readline method is quite slow:
Line = file.readline()
while Line:
    # Do Stuff
    Line = file.readline()


# It is much faster to read the entire file into memory by calling readlines; however, this uses up a lot of RAM.
Lines = file.readlines()
for Line in Lines:
    pass  # Do Stuff


# Another approach is to read blocks of lines (the argument is a size hint in bytes).
while True:
    Lines = file.readlines(100)
    if len(Lines) == 0:
        break
    for Line in Lines:
        pass  # Do Stuff
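
The block-reading loop can be wrapped in a generator so that the calling code stays simple. This is a minimal sketch under stated assumptions: iter_line_blocks is a made-up helper name, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile


def iter_line_blocks(fileobj, size_hint=4096):
    """Yield lists of lines, each list holding roughly size_hint bytes."""
    while True:
        block = fileobj.readlines(size_hint)
        if not block:          # an empty list means end of file
            break
        yield block


# Build a small throwaway file for the demonstration.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    for i in range(1000):
        out.write("line %d\n" % i)

total = 0
with open(path) as src:
    for block in iter_line_blocks(src, size_hint=256):
        total += len(block)    # process the lines of this block here
os.remove(path)

print(total)
```

Only one block is held in memory at a time, so memory use is bounded by the size hint rather than by the file size.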


# Best of all is to use the xreadlines method of a file.
# (xreadlines is deprecated since Python 2.3; iterating over the file
# object directly, "for Line in file:", is equivalent.)
for Line in file.xreadlines():
    pass  # Do Stuff
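
In current Python the usual way to get this lazy, line-by-line behaviour is to iterate over the file object itself. The sketch below illustrates that; count_nonblank_lines is a hypothetical helper and the temporary file is only there so the example runs on its own.

```python
import os
import tempfile


def count_nonblank_lines(path):
    """Count lines containing something other than whitespace."""
    count = 0
    with open(path) as handle:     # the file is closed automatically
        for line in handle:        # lines are read lazily, one at a time
            if line.strip():
                count += 1
    return count


# Build a small throwaway file for the demonstration.
fd, demo_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as out:
    out.write("a\n\nb\n")

nonblank = count_nonblank_lines(demo_path)
os.remove(demo_path)
print(nonblank)
```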


Sample code:

#!/usr/bin/python
# 1_readlines.py
file = open("temp", "r")
line = file.readline()
while line:
    line = file.readline()
    pass

#!/usr/bin/python
# 2_readlines.py
file = open("temp", "r")
lines = file.readlines()
for line in lines:
    pass

#!/usr/bin/python
# 3_raedblock.py
file = open("temp", "r")
while 1:
    Lines = file.readlines(100)
    if len(Lines) == 0:
        break
    for Line in Lines:
        pass

#!/usr/bin/python
# 4_xreadlines.py
file = open("temp", "r")
lines = file.xreadlines()
for line in lines:
    pass


Here are the results for the scripts above:

        readline   readlines   readblock  xreadlines
real    0m9.948s   0m13.037s   0m9.880s   0m9.574s
user    0m7.144s   0m4.316s    0m2.716s   0m2.372s
system  0m0.332s   0m1.092s    0m0.404s   0m0.328s