Saturday, April 26, 2014

Python Style Guide: PEP8

According to my colleague, the Python style guide, PEP8, is considered the "bible" for Python programmers, and everyone should conform to it. The best resource can be found in this link. Here I summarize what I think are the most important points:

1. Code Layout

    Use 4 spaces per indentation level.
    Do not use tabs; use only spaces for indentation.
    Limit all lines to a maximum of 79 characters.
    Separate top-level function and class definitions with two blank lines; use one blank line between methods in a class, and between logical blocks of code within a function.
    Avoid wildcard imports such as "from a import *", and put imports of separate modules on separate lines.
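
As a rough illustration of the layout rules above, here is a minimal sketch. All names in it (report_path, circle_area, Circle) are made up for this example and do not come from PEP8 itself.

```python
# One import per line, and no wildcard imports.
import math
import os


def report_path(directory):
    """Return the path of a hypothetical report file inside directory."""
    return os.path.join(directory, "areas.txt")


def circle_area(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2


class Circle(object):
    """Hypothetical class, used only to show blank-line spacing."""

    def __init__(self, radius):
        self.radius = radius

    def area(self):
        # Methods inside a class are separated by a single blank line.
        return circle_area(self.radius)
```

Top-level definitions are separated by two blank lines, methods by one, and every indentation level is exactly 4 spaces.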

2. Comment

    Principle: "Comments that contradict the code are worse than no comments."
    My colleague suggests using a documentation string, """ comments """, for each function and class defined; this may help to generate documentation automatically.
    Block comments generally apply to some (or all) code that follows them, and are indented to the same level as that code.
    Use inline comments sparingly.
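
The three kinds of comment mentioned above could look like this. It is only a sketch, and the function mean is a hypothetical example.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    The docstring is stored on the function as mean.__doc__, which is
    what help() and documentation generators read.
    """
    # Block comment: it describes the code that follows and is indented
    # to the same level as that code.
    total = 0.0
    for value in values:
        total += value
    return total / len(values)  # inline comment, used sparingly
```

Calling help(mean) in the interpreter prints the docstring, which is why a docstring is preferred over an ordinary comment for describing what a function does.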

3. Naming Convention

    Modules should have short, all-lowercase names (underscores may be used if they improve readability). --- ex. calculate_integral.py
    Class names should normally use the CapWords convention. --- ex. class KeyDataStructure(object):
    Function names should be lowercase (underscores may be used if they improve readability). --- ex. def calc_two_integral():
    Methods follow the same naming convention as functions. For non-public methods and instance variables, add a leading "_" to the name.
    Constants defined at module level use all capitals with underscores. --- ex. MAX_ITER
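
Putting the example names from the list above into one hypothetical module gives roughly the following. The bodies, arguments, and the method names _total and mean_value are my own assumptions, added only so the sketch runs.

```python
# Hypothetical module file: calculate_integral.py (short, all-lowercase name)

MAX_ITER = 1000  # module-level constant: capitals with underscores


class KeyDataStructure(object):  # class name: CapWords
    def __init__(self, values):
        self._values = list(values)  # non-public instance variable

    def _total(self):  # non-public method: leading underscore
        return sum(self._values)

    def mean_value(self):  # public method: lowercase with underscores
        return self._total() / float(len(self._values))


def calc_two_integral(func, lower, upper, steps=MAX_ITER):
    """Approximate the integral of func on [lower, upper] (midpoint rule)."""
    width = (upper - lower) / float(steps)
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width
```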

4. Programming Recommendations

    Code should be written in a way that does not disadvantage other implementations of Python. --- ex. do not rely on a += b to build up strings; use ''.join() instead
    Comparisons to singletons like None should always be done with is or is not, never the equality operators.
    Always use a def statement instead of an assignment statement that binds a lambda expression directly to a name. 
    Use string methods instead of the string module.
    Use ''.startswith() and ''.endswith() instead of string slicing to check for prefixes or suffixes.
    Object type comparisons should always use isinstance() instead of comparing types directly.
    For sequences (strings, lists, tuples), use the fact that empty sequences are false.
    Don't compare boolean values to True or False using ==.
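
Several of these recommendations fit into a few short functions. This is a minimal sketch; the helper names and the "tmp_" prefix are hypothetical.

```python
def build_label(parts):
    # Join once instead of growing a string with += inside a loop.
    return "".join(parts)


def is_temp_name(name):
    # Compare to None with "is", never with "==".
    if name is None:
        return False
    # startswith() instead of slicing: name[:4] == "tmp_"
    return name.startswith("tmp_")


def count_strings(items):
    # An empty sequence is false, so there is no need for len(items) == 0.
    if not items:
        return 0
    # isinstance() instead of type(item) == str
    return sum(1 for item in items if isinstance(item, str))


def square(x):
    # A def statement instead of: square = lambda x: x * x
    return x * x
```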

There is really no need to rigidly memorize all of these. Just keep them in mind, and the code will be much better.
    

Wednesday, April 16, 2014

Python anti-pattern analysis

[Figure: Range for iteration (1)]

Python has many internal optimizations for speed and readability. I have been pointed to a "Python anti-pattern" website, which greatly helps me avoid using Python "inappropriately". Here I do a little analysis of why these patterns matter in terms of speed.

1. Use range for iteration

    It is usually unnecessary to use range(len(lista)) to iterate over a list just to index the list itself. Here is an example to show the difference.
[Figure: Range for iteration (2)]
    In the "Range for iteration" figure, changing only "range" to "enumerate" does not improve the speed at all (it is even slightly worse); however, replacing the index lookup ( a[i] ) with the item itself improves the speed a lot. This indicates that if both the index and the list item are needed, it is better to avoid index access. If only the value is needed, without the index, the following example (2) shows the best choice.
[Figure: Range for iteration (3)]
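
The comparison described above can be reproduced with something like the sketch below. It is not the original test code; the function names, the list size, and the repeat count are assumptions, and the timings will differ between machines and Python versions.

```python
import timeit

data = list(range(1000))  # hypothetical test list


def total_range_index(a):
    # range(len(a)) with an index lookup
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total


def total_enumerate_index(a):
    # enumerate, but the item is still looked up by index
    total = 0
    for i, _ in enumerate(a):
        total += a[i]
    return total


def total_enumerate_item(a):
    # enumerate, using the item it yields (index still available as i)
    total = 0
    for i, item in enumerate(a):
        total += item
    return total


def total_direct(a):
    # only the value is needed, so iterate over the list itself
    total = 0
    for item in a:
        total += item
    return total


if __name__ == "__main__":
    variants = (total_range_index, total_enumerate_index,
                total_enumerate_item, total_direct)
    for func in variants:
        seconds = timeit.timeit(lambda: func(data), number=1000)
        print("%-24s %.4f s" % (func.__name__, seconds))
```

All four functions return the same sum, so any difference in the printed times comes only from how the loop reaches each item.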

    The "anti-pattern" site also suggests using the "zip" function, rather than a range(len()) statement, when looping over two lists. The test does not show an increase in speed; however, this rule may be there to make the code more readable.
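
For two lists, the two styles look like this. The lists and function names are hypothetical, chosen only for illustration.

```python
names = ["alice", "bob", "carol"]  # hypothetical data
scores = [90, 85, 77]


def pair_by_index(xs, ys):
    # range(len()) style: both lists are indexed by hand
    pairs = []
    for i in range(len(xs)):
        pairs.append((xs[i], ys[i]))
    return pairs


def pair_by_zip(xs, ys):
    # zip style: the loop hands over one item from each list
    pairs = []
    for x, y in zip(xs, ys):
        pairs.append((x, y))
    return pairs
```

One difference to keep in mind: zip stops at the end of the shorter list, while the index version raises an IndexError if the second list is shorter than the first.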

2. Sentinel value in python

Coming from a "C" background, I usually prefer an explicit check in every conditional statement: to check whether a list has any elements, use len(list); to check whether a variable is None, use "if a is None". However, this can be unnecessary in Python, because many values (None, 0, empty sequences) already count as false in a condition. The following example shows how relying on these values improves efficiency.
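
A minimal sketch of the two styles, using a hypothetical helper that returns the first item of a list or a default value:

```python
def first_or_default_explicit(items, default):
    # "C style": every condition is spelled out
    if items is not None and len(items) > 0:
        return items[0]
    return default


def first_or_default_truthy(items, default):
    # None and empty sequences are both false, so one test covers both
    if items:
        return items[0]
    return default
```

The short form is not always a drop-in replacement: "if a:" also treats 0 and "" as false, so "if a is None" is still the right test when those values must be told apart from None.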


3. List comprehension
   
    Python has many rich built-in constructs that have been optimized over and over; it is better to save some looping time and use them directly. List comprehension is one of the most commonly used. The example below shows the speed gain.
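
A comparison of this kind could be set up as follows. This is a sketch rather than the original test; the function names and sizes are assumptions, and the measured gain depends on the machine and Python version.

```python
import timeit


def squares_loop(n):
    # explicit loop with append
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    # the same list built with a list comprehension
    return [i * i for i in range(n)]


if __name__ == "__main__":
    for func in (squares_loop, squares_comprehension):
        seconds = timeit.timeit(lambda: func(1000), number=1000)
        print("%-24s %.4f s" % (func.__name__, seconds))
```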

    In general, there are so many advanced skills (or "tricks") to use in Python. This is a powerful language, and I am excited to explore more along the way.