Python
People/Blogs
Documentation / Learning / Articles
-
The Hitchhiker’s Guide to Python! - "This opinionated guide exists to provide both novice and expert Python developers a best-practice handbook to the installation, configuration, and usage of Python on a daily basis."
Python Tutor - Visualize Python, Java, JavaScript, TypeScript, and Ruby code execution
-
-
-
Porting to Python 3 Redux - Armin Ronacher describes what he did to make the Jinja2 library compatible with Python 2.6/2.7 and Python 3.3
Descriptors - descriptors are sometimes not explained well, but basically they're reusable properties. Custom descriptors (plus metaclasses) are how the syntax in Python ORMs or libraries like Schematics are implemented.
-
-
When looking at the Schematics source, remember that the type objects themselves are not descriptors. The metaclass creates the actual descriptors (that's how the attribute name is set in the descriptor). Also, it uses a trick to keep the ordering of the fields even in Python 2 (a global infinite iterator).
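As a minimal sketch of the "reusable property" idea (not taken from Schematics or any ORM; the names `Typed` and `Person` are made up for illustration), a descriptor can validate every assignment to an attribute. On Python 3.6+ the `__set_name__` hook tells the descriptor its attribute name, which is the job a metaclass has to do in older code:

```python
class Typed:
    """Descriptor that only accepts values of one type (hypothetical example)."""

    def __init__(self, expected_type):
        self.expected_type = expected_type
        self.name = None

    def __set_name__(self, owner, name):
        # Called once when the owning class is created (Python 3.6+).
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class itself
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                "%s must be %s" % (self.name, self.expected_type.__name__)
            )
        instance.__dict__[self.name] = value


class Person:
    # The same descriptor class is reused for two attributes.
    name = Typed(str)
    age = Typed(int)

    def __init__(self, name, age):
        self.name = name
        self.age = age
```

Assigning `Person("Ann", 30).age = "old"` raises `TypeError`, and the check is written once rather than once per property.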
Decorators - can be tricky to make correct
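One common way to get a decorator wrong is losing the wrapped function's name and docstring. A small sketch using `functools.wraps` from the standard library (`logged` and `add` are made-up names):

```python
import functools


def logged(func):
    """Print each call; functools.wraps keeps func's name and docstring."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("calling %s" % func.__name__)
        return func(*args, **kwargs)
    return wrapper


@logged
def add(a, b):
    """Return a + b."""
    return a + b
```

Without `functools.wraps`, `add.__name__` would be `"wrapper"` and the docstring would be lost, which confuses debuggers and documentation tools.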
Python Style
I generally try to follow the Python style guide, PEP 8, though there may be a few exceptions.
NOTE: take a look at YAPF for automatically reformatting code.
I think when using PyQt, it’s okay to use some of its style (in particular, camelCase method names). In general, this falls under the category of "being consistent with the prevailing style". But these days I tend to follow PEP 8 in that situation too.
For alignment, I generally prefer the second of these options:
# Aligned with opening delimiter
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# More indentation included to distinguish this from the rest.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)
That is, the one that doesn’t require fixing when a function or variable is renamed. In PyDev/Eclipse, it helps to uncheck the option "After '(' indent to its level" under PyDev→Editor→Typing.
There’s also a standalone program (pep8). PyDev in Eclipse has a checker built-in (Preferences → PyDev → Editor → Code Analysis, under the pep8.py tab). It's also built in to PyCharm (or IntelliJ IDEA with the Python plugin).
"The Hitchhiker's Guide to Python" has some more info on Python style, including "idiomatic" ways of doing things: http://docs.python-guide.org/en/latest/writing/style.html.
Repository Structure / Organization
Regarding a top-level src directory, common Python practice seems to have been to not have one. Hynek Schlawack argues for it ("Putting Python packages into the root project directory masks packaging errors"); see the "Testing and Packaging" article.
Repository structure:
Importing into __init__.py:
-
Includes a comment from zzzeek (author of SQLAlchemy), who suggests the best thing is to selectively import into __init__.py, especially to avoid API changes (e.g. if you split a module into a package)
Packaging / Deployment / Distribution
Package Resources
pkg_resources (from setuptools) can be used to access files in a package even if the package is in a zip file.
Example:
# File-like object
from pkg_resources import resource_stream
f = resource_stream('my.package', 'filename.bin')
# Contents as a byte string
from pkg_resources import resource_string
text = resource_string('my.package', 'filename.txt')
# Filename (may extract to a temporary directory)
from pkg_resources import resource_filename
filename = resource_filename('my.package', 'filename.bin')
Other functions are available. See https://pythonhosted.org/setuptools/pkg_resources.html for full documentation.
Possible alternative in some situations: pkgutil.get_data (standard library)
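A sketch of the pkgutil.get_data alternative. To keep it self-contained it builds a throwaway package first; `demo_pkg` and its file are made-up names, and in real code the package would already be importable:

```python
import os
import pkgutil
import sys
import tempfile

# Build a throwaway package so the example can run on its own.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "filename.txt"), "wb") as f:
    f.write(b"hello")

sys.path.insert(0, root)

# Returns the resource's contents as bytes (or None if the package's
# loader does not support get_data).
data = pkgutil.get_data("demo_pkg", "filename.txt")
```

Unlike resource_filename, this only gives you the contents, so it does not help when some other API insists on a real path.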
-
"With CPython 2.7, using dict() to create dictionaries takes up to 6 times longer and involves more memory allocation operations than the literal syntax. Use {} to create dictionaries, especially if you are pre-populating them, unless the literal syntax does not work for your case."
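The quoted numbers are for CPython 2.7; a quick way to check the difference on whatever interpreter is at hand is `timeit`. This is only a rough sketch, and the results will vary by version and machine:

```python
import timeit

# Time building the same small dictionary both ways.
literal_time = timeit.timeit("{'a': 1, 'b': 2, 'c': 3}", number=100000)
call_time = timeit.timeit("dict(a=1, b=2, c=3)", number=100000)

print("literal: %.4fs  dict(): %.4fs" % (literal_time, call_time))
```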
Subprocesses
Optional Static Typing
Profiling
(Aside from standard library profiling support)
Troubleshooting Memory Usage / "Memory Leaks"
First, if using Django, disable DEBUG (it stores all queries ever executed); Django 1.8 is apparently going to put an upper limit on this.
Some tools that can be used:
Pyrasite - useful for inspecting a running application
Shell, memory viewer, GUI
-
gc.get_referents()
-
heapy, pympler, meliae
tracemalloc (standard library since Python 3.4)
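Of the tools above, tracemalloc needs nothing installed. A minimal sketch of finding where memory is allocated; the list comprehension is just a stand-in for the code under suspicion:

```python
import tracemalloc

tracemalloc.start()

# Allocate something noticeable (stand-in for the real workload).
data = [str(i) * 10 for i in range(10000)]

snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Group allocations by source line and show the biggest ones.
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:3]:
    print(stat)
```

For a suspected leak, taking two snapshots some time apart and calling `snapshot2.compare_to(snapshot1, "lineno")` shows which lines grew in between.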
Python Shells
IPython seems to be the most popular
-
-
-
DreamPie - GTK Python shell with some nice-looking features
Some of these no longer work with Python 2.4; some have older versions that do.
Interesting Packages
Lists of Packages
General / Uncategorized
Date/time:
Arrow "offers a sensible, human-friendly approach to creating, manipulating, formatting and converting dates, times, and timestamps. It implements and updates the datetime type, plugging gaps in functionality, and provides an intelligent module API that supports many common creation scenarios. Simply put, it helps you work with dates and times with fewer imports and a lot less code."
-
-
Bunch - "dictionary that supports attribute-style access"; bunchify/unbunchify to recursively convert existing dicts
-
-
-
Can be used to get a shell in a running process
Can be used to show memory usage by object type (number of objects, size)
-
-
RPC
-
-
spyne - RPC with lots of different protocols and transports supported
Seems to be more useful for creating public interfaces; appears to have some client-side but it's not documented as far as I can see
ZeroMQ transport uses simple REQ/REP (this is not recommended in the ZeroMQ guide because it isn't reliable; "the use cases for strict request-reply or extended request-reply are somewhat limited")
-
ZeroRPC seems interesting (created and used by dotCloud), but currently is not well-documented
-
py - "library with cross-python path, ini-parsing, io, code, log facilities"
pycdb "is a debugger completely written in Python using pyelftools library" ("pycdb stands for Python Core Debugger or Python C Debugger")
first - tiny function that returns the first true value from an iterable
psutil - process / system information
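For reference, what the "first" package listed above does can be approximated with the standard library alone. This is a sketch of the idea, not the package's code, and `first_true` is a name made up here:

```python
def first_true(iterable, default=None):
    """Return the first truthy item, or default if there is none."""
    return next((item for item in iterable if item), default)
```

The generator expression is lazy, so iteration stops at the first match.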
Python 3 Compatibility / Backports
Python 2-only Packages
These are packages that I use or might like to use which, last time I checked, did not yet support Python 3.
Python 3 Compatibility Libraries
future - write code that looks more like Python 3, compatible with Python 2 and 3
-
Python 3 Backports
configparser - better, updated version of built-in ConfigParser
pathlib - object-oriented filesystem paths
-
-
-
-
faulthandler - dump Python traceback on fault (SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL); added in Python 3.3
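A minimal faulthandler sketch. It writes to a temporary file only so the output can be read back; sys.stderr is the usual target:

```python
import faulthandler
import tempfile

# faulthandler writes straight to a file descriptor, so the target must be
# a real file (a temporary file here; sys.stderr is the usual choice).
log = tempfile.TemporaryFile()

# From now on, a fatal signal (SIGSEGV etc.) dumps tracebacks to `log`.
faulthandler.enable(file=log)

# A traceback can also be dumped on demand, e.g. to inspect a hung process.
faulthandler.dump_traceback(file=log)

log.seek(0)
report = log.read().decode("utf-8", "replace")
```

The file has to stay open for as long as faulthandler is enabled. The same thing can be switched on without code changes via `python -X faulthandler` or the `PYTHONFAULTHANDLER` environment variable.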
Debugging
Mostly I use PyCharm for debugging, and I use logging a lot.
Packaging
pip-tools - helps update packages, requirements.txt
pipenv - management of Pipfile (future replacement for requirements.txt) and project-local virtualenv
pipsi - installs scripts into separate virtualenv
Currently not maintained; setup script only uses system python
Cookie Cutter - "A command-line utility that creates projects from cookiecutters (project templates), e.g. creating a Python package project from a Python package project template"
Process Management / Process Supervisors
-
Circus (a Mozilla project) "is a program that runs and watches processes and sockets"
Supports Python 3
Has a CLI and optional web interface, but also everything can be controlled through the Python API
IPC sockets can be created as Unix sockets instead of TCP (so you can control which users have access)
Looks like it should be able to reload its configuration without restarting managed processes (unconfirmed)
I've been having trouble figuring out how to use it as a library
Uses plugins for things like flapping detection, HTTP checks, etc.
Crossbar.io (see WAMP) has some process-management features, but I've had some trouble with them
-
Honcho - Procfile runner (basically Foreman ported to Python)
-
Sounds like it is intended to be integrated into an application that has multiple components running as separate processes.
Adds a command-line interface to an application (start, stop, etc.) as well as a simple optional web interface
-
Some issues:
Doesn't support Python 3
Unusual code style (lowercase class names, tabs, long lines, some difficult names e.g. "cnscom")
Use of pyev (GPL license in version >= 0.9)
Exit code of "status" is 0 when a process is in the FATAL state, which is inconsistent with init scripts
Some non-Python process supervisors:
Scripting/Automation/Shell Script Replacement
Higher-level subprocess libraries:
fabric - for scripting local or remote commands
Fabric3 - Python 3-compatible fork (not from original authors)
invoke - task execution tool & library
Plumbum: "Shell Combinators and More" - uses magic / operator overloading to mimic shell syntax
-
Envoy: "Python Subprocesses for Humans"
sh (pypi) - allows you to call any program (using subprocess) as if it were a function
Other utilities:
Click (by Armin Ronacher) - tool for writing consistent and composable command-line interfaces
Parses options and positional arguments
Utilities for user prompts, colored output, etc.
Okaara - "series of utilities for writing command line interfaces in Python"
Reading input (Y/N, enumerated options, etc.)
Output (coloring, centered text, wrapping)
Progress bars / spinners
cliff - "cliff is a framework for building command line programs. It uses setuptools entry points to provide subcommands, output formatters, and other extensions."
PAWK - A Python line processor (like AWK)
doit - a Make-like task management & automation tool (actions, dependencies, targets)
Paver - another Make-like utility; can integrate with setuptools to replace setup.py
clint - command line colors, nested quoting/indentation, progress bars, and more
spur - Run commands and manipulate files locally or over SSH using the same interface
-
Older packages:
Filesystem
watchdog - monitor filesystem events (abstraction around Linux inotify and the equivalents on other operating systems)
Atomic Writes
After a quick search, I came up with several packages and some articles.
Articles:
Libraries:
"atomicwrites" appears to be most popular at the moment.
Atomic writing is also built in to the click command-line arguments library (for file-type arguments).
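The usual technique behind these libraries can be sketched with the standard library: write to a temporary file in the same directory, flush it to disk, then rename it over the target. This is an illustration under that assumption, not the code of atomicwrites or click, and `atomic_write` is a made-up name:

```python
import os
import tempfile


def atomic_write(path, data):
    """Write bytes so readers see the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # rename over the target (Python 3.3+)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The sketch leaves out details the libraries deal with, such as preserving file permissions and syncing the containing directory.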
GUI
I've only used PyQt.
I started to look at Tk (tkinter) for small utilities, but realized that it doesn't support drag and drop (in particular, dropping a file to get its path).
Qt4 "helpers"
-
guidata (CeCILL license; compatible with GPL, probably not usable from commercial products)
-
Database
-
-
-
dataset - "databases for lazy people"
-
-
Built-in database schema migration support as of Django 1.7 (excellent, fully-automatic for most changes, the best migration support available anywhere as far as I can tell)
ORM has some limitations if you need to do very complex things, or make use of preexisting (often not-well-designed) schemas (e.g. does not support composite primary keys), but is easier to use in return
-
-
Camelot "RAD framework" for desktop database applications, like Django admin for PyQt4 (uses SQLAlchemy); license is GPL/Commercial
-
-
Sandman/Sandman2 - "automagically generates a RESTful API service from your existing database, without requiring you to write a line of code" (uses SQLAlchemy); includes an admin interface (uses Flask-Admin)
In-memory Datastore
redis-py (just "redis" in pypi) - main client
hot-redis - more complex types, implemented as a wrapper around redis-py, using Lua scripts for atomic operations
Data
Serialization
-
-
"Unsafe" mode is similar to pickle; supports nested objects by using anchors / references
ruamel.yaml has a register_class method for explicitly allowing serialization of certain classes using tags; also supports nested objects like the unsafe serializer
serpent - Serialization based on ast.literal_eval
camel - explicit YAML serialization
-
-
Origami - "Origami is a lightweight package to help you serialize (or fold) objects into a binary format"
-
PyON (unmaintained, not on PyPI)
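The idea behind serialization based on ast.literal_eval can be shown with the standard library alone. This sketch illustrates the principle only and does not use serpent's API:

```python
import ast

record = {"name": "example", "tags": ["a", "b"], "size": (3, 4), "ok": True}

# Serialize with repr(); the result is plain Python literal syntax.
text = repr(record)

# literal_eval parses literals only (no function calls, no attribute
# access), so it is much safer than eval() on untrusted text.
restored = ast.literal_eval(text)
```

Only basic types survive this round trip (strings, bytes, numbers, tuples, lists, dicts, sets, booleans and None); anything else has to be converted to those first.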
In the talk Pickles are for Delis, not Software, Alex Gaynor suggests creating simple dump and load methods for your objects, making sure to include a "version" attribute for future changes (video starting at relevant position, corresponding slides).
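A sketch of what such versioned dump and load methods might look like. JSON is my choice of format here and the `Point` class is hypothetical; the talk does not prescribe either:

```python
import json


class Point:
    """Hypothetical object with explicit, versioned serialization."""

    VERSION = 2

    def __init__(self, x, y, label=""):
        self.x = x
        self.y = y
        self.label = label

    def dump(self):
        return json.dumps(
            {"version": self.VERSION, "x": self.x, "y": self.y, "label": self.label}
        )

    @classmethod
    def load(cls, text):
        data = json.loads(text)
        version = data.get("version", 1)
        if version == 1:
            # Version 1 records had no label; fall back to the default.
            return cls(data["x"], data["y"])
        if version == 2:
            return cls(data["x"], data["y"], data["label"])
        raise ValueError("unsupported version: %r" % version)
```

Unlike a pickle, old records stay readable after the class changes, because every format the data has ever had is handled explicitly in `load`.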
Dive Into Python 3 shows one way of serializing/deserializing classes to/from JSON.
See also libraries mentioned elsewhere on this page (e.g. marshmallow, Schematics, Colander).
XML
While Python has SAX and DOM packages, the standard in Python is "ElementTree", probably either the built-in implementation or lxml. Some examples:
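A small sketch with the built-in implementation (the document and element names are made up); lxml's etree module offers a largely compatible API:

```python
import xml.etree.ElementTree as ET

xml_text = """
<library>
  <book id="1"><title>First</title></book>
  <book id="2"><title>Second</title></book>
</library>
"""

root = ET.fromstring(xml_text)

# Iterate over child elements and read attributes and text.
titles = {book.get("id"): book.findtext("title") for book in root.findall("book")}

# Build a new element and serialize the tree back to a string.
new_book = ET.SubElement(root, "book", id="3")
ET.SubElement(new_book, "title").text = "Third"
serialized = ET.tostring(root, encoding="unicode")
```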
lxml provides a slightly more Pythonic interface called Objectify. Example:
lxml can also validate against a DTD or schema: http://lxml.de/validation.html
There are a couple of packages available to pre-generate an object model based on an XML schema (I believe this is similar to JAXB in Java):
-
-
Generated code is shorter, but depends on the PyXB module (which is large, ~6MB zipped; most of that is the pyxb.bundles package)
-
At a glance, it appears to have more powerful validation features than generateDS
Other links of possible interest:
Concurrency / Asynchronous
Multiprocessing
billiard - fork of multiprocessing that fixes/improves some things; I think this is used by the Celery package
-
-
Futures
The concurrent.futures module was added in Python 3.2; backport is available for Python 2.5+.
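A minimal sketch of the module with a thread pool; `square` is a stand-in for a slow, I/O-bound task:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(n):
    """Stand-in for a slow, I/O-bound task."""
    return n * n


with ThreadPoolExecutor(max_workers=4) as pool:
    # submit() returns a Future immediately; the work runs in the pool.
    futures = {pool.submit(square, n): n for n in range(5)}
    results = {}
    # as_completed() yields each future as soon as it finishes.
    for future in as_completed(futures):
        results[futures[future]] = future.result()
```

Swapping in `ProcessPoolExecutor` gives the same interface on top of processes, which suits CPU-bound work better.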
async_gui - new library (2013-04-06 being the first and only release so far) for concurrent GUI programming (uses 'yield' and futures to run tasks in the background while keeping the GUI responsive)
Works with PyQt4/PySide, Tk, Wx, Gtk
Alternates calling the event loop and the futures with a short timeout, rather than a trigger-based mechanism
Asynchronous I/O
-
Mike Bayer (zzzeek) explains that while SQLAlchemy will probably get asyncio compatibility at some point, performance will be worse than using threads. For most CRUD database code, using a thread pool is a good option.
Libraries
Generic event loops:
pyuv (libuv, which was written for Node.js)
pyev (libev) -
WARNING: license is now GPLv3
pyev 0.8.1 appears to be BSD-licensed (pip install pyev==0.8.1-4.04)
Higher-level:
PEP 3156 (asyncio module for Python 3.3+, code named tulip)
PEP 3156 proposed a standard library module for asynchronous I/O. Hopefully having an event loop in the standard library will help promote compatibility, allowing easier use of multiple libraries at once (e.g. PyQt4 + Twisted). A PyPI module should be available for Python 3.3; the package entered the standard library in 3.4: https://docs.python.org/3/library/asyncio.html.
Talk by Guido (published 2013-10-29): Tulip: Async I/O for Python 3
Guido's explanation of why yield from is used instead of yield (making it incompatible with Python < 3.3): The difference between yield and yield-from
Backport to Python 2.7: Trollius (replaces yield from ... and return ... with yield From(...) and raise Return(...)).
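The difference is easy to see with plain generators, no event loop needed. In this sketch (made-up function names), `yield from` delegates to the inner generator and receives its return value, while a bare `yield` just hands the generator object to the caller:

```python
def inner():
    yield 1
    yield 2
    return "done"  # becomes the value of the `yield from` expression


def delegating():
    result = yield from inner()  # re-yields 1 and 2, then captures "done"
    yield result


def not_delegating():
    yield inner()  # yields the generator object itself, not its items
```

Python 2 has neither `yield from` nor `return` with a value inside a generator, which is why Trollius needs the `From(...)` and `Return(...)` substitutes.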
Interoperability:
-
Quamash - PyQt4/PyQt5/PySide event loop implementation
-
Web
-
dukpy - pip-installable, no external dependency JavaScript interpreter mostly implemented in C (currently "alpha" status); comes with some transpilers
-
Depot - file storage for web apps, with multiple backends (local, GridFS, S3)
pyjade converts from Jade templates into Django, Jinja2, Mako or Tornado templates
Port of the Node.js template language Jade, which has a HAML-like syntax
-
HTTPie "is a CLI, cURL-like tool for humans"
-
Configuration
-
-
-
Multiple formats (e.g. YAML, INI, XML, Python lists and dicts)
Can read from multiple heterogeneous locations (e.g. defaults + user config file + command line options)
Seems to store configuration globally, but does support some kind of namespaces
dynaconf - standalone package with similar usage to Django settings
Loads settings from Python file, environment variables, Redis, etc.
I don't think I like the way it does type casting; it seems like some kind of schema would be preferable
-
Messaging / Queues
-
WAMP (Web Application Messaging Protocol) - standardized RPC and PUB/SUB over websockets
Kombu is a higher-level messaging library (natively RabbitMQ / AMQP, but supports other backends, such as Redis)
Celery distributed task queue (at a higher level than using a message queue directly)
Zato is a Python ESB/SOA server, with a web GUI, that speaks lots of protocols and data formats
-
TODO: separate page for messaging?
Interesting architecture example using RabbitMQ, among other things
Image Manipulation
Pillow replaces PIL (and is compatible with Python 3)
Logging
I generally just use the standard library logging module.
Articles:
Some logging-related packages:
structlog - add structured data on top of your existing logger
-
verboselogs - adds a few extra log levels; integrates with coloredlogs
Eliot - another structured logging package
Logbook - an alternative logging system (replaces standard library logging); I haven't tried it
Sentry
logutils - extra handlers, including a queue handler for dealing with slow handlers or for use with multiprocessing applications
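The queue-handler approach mentioned for logutils is also available in the standard library, which has had its own QueueHandler and QueueListener since Python 3.2. The sketch below uses those rather than logutils (whose API may differ); `ListHandler` is a made-up stand-in for a slow handler:

```python
import logging
import logging.handlers
import queue

log_queue = queue.Queue()
records = []


class ListHandler(logging.Handler):
    """Stand-in for a slow handler (e.g. network or e-mail)."""

    def emit(self, record):
        records.append(record.getMessage())


# The logger only enqueues records, so logging calls return quickly.
logger = logging.getLogger("queue_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# The listener drains the queue on a background thread.
listener = logging.handlers.QueueListener(log_queue, ListHandler())
listener.start()

logger.info("hello %s", "world")

listener.stop()  # processes what is still queued, then joins the thread
```

For multiprocessing applications the same pattern works with a `multiprocessing.Queue`, with the listener running in one process.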
Tools that are not Python-specific:
Unit Testing
pytest - better test framework; you can also use this as a test runner for standard unit tests
-
Text
Singularization/pluralization:
Text / Name Generation
https://github.com/ben174/rikeripsum - "Lorem Ipsum: The Next Generation" - "Generates text - like lorem ipsum - but uses real English. Taken from random samplings of dialog spoken by Commander William Riker in Star Trek: The Next Generation."
name-of-thrones - "Command line tool to generate words that sound like characters from Game of Thrones"
Absolute Timers
Cryptography
Troubleshooting
virtualenv on 64-bit Windows
The batch files used by virtualenv don't like "Program Files (x86)" in the PATH. Instead of trying to fix the batch scripts (I don't like batch scripts… we should just use MSYS bash instead), I changed my PATH to use the "short" directory name, i.e. "PROGRA~2" instead of "Program Files (x86)". My particular error was "\PC was unexpected at this time", since "C:\Program Files (x86)\PC Connectivity Solution\" was in my PATH.
For errors like this:
error: invalid command 'egg_info'
or other setuptools-related errors (e.g. an error on python setup.py develop), try updating setuptools. Also try removing (pip uninstall, or manually) any remnants of old versions of distribute or setuptools, both in the virtualenv and in the relevant system Python installation.
Building Python From Source
When building Python from source, remember that it requires development packages ("-dev" packages in Debian, "-devel" in RHEL) for several libraries. In particular, the readline library enables line editing in the interactive Python interpreter (e.g. up/down/left/right and home/end keys).
RHEL5:
Shared Library
There may be some reason to compile Python with the --enable-shared option.
If using a nonstandard prefix (e.g. /opt/python27), then it won't be able to find the shared library without setting LD_LIBRARY_PATH.
I found an article describing how to add rpath to the binaries:
# Apparently it will give an unhelpful error if the directory doesn't exist
PREFIX=/opt/python27
sudo mkdir -p $PREFIX/lib
./configure --enable-shared --prefix=$PREFIX LDFLAGS="-Wl,-rpath $PREFIX/lib"
Reference: http://koansys.com/tech/building-python-with-enable-shared-in-non-standard-location
Another option is to just link the shared libraries to /usr/lib (as long as there isn't another interpreter of the same version on the system). This seems to work well for e.g. adding Python 2.7 / 3.x to RHEL5.