
Python

See also:

People/Blogs

Documentation / Learning / Articles

Python Style

I generally try to follow the Python style guide, PEP 8, though there may be a few exceptions.

NOTE: take a look at YAPF for automatically reformatting code.

I think when using PyQt, it’s okay to use some of its style (in particular, camelCase method names). In general, this falls under the category of "being consistent with the prevailing style". But these days I tend to follow PEP 8 in that situation too.

For alignment, I generally prefer the second of these options:

# Aligned with opening delimiter
foo = long_function_name(var_one, var_two,
                         var_three, var_four)
 
# More indentation included to distinguish this from the rest.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

That is, the one that doesn’t require fixing when a function or variable is renamed. In PyDev/Eclipse, it helps to uncheck the option "After '(' indent to its level" under PyDev→Editor→Typing.

There’s also a standalone PEP 8 checker (pep8, since renamed pycodestyle). PyDev in Eclipse has a checker built in (Preferences → PyDev → Editor → Code Analysis, under the pep8.py tab). It's also built in to PyCharm (or IntelliJ IDEA with the Python plugin).

"The Hitchhiker's Guide to Python" has some more info on Python style, including "idiomatic" ways of doing things: http://docs.python-guide.org/en/latest/writing/style.html.

Repository Structure / Organization

Regarding a top-level src directory, common Python practice seems to have been to not have one. Hynek Schlawack argues for it ("Putting Python packages into the root project directory masks packaging errors"); see the "Testing and Packaging" article.

Repository structure:

Importing into __init__.py:

Packaging / Deployment / Distribution

  • PEX / Pants (by Twitter)
    • Now a distinct project (used to be part of Twitter Commons)
    • PEX builds something like a virtualenv in a single file (runnable zip file with some bootstrapping hooks)

Package Resources

pkg_resources (from setuptools) can be used to access files in a package even if the package is in a zip file.

Example:

# File-like object
from pkg_resources import resource_stream
f = resource_stream('my.package', 'filename.bin')
 
# Contents as a string (a bytes object on Python 3; decode it if you need text)
from pkg_resources import resource_string
text = resource_string('my.package', 'filename.txt')
 
# Filename (may extract to a temporary directory)
from pkg_resources import resource_filename
filename = resource_filename('my.package', 'filename.bin')

Other functions are available. See https://pythonhosted.org/setuptools/pkg_resources.html for full documentation.

Possible alternative in some situations: pkgutil.get_data (standard library)
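A minimal sketch of the pkgutil.get_data alternative, using only the standard library. The package name demo_resource_pkg and the file greeting.txt are made up for illustration; the example builds a throwaway package in a temporary directory so that it can run on its own.

```python
import importlib
import os
import pkgutil
import sys
import tempfile

# Build a throwaway package (hypothetical name) so the example is self-contained
tmp_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp_dir, "demo_resource_pkg")
os.mkdir(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "greeting.txt"), "wb") as f:
    f.write(b"hello from the package\n")

# Make the new package importable
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

# get_data returns the resource contents as bytes
# (or None if the package's loader cannot provide data)
data = pkgutil.get_data("demo_resource_pkg", "greeting.txt")
print(data.decode("utf-8"))
```

Unlike resource_filename, this only returns the contents; it never gives you a path on disk.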

Performance

    • "With CPython 2.7, using dict() to create dictionaries takes up to 6 times longer and involves more memory allocation operations than the literal syntax. Use {} to create dictionaries, especially if you are pre-populating them, unless the literal syntax does not work for your case."
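The claim above is easy to check on whatever interpreter you are using. This is a rough sketch with timeit; the function name and the three-key dictionary are arbitrary choices, and the ratio will differ between Python versions and machines.

```python
import timeit

def compare_dict_creation(number=200000):
    """Time the literal syntax against the dict() call; returns seconds for each."""
    literal_seconds = timeit.timeit("{'a': 1, 'b': 2, 'c': 3}", number=number)
    call_seconds = timeit.timeit("dict(a=1, b=2, c=3)", number=number)
    return literal_seconds, call_seconds

literal_seconds, call_seconds = compare_dict_creation()
print("literal: %.4fs  dict(): %.4fs" % (literal_seconds, call_seconds))
```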

Subprocesses

Optional Static Typing

Performance / Memory

Profiling

(Aside from standard library profiling support)

  • PyCharm has a built-in profiler interface (only in Professional Edition)
  • https://github.com/what-studio/profiling - interactive (console) profiler
  • Upcoming standalone profiler UI from PyDev author Fabio Zadrozny (non-free)

Troubleshooting Memory Usage / "Memory Leaks"

First, if using Django, disable DEBUG (it stores every query ever executed); Django 1.8 is apparently going to put an upper limit on this.

Some tools that can be used:

  • Pyrasite - useful for inspecting a running application
    • Shell, memory viewer, GUI
  • gc.get_referents()
  • heapy, pympler, meliae
  • tracemalloc (Python 3.4+)
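As a starting point for tracemalloc, here is a small sketch that compares two snapshots to see which source lines allocated the most memory in between. The allocate_rows function is a stand-in for whatever code is suspected of leaking.

```python
import tracemalloc

def allocate_rows(count=10000):
    """Stand-in for the code suspected of using too much memory."""
    return [str(i) * 10 for i in range(count)]

def largest_allocations(func, limit=3):
    """Run func and return (result, top allocation differences grouped by line)."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    result = func()  # keep a reference so the memory is still allocated
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return result, after.compare_to(before, "lineno")[:limit]

rows, top_stats = largest_allocations(allocate_rows)
for stat in top_stats:
    print(stat)
```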

Python Shells

  • IPython seems to be the most popular
    • The source package is fairly large (0.13.1 is 6.3MB)
    • Lots of nice shortcuts, and tons of features
      • Put "?" after something (e.g. a method) to get the help
      • %load / %run to load a Python file as if you typed it in
      • ipython -i FILE.py to run a Python file and then go to a REPL prompt
    • Can be combined with ipython
    • Curses-based
    • Syntax highlighting
    • Nice automatic popup of completion options
  • DreamPie - GTK Python shell with some nice-looking features

Some of these no longer work with Python 2.4; some have older versions that do.

Interesting Packages

Of course, there are a lot of useful modules already in the Python Standard Library.

Lists of Packages

General / Uncategorized

  • Date/time:
    • Arrow "offers a sensible, human-friendly approach to creating, manipulating, formatting and converting dates, times, and timestamps. It implements and updates the datetime type, plugging gaps in functionality, and provides an intelligent module API that supports many common creation scenarios. Simply put, it helps you work with dates and times with fewer imports and a lot less code."
    • Maya (Kenneth Reitz)
      • "Arrow doesn't do all of the things I need (but it does a lot more!). Maya does do exactly what I need. I think these projects complement each-other, personally."
  • Internationalization: http://babel.pocoo.org/
    • So far I've used the built-in gettext modules for translation, but I haven't had to do anything serious

Python 3 Compatibility / Backports

Python 2-only Packages

These are packages that I use or might like to use which, last time I checked, did not yet support Python 3.

Python 3 Compatibility Libraries

  • future - write code that looks more like Python 3, compatible with Python 2 and 3

Python 3 Backports

Debugging

Mostly I use PyCharm for debugging, and I use logging a lot.

  • PuDB - "PuDB is a full-screen, console-based visual debugger for Python"
  • debug - "Start fancy debugger in a single statement"; just type import debug (works multiple times) and it will drop into an ipdb session at that point
  • pyrasite - can come in handy if you need to debug a running process
    • pyrasite-memory-viewer can help diagnose memory "leaks"
  • PyCharm and PyDev can both attach a debugger to a running process

Packaging

  • pip-tools - helps keep packages and requirements.txt up to date
  • pipenv - management of Pipfile (future replacement for requirements.txt) and project-local virtualenv
  • pipsi - installs scripts into separate virtualenv
    • Currently not maintained; setup script only uses system python
  • Cookie Cutter - "A command-line utility that creates projects from cookiecutters (project templates), e.g. creating a Python package project from a Python package project template"

Process Management / Process Supervisors

    • Does not run on Python 3 (see notes at https://github.com/Supervisor/supervisor; version 4.0 is expected to work)
    • Can be controlled via XML-RPC
    • It seems that you can update the configuration (e.g. add/remove processes) without restarting all processes using the reread command followed by update, but the documentation is unclear
      • There is an extension, supervisor_twiddler, that can make some non-persistent configuration changes (adding/removing processes)
  • Circus (a Mozilla project) "is a program that runs and watches processes and sockets"
    • Supports Python 3
    • Has a CLI and optional web interface, but also everything can be controlled through the Python API
    • IPC sockets can be created as Unix sockets instead of TCP (so you can control which users have access)
    • Looks like it should be able to reload its configuration without restarting managed processes (unconfirmed)
    • I've been having trouble figuring out how to use it as a library
    • Uses plugins for things like flapping detection, HTTP checks, etc.
  • Crossbar.io (see WAMP) has some process-management features, but I've had some trouble with them
    • Supports Procfiles
  • Honcho - Procfile runner (basically Foreman ported to Python)
    • Sounds like it is intended to be integrated into an application that has multiple components running as separate processes.
    • Adds a command-line interface to an application (start, stop, etc.) as well as a simple optional web interface
    • Can't reload its configuration without restarting managed processes (see e.g. https://github.com/ateska/ramona/issues/35)
      • Since you would use a separate Ramona instance for each application, I don't see this as a big problem
    • Some issues:
      • Doesn't support Python 3
      • Unusual code style (lowercase class names, tabs, long lines, some difficult names e.g. "cnscom")
      • Use of pyev (GPL license in version >= 0.9)
      • Exit code of "status" is 0 when a process is in the FATAL state, which is inconsistent with init scripts

Some non-Python process supervisors:

  • systemd (system level; requires that you use an OS that uses it)
  • daemontools "family" (daemontools, runit, s6, perp, nosh)
  • Monit (doesn't quite fit; it monitors applications, but doesn't handle the daemonizing)

Scripting/Automation/Shell Script Replacement

Higher-level subprocess libraries:

  • fabric - for scripting local or remote commands
    • Fabric3 - Python 3-compatible fork (not from original authors)
  • invoke - task execution tool & library
    • Intended as the basis for Fabric 2.0
  • Plumbum: "Shell Combinators and More" - uses magic / operator overloading to mimic shell syntax
  • Envoy: "Python Subprocesses for Humans"
  • sh (pypi) - allows you to call any program (using subprocess) as if it were a function

Other utilities:

  • Click (by Armin Ronacher) - tool for writing consistent and composable command-line interfaces
    • Parses options and positional arguments
    • Utilities for user prompts, colored output, etc.
  • Okaara - "series of utilities for writing command line interfaces in Python"
    • Reading input (Y/N, enumerated options, etc.)
    • Output (coloring, centered text, wrapping)
    • Progress bars / spinners
  • cliff - "cliff is a framework for building command line programs. It uses setuptools entry points to provide subcommands, output formatters, and other extensions."
  • PAWK - A Python line processor (like AWK)
  • doit - a Make-like task management & automation tool (actions, dependencies, targets)
  • Paver - another Make-like utility; can integrate with setuptools to replace setup.py
  • clint - command line colors, nested quoting/indentation, progress bars, and more
  • spur - Run commands and manipulate files locally or over SSH using the same interface
  • See also Remote control / configuration management (Salt, Ansible, etc.)
  • Older packages:

Filesystem

  • watchdog - monitor filesystem events (abstraction around Linux inotify and the equivalents on other operating systems)

Atomic Writes

After a quick search, I came up with several packages and some articles.

Articles:

Libraries:

"atomicwrites" appears to be most popular at the moment.

Atomic writing is also built in to the click command-line arguments library (for file-type arguments).
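For reference, the usual technique behind atomic writes can be sketched with the standard library alone: write to a temporary file in the same directory, flush it to disk, then rename it over the target. This is not the atomicwrites API, just an illustration of the idea; atomic_write_text is a made-up helper name, and the sketch ignores details such as file permissions and syncing the directory.

```python
import os
import tempfile

def atomic_write_text(path, text):
    """Replace path with text so readers see the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem for the rename to be atomic
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)  # atomic rename (Python 3.3+)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise

demo_dir = tempfile.mkdtemp()
target = os.path.join(demo_dir, "settings.txt")
atomic_write_text(target, "first version\n")
atomic_write_text(target, "second version\n")
with open(target, encoding="utf-8") as f:
    print(f.read())
```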

GUI

I've only used PyQt.

I started to look at Tk (tkinter) for small utilities, but realized that it doesn't support drag and drop (in particular, dropping a file to get its path).

Qt4 "helpers"

Database

    • Built-in database schema migration support as of Django 1.7 (excellent, fully-automatic for most changes, the best migration support available anywhere as far as I can tell)
    • ORM has some limitations if you need to do very complex things, or make use of preexisting (often not-well-designed) schemas (e.g. does not support composite primary keys), but is easier to use in return
    • Basic schema migration support (no introspection or automatic versioning)
    • Does not support Oracle
  • Camelot "RAD framework" for desktop database applications, like Django admin for PyQt4 (uses SQLAlchemy); license is GPL/Commercial
  • QtAlchemy (https://bitbucket.org/jbmohler/qtalchemy) - SQLAlchemy / PyQt4; license is LGPL
  • https://bitbucket.org/fpliger/alchemyui - early GUI generator using SQLAlchemy, traits, and wxPython
  • Sandman/Sandman2 - "automagically generates a RESTful API service from your existing database, without requiring you to write a line of code" (uses SQLAlchemy); includes an admin interface (uses Flask-Admin)

In-memory Datastore

  • redis-py (just "redis" in pypi) - main client
  • hot-redis - more complex types, implemented as a wrapper around redis-py, using Lua scripts for atomic operations

Data

See also data modeling.

  • attrs "is an MIT-licensed Python package with class decorators that ease the chores of implementing the most common attribute-related object protocols" (replaces an older project characteristic)
  • https://github.com/samuelcolvin/pydantic - (Python 3.6+) - validation for type-annotated classes, "settings" classes
  • meza - A Python toolkit for processing tabular data
    • Lazily streams files by default
    • Has a lot of hard required packages
    • Not incremental / lazy - always reads/writes the entire file
    • (2016) Updated sporadically; had a few updates recently. See history on PyPI
  • openpyxl (for reading and writing Excel XLSX files)
    • Has optional "optimized" reader and writer for dealing with large files (with some limitations)
  • XlsxWriter - write-only library for creating Excel XLSX files
    • Good documentation
    • Lots of features (formatting, charts, data validation, etc.)
    • Optional optimized writing for large files ("constant_memory" mode); this mode appears to still support formatting, but there are still some limitations / unsupported features when it is enabled
  • xlrd, xlwt (python-excel)
    • xlrd now also reads XLSX files
  • Schematics - data models that look kind of like ORMs, but without the database (model and validate data, e.g. JSON)
    • Built-in method for generating mock objects with random data
    • Documentation is still very incomplete
  • marshmallow - "simplified object serialization for REST APIs"
    • Operates on arbitrary Python objects
    • You can declare fields for validation in ORM-style, or if the fields are simple Python types, you can just list the names
    • Seems to be lacking in ways to query the schema (model) object
  • Colander - "Colander is useful as a system for validating and deserializing data obtained via XML, JSON, an HTML form post or any other equally simple data serialization."
  • Traits (pypi) - "explicitly typed attributes for Python"
    • Now supports Python 3 as of 4.5.0 (2014-05-07), and traitsui as of 5.0.0
    • Requires C extension (C-API, in traits/ctraits.c), so won't work on PyPy or Jython (and isn't easily installable on Windows)
      • But maybe it will be compatible with PyPy in the future if C-API compatibility increases or the C extension is modified
      • Traits 4.5.0 fails to build on PyPy 2.4 (both pypy and pypy3)
    • TraitsUI generates user interfaces for Traits classes (wxPython / PyQt4); the PyQt4 part, at least, was buggy and missing features when I tried it
    • Enaml declarative user interface library, much more flexible than TraitsUI (see http://stackoverflow.com/a/14070671/187377)
      • No longer part of Enthought, and no longer natively supports Traits; there is a separate traits-enaml package (I don't know how well it works), and they've created their own Traits-like package called Atom; see explanation (essentially, Traits used too much memory and was too slow for a particular application)
    • Built mainly for scientific computing; doesn't integrate well with ORMs
    • IPython includes a basic pure-Python traitlets module, but note that the API has some significant differences from Enthought Traits
  • Atom/Enaml (Enaml used to be based on Traits)
  • dip - "application development and integration framework" by PyQt author
    • Supports Python 2 and 3
    • Models kind of like Traits, declarative UI
    • GPL / Commercial license makes me a bit reluctant to use it
    • Dependent on PyQt (author wants to remove this dependency, eventually)
    • The author states "dip should not be considered for production applications until v1.0 is released", but that doesn't seem likely to happen any time soon (development has been very slow)
  • binio - convenience layer on top of standard python module "struct"
    • This looks similar to using the ctypes module, or maybe CFFI, both of which can parse C structure definitions (see ctypesgen for ctypes)
  • NumPy
  • https://pypi.python.org/pypi/bpmappers - "A mapping tool from model to dictionary"

Serialization

  • pickle (standard library)
  • PyYAML, ruamel.yaml
    • "Unsafe" mode is similar to pickle; supports nested objects by using anchors / references
    • ruamel.yaml has a register_class method for explicitly allowing serialization of certain classes using tags; also supports nested objects like the unsafe serializer
  • serpent - Serialization based on ast.literal_eval
  • camel - explicit YAML serialization
    • Supports versioning (having deserializers for old versions)
  • https://pypi.python.org/pypi/odin - "Object Data Mapping for Python"
    • Offers serialization/deserialization for JSON and other formats
  • Origami - "Origami is a lightweight package to help you serialize (or fold) objects into a binary format"
  • cerealizer - last updated 2012
    • Very similar to Pickle, but intended to be safer
    • Supposed to be fast
  • PyON (unmaintained, not on PyPI)

In the talk Pickles are for Delis, not Software, Alex Gaynor suggests creating simple dump and load methods for your objects, making sure to include a "version" attribute for future changes (video starting at relevant position, corresponding slides).
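A minimal sketch of that dump/load idea, using JSON. The Bookmark class, its fields, and the "version 1 had no title" history are all invented for illustration; the point is only that the version number travels with the data so old files can still be read.

```python
import json

class Bookmark:
    """Hypothetical example class; the field names are illustrative only."""

    FORMAT_VERSION = 2

    def __init__(self, url, title=""):
        self.url = url
        self.title = title

    def dump(self):
        """Serialize to a JSON string that records the format version."""
        return json.dumps(
            {"version": self.FORMAT_VERSION, "url": self.url, "title": self.title}
        )

    @classmethod
    def load(cls, text):
        """Deserialize, handling each known format version explicitly."""
        data = json.loads(text)
        version = data.get("version")
        if version == 1:
            # Assumed older layout: version 1 had no title field
            return cls(data["url"])
        if version == 2:
            return cls(data["url"], data["title"])
        raise ValueError("unsupported format version: %r" % (version,))

restored = Bookmark.load(Bookmark("https://www.python.org/", "Python").dump())
legacy = Bookmark.load('{"version": 1, "url": "https://www.python.org/"}')
print(restored.title, repr(legacy.title))
```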

Dive Into Python 3 shows one way of serializing/deserializing classes to/from JSON.

See also libraries mentioned elsewhere on this page (e.g. marshmallow, Schematics, Colander).

XML

While Python has SAX and DOM packages, the standard in Python is "ElementTree", probably either the built-in implementation or lxml. Some examples:
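A short sketch with the built-in implementation (xml.etree.ElementTree); the sample document and its element names are made up. lxml.etree offers a largely compatible API, so similar code should work there too.

```python
import xml.etree.ElementTree as ET

# Made-up sample document
SAMPLE = """
<library>
  <book id="1"><title>First Book</title></book>
  <book id="2"><title>Second Book</title></book>
</library>
"""

root = ET.fromstring(SAMPLE)

# Reading: iterate over elements, read attributes and child text
titles = [book.findtext("title") for book in root.iter("book")]
ids = [book.get("id") for book in root.findall("book")]

# Writing: add a new element and serialize the tree back to a string
new_book = ET.SubElement(root, "book", id="3")
ET.SubElement(new_book, "title").text = "Third Book"
serialized = ET.tostring(root, encoding="unicode")
print(titles, ids)
```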

lxml provides a slightly more Pythonic interface called Objectify. Example:

lxml can also validate against a DTD or schema: http://lxml.de/validation.html

There are a couple of packages available to pre-generate an object model based on an XML schema (I believe this is similar to JAXB in Java):

    • Generated code is longer, but doesn't depend on generateDS
    • It requires lxml
      • It looks like for older versions (e.g. 2.12a) the generated parser only needed some form of ElementTree

Other links of possible interest:

Concurrency / Asynchronous

Multiprocessing

Futures

The concurrent.futures module was added in Python 3.2; a backport is available for Python 2.5+.
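A small sketch of the basic pattern: submit work to a pool, then collect results as the futures complete. The square function is a placeholder for real (typically I/O-bound) work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def square(n):
    """Placeholder for real work."""
    return n * n

def run_in_pool(values, max_workers=4):
    """Run square() for each value in a thread pool; returns {value: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(square, value): value for value in values}
        # as_completed yields each future as soon as it finishes, in any order
        return {futures[future]: future.result() for future in as_completed(futures)}

results = run_in_pool([1, 2, 3])
print(results)
```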

  • async_gui - new library (2013-04-06 being the first and only release so far) for concurrent GUI programming (uses 'yield' and futures to run tasks in the background while keeping the GUI responsive)
    • Works with PyQt4/PySide, Tk, Wx, Gtk
    • Alternates calling the event loop and the futures with a short timeout, rather than a trigger-based mechanism

Asynchronous I/O

    • Mike Bayer (zzzeek) explains that while SQLAlchemy will probably get asyncio compatibility at some point, performance will be worse than using threads. For most CRUD database code, using a thread pool is a good option.
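One way to follow that advice from asyncio code is to push the blocking database call into a thread pool with run_in_executor. This sketch uses sqlite3 from the standard library rather than SQLAlchemy, and the table and names are made up; the connection is created and used inside the worker thread because sqlite3 connections are not shared across threads by default.

```python
import asyncio
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def blocking_query(names):
    """Ordinary blocking database code; runs entirely in one worker thread."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT)")
        conn.executemany("INSERT INTO users VALUES (?)", [(n,) for n in names])
        cursor = conn.execute("SELECT name FROM users ORDER BY name")
        return [row[0] for row in cursor]
    finally:
        conn.close()

async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The event loop stays free while the query runs in the pool
        return await loop.run_in_executor(pool, blocking_query, ["sam", "alex"])

rows = asyncio.run(main())
print(rows)
```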

Libraries

Generic event loops:

  • pyuv (libuv, which was written for Node.js)
  • pyev (libev) - WARNING: license is now GPLv3
    • pyev 0.8.1 appears to be BSD-licensed (pip install pyev==0.8.1-4.04)

Higher-level:

    • As of gevent 1.0, the event loop is based on libev; it does not use pyev—it has its own wrapper (using Cython or CFFI)

PEP 3156 (asyncio module for Python 3.3+, code named tulip)

PEP 3156 proposed a standard library module for asynchronous I/O. Hopefully having an event loop in the standard library will help promote compatibility, allowing easier use of multiple libraries at once (e.g. PyQt4 + Twisted). A PyPI module should be available for Python 3.3; the package entered the standard library in 3.4: https://docs.python.org/3/library/asyncio.html.
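For a feel of the module as it ended up in the standard library, here is a minimal sketch that runs two coroutines concurrently. Note that it uses the later async/await syntax (Python 3.5+) and asyncio.run (Python 3.7+), not the original yield from style; the fetch coroutine just sleeps in place of real I/O.

```python
import asyncio

async def fetch(name, delay):
    """Stand-in for an I/O operation: waits without blocking the event loop."""
    await asyncio.sleep(delay)
    return name + " done"

async def main():
    # gather runs both coroutines concurrently and returns results in argument order
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))

results = asyncio.run(main())
print(results)
```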

Talk by Guido (published 2013-10-29): Tulip: Async I/O for Python 3

Guido's explanation of why yield from is used instead of yield (making it incompatible with Python < 3.3): The difference between yield and yield-from
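The practical difference can be seen with plain generators, no event loop needed. In this sketch (the function names are arbitrary), yield from passes sent values through to the inner generator and hands its return value back to the caller, which a plain yield cannot do.

```python
def inner():
    received = yield "ready"   # value passed in by send() arrives here
    return received * 2        # becomes the value of the "yield from" expression

def delegating():
    result = yield from inner()
    yield "inner returned %r" % result

gen = delegating()
first = next(gen)       # runs until inner() yields
second = gen.send(21)   # sent straight through to inner()
print(first, "/", second)
```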

Backport to Python 2.7: Trollius (replaces yield from ... and return ... with yield From(...) and raise Return(...)).

Interoperability:

Web

  • dukpy - pip-installable, no external dependency JavaScript interpreter mostly implemented in C (currently "alpha" status); comes with some transpilers
  • Depot - file storage for web apps, with multiple backends (local, GridFS, S3)
  • pyjade converts from Jade templates into Django, Jinja2, Mako or Tornado templates
    • Port of the Node.js template language Jade, which has a HAML-like syntax
  • Requests - "HTTP for Humans"
  • HTTPie "is a CLI, cURL-like tool for humans"
  • Beautiful Soup - for pulling data out of HTML/XML
    • NOTE: the package "BeautifulSoup" in pypi is an old version (version 3); the current version is "beautifulsoup4", at least as of 2013-02-15

Configuration

  • PyStaticConfiguration - configuration schemas, validation, reloading
    • Multiple formats (e.g. YAML, INI, XML, Python lists and dicts)
    • Can read from multiple heterogeneous locations (e.g. defaults + user config file + command line options)
    • Seems to store configuration globally, but does support some kind of namespaces
  • dynaconf - standalone package with similar usage to Django settings
    • Loads settings from Python file, environment variables, Redis, etc.
    • I don't think I like the way it does type casting; it seems like some kind of schema would be preferable
  • https://github.com/hynek/environ_config ? ("NOT FOR GENERAL CONSUMPTION YET")

Messaging / Queues

TODO: separate page for messaging?

Interesting architecture example using RabbitMQ, among other things

Image Manipulation

  • Pillow replaces PIL (and is compatible with Python 3)

Logging

I generally just use the standard library logging module.

Articles:

Some logging-related packages:

  • structlog - add structured data on top of your existing logger
  • Colorization for console output: coloredlogs or colorlog
  • verboselogs - adds a few extra log levels; integrates with coloredlogs
  • Eliot - another structured logging package
    • Focuses on the ability to keep track of hierarchical actions
    • Can output to journald
    • Comes with a tool to pretty-print its JSON-formatted log messages
  • Logbook - an alternative logging system (replaces standard library logging); I haven't tried it
  • Sentry
  • logutils - extra handlers, including a queue handler for dealing with slow handlers or for use with multiprocessing applications
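On Python 3.2+, the queue handler idea is also available in the standard library as logging.handlers.QueueHandler and QueueListener. A rough sketch: the logger only puts records on a queue, and a background thread feeds them to the (possibly slow) real handler. ListHandler here is a made-up handler that collects messages so the example has visible output.

```python
import logging
import logging.handlers
import queue

class ListHandler(logging.Handler):
    """Made-up stand-in for a slow handler; just collects formatted messages."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))

log_queue = queue.Queue()
slow_handler = ListHandler()

# The listener thread takes records off the queue and passes them to the real handler
listener = logging.handlers.QueueListener(log_queue, slow_handler)
listener.start()

logger = logging.getLogger("queue_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))  # cheap: only enqueues

logger.info("job %d finished", 7)
listener.stop()  # handles anything still queued, then joins the thread
print(slow_handler.messages)
```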

Tools that are not Python-specific:

  • logstash - can monitor your existing logfiles and pull them all into a single database
    • Kibana
  • Graylog2

Unit Testing

Text

Singularization/pluralization:

Text / Name Generation

  • https://github.com/ben174/rikeripsum - "Lorem Ipsum: The Next Generation" - "Generates text - like lorem ipsum - but uses real English. Taken from random samplings of dialog spoken by Commander William Riker in Star Trek: The Next Generation."
  • name-of-thrones - "Command line tool to generate words that sound like characters from Game of Thrones"

Absolute Timers

Cryptography

See also https://github.com/pyca ("Python Cryptographic Authority").

References:

Troubleshooting

virtualenv on 64-bit Windows

The batch files used by virtualenv don't like "Program Files (x86)" in the PATH. Instead of trying to fix the batch scripts (I don't like batch scripts… we should just use MSYS bash instead), I changed my PATH to use the "short" directory name, i.e. "PROGRA~2" instead of "Program Files (x86)". My particular error was "\PC was unexpected at this time", since "C:\Program Files (x86)\PC Connectivity Solution\" was in my PATH.

Setuptools Issues

For errors like this:

error: invalid command 'egg_info'

or other setuptools-related errors (e.g. an error on python setup.py develop), try updating setuptools. Also try removing (pip uninstall, or manually) any remnants of old versions of distribute or setuptools, both in the virtualenv and in the relevant system Python installation.

Building Python From Source

When building Python from source, remember that it requires development packages ("-dev" packages in Debian, "-devel" in RHEL) for several libraries. In particular, the readline library enables line editing in the interactive Python interpreter (e.g. up/down/left/right and home/end keys).

RHEL5:

Shared Library

There may be some reason to compile Python with the --enable-shared option. If using a nonstandard prefix (e.g. ``/opt/python27``), then it won't be able to find the shared library without setting LD_LIBRARY_PATH.

I found an article describing how to add rpath to the binaries::

  # Apparently it will give an unhelpful error if the directory doesn't exist
  PREFIX=/opt/python27
  sudo mkdir -p $PREFIX/lib
  ./configure --enable-shared --prefix=$PREFIX LDFLAGS="-Wl,-rpath,$PREFIX/lib"

Reference: http://koansys.com/tech/building-python-with-enable-shared-in-non-standard-location

Another option is to just link the shared libraries to /usr/lib (as long as there isn't another interpreter of the same version on the system). This seems to work well for e.g. adding Python 2.7 / 3.x to RHEL5.

info/python.txt · Last modified: 2018-10-19 16:06 by sam