Hybrid Python/C++ packages, revisited
Last year I published a blog post that explained how to structure a Python package with a C++ extension module. My goal was to craft a Python package that leveraged C++ for performance and had an easily maintainable and testable structure. Well, seven months later, I’m revisiting Python/C++ packaging. I’m now convinced that the structure that I described in my original post is not ideal. This post will lay out the problems I discovered with my prior approach, and explain my new approach. I’ll conclude with some thoughts on the big picture of working on projects with mixed codebases.
The first part of this post assumes familiarity with my last blog post. If you just want to know how to set up your package, you can skip ahead to Part 2. If you’re even more impatient, you can go right to the complete working example on Github.
Part 1: Polluted namespaces and import madness
Python’s package import and namespace system is complex, to say the least. There are many pitfalls. Recall the repository directory structure that I described in my previous post:
python_cpp_example
├── CMakeLists.txt
├── LICENSE
├── README.md
├── build/
├── lib
│ ├── catch/
│ └── pybind11/
├── python_cpp_example
│ ├── __init__.py
│ ├── bindings.cpp
│ ├── math.cpp
│ └── math.hpp
├── setup.py
└── tests
├── __init__.py
├── math_test.py
├── test_main.cpp
└── test_math.cpp
This directory structure is common. The root directory python_cpp_example
contains a README.md
, setup.py
, and other files that support the python_cpp_example
package. The package source code lives in a subdirectory also named python_cpp_example
. Strictly speaking, a Python package is a collection of Python modules in a directory with an __init__.py
file that tells Python, “Hey, this is a package.” So in this example, the python_cpp_example
subdirectory is really the package.
When actively developing a Python package, developers commonly install packages in development mode so they can make changes to the source code without having to reinstall the package after every change. In our example above, running setup.py develop
will add the python_cpp_example
package to the global Python import namespace. This behavior is desirable so that we can run import python_cpp_example
from anywhere. However, without specifying in setup.py
exactly where our package source code lives, running setup.py develop
will add any subdirectory with an __init__.py
to the global namespace. Notice a problem with the directory structure above? The tests
directory also has an __init__.py
, making it a package that setuptools
will install. Running setup.py develop
then makes this possible:
import tests # Uh-oh
tests.test_math.test_add()
We’ve polluted the global package namespace with a generic-sounding tests
package. This is definitely not what we want. So what can we do? One option would be to never install our package in development mode, but this is a fragile solution. Our package should behave the same whether it is installed into site-packages/
or in development mode. The safer option is to change the directory structure and specify source directories in setup.py
to proactively avoid import problems.
Part 2: A better package structure
Many others have written about how to structure a Python package. The most thorough and well-reasoned post I’ve read suggests the following layout:
python_cpp_example
├── setup.py
├── src
│ └── python_cpp_example/ # Source code goes under this directory
└── tests/ # Tests live here
This layout guards against some of the import problems I described in Part 1. To make sure nothing outside of src/
gets added to the global package namespace, add these two arguments to setup()
in your setup.py
:
from setuptools import find_packages
setup(
...
packages=find_packages('src'),
package_dir={'':'src'},
...
)
This guarantees that tests
will never be accidentally added to the global package namespace. We will construct our hybrid Python/C++ package using this layout. The following sections follow closely with my prior blog post, but adapted to this new directory structure.
A simple package with a pybind11
-based C++ extension module
To start, we’ll create a simple pybind11
-based Python module. Pybind11
is an excellent header-only C++ library that makes it easy to write Python wrappers for any C++ code and bundle them into an extension module. The extension module will share the same name as our package, python_cpp_example
. The directory structure for our package is similar to the one described above, with a few additions, notably lib
, build
, and CMakeLists.txt
:
python_cpp_example
├── build/ # Build directory for C++ extension modules
├── lib/ # External C++ libraries
├── setup.py
├── src
│ └── python_cpp_example # Python and C++ source code
│ ├── __init__.py
│ └── hello.py
└── tests/ # Python and C++ unit tests
We’ll also assume that our package already contains one module called hello.py
.
hello.py
:
def say_hello():
print("Hello world!")
Make sure that the src/python_cpp_example
directory contains a blank __init__.py
so Python knows that this directory is a package. We will place our C++ source code in the same directory as our Python code.
The build
directory will contain compiled code generated by our build system. The lib
directory will contain the C++ libraries needed for our package. Under lib/
, download and extract the latest pybind11
release by running the following commands (assuming you’re working in a *nix environment):
wget https://github.com/pybind/pybind11/archive/v2.1.1.tar.gz
tar -xvf v2.1.1.tar.gz
# Copy pybind11 library into our project
cp -r pybind11-2.1.1 python_cpp_example/lib/pybind11
Now we’ll write two simple C++ functions and then wrap them in Python with pybind11
. Under src/python_cpp_example
, create three files: math.hpp
, math.cpp
, and bindings.cpp
.
math.hpp
and math.cpp
are simple C++ header and definition files:
math.hpp
:
/*! Add two integers
\param i an integer
\param j another integer
*/
int add(int i, int j);
/*! Subtract one integer from another
\param i an integer
\param j an integer to subtract from \p i
*/
int subtract(int i, int j);
math.cpp
:
#include "math.hpp"
int add(int i, int j)
{
return i + j;
}
int subtract(int i, int j)
{
return i - j;
}
Lastly, we’ll define our Python wrappers in a file called bindings.cpp
. See the pybind11
tutorial for a detailed explanation of what each line of code does here.
bindings.cpp
:
#include <pybind11/pybind11.h>
#include "math.hpp"
namespace py = pybind11;
PYBIND11_PLUGIN(python_cpp_example)
{
py::module m("python_cpp_example");
m.def("add", &add);
m.def("subtract", &subtract);
return m.ptr();
}
We’ve now defined two functions in C++, add
and subtract
, and written the code to wrap them in a Python module called python_cpp_example
. Your directory structure should now look something like this:
python_cpp_example
├── build
├── lib
│ └── pybind11
├── src
│ └── python_cpp_example
│ ├── __init__.py
│ ├── hello.py
│ ├── bindings.cpp
│ ├── math.cpp
│ └── math.hpp
├── setup.py
└── tests
Next we have to set up our build environment. We’ll use CMake to build the C++ extension modules in our package.
Configuring the build environment
We could have written a Makefile directly to build our package, but using CMake simplifies building pybind11
-based modules. (If you don’t believe me, check out the Makefile that CMake generates.) Using CMake will also make it easier to add C++ unit tests later.
First make sure that you have CMake installed. On macOS, installation can be done with brew
by running the following command:
> brew install cmake
If you are not familiar with CMake, I suggest skimming through the CMake introductory tutorial. We’re going to use a small set of CMake functions here, so even if you’re new to CMake, the code should be easy to follow.
To start, we’ll define a project, set the source directory, and define a list of C++ sources without bindings.cpp
. (This list will come in handy later when we want build C++ tests independently of any Python bindings.) Create a CMakeLists.txt
file in the package’s root directory and add the following:
cmake_minimum_required(VERSION 2.8.12)
project(python_cpp_example)
# Set source directory
set(SOURCE_DIR "src/python_cpp_example")
# Tell CMake that headers are also in SOURCE_DIR
include_directories(${SOURCE_DIR})
set(SOURCES "${SOURCE_DIR}/math.cpp")
Next, we’ll tell CMake to add the pybind11
directory to our project and define an extension module. This time, make sure bindings.cpp
is added to the sources list. Add the following to CMakeLists.txt
:
# Generate Python module
add_subdirectory(lib/pybind11)
pybind11_add_module(python_cpp_example ${SOURCES} "${SOURCE_DIR}/bindings.cpp")
That’s all we need to instruct CMake to build our extension module. Rather than run CMake directly, however, we’re going to configure Python’s built-in setuptools
to build our package automatically via setup.py
.
Building with setuptools
On its own, setup.py
will not build an extension module with a CMake-based build system. We have to define a custom build command. The code I’m presenting here was largely taken from the pybind11
’s CMake example repository. I won’t explain every line of the code here, but in brief, we’re defining two classes that will create a temporary build directory and then call CMake to build any extension modules in our package. Add these two class definitions to setup.py
:
import os
import re
import sys
import sysconfig
import platform
import subprocess
from distutils.version import LooseVersion
from setuptools import setup, find_packages, Extension
from setuptools.command.build_ext import build_ext
class CMakeExtension(Extension):
def __init__(self, name, sourcedir=''):
Extension.__init__(self, name, sources=[])
self.sourcedir = os.path.abspath(sourcedir)
class CMakeBuild(build_ext):
def run(self):
try:
out = subprocess.check_output(['cmake', '--version'])
except OSError:
raise RuntimeError(
"CMake must be installed to build the following extensions: " +
", ".join(e.name for e in self.extensions))
if platform.system() == "Windows":
cmake_version = LooseVersion(re.search(r'version\s*([\d.]+)',
out.decode()).group(1))
if cmake_version < '3.1.0':
raise RuntimeError("CMake >= 3.1.0 is required on Windows")
for ext in self.extensions:
self.build_extension(ext)
def build_extension(self, ext):
extdir = os.path.abspath(
os.path.dirname(self.get_ext_fullpath(ext.name)))
cmake_args = ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=' + extdir,
'-DPYTHON_EXECUTABLE=' + sys.executable]
cfg = 'Debug' if self.debug else 'Release'
build_args = ['--config', cfg]
if platform.system() == "Windows":
cmake_args += ['-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{}={}'.format(
cfg.upper(),
extdir)]
if sys.maxsize > 2**32:
cmake_args += ['-A', 'x64']
build_args += ['--', '/m']
else:
cmake_args += ['-DCMAKE_BUILD_TYPE=' + cfg]
build_args += ['--', '-j2']
env = os.environ.copy()
env['CXXFLAGS'] = '{} -DVERSION_INFO=\\"{}\\"'.format(
env.get('CXXFLAGS', ''),
self.distribution.get_version())
if not os.path.exists(self.build_temp):
os.makedirs(self.build_temp)
subprocess.check_call(['cmake', ext.sourcedir] + cmake_args,
cwd=self.build_temp, env=env)
subprocess.check_call(['cmake', '--build', '.'] + build_args,
cwd=self.build_temp)
print() # Add an empty line for cleaner output
Next, at the bottom of setup.py
, modify setup()
with the newly-defined custom extension builder:
setup(
name='python_cpp_example',
version='0.1',
author='Benjamin Jack',
author_email='benjamin.r.jack@gmail.com',
description='A hybrid Python/C++ test project',
long_description='',
# tell setuptools to look for any packages under 'src'
packages=find_packages('src'),
# tell setuptools that all packages will be under the 'src' directory
# and nowhere else
package_dir={'':'src'},
# add an extension module named 'python_cpp_example' to the package
# 'python_cpp_example'
ext_modules=[CMakeExtension('python_cpp_example/python_cpp_example')],
# add custom build_ext command
cmdclass=dict(build_ext=CMakeBuild),
zip_safe=False,
)
Now you should be able to run python3 setup.py develop
from within your package’s root directory and you will see an extension module generated under src/python_cpp_example/
. You can now import and use your new package.
Up until now, I have largely followed along with the pybind11
tutorial and pybind11
’s CMake example repository. In the following sections, I will describe how to add unit testing within this set up.
Writing Python unit tests
We’ll begin by adding Python unit tests using Python’s built-in unittest
module. Under tests/
add an empty file called __init__.py
. This file will stay empty, but it is required for unittest
’s automatic test discovery.
python_cpp_example/
├── CMakeLists.txt
├── build/
├── lib
│ ├── catch/
│ └── pybind11/
├── setup.py
├── src
│ └── python_cpp_example
│ ├── __init__.py
│ ├── hello.py
│ ├── bindings.cpp
│ ├── math.cpp
│ ├── math.hpp
│ └── python_cpp_example.cpython-36m-darwin.so
└── tests
└── __init__.py
In the same tests
directory, add a file math_test.py
with a few simple unit tests.
import unittest
# import our `pybind11`-based extension module from package python_cpp_example
from python_cpp_example import python_cpp_example
class MainTest(unittest.TestCase):
def test_add(self):
# test that 1 + 1 = 2
self.assertEqual(python_cpp_example.add(1, 1), 2)
def test_subtract(self):
# test that 1 - 1 = 0
self.assertEqual(python_cpp_example.subtract(1, 1), 0)
if __name__ == '__main__':
unittest.main()
Add the following test_suite
keyword argument to setup()
in your setup.py
script:
setup(
...
test_suite='tests',
...
)
That’s all you need for Python unit tests. You can add as many test files as you want, and each one should define a class that extends unittest.TestCase
. As long as all of your files have a _test.py
suffix, unittest
will automatically discover them. Run python3 setup.py test
and you should get output that looks like this:
> python3 setup.py test
test_add (tests.math_test.MainTest) ... ok
test_subtract (tests.math_test.MainTest) ... ok
----------------------------------------------------------------------
Ran 2 tests in 0.001s
OK
Python’s built-in unittest
module is a powerful unit testing framework with a variety of built in assertions. You can read more about unittest
in the official Python documentation.
Writing C++ unit tests with catch
Unlike Python, C++ needs an external library to enable unit testing. I’ve chosen to use catch
for its concise syntax and its header-only structure. Download and extract catch
in the lib
directory of your package. On a *nix system, you could run the following:
cd lib
wget https://github.com/philsquared/Catch/archive/v1.9.4.tar.gz
tar -xvf v1.9.4.tar.gz
# Copy catch library into our project
cp -r catch-1.9.4 python_cpp_example/lib/catch
Similar to Python’s __init__.py
, catch
requires an initialization file that we’ll name test_main.cpp
. This configuration file must contain two specific lines and nothing else. Under the tests
directory, add the following file:
test_main.cpp
:
#define CATCH_CONFIG_MAIN
#include <catch.hpp>
Now we’ll make a file with two simple unit tests. This time I’m using a test_
prefix (rather than a _test.py
suffix) to easily distinguish between Python unit tests and C++ unit tests without looking at the file extension. Add the following to a file named test_math.cpp
in the tests
directory:
test_math.cpp
#include <catch.hpp>
#include "math.hpp"
TEST_CASE("Addition and subtraction")
{
REQUIRE(add(1, 1) == 2);
REQUIRE(subtract(1, 1) == 0);
}
These tests are analogous to the Python tests in the previous section. Normally, I would not unit test both the pybind11
Python wrappers and the underlying C++ definitions for such simple functions. However, you can imagine an instance in which you didn’t want to expose all of your C++ code with Python wrappers, but you still wanted to unit test that C++ code. Likewise, the Python wrappers can get quite complex and it may be useful to test your C++ code independently of the wrapping code. I’ll revisit the topic of unit testing at the end of this post.
Your directory structure should now look something like this:
python_cpp_example/
├── CMakeLists.txt
├── build
│ └── temp.macosx-10.12-x86_64-3.6
├── lib
│ ├── catch/
│ └── pybind11/
├── src
│ └── python_cpp_example
│ ├── __init__.py
│ ├── hello.py
│ ├── bindings.cpp
│ ├── math.cpp
│ ├── math.hpp
│ └── python_cpp_example.cpython-36m-darwin.so
├── setup.py
└── tests
├── __init__.py
├── math_test.py
├── test_main.cpp
└── test_math.cpp
Lastly, we need to instruct CMake that we’ve added C++ unit tests. We’ll add test_main.cpp
and test_math.cpp
to a TESTS
variable. Then we’ll include the catch
library and define an executable python_cpp_example_test
. Add the following to your CMakeLists.txt
file:
SET(TEST_DIR "tests")
SET(TESTS ${SOURCES}
"${TEST_DIR}/test_main.cpp"
"${TEST_DIR}/test_math.cpp")
# Generate a test executable
include_directories(lib/catch/include)
add_executable("${PROJECT_NAME}_test" ${TESTS})
Now run python3 ./setup.py develop
and if you navigate to build/temp.*
(e.g., build/temp.macosx-10.12-x86_64-3.6
on my system), you should see an executable python_cpp_example_test
. Running this executable will execute the catch
unit tests. Rather than run this executable ourselves, however, we will move the executable to somewhere more convenient and use setuptools
to execute it along side our Python unit tests.
Using Python’s unittest
to execute C++ catch
tests
To run our C++ unit tests along side the Python tests, we’re going to call the python_cpp_example_test
executable from a Python unit test. This requires two steps: first moving the python_cpp_example_test
executable from build/
to tests/
, then writing a simple unittest
test that calls the executable.
First, add this copy_test_file()
function to the CMakeBuild
class in setup.py
:
...
from shutil import copyfile, copymode
...
class CMakeBuild(build_ext):
def run(self):
...
def build_extension(self, ext):
...
def copy_test_file(self, src_file):
'''
Copy ``src_file`` to `tests/bin` directory, ensuring parent directory
exists. Messages like `creating directory /path/to/package` and
`copying directory /src/path/to/package -> path/to/package` are
displayed on standard output. Adapted from scikit-build.
'''
# Create directory if needed
dest_dir = os.path.join(os.path.dirname(
os.path.abspath(__file__)), 'tests', 'bin')
if dest_dir != "" and not os.path.exists(dest_dir):
print("creating directory {}".format(dest_dir))
os.makedirs(dest_dir)
# Copy file
dest_file = os.path.join(dest_dir, os.path.basename(src_file))
print("copying {} -> {}".format(src_file, dest_file))
copyfile(src_file, dest_file)
copymode(src_file, dest_file)
...
Then call copy_test_file()
from within CMakeBuild.build_extension()
.
setup.py
:
...
class CMakeBuild(build_ext):
def run(self):
...
def build_extension(self, ext):
... # There is more code here not shown
subprocess.check_call(['cmake', ext.sourcedir] + cmake_args,
cwd=self.build_temp, env=env)
subprocess.check_call(['cmake', '--build', '.'] + build_args,
cwd=self.build_temp)
# Copy *_test file to tests directory
test_bin = os.path.join(self.build_temp, 'python_cpp_example_test')
self.copy_test_file(test_bin)
print() # Add empty line for nicer output
def copy_test_file(self, src_file):
...
...
Upon building the extension module, Python’s setuptools will also now copy python_cpp_example_test
into the tests/bin
directory. Next we’ll add another Python unit test under the tests
directory.
test_cpp.py
:
import unittest
import subprocess
import os
class MainTest(unittest.TestCase):
def test_cpp(self):
print("\n\nTesting C++ code...")
subprocess.check_call(os.path.join(os.path.dirname(
os.path.relpath(__file__)), 'bin', 'python_cpp_example_test'))
print() # for prettier output
if __name__ == '__main__':
unittest.main()
And now, the moment we’ve been waiting for: a single command to build and test a hybrid Python/C++ package.
> python3 setup.py test
...
test_cpp (tests.cpp_test.MainTest) ...
Testing C++ code...
===============================================================================
All tests passed (2 assertions in 1 test case)
Resuming Python tests...
ok
test_add (tests.math_test.MainTest) ... ok
test_subtract (tests.math_test.MainTest) ... ok
----------------------------------------------------------------------
Ran 3 tests in 0.006s
OK
Under the hood, the above command is first building any extension modules, executing Python unit tests, and then executing C++ unit tests.
Final thoughts: why test both Python and C++?
In this blog post, I’ve described how to structure a Python package with a C++ extension module where both the Python bindings and the C++ code are tested independently. I have a large project structured this way, with a set of Python tests and a set C++ tests. While the structure described in this post is less fragile than the previous set up, I’m wondering whether it’s worthwhile to test both codebases in the same repository. I think it comes down to asking two questions,
- Will my users only interact with my software via the Python interface?
- Should my C++ code be available as an independent library?
If your answer to the first question is yes, then just write Python tests. Pybind11
has its own suite of tests, so trust that the binding library will work as expected. Python is much quicker to write and easier to maintain.
If your C++ code will be available as an independent library, then break out the C++ code into its own Github repository. Write C++ tests within that repository. Document, test, and maintain that library independently of your Python bindings. For your Python bindings, write a minimal set of tests. Or you could write no tests for the bindings at all. Again, pybind11
has its own tests, and there are tools that will even autogenerate python bindings from C++ code. Leave it to the pybind11
developers to test their binding library.
So in conclusion, while I’ve presented a structure here that is more maintainable than the structure described in my previous post, I’m not convinced that the original motivation for this series of posts make sense. Namely, I question that Python code and C++ code should be tested independently. If these codebases truly need to be tested independently, then they should be developed and maintained independently in separate repositories. Conversely, if you only expect your users to interact with your software via Python, then just test the Python interface. More tests are not always better.