Handling bytes consistently and correctly has traditionally been one of the most difficult tasks in writing a Py2/3 compatible codebase. This is because bytes in Python 2 is simply an alias for Python 2’s str rather than a true implementation of the Python 3 bytes type, which behaves substantially differently.
future contains a backport of the bytes object from Python 3 which passes most of the Python 3 tests for bytes. (See tests/test_future/test_bytes.py in the source tree.) You can use it as follows:
>>> from builtins import bytes
>>> b = bytes(b'ABCD')
On Py3, this is simply the builtin bytes object. On Py2, this object is a subclass of Python 2’s str that enforces the same strict separation of unicode strings and byte strings as Python 3’s bytes object:
>>> b + u'EFGH' # TypeError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be unicode string
>>> bytes(b',').join([u'Fred', u'Bill'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected bytes, found unicode string
>>> b == u'ABCD'
False
>>> b < u'abc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: bytes() and <type 'unicode'>
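Because implicit mixing is rejected, byte strings and unicode strings have to be converted explicitly before they are combined. The following is a minimal sketch of the two usual directions, assuming UTF-8 is the right encoding for the data and, on Py2, that the future package is installed and its backport accepts the encoded result as described above:

```python
from builtins import bytes  # on Py2 this import requires the future package

b = bytes(b'ABCD')
text = u'EFGH'

# Encode the text to get bytes, then concatenate bytes with bytes.
combined_bytes = b + text.encode('utf-8')

# Or decode the bytes to get text, then concatenate text with text.
combined_text = b.decode('utf-8') + text

assert combined_bytes == bytes(b'ABCDEFGH')
assert combined_text == u'ABCDEFGH'
```

Which direction is appropriate depends on whether the result is meant to be binary data or text.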
In most other ways, these bytes objects behave identically to Python 3’s bytes:
from builtins import bytes
b = bytes(b'ABCD')
assert list(b) == [65, 66, 67, 68]
assert repr(b) == "b'ABCD'"
assert b.split(b'B') == [b'A', b'CD']
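A few further Python 3 behaviours can be checked in the same way. This is an illustrative sketch rather than an exhaustive list; the assertions hold for the builtin bytes on Py3 and are expected to hold for the backport on Py2, given that it follows the Python 3 semantics described here:

```python
from builtins import bytes  # on Py2 this import requires the future package

b = bytes(b'ABCD')

# Indexing yields an integer, while slicing yields another bytes object.
assert b[0] == 65
assert b[:2] == bytes(b'AB')

# A bytes object can be constructed from an iterable of integers.
assert bytes([65, 66, 67, 68]) == b

# Decoding produces a text (unicode) string.
assert b.decode('ascii') == u'ABCD'
```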
Currently the easiest way to ensure identical behaviour of byte-strings in a Py2/3 codebase is to wrap all byte-string literals b'...' in a bytes() call as follows:
from builtins import bytes
# ...
b = bytes(b'This is my bytestring')
# ...
This is not perfect, but it is superior to manually debugging and fixing code incompatibilities caused by the many differences between Py3 bytes and Py2 strings.
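As a sketch of why the wrapping matters, consider a function that iterates over its input. The helper name checksum is hypothetical and chosen only for illustration. Without the bytes() call, iterating over a native Py2 str would yield one-character strings rather than integers, and sum() would fail:

```python
from builtins import bytes  # on Py2 this import requires the future package


def checksum(data):
    """Return the sum of all byte values in data, modulo 256."""
    # Wrapping the argument means iteration yields integers on both
    # Py2 and Py3, even if the caller passed an unwrapped b'...' literal.
    data = bytes(data)
    return sum(data) % 256


payload = bytes(b'This is my bytestring')
result = checksum(payload)
assert 0 <= result < 256
```

Wrapping at the function boundary as well as at the literal is a defensive choice; on Py3 the extra call simply copies the input.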