Python 3 Migration Guide

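Python is not a new language, but it continues to gain momentum and users as if it had only just been released. I don't mean to say that Python hasn't changed; it has grown, gaining functionality and speed, and it's now a hot language in a variety of domains, from data science to test automation to education. But those who last used Python 15 or 20 years ago would find that the latest versions of the language are a natural extension and evolution of what they already know.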

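At the same time, changes to the language, and particularly those made in Python 3.x, mean that Python 2 programs won't run unmodified in Python 3. This is a known issue, and it was part of the process that Guido van Rossum, Python's BDFL (Benevolent Dictator for Life), announced years ago when he launched the "Python 3000" project. Guido expected that organizations would need time to move from Python 2 to Python 3, but he also believed that the improvements to the language were necessary.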

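The good news is that Python 3 (version 3.5 at the time of writing) really is better than Python 2. The bad news is that many companies (including many of my training and consulting clients) still use Python 2.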

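Why don't they just upgrade? In most cases, it's because they believe the time and effort needed to upgrade aren't worth the developer resources. Most of the differences between Python 2 and 3 are easy to understand and describe, but upgrading isn't completely automatic. Moving a large code base from Python 2 to 3 might take a few days, but it also might take weeks or months.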

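Nevertheless, companies soon will be forced to upgrade, because Python 2 will no longer be supported as of 2020. Running an unsupported language is a risk that many companies are unwilling to take.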

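If you must upgrade but can't, you're in a very bad position. There is another option, though: upgrade incrementally, modifying just one or two files each week so that they're compatible with both Python 2 and 3. After several months of such incremental changes, you'll be able to switch to Python 3 completely with a relatively small investment.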

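How can you make your code compatible with both? In this article, I offer some suggestions for doing so, including understanding the changes in Python 3 and the tools developed to ease the transition. Don't wait until 2019 to start making these changes; if you're a Python developer, you already should (in mid-2016) be thinking about how to change your code so that it's compatible with Python 3.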

What Has Changed?

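The first thing to ask is: what exactly has changed in Python 3? And, how easily can you move from Python 2 to Python 3? Or, how can you modify a Python 2 program so that it continues to work in Python 2 and also works, unmodified, in Python 3? For my clients, and possibly for your business, that last question is likely to be the most important one during the transition period.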

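On the surface, not that much actually has changed in Python 3. It's a cleaner, more efficient, more modern language that works the way modern Python developers expect. Things that Python developers have been doing for years, but that weren't the default in the language, are now the default. Sure, after years of bad habits, I'm still adjusting to a few things, such as forgetting to put parentheses around the arguments I pass to print, but overall the language remains the same.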

That doesn't mean nothing has changed, however, or that you can get away without changing your code.

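For example, you almost certainly never wanted to use Python 2's input built-in to get input from the user, because it evaluates whatever the user types as a Python expression. Instead, you wanted the raw_input built-in, which returns the typed text as a string. Thus, Python 3 has no equivalent of Python 2's input; the Python 3 input function is the same as Python 2's raw_input.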
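One common way to handle this rename in code that must run on both versions is to look up raw_input and fall back to input when it doesn't exist. The following is only a minimal sketch; the names read_line and parse_number are made up for illustration and don't come from any library:

```python
# Pick the "return the typed text" function on either Python version.
try:
    read_line = raw_input   # Python 2: returns the typed text as a string
except NameError:
    read_line = input       # Python 3: input() behaves like raw_input()


def parse_number(text):
    """Convert typed text to an int explicitly, rather than evaluating it."""
    return int(text.strip())
```

A program then would call parse_number(read_line("Enter a number: ")) and get the same behavior under Python 2 and 3.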

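A more profound change is the shift in how strings behave. Strings no longer contain bytes; they now contain Unicode characters, and UTF-8 is the default encoding both for source files and for str.encode(). If 100% of your work uses ASCII, you're in luck; you hardly need to change anything in your programs. But if you use non-ASCII characters and also handle the contents of binary files in the same program, you'll need to make some adjustments. Python 2's str class is now the bytes class, and Python 2's unicode class is now the str class.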
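To make the text/bytes distinction concrete, here is a small sketch that behaves the same way under Python 2.7 and Python 3.3 or later. The variable names are illustrative only; the u prefix marks a text literal on both versions:

```python
text = u"caf\u00e9"              # text: unicode on Python 2, str on Python 3
data = text.encode("utf-8")      # bytes: str on Python 2, bytes on Python 3

text_length = len(text)          # counts characters
byte_length = len(data)          # counts bytes; the accented e needs two in UTF-8
round_trip = data.decode("utf-8")  # decoding returns the original text
```

Keeping text and bytes in separate variables, and converting explicitly at the boundaries (files, sockets and so on), is the habit that makes the same code work on both versions.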

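Some other changes were made to make Python more efficient. For example, Python 2 has a range function (which returns a list of integers) and an xrange function (which returns a lazy object that produces its values on demand). Python 3's range function is the same as Python 2's xrange, because it's more efficient and there was rarely a reason to prefer the old range. But if your program expects to get a list back from range, you might run into trouble when you move to Python 3.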
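As a rough illustration, code that only iterates can use whichever lazy function is available, while code that truly needs a list can ask for one explicitly. The name lazy_range below is hypothetical, chosen just for this sketch:

```python
try:
    lazy_range = xrange   # Python 2: the lazy version
except NameError:
    lazy_range = range    # Python 3: range is already lazy

# Iteration works the same way on both versions.
total = sum(i for i in lazy_range(10))

# When list methods are needed, build the list explicitly.
numbers = list(range(10))
numbers.append(10)        # append exists on lists, not on Python 3 range objects
```

Wrapping the call in list() costs little on Python 2, where range already returns a list, and makes the intent obvious on Python 3.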

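Another issue, which has become less severe in the last year or two, is that of third-party libraries. If you're using packages from PyPI, you need to make sure that not only your own code works with Python 3, but that all of those packages do too. For a long time, I believe, these packages were the bottleneck that kept many people from upgrading. Today, however, most popular packages support Python 3, as you can see on the Python 3 Readiness site, which tracks this information.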

Identifying Problems

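So, how can you take a Python 2 program and modify it so that it works under both Python 2 and 3? You could go through the code line by line and try to find what needs changing, but there are tools that make the process much easier.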

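The first is pylint, an old friend to Python developers, which normally checks your code for Python style and usage. Modern versions of pylint have a --py3k option that you can apply to check your code's compatibility with Python 3. For example, say you've written the (terrible) program shown in Listing 1. How can you find out which parts won't work? You can run the following: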


pylint --py3k oldstuff.py

and you'll get the following output:


************* Module oldstuff
W:  3, 7: raw_input built-in referenced (raw_input-builtin)
E:  4, 0: print statement used (print-statement)
E:  5, 0: print statement used (print-statement)
E:  6, 0: print statement used (print-statement)
W:  8, 9: raw_input built-in referenced (raw_input-builtin)
E: 10, 4: print statement used (print-statement)
W: 10,48: division w/o __future__ statement (old-division)
E: 14, 4: print statement used (print-statement)
W: 16, 4: range built-in referenced when not iterating (range-builtin-not-iterating)
E: 17, 0: print statement used (print-statement)

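The output contains both errors ("E") and warnings ("W"). The sample program uses print as a statement rather than as a function. It uses the result of range for something other than iteration. It uses raw_input. And it divides without the __future__ import, so the result of the division will differ between versions. What can you do about these things, and how can you improve them? pylint won't tell you; that's not its job. But at least you now have a list of things to fix so that the program can run under Python 3.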

Listing 1. oldstuff.py


#!/usr/bin/env python

name = raw_input("Enter your name: ")
print "Hello, ",
print name,
print "!"

number = raw_input("Enter a number: ")
for i in [2,3,5]:
    print "{} / {} = {}".format(int(number), i, int(number) / i)


for i in range(10):
    print i

x = range(10)
print x[3]
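For comparison, here is one possible way to rewrite Listing 1 so that it runs under both Python 2.7 and Python 3. This is only a sketch, not the output of any tool: the helper names read_line, division_lines and main are invented for illustration, and the spacing of the greeting differs slightly from the original print statements. It addresses each category that pylint flagged: print is a function, raw_input is looked up safely, the division is an explicit floor division, and the result of range is turned into a list.

```python
from __future__ import division, print_function

try:
    read_line = raw_input   # Python 2
except NameError:
    read_line = input       # Python 3


def division_lines(number, divisors=(2, 3, 5)):
    """Format number // d for each divisor; // floors on both versions."""
    return ["{} / {} = {}".format(number, d, number // d) for d in divisors]


def main():
    """Interactive equivalent of Listing 1."""
    name = read_line("Enter your name: ")
    print("Hello,", name, "!")

    number = int(read_line("Enter a number: "))
    for line in division_lines(number):
        print(line)

    for i in range(10):
        print(i)

    x = list(range(10))     # an explicit list, as range returned in Python 2
    print(x[3])
```

Calling main() runs the program interactively. Moving the arithmetic into division_lines also makes it possible to test that part without typing anything.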

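If you've written a Python package with a requirements file, you can download and install caniusepython3 from PyPI. Running caniusepython3 against your requirements file will indicate which packages will work and which won't. If you'd rather not download and install caniusepython3, you can go to the Can I Use Python 3 site and upload your requirements file there.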

Fixing Problems

Python has come with a program called 2to3 for some time now; it examines your Python 2 code and tries to find ways to make it work with Python 3. Thus, you can run:


2to3 oldstuff.py




and get unified diff-style output, indicating what changes you'll need to make in order for your program to work under Python 3. The problem is that this is a one-way conversion. It tells you how to change your program so it'll work with Python 3, but it doesn't help you make your program compatible with both 2 and 3 simultaneously.

Fortunately, there's a package on PyPI called future that provides futurize, a script that not only runs 2to3, but also provides the import statements necessary for your code to run under both versions. You can just run:


futurize oldstuff.py

and the output is (as with 2to3) in diff format, so you can use it either to create a file that's compatible with both or to read through things.

What if you have Python 3 code and want to make it backward-compatible with Python 2? The same people who make futurize also make the amusingly named pasteurize, which inserts the appropriate import statements into code.

How do you know if your code really works well under both Python 2 and 3 after you have applied futurize's changes? You can't, and there is no doubt that these automatic tools will get some things wrong. For this reason (among others), it's crucial that you have a good test suite, with good coverage of your Python 2 code. Then you can run your tests against the Python 3 version and ensure that it works correctly there as well. Without these tests, you shouldn't think that your upgrade has worked; even 100% test coverage is never a guarantee, but it at least can tell you that the risk of failure has been minimized.
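As a small illustration of the kind of test that catches a 2/3 difference, consider division. The function and test names below are hypothetical; the point is only that the same test file can be run under both interpreters:

```python
from __future__ import division

import unittest


def average(values):
    """Mean of a non-empty sequence; true division on both versions."""
    return sum(values) / len(values)


class AverageTest(unittest.TestCase):
    def test_integer_inputs_give_fractional_mean(self):
        # Without the __future__ import, Python 2 would return 1 here.
        self.assertEqual(average([1, 2]), 1.5)

    def test_result_is_float(self):
        self.assertIsInstance(average([2, 2]), float)
```

Saving this in a file and running it with the unittest module under each interpreter in turn shows whether both versions agree.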

What if you're doing all sorts of serious and deep things with Python 2 that 2to3 can't notice, or that you can't paper over? A great package on PyPI is six, which papers over the differences between Python 2 and 3. For example, let's say you want to create a new object of the type used for text, such that things will be compatible across versions. In Python 2, that's going to be unicode, but in Python 3, that's going to be str. You don't want to have an "if" statement in your code each time you do this. Thus, using six, you can say:


import six
s = six.text_type()

Now you can be sure that "s" is an object of the appropriate type.
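To give a feel for what such a library saves you from writing, here is roughly the kind of version check you would otherwise need yourself. This is a simplified sketch using only the standard library, not six's actual source; the names mirror six's text_type and binary_type purely for readability:

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode     # noqa: F821 (this name exists only on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes

s = text_type()             # an empty text string on either version
```

Writing this once in a compatibility module, or letting six do it, keeps the version checks out of the rest of the code.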

six defines an amazing array of things that have changed, which you might need to keep track of in your code. Want to check something in the builtins namespace (the __builtin__ module in Python 2)? Want to re-raise exceptions? Want to use StringIO (or BytesIO)? Want to deal with metaclasses? Using six, you can write a single line of code, which behind the scenes will issue the appropriate "if" statements for the version of Python you're using.
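For renamed modules, the hand-written alternative usually is a try/except around the import. The sketch below uses only the standard library and assumes Python 2.7 or Python 3.3 and later; it isn't how six itself is implemented:

```python
try:
    import builtins                     # Python 3 name
except ImportError:
    import __builtin__ as builtins      # Python 2 name

# The io module provides both classes on Python 2.6+ and on Python 3.
from io import BytesIO, StringIO

text_buffer = StringIO()
text_buffer.write(u"hello")             # io.StringIO accepts text only

byte_buffer = BytesIO()
byte_buffer.write(b"hello")             # io.BytesIO accepts bytes only
```

Each such block is short, but a large code base accumulates many of them, which is the repetition that six is meant to remove.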

Even if you don't use six in your code, I recommend that you read through its documentation just to see where things have changed in Python 3. It'll open your eyes (as it did mine) regarding the behind-the-scenes changes that often aren't discussed in the Python 2/3 world, and it might give you more insights into how to write your code so that it can work in both.

Conclusion

If you're starting to write some new Python code today, you should use Python 3. And if you have Python 2 code that you can upgrade to Python 3, you should do that as well. But if you're like most companies with an existing Python 2 code base, your best option might well be to upgrade incrementally, which means having code that works under 2 and 3 simultaneously. Once you've converted all of your code, and it passes tests under both 2 and 3, you can flip the switch, joining the world of Python 3 and all of its goodness.

Resources

Much has been written about the changes between Python 2 and 3. A great collection of such information is at the Python-Future website. That site offers the futurize and pasteurize tools as well as a great deal of documentation describing the changes between versions, techniques for upgrading and things to watch out for.

The six package has thorough online documentation. Even if you don't use six for 2/3 compatibility, I strongly suggest that you look through its capabilities.

Finally, if you're a web developer using Django, you definitely should read Django's own documentation on moving to Python 3.

This is especially important because of Django's handling of strings, bytes and Unicode strings, the names of which changed a bit over the years. Django actually includes a copy of the six library, modified slightly to suit its needs for internal use.

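Reuven Lerner teaches Python, data science and Git to companies around the world. You can subscribe to his free, weekly "better developers" e-mail list, and learn from his books and courses at http://lerner.co.il. Reuven lives with his wife and children in Modi'in, Israel.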
