Version 0.17.1 (November 21, 2015)

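Note

We are proud to announce that pandas has become a sponsored project of the NumFOCUS organization. This will help ensure the success of pandas' development as a world-class open-source project.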

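This is a minor bug-fix release from 0.17.0. It includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.

Highlights include:

Support for conditional HTML formatting (see below)
Releasing the GIL in the CSV reader and other operations (see Performance improvements)
Fixed a regression in DataFrame.drop_duplicates from 0.16.2 that caused incorrect results on integer values (GH 11376)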

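New features

Conditional HTML formatting

Warning

This is a new feature and is under active development. We will be adding features, and possibly making breaking changes, in future releases. Feedback is welcome in GH 11610.

We've added experimental support for conditional HTML formatting: the visual styling of a DataFrame based on its data. Styling is accomplished with HTML and CSS. The styler class is accessed through the pandas.DataFrame.style attribute, which returns a Styler instance with your data attached.

Here's a quick example: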

In [1]: np.random.seed(123)

In [2]: df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))

In [3]: html = df.style.background_gradient(cmap="viridis", low=0.5)

We can render the HTML to get the following table.

	a	b	c	d	e
0	-1.085631	0.997345	0.282978	-1.506295	-0.5786
1	1.651437	-2.426679	-0.428913	1.265936	-0.86674
2	-0.678886	-0.094709	1.49139	-0.638902	-0.443982
3	-0.434351	2.20593	2.186786	1.004054	0.386186
4	0.737369	1.490732	-0.935834	1.175829	-1.253881
5	-0.637752	0.907105	-1.428681	-0.140069	-0.861755
6	-0.255619	-2.798589	-1.771533	-0.699877	0.927462
7	-0.173636	0.002846	0.688223	-0.879536	0.283627
8	-0.805367	-1.727669	-0.3909	0.573806	0.338589
9	-0.01183	2.392365	0.412912	0.978736	2.238143

The Styler interacts nicely with the Jupyter Notebook. See the documentation for more.
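Beyond the built-in gradients, a custom rule can be written as a function that returns one CSS string per cell. The sketch below assumes a recent pandas in which Styler.apply(func, axis=None) passes the whole DataFrame to func and expects back a DataFrame of CSS strings with the same index and columns, and in which Styler.to_html() exists (pandas 1.3+). highlight_negative is a hypothetical helper, and the sample data is made up. Rendering also needs the optional jinja2 dependency, so the sketch skips rendering if it is missing.

```python
import numpy as np
import pandas as pd


# Hypothetical helper: maps each cell to a CSS declaration string, returning a
# DataFrame with the same index and columns, as Styler.apply(..., axis=None) expects.
def highlight_negative(data: pd.DataFrame) -> pd.DataFrame:
    css = np.where(data.to_numpy() < 0, "color: red", "")
    return pd.DataFrame(css, index=data.index, columns=data.columns)


df = pd.DataFrame({"a": [-1.5, 0.3], "b": [2.0, -0.7]})

# The per-cell CSS that the Styler would attach to the rendered table
styles = highlight_negative(df)
print(styles)

try:
    # Attach the rule to the Styler and render the result to an HTML string
    html = df.style.apply(highlight_negative, axis=None).to_html()
except ImportError:
    # Styler rendering requires jinja2; skip rendering if it is not installed
    html = None
```

Returning a whole frame of styles (axis=None) keeps the rule independent of row or column orientation; per-column rules can instead use the default axis=0.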

Enhancements

DatetimeIndex can now be converted to strings using astype(str) (GH 10442)
Support for compression (gzip/bz2) in pandas.DataFrame.to_csv() (GH 7615)
The pd.read_* functions can now also accept pathlib.Path or py._path.local.LocalPath objects for the filepath_or_buffer argument (GH 11033)
The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with a tilde (e.g. ~/Documents/) (GH 11438)
DataFrame now uses the fields of a namedtuple as columns if columns are not supplied (GH 11181)
DataFrame.itertuples() now returns namedtuple objects when possible (GH 11269, GH 11625)
Added axvlines_kwds to the parallel coordinates plot (GH 10709)
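Several of the items above can be exercised together. This is a sketch written against the current pandas API rather than the 0.17.1-era one; the Point type, the file name points.csv.gz, and the temporary directory are illustrative assumptions.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

import pandas as pd

# DatetimeIndex -> strings via astype(str) (GH 10442)
idx = pd.date_range("2015-01-01", periods=3)
labels = idx.astype(str)

# Build a DataFrame from namedtuples: the field names become the columns (GH 11181)
Point = namedtuple("Point", ["x", "y"])
df = pd.DataFrame([Point(1, 2), Point(3, 4)])

# Write a gzip-compressed CSV to a pathlib.Path and read it back (GH 7615, GH 11033)
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "points.csv.gz"
    df.to_csv(path, index=False, compression="gzip")
    roundtrip = pd.read_csv(path, compression="gzip")

# itertuples() yields namedtuples, so fields are reachable by name (GH 11269)
rows = [(row.Index, row.x, row.y) for row in df.itertuples()]
print(labels, roundtrip, rows, sep="\n")
```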

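Added an option to .info() and .memory_usage() for deep introspection of memory consumption. Note that this can be expensive to compute and is therefore an optional parameter (GH 11595)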

In [4]: df = pd.DataFrame({"A": ["foo"] * 1000})

In [5]: df["B"] = df["A"].astype("category")

# shows the '+' as we have object dtypes
In [6]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   A       1000 non-null   object  
 1   B       1000 non-null   category
dtypes: category(1), object(1)
memory usage: 9.0+ KB

# we have an accurate memory assessment (but can be expensive to compute this)
In [7]: df.info(memory_usage="deep")
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   A       1000 non-null   object  
 1   B       1000 non-null   category
dtypes: category(1), object(1)
memory usage: 59.9 KB
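With object-dtype strings, the shallow figure counts only the array of object pointers, so it understates the real footprint. This minimal sketch compares shallow and deep figures with Series.memory_usage(deep=True), the per-column counterpart of info(memory_usage="deep"). The exact byte counts depend on the Python build, so only their ordering is relied on. The dtype=object argument is an assumption made so that the column stays object-backed even in pandas versions that infer a dedicated string dtype.

```python
import pandas as pd

# An explicit object-dtype column of Python strings, plus a categorical copy
s = pd.Series(["foo"] * 1000, dtype=object)
cat = s.astype("category")

shallow = s.memory_usage()            # pointer array plus the index
deep = s.memory_usage(deep=True)      # also sizes each Python string object
cat_deep = cat.memory_usage(deep=True)  # integer codes plus one category

print(shallow, deep, cat_deep)
```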

Index now has a fillna method (GH 10089)

In [8]: pd.Index([1, np.nan, 3]).fillna(2)
Out[8]: Index([1.0, 2.0, 3.0], dtype='float64')

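The .str.<...> and .dt.<...> accessor methods/properties are now available on Series of dtype category, provided the categories are of the matching type (strings or datetimes, respectively) (GH 10661)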

In [9]: s = pd.Series(list("aabb")).astype("category")

In [10]: s
Out[10]: 
0    a
1    a
2    b
3    b
Length: 4, dtype: category
Categories (2, object): ['a', 'b']

In [11]: s.str.contains("a")
Out[11]: 
0     True
1     True
2    False
3    False
Length: 4, dtype: bool

In [12]: date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category")

In [13]: date
Out[13]: 
0   2015-01-01
1   2015-01-02
2   2015-01-03
3   2015-01-04
4   2015-01-05
Length: 5, dtype: category
Categories (5, datetime64[ns]): [2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05]

In [14]: date.dt.day
Out[14]: 
0    1
1    2
2    3
3    4
4    5
Length: 5, dtype: int32

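pivot_table now has a margins_name argument, so you can use a label other than the default "All" for the margins (GH 3335)
Implemented export of datetime64[ns, tz] dtypes with a fixed-format HDF5 store (GH 11411)
Pretty-printing of sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of the legacy Python syntax (set([x, y])) (GH 11215)
Improved the error message in pandas.io.gbq.to_gbq() when a streaming insert fails (GH 11285) and when the DataFrame does not match the schema of the destination table (GH 11359)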
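A short sketch of two of the items above, using made-up data: margins_name relabels the grand-total row that margins=True adds, and a set stored in an object cell prints with set literal syntax. The label "Total" is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# margins=True adds a grand-total row; margins_name renames it from "All" (GH 3335)
table = pd.pivot_table(
    df, values="val", index="key", aggfunc="sum", margins=True, margins_name="Total"
)
print(table)

# A set stored in an object cell is printed as {1, 2}, not set([1, 2]) (GH 11215)
s = pd.Series([{1, 2}])
print(s)
```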

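API changes

Raise NotImplementedError in Index.shift for unsupported index types (GH 8038)
min and max reductions on Series of datetime64 and timedelta64 dtype now result in NaT and not nan (GH 11245)
Indexing with a null key will now raise a TypeError instead of a ValueError (GH 11356)
Series.ptp will now ignore missing values by default (GH 11163)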
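The min/max change can be checked on all-missing datetimelike data. This minimal sketch, written against current pandas, uses made-up all-NaT Series; the result stays in the datetime domain as NaT instead of degrading to a float nan.

```python
import pandas as pd

# All-missing datetime64 and timedelta64 Series
dt = pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]")
td = pd.Series([pd.NaT], dtype="timedelta64[ns]")

# Reductions return NaT (a datetimelike missing value), not float nan (GH 11245)
dt_min = dt.min()
td_max = td.max()
print(dt_min, td_max)
```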

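Deprecations

The pandas.io.ga module, which implements Google Analytics support, is deprecated and will be removed in a future version (GH 11308)
The engine keyword in .to_csv() is deprecated and will be removed in a future version (GH 11274)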

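Performance improvements

Check monotonicity before sorting on an index (GH 11080)
Improved Series.dropna performance when the dtype cannot hold NaN (GH 11159)
Release the GIL on most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), normalization, and conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (GH 11263)
Release the GIL on some rolling algorithms: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (GH 11450)
Release the GIL when reading and parsing text files in read_csv and read_table (GH 11272)
Improved performance of rolling_median (GH 11450)
Improved performance of to_excel (GH 11352)
Fixed a performance bug in the repr of Categorical categories, which rendered the strings before truncating them for display (GH 11305)
Improved performance of Categorical.remove_unused_categories (GH 11643)
Improved performance of the Series constructor with no data and a DatetimeIndex (GH 11433)
Improved performance of shift, cumprod and cumsum with groupby (GH 4095)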
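Because read_csv releases the GIL while parsing, independent inputs can be parsed on a thread pool. This minimal sketch uses the standard-library concurrent.futures module. The in-memory CSV inputs and the parse helper are hypothetical. Any actual speedup depends on input sizes, the parser engine, and the machine, so the sketch only checks that the results are correct.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Hypothetical inputs: four in-memory CSV "files" with the same two columns
texts = ["a,b\n" + "\n".join(f"{i},{i * k}" for i in range(1000)) for k in range(4)]


def parse(text: str) -> pd.DataFrame:
    # Each call gets its own buffer, so threads share no mutable state
    return pd.read_csv(io.StringIO(text))


# Threads can overlap parsing work because the C parser releases the GIL (GH 11272)
with ThreadPoolExecutor(max_workers=4) as pool:
    frames = list(pool.map(parse, texts))  # map() preserves input order

combined = pd.concat(frames, ignore_index=True)
print(combined.shape)
```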

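Bug fixes

SparseArray.__iter__() no longer causes a PendingDeprecationWarning in Python 3.5 (GH 11622)
Fixed a regression from 0.16.2 in the output formatting of long floats/NaN (GH 11302)
Series.sort_index() now correctly handles the inplace option (GH 11402)
Fixed an incorrectly distributed .c file in the PyPI build that raised an exception when reading a CSV of floats with na_values=<a scalar> (GH 11374)
Bug in .to_latex() producing broken output when the index has a name (GH 10660)
Bug in HDFStore.append with strings whose encoded length exceeded the maximum unencoded length (GH 11234)
Bug in merging datetime64[ns, tz] dtypes (GH 11405)
Bug in HDFStore.select when comparing with a numpy scalar in a where clause (GH 11283)
Bug in using DataFrame.ix with a MultiIndex indexer (GH 11372)
Bug in date_range with ambiguous endpoints (GH 11626)
Prevent adding new attributes to the accessors .str, .dt and .cat. Retrieving such a value was not possible, so setting one now raises an error (GH 10673)
Bug in tz-conversions with ambiguous times and the .dt accessors (GH 11295)
Bug in output formatting when using an index of ambiguous times (GH 11619)
Bug in comparisons of a Series with list-likes (GH 11339)
Bug in DataFrame.replace with a datetime64[ns, tz] and an incompatible to_replace (GH 11326, GH 11153)
Bug in isnull where numpy.datetime64('NaT') in a numpy.array was not determined to be null (GH 11206)
Bug in list-like indexing with a mixed-integer Index (GH 11320)
Bug in pivot_table with margins=True when the indexes are of Categorical dtype (GH 10993)
Bug in DataFrame.plot where hex string color values could not be used (GH 10299)
Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH 11376)
Bug in pd.eval where unary operators inside a list raised an error (GH 11235)
Bug in squeeze() with zero-length arrays (GH 11230, GH 8999)
Bug in describe() dropping column names for hierarchical indexes (GH 11517)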
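Bug in DataFrame.pct_change() not propagating the axis keyword to the .fillna method (GH 11150)
Bug in .to_csv() when a mix of integer and string column names is passed as the columns parameter (GH 11637)
Bug in indexing with a range (GH 11652)
Bug in the inference of numpy scalars and in preserving the dtype when setting columns (GH 11638)
Bug in to_sql raising a UnicodeEncodeError with unicode column names (GH 11431)
Fixed a regression in the setting of xticks in plot (GH 11529)
Bug in holiday.dates where observance rules could not be applied to holidays, plus documentation enhancements (GH 11477, GH 11533)
Fixed plotting issues when using plain Axes instances instead of SubplotAxes (GH 11520, GH 11556)
Bug in DataFrame.to_latex() producing an extra rule when header=False (GH 7124)
Bug in df.groupby(...).apply(func) when func returns a Series containing a new datetimelike column (GH 11324)
Bug in pandas.json when the file to load is large (GH 11344)
Bugs in to_excel with duplicate columns (GH 11007, GH 10982, GH 10970)
Fixed a bug that prevented the construction of an empty Series of dtype datetime64[ns, tz] (GH 11245)
Bug in read_excel with a MultiIndex containing integers (GH 11317)
Bug in to_excel with openpyxl 2.2+ and merging (GH 11408)
Bug in DataFrame.to_dict() producing np.datetime64 objects instead of Timestamp when only datetimes are present in the data (GH 11327)
Bug in DataFrame.corr() raising an exception when computing the Kendall correlation for DataFrames with both boolean and non-boolean columns (GH 11560)
Fixed a link-time error caused by C inline functions on FreeBSD 10+ (with clang) (GH 10510)
Bug in DataFrame.to_csv in passing through arguments for formatting MultiIndexes, including date_format (GH 7791)
Bug in DataFrame.join() with how='right' raising a TypeError (GH 11519)
Bug in Series.quantile where an empty-list result had an Index of object dtype (GH 11588)
Bug in pd.merge returning an empty Int64Index rather than Index(dtype=object) when the merge result is empty (GH 11588)
Bug in Categorical.remove_unused_categories when NaN values are present (GH 11599)
Bug in DataFrame.to_sparse() losing column names for MultiIndexes (GH 11600)
Bug in DataFrame.round() with a non-unique column index producing a Fatal Python error (GH 11611)
Bug in DataFrame.round() with decimals being a non-uniquely indexed Series producing extra columns (GH 11618)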
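A quick check of the behavior that the GH 11376 fix restores, using made-up integer data. drop_duplicates() should remove only rows that are exact duplicates across all columns, while subset= restricts the comparison to the named columns. In both cases the first occurrence is kept by default.

```python
import pandas as pd

# Integer-valued rows where rows 0 and 1 are exact duplicates
df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [10, 10, 20, 21]})

deduped = df.drop_duplicates()             # compares all columns
by_a = df.drop_duplicates(subset=["a"])    # compares column "a" only
print(deduped, by_a, sep="\n")
```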

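Contributors

A total of 63 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.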

Aleksandr Drozd +
Alex Chase +
Anthonios Partheniou
BrenBarn +
Brian J. McGuirk +
Chris
Christian Berendt +
Christian Perez +
Cody Piersall +
Data & Code Expert Experimenting with Code on Data
DrIrv +
Evan Wright
Guillaume Gay
Hamed Saljooghinejad +
Iblis Lin +
Jake VanderPlas
Jan Schulz
Jean-Mathieu Deschenes +
Jeff Reback
Jimmy Callin +
Joris Van den Bossche
K.-Michael Aye
Ka Wo Chen
Loïc Séguin-C +
Luo Yicheng +
Magnus Jöud +
Manuel Leonhardt +
Matthew Gilbert
Maximilian Roos
Michael +
Nicholas Stahl +
Nicolas Bonnotte +
Pastafarianist +
Petra Chong +
Phil Schaf +
Philipp A +
Rob deCarvalho +
Roman Khomenko +
Rémy Léone +
Sebastian Bank +
Sinhrks
Stephan Hoyer
Thierry Moisan
Tom Augspurger
Tux1 +
Varun +
Wieland Hoffmann +
Winterflower
Yoav Ram +
Younggun Kim
Zeke +
ajcr
azuranski +
behzad nouri
cel4
emilydolson +
hironow +
lexual
llllllllll +
rockg
silentquasar +
sinhrks
taeold +