Version 0.9.1 (November 14, 2012)

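This is a bug fix release from 0.9.0. It includes several new features and enhancements along with a large number of bug fixes. The new features include by-column sort order for DataFrame and Series, improved NA handling for the rank method, masking functions for DataFrame, and intraday time-series filtering for DataFrame.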

New features

Series.sort, DataFrame.sort, and DataFrame.sort_index can now be specified per column to support multiple sort orders (GH 928)
In [2]: df = pd.DataFrame(np.random.randint(0, 2, (6, 3)),
   ...:                   columns=['A', 'B', 'C'])

In [3]: df.sort(['A', 'B'], ascending=[1, 0])

Out[3]:
   A  B  C
3  0  1  1
4  0  1  1
2  0  0  1
0  1  0  0
1  1  0  0
5  1  0  0
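`DataFrame.sort` was removed in later pandas releases in favour of `DataFrame.sort_values`. As a sketch assuming a current pandas version, the same two-key sort (A ascending, then B descending within equal A values) can be written as follows; the values are fixed by hand instead of drawn at random so the result is reproducible.

```python
import pandas as pd

# Fixed values (instead of random integers) so the result is reproducible.
df = pd.DataFrame({'A': [1, 1, 0, 0, 0, 1],
                   'B': [0, 0, 0, 1, 1, 0],
                   'C': [0, 0, 1, 1, 1, 0]})

# Ascending on A; within equal A values, descending on B.
result = df.sort_values(by=['A', 'B'], ascending=[True, False])
print(result)
```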
DataFrame.rank now supports additional values for the na_option argument, so missing values can be assigned either the largest or the smallest rank (GH 1508, GH 2159)
In [1]: df = pd.DataFrame(np.random.randn(6, 3), columns=['A', 'B', 'C'])

In [2]: df.loc[2:4] = np.nan

In [3]: df.rank()
Out[3]: 
     A    B    C
0  3.0  2.0  1.0
1  1.0  3.0  2.0
2  NaN  NaN  NaN
3  NaN  NaN  NaN
4  NaN  NaN  NaN
5  2.0  1.0  3.0

[6 rows x 3 columns]

In [4]: df.rank(na_option='top')
Out[4]: 
     A    B    C
0  6.0  5.0  4.0
1  4.0  6.0  5.0
2  2.0  2.0  2.0
3  2.0  2.0  2.0
4  2.0  2.0  2.0
5  5.0  4.0  6.0

[6 rows x 3 columns]

In [5]: df.rank(na_option='bottom')
Out[5]: 
     A    B    C
0  3.0  2.0  1.0
1  1.0  3.0  2.0
2  5.0  5.0  5.0
3  5.0  5.0  5.0
4  5.0  5.0  5.0
5  2.0  1.0  3.0

[6 rows x 3 columns]
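The output above shows that all missing values in a column receive the same averaged rank. A small deterministic sketch on a Series (assuming current pandas and the default `method='average'`) makes that averaging explicit: with two NaNs, `'top'` gives them ranks 1 and 2 (average 1.5) and `'bottom'` gives them ranks 3 and 4 (average 3.5).

```python
import numpy as np
import pandas as pd

# Two missing values so the tie-averaging of NaN ranks is visible.
s = pd.Series([3.0, np.nan, 1.0, np.nan])

keep = s.rank()                      # default na_option='keep': NaN stays NaN
top = s.rank(na_option='top')        # NaNs share ranks 1 and 2 -> 1.5 each
bottom = s.rank(na_option='bottom')  # NaNs share ranks 3 and 4 -> 3.5 each

print(pd.DataFrame({'keep': keep, 'top': top, 'bottom': bottom}))
```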
DataFrame has new where and mask methods to select values according to a given boolean mask (GH 2109, GH 2151)
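DataFrame already supports slicing with a boolean vector of the same length as the DataFrame (inside the []). The returned DataFrame has the same columns as the original, but only the rows where the vector is True.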
In [6]: df = pd.DataFrame(np.random.randn(5, 3), columns=['A', 'B', 'C'])

In [7]: df
Out[7]: 
          A         B         C
0  0.276232 -1.087401 -0.673690
1  0.113648 -1.478427  0.524988
2  0.404705  0.577046 -1.715002
3 -1.039268 -0.370647 -1.157892
4 -1.344312  0.844885  1.075770

[5 rows x 3 columns]

In [8]: df[df['A'] > 0]
Out[8]: 
          A         B         C
0  0.276232 -1.087401 -0.673690
1  0.113648 -1.478427  0.524988
2  0.404705  0.577046 -1.715002

[3 rows x 3 columns]
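A minimal deterministic sketch of the same row selection, assuming current pandas: the boolean Series has one entry per row, and only the rows marked True are kept, with every column intact.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, -2.0, 3.0], 'B': [4.0, 5.0, -6.0]})

row_mask = df['A'] > 0   # boolean Series with one entry per row
rows = df[row_mask]      # rows where row_mask is True, all columns kept
print(rows)
```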
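If a DataFrame is sliced with a DataFrame-based boolean condition (of the same size as the original DataFrame), a DataFrame of the same size (index and columns) as the original is returned, with NaN in every element that does not meet the condition. This is accomplished via the new method DataFrame.where. In addition, where takes an optional other argument that supplies the replacement values.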
In [9]: df[df > 0]
Out[9]: 
          A         B         C
0  0.276232       NaN       NaN
1  0.113648       NaN  0.524988
2  0.404705  0.577046       NaN
3       NaN       NaN       NaN
4       NaN  0.844885  1.075770

[5 rows x 3 columns]

In [10]: df.where(df > 0)
Out[10]: 
          A         B         C
0  0.276232       NaN       NaN
1  0.113648       NaN  0.524988
2  0.404705  0.577046       NaN
3       NaN       NaN       NaN
4       NaN  0.844885  1.075770

[5 rows x 3 columns]

In [11]: df.where(df > 0, -df)
Out[11]: 
          A         B         C
0  0.276232  1.087401  0.673690
1  0.113648  1.478427  0.524988
2  0.404705  0.577046  1.715002
3  1.039268  0.370647  1.157892
4  1.344312  0.844885  1.075770

[5 rows x 3 columns]
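The equivalences shown above can be checked on a tiny fixed frame. This sketch assumes current pandas: boolean-frame indexing gives the same result as `where` without `other`, and passing `-df` as `other` turns every entry into its absolute value.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, -2.0], 'B': [-3.0, 4.0]})

masked = df.where(df > 0)         # entries failing the condition become NaN
via_getitem = df[df > 0]          # same result through boolean-frame indexing
flipped = df.where(df > 0, -df)   # failing entries taken from `other` (-df)
print(flipped)
```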
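Furthermore, where now aligns the input boolean condition (ndarray or DataFrame), so that partial selection with setting is possible. This is analogous to partial setting via .ix (but on the contents rather than the axis labels)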
In [12]: df2 = df.copy()

In [13]: df2[df2[1:4] > 0] = 3

In [14]: df2
Out[14]: 
          A         B         C
0  0.276232 -1.087401 -0.673690
1  3.000000 -1.478427  3.000000
2  3.000000  3.000000 -1.715002
3 -1.039268 -0.370647 -1.157892
4 -1.344312  0.844885  1.075770

[5 rows x 3 columns]
DataFrame.mask is the inverse boolean operation of where.
In [15]: df.mask(df <= 0)
Out[15]: 
          A         B         C
0  0.276232       NaN       NaN
1  0.113648       NaN  0.524988
2  0.404705  0.577046       NaN
3       NaN       NaN       NaN
4       NaN  0.844885  1.075770

[5 rows x 3 columns]
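Because mask is the inverse of where, `df.mask(cond)` should match `df.where(~cond)`. A short sketch assuming current pandas, using fixed values:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, -2.0], 'B': [-3.0, 4.0]})
cond = df <= 0

via_mask = df.mask(cond)     # NaN where cond is True
via_where = df.where(~cond)  # NaN where the inverted condition is False
print(via_mask)
```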
Enable referencing of Excel columns by their column names (GH 1936)
In [1]: xl = pd.ExcelFile('data/test.xls')

In [2]: xl.parse('Sheet1', index_col=0, parse_dates=True,
   ...:          parse_cols='A:D')
Added an option to disable pandas-style tick locators and formatters using series.plot(x_compat=True) or pandas.plot_params['x_compat'] = True (GH 2205)

The existing TimeSeries methods at_time and between_time were added to DataFrame (GH 2149)
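A minimal sketch of intraday filtering on a DataFrame, assuming current pandas and an hourly DatetimeIndex chosen for illustration: at_time keeps rows stamped exactly at a clock time, and between_time keeps rows inside a clock-time window (both bounds inclusive by default).

```python
import pandas as pd

# Hourly timestamps from 09:00 to 14:00 on a single day.
idx = pd.date_range('2012-11-14 09:00', periods=6, freq='60min')
df = pd.DataFrame({'value': range(6)}, index=idx)

at_noon = df.at_time('12:00')                     # only the 12:00 row
late_morning = df.between_time('09:30', '11:30')  # the 10:00 and 11:00 rows
print(late_morning)
```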

DataFrame.dot can now accept ndarrays (GH 2042)
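A short sketch of DataFrame.dot with ndarray arguments, assuming current pandas: a 1-D array (one weight per column) yields a Series indexed like the DataFrame, while a 2-D array yields a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])

weights = np.array([10, 1])   # 1-D: one weight per column
vec = df.dot(weights)         # Series: [1*10 + 2*1, 3*10 + 4*1]

mat = df.dot(np.eye(2))       # 2-D ndarray gives back a DataFrame
print(vec)
```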

DataFrame.drop now supports non-unique indexes (GH 2101)
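A minimal sketch of dropping a duplicated label, assuming current pandas: every row carrying that label is removed.

```python
import pandas as pd

# 'a' appears twice in the index.
df = pd.DataFrame({'x': [1, 2, 3, 4]}, index=['a', 'b', 'a', 'c'])

dropped = df.drop('a')   # removes every row labelled 'a'
print(dropped)
```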

Panel.shift now supports negative periods (GH 2164)
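Panel no longer exists in current pandas (it was removed in 0.25.0), so this sketch illustrates what a negative period means using DataFrame.shift instead: values move toward earlier rows, leaving NaN at the end.

```python
import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0]})

lead = df.shift(-1)   # each row takes the next row's value; the last row becomes NaN
lag = df.shift(1)     # positive periods shift the other way; the first row becomes NaN
print(lead)
```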

DataFrame now supports the unary ~ operator (GH 2110)
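A minimal sketch, assuming current pandas: on a boolean DataFrame, unary ~ performs element-wise logical NOT.

```python
import pandas as pd

flags = pd.DataFrame({'a': [True, False], 'b': [False, False]})
inverted = ~flags   # element-wise logical NOT on a boolean DataFrame
print(inverted)
```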

API changes

Upsampling data with a PeriodIndex will result in a higher-frequency TimeSeries that spans the original time window

In [1]: prng = pd.period_range('2012Q1', periods=2, freq='Q')

In [2]: s = pd.Series(np.random.randn(len(prng)), prng)

In [4]: s.resample('M')
Out[4]:
2012-01   -1.471992
2012-02         NaN
2012-03         NaN
2012-04   -0.493593
2012-05         NaN
2012-06         NaN
Freq: M, dtype: float64
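To make "spans the original time window" concrete, here is a manual sketch (not how resample is implemented) assuming current pandas and fixed values: two quarters cover 2012-01 through 2012-06, each quarterly value lands on the first month of its quarter, and the remaining months are NaN, matching the shape of the output above.

```python
import pandas as pd

prng = pd.period_range('2012Q1', periods=2, freq='Q')
s = pd.Series([1.0, 2.0], index=prng)

# Monthly periods covering the whole original window: 2012-01 through 2012-06.
start = prng[0].asfreq('M', how='start')
end = prng[-1].asfreq('M', how='end')
monthly = pd.period_range(start, end, freq='M')

# Place each quarterly value at the first month of its quarter, NaN elsewhere.
s_start = s.copy()
s_start.index = s.index.asfreq('M', how='start')
upsampled = s_start.reindex(monthly)
print(upsampled)
```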

Period.end_time now returns the last nanosecond in the time interval (GH 2124, GH 2125, GH 1764)

In [16]: p = pd.Period('2012')

In [17]: p.end_time
Out[17]: Timestamp('2012-12-31 23:59:59.999999999')
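One way to see that end_time is the last nanosecond of the interval: the start of the next period is exactly one nanosecond later. A short check, assuming current pandas:

```python
import pandas as pd

p = pd.Period('2012')              # annual period covering all of 2012
end = p.end_time
next_start = (p + 1).start_time    # first instant of 2013

# The gap between the two is a single nanosecond.
gap = next_start - end
print(end, gap)
```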

File parsers no longer coerce to float or bool for columns that have custom converters specified (GH 2184)

In [18]: import io

In [19]: data = ('A,B,C\n'
   ....:         '00001,001,5\n'
   ....:         '00002,002,6')
   ....: 

In [20]: pd.read_csv(io.StringIO(data), converters={'A': lambda x: x.strip()})
Out[20]: 
       A  B  C
0  00001  1  5
1  00002  2  6

[2 rows x 3 columns]
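For contrast, a sketch assuming current pandas that parses the same data with and without the converter: without it, column A is inferred as integers and the leading zeros are lost; with it, A keeps the original strings, while B (which has no converter) is still parsed as integers.

```python
import io
import pandas as pd

data = 'A,B,C\n00001,001,5\n00002,002,6'

plain = pd.read_csv(io.StringIO(data))   # A is inferred as integers; zeros are lost
kept = pd.read_csv(io.StringIO(data),
                   converters={'A': lambda x: x.strip()})  # A stays as strings
print(plain)
print(kept)
```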

See the full release notes or the issue tracker on GitHub for a complete list.

Contributors

A total of 11 people contributed patches to this release. People with a "+" by their names contributed a patch for the first time.

Brenda Moon +
Chang She
Jeff Reback +
Justin C Johnson +
K.-Michael Aye
Martin Blais
Tobias Brandt +
Wes McKinney
Wouter Overmeire
timmie
y-p