DataFrame#

Constructor#

DataFrame([data, index, columns, dtype, copy])

Two-dimensional, size-mutable, potentially heterogeneous tabular data.
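As a minimal sketch of the constructor, assuming pandas is installed and imported as `pd`, the following builds one frame from a nested list and one from a dict of columns. The labels and values are made up for illustration.

```python
import pandas as pd

# Build from a nested list, supplying row labels, column labels, and a dtype.
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=["r1", "r2"],
    columns=["a", "b"],
    dtype="float64",
)

# Build from a dict: keys become column labels, rows get a default integer index.
from_columns = pd.DataFrame({"a": [1, 3], "b": [2, 4]})

print(df)
print(from_columns)
```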

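Attributes and underlying data#

Axes

`DataFrame.index`	The index (row labels) of the DataFrame.
`DataFrame.columns`	The column labels of the DataFrame.

`DataFrame.dtypes`	Return the dtypes in the DataFrame.
`DataFrame.info`([verbose, buf, max_cols, ...])	Print a concise summary of a DataFrame.
`DataFrame.select_dtypes`([include, exclude])	Return a subset of the DataFrame's columns based on the column dtypes.
`DataFrame.values`	Return a NumPy representation of the DataFrame.
`DataFrame.axes`	Return a list representing the axes of the DataFrame.
`DataFrame.ndim`	Return an int representing the number of axes / array dimensions.
`DataFrame.size`	Return an int representing the number of elements in this object.
`DataFrame.shape`	Return a tuple representing the dimensionality of the DataFrame.
`DataFrame.memory_usage`([index, deep])	Return the memory usage of each column in bytes.
`DataFrame.empty`	Indicator whether Series/DataFrame is empty.
`DataFrame.set_flags`(*[, copy, ...])	Return a new object with updated flags.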
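The attributes above can be inspected on a small frame. This is an illustrative sketch only; the column names `name` and `score` are assumptions, not part of the reference.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["a", "b", "c"], "score": [1.5, 2.0, 3.5]},
    index=["r1", "r2", "r3"],
)

print(df.shape)          # (number of rows, number of columns)
print(df.ndim)           # number of axes
print(df.size)           # rows * columns
print(df.empty)          # whether the frame has no elements
print(list(df.columns))  # column labels
print(df.dtypes)         # one dtype per column

# Keep only the numeric columns.
numeric = df.select_dtypes(include="number")
```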

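Conversion#

`DataFrame.astype`(dtype[, copy, errors])	Cast a pandas object to a specified dtype `dtype`.
`DataFrame.convert_dtypes`([infer_objects, ...])	Convert columns to the best possible dtypes using dtypes supporting `pd.NA`.
`DataFrame.infer_objects`([copy])	Attempt to infer better dtypes for object columns.
`DataFrame.copy`([deep])	Make a copy of this object's indices and data.
`DataFrame.bool`()	(DEPRECATED) Return the bool of a single element Series or DataFrame.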
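A short sketch of the conversion methods, assuming pandas imported as `pd`. The exact dtypes chosen by `convert_dtypes` can vary between pandas versions, so the comments describe the general behavior rather than a guaranteed result.

```python
import pandas as pd

df = pd.DataFrame({"n": ["1", "2", "3"], "x": [1.0, 2.0, 3.0]})

# Cast a single column by passing a {column: dtype} mapping.
converted = df.astype({"n": "int64"})

# Let pandas pick nullable dtypes that support pd.NA.
nullable = df.convert_dtypes()

# A deep copy owns its data, so editing it leaves the original untouched.
deep = df.copy(deep=True)
deep.loc[0, "x"] = 99.0

print(converted.dtypes)
print(nullable.dtypes)
```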

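Indexing, iteration#

`DataFrame.head`([n])	Return the first n rows.
`DataFrame.at`	Access a single value for a row/column label pair.
`DataFrame.iat`	Access a single value for a row/column pair by integer position.
`DataFrame.loc`	Access a group of rows and columns by label(s) or a boolean array.
`DataFrame.iloc`	Purely integer-location based indexing for selection by position.
`DataFrame.insert`(loc, column, value[, ...])	Insert column into DataFrame at specified location.
`DataFrame.__iter__`()	Iterate over info axis.
`DataFrame.items`()	Iterate over (column name, Series) pairs.
`DataFrame.keys`()	Get the 'info axis' (see Indexing for more).
`DataFrame.iterrows`()	Iterate over DataFrame rows as (index, Series) pairs.
`DataFrame.itertuples`([index, name])	Iterate over DataFrame rows as namedtuples.
`DataFrame.pop`(item)	Return item and drop it from the frame.
`DataFrame.tail`([n])	Return the last n rows.
`DataFrame.xs`(key[, axis, level, drop_level])	Return cross-section from the Series/DataFrame.
`DataFrame.get`(key[, default])	Get item from object for given key (e.g. a DataFrame column).
`DataFrame.isin`(values)	Whether each element in the DataFrame is contained in values.
`DataFrame.where`(cond[, other, inplace, ...])	Replace values where the condition is False.
`DataFrame.mask`(cond[, other, inplace, axis, ...])	Replace values where the condition is True.
`DataFrame.query`(expr, *[, inplace])	Query the columns of a DataFrame with a boolean expression.

For more information on .at, .iat, .loc, and .iloc, see the indexing documentation.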
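To make the difference between label-based and position-based access concrete, here is a small sketch. The frame and its labels are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}, index=["x", "y", "z"])

by_label = df.at["y", "b"]         # single value by row/column label
by_position = df.iat[0, 1]         # single value by integer position
subset = df.loc[["x", "z"], "a"]   # rows "x" and "z" of column "a", by label
first_two = df.iloc[0:2, :]        # first two rows, by position

# where keeps values where the condition holds; mask does the opposite.
kept = df.where(df > 2, other=0)
hidden = df.mask(df > 2, other=0)

# query filters rows with a boolean expression over column names.
matched = df.query("a > 1 and b < 30")
```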

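Binary operator functions#

`DataFrame.__add__`(other)	Get Addition of DataFrame and other, column-wise.
`DataFrame.add`(other[, axis, level, fill_value])	Get Addition of DataFrame and other, element-wise (binary operator add).
`DataFrame.sub`(other[, axis, level, fill_value])	Get Subtraction of DataFrame and other, element-wise (binary operator sub).
`DataFrame.mul`(other[, axis, level, fill_value])	Get Multiplication of DataFrame and other, element-wise (binary operator mul).
`DataFrame.div`(other[, axis, level, fill_value])	Get Floating division of DataFrame and other, element-wise (binary operator truediv).
`DataFrame.truediv`(other[, axis, level, ...])	Get Floating division of DataFrame and other, element-wise (binary operator truediv).
`DataFrame.floordiv`(other[, axis, level, ...])	Get Integer division of DataFrame and other, element-wise (binary operator floordiv).
`DataFrame.mod`(other[, axis, level, fill_value])	Get Modulo of DataFrame and other, element-wise (binary operator mod).
`DataFrame.pow`(other[, axis, level, fill_value])	Get Exponential power of DataFrame and other, element-wise (binary operator pow).
`DataFrame.dot`(other)	Compute the matrix multiplication between the DataFrame and other.
`DataFrame.radd`(other[, axis, level, fill_value])	Get Addition of DataFrame and other, element-wise (binary operator radd).
`DataFrame.rsub`(other[, axis, level, fill_value])	Get Subtraction of DataFrame and other, element-wise (binary operator rsub).
`DataFrame.rmul`(other[, axis, level, fill_value])	Get Multiplication of DataFrame and other, element-wise (binary operator rmul).
`DataFrame.rdiv`(other[, axis, level, fill_value])	Get Floating division of DataFrame and other, element-wise (binary operator rtruediv).
`DataFrame.rtruediv`(other[, axis, level, ...])	Get Floating division of DataFrame and other, element-wise (binary operator rtruediv).
`DataFrame.rfloordiv`(other[, axis, level, ...])	Get Integer division of DataFrame and other, element-wise (binary operator rfloordiv).
`DataFrame.rmod`(other[, axis, level, fill_value])	Get Modulo of DataFrame and other, element-wise (binary operator rmod).
`DataFrame.rpow`(other[, axis, level, fill_value])	Get Exponential power of DataFrame and other, element-wise (binary operator rpow).
`DataFrame.lt`(other[, axis, level])	Get Less than of DataFrame and other, element-wise (binary operator lt).
`DataFrame.gt`(other[, axis, level])	Get Greater than of DataFrame and other, element-wise (binary operator gt).
`DataFrame.le`(other[, axis, level])	Get Less than or equal to of DataFrame and other, element-wise (binary operator le).
`DataFrame.ge`(other[, axis, level])	Get Greater than or equal to of DataFrame and other, element-wise (binary operator ge).
`DataFrame.ne`(other[, axis, level])	Get Not equal to of DataFrame and other, element-wise (binary operator ne).
`DataFrame.eq`(other[, axis, level])	Get Equal to of DataFrame and other, element-wise (binary operator eq).
`DataFrame.combine`(other, func[, fill_value, ...])	Perform column-wise combine with another DataFrame.
`DataFrame.combine_first`(other)	Update null elements with value in the same location in other.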
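The method forms differ from the plain operators mainly through `fill_value`, and the `r*` variants swap the operand order. A small sketch with made-up data:

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"a": [1.0, nan], "b": [3.0, 4.0]})
other = pd.DataFrame({"a": [10.0, 20.0], "b": [30.0, nan]})

# The + operator yields NaN wherever either side is missing.
plain = df + other

# add() with fill_value substitutes 0 for a value missing on one side only.
filled = df.add(other, fill_value=0)

# rsub is the reflected form: it computes 1 - df rather than df - 1.
reflected = df.rsub(1)

# combine_first fills the holes in df with values from other.
patched = df.combine_first(other)
```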

Function application, GroupBy & window#

`DataFrame.apply`(func[, axis, raw, ...])	Apply a function along an axis of the DataFrame.
`DataFrame.map`(func[, na_action])	Apply a function to a DataFrame elementwise.
`DataFrame.applymap`(func[, na_action])	(DEPRECATED) Apply a function to a DataFrame elementwise.
`DataFrame.pipe`(func, *args, **kwargs)	Apply chainable functions that expect Series or DataFrames.
`DataFrame.agg`([func, axis])	Aggregate using one or more operations over the specified axis.
`DataFrame.aggregate`([func, axis])	Aggregate using one or more operations over the specified axis.
`DataFrame.transform`(func[, axis])	Call `func` on self, producing a DataFrame with the same axis shape as self.
`DataFrame.groupby`([by, axis, level, ...])	Group DataFrame using a mapper or by a Series of columns.
`DataFrame.rolling`(window[, min_periods, ...])	Provide rolling window calculations.
`DataFrame.expanding`([min_periods, axis, method])	Provide expanding window calculations.
`DataFrame.ewm`([com, span, halflife, alpha, ...])	Provide exponentially weighted (EW) calculations.
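The sketch below touches several of these methods on a small frame. The helper `add_double` and the column names are hypothetical, introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# apply: run a function once per column (the default axis).
col_range = df[["val"]].apply(lambda s: s.max() - s.min())

# agg: several reductions at once.
summary = df[["val"]].agg(["min", "max"])

# groupby: split rows by key, then reduce each group.
totals = df.groupby("key")["val"].sum()

# rolling: mean over a sliding window of two rows.
rolled = df[["val"]].rolling(window=2).mean()


def add_double(frame, col):
    """Hypothetical helper: return a copy with a doubled version of col."""
    return frame.assign(double=frame[col] * 2)


# pipe: pass the frame as the first argument, enabling method chaining.
piped = df.pipe(add_double, "val")
```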

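Computations / descriptive stats#

`DataFrame.abs`()	Return a Series/DataFrame with absolute numeric value of each element.
`DataFrame.all`([axis, bool_only, skipna])	Return whether all elements are True, potentially over an axis.
`DataFrame.any`(*[, axis, bool_only, skipna])	Return whether any element is True, potentially over an axis.
`DataFrame.clip`([lower, upper, axis, inplace])	Trim values at input threshold(s).
`DataFrame.corr`([method, min_periods, ...])	Compute pairwise correlation of columns, excluding NA/null values.
`DataFrame.corrwith`(other[, axis, drop, ...])	Compute pairwise correlation.
`DataFrame.count`([axis, numeric_only])	Count non-NA cells for each column or row.
`DataFrame.cov`([min_periods, ddof, numeric_only])	Compute pairwise covariance of columns, excluding NA/null values.
`DataFrame.cummax`([axis, skipna])	Return cumulative maximum over a DataFrame or Series axis.
`DataFrame.cummin`([axis, skipna])	Return cumulative minimum over a DataFrame or Series axis.
`DataFrame.cumprod`([axis, skipna])	Return cumulative product over a DataFrame or Series axis.
`DataFrame.cumsum`([axis, skipna])	Return cumulative sum over a DataFrame or Series axis.
`DataFrame.describe`([percentiles, include, ...])	Generate descriptive statistics.
`DataFrame.diff`([periods, axis])	First discrete difference of element.
`DataFrame.eval`(expr, *[, inplace])	Evaluate a string describing operations on DataFrame columns.
`DataFrame.kurt`([axis, skipna, numeric_only])	Return unbiased kurtosis over requested axis.
`DataFrame.kurtosis`([axis, skipna, numeric_only])	Return unbiased kurtosis over requested axis.
`DataFrame.max`([axis, skipna, numeric_only])	Return the maximum of the values over the requested axis.
`DataFrame.mean`([axis, skipna, numeric_only])	Return the mean of the values over the requested axis.
`DataFrame.median`([axis, skipna, numeric_only])	Return the median of the values over the requested axis.
`DataFrame.min`([axis, skipna, numeric_only])	Return the minimum of the values over the requested axis.
`DataFrame.mode`([axis, numeric_only, dropna])	Get the mode(s) of each element along the selected axis.
`DataFrame.pct_change`([periods, fill_method, ...])	Fractional change between the current and a prior element.
`DataFrame.prod`([axis, skipna, numeric_only, ...])	Return the product of the values over the requested axis.
`DataFrame.product`([axis, skipna, ...])	Return the product of the values over the requested axis.
`DataFrame.quantile`([q, axis, numeric_only, ...])	Return values at the given quantile over requested axis.
`DataFrame.rank`([axis, method, numeric_only, ...])	Compute numerical data ranks (1 through n) along axis.
`DataFrame.round`([decimals])	Round a DataFrame to a variable number of decimal places.
`DataFrame.sem`([axis, skipna, ddof, numeric_only])	Return unbiased standard error of the mean over requested axis.
`DataFrame.skew`([axis, skipna, numeric_only])	Return unbiased skew over requested axis.
`DataFrame.sum`([axis, skipna, numeric_only, ...])	Return the sum of the values over the requested axis.
`DataFrame.std`([axis, skipna, ddof, numeric_only])	Return sample standard deviation over requested axis.
`DataFrame.var`([axis, skipna, ddof, numeric_only])	Return unbiased variance over requested axis.
`DataFrame.nunique`([axis, dropna])	Count number of distinct elements in specified axis.
`DataFrame.value_counts`([subset, normalize, ...])	Return a Series containing the frequency of each distinct row in the DataFrame.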
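Most of these reductions return one value per column by default. The sketch below uses two perfectly correlated columns, chosen so the results are easy to check by hand.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 4.0, 6.0, 8.0]})

means = df.mean()            # per-column mean
running = df.cumsum()        # cumulative sum down each column
variances = df.var()         # unbiased variance (ddof=1 by default)
correlation = df.corr()      # pairwise Pearson correlation matrix
medians = df.quantile(0.5)   # per-column median via quantile
overview = df.describe()     # count, mean, std, min, quartiles, max

print(overview)
```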

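Reindexing / selection / label manipulation#

`DataFrame.add_prefix`(prefix[, axis])	Prefix labels with string prefix.
`DataFrame.add_suffix`(suffix[, axis])	Suffix labels with string suffix.
`DataFrame.align`(other[, join, axis, level, ...])	Align two objects on their axes with the specified join method.
`DataFrame.at_time`(time[, asof, axis])	Select values at particular time of day (e.g., 9:30 AM).
`DataFrame.between_time`(start_time, end_time)	Select values between particular times of the day (e.g., 9:00-9:30 AM).
`DataFrame.drop`([labels, axis, index, ...])	Drop specified labels from rows or columns.
`DataFrame.drop_duplicates`([subset, keep, ...])	Return DataFrame with duplicate rows removed.
`DataFrame.duplicated`([subset, keep])	Return boolean Series denoting duplicate rows.
`DataFrame.equals`(other)	Test whether two objects contain the same elements.
`DataFrame.filter`([items, like, regex, axis])	Subset the DataFrame rows or columns according to the specified index labels.
`DataFrame.first`(offset)	(DEPRECATED) Select initial periods of time series data based on a date offset.
`DataFrame.head`([n])	Return the first n rows.
`DataFrame.idxmax`([axis, skipna, numeric_only])	Return index of first occurrence of maximum over requested axis.
`DataFrame.idxmin`([axis, skipna, numeric_only])	Return index of first occurrence of minimum over requested axis.
`DataFrame.last`(offset)	(DEPRECATED) Select final periods of time series data based on a date offset.
`DataFrame.reindex`([labels, index, columns, ...])	Conform DataFrame to new index with optional filling logic.
`DataFrame.reindex_like`(other[, method, ...])	Return an object with matching indices as other object.
`DataFrame.rename`([mapper, index, columns, ...])	Rename columns or index labels.
`DataFrame.rename_axis`([mapper, index, ...])	Set the name of the axis for the index or columns.
`DataFrame.reset_index`([level, drop, ...])	Reset the index, or a level of it.
`DataFrame.sample`([n, frac, replace, ...])	Return a random sample of items from an axis of object.
`DataFrame.set_axis`(labels, *[, axis, copy])	Assign desired index to given axis.
`DataFrame.set_index`(keys, *[, drop, append, ...])	Set the DataFrame index using existing columns.
`DataFrame.tail`([n])	Return the last n rows.
`DataFrame.take`(indices[, axis])	Return the elements in the given positional indices along an axis.
`DataFrame.truncate`([before, after, axis, copy])	Truncate a Series or DataFrame before and after some index value.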
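A brief sketch of the most common label operations. The `id` and `v` columns are placeholders for this example.

```python
import pandas as pd

df = pd.DataFrame({"id": [10, 20, 30], "v": [1.0, 2.0, 3.0]})

# Promote a column to the index, then move it back.
indexed = df.set_index("id")
restored = indexed.reset_index()

# reindex conforms to the requested labels; unknown labels become NaN rows.
conformed = indexed.reindex([10, 20, 40])

# rename changes labels; drop removes them.
renamed = indexed.rename(columns={"v": "value"})
dropped = indexed.drop(index=[20])
```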

Missing data handling#

`DataFrame.backfill`(*[, axis, inplace, ...])	(DEPRECATED) Fill NA/NaN values by using the next valid observation to fill the gap.
`DataFrame.bfill`(*[, axis, inplace, limit, ...])	Fill NA/NaN values by using the next valid observation to fill the gap.
`DataFrame.dropna`(*[, axis, how, thresh, ...])	Remove missing values.
`DataFrame.ffill`(*[, axis, inplace, limit, ...])	Fill NA/NaN values by propagating the last valid observation to next valid.
`DataFrame.fillna`([value, method, axis, ...])	Fill NA/NaN values using the specified method.
`DataFrame.interpolate`([method, axis, limit, ...])	Fill NaN values using an interpolation method.
`DataFrame.isna`()	Detect missing values.
`DataFrame.isnull`()	DataFrame.isnull is an alias for DataFrame.isna.
`DataFrame.notna`()	Detect existing (non-missing) values.
`DataFrame.notnull`()	DataFrame.notnull is an alias for DataFrame.notna.
`DataFrame.pad`(*[, axis, inplace, limit, ...])	(DEPRECATED) Fill NA/NaN values by propagating the last valid observation to next valid.
`DataFrame.replace`([to_replace, value, ...])	Replace values given in to_replace with value.
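The fill directions are easy to confuse, so here is a small sketch contrasting them. The data is invented so that every row contains at least one missing value.

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame({"a": [1.0, nan, 3.0], "b": [nan, 2.0, nan]})

missing_per_column = df.isna().sum()

forward = df.ffill()        # carry the last valid value downward
backward = df.bfill()       # pull the next valid value upward
constant = df.fillna(0)     # replace every NaN with a fixed value
linear = df.interpolate()   # linear interpolation between valid neighbors

# Drop every row that contains at least one missing value.
complete_rows = df.dropna(how="any")
```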

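Reshaping, sorting, transposing#

`DataFrame.droplevel`(level[, axis])	Return Series/DataFrame with requested index / column level(s) removed.
`DataFrame.pivot`(*, columns[, index, values])	Return reshaped DataFrame organized by given index / column values.
`DataFrame.pivot_table`([values, index, ...])	Create a spreadsheet-style pivot table as a DataFrame.
`DataFrame.reorder_levels`(order[, axis])	Rearrange index levels using input order.
`DataFrame.sort_values`(by, *[, axis, ...])	Sort by the values along either axis.
`DataFrame.sort_index`(*[, axis, level, ...])	Sort object by labels (along an axis).
`DataFrame.nlargest`(n, columns[, keep])	Return the first n rows ordered by columns in descending order.
`DataFrame.nsmallest`(n, columns[, keep])	Return the first n rows ordered by columns in ascending order.
`DataFrame.swaplevel`([i, j, axis])	Swap levels i and j in a `MultiIndex`.
`DataFrame.stack`([level, dropna, sort, ...])	Stack the prescribed level(s) from columns to index.
`DataFrame.unstack`([level, fill_value, sort])	Pivot a level of the (necessarily hierarchical) index labels.
`DataFrame.swapaxes`(axis1, axis2[, copy])	(DEPRECATED) Interchange axes and swap values axes appropriately.
`DataFrame.melt`([id_vars, value_vars, ...])	Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.
`DataFrame.explode`(column[, ignore_index])	Transform each element of a list-like to a row, replicating index values.
`DataFrame.squeeze`([axis])	Squeeze 1 dimensional axis objects into scalars.
`DataFrame.to_xarray`()	Return an xarray object from the pandas object.
`DataFrame.T`	The transpose of the DataFrame.
`DataFrame.transpose`(*args[, copy])	Transpose index and columns.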
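`melt` and `pivot` are roughly inverse operations, which the following sketch illustrates on a tiny wide table. The `q1`/`q2` column names are assumptions for the example.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "q1": [10, 30], "q2": [20, 40]})

# Wide to long: one row per (id, quarter) pair.
long = wide.melt(id_vars="id", var_name="quarter", value_name="sales")

# Long back to wide: quarters become columns again.
back = long.pivot(index="id", columns="quarter", values="sales")

# Sorting and top-n selection.
by_q1_desc = wide.sort_values("q1", ascending=False)
top = wide.nlargest(1, "q2")

# Transpose swaps rows and columns.
flipped = wide.T
```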

Combining / comparing / joining / merging#

`DataFrame.assign`(**kwargs)	Assign new columns to a DataFrame.
`DataFrame.compare`(other[, align_axis, ...])	Compare to another DataFrame and show the differences.
`DataFrame.join`(other[, on, how, lsuffix, ...])	Join columns of another DataFrame.
`DataFrame.merge`(right[, how, on, left_on, ...])	Merge DataFrame or named Series objects with a database-style join.
`DataFrame.update`(other[, join, overwrite, ...])	Modify in place using non-NA values from another DataFrame.
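A short sketch contrasting `merge`, `join`, `assign`, and `update`. The key column `k` and the other names are illustrative only.

```python
import pandas as pd

left = pd.DataFrame({"k": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"k": [2, 3, 4], "r": ["x", "y", "z"]})

# merge matches rows on a key column, database-style.
inner = left.merge(right, on="k", how="inner")
outer = left.merge(right, on="k", how="outer")

# join matches on the index instead.
joined = left.set_index("k").join(right.set_index("k"), how="left")

# assign returns a new frame with an extra column.
assigned = left.assign(k2=lambda d: d["k"] * 2)

# update overwrites in place, using only the non-NA values of the other frame.
base = pd.DataFrame({"v": [1.0, 2.0, 3.0]})
patch = pd.DataFrame({"v": [float("nan"), 20.0, float("nan")]})
base.update(patch)
```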

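Flags#

Flags refer to attributes of the pandas object. Properties of the dataset (such as the date it was recorded or the URL it was accessed from) should be stored in DataFrame.attrs.

Flags(obj, *, allows_duplicate_labels)

Flags that apply to pandas objects.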
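As a sketch of how the `allows_duplicate_labels` flag behaves, the example below sets it through `DataFrame.set_flags` and shows that a frame with repeated labels is rejected. It assumes a pandas version that provides `pd.errors.DuplicateLabelError`.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])

# set_flags returns a new object; the original keeps its flags.
strict = df.set_flags(allows_duplicate_labels=False)
print(df.flags.allows_duplicate_labels)
print(strict.flags.allows_duplicate_labels)

# A frame whose index already has duplicates cannot take the strict flag.
duplicated = pd.DataFrame({"a": [1, 2]}, index=["x", "x"])
try:
    duplicated.set_flags(allows_duplicate_labels=False)
    raised = False
except pd.errors.DuplicateLabelError:
    raised = True
```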

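Metadata#

DataFrame.attrs is a dictionary for storing global metadata for this DataFrame.

Warning

DataFrame.attrs is considered experimental and may change without warning.

DataFrame.attrs

Dictionary of global attributes of this dataset.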
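A minimal sketch of storing dataset-level metadata in `attrs`. The keys and values are hypothetical, and since `attrs` is experimental, how it propagates through operations should not be relied on.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# attrs is a plain dict attached to the frame; keys here are hypothetical.
df.attrs["source"] = "example.csv"
df.attrs["recorded"] = "2024-01-01"

print(df.attrs)
```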

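Plotting#

DataFrame.plot is both a callable method and a namespace attribute for specific plotting methods of the form DataFrame.plot.<kind>.

DataFrame.plot([x, y, kind, ax, ....])

DataFrame plotting accessor and method

`DataFrame.plot.area`([x, y, stacked])	Draw a stacked area plot.
`DataFrame.plot.bar`([x, y])	Vertical bar plot.
`DataFrame.plot.barh`([x, y])	Make a horizontal bar plot.
`DataFrame.plot.box`([by])	Make a box plot of the DataFrame columns.
`DataFrame.plot.density`([bw_method, ind])	Generate Kernel Density Estimate plot using Gaussian kernels.
`DataFrame.plot.hexbin`(x, y[, C, ...])	Generate a hexagonal binning plot.
`DataFrame.plot.hist`([by, bins])	Draw one histogram of the DataFrame's columns.
`DataFrame.plot.kde`([bw_method, ind])	Generate Kernel Density Estimate plot using Gaussian kernels.
`DataFrame.plot.line`([x, y])	Plot Series or DataFrame as lines.
`DataFrame.plot.pie`(**kwargs)	Generate a pie plot.
`DataFrame.plot.scatter`(x, y[, s, c])	Create a scatter plot with varying marker point size and color.

`DataFrame.boxplot`([column, by, ax, ...])	Make a box plot from DataFrame columns.
`DataFrame.hist`([column, by, grid, ...])	Make a histogram of the DataFrame's columns.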

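Sparse accessor#

Sparse-dtype specific methods and attributes are provided under the DataFrame.sparse accessor.

DataFrame.sparse.density

Ratio of non-sparse points to total (dense) data points.

`DataFrame.sparse.from_spmatrix`(data[, ...])	Create a new DataFrame from a scipy sparse matrix.
`DataFrame.sparse.to_coo`()	Return the contents of the frame as a sparse SciPy COO matrix.
`DataFrame.sparse.to_dense`()	Convert a DataFrame with sparse values to dense.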
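The sketch below builds a frame whose columns are all sparse and reads its density. It uses `pd.arrays.SparseArray` and avoids SciPy, so `from_spmatrix` and `to_coo` are not shown; the data is invented.

```python
import pandas as pd

# Integer SparseArray uses 0 as its fill value, so zeros are not stored.
df = pd.DataFrame(
    {
        "a": pd.arrays.SparseArray([0, 0, 1, 0]),
        "b": pd.arrays.SparseArray([0, 2, 0, 0]),
    }
)

# Two stored points out of eight cells.
density = df.sparse.density

# Convert back to ordinary dense columns.
dense = df.sparse.to_dense()
```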

Serialization / IO / conversion#

`DataFrame.from_dict`(data[, orient, dtype, ...])	Construct DataFrame from dict of array-like or dicts.
`DataFrame.from_records`(data[, index, ...])	Convert structured or record ndarray to DataFrame.
`DataFrame.to_orc`([path, engine, index, ...])	Write a DataFrame to the ORC format.
`DataFrame.to_parquet`([path, engine, ...])	Write a DataFrame to the binary parquet format.
`DataFrame.to_pickle`(path, *[, compression, ...])	Pickle (serialize) object to file.
`DataFrame.to_csv`([path_or_buf, sep, na_rep, ...])	Write object to a comma-separated values (csv) file.
`DataFrame.to_hdf`(path_or_buf, *, key[, ...])	Write the contained data to an HDF5 file using HDFStore.
`DataFrame.to_sql`(name, con, *[, schema, ...])	Write records stored in a DataFrame to a SQL database.
`DataFrame.to_dict`([orient, into, index])	Convert the DataFrame to a dictionary.
`DataFrame.to_excel`(excel_writer, *[, ...])	Write object to an Excel sheet.
`DataFrame.to_json`([path_or_buf, orient, ...])	Convert the object to a JSON string.
`DataFrame.to_html`([buf, columns, col_space, ...])	Render a DataFrame as an HTML table.
`DataFrame.to_feather`(path, **kwargs)	Write a DataFrame to the binary Feather format.
`DataFrame.to_latex`([buf, columns, header, ...])	Render object to a LaTeX tabular, longtable, or nested table.
`DataFrame.to_stata`(path, *[, convert_dates, ...])	Export DataFrame object to Stata dta format.
`DataFrame.to_gbq`(destination_table, *[, ...])	(DEPRECATED) Write a DataFrame to a Google BigQuery table.
`DataFrame.to_records`([index, column_dtypes, ...])	Convert DataFrame to a NumPy record array.
`DataFrame.to_string`([buf, columns, ...])	Render a DataFrame to a console-friendly tabular output.
`DataFrame.to_clipboard`(*[, excel, sep])	Copy object to the system clipboard.
`DataFrame.to_markdown`([buf, mode, index, ...])	Print DataFrame in Markdown-friendly format.
`DataFrame.style`	Returns a Styler object.
`DataFrame.__dataframe__`([nan_as_null, ...])	Return the dataframe interchange object implementing the interchange protocol.
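The following sketch round-trips a small frame through a few in-memory formats, so no files are written. `pd.read_csv` is not part of the table above and is used only to read the CSV text back.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# One dict per row.
records = df.to_dict(orient="records")

# With no path given, to_csv and to_json return strings.
csv_text = df.to_csv(index=False)
json_text = df.to_json(orient="records")

# Read the CSV text back to confirm nothing was lost.
round_trip = pd.read_csv(io.StringIO(csv_text))

# from_dict with orient="index" treats the outer keys as row labels.
rebuilt = pd.DataFrame.from_dict({"r1": {"a": 1}, "r2": {"a": 2}}, orient="index")
```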

Time series-related#

`DataFrame.asfreq`(freq[, method, how, ...])	Convert time series to specified frequency.
`DataFrame.asof`(where[, subset])	Return the last row(s) without any NaNs before where.
`DataFrame.shift`([periods, freq, axis, ...])	Shift index by desired number of periods with an optional time freq.
`DataFrame.first_valid_index`()	Return index for first non-NA value or None, if no non-NA value is found.
`DataFrame.last_valid_index`()	Return index for last non-NA value or None, if no non-NA value is found.
`DataFrame.resample`(rule[, axis, closed, ...])	Resample time-series data.
`DataFrame.to_period`([freq, axis, copy])	Convert DataFrame from DatetimeIndex to PeriodIndex.
`DataFrame.to_timestamp`([freq, how, axis, copy])	Cast to DatetimeIndex of timestamps, at beginning of period.
`DataFrame.tz_convert`(tz[, axis, level, copy])	Convert tz-aware axis to target time zone.
`DataFrame.tz_localize`(tz[, axis, level, ...])	Localize tz-naive index of a Series or DataFrame to target time zone.
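A short sketch of the time-series helpers on a daily index. The dates and the `Asia/Tokyo` zone are arbitrary choices for illustration, and the time zone conversion assumes the IANA time zone data is available on the system.

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=4, freq="D")
df = pd.DataFrame({"v": [1, 2, 3, 4]}, index=idx)

# shift moves the values down one row and leaves the index alone.
shifted = df.shift(1)
first_valid = shifted.first_valid_index()

# resample groups rows into two-day bins before reducing.
two_day = df.resample("2D").sum()

# tz_localize attaches a zone to a naive index; tz_convert translates it.
localized = df.tz_localize("UTC")
converted = localized.tz_convert("Asia/Tokyo")
```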
