龙空技术网

20pandas.merge: 合并数据

曾老师财务数据Hub 87

前言:

目前你们对“集合合并c语言”大概比较着重,姐妹们都想要分析一些“集合合并c语言”的相关资讯。那么小编在网络上搜集了一些对于“集合合并c语言””的相关知识,希望同学们能喜欢,我们快快来学习一下吧!

pd.merge( )

Definition : merge(left: 'DataFrame | Series', right: 'DataFrame | Series', how: 'str'='inner', on: 'IndexLabel | None'=None, left_on: 'IndexLabel | None'=None, right_on: 'IndexLabel | None'=None, left_index: 'bool'=False, right_index: 'bool'=False, sort: 'bool'=False, suffixes: 'Suffixes'=('_x', '_y'), copy: 'bool'=True, indicator: 'bool'=False, validate: 'str | None'=None) -> 'DataFrame'
参数含义

left:待链接的左侧数据集。

right:待链接的右侧数据集。

how:左右数据集的连接方式。可选‘left’、‘right’、‘outer’、‘inner’,'cross',默认为inner。

on:左右两个待链接数据集有共同列名,且按该列链接两个数据集合时使用该参数。

left_on:链接两个数据集时,左数据集对应连接关键字(可为列表)。

right_on:链接两个数据集时,右数据集对应连接关键字(可为列表)。

left_index:若为True,则按左数据集的索引连接两个数据集。

right_index:若为True,则按右数据集的索引连接两个数据集。

sort:对结果数据集进行排序。

suffixes:为左右数据集中重复列名定义后缀。默认加('_x','_y')。

Merge method

Merge

method

SQL Join Name

Description

left

LEFT OUTER JOIN

Use keys from left frame only

right

RIGHT OUTER JOIN

Use keys from right frame only

outer

FULL OUTER JOIN

Use union of keys from both frames

inner

INNER JOIN

Use intersection of keys from both frames

Examples

import pandas as pdleft = pd.DataFrame(    {        "key": ["K0", "K1", "K2", "K3"],        "A": ["A0", "A1", "A2", "A3"],        "B": ["B0", "B1", "B2", "B3"],    })right = pd.DataFrame(    {        "key": ["K0", "K1", "K2", "K3"],        "C": ["C0", "C1", "C2", "C3"],        "D": ["D0", "D1", "D2", "D3"],    })
result = pd.merge(left, right, on="key" ,how= 'inner')
left = pd.DataFrame(    {        "key1": ["K0", "K0", "K1", "K2"],        "key2": ["K0", "K1", "K0", "K1"],        "A": ["A0", "A1", "A2", "A3"],        "B": ["B0", "B1", "B2", "B3"],    })right = pd.DataFrame(    {        "key1": ["K0", "K1", "K1", "K2"],        "key2": ["K0", "K0", "K0", "K0"],        "C": ["C0", "C1", "C2", "C3"],        "D": ["D0", "D1", "D2", "D3"],    })
result = pd.merge(left, right, how="left",  on=["key1", "key2"])
result = pd.merge(left, right, how="right", on=["key1", "key2"])
result = pd.merge(left, right, how="outer", on=["key1", "key2"])
result = pd.merge(left, right, how="inner", on=["key1", "key2"])
DataFrame与Series链接
df = pd.DataFrame({"Let": ["A", "B", "C"], "Num": [1, 2, 3]})ser = pd.Series(    ["a", "b", "c", "d", "e", "f"],    index=pd.MultiIndex.from_arrays(        [["A", "B", "C"] * 2, [1, 2, 3, 4, 5, 6]], names=["Let", "Num"]    ),)
pd.merge(df, ser.reset_index(), on=["Let", "Num"])
检查重复关键字
left = pd.DataFrame({"A": [1, 2], "B": [2, 2]})right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})result = pd.merge(left, right, on="B", how="outer")
left = pd.DataFrame({"A": [1, 2], "B": [1, 2]})right = pd.DataFrame({"A": [4, 5, 6], "B": [2, 2, 2]})result = pd.merge(left, right, on="B", how="outer", validate="one_to_one")
pd.merge(left, right, on="B", how="outer", validate="one_to_many")
合并指示器
df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]})df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]})
pd.merge(df1, df2, on="col1", how="outer", indicator=True)pd.merge(df1, df2, on="col1", how="outer", indicator="indicator_column")
总结

根据不同的使用场景,设置不同的参数值,达到预期的数据集链接目的。此处仅仅是个人的学习笔记,还有很多灵活的用法没有列举出来,感兴趣的可以深入操练。同时,强烈建议您要多读pandas官方文档,很多用法均来自官方文档,毕竟“您要学会打猎”。

关于数据集链接,我之前的文章也有介绍,建议您系统性的浏览:

18pandas.concat: 合并数据

19pandas读取合并多个Excel文件

有关其他主题文章,也可以关注我的微信公众号“曾老师会计与统计”查看。

标签: #集合合并c语言