龙空技术网

Python Data Analysis Notes #8.2.2: Merging on Index

Yuan's Study Notes


Contents

Data Wrangling: Join, Combine, and Reshape

8.1 => Hierarchical Indexing

8.2 => Combining and Merging Datasets

--------> Database-Style DataFrame Joins

--------> Merging on Index

8.3 => Reshaping and Pivoting

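Merging on Index

The previous note covered how to join two DataFrame objects on their column names.

Sometimes the merge key lives in a DataFrame's index instead. In that case, pass left_index=True, right_index=True, or both, to indicate that the index should be used as the join key.

We first create two DataFrames, then merge them using the 'key' column of the first DataFrame and the index of the second: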

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: left1 = pd.DataFrame({'key':['a', 'b', 'a', 'a', 'b', 'c'], 'value':range(6)})

In [4]: right1 = pd.DataFrame({'group_val':[3.5, 7]}, index=['a', 'b'])

In [5]: left1
Out[5]:
  key  value
0   a      0
1   b      1
2   a      2
3   a      3
4   b      4
5   c      5

In [6]: right1
Out[6]:
   group_val
a        3.5
b        7.0

In [7]: pd.merge(left1, right1, left_on='key', right_index=True)
Out[7]:
  key  value  group_val
0   a      0        3.5
2   a      2        3.5
3   a      3        3.5
1   b      1        7.0
4   b      4        7.0

By default, merge takes the intersection of the join keys (an inner join). Passing how='outer' gives their union instead:

In [8]: pd.merge(left1, right1, left_on='key', right_index=True, how='outer')
Out[8]:
  key  value  group_val
0   a      0        3.5
2   a      2        3.5
3   a      3        3.5
1   b      1        7.0
4   b      4        7.0
5   c      5        NaN
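To see which rows an outer join added, one option is merge's indicator=True argument, which adds a _merge column recording where each row came from. The sketch below reuses left1 and right1 from above; the variable names inner, outer, and unmatched are only illustrative.

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

# Inner join (the default): only keys present on both sides survive
inner = pd.merge(left1, right1, left_on='key', right_index=True)

# Outer join with an extra '_merge' column marking each row's origin
outer = pd.merge(left1, right1, left_on='key', right_index=True,
                 how='outer', indicator=True)

# Rows whose key has no match in right1's index
unmatched = outer[outer['_merge'] == 'left_only']

print(len(inner), len(outer))
print(unmatched[['key', 'value']])
```

Here the only unmatched row is the one with key 'c', which is exactly the row that shows NaN for group_val in the outer join.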

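Merging Hierarchically Indexed Data

When the data being merged has a hierarchical index, pass a list to specify the multiple columns that serve as the merge keys.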

In [9]: lefth = pd.DataFrame({'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 'key2':[2000, 2001, 2002, 2001, 2002], 'data':np.arange(5.)})

In [14]: righth = pd.DataFrame(np.arange(12).reshape((6,2)), index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'], [2001, 2000, 2000, 2000, 2001, 2002]], columns=['event1', 'event2'])

In [15]: lefth
Out[15]:
     key1  key2  data
0    Ohio  2000   0.0
1    Ohio  2001   1.0
2    Ohio  2002   2.0
3  Nevada  2001   3.0
4  Nevada  2002   4.0

In [16]: righth
Out[16]:
             event1  event2
Nevada 2001       0       1
       2000       2       3
Ohio   2000       4       5
       2000       6       7
       2001       8       9
       2002      10      11

For example, below we merge on the 'key1' and 'key2' columns of the first DataFrame and the index of the second:

In [17]: pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True)
Out[17]:
     key1  key2  data  event1  event2
0    Ohio  2000   0.0       4       5
0    Ohio  2000   0.0       6       7
1    Ohio  2001   1.0       8       9
2    Ohio  2002   2.0      10      11
3  Nevada  2001   3.0       0       1

In [18]: pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True, how='outer')
Out[18]:
     key1  key2  data  event1  event2
0    Ohio  2000   0.0     4.0     5.0
0    Ohio  2000   0.0     6.0     7.0
1    Ohio  2001   1.0     8.0     9.0
2    Ohio  2002   2.0    10.0    11.0
3  Nevada  2001   3.0     0.0     1.0
4  Nevada  2002   4.0     NaN     NaN
4  Nevada  2000   NaN     2.0     3.0
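As a cross-check, the same inner merge can be sketched by first turning the index levels of righth into ordinary columns and then merging on column names. This is an alternative route, not the approach used in the note. The level names 'key1' and 'key2' are assigned here as an assumption so that they line up with lefth.

```python
import numpy as np
import pandas as pd

lefth = pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
                      'key2': [2000, 2001, 2002, 2001, 2002],
                      'data': np.arange(5.)})
righth = pd.DataFrame(np.arange(12).reshape((6, 2)),
                      index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],
                             [2001, 2000, 2000, 2000, 2001, 2002]],
                      columns=['event1', 'event2'])

# Merge against the two-level index directly, as in the text
by_index = pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True)

# Alternative: name the index levels (names assumed to match lefth),
# move them into columns, then merge on those columns
right_cols = righth.rename_axis(['key1', 'key2']).reset_index()
by_columns = pd.merge(lefth, right_cols, on=['key1', 'key2'])

# Compare the two results after putting rows in a common order
order = ['key1', 'key2', 'event1']
a = by_index.sort_values(order).reset_index(drop=True)
b = by_columns.sort_values(order).reset_index(drop=True)
print(a.equals(b))
```

Either way, ('Ohio', 2000) appears twice in righth, so the single matching row of lefth is repeated once per match.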

Merging on the indexes of both sides works as well:

In [19]: left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]], index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])

In [20]: right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13, 14]], index=['b', 'c', 'd', 'e'], columns=['Missouri', 'Alabama'])

In [21]: left2
Out[21]:
   Ohio  Nevada
a   1.0     2.0
c   3.0     4.0
e   5.0     6.0

In [22]: right2
Out[22]:
   Missouri  Alabama
b       7.0      8.0
c       9.0     10.0
d      11.0     12.0
e      13.0     14.0

In [23]: pd.merge(left2, right2, how='outer', left_index=True, right_index=True)
Out[23]:
   Ohio  Nevada  Missouri  Alabama
a   1.0     2.0       NaN      NaN
b   NaN     NaN       7.0      8.0
c   3.0     4.0       9.0     10.0
d   NaN     NaN      11.0     12.0
e   5.0     6.0      13.0     14.0
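For contrast with the outer join above, a short sketch of the other how options on the same two frames. The variable names inner_idx and left_idx are only illustrative.

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13, 14]],
                      index=['b', 'c', 'd', 'e'], columns=['Missouri', 'Alabama'])

# Default how='inner': keep only index labels found in both frames
inner_idx = pd.merge(left2, right2, left_index=True, right_index=True)

# how='left': keep every label of left2, filling gaps from right2 with NaN
left_idx = pd.merge(left2, right2, how='left', left_index=True, right_index=True)

print(sorted(inner_idx.index))
print(sorted(left_idx.index))
```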

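The join Method

DataFrame also has a convenient instance method, join, which makes merging by index simpler. It requires that the two objects have no overlapping column names, unless a suffix is supplied through lsuffix or rsuffix.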

In [24]: left2.join(right2, how='outer')
Out[24]:
   Ohio  Nevada  Missouri  Alabama
a   1.0     2.0       NaN      NaN
b   NaN     NaN       7.0      8.0
c   3.0     4.0       9.0     10.0
d   NaN     NaN      11.0     12.0
e   5.0     6.0      13.0     14.0
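Two further aspects of join can be sketched briefly: the on argument, which joins a column of the calling DataFrame against the index of the other one, and the lsuffix/rsuffix arguments for overlapping column names. The frames a and b below are hypothetical and exist only to trigger the overlap.

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

# join performs a left join by default; 'on' names the caller's key column
joined = left1.join(right1, on='key')
print(joined)

# Hypothetical frames that share the column name 'x'
a = pd.DataFrame({'x': [1, 2]}, index=['p', 'q'])
b = pd.DataFrame({'x': [10, 20]}, index=['p', 'q'])

try:
    a.join(b)                      # overlapping columns, no suffix given
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)

# Supplying suffixes resolves the name clash
suffixed = a.join(b, lsuffix='_left', rsuffix='_right')
print(overlap_error)
print(suffixed.columns.tolist())
```

Because join defaults to a left join, all six rows of left1 are kept and the row with key 'c' gets NaN for group_val.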

We can also pass a list of DataFrames to join, much like the concat function, to combine several DataFrames at once:

In [25]: another = pd.DataFrame([[7.,8.], [9.,10.], [11., 12.], [16., 17.]], index=['a', 'c', 'e', 'f'], columns=['New York', 'Oregon'])

In [26]: another
Out[26]:
   New York  Oregon
a       7.0     8.0
c       9.0    10.0
e      11.0    12.0
f      16.0    17.0

In [27]: left2.join([right2, another])
Out[27]:
   Ohio  Nevada  Missouri  Alabama  New York  Oregon
a   1.0     2.0       NaN      NaN       7.0     8.0
c   3.0     4.0       9.0     10.0       9.0    10.0
e   5.0     6.0      13.0     14.0      11.0    12.0

In [29]: left2.join([right2, another], how='outer')
Out[29]:
   Ohio  Nevada  Missouri  Alabama  New York  Oregon
a   1.0     2.0       NaN      NaN       7.0     8.0
b   NaN     NaN       7.0      8.0       NaN     NaN
c   3.0     4.0       9.0     10.0       9.0    10.0
d   NaN     NaN      11.0     12.0       NaN     NaN
e   5.0     6.0      13.0     14.0      11.0    12.0
f   NaN     NaN       NaN      NaN      16.0    17.0
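The similarity to concat can be checked directly. The sketch below compares the outer join of the three frames with pd.concat along axis=1. Both results are sorted by index before the comparison, since row order may differ between pandas versions.

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13, 14]],
                      index=['b', 'c', 'd', 'e'], columns=['Missouri', 'Alabama'])
another = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [16., 17.]],
                       index=['a', 'c', 'e', 'f'], columns=['New York', 'Oregon'])

# Outer join of a list of frames on their indexes
joined = left2.join([right2, another], how='outer')

# Column-wise concatenation aligns on the index as well (outer by default)
concatenated = pd.concat([left2, right2, another], axis=1)

# Compare after sorting so that row order does not matter
same = joined.sort_index().equals(concatenated.sort_index())
print(joined.shape, concatenated.shape, same)
```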

-END-
