Contents
Data Wrangling: Join, Combine, and Reshape
8.1 => Hierarchical Indexing
8.2 => Combining and Merging Datasets
--------> Database-Style DataFrame Joins
--------> Merging on Index
8.3 => Reshaping and Pivoting
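Merging on Index
The previous note covered how to join two DataFrame objects on their column names.
Sometimes the merge keys are found in a DataFrame's index instead. In that case, pass left_index=True, right_index=True, or both, to indicate that the index should be used as the join key.
Let's create two DataFrames and merge them using the 'key' column of the first and the index of the second: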
In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: left1 = pd.DataFrame({'key':['a', 'b', 'a', 'a', 'b', 'c'], 'value':range(6)})

In [4]: right1 = pd.DataFrame({'group_val':[3.5, 7]}, index=['a', 'b'])

In [5]: left1
Out[5]:
  key  value
0   a      0
1   b      1
2   a      2
3   a      3
4   b      4
5   c      5

In [6]: right1
Out[6]:
   group_val
a        3.5
b        7.0

In [7]: pd.merge(left1, right1, left_on='key', right_index=True)
Out[7]:
  key  value  group_val
0   a      0        3.5
2   a      2        3.5
3   a      3        3.5
1   b      1        7.0
4   b      4        7.0
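The mirror-image case works the same way. As a minimal sketch (the frames are rebuilt here so the snippet runs on its own, and the name `mirrored` is only illustrative), the lookup table can be placed on the left and matched through its index against the 'key' column of the right frame:

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
                      'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

# Swap the roles: the index of the left frame (right1) is matched
# against the 'key' column of the right frame (left1).
mirrored = pd.merge(right1, left1, left_index=True, right_on='key')

# The same five matching rows come back; only the column order differs.
print(mirrored.sort_values('value'))
```

Which frame goes on the left mainly affects the column order of the result, so the choice is usually a matter of readability.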
By default, merge performs an inner join and keeps only the intersection of the join keys. Passing how='outer' returns their union instead:
In [8]: pd.merge(left1, right1, left_on='key', right_index=True, how='outer')
Out[8]:
  key  value  group_val
0   a      0        3.5
2   a      2        3.5
3   a      3        3.5
1   b      1        7.0
4   b      4        7.0
5   c      5        NaN
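To see which rows an outer join adds, one option is the indicator=True argument of pd.merge, which appends a '_merge' column. The sketch below rebuilds the two frames so it is self-contained; the names `inner`, `outer`, and `unmatched` are illustrative only:

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'],
                      'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

inner = pd.merge(left1, right1, left_on='key', right_index=True)
outer = pd.merge(left1, right1, left_on='key', right_index=True,
                 how='outer', indicator=True)

# '_merge' records where each key was found:
# 'both', 'left_only', or 'right_only'.
unmatched = outer[outer['_merge'] != 'both']

print(len(inner), len(outer))
print(unmatched)
```

Here the only extra row is the one with key 'c', which has no counterpart in the index of right1 and therefore gets NaN for group_val.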
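Merging hierarchically indexed data
When the data is hierarchically indexed, the columns to use as merge keys must be given as a list, one column per index level.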
In [9]: lefth = pd.DataFrame({'key1':['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
   ...:                       'key2':[2000, 2001, 2002, 2001, 2002],
   ...:                       'data':np.arange(5.)})

In [14]: righth = pd.DataFrame(np.arange(12).reshape((6,2)),
    ...:                       index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],
    ...:                              [2001, 2000, 2000, 2000, 2001, 2002]],
    ...:                       columns=['event1', 'event2'])

In [15]: lefth
Out[15]:
     key1  key2  data
0    Ohio  2000   0.0
1    Ohio  2001   1.0
2    Ohio  2002   2.0
3  Nevada  2001   3.0
4  Nevada  2002   4.0

In [16]: righth
Out[16]:
             event1  event2
Nevada 2001       0       1
       2000       2       3
Ohio   2000       4       5
       2000       6       7
       2001       8       9
       2002      10      11
For example, below we merge on the 'key1' and 'key2' columns of the first DataFrame and the index of the second:
In [17]: pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True)
Out[17]:
     key1  key2  data  event1  event2
0    Ohio  2000   0.0       4       5
0    Ohio  2000   0.0       6       7
1    Ohio  2001   1.0       8       9
2    Ohio  2002   2.0      10      11
3  Nevada  2001   3.0       0       1

In [18]: pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True, how='outer')
Out[18]:
     key1  key2  data  event1  event2
0    Ohio  2000   0.0     4.0     5.0
0    Ohio  2000   0.0     6.0     7.0
1    Ohio  2001   1.0     8.0     9.0
2    Ohio  2002   2.0    10.0    11.0
3  Nevada  2001   3.0     0.0     1.0
4  Nevada  2002   4.0     NaN     NaN
4  Nevada  2000   NaN     2.0     3.0
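One way to check a multi-key index merge is to compare it with an ordinary column merge. The following is a sketch under an assumption: the two unnamed index levels of righth are given the names 'key1' and 'key2' (these names are not in the original data) and then moved into regular columns:

```python
import numpy as np
import pandas as pd

lefth = pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
                      'key2': [2000, 2001, 2002, 2001, 2002],
                      'data': np.arange(5.)})
righth = pd.DataFrame(np.arange(12).reshape((6, 2)),
                      index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],
                             [2001, 2000, 2000, 2000, 2001, 2002]],
                      columns=['event1', 'event2'])

# Merge the two key columns against the two-level index.
by_index = pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True)

# Alternative: name the index levels (assumed names), turn them into
# columns, and merge on columns instead.
right_cols = righth.rename_axis(['key1', 'key2']).reset_index()
by_columns = pd.merge(lefth, right_cols, on=['key1', 'key2'])

print(by_index)
print(by_columns)
```

Both results contain the same five rows. Note that the pair ('Ohio', 2000) appears twice in the index of righth, so the single matching row of lefth is repeated once per match.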
Using the indexes of both sides for the merge works as well:
In [19]: left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
    ...:                      index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])

In [20]: right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13, 14]],
    ...:                       index=['b', 'c', 'd', 'e'], columns=['Missouri', 'Alabama'])

In [21]: left2
Out[21]:
   Ohio  Nevada
a   1.0     2.0
c   3.0     4.0
e   5.0     6.0

In [22]: right2
Out[22]:
   Missouri  Alabama
b       7.0      8.0
c       9.0     10.0
d      11.0     12.0
e      13.0     14.0

In [23]: pd.merge(left2, right2, how='outer', left_index=True, right_index=True)
Out[23]:
   Ohio  Nevada  Missouri  Alabama
a   1.0     2.0       NaN      NaN
b   NaN     NaN       7.0      8.0
c   3.0     4.0       9.0     10.0
d   NaN     NaN      11.0     12.0
e   5.0     6.0     13.0     14.0
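For contrast, here is a small sketch of the same index-on-index merge without how='outer'. Since merge defaults to an inner join, only the labels present in both indexes should survive (the variable name `inner` is illustrative):

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13., 14.]],
                      index=['b', 'c', 'd', 'e'],
                      columns=['Missouri', 'Alabama'])

# Default how='inner': keep only index labels found on both sides.
inner = pd.merge(left2, right2, left_index=True, right_index=True)
print(inner)
```

Only 'c' and 'e' occur in both indexes, so the result has two rows and no missing values.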
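The join method
DataFrame also has a convenient instance method, join, that makes merging by index simpler. It performs a left join by default and requires that the frames have no overlapping column names, unless lsuffix or rsuffix is supplied.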
In [24]: left2.join(right2, how='outer')
Out[24]:
   Ohio  Nevada  Missouri  Alabama
a   1.0     2.0       NaN      NaN
b   NaN     NaN       7.0      8.0
c   3.0     4.0       9.0     10.0
d   NaN     NaN      11.0     12.0
e   5.0     6.0     13.0     14.0
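The restriction on overlapping columns can be illustrated with a short sketch. The frame `overlap` below is hypothetical (it is not part of the original notes) and deliberately shares the column name 'Ohio' with left2; the suffix strings are arbitrary choices:

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])

# Hypothetical frame that shares the column name 'Ohio' with left2.
overlap = pd.DataFrame({'Ohio': [10., 30.]}, index=['a', 'c'])

clash_detected = False
try:
    left2.join(overlap)
except ValueError as exc:
    # join refuses overlapping column names when no suffix is given.
    clash_detected = True
    print('join failed:', exc)

# Supplying suffixes resolves the clash; only the overlapping
# column names are renamed.
joined = left2.join(overlap, lsuffix='_left', rsuffix='_right')
print(joined)
```

Because join defaults to how='left', the result keeps the index of left2, and row 'e' gets NaN in the column that came from `overlap`.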
We can also pass a list of DataFrames to join, much like the concat function, to combine several DataFrames at once:
In [25]: another = pd.DataFrame([[7.,8.], [9.,10.], [11., 12.], [16., 17.]],
    ...:                        index=['a', 'c', 'e', 'f'], columns=['New York', 'Oregon'])

In [26]: another
Out[26]:
   New York  Oregon
a       7.0     8.0
c       9.0    10.0
e      11.0    12.0
f      16.0    17.0

In [27]: left2.join([right2, another])
Out[27]:
   Ohio  Nevada  Missouri  Alabama  New York  Oregon
a   1.0     2.0       NaN      NaN       7.0     8.0
c   3.0     4.0       9.0     10.0       9.0    10.0
e   5.0     6.0      13.0     14.0      11.0    12.0

In [29]: left2.join([right2, another], how='outer')
Out[29]:
   Ohio  Nevada  Missouri  Alabama  New York  Oregon
a   1.0     2.0       NaN      NaN       7.0     8.0
b   NaN     NaN       7.0      8.0       NaN     NaN
c   3.0     4.0       9.0     10.0       9.0    10.0
d   NaN     NaN      11.0     12.0       NaN     NaN
e   5.0     6.0      13.0     14.0      11.0    12.0
f   NaN     NaN       NaN      NaN      16.0    17.0
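The comparison with concat can be made concrete. The sketch below assumes that all three indexes are unique, as they are in this data, and checks that an outer join over a list of frames matches pd.concat along axis=1. Both results are sorted by index before comparing, since row order may differ between pandas versions:

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13., 14.]],
                      index=['b', 'c', 'd', 'e'],
                      columns=['Missouri', 'Alabama'])
another = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [16., 17.]],
                       index=['a', 'c', 'e', 'f'],
                       columns=['New York', 'Oregon'])

# Default how='left': the result keeps only the index of left2.
joined_left = left2.join([right2, another])

# how='outer': the result covers the union of all three indexes.
joined_outer = left2.join([right2, another], how='outer')

# Column-wise concatenation also aligns on the index (outer by default).
stacked = pd.concat([left2, right2, another], axis=1)

# Sort first so the comparison does not depend on row order.
same = joined_outer.sort_index().equals(stacked.sort_index())
print(same)
```

The join form reads more naturally when one frame is the "base" table, while concat treats all frames symmetrically.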