8.1.3 Combine Data
You can join data from oml.DataFrame
objects that represent database tables by using the append
, concat
, and merge
methods.
Examples of using these methods are in the following topics.
Append Data from One Object to Another Object
Use the append
method to join two objects of the same data type.
Example 8-4 Appending Data from Two Tables
This example first appends the oml.Float
series object num1
to another oml.Float
series object, num2
. It then appends an oml.DataFrame
object to another oml.DataFrame
object, which has the same column types.
import oml
import pandas as pd
df = pd.DataFrame({"id" : [1, 2, 3, 4, 5],
"val" : ["a", "b", "c", "d", "e"],
"ch" : ["p", "q", "r", "a", "b"],
"num" : [4, 3, 6.7, 7.2, 5]})
oml_df = oml.push(df)
# Append an oml.Float series object to another.
num1 = oml_df['id']
num2 = oml_df['num']
num1.append(num2)
# Append an oml.DataFrame object to another.
x = oml_df[['id', 'val']] # 1st column oml.Float, 2nd column oml.String
y = oml_df[['num', 'ch']] # 1st column oml.Float, 2nd column oml.String
x.append(y)
Listing for This Example
>>> import oml
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({"id" : [1, 2, 3, 4, 5],
... "val" : ["a", "b", "c", "d", "e"],
... "ch" : ["p", "q", "r", "a", "b"],
... "num" : [4, 3, 6.7, 7.2, 5]})
>>> oml_df = oml.push(df)
>>>
>>> # Append an oml.Float series object to another.
... num1 = oml_df['id']
>>> num2 = oml_df['num']
>>> num1.append(num2)
[1, 2, 3, 4, 5, 4, 3, 6.7, 7.2, 5]
>>>
>>> # Explicitly convert oml.Integer to oml.Float
>>> oml.Float(num1).append(num2)
>>> # Append an oml.DataFrame object to another.
... x = oml_df[['id', 'val']] # 1st column oml.Float, 2nd column oml.String
>>> y = oml_df[['num', 'ch']] # 1st column oml.Float, 2nd column oml.String
>>> x.append(y)
id val
0 1.0 a
1 2.0 b
2 3.0 c
3 4.0 d
4 5.0 e
5 4.0 p
6 3.0 q
7 6.7 r
8 7.2 a
9 5.0 b
Combine Two Objects
Use the concat
method to combine columns from one object with those of another object. The auto_name
argument of the concat
method controls whether to invoke automatic name conflict resolution. You can also perform customized renaming by passing in a dictionary mapping strings to objects.
To combine two objects with the concat
method, both objects must represent data from the same underlying database table, view, or query.
Example 8-5 Combining Data Column-Wise
This example first combines the two oml.DataFrame
objects x
and y
column-wise. It then concatenates object y
with the oml.Float
series object w
.
import oml
import pandas as pd
from collections import OrderedDict
df = pd.DataFrame({"id" : [1, 2, 3, 4, 5],
"val" : ["a", "b", "c", "d", "e"],
"ch" : ["p", "q", "r", "a", "b"],
"num" : [4, 3, 6.7, 7.2, 5]})
oml_df = oml.push(df)
# Create two oml.DataFrame objects and combine the objects column-wise.
x = oml_df[['id', 'val']]
y = oml_df[['num', 'ch']]
x.concat(y)
# Create an oml.Float object with the rounded exponential of two times
# the values in the num column of the oml_df object, then
# concatenate it with the oml.DataFrame object y using a new column name.
w = (oml_df['num']*2).exp().round(decimals=2)
y.concat({'round(exp(2*num))':w})
# Concatenate object x with multiple objects and turn on automatic
# name conflict resolution.
z = oml_df[:,'id']
x.concat([z, w, y], auto_name=True)
# Concatenate multiple oml data objects and perform customized renaming.
x.concat(OrderedDict([('ID',z), ('round(exp(2*num))',w), ('New_',y)]))
Listing for This Example
>>> import oml
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> df = pd.DataFrame({"id" : [1, 2, 3, 4, 5],
... "val" : ["a", "b", "c", "d", "e"],
... "ch" : ["p", "q", "r", "a", "b"],
... "num" : [4, 3, 6.7, 7.2, 5]})
>>> oml_df = oml.push(df)
>>> # Create two oml.DataFrame objects and combine the objects column-wise.
... x = oml_df[['id', 'val']]
>>> y = oml_df[['num', 'ch']]
>>> x.concat(y)
id val num ch
0 1 a 4.0 p
1 2 b 3.0 q
2 3 c 6.7 r
3 4 d 7.2 a
4 5 e 5.0 b
>>>
>>> # Create an oml.Float object with the rounded exponential of two times
... # the values in the num column of the oml_df object, then
... # concatenate it with the oml.DataFrame object y using a new column name.
... w = (oml_df['num']*2).exp().round(decimals=2)
>>> y.concat({'round(exp(2*num))':w})
num ch round(exp(2*num))
0 4.0 p 2980.96
1 3.0 q 403.43
2 6.7 r 660003.22
3 7.2 a 1794074.77
4 5.0 b 22026.47
>>>
>>> # Concatenate object x with multiple objects and turn on automatic
... # name conflict resolution.
... z = oml_df[:,'id']
>>> x.concat([z, w, y], auto_name=True)
id val id3 num num5 ch
0 1 a 1 2980.96 4.0 p
1 2 b 2 403.43 3.0 q
2 3 c 3 660003.22 6.7 r
3 4 d 4 1794074.77 7.2 a
4 5 e 5 22026.47 5.0 b
>>>
>>> # Concatenate multiple oml data objects and perform customized renaming.
... x.concat(OrderedDict([('ID',z), ('round(exp(2*num))',w), ('New_',y)]))
id val ID round(exp(2*num)) New_num New_ch
0 1 a 1 2980.96 4.0 p
1 2 b 2 403.43 3.0 q
2 3 c 3 660003.22 6.7 r
3 4 d 4 1794074.77 7.2 a
4 5 e 5 22026.47 5.0 b
Join Data From Two Objects
Use the merge
method to join data from two objects.
Example 8-6 Joining Data from Two Tables
This example first performs a cross join on the oml.DataFrame
objects x
and y
, which creates the oml.DataFrame
object xy
. The example performs a left outer join on the first four rows of x
with the oml.DataFrame
object other on the shared column id and applies the suffixes .l
and .r
to column names on the left and right side, respectively. The example then performs a right outer join on the id column on the left side object x
and the num column on the right side object y
.
import oml
import pandas as pd
df = pd.DataFrame({"id" : [1, 2, 3, 4, 5],
"val" : ["a", "b", "c", "d", "e"],
"ch" : ["p", "q", "r", "a", "b"],
"num" : [4, 3, 6.7, 7.2, 5]})
oml_df = oml.push(df)
x = oml_df[['id', 'val']]
y = oml_df[['num', 'ch']]
# Perform a cross join.
xy = x.merge(y)
xy
# Perform a left outer join.
x.head(4).merge(other=oml_df[['id', 'num']], on="id",
suffixes=['.l','.r'])
# Perform a right outer join.
x.merge(other=y, left_on="id", right_on="num", how="right")
Listing for This Example
>>> import oml
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({"id" : [1, 2, 3, 4, 5],
... "val" : ["a", "b", "c", "d", "e"],
... "ch" : ["p", "q", "r", "a", "b"],
... "num" : [4, 3, 6.7, 7.2, 5]})
>>> oml_df = oml.push(df)
>>>
>>> x = oml_df[['id', 'val']]
>>> y = oml_df[['num', 'ch']]
>>>
>>> # Perform a cross join.
... xy = x.merge(y)
>>> xy
id_l val_l num_r ch_r
0 1 a 4.0 p
1 1 a 3.0 q
2 1 a 6.7 r
3 1 a 7.2 a
4 1 a 5.0 b
5 2 b 4.0 p
6 2 b 3.0 q
7 2 b 6.7 r
8 2 b 7.2 a
9 2 b 5.0 b
10 3 c 4.0 p
11 3 c 3.0 q
12 3 c 6.7 r
13 3 c 7.2 a
14 3 c 5.0 b
15 4 d 4.0 p
16 4 d 3.0 q
17 4 d 6.7 r
18 4 d 7.2 a
19 4 d 5.0 b
20 5 e 4.0 p
21 5 e 3.0 q
22 5 e 6.7 r
23 5 e 7.2 a
24 5 e 5.0 b
>>>
>>> # Perform a left outer join.
... x.head(4).merge(other=oml_df[['id', 'num']], on="id",
... suffixes=['.l','.r'])
id val.l num.r
0 1 a 4.0
1 2 b 3.0
2 3 c 6.7
3 4 d 7.2
>>>
>>> # Perform a right outer join.
... x.merge(other=y, left_on="id", right_on="num", how="right")
id_l val_l num_r ch_r
0 3.0 c 3.0 q
1 4.0 d 4.0 p
2 5.0 e 5.0 b
3 NaN None 6.7 r
4 NaN None 7.2 a
Parent topic: Prepare Data