8.2.2 Correlate Data
Use the corr method to perform Pearson, Spearman, or Kendall correlation analysis across columns, where possible, in an oml.DataFrame object.
For details about the function arguments, invoke help(oml.DataFrame.corr) or see the Oracle Machine Learning for Python API Reference.
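The difference between the correlation types can be sketched locally with pandas alone, without a database connection. This is only an illustration under stated assumptions: it uses pandas.Series.corr as a stand-in rather than oml.DataFrame.corr, and the column values are made up. Pearson measures linear association, while Spearman is Pearson applied to ranks, so it measures monotonic association. To find the argument that selects the correlation type in OML4Py, invoke help(oml.DataFrame.corr).

```python
import pandas as pd

# Hypothetical data: B is a monotonic but non-linear function of A.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0]})
df['B'] = df['A'] ** 3

# Pearson (the pandas default) measures linear association,
# so it falls below 1 for this curved relationship.
pearson_ab = df['A'].corr(df['B'])

# Spearman is Pearson computed on the ranks of the values,
# so any strictly increasing relationship yields 1.
spearman_ab = df['A'].rank().corr(df['B'].rank())

print(pearson_ab, spearman_ab)
```

Because B always increases with A, the rank-based value is 1 even though the relationship is not a straight line.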
Example 8-10 Performing Basic Correlation Calculations
This example first creates a temporary database table, with its corresponding proxy oml.DataFrame object oml_df1, from the pandas.DataFrame object df. It then verifies that the correlation computed between columns A and B is 1, as expected, because each value in B is twice the corresponding value in A. The example then changes one value in df and sets another entry to NaN. It pushes the modified df to a second temporary database table, with the corresponding proxy oml.DataFrame object oml_df2. Finally, it invokes the corr method on oml_df2 with skipna set to True (the default) and then to False to compare the results.
import oml
import pandas as pd
df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})
oml_df1 = oml.push(df)
# Verify that the correlation between column A and B is 1.
oml_df1.corr()
# Change a value to test the change in the computed correlation result.
df.loc[2, 'A'] = 1.5
# Change an entry to NaN (not a number) to test the 'skipna'
# parameter in the corr method.
df.loc[1, 'B'] = None
# Push df to the database using the floating point column type
# because NaNs cannot be used in Oracle numbers.
oml_df2 = oml.push(df, oranumber=False)
# By default, 'skipna' is True.
oml_df2.corr()
oml_df2.corr(skipna=False)
Listing for This Example
>>> import oml
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'A': range(4), 'B': [2*i for i in range(4)]})
>>> oml_df1 = oml.push(df)
>>>
>>> # Verify that the correlation between column A and B is 1.
... oml_df1.corr()
   A  B
A  1  1
B  1  1
>>>
>>> # Change a value to test the change in the computed correlation result.
... df.loc[2, 'A'] = 1.5
>>>
>>> # Change an entry to NaN (not a number) to test the 'skipna'
... # parameter in the corr method.
... df.loc[1, 'B'] = None
>>>
>>> # Push df to the database using the floating point column type
... # because NaNs cannot be used in Oracle numbers.
... oml_df2 = oml.push(df, oranumber=False)
>>>
>>> # By default, 'skipna' is True.
... oml_df2.corr()
          A         B
A  1.000000  0.981981
B  0.981981  1.000000
>>> oml_df2.corr(skipna=False)
     A    B
A  1.0  NaN
B  NaN  1.0
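The off-diagonal values in the listing can be checked locally with a short sketch that needs no database connection. It assumes that skipna=True amounts to dropping the rows that contain NaN before computing the Pearson coefficient, and that skipna=False lets the NaN propagate through the arithmetic. The helper pearson is hypothetical and is not part of the oml API.

```python
import numpy as np
import pandas as pd

def pearson(x, y):
    """Plain Pearson coefficient; any NaN in the input propagates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# The same values that the example pushes as oml_df2.
df = pd.DataFrame({'A': [0.0, 1.0, 1.5, 3.0],
                   'B': [0.0, np.nan, 4.0, 6.0]})

# Assumed analogue of skipna=True: drop incomplete rows first.
complete = df.dropna()
r_skip = pearson(complete['A'], complete['B'])

# Assumed analogue of skipna=False: keep the NaN, so the result is NaN.
r_keep = pearson(df['A'], df['B'])

print(round(r_skip, 6), r_keep)
```

With the NaN row removed, three complete pairs remain and the coefficient is 9/sqrt(84), which rounds to the 0.981981 shown in the listing.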