pyspark.sql.functions.covar_pop

pyspark.sql.functions.covar_pop(col1, col2)

Returns a new Column for the population covariance of col1 and col2.

New in version 2.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col1 : Column or column name

first column used to compute the covariance.

col2 : Column or column name

second column used to compute the covariance.

Returns
Column

population covariance of the values in the two columns.

Examples

>>> from pyspark.sql import functions as sf
>>> a = [1] * 10
>>> b = [1] * 10
>>> df = spark.createDataFrame(zip(a, b), ["a", "b"])
>>> df.agg(sf.covar_pop("a", df.b)).show()
+---------------+
|covar_pop(a, b)|
+---------------+
|            0.0|
+---------------+
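
The constant columns above give a covariance of 0.0 because neither column varies. As a rough illustration of the quantity being computed, the sketch below evaluates the standard population covariance formula (the sum of products of deviations from the means, divided by n rather than n - 1) in plain Python. The helper name population_covariance and the sample values are hypothetical and are not part of PySpark. The code shows the mathematical definition only, not how Spark implements the aggregate.

```python
# Illustrative sketch only: `population_covariance` is a hypothetical helper,
# not a PySpark function and not Spark's internal implementation.
def population_covariance(xs, ys):
    """Return sum((x - mean_x) * (y - mean_y)) / n for paired values."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Divide by n (population), not n - 1 (sample).
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Same data as the example above: constant columns have zero covariance.
constant = population_covariance([1] * 10, [1] * 10)

# Made-up data where the second list is twice the first.
varying = population_covariance([1, 2, 3, 4], [2, 4, 6, 8])

print(constant)  # 0.0
print(varying)   # 2.5
```

For the second input, the deviation products sum to 10 over 4 pairs, hence 2.5. A sample covariance over the same data would divide by 3 instead.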