pyspark.sql.functions.percentile#

pyspark.sql.functions.percentile(col, percentage, frequency=1)[source]#

Returns the exact percentile(s) of the numeric column col at the given percentage(s), where each percentage must lie in the range [0.0, 1.0].

New in version 3.5.0.

Parameters
col : Column or column name
    the numeric column for which to compute the percentile(s).
percentage : Column, float, list of floats or tuple of floats
    percentage in decimal (must be between 0.0 and 1.0).
frequency : Column or int
    a positive integral literal or column that controls the frequency of each value,
    i.e. each row is counted as if it occurred that many times.

Returns
Column

the exact percentile of the numeric column.
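"Exact" here means the percentile is computed over the sorted (frequency-expanded) values with linear interpolation at position percentage * (n - 1), rather than being approximated as with percentile_approx. A minimal pure-Python sketch of that semantics (an illustration of the interpolation rule, not Spark's actual implementation):

```python
def exact_percentile(values, freqs, p):
    # Expand each value by its frequency, sort, and linearly interpolate
    # at position p * (n - 1) over the expanded sequence.
    expanded = sorted(v for v, f in zip(values, freqs) for _ in range(f))
    pos = p * (len(expanded) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(expanded) - 1)
    return expanded[lo] + (expanded[hi] - expanded[lo]) * (pos - lo)

# The median of [1.0, 2.0, 3.0, 4.0] interpolates between 2.0 and 3.0:
print(exact_percentile([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1], 0.5))  # 2.5
```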

Examples

>>> from pyspark.sql import functions as sf
>>> key = (sf.col("id") % 3).alias("key")
>>> value = (sf.randn(42) + key * 10).alias("value")
>>> df = spark.range(0, 1000, 1, 1).select(key, value)
>>> df.select(
...     sf.percentile("value", [0.25, 0.5, 0.75], sf.lit(1))
... ).show(truncate=False)
+--------------------------------------------------------+
|percentile(value, array(0.25, 0.5, 0.75), 1)            |
+--------------------------------------------------------+
|[0.7441991494121..., 9.9900713756..., 19.33740203080...]|
+--------------------------------------------------------+
>>> df.groupBy("key").agg(
...     sf.percentile("value", sf.lit(0.5), sf.lit(1))
... ).sort("key").show()
+---+-------------------------+
|key|percentile(value, 0.5, 1)|
+---+-------------------------+
|  0|     -0.03449962216667901|
|  1|        9.990389751837329|
|  2|       19.967859769284075|
+---+-------------------------+