pyspark.sql.functions.percentile
- pyspark.sql.functions.percentile(col, percentage, frequency=1)
Returns the exact percentile(s) of the numeric column col at the given percentage(s); each percentage must be in the range [0.0, 1.0].
New in version 3.5.0.
- Parameters
  - col : Column or str
    input numeric column.
  - percentage : Column, float, list of floats or tuple of floats
    percentage(s) in decimal; each value must be between 0.0 and 1.0.
  - frequency : Column or int, optional
    a positive integral frequency, defaults to 1.
- Returns
Column
the exact percentile of the numeric column; if percentage is an array, an array of the corresponding percentiles is returned.
Examples
>>> from pyspark.sql import functions as sf
>>> key = (sf.col("id") % 3).alias("key")
>>> value = (sf.randn(42) + key * 10).alias("value")
>>> df = spark.range(0, 1000, 1, 1).select(key, value)
>>> df.select(
...     sf.percentile("value", [0.25, 0.5, 0.75], sf.lit(1))
... ).show(truncate=False)
+--------------------------------------------------------+
|percentile(value, array(0.25, 0.5, 0.75), 1)            |
+--------------------------------------------------------+
|[0.7441991494121..., 9.9900713756..., 19.33740203080...]|
+--------------------------------------------------------+
>>> df.groupBy("key").agg( ... sf.percentile("value", sf.lit(0.5), sf.lit(1)) ... ).sort("key").show() +---+-------------------------+ |key|percentile(value, 0.5, 1)| +---+-------------------------+ | 0| -0.03449962216667901| | 1| 9.990389751837329| | 2| 19.967859769284075| +---+-------------------------+