pyspark.sql.GroupedData
class pyspark.sql.GroupedData(jgd: py4j.java_gateway.JavaObject, df: pyspark.sql.dataframe.DataFrame)
A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy().

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Methods

agg(*exprs)
    Computes aggregates and returns the result as a DataFrame.

apply(udf)
    An alias of pyspark.sql.GroupedData.applyInPandas(); however, it takes a pyspark.sql.functions.pandas_udf() whereas pyspark.sql.GroupedData.applyInPandas() takes a Python native function.

applyInPandas(func, schema)
    Maps each group of the current DataFrame using a pandas UDF and returns the result as a DataFrame.

applyInPandasWithState(func, …)
    Applies the given function to each group of data, while maintaining a user-defined per-group state.

avg(*cols)
    Computes the average value for each numeric column of each group.

cogroup(other)
    Cogroups this group with another group so that cogrouped operations can be run on them.

count()
    Counts the number of records in each group.

max(*cols)
    Computes the maximum value for each numeric column of each group.

mean(*cols)
    Computes the average value for each numeric column of each group.

min(*cols)
    Computes the minimum value for each numeric column of each group.

pivot(pivot_col[, values])
    Pivots a column of the current DataFrame and performs the specified aggregation.

sum(*cols)
    Computes the sum of each numeric column for each group.
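Examples

A minimal sketch of the basic aggregation methods. The local SparkSession and the dept/salary dataset are illustrative assumptions, not part of the API above:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative data; column names are assumptions for this sketch.
df = spark.createDataFrame(
    [("sales", 3000), ("sales", 4600), ("eng", 4100)],
    ["dept", "salary"],
)

grouped = df.groupBy("dept")   # returns a GroupedData instance

grouped.count().show()         # number of records per group
grouped.avg("salary").show()   # mean("salary") gives the same result
grouped.agg(                   # several aggregates in one pass
    F.min("salary").alias("min_salary"),
    F.max("salary").alias("max_salary"),
    F.sum("salary").alias("total_salary"),
).show()
```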
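applyInPandas() hands each group to a plain Python function as a pandas DataFrame, and the function's output must match the declared schema. A sketch reusing the illustrative df above; the mean-centering logic is an assumption made for the example:

```python
import pandas as pd

def center_salary(pdf: pd.DataFrame) -> pd.DataFrame:
    # pdf contains all rows of a single group
    return pdf.assign(salary=pdf.salary - pdf.salary.mean())

df.groupBy("dept").applyInPandas(
    center_salary, schema="dept string, salary double"
).show()
```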
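cogroup() pairs up the groups of two grouped DataFrames by key, and applyInPandas() on the cogrouped result then receives one pandas DataFrame per side. The data and the merge logic below are illustrative assumptions:

```python
left = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ["id", "v1"])
right = spark.createDataFrame([(1, "x"), (2, "y")], ["id", "v2"])

def merge_groups(l: pd.DataFrame, r: pd.DataFrame) -> pd.DataFrame:
    # l and r hold the rows of the two sides for one key
    return pd.merge(l, r, on="id")

left.groupBy("id").cogroup(right.groupBy("id")).applyInPandas(
    merge_groups, schema="id long, v1 double, v2 string"
).show()
```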
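pivot() turns the distinct values of a column into output columns before aggregating; passing the values list explicitly spares Spark an extra pass over the data to discover them. The course/earnings data is an illustrative assumption:

```python
courses = spark.createDataFrame(
    [(2012, "dotNET", 10000), (2012, "Java", 20000), (2013, "dotNET", 48000)],
    ["year", "course", "earnings"],
)

# One output column per course; sum(earnings) fills the cells.
courses.groupBy("year").pivot("course", ["dotNET", "Java"]).sum("earnings").show()
```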