KeyValueGroupedDataset (Spark 2.1.1 JavaDoc)

Object
- org.apache.spark.sql.KeyValueGroupedDataset<K,V>

All Implemented Interfaces:

java.io.Serializable
```
public class KeyValueGroupedDataset<K,V>
extends Object
implements scala.Serializable
```
:: Experimental :: A Dataset that has been logically grouped by a user-specified grouping key. Users should not construct a KeyValueGroupedDataset directly, but should instead call groupByKey on an existing Dataset.

Since:

2.0.0

See Also:
Serialized Form

Method Summary

Methods
Modifier and Type	Method and Description
`<U1> Dataset<scala.Tuple2<K,U1>>`	`agg(TypedColumn<V,U1> col1)` Computes the given aggregation, returning a `Dataset` of tuples for each unique key and the result of computing this aggregation over all elements in the group.
`<U1,U2> Dataset<scala.Tuple3<K,U1,U2>>`	`agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2)` Computes the given aggregations, returning a `Dataset` of tuples for each unique key and the result of computing these aggregations over all elements in the group.
`<U1,U2,U3> Dataset<scala.Tuple4<K,U1,U2,U3>>`	`agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3)` Computes the given aggregations, returning a `Dataset` of tuples for each unique key and the result of computing these aggregations over all elements in the group.
`<U1,U2,U3,U4> Dataset<scala.Tuple5<K,U1,U2,U3,U4>>`	`agg(TypedColumn<V,U1> col1, TypedColumn<V,U2> col2, TypedColumn<V,U3> col3, TypedColumn<V,U4> col4)` Computes the given aggregations, returning a `Dataset` of tuples for each unique key and the result of computing these aggregations over all elements in the group.
`<U,R> Dataset<R>`	`cogroup(KeyValueGroupedDataset<K,U> other, CoGroupFunction<K,V,U,R> f, Encoder<R> encoder)` (Java-specific) Applies the given function to each cogrouped data.
`<U,R> Dataset<R>`	`cogroup(KeyValueGroupedDataset<K,U> other, scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator<U>,scala.collection.TraversableOnce<R>> f, Encoder<R> evidence$5)` (Scala-specific) Applies the given function to each cogrouped data.
`Dataset<scala.Tuple2<K,Object>>`	`count()` Returns a `Dataset` that contains a tuple with each key and the number of items present for that key.
`<U> Dataset<U>`	`flatMapGroups(FlatMapGroupsFunction<K,V,U> f, Encoder<U> encoder)` (Java-specific) Applies the given function to each group of data.
`<U> Dataset<U>`	`flatMapGroups(scala.Function2<K,scala.collection.Iterator<V>,scala.collection.TraversableOnce<U>> f, Encoder<U> evidence$3)` (Scala-specific) Applies the given function to each group of data.
`<L> KeyValueGroupedDataset<L,V>`	`keyAs(Encoder<L> evidence$1)` Returns a new `KeyValueGroupedDataset` where the type of the key has been mapped to the specified type.
`Dataset<K>`	`keys()` Returns a `Dataset` that contains each unique key.
`<U> Dataset<U>`	`mapGroups(scala.Function2<K,scala.collection.Iterator<V>,U> f, Encoder<U> evidence$4)` (Scala-specific) Applies the given function to each group of data.
`<U> Dataset<U>`	`mapGroups(MapGroupsFunction<K,V,U> f, Encoder<U> encoder)` (Java-specific) Applies the given function to each group of data.
`<W> KeyValueGroupedDataset<K,W>`	`mapValues(scala.Function1<V,W> func, Encoder<W> evidence$2)` Returns a new `KeyValueGroupedDataset` where the given function `func` has been applied to the data.
`<W> KeyValueGroupedDataset<K,W>`	`mapValues(MapFunction<V,W> func, Encoder<W> encoder)` Returns a new `KeyValueGroupedDataset` where the given function `func` has been applied to the data.
`org.apache.spark.sql.execution.QueryExecution`	`queryExecution()`
`Dataset<scala.Tuple2<K,V>>`	`reduceGroups(scala.Function2<V,V,V> f)` (Scala-specific) Reduces the elements of each group of data using the specified binary function.
`Dataset<scala.Tuple2<K,V>>`	`reduceGroups(ReduceFunction<V> f)` (Java-specific) Reduces the elements of each group of data using the specified binary function.

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
 - queryExecution
```
public org.apache.spark.sql.execution.QueryExecution queryExecution()
```
 - keyAs
```
public <L> KeyValueGroupedDataset<L,V> keyAs(Encoder<L> evidence$1)
```
 Returns a new KeyValueGroupedDataset where the type of the key has been mapped to the specified type. The mapping of key columns to the type follows the same rules as the `as` method on Dataset.
 
 Parameters:
 evidence$1 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - mapValues
```
public <W> KeyValueGroupedDataset<K,W> mapValues(scala.Function1<V,W> func,
 Encoder<W> evidence$2)
```
 Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.
```
// Create values grouped by key from a Dataset[(K, V)]
ds.groupByKey(_._1).mapValues(_._2) // Scala
```
 Parameters:
 func - (undocumented)
 evidence$2 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 2.1.0
 - mapValues
```
public <W> KeyValueGroupedDataset<K,W> mapValues(MapFunction<V,W> func,
 Encoder<W> encoder)
```
 Returns a new KeyValueGroupedDataset where the given function func has been applied to the data. The grouping key is unchanged by this.
```
// Create Integer values grouped by String key from a Dataset<Tuple2<String, Integer>>
Dataset<Tuple2<String, Integer>> ds = ...;
KeyValueGroupedDataset<String, Integer> grouped =
  ds.groupByKey(t -> t._1, Encoders.STRING()).mapValues(t -> t._2, Encoders.INT()); // Java 8
```
 Parameters:
 func - (undocumented)
 encoder - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 2.1.0
 - keys
```
public Dataset<K> keys()
```
 Returns a Dataset that contains each unique key. This is equivalent to mapping over the Dataset to extract the keys and then running a distinct operation on them.
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
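 The "map to keys, then distinct" equivalence can be illustrated without Spark. The following is a plain-Java analogy built on java.util.stream, not Spark code; the class name KeysSketch, the sample words, and the use of the first letter as the grouping key are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class KeysSketch {
    // Extract the key from every element, then drop duplicate keys.
    // This mirrors the two steps named in the description of keys().
    static <T, K> List<K> uniqueKeys(List<T> data, Function<T, K> keyOf) {
        return data.stream().map(keyOf).distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        // Grouping by first letter leaves three unique keys.
        System.out.println(uniqueKeys(words, w -> w.substring(0, 1))); // [a, b, c]
    }
}
```

 The local list keeps encounter order; a distributed Dataset should not be assumed to return keys in any particular order.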
 - flatMapGroups
```
public <U> Dataset<U> flatMapGroups(scala.Function2<K,scala.collection.Iterator<V>,scala.collection.TraversableOnce<U>> f,
 Encoder<U> evidence$3)
```
 (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use reduceGroups or an org.apache.spark.sql.expressions.Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Parameters:
 f - (undocumented)
 evidence$3 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - flatMapGroups
```
public <U> Dataset<U> flatMapGroups(FlatMapGroupsFunction<K,V,U> f,
 Encoder<U> encoder)
```
 (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use reduceGroups or an org.apache.spark.sql.expressions.Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Parameters:
 f - (undocumented)
 encoder - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
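 The calling convention described for flatMapGroups (one call per group, with the key and an iterator over the group's elements, returning an iterator of results) can be modelled locally. This is a hedged sketch in plain Java, not Spark's implementation: the class FlatMapGroupsSketch, its flatMapGroups helper, and the largeValues group function are all hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

class FlatMapGroupsSketch {
    // Local model (an assumption, not Spark code): group the data by key,
    // call f once per group, and concatenate whatever each call returns.
    static <K, V, U> List<U> flatMapGroups(List<V> data, Function<V, K> keyOf,
                                           BiFunction<K, Iterator<V>, Iterator<U>> f) {
        Map<K, List<V>> groups = new LinkedHashMap<>();
        for (V v : data) {
            groups.computeIfAbsent(keyOf.apply(v), k -> new ArrayList<>()).add(v);
        }
        List<U> out = new ArrayList<>();
        for (Map.Entry<K, List<V>> e : groups.entrySet()) {
            Iterator<U> results = f.apply(e.getKey(), e.getValue().iterator());
            while (results.hasNext()) {
                out.add(results.next());
            }
        }
        return out;
    }

    // Example group function: emits "key:value" only for values above 10,
    // so a group may contribute zero, one, or many output elements.
    static Iterator<String> largeValues(String key, Iterator<Integer> values) {
        List<String> kept = new ArrayList<>();
        while (values.hasNext()) {
            int v = values.next();
            if (v > 10) {
                kept.add(key + ":" + v);
            }
        }
        return kept.iterator();
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(4, 15, 22, 7, 31);
        List<String> result = flatMapGroups(
                data, v -> v % 2 == 0 ? "even" : "odd", FlatMapGroupsSketch::largeValues);
        System.out.println(result); // [even:22, odd:15, odd:31]
    }
}
```

 The model holds every group in memory for simplicity, which is exactly what a real group function should avoid doing with its input iterator on large groups.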
 - mapGroups
```
public <U> Dataset<U> mapGroups(scala.Function2<K,scala.collection.Iterator<V>,U> f,
 Encoder<U> evidence$4)
```
 (Scala-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use reduceGroups or an org.apache.spark.sql.expressions.Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Parameters:
 f - (undocumented)
 evidence$4 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - mapGroups
```
public <U> Dataset<U> mapGroups(MapGroupsFunction<K,V,U> f,
 Encoder<U> encoder)
```
 (Java-specific) Applies the given function to each group of data. For each unique group, the function will be passed the group key and an iterator that contains all of the elements in the group. The function can return an element of arbitrary type which will be returned as a new Dataset.
 This function does not support partial aggregation, and as a result requires shuffling all the data in the Dataset. If an application intends to perform an aggregation over each key, it is best to use reduceGroups or an org.apache.spark.sql.expressions.Aggregator.
 Internally, the implementation will spill to disk if any given group is too large to fit into memory. However, users must take care to avoid materializing the whole iterator for a group (for example, by calling toList) unless they are sure that this is possible given the memory constraints of their cluster.
 
 Parameters:
 f - (undocumented)
 encoder - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
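 The warning about materializing a group's iterator suggests writing group functions that consume the iterator in a single pass. The sketch below shows that pattern in plain Java; the class StreamingGroupSketch and the mean helper are hypothetical, and the method stands in for the body of a function that would receive the group's iterator.

```java
import java.util.Arrays;
import java.util.Iterator;

class StreamingGroupSketch {
    // Consumes the group's iterator once, keeping only a running sum and
    // count, so memory use does not grow with the size of the group.
    static double mean(Iterator<Integer> values) {
        long sum = 0;
        long count = 0;
        while (values.hasNext()) {
            sum += values.next();
            count++;
        }
        return count == 0 ? Double.NaN : (double) sum / count;
    }

    public static void main(String[] args) {
        Iterator<Integer> group = Arrays.asList(2, 4, 9).iterator();
        System.out.println(mean(group)); // 5.0
    }
}
```

 Copying the iterator into a list first would give the same answer but would need memory proportional to the largest group.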
 - reduceGroups
```
public Dataset<scala.Tuple2<K,V>> reduceGroups(scala.Function2<V,V,V> f)
```
 (Scala-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
 
 Parameters:
 f - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - reduceGroups
```
public Dataset<scala.Tuple2<K,V>> reduceGroups(ReduceFunction<V> f)
```
 (Java-specific) Reduces the elements of each group of data using the specified binary function. The given function must be commutative and associative or the result may be non-deterministic.
 
 Parameters:
 f - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
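 Why the reduce function must be commutative and associative can be seen by reducing the same group in two different arrival orders. This is a plain-Java illustration, not Spark code; the class ReduceOrderSketch and its reduce helper are hypothetical, and the two lists stand in for the unpredictable order in which a distributed engine may combine a group's elements.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class ReduceOrderSketch {
    // Left-to-right reduction over the values in the order given.
    static int reduce(List<Integer> values, BinaryOperator<Integer> f) {
        int acc = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            acc = f.apply(acc, values.get(i));
        }
        return acc;
    }

    public static void main(String[] args) {
        // The same group, seen in two different orders.
        List<Integer> order1 = Arrays.asList(10, 3, 2);
        List<Integer> order2 = Arrays.asList(2, 3, 10);

        // Addition is commutative and associative: both orders agree.
        System.out.println(reduce(order1, (a, b) -> a + b)); // 15
        System.out.println(reduce(order2, (a, b) -> a + b)); // 15

        // Subtraction is neither: the result depends on the order.
        System.out.println(reduce(order1, (a, b) -> a - b)); // 5
        System.out.println(reduce(order2, (a, b) -> a - b)); // -11
    }
}
```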
 - agg
```
public <U1> Dataset<scala.Tuple2<K,U1>> agg(TypedColumn<V,U1> col1)
```
 Computes the given aggregation, returning a Dataset of tuples for each unique key and the result of computing this aggregation over all elements in the group.
 
 Parameters:
 col1 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - agg
```
public <U1,U2> Dataset<scala.Tuple3<K,U1,U2>> agg(TypedColumn<V,U1> col1,
 TypedColumn<V,U2> col2)
```
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Parameters:
 col1 - (undocumented)
 col2 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - agg
```
public <U1,U2,U3> Dataset<scala.Tuple4<K,U1,U2,U3>> agg(TypedColumn<V,U1> col1,
 TypedColumn<V,U2> col2,
 TypedColumn<V,U3> col3)
```
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Parameters:
 col1 - (undocumented)
 col2 - (undocumented)
 col3 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - agg
```
public <U1,U2,U3,U4> Dataset<scala.Tuple5<K,U1,U2,U3,U4>> agg(TypedColumn<V,U1> col1,
 TypedColumn<V,U2> col2,
 TypedColumn<V,U3> col3,
 TypedColumn<V,U4> col4)
```
 Computes the given aggregations, returning a Dataset of tuples for each unique key and the result of computing these aggregations over all elements in the group.
 
 Parameters:
 col1 - (undocumented)
 col2 - (undocumented)
 col3 - (undocumented)
 col4 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
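 The shape of the result of a multi-column agg (one row per unique key, holding the key followed by each aggregation's result) can be pictured with a local model. The following plain-Java sketch is an analogy only and does not use TypedColumn; the class AggSketch, the sample words, and the choice of sum and max of word length as the two aggregations are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AggSketch {
    // Groups words by first letter and computes two aggregations per group
    // in one pass: the sum and the maximum of the word lengths.
    // Each output row has the form (key,sum,max), like a three-element tuple.
    static List<String> sumAndMaxLengthByInitial(List<String> words) {
        Map<String, IntSummaryStatistics> stats = new TreeMap<>();
        for (String w : words) {
            stats.computeIfAbsent(w.substring(0, 1), k -> new IntSummaryStatistics())
                 .accept(w.length());
        }
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, IntSummaryStatistics> e : stats.entrySet()) {
            IntSummaryStatistics s = e.getValue();
            rows.add("(" + e.getKey() + "," + s.getSum() + "," + s.getMax() + ")");
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana", "blueberry");
        System.out.println(sumAndMaxLengthByInitial(words)); // [(a,12,7), (b,15,9)]
    }
}
```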
 - count
```
public Dataset<scala.Tuple2<K,Object>> count()
```
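 Returns a Dataset that contains a tuple with each key and the number of items present for that key. The count is a Long in the Scala API; it appears as Object in the Java signature because of type erasure.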
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
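 The result of count, one (key, number of items) pair per unique key, corresponds to a simple frequency table. The sketch below builds one in plain Java; it is an analogy rather than Spark code, and the class CountSketch, the countByKey helper, and the sample words are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class CountSketch {
    // One entry per unique key, holding the number of elements with that key.
    static <T, K extends Comparable<K>> Map<K, Long> countByKey(
            List<T> data, Function<T, K> keyOf) {
        Map<K, Long> counts = new TreeMap<>();
        for (T item : data) {
            counts.merge(keyOf.apply(item), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words =
                Arrays.asList("apple", "avocado", "banana", "blueberry", "cherry");
        System.out.println(countByKey(words, w -> w.substring(0, 1))); // {a=2, b=2, c=1}
    }
}
```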
 - cogroup
```
public <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K,U> other,
 scala.Function3<K,scala.collection.Iterator<V>,scala.collection.Iterator<U>,scala.collection.TraversableOnce<R>> f,
 Encoder<R> evidence$5)
```
 (Scala-specific) Applies the given function to each cogrouped data. For each unique group, the function will be passed the grouping key and two iterators containing all elements in the group, one from this Dataset and one from other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 
 Parameters:
 other - (undocumented)
 f - (undocumented)
 evidence$5 - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
 - cogroup
```
public <U,R> Dataset<R> cogroup(KeyValueGroupedDataset<K,U> other,
 CoGroupFunction<K,V,U,R> f,
 Encoder<R> encoder)
```
 (Java-specific) Applies the given function to each cogrouped data. For each unique group, the function will be passed the grouping key and two iterators containing all elements in the group, one from this Dataset and one from other. The function can return an iterator containing elements of an arbitrary type which will be returned as a new Dataset.
 
 Parameters:
 other - (undocumented)
 f - (undocumented)
 encoder - (undocumented)
 
 Returns:
 (undocumented)
 Since:
 
 1.6.0
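 The cogroup calling convention (one call per key, with one iterator per side) can be modelled locally as follows. This is a plain-Java sketch, not Spark's implementation. It assumes, which the description above does not state, that a key present on only one side is still passed to the function with an empty iterator for the other side. The class CogroupSketch, the summarize function, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class CogroupSketch {
    // Stand-in for a cogroup function: receives the key and one iterator
    // per side, and here produces a single summary line for the key.
    static String summarize(String key, Iterator<Integer> left, Iterator<String> right) {
        int leftCount = 0;
        while (left.hasNext()) { left.next(); leftCount++; }
        int rightCount = 0;
        while (right.hasNext()) { right.next(); rightCount++; }
        return key + ": " + leftCount + " left, " + rightCount + " right";
    }

    // Calls summarize once per key found on either side.
    // Assumption: a key missing from one side gets an empty iterator there.
    static List<String> cogroup(Map<String, List<Integer>> left,
                                Map<String, List<String>> right) {
        TreeSet<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            Iterator<Integer> l =
                    left.getOrDefault(k, Collections.<Integer>emptyList()).iterator();
            Iterator<String> r =
                    right.getOrDefault(k, Collections.<String>emptyList()).iterator();
            out.add(summarize(k, l, r));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> orders = new TreeMap<>();
        orders.put("alice", Arrays.asList(10, 20));
        orders.put("bob", Arrays.asList(5));
        Map<String, List<String>> emails = new TreeMap<>();
        emails.put("alice", Arrays.asList("a@example.com"));
        emails.put("carol", Arrays.asList("c@example.com"));
        // Prints:
        // alice: 2 left, 1 right
        // bob: 1 left, 0 right
        // carol: 0 left, 1 right
        for (String line : cogroup(orders, emails)) {
            System.out.println(line);
        }
    }
}
```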
