pyspark.RDD.coalesce
RDD.coalesce(numPartitions: int, shuffle: bool = False) → pyspark.rdd.RDD[T]
Return a new RDD that is reduced into numPartitions partitions.

New in version 1.0.0.

Parameters
----------
numPartitions : int
    the number of partitions in the new RDD
shuffle : bool, optional, default False
    whether to add a shuffle step
 
Returns
-------
RDD
    a new RDD that is reduced into numPartitions partitions

See also
--------
RDD.repartition

Examples
--------
>>> sc.parallelize([1, 2, 3, 4, 5], 3).glom().collect()
[[1], [2, 3], [4, 5]]
>>> sc.parallelize([1, 2, 3, 4, 5], 3).coalesce(1).glom().collect()
[[1, 2, 3, 4, 5]]