pyspark.sql.DataFrame.checkpoint¶
- 
DataFrame.checkpoint(eager: bool = True) → pyspark.sql.dataframe.DataFrame[source]¶
- Returns a checkpointed version of this - DataFrame. Checkpointing can be used to truncate the logical plan of this- DataFrame, which is especially useful in iterative algorithms where the plan may grow exponentially. It will be saved to files inside the checkpoint directory set with- SparkContext.setCheckpointDir().- New in version 2.1.0. - Parameters
- eagerbool, optional, default True
- Whether to checkpoint this - DataFrameimmediately.
 
- Returns
- DataFrame
- Checkpointed DataFrame. 
 
 - Notes - This API is experimental. - Examples - >>> import tempfile >>> df = spark.createDataFrame([ ... (14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"]) >>> with tempfile.TemporaryDirectory() as d: ... spark.sparkContext.setCheckpointDir("/tmp/bb") ... df.checkpoint(False) DataFrame[age: bigint, name: string]