pyspark.sql.avro.functions.to_avro
pyspark.sql.avro.functions.to_avro(data: ColumnOrName, jsonFormatSchema: str = '') → pyspark.sql.column.Column
- Converts a column into Avro binary format.
- New in version 3.0.0.
- Changed in version 3.5.0: Supports Spark Connect.
- Parameters
- data : Column or str
- The data column.
- jsonFormatSchema : str, optional
- User-specified output Avro schema in JSON string format.
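Because jsonFormatSchema is a plain JSON string, it can be built from ordinary Python objects rather than typed by hand. The following is a minimal sketch, not part of the PySpark API: the helper name `nullable_enum_schema` is hypothetical, and the schema mirrors the nullable enum used in the Examples section.

```python
import json


def nullable_enum_schema(name, symbols):
    """Return an Avro union schema ["null", <enum>] as a JSON string.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    enum_schema = {"type": "enum", "name": name, "symbols": list(symbols)}
    return json.dumps(["null", enum_schema])


# The resulting string can be passed as the jsonFormatSchema argument.
json_format_schema = nullable_enum_schema(
    "value", ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
)
print(json_format_schema)
```

Serializing with `json.dumps` avoids quoting mistakes in hand-written schema strings and makes it easy to generate the symbol list from existing data.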
 
- Notes
- Avro is a built-in but external data source module since Spark 2.4. Please deploy the application as described in the deployment section of the “Apache Avro Data Source Guide”.
- Examples
>>> from pyspark.sql import Row
>>> from pyspark.sql.avro.functions import to_avro
>>> data = ['SPADES']
>>> df = spark.createDataFrame(data, "string")
>>> df.select(to_avro(df.value).alias("suite")).collect()
[Row(suite=bytearray(b'\x00\x0cSPADES'))]

>>> jsonFormatSchema = '''["null", {"type": "enum", "name": "value",
...     "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]}]'''
>>> df.select(to_avro(df.value, jsonFormatSchema).alias("suite")).collect()
[Row(suite=bytearray(b'\x02\x00'))]
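The byte strings in the examples above can be read by hand using the Avro binary encoding rules: ints and longs are zigzag varints, a union is written as a branch index followed by the value, a string as a length followed by UTF-8 bytes, and an enum as the index of its symbol. The sketch below is plain Python written for illustration, not how Spark decodes Avro; the function names are hypothetical, and it assumes the default schema in the first example is a union whose branch 0 is the string type.

```python
def read_long(buf, pos):
    """Decode one zigzag varint (Avro int/long) starting at buf[pos].

    Returns the decoded value and the position of the next unread byte.
    """
    shift = 0
    raw = 0
    while True:
        byte = buf[pos]
        pos += 1
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            break
        shift += 7
    value = (raw >> 1) ^ -(raw & 1)  # undo zigzag encoding
    return value, pos


def decode_union_string(buf):
    """Decode a union value whose selected branch is a string."""
    branch, pos = read_long(buf, 0)
    length, pos = read_long(buf, pos)
    return branch, bytes(buf[pos:pos + length]).decode("utf-8")


def decode_union_enum(buf, symbols):
    """Decode a union value whose selected branch is an enum."""
    branch, pos = read_long(buf, 0)
    index, pos = read_long(buf, pos)
    return branch, symbols[index]


symbols = ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]
print(decode_union_string(bytearray(b'\x00\x0cSPADES')))  # (0, 'SPADES')
print(decode_union_enum(bytearray(b'\x02\x00'), symbols))  # (1, 'SPADES')
```

In the second output, branch 1 is the enum in the `["null", enum]` union and symbol index 0 is `SPADES`, which is why the enum encoding needs only two bytes instead of eight.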