Thursday, March 11, 2021

Add a UUID column to a Spark DataFrame

Recently, I came across a use case where I had to add a new column containing a UUID in hex to an existing Spark DataFrame. Here are two ways to do that, which behave differently:

import uuid

import pyspark.sql.functions as f
from pyspark.sql.types import StringType

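# Method 1: use a UDF. It runs once per row, so each row gets its own UUID.
# Marked nondeterministic so Spark does not assume repeated calls return the same value.
uuid_udf = f.udf(lambda: uuid.uuid4().hex, StringType()).asNondeterministic()
df_with_uuid = df.withColumn('uuid', uuid_udf())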

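# Method 2: use lit. uuid.uuid4() is evaluated once on the driver, so every row
# gets the SAME UUID. Use this only when one shared ID (e.g. a batch ID) is wanted.
df_with_uuid = df.withColumn('uuid', f.lit(uuid.uuid4().hex))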
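The difference between the two methods comes down to when uuid.uuid4() is called. The sketch below is a plain-Python analogy, not Spark code: the list `rows` and the two result lists are hypothetical stand-ins for a DataFrame and its new column, used only to show "evaluated once" versus "evaluated per row".

```python
import uuid

# Hypothetical stand-in for the rows of a DataFrame.
rows = ["a", "b", "c"]

# Analogous to f.lit(uuid.uuid4().hex): the UUID is computed once,
# then that single value is attached to every row.
constant = uuid.uuid4().hex
with_constant = [(row, constant) for row in rows]

# Analogous to the UDF: uuid.uuid4() is called again for each row.
with_per_row = [(row, uuid.uuid4().hex) for row in rows]

print(with_constant)
print(with_per_row)
```

If a per-row value is needed without a Python UDF, Spark SQL also ships a built-in uuid() function (Spark 2.3+) that can be called through f.expr("uuid()"). It returns the 36-character dashed form, so something like f.regexp_replace(f.expr("uuid()"), "-", "") would be needed to get the hex form; treat that as a suggestion to try rather than a tested recipe.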