by basix



Generating unique numeric values

There are several ways to generate unique numeric IDs in Spark.

Here are some:

  • row_number
  • monotonically_increasing_id
  • rdd.zipWithIndex
  • a hash function
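To make the trade-offs concrete, here is a pure-Python model of how each approach assigns IDs. It is a sketch, not Spark code: a DataFrame is stood in for by a plain list of partitions, and the function names are hypothetical. The bit layout used for monotonically_increasing_id (partition index in the upper 31 bits, record number in the lower 33 bits) and the partition-then-position ordering of zipWithIndex follow the Spark API docs. The hash variant uses SHA-256 from Python's hashlib purely as a stand-in for Spark's own hash/xxhash64 functions.

```python
# Pure-Python model (no Spark needed) of four ways to assign numeric IDs.
# A "dataset" is a list of partitions; each partition is a list of rows.
import hashlib


def row_number_ids(partitions, key):
    """Model of row_number() over a window ordered by `key` with no partitionBy:
    a global sort, then consecutive IDs 1..n (Spark pulls all rows into one partition)."""
    rows = [r for part in partitions for r in part]
    return [(r, i) for i, r in enumerate(sorted(rows, key=key), start=1)]


def monotonically_increasing_ids(partitions):
    """Model of monotonically_increasing_id(): partition index in the upper
    31 bits, record number within the partition in the lower 33 bits.
    Unique and increasing, but not consecutive across partitions."""
    return [(r, (p << 33) + i)
            for p, part in enumerate(partitions)
            for i, r in enumerate(part)]


def zip_with_index(partitions):
    """Model of rdd.zipWithIndex(): consecutive IDs 0..n-1 ordered by partition,
    then position. Needs every earlier partition's size first, which is why
    Spark runs an extra job when there is more than one partition."""
    offsets, total = [], 0
    for part in partitions:
        offsets.append(total)  # first ID of this partition
        total += len(part)
    return [(r, offsets[p] + i)
            for p, part in enumerate(partitions)
            for i, r in enumerate(part)]


def hash_ids(partitions, bits=64):
    """ID derived from the row's content (SHA-256 here, NOT Spark's hash or
    xxhash64). No coordination between partitions is needed, but identical
    rows get identical IDs and distinct rows can collide."""
    out = []
    for part in partitions:
        for r in part:
            digest = hashlib.sha256(repr(r).encode()).digest()
            out.append((r, int.from_bytes(digest[: bits // 8], "big")))
    return out


partitions = [["c", "a"], ["b"], ["e", "d"]]
print(row_number_ids(partitions, key=lambda r: r))
# [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5)]
print(monotonically_increasing_ids(partitions))
# [('c', 0), ('a', 1), ('b', 8589934592), ('e', 17179869184), ('d', 17179869185)]
print(zip_with_index(partitions))
# [('c', 0), ('a', 1), ('b', 2), ('e', 3), ('d', 4)]
```

The model shows the main trade-offs: row_number and zipWithIndex give gap-free IDs but need either a single-partition sort or an extra pass over the data, monotonically_increasing_id is cheap but leaves large gaps between partitions (and its values depend on how the data is partitioned), and a hash needs no coordination at all but offers no uniqueness guarantee.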

Each has its pros and cons. Below I lay out a use case and explain why one particular approach was chosen.