2021

Interger ids with spark

There is a bunch of different ways of generating numeric values in spark that can serve the purpose of unique identifiers. Here are some functions to utilize to accomplish that:

  1. row_number()
  2. monotonically_increasing_id()
  3. rdd.zipWithIndex()
  4. hash functions
3 min read
Back to Top ↑