Best Practices for Transformations
Just wanted to see what the general approach (best practice) is for applying transformations in certain scenarios. For example, if several columns need to be transformed based on the value of one other column, I see two approaches.
1. In the Transform step, use If/Then/Else logic to apply changes only to specific values. This If/Then/Else logic would need to be repeated in each column being transformed.
2. Split the data first, apply the transformation directly (no If/Then/Else logic) to the passing rows, and then union them back with the other rows.
I can think of pros and cons for both approaches. Does anyone have any thoughts or suggestions? Thanks
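To make the two options concrete, here is a rough sketch in plain Python rather than in the Workflow tool itself. The rows, the column names (`country`, `phone`, `postcode`) and the helper functions are all made up for illustration; the point is only to show where the condition ends up in each approach.

```python
# Hypothetical sample rows; the column names are assumptions for illustration.
rows = [
    {"id": 1, "country": "UK", "phone": " 020 7946 0000 ", "postcode": "sw1a 1aa"},
    {"id": 2, "country": "US", "phone": " 202 555 0100 ", "postcode": "20500"},
    {"id": 3, "country": "UK", "phone": "0161 496 0000", "postcode": "m1 1ae"},
]


def needs_change(row):
    # The single "other column" that decides whether a row is transformed.
    return row["country"] == "UK"


def approach_conditional(rows):
    # Approach 1: one step, with the If/Then/Else repeated for every column.
    return [
        {
            **row,
            "phone": row["phone"].strip() if needs_change(row) else row["phone"],
            "postcode": row["postcode"].upper() if needs_change(row) else row["postcode"],
        }
        for row in rows
    ]


def transform(row):
    # Direct transformation with no condition; only ever sees passing rows.
    return {**row, "phone": row["phone"].strip(), "postcode": row["postcode"].upper()}


def approach_split_union(rows):
    # Approach 2: split on the condition, transform the passing rows, union back.
    passing = [row for row in rows if needs_change(row)]
    failing = [row for row in rows if not needs_change(row)]
    return [transform(row) for row in passing] + failing


conditional_result = approach_conditional(rows)
split_union_result = approach_split_union(rows)
```

Both functions produce the same values, but in this sketch the split/union version returns the rows in a different order (passing rows first), so a sort may be needed afterwards if order matters.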
Answers
I would say keep the Workflow as readable as possible. The exception is when the steps being used carry a significant processing cost, such as Find duplicates or Validate addresses, and/or the Workflow is processing tens or hundreds of millions of records. In that case it may be worth experimenting to see which setup is most efficient.
In this case I think it makes the most sense to do it in a single step (renamed to describe what the step is doing, as you have done) to keep the Workflow simple and easy to read. Someone could argue that using multiple steps makes it clearer what is happening, so it may depend on what else the Workflow is doing to the data.


