I saw this sequencing challenge from Chandoo “Generate a new sequence number when value changes“. His formula solution is great but I kept thinking “What if we have 20000 rows?“
What Does Sequencing Mean?
Each time we see a new Value it gets the max value above plus 1. If it’s a repeat Value then it gets the same Sequence number that was initially assigned.
Chandoo’s Formula Solution
Notice the expanding ranges. 1st cell is locked and 2nd is unlocked $B$3:C3 and $C$3:C3
If the Value already exists above VLOOKUP retrieves that sequence number. If VLOOKUP can’t find the Value (it’s new) then IFERROR gives us the max value from above. Visit chandoo!
What If We Have 20000 Rows?
Expanding ranges on large data-sets are slow to calculate. What can we do? Some ideas:
- a) helper formulas, pivot table, vlookup (pivot requires refresh but so easy & quick)
- b) 4 step formula solution (faster than expanding ranges but requires some thought)
- vba (should be possible by vba experts)
- Get & Transform (is this possible Ken Puls / Miguel Escobar?)
a) Helper Formulas & Pivot Table & Vlookup
Two helpers alongside the data: counter and match function. If the numbers are the same it’s the 1st occurrence of that number. We’ll need these rows in the pivot. Add a 2nd counter beside the pivot. A vlookup back in column E looks into the pivot & 2nd counter (columns I & J). The pivot saves some formula calculation time and it’s easy to explain. Get the Excel file(1 MB).
b) 4 Step Formula Solution
Calculates in seconds but takes some time to understand how it works. Follow along (Excel file 1 MB).
Step 1 add a counter
Type 1,2,3 and drag it down (or use the ROW function).
Step 2 =MATCH(B4,$B$4:$B$20003,0)
MATCH looks for value in column B. If MATCH position number = counter number then it’s a new value.
Step 3 max so far =MAX(G5,F4)
I type a 1 in cell F4. In cell F5 the MAX formula compares two values to see which one is bigger. As I drag this formula down I’ll always have the max sequence number close for when it’s needed!
Step 4 =IF(D5=E5,F4+1,INDEX($G$4:$G$20003,E5))
I type a 1 cell G4. In cell G5 I use the formula above. Let’s look at each part:
=IF(D5=E5,F4+1, if step 1 = step 2 we know it’s a new value so we increase our sequence number.
INDEX($G$4:$G$20003,E5)) if it’s a repeat value we use index with match number to get it.
My solutions work well with 20000 rows of data. If the amount of data keeps increasing (ie 500000 rows) consider more robust solutions (vba, get & transform) to automate the sequence generation.
Get & Transform
Can Get & Transform solve this? I’m currently enrolled in Power Query Academy. Why not sign-up and join me? Disclaimer: I’m also an affiliate for this course.
My name is Kevin Lehrbass. I live in Markham Ontario Canada. I’m a Data Analyst at McKinsey. This is my personal blog about Microsoft Excel.
I had two challenges this past weekend:
- Chandoo’s sequencing challenge
- Snow shoveling challenge
Mental and physical exercise!