13 Feb 2018

I saw this sequencing challenge from Chandoo “Generate a new sequence number when value changes“. His formula solution is great but I kept thinking “What if we have 20000 rows?

 

What Does Sequencing Mean?

Each time we see a new Value it gets the max value above plus 1. If it’s a repeat Value then it gets the same Sequence number that was initially assigned.

 

Chandoo’s Formula Solution

=IFERROR(VLOOKUP(B4,$B$3:C3,2,FALSE),SUM(MAX($C$3:C3),1))

Notice the expanding range. The 1st cell is locked and the 2nd unlocked  $B$3:C3  and  $C$3:C3 

If the Value already exists above VLOOKUP retrieves that sequence number. If VLOOKUP can’t find the Value (it’s new) then IFERROR gives us the max value from above. Visit chandoo!

 

What If We Have 20000 Rows?

I sometimes use expanding ranges but I rarely use them on large data-sets because they are so slow to calculate. So, what can we do? Here are some ideas:

  • a) helper formulas & pivot table & vlookup (pivot requires a refresh but so easy & quick)
  • b) 4 step formula solution (faster than expanding ranges but requires some thought)
  • vba (should be possible by vba experts)
  • Get & Transform (possible?)

 

a) Helper Formulas & Pivot Table & Vlookup

We use two helpers alongside the data: a counter and a match function. If the numbers are the same then it’s the 1st occurrence of that number. We’ll need these rows in the pivot. We add a 2nd counter beside the pivot. A vlookup back in column E looks into the pivot & 2nd counter (columns I & J). The pivot saves some formula calculation time and it’s easy to explain. Get the Excel file(1 MB).

 

b) 4 Step Formula Solution

It only takes a few seconds to calculate but it takes some time to understand how it works. Follow along using the Excel file(1 MB).

Step 1  add a counter

Type 1,2,3 and drag it down (or use the ROW function).

 

Step 2  =MATCH(B4,$B$4:$B$20003,0)

MATCH looks for the Value in column B. If MATCH position number = counter number this means that it’s a new Value.

 

Step 3 max so far  =MAX(G5,F4)

I type a 1 in cell F4. In cell F5 the MAX formula compares two values to see which one is bigger. As I drag this formula down I’ll always have the max sequence number close for when it’s needed!

 

Step 4  =IF(D5=E5,F4+1,INDEX($G$4:$G$20003,E5))

I type a 1 cell G4. In cell G5 I use the formula above. Let’s look at each part:

=IF(D5=E5,F4+1,  if step 1 = step 2 we know it’s a new value so we increase our sequence number.

INDEX($G$4:$G$20003,E5)) if it’s a repeat value we use index with match number to get it. 

 

Conclusion

Both of my solutions work well with 20000 rows of data. However, if the amount of data keeps increasing (i.e. 500000 rows) you would want to explore more robust solutions (vba, get & transform) to automate the sequence generation.

 

Get & Transform

Can Get & Transform solve this? I’m currently enrolled in Power Query Academy.  Why not sign-up and join me? Disclaimer: I’m also an affiliate for this course.

 

About Me

 

My name is Kevin Lehrbass. I live in Markham Ontario Canada. I’m a Data Analyst at McKinsey. This is my personal blog about Microsoft Excel.

I had two challenges this past weekend: 1) Chandoo’s sequencing challenge and 2) snow shoveling challenge! Mental and physical exercise!

 

 

Post a comment