A work colleague shared this challenge: extract the largest number from this text  as420lkjs09nsdk2324lkjsd099 !  The answer is 2324. How do we solve this? Formulas? VBA?

 

Requirements

Before solving this we need to know the full context. Is it 200 cells? 900000 cells?

We have 1000 alphanumeric cells. With this small amount we can consider almost any approach. 

Download my Excel file. Read below or watch my video.

 

Formula Solution

My colleague solved this challenge with this amazing array formula:

=MAX(IFERROR(MID(F4,ROW($A$1:$A$30),COLUMN($A$1:$Y$1))*1,””))

(after pasting it into formula bar you need to hold Ctrl & Shift keys and then press Enter)

At first I didn’t understand it. But once I audited it I just loved it! I’ll show you why!

 

Step 0

Let’s backup and review how we normally use the MID function. =MID(F4,5,3)

In cell F4, go to 5th position, extract 3 characters. We get  0lk  

Three parts: cell reference, starting point, number of characters to extract.

But we’re going to extract multiple values from each point inside cell F4 !

 

Step 1

All the big action happens in this step!

Guts of the Formula

 

 

 

 

 

 

Peek Inside The Array!

  1. in cell G4 click inside the formula bar
  2. highlight MID(F4,ROW($A$1:$A$30),COLUMN($A$1:$Y$1))
  3. press F9 key (on laptop hold Fn key and press F9 key)

This is a sample of what you’ll see!

 

 

 

 

We only want pure numbers like “4”, “42”, “420” etc.

(press ‘Esc’ key to go back to the formula)

 

Step 2

Multiply each string by 1. Text numbers convert into a number. Everything else is an error.

 

Step 3

Adding IFERROR changes all errors to blank. We are left with just the numbers.

 

Step 4

MAX function gives us the largest number 2324 !

 

Test It!

Change the numbers inside cell F4 or F5 to make sure the solution works.

 

2 Dimensional Array Explained

Q: What’s the difference between the commas and semi-colons inside the array?

A: The semi-colons are part of the vertical array created by the ROW function. The commas are part of the horizontal array created by the COLUMN function.

Each one is one dimensional but when used together a cool two dimensional array is created! 🙂

 

VBA solution?

Via a google search I found vba code that solved this but the code made my head spin. I love vba but it was intense. In this case my colleague’s array formula wins!

 

Power Query solution?

I’m curious if anyone can solve this using Power Query (Get & Transform). Please let me know 🙂

Update: comments here & below my video provide Power Query solutions! Thanks to Daniel Choi and Oyekunle SOPEJU !

 

FlashFill Solution

Thanks to Ankur shukla for suggesting the Flashfill solution! If the pattern is consistent flashfill’s internal logic can fill the results almost instantly after we show it the pattern.

 

About Me

My name is Kevin Lehrbass. I’m a Data Analyst living in Markham Ontario Canada.

I’ve been working with data since 2001 but I’m still learning so much!

I wouldn’t want to ever stop learning. I’m lucky that my job is also my hobby!

Check out my recommended training section.

 

4 Comments

  1. Daniel Choi says:

    Hello Kevin,
    Power query is not proficient, but …

    let
    Source = Excel.CurrentWorkbook(){[Name=”Table1″]}[Content],
    AddCol = Table.AddColumn(Source, “Custom”, each List.Max(Table.TransformColumnTypes(Table.FromList(Text.Split(Text.Combine(List.Transform( Text.ToList([Column1] ), each if Value.FromText( Text.Lower( _ ) ) is text then ” ” else ( _ ) ) ),” “),Splitter.SplitByNothing() ), {{“Column1″, Int64.Type}})[Column1]) ),
    #”Changed Type” = Table.TransformColumnTypes(AddCol,{{“Column1”, type text}, {“Custom”, Int64.Type}})
    in
    #”Changed Type”
    * I referenced Oyekunle SOPEJU code.(Youtube)
    * Check out the Facebook message.

  2. Daniel Choi says:

    Hello Kevin,
    Power query is not proficient, but …

    let
    Source = Excel.CurrentWorkbook(){[Name=”Table1″]}[Content],
    AddCol = Table.AddColumn(Source, “Custom”, each List.Max(Table.TransformColumnTypes(Table.FromList(Text.Split(Text.Combine(List.Transform( Text.ToList([Column1] ), each if Value.FromText( Text.Lower( _ ) ) is text then ” ” else ( _ ) ) ),” “),Splitter.SplitByNothing() ), {{“Column1″, Int64.Type}})[Column1]) ),
    #”Changed Type” = Table.TransformColumnTypes(AddCol,{{“Column1”, type text}, {“Custom”, Int64.Type}})
    in
    #”Changed Type”

    * I referenced Oyekunle SOPEJU code.
    Check out the Facebook message.
    ——————————————————————
    Oyekunle SOPEJU’s code can be found in YouTube comments.

  3. Robert H. Gascon says:

    Hello Kevin,
    That is a clever formula but I just want to ask for the reasons for these:
    1. 30 values in start_num argument; and
    2. 25 values in num_chars argument.
    Additional questions:
    1. For the start_num argument, can we also instead use COLUMN?
    2. For the num_chars argument, can we also instead use ROW?
    3. Can we make the start_num and num_chars arguments dynamic?
    I’m just curious.
    Robert

    1. Kevin Lehrbass says:

      Hi Robert,
      I believe the 30 values and 25 values are based on the length of the characters in the cell. Not precise values. They would need to be adjusted based on the actual data.
      Interesting questions….especially about making it dynamic. INDIRECT is often used to make the numbers dynamic but that would make it really slow (and I know you strongly dislike volatility!)…..so I’m curious if it could be done without creating a super slow formula.
      Do you have any ideas Robert? ….and of course as others have mentioned, the puzzle could be solved used other methods like Flashfill and Power Query. Thanks for reading my post Robert!
      Cheers,
      Kevin

Post a comment