A friend shared this challenge: extract the largest number from this text  as420lkjs09nsdk2324lkjsd099 !  The answer is 2324. How do we solve this? Formulas? VBA?

 

Requirements

Before solving this we need to know the full context. Is it 200 cells? 900000 cells?

We have 1000 alphanumeric cells. With this small amount we can consider almost any approach. 

Download my Excel file. Read below or watch my video.

 

Formula Solution

My colleague solved this challenge with this amazing array formula:

=MAX(IFERROR(MID(F4,ROW($A$1:$A$30),COLUMN($A$1:$Y$1))*1,””))

(after pasting it into formula bar you need to hold Ctrl & Shift keys and then press Enter)

At first I didn’t understand it. But once I audited it I just loved it! I’ll show you why!

 

Step 0

Let’s backup and review how we normally use the MID function. =MID(F4,5,3)

In cell F4, go to 5th position, extract 3 characters. We get  0lk  

Three parts: cell reference, starting point, number of characters to extract.

But we’re going to extract multiple values from each point inside cell F4 !

 

Step 1

All the big action happens in this step!

Guts of the Formula

 

 

 

 

 

 

Peek Inside The Array!

  1. in cell G4 click inside the formula bar
  2. highlight MID(F4,ROW($A$1:$A$30),COLUMN($A$1:$Y$1))
  3. press F9 key (on laptop hold Fn key and press F9 key)

This is a sample of what you’ll see!

 

 

 

 

We only want pure numbers like “4”, “42”, “420” etc.

(press ‘Esc’ key to go back to the formula)

 

Step 2

Multiply each string by 1. Text numbers convert into a number. Everything else is an error.

 

Step 3

Adding IFERROR changes all errors to blank. We are left with just the numbers.

 

Step 4

MAX function gives us the largest number 2324 !

 

Test It!

Change the numbers inside cell F4 or F5 to make sure the solution works.

 

2 Dimensional Array Explained

Q: What’s the difference between the commas and semi-colons inside the array?

A: The semi-colons are part of the vertical array created by the ROW function. The commas are part of the horizontal array created by the COLUMN function.

Each one is one dimensional but when used together a cool two dimensional array is created! 🙂

 

VBA solution?

Via a google search I found vba code that solved this but the code made my head spin. I love vba but it was intense. In this case my colleague’s array formula wins!

 

Power Query solution?

I’m curious if anyone can solve this using Power Query (Get & Transform). Please let me know 🙂

Update: comments here & below my video provide Power Query solutions! Thanks to Daniel Choi and Oyekunle SOPEJU !

 

FlashFill Solution

Thanks to Ankur shukla for suggesting the Flashfill solution! If the pattern is consistent flashfill’s internal logic can fill the results almost instantly after we show it the pattern.

 

About Me

My name is Kevin Lehrbass. I’m a Data Analyst living in Markham Ontario Canada.

I’ve been working with data since 2001 but I’m still learning so much!

I wouldn’t want to ever stop learning. I’m lucky that my job is also my hobby!

Check out my recommended training section.

 

13 Comments

  1. Daniel Choi says:

    Hello Kevin,
    Power query is not proficient, but …

    let
    Source = Excel.CurrentWorkbook(){[Name=”Table1″]}[Content],
    AddCol = Table.AddColumn(Source, “Custom”, each List.Max(Table.TransformColumnTypes(Table.FromList(Text.Split(Text.Combine(List.Transform( Text.ToList([Column1] ), each if Value.FromText( Text.Lower( _ ) ) is text then ” ” else ( _ ) ) ),” “),Splitter.SplitByNothing() ), {{“Column1″, Int64.Type}})[Column1]) ),
    #”Changed Type” = Table.TransformColumnTypes(AddCol,{{“Column1”, type text}, {“Custom”, Int64.Type}})
    in
    #”Changed Type”
    * I referenced Oyekunle SOPEJU code.(Youtube)
    * Check out the Facebook message.

  2. Daniel Choi says:

    Hello Kevin,
    Power query is not proficient, but …

    let
    Source = Excel.CurrentWorkbook(){[Name=”Table1″]}[Content],
    AddCol = Table.AddColumn(Source, “Custom”, each List.Max(Table.TransformColumnTypes(Table.FromList(Text.Split(Text.Combine(List.Transform( Text.ToList([Column1] ), each if Value.FromText( Text.Lower( _ ) ) is text then ” ” else ( _ ) ) ),” “),Splitter.SplitByNothing() ), {{“Column1″, Int64.Type}})[Column1]) ),
    #”Changed Type” = Table.TransformColumnTypes(AddCol,{{“Column1”, type text}, {“Custom”, Int64.Type}})
    in
    #”Changed Type”

    * I referenced Oyekunle SOPEJU code.
    Check out the Facebook message.
    ——————————————————————
    Oyekunle SOPEJU’s code can be found in YouTube comments.

    1. Kevin Lehrbass says:

      Thank you for the Power Query solution Daniel and thanks for watching my video!
      Regards,
      Kevin

  3. Robert H. Gascon says:

    Hello Kevin,
    That is a clever formula but I just want to ask for the reasons for these:
    1. 30 values in start_num argument; and
    2. 25 values in num_chars argument.
    Additional questions:
    1. For the start_num argument, can we also instead use COLUMN?
    2. For the num_chars argument, can we also instead use ROW?
    3. Can we make the start_num and num_chars arguments dynamic?
    I’m just curious.
    Robert

    1. Kevin Lehrbass says:

      Hi Robert,
      I believe the 30 values and 25 values are based on the length of the characters in the cell. Not precise values. They would need to be adjusted based on the actual data.
      Interesting questions….especially about making it dynamic. INDIRECT is often used to make the numbers dynamic but that would make it really slow (and I know you strongly dislike volatility!)…..so I’m curious if it could be done without creating a super slow formula.
      Do you have any ideas Robert? ….and of course as others have mentioned, the puzzle could be solved used other methods like Flashfill and Power Query. Thanks for reading my post Robert!
      Cheers,
      Kevin

    2. Zoran Stanojevic says:

      Hi guys! To try answer Robert’s Qs:
      The start_num value (going from 1 to 30) is determining starting position for the MID() fn so I guess 30 should represent the closest (while remaining at least bigger) value for the given example. The second one, num_chars, is transposed against the former so the combo produces matrix of all starting positions vs lengths thus making “the longest” streak of numbers captured as distinct, thus making it possible to identify via MAX() fn.
      Interchanging ROW() for COLUMN() and vv won’t work.
      Yes, we can make start_num and num_chars arguments dynamically calculated via ROW(INDIRECT()) handle.

  4. Robert H. Gascon says:

    Hello Kevin,
    Recently, I discovered the concatenation of “**0” from this post:
    https://excelxor.com/2015/06/29/simultaneous-locating-of-first-and-last-numbers-in-a-string/
    Inspired by the foregoing, I suggest this dynamic, non-array, non-volatile formula:
    =-AGGREGATE(15,6,-(MID(F4,
    ROW(INDEX($A:$A,1):INDEX($A:$A,LEN(F4))),
    COLUMN(INDEX($1:$1,1):INDEX($1:$1,LEN(F4))))&”**0″),1)
    The crux of the aforesaid concatenation is evident when we modify the alphanumeric string to this:
    as420lsep09noct2324novsd099
    Incidentally, your suggested formula is equivalent to the omission of such concatenation, like this:
    =-AGGREGATE(15,6,-MID(F4,
    ROW(INDEX($A:$A,1):INDEX($A:$A,LEN(F4))),
    COLUMN(INDEX($1:$1,1):INDEX($1:$1,LEN(F4)))),1)
    Rather than 2324, the foregoing formula shockingly returns 155138, which is the serial number for October 1, 2324! With the concatenation of “**0”, the answer remains at 2324, as expected.
    Cheers,
    Robert

    1. Kevin Lehrbass says:

      Hi Robert,
      Aggregate is a function I need to practice! It’s amazing and as you pointed out it’s a non-array…and your entire formula is non-volatile! Also…thanks for sharing the “**0” idea from excelxor !
      At this point I can understand about half of the formula….tomorrow with a fresh mind and a fresh cup of coffee hopefully I’ll finish auditing it!
      Cheers,
      Kevin

    2. Zoran Stanojevic says:

      I was playing little bit with solutions presented here and so far the AGGREGATE() version did find every solution as the previous one, featuring IFERROR(). Until that is I tweaked the input string by adding “1e” in front of the last number portion of “099”. The IFERROR() formula was still correctly identifying 1e99 as the biggest while the AGGREGATE() remained showing 2324!
      This was interesting and let me found whole lot of scenarios leading to false answers. I think one example will suffice.. Imagine if there was negative number, written with a minus sign, there would inevitably be one instance with the minus sign cut off and thus promoted into the positive – therefore legit competitor for the alleged biggest value.

      1. Kevin Lehrbass says:

        Hi Zoran,
        Very good point about the “…promoted into the positive”. We always have to be vigilant to make sure that the solution works for the exact case that we have. There are so many pitfalls. Thanks for reading and commenting!
        Cheers,
        Kevin

      2. Robert H. Gascon says:

        Hello Zoran,
        I must admit that my suggested formula correctly works only when the string excludes negative numbers. Thanks for the observation. Nonetheless, I still stick with the same formula but with a minor modification, like this:
        =AGGREGATE(14,6,–(MID(F4,
        ROW(INDEX($A:$A,1):INDEX($A:$A,LEN(F4))),
        COLUMN(INDEX($1:$1,1):INDEX($1:$1,LEN(F4))))&”**0″),1)

Post a comment