Extracting insights from text can get messy! The good news is that challenging requirements can often lead to very creative solutions!
Here is some sample messy text
Ok…we see a mix of words, letters and numbers. What are we supposed to look for?
Each word must be found somewhere after the previous word in each cell to be counted. The text “4599Dog64947362068chases105C1376Squirrel1A20″ would qualify as ‘chases’ is found after ‘Dog’ and ‘Squirrel’ is found after ‘chases’.
Also, we want to be able to easily change any of the search words. I used Excel’s Data Validation feature (values listed in sheet ‘Lists’) to enable the end user to select eligible words (it could be changed to allow any text to be typed in).
Countif Formula Solves It!
In the criteria part of Excel’s Countif function (bold text) I combine the text with the * wildcard (which indicates any number of characters).
This is where it gets fun! The key part of the solution involves combining Excel’s REPT function with the ? wildcard (any 1 character). Why would we need to do this? Let’s look at requirement #2
We know that the * wildcard indicates any number of characters and the ? wildcard indicates any 1 character. But we need an easy way to have exactly X number of characters (X being a variable) ! In the current example we need exactly 11 characters between ‘Dog’ and ‘chases’ and then exactly 8 characters between ‘chases’ and ‘Squirrel’.
To clarify: we need a flexible solution so that the exact number of characters between our keywords can easily be increased or decreased. Our end users will buy us coffee for a week if they can modify the solution without touching the formulas!
Countif Formula Solves It Again!
=COUNTIF(Tbl_Text2[Text],”*”&$C$2& REPT(“?”,$D$2) &$E$2& REPT(“?”,$F$2)&$G$2&”*”)
The formula is very similar to the 1st formula above. The extra part is how I combine Excel’s REPT function with the ? wildcard (see bold above). Cells D2 and F2 each contain a number where we specify how many times to repeat the ? single wildcard so that in this case we get 11 and 8 characters!
You can easily change the number 11 and 8 using Data Validation. Let’s take a look.
What About Using Excel’s FIND feature?
Yes, we could solve both requirements using FIND feature. However, it would require a lot of typing to change the words and the number of ? wildcards. It would look like this for Requirement #2
Watch My YouTube Video
My name is Kevin Lehrbass. I live in Markham, Ontario, Canada. I’ve been studying, supporting, building, troubleshooting, teaching and dreaming in Excel since 2001. I’m a Data Analyst at McKinsey & Company.
Hundreds of millions of people use Excel everyday. Thousands of hours can be saved and costly errors can be avoided or corrected if you study this powerful software. Check out my videos and my blog posts.
Away from Excel I enjoy playing with my dog Cali (she doesn’t like Excel), learning Spanish and playing Chess.