Numbers describe height, intelligence and tests. How do we compare these numbers to others?

- Tim is 2 meters tall! That’s extremely tall!
- Sonja’s IQ is 117. She’s a genius!
- I got 76% on the test. Not bad.

We’ll use height. Normal Distribution & Standard Deviation allow us to compare height.

It’s easier to visualize than test scores or IQ. In a crowd you compare the height of those around you. Internally you do the math.

Bill Jelen (Mr Excel) and I in 2015.

Bill is 6 feet 1 inch (185 cm). I am 6 feet 4 inches (194 cm).

Instead of using adjectives to describe our height we’ll use standard deviations.

**What do height, I.Q. and test scores have in common?** All are normally distributed.

The chart below shows that most values are close to the mean (average); half are above the mean and half are below.

Some are taller (to the right) or shorter (to the left). Very few are extremely tall or short.

In the U.S. average male height is 5 feet 9.3 inches (69.3 inches, 176 cm). source

Left and right sides of the chart are symmetric. Normally distributed data-sets share this symmetry but the spread varies. Values may cluster around the mean or be more spread out. Standard deviation measures this. Normal distribution is also known as Gaussian distribution and the bell curve.

Standard deviation measures the spread of the numbers from the average (mean) in normally distributed data-sets.

- 68.27% of values are within one standard deviation of the mean.
- 95.45% of values are within two standard deviations of the mean.
- 99.73% of values are within three standard deviations of the mean.

It works for all normally distributed data-sets. In statistics this is known as the empirical rule.
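As a quick check outside Excel, here is a minimal Python sketch that derives the three percentages from the standard normal curve. The helper name `share_within` is made up for this illustration.

```python
from statistics import NormalDist

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```

Because the calculation uses the standard normal curve, the same shares apply whatever the mean and standard deviation of the data-set are.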

Data clusters closer to the mean as standard deviation decreases. The empirical rule still works.

For U.S. male height 1 standard deviation = 2.94 inches (7.5 cm). Let’s explore this in Excel!

In sheet ‘INPUTS & DATA’ I created a sample data-set using:

- mean male height of 5 feet 9.3 inches (69.3 inches, 176 cm)
- standard deviation of 2.94 inches (7.5 cm)
- formula in column E =NORM.INV(RAND(), 69.3, 2.94) (pasted as values)

Below we see the sample of 30000 produced results very close to the empirical rule.

- 20510 (68.37%) within 1 standard deviation of the mean (66.37 to 72.24 inches, 168.57 to 183.49 cm)
- 28618 (95.39%) within 2 standard deviations of the mean (63.43 to 75.18 inches, 161.11 to 190.95 cm)
- 29932 (99.77%) within 3 standard deviations of the mean (60.49 to 78.11 inches, 153.65 to 198.41 cm)
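The same experiment can be sketched in Python. This is only an illustration, not the workbook's logic: the seed is an assumption added so the run is repeatable, the function names are made up, and the counts will differ slightly from the Excel sample because the random draws differ.

```python
import random
from statistics import NormalDist

MEAN_IN, SD_IN, N = 69.3, 2.94, 30000  # inputs quoted in the post

def sample_heights(n=N, seed=1):
    """Draw n heights the way NORM.INV(RAND(), mean, sd) does: uniform -> inverse CDF."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability only
    dist = NormalDist(MEAN_IN, SD_IN)
    heights = []
    while len(heights) < n:
        u = rng.random()
        if u > 0.0:  # inv_cdf is undefined at exactly 0, so skip that rare draw
            heights.append(dist.inv_cdf(u))
    return heights

def count_within(heights, k):
    """Count values within k standard deviations of the input mean."""
    return sum(abs(h - MEAN_IN) <= k * SD_IN for h in heights)

heights = sample_heights()
for k in (1, 2, 3):
    c = count_within(heights, k)
    print(f"{c} ({c / N:.2%}) within {k} SD")
```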

Displayed using feet and inches:

- 68.37% are between 5 feet 6 inches and 6 feet 0 inches.
- 95.39% are between 5 feet 3.4 inches and 6 feet 3.2 inches.
- 99.77% are between 5 feet 0 inches and 6 feet 6 inches.

Bill, at 1.26 standard deviations above the mean, falls between the 1st and 2nd standard deviation.

At 2.28 standard deviations above the mean (6.7 inches ÷ 2.94), I fall between the 2nd and 3rd standard deviation.

Men above 6 feet 6 inches (past the 3rd standard deviation) are extremely tall. Kevin Durant, at 6 feet 11 inches, is 4.7 standard deviations above the mean.

This entire post I’ve been referring to male height in the United States.

**What if Bill and I were to travel to Bolivia? Would we be taller?**

Of course not but we’d be perceived as being taller by the local people! In Bolivia my height would put me 4.5 standard deviations above the 5 feet 3 inch mean (2.92 inch SD). Bill would be 3.4 standard deviations above the mean. If we traveled to The Netherlands I would be 1.7 standard deviations above the mean and Bill would be 0.6. We would blend in quietly. Isn’t that weird?
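The "how many standard deviations" figures come from one small calculation: subtract the mean and divide by the standard deviation. A minimal Python sketch, using only the U.S. and Bolivia figures quoted in this post (the function name is made up for illustration):

```python
def sd_from_mean(height_in, mean_in, sd_in):
    """Number of standard deviations a height sits above (+) or below (-) the mean."""
    return (height_in - mean_in) / sd_in

US = (69.3, 2.94)       # U.S. male mean and SD in inches, from the post
BOLIVIA = (63.0, 2.92)  # 5 feet 3 inch mean and 2.92 inch SD, from the post

bill, kevin = 73, 76    # 6 feet 1 inch and 6 feet 4 inches

print(round(sd_from_mean(bill, *US), 2))        # 1.26
print(round(sd_from_mean(kevin, *BOLIVIA), 1))  # 4.5
print(round(sd_from_mean(bill, *BOLIVIA), 1))   # 3.4
```

The same height gives a different result in each country because only the mean and standard deviation change.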

If a data-set is perfectly symmetrical (left side of chart is exactly like right side) the skew is zero.

Our sample of 30000 gave us a skew of 0.00561. Closer to zero means more symmetrical.

In sheet ‘STATS’ row 26 I calculate the skew for various small samples.

- skew = -0.6108 (10 rows of sample data)
- skew = -0.1818 (100 rows of sample data)
- skew = -0.0972 (1000 rows of sample data)
- skew = 0.0072 (10000 rows of sample data)

The skew moves closer to zero as we include more data! If a data-set is truly normally distributed the skew approaches zero as the sample increases.
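For readers who want to see what a skew calculation does, here is a minimal Python sketch of the sample skewness formula that Excel's SKEW function documents. The function name `sample_skew` is made up, and it is an assumption that the workbook uses SKEW rather than another variant.

```python
from statistics import mean, stdev

def sample_skew(values):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m, s = mean(values), stdev(values)  # stdev is the sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

print(sample_skew([1, 2, 3, 4, 5]))   # symmetric data, so the skew is zero
print(sample_skew([1, 1, 1, 10]))     # long tail to the right, so the skew is positive
```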

In sheet ‘STATS’ rows 23 & 24 we see the mean and median values for the small samples. Row 25 shows that the absolute difference decreases as we include more sample data!

The mean and median values get closer and closer as we increase the sample size.

- 0.832 absolute difference (10 rows of sample data)
- 0.169 absolute difference (100 rows of sample data)
- 0.041 absolute difference (1000 rows of sample data)
- 0.026 absolute difference (10000 rows of sample data)

Horizontal axis labels are linked to cells I14:I22 (sheet STANDARD DEVIATION CHART).

To get the right look and functionality I used three chart tricks.

(1) The axis labels weren’t displaying properly (text was jammed together)

To fix this I forced two carriage returns using the CHAR function. Character 10 does the magic. Now the text displays nicely in three lines.

(2) I wanted a way to easily switch between metric and imperial

I added a check box (check = metric, uncheck = imperial) and used the TRUE FALSE in the formula.

(3) I only wanted to display the integer (not all the decimals)

I used the TEXT function to format the number (it rounds to the nearest integer). I also could have used the FLOOR function, which rounds down instead.

The end formula is a bit long but it gets the job done:

=IF(H13=0,"",IF(H13=1,"man ","men ")&CHAR(10)&IF($J$12,TEXT(F12,0)&" to "&TEXT(F13,0)&CHAR(10)&" inches",TEXT(G12,0)&" to "&TEXT(G13,0)&CHAR(10)&" cm"))

Download my Excel file. There are 5 sheets:

- INPUTS & DATA enter parameters to create data-set
- PIVOT TABLE & CHART summarize & visualize
- STANDARD DEVIATION CHART visualize by standard deviations
- STATS additional statistics
- HEIGHT EXAMPLES used in this post

To see how to create the sample data you can replace the pasted values in column E (sheet INPUTS & DATA) with the formula =NORM.INV(RAND(), $B$5, $B$8)

There’s so much more to learn! Here are some interesting www links:

- Investopedia’s Normal Distribution explanation.
- Interesting articles and calculators at Tall.Life.com
- KhanAcademy NormalDistributionIntro and Excel file.
- How tall is tall discussion at Quora.
- Advanced statistics from statisticsbyjim

My name is Kevin Lehrbass. I’m a Data Analyst.

Normal stores rarely sell my pant size. L.L. Bean’s catalog used to have my size but not any more. Big & Tall stores say “we don’t have that small size” or they do but a single pair costs a fortune.

Fortunately that day I did find my size and there was a sale! I bought all the pairs they had in stock!


A UDF (user-defined function) is a special kind of VBA. It allows you to create your own functions and use them in a cell just like built-in ones! Let’s start with an example.

Here we shared formula solutions to Leila’s challenge: *look for the App in range B5:D45, get column header B4:D4.* See how Geraldo’s GetDiv UDF solves this!

Geraldo shared this UDF solution: (Download zipped Excel file)

Function GetDiv(LookupVal As String, Titles As Range, SearchArea As Range) As String

GetDiv = Cells(Titles.Row, SearchArea.Find(LookupVal).Column)

End Function

Geraldo’s function used in cell L5:

=GetDiv(H5,$B$4:$D$4,$B$5:$D$45)

His explanation helped me understand:

The function expects 3 parameters, GetDiv(LookupVal, Titles, SearchArea), as below:

1st, LookupVal is the value we are looking for ($H5 for the first App name)

2nd, Titles is the range where we have the Divisions’ names: $B$4:$D$4

3rd, SearchArea is the area where we’ll look for the LookupVal: $B$5:$D$45

Initially I thought SearchArea was a VBA property (it’s just the name of the parameter for where we look for the App).

Now let’s explore further by playing with the code!

This part SearchArea.Find(LookupVal).Column seemed odd so in sheet ‘test Geraldo’s udf‘ I started playing with some vba.

**How does Range.Cells property work?** (SearchArea is a range parameter)

In a basic form it needs two numbers: one for a row, one for a column.

This code assigns a text value to cell E3:

'provide simple co-ordinates to Cells

Sub Assign_Text_To_A_Cell()

Cells(3, 5).Value = "assign text to a single cell"

End Sub

Cell E3 (row 3, column 5) now has text “assign text to a single cell”.

We could also extract the existing value from a cell:

'extract the value from a cell and display it in a message box

Sub Extract_and_Display_value()

Dim getvalue As String

getvalue = Cells(9, 3).Value

MsgBox (getvalue)

End Sub

There are many possible applications of Range.Cells! Let’s go back to Geraldo’s code:

Cells(Titles.Row, SearchArea.Find(LookupVal).Column)

Titles.Row is the row number of range B4:D4 (row 4). SearchArea.Find(LookupVal).Column provides a column number so we know which text from B4:D4 to retrieve.
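To see the same idea away from VBA, here is a minimal Python sketch: find the value in a grid, note its column, then return the title from that column. The sample data and the name `get_div` are placeholders, not the contents of the workbook. Unlike Range.Find, whose matching depends on its settings, this sketch only accepts exact matches.

```python
def get_div(lookup_val, titles, search_area):
    """Return the title above the column where lookup_val is found.

    titles      -- list of column headers (plays the role of B4:D4)
    search_area -- list of rows, each as wide as titles (plays the role of B5:D45)
    """
    for row in search_area:
        for col_index, cell in enumerate(row):
            if cell == lookup_val:
                return titles[col_index]  # same column, title row
    return None  # not found; the VBA version would raise an error here instead

# Placeholder data for illustration only
titles = ["Division A", "Division B", "Division C"]
search_area = [
    ["app1", "app2", "app3"],
    ["app4", "app5", "app6"],
]
print(get_div("app5", titles, search_area))  # Division B
```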

Another way to explain SearchArea.Find(LookupVal).Column is:

In range G16:I18 what is the column number where we find the word “fish”?

This code displays the column number where “fish” is found:

'Let's examine syntax: SearchArea.Find(LookupVal).Column from UDF GetDiv

Sub examinesyntax()

Dim LookupVal As String, SearchArea As Range, GetColumnNumber As Integer

LookupVal = "fish"

'SearchArea = Range("G16:I18") <--wrong syntax (an object variable needs Set). Google search helped to create next line

Set SearchArea = Sheets("test Geraldo's udf").Range("G16:I18")

GetColumnNumber = SearchArea.Find(LookupVal).Column

'Had to google for help with MsgBox syntax

MsgBox ("What is the column number where we find the word fish?" & vbCrLf & "It's column " & GetColumnNumber & ".")

End Sub

Building the code really helped me understand SearchArea.Find(LookupVal).Column.

Ready for more vba fun?!

Let’s see how the code behind the right button works (show column number for random item) !

In this case the random word is “orange”.

Rerun to see a different random word.

Here’s the code behind the message box:

'Add more functionality: select random word, show col number

Sub examinesyntaxv2()

Dim LookupVal As String, SearchArea As Range, GetColumnNumber As Integer, Randr As Integer, Randc As Integer, Lookupword As String

Randr = WorksheetFunction.RandBetween(16, 18) 'random row

Randc = WorksheetFunction.RandBetween(7, 9) 'random column

Lookupword = Cells(Randr, Randc).Value

'SearchArea = Range("G16:I18") <--wrong syntax (an object variable needs Set). Google search helped to create next line

Set SearchArea = Sheets("test Geraldo's udf").Range("G16:I18")

GetColumnNumber = SearchArea.Find(Lookupword).Column

'Had to google for help with MsgBox syntax

MsgBox ("What is the column number where we find the word " & Lookupword & "?" & vbCrLf & "It's column " & GetColumnNumber & ".")

End Sub

Now you won’t always get “fish”. Notice that vbCrLf creates a line break in the MsgBox.

The knowledge doesn’t fall from the sky. I have to practice! Necessity and curiosity (and caffeine) fuel my learning but there’s a lot of tinkering, many google searches for syntax ideas and some grit to get the code just right.

By the way…who is Geraldo??? Who is the person behind the GetDiv UDF?

Geraldo is a Data Analyst from São Paulo Brazil.

He loves challenges and creating solutions to solve them!

Geraldo has worked in Finance and knows his way around Excel (pivots, formulas, vba), SQL, Cobol, DBase III, VBS, ASP/XML and now the new stuff: Power BI (M, Dax) with Python being his next thing to conquer! An amazing set of skills!

Thanks for the UDF Geraldo and I look forward to learning more from you!

Learn, build stuff, repeat. That’s it! You can learn a lot of vba on your own but there are many tips and nuances that you can learn from a professional. More structured learning often helps get past certain hurdles.

How you learn is up to you (books, videos, etc). Here are some suggestions:

- Paul Kelley’s website
- Leila Gharani’s YouTube channel
- Jon Acampora’s How to write a UDF
- Dan Strong’s YouTube channel
- Dinesh Kumar Takyar’s YouTube channel
- Sumit Bansal’s Guide to creating a UDF

My name is Kevin Lehrbass. I’m a Data Analyst living in Markham Ontario Canada.

I often spend hours playing with Microsoft Excel but eventually a few things happen:

- My dogs get mad at me (they want to play)
- I need more coffee
- There’s a strange pain in my stomach. Oh yeah…food.

I have two wonderful dogs, Cali and Fenton. Here you see Cali demanding that I take a break from the spreadsheet. We’ll go play in the backyard for a while and then read a book (Excel or Power BI) on the couch.


1)**Kevin’s FEN viewer basic**, 2)**Kevins-FEN-viewer-plus-v3** (database, FEN explanation).

Given this FEN text…

r6k/2R5/6R1/pp1Ppp2/8/Pn2B1Pr/4KP2/8 w - - 0 1

…we can create this chess position:

The piece-placement part of a FEN (everything before the first space) is split into 8 parts separated by “/”. Each part is a row on a chess board.

**“r6k”** describes row 8 at the top. Let’s examine each item:

- lowercase “**r**” = black rook (top left a8)
- number “**6**” = 6 consecutive blank squares
- lowercase “**k**” = black King (top right h8)

**“2R5”** describes row 7.

- number “**2**” = 2 blank squares (a7 & b7)
- uppercase “**R**” = white rook
- number “**5**” = 5 blank squares

Near the end the “**w**” indicates that it’s white’s move.

Here we see each row’s FEN code:

Robert Gascon’s ‘**Excel Chess Games Viewer**‘ inspired me to create this FEN viewer two weeks ago. I used 6 steps spread across 42 columns.

Unhide columns to see the formulas:

- on the ribbon select View and check Headings
- select columns N to BI
- right click & unhide

Steps start in column P and move to the right:

Step 1 splits FEN r6k/2R5/6R1/pp1Ppp2/8/Pn2B1Pr/4KP2/8 out per row in column P

Note: each number = consecutive blank squares. Step 6 has 8 cells representing each square in a chess row. **A FEN is compact. My idea? Spread the FEN over 8 squares of a chess row.**

“r6k” becomes “r666666k”. “2R5” becomes “22R55555” (step 6 in column AX). Audit the formulas in all steps to understand fully.
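The "spread the FEN over 8 squares" idea is easy to try outside Excel. Here is a minimal Python sketch of it (the function name is made up; the workbook does this with formulas spread over several steps instead):

```python
def expand_fen_row(row):
    """Spread one FEN row over 8 characters, repeating each digit once per blank square."""
    return "".join(ch * int(ch) if ch.isdigit() else ch for ch in row)

fen = "r6k/2R5/6R1/pp1Ppp2/8/Pn2B1Pr/4KP2/8 w - - 0 1"
placement = fen.split(" ")[0]   # the first field holds the piece placement
rows = placement.split("/")     # 8 parts, row 8 first
expanded = [expand_fen_row(r) for r in rows]

for row, wide in zip(rows, expanded):
    print(f"{row:10} -> {wide}")
```

Every expanded row is exactly 8 characters long, so each character lines up with one square of the board.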

On the chess board look at cell C7 array formula:

=IFERROR(INDEX($BH$7:$BH$19,MATCH(TRUE,EXACT(AX7,$BG$7:$BG$19),0)),"")

Cell C7 looks for AX7 value “r” in column BG. The answer is a chess icon from column BH.

A couple of important parts of the formula:

- “r” is different from “R” so I used MATCH(TRUE,EXACT(
- I used IFERROR (Numbers are blank squares. There’s nothing to display.)

A good summary from Wikipedia:

FEN is based on a system developed by Scottish newspaper journalist David Forsyth. Forsyth’s system became popular in the 19th century; Steven J. Edwards extended it to support use by computers. FEN is an integral part of the Portable Game Notation for chess games, since FEN is used to define initial positions other than the standard one. FEN does not provide sufficient information to decide whether a draw by threefold repetition may be legally claimed or a draw offer may be accepted; for that, a different format such as Extended Position Description is needed.

- Excel formula calculates value of Chess pieces
- Robert Gascon’s Chess game viewer
- Diarmuid Early’s Chess game viewer
- Daniel Ferry’s Chess game viewer
- Pedro Wave’s Chess board PGN viewer

- Amazing Videos from Agadmator YouTube Chess Channel
- Play and learn chess: www.chess.com
- Study chess tactics (online or via book)
- Chess phone apps (i.e. ‘Chess Time’, ‘Shredder Chess’, ‘Chess Tactics Pro’)

FEN examples from ‘Kevin’s FEN viewer plus v3’:

- Row 16 agadmator: find the next move (Carlsen vs Anand)

Agadmator sometimes asks us to pause the video and find the best next move.

Subscribe to his YouTube channel! You will learn a lot.

- Row 17 my game: find the mate combo! (black’s move)

Use my Excel file to collect your FENs.

My name is Kevin Lehrbass. I’m a Data Analyst. I live in Markham Ontario Canada.

I know….two chess related Excel posts in a week is a bit too much for most. But some of us love both so why not?


Canada uses humidex calculation while the U.S. uses HeatIndex.

Humidex (short for humidity index) is an index number used by Canadian meteorologists to describe how hot the weather feels to the average person, by combining the effect of heat and humidity.

The heat index (HI) or humiture is an index that combines air temperature and relative humidity, in shaded areas, to posit a human-perceived equivalent temperature, as how hot it would feel if the humidity were some other value in the shade.

I found what appear to be the common formulas for Humidex & HeatIndex.

Let’s start with **Humidex formula**. You’ve seen a heat advisory but a formula advisory?

This formula is extremely long and tedious to read. It may cause dizziness, exhaustion, confusion, etc. Seek professional help if you feel any of these symptoms.

By entering a few carriage returns we can isolate each nested IF. It’s a bit easier to read.

Upon further examination I noticed that this part repeats several times:

(((-42.379+2.04901523*D4+10.14333127*C4-0.22475541*D4*C4-0.00683783*D4*D4-0.05481717*C4*C4+0.00122874*D4*D4*C4+0.00085282*D4*C4*C4-0.00000199*D4*D4*C4*C4)-32)*5/9)

I put it in a named range called ‘hx’. I also created named range ‘cnvt’ to convert Celsius to Fahrenheit.
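Pulling a repeated expression out into a name works the same way in any language. Here is a minimal Python sketch of the repeated part as a small function. It assumes D4 holds the temperature in Fahrenheit and C4 holds the relative humidity in percent, which is how these coefficients are usually applied; the function name `hx` simply mirrors the name used above.

```python
def hx(temp_f, rel_humidity):
    """The repeated expression: a 'feels like' value in Fahrenheit, converted to Celsius.

    temp_f       -- air temperature in Fahrenheit (assumed to be cell D4)
    rel_humidity -- relative humidity in percent (assumed to be cell C4)
    """
    t, rh = temp_f, rel_humidity
    feels_like_f = (-42.379 + 2.04901523 * t + 10.14333127 * rh
                    - 0.22475541 * t * rh - 0.00683783 * t * t
                    - 0.05481717 * rh * rh + 0.00122874 * t * t * rh
                    + 0.00085282 * t * rh * rh - 0.00000199 * t * t * rh * rh)
    return (feels_like_f - 32) * 5 / 9  # Fahrenheit to Celsius

print(round(hx(90, 50), 1))  # 34.8
```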

**The humidex formula below is now much easier to read!**

You’ll have to also audit the named ranges but it’s worth it!

Now let’s look at the **HeatIndex formula**.

This is from **Dick Kusleika’s Excel site.** I can’t think of a way to make this easier to read, can you?

I definitely agree with Dick’s point below!

People have been complaining about the excessive heat for a week around here. Not me. It won’t be long until I’m shoveling my driveway, so I’m counting my blessings.

I’ve enjoyed this warm summer. Today (Aug 10) had a high of 23C, 41% humidity and a cool breeze (much cooler than most days in July!)

After summer is nice fall weather. Then in the winter we will be complaining about the wind chill factor (maybe I’ll calculate that too!) and dreaming of hot summer days!

*What is your definition of heat?* Many from the north of Canada consider southern Ontario summers to be unbearable. But those of you from warm climates would laugh at us in Canada. But then again if you visited Canada in the middle of our winter you would be in shock and we would know what to do.

The overall “how we feel” calculations have different variations. Also, we all experience heat differently depending on our tolerance (i.e. health, age), intensity of our outdoor activities, etc.

Here’s some **interesting info** regarding working outside in the heat (Canadian government). My Excel file has various related links.

My name is Kevin Lehrbass. I’m a Data Analyst and I live in Markham Ontario Canada.

Right now it’s summer and it’s been hot (several 30C + days).

Our climate varies a tremendous amount from winter to summer. During winter the temperature can go all the way down to -30C (I live in the southern warmer part of Canada!) but usually it’s between 5C and -20C.

In winter we have to clear our driveways (or pay someone) but the city clears the roads and sidewalks. In Markham we don’t have to shovel the roof of the house but apparently in some parts of Canada that’s necessary. Here the temperature rises enough to periodically melt the snow on the roof.

I’m going to enjoy the last part of summer being in my backyard without putting on a coat or shoes!

(download Robert’s amazing **Excel ChessGames Viewer**)

It’s rare that Chess & Excel overlap! *What should I do first?* Audit formulas? Review games?

See how to add a new game and how to review the moves on the board.

Let’s review a classic game! Below we see sheet ‘Board’.

Select a game: in cell B4 I selected ‘Kasparov’s Brilliancy’.

See the moves: click the “**^**” spinner button on the right.

The next move is the brilliant move! **Can you see it?**

Hold spin buttons to quickly cycle through moves or enter sequence number in the cell K6.

Everything starts with the game you select in cell B4. Cells B5 & B6 retrieve game details.

In cell B9 we see this key formula:

=LOOKUP(IFERROR(LOOKUP(2,

1/(CHOOSE(MoveChoice,WhiteMoves,BlackMoves)=B$8&$A9),

CHOOSE(MoveChoice,WhiteLabels,BlackLabels)),"ET"),PieceIcons)

There are two LOOKUP functions in this formula. Let’s examine the inner LOOKUP.

LOOKUP’s **lookup_value** is hard-coded to 2 (I’ll explain later).

1/(CHOOSE(MoveChoice,WhiteMoves,BlackMoves)=B$8&$A9) is the **lookup_vector** (where we look).

Select **lookup_vector** and press F9 key to see this:

Why so many errors? Because of =B$8&$A9. Only the position that matches the current cell’s co-ordinate (a8) gives TRUE, and 1/TRUE = 1. Every other position is 1/FALSE, which is a #DIV/0! error. You’ll see a 1 above in the 19th position.

CHOOSE(MoveChoice,WhiteLabels,BlackLabels) is the **result_vector** (answer we retrieve).

Select **result_vector** and press F9 key to see this:

The **lookup_value** is hard-coded to 2. Every entry in **lookup_vector** is either a 1 or an error, so LOOKUP never finds 2. It ignores the errors and falls back to the last numeric value, the 1 in the 19th position. The answer is **BR** (position 19 of **result_vector**), which is then used in the outer LOOKUP function!

=LOOKUP("BR",PieceIcons) looks for BR (black rook) in named range PieceIcons (=Board!$L$9:$M$21)
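The LOOKUP(2, 1/(...)) pattern boils down to "give me the result at the last position where the test is true". A minimal Python sketch of that behaviour, with placeholder data standing in for the named ranges (the function name and sample lists are made up):

```python
def lookup_last_match(moves, labels, target, default="ET"):
    """Mimic IFERROR(LOOKUP(2, 1/(moves=target), labels), "ET")."""
    result = default
    for move, label in zip(moves, labels):
        if move == target:   # 1/TRUE = 1; non-matches are errors and get ignored
            result = label   # keep overwriting so the last match wins
    return result

# Placeholder data for illustration only
moves = ["e4", "a8", "d5", "a8"]
labels = ["WP", "WR", "BP", "BR"]

print(lookup_last_match(moves, labels, "a8"))  # BR (the last match)
print(lookup_last_match(moves, labels, "h1"))  # ET (no match, so the default)
```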

To get all the way back to the raw data (sheet ‘Games’) you’ll have to audit these named ranges found inside the double LOOKUP function:

- MoveChoice
- WhiteMoves
- BlackMoves
- WhiteLabels
- BlackLabels

Here are some auditing tips:

- click inside formula bar to see referenced cells
- unhide columns L & M (to see chess icons)
- note spinner button values hidden underneath it
- use F9 key on each part to see results (then press ‘Esc’)
- audit named ranges carefully

There are 45 named ranges. Select *Formulas / Name Manager*. Here’s a sample:

Starting in column B each column is a game with moves starting in row 8. This is modern chess notation. Each row contains a move from white and black.

Column F’s game is called ‘Amazing Nakamura’. I played through the moves and IT IS amazing!

Cell B17 (sheet Board) displays chess piece material advantage. Note: a material advantage doesn’t necessarily mean a player is winning.

=CHOOSE((PtLd>-1)+(PtLd>0)+1,"Black","None","White")&" has a"&

IF(PtLd," "&ABS(PtLd)&"-","ny ")&

"point lead in chesspieces."

Formula above uses named range PtLd (sheet Pieces) where the calculation happens. PtLd formula is:

=SUM(COUNTIF(C5:J12,

{"B","W"}&{"P";"N";"B";"R";"Q"})*

{-1,1}*{1;3;3;5;9})

This calculates the value of the pieces. See this **post** for a detailed explanation.
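As a rough illustration of both formulas, here is a Python sketch. It assumes each board cell holds a two-letter label such as "WQ" or "BR" (colour then piece), which is what the COUNTIF criteria suggest; the function names are made up. Unlike COUNTIF, this sketch is case-sensitive.

```python
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}  # same points as {1;3;3;5;9}

def point_lead(board_cells):
    """White's material minus Black's; kings and empty squares count for nothing."""
    lead = 0
    for cell in board_cells:
        if len(cell) == 2 and cell[0] in "BW" and cell[1] in PIECE_VALUES:
            sign = 1 if cell[0] == "W" else -1  # the {-1,1} part of the formula
            lead += sign * PIECE_VALUES[cell[1]]
    return lead

def lead_message(lead):
    """Build the same sentence as the CHOOSE/IF formula in cell B17."""
    side = "White" if lead > 0 else "Black" if lead < 0 else "None"
    detail = f" {abs(lead)}-" if lead else "ny "
    return f"{side} has a{detail}point lead in chesspieces."

cells = ["WQ", "BR", "BK", "WK", ""]  # placeholder position
print(lead_message(point_lead(cells)))  # White has a 4-point lead in chesspieces.
```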

Robert H. Gascon is a Certified Public Accountant from Quezon City, Philippines.

See his Microsoft tech community **profile**. Robert is a valued contributor meaning that he answers a lot of questions posted by Excel users.

Robert has an incredibly deep knowledge of Microsoft Excel.

Robert has shared alternative solutions on my blog. I’ve learned a lot from him. Thank you Robert!

- how-many-unique-list-1-names-found-in-list-2
- sorted-data-validation-list
- largest-number-inside-alphanumeric-string
- dynamic-ranges-using-index-function
- what-is-this-formula-doing
- concatenate-values-to-create-a-key-in-excel
- allocating-costs-in-microsoft-excel

My name is Kevin Lehrbass. I’m a Data Analyst. I live in Markham (Canada).

In 2018 I visited New York City. In Central Park I found outdoor chess tables and an indoor chess club.

I’ve played chess since I was 11. I was on the chess team in high school. Chess is fascinating and it’s great mental exercise. In the 90s I discovered Excel. I’ve been hooked ever since that day!

And…July 20th was **International Chess Day!**

The steps we create in Power Query are translated into M code (Power Query language). Understanding it and modifying it requires practice!

In this **post** XLarium shared an alternative solution (thanks!) to create a Cartesian product (all text combinations between multiple lists). Let’s start by auditing his solution.

First step is the same: load all 3 tables into power query. But he solves it with only 1 query (I used 2). Click ‘Advanced Editor’ to see his M code:

let

Source = Colors,

#"Removed Columns" = Table.RemoveColumns(Source,{"Dummy"}),

#"Added Custom" = Table.AddColumn(#"Removed Columns", "Temp", each Sizes),

#"Expanded {0}" = Table.ExpandTableColumn(#"Added Custom", "Temp", {"Size"}, {"Size"}),

#"Added Custom1" = Table.AddColumn(#"Expanded {0}", "Temp", each Animals),

#"Expanded {0}1" = Table.ExpandTableColumn(#"Added Custom1", "Temp", {"Animal"}, {"Animal"})

in

#"Expanded {0}1"

Let’s review it row by row

Source = Colors,

He references the Colors query (the loaded table).

#"Removed Columns" = Table.RemoveColumns(Source,{"Dummy"}),

His approach doesn’t need a Dummy column so he removes it.

#"Added Custom" = Table.AddColumn(#"Removed Columns", "Temp", each Sizes),

The current step refers to the previous step hence: #"Removed Columns".

Table.AddColumn is obvious but the end part “Temp”, each Sizes puzzled me. So I closed the Advanced Editor and examined ‘Added Custom’ under APPLIED STEPS.

I clicked the gear icon for step ‘Added Custom’.

Even though ‘Color’ is the only available column he uses **=Sizes** to reference query Sizes!

The result is this:

**What is this?**

Each row in field ‘Color’ gets the entire ‘Sizes’ query (‘Table’) !!

Row 1 (‘Blue’) gets each row from query ‘Sizes’. Repeat for each row.

#"Expanded {0}" = Table.ExpandTableColumn(#"Added Custom", "Temp", {"Size"}, {"Size"}),

This long code simply means the double arrow button (right of ‘Temp’) was clicked!

Result of expanding field ‘Temp’ is all combinations of ‘Colors’ & ‘Sizes’ (24 total rows) !

Here we see a sample.

6 colors X 4 sizes = 24 total rows

#"Added Custom1" = Table.AddColumn(#"Expanded {0}", "Temp", each Animals),

Once again XLARIUM uses “Temp” field to create multiple rows for each single row!

#"Expanded {0}1" = Table.ExpandTableColumn(#"Added Custom1", "Temp", {"Animal"}, {"Animal"})

And expanding this gives us the final result of 72 rows!!!
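The "add a Temp column holding the whole other table, then expand it" idea can be sketched in a few lines of Python. The list values below are placeholders (only ‘Blue’ is named in this post) and `add_and_expand` is a made-up helper, not a Power Query function.

```python
from itertools import product

colors = ["Blue", "Red", "Green", "Yellow", "Black", "White"]  # placeholders apart from "Blue"
sizes = ["S", "M", "L", "XL"]                                  # placeholder values
animals = ["Cat", "Dog", "Bird"]                               # placeholder values

def add_and_expand(rows, other):
    """Attach the whole `other` list to every row (the "Temp" column), then expand it."""
    with_temp = [(row, other) for row in rows]  # like Table.AddColumn(..., each other)
    return [row + (item,) for row, temp in with_temp for item in temp]  # like expanding Temp

step1 = add_and_expand([(c,) for c in colors], sizes)  # colors x sizes
step2 = add_and_expand(step1, animals)                 # ... x animals

print(len(step1), len(step2))  # 24 72
print(sorted(step2) == sorted(product(colors, sizes, animals)))  # True
```

The last line checks the result against Python's built-in `itertools.product`, which builds the same Cartesian product directly.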

**Why would I modify his M code?**

His solution doesn’t use Dummy columns so I removed them in the original data (sheet ‘Inputs’).

After a refresh Power Query realizes dummy columns don’t exist. The code doesn’t work.

Below we see the original M code. I’m going to modify it !

let

Source = Colors,

#"Removed Columns" = Table.RemoveColumns(Source,{"Dummy"}),

#"Added Custom" = Table.AddColumn(#"Removed Columns", "Temp", each Sizes),

#"Expanded {0}" = Table.ExpandTableColumn(#"Added Custom", "Temp", {"Size"}, {"Size"}),

#"Added Custom1" = Table.AddColumn(#"Expanded {0}", "Temp", each Animals),

#"Expanded {0}1" = Table.ExpandTableColumn(#"Added Custom1", "Temp", {"Animal"}, {"Animal"})

in

#"Expanded {0}1"

#"Removed Columns" = Table.RemoveColumns(Source,{"Dummy"}),

We remove this row of M code. We no longer have Dummy columns in our source data.

#"Added Custom" = Table.AddColumn(#"Removed Columns", "Temp", each Sizes),

See the problem above? Step #"Added Custom" refers to the previous step #"Removed Columns" that we removed. No problem, we change #"Removed Columns" to Source (query ‘Colors’).

Now we have this shorter code version of XLarium’s M code:

let

Source = Colors,

#"Added Custom" = Table.AddColumn(Source, "Temp", each Sizes),

#"Expanded {0}" = Table.ExpandTableColumn(#"Added Custom", "Temp", {"Size"}, {"Size"}),

#"Added Custom1" = Table.AddColumn(#"Expanded {0}", "Temp", each Animals),

#"Expanded {0}1" = Table.ExpandTableColumn(#"Added Custom1", "Temp", {"Animal"}, {"Animal"})

in

#"Expanded {0}1"

**I now fully understand XLARIUM’s “Temp” column concept to create a Cartesian product! Thanks XLARIUM!**

My name is Kevin Lehrbass. I’m a Data Analyst. I live in Markham Ontario Canada.

I have 2 dogs. This is Cali. She was upset that I was working on this post (on the weekend).

Don’t worry…I spent time with her and Fenton. It’s been hot lately and Cali loves sunning on the deck in the backyard. Fenton loves running in circles.


(Download: **SOLUTION Excel file**, **PRACTICE Excel file**)

Cartesian product creates all combinations from multiple lists. In this post I’ll use Power Query!

In sheet ‘Inputs’ we have **6 colors, 4 sizes, and 3 animals** giving us **72 combinations.**

(see *BE CAREFUL!* section below if you have thousands of combinations).

**1 – Load 3 tables into Power Query** (from sheet ‘Inputs’).

We need a ‘Dummy’ column with a 1 in each row (I’ll explain later).

- Select any cell within a table
- On the ribbon select ‘Data’ and ‘From Table/Range’
- At top left click ‘Close & Load’ drop down arrow, select ‘Close & Load To’
- Select ‘Only Create Connection’ (as seen below)
- Repeat steps to load other tables

**2 – Merge the Queries.** Use loaded tables to create all combinations.

The 3 loaded tables as queries.

2 new queries will create all row combinations.

We’ll start by combining Colors & Sizes. Highlight queries Colors & Sizes.

Click drop down arrow for ‘Merge Queries’ and select ‘Merge Queries as New’.

Details for the 1st merge:

- Select Sizes in 1st drop down list and Colors in the 2nd
- Highlight Dummy fields in both tables
- Select join kind ‘Left Outer’
- Click OK and name it ‘**Merge1_ColorsSizes**’

Click double arrows (Colors column) to expand to all rows.

Now you see all combinations for Sizes and Colors.

2nd and final query. Highlight Animals and Merge1_ColorsSizes

Click drop down arrow for ‘Merge Queries’ and select ‘Merge Queries as New’.

Details for the 2nd merge:

- Select Animals in 1st drop down list, Merge1_ColorsSizes in the 2nd
- Select Dummy fields in both tables
- Select join kind ‘Left Outer’
- Click OK and name it ‘**Merge2_AnimalsMerge1**’

Click double arrows (top right) to expand to all rows.

You’ll now see all 72 rows but let’s remove column ‘Dummy’. Right click on column Dummy and remove (we no longer need it).

Now you should see 3 columns and 72 rows!

**3 – Export Back to a Sheet.**

Select query Merge2_AnimalsMerge1.

Top left of screen select drop down arrow on ‘Close & Load’ button, select ‘Close & Load To’

Select ‘Table’, ‘Existing worksheet’ and a destination cell.

Normally we’d have unique IDs in a table (one side) and then repeating IDs in another table (many side). Example: Product table has a row for each product we sell. Sales table has a row for each sale (products sold many times).

But…here we use the 1s in the Dummy fields so that each single row in the first table matches all the rows in the second table. It’s known as a ‘Many to Many’ relationship. It’s usually bad news but here we intentionally use it to get all combinations between these tables.
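Here is a small Python sketch of why the Dummy column works. It is a plain nested-loop join with placeholder rows and a made-up function name, not what Power Query does internally: because every row carries the same key value, every row on one side matches every row on the other.

```python
def merge_on(left, right, key):
    """Join two lists of dicts on `key`, keeping every matching pair of rows."""
    merged = []
    for l_row in left:
        for r_row in right:
            if l_row[key] == r_row[key]:
                merged.append({**l_row, **r_row})
    return merged

# Placeholder rows; every row gets Dummy = 1, as in sheet 'Inputs'
sizes = [{"Size": s, "Dummy": 1} for s in ("S", "M")]
colors = [{"Color": c, "Dummy": 1} for c in ("Blue", "Red", "Green")]

combos = merge_on(sizes, colors, "Dummy")
print(len(combos))  # 6: each of the 2 size rows matched all 3 color rows
```

The row count is always the product of the two table sizes, which is why large tables get out of hand so quickly.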

**!BE CAREFUL!**

**Be careful if your tables have thousands of records.** **If you have 2 tables and each has 1000 rows that’s 1,000,000 rows!**

You could potentially freeze and crash Excel if you have too many combinations! I have seen database queries that should have a join but don’t! This creates an unintentional Cartesian product that can take many hours or days to run!

I learned this technique from **Power Query Academy** (Ken Puls & Miguel Escobar). **Disclaimer:** I’m a student and an affiliate.

Here is the **Excel file** that uses XLarium’s solution (seen below in comments section). His solution doesn’t require the Dummy column with the 1s. Thanks for sharing!

Until I can quickly apply a concept on a daily basis I don’t consider it to be part of my active skill set. It’s more like an awareness that something is possible. But now I can quickly use Power Query to create all combinations and it’s MUCH EASIER than a formula or vba solution. My Excel file includes a tedious formula solution. See this post for a **Pivot Table** solution.

Note that in this example we have 3 tables so we need 2 queries. With a limited number of tables Power Query is the fastest solution. It would be rare to need all combinations from many tables. If that’s necessary I would contact a vba programmer.

My name is Kevin Lehrbass. I’m a Data Analyst. I live in Markham Ontario Canada.

Away from data I sometimes visit museums. In May I visited an impressionist exhibit at Toronto’s **AGO**.

I should visit the **ROM’s Rembrandt exhibit** this summer!

Here you see me with Mondrian Art that I saw in New York last year (**only a nerd would create Mondrian art in Excel**).

**Sum cells that are found in both ranges!**

Cells found in both ranges are highlighted green (using conditional formatting).

**=SUM(C10:G24 E3:I14)**

Include both ranges with a space (NOT a comma) between them!

The space tells Excel we want the intersection of the ranges. Other functions (Min, Max) also work.
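To see which cells the space operator keeps, here is a small Python sketch that computes the overlap of two rectangular ranges. The helper names (`parse_range`, `intersect`, and so on) are hypothetical, and the sketch assumes each reference is written top-left:bottom-right.

```python
import re


def col_to_num(col):
    """Convert a column letter such as 'C' or 'AA' to a 1-based number."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n):
    """Convert a 1-based column number back to letters."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def parse_range(ref):
    """Split 'C10:G24' into (left_col, top_row, right_col, bottom_row)."""
    c1, r1, c2, r2 = re.fullmatch(r"([A-Z]+)(\d+):([A-Z]+)(\d+)", ref).groups()
    return col_to_num(c1), int(r1), col_to_num(c2), int(r2)


def intersect(ref_a, ref_b):
    """Return the overlapping range, or None if the ranges do not overlap."""
    a, b = parse_range(ref_a), parse_range(ref_b)
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    if left > right or top > bottom:
        return None
    return f"{num_to_col(left)}{top}:{num_to_col(right)}{bottom}"


overlap = intersect("C10:G24", "E3:I14")
print(overlap)  # E10:G14
```

So the SUM above only adds the cells in E10:G14. When the two ranges don't overlap at all, Excel returns a #NULL! error; the sketch returns `None` in that case.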

Sheet ‘CHALLENGE’ compares the easy solution with a tedious/complex solution.

**Sheet ‘MORE FUN!’ is proof that I’m an Excel nerd.** Use arrows to change range dimensions. I used formulas inside conditional formatting to create range borders and color overlapping cells green.

I learned this trick from **Paul McFedries’ book ‘Formulas and Functions’**

I’ve read 2003 & 2010 editions. This is the current edition.

If you are past the basics then I highly recommend this book! I’ve learned so much from this book and I review it often.

The beginning chapters include some essential basic tips and concepts. Further on you’ll find advanced concepts:

- Chapter 12 Working with Statistical Functions
- Chapter 16 Using Regression to Track Trends and Make Forecasts
- Chapter 17 Solving Complex Problems with Solver
- Chapter 19 Building Investment Formulas

Yes…I challenged someone a few years ago to solve this. I could see the smoke rising from the effort. The solution worked but was complex.

Upon seeing the easy solution I detected a “you got me” gaze followed by awkward silence. I won’t use this challenge again.

(Download my **Excel demo file**. This post dedicated to **Marcelo Ribeiro Simões**)

I normally advise people to include all relevant criteria inside formulas. This way, formulas serve their purpose independently. Filtering and hiding rows are a separate action for viewing the data that won’t interfere with your formulas.

There are exceptions when it’s helpful to adjust formula results by quickly hiding/filtering rows instead of constantly modifying formula syntax. Imagine a busy meeting with lots of questions.

How do we incorporate hidden and/or filtered rows into our formulas?

The common method is to use complex formulas like this:

=SUMPRODUCT((A7:A309>50)*(SUBTOTAL(103,OFFSET(A7,ROW(A7:A309)-MIN(ROW(A7:A309)),0))))

Power users can decipher this but most people can’t. **What if we could reduce the formula to:**

=COUNTIFS($A$7:$A$309,$AK4,$O$7:$O$309,1)

**How is this possible?**

The trick is to add this helper formula =--SUBTOTAL(103,A8) alongside the data-set. Column O uses the SUBTOTAL function to determine if the row is visible. 1 = visible, 0 = not visible. We then include the 1s and 0s inside formula conditions.
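The logic of the helper column can be sketched in Python. The rows, the `value` field and the function names below are invented for illustration; the `visible` flag plays the role of the 1s and 0s in column O.

```python
# Hypothetical rows: `value` stands in for column A, `visible` for helper column O.
rows = [
    {"value": 63, "visible": 1},
    {"value": 37, "visible": 1},
    {"value": 63, "visible": 0},  # hidden manually or filtered out
    {"value": 41, "visible": 1},
]


def count_visible_matches(rows, target):
    """Like COUNTIFS(values, target, helper, 1): match the value AND be visible."""
    return sum(1 for r in rows if r["value"] == target and r["visible"] == 1)


def count_all_matches(rows, target):
    """Like a traditional COUNTIFS that ignores row visibility."""
    return sum(1 for r in rows if r["value"] == target)


print(count_visible_matches(rows, 63))  # 1
print(count_all_matches(rows, 63))      # 2
```

Visibility becomes just one more condition, so the counting formula itself stays short.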

Helper columns increase file size but they simplify formula writing and auditing.

The data-set is called ‘Heart Disease UCI’ from Kaggle.com. See sheet ‘Counting Example’. A to N are original columns. We have 303 rows.

Subtotal formula is in column O. Initially subtotal shows 1 for all rows but once you manually hide a row or apply a filter non visible rows will change to 0. Watch formulas in cells A6 and O6 change!

Formulas with a green background change if they can no longer see relevant rows (some rows are already disqualified via formula criteria).

Columns AL to AO are traditional formulas that are not affected by non-visible rows. This lets you compare the results.

Sheet ‘Ranking Example’ demonstrates how hiding/filtering rows can affect ranking when using a subtotal helper column.

A downside of using filters is that you can’t see filter values. You have to remember the values that you’ve filtered columns to. **GOOD NEWS**: you can see filter values if you use

Obviously we can’t have 50 subtotal helper columns as this would increase the file size and create clutter. However 1 or maybe 2 subtotal helper columns can be extremely helpful if you want to calculate visible rows only! Take some time to experiment with this concept.

If you’re still with me there’s more good news! Excel’s AGGREGATE function goes above and beyond what SUBTOTAL function can do. It takes some time to learn it but it’s worth it!

Read this **intro from Microsoft** and then watch this **playlist of videos from Mike Girvin** (ExcelIsFun).

The main reason I like the subtotal helper column is to avoid crazy long complex formulas. Why so much pain? Just add a subtotal helper column and use filters and/or slicers!

(download my **Demo Excel File**)

**List 1** has 35 names (16 unique names)

**List 2** has 54 names (18 unique names)

(1) How many unique names in List 1? (2) How many unique names in List 2?

This formula counts unique names:

=SUMPRODUCT(1/COUNTIFS(G7:G41,G7:G41))
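Why does this work? Each name contributes 1 divided by the number of times it appears, so all copies of a name add up to exactly 1. Here is a minimal Python sketch of the same arithmetic (the sample names are made up, and `Fraction` is used only to keep the sum exact):

```python
from collections import Counter
from fractions import Fraction


def count_unique(names):
    """Mimic SUMPRODUCT(1/COUNTIFS(range, range))."""
    counts = Counter(names)  # how often each name occurs (the COUNTIFS part)
    # Each cell contributes 1/count, so every distinct name sums to exactly 1.
    return int(sum(Fraction(1, counts[name]) for name in names))


names = ["Ann", "Bob", "Ann", "Cy", "Bob", "Ann"]
print(count_unique(names))  # 3  (Ann: 3 * 1/3, Bob: 2 * 1/2, Cy: 1 * 1/1)
```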

Question 3 is tricky: **(3) h**

How should we solve Question 3?

- **quick and simple?**
- **power query?**
- **single array formula?**
- **helper formulas?**

Each solution has pros and cons. Let’s explore each one.

**quick and simple**

Why complicate things?

Copy paste List 1 names into column A. Use Remove Duplicates feature. Match function tells us if name is found in List 2. Count function provides final answer.

You would have to repeat Remove Duplicates step if the data changes.

**power query**

All work is done inside Power Query. Great for large datasets that change frequently. Power Query does have a learning curve but it’s an amazing tool.

Summary: load each list, remove duplicates from List1, merge List 1 & List 2. Export back to your sheet.

**single array formula**

If the answer must be dynamic (power query requires a refresh) and fit in a single cell consider this:

To enter this formula hold keys ‘Ctrl’ & ‘Shift’ & ‘Enter’.

Overview explanation:

The inner MATCH is: MATCH(G7:G41,G7:G41,0) It looks for all List 1 names inside List 1!

Compare inner results with counter values to replace duplicates with blanks: IF(MATCH(G7:G41,G7:G41,0)=ROW(G7:G41)-ROW(G7)+1,G7:G41,"")

Now the outer MATCH’s **lookup_value** contains unique List 1 names. We look for these in List 2 and COUNT gives us the total found.
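The steps described above can be traced in Python. This is only a sketch of the logic as explained here, not the Excel formula itself; the sample lists and the helper names `match` and `count_unique_found` are hypothetical, and the sample data is assumed to contain no blank entries.

```python
def match(value, lookup):
    """Like MATCH(value, lookup, 0): 1-based position of the first exact match.

    Returns None where Excel would return #N/A.
    """
    for pos, item in enumerate(lookup, start=1):
        if item == value:
            return pos
    return None


def count_unique_found(list1, list2):
    # Inner MATCH: first position of every List 1 name inside List 1.
    first_pos = [match(name, list1) for name in list1]
    # Keep a name only where its first position equals the running counter
    # (ROW(...)-ROW(first cell)+1); repeats become "".
    uniques = [
        name if pos == counter else ""
        for counter, (name, pos) in enumerate(zip(list1, first_pos), start=1)
    ]
    # Outer MATCH: look each remaining name up in List 2.
    found = [match(name, list2) for name in uniques]
    # COUNT only counts numbers, so misses (None here, #N/A in Excel) drop out.
    return sum(1 for pos in found if pos is not None)


list1 = ["Ann", "Bob", "Ann", "Cy", "Bob"]
list2 = ["Bob", "Dee", "Ann", "Bob"]
print(count_unique_found(list1, list2))  # 2  (Ann and Bob)
```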

**helper formulas**

If you want a live dynamic answer but not the complexity of solution 3 then consider this.

Column F has a simple counter, column I has this simple formula:

=IF(G7="","^^^^^",IF(MATCH(G7,$G$7:$G$41,0)=F7,G7,"^^^"))

Use either formula below (both read the column I helper) to get the final count:

=SUMPRODUCT(--ISNUMBER(MATCH($I$7:$I$60,$J$7:$J$60,0)))

or this array formula (requires Control Shift Enter….not just Enter)

=COUNT(MATCH($I$7:$I$60,$J$7:$J$60,0))

The **quick and simple** solution is great for the masses especially if it’s a one time question.

I’m becoming comfortable with **Power Query** so that solution was easy. It’s an amazing tool but let’s not forget that a Power Query solution is not fully dynamic. It requires a refresh. If building a multiple step model on top of the answer you would probably prefer one of the formula solutions.

**Array?** I love it but it is challenging for most Excel users to understand. This solution was the most fun to build as I had to be creative.

**Helper formulas**. Splitting the array logic out into steps makes it easier to audit for non array fanatics.

**What about you? What solution do you prefer? Do you have a different solution?**

**Robert H Gascon’s Idea!**

See Robert’s comment below. My ‘**single array formula**‘ solution is difficult to understand/audit. Robert suggests splitting the logic into two parts and creating a named range out of each part! It’s easier to audit this way and the final formula =UniqueCount doesn’t require Control Shift Enter.

I’ve updated my Excel file (see top of post). Go to Formulas/Name Manager. You’ll see named range ‘Unique1’. This is the lookup_value for the outer MATCH function. Named range ‘UniqueCount’ references ‘Unique1’. In cell M5 we just need to type this: =UniqueCount

Another benefit is that with careful planning you could reference the same base named range logic several times! It reminds me of writing database queries where various subsequent queries refer to the same base query. **Thanks Robert for the idea!!!**

How much of an Excel fanatic am I? I used VBA to create random Mondrian style art! (read this **post**)
