26 Jul 2020

## Awkward Data

Another awkward dataset. Should we re-arrange the data or just find a way to calculate what we need?

(get my Excel file)

### Awkward Data

A nice normalized dataset would have 4 columns: Category, Month, Letter, Value. But we have this:

Each month has a group of 3 columns. If we had more months this pattern would repeat.

### The Question

Add values in columns C, F, I for each month/letter combination. The answers look like this:

The key to solving this is finding the pattern in the awkward data layout.

### 3 ‘As Is’ Solutions

Here are 3 formula solutions that work on the data as is (no rearranging).

Solution 1

###### =SUMIFS(OFFSET(\$A\$3:\$A\$11,0,MATCH(\$I14,\$B\$2:\$G\$2,0)+2),OFFSET(\$A\$3:\$A\$11,0,MATCH(\$I14,\$B\$2:\$G\$2,0)+1),J\$13)

Long formula but I love OFFSET even if it’s a bit volatile.

Solution 2

###### =SUM(IF((\$B\$2:\$E\$2=\$I19)*(\$C\$3:\$F\$11=J\$18),\$D\$3:\$G\$11,””))

An array (requires control shift enter).

Solution 3

###### =SUMPRODUCT((\$B\$2:\$E\$2=\$I24)*(\$C\$3:\$F\$11=J\$18),\$D\$3:\$G\$11)

This is probably my favorite solution. Short and it’s a normal formula (non array).

Which formula is your favorite?

Do you have a better formula solution? (maybe using dynamic arrays?!)

### Rearranging The Data

Rearrange the data so we can create a pivot. It should look like this:

Power Query

Can you rearrange this data using Power Query?

Your queries would need to adjust if the dataset grows (more months). I’d love to see if it’s possible!

Formulas

Well, that’s what we had to do in the old days so you should suffer too!

My file has my solution. It’s backwards compatible to any Excel version (no dynamic arrays).

It’s good practice to normalize using formulas. Note: formulas are in the same sheet as the data. Not a good idea but it’s a demo and it’s easier to read like this. Now we can create a simple pivot table with this data đź™‚

VBA

Of course VBA could rearrange this data. But I’m too lazy to create all those loops right now!

My name is Kevin Lehrbass. I’m a data analyst from Canada.

I saw this awkward dataset just before covid crisis. Things have really changed…but there’s always been awkward data!

I play around with data to keep my mind busy and share ideas with others. I’m hoping that someone can solve this using Power Query. I gave up and went back to formulas.

1. Alan Sidman says:

For the situation shown and not taking into consideration for future months. I loaded the table and then created a reference table (duplicate) of the original. In the original table, I deleted any columns that were associated with February. In the second table I did the same for January. I changed column names in each so that they matched. I added a custom column with the data January in one and February in the other. I then appended the February to January which resulted in the following:
Month Category Column1 Column2
January Category1 c 0.56247919
January Category2 c 0.665416221
January Category3 v 0.317881541
January Category1 f 0.369318453
January Category2 g 0.838794826
January Category3 t 0.265604131
January Category1 c 0.034951919
January Category2 sa 0.59025869
January Category3 a 0.811550546
February Category1 c 0.188865382
February Category2 c 0.083025283
February Category3 v 0.648843625
February Category1 f 0.685798337
February Category2 g 0.633699237
February Category3 t 0.990748883
February Category1 c 0.433338457
February Category2 sa 0.344227568
February Category3 a 0.064277462

I trust this is what you were looking for. If you add additional months, then I am not sure of a way of adding it automatically to the query.

1. Kevin Lehrbass says:

Hi Alan,
You’ve accomplished more than I did that’s for sure! I can often find a power query solution but this time I gave up.
…I received an excel file from Daniel and it includes = Table.Split(#”Filled Down”,3) which I had never heard of before. I think I’ll write a separate post about that. Thanks for reading my blog Alan. I’m curious… how did you find out about it?
Cheers,
Kevin