Before you analyze your data it’s important to decide if you need to unpivot it. Why unpivot? What’s unpivot? Are there exceptions?
It’s easier to analyze data with Pivot Tables or formulas when data has a tabular structure like a database. Each thing has it’s own column, each row is a transaction (i.e. a sale).
This is a crosstab data layout. Great for reading as a final layout. Not ideal to build a model on top of it.
We have 3 parts: Department, Date, Number. Each part should be in a separate column.
How to Unpivot a Crosstab
A crosstab can be unpivoted using Excel’s Get & Transform (aka Power Query):
- Select any cell within your data
- On the ribbon click ‘Data’, in ‘Get & Transform Data’ section select ‘From Table/Range’
- Right click column header ‘Department’
- Select ‘Unpivot Other Columns’
- Double click column header ‘Attribute’, rename to ‘Date’
- Double click column header ‘Value’, rename to ‘Sales’
Export Back To Sheet
- In top left click ‘Close & Load’, ‘Close & Load To…’, and select a location.
The data should now look like this:
This is a database/pivot table friendly layout. Easily analyze this data with Pivot Tables or formulas.
Note: by default the output links to the original data. You could delete the query and keep the output.
When Not To Unpivot
Rare exception: hours have been spent building a model on top of unpivoted data. A deadline is pending so I can answer the questions with a few formulas. Example layout:
We really only have two pieces of data: Date and Number. I decided not to unpivot this layout (see my post) as I could answer the questions with just a few formulas 🙂
Could we leave the data entry layout as a crosstab and still pivot it? Yes! We could do this:
- use Power Query to unpivot this layout
- export the results to a new sheet
- add a Pivot Table (or formulas)
If the data changes or expands we’d have to refresh Power Query and the Pivot Table. Not a best practice but it works!
Preventing the need to unpivot data is ideal. If we have the opportunity early on we can explain the tabular data layout and why it’s important.
More Non-ideal Layouts
Crosstab isn’t the only awkward data layout. There are others including:
Single data-set split across sheets
A single data-set is chopped into pieces and split into several sheets. It’s probably done to make it easier to read…but then trying to analyze it as a single data-set is extremely difficult. Many try using 3D ranges and other circus tricks. A dataset should be kept together in one sheet.
An entire data-set is stacked in a single column. Instead of having various rows of data the rows are all rotated and stacked into this single column. Other software sometimes exports the data like this. Formulas or Power Query can be used to re-arrange this layout into a database or pivot table friendly layout. Read this post.
Mini Data Blocks
Sometimes a dataset is split into many mini datasets. Read this post to see how to pull it all together into a single dataset.
Other Non Tabular Datasets
My name is Kevin Lehrbass. I’ve been working as a Data Analyst since 2001.
I’ve learned how important it is to structure data before analyzing it. Yes there are many tricks (i.e. 3D ranges, array formulas, vba etc) to work around a poor data structure but starting with a proper data layout saves so much unnecessary work and stress.