Silly but a great way to practice VBA. Once I started playing around my curiosity took over. I tinkered around for a few hours while listening to music.

**Not** recommended as a Valentine’s Day gift unless he/she loves spreadsheets and/or programming.

Download my **Excel file** and click the robot to add a heart! Click it again!

I used the macro recorder to create basic code for:

- moving the robot
- adding a heart
- rotating a heart
- varying the heart color

I modified macro recorder code to create more powerful VBA. Examples:

The macro recorder created this:

Sub TESTcreateHeart()

' TESTcreateHeart Macro

'

ActiveSheet.Shapes.AddShape(msoShapeHeart, 847.5, 168.6, 72, 72).Select

End Sub

Changing the numbers inside the parentheses revealed the purpose of each argument:

- 847.5 lateral position
- 168.6 vertical position
- 72 width
- 72 height

I wanted to randomize the heart location and size. I added some variables (all numbers).

Dim lateral As Integer, vertical As Integer, heartsize As Integer

Next was assigning the random numbers to each variable. Heartsize is used for width & height.

lateral = (Rnd() * 500) + 70

vertical = (Rnd() * 175) + 45

heartsize = Rnd() * 45

Now the variable names replace the original hard-coded numbers:

ActiveSheet.Shapes.AddShape(msoShapeHeart, lateral, vertical, heartsize, heartsize).Select
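Outside Excel, the same randomization logic can be sketched in Python (an illustration, not the workbook's actual code; VBA's Rnd, like Python's random.random, returns a value in [0, 1), and the bounds come from the VBA above):

```python
import random

# Mirror the VBA randomization: Rnd() returns a float in [0, 1),
# so each variable lands in a predictable band.
lateral = random.random() * 500 + 70    # horizontal position: 70 up to 570
vertical = random.random() * 175 + 45   # vertical position: 45 up to 220
heartsize = random.random() * 45        # one value reused for width and height
```

(VBA rounds these to whole numbers when assigned to Integer variables; the Python sketch keeps them as floats.)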

The macro recorder created this:

Sub TEST_Heart_Color_and_Transparency()

' TEST_Heart_Color_and_Transparency Macro

ActiveSheet.Shapes.Range(Array("Heart 2")).Select

With Selection.ShapeRange.Fill

.Visible = msoTrue

.ForeColor.RGB = RGB(255, 0, 0)

.Transparency = 0.6999999881

.Solid

End With

End Sub

I created variables redc and transp and assigned values to them.

transp = Rnd()

redc = WorksheetFunction.RandBetween(150, 255)

I also added a random tilt for the heart and code to create 3D hearts.

The main macro is Create_A_Heart, found in module A_CreateHeart. Many actions happen inside this macro, but I also used Call to run macros in separate VBA modules. Example: move the robot left and right.

Call E_MoveRobot

This macro gets its variable values from individual cells (named ranges) in sheet Hearts. You can modify the numbers in column D!

I used named ranges instead of hard coded cell references in case I added rows or columns later on.

Sub E_MoveRobot()

Dim movebot As Integer, movebotleft As Integer, movebotright As Integer

movebotleft = Range("Move_robot_left_amount").Value

movebotright = Range("Move_robot_right_amount").Value

movebot = WorksheetFunction.RandBetween(movebotleft, movebotright)

ActiveSheet.Shapes.Range(Array("RedRobot")).Select

Selection.ShapeRange.IncrementLeft movebot

Range("MoveRobotValue").Value = movebot

End Sub
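The pick-a-random-offset idea reduces to one call. A Python sketch for illustration (the -30/30 bounds are made-up stand-ins for the named-range values):

```python
import random

def move_robot(move_left_amount, move_right_amount):
    # Like WorksheetFunction.RandBetween: a whole number between the two
    # bounds, inclusive. Negative nudges the robot left, positive right.
    return random.randint(move_left_amount, move_right_amount)

offset = move_robot(-30, 30)
```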

Finally, I added code to assign each heart’s variable values to cells in column J. Why? Why not.

Range("HeartCount").Value = Range("HeartCount").Value + 1

Range("HorizontalPosition").Value = lateral

Range("VerticalPosition").Value = vertical

Range("HeartSize").Value = heartsize

Range("MoveRobotTotal").Value = Range("MoveRobotValue").Value + Range("MoveRobotTotal").Value

I found these sites helpful for my VBA syntax questions:

- docs.microsoft.com/en-us/office/vba/api/excel.shapes.addshape
- docs.microsoft.com/en-us/office/vba/api/excel.shaperange.incrementrotation
- powerspreadsheets.com/excel-vba-range-object/
- excelmacromastery.com/excel-vba-range-cells/
- thespreadsheetguru.com/blog/how-to-keep-track-of-your-shapes-created-with-vba-code

My name is Kevin Lehrbass. I’m a Data Analyst.

I can still remember when I first discovered VBA in Excel. Life altering!!

These days I don’t use VBA regularly, but I try to keep my skills alive by building things.


It’s easier to analyze data with Pivot Tables or formulas when the data has a tabular structure like a database. Each field has its own column and each row is a transaction (e.g. a sale).

This is a crosstab data layout. Great for reading as a final layout. Not ideal to build a model on top of it.

We have 3 parts: Department, Date, Number. Each part should be in a separate column.

A crosstab can be unpivoted using Excel’s Get & Transform (aka Power Query):

**Load Data**

- Select any cell within your data
- On the ribbon click ‘Data’, in ‘Get & Transform Data’ section select ‘From Table/Range’

**Transformation Steps**

- Right click column header ‘Department’
- Select ‘Unpivot Other Columns’
- Double click column header ‘Attribute’, rename to ‘Date’
- Double click column header ‘Value’, rename to ‘Sales’
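Conceptually, the unpivot step turns every cell of the crosstab into its own row. Here is a rough Python sketch of that idea (the departments, dates and numbers are invented sample values, not data from the file):

```python
# A tiny crosstab: one row per Department, one column per Date.
crosstab = {
    "Toys":  {"Jan": 100, "Feb": 120},
    "Books": {"Jan": 80,  "Feb": 95},
}

# Unpivot: one (Department, Date, Sales) row per cell,
# like Power Query's 'Unpivot Other Columns'.
rows = [
    {"Department": dept, "Date": date, "Sales": sales}
    for dept, by_date in crosstab.items()
    for date, sales in by_date.items()
]
```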

**Export Back To Sheet**

- In top left click ‘Close & Load’, ‘Close & Load To…’, and select a location.

The data should now look like this:

**This is a database/pivot table friendly layout.** Easily analyze this data with Pivot Tables or formulas.

Note: by default the output links to the original data. You could delete the query and keep the output.

Download my **Excel file.** Or use this **Excel file Practice** to create the unpivot steps above.

Rare exception: hours have already been spent building a model on top of data that was never unpivoted, and a deadline is pending, so I answer the questions with a few formulas. Example layout:

We really only have two pieces of data: Date and Number. I decided not to unpivot this layout (see my **post**) as I could answer the questions with just a few formulas.

**Could we leave the data entry layout as a crosstab and still pivot it? Yes!** We could do this:

- use Power Query to unpivot this layout
- export the results to a new sheet
- add a Pivot Table (or formulas)

If the data changes or expands we’d have to refresh Power Query and the Pivot Table. Not a best practice but it works!

Preventing the need to unpivot data is ideal. If we have the opportunity early on we can explain the tabular data layout and why it’s important.

Crosstab isn’t the only awkward data layout. There are others including:

**Single data-set split across sheets**

A single data-set is chopped into pieces and split into several sheets. It’s probably done to make it easier to read…but then trying to analyze it as a single data-set is extremely difficult. Many try using 3D ranges and other circus tricks. A dataset should be kept together in one sheet.

**Stacked Data-set**

An entire data-set is stacked in a single column. Instead of having various rows of data the rows are all rotated and stacked into this single column. Other software sometimes exports the data like this. Formulas or Power Query can be used to re-arrange this layout into a database or pivot table friendly layout. Read this **post**.

**Mini Data Blocks**

Sometimes a dataset is split into many mini datasets. Read this **post** to see how to pull it all together into a single dataset.

**Other Non Tabular Datasets**

See my previous **post** with other non tabular layouts and my recent rearranging data **post**.

My name is Kevin Lehrbass. I’ve been working as a Data Analyst since 2001.

I’ve learned how important it is to structure data before analyzing it. Yes, there are many tricks (e.g. 3D ranges, array formulas, VBA, etc.) to work around a poor data structure, but starting with a proper data layout saves so much unnecessary work and stress.

In November of 2018 I visited Memphis. I visited the **Blues museum** on Main street and did a tour of Beale street including **BB King’s Blues Club**.

High level: (a) split ‘Clients Assigned’ names into columns, (b) unpivot those columns into rows.

This task is now incredibly easy thanks to Excel’s Power Query (Get & Transform) tool.

Get my **Excel file** and follow along.

**Load Data**

- Select any cell within your data
- On the ribbon click ‘Data’, in ‘Get & Transform Data’ section select ‘From Table/Range’

**Transformation Steps**

- Right click column header ‘Clients Assigned’, ‘Split column’, ‘By Delimiter’, select ‘Comma’, split at ‘Each occurrence…’

- Right click column header ‘SalesPerson’ and ‘Unpivot other columns’
- Right click column header ‘Attribute’ and ‘Remove’
- Double left click column header to rename ‘Value’ to ‘Clients Assigned’
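The split-then-unpivot steps above boil down to a simple idea: break each comma-delimited list apart and emit one row per client. A Python sketch with made-up names:

```python
raw = [
    {"SalesPerson": "Ann", "Clients Assigned": "Acme, Beta, Core"},
    {"SalesPerson": "Bob", "Clients Assigned": "Delta"},
]

# Split each list on commas and emit one row per client, mirroring
# 'Split Column by Delimiter' followed by 'Unpivot Other Columns'.
tidy = [
    {"SalesPerson": row["SalesPerson"], "Clients Assigned": client.strip()}
    for row in raw
    for client in row["Clients Assigned"].split(",")
]
```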

**Export Back To Sheet**

- Click ‘Close & Load’, ‘Close & Load To…’, and select a location.

Building a model on top of database layout data is much easier. However, there are some rare exceptions. If the task were simply to count the clients assigned to each salesperson use this formula:

=LEN(G5)-LEN(SUBSTITUTE(G5,",",""))

Or if it’s a table format use this formula:

=LEN([@[Clients Assigned]])-LEN(SUBSTITUTE([@[Clients Assigned]],",",""))
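The LEN/SUBSTITUTE trick works because removing every comma shortens the text by exactly the number of commas. The same idea in Python (the sample text is invented; note that the number of clients is the comma count plus one):

```python
def comma_count(cell_value):
    # LEN(G5) - LEN(SUBSTITUTE(G5, ",", "")): the length drops by one
    # for every comma removed.
    return len(cell_value) - len(cell_value.replace(",", ""))

clients = comma_count("Acme, Beta, Core") + 1  # two commas -> three clients
```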

In the ol’ days before Power Query, rearranging data was often painful. It’s possible to solve this with formulas but it’s so much work. There’s also a trick that combines Text to Columns with the Alt-D-P pivot trick. Just a few steps, but Power Query is still easier.

My name is Kevin Lehrbass. I’ve worked as a Data Analyst since 2001.

Power Query is a revolutionary tool! The basics are easy to learn but mastering advanced techniques takes practice. The vast majority of Excel users have no idea how valuable Power Query is.

Power Query is free! Starting with Excel 2016 it’s a built-in feature. Learn with me at **Power Query academy**.

The VLOOKUP vs INDEX/MATCH debate has jumped the shark. There are other things worth discussing.

Here’s a **BIG** one:

Whenever possible we should avoid using volatile functions

**Where are you on this scale?**

For years I’ve been cautious of overusing volatile functions (beware of INDIRECT!). Robert has convinced me to move further to the right on my scale. Why? I’ve used volatile OFFSET to create dynamic ranges when I could’ve used INDEX.

Volatile functions include: TODAY, NOW, INDIRECT, RAND, RANDBETWEEN, OFFSET, etc.

Non volatile functions only calculate when required. =SUM(A1:A10) will not recalculate unless one of the values in range A1:A10 changes. Volatile functions recalculate frequently. In-depth explanations: **Liam Bastick**, **Charles Williams**.

Sometimes we need to extract a subset of data from a large range. Examples include:

- Average 5th to 20th values in a column
- Sum a column from a list of columns
- Get minimum of the values from a row

We need to dynamically change the range, column or row so we can’t hard code the references.

It requires careful study and practice but it’s worth the effort. Let’s start with this:

=SUM(A2:A7) The SUM function adds the numbers in range A2:A7.

We want to change the range to A2:A5 or A2:A9 or A2:A7 without touching the formula.

=SUM(A2:INDEX(A1:A9,J13)) Let’s walk through this:

This part =SUM(A2: is familiar. SUM adds numbers. The range starts with cell A2.

…and now it gets weird

INDEX(A1:A9,J13) normally this returns a single number. If cell J13 is 5 then we get 328 (cell A5 value). **BUT we’re including the INDEX after the colon inside the SUM function**. What we get isn’t value 328 but a reference to cell A5 !

Dynamically our formula gives us =SUM(A2:A5) HUH? **Incredible!** **Life altering!**

Change cell J13 to 8 and it gives us =SUM(A2:A8) **Isn’t that cool!** We can also make A2 dynamic!
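If it helps to see the idea outside Excel: INDEX supplies the end of the range, so the summed span is just a slice whose end point is a variable. A Python sketch (the numbers are invented except 328 in A5, mentioned above):

```python
values = [205, 113, 87, 94, 328, 176, 250, 301, 62]  # A1:A9; values[4] is A5

def dynamic_sum(j):
    # SUM(A2:INDEX(A1:A9, j)): add rows 2 through j.
    # values[1] is A2; Python slices exclude the end, so stop at index j.
    return sum(values[1:j])
```

Changing the argument j changes the end of the range, just like changing cell J13.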

Download my **Excel file** to learn more. With practice you’ll feel comfortable with this approach!

I use the INDEX function a lot but not for dynamic ranges. I’ve used OFFSET as I was more comfortable with the syntax and because it was easier to explain OFFSET to the business people I help.

I should’ve incorporated the Reference Form INDEX technique into my skill-set years ago. Overusing OFFSET can increase calculation time (same for overusing array formulas or UDFs).

If there’s an alternative non volatile function then we should use it. Being aware of a concept is not enough. Now I’m putting in the time to learn it thoroughly.

- There are times when using a volatile function like NOW, TODAY or RAND is necessary.
- You have numbers in column A. You want a cumulative sum in column B. You also want to be able to easily delete a row without breaking the cumulative formula. See sheet (5) in my Excel file.

- Mike Girvin (my **initial exposure** to the technique)
- Robert H. Gascon (his **comment** convinced me)
- Microsoft Excel **documentation**

My name is Kevin Lehrbass. I’m a Data Analyst living in Markham Ontario Canada.

Although I’ve been helping others in Excel for almost 2 decades I’m still learning. Thanks to Robert H. Gascon for convincing me to implement the Index Reference Form syntax into my active skill-set.

It’s not enough to be aware of a concept. It takes time to learn it well enough to be able to apply it when there’s time pressure at work. Index Reference Form is definitely an odd bird but because it’s non volatile it saves calculation time and is well worth learning.


(follow along with my **Excel file**.)

I had to concentrate…then I remembered what it does.

**=SUMPRODUCT(MID(G2,{1;2;3},1)*1)**

If you understand this formula quickly then great but if not don’t worry…let’s audit it!

**=MID(G2,{1;2;3},1)**

Normally the MID function extracts character(s) from within a cell like this: **=MID(G2,3,1)**

MID looks at cell **G2**, goes to the **3rd** position and extracts **1** character. If cell G2 has 789 MID would extract “9”.

But we’ve got this: **MID(G2,{1;2;3},1)**. What does the **{1;2;3}** do?

**{1;2;3}** is a magical array constant! It extracts the 1st, 2nd and 3rd characters as individual values.

Inside our formula in cell H2 highlight **MID(G2,{1;2;3},1)** and press F9 key (on laptop hold Fn key and press F9).

You’ll see this: **{"7";"8";"9"}** (press the Escape key to revert to the original formula).

Each number has been extracted individually. It’s not “789” but rather “7” and “8” and “9”.

The double quotes indicate that the numbers are in text format (MID is a text function).

The ***1** multiplies each text number by 1, converting it to a real number.

Finally, the SUMPRODUCT function adds the 7, 8 and 9. That’s why we get 24.

*The purpose of the formula was to add all the digits in the cell together!*
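Stripped of Excel syntax, the whole formula is just “treat each character as a number and add them up”. A Python equivalent (which also happens to handle the variable-length case covered next):

```python
def digit_sum(cell_text):
    # SUMPRODUCT(MID(...)*1): take each character, coerce the text
    # digit to a real number, then add them all together.
    return sum(int(ch) for ch in cell_text)

digit_sum("789")  # 7 + 8 + 9 = 24
```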

We want to add all the digits in a cell. *Below we see values of different lengths.*

If this happens we have two options:

- take a simpler approach
- make our formula smarter (two variations)

Use helper columns to individually extract and convert each number and then add them up.

As it won’t always be 3 characters we’ll change this part **{1;2;3}** into this:

**ROW(INDIRECT("1:"&LEN(G12)))**

This creates a dynamic array constant for whatever length we need!

Cell G12 has 93782 (5 digits) so our fancy array maker creates **{1;2;3;4;5}**

Isn’t that cool? So our smarter formula is:

**=SUMPRODUCT(MID(G12,ROW(INDIRECT("1:"&LEN(G12))),1)*1)**

This solution was suggested by Robert H. Gascon (see his comment below!)

As it won’t always be 3 characters we’ll change this part **{1;2;3}** into this:

**ROW(A$1:INDEX(A:A,LEN(G22)))**

This creates the exact same dynamic array constant as 2a except it’s NOT volatile (no INDIRECT).

Brilliant! Robert’s formula solution:

**=SUMPRODUCT(--MID(G22,ROW(A$1:INDEX(A:A,LEN(G22))),1))**

Even though the formula solves the more complex case (variable length) we have to be careful.

Why? Because using complex formulas with large datasets can cause Excel to calculate slowly and possibly crash your Excel file. Our 2a complex solution includes the volatile INDIRECT function (recalculates frequently & slowly). I’ll be honest: I’m not sure how much faster the 2b solution is. It’s not using any volatile functions so it’s faster, but it still uses 5 functions.

It takes a lot of practice to get a feel for when you can use a complex formula and when it’s better to simplify. Maybe it’s time to test these two solutions using **FastExcel addin** by Charles Williams!

My name is Kevin Lehrbass. I’m a Data Analyst from Markham Ontario Canada.

Sometimes I write down formulas for future reference. I don’t always have time to create a sample file. In this case, without the data, I had to audit the formula to remember what it does.

The final dynamic formulas:

**=SUMPRODUCT(MID(G12,ROW(INDIRECT("1:"&LEN(G12))),1)*1)**

and suggested formula from Robert:

**=SUMPRODUCT(--MID(G22,ROW(A$1:INDEX(A:A,LEN(G22))),1))**

These are **brilliant**, **complex** and **dangerous**. **Brilliant:** you can be so creative in Excel. **Complex:** most people won’t understand what it does. **Dangerous:** they could be slow to calculate.

Experience teaches us when to use this approach and when to use a different approach.


Linguistics: words in two languages that look or sound similar, but differ significantly in meaning.

Chess: an idea, method, or concept that is valid in one type of position, but that a player mistakenly applies in another.

Another interesting quote from **Daniel Naroditsky (Chesslife)** is:

…a crucial part of endgame mastery consists in the ability to determine whether a certain technique is applicable in a given position.

**When creating models in Excel it’s important to know when to apply a certain concept and when not to. When incorrectly applied it cripples calculation speed of a model even if the end answer is correct.** Let’s look at some examples.

A common mistake I’ve seen is using an expanding range with large data-sets.

**What Is An Expanding Range?**

Here are two examples:

=COUNTIF($B$4:B4,B4) when dragged down it identifies the first occurrence of a value.

=IFERROR(VLOOKUP(B4,$B$3:C3,2,FALSE),SUM(MAX($C$3:C3),1)) a unique sequence number is created when a new value appears.

Notice that the COUNTIF, VLOOKUP, and MAX functions have partially locked ranges.

For example range $B$4:B4 when dragged down expands to: $B$4:B5 $B$4:B6 $B$4:B7 etc.

**Is An Expanding Range Bad?**

In small data-sets an expanding range can be a great solution! However, most users don’t realize that when used in a large data-set it can be very slow to calculate and possibly crash Excel.

**What is a large data-set?**

20 years ago a large data-set would be 10000 rows. That’s not a large data-set today as Microsoft has improved Excel significantly. Nevertheless, using an expanding range with 900000 rows will cause Excel to struggle.
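To see why expanding ranges hurt at scale, compare the two approaches below in Python. This is a sketch of the cost model, not Excel's actual engine: the expanding version rescans everything above the current row (roughly n²/2 comparisons for n rows), while a single pass that remembers what it has seen does the same job in linear time.

```python
def first_occurrence_expanding(values):
    # Like dragging =COUNTIF($B$4:B4, B4) down: every row rescans
    # the entire expanding range above it.
    return [values[:i].count(v) == 0 for i, v in enumerate(values)]

def first_occurrence_linear(values):
    # One pass, remembering what we've already seen: same flags, linear time.
    seen, flags = set(), []
    for v in values:
        flags.append(v not in seen)
        seen.add(v)
    return flags
```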

**Expanding Range Solution!**

**Chandoo‘s** unique sequence number solution uses an expanding range. It works great with a small data-set. For larger data-sets consider this **lighter solution**. Despite more steps it’s faster.

Download my **Excel file** and follow along.

**Scenario 1** flag the first occurrence of each value. The data-set is large enough that an expanding COUNTIF range would be slow.

**Solution 1** my solution (get the file above) compares counter and MATCH function values to flag the first occurrence of each value.

Method A (column D) flags each new value with a 1. Method B (column F) uses TRUE.

**Scenario 2** overusing the versatile but incredibly slow INDIRECT function to consolidate data that’s spread across sheets.

**Solution 2** use Power Query to consolidate the data (or a quick copy/paste for a one-time task). Educate Excel users to always build a data-set in a single sheet. It eliminates the cleanup and makes the analysis so much easier.

**Scenario 3** dragging a heavy array formula down thousands of rows in a large data-set.

**Solution 3** consider using lighter helper columns to do the same thing.

**Scenario 4** 500,000 VLOOKUP functions based on the same lookup value! (Nothing wrong with VLOOKUP when used sparingly.)

**Solution 4** one column with a MATCH function followed by INDEX functions. Much more efficient. Note: there are limits in Excel. When you have a large amount of data consider Excel’s Power Pivot, Microsoft’s Power BI or SQL Server (a database).

**Scenario 5** not considering Power Query.

**Solution 5** Power Query!!! This applies to thousands of cases!

**Scenario 6** not considering VBA.

**Solution 6** A couple of lines of VBA code could save hours of work!

**What false friends have you encountered in Excel?**

My name is Kevin Lehrbass. I’m a Data Analyst from Markham Ontario Canada.

This was me at Manhattan’s Central Park chess club in March of 2018. They also have an indoor area.

When I was 14 I joined my high school’s chess club. There was a pyramid ranking system. The top 4 players would play other high schools. I started on the very bottom row. I fought my way to 4th position and onto the chess team. I briefly (funny story) had the top rank!

Find something that interests you, find the right training, practice a ton and anything is possible! I’ve applied this concept to my career in data.


When Joe isn’t running **Internet Kent** (his ISP) he’s often participating in **Tough Mudder** competitions throughout the U.S. and Canada. He has completed many Tough Mudders and he’s also an **ambassador**. As you can see above, Joe is known for his many painted faces.

To keep in shape he has an incredible workout routine. Running is a big part of that. How does Joe know if he’s on pace for a good year of running? That’s where I came in handy.

Joe enters his monthly data in a spreadsheet. How can Joe…

- easily sum his Year-to-Date totals?
- compare to previous years?
- visualize his running stats?

Joe doesn’t want to get nerdy with the data. He just wants to enter a few numbers at the end of the month and quickly study the results. That’s ok Joe. I’ll do the nerdy stuff while you play in the mud.

Download **Joe’s Running Tracker**. Download a blank **Running Tracker**.

**Set-up** A couple of simple selections to set it up.

**Instructions** Screenshot indicates input cells, formula cells and conditional formatting.

**Running Tracker** Enter data in this sheet. Formulas calculate stats. Conditional formatting highlights the data. Now you can get in shape like Joe and participate in Tough Mudders!

**Other Sheets** I created sheets ‘Chart’, ‘YTD Stats’, ‘YTD Chart’ to help Joe visualize and study his data.

Joe has been tracking his running stats since 2013. **What do his running stats tell us?**

Joe’s monthly running totals.

Dark green highlights top months across all years.

Dark orange highlights lowest months.

Here we see monthly km averages.

Joe runs more in the nice weather.

Lots of stats here! **Yearly Totals** is obvious. **Rank Yearly Totals** is also clear.

**Km needed to reach 1st, 2nd, 3rd** Important numbers for Joe! It looks like Joe’s 2018 totals have a chance of pulling into 3rd place! Run, Joe, run! There are only a few days until the new year!

**Total Runs** Joe enters these numbers and then it’s easy to calculate **Average Km per run**.

At the bottom Joe can study his Year-to-Date totals! **Totals YTD Jan to Nov** allows Joe to quickly compare this year to previous years at the same point in time. **Rank Totals YTD Jan to Nov** ranks based on the same number of completed months.

I also created three additional sheets with charts and Year to Date stats for Joe.

When Joe is finally done running for the year he can curl up with a healthy smoothie and enter his December running total in Excel!

Joe ran 208.15 km in December! Enough to reach 3rd place! Congrats Joe! Here is the updated **Excel file (2018 completed)**.

My name is Kevin Lehrbass. I’m a Data Analyst. I live in Markham Ontario Canada.

While Joe excels at running, I just Excel.

Microsoft Excel is my favorite data tool.


Here is a summary of their methodology (or read the **article**)

In a business world that often seems obsessed with today’s stock price and this quarter’s numbers, our ranking takes the long view: It’s based primarily on financial returns over each CEO’s entire tenure—and because these CEOs have been successful, many have enjoyed a long run in the job. (CEOs on the list have been in the position for an average of 16 years, versus an average in 2017 of 7.2 years for S&P 500 CEOs.) To calculate the final rankings, we also factor in each company’s rating on environmental, social, and governance (ESG) issues.

(1) FINANCIAL 80%, (2) SUSTAINALYTICS 10%, (3) CSRHUB 10%

The financial metric consists of an average of three metrics (country adjusted total shareholder return, industry adjusted TSR, change in market capitalization).

Non-financial metrics were split between Sustainalytics and CSRHub. From my understanding, both relate to environmental, social and governance factors (known as ESG).

Here’s what I’ll do in this post:

- recreate the existing ranking using their raw data
- test the sensitivity of the category weights
- use Excel formulas & features to summarize results

I was pleasantly surprised when I read:

Download the Data behind the Ranking

**My Excel file** allows you to change the weighting easily plus some nerdy stuff. Here is the original **link** where you can find the data.

In my file (sheet ‘DATA’ column W) this formula recreates the final value:

**=(T3*0.8)+(U3*0.1)+(V3*0.1)**

A simple rank of above value in column X confirms the original hard coded rank in column A.
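The weighted score is a plain linear blend, so it is easy to reproduce and re-weight outside Excel too. A Python sketch (the three metric values are invented examples, not figures from the HBR data):

```python
def final_value(financial, sustainalytics, csrhub, weights=(0.8, 0.1, 0.1)):
    # =(T3*0.8)+(U3*0.1)+(V3*0.1), with the weights made adjustable.
    wf, ws, wc = weights
    return financial * wf + sustainalytics * ws + csrhub * wc

original = final_value(10, 50, 40)                        # 80/10/10 blend
re_weighted = final_value(10, 50, 40, (0.5, 0.25, 0.25))  # heavier ESG weight
```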

In sheet ‘Dashboard’ change category weightings in cells B18, C18, D18. Each cell is a named range.

Column Z in sheet ‘DATA’ has this formula for the Adjustable Weighting Final Value:

**=(T3*FINANCIAL_RANK)+(U3*SUSTAINALYTICS_RANK)+(V3*CSRHUB_RANK)**

Column AA has the modified rank. **Look at the visual in column AB to compare the original rank with the modified rank.**

Scrolling down column AB shows you the rank gain/loss difference. There are minor changes in the top 12 but further down we see a fair amount of change.

Below we see that correlation is high for value and value rank based on original weight (80% 10% 10%) versus modified weight (70% 15% 15%).

It means that the two columns of rank values are very similar. However, if we modify weighted values to 50% 25% 25% we see the correlation values dipping to 0.60 and 0.65.

Changing the original 80% 10% 10% weighting to 70% 15% 15% has a big impact on several CEOs.

Top 10 Loss has a similar result. For example: Jeff Bezos drops from rank 68 all the way down to 97.

I found several stats to be interesting/concerning:

- 44% of these CEOs have no MBA or engineering degree
- Average age of these CEOs is 60
- Only 3% of these CEOs are women

Harvard, you had me at “Excel file”

My name is Kevin Lehrbass and I’m a Data Analyst. I live in Markham Ontario Canada.

I love working with data in Microsoft Excel. Sure, I can write SQL statements and that’s fun too but Excel is my favorite.

Whenever I hear about a top X for cities, MBA schools, CEOs, etc I’m always intrigued to learn about the methodology and play with the numbers.

This post’s source link **https://hbr.org/2018/11/the-best-performing-ceos-in-the-world-2018**

It’s based on the proposition that labor hours decrease in a definite pattern as labor operations are repeated….based on the statistical finding that as cumulative production doubles, cumulative average time required per unit will be reduced by some constant percentage, ranging typically from 10% to 20%

This quote is from the book “SCHAUM’s BUSINESS FORMULAS”

In other words **The more you learn the easier it gets.**

Well…kind of. More on this later.

In this **Excel file** I converted the formal equation into a formula:

**=$F$24*($F$3)^(LN(E25)/LN(2))** this formula can be described as:

**=Unit Time Hours*(Learning Curve Percent)^(LN(Unit Value)/LN(2))**

- **Unit Time Hours** total hours to complete the first unit.
- **Learning Curve Percent** or the improvement rate.
- **^(LN(Unit Value)/LN(2))** calculates like compound interest.

See columns E and F. At the top I used a simple method to reduce the value above by the same learning curve percent. Below that I used the formula described above.
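The learning-curve formula translates directly into Python. In this sketch, the 100-hour first unit and 80% curve are example inputs, not figures from my file:

```python
import math

def unit_time(unit_number, first_unit_hours, learning_curve_pct):
    # =Unit Time Hours * Learning Curve Percent ^ (LN(n) / LN(2)):
    # every doubling of cumulative output multiplies the cumulative
    # average time per unit by the learning-curve percent.
    exponent = math.log(unit_number) / math.log(2)
    return first_unit_hours * learning_curve_pct ** exponent
```

With a 100-hour first unit and an 80% curve, unit 2 averages 80 hours and unit 4 averages 64 hours: each doubling cuts the average by 20%.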

- The Learning Curve is used for economics & accounting. Can we apply it to Excel?
- When learning Excel what influences (a) Learning Curve Percent, (b) Unit Time Hours?
- Can we reduce these components down to simple terms?

Let’s consider the following:

- **Learning Curve Percent** = quality of the training material + necessity + curiosity
- **Unit Time Hours** = effort

High quality training definitely accelerates learning.

For me, necessity & curiosity are incredible factors in the learning process.

By necessity I learned T-SQL in my last job. I enjoyed it but had to learn quickly and stay focused.

I’ve learned most of my VBA due to curiosity. The “*I wonder if it’s possible to…*” has been a huge factor.

Effort (grit) is an incredible factor. Without it nothing else matters.

If we can decrease our learning curve percent we’ll learn faster!

*How can we do that?*

- Find the material that works for you
- Practice. Build stuff (necessity helps!)
- Get help (don’t forget what you learn!)
- Repeat & Improve (it gets easier)

If we don’t improve the percent will remain close to 1. See sheet ‘Learning Curve Percent’.

My name is Kevin Lehrbass. I’m a Data Analyst. I live in Markham Ontario Canada.

My Data Learning Curve started back in the late 1990s. I saw someone working in Excel and it intrigued me. I read a book and took a course. Then I read the Excel 2000 Bible by John Walkenbach cover to cover! In 2001 I started my first data job, learned SQL, took more courses, read more books… and the rest is history.

But…we have to keep using our skills or we’ll forget. My T-SQL (and PL-SQL) skills have dropped off because I haven’t used it in 10 years (I still use SQL). My Excel and DAX formula knowledge have increased. I still play around with VBA and it does come in handy but it’s not an essential part of my current job.


I was sent an interesting challenge recently to **sort a list of names and addresses**. This doesn’t sound too difficult a task, right?

*Unfortunately, the list of names and addresses is in the format below. And has hundreds of them.*

My first thought was to create a macro. Ideas were running through my head immediately about how to tackle it. I was confident it would be easy with VBA but I didn’t want to limit myself to VBA.

In this post I will explain two methods to achieve our goal of sorting the list by name. One using Excel VBA, and another using Power Query.

**Raw data file.xlsx** (recreate the solution) or get Alan’s Power Query solution **Sorting challenge.xlsx**.

It is nice to have multiple options at our disposal. I used to rely so much on VBA. As soon as a complex task came along, I would dive straight into it. But with Power Query, we can set up a process that others may feel more comfortable with and may be easier to edit in the future.

Once the Power Query process is set up. It can be refreshed at the click of a button in the future if more addresses are added, or information changed.

Select the range of cells and click **Data** > **From Table/Range** (the wording of this step may differ depending on your version of Excel). Check that the range is correct, and that our range does contain headers. Click Ok.

The range opens in the Power Query Editor. We will now step through some processes to transform this data into something we can sort.

We will start by removing the blank rows. We can do this by filtering out the null values. Click the Filter arrow in the *Addresses* column and uncheck the Null box.

The addresses are the main problem here because they spill over 3 rows. We need to get these into one row along with the name to sort the list. That is our goal now.

Let’s start by filling out the Names column. Select the *Name* column. On the ribbon click **Transform** > **Fill** and **Down**.

Now to get the addresses into columns we need to understand the pattern in our data. In this example it repeats every 3 rows. If we can group these repeats, then we can pivot them.

Let’s start by inserting an Index column. Click **Add Column** tab and then **Index**. A new column is added, and the index values begin from 0.

We will now add a modulo column to group the repeating values. Select the *Index* column. Click the **Add Column** tab > **Standard** > **Modulo**. Enter 3 as the value.

The modulo calculation categorizes each repeating value for us.
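The Index-plus-Modulo trick is worth pausing on. In Python terms (with invented sample addresses): the index numbers every row, the modulo tags each row's position inside its 3-row record, and grouping by record number rebuilds one row per address.

```python
addresses = ["12 High St", "Canterbury", "CT1 2AB",     # record 0
             "34 Low Rd",  "Dover",      "CT16 3CD"]    # record 1

# Index column plus Modulo 3: (index, index % 3, value). The modulo
# result is exactly what the Pivot step spreads into columns 0, 1 and 2.
tagged = [(i, i % 3, value) for i, value in enumerate(addresses)]

# Grouping rows by record number (index // 3) yields one row per address.
records = [addresses[i:i + 3] for i in range(0, len(addresses), 3)]
```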

Let’s now get them into columns. Select the *Modulo* column. Click the **Pivot Column** button on the **Transform** tab. Select the *Addresses* column for the **Values Column**. We do not want any aggregate calculations, so click Advanced Options and select **Don’t Aggregate**.

This is really beginning to take shape. Now let’s fill in some blank cells. Select the 0 column (Address column) and click **Transform** > **Fill** > **Down**. Then select column 2 (postcode column) and click **Transform** > **Fill** > **Up**.
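Fill Up is just the same idea as Fill Down run in the opposite direction. A small Python sketch with hypothetical data:

```python
def fill_up(values):
    """Copy the nearest non-blank value below into each blank (None) cell above it."""
    out, nxt = [], None
    for v in reversed(values):   # walk bottom-to-top
        if v is not None:
            nxt = v
        out.append(nxt)
    out.reverse()                # restore the original order
    return out

print(fill_up([None, None, "N1 1AA"]))  # ['N1 1AA', 'N1 1AA', 'N1 1AA']
```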

Let’s reduce the list to one row for each record of data. Click the filter arrow for column 1 and uncheck the *Null* box.

The *Index* column is no longer needed. Right click the *Index* column and select **Remove**.

The 3 column headers for the address details need to be renamed with something more meaningful than 0, 1 and 2.

Double click each header and enter an appropriate name. *Address*, *City* and *Postcode* were used in my example below.

You can see that Power Query has already sorted the list automatically. However, I would like to add an explicit sort step; I feel better with it included.

Select the *Name* column and click the **Sort Ascending** button on the **Home** tab.

All of the steps that we have undertaken are listed in the Applied Steps pane. We can use this pane to remove or edit steps of the process later.

We can also name the query here, and that is important. Click in the **Name** box and enter *SortList* as the query name.

We’re done! Let’s put the query results back into a sheet. Click **File** (top left) > **Close & Load To** > **New Worksheet**.

With VBA, we have many approaches to re-shaping this data and sorting it by name. Here is the method that I used, along with an explanation of the key parts of the macro code.

```vba
Sub ComplexSortProblem()

    Dim ColumnOffset As Long
    Dim RowNum As Long
    Dim LastRow As Long
    Dim i As Long
    Dim TotalRows As Long

    LastRow = Application.WorksheetFunction.CountA(Range("A:A"))
    TotalRows = Application.WorksheetFunction.CountA(Range("B:B")) + LastRow - 2
    RowNum = 2

    Range("C1").Value = "City"
    Range("D1").Value = "Postcode"
    Range("C1:D1").Font.Bold = True

    For i = 2 To TotalRows
        If Cells(RowNum, 2).Value = "" Then
            Rows(RowNum).Delete
            ColumnOffset = 0
        Else
            If ColumnOffset >= 1 Then
                Cells(RowNum - 1, 2 + ColumnOffset).Value = Cells(RowNum, 2).Value
                Rows(RowNum).Delete
            Else
                RowNum = RowNum + 1
            End If
            ColumnOffset = ColumnOffset + 1
        End If
    Next i

    ActiveWorkbook.Worksheets("Sheet1").Sort.SortFields.Add Key:=Range("A2:A" & LastRow), _
        SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal

    With ActiveWorkbook.Worksheets("Sheet1").Sort
        .SetRange Range("A1:D" & LastRow)
        .Header = xlYes
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With

End Sub
```

I used two sets of variables. One set for the shaping of the data and another for the loop.

The *i* and *TotalRows* variables were used solely for the loop. I calculated the number of loop iterations by counting the non-blank cells in column B, adding the non-blank cells from column A, and subtracting 2 (there are 2 fewer blank rows in B than there are entries in A).
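As a quick sanity check of that arithmetic, assume a hypothetical sheet with 3 records, each a 3-line address in column B, with a single blank row separating records:

```python
n = 3                        # hypothetical number of address records
count_a = n + 1              # CountA on column A: header plus one name per record
count_b = 3 * n + 1          # CountA on column B: header plus three address lines per record
total_rows = count_b + count_a - 2
data_rows = 3 * n + (n - 1)  # address lines plus the blank separator rows between records
print(total_rows)            # 12 -> the loop runs i = 2 To 12, i.e. 11 passes, one per data row
```

The `- 2` cancels the two header cells and the missing trailing separator, so the loop count always lines up with the number of data rows below the header.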

The variables *RowNum*, *ColumnOffset* and *LastRow* were used to actually re-shape the data.

*RowNum* tracked the current row through the process, only being increased when a row was not deleted.

*ColumnOffset* was used to reference a column to the right for the second and third address parts.

*LastRow* is a non-blank count on column A. This was used when sorting the data at the end to have a dynamic last-row identifier (**view other smart ways to find the last row in Excel VBA**).

A For loop runs the process through a predetermined number of iterations (the number stored in the *TotalRows* variable).


If statements are used to test the status of the cell in column B and apply the correct behaviour. If the cell is empty, the row is deleted and the *ColumnOffset* variable is reset, because it means an address has ended.

If the *ColumnOffset* variable is 1 or greater, then we are in the middle of an address, so the data is moved into that column offset and the row is deleted. If the cell is not empty and we are not in the middle of an address, the *RowNum* variable is increased.
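The same branching logic can be sketched in Python with hypothetical data, collapsing each address onto its name's row just as the macro does. Rows are modelled as (name, detail) pairs, with None standing in for a blank cell:

```python
def reshape(rows):
    """Collapse 3-line addresses onto one row per name, mirroring the macro's If logic."""
    out, offset = [], 0
    for name, detail in rows:
        if detail is None:        # blank cell in column B: the address has ended
            offset = 0            # drop the separator row and reset the offset
        elif offset >= 1:         # middle of an address: shift the part up onto the record
            out[-1].append(detail)
            offset += 1
        else:                     # first address line: keep the row as a new record
            out.append([name, detail])
            offset += 1
    return out

sheet = [("Alan", "10 High St"), (None, "London"), (None, "N1 1AA"),
         (None, None),
         ("Kevin", "22 Low Rd"), (None, "Leeds"), (None, "LS1 2BB")]
print(reshape(sheet))
```

Stepping through this with a debugger (or the real macro with F8 in the VBA editor) is the quickest way to see how the offset counter drives the three behaviours.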

That is a brief explanation of key stages of the macro. Step through the code in an example to understand it better. Maybe you can improve upon it.

There are advantages to each approach, and the path you take depends to an extent on where you feel more comfortable.

- Power Query is only available from Excel 2010 onwards. It is an add-in called Power Query in Excel 2010 and 2013, and is found on the Data tab as Get & Transform in Excel 2016 and later.
- It will be easier, though, for most Excel users to modify Power Query steps than to edit VBA code, which is a more specialized skill set.
- They are both flexible, but VBA offers more flexibility due to how vast the language is.
- Macro security is a concern when using VBA. Macros must be enabled before they can be executed.
- Both are one-button refreshable options, so there is a tie for speed of application.

Alan is the founder of **Computergaga** (https://www.computergaga.com). Alan also has a popular **YouTube channel**. When he is not talking about Excel he likes to spend his time running, hiking and spending time with his two children.

Read Alan’s **post** about importing multiple Excel files from a folder into a single Excel file!

My name is Kevin Lehrbass. I’m a Data Analyst living in Markham Ontario Canada.

**Power Query? VBA?** I love VBA, but in this particular case I would definitely go with Alan’s Power Query solution. It’s so much easier to use Power Query for these types of data gymnastics.

**Improve Your Excel Skills!** Check out my recommended **Excel Training**!
