28 Nov 2018

In this guest post Alan Murray (computergaga) compares two solutions to a challenging data sorting issue. I know my preference, what about you?

 

The Challenge

I was sent an interesting challenge recently to sort a list of names and addresses. This doesn’t sound too difficult a task, right?

Unfortunately, the list of names and addresses is in the format below. And has hundreds of them.

 

 

 

 

 

 

 

 

 

 

My first thought was to create a macro. Ideas were running through my head immediately about how to tackle it. I was confident it would be easy with VBA but I didn’t want to limit myself to VBA.

In this post I will explain two methods to achieve our goal of sorting the list by name. One using Excel VBA, and another using Power Query.

 

 

The Data

Raw data file.xlsx (recreate the solution) or get Alan’s Power Query solution  Sorting challenge.xlsx.

 

 

Power Query Solution

It is nice to have multiple options at our disposal. I used to rely so much on VBA. As soon as a complex task came along, I would dive straight into it. But with Power Query, we can set up a process that others may feel more comfortable with and may be easier to edit in the future.

Once the Power Query process is set up. It can be refreshed at the click of a button in the future if more addresses are added, or information changed.

 

Load Data Into Power Query

Select the range of cells and click Data > From Table/Range (the wording of this step may differ depending on your version of Excel). Check that the range is correct, and that our range does contain headers. Click Ok.

 

 

 

 

 

 

The range opens in the Power Query Editor. We will now step through some processes to transform this data into something we can sort.

 

Remove Blank Rows

We will start by removing the blank rows. We can do this by filtering out the null values. Click the Filter arrow in the Addresses column and uncheck the Null box.

 

3 Rows Into 1

The addresses are the main problem here because they spill over 3 rows. We need to get these into one row along with the name to sort the list. That is our goal now.

Let’s start by filling out the Names column. Select the Name column. On the ribbon click Transform > Fill and Down.

 

 

 

 

 

 

 

 

 

 

 

What’s The Pattern?

Now to get the addresses into column we need to understand the pattern in our data. In this example it repeats every 3 rows. If we can group these repeats, then we can pivot them.

Let’s start by inserting an Index column. Click Add Column tab and then Index. A new column is added, and the index values begin from 0.

We will now add a modulo column to group the repeating values. Select the Index column. Click the Add Column tab > Standard > Modulo. Enter 3 as the value.

The modulo calculation categorizes each repeating value for us.

 

 

 

 

 

 

 

Rows Into Columns

Let’s now get them into columns. Select the Modulo column. Click the Pivot Column button on the Transform tab. Select the Addresses column for the Values Column. We do not want any aggregate calculations, so click Advanced Options and select Don’t Aggregate.

 

 

 

 

 

 

 

 

Fill In Blank Cells

This is really beginning to take shape. Now let’s fill in some blank cells. Select the 0 column (Address column) and click Transform > Fill > Down. Then select column 2 (postcode column) and click Transform > Fill > Up.

 

 

 

 

 

 

 

 

 

 

 

Reduce To One Row

Let’s reduce the list to one row for each record of data. Click the filter arrow for column 1 and uncheck the Null box.

 

 

 

 

 

 

The Index column is no longer needed. Right click the Index column and select Remove.

 

Rename Column Headers

The 3 column headers for the address details need to be renamed with something more meaningful than 0, 1 and 2.

Double click each header and enter an appropriate name. Address, City and Postcode were used in my example below.

 

 

 

 

 

 

Sort The Data

You can see that the list has already been sorted automatically by Power Query. However, I would like to sort this list. I feel better with this step added.

Select the Name column and click the Sort Ascending button on the Home tab.

All of steps that we have undertaken are listed in the Applied Steps pane. We can use this to remove or edit steps of the process later.

 

Name The Query

We can also name the query here. And that is important. Click in the Name box and enter SortList as the query name.

 

 

 

 

 

 

 

 

 

 

 

Send Query Back to Sheet

We’re done! Let’s put the query results back into a sheet. Click ‘File’ (top left), ‘Close & Load to’, ‘New Worksheet’.

 

 

VBA Solution

By using VBA, we have many approaches to re-shaping this data and sorting by name. This is the method that I used and an explanation of the key parts of the macro code.

Sub ComplexSortProblem()

Dim ColumnOffset As Long

Dim RowNum As Long

Dim LastRow As Long

Dim i As Long

Dim TotalRows As Long

LastRow = Application.WorksheetFunction.CountA(Range(“A:A”))

TotalRows = Application.WorksheetFunction.CountA(Range(“B:B”)) + LastRow – 2

RowNum = 2

Range(“C1”).Value = “City”

Range(“D1”).Value = “Postcode”

Range(“C1:D1”).Font.Bold = True

For i = 2 To TotalRows

    If Cells(RowNum, 2).Value = “” Then

        Rows(RowNum).Delete

        ColumnOffset = 0

    Else

        If ColumnOffset >= 1 Then

            Cells(RowNum – 1, 2 + ColumnOffset).Value = Cells(RowNum, 2).Value

            Rows(RowNum).Delete

        Else

RowNum = RowNum + 1

        End If

        ColumnOffset = ColumnOffset + 1

    End If

Next i

 

ActiveWorkbook.Worksheets(“Sheet1”).Sort.SortFields.Add Key:=Range(“A2:A” & LastRow), _

        SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal

    With ActiveWorkbook.Worksheets(“Sheet1”).Sort

        .SetRange Range(“A1:D” & LastRow)

        .Header = xlYes

        .MatchCase = False

        .Orientation = xlTopToBottom

        .SortMethod = xlPinYin

        .Apply

    End With

 

End Sub

 

I used two sets of variables. One set for the shaping of the data and another for the loop.

The i and TotalRows variables were used solely for the loop. I calculated the number of loop iterations by counting the non-blank cells in column B, adding the non-blank cells from column A and subtracting 2 (there are 2 less blank rows in B than there are entries in A).

The variables RowNum, ColumnOffset and LastRow were used to actually re-shape the data.

RowNum tracked the current row through the process, only being increased when a row was not deleted.

ColumnOffset was used to reference a column to the right for the second and third address parts.

LastRow is a non-blank count on column A. This was used when sorting data at the end to have a dynamic last row identifier (view other smart ways to find the last row in Excel VBA)  

A For loop is used to run it through a predetermined number of iterations (the number in the TotalRows variable).

view other smart ways to find the last row in Excel VBA

If statements are used to test the status of the cell in column B and apply the correct behaviour. If the cell is empty the row is deleted and the ColumnOffset variable reset because it means an address has ended.

If the ColumnOffset variable is 1 or greater, then we are in the middle of an address so move the data into that column offset and delete the row. If the cell is not empty, or in the middle of an address then increase the RowNum variable.

That is a brief explanation of key stages of the macro. Step through the code in an example to understand it better. Maybe you can improve upon it.

 

 

Deciding On The Solution

There are advantages to each approach and the path you take depends to an extent of where you feel more comfortable.

  • Power Query is only available from Excel 2010. It’s an add-in called Power Query in 2010 and 2013. It’s found on the Data tab as Get & Transform in Excel 2016 and later. 
  • It will be easier though for most Excel folk to modify Power Query steps than to edit VBA code which is a more unique skillset.
  • They are both flexible, but VBA offers more flexibility due to how vast the language is.
  • Macro security is a concern when using VBA. Macros must be enabled before they can be executed.
  • Both are one button refreshable options. So, there is a tie for speed of application.

 

 

Guest Post Author: Alan Murray

 

 

 

 

Alan is the founder of Computergaga (https://www.computergaga.com). Alan also has a popular YouTube channel. When he is not talking about Excel he likes to spend his time running, hiking and spending time with his two children.

Read Alan’s post about importing multiple Excel files from a folder into a single Excel file!

 

 

About Me

My name is Kevin Lehrbass. I’m a Data Analyst living in Markham Ontario Canada.

Power Query? VBA? I love VBA but in this particular case I would definitely go with Alan’s Power Query solution. It’s so much easier to use Power Query for these types of data gymnastics.

Improve Your Excel Skills! Check out my recommended Excel Training!

 

 

2 Comments

  1. Mark says:

    Hey Kevin / Alan,
    Good post, I would definitely go with the PowerQuery option too.
    I’ve been playing with the new dynamic array formulas and think it’s possible to solve this with just 2 formulas.
    1) Put the data in a Table (called “Table1” in the formulas below)
    2) In Cell D2 enter the formula: =SORT(FILTER(Table1[Name],LEN(Table1[Name])>1))
    3) In Cell E2 enter the formula: =INDEX(Table1[Address],MATCH(D2#,Table1[Name],0)+{0,1,2})
    With it being linked to a Table the result will expand if new data is added. Though I admit it won’t correctly work if there are duplicate names in the table.

    1. Kevin Lehrbass says:

      Hi Mark,
      Those dynamic arrays are revolutionary!
      Yes…many solutions can be redone using them!
      Thanks for reading Mark!
      Cheers,
      Kevin

Post a comment