Recently, my boss gave me an interesting mission. He handed me a dataset of customer leads generated from seven different campaign types. The mission was pretty straightforward, but not without its challenges: the dataset needed to be divided into four equal parts, with each part containing an equal share of each lead type. These parts were, in turn, to be injected into our telephony system for outbound customer calls, going live one segment per week.
A simple yet tricky problem. So how did I tackle it in Power Query? Let's dive in.
Step 1: Loading the Data
I started in Power Query by loading the leads dataset, which was originally in CSV format. The file contained all kinds of information about the leads, but I homed in on the campaign type column, since it held the key to segmenting the data.
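For readers who prefer the Advanced Editor, the load step might look roughly like the sketch below. The file path, the comma delimiter, and the UTF-8 encoding are all assumptions for illustration; the original file's details are not given here.

```powerquery
let
    // Hypothetical path: point this at your own leads file
    FilePath = "C:\Data\leads.csv",
    // Read the raw CSV; comma delimiter and UTF-8 (65001) are assumptions
    Source = Csv.Document(File.Contents(FilePath), [Delimiter = ",", Encoding = 65001]),
    // Use the first row of the file as column headers
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    Promoted
```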
Step 2: Sorting by Campaign Type
Once the data was loaded, I sorted it by 'campaign type' so that the leads were grouped by campaign. This step was necessary because my task was to divide the dataset while keeping the number of leads from each campaign balanced across the parts.
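In M, the sort is a single Table.Sort call. The sketch below uses a tiny inline table in place of the real leads data; the "Lead ID" column and the campaign names are made up for illustration.

```powerquery
let
    // Tiny inline sample standing in for the loaded leads table (hypothetical data)
    Leads = #table(
        {"Lead ID", "campaign type"},
        {{1, "Email"}, {2, "Social"}, {3, "Email"}, {4, "Social"}}
    ),
    // Sort so that leads from the same campaign sit next to each other
    Sorted = Table.Sort(Leads, {{"campaign type", Order.Ascending}})
in
    Sorted
```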
Step 3: Adding an Index Column
Now, to split these leads into four parts, I needed a systematic way to break them up. I added an index column from the Add Column tab, which assigned a sequential number to each row. From there, I could assign each row to one of the four parts based on its position in the dataset.
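The Add Column tab generates a Table.AddIndexColumn step behind the scenes. A minimal sketch, assuming the index starts at 0 and using a made-up sample table that is already sorted:

```powerquery
let
    // Hypothetical sample, already sorted by campaign type
    Sorted = #table(
        {"Lead ID", "campaign type"},
        {{1, "Email"}, {3, "Email"}, {2, "Social"}, {4, "Social"}}
    ),
    // Add a sequential "Index" column starting at 0 and increasing by 1
    Indexed = Table.AddIndexColumn(Sorted, "Index", 0, 1)
in
    Indexed
```

Starting at 0 or 1 both work for the split; it only changes which rows land in which part.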
Step 4: Assigning Parts with Number.Mod
This is where things get interesting. The Number.Mod function is a little unsung hero in Power Query, providing a really efficient way to divide up your data. By adding a custom column with the formula Number.Mod([Index], 4), I told Power Query to take the index number of each row, divide it by 4, and return the remainder.
But why use this function? The trick is this: when you divide a whole number by 4, the remainder is always one of four values: 0, 1, 2, or 3. Those were the four parts I needed. Rows with a remainder of 0 went into the first part, rows with a remainder of 1 went into the second, and so on.
Because the rows were sorted by campaign type, consecutive leads from the same campaign cycled through the four parts, so each campaign type was spread almost evenly across all four.
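One way to convince yourself that the split is balanced is to add the part number and then count leads per campaign and part. The sketch below uses a made-up, already indexed sample with two campaigns of four leads each; the "Part" and "Leads" column names are my own.

```powerquery
let
    // Hypothetical sample: sorted by campaign type, with a 0-based index
    Indexed = #table(
        {"campaign type", "Index"},
        {
            {"Email", 0}, {"Email", 1}, {"Email", 2}, {"Email", 3},
            {"Social", 4}, {"Social", 5}, {"Social", 6}, {"Social", 7}
        }
    ),
    // The remainder after dividing the index by 4 becomes the part number (0 to 3)
    WithPart = Table.AddColumn(Indexed, "Part", each Number.Mod([Index], 4), Int64.Type),
    // Count leads per campaign and part to check the balance
    Balance = Table.Group(
        WithPart,
        {"campaign type", "Part"},
        {{"Leads", each Table.RowCount(_), Int64.Type}}
    )
in
    Balance
```

With this sample, every campaign and part combination ends up with exactly one lead. On real data where a campaign's lead count is not a multiple of 4, the counts per part would differ by at most one.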
What is Number.Mod and Why Is It So Powerful?
The main use of the Number.Mod function is to get the remainder of a division. The syntax is as follows:
Number.Mod(number, divisor)
Here, number is the value you want to divide and divisor is the value you're dividing by. The function returns the remainder, which, for a non-negative number and a positive divisor, lies between 0 and one less than the divisor.
In this case, I used the index number as the number and 4 as the divisor, so it always returned one of four values: 0, 1, 2, or 3. This allowed me to split my data into four groups of roughly the same size.
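A few worked values make the pattern concrete. This small sketch just evaluates Number.Mod for a handful of index numbers with a divisor of 4:

```powerquery
let
    // Each entry is the remainder after dividing by 4
    Examples = {
        Number.Mod(0, 4),  // 0
        Number.Mod(1, 4),  // 1
        Number.Mod(5, 4),  // 1, since 5 = 4 * 1 + 1
        Number.Mod(7, 4),  // 3, since 7 = 4 * 1 + 3
        Number.Mod(8, 4)   // 0, since 8 = 4 * 2 + 0
    }
in
    Examples
```

The remainders repeat every four index numbers, which is what sends consecutive rows to different parts.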
Other Uses of Number.Mod
The Number.Mod function is not limited to this kind of data split. Here are several other interesting applications:
Cycling Through Values: If you have values that repeat in a cycle, such as days of the week or months, Number.Mod can assign each row to a category by returning values on a repeating cycle.
Group Assignment in a Loop: For operations that distribute data across several processes, say different servers or teams, Number.Mod can ensure balanced assignments by cycling through a fixed number of options.
Alternating Formatting: Number.Mod can also flag every other row, so that reports can apply alternating colors or formats to improve readability without manually formatting each row.
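As a rough sketch of the cycling and alternating ideas above, the query below rotates rows through a list of weekday names and adds a true/false flag for every other row. The weekday list and the "Day" and "IsEvenRow" column names are assumptions for illustration; applying the actual colors would happen in whatever report consumes the flag.

```powerquery
let
    // Hypothetical cycle of categories to rotate through
    Weekdays = {"Mon", "Tue", "Wed", "Thu", "Fri"},
    // Seven sample rows with a 0-based index
    Rows = Table.FromColumns({{0..6}}, {"Index"}),
    // Pick a weekday by position; the remainder wraps back to the start of the list
    WithDay = Table.AddColumn(
        Rows,
        "Day",
        each Weekdays{Number.Mod([Index], List.Count(Weekdays))},
        type text
    ),
    // Flag every other row so a report can band rows on this column
    WithBand = Table.AddColumn(WithDay, "IsEvenRow", each Number.Mod([Index], 2) = 0, type logical)
in
    WithBand
```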
Step 5: Exporting the Parts for Weekly Operations
Once I had divided the data, I exported each of the four parts to a separate CSV file, one for each week. This meant that each file had a balanced mix of leads from every campaign type, ready to be injected into our telephony system to drive outbound calls over the upcoming weeks.
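Before exporting, each part has to be isolated. One possible way to do that, sketched here with a made-up table and a hypothetical GetPart helper, is to filter on the part number; each result could then live in its own query and be saved as a weekly file.

```powerquery
let
    // Hypothetical sample with the part number already assigned
    WithPart = #table(
        {"Lead ID", "Part"},
        {{1, 0}, {2, 1}, {3, 2}, {4, 3}, {5, 0}}
    ),
    // Hypothetical helper: keep only the rows that belong to one part
    GetPart = (part as number) as table =>
        Table.SelectRows(WithPart, each [Part] = part),
    // Part 0 is the first week's segment; parts 1 to 3 follow the same pattern
    Week1 = GetPart(0)
in
    Week1
```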
Final Thoughts
What seemed at the beginning a daunting task became manageable through the power of Power Query and the simplicity of the Number.Mod function. Sorting combined with indexing, plus a bit of mathematical logic, broke the dataset into evenly balanced parts, setting up smooth operations for the weeks ahead.
Next time this type of challenge arises, such as dividing data into chunks or rotating through categories, remember the humble yet extremely powerful Number.Mod function. It's a multi-purpose tool that can save hours of manual work and keep your datasets neat and clean.