Alteryx

Find unique observations on two columns

Working with Data in Alteryx, you can come across datasets where two columns together act as a unique identifier for a single observation, resulting in two identical rows, where the two identifier columns are switched around. To get a better idea, consider the following dataset:


For the airline, a route is a line between two airports, meaning that CPH --> TXL is the same route as TXL --> CPH. Now let's say you want to find out which route had most flights in year 2000? 2001? What about overall?

You are interested in observations on the route level because they have the same attributes. However, the challenge you may face is that the order of the two identifiers change, and it is not possible to use the unique tool on two columns to find the unique observations.

Now imagine a dataset with thousands of rows and routes. The challenge might seem trivial, and while the solution is straight forward, it might not be your first thought. Why don't you take a second to think about how you would solve it? Unique tool on both columns? Reverse the order of the columns? This will only get you halfway. You will probably find yourself dragging tools from the red category to your canvas, such as transpose, summarize and crosstab. This is not a surprise, since these tools will allow you to fix the problem. The red tools have many configurations, for good reasons, but this also means that it's easy to make a mistake, and maybe not obvious where to begin. For that matter, I think that the following approach is a better and more stable solution that does not need to be reconfigured for new scenarios. It just works.

The formula tool will save your day

All you need is the formula tool. Now, imagine that your two identifiers are in field [1] and field [2].

  • Create a new column in the formula tool, you could call it "unique"
  • Put the following in the tool's input: Max(ToString([1]),ToString([2]))+"_"+Min(ToString([1]),ToString([2]))

This makes sure that no matter what, the order of the inputs from field [1] and field [2] in your new unique field will always be the same, and the field will always be independent from the order of field [1] and field [2]. The MAX/MIN functions return the highest or lowest value of the two columns. The "ToString"-formula is to make sure there is no mixup if there are both numeric and strings in the input fields.

For the example above, you would want to input the following in the formula tool: Max(ToString([Departure Airport]),ToString([Arrival Airport]))+"_"+Min(ToString([DepartureAirport]),ToString([Arrival Airport]))and then add a summarize tool to group by "Year" and "unique" and then sum "Number of flights" to get the right answers.
Below you can see the new column "unique" which now identify each unique route.



Back to blog


Subscribe for updates