4 sins against Tableau performance
Tableau is built for high-speed analytics. However, when working with larger datasets there are some basic rules to follow to make sure Tableau still performs well. I have found that avoiding the following 4 antipatterns significantly increases the likelihood of smooth Tableau performance - and spares me from staring at the "Executing query" and "Processing request" messages.
1) Do Not Show Too Many Values
Tableau will do exactly what you ask of it. Therefore you need to think carefully about what you are actually asking for. If you ask Tableau to show 200,000 marks in a graph, it will take time to render the graphics (and the viz will most likely be quite messy). The same concept applies to a list of e.g. customers. Do not ask for a list of 2 million customers; instead, filter the results before you show the list - or create a set of e.g. the 1,000 most interesting customers by a given measure. You can't really use a list of 2 million customers anyway.
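One way to sketch such a top-N filter is a calculated field used as a filter. This is an illustrative sketch, not the only approach - the field names are invented, and `RANK` is a table calculation whose result depends on how you set "Compute Using" (here assumed to be along Customer ID):

```
// Hypothetical calculated field "Top 1000 Customers"
// Keeps only the 1,000 customers with the highest total sales
RANK(SUM([Sales])) <= 1000
```

Dragging this field to the Filters shelf and keeping only `True` limits the view to the top customers without pulling the full list into the viz.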
2) Blend on the Highest Level
Being able to combine data sources with each other regardless of origin is a strong feature, but it can be a performance killer if done wrong. Let's say I have two datasets covering the same 10 million customers. One dataset holds the sales and the other holds the spending on each customer, together with a date. I would like to draw a nice line chart showing the development of both series over time.
Since the same customers appear in both datasets, it is tempting to blend on customer ID, but with 10 million rows to blend it will take time. Instead, blend on the date - the customer ID is not shown in the graph and is therefore not needed anyway. The conclusion is thus to always blend on the highest possible level of aggregation whenever possible (or consider a native join instead).
3) Limit the Calculated Fields (Especially on Strings)
When I use calculated fields it is because:
- I need to make calculations across the data that are not possible in the native data source
- I need to make a quick analysis or try out a new calculation that I will later include in the native data source.
I incorporate calculated fields into the native data source whenever possible, because each calculated field lowers performance slightly. The performance considerations are particularly relevant for calculations on strings, e.g. IF LEFT("very long string", 6) = "string" THEN... Calculations on Booleans or numbers will, as a general rule, perform faster.
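As a rough illustration, a string comparison like the one above can often be replaced by a cheaper Boolean or numeric test. This is a sketch in Tableau's calculation language with invented field names:

```
// Slower: string manipulation evaluated row by row
IF LEFT([Product Code], 6) = "PROMO-" THEN 1 ELSE 0 END

// Faster: test a Boolean field (or a number) instead
IF [Is Promo Product] THEN 1 ELSE 0 END
```

If the flag does not exist yet, it is usually cheapest to compute it once in the native data source rather than on every query in Tableau.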
4) Do Not Join/Blend Everything - Use Action Filters
A great alternative to joining or blending data, which will often be sufficient, is to build action filters that work across multiple data sources. Instead of blending the data sources, this method gives the user the feeling that the data is connected.
An example could be a situation where 5 out of 6 dashboards are on an aggregate level where the customer level is not needed (city could instead be the lowest level). For performance reasons it would make sense to use a data source aggregated at city level in these dashboards.
The last dashboard uses another data source at customer level, but because the action filters work across multiple data sources, the user does not notice that the underlying data source changes.