Today marks my first foray into the field of data science, and it starts with Kirill Eremenko’s Udemy course. I’ve had some very late nights taking in all this information, and I learned five things in my first week:
1. Data Science is literally a science. So you have to think like a scientist.
“Science” is a term that gets thrown around a lot and has been co-opted by people to describe “Things I think that are cool and smart-sounding”. But the “science” half of data science isn’t messing around.
For example, Dimensions are treated as independent variables, while Measures are treated as dependent variables. It’s a page taken right out of mathematical modeling. Similarly, blue pills represent Discrete data and green pills represent Continuous data, both of which are mathematical terms.
From there, the difference between Discrete and Continuous data can have profound implications for how your data visualization appears. This article by Interlinks has been extremely helpful to me in understanding this concept.
Part of being a scientist is questioning everything. If you aren’t the type to question why your data gets aggregated or visualized a certain way, you’ll need to change that habit. You can’t press a button, watch things happen, and just be content with that. You have to ask, and understand, why things happen. And you won’t be able to get there on your own. The Tableau community, whether on the /r/Tableau subreddit or the official Tableau Support Forums, is very supportive, and its members encourage curiosity. It attracts problem-solvers, which is refreshing.
Data Science and Data Visualization are not “failsafe” fields that non-STEM people can just slide into once COVID-19 wrecks their current industry. You’ve got to ask questions, change the way you think about data, and be curious and want answers.
2. Tableau beats the crap out of Excel for data visualization
And I say this as someone who spent years taking complex financial data and “visualizing” it for roadshow presentations and IPOs for large Canadian banks. Any company that insists on using Excel for data visualization might as well be using a frying pan to hammer in a nail. Sure, you could do it. But you’re using the wrong tool for the job. Excel does spreadsheets. Tableau is purpose-built for data visualization. Period.
Tableau shines because it’s so easy to re-arrange data, and it has data interactivity. This isn’t possible in Microsoft Excel, regardless of any third-party add-ons you throw at it. Tableau is also fast, sleek, and far more intuitive as far as data visualization goes.
Microsoft Excel, on the other hand, has zero interactivity and a lot of file bloat when your datasets get large and complex. But most importantly, Excel has no tool for displaying more than one chart/data visualization on the same page. You can put a bunch of graphs on one worksheet, sure. But it looks like hell. It forces you to hop back and forth between endless worksheets, which is a gigantic waste of time.
Tableau also does many things better than Excel and in fewer steps. For example, adding a dual axis is so much faster and easier in Tableau, and so is adding labels. There were a lot of times when I followed a Tableau tutorial to do something, then looked up in disbelief and said “Wait. That’s it? That’s all I have to do?”
Excel is for spreadsheets. Period. It works beautifully with Tableau for collecting data, but Excel needs to stay in its lane.
3. Blue pills do not represent Dimensions and green pills do not represent Measures
It’s a common misconception, and apparently one that is taught in many circles, including Kirill Eremenko’s Udemy course. Having this misconception sit in the back of my brain only confused me while I did my online courses. A lot of times I found myself wondering, “How does Tableau know which data are Dimensions, and which are Measures?”
The answer is: it doesn’t. And that’s because:
Blue pills represent Discrete Data, which is individually separate and distinct. For example, country names: you can’t have a rising, continuous scale of country names.
Green pills represent Continuous Data: data that forms an unbroken whole, without interruption.
This article from The Data School does a great job explaining the difference between the pills. Shamelessly copying and pasting this for my own note-taking purposes:
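A rough way to picture the difference: discrete values become separate headers, one per distinct value, while continuous values are laid out on a single axis that spans the whole range. The sketch below is a plain-Python analogy using made-up order dates; the helpers `discrete_headers` and `continuous_axis` are hypothetical names for illustration, not Tableau features, and this is not a description of how Tableau works internally.

```python
from datetime import date

# Hypothetical order dates, used only for illustration.
order_dates = [date(2020, 1, 15), date(2020, 3, 2), date(2020, 1, 28), date(2020, 6, 9)]


def discrete_headers(values):
    """Discrete: every distinct value becomes its own separate header.
    Values that never occur in the data get no header at all."""
    return sorted(set(values))


def continuous_axis(values):
    """Continuous: values sit on one unbroken axis, so the axis spans
    the full range from the smallest to the largest value, gaps included."""
    return min(values), max(values)


months = [d.month for d in order_dates]
print(discrete_headers(months))      # [1, 3, 6]
print(continuous_axis(order_dates))  # (datetime.date(2020, 1, 15), datetime.date(2020, 6, 9))
```

The discrete version only produces headers for months that actually appear (January, March, June), while the continuous axis still runs through February, April and May. That gap-versus-no-gap behaviour is roughly why the same field can look so different as a blue pill versus a green pill.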
- Blue things group your data
- Green things count your data
- Dimensions split up the view
- Measures fill the view
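To make “Dimensions split up the view, Measures fill the view” concrete, here is a rough Python analogy of dropping a Measure into a view with and without a Dimension. It assumes SUM as the aggregation (Tableau’s usual default for numeric Measures); the field names, the data, the `build_view` helper and the “(All)” label are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical data: "Region" plays the role of a Dimension, "Sales" a Measure.
rows = [
    {"Region": "East", "Sales": 120.0},
    {"Region": "West", "Sales": 80.0},
    {"Region": "East", "Sales": 30.0},
    {"Region": "North", "Sales": 50.0},
]


def build_view(rows, measure, dimension=None):
    """Aggregate a Measure with SUM, split by an optional Dimension.

    With no Dimension, every row lands in one bucket (labelled "(All)" here,
    a made-up label), so the Measure fills the view with a single number.
    With a Dimension, each distinct value gets its own bucket.
    """
    totals = defaultdict(float)
    for row in rows:
        key = "(All)" if dimension is None else row[dimension]
        totals[key] += row[measure]
    return dict(totals)


print(build_view(rows, "Sales"))            # {'(All)': 280.0}
print(build_view(rows, "Sales", "Region"))  # {'East': 150.0, 'West': 80.0, 'North': 50.0}
```

With no Dimension in the view, the Measure collapses into one aggregated number; adding a Dimension splits that number into one value per Region.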
4. Dimensions and Measures are a core concept. And you need to understand them
Everything in Tableau begins with Dimensions and Measures. Everything.
After completing my first Udemy course, I’ve learned that the best source for the right definitions is straight from the horse’s mouth. Every high-level concept in Tableau branches outward from the section of the official Tableau website that defines Dimensions and Measures.
5. Data needs to be cleaned up before it gets imported into Tableau
To most fleshy, squishy human beings, a “good” spreadsheet is one that’s nicely coloured and pretty, with candy-striped rows, Sub-Totals, and Totals. Tableau, a cold, unfeeling machine, hates this. It’s hard for Tableau to read.
Instead, Tableau wants plain, flat data: a single header row followed by one record per row, with no decorative formatting.
A whopping 70% of a data scientist’s time is spent cleaning up and re-organizing data so that it slides effortlessly down the gullet of the machine-powered Tableau software. Some guidelines for keeping data nice and clean:
- Remove Totals and Sub-Total lines
- Remove unnecessary titles and headers
- Use columns for fields, and de-pivot (unpivot) source data
- Make sure that there are no empty cells between filled cells
- Use the Tableau Data Interpreter to clean up data
- Tableau prefers row-wise data over column-wise data
- Use Tableau’s built-in ‘Pivot’ functionality if your data is still too column-y.
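Tableau’s Data Interpreter and Pivot feature handle this inside Tableau, but if you prepare data in a script before importing it, the same guidelines can be sketched in plain Python. This is a minimal sketch under stated assumptions: the sample table is made up, the labels in `SUMMARY_LABELS` and the rule that the first multi-cell row is the header row are hypothetical choices, and `clean`/`unpivot` are illustrative helpers, not part of any Tableau API.

```python
# Hypothetical "pretty" spreadsheet export: a title row, blank rows,
# one column per month (column-wise data), and Sub-Total/Total lines.
raw = [
    ["Quarterly Sales Report"],
    [],
    ["Product", "Jan", "Feb", "Mar"],
    ["Widgets", "10", "12", "9"],
    ["Gadgets", "4", "6", "5"],
    ["Sub-Total", "14", "18", "14"],
    [],
    ["Total", "14", "18", "14"],
]

# Assumption: these are the labels that mark summary lines in this export.
SUMMARY_LABELS = {"sub-total", "subtotal", "total"}


def clean(raw):
    """Drop blank rows, titles and Total/Sub-Total lines; return (header, data)."""
    # Keep only rows that have at least one non-empty cell.
    rows = [r for r in raw if any(cell.strip() for cell in r)]
    # Assumption: the first row with more than one cell is the header;
    # anything above it is a title.
    header_index = next(i for i, r in enumerate(rows) if len(r) > 1)
    header = rows[header_index]
    data = [
        r for r in rows[header_index + 1:]
        if r[0].strip().lower() not in SUMMARY_LABELS
    ]
    return header, data


def unpivot(header, data):
    """Turn one column per month into one row per (Product, Month) pair."""
    id_column, *value_columns = header
    long_rows = []
    for row in data:
        for column, value in zip(value_columns, row[1:]):
            long_rows.append({id_column: row[0], "Month": column, "Sales": int(value)})
    return long_rows


header, data = clean(raw)
for record in unpivot(header, data):
    print(record)
```

The result is row-wise data with one record per Product and Month, and no title, blank, or total rows left for Tableau to trip over.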
That’s all for now. The Advanced course is up next.