This formula is immortalized as Newton’s 2nd Law of Motion. The above is an example of Exploratory Data Analysis: you play around with your data set to find a pattern between different variables. For example, the time it takes the apple to fall (Variable A) increases with the height the apple is dropped from (Variable B). We say that Variable A correlates with Variable B. In the datasets we use today, these correlations are rarely easy or apparent to determine. For example, what exactly determines sales? Is it the price? Is it the discounts? Is it the day of the month or week? Is it where the product is placed? Is it the color of the product? It could be anything and, indeed, we know it is a combination of many different variables.
Until we “explore” the data, we will not be able to identify these trends and correlations. However, the information is there, hidden in your data set.
This is known as Exploratory Data Analysis.
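To make the idea of “Variable A correlates with Variable B” concrete, here is a minimal Python sketch. It assumes idealized free fall (no air resistance) and uses made-up drop heights, so the numbers are illustrative rather than measured; the helper name `pearson` is my own and not from any library.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

G = 9.81  # gravitational acceleration in m/s^2

# Illustrative drop heights in metres (assumed for this sketch, not real data).
heights_m = [1, 2, 5, 10, 20, 50]

# Idealized fall times from t = sqrt(2h / g), ignoring air resistance.
fall_times_s = [math.sqrt(2 * h / G) for h in heights_m]

r = pearson(heights_m, fall_times_s)
print(f"correlation between height and fall time: {r:.2f}")
```

A coefficient close to 1 tells you the two variables rise together. It does not tell you the exact formula; finding that is the rest of the exploration.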
Now, let’s fast-forward from Newton’s time to today. A car developer takes the 2nd Law of Motion formula, F=ma, and plugs it into a car system so that the driver can see the exact speed and acceleration of the car at any given point in time on the speedometer. Whenever the car is being driven, the speedometer ‘dashboard’ gives the driver ‘real-time information’ about the car’s speed and acceleration.
The car’s speed and acceleration are nothing but what business lingo calls KPIs, and the car’s speedometer dashboard is doing nothing more than what a Business Intelligence (BI) tool is supposed to do.
A BI tool is a reporting tool that gives you information on ‘pre-defined’ KPIs in real time.
It calculates the same formula again and again as the variables change in real time and updates the KPI accordingly. The car speedometer only knows how to calculate speed and will calculate the speed of the car forever, nothing more and nothing less.
So in essence, you need software to explore and analyze data to determine trends and patterns. You can then create formulas out of these trends so that you know the state of your variable of interest (i.e. your KPI) at any given point in time. Then you need a BI tool into which you input the formula you created, so that it tells you the current, real-time state of your KPI.
Say, for example, you have emails that your sales agents have sent out to potential customers. Like Sir Isaac, you believe that there could be some formula that determines sales. You ask yourself which variables could be driving sales. Is it the time of day the emails were sent? Is it the number of email reminders sent? Is it the length of the emails?
You conduct an exhaustive Exploratory Data Analysis using R or Python, use some cool tricks such as Text Analytics and Sentiment Analysis, and determine that (let’s assume) sales show a strong and undeniable trend with the number of times a sales agent uses the name of the potential client, or of the potential client’s company, in the email.
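As a rough illustration of what one small step of that exploration might look like, here is a Python sketch that counts name mentions per email and compares the average for won and lost deals. The emails, the names and the `won` flag are all invented for this example, and `count_name_mentions` and `average` are hypothetical helpers, not part of any library. A real analysis would need far more data and a proper statistical test.

```python
import re

def count_name_mentions(body, names):
    """Count case-insensitive whole-word occurrences of any of the names in body."""
    total = 0
    for name in names:
        # \b keeps "Dana" from matching inside a longer word such as "Danas".
        pattern = r"\b" + re.escape(name) + r"\b"
        total += len(re.findall(pattern, body, flags=re.IGNORECASE))
    return total

def average(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

# Invented sample emails; "names" holds the client name and the company name.
emails = [
    {"body": "Hi Dana, great talking today. Dana, I think Acme would benefit. "
             "Happy to show the Acme team a demo.",
     "names": ["Dana", "Acme"], "won": True},
    {"body": "Hello Raj, following up for Globex. Raj, would Thursday suit Globex?",
     "names": ["Raj", "Globex"], "won": True},
    {"body": "Hello, please find our brochure attached.",
     "names": ["Mia", "Initech"], "won": False},
    {"body": "Hi Omar, just checking in on our offer.",
     "names": ["Omar", "Umbrella"], "won": False},
]

won_counts = [count_name_mentions(e["body"], e["names"]) for e in emails if e["won"]]
lost_counts = [count_name_mentions(e["body"], e["names"]) for e in emails if not e["won"]]

print("average mentions in won deals: ", average(won_counts))
print("average mentions in lost deals:", average(lost_counts))
```

The word-boundary match is a deliberate simplification: it will not handle nicknames, misspellings, or a client name that also appears inside the company name.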
This is the fruit of Data Analysis, and you would not find this information without exploring the dataset and mining it for information. Once you have quantified this trend or turned it into a formula, you can use the formula to create KPIs, Business Targets or Business Best Practices that your team needs to meet or follow.
In our example above, you can create a KPI, the ‘Number of Times the Client’s or Client’s Company’s Name Appears in an Email’, and plug that into your BI tool to monitor all your sales agents in real time. Any time they go above or below the desired range of, say, 3 to 6, you get an alert on your BI tool and you can quickly rectify the situation.
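The monitoring side is much simpler than the exploration side, which is the whole point. Here is a minimal Python sketch of the kind of check a BI tool would run over and over. It assumes the 3 to 6 range from the example above; the function name `kpi_alert`, the agent names and the incoming values are all made up for illustration, and a real BI tool would have its own way of defining such a rule.

```python
KPI_LOW, KPI_HIGH = 3, 6  # desired range taken from the example in the text

def kpi_alert(agent, mentions, low=KPI_LOW, high=KPI_HIGH):
    """Return an alert message if mentions fall outside [low, high], else None."""
    if mentions < low:
        return f"ALERT: {agent} used the client's name {mentions} times (below {low})"
    if mentions > high:
        return f"ALERT: {agent} used the client's name {mentions} times (above {high})"
    return None

# Hypothetical stream of (agent, name mentions in latest email) observations.
incoming = [("agent_a", 4), ("agent_b", 1), ("agent_c", 8)]

for agent, mentions in incoming:
    message = kpi_alert(agent, mentions)
    if message:
        print(message)
```

Notice that the rule never changes; only new rows keep arriving. All the thinking went into discovering which number to watch and what range is healthy.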
So, to summarize,
Data Analytics Languages are good for,
- Exploring Data Sets
- Discovery of trends, patterns, correlations, etc.
- They require time to clean, explore and mine the data. The data could be unstructured and dirty (such as emails, tweets, video footage, etc.) and it will be your responsibility to clean it and get it to a state fit for analysis.
- Once you have completely explored and analyzed a data set, you move on to new data sets.
BI Tools, on the other hand,
- Real-time reporting of pre-defined KPIs.
- Good for business operations and checking pulse of important variables.
- They require that the data be in a well-structured and ‘agreed’ format. BI tools will crash if you give them unacceptable types of data.
- You use the same formula again and again on the same dataset; only the number of observations (i.e. the number of rows) in the dataset keeps increasing.
So, this is my take on the main difference between a data analysis language like R and typical BI tools. At times there are overlaps, as BI tools claim to have data exploration capabilities. However, given the practically unlimited steps you could take in discovering a formula, these capabilities are very limited in comparison to what R offers. Read more about this in our blog post here.