Insights Starved Data Curiosity

How traditional Data Analysis tools cannot help anymore – The Life of Every Other Manager

 


 

You would think that after reaching a middle or upper management position, a manager would be doing lots of cool things in their day-to-day activities. However, in our experience in the GCC, most managers spend a lot of time and energy trying to analyse data.

The fact is, junior employees often do not understand the business well enough to analyse data and create meaningful reports, so the onus of analysing data, making recommendations and advising seniors falls into every manager’s hands. So begins the journey outlined below!

 

Excel becoming obsolete in today’s Data Rich Culture

Once upon a time, Excel was good enough for the small datasets that professionals typically used to work with. Today we are working with large datasets that become cumbersome to analyse in Excel. For instance, most managers face some of the following frustrations. Do you face any of them?

  • When working with large datasets, the screen freezes as you scroll and each step in the analysis process takes time to execute.
  • Data cleaning gives you nightmares because it requires significant, meticulous manual effort. For example,
    • You have different names for the same thing in a column. For instance, DXB, Dubai, Dubay and Dubia (typo) all mean “Dubai” in your City column.
    • Entries in the date and currency columns are not standardized.
    • So you just have to grin and bear it and clean up each problem one by one.
  • You want to work with different data files but joining them is a struggle.
    • For example, you have an HR file of all sales employees and their demographics (age, nationality, gender, income, etc.) and a Sales file covering the last year, with a column recording which employee made each sale. You want to see how the age or nationality of an employee relates to sales in different locations. How do you go about it in Excel?
    • What if the scenario becomes more complicated and you have more files to join? Say you also have a Training file listing the training programs each sales employee completed in the last year, and you want to see which programs were associated with the highest sales, broken down by employee age or nationality.
  • Data preparation is very cumbersome. You make new columns for small analysis steps and copy-paste entire sheets so that you do not lose old data, only to realize that you missed a small step two hours ago and your current sheet is incorrect!
  • Data mining is very limited. You have a few functions and capabilities such as filter, sort, pivot tables and lookups, but you are curious and want to look at your data from new angles and ask more interesting questions. However, you don’t know how to get Excel to do those things, or whether it is even possible in Excel, so you accept this as a technology limitation and stop asking those types of questions.
  • Graphical visualization is limited. You see impressive graphs on the internet and wonder in awe which “programmer” could make them for you.
  • And the WORST PART OF EXCEL is that you have to repeat all of the above steps again and again as new data comes in, or whenever you have to prepare the same report each month or quarter.
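
To give a sense of what the scripted alternative to these manual steps can look like, here is a minimal sketch in Python, using only the standard library. It standardizes the city spellings mentioned above, joins a Sales table to an HR table on an employee ID, and totals sales per nationality and city. The field names (employee_id, nationality, city, amount), the alias table and all the sample values are made up for illustration; real files would need their own mappings.

```python
from collections import defaultdict

# Hypothetical spelling variants mapped to one canonical city name.
CITY_ALIASES = {"dxb": "Dubai", "dubai": "Dubai", "dubay": "Dubai", "dubia": "Dubai"}


def clean_city(raw):
    """Return the canonical city name, or the trimmed input if it is unknown."""
    key = raw.strip().lower()
    return CITY_ALIASES.get(key, raw.strip())


def sales_by_nationality_and_city(hr_rows, sales_rows):
    """Join sales to HR records on employee_id and total sales per (nationality, city)."""
    hr_by_id = {row["employee_id"]: row for row in hr_rows}
    totals = defaultdict(float)
    for sale in sales_rows:
        employee = hr_by_id.get(sale["employee_id"])
        if employee is None:
            continue  # sale with no matching HR record; skipped in this sketch
        key = (employee["nationality"], clean_city(sale["city"]))
        totals[key] += sale["amount"]
    return dict(totals)


# Made-up sample data standing in for the HR and Sales files.
hr = [
    {"employee_id": 1, "age": 34, "nationality": "Indian"},
    {"employee_id": 2, "age": 41, "nationality": "Emirati"},
]
sales = [
    {"employee_id": 1, "city": "DXB", "amount": 1200.0},
    {"employee_id": 1, "city": "Dubia", "amount": 800.0},
    {"employee_id": 2, "city": " Dubai ", "amount": 500.0},
    {"employee_id": 2, "city": "Abu Dhabi", "amount": 300.0},
]

report = sales_by_nationality_and_city(hr, sales)
for (nationality, city), total in sorted(report.items()):
    print(f"{nationality:10} {city:10} {total:10.2f}")
```

Because the steps live in a script, next month’s report is a matter of pointing the same code at the new data rather than repeating the clean-up by hand.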

Business Intelligence Software as the Savior?

So, your company invests in a BI tool and your IT manager gets the biggest and shiniest toy in the BI market for you to do your job. Now you are expected to perform and deliver. But let’s see what really happens (based entirely on our experience).

  • You initially decided on some KPIs, based on which the BI consultants made a few dashboards and left. That’s great. You never have to make those graphs again! This is the core function of BI: a tool for real-time monitoring of pre-defined KPIs.
  • But what if you want to ask a new question? What if you just want to investigate and explore the data? Look at the HR example above, for instance. Rarely is it worth the time and investment to build KPIs just for exploratory data analysis.
  • Let’s say you feel it is worth making this new KPI. Well, there is a process for that, and it looks something like this:
    • Send an email to IT. Either they build it for you in a couple of days, after an exchange of lots of emails, or they forward that email to the BI team.
    • You get a quote from the BI team for a new KPI.
    • You wait for upper management to approve the budget for new KPIs.
    • You explain why you did not have that KPI included the first time.
    • The truth is, no one bothers going through the above steps, and you just work with the dashboards you have.
  • Some advanced BI tools may be capable of basic exploratory analysis. However, given the near-infinite combinations of steps you could take to answer a business question, BI tools offer limited capabilities when it comes to exploring and manipulating data.
    • For example, you may want to select four columns from two different datasets, filter column 1 on some criterion, take a weighted average of columns 2 and 3 to create a column 5, then group column 5 by the values in column 4 and take the average of column 5 within each group. (For context, these steps are very close to the ones you would use to solve the HR and Sales example mentioned above.)
  • What ends up happening is that everything becomes about the BI tool when the focus should have been on the data analysis. Try moving one step ahead in your analysis with a BI tool and mostly what you hear is ‘The tool can’t do this’, ‘You need an upgraded license to do that’, ‘You have to write an SQL query to get there’, ‘The IT team will review and get back to you’, etc.
  • You wonder why it is so hard, when all you wanted to do was analyse the data and get on with the business.
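
For comparison, the multi-step example in the list above (select columns from two datasets, filter, take a weighted average, group, then average each group) can be sketched in a few lines of a scripting language. The version below uses Python with only the standard library. The key field id, the names col1 to col5, the weights and the sample rows are all assumptions made for illustration, with col1 standing in for the filtered column.

```python
from collections import defaultdict
from statistics import mean


def grouped_weighted_average(left, right, min_col1, w2=0.5, w3=0.5):
    """Join two datasets on id, filter on col1, build col5, then average col5 per col4 group."""
    right_by_id = {row["id"]: row for row in right}
    groups = defaultdict(list)
    for row in left:
        match = right_by_id.get(row["id"])
        # Skip rows without a partner in the second dataset or failing the filter.
        if match is None or row["col1"] < min_col1:
            continue
        # col5 is the weighted average of col2 (first dataset) and col3 (second dataset).
        col5 = (w2 * row["col2"] + w3 * match["col3"]) / (w2 + w3)
        groups[match["col4"]].append(col5)
    return {group: mean(values) for group, values in groups.items()}


# Made-up rows: the first dataset holds col1 and col2, the second holds col3 and col4.
dataset_a = [
    {"id": 1, "col1": 10, "col2": 4.0},
    {"id": 2, "col1": 3, "col2": 8.0},
    {"id": 3, "col1": 12, "col2": 6.0},
    {"id": 4, "col1": 20, "col2": 2.0},
]
dataset_b = [
    {"id": 1, "col3": 8.0, "col4": "north"},
    {"id": 2, "col3": 1.0, "col4": "north"},
    {"id": 3, "col3": 2.0, "col4": "south"},
    {"id": 4, "col3": 6.0, "col4": "north"},
]

result = grouped_weighted_average(dataset_a, dataset_b, min_col1=5, w2=3, w3=1)
print(result)
```

Changing the filter, the weights or the grouping column is a one-line edit, which is the kind of flexibility exploratory analysis needs.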

Many excellent BI tools are available in the market, and BI is a must for organizations in this day and age. However, while a BI tool is good for real-time reporting of selected KPIs, it is not a good investigation and exploration tool, because it was never designed to be one. You use exploration tools to explore data, and when you stumble upon a ‘view of the data’ that you like during that exploration, you can call that view a KPI and have your BI tool monitor it in real time.

In the end, the end user, the manager, has to come back to Excel to produce the reports that upper management needs to see.

Advanced Data Analytics & Data Visualization with R – Make Data Analysis a Joy!

Just as spreadsheets once revolutionized the workplace and replaced the scientific calculator, analytics programming languages like R and Python are revolutionizing how we analyse data. Data-rich companies like Facebook, Google and Amazon do not limit themselves to Excel, nor do they confine themselves to the costs and limitations of BI tools. They rely heavily on open-source analytics and statistical programming languages, including R and Python, to determine trends, predict customer sentiment and more. This is driving the growth of such languages, as artificial intelligence, predictive modeling, sentiment and text analysis, social media analysis and similar capabilities become must-haves for gaining a competitive edge, even for SMEs.

Some of the benefits of R for data analysis include:

  • It is Open Source & Free.
  • It is Secure & Private. Your data is yours.
  • It is the de facto language of statistical analysis. A very wide range of advanced analytics is available: data cleaning, dataset joining, data mining, predictive modeling, forecasting, time series analysis, classification and more become straightforward once you know the basics.
  • It is a scripting language. Write your analysis steps, i.e. ‘the script’, once, then re-run the script on new data and it will output updated graphs and numbers for you. No need to redo everything as in Excel.
  • The online community provides free packages with built-in functions for calculating KPIs across a very wide range of industries and applications.
  • It uses a grammar of data manipulation and a grammar of data visualization, so data manipulation reads almost like English. For example, to plot the average number of students from 2010 to 2016 in different locations across Dubai, you would tell R something like the following:
    • Take the dataset in question, then filter for the years 2010 to 2016, then group all schools by the location they are based in, and then plot the average number of students for all schools in each location.
    • The result is a graph of average student numbers by location. This is a small example of how a couple of lines of code can produce advanced data analysis, manipulation and visualization in seconds. A graph like this can reveal interesting trends that you can then explore and analyse further.
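
The same sequence of steps (filter, group, average, plot) can be sketched in Python, the other language mentioned in this article; in R it would typically be written with the dplyr and ggplot2 packages. The sketch below uses only the Python standard library and prints a simple text bar chart in place of a real plot. The school records, field names and the scale of one '#' per 100 students are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_students_by_location(rows, first_year=2010, last_year=2016):
    """Filter rows to the year range, group by location, and average student counts."""
    by_location = defaultdict(list)
    for row in rows:
        if first_year <= row["year"] <= last_year:
            by_location[row["location"]].append(row["students"])
    return {location: mean(counts) for location, counts in by_location.items()}


# Made-up records: one row per school per year.
schools = [
    {"school": "A", "location": "Deira", "year": 2009, "students": 900},
    {"school": "A", "location": "Deira", "year": 2010, "students": 400},
    {"school": "B", "location": "Deira", "year": 2016, "students": 600},
    {"school": "C", "location": "Jumeirah", "year": 2012, "students": 750},
    {"school": "C", "location": "Jumeirah", "year": 2017, "students": 100},
    {"school": "D", "location": "Jumeirah", "year": 2014, "students": 850},
]

averages = average_students_by_location(schools)

# Text stand-in for the plot: one '#' per 100 students on average.
for location, avg in sorted(averages.items()):
    print(f"{location:10} {'#' * round(avg / 100)} {avg:.0f}")
```

Each line of the function corresponds to one phrase of the plain-English description, which is what makes this style of analysis readable to non-programmers.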

Non-IT folks Learning a Programming Language?

Some would argue that there is a steep learning curve to becoming proficient in R. However, you go about learning R much as you once learned Excel.

In our experience of training business managers and executives from many different industries and backgrounds, most of them non-IT, having a real business need and a concrete business problem makes the journey of learning R easy. You have a dataset and you are stuck trying to analyse it with Excel or BI. You start learning R with that problem in mind, and the focus shifts from learning a new language to solving the problem. You just happen to learn R along the way.

Looking at this from another point of view, almost all data analysis requirements of non-IT managers are satisfied by a handful of functions and packages in R. Learning the full breadth of R’s capabilities is not a key requirement. A 5-day crash course in R is sufficient to get people started on their day-to-day requirements. From there on, it depends on the individual case whether someone wants to develop their skill set further. Data analysis packages like dplyr and ggplot2 make working with R a breeze and a lot of fun, and offer a wide range of capabilities to the business manager looking for quick data analysis.

Also, there is a strong online community of users who are happy to help in case you are stuck. Rarely will you run into problems or have requirements which an online forum hasn’t already answered.

Another point to consider is that R is an open-source statistical language that has been around for a long time. Unlike skills in a particular BI tool, the skills you invest in R travel with you from one job to the next, and you do not depend on whether your new organization has bought licences for a specific product.

Developing the right mind-set to exercise Data-Curiosity

The question is not whether you need to invest time in learning R. The question is how do you utilize the world of opportunities that raw data analysis platforms like R & Python open up for you!

We find that more than learning these languages, the challenge becomes how to exercise and develop the muscles of data-curiosity correctly. What we have witnessed is that once familiar with the different possibilities of data analysis available through R, managers need to be introduced to a standardized way of framing the data analysis problem, correctly applying statistical methods and making statistically valid inferences. With great power comes great responsibility!

Also, I would like to point out that developing a mind-set for satiating data-curiosity does not depend on learning a data analysis language like R. With a new wave of students graduating with strong experience in R, Python, etc. and entering the workforce, it is important for senior managers to know about the capabilities and risks of advanced data analysis instead of blindly believing newly hired data scientists. They need to know the what and why of the steps employed in the analysis, and they need to be able to ask which statistical methods were used to create a model, in order to evaluate the ‘correctness’ of the inferences and insights presented to them.

Our Training Program in Data Analytics with R

We offer a 5-day training program in Data Analytics with R and have successfully trained many senior executives & managers across the GCC. We start from the basics and do not expect participants to have any IT or programming background.

The course is aimed at managers and uses case studies and examples of problems that managers face on a day-to-day basis.

The training program opens up a new world of opportunities that were not previously possible with spreadsheets and BI tools. With the new capabilities R provides, you need to change the way you think about data and data analysis, so our training focuses on getting participants to exercise their data-curiosity and frame data analysis problems with the right mind-set.

 

Check out our course details and see some case studies.

 

Best Wishes,

Nishant Das

B.Eng. – Engineering Physics

Data Scientist at Marketways Arabia & Big Data Analytics Trainer for Ambeone Training Institute

 
