Sunday, May 01, 2016

10 Steps to become a Data Scientist

The field of data science is a huge ocean, just like any other field of science. New technologies, concepts and models are constantly being developed to gain better insights and predictions, so there is a constant need to keep yourself up to date.
Many industry experts share one mantra: the next hottest job in the tech industry, or indeed in any industry, will be in analytics, in the role of Data Scientist.

So, for this constantly evolving field, here are 10 steps that can help you on the path to becoming a successful "Data Scientist".

1. Statistics: Statistics is a fundamental driver of your ability to analyse numbers and data and to draw meaningful insights from them. Any software or application can only give you output based on the statistical method you choose to apply, so without a basic knowledge of statistics, there is no progress in this field.
Pro Tip: Get yourself well trained in basic concepts such as regression, decision trees, p-values, t-tests, hypothesis testing and ANOVA. Knowledge of these alone will give you a strong hand at the table.
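To make t-tests, p-values and hypothesis testing a little more concrete, here is a minimal sketch in Python (standard library only) that compares two small, made-up samples. It computes Welch's t statistic and estimates a two-sided p-value with a permutation test instead of looking it up in the t distribution. The sample values and function names are hypothetical; in day-to-day work you would more likely call R's `t.test` or SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se_a = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    se_b = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    return (statistics.mean(a) - statistics.mean(b)) / (se_a + se_b) ** 0.5


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means.

    Null hypothesis: the group labels are exchangeable, i.e. both samples
    come from the same distribution. Shuffle the pooled values, split them
    into groups of the original sizes, and count how often the shuffled
    difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        left, right = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(left) - statistics.mean(right)) >= observed:
            hits += 1
    # Adding 1 to both counts includes the observed labelling itself
    return (hits + 1) / (n_resamples + 1)


# Hypothetical test scores for two teaching methods
method_a = [23, 25, 21, 27, 24, 26, 22, 25]
method_b = [30, 28, 31, 29, 32, 27, 30, 33]

t = welch_t(method_a, method_b)
p = permutation_p_value(method_a, method_b)
print(f"t = {t:.2f}, p = {p:.4f}")
```

A small p-value here says a difference this large would be very unlikely if both methods produced the same score distribution; it says nothing about *why* they differ.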

2. R Programming: R is a free, open-source language and environment for statistical programming and advanced analytics on your data. With thousands of packages and many built-in functions, it is relatively easy to learn and a powerful tool to use.
Pro Tip: You can't operate in the field of data science without getting to know R. It's the equivalent of a journalist saying he doesn't know how to use MS Word.

3. SQL: Even before you get into the world of Big Data, one language that is common across Big Data database tools is SQL. Whether it's Hadoop (through SQL engines such as Hive), Vertica or Teradata, these tools let you query data with SQL, though each has its own dialect. So knowing SQL gives you almost immediate exposure to these tools.
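The core SQL clauses (SELECT, GROUP BY, ORDER BY) carry over between engines with only minor dialect differences. Here is a minimal sketch that runs a typical aggregate query against SQLite through Python's built-in `sqlite3` module, chosen only because it needs no server. The `sales` table, its data and the `regional_totals` helper are hypothetical.

```python
import sqlite3


def regional_totals(conn):
    """Number of orders and total amount per region, largest total first."""
    query = """
        SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
        FROM sales
        GROUP BY region
        ORDER BY total DESC
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0), ("East", 55.0)],
)

for region, n_orders, total in regional_totals(conn):
    print(region, n_orders, total)
```

The same query text should run largely unchanged on most SQL databases; only the connection code would differ.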

4. Visualization: After all the interesting analysis and insights you have derived, a few charts in Excel and PowerPoint aren't going to impress your client or employer. Presenting your results visually through popular BI tools such as QlikView, Spotfire or Tableau will give you a powerful edge over others.

5. Knowing your domain: Specializing in a particular domain or industry can make you an SME (subject matter expert) and the most sought-after person for analytics in that area. What matters most is understanding what the data means in the context of your industry, whether that is manufacturing data, education data or even supply chain data.

6. Modelling: The most important aspect of data science is building a model that effectively describes your data and predicts future outcomes under various scenarios. Without proper modelling skills, it's difficult to venture into the other aspects of data science.
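As a small illustration of "describe the data, then predict", here is a sketch that fits a straight-line model by ordinary least squares in plain Python and checks its predictions on held-out points. The data and helper names are invented, and a linear model is only one of many model families; in practice you would more likely use R's `lm()` or a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: monthly ad spend (thousands) vs. sales (thousands)
spend = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.9, 17.2]

# Fit on the first six months and hold out the last two to test predictions
train_x, test_x = spend[:6], spend[6:]
train_y, test_y = sales[:6], sales[6:]

slope, intercept = fit_line(train_x, train_y)
predictions = [intercept + slope * x for x in test_x]
print(f"sales = {intercept:.2f} + {slope:.2f} * spend")
print("hold-out MAE:", round(mean_absolute_error(test_y, predictions), 3))
```

Evaluating on points the model never saw is the habit that matters here; a model that only fits its training data well has not yet shown it can predict.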

7. Database: Let's face it. You can't build mathematical models or do key analysis without data. To get the data right and neatly structured, you should learn your way around databases. Joins, including left and other outer joins, should be second nature to you, and from now on the only keys that should matter to you are the primary and foreign keys of a table.
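Here is a small sketch of primary keys, foreign keys and the difference between an inner join and a left (outer) join, again using SQLite via Python's `sqlite3` so it runs without a server. The `customers` and `orders` tables and the `try_insert_order` helper are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders (id, customer_id, amount)
        VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 3, 12.5);
""")

# Inner join: only customers that have at least one matching order
INNER_JOIN = """
    SELECT c.name, o.amount
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
"""

# Left join: every customer, with NULLs where no order matches
LEFT_JOIN = """
    SELECT c.name, COUNT(o.id) AS n_orders
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
"""


def try_insert_order(conn, order_id, customer_id, amount):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO orders (id, customer_id, amount) VALUES (?, ?, ?)",
            (order_id, customer_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(conn.execute(INNER_JOIN).fetchall())
print(conn.execute(LEFT_JOIN).fetchall())
```

The inner join silently drops Ben, who has no orders; the left join keeps every customer and reports a count of 0 (`COUNT(o.id)` ignores NULLs). The foreign key, meanwhile, stops an order from pointing at a customer that doesn't exist.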

8. Online Courses: The internet is full of data science courses. Some of them include Coursera (Data Science Specialization), DataCamp and Udacity (Hadoop and Inferential Statistics). These go a long way in helping you learn the fundamentals of data science.

9. Pet Projects: Take up one project that interests you. It could be predicting a cricket match score from the players' past records, analysing how the variables in a particular dataset interact to produce an outcome, or even recommending movies based on similar movies you like.
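For the movie-recommendation idea, one simple starting point is content-based filtering: describe each movie by a feature vector and suggest unseen movies whose vectors are closest, by cosine similarity, to the ones you liked. This is a sketch with a made-up catalogue and genre flags; real recommenders typically also use ratings data (collaborative filtering) and much richer features.

```python
import math

# Hypothetical catalogue: genre flags [action, comedy, drama, sci-fi]
CATALOG = {
    "Space Chase": [1, 0, 0, 1],
    "Laugh Riot": [0, 1, 0, 0],
    "Star Siege": [1, 0, 0, 1],
    "Quiet Harbor": [0, 0, 1, 0],
    "Robot Romance": [0, 1, 1, 1],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(liked, catalog, k=2):
    """Rank unseen titles by their best similarity to any liked title."""
    scores = {
        title: max(cosine_similarity(vec, catalog[l]) for l in liked)
        for title, vec in catalog.items()
        if title not in liked
    }
    # Highest score first; ties broken alphabetically for a stable order
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


print(recommend(["Space Chase"], CATALOG))
```

Cosine similarity compares the direction of two vectors rather than their length, so a movie tagged with the same mix of genres scores highly even if it carries more tags overall.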

10. Practice: The most fundamental truth in learning any subject is to work and practise hard. Work on it constantly. Make mistakes and learn from them. Debug every mistake you make, and you'll understand why you made it and be far less likely to repeat it. Sites like GitHub, among others, have plenty of people willing to help you out with your statistical problems.