Daily Dose of Data Science – Day 10 – Quick tips for preparing for Data Science interviews
Today marks the 10th day of the Daily Dose of Data Science series! Over the last 10 days, you have been getting tips, ideas and recommendations about various aspects of the Data Science process, aimed at Data Science and Machine Learning enthusiasts of all levels. Drawing on my experience from the field, today I want to share some quick tips and suggestions for preparing for Data Science roles at any organization.
Since day 1 of Daily Dose of Data Science, I have received many requests to include interview tips and suggestions, especially for Data Science and Machine Learning enthusiasts who want to transition into Data Science roles or who are looking for a job change. Now, considering my experience from the field, and after conducting over 100 interviews for Data Science and Machine Learning roles, I thought of sharing some of my suggestions as a kind of checklist of topics and concepts you would need to know to successfully crack these interviews.
The job requirements and skill sets specified for Data Science roles on most recruiting platforms are very fuzzy, and can range anywhere from a Data Engineer role to an AI Researcher role. Even so, I will try to highlight the major areas and topics you would need to know to successfully crack a Data Science interview at any organization.
The key areas and topics that most interviewers ask about during Data Science job interviews are as follows:
- Programming Proficiency – There is a popular myth that you do not need to learn programming to be a data scientist, but almost all FAANG companies ask programming questions in their initial rounds. Questions about Object Oriented Programming, Data Structures, Time Complexity of Algorithms and popular Computer Science algorithms related to Dynamic Programming, Divide and Conquer, and Optimization are asked very frequently. One major reason is that, in most cases, people who can code efficiently are given more preference, as they can express their solution or algorithm in the most effective and efficient way. As Python is the most popular language for Data Scientists, expect coding problems and programming questions in Python in your initial rounds. Other languages like R, MATLAB and Julia come right after Python, but whichever language you use, brushing up your programming proficiency is needed. After all, data scientists are problem solvers, and programming is one more technique for problem solving, so programming proficiency matters a lot!
- Dealing with Relational Databases – Before data science was glorified, it was the job of our friendly neighborhood Data Engineers and DBMS Experts to deal with structured data analysis, SQL queries and stored procedures. Now, Data Scientists are also expected to have exposure to Data Engineering, Extract-Transform-Load (ETL) workloads and Business Analysis in order to extract insights from traditional data systems. So, knowing how to write SQL queries, especially table joins and aggregate functions, and how to perform ETL on data is expected of Data Scientists. A data scientist may never be asked to write very complex stored procedures, but at least the foundational skills are required.
- Maths and Statistics – I often get a very common question – “How much maths do I need to know to become a Data Scientist?”. The correct answer is “It depends!”. If your role and responsibility is to come up with novel methods and algorithms to solve a problem, you should definitely have a strong understanding of mathematics and statistics. AI or ML Research Scientists, in particular, are often expected to have a deeper understanding of maths and statistics. Otherwise, a foundational and conceptual understanding of the following topics is very important: Probability and Distributions (particularly Joint Probability and Conditional Probability), Linear Algebra (particularly Matrix operations), Information Theory, Distance Metrics, Coordinate Systems, Polar Coordinate Systems, Function Optimization (Maxima/Minima), Hypothesis testing, and other statistical methods such as Mean, Median and Mode calculation.
- Data Visualization and Reporting – Intuition around data visualization is very important, as it helps you to perform a thorough Exploratory Data Analysis. Reporting and storytelling are another important aspect of data science. So, applied proficiency in Matplotlib, Seaborn and Plotly can help, and knowing other data visualization and reporting platforms like Power BI and Tableau gives any candidate an important edge.
- Machine Learning – Yes, knowing Machine Learning is very important in order to become a data scientist! High proficiency in standard supervised and unsupervised learning is a basic expectation for most organizations. So, try to solve at least one end-to-end Machine Learning problem each for Regression, Classification and Clustering before appearing for the interview. Try to focus on Exploratory Data Analysis (EDA), Feature Engineering, and modeling algorithms like Decision Trees, Regression Trees, Linear Regression, Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbors (KNN), K-Means Clustering, DBSCAN and Multi-Layer Perceptron (MLP) Neural Networks, and even Boosting or Bagging algorithms. Focus on hyperparameter tuning, overfitting and underfitting, model evaluation and cross validation. Considering the principles of Data Centric AI, interviewers nowadays even ask about concepts related to Data Drift, Data Quality Inspection and Model Interpretability.
- Deep Learning – “Is knowing Deep Learning mandatory to become a data scientist?” – well no, but it does give you an added advantage. “Is knowing only Deep Learning sufficient to become a data scientist?” – not at all. For most use cases, you will find that applying deep learning is not possible at all! So, as a data scientist, you need to know when to apply deep learning and when not to go for it. Deep Learning will never magically solve your problems; simple statistical methods are often much simpler to use and tune, and are more explainable. But yes, deep learning can help you a lot with unstructured data analysis, like dealing with images and text. If the data quality is good, deep learning is very effective. So, it is not mandatory, but knowing about Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transfer Learning, Data Augmentation, hyperparameter tuning intuitions, Transformers and attention networks, and Activation functions would give you an edge over other candidates with limited exposure to deep learning.
- Model Deployment, Productionization and MLOps – DevOps, code deployment, CI/CD and Cloud services are very common terms in the Software Engineering world. But now, since more and more organizations are trying to leverage scalable and sustainable AI and ML systems, Data Science and ML job roles also expect candidates to know about deploying ML models in production, either through cloud platforms like Azure, AWS and GCP or on edge devices like NVIDIA platforms or mobile devices running Android or iOS. So, registering datasets, model versioning, model monitoring through A/B tests and publishing models through CI/CD become necessary skills, particularly for ML engineers. Also, knowledge of cloud-based data stores and data lakes, Web API deployment and usage, and other cloud-based services from Azure, AWS and GCP gives a lot of added advantage.
- Data and Problem Domain Knowledge – Common sense and domain knowledge are often the secret sauce for pulling off a successful data science and machine learning project! Problems related to time series, computer vision and natural language, in particular, are quite different from standard structured data problems. And business domain knowledge is extremely crucial to understand and justify why various algorithms or approaches work or fail.
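To make one of the points above a little more concrete, here is a minimal sketch of the kind of SQL the relational databases item refers to: a table join combined with aggregate functions. It uses Python's built-in `sqlite3` module so that it runs without any setup. The `customers` and `orders` tables, their columns and their rows are all made up for illustration; they are assumptions of this sketch and not taken from any real interview question.

```python
import sqlite3

# Hypothetical in-memory tables, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 15.0);
""")

# LEFT JOIN keeps customers that have no orders at all;
# COUNT and SUM are the aggregate functions, computed per customer.
query = """
    SELECT c.name,
           COUNT(o.id) AS n_orders,
           COALESCE(SUM(o.amount), 0.0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total_spent DESC, c.name
"""
rows = conn.execute(query).fetchall()
for name, n_orders, total_spent in rows:
    print(name, n_orders, total_spent)

conn.close()
```

A natural follow-up to practice on is the difference between `LEFT JOIN` and `INNER JOIN` here: with an inner join, the customer without any orders would drop out of the result entirely.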
After going through all of these, are you thinking that each point is a separate role in the industry? Well, actually not. In startups and smaller organizations, Data Science roles expect you to be in a sweet spot between all these skill sets. Larger organizations and FAANGs do have well-defined and distinct roles and responsibilities, yet exposure to all these areas is becoming a necessity rather than a luxury. Mostly, entry-level professionals are expected to be proficient in at least 2-3 of these areas, senior-level professionals are expected to know at least 4-5 of them, and principal-level professionals are expected to know all of them, along with good domain knowledge! Thus, being a good data scientist is never easy. But in spite of all the challenges, being a data scientist is extremely rewarding! That is not because of the hefty pay packages, but because of the field and domain experience you gain: it makes you feel like Sherlock Holmes. You start observing natural phenomena through different lenses, noticing patterns in your day-to-day activities, and your powers of deduction improve significantly!
So, these are some of the top tips and recommendations that I wanted to share from my own field experience! I can assure you that if you want to crack interviews for data science roles, mastering the areas mentioned above will help you a lot! That’s it for today’s dose! Visit again for another daily dose of data science, and please feel free to like, share, comment on and subscribe to my posts if you find them helpful!