Q- How do you handle missing or corrupted data in a dataset?
You could find missing/corrupted data in a dataset and either drop those
rows or columns, or decide to replace them with another value. In
Pandas, there are two very useful methods: isnull(), which flags the
missing values in each column, and dropna(), which drops the rows or
columns that contain them. If you want to fill the invalid values with a
placeholder value (for example, 0), you could use the fillna() method.
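As a minimal sketch of the three methods mentioned above, assuming a small made-up DataFrame (the column names and values are purely illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: NaN marks the missing entries.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "income": [50000, 62000, np.nan, 81000],
})

# isnull() returns a boolean mask; summing it counts missing values per column.
missing_per_column = df.isnull().sum()

# dropna() removes every row with a missing value (axis=1 would drop columns instead).
complete_rows = df.dropna()

# fillna() replaces missing values with a placeholder, here 0.
filled = df.fillna(0)
```

Whether dropping or filling is the better choice depends on how much data is missing and why, so it is worth saying which one you would pick and the reason.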
Q- Do you have experience with Spark or big data tools for machine learning?
You’ll want to get familiar with the meaning of big data for different
companies and the different tools they’ll want. Spark is the big data
tool most in demand now, able to handle immense datasets with speed. Be
honest if you don’t have experience with the tools demanded, but also
take a look at job descriptions and see what tools pop up: you’ll want
to invest in familiarizing yourself with them.
Q- Pick an algorithm. Write the pseudo-code for a parallel implementation.
This kind of question demonstrates your ability to think in parallelism and
how you could handle concurrency in programming implementations dealing
with big data. Take a look at pseudocode frameworks such as Peril-L and
visualization tools such as Web Sequence Diagrams to help you
demonstrate your ability to write code that reflects parallelism.
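One way to practice is to take a simple algorithm and split it into independent pieces. The sketch below is only an illustration: the algorithm (a sum of squares) and the names chunk, partial_sum_of_squares and parallel_sum_of_squares are assumptions made for the example. It follows a map-then-reduce shape, where each worker handles its own slice and the partial results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, n_chunks):
    """Split data into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(values):
    """Map step: each worker reduces its own slice independently."""
    return sum(v * v for v in values)


def parallel_sum_of_squares(data, n_workers=4):
    """Reduce step: combine the per-worker partial results."""
    chunks = chunk(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


total = parallel_sum_of_squares(list(range(10)))
```

Threads keep the example easy to run, but in standard CPython they will not speed up CPU-bound work like this; a process pool or a cluster framework such as Spark is the usual choice when the speedup matters. The important part for an interview is the structure: no shared state between workers, and a single combining step.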
Q- What are some differences between a linked list and an array?
An array is an ordered collection of objects. A linked list is a series
of objects with pointers that direct how to process them sequentially.
An array assumes that every element has the same size, unlike the linked
list. A linked list can more easily grow organically: an array has to be
pre-defined or re-defined for organic growth. Rearranging a linked list
only involves changing which pointers point where; rearranging an array
means moving the elements themselves, and growing it may require copying
everything into a larger block of memory.
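To make the contrast concrete, here is a minimal sketch of a singly linked list next to a Python list, which is a dynamic array under the hood. The Node and LinkedList classes are illustrative, not a standard library API.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        """O(1): only the head pointer changes."""
        self.head = Node(value, self.head)

    def insert_after(self, node, value):
        """O(1) given the node: one pointer is relinked, no elements move."""
        node.next = Node(value, node.next)

    def __iter__(self):
        """Access is sequential: follow the pointers from the head."""
        current = self.head
        while current is not None:
            yield current.value
            current = current.next


linked = LinkedList()
for v in (3, 2, 1):
    linked.push_front(v)              # list is now 1 -> 2 -> 3
linked.insert_after(linked.head, 99)  # 1 -> 99 -> 2 -> 3

array = [1, 2, 3]
array.insert(1, 99)  # shifts 2 and 3 one slot to the right
third = array[2]     # O(1) random access by index
```

The trade-off to mention is that the array gives constant-time access by index, while the linked list gives cheap insertion once you already hold a reference to the right node.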
Q- Describe a hash table.
A hash table is a data structure that implements an associative array.
Each key is mapped to its value through the use of a hash function,
which decides the slot where the value is stored. Hash tables are often
used for tasks such as database indexing.
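If you are asked to go one level deeper, a small implementation helps. The sketch below assumes separate chaining, where each bucket holds a list of key/value pairs, as the collision strategy. That is one common design among several, and the HashTable class name and its methods are made up for the example.

```python
class HashTable:
    def __init__(self, n_buckets=8):
        # Each bucket is a list of (key, value) pairs.
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash function picks the bucket; modulo keeps the index in range.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # new key, or a collision: chain it

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


table = HashTable()
table.put("user_42", "alice")
table.put("user_42", "alice_updated")
name = table.get("user_42")
```

A production table would also resize as it fills up; that part is left out here to keep the idea visible.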
Q- Which data visualization libraries do you use? What are your thoughts on the best data visualization tools?
What’s important here is to define your views on how to properly visualize
data and your personal preferences when it comes to tools. Popular tools
include R’s ggplot, Python’s seaborn and matplotlib, and tools such as
Plot.ly and Tableau.
Machine Learning Interview Questions: Company/Industry Specific
Q- How would you implement a recommendation system for our company’s users?
A lot of machine learning interview questions of this type will involve
applying machine learning models to a company’s problems.
You’ll have to research the company and its industry in-depth,
especially the revenue drivers the company has, and the types of users
the company takes on in the context of the industry it’s in.
Q- How can we use your machine learning skills to generate revenue?
This is a tricky question. The ideal answer would demonstrate knowledge of
what drives the business and how your skills could relate. For example,
if you were interviewing for music-streaming startup Spotify, you could
remark that your skills at developing a better recommendation model
would increase user retention, which would then increase revenue in the
long run.
The startup metrics Slideshare linked above
will help you understand exactly what performance indicators are
important for startups and tech companies as they think about revenue
and growth.
Q- What do you think of our current data process?
This kind of question requires you to listen carefully and impart feedback
in a manner that is constructive and insightful. Your interviewer is
trying to gauge if you’d be a valuable member of their team and whether
you grasp the nuances of why certain things are set the way they are in
the company’s data process based on company- or industry-specific
conditions. They’re trying to see if you can be an intellectual peer.
Act accordingly.
Machine Learning Interview Questions: General Machine Learning Interest
Q- What are the last machine learning papers you’ve read?
Keeping up with the latest scientific literature on machine learning is a
must if you want to demonstrate interest in a machine learning position.
This overview of deep learning in Nature by the pioneers of deep learning
themselves (from Hinton to Bengio to LeCun) can be a good reference
paper and an overview of what’s happening in deep learning, and it is
the kind of paper you might want to cite.
Q- Do you have research experience in machine learning?
Related to the last point, most organizations hiring for machine learning
positions will look for your formal experience in the field. Research
papers, co-authored or supervised by leaders in the field, can make the
difference between you being hired and not. Make sure you have a summary
of your research experience and papers ready — and an explanation for
your background and lack of formal research experience if you don’t.
Q- What are your favorite use cases of machine learning models?
The Quora thread above contains some examples, such as decision trees that
categorize people into different tiers of intelligence based on IQ
scores. Make sure that you have a few examples in mind and describe what
resonated with you. It’s important that you demonstrate an interest in
how machine learning is implemented.
Q- How would you approach the “Netflix Prize” competition?
The Netflix Prize was a famed competition where Netflix offered $1,000,000
for a better collaborative filtering algorithm. The winning team,
BellKor’s Pragmatic Chaos, achieved a 10% improvement over Netflix’s
existing algorithm and used an ensemble of different methods to win.
Some familiarity with the case and its solution will help demonstrate
you’ve paid attention to machine learning for a while.
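If the conversation turns to what collaborative filtering actually does, a toy version is handy. The sketch below is a plain user-based neighborhood method with cosine similarity. It is not the winning solution, which blended many models, and the users, titles and ratings are invented for the example.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale: user -> {title: rating}.
ratings = {
    "alice": {"Heat": 5, "Alien": 4, "Up": 1},
    "bob": {"Heat": 5, "Alien": 5, "Up": 1, "Jaws": 5},
    "carol": {"Heat": 1, "Alien": 2, "Up": 5, "Jaws": 1},
}


def cosine_similarity(a, b):
    """Similarity of two users, computed over the titles both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[t] * b[t] for t in common)
    norm_a = sqrt(sum(a[t] ** 2 for t in common))
    norm_b = sqrt(sum(b[t] ** 2 for t in common))
    return dot / (norm_a * norm_b)


def predict(user, title, ratings):
    """Similarity-weighted average of other users' ratings for the title."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or title not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[title]
        den += sim
    return num / den if den else None


prediction = predict("alice", "Jaws", ratings)
```

Because alice’s ratings look much more like bob’s than carol’s, the prediction is pulled toward bob’s rating. An ensemble in the spirit of the winning entry would average predictions like this one with those of other models.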
Q- Where do you usually source datasets?
Machine learning interview questions like these try to get at the heart of your
machine learning interest. Somebody who is truly passionate about
machine learning will have gone off and done side projects on their own,
and have a good idea of what great datasets are out there. If you’re
missing any, check out Quandl for economic and financial data, and
Kaggle’s Datasets collection for another great list.
Q- How do you think Google is training data for self-driving cars?
Machine learning interview questions like this one really test your
knowledge of different machine learning methods, and your inventiveness
if you don’t know the answer. Google is currently using reCAPTCHA to
source labelled data on storefronts and traffic signs. They are also
building on training data collected by Sebastian Thrun at GoogleX, some
of which was obtained by his grad students driving buggies on desert
dunes!
Q- How would you simulate the approach AlphaGo took to beat Lee Sedol at Go?
AlphaGo beating Lee Sedol, one of the best human players at Go, in a
best-of-five series was a truly seminal event in the history of machine
learning and
deep learning. The Nature paper above describes how this was
accomplished with “Monte-Carlo tree search with deep neural networks
that have been trained by supervised learning, from human expert games,
and by reinforcement learning from games of self-play.”
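To show you understand the search half of that sentence, you could sketch plain Monte-Carlo tree search on a toy game. The example below is only that: it uses random playouts where AlphaGo used trained neural networks, and the game (a single pile of stones, take 1 to 3 per turn, taking the last stone wins) and every name in the code are assumptions made for illustration.

```python
import math
import random

MOVES = (1, 2, 3)  # hypothetical toy game: take 1-3 stones; taking the last stone wins


class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones  # stones left when it is the next player's turn
        self.parent = parent
        self.move = move      # the move that led to this node
        self.children = []
        self.untried = [m for m in MOVES if m <= stones]
        self.visits = 0
        self.wins = 0.0       # wins for the player who moved into this node


def ucb1(child, parent_visits, c=1.4):
    """Balance exploitation (win rate) against exploration (rarely tried moves)."""
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(stones, rng):
    """Random playout; returns True if the player to move at the start wins."""
    turn = 0  # 0 = the player to move at the start, 1 = the opponent
    while stones > 0:
        stones -= rng.choice([m for m in MOVES if m <= stones])
        if stones == 0:
            return turn == 0  # whoever took the last stone wins
        turn ^= 1
    return False  # started with no stones: the previous mover already won


def mcts(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend by UCB1 while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one unexplored child, if any remain.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the selected node.
        mover_won = not rollout(node.stones, rng)
        # 4. Backpropagation: the winner's perspective flips at each level.
        while node is not None:
            node.visits += 1
            node.wins += mover_won
            mover_won = not mover_won
            node = node.parent
    # The most-visited move at the root is the recommendation.
    return max(root.children, key=lambda ch: ch.visits).move, root


best_move, root = mcts(stones=10)
```

In the approach the paper describes, the random playout in step 3 and the uniform exploration in step 1 are where the neural networks come in: learned move probabilities guide which branches to try, and a learned position evaluation stands in for much of the playout.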