Data scientists are big data wranglers. They take an
enormous mass of messy data points (unstructured and structured) and use their
formidable skills in math, statistics and programming to clean, massage and
organize them. Then they apply all their analytic powers (industry knowledge,
contextual understanding and skepticism of existing assumptions) to uncover
hidden solutions to business challenges.
Data Scientist Responsibilities
“A data scientist is
someone who is better at statistics than any software engineer and better at
software engineering than any statistician.”
On any given day, a data scientist may be required to:
- Conduct undirected research and frame
open-ended industry questions
- Extract huge volumes of data from multiple
internal and external sources
- Employ sophisticated analytics programs,
machine learning and statistical methods to prepare data for use in
predictive and prescriptive modeling
- Thoroughly clean and prune data to discard
irrelevant information
- Explore and examine data from a variety of
angles to determine hidden weaknesses, trends and/or opportunities
- Devise data-driven solutions to the most
pressing challenges
- Invent new algorithms to solve problems and
build new tools to automate work
- Communicate predictions and findings to
management and IT departments through effective data visualizations and
reports
- Recommend cost-effective changes to existing
procedures and strategies
Every company will have a different take on job tasks.
Some treat their data scientists as glorified data analysts or combine their duties with those of data engineers; others need top-level
analytics experts skilled in advanced machine learning and data visualization.
As data scientists achieve new levels of experience or
change jobs, their responsibilities invariably change. For example, a person
working alone in a mid-size company may spend a good portion of the day in data
cleaning and munging. A high-level employee in a business that offers
data-based services may be asked to structure big data projects or create new
products.
Data Scientist Salaries
“Data scientist” is one of the hottest job titles in the
IT field, with starting salaries to match. It should come as no surprise
that Silicon Valley is the new Jerusalem: according to a 2014 Burtch Works study, 36% of data
scientists work on the West Coast. Entry-level professionals in that region earn
a median base salary of $100,000, which is 22% more than their Northeast peers.
Data Scientist
Glassdoor
Average Salary (2015): $118,709 per year
Minimum: $76,000
Maximum: $148,000
PayScale
Median Salary (2015): $93,991 per year
Total Pay Range: $63,524 – $138,123
Senior Data Scientist
PayScale
Median Salary (2015): $124,273 per year
Total Pay Range: $89,801 – $179,445
Data Scientist Qualifications
Broadly speaking, you have 3 education options if you’re
considering a career as a data scientist:
- Degrees and graduate certificates provide structure,
internships, networking and recognized academic qualifications for your
résumé. They will also cost you significant time and money.
- MOOCs and self-guided learning courses are free/cheap, short and
targeted. They allow you to complete projects on your own time, but they
require you to structure your own academic path.
- Bootcamps are intense and faster to
complete than traditional degrees. They may be taught by practicing data
scientists, but they won’t give you degree initials after your name.
Academic qualifications may be more important than you
imagine. As Burtch Works notes, “it’s incredibly rare for
someone without an advanced quantitative degree to have the technical skills
necessary to be a data scientist.”
In its data science salary report, Burtch Works
determined that 88% of data scientists have a master’s degree and 46% have a
PhD. The majority of these degrees are in rigorous quantitative, technical or
scientific subjects, including math and statistics (32%), computer science
(19%) and engineering (16%).
That said, companies are desperate for
candidates with real-world skills, so your technical know-how may trump preferred
degree requirements.
What Kind of Skills Will I Need?
Technical Skills
- Math (e.g. linear algebra, calculus and
probability)
- Statistics (e.g. hypothesis testing and summary
statistics)
- Machine learning tools and techniques (e.g.
k-nearest neighbors, random forests and other ensemble methods)
- Software engineering skills (e.g. distributed
computing, algorithms and data structures)
- Data mining
- Data cleaning and munging
- Data visualization (e.g. ggplot and d3.js) and
reporting techniques
- Unstructured data techniques
- R and/or SAS languages
- SQL databases and database querying languages
- Python (most common), C/C++, Java, Perl
- Big data platforms like Hadoop, Hive & Pig
- Cloud tools like Amazon S3
This list is always subject to change. As Anmol Rajpurohit suggests, “generic
programming skills are a lot more important than being the expert of any
particular programming language.”
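To make a few of the technical skills above more concrete, here is a minimal, standard-library-only Python sketch of three of them: cleaning messy records, computing summary statistics and classifying a new point with k-nearest neighbors. The records, their layout and the helper names (`clean`, `summarize`, `knn_predict`) are hypothetical and exist only for illustration; it is a toy under those assumptions, not a recommended production workflow.

```python
import math
import statistics
from collections import Counter

# Hypothetical raw records: (feature_1, feature_2, label). Some are messy.
RAW = [
    ("1.0", "1.1", "A"),
    ("1.2", "0.9", "A"),
    ("", "1.0", "A"),      # missing value: dropped during cleaning
    ("5.0", "5.2", "B"),
    ("5.1", "4.8", "B"),
    ("n/a", "5.0", "B"),   # unparseable value: dropped during cleaning
    ("4.9", "5.1", "B"),
]


def clean(rows):
    """Keep only rows whose two features parse as floats."""
    cleaned = []
    for x, y, label in rows:
        try:
            cleaned.append(((float(x), float(y)), label))
        except ValueError:
            continue  # discard records that fail to parse
    return cleaned


def summarize(values):
    """Basic summary statistics for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points,
    using Euclidean distance."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = clean(RAW)
    print(len(data), "clean rows")
    print(summarize([point[0] for point, _ in data]))
    print(knn_predict(data, (1.1, 1.0)))  # query near the "A" cluster
    print(knn_predict(data, (5.0, 5.0)))  # query near the "B" cluster
```

With `k=3`, the query at (1.1, 1.0) is labeled "A" because two of its three nearest neighbors are "A" points. Everything stays in plain Python so each step is visible; in practice, libraries built for this work would usually replace these hand-written helpers.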
Business Skills
- Analytic Problem-Solving: Approaching high-level
challenges with a clear eye on what is important; employing the right
approach/methods to make the maximum use of time and human resources.
- Effective Communication: Detailing your techniques
and discoveries to technical and non-technical audiences in a language
they can understand.
- Intellectual Curiosity: Exploring new territories
and finding creative and unusual ways to solve problems.
- Industry Knowledge: Understanding the way your chosen
industry functions and how data are collected, analyzed and utilized.
Note: DataCamp’s infographic “How to Become a Data Scientist” lays out a handy learning path. KDnuggets.com is also a great source of information on big data, machine
learning and data science topics.
What About Certifications?
To avoid wasting time on poor-quality certifications, ask
your mentors for advice, check the requirements in job listings and consult articles
like Tom’s IT Pro “Best Of” certification lists.
Here are a few that focus on useful skills:
Certified Analytics Professional (CAP)
CAP was created in 2013 by the Institute for Operations Research and the Management
Sciences (INFORMS) and is targeted
towards data scientists. During the certification exam, candidates must
demonstrate their expertise in the end-to-end analytics process, including
the framing of business and analytics problems, data and methodology,
model building, deployment and life cycle management.
Requirements:
- 5+ years of analytics work-related experience
for BA/BS holders in a related area
- 3+ years of analytics work-related experience
for MA/MS (or higher) holders in a related area
- 7+ years of analytics work-related experience
for BA/BS (or higher) holders in an unrelated area
- Verification of soft skills/provision of
business value by employer
- Agreement to adhere to Code of Ethics
Cloudera Certified Professional: Data Scientist (CCP:DS)
Pitched at the elite level, the CCP:DS is aimed at
data scientists who can demonstrate advanced skills in working with big
data. Candidates are drilled in three exams (Descriptive and Inferential
Statistics, Unsupervised Machine Learning and Supervised Machine Learning) and
must prove their chops by designing and developing a production-ready data
science solution under real-world conditions.
Related Cloudera certifications include:
EMC: Data Science Associate (EMCDSA)
The EMCDSA certification tests your ability to apply
common techniques and tools required for big data analytics. Candidates are
judged on their technical expertise (e.g. employing open source tools such as
R, Hadoop and Postgres) and their business acumen (e.g. telling a
compelling story with the data to drive business action).
Once you’ve passed the EMCDSA, you can consider the Advanced Analytics Specialty, which develops
new skills in areas such as Hadoop (including Pig, Hive and HBase), social
network analysis, natural language processing, data visualization methods and
more.
SAS Certified Predictive Modeler using SAS Enterprise Miner 7
This certification is designed for SAS Enterprise Miner
users who perform predictive analytics. Candidates need a deep, practical
understanding of the predictive modeling functionality available in SAS
Enterprise Miner 7 before taking the performance-based exam, which
covers topics such as data preparation, predictive models, model assessment,
and scoring and implementation.
Related SAS certifications include:
- Statistical Business Analyst Using SAS 9: Regression
and Modeling
- Business Intelligence Content Developer for SAS 9
Jobs Similar to Data Scientist
Some data scientists get their start as entry-level data analysts, extracting structured data from
MySQL databases or CRM systems, developing basic visualizations or analyzing
A/B test results. These jobs aren’t usually that challenging.
However, once you have your technical skills in order,
you have plenty of options. If you’d like to push beyond your analytical role,
you could think about building/engineering/architecture jobs such as:
- Data/Big Data Engineer
- Data/Big Data Architect
- Hadoop Developer
Data Scientist Job Outlook
In an oft-cited 2011 big data study, McKinsey reported that by
2018 the U.S. could face a shortage of 140,000 to 190,000 “people with deep
analytic skills” and 1.5 million “managers and analysts with the know-how to
use the analysis of big data to make effective decisions.”
The ensuing panic has led to high demand for data
scientists. Companies of every size and industry – from Google, LinkedIn and
Amazon to the humble retail store – are looking for experts to help them
wrestle big data into submission. Starting salaries are astronomical.
The bubble is bound to burst, of course. In a 2014 Mashable article, Roy Lowrance, the
managing director of New York University’s Center for Data Science program, is
quoted as saying “anything that gets hot like this can only cool off.” But even
as demand for data engineers surges, job postings for big data experts are expected to
remain high.
There are also some indications that the roles of data
scientists and business analysts are beginning to merge. In certain companies, “new look”
data scientists may find themselves responsible for financial planning, ROI
assessment, budgets and a host of other duties related to the management of an
organization.
Professional Organizations for Data Scientists
- Data Science Association
- International Institute for Analytics (IIA)
- International Machine Learning Society (IMLS)
- Institute for Operations Research and the Management
Sciences (INFORMS)
- ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD)