DATA SCIENCE
Data science is a multidisciplinary blend of data inference,
algorithm development, and technology in order to solve analytically
complex problems.
At the core is data: troves of raw information, streaming in and stored
in enterprise data warehouses. There is much to learn by mining it, and there are
advanced capabilities we can build with it. Data science is ultimately about using this
data in creative ways to generate business value:
Data science – discovery of data insight
This aspect of data science is all about uncovering findings from data.
It means diving in at a granular level to mine and understand complex behaviors,
trends, and inferences, and surfacing hidden insight that can help companies
make smarter business decisions. For example:
· Netflix mines movie viewing patterns to understand what drives user
interest, and uses that to decide which Netflix original series to
produce.
· Target identifies the major customer segments within its base and
the unique shopping behaviors within those segments, which helps guide
messaging to different market audiences.
· Procter & Gamble uses time series models to understand future demand
more clearly, which helps it plan production levels more efficiently.
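To give a feel for the time series models mentioned in the Procter & Gamble example, here is a minimal sketch of simple exponential smoothing, one of the most basic forecasting methods. It is not P&G's actual approach; the function name `exponential_smoothing_forecast`, the smoothing factor `alpha`, and the demand figures are all hypothetical.

```python
def exponential_smoothing_forecast(history: list[float], alpha: float = 0.5) -> float:
    """Return a one-step-ahead forecast using simple exponential smoothing.

    Each new level is a weighted average of the latest observation and the
    previous level: level = alpha * y + (1 - alpha) * level.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]  # start the level at the first observation
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level  # flat forecast: the estimate for the next period


# Hypothetical monthly demand figures (units), for illustration only.
monthly_demand = [120.0, 130.0, 125.0, 140.0, 150.0]
print(exponential_smoothing_forecast(monthly_demand, alpha=0.5))
```

With `alpha = 0.5` the sketch returns 141.25 for the sample series. A larger `alpha` weights recent months more heavily; real demand models usually also account for trend and seasonality.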
How do data scientists mine insights? It starts with data
exploration. When given a challenging question, data scientists become
detectives. They investigate leads and try to understand patterns or
characteristics within the data. This requires a big dose of analytical
creativity.
Then, as needed, data scientists may apply quantitative techniques to go
a level deeper, e.g., inferential models, segmentation analysis, time series
forecasting, synthetic control experiments, and so on. The intent is to
scientifically piece together a forensic view of what the data is really
saying.
This data-driven insight is central to providing strategic guidance. In
this sense, data scientists act as consultants, guiding business stakeholders
on how to act on findings.
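Segmentation analysis, one of the techniques listed above, is often done with clustering. As an illustrative sketch only, the plain-Python k-means below groups customers described by two hypothetical features (visits per month and average spend). The data, the function name `kmeans`, and the choice of the first `k` points as starting centroids are assumptions; in practice a library such as scikit-learn, scaled features, and a smarter initialization such as k-means++ would be more typical.

```python
import math


def kmeans(points: list[tuple[float, float]], k: int, iterations: int = 20) -> list[int]:
    """Assign each point to one of k clusters using Lloyd's algorithm.

    Centroids start at the first k points (simple and deterministic).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                xs = [p[0] for p in members]
                ys = [p[1] for p in members]
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels


# Hypothetical customers: (visits per month, average spend).
customers = [(1.0, 10.0), (20.0, 200.0), (1.5, 12.0),
             (22.0, 210.0), (2.0, 11.0), (19.0, 190.0)]
segments = kmeans(customers, k=2)
print(segments)
```

Here the sketch separates occasional low spenders from frequent high spenders; with real data, choosing `k` and interpreting each segment is where the analytical work lies.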
Data science – development of data product
A "data product" is a technical asset that (1) uses data as input and
(2) processes that data to return algorithmically generated results. The
classic example of a data product is a recommendation engine, which ingests
user data and makes personalized recommendations based on that data. Here are
some examples of data products:
· Amazon's recommendation engines suggest items for you to buy, determined
by their algorithms. Netflix recommends movies to you. Spotify recommends music
to you.
· Gmail's spam filter is a data product: an algorithm behind the scenes
processes incoming mail and determines whether a message is junk.
· Computer vision used for self-driving cars is also a data product:
machine learning algorithms are able to recognize traffic lights, other cars on
the road, pedestrians, etc.
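To make the "ingests user data, returns personalized recommendations" pattern concrete, here is a minimal item-based collaborative filtering sketch in plain Python. The ratings, user names, item names, and function names are hypothetical, and production engines such as those at Amazon, Netflix, or Spotify are far more elaborate; this only shows the basic idea of scoring unseen items by their similarity to items a user has already rated.

```python
import math

# Hypothetical user -> {item: rating} data, for illustration only.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "D": 2},
    "carol": {"B": 1, "C": 5, "D": 4},
}


def item_vectors(data):
    """Pivot user ratings into item -> {user: rating} vectors."""
    items = {}
    for user, user_ratings in data.items():
        for item, r in user_ratings.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(data, user, top_n=2):
    """Rank items the user has not rated by a similarity-weighted average of their ratings."""
    items = item_vectors(data)
    seen = data[user]
    scores = {}
    for candidate, cand_vec in items.items():
        if candidate in seen:
            continue  # only recommend items the user has not rated yet
        sims = [(cosine(cand_vec, items[i]), r) for i, r in seen.items()]
        total_sim = sum(s for s, _ in sims)
        if total_sim > 0:
            scores[candidate] = sum(s * r for s, r in sims) / total_sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend(ratings, "alice"))
```

For `alice`, the only item she has not rated is `D`, so the sketch returns `['D']`; with a larger catalog, the similarity-weighted scores decide the ranking.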
Data products differ from the "data insights" described above, where the
outcome might be advice that helps an executive make a smarter business
decision. In contrast, a data product is technical functionality that
encapsulates an algorithm and is designed to integrate directly into core
applications. Examples of applications that incorporate the data products above
behind the scenes are, respectively, Amazon's homepage, Gmail's inbox, and
autonomous driving software.
Data scientists play a central role in developing data products. This
involves building out algorithms, as well as testing, refinement, and technical
deployment into production systems. In this sense, data scientists serve as
technical developers, building assets that can be leveraged at wide scale.
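As a much-simplified illustration of a data product like the Gmail spam filter mentioned above, the sketch below wraps a multinomial naive Bayes classifier behind two methods, `train` and `is_spam`, that an application could call. Naive Bayes is just one classic spam-filtering technique and is not a description of how Gmail actually works; the class name, method names, and training messages are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over lowercase words, with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        """Record one labeled example message."""
        if label not in self.LABELS:
            raise ValueError(f"label must be one of {self.LABELS}")
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def is_spam(self, text: str) -> bool:
        """Return True if the spam class has the higher log-posterior score."""
        if any(self.doc_counts[label] == 0 for label in self.LABELS):
            raise ValueError("train on at least one spam and one ham message first")
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior: how common this class is among training messages.
            score = math.log(self.doc_counts[label] / total_docs)
            # Smoothed log likelihood per word; +1 keeps unseen words from zeroing the score.
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores["spam"] > scores["ham"]


# Hypothetical training messages, for illustration only.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win money now", "spam")
spam_filter.train("free money offer", "spam")
spam_filter.train("meeting at noon", "ham")
spam_filter.train("project update attached", "ham")

print(spam_filter.is_spam("free money"))
print(spam_filter.is_spam("meeting update"))
```

The point of the design is encapsulation: the inbox only calls `is_spam`, while the model behind it can be retrained or replaced without changing the application.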
What is data science – the requisite skill set
Data science is a blend of skills in three major areas:
Mathematics Expertise
At the heart of mining data insight and building data products is the
ability to view the data through a quantitative lens. There are textures,
dimensions, and correlations in data that can be expressed mathematically.
Finding solutions using data becomes a brain teaser of heuristics and
quantitative techniques. Solutions to many business problems involve building
analytic models grounded in hard math, where understanding the underlying
mechanics of those models is key to building them successfully.
A common misconception is that data science is all about statistics. While
statistics is important, it is not the only type of math used. First, there are
two main branches of statistics: classical (frequentist) statistics and Bayesian
statistics. When most people refer to stats, they are generally referring to
classical stats, but knowledge of both types is helpful. Furthermore, many
inferential techniques and machine learning algorithms lean on knowledge of
linear algebra. For example, a popular method to discover hidden characteristics
in a data set is singular value decomposition (SVD), which is grounded in matrix
math and has much less to do with classical stats. Overall, it is helpful for data
scientists to have both breadth and depth in their knowledge of mathematics.
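For a feel of how SVD surfaces hidden structure, the sketch below factors a small hypothetical user-by-item matrix with NumPy's `numpy.linalg.svd` and keeps only the largest singular values, producing a low-rank approximation. The matrix values and the choice of rank `k = 2` are assumptions for illustration.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items.
X = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
])

# Factor X = U @ diag(s) @ Vt; singular values in s are sorted largest first.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the top-k components; each kept component acts as a hidden "latent factor".
k = 2
X_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius-norm error of the low-rank approximation, relative to X itself.
rel_error = np.linalg.norm(X - X_approx) / np.linalg.norm(X)

print("singular values:", np.round(s, 3))
print("relative reconstruction error:", rel_error)
```

For this particular matrix the singular values work out to 10, 8, 2, and 0. The first two factors capture an overall rating level and a contrast between items 1 and 2 versus items 3 and 4, and the rank-2 approximation's relative error is about 0.15. None of this relies on classical statistical tests, only matrix algebra.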
Technology and Hacking
First, let's clarify that we are not talking about hacking as in breaking
into computers. We're referring to the tech programmer subculture meaning of
hacking, i.e., creativity and ingenuity in using technical skills to build
things and find clever solutions to problems.
Why is hacking ability important? Because data scientists use technology to
wrangle enormous data sets and work with complex algorithms, which
requires tools far more sophisticated than Excel. Data scientists need to be
able to code: to prototype quick solutions, as well as to integrate with complex
data systems. Core languages associated with data science include SQL, Python,
R, and SAS. On the periphery are Java, Scala, Julia, and others. But it is not
just about knowing language fundamentals. A hacker is a technical ninja, able to
creatively navigate their way through technical challenges in order to make
their code work.
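As a small sketch of how two of the core languages above often work together, the example below uses Python's built-in `sqlite3` module with an in-memory database: SQL does the aggregation, and Python handles the result. The `orders` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 45.0), ("bob", 7.5)],
)

# Let SQL do the grouping and sorting, then work with plain Python tuples.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()
conn.close()

for customer, n_orders, total in rows:
    print(f"{customer}: {n_orders} orders, {total:.2f} total")
```

Pushing the aggregation into SQL keeps the heavy lifting close to the data, which matters more as tables grow beyond what fits comfortably in memory.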
Along these lines, a data science hacker is a solid algorithmic thinker,
able to break down messy problems and recompose them in ways that
are solvable. This is critical because data scientists operate within a lot of algorithmic
complexity. They need a strong mental model of high-dimensional
data and tricky data control flows, and full clarity on how all the pieces come
together to form a cohesive solution.
Strong Business Acumen
It is important for a data scientist to be a tactical business
consultant. Working so closely with data, data scientists are positioned to
learn from it in ways no one else can. That creates the responsibility to
translate observations into shared knowledge and to contribute to strategy on how
to solve core business problems. This means a core competency of data science
is using data to cogently tell a story. No data-puking; rather, present a
cohesive narrative of problem and solution, using data insights as supporting
pillars that lead to guidance.
Having this business acumen is just as important as having acumen for
tech and algorithms. There needs to be clear alignment between data science
projects and business goals. Ultimately, the value doesn't come from data,
math, and tech themselves. It comes from leveraging all of the above to build
valuable capabilities and have strong business influence.