Tuesday, October 25, 2016

A Day in the Life of a Data Scientist

What is a Data Scientist?

The power of data science

A Day in the Life of a Data Scientist

Life as a Data Scientist at Adobe

LinkedIn Data Scientist Talks Statistics

Interview with Ta Chiraphadhanakul, Data Scientist at Facebook

Inside Facebook's Data-Science Team

Facebook Data Scientist Interview


The Life of a Data Scientist

Data scientists are big data wranglers. They take an enormous mass of messy data points (unstructured and structured) and use their formidable skills in math, statistics and programming to clean, massage and organize them. Then they apply all their analytic powers – industry knowledge, contextual understanding, skepticism of existing assumptions – to uncover hidden solutions to business challenges.
Data Scientist Responsibilities
“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”
On any given day, a data scientist may be required to:
  • Conduct undirected research and frame open-ended industry questions
  • Extract huge volumes of data from multiple internal and external sources
  • Employ sophisticated analytics programs, machine learning and statistical methods to prepare data for use in predictive and prescriptive modeling
  • Thoroughly clean and prune data to discard irrelevant information
  • Explore and examine data from a variety of angles to determine hidden weaknesses, trends and/or opportunities
  • Devise data-driven solutions to the most pressing challenges
  • Invent new algorithms to solve problems and build new tools to automate work
  • Communicate predictions and findings to management and IT departments through effective data visualizations and reports
  • Recommend cost-effective changes to existing procedures and strategies
Every company will have a different take on job tasks. Some treat their data scientists as glorified data analysts or combine their duties with data engineers; others need top-level analytics experts skilled in intense machine learning and data visualizations.
As data scientists achieve new levels of experience or change jobs, their responsibilities invariably change. For example, a person working alone in a mid-size company may spend a good portion of the day in data cleaning and munging. A high-level employee in a business that offers data-based services may be asked to structure big data projects or create new products.
Data Scientist Salaries
“Data scientist” is the hottest job title in the IT field – with starting salaries to match. It should come as no surprise that Silicon Valley is the new Jerusalem. According to a 2014 Burtch Works study, 36% of data scientists work on the West Coast. Entry-level professionals in that area earn a median base salary of $100,000 – 22% more than their Northeast peers.

Data Scientist

Average Salary (2015): $118,709 per year
Median Salary (2015): $93,991 per year
Total Pay Range: $63,524 – $138,123

Senior Data Scientist

Median Salary (2015): $124,273 per year
Total Pay Range: $89,801 – $179,445
Data Scientist Qualifications
Broadly speaking, you have 3 education options if you’re considering a career as a data scientist:
  1. Degrees and graduate certificates provide structure, internships, networking and recognized academic qualifications for your résumé. They will also cost you significant time and money.
  2. MOOCs and self-guided learning courses are free/cheap, short and targeted. They allow you to complete projects on your own time – but they require you to structure your own academic path.
  3. Bootcamps are intense and faster to complete than traditional degrees. They may be taught by practicing data scientists, but they won’t give you degree initials after your name.
Academic qualifications may be more important than you imagine. As Burtch Works notes, “it’s incredibly rare for someone without an advanced quantitative degree to have the technical skills necessary to be a data scientist.”
In its data science salary report, Burtch Works determined that 88% of data scientists have a master’s degree and 46% have a PhD. The majority of these degrees are in rigorous quantitative, technical or scientific subjects, including math and statistics (32%), computer science (19%) and engineering (16%).
With that being said, companies are desperate for candidates with real-world skills. Your technical know-how may trump preferred degree requirements.
What Kind of Skills Will I Need?

Technical Skills

  • Math (e.g. linear algebra, calculus and probability)
  • Statistics (e.g. hypothesis testing and summary statistics)
  • Machine learning tools and techniques (e.g. k-nearest neighbors, random forests, ensemble methods, etc.)
  • Software engineering skills (e.g. distributed computing, algorithms and data structures)
  • Data mining
  • Data cleaning and munging
  • Data visualization (e.g. ggplot and d3.js) and reporting techniques
  • Unstructured data techniques
  • R and/or SAS languages
  • SQL databases and database querying languages
  • Python (most common), C/C++, Java, Perl
  • Big data platforms like Hadoop, Hive & Pig
  • Cloud tools like Amazon S3
This list is always subject to change. As Anmol Rajpurohit suggests, “generic programming skills are a lot more important than being the expert of any particular programming language.”
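To make one of the techniques listed above concrete, here is a minimal sketch of k-nearest neighbors in plain Python. The function name `knn_predict`, the toy features and the segment labels are all made up for illustration; a real project would more likely rely on an established machine learning library.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs, where features are
    equal-length tuples of numbers.
    """
    # Order the training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Collect the labels of the k closest points and return the most common.
    top_labels = [label for _, label in by_distance[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical toy data: (hours on site, purchases) -> customer segment.
train = [
    ((1.0, 0.0), "browser"),
    ((1.5, 1.0), "browser"),
    ((2.0, 0.0), "browser"),
    ((8.0, 5.0), "buyer"),
    ((9.0, 6.0), "buyer"),
    ((7.5, 4.0), "buyer"),
]

print(knn_predict(train, (8.5, 5.0)))
print(knn_predict(train, (1.2, 0.5)))
```

The whole method is a sort and a vote, which is why it is often the first algorithm taught: there is no training step, only a distance measure and a choice of k.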

Business Skills

  • Analytic Problem-Solving: Approaching high-level challenges with a clear eye on what is important; employing the right approach/methods to make the maximum use of time and human resources.
  • Effective Communication: Detailing your techniques and discoveries to technical and non-technical audiences in a language they can understand.
  • Intellectual Curiosity: Exploring new territories and finding creative and unusual ways to solve problems.
  • Industry Knowledge: Understanding the way your chosen industry functions and how data are collected, analyzed and utilized.
Note: You can view a handy trajectory on How to Become a Data Scientist in an infographic from Datacamp. Also, KDnuggets.com is a great source of information on big data, machine learning, and data science topics.

What About Certifications?

To avoid wasting time on poor quality certifications, ask your mentors for advice, check job listing requirements and consult articles like Tom’s IT Pro “Best Of” certification lists. Here are a few that focus on useful skills:

Certified Analytics Professional (CAP)

CAP was created in 2013 by the Institute for Operations Research and the Management Sciences (INFORMS) and is targeted towards data scientists. During the certification exam, candidates must demonstrate their expertise in the end-to-end analytics process. This includes the framing of business and analytics problems, data and methodology, model building, deployment and life cycle management.
  • 5+ years of analytics work-related experience for BA/BS holder in a related area
  • 3+ years of analytics work-related experience for MA/MS (or higher) holder in a related area
  • 7+ years of analytics work-related experience for BA/BS (or higher) holder in an unrelated area
  • Verification of soft skills/provision of business value by employer
  • Agreement to adhere to Code of Ethics

Cloudera Certified Professional: Data Scientist (CCP:DS)

Targeted towards the elite level, the CCP:DS is aimed at data scientists who can demonstrate advanced skills in working with big data. Candidates are drilled in 3 exams – Descriptive and Inferential Statistics, Unsupervised Machine Learning and Supervised Machine Learning – and must prove their chops by designing and developing a production-ready data science solution under real-world conditions.
Related Cloudera certifications include:

EMC: Data Science Associate (EMCDSA)

The EMCDSA certification tests your ability to apply common techniques and tools required for big data analytics. Candidates are judged on their technical expertise (e.g. employing open source tools such as “R”, Hadoop, and Postgres, etc.) and their business acumen (e.g. telling a compelling story with the data to drive business action).
Once you’ve passed the EMCDSA, you can consider the Advanced Analytics Specialty. This works on developing new skills in areas such as Hadoop (and Pig, Hive, HBase), Social Network Analysis, Natural Language Processing, data visualization methods and more.

SAS Certified Predictive Modeler using SAS Enterprise Miner 7

This certification is designed for SAS Enterprise Miner users who perform predictive analytics. Candidates must have a deep, practical understanding of the functionalities for predictive modeling available in SAS Enterprise Miner 7 before they can take the performance-based exam. This exam includes topics such as data preparation, predictive models, model assessment and scoring and implementation.
Related SAS certifications include:
Jobs Similar to Data Scientist
Some data scientists get their start working as low-level Data Analysts, extracting structured data from MySQL databases or CRM systems, developing basic visualizations or analyzing A/B test results. These jobs aren’t usually that challenging.
However, once you have your technical skills in order, you have plenty of options. If you’d like to push beyond your analytical role, you could think about building/engineering/architecture jobs such as:
Data Scientist Job Outlook
In an oft-cited 2011 big data study, McKinsey reported that by 2018 the U.S. could face a shortage of 140,000 to 190,000 “people with deep analytic skills” and 1.5 million “managers and analysts with the know-how to use the analysis of big data to make effective decisions.”
The ensuing panic has led to high demand for data scientists. Companies of every size and industry – from Google, LinkedIn and Amazon to the humble retail store – are looking for experts to help them wrestle big data into submission. Starting salaries are astronomical.
The bubble is bound to burst, of course. In a 2014 Mashable article, Roy Lowrance, the managing director of New York University’s Center for Data Science program, is quoted as saying “anything that gets hot like this can only cool off.” But even as demand for data engineers surges, job postings for big data experts are expected to remain high.
There are also some indications that the roles of data scientists and business analysts are beginning to merge. In certain companies, “new look” data scientists may find themselves responsible for financial planning, ROI assessment, budgets and a host of other duties related to the management of an organization.
Professional Organizations for Data Scientists


Why is data science the fastest growing industry in tech right now?

Data science allows the tech industry to make the claim that they know you better than you do.
It’s not that data was never there; it has always been around. The difference is that now, more data is being collected and at a faster rate, and we finally have the ability to do something with it.
The power of data science has led to such a fast growing industry that data scientists are hard to come by. A 2011 McKinsey report predicted that by 2018, there would be a shortage of up to 190,000 data scientists in the United States, and a shortage of 1.5 million analysts capable of doing something about the big data flood headed their way.
So why is data science becoming the hot ticket in tech?

Data Science Is About Answers And Decision-Making

In “The Unreasonable Effectiveness of Data”, a 2009 paper co-authored by Peter Norvig, Google’s AI expert, the power of big data was summed up perfectly:
“Simple models and a lot of data trump more elaborate models based on less data.”
In other words, more data is better, if you know what to do with it.
Hilary Mason, former chief data scientist at Bit.ly would agree. In a recent interview with Gigaom, Mason said that what is taken for granted now, as far as available data fields go, would only have been “theoretical” 10 or more years ago. So why is data science growing so quickly in the tech industry right now?
Quite simply, data science allows for: accurate answers, decisions based on what is actually happening, and predicting the next big trend.
  1. Data Science Provides More Accurate Answers
Let’s take a look at a real-world example of Norvig’s theory: The 2014 Academy Awards. That year, data scientists who turned to big data proved to be incredibly accurate in predicting who would win. Motion picture columnists and other acclaimed experts, despite their experience in dealing with film and knowledge of who had won in the past, fared poorly in the accuracy of their predictions.
Predictive analytics handily beat the best guesses of the experts.
The tech industry is pulling in massive amounts of data from users on mobile apps and websites, tracking where they go, when they go, what they buy, what they share, what they click, and who their friends are. They are in the perfect position to use this data to “predict the winners”, discovering accurate answers where an educated guess, in the past, would have been as good as they could get.
  2. Data Science Changes How Decisions Are Made
Are data-driven companies any better than those who don’t rely on big data to make their decisions? A study in the Harvard Business Review revealed some surprising results. The more a company believed they were data-driven, the better they did in objective financial and operations measurements. Companies in the top 33% of their industry who relied on data-driven decisions were 6% more profitable than their competitors.
Whether it was the data itself, or the confidence these companies felt about their decisions because they were based on data, such companies did do better.
Without access to data, important decisions have traditionally been delegated to whoever has the most experience: the HiPPO (“highest paid person’s opinion”). They, in turn, rely on the patterns and instincts they’ve developed over the years. Relying on one person’s gut instincts or perceived smarts instead of data is dangerous; just ask retailer JC Penney, which saw its business spiral downward under leadership that eschewed collecting or monitoring data for any decisions.
The tech industry embraces a different model. It’s summed up quite simply: you can’t manage what you can’t measure.
Decisions are no longer relegated to the person with the most experience. Instead, data scientists dig into data and find a reason to make a decision based on what is actually happening.
  3. Data Science Finds The Trends
Data scientists are a bit like artists. They look at big data, interpreting that data in the hopes of spotting trends. Data scientists inside an organization do this interpretation with an eye towards organizational goals.
Data science is what tells you what’s hot before the experts even see it on the radar. This is competitive advantage to the nth degree. Forget copycat trends, corporate espionage, or stealing the competitor’s best workers. Data science taps into the information that’s already out there, the information that’s pointing the way a trend is headed.
The ability to ferret out trends can be applied to more than making business decisions. Trend-spotting can be the actual product. Consider Google’s Flu Trends, which uses an algorithm that crunches data collected through its search engine. It’s been surprisingly accurate, beating the CDC to the predictive punch by about two weeks.

The Tech Industry Is Attractive To Data Scientists

For the serious data scientist — a person who has that rare set of skills that combines math, marketing, science, and analytics — the tech industry offers unique opportunities not found elsewhere.
  1. Faster Returns On Data Experiments
Scott Clark, a data scientist at Yelp, preferred the speed the tech industry offered him. According to Clark, making a small change in the Yelp website would have a bigger impact, stretching over millions of people, as compared to the slower return on an experiment that he might see in an academic setting. That speed can be a double-edged sword, however, as that demand for speed and analyzed results adds a layer of stress to data scientists in tech.
The faster returns aren’t the only reason the tech industry is pulling in data scientists. During the recent recession, opportunities in academia or on Wall Street dried up as research funding was reduced. Data scientists turned towards tech, filling a need, revealing their value, and driving home the promise and importance of data science.
  2. The Data Floodgates Are Already Open
The McKinsey report predicted that by 2020, there will be 40,000 exabytes of data collected. Someone has to do something with that data.
Data scientists in the tech industry are positioned at the leading edge of this data deluge, one that’s already pouring in from mobile apps, internet, social media, ecommerce, and wearable technology.
Big data is growing in importance to the tech industry thanks to the tech industry itself. Cloud computing led to an increase in data collection because it made it scalable. New approaches to massive pools of data (“data lakes”) are making that data more fluid. Traditionally, data sets are designed first, before any data is collected. By creating data lakes, this approach is flipped. Massive amounts of data can be collected before designing the model. You can collect the data without knowing beforehand what you’re going to do with it.
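The "collect first, design later" idea behind data lakes can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the event records, the field names and the helpers `read_with_schema` and `total_spend` are all hypothetical, and a real lake would sit on distributed storage rather than in a list.

```python
import json

# Raw events are stored exactly as they arrive; no schema was agreed up front,
# so records can carry different fields.
raw_lake = [
    '{"user": "u1", "event": "click", "page": "/home"}',
    '{"user": "u2", "event": "purchase", "amount": 19.99}',
    '{"user": "u1", "event": "purchase", "amount": 5.0, "coupon": "FALL"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: keep only `fields`, filling gaps with None."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


def total_spend(lines):
    """One question asked of the lake later: how much did each user spend?"""
    totals = {}
    for row in read_with_schema(lines, ["user", "event", "amount"]):
        if row["event"] == "purchase":
            totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


print(total_spend(raw_lake))
```

The schema lives in the reading code, not in the storage, so a different question tomorrow only needs a different list of fields.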

Big Business Needs Data Science Tools

Profitable business is always a driver of technology, and data science is no different. Big data has the potential to be mashed into data sets to understand customers much better than relying on those unscientific experts who make a best guess on hunches and past experience.
Instead of guessing at what to recommend to a customer, businesses can use data sets that tell them exactly what a customer wants depending on the season, weather, past purchases, geographic location, and life events. Information pulled from RFID tags is of little use if you don’t know what to do with that data.
Retail giant Target is well-known for harnessing the power of big data, discovering female customers who were pregnant based on the products they purchased, and then showering them with ads and coupons for baby-related items. Netflix and Amazon are well known for their powerful recommendation engines that use not only what people buy, but also what they look at. Credit card companies have tapped into the associative power found in big data, learning that people who buy anti-scuff furniture pads are also more likely to make their payments on time.
Big Business, it turns out, really needs Big Data. And because of that, the tech industry is needed to harness the power of data science into something usable. Someone must create the apps and systems and algorithms that power these data-driven customer targeting engines.

Why Data Science Matters And How It Powers Business Value


Without the professional expertise to turn cutting-edge technology into actionable insights, Big Data is nothing. Today, many more organizations, including institutions in the financial sector, are opening their doors to big data and unlocking its power. This increases the value of a data scientist who knows how to extract value from the large amount of information that already exists inside an institution.

It has now become a universal truth that modern businesses are awash with data. Last year, McKinsey estimated that big data initiatives in the US healthcare system “could account for $300 billion to $450 billion in reduced health-care spending, or 12 to 17 percent of the $2.6 trillion baseline in US health-care costs”. However, bad data is estimated to be costing the US roughly $3.1 trillion a year.

It is becoming clearer by the day that the value lies in processing and analysis of data – and that is where a data scientist steps into the spotlight. Executives have heard of how data science is a sexy industry, and how data scientists represent superheroes, but most are still unaware of the value a data scientist holds in an organization.
What a Data Scientist Does
Most data scientists in the industry have advanced degrees and training in statistics, math, and computer science. Their experience is a vast horizon that also extends to data visualization, data mining, and information management. It is fairly common for them to have previous experience in infrastructure design, cloud computing, and data warehousing.

Here are instances when a company can benefit from having a data scientist:
·      When there is a need to crunch large volumes of numbers
·      When possessing lots of operational and customer data
·      When they can benefit from social media streams, credit data, consumer research or third-party data sets

The Ways a Data Scientist Can Add Value to Business

8 ways a Data Scientist can add value to any business:

1. Empowering management and officers to make better decisions

A data scientist who is experienced will serve as a trusted advisor and strategic partner to the management of an institution and ensure that the staff maximizes their analytics’ capabilities. A data scientist will communicate and demonstrate the value of the institution’s analytics product to facilitate an improved process of decision making across the various levels of an organization, through measuring, tracking, and recording all the performance metrics.

2. Directing the actions based on trends which in turn help in defining goals

A data scientist examines and explores the institution’s data, after which they recommend and prescribe certain actions that will help improve the institution’s performance, and better engage customers, ultimately increasing profitability.

3. Challenging the staff to adopt best practices and focus on issues that matter.

One of the responsibilities of a data scientist is to ensure that the staff is familiar and well-versed with the organization’s analytics product. They prepare the staff for success with the demonstration of the effective use of the system to extract insight and drive action. Once the staff understands the product capabilities, their focus can shift to addressing the key business challenges.

4. Identifying opportunities

During their interaction with the organization’s current analytics system, data scientists question the existing processes and assumptions for the purpose of developing additional methods and analytical algorithms. Their job requires them to continuously and constantly improve the value that is derived from the organization’s data.

5. Decision making with quantifiable, data-driven evidence.

With data scientists gathering and analyzing data from various channels, organizations no longer need to take high-stakes risks on guesswork alone.

6. Testing these decisions

Half of the battle involves making certain decisions and implementing those changes. What about the other half? It is crucial to know how those decisions have affected the organization. This is where a data scientist comes in. It pays to have someone who can measure the key metrics that are related to important changes and quantify their success.
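As a rough sketch of what "measuring the key metrics" can look like in practice, the snippet below compares conversion rates before and after a change with a two-proportion z-test. The function name and the visitor and conversion counts are invented for illustration, and the test assumes large, independent samples.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a is the control group, conv_b / n_b is the group that saw
    the change.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 200 of 5,000 visitors converted before the change,
# 260 of 5,000 converted after it.
z, p = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be noise, but it says nothing about whether the change was worth its cost; that part is still a business judgment.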

7. Identification and refining of target audiences

From Google Analytics to customer surveys, most companies collect at least one source of customer data. But if that data isn’t used well, for instance to identify demographics, it isn’t useful.

A data scientist can help with the identification of the key groups with precision, via thorough analysis of disparate sources of data. With this in-depth knowledge, organizations can tailor services and products to customer groups, and help profit margins flourish.

8. Recruiting the right talent for the organization

Reading through CVs all day is a routine chore in a recruiter’s life, but that is changing due to big data. With the amount of information available on talent through social media, corporate databases, and job search websites, data science specialists can work their way through this data and find the candidates who best fit the organization’s needs.

Recruitment will thus no longer be an exhausting and time consuming human review process. Through mining the vast amount of data that is already available, in-house processing of CVs and applications, and even sophisticated data-driven aptitude tests and games, data science can help your recruitment team make speedier and more accurate selections.
Data science can add value to a business by bringing statistics and insights into every workflow, from hiring new candidates to helping senior staff make better-informed decisions. It can add value across all industries.



Data Science - An Introduction

Data science is a multidisciplinary blend of data inference, algorithm development, and technology in order to solve analytically complex problems.
At the core is data. Troves of raw information, streaming in and stored in enterprise data warehouses. Much to learn by mining it. Advanced capabilities we can build with it. Data science is ultimately about using this data in creative ways to generate business value:

Data science – discovery of data insight
This aspect of data science is all about uncovering findings from data. Diving in at a granular level to mine and understand complex behaviors, trends, and inferences. It's about surfacing hidden insight that can help enable companies to make smarter business decisions. For example:
·        Netflix data mines movie viewing patterns to understand what drives user interest, and uses that to make decisions on which Netflix original series to produce.
·        Target identifies the major customer segments within its base and the unique shopping behaviors within those segments, which helps to guide messaging to different market audiences.
·        Procter & Gamble utilizes time series models to more clearly understand future demand, which helps plan for production levels more optimally.
How do data scientists mine out insights? It starts with data exploration. When given a challenging question, data scientists become detectives. They investigate leads and try to understand patterns or characteristics within the data. This requires a big dose of analytical creativity.
Then as needed, data scientists may apply quantitative techniques in order to get a level deeper – e.g. inferential models, segmentation analysis, time series forecasting, synthetic control experiments, etc. The intent is to scientifically piece together a forensic view of what the data is really saying.
This data-driven insight is central to providing strategic guidance. In this sense, data scientists act as consultants, guiding business stakeholders on how to act on findings.
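Time series forecasting, one of the techniques mentioned above, can be illustrated with simple exponential smoothing. This is a deliberately basic sketch with made-up demand figures; it is not the model any of the companies named here is known to use, and production forecasts usually account for trend and seasonality as well.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """Return a one-step-ahead forecast using simple exponential smoothing.

    `alpha` (0 < alpha <= 1) controls how strongly recent observations
    outweigh older ones.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # Start from the first observation, then blend in each new value.
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical weekly demand for one product.
weekly_demand = [120, 132, 128, 141, 150, 147]
print(exponential_smoothing_forecast(weekly_demand, alpha=0.3))
```

With a small `alpha` the forecast reacts slowly and smooths out noise; with `alpha` close to 1 it follows the latest observation almost exactly.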
Data science – development of data product
A "data product" is a technical asset that: (1) utilizes data as input, and (2) processes that data to return algorithmically-generated results. The classic example of a data product is a recommendation engine, which ingests user data, and makes personalized recommendations based on that data. Here are some examples of data products:
·        Amazon's recommendation engines suggest items for you to buy, determined by their algorithms. Netflix recommends movies to you. Spotify recommends music to you.
·        Gmail's spam filter is a data product – an algorithm behind the scenes processes incoming mail and determines if a message is junk or not.
·        Computer vision used for self-driving cars is also a data product – machine learning algorithms are able to recognize traffic lights, other cars on the road, pedestrians, etc.
This is different from the "data insights" section above, where the outcome is perhaps advice that helps an executive make a smarter business decision. In contrast, a data product is technical functionality that encapsulates an algorithm, and is designed to integrate directly into core applications. Respective examples of applications that incorporate a data product behind the scenes: Amazon's homepage, Gmail's inbox, and autonomous driving software.
Data scientists play a central role in developing data product. This involves building out algorithms, as well as testing, refinement, and technical deployment into production systems. In this sense, data scientists serve as technical developers, building assets that can be leveraged at wide scale.
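To show the shape of a data product (data in, algorithmically generated results out), here is a toy recommendation engine based on purchase overlap. The `recommend` function, the shoppers and the items are all hypothetical; the real engines at Amazon, Netflix or Spotify are far more elaborate and are not described in this article.

```python
from collections import Counter


def recommend(purchases, user, top_n=2):
    """Suggest items bought by shoppers whose baskets overlap with `user`'s.

    `purchases` maps each user name to the set of items they bought.
    """
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)
        if overlap == 0:
            continue  # Shoppers with nothing in common carry no signal.
        # Each item the user lacks is weighted by how similar the shopper is.
        for item in items - owned:
            scores[item] += overlap
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]


# Hypothetical purchase history.
purchases = {
    "ana": {"tent", "stove", "lantern"},
    "ben": {"tent", "stove", "sleeping bag"},
    "cho": {"tent", "lantern", "sleeping bag", "map"},
    "dev": {"novel"},
}

print(recommend(purchases, "ana"))
```

The function takes user data as input and returns a ranked list with no human in the loop, which is what separates a data product from a one-off analysis.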

What is data science – the requisite skill set
Data science is a blend of skills in three major areas:

Mathematics Expertise
At the heart of mining data insight and building data product is the ability to view the data through a quantitative lens. There are textures, dimensions, and correlations in data that can be expressed mathematically. Finding solutions utilizing data becomes a brain teaser of heuristics and quantitative technique. Solutions to many business problems involve building analytic models grounded in the hard math, where being able to understand the underlying mechanics of those models is key to success in building them.
Also, a misconception is that data science is all about statistics. While statistics is important, it is not the only type of math utilized. First, there are two branches of statistics – classical statistics and Bayesian statistics. When most people refer to stats they are generally referring to classical stats, but knowledge of both types is helpful. Furthermore, many inferential techniques and machine learning algorithms lean on knowledge of linear algebra. For example, a popular method to discover hidden characteristics in a data set is SVD, which is grounded in matrix math and has much less to do with classical stats. Overall, it is helpful for data scientists to have breadth and depth in their knowledge of mathematics.
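To give a feel for the matrix math behind SVD, the sketch below approximates only the largest singular value and its singular vectors using power iteration in plain Python. This is a simplified stand-in, not a full SVD: in practice a numerical library would compute the whole decomposition. The function name and the ratings matrix are made up for illustration.

```python
import math


def top_singular_triplet(matrix, iterations=200):
    """Approximate the largest singular value of `matrix` and its vectors.

    Runs power iteration on A^T A. Returns (sigma, u, v) such that
    A v is approximately sigma * u. Assumes the all-ones start vector is
    not orthogonal to the leading right singular vector.
    """
    rows, cols = len(matrix), len(matrix[0])

    def times_v(v):
        # Compute A v.
        return [sum(matrix[i][j] * v[j] for j in range(cols)) for i in range(rows)]

    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        av = times_v(v)
        # Compute A^T (A v), then renormalize to unit length.
        w = [sum(matrix[i][j] * av[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]

    av = times_v(v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av] if sigma else av
    return sigma, u, v


# Hypothetical ratings: rows are users, columns are movies.
ratings = [
    [5, 4, 1],
    [4, 5, 1],
    [1, 1, 5],
]
sigma, u, v = top_singular_triplet(ratings)
print(round(sigma, 3), [round(x, 3) for x in v])
```

The leading vector `v` weights the movies and `u` weights the users along the single strongest pattern in the ratings, which is the kind of "hidden characteristic" the paragraph above refers to.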
Technology and Hacking
First, let's clarify that we are not talking about hacking as in breaking into computers. We're referring to the tech programmer subculture meaning of hacking – i.e., creativity and ingenuity in using technical skills to build things and find clever solutions to problems.
Why is hacking ability important? Because data scientists utilize technology in order to wrangle enormous data sets and work with complex algorithms, and it requires tools far more sophisticated than Excel. Data scientists need to be able to code — prototype quick solutions, as well as integrate with complex data systems. Core languages associated with data science include SQL, Python, R, and SAS. On the periphery are Java, Scala, Julia, and others. But it is not just knowing language fundamentals. A hacker is a technical ninja, able to creatively navigate their way through technical challenges in order to make their code work.
Along these lines, a data science hacker is a solid algorithmic thinker, having the ability to break down messy problems and recompose them in ways that are solvable. This is critical because data scientists operate within a lot of algorithmic complexity. They need to have a strong mental comprehension of high-dimensional data and tricky data control flows. Full clarity on how all the pieces come together to form a cohesive solution.
Strong Business Acumen
It is important for a data scientist to be a tactical business consultant. Working so closely with data, data scientists are positioned to learn from data in ways no one else can. That creates the responsibility to translate observations to shared knowledge, and contribute to strategy on how to solve core business problems. This means a core competency of data science is using data to cogently tell a story. No data-puking – rather, present a cohesive narrative of problem and solution, using data insights as supporting pillars, that lead to guidance.
Having this business acumen is just as important as having acumen for tech and algorithms. There needs to be clear alignment between data science projects and business goals. Ultimately, the value doesn't come from data, math, and tech itself. It comes from leveraging all of the above to build valuable capabilities and have strong business influence.
