Practical Statistics for Data Scientists: 50+ Essential Concepts Using R and Python

by: Peter Bruce

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this popular guide adds comprehensive examples in Python, provides practical guidance on applying statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.

Many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

With this book, you’ll learn:

  • Why exploratory data analysis is a key preliminary step in data science
  • How random sampling can reduce bias and yield a higher-quality dataset, even with big data
  • How the principles of experimental design yield definitive answers to questions
  • How to use regression to estimate outcomes and detect anomalies
  • Key classification techniques for predicting which categories a record belongs to
  • Statistical machine learning methods that "learn" from data
  • Unsupervised learning methods for extracting meaning from unlabeled data

The Reviews

The content of this book gets 5 stars. I especially appreciate the author including Python this time around. However, O'Reilly decided to print this book in black and white. That isn't acceptable for a $50+ book where you need to be able to distinguish between colored lines on charts. Thankfully I have an O'Reilly subscription where I can view the digital book in color, as I imagine the author intended.

I read through a previous version of this book when I was mainly using R, and it was incredible. One of the better stats application books I've read. Since I switched to Python this year, I was very happy to see that they released a version with Python content. However, I've thus far been very disappointed. The stats content is still great, but overall, the Python code is very often missing comments, doesn't run properly, or some mix of both. The book is still a good primer on the stats that a data scientist needs, but don't expect the code snippets to provide much guidance.

The book is well thought out and the explanations of the concepts are sound. The subtitle is a little misleading, giving the impression that the book covers R and Python equally. In reality, it puts much more emphasis on the R programming language, and the Python code is an afterthought.

Examples use data that is not provided. Hard to follow. Code is provided with little or no explanation. Without the underlying data you can’t reproduce it. Not very enlightening.

I had purchased a new physical copy of the book and realized that several pages were blank or missing. I contacted O'Reilly about the problem, and they were extremely quick with a resolution! They were able to give me a different copy so I could read it without the missing pages. The content of the book itself is good, except that it's all in black and white, which doesn't bother me personally but may bother someone else when it comes to the graphs. I think the R and Python content are both great, and the code is concise and to the point. Great for R beginners, but for Python users I would recommend a little more experience. As for the math, it's great for those who are new to statistics, with easy-to-read explanations, and a great refresher for those who just want to review some of the concepts. I especially like the further reading sections, which have been helpful.

I've taken many stats classes, most of them using R, at the undergraduate and graduate level, and I really wish I had found this book sooner. I picked this book up as a refresher, and not only did it succinctly describe everything I learned in those courses and a bit more, but it also has excellent "further readings," great clarifying synonym lists when it defines "key terms," and is very readable. Literally blown away.

No punches pulled in this book, great for getting right in and doing work.

Shout out to the author for embracing Python

Learning a lot.

The book is amazing and very useful, also for beginners. The most valuable part, from my point of view, is the presence of code in both R and Python, which helps you understand the syntax of one language better if you already know the other.
