A kind reader recently shared the following comment on the Practical Data Science with R 2nd Edition live-site.
Thanks for the chapter on data frames and data.tables. It has helped me overcome an obstacle freeing me from a lot of warnings telling me my data table was not a real . It reduced the calculation time for a scenario in modelStudio from 30 minutes to 7 minutes. Following the advice in your book is helping me a lot with understanding R and the models you can create with R: Thanks
This is exactly what we were hoping for when we added Chapter 5, "Data engineering and data shaping," to the 2nd edition of the book. The chapter is organized by data manipulation task (what you are trying to do, or your sub-goal) and then teaches the same methodology in base R, data.table, and dplyr. The hope was a Rosetta Stone of data manipulation solutions that would help many readers and not lock them into any one notation.
Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version of the tutorial.
This reflects our opinion on the question “which is better for data science, R or Python?” They are both great. So start with one, and expect to eventually work with both (if you are lucky).
Continue reading Data re-Shaping in R and in Python
I’d like to share some new timings on a grouped in-place aggregation task. A client of mine was seeing some slow performance, so I decided to time a very simple abstraction of one of the steps of their workflow.
Continue reading New Timings for a Grouped In-Place Aggregation Task
We have a new rquery vignette here: Working with Many Columns.
This is an attempt to get back to writing about how to use the package to work with data (versus the other-day’s discussion of package design/implementation).
Please check it out.
We recently saw a great recurring R question: “how do you use one column to choose a different value for each row?” That is: how do you use a column as an index? Please read on for some idiomatic base R, data.table, and dplyr solutions.
Continue reading Using a Column as a Column Index
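As a small illustration of the column-as-index question above, here is a minimal base R sketch. It is not taken from the linked post; the data frame `d` and the column names `x`, `y`, `choice`, and `derived` are hypothetical, chosen only for this example. The idea is to turn the value columns into a matrix and then index that matrix with a two-column (row, column) index matrix, so each row picks its own column.

```r
# Hypothetical example data: 'choice' names the column to read for each row.
d <- data.frame(
  x = c(1, 2, 3),
  y = c(10, 20, 30),
  choice = c("x", "y", "x"),
  stringsAsFactors = FALSE
)

# Only the numeric value columns go into the matrix, so values stay numeric.
value_cols <- c("x", "y")
vals <- as.matrix(d[, value_cols])

# Build a two-column index: row number, and position of the chosen column.
idx <- cbind(seq_len(nrow(d)), match(d$choice, value_cols))

# Matrix indexing by a two-column matrix selects one cell per row.
d$derived <- vals[idx]

print(d)
```

Restricting the matrix to the value columns is a deliberate choice in this sketch: converting the whole data frame, including the character `choice` column, would coerce every value to character.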