Categories: Administrativia, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Tutorials

Data engineering and data shaping in Practical Data Science with R 2nd Edition

A kind reader recently shared the following comment on the Practical Data Science with R 2nd Edition live-site.

Thanks for the chapter on data frames and data.tables. It has helped me overcome an obstacle freeing me from a lot of warnings telling me my data table was not a real . It reduced the calculation time for a scenario in modelStudio from 30 minutes to 7 minutes. Following the advice in your book is helping me a lot with understanding R and the models you can create with R: Thanks

This is exactly what we were hoping for when we added Chapter 5, “Data engineering and data shaping,” to the 2nd edition of the book. The chapter is organized by data manipulation task (what you are trying to do, or your sub-goal), and then teaches how to accomplish each task in base R, data.table, and dplyr. The hope was to provide a Rosetta Stone of data manipulation solutions that would help many readers without locking them into any one notation.

Categories: data science, Practical Data Science, Pragmatic Data Science, Pragmatic Machine Learning, Statistics, Tutorials

Data re-Shaping in R and in Python

Nina Zumel and I have two new tutorials on fluid data wrangling and shaping. They are written in parallel, with the R version of the tutorial nearly identical to the Python version.

This reflects our opinion on the question “which is better for data science, R or Python?”: they are both great. So start with one, and expect to eventually work with both (if you are lucky).
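To make “data shaping” concrete, here is a minimal sketch in plain Python (standard library only) of moving a small table between a “long” form (one row per id/measurement pair) and a “wide” form (one row per id). The helper names `long_to_wide` and `wide_to_long` and the example data are hypothetical illustrations; they are not the API used in the tutorials.

```python
# Hypothetical example data in "long" form: one row per (id, measure) pair.
long_rows = [
    {"id": 1, "measure": "height", "value": 1.7},
    {"id": 1, "measure": "weight", "value": 65.0},
    {"id": 2, "measure": "height", "value": 1.8},
    {"id": 2, "measure": "weight", "value": 80.0},
]


def long_to_wide(rows, key_col, name_col, value_col):
    """Collect long rows into one wide row per key (a pivot)."""
    out = {}  # key -> wide record; dicts keep first-seen insertion order
    for r in rows:
        rec = out.setdefault(r[key_col], {key_col: r[key_col]})
        if r[name_col] in rec:
            # Two input rows would fill the same cell: refuse rather than overwrite.
            raise ValueError(
                f"duplicate cell for {key_col}={r[key_col]!r}, {r[name_col]!r}"
            )
        rec[r[name_col]] = r[value_col]
    return list(out.values())


def wide_to_long(rows, key_col, name_col, value_col):
    """Split each wide row back into one long row per non-key column (an unpivot)."""
    return [
        {key_col: r[key_col], name_col: col, value_col: v}
        for r in rows
        for col, v in r.items()
        if col != key_col
    ]


wide_rows = long_to_wide(long_rows, "id", "measure", "value")
print(wide_rows)
# [{'id': 1, 'height': 1.7, 'weight': 65.0}, {'id': 2, 'height': 1.8, 'weight': 80.0}]
back = wide_to_long(wide_rows, "id", "measure", "value")
assert back == long_rows  # the round trip recovers the original rows
```

Raising on duplicates, rather than silently keeping the last value, is one reasonable design choice: any pivot has to decide what happens when two input rows map to the same output cell.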


Categories: data science, Opinion, Pragmatic Data Science, Tutorials

New Timings for a Grouped In-Place Aggregation Task

I’d like to share some new timings on a grouped in-place aggregation task. A client of mine was seeing some slow performance, so I decided to time a very simple abstraction of one of the steps of their workflow.
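For readers unfamiliar with the term: a grouped in-place aggregation computes a per-group summary and writes it back onto every row, so the table keeps its original row count (in R, this is the kind of job done by `ave()`, by data.table's `:=` with `by`, or by dplyr's `group_by()` followed by `mutate()`). The sketch below is plain Python and uses a per-group sum purely as an illustration; the aggregation actually timed in the post may differ, and the function name `add_group_sum` is hypothetical.

```python
from collections import defaultdict


def add_group_sum(rows, group_col, value_col, new_col):
    """Grouped in-place aggregation: annotate each row with its group's total.

    Unlike a grouped summary (one row per group), every input row is kept.
    Rows are dicts and are modified in place; the list is also returned.
    """
    totals = defaultdict(float)
    # Pass 1: one running total per group, in a single scan of the data.
    for r in rows:
        totals[r[group_col]] += r[value_col]
    # Pass 2: write each group's total back onto that group's rows.
    for r in rows:
        r[new_col] = totals[r[group_col]]
    return rows


# Hypothetical example data.
rows = [
    {"g": "a", "x": 1.0},
    {"g": "b", "x": 2.0},
    {"g": "a", "x": 3.0},
]
add_group_sum(rows, "g", "x", "group_total")
print([r["group_total"] for r in rows])  # [4.0, 2.0, 4.0]
```

Two linear passes over a hash table keep the cost proportional to the number of rows; re-filtering the whole table once per row would be quadratic, which is one common way a step like this becomes slow.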


Categories: data science, Tutorials

New rquery Vignette: Working with Many Columns

We have a new rquery vignette here: Working with Many Columns.

This is an attempt to get back to writing about how to use the package to work with data (as opposed to the other day’s discussion of package design and implementation).

Please check it out.

Categories: Coding, data science, Programming, Tutorials

Using a Column as a Column Index

We recently saw a great recurring R question: “how do you use one column to choose a different value for each row?” That is: how do you use a column as an index? Please read on for some idiomatic base R, data.table, and dplyr solutions.
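The question is easiest to see on a tiny example: each row carries a column (here `choice`) whose value names another column, and we want that named column's value from the same row. Below is a minimal plain-Python sketch with hypothetical column and function names; the post itself gives the idiomatic R answers.

```python
# Hypothetical example data: "choice" names which column to read in each row.
rows = [
    {"choice": "a", "a": 1, "b": 10},
    {"choice": "b", "a": 2, "b": 20},
    {"choice": "a", "a": 3, "b": 30},
]


def lookup_by_column(rows, index_col):
    """For each row, return the value of the column named in row[index_col]."""
    picked = []
    for i, r in enumerate(rows):
        col = r[index_col]
        # Reject names that are missing or that point back at the index column.
        if col not in r or col == index_col:
            raise KeyError(f"row {i}: {index_col}={col!r} does not name a value column")
        picked.append(r[col])
    return picked


print(lookup_by_column(rows, "choice"))  # [1, 20, 3]
```

Validating the name in each row turns a misspelled or missing column into a clear error rather than a silently wrong answer.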


Categories: Coding, data science, Exciting Techniques, Programming, Statistics, Tutorials

Wanted: cdata Test Pilots

I need a few volunteers to “test pilot” the development version of the R package cdata, please.

Photo: Jackie Cochran at the 1938 Bendix Race.
Jacqueline Cochran: at the time of her death, no other pilot held more speed, distance, or altitude records in aviation history than Cochran.
