R is a powerful data science language because, like Matlab, numpy, and Pandas, it exposes vectorized operations. That is, a user can perform operations on hundreds (or even billions) of cells by merely specifying the operation on the column or vector of values.
Of course, sometimes it takes a while to figure out how to do this. Please read on for a great R matrix lookup problem and solution.
Continue reading R Tip: How To Look Up Matrix Values Quickly
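As a rough illustration of the kind of vectorized matrix lookup the excerpt above alludes to, here is a minimal base R sketch. The matrix `m` and the index vectors `rows` and `cols` are made-up example data, and this is not necessarily the problem or solution worked in the linked post. It compares an explicit per-element lookup with indexing by a two-column matrix of (row, column) pairs.

```r
# Hypothetical example data: a 3 x 4 integer matrix, filled column by column.
m <- matrix(1:12, nrow = 3, ncol = 4)

# One (row, column) pair per value we want to look up.
rows <- c(1, 3, 2, 1)
cols <- c(2, 4, 1, 3)

# Explicit approach: one scalar lookup per pair.
looped <- vapply(
  seq_along(rows),
  function(i) m[rows[i], cols[i]],
  integer(1)
)

# Vectorized approach: index once with a two-column matrix,
# where each row of the index matrix is a (row, column) pair.
vectorized <- m[cbind(rows, cols)]

# Both approaches return the same values.
print(identical(looped, vectorized))
```

The two-column index form pushes the whole lookup into a single indexing operation, which is what makes it practical for very long index vectors.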
I would like to re-share the vtreat (R version, Python version) data preparation documentation for machine learning tasks.
vtreat is a system for preparing messy real world data for predictive modeling tasks (classification, regression, and so on). In particular it is very good at re-coding high-cardinality string-valued (or categorical) variables for later use.
Continue reading Re-Share: vtreat Data Preparation Documentation and Video
vtreat version 1.5.2 just became available from CRAN.
We have logged a few improvements in the NEWS. The changes are small and incremental, as the package is already in a great stable state for production use.
Continue reading What is New For vtreat 1.5.2?
Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version of the tutorial.
This reflects our opinion on the question “which is better for data science: R or Python?” They are both great. So start with one, and expect to eventually work with both (if you are lucky).
Continue reading Data re-Shaping in R and in Python
wrapr 1.9.6 is now up on CRAN.
We unfortunately usually forget to say this. A big thank you to the staff and volunteers at CRAN.
Continue reading wrapr 1.9.6 is now up on CRAN
In our last note we stated that unpack is a good tool for loading R RDS files into your working environment. Here is the idea expanded into a worked example.
Continue reading Using unpack to Manage Your R Environment
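To make the idea of loading an RDS file into named variables concrete, here is a minimal sketch in base R only. It is a stand-in for the pattern, not wrapr's `unpack` implementation or the worked example from the linked post. The helper `unpack_named()`, the list `bundle`, and its element names are all hypothetical, introduced just for illustration.

```r
# Hypothetical data: several related objects saved together as one named list.
bundle <- list(
  train = data.frame(x = 1:3),
  test = data.frame(x = 4:5)
)
path <- tempfile(fileext = ".RDS")
saveRDS(bundle, path)

# Hypothetical helper (not part of wrapr): assign only the requested
# elements of a named list into an environment, and fail loudly if any
# requested name is absent.
unpack_named <- function(values, wanted, envir = parent.frame()) {
  missing_names <- setdiff(wanted, names(values))
  if (length(missing_names) > 0) {
    stop("missing values: ", paste(missing_names, collapse = ", "))
  }
  list2env(values[wanted], envir = envir)
  invisible(NULL)
}

# Read the file once and bind exactly the names we expect.
unpack_named(readRDS(path), c("train", "test"))
print(nrow(train))
print(nrow(test))
```

Listing the expected names at the call site documents what the file is supposed to contain, and the early `stop()` catches a stale or incomplete file before later code runs on missing data.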
I would like to introduce an exciting feature in the upcoming 1.9.6 version of the wrapr R package: value unpacking.
Continue reading unpack Your Values in R
We had such a positive reception to our last Introduction to Data Science promotion that we are going to try to make the course available to more people by lowering the base price to $29.99. We are also creating a one-month promotional price of $20.99. To get a permanent subscription to the course for less than $21, just visit this link https://www.udemy.com/course/introduction-to-data-science/ and use the discount code ITDS21 any time in January of 2020.
Combine this with the new second edition of Practical Data Science with R, and you have a great study set to succeed at substantial statistical modeling and analytics tasks using the R programming language.
(Note: Lego mini-fig not included!)