Authors: John Mount and Nina Zumel.
The p-value is a valid frequentist statistical concept that is much abused and misused in practice. In this article I would like to call out a few features of p-values that can cause problems in evaluating summaries.
Keep in mind:
p-values are useful and routinely taught correctly in statistics, but very often misremembered or abused in practice.
Continue reading Remember: p-values Are Not Effect Sizes
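To see why a p-value is not an effect size, it can help to hold the effect fixed and vary only the sample size. The minimal sketch below assumes a two-sample z-test with a known, shared standard deviation and equal group sizes (simplifying assumptions chosen for illustration, not the authors' setup). The difference in means, and so the standardized effect size (Cohen's d), never changes, yet the p-value keeps shrinking as n grows.

```python
import math

def two_sided_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a two-sample z-test, assuming a known,
    shared sd and equal group sizes (illustrative simplification)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2.0))

def cohens_d(mean_diff, sd):
    """Standardized effect size: difference in means in units of sd."""
    return mean_diff / sd

# Hypothetical numbers: a fixed, small true difference between the groups.
mean_diff, sd = 0.1, 1.0
for n in (10, 100, 1000, 10000):
    p = two_sided_z_p_value(mean_diff, sd, n)
    print(f"n={n:>6}  effect size d={cohens_d(mean_diff, sd):.2f}  p={p:.3g}")
```

A tiny p-value at large n says the data are inconsistent with "no difference". It does not say the difference is large: d is 0.10 in every row.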
- Question: how hard is it to count rows using the R package dplyr?
- Answer: surprisingly difficult.
When trying to count rows using dplyr-controlled data structures (remote tbls such as dbplyr structures), one is sailing between Scylla and Charybdis. The task is to avoid dplyr corner cases and irregularities (a few of which I attempt to document in this "
Continue reading It is Needlessly Difficult to Count Rows Using dplyr
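Part of the difficulty is that a remote table is not local data: the client-side object is essentially a description of a query, so its row count is not known until the database is asked. The sketch below illustrates that idea with Python's built-in sqlite3 as a stand-in database; it is an analogy, not how dplyr or dbplyr actually count rows. The helper `count_rows` and the "lazy query" held as a plain SQL string are hypothetical names for illustration.

```python
import sqlite3

def count_rows(conn, query):
    """Hypothetical helper: count the rows a lazy query would return by
    letting the database evaluate SELECT COUNT(*) over it as a subquery.
    The query string is assumed trusted (it is interpolated directly)."""
    sql = f"SELECT COUNT(*) FROM ({query}) AS q"
    (n,) = conn.execute(sql).fetchone()
    return n

# Build a small in-memory table to stand in for a remote database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (x INTEGER)")
conn.executemany("INSERT INTO d (x) VALUES (?)", [(i,) for i in range(10)])

lazy_query = "SELECT x FROM d WHERE x > 3"  # no rows fetched to the client yet
print(count_rows(conn, lazy_query))
```

Counting inside the database avoids pulling every row back to the client, and `COUNT(*)` returns 0 (not an error or a missing value) when the query matches nothing.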
While working on a large client project using sparklyr and multinomial regression, we recently ran into a problem: Apache Spark chooses the order of multinomial regression outcome targets, whereas R users are used to choosing the order of the targets themselves (please see here for some details). So to make things behave the way R users expect, we need a way to translate one order to another.
Providing good solutions to gaps like this is one of the things Win-Vector LLC does in both our consulting and training practices.
Continue reading Permutation Theory In Action
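Translating between two orderings of the same outcome labels is a permutation, and undoing it is the inverse permutation. The minimal sketch below assumes we know both the order Spark chose and the order the user wants, as lists of labels; the label order `["b", "c", "a"]`, the probabilities, and the helper names are hypothetical, and this is not necessarily the method used in the full post.

```python
def order_permutation(from_order, to_order):
    """Return perm such that to_order[i] == from_order[perm[i]]."""
    position = {label: i for i, label in enumerate(from_order)}
    # Both orders must be arrangements of the same distinct labels.
    if (len(position) != len(from_order)
            or len(to_order) != len(from_order)
            or set(to_order) != set(position)):
        raise ValueError("orders must contain the same distinct labels")
    return [position[label] for label in to_order]

def invert_permutation(perm):
    """Return the permutation that undoes perm."""
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i
    return inverse

def apply_permutation(values, perm):
    """Reorder values so that result[i] == values[perm[i]]."""
    return [values[p] for p in perm]

# Hypothetical example: the order Spark chose vs. the order the user wants.
spark_order = ["b", "c", "a"]
r_order = ["a", "b", "c"]
spark_probs = [0.2, 0.5, 0.3]  # per-class probabilities in Spark's order

perm = order_permutation(spark_order, r_order)
r_probs = apply_permutation(spark_probs, perm)  # now in the user's order
```

Computing the permutation once from the labels, then applying it to every row of per-class scores, keeps the label bookkeeping in one place; `invert_permutation(perm)` maps results back to Spark's order.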