Archive

Posts Tagged ‘Regression’

What is meant by regression modeling?

April 22nd, 2014 No comments

What is meant by regression modeling?

Linear Regression is one of the most common statistical modeling techniques. It is very powerful, important, and (at first glance) easy to teach. However, because it is such a broad topic it can be a minefield for teaching and discussion. It is common for angry experts to accuse writers of carelessness, ignorance, malice and stupidity. If the type of regression the expert reader is expecting doesn’t match the one the writer is discussing then the writer is assumed to be ill-informed. The writer is especially vulnerable to experts when writing for non-experts. In such writing the expert finds nothing new (as they already know the topic) and is free to criticize any accommodation or adaption made for the intended non-expert audience. We argue that many of the corrections are not so much evidence of wrong ideas but more due a lack of empathy for the necessary informality necessary in concise writing. You can only define so much in a given space, and once you write too much you confuse and intimidate a beginning audience. Read more…

Unprincipled Component Analysis

February 10th, 2014 4 comments

As a data scientist I have seen variations of principal component analysis and factor analysis so often blindly misapplied and abused that I have come to think of the technique as unprincipled component analysis. PCA is a good technique often used to reduce sensitivity to overfitting. But this stated design intent leads many to (falsely) believe that any claimed use of PCA prevents overfit (which is not always the case). In this note we comment on the intent of PCA like techniques, common abuses and other options.

The idea is to illustrate what can quietly go wrong in an analysis and what tests to perform to make sure you see the issue. The main point is some analysis issues can not be fixed without going out and getting more domain knowledge, more variables or more data. You can’t always be sure that you have insufficient data in your analysis (there is always a worry that some clever technique will make the current data work), but it must be something you are prepared to consider. Read more…

The cranky guide to trying R packages

February 13th, 2011 Comments off

This is a tutorial on how to try out a new package in R. The summary is: expect errors, search out errors and don’t start with the built in examples or real data.

Suppose you want to try out a novel statistical technique? A good fraction of the time R is your best bet for a first trial. Take as an example general additive models (“Generalized Additive Models,” Trevor J Hastie, Robert Tibshirani, Statistical Science (1986) vol. 1 (3) pp. 297-318); R has a package named “gam” written by Trevor Hastie himself. But, like most R packages, trying the package from the supplied documentation brings in unfamiliar data and concerns. It is best to start small and quickly test if the package itself is suitable to your needs. We give a quick outline of how to learn such a package and quickly find out if the package is for you.

Read more…

CRU graph yet again (with R)

December 13th, 2009 3 comments

IowaHawk has a excellent article attempting to reproduce the infamous CRU climate graph using OpenOffice: Fables of the Reconstruction. We thought we would show how to produced similarly bad results using R.
Read more…

Categories: Rants, Statistics Tags: , ,

A Demonstration of Data Mining

August 19th, 2009 2 comments

REPOST (now in HTML in addition to the original PDF).

This paper demonstrates and explains some of the basic techniques used in data mining. It also serves as an example of some of the kinds of analyses and projects Win Vector LLC engages in. Read more…