Why I don’t like Dynamic Typing

February 25th, 2012

A lot of people consider the static typing found in languages such as C, C++, ML, Java and Scala as needless hairshirtism. They consider the dynamic typing of languages like Lisp, Scheme, Perl, Ruby and Python as a critical advantage (ignoring other features of these languages and other efforts at generic programming such as the STL).

I strongly disagree. I find the pain of having to type or read through extra declarations is small (especially if you know how to copy-paste or use a modern IDE), and certainly much smaller than the pain of the anti-patterns that dynamic languages encourage: lurking bugs, harder debugging and more difficult maintenance. Debugging is one of the most expensive steps in software development, so you want to incur less of it (even if it is at the expense of more typing). To be sure, there is significant cost associated with static typing (I confess: I had to read the book and post a question on Stack Overflow to design the type interfaces in Automatic Differentiation with Scala; but this is up-front design effort that has ongoing benefits, not hidden debugging debt).

There is, of course, no prior reason anybody should immediately care if I do or do not like dynamic typing. What I mean by saying this is I have some experience and observations about problems with dynamic typing that I feel can help others.

I will point out a couple of example bugs that just keep giving. Maybe you think you are too careful to ever make one of these mistakes, but somebody in your group surely will. And a type checking compiler finding a possible bug early is the cheapest way to deal with a bug (and static types themselves are only a stepping stone for even deeper static code analysis). For my examples I will pick on the programming language R (which we have used and written about in the past).

One of the supposed advantages of dynamically typed languages is that “everything is a macro.” That is you write a function and it is really a template that specializes and works over many different data types. For example: suppose we decided to write our own function to compute sample variance in R:

variance <- function(x) {
   n <- length(x)
   sumX <- sum(x)
   sumXX <- sum(x*x)
   (n/(n-1))*(sumXX/n - (sumX/n)*(sumX/n))
}

This works great and even matches the built-in function var():

> variance(c(1000000,2000000,3000000,4000000,5000000))
[1] 2.5e+12
> var(c(1000000,2000000,3000000,4000000,5000000))
[1] 2.5e+12

That is, it works until we (either knowingly or unknowingly) apply the function to data of a different type:

> variance(as.integer(c(1000000,2000000,3000000,4000000,5000000)))
[1] NA
Warning message:
In x * x : NAs produced by integer overflow

Our macro specializes to calculate over the integers when given integer arguments and then fails due to overflow. Here it is obvious, but in a dynamically typed language we don’t always know the type of what we are passing in, as we may have gotten the value from somewhere else. If we define variance() as a function over doubles in a statically typed language then the language would force either an explicit (programmer supplied) or implicit (language supplied) coercion when attempting to use the function on a vector of integers. The problem is: it is a bigger responsibility to write a correct macro (as the macro has to work over more possible types than a simple function). The dynamic language pushes this onto us, and sometimes we get burnt and sometimes everything is okay. This sort of consideration is one of the reasons functional programming advocates prefer anonymous functions to declaring on the fly classes: less is possible, so it is easier to safely implement what is implied.
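
As a minimal sketch of the explicit coercion described above, here is the same calculation with the cast written by hand on entry. The name variance_dbl and the input checks are assumptions for illustration, not part of the original function; a statically typed signature over doubles would make this step unnecessary.

```r
# Sketch: coerce to double on entry so integer input cannot overflow in x * x.
# variance_dbl is a hypothetical name, not a built-in.
variance_dbl <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2)  # reject non-numeric or too-short input
  x <- as.double(x)                         # explicit, programmer-supplied coercion
  n <- length(x)
  sumX <- sum(x)
  sumXX <- sum(x * x)
  (n / (n - 1)) * (sumXX / n - (sumX / n) * (sumX / n))
}

ints <- as.integer(c(1000000, 2000000, 3000000, 4000000, 5000000))
print(variance_dbl(ints))  # [1] 2.5e+12
```

The cast fixes this one function, but every other function that multiplies its arguments would need the same treatment, and nothing in the language reminds us to add it.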

Some of the problem can be dispelled with test driven development. I am a proponent of test driven development, so much so that I don’t want to waste my valuable test budget testing for things that a decent type system can defend against. Also, by starting broad (assuming it is fair to re-use a function on many different types of arguments) you have entered into a bad bargain where you have to either document what subset of arguments the function works properly on (which is essentially declaring types!), add extra defensive code to cast the arguments on the way in (a waste, and needlessly defensive coding brings in its own problems) or write enough tests to document proper function on a whole bunch of types you don’t actually care about (char, byte, short int …). Unexpected properties of real world data will throw you enough testing and debugging challenges (for example: the effect of unexpected constant data in bad quicksort implementations) that you don’t need additional hidden challenges that a static type system could exclude.
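
To make the testing cost concrete, here is a rough sketch of what the "write enough tests" option looks like for the variance() function from earlier. The helper agrees_with_var and the choice of test inputs are assumptions for illustration; the point is only that the storage type becomes one more axis the tests have to cover.

```r
# The original function, repeated so this sketch runs on its own.
variance <- function(x) {
  n <- length(x)
  sumX <- sum(x)
  sumXX <- sum(x * x)
  (n / (n - 1)) * (sumXX / n - (sumX / n) * (sumX / n))
}

# Hypothetical helper: TRUE when variance() matches the built-in var();
# an NA result (such as the one integer overflow produces) counts as a failure.
agrees_with_var <- function(x) {
  got <- suppressWarnings(variance(x))
  !is.na(got) && isTRUE(all.equal(got, var(x)))
}

small <- c(1, 2, 3, 4, 5)
large <- c(1000000, 2000000, 3000000, 4000000, 5000000)

# The same values have to be tried once per storage type.
results <- c(
  small_double  = agrees_with_var(small),
  small_integer = agrees_with_var(as.integer(small)),
  large_double  = agrees_with_var(large),
  large_integer = agrees_with_var(as.integer(large))
)
print(results)  # only large_integer is FALSE
```

Three of the four cases pass, so a test suite that happened to use only small values or only doubles would never have noticed the problem.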

My second complaint is that most dynamically typed languages go further and force the horrible anti-pattern of automatic (or zero-declaration) variables on us. Since we are not, in a dynamically typed language, required to declare types, it is considered a waste to force the user to declare variables at all (statements like “var colTypeClass”). This argument is seductive because another supposed advantage of dynamically typed languages is conciseness, and variable declarations appear to have little value if you are not declaring types. However, consider the following code:

sqlColType <- function(colTypeName) {
   colTypeClass <- 'unhandled'
   if(colTypeName %in% list('smallint','integer','bigint','decimal','numeric','real','double precision','serial','bigserial','money')) {
      colTypeClass <- 'numeric'
   } else if(colTypeName %in% list('character varying','character','text','boolean')) {
      colTypeClass <- 'categorical'
   } else if(colTypeName %in% list('interval','date')) {
      colTypeGlass <- 'temporal'
   } else if(length(grep('time',colTypeName))>0) {
      colTypeClass <- 'temporal'
   }
   colTypeClass
}

This code (for better or for worse, and at some point we all have to write or use something this ugly) is attempting to map specific SQL column type names into broad classes of types (numeric, categorical and temporal). However, there is a typo-bug in the above code that is only possible in a language with automatic variable declaration. Consider the following two applications of sqlColType():

> sqlColType('integer')
[1] "numeric"
> sqlColType('date')
[1] "unhandled"

The first result is as designed and the second is wrong. What happened is that in the if-block where “date” should have been identified we accidentally spelled “Class” with a “G”, and the result we meant to return was trapped in a shiny new automatic variable that never escapes the function. You may consider this particular bug unlikely, but in a language without automatic variable declaration it is literally impossible. And you don’t even have to actually have this bug in your code to suffer from it. This mistake is something you have to check for when inspecting or debugging faulty code (because you have no prior guarantee that it cannot happen).
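
Within R itself, one possible way to shrink the opening for this typo is to stop assigning to a result variable at all. The sketch below is an assumption about how one might restructure the function, not the code I actually used, and the names numericTypes, categoricalTypes and temporalTypes are made up for illustration.

```r
# Sketch: each branch returns its answer directly, so there is no
# accumulator variable whose name could be mistyped on assignment.
sqlColType <- function(colTypeName) {
  numericTypes <- c('smallint', 'integer', 'bigint', 'decimal', 'numeric',
                    'real', 'double precision', 'serial', 'bigserial', 'money')
  categoricalTypes <- c('character varying', 'character', 'text', 'boolean')
  temporalTypes <- c('interval', 'date')
  if (colTypeName %in% numericTypes) {
    return('numeric')
  }
  if (colTypeName %in% categoricalTypes) {
    return('categorical')
  }
  # A misspelled name here is a read, not a write, so R stops with an
  # "object not found" error instead of silently creating a new variable.
  if (colTypeName %in% temporalTypes || grepl('time', colTypeName, fixed = TRUE)) {
    return('temporal')
  }
  'unhandled'
}

print(sqlColType('integer'))  # [1] "numeric"
print(sqlColType('date'))     # [1] "temporal"
```

This only converts a silent wrong answer into a runtime error on the branch that contains the typo; it is still a convention the programmer has to remember, not a guarantee the language provides.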

My third complaint is the common lack of significant refactoring tools for dynamically typed languages. The ability to automatically apply larger scale meaningful code changes (such as when using Eclipse’s Java development environment) is big. Dynamic type advocates would argue that most of the successful refactorings are just the IDE shepherding around type cruft that is not present in a dynamic language. This is not true. In addition to the trivial code motion and package management there are significant code transformations: method extraction, method signature alteration and safe variable renaming, just to name three. It is a real luxury to work with a system that can safely rename a variable (and all of its references) even when there are other strings and variables using the same token. It is also a luxury to work in teams where nobody can say “yeah, we wanted to remove that argument from the method, but nobody has time to update and test all of the consumers.” Most dynamic languages don’t even have the very clever “poor man’s refactoring” (change the method declaration, attempt a re-compile and then insert changes every place the compiler flags an error). When changing a method signature in a typical dynamically typed language you are left with the lurking worry that some bit of code somewhere is still attempting to use the old signature and will exhibit a runtime error when the exact set of circumstances required to execute the bad path happens in production (i.e. that you won’t be lucky enough to find it in a test).
IDEs have a somewhat dirty reputation as being a crutch (somewhat due to horrible interface builders and large boilerplate systems), but the treatment of code as an object subject to a series of meaningful transformations is game changing (and is most commonly associated with statically typed languages, somewhat by historic accident but also likely due to the extra declaration blocks often found in statically typed languages, and not due to the actual type system itself).
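
The lurking worry about stale call sites can be reproduced in a few lines of R. This is a sketch with made-up names (scaleValues, report and the removed center argument are all hypothetical); it only illustrates that the mismatch goes unnoticed until the stale line actually runs.

```r
# Hypothetical helper after a refactoring: the old 'center' argument was removed.
scaleValues <- function(x, factor) {
  x * factor
}

# Hypothetical consumer: one call site was updated, one rarely used path was missed.
report <- function(x, verbose = FALSE) {
  if (verbose) {
    # Stale call using the old signature; R does not look at it until it runs.
    return(scaleValues(x, 2, center = TRUE))
  }
  scaleValues(x, 2)
}

print(report(c(1, 2, 3)))  # common path: works, prints 2 4 6

# Rare path: the stale call only fails once it is actually executed.
msg <- tryCatch(report(c(1, 2, 3), verbose = TRUE),
                error = function(e) conditionMessage(e))
print(msg)
```

Defining report() raises no complaint at all, so only a test (or a user) that takes the verbose path will ever see the unused argument error. A compiler performing the "poor man's refactoring" would have flagged the stale call the moment the signature changed.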

To sum up: dynamic typing allows more expressive code and saves space. But we pay a large cost downstream in more expensive debugging and a much weaker ability to refactor or analyze. I favor the compromise where most code is statically typed and either only language supplied functions are capable of dynamic typing or there are user escape hatches (like templating). While there is some doubt as to whether you can design a language as powerful as Scheme or Python without dynamic typing (some attempts have failed and some attempts are still evolving), I still prefer static typing. Or (more accurately) I prefer to deal with statically typed code (and am willing to put up with some expense to have it). Initial coding is not the only phase of the software lifecycle.


  1. Andrew
    February 25th, 2012 at 07:05 | #1

    Hmm… R is a poor choice for a comparison term for static languages.
    Java/C/C++ and ML don’t support “ranged types” and other type informations regarding actual values, that’s a different kind of semantic.
    ML derivatives have problems of readability and expressivity (try fun twice f x = f f x and then twice hd [[1;2];[3;4]], but hd hd [[1;2];[3;4]] = 1)

    You should have considered Common Lisp, which has declare, check-type, deftype… and has the ability of constructing languages like Shen or Qi for dynamics, and included Ada among the static ones.

    Ada’s and ML approaches are dual to each other IMO, i.e. each one makes some things easy and others hard, and the other does the vice versa.

  2. Charlie Chaplen
    February 25th, 2012 at 07:07 | #2

    I have a question. Is a1 + a2 + a3 / 3 the same as ((a1 + a2 / 2) + ((a1 + a2) + a3)/2)/2?

    If it is, then you could simplify the formula you use to find averages of two nearby elements, avoiding integer overflow.

    Just my 2c.

  3. BenG
    February 25th, 2012 at 08:01 | #3

    I had the exact same argument with a Django developer: no type check basically means you can’t get any decent help at code time because you need to actually run the code to know what’s going on.

    So, as an answer, the guy told me :
    Well, if you’re coding well, you should have proper code comment above every function declaration that tells you which are the types of the function parameters (and your IDE should understand those because you’re using standardized code comments), you also need a good documentation that tells you what are all the methods available for a given class, and you should unit test every single function as well.

    => Basically, a good dynamic language developer has to compensate for the lack of compile-time type checking by writing tons of stuff “all around” the code. I say, what a waste of time…

  4. Pat
    February 25th, 2012 at 08:13 | #4

    This article suffers from a combination of very selective examples, and over-generalizations.

    The first code you present fails because of “integer overflow”. All of the languages you mention as examples of dynamic typing use bignums (Lisp, Scheme, Ruby, Python, and Perl though in Perl I believe the user has to enable it), so there isn’t even a concept of integer overflow. All of the statically typed languages you mention (C, C++, Java, ML, and Scala) still suffer from integer overflow problems. So in this case, not only is static typing not the issue, but the statically typed languages happen to be worse at this. I guess that’s why you ignored all 10 of the programming languages you mentioned up to this point and gave the example in R.

    You say “I don’t want to waste my valuable test budget testing for things that a decent type system can defend against”, which is a fair point, but which of C, C++, Java, ML, and Scala would you say has a “decent type system”, if they can’t even do integer multiplication? In any of the 5 dynamic languages mentioned, you don’t need to “defend against” the possibility of 3rd-grade arithmetic.

    You say “most dynamically typed languages go further and force the horrible anti-pattern of automatic (or zero-declaration) variables on us”, but half the dynamically typed languages you listed earlier don’t suffer from this. Again, you chose to demonstrate the example in R, not a dynamic language like Lisp whose compiler would have caught this. This is not a type error but a scope error, a completely orthogonal issue. (Most dynamic languages with loose scoping rules have static checkers that can catch this at build-time for dynamic languages, as well, e.g., there is a Pylint for Python. R seems to be the exception here in that it has neither.)

    You say “My third complaint is the common lack of significant refactoring tools for dynamically typed languages”, which is actually kind of funny since all of your Java refactoring tools are more-or-less direct descendants of the refactoring tools invented for Smalltalk, a dynamic language. You are absolutely right that “the treatment of code as an object subject to a series of meaningful transformations is game changing”, but Lisp (and later Smalltalk) did this, not Java. As Yegge pointed out, dynamic language refactorings are really good, too, and static language refactorings can never be perfect, either.

    What I got from this post is that R somehow unfortunately managed to take the worst aspects of dynamic languages and the worst aspects of static languages, and combine them into one language which static-typing advocates are using to hate on dynamic typing. Can we call a truce, and all just hate on R instead? :-)

  5. February 25th, 2012 at 09:04 | #5

    Really enjoyed your article. One dynamic language that will avoid the `colTypeGlass` pitfall is Groovy, which requires that all variables be initiated with the `def` keyword. Although I’m hesitant to consider Groovy as dynamic as Python or PHP, since it’s really just a convenient wrapper around Java that introduces ideas like closures and MetaClass programming.

  6. kikito
    February 25th, 2012 at 09:24 | #6

    Regarding your first complaint, I would counter that most of the time one doesn’t really care what the type of a particular variable is – just that it has this or that method. When a specific type is needed, dynamically typed languages usually allow casting, or checking that the type is of certain type, or has certain properties. And yes, usually this involves adding more code to the function that needs it. But you only have to add it to the places that need it.

    In your example, if you really wanted to use doubles in that function, then the first thing you should do would be transforming the input parameters into doubles (which, by the way, can also cause an overflow)

    On the other hand, I strongly disagree with the “dynamic typing is hidden debugging debt” part. Bugs provoked by an incorrect type are *very* uncommon – especially if using TDD. I have not seen one in months. And in the 2 or 3 occasions I have found bugs of this kind, they were trivial to fix; not what I would call “lurking”.

    Stronger typing would have saved me maybe 3 bugs in the course of last year, but in total the amount of time I spent fixing them was less than 1 hour. Just the time it takes to type the declarations is easily an order of magnitude above that, at least.

    On your second point: I totally agree. Implicit variable declaration is evil and should not be allowed. Especially if that variable is made global by default. Contrarily to type-related mistakes, typo-related mistakes are fairly common.

    Regarding your last point (IDEs) I don’t particularly enjoy using them – although my customized vim editor is as powerful as one. I think it is easier to make tools for statically typed languages – if a tool needs to know the type of a variable, the programmer has already provided on its definition. A tool dealing with a dynamic language would not have that luxury. There is also the fact that DT languages have been mainstream for less time than STs. The tools have had less time to develop. But that’s not a problem of the paradigm.

  7. February 25th, 2012 at 10:19 | #7

    Hi John,

    You made your point well. I also strongly agree with you. I had the horrible experience of working with people who constantly argue dynamic vs static typing. The one argument I always hear from the dynamic world is “Oh..this problem can be solved in two lines of ruby code”, which I find not very convincing. And much to the disgrace, these days young programmers are getting more into dynamic languages without going through the world of static typing. I believe a person who understands static typing can write better code in dynamic typing. People who started from dynamic typing might find static typing to be very ceremonial. But truth being said, software should be developed to its totality without leaving any unhandled situations.

    For example, if I have to open a file in python, I can do it in 2 lines such as
    f = open("file1.txt", "r")
    d = f.read()

    In java, we all know that this will take more than 10 lines to open and read a file. But once I have written the code in Java, I am sure that it will work all the time irrespective of any errors such as file not found, stream exceptions etc (coz of checked exception handling)

    Good post. I was also planning a topic like this for a long time. You wrote it best.

  8. February 25th, 2012 at 12:15 | #8

    Thanks all for the comments and criticisms (only way to improve). Just a few points I would like to touch on.

    First, the examples seeming artificial. Bugs tend to be subtle and examples need to be clear, so there is a conflict. Both of the bugs (misspelled field name and function not working right over ints) did in fact happen to me. The misspelled field name did not in fact make it to production (so I did overplay this), but that is only because (as some commenters pointed out) I use methods to defend against this sort of bug (in particular fine grain modular design and, always after I finish a procedure or method, causing search to highlight all instances of an important variable so misspellings stand out by their failure to highlight). The second bug (which is an odd one since R is so oriented to doubles) was found in production because an R ODBC driver promoted a variable type to integer based on what it secretly learned from the schema table (oops).

    And yes, Eclipse is largely based on IBM’s SmallTalk IDE, so it is possible (and even historically important) to associate refactoring with dynamically typed languages. I don’t think refactoring really uses the type system for much (but I do think it uses the extra declarations that tend to be a side-effect of a static type system).

    And finally, yes I can live without an IDE. I really enjoy Scala (even though I have never gotten the Scala Eclipse plugin to work reliably). The type inference is nice and I like that every variable is declared as “var” (reference to change) or “val” (reference can not change).

    And sorry about the delay approving comments. These were all great comments, but due to spammers I hold all comments- so they only get approved when I get back online.

  9. February 25th, 2012 at 18:08 | #9

    I have to agree with other folks that R is a bad thing to be comparing to anything. I rarely get type errors in Lisp, Python or Clojure, and when I do, it’s immediately obvious I did something silly. I’ve made fairly big complicated things in these languages, and the dynamic typing generally seems to help. R is a shambling type error waiting to happen. Oh, was that a TS or a vector? I’m going to guess what you meant and not say anything!
    C and C++ take the static typing approach to being a shambling type error waiting to happen. Oh, was that a pointer to something? I’m going to guess you’re being clever. Boom.
    I always wanted to like “statically typed languages with their shit in order” like ML family languages, and can see the advantages to them, but could never get used to the type inferencer.

  10. Markk
    February 25th, 2012 at 21:23 | #10

    Has there been any actual evidence that one style or another typing really leads to less errors per line or per function point or whatever the current metric?

    I have found that the types are actually very bad indicators of how variables are actually used and some kind of value ranges are much better. I personally think dynamic typing plus value ranges would likely be the most error free environment for humans to program in, but I have never seen a language that comes close to this. SQL perhaps might be closest in a sense with foreign keys.

    I do like predeclaration.

  11. an anonymous viewer
    February 26th, 2012 at 10:37 | #11

    Have you tried S3/S4 classes and methods in R?

    variance = function(object){
    UseMethod("variance")
    }
    varianceS4.numeric = function(object){
    ……
    }
    varianceS4.integer = function(object){
    ……
    }

    setMethod(f = "variance", signature(object = "numeric"), definition = varianceS4.numeric)
    setMethod(f = "variance", signature(object = "integer"), definition = varianceS4.integer)

  12. February 26th, 2012 at 12:07 | #12

    I don’t buy your first argument since no type system deals with over/underflows. AAMOF the overflow error you describe can’t be reproduced in strongly typed languages like, at least, C or C++. I must point out though that while the upper layers of your application might take input parameters of any type/class, their return is type/class restricted hence you are “kind of” forcing a type system, that is restricting the type/class your data is.

    The if/then hell you describe is sensible, but let me point you out to the anti-IF campaign (www.antiifcampaign.org) which IMHO makes a good read and deals about the problems you describe.

    Last but not least I’d like to point out that even though R is considered a dynamically-typed programming language, its S4 object system is not as far as method dispatch is concerned, hence we could solve the problems you talk about with a little bit of programming discipline.

    My two cents to an otherwise thought provoking writing.

  13. February 26th, 2012 at 17:28 | #13

    I’ve been mainly writing in Go for the past several months, and one of the things that I enjoy about it is that it is statically typed BUT it does smart things so that you don’t constantly have to type in (and thus read) type names. A simple example:

    func foo(a, b int) string {

    }

    bar := foo(1, 100)

    The last line declares and assigns a new variable bar, but it doesn’t force you to precede it with “string”.

  14. February 26th, 2012 at 18:44 | #14

    @jcborras
    Doubles won’t prevent the overflow (but accept a much larger range than ints before they do this). As others have pointed out you could force the implementation to work in bignums (I think Lisp programmers did this quite often and cluck clucked at C in one of the “worse is better” type articles).

  15. February 26th, 2012 at 18:48 | #15

    @Pat
    I don’t hate R. But it is a great stat package embedded in a very strange language. I do like Python when I get to work with it.

  16. Indra
    February 26th, 2012 at 22:20 | #16

    If your programming language uses dynamic typing, you need to run your code often to make sure you are coding right, i.e. to make sure you are not too far away from the green light. Why is that such a bad thing? You are not sure of your code until you run it anyway, so the sooner you run your code, the better. There are languages and development environments where running code frequently is not an expensive operation: with languages like Ruby, Python and even PHP it is possible. In such cases dynamic typing is a boon as it provides a very rapid form of development. However, there are languages and development environments where running your code frequently is expensive for whatever reasons. In such an environment using a dynamically typed language can be a pita. Again, dynamic typing is not just about not defining the type of a variable in your code; it is much more than that. You need to work with several such languages to understand the productivity it brings to a developer.

  17. mandev
    February 27th, 2012 at 00:36 | #17

    I can’t agree more with you. My personal experience also tends to prove that you lose more time in testing and debugging with a dynamically typed language (vs static). I remember being badly hurt by a typo in a variable name (ex. “Iamination” instead of “lamination”, try to find the error with a sans serif font!). It’s a shame to have to rely on a syntax coloring editor to find this kind of error. Now, except for small scripts or proofs of concept, I will never advise using a dynamic language.

  18. Stephan
    February 27th, 2012 at 01:41 | #18

    After a long time of programming in C++ I changed to Java in 1998 and found that I was about 2-3 times faster in programming, wrote stabler code and had less problems debugging. In 2008 I changed mainly to Ruby and it gave me the same feeling.

    I don’t know if it is due to programming discipline, TDD or the use of Ruby but I have the impression that I have less problems than before as well when it comes to the code stability as also to finding bugs once they occur.

  19. February 27th, 2012 at 02:31 | #19

    You can never say whether dynamic or static typing is better or worse. Don’t feed the fanboys. Each approach simply carries its tradeoffs. Maybe your use cases make it a proper choice to use statically typed language – just use it and be happy, but don’t think you are “absolutely right”.

    Regarding macros (but similar issues can arise with non-macro-aware languages like python): in a dynamically typed language, you must be ready to maximize the old ‘program to an interface, not to an implementation’ motto. Nothing checks whether the passed-in object conforms to such interface; when writing the function, be sure you program to such interface and make it clear what kind of interface the function expects for the argument, be it via comments or class/function names; then it’s the caller’s responsibility to pass the proper object.

    Regarding IDEs, I’d argue that you’re partially right – even though one of the issues with dynamic languages is that they’ve not got such a commercial/enterprise support yet. When they have, often the IDEs turn very good (see RubyMine or PyCharm as examples) – don’t expect the same refactoring support as statically typed languages, of course, but you often don’t need that as writing-rewriting is much faster, and a good test coverage will just do.

  20. ma2bd
    February 28th, 2012 at 15:34 | #20

    Thanks for your nice article John!

    If you are interested in web programming, you should check out Opa: http://opalang.org
    Opa is a statically-typed, functional language to code Web apps in a fast and safe way.

    Opa has been designed to resist XSS attacks automatically using types (just try any code!), whereas most frameworks currently fail (see http://www.cs.berkeley.edu/~prateeks/papers/empirical-webfwks.pdf)

    Recently, a typed MongoDB support (equivalent to an ORM) was added.

  21. February 28th, 2012 at 20:24 | #21

    I am seeing comments on Reddit of the form “you like static types because you are careless.” Maybe so, maybe not. But to my mind a lot of behaviors can be inferred from tastes. And I really find it hard to believe that somebody that finds type precautions so distasteful is that much more careful about the rest of their coding habits (documentation, modularization, design of invariants and testing, to name a few). Perhaps the current trade-offs are good (larger code size leading to more bugs), but I still think careful programmers should want some of the aid a static type system can offer.

  22. Gary
    February 29th, 2012 at 04:16 | #22

    I am with you!

    To me, the only advantage of dynamic typing is its flexibility to add behaviors at runtime. However, to take this advantage, you have to compromise correctness, type-safety and predictability. Not affordable in large software systems!

    Defensive code or tests may alleviate some of the above issues. But how much defensive coding is enough? How much testing is enough? The answer probably is never enough!! And defensive coding is ugly!

    Here is good news for static typing. With loosely-coupling (interface) and dynamic decoration, you can add behaviors at runtime. Please check out: http://www.codeproject.com/Articles/312512/Adapt-To-Changes-With-Dynamic-Behaviors.

    I don’t see a need for dynamic typing as long as static typing can provide the same flexibility!

  23. Gary
    February 29th, 2012 at 04:23 | #23

    Here another link discusses dynamic behaviors with static typing: http://www.codeproject.com/Articles/316325/Dynamic-Behaviors-Or-Dynamic-Typing.

  24. March 2nd, 2012 at 11:56 | #24

    @jmount
    R is a great stats package infected by a very strange language.
    One of my “I’ll never do this but wish I had time to do so” projects would be to figure out a way to call C and Fortran based R packages directly from Python or Lisp (rather than the various round about ways of calling them via encapsulated R).

  25. March 4th, 2012 at 01:12 | #25

    @jmount
    My point is that even strongly typed languages like C or C++ don’t issue runtime errors due to overflows (be they of integer or floating point type), hence a type system might not be the universal solution to the problem you are complaining about.

  26. nicolas
    April 23rd, 2012 at 01:55 | #26

    The best is : dynamic when needed, static where possible. And FSharp recently achieved a fantastic breakthrough, widely underrecognized even by people in the know of the said feature : Type Provider.

    Static types are a metaphor. It is here to pretend the world is static and let you program easily within that metaphor. But the world truly is dynamic, and that is where type providers kick in, building ‘static’ types *on the fly*, that is, from dynamic data. That means your static, and widely enjoyable, metaphor just vastly expanded its domain of reach.

    This for me is the killer feature that will gain importance in every language.
