Posted on Categories Coding, RantsTags ,

More on “npm” leftpad

Being interested in code quality and software engineering practice I have been following (with some relish) the current Javascript tempest in a teapot: “NPM & left-pad: Have We Forgotten How To Program?” (see also here for more discussion).


NewImage
Image: Ben Halpern @ThePracticalDev

What happened is:

  1. A corporate site called NPM decided to remove control of a project called “Kik” from its author and give it to a company that claimed to own the trademark on “Kik.” This isn’t actually how trademark law works or we would see the Coca-Cola Company successfully saying we can’t call certain types of coal “coke” (though it is the sort of world the United States’s “Digital Millennium Copyright Act” assumes).
  2. The author of “Kik” decided since he obviously never had true control of the distribution of his modules distributed through NPM he would attempt to remove them (see here). This is the type of issue you worry about when you think about freedoms instead of mere discounts. We are thinking more about at this as we had to recently “re-sign” an arbitrary altered version of Apple’s software license just to run “git status” on our own code.
  3. Tons of code broke because it is currently more stylish to include dependencies than to write code.
  4. Egg is on a lot of faces when it is revealed one of the modules that is so critical to include is something called “leftpad.”
  5. NPM forcibly re-published some modules to try and mitigate the damage.

Everybody is rightly sick of this issue, but let’s pile on and look at the infamous leftpad.

Here is leftpad (copied from the source):

module.exports = leftpad;
function leftpad (str, len, ch) {
  str = String(str);
  var i = -1;
  if (!ch && ch !== 0) ch = ' ';
  len = len - str.length;
  while (++i < len) {
    str = ch + str;
  }
  return str;
}

It is going to be a bit long (as it is library code and has to check for more exceptional cases). It appears to be dealing with issues of ch being missing or masquerading as an integer (which I am not sure it gets right- the behavior is may to be pretty nasty if 0 is accidentally passed in instead of '0' for ch). I am sure the author of leftpad is in fact an accomplished and high ability developer, but this one instances just doesn’t look like good code.

But let me show you how padding the word “string” to have total length 10 (by adding 4 zeros) is done in a couple of other languages (one of the reasons authors in these languages would never consider importing something for this purpose).

  • Python:
    
    v = 'string'
    '0'*(10-len(v))+v
    
  • R:
    
    v <- 'string'
    paste(paste(rep('0',max(0,10-nchar(v))),collapse=''),
       v,sep='')
    

Obviously we can wrap these in a function.

And the Javascript has the classic bit of code that costs you the job during a Google style interview:

 while (++i < len) {
    str = ch + str;
 }

Now I don’t code in Javascript, and I know this doesn’t matter for small len: but this snippet looks bad to the educated eye. Unless there is some clever optimization we don’t know about (such as a compiler visibility analysis replacing the commonly immutable Javascript string with some sort of string builder) the code is inefficient in that it likely takes Omega(len^2) time (often colloquially mis-called O(len^2) time). Now there are some cases where what looks like inefficient code is replaced by efficient code (due to a clever escape analysis, the R language supplies such facilities as we discuss here). But I doubt that is the case here. If it is the case (the code performs differently than it superficially appears), that is important to point out in comments or documentation.

This seems to indicate some users of this code likely never looked at it. So they really had no basis for having a direct opinion as to its quality. I assume Javascript authors use dependencies so much because they are tired from the automatic semi-colon insertion wars (I won’t even link to that).

I can, given a little time, write a reliable efficient string pad function. If one’s argument for including a package for this is that you can not be so trusted, then you really should spend some more time practicing until you can. Nobody can safely navigate a large and changing dependency system reliably over time, so you bring in dependencies when the unavoidable future risk is worth it (such as for reliable date/time classes, databases, and such). Bringing in dependencies because “everybody else does it” isn’t “best practice” it is “cargo cult development.” Yet defending code and practices like the above seems to be a hill a number of people are eager to die on.

I will, also, check if a library I already know and trust brings in a good version of a function I want to use. For example in R I tend to use stringr::str_pad for my string padding. This does bring in a lot of dependencies, but it brings them in all at once and brings in the benefits of trusted faster C++ code.

I am not saying I don’t write bad code (I unfortunately sometimes do), I am saying I find it more productive and less effort to fix such code than to concoct complicated defenses of such code.

It has been our experience that you can find defenders for any bad code or practice under the “its all subjective” rubric. However it is not all subjective, there are actually a few principles of code and development quality which you ignore at your own peril.

4 thoughts on “More on “npm” leftpad”

    1. A good point, but I’d still want to double check if it does optimize for the “str = ‘0’ + str” pattern that is being used here (as “+=” optimizations tend to apply to “str = str + ‘0’” patterns).

      And the runtime does appear to be super-linear (bad) at least for large values on Safari OSX (admittedly not the best browser/Javascript in the world). Also for any reasonable real application I know you are not going to see a slowdown. But why not write it the right way (especially in a library!)?

      <html>
      <head>
      <script>
      function leftpad (str, len, ch) {
        str = String(str);
        var i = -1;
        if (!ch && ch !== 0) ch = ' ';
        len = len - str.length;
        while (++i < len) {
          str = ch + str;
        }
        return str;
      }
      function timek(len) {
        t0 = (new Date()).getTime();
        leftpad('string',len,'.');
        t1 = (new Date()).getTime();
        return "<tr><td>" + len + "</td><td>" + (t1-t0) + "</td></tr>";
      }
      </script>
      </head>
      <body>
      <script>
      document.write("<table>")
      document.write(timek(1000000))
      document.write(timek(2000000))
      document.write(timek(5000000))
      document.write(timek(10000000))
      document.write(timek(20000000))
      document.write(timek(50000000))
      document.write(timek(100000000))
      document.write(timek(200000000))
      document.write(timek(500000000))
      document.write("</table>")
      </script>
      </body>
      </html>
      

      Timing on Safari OSX (though at this size the slow down could be from memory allocation rules or many other sources).

      # R code, show square fits better than linear.
      # less that 1/10th the residual error.
      d = data.frame(
        size = c(
          1000000,
          2000000,
          5000000,
          10000000,
          20000000,
          50000000,
          100000000,
          200000000,
          500000000
        ),
        time = c(40, 76, 185, 355, 722, 1836, 3900, 12201, 70511)
      )
      
      summary(lm(time~size,data=d))
      
      # Call:
      #   lm(formula = time ~ size, data = d)
      # 
      # Residuals:
      #   Min     1Q Median     3Q    Max 
      # -11579  -1518   2448   3259   5879 
      # 
      # Coefficients:
      #   Estimate Std. Error t value Pr(>|t|)    
      # (Intercept) -3.455e+03  2.358e+03  -1.466    0.186    
      # size         1.362e-04  1.285e-05  10.599 1.46e-05 ***
      #   ---
      #   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      # 
      # Residual standard error: 5963 on 7 degrees of freedom
      # Multiple R-squared:  0.9413,	Adjusted R-squared:  0.933 
      # F-statistic: 112.3 on 1 and 7 DF,  p-value: 1.456e-05
      
      summary(lm(time~I(size*size),data=d))
      
      # Call:
      #   lm(formula = time ~ I(size * size), data = d)
      # 
      # Residuals:
      #   Min      1Q  Median      3Q     Max 
      # -504.42 -366.15  -99.41  446.26  591.20 
      # 
      # Coefficients:
      #   Estimate Std. Error t value Pr(>|t|)    
      # (Intercept)    5.441e+02  1.696e+02   3.208   0.0149 *  
      #   I(size * size) 2.803e-13  2.008e-15 139.549 2.56e-13 ***
      #   ---
      #   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      # 
      # Residual standard error: 466.7 on 7 degrees of freedom
      # Multiple R-squared:  0.9996,	Adjusted R-squared:  0.9996 
      # F-statistic: 1.947e+04 on 1 and 7 DF,  p-value: 2.56e-13
      

      However, the Python code is much faster (also in different units) and shows more linear looking time complexity. Though obviously for more certainty we would want to repeat many more runs.

      import time
      
      def leftpad(str,lengoal,ch):
          return(ch*(lengoal-len(str)) + str)
      
      def timek(len):
          start = time.time()
          leftpad('string',len,'.')
          end = time.time()
          return("" + str(len) + "," + str(end-start))
      
      print('size,time')
      print(timek(1000000))
      print(timek(2000000))
      print(timek(5000000))
      print(timek(10000000))
      print(timek(20000000))
      print(timek(50000000))
      print(timek(100000000))
      print(timek(200000000))
      print(timek(500000000))
      
      
      size,time
      1000000,0.0010819435119628906
      2000000,0.0017080307006835938
      5000000,0.00436091423034668
      10000000,0.007009983062744141
      20000000,0.012006044387817383
      50000000,0.03125810623168945
      100000000,0.06902599334716797
      200000000,0.24092912673950195
      500000000,0.4371521472930908
      

  1. Good for you for actually testing it. But the original library was for node.js so that would be the most relevant test. It appears that V8 does optimize prepend or at least did a few years ago [1]. So in the environment it was intended for the code might not be as bad as you think.

    It might also be possible to do better. Recent browsers have a String.repeat function [2] which might result in something more like Python performance.

    There is also a polyfill at the bottom of that page (for browsers that don’t have repeat) with a different strategy that looks like it would do a logarithmic number of appends.

    [1] https://gist.github.com/mraleph/3397008
    [2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/repeat

Comments are closed.