Reflections on Curly Braces – Apple’s SSL Bug and What We Should Learn From It

Everyone’s shaking their heads

First of all, I assume that by now, everyone who has ever read a single tweet in his/her life has heard about Apple’s instantly infamous “gotofail” bug by now, and most of you have probably already read Imperial Violet’s analysis of it.

To sum up the debacle in short: A duplicate line of code, goto fail;, causes a critical SSL certificate verification algorithm to jump out of a series of validations at an unexpected time, causing a success value to be returned, and thus rendering the service vulnerable to attacks.

Bad. To say the least.

Now it seems that people unanimously agree in blaming missing curly braces around the if statement in this piece of code

if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;

for the entire mess, and the common conclusion from this fiasco is “always put curly braces around your if statements, and this will never happen to you”.

Or will it? I mean, I find it rather curious that everyone seems to blame the mouse, while there’s a giant elephant in the room…

Now let’s look at that code again

Here’s the entire method, taken from Apple’s open source publication:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus        err;
    SSLBuffer       hashOut, hashCtx, clientRandom, serverRandom;
    uint8_t         hashes[SSL_SHA1_DIGEST_LEN + SSL_MD5_DIGEST_LEN];
    SSLBuffer       signedHashes;
    uint8_t            *dataToSign;
    size_t            dataToSignLen;
 
    signedHashes.data = 0;
    hashCtx.data = 0;
 
    clientRandom.data = ctx->clientRandom;
    clientRandom.length = SSL_CLIENT_SRVR_RAND_SIZE;
    serverRandom.data = ctx->serverRandom;
    serverRandom.length = SSL_CLIENT_SRVR_RAND_SIZE;
 
 
    if(isRsa) {
        /* skip this if signing with DSA */
        dataToSign = hashes;
        dataToSignLen = SSL_SHA1_DIGEST_LEN + SSL_MD5_DIGEST_LEN;
        hashOut.data = hashes;
        hashOut.length = SSL_MD5_DIGEST_LEN;
 
        if ((err = ReadyHash(&SSLHashMD5, &hashCtx)) != 0)
            goto fail;
        if ((err = SSLHashMD5.update(&hashCtx, &clientRandom)) != 0)
            goto fail;
        if ((err = SSLHashMD5.update(&hashCtx, &serverRandom)) != 0)
            goto fail;
        if ((err = SSLHashMD5.update(&hashCtx, &signedParams)) != 0)
            goto fail;
        if ((err = SSLHashMD5.final(&hashCtx, &hashOut)) != 0)
            goto fail;
    }
    else {
        /* DSA, ECDSA - just use the SHA1 hash */
        dataToSign = &hashes[SSL_MD5_DIGEST_LEN];
        dataToSignLen = SSL_SHA1_DIGEST_LEN;
    }
 
    hashOut.data = hashes + SSL_MD5_DIGEST_LEN;
    hashOut.length = SSL_SHA1_DIGEST_LEN;
    if ((err = SSLFreeBuffer(&hashCtx)) != 0)
        goto fail;
 
    if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
 
    err = sslRawVerify(ctx,
                       ctx->peerPubKey,
                       dataToSign,                /* plaintext */
                       dataToSignLen,            /* plaintext length */
                       signature,
                       signatureLen);
    if(err) {
        sslErrorLog("SSLDecodeSignedServerKeyExchange: sslRawVerify "
                    "returned %d\n", (int)err);
        goto fail;
    }
 
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
 
}

I find it amusing that the first thing anyone would think when looking at this code is “there should have been curly braces”. To make a point, here’s how that would look:

static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus        err;
    SSLBuffer       hashOut, hashCtx, clientRandom, serverRandom;
    uint8_t         hashes[SSL_SHA1_DIGEST_LEN + SSL_MD5_DIGEST_LEN];
    SSLBuffer       signedHashes;
    uint8_t            *dataToSign;
    size_t            dataToSignLen;
 
    signedHashes.data = 0;
    hashCtx.data = 0;
 
    clientRandom.data = ctx->clientRandom;
    clientRandom.length = SSL_CLIENT_SRVR_RAND_SIZE;
    serverRandom.data = ctx->serverRandom;
    serverRandom.length = SSL_CLIENT_SRVR_RAND_SIZE;
 
 
    if(isRsa) {
        /* skip this if signing with DSA */
        dataToSign = hashes;
        dataToSignLen = SSL_SHA1_DIGEST_LEN + SSL_MD5_DIGEST_LEN;
        hashOut.data = hashes;
        hashOut.length = SSL_MD5_DIGEST_LEN;
 
        if ((err = ReadyHash(&SSLHashMD5, &hashCtx)) != 0) {
            goto fail;
        }
        if ((err = SSLHashMD5.update(&hashCtx, &clientRandom)) != 0) {
            goto fail;
        }
        if ((err = SSLHashMD5.update(&hashCtx, &serverRandom)) != 0) {
            goto fail;
        }
        if ((err = SSLHashMD5.update(&hashCtx, &signedParams)) != 0) {
            goto fail;
        }
        if ((err = SSLHashMD5.final(&hashCtx, &hashOut)) != 0) {
            goto fail;
        }
    }
    else {
        /* DSA, ECDSA - just use the SHA1 hash */
        dataToSign = &hashes[SSL_MD5_DIGEST_LEN];
        dataToSignLen = SSL_SHA1_DIGEST_LEN;
    }
 
    hashOut.data = hashes + SSL_MD5_DIGEST_LEN;
    hashOut.length = SSL_SHA1_DIGEST_LEN;
    if ((err = SSLFreeBuffer(&hashCtx)) != 0) {
        goto fail;
	}
    if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0) {
        goto fail;
    }
    if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0) {
        goto fail;
    }
    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0) {
        goto fail;
    }
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) {
        goto fail;
    }
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0) {
        goto fail;
    }
 
    err = sslRawVerify(ctx,
                       ctx->peerPubKey,
                       dataToSign,                /* plaintext */
                       dataToSignLen,            /* plaintext length */
                       signature,
                       signatureLen);
    if(err) {
        sslErrorLog("SSLDecodeSignedServerKeyExchange: sslRawVerify "
                    "returned %d\n", (int)err);
        goto fail;
    }
 
fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
 
}

Can you spot the error? I mean: That, obviously, would have made one hell of a difference, wouldn’t it?

What seems to be the problem?

There’s no arguing around the fact that the duplicate line of code is a horrible, horrible bug. But as to how it got there, I beg to differ from the conclusion that everyone else seems to agree on:

Those braces aren’t at fault. A lazy programmer is.

Alright, maybe not entirely. A minor part in this mess can be attributed to the IDE (Xcode, I would assume) not catching the fact that a sizeable portion of the code is unreachable. A modern IDE should really show a warning in such cases, and as Peter Nelson points out, even Xcode seems to have an option for that, though it isn’t on by default – strangely enough, I might add.

But how do we fix it?

Now what can we learn from this? Here’s a number of things we can do to avoid this kind of disaster:

  1. Just try the damn thing and see if it works

    Duh. I mean: really. Why wouldn’t you? And since the purpose of this code is obviously not to allow a key exchange, but rather to deny it, if anything is not according to protocol, you should be throwing all the stuff that could be forged at it, not just valid values.

  2. Write an automated test

    This is the next obvious step, and like the manual test, it should verify all the possible ways the certificate validation could fail, first of all. Landon Fuller wrote up an example to show it is quite possible to run an integration test against this method.

  3. Have someone else review your code

    Another obvious one. This is a hugely critical piece of code at a very, very exposed position in a fundamental part of an operating system – no way this should have ever seen the light of day without at least a second pair of eyes having a look at it. No. Way.

  4. Pair program

    One step up from code reviews: Two brains are smarter than one. Four eyes see more than two. Your code will instantly get better in every way if you agree to share ownership of it. Even if you overlook something like this when hacking away at your code, your pairing partner most likely won’t. Also, they might have better ideas on how to do things, such as:

  5. Conditionals should express what you actually want to check

    That, to me, is one of the most valuable pieces of advice you can take from Uncle Bob Martin:

    If-statements should encapsulate code that is executed only when a condition is true -
    not jump out of an algorithm or method, if otherwise.

    In this case, instead of employing if(err != 0) and what looks like ten million goto fail; commands, the broken part of the method should have checked for (err == 0), and thus looked at least like this:

    if ((err = SSLFreeBuffer(&hashCtx)) == 0)
         if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) == 0)
              if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) == 0)
                   if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) == 0)
                        if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) == 0)
                             err = SSLHashSHA1.final(&hashCtx, &hashOut);
     
    if (err) 
        goto fail;

    which, then, can be simplified even further to

    if ((err = SSLFreeBuffer(&hashCtx)) == 0 &&
        (err = ReadyHash(&SSLHashSHA1, &hashCtx)) == 0 &&
        (err = SSLHashSHA1.update(&hashCtx, &clientRandom)) == 0 &&
        (err = SSLHashSHA1.update(&hashCtx, &serverRandom)) == 0 &&
        (err = SSLHashSHA1.update(&hashCtx, &signedParams)) == 0 )
            err = SSLHashSHA1.final(&hashCtx, &hashOut);
     
    if (err) 
        goto fail;

    Notice how this kind of structure shows what we really want to do: Execute a sequence of steps, and proceed to the next step only if the current step did not return an error. If it did, then any subsequent statements won’t execute, and if err is not 0 after the whole block, there’s one goto fail;, which also states the programmer’s original intent more precisely: If anything went wrong, exit the method.

  6. Don’t copy and paste code

    The most blatant thing I noticed when I glanced over the rest of the source file that contains the bug is the amount of duplicate or nearly duplicate code that can be found. Clearly, someone tried to go the easy way and copied/pasted the same code all over the place. I found minor variations of the defective if-goto-sequence

        if ((err = SSLFreeBuffer(&hashCtx)) != 0)
            goto fail;
        if ((err = ReadyHash(&SSLHashSHA1, &hashCtx)) != 0)
            goto fail;
        if ((err = SSLHashSHA1.update(&hashCtx, &clientRandom)) != 0)
            goto fail;
        if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
            goto fail;
        if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
            goto fail;
        if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
            goto fail;

    in at least 5 places – all virtually identical, and all equally horrid; a definite sign of gratuitous and unreflected use of copy and paste.

    In fact, you can eliminate the bug, its cause and about a third of the code in the SSLVerifySignedServerKeyExchange() method by extracting this rather uniform sequence of calls to a HashReference into its own method:

    static OSStatus SSLApplyHash( const HashReference *hash,
                                 SSLBuffer *hashCtx,
                                 SSLBuffer *clientRandom,
                                 SSLBuffer *serverRandom,
                                 SSLBuffer *signedParams,
                                 SSLBuffer *hashOut) {
        OSStatus        err;
        if ((err = SSLFreeBuffer(hashCtx)) == 0 &&
            (err = ReadyHash(hash, hashCtx)) == 0 &&
            (err = hash->update(hashCtx, clientRandom)) == 0 &&
            (err = hash->update(hashCtx, serverRandom)) == 0 &&
            (err = hash->update(hashCtx, signedParams)) == 0 )
            err = hash->final(hashCtx, hashOut);
        return err;
    }

    which can then be called from anywhere using a single line, such as:

    err = SSLApplyHash(&SSLHashMD5, &hashCtx, &clientRandom, &serverRandom, &signedParams, &hashOut);

    I bet that would eliminate at least 50 lines of crappy code from the source file.

    [Update]
    As pg (see comments section) points out, cleaning up this part of the code should have gone even further. Please refer to his very good post to find out just how far.

  7. Make your methods do only one thing

    If you put it in natural language, the SSLVerifySignedServerKeyExchange() method “does a lot of magical things with hashes (no need to get into details), which need to be done in a specific order, but only if all steps in the sequence run without error, and with slight differences depending on which kind of keys are used, and log errors otherwise.”
    Clearly, that is quite a lot more than one thing. And that is a very distinct indicator that it should be significantly shorter, and more precise.
    To me, it should really be split up into several individual methods:

    • The above-mentioned SSLApplyHash utility method
    • A utility method to initialize SSLBuffer variables from the passed in context data, to be used somewhat like this:
      SSLBuffer clientRandom = SSLRandomBuffer( ctx->clientRandom );
    • Two more methods to encapsulate the RSA and non-RSA execution paths
    • Finally, the original API method, which should basically only decide whether to execute the RSA or non-RSA paths, and write the log message in case of an error.

    Apart from getting rid of a lot of code duplication (not just within this method, but in many other places within the file, as well), this would make the algorithm a lot more readable and thus overlooking errors far less likely.

  8. Use readable, descriptive variable names

    Try to explain the meaning of these without reading the context of the surrounding code:
    hashes,
    hashOut,
    hashCtx,
    ctx,
    clientRandom,
    serverRandom,
    signedParams,
    dataToSign
    .

    Wouldn’t it have been a lot more understandable to call them, say,
    appliedHashResults,
    hashOutputBuffer,
    tmpHashContext,
    sslContext,
    randomClientValueForSharedSecret,
    randomServerValueForSharedSecret,
    messageToSign
    ?

    And these were only the first quick ideas I came up with while writing this paragraph, not even a well-reflected choice of names that originated from hours of work on the code that contains these variables…

Conclusion / tl;dr

To make it clear once again: This should never have happened.

It shouldn’t have happened, because this bug is so obvious, that any reasonable amount of scrutiny, which should always be applied to critical pieces of code, will catch it.

It shouldn’t have happened, because even if the humans missed it, the tooling should have complained.

It shouldn’t have happened, because a simple test would have shown that the code never actually did what it was intended to do.

But first and foremost, it shouldn’t have happened, because making the algorithm terse and readable would have forced the programmer to think about flow, structure and intent of his/her code more thoroughly, and at that point, the superfluous goto fail; would have stuck out like a sore thumb.

There’s only one thing to blame here, and that has nothing to do with code style (as which I would consider where the brackets and braces go, whether to add empty lines, etc. etc.), but rather with craftsmanship and professional attitude.

And I have to say it: It’s not worthy of a company like Apple, which prides itself to sell only the highest quality products, to produce such sloppy, unchecked, untested and obviously uncared for code, least of all in a critical part of the foundations of the operating system that runs on all of its devices.

  • Facebook
  • Delicious
  • Digg
  • StumbleUpon
  • Reddit
  • Blogger
  • LinkedIn
Tobias Goeschel

17 Responses to Reflections on Curly Braces – Apple’s SSL Bug and What We Should Learn From It

  1. Dirk Steins says:

    I have to disagree here a little bit. I agree that mostly this has nothing to do with code style and is a horrible, horrible bug.

    But if we assume that this is a merge bug, as has been told, then your constructed example with braces is wrong.

    It would have been


    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0) {
    goto fail;
    goto fail;
    }

    and not the version you posted above (or it would have been a very, very strange merge bug). And in this case the coding style with braces would have helped.

    Using a coding style without GOTO would have helped much, much more, but that is another discussion.

    • Tobias Goeschel says:

      But if we assume that this is a merge bug, as has been told, then your constructed example with braces is wrong.

      I assume it’s a copy/paste bug, rather than a “merge” bug. They changed the method signatures, and since the code sequence appears in many places, someone must have copied the changes, obviously not very carefully.
      But the way I constructed my braces was intentional, because I was making a point: “Always put braces” is not a solution, if you make a logical mistake, and/or if you just have leftover code in places where it shouldn’t be, because you were careless. You won’t even notice at first glance. The only way to prevent such mistakes is to make your code terse and readable, regardless of a particular style.

      I do agree about GOTO statements – and also that they’re part of another discussion ;)

    • ctd says:

      Decades after the famous observation, “goto” still considered harmful.

  2. Ted says:

    Personally, I think its more a meta issue in security and nerds being nerds, always optimistic.

    Nerds are optimistic with my security like this:
    ‘return success unless failure’

    Security should always be pessimistic:
    ‘return failure unless success’

    • Tobias Goeschel says:

      I agree that this would have been a much better approach to the original problem. But that’s really beside the point.

      “It works” is an absolute minimum for any piece of code – and that wasn’t the case. “It’s tested” is important to prove that “it works”. “It’s reviewed” is a sane approach to delivering a product, in any case, and “it’s clean” helps with all of the above. That’s really all I’m saying, and I believe that applies to just about anything that’s ever delivered to a customer, regardless of the technical context.

  3. Walt Howard says:

    The major problem here, and one I see a lot of programmers make is not even trying their code. This would have been caught with the slightest testing.

    Didn’t you even try (minimal test) your code?

    And I disagree about hugely nested positive if statements. Those increase complexity. Doing it that way requires holding multiple conditions in your mind while coding. Bailing after a failure is much simpler, you can bring your entire thought process too bear on each test without being concerned about any other condituon.

    I would have thrown exceptions (C++) or returned an error code instead of having a fail label.

    • Tobias Goeschel says:

      The major problem here, and one I see a lot of programmers make is not even trying their code. This would have been caught with the slightest testing.

      Absolutely. That’s why it’s point 1. of my suggestions on how to fix this kind of bug.

      And I disagree about hugely nested positive if statements. Those increase complexity. Doing it that way requires holding multiple conditions in your mind while coding. Bailing after a failure is much simpler, you can bring your entire thought process too bear on each test without being concerned about any other condituon.

      The idea is to first make it work, but then not to abandon the code, but make it terse and readable. If your way of thinking works better with early returns and such, then use them on the first implementation, until it works correctly – just don’t leave the code ugly and unreadable afterwards, but extract, refactor and clean up as much as you can.

      TDD would completely cover this, of course, because you’d write one test at a time, then implement, then refactor.

      I would have thrown exceptions (C++) or returned an error code instead of having a fail label.

      Obviously, C++ is not an option in this case. And returning an error code is what the methods below the fail label are supposed to do. But either way: That’s an implementation detail, and if everything else were done nicely, it wouldn’t make much of a difference, apart from taste.

  4. Vivek says:

    Wouldn’t any compiler worth its salt give a “Unreachable code” warning?

  5. pg314 says:

    Nice write-up. You are the first one I see that seems to go to the heart of the matter: that the code could be a lot better by extracting SSLApplyHash.

    You can still go further, as I describe here. The crux of the matter is that init/update/final should not return an error code. They should not be allowed to fail.

    • Tobias Goeschel says:

      Thanks – yours is not so bad, either :)

      I chose to not go into further details about the way the hashes are implemented, because I feared the article would have become longer and longer, and that would it have distracted from my main argumentation about craftsmanship and professionalism. I’ll update my text and add a link to your post, though – good job!

  6. Thomas Wharton says:

    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)

    Not to nitpick, but those are not curly braces — they are parenthesis.

    This style of coding is somewhat dubious. There is no compelling reason why the “err = …” cannot be on its own line. Also, the use of whitespace might have made the error more obvious.

    • Tobias Goeschel says:

      Hi Thomas,
      you are right. These: () are parentheses.

      These: {}, however, are curly braces. The article and the initial discussion are about the fact that they were omitted from the if statement.

      One could argue that assignments within boolean expressions are not a particularly good idea, but it’s a widely used style, and I don’t see much benefit in trading the ability to chain all those statements together into one single if for 6 additional lines of code per similar block. I doubt it would have made things more readable.

      Whitespace was indeed used. I’m afraid I don’t understand what you were trying to say there – can you elaborate a bit more?

  7. Unknown says:

    Can this really be considered this a bug? It seems more like a mistake or serious oversight due to the programmer’s negligence. IF statements have always worked with or without curly braces, and the code represents normal operation in an unintended way.

    • Tobias Goeschel says:

      To be honest, I don’t really know how to reply to this. The code does not represent normal operation in an unintended way – it simply does not do what it is supposed to do.
      And yes, it is a mistake AND a serious oversight.

      What would you consider a bug?

  8. ilya says:

    Here’s a different bug where the braces are used:

    if((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0); {

    • ilya says:

      …posted before I had a chance to finish:

      ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0); {
      goto fail;
      }

      …that semicolon at the end of the ‘if’ can really wreak havoc. So, on one level, the only answer seems to be doing code review, where someone can look at the code, without having any knowledge or context, and still spot a problem.

      On the other hand, the multiple goto’s is not a very a good framework. Neither is the complex ‘if’ statement. I’ll propose the following framework for a multi-condition check:

      for( step = 0; step != -1 && ! err; step++) {
      switch(step) {
      case 0:
      err = SSLFreeBuffer(&hashCtx);
      break
      case 1:
      .
      .
      default:
      step = -1;
      }

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>