Fault tolerance

Today’s 0.7.9 release improves the user experience when the local or remote addressbooks fail in unexpected ways.

For the technically curious, here’s a behind-the-scenes look at addressbook failures.


Example 1 - Google permanent failure

The Google Contact API enforces a uniqueness rule when contacts are created, but not when they are retrieved.  So when a Google account contains duplicates (and this does happen), the client gets a set of contacts that it can’t reliably edit.
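A client can at least detect this situation before attempting edits. The following is a minimal sketch, not the Zindus implementation. It assumes contacts are plain dictionaries and that the email address is the field subject to the uniqueness rule; both the `find_duplicates` helper and that choice of key are hypothetical, since the field Google checks is not specified here.

```python
from collections import defaultdict


def find_duplicates(contacts, key="email"):
    """Group contacts by `key` and return only the groups with more than one entry.

    The choice of "email" as the key is an assumption made for illustration.
    """
    groups = defaultdict(list)
    for contact in contacts:
        # Normalise so "Ann@example.com" and "ann@example.com " compare equal.
        groups[contact[key].strip().lower()].append(contact["id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


# Hypothetical data standing in for a retrieved contact set.
retrieved = [
    {"id": "c1", "email": "ann@example.com"},
    {"id": "c2", "email": "bob@example.com"},
    {"id": "c3", "email": "Ann@example.com"},
]
duplicates = find_duplicates(retrieved)
print(duplicates)  # {'ann@example.com': ['c1', 'c3']}
```

Contacts that show up in the result are the ones a client cannot safely edit until the duplicates are resolved.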

Aside from leaving the client in an awkward spot, this behaviour is the opposite of Jon Postel’s Robustness Principle, commonly paraphrased from RFC 793 as:

 be conservative in what you send, be liberal in what you receive

Example 2 - Google transient failure

The Google Contact API allows the first 71 of 83 contacts to be created, then for contact #72 reports:

Internal exception
Error 401

Example 3 - Thunderbird transient failure

A remote contact gets deleted but Thunderbird refuses to delete its local counterpart:

Component returned failure code: 0x80004003 (NS_ERROR_INVALID_POINTER) [nsIAbDirectory.deleteCards]

Fault tolerance

The Zindus addon is designed to work in a sync mesh, and failures at every node in the mesh are inevitable. When a failure is detected, the sync engine has a choice:

  1. give up persistence ==> back out the changes already made to the local and remote addressbooks
  2. give up convergence ==> leave the addressbooks out of sync until a later sync.  This is what the Zindus addon does.
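The two options can be contrasted with a small sketch. This is an illustration under stated assumptions rather than the addon's actual code: an addressbook is modelled as a dictionary, and `SyncError`, `create_contact`, `sync_with_rollback` and `sync_keep_progress` are hypothetical names.

```python
class SyncError(Exception):
    """Hypothetical error raised when an addressbook rejects an operation."""


def create_contact(book, contact_id, name, fail_on=None):
    # `fail_on` simulates a node failing on one particular contact.
    if contact_id == fail_on:
        raise SyncError(f"could not create {contact_id}")
    book[contact_id] = name


def sync_with_rollback(book, operations, fail_on=None):
    """Option 1: on failure, back out every change made during this sync."""
    snapshot = dict(book)
    try:
        for contact_id, name in operations:
            create_contact(book, contact_id, name, fail_on)
    except SyncError:
        book.clear()
        book.update(snapshot)
        return False
    return True


def sync_keep_progress(book, operations, fail_on=None):
    """Option 2: keep what succeeded and return the operations still outstanding."""
    for index, (contact_id, name) in enumerate(operations):
        try:
            create_contact(book, contact_id, name, fail_on)
        except SyncError:
            return operations[index:]
    return []


operations = [("c1", "Ann"), ("c2", "Bob"), ("c3", "Cy")]

book_a = {}
completed = sync_with_rollback(book_a, operations, fail_on="c2")
# book_a is empty again: the successful creation of c1 was backed out.

book_b = {}
outstanding = sync_keep_progress(book_b, operations, fail_on="c2")
# book_b keeps c1; c2 and c3 remain outstanding for a later sync.
```

With the second option the addressbooks are temporarily inconsistent, but no completed work is discarded.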

[Image: statusbar: sync failed]

The Thunderbird statusbar indicates when the addressbook is out of sync.

Most failures are transient and generally don’t require user intervention.  On the next sync, the sync engine picks up from where it left off.  In example 2 above, the engine retries the creation of contact #72, and in example 3 it retries the deletion of the local contact.
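Picking up where the previous sync left off can be sketched with the numbers from example 2. This is a simplified model, not the addon's code: `FlakyRemote` is a hypothetical stand-in for a remote addressbook that fails once on contact #72, and the `"ok"` status value is invented for the example (only the ‘X’ indicator is taken from the statusbar).

```python
class FlakyRemote:
    """Hypothetical remote addressbook that fails once on a chosen contact."""

    def __init__(self, fail_once_on):
        self.contacts = []
        self._fail_once_on = fail_once_on

    def create(self, number):
        if number == self._fail_once_on:
            self._fail_once_on = None  # transient: the retry will succeed
            raise RuntimeError("Internal exception")
        self.contacts.append(number)


def sync(remote, pending):
    """Create pending contacts in order, stopping at the first failure.

    Returns (status, still_pending). Nothing already created is undone.
    """
    while pending:
        try:
            remote.create(pending[0])
        except RuntimeError:
            return "X", pending
        pending = pending[1:]
    return "ok", pending


remote = FlakyRemote(fail_once_on=72)

# First sync: contacts 1..71 are created, then #72 fails.
status_first, pending = sync(remote, list(range(1, 84)))
created_after_first = len(remote.contacts)
pending_after_first = len(pending)

# Next sync: resumes at #72 and finishes the remaining contacts.
status_second, pending = sync(remote, pending)
```

The list of pending operations is the only state carried between syncs, which is why a transient failure needs no user intervention.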

When a failure is permanent, the status indicator remains ‘X’ until the user intervenes to fix it.

Fault tolerance isn’t an exciting subject, but it’s a key ingredient in delivering a reliable and unobtrusive contact sync service to Thunderbird users.


2 Responses to “Fault tolerance”

  1. Dan Mosedale Says:

    Example 3 indicates a bug of some sort. If the extension is passing null into nsIAbDirectory.deleteCards, the bug is in the extension, otherwise the bug is in Thunderbird. If there is a Tb bug here, it would be great if you could file a bug in Bugzilla with more details.

  2. leni Says:

    Dan -

    As suggested, a bug report has been filed against example #3:
    https://bugzilla.mozilla.org/show_bug.cgi?id=451306

    Here are references to the discussion of each of these issues on the relevant developers lists:

    example #1 - Google duplicates:
    http://groups.google.com/group/google-contacts-api/browse_thread/thread/1e745c13924ffc34/

    example #2 - Google Internal exception:
    http://groups.google.com/group/google-contacts-api/browse_thread/thread/2887f1c7314cc646

    example #3 - Thunderbird deleteCards:
    http://groups.google.com/group/mozilla.dev.apps.thunderbird/browse_thread/thread/8100b8dd0d6becce
