Propagation of a leap second

A second may not seem like much time, but it's an eternity for a precision time reference.

Background

At 00:00 UTC on January 1 2006 a leap second was applied to correct for the Earth's unreliable rotation. As the first leap second in seven years, it was particularly interesting. Documented below is how ntpd responded as the leap second propagated through the NTP network.

David Mills (the father of NTP) has documented how NTP is supposed to cope with a leap second here: http://www.eecis.udel.edu/~mills/leap.html.

Usual performance

To set the scene, and establish the usual performance of my personal time server box as a time reference, the distribution of time offset as measured from February to December 2005 is shown below:

Offset Distribution

A Gaussian distribution with the same mean and standard deviation as the offset data has been overlaid to demonstrate the tight clustering of the offset about the mean. In fact, the main body of the distribution can be approximated reasonably well by a Gaussian distribution with standard deviation around one fifth that of the raw data. This demonstrates the effectiveness of ntpd in controlling the offset (and also shows the importance of the assumption of independence in the Central Limit Theorem).
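As a rough illustration of the comparison above, the spread of the raw data can be set against a robust estimate of the spread of the main body. The sketch below is not how the plots on this page were produced. It assumes the offsets are available as a plain list of numbers, and it uses the median absolute deviation (MAD) as the robust estimator, which is a substitute technique chosen for illustration. The function names are hypothetical.

```python
import statistics

def raw_sigma(offsets):
    """Standard deviation of all samples, outliers included."""
    return statistics.pstdev(offsets)

def core_sigma(offsets):
    """Robust spread of the main body of the distribution.

    Uses the median absolute deviation, scaled by 1.4826 so that the
    result matches the standard deviation for purely Gaussian data.
    """
    centre = statistics.median(offsets)
    deviations = [abs(x - centre) for x in offsets]
    return 1.4826 * statistics.median(deviations)

def core_to_raw_ratio(offsets):
    """Ratio of the robust spread to the raw standard deviation."""
    return core_sigma(offsets) / raw_sigma(offsets)
```

For large Gaussian samples the ratio is close to one. A ratio well below one, such as the one fifth noted above, indicates a tight core with heavy tails.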

The distribution of the frequency correction applied by ntpd, measured over the same time period, is shown below:

Frequency Distribution

The temporal behaviour for the week leading up to the event is shown below:

Prior to leap second

The periodic temperature-induced variation is clearly evident, as is the behaviour of ntpd in responding to and correcting for the variation. Note that the monitor was stopped for a few hours late on the 28th, which explains the aberration in the plot.

Leap second comes early!

The leap second was to be applied at 00:00 UTC, which corresponds to 08:00 local time (Perth, Western Australia), so it was a big surprise when I logged on around 7:00 am to find that the leap second correction was already underway! The system log showed the following:

Jan  1 05:50:49 newton ntpd[474]: time reset +1.000078 s
Jan  1 06:00:26 newton ntpd[474]: time reset -0.182042 s
Jan  1 06:26:08 newton ntpd[474]: time reset -0.377780 s

The first correction is about the right size, but it arrived more than two hours early. The peer listing showed the following:

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
xelysium.uwa.edu 130.95.156.8     2 u  110  256   17   20.526  -1012.1   5.293
+murgon.cs.mu.OZ 128.250.33.242   2 u  111  256   17   59.621  -10.942   4.621
*ntp1.cs.mu.OZ.A 128.250.33.242   2 u  112  256   17   62.297  -11.512   4.422
 dns.iinet.net.a 128.250.36.3     2 u  111  256   17   18.325  -15.695   7.171
 202.72.191.202  209.81.9.7       2 u  110  256   17   20.623   -9.824   7.604
 ns.creativecont 130.102.128.43   3 u  109  256   17   94.012  -1004.8   4.782
 b.pool.ntp.uq.e 130.102.152.7    2 u  111  256   17   95.264  -1010.3   4.958
 cazza.aceonline 203.12.160.2     3 u  115  256   17   18.339  -1009.6   4.783

It appears that the Melbourne Uni[1] servers (murgon and ntp1) jumped the gun, and ntpd dutifully followed (as did the iiNet DNS server). The other servers held off. The corrections at 06:00 and 06:26 indicated by the syslog could have been due either to erratic behaviour of the Melbourne Uni servers or to ntpd becoming confused by the large time differential between servers. It is particularly significant that the Melbourne Uni servers are reporting a stratum of 2 and are synchronised with 128.250.33.242. These servers are normally synchronised directly with GPS, as will be shown later.
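The disagreement in the listing above can be made explicit by splitting the peers on their reported offset. The sketch below is illustrative only: the 500 ms cut-off and the function names are assumptions, and the parser handles only the column layout shown here (a tally character followed by ten whitespace-separated fields).

```python
# Peer lines copied from the listing above; the first character is the tally code.
PEER_LINES = (
    "xelysium.uwa.edu 130.95.156.8     2 u  110  256   17   20.526  -1012.1   5.293",
    "+murgon.cs.mu.OZ 128.250.33.242   2 u  111  256   17   59.621  -10.942   4.621",
    "*ntp1.cs.mu.OZ.A 128.250.33.242   2 u  112  256   17   62.297  -11.512   4.422",
    " dns.iinet.net.a 128.250.36.3     2 u  111  256   17   18.325  -15.695   7.171",
    " 202.72.191.202  209.81.9.7       2 u  110  256   17   20.623   -9.824   7.604",
    " ns.creativecont 130.102.128.43   3 u  109  256   17   94.012  -1004.8   4.782",
    " b.pool.ntp.uq.e 130.102.152.7    2 u  111  256   17   95.264  -1010.3   4.958",
    " cazza.aceonline 203.12.160.2     3 u  115  256   17   18.339  -1009.6   4.783",
)

def parse_peer(line):
    """Return (tally, remote, offset_ms) from one line of the peer listing."""
    tally, fields = line[0], line[1:].split()
    # Fields: remote refid st t when poll reach delay offset jitter
    return tally, fields[0], float(fields[8])

def split_by_offset(lines, cutoff_ms=500.0):
    """Split peers into those close to the local clock and those far from it.

    cutoff_ms is an arbitrary illustrative threshold, not an ntpd parameter.
    """
    near, far = [], []
    for line in lines:
        _, remote, offset = parse_peer(line)
        (near if abs(offset) < cutoff_ms else far).append((remote, offset))
    return near, far

near, far = split_by_offset(PEER_LINES)
for label, group in (("near local clock", near), ("about 1 s away", far)):
    print(label, [remote for remote, _ in group])
```

With the listing above, this puts four peers within about 16 ms of the local clock and four roughly one second away: an even split.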

The leap second comes late!

The leap second should have been applied at 08:00 local time, and the syslog does indeed show a correction at that time:

Jan  1 08:01:17 newton ntpd[474]: time reset -0.137771 s
Jan  1 08:29:09 newton ntpd[474]: time reset -1.049228 s
Jan  1 08:48:24 newton ntpd[474]: time reset +0.237410 s
Jan  1 09:11:53 newton ntpd[474]: time reset +0.711462 s
Jan  1 09:49:26 newton ntpd[474]: time reset -0.322314 s

The correction at 08:01 was much smaller than one second, presumably because the Melbourne Uni servers had applied the leap second early and ntpd had already made most of the correction. That any correction was required at all probably reflects ntpd's difficulty in determining the correct time given the inconsistency between servers of equal stratum.

However, the correction at 08:29 is almost completely inexplicable. The full one second leap is rescinded, and then effectively reapplied by subsequent corrections at 08:48 and 09:11. It is equally inexplicable that ntpd discovers a leap second between 09:30 and 09:36. At 09:30, ntpd was reporting as follows:

status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg,
version="ntpd 4.2.0@1.1161-r Thu Feb  5 05:40:00 WST 2004 (2)",
processor="i386", system="FreeBSD/5.1-RELEASE", leap=00, stratum=2,
precision=-18, rootdelay=60.074, rootdispersion=1105.071, peer=25390,
refid=128.250.37.2,
reftime=c761ac1b.e23ae808  Sun, Jan  1 2006  9:17:15.883, poll=6,
clock=c761af39.d56c85cf  Sun, Jan  1 2006  9:30:33.833, state=4,
offset=-0.216, frequency=223.488, jitter=149.300, stability=271.412

Then at 09:36, the following was being reported:

status=46f4 leap_add_sec, sync_ntp, 15 events, event_peer/strat_chg,
version="ntpd 4.2.0@1.1161-r Thu Feb  5 05:40:00 WST 2004 (2)",
processor="i386", system="FreeBSD/5.1-RELEASE", leap=01, stratum=2,
precision=-18, rootdelay=58.947, rootdispersion=270.739, peer=25389,
refid=128.250.37.2,
reftime=c761b063.e1d9cbb8  Sun, Jan  1 2006  9:35:31.882, poll=6,
clock=c761b089.1678c3cf  Sun, Jan  1 2006  9:36:09.087, state=4,
offset=-86.652, frequency=218.845, jitter=74.808, stability=235.061
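The change between the two dumps is carried in the top two bits of the status word. As a sketch of how the hexadecimal value maps onto the decoded text, assuming the system status word layout described in the ntpq documentation (2-bit leap indicator, 6-bit clock source, 4-bit event count, 4-bit event code), the two values can be unpacked as follows. The function name is hypothetical, and the labels for leap values 2 and 3 are descriptive rather than ntpq's own strings.

```python
# Leap indicator values. Labels 0 and 1 match the dumps above;
# labels 2 and 3 are descriptive placeholders, not ntpq output.
LEAP_LABELS = {
    0: "leap_none",
    1: "leap_add_sec",
    2: "delete a second (descriptive label)",
    3: "alarm, clock not synchronised (descriptive label)",
}

def decode_system_status(word):
    """Unpack a 16-bit ntpq system status word into its four fields."""
    return {
        "leap": LEAP_LABELS[(word >> 14) & 0x3],   # top 2 bits
        "source": (word >> 8) & 0x3F,              # next 6 bits
        "event_count": (word >> 4) & 0xF,          # next 4 bits
        "event_code": word & 0xF,                  # low 4 bits
    }

for status in (0x06F4, 0x46F4):
    print(f"{status:04x}", decode_system_status(status))
```

Both words decode to source 6, 15 events and event code 4, which the dumps label sync_ntp and event_peer/strat_chg. Only the leap indicator differs.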

Clearly a leap second had been propagated through the NTP network during this time. Some of the confusion experienced by ntpd is shown below, but unfortunately timestamps for this data are not available.

     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 elysium.uwa.edu 130.95.156.8     2 u   40   64    3   22.422  366.357  19.183
 murgon.cs.mu.OZ .GPS.            1 u   40   64    3   59.378  333.122  21.715
 ntp1.cs.mu.OZ.A .GPS.            1 u   40   64    3   62.878  -131.04  10.996
 dns.iinet.net.a 128.250.36.3     3 u   37   64    3   19.005  1348.96  18.358
 202.72.191.202  128.250.36.2     2 u   38   64    3   18.768  1368.23  18.089
 ns.creativecont 130.102.128.43   3 u   37   64    3   93.595  375.086  19.165
 b.pool.ntp.uq.e 130.102.152.7    2 u   38   64    3   97.585  366.135  20.099
 cazza.aceonline 130.102.2.123    3 u   36   64    3   20.149  366.873  19.246
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+elysium.uwa.edu 128.250.36.2     2 u   31   64  377   18.422  678.806 163.756
*murgon.cs.mu.OZ .GPS.            1 u   39   64  377   58.899  654.355 156.940
xntp1.cs.mu.OZ.A .GPS.            1 u  423 1024   73   62.878  -131.04 457.403
 dns.iinet.net.a 128.250.36.3     2 u  100  128  371   18.553  1425.64 560.389
x202.72.191.202  128.250.36.2     2 u   34   64  377   18.671  1660.36 153.106
+ns.creativecont 130.102.128.43   3 u   37   64  377   92.961  706.476 175.076
-b.pool.ntp.uq.e 130.102.152.7    2 u   40   64  377   94.611  423.284 234.447
+cazza.aceonline 203.12.160.2     3 u   35   64  377   19.034  697.112 173.864
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 elysium.uwa.edu 128.250.36.2     2 u   10   64    1   21.937   30.634   0.004
 murgon.cs.mu.OZ .GPS.            1 u   10   64    1   60.265   28.920   0.004
 ntp1.cs.mu.OZ.A .GPS.            1 u   10   64    1   62.252   16.363   0.004
 dns.iinet.net.a 128.250.36.3     2 u    7   64    1   20.898  -96.024   0.004
 202.72.191.202  128.250.36.2     2 u    9   64    1   19.887  1031.87   0.004
 ns.creativecont 130.102.128.43   3 u    7   64    1   93.766   37.916   0.004
 b.pool.ntp.uq.e 130.102.152.7    2 u    9   64    1   97.337   30.499   0.004
 cazza.aceonline 203.12.160.2     3 u    8   64    1   18.904   27.769   0.004

Note that in each of these dumps, the Melbourne Uni servers are reporting stratum 1 with direct GPS synchronisation as usual.

The aftermath

In the end, what should have been a relatively straightforward one second correction was badly bungled. The plot below shows the offset and frequency correction as the leap second was applied and afterwards.

Post Leap

Comparing this to the earlier plots showing the usual behaviour of ntpd demonstrates how poorly ntpd coped with the application of the leap second. The plot above does not show the full horror of the event, since ntpd reports an offset of zero during periods where it is unsynchronised. Perhaps the most disturbing behaviour is the willingness of ntpd to step backwards (as seen in the syslog entries shown previously). This behaviour can no doubt be controlled by carefully setting options, but the default behaviour is poor. This is especially true since the corrections were in some cases relatively small and should have been applied through slewing.
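To put the step sizes in context: ntpd's documented defaults are a step threshold of 128 ms and a maximum slew rate of 500 ppm. These two figures come from the general ntpd documentation rather than from this server's configuration, so treat them as assumptions here. The sketch below pulls the reset values from the syslog entries shown earlier and estimates how long each would have taken to slew at that rate. The names are hypothetical.

```python
import re

STEP_THRESHOLD_S = 0.128  # assumed ntpd default step threshold, in seconds
MAX_SLEW_PPM = 500.0      # assumed ntpd maximum slew rate, in parts per million

# Syslog entries copied from the two excerpts shown earlier.
SYSLOG_LINES = (
    "Jan  1 05:50:49 newton ntpd[474]: time reset +1.000078 s",
    "Jan  1 06:00:26 newton ntpd[474]: time reset -0.182042 s",
    "Jan  1 06:26:08 newton ntpd[474]: time reset -0.377780 s",
    "Jan  1 08:01:17 newton ntpd[474]: time reset -0.137771 s",
    "Jan  1 08:29:09 newton ntpd[474]: time reset -1.049228 s",
    "Jan  1 08:48:24 newton ntpd[474]: time reset +0.237410 s",
    "Jan  1 09:11:53 newton ntpd[474]: time reset +0.711462 s",
    "Jan  1 09:49:26 newton ntpd[474]: time reset -0.322314 s",
)

RESET_RE = re.compile(r"time reset ([+-]\d+\.\d+) s")

def parse_resets(lines):
    """Extract the signed step size in seconds from each 'time reset' line."""
    resets = []
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            resets.append(float(match.group(1)))
    return resets

def slew_seconds(offset_s, ppm=MAX_SLEW_PPM):
    """Time needed to remove offset_s by slewing at the given rate."""
    return abs(offset_s) / (ppm * 1e-6)

for step in parse_resets(SYSLOG_LINES):
    over = abs(step) > STEP_THRESHOLD_S
    print(f"{step:+.6f} s  over threshold: {over}  slew time: {slew_seconds(step):.0f} s")
```

Under these assumptions every logged reset exceeds 128 ms, the smallest (0.137771 s) only narrowly. Slewing that one at 500 ppm would have taken about 276 seconds, and a full second about 2000 seconds.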

There is compelling evidence that the NTP network itself malfunctioned during this period, but in any case the behaviour is completely unacceptable, and it took around 24 hours for normal behaviour to be restored.

[1] Thanks to David Squire for correcting my earlier error in claiming the mu.OZ servers belonged to Monash University. These servers in fact belong to Melbourne University. 18-04-2006