The code below computes the Lilliefors test for exponentiality. It uses the large-N approximations from Lilliefors' paper and then linearly interpolates between the tabulated values to find an approximate p-value.

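lilliefors.exp.test.stat <- function(x) {
  n <- length(x)
  i <- 1:n
  xi <- sort(x)
  # Scale by the sample mean, i.e. use the estimated rate 1 / mean(x).
  zi <- xi / mean(x)
  # Fitted exponential CDF at each ordered observation.
  ti <- 1 - exp(-zi)
  # Distances to the ECDF just after (t1) and just before (t2) each jump.
  t1 <- (i / n) - ti
  t2 <- ti - ((i - 1)/n)
  D <- max(abs(t1),abs(t2))
  return(D)
}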

lilliefors.exp.test <- function(x) {
  D <- lilliefors.exp.test.stat(x)
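  # Cumulative probabilities and the matching tabulated large-N
  # critical values of sqrt(n) * D.
  p <- c(0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9, 0.95, 0.99)
  num <- c(0.4550, 0.4959, 0.5530, 0.6, 0.6898, 0.7957, 0.8678, 0.9773, 1.0753, 1.2743)
  num <- num / sqrt(length(x))
  lo <- D >= num
  # Strict inequality keeps nlo != nhi when D equals a tabulated value,
  # which would otherwise make the slope below 0/0.
  hi <- D < num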
  if (sum(hi) == 0) {
    p <- 0
    warning("p-value too small to interpolate")
  } else if (sum(lo) == 0) {
    p <- 1
    warning("p-value too large to interpolate")
  } else {
    nlo <- max(num[lo])
    nhi <- min(num[hi])
    plo <- max(p[lo])
    phi <- min(p[hi])
    m <- (phi - plo) / (nhi - nlo)
    b <- plo - (nlo * m)
    p <- m * D + b
    p <- 1 - p
  }
  names(D) <- "Dmax"
  structure(list(statistic = D,
                 p.value = p,
                 method = "Lilliefors test for exponentiality",
                 data.name = deparse(substitute(x))),
            class = "htest")
}

I am finally dragging myself out of the dark ages and starting to use RSS. My real reason for doing so is to try to keep up with publications from various groups. In particular, I really want a feed from various large publishers (e.g. ACM and IEEE) that notifies me when an interesting (to me) journal or conference proceeding is available.

First, the good news: the IEEE Computer Society provides a nice set of feeds for new entries into their digital library: one for journals and another for conference proceedings. However, both feeds have bugs which make them unusable (at the moment) with Google Reader.

The proceedings feed has too many tags… one of which is completely bogus. Which one does Google Reader choose you might ask? The bogus one, which means the links don’t actually work. The journals feed only has one tag… and it’s bogus. Luckily it has a valid tag… which Google Reader ignores because there is a PermaLink. *sigh* At least the data is available.

On a similar topic, kudos to the IEEE Computer Society for providing feeds for the contents of their magazines and journals!

Now, the ACM on the other hand does NOT produce a feed of additions to their library. However, they do post new entries on the ACM Digital Library home page. Why not make a feed available? Dunno, luckily it’s easy enough to scrape.

So, for your enjoyment (and mostly mine), feel free to subscribe to my edited feeds from the IEEE Computer Society (journals and proceedings) and additions to the ACM Digital Library. The feeds are updated once per day (early in the morning MST), and they will cease to be when 1) the respective groups fix their feeds (or start providing them) or 2) they break. YMMV, void where prohibited, etc.

EDIT: I reworked the IEEE CS feeds to dynamically edit the content. This makes more sense than caching the edited copy. So, two new links for conferences and journals.

New Telescope

I am an amateur astronomer (heavy emphasis on the amateur). My wife and I went shopping the other day at a local “variety” store (it’s more of a pawn or consignment shop). Anyway, while meandering through the junk piles, we spotted a Meade ETX-70AT with a $50 price tag.

This scope is somewhat more portable than either of my 4.5″ reflectors. Now, if I could just find a night with clear skies AND time to observe. While waiting for that I ran across three excellent books that should be on any amateur (emphasis on the amateur) astronomer’s shelf.

The first is “The Stars” by H. A. Rey (creator of Curious George), and the second is “Turn Left at Orion” by Guy Consolmagno and Dan M. Davis. More about those later.

While doing research on my thesis, I began to look for various tests of statistical distributions. I am familiar with χ², but this test requires a (seemingly) arbitrary choice of the number and shape of bins. Essentially one takes the observed number of items falling in a range and compares that with the expected number (from the proposed distribution). The width and number of bins are up to the experimenter. Also, I was dealing with very steep curves (they look like an exponential decay). χ² requires the expected value in a bin to be no less than 5 (the more the merrier). This severely limits the number of possible bins for the test given my empirical distributions.
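To make the bin problem concrete, here is a small sketch of the expected counts for a steeply decaying distribution. The sample size, rate, and bin edges are made-up values for illustration, not numbers from my thesis data.

```r
# Hypothetical setup: 100 observations from an exponential with rate 1,
# binned into seven unit-width bins plus one open-ended tail bin.
n <- 100
rate <- 1
breaks <- c(0:7, Inf)

# Expected count per bin = n * probability mass of the bin.
expected <- n * diff(pexp(breaks, rate = rate))
round(expected, 2)

# Number of bins that satisfy the "expected count of at least 5" rule.
usable <- sum(expected >= 5)
usable
```

With these assumed values only the first three bins pass the rule, so most of the tail has to be merged into one bin before the test can be applied.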

So, I took a trip to the library and found Conover’s book Practical Nonparametric Statistics. In that volume, I found descriptions of nonparametric tests such as the Kolmogorov-Smirnov (KS) test. This test checks the maximum vertical distance between an empirical cumulative distribution function (ECDF) and a fully specified test distribution.

The key words here are “fully specified”. In this context it means that you can test whether a sample comes from, for example, an exponential distribution with λ=2. If that rate parameter (λ) is computed from the sample, the KS test is invalid.
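As a rough illustration of the distinction, the sketch below uses R's built-in ks.test on simulated data. The sample size, seed, and rate are arbitrary choices for the example.

```r
set.seed(1)
x <- rexp(50, rate = 2)

# Valid use: the rate is fixed in advance, independent of the sample.
fixed <- ks.test(x, "pexp", rate = 2)

# Invalid use: the rate is estimated from the same sample, so the
# p-value reported here no longer has its nominal meaning.
estimated <- ks.test(x, "pexp", rate = 1 / mean(x))

fixed$p.value
estimated$p.value
```

Both calls run without complaint, which is exactly the trap: nothing in the output warns that the second p-value was computed against the wrong null distribution.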

Lilliefors, on the other hand, modified the KS test for cases where the parameters of the proposed distribution are estimated from the sample. The advantage here is that, based simply on the sample data, the distribution can be tested for normality or exponentiality. Unlike χ², there are no bin widths or counts for the experimenter to manipulate; the entire test is computed from the sample, i.e. it is nonparametric.

The original article by Lilliefors gives a table of p-values and a function for arbitrary sample sizes. Better approximations have since been given by Stephens and by Mason and Bell. It seems this test is still in somewhat active development. The most recent work I’ve found is an examination of its asymptotic behavior (see Nikitin).
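When a table is not at hand, the null distribution of the statistic can also be simulated directly. The following is a minimal sketch of that idea, not the method from any of the papers above; the function names lillie_stat and lillie_mc_pvalue and the default of 2000 simulations are my own choices for the example.

```r
# Lilliefors statistic for exponentiality: largest gap between the ECDF
# and an exponential CDF whose mean is estimated from the sample.
lillie_stat <- function(x) {
  n <- length(x)
  i <- seq_len(n)
  ti <- 1 - exp(-sort(x) / mean(x))
  max(i / n - ti, ti - (i - 1) / n)
}

# Monte Carlo p-value. Dividing by mean(x) makes the statistic scale
# invariant, so simulating with rate = 1 covers every exponential null.
lillie_mc_pvalue <- function(x, nsim = 2000) {
  d_obs <- lillie_stat(x)
  d_sim <- replicate(nsim, lillie_stat(rexp(length(x))))
  mean(d_sim >= d_obs)
}

set.seed(42)
p_exp <- lillie_mc_pvalue(rexp(30, rate = 5))  # exponential sample
p_unif <- lillie_mc_pvalue(runif(30))          # non-exponential sample
```

The simulated p-value is only as fine-grained as 1/nsim, so small p-values need a larger nsim than the default used here.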

References

  • Lilliefors, H. W. On the Kolmogorov-Smirnov test for the exponential distribution with mean unknown. Journal of the American Statistical Association, 1969, 64, pp. 387-389. [jstor]
  • Conover, W. J. Practical Nonparametric Statistics. Hoboken, NJ: John Wiley & Sons, Inc., 1980. [pub]
  • Stephens, M. A. EDF Statistics for Goodness of Fit and Some Comparisons. Journal of the American Statistical Association, 1974, 69(347), pp. 730-737. [jstor]
  • Mason, A. L. and Bell, C. B. New Lilliefors and Srinivasan Tables With Applications. Communications in Statistics: Simulation and Computation, 1986, 15(2), pp. 451-477. [doi]
  • Nikitin, Y. Y. and Tchirina, A. V. Lilliefors test for exponentiality: large deviation, asymptotic efficiency, and conditions of local optimality. Mathematical Methods of Statistics, 2007, 16(1), pp. 16-24. [doi]

My home network is somewhere off of the Qwest cloud. I have a DSL router and an access point (running OpenBSD, of course). I also have a co-lo machine. On the co-lo machine, I have a tunnelbroker.net IPv6 tunnel configured (my co-lo ISP doesn’t do IPv6 yet *sigh*).

I decided to route a “real” /64 to my house from the co-lo machine, and it wasn’t as easy as I expected it would be. I fiddled with IPsec, but being behind a double NAT (the access point is the first NAT and the DSL router is the second) made that configuration more trouble than I wanted. (BTW, ideas for how to make this work are welcome.)

So, I ended up using ssh(1) and tun(4) to create a tunnel between my access point and the co-lo machine.

How it works:

On the home access point:

# cat /etc/hostname.tun0
inet6 alias fe80::2 128
dest fe80::1
! route delete -inet6 default
! route add -inet6 default fe80::1%tun0
# cat /etc/hostname.rum0
inet 172.16.0.1 255.255.255.0 NONE
inet6 alias 2001:470:b813:f000::1
# cat /etc/hostname.dc0
inet 172.16.1.1 255.255.255.0 NONE
inet6 alias 2001:470:b813:f001::1

This creates a tun interface with a link-local address (LLA) of fe80::2%tun0; the “other” end of the tunnel is fe80::1%tun0. (Note: LLAs are only valid within the context of an interface (tun0 in this case), which is why the % syntax is used.) I also add “real” IPv6 addresses to my wireless LAN (rum0) and my wired LAN (dc0).

On the co-lo machine:

inet6 alias 'fe80::1%tun0' 128
dest 'fe80::2%tun0'
! route add -inet6 2001:470:b813:f000:: fe80::2%tun0
! route add -inet6 2001:470:b813:f001:: fe80::2%tun0

This configures the tunnel interface and adds routes through it (using the LLAs).

Now, back on the home machine:

ssh -4 -f -w 0:0 www.thought.net sh /etc/netstart tun0

The command above creates the actual tunnel and activates the tun interfaces. Shazam!

Day in NYC

Alright, so I am in New York City to speak at NYC BSD CON. I had to arrive a day early (Thursday) because I came directly from speaking at SECTor. It’s been a crazy week thus far.

SECTor was fun. Two of my co-workers were there, so I had some friendly company before the conference itself got started. I took HD Moore’s “Powersploiting with Metasploit 3.2” tutorial, which was very well put together and went in depth enough to get me interested in using it for stuff at work. Should be interesting. The conference itself was Tuesday and Wednesday.

On Tuesday, I attended several talks:

  • Double Trouble: SQL Rootkits and Encryption (Kevvie Fowler)
  • Metasploit Prime (HD Moore)
  • More SCADA/ICS Security: Findings from the Field (Mark Fabro)

Kevvie’s talk was interesting, and I’m very much interested in the forensic impact of his work with SQL Server. He’s a bit of a Microsoft fan-boy, but that does not detract from the coolness of his analysis. HD’s Metasploit talk went over the new features in version 3.2 of the framework and what’s ahead. Finally, Fabro’s SCADA talk was similar to material I’ve heard before, but he gives an excellent, high-energy delivery.

I missed most of the talks on Wednesday because I had to prepare my presentation, but I caught Deviant Ollam’s “Ten things everyone should know about locking & physical security.” This is a talk that he gives fairly regularly, but he keeps it up to date and is an excellent speaker. I am totally kicking myself for not visiting the “lockpick village” in the vendor expo area. Doh! Maybe next year.

My talk went well, but it was short. 45min of material done in 30… oops. Time management bit me. Oh well, the audience had good questions and they took 20 more minutes or so.

So, New York. Amazing place. We don’t have a lot of subways or passenger trains in Idaho, so it’s been a bit of an adventure learning how to navigate the city. I managed to use the train/subway system to get from JFK to my hotel near Columbia University: $7 and 1.5 hours (not bad). I grabbed some Indian food (something we desperately lack in Idaho) at about 100th and Broadway, and then called it an early evening.

Thursday I spent the morning in the fashion district (and navigating the subway system). My wife (and I) are addicted to Project Runway, and I visited Parsons New School of (fill in the blank) in the 7th Ave/14th St area. I then took a walk through Mood Fabrics. Wow, I’m pretty well convinced that if you can’t find it at Mood, it doesn’t exist.

I had an interesting thing happen several times during the day. About 5 times, men dressed in black hats and sporting beards (what I associate with Orthodox Judaism) came up to me and asked “Are you Jewish?” I replied, “No,” because, well, I’m not. Each time after I responded, the man would just mutter, “OK” or “Thank you” and then walk away. I enquired about this behavior and was told it was recruiting. This struck me as odd… “Are you?” “No” “Ok, bye” doesn’t seem like a way to bring in recruits. Turns out, no, or well, sorta. It’s recruiting those partially in the “in” crowd (Jewish) to join in the festivities more actively (or so I was told). And no, I don’t really feel left out.

Back to the hotel to do laundry, then back out to the Havana Central on Broadway for dinner with the NYCBSDCON crowd. Ya know, this is perhaps my favorite conference. It’s small, but the people are great and the audience is really receptive. I’m up at 10am tomorrow. Look for the audio and slides to appear on the NYCBSDCON website (2006 is there and you can hear my talk on OpenBSD/sparc64 from there).

Pictures are up from the trip so far on flickr.