Tuesday, November 22, 2016

Two Ensemble Pitfalls to Avoid

If you are a user of some of our ensemble products on weather.utah.edu, I thought I would share two pitfalls to avoid.

The first pitfall is assuming that ensemble members in the middle of the distribution are more likely to verify than those on the tails.  Although that could be the case, there are times when that's a bad assumption. 

Here's an example based on the NCAR ensemble forecast for Alta Collins for the 2-day period ending this afternoon, which encompasses our most recent storm (focus on the top graph).  In the NCAR ensemble, every member is an equally likely outcome, and in this particular forecast, the outcomes are relatively evenly distributed.  The odds of 0.6 inches are about the same as those of 1.4 inches (the latter verified).
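
Because every member is treated as equally likely, each member of a 10-member ensemble carries a probability of 1/10, and the probability of any outcome is simply the fraction of members that produce it.  The short Python sketch below illustrates this with made-up, evenly spread member totals (hypothetical values, not the actual Alta Collins forecast); the function names are also illustrative.

```python
# Hypothetical 2-day precipitation totals (inches) for a 10-member ensemble.
# Illustrative values only; these are NOT the actual NCAR forecast for Alta Collins.
MEMBERS = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]


def prob_between(members, low, high):
    """Probability that the outcome falls in [low, high), treating every
    member as an equally likely outcome (weight 1/N each)."""
    hits = sum(1 for m in members if low <= m < high)
    return hits / len(members)


def prob_at_least(members, threshold):
    """Probability of reaching or exceeding `threshold` (fraction of members)."""
    hits = sum(1 for m in members if m >= threshold)
    return hits / len(members)


if __name__ == "__main__":
    # With evenly spread members, an outcome near 0.6" is as likely as one near 1.4".
    print(prob_between(MEMBERS, 0.55, 0.65))  # 0.1
    print(prob_between(MEMBERS, 1.35, 1.45))  # 0.1
    print(prob_at_least(MEMBERS, 1.0))        # 0.5
```

With an even spread like this, no value is more heavily represented than any other, so favoring the middle members adds no information.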

Hedging your forecast toward a subset of the plumes makes little sense when the forecasts are distributed so evenly.  It might make sense when there is strong clustering, but keep in mind that this is only a 10-member ensemble, so in some instances that clustering may not be significant and you should be cautious about biting on it.
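
One rough way to gauge whether a cluster in a 10-member ensemble means much is to ask how often it could appear by chance.  The sketch below assumes, as an idealization, that members are independent random draws from the true forecast distribution, so the number of members landing in a given range follows a binomial distribution.  The function name and the 30% example are hypothetical choices for illustration.

```python
from math import comb


def prob_cluster_by_chance(n_members, k, p):
    """Probability that at least `k` of `n_members` independent members land in
    a range that holds probability `p` of the true distribution (binomial tail).
    Assumes members are independent draws, an idealization for real ensembles."""
    return sum(
        comb(n_members, i) * p**i * (1 - p) ** (n_members - i)
        for i in range(k, n_members + 1)
    )


if __name__ == "__main__":
    # Half of a 10-member ensemble piling into a range that truly holds only 30%
    # of the probability still happens roughly 15% of the time by chance alone.
    print(round(prob_cluster_by_chance(10, 5, 0.3), 3))
```

Real ensemble members are not truly independent draws, so treat this as a feel for sampling noise rather than a formal significance test.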

The second pitfall is assuming that the full spread of the ensemble (from the lowest to the highest member) captures the full range of possible outcomes.  All ensemble forecast systems in operational use today are underdispersive.  That means their spread does not capture the full range of possibilities in a given forecast period.  For example, at mountain sites in the western U.S., the spread of downscaled 5-day forecasts derived from the Global Ensemble Forecast System (GEFS), which make up a portion of the forecasts used for the NAEFS-downscaled products on weather.utah.edu, encompasses the observed amount during major precipitation events only 56% of the time.  During the other 44%, the observed amount lies outside the downscaled GEFS spread.
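
The 56% figure is a spread-capture rate: the fraction of events in which the observed amount falls between the lowest and highest members.  A useful yardstick, assuming a statistically consistent ensemble in which the observation behaves like one more draw from the same distribution as the N members, is that the observation is equally likely to occupy any of the N + 1 rank positions.  It therefore lands outside the ensemble range with probability 2/(N + 1) and inside it with probability (N - 1)/(N + 1).  The Python sketch below computes both; the function names and sample data are hypothetical.

```python
def spread_capture_rate(forecasts, observations):
    """Fraction of events whose observed amount lies within [min, max] of the
    ensemble members forecast for that event."""
    if not observations or len(forecasts) != len(observations):
        raise ValueError("forecasts and observations must be non-empty and equal length")
    captured = sum(
        1
        for members, obs in zip(forecasts, observations)
        if min(members) <= obs <= max(members)
    )
    return captured / len(observations)


def expected_capture_rate(n_members):
    """Capture rate expected from a statistically consistent N-member ensemble:
    the observation falls below or above all members with probability 1/(N+1)
    each, so it lands inside the range with probability (N-1)/(N+1)."""
    return (n_members - 1) / (n_members + 1)


if __name__ == "__main__":
    # Hypothetical 3-member forecasts and observations (inches) for four events.
    forecasts = [[1.0, 2.0, 3.0], [0.5, 1.0, 1.5], [2.0, 2.5, 3.0], [0.1, 0.2, 0.3]]
    observations = [2.5, 2.0, 2.2, 0.05]
    print(spread_capture_rate(forecasts, observations))  # 0.5
    print(round(expected_capture_rate(20), 3))           # 0.905
```

Even a 20-member ensemble with the right spread would bracket the observation about 90% of the time.  Selecting events by their observed amount (as with major precipitation events) makes the comparison imperfect, so treat this as a rough benchmark rather than a direct test.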

1 comment:

  1. Excellent point on the ensembles. One could even argue that the best "cluster" was the 5 ensemble members at the bottom and that taking their average (0.75) would be the theoretical way to go. Tough to judge from one spot, which happens to be the wettest spot in the state, where orographics dominate. It would be interesting to have a dozen or so stations and see which model did best from a spatial verification perspective.

    Then again, that "one spot" might be the most important one on the planet to us season pass holders in the Little Cottonwood! (lol).