The first pitfall is assuming that ensemble members in the middle of the distribution are more likely to verify than those on the tails. That is sometimes the case, but at other times it's a bad assumption.
Here's an example based on the NCAR ensemble forecast for Alta Collins for the 2-day period ending this afternoon and encompassing our most recent storm (focus on the top graph). In the NCAR ensemble, every member is an equally likely outcome, and in this particular forecast, the outcomes are relatively evenly distributed. The odds of 0.6 inches are about the same as the odds of 1.4 inches (the latter is what verified).
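Because every member is equally likely, the forecast probability of any outcome is simply the fraction of members predicting it. A minimal sketch, using hypothetical member values rather than the actual NCAR ensemble data:

```python
# Hypothetical 10-member snowfall forecasts (inches) for a site like Alta Collins.
# These numbers are illustrative only, not the real NCAR ensemble output.
members = [0.4, 0.6, 0.7, 0.9, 1.0, 1.1, 1.2, 1.4, 1.6, 1.9]

def prob_exceeding(members, threshold):
    """With equally weighted members, the probability of exceeding a
    threshold is just the fraction of members above it."""
    return sum(m > threshold for m in members) / len(members)

print(prob_exceeding(members, 1.0))  # 0.5 -> a 50% chance of more than 1.0 inch
```

When the members are spread this evenly, no single value stands out as the "most likely" forecast, which is why hedging toward the middle buys you little.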
Hedging your forecast to a subset of the plumes makes little sense when the distribution of forecasts is so even. It might make sense when there is a strong clustering of forecasts, but one also needs to keep in mind that this is only a 10-member ensemble, so in some instances, that clustering might not be significant and you might want to be cautious about biting on it.
The second pitfall is assuming that the full spread of the ensemble (from the lowest to the highest member) captures the full range of possible outcomes. All of the ensemble forecast systems in operation today are underdispersive, meaning they do not capture the full range of possibilities in a given forecast period. For example, at mountain sites in the western U.S., the spread of downscaled 5-day forecasts derived from the Global Ensemble Forecast System (GEFS), which represent a portion of the forecasts used for the NAEFS-downscaled products on weather.utah.edu, encompasses the observed amount produced during major precipitation events only 56% of the time. During the other 44%, the observed amount lies outside the downscaled GEFS spread.
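That 56% figure is a coverage rate: the fraction of events in which the observation falls between the ensemble minimum and maximum. A minimal sketch of the calculation, using made-up events rather than the actual GEFS verification data:

```python
def coverage_rate(spreads, observations):
    """Fraction of events where the observed amount falls within the
    ensemble's [min, max] spread. Low values indicate underdispersion."""
    hits = sum(lo <= obs <= hi
               for (lo, hi), obs in zip(spreads, observations))
    return hits / len(observations)

# Five hypothetical events: (ensemble min, ensemble max) vs. observed precipitation.
# Values are illustrative only, not real forecasts or observations.
spreads = [(0.2, 1.1), (0.5, 2.0), (0.1, 0.8), (0.6, 1.5), (0.3, 1.0)]
obs     = [0.9, 2.4, 0.5, 1.2, 1.3]

print(coverage_rate(spreads, obs))  # 0.6 -> obs lies outside the spread 40% of the time
```

A perfectly reliable ensemble would rarely be surprised this way; a 56% coverage rate means the real atmosphere lands outside the envelope far more often than the spread alone suggests.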