## Model Construction and Congestion Pricing

Congestion is far harder to price than energy. Energy pricing, as an aggregated measurement, benefits from a smoothness that congestion lacks, which leaves us with much jumpier price movements. As can be seen, when dealing with congestion we must also dispense with the presumption that upside volatility is more drastic than downside. This makes congestion an excellent case study for our approach to model construction.

Before we begin, a note about the plots. The red line shows roughly 8 days of historical prices; the model was trained on a longer history, but 8 days makes for easier viewing. There is a gap around hour 160 because the historical data ends at the trade deadline and the prediction begins during the trade day, exactly as though the trades were live. We show the genesis of the model in three steps. The second through fourth figures forecast the same time period and are zoomed-in views of the first plot.

We begin with a simple model, shown to the right, that is familiar to technical analysts in the equities market: the moving average. Here the trade price is the moving average of the past 60 hours. This gives a rough picture of what the price could be, and it is moderately profitable when it bids and offers at the same price. It has no notion of uncertainty in the inputs, since it is purely a function of time. Confidence intervals could still be constructed from an estimate of the noise, but they are unlikely to be particularly informative.
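The 60-hour moving average can be sketched in a few lines of Python. This is a minimal illustration, assuming hourly prices arrive as a plain sequence in time order; the function name `trailing_moving_average` is hypothetical. The forecast for hour t uses only the 60 hours strictly before it, so it never peeks at the price it is trying to predict.

```python
import math
from collections import deque
from typing import List, Sequence


def trailing_moving_average(prices: Sequence[float], window: int = 60) -> List[float]:
    """For each hour t, return the mean of the `window` prices strictly before t.

    Hours without a full window of history get NaN, so the output lines up
    one-to-one with the input and never uses the price it is forecasting.
    """
    history: deque = deque(maxlen=window)  # oldest prices fall off automatically
    forecasts: List[float] = []
    for price in prices:
        if len(history) == window:
            forecasts.append(sum(history) / window)
        else:
            forecasts.append(math.nan)
        history.append(price)  # update only after forecasting hour t
    return forecasts


hourly = [float(p) for p in range(100)]
print(trailing_moving_average(hourly)[60])  # 29.5, the mean of hours 0 through 59
```

Quoting this single number as both bid and offer corresponds to the "same price" strategy described above; the sketch itself says nothing about uncertainty, which is exactly its limitation.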

The second model, at the left, is more powerful, though perhaps no more profitable. It overlays information about the hour ending to reflect the fact that congestion is more prevalent in certain hours at this particular node. While this approach fails to capture the trend of the real-time price, it follows the shape of the day-ahead price more closely. Still, because time is our only input, the uncertainty is somewhat uninteresting, and we do not draw confidence bounds.
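One way to overlay hour-ending information is to learn the average gap between the price and a baseline (such as the moving average) for each hour ending, then add that gap back onto the baseline when forecasting. This is offered as an assumption, not necessarily the construction behind the figure, and the helper names `hour_ending_profile` and `shaped_forecast` are hypothetical. Hour endings are taken to run from 1 to 24.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def hour_ending_profile(prices: Sequence[float], baseline: Sequence[float],
                        hour_ending: Sequence[int]) -> Dict[int, float]:
    """Mean deviation of price from a baseline, grouped by hour ending (1-24)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for price, base, he in zip(prices, baseline, hour_ending):
        if math.isnan(base):
            continue  # e.g. early hours where a moving-average baseline is undefined
        sums[he] += price - base
        counts[he] += 1
    return {he: sums[he] / counts[he] for he in sums}


def shaped_forecast(baseline_value: float, hours: Sequence[int],
                    profile: Dict[int, float]) -> List[float]:
    """Forecast = baseline level + typical offset for each target hour ending."""
    return [baseline_value + profile.get(he, 0.0) for he in hours]


# Toy node that is congested (+5) in hour endings 17-20 every day.
n = 24 * 10
he = [t % 24 + 1 for t in range(n)]
prices = [10.0 + (5.0 if 17 <= h <= 20 else 0.0) for h in he]
profile = hour_ending_profile(prices, [10.0] * n, he)
print(shaped_forecast(12.0, [3, 18], profile))  # [12.0, 17.0]
```

Because the profile is still a function of time alone, it reshapes the forecast across the day but adds no information about uncertainty, which matches the limitation noted above.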

We continue to make the model incrementally more powerful by adding variables that we know to be important. For a particular node these could include temperatures at one or more relevant locations, wind speeds at those locations, gas prices, load, total outages, and so on. Each additional variable raises the risk of overfitting the dataset, so we validate the model using standard techniques from the inverse problems literature to guard against this.
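The specific inverse-problems techniques are not named here. As one standard example from that literature, and purely as an assumption, the sketch below fits a Tikhonov-regularized (ridge) regression and picks the regularization strength by generalized cross-validation (GCV), which estimates out-of-sample error without refitting on held-out folds. It assumes the feature columns (temperature, wind, gas, load, outages, and so on) are already standardized and the target price is centered; `ridge_gcv` is a hypothetical name.

```python
import numpy as np


def ridge_gcv(X: np.ndarray, y: np.ndarray, lambdas) -> tuple:
    """Tikhonov (ridge) regression with the penalty chosen by generalized cross-validation.

    Assumes standardized columns in X and a centered y, so no intercept is fit,
    and strictly positive candidate lambdas. Returns (best_lambda, coefficients).
    """
    n = X.shape[0]
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    uty = U.T @ y
    best_score, best_lam, best_coef = np.inf, None, None
    for lam in lambdas:
        f = s**2 / (s**2 + lam)            # Tikhonov filter factors
        fitted = U @ (f * uty)             # hat matrix applied to y
        rss = float(np.sum((y - fitted) ** 2))
        dof = float(np.sum(f))             # trace of the hat matrix = effective parameters
        score = n * rss / (n - dof) ** 2   # GCV criterion
        if score < best_score:
            best_score, best_lam = score, float(lam)
            best_coef = Vt.T @ (s / (s**2 + lam) * uty)
    return best_lam, best_coef
```

The effective number of parameters, `dof`, is what penalizes extra variables: every added feature that does not reduce the residual enough pushes the GCV score up, so the selected penalty shrinks weak predictors rather than letting them fit noise.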

In the end, we are left with the full model, shown below. By simulating draws from it we can gauge the uncertainty inherent in the trades and adjust the risk we take accordingly. Note that the model does not fit the data perfectly; this is desirable, since a perfect fit would signal overfitting. How, then, do we determine that this is the correct model? And what is the point of the plot?
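How the draws are generated is not specified. A minimal sketch, assuming a residual bootstrap, adds historical model residuals, resampled with replacement, to the point forecast and reads percentile bands off the simulated paths. The function names are hypothetical, and the 5th/95th percentiles are illustrative defaults.

```python
import numpy as np


def simulate_price_draws(point_forecast, residuals, n_draws: int = 5000,
                         seed: int = 0) -> np.ndarray:
    """Draw price scenarios by adding resampled historical residuals to a point forecast.

    Returns an array of shape (n_draws, horizon). Resampling residuals instead of
    assuming Gaussian noise keeps the jumpy, possibly asymmetric tails of congestion.
    """
    rng = np.random.default_rng(seed)
    forecast = np.asarray(point_forecast, dtype=float)
    resid = np.asarray(residuals, dtype=float)
    idx = rng.integers(0, resid.size, size=(n_draws, forecast.size))
    return forecast + resid[idx]


def prediction_band(draws: np.ndarray, lower: float = 5.0,
                    upper: float = 95.0) -> np.ndarray:
    """Per-hour percentile band; row 0 is the lower bound, row 1 the upper bound."""
    return np.percentile(draws, [lower, upper], axis=0)
```

This sketch resamples each hour independently, which ignores autocorrelation in the residuals; a block bootstrap that resamples contiguous runs of hours would preserve it at the cost of a little more bookkeeping.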

To the former question: validation gives us an estimate of the out-of-sample error. The true test comes during the backtest, where we determine how the pricing fits into the overall algorithm, but that is much further along in the process.
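For time series, one common way to estimate out-of-sample error (an assumption here, not necessarily the procedure used for these models) is walk-forward validation: fit on a rolling window of history, skip a gap, and score the hours that follow. The gap mirrors the one in the plots between the trade deadline and the start of the prediction. The names `walk_forward_splits` and `out_of_sample_mae` and the 48/12/24-hour sizes in the usage are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A forecaster takes past prices and a number of steps, and returns that many predictions.
Forecaster = Callable[[Sequence[float], int], Sequence[float]]


def walk_forward_splits(n_hours: int, train_hours: int, gap_hours: int,
                        horizon_hours: int) -> List[Tuple[range, range]]:
    """Rolling (train, test) index ranges separated by a gap.

    The gap stands in for the hours between the trade deadline, where usable
    history ends, and the first delivery hour being priced.
    """
    splits = []
    start = 0
    while start + train_hours + gap_hours + horizon_hours <= n_hours:
        test_start = start + train_hours + gap_hours
        splits.append((range(start, start + train_hours),
                       range(test_start, test_start + horizon_hours)))
        start += horizon_hours  # roll forward by one forecast horizon
    return splits


def out_of_sample_mae(prices: Sequence[float], forecast: Forecaster,
                      splits: List[Tuple[range, range]]) -> float:
    """Mean absolute error over all test hours of all splits (splits must be non-empty).

    `forecast(history, steps)` returns predictions starting the hour after
    `history` ends; the predictions that fall inside the gap are discarded.
    """
    errors: List[float] = []
    for train, test in splits:
        history = [prices[i] for i in train]
        steps = test.stop - train.stop  # gap hours + horizon hours
        preds = list(forecast(history, steps))[-len(test):]
        errors.extend(abs(p - prices[i]) for p, i in zip(preds, test))
    return sum(errors) / len(errors)


splits = walk_forward_splits(200, train_hours=48, gap_hours=12, horizon_hours=24)
last_value = lambda history, steps: [history[-1]] * steps  # naive persistence forecaster
print(out_of_sample_mae([float(t) for t in range(200)], last_value, splits))  # 24.5
```

Scoring only hours that lie strictly after the gap keeps the estimate honest in the same way the plots are: the model never sees data it would not have had at the trade deadline.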

To the latter we suggest that a partially automated algorithm, of which congestion pricing is a part, may benefit from manual oversight. This plot can show the trader how the prices may evolve so they are able to make informed decisions.