Microsoft Start gets new AI weather forecast model capabilities

Brink · May 23, 2024

Microsoft Bing Blogs:

Microsoft Start’s new AI forecast model capabilities have unlocked the ability for users around the globe to experience high-quality, up-to-the-minute forecasts and maps of both cloud and precipitation together while also filling in gaps for data availability.

Since 2021, Weather from Microsoft Start has been running an operational short-term precipitation nowcasting model powered by generative AI to empower its users to make informed weather decisions. Every 2 minutes, this cutting-edge model provides users with forecasts at a hyper-local 1-kilometer resolution for up to four hours in the future. Since its initial presentation at NeurIPS 2021, the model has undergone continuous enhancements to improve precipitation forecast and map experiences across Microsoft’s weather products. In internal testing on benchmarks such as the SEVIR dataset, Microsoft Start’s model consistently ranks near the top while also providing forecasts up to two times further out compared to other generative AI models including DGMR (2021) and PreDiff (2023).

Traditionally, precipitation nowcasting models rely on weather radar data to “see” where precipitation is occurring and extrapolate how it will evolve. Deep learning models can extract information from very large volumes of data, and other data sources, such as geostationary satellites, provide further vital information for precipitation forecasting. Using this data, Weather from Microsoft Start has developed a new AI model for joint global cloud and precipitation nowcasting.

Adversarial regularization

Similar to the approach in DGMR, Microsoft used an adversarial learning approach, also known as a generative adversarial network (GAN), to improve the realism of the model’s predictions. This approach introduced spatial and temporal discriminators to force the forecaster (generator) to produce forecasts with high visual fidelity and temporal consistency. The spatial discriminator randomly samples forecast frames to improve visual fidelity, while the temporal discriminator samples chunks of several consecutive time frames to improve temporal consistency. During the training process, the generator tries to make predictions that look like real samples from the training data, while the discriminators try to distinguish between generated samples and real samples. Critical to this learning process was the introduction of skewed sampling that selects frames at longer lead times more frequently (Figure 1), which helped reduce blurriness in forecasts further out, where typical regression loss favors overly smooth predictions.
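The skewed frame sampling described above can be sketched as follows. This is a minimal illustration, not Microsoft's implementation: the power-law weighting, the `skew` parameter, and both function names are assumptions made for the example, since the article does not give the actual sampling distribution.

```python
import random


def sample_spatial_frames(num_frames, num_samples, skew=2.0, rng=None):
    """Pick frame indices for a spatial discriminator, favoring later lead times.

    Assumption: frame t (0-based) gets weight (t + 1) ** skew, so skew=0 is
    uniform sampling and larger values lean harder toward long lead times.
    """
    rng = rng or random.Random()
    weights = [(t + 1) ** skew for t in range(num_frames)]
    return rng.choices(range(num_frames), weights=weights, k=num_samples)


def sample_temporal_chunk(num_frames, chunk_len, rng=None):
    """Pick one run of consecutive frame indices for a temporal discriminator."""
    rng = rng or random.Random()
    start = rng.randint(0, num_frames - chunk_len)  # randint is inclusive
    return list(range(start, start + chunk_len))


if __name__ == "__main__":
    rng = random.Random(0)
    picks = sample_spatial_frames(num_frames=24, num_samples=10_000, rng=rng)
    late_share = sum(1 for t in picks if t >= 12) / len(picks)
    print(f"share of samples from the later half: {late_share:.2f}")
    print("temporal chunk:", sample_temporal_chunk(24, 6, rng=rng))
```

With these hypothetical weights, the later half of a 24-frame forecast receives the large majority of the spatial discriminator's attention, which is the intended effect of skewing toward lead times where blurriness is worst.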

Modifications to the loss function

The training loss function consists of a pixel-wise regression loss and the adversarial loss (discriminator loss). The easiest way for the generator to fool the discriminators is by dissipating precipitation to zero, which resembles observed states of “no precipitation”. Since the discriminators are unable to distinguish between a prediction and truth in this scenario, the generator loss is designed to penalize missed rain by introducing a recall control hyperparameter, α (Equation 1), which penalizes the model for negative bias in radar prediction. The α parameter is tuned by trading off missed rain instances against an increased rain bias in test datasets.
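Equation 1 is not reproduced in this post, so the exact form of the recall control term is unknown here. One plausible reading, shown purely as an assumption, is an asymmetric pixel-wise L1 loss in which under-prediction is scaled by α. The function name and the way α enters are hypothetical.

```python
def recall_controlled_l1(pred, truth, alpha=2.0):
    """Mean absolute error with under-prediction (pred < truth) scaled by alpha.

    Assumption: alpha > 1 makes missed rain cost more than over-forecast rain.
    alpha = 1 reduces to the ordinary L1 loss.
    """
    total = 0.0
    for p, y in zip(pred, truth):
        err = p - y
        total += alpha * (-err) if err < 0 else err
    return total / len(pred)


if __name__ == "__main__":
    truth = [10, 10]
    print(recall_controlled_l1([0, 0], truth))    # rain dissipated to zero
    print(recall_controlled_l1([20, 20], truth))  # rain over-forecast
```

In this sketch, a forecast that dissipates rain to zero is penalized twice as heavily as one that over-forecasts by the same amount, which illustrates the trade-off the article mentions: raising α reduces missed rain at the cost of a wetter bias.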

Since the error in model predictions increases with lead time, weighting these errors equally in the loss worsens shorter lead-time forecasts. To counter this effect, Microsoft introduced a weighting ωt that decreases with lead time, which results in both (i) an improvement in shorter lead-time forecasts, where the regression loss is important, and (ii) better visual fidelity at longer lead times, where the discriminator loss is more important. For the pixel-wise loss, Microsoft opted for L1 loss instead of L2, so that the model is not overly penalized for missing extreme precipitation conditions that may occur. Finally, a similar loss function with recall control and lead-time varying weight is applied to each of the outputs in Satellite + Radar Nowcasting.
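A lead-time weighted L1 loss of the kind described above might look like the sketch below. The article only says that ωt decreases with lead time; the geometric decay, the normalization to a sum of 1, and the function names are assumptions made for illustration.

```python
def lead_time_weights(num_steps, decay=0.9):
    """Weights that shrink with lead time, normalized to sum to 1.

    Assumption: geometric decay. Any decreasing schedule would fit the text.
    """
    raw = [decay ** t for t in range(num_steps)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_l1(pred_frames, truth_frames, weights):
    """Lead-time weighted L1 loss.

    pred_frames[t] and truth_frames[t] are flat lists of pixel values for
    lead step t; each frame's mean absolute error is scaled by weights[t].
    """
    loss = 0.0
    for w, pred, truth in zip(weights, pred_frames, truth_frames):
        frame_l1 = sum(abs(p - y) for p, y in zip(pred, truth)) / len(pred)
        loss += w * frame_l1
    return loss


if __name__ == "__main__":
    w = lead_time_weights(4, decay=0.5)
    truth = [[0, 0]] * 4
    early_miss = [[1, 1], [0, 0], [0, 0], [0, 0]]
    late_miss = [[0, 0], [0, 0], [0, 0], [1, 1]]
    print(weighted_l1(early_miss, truth, w), weighted_l1(late_miss, truth, w))
```

The same pixel error costs more at the first lead step than at the last, so the regression term concentrates on short lead times and leaves more room for the adversarial term further out.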

Using both satellite and radar

Since late 2021, Microsoft Start has offered precipitation nowcasting globally, including in regions without radar coverage, thanks to geostationary satellite data providing near-global, high-resolution imagery of clouds and water vapor that AI models can use to deduce precipitation. This model uses satellite data to provide simulated radar imagery for regions where radar is unavailable. Despite this achievement, model performance was limited by the availability of satellite imagery. Depending on the region, satellite imagery is available with acceptable latency only about 85-95% of the time.

With evidence suggesting the need for a separate decoder per task and a separate discriminator for each predicted channel, Weather from Microsoft Start built a model 4X bigger than the previous one, which only predicted simulated radar reflectivity. Finally, the new model jointly predicts both satellite and simulated radar reflectivity, enabling its predictions to fill data availability gaps. Since the precipitation task is more important than the satellite prediction task, the radar channel was given 6X more weight in the training loss function than the satellite channels.
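The 6X channel weighting can be illustrated with a small weighted average of per-channel losses. The channel names (`radar`, `sat_ir`, `sat_wv`) and the normalization by total weight are assumptions for the example; the article states only the 6:1 ratio.

```python
def combine_channel_losses(channel_losses, radar_key="radar", radar_weight=6.0):
    """Weighted mean of per-channel losses, with the radar channel up-weighted.

    channel_losses maps a channel name to its scalar loss. Every non-radar
    (satellite) channel gets weight 1; dividing by the total weight is an
    assumption that keeps the result on the scale of a single channel loss.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for name, loss in channel_losses.items():
        weight = radar_weight if name == radar_key else 1.0
        weighted_sum += weight * loss
        total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    losses = {"radar": 2.0, "sat_ir": 0.0, "sat_wv": 0.0}  # hypothetical values
    print(combine_channel_losses(losses))
```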

To evaluate model performance, simulated radar reflectivity is checked for precision and recall at different reflectivity thresholds indicative of varying rainfall intensity. Satellite image predictions were compared against persistence using error metrics such as MSE and MAE, image quality metrics such as PSNR, MS-SSIM for similarity, and FID scores for sharpness. Against the prior baseline of radar-only predictions, Microsoft Start’s new model presents a marked improvement in F1-score. Additionally, it was observed that predicted satellite images score better than a persistence forecast after 15 minutes, meaning these predictions can be used when satellite outages last longer than 15 minutes.
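The threshold-based scoring and the persistence comparison described above can be sketched in a few lines. The 20 dBZ threshold and all sample values are hypothetical; the article does not list the thresholds or data it used.

```python
def precision_recall_f1(pred, truth, threshold):
    """Score a reflectivity forecast as a yes/no rain mask at one threshold."""
    tp = fp = fn = 0
    for p, y in zip(pred, truth):
        predicted, observed = p >= threshold, y >= threshold
        if predicted and observed:
            tp += 1
        elif predicted:
            fp += 1
        elif observed:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mean_absolute_error(pred, truth):
    """MAE between two flat lists of pixel values."""
    return sum(abs(p - y) for p, y in zip(pred, truth)) / len(pred)


if __name__ == "__main__":
    pred = [5, 25, 40, 10]   # hypothetical forecast reflectivity (dBZ)
    truth = [0, 30, 35, 22]  # hypothetical observed reflectivity (dBZ)
    print(precision_recall_f1(pred, truth, threshold=20))

    # Persistence baseline: reuse the last observed frame as the forecast.
    last_observed, future = [1, 2], [3, 4]
    model_forecast = [3, 3]
    print(mean_absolute_error(model_forecast, future),
          mean_absolute_error(last_observed, future))
```

Repeating the first calculation at several thresholds gives a precision and recall curve across rainfall intensities, and a model frame is worth serving in place of persistence only when its error is the lower of the two.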

Considerations for operations

Productionizing a global forecast model with up-to-the-minute data presents its own challenges. Global inference is done using small sliding windows (tiles) with some overlap. The tile size is constrained by memory during model training, but not during inference. A small tile size during inference leads to high latency and bigger segmentation effects. To counter this, the generator architecture needs to meet three conditions: translation equivariance, spatially unconstrained operations, and a low memory footprint for the hidden state. Consequently, Weather from Microsoft Start has developed its own unique video prediction model to meet these conditions, which allows flexibility in window sizing and makes it possible to vary the window size between training and inference.
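To make the tiling trade-off concrete, the sketch below lays out overlapping sliding windows over a grid and counts them. It is only an illustration: the function names, grid size, tile sizes, and overlap are assumptions, and it ignores details a real global system would handle, such as longitude wrap-around and blending of overlapping predictions.

```python
def tile_starts(extent, tile, overlap):
    """Start offsets of overlapping tiles that cover [0, extent) along one axis."""
    if tile >= extent:
        return [0]
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    starts = list(range(0, extent - tile, stride))
    starts.append(extent - tile)  # last tile sits flush with the far edge
    return starts


def tile_origins(height, width, tile, overlap):
    """(row, col) origins of square tiles covering a height x width grid."""
    return [(r, c)
            for r in tile_starts(height, tile, overlap)
            for c in tile_starts(width, tile, overlap)]


if __name__ == "__main__":
    small = tile_origins(1000, 1000, tile=256, overlap=32)
    large = tile_origins(1000, 1000, tile=512, overlap=32)
    print(len(small), len(large))  # prints "25 9"
```

In this toy layout, doubling the tile edge cuts the number of model invocations from 25 to 9 and reduces the number of seams between tiles, which is why an architecture that can run on larger windows at inference than at training time helps with both latency and segmentation effects.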

This new model has unlocked the ability for users to experience seamless cloud and precipitation forecasts and maps while still providing accurate forecasts even when satellite data feeds experience unexpected outages. The new Satellite + Radar nowcasting model is the latest addition to Weather from Microsoft Start’s growing inventory of world-leading weather models. According to an independent study commissioned by Microsoft, Weather from Microsoft Start was recognized for its leading forecast accuracy. You can find weather information from Weather from Microsoft Start through its integration into Windows 10, Windows 11, Microsoft Edge, Bing, and in the Bing and Microsoft Start mobile apps.

Source:

Improved Radar and Satellite Nowcasting for Clouds and Rain by Weather from Microsoft Start

blogs.bing.com