Poster
DeformableTST: Transformer for Time Series Forecasting without Over-reliance on Patching
Donghao Luo · Xue Wang
East Exhibit Hall A-C #4200
With the proposal of patching technique in time series forecasting, Transformerbased models have achieved compelling performance and gained great interest fromthe time series community. But at the same time, we observe a new problem thatthe recent Transformer-based models are overly reliant on patching to achieve idealperformance, which limits their applicability to some forecasting tasks unsuitablefor patching. In this paper, we intent to handle this emerging issue. Through divinginto the relationship between patching and full attention (the core mechanismin Transformer-based models), we further find out the reason behind this issueis that full attention relies overly on the guidance of patching to focus on theimportant time points and learn non-trivial temporal representation. Based on thisfinding, we propose DeformableTST as an effective solution to this emergingissue. Specifically, we propose deformable attention, a sparse attention mechanismthat can better focus on the important time points by itself, to get rid of the need ofpatching. And we also adopt a hierarchical structure to alleviate the efficiency issuecaused by the removal of patching. Experimentally, our DeformableTST achievesthe consistent state-of-the-art performance in a broader range of time series tasks,especially achieving promising performance in forecasting tasks unsuitable forpatching, therefore successfully reducing the reliance on patching and broadeningthe applicability of Transformer-based models. Code is available at this repository:https://github.com/luodhhh/DeformableTST.