Poster in Workshop: AI for Science: from Theory to Practice
Insight Miner: A Large-scale Multimodal Model for Insight Mining from Time Series
Yunkai Zhang · Yawen Zhang · Ming Zheng · Kezhen Chen · Chongyang Gao · Ruian Ge · Siyuan Teng · Amine Jelloul · Jinmeng Rao · Xiaoyuan Guo · Chiang-Wei Fang · Zeyu Zheng · Jie Yang
Time-series data is essential in many scientific and industrial domains, such as environmental analysis, agriculture, transportation, and finance. Researchers rely on domain knowledge to mine insights from time-series data, but this process is time-consuming and depends heavily on expert knowledge. This paper proposes Insight Miner, a large-scale multimodal model (LMM) that generates high-quality, comprehensive time-series descriptions enriched with domain-specific knowledge. To introduce rich time-series insights to Insight Miner, we propose a time-series analysis dataset, TS-Insights, composed of pairs of time series and textual insights. TS-Insights includes 100k time series windows sampled from 20 forecasting datasets spanning a wide variety of domains and granularities. Using a combination of heuristics and statistical tools, we preprocess each raw time series window and use GPT-4 to generate a coherent trend description based on the extracted features. After instruction tuning on the TS-Insights dataset, Insight Miner generates better time series descriptions and insights than state-of-the-art multimodal models such as LLaVA (Liu et al., 2023) and GPT-4. Our findings suggest a promising direction for leveraging LMMs in time series analysis, potentially offering avenues for efficient insight mining in scientific domains. The TS-Insights dataset is available and will be published upon acceptance.
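The preprocessing step described in the abstract (extract features from a raw window, then ask a language model to describe the trend) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not specify which heuristics or statistical tools are used, so the least-squares slope, the residual spread, the `tol` threshold, and the function names `extract_features` and `build_prompt` are all assumptions made for illustration. The actual call to GPT-4 is omitted; the sketch only builds the prompt text.

```python
from statistics import mean, pstdev


def extract_features(window, tol=0.05):
    """Summarize a time series window with a few simple statistics.

    Hypothetical feature set; the paper's real heuristics are not given
    in the abstract.
    """
    n = len(window)
    if n < 2:
        raise ValueError("window needs at least two points")
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(window)

    # Ordinary least-squares slope of value against time index.
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, window))
    den = sum((x - x_bar) ** 2 for x in xs)
    slope = num / den

    # Spread of the residuals around the fitted line (crude noise measure).
    residuals = [y - (y_bar + slope * (x - x_bar)) for x, y in zip(xs, window)]
    value_range = max(window) - min(window)

    # Label the trend by comparing the fitted total change to the data range.
    fitted_change = slope * (n - 1)
    if value_range == 0 or abs(fitted_change) < tol * value_range:
        trend = "flat"
    elif fitted_change > 0:
        trend = "upward"
    else:
        trend = "downward"

    return {
        "trend": trend,
        "slope": slope,
        "residual_std": pstdev(residuals),
        "min": min(window),
        "max": max(window),
    }


def build_prompt(features, domain):
    """Turn extracted features into a text prompt for a language model."""
    return (
        f"The following statistics summarize a {domain} time series window. "
        f"Overall trend: {features['trend']} (slope {features['slope']:.3f} per step). "
        f"Values range from {features['min']} to {features['max']}, "
        f"with residual standard deviation {features['residual_std']:.3f}. "
        "Write a short, coherent description of the trend."
    )


if __name__ == "__main__":
    feats = extract_features([10, 12, 11, 14, 15, 17])
    print(build_prompt(feats, "finance"))
```

Passing numeric features rather than raw values keeps the prompt short and lets the language model focus on phrasing; a fuller version would presumably also cover seasonality and outliers, which this sketch leaves out.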