Domain Adaptation for Time Series Under Feature and Label Shifts

When we train a model to recognize patterns in one type of data, it might not work well when we apply it to a different type of data. This is where unsupervised domain adaptation (UDA) comes in, which helps us transfer the trained model to a different type of data without needing labeled examples.

However, when dealing with complex time series data (like sensor readings over time), it becomes harder to transfer the model because the patterns in the data change over time and can be different between different domains (like different sensors or environments). Additionally, the labels that we use to train the model might be different between domains. This means that the model might not work well when we apply it to the new domain.

To address this problem, a new model Raincoat can handle complex time series data by aligning the different patterns in the data across domains and correcting for any differences. It can also detect when the labels are different between domains and adjust the model accordingly.

Motivation

Training models that can adapt to domain shifts is crucial for robust, real-world implementation. We often encounter data from different sources or domains, which can lead to issues with model performance. That is where domain adaptation comes in – it is all about creating models that can adapt to different domains and perform well in any setting. For example, in healthcare, time series data collection methods vary widely across clinical sites, leading to shifts in underlying features and labels. Endowing learning systems with domain adaptation capabilities increases their reliability and expands applicability across downstream tasks.

However, building models that can adapt to different domains is no easy feat. There are a variety of challenges, including feature and label shifts, private labels, and more. Domain adaptation is a highly complex problem due to several factors, including private labels that do not exist in the source domain. Models trained for robustness to domain shifts must learn highly generalizable features to avoid relying on spurious correlations created by non-causal data artifacts. Shifts in label distributions across domains may result in private labels, making domain adaptation challenging. Traditional techniques often fall short when it comes to these challenges, and that is where new, innovative approaches are needed.

One approach to domain adaptation is adversarial learning, where a model is trained to be invariant to differences between the source and target domains. Another is transfer learning, where a model is pre-trained on a large dataset and then fine-tuned on a smaller, domain-specific dataset. But perhaps the most exciting approach to domain adaptation is unsupervised learning, where a model is trained on data from the source domain and then applied to the target domain without any labeled data. This approach has enormous potential for real-world applications, as it does not require labeled data from every domain.

Raincoat

We introduce Raincoat, a domain adaptation method for time series that can handle both feature and label shifts. Raincoat is the first model for both closed-set and universal domain adaptation on complex time series. It addresses feature and label shifts by considering both temporal and frequency features, aligning them across domains, and correcting for misalignments to detect private labels.

Raincoat improves transferability by identifying label shifts in target domains. It produces generalizable representations robust to feature and label shifts, expanding the scope of existing domain adaptation methods.

Raincoat Model

Raincoat has three steps, as illustrated in the following figure:

  • Step 1: Align - It employs time and frequency-based encoders to learn time series representations, using Sinkhorn divergence for source-target feature alignment as frequency features may not have the same support.

  • Step 2: Correct - It retrains an encoder on the target domain to correct any potential misalignments.

  • Step 3: Inference - It calculates the difference between the aligned and corrected representations of target samples to identify unknown target samples through a bi-modality test and binary classification task.

Properties of Raincoat

To address feature shift, Raincoat takes into account implicit frequency feature shift and incorporates additional frequency feature inductive bias in the encoder, to uncover potential invariant features across domains and enhance transferability.

To address label shifts, it employs target-specific feature encoders that retain the semantic meaning of the target domain, enabling inference without relying on user-specified input.

Our experiments using 5 datasets and comparing with 13 state-of-the-art domain adaptation methods show that Raincoat outperforms these methods in the presence of both feature and label shifts. The following figure illustrates the superiority of Raincoat for closed-set domain adaptation.

The results from testing Raincoat on different datasets show that it can improve the performance of the model by up to 16.33%. This means that Raincoat has the potential to be very useful for transferring models between different domains and making them more accurate.

Publication

Domain Adaptation for Time Series Under Feature and Label Shifts
Huan He, Owen Queen, Teddy Koker, Consuelo Cuevas, Theodoros Tsiligkaridis, Marinka Zitnik
International Conference on Machine Learning, ICML 2023 [arXiv] [ICML 2023]

@inproceedings{he2023domain,
title = {Domain Adaptation for Time Series Under Feature and Label Shifts},
author = {He, Huan and Queen, Owen and Koker, Teddy and Cuevas, Consuelo and Tsiligkaridis, Theodoros and Zitnik, Marinka},
booktitle = {International Conference on Machine Learning, ICML},
year      = {2023}
}

Datasets

Code

PyTorch implementation together with documentation and examples of usage is available in the GitHub repository.

Authors

Latest News

Mar 2024:   Efficient ML Seminar Series

We started a Harvard University Efficient ML Seminar Series. Congrats to Jonathan for spearheading this initiative. Harvard Magazine covered the first meeting focusing on LLMs.

Mar 2024:   UniTS - Unified Time Series Model

UniTS is a unified time series model that can process classification, forecasting, anomaly detection and imputation tasks within a single model with no task-specific modules. UniTS has zero-shot, few-shot, and prompt learning capabilities. Project website.

Mar 2024:   Weintraub Graduate Student Award

Michelle receives the 2024 Harold M. Weintraub Graduate Student Award. The award recognizes exceptional achievement in graduate studies in biological sciences. News Story. Congratulations!

Mar 2024:   PocketGen - Generating Full-Atom Ligand-Binding Protein Pockets

PocketGen is a deep generative model that generates residue sequence and full-atom structure of protein pockets, maximizing binding to ligands. Project website.

Feb 2024:   SPECTRA - Generalizability of Molecular AI

Feb 2024:   Kaneb Fellowship Award

The lab receives the John and Virginia Kaneb Fellowship Award at Harvard Medical School to enhance research progress in the lab.

Feb 2024:   NSF CAREER Award

The lab receives the NSF CAREER Award for our research in geometric deep learning to facilitate algorithmic and scientific advances in therapeutics.

Feb 2024:   Dean’s Innovation Award in AI

Jan 2024:   AI's Prospects in Nature Machine Intelligence

We discussed AI’s 2024 prospects with Nature Machine Intelligence, covering LLM progress, multimodal AI, multi-task agents, and how to bridge the digital divide across communities and world regions.

Jan 2024:   Combinatorial Therapeutic Perturbations

New paper introducing PDGrapher for combinatorial prediction of chemical and genetic perturbations using causally-inspired neural networks.

Nov 2023:   Next Generation of Therapeutics Commons

Oct 2023:   Structure-Based Drug Design

Geometric deep learning has emerged as a valuable tool for structure-based drug design, to generate and refine biomolecules by leveraging detailed three-dimensional geometric and molecular interaction information.

Oct 2023:   Graph AI in Medicine

Graph AI models in medicine integrate diverse data modalities through pre-training, facilitate interactive feedback loops, and foster human-AI collaboration, paving the way to clinically meaningful predictions.

Sep 2023:   New papers accepted at NeurIPS

Sep 2023:   Future Directions in Network Biology

Excited to share our perspectives on current and future directions in network biology.

Aug 2023:   Scientific Discovery in the Age of AI

Jul 2023:   PINNACLE - Contextual AI protein model

PINNACLE is a contextual AI model for protein understanding that dynamically adjusts its outputs based on biological contexts in which it operates. Project website.

Jun 2023:   Our Group is Joining the Kempner Institute

Excited to join Kempner’s inaugural cohort of associate faculty to advance Kempner’s mission of studying the intersection of natural and artificial intelligence.

Jun 2023:   Welcoming a New Postdoctoral Fellow

An enthusiastic welcome to Shanghua Gao who is joining our group as a postdoctoral research fellow.

Jun 2023:   On Pretraining in Nature Machine Intelligence

Zitnik Lab  ·  Artificial Intelligence in Medicine and Science  ·  Harvard  ·  Department of Biomedical Informatics