FedPS: Federated data Preprocessing via aggregated Statistics
Abstract
FedPS is a federated data preprocessing framework that uses aggregated statistics and data-sketching techniques to enable efficient and privacy-preserving data preparation for collaborative machine learning across distributed datasets.
Federated Learning (FL) enables multiple parties to collaboratively train machine learning models without sharing raw data. However, before training, data must be preprocessed to address missing values, inconsistent formats, and heterogeneous feature scales. This preprocessing stage is critical for model performance but is largely overlooked in FL research. In practical FL systems, privacy constraints prohibit centralizing raw data, and the need for communication efficiency further complicates distributed preprocessing. We introduce FedPS, a unified framework for federated data preprocessing based on aggregated statistics. FedPS leverages data-sketching techniques to efficiently summarize local datasets while preserving essential statistical information. Building on these summaries, we design federated algorithms for feature scaling, encoding, discretization, and missing-value imputation, and extend preprocessing-related models such as k-Means, k-Nearest Neighbors, and Bayesian Linear Regression to both horizontal and vertical FL settings. FedPS provides flexible, communication-efficient, and consistent preprocessing pipelines for practical FL deployments.
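As an illustration of preprocessing from aggregated statistics, the sketch below fits a global standard scaler in horizontal FL without pooling raw rows. Each client shares only its per-feature count, mean, and sum of squared deviations (M2), and the server merges these summaries with the pairwise update of Chan et al. This is a minimal sketch assuming every client holds the same feature columns and at least one row; the function names `local_stats`, `merge_stats`, and `global_scaler` are hypothetical and are not FedPS's API.

```python
import numpy as np


def local_stats(X):
    """Client side (hypothetical helper): per-feature count, mean, and M2."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    mean = X.mean(axis=0)
    m2 = ((X - mean) ** 2).sum(axis=0)  # sum of squared deviations from the local mean
    return n, mean, m2


def merge_stats(a, b):
    """Server side (hypothetical helper): combine two (n, mean, M2) summaries."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


def global_scaler(client_stats):
    """Fold all client summaries into a global mean and population std."""
    n, mean, m2 = client_stats[0]
    for s in client_stats[1:]:
        n, mean, m2 = merge_stats((n, mean, m2), s)
    std = np.sqrt(m2 / n)  # population std, matching the pooled-data result
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return mean, std


# Usage: each client then applies (X - mean) / std locally.
clients = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[5.0, 6.0], [7.0, 10.0], [0.0, 1.0]])]
mean, std = global_scaler([local_stats(c) for c in clients])
```

Merging (count, mean, M2) rather than raw sums of squares sidesteps the cancellation error that the naive E[x^2] - E[x]^2 formula can suffer on features with large magnitudes, while still sending only O(d) numbers per client.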
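Quantile-based discretization needs global quantiles, which no single client can compute from its own rows. A mergeable summary addresses this: clients send compact counts that the server simply adds together. The sketch below swaps in a fixed-width histogram over an agreed range `[lo, hi]` as a simple stand-in for a proper quantile sketch such as KLL; it assumes the range was agreed beforehand (for example through a federated min/max round), and the helper names are hypothetical. Bin-edge accuracy is limited by the width of the fine histogram bins.

```python
import numpy as np


def local_histogram(x, lo, hi, n_bins):
    """Client side (hypothetical helper): fine-grained counts over a shared range."""
    counts, _ = np.histogram(np.asarray(x, dtype=float), bins=n_bins, range=(lo, hi))
    return counts


def approx_quantile_edges(merged_counts, lo, hi, n_quantile_bins):
    """Server side (hypothetical helper): approximate inner quantile edges."""
    fine_edges = np.linspace(lo, hi, len(merged_counts) + 1)
    cdf = np.cumsum(merged_counts) / merged_counts.sum()
    targets = np.linspace(0.0, 1.0, n_quantile_bins + 1)[1:-1]  # drop 0 and 1
    # First fine bin whose cumulative share reaches each target; use its right edge.
    idx = np.searchsorted(cdf, targets, side="left")
    return fine_edges[idx + 1]


def discretize(x, edges):
    """Client side: map values to bin indices 0..len(edges) using the global edges."""
    return np.digitize(x, edges)


# Usage: summaries are merged by element-wise addition.
clients = [np.arange(0, 50, dtype=float), np.arange(50, 100, dtype=float)]
merged = sum(local_histogram(x, 0.0, 100.0, 100) for x in clients)
edges = approx_quantile_edges(merged, 0.0, 100.0, 4)
```

Because the merged histogram is the same regardless of how rows are split across clients, every client ends up discretizing with identical edges, which is the consistency property a federated pipeline needs.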
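k-Means follows the same aggregated-statistics pattern in horizontal FL. Once the current centroids are broadcast, each client assigns its points locally and returns only per-cluster sums and counts, from which the server recomputes the centroids exactly as one centralized Lloyd iteration would. This is a minimal sketch assuming a single shared feature space; the function names are hypothetical and it is not presented as FedPS's own algorithm.

```python
import numpy as np


def local_kmeans_stats(X, centroids):
    """Client side (hypothetical helper): per-cluster sums and counts."""
    X = np.asarray(X, dtype=float)
    # Squared Euclidean distance from every point to every centroid, shape (n, k).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    k, d = centroids.shape
    sums = np.zeros((k, d))
    np.add.at(sums, labels, X)  # accumulate each point into its assigned cluster row
    counts = np.bincount(labels, minlength=k)
    return sums, counts


def federated_kmeans_step(client_data, centroids):
    """Server side (hypothetical helper): one Lloyd update from aggregated stats."""
    total_sums = np.zeros_like(centroids, dtype=float)
    total_counts = np.zeros(len(centroids), dtype=int)
    for X in client_data:
        s, c = local_kmeans_stats(X, centroids)
        total_sums += s
        total_counts += c
    new = centroids.astype(float).copy()
    nonempty = total_counts > 0  # keep the old centroid for empty clusters
    new[nonempty] = total_sums[nonempty] / total_counts[nonempty][:, None]
    return new


# Usage: repeat the step until the centroids stop moving.
clients = [np.array([[0.0, 0.0], [0.0, 1.0]]), np.array([[10.0, 10.0], [10.0, 11.0], [0.0, 2.0]])]
init = np.array([[0.0, 0.0], [10.0, 10.0]])
fed = federated_kmeans_step(clients, init)
```

Each round costs O(k·d) numbers per client regardless of local dataset size, and the update is identical to running the same step on the pooled data.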
Community
TL;DR: A unified framework for tabular data preprocessing in federated learning.
Nice work on federated preprocessing!
The following papers were recommended by the Semantic Scholar API
- One-Shot Federated Ridge Regression: Exact Recovery via Sufficient Statistic Aggregation (2026)
- Principled Federated Random Forests for Heterogeneous Data (2026)
- RefProtoFL: Communication-Efficient Federated Learning via External-Referenced Prototype Alignment (2026)
- Fisher-Informed Parameterwise Aggregation for Federated Learning with Heterogeneous Data (2026)
- Single-Round Clustered Federated Learning via Data Collaboration Analysis for Non-IID Data (2026)
- Federated Learning and Class Imbalances (2026)
- ERIS: Enhancing Privacy and Communication Efficiency in Serverless Federated Learning (2026)