Tables are a promising modality for representation learning and generative models, with too much application potential to ignore. However, tables have long been overlooked despite their dominant presence in the data landscape, e.g. in data management and analysis pipelines. The majority of datasets in Google Dataset Search, for example, resemble typical tabular file formats like CSVs. Similarly, the three most-used database management systems are all intended for relational data. Representation learning for tables, possibly combined with other modalities such as code and text, has shown impressive performance for tasks like semantic parsing, question answering, table understanding, data preparation, and data analysis (e.g. text-to-SQL). The pre-training paradigm has been shown to be effective for tabular ML (classification/regression) as well. More recently, we also observe promising potential in applying and enhancing LLMs for structured data, improving how we process and derive insights from it.
The Table Representation Learning (TRL) workshop is the premier venue in this emerging research area and has three main goals:
- (1) Motivate structured data (e.g. tables) as a primary modality for representation and generative models and advance the area further.
- (2) Showcase impactful applications of pretrained table models and identify open challenges for future research, with a particular focus on progress in NLP for this edition at ACL in 2025.
- (3) Foster discussion and collaboration across the NLP, ML, IR and DB communities.
Where: Vienna, Austria
Call for Papers
Important Dates
| Milestone | Date |
|---|---|
| Submission Open | February 1, 2025 |
| Submission Deadline | April 15, 2025 (11:59PM AoE) |
| Notifications | May 15, 2025 (11:59PM AoE) |
| Camera-ready | May 30, 2025 (11:59PM AoE) |
| Slides for contributed talks | June 30, 2025 (11:59PM AoE) |
| Video pitches for posters (optional) | June 30, 2025 (11:59PM AoE) |
| Workshop Date | July 31, 2025 (Tentative) |
Scope
We invite submissions on any of the following topics in machine learning for tabular data, or on closely related topics:
- Representation Learning for (semi-)Structured Data such as spreadsheets, tables, and full relational databases. Example contributions are new model architectures, data encoding techniques, tailored tokenization methods, pre-training and fine-tuning techniques, etc.
- Generative Models and LLMs for Structured Data, such as Large Language Models (LLMs) and diffusion models, and specialized techniques for prompt engineering, single-task and multi-task fine-tuning, LLM-driven interfaces and multi-agent systems, retrieval-augmented generation, etc.
- Multimodal Learning where structured data is jointly embedded or combined with other modalities such as text, code (e.g., SQL), knowledge graphs, and visualizations/images.
- Applications of TRL models for tasks such as data preparation (e.g. data cleaning, validation, integration, cataloging, feature engineering), retrieval (e.g. data search, fact-checking/QA, KG alignment), analysis (e.g. text-to-SQL and visualization), tabular data generation, (end-to-end) tabular machine learning, table extraction (e.g. parsers/extraction for unstructured data), and query optimization (e.g. cardinality estimation).
- Challenges of TRL models in production: work addressing the challenges of maintaining and managing TRL models in fast-evolving contexts, e.g., data updating, error correction, and monitoring, handling data privacy, personalization performance, etc.
- Domain-specific challenges for learned table models, which often arise in domains such as enterprise, finance, medicine, and law. These challenges pertain to table content, table structure, privacy, security limitations, and other factors that necessitate tailored solutions.
- Benchmarks, analyses, and datasets for TRL, including assessments of LLMs and other generative models as base models versus alternative approaches, analyses of model robustness with respect to large, messy, and heterogeneous tabular data, etc.
- Other contributions such as surveys, demonstrations, visions, and reflections on table representation learning and generative models for structured data.