3rd Table Representation Learning Workshop @ NeurIPS 2024

14 December 2024, Vancouver, Canada.


Tables are a promising modality for representation learning and generative models, with application potential too broad to ignore. Yet tables have long been overlooked despite their dominant presence in the data landscape, e.g. in data management and analysis pipelines. The majority of datasets in Google Dataset Search, for example, resemble typical tabular file formats like CSV. Similarly, the top-3 most-used database management systems are all intended for relational data. Representation learning for tables, possibly combined with other modalities such as code and text, has shown impressive performance on tasks like semantic parsing, question answering, table understanding, data preparation, and data analysis (e.g. text-to-SQL). The pre-training paradigm has also been shown to be effective for tabular ML (classification/regression). More recently, we observe promising potential in applying and enhancing LLMs for structured data, improving how we process and derive insights from it.

The Table Representation Learning (TRL) workshop is the premier venue in this emerging research area and has three main goals:

  • (1) Motivate tables as a primary modality for representation and generative models and advance the area further.
  • (2) Showcase impactful applications of pretrained table models and identify open challenges for future research, with a particular focus on industry insights in 2024.
  • (3) Foster discussion and collaboration across the ML, NLP, IR and DB communities.

When: Saturday 14 December 2024.
Where: Vancouver, Canada.
Specific questions: madelon@berkeley.edu
Follow on Twitter: @TrlWorkshop
Apply for travel award (students), deadline 31 Oct '24: https://forms.gle/hgCfMakWa91e6Zmj9

Sponsored by:


Call for Papers


Important Dates


  • Submission open: September 1, 2024
  • Submission deadline: September 20, 2024 (11:59PM AoE)
  • Notifications: October 9, 2024 (11:59PM AoE)
  • Camera-ready: October 30, 2024 (11:59PM AoE)
  • Slides for contributed talks: November 30, 2024 (11:59PM AoE)
  • Video pitches for posters (optional): November 30, 2024 (11:59PM AoE)
  • Workshop date: December 14, 2024

Scope

We invite submissions on representation and generative learning over tables, related to any of the following topics:

  • Representation Learning for (semi-)Structured Data such as spreadsheets, tables, and full relational databases. Example contributions are new model architectures, data encoding techniques, tailored tokenization methods, pre-training and fine-tuning techniques, etc.
  • Generative Models and LLMs for Structured Data such as Large Language Models (LLMs) and diffusion models, and specialized techniques for prompt engineering, single-task and multi-task fine-tuning, LLM-driven interfaces and multi-agent systems, retrieval-augmented generation, etc.
  • Multimodal Learning where structured data is jointly embedded or combined with other modalities such as text, code (e.g., SQL), knowledge graphs, and visualizations/images.
  • Applications of TRL models for tasks like data preparation (e.g. data cleaning, validation, integration, cataloging, feature engineering), retrieval (e.g. data search, fact-checking/QA, KG alignment), analysis (e.g. text-to-SQL and visualization), tabular data generation, (end-to-end) tabular machine learning, table extraction (e.g. parsers/extraction for unstructured data), and query optimization (e.g. cardinality estimation).
  • Challenges of TRL models in production: work addressing the challenges of maintaining and managing TRL models in fast-evolving contexts, e.g., data updating, error correction, monitoring, handling data privacy, personalization, performance, etc.
  • Domain-specific challenges for learned table models often arise in domains such as enterprise, finance, medicine, and law. These challenges pertain to table content, table structure, privacy, security limitations, and other factors that necessitate tailored solutions.
  • Benchmarks, analyses, and datasets for TRL including assessing LLMs and other generative models as base models versus alternative approaches, analysis of model robustness with respect to large, messy, and heterogeneous tabular data, etc.
  • Other contributions such as surveys, demonstrations, visions, and reflections on table representation learning and generative models for structured data.

Organization

Workshop Chairs


Haoyu Dong
Microsoft
Laurel Orr
Numbers Station AI
Qian Liu
Sea AI Lab

Vadim Borisov
University of Tübingen




Program

TRL is again entirely in-person and will this year feature two poster sessions and contributed talks. We will also host a few exciting invited talks on established research in this emerging area, as well as a panel discussion focused on industry/startup perspectives.

Invited Speakers


Yasemin Altun
Google DeepMind
Binyuan Hui
Qwen Team, Alibaba
Mirella Lapata
University of Edinburgh
Gaël Varoquaux
Inria, Probabl
Matei Zaharia
UC Berkeley, Databricks




Panelists (to be confirmed)


Ines Chami
Numbers Station


Binyuan Hui
Qwen Team, Alibaba


Douwe Kiela
Contextual AI


Maithra Raghu
Samaya AI

Submission Guidelines

Submission link

Submit your (anonymized) paper through OpenReview at: TBC
Please be aware that accepted papers are expected to be presented at the workshop in-person.

Formatting guidelines

The workshop accepts regular research papers and industrial papers of the following types:
  • Short paper: 4 pages + references and appendix.
  • Regular paper: 8 pages + references and appendix.


Submissions should be anonymized and follow the NeurIPS style files (zip), but may exclude the checklist. Non-anonymous preprints are not a problem, and artifacts do not have to be anonymized; submitting the paper without author names/affiliations is sufficient. Supplementary material, if any, may be added in the appendix. The footer of accepted papers should state “Table Representation Learning Workshop at NeurIPS 2024”. We expect authors to adopt an inclusive and diverse writing style; the “Diversity and Inclusion in Writing” guide by the DE&I in DB Conferences effort is a good resource.

Review process

Papers will receive light reviews in a double-anonymous manner. All accepted submissions will be published on the website and made public on OpenReview but the workshop is non-archival (i.e. without proceedings).

Novelty and conflicts

The workshop cannot accept submissions that have been published at NeurIPS or other machine learning venues as-is, but we do invite relevant papers from the main conference (NeurIPS) to be submitted to the workshop as 4-page short papers. We also welcome submissions that have been published in, for example, data management or natural language processing venues. We rely on OpenReview for handling conflicts, so please ensure that the conflicts in every author's OpenReview profile are complete, in particular, with respect to the organization and program committees.

Camera-ready instructions

Camera-ready papers are expected to include the author names and affiliations on the first page, and to state “Table Representation Learning Workshop at NeurIPS 2024” in the footer. The camera-ready version may exceed the page limit for acknowledgements or small content changes, but revision is not required (for short papers: please be aware of the novelty requirements of archival venues, e.g. SIGMOD, CVPR). The camera-ready version should be submitted through OpenReview (submission -> edit -> revision) and will be published on OpenReview and this website. Please make sure that all metadata is correct as well, as it will be imported to the NeurIPS website.

Presentation instructions

All accepted papers will be presented as a poster during one of the poster sessions (the schedule per poster session will be released soon). For poster formatting, please refer to the poster instructions on the NeurIPS site (template, upload, etc.); you can print and bring the poster yourself or print it locally through the facilities offered by NeurIPS.
Optional: authors of poster submissions are also invited to send a teaser video of approx. 3 minutes (.mp4) to madelon@berkeley.edu, which will be hosted on the website and YouTube channel of the workshop.
Authors of papers selected for oral presentation are also asked to prepare a talk of 9 minutes (+1 min Q&A) and to upload their slides through the "slides" field in OpenReview (pdf) or share a link to Google Slides with madelon@cwi.nl. The schedule for the oral talks will be published soon. Recordings of the oral talks will be published afterwards.

Program Committee: TBC

We are very grateful to all below members of the Program Committee!
Wenhu Chen, University of Waterloo
Mukul Singh, Microsoft
Sercan O Arik, Google
Micah Goldblum, New York University
Andreas Muller, Microsoft
Xi Fang, Yale University
Naihao Deng, University of Michigan
Sebastian Schelter, BIFOLD & TU Berlin
Weijie Xu, Amazon
Rajat Agarwal, Amazon
Sharad Chitlangia, Amazon
Lei Cao, University of Arizona
Paul Groth, University of Amsterdam
Alex Zhuang, University of Waterloo
Sepanta Zeighami, University of California, Berkeley
Jayoung Kim, Yonsei University
Jaehyun Nam, KAIST
Sascha Marton, University of Mannheim
Tianji Cong, University of Michigan
Myung Jun Kim, Inria
Aneta Koleva, University of Munich
Peter Baile Chen, MIT
Gerardo Vitagliano, MIT
Reynold Cheng, The University of Hong Kong
Till Döhmen, MotherDuck / University of Amsterdam
Ivan Rubachev, Higher School of Economics
Raul Castro Fernandez, University of Chicago
Peng Shi, University of Waterloo
Paolo Papotti, Eurecom
Carsten Binnig, TU Darmstadt / Google
Tianyang Liu, University of California, San Diego
Tianbao Xie, The University of Hong Kong
Jintai Chen, University of Illinois at Urbana-Champaign
Sebastian Bordt, Eberhard-Karls-Universität Tübingen
Panupong Pasupat, Google
Liangming Pan, University of Arizona
Xinyuan Lu, National University of Singapore
Ziyu Yao, George Mason University
Shuhan Zheng, Hitachi, Ltd.
Shuaichen Chang, Amazon
Julian Martin Eisenschlos, Google DeepMind
Noah Hollmann, Albert-Ludwigs-Universität Freiburg
Linyong Nan, Yale University
Tianshu Zhang, Ohio State University
Liane Vogel, Technische Universität Darmstadt
Roman Levin, Amazon
Henry Gouk, University of Edinburgh
Yury Gorishniy, Moscow Institute of Physics and Technology
Edward Choi, KAIST
Gyubok Lee, KAIST
Mingyu Zheng, University of Chinese Academy of Sciences
Tassilo Klein, SAP
Ge Qu, The University of Hong Kong
Artem Babenko, Yandex
Shreya Shankar, University of California Berkeley
Xiang Deng, Google
Zhoujun Cheng, UC San Diego
Mengyu Zhou, Microsoft Research
Mira Moukheiber, MIT
Niklas Wretblad, Linköping University
Gust Verbruggen, Microsoft
Amine Mhedhbi, Polytechnique Montréal



Accepted Papers


2024

Your Paper?


2023

Oral

Poster





2022

Oral



Poster