Gaël Varoquaux

CARTE: toward table foundation models

Fri, 19 Jul 2024 00:00:00

Note

Foundation models, pretrained and readily usable for many downstream tasks, have changed the way we process text, images, and sound. Can we achieve similar breakthroughs for tables? Here I explain why, with “CARTE”, we have made significant headway.

Contents

Pre-training for data tables: hopes and challenges
- Pre-training is a …

Skrub 0.2.0: tabular learning made easy

Wed, 03 Jul 2024 00:00:00

We just released skrub 0.2.0. This release markedly simplifies learning on complex dataframes.

model = tabular_learner('classifier')

Simple, yet solid default baseline

The highlight of the release is the tabular_learner function, which facilitates creating pipelines that readily perform machine learning on dataframes, adding preprocessing to a scikit-learn-compatible learner …

Do AIs reason or recite?

Tue, 25 Jun 2024 00:00:00

Despite their apparent intelligence, conversational artificial intelligences often lack logic. The debate rages on: do they reason or do they recite snatches of text memorized on the Internet?

Note

This post was originally published in French as part of my scientific chronicle in Les Echos.

Conversational AI, or large language …

Promoting open-source, from Inria to :probabl.

Sun, 09 Jun 2024 00:00:00

Note

Open-source efforts around scikit-learn at Inria are spinning off into a new enterprise, Probabl, in charge of sustainable development of a data-science commons.

Contents

Prelude: funding scikit-learn is hard
The birth of a new ambition
Probabl, a mission-driven enterprise
Probabl is already having an impact
My position within Probabl …

People underestimate how impactful scikit-learn continues to be

Mon, 27 Nov 2023 00:00:00

Note

François Chollet rightfully said that people often underestimate the impact of scikit-learn. I give here a few illustrations to back his claim.

A few days ago, François Chollet (the creator of Keras, the library that democratized deep learning) posted:

Indeed, scikit-learn continues to be the most popular machine …