How to Win in 4 or 7 Chess Moves against GPTs

I found a way to systematically force a win in 4 moves against the latest release from OpenAI (ChatGPT-4o) and in 7 moves against the best GPT at chess, gpt-3.5-turbo-instruct, estimated at 1750 Elo. Clickbait: you don’t need to master chess to beat GPTs; you can well be rated 700 Elo, just follow the moves ;-) The story behind the discovery is interesting and worth sharing. I also discuss the generalization of the results and the implications for the future of GPTs and chess. Don’t fear the clickbait, the post is serious!

Read More

VaMoS 2024

I’m back from VaMoS 2024, held in Bern (Switzerland). As in 2020 (and actually as in 2007, 2008, …, 2023), it was a great conference about software variability, variants, and configurations. Here is a small report and a few thoughts about the event, which (teaser!) will be organized in Rennes (France) in 2025.

Read More

Linus Torvalds on LLM

Great interview of Linus Torvalds on many topics (Rust, git, etc.), including LLMs and programming tasks: https://www.youtube.com/watch?v=OvuEYtkOH88 (starting at 20:30). Here is the transcript (obtained using yt-dlp and OpenAI Whisper):
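For those who want to reproduce the pipeline, here is a rough Python sketch; the exact yt-dlp options and the Whisper model size ("base") are my guesses, and ffmpeg must be installed for the audio extraction.

```python
import yt_dlp
import whisper

# Download the audio track of the interview with yt-dlp (requires ffmpeg),
# then transcribe it locally with OpenAI Whisper.
url = "https://www.youtube.com/watch?v=OvuEYtkOH88"
ydl_opts = {
    "format": "bestaudio/best",
    "outtmpl": "torvalds_interview.%(ext)s",
    "postprocessors": [{"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}],
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])

model = whisper.load_model("base")   # larger models are slower but more accurate
result = model.transcribe("torvalds_interview.mp3")
print(result["text"])
```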

Read More

Testing your DSLs in Langium

Langium is a framework to build domain-specific languages (DSLs). The principle is to write a grammar-like specification and obtain, for free and with zero effort, a parser, an abstract syntax tree (AST), a customizable and advanced editor that can run on the Web or in any modern IDE such as VSCode, and facilities to interpret or compile programs written in your DSL. Basically, engineering external, textual DSLs at the speed of light! Langium is the successor of Xtext and is under active development. Langium targets TypeScript, the Language Server Protocol (LSP), VSCode, and Web technologies. One “feature” I miss from Xtext is the ability to programmatically test your DSL: test your syntax with conformant/illegal programs, test your interpreter or the multiple compilers of your DSL, etc. In this post, I want to outline a setup for facilitating the testing of a DSL using Langium.

Read More

Quick Thoughts about "Bridging the Human–AI Knowledge Gap: Concept Discovery and Transfer in AlphaZero"

DeepMind released once again a fascinating paper (preprint: https://arxiv.org/abs/2310.16410) that develops new interpretability techniques to communicate chess concepts (extracted from AlphaZero) to humans. Four top players, including Vladimir Kramnik and Maxime Vachier-Lagrave, improved their chess puzzle solving after being exposed to these concepts. The evidence is weak, but it’s a signal of a possible revolution. Quick thoughts…

Read More

Thanks ChatGPT for the tricks with PlantUML layout

Customizing the layout (or look and feel) of PlantUML diagrams is tricky. The documentation is dense and comprehensive, but there is still a gap between what we want and what we get. There are tricks and hacks to overcome default behaviors or missing features of PlantUML. Here I am sharing a fun session with ChatGPT, which successfully assisted me in a quite creative way! The solution does not seem to be in the documentation, and the “hack” can certainly be applied systematically. Or did I miss something with PlantUML?

Read More

Debunking the Chessboard: Confronting GPTs Against Chess Engines to Estimate Elo Ratings and Assess Legal Move Abilities

Can GPTs like ChatGPT-4 play legal moves and finish chess games? What is the actual Elo rating of GPTs? There has been some hype, (subjective) assessment, and buzz lately, from “GPT is capable of beating 99% of players” to “GPT plays lots of illegal moves” to “here is a magic prompt with Magnus Carlsen in the headers”. There are more or less solid anecdotes here and there, with counter-examples showing impressive failures or magnified stories about how well GPTs can play chess. I’ve resisted for a long time, but I’ve decided to do it seriously! I have synthesized hundreds of games with different variants of GPT, different prompt strategies, against different chess engines (with various skill levels). This post documents the variability space of experiments I have explored so far… and the underlying insights and results.

[Image: a chessboard with various parrots huddled together, symbolizing different GPT versions, looking puzzled at the chess pieces; on the opposite side, a serene fish, representing Stockfish, contemplates its next move, while a vigilant woman referee observes the game.]

The tl;dr is that gpt-3.5-turbo-instruct operates around 1750 Elo and is capable of playing legal moves end-to-end, even with the black pieces or when the game starts with strange openings. However, though some errors are “avoidable”, illegal moves are still generated in 16% of the games. Furthermore, ChatGPT-3.5-turbo and, more surprisingly, ChatGPT-4 are much more brittle. Hence, we provide a first solid piece of evidence that training for chat makes GPT worse on a well-defined problem (chess). Please do not stop at the tl;dr: read the entire blog post, there are subtleties and findings worth discussing!
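For the curious, here is a minimal sketch of the kind of game loop used in such experiments, assuming a local Stockfish binary on the PATH and using python-chess; gpt_move is a placeholder standing in for the actual call to the OpenAI API (prompting with the game so far, parsing the reply, and handling illegal outputs).

```python
import random
import chess
import chess.engine

def gpt_move(board: chess.Board) -> chess.Move:
    # Placeholder: the real experiments prompt a GPT model with the game so far
    # (e.g., as PGN movetext) and parse the returned move; here we just pick a
    # random legal move so the sketch is runnable.
    return random.choice(list(board.legal_moves))

engine = chess.engine.SimpleEngine.popen_uci("stockfish")
board = chess.Board()
while not board.is_game_over():
    if board.turn == chess.WHITE:
        board.push(gpt_move(board))          # "GPT" plays White
    else:
        result = engine.play(board, chess.engine.Limit(time=0.1))
        board.push(result.move)              # Stockfish plays Black
engine.quit()
print(board.result())
```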

Read More

Scientific Trip in Japan and SPLC 2023

I spent a couple of days in Japan (mainly Tokyo) to visit the National Institute of Informatics (NII) and attend the 27th ACM International Systems and Software Product Line Conference (SPLC 2023). I was involved in several scientific activities:

Read More

ChatGPT for Programming Variability and Variants

I presented at the 27th ACM International Systems and Software Product Line Conference (SPLC 2023) the paper “On Programming Variability with Large Language Model-based Assistant”, joint work with José A. Galindo and Jean-Marc Jézéquel. In short: how LLMs and ChatGPT can be concretely and originally used for programming software variability and variants. I first showed how ChatGPT can assist developers in implementing variability in different programming languages (C, Rust, Java, TikZ, etc.) and with different mechanisms (conditional compilation, feature toggles, command-line parameters, templates, etc.). With “features as prompts”, there is hope to raise the level of abstraction, increase automation, and bring more flexibility when synthesizing and exploring software variants. In a sense, generative AI meets generative programming.
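To make the idea of variability mechanisms concrete, here is a tiny illustrative Python example (mine, not taken from the paper) exposing two features through command-line parameters, one of the mechanisms listed above.

```python
import argparse
import json

# Toy "variant": two features (output format and colored output) are bound at
# run time via command-line parameters.
parser = argparse.ArgumentParser(description="Toy example of run-time variability")
parser.add_argument("--format", choices=["csv", "json"], default="csv")
parser.add_argument("--color", action="store_true", help="enable the 'color' feature")
args = parser.parse_args()

def render(data):
    out = json.dumps(data) if args.format == "json" else ",".join(map(str, data))
    return f"\033[92m{out}\033[0m" if args.color else out

print(render([1, 2, 3]))
```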

Read More

ChatGPT for Reengineering Variants

I just presented at the 27th ACM International Systems and Software Product Line Conference (SPLC 2023, VariVolution workshop) the paper “Generative AI for Reengineering Variants into Software Product Lines: An Experience Report”, joint work with Jabier Martinez. Basically, we use ChatGPT to analyze a bunch of software variants that differ a bit. A short and hopefully intuitive overview in this post; slides, paper, and git repo at the end. Creating or maintaining software usually leads to customizing and assembling pieces of code… but it can be a mess! To take an analogy, it’s much more convenient to structure the reusable pieces of a puzzle so that you can systematically combine them… even kids do that!

Read More

Deep Software Variability for Replicability in Computational Science (talk in Japan)

I talked about deep software variability and reproducibility/replicability in computational science at SES 2023, IPSJ/SIGSE Software Engineering Symposium in Tokyo, Japan. Thanks to Paolo Arcaini and Fuyuki Ishikawa for the invitation! Slides are here: https://docs.google.com/presentation/d/1S2YDDMHw9FJ-ogpiGvUvmeHkYhFOQo4Xbccmjg4FL_Q/edit?usp=drivesdk A few photos: https://twitter.com/acherm/status/1695305466455236925

Read More

The Irrational July 12, 1998

I’d like to tell a story about a very special day, July 12, 1998. Three things happened: France won the football World Cup for the first time, after decades of losing; it was my 14th birthday; and I lost a very important game at the European youth chess championship, in a purely irrational way… something I still have difficulties explaining 25 years later (and even beginners can wonder what happened). As today is July 12, I’m thinking it’s a good opportunity to take a step back and reflect on how these three things irrationally relate (or not?)

Read More

Stockfish(1): Chess Piece Values

What is the value of a queen, a bishop, or a knight in chess? There have been several attempts in the past, mainly in books, to help assess positions and make decisions (e.g., when exchanging pieces). We’ll all agree that, in general, you should not give up your queen for a pawn, but chess is full of subtlety and beauty! A Twitter thread nicely summarizes the different attempts. A nice Wikipedia article lists different proposals, including the article “Assessing Game Balance with AlphaZero: Exploring Alternative Rule Sets in Chess” from DeepMind, with Vladimir Kramnik as co-author (funny affiliation: “World Chess Champion 2000–2007”). I have been intrigued to know how Stockfish (the strongest open-source chess engine) encodes and deals with chess piece values. So here we go, let’s dig into the source code and the git repository!
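As a baseline before digging into the engine, here is a small python-chess sketch using the textbook values (pawn 1, knight 3, bishop 3, rook 5, queen 9); these are deliberately not the centipawn values hard-coded in Stockfish’s source, which are the subject of the post.

```python
import chess

# Classical, textbook piece values (not Stockfish's internal centipawn values).
VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
          chess.ROOK: 5, chess.QUEEN: 9}

def material_balance(board: chess.Board) -> int:
    """Material difference from White's point of view."""
    score = 0
    for piece_type, value in VALUES.items():
        score += value * len(board.pieces(piece_type, chess.WHITE))
        score -= value * len(board.pieces(piece_type, chess.BLACK))
    return score

print(material_balance(chess.Board()))  # 0 in the initial position
```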

Read More

Underrated Electronic Music Tracks (The Case of Club Quarantine)

Some songs/tracks are amazing but get far fewer views than they deserve. It’s really surprising… and meanwhile it’s a bit what we are sometimes seeking: hidden gems ;) I am a huge fan of listening to DJ sets and then identifying “killer” tracks. The amazing https://www.1001tracklists.com/ has taken the sharing of tracklists to the next level. Some IDs remain IDs forever… but fortunately not all of them.

Read More

COVID and Home Advantage in Football: An Analysis of Results and xG Data in European Leagues

In the COVID era, football matches have been played in empty stadiums and under unusual conditions. One hypothesis is that COVID may have challenged the way the game is played, and especially the advantage of playing at home (e.g., due to the absence of fans). In this post, I analyze match results and expected goals (xG)-like data of European leagues (Ligue 1, La Liga, Calcio, Bundesliga, Premier League, Russian championship) since 2014. The results show that something happened, especially in the Premier League and Ligue 1 during the 2020-2021 season. I argue there is thus a unique opportunity to qualitatively analyze the COVID period (potentially with data obtained from computer analysis) in order to understand the new tactical or management approaches implemented. Perhaps COVID will lead to a (r)evolution of modern football?
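For a flavor of the quantitative side, here is a minimal pandas sketch computing the home win rate per season; the file name and column names (date, home_goals, away_goals) are assumptions, and the actual analysis also relies on xG-like data.

```python
import pandas as pd

# Load one league's matches and compute the share of home wins per season.
df = pd.read_csv("matches.csv", parse_dates=["date"])
df["season"] = df["date"].dt.year            # simplification: real seasons span two years
df["home_win"] = df["home_goals"] > df["away_goals"]
print(df.groupby("season")["home_win"].mean())
```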

Read More

OBS and Virtual Cam (Linux)

As part of now-remote conferences or courses, you may need to show many things: not only your slides, but also your terminal, your browser, or yourself! There are many (proprietary) systems out there for sharing your screen live: BigBlueButton, Jitsi, Google Meet, Zoom, Skype, Teams, etc. My recent experience is that it does not always work so well. Hence, an idea is to create a virtual camera and project whatever you want through it. That is, you create this fake camera and then broadcast it to any conferencing system (you “just” have to select the virtual cam). Some details below about how I set everything up to achieve this on Linux with OBS, v4l2loopback, obs-v4l2sink, ffmpeg, etc.
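As a teaser, here is a rough Python sketch of the idea: load a v4l2loopback device and feed it with ffmpeg (the device number, module options, and input file are my choices, and OBS can push to the device through obs-v4l2sink instead of ffmpeg).

```python
import subprocess

# Create a virtual /dev/video10 device via the v4l2loopback kernel module...
subprocess.run(
    ["sudo", "modprobe", "v4l2loopback",
     "devices=1", "video_nr=10", "card_label=VirtualCam", "exclusive_caps=1"],
    check=True,
)

# ...and stream a video file into it; any conferencing tool can then select
# "VirtualCam" as a regular webcam.
subprocess.run(
    ["ffmpeg", "-re", "-i", "demo.mp4",
     "-f", "v4l2", "-vcodec", "rawvideo", "-pix_fmt", "yuv420p", "/dev/video10"],
    check=True,
)
```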

Read More

Docker New Inactive Image Policy and Reproducible Science

Docker is a popular technology for delivering software in so-called containers. It is mainly used for deploying applications in the cloud, but it is also widely used in the scientific community. Docker Inc. has introduced a new inactive image retention policy: in short, images that have not been pulled or pushed in 6 months are considered inactive and will be removed. In this blog post, I briefly discuss the possible impacts on science, based on my experience and some concrete cases. I have no silver bullet, but something is clear: scientists should react now and discuss/find alternative, sustainable solutions.

Read More

Programming (Chess) Puzzles with a Tweet

A friend posed the following puzzle/problem on social media: “Given an 8x8 chessboard, your goal is to place 4 queens and 1 bishop so that all squares of the board are controlled (through diagonals/lines; a piece controls the square where it is located).” My usual reaction is to either promptly ignore this kind of fake problem or to try solving it for real, on a concrete chessboard or mentally, sans voir. But in this quarantine period, I wanted to find a solution with a program (in upcoming blog posts, I may explain how this attitude becomes a pattern beyond chess puzzles). Here is a short story about the process that led to a Python solution in fewer than 280 characters, fitting in a tweet.
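The tweet-sized solution itself is in the full post; as a warm-up, here is a (much longer, non-golfed) python-chess helper that merely checks whether a given placement controls all 64 squares.

```python
import chess

def controls_all_squares(placement: dict) -> bool:
    """placement maps squares (e.g., chess.D4) to chess.Piece objects."""
    board = chess.Board(None)                  # start from an empty board
    for square, piece in placement.items():
        board.set_piece_at(square, piece)
    controlled = set(placement)                # occupied squares count as controlled
    for square in placement:
        controlled |= set(board.attacks(square))
    return len(controlled) == 64

# Arbitrary (almost certainly non-solving) placement, just to show the call:
example = {chess.D4: chess.Piece(chess.QUEEN, chess.WHITE),
           chess.G6: chess.Piece(chess.BISHOP, chess.WHITE)}
print(controls_all_squares(example))
```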

Read More

On the Longest Chess Game Ever(!?)

Do you want to see a chess game with almost 18,000 half-moves? Really? OK, here is a YouTube video of almost 5 hours. But wait: how is it possible? What’s the point? In practice, the longest games reach about 250 moves, and such games are real outliers (the mean is certainly less than 40 moves). So Tom Murphy did it again with his 6th chess paper at the very prestigious SIGBOVIK 2020 (have a look at the other papers; they are both funny and brilliant). Tom generated a game of 17,697 plies (about 8,849 moves), certainly the longest chess game ever.

Read More

Is Cercle the new Essential Mix?

This blog post is an excuse to share some good music and thoughts. Essential Mix (aka EM) is a super famous radio show on BBC Radio 1 featuring artists (DJs/producers) who deliver a two-hour mix of electronic dance music. I haven’t listened to every weekly Essential Mix since 1993, but I’ve been lucky to hear many, thanks to the Internet. Essential Mix is incredible because of the diversity of artists and the quality of the shows. Here is an annotated list of my 10 best Essential Mixes (you can easily find them on YouTube or SoundCloud):

Read More

VaMoS 2020

After a dozen French and German trains, I’m back from VaMoS 2020 and Magdeburg. The effort was worth it: a great conference about (software) variants/configurations with many discussions (a key feature of VaMoS!) and a diverse set of papers/presentations. I co-chaired the program committee with Maxime Cordy this year. Modeling variability is the general theme, but the papers cover quite different topics, from counting, sampling, and learning (more related to artificial intelligence problems like SAT solving) to maintenance, evolution, and reverse engineering of software. The application domains are also diverse: we selected papers about security, cyber-physical production systems, and operating systems. Some works tackle C-based code, Java code, Docker, modeling and testing artifacts, game artifacts, etc.

Read More

GPT-2 and Chess

Shawn Presser has released an intriguing chess engine based on a deep learning-based language model (GPT-2). The model was trained on the KingBase dataset (3.5 million chess games in PGN notation) in 24 hours using 146 TPUs (ouch!). The engine is purely based on text prediction, with no concept of chess. Though GPT-2 has already delivered promising/impressive results for text generation, one can be skeptical and wonder whether it actually works for chess.

Read More

MuZero: A new revolution for Chess?

DeepMind has defeated me again: I’ve spent another sleepless night trying to understand the technical and philosophical implications of MuZero, the successor of AlphaZero (itself the successor of AlphaGo). What is impressive and novel is that MuZero has no explicit knowledge of the game rules (or environment dynamics) and is still capable of matching the superhuman performance of AlphaZero (for which the rules were programmed in the first place!) when evaluated on Go, chess, and shogi. MuZero also outperformed state-of-the-art reinforcement learning on Atari games, without explicit rules. In this blog post, I want to share some reflections and notes, and then specifically discuss the possible impacts of MuZero on chess.

Read More

Variants Everywhere: My report of SPLC 2019

SPLC is the major international venue on software product lines and variability. Most of today’s software is made variable to allow for more adaptability and economies of scale, while many development practices (DevOps, A/B testing, parameter tuning, continuous integration…) support this goal of engineering software variants. Numerous systems and application domains have to deal with variants: automotive, mobile phones, Web apps, IoT systems, 3D printing, autonomous and deep learning systems, software-defined networks, security, generative art, games… to give just a few examples. The 23rd edition of SPLC was nicely organized in Paris, and I want to write a small post to report on some interesting initiatives, papers, and topics.

Read More

A Year of Teaching (2018-19 season)

I gave my last course of the academic year last week, in the context of a summer school for PhD students (slides are available). A good opportunity to recap and report on my teaching activities. Spoiler alert: almost 300 hours, with a varied audience (from undergraduate students to PhD students, from computer science to data science specialties), different formats (from self-contained 3-hour lessons to the supervision of a running project), and different content (modeling, testing, crafting languages and compilers, mastering variability, etc.).

Read More

Jupyter and chess analysis

Jupyter notebooks are awesome and widely used in data science. You typically share Web documents that contain live code and visualizations. In fact, you can do much more: I bet Jupyter will be the new Emacs and an operating system per se ;) So what about reading chess games inside Jupyter?
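A minimal sketch of what this looks like with the python-chess library (the games.pgn file is a placeholder): read a game, replay it, and render the final position inline in the notebook.

```python
import chess
import chess.pgn
import chess.svg
from IPython.display import SVG, display

# Read the first game of a PGN file and replay its main line.
with open("games.pgn") as pgn:
    game = chess.pgn.read_game(pgn)

board = game.board()
for move in game.mainline_moves():
    board.push(move)

# Render the final position as an inline SVG (a chess.Board also renders on its own).
display(SVG(chess.svg.board(board, size=350)))
```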

Read More

Some tricks to backup large data (like dump with SQL)

We sometimes have to back up data for personal usage (e.g., photos) or in a professional context (e.g., in research). Data can be very large (gigabytes or terabytes), and I have faced many situations where (1) the data are located on a remote machine and (2) the copy is not straightforward. I give three examples below, with some tricks (or not). If you specifically want a trick with mysqldump, it’s at the end of the post.
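To give the flavor of such tricks (this is a common pattern, not necessarily the trick from the end of the post; host, database, and credentials are placeholders): stream the dump over SSH and compress it on the fly, so no large intermediate file lands on the remote machine.

```python
import subprocess

# Run mysqldump remotely, compress on the remote side, and write the compressed
# dump locally; --single-transaction avoids locking InnoDB tables during the dump.
cmd = 'ssh user@remote-host "mysqldump --single-transaction mydb | gzip -c" > mydb.sql.gz'
subprocess.run(cmd, shell=True, check=True)
```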

Read More

Standalone Solution for Loading Models with Xtext

Xtext is an open-source and popular framework for the development of domain-specific languages. The contract is appealing: specify a grammar and you get for free (no effort) a parser (ANTLR), a metamodel, a comprehensive editor (working in Eclipse, or even on the Web), and lots of facilities for writing a compiler/interpreter.

Read More

Twitter and Word Cloud

A Twitter bot, called @wordnuvola, piqued my curiosity: it can generate a word cloud of your tweets (nice!). As a user, you simply have to follow the account and send it a tweet. In return, you get a nice image after a few minutes. The service seems quite successful (viral?), since many people I follow use it for fun. My word cloud result was:
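If you prefer a do-it-yourself approach to obtain a similar image, here is a minimal sketch with the wordcloud library; it assumes your tweets sit in a local text file (e.g., exported from a Twitter archive) and is not how @wordnuvola works internally.

```python
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Build a word cloud from a plain-text dump of tweets (one tweet per line).
with open("tweets.txt") as f:
    text = f.read()

wc = WordCloud(width=800, height=400, background_color="white").generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```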

Read More

Jupyter and Markdown

You can write Markdown as part of Jupyter notebooks. The basic idea is to use a special “Cell Type”. An important feature I struggled to activate is the ability to reference a (Python) variable inside Markdown.
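One workaround I am aware of (a sketch, shown in a code cell rather than a Markdown cell) is to build the Markdown string in Python and render it via IPython.display; the “Python Markdown” nbextension, which as far as I know offers a {{variable}} syntax directly in Markdown cells, is closer to what the post is after.

```python
from IPython.display import Markdown

accuracy = 0.93  # any Python variable computed earlier in the notebook
# Render Markdown that embeds the variable's value (works in a code cell).
Markdown(f"The model reaches an accuracy of **{accuracy:.0%}**.")
```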

Read More

The WikipediaMatrix Challenge

How can you automatically extract tabular data out of Wikipedia pages? Tables are omnipresent in Wikipedia and incredibly useful for classifying and comparing many “things” (products, food, software, countries, etc.). Yet this huge source of information is hard to exploit with visualization, statistical, spreadsheet, or machine learning tools (think Tableau, Excel, pandas, R, d3.js, etc.): the basic reason is that the tabular data are not represented in an exploitable tabular format (like the simple CSV format). At first glance, the problem looks quite simple: there should be a piece of code or a library capable of processing some HTML, iterating over some td and tr tags, and producing a good result. In practice, at least to the best of my knowledge, there is no reusable and robust solution yet: corner cases are the norm. In the remainder, I’ll explain why and to what extent the problem is difficult, especially from a testing point of view.
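To illustrate the “at first glance” part, here is the naive first attempt one would typically write with pandas.read_html (the Wikipedia page is an arbitrary example); the corner cases mentioned above, such as rowspan/colspan, nested tables, footnotes, and units, are exactly where such a one-liner breaks down.

```python
import pandas as pd

# Let pandas parse every <table> on the page (requires lxml or html5lib).
url = "https://en.wikipedia.org/wiki/Comparison_of_programming_languages"
tables = pd.read_html(url)

print(len(tables), "tables found")
print(tables[0].head())   # first table as a DataFrame, ready for CSV export
```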

Read More