VaMoS 2020

After a dozen French and German trains, I’m back from VaMoS 2020 and Magdeburg. The effort was worth it: a great conference about (software) variants/configurations with many discussions (a key feature of VaMoS!) and a diverse set of papers/presentations. I was co-chairing the program committee with Maxime Cordy this year. Modeling variability is the general theme, but the papers cover quite different topics, from counting, sampling, and learning (more related to artificial intelligence problems like SAT solving) to maintenance, evolution, and reverse engineering of software. The application domains are also diverse: we selected papers about security, cyber-physical production systems, and operating systems. Some works tackle C-based code, Java code, Docker, modeling and testing artifacts, game artifacts, etc.

GPT-2 and Chess

Shawn Presser has released an intriguing chess engine based on a deep-learning language model (GPT-2). The model was trained on the Kingbase dataset (3.5 million chess games in PGN notation) in 24 hours using 146 TPUs (ouch!). The engine is purely based on text prediction, with no concept of chess. Though GPT-2 has already delivered promising, even stunning, results for text generation, one can be skeptical and wonder whether it really works for chess.

MuZero: A new revolution for Chess?

DeepMind has defeated me again: I’ve spent another sleepless night trying to understand the technical and philosophical implications of MuZero, the successor of AlphaZero (itself the successor of AlphaGo). What is impressive and novel is that MuZero has no explicit knowledge of the game rules (or environment dynamics) and is still capable of matching the superhuman performance of AlphaZero (for which the rules were programmed in the first place!) when evaluated on Go, chess, and shogi. MuZero also outperformed state-of-the-art reinforcement learning on Atari games, without explicit rules. In this blog post, I want to share some reflections and notes, and then specifically discuss the possible impacts of MuZero on chess.

Variants Everywhere: My report of SPLC 2019

SPLC is the major international venue on software product lines and variability. Most of today’s software is made variable to allow for more adaptability and economies of scale, while many development practices (DevOps, A/B testing, parameter tuning, continuous integration…) support this goal of engineering software variants. Numerous systems and application domains have to deal with variants: automotive, mobile phones, Web apps, IoT systems, 3D printing, autonomous and deep learning systems, software-defined networks, security, generative art, games… to give just a few examples. The 23rd edition of SPLC was nicely organized in Paris, and I want to write a small post to report on some interesting initiatives, papers, and topics.

A Year of Teaching (2018-19 season)

I gave my last course of the academic year last week, in the context of a summer school for PhD students (slides are available). A good opportunity to recap and report on my teaching activities. Spoiler alert: almost 300 hours, with a varying audience (from undergraduate students to PhD students, from computer science to data science specialties), different formats (from self-contained 3-hour lessons to the supervision of a running project), and different content (modeling, testing, crafting languages and compilers, mastering variability, etc.).

Jupyter and chess analysis

Jupyter notebooks are awesome and widely used in data science. You typically share Web documents that contain live code and visualizations. In fact, you can do much more: I bet Jupyter will be the new Emacs and an operating system per se ;) So what about reading chess games inside Jupyter?

Some tricks to back up large data (like dump with SQL)

We sometimes need to back up data for personal usage (e.g., photos) or in a professional context (e.g., in research). Data can be very large (gigabytes or terabytes), and I have run into many situations where (1) the data are located on a remote machine and (2) the copy is not straightforward. I give three examples below, with some tricks (or not). If you specifically want a trick with mysqldump, it’s at the end of the post.

Standalone Solution for Loading Models with Xtext

Xtext is an open-source and popular framework for the development of domain-specific languages. The contract is appealing: specify a grammar and you get for free (no effort) a parser (ANTLR), a metamodel, a comprehensive editor (working in Eclipse, or even in the Web), and lots of facilities for writing a compiler/an interpreter.

Twitter and Word Cloud

A Twitter bot, called @wordnuvola, attracted my curiosity: it can generate a word cloud of your tweets (nice!). As a user, you simply have to follow the account and send back a tweet. In return, you get a nice image after a few minutes. The service seems quite successful (viral?) since many people I follow use it for fun. My word cloud result was:

Jupyter and Markdown

You can write Markdown as part of Jupyter notebooks. The basic idea is to use a special “Cell Type”. An important feature I struggle to activate is the ability to call a (Python) variable inside Markdown.
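As a rough illustration of the goal (not the mechanism discussed in the post), one workaround is to build the Markdown text in a code cell and render it from there. The sketch below uses only plain Python string formatting. The function `render_markdown` and the values `n_games` and `opening` are hypothetical names chosen for the example.

```python
def render_markdown(template: str, **variables) -> str:
    """Fill a Markdown template with the values of Python variables."""
    return template.format(**variables)


# Hypothetical values, for illustration only.
n_games = 42
opening = "Sicilian Defence"

text = render_markdown(
    "Loaded **{n_games}** games; most common opening: *{opening}*.",
    n_games=n_games,
    opening=opening,
)
print(text)
```

Inside a notebook, the resulting string can then be rendered with `display(Markdown(text))`, after `from IPython.display import Markdown, display`. This gives generated Markdown output from a code cell, which is different from referencing a variable directly inside a Markdown cell.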

The WikipediaMatrix Challenge

How to automatically extract tabular data out of Wikipedia pages? Tables are omnipresent in Wikipedia and incredibly useful for classifying and comparing many “things” (products, food, software, countries, etc.). Yet this huge source of information is hard to exploit with visualization, statistical, spreadsheet, or machine learning tools (think: Tableau, Excel, pandas, R, d3.js, etc.). The basic reason is that tabular data are not represented in an exploitable tabular format (like the simple CSV format). At first glance, the problem looks quite simple: there should be a piece of code or a library capable of processing some HTML, iterating over some td and tr tags, and producing a good result. In practice, at least to the best of my knowledge, there is not yet a reusable and robust solution: corner cases are the norm. In the remainder, I’ll explain why and to what extent the problem is difficult, especially from a testing point of view.
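To make the "at first glance" version concrete, here is a minimal sketch of the naive approach, using only the Python standard library. It walks over table, tr, td, and th tags and writes the cells as CSV. The class `TableExtractor`, the helper `table_to_csv`, and the sample HTML are all hypothetical, written for this illustration. The sketch deliberately ignores rowspan/colspan, nested tables, and malformed markup, which is exactly where the corner cases appear.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> as a list of rows (naive)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table encountered
        self._row = None   # row currently being filled, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and normalize whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(rows):
    """Serialize a list of rows to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Hypothetical, well-behaved input: real Wikipedia tables are rarely this clean.
html = """
<table>
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td>France</td><td>Paris</td></tr>
  <tr><td>Germany</td><td>Berlin</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(html)
csv_text = table_to_csv(parser.tables[0])
print(csv_text)
```

On this clean input the result is a three-line CSV. As soon as a cell spans several rows or columns, or a table is nested inside another, the rows produced by this sketch no longer line up with the visual table.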
