This blog post is an excuse to share some good music and thoughts. Essential Mix (aka EM) is a super famous radio show on BBC Radio 1 featuring artists (DJs/producers) who deliver a two-hour mix of electronic dance music. I haven’t listened to every weekly Essential Mix since 1993, but I’ve been lucky to hear many, thanks to the Internet. Essential Mix is incredible because of the diversity of its artists and the quality of the shows. Here is an annotated list of my 10 favorite Essential Mixes (you can easily find them on YouTube or SoundCloud):
After a dozen French and German trains, I’m back from VaMoS 2020 and Magdeburg. The effort was worth it: a great conference about (software) variants/configurations with many discussions (a key feature of VaMoS!) and a diverse set of papers/presentations. I was co-chairing the program committee with Maxime Cordy this year. Modeling variability is the general theme, but the papers cover quite different topics, from counting, sampling, and learning (more related to artificial intelligence problems like SAT solving) to the maintenance, evolution, and reverse engineering of software. The application domains are also diverse: we selected papers about security, cyber-physical production systems, and operating systems. Some works tackle C-based code, Java code, Docker, modeling and testing artifacts, game artifacts, etc.
Shawn Presser has released an intriguing chess engine based on a deep-learning language model (GPT-2). The model was trained on the KingBase dataset (3.5 million chess games in PGN notation) in 24 hours using 146 TPUs (ouch!). The engine is purely based on text prediction, with no concept of chess. Though GPT-2 has already delivered promising (even stunning) results for text generation, one can be skeptical and wonder whether it can work for chess.
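To make the idea of “text prediction with no concept of chess” concrete, here is a toy sketch (not Presser’s actual GPT-2 engine): a bigram “language model” over PGN move tokens, trained on two made-up example games, that predicts the next move purely from text statistics and with no notion of legality.

```python
from collections import Counter, defaultdict

# Two tiny example games in PGN move notation (hypothetical training data).
games = [
    "e4 e5 Nf3 Nc6 Bb5 a6".split(),
    "e4 e5 Nf3 Nc6 Bc4 Bc5".split(),
]

# Count which token follows which: a bigram "language model" over moves.
follows = defaultdict(Counter)
for moves in games:
    for prev, nxt in zip(moves, moves[1:]):
        follows[prev][nxt] += 1

def predict_next(move):
    """Return the most frequent continuation seen after `move` in the corpus."""
    return follows[move].most_common(1)[0][0]

print(predict_next("e5"))  # → Nf3 (pure text statistics, no chess rules)
```

GPT-2 does the same thing at a vastly larger scale (subword tokens, 3.5 million games, attention over the whole game prefix), which is why it can look like it “plays” chess while only completing text.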
2019 is over (all the best!) and it’s a good excuse to report on some of my failures and successes (from an MIP to football, in no particular order):
DeepMind has defeated me again: I’ve spent another sleepless night trying to understand the technical and philosophical implications of MuZero, the successor of AlphaZero (itself the successor of AlphaGo). What is impressive and novel is that MuZero has no explicit knowledge of the game rules (or environment dynamics) and is still capable of matching the superhuman performance of AlphaZero (for which rules have been programmed in the first place!) when evaluated on Go, chess, and shogi. MuZero also outperformed state-of-the-art reinforcement learning agents on Atari games, again without explicit rules. In this blog post, I want to share some reflections and notes, and then specifically discuss the possible impacts of MuZero on chess.
I gave a talk (see the YouTube video) at Embedded Linux Conference Europe 2019 (co-located with Open Source Summit 2019) in Lyon about “Learning the Linux Kernel Configuration Space: Results and Challenges”. The conference was a blast, maybe one of the best conferences I’ve attended: diversity of topics, quality of presenters, in-depth and technical content, and many^many exchanges with people who want to understand, share, and help.
SPLC is the major international venue on software product lines and variability. Most of today’s software is made variable to allow for more adaptability and economies of scale, while many development practices (DevOps, A/B testing, parameter tuning, continuous integration…) support this goal of engineering software variants. Numerous systems and application domains have to deal with variants: automotive, mobile phones, Web apps, IoT systems, 3D printing, autonomous and deep learning systems, software-defined networks, security, generative art, games… to give just a few examples. The 23rd edition of SPLC was nicely organized in Paris, and I want to write a small post reporting on some interesting initiatives, papers, and topics.
I gave my last course of the academic year last week, in the context of a summer school for PhD students (slides are available). A good opportunity to recap and report on my teaching activities. Spoiler alert: almost 300 hours, with a varying audience (from undergraduate students to PhD students, from computer science to data science specialties), different formats (from self-contained 3-hour lessons to the supervision of a running project), and different content (modeling, testing, crafting languages and compilers, mastering variability, etc.).
Jupyter notebooks are awesome and widely used in data science. You typically share Web documents that contain live code and visualizations. In fact, you can do much more: I bet Jupyter will be the new Emacs and an operating system per se ;) So what about reading chess games inside Jupyter?
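As a teaser, here is a dependency-free sketch of the first step, extracting the moves from a PGN game with only the standard library (in a real notebook I would likely reach for a chess library to render boards; the PGN fragment below is illustrative):

```python
import re

# A minimal PGN fragment (illustrative).
pgn = """
[Event "Example"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 1-0
"""

def pgn_moves(pgn_text):
    """Extract SAN moves from a PGN string, skipping headers, move numbers, and the result."""
    body = "\n".join(l for l in pgn_text.splitlines() if not l.startswith("["))
    tokens = body.split()
    return [t for t in tokens
            if not re.fullmatch(r"\d+\.|1-0|0-1|1/2-1/2|\*", t)]

print(pgn_moves(pgn))  # → ['e4', 'e5', 'Nf3', 'Nc6', 'Bb5']
```

Once you have the move list, the live-code nature of a notebook lets you step through the game, one cell per position.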
We sometimes have to back up data for personal usage (e.g., photos) or in a professional context (e.g., in research). Data can be very large (gigabytes or terabytes), and I have faced many situations where (1) the data are located on a remote machine and (2) the copy is not straightforward. I give three examples below, with some tricks (or not). If you specifically want a trick with mysqldump, it’s at the end of the post.
Xtext is an open-source and popular framework for the development of domain-specific languages. The contract is appealing: specify a grammar and you get for free (no effort) a parser (ANTLR), a metamodel, a comprehensive editor (working in Eclipse, or even in the Web), and lots of facilities for writing a compiler/an interpreter.
A Twitter bot, called @wordnuvola, piqued my curiosity: it can generate a word cloud of your tweets (nice!). As a user, you simply have to follow the account and send a tweet. In return, you get a nice image after a few minutes. The service seems quite successful (viral?) since many people I follow use it for fun. My word cloud result was:
You can write Markdown as part of Jupyter notebooks. The basic idea is to use a special “Cell Type”. An important feature I struggled to activate is the ability to reference a (Python) variable inside Markdown.
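One workaround (besides the nbextension route) is to build the Markdown text in a code cell and render it with `IPython.display.Markdown`; a minimal sketch, with a hypothetical `accuracy` variable standing in for something computed earlier in the notebook:

```python
# In a notebook, pass this string to IPython.display.Markdown to get it
# rendered; building the Markdown with an f-string is the whole trick.
accuracy = 0.87  # hypothetical variable computed earlier in the notebook

cell = f"The model reached an accuracy of **{accuracy:.0%}** on the test set."
print(cell)  # → The model reached an accuracy of **87%** on the test set.
```

The downside is that the text lives in a code cell rather than a Markdown cell, which is exactly what the nbextension approach tries to avoid.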
How to automatically extract tabular data out of Wikipedia pages? Tables are omnipresent in Wikipedia and incredibly useful for classifying and comparing many “things” (products, food, software, countries, etc.). Yet this huge source of information is hard to exploit with visualization, statistical, spreadsheet, or machine learning tools (think: Tableau, Excel, pandas, R, d3.js, etc.) — the basic reason is that the tabular data are not represented in an exploitable tabular format (like the simple CSV format). At first glance, the problem looks quite simple: there should be a piece of code or a library capable of processing some HTML, iterating over some td and tr tags, and producing a good result. In practice, at least to the best of my knowledge, there is no reusable and robust solution yet: corner cases are the norm. In the remainder, I’ll explain why and to what extent the problem is difficult, especially from a testing point of view.
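To see why corner cases bite, here is the naive happy-path extractor the “iterating over td and tr tags” intuition suggests, using only Python’s standard library; it works on a clean table (made up for the example) but breaks as soon as cells span rows or columns (rowspan/colspan), tables nest, or cells contain markup.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect rows of cell text from <tr>/<td>/<th> tags (happy path only)."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

html = ("<table><tr><th>Country</th><th>Capital</th></tr>"
        "<tr><td>France</td><td>Paris</td></tr></table>")
p = TableExtractor()
p.feed(html)
print(p.rows)  # → [['Country', 'Capital'], ['France', 'Paris']]
```

A real Wikipedia table immediately defeats this sketch: a single `rowspan="2"` silently shifts every subsequent cell one column to the left, which is precisely the kind of corner case discussed in the remainder.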