R4R (Reproducibility for R) aims to make R notebooks reproducible by detecting sources of non-determinism in a notebook and creating artifacts from the detected dependencies.

This project has received funding from the European Union’s Horizon Europe research and innovation program, ERC PoC 2022, under grant agreement No. 101081989.

Rigorous Engineering of Data Analysis Pipelines (RiGiD)

The RiGiD project lays the groundwork for this research programme and aims to develop a methodology for rigorous engineering of data analysis pipelines that can be adopted in practice. Our approach is pragmatic. Rather than chasing functional correctness, we hope to substantially reduce the incidence of errors in the wild. The research is structured in three overlapping chapters: a catalog of error patterns together with a labeled dataset to be shared with other researchers; a methodology and tooling for developing data science code with reduced error rates; and an evaluation through user studies and the development of tools for automating deployment.

This project is supported by the Czech Science Foundation under grant program GX23-07580X (excellence in research EXPRO).

Evolving Language Ecosystems (ELE)

The Evolving Language Ecosystems project explores the fundamental techniques and algorithms for evolving programming languages and their ecosystems. Our purpose is to reduce the cost of wide-ranging language changes and obviate the need for devising entirely new languages. Our findings will grant both researchers and practitioners a greater degree of freedom when experimenting with new ideas on how to express computation.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 695412.

Big Code: Scalable Analysis of Massive Code Bases

Computer code is increasingly a shared resource. Web sites such as GitHub and BitBucket host tens of millions of software projects. With great amounts of code come great opportunities and challenges. The Big Code project aims to automatically extract insights from large code bases by a combination of static program analysis and machine learning. The purpose of the project is to address three challenge problems: language ecosystem evolution, predictive workload performance modelling, and synthesis of personalized programming hints.

This project is supported by the Czech Ministry of Education, Youth and Sports from the Czech Operational Programme Research, Development, and Education, under grant agreement No. CZ.02.1.01/0.0/0.0/15_003/0000421.
