January 24, 2026

Predictive Test Selection with Machine Learning

Use machine learning to run only high-risk regression tests, cut execution time and costs, and keep high failure detection while filtering flaky tests.
Josh Ip, Founder & CEO

Predictive test selection using machine learning is transforming how software teams manage regression testing. Instead of running every test after a code change, machine learning models identify and prioritize the tests most likely to fail. This approach saves time, reduces resource usage, and speeds up feedback in CI/CD pipelines. For example, Facebook's system catches 99.9% of regressions while running only about a third of the tests, and teams using ML-based selection report cutting test execution time by as much as 84%.

Key benefits include:

  • Faster testing: Only high-risk tests are executed, reducing delays.
  • Resource efficiency: Fewer tests mean lower infrastructure costs.
  • Improved accuracy: ML models learn from historical data to avoid flaky tests and focus on real issues.

This method is especially effective for large codebases, where traditional dependency-based or static analysis approaches often trigger unnecessary tests and struggle to adapt to complexity. By retraining regularly, ML models stay aligned with evolving codebases, ensuring reliable results without manual adjustments.

Platforms like Ranger simplify this process by automating test selection, filtering flaky tests, and integrating with CI/CD tools, making it easier for teams to maintain fast and reliable development workflows.

ML-Based Test Selection Performance Metrics and Benefits

What is predictive test selection?

Problems with Traditional Test Selection Methods

Traditional test selection methods struggle to keep up with the demands of modern development. These older approaches rely on rigid rules that fail to adapt to rapidly changing codebases, leading to inefficiencies and the need for more dynamic, machine learning-driven solutions.

Static Methods Fall Behind Code Changes

Dependency-based selection often casts too wide a net, running tests unnecessarily. As Facebook engineers Mateusz Machalica, Alex Samylkin, Meredith Porth, and Satish Chandra explained:

"This approach has a significant shortcoming: It ends up saying 'yes, this test is impacted' more often than is actually necessary."

Coverage-based methods, while more targeted, require constant maintenance to keep up with code changes. For example, in September 2025, T-Technologies (T-Bank) reported that their Test Impact Analysis tool was selecting 40% to 50% of their test suite for each commit, causing significant load spikes.

These tools also tend to overlook key areas. Typically, they focus solely on source code files and ignore changes to configuration files - like .yaml, .json, or .xml - that can also trigger test failures. As Pavel Plyusnin and his team at T-Technologies observed:

"Coverage-based approaches typically fail to account for changes in non-source code files such as configuration or resource files (e.g., .yaml, .json, .xml)."

The result? Systems bogged down by unnecessary tests and missed opportunities to predict failures accurately.

Struggles with Accurate Failure Prediction

Traditional methods don't just run more tests than necessary - they also fail to predict failures effectively. Static analysis often misses critical test cases, particularly in languages like Java, where features like reflection make dependency tracking more complex. On the other hand, dynamic analysis is more accurate but requires resource-intensive instrumentation and tracing, which can slow down CI/CD pipelines.

Test flakiness adds another layer of difficulty. Conventional methods often can't distinguish between genuine regressions and flaky tests that fail sporadically. This leads to wasted effort on irrelevant tests while real issues might go unnoticed, further straining resources and compromising reliability.

Using Machine Learning for Test Selection

Traditional test selection methods have their limits, especially with sprawling codebases. Machine learning (ML) offers a smarter, more adaptive approach. Instead of asking, "Which tests might be affected?" ML flips the script and asks, "Which tests are most likely to fail?" This shift from a deterministic to a probabilistic mindset brings a game-changing edge to handling large-scale software projects.

How Predictive Models Work

ML models thrive on data. They dig into historical patterns, analyzing vast datasets of past code changes - like commits, pull requests, and file modifications - and link them to test outcomes. The training process incorporates three key types of data:

  • Change-based features: Information from version control systems and continuous integration (CI) logs.
  • Test history features: Metrics like prior failure rates and execution times.
  • Semantic features: Natural-language descriptions and abstract syntax trees (ASTs).

When a developer submits new code, the model calculates a failure probability (a value between 0 and 1) for each relevant test. Tests that cross a predefined threshold are then selected for execution. This probability-driven approach ensures that the system focuses on the tests most likely to catch regressions.
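
To make the thresholding concrete, here is a minimal sketch in Python with scikit-learn. The feature columns, test names, and the 0.30 cut-off are invented for illustration; they show the shape of the idea, not any particular production system.

```python
# Minimal sketch: score each candidate test for a new change and keep only
# those whose predicted failure probability crosses a threshold.
# Features, test names, and the threshold are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy training data: one row per (past code change, test) pair.
# Columns: files_changed, lines_changed, test_recent_failure_rate, distance_to_change
X_train = np.array([
    [3, 120, 0.20, 1],
    [1,  10, 0.00, 4],
    [5, 300, 0.35, 0],
    [2,  40, 0.05, 3],
    [4, 150, 0.25, 1],
    [1,   5, 0.00, 5],
])
y_train = np.array([1, 0, 1, 0, 1, 0])  # 1 = the test failed on that change

model = GradientBoostingClassifier().fit(X_train, y_train)

# A new change arrives: compute features for every candidate test and select.
candidate_tests = ["test_login", "test_checkout", "test_search"]
X_new = np.array([
    [2, 80, 0.30, 0],   # test_login
    [2, 80, 0.01, 4],   # test_checkout
    [2, 80, 0.10, 2],   # test_search
])
FAILURE_THRESHOLD = 0.30  # assumed cut-off; tune to your team's risk tolerance

probabilities = model.predict_proba(X_new)[:, 1]
selected = [t for t, p in zip(candidate_tests, probabilities) if p >= FAILURE_THRESHOLD]
print(selected)
```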

To tackle test flakiness - those unpredictable, non-reproducible failures - the model employs strict retry protocols during training. This helps it differentiate between real regressions and random noise that could otherwise skew results. Among various algorithms, gradient-boosted decision-tree (GBDT) models stand out for their ability to handle complex, non-linear relationships in software data. They’re also easy to train and provide interpretable results, making them a reliable choice.
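
One way to apply that retry discipline when building training data is sketched below; the "only consistent failures count" rule is an assumption, and real pipelines may use different retry policies.

```python
# Minimal sketch: derive a training label from a test run plus its retries,
# so flaky (mixed pass/fail) results never pollute the training data.
from typing import List, Optional

def label_from_retries(outcomes: List[bool]) -> Optional[int]:
    """outcomes: results of one test plus its retries (True = passed).

    Returns 1 for a consistent failure (treated as a real regression),
    0 for a clean pass, and None for a flaky result to exclude from training.
    """
    if all(outcomes):
        return 0      # passed every time
    if not any(outcomes):
        return 1      # failed every retry: likely a genuine regression
    return None       # mixed results: flaky, drop from the training set

print(label_from_retries([True, True, True]))     # 0
print(label_from_retries([False, False, False]))  # 1
print(label_from_retries([False, True, False]))   # None
```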

These techniques collectively pave the way for substantial improvements in efficiency and cost-effectiveness.

Benefits of ML-Based Test Selection

The results speak volumes. Facebook’s ML-driven system caught 99.9% of regressions while executing just one-third of the tests that a traditional dependency-based approach would have selected. On average, practitioners using ML-based regression test selection save 84% of test execution time while still identifying 90% of failures.

Speed and adaptability are key advantages. Unlike static analysis tools that require manual adjustments, ML systems evolve automatically by retraining as the codebase changes. Accuracy remains high, too - Facebook’s production model maintained over 95% accuracy in predicting individual test outcomes, ensuring developers could trust the system.

Creating and Measuring Predictive Models

How to Build and Train Models

Building a predictive model starts with collecting historical data. This includes test outcomes (pass or fail), details about code changes (like modified files and functions), execution history to identify flaky tests, and coverage data. Since real failures are uncommon in well-maintained codebases, tools like PIT can generate synthetic faults - called "mutants" - to enrich the training dataset.

The next step is feature engineering, where raw data is turned into useful inputs for the model. For instance, you can track how often specific files are changed and study historical failure patterns linked to those files. When choosing a model, gradient-boosted decision trees often work well with tabular software data. Setting a failure probability threshold - such as 30% - helps decide which tests to run, striking a balance between speed and risk.
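
A toy version of that feature engineering might look like the following; the column names and records are invented to show the transformation, not taken from a real project.

```python
# Minimal sketch: turn raw change/test history into per-row features such as
# file change frequency and a test's historical failure rate.
import pandas as pd

history = pd.DataFrame({
    "commit": ["c1", "c1", "c2", "c2", "c3"],
    "file":   ["auth.py", "auth.py", "cart.py", "auth.py", "cart.py"],
    "test":   ["test_login", "test_checkout", "test_checkout", "test_login", "test_checkout"],
    "failed": [1, 0, 1, 0, 0],
})

# How often each file is touched (a simple churn/risk signal).
file_change_freq = history.groupby("file")["commit"].nunique().rename("file_change_freq")

# Each test's historical failure rate.
test_failure_rate = history.groupby("test")["failed"].mean().rename("test_failure_rate")

features = history.join(file_change_freq, on="file").join(test_failure_rate, on="test")
print(features[["file", "test", "file_change_freq", "test_failure_rate", "failed"]])
```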

It’s important to continuously retrain the model to account for recent code changes. Additionally, retrying failed tests aggressively can help differentiate between actual regressions and flaky failures. Once the model is trained, its performance should be measured systematically to ensure it reliably balances risk and efficiency.

Measuring Model Effectiveness

To ensure that machine learning-based test selection improves regression testing, you need to measure its effectiveness. Three key metrics are commonly used:

  • Recall: The percentage of actual failures the model catches.
  • Precision: The proportion of selected tests that actually fail.
  • APFD (Average Percentage of Faults Detected): Evaluates how quickly faults are identified.

In real-world use, models are expected to predict over 95% of test outcomes accurately.

"In production, we require our model to predict more than 95 percent of test outcomes correctly and to catch at least one failing test for more than 99.9 percent of problematic changes." - Mateusz Machalica, Alex Samylkin, Meredith Porth, and Satish Chandra, Facebook

Comparing Model Performance Trade-offs

Predictive test selection involves balancing safety and speed. High recall ensures more bugs are caught but requires running more tests. On the other hand, reducing the number of tests saves time but increases the risk of missing failures. This trade-off is especially critical in continuous testing environments, where efficiency directly affects development speed.
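
The trade-off is easy to see by sweeping the probability threshold over a batch of predictions; the probabilities and labels below are toy values chosen only to illustrate the curve.

```python
# Minimal sketch: how the failure-probability threshold trades test volume
# against the share of real failures caught. All values are invented.
probs  = [0.92, 0.71, 0.55, 0.40, 0.22, 0.10, 0.05, 0.02]
failed = [1,    1,    0,    1,    0,    0,    0,    0]

for threshold in (0.1, 0.3, 0.5, 0.7):
    selected = [p >= threshold for p in probs]
    tests_run = sum(selected) / len(probs)
    caught = sum(1 for s, f in zip(selected, failed) if s and f) / sum(failed)
    print(f"threshold={threshold:.1f}  tests run={tests_run:.0%}  failures caught={caught:.0%}")
```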

Blending machine learning with traditional static analysis tools like Ekstazi and STARTS can improve efficiency even further. For example, combining these approaches allows teams to run 25.34% and 21.44% fewer tests, respectively, compared to using the tools alone. Many teams prioritize higher recall, preferring to run extra tests rather than risk missing critical failures.

How Ranger Uses AI for Test Selection

AI-Based Test Suite Optimization

Ranger takes a smart approach to test selection by using gradient-boosted decision-tree classifiers to predict which tests are most likely to fail after a code change. Here's how it works: commits are transformed into vectors using a bag-of-words model, which allows for quick and efficient processing without relying on heavy coverage maps. The system evaluates a variety of features, including file characteristics (like how often files are changed and how many lines are added or deleted), test-specific data (such as historical failure rates), and cross-file relationships (like the proximity of tests to modified files based on directory structures). For an extra layer of precision, an optional semantic analysis feature uses pre-trained models like StarCoder2 to analyze code changes before and after commits.
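
As a rough illustration of the bag-of-words idea (not Ranger's actual code - the tokens, data, and model settings here are invented), a commit's changed paths and diff keywords can be vectorized and fed to a gradient-boosted classifier like this:

```python
# Illustrative sketch: bag-of-words featurization of a commit feeding a
# gradient-boosted classifier that predicts whether one test will fail.
# All data and settings are invented for the example.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import GradientBoostingClassifier

commit_texts = [  # changed paths plus diff keywords for past commits
    "src/auth/login.py add session token",
    "src/cart/checkout.py fix rounding",
    "config/settings.yaml bump timeout",
    "src/auth/login.py refactor password hash",
]
test_failed = [1, 0, 0, 1]  # did test_login fail on each of those commits?

vectorizer = CountVectorizer(token_pattern=r"[^\s]+")  # keep path-like tokens intact
X = vectorizer.fit_transform(commit_texts).toarray()
model = GradientBoostingClassifier().fit(X, test_failed)

new_commit = ["src/auth/login.py tweak session expiry"]
p_fail = model.predict_proba(vectorizer.transform(new_commit).toarray())[0, 1]
print(f"predicted failure probability for test_login: {p_fail:.2f}")
```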

In September 2025, T-Technologies applied this framework to a repository containing over 6,500 UI tests. The results? Test execution was reduced to just 15% of the suite, cutting execution time by a factor of 5.9 and speeding up CI/CD processes by 5.6 times - all while maintaining a failure detection rate above 95%. This technical foundation allows Ranger to simplify workflows through built-in automation.

Integration and Automation Features

Ranger’s optimized test selection seamlessly integrates into automated workflows. By connecting directly with tools like Slack and GitHub, Ranger makes quality assurance (QA) smoother and faster. When developers push code changes, the platform automatically identifies and runs the most relevant tests, using insights from historical data and code analysis. AI also plays a role in creating new tests, which are then reviewed by humans to ensure accuracy.

Another standout feature is how Ranger handles flaky tests. By analyzing historical logs and using automated tools, the system filters out unreliable tests, focusing only on true regressions. Plus, since Ranger hosts all the test infrastructure, teams don’t have to worry about managing testing environments or planning for capacity. These enhancements directly improve the speed and reliability of CI/CD pipelines, tackling common bottlenecks in modern integration workflows.
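
A very simple version of that flaky-test filtering - flagging any test that both passed and failed on the same revision - could look like the sketch below; real systems layer retries, time windows, and statistical checks on top, and the log records here are invented.

```python
# Minimal sketch: flag tests that both passed and failed on the same revision
# as flaky, so they can be quarantined or retried rather than trusted blindly.
from collections import defaultdict

runs = [  # (revision, test, passed)
    ("r101", "test_upload", True),
    ("r101", "test_upload", False),  # pass and fail on the same code: flaky
    ("r101", "test_search", True),
    ("r102", "test_search", False),  # failed, but consistently: a real signal
    ("r102", "test_search", False),
]

outcomes = defaultdict(set)
for revision, test, passed in runs:
    outcomes[(revision, test)].add(passed)

flaky = {test for (revision, test), results in outcomes.items() if len(results) > 1}
print(flaky)  # {'test_upload'}
```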

Measured Results and Outcomes

The impact of Ranger’s approach is clear: teams can run just 15%–33% of their tests while still catching 95%–99.9% of regressions. To keep up with evolving codebases, Ranger’s models retrain regularly using a sliding window of recent data. This ensures the system stays accurate without requiring manual adjustments. The result? Faster feedback cycles and more dependable deployments in CI/CD pipelines.
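
A sliding-window retraining loop can be as simple as the sketch below; the 30-day window, record layout, and model choice are assumptions rather than Ranger's actual configuration.

```python
# Minimal sketch: periodically refit the model on only the most recent
# results so it tracks the current codebase. Window size and fields are assumed.
from datetime import datetime, timedelta
from sklearn.ensemble import GradientBoostingClassifier

WINDOW_DAYS = 30

def retrain(records, now):
    """records: dicts with 'timestamp', 'features', and 'failed' keys."""
    cutoff = now - timedelta(days=WINDOW_DAYS)
    recent = [r for r in records if r["timestamp"] >= cutoff]
    X = [r["features"] for r in recent]
    y = [r["failed"] for r in recent]
    return GradientBoostingClassifier().fit(X, y)

now = datetime(2025, 9, 30)
records = [
    {"timestamp": datetime(2025, 6, 1),  "features": [5, 0.4], "failed": 1},  # too old: dropped
    {"timestamp": datetime(2025, 9, 10), "features": [3, 0.2], "failed": 1},
    {"timestamp": datetime(2025, 9, 12), "features": [1, 0.0], "failed": 0},
    {"timestamp": datetime(2025, 9, 20), "features": [4, 0.3], "failed": 1},
    {"timestamp": datetime(2025, 9, 25), "features": [2, 0.1], "failed": 0},
]
model = retrain(records, now)
print(model.predict_proba([[3, 0.25]])[0, 1])  # failure probability for a fresh change
```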

Conclusion: Machine Learning Changes Regression Testing

This article has highlighted how machine learning (ML) is reshaping regression testing, offering smarter, more efficient ways to handle testing processes. ML doesn't just run every possible test - it pinpoints the ones most likely to fail, saving time and resources. For instance, Facebook's predictive test selection system managed to catch 99.9% of regressions while running only 33.3% of the tests tied to modified code. The result? Faster feedback, reduced infrastructure costs, and more dependable deployments. On average, teams using ML-based test selection cut test execution time by 84% while still identifying 90% of failures. Research from March 2025 further supports this, showing a 29.24% reduction in unnecessary tests without sacrificing quality assurance.

"Machine learning is revolutionizing many aspects of life. It is our belief that software engineering is no different in this respect".

Unlike static approaches that need constant manual updates, ML models continuously retrain using recent test results. This means they adapt automatically - ignoring flaky tests, prioritizing higher-risk changes, and evolving alongside the codebase. With this continuous learning, accuracy improves without requiring additional engineering effort.

Platforms like Ranger make these advancements accessible to modern development teams. By offering AI-driven test optimization, Ranger eliminates the need for building custom ML infrastructure. It handles everything - model training, flaky test filtering, and seamless integration - allowing developers to focus on creating features rather than managing testing systems.

For teams struggling with slow CI/CD pipelines and bloated test suites, predictive test selection is more than just an upgrade. It’s a practical, results-driven solution that boosts speed, reduces costs, and enhances reliability.

FAQs

How does machine learning make regression testing faster and more efficient?

Machine learning brings a smart twist to regression testing through predictive test selection. This technique pinpoints the most relevant tests based on specific code changes, cutting down the number of tests that need to be executed while still maintaining reliability and precision.

By zeroing in on tests most likely to uncover potential issues, machine learning helps teams work more efficiently. It saves time, reduces resource usage, and keeps code quality intact. Plus, faster feedback loops mean developers can roll out new features with added confidence.

What data do predictive test selection models rely on?

Predictive test selection models sift through a range of data to pinpoint the most relevant tests to execute. They draw insights from sources like historical test results, version control logs, recent code changes, defect patterns, and attributes tied to specific test cases.

Using this information, these models prioritize tests that are likely to have the greatest impact. This approach not only saves time but also ensures testing remains thorough and dependable.

How does Ranger manage flaky tests during test selection?

Ranger leverages machine learning models that are trained on historical test data to pinpoint flaky tests and predict which ones are most likely to fail. This smart process helps streamline test selection, cutting down on unnecessary test runs and reducing false positives.

By improving the accuracy of testing results, Ranger allows teams to concentrate on genuine problems, ensuring they can confidently move forward with their code changes.
