30 April 2026

Defining the Problem and Objectives in Machine Learning: The Foundation of Any Successful Project

Most ML projects that fail do so not because of the model, but because the problem was poorly defined. Problem definition is the foundation of any successful Machine Learning project: a mediocre model built on a well-defined problem will usually deliver more value than an excellent model built on a poorly defined one.

This stage determines what will be predicted, why, how success will be measured, and what business impact it will have. If this fails, everything that follows—data, features, models, evaluation—is built on sand.

1. Problem Formulation: From Vague to Actionable

The initial question must be concrete, measurable, and actionable. The difference between a well-formulated and poorly formulated problem determines the project’s fate.

Examples of poorly defined problems:

«We want to predict sales»
«We need to know which customers will leave»
«We want to improve production»

Examples of correctly formulated problems:

«Predict weekly demand per product and store with a mean absolute percentage error (MAPE) below 10% to optimize inventory levels and reduce stockouts»
«Identify customers with a probability above 70% of abandoning the service in the next 30 days to activate personalized retention campaigns»
«Detect defects in industrial parts during inspection with a minimum accuracy of 95% to reduce quality costs and claims»

A well-defined problem answers:

What exactly will be predicted
Why that prediction is needed
When the prediction will be made
What level of accuracy is required
What concrete action will be taken with the result
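
One way to make this checklist hard to skip is to write the problem statement down as a structured record and check it for blanks. The following is a minimal sketch, not a standard tool: the `ProblemStatement` class, its field names, and the `missing_answers` helper are hypothetical, and the values marked "assumed" are illustrative additions to the demand example above.

```python
from dataclasses import dataclass, fields


@dataclass
class ProblemStatement:
    """Hypothetical record with one field per question in the checklist."""
    target: str            # what exactly will be predicted
    purpose: str           # why that prediction is needed
    prediction_time: str   # when the prediction will be made
    required_quality: str  # what level of accuracy is required
    action: str            # what will be done with the result


def missing_answers(statement: ProblemStatement) -> list[str]:
    """Return the names of the fields that were left blank."""
    return [f.name for f in fields(statement)
            if not getattr(statement, f.name).strip()]


vague = ProblemStatement("sales", "", "", "", "")
concrete = ProblemStatement(
    target="weekly demand per product and store",
    purpose="optimize inventory levels and reduce stockouts",
    prediction_time="once a week, before orders are placed",  # assumed
    required_quality="error below 10%",
    action="adjust replenishment orders",                     # assumed
)

print(missing_answers(vague))
print(missing_answers(concrete))
```

The vague statement leaves four of the five questions unanswered, while the concrete one leaves none.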

2. The Target Variable: The Heart of the Project

Without a clear and well-defined target, there is no model. The target variable must be observable, measurable, and recorded in historical data, and the features used to predict it must be available at prediction time.

Types of target variables by problem:

Type	Examples	Use Cases
Binary Classification	Churn (yes/no), Fraud (yes/no)	Detection, binary decisions
Multiclass Classification	Product category, Risk level	Segmentation, categorization
Regression	Price, Demand, Delivery time	Continuous value prediction
Time Series	Sales at t+7, Consumption at t+30	Forecasting, planning
Ranking	Recommended product order	Recommendation systems

Critical errors when defining the target:

Using a target that doesn’t exist in historical data
Defining a target that won’t be available in production
Ignoring concept drift (changes over time in the relationship between the features and the target)
Confusing correlation with causation when choosing the variable
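
Several of these errors can be caught by writing the labeling rule down explicitly. The sketch below assumes a churn target with a 30-day horizon, like the examples in this article. The `churn_label` function, its parameters, and the dates are hypothetical. It returns no label when the customer had already left at snapshot time, or when the 30-day window extends past the end of the available data and the outcome is therefore unknown.

```python
from datetime import date, timedelta
from typing import Optional

HORIZON = timedelta(days=30)  # prediction window, as in the churn examples


def churn_label(snapshot: date, cancelled_on: Optional[date],
                data_end: date) -> Optional[int]:
    """Label one customer as seen on `snapshot`.

    Returns 1 if the customer cancelled within the horizon, 0 if not,
    and None when no valid label exists for this snapshot.
    """
    if cancelled_on is not None and cancelled_on <= snapshot:
        return None  # already gone: not a valid training example
    window_end = snapshot + HORIZON
    if cancelled_on is not None and cancelled_on <= window_end:
        return 1     # cancellation observed inside the window
    if window_end > data_end:
        return None  # window not fully observed yet: outcome unknown
    return 0         # window fully observed, no cancellation in it


data_end = date(2026, 3, 1)  # hypothetical last day of historical data
print(churn_label(date(2026, 1, 1), date(2026, 1, 20), data_end))
print(churn_label(date(2026, 1, 1), None, data_end))
print(churn_label(date(2026, 2, 20), None, data_end))
```

Treating "not yet observed" as 0 would silently bias the target toward non-churn, which is why the sketch returns `None` for it instead.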

3. Project Objectives: Beyond the Model

An ML project doesn’t end with a trained model. It’s a complete solution that must generate measurable value.

Objectives must follow the SMART framework, a proven method for converting vague intentions into actionable goals:

S – Specific

Function: Eliminate ambiguity and define exactly what will be achieved.

A specific objective answers: What exactly will we accomplish? Who is involved? Where will it be applied?

❌ Vague: «Improve predictions»
✅ Specific: «Improve the accuracy of the demand prediction model for the 50 stores in the northern region»

M – Measurable

Function: Establish quantifiable criteria to know if the objective has been achieved.

It must be answerable with a number or percentage. If you can’t measure it, you can’t manage it.

❌ Not measurable: «Significantly reduce churn»
✅ Measurable: «Reduce churn rate from 25% to 20%» or «Detect 95% of fraud cases»

A – Achievable

Function: Ensure the objective is realistic given available resources, time, and capabilities.

An objective should be ambitious but possible. Unattainable objectives demotivate the team and waste resources.

❌ Unattainable: «Achieve 100% accuracy in fraud detection in 2 weeks with limited historical data»
✅ Achievable: «Achieve 92% accuracy in fraud detection in 3 months, improving from the current 85%»

R – Relevant

Function: Guarantee that the objective is aligned with the business’s strategic priorities.

Ask yourself: Why is this objective important? How does it contribute to the company’s overall goals? Is the effort worthwhile?

❌ Irrelevant: «Predict customers’ favorite color» (if it doesn’t impact sales or retention)
✅ Relevant: «Predict purchase probability in the next 7 days to optimize marketing campaign spending»

T – Time-bound

Function: Establish a clear deadline that creates urgency and allows planning.

Without a defined timeframe, objectives are postponed indefinitely. Time also helps prioritize resources.

❌ No deadline: «Reduce inventory costs»
✅ With deadline: «Reduce inventory costs by 15% before the end of Q2 2026»

Complete example applying SMART:

Poorly defined objective:
«We want to use ML to improve our business»

SMART objective:
«Reduce cart abandonment rate from 68% to 55% in the next 4 months through a personalized recommendation model, generating an estimated increase of €200K in monthly sales»

S: Reduce cart abandonment with personalized recommendations
M: From 68% to 55% (13 percentage points)
A: Based on industry benchmarks and previous pilot tests
R: Direct impact on revenue (€200K/month)
T: 4-month deadline

This structure helps you validate whether your objective is well-constructed before investing resources in the project.
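
When writing the measurable part of an objective, it helps to state whether a change is absolute (percentage points) or relative, since the two are easy to confuse. A small sketch using the cart abandonment figures from the example; the `change_summary` function name is illustrative:

```python
def change_summary(baseline: float, target: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in %)."""
    absolute_pp = target - baseline
    relative_pct = (target - baseline) / baseline * 100
    return absolute_pp, relative_pct


# Cart abandonment rate: from 68% to 55%
pp, rel = change_summary(68.0, 55.0)
print(f"{pp:+.1f} pp, {rel:+.1f}% relative")  # -13.0 pp, -19.1% relative
```

The same objective is a 13-point drop or roughly a 19% relative reduction, so the objective should say which one is meant.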

4. Success Metrics: Technical vs. Business

The technical metric must serve the business metric, never the other way around. A model with an AUC-ROC of 0.95 that doesn’t generate value is a failure.

Technical metrics by problem type:

Classification:

Accuracy: when classes are balanced
Precision: when false positives are costly
Recall: when false negatives are critical
F1-score: balance between precision and recall
AUC-ROC: overall discrimination capability
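
To see why accuracy alone can mislead on imbalanced classes, the threshold-based metrics above can be computed directly from confusion-matrix counts. This is a minimal sketch in plain Python. The `classification_metrics` function and the fraud counts (1,000 transactions, 20 of them fraudulent) are hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical data: 1,000 transactions, 20 of them fraudulent.
# Model A never flags anything as fraud.
never_flags = classification_metrics(tp=0, fp=0, fn=20, tn=980)
# Model B catches 15 of the 20 frauds and raises 10 false alarms.
useful = classification_metrics(tp=15, fp=10, fn=5, tn=970)

for name, m in [("never flags", never_flags), ("useful", useful)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Model A reaches 98% accuracy with a recall of zero, while model B has almost the same accuracy (98.5%) but actually finds 75% of the fraud cases.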

Regression:

MAE (Mean Absolute Error): errors on the same scale as the target
RMSE (Root Mean Squared Error): penalizes large errors
MAPE (Mean Absolute Percentage Error): relative error, useful for comparing series with different scales; undefined when the actual value is zero
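
The three regression metrics can be sketched in a few lines of plain Python. The `regression_metrics` function and the sample values are hypothetical. Skipping zero actuals in MAPE is one possible convention, chosen here only to keep the example short.

```python
import math


def regression_metrics(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """Compute MAE, RMSE and MAPE (in %) for paired actual/predicted values."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE is undefined for actual values of zero; they are skipped here
    # (an assumed convention, not the only reasonable one).
    pct = [abs(e) / abs(a) for a, e in zip(actual, errors) if a != 0]
    mape = 100 * sum(pct) / len(pct) if pct else float("nan")
    return {"mae": mae, "rmse": rmse, "mape": mape}


actual = [100, 200, 50, 400]      # hypothetical demand values
predicted = [110, 190, 50, 300]   # one large miss on the last item

metrics = regression_metrics(actual, predicted)
print({k: round(v, 2) for k, v in metrics.items()})
```

With these values the MAE is 30 but the RMSE is about 50.5, because squaring makes the single miss of 100 units dominate.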

Business metrics:

Project ROI
Operational cost reduction
Increase in sales or retention
Time savings in manual processes

Golden rule: If your model has 98% accuracy but doesn’t reduce costs or increase revenue, you’ve failed in problem definition.

5. Real-World Constraints and Requirements

A model doesn’t operate in a vacuum. It must meet constraints that are often more important than the technical metric.

Technical constraints:

Maximum latency: how long can the prediction take? (100ms, 1s, 1 hour)
Computational resources: available memory, CPU, GPU
Update frequency: how often is the model retrained?
Data availability: which features will be available in real-time?

Business constraints:

Interpretability: do you need to explain each decision? (compliance, regulation)
Implementation cost: available budget
Acceptable risk: what is the cost of an error?
User impact: customer experience

Legal constraints:

GDPR compliance and data protection
Handling sensitive data (health, finance)
Mandatory explainability (right to explanation)

6. From Prediction to Action

A model without an associated action is just a pretty chart. The prediction must trigger a concrete decision or process.

Examples of actions based on predictions:

If churn probability > 0.7 → send personalized offer to customer
If forecasted demand > current stock → generate automatic purchase order
If fraud score > threshold → block transaction and activate manual review
If defect detected → remove part from production line

The action must be defined BEFORE training the model, because it determines:

The optimal decision threshold
The balance between precision and recall
The cost of errors (false positives vs. false negatives)
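
A rough illustration of how error costs drive the threshold: given scored examples with known labels, pick the candidate threshold that minimizes total cost. The function names, scores, labels, and cost figures below are all hypothetical, and a real project would estimate this on a held-out validation set.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Sum the cost of false positives and false negatives at a threshold."""
    cost = 0.0
    for score, label in zip(scores, labels):
        flagged = score > threshold
        if flagged and label == 0:
            cost += cost_fp   # acted on a case that did not need it
        elif not flagged and label == 1:
            cost += cost_fn   # missed a case that needed action
    return cost


def best_threshold(scores, labels, cost_fp, cost_fn, candidates):
    """Return the candidate threshold with the lowest total cost."""
    return min(candidates,
               key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))


# Hypothetical validation scores and true labels (1 = positive case)
scores = [0.95, 0.80, 0.65, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]

# Cheap action, expensive miss (e.g. a low-cost retention offer)
print(best_threshold(scores, labels, cost_fp=15, cost_fn=200, candidates=candidates))
# Expensive action, cheap miss (e.g. blocking a legitimate customer)
print(best_threshold(scores, labels, cost_fp=200, cost_fn=15, candidates=candidates))
```

With the same model and the same scores, the first cost structure selects a threshold of 0.1 and the second selects 0.7, which is why the action and its costs need to be known before tuning the model.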

7. Complete Practical Case

Let’s see how all these elements integrate in a real project:

Problem:
Predict which customers of a SaaS platform will abandon the service in the next 30 days.

Target variable:
Binary churn (1 = abandons, 0 = remains)

Project objective:
Reduce churn rate from 22% to 18% in 6 months, increasing average LTV by 15%.

Technical metrics:

Recall ≥ 0.85 (capture at least 85% of actual churners)
AUC-ROC ≥ 0.90 (good discrimination capability)
Precision ≥ 0.60 (avoid saturating the retention team)

Business metric:
Increase monthly retention rate from 78% to 82%, generating an additional €150K in ARR.

Constraints:

Interpretable model (legal department requirement)
Latency < 100ms for real-time scoring
Daily data updates
Retention campaign cost: €15 per customer

Action:
Send personalized retention campaign to customers with probability > 0.7, with incentive proportional to customer value.

Impact evaluation:
If the campaign retains 30% of identified customers, the project ROI is positive from month 3 onwards.
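
The figures in this case can be combined into a rough break-even check. The sketch below assumes that "retains 30% of identified customers" means 30% of the true churners among those contacted, and it reuses the precision floor of 0.60 and the €15 campaign cost from above. The value of a retained customer is not stated in the case, so `value_per_saved_customer` is a hypothetical input, and the function names are illustrative.

```python
def campaign_net_value(n_contacted: int, precision: float, save_rate: float,
                       cost_per_contact: float,
                       value_per_saved_customer: float) -> float:
    """Net value of a retention campaign under the stated assumptions."""
    true_churners = n_contacted * precision   # contacted customers who would leave
    saved = true_churners * save_rate         # churners the campaign retains
    return saved * value_per_saved_customer - n_contacted * cost_per_contact


def break_even_value(precision: float, save_rate: float,
                     cost_per_contact: float) -> float:
    """Customer value at which the campaign exactly pays for itself."""
    return cost_per_contact / (precision * save_rate)


# Precision 0.60, save rate 30% and €15 per contact come from the case;
# the €500 customer value and 1,000 contacts are hypothetical.
print(round(break_even_value(0.60, 0.30, 15.0), 2))
print(round(campaign_net_value(1000, 0.60, 0.30, 15.0, 500.0), 2))
```

Under these assumptions each retained customer must be worth at least about €83 for the campaign to break even. Whether that holds depends on the actual customer value, which has to come from the business.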

Conclusion

Correctly defining the problem is the most important step in the entire Machine Learning process. A clear problem, a well-defined target, and objectives aligned with the business are the foundation of any solution that generates real value.

When this phase is executed correctly, the rest of the workflow follows naturally: data exploration has a clear purpose, feature engineering is goal-oriented, model selection responds to concrete constraints, and evaluation measures what truly matters.

Investing time in this stage is not optional. It’s the difference between a successful ML project and a model that never reaches production.

In the next article of the series, we’ll explore data preparation and exploration (EDA): how to understand the dataset, detect quality issues, identify hidden patterns, and build the foundation for a solid model.
