Defining the Problem and Objectives in Machine Learning: The Foundation of Any Successful Project

Many ML projects fail not because of the model, but because the problem was poorly defined. The problem definition phase is the foundation of any successful Machine Learning project: a mediocre model built on a well-defined problem will generally deliver more value than an excellent model built on a poorly defined one.

This stage determines what will be predicted, why, how success will be measured, and what business impact it will have. If this fails, everything that follows—data, features, models, evaluation—is built on sand.

1. Problem Formulation: From Vague to Actionable

The initial question must be concrete, measurable, and actionable. The difference between a well-formulated and poorly formulated problem determines the project’s fate.

Examples of poorly defined problems:

  • «We want to predict sales»
  • «We need to know which customers will leave»
  • «We want to improve production»

Examples of correctly formulated problems:

  • «Predict weekly demand per product and store with a mean absolute percentage error (MAPE) below 10% to optimize inventory levels and reduce stockouts»
  • «Identify customers with a probability above 70% of abandoning the service in the next 30 days to activate personalized retention campaigns»
  • «Detect defective industrial parts during inspection, catching at least 95% of defects (recall ≥ 95%), to reduce quality costs and claims»

A well-defined problem answers:

  1. What exactly will be predicted
  2. Why that prediction is needed
  3. When the prediction will be made
  4. What level of accuracy is required
  5. What concrete action will be taken with the result
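One way to keep these five questions honest is to record the answers in a structured object and refuse to start modelling while any of them is still empty. The sketch below is a hypothetical checklist (the `ProblemSpec` class and its field names are illustrative, not part of any library), filled in with the churn example from above:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class ProblemSpec:
    """Hypothetical checklist: one field per question a well-defined problem answers."""
    prediction_target: Optional[str] = None     # what exactly will be predicted
    business_reason: Optional[str] = None       # why that prediction is needed
    prediction_moment: Optional[str] = None     # when the prediction will be made
    required_performance: Optional[str] = None  # what level of accuracy is required
    downstream_action: Optional[str] = None     # what concrete action follows

    def missing_answers(self) -> list[str]:
        """Names of the questions that are still unanswered (None or empty)."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

churn_spec = ProblemSpec(
    prediction_target="Customer abandons the service in the next 30 days (yes/no)",
    business_reason="Activate personalized retention campaigns",
    prediction_moment="Daily scoring of all active customers",
    required_performance="Recall >= 0.85 on actual churners",
    # downstream_action not decided yet
)
print(churn_spec.missing_answers())  # ['downstream_action']
```

The vague formulation «We need to know which customers will leave» answers only the first question, so `ProblemSpec(prediction_target="which customers will leave").missing_answers()` lists the other four.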

2. The Target Variable: The Heart of the Project

Without a clear and well-defined target, there is no model. The target variable must be observable and measurable in historical data so the model can learn from it, and the inputs used to predict it must be available at prediction time.

Types of target variables by problem:

Type                      | Examples                          | Use cases
--------------------------|-----------------------------------|------------------------------
Binary classification     | Churn (yes/no), Fraud (yes/no)    | Detection, binary decisions
Multiclass classification | Product category, Risk level      | Segmentation, categorization
Regression                | Price, Demand, Delivery time      | Continuous value prediction
Time series               | Sales at t+7, Consumption at t+30 | Forecasting, planning
Ranking                   | Recommended product order         | Recommendation systems

Critical errors when defining the target:

  • Using a target that doesn’t exist in historical data
  • Defining a target that won’t be available in production
  • Ignoring concept drift (changes in target behavior over time)
  • Confusing correlation with causation when choosing the variable
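The first two errors usually come from building labels without a fixed point in time. A minimal sketch, assuming each customer has an optional cancellation date and that features come from dated events (both assumptions, as are the function names): the label is computed relative to a snapshot date, customers whose 30-day window has not finished yet get no label instead of a false 0, and only events before the snapshot are used as inputs.

```python
from datetime import date, timedelta
from typing import Optional

def churn_label(snapshot: date, cancel_date: Optional[date],
                observed_until: date, horizon_days: int = 30) -> Optional[int]:
    """Binary churn target for one customer, relative to a snapshot (prediction) date.

    Returns 1 if the customer cancels within the horizon after the snapshot,
    0 if the full horizon has been observed without a cancellation, and None
    when no valid label exists (customer already gone, or window not finished).
    """
    if cancel_date is not None and cancel_date <= snapshot:
        return None  # already churned: not part of the population we score
    window_end = snapshot + timedelta(days=horizon_days)
    if cancel_date is not None and cancel_date <= window_end:
        return 1
    if observed_until < window_end:
        return None  # outcome not observable yet: do not label it as 0
    return 0

def features_known_at(events: list[tuple[date, str]], snapshot: date) -> list[tuple[date, str]]:
    """Keep only events dated before the snapshot, i.e. what production would know."""
    return [event for event in events if event[0] < snapshot]

today = date(2025, 3, 1)
print(churn_label(date(2025, 1, 1), date(2025, 1, 20), today))  # 1
print(churn_label(date(2025, 1, 1), None, today))               # 0
print(churn_label(date(2025, 2, 15), None, today))              # None (window ends 2025-03-17)
```

Returning `None` rather than 0 for unfinished windows matters: labelling recent customers as "stayed" simply because their outcome is not known yet would bias the target toward non-churn.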

3. Project Objectives: Beyond the Model

An ML project doesn’t end with a trained model. It’s a complete solution that must generate measurable value.

[Figure: workflow of a Machine Learning project, from problem definition to target variable, objectives, metrics, and action]

Objectives must follow the SMART framework, a widely used method for converting vague intentions into actionable goals:

S – Specific

Function: Eliminate ambiguity and define exactly what will be achieved.

A specific objective answers: What exactly will we accomplish? Who is involved? Where will it be applied?

  • ❌ Vague: «Improve predictions»
  • ✅ Specific: «Improve the accuracy of the demand prediction model for the 50 stores in the northern region»

M – Measurable

Function: Establish quantifiable criteria to know if the objective has been achieved.

It must be answerable with a number or percentage. If you can’t measure it, you can’t manage it.

  • ❌ Not measurable: «Significantly reduce churn»
  • ✅ Measurable: «Reduce churn rate from 25% to 20%» or «Detect 95% of fraud cases»

A – Achievable

Function: Ensure the objective is realistic given available resources, time, and capabilities.

An objective should be ambitious but possible. Unattainable objectives demotivate the team and waste resources.

  • ❌ Unattainable: «Achieve 100% recall in fraud detection in 2 weeks with limited historical data»
  • ✅ Achievable: «Achieve 92% recall (share of fraud cases detected) in 3 months, improving from the current 85%»

R – Relevant

Function: Guarantee that the objective is aligned with the business’s strategic priorities.

Ask yourself: Why is this objective important? How does it contribute to the company’s overall goals? Is the effort worthwhile?

  • ❌ Irrelevant: «Predict customers’ favorite color» (if it doesn’t impact sales or retention)
  • ✅ Relevant: «Predict purchase probability in the next 7 days to optimize marketing campaign spending»

T – Time-bound

Function: Establish a clear deadline that creates urgency and allows planning.

Without a defined timeframe, objectives are postponed indefinitely. Time also helps prioritize resources.

  • ❌ No deadline: «Reduce inventory costs»
  • ✅ With deadline: «Reduce inventory costs by 15% before the end of Q2 2026»

Complete example applying SMART:

Poorly defined objective:
«We want to use ML to improve our business»

SMART objective:
«Reduce cart abandonment rate from 68% to 55% in the next 4 months through a personalized recommendation model, generating an estimated increase of €200K in monthly sales»

  • S: Reduce cart abandonment with personalized recommendations
  • M: From 68% to 55% (13 percentage points)
  • A: Based on industry benchmarks and previous pilot tests
  • R: Direct impact on revenue (€200K/month)
  • T: 4-month deadline

This structure helps you validate whether your objective is well-constructed before investing resources in the project.
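The "M" part is where objectives are most often misreported: 68% to 55% is a drop of 13 percentage points, but a relative reduction of about 19%. A small sketch that stores the SMART fields of the cart-abandonment example and reports both numbers (the `SmartObjective` class is hypothetical; "A" and "R" stay free text because they are judgments, not calculations):

```python
from dataclasses import dataclass

@dataclass
class SmartObjective:
    specific: str          # S: what exactly changes, and how
    baseline_pct: float    # M: current value of the metric, in %
    target_pct: float      # M: value to reach, in %
    achievable_basis: str  # A: evidence the target is realistic
    relevance: str         # R: link to a business priority
    deadline_months: int   # T: time limit

    def change_in_points(self) -> float:
        """Absolute change, in percentage points."""
        return self.target_pct - self.baseline_pct

    def relative_change(self) -> float:
        """Change relative to the baseline, as a fraction (-0.19 = 19% lower)."""
        return (self.target_pct - self.baseline_pct) / self.baseline_pct

cart = SmartObjective(
    specific="Reduce cart abandonment with personalized recommendations",
    baseline_pct=68.0,
    target_pct=55.0,
    achievable_basis="Industry benchmarks and previous pilot tests",
    relevance="Estimated +€200K in monthly sales",
    deadline_months=4,
)
print(f"{cart.change_in_points():+.0f} points, {cart.relative_change():+.1%} relative")
# -13 points, -19.1% relative
```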

4. Success Metrics: Technical vs. Business

The technical metric must serve the business metric, never the other way around. A model with an AUC-ROC of 0.95 that doesn’t generate value is a failure.

Technical metrics by problem type:

Classification:

  • Accuracy: when classes are balanced
  • Precision: when false positives are costly
  • Recall: when false negatives are critical
  • F1-score: balance between precision and recall
  • AUC-ROC: overall discrimination capability
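To see why accuracy is only meaningful with balanced classes, the sketch below computes these metrics from scratch in plain Python (function names are illustrative; in practice a library such as scikit-learn provides them). The hypothetical data set has 2 frauds among 100 transactions and a "model" that never predicts fraud:

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def auc_roc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 2 frauds among 100 transactions, model always predicts "not fraud"
y_true = [1] * 2 + [0] * 98
always_negative = [0] * 100
print(classification_metrics(y_true, always_negative))
# {'accuracy': 0.98, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}

# AUC-ROC works on scores rather than hard labels, so it does not depend on a threshold
print(round(auc_roc([1, 1, 0, 0, 0], [0.9, 0.4, 0.5, 0.2, 0.1]), 3))  # 0.833
```

A 98% accuracy with zero recall is exactly the situation the golden rule below warns about.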

Regression:

  • MAE (Mean Absolute Error): errors on the same scale as the target
  • RMSE (Root Mean Squared Error): penalizes large errors
  • MAPE (Mean Absolute Percentage Error): relative error, useful for comparing products or series of different scales; undefined when an actual value is zero
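A short sketch with made-up weekly demand numbers shows how the three metrics react differently: forecasts A and B have the same MAE and MAPE, but B concentrates its error in one large miss, which doubles its RMSE. The function names are illustrative.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, in the same units as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: squaring makes large misses weigh more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction (0.10 = 10%)."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [100, 100, 100, 100]       # hypothetical weekly demand
forecasts = {
    "A": [90, 110, 90, 110],        # four moderate misses
    "B": [100, 100, 100, 60],       # one large miss
}
for name, pred in forecasts.items():
    print(f"{name}: MAE={mae(actual, pred):.1f} RMSE={rmse(actual, pred):.1f} "
          f"MAPE={mape(actual, pred):.1%}")
# A: MAE=10.0 RMSE=10.0 MAPE=10.0%
# B: MAE=10.0 RMSE=20.0 MAPE=10.0%
```

If a single large stockout is much more costly than several small ones, RMSE reflects that business cost better than MAE.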

Business metrics:

  • Project ROI
  • Operational cost reduction
  • Increase in sales or retention
  • Time savings in manual processes

Golden rule: If your model has 98% accuracy but doesn’t reduce costs or increase revenue, you’ve failed in problem definition.

5. Real-World Constraints and Requirements

A model doesn’t operate in a vacuum. It must meet constraints that are often more important than the technical metric.

Technical constraints:

  • Maximum latency: how long can the prediction take? (100ms, 1s, 1 hour)
  • Computational resources: available memory, CPU, GPU
  • Update frequency: how often is the model retrained?
  • Data availability: which features will be available in real-time?
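The latency constraint can be checked long before deployment by timing the scoring function against the budget. A minimal sketch, assuming an in-process Python model (`toy_predict` and its weights are hypothetical stand-ins for a trained model); it measures only model computation, so network hops and real-time feature lookups must be added on top:

```python
import statistics
import time
from typing import Callable, Sequence

LATENCY_BUDGET_MS = 100.0  # e.g. the 100 ms real-time budget mentioned above

def p95_latency_ms(predict: Callable[[Sequence[float]], float],
                   requests: Sequence[Sequence[float]], repeats: int = 5) -> float:
    """Time each single-request prediction and return the 95th-percentile latency in ms."""
    timings_ms = []
    for _ in range(repeats):
        for features in requests:
            start = time.perf_counter()
            predict(features)
            timings_ms.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
    return statistics.quantiles(timings_ms, n=20)[-1]

# Hypothetical stand-in for a trained model: a fixed linear score
WEIGHTS = [0.4, -1.2, 0.05]

def toy_predict(features: Sequence[float]) -> float:
    return sum(w * x for w, x in zip(WEIGHTS, features))

sample_requests = [[1.0, 0.2, 30.0], [0.0, 1.5, 12.0], [3.0, 0.0, 45.0]] * 20
p95 = p95_latency_ms(toy_predict, sample_requests)
print(f"p95 = {p95:.4f} ms, within budget: {p95 <= LATENCY_BUDGET_MS}")
```

A high percentile is used instead of the mean because a real-time budget is usually broken by the slow tail of requests, not by the average one.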

Business constraints:

  • Interpretability: do you need to explain each decision? (compliance, regulation)
  • Implementation cost: available budget
  • Acceptable risk: what is the cost of an error?
  • User impact: customer experience

Legal constraints:

  • GDPR compliance and data protection
  • Handling sensitive data (health, finance)
  • Mandatory explainability (right to explanation)

6. From Prediction to Action

A model without an associated action is just a pretty chart. The prediction must trigger a concrete decision or process.

Examples of actions based on predictions:

  • If churn probability > 0.7 → send personalized offer to customer
  • If forecasted demand > current stock → generate automatic purchase order
  • If fraud score > threshold → block transaction and activate manual review
  • If defect detected → remove part from production line
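Because each rule above is just a condition on the model output, it can be written down and reviewed with the business before any model exists. A sketch with hypothetical action names; the churn threshold is the 0.7 quoted above, and the fraud threshold stays a parameter because the text leaves it open:

```python
from typing import Optional

CHURN_THRESHOLD = 0.7  # from the churn rule above

def churn_action(probability: float) -> Optional[str]:
    """Churn rule: personalized offer only above the agreed probability threshold."""
    return "send_personalized_offer" if probability > CHURN_THRESHOLD else None

def replenishment_action(forecast_demand: float, current_stock: float) -> Optional[tuple[str, float]]:
    """Demand rule: order the shortfall when the forecast exceeds current stock."""
    shortfall = forecast_demand - current_stock
    return ("create_purchase_order", shortfall) if shortfall > 0 else None

def fraud_action(score: float, threshold: float) -> str:
    """Fraud rule: block and send to manual review above the threshold."""
    return "block_and_review" if score > threshold else "approve"

def defect_action(defect_detected: bool) -> str:
    """Defect rule: take the part off the production line."""
    return "remove_from_line" if defect_detected else "continue"

print(churn_action(0.82))              # send_personalized_offer
print(replenishment_action(120, 80))   # ('create_purchase_order', 40)
```

The strict `>` mirrors the "> 0.7" wording: a customer scored at exactly 0.7 does not trigger the campaign.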

The action must be defined BEFORE training the model, because it conditions:

  • The optimal decision threshold
  • The balance between precision and recall
  • The cost of errors (false positives vs. false negatives)
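Once the action and its costs are known, the threshold can be chosen by minimizing the total cost of errors on validation data instead of defaulting to 0.5. A minimal sketch with a simple grid search; the scores are made up, the €15 false-positive cost echoes the campaign cost in the case below, and the €200 cost of a missed churner is a hypothetical figure:

```python
def expected_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Total cost of acting on every score above `threshold`."""
    cost = 0.0
    for actual, score in zip(y_true, scores):
        acted = score > threshold
        if acted and actual == 0:
            cost += cost_fp      # e.g. campaign sent to a customer who would have stayed
        elif not acted and actual == 1:
            cost += cost_fn      # e.g. churner who received no offer
    return cost

def best_threshold(y_true, scores, cost_fp, cost_fn):
    """Grid-search thresholds 0.01..0.99 and return the cheapest (first one on ties)."""
    candidates = [i / 100 for i in range(1, 100)]
    return min(candidates, key=lambda th: expected_cost(y_true, scores, th, cost_fp, cost_fn))

# Hypothetical validation scores for 3 churners and 5 non-churners
y_val = [1, 1, 1, 0, 0, 0, 0, 0]
s_val = [0.9, 0.6, 0.35, 0.5, 0.3, 0.2, 0.1, 0.05]

print(best_threshold(y_val, s_val, cost_fp=15, cost_fn=200))  # 0.3: missing a churner is expensive
print(best_threshold(y_val, s_val, cost_fp=50, cost_fn=15))   # 0.5: false alarms are expensive
```

The same scores lead to different thresholds depending only on the costs, which is why the action has to be defined first.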

7. Complete Practical Case

Let’s see how all these elements come together in a real project:

Problem:
Predict which customers of a SaaS platform will abandon the service in the next 30 days.

Target variable:
Binary churn (1 = abandons, 0 = remains)

Project objective:
Reduce churn rate from 22% to 18% in 6 months, increasing average LTV by 15%.

Technical metrics:

  • Recall ≥ 0.85 (capture at least 85% of actual churners)
  • AUC-ROC ≥ 0.90 (good discrimination capability)
  • Precision ≥ 0.60 (avoid saturating the retention team)
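These minimums can double as an automatic acceptance gate for every candidate model. A sketch, assuming validation metrics arrive as a dictionary (the metric keys and the results below are hypothetical):

```python
# Minimum values from the practical case above
TARGETS = {"recall": 0.85, "auc_roc": 0.90, "precision": 0.60}

def failed_targets(measured: dict[str, float], targets: dict[str, float] = TARGETS) -> list[str]:
    """Names of the metrics below their minimum (a missing metric counts as failed)."""
    return [name for name, minimum in targets.items()
            if measured.get(name, float("-inf")) < minimum]

# Hypothetical validation results
validation = {"recall": 0.88, "auc_roc": 0.91, "precision": 0.55}
print(failed_targets(validation))  # ['precision']
```

Treating a missing metric as a failure keeps a model from passing the gate simply because one number was never computed.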

Business metric:
Increase monthly retention rate from 78% to 82%, generating an additional €150K in ARR.

Constraints:

  • Interpretable model (legal department requirement)
  • Latency < 100ms for real-time scoring
  • Daily data updates
  • Retention campaign cost: €15 per customer

Action:
Send personalized retention campaign to customers with probability > 0.7, with incentive proportional to customer value.

Impact evaluation:
If the campaign retains 30% of identified customers, the project ROI is positive from month 3 onwards.
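A claim like this can be recomputed once the inputs are known. A simplified sketch: the €15 cost and 30% retention come from the case, but the 30% is applied here only to flagged customers who would really have churned (estimated with the 0.60 precision target), and the €30 monthly revenue per customer is a hypothetical figure, not from the case. Set `precision=1.0` to apply the 30% to all identified customers, as the sentence literally reads.

```python
from typing import Optional

def break_even_month(flagged: int,
                     cost_per_customer: float = 15.0,  # from the case: €15 per contacted customer
                     save_rate: float = 0.30,          # from the case: 30% retention success
                     precision: float = 0.60,          # ASSUMPTION: share of flagged who would really churn
                     monthly_revenue: float = 30.0,    # HYPOTHETICAL: revenue per retained customer per month
                     horizon_months: int = 12) -> Optional[int]:
    """First month in which recovered revenue covers the one-off campaign cost (ROI >= 0).

    Simplifications: a single campaign, no discounting, no value-based incentive,
    and retained customers are assumed to stay for the whole horizon.
    """
    campaign_cost = flagged * cost_per_customer
    customers_saved = flagged * precision * save_rate
    for month in range(1, horizon_months + 1):
        if customers_saved * monthly_revenue * month >= campaign_cost:
            return month
    return None

print(break_even_month(1000))                       # 3 with these made-up inputs
print(break_even_month(1000, monthly_revenue=1.0))  # None: never pays back within 12 months
```

The result depends entirely on the assumed revenue and precision, which is why those numbers belong in the problem definition rather than in a post-hoc justification.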

Conclusion

Correctly defining the problem is the most important step in the entire Machine Learning process. A clear problem, a well-defined target, and objectives aligned with the business are the foundation of any solution that generates real value.

When this phase is executed correctly, the rest of the process follows naturally: data exploration has a clear purpose, feature engineering is goal-oriented, model selection responds to concrete constraints, and evaluation measures what truly matters.

Investing time in this stage is not optional. It’s the difference between a successful ML project and a model that never reaches production.


In the next article of the series, we’ll explore data preparation and exploration (EDA): how to understand the dataset, detect quality issues, identify hidden patterns, and build the foundation for a solid model.