Data Science Project Success - One pager
"Foresight is not about predicting the future; it's about minimising surprise." - Karl Schroeder
If there is anything my experience of applied data science has taught me, it's the importance of having a clear understanding of the destination you're aiming for. Just like planning a trip, this means drawing a map of the steps needed to get there. In the context of a Data Science project, that translates into defining your goals and outlining the key steps needed to achieve them.
.
What's your Why?
Diving headfirst into the technical aspects of a project, such as exploring models and initiating the Exploratory Data Analysis (EDA), is a common inclination, especially when the excitement of tackling a new challenge kicks in. However, this approach often overlooks the foundational step of questioning the "Why?" behind the endeavour.
- What is the problem you are trying to solve?
- Is it even worth solving/solvable?
- Why are we trying to solve it?
.
What's your problem?
.
Understanding the Problem: Before delving into the intricacies of data and models, it’s crucial to clearly define the problem at hand. A comprehensive understanding of the problem not only sets the direction for the project but also helps in determining the approach needed to address it.
.
Assessing Worth and Value: Not all problems are worth solving, and not all are solvable with the available resources and within the constraints. It's essential to evaluate whether the problem is significant enough to warrant the investment of time, effort, and resources. It's about weighing the benefits against the costs and challenges.
.
Determining the Purpose: Beyond the what and the how, the "Why?" is about understanding the purpose and the value of solving the problem. It involves aligning the problem-solving process with the overarching goals and objectives of the individual or organization undertaking it.
.
“Management is doing things right; leadership is doing the right things.” - Peter Drucker
.
Data Science pitfalls
1) Type III Error
A Type III error occurs when the problem-solving process is flawlessly executed, but the underlying problem being addressed is incorrect or not the most pertinent one. It is not about inaccuracies in solutions or conclusions; rather, it is about the misalignment between the problem addressed and the problem that actually needed solving! This happens all too often in business.
.
The Importance of Problem Formulation:
Charles Kettering's assertion, "A problem well stated is a problem half-solved," encapsulates the essence of avoiding Type III errors. It implies that clarity and precision in defining a problem take you a long way towards solving it.
.
"A problem well stated is a problem half-solved" Charles Kettering
It underscores the idea that the value derived from solving a problem is inherently tied to the relevance and accuracy of the problem formulated.
.
"The formulation of the problem is often more essential than its solution,"
Solving the wrong problem, no matter how efficiently, yields little to no value and can lead to misguided decisions and strategies.
.
"If I had an hour to solve a problem, I'd spend 55 minutes thinking about the problem and 5 minutes thinking about solutions." - Albert Einstein
.
Implications in Data Science:
.
In the context of data science, a Type III error can have far-reaching consequences. It can lead to the allocation of resources, time, and effort towards addressing issues that are of little importance or relevance to the business, while the actual problems remain unaddressed. This error typically stems from a lack of understanding of the domain, poor communication with stakeholders, or an outright misinterpretation of the core objectives.
.
Literature Reviews
I vividly recall a project from a while back where the objective was to cluster electricity usage load profiles. One team member promptly dove into pre-processing the data, analysing the curves, and exploring k-means clustering. Another colleague experimented with the SciPy library to group load profiles based on their peak times and locations.
At that juncture, I was reminded of the ingenuity of Fischer Black and Myron Scholes, who brilliantly adapted the heat-diffusion equation from physics to unravel the complexities of option pricing, work that was later recognised with a Nobel Prize in Economics.
Inspired by this interdisciplinary ingenuity, I embarked on a literature review and discovered the technique of using cross-correlation, commonly employed to cluster stock performance in the NASDAQ. The potential parallel was intriguing. With a collaborative spirit, we harnessed PySpark to compute the cross-correlation matrix of our time-series data, and then transitioned to Hierarchical Clustering. This cross-domain adaptation unveiled insightful clusters based on temporal patterns, effectively circumventing the challenges we encountered with k-means, and propelling our project forward on a promising trajectory.
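To make that concrete, here is a minimal sketch of the correlation-then-hierarchical-clustering step, assuming NumPy and SciPy on a small in-memory array rather than PySpark at scale; the synthetic data, the use of lag-zero correlation as a stand-in for the full cross-correlation, and the cluster count are illustrative assumptions, not details of the original project.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Toy stand-in for the real data: one row per meter, one column per time step
rng = np.random.default_rng(42)
load_profiles = rng.random((20, 48))  # e.g. 20 meters, 48 half-hourly readings

# Pairwise correlation between profiles, converted into a distance
corr = np.corrcoef(load_profiles)   # lag-zero correlation between every pair of profiles
dist = 1.0 - corr                   # similarly shaped profiles -> small distance
np.fill_diagonal(dist, 0.0)

# Hierarchical clustering on the condensed distance matrix
condensed = squareform(dist, checks=False)
tree = linkage(condensed, method="average")
labels = fcluster(tree, t=4, criterion="maxclust")  # ask for, say, 4 clusters
print(labels)
```

Framing the problem as distances between curve shapes, rather than raw magnitudes, is what allows the hierarchy to group profiles by temporal pattern.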
2) Premature Completion Fallacy
The Premature Completion Fallacy is a crucial concept for understanding project management dynamics and optimising productivity. At its core, it challenges the notion that starting a project early invariably leads to finishing it early, highlighting the importance of meticulous planning and preparation.
.
Understanding Premature Completion Fallacy:
The Premature Completion Fallacy is the erroneous belief that initiating a project sooner will always lead to its earlier completion. It overlooks the complexities, unforeseen challenges, and the necessity for thorough preparation and planning that are crucial in any project. The fallacy lies in equating the commencement of a task with its progression and completion.
.
Plan ahead, perform literature & methodology reviews, build out roadmaps and timelines before jumping in too deep! Be sure to identify potential roadblocks (e.g. data quality issues, technical constraints) and plan proactively for them to avoid project derailment.
.
The Wisdom of Aristotle and Lincoln:
.
Aristotle’s adage, "Well begun is half done," encapsulates the essence of avoiding the Premature Completion Fallacy. The emphasis here is not merely on beginning but on beginning well, with adequate preparation and a clear vision.
.
Abraham Lincoln’s quote, "Give me six hours to chop down a tree, and I will spend the first four sharpening the axe," exemplifies the importance of preparation. It underscores the idea that time spent on preparing and planning is not wasted but is a crucial investment.
.
Implications in Project Management:
In the realm of project management, succumbing to the Premature Completion Fallacy can lead to a myriad of challenges, including inadequate resource allocation, lack of clarity in objectives, and unforeseen complications. It can result in projects being rushed, with compromises on quality and effectiveness.
.
The two approaches to Data Science
1) The "Working Backwards" philosophy
The "Working Backwards" philosophy, advocated by Colin Bryar and Bill Carr, is a customer-centric methodology that goes beyond mere alignment with customer preferences; it’s about reconstructing the entire developmental process to cater to the evolving demands of the customer.
.
"…in a working-backwards world, starting with the customer and working backwards is the only way to build products and services that customers actually want." Colin Bryar and Bill Carr
.
This approach emphasises the importance of understanding, attracting, and retaining customers, requiring constant reflection and realignment of strategies to satisfy the consumer base.
It's key to give customers what they want, not what you think they want. Peter Drucker's insight captures this continual reassessment and realignment of strategies around consumer needs:
.
"The purpose of a business is to create a customer. The test of the organisation is its capacity to create customers." Peter Drucker
The Human Nature of Stakeholders
Human nature inherently drives people to support what they help build. When stakeholders feel that their voices are heard, and they are part of the creation process, a sense of ownership and affinity develops, leading to increased support, funding, and assistance in navigating organisational red tape. Stakeholders become advocates for your work, championing the initiative within the organisation.
This human inclination to support what one helps create is a powerful catalyst for securing stakeholder buy-in and fostering a collaborative and supportive environment.
.
2) The "Build It And They Will Come" philosophy
On the other hand, the "Build It And They Will Come" strategy is a more assertive approach: the team develops the solution on the foundational belief that there is an inherent demand for it.
.
This strategy, while bold, can be risky, potentially overlooking the need for consumer feedback and adaptation. It stands in stark contrast to the adaptive nature of the "Working Backwards" philosophy, highlighting the intricate balance between innovation, assumption, validation, and consumer response in the world of product development.
The "Build It And They Will Come" philosophy is akin to an entrepreneur stepping into the Dragon's Den, presenting a meticulously developed product, and seeking buy-in from potential investors and stakeholders. It's not easy!
.
The IKEA Effect
.
The concept of working backwards and early stakeholder engagement is closely related to the IKEA effect: the hypothesis that people place a higher value on things they have a hand in creating, much like the affinity one feels for IKEA furniture that has to be assembled from scratch with one's own hands.
.
There’s a story, a journey of creation, imbued in the product, fostering a sense of pride and attachment. When stakeholders are engaged early in a project, and their inputs are integrated into the development process, it creates a similar sense of ownership and value. This approach leverages the inherent human tendency to value self-created products, utilising the IKEA effect to build affinity and support for the project from the ground up.
.
So please, engage your stakeholders as early as you can, and they will help guide you to success!
Importance of having a data science advocate
.
The role of an advocate is pivotal in bridging the gap between technical intricacies and overarching organisational objectives. The advocate's task is to secure understanding, support, and, crucially, funding for data science initiatives. It is through advocacy that the intricate facets of data science are translated into organisational value.
.
There are two Ms; one is ‘model’, the other is ‘magic’. Data scientists think models, the business thinks magic.
.
Aristotle
Aristotle's concepts of ethos, pathos, and logos, articulated in his seminal work "Rhetoric", remain profoundly relevant in the realm of data science advocacy. Aristotle delineates credibility (ethos), emotional connection (pathos), and logical argument (logos) as the foundational pillars of effective persuasion, and these elements are crucial for a data science advocate seeking to secure understanding, support, and ultimately funding for data science initiatives.
Empathy and emotion can resonate more deeply and be more persuasive than mere logical reasoning, serving as powerful catalysts in gaining genuine support and commitment.
.
What will you deliver?
.
Establishing Clear Outcomes (The 'What')
Success must be defined by articulating the desired outcome in clear, quantifiable terms. This clarity acts as a compass, preventing the pursuit of shifting goals and aiding in the discernment of which projects are most aligned with organisational objectives. As Peter Drucker wisely stated, “If you can’t measure it, you can’t change it.” Utilising business metrics such as conversion rates or savings from fraud reduction is pivotal in measuring the success of a project.
.
Given the constraints of resources like time, budget, and external support, prioritisation becomes a necessity. It’s imperative to allocate resources to projects that promise substantial value, rather than to those that demand high precision but offer limited impact.
.
Crafting a Strategic Deliverable (The 'How')
It’s important to conceptualise a deliverable that aligns with the intended goals and seamlessly integrates with existing systems. For example, if an e-commerce platform aims to refine product discovery and user experience, several avenues could be explored—better search functionalities, optimising recommendations, or improving email campaigns. If the focus is on refining recommendations, the method of deployment needs careful consideration—whether to employ a daily updating cache or a service that generates real-time recommendations based on user input. Avoid developing an overly complex recommender, for example, if it cannot be integrated into the system.
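As a rough illustration of that deployment choice, the sketch below contrasts the two options; `recommend_for_user`, the cache layout, and the file path are hypothetical placeholders rather than a description of any particular system.

```python
import json
from datetime import date

def recommend_for_user(user_id: str, k: int = 5) -> list[str]:
    """Placeholder scoring logic; in practice this would call the trained model."""
    return [f"item_{hash((user_id, i)) % 100}" for i in range(k)]

# Option A: a daily batch job writes a cache that the website reads from.
def build_daily_cache(user_ids: list[str], path: str) -> None:
    cache = {uid: recommend_for_user(uid) for uid in user_ids}
    with open(path, "w") as f:
        json.dump({"as_of": date.today().isoformat(), "recs": cache}, f)

# Option B: a real-time service computes recommendations on each request.
def serve_request(user_id: str) -> list[str]:
    return recommend_for_user(user_id)  # fresher results, but adds latency and infrastructure cost

if __name__ == "__main__":
    build_daily_cache(["alice", "bob"], "daily_recs.json")
    print(serve_request("alice"))
```

The cached route keeps serving simple and fast at the cost of freshness; the real-time route stays current but pulls the model into the request path, which is exactly the kind of trade-off worth agreeing with the integrating teams before building anything elaborate.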
.
While an exhaustive plan isn't a prerequisite, outlining the fundamental aspects of the project is crucial to secure alignment and buy-in from business, product, and tech teams. Soliciting feedback early and often helps you avoid building a feature-rich yet ineffective deliverable that struggles with integration.
.
Defining the Scope
.
The scope is the backbone of a project, delineating the boundaries and setting the timeline—it’s the ‘Where’ and ‘When’ of a project.
“Scope is the boundary between ambition and reality.”
In a landscape of finite resources and competing priorities, a well-articulated scope is pivotal. It ensures focus, preventing the dilution of efforts and resources. It’s about defining the exact facets the project will touch and the depth of its impact.
A scope agreed upon by all stakeholders helps align varying expectations, providing a transparent snapshot of the project's path and expected deliverables and reducing the chances of future discrepancies and conflicts.
.
Boundaries and Constraints
One of the most important considerations in any Data Science project is the distinction between constraints and boundaries. While constraints refer to limitations imposed by the available data, resources, and time, boundaries refer to ethical and moral considerations that guide your approach and decision-making.
.
Constraints
.
Constraints, such as limited data availability, computing resources, or time, are pivotal considerations. It's crucial to remain realistic about these constraints and to adapt your approach as the project progresses. This might involve prioritising certain data sources or analysis techniques, or devising innovative solutions to operate within them.
There are three distinct types of constraints in a data science project: business, technical, and resource constraints.
.
Business constraints grant a budget and the liberty to experiment whilst requiring adherence to specific limitations.
.
Business constraints typically revolve around budget allocations and operational boundaries. They define the financial and strategic parameters within which a project must be executed. For example, a project may have the freedom to explore various models but within a set budget and timeline, ensuring that the experimentation does not compromise the financial stability or strategic alignment of the project.
.
Technical constraints include limitations on latency, throughput, interfaces, missing data, schemas, or formats that must be complied with.
.
Technical constraints relate to the specific technological limitations that a project must navigate. Adherence to these constraints is crucial to ensure the seamless integration and functionality of the developed solution within the existing technological infrastructure.
For example, a data science project may need to develop algorithms that process data within a certain latency, on a particular cloud provider, and exposed as a REST API, to prevent disruptions and ensure the smooth operation of the overall system.
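By way of illustration only, here is a minimal sketch of a model wrapped as a REST API with a latency check, assuming FastAPI and Pydantic are available; the 200 ms budget, the endpoint name, and the `predict` stub are assumptions for the sketch, not figures from any real project.

```python
import time
import logging
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
LATENCY_BUDGET_S = 0.2  # e.g. the integration contract allows 200 ms per call

class Features(BaseModel):
    values: list[float]

def predict(values: list[float]) -> float:
    """Stand-in for the real model; replace with the trained estimator."""
    return sum(values) / max(len(values), 1)

@app.post("/predict")
def predict_endpoint(payload: Features) -> dict:
    start = time.perf_counter()
    score = predict(payload.values)
    elapsed = time.perf_counter() - start
    if elapsed > LATENCY_BUDGET_S:
        logging.warning("Latency budget exceeded: %.3fs", elapsed)
    return {"score": score, "latency_s": round(elapsed, 4)}

# Run with, for example: uvicorn app_module:app --reload
```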
.
Resource constraints pertain to limitations on the amount of compute, platform, and memory resources available for the project.
.
In practice, resource constraints dictate the scale and complexity of the solutions that can be developed and deployed.
.
Balancing resource constraints is pivotal to developing a solution that is both effective and feasible, avoiding over-utilisation of available resources and ensuring the sustainability of the project in the long run.
.
Boundaries
Boundaries serve as the ethical and moral compass in projects, delineating the limits within which decisions and actions must reside to maintain integrity and ethical conduct. They are crucial in fostering trust and ensuring the project’s alignment with ethical standards and moral principles.
.
Ethical Considerations
Ethical considerations, such as data privacy, are pivotal, necessitating stringent protection measures and adherence to regulations to safeguard individual rights and maintain confidentiality. Addressing these considerations is vital in preserving the project's credibility and the trust of the individuals whose data is being handled.
.
Bias and Fairness
Ensuring fairness and addressing bias are essential to prevent discriminatory outcomes and promote equality. This involves a critical assessment of data and methodologies to identify and mitigate any inherent biases that may lead to skewed or unfair conclusions, fostering a sense of equality and justice in project outcomes.
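As one deliberately simple illustration, the sketch below measures the gap in positive-prediction rates between groups; the column names and toy data are assumptions, and a real fairness review would look at far more than a single metric.

```python
import pandas as pd

def positive_rate_gap(df: pd.DataFrame, group_col: str, pred_col: str) -> float:
    """Difference between the highest and lowest group-level positive-prediction rates."""
    rates = df.groupby(group_col)[pred_col].mean()
    return float(rates.max() - rates.min())

# Toy predictions with a sensitive attribute attached
preds = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B"],
    "approved": [1,   0,   0,   1,   1,   1],
})

gap = positive_rate_gap(preds, "group", "approved")
print(f"Positive-rate gap between groups: {gap:.2f}")  # a large gap warrants investigation
```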
.
Transparency
Transparency is integral in building trust and fostering accountability among stakeholders. Clear communication of methodologies, assumptions, and findings allows stakeholders to understand the decision-making processes, promoting openness, honesty, and informed understanding among all relevant parties.
.
Steering the Future
Karl Schroeder's book "Steering the Future" provides valuable insights into the process of planning and executing successful projects. One of the key insights from the book is the importance of working backwards from your goals to determine the steps needed to achieve them.
.
Schroeder describes this approach as "backcasting", where you start with a clear vision of your desired end state and work backwards to determine the steps needed to get there.
This approach can be particularly valuable in Data Science projects, where the complexity of the data and analysis can make it difficult to determine the most effective path forward.
Another important insight from Schroeder's book is the importance of taking a systems thinking approach to your project. This involves looking beyond the immediate problem you're trying to solve and considering the broader context in which your project exists.
.
By taking a systems thinking approach, you can identify potential risks and opportunities that may impact your project, and develop strategies to mitigate these risks and capitalise on these opportunities.
.
Time-boxing
Parkinson's Law and the Importance of Constraints
The traditional approach to project management involves starting with a solution and then estimating the time and resources required for each component. However, this approach is flawed, as it can result in an open-ended commitment of resources.
.
Parkinson's Law states that work expands to fill the time available for its completion.
In the context of Data Science projects, this means that without clear constraints and deadlines, projects can easily become bloated and never reach completion.
.
To avoid this trap, it's important to set clear constraints and deadlines for your project. This may involve defining a clear scope and timeline for your project, and identifying key milestones and deliverables that need to be completed within specific timeframes.
The time-box will vary across project stages. At the start, tighter time-boxes are required to limit wild goose chases and ensure that the project stays on track. Once there is more certainty of going into production, bigger time-boxes can be allocated.
.
Keep reading!
Stay tuned for subsequent articles where we will delve deeper into these concepts, providing more nuanced insights and practical strategies to navigate the intricate landscape of data science.
.
1 - Feasibility assessment
2 - Proof of Concept (POC)
3 - Deploy to production
4 - Operational Maintenance