You probably have a great idea for an AI project. Here’s how to put it to the test.
Part II of my Good AI Use Case series
The Two (and a half)-Step Test for choosing the right AI tool. (image by the author)
Generative AI and machine learning (ML) both have massive potential to streamline how we work and innovate. But you already know that. With so much opportunity knocking, what we’re more likely to struggle with is decision paralysis.
How will you know if your idea is actually a good AI use-case, and which of a crushing variety of approaches is most suitable?
But first, some definitions. There’s a lot of jargon being tossed around on this topic, so I want to be clear about what I mean when I use terms like ML and AI. Artificial Intelligence (AI) is the larger field working to equip computers with the ability to simulate human intelligence. Within AI are:
Machine Learning (ML): AI that learns and improves from data
Generative AI (gen AI): AI that specialises in creating new and unique content, such as text, images, or sounds
Large Language Models (LLMs): A type of generative AI model designed to understand and generate human language. This is where ChatGPT, Gemini, Claude and others sit.
Machine Learning and Generative AI are subfields within Artificial Intelligence. (image by the author)
A recent McKinsey Global Survey found that 65% of organisations surveyed are using generative AI, and importantly they’re already deriving value from the technology, both as cost reductions and as revenue increases. But don’t jump on yet. Though the data makes it clear there are real gains to be had from gen AI, it’s essential to evaluate your ideas carefully and determine whether they fit the capabilities of gen AI or of traditional ML.
How can you be sure your next big idea (or your colleagues’) is not only feasible but also well-matched for generative AI?
Put it to the test.
Ask yourself: what boring, repetitive tasks do I wish were automated?
This simple question paves the way for a two (and a half) step test that will help you confidently distinguish between problems best suited for generative AI and those where tried-and-true predictive ML reigns supreme.
Step 0. Ask yourself — What do I wish I could automate?
You may ask yourself, “What is that beautiful house?”
You may ask yourself, “Where does that highway go to?”
And you may ask yourself, “Am I right, am I wrong?”
From “Once in a Lifetime” by Talking Heads
Possibly the best justification for implementing an AI solution is to free up your time and energy (or your colleagues’) so you can spend it on the good stuff. The meaningful tasks. The interesting questions. The strategic work. Whatever that looks like for you. What you want is an AI-automated solution to streamline your processes and take care of the tedious, repetitive tasks that drain your resources.
Let’s consider the following real-world scenario to see how the Two-Step Test might be applied to a business challenge.
Case study:
Imagine you own a property consulting agency. Year in and year out, you’ve wished you could anticipate fluctuations in the property market. It’d give your business an advantage in a very competitive housing market. But your experienced team of real estate agents disagree on property value trends and frankly, they’re too busy to be bothered. Their time is better spent on connecting with clients than doing the desk research to estimate individual home prices.
Now it’s your turn to identify a repetitive task to test out as an AI project idea.
Have you identified a task? Move on to Step 1.
Step 1. Can it be done with traditional machine learning?
Case study: So you’ve decided you want to automate the task of predicting property value trends. What now? A very good option, if you have historical data on home sale prices and property characteristics, would be to leverage traditional predictive machine learning techniques. The historical data could be used to train, for example, a linear regression model, which would identify relationships between the relevant variables and predict future home values.
Traditional machine learning excels at tasks that involve identifying patterns in existing data. Feed it good quality data and watch it spin predictions into business value.
NOTE: the success of any ML model hinges on the quality and quantity of its training data, so an ML approach is only viable if you have data suitable for your model to learn from. (I’ll save a deeper discussion for a future post.)
For a more in-depth exploration of use-cases that allow traditional ML to shine, check out Part I of this Good AI Use-Case series.
Did you answer ‘Yes’ to the question of whether you can use classic predictive ML approaches? Hooray! Decision paralysis averted. Now escape the cycle at Step 1, and off you go!
If your answer is that traditional ML isn’t a viable option for your use-case, move on to Step 2.
Step 2. How do you feel about mistakes?
Case study:
Let’s stretch our housing market narrative a bit. Imagine your original idea has evolved into brainstorming potential housing market scenarios. Now you can use a generative AI model to create hypothetical scenarios based on historical data and descriptions. Your scenarios could then be analysed by your real estate experts to identify potential trends and opportunities in each one.
Generative AI excels at tasks that entail the creation of new outputs (“data”), whether that output consists of compelling narratives or innovative product designs or synthetic data for model training. But caveat emptor. Gen AI systems tend to prioritize aesthetic criteria, such as style or composition, over the accuracy of the generated content. No matter how shiny the packaging, a polished appearance does not guarantee the underlying content is accurate or factually correct.
The non-deterministic nature of generative AI means its results can fluctuate. As data and AI strategists, part of our job is to remind colleagues that even the most advanced tools can produce unexpected outcomes. Gen AI is not infallible. The inherent risk of errors in gen AI outputs is a reality that must be factored into project plans.
If there’s anything my teams and I have learned while building gen AI solutions, it’s that even the most advanced AI systems will make mistakes. Some errors, like context drift, seem small and mostly go unnoticed. Context drift is when generated output struggles to maintain a consistent topic throughout a longer discussion. In a chatbot project, these were incremental mistakes that led to a slightly drifting conversation. It was a mild annoyance. But in the case of a code generation model, context drift manifested as incorrect or incomplete code snippets because the model lost track of the original context of the code. Much more mission-critical. The lesson here is that even minor errors may eventually undermine the reliability of the generated output.
Much more public examples of mistakes are out there, too. Take the time when Twitter / X’s chatbot, Grok, misinterpreted tweets and generated a news headline accusing a professional basketball player of vandalism. The blunder stemmed from Grok misinterpreting social media posts that joked about the player “shooting bricks” (aka missing shots). Grok’s gaffe gives us a very potent (and very public) example of the dangers of AI misinterpreting informal language. If a generative AI model creates output that leads to legal or ethical consequences, who’s to blame? Ouch. Let’s agree to learn from these mistakes.
When we use generative AI in solutions we have to accept the reality and prepare for bungling and bloopers. Best case scenario, you don’t mind mistakes in AI outputs. Otherwise, you need to know how to check the generated outputs for correctness and plan accordingly. Ask yourself:
Is my use-case resilient to mistakes?
If not, how will I detect and correct any mistakes?
Mistakes Were Made (or The Subtle Art of Handling AI Mistakes)
What would a use-case look like where you don’t mind mistakes? In the best “I don’t mind mistakes” cases, AI will be used as a tool to augment human decision-making, not replace it. Your use-case might sound something like this:
AI-powered brainstorming and creative idea generation. Sometimes we need a spark to get the creative juices flowing. Generative AI can be used to create random ideas or variations on existing concepts. Even when nonsensical, these ideas can inspire human designers or marketers and be a springboard for creative exploration.
AI-assisted content moderation on social media platforms. The sheer volume of content on social media means human moderation of the platforms is nearly impossible. AI can be used to flag potentially offensive or harmful content for human reviewers. The AI might make some mistakes, like unnecessarily flagging innocuous content (false positive), but it can still be a valuable efficiency tool for reducing and focusing the workloads of human moderators.
Internal or customer-facing chatbots. Internal chatbots can be used to sift through vast volumes of text to provide your employees with an accurate, documented answer. Or you might use AI chatbots for routine tasks like processing returns or guiding your customers to the right department. While your chatbots might make mistakes or misunderstand complex questions, they can still be helpful for handling simple inquiries and freeing up humans for handling the more complex issues.
Did you answer ‘Yes’ to the question of whether you’re okay with mistakes? Super! Decision paralysis averted. Now escape the cycle at Step 2, and carry on!
If your answer is that you DO mind mistakes in your output, move on to Step 2.5.
Step 2.5. Do you know how you’ll detect mistakes?
Case study:
Your property consulting agency wants to ensure the quality of the AI-generated scenarios, therefore it’s essential to implement methods for detecting potential errors. You could compare the generated scenarios to real-world historical events and analyse any inconsistencies. Additionally, a team of real estate experts could review the scenarios and provide feedback on their plausibility and alignment with property value trends.
The other option for handling the inevitable errors in your AI’s generated output is to accept that they’ll be there and have a plan for detecting them. You’ll also need the capacity to correct mistakes before any damage is done. Some inspo for how and where to detect mistakes:
Use metrics for downstream model performance. Multiple metrics exist for assessing how the generated output compares to human outputs. For example, the BLEU score measures the similarity between a machine-generated answer and a set of reference answers provided by humans.
Evaluate AI performance against actual human response data. Methods like the TSTR score (Train Synthetic, Test Real) give you the ability to measure the quality of synthetic data. By training separate ML models — one on generated data and one on ‘real’ data — TSTR allows you to compare performance of the two models and assess whether and where the synthetic data resulted in mistakes.
Use human evaluation. Human experts well-versed in the use-case and expected outcomes can provide the most comprehensive assessment of generative AI outputs. Although resource-intensive, this approach offers unparalleled insights into quality, creativity, coherence, and factual accuracy. Is the tweet summary based on a joke? Is the flagged content offensive or not? Including a human-in-the-loop is crucial when dealing with subjective aspects that metrics might not fully capture.
In medical diagnosis. Quality assurance measures, such as cross-validating with expert clinicians and comparison with established diagnostic standards, are essential to detecting errors in medical settings.
In fraud detection. Continuous monitoring of model performance against known fraud cases and the development of anomaly detection algorithms to identify discrepancies between expected and actual patterns are required for fraud use-cases.
In machine translation. Translation use-cases require the use of metrics to evaluate both fluency and accuracy, such as BLEU and ROUGE scores, as well as human evaluation to identify contextual and semantic inconsistencies.
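To show what a metric like BLEU actually computes, here is a simplified, sentence-level sketch in plain Python. It is not a reference implementation: it assumes whitespace tokenisation and only uses 1-grams and 2-grams (standard BLEU typically goes up to 4-grams and is usually reported over a whole corpus), and the function names are my own. For real evaluations, an established library is the safer choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Share of candidate n-grams found in a reference, with counts clipped
    so that repeating a matching word cannot inflate the score."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def simple_bleu(candidate, references, max_n=2):
    """Simplified BLEU: geometric mean of clipped n-gram precisions,
    multiplied by a brevity penalty for candidates that are too short."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    c = len(candidate)
    # Use the reference length closest to the candidate length.
    r = min((len(ref) for ref in references),
            key=lambda ref_len: (abs(ref_len - c), ref_len))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * geo_mean


reference = "the cat sat on the mat".split()
print(simple_bleu("the cat sat on the mat".split(), [reference]))
print(simple_bleu("the cat sat".split(), [reference]))
```

An exact match scores highest, and a candidate that is correct but much shorter than the reference is penalised. Note what the metric does not tell you: a high overlap score says nothing about whether the content is factually right, which is why human review still matters.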
Safeguards like human oversight and downstream metrics will be the key to ensuring quality and reliability in your operational AI workflows.
Did you answer ‘Yes’ to having a plan to detect mistakes? Brilliant! Decision paralysis averted. Now escape the cycle at Step 2.5, and get to work!
If your answer is that you DO NOT have a plan and cannot put one in place for detecting mistakes in your output, STOP. Go back to the start. Do not pass go. Unfortunately, this idea may not be a good AI use-case. Give it some more consideration (can you plug any data or metrics gaps? Or do you have another project idea to explore?), then try testing the idea again.
Recap
The Two (and a half)-Step Test
Remember, you now have the tools in hand to help you escape decision paralysis. Use this simple Two (and a half)-Step Test to choose the right AI approach:
1. Can your task be accomplished with traditional machine learning? Traditional ML excels in tasks where interpretability and efficiency are crucial. Its ability to learn and make predictions from existing data makes it a strong choice for tasks like predicting housing market fluctuations, optimising inventory, and personalising plans (e.g., for medical treatments).
2. Mistakes will be made. How do you feel about that? If your AI will be used as a tool to augment human decision-making and not replace it, then you won’t mind mistakes. After all, there will be a human-in-the-loop for your tasks like brainstorming, assisted content-moderation, and internal chatbots. If creating new information or mimicking real-world data is essential, then generative AI may be your champion.
2.5. Mistakes were made. How will you find them? Be absolutely sure you have a plan for how you’ll handle mistakes when they happen, because they will happen. Gen AI opens up possibilities like drug discovery through virtual exploration of chemical spaces, automated content creation, and the generation of synthetic data for training other AI models. It’s a brave new world.
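If it helps to see the whole test in one place, the steps above can be written as a small decision function. This is just one way to encode the flow; the `UseCase` fields and the wording of the outcomes are my own shorthand, and the yes/no answers still have to come from you.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    suits_traditional_ml: bool   # Step 1: can predictive ML do the job?
    tolerates_mistakes: bool     # Step 2: is the use-case resilient to mistakes?
    can_detect_mistakes: bool    # Step 2.5: is there a plan to detect and correct them?


def two_and_a_half_step_test(idea: UseCase) -> str:
    """Walk an idea through the test and return the suggested next move."""
    if idea.suits_traditional_ml:
        return "Use traditional predictive ML"
    if idea.tolerates_mistakes:
        return "Use generative AI"
    if idea.can_detect_mistakes:
        return "Use generative AI with error detection in place"
    return "Rethink the idea, then test again"


price_trends = UseCase("Predict property value trends", True, False, False)
scenarios = UseCase("Brainstorm housing market scenarios", False, False, True)

for idea in (price_trends, scenarios):
    print(f"{idea.name}: {two_and_a_half_step_test(idea)}")
```

The order of the checks matters: an idea only reaches the gen AI questions once traditional ML has been ruled out.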
Looking ahead
Now that you’ve learned about the Two (and a half)-Step Test for evaluating AI use-cases, you’re equipped to make informed decisions and choose the right approach for your specific circumstances.
Are you ready to put your next big idea to the test?
Check out Part I in this Good AI Use-Case series if you’re keen to understand the strengths and limitations of both traditional machine learning and generative AI, along with examples of when and where to use them.
Stay tuned for future articles in this Good AI Use Case series. And if there’s a topic you’d like to see in this series, let me know in the comments!