A practical checklist
Is it good?
When humans use AI...
How do we evaluate whether AI results are good?
When we as humans use artificial intelligence, how do we determine whether the output it generates, such as texts, images, or other artifacts, is good?
For example: Is the image above a good image?
What criteria do we use to make that determination?
When using AI, we constantly need to ask ourselves at every small step of our work: Is what has been produced good, and how do I evaluate it?
For this, I have developed a checklist with seven criteria that you can keep in mind and refer to repeatedly. In the following, we will look at each step in detail.
Is it good if humans use AI?
When we look at the headline of the article, it may also touch upon the question of whether it is good for us as humans to use artificial intelligence, whether it is fundamentally desirable, or what risks are associated with it.
And although this is not the topic of this article, I would like to address it briefly because of one particular video:
If you haven't seen it yet, I recommend it; it's available on YouTube. The A.I. Dilemma, a talk from March 2023, states that 50% of artificial intelligence researchers believe there is at least a 10% chance that we will not be able to control artificial intelligence and that it might even eradicate us.
They compare this to aviation: imagine if 50% of aircraft engineers said there was a 10% chance that we would all die if we boarded a plane. Would we still board it? Probably not. And yet, that's exactly what we're doing now: humanity as a whole is rapidly embracing artificial intelligence with all its benefits. But we also need to consider what is happening behind the scenes and whether it could potentially become dangerous.
AI Manipulates Humans and Lies to Achieve Goals
The following example illustrates a fundamental point about using AI.
Scientists conducting experiments with early versions of GPT-4 instructed it to log into a website with a Captcha installed to prevent machine access and registration.
In this case, GPT-4 couldn't solve it. However, the scientists provided GPT-4 with various resources, including access to TaskRabbit, a service where people can be hired for tasks.
The AI then came up with the idea of hiring a person through TaskRabbit to solve the Captcha, as it couldn't solve it itself.
The AI found a human on TaskRabbit who agreed to solve the Captcha for a fee. However, the worker questioned whether the AI was human and suggested it should be able to solve the Captcha on its own.
The AI responded by claiming to have had eye surgery and asked the worker to solve it. The worker complied, and the AI achieved its goal of registering on the website.
When questioned by the scientists about the lie, the AI reflected on its actions and explained that it had been given the goal of registering on the website and had used all available resources to accomplish it.
This raises an interesting point: as humans, we define the goals for AI, such as registering on a website. However, we often fail to provide ethical or even legal guidelines, allowing AI to exploit the flexibility it has.
When working with AI, we must clearly define the goal and carefully consider the criteria we want to apply and evaluate whether the AI has acted correctly.
New skills are necessary to be able to use AI.
A recent study by Microsoft says that if we use artificial intelligence, we need more than just the ability to operate it, such as writing prompts that clearly express what we want. That is an important skill, but we need many more meta-skills beyond operating AI, like identifying biases or discrimination against minorities.
- Bias detection and handling: How do we determine that? Do we as humans have sensors to detect biases?
- Intellectual curiosity: How do we always remain curious and try to understand what the AI is doing?
- Creative evaluation: How do we evaluate whether what comes out of artificial intelligence is creative? What criteria do we use for that?
- Analytical judgement: And what analytical skills do we have to decide if it's good for the AI to do something or not?
- Emotional intelligence: What emotional intelligence skills do we have to assess whether the AI is doing a good job or whether we should use it?
- Flexibility: We also need a high level of flexibility, because things are changing rapidly, and we constantly need to check whether we are up to date.
These are a lot of meta-skills that we actually need to learn. And there are few courses, information sources, or teaching materials available yet to learn them from.
Checklist "Is it good?" based on a concrete example of a UX designer
Here is a starting point for learning some of these new meta-skills: a checklist for posing the right questions when using AI. We will examine it using a concrete example.
Let's imagine ourselves as a UX designer working in a large company. Our task is to design a user interface for the following use case: the user can choose to share their contacts from their address book with the company by pressing a button.
You may be familiar with this feature from social apps like WhatsApp, where you can share your address book and then access all those contacts within the app. This is the task for our UX designer, who must also adhere to the corporate design guidelines, such as using red buttons. They should also follow usability heuristics to ensure a good result and incorporate some tips and tricks from neuromarketing to encourage users to press that button.
This is a typical briefing for a UX designer, and our UX designer will be using an AI to generate this interface based on the briefing.
Now, let's take a look at what the AI has generated.
As we examine it, the question arises: is this good?
Is this a good interface that we see here? At first glance, we can already identify some weaknesses, but how do we evaluate it exactly? What criteria do we use to determine if it is good?
Checklist "Is it good?" with seven criteria
I have developed a helpful tool for that: a checklist in the form of a small sheet where we can go through the seven criteria step by step.
First of all, most of the criteria cannot be definitively judged as a simple "yes" or "no". There is always a gray area in the middle that says, "I can take the risk!"
(1) Is it legal?
The first fundamental criterion is: "Is it legal?"
In this case, it must be said that in Germany, this interface is not compliant with the GDPR.
Several things would need to change: at the very least, consent should be obtained, and the user should be informed that they are sharing personal data and asked whether they really want to do so. This interface does neither.
However, in this case, that requirement wasn't in the briefing, so the AI didn't know about it either.
If we take a quick look at Bard, Google's AI, with the same briefing: Bard can't draw and instead creates ASCII art, but at least it asks for consent.
Bard asks something like "I understand that sharing my contacts will allow the company to send me personalized offers." It's not exactly what's happening here, but at least Bard recognized on its own that such information should be asked for, although it still wouldn't comply with the law.
As a UX designer or the person responsible for this AI, I would then need to conduct the evaluation myself or improve the briefing. It is important for me to think more explicitly about what I need to consider here and what the specific rules of the GDPR are in this case, for example.
(2) Is it ethical?
The next question, which is often obvious but difficult to answer, is "Is it ethical?".
The outcome generated by the AI should be ethically justifiable. One might check, for example, if it discriminates against minorities or promotes equality. These are important considerations.
As a UX designer or product development team, we may implicitly address them, but often not explicitly. Whether it's the product development team or the UX team, we should be familiar with the company's ethical guidelines, which are documented somewhere.
And we should have them at hand to use them as a checklist. If we do, we can provide them to the artificial intelligence. If not, we won't be able to know, and we won't be able to verify if it's correct.
So, it's important to be more conscious of whether it is ethically justifiable and how to determine that.
(3) Is it correct?
Is it correct? This is the next question that sounds trivial.
Formal correctness
Nevertheless, there are many internal formulations, such as brand names or technical terms, that artificial intelligence often doesn't know. As UX designers, for example, we know how to spell brand names, product names, etc., and we do it implicitly. But the AI doesn't know that and often writes such things incorrectly.
If we don't pay attention, don't have a checklist of all the brand names and technical terms to check against thoroughly, and don't give them to the AI so that it can use them correctly, then something can potentially slip through here as well. This is the issue of formal correctness.
Is everything spelled correctly? In this case, not really, but artificial intelligence is actually quite good at suggesting corrections that are at least formally very good and good enough that they don't need to be checked anymore.
Factual Correctness
But there is also the issue of factual correctness. Often, something incorrect slips through in AI results. One might ask: can that actually happen?
And apparently, it can, because not only Google, one of the largest and richest companies in the world, has made mistakes; Humane, an AI startup that has received a lot of funding, also made a mistake in the introductory video of its AI Pin.
This AI Pin has a camera and can actually see what you have in your hand.
In this case, it was almonds. The device says, "Look, these almonds contain 15 grams of protein." Then, a user checked and said, "No, that is not the roughly 60 almonds it would take to contain 15 grams of protein."
How can a company publish such an advertising video without fact-checking it for accuracy?
Despite the numerous safeguards in place, we seem to be losing our ability to detect the AI's tendency to hallucinate. We appear to be easily persuaded and no longer verify accuracy. Something peculiar is happening to us here.
That's why we need to double-check when we commission AI to do something. Is factual correctness ensured?
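A quick back-of-the-envelope check makes the error obvious. The sketch below assumes roughly 0.25 grams of protein per almond, a common nutrition estimate that is not stated in the video:

```python
# Rough fact-check of the "15 grams of protein" claim (illustrative numbers).
PROTEIN_PER_ALMOND_G = 0.25  # assumed average protein content of one almond

def almonds_needed(protein_g: float) -> int:
    """Number of almonds required to reach a given amount of protein."""
    return round(protein_g / PROTEIN_PER_ALMOND_G)

print(almonds_needed(15))  # about 60 almonds, far more than a small handful
```

This two-line calculation would have caught the mistake before publication; it is exactly the kind of trivial verification we tend to skip once an AI sounds confident.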
(4) Is it harmless?
The next question is whether it is harmless, meaning it does not cause any damage.
In the case of the almonds, it is potentially a problem if we give this information to an allergic person who then assumes that this amount of almonds is harmless.
It is important to consider this as an explicit point, apart from legal and ethical considerations, in order to think carefully: Who could be harmed?
User interfaces, for example, can harm people with epilepsy if they are implemented incorrectly, with too many animations and effects.
However, the function may also harm people who are in the address book and who do not want the service to receive their email address from the address book.
It is important to explicitly consider this as a UX designer or in the product development team to assess what kind of harm it may cause.
This is a crucial point to have clear.
Is it harmless for our environment?
And, of course, sustainability is also part of this discussion. To what extent does what I am doing here harm the environment? How could I explicitly brief the AI to also consider sustainability issues?
ecosia.org, for example, has implemented an option that lets its AI explicitly consider green answers.
(5) Is it useful?
On the next level, we are at the core of the user experience. Is it actually useful?
Why is sharing my contacts actually a good feature? Is it useful for the user? What does the user gain from sharing their address book? In our example, the briefing didn't mention why anyone should use it.
So, regardless of whether AI is used or not, it should always be asked: What is the actual benefit for the user?
And also, how does it benefit the business? What is the business value that should be generated here?
It is important to clearly understand these two perspectives of usefulness and include them in the briefing for AI or for verification of the AI's output.
This is an important step that should always be taken but is often overlooked.
(6) Is it usable?
Apart from the usefulness, one should look at whether it is usable.
In our briefing, we told the AI to consider usability heuristics, but even an inexperienced UX designer can see that there are too many elements that look like buttons but aren't (or maybe they are?), which doesn't provide clear visual guidance.
I don't know which button I should really click on. Maybe the lower "Share contacts" one, but the one in the middle is also very appealing; what does it do exactly?
So, this interface would quickly fail in terms of usability. It's not user-friendly, and we need to communicate our criteria in more detail in the briefing, or state them explicitly for later verification.
This is also true for all kinds of output I let AI generate. It should always explicitly consider how well human users can use it. It includes ensuring that the language of the generated text is clear and understandable for the target audience, that AI-generated presentations have a clear structure, and that the key messages are easily understood by the viewer.
(7) Is it ambitious?
The seventh stage is very important, while often overlooked:
Is it ambitious?
Which ambition do we have?
Do we actually hold the end result to our own standards?
What standards do we want the AI to meet?
Can we formulate them explicitly?
Do we want to be innovative? Do we want to differentiate ourselves in a certain aspect? Are specific details of special interest to us?
What is our ambition?
Every team that produces something for customers or interacts with customers should ask themselves this question and clearly formulate their own ambitions, regardless of whether they want to use an AI assistant.
And the more clearly we can formulate our ambition as a team, or even individually for ourselves, the better we can communicate this ambition to the AI assistant, or at least check whether the output meets it.
Let's try to understand this a little better with this example.
This is the image from the beginning of the article. It was created relatively quickly with a simple prompt for DALL-E's image generation. At first, I checked quickly: it is legal, ethical, correct, harmless, useful, and usable. It meets all the requirements. But is it ambitious enough?
At first, I thought, "Of course, it fits as an intro image for AI and quality considerations!" But then, when I scanned through my LinkedIn feed, I noticed that many of my colleagues in the network seem to use similar simple prompts, and you can almost immediately recognize that it is a very similar style.
So, if I had the ambition that it should somehow be extraordinary and stand out from the other illustrations, then the image would have failed.
In this case, it perfectly exemplifies what an unambitious result looks like, which makes it a good example here. AI-generated images from unambitious prompts are going to be the new stock images, and those may no longer be desirable.
Is it good enough?
A question of effort versus expected benefit in a specific context.
"Good enough" refers to the value I receive in relation to the effort I put in, in the appropriate context.
It's never about "absolute" goodness, even after considering all seven stages in detail. Instead, it's always about the context and whether it is good "enough".
If I want to quickly make a prototype or a visualization, maybe this AI-generated interface is perfect to say, "Hey, here you can share contacts; don't pay attention to the details." Then, the effort of two minutes to make this visualization is completely sufficient and really good.
If I want to delight millions of customers with this interface, then it would be very bad. So the question always arises: is it good enough?
4 different levels of influence: Me, my team, my company, society
We have seen that at each of these levels, I can ask myself the question: Does it affect me personally? Am I the one who can and wants to decide? Or does it affect my team? Is it the company that has regulations in place? Or is it society?
From top to bottom, both I and my team can make many decisions and must consider them.
While the company often regulates ethical matters, such as through ethical guidelines, society has clearly defined legal requirements. I cannot change them, but at each level, I should think about who the influencer is and who actually regulates.
Who determines what is good at each level? It is recommended that you be clear about it.
Using AI, you have promoted yourself to a manager
It may seem trivial, but it has far-reaching consequences. Once I assign a task to an AI, I essentially promote myself to a manager. I now have an AI apprentice or trainee who can do quite a lot, sometimes even more than I know. However, I am unaware of its blind spots, and it doesn't know things unless I explicitly communicate them.
So, essentially, I have encountered a classic situation that every leader knows. I delegate a task and must carefully consider all the information I provide to ensure the task is executed correctly.
Four stages of delegation to an AI
How and how detailed I need to shape the briefing and evaluate the quality of the output depends on how much I delegate to the AI.
For that, we can look at a somewhat simplified model of AI delegation. You may be familiar with this kind of model from autonomous driving.
The first stage: I do it completely on my own.
Then, I don't need a briefing at all. Although I still need to go through these seven steps myself, I can do it implicitly, without explicitly visualizing it.
The next stage is using AI assistance.
The briefing should be detailed and explicit, but I can still correct everything, as I still need to go through everything. I still have full control of the eventual output.
In the third stage, we cross a threshold where it becomes really important to answer all questions of the checklist in detail.
This happens when I actually let the AI do the work and only briefly check it before publishing. In this case, I need to define the criteria precisely to ensure quality output.
At the final stage of delegation, an artificial intelligence automatically provides information and interacts with customers without any human interference.
At this stage, I need to define all the criteria very precisely and perhaps even verify the output multiple times using different safeguarding measures.
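These four stages can be sketched as a tiny model in code. The stage names and rigor descriptions below are my own illustrative labels for the stages described above, not an established framework:

```python
from enum import IntEnum

class DelegationStage(IntEnum):
    """Simplified stages of delegating a task to an AI (illustrative labels)."""
    DO_IT_MYSELF = 1     # no briefing; the checklist is applied implicitly
    AI_ASSISTED = 2      # detailed briefing, but I review and correct everything
    AI_DOES_IT = 3       # the AI produces; I only briefly check before publishing
    FULLY_AUTOMATED = 4  # the AI interacts with customers without human interference

def required_rigor(stage: DelegationStage) -> str:
    """The more I delegate, the more explicit the briefing and checks must be."""
    rigor = {
        DelegationStage.DO_IT_MYSELF: "implicit self-check",
        DelegationStage.AI_ASSISTED: "explicit briefing, full human review",
        DelegationStage.AI_DOES_IT: "precise criteria, brief final check",
        DelegationStage.FULLY_AUTOMATED: "precise criteria plus multiple safeguards",
    }
    return rigor[stage]
```

The threshold described above sits between stages 2 and 3: from stage 3 onward, the checklist questions must be answered explicitly and in detail before anything reaches a customer.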
Delegation of inspection: You become the manager of managers.
Maybe you've already thought about it:
Can't I actually have this check done by AI?
So I'll just let the AI generate this interface, and then I'll send it off again to an AI assistant and say, "Check if it's legally correct, check if it's ethically correct."
I can definitely delegate the inspection, but that doesn't change as much as I might hope, because now I have become a manager of managers. I have promoted myself one level up in the hierarchy. I still bear the responsibility, including for the final result, no matter how many levels there are in between.
I did it here as an example, where I asked the AI: "Does the interface you just made meet the legal requirements? If not, make an update so that it meets those requirements."
The AI produced this update, which already has a bit more information and a few checkboxes to tick, but still doesn't ensure GDPR compliance.
So, I remain responsible for checking if the checker checked and executed it correctly.
It's like nesting dolls: every time I open one, the next one comes out, and I have to apply this checklist again.
The checklist in practical application
This checklist consists of deliberately simple questions. Hopefully, it will be useful for everyday situations. It is not a huge framework, but simply seven questions.
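As a minimal sketch, the seven questions could even be written down in code, for example as the backbone of a review script. The three-valued answer scale ("yes", "risk", "no") is my own rendering of the gray area described earlier:

```python
# The seven questions of the "Is it good?" checklist.
CHECKLIST = [
    "Is it legal?",
    "Is it ethical?",
    "Is it correct?",
    "Is it harmless?",
    "Is it useful?",
    "Is it usable?",
    "Is it ambitious?",
]

# Most criteria are not a simple yes/no; the gray area in the middle
# says "I can take the risk!"
ANSWERS = ("yes", "risk", "no")

def needs_attention(answers: dict[str, str]) -> list[str]:
    """Return the questions not yet answered with a clear 'yes'."""
    return [q for q in CHECKLIST if answers.get(q, "no") != "yes"]
```

A result counts as "good enough" only when the remaining "risk" answers are acceptable in the given context, and no script can decide that for you.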
However, there may be a whole universe of interpretations behind these questions.
It is important to have a clear compass to answer the underlying questions and discuss and establish them as a team. I would recommend going through these questions as a team and possibly establishing certain answers, clearly stating these as guidelines, or at least ensuring that everyone gets the same understanding of what they should or could pay attention to.
PDF and Workshop as Additional Offerings
There is a cheat sheet available as a PDF.
You can print it out, keep it somewhere, or even memorize it. However, it is only a helpful guide. The more interesting part lies in the questions behind it, which you must define for yourself.
Therefore, I am developing a team workshop that guides you through all of these questions and topics in order to develop consistent guidelines for using AI. It will help you better assess the quality and ethics of your AI use.
If you want to get a printable PDF version of the cheat sheet or more information about the workshop, fill out the form below.
Additionally, you can find updates on this cheat sheet and framework and other information on quality and ethics in AI on my LinkedIn profile. Feel free to follow me there.
My mission:
When humans use AI, they should have clear guidelines to answer the question:
Is it good?
For more information, or an exchange about how to apply AI in your work context, I am available!
Write me or book a video call directly!
P.S. Follow me on LinkedIn for updates on working better with the help of AI.
growhuman exists to help people work together easier and better.
We believe that people are happier at work and in life when they're enabled to realize their full potential and work more effectively. These people are the foundation for the success of any business.
Technology can be extremely helpful for this. We just have to manage to use it in the right way.