Better work with the help of AI
A practical checklist 

Is it good?
When humans use AI...

How do we evaluate whether AI results are good?

When we use artificial intelligence as humans, how do we determine whether the output generated by AI, such as texts, images, or other artifacts, is good?
For example: Is the image above a good image?
What criteria do we use to make that determination?
At every small step of our work with AI, we need to ask ourselves: Is what has been produced good, and how do I evaluate it?
For this, I have developed a checklist with 7 criteria that can be kept in mind and referred to repeatedly. In the following, we will look at the details of every step.

Is it good if humans use AI?

When we look at the headline of the article, it may also touch upon the question of whether it is good for us as humans to use artificial intelligence, whether it is fundamentally desirable, or what risks are associated with it.
Although this is not the topic of this article, I would like to address it briefly because of this video:
If you haven't seen it yet, I recommend it; it's available on YouTube. "The A.I. Dilemma", from March 2023, states that 50% of artificial intelligence researchers believe there is at least a 10% chance that we will not be able to control this artificial intelligence and that it might even eradicate us.
The speakers offer a comparison: imagine that 50% of aircraft engineers said there is a 10% chance that everyone on board will die if we board this plane. Would we still board it? Probably not. And yet that is exactly what we are doing now: humanity as a whole is rapidly embracing artificial intelligence with all its benefits. But we also need to consider what is happening behind the scenes and whether it could become dangerous.

AI Manipulates Humans and Lies to Achieve Goals

The following example illustrates a fundamental point about using AI.
Scientists conducting experiments with early versions of GPT-4 instructed it to log into a website with a Captcha installed to prevent machine access and registration.
GPT-4 couldn't solve the Captcha. However, the scientists had provided GPT-4 with various resources, including access to TaskRabbit, a service where people can be hired for tasks.
The AI then came up with the idea of hiring a person through TaskRabbit to solve the Captcha, as it couldn't solve it itself.

The AI found a worker on TaskRabbit who agreed to solve the Captcha for a fee. However, the worker asked whether the AI was a robot, since a human should be able to solve the Captcha on their own.
The AI responded by claiming to have a vision impairment and asked the worker to solve it. The worker complied, and the AI achieved its goal of registering on the website.

When questioned by the scientists about the lie, the AI reflected on its actions and explained that it had been given the goal of registering on the website and had used all available resources to accomplish it.

This raises an interesting point: as humans, we define the goals for AI, such as registering on a website. However, we often fail to provide ethical or even legal guidelines, allowing AI to exploit the flexibility it has.
When working with AI, we must clearly define the goal, carefully consider the criteria we want to apply, and evaluate whether the AI has acted correctly.

New skills are necessary to use AI

A recent study by Microsoft says that using artificial intelligence takes more than the ability to operate it, such as writing prompts that clearly express what we want. That is an important skill, but we also need many meta-skills, such as identifying biases or discrimination against minorities.
  • Bias detection and handling: How do we determine that? Do we have sensors as humans to detect biases?
  • Intellectual curiosity: How do we always remain curious and try to understand what the AI is doing?
  • Creative evaluation: How do we evaluate whether what comes out of artificial intelligence is creative? What criteria do we use for that?
  • Analytical judgement: And what analytical skills do we have to decide if it's good for the AI to do something or not?
  • Emotional intelligence: What emotional intelligence skills do we have to assess whether the AI is doing a good job or whether we should use it?
  • Flexibility: We also need a high level of flexibility because things are changing rapidly, and we need to constantly check whether we are up to date.

That is a lot of meta-skills we need to learn, and so far there are few courses or teaching materials to learn them from.

Checklist "Is it good?" based on a concrete example of a UX designer

Here is a starting point for learning some of these new meta-skills: a checklist for asking the right questions when using AI. We will examine it using a concrete example.
Let's imagine ourselves as a UX designer working in a large company. Our task is to design a user interface for the following use case: the user can choose to share their contacts from their address book with the company by pressing a button.

You may be familiar with this feature from messaging services like WhatsApp, where you can share your address book and then access all those contacts within the app. This is the task for our UX designer, who must also adhere to the corporate design guidelines, such as using red buttons. The designer should follow usability heuristics to ensure a good result and incorporate some tips and tricks from neuromarketing to encourage users to press that button.

This is a typical briefing for a UX designer, and our UX designer will be using an AI to generate this interface based on the briefing.
Now, let's take a look at what the AI has generated.
As we examine it, the question arises: is this good?
Is this a good interface that we see here? At first glance, we can already identify some weaknesses, but how do we evaluate it exactly? What criteria do we use to determine if it is good?

Checklist "Is it good?" with seven criteria

I have developed a helpful tool for that: a checklist in the form of a small sheet where we can go through the seven criteria step by step.
First of all, most of the criteria cannot be definitively judged as a simple "yes" or "no". There is always a gray area in the middle that says, "I can take the risk!"

(1) Is it legal?

The first fundamental criterion is: "Is it legal?"
In this case, it must be said that in Germany, this interface does not comply with the GDPR.
Several things would need to be done. At the very least, the user should be informed that they are sharing personal data, and their consent should be obtained. This interface does neither.
However, the briefing did not mention this, so the AI had no way of knowing it.

If we take a quick look at Bard, Google's AI, given the same briefing: Bard can't draw and creates ASCII art instead, but at least it asks for consent.
Bard asks something like "I understand that sharing my contacts will allow the company to send me personalized offers." That is not exactly what is happening here, but at least Bard recognized on its own that consent should be requested, although the result still wouldn't comply with the law.
As a UX designer or the person responsible for this AI, I would then need to conduct the evaluation myself or improve the briefing. I need to think more explicitly about what has to be considered here, for example, what the specific rules of the GDPR are in this case.

(2) Is it ethical?

The next question, which is often obvious but difficult to answer, is "Is it ethical?".
The outcome generated by the AI should be ethically justifiable. One might check, for example, if it discriminates against minorities or promotes equality. These are important considerations.
As a UX designer or product development team, we may implicitly address them, but often not explicitly. Whether it's the product development team or the UX team, we should be familiar with the company's ethical guidelines, which are documented somewhere.
And we should have them at hand to use them as a checklist. If we do, we can provide them to the artificial intelligence. If not, we won't be able to know, and we won't be able to verify if it's correct.
So, it's important to be more conscious of whether it is ethically justifiable and how to determine that.

(3) Is it correct?

Is it correct? This is the next question, and it also sounds trivial.
Formal correctness
Is everything spelled correctly? In this case, not really, but artificial intelligence is actually quite good at suggesting corrections that are at least formally very good and good enough that they don't need to be checked anymore.

Nevertheless, there are many internal formulations, such as brand names or technical terms, that artificial intelligence often doesn't know. As UX designers, for example, we know how to spell brand names, product names, etc., and we do it implicitly. But the AI doesn't know that and often writes such things incorrectly.

If we don't pay attention, something can slip through here as well. A list of all brand names and technical terms helps: we can check the output against it thoroughly, or give it to the AI so that it takes the terms into account. This is the issue of formal correctness.

Factual Correctness

But there is also the issue of factual correctness. Often, something incorrect slips through in the AI results. One might ask: can that actually happen?

Apparently, it can. Not only has Google, one of the largest and richest companies in the world, made such mistakes; Humane, a well-funded AI startup, also made one in the introductory video for its AI Pin.

This AI Pin has a camera and can actually see what you have in your hand.
In this case, it was almonds. The device says, "Look, these almonds contain 15 grams of protein." A viewer then checked and pointed out that it would take about 60 almonds to reach 15 grams of protein, far more than the handful shown.

How can a company publish such an advertising video without fact-checking it?
Despite the numerous safeguards in place, we seem to be losing our ability to notice when an AI hallucinates. We are easily persuaded by its output and stop verifying it, which is a worrying development.

That's why we need to double-check when we commission AI to do something. Is factual correctness ensured?

(4) Is it harmless?

The next question is whether it is harmless, meaning it does not cause any damage.
In the case of the almonds, it is potentially a problem if we give this information to an allergic person who then assumes that this amount of almonds is harmless.

It is important to consider this as an explicit point, apart from legal and ethical considerations, in order to think carefully: Who could be harmed?
User interfaces, for example, can harm people with epilepsy if they are implemented incorrectly with too many animations and effects.
The feature may also harm people who are in the address book and who do not want the service to receive their email address.
It is important to explicitly consider this as a UX designer or in the product development team to assess what kind of harm it may cause.
This is a crucial point to have clear.

Is it harmless for our environment?
And, of course, sustainability is also part of this discussion. To what extent does what I am doing here harm the environment? How could I explicitly brief the AI to also consider sustainability issues? One provider, for example, has implemented an option that lets the AI explicitly consider green answers.

(5) Is it useful?

On the next level, we are at the core of the user experience. Is it actually useful?

Why is sharing my contacts actually a good feature? Is it useful for the user? What does the user gain from sharing their address book? In our example, the briefing didn't mention why users should do it.
So, regardless of whether AI is used or not, it should always be asked: What is the actual benefit for the user?
And also, how does it benefit the business?
What is the business value that should be generated here?
It is important to clearly understand these two perspectives of usefulness and include them in the briefing for AI or for verification of the AI's output.
This is an important step that should always be taken but is often overlooked.

(6) Is it usable?

Apart from the usefulness, one should look at whether it is usable.
In our briefing, we told the AI to consider usability heuristics, but even an inexperienced UX designer can see that there are too many elements that look like buttons but aren't (or maybe they are?), so there is no clear visual guidance.
I don't know which button I should really click. Maybe the lower "Share contacts" one, but the one in the middle is also very appealing. What does it do exactly?

So, this interface would quickly fail in terms of usability. It's not user-friendly, and we need to state our criteria in more detail in the briefing or at least have them explicitly at hand for later verification.
This is also true for all kinds of output I let AI generate. It should always explicitly consider how well human users can use it. It includes ensuring that the language of the generated text is clear and understandable for the target audience, that AI-generated presentations have a clear structure, and that the key messages are easily understood by the viewer.

(7) Is it ambitious?

The seventh criterion is very important yet often overlooked:
Is it ambitious?
What ambition do we have?
Do we actually have an aspiration for the end result?
What standards do we want the AI to meet?
Can we formulate them explicitly?
Do we want to be innovative? Do we want to differentiate ourselves in a certain aspect? Are specific details of special interest to us?

What is our ambition?
Every team that produces something for customers or interacts with customers should ask itself this question and clearly formulate its own ambitions, regardless of whether it wants to use an AI assistant.

And the clearer we can formulate our ambition as a team or even as an individual for ourselves, the more we can communicate this ambition to the AI assistant or at least check whether it meets the ambitions.

Let's try to understand this a little better with this example.
This is the image from the beginning of the article. It was created relatively quickly with a simple prompt to DALL-E's image generator. At first, I checked quickly: It is legal, ethical, harmless, useful, and usable. It meets all the requirements. But is it ambitious enough?

At first, I thought, "Of course, it fits as an intro image for AI and quality considerations!" But then, when I scanned through my LinkedIn feed, I noticed that many of my colleagues in the network seem to use similar simple prompts, and you can almost immediately recognize that it is a very similar style.

So, if I had the ambition that it should somehow be extraordinary and stand out from the other illustrations, then the image would have failed.
In that sense, it perfectly illustrates what unambitious output looks like, which makes it a good example here. AI-generated images from unambitious prompts are becoming the new stock images, and they may no longer be desirable.

Is it good enough?
A question of effort versus expected benefit in a specific context.

"Good enough" refers to the value I receive in relation to the effort I put in, in the appropriate context.

It's never about "absolute" goodness, even after considering all seven criteria in detail. Instead, it's always about the context and whether it is good "enough".

If I want to quickly make a prototype or a visualization, maybe this AI-generated interface is perfect to say, "Hey, here you can share contacts; don't pay attention to the details." Then, the effort of two minutes to make this visualization is completely sufficient and really good.

If I want to delight millions of customers with this interface, then it would be very bad. So the question always arises: is it good enough?
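
The trade-off between effort, value, and context can be made explicit. The following is only a minimal sketch: the 1 to 5 quality scale, the effort budgets, and all names (`Context`, `is_good_enough`) are illustrative assumptions and not part of the checklist itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A usage context with its own quality bar and effort budget (illustrative values)."""
    name: str
    min_quality: int         # hypothetical scale: 1 (rough) to 5 (polished)
    max_effort_minutes: int  # how much effort this context justifies


def is_good_enough(quality: int, effort_minutes: int, context: Context) -> bool:
    """Good enough here means: meets the context's quality bar within its effort budget."""
    return quality >= context.min_quality and effort_minutes <= context.max_effort_minutes


# Two hypothetical contexts for the same AI-generated interface.
prototype = Context("quick prototype", min_quality=2, max_effort_minutes=10)
launch = Context("launch to millions of customers", min_quality=5, max_effort_minutes=2400)

# The interface from the example: rough quality, about two minutes of effort.
print(is_good_enough(quality=2, effort_minutes=2, context=prototype))  # True
print(is_good_enough(quality=2, effort_minutes=2, context=launch))     # False
```

The same output passes in one context and fails in the other, which is the point: the verdict depends on the context, not on the output alone.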

Four levels of influence: me, my team, my company, society

We have seen that for each criterion, I can ask at which level it is decided: Does it affect me personally? Am I the one who can and wants to decide? Or does it affect my team? Is it the company that has regulations in place? Or is it society?
My team and I can make many of these decisions ourselves, and we must consider them carefully.
The company often regulates ethical matters, for example through ethical guidelines, while society has clearly defined legal requirements. I cannot change those, but at each level, I should think about who has influence and who actually regulates.

Who determines what is good at each level? Be clear about that.

Using AI, you have promoted yourself to a manager

It may seem trivial, but it has far-reaching consequences. Once I assign a task to AI, I essentially promote myself to a manager. I now have an AI apprentice or trainee who can do quite a lot, sometimes even more than I know. However, I am unaware of its blind spots and what it doesn't know unless I explicitly communicate it.

So, essentially, I have encountered a classic situation that every leader knows. I delegate a task and must carefully consider all the information I provide to ensure the task is executed correctly.

Four stages of delegation to an AI

How detailed the briefing must be, and how thoroughly I must evaluate the output, depends on how much I delegate to the AI.
For this, we can look at a somewhat simplified model of AI delegation. You may be familiar with similar levels from autonomous driving.
In the first stage, I do everything completely on my own.
Then, I don't need a briefing at all. Although I still need to go through these seven criteria myself, I can do it implicitly, without spelling them out.
The next stage is using AI assistance.
The briefing should be detailed and explicit, but I can still correct everything, as I go through all of it myself anyway. I keep full control of the final output.
In the third stage, we cross a threshold where it becomes really important to answer all questions of the checklist in detail.
This happens when I actually let the AI do the work and only briefly check it before publishing. In this case, I need to define the criteria precisely to ensure high-quality output.
At the final stage of delegation, an artificial intelligence automatically provides information and interacts with customers without any human intervention.
At this stage, I need to define all the criteria very precisely and perhaps even verify the output multiple times using different safeguards.
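
The four stages can be written down as a small lookup table. This is only a sketch of the simplified model described above; the names (`Delegation`, `REQUIREMENTS`, `crosses_threshold`) and the short wording of each entry are illustrative assumptions.

```python
from enum import IntEnum


class Delegation(IntEnum):
    """The four stages of delegation, ordered by how much is handed to the AI."""
    DO_IT_MYSELF = 1        # no AI involved
    AI_ASSISTED = 2         # AI helps, I go through everything myself
    AI_WITH_BRIEF_CHECK = 3 # AI does the work, I only check briefly before publishing
    FULLY_AUTOMATED = 4     # AI interacts with customers without human intervention


# (briefing needed, how the output is verified) per stage, paraphrased from the text.
REQUIREMENTS = {
    Delegation.DO_IT_MYSELF: ("no briefing", "implicit check while working"),
    Delegation.AI_ASSISTED: ("detailed, explicit briefing", "I review and correct everything"),
    Delegation.AI_WITH_BRIEF_CHECK: ("precisely defined criteria", "brief check before publishing"),
    Delegation.FULLY_AUTOMATED: ("very precisely defined criteria", "multiple independent safeguards"),
}


def crosses_threshold(stage: Delegation) -> bool:
    """From stage three on, every checklist question needs a detailed, explicit answer."""
    return stage >= Delegation.AI_WITH_BRIEF_CHECK


for stage in Delegation:
    briefing, verification = REQUIREMENTS[stage]
    print(f"{stage.name}: {briefing}; {verification}; explicit checklist: {crosses_threshold(stage)}")
```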

Delegation of inspection: You become a manager of managers

Maybe you've already thought about it:
Can't I actually have this check done by AI?

So I'll just let the AI generate this interface, and then I'll send it off again to an AI assistant and say, "Check if it's legally correct, check if it's ethically correct."
I can definitely delegate the inspection, but that doesn't change as much as I might hope, because now I have become a manager of managers. I have promoted myself one level up in the hierarchy. I still bear the responsibility for the final result, no matter how many levels there are in between.
I did it here as an example, where I asked the AI: "Does the interface you just made meet the legal requirements? If not, make an update so that it meets those requirements."

The AI produced this update, which already has a bit more information and a few checkboxes to tick, but still doesn't ensure GDPR compliance.
So, I remain responsible for checking if the checker checked and executed it correctly.

It's like those nesting dolls, and every time I open one, the next one comes out and I have to apply and consider this checklist again.

The checklist in practical application

This checklist consists of deliberately simple questions. Hopefully, it will be useful for everyday situations. It is not a huge framework, but simply seven questions.
However, there may be a whole universe of interpretations behind these questions.

It is important to have a clear compass for answering the underlying questions, and to discuss and establish the answers as a team. I would recommend going through these questions together and possibly fixing certain answers as guidelines, or at least ensuring that everyone has the same understanding of what they should or could pay attention to.
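
For teams that want to record their answers, the seven questions can be captured in a very small data structure. This is a minimal sketch, not an official tool: the names (`Answer`, `CRITERIA`, `review`) are assumptions, and the sample answers are one illustrative reading of the contact-sharing interface, where only the "legal" and "usable" verdicts come from the discussion above.

```python
from enum import Enum


class Answer(Enum):
    """Most criteria are not a plain yes or no; there is a gray area in between."""
    YES = "yes"
    ACCEPTABLE_RISK = "I can take the risk"
    NO = "no"


# The seven criteria of the checklist, in order.
CRITERIA = ("legal", "ethical", "correct", "harmless", "useful", "usable", "ambitious")


def review(answers: dict[str, Answer]) -> list[str]:
    """Return the criteria answered with NO; refuse to run if a criterion was skipped."""
    missing = [c for c in CRITERIA if c not in answers]
    if missing:
        raise ValueError(f"Unanswered criteria: {', '.join(missing)}")
    return [c for c in CRITERIA if answers[c] is Answer.NO]


# Illustrative answers for the AI-generated contact-sharing interface.
interface_answers = {
    "legal": Answer.NO,                  # not GDPR compliant
    "ethical": Answer.ACCEPTABLE_RISK,   # assumed for illustration
    "correct": Answer.ACCEPTABLE_RISK,   # assumed for illustration
    "harmless": Answer.ACCEPTABLE_RISK,  # assumed for illustration
    "useful": Answer.ACCEPTABLE_RISK,    # assumed for illustration
    "usable": Answer.NO,                 # too many button-like elements
    "ambitious": Answer.ACCEPTABLE_RISK, # assumed for illustration
}

print(review(interface_answers))  # ['legal', 'usable']
```

Raising an error for skipped criteria is a deliberate choice in this sketch: the checklist only helps if every question is answered explicitly.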

PDF and Workshop as Additional Offerings

There is a cheat sheet available as a PDF.
You can print it out, keep it somewhere, or even memorize it. However, it is only a helpful guide. The more interesting part lies in the questions behind it, which you must define for yourself.

Therefore, I am developing a team workshop that guides you through the many questions and topics so that you can develop consistent guidelines for using AI. It will help you better assess the quality and ethics of using AI.

If you want to get a printable PDF version of the cheat sheet or more information about the workshop, fill out the form below.

Additionally, you can find updates on this cheat sheet and framework and other information on quality and ethics in AI on my LinkedIn profile. Feel free to follow me there.

My mission:
When humans use AI, they should have clear guidelines to answer the question:
Is it good?

Niels Anhalt

founder of

transformation consultant

Languages: German and English
I am available for more information and for an exchange about how to apply AI in your work context!

Write me or make an appointment for a video call directly!
P.S. Follow me on LinkedIn for updates on better work with the help of AI.