Real vs. realistic: The subtle but important problems of generative AI systems.

by Bob Glaser, UX Architect, Strategist, Voice and AI Interaction Design

This image was generated by an earlier version of DALL-E. The prompt was: “a doll with a sad face at a carnival.” At that point (March 2023), faces and hands were particularly problematic. Now, while they are much better, the details still tend to have anomalies, even with extensive prompt engineering.

I used to do some work in 3D modeling and rendering. How realistic something needed to be depended heavily on the amount of detail in the model. It also depended on the effort put into complex detail and surfacing, as well as how much control and computing power was available for rendering (depending on how physically accurate the light needed to be). How the image would be viewed, and for how long, by the person or group that requested it mattered as well. It became very much an 80/20 exercise when it came to setting or meeting reasonable deadlines. Very often only the main object (usually a product) was the primary focus. There might be multiple objects in a hierarchy of precedence or grouping, and there were other times when details in the background and in the shadows were critical to the viewer.

When generative AI is used to create elements, it first needs to draw on millions of examples of ingested data that it perceives as matching the elements required by the prompt. It does this, roughly, through algorithms and other programmatic processes applied to the data it has been given. The more data provided, the more realistic the result. None of this is based on any context or understanding of the real world. The quality of the output is dependent on the quality and quantity (including the variety) of the input. This is why, for example, earlier versions of stable diffusion models often produced weird or distorted hands or faces, especially the eyes. They didn’t get better by understanding how a hand is constructed or functions, nor by understanding how eyes sit within a skull. What they did was learn to represent them more realistically by using more ingested data and improved algorithms.
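To make that point concrete, here is a minimal sketch of how an image like the doll above might be generated with an open stable diffusion model through the Hugging Face diffusers library. The checkpoint name, step count, and guidance scale are illustrative assumptions, not a prescription; the point is simply that a prompt goes in and statistically plausible pixels come out, with no model of hands, eyes, or dolls anywhere in the loop.

    # A minimal sketch: prompt in, statistically plausible pixels out.
    # Assumes the Hugging Face diffusers library and a GPU are available;
    # the checkpoint id, step count, and guidance scale are illustrative.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed example checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe(
        "a doll with a sad face at a carnival",
        num_inference_steps=30,   # fewer steps: faster, rougher details
        guidance_scale=7.5,       # how strongly to follow the prompt
    ).images[0]

    # The pipeline denoises random noise toward whatever its training data
    # made statistically likely for this prompt. Nothing here "knows" what
    # a hand or an eye is, which is why fine anatomical details can drift.
    image.save("sad_doll_at_carnival.png")

More data and better training make the drift smaller and rarer, but nothing in this loop adds understanding of the things being drawn.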

Until AI is based on factual data, and not merely on the most data, it will never truly answer from an understanding of the question rather than by producing the most acceptable-sounding result. The best LLMs, stable diffusion models, and other similarly trained systems are most effective because they are built on the largest data sets available. There are several problems with this that must always be remembered.

  1. The natural language approach means that the answers or explanations given are not based on all available facts but rather on all available data, with the internet usually being the primary source. This produces responses optimized to sound as realistic as possible. It also presumes that most of the information is accurate. It may well be, but if the system has, for example, a 96% accuracy rating and you don’t already know the answer, you are unlikely to be able to discern whether a given answer falls into the incorrect 4%, or whether that 4% error sits in a critical part of the answer.

    There are fixes (filters, algorithms, and other processes), often specific to individual models, because the filtering systems are built around each model’s particular failures. The weakness of this solution is that those built-in corrections or governors address problems only after they have been flagged. They adjust the results to be more correct but may not correct how the solution is determined. This means that the only real solution is to train the model only on validated, factual data. Most of these approaches correct many problems but, unfortunately, instead of correcting the problem, they can simply make it harder to recognize. Making these errors smaller doesn’t make them less dangerous; it makes them harder to find. Many will remember being given a test in school where merely stating the answer was not sufficient – you had to show your work. Most generative AI systems I know of aren’t truly able to show their work, because of the manner in which they execute a prompt.

    Most system issues are exposed when someone prompts the system about something the requestor already knows. The problem is exacerbated by the fact that most users aren’t going to ask questions they already know the answer to. Why would they waste their time on something both futile and unproductive?

    People who write prompts know that getting something close to what they want requires context. However, the context we hold in our minds has far more detail, and often important detail, than what we state, because we presume it when we speak to another person. The details that are provided get prioritized by the system, and although we might think the priority is obvious, it isn’t, compared with how the AI model is going to interpret it.

    Even when you talk with someone face to face, while each of you has your own perspective, there are myriad conditions both people can assume. These include environmental aspects like light and temperature, what other objects and people are around them, and their relative positions, situations, and expectations. There may also be context such as the emotional effect of those conditions. For example, it could be two people alone in a doctor’s waiting room, alone in a forest, in a crowd on the street, or next to each other at a concert. There is generally no need in conversation to point out all of these elements; they are assumed, since both people are in roughly the same environment, and the context is inferred.
  2. The systems were designed using high-end, powerful technology that allows them to ingest huge amounts of data with which to build the model. It is this speed that makes current AI models like LLMs so easily available to a wide range of people. It is also the mass of unfiltered data that makes their output seem far more amazing than it is. This, unfortunately, allows the systems to be mindless con artists.

    The way these systems approach and correct problems has already been well reported on. While the designers of the systems can tell you how a particular request is processed, they have also pointed out something fairly alarming: they can accurately describe how a problem is handled in general, but they cannot tell you exactly what went into the solution of any specific problem. In certain respects these are “black boxes,” where the input goes in one end and a result comes out the other.
  3. While there is much discussion about the ethical use of generative AI systems, the part that seems frighteningly absent is addressing it from the standpoint of ethical design and development. There are many positions now in business that focus specifically on the ethical use of AI. This is not surprising, since its general release has found a population who see it as a way to exploit people. Any service that is so ubiquitous would end up being weaponized even if the creators never envisioned it. In 1942, Isaac Asimov first published the Three Laws of Robotics.
  • A robot may not injure a human being or, through inaction, allow a human being to come to harm.
  • A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.
  • A robot must protect its own existence as long as such protection does not conflict with the First or Second Law. (He later added a fourth, the Zeroth Law.)
  • A robot may not harm humanity, or, by inaction, allow humanity to come to harm.

These laws are very well known and, while dated, they still lay the groundwork for designing AI systems to be safe rather than simply assuming they will be used safely and never with malice or criminal intent. We also have to look at AI more broadly than as simply a utilitarian tool. We must also look at AI working toward AGI, strong AI, or human-level AI. We have done far more, and been far more successful, at imitating human output than at replicating the human thought process. Even the label of “real” vs. realistic (or fake) changes with the context in which it is applied.

Consider how social media and the internet were intended to provide information and connection (relatively) freely to all. I don’t think any of the designers had this in consideration when they were being built. This is not to imply that they are bad ideas, but rather that designing anything for only altruistic reasons, without the thought that it could also become a manipulative, corrupting power, is naive at best. Simply dismissing any potential problem with “No one is going to do that!” is thoughtless at minimum and dangerous at worst.

I know that there are AI designers and developers out there who are considering this; it just seems to me that they are far fewer than the number we need. I have for some time been trying to work out a flexible methodology to build both ethics and safety into a system. It’s easy to come up with theories, but quite a challenge to design a pragmatic method that can be built and incorporated into systems in a methodical, testable way. Each new aspect of the solution brings a new challenge to address.


About rrglaser

Sr. UX Architect/Director, with avocations in music, science & technology, fine arts & culture. Finding ways of connecting disparate ideas, facts, and concepts to solve problems. In the last 30 years, I have worked at (among others) various ad agencies, Xerox, Pitney Bowes, Shortel, Philips (medical imaging R&D), CloudCar, IDbyDNA, and Cisco. I prefer to stand at the vertex of art, technology, culture, and design, since that is where the best view of the future exists. "Always learning, since I can't apply what I haven't yet learned."