Cloud Out Loud Podcast

AI Still Needs Humans in the Loop

Jon and Logan Gallagher Season 1 Episode 34


When identical prompts yield different outputs and a request for "a cat running across the screen" generates four cylinders and a bobbing balloon, the non-deterministic nature of generative AI becomes impossible to ignore. Despite the power of these sophisticated models, we discovered that creating consistent, high-quality results demanded substantial human intervention, not less. Our journey moved from refining prompts to building extensive code example libraries, requiring the very software engineering expertise these systems supposedly replace.

This experience directly contradicts the popular "vibe programming" approach, in which developers mindlessly shuttle between AI suggestions and error messages. While this might suffice for weekend projects (as its originator noted), it produces unmaintainable spaghetti code lacking the architectural vision essential for professional software. Real engineering demands creating systems that scale, that can be maintained, and that can be handed off to other developers, not just code that temporarily functions.

The lessons extend beyond development to any organization implementing AI in decision-making processes. These systems lack the contextual understanding to independently determine business priorities or handle nuanced human factors. The future belongs not to AI replacing humans, but to thoughtful partnerships harnessing each party's unique strengths.

Ready to dive deeper into AI's practical realities? Listen now and join the conversation about how we can responsibly integrate these powerful tools while maintaining human oversight and expertise. Share your experiences with us at cloudoutloud@ndhsw.com or @CloudOutLoudPod.

Jon Gallagher on LinkedIn

Logan Gallagher on LinkedIn

The Animation App on Github

Andrej Karpathy on vibe programming

*Extra!* Andrej Karpathy on "Privacy Hygiene" - protecting yourself online now

Announcer:

Welcome to the Cloud Out Loud podcast with your hosts Jon Gallagher and Logan Gallagher. Join these two skeptical enthusiasts (or are they enthusiastic skeptics?) as they talk to each other about the cloud, out loud. These two gents are determined to stay focused on being lazy and cheap as they evaluate what's going on in the cloud, how it affects their projects and company cultures, and sometimes how it affects the world outside of computing infrastructure. Please remember that the opinions expressed here are solely those of the participants, and not those of any cloud provider, software vendor, or any other entity. As with everything in the software industry, your mileage may vary.

Jon Gallagher:

Welcome back, everybody. So this podcast is going to be different. In the last podcast we talked about a really interesting application of AI that Logan has created and made available; take a look at the previous episode if you want to know more about that. Now we're going to talk about lessons learned, and we'll still point to the GitHub archive, so there's no problem there, you'll still have access to it. We're also going to talk about what this means for the application of AI in the workplace, and we're going to harp on our continuing theme about AI, which is: yeah, we're going to harness AI to do the scut work, to release human potential, not replace human potential. So with that, I'm Jon Gallagher

Logan Gallagher:

and I'm Logan Gallagher,

Jon Gallagher:

and welcome aboard. So, Logan, since our last episode, what have you been up to on the animation app?

Logan Gallagher:

Yes, I've made some additional improvements and changes to the app, and I've been demoing it in the various classes I've been teaching. The changes address the fact that when you provide a prompt in the app, like, let's say, you want tumbling dice, or planets rotating around a sun in a solar system, that prompt is being sent to the AI model along with a bunch of additional system instructions that I've embedded in the application code, instructions that provide code examples to the model along with the user prompt. And the thing with AI, and this is a built-in fact about generative AI models, is that they provide unpredictable responses back. They're non-deterministic. You can pass in the exact same prompt two times and get two different responses back, and that's awesome because it can potentially provide creative responses. But that leads to variability. You don't quite know what you're going to get returned back to you by the generative AI model.
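To make that concrete, here is a toy sketch of why sampling makes outputs non-deterministic. This is purely illustrative, not code from the animation app: a model ends each step with a probability distribution over possible next tokens, and decoding typically samples from that distribution rather than always taking the single most likely choice.

```python
# Toy illustration of generative non-determinism (not the animation app's code).
# A model produces a probability distribution over next tokens; decoding
# *samples* from it, so the same "prompt" can complete differently each run.
import random

# Hypothetical next-token probabilities after the prompt "The cat ..."
next_token_probs = {"runs": 0.45, "jumps": 0.30, "sleeps": 0.20, "flies": 0.05}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample one token in proportion to its probability."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Identical input, potentially different output on every attempt.
for attempt in range(3):
    print(f"attempt {attempt + 1}: The cat {sample_next_token(next_token_probs)}")
```

Run it a few times and the same prompt completes differently. Turning the sampling temperature down toward zero makes outputs more repeatable, but most chat deployments keep it above zero precisely for the creativity Logan mentions.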

Jon Gallagher:

Let me jump in here for a second. When we talk about the exact same prompt being submitted, we get different responses, maybe not different in magnitude, but definitely different. One of the things to note is that it's not like the model itself is learning in between the first time you asked and the second time you asked. It's because the internal architecture can't guarantee that question A will result in answer A every time.

Logan Gallagher:

Exactly. These are such large and complex models. They have incredibly complex inner workings, and the interactions happening inside them are so, so complex that...

Jon Gallagher:

I wish right now you could see Logan trying to put into words the interactions that occur inside of these models.

Logan Gallagher:

The neural network.

Logan Gallagher:

Yes, and so that just means that sometimes some of the weights and neurons firing in this model are going to return different things, even if you're sending the same exact prompt, and that can lead to interesting, creative responses.

Logan Gallagher:

But if we want to provide consistently high-quality animations, in the case of my application, that does mean I want to include as many guardrails and examples as possible for the model. So here's one of the additions I've made to the application recently. I previously went down the rabbit hole of trying to craft the perfect prompt, trying to figure out how to write the prompt with the best wording possible to get the best results, and I quickly realized that while a well-written prompt can improve your results, it is not going to guarantee you the best possible results every time. So I have changed course, and now what I'm doing is I have created essentially a catalog of code examples that I provide to the generative model along with the user prompt.

Logan Gallagher:

So if the user wants the animation application to generate an animal, I'm providing the model with a group of code examples similar to the output that the model might return for creating a cat walking or a dog running. And there are capabilities built into the framework I'm using in my application, LangChain, where it can look at the user prompt that's being passed into the application and then go and grab the relevant examples from my example catalog to pass to the generative AI model along with the user prompt. Which is all great, except that it has required me to create that catalog of examples. It's required me to write Python code, or to use an AI to help me generate some Python code, but code that has been heavily reviewed and audited by me, as someone who has expertise in writing in this programming language. So even though we're hoping to leverage this generative AI model to create the code for these animations, it sure is requiring a human to write a lot of code.
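Logan's real implementation is in the GitHub repo linked above. As a rough sketch of the pattern he's describing, LangChain can pick the few cataloged examples most similar to the user's prompt and splice them into the final prompt. The example data, the embedding model, and the exact import paths (which vary across LangChain versions) are assumptions here, not the app's actual code.

```python
# Minimal sketch of similarity-based example selection (assumed setup,
# not the animation app's real catalog). Import paths vary by version.
from langchain_chroma import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate
from langchain_openai import OpenAIEmbeddings

# Hand-reviewed catalog entries (placeholder code bodies for illustration).
examples = [
    {"request": "a cat walking", "code": "# ...reviewed animation code..."},
    {"request": "a dog running", "code": "# ...reviewed animation code..."},
    {"request": "planets orbiting a sun", "code": "# ...reviewed animation code..."},
]

# Index the catalog so the k most similar examples can be retrieved per prompt.
example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,
    OpenAIEmbeddings(),  # embeds each example for similarity search
    Chroma,              # vector store class used to index them
    k=2,
)

# Build the final prompt: relevant examples first, then the user's request.
prompt = FewShotPromptTemplate(
    example_selector=example_selector,
    example_prompt=PromptTemplate.from_template("Request: {request}\nCode:\n{code}"),
    prefix="You generate animation code. Follow these examples:",
    suffix="Request: {user_prompt}\nCode:",
    input_variables=["user_prompt"],
)

print(prompt.format(user_prompt="a cat running across the screen"))
```

The design point Logan makes still stands: the selector only retrieves what a human has already written and vetted, so the quality of the catalog is the ceiling on the quality of the output.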

Jon Gallagher:

It sure is requiring a human to write a lot of code, absolutely, to write a lot of code and to curate the answer that's coming out. Because there are two things that come out of your experience so far. One is that, in many ways, prompt engineering is kind of like putting a blindfold on and guiding someone else through a maze. If you remember how the maze works, you're still telling someone: take two steps and turn right, then turn left and tell me what you see. It's an iterative process, first of all, and second of all, it still is non-deterministic. As we said earlier, I can give you exactly the same wording and you will not give me back the same sort of response. So that's the first thing, the iteration, and that is a lot of time.

Jon Gallagher:

But since we've decided not to do prompt engineering, we're going to go into constraining the model with examples, and we're having to create the examples it bases its output on. In one of the first iterations of this, I think I asked it to show me a cat running across the screen, and what I got was four cylinders and a bobbing balloon that were roughly associated, that went somewhere between left, right, up, down. It was nowhere near what was necessary. I could have written a better prompt, but what we've done here is provide examples that will get the machine halfway, three-quarters of the way there. Both of these require a whole lot, not just of human effort, but of expertise. And that is one of the things that gets covered up all the time: some incredibly powerful, valuable human time is required to make these things work.

Logan Gallagher:

Absolutely. In my case, it's required me to create and curate all these examples that, hopefully, will lead to the generative AI model being able to extrapolate beyond those examples and create new and interesting animations. It has also required me to set up my application code in such a way that it can retrieve those examples and pass them in with the user prompt. So that has required software design expertise, to create the application and to write the example code.

Jon Gallagher:

So here are these marvelous models, this Gen AI stuff, which we're told is going to eliminate human labor, that's going to put people out of work, and what we're telling you is no, no, the top end of your labor force is going to be necessary in order to make this stuff continue to work. So let's talk about an example from the opposite side, which is a trend that's going on right now: people talking about vibe programming. Logan, tell us what vibe programming is.

Logan Gallagher:

Yeah, there's been a term that seems to have entered the discourse online, certainly on X, formerly Twitter, and elsewhere. It was a term coined by the former lead research scientist at OpenAI named Andrej Karpathy. He had put out a tweet back in February saying that these days he is using the generative AI models to do code auto-suggestion when he's writing code in his developer environment, and he says that it's gotten so good that he has the model write the code for him, and then he'll try to run his application. When his application runs and it spits out an error message, he'll pass that error message back to the model. The model will suggest a fix, and then he just continues in that cycle: have the AI model write some code, run the code, run into an error, pass the error to the model, rinse and repeat. And he was remarking on it pretty approvingly, saying this is my new workflow. I think he's called it vibe coding, where the software developer is a pretty passive subject in this workflow, seemingly mostly responsible for copying and pasting error messages back and forth to the model. And that workflow is all well and good.
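Stripped to its essentials, the loop Logan is describing looks something like the sketch below. `ask_model` here is a hypothetical stand-in for whatever chat-completion API you'd wire up, not a real library call; the point is what's missing from the loop: no design step, no review step, just a traceback shuttle.

```python
# A caricature of the "vibe coding" workflow: generate, run, paste the
# error back, repeat. Illustrative sketch only.
import subprocess

def ask_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to some chat-completion API."""
    raise NotImplementedError  # wire up your model of choice here

code = ask_model("Write a Python script that does X.")
while True:
    # Run whatever the model produced.
    result = subprocess.run(["python", "-c", code], capture_output=True, text=True)
    if result.returncode == 0:
        break  # it runs, so ship it? (this is the part Jon objects to)
    # No architectural judgment here: just shuttle the traceback back.
    code = ask_model(f"This code:\n{code}\nfailed with:\n{result.stderr}\nFix it.")
```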

Logan Gallagher:

I can't say that I've never passed an error message to a model; I certainly have. But that workflow is all well and good until you get this huge, sprawling, spaghetti-tangled mess of code. A generative AI model that's basing its suggestions on its training data set of code examples, and probably a training data set of every post ever made on Stack Overflow, can suggest a fix for that individual issue. But it may not be taking into account factors like: how should we structure this code base to make sure it's maintainable in the long term? How should we modularize these classes, methods, and functions to ensure that when this code gets passed on to the next person working at this company, they'll still be able to navigate it? How do we avoid common pitfalls and certain design vulnerabilities in our application code? Just writing code by solving one error message at a time with the assistance of an AI is not going to produce that more holistic view.

Jon Gallagher:

Yeah, so we're going to post a link to Andrej's original tweet, and I'm going to call it a tweet because it's not an "X post." And this has inspired a whole bunch of people: I'm in on vibe programming, and we're going to base this company on vibe programming. People didn't read the whole freaking tweet, where he says this is great for weekend projects.

Logan Gallagher:

He did say that and he does provide that nice little caveat.

Jon Gallagher:

It's more along the lines of "this is kind of cool" rather than "this is the future of software programming." At the end he's like, but it gets really tough to understand and I have to tease it out, and this is good for a weekend. And meanwhile people are like, oh, I'll get funding from VCs and we'll do vibe programming. I can't imagine anything more idiotic. An anecdote from the beginning of my software engineering career, actually from before I was a software engineer, and this is literally an inflection point. Those of you who started off, let's say, playing around with BASIC or playing around with toy languages, or even more powerful languages: maybe the first program you ever wrote started, did some processing, and finished, and you went, yay, I drew stars on a screen, or I output primes to 100 places, yay me. You become a software engineer in the same way you become a civil engineer, in the same way you become any sort of engineer: you develop processes that can be optimized, that can be repeated, and that can be pulled off the shelf for other situations. If you're hand-creating stuff, you're maybe an artist, but you're not an engineer. And I remember the first time I presented a working program to an actual CS major. She slapped me around the head and shoulders and said, I'm going to teach you structured programming right now: breaking things out into functions, or functions and procedures for those of you from Pascal. Create programs that are maintainable, that are scalable, that have the ability to go from reading files on a disk to reading data from a database. Programs have to evolve.

Jon Gallagher:

In the background where we're recording, there's a picture of the St. Johns Bridge here in Portland, the northernmost bridge on the Willamette River. I'm looking at it, thinking about its lifespan. I think it was brought into service in 1931, so it's going to be coming up on its 100th year, and it's still able to carry traffic. Despite freezing, despite rain, despite wind, it's able to adapt to all of those things. There's an old joke that if carpenters built the way programmers programmed, the first woodpecker would have destroyed civilization.

Jon Gallagher:

We, as software engineers, are in charge of creating software that can be maintained, that can scale, and that is cost-effective, and the only way to do that is to have something that can be passed to another person, so that we don't have to engage in artistry every time this program needs to be changed. God help us if vibe programming becomes a way of doing it, because I've looked at some of the code and it is pure spaghetti, and there ain't even a meatball in there. So the idea that we have this incredible power-hungry machine that's just spitting out code that 10-year-olds were writing in BASIC is absurd. But it's a new trend, so we're going to follow that one. Sorry for the rant. It's a particular sore spot for me and for Logan, because our careers started and have continued in creating systems that work and deploying other people's software.

Logan Gallagher:

Yeah, certainly, and when you're responsible for getting other people's software to work, the importance of maintainability really becomes very visceral.

Jon Gallagher:

Oh my gosh.

Logan Gallagher:

When you're running the infrastructure, the platform for other people's applications to run on, yeah, the importance of maintainability of code becomes all too real.

Jon Gallagher:

Yeah, and what does a vibe programming project do for logging and for instrumentation, the things that we are talking about all the time in the cloud? You put one of those programs, a black box that has no logging, no instrumentation, in the cloud, and it will just burn through all of your money without you being able to do anything about it. Okay, sorry, soapbox done. So, your experience with the animation stuff: how can we extrapolate that into a more business context?

Logan Gallagher:

Well, I think there are a couple of things that this has really brought home to me. First is that a lot of mental attention is required for using AI. Using these very powerful, very capable tools does not give you permission to cede that attention. When I've been creating this animation app, I'm still having to think very seriously about the application architecture, and having to think very seriously about making sure that the examples I'm developing are well-written code that, hopefully, the model can expand upon to create interesting and novel animations. It still requires a lot of attention and expertise.

Logan Gallagher:

The models can help you solve individual, discrete problems, but they don't really understand the why and the how. These models were trained, potentially, on every single post ever made to a website like Stack Overflow, where software developers post the errors and bugs they've encountered and solicit feedback from other community members, who offer suggestions that can then be upvoted by other people in the community. But even if the generative AI models have been trained on every single page of Stack Overflow, they still won't necessarily understand the why behind these problems and the appropriate solutions to them. It's still going to be up to us to understand that why. It's still going to require human expertise to guide these powerful tools, at least in our experience of interacting with them in their current state. We're absolutely not at the point where these tools are capable enough to take over, and it's hard for me to imagine us reaching that point in any foreseeable future.

Jon Gallagher:

I completely agree. There's the human aspect of AI. The title of this module is going to be "AI: It Still Takes a Human," because the AI architecture is that thing that's blundering through the maze, and it may not even know or care why it's in the maze. The actual entrance and the goal are things that a human will make decisions about while guiding and optimizing the pathway through.

Jon Gallagher:

There is no world, no possibility, in which an AI system can decide what the value is. For example, it happens all the time in corporations: we're going to assign assets in this particular IT department to optimizing inventory, or to creating a new system, a new website for ordering, or a new personnel tracking system. These are orthogonal systems that all have some sort of weight or value to the corporation, and there's going to have to be some sort of decision made. You could probably put it into a matrix and assign values. Maybe you could try an ROI analysis, or some sort of net present value analysis that the finance people say should be the determination of all business. But when it comes down to it, it needs someone with a feel for the business itself to say: no, we're going to optimize our personnel system because we need to be able to scale. The inventory is being handled by people right now, and they are learning valuable lessons, and as they learn those lessons, they'll feed back into the inventory system. So it takes a human to prioritize among these three different projects. There's no AI system that's going to be able to optimize that, and the people who are selling AI as if it's going to do that are lying to you, or completely unaware of how your business operates.

Jon Gallagher:

Another thing I worry about is AI making decisions and those decisions being exposed to people. Let's take, for example, an automated loan officer. We're a bank, we're going to give all our loan decisions to an AI, and we'll sample the results and make sure nothing bad happens. That assumes that all the data coming into the loan AI system is regularized, that everyone's going to provide it, and that you're going to make all the decisions based on, what, W-2s and the assets people have, without some way of understanding how any of these could be converted to eventually pay off the loan. And how could that loan AI system be subverted, or conned, by potential borrowers?

Jon Gallagher:

We haven't gotten to the point where it's been deployed enough that people have figured out how to break these, but there are a lot of scenarios you could think of where people could just send in bogus applications until one gets through. If your marginal cost of sending in an application is the amount of time it takes to send an email, or if you've got a low-cost bot system running it, what's the downside of attempting to subvert an automated loan processing system? So we are advocating very much for humans in the loop, based on our experience, both our business philosophies and technical philosophies, and now on the experience of running the application, the example that we have provided.

Logan Gallagher:

Absolutely yeah, and we'll continue this journey, I think. Well, I know I personally am going to continue tweaking this app. I'm having a lot of fun, but it's certainly been reinforced to me that, while these tools are powerful, I am still very much having to remain in the loop as a developer of the application. I don't see that changing anytime soon.

Jon Gallagher:

So we're both arguing against the wholesale implementation of AI in a social and governmental context, where a lot of people are pushing for AI, and pushing for access to data, to automate a lot of governmental functions. Take something like the Veterans Administration: how do you determine the level to which someone has been disabled? There's a formula to it, but there's also a certain amount of human input. How much use does this person have left of their arm, given the wound they received or the accident they sustained on a military base? These are things that are not possible to fully automate, but it looks like some entities are trying to do that, and it's something that we're very afraid of, particularly as it expands through the government itself. Any other subjects you want to talk about here?

Logan Gallagher:

Nope. I think it's a fun little, a bit of a rejoinder to the sunnier outlook of the previous episode. But hey, the fact of the matter is, both sides are present in developing these tools.

Jon Gallagher:

We live in Portland; sunshine just means rain is on its way. Okay, again, we'll be giving you links to the GitHub repo and instructions on how to play around with the animation application. We'll also give you a pointer to the tweet from Andrej.

Logan Gallagher:

Yeah.

Jon Gallagher:

And if you have feedback, and you just may, this may set some people off, or if you have more questions, please feel free to contact us at all the contact points we give in the show notes. So thank you, everybody. Yep, till next time, okay.

Logan Gallagher:

Bye.

Announcer:

Thank you for listening to the Cloud Out Loud podcast. Please let us know in the comments if you caught either of the gents calling a product or technology by the wrong name. Other information and suggestions are welcome too. Feel free to tweet us @CloudOutLoudPod or email us at cloudoutloud@ndhsw.com. We hope to see you again next week for another episode of Cloud Out Loud.
