The duck blog: Friday, September 6, 2024 post

Does AI really get it?

Last week I was playing a bit with Cursor, an LLM-powered IDE. After a friend suggested that this is (translated to social network speak) a game-changing thing that will revolutionize the IT industry, I decided that I didn't know enough about it to just say "No way" (which I did say anyway). Although I uninstalled it a few days ago (more on that later), I wanted to share some insights I had during that time.

In this post, I want to touch on a realization I had while I was experimenting with the tool, and give you my opinion about why I think people who believe LLMs don't "understand" anything are wrong. Well, maybe not technically wrong, but for all intents and purposes they are.

Btw, I'm going to use the terms "AI" and "LLM" interchangeably, but please note that they aren't the same thing. The so-called real AI is a general-purpose learning machine, whereas an LLM is a pre-trained model that doesn't technically 'learn' anything new.

Coding with Cursor

I'm currently working on a little side project that involves 5 different languages: SQL, Go, HTML, CSS, and JavaScript. Of these, I tried code generation in the three programming languages. The initial impressions ranged from "OMG, wow!" to "Pfft, this is not good." It was a mixed bag that later turned out to be related to the different modes this editor has.

Prior to introducing Cursor, I'd been working on the project on my own for a few weeks. I had the opportunity to try using Cursor in two ways:

To generate the whole project from scratch
To modify the existing project

There wasn't much difference between those two use cases. The results were generally OK for SQL and Go, and abysmal for JavaScript. The general impression is that the code generation lacked context-awareness and acted more like a semi-accurate multi-line auto-complete. Remember, traditional autocomplete is mostly accurate whenever it offers a suggestion, so this is worse.

After tinkering with Cursor for a bit, I realized it actually has four modes.

Automatic auto-complete
Inline prompt-based code generation
Inline prompt-based code generation with additional context
Chat mode

The first two modes are the ones that didn't work very well. They usually feel more like the AI is making wild guesses than actually understanding anything. The third mode I did not try at all. The fourth mode, though, was really interesting.

In chat mode, Cursor can be given context, including additional information from the terminal and other places. We then edit the code in a more conversational way.

I quickly learned that asking it to comment on things gives much better results, even asking it to comment on its own generated code or answers.

A little chat

The first trial of the chat feature was troubleshooting a query that wasn't working as I expected. I first tried troubleshooting it myself, but I couldn't figure it out. Then I told Cursor to take a look at the particular query, giving it the SQL file that contains the schema as additional context, as well as the result I was getting.

Based on the given information, it was immediately able to tell that I was doing an inner join instead of a left join, and it explained why that wasn't working. It also correctly inferred my intent in the query and fixed a few other things that would not have worked correctly. I didn't really ask it to fix those things, but it did it anyway. And just in case you missed it: it inferred my intent from code that would otherwise fail to fulfill that intent.
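
For readers less familiar with the distinction, here is a minimal Go sketch that mimics the two join types on in-memory slices. The `Item` and `Note` tables are hypothetical stand-ins, since the real query and schema aren't shown in this post.

```go
package main

import "fmt"

// Item and Note are hypothetical tables used only for illustration.
type Item struct {
	ID   int
	Name string
}

type Note struct {
	ItemID int
	Text   string
}

// Row is one joined result row. Note is nil when nothing matched,
// which plays the role of NULL columns in SQL.
type Row struct {
	Item Item
	Note *Note
}

// innerJoin keeps only the items that have at least one matching note.
func innerJoin(items []Item, notes []Note) []Row {
	var rows []Row
	for _, it := range items {
		for i := range notes {
			if notes[i].ItemID == it.ID {
				rows = append(rows, Row{Item: it, Note: &notes[i]})
			}
		}
	}
	return rows
}

// leftJoin keeps every item; an item without notes appears once with a nil Note.
func leftJoin(items []Item, notes []Note) []Row {
	var rows []Row
	for _, it := range items {
		matched := false
		for i := range notes {
			if notes[i].ItemID == it.ID {
				rows = append(rows, Row{Item: it, Note: &notes[i]})
				matched = true
			}
		}
		if !matched {
			rows = append(rows, Row{Item: it})
		}
	}
	return rows
}

func main() {
	items := []Item{{ID: 1, Name: "inbox"}, {ID: 2, Name: "archive"}}
	notes := []Note{{ItemID: 1, Text: "hello"}}
	fmt.Println(len(innerJoin(items, notes)), len(leftJoin(items, notes)))
}
```

Running it prints `1 2`: the inner join silently drops the item that has no note, while the left join keeps it with an empty note.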

A bit later, I decided to give it something a bit more involved. In this app, I have a list of items that are stored in the database. The items form a tree structure by referencing a parent item by id. Under the same parent, the items are ordered in a certain way. I have two additional requirements for the items:

Since this is a collaborative app, two different users can independently insert items into the exact same position (e.g., between the same two items). There must not be a conflict when that happens, and once the respective items are exchanged between the users, they must be in the same relative order for both users.
It is not desirable to alter the position of the items after the fact unless the items are being moved.

I showed the algorithm to Cursor and asked it to comment on it. I did not give it the additional requirements nor point it to a file that contains them. I wrote the algorithm in plain English, and I didn't use any kind of standard language (because I'm not formally educated in CS). Cursor was able to infer what this algorithm is trying to achieve and for what purpose, and it was able to highlight a few edge cases that I had considered but never bothered to address.

It also commented on some aspects negatively and suggested corrections. For instance, when calculating a new position, I deliberately bias it towards 20% of the range between the previous and next items because appending to the end is more likely than inserting in the middle. Cursor didn't understand why I was doing this, though. After I told it that these positions are stored as permanent history in the database, it suddenly 'realized' the benefit of the 0.2 bias and corrected its remarks accordingly.
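
The actual algorithm isn't reproduced in this post, so what follows is only a minimal Go sketch of how a 0.2-biased position could be computed. The `Item` struct, the float positions, the upper bound of 1, and the ID tie-break for concurrent inserts are all assumptions made for illustration, not the code Cursor reviewed.

```go
package main

import (
	"fmt"
	"sort"
)

// Item is a hypothetical row shape; the real schema is not shown in the post.
type Item struct {
	ID       string // assumed to be globally unique (e.g. generated per client)
	ParentID string
	Position float64
}

const (
	bias       = 0.2 // the 0.2 factor mentioned in the post
	upperBound = 1.0 // assumed end of the position range under one parent
)

// positionBetween returns a position located at the given fraction of the
// way from prev to next (0.5 would be the plain midpoint).
func positionBetween(prev, next, fraction float64) float64 {
	return prev + fraction*(next-prev)
}

// remainingAfterAppends simulates n appends to the end of the list and
// returns how much of the range is still free after the last item.
func remainingAfterAppends(fraction float64, n int) float64 {
	last := 0.0
	for i := 0; i < n; i++ {
		last = positionBetween(last, upperBound, fraction)
	}
	return upperBound - last
}

// sortSiblings orders items by position and breaks ties by ID, so two
// clients that independently picked the same position still agree on the
// order without moving any existing item (an assumed tie-break rule).
func sortSiblings(items []Item) {
	sort.Slice(items, func(i, j int) bool {
		if items[i].Position != items[j].Position {
			return items[i].Position < items[j].Position
		}
		return items[i].ID < items[j].ID
	})
}

func main() {
	fmt.Printf("free range after 10 appends: biased=%.4f midpoint=%.4f\n",
		remainingAfterAppends(bias, 10), remainingAfterAppends(0.5, 10))

	// Two users insert between the same neighbours (0 and 1) at the same time.
	p := positionBetween(0, 1, bias)
	siblings := []Item{
		{ID: "user-b-item", Position: p},
		{ID: "user-a-item", Position: p},
	}
	sortSiblings(siblings)
	fmt.Println(siblings[0].ID, siblings[1].ID)
}
```

Under these assumptions, ten consecutive appends leave roughly 11% of the range free with the 0.2 bias, versus about 0.1% with a plain midpoint. That is one way to read the benefit when positions are kept as permanent history and never rebalanced.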

These moments when Cursor would show signs of 'realization' after being given new information were repeated a few times in different conversations. I was quite entertained by that, if I'm being honest.

Does it get it, tho?

The next day, I was thinking about the chat session I had before. And then it dawned on me. What's the difference? How do you tell the difference between a real human getting it and an AI doing so?

The more I thought about it, the more I realized I actually couldn't. When I explain something to a human, and they finally understand it, the rest of the conversation builds on the mutual agreement that 'it' is now a thing (I won't say 'fact', because we could technically both be wrong). When an LLM 'understands' something (at least within the context of the chat session), it leads to what sounds to me like the exact same verbal exchange that I would expect from a human who understood something. What's more, it demonstrates the same lack of understanding in some situations that I've come to expect from humans (e.g., when I use vague terminology or use terms in slightly non-standard or ambiguous ways). It's actually surprisingly 'organic', if you will. (Yes, it does apologize an awful lot, but it's not like there aren't any people who do that.)

My opinion is that, for all intents and purposes, AI does 'get it'. I'm going to further qualify this by saying that:

this 'understanding' is currently not permanent, so it's confined to a particular chat session (yes it 'gets it', but it doesn't 'learn it')
I'm limiting this claim to technical matters on which I tested this particular AI (it uses Claude 3.5 and GPT-4o under the hood)

I'm starting to think the only reason some of us claim it doesn't understand anything is that we know it's AI, and we understand, at least to some extent, what it does under the hood. Basically, we believe that it cannot technically understand things. In other words, we are very likely prejudiced, thinking inside the box. Because, as I said before, there's actually no basis, just looking at the exchange in the chat window, to claim that it has any less capacity to understand than the average human programmer I've conversed with in real life. And I've talked to some very smart people, so my 'average' programmer is a lot smarter than the global average. It also does not have any more capacity to understand than any human I've talked to. And it makes mistakes, just like any human.

I also noticed that it behaves significantly differently depending on whether I formulate my prompts as instructions or questions. For example, if I say "Get rid of all semicolons" it will simply do it. But if I say "I think rebalancing positions isn't necessary in this case, what do you think?" it may even subtly insist that rebalancing is a good thing and that I should add it. I wanted to highlight this for those who insist LLMs simply do what we tell them to and don't push back. This is entirely dependent on the attitude they infer from our prompts. Almost like with... real humans.

Now I understand (heh, see what I did there?) that this makes it sound like I'm saying AI is actually fantastic. Which isn't the case. I'm merely saying that it does appear to understand things, and that this appearance is as good as the real thing for all intents and purposes. Merely understanding human language isn't the full extent of intelligence, let alone the full extent of human capability. This is especially true given its inability to learn new things, something that causes a bit of frustration.

Why I uninstalled it anyway

I found that Cursor was making me intellectually lazy. As I was getting more and more comfortable with its usage, I developed a habit of whipping it out any time I ran into some issue that I couldn't solve within a few seconds.

The usual approach would be to first do a sanity check to see if I'm understanding correctly how the thing is supposed to work, check the docs, do a quick search to see if anyone's encountered a similar issue, gather more data, etc. With AI, I basically just point it to the code I'm working on and any related code, and paste or describe the error. 9 out of 10 times, it's able to tell me exactly what to fix, and even fix it for me. It's nice and easy, and that makes me... lazy. After about a week of that, I was already feeling like I was starting to lose my edge. It was becoming a lot harder to focus on and think about problems.

It also takes away the joy of growth. It may be a more practical approach if you just want working software. Not necessarily dead easy as it can still make a lot of mistakes, nor free. But for me, I know from experience that once I get the hang of these languages and what I can (and cannot) do with them, I'll be able to write code much faster than it takes me to correct all the mistakes AI makes, so that's something I don't want taken away from me. Yes, it takes time, but no, I've no problem with that.

In short, it was making me dumb. Not making me feel dumb, but making me dumb, period. I'm supposed to be smart, and work smartly on smart things. AI is bad for my ego. 😂

Hajime, the duck guy

Friday, September 6, 2024, by Hajime Yamasaki Vukelic
