Skill Foundry Newsletter

Let's talk about AI and coding this week. Since ChatGPT started on the hype train earlier this year, I've closely watched its progress and evaluated its use cases.

This week, I came across an article referencing a Stanford University study about how ChatGPT's behavior has changed over time. The researchers found that "their (GPT-3.5 & 4) performance on some tasks have gotten substantially worse over time".

ChatGPT Getting Worse at Code Generation?

While I'm interested in all the things AI can accomplish, my focus is mostly on code generation, since it impacts my learning community and me the most. I've stated before that I don't find LeetCode to be indicative of new-hire performance. Even so, I was surprised that when the researchers tested GPT-4 with 50 problems categorized as "easy" by LeetCode, the percentage of acceptable code (code that passes the tests) dropped from 52% in March to 10% in June.

Before celebrating the death of LLMs, we need to note that the acceptable code level dropped because the LLM didn't follow the instructions and added extra commentary and markup to the generated code. When this was scrubbed out, the performance increased by 18% from March to June.
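To make the "scrubbing" step concrete, here is a minimal Python sketch of how extra commentary and markup could be stripped from a response before the code is run against tests. This is my own illustration, not the researchers' actual method; the function name `extract_code` and the sample response are hypothetical, and it assumes the model wrapped its code in a triple-backtick fence.

```python
import re

# Matches a fenced block: opening fence with an optional language tag,
# then the body, then the closing fence. This pattern is an assumption
# about how the model formats its answer.
FENCE_RE = re.compile(r"```[^\n]*\n(.*?)```", re.DOTALL)


def extract_code(response: str) -> str:
    """Return the body of the first fenced block in the response.

    If no fence is found, fall back to the whole response with
    surrounding whitespace removed.
    """
    match = FENCE_RE.search(response)
    if match:
        return match.group(1).strip()
    return response.strip()


# Hypothetical response with chatty text around the code.
raw = (
    "Here is the solution:\n"
    "```python\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "```\n"
    "Hope this helps!"
)

print(extract_code(raw))
```

Submitting the raw response as-is would fail any automated test, even though the code inside it is fine, which is the kind of failure the drop to 10% reflects.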

ChatGPT generated this when I asked it to "generate an image of a robot claw arm trying to put a square peg into a round hole." Not ready for prime-time!

This is something I have noticed when working with LLMs over time. They are becoming more verbose and less likely to follow detailed instructions, which reduces their usefulness. I have also noticed that for some tasks, like generating code to perform a sort, it sometimes will not write the sort code I asked for, instead generating a different type of sort. The code "works", but it's not what I asked for, so you must be a skilled developer to use LLMs effectively.

Issues in Advanced Topics

As I've explored LLMs, I've found that they do a fairly poor job with the more advanced critical thinking skills involved in IT. When I asked ChatGPT about topics like software architecture, performance tuning, and other senior/architectural concerns, its responses ranged from correct but superficial to completely wrong (and sometimes wrong in a way that would cause serious problems in the software stack).

I'm not the only one who has noticed this, as evidenced by a report from Immunefi, a web3 security company. Immunefi found that about 64% of respondents said ChatGPT provided "limited accuracy" in identifying security vulnerabilities, and approximately 61% said it lacked the specialized knowledge for identifying exploits that hackers can abuse.

I'm Holding Firm on My Opinion

ChatGPT and LLMs are changing the way I work, and they are going to impact jobs. Still, I'm very skeptical of doom-and-gloom scenarios for anyone except spam and misinformation peddlers, who will be able to crank out more content than ever.

I certainly don't trust it to assist job seekers on LinkedIn.

Skill Foundry Newsletter - Issue 05