How Good is ChatGPT?

2023-12-06

The other day, I saw Adric and Dylan playing a card game. Adric was trying to guess the color of cards (red or black) dealt off a deck by Dylan. Correct guesses went in one pile, incorrect guesses in another. Adric seemed to be doing better than chance and Dylan didn't quite believe it.

I pointed out that the dealt cards were left face-up on the table, so Adric could just guess the color least represented there, since more of that color must remain in the deck. He was cross that I'd spoiled the secret. But I asked him how well he could expect to do with this strategy. We discussed it a bit and concluded that simulating it would be easier than working out a closed-form solution.

Later that night, he told me he coded it up and found the guesser can get it right about 58% of the time.
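For anyone who wants to check that figure, here's a minimal Python sketch of such a simulation. It isn't Adric's code; the function names, the random tie-break when both colors are equally represented, and the trial count are all my own assumptions.

```python
import random


def play_once(deck_size=52, rng=random):
    """Play one game and return the fraction of correct guesses."""
    half = deck_size // 2
    deck = ["R"] * half + ["B"] * half
    rng.shuffle(deck)
    red_left, black_left = half, half
    correct = 0
    for card in deck:
        # Guess the color with more cards still in the deck, i.e. the one
        # least represented among the face-up cards. Ties are broken at
        # random (an assumption; any tie-break has the same expectation).
        if red_left > black_left:
            guess = "R"
        elif black_left > red_left:
            guess = "B"
        else:
            guess = rng.choice("RB")
        correct += guess == card
        # The dealt card goes face-up, so update the remaining counts.
        if card == "R":
            red_left -= 1
        else:
            black_left -= 1
    return correct / deck_size


def estimate_hit_rate(deck_size=52, trials=20_000, seed=0):
    """Average the per-game hit rate over many shuffled decks."""
    rng = random.Random(seed)
    return sum(play_once(deck_size, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(f"{estimate_hit_rate():.3f}")
```

With a standard 52-card deck the estimate should land close to the 58% quoted above, give or take sampling noise.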

This inspired me to have a go myself:

/* LIBRARY */

/* APPLICATION */

The next day, we compared code. He'd hard-coded the 52-card deck. I asked him how the edge in this game changes as a function of deck size. He suggested we ask ChatGPT to plot it for us.

Impressive! The Python translation and results look valid. Here's the data it gave us with a bit of formatting and a plot I added in Excel.
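If you'd rather cross-check a curve like this without simulating, the expected hit rate can also be computed exactly with a small recursion over the number of red and black cards left in the deck. This is not the code ChatGPT produced, just a sketch of my own; the names `expected_correct` and `hit_rate` are made up for illustration, and it assumes an even deck split equally between the two colors.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def expected_correct(red, black):
    """Expected number of correct guesses with `red` and `black` cards left."""
    total = red + black
    if total == 0:
        return 0.0
    # Chance of getting the next card right when guessing the majority color.
    now = max(red, black) / total
    # Expected correct guesses afterwards, weighted by which color is dealt.
    later = 0.0
    if red:
        later += red / total * expected_correct(red - 1, black)
    if black:
        later += black / total * expected_correct(red, black - 1)
    return now + later


def hit_rate(deck_size):
    """Exact expected fraction of correct guesses for an evenly split deck."""
    half = deck_size // 2
    return expected_correct(half, half) / deck_size


if __name__ == "__main__":
    for n in (2, 4, 10, 26, 52, 104):
        print(n, round(hit_rate(n), 4))
```

The smallest case is easy to verify by hand: with two cards the first guess is a coin flip and the second is certain, so the hit rate is 0.75. From there the edge shrinks toward 50% as the deck grows.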

Circling back, I asked ChatGPT about this game in a new session. It wasn't able to solve it from scratch, but it was able to understand its own Python code.

All in all, a very impressive performance. And vastly improved from just a year ago.

What a difference a year makes... pic.twitter.com/gNxISlPjth
— Carl Lumma (@clumma) December 4, 2023

Though it still has some funny limitations.