hammer_mt on Lenny's Newsletter


The difficulty in doing an experiment like this is that:

a) everybody has a different idea of what a good answer is (hundreds of Lenny's readers voted in this poll, and many preferred the AI)

b) the real answers to questions like this are rarely published (Exponent is one of the few places they are, and these were among its highest-rated answers)

c) it's hard to know how much effort to put into beating the human (I spent a few hours on each prompt to make the article interesting, but there's always more to optimize)

I feel like this exercise was enough to prove the point and start a discussion. It's at least interesting that hundreds of people voted for AI over highly rated human answers to a task. What that says about the state of product management vs the rise of AI is an exercise for the reader. :-)

FYI, we're planning a more comprehensive follow-up. If you're open to it, I'd love to run a 'stretch' experiment as part of it: you submit your answer to a task, I try to beat it with AI, and we run another experiment. We could include it in the hard PM evaluation benchmark we're building as a next step, to see whether these results hold. We can't do that properly without more real answers from top human experts, though!
