The gap between the promise and reality of LLMs and the gap between the promise and reality of humanoid robots are of entirely different orders of magnitude.
When a language model fumbles, its mistakes are still wrapped in convincing writing, so the error is only apparent if the user already knows what the answer should be.
When a humanoid robot fumbles, its mistakes are obvious because the physical world offers immediate feedback.
It's the difference between lying on your résumé that you're a world-class gymnast, and having to actually perform.
How much of this is due to nearly all humans already having advanced knowledge of what they would expect out of a humanoid robot in the home?
With the gymnast example, as a non-gymnast, I don’t know the difference between a high and low scoring routine on the floor or beam. If a humanoid robot did a routine and didn’t fall, I would assume all is well. I don’t know the technical details of what is required for a gymnastics competition.
This seems like the same idea as an LLM writing a paper that looks correct to someone who doesn’t already know the answer.
In a home context, this could look like the robot not practicing proper food safety or storage around someone who doesn’t know the details about that kind of thing, which is a good number of people. What it’s doing might look correct enough, and it produces food you can eat… all is well, until you get sick and don’t know why.
Which gymnastics competition? The well-known ones are more like beauty contests performed on gymnastics equipment. However, there are also competitions where they measure objective things. I know what I like to see in a beauty contest, but that is subjective. I too don't know what a technical competition is measuring, but I know they have objective things they look for.
I don’t know what you’re referring to when talking about gymnastics as a beauty contest.
I’m not an expert, but I know there are specific moves with various degrees of difficulty. I believe there is a max score based on that difficulty level, and any imperfection will lower that score, such as a foot pointed or flexed the wrong way at the wrong time, taking an extra step on a landing, etc.
I know all these rules exist, but I’m not an expert where I can say someone had their foot flexed when it should have been pointed. These details would go over my head, where a humanoid robot might get a pass from me, while an actual gymnast or judge would be able to see faults.
I’m kinda torn between “genAI-powered robots will have ground-truth reality as a reference, so they will ultimately be more grounded and effective than LLMs” and “LLMs are like drunk Uncle Steve with his PhD swimming in vodka, and using genAI in robots will end up as well as having drunk Uncle Steve drive home”.
Guardrails on the tasks it will attempt are inevitable, but I can also see that becoming a paywalled enshittification farm.
Yeah, imagine "that guy from the pub" who is unemployed for years because he claims to be "overqualified for everything", and then add that he knows exactly how to convince you that he is capable of EVERYTHING you throw at him...
LLMs are much closer to delivering on their promise in practice; they're already useful for a pretty wide range of tasks. Humanoid robots are still comically clumsy and limited, barely able to complete scripted tech demos.
The difference is very easy to define and notoriously difficult to solve: it is physics. And man, is physics a hard problem to "solve".
Welcome to the world of hard tech, not easy machine learning models. Capital is in short supply, it doesn't go nearly as far, and you don't get wild multiples in return, if you get any return at all.