Coexistence between Humans and Embodied Agents

And the need for Timing agents


Extended from the original post on LinkedIn.


I found this very interesting paper on Coexistence between Humans and Embodied Agents.
I like how it reflects many of the current challenges in building AI-driven character products: simply putting a face on an existing chat AI doesn’t turn that chat AI into an embodied intelligence.
As a result, an embodied agent tends to fall short of the User’s Expectation of what it is and what it can do, including how it can coexist with the user/customer and its environment.

Thanks to the authors Hannah Kuehn, Joseph La Delfa, Miguel Vasco, Danica Kragic, and Iolanda Leite from the KTH Royal Institute of Technology for sharing these thoughts.

They point out that current embodied agents do manage to Exist with humans and their environments, using the latest multimodal foundation models, but they identify three limitations of such agents:

  • they are Stagnant in that such models are built from data collected prior to the agent’s deployment and usage;
  • Generic in that such data is very frequently not context- or application-specific. Generic is not the same as General: it does not let the agent Generalize, and instead leads it to perform below Expectation in specialized tasks, which is where such agents are most likely to be needed and scrutinized;
  • finally, they introduce Steamrolling, a limitation in which, over time, both the user’s creative diversity and the agent’s output diversity progressively become narrower (for the agent, because it is trained on data generated by other agents).

I feel that, in addition to these, there’s a need to clarify what embodiment brings to the User Experience in the first place. We can have a multimodal chat agent that understands our speech, our screen and our surroundings, but how different is that from also having an expressive, embodied social agent?

One of the missing pieces, in my opinion, is Timing.
Time drives our human embodiment, but the agent’s mind is typically agnostic of time.
It may understand how to compute time intervals, measure time frames or query datetime-indexed databases, but it does not operate under embodied time constraints like we do. On one hand, that’s actually an advantage: such agents can, in most cases, operate faster than us and solve multiple problems at once. On the other hand, when they bring those results back and attempt to close the interaction loop with the user, they fail to implement a continuous, fluid interaction paradigm that properly leverages the fact that they are embodied.

As embodied natural agents, we use our own body to paint our communication in socio-emotional ways and to add spatial dimensions to our content.
But we also use it to give rhythm to our conversation and thoughts, so that they are more easily understood and retained by listeners.
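
To make the Timing idea a little more concrete, here is a minimal sketch of what a pacing step between the agent's mind and its body could look like. It is an illustration under assumptions, not a method from the paper: the names `TimedPhrase` and `pace_phrases`, the speaking rate of 150 words per minute and the 0.4-second pause are all hypothetical. The agent may compute its whole answer almost instantly, but the pacing step spreads the phrases over a speech-like timeline that the body could then follow.

```python
from dataclasses import dataclass


@dataclass
class TimedPhrase:
    """One phrase placed on the interaction timeline (hypothetical structure)."""
    start_s: float     # when the phrase should begin, in seconds from the start
    duration_s: float  # estimated time needed to deliver the phrase
    text: str


def pace_phrases(phrases, words_per_minute=150.0, pause_s=0.4):
    """Spread already-computed phrases over a speech-like timeline.

    The speaking rate and the pause length are assumed values chosen
    for illustration only.
    """
    seconds_per_word = 60.0 / words_per_minute
    timeline = []
    clock = 0.0
    for text in phrases:
        # Estimate delivery time from the word count of the phrase.
        duration = len(text.split()) * seconds_per_word
        timeline.append(TimedPhrase(start_s=clock, duration_s=duration, text=text))
        # Leave a short pause before the next phrase starts.
        clock += duration + pause_s
    return timeline


if __name__ == "__main__":
    for item in pace_phrases(["Here is what I found.", "There are two options."]):
        print(f"{item.start_s:5.2f}s  {item.text}")
```

A real embodied agent would need much more than this, such as adapting the rhythm to the listener and synchronizing gaze and gesture, but even this small step shows that the timeline is planned separately from the content.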

This is yet another reason why I’m stressing that User Experience and Expectations are key in AI-driven products.


Reference:
Kuehn, H., La Delfa, J., Vasco, M., Kragic, D., & Leite, I. (2025). Humans Co-exist, So Must Embodied Artificial Agents. arXiv, abs/2502.04809.
https://arxiv.org/abs/2502.04809

Tiago Ribeiro
AI Technology & Product Consulting

Eclectic scientist and engineer striving to breathe the Illusion of Life into autonomous characters