Solving the multi-agent problem in digital assistants

I am new to Artificial Intelligence in general, but I am familiar with Machine Learning, having taken a course on it in college.

Currently, I'm trying to start a personal project related to solving the multi-agent problem in digital assistants (no particular DA, as they all work on the same principle). Current digital assistants only search for specific patterns / keywords in a person's speech, so they don't actually 'understand' what the user is saying. As users' demands / expectations increase, companies seem to be reaching a dead end with this approach and are looking for a simpler solution.
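
To make the keyword-matching point concrete, here is a minimal Python sketch of what I mean. The intent table and the function name `match_intent` are made up for illustration and are not taken from any real assistant; actual products are certainly more elaborate, but I assume the basic idea is similar:

```python
import re

# Hypothetical intent table: each intent is triggered by a few keywords.
INTENT_KEYWORDS = {
    "weather": {"weather", "rain", "forecast"},
    "alarm": {"alarm", "wake"},
    "music": {"play", "song", "music"},
}


def match_intent(utterance):
    """Return the intent whose keywords overlap most with the utterance.

    Returns None when no keyword matches. Nothing here models meaning:
    word order, negation, and context are all ignored.
    """
    # Lowercase the utterance and split it into a set of words.
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_intent, best_hits = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)  # number of keywords present
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent
```

With this sketch, a request like "Don't play any music" still maps to the music intent, because only the keywords are counted and the negation is ignored.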

The motivation comes from this article []. I have searched for research articles that could give me a starting point for understanding and tackling the problem, but so far I have been unable to find any such material.

I would appreciate it if the community could share their understanding of the problem and, if possible, provide links to resources that address it.

