This is an automated archive made by the Lemmit Bot.

The original was posted on /r/singularity by /u/Arman64 on 2024-09-28 04:42:34+00:00.


I’ve spent the past few days using AVM (Advanced Voice Mode), putting it through my usual tests to explore its features and limitations. Overall, it’s an incredible piece of technology, but it’s definitely not what was demoed months ago.

I’m comparing it primarily to other voice AIs from different companies, but especially to the standard voice chat for ChatGPT 4o. Some info: I’ve tested it both with a range of custom instructions and without any. I haven’t attempted to jailbreak it, and I have tried several voices as well. I’m a native English speaker with an Australian accent and can speak other languages too. I’m a doctor with expertise in psychology/psychiatry and also an AI enthusiast. I use it with AirPods Pro 2 with voice isolation. Note that some of the issues I’m facing may be widely known, due to user error, standard across every AI, or fundamental limitations of the software.

Pros:

  • The response time between the user and the AI is at the level of a regular human interaction.
  • It has a great ability to pick up on tone and emotion based on the dialogue.
  • The ability to do and modify accents is pretty cool.
  • The expressiveness, rate, volume, and emotional affect in speech are big improvements over other models.
  • The realism and audio quality of the voice are getting closer to sounding like a human; it’s still in the uncanny valley but definitely a step in the right direction.
  • It would definitely grab the attention of people who are not into AI or who have not been too impressed by AI so far.
  • The pauses, laughter, breathing, and other non-dialogue features enhance the experience.
  • Cutting it off mid-response generally works quite well.

Cons:

  • It’s heavily censored, nearly to the point it’s unusable in certain situations. While I don’t expect it to depict or discuss anything sexual, violent, or controversial, it refuses some incredibly benign requests, which is disappointing.
  • In longer chats, you may get frequent interruptions stating it cannot discuss something due to guidelines, but if you say “continue,” it does. This can happen multiple times a minute, even when the topic is within guidelines.
  • It’s unable to swear, even when specifically told that it can.
  • It doesn’t seem to challenge your views (or be happy to explore anything controversial), doesn’t like to form an opinion, and is far too agreeable, to the point of sycophancy.
  • It can repeat words, questions, and statements frequently within the same chat, as if it has poor recall.
  • The tonal variations can fluctuate dramatically; sometimes it’s consistent, but other times it’s all over the place.
  • The model itself is not as intelligent as ChatGPT 4o; it seems more like the level of early GPT-4 or worse at times.
  • There have been rare errors that are creepy: once it screamed for no reason, another time the voice sounded very robotic, and another time it used a completely different voice (not my voice, but it sounded really strange).
  • You really can’t stop to think when speaking, as any pause will make it start to speak.
  • The daily limit is quite low, but I expect this to improve within the next month or so. The really annoying thing is that you have to wait 24 hours, which makes it harder to demo to others.
  • It tries to be engaging by generally asking you a question at the end of its “turn”, but when it does that all the time, it can be a bit tedious.
  • The lack of promised features such as multimodal image/video analysis and the ability to sing is a bummer.
  • When speaking in different languages, it can sometimes switch languages without being told to. The same happens with accents and tones.
  • I’ve noticed it tends to misunderstand words and hallucinate more often than the standard voice mode. Sometimes it agrees to do a request but ignores it completely or addresses it only briefly.
  • It does not recognise emotions from your tone of voice, only from the context of the words.
  • The replies seem to be quite brief and can appear superficial.

Overall, while AVM shows a lot of promise and has some impressive features, there are significant areas that need improvement.