Preview: Meta’s New AI Tools
Meta unveiled many new AI tools at Meta Connect 2024, and though some are generic and others are underdeveloped, there is a lot of exciting progress here.
Meta has a slightly odd position in the AI battle. On one hand, their smart glasses have a built-in assistant that uses their Llama AI to answer queries based on what you see. Zuckerberg boasted that Llama is on track to be the most used AI model by the end of the year and “may be the biggest already.”
The flipside, however, is that they are quite far behind their competition, and those numbers are somewhat disingenuous. Their high user numbers come solely from the fact that Meta’s assistant is built into the search boxes of all of their apps, which means many people use it by accident and then get annoyed, since it’s less useful than a standard search box. If there were a standalone Meta Assistant app, the way Claude, Google’s Gemini, and ChatGPT have standalone apps, it would be not only the smallest of the group but also the least advanced.
Meta Connect 2024 saw Zuckerberg tacitly acknowledge these points, announcing a series of quality-of-life updates to bring Llama up to speed with the competition while also pushing boundaries by making use of their enormous free user base.
To start with the basics, Meta Assistant, which you can chat with in Messenger, WhatsApp, and Instagram, can now edit photos and respond to voice commands. The glasses assistant could already do both, but the in-app version is a slightly different product and couldn’t until now. Also, for some reason, Meta is obsessed with celebrity endorsements for their AI products, so their assistant can now respond in the nearly life-like voices of various B-list celebrities like John Cena, Judi Dench, Kristen Bell, Keegan-Michael Key, and Awkwafina. I don’t know why you’d want that, but you can. My cynical view is that Meta knows it loses in a head-to-head with ChatGPT or Claude on complex tasks, so instead it aims to attract the broadest possible user base to their assistant: the people with the least complex requests, such as children and the elderly. Celebrity voices help with that.
They also announced new tools that make it easier for large professional influencers to interact with their audience, from replying to fans with an AI assistant to letting fans have video calls with an AI persona version of the influencer. All of these tools seem convenient, advanced, and depressingly impersonal.
Their big customer-facing AI feature is automatic video dubbing for short-form vertical videos, starting with English and Spanish. The tool won’t just translate the creator’s speech; it will also adjust the video so it looks like the person is actually speaking the translation. The translated voices are meant to sound like the person’s own voice and accent, but the pre-recorded demonstrations aren’t particularly persuasive. The effect is fairly convincing for the first sentence, but the longer the person talks, the more generic the accent becomes and the more mechanical the speech sounds.
The AI assistant on their smart glasses is where Meta is focusing most of its attention. Most of these features aren’t fully fleshed out or reliable yet but point to how competent these glasses will eventually become.
Meta’s new Llama 3.2 model is multimodal, meaning it can work with photos and video directly rather than first converting them into text descriptions. In theory, that means the assistant can give live feedback on whatever the camera is seeing. Imagine doing bicep curls in front of a mirror while your glasses give you live feedback on your form, or playing basketball while receiving live coaching based on your performance. Meta also promises that their AI will be able to remember information you tell it, but I’m doubtful about how reliable that will be in practice, as memory remains a tough challenge for AI assistants.
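If you want a feel for what that image-in, text-out flow looks like outside the glasses, the Llama 3.2 vision models are openly downloadable, and you can poke at one through Hugging Face’s transformers library. The sketch below is purely illustrative: the model ID matches Meta’s published checkpoint, but the photo, prompt, and hardware settings are placeholders I made up, and the glasses obviously run Meta’s own private pipeline rather than anything like this.

```python
# Illustrative only: asking a Llama 3.2 vision model about a single photo
# via Hugging Face transformers. The file name and prompt are made up.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Pretend this is one frame grabbed from the glasses' camera.
image = Image.open("bicep_curl_frame.jpg")

# The image goes straight into the chat turn; no separate captioning step.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Check my bicep curl form in this photo and give me one tip."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=80)
print(processor.decode(output[0], skip_special_tokens=True))
```

Even in this toy form, the appeal is clear: one model, one call, and the photo is part of the conversation rather than something a separate system has to describe first.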
The glasses will also provide real-time translation for at least French, Spanish, and Italian. Just wear the glasses, set them to translation mode, and anything a person says to you in Italian will be translated into English in your ears. Then you can speak your response and show the person the Meta app, which will display a written translation of what you said. I’ve tested many live translation features, including from Google, the industry leader in translation, and they are all clunky in practice. Perhaps this will be different.
One more handy addition: you can now call a phone number or scan a QR code just by looking at the piece of paper it’s printed on.
Some of these features are rolling out now, but even so, don’t expect them to be polished and reliable right away. This is all very ambitious technology, and it’s going to be a while before I’d use any of it regularly.