An anonymous reader quotes an Ars Technica report: On Wednesday, Charlie Holtz, a developer at Replicate, combined GPT-4 Vision (commonly known as GPT-4V) and ElevenLabs' voice cloning technology to create an unauthorized AI version of famous naturalist David Attenborough narrating Holtz's every move on camera. As of Thursday afternoon, the post describing the stunt had garnered over 21,000 likes. "Here we have a remarkable specimen of Homo sapiens distinguished by his circular silver glasses and mane of tousled curly locks," the fake Attenborough says in the demo as Holtz looks on with a smile. "He is wearing what appears to be a blue cloth covering, which one can only assume is part of his courtship display." "Look closely at the subtle arch of his eyebrow," he continues, as if narrating a BBC wildlife documentary. "It's as if he's in the middle of a complex ritual of curiosity or skepticism. The backdrop suggests a sheltered habitat, possibly a common feeding area or watering hole."
How does it work? Every five seconds, a Python script called "narrator" takes a photo from Holtz's webcam and sends it via an API to GPT-4V (the version of OpenAI's language model that can process image inputs), along with a special prompt instructing it to generate text in the style of Attenborough's narrations. It then feeds that text into an ElevenLabs AI voice profile trained on audio samples of Attenborough's speech. Holtz has published the code that ties it all together on GitHub; running it requires API tokens for OpenAI and ElevenLabs, which cost money to use. During the demo video, when Holtz holds up a cup and takes a drink, the mock Attenborough narrator says, "Ah, in his natural environment we observe sophisticated Homo sapiens engaging in the critical ritual of hydration. This male individual has selected a small cylindrical container, likely filled with life-giving H2O, and is expertly tilting it toward his intake port. Such grace, such balance."
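For the curious, here is a minimal sketch of what such a capture-describe-speak loop could look like in Python. This is not Holtz's actual "narrator" code: the prompt text, the voice ID, the helper function names, and the choice to save the speech to an MP3 file are all illustrative assumptions; it only assumes the documented OpenAI v1.x SDK, the public ElevenLabs text-to-speech REST endpoint, and the opencv-python and requests packages, with API keys supplied via environment variables.

import base64
import os
import time

import cv2                      # pip install opencv-python
import requests                 # used here for the ElevenLabs REST call
from openai import OpenAI       # pip install openai (v1.x SDK)

# Placeholder identifiers -- the real script's prompt and cloned voice differ.
ELEVENLABS_VOICE_ID = "YOUR_ATTENBOROUGH_VOICE_ID"
PROMPT = (
    "You are Sir David Attenborough. Narrate this image of a person "
    "as if it were a scene from a nature documentary."
)

client = OpenAI()              # reads OPENAI_API_KEY from the environment
camera = cv2.VideoCapture(0)   # default webcam

def capture_frame_b64() -> str:
    """Grab one webcam frame and return it as a base64-encoded JPEG."""
    ok, frame = camera.read()
    if not ok:
        raise RuntimeError("Could not read from webcam")
    _, jpeg = cv2.imencode(".jpg", frame)
    return base64.b64encode(jpeg.tobytes()).decode("utf-8")

def narrate(image_b64: str) -> str:
    """Ask GPT-4V for an Attenborough-style description of the frame."""
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

def speak(text: str, out_path: str = "narration.mp3") -> None:
    """Synthesize the text with the cloned ElevenLabs voice and save it."""
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{ELEVENLABS_VOICE_ID}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": text, "model_id": "eleven_monolingual_v1"},
        timeout=60,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)

while True:
    narration = narrate(capture_frame_b64())
    print(narration)
    speak(narration)           # play narration.mp3 with any audio player
    time.sleep(5)              # one observation every five seconds

As the article notes, each pass through the loop bills both the OpenAI and ElevenLabs APIs, so leaving a script like this running continuously costs real money.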