
Overview of recent AI industry news including OpenAI staff departures, Sony Music Group's copyright warnings, Scarlett Johansson's voice usage issue, and new developments in ChatGPT search integration.

Last Week in AI: Episode 33

1. Significant Industry Moves

OpenAI Staff Departures and Safety Concerns

Several key staff members responsible for safety at OpenAI have recently left the company. This wave of departures raises questions about the internal dynamics and commitment to AI safety protocols within the organization. The departures could impact OpenAI’s ability to maintain and enforce robust safety measures as it continues to develop advanced AI technologies.

For more details, you can read the full article on Gizmodo.

Sony Music Group’s Warning to AI Companies

Sony Music Group has issued warnings to approximately 700 companies for using its content to train AI models without permission. This move highlights the growing tension between content creators and AI developers over intellectual property rights and the use of copyrighted materials in AI training datasets.

For more details, you can read the full article on NBC News.

Scarlett Johansson’s Voice Usage by OpenAI

Scarlett Johansson revealed that OpenAI had approached her to voice its AI models; she declined, and a ChatGPT voice that many listeners found strikingly similar to hers was later released. The incident underscores the ethical and legal considerations surrounding the use of celebrity likenesses in AI applications, and Johansson’s stance against the unauthorized use of her voice reflects broader concerns about consent and compensation in the era of AI-generated content.

For more details, you can read the full article on TechCrunch.

ChatGPT’s New Search Product

OpenAI is reportedly working on a stealth search product that would bring web search capabilities directly into ChatGPT. The offering aims to enhance the search experience with more intuitive, conversational interactions, and the development suggests a significant shift in how AI could transform search in the near future.

For more details, you can read the full article on Search Engine Land.

2. Ethics, Policy, and Product Updates

Actors’ Class-Action Lawsuit Over Voice Theft

A group of actors has filed a class-action lawsuit against an AI startup, alleging unauthorized use of their voices to train AI models. This lawsuit highlights the ongoing legal battles over voice and likeness rights in the AI industry. The outcome of this case could set a precedent for how AI companies use personal data and celebrity likenesses in their products.

For more details, you can read the full article on The Hollywood Reporter.

Inflection AI’s Vision for the Future

Inflection AI is positioning itself to redefine the future of artificial intelligence. The company aims to create AI systems that are more aligned with human values and ethical considerations. Their approach focuses on transparency, safety, and ensuring that AI benefits all of humanity, reflecting a commitment to responsible AI development.

For more details, you can read the full article on Inflection AI.

Meta’s Introduction of Chameleon

Meta has introduced Chameleon, a state-of-the-art multimodal AI model capable of processing and understanding multiple types of data simultaneously. This new model is designed to improve the integration of various data forms, enhancing the capabilities of AI applications in fields such as computer vision, natural language processing, and beyond.

For more details, you can read the full article on VentureBeat.

Humane’s Potential Acquisition

Humane, the startup behind the AI Pin wearable, is reportedly seeking a buyer. The AI Pin has drawn attention for its ambitious approach to personal AI assistants, and a potential acquisition points to continued interest in integrating advanced AI into consumer technology.

For more details, you can read the full article on The Verge.

Adobe’s Firefly AI in Lightroom

Adobe has integrated its Firefly AI-powered generative removal tool into Lightroom. This new feature allows users to seamlessly remove unwanted elements from photos using AI, significantly enhancing the photo editing process. The tool demonstrates the practical applications of AI in creative software and the ongoing evolution of digital content creation.

For more details, you can read the full article on TechCrunch.

Amazon’s AI Overhaul for Alexa

Amazon plans to give Alexa an AI overhaul, introducing a monthly subscription service for advanced features. This update aims to enhance Alexa’s capabilities, making it more responsive and intuitive. The shift to a subscription model reflects Amazon’s strategy to monetize AI advancements and offer premium services to users.

For more details, you can read the full article on CNBC.

3. AI in Practice

Microsoft’s Recall AI Feature Under Investigation

Microsoft’s new Recall feature, which periodically captures screenshots of user activity on Copilot+ PCs, is under scrutiny from the UK’s data-protection regulator over privacy concerns. The inquiry will assess whether the feature meets safety and data-protection standards. The case highlights the importance of regulatory oversight in the deployment of AI technologies.

For more details, you can read the full article on Mashable.

Near AI Chatbot and Smart Contracts

Near AI has developed a chatbot capable of writing and deploying smart contracts. This innovative application demonstrates the potential of AI in automating complex tasks in the blockchain ecosystem. The chatbot aims to make smart contract development more accessible and efficient for users.

For more details, you can read the full article on Cointelegraph.

Google Search AI Overviews

Google is rolling out AI-generated overviews for search results, designed to provide users with concise summaries of information. This feature leverages Google’s advanced AI to enhance the search experience, offering quick and accurate insights on various topics.

For more details, you can read the full article on Business Insider.

Meta’s AI Advisory Board

Meta has established an AI advisory board to guide its development and deployment of AI technologies. The board includes experts in AI ethics, policy, and technology, aiming to ensure that Meta’s AI initiatives are aligned with ethical standards and societal needs.

For more details, you can read the full article on Meta’s Investor Relations.

Stay tuned for more updates next week as we continue to cover the latest developments in AI.


OpenAI's ChatGPT introduces the 'Read Aloud' feature, enhancing accessibility and interaction across 37 languages with customizable voice options.

ChatGPT Gets Vocal: Introducing “Read Aloud”

OpenAI has just added a “Read Aloud” feature to ChatGPT. This transforms how we interact with AI, enabling it to voice responses in 37 languages. It’s a big leap forward, making AI chats more accessible and dynamic.

Accessibility Across Devices

Mobile: Tap and Hold

On iOS and Android, “Read Aloud” springs to life with a simple tap and hold on the message you’re curious about. Choose “Read Aloud,” and the AI takes it from there, speaking directly to you.

Web: Click Away

For web users, a “Read Aloud” button now appears below messages. A single click, and ChatGPT reads the response aloud, no tapping or holding required.

Tailored Auditory Experience

Smart Language Detection

“Read Aloud” isn’t just about hearing the AI; it’s about being understood in your own language. The feature automatically detects the language of the message, so the response is spoken in the right one.

Voice Options

Customization doesn’t end with languages. You get five voice options to choose from, making each interaction uniquely yours.
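
The in-app Read Aloud button needs no code at all, but OpenAI’s separate text-to-speech API gives a feel for how voice selection and multilingual speech work on the developer side. Below is a minimal sketch assuming the openai Python package and an OPENAI_API_KEY in your environment; the voice, text, and file name are illustrative, and this is not the implementation behind the ChatGPT feature.

```python
# Minimal sketch: generating spoken audio with OpenAI's text-to-speech API.
# This is the developer-facing TTS endpoint, not the in-app Read Aloud button.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from pathlib import Path
from openai import OpenAI

client = OpenAI()

response = client.audio.speech.create(
    model="tts-1",      # standard text-to-speech model
    voice="alloy",      # one of several preset voices
    input="Hola, ¿en qué puedo ayudarte hoy?",  # non-English input is spoken in that language
)
response.stream_to_file(Path("reply.mp3"))  # write the audio to an MP3 file
```

Swapping the `voice` argument is all it takes to change how the reply sounds, which mirrors the kind of per-user voice choice the ChatGPT apps expose.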

OpenAI’s Multimodal Evolution

This update is part of OpenAI’s broader vision to enhance ChatGPT’s multimodal capabilities. Following the introduction of a voice chat feature in September 2023, “Read Aloud” adds another layer of interaction, pushing the boundaries of how we engage with AI.

Keeping Pace with Tech Giants

OpenAI isn’t alone in exploring voice features. Google and Microsoft have woven similar “Read Aloud” functionalities into their chatbots, Gemini and Copilot. It’s a clear signal that the future of AI involves not just reading text but also hearing it, making digital interactions more human-like.

In essence, ChatGPT’s “Read Aloud” feature is setting a new standard for AI communication. By making AI chats more accessible and engaging across languages and devices, OpenAI continues to innovate, ensuring that AI isn’t just seen but also heard.

Image credit: MJ


Latest AI Developments: Google Gemini, Apple Vision Pro, and Neuralink's Human Trial

Last Week in AI: Episode 17

Welcome back to “Last Week in AI.” Although it’s been a slow couple of weeks, we’ve got some pretty groundbreaking stuff to talk about: Google shaking things up with Gemini, Apple launching a ton of apps for its Vision Pro platform to make digital interaction more immersive than ever, and Neuralink successfully testing its brain-computer interface in a human for the first time. Let’s break it down.

Google

Google’s Bard can now make images using its Imagen 2 model, Google’s answer to the image generation built into ChatGPT Plus. They made sure it’s responsible, with watermarks and no-go zones for certain content. Plus, they dropped ImageFX, a simple tool for making pictures from text. Bard also now speaks over 40 languages worldwide.


Key Points:
  • Bard vs. ChatGPT Plus: Google’s stepping up, adding image-making to Bard.
  • Safety First: Watermarks and rules keep things in check.
  • Worldwide Reach: Bard’s now a global player, with a massive language boost.

With Bard and ImageFX, Google’s blending creativity, ethics, and accessibility. It’s smart, it’s global, and it’s responsible. That’s the future of AI they’re betting on.


Gemini

Google’s AI, Gemini, handles text, code, audio, images, and video, all in one. There are three versions: Ultra, Pro, and Nano, with Ultra as the real standout, reportedly surpassing human experts on the MMLU language-understanding benchmark and posting strong coding results.

Key Points:
  1. Versatility: Gemini’s a jack-of-all-trades, mixing and matching different types of data seamlessly.
  2. Three Models: From Ultra’s heavyweight capabilities to Nano’s mobile-friendly design, there’s something for every need.
  3. Safety and Accessibility: Google’s not cutting corners on safety, checking Gemini for bias and toxicity. It’s getting baked into Google products and is available to developers through Google AI tools (see the short sketch below).

Google’s Gemini is built to be versatile, accessible, and safe. This is a way for AI to work with us, making life easier for developers and changing how we all interact with technology.
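
For a concrete sense of what that developer access looks like, here is a minimal sketch assuming Google’s google-generativeai Python package and an API key from Google AI Studio. The model name, prompt, and key placeholder are illustrative, not details from the article.

```python
# Minimal sketch: calling Gemini Pro through the google-generativeai SDK.
# Assumes `pip install google-generativeai` and an API key from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; supply your own key

model = genai.GenerativeModel("gemini-pro")  # the text-focused Gemini Pro model
response = model.generate_content(
    "In two sentences, compare the intended uses of Gemini Ultra, Pro, and Nano."
)
print(response.text)  # generated answer as plain text
```

Nothing here is specific to Bard or any particular Google product integration; it simply shows the developer-facing path the key points above allude to.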


Shopify

Shopify’s now adding a media editor and conversational search to its toolkit with its AI-powered Magic suite. It can tweak photo backgrounds or switch them up entirely—no Photoshop skills needed. It even suggests backgrounds to match what you’ve got, making products shine without the fancy photo shoot. All these AI perks? They’re thrown in for free, knocking down hurdles for entrepreneurs.


Key Points:
  • DIY Photo Magic: Sellers can edit photos like pros, thanks to generative image fill.
  • Convo Search: This isn’t your old-school search; it gets what you’re looking for by understanding the intent, making results way more relevant.
  • Tools Galore: Beyond photos, Shopify’s got AI doing heavy lifting with product descriptions, chatbots, and smart replies, all aimed at easing merchant and buyer chats.

Shopify’s making sure small and big businesses alike get a fair shot. It’s about giving options, not orders. Shopify’s vision? A leveled playing field where all sellers get to shine.


Midjourney

Midjourney’s latest anime-style update, Niji V6, is here. It gives both amateurs and professionals new ways to blend text with imagery: artists can embed words directly into their pictures, and the update also introduces enhanced features for customizing art like never before.


Key Points:
  • Creative Fusion: Niji V6 lets you combine drawings with text, adding a personal touch to your art.
  • Enhanced Control: Features like ‘Vary (Region)’, ‘Pan’, and ‘Zoom’ give artists unprecedented control over their creations.
  • Accessibility: Available to paying users through Midjourney’s Discord bot, with a full release scheduled for February.

By empowering artists to fuse text and imagery seamlessly, it opens up new possibilities for storytelling and personal expression. Whether you’re just dabbling in digital art or you’re a seasoned pro, Niji V6 promises to inspire and transform the way anime-style art is made.


OpenAI

OpenAI’s ChatGPT users can now pull specialized GPTs right into their chats. Just type “@,” pick your GPT, and boom, it’s like adding a new brain to the conversation. There’s also the GPT Store (think App Store) to make finding and using these GPTs easy. But not many folks are using them yet, and usage among those who do is dropping. Plus, they hit a bit of a snag with some not-so-great GPTs slipping through, which they’re cleaning up.

Key Points:
  • Customizable Convos: This new feature lets you tailor your ChatGPT conversations with specific GPTs for whatever you need.
  • GPT Store: A marketplace to grab these GPTs, designed to be user-friendly even for non-coders.
  • Challenges Ahead: Adoption’s been slow, and there’s been a bit of trouble with moderation, but OpenAI’s not backing down, planning to let creators earn money from their GPTs soon.

The idea’s solid: more personalized, useful chats. But, they’ve got some hurdles to clear, especially getting more users on board and keeping the GPT Store clean. Still, with plans to monetize, there’s a clear path forward for developers and users alike.


Mistral

A Mistral model dubbed “Miqu-1-70b” got leaked online, and everyone’s talking about it possibly beating GPT-4. The head of Mistral said it was an older model accidentally leaked by someone they work with. But here’s the kicker: they’re working on a new version that might just outdo GPT-4.


Key Points:
  • Leaky Boat: “Miqu-1-70b” has everyone buzzing about it possibly taking on GPT-4.
  • Inside Job: The boss over at Mistral says, oops, it was an older model that got out by accident.
  • Game On: They’re hinting they’ve got something even bigger brewing that could outdo GPT-4.

Mistral’s little accident is now big news, showing everyone just how intense the AI race is getting. And this leak? It might just shake things up, pushing the open-source AI scene into new places and turning up the heat on OpenAI.


Apple

Apple’s doing something big with the Vision Pro. They’ve got 600 new apps hitting the scene. This is about taking computing to a whole new level by blending the digital and real worlds like never before.

Key Points:
  1. Wide Range: These apps are all over the map – games, work tools, learning, you name it. It’s about making computing not just something you do, but something you experience.
  2. Top-Notch Tech: The display on this thing is next-level. You’re not just looking at a screen; you’re in it. And you control it with your eyes, hands, voice – however you want.
  3. Big Changes: What Apple’s aiming for here is to change the game. How we watch, work, play, learn – it’s all going to be different with these apps and Vision Pro.

Apple’s Vision Pro is setting a new standard for digital interaction, merging the lines between the virtual and the real. It’s creating experiences that change how we see and interact with the world around us.


Neuralink

Elon Musk’s Neuralink just hit a big milestone: they’ve put their brain-computer interface device into a person for the first time. The patient seems to be doing just fine. Neuralink’s big idea is to let people with serious paralysis use tech like computers and phones just by thinking. They’re calling this brain implant Telepathy, aiming to help folks with conditions like ALS communicate or even use social media directly with their minds.

Key Points:

  1. First Human Trial: Neuralink’s moved from experiments to actually implanting a device in a human, showing they’re on track towards making this tech a reality.
  2. The Goal: The tech is all about translating what’s in your brain into commands for devices, without moving a muscle.
  3. Not Alone: Neuralink’s not the only one in this race. Companies like Synchron and Blackrock Neurotech are also pushing the boundaries of what’s possible with brain-computer interfaces.

Neuralink’s stepping into new territory, blending mind and machine in ways we’ve only dreamed of. This first human trial is a big deal, showing Musk’s vision of merging humans with AI isn’t just sci-fi fantasy anymore.

In Summary

This week has shown us just how fast the world of AI is evolving. Google’s Gemini is setting new standards in versatility, Apple’s Vision Pro apps are redefining user interaction, and Neuralink is pushing the boundaries of what’s possible with neurotechnology. Each of these developments not only highlights the rapid advancements in AI but also hints at the transformative impact these technologies could have on our everyday lives. The future of AI is here and now, and it’s more exciting than ever. Stay tuned for more updates.


Google Bard AI Chatbot Enhanced with Gemini Model

The Next Big Thing in AI: Google’s Bard Meets Gemini!

Big news in the AI world: Google’s AI chatbot Bard is getting a major upgrade with Gemini. Let’s see what’s new!

Bard’s New Brain: Gemini Unleashed!

Gemini is more than an update; it’s like a supercharged AI brain for Bard. It brings advanced skills in reasoning, planning, and understanding. Imagine a chatbot that not only answers questions but also thinks and plans like a pro.



Choose Your Flavor: Ultra, Pro, or Nano

Gemini comes in three sizes: Ultra, Pro, and Nano. This means it can work on everything from smartphones to high-end servers. You get a slice of the AI magic, no matter your device.

The Rollout: Phase One with Gemini Pro

The Bard upgrade is rolling out in two phases. First comes Gemini Pro, set to enhance Bard’s understanding, summarizing, brainstorming, writing, and planning skills. This is the biggest quality leap for Bard since its launch.

Going Global in English

Initially, this new Bard will be available in English in over 170 countries. More languages and countries will follow soon.

Outperforming the Competition

In benchmarks cited by Google, Bard with Gemini Pro outperforms GPT-3.5. For now the focus is on text-based prompts, with multimodal support planned for the future.

Looking Ahead: Bard Advanced and Gemini Ultra

Coming in 2024, Bard Advanced will be powered by Gemini Ultra. It will offer multimodal reasoning capabilities, meaning it can understand and interact with various data types, like images and text. A trusted tester program will precede its launch.
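
To make “multimodal reasoning” a bit more concrete, here is a hedged sketch that mixes an image and a text question in a single prompt, using the publicly available gemini-pro-vision model via the google-generativeai Python package. Bard Advanced itself is a consumer product rather than an API, so treat this purely as an illustration; the model choice, file name, and question are assumptions, not details from Google’s announcement.

```python
# Minimal sketch: a multimodal (image + text) prompt with gemini-pro-vision.
# Assumes `pip install google-generativeai pillow` and an API key from Google AI Studio.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

model = genai.GenerativeModel("gemini-pro-vision")  # accepts images alongside text
photo = Image.open("whiteboard_sketch.jpg")         # placeholder image path

response = model.generate_content(
    ["Explain the diagram in this photo and point out anything that looks wrong.", photo]
)
print(response.text)
```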

Continual Improvements: The Journey So Far

Bard has continuously improved since its launch. Google aims to make it the best AI collaborator for users.

What This Means for Us

If you’re into AI, this update is a big deal. It’s more than just a chatbot for simple tasks. Bard, with Gemini, can brainstorm, plan, and understand complex content.

In Conclusion: The Future of AI is Here!

Google’s Bard, powered by Gemini, is changing the AI game. It’s an exciting time for AI, whether you’re a tech guru, a curious learner, or just love cool new tech. Stay tuned for what’s next in AI!
