AI and You: Microsoft’s Copilot Moves, NYT-OpenAI Debate Fair Use, GPT Store Opens

G.F.A.L.O.E.

2 years ago

We’re just half a month into the new year, but predictions that 2024 will be remembered as boom times for generative AI seem to be coming true already.

Microsoft got things rolling on Jan. 4 by announcing the biggest change to its keyboard design in nearly 30 years, adding a button that will give people direct access to its AI Copilot tool on new Windows 11 computers starting this month. Makes sense given that Microsoft has invested $13 billion in OpenAI, the maker of ChatGPT and the large language model that powers the Copilot service.

CNET’s Sareena Dayaram called the new keyboard button «a bold bid for AI dominance,» explaining how it will serve «as a physical portal to its Copilot service, which helps people perform tasks like summarizing documents, recommending music and answering questions you might ask a search engine or AI chatbot.»

For its part, Microsoft said its goal is to make gen AI a part of everyday life, which doesn’t seem that far-fetched given that Windows is the most popular computer operating system and there are over 1 billion people using Windows today. On Jan. 15, the company announced new subscription services for Copilot, which Microsoft says has been part of more than 5 billion chats and has created over 5 billion images so far. The consumer Copilot Pro is $20 a month (the same pricing as ChatGPT Plus).

«AI will be seamlessly woven into Windows from the system, to the silicon, to the hardware,» Yusuf Mehdi, Microsoft’s consumer marketing chief, wrote in a post announcing the Copilot key. «This will not only simplify people’s computing experience but also amplify it, making 2024 the year of the AI PC.»

It’s not just PCs that are getting an AI boost. Last week at CES, the world’s largest consumer electronics show, companies including Volkswagen, Intel, McDonald’s, L’Oreal and LG showcased AI-branded products and services. (You can find CNET’s complete coverage of CES here.) According to the Consumer Technology Association, which runs CES, over 230 million smartphones and PCs sold in the US this year will «tap the powers of generative AI» in some way.

«You don’t want to show up at the costume party in plain clothes, right?» Dipanjan Chatterjee, a principal analyst at Forrester, told CNET about the AI tagline being added to what seemed like every gadget and new service at CES. «Everyone’s going to be there saying AI. You’re probably going to look like a fool if you don’t.»

One of the more interesting AI announcements out of CES was Volkswagen’s news that it’s adding gen AI tech, including ChatGPT, to some of its car models in North America and Europe so you can talk to your car (visions of Knight Rider, anyone?). To be delivered to new and existing cars on the road through an over-the-air software update starting in the second quarter of 2024, the AI software will expand the capabilities of Volkswagen’s IDA voice assistant beyond handling simple tasks, like initiating a call, to automatically turning up the heat after you ask IDA «to warm up the driver’s side.» And it will be able to answer thousands of questions beyond giving you driving directions and destination info, offering all kinds of advice, including how to rekindle your love life, notes CNET’s Stephen Shankland.

Why you should get on the chatbot bandwagon sooner rather than later

If you’ve read this far and are still unsure what the gen AI fuss is all about, don’t worry — I got you. Despite all the noise around AI, most Americans (82%) haven’t even tried ChatGPT, and over half say they’re more concerned than excited by the increased use of AI in their daily life, according to the Pew Research Center.

Still, chatbots are literally changing the conversation around the future of work, education and how we may go about day-to-day tasks. So becoming comfortable with chatbots should be on your 2024 to-do list.

To help with that, I wrote an expansive, consumer-friendly overview of chatbots as January’s cover story for CNET. I included practical tips about how to start working with tools like ChatGPT and beyond, and I talked to experts about which jobs will and won’t be affected by the gen AI tsunami (TL;DR: pretty much everything), about the issues that you need to be aware of when working with these tools (including privacy, security and copyright), and about the use cases, ethical use cases that is, that we all should be experimenting with as soon as possible.

I encourage you to read it if you want to know what I’ve learned after a year looking into all things gen AI. In the meantime, here are a few takeaways:

Natural language: The new generation of chatbots — including ChatGPT, Google Bard, Microsoft Bing, Character.ai and Claude.ai — are based on a large language model, or LLM, a type of AI neural network that uses deep learning (it tries to simulate the human brain) to work with an enormous set of data to perform a variety of natural language processing tasks. What does that mean? They can understand, summarize, predict and generate new content in a way that’s easily accessible to everyone. Instead of needing to know programming code to speak to a gen AI chatbot, you can ask questions (known as «prompts» in AI lingo) using plain English.

Gen AI is a general purpose technology: Generative AI’s ability to have that natural language collaboration with humans puts it in a special class of technology — what researchers and economists call a general-purpose technology. That is, something that «can affect an entire economy, usually at a national or global level,» Wikipedia explains. «GPTs have the potential to drastically alter societies through their impact on pre-existing economic and social structures.» Other such GPTs include electricity, the steam engine and the internet — things that become fundamental to society because they can affect the quality of life for everyone. (That GPT is different, by the way, from the one in ChatGPT, which stands for «generative pretrained transformer.»)

Mass market phenomenon: If hitting a million users is a key milestone for turning an untested tech service into a mainstream destination, think about this: It took Netflix three and a half years to reach 1 million users after launching in 1999, Facebook 10 months and Instagram three months in 2010. ChatGPT, which debuted on Nov. 30, 2022, reached 1 million users in five days. Yep, just five days.

The AI effect on jobs: There’s been a lot of talk about the future of work and how jobs may fare due to the expected productivity and profit boost AI and automated tech should help deliver. There’s good news and bad news on the jobs front. The bad news: As many as 40% of roles could be affected by the new tech, which means reskilling, retraining and redoing job descriptions to incorporate how AI will change the nature of jobs needs to happen now.

What should today’s and tomorrow’s workers do? The experts agree: Get comfortable with AI chatbots if you want to remain attractive to employers. The good news: According to Goldman Sachs, new tech has historically ushered in new kinds of jobs. In a widely cited March 2023 report, the firm noted that 60% of today’s workers are employed in occupations that didn’t exist in 1940. Still, Goldman and others, including the International Monetary Fund, said AI will lead to significant disruption in the workforce.

Among the new occupations we’re already seeing is prompt engineering. That refers to someone able to effectively «talk» to chatbots because they know how to ask questions to get a satisfying result. Prompt engineers don’t necessarily need to be technical engineers but rather people with problem-solving, critical thinking and communication skills. (Liberal arts majors — your time has come!) Job listings for prompt engineers showed salaries of $300,000 or more in 2023.

Think of it the way that Andrew McAfee, a principal research scientist at the MIT Sloan School of Management, described it to me. «When the pocket calculator came out, a lot of people thought that their jobs were going to be in danger because they calculated for a living,» he said. «It turns out we still need a lot of analysts and engineers and scientists and accountants — people who work with numbers. If they’re not working with a calculator or by now a spreadsheet, they’re really not going to be very employable anymore.»

Jobs most worried about AI

There are many reports citing which careers will be most affected by AI, including one from Pew Research that said the roles with the highest exposure include budget analysts, data entry keyers, tax preparers, technical writers and web developers. Indeed.com in September looked at 55 million job postings and more than 2,600 skills to determine which jobs and skills had low, moderate and high exposure to gen AI disruption and offered some words of optimism to us humans. «The human element required in many critical job skills — including empathy, intuition, and manual dexterity — remains irreplaceable. Gen AI, while adept at processing data and executing specific tasks, lacks the innate human qualities that define various roles, especially those centered around manual work, human interactions and decision-making based on nuanced understanding.»

In a survey by software developer DevRev, lawyers, artists, accountants, doctors and data scientists in the US, in that order, expressed the most concerns with how gen AI could affect their work. Meanwhile, the UK Department for Education cited white-collar jobs as being the most disrupted by gen AI, with telephone salespeople, lawyers, psychologists and some teachers topping its list, according to ZDNet.

Which raises the question: Who’s less at risk? Anyone whose job requires manual skills, such as vets and nurses, and those who work on projects outdoors, including builders and gardeners.

OpenAI opens store with 3 million custom versions of ChatGPT

Making good on its promise in November to give creators — no programming skills required — a way to create customized tools based off its popular ChatGPT chatbot, OpenAI last week opened the GPT Store. The company said that over 3 million GPTs have been created, and that it’s still figuring out a revenue plan to be able to pay «GPT builders.»

«As a first step, US builders will be paid based on user engagement with their GPTs,» OpenAI said. «We’ll provide details on the criteria for payments as we get closer.»

Among the GPTs featured are an AllTrails tool offering personal trail recommendations for hikes, runs or rides, a Khan Academy programming tutor, a Canva design tool, a book recommender called Books and an AI scientist called Scholar AI that lets you «generate new hypotheses» and analyze text, figures and tables from over 200 million scientific papers and books.

Anyone who subscribes to OpenAI’s $20-per-month ChatGPT Plus subscription can run the GPTs and create their own GPTs, reports CNET’s Stephen Shankland.

«The GPT Store is designed to promote and categorize GPTs, making it easier to find what you’re looking for or discover what you didn’t even know you wanted,» Shankland said. Other examples of GPTs available now include a fitness trainer, the Laundry Buddy washing label decoder, a music theory instructor, a coloring book picture generator, a haiku writer and Pearl for Pets for vet advice.

Washing label decoder? I like how people’s minds work. Let me know if you’ve got a favorite.

Copyrighted content, training data and fair use

How much of your copyrighted content can an AI large language model co-opt for training purposes?

That’s at the heart of The New York Times’ Dec. 27 lawsuit against OpenAI and Microsoft, with the paper noting that the maker of ChatGPT had used its intellectual property in the form of «millions» of unique, high-value and copyrighted articles to train the chatbot without the NYT’s permission or compensation. The suit comes after discussions that started in April between the NYT and both OpenAI and Microsoft, a top investor in the San Francisco-based startup, failed to reach an «amicable resolution» possibly involving a commercial agreement and «technological guardrails» around generative AI products, the NYT said. The paper said it isn’t asking for a specific amount of money in compensation but argues that OpenAI should be held responsible for «billions of dollars in statutory and actual damages.»

«Defendants seek to free-ride on The Times’s massive investment in its journalism,» the NYT wrote in its complaint. OpenAI and Microsoft, it argued, are «using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.»

OpenAI, which has raised billions in funding and has a valuation of more than $80 billion, said in a Jan. 8 blog post that the NYT’s suit is «without merit.» OpenAI says that training with copyrighted material falls under the category of fair use, and that if publishers don’t want their content co-opted for training purposes, they can opt out of the scraping process. Says OpenAI, «The negotiations focused on a high-value partnership around real-time display with attribution in ChatGPT, in which The New York Times would gain a new way to connect with their existing and new readers, and our users would gain access to their reporting. We had explained to The New York Times that, like any single source, their content didn’t meaningfully contribute to the training of our existing models and also wouldn’t be sufficiently impactful for future training.»

The NYT isn’t the only copyright holder taking on gen AI companies over copyright infringement, notes the Associated Press. The Authors Guild, which represents authors such as John Grisham, George R.R. Martin, Jodi Picoult and Scott Turow, filed suit against OpenAI in September and amended its complaint in December. «The lawsuit cites specific ChatGPT searches for each author, such as one for Martin that alleges the program generated ‘an infringing, unauthorized, and detailed outline for a prequel’ to A Game of Thrones that was titled A Dawn of Direwolves and used ‘the same characters from Martin’s existing books in the series A Song of Ice and Fire,’» the AP reported.

A third lawsuit was filed by nonfiction writers, «including an author of the Pulitzer Prize-winning biography on which the hit movie Oppenheimer was based,» the AP added.

And when it comes to images and AI, there are suits brought by artists who claim the big AI text-to-image generators, including Stability AI and Midjourney, are co-opting their copyrighted art, as well as reports that AI companies may be working to bypass copyrights.

As for how the NYT suit will play out (or won’t — the sides might reach a settlement), lawyers, as you would expect, say it could go either way. The NYT is attempting, through copyright law, to protect one of its most valuable assets, its content, while OpenAI is trying to figure out a path forward for AI that doesn’t stifle innovation.

AI term of the week: Training data

An LLM is only as good as the data it’s been trained on, which is why 2024 will see legal arguments around copyright issues and who should be compensated or not for the data used to make these systems smarter. Training data can be used to introduce bias and error into the AI system as well — which is why there’s a push for AI companies to be transparent about what training data has been used in their systems. (That transparency doesn’t exist today.)

So I thought it was worthwhile to share some of the ways «training data» is being defined today.

According to Coursera, training data is «the information or examples given to an AI system to enable it to learn, find patterns, and create new content.»

Venture capital firm A16z, which has invested in dozens of AI startups, defines training data as the «dataset used to train a machine learning model.»

But software developer Lark offers a more robust definition that also addresses the underlying concerns around training data: «Training data, in the context of AI and machine learning, refers to the datasets used to train AI models or algorithms. This data serves as the foundational material on which the AI system learns to perform specific tasks or make predictions. The quality and diversity of training data play a pivotal role in determining the accuracy, generalization, and effectiveness of AI models. Essentially, the training data serves as the information source that enables AI models to recognize patterns, make decisions, and improve their performance over time. In simple terms, it can be described as the educational material for AI systems, shaping their understanding and decision-making abilities.»

Editors’ note: CNET is using an AI engine to help create some stories. For more, see this post.