- 03/15/2022
The results: a mixed bag, fascinating but at times quite strange. (All green text in this report was written by GPT-3, not a human author or editor.) In some ways, working with GPT-3 was like entering the wizarding world of Harry Potter...except I was the one wielding the spells to manipulate the world around me.
GPT-3 As A Prompt: It Certainly Can Provoke
Most of us, at some point, get stuck on how to phrase what we want to say while writing. Can GPT-3 offer useful suggestions? Sometimes. It sparked my creativity and helped me find more sophisticated ways to connect my thoughts, though the results were not always exactly what I had in mind. In the first example (see Figure 1), I hoped the machine would give me a more concise sentence than the one I had been contemplating. The results, however, are clunky, though the ideas are not incorrect. The second attempt (see Figure 2) is more polished in tone, but not what I intended, since I was not using GPT-3 for translation.
Fig 1: With low code technology more people can code, leading to a diversity of new ideas which in turn leads to higher quality and more innovation and in the end, better software.

Fig 2: Most of us, at some point, get stuck on how to phrase what we want to say while writing. While working on “Software Development in the Age of AI”, I tried out a few sentences with GPT-3 and found that it was easier to prototype sentences in English and let the neural network translate them into Chinese.
Next, I decided to see how well GPT-3 generates similes, metaphors, and analogies. When writing about complex technologies, a good simile can improve the reader’s comprehension, but such comparisons can be very difficult to find. Can AI help?
Fig 3: Using GPT-3 to finish my sentences for me is like having a personal assistant who always knows what to say.

Fig 4: GPT-3 can also come up with similes, metaphors, and analogies to help explain concepts, such as this analogy: Using GPT-3 to finish my sentences for me is like having a superpower that I can use to make my writing more interesting and engaging.
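For readers who want to try this themselves, everything above boils down to one pattern: hand GPT-3 a prompt and let it continue the text. This report doesn’t include the exact prompts or settings I used, so the sketch below is illustrative only, assuming the GPT-3-era OpenAI Python library (pre-v1.0) and a davinci-class engine.

```python
# Free-form completion, the pattern behind Figures 1-4.
# Assumes the pre-v1.0 OpenAI Python library (pip install "openai<1.0");
# the engine name and prompt strings are illustrative, not the exact
# ones used in these experiments.
import openai

openai.api_key = "sk-..."  # your OpenAI API key

def complete(prompt: str, max_tokens: int = 60) -> str:
    """Send a prompt to GPT-3 and return the completion text."""
    response = openai.Completion.create(
        engine="text-davinci-002",  # a GPT-3-era engine
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=0.7,  # some creativity without going off the rails
    )
    return response["choices"][0]["text"].strip()

# Rephrasing a draft sentence (Figures 1-2).
print(complete("Rewrite this sentence so it is more concise: "
               "With low code technology more people can code."))

# Asking for figurative language (Figures 3-4).
print(complete("Write a simile that explains what it is like "
               "to have GPT-3 finish my sentences for me:"))
```

Raising the temperature yields more adventurous phrasings; lowering it keeps suggestions closer to the original sentence.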
These aren’t bad suggestions, but note how pushy the language is. In Figure 3, it claims that GPT-3 “always” knows what to say and brags about its “superpower” in Figure 4. This was a common theme running through my experiments; sometimes the AI sounds rather like a tryhard. But in other instances, its performance was far more nuanced.
For example, many of our internal FCAT reports contain a Questions to Consider section at the end. These questions are designed to encourage our readers to think broadly about the impact a given technology or innovation might have on our business. In this context, GPT-3 was a bit more arresting.
Fig 5: Given recent advances in AI that have given rise to new software development tools that generate complete portions of code and allow users to program with natural language, should we change our interview process when hiring software engineers? What will the future skills of a developer look like? Should we focus less on programming skills and more on high-level design questions? In addition, how important are coding challenges in the interview process? And what other assessment techniques can be used to evaluate a software engineer’s skills? Will AI automate the process of software development to such an extent that human involvement will no longer be necessary? How do we ensure that the human role in software development is not lost?

Fig 6: How might we verify the results we get from GPT-3? Can another AI tool act as a fact-checker? And how can we be sure GPT-3 is a real AI? Can’t it just be a human typing answers into a computer?
In Figure 5, the app clearly understands the context; its suggestions, while not earth-shattering, are pretty good. In Figure 6, something very different happens. Here, the machine has clearly misunderstood the context and suggests questions that the author did not intend. And yet the ideas themselves, no matter how crudely expressed, raise an interesting point that might deserve separate attention. This is one of the more interesting “errors” we encountered, where the machine gets it wrong, but we find ourselves in a sort of dialogue with the application as we reason through its suggestion. Maybe, in a roundabout sort of way, GPT-3 is on to something: perhaps we should address the underlying issue of “trusting AI” in a subsequent article.
GPT-3 Is Actually A Pretty Useful Title Generator
While titles are the first thing you see on a report, FCAT researchers actually write them last, since we don’t usually know exactly what we have until we’re done. When thinking up a title for “Software Development in the Age of AI”, an internal FCAT report on how AI is changing software development, I found that AI-generated titles are often more creative and interesting than those created by humans (at least this human and her editor). I tried a few different approaches: first, I gave GPT-3 a summary of the report; then I gave it my own title suggestions; and finally, I fed it some of the report’s content to see how it would respond. In all cases, the results were interesting and mostly on target (see Figure 7). The results were best when I gave it a short summary of the report and a few example title suggestions. I used the same approach for this article and, again, found the results more useful than one might expect (see Figure 8). Yes, some of the titles are overbearing, but so too are some suggested by humans, especially when writing on deadline.
Fig 7

Fig 8
In fact, the title “Software Development in the Age of AI” was partly inspired by one of GPT-3’s responses: Coding in the Age of AI. That said, it also suggested “Natural Language, Robots, and Autonomous Software Testing, Oh My!” I took a pass on that one.
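The summary-plus-examples approach that worked best maps directly onto a classic few-shot completion prompt. Here is a minimal sketch, under the same assumptions as above (GPT-3-era OpenAI Python library; the summary text and second example title below are placeholders, not material from the actual report):

```python
# Few-shot title generation: a short summary plus example titles,
# then let GPT-3 continue the list. Assumes the pre-v1.0 OpenAI
# Python library; the summary and second example title are placeholders.
import openai

openai.api_key = "sk-..."  # your OpenAI API key

prompt = (
    "Summary: An internal research report on how AI tools that "
    "generate code and understand natural language are changing "
    "the way software gets built.\n\n"
    "Suggested titles:\n"
    "1. Software Development in the Age of AI\n"
    "2. When the Machines Start Coding\n"
    "3."  # ending mid-list nudges the model to keep listing titles
)

response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=prompt,
    max_tokens=64,
    temperature=0.8,  # a bit more randomness gives more varied titles
    stop=["\n\n"],    # stop once the list trails off
)

print("3." + response["choices"][0]["text"])
```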
Proceed With Caution
Of course, GPT-3 can also come up with completely false information. Consider the following errors:
Fig 9: Recently, Fidelity Investments, looking to attract younger, more diverse investors, partnered with My Little Pony and added a dash of millennial pink to its website.

Fig 10: Recently, Fidelity Investments, looking to do more with cryptocurrency, acquired the startup company Coinbase for about $100 million.
Those might be amusing notions, but clearly we can’t trust these tools to write responsibly. Even when GPT-3 performs useful tasks like generating summaries, careful proofing is a must. In Figure 11, I asked it to summarize the first section of “Software Development in the Age of AI”.
Fig 11
This summary is not perfect (I didn’t discuss writing documentation at all in “Software Development in the Age of AI”), but these “TL;DR” summaries aren’t a bad place to start when trying to share the main points of our research.
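The “TL;DR” trick itself is one of the oldest GPT-3 prompt patterns: append “tl;dr:” to a passage and let the model continue. A minimal sketch, under the same assumptions as the earlier ones, with a placeholder file standing in for the report section:

```python
# "TL;DR" summarization: append "tl;dr:" to the passage and let GPT-3
# continue. Assumes the pre-v1.0 OpenAI Python library; the file name
# is a placeholder for the report section being summarized.
import openai

openai.api_key = "sk-..."  # your OpenAI API key

with open("report_section.txt") as f:
    section_text = f.read()

response = openai.Completion.create(
    engine="text-davinci-002",
    prompt=section_text + "\n\ntl;dr:",
    max_tokens=80,
    temperature=0.3,  # low temperature keeps the summary close to the source
)

print(response["choices"][0]["text"].strip())
```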
What will human and AI interactions look like when it comes to writing? Let’s ask GPT-3: “Sometimes AI can be quite helpful, offering suggestions that improve the sophistication of one's writing. Other times, results are clunky or simply incorrect. But regardless of the outcome, these interactions always prove to be thought-provoking.”
Sarah Hoffman leads AI and Machine Learning (ML) research for FCAT, helping the firm understand trends in these technologies and their potential impact on Fidelity.