Technology
How Smart is ChatGPT?
Visualizing ChatGPT’s Performance in Human Exams
ChatGPT, a language model developed by OpenAI, has become incredibly popular over the past year due to its ability to generate human-like responses in a wide range of circumstances.
In fact, ChatGPT has become so competent that students are now using it to help with their homework. This has prompted several U.S. school districts to block devices on their networks from accessing the model.
So, how smart is ChatGPT?
In a technical report released on March 27, 2023, OpenAI provided a comprehensive brief on its most recent model, known as GPT-4. Included in this report was a set of exam results, which we’ve visualized in the graphic above.
GPT-4 vs. GPT-3.5
To benchmark the capabilities of ChatGPT, OpenAI simulated test runs of various professional and academic exams. These include the SAT, the bar examination, and various Advanced Placement (AP) finals.
Performance was measured in percentiles, which were based on the most recently available score distributions for test takers of each exam type.
Percentile scoring is a way of ranking one’s performance relative to the performance of others. For instance, if you placed in the 60th percentile on a test, this means that you scored higher than 60% of test-takers.
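To make the definition concrete, here is a minimal sketch of how a percentile rank can be computed. The helper name `percentile_rank` and the list of scores are hypothetical and purely illustrative; they are not taken from OpenAI’s report or from any real exam distribution.

```python
def percentile_rank(score, all_scores):
    """Return the percentage of scores that are strictly below `score`."""
    if not all_scores:
        raise ValueError("all_scores must not be empty")
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


# Made-up scores for ten hypothetical test-takers (not real exam data).
scores = [48, 52, 55, 61, 63, 67, 70, 74, 81, 90]

# Six of the ten scores are lower than 70, so this prints 60.0.
print(percentile_rank(70, scores))
```

A score of 70 in this toy distribution sits in the 60th percentile, matching the example in the text. Real exam percentiles may handle ties differently.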
The following table lists the results that we visualized in the graphic.
Category | Exam | GPT-4 Percentile | GPT-3.5 Percentile |
---|---|---|---|
Law | Uniform Bar Exam | 90 | 10 |
Law | LSAT | 88 | 40 |
SAT | Evidence-based Reading & Writing | 93 | 87 |
SAT | Math | 89 | 70 |
Graduate Record Examination (GRE) | Quantitative | 80 | 25 |
Graduate Record Examination (GRE) | Verbal | 99 | 63 |
Graduate Record Examination (GRE) | Writing | 54 | 54 |
Advanced Placement (AP) | Biology | 85 | 62 |
Advanced Placement (AP) | Calculus | 43 | 0 |
Advanced Placement (AP) | Chemistry | 71 | 22 |
Advanced Placement (AP) | Physics 2 | 66 | 30 |
Advanced Placement (AP) | Psychology | 83 | 83 |
Advanced Placement (AP) | Statistics | 85 | 40 |
Advanced Placement (AP) | English Language | 14 | 14 |
Advanced Placement (AP) | English Literature | 8 | 8 |
Competitive Programming | Codeforces Rating | <5 | <5 |
The scores reported above are for GPT-4 with visual inputs enabled. Please see OpenAI’s technical report for more comprehensive results.
As we can see, GPT-4 (released in March 2023) is much more capable than GPT-3.5 (released March 2022) in the majority of these exams. It was, however, unable to improve in AP English and in competitive programming.
Regarding AP English (and other exams where written responses were required), ChatGPT’s submissions were graded by “1-2 qualified third-party contractors with relevant work experience grading those essays”. While ChatGPT is certainly capable of producing adequate essays, it may have struggled to comprehend the exam’s prompts.
For competitive programming, GPT-4 attempted 10 Codeforces contests 100 times each. Codeforces hosts competitive programming contests where participants must solve complex problems. GPT-4’s average Codeforces rating is 392 (below the 5th percentile), while its highest on a single contest was around 1,300. According to the Codeforces ratings page, the top-rated user is jiangly from China, with a rating of 3,841.
What’s Changed With GPT-4?
Here are some areas where GPT-4 has improved the user experience over GPT-3.5.
Internet Access and Plugins
A limiting factor with GPT-3.5 was that it didn’t have access to the internet and was only trained on data up to June 2021.
With GPT-4, users will have access to various plugins that empower ChatGPT to access the internet, provide more up-to-date responses, and complete a wider range of tasks. This includes third-party plugins from services such as Expedia, which will enable ChatGPT to book an entire vacation for you.
Visual Inputs
While GPT-3.5 could only accept text inputs, GPT-4 has the ability to also analyze images. Users will be able to ask ChatGPT to describe a photo, analyze a chart, or even explain a meme.
Greater Context Length
Lastly, GPT-4 is able to handle much larger amounts of text and keep conversations going for longer. For reference, GPT-3.5 had a max request value of 4,096 tokens, which is equivalent to roughly 3,000 words. GPT-4 has two variants, one with 8,192 tokens (6,000 words) and another with 32,768 tokens (24,000 words).
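The token-to-word figures above can be approximated with a simple conversion. The sketch below assumes roughly 0.75 English words per token, a common rule of thumb rather than an exact value; the actual ratio varies with the text and language. The constant and function names are hypothetical.

```python
# Assumption: about 0.75 English words per token (rule of thumb, not exact).
WORDS_PER_TOKEN = 0.75


def approx_words(tokens, words_per_token=WORDS_PER_TOKEN):
    """Estimate how many words fit in a given number of tokens."""
    return round(tokens * words_per_token)


# Context lengths quoted in the article.
context_windows = {
    "GPT-3.5": 4096,
    "GPT-4 (8K)": 8192,
    "GPT-4 (32K)": 32768,
}

for name, tokens in context_windows.items():
    print(f"{name}: {tokens:,} tokens is roughly {approx_words(tokens):,} words")
```

Under this assumption the estimates come out slightly above the rounded word counts quoted in the text.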
Technology
Charting the Next Generation of Internet
In this graphic, Visual Capitalist has partnered with MSCI to explore the potential of satellite internet as the next generation of internet innovation.
Could Tomorrow’s Internet be Streamed from Space?
In 2023, 2.6 billion people could not access the internet. Today, companies worldwide are looking to innovative technology to bring more people online at modern connection speeds.
Could satellite internet provide the solution?
In collaboration with MSCI, we embarked on a journey to explore whether tomorrow’s internet could be streamed from space.
Satellite Internet’s Potential Customer Base
Millions of people live in rural communities or mobile homes, and many spend much of their lives at sea or have no fixed abode. As a result, they often cannot access the internet simply because the technology is unavailable where they are.
Satellite internet gives these communities access to the internet without requiring a fixed location. Consequently, the volume of people who could get online using satellite internet is significant:
Area | Potential Subscribers |
---|---|
Households Without Internet Access | 600,000,000 |
RVs | 11,000,000 |
Recreational Boats | 8,500,000 |
Ships | 100,000 |
Commercial Aircraft | 25,000 |
Advances in Satellite Technology
Satellite internet is not a new concept. However, only recently have roadblocks around cost and long turnaround times been overcome.
NASA’s space shuttle, until it was retired in 2011, was the only reusable means of transporting crew and cargo into orbit. It cost over $1.5 billion and took an average of 252 days to launch and refurbish.
In stark contrast, SpaceX’s Falcon 9 can now launch objects into orbit and be refurbished in a fraction of the time, at less than 1% of the space shuttle’s cost.
Rocket | Average Turnaround Time | Average Launch/Refurbishment Cost |
---|---|---|
Falcon 9* | 21 days | < $1,000,000 |
Space Shuttle | 252 days | $1,500,000,000 (approximately) |
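As a quick check of the comparison, the ratios can be worked out directly from the table. This is a rough sketch: the Falcon 9 cost is entered at its stated upper bound of $1,000,000, and the shuttle cost at the approximate $1.5 billion figure, so the results are only indicative.

```python
# Figures from the table above. The Falcon 9 cost is an upper bound ("< $1,000,000").
falcon9 = {"turnaround_days": 21, "cost_usd": 1_000_000}
shuttle = {"turnaround_days": 252, "cost_usd": 1_500_000_000}

# Share of the shuttle's cost, and how many times faster the turnaround is.
cost_share = falcon9["cost_usd"] / shuttle["cost_usd"]
speedup = shuttle["turnaround_days"] / falcon9["turnaround_days"]

print(f"Falcon 9 cost is at most {cost_share:.3%} of the shuttle's")
print(f"Falcon 9 turnaround is {speedup:.0f}x faster")
```

On these figures the Falcon 9 turns around 12 times faster, and its cost is well under the 1% threshold mentioned in the text.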
Satellites are now deployed about 300 miles above Earth in low Earth orbit (LEO), rather than 22,000 miles above Earth in geostationary orbit (GEO), which was previously the typical satellite deployment altitude.
What this means for the consumer is that satellite internet streamed from LEO has a latency of 40 ms, which makes for a responsive internet connection, especially when compared to the 700 ms latency experienced with satellite internet streamed from GEO.
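The gap in latency follows largely from distance. The sketch below estimates a propagation-only lower bound on round-trip time, under simplifying assumptions that are not from the article: the satellite is directly overhead, signals travel at the speed of light in a vacuum, and a round trip crosses the altitude four times (user to satellite to ground station, and back). Processing, routing, and queuing delays are ignored, which is why real-world latencies are higher than these figures.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
KM_PER_MILE = 1.609344


def min_round_trip_ms(altitude_miles):
    """Propagation-only lower bound on round-trip latency, in milliseconds.

    Assumes the satellite is directly overhead and that the signal crosses
    the altitude four times: up and down for the request, up and down for
    the response.
    """
    altitude_km = altitude_miles * KM_PER_MILE
    return 4 * altitude_km / SPEED_OF_LIGHT_KM_S * 1000


leo_ms = min_round_trip_ms(300)     # low Earth orbit altitude from the article
geo_ms = min_round_trip_ms(22_000)  # geostationary altitude from the article

print(f"LEO lower bound: {leo_ms:.1f} ms")
print(f"GEO lower bound: {geo_ms:.1f} ms")
```

Even before any processing overhead, the GEO path takes hundreds of milliseconds, while the LEO path takes only a few.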
What Would it Take to Build a Satellite Internet?
SpaceX, the private company that operates Starlink, currently has 4,500 satellites. However, the company believes it will require 10 times this number to provide comprehensive satellite internet coverage.
Charting the number of active satellites reveals that, despite rapid growth in recent years, many more must be launched to create a comprehensive satellite internet.
Year | Number of Active Satellites |
---|---|
2022 | 6,905 |
2021 | 4,800 |
2020 | 3,256 |
2019 | 2,272 |
2018 | 2,027 |
2017 | 1,778 |
2016 | 1,462 |
2015 | 1,364 |
2014 | 1,262 |
2013 | 1,187 |
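To put the trend in the table into numbers, the sketch below computes year-over-year growth and the compound annual growth rate (CAGR) across the full period. The figures are copied from the table; the variable names are illustrative.

```python
# Active satellites by year, copied from the table above.
active_satellites = {
    2013: 1187, 2014: 1262, 2015: 1364, 2016: 1462, 2017: 1778,
    2018: 2027, 2019: 2272, 2020: 3256, 2021: 4800, 2022: 6905,
}

years = sorted(active_satellites)

# Year-over-year growth for each consecutive pair of years.
for prev, curr in zip(years, years[1:]):
    growth = active_satellites[curr] / active_satellites[prev] - 1
    print(f"{prev} to {curr}: {growth:+.1%}")

# Compound annual growth rate between the first and last year.
periods = years[-1] - years[0]
cagr = (active_satellites[years[-1]] / active_satellites[years[0]]) ** (1 / periods) - 1
print(f"CAGR {years[0]} to {years[-1]}: {cagr:.1%}")
```

The yearly figures show growth accelerating sharply from 2020 onward compared with the earlier part of the decade.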
Next-Generation Internet Innovation
Innovation is at the heart of the internet’s next generation, and the MSCI Next Generation Innovation Index exposes investors to companies that can take advantage of potentially disruptive technologies like satellite internet.
You can gain exposure to companies advancing access to the internet with four indexes:
- MSCI ACWI IMI Next Generation Internet Innovation Index
- MSCI World IMI Next Generation Internet Innovation 30 Index
- MSCI China All Shares IMI Next Generation Internet Innovation Index
- MSCI China A Onshore IMI Next Generation Internet Innovation Index
MSCI thematic indexes are objective, rules-based, and regularly updated to focus on specific emerging trends that could evolve.