OpenAI’s New Reasoning Model, o1, Beats the Reasoning Skills of PhD Students
On September 13, OpenAI released o1, a new model designed to tackle challenging problems. It reasons through complex tasks step by step and handles problems in fields like math, programming, and science. Compared to the previous model, GPT-4o, o1 shows significant improvements. Its chain-of-thought (CoT) reasoning allows o1 to excel in complex reasoning tests, even outperforming human PhD-level experts in some cases.
Key Features of ChatGPT-o1
The generative AI model, code-named “Strawberry,” was officially named “o1” when OpenAI announced it on September 13.
1. Powerful Reasoning Abilities
One of the standout features of the o1 model is its ability to handle complex, step-by-step reasoning. Its chain-of-thought (CoT) capability enables structured thinking by breaking a single task into multiple simpler steps, which improves the logical flow of its answers. When tackling complex questions, o1 doesn't just give a direct answer. Instead, it walks through the reasoning process step by step, similar to how a person would solve a problem: it first considers the logic of each step and then gradually works toward the final answer. This approach is useful for tasks like identifying privileged emails in a lawyer’s inbox or brainstorming product marketing strategies.
2. Specialized Training Approach
o1 was trained with a new optimization algorithm and a specially tailored dataset (using reinforcement learning to enhance its chain-of-thought reasoning). This new training approach makes the model more accurate, and researchers say it has fewer "hallucinations" (i.e., it generates fewer answers that seem reasonable but are actually incorrect or nonsensical).
In a series of tweets, OpenAI research scientist Noam Brown said, “o1 is trained with reinforcement learning to ‘think’ through an internal chain of thought before responding. The longer it thinks, the better it performs on reasoning tasks.”
3. Drawbacks of o1
However, o1’s thinking time can also be a downside: compared to other AI models, o1 may respond more slowly. In addition, o1 is currently a text-only model, which means it cannot process uploaded files or gather real-time information from the internet. Lastly, even users with access to o1 are limited to 30 messages per week with o1-preview and 50 messages per week with o1-mini.
ChatGPT-o1 vs. Other AI Models
o1's abilities in math, programming, and science have improved significantly, outperforming earlier leading models such as Claude 3.5 Sonnet. On scientific question answering, it even exceeds the level of human PhDs. Specifically, in programming, o1 ranked in the 89th percentile on Codeforces competitive programming problems. On the AIME 2024 math competition, GPT-4o solved only about 12% of the problems on average, while o1 solved 74% with a single sample per problem. When taking the consensus of 64 samples, the success rate reaches 83%. In terms of scientific skills, on PhD-level questions (GPQA Diamond), GPT-4o's accuracy is 56.1%, human experts average 69.7%, and o1 achieved 78%. o1 is the first AI model to surpass human PhDs on the GPQA science benchmark.
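The "consensus of 64 samples" figure can be illustrated with a minimal sketch. The article does not say how OpenAI computes consensus, so this assumes a simple majority vote over the final answers of independently sampled responses. The function name `consensus_answer` and the sample answers are hypothetical and only for illustration.

```python
from collections import Counter


def consensus_answer(samples: list[str]) -> str:
    """Return the most frequent final answer among the sampled responses.

    Assumption: consensus means a plain majority (plurality) vote.
    """
    counts = Counter(samples)
    answer, _ = counts.most_common(1)[0]
    return answer


# Hypothetical final answers from 64 independent samples of one problem.
samples = ["204"] * 30 + ["210"] * 20 + ["96"] * 14

print(len(samples))               # 64
print(consensus_answer(samples))  # 204
```

The idea is that wrong answers tend to scatter across many different values while correct ones agree, so voting can raise accuracy above what any single sample achieves.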
Data shows that on the 2024 American Invitational Mathematics Examination (AIME) and in the Codeforces programming competition, o1-preview's ability to solve math and programming problems improved by 5 to 6 times compared to GPT-4o. Even more impressive, the full version of o1 is up to 8 to 9 times better than GPT-4o. Additionally, on the GPQA Diamond test, a challenging benchmark that assesses expertise in chemistry, physics, and biology, both o1-preview and o1 greatly outperformed GPT-4o and, importantly, surpassed human experts as well.
How to Try ChatGPT-o1 and Pricing Info
Starting September 13, ChatGPT Plus and Team users can access the o1 model in ChatGPT. Both o1-preview and o1-mini can be selected manually in ChatGPT's model picker. Right now, o1-preview has a weekly limit of 30 messages, while o1-mini has a limit of 50 messages per week.
OpenAI is working hard to raise the message limits for o1 and to enable ChatGPT to automatically choose the right AI model based on what users ask. Starting next week, ChatGPT Enterprise and Edu users will have access to both models. Additionally, OpenAI plans to make o1-mini available to all ChatGPT free users as well.
Pricing: Compared to the earlier GPT-4o, o1 is more expensive. Using o1-preview through the API costs $15 per million input tokens and $60 per million output tokens. In contrast, GPT-4o charges only $5 per million input tokens and $15 per million output tokens. (Tokens are the chunks of text the model processes; 1 million tokens is roughly equivalent to 750,000 words.) In other words, o1 is three to four times more expensive to use than GPT-4o.
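To see where "three to four times" comes from, here is a rough cost sketch using only the per-million-token prices quoted above. The function name `request_cost` and the example workload (2,000 input tokens, 1,000 output tokens) are made up for illustration. The sketch also ignores any hidden reasoning tokens, which would raise o1's output token count in practice.

```python
# Prices in USD per 1 million tokens, as quoted in this article.
PRICES = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single API request."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 2,000 input tokens and 1,000 output tokens.
o1_cost = request_cost("o1-preview", 2_000, 1_000)
gpt4o_cost = request_cost("gpt-4o", 2_000, 1_000)

print(f"o1-preview: ${o1_cost:.4f}")           # o1-preview: $0.0900
print(f"gpt-4o: ${gpt4o_cost:.4f}")            # gpt-4o: $0.0250
print(f"ratio: {o1_cost / gpt4o_cost:.2f}x")   # ratio: 3.60x
```

The ratio is 3x for input-heavy requests and approaches 4x as output tokens dominate, so the exact multiple depends on the mix of input and output.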
More Tools and Models to Explore
Do you think OpenAI is pricey? Want to try out an AI tool that has multiple built-in AI models all in one place? ChatArt is definitely your top alternative to ChatGPT! It supports models like OpenAI o1-preview, o1-mini, GPT-4o, Claude 3.5, Gemini 1.5, and more. You can get quick answers to any questions through AI chat, plus there are over 100 writing tools for you to explore! It also includes AI content detection and human-like rewriting to help with original paper writing and content creation!
- Supported models: OpenAI o1-preview, o1-mini, GPT-4o, Claude 3.5, Gemini 1.5, etc.
- The AI writing generator creates high-quality and smooth articles, blogs, papers, and more with just one click.
- Over 100 writing templates available, supporting text export in multiple languages.
- The professional AI marketing SEO writing assistant takes care of everything from marketing copy and e-commerce writing to slogans, emails, and brand building—all in one place.
- A grammar checker and AI-detection tools help you polish your text and keep your content original.
The launch of o1 marks the start of a new era for AI. It not only redefines what AI can do in theory but also shows us the exciting possibilities for using this technology in the future. However, it's important to use AI tools responsibly.