Grok 4 is impressive. Elon says that Grok 4 performs at or beyond PhD level in almost all academic areas. Last night's demo showed some incredible results on AI benchmarks. The most impressive of these was Grok 4's score on "Humanity's Last Exam".
Humanity's Last Exam is a challenging AI benchmark created by the Center for AI Safety and Scale AI. It contains 2,500 multimodal questions on a range of topics at the limit of human knowledge. These questions are incredibly difficult. Grok outperformed all other language models, including OpenAI's o3 and Google's Gemini 2.5, by a significant margin. Grok 4 Heavy, a model that uses several agents to work on problems together and can also use tools such as code and search, performed exceptionally well.
Testing Grok. Is it really PhD level?
This is difficult to test unless you have really strong knowledge of a topic. Since I have been studying Google search for a long time, I thought I would test Grok 4 on a few questions about how Google’s systems work.
Prompt: When was Google's last core update? What are the specifics? (be brief)
Grok 4 thought for 38 seconds. It searched relevant sources on the web and correctly told me that the update started on June 30th and could take up to three weeks to roll out.
I thought that was a good answer.
Prompt: How can a website recover if it was impacted by this update?
Traditionally, LLMs frustrate me with their answers to this question. They draw on long-standing SEO advice that was good many years ago but is not accurate now. Most LLMs will tell me that I should concentrate on the technical aspects of a website and work on getting more backlinks. In recent years, however, Google's core updates have introduced new ways of using machine learning systems to predict which pages are likely to be the most satisfying result for a searcher. What really matters if you have been impacted by a core update is having helpful, reliable and satisfying content.
Grok nailed this question.
However, these are not PhD-level questions. Let's see if Grok can explain how Google Search works.
Prompt: Explain the machine learning systems that Google uses in Search.
I was excited to see that Grok cited my website, among other sources, to answer this question.
I found that interesting because Grok's privacy policy says that it uses Brave's search results.
Yet a Brave search for this question doesn't surface my page.
Grok's answer on how Google uses machine learning in Search began by discussing features such as AI Overviews and AI Mode. Then it gave me the answer I was looking for. Here are some parts of it. This is an impressive answer.
There is much more that could be added to this answer, but I thought it was pretty good.
Next I asked a question about Google's latest breakthrough, called MUVERA. Grok initially thought it was a typo. It ran some searches and nailed this answer as well.
Other interesting new features
I am very interested in Grok 4 Heavy, which uses several agents to work through problems for you. It costs $300/month or $3,000 per year, which is in line with the premium plans from OpenAI and Google that offer agentic browsing in tools such as Operator and Project Mariner.
The demo for Grok Voice was impressive. They compared it with ChatGPT's voice mode, and it was quite obvious that Grok had better latency and a more realistic-sounding voice.
https://www.youtube.com/watch?v=ilelxyknalo
The team also said that it is working on a coding model, a multimodal agent and a really good video generation model, all of which should be released in the next few months.
What struck me the most about this demo was Elon's tone when he discussed the future. Maybe he's being hyperbolic here? He talks about whether AI is good for humanity, and ends on a frightening note.
"The actual notion of a human economy, assuming civilization continues to progress, will seem very quaint in retrospect. It will look like cavemen throwing sticks into a fire. It will most likely be good … yes … but I have reconciled myself to the fact that even if it wasn't going to be good, I would at least like to be alive to see it happen."
https://www.youtube.com/watch?v=Qblcxnrcvb8
Is Grok 4 AGI? Most likely not, but it is damn impressive. Is it better than o3 or Gemini 2.5? Maybe, although I will probably keep using the models from OpenAI and Google for the time being. Given that xAI has an enormous amount of computing power, I think Grok could quickly emerge as a dramatically more helpful model than any other.
I will continue to test Grok 4 and will share more in my community and newsletter.