You’ve probably seen it in the news: DeepSeek is an artificial intelligence model created on a supposedly very low budget.
And yet, in some respects, it outperforms some of the best models built by the top companies in the United States. So why has DeepSeek caused such a stir?
What’s so special about a Chinese AI company developing such a model?
Most likely, the news coverage didn’t really explain it to you; the same thing happened to me.
So today I’m going to tell you exactly and without nonsense or clickbait what’s so special about DeepSeek and why it has turned the world upside down.

Who Founded DeepSeek?
Liang Wenfeng was an engineering student at a Chinese university. Engineering was something he was passionate about, but in his student years, he realized that if he really wanted to make money, he had to start exploring other topics.
That’s where he started to get interested in finance and formed a student group within his university to talk and learn about financial markets and the stock market.
Applying engineering knowledge to the stock market, he began to make his first forays into the world of Quant Trading.
Quant trading is when the buying and selling of assets is done by a computer: instead of a person sitting there buying and selling shares by hand, we have software that analyzes all the data and decides when to buy and when to sell.
It’s not the typical stock market bot that your favorite investment YouTuber leaves you the download link for, but we’re talking about advanced mathematical models.
But Liang’s idea went beyond simply making a stock market bot: his idea was to use machine learning to make the decisions, and he dedicated a large part of his university years to it.
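As a toy illustration of the idea (an invented example, not High-Flyer’s actual strategy), a rule-based trading signal can be as simple as comparing two moving averages:

```python
# Toy quant-trading signal (illustrative only, NOT High-Flyer's strategy):
# go long when the short moving average sits above the long one.

def moving_average(prices, window):
    """Simple moving average of the last `window` prices."""
    return sum(prices[-window:]) / window

def signal(prices, short=3, long=5):
    """Return 'buy', 'sell', or 'hold' based on two moving averages."""
    if len(prices) < long:
        return "hold"  # not enough data yet
    fast = moving_average(prices, short)
    slow = moving_average(prices, long)
    if fast > slow:
        return "buy"
    if fast < slow:
        return "sell"
    return "hold"

# Rising prices: the fast average is above the slow one.
print(signal([10, 11, 12, 13, 14]))  # buy
```

Real quant models replace this hand-written rule with statistical or machine-learning models trained on market data, which is exactly the leap Liang was interested in.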
So much so that in 2016, after finishing university, he founded High-Flyer, an investment firm based entirely on computer-automated trading decisions, which over the years became one of the top four quant firms in China, managing assets worth 8 billion dollars.
Let’s say he didn’t do badly, but one of his dreams from the beginning was to use artificial intelligence applied to financial markets.
To have an artificial intelligence that was capable of determining with great precision when to buy and when to sell.

History of the Chinese AI DeepSeek
The topic of artificial intelligence was so interesting to Liang that in 2021 he bought thousands of Nvidia graphics cards.
Many saw him as an eccentric millionaire buying toys to scratch the itch of playing with technologies that, at the time, had no use outside a university project; others knew that what was coming was going to be a revolution.
Two years later, on July 17, 2023, he founded the company Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Corporation Limited.
Also known as DeepSeek, a company with less than 200 employees.
At the end of 2024, this company launched an artificial intelligence model that shook the entire technology industry and the stock market, and even made more than one government nervous. What was this launch about?

Did the Chinese government create DeepSeek?
There are a couple of very important points we have to clarify about DeepSeek. The first is that it is a company founded with capital from High-Flyer, Liang’s quant trading firm.
Right now, there is no evidence that the Chinese government has anything to do with DeepSeek.
Liang used money from his investment firm to create this company, in the pure American capitalist style.
While the Chinese government is indeed encouraging artificial intelligence, DeepSeek has no affiliation with the government; it is Liang’s company, or rather High-Flyer’s.
Is the Chinese government somehow helping DeepSeek? The truth is that it is difficult to know; we simply have no way of verifying it.
But in principle, China’s plans to encourage artificial intelligence are more about building data centers, boosting AI in universities, and making laws that favor generative AI so that it is not as restricted as in Europe.
DeepSeek R1 and V3
DeepSeek launched two models a few weeks apart: DeepSeek V3 in December 2024 and DeepSeek R1 in January 2025.
These two models are large language models in the style of ChatGPT, Claude, Gemini, or Llama.
Basically, they are models that generate text. There are many models of this type, but few as good as these.
Look, there are benchmark tests to assess how good a model is compared to others, and here you can see some of them.

DeepSeek Benchmarks
Well, it turns out that in some of these tests, DeepSeek V3 beats the best models we had to date, surpassing Claude 3.5 Sonnet and GPT-4o in benchmarks such as:
MMLU-Redux:
A benchmark of general knowledge, logical reasoning, and advanced comprehension across multiple subjects such as mathematics, history, science, and more.

DROP (Discrete Reasoning Over Paragraphs):
We also have, for example, DROP, which measures the ability to reason about long texts, where the answers sometimes require calculations, combining data, or logical reasoning.

Aider Polyglot:
Another example is Aider Polyglot, which evaluates the model’s ability to work with multiple programming languages, understanding and completing tasks across different syntaxes.
To give us an idea, this tells us that in some of these tests DeepSeek is not just similar in performance to very advanced models like ChatGPT; in some it is even better. That is impressive because creating a model this good is not easy. Very few companies in the entire world have achieved it, and that is the first reason DeepSeek is so remarkable.

DeepSeek Prices
The rates for using this model are considerably cheaper than the competition’s, and the website for chatting with DeepSeek is free.
| Type | DeepSeek-chat | DeepSeek-Reasoner |
| --- | --- | --- |
| 1M tokens input (cache hit) | $0.07 | $0.14 |
| 1M tokens input (cache miss) | $0.27 | $0.55 |
| 1M tokens output | $1.10 | $2.19 |

How Does DeepSeek Make Money?
The way these companies really make money is through the API.
Using this API has a price; obviously, it is not free. The price is measured per token, and each token is roughly a generated word: the more words the AI generates, the more the application owner is charged.
While ChatGPT’s output tokens cost $10 per million in the GPT-4o model, the standard DeepSeek V3 costs $1.10 per million output tokens, about ten times cheaper.
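Using the output prices quoted above (and a made-up monthly volume, just for illustration), the difference is easy to put in numbers:

```python
# Rough API cost comparison using the output-token prices quoted above.
# The 50M-token monthly volume is an invented example figure.

PRICE_PER_1M_OUTPUT_USD = {
    "gpt-4o": 10.00,        # as cited above
    "deepseek-chat": 1.10,  # DeepSeek V3
}

def output_cost(model, output_tokens):
    """Cost in USD for generating `output_tokens` tokens with `model`."""
    return PRICE_PER_1M_OUTPUT_USD[model] / 1_000_000 * output_tokens

tokens_per_month = 50_000_000
print(output_cost("gpt-4o", tokens_per_month))        # about 500 USD
print(output_cost("deepseek-chat", tokens_per_month)) # about 55 USD
```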
What is an API?
API stands for application programming interface; in plain English, it is basically a way to connect programs with each other. Imagine you have an app like, for example:
A personal training fitness app, and you want this app to have artificial intelligence like a chat with a personal trainer who is actually an AI and gives you recommendations on how you are doing with the exercises.
Well, for your sports app company, programming an artificial intelligence at the level of ChatGPT, Gemini, or DeepSeek is out of reach: it is very complex and needs a lot of investment money and many servers to run all that intelligence.
So your best option is to connect your application with one of these artificial intelligences that already exist, and that is the service they provide.
The sports app would send a series of your data (prompts), your training history, and biometric data, and the AI would generate a response; that is, it would simulate the personal trainer.
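As a sketch, here is roughly what that connection could look like for the hypothetical fitness app. DeepSeek’s API is OpenAI-compatible; the endpoint and model name below follow its public documentation, while the trainer prompt and the user message are invented:

```python
# Sketch: how the fitness app could call an OpenAI-compatible chat API
# such as DeepSeek's. The system prompt and user message are invented.
import json
import urllib.request

API_URL = "https://api.deepseek.com/chat/completions"

def build_request(api_key, user_message):
    """Build the HTTP request for one chat turn (not sent yet)."""
    payload = {
        "model": "deepseek-chat",
        "messages": [
            {"role": "system",
             "content": "You are a personal trainer. Give short advice."},
            {"role": "user", "content": user_message},
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = build_request("sk-...", "I ran 5 km today, what should I do tomorrow?")
# urllib.request.urlopen(req) would actually send it; the reply is then
# billed per generated token, as described above.
print(req.full_url)
```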
This API is one of the most important business models for these services, even more than the consumer chat product like ChatGPT itself.

DeepSeek Training
These artificial intelligence models, as we said before, need supercomputers with many graphics cards, running in data centers that take millions in investment.
Well, hold on, because it turns out DeepSeek has released its V3 and R1 models as open source and totally free: you can literally download them and run them in your own data center. For the largest DeepSeek R1 model, which has 671 billion parameters, you would need about 16 Nvidia A100 graphics cards with 80 GB of memory each, for a total of 1,280 GB.
Setting this up would cost you something like half a million dollars, but you could run DeepSeek R1 at its full potential, and that is a pretty big threat to the American artificial intelligence industry.
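A quick back-of-the-envelope check of those memory numbers (the precision is an assumption: DeepSeek’s released weights are in 8-bit FP8, with 16-bit shown for comparison; activations and cache need extra room on top of the weights):

```python
# Back-of-the-envelope memory estimate for serving a 671B-parameter model.
params = 671_000_000_000

weights_gb_fp8 = params * 1 / 1e9   # 1 byte/param at 8-bit -> ~671 GB
weights_gb_fp16 = params * 2 / 1e9  # 2 bytes/param at 16-bit -> ~1342 GB
cluster_gb = 16 * 80                # the 16 x 80 GB A100 setup -> 1280 GB

# At 8-bit precision the weights fit in the cluster with room for
# activations and KV cache; at 16-bit they would not.
print(weights_gb_fp8, weights_gb_fp16, cluster_gb)
```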
In its first week, the launch saw more than a million downloads, and these are not app downloads or website sign-ups, but people with the technical knowledge and the infrastructure to run the AI on their own servers.
Development Costs
Another thing that is very impressive and one of the most talked about things is that DeepSeek was incredibly cheap to train.
But not only to train: it is also cheap to run. Not only was the training done with a fairly small number of graphics cards and in a surprisingly short time, but when they built R1, the reasoning model, they also spent very little money compared to what you would expect for a model of that type.
How did they do this? Well, with a series of technical improvements, evolutions, and optimizations.
Llama or GPT are based on a general-purpose neural network trained on knowledge of all kinds so it can generate text about anything, in any area of knowledge. When they generate words, the entire large brain processes each prompt the user sends.

DeepSeek Architecture
DeepSeek, on the other hand, is based on an architecture called mixture of experts.
Here the idea is that instead of one large model that is executed in full every time we process the user’s prompt, we have several smaller models, each specialized in different topics.
This is not an original idea from DeepSeek; they didn’t come up with it. It had already been implemented, for example, in Google’s GShard project and in Mistral’s Mixtral model, and it even goes back to a 2017 paper.
So what we have in front of us is a router, and the router chooses which expert is in charge of answering each question.
With a mixture of experts, only the parts that are needed are selected for each task. It is the difference between a brain that runs at 100% all the time and one that activates only specific regions for specific tasks, and that saves the system a lot of energy.
Therefore, we need fewer graphics cards to run the model.
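A minimal sketch of that routing idea (illustrative only, not DeepSeek’s actual implementation): a router scores the experts, and only the top-scoring ones run for a given input.

```python
# Toy mixture-of-experts forward pass: the router scores all experts,
# but only the top_k of them actually execute for this input.
import math

def softmax(scores):
    """Turn raw router scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_scores, top_k=2):
    """Run only the top_k experts and mix their outputs by router weight."""
    weights = softmax(router_scores)
    top = sorted(range(len(experts)),
                 key=lambda i: weights[i], reverse=True)[:top_k]
    # Renormalize over the chosen experts and combine their outputs;
    # the other experts never run, which is where the savings come from.
    norm = sum(weights[i] for i in top)
    return sum(weights[i] / norm * experts[i](x) for i in top)

# Four toy "experts"; only two run per input.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
print(moe_forward(3.0, experts, router_scores=[0.1, 2.0, 1.5, 0.2]))
```

In a real model the experts are neural sub-networks and the router is itself learned, but the principle is the same: compute is spent only on the experts chosen for each token.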
Development Difficulties
Not only did they use a different architecture, but they also had to modify the graphics cards they used in the training.
The Nvidia graphics cards sold in China for training AI are less powerful, because export laws and the political conflict between the United States and China force Nvidia to limit the chips it ships there.
That is why, in DeepSeek’s development, it was necessary to tune all those cards one by one to get the most out of them, despite the limitations they had at the hardware level. This allowed them to create a training process more efficient than the one used in the United States, cutting costs to the minimum.

Another Relevant Chinese AI
Qwen2.5-Max
Recently, Alibaba, one of the largest companies in China, launched its AI model Qwen2.5-Max, which aims to compete directly with leading models such as GPT-4o and DeepSeek V3.
Summary:
DeepSeek changed the way these kinds of models are trained, and it is admirable how its developers got around the difficulties with great ingenuity and creativity, managing to create an AI model that competes with the biggest.
Don’t be surprised to see that a lot of AI models based on DeepSeek’s code will start to come out from now on, and here at Responsible Technology we will be on the lookout to tell you everything about China’s AI.