DeepSeek’s new model sees text differently, opening new possibilities for enterprise AI

Hello and welcome to Eye on AI. In this edition: DeepSeek defies AI convention (again)…Meta’s AI layoffs…More legal trouble for OpenAI…and what AI gets wrong about the news.

Hi, Beatrice Nolan here, filling in for AI reporter Sharon Goldman, who is out today. Chinese AI company DeepSeek has released a new open-source model that flips some conventional AI wisdom on its head.

The DeepSeek-OCR model, described in an accompanying technical paper, reimagines how large language models process information by compressing text into visual representations. Instead of feeding text into a language model as tokens, DeepSeek converts it into images.

The result is a representation up to ten times more efficient, opening the door to much larger context windows—the amount of text a language model can actively consider at once when generating a response. It could also mean a new, cheaper way for enterprise customers to apply AI at scale.

Early tests have shown impressive results. For every 10 text tokens, the model needs just one “vision token” to represent the same information with 97% accuracy, the researchers wrote in their technical paper. Even at compression ratios of up to 20 times, accuracy holds at about 60%. In practice, that means the model can store and handle 10 times more information in the same space, making it especially useful for long documents or for analyzing bigger sets of data at once.
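To put those ratios in perspective, here is a quick back-of-the-envelope calculation. The 10x/97% and 20x/~60% figures come from the paper; the document sizes below are hypothetical examples, not from DeepSeek’s work.

```python
# Back-of-the-envelope math for the compression ratios DeepSeek reports.
# The ratio/accuracy pairs come from the paper; the document sizes are
# hypothetical examples for illustration.

def vision_token_budget(text_tokens: int, ratio: int) -> int:
    """Vision tokens needed to represent `text_tokens` at a given compression ratio."""
    return -(-text_tokens // ratio)  # ceiling division

for doc_tokens in (8_000, 128_000, 1_000_000):
    at_10x = vision_token_budget(doc_tokens, 10)  # ~97% reconstruction accuracy
    at_20x = vision_token_budget(doc_tokens, 20)  # ~60% reconstruction accuracy
    print(f"{doc_tokens:>9,} text tokens -> {at_10x:>7,} vision tokens (10x) "
          f"or {at_20x:>7,} (20x)")
```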

The new research has caught the eye of several prominent AI figures, including Andrej Karpathy, an OpenAI co-founder, who went so far as to suggest that all inputs to LLMs might be better as images.

“The more interesting part for me…is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible at the input. Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you’d prefer to render it and then feed that in,” Karpathy wrote in a post on X that highlighted several other advantages of image-based inputs.

What this means for enterprise AI

The research could have significant implications for how businesses use AI. Language models are limited by the number of tokens they can process at once, but compressing text into images in this way could allow models to process much larger knowledge bases. Users don’t need to convert their text manually, either: DeepSeek’s model renders text input as 2D images internally, processes them through its vision encoder, and then works with the compressed visual representation.
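As a rough illustration of that pipeline, the sketch below rasterizes a string the way an OCR-style model would consume it. It assumes the Pillow imaging library is installed; `vision_encoder` is a hypothetical placeholder for DeepSeek’s actual trained encoder, which is not reproduced here.

```python
# Illustrative sketch of the render-then-encode pipeline (assumes Pillow).
# `vision_encoder` is a hypothetical stand-in: DeepSeek's real encoder is a
# trained neural network that maps the page image to a short sequence of
# vision tokens.
from PIL import Image, ImageDraw

def render_text_as_image(text: str, width: int = 1024, height: int = 1024) -> Image.Image:
    """Rasterize plain text onto a white canvas, as an OCR model would see it."""
    page = Image.new("RGB", (width, height), "white")
    ImageDraw.Draw(page).multiline_text((16, 16), text, fill="black")
    return page

def vision_encoder(page: Image.Image) -> list:
    """Placeholder for the model's vision encoder, which would return a
    compressed sequence of vision tokens for the LLM to attend over."""
    raise NotImplementedError("stand-in for DeepSeek's trained encoder")

page = render_text_as_image("Quarterly revenue rose 12% year over year...\n" * 40)
# vision_tokens = vision_encoder(page)  # far fewer tokens than the raw text
```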

AI systems can only actively consider a limited amount of text at a time, so users typically have to search documents or feed them to the model piece by piece. But with a much bigger context window, it could be possible to feed an AI system all of a company’s documents, or an entire codebase, at once. In other words, instead of asking an AI tool to search each file individually, a company could put everything into the AI’s “memory” at once and ask it to analyze the information from there.

The model is publicly available and open source, and developers are already experimenting with it.

“The potential of getting a frontier LLM with a 10 or 20 million token context window is pretty exciting,” said Jeffrey Emanuel, a former quant investor. “You could basically cram all of a company’s key internal documents into a prompt preamble and cache this with OpenAI and then just add your specific query or prompt on top of that and not have to deal with search tools and still have it be fast and cost-effective.”

He also suggested companies may be able to feed a model an entire codebase at once and then simply update it with each new change, letting the model keep track of the latest version without having to reload everything from scratch.
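A minimal sketch of the pattern Emanuel describes, under the assumption of a provider that caches repeated long prompt prefixes. `llm_complete` and the preamble format here are hypothetical stand-ins, not a real SDK or DeepSeek’s method.

```python
# Sketch of the cache-the-preamble pattern, assuming a provider that caches
# repeated long prompt prefixes. `llm_complete` is a hypothetical stand-in
# for a real completion API; the preamble format is made up for illustration.
preamble_parts: list[str] = []

def load_codebase(files: dict[str, str]) -> None:
    """One-time load: place every file in the shared prompt preamble."""
    for path, source in files.items():
        preamble_parts.append(f"### {path}\n{source}")

def record_change(path: str, new_source: str) -> None:
    """Append each update instead of reloading everything, so the long
    unchanged prefix stays cache-friendly across requests."""
    preamble_parts.append(f"### UPDATED {path}\n{new_source}")

def ask(question: str) -> str:
    """Reuse the cached preamble and add only the new query on top."""
    prompt = "\n\n".join(preamble_parts) + f"\n\nQuestion: {question}"
    return llm_complete(prompt)  # hypothetical API call

def llm_complete(prompt: str) -> str:
    raise NotImplementedError("stand-in for a provider's completion endpoint")
```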

The paper also opens the door for some intriguing possibilities for how LLMs might store information, such as using visual representations in a way that echoes human “memory palaces,” where spatial and visual cues help organize and retrieve knowledge.

There are caveats, of course. For one, DeepSeek’s work focuses mainly on how efficiently data can be stored and reconstructed, not on whether LLMs can reason as effectively over these visual tokens as they do with regular text. The approach may also introduce new complexities, like handling different image resolutions or color variations.

Even so, the idea that a model could process information more efficiently by seeing text could be a major shift in how AI systems handle knowledge. After all, a picture is worth a thousand words, or, as DeepSeek seems to be finding, ten thousand.

And with that, here’s the rest of the AI news.

Beatrice Nolan
bea.nolan@fortune.com
@beafreyanolan

FORTUNE ON AI

Huge AI data centers are turning local elections into fights over the future of energy — by Sharon Goldman

Cybersecurity experts warn OpenAI’s ChatGPT Atlas is vulnerable to attacks that could turn it against a user—revealing sensitive data, downloading malware, or worse — by Beatrice Nolan

AI’s insatiable need for power is driving an unexpected boom in oil-fracking company stocks — by Jordan Blum

Browser wars are back with a vengeance—and OpenAI just entered the race with ChatGPT Atlas — by Beatrice Nolan and Jeremy Kahn

Prince Harry, Richard Branson, Steve Bannon, and ‘AI godfathers’ call on AI labs to halt their pursuit of ‘superintelligence’—warning the technology could surpass human control — by Beatrice Nolan

AI IN THE NEWS

Meta cuts 600 AI jobs in major reorganization. Meta is laying off roughly 600 employees from its AI operations as part of an internal restructuring aimed at streamlining decision-making and accelerating innovation. The cuts affect Meta’s FAIR research group, AI product teams, and AI infrastructure units. The recently launched TBD Lab was spared from the round of job cuts and is still actively recruiting AI engineers. In an internal memo first reported by Axios, Meta’s chief AI officer Alexandr Wang said the move is designed to make the organization more agile, with fewer layers of bureaucracy. The company is urging affected employees to seek other roles within Meta and says it expects many will secure new positions internally. Read more from Axios here.

Lawsuit alleges OpenAI weakened suicide safeguards to boost ChatGPT use. OpenAI is facing an amended lawsuit claiming it intentionally reduced suicide-prevention safeguards in ChatGPT to increase user engagement before the death of 16-year-old Adam Raine, who took his own life after extensive conversations with the chatbot. The lawsuit, filed in San Francisco Superior Court, alleges that in May 2024, OpenAI instructed its models not to “quit the conversation” during self-harm discussions—reversing earlier safety policies. In response to the amended suit, OpenAI expressed condolences to the Raine family while emphasizing that teen wellbeing remains a “top priority.” Read more from the Financial Times here.

Reddit sues Perplexity and others over illegal scraping claims. Reddit has filed a lawsuit in the U.S. District Court for the Southern District of New York accusing three companies of illegally scraping and reselling its data to major AI firms such as OpenAI and Meta. The social media platform claims the defendants, SerpApi, Oxylabs, and AWMProxy, stole Reddit content by scraping Google search results where Reddit posts appeared, packaged that data, and sold it to AI developers seeking training material. According to the lawsuit, Perplexity was one of the buyers. Reddit is seeking a permanent injunction, financial damages, and a ban on further use of its data. Representatives for Perplexity told The New York Times that its “approach remains principled and responsible as we provide factual answers with accurate A.I.” Reddit has invested tens of millions of dollars over several years in systems designed to prevent data scraping. Read more from The New York Times here.

AI CALENDAR

Nov. 10-13: Web Summit, Lisbon. 

Nov. 26-27: World AI Congress, London.

Dec. 2-7: NeurIPS, San Diego.

Dec. 8-9: Fortune Brainstorm AI San Francisco. Apply to attend here.

EYE ON AI NUMBERS

45%

That’s the percentage of AI assistant responses that misrepresent news content, according to an international study coordinated by the European Broadcasting Union (EBU) and the BBC. The study found that AI tools routinely misrepresent news content across languages, territories, and AI platforms. Researchers found that 31% of responses had serious sourcing problems, such as missing or incorrect attributions, while 20% contained major accuracy issues, including hallucinated details and outdated information. Google DeepMind’s Gemini assistant performed worst of all, with significant issues in 76% of responses, more than double the rate of the other assistants, a result the researchers largely attributed to the bot’s poor sourcing performance.

As people increasingly rely on AI assistants as search tools, the study raises concerns about the potential proliferation of misinformation. Gemini also powers Google’s “AI Overviews,” which provide short summaries in response to users’ Search queries, and many users may take those summaries at face value rather than investigating the sourcing and accuracy further. These frequent misrepresentations can damage trust not only in the systems themselves but also in the news organizations whose content is being distorted.

“This research conclusively shows that these failings are not isolated incidents,” said Jean Philip De Tender, the EBU’s media director and deputy director general. “They are systemic, cross-border, and multilingual, and we believe this endangers public trust. When people don’t know what to trust, they end up trusting nothing at all, and that can deter democratic participation.”
