Divine Artifact in a Scientific World

Chapter 291: Ten Times


Jack was crossing the quad, headed towards his next class, when he received a text message from Madison. She'd sent him a link to a video. A press release from AI-n-stein.

He stopped and clicked the link.

"Today we announce version 5 of our premier BBnB model. Version 5 is a significant improvement over version 4, with performance on some benchmarks showing as much as a 60% improvement. Many years of hard work went into training our latest version, and all our effort has paid off."

Looks like AI-n-stein finished using Radium to teach their old model some new tricks, he thought. He chuckled to himself as he thought about how they would react when Radius released their new version.

The young woman in the video smiled happily, then continued. "And while some of our competitors decline to release specific numbers, an independent assessment performed by Figure Eight Research shows that our new version outperforms all other models available."

His phone dinged, informing him he had a new message from Madison. He'd already seen enough of the AI-n-stein PR video, so he closed it, then read Madison's new message.

She'd sent him another link. This time to a video of a tech influencer.

He clicked the link and watched as a young man appeared on screen.

"As many of my regular viewers may know, I've spent the last few videos examining the performance of the new Karl and Radium models from Radius 10K."

A bar chart appeared on screen.

"As you can see from this chart, the Karl model performance is on par with that of AI-n-stein's BBnB version 4 and the other competitors."

The bar chart on screen was replaced with a new bar chart.

"And Radius's Radium model crushed AI-n-stein's BBnB version 4 model in every category. It also outperformed all specialist models for which a benchmark was available."

"Pretty much everyone has been clambering to get a Radius account, and my sources inside several AI companies told me they were low-key panicking."

The young man smiled knowingly, showing that he thought he knew exactly why those AI companies might be panicking.

Then he turned serious and said, "In the past, new versions of AI models have shown modest improvements over previous versions, at best. So you can understand why I would be surprised and a little skeptical of AI-n-stein's latest claims."

A new bar chart appeared on screen.

"Buried deep in their latest press release, we find this chart, showing a comparison between Radium and BBnB version 5, attributed to an investigation performed by Figure Eight Research. The numbers are not quite as glowing as AN-n-stein would have us believe."

Most of the bars flashed on the chart.

"For these benchmarks we can see that Radium still outperforms BBnB."

Four bars flashed on the chart.

"For these two benchmarks, we see that BBnB just edges out Radium. Now, take a moment and ask yourselves, what is special about these two benchmarks. And while you're at it, also ask yourself if there is anything missing."

The young man picked up an energy drink and took a long sip, making sure the logo on the can was clearly visible on screen.

"Did you spot the problems with this chart?" the young man asked with a smirk.

"The only two benchmarks for which AI-n-stein's BBnB outperformed Radium also happen to be benchmarks created by AI-n-stein. And even more telling, three coding and content editing benchmarks are missing from the Figure Eight Research assessment."

"So, I asked myself, did AI-n-stein really catch up?"

Then he held up the energy drink can again.

"Thanks to this video's sponsor, I was able to run an assessment BBnB and Radium against those three missing benchmarks."

The young man began a long technical description of how he ran the assessment, and where he had published the source code for others to review. Skipping ahead in the video, Jack stopped when a new bar chart appeared on screen.

"And here are the results of my assessment. As you can see, Radius is still kicking AI-n-stein's ass in areas that really matter for practical use."

The young man said a few more words about why he thought those three benchmarks mattered, then ended the video with, "And for those who may be wondering, no, I have not received any funding or communication of any kind from Radius 10K. I would love to hear from them, but they have so far ignored my e-mails."

Because they just didn't have the staff necessary to field all the questions flowing their way from various sources, Radius 10K was choosing to answer none of them. Social media could be a brutal and vindictive world, so they figured it was better to remain silent than show favoritism.

——————————

Hammond Saltzman, CEO of AI-n-stein, walked into the meeting room and gazed at the people seated at the conference table.

"Eric, where are we at on usage and subscription numbers?" he asked.

"Usage numbers appear to have stabilized, but it's too early to tell if the trend has reversed. As for subscription numbers, we show a small uptick from last week. It's not a large number, but looks promising."

"Good, good," said the CEO, smiling. Then his smile faded, and he said, "Terrance, I see you shaking your head. Are you still upset that I forced you to publish early?"

"Yes, Hammond, I am. I still think we should have waited until we could show parity across the board. I've already seen several blogs that caught on to the fact that several benchmarks were excluded from Figure Eight's assessment."

"Yes, I understand, Terrance, but I have the board breathing down my neck to stop the bleeding, and this was the quickest way to do that."

The CEO turned to another man at the table and said, "Rory, what about the new data we received from a generous donor? How goes the analysis?"

"We have a clean-room implementation of their model architecture working. And verified that it is the same model they are using for their services. And I believe we will be able to produce a paper that reasonably explains how we arrived at the new architecture. However..."

"What, spit it out," said the CEO, impatiently.

"It's nearly ten times harder to train," said Rory unhappily. "We can tune it, but it will take ten times longer than our own model."

"Ten times?!" exclaimed Terrence.

"Yes, it's heavily optimized for inference-time performance. It has a much larger context window than any model I've seen, and has some architectural changes that improve chain-of-thought style problem solving, but the tradeoff is that it's much harder to train."

"If it's so hard to train, how the heck did they catch up to us?" someone asked.

"Hell is I know," said Rory. "As far as I can tell, it would have taken them at least a decade to train this model. And that's assuming they had Blit Blaster A5000 cards ten years ago. If we assume they only had access to the same tech we did, then it should have taken them closer to twenty years."

"So, can we conclude that they will not be able to make any sudden leaps forward?" asked the CEO hopefully.

"If they don't have access to some magical server that makes a Cerebras look like a 1970s calculator, then that seems like a safe assumption," said Terrance reservedly.

"Excellent," beamed the CEO. "Proceed with training the new model. Make sure it outperforms Radium on all the benchmarks that matter."
