Highlights:

  • Galileo CEO Vikram Chatterji said that businesses must be able to assess hundreds, if not thousands, of AI responses in near real time for issues including toxicity, security threats, and hallucinations.
  • The startup tested Luna EFMs through a sequence of benchmarks to compare their performance with other AI assessment tools.

Galileo’s latest evaluation foundation models are designed to assess the output quality of large language models such as Google LLC’s Gemini Pro and OpenAI’s GPT-4o. With them, the generative AI evaluation startup has introduced the industry’s first family of evaluation foundation models.

Galileo created the Luna EFM models, which are themselves LLMs, in response to the AI industry’s experiments with using AI to evaluate AI. The company said that research published over the last few years has shown encouraging results on using models such as GPT-4 to evaluate the responses of other LLMs.
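The LLM-as-judge approach that research explores can be sketched roughly as follows. This is a minimal illustration, not Galileo’s implementation: the prompt wording, the `call_judge_model` stub, and the PASS/FAIL verdict format are all assumptions made for the example.

```python
# Minimal LLM-as-judge sketch: one model grades another model's answer.
# call_judge_model is a stand-in for a real API call to a judge LLM
# (e.g., GPT-4); here it is stubbed out so the example is self-contained.

JUDGE_TEMPLATE = (
    "You are an evaluator. Given the question and the answer, reply with "
    "exactly one word, PASS or FAIL, judging factual accuracy.\n"
    "Question: {question}\nAnswer: {answer}\nVerdict:"
)

def call_judge_model(prompt: str) -> str:
    """Stub for a judge LLM call; a real system would query an API here."""
    return "PASS"

def evaluate_response(question: str, answer: str) -> bool:
    """Return True when the judge model deems the answer acceptable."""
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
    verdict = call_judge_model(prompt).strip().upper()
    return verdict.startswith("PASS")
```

The appeal of the pattern is that one general-purpose model can grade many others; its drawback, as the article notes, is that routing every check through a large model like GPT-4 is slow and expensive at scale.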

Galileo reasoned that, given these advancements, it would make more sense to design a group of specialized LLMs explicitly trained to assess the outputs of other generative AI models. The Luna EFM family is the outcome of that effort.

In a paper posted on arXiv, the company notes that each Luna EFM has been customized to carry out a particular evaluation task, such as identifying instances where AI systems fabricate responses, known as hallucinations. Others are designed to detect malicious prompts, context quality issues, and data leaks.
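The idea of one specialized evaluator per task can be sketched as a simple registry of checks. The task names, the 0-to-1 scoring convention, and the heuristic stubs below are illustrative assumptions; a real system would back each entry with a trained model rather than a lambda.

```python
# Hypothetical per-task evaluator registry mirroring the idea of one
# specialized model per evaluation task. Scores are assumed to range
# from 0.0 (problem detected) to 1.0 (output looks clean); the stubs
# below are placeholders, not Galileo's actual evaluators.
from typing import Callable, Dict

EVALUATORS: Dict[str, Callable[[str], float]] = {
    "hallucination": lambda output: 0.0 if "fabricated" in output else 1.0,
    "prompt_attack": lambda output: 1.0,    # stub: no attack detected
    "context_quality": lambda output: 1.0,  # stub: context judged adequate
    "data_leakage": lambda output: 1.0,     # stub: no secrets found
}

def run_all_checks(output: str) -> Dict[str, float]:
    """Score one model output against every specialized evaluator."""
    return {task: score(output) for task, score in EVALUATORS.items()}
```

Because each check is independent, small task-specific evaluators like these can run in parallel on every response, which is the flexibility argument the article attributes to the Luna family.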

Galileo, which creates tools to improve the accuracy of AI models, asserts that the Luna EFMs are faster, cheaper, and more accurate than either GPT-4 or conventional human “vibe checks,” giving businesses the assurance they need to deploy generative AI chatbots at scale.

Galileo CEO Vikram Chatterji said that businesses must be able to assess hundreds, if not thousands, of AI responses in near real time for issues including toxicity, security threats, and hallucinations. After collaborating with numerous enterprises on this problem, he said, the company concluded that standard LLM-based reviews and human assessments were excessively costly and time-consuming.

“We set out to solve that, and with Galileo Luna we’re setting new benchmarks for speed, accuracy and cost efficiency. Luna can evaluate millions of responses per month 97% cheaper, 11x faster, and 18% more accurately than evaluating using OpenAI’s GPT-3.5,” Chatterji said.

The startup tested the Luna EFMs through a sequence of benchmarks to see how well they performed compared to other AI assessment tools. The findings were quite encouraging.

Chatterji claims that the Luna family outperformed all other evaluation models in overall accuracy by as much as 20%, including surpassing the company’s own ChainPoll LLM, which is designed to identify hallucinations.

The testing also demonstrated how much cheaper the Luna EFMs are to run, with evaluation compute costs reportedly up to 30 times lower than GPT-3.5’s. The evaluations are far faster too, returning results in a matter of milliseconds. And because the Luna EFMs can be easily adjusted to identify particular issues in generative AI outputs, Chatterji added, they are far more flexible than other systems.

The Luna EFMs also excel at explainability, offering users the reasoning behind their assessments. According to the startup, this facilitates debugging and root-cause analysis.

The early adopters of the Luna EFMs have endorsed Galileo’s benchmarks.

According to Alex Klug, head of product, data science, and AI at personal computer giant HP Inc., producing safe, dependable, production-grade AI systems requires precise model evaluation tools. “Until now, existing evaluation methods such as human evaluations or using LLMs as a judge have been very costly and slow. With Luna, Galileo is overcoming enterprise teams’ biggest evaluation hurdles,” Klug added.

The company added that multiple Fortune 10 banks and Fortune 50 organizations are already using the Luna EFMs, which are available in its Galileo Protect and Galileo Evaluate platforms.