Selene by Atla AI: Open Source LLM Judge for AI App Evaluation

Selene

Type: Open Source Projects
Last Updated: 2025/09/14
Description: Selene by Atla AI provides precise judgments on your AI app's performance. Explore open source LLM Judge models for industry-leading accuracy and reliable AI evaluation.
Tags: LLM evaluation, AI judge, model evaluation, open source AI, AI reliability

Overview of Selene

Selene by Atla AI: Frontier AI Evaluation Models

What is Selene?

Selene is a suite of open-source LLM Judge models developed by Atla AI, designed to provide precise and reliable evaluations of AI application performance. It helps developers build trust with customers by ensuring the reliability of their generative AI apps through detailed scores and actionable critiques.

How does Selene work?

Selene models act as LLM-as-a-Judge evaluators: they analyze an AI response and return a score together with a critique. The models are available through Hugging Face Transformers, Ollama, or GitHub.
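
As a rough illustration of the score-plus-critique output described above, the sketch below parses a judge response into a structured record. The `**Reasoning:**` / `**Result:**` layout and the integer score are assumptions made for this example, not a format confirmed by this page; adapt the patterns to whatever output your eval prompt requests. `Judgment` and `parse_judgment` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Judgment:
    score: Optional[int]  # None when no score could be found
    critique: str


# Assumed output layout (hypothetical): "**Reasoning:** ... **Result:** <int>"
_RESULT_RE = re.compile(r"\*\*Result:\*\*\s*(\d+)")
_REASONING_RE = re.compile(
    r"\*\*Reasoning:\*\*\s*(.*?)(?=\*\*Result:\*\*|\Z)", re.DOTALL
)


def parse_judgment(raw: str) -> Judgment:
    """Extract the score and critique from a judge response."""
    result = _RESULT_RE.search(raw)
    reasoning = _REASONING_RE.search(raw)
    score = int(result.group(1)) if result else None
    # Fall back to the whole text when the expected layout is missing.
    critique = reasoning.group(1).strip() if reasoning else raw.strip()
    return Judgment(score=score, critique=critique)


example = "**Reasoning:** The answer is correct but omits units.\n\n**Result:** 4"
judgment = parse_judgment(example)
```

Returning `None` for a missing score, rather than a default number, keeps malformed judge outputs visible instead of silently counting them as passes or failures.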

Selene Models

Selene comes in two sizes to match your evaluation needs:

  • Selene 1: The flagship model offering industry-leading accuracy across a wide variety of evaluation tasks. Ideal for pre-production evaluations.
  • Selene 1 Mini: A lean, optimized version perfect for running evaluations at inference time, prioritizing speed and efficiency.

Key Features and Benefits

  • High Accuracy: Selene 1, the flagship model, is positioned as offering industry-leading accuracy.
  • Versatile Evaluation: Suitable for a wide variety of eval tasks.
  • Optimized for Speed: Selene 1 Mini is optimized for running evals quickly during inference.
  • Open Source: Use and contribute to the models through Hugging Face Transformers.

How to Use Selene

To use Selene, you can leverage the Hugging Face Transformers library. Here's a simple example:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AtlaAI/Selene-1-Mini-Llama-3.1-8B"
# device_map="auto" places the weights on the available device(s)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "I heard you can evaluate my responses?"  # replace with your eval prompt

# Wrap the prompt in the model's chat template
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move the inputs to the same device as the model
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Unpack the inputs so input_ids and attention_mask are passed together
generated_ids = model.generate(**model_inputs, max_new_tokens=512, do_sample=True)
# Drop the prompt tokens so only the newly generated text remains
generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
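
The `prompt` placeholder in the example needs an actual evaluation request. One possible way to assemble it is sketched below. The template wording, the 1 to 5 scale, and the `build_eval_prompt` helper are illustrative assumptions, not an official Selene prompt template, so check the model card for the recommended format.

```python
def build_eval_prompt(user_input: str, response: str, criteria: str) -> str:
    """Assemble an absolute-scoring eval prompt (hypothetical template)."""
    return (
        "You are evaluating a response to a user input.\n"
        f"Evaluation criteria: {criteria}\n"
        "Give a short critique, then a score from 1 (worst) to 5 (best).\n\n"
        f"[User input]\n{user_input}\n\n"
        f"[Response]\n{response}\n\n"
        "Answer in the form:\n"
        "**Reasoning:** <critique>\n"
        "**Result:** <score>"
    )


prompt = build_eval_prompt(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
    criteria="Is the response factually correct and complete?",
)
```

The resulting string can be used as the `prompt` value in the Transformers example.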

Use Cases

  • Evaluating Agent Performance: Use Selene to evaluate the performance of AI agents, track errors, and gain instant insights.
  • Building Trust: Ensure the reliability of your generative AI app to build trust with customers.
  • Pre-Production Evals: Use Selene 1 for rigorous evaluations before deploying your AI application.
  • Inference-Time Evals: Use Selene 1 Mini for quick evaluations during inference.
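
The inference-time use case can be pictured as a gate in front of the user: a reply is returned only if the judge scores it highly enough. The sketch below is a minimal illustration under stated assumptions. `guarded_reply`, the threshold of 4, and both stand-in functions are hypothetical; in a real app `judge_score` would call a Selene model and parse its output.

```python
from typing import Callable

FALLBACK = "Sorry, I can't give a reliable answer to that right now."


def guarded_reply(
    user_input: str,
    generate: Callable[[str], str],
    judge_score: Callable[[str, str], int],
    min_score: int = 4,
) -> str:
    """Return the app's reply only if the judge scores it at or above min_score."""
    reply = generate(user_input)
    score = judge_score(user_input, reply)
    return reply if score >= min_score else FALLBACK


# Stand-ins for the real app and the real judge call (both hypothetical).
def fake_generate(user_input: str) -> str:
    return "Paris" if "France" in user_input else "I think it's probably 7?"


def fake_judge_score(user_input: str, reply: str) -> int:
    return 5 if reply == "Paris" else 2


good = guarded_reply("Capital of France?", fake_generate, fake_judge_score)
bad = guarded_reply("Capital of Peru?", fake_generate, fake_judge_score)
```

Passing the generator and the judge in as callables keeps the gate independent of any particular model or serving stack.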

Why is Selene important?

As AI applications become more prevalent, ensuring their reliability and trustworthiness is crucial. Selene provides a robust and accurate means of evaluating AI performance, empowering developers to create safer and more reliable AI systems. It is particularly important for building trust with customers, especially in generative AI applications where outputs can be unpredictable.

Where can I use Selene?

You can integrate Selene into your AI development workflow through Hugging Face Transformers, Ollama, or GitHub. You can also explore Agent Evals by Atla to improve and track agents.
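
One way such a workflow integration might look is a release check that aggregates judge scores from a test set before deployment. This is a sketch only: `summarize_scores`, `release_gate`, the pass threshold of 4, and the 90% pass-rate bar are hypothetical choices, and the scores are made-up sample data rather than real Selene output.

```python
from statistics import mean
from typing import Dict, List


def summarize_scores(scores: List[int], pass_threshold: int = 4) -> Dict[str, float]:
    """Aggregate per-example judge scores into a mean and a pass rate."""
    if not scores:
        raise ValueError("scores must not be empty")
    passed = sum(1 for s in scores if s >= pass_threshold)
    return {"mean_score": mean(scores), "pass_rate": passed / len(scores)}


def release_gate(summary: Dict[str, float], min_pass_rate: float = 0.9) -> bool:
    """Decide whether an eval run is good enough to ship."""
    return summary["pass_rate"] >= min_pass_rate


# Hypothetical scores from one eval run over a small test set.
summary = summarize_scores([5, 4, 4, 2, 5])
ok_to_ship = release_gate(summary)
```

Tracking a pass rate alongside the mean makes a single badly failing example harder to hide behind otherwise high scores.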

By providing open-source evaluation models, Atla AI contributes to a future with safe and reliable AI.

Best Alternative Tools to "Selene"

Browse AI

Browse AI: Extract web data, monitor changes, and turn websites into APIs without coding. AI-powered for easy and reliable data extraction.

web scraping
data extraction
昇思MindSpore

MindSpore is Huawei's open-source AI framework, offering automatic differentiation and parallelization so that a model trained once can be deployed across multiple scenarios. It is a deep learning training and inference framework covering device, edge, and cloud scenarios, used mainly in computer vision, natural language processing, and other AI fields by data scientists, algorithm engineers, and similar practitioners.

AI Framework
Deep Learning
Amanu

Build Telegram apps for AI startups fast. Chatbots, Mini Apps and AI infrastructure. From idea to MVP in 4 weeks.

Telegram
Chatbots
Mini Apps
EnergeticAI
No Image Available
167 0

EnergeticAI is TensorFlow.js optimized for serverless functions, offering fast cold starts, a small module size, and pre-trained models, making AI in Node.js apps accessible and up to 67x faster.

serverless AI
node.js
tensorflow.js
Rowy

Rowy is an open-source, Airtable-like CMS for Firestore with a low-code platform for Firebase and Google Cloud. Manage your database, build backend cloud functions, and automate workflows effortlessly.

low-code
firebase backend
Focus Gulf

Focus Gulf is a leading supplier of industrial equipment and spare parts in Saudi Arabia. Discover quality products tailored to your business needs, including pumps, generators, and testing tools.

industrial equipment
spare parts
DomainScore.ai

DomainScore.ai is an AI-powered tool providing comprehensive domain name evaluation and scoring based on relevance, brandability, trustworthiness, SEO, and simplicity.

domain analysis
SEO domain
Visage Technologies

Visage Technologies specializes in AI/ML solutions, offering consultancy and engineering services optimized for performance, accuracy, and compliance. Experts in edge AI and computer vision.

computer vision
edge AI
CopyFrog

CopyFrog is an AI content creator that generates high-quality images, text, and video content for marketing, social media, and product descriptions. Try it for free!

AI content creation
image generation
KushoAI

KushoAI transforms your inputs into a comprehensive ready-to-run test suite. Test web interfaces and backend APIs in minutes with our AI Agents.

AI testing
test automation
AI agent
myGPTReader

myGPTReader: An AI chatbot for reading and summarizing web pages, documents, and YouTube videos, powered by ChatGPT.

AI chatbot
ChatGPT
document reader
Refact.ai

Refact.ai, the #1 open-source AI agent for software development, automates coding, debugging, and testing with full context awareness. An open-source alternative to Cursor and Copilot.

AI coding assistant
code generation
Predibase

Predibase is a developer platform for fine-tuning and serving open-source LLMs. Achieve unmatched accuracy and speed with end-to-end training and serving infrastructure, featuring reinforcement fine-tuning.

LLM
fine-tuning
model serving
Molmo AI

Discover Molmo AI, the state-of-the-art open-source multimodal AI model. Powerful, free, and easy to use for image processing, text analysis, and more.

multimodal
AI model
open-source
CloudVerse AI

CloudVerse.AI is a cloud financial management platform for multicloud FinOps, optimizing spending with AI-driven insights.

FinOps
cloud cost management