Gremlin · by EvalOps
Documentation

Build evaluation‑native AI with Gremlin

Everything you need to integrate Gremlin into your AI workflows. From quickstart guides to advanced evaluation patterns, our documentation helps you build reliable, measurable AI systems with confidence.
Getting started

From zero to evaluation in minutes

Get up and running with Gremlin in a few steps. The quickstart guide walks you through installation, basic configuration, and your first evaluation.
01 ::
5‑minute quickstart
Install the SDK, configure your first evaluator, and run evaluations on sample data.
02 ::
Interactive tutorials
Step-by-step walkthroughs with executable code examples and real evaluation scenarios.
03 ::
Example projects
Complete reference implementations for common AI use cases and evaluation patterns.
04 ::
Migration guides
Move from existing evaluation frameworks to Gremlin with step-by-step migration paths.
Core concepts

Understanding evaluation‑native AI

Master the fundamental concepts that make Gremlin different. Understanding these building blocks will help you design more effective evaluation strategies and build more reliable AI systems.
01 ::
Evaluation primitives
Learn about assertions, judges, golden sets, and how they compose into powerful evaluation workflows.
02 ::
Agent control loops
Understand how evaluations integrate directly into your agent's decision-making process.
03 ::
Metrics and scoring
Design meaningful metrics that capture accuracy, safety, cost, and performance dimensions.
04 ::
Guardrails and policies
Implement safety checks, fallback strategies, and policy enforcement at every step.
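
To make the primitives above concrete, here is a minimal sketch in plain Python of how an assertion, a judge, and a golden set might compose into one evaluation run. It does not use the Gremlin SDK: every name in it (`GoldenExample`, `assert_non_empty`, `overlap_judge`, `evaluate`, `stub_model`) is a hypothetical stand-in chosen for illustration, and the token-overlap judge is a deliberately simple placeholder for a real scoring method.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical names throughout: this illustrates the concepts,
# it is not the Gremlin SDK.


@dataclass(frozen=True)
class GoldenExample:
    """One entry of a golden set: an input and its reference answer."""
    prompt: str
    expected: str


def assert_non_empty(output: str) -> bool:
    """Assertion: a hard pass/fail check on a single output."""
    return bool(output.strip())


def overlap_judge(output: str, expected: str) -> float:
    """Judge: a graded score in [0, 1], here the share of reference tokens found."""
    out_tokens = set(output.lower().split())
    exp_tokens = set(expected.lower().split())
    if not exp_tokens:
        return 0.0
    return len(out_tokens & exp_tokens) / len(exp_tokens)


def evaluate(
    model: Callable[[str], str],
    golden_set: List[GoldenExample],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Run the model over the golden set and aggregate pass rate and mean score."""
    scores = []
    for example in golden_set:
        output = model(example.prompt)
        # A failed assertion short-circuits the judge and scores zero.
        if assert_non_empty(output):
            scores.append(overlap_judge(output, example.expected))
        else:
            scores.append(0.0)
    total = len(scores)
    if total == 0:
        return {"pass_rate": 0.0, "mean_score": 0.0}
    passed = sum(1 for s in scores if s >= threshold)
    return {"pass_rate": passed / total, "mean_score": sum(scores) / total}


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call."""
    return "Paris is the capital of France" if "France" in prompt else ""


GOLDEN_SET = [
    GoldenExample("What is the capital of France?", "Paris"),
    GoldenExample("What is the capital of Peru?", "Lima"),
]

report = evaluate(stub_model, GOLDEN_SET)
print(report)
```

The split between a boolean assertion and a graded judge is a design choice in this sketch: cheap hard checks run first, and only outputs that survive them are scored.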
API Reference

Comprehensive API documentation

Complete reference documentation for all Gremlin APIs, SDKs, and integration points. Explore endpoints, authentication, request/response formats, and error handling with interactive examples.
SDKs & Tools

Language‑specific guides

Use Gremlin in your preferred programming language. Our SDKs provide idiomatic interfaces that feel natural while maintaining consistency across different development environments.
01 ::
Python SDK
Native Python integration with async support, type hints, and Jupyter notebook compatibility.
02 ::
TypeScript/JavaScript
Full-featured Node.js and browser support with TypeScript definitions and React hooks.
03 ::
Go SDK
High-performance Go client with context support, structured logging, and middleware patterns.
04 ::
REST API
Language-agnostic HTTP API with OpenAPI specifications and comprehensive curl examples.
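
As an illustration of why async support matters for evaluation workloads, the sketch below runs several checks concurrently with Python's standard `asyncio` module while capping the number of in-flight model calls. It is an assumption-laden example, not the Gremlin Python SDK: `fake_model`, `evaluate_all`, and `is_uppercased` are hypothetical names, and the model call is simulated with a short sleep.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a network call to a model."""
    await asyncio.sleep(0.01)
    return prompt.upper()


def is_uppercased(prompt: str, output: str) -> bool:
    """Toy check: the output should be the upper-cased prompt."""
    return output == prompt.upper()


async def evaluate_all(
    prompts: List[str],
    model: Callable[[str], Awaitable[str]],
    check: Callable[[str, str], bool],
    max_concurrency: int = 4,
) -> List[bool]:
    """Evaluate every prompt concurrently, with a cap on in-flight model calls."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def evaluate_one(prompt: str) -> bool:
        async with semaphore:
            output = await model(prompt)
        return check(prompt, output)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(evaluate_one(p) for p in prompts))


results = asyncio.run(
    evaluate_all(["hello", "world", "gremlin"], fake_model, is_uppercased)
)
print(results)
```

In a Jupyter notebook, which already runs an event loop, you would write `await evaluate_all(...)` in a cell instead of calling `asyncio.run`.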
Integration patterns

Connect with your existing stack

Gremlin integrates seamlessly with popular AI frameworks and tools. Learn how to add evaluation-native capabilities to your existing workflows without major architectural changes.
01 ::
LangChain integration
Drop-in evaluators for LangChain agents with automatic chain instrumentation and callback handling.
02 ::
LlamaIndex support
Evaluate retrieval quality, response relevance, and query performance in RAG applications.
03 ::
MLOps platforms
Connect with MLflow, Weights & Biases, Neptune, and other experiment tracking platforms.
04 ::
CI/CD pipelines
Automated evaluation in GitHub Actions, Jenkins, and other CI systems with detailed reporting.
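
For CI/CD pipelines, one common pattern is a small gate script that turns evaluation results into a process exit code, since CI systems such as GitHub Actions and Jenkins fail a step when a command exits with a nonzero status. The sketch below assumes a hypothetical result format (a list of records with a boolean `passed` field) and a hypothetical `gate` helper. It is not Gremlin's reporting format or API.

```python
import sys
from typing import Dict, List


def gate(results: List[Dict], min_pass_rate: float) -> int:
    """Return a process exit code: 0 if the pass rate meets the threshold, else 1."""
    if not results:
        # Treat a missing or empty result set as a failure rather than a pass.
        print("no evaluation results found", file=sys.stderr)
        return 1
    passed = sum(1 for record in results if record["passed"])
    pass_rate = passed / len(results)
    print(
        f"pass rate: {pass_rate:.1%} ({passed}/{len(results)}), "
        f"required: {min_pass_rate:.1%}"
    )
    return 0 if pass_rate >= min_pass_rate else 1


# Hypothetical results, as they might be loaded from a JSON report.
sample_results = [
    {"case": "greeting", "passed": True},
    {"case": "refund-policy", "passed": True},
    {"case": "jailbreak-attempt", "passed": False},
]

exit_code = gate(sample_results, min_pass_rate=0.9)
# In a CI job, the script would finish with: sys.exit(exit_code)
```

Failing on an empty result set is a deliberate choice in this sketch, so that a broken evaluation step cannot silently pass the pipeline.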
Advanced topics

Expert‑level evaluation techniques

Take a deep dive into advanced evaluation patterns for production AI systems. Learn how to handle edge cases, scale to enterprise workloads, and maintain evaluation quality as your system grows.
01 ::
Custom evaluators
Build domain-specific evaluators with custom metrics, scoring functions, and validation logic.
02 ::
Distributed evaluation
Scale evaluations across multiple workers with load balancing and fault tolerance.
03 ::
A/B testing frameworks
Statistical significance testing, experiment design, and gradual rollout strategies.
04 ::
Production monitoring
Real-time evaluation monitoring, alerting, and automated incident response workflows.
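
The statistical significance testing mentioned above can be illustrated with a two-proportion z-test, one standard way to compare the pass rates of two variants. This sketch uses only the Python standard library. The function name `two_proportion_z_test` and the sample counts are hypothetical, and the source does not say which test Gremlin itself applies.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    passes_a: int, total_a: int, passes_b: int, total_b: int
) -> Tuple[float, float]:
    """Compare two pass rates; return (z statistic, two-sided p-value)."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("both variants need at least one evaluated sample")
    rate_a = passes_a / total_a
    rate_b = passes_b / total_b
    # Pooled pass rate under the null hypothesis that both variants are equal.
    pooled = (passes_a + passes_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        # Every sample passed, or every sample failed: no detectable difference.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A passes 160 of 200 cases, variant B 176 of 200.
z, p_value = two_proportion_z_test(160, 200, 176, 200)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference in pass rates is unlikely to be noise. The normal approximation behind this test is only reasonable when each variant has a fairly large number of samples.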
Use cases & examples

Real‑world applications

Explore detailed examples of how teams use Gremlin to solve real evaluation challenges. Each use case includes complete code examples, evaluation strategies, and lessons learned from production deployments.
Code examples

See Gremlin in action

Interactive code examples you can run locally or in our playground. Start with simple evaluations and progress to complex multi-step agent workflows with full observability.

Explore the documentation

Complete guides are coming soon. Join our waitlist for early access.

Quickstart Guide (Tutorial, coming soon)
Get up and running with your first evaluation in under 5 minutes.

API Reference (Reference, coming soon)
Complete API documentation with interactive examples and schemas.

Python SDK (SDK Guide, coming soon)
Native Python integration with async support and type safety.

JavaScript SDK (SDK Guide, coming soon)
Full-featured Node.js and browser support with TypeScript definitions.

LangChain Integration (Integration, coming soon)
Add evaluation to your LangChain agents with drop-in evaluators.

Production Deployment (Guide, coming soon)
Best practices for deploying evaluations in production environments.
Community

Get help when you need it

Join our community of developers building evaluation-native AI. Get help with implementation, share best practices, and collaborate on the future of AI evaluation.
Discord community
Coming soon: join developers discussing AI evaluation
GitHub discussions
Coming soon: ask questions and share knowledge

Documentation stats

150+ code examples
50+ integration guides
25+ use case studies
99% uptime