Collections: Organization Shared Models

Overview

Property Value
Model ID collection/org-shared-models
Display Name Organization Shared Models
Type Organization Collection
Access Method collection/org-shared-models
Scope Organization-level
Routing Strategy Least Used

Description

The Organization Shared Models collection is a flexible, team-managed pool that lets an organization share language models across all of its members. It uses a least-used routing strategy: each request is automatically routed to the member model with the lowest current usage, which optimizes resource utilization and prevents bottlenecks.

Key Characteristics

  • Team-Managed: Organization admins control which models are included
  • Smart Distribution: Least-used routing prevents model saturation
  • Flexible Composition: Mix of different providers and model types
  • Usage Optimization: Real-time load balancing based on request volume
  • Shared Resources: All organization members can access the pool
  • Cost Control: Organization-level quota management applies

Specifications

Aspect Details
Collection Type Organization-level
Access Level Organization members only
Scope organization
Routing Strategy Least Used (lowest current usage)
Visibility Organization (shared with all org members)
Member Count 1 or more models
Access Control Role-based (admin can modify members)
Quota Tracking Organization-wide usage

Use Cases

1. Team-Based Model Selection

  • Shared model pool for development team
  • Collaborative inference across departments
  • Democratic access to premium models
  • Cost-shared model subscriptions

2. Distributed Workload Balancing

  • Prevent bottlenecks on high-demand models
  • Automatic load distribution
  • Smooth out traffic spikes
  • Optimize cost per request

3. Shared Resource Optimization

  • Maximize model utilization
  • Reduce idle time across the pool
  • Fair resource allocation
  • Efficient team productivity

4. Organization-Wide Inference Scaling

  • Scale inference capacity through model diversity
  • Add/remove models as needs change
  • Balance between quality and cost
  • Support multiple use cases with one collection

5. A/B Testing and Experimentation

  • Test multiple models in production
  • Gradually shift load between models
  • Compare model performance at scale
  • Data-driven model selection

Characteristics of Included Models

Models in organization shared collections typically include:

Aspect Details
Model Diversity Mix of providers (OpenAI, Google, Anthropic, etc.)
Size Variety From 7B lightweight to 405B powerhouse models
Capability Range Text-only to multimodal models
Cost Profile Mix of budget and premium models
Context Windows Varied (4K to 200K tokens)
Specializations General purpose, code, reasoning, vision

Model Selection Strategy

Least Used Routing

The organization shared models collection uses least-used routing to optimize load:

  1. Usage Tracking: Maintains request count per model
  2. Live Selection: At each request, selects model with lowest usage
  3. Load Balancing: Automatically distributes incoming requests
  4. Fair Distribution: Prevents any single model from becoming saturated
  5. Dynamic: Adapts to real-time usage patterns

Selection Algorithm

For each request (a code sketch follows this list):
  1. Query current usage for all collection members
  2. Filter out unavailable/rate-limited models
  3. Select model with minimum usage count
  4. If tied: Use secondary ordering (trust_level, cost)
  5. Route request to selected model
  6. Increment usage counter for tracking
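
A minimal TypeScript sketch of this selection loop is shown below. The member record fields (usageCount, trustLevel, costPer1kTokens, available) are illustrative assumptions rather than the gateway's actual schema; see /gateway-type1/lib/services/model-collection-router.ts for the real implementation.

// Hypothetical member record; field names are illustrative, not the gateway's schema.
interface CollectionMember {
    modelId: string;
    usageCount: number;       // requests routed to this model in the tracking window
    trustLevel: number;       // secondary ordering: higher is preferred on ties
    costPer1kTokens: number;  // tertiary ordering: lower is preferred on ties
    available: boolean;       // false when rate limited, unhealthy, or over quota
}

// Select the least-used available member, breaking ties by trust level, then cost.
function selectLeastUsed(members: CollectionMember[]): CollectionMember | null {
    const candidates = members.filter(m => m.available);
    if (candidates.length === 0) return null;  // caller maps this to collection_no_models

    candidates.sort((a, b) =>
        a.usageCount - b.usageCount ||
        b.trustLevel - a.trustLevel ||
        a.costPer1kTokens - b.costPer1kTokens
    );

    const selected = candidates[0];
    selected.usageCount += 1;  // step 6: increment the usage counter for tracking
    return selected;
}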

Example Load Distribution

Collection with 3 models: [ModelA, ModelB, ModelC]

Initial state:
  ModelA usage: 0
  ModelB usage: 0
  ModelC usage: 0

Request 1 → ModelA (usage: 0) → ModelA usage now 1
Request 2 → ModelB (usage: 0) → ModelB usage now 1
Request 3 → ModelC (usage: 0) → ModelC usage now 1
Request 4 → ModelA (usage: 1) → ModelA usage now 2
Request 5 → ModelB (usage: 1) → ModelB usage now 2
Request 6 → ModelC (usage: 1, now the lowest) → ModelC usage now 2

Result: Even distribution across all models

Dynamic Rebalancing

  • Continuous monitoring of model usage
  • Weights adjusted based on recent requests
  • Prefers underutilized models
  • Falls back to secondary ordering if tied
  • Handles model unavailability gracefully (see the sketch below)
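
The unavailability handling can be sketched as a retry loop that excludes a failed member and reselects. This is a hedged illustration reusing the CollectionMember type and selectLeastUsed helper from the sketch above; callModel is a placeholder for the actual upstream request, not a real SDK function.

// Hypothetical dispatch loop: try the least-used member, exclude it on failure, retry.
async function routeWithFallback(
    members: CollectionMember[],
    callModel: (modelId: string) => Promise<string>
): Promise<{ modelId: string; output: string }> {
    const pool = [...members];

    while (true) {
        const selected = selectLeastUsed(pool);
        if (selected === null) {
            throw new Error('collection_no_models: no available members');
        }
        try {
            const output = await callModel(selected.modelId);
            return { modelId: selected.modelId, output };
        } catch {
            // Mark this member unavailable for the request and pick the next least-used one.
            selected.available = false;
        }
    }
}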

Usage

Basic Request

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "collection/org-shared-models",
    "messages": [
      {"role": "user", "content": "Analyze this dataset"}
    ],
    "temperature": 0.7
  }'

With Vision (if available in collection)

curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "collection/org-shared-models",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ]
  }'

Batch Request with Monitoring

async function batchRequest(prompts) {
    const results = [];

    for (const prompt of prompts) {
        const response = await fetch('https://api.langmart.ai/v1/chat/completions', {
            method: 'POST',
            headers: {
                'Authorization': 'Bearer sk-your-api-key',
                'Content-Type': 'application/json'
            },
            body: JSON.stringify({
                model: 'collection/org-shared-models',
                messages: [{ role: 'user', content: prompt }]
            })
        });

        const data = await response.json();
        results.push({
            prompt,
            response: data.choices[0].message.content,
            selected_model: data._selected_model,
            tokens: data.usage.total_tokens
        });
    }

    return results;
}

Collection Management

Viewing Collection Details

SELECT
    id,
    collection_name,
    display_name,
    description,
    scope,
    routing_strategy,
    organization_id,
    created_at,
    updated_at
FROM model_collections
WHERE collection_name = 'org-shared-models'
  AND is_active = true;

Viewing Collection Members with Usage

SELECT
    mcm.id,
    mc.category_display_id as model_id,
    mc.model_name,
    mc.provider,
    mcm.priority,
    mcm.weight,
    mcm.is_active,
    -- Estimate usage from recent requests
    COUNT(rl.id) as recent_requests_24h
FROM model_collection_members mcm
JOIN model_categories mc ON mcm.model_category_id = mc.id
LEFT JOIN request_logs rl ON (
    rl.response_data->>'_selected_model' = mc.category_display_id
    AND rl.created_at > NOW() - INTERVAL '24 hours'
)
WHERE mcm.collection_id = (
    SELECT id FROM model_collections
    WHERE collection_name = 'org-shared-models'
)
GROUP BY mcm.id, mc.category_display_id, mc.model_name, mc.provider, mcm.priority, mcm.weight, mcm.is_active
ORDER BY recent_requests_24h DESC;

API: List Collection Models

curl -X GET https://api.langmart.ai/api/user/model-collections/COLLECTION_ID \
  -H "Authorization: Bearer sk-your-api-key" | jq '.models'

API: Add Model to Collection

curl -X POST https://api.langmart.ai/api/user/model-collections/COLLECTION_ID/members \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "anthropic/claude-3-opus"
  }'

API: Remove Model from Collection

curl -X DELETE "https://api.langmart.ai/api/user/model-collections/COLLECTION_ID/members/anthropic%2Fclaude-3-opus" \
  -H "Authorization: Bearer sk-your-api-key"

API: Update Collection Settings

curl -X PUT https://api.langmart.ai/api/user/model-collections/COLLECTION_ID \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "Premium Shared Models",
    "description": "High-quality models for production use",
    "routing_strategy": "least_used"
  }'

Response Format

Responses include collection and load-balancing metadata:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Based on the dataset analysis..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "total_tokens": 350
  },
  "_collection_routed": true,
  "_collection_name": "org-shared-models",
  "_selected_model": "openai/gpt-4o",
  "_routing_strategy": "least_used",
  "_usage_at_selection": 45,
  "_model_alternatives": [
    {"model": "anthropic/claude-3-opus", "usage": 62},
    {"model": "google/gemini-2.5-pro", "usage": 51}
  ]
}
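
For client-side monitoring, the underscore-prefixed metadata can be read straight from the JSON body. The TypeScript sketch below mirrors the example response above; treat the field list as illustrative rather than a guaranteed contract.

// Shape of the load-balancing metadata shown in the example response.
interface CollectionRoutedResponse {
    model: string;
    usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
    _collection_routed?: boolean;
    _collection_name?: string;
    _selected_model?: string;
    _routing_strategy?: string;
    _usage_at_selection?: number;
    _model_alternatives?: { model: string; usage: number }[];
}

// Tally how many responses each underlying model served, e.g. to spot-check
// the least-used distribution from the client side.
function tallySelectedModels(responses: CollectionRoutedResponse[]): Record<string, number> {
    const counts: Record<string, number> = {};
    for (const r of responses) {
        const model = r._selected_model ?? r.model;
        counts[model] = (counts[model] ?? 0) + 1;
    }
    return counts;
}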

Performance Characteristics

Metric Target
Selection Latency <2ms
Load Balancing Efficiency ±10% across models
P50 Response Time <3 seconds (varies by model)
P95 Response Time <8 seconds (varies by model)
Usage Tracking Accuracy Real-time (within 100ms)
Cache TTL 10 seconds (usage-based, always fresh)
Request Success Rate >99% (with fallback)

Load Balancing Example

Scenario: 3-Model Collection

Scenario Setup:
  - ModelA: Premium, expensive, very capable
  - ModelB: Mid-tier, balanced cost/quality
  - ModelC: Budget, lightweight, fast

Over 1 hour with 1000 requests:

Without load balancing (users pick ModelA):
  - ModelA: 900 requests
  - ModelB: 100 requests
  - ModelC: 0 requests
  → ModelA overloaded, expensive, inefficient

With least-used routing:
  - ModelA: 333 requests
  - ModelB: 333 requests
  - ModelC: 334 requests
  → Balanced usage, fair distribution, cost-optimized

Typical Collection Composition

A typical organization shared models collection might include the following mix (a seeding sketch follows the list):

1. openai/gpt-4o
   - High capability, premium tier
   - Used for critical/complex tasks

2. anthropic/claude-3-opus
   - Excellent reasoning, long context (200K)
   - Research and analysis tasks

3. google/gemini-2.5-pro
   - Multimodal, fast, cost-effective
   - General purpose tasks

4. mistralai/mistral-large
   - Efficient, good quality for the cost
   - General conversation

5. groq/llama-3.3-70b-versatile
   - Fast inference, free option
   - High-volume requests
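
A composition like this can be seeded with repeated calls to the add-member endpoint shown earlier. The sketch below assumes that endpoint accepts one model_id per POST, as in the "API: Add Model to Collection" example; COLLECTION_ID and the API key are placeholders.

// Hypothetical seeding script for the composition listed above.
const API_BASE = 'https://api.langmart.ai/api/user/model-collections';
const COLLECTION_ID = 'COLLECTION_ID';   // placeholder
const API_KEY = 'sk-your-api-key';       // placeholder

const seedModels: string[] = [
    'openai/gpt-4o',
    'anthropic/claude-3-opus',
    'google/gemini-2.5-pro',
    'mistralai/mistral-large',
    'groq/llama-3.3-70b-versatile'
];

async function seedCollection(): Promise<void> {
    for (const modelId of seedModels) {
        const res = await fetch(`${API_BASE}/${COLLECTION_ID}/members`, {
            method: 'POST',
            headers: {
                'Authorization': `Bearer ${API_KEY}`,
                'Content-Type': 'application/json'
            },
            body: JSON.stringify({ model_id: modelId })
        });
        if (!res.ok) {
            console.error(`Failed to add ${modelId}: HTTP ${res.status}`);
        }
    }
}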

Usage Metrics and Analytics

Request Distribution (Last 24h)

SELECT
    response_data->>'_selected_model' as model,
    COUNT(*) as request_count,
    ROUND(AVG(CAST(response_data->>'_latency_ms' AS NUMERIC)), 2) as avg_latency,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 2) as percentage
FROM request_logs
WHERE request_data->>'model' = 'collection/org-shared-models'
  AND created_at > NOW() - INTERVAL '24 hours'
GROUP BY response_data->>'_selected_model'
ORDER BY request_count DESC;

Cost Analysis by Model

SELECT
    response_data->>'_selected_model' as model,
    COUNT(*) as requests,
    SUM(CAST(response_data->>'total_tokens' AS INTEGER)) as total_tokens,
    ROUND(SUM(CAST(response_data->>'_cost' AS NUMERIC)), 2) as total_cost,
    ROUND(AVG(CAST(response_data->>'_cost' AS NUMERIC)), 4) as avg_cost_per_request
FROM request_logs
WHERE request_data->>'model' = 'collection/org-shared-models'
  AND created_at > NOW() - INTERVAL '7 days'
GROUP BY response_data->>'_selected_model'
ORDER BY total_cost DESC;

Load Balance Efficiency

WITH model_usage AS (
    SELECT
        response_data->>'_selected_model' as model,
        COUNT(*) as request_count,
        ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 2) as percentage
    FROM request_logs
    WHERE request_data->>'model' = 'collection/org-shared-models'
      AND created_at > NOW() - INTERVAL '24 hours'
    GROUP BY response_data->>'_selected_model'
)
SELECT
    COUNT(*) as total_models,
    ROUND(AVG(percentage), 2) as ideal_percentage,
    ROUND(STDDEV(percentage), 2) as distribution_stddev,
    MAX(percentage) as max_percentage,
    MIN(percentage) as min_percentage
FROM model_usage;

Error Handling

Collection Not Found

{
  "error": {
    "code": "collection_not_found",
    "message": "Collection 'org-shared-models' not found or access denied"
  }
}

No Available Models

{
  "error": {
    "code": "collection_no_models",
    "message": "No models available in org-shared-models collection",
    "details": {
      "total_members": 5,
      "available_models": 0,
      "reasons": [
        "model-1: rate limited until 2025-12-28T14:30:00Z",
        "model-2: temporarily unavailable",
        "model-3: quota exceeded",
        "model-4: user access denied",
        "model-5: health check failed"
      ]
    }
  }
}

Partial Availability with Fallback

{
  "success": true,
  "model": "collection/org-shared-models",
  "_original_selection": "openai/gpt-4o",
  "_fallback_used": true,
  "_fallback_reason": "rate_limited",
  "_selected_model": "anthropic/claude-3-opus",
  "choices": [...]
}

Access Denied for Organization

{
  "error": {
    "code": "access_denied",
    "message": "Your organization does not have access to org-shared-models collection",
    "required_scope": "organization",
    "user_scope": "individual"
  }
}
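
A caller can branch on the error codes and fallback fields shown above. The TypeScript sketch below assumes the gateway returns these exact codes and that non-2xx responses carry the error object, as in the examples.

// Minimal error-aware caller for the collection, branching on the documented codes.
async function askCollection(prompt: string): Promise<string> {
    const res = await fetch('https://api.langmart.ai/v1/chat/completions', {
        method: 'POST',
        headers: {
            'Authorization': 'Bearer sk-your-api-key',
            'Content-Type': 'application/json'
        },
        body: JSON.stringify({
            model: 'collection/org-shared-models',
            messages: [{ role: 'user', content: prompt }]
        })
    });

    const body = await res.json();

    if (!res.ok) {
        switch (body.error?.code) {
            case 'collection_not_found':
            case 'access_denied':
                throw new Error(`Configuration problem: ${body.error.message}`);
            case 'collection_no_models':
                // Every member is unavailable; surface the per-model reasons for operators.
                throw new Error(`No capacity: ${(body.error.details?.reasons ?? []).join('; ')}`);
            default:
                throw new Error(body.error?.message ?? `HTTP ${res.status}`);
        }
    }

    if (body._fallback_used) {
        console.warn(`Fell back from ${body._original_selection} (${body._fallback_reason})`);
    }

    return body.choices[0].message.content;
}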

Billing and Credits

  • Cost: Based on selected model (varies)
  • Tracking: Per-model usage tracked in request_logs
  • Attribution: Each request logs actual model used
  • Organization Quotas: Collection requests count toward organization monthly quota
  • Cost Reporting: Detailed breakdown by model and user
  • Shared Budget: Organization credits shared across all members

Limits and Constraints

Constraint Value
Min Models 1
Max Models Unlimited (practical: 10-20)
Name Length 100 characters
Description Length Text field (no limit)
Member Add/Remove Instant (no queue)
Request Rate Model-dependent (typically >100 req/s)
Concurrent Requests No limit at collection level
Cache Invalidation <1 second (usage-based)

Integration Examples

Example 1: Data Analysis Pipeline

import anthropic
import json

client = anthropic.Anthropic(
    api_key="sk-your-api-key",
    base_url="https://api.langmart.ai/v1"
)

def analyze_data(data_json):
    """Analyze data using org shared models collection"""
    message = client.messages.create(
        model="collection/org-shared-models",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"""Analyze this data and provide insights:

{json.dumps(data_json, indent=2)}

Provide:
1. Summary statistics
2. Trends or patterns
3. Recommendations"""
            }
        ]
    )

    return {
        "analysis": message.content[0].text,
        "selected_model": message.model,
        "tokens_used": message.usage.output_tokens + message.usage.input_tokens
    }

Example 2: Multi-Model Comparison

async function compareModels(prompt) {
    // Get collection info
    const collectionResponse = await fetch(
        'https://api.langmart.ai/api/user/model-collections',
        {
            headers: { 'Authorization': 'Bearer sk-your-api-key' }
        }
    );

    const collections = await collectionResponse.json();
    const orgSharedCollection = collections.collections.find(
        c => c.name === 'org-shared-models'
    );

    // Get all models in collection
    const collectionDetails = await fetch(
        `https://api.langmart.ai/api/user/model-collections/${orgSharedCollection.id}`,
        {
            headers: { 'Authorization': 'Bearer sk-your-api-key' }
        }
    );

    const details = await collectionDetails.json();
    const models = details.models || [];

    // Collect load metrics
    const loadMetrics = models.map(m => ({
        model: m.model_id || m.id,
        recentRequests: m.recent_requests_24h || 0,
        usage: m.usage_percentage || 0
    }));

    return {
        collectionName: 'org-shared-models',
        totalModels: models.length,
        loadDistribution: loadMetrics,
        recommendedAction: calculateLoadRecommendation(loadMetrics)
    };
}

// Example heuristic (not part of the API): flag the pool as unbalanced if any
// model handled more than twice its even share of recent requests.
function calculateLoadRecommendation(loadMetrics) {
    const total = loadMetrics.reduce((sum, m) => sum + m.recentRequests, 0);
    if (total === 0 || loadMetrics.length === 0) return 'no recent traffic';
    const evenShare = total / loadMetrics.length;
    const overloaded = loadMetrics.filter(m => m.recentRequests > 2 * evenShare);
    return overloaded.length > 0
        ? `consider adding capacity; overloaded: ${overloaded.map(m => m.model).join(', ')}`
        : 'distribution looks balanced';
}

Administration and Governance

Organizational Policies

  • Access Control: Only organization members can use the collection
  • Model Management: Organization admins manage collection membership
  • Cost Governance: Shared quota management applies
  • Usage Reporting: Detailed usage reports per team member
  • Audit Trail: All model add/remove operations logged

Best Practices

  1. Start Small: Begin with 3-5 models, expand as needed
  2. Model Mix: Include both premium and cost-effective models
  3. Monitoring: Regularly review usage distribution
  4. Rebalancing: Add/remove models based on actual usage patterns
  5. Documentation: Document which models serve which use cases
  6. Testing: Periodically benchmark models in production

Database Schema

See /datastore/tables/99_model_collections.sql for complete schema details.

Key Tables

  • model_collections: Collection metadata
  • model_collection_members: Collection membership and weights
  • model_categories: Available models
  • request_logs: Request history and load metrics
  • organizations: Organization details

Related Files

  • Collection Tools: /gateway-type3/collection-tools.ts
  • Model Collection Router: /gateway-type1/lib/services/model-collection-router.ts
  • Database Schema: /datastore/tables/99_model_collections.sql
  • API Endpoints: /api/user/model-collections/*

Monitoring Dashboard

Key metrics to monitor for organization shared models collection:

Real-Time Dashboard:
├── Load Distribution
│   ├── Model usage percentages
│   ├── Request count per model
│   └── Distribution fairness (std dev)
├── Performance
│   ├── P50, P95, P99 latencies
│   ├── Request success rate
│   └── Error rate by model
├── Costs
│   ├── Total cost (24h, 7d, 30d)
│   ├── Cost per request
│   └── Cost by model
└── Health
    ├── Model availability
    ├── Rate limit status
    └── Recent errors
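
The "Distribution fairness (std dev)" item can be computed directly from per-model request counts. The sketch below also checks an assumed reading of the ±10% Load Balancing Efficiency target from the performance table: every model stays within 10 percentage points of the even share.

// Compute each model's share of traffic, the standard deviation of those shares,
// and whether every share is within 10 percentage points of an even split.
function distributionFairness(requestCounts: Record<string, number>) {
    const counts = Object.values(requestCounts);
    const total = counts.reduce((a, b) => a + b, 0);
    if (total === 0 || counts.length === 0) {
        return { shares: [], ideal: 0, stddev: 0, withinTarget: true };
    }
    const shares = counts.map(c => (100 * c) / total);   // percentage per model
    const ideal = 100 / counts.length;                   // even-split target
    const mean = shares.reduce((a, b) => a + b, 0) / shares.length;
    const stddev = Math.sqrt(
        shares.reduce((acc, s) => acc + (s - mean) ** 2, 0) / shares.length
    );
    const withinTarget = shares.every(s => Math.abs(s - ideal) <= 10);
    return { shares, ideal, stddev, withinTarget };
}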

Version History

  • v1.0 (2025-12-20): Initial organization collection implementation
      - Least-used routing strategy implemented
      - Load balancing and usage tracking enabled
      - Collection membership API created
      - Organization-scoped access control added