Collections: Organization Shared Models
Overview
| Property | Value |
|---|---|
| Model ID | collection/org-shared-models |
| Display Name | Organization Shared Models |
| Type | Organization Collection |
| Access Method | collection/org-shared-models |
| Scope | Organization-level |
| Routing Strategy | Least Used |
Description
The Organization Shared Models collection is a flexible, team-managed collection that enables organizations to pool and share language models across all members. This collection uses a least-used routing strategy to optimize resource utilization and prevent bottlenecks by automatically distributing requests to the model with the lowest current usage.
Key Characteristics
- Team-Managed: Organization admins control which models are included
- Smart Distribution: Least-used routing prevents model saturation
- Flexible Composition: Mix of different providers and model types
- Usage Optimization: Real-time load balancing based on request volume
- Shared Resources: All organization members can access the pool
- Cost Control: Organization-level quota management applies
Specifications
| Aspect | Details |
|---|---|
| Collection Type | Organization-level |
| Access Level | Organization members only |
| Scope | organization |
| Routing Strategy | Least Used (lowest current usage) |
| Visibility | Organization (shared with all org members) |
| Member Count | 1 or more models |
| Access Control | Role-based (admin can modify members) |
| Quota Tracking | Organization-wide usage |
Use Cases
1. Team-Based Model Selection
- Shared model pool for development team
- Collaborative inference across departments
- Democratic access to premium models
- Cost-shared model subscriptions
2. Distributed Workload Balancing
- Prevent bottlenecks on high-demand models
- Automatic load distribution
- Smooth out traffic spikes
- Optimize cost per request
3. Shared Resource Optimization
- Maximize model utilization
- Reduce idle time across the pool
- Fair resource allocation
- Efficient team productivity
4. Organization-Wide Inference Scaling
- Scale inference capacity through model diversity
- Add/remove models as needs change
- Balance between quality and cost
- Support multiple use cases with one collection
5. A/B Testing and Experimentation
- Test multiple models in production
- Gradually shift load between models
- Compare model performance at scale
- Data-driven model selection
Characteristics of Included Models
Models in organization shared collections typically include:
| Aspect | Details |
|---|---|
| Model Diversity | Mix of providers (OpenAI, Google, Anthropic, etc.) |
| Size Variety | From 7B lightweight to 405B powerhouse models |
| Capability Range | Text-only to multimodal models |
| Cost Profile | Mix of budget and premium models |
| Context Windows | Varied (4K to 200K tokens) |
| Specializations | General purpose, code, reasoning, vision |
Model Selection Strategy
Least Used Routing
The organization shared models collection uses least-used routing to optimize load:
- Usage Tracking: Maintains request count per model
- Live Selection: At each request, selects model with lowest usage
- Load Balancing: Automatically distributes incoming requests
- Fair Distribution: Prevents any single model from becoming saturated
- Dynamic: Adapts to real-time usage patterns
Selection Algorithm
For each request:
1. Query current usage for all collection members
2. Filter out unavailable/rate-limited models
3. Select model with minimum usage count
4. If tied: Use secondary ordering (trust_level, cost)
5. Route request to selected model
6. Increment usage counter for tracking
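The steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the field names (`trust_level`, `cost`) and the in-memory usage dict are assumptions standing in for the platform's actual schema and tracking store.

```python
def select_least_used(models, usage, unavailable=frozenset()):
    """Pick the available model with the lowest usage count.

    Ties are broken by the secondary ordering from step 4:
    higher trust_level first, then lower cost.

    models: list of dicts with 'id', 'trust_level', 'cost' (assumed fields).
    usage: dict mapping model id -> current request count.
    unavailable: ids filtered out in step 2 (rate limited, unhealthy, ...).
    """
    candidates = [m for m in models if m["id"] not in unavailable]
    if not candidates:
        raise RuntimeError("collection_no_models: no available members")
    chosen = min(
        candidates,
        key=lambda m: (usage.get(m["id"], 0), -m["trust_level"], m["cost"]),
    )
    usage[chosen["id"]] = usage.get(chosen["id"], 0) + 1  # step 6: track usage
    return chosen["id"]
```

Because the usage counter is incremented on every selection, repeated calls naturally rotate through the members, which is what produces the even distribution shown in the example below.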
Example Load Distribution
```
Collection with 3 models: [ModelA, ModelB, ModelC]

Initial state:
  ModelA usage: 0
  ModelB usage: 0
  ModelC usage: 0

Request 1 → ModelA (usage 0) → ModelA usage now 1
Request 2 → ModelB (usage 0) → ModelB usage now 1
Request 3 → ModelC (usage 0) → ModelC usage now 1
Request 4 → ModelA (usage 1) → ModelA usage now 2
Request 5 → ModelB (usage 1) → ModelB usage now 2
Request 6 → ModelC (usage 1) → ModelC usage now 2

Result: even distribution across all models
```
Dynamic Rebalancing
- Continuous monitoring of model usage
- Weights adjusted based on recent requests
- Prefers underutilized models
- Falls back to secondary ordering if tied
- Handles model unavailability gracefully
Usage
Basic Request
```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "collection/org-shared-models",
    "messages": [
      {"role": "user", "content": "Analyze this dataset"}
    ],
    "temperature": 0.7
  }'
```
With Vision (if available in collection)
```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "collection/org-shared-models",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ]
  }'
```
Batch Request with Monitoring
```javascript
async function batchRequest(prompts) {
  const results = [];
  for (const prompt of prompts) {
    const response = await fetch('https://api.langmart.ai/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer sk-your-api-key',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'collection/org-shared-models',
        messages: [{ role: 'user', content: prompt }]
      })
    });
    const data = await response.json();
    results.push({
      prompt,
      response: data.choices[0].message.content,
      selected_model: data._selected_model,
      tokens: data.usage.total_tokens
    });
  }
  return results;
}
```
Collection Management
Viewing Collection Details
```sql
SELECT
  id,
  collection_name,
  display_name,
  description,
  scope,
  routing_strategy,
  organization_id,
  created_at,
  updated_at
FROM model_collections
WHERE collection_name = 'org-shared-models'
  AND is_active = true;
```
Viewing Collection Members with Usage
```sql
SELECT
  mcm.id,
  mc.category_display_id as model_id,
  mc.model_name,
  mc.provider,
  mcm.priority,
  mcm.weight,
  mcm.is_active,
  -- Estimate usage from recent requests
  COUNT(rl.id) as recent_requests_24h
FROM model_collection_members mcm
JOIN model_categories mc ON mcm.model_category_id = mc.id
LEFT JOIN request_logs rl ON (
  rl.response_data->>'_selected_model' = mc.category_display_id
  AND rl.created_at > NOW() - INTERVAL '24 hours'
)
WHERE mcm.collection_id = (
  SELECT id FROM model_collections
  WHERE collection_name = 'org-shared-models'
)
GROUP BY mcm.id, mc.category_display_id, mc.model_name, mc.provider,
         mcm.priority, mcm.weight, mcm.is_active
ORDER BY recent_requests_24h DESC;
```
API: List Collection Models
```bash
curl -X GET https://api.langmart.ai/api/user/model-collections/COLLECTION_ID \
  -H "Authorization: Bearer sk-your-api-key" | jq '.models'
```
API: Add Model to Collection
```bash
curl -X POST https://api.langmart.ai/api/user/model-collections/COLLECTION_ID/members \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "anthropic/claude-3-opus"
  }'
```
API: Remove Model from Collection
```bash
curl -X DELETE "https://api.langmart.ai/api/user/model-collections/COLLECTION_ID/members/anthropic%2Fclaude-3-opus" \
  -H "Authorization: Bearer sk-your-api-key"
```
API: Update Collection Settings
```bash
curl -X PUT https://api.langmart.ai/api/user/model-collections/COLLECTION_ID \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "Premium Shared Models",
    "description": "High-quality models for production use",
    "routing_strategy": "least_used"
  }'
```
Response Format
Responses include collection and load-balancing metadata:
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Based on the dataset analysis..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "total_tokens": 350
  },
  "_collection_routed": true,
  "_collection_name": "org-shared-models",
  "_selected_model": "openai/gpt-4o",
  "_routing_strategy": "least_used",
  "_usage_at_selection": 45,
  "_model_alternatives": [
    {"model": "anthropic/claude-3-opus", "usage": 62},
    {"model": "google/gemini-2.5-pro", "usage": 51}
  ]
}
```
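For clients that want to work with these underscore-prefixed fields, a pair of small helpers could separate them from the standard completion payload. This is a minimal sketch assuming the response shape shown above; the helper names are illustrative, not part of the platform SDK.

```python
def routing_metadata(response: dict) -> dict:
    """Return only the collection/load-balancing fields (keys starting
    with '_') from a chat completion response."""
    return {k: v for k, v in response.items() if k.startswith("_")}


def least_loaded_alternative(response: dict):
    """Return the alternative member with the lowest reported usage,
    or None when no alternatives were included."""
    alts = response.get("_model_alternatives", [])
    return min(alts, key=lambda a: a["usage"])["model"] if alts else None
```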
Performance Characteristics
| Metric | Target |
|---|---|
| Selection Latency | <2ms |
| Load Balancing Efficiency | ±10% across models |
| P50 Response Time | <3 seconds (varies by model) |
| P95 Response Time | <8 seconds (varies by model) |
| Usage Tracking Accuracy | Real-time (within 100ms) |
| Cache TTL | 10 seconds (usage-based, always fresh) |
| Request Success Rate | >99% (with fallback) |
Load Balancing Example
Scenario: 3-Model Collection
Setup:
- ModelA: Premium, expensive, very capable
- ModelB: Mid-tier, balanced cost/quality
- ModelC: Budget, lightweight, fast
Over 1 hour with 1000 requests:
```
Without load balancing (users pick ModelA):
  ModelA: 900 requests
  ModelB: 100 requests
  ModelC:   0 requests
  → ModelA overloaded, expensive, inefficient

With least-used routing:
  ModelA: 333 requests
  ModelB: 333 requests
  ModelC: 334 requests
  → Balanced usage, fair distribution, cost-optimized
```
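The balanced outcome above can be checked with a few lines of simulation: under pure least-used selection, every member's count stays within one request of the others regardless of how many requests arrive. The tie-breaking (list order) is a simplification of the secondary ordering described earlier.

```python
def simulate_least_used(models, n_requests):
    """Route n_requests across models, always picking the member with the
    lowest usage count (ties broken by list order)."""
    usage = {m: 0 for m in models}
    for _ in range(n_requests):
        target = min(usage, key=usage.get)
        usage[target] += 1
    return usage
```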
Typical Collection Composition
A typical organization shared models collection might include:
1. openai/gpt-4o
- High capability, premium tier
- Used for critical/complex tasks
2. anthropic/claude-3-opus
- Excellent reasoning, long context (200K)
- Research and analysis tasks
3. google/gemini-2.5-pro
- Multimodal, fast, cost-effective
- General purpose tasks
4. mistralai/mistral-large
- Open source, efficient, good quality
- General conversation
5. groq/llama-3.3-70b-versatile
- Fast inference, free option
- High-volume requests
Usage Metrics and Analytics
Request Distribution (Last 24h)
```sql
SELECT
  response_data->>'_selected_model' as model,
  COUNT(*) as request_count,
  ROUND(AVG(CAST(response_data->>'_latency_ms' AS NUMERIC)), 2) as avg_latency,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 2) as percentage
FROM request_logs
WHERE request_data->>'model' = 'collection/org-shared-models'
  AND created_at > NOW() - INTERVAL '24 hours'
GROUP BY response_data->>'_selected_model'
ORDER BY request_count DESC;
```
Cost Analysis by Model
```sql
SELECT
  response_data->>'_selected_model' as model,
  COUNT(*) as requests,
  SUM(CAST(response_data->>'total_tokens' AS INTEGER)) as total_tokens,
  ROUND(SUM(CAST(response_data->>'_cost' AS NUMERIC)), 2) as total_cost,
  ROUND(AVG(CAST(response_data->>'_cost' AS NUMERIC)), 4) as avg_cost_per_request
FROM request_logs
WHERE request_data->>'model' = 'collection/org-shared-models'
  AND created_at > NOW() - INTERVAL '7 days'
GROUP BY response_data->>'_selected_model'
ORDER BY total_cost DESC;
```
Load Balance Efficiency
```sql
WITH model_usage AS (
  SELECT
    response_data->>'_selected_model' as model,
    COUNT(*) as request_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 2) as percentage
  FROM request_logs
  WHERE request_data->>'model' = 'collection/org-shared-models'
    AND created_at > NOW() - INTERVAL '24 hours'
  GROUP BY response_data->>'_selected_model'
)
SELECT
  COUNT(*) as total_models,
  ROUND(AVG(percentage), 2) as ideal_percentage,
  ROUND(STDDEV(percentage), 2) as distribution_stddev,
  MAX(percentage) as max_percentage,
  MIN(percentage) as min_percentage
FROM model_usage;
```
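The same fairness statistics can be computed offline from raw per-model request counts. A sketch mirroring the query's output columns (note that PostgreSQL's STDDEV is the sample standard deviation, matched here by `statistics.stdev`):

```python
from statistics import mean, stdev


def balance_stats(counts: dict) -> dict:
    """Distribution fairness from per-model request counts,
    mirroring the load-balance efficiency query's columns."""
    total = sum(counts.values())
    pct = [100.0 * c / total for c in counts.values()]
    return {
        "total_models": len(counts),
        "ideal_percentage": round(mean(pct), 2),
        "distribution_stddev": round(stdev(pct), 2) if len(pct) > 1 else 0.0,
        "max_percentage": round(max(pct), 2),
        "min_percentage": round(min(pct), 2),
    }
```

A low `distribution_stddev` indicates the least-used router is keeping the pool balanced; a high value suggests some members are being skipped (for example, due to rate limits).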
Error Handling
Collection Not Found
```json
{
  "error": {
    "code": "collection_not_found",
    "message": "Collection 'org-shared-models' not found or access denied"
  }
}
```
No Available Models
```json
{
  "error": {
    "code": "collection_no_models",
    "message": "No models available in org-shared-models collection",
    "details": {
      "total_members": 5,
      "available_models": 0,
      "reasons": [
        "model-1: rate limited until 2025-12-28T14:30:00Z",
        "model-2: temporarily unavailable",
        "model-3: quota exceeded",
        "model-4: user access denied",
        "model-5: health check failed"
      ]
    }
  }
}
```
Partial Availability with Fallback
```json
{
  "success": true,
  "model": "collection/org-shared-models",
  "_original_selection": "openai/gpt-4o",
  "_fallback_used": true,
  "_fallback_reason": "rate_limited",
  "_selected_model": "anthropic/claude-3-opus",
  "choices": [...]
}
```
Access Denied for Organization
```json
{
  "error": {
    "code": "access_denied",
    "message": "Your organization does not have access to org-shared-models collection",
    "required_scope": "organization",
    "user_scope": "individual"
  }
}
```
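A reasonable client-side policy for the error shapes above might dispatch on the `code` field. The retry decisions here are an assumption for illustration, not documented platform behavior:

```python
def error_action(payload: dict) -> str:
    """Map a collection error response to a coarse client action."""
    code = payload.get("error", {}).get("code", "")
    if code in ("collection_not_found", "access_denied"):
        return "fail"          # configuration/permission problem: do not retry
    if code == "collection_no_models":
        return "retry_later"   # members may recover (rate limits, health checks)
    return "retry"             # transient or unknown: retry with backoff
```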
Billing and Credits
- Cost: Based on selected model (varies)
- Tracking: Per-model usage tracked in request_logs
- Attribution: Each request logs actual model used
- Organization Quotas: Collection requests count toward organization monthly quota
- Cost Reporting: Detailed breakdown by model and user
- Shared Budget: Organization credits shared across all members
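Per-model cost attribution can be rolled up client-side from logged requests. The row shape here (dicts carrying `_selected_model` and `_cost`) is an illustrative assumption mirroring the fields used in the analytics queries above:

```python
from collections import defaultdict


def cost_by_model(rows):
    """Aggregate request count and total cost per selected model
    from request-log-like rows."""
    totals = defaultdict(lambda: {"requests": 0, "total_cost": 0.0})
    for row in rows:
        entry = totals[row["_selected_model"]]
        entry["requests"] += 1
        entry["total_cost"] += row["_cost"]
    return dict(totals)
```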
Limits and Constraints
| Constraint | Value |
|---|---|
| Min Models | 1 |
| Max Models | Unlimited (practical: 10-20) |
| Name Length | 100 characters |
| Description Length | Text field (no limit) |
| Member Add/Remove | Instant (no queue) |
| Request Rate | Model-dependent (typically >100 req/s) |
| Concurrent Requests | No limit at collection level |
| Cache Invalidation | <1 second (usage-based) |
Integration Examples
Example 1: Data Analysis Pipeline
```python
import anthropic
import json

client = anthropic.Anthropic(
    api_key="sk-your-api-key",
    base_url="https://api.langmart.ai/v1"
)

def analyze_data(data_json):
    """Analyze data using the org shared models collection"""
    message = client.messages.create(
        model="collection/org-shared-models",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"""Analyze this data and provide insights:
{json.dumps(data_json, indent=2)}

Provide:
1. Summary statistics
2. Trends or patterns
3. Recommendations"""
            }
        ]
    )
    return {
        "analysis": message.content[0].text,
        "selected_model": message.model,
        "tokens_used": message.usage.output_tokens + message.usage.input_tokens
    }
```
Example 2: Multi-Model Comparison
```javascript
// Placeholder heuristic: suggest rebalancing when usage is heavily skewed.
function calculateLoadRecommendation(loadMetrics) {
  if (loadMetrics.length === 0) return 'no-data';
  const usages = loadMetrics.map(m => m.usage);
  return Math.max(...usages) - Math.min(...usages) > 20 ? 'rebalance' : 'ok';
}

async function compareModels(prompt) {
  // Get collection info
  const collectionResponse = await fetch(
    'https://api.langmart.ai/api/user/model-collections',
    {
      headers: { 'Authorization': 'Bearer sk-your-api-key' }
    }
  );
  const collections = await collectionResponse.json();
  const orgSharedCollection = collections.collections.find(
    c => c.name === 'org-shared-models'
  );

  // Get all models in collection
  const collectionDetails = await fetch(
    `https://api.langmart.ai/api/user/model-collections/${orgSharedCollection.id}`,
    {
      headers: { 'Authorization': 'Bearer sk-your-api-key' }
    }
  );
  const details = await collectionDetails.json();
  const models = details.models || [];

  // Collect load metrics
  const loadMetrics = models.map(m => ({
    model: m.model_id || m.id,
    recentRequests: m.recent_requests_24h || 0,
    usage: m.usage_percentage || 0
  }));

  return {
    collectionName: 'org-shared-models',
    totalModels: models.length,
    loadDistribution: loadMetrics,
    recommendedAction: calculateLoadRecommendation(loadMetrics)
  };
}
```
Administration and Governance
Organizational Policies
- Access Control: Only organization members can use the collection
- Model Management: Organization admins manage collection membership
- Cost Governance: Shared quota management applies
- Usage Reporting: Detailed usage reports per team member
- Audit Trail: All model add/remove operations logged
Best Practices
- Start Small: Begin with 3-5 models, expand as needed
- Model Mix: Include both premium and cost-effective models
- Monitoring: Regularly review usage distribution
- Rebalancing: Add/remove models based on actual usage patterns
- Documentation: Document which models serve which use cases
- Testing: Periodically benchmark models in production
Database Schema
See /datastore/tables/99_model_collections.sql for complete schema details.
Key Tables
- model_collections: Collection metadata
- model_collection_members: Collection membership and weights
- model_categories: Available models
- request_logs: Request history and load metrics
- organizations: Organization details
Related Resources
- Collection Tools: /gateway-type3/collection-tools.ts
- Model Collection Router: /gateway-type1/lib/services/model-collection-router.ts
- Database Schema: /datastore/tables/99_model_collections.sql
- API Endpoints: /api/user/model-collections/*
Monitoring Dashboard
Key metrics to monitor for organization shared models collection:
```
Real-Time Dashboard:
├── Load Distribution
│   ├── Model usage percentages
│   ├── Request count per model
│   └── Distribution fairness (std dev)
├── Performance
│   ├── P50, P95, P99 latencies
│   ├── Request success rate
│   └── Error rate by model
├── Costs
│   ├── Total cost (24h, 7d, 30d)
│   ├── Cost per request
│   └── Cost by model
└── Health
    ├── Model availability
    ├── Rate limit status
    └── Recent errors
```
Version History
- v1.0 (2025-12-20): Initial organization collection implementation
- Least-used routing strategy implemented
- Load balancing and usage tracking enabled
- Collection membership API created
- Organization-scoped access control added