Collections: Organization Shared Models
Overview
| Property | Value |
|---|---|
| Model ID | collection/org-shared-models |
| Display Name | Organization Shared Models |
| Type | Organization Collection |
| Access Method | collection/org-shared-models |
| Scope | Organization-level |
| Routing Strategy | Least Used |
Description
The Organization Shared Models collection is a flexible, team-managed collection that enables organizations to pool and share language models across all members. This collection uses a least-used routing strategy to optimize resource utilization and prevent bottlenecks by automatically distributing requests to the model with the lowest current usage.
Key Characteristics
- Team-Managed: Organization admins control which models are included
- Smart Distribution: Least-used routing prevents model saturation
- Flexible Composition: Mix of different providers and model types
- Usage Optimization: Real-time load balancing based on request volume
- Shared Resources: All organization members can access the pool
- Cost Control: Organization-level quota management applies
Specifications
| Aspect | Details |
|---|---|
| Collection Type | Organization-level |
| Access Level | Organization members only |
| Scope | organization |
| Routing Strategy | Least Used (lowest current usage) |
| Visibility | Organization (shared with all org members) |
| Member Count | 1 or more models |
| Access Control | Role-based (admin can modify members) |
| Quota Tracking | Organization-wide usage |
Use Cases
1. Team-Based Model Selection
- Shared model pool for development team
- Collaborative inference across departments
- Democratic access to premium models
- Cost-shared model subscriptions
2. Distributed Workload Balancing
- Prevent bottlenecks on high-demand models
- Automatic load distribution
- Smooth out traffic spikes
- Optimize cost per request
3. Shared Resource Optimization
- Maximize model utilization
- Reduce idle time across the pool
- Fair resource allocation
- Efficient team productivity
4. Organization-Wide Inference Scaling
- Scale inference capacity through model diversity
- Add/remove models as needs change
- Balance between quality and cost
- Support multiple use cases with one collection
5. A/B Testing and Experimentation
- Test multiple models in production
- Gradually shift load between models
- Compare model performance at scale
- Data-driven model selection
Characteristics of Included Models
Models in organization shared collections typically include:
| Aspect | Details |
|---|---|
| Model Diversity | Mix of providers (OpenAI, Google, Anthropic, etc.) |
| Size Variety | From 7B lightweight to 405B powerhouse models |
| Capability Range | Text-only to multimodal models |
| Cost Profile | Mix of budget and premium models |
| Context Windows | Varied (4K to 200K tokens) |
| Specializations | General purpose, code, reasoning, vision |
Model Selection Strategy
Least Used Routing
The organization shared models collection uses least-used routing to optimize load:
- Usage Tracking: Maintains request count per model
- Live Selection: At each request, selects model with lowest usage
- Load Balancing: Automatically distributes incoming requests
- Fair Distribution: Prevents any single model from becoming saturated
- Dynamic: Adapts to real-time usage patterns
Selection Algorithm
For each request:
1. Query current usage for all collection members
2. Filter out unavailable/rate-limited models
3. Select model with minimum usage count
4. If tied: Use secondary ordering (trust_level, cost)
5. Route request to selected model
6. Increment usage counter for tracking
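The steps above can be sketched in a few lines of Python. This is an illustrative sketch only: the field names (`trust_level`, `cost`) and the in-memory usage dict are assumptions standing in for the platform's actual schema and tracking store.

```python
def select_least_used(models, usage, unavailable=frozenset()):
    """Pick the available model with the lowest usage count.

    Ties are broken by the secondary ordering from step 4:
    higher trust_level first, then lower cost.

    models: list of dicts with 'id', 'trust_level', 'cost' (assumed fields).
    usage: dict mapping model id -> current request count.
    unavailable: ids filtered out in step 2 (rate limited, unhealthy, ...).
    """
    candidates = [m for m in models if m["id"] not in unavailable]
    if not candidates:
        raise RuntimeError("collection_no_models: no available members")
    chosen = min(
        candidates,
        key=lambda m: (usage.get(m["id"], 0), -m["trust_level"], m["cost"]),
    )
    usage[chosen["id"]] = usage.get(chosen["id"], 0) + 1  # step 6: track usage
    return chosen["id"]
```

Because the usage counter is incremented on every selection, repeated calls naturally rotate through the members, which is what produces the even distribution shown in the example below.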
Example Load Distribution
```
Collection with 3 models: [ModelA, ModelB, ModelC]

Initial state:
  ModelA usage: 0
  ModelB usage: 0
  ModelC usage: 0

Request 1 → ModelA (usage 0) → ModelA usage now 1
Request 2 → ModelB (usage 0) → ModelB usage now 1
Request 3 → ModelC (usage 0) → ModelC usage now 1
Request 4 → ModelA (usage 1) → ModelA usage now 2
Request 5 → ModelB (usage 1) → ModelB usage now 2
Request 6 → ModelC (usage 1) → ModelC usage now 2

Result: even distribution across all models
```
Dynamic Rebalancing
- Continuous monitoring of model usage
- Weights adjusted based on recent requests
- Prefers underutilized models
- Falls back to secondary ordering if tied
- Handles model unavailability gracefully
Usage
Basic Request
```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "collection/org-shared-models",
    "messages": [
      {"role": "user", "content": "Analyze this dataset"}
    ],
    "temperature": 0.7
  }'
```
With Vision (if available in collection)
```bash
curl -X POST https://api.langmart.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "collection/org-shared-models",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
        ]
      }
    ]
  }'
```
Batch Request with Monitoring
```javascript
async function batchRequest(prompts) {
  const results = [];
  for (const prompt of prompts) {
    const response = await fetch('https://api.langmart.ai/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer sk-your-api-key',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'collection/org-shared-models',
        messages: [{ role: 'user', content: prompt }]
      })
    });
    const data = await response.json();
    results.push({
      prompt,
      response: data.choices[0].message.content,
      selected_model: data._selected_model,
      tokens: data.usage.total_tokens
    });
  }
  return results;
}
```
Collection Management
Viewing Collection Details
```sql
SELECT
  id,
  collection_name,
  display_name,
  description,
  scope,
  routing_strategy,
  organization_id,
  created_at,
  updated_at
FROM model_collections
WHERE collection_name = 'org-shared-models'
  AND is_active = true;
```
Viewing Collection Members with Usage
```sql
SELECT
  mcm.id,
  mc.category_display_id as model_id,
  mc.model_name,
  mc.provider,
  mcm.priority,
  mcm.weight,
  mcm.is_active,
  -- Estimate usage from recent requests
  COUNT(rl.id) as recent_requests_24h
FROM model_collection_members mcm
JOIN model_categories mc ON mcm.model_category_id = mc.id
LEFT JOIN request_logs rl ON (
  rl.response_data->>'_selected_model' = mc.category_display_id
  AND rl.created_at > NOW() - INTERVAL '24 hours'
)
WHERE mcm.collection_id = (
  SELECT id FROM model_collections
  WHERE collection_name = 'org-shared-models'
)
GROUP BY mcm.id, mc.category_display_id, mc.model_name, mc.provider,
         mcm.priority, mcm.weight, mcm.is_active
ORDER BY recent_requests_24h DESC;
```
API: List Collection Models
```bash
curl -X GET https://api.langmart.ai/api/user/model-collections/COLLECTION_ID \
  -H "Authorization: Bearer sk-your-api-key" | jq '.models'
```
API: Add Model to Collection
```bash
curl -X POST https://api.langmart.ai/api/user/model-collections/COLLECTION_ID/members \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "anthropic/claude-3-opus"
  }'
```
API: Remove Model from Collection
```bash
curl -X DELETE "https://api.langmart.ai/api/user/model-collections/COLLECTION_ID/members/anthropic%2Fclaude-3-opus" \
  -H "Authorization: Bearer sk-your-api-key"
```
API: Update Collection Settings
```bash
curl -X PUT https://api.langmart.ai/api/user/model-collections/COLLECTION_ID \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "Premium Shared Models",
    "description": "High-quality models for production use",
    "routing_strategy": "least_used"
  }'
```
Response Format
Responses include collection and load-balancing metadata:
```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Based on the dataset analysis..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "total_tokens": 350
  },
  "_collection_routed": true,
  "_collection_name": "org-shared-models",
  "_selected_model": "openai/gpt-4o",
  "_routing_strategy": "least_used",
  "_usage_at_selection": 45,
  "_model_alternatives": [
    {"model": "anthropic/claude-3-opus", "usage": 62},
    {"model": "google/gemini-2.5-pro", "usage": 51}
  ]
}
```
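For clients that want to work with these underscore-prefixed fields, a pair of small helpers could separate them from the standard completion payload. This is a minimal sketch assuming the response shape shown above; the helper names are illustrative, not part of the platform SDK.

```python
def routing_metadata(response: dict) -> dict:
    """Return only the collection/load-balancing fields (keys starting
    with '_') from a chat completion response."""
    return {k: v for k, v in response.items() if k.startswith("_")}


def least_loaded_alternative(response: dict):
    """Return the alternative member with the lowest reported usage,
    or None when no alternatives were included."""
    alts = response.get("_model_alternatives", [])
    return min(alts, key=lambda a: a["usage"])["model"] if alts else None
```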
Performance Characteristics
| Metric | Target |
|---|---|
| Selection Latency | <2ms |
| Load Balancing Efficiency | ±10% across models |
| P50 Response Time | <3 seconds (varies by model) |
| P95 Response Time | <8 seconds (varies by model) |
| Usage Tracking Accuracy | Real-time (within 100ms) |
| Cache TTL | 10 seconds (usage-based, always fresh) |
| Request Success Rate | >99% (with fallback) |
Load Balancing Example
Scenario: 3-Model Collection
Setup:
- ModelA: Premium, expensive, very capable
- ModelB: Mid-tier, balanced cost/quality
- ModelC: Budget, lightweight, fast
Over 1 hour with 1000 requests:
```
Without load balancing (users pick ModelA):
  ModelA: 900 requests
  ModelB: 100 requests
  ModelC:   0 requests
  → ModelA overloaded, expensive, inefficient

With least-used routing:
  ModelA: 333 requests
  ModelB: 333 requests
  ModelC: 334 requests
  → Balanced usage, fair distribution, cost-optimized
```
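The balanced outcome above can be checked with a few lines of simulation: under pure least-used selection, every member's count stays within one request of the others regardless of how many requests arrive. The tie-breaking (list order) is a simplification of the secondary ordering described earlier.

```python
def simulate_least_used(models, n_requests):
    """Route n_requests across models, always picking the member with the
    lowest usage count (ties broken by list order)."""
    usage = {m: 0 for m in models}
    for _ in range(n_requests):
        target = min(usage, key=usage.get)
        usage[target] += 1
    return usage
```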
Typical Collection Composition
A typical organization shared models collection might include:
1. openai/gpt-4o
- High capability, premium tier
- Used for critical/complex tasks
2. anthropic/claude-3-opus
- Excellent reasoning, long context (200K)
- Research and analysis tasks
3. google/gemini-2.5-pro
- Multimodal, fast, cost-effective
- General purpose tasks
4. mistralai/mistral-large
- Open source, efficient, good quality
- General conversation
5. groq/llama-3.3-70b-versatile
- Fast inference, free option
- High-volume requests
Usage Metrics and Analytics
Request Distribution (Last 24h)
```sql
SELECT
  response_data->>'_selected_model' as model,
  COUNT(*) as request_count,
  ROUND(AVG(CAST(response_data->>'_latency_ms' AS NUMERIC)), 2) as avg_latency,
  ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 2) as percentage
FROM request_logs
WHERE request_data->>'model' = 'collection/org-shared-models'
  AND created_at > NOW() - INTERVAL '24 hours'
GROUP BY response_data->>'_selected_model'
ORDER BY request_count DESC;
```
Cost Analysis by Model
```sql
SELECT
  response_data->>'_selected_model' as model,
  COUNT(*) as requests,
  SUM(CAST(response_data->>'total_tokens' AS INTEGER)) as total_tokens,
  ROUND(SUM(CAST(response_data->>'_cost' AS NUMERIC)), 2) as total_cost,
  ROUND(AVG(CAST(response_data->>'_cost' AS NUMERIC)), 4) as avg_cost_per_request
FROM request_logs
WHERE request_data->>'model' = 'collection/org-shared-models'
  AND created_at > NOW() - INTERVAL '7 days'
GROUP BY response_data->>'_selected_model'
ORDER BY total_cost DESC;
```
Load Balance Efficiency
```sql
WITH model_usage AS (
  SELECT
    response_data->>'_selected_model' as model,
    COUNT(*) as request_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 2) as percentage
  FROM request_logs
  WHERE request_data->>'model' = 'collection/org-shared-models'
    AND created_at > NOW() - INTERVAL '24 hours'
  GROUP BY response_data->>'_selected_model'
)
SELECT
  COUNT(*) as total_models,
  ROUND(AVG(percentage), 2) as ideal_percentage,
  ROUND(STDDEV(percentage), 2) as distribution_stddev,
  MAX(percentage) as max_percentage,
  MIN(percentage) as min_percentage
FROM model_usage;
```
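The same fairness statistics can be computed offline from raw per-model request counts. A sketch mirroring the query's output columns (note that PostgreSQL's STDDEV is the sample standard deviation, matched here by `statistics.stdev`):

```python
from statistics import mean, stdev


def balance_stats(counts: dict) -> dict:
    """Distribution fairness from per-model request counts,
    mirroring the load-balance efficiency query's columns."""
    total = sum(counts.values())
    pct = [100.0 * c / total for c in counts.values()]
    return {
        "total_models": len(counts),
        "ideal_percentage": round(mean(pct), 2),
        "distribution_stddev": round(stdev(pct), 2) if len(pct) > 1 else 0.0,
        "max_percentage": round(max(pct), 2),
        "min_percentage": round(min(pct), 2),
    }
```

A low `distribution_stddev` indicates the least-used router is keeping the pool balanced; a high value suggests some members are being skipped (for example, due to rate limits).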
Error Handling
Collection Not Found
```json
{
  "error": {
    "code": "collection_not_found",
    "message": "Collection 'org-shared-models' not found or access denied"
  }
}
```
No Available Models
```json
{
  "error": {
    "code": "collection_no_models",
    "message": "No models available in org-shared-models collection",
    "details": {
      "total_members": 5,
      "available_models": 0,
      "reasons": [
        "model-1: rate limited until 2025-12-28T14:30:00Z",
        "model-2: temporarily unavailable",
        "model-3: quota exceeded",
        "model-4: user access denied",
        "model-5: health check failed"
      ]
    }
  }
}
```
Partial Availability with Fallback
```json
{
  "success": true,
  "model": "collection/org-shared-models",
  "_original_selection": "openai/gpt-4o",
  "_fallback_used": true,
  "_fallback_reason": "rate_limited",
  "_selected_model": "anthropic/claude-3-opus",
  "choices": [...]
}
```
Access Denied for Organization
```json
{
  "error": {
    "code": "access_denied",
    "message": "Your organization does not have access to org-shared-models collection",
    "required_scope": "organization",
    "user_scope": "individual"
  }
}
```
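A reasonable client-side policy for the error shapes above might dispatch on the `code` field. The retry decisions here are an assumption for illustration, not documented platform behavior:

```python
def error_action(payload: dict) -> str:
    """Map a collection error response to a coarse client action."""
    code = payload.get("error", {}).get("code", "")
    if code in ("collection_not_found", "access_denied"):
        return "fail"          # configuration/permission problem: do not retry
    if code == "collection_no_models":
        return "retry_later"   # members may recover (rate limits, health checks)
    return "retry"             # transient or unknown: retry with backoff
```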
Billing and Credits
- Cost: Based on selected model (varies)
- Tracking: Per-model usage tracked in request_logs
- Attribution: Each request logs actual model used
- Organization Quotas: Collection requests count toward organization monthly quota
- Cost Reporting: Detailed breakdown by model and user
- Shared Budget: Organization credits shared across all members
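Per-model cost attribution can be rolled up client-side from logged requests. The row shape here (dicts carrying `_selected_model` and `_cost`) is an illustrative assumption mirroring the fields used in the analytics queries above:

```python
from collections import defaultdict


def cost_by_model(rows):
    """Aggregate request count and total cost per selected model
    from request-log-like rows."""
    totals = defaultdict(lambda: {"requests": 0, "total_cost": 0.0})
    for row in rows:
        entry = totals[row["_selected_model"]]
        entry["requests"] += 1
        entry["total_cost"] += row["_cost"]
    return dict(totals)
```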
Limits and Constraints
| Constraint | Value |
|---|---|
| Min Models | 1 |
| Max Models | Unlimited (practical: 10-20) |
| Name Length | 100 characters |
| Description Length | Text field (no limit) |
| Member Add/Remove | Instant (no queue) |
| Request Rate | Model-dependent (typically >100 req/s) |
| Concurrent Requests | No limit at collection level |
| Cache Invalidation | <1 second (usage-based) |
Integration Examples
Example 1: Data Analysis Pipeline
```python
import anthropic
import json

client = anthropic.Anthropic(
    api_key="sk-your-api-key",
    base_url="https://api.langmart.ai/v1"
)

def analyze_data(data_json):
    """Analyze data using the org shared models collection"""
    message = client.messages.create(
        model="collection/org-shared-models",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"""Analyze this data and provide insights:
{json.dumps(data_json, indent=2)}

Provide:
1. Summary statistics
2. Trends or patterns
3. Recommendations"""
            }
        ]
    )
    return {
        "analysis": message.content[0].text,
        "selected_model": message.model,
        "tokens_used": message.usage.output_tokens + message.usage.input_tokens
    }
```
Example 2: Multi-Model Comparison
```javascript
// Placeholder heuristic: suggest rebalancing when usage is heavily skewed.
function calculateLoadRecommendation(loadMetrics) {
  if (loadMetrics.length === 0) return 'no-data';
  const usages = loadMetrics.map(m => m.usage);
  return Math.max(...usages) - Math.min(...usages) > 20 ? 'rebalance' : 'ok';
}

async function compareModels(prompt) {
  // Get collection info
  const collectionResponse = await fetch(
    'https://api.langmart.ai/api/user/model-collections',
    {
      headers: { 'Authorization': 'Bearer sk-your-api-key' }
    }
  );
  const collections = await collectionResponse.json();
  const orgSharedCollection = collections.collections.find(
    c => c.name === 'org-shared-models'
  );

  // Get all models in collection
  const collectionDetails = await fetch(
    `https://api.langmart.ai/api/user/model-collections/${orgSharedCollection.id}`,
    {
      headers: { 'Authorization': 'Bearer sk-your-api-key' }
    }
  );
  const details = await collectionDetails.json();
  const models = details.models || [];

  // Collect load metrics
  const loadMetrics = models.map(m => ({
    model: m.model_id || m.id,
    recentRequests: m.recent_requests_24h || 0,
    usage: m.usage_percentage || 0
  }));

  return {
    collectionName: 'org-shared-models',
    totalModels: models.length,
    loadDistribution: loadMetrics,
    recommendedAction: calculateLoadRecommendation(loadMetrics)
  };
}
```
Administration and Governance
Organizational Policies
- Access Control: Only organization members can use the collection
- Model Management: Organization admins manage collection membership
- Cost Governance: Shared quota management applies
- Usage Reporting: Detailed usage reports per team member
- Audit Trail: All model add/remove operations logged
Best Practices
- Start Small: Begin with 3-5 models, expand as needed
- Model Mix: Include both premium and cost-effective models
- Monitoring: Regularly review usage distribution
- Rebalancing: Add/remove models based on actual usage patterns
- Documentation: Document which models serve which use cases
- Testing: Periodically benchmark models in production
Database Schema
See /datastore/tables/99_model_collections.sql for complete schema details.
Key Tables
- model_collections: Collection metadata
- model_collection_members: Collection membership and weights
- model_categories: Available models
- request_logs: Request history and load metrics
- organizations: Organization details
Related Resources
- Collection Tools: /gateway-type3/collection-tools.ts
- Model Collection Router: /gateway-type1/lib/services/model-collection-router.ts
- Database Schema: /datastore/tables/99_model_collections.sql
- API Endpoints: /api/user/model-collections/*
Monitoring Dashboard
Key metrics to monitor for organization shared models collection:
```
Real-Time Dashboard:
├── Load Distribution
│   ├── Model usage percentages
│   ├── Request count per model
│   └── Distribution fairness (std dev)
├── Performance
│   ├── P50, P95, P99 latencies
│   ├── Request success rate
│   └── Error rate by model
├── Costs
│   ├── Total cost (24h, 7d, 30d)
│   ├── Cost per request
│   └── Cost by model
└── Health
    ├── Model availability
    ├── Rate limit status
    └── Recent errors
```
Version History
- v1.0 (2025-12-20): Initial organization collection implementation
- Least-used routing strategy implemented
- Load balancing and usage tracking enabled
- Collection membership API created
- Organization-scoped access control added