O

LangMart: Microsoft: Phi 4 Multimodal Instruct

Openrouter
Vision
131K
Context
$0.0500
Input /1M
$0.1000
Output /1M
N/A
Max Output

LangMart: Microsoft: Phi 4 Multimodal Instruct

Model Overview

Property Value
Model ID openrouter/microsoft/phi-4-multimodal-instruct
Name Microsoft: Phi 4 Multimodal Instruct
Provider microsoft
Released 2025-03-08

Description

Phi-4 Multimodal Instruct is a versatile 5.6B parameter foundation model that combines advanced reasoning and instruction-following capabilities across both text and visual inputs, providing accurate text outputs. The unified architecture enables efficient, low-latency inference, suitable for edge and mobile deployments. Phi-4 Multimodal Instruct supports text inputs in multiple languages including Arabic, Chinese, English, French, German, Japanese, Spanish, and more, with visual input optimized primarily for English. It delivers impressive performance on multimodal tasks involving mathematical, scientific, and document reasoning, providing developers and enterprises a powerful yet compact model for sophisticated interactive applications. For more information, see the Phi-4 Multimodal blog post.

Description

LangMart: Microsoft: Phi 4 Multimodal Instruct is a language model provided by microsoft. This model offers advanced capabilities for natural language processing tasks.

Provider

microsoft

Specifications

Spec Value
Context Window 131,072 tokens
Modalities text+image->text
Input Modalities text, image
Output Modalities text

Pricing

Type Price
Input $0.05 per 1M tokens
Output $0.10 per 1M tokens

Capabilities

  • Frequency penalty
  • Max tokens
  • Min p
  • Presence penalty
  • Repetition penalty
  • Response format
  • Seed
  • Stop
  • Temperature
  • Top k
  • Top p

Detailed Analysis

Microsoft: Phi 4 Multimodal Instruct