# Intent Classification System

## Overview

AICO's intent classification system provides multilingual intent recognition built on the XLM-RoBERTa transformer model. It follows AICO's architecture patterns and integrates with the conversation management pipeline.

## Architecture

### Core Components
```mermaid
graph TD
    A[Thread Manager] --> B[Intent Classifier AI Processor]
    B --> C[ModelService Client]
    C --> D[ZMQ Message Bus]
    D --> E[ModelService Handler]
    E --> F[TransformersManager]
    F --> G[XLM-RoBERTa Model]
```
### Key Features

- **Multilingual Support**: XLM-RoBERTa covers 100+ languages
- **Semantic Understanding**: goes beyond keyword matching
- **Context Awareness**: considers conversation history
- **Real-time Performance**: <50 ms inference with caching
- **Architecture Compliance**: follows AICO `BaseAIProcessor` patterns
## Implementation Details

### AI Processor (`/shared/aico/ai/analysis/intent_classifier.py`)

The core intent classification logic follows AICO's `BaseAIProcessor` pattern:
```python
class IntentClassificationProcessor(BaseAIProcessor):
    """
    Advanced multilingual intent classification processor.
    Uses ModelService for all transformer operations.
    """

    def __init__(self):
        super().__init__(
            component_name="intent_classifier",
            version="v2.0"
        )
        self.model_name = "intent_classification"  # TransformersManager model
        self.supported_languages = [
            'en', 'es', 'fr', 'de', 'it', 'pt', 'ru', 'zh', 'ja', 'ko',
            # ... 100+ languages via XLM-RoBERTa
        ]
```
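The `process` method body is not shown in this document; the following is a minimal sketch of the contract implied by the usage examples further below. `ProcessingResult`, `_embed`, `_classify_embedding`, and `_detect_language` are assumed names, not confirmed API:

```python
async def process(self, context: ProcessingContext) -> ProcessingResult:
    """Classify the intent of the message carried by the context."""
    # Hypothetical flow: embed the message, score it against the
    # semantic prototypes, and return the best-matching intent.
    embedding = await self._embed(context.message_content)
    intent, confidence = self._classify_embedding(embedding)
    return ProcessingResult(
        success=True,
        data={
            "predicted_intent": intent,
            "confidence": confidence,
            "detected_language": self._detect_language(context.message_content),
        },
    )
```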
### Intent Categories

The system recognizes these standard conversation intents (sketched as an enum after the list):

- **GREETING**: "hello", "hi", "good morning"
- **QUESTION**: "what", "how", "why", "when", "where"
- **REQUEST**: "please help", "can you", "I need"
- **INFORMATION_SHARING**: "I want to tell you", "here's what happened"
- **CONFIRMATION**: "yes", "correct", "that's right"
- **NEGATION**: "no", "wrong", "I disagree"
- **COMPLAINT**: "problem", "issue", "broken"
- **FAREWELL**: "goodbye", "bye", "see you later"
- **GENERAL**: general conversation that fits no other category
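The snippets below reference an `IntentType` enum. Its exact definition is not shown in this document; a minimal sketch consistent with the categories above and the lowercase intent strings used in the tests:

```python
from enum import Enum

class IntentType(Enum):
    """Standard conversation intents recognized by the classifier."""
    GREETING = "greeting"
    QUESTION = "question"
    REQUEST = "request"
    INFORMATION_SHARING = "information_sharing"
    CONFIRMATION = "confirmation"
    NEGATION = "negation"
    COMPLAINT = "complaint"
    FAREWELL = "farewell"
    GENERAL = "general"
```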
### Semantic Prototypes

Instead of hardcoded training examples, the system uses semantic prototypes:

```python
intent_descriptions = {
    IntentType.GREETING.value: "greeting hello hi welcome",
    IntentType.QUESTION.value: "question what how why when where",
    IntentType.REQUEST.value: "request help please can you assist",
    # ... semantic descriptions for each intent
}
```
These descriptions are converted to embeddings using XLM-RoBERTa, creating semantic prototypes that understand intent meaning across languages.
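To make the mechanism concrete, here is a sketch of prototype-based classification. It assumes a hypothetical `embed(text)` helper returning an XLM-RoBERTa sentence embedding, and uses cosine similarity as one plausible scoring rule; the actual scoring may differ:

```python
import numpy as np

def classify_by_prototype(message: str,
                          prototypes: dict[str, np.ndarray]) -> tuple[str, float]:
    """Return the intent whose prototype embedding is closest to the message."""
    query = embed(message)  # hypothetical embedding helper
    best_intent, best_score = "general", 0.0
    for intent, prototype in prototypes.items():
        # Cosine similarity between message and intent prototype
        score = float(np.dot(query, prototype) /
                      (np.linalg.norm(query) * np.linalg.norm(prototype)))
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent, best_score
```

The prototypes themselves would be built once, e.g. `prototypes = {intent: embed(desc) for intent, desc in intent_descriptions.items()}`.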
## ModelService Integration

### TransformersManager Configuration

```python
"intent_classification": TransformerModelConfig(
    name="intent_classification",
    model_id="xlm-roberta-base",
    task=ModelTask.TEXT_CLASSIFICATION,
    priority=1,
    required=True,
    description="Multilingual intent classification using XLM-RoBERTa",
    multilingual=True,
    memory_mb=600
)
```
### Automatic Model Loading

The model loads automatically on the first request:
```python
def get_model(self, model_name: str) -> Optional[Any]:
    if model_name == "intent_classification":
        # Auto-loads XLM-RoBERTa on first request
        tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
        model = AutoModel.from_pretrained("xlm-roberta-base")
        wrapper = IntentModelWrapper(tokenizer=tokenizer, model=model)
        # The wrapper is cached for future requests (cache details elided here)
        return wrapper
    return None
```
### Embeddings Handler

The embeddings handler supports transformer models:

```python
async def handle_embeddings_request(self, request_payload) -> EmbeddingsResponse:
    # `model` and `prompt` come from request_payload (parsing elided)
    # Use TransformersManager for all transformer models
    model_components = self.get_transformer_model(model)
    tokenizer = model_components.tokenizer
    transformer = model_components.model
    # Tokenize and run the model to get embeddings
    inputs = tokenizer(prompt, return_tensors="pt", max_length=512, truncation=True)
    with torch.no_grad():
        outputs = transformer(**inputs)
    # Use the [CLS] token representation as the sentence embedding
    embedding = outputs.last_hidden_state[:, 0, :].numpy().flatten()
    # ... build and return an EmbeddingsResponse from `embedding` (elided)
```
## Thread Manager Integration

The Advanced Thread Manager uses intent classification for intelligent thread resolution:

```python
async def _classify_intent(self, message: str) -> str:
    """Classify message intent using AICO's AI processing architecture."""
    from shared.aico.ai.analysis.intent_classifier import get_intent_classifier
    from shared.aico.ai.base import ProcessingContext

    processor = await get_intent_classifier()
    processing_context = ProcessingContext(
        thread_id="intent_classification",
        user_id="anonymous",
        request_id=f"intent_{hash(message)}",
        message_content=message
    )
    result = await processor.process(processing_context)
    return result.data.get("predicted_intent", "general")
```
## ZMQ Protocol

### Request Message

```protobuf
message IntentClassificationRequest {
  string text = 1;
  optional string user_id = 2;
  repeated string conversation_context = 3;
}
```
### Response Message

```protobuf
message IntentClassificationResponse {
  bool success = 1;
  string predicted_intent = 2;
  float confidence = 3;
  string detected_language = 4;
  float inference_time_ms = 5;
  repeated IntentPrediction alternative_predictions = 6;
  map<string, string> metadata = 7;
  optional string error = 8;
}
```
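`IntentPrediction` is referenced above but not defined in this document; a plausible shape (field names and numbers are assumptions):

```protobuf
message IntentPrediction {
  string intent = 1;
  float confidence = 2;
}
```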
### Topics

- **Request**: `modelservice/intent/request/v1`
- **Response**: `modelservice/intent/response/v1`
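For orientation, a minimal client sketch assuming pyzmq and the generated protobuf classes. The socket type, endpoint, and framing shown here are assumptions; AICO's actual message-bus client wraps these details:

```python
import zmq

# Hypothetical endpoint; AICO's message bus supplies the real address.
ctx = zmq.Context.instance()
sock = ctx.socket(zmq.PUB)
sock.connect("tcp://127.0.0.1:5555")

request = IntentClassificationRequest(
    text="Can you help me?",
    user_id="user123",
)
# Topic frame followed by the serialized protobuf payload
sock.send_multipart([
    b"modelservice/intent/request/v1",
    request.SerializeToString(),
])
```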
## Performance Characteristics

### Inference Speed

- **Target**: <50 ms per classification
- **Typical**: 15-30 ms with model caching
- **Caching**: embedding cache with a 1-hour TTL (see the sketch below)
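A minimal sketch of a TTL-bounded embedding cache, assuming message texts as keys and embeddings as values; the actual cache implementation is not shown in this document:

```python
import time

class TTLCache:
    """Illustrative cache: evicts expired entries and bounds total size."""

    def __init__(self, ttl_seconds: float = 3600.0, max_entries: int = 1000):
        self.ttl = ttl_seconds
        self.max_entries = max_entries
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # expired
            return None
        return value

    def put(self, key: str, value) -> None:
        if len(self._store) >= self.max_entries:
            # Drop the oldest entry (simple eviction for illustration)
            oldest = min(self._store, key=lambda k: self._store[k][0])
            del self._store[oldest]
        self._store[key] = (time.monotonic(), value)
```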
### Memory Usage

- **Model Size**: ~600 MB (XLM-RoBERTa base)
- **Cache Size**: configurable (default: 1000 entries)
- **Concurrent Models**: max 3 by default

### Accuracy

- **Multilingual**: consistent performance across 100+ languages
- **Context-Aware**: considers conversation history
- **Confidence Scoring**: provides uncertainty estimates
## Configuration

### AI Processor Config

```yaml
ai:
  intent_classifier:
    confidence_threshold: 0.7
    cache_size: 1000
    context_window: 10
    enable_few_shot: true
```

### ModelService Config
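This section's config block is missing from the source. A plausible sketch mirroring the TransformersManager settings above; the key names are assumptions:

```yaml
modelservice:
  transformers:
    intent_classification:
      model_id: xlm-roberta-base
      memory_mb: 600
      required: true
```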
## Usage Examples

### Direct AI Processor Usage

```python
from shared.aico.ai.analysis.intent_classifier import get_intent_classifier
from shared.aico.ai.base import ProcessingContext

processor = await get_intent_classifier()

context = ProcessingContext(
    thread_id="example",
    user_id="user123",
    request_id="req456",
    message_content="Can you help me with something?",
    shared_state={'recent_intents': ['greeting']}
)

result = await processor.process(context)
print(f"Intent: {result.data['predicted_intent']}")
print(f"Confidence: {result.data['confidence']:.2f}")
print(f"Language: {result.data['detected_language']}")
```
### ModelService Client Usage

```python
from backend.services.modelservice_client import ModelserviceClient

client = ModelserviceClient()
result = await client.classify_intent(
    text="¿Puedes ayudarme con algo?",  # Spanish
    user_id="user123",
    conversation_context=["greeting"]
)

if result['success']:
    data = result['data']
    print(f"Intent: {data['predicted_intent']}")
    print(f"Language: {data['detected_language']}")
```
## Testing

### Unit Tests

```python
import pytest

from shared.aico.ai.analysis.intent_classifier import get_intent_classifier
from shared.aico.ai.base import ProcessingContext

@pytest.mark.asyncio
async def test_intent_classification():
    processor = await get_intent_classifier()
    context = ProcessingContext(
        thread_id="test",
        user_id="test",
        request_id="test",
        message_content="Hello there!"
    )
    result = await processor.process(context)
    assert result.success
    assert result.data["predicted_intent"] == "greeting"
    assert result.data["confidence"] > 0.7
```
### Integration Tests

```python
import pytest

from backend.services.modelservice_client import ModelserviceClient

@pytest.mark.asyncio
async def test_modelservice_integration():
    client = ModelserviceClient()
    result = await client.classify_intent(
        text="What is machine learning?",
        user_id="test-user"
    )
    assert result["success"]
    assert result["data"]["predicted_intent"] == "question"
```
## Monitoring and Observability

### Metrics

- **Classification Latency**: P50, P95, P99 response times (see the sketch below)
- **Model Loading Time**: time to load the XLM-RoBERTa model
- **Cache Hit Rate**: embedding cache effectiveness
- **Error Rate**: failed classifications per minute
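As a reference for the latency percentiles above, a minimal computation sketch; the actual metrics pipeline is not specified in this document:

```python
import numpy as np

def latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    """Compute the P50/P95/P99 latency summary from raw samples."""
    arr = np.asarray(latencies_ms)
    return {
        "p50": float(np.percentile(arr, 50)),
        "p95": float(np.percentile(arr, 95)),
        "p99": float(np.percentile(arr, 99)),
    }
```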
### Logging

```python
logger.info(f"Intent classified: {predicted_intent} (confidence={confidence:.2f}, language={detected_language})")
logger.debug(f"Processing time: {processing_time_ms:.1f}ms")
logger.warning(f"Low confidence classification: {confidence:.2f}")
```
## Troubleshooting

### Common Issues

1. **Model Loading Failures**
    - Check the transformers library version
    - Verify model cache directory permissions
    - Ensure sufficient memory (600 MB+ required)
2. **Low Classification Accuracy**
    - Check input text preprocessing
    - Verify language support
    - Consider the context window size
3. **Performance Issues**
    - Monitor model loading time
    - Check the embedding cache hit rate
    - Verify concurrent model limits
### Debug Mode

Enable debug logging for detailed processing information:
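The configuration snippet is missing from the source. Assuming standard Python logging, something along these lines would enable it (the logger name is an assumption):

```python
import logging

# Enable debug output for the intent classifier component
logging.getLogger("aico.ai.analysis.intent_classifier").setLevel(logging.DEBUG)
```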
## Future Enhancements

### Planned Features

- **Custom Intent Training**: support for domain-specific intents
- **Confidence Calibration**: improved uncertainty estimation
- **Multi-turn Context**: better conversation history integration
- **Performance Optimization**: model quantization and optimization

### Research Areas

- **Few-shot Learning**: rapid adaptation to new intents
- **Cross-lingual Transfer**: better multilingual performance
- **Contextual Embeddings**: dynamic context-aware representations
- **Federated Learning**: privacy-preserving intent model updates