1. Overview
FusionStorage is Huawei’s self-developed distributed block storage software based on a Server SAN architecture. It aggregates local disks (SSD/HDD) from standard x86 or ARM servers into a unified storage resource pool.
Key Features:
- High Performance: SSD caching + distributed parallel I/O
- High Reliability: Multi-replica mechanism + automatic rebuild
- High Scalability: Linear expansion up to 4096 nodes
2. What is Server SAN?
Definition:
A storage architecture that pools local storage from multiple servers while integrating compute and storage resources.
Characteristics:
- Replaces proprietary storage with commodity hardware
- Linear scalability of compute and storage
- Simplified management and lower TCO
3. Core Features
SmartCache (Read Acceleration)
SmartCache uses SSD as a read cache to accelerate hotspot data access.
- Without SmartCache: all data resides on HDD → higher latency
- With SmartCache: hot data is cached on SSD → improved read performance
Architecture:
- Cache Pool (SSD-based)
- Cache Partitions
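The read-acceleration idea above can be sketched as a small simulation. This is a hypothetical model, not FusionStorage code: `SsdReadCache`, the latency constants, and the LRU policy are all illustrative assumptions.

```python
from collections import OrderedDict

# Hypothetical sketch of an SSD read cache in front of HDD-resident data.
# Names (SsdReadCache, read_block) and latencies are illustrative assumptions,
# not FusionStorage APIs or measured figures.

HDD_LATENCY_MS = 8.0   # assumed average HDD read latency
SSD_LATENCY_MS = 0.2   # assumed average SSD read latency

class SsdReadCache:
    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.cache = OrderedDict()  # block_id -> data, kept in LRU order

    def read_block(self, block_id: int, backing_store: dict):
        """Return (data, latency_ms); promote hot blocks into the SSD cache."""
        if block_id in self.cache:
            self.cache.move_to_end(block_id)      # refresh LRU position
            return self.cache[block_id], SSD_LATENCY_MS
        data = backing_store[block_id]            # cache miss: read from HDD
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)        # evict the coldest block
        return data, HDD_LATENCY_MS

hdd = {i: f"block-{i}" for i in range(100)}
cache = SsdReadCache(capacity_blocks=8)
_, first = cache.read_block(42, hdd)    # cold read: HDD latency
_, second = cache.read_block(42, hdd)   # hot read: served from SSD cache
```

The second read of the same block hits the SSD cache, which is exactly the hotspot effect SmartCache targets.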
4. Architecture and Components
FSM (FusionStorage Manager)
- Provides monitoring, alarms, logs, and configuration
- Typically deployed in active/standby mode
MDC (Metadata Controller)
- Central control unit of the cluster
- Manages data distribution and recovery policies
- Leader election via Zookeeper
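ZooKeeper-based leader election typically follows the "lowest ephemeral sequential znode wins" recipe. The sketch below simulates that rule in memory only; a real deployment talks to an actual ZooKeeper ensemble, and `FakeZk` is purely a teaching stand-in.

```python
# In-memory simulation of ZooKeeper-style leader election among MDCs.
# FakeZk is a hypothetical stand-in for a real ZooKeeper ensemble; the point
# is the "lowest ephemeral sequential node wins" rule, not the API.

class FakeZk:
    def __init__(self):
        self.seq = 0
        self.nodes = {}  # znode path -> owning MDC

    def create_ephemeral_sequential(self, owner: str) -> str:
        path = f"/mdc/election/n{self.seq:010d}"  # zero-padded sequence number
        self.seq += 1
        self.nodes[path] = owner
        return path

    def leader(self) -> str:
        return self.nodes[min(self.nodes)]        # lowest sequence number wins

    def session_expired(self, owner: str):
        # Ephemeral znodes vanish when the owner's session dies.
        self.nodes = {p: o for p, o in self.nodes.items() if o != owner}

zk = FakeZk()
for mdc in ("mdc-1", "mdc-2", "mdc-3"):
    zk.create_ephemeral_sequential(mdc)
first_leader = zk.leader()       # mdc-1 registered first, so it leads
zk.session_expired("mdc-1")      # leader fails -> next-lowest takes over
```

Because followers watch the node immediately preceding their own, failover is automatic: when the leader's ephemeral node disappears, the next-lowest holder becomes leader without a full re-election storm.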
VBS (Virtual Block Service)
- Provides block storage services (volumes)
- Manages volume metadata
- Only one active VBS (leader) handles metadata writes
OSD (Object Storage Daemon)
- Executes actual I/O operations
- Typically one process per disk
FSA (FusionStorage Agent)
- Runs on each node
- Handles communication with FSM
- Hosts whichever MDC, VBS, and OSD processes are deployed on that node

5. Key Mechanisms
Metadata Management
- MDC cluster elects a leader via Zookeeper
- Leader MDC manages cluster state and scheduling
- VBS also uses leader/follower architecture to avoid metadata conflicts
Heartbeat Mechanism
- ZK ↔ MDC: 1s interval, 10s timeout
- MDC ↔ OSD: 1s interval, 5s failure detection
- MDC ↔ VBS: 1s interval
→ Ensures fast fault detection and failover
6. Data Reliability
Multi-Replica Mechanism
- Supports 2 or 3 replicas
- Distributed across nodes and racks
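Distributing replicas "across nodes and racks" means the placement logic must never put two replicas of the same block into the same fault domain. A hypothetical rack-aware placement sketch (the function name and cluster layout are assumptions, not the actual FusionStorage algorithm):

```python
import itertools

# Hypothetical sketch of rack-aware replica placement: choose one node per
# rack so no two replicas of a block share a rack-level fault domain.

def place_replicas(nodes_by_rack: dict, replica_count: int) -> list:
    racks = list(nodes_by_rack)
    if replica_count > len(racks):
        raise ValueError("not enough racks for rack-level fault domains")
    rack_cycle = itertools.cycle(racks)
    placement = []
    for _ in range(replica_count):
        rack = next(rack_cycle)
        placement.append((rack, nodes_by_rack[rack][0]))  # pick a node in rack
    return placement

cluster = {"rack-A": ["nodeA1", "nodeA2"],
           "rack-B": ["nodeB1"],
           "rack-C": ["nodeC1"]}
replicas = place_replicas(cluster, replica_count=3)  # one replica per rack
```

With 3 replicas spread over 3 racks, the loss of any single rack still leaves two intact copies.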
Automatic Data Rebuild
Process:
Failure detection → replica comparison → data reconstruction → parallel recovery
Power Failure Protection
- Uses NVDIMM or SSD
- Prevents metadata and cache loss
7. I/O Path
- VBS: performs the logical mapping from volume offsets to keys (key-value structure)
- OSD: executes the physical I/O against the disks
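The two-stage path can be sketched as: VBS turns a (volume, offset) pair into a key, and the key is hashed to a partition owned by some OSD. The partition count, block size, and hashing scheme below are illustrative assumptions, not the exact FusionStorage DHT.

```python
import hashlib

# Sketch of the two-stage I/O path: VBS maps (volume, LBA) to a key, then
# the key is hashed to a partition that an OSD owns. Partition count, block
# size, and hash choice are assumptions for illustration only.

PARTITIONS = 3600                       # assumed partition count

def vbs_key(volume_id: str, lba: int, block_size: int = 1 << 20) -> str:
    """Logical mapping: volume offset -> key (key-value structure)."""
    return f"{volume_id}:{lba // block_size}"

def route_to_partition(key: str) -> int:
    """Hash the key onto a fixed partition space."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % PARTITIONS

# Static partition -> OSD ownership map (12 hypothetical OSDs).
partition_map = {p: f"osd-{p % 12}" for p in range(PARTITIONS)}

key = vbs_key("vol-007", lba=5 * (1 << 20))
osd = partition_map[route_to_partition(key)]   # OSD that executes the I/O
```

Because the mapping is deterministic, any VBS instance routes the same key to the same OSD without consulting a central lookup on every I/O.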
8. Deployment Modes
Converged Deployment
- VBS + OSD on same node
- Recommended for virtualization
Separated Deployment
- VBS and OSD on different nodes
- Recommended for high-performance workloads (e.g., databases)
9. Use Cases
Enterprise Core Applications
- High-performance workloads (databases, big data)
- Often combined with InfiniBand and SSD
Cloud Data Centers
- Large-scale storage pooling
- Supports hypervisors and mainstream applications (e.g., MySQL)
10. Resource Constraints
Storage Pool Requirements
- Minimum 12 disks per pool
- Same disk type within a pool
- Same cache type and quantity
- Disk count difference per node ≤ 2
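The pool requirements above are mechanical enough to check in code. A hypothetical validator (the function and the sample clusters are illustrative):

```python
# Hypothetical validator for the storage pool requirements listed above:
# >= 12 disks per pool, one disk type per pool, per-node disk count
# difference <= 2. Function and sample data are illustrative.

def validate_pool(disks_per_node: dict, disk_type_per_node: dict) -> list:
    problems = []
    if sum(disks_per_node.values()) < 12:
        problems.append("pool needs at least 12 disks")
    if len(set(disk_type_per_node.values())) > 1:
        problems.append("all disks in a pool must be the same type")
    counts = disks_per_node.values()
    if max(counts) - min(counts) > 2:
        problems.append("per-node disk count difference must be <= 2")
    return problems

ok = validate_pool({"n1": 4, "n2": 4, "n3": 4},
                   {"n1": "SSD", "n2": "SSD", "n3": "SSD"})
bad = validate_pool({"n1": 2, "n2": 6, "n3": 4},
                    {"n1": "SSD", "n2": "HDD", "n3": "SSD"})
```

The second cluster fails on two counts: mixed SSD/HDD disks and a per-node disk spread of 4.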
Cluster Scale
- Zookeeper nodes: 3 / 5 / 7 (odd numbers only)
- MDC: 3–96 nodes (default 3)
- MDC count = ZK count + number of storage pools
11. Fault Tolerance
MDC
- Maximum failures allowed = total MDC – 1
- System can still run with only one MDC
Disk Failure Tolerance
Depends on:
- Replica number
- Fault domain (node/rack level)
Simplified:
- Max failures ≈ (replica count – 1) × disks per fault domain
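A worked instance of the simplified bound, assuming a node-level fault domain of 12 disks (the figures are illustrative):

```python
# Worked example of the simplified bound above: with replica count R and a
# fault domain of D disks, up to (R - 1) * D disk failures can be tolerated,
# provided the failures are confined to at most R - 1 fault domains.
# The 12-disks-per-node figure is an illustrative assumption.

def max_tolerable_disk_failures(replica_count: int, disks_per_domain: int) -> int:
    return (replica_count - 1) * disks_per_domain

# 3 replicas, 12 disks per node: up to 24 failed disks across 2 nodes.
limit = max_tolerable_disk_failures(replica_count=3, disks_per_domain=12)
```

The "confined to R - 1 domains" caveat matters: 24 failures spread over three nodes could destroy all three copies of some block, so the bound is an approximation, as the "Simplified" label above signals.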
12. Network Planes
- Management Plane: FSM ↔ FSA
- Service Plane: host OS / applications ↔ VBS
- Storage Plane: MDC / VBS / OSD
→ Separation improves performance, stability, and security