From fd962649a1704f28bada3c39bc2f105bbc997e1a Mon Sep 17 00:00:00 2001
From: Zac Dover
Date: Tue, 31 Mar 2026 23:39:38 +1000
Subject: [PATCH] Add CephFS summary to ceph.md architecture page

Add a CephFS summary to the ceph.md architecture page.

Signed-off-by: Zac Dover
---
 docs/architecture/ceph.md | 166 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 166 insertions(+)

diff --git a/docs/architecture/ceph.md b/docs/architecture/ceph.md
index 7909f1e..ca6a172 100644
--- a/docs/architecture/ceph.md
+++ b/docs/architecture/ceph.md
@@ -289,6 +289,172 @@ part of the unified Ceph storage platform, RGW benefits from the same
 reliability, performance, and operational characteristics that make Ceph a
 leading choice for software-defined storage solutions.
 
+## CephFS in Summary
+
+### Introduction
+
+CephFS (Ceph File System) is Ceph's distributed file system interface that
+provides POSIX-compliant file storage built on top of the RADOS object store.
+As one of Ceph's three primary storage interfaces alongside RBD (block storage)
+and RGW (object storage), CephFS enables users to mount a shared filesystem that
+appears as a traditional hierarchical directory structure while leveraging
+Ceph's distributed storage capabilities for scalability, reliability, and
+performance. This combination of familiar filesystem semantics with enterprise
+storage features makes CephFS suitable for workloads ranging from home
+directories and shared application data to high-performance computing and big
+data analytics.
+
+### Architecture and Components
+
+CephFS operates through a carefully designed architecture that separates data
+and metadata management. At its core, CephFS relies on two essential components:
+the Metadata Server (MDS) and the underlying RADOS storage cluster that stores
+both file data and metadata.
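As a concrete illustration of this split, a CephFS filesystem is backed by one RADOS pool for metadata and at least one for data. A minimal sketch of creating such a filesystem follows; the pool names, PG counts, and filesystem name are placeholder examples, not values from this page:

```shell
# Create dedicated RADOS pools for metadata and for file data
# (pool names and placement-group counts are illustrative).
ceph osd pool create cephfs_metadata 64
ceph osd pool create cephfs_data 64

# Create the filesystem from the two pools; a running ceph-mds
# daemon is then assigned as the active metadata server.
ceph fs new cephfs cephfs_metadata cephfs_data

# List filesystems and their metadata/data pool assignments.
ceph fs ls
```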
+
+The Metadata Server daemon (ceph-mds) manages all filesystem metadata including
+directory structures, file ownership, permissions, access timestamps, and
+extended attributes. Unlike traditional filesystems where metadata resides on
+the same storage devices as data, CephFS stores metadata in dedicated RADOS
+pools, allowing it to be replicated and distributed independently. This
+separation enables CephFS to scale metadata operations independently of data
+operations, a critical capability for large-scale deployments.
+
+File data in CephFS is stored as RADOS objects distributed across the cluster's
+Object Storage Daemons (OSDs). When a client writes a file, CephFS stripes the
+data across multiple objects according to configurable striping parameters,
+enabling parallel I/O and leveraging the aggregate bandwidth of multiple storage
+devices. This architecture allows CephFS to scale from gigabytes to petabytes
+while maintaining consistent performance characteristics.
+
+### POSIX Compliance and Compatibility
+
+CephFS provides strong POSIX compliance, supporting the vast majority of
+standard filesystem operations expected by applications and users. This includes
+hierarchical directory structures, standard file permissions and ownership,
+symbolic and hard links, extended attributes, and file locking mechanisms. This
+POSIX compliance ensures that existing applications can use CephFS without
+modification, making it a drop-in replacement for traditional network
+filesystems like NFS or SMB.
+
+Clients can access CephFS through multiple methods. The kernel client integrates
+directly with the Linux kernel, providing native filesystem performance and
+supporting standard mount operations. FUSE (Filesystem in User Space) clients
+enable CephFS mounting on systems without kernel module support or in situations
+requiring non-root access.
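For instance, the kernel and FUSE clients can be used as sketched below; the monitor address, client name, secret file path, and mount point are placeholder values for an assumed cluster:

```shell
# Kernel client: mount CephFS natively (requires the "ceph"
# kernel module); credentials come from a local secret file.
sudo mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret

# FUSE client: the same mount implemented in user space,
# with no kernel module required.
sudo ceph-fuse --id admin /mnt/cephfs
```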
+Additionally, libcephfs provides a library interface
+for applications to interact with CephFS programmatically, enabling custom
+integration scenarios.
+
+### Metadata Server Design
+
+The MDS represents a sophisticated component designed specifically for
+distributed metadata management. In CephFS, metadata operations like listing
+directories, creating files, or checking permissions can dominate workload
+patterns, particularly with applications handling many small files. By
+maintaining metadata in memory and leveraging high-performance RADOS operations
+for persistence, the MDS achieves low-latency metadata operations essential for
+good filesystem performance.
+
+CephFS supports multiple MDS daemons operating simultaneously, enabling both
+high availability and horizontal scalability. In active-passive configurations,
+standby MDS daemons monitor active instances and can take over immediately if an
+active MDS fails, with the transition handled automatically by Ceph monitors.
+The journal stored in RADOS ensures that no metadata operations are lost during
+failover.
+
+For scalability, CephFS implements dynamic subtree partitioning, allowing
+multiple active MDS daemons to divide the filesystem namespace among themselves.
+The system automatically balances load by migrating directory subtrees between
+MDS instances based on access patterns. A heavily accessed directory can even be
+sharded across multiple MDS daemons, with each daemon handling different entries
+within the same directory. This dynamic load balancing ensures that metadata
+operations scale with the number of active MDS instances.
+
+### Performance Characteristics
+
+CephFS delivers strong performance across diverse workloads through several
+architectural optimizations. Client-side caching reduces latency for frequently
+accessed data and metadata, with cache coherency maintained through distributed
+locking mechanisms managed by the MDS.
+This caching enables multiple clients to
+access the same files efficiently while maintaining consistency.
+
+The striping of file data across multiple RADOS objects enables high-bandwidth
+sequential I/O operations, with clients performing parallel reads and writes
+directly to OSDs. For large files, this parallelism allows CephFS to saturate
+available network bandwidth and leverage the aggregate throughput of many
+storage devices simultaneously.
+
+Metadata performance benefits from the MDS's in-memory metadata cache and
+efficient RADOS operations for persistence. For workloads with good locality,
+where applications repeatedly access files within the same directory trees, the
+MDS cache provides excellent performance. The ability to scale metadata
+operations through multiple active MDS daemons addresses the metadata bottleneck
+that plagues many distributed filesystems at scale.
+
+### Snapshots and Quotas
+
+CephFS provides sophisticated snapshot capabilities enabling point-in-time
+copies of directory trees. Snapshots are space-efficient, storing only changed
+data rather than full copies, and can be created instantly on any directory
+within the filesystem. Users can browse snapshot contents through a special
+`.snap` directory and restore files or entire directory trees as needed.
+Administrative snapshots enable backup and recovery strategies while
+user-accessible snapshots provide self-service recovery from accidental
+deletions or modifications.
+
+Directory quotas allow administrators to limit storage consumption at any point
+in the directory hierarchy. Quotas can restrict both the total bytes consumed
+and the number of files, with enforcement occurring at write time. This enables
+multi-tenant deployments where different users or projects share a filesystem
+while preventing any single entity from consuming excessive resources.
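Both snapshots and quotas are driven from an ordinary client mount point. A brief sketch, assuming a hypothetical mount at /mnt/cephfs with example limits:

```shell
# Create an instant snapshot of a directory by making a
# subdirectory inside its special .snap directory.
mkdir /mnt/cephfs/projects/.snap/before-cleanup

# Restore a file from that snapshot with a plain copy.
cp /mnt/cephfs/projects/.snap/before-cleanup/report.txt /mnt/cephfs/projects/

# Limit the tree to 100 GB and 100,000 files via the
# quota extended attributes.
setfattr -n ceph.quota.max_bytes -v 100000000000 /mnt/cephfs/projects
setfattr -n ceph.quota.max_files -v 100000 /mnt/cephfs/projects

# Inspect a configured quota.
getfattr -n ceph.quota.max_bytes /mnt/cephfs/projects
```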
+
+### Multiple Filesystems
+
+Recent CephFS versions support multiple independent filesystems within a single
+Ceph cluster, each with its own namespace, MDS cluster, and data pools. This
+capability enables isolation between different use cases or tenants while
+sharing the underlying storage infrastructure. Each filesystem can be configured
+with different parameters, replication strategies, or performance
+characteristics appropriate to its specific workload requirements.
+
+### Security and Access Control
+
+CephFS implements multiple layers of security. Path-based access restrictions
+allow administrators to limit client access to specific directory subtrees,
+enabling multi-tenant scenarios where different clients see only their allocated
+portions of the filesystem. CephX, Ceph's native authentication mechanism,
+ensures that only authorized clients can mount the filesystem.
+
+Standard POSIX permissions and ACLs provide fine-grained access control at the
+file and directory level, allowing familiar Unix-style permission management.
+Extended attributes enable additional metadata storage for applications
+requiring custom attributes or security labels.
+
+### Use Cases and Applications
+
+CephFS excels in scenarios requiring shared filesystem access across multiple
+clients. Home directories, shared application data, and collaborative workspaces
+benefit from CephFS's strong consistency and POSIX compatibility.
+High-performance computing environments leverage CephFS for shared job data and
+scratch space, taking advantage of the parallel I/O capabilities and
+scalability.
+
+Content creation workflows in media and entertainment utilize CephFS for shared
+storage of large media files, benefiting from high bandwidth and the ability to
+scale capacity and performance independently. Big data analytics platforms use
+CephFS for storing datasets that multiple processing nodes must access
+simultaneously.
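The path-based restrictions described under Security and Access Control are granted as CephX capabilities. A hedged sketch, with the client name, subtree, monitor address, and secret file as placeholder examples:

```shell
# Grant client.webapp read/write access restricted to the
# /webapp subtree of the filesystem named "cephfs".
ceph fs authorize cephfs client.webapp /webapp rw

# The resulting key can then be used to mount only that subtree.
sudo mount -t ceph 192.168.1.10:6789:/webapp /mnt/webapp \
    -o name=webapp,secretfile=/etc/ceph/webapp.secret
```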
+
+### Conclusion
+
+CephFS represents a mature, scalable distributed filesystem that brings POSIX
+compatibility to Ceph's distributed storage platform. By separating metadata and
+data management, supporting multiple active MDS daemons, and leveraging RADOS
+for reliable distributed storage, CephFS delivers enterprise-grade filesystem
+capabilities suitable for demanding production workloads. Its combination of
+familiar filesystem semantics, strong performance, and advanced features like
+snapshots and dynamic metadata scaling makes CephFS a compelling choice for
+organizations requiring shared filesystem storage at scale.
+
 ## See Also
 
 The architecture of the Ceph cluster is explained in [the Architecture chapter of the upstream Ceph