
[Colloquium] Ceph: A Scalable, High-Performance Distributed File System

January 18, 2007

Watch Colloquium: 

AVI file (425 MB) 
Quicktime (154 MB)


  • Date: Thursday, January 18th, 2007 
  • Time: 11 am — 12:15 pm 
  • Place: ECE 118

Ethan L. Miller 
University of California, Santa Cruz

Abstract: The data storage needs of large high-performance and general-purpose computing environments are generally best served by distributed storage systems because of their ability to scale and withstand individual component failures. Object-based storage promises to address these needs through a simple networked data storage unit, the Object Storage Device (OSD), which manages all local storage issues and exports a simple read/write data interface. Despite this simple concept, many challenges remain, including efficient object storage, centralized metadata management, data and metadata replication, data and metadata reliability, and security.
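To make the object interface concrete, here is a toy, in-memory sketch. The class and method names are hypothetical. They are not the T10 OSD command set or Ceph's API. The point is only that clients address data by object ID and byte offset, while the device keeps its own layout and allocation private.

```python
class ObjectStorageDevice:
    """Toy in-memory OSD (hypothetical API): clients name data by object ID
    and byte offset; how bytes are laid out locally is the device's business."""

    def __init__(self) -> None:
        # Private local layout: here just a dict from object ID to a byte buffer.
        self._objects: dict[str, bytearray] = {}

    def write(self, oid: str, offset: int, data: bytes) -> None:
        """Write `data` at `offset`, creating or growing the object as needed."""
        buf = self._objects.setdefault(oid, bytearray())
        end = offset + len(data)
        if end > len(buf):
            # Zero-fill any gap between the current end and the new data.
            buf.extend(b"\x00" * (end - len(buf)))
        buf[offset:end] = data

    def read(self, oid: str, offset: int, length: int) -> bytes:
        """Return up to `length` bytes starting at `offset`."""
        buf = self._objects.get(oid)
        if buf is None:
            raise KeyError(oid)
        return bytes(buf[offset:offset + length])
```

For example, after `osd.write("obj", 0, b"hello")` and `osd.write("obj", 8, b"world")`, reading 13 bytes from offset 0 returns `b"hello\x00\x00\x00world"`: the sparse gap is zero-filled by the device, and the client never sees block numbers.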

This talk will open by detailing the object-based storage paradigm and showing how it improves on existing storage paradigms. I will then describe the Ceph file system developed at the UC Santa Cruz Storage Systems Research Center (SSRC). Ceph is designed to deliver an aggregate bandwidth of hundreds of gigabytes per second to tens of thousands of clients while tolerating individual component failures, and it can also handle 250,000 metadata requests per second, removing the bottleneck in opening files for high-bandwidth use. This talk will focus on two aspects of Ceph: the decentralized algorithm used to distribute data across thousands of storage devices, and the scalable mechanisms Ceph uses to ensure that access to the file system is secure.
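The key idea behind decentralized data distribution is that placement is a pure function of the object ID and a shared device map, so any client can compute where data lives without asking a central server. Ceph's own algorithm for this is CRUSH. The sketch below is not CRUSH but a much simpler relative, weighted rendezvous (highest-random-weight) hashing. The `place` function, device names, and weights are hypothetical illustrations.

```python
import hashlib
import math


def _unit_hash(object_id: str, device: str) -> float:
    """Map an (object, device) pair to a pseudo-random float strictly in (0, 1)."""
    digest = hashlib.sha256(f"{object_id}/{device}".encode()).digest()
    n = int.from_bytes(digest[:8], "big")  # 0 <= n < 2**64
    return (n + 1) / (2**64 + 1)


def place(object_id: str, devices: dict[str, float], replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct devices for `object_id` (weighted rendezvous hashing).

    Each device gets the score weight / -ln(u); the highest scores win. A device
    is chosen first with probability proportional to its weight, and every
    client with the same device map computes the same list.
    """
    eligible = {dev: w for dev, w in devices.items() if w > 0}
    if replicas > len(eligible):
        raise ValueError("not enough devices with positive weight")
    scores = {
        dev: w / -math.log(_unit_hash(object_id, dev))
        for dev, w in eligible.items()
    }
    return sorted(scores, key=scores.__getitem__, reverse=True)[:replicas]
```

Because each device's score depends only on the object ID and that device, removing a device moves only the objects that had a replica on it. Every other placement stays put, which is the property that lets such a scheme scale to thousands of devices without a lookup table.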

Bio: Ethan Miller is an associate professor of computer science at the University of California, Santa Cruz, where he is a member of the Storage Systems Research Center (SSRC). He received his ScB from Brown in 1987 and his PhD from UC Berkeley in 1995, where he was a member of the RAID project. He has written over 75 papers in areas including large-scale storage systems, file systems for next-generation storage technologies, secure file systems, distributed systems, and information retrieval. His current research interests include issues in petabyte-scale storage systems, file systems for non-volatile RAM technologies, and long-term archival storage systems. His broader interests include file systems, operating systems, parallel and distributed systems, and computer security. He can be contacted at elm@cs.ucsc.edu.