Presented by

  • Ojaswin Mujoo

    Ojaswin Mujoo is a Linux kernel developer at IBM who works in the filesystems domain, mainly on ext4. Ojaswin has been focusing on improving the performance of ext4, especially on enterprise servers and database workloads. His recent work on the ext4 allocator improved the filesystem's performance by 2x when the filesystem is fragmented or undergoing many parallel writes, especially on large-page-size architectures like PowerPC. Of late he has been interested in how different databases interact with the Linux filesystem layers, and in ways to eliminate bottlenecks in the kernel and improve performance. As part of this he has also worked on how the Linux kernel can provide atomic write guarantees to userspace.

  • Ritesh Harjani

    Ritesh is a Linux filesystems developer at IBM. For the last few years he has been focusing on improving the performance and scalability of filesystems on Power.


Most databases, such as MySQL and PostgreSQL, currently use a mechanism called the double-write buffer: they write a complete 8K/16K chunk to their own journal before performing the actual write to disk. Databases need this double write to provide crash consistency in case a crash happens in the middle of a write, but it comes at a major performance cost, as much as 30% in extreme cases. The double write can be avoided if the whole stack, both the device and the OS, supports atomic writes. We have now started seeing devices that can perform such multi-kilobyte atomic writes, but Linux still needs work before it can support them. Currently there is a proposal from the community to enable atomic writes for direct I/O on XFS; however, there are problems when it comes to supporting atomic writes for buffered I/O (PostgreSQL, for example, uses buffered I/O).

In this talk we would like to go over why databases would benefit from atomic writes, the challenges involved in implementing them in Linux, and the work done so far. We have a working prototype that performs atomic write extent allocations in ext4 for direct I/O. Next, we are looking to leverage the 64K page size of Power (or large folio support on other architectures) to prototype Linux atomic write support for buffered I/O.
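
As a rough illustration of the double-write pattern described above, the following minimal sketch writes each page twice: first to a journal file, then in place. It is not how any particular database implements it; the 16K `PAGE_SIZE`, the file names, and the helpers `double_write` and `demo` are all assumptions made for the example, and a real journal would also record the page number and a checksum for recovery.

```python
import os
import tempfile

PAGE_SIZE = 16 * 1024  # assumed database page size for this sketch


def double_write(data_fd, journal_fd, page_no, page):
    """Hypothetical helper: persist one page using a double write.

    Returns the total number of bytes written to storage.
    """
    if len(page) != PAGE_SIZE:
        raise ValueError("page must be exactly PAGE_SIZE bytes")
    # 1. Persist a full copy of the page in the journal first, so an
    #    intact copy exists if the in-place write below is torn by a crash.
    written = os.pwrite(journal_fd, page, 0)
    os.fsync(journal_fd)
    # 2. Only then overwrite the page at its real location in the data file.
    written += os.pwrite(data_fd, page, page_no * PAGE_SIZE)
    os.fsync(data_fd)
    return written


def demo():
    """Write one page to temporary files and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        flags = os.O_RDWR | os.O_CREAT
        data_fd = os.open(os.path.join(tmp, "data.db"), flags, 0o600)
        journal_fd = os.open(os.path.join(tmp, "journal"), flags, 0o600)
        try:
            page = b"\xab" * PAGE_SIZE
            written = double_write(data_fd, journal_fd, 2, page)
            stored = os.pread(data_fd, PAGE_SIZE, 2 * PAGE_SIZE)
            return stored == page, written
        finally:
            os.close(data_fd)
            os.close(journal_fd)


if __name__ == "__main__":
    ok, written = demo()
    print("page stored intact:", ok)
    print("bytes written per page update:", written)
```

Every page update here costs two writes and two `fsync` calls, which is the overhead an atomic write guarantee from the device and the OS would remove.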