
In his latest video, Jeff Geerling takes on an ambitious undertaking he calls the PetaByte Pi Project: connecting 60 hard drives with a total capacity of 1.2 petabytes to a single Raspberry Pi. As he points out, many people doubted the endeavor would work, but Jeff decided to find out. He recounts the work that led up to the project: the Raspberry Pi's chip was never designed for this kind of massive storage operation, and even getting 16 drives working required patching the Linux kernel with help from Broadcom engineers. Power delivery and physical space for the drives are significant challenges as well, but the Storinator chassis solves many of them. He filmed in his workshop rather than his usual office because the office simply could not accommodate such a large machine.

To put petabyte-scale storage in perspective, Jeff notes that the array could hold roughly 50,000 copies of Wikipedia. For the average home user, or even S-tier data hoarders, that scale is still a long way off. He explains why he chose the Storinator XL60 from 45 Drives for the build and humorously discloses that he convinced 45 Drives to send him the server and drives after promising not to let his alter ego, Red Shirt Jeff, anywhere near it.
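As a quick sanity check on that comparison: dividing 1.2 petabytes by 50,000 copies implies roughly 24 gigabytes per copy, which is in the right ballpark for a compressed text-only dump of English Wikipedia. The per-copy figure is an inference, not a number from the video; a minimal sketch of the arithmetic:

```python
# Back-of-the-envelope check of the "50,000 Wikipedias" comparison.
# Uses decimal units (1 PB = 10**15 bytes); the ~24 GB per copy is an
# inferred figure, not one stated in the video.
TOTAL_BYTES = 1.2e15      # 1.2 petabytes of raw capacity
COPIES = 50_000           # claimed number of Wikipedia downloads

per_copy_gb = TOTAL_BYTES / COPIES / 1e9
print(f"Implied size per copy: {per_copy_gb:.0f} GB")   # ~24 GB
```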

In the next part of the video, Jeff dives into the technical details, showing the hardware assembly and the string of problems he ran into. The Raspberry Pi Compute Module 4 has a quad-core ARM processor and only a single PCI Express lane, which makes the task even harder. Even with enterprise RAID cards handling the drives, results were disappointing: many operations failed outright, and individual disks would occasionally disappear mid-operation. Jeff also tried several file systems and RAID layouts, with mixed results that confirmed the practical limits of the Raspberry Pi's hardware.

In conclusion, Jeff notes that while he did get 1.2 petabytes running on a Raspberry Pi, the whole setup was finicky and far from optimized. He wouldn't recommend it to anyone: the Pi simply can't handle the kind of load that server-grade hardware is built for. He plans to keep working with the Storinator, switching back to the server hardware 45 Drives designed it for. At the time of writing, the video has garnered 2,421,465 views and 38,329 likes, showing considerable interest in both the topic and Jeff's approach to it.

Timeline summary

  • 00:00 Introduction of 1.2 petabytes of hard drives and a Raspberry Pi.
  • 00:05 Plans to connect 60 hard drives to the Raspberry Pi for the PetaByte Pi Project.
  • 00:23 Challenges faced during the project, including limitations of Raspberry Pi's capabilities.
  • 00:46 Collaboration with Broadcom engineers to get initial drives working.
  • 00:56 Introduction of the Storinator for power and space management.
  • 01:16 Mention of networking bottlenecks with the Raspberry Pi.
  • 01:41 Explaining the massive scale of storage, likening it to downloading Wikipedia multiple times.
  • 02:04 Collaboration with 45 Drives to set up the required storage.
  • 02:16 Unboxing of the server unit, the Storinator XL60.
  • 02:40 Examination of the internal components including a 26-core CPU and multiple drive slots.
  • 04:25 Discussion about hard drive reliability and performance.
  • 05:00 Choosing the Raspberry Pi Compute Module 4 for the project.
  • 05:57 Details about the assembly of components including networking and RAID setup.
  • 07:08 Troubleshooting issues encountered during the assembly and setup.
  • 11:33 Attempt to boot the Raspberry Pi OS on the configuration.
  • 13:14 Experiencing errors during network file transfers and configuring RAID.
  • 17:05 Switching to BTRFS file system after initial RAID failures.
  • 19:25 Performance benchmarks showcasing lower than expected data transfer speeds.
  • 20:00 Discussion on the time needed to transfer large amounts of data.
  • 21:01 Recap of hardware limitations and configuration challenges.
  • 21:30 Conclusion on the feasibility of running such a large setup on a Raspberry Pi.

Transcription

This is 1.2 petabytes of hard drives. And this is a Raspberry Pi. And using this giant server, I'm going to plug all 60 hard drives into this one Raspberry Pi and build the Piedabyte. Or PetaPi? Let me know in the comments what you want to call this thing. For now, I'm calling it the PetaByte Pi Project. A lot of people said this will never work, but there's only one way to find out. And to get to today, I had to solve a lot of problems. The chip in the Raspberry Pi was never meant for this kind of thing. It only has a tiny bit of bandwidth, and the thing it uses to communicate with the hard drives, its PCI Express bus, might not even be able to work with all the drives. Last year, I worked with Broadcom engineers to patch the Linux kernel to get 16 drives working. But getting 60? Well, that adds another layer of complexity. And what about power and space for the drives? Well, luckily, I have the Storinator for that. And you might notice I'm in the workshop. I couldn't even fit this thing in my office where I normally record, so I had to boot out Red Shirt Jeff and film here. But since I felt sorry for kicking him out, I made sure 45 Drives threw him a bone with the front panel design. I think he approves. Once we get all the drives working, if we get all the drives working, we're also going to have to deal with the Pi's networking bottlenecks. I guess I should make it obvious right away: don't try this at home. Not that many people have 60 hard drives and a Storinator sitting around, but there's a reason 45 Drives doesn't sell a Storinator Pi. It might work, but it's gonna bottleneck. Before we get into hardware, I need to put into perspective how massive a petabyte is. I could download all of Wikipedia on here. 50,000 times. My NAS, the one I loaded up with 8 terabyte hard drives, that thing is 40 times smaller than this petabyte of storage. Of course, it's gonna be a while before normal home users, or heck, even S-tier data hoarders, deal with petabytes. Just having somewhere to plug in all the drives is hard, and that's why I called up 45 Drives. They build a range of storage servers, and their biggest one is this one, the Storinator XL60, and it is massive. I mean, look at this thing. Full disclosure, somehow I convinced 45 Drives to send me the server and all these hard drives. They told me they want to see what wild and crazy things I can do with it, but I had to promise the insurance company I wouldn't let Red Shirt Jeff near it. I was thinking about unboxing it on camera, but FedEx already did. Luckily, nothing was damaged, in spite of this box completely missing one of its corners. If I pop the top cover off, you can see slots for 60 hard drives, and if you just pop the drives in, you don't even need a tray to hold them, and looking closely inside, it looks like they even have some 3D-printed parts in here. After I pop off the back cover, you can see a 26-core Xeon CPU, a SuperMicro server motherboard with seven PCIe slots, a dual 10-gigabit Ethernet card, 256 gigs of RAM, and we're gonna rip it all out. Sorry to whoever cable-managed this thing. It looks great right now, but it's gonna get a little messy. First, I took out the boot SSDs after making sure one had an identifying mark from the factory. Then, when I pulled the network card out, I found a nice surprise. One thing I do like is it looks like all the screws in their chassis are the same thread, so I could mix and match these screws, which is nice. Standards.
It's nice to have them. Phillips heads. It's nice to have those, too. If Apple built this thing, there'd probably be five different kinds of screws, and half of them would be pentalobes. I unplugged the ATX and CPU power connections, plus a little purple connector that I have no idea what it's for. Then I went to unplug all the HBA drive connections, and realized they were in a weird order. Looks like they have C, A, B, D for the order of these cards, which is not as intuitive as I'd hoped. It's probably important for a Storinator, since 45 Drives has a fancy dashboard that can show where the drives are physically located in the system, but the ordering won't matter on the Pi. Look at that. Even the motherboard screws are the same. How nice. That is nice, and it's especially nice because there are so many screws here. Server motherboards seem to have more screws in them than standard ones. Kind of obvious, I know, but it is interesting. A lot of PC builds I've seen only have like six screws holding the motherboard, and there can be an awful lot of board flex. Protocase, the division that actually stamps out these cases, embeds studs in every position, so there are nine screws total. Oh, somewhere someone's screaming at me to use my static protection. Where's your anti-static wristband? You don't even ground the anti-static mat you're using. So instead of all that enterprise-grade hardware, we're going with this. This Raspberry Pi Compute Module 4 has a four-core ARM CPU, one PCI Express lane, one gigabit Ethernet port, and a paltry eight gigabytes of RAM. I'm going to put it on this I/O board. Then I'm going to use this PCI Express switchboard to plug in four of these. This is an enterprise-grade RAID card, and what's funny is, this one is actually a little newer than the ones that came with the Storinator. Some people told me I should go the easy route, and instead of using four of these HBAs, I could use a thing called a SAS expander and plug all the drives through that. But with the Storinator's custom backplane boards down here that all the drives plug into, it might not work that way. Plus, I wanted to see if a Pi could handle enterprise-grade storage, and if you want the best performance, you're going to use four of those fancy RAID cards, not a bunch of expanders. Of course, if you're only interested in performance, go check out Linus's video on a petabyte of flash. I 3D printed an ATX adapter plate for the I/O board and drilled out the ATX mounting holes to a quarter inch so they'd slip over the Storinator's motherboard standoffs. That thing lets me mount the Pi so all the I/O goes out where a normal motherboard's I/O is. I also 3D printed an I/O shield, but because of a little threaded screw that was stuck in the case, the I/O board stands a little proud and the I/O shield won't fit, so I just tossed it. More airflow, right? But when I went to put in the PCI Express switch, I realized the thing is actually pretty big with the wide spacing between the slots. This will be interesting. I'm literally going to have it on top of the Pi. But that's okay. I grabbed a little piece of cardboard to insulate the boards from each other, at least temporarily. But I also had to get a Molex power connection to each slot on the switchboard, and that means the power supply cable management had to go. All right, we're going to snip these. The power supply had two Molex power connectors, but that was it, so it didn't have enough power connections. I hope I have an adapter. Looks like I do have a couple.
So I'm going to go SATA to Molex for two of these, since that's what this board needs. I finished wiring up the power, then plugged in the USB 3 cable adapter that goes from the Pi's x1 slot to the switchboard's input. As long as it doesn't blow up, we'll be good. Indeed. After I got that sorted, I started plugging all the backplanes into the HBAs. So the fun thing is, for the Pi, this doesn't matter too much. What matters is getting them plugged in. So I am going to try to get these organized. But if they aren't, that's not the end of the world. In the end, I didn't really get them organized. I just made sure they all clicked into one of the free ports. No click, but it's in. Well, maybe except for that one. The next thing I need is a power switch to be able to turn on everything. And that's because the I/O board doesn't have an ATX power header. So to turn on the power supply, I could either jump the right pins on its ATX header or use a fancy switch like this one. Unfortunately, it looks like the leads on the switch are soldered, so I couldn't get it to slide into the power button hole. I'll just have to have it dangling. Oh well. The last thing left is to install the actual Pi Compute Module. Compute Module 4 goes here. I think that's everything. I should make it clear. Before I started this build, I did do some testing already at my desk with four HBAs and one hard drive plugged into each one. And if you think that looks messy, just think what it would look like if I plugged in 60 drives that way. But I did that four-drive test and I made sure the drives would at least power up, be recognized, and work together in a RAID 0 array. With that setup, I got about 416 megabytes per second. So I think this will work. But like I said, the only way to know is to test it at scale. All right, here we go. This is the fun part. Well, it's fun, but the problem is that to move this, I'll need to take them all back out again. Now, the hard drives 45 Drives sent are Seagate Exos X20 drives. And you might think about jamming any old drive in a system like this, but you shouldn't, even if you're using a Pi. The performance and warranty are typically better on enterprise drives, but that's not usually as important for someone building massive storage arrays. The two most important features are reliability and how many drive bays are supported. These drives have a 550 terabyte per year workload rating and a 2.5 million hour MTBF. Assuming they don't fail right away, they only have a five year warranty though, so don't focus too much on the MTBF. In fact, Seagate says they focus more on AFR or annual failure rate instead. I love reading Backblaze's hard drive reliability reports, and sure, Seagate isn't always on top, but most of the models are pretty reliable. But even if we had the worst drive ever, that's the whole point of a system like this. There are 60 drives in here. We should set them up with redundancy because with hard drives, it's not a question of if one fails, but when. That's why I'm going to need to work out a new plan for backing up a whole petabyte because even if all this expensive hardware works great, I still need a 3-2-1 backup if I want my data to be safe. Check out my video on that from last year. The second important thing is how many drive bays are supported. Not every hard drive can run inside a Storinator.
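For readers curious how the 2.5 million hour MTBF relates to the AFR figure Seagate prefers, the two can be connected with the standard exponential-failure approximation. A minimal sketch in Python; the formula is a common rule of thumb, not Seagate's published methodology, and the 60-drive count simply matches this build:

```python
import math

# Rough conversion from MTBF to an annualized failure rate (AFR), using the
# standard exponential-failure approximation. Illustration only; Seagate's
# published AFR comes from their own test methodology.
MTBF_HOURS = 2_500_000        # rated MTBF for the Exos drives mentioned above
HOURS_PER_YEAR = 8766         # 365.25 days * 24 hours
DRIVES = 60                   # drives in the Storinator XL60 build

afr = 1 - math.exp(-HOURS_PER_YEAR / MTBF_HOURS)
print(f"Per-drive AFR: {afr:.2%}")                                  # about 0.35%
print(f"Expected failures per year across {DRIVES} drives: {DRIVES * afr:.2f}")
```

Even at that low per-drive rate, a 60-drive array will eventually lose a disk, which is why the video stresses redundancy and a 3-2-1 backup plan.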
If you look at the specs on a desktop drive, they don't even say how many drive bays are supported because they're just not built to handle the vibrations you get in a server like this. Even IronWolf Pro drives are only rated for up to 24 bays per server, so they wouldn't be a good choice. The Exos drives? You can put as many as you want. There's no limit, except for maybe the size of your rack. And as an added bonus, these enterprise drives are like party balloons. They're filled with helium instead of air. They're still pretty heavy though, but the helium reduces friction and heat with so many drive platters spinning around inside. So when you build your first data hoarder setup, make sure you choose the right hard drives. When you get to petabyte scale, it actually matters a lot. But enough about hard drives. They're all installed. We tidied up the back of the system a little bit at least, and it's time to see if this thing boots. I'm going to boot Raspberry Pi OS on here, and I actually already applied the patch you see on the screen right now to the open source mpt3sas Linux driver that makes these HBAs work on the Pi. I have instructions for how to do that on GitHub in a link below. So let's boot it up. And you might have noticed I had to swap out the fancy power button from 45 Drives for this little power switch. Since the Compute Module 4 I/O board doesn't have an ATX power input, I have to use a separate switch to turn on the power supply. But that's wired up, the drives are in, and the Pi is ready, I hope. Let's see what happens. Ha ha, that's pretty loud. But I guess you kind of need this much airflow when the hard drives by themselves can eat up to 600 watts of power. Huh, I guess the redundant power supply works. And it's a good thing both of these tiny power supplies can do, let's see, 1,200 watts. I was also worried about the startup surge when you boot up all these hard drives at the same time. I know the fancy RAID cards I'm using are supposed to stagger the hard drive spin-up, but I wasn't sure if it would work or not with this setup. But it looks like it did because I can hear the different drives spinning up at different times. All right, we'll give it a few minutes to boot and hop over to the terminal to see what happened. First, I made sure I could see all the HBAs using lspci. Then I also checked to make sure it was using the mpt3sas driver, which is the patched driver I had to compile into Pi OS's kernel. I also checked the system log and saw it had initialized a bunch of disks, but when I checked how many drives the system could see with lsblk, it was only showing 45 drives. I mean, that'd be great for a sponsor segue, but I really wanted to see all 60. So I shut down the server and noticed all the HBAs still had blinking green lights, so they were all at least getting power. That one wasn't all the way in. Oh, look at this, the card's not all the way in. Yeah, it came out as I plugged all the devices in. Let's try this again. This time I watched as all the disks were getting initialized and it was fun listening closely to hear the different sets of hard drives spinning up. You can hear all the drives slowly spinning up. Oh, look at all these. This time when I ran lsblk, it showed me all 60 hard drives. I also checked the model for all the drives and it's showing the right model number.
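The boot-time checks described here (lspci for the HBAs, confirming the mpt3sas driver, lsblk for the disks) can also be scripted. Below is a minimal Python sketch of that kind of check; it assumes a patched Raspberry Pi OS where the HBAs bind to the in-kernel mpt3sas driver and the Pi boots from an SD card or NVMe, so every /dev/sd* disk belongs to the array. The script and its function names are illustrative, not taken from the video:

```python
import json
import subprocess
from pathlib import Path

# Rough equivalent of the manual checks in the video: confirm the HBAs are
# bound to the mpt3sas driver, then count how many disks the kernel can see.
# Assumes the Pi boots from SD card or NVMe, so every /dev/sd* whole disk is
# one of the 60 hard drives behind the HBAs.

def pci_devices_bound_to(driver: str) -> list[str]:
    """PCI addresses currently bound to the given kernel driver (via sysfs)."""
    driver_dir = Path("/sys/bus/pci/drivers") / driver
    if not driver_dir.exists():
        return []
    return sorted(p.name for p in driver_dir.iterdir() if p.name.startswith("0000:"))

def physical_disks() -> list[dict]:
    """Whole disks (no partitions) as reported by lsblk, with model and size."""
    out = subprocess.run(
        ["lsblk", "--json", "-d", "-o", "NAME,MODEL,SIZE"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["blockdevices"]

if __name__ == "__main__":
    hbas = pci_devices_bound_to("mpt3sas")
    disks = [d for d in physical_disks() if d["name"].startswith("sd")]
    print(f"HBAs bound to mpt3sas: {len(hbas)} -> {hbas}")
    print(f"Disks visible: {len(disks)} (expecting 60)")
    print("Models:", sorted({d.get("model") or "?" for d in disks}))
```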
So the next step was to put the drives in an array and see how they perform. Because I'm not a masochist, I pulled out my Mac, plugged the Pi into my network, and started managing it over SSH. I grabbed the IP, tried logging in, then realized I never put my MacBook's SSH key on the Pi, so I headed back to the office and did the rest of my testing from there. First, I tried RAID 0, which spreads data out over all 60 drives. This is a terrible idea for redundancy because if any one drive fails, all your data goes poof. But it's also the simplest and fastest RAID level and doesn't require any special data calculations to run on the Pi's CPU, which just doesn't have the horsepower for fancy things like parity on a 60-drive array. But I kept getting failures during the last part of the formatting process. It kept dying with an input/output error while writing out and closing the file system. Checking the system log, I saw a bunch of buffer I/O errors, and I also noticed some fault state messages with an error code like 0x2623 or 0x5854. I also rebooted a few times while debugging, and sometimes not all the disks would show up. Whenever that happened, the last message would be something like a start watchdog error. So sometimes during boot, the driver would just die for no particular reason. But about half the time at least, all the drives would show up. I also noticed mdadm would report the array as broken after formatting failed. And at that point, one or two hard drives would just be gone until I rebooted. Since I couldn't format the RAID 0 array, I couldn't mount it either. So that was a dead end. And at this point, I was wondering if maybe the vibration when all the drives started writing could be an issue. I mean, it shouldn't be. But exploring that idea, I actually have a seismograph, my Raspberry Shake, running in my basement. And sure enough, when I turn on the Storinator, I can see the fan noise up in this band. Then whenever a group of drives spins up, it registers pretty clearly. And I remember around a decade ago, there was this masterpiece of a video. AHHHHHH!! Don't shout at your JBODs, they don't like it! But no, I mean, these Exos drives are built to handle the vibration in a chassis like the Storinator. And I'm not actually yelling at them, so I don't think that's it. Vibration is definitely something to think about, but I don't think that's my main issue. So at this point, I switched tracks and installed ZFS. I've actually tested it on the Pi before, and it usually runs faster than standard RAID on the Pi. But then I realized you have to have the kernel sources to install ZFS, and I didn't because of the custom kernel I built. And getting that set up would take a bit of time, so I moved on to plan C, which was BTRFS. BTRFS is already enabled on Pi OS, so I just installed the extra package to manage volumes and used mkfs to create a BTRFS RAID 0 file system. And it worked! I got a 1.07 pebibyte, or 1.2 petabyte, RAID 0 storage volume. I mounted it and ran my benchmark script, but it was a little disappointing. Earlier, with one drive on each HBA, I could get around 416 megabytes per second. Now, with all 60 drives in BTRFS RAID 0, I'm only getting 213 megabytes per second, almost half the performance. And random writes were less than 20 megabytes per second. But the end goal is to see how well this thing performs on the network. So, I installed Samba, connected from my Mac, and copied over 70 gigabytes of video files.
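For reference, the BTRFS step described here boils down to a single mkfs invocation across all 60 block devices. The sketch below (in Python, shelling out to the standard tools) shows roughly what that looks like; the device-discovery glob, the mount point, and the assumption that every /dev/sd* node is one of the Exos drives are ours, not commands shown in the video:

```python
import glob
import subprocess
from pathlib import Path

# Sketch of the BTRFS RAID 0 step: gather the drives and build one striped
# filesystem across all of them. Device discovery and the mount point are
# illustrative; this assumes the Pi boots from SD/NVMe, so every /dev/sd?
# or /dev/sd?? node is one of the 60 Exos drives.
drives = sorted(glob.glob("/dev/sd?") + glob.glob("/dev/sd??"))
assert len(drives) == 60, f"expected 60 drives, found {len(drives)}"

# RAID 0 for both data and metadata: maximum capacity and speed, zero redundancy.
subprocess.run(["mkfs.btrfs", "-f", "-d", "raid0", "-m", "raid0", *drives], check=True)

# Mounting any one member device brings up the whole multi-device filesystem.
Path("/mnt/petabyte").mkdir(parents=True, exist_ok=True)
subprocess.run(["mount", drives[0], "/mnt/petabyte"], check=True)
```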
At first, it was getting over 100 megabytes per second, but after a few seconds, it dropped to 30, then nothing, then 30 again. I used atop to monitor the Pi, and there were a few spikes in CPU usage, but after copying about 3.5 gigabytes, the copy just stopped completely and failed. BTRFS said some device is missing. That's oddly similar to the problem I had with MD RAID 0 earlier. And the system log showed a similar error, too. That fault state with the 5854 code. At this point, I shot an email off to Broadcom about it, but in the meantime, I also remembered BTRFS has a single mode, which puts the drives together, but not in RAID. Instead of splitting up writes across all the drives, it would store complete files on different drives. So, I set up a single-mode BTRFS filesystem, did the network copy again, and then the same thing happened. Error messages in the log, some device is missing in BTRFS, and looking at the array, it looks like one of the drives, device 23, just poofed out of existence. So, I went for a Hail Mary. I went back to the simpler Linux RAID setup using mdadm, this time using linear mode, which is similar to BTRFS's single layout, with the drives laid out sequentially instead of all together in a stripe. And it was interesting seeing each drive get formatted one at a time, like here when it was formatting sdaj. And at the part where RAID 0 failed, it wrote to each drive in sequence instead of all at once, and that seems to have made it work. I mounted the drive, and of course, since this array is so big, and the Pi is so slow, it took six minutes just to mount. But it worked, so I ran my benchmark on it. The benchmark was better, getting 250 megabytes per second, but still nowhere near the performance I got with just four hard drives. In this case, that's because of the way a linear array works: it's only really benchmarking one drive at a time. So, I did another network copy, expecting it to fail, but no, it actually worked this time. There were a few times when the copy slowed down to 30 megabytes per second, but overall, it averaged 72. But that means this storage setup can't even max out a one gigabit network. But hey, at least the copy did finish. 74.31 gigabytes in 17 minutes and 8 seconds. Extrapolating that out to 1.2 petabytes, it would take 192 days. And that's kind of the limit, at least with all 60 drives. At this point, I talked to the Broadcom engineers. Now, I should mention that none of this setup is a supported configuration. We're using a Pi in ways it was never meant to be used, and we're putting enterprise storage cards in a situation they were never made for. But the engineers did have a couple of ideas. First, the firmware on the cards was version 5, but the latest version is 22. So, a firmware upgrade could help a little, but when I tried, it failed. It looks like that has to be done on a machine with an x86 processor, and the Pi's little ARM CPU couldn't cope. Second, they looked up the errors I was getting, and that 0x2623 error apparently means the driver detected data corruption during a storage operation. And seeing as I have a bit of a hacked-together solution for both of those things, I wouldn't doubt it. So, my best guess is that when the PCI Express bus is sending tons of data to and from all 60 hard drives through all four cards at the same time, something is flaky. It might be the power delivery to one of the cards, or one of the connections on the PCI Express switch, or maybe even that USB 3 adapter cable that goes from the Pi to the switch. I don't know.
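The 192-day figure follows directly from the measured copy; here is the arithmetic, assuming decimal gigabytes and petabytes as used in the video:

```python
# Extrapolating the measured Samba copy to a full 1.2 PB, as in the video.
copied_bytes = 74.31e9                 # 74.31 GB transferred
elapsed_s = 17 * 60 + 8                # 17 minutes 8 seconds
rate = copied_bytes / elapsed_s        # average bytes per second

total_bytes = 1.2e15                   # 1.2 petabytes
days = total_bytes / rate / 86_400
print(f"Average rate: {rate / 1e6:.1f} MB/s")    # ~72.3 MB/s
print(f"Time to fill 1.2 PB: {days:.0f} days")   # ~192 days
```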
So, yeah, I mean, we got 1.2 petabytes on the Pi. It worked, at least with no redundancy and no speedup, but would I recommend this setup? No, not at all. I took the Pi to the bleeding edge, and it started bleeding out. There's a reason 45 Drives uses a Xeon CPU and a server motherboard for their servers. They can handle 60 drives easily, and even do it with 10 gigabit networking, or even more than that. The Pi couldn't even saturate a one gigabit connection. But heck, if you have an old storage server sitting around and a ton of hard drives, you could run it on a Pi, but don't expect the experience to be pain-free. Make sure you subscribe. I'm going to follow my own advice and swap back to the 45 Drives hardware. I'm also going to review this thing compared to some other enterprise options, like the 100-drive server Patrick reviewed on ServeTheHome, and see if I can mount this massive 300 pound server in my new rack. Oh, and if you want to see me build the new rack, head over to Geerling Engineering, where my dad and I just posted a new video about it. Until then, I'm Jeff Geerling.