Is benchmarking environments a simple matter? Not for everyone...

So, welcome to the 2013 Lightning Talks at Surge. I'm very excited for the group that we have. We've got, I think it's eight people. Eight. If anybody wants to volunteer and if we have some time and you want to do one, they're really good to do if you just want to do it off the cuff. The rules are they're five minutes long at most. Please try to keep it to that time, contestants, though you're not contesting anything. There is something called the Cantrill exception. For those who were here last time, you remember why we had to do this. And if you weren't here, it's just very simple. I get to let it go further than five minutes. The Cantrill exception has two conditions. Your last name must be Cantrill. And then there's other conditions that remain secret. If you have a problem with this, I can tell you I don't make the rules. I just think them up and write them down. Forgive us for any switching. There's three different, three or four different technologies being used for the slides. Not everybody uses Keynote for some reason. So without any further ado, we're going to start with Hugh Bryan. And you can come back to the display. Thank you.
What's normal? That's not normal. Oh, I should come to the display. Yes. It's a hardware problem. I'm a software guy. Hugh, turn my mic on. There we go. Great. Thank you. Thank you so much. Okay. It's lightning. I'm going to go really fast, but I'll keep it tight. My name is Hugh Bryan. I'm a pre-sales engineer for AppDynamics. We're the number one largest APM enterprise solution in the marketplace, which is pretty cool for us. I want to talk to you today about what's normal. There's that line from Marc Andreessen: software is eating the world. One of the things I've had to do over my career (I was a Java software engineer, so I've been sitting where a lot of you are sitting) is understand what's normal in the context of an application.
Why do we want to know what's normal? So we can know what's abnormal. It reminds me of that line from Young Frankenstein, if you're old enough to remember that. Okay. If you can't measure it, you can't what? Manage it. When I walk around and visit people in their respective places of business, they don't even know what's normal. So what we try to do is tell them just what's your average response time over the last week, or day, or month, or year, or whatever it is. So you can start measuring it and managing it. Know what your key performance indicators are. You know, if you don't know anything yet, just get, how about an average response time, a call count, and an error count. We'll start with that. Okay. What are some of the problems that we run into? I bring these up because I've run into these. I've walked into a shop, I walked into Gannett, I said, give me your top ten problems that you have on a regular basis, and he quoted them right back to me and I felt validated immediately. Garbage collections in Java, garbage collections in general do what? Stop everything. If you're a Java guy and you don't know that, then you need to know that. It's super important. Garbage collection in general is a full stop of your application. I was at a JBoss conference and half the room didn't know that and I almost passed out. Houston, we have an I/O problem. This is a big one. If you're running an application, the first thing I do when I walk in, if it's slow, I go, hey, did you leave debugging on in your logging? Okay. As soon as you put that app under load, you're dead. Right? Any kind of I/O bottleneck with network, disk, et cetera, will kill you. Okay. The other thing in the top three: configuration. Every time. Okay? The thing about this is, if you didn't read the directions, you probably set it up wrong.
On the .NET side of the house, for those folks, if you're doing any .NET, I'm not sure that anyone here is, but maybe you are, your out-of-the-box configuration can be a disaster. So just keep that in mind. Along with that comes request overload. So the number of requests coming through, right? If you have a thread pool set up, and it's at 50 concurrent users, and you have 50 concurrent users, and then you have 51, 52, and 53, what's the thing that happens? Anybody? What condition do we get into? A block-wait condition. Right? So we're waiting. That might be okay, but we need to measure so we can know that. If you're a Hibernate guy, you already know the answer to this, or if you're a Java guy. Hibernate is great for certain kinds of transactions. Don't build reports with it, because it's horrible. Okay? Web services. These are my favorites, because the reason I have my job, one, is because of web services and SOAP. They're really horrible. They have a lot of overhead, and they keep me in business. They really do. I'm not kidding. I swear to God. They're horrible. And not for nothing, the .NET guys love breaking their applications from a two-tier to a three-tier, and they'll go, web front end, web service tier, talking to SQL Server, and all they've done is throw that WebMethod tag on their code, and they've got instant web services, and then they call me because their performance is horrible. So that's great. Okay. Most important one. I'm almost out of time. I have to mention this. Synchronous versus asynchronous. This is really important, because our applications are getting more and more complex. Asynchronous operations are everywhere. It's built into the languages that we're using now. Clojure, Scala, and the list goes on. And so being able to understand the difference between a synchronous and an asynchronous transaction is really important. And the last one. I'll get in just under time. Databases.
Normally, the databases are pretty tuned up pretty well, at least in the past they have been, because everyone ran Oracle, and they over-hardwared it, and the queries were really fast. Typically, it's the guy writing the code, sorry to say. You know, query within a query within a query. Still see that all the time. And no indices, or something along those lines. I'd just like to get up there and throw those out to people. I hope they give you something to chew on and think about. I'm out of time. Again, my name's Hugh Bryan. I'm a pre-sales engineer for AppDynamics, and I really appreciate your time. Someone chuckling there at my web services bit, and listening to what I had to say. Thank you very much. All right. Thank you. So next, you might recognize this man. You may not, because he's clean-shaven. Could we go to the loop? Can I change mine? No, you can go. Pull up the ponies. It's good. But I thought after an is-it-normal presentation, I could do are-you-regular. Having traveled so much to so many different countries and actually had full-blown dysentery twice. No, we're not gonna do that one. That's unfortunate. Yeah. Is it? Are we up? No. No. We had this problem last time. So this presentation will not be without irony. I am going to teach everyone how to tie their shoes and have no shoelaces. So I was not prepared today for that. Is that good? Yeah. So I'm a big Puma fan, by the way. I have narrow feet. I even have a Puma hat. Makes me look more like the lead singer of ACDC. Yeah! All right. Great. Okay. So I'm a dad. I have three kids that can't tie their shoes. One of them is almost able to tie their shoes. It seems kind of weird, right? Most kids should be able to learn to tie their shoes around four. It turns out most people learn to tie their shoes wrong and they learn correctly at an average age of 28 right now. So I'm gonna show you how to do that. So this is the way people learn to tie their shoes, right? 
You have loop, swoop, and then you have through the hoop, right? Everybody, who ties their shoes this way? Great. Who doesn't tie their shoes this way? And who doesn't do bunny ears? You don't do bunny ears. Okay. Good. All right. So a lot of people will do this, and then they'll double knot it afterwards. I don't know how many times. Has anybody ever picked a double knot out of a child's shoes? It's awesome. It's like they put chewing gum in the knot too. It's amazing. So this is a sad knot and a sad pony. This wasn't built for BronyCon, but I think that it might fit well. So this is what happens when a group of ponies watch two ponies one shoe. I've heard nasty things happen in that, but they watched how to tie real shoes. So we start with this. Loop, swoop, through the hoop. And then something magical happens. There's a magical unicorn pony thing that happens. We do one more loop through the hoop. Right? Do that. You don't have to double knot your shoes anymore, and they won't come untied. And then when you want to untie them, you pull them out, and everything's happy. All right? So really, this is a double slipped reef knot or a surgeon shoelace knot. And if you actually, I mean, everybody is going to learn how to scale systems here and stuff like that. That's going to be great. Yay. There's probably a lot of people here who can show you how to do that, but I just showed you something awesome that's life-changing. All right? You know, when the whole revolution happens and the little organic things turn all the computing systems off, your shoes aren't going to fall off. All right? And if you find a kindergarten or a first grade teacher and you teach them this, they will be indebted to you forever. Do you know how much time they spend tying fucking shoes on the playground? It's unreal. So you will bring them to tears, and you will have the best friends everywhere. So that's how you tie your shoes. Thank you, Theo. Yes. 
Next, Mike Adler with one weird trick to save your entire business. And you're using a computer, right? And the same thing. That's it.
Hello. Hi, I'm Mike Adler. For the last year, I've been a systems engineer at the Huffington Post where we have a pretty simple business model. Somebody comes to our site, we show them some news and some ads, and then somebody actually pays us. It's actually that simple. Just scale it up to 1.3 billion page views per month. So one of the great things about scaling up that big is that when you fail, it's pretty spectacular. I won't talk about that. I'm just going to talk today about how we try to avoid that. Here's our site. Shouldn't this be, like, astonishingly simple? If we have a failure, shouldn't we be able to just sort of shed all the interactive features and just do a failover to a read-only site and, if nothing else, just give people some kittens? Maybe we could do that. How do you build it? How do you build degradability into an eight-year-old code base? Will your users even recognize it or like it? Will it be even as reliable as the primary site? We had to build it and find out. So as a side project, I started hacking with Varnish because I knew it could capture HTTP requests and re-serve them sort of indefinitely. But it wouldn't be a normal Varnish server. It would be where the proxy is caching the requests as they're being served. Now, these Varnish servers need to live far away, in distant data centers, totally decoupled from the flow of production traffic. If I couldn't have production traffic, I'd have the next best thing. PhantomJS is a headless web browser that can simulate real users and real traffic. It not only requests the page, but it also executes JavaScript and all the asynchronous requests, filling the cache with all the essential URLs. So with these components, I had a working proof of concept, and all I needed was a name.
I call it Space Blanket because it's a warm cache designed to be used in an emergency. As a side benefit, my boss would be forced to say the words Space Blanket to senior management. So here, users hit the CDN. The CDN goes to our primary data centers. If they fail the health checks, then the CDN forwards to our Space Blanket installation. And here's how we warm the cache. The PhantomJS bots hit the Varnish server, which forwards to the CDN and the primary data center. I put the Varnish VCL up on GitHub, but in brief, we anonymize the requests by stripping the cookies, and we normalize URLs to improve hit rates by stripping some query parameters, just general abuse of Varnish. But will it work in production? Now, our Varnish caches could only really hold a small portion of our total site content, but maybe that would be enough for most of our requests. So ultimately, we would have to test in production with real traffic and our true CDN cache to know exactly what would happen. Late one night, we flipped the switch, and Space Blanket engaged automatically across all of our domains. Here we can see our page view metrics per minute, and they're basically steady. It's working. Somehow, we managed to serve our most essential services using just 1% of our typical infrastructure. I think I found the one weird trick about web ops. So is this useful outside of our narrow use case? I don't know. But about a month ago, the New York Times went down for two hours in the middle of a big news day. I don't know anything about their circumstances, but I had to think maybe it could have been useful. There's a chance. So why do we like Space Blanket a lot at the Huffington Post? Well, first off, it's a failsafe that engages automatically in 60 seconds. It is a single service tier with no dependencies, so we can expect it's going to work at that critical moment.
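The two cache-key tricks mentioned in this talk, stripping cookies and stripping some query parameters, can be sketched in a few lines. This is an illustrative Python sketch, not the VCL that was published on GitHub; the parameter names in `IGNORED_PARAMS` and both function names are assumptions made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters assumed not to change the rendered page.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}


def normalize_url(url):
    """Drop ignored query parameters so equivalent URLs share one cache entry."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in IGNORED_PARAMS
    ]
    # The fragment is dropped as well; it is never sent to the server anyway.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def anonymize_headers(headers):
    """Return a copy of the request headers without any Cookie header."""
    return {name: value for name, value in headers.items() if name.lower() != "cookie"}


if __name__ == "__main__":
    print(normalize_url("http://example.com/news?utm_source=mail&id=3"))
    print(anonymize_headers({"Cookie": "sid=1", "Accept": "text/html"}))
```

The fewer distinct keys a cache has to hold, the more requests a small cache can answer, which is presumably why a cache holding only a fraction of the site could still cover most traffic.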
It protects against not just sort of component failures, but sort of catastrophic worst case scenarios. Also smaller ones. And also, we don't have to worry about split brain, because we just keep one brain, and then if we don't need it, we stop using it. Let's see. I didn't get to talk about a few other aspects of this project, so if you have any questions, feel free to come up to me and ask me. Thanks.
Thank you very much. So next, we've got Brendan Gregg with Benchmarking Gone Wrong. We're going to have to switch computers.
Thanks. My name is Brendan, and this is a quick talk on Benchmarking Gone Wrong. This is a case I worked on earlier this year, where a professional company that does benchmarking sent us a report, and it looked like this. This is disk throughput, where higher is better. It's Joyent versus Amazon versus Rackspace, and our results looked pretty bad. We actually had a new manager join Joyent, and he got the report as well and was very worried, and thought maybe we're not the high-performing cloud company anymore, because an independent third-party professional benchmarking firm has sent us this. And so imagine if this was sent to your company, and an independent source has tested your product. So I wasn't that worried, because I get sent benchmark reports all the time, and most of them are actually wrong. In fact, almost 100% of them are wrong, and they're wrong for various reasons. People are testing the wrong thing, there are misconfigurations, there's confusion about what the results mean. So what I did was I started working with the company to find out what had gone wrong, and they weren't testing similar instance types, the benchmarks weren't testing real-world things properly enough. We worked through various issues, and we improved performance quite a lot; however, the results were still quite bad. One of the things they're doing is they're testing disk I/O and not file system I/O, which doesn't make sense because it's not 1990 anymore.
A lot of applications talk to the file system, caching works. Disk I/O benchmarking is interesting, but there were various things like this. And we had this back and forth that went for about a month, and each time we were improving the performance better and better, but the results still looked bad. And at this point, the manager was getting quite worried. He's like, we've worked on this for a month with this firm, and performance still sucks for Joyent. Maybe we should accept it. So I was confused because it doesn't match the lay of the land how I know it, and so I'm doing active benchmarking, and that's where you run the benchmark, and then you root cause it to find out why that number is that number, and it's not twice that number. Every time I run the benchmark, I can't reproduce. The results are always very good, and so I'm wondering if there's some variance. Maybe they're really unlucky when they run the benchmark. Arda said earlier about crontabs firing at the same time because NTP is in sync, and I'm wondering maybe they're running their benchmark from crontab or something like that. Maybe they have a short duration, and it's just interfering with something else. They had actually been running the benchmark for a month every day now, and so they were able to send us a report showing variance, which would be really interesting, and that shows how things are changing over time. These should be horizontal lines. If there's any wiggle, it means there's some variance, and it looked like this. And I'm like, what the hell is this? And we've been working with them for a month, and there was no explanation. It's like, here it is. Here's variance. Hey, if you deploy on Amazon, and it doesn't matter who's who because the graph is so ridiculous, but if you deploy on Amazon on the 20th, performance is a thousand times better than if you deploy on the 8th or the 9th. This is just the craziest line graph I've ever seen. And I'm like, Rackspace is all over the place.
So at this point, I've tested everything. I've looked at the server. I've looked at the client. I've looked at the benchmark. I know exactly what they're running. The one thing I haven't looked at is the little piece of shell code that they use to execute the benchmark and generate the report. And I kind of don't want to insinuate that this professional firm has screwed it up, but it's like, please, could I see your shell script that runs the benchmark and generates the report? Now, they were good sports about it because they did want to do this properly, and they sent me the shell script, and I'll show you what it did. So they're running FIO, the flexible I/O tester, fine. I will show you the command they ran, which is not the same command that most of us would run. I'd do this entirely from the shell or entirely from awk. They were doing this sort of thing. Okay, let's get the results we're interested in. Okay, there's that column. We don't accept characters in our reporting, so we need to get rid of those KB/s. So let's just chop them off. When I saw this, I thought, you have to be kidding me. You can't just do that. That K is important. It's part of the number. You can't just throw it away. And the numbers at the top, that's bytes per second. The numbers at the bottom are kilobytes per second, and they just threw it away. And that explained a 1000x variance that I couldn't explain. So I tell the manager who's freaking out, and actually freaking out executives that maybe we weren't the high performing company. I'm like, I found it. Here it is. Understandable, because I don't think they ever saw the other type of result. Understandable shell bug. Next report they send is going to be fantastic. They send us another report, and it still looks crazy. And I'm going out of my head, but I've given you a 1000x bug. What they'd done was this. Their software also didn't accept decimal points. So how do you do this? Yep, that's right.
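The effect of "cleaning" a throughput field by deleting every character the reporting software rejects can be sketched as follows. This is an illustrative Python sketch, not the firm's shell script. The sample strings, the function names, and the `UNIT_FACTORS` table are assumptions, and so is the factor of 1024 per step; use 1000 if the tool reports decimal units.

```python
import re

# Assumed multipliers for the unit suffixes that can appear in the output.
UNIT_FACTORS = {"B/s": 1, "KB/s": 1024, "MB/s": 1024 ** 2}


def naive_clean(field):
    """Buggy approach: delete everything that is not a digit, units and '.' included."""
    return int(re.sub(r"[^0-9]", "", field))


def parse_throughput(field):
    """Parse a value with its unit suffix and return bytes per second."""
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)\s*([KM]?B/s)", field.strip())
    if match is None:
        raise ValueError(f"unrecognized throughput field: {field!r}")
    value, unit = match.groups()
    return float(value) * UNIT_FACTORS[unit]


if __name__ == "__main__":
    for sample in ["900B/s", "512KB/s", "1.6MB/s"]:
        print(sample, naive_clean(sample), parse_throughput(sample))
```

With the naive version, three samples that span more than three orders of magnitude come out as 900, 512 and 16, so the ordering of the results is scrambled as well as their scale. Keeping the unit attached until the value has been converted to a single base unit avoids both problems.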
And that's what they did. Now, these few characters, as I've said, just punched me in the gut, and I'll never, ever forget this until I die. You can't just... I like the decimal point. That's important too. This turns 1.6 into 16. And so that's how we ended up with this crazy, crazy graph. Now, there were lots of mistakes made. It's normal with benchmarking: very, very error prone, very, very failure prone. This one took the cake, because it also included a little bit of shell script they used to process the report. Thank you.
All right, up next, we've got Chris Niren and, this is like something out of Monty Python. You said it. Okay. On the symbiotic relationship between developers, coders, and administ everything-us.
I like the music. Okay, so I cannot possibly hope to follow Brendan, so I'm going to go in a completely different direction. So, let us begin. So, what we have here is what is known as a clownfish and a sea anemone, and these have a rather symbiotic relationship with one another in that the clownfish eats little particles off of the sea anemone, and the sea anemone absorbs the clownfish's waste, and in addition to this, the clownfish is immune to the stingers on the sea anemone, which is a nice little bit of symbiosis in nature, and as shown on the previous slide, it is rather reminiscent of the relationship between developers and administrators. So, how are these two groups really related, beyond not being able to use a mic correctly? Really, it comes down to a couple things.
First of all, really, it is code that unites what we do in a technical capacity, and what I mean by that is developers write code, and most admins write code as well, and as things scale upward and outward and in every which direction, you need to be able to write code to keep up with things. As a developer, I used to write code. Obviously, I would hope so, otherwise I would lose my job, anyway, and as an administrator, I've been writing Ruby and Perl and am about to write a lot more Ruby to do the various things that I do for getting my value to the customer, and that is the second place where we really come together and have the same ultimate goal. It's about delivering value to the customer, and I have given a couple talks recently coming from both sides, meaning how to make your admins' lives easier as a developer, and the things that admins can do to be sharper and more valuable, and of course, this is specializing on how to code, so there are a couple other things that are rather important, I think. Thank you very much.
All right, thanks, Chris. Next, we have Riley Burton, and I'm going to let him announce his title.
Hey, hello? Hey. Yep. I'm a software engineer at AppNexus. My name is Riley. I am a software engineer at AppNexus. We are also hiring. I am here today to talk about selfies or being photographed in a picture. Everybody here is probably doing it wrong. Many of us are terrible at appearing in photographs. I am no exception. I am probably the least photogenic person maybe on the planet, and there will be several examples of this going forward. Sometimes it's because we're shy. Sometimes it's because we hate cameras, whatever. It doesn't really matter. We're bad at it as a community, and I'm here to correct that. Now, the example here is not the pretty blonde. It is me, and that is actually a self-taken photograph with my cell phone, and it's embarrassing. I'm here to give you tips for taking a great selfie, okay? Here we go.
Number one, put your ass back. Now, this is also known as the teapot stance. The problem with standing straight on to the camera, like this, is that it accentuates all of the terrible things about your person, so in my case, I have a long torso and short legs. You can't see the short legs in this, but standing straight on accentuates how terrible you can look in a straight-on picture because it flattens out the light, honestly. There's less shadows to play tricks on the eyes of the observer, right? If you put your ass back, you instantly become more photogenic, so do this going forward, and tip, if you're ever in a group of people who are all being photographed together, quietly whisper, put your ass back. They will all, from then on, thank you. Number two, control the yaw of your head. Yaw is left to right, like you're shaking your head. Control the yaw of your head. Same principle. If you take a photograph straight on, everything is flattened out, and you look like a pancake face. Instead, pick your best side, if it's the left eye, if it's the right eye, and point that at the camera a little bit, like this, and you can see this is awful. This looks like a mug shot, right? This is me straight on in my hotel room. Don't do this. Instead, do this. And as you can see, instantly, it's a great selfie. Number three tip, control the pitch of your head. Control the pitch of your head. Now, yaw is left to right. Pitch is up and down, like nodding. Now, what you don't want to do is ever give your chin to the camera. If you do this, there's a couple of problems. Number one, it makes your head look fat. Number two, if you happen to have any bats in the cave, they become exposed to the camera. Nobody wants to see that. Don't do it. Instead of this, combining pitch, yaw, you do this. And now, you're beautiful and playful. Number four point is sort of controversial. Do you smile? I tend to not smile in my selfies or pictures because I can't put on that fake smile. 
It just looks ridiculous. If you have a nice smile, by all means, please smile in your photographs. If you don't, I don't recommend deadpan because that's sort of scary. I would go with a smirk. See, look, he's practicing. Now, combining all of this magic that I've taught you so far, we have put your ass back or the teapot stance. We have controlled the yaw of your head and control the pitch of your head and then smile if you want to. You can go from this dubious aesthetic value at best in this photograph to the undeniable beauty of this. Thank you, Riley. You can imagine what it's like to work with him. All right, up next, John Moore with the title, I think, is Operating at Scale. All right, you folks ready to hear about a large-scale system? All right, here we go. Now, the typical American family has two kids, but this is not good enough for my wife and I because we have four, and they range in age from eight years old down to an eight-month-old baby. So that's a whole binary order of magnitude greater than the typical family. So, you know, in computer science, you commonly hear the way of counting, which goes zero, one, many. Well, when you have kids, the way that you count is one, two, insanity. All right? And we have so many kids, we are subject to the law of large numbers, which means that highly improbable events become everyday occurrences. So, for example, right now, does Google have a failed hard drive? Yes. Is one of my kids sick? Yes. Will one of the kids wake up in the middle of the night? Yes. If you want to talk about operating under degraded performance, try doing this to a couple of parents on a regular basis. So, in other words, the system is never green, as we all know from operating large-scale systems. And, in fact, you know, because the kids are running around, it's actually a distributed system, right? And so one of the fallacies of distributed computing, right, is, you know, is the network reliable? No, right? 
There are lost messages all the time. Earlier this week, we had to schedule an interview with my daughter who's entering kindergarten with her kindergarten teacher so she can come in and be assessed. My wife and I carefully looked at our calendars, and we scheduled, and we said, yep, okay, 11 o'clock, does that work for you? Does that work for you? Okay, good. We're going to take that one. So, 11 o'clock rolls around. I show up at the school. My wife shows up at the school. Who's got the kid? No. We forgot to bring the kid. We all know that when you have complex systems, you get unexplained behavior. And so, when you're a parent of this many kids, you find that you say and do things that, when you take them out of context, you can't really understand how you got there. And so, I'm actually not going to provide the context for this, but I have actually uttered the following. Don't use your sister as a weapon. That is a true story. All right. I'm subject to bit rot, okay? I cannot tell you which of my kids like tomatoes right now because it changes every day. I blame cosmic rays. My son actually goes back and forth on whether he likes bacon, if you can believe this. I know, right? Currently, he does like bacon, so we're going to keep him for a little while longer. But, you know, I'm thinking, hey, I'm an architect. There's this great book called Release It! by Michael Nygard, right? And so, he's got all these stability patterns in there. And so, there's something we can use here that's going to help us out, right? And so, there's this concept of a bulkhead, right? Which is that, oh, okay, I'm going to take the problematic part of the system. We're going to firewall it off from the rest of the system and then the rest of the system won't be disturbed. So, when my oldest son and oldest daughter are fighting, right? Mom, he did this to me. No, she did this first.
What we do is we make them go stand in the bathroom together until they can agree on what happened. And meanwhile, the rest of us have peace for a while. So, this works pretty well. Earlier, I mentioned that we're subject to lost messages, right? And so, everyone knows that the way that you deal with this is you have to have positive acknowledgments. In parenting terms, this is what did I just say to you, right? And you would be surprised at how many messages actually get lost. Now, when you're overwhelmed, you know that it's important to fail fast, right? And so, you have to get really good at issuing the appropriate error response, right? So, here's a handy chart for you. So, 403 forbidden. This is no. 503 service unavailable. This is not now. 404 not found. I don't want to hear it. 405 method not allowed. Use your fork. 406 not acceptable. Well, no, I've actually said that. That is not acceptable. 409 conflict is stop fighting. And of course, 410 gone refers to dad's sanity. But the number one stability pattern that you have to employ, as we all know, is timeouts. So, quick poll. Has anyone had to be in the triage room on sysadmin appreciation day? Yeah, it's kind of ironic, right? And so, you know, I'm going to show you a picture of this past Father's Day of me taking a nice nap on the couch with my children dogpiling on top of me. But as we all know, even when you're in the triage room, it's really great if you like hanging out with your team. And as we all know, that's the true way that you can operate at scale. Thank you.
Thanks, John. Last, but certainly not least, is a guy some of you may have heard of. Where is he? Oh, there he is. Bryan Cantrill. There's a title to this. Fundamentally, how a pleasure cruise became an odyssey. If you've been here before and heard him speak, you'll know that not only do you have no idea how this is going to happen and turn out, he doesn't either. So, thanks for that. It's like reality TV. Okay.
Hi, I'm Bryan Cantrill. I want to clarify something very important. I am not the manager that Brendan was referring to in his excellent lightning talk. I'm realizing as he's speaking, I'm kind of laughing and then I'm realizing, wait a minute, everyone is assuming that that is me that he's referring to. That is not. That was somebody in marketing. So, I just want to set the record straight there. So, I was out in the hall and Eric Sproul cornered me and said, hey, I'm really looking forward to your lightning talk this year. What antiquarian Unix command are you going to talk about this year? I'm like, you know, I will thank you to acknowledge that I am not so predictable. I'm not going to be talking about some antiquarian Unix command. I'm going to be talking about a very interesting, challenging problem, a war story, if you will, disaster porn that is, oh, goddammit, involving an antiquarian Unix command. So, we are actually going to go do some Unix spelunking. And it starts innocently enough. So, we've got a complicated distributed system. We, like a lot of people, want to monitor logs. We want to alert on those logs. You want to build that monitoring system to be orthogonal to the actual system that's emitting these errors. So, what do you do? You're going to look at this log file, right? You could grep through the log file occasionally. But you don't want to do that, because you'd run that out of cron or something and have a high latency. What you want to do is you want to use this great little bit called tail -f. Let's just tail -f the log, right? It seems so reasonable. And so, we build this great alerting system and it works really well and it's very prompt and alerts us whenever there's an issue. But then we notice that alerts are getting dropped every once in a while. And the system is in a dire state, a very dire state. We didn't get an alert on it.
And the engineer who built this alerting system, Trent Mick at Joyent, goes to root-cause this. And this is one of these things where it's like, wow, Trent, it sounds like you've got a pretty nasty bug in your alerting and monitoring system. He goes back and he's like, actually, it looks like it's an OS bug. I'm like, what? And as it turns out, tail -f doesn't deal with truncation so well. The log was rolled and tail -f is happily waiting at an offset that's kind of in the indefinite future. You're like, oh, shit. What are tail -f semantics? I've got the man page here. So let's see here. I'm sure they specify what happens on a truncation. No. They do not mention at all what happens on a truncation. So on a truncation, like I said, we just effectively completely ignore it and we will never see the file until it grows back to the length it had before the truncate. So, okay. It doesn't say anything there.
Now let's go back in time a little bit. Let's ask the question, when was tail -f originally invented? And my first guess on that would be like, I don't know. I mean, it seems like tail is kind of in the animal brain of Unix. Let's go back to the Torah here. So here we have the Unix Programmer's Manual. This is Seventh Edition Unix. I would say this is what God intended, but God made so many horrific mistakes. So you read this man page here and, wow, that's, all right, that's terse. End of man page. Okay, well, I guess tail -f is not here. I do think this is, by the way, kind of funny, that under BUGS, "tails relative to the end of the file are treasured up in a buffer and thus are limited in length." Are you kidding me? I mean, you wonder why people thought Unix was fruity. Treasured up in a buffer. Gene Kranz is uploading a software patch to the lunar module of Apollo motherfucking 14 and you're treasuring things up in a buffer.
Anyway, so I felt kind of embarrassed for us all when I read that. But let's put treasuring aside. All right, tail -f isn't here. Fine. Okay. So then I started digging a little bit, and many may guess when this came in. This came in with BSD. Very, very, very early BSD. So we are in 2.9BSD. So this is truly the animal brain now of BSD. And here's the man page for tail -f. Okay, now this is going to explain what the semantics should be in terms of truncation. Specifying -f causes tail to not quit at end of file, okay, but rather wait and try to read repeatedly, good, good, good, in hopes that the file will grow. And did you just say the H word? Are you serious? We're just going to hope that this thing keeps growing? Jesus Christ, you want me to treasure up some of your hopes?
All right, so fine, fine, fine. Let's just go to the source code. Let's see what the source code does. It took me a while to find the source code for this. When I first looked at the source code, I thought there was a different file, because you're looking at the variable that it sets, called follow, and it's like it doesn't actually do anything with follow. All it does is these like three lines at the end and you realize, wait a minute, that actually is tail -f. This is the entire implementation of tail -f. And, I believe, up here at the top of the file, we get some explanation. Option -f means loop endlessly trying to read more characters at the end of the file, on the assumption that the file is growing. Okay, now we are hoping and we're assuming. Okay, awesome. Meanwhile, back in modern day, we've got a problem, right? And so let's ask what the other guys do about this. This is one of those questions where I'm sad to confess that if every other operating system had punted on this, this was going to be Trent's problem, not mine.
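The "loop and re-read at the end of the file" approach described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the 2.9BSD C source: the helper names `read_more` and `demo_naive_follow` are made up for the example, and a real follower would sleep and call `read_more` in an endless loop. It reproduces the failure from the talk: once the log is rolled to something shorter, the reader's offset sits past the new end of file, and nothing is returned until the file grows past the old offset.

```python
import os
import tempfile


def read_more(f):
    """One iteration of the naive follow loop: return whatever lies
    beyond the reader's current offset (empty bytes if nothing does)."""
    return f.read()


def demo_naive_follow():
    """Follow a file, roll it, and show that the naive reader goes blind."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as w:
            w.write(b"foo\nbar\n")
        # Unbuffered binary reader, so tell() is the real file offset.
        with open(path, "rb", buffering=0) as f:
            first = read_more(f)        # reads 8 bytes, offset is now 8
            # Roll the log: truncate to zero, then write a shorter record.
            with open(path, "wb") as w:
                w.write(b"baz\n")
            second = read_more(f)       # offset 8 is past the new EOF
            offset = f.tell()
        return first, second, offset
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo_naive_follow())
```

The second read comes back empty even though the file now holds a fresh record, which is the dropped-alert symptom described in the talk.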
So let's go see, let's actually go look at what the, I've got another window over here somewhere. I remind you of the Cantrill exception, if that's a five-minute alarm. Very good. Okay, so I've got, let's see, which tail? I believe that this should be the GNU tail. Yes, good. So we get, of course, lots of crazy-ass options that don't exist anywhere else. Very good. Okay. So, hopefully everyone can see that, and we echo foo to, we'll just do /tmp/foo. Okay, now we are going to tail -f /tmp/foo. We'll go over here. We'll just echo bar, and we will do that to /tmp/foo. Okay. Bar. That seems wrong. Okay. Let's just do baz to /tmp/foo. "/tmp/foo: file truncated." No other output. Thank you, tail. Actually, I'm going to make that message slightly more accurate. Let's just do a strings on which tail and grep for trunc. Good, that's only in one place. cp. Let's just do a little software engineering on the fly here. Yeah. Hey, there we go. Thank you. All right. That's very good, I'll go submit a patch request. Patches welcome. So, we'll go submit that.
All right. So, obviously, the GNU tail does absolutely the wrong thing. And then I made the huge, huge, huge mistake of looking at what had been done on the BSDs, and damn the BSDs if they didn't get this exactly right. They used the kqueue mechanism to actually get truncate events. Like, oh, man, now I've got to go implement this. Okay. So, we have event ports in illumos. We are SmartOS. It's an illumos derivative. We have event ports, no problem. So, I'm going to invent this new event. It's going to be a file truncate event. This will be great. And we have the BSD tail. We use BSD tail. We use the event ports and we'll listen for both the file modify event and the file truncate event. If we see the file truncate event, then we will reset our seek offset and all will be great.
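The core of the fix described above is "notice the truncation, then reset the seek offset." The talk's implementation does this with event ports (and the BSDs with kqueue); the Python sketch below does not use either. As a stand-in, and purely as an assumption for illustration, it detects truncation by comparing the file's current size against the reader's offset before each read. The names `read_new_data` and `demo_truncation_aware_follow` are hypothetical.

```python
import os
import tempfile


def read_new_data(f):
    """Return data appended since the last call. If the file is now
    shorter than our offset, assume it was truncated and start over."""
    size = os.fstat(f.fileno()).st_size
    if size < f.tell():
        f.seek(0)                   # reset the seek offset
    return f.read()


def demo_truncation_aware_follow():
    """Follow a file across a log roll without losing the new record."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as w:
            w.write(b"foo\nbar\n")
        with open(path, "rb", buffering=0) as f:
            first = read_new_data(f)
            # Roll the log: truncate to zero, then write a shorter record.
            with open(path, "wb") as w:
                w.write(b"baz\n")
            second = read_new_data(f)
        return first, second
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo_truncation_aware_follow())
```

A size comparison like this only works when the check happens while the file is still shorter than the old offset. If the file is truncated and rewritten to at least its old length between two checks, the truncation goes unnoticed, which is one reason an explicit truncate event is attractive.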
So, I did this and I tested it and it all looks great and I push it, all looks good. And then several months pass. And I hear back from Trent. It's like, you know, we're seeing that problem again. It's like, well, Trent, it's clearly user error, because I fixed it. This problem is obviously fixed. He's like, no, no, no. Go look at this process over here. It's running your bits. It's running the new tail. And it was clearly truncated, because the offset is off in space. And then I made a huge mistake that I really try not to make. I concluded that there was no other data to be gathered from this process that was in this wrong state. And said, okay, I've missed some condition. I have missed a condition and there must be some way for a truncate event to happen and for me to miss it. And so, I thought about it and thought about it. I got it. I know what's going on. What's going on is that I've got one event for a modify and a truncate. And what's happening is a modify is coming in and I'm processing the modify, and then I've got no event associated on that file, and the truncate is coming in, and we'll miss the truncate. It's this really tight window. It's like a seven-instruction window, and I write this test case and I use DTrace and the chill action, blow up the window, show it. Oh, it's so, so nice. I know exactly how I'm going to fix it. Instead of having just a modify and a truncate in a single event, I will have a modify event and a separate truncate event. Got it. Sweet. Okay. Fix that. Test that. That looks great. Push that back. Done. Oh, boy, that was a complicated one, but I did it. And then deploy it into production. Again, several months pass. Trent says, you know, I'm seeing that problem again. It's like, Jesus, Trent, what's your problem? And then you look at this and sure enough, it's running my fix. Why is it not working?
And so now I did what I should have done the first time, which is, these tail -f's are running all over our cloud, and what I should have done on that first one that failed is ask the question, are there any others that are in this state? And I did that for the first time and I asked, how many tails are in this state? It's like, oh, like half of them. Like, oh, shit. Oh, shit. So that means we're missing alerts all over the place. But that also means that little three-instruction race or whatever that I found, that was not the problem. That may have been a problem. Indeed, it was a problem, but it was not the problem. And I usually pride myself on not screwing up like this, but I had. Damn. Back to the drawing board.
And then I realized something very, very, very stupid, which is that the code path that implements the truncate event, that emits the truncate event in the kernel, was all the O_CREAT, O_TRUNC cases, which makes sense. I was just hooking into the existing mechanism. I kind of forgot about this little thing called ftruncate. I'm embarrassed to admit. So there's ftruncate, as it turns out. Right. ftruncate. Right. What moron invented that? ftruncate allows you to simply truncate a file. And indeed, when we were rolling the logs, we weren't actually opening with O_CREAT and O_TRUNC, we were ftruncating the file. So my event was just never firing. It's like, okay, problem solved. I just hadn't fixed it at all, at all, at all. Okay. Fine. So I go to fix it. And I add the other events. Okay. This should all be working. And I get on the machine. And like, okay, that's fine. I got my test machine. And I'm going to test it. And I go to run tail on a file. I'm tailing /tmp/foo, as I was doing earlier. And I go over to the file. And I can kind of simulate this a little bit. And I did this echo baz to /tmp/foo.
And what I saw in my tail -f was not one baz, but baz baz. I'm like, are you fucking kidding me? This isn't even the problem I'm trying to solve right now. You can't be a problem right now. And then you're just like, okay, I'm hallucinating. I'm just going to unsee that. And so I echo boo to the file. And it's like, just one boo. I'm like, there. We're just going to unsee this. This didn't happen. And echo bar to the file. Bar, bar. Like, no, no, no, no. Why are you fucking with me? Okay. So something is clearly very, very, very wrong. And a lot of DTracing, kind of panic DTracing, later, I realized what the problem was. The problem was actually another problem that I had unearthed in the way we do event ports in terms of file monitoring. I had a truncate event and a modify event, and I was associating the truncate event and then I was associating the modify event. Okay. That's fine. But as it turns out, the ordering in which these were delivered was last first. So the modify event would be delivered before the truncate event. So what I would see is, the file would be truncated and modified. And I would be told, hey, by the way, this file was modified. Oh, okay. I'll read it. Boo. Oh, by the way, this file was truncated. Oh, okay. I'll reset my seek offset to zero then. Boo. Okay. Fix that. Fine. Jesus Christ. Is everything complicated? Fix that. Fine. Okay. That's fixed now. And I got Trent's original bug fixed. We're all fixed. We're done. And now I'm so close to just being all the way done here. I just need to write a little test case and then we're going to be all good to go. And so I wrote a little test case. Let me see if I can find my little test case around here somewhere. I know it's in one of these guys. Okay. Here we are.
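The double "boo" above comes purely from event ordering, and a toy model makes that visible. The Python sketch below is an illustration under stated assumptions, not the event-port code from the talk: the function `replay` is hypothetical, the file is modeled as an in-memory byte string that already holds its final content when the events are processed, and the follower is assumed to read to end of file after handling each event.

```python
def replay(events, content, offset=0):
    """Feed a sequence of 'truncate' / 'modify' events to a toy follower
    and return everything it would have printed."""
    printed = []
    for event in events:
        if event == "truncate":
            offset = 0                      # reset the seek offset
        # After either event, read whatever lies beyond our offset.
        printed.append(content[offset:])
        offset = len(content)
    return b"".join(printed)


if __name__ == "__main__":
    # The writer truncated the file and then wrote "boo\n".
    content = b"boo\n"
    # Events handled in the order they happened: one "boo".
    print(replay(["truncate", "modify"], content))
    # Events handled last-first: read "boo", then rewind and read it again.
    print(replay(["modify", "truncate"], content))
```

In this model the duplicate only appears when the follower's offset before the roll is smaller than the new file size, which would be consistent with the talk's observation that some writes printed once and others twice.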
So I wrote a little test case so I can verify that I fixed the problem. And this is just a little guy that's going to spawn off a little tail -f. And then, in a loop, we're just going to hammer this thing. Truncate, truncate, truncate, truncate, truncate. And then I'm going to assert that I've never seen the same data twice. Great. Okay. And I run this and I see this. What? What? And sure enough, there's not a core here, but what this thing had done, and let's see if I can get it to do it again, is it had decided to actually generate a core file, because it had died. Let's see if we can get it to do it again here. It won't do it every time, but there we go. SIGBUS, core dumped. It's like, what? No, no, we're not debugging a core dump right now. Tail does not core dump. And you look at tail, and it's like, what? And this was one of those moments, and I'm sure you've had these moments, where the tears begin to well up. And I've done this for so long. We've all done this for so long. I'm so used to being punched in the mouth. I know what it's like to be punched in the mouth, and I'm fine with it. I get punched in the mouth again and again and again and again and again. But sometimes when you're, like, eight problems in, and you really just need this one thing to work right now, so I can begin to unwind my life, that's when the tears begin to form. I didn't cry, but I could kind of feel it. And I go to look at the core dump, and it makes no sense whatsoever. And as it turns out, the BSD tail, as an optimization, is mmapping the file. And a whole lot of DTracing later, I determined that it was actually mmapping a file that is in tmpfs. And if you mmap a file that's truncated, you are deep into undefined territory. In fact, a SIGBUS is as close to the defined behavior as you're going to get. ZFS, as it turns out, ZFODs the page. Everyone else SIGBUSes you.
And then I'm like, but wait a minute. I go over to the Mac, and I can't get my Mac to do it. I'm looking at the source code of the BSD tail, and I can't get my Mac to do it. And then I actually go over to the Apple source code, and there we are in user command tail, pound ifdef Apple. You know, truncating an mmapped file is not a very good idea. It can lead to a lot of undefined behavior, so we actually don't do all this code. So they actually don't do that at all for Apple, but don't bother pushing that up anywhere, Apple. We'll just all suffer through these problems again for the very first time.
So with all of this, finally, finally, finally, everything was resolved. Everything was resolved, and tail -f works as well as it can ever work, because the thing that's frustrating to me, of course, is that tail -f can never actually completely work, because the file can be truncated and overwritten before tail can actually read it. So there's a little bit that's dissatisfying there. And then the other thing that's really dissatisfying is that if you look at the ftruncate man page, you can actually set the truncation to any length you want, not just zero. And this was the moment at which I said, fuck that. So if you are so stupid as to go into a log file and ftruncate it to offset 25, go fuck yourself. You just lost data. That's not my problem. And you know what? Yeah, I get it. I get it. There's some kid right now, right now they're six months old, and they're going to be up on this stage in 40 years, and they're going to be bitching about me, and they're going to be bringing up all my old emails and shit, and I don't care. I don't care. I'm done. tail -f works as well as it's going to. Thank you very much.
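The talk distinguishes two ways a file can be truncated: opening it with O_CREAT and O_TRUNC, and calling ftruncate, which also accepts a non-zero length. The Python sketch below simply exercises both paths through the standard `os` module so the difference is concrete. The helper names are hypothetical, and the 100-byte file and the length of 25 are arbitrary values chosen to echo the talk's example.

```python
import os
import tempfile


def truncate_with_open(path):
    """Empty a file by opening it with O_CREAT | O_TRUNC,
    which is what a shell redirect such as `> file` does."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    os.close(fd)


def truncate_with_ftruncate(path, length=0):
    """Cut a file to `length` bytes with ftruncate on an open descriptor."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.ftruncate(fd, length)
    finally:
        os.close(fd)


def demo_truncation_paths():
    """Return the file size after each kind of truncation of a 100-byte file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    actions = (
        truncate_with_open,
        truncate_with_ftruncate,
        lambda p: truncate_with_ftruncate(p, 25),
    )
    try:
        sizes = []
        for action in actions:
            with open(path, "wb") as w:
                w.write(b"x" * 100)
            action(path)
            sizes.append(os.path.getsize(path))
        return sizes
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo_truncation_paths())
```

A follower that only hooks the open-with-O_TRUNC path never hears about the second and third cases, and the third case leaves no obviously right place to resume reading, which matches the speaker's decision to stop there.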
