Mark Zuckerberg's Lecture on the Functioning of Facebook from... 2005

This afternoon, I have the pleasure of introducing Mark Zuckerberg, who is one of our guest speakers this semester, to come and talk a little bit about computer science in the real world. As most of you probably know, since you all use it much more than I do, he's the founder of Facebook.com, which is a social networking site, or whatever you want to call it, used at over 2,000 schools across the nation, and possibly the world. Is it the world too, or just the nation? So he's a good example of doing things with computer science. He's going to tell us some of the background of it, what's been important, and so forth. So please join me in welcoming him.
This is the first time I've ever had to hold one of these things, so I'm just going to attach it really quickly. All right. Can you hear? Is this good? Is this amplified at all? All right, sweet. So this is one of the first times I've been to a lecture at Harvard. But I guess what's probably going to be most useful for you guys is if I take you through some of the courses that I took at Harvard, where I actually did go to lecture sometimes. I was joking. And I'll talk about how different decisions I had to make as I was moving along with Facebook were shaped by what I was learning in the classes I was taking. If all goes according to plan, then maybe some of you will come out of this thinking that taking CS or engineering at Harvard is actually useful. So that's the game plan. I think this is slotted for two hours. There's no way I'm going to speak for two hours. I'll probably speak for 15 or 20 minutes, and then I'll just let you guys ask questions, because I'm sure you have more interesting stuff to ask me than I can come up with to talk about myself. So I'll just get started. When I was here, I started off taking 121.
I never actually took 50, so you probably should have gotten the other guy who was doing Facebook, Dustin Moskovitz, who is my roommate. When we got started, the site was written in PHP, which isn't something you learn in one of these classes. But fortunately, if you have a good background in C, the syntax is very similar, and you can pick it up in a day or two. So I started writing the site and launched it at Harvard in February 2004, so almost two years ago now. Within a couple of weeks, a few thousand people had signed up, and we started getting emails from people at other colleges asking us to launch it at their schools. I was taking 161 at the time. I don't know if you guys know the reputation of that course, but it was kind of heavy. It was a really fun course, but it didn't leave me much time to do anything else with Facebook. So my roommate Dustin, who I guess had just finished CS50, said, hey, I want to help out. I want to do the expansion and help you figure out how to do this stuff. And I was like, that's pretty cool, dude, but you don't really know any PHP or anything like that. So that weekend he went home, bought the book Perl for Dummies, came back, and said, all right, I'm ready to go. I was like, dude, the site's written in PHP, not Perl, but that's cool. So he picked up PHP over a few days, because, I promise, if you have a good background in C, PHP is a very simple thing to pick up. And he just went to work. The first big decision we really had to make was how to expand the architecture from the single-school setup we had when it was just at Harvard to something that supported multiple schools. This was a decision that had to be made on a bunch of levels, both in the product and in how we wanted privacy to work.
But I think one really important decision that's helped us scale pretty well is how we decided to distribute the data. I don't know how much complexity and big O notation you guys do in this class, but one of the most complicated computations we do on the site is figuring out how you're connected to people. You can imagine the friendships stored as a set of undirected, unweighted pairs of ID numbers of people in the database. If you want to figure out who's friends with someone, you have to look at all their friends, so that's maybe 100 or 200 people. But if you want to figure out who's a friend of a friend, or what the closest connection is, then you have to look at the 100 or 200 friends of each of those friends. So at each level, another factor of n gets multiplied in, where n is the number of friends each person has. You can see that the work of finding the shortest path between people grows exponentially with the number of degrees you search. If you're just looking for a friend of a friend, that's n squared. If you're looking for a friend of a friend of a friend, that's n cubed. That was traditionally pretty difficult for a lot of the sites that came before Facebook. For example, Friendster had big problems with this, because they were trying to compute paths six or seven degrees out. When you're doing n to the seventh, that's just very hard, and it took down their site for a while. So one of the things we had in mind when we were figuring out how to do this was: how do you distribute the database in such a way that this computation becomes manageable? What we decided was that everyone on the site does most of their activity at the school they're based at.
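To make that growth concrete, here is a minimal breadth-first search over friendships stored as undirected, unweighted ID pairs, as described above. It is a sketch under assumptions, not Facebook's code: `build_graph` and `degrees_apart` are hypothetical names, and the `max_degrees` cutoff stands in for limits like "friend of a friend of a friend." Each extra level expands the friends of everyone on the previous level, which is where the roughly n^d work for d degrees comes from.

```python
from collections import deque


def build_graph(pairs):
    """Turn undirected (a, b) friendship pairs into an adjacency map."""
    friends = {}
    for a, b in pairs:
        friends.setdefault(a, set()).add(b)
        friends.setdefault(b, set()).add(a)
    return friends


def degrees_apart(friends, start, target, max_degrees):
    """Breadth-first search for the shortest connection between two users.

    Returns the number of hops from start to target, or None if the
    target is not reachable within max_degrees hops. Every level visits
    the friends of every user on the previous level, so with about n
    friends per user the work grows roughly like n**d.
    """
    if start == target:
        return 0
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        user, depth = frontier.popleft()
        if depth == max_degrees:
            continue  # do not expand beyond the depth limit
        for friend in friends.get(user, ()):
            if friend in seen:
                continue
            if friend == target:
                return depth + 1
            seen.add(friend)
            frontier.append((friend, depth + 1))
    return None


if __name__ == "__main__":
    graph = build_graph([(1, 2), (2, 3), (3, 4), (1, 5)])
    print(degrees_apart(graph, 1, 4, max_degrees=3))  # 3 hops: 1-2-3-4
    print(degrees_apart(graph, 1, 4, max_degrees=2))  # None: too far
```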
So if you're at Harvard, then most of the people you're going to be seeing or interacting with on the site are going to be at Harvard. It's probably something like 90% of what you do on the site. So we decided to split up the databases and create one instance of a MySQL database for each school on the network. In doing that, the paths we compute are only within the school. So now that we're at 6 million users, instead of having to do n cubed over some portion of 6 million, it's just n cubed over 10,000, which is a much more manageable computation. That was the first big architectural decision we had to make that kept us from dying a few months later, and it was probably a pretty important one. When we first set up the site, we had just one computer. It wasn't in our dorm room; we were renting it. I had learned my lesson about trying to run a site out of my dorm room a few months earlier, when Harvard almost tried to kick me out. So this time I rented a server off-site, and originally it ran both the database and the web server (Apache, in this case) on the same machine. Because we distributed the databases the way we did, as time went on we were able to just add machines linearly and grow the site without any kind of exponential growth in the amount of hardware we needed. But after we hit about 30 or 50 schools, we started realizing that we could get more performance out of MySQL and Apache, and that some of the way things were set up wasn't as optimal as it could be.
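The per-school split can be sketched in two pieces: a lookup from a school to the database that holds it, and a connection search that only considers friendships inside one school. This is a hypothetical illustration; the `SCHOOL_DATABASES` hostnames and the helper names are invented, and the talk only says there was one MySQL instance per school.

```python
# Hypothetical mapping: one MySQL instance per school (hostnames are invented).
SCHOOL_DATABASES = {
    "harvard": "mysql://db-harvard/facebook",
    "stanford": "mysql://db-stanford/facebook",
}


def database_for(school):
    """Route every query about a school's users to that school's database."""
    return SCHOOL_DATABASES[school]


def school_subgraph(friends, school_of, school):
    """Keep only one school's users and the friendships among them.

    friends maps user ID -> set of friend IDs; school_of maps user ID ->
    school. Path searches then run on a graph of roughly one school's
    size instead of the whole network.
    """
    members = {user for user, s in school_of.items() if s == school}
    return {
        user: {f for f in friends.get(user, ()) if f in members}
        for user in members
    }


if __name__ == "__main__":
    friends = {1: {2, 3}, 2: {1}, 3: {1}}
    school_of = {1: "harvard", 2: "harvard", 3: "stanford"}
    print(database_for("harvard"))
    print(school_subgraph(friends, school_of, "harvard"))
```

The trade-off, as described, is that connections across schools are not computed, which is acceptable when about 90% of activity stays within one school.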
For example, when you have MySQL and Apache running on the same server and something happens to that server, then not only does the database for the schools on that server stop responding, but you can't even load any web pages, so you get page not found, and that kind of sucks. Another issue is that usage isn't going to be even from school to school. Some schools are always going to have heavier use. We have schools now like Penn State that have 50,000 users, and then the majority of schools, I think, still have fewer than 2,000 users, just because there are a lot of small schools and a lot of schools where we don't have complete ubiquity. So to deal with the fact that Penn State had 50,000 people and a ton of users all the time while other schools don't, what we decided to do was separate the web servers from the database servers. We set it up so that we had a pool of Apache web servers that we could load balance between and use uniformly, while the database layer stayed consistent. I don't know if this stuff is interesting to you guys at all, or if it matters to what you're studying now. If there's more you'd like to know about the architecture, I'll leave that open to questions later, so that I don't spend a lot of time talking about random applications you might never care to use. But let me try to find some interesting examples.
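Once the web tier is separate, any Apache box can serve any school's pages, so requests can be spread across a pool. The talk does not say which balancing policy was used; the sketch below assumes simple round-robin that skips servers marked as down, and the `WebPool` class and server names are hypothetical.

```python
import itertools


class WebPool:
    """Round-robin over a pool of interchangeable web servers."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, skipping any that are down."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy web servers in the pool")


if __name__ == "__main__":
    pool = WebPool(["web1", "web2", "web3"])
    pool.mark_down("web2")
    print([pool.pick() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

Because the database layer sits behind this pool, a failed web box no longer takes a school's data offline with it; the next request simply goes to another server.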
So one of the things that was pretty interesting was when we got to a point in terms of traffic where we started maxing out the performance of some of these open source applications that are generally pretty performant. For example, MySQL is a really good open source database. I don't know if any of you guys mess around in your own time and make anything with MySQL or use it in any way, but it's pretty easy to use, and it's also decently quick. Indices work pretty well. It's not as fully featured as something like Oracle, but it's pretty good. And I think around when we started doing maybe 100 million page views a day, we started running into some bottlenecks with it. A typical query on MySQL might take two to four milliseconds, and that's not that much. But when you're doing 100 million page views a day, and each page view might have 30 to 50 queries (especially something like a profile view that queries all kinds of different information), then that starts to suck. So we started to develop a caching layer that allowed quicker access to some of the information. Originally we were using another open source application, Memcache. I don't know if any of you guys have any experience with it, but it was pretty quick. It got access times down to about 0.3 to 0.5 milliseconds, which is pretty good. But it also has a bunch of distribution issues. It's supposed to be a distributed hash table sort of application, where you can attach any number of Memcache boxes in a cluster, just hook them up, and have it go.
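The figures in this passage turn into a quick back-of-envelope estimate. The sketch assumes a page issues its queries one after another (the talk doesn't say whether they were serialized or parallel) and spreads 100 million page views evenly over a day, which understates the evening peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def serial_lookup_ms(queries_per_page, ms_per_query):
    """Time a page spends on lookups if they run one after another."""
    return queries_per_page * ms_per_query


def average_queries_per_second(page_views_per_day, queries_per_page):
    """Daily query volume spread evenly over 24 hours."""
    return page_views_per_day * queries_per_page / SECONDS_PER_DAY


if __name__ == "__main__":
    # Ranges quoted in the talk: 30-50 queries per page,
    # 2-4 ms per MySQL query, 0.3-0.5 ms per Memcache lookup.
    for label, fast, slow in [("MySQL", 2.0, 4.0), ("Memcache", 0.3, 0.5)]:
        low = serial_lookup_ms(30, fast)
        high = serial_lookup_ms(50, slow)
        print(f"{label}: {low:.0f}-{high:.0f} ms of lookups per page")
    low_qps = average_queries_per_second(100_000_000, 30)
    high_qps = average_queries_per_second(100_000_000, 50)
    print(f"{low_qps:,.0f}-{high_qps:,.0f} queries per second on average")
```

That works out to roughly 60-200 ms per page from MySQL alone versus about 9-25 ms from the cache, and on the order of 35,000 to 58,000 queries per second across the site.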
But we ran into a lot of issues there, where different Memcache boxes would go down, and there was no redundancy for the information. So when a Memcache box went down and you had a cache miss, all of a sudden you had a lot more traffic going to a specific set of databases, and that would suck. So as time went on, we kind of outgrew Memcache, and the indices on MySQL too. We still use that stuff, but we had to build extra redundancy on top of it. I think that's probably a little interesting, but I'll let you guys ask me more questions about it later. I'm not really sure what would be interesting to talk about right now. Maybe you guys can help out a little. Go for it. Yeah, so that's not a technical question at all, but I guess I'll just go into question time now, because I'm not really sure what's relevant for me to be discussing. So I'll answer this, and then anyone else who wants to ask me questions can go for it. I guess I never really spent a lot of time worrying about stuff like that. There are companies out there, like Google, that could just get into your space and do whatever they want at any time. But I think one of the cool things about this time in technology is that individuals have the leverage to do way more than they've ever been able to do before. Even Google, which was started, like, four years ago, now has hundreds of thousands of machines and probably billions of dollars spent on equipment.
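One way to add the missing redundancy is to store each cached value on more than one box, so that losing a box does not turn all of its keys into misses that land on the database. The sketch below assumes consistent hashing (a common way for memcached clients to spread keys, though the talk does not say what Facebook built) with a replication factor of two; `CacheRing`, `cached_read`, and the node names are hypothetical, and plain dicts stand in for cache servers.

```python
import bisect
import hashlib


def _hash(text):
    """Stable integer hash for placing keys and nodes on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class CacheRing:
    """Consistent-hash ring that maps each key to several distinct nodes."""

    def __init__(self, nodes, points_per_node=100, replicas=2):
        self.replicas = min(replicas, len(set(nodes)))
        # Each node appears at many points so keys spread evenly.
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self.points = [h for h, _ in self.ring]

    def nodes_for(self, key):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect(self.points, _hash(key))
        chosen = []
        for offset in range(len(self.ring)):
            node = self.ring[(start + offset) % len(self.ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == self.replicas:
                    break
        return chosen


def cached_read(key, ring, caches, down, load_from_db):
    """Cache-aside read: try each live replica, then fall back to the DB."""
    live = [node for node in ring.nodes_for(key) if node not in down]
    for node in live:
        if key in caches[node]:
            return caches[node][key]
    value = load_from_db(key)  # reached only when every live replica misses
    for node in live:
        caches[node][key] = value  # repopulate the replicas that are up
    return value
```

With one replica of a key down, reads are still served from the other; only when both are down does the read cost a database query, which is what happened to every key on a failed box when there was no redundancy.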
In the generation before Google, I think you couldn't even make a site without some big piece of hardware. eBay, for example, ran off of two $50,000 machines or something, and you just can't start doing that if you're a kid in a dorm room. So I think the fact that we could rent machines for $100 a month and use that to scale up to the point where we had 300,000 users is pretty cool, and it's a pretty unique thing that's going on in technology right now. It means that instead of worrying about who the big player is and what Google is going to do next, you can just get a lot of stuff done. Some of the traditional business problems, like having to raise capital before you can make anything, are no longer an issue. You're leveraged to do a lot more on your own now. I don't know if that answers the question you're asking, but it's one of the reasons I think it makes a lot of sense to be studying this stuff right now, because at no point in the past could you leverage such a small amount of money into technology powerful enough to really touch people the way you can today. Google does about 250 million page views a day, right? They have hundreds of thousands of machines and about 5,000 employees. Facebook does 400 million page views a day, which is a lot more than Google does, and we have hundreds of machines and just passed 50 employees. And that's just one technical generation, three or four years, of difference in the architectures that were created.
And then you go three or four years back before that, from eBay to Google, and it's just completely different. Google is running off of a lot of distributed equipment. They have hundreds of thousands of machines, but the idea there was to get a lot of shitty machines that are really cheap, and that's a big step up, because then it's more redundant: they're not losing information, and they don't expect stuff to always work. It's a much more mature attitude than eBay's, which, I mean, was the only thing they could do at the time. But yeah. I have a question about the DHT stuff. The what? The distributed hash table stuff. Yeah, well, which one? I was just wondering if you've been able to use it at all, like your extensions to Memcache. One thing I've noticed is that there aren't really good libraries available for the DHT stuff. There are always one or two resources, but not much in terms of an implementation that can go forward on its own. Yeah. We didn't necessarily extend Memcache; we built a bunch of stuff ourselves. Right now it's not open source. We've considered it, but there's a lot of work that goes into making stuff open source. On top of whether or not you want to lose the competitive advantage, it's kind of unfortunate, because I think that if it were just easier to do, then you could do it. But then there's a lot of support, and licensing and all that stuff, and we've found that it's been kind of annoying. One of the things we actually considered making open source was a search server that the guy standing right there made while he was still out in California.
We got to a point where MySQL was lagging a little on some of the searches we were trying to do, and we decided it would be cool to build a series of distributed machines that could kind of... he doesn't use a hash table. What's the structure that you use, McCollum? So, yeah, we thought about making that open, but that's when we had to do all this work to come up with a license, and we were just like, all right, screw that. But... Yo. Hiring people? I guess, as you grow, the most important thing is to have smart people, right? If you think about the technical leverage stuff I was just talking about in answering Akai's question, as technology becomes more generic and less expensive, the leverage point shifts more to the people. If you think about it in terms of people at the company versus user time spent or page views, then because of technology, people are much more leveraged to do more things, and they're more important in the equation. So because of that, it's really important to get the most intelligent people. Also, when you're a small company, you can be really nimble and get a lot of stuff done, and there's relatively little bureaucracy. So if you have smart people who can take advantage of that to build cool things, that's awesome. Yeah. I guess besides that, I don't know, designing new things. There's not much corporate bureaucracy yet, so I don't have to waste much time on that. Keep on going. Yeah. Yeah. I have a lawyer who works for me full time. Yeah. We didn't, and that, I guess, caused some annoyance later on. I guess getting stuff set up really well is good, right?
Getting stuff clean is really good, and no one's ever going to tell you that a lawyer's bad. It's all just a question of opportunity cost and what you prioritize, right? In our case, we now have to deal with a bunch of stuff that wasn't set up properly in the beginning. But actually, most of that stuff is dealt with, so it's not even a big deal anymore. And instead of talking to lawyers early on, we were making stuff, and I think that was probably the right use of our time. I think one cool characteristic of a lot of the companies that end up being really successful (not that we are really successful, but I guess we also fall into this bucket) is that they started off as someone trying to make something cool, not someone trying to make a company. Google came out of Larry and Sergey's PhD dissertation at Stanford. Yahoo came out of, I guess, also some Stanford guys screwing around in their dorm room. eBay came out of some guy trying to build a marketplace for his girlfriend to exchange Pez dispensers. Amazon was a little more calculated. So I can't imagine that any of those people really had that much advice, and it seems to have worked out okay for them. At the same time, I'm not going to sit here and tell you not to get advice on stuff. But a lot of times, people are just too careful. I think it's more useful to make things happen and then apologize later than it is to dot all your i's now just so you never get stepped on. Yeah. Go for it. Well, I think you're kind of always at that point, right?
Most companies are started on a couple of ideas, a few things that they do well, right? Yahoo's was: we're going to organize all the information in the world by directory. That's what they started off doing. Then they diversified as time went on and built more stuff, and a lot of that stuff is the core of their business now. They didn't originally do search, and now the directory basically doesn't exist. It just sucks; there's no utility to it. Google's big thing was PageRank, and out of PageRank they got search, and now they extend that to similar types of algorithms for searching in other spaces. But you can tell that all the other stuff they're doing is sort of tangential. They're trying really hard to make PageRank, and other algorithms very similar to it, work in those spaces, and it's just not as elegant or pure an idea as the original one was. With Facebook, for example, when it first got started, the thing I thought was most interesting was just being able to type in someone's name and find out information about them. There was hardly any of the stuff that's there now. There were no groups. There were no messages, even. There was poking. So you get started on some kind of core idea, and generally the company will do well because the people who start off working on that core idea understand it in some unique way. But that doesn't imply that they understand anything else better than anyone else does.
So that's why surrounding yourself with a lot of smart people is really important. There are a lot of applications on the Internet now that do that stuff, right? Flickr is a pretty cool photo application, although I think in three weeks we passed them in the number of photos on our site. But I think the coolest thing about photos is that you can tag them in a way that links them to people's profiles. And I think that's something you can really only do if you have the context of everyone around you on the site. That kind of requires ubiquity of usage. I don't know if any of the other guys would have done that if they had that kind of usage, but they didn't. I don't know. Don't any of you guys have any CS questions? What's an idea? All right, what's an example? I don't know much about Facebook. So, say the next thing you want to do is encourage people to keep going together. How do you go about figuring out which technology is a good one? How do you find the right technology? Do you have any processes in place to figure that out, or any directives for those types of things? How many different companies do you go to to figure out what those are? What are the connections there? I think our process for filtering which technologies to use is to trust the smart people, right? We definitely have some people at the company who are just really smart. I think most of the people at the company are pretty smart, but there are a few guys in particular, and I'm not one of them, who, when they say something is generally a good way to go about it, can get support for that pretty easily, and a lot of the engineers build a consensus around that. I'm trying to think of a good example. I think it's somewhat goal-oriented.
With photos, we knew we wanted to support people uploading unlimited photos. There's no real concept of unlimited; you just have to keep adding stuff, right? You keep adding storage, and you want it to work as seamlessly as possible. So the first thing we did was evaluate the companies that do large storage for a living, like NetApp (Network Appliance). We talked to them for a while, and then we decided we didn't really want to go with the single-big-box approach. We wanted a series of distributed smaller boxes with a lot of hard drive space and a lot of RAM. So the architecture we first built had a bunch of those machines, with relatively slow but very stable disks, behind a layer of caching boxes with a ton of RAM that could hold most of the thumbnails and the most frequently accessed images in memory at any time. Then, right before we launched, it occurred to us that we were going to have some issues with this, and they were going to be network issues, not hardware issues. For example, if you take a photo album of 30 photos and each photo is 3 megabytes, then you have to upload 90 megabytes to Facebook. And that kind of sucks. It sucks because people tend to have less-than-optimal connections, and because our router, like most routers, is set up to handle only about a gigabit per second. Routers are kind of expensive; they're a big piece of equipment, and I don't think there's a distributed version of that yet. So within the time frame we wanted to launch in, we couldn't just get a new router and get it set up.
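Here is the arithmetic behind the router concern, using the numbers above. The sketch assumes decimal units (1 gigabit = 1,000 megabits) and treats the gigabit as the router's total throughput; `link_utilization` also plugs in the figure of about 100 uploads a second mentioned in the talk, under the assumption that each upload is a full 3 MB photo, which the talk doesn't specify.

```python
MEGABITS_PER_GIGABIT = 1000  # decimal units assumed


def album_megabytes(photos, megabytes_per_photo):
    """Total size of an album before any client-side compression."""
    return photos * megabytes_per_photo


def seconds_to_send(megabytes, link_gbps):
    """Time an upload occupies a link of the given capacity, at full speed."""
    megabits = megabytes * 8
    return megabits / (link_gbps * MEGABITS_PER_GIGABIT)


def link_utilization(uploads_per_second, megabytes_per_upload, link_gbps):
    """Fraction of the link needed; anything above 1.0 cannot fit."""
    megabits_per_second = uploads_per_second * megabytes_per_upload * 8
    return megabits_per_second / (link_gbps * MEGABITS_PER_GIGABIT)


if __name__ == "__main__":
    album = album_megabytes(30, 3)  # 90 MB
    print(f"{album} MB album: {seconds_to_send(album, 1):.2f} s of a 1 Gbit/s link")
    print(f"100 uploads/s at 3 MB each: {link_utilization(100, 3, 1):.1f}x the link")
```

A single 90 MB album ties up the entire gigabit for about 0.72 seconds, and full-size uploads at 100 per second would need about 2.4 times the link's capacity, so shrinking files before they leave the user's machine directly relieves the router.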
So what we ended up doing was building a Java applet and an ActiveX control that coupled choosing the photos people wanted to upload with compression on the client side to make them smaller. That way people can upload their photos relatively quickly. We also saved CPU on our side because we don't have to do the compression ourselves, although that wasn't that huge a bottleneck. So that worked. Then we got to a point where we were having uploads at a rate of about 100 a second. People were using the feature way more than we thought they would, and even though we had the caching tier set up, it still wasn't fast enough. I'm sure you guys remember this: a few weeks ago the site was not having a good time. So what we ended up doing at that point was using edge caching, Akamai-type stuff, to put these photos, which are static content, closer to people. That way we can offload from our equipment some of the work of transferring these still somewhat large files to people. So that's where we are now, and it seems to be working pretty well. It wasn't that we had any upfront technical genius about it. It was just that at each point we anticipated the issues or picked them out pretty quickly, and then had enough competence to evaluate what our options were and make what I think were decent decisions about how to execute. What's that? Yeah. What's that?
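The applet and ActiveX control aren't described in any detail. As a stand-in, the sketch below shows the simplest form of shrinking a photo before upload: scaling it so its longest edge fits a limit, then nearest-neighbour resampling the pixels. A real client would also re-encode the result (for example as JPEG), which this standard-library sketch omits; the 800-pixel `MAX_EDGE` and both function names are hypothetical, not figures or APIs from the talk.

```python
MAX_EDGE = 800  # hypothetical size limit, not from the talk


def fit_within(width, height, max_edge=MAX_EDGE):
    """Scale dimensions so the longer edge is at most max_edge."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def downscale_nearest(pixels, new_width, new_height):
    """Nearest-neighbour resize of a row-major grid of pixel values."""
    old_height, old_width = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_height // new_height][x * old_width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


if __name__ == "__main__":
    w, h = fit_within(3000, 2000)
    print(w, h)  # 800 533
    ratio = (w * h) / (3000 * 2000)
    print(f"{ratio:.1%} of the original pixel count")
```

Fewer pixels means fewer bytes to push through the network, and doing the resizing on the user's machine is also why the talk mentions saving CPU on Facebook's side.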
So the methodology we have is that I want it to be as much of a meritocracy as possible, where the people who can come up with the coolest solutions, implement them the quickest, and have the fewest bugs get to work on the stuff they think is most interesting and have the most influence in the company. We're also onboarding a lot of people, because we're hiring relatively quickly. In doing so, we pair up new people who are coming in with the better people at the top of the chain, and we have them work with those people when they first come in, so that the incoming class can learn what the people currently at the company are working on. I think in doing that, they pick up the style and the methods we use for doing stuff. But I don't know; it changes pretty quickly. I think one difference between the way stuff works in a company and the way it works in school is that this is a very iterative process. It's nice when you get stuff right the first time, but we don't need to. A lot of companies go through phases or stages where they don't get stuff right the first time. With Microsoft, I don't know when the last time was that they had a good product before version 4, but by the time they get to version 4, it's always good, for the most part. So I think that works out pretty well for them. And Google always releases their stuff in beta. So I guess we try to have multiple people work on the same thing so everyone can learn from each other.
And to pick off some of the mistakes that might be made, so that we can reduce them pretty quickly. But in general, the idea is that it doesn't have to be perfect the first time around. As long as you get the architecture as right as possible, a lot of the other implementation stuff isn't going to be as big a deal, and you can work that out at any time. I don't know if that answers the question you asked. So now, when you find that you need more of these... there's so much IT tech, and there are people who work regularly with other people. But when you started, you were just using what you had, and obviously there were domain knowledge issues. How do you figure out what you need? How do you go about figuring out how to do things? Do you take some classes, get books, hire, or get involved with more people? How do you think that works if you're learning computer science as a student? The Internet's a pretty good tool. Yeah, I think that's how we did most of it. We actually make a point of not hiring people for skills. The theory is that if someone has skills in an area and has been doing it for 10 or 15 years, then that's probably what they can do. That's good; it means they can do that. But if you hire someone right out of college, or someone younger, for raw intelligence, then the idea is that they're going to be able to learn stuff really quickly. There's a lot of information available all over the place, and in recent years there have been good tools for sorting through it.
And I think that the best-performing people we have are, like, sort of younger people who didn't necessarily know that much about anything specific coming out of college. I mean, a good example is, like, I mean, Dustin, my roommate at Harvard, wasn't even a CS major. He was an economics major. You know, and he's just, like, a really smart dude and was able to pick it up. Some of the other good people we have are EE majors out of Stanford, you know, or Berkeley. And, I mean, it's not, they aren't even CS all the time. Like, math people, it's like, if you studied math, you can learn this stuff relatively quickly a lot of the time. So, yeah. Yeah? Yeah, I guess if you have the infrastructure in place, right now when you're focusing on hiring, do you look for more, still look for, like, back-to-school people, or do you look for people who might have the business knowledge to help grow you further and make more money? Like, what's actually the priority right now in growing the company? I never really hire people just because they have business skills. Like, I think that, like, it's actually kind of funny, but knowledge of a lot of core CS stuff is really important in business, too. So, one of the main things that you learn when you're studying CS is, like, complexity and scale. And, like, that is, like, a huge issue in business, too. It's like, how do you go from having five people to a hundred people? And, like, what's, like, kind of the change in the dynamic there? And, like, how are certain processes, like, how is a sales force going to scale from, like, five people to a hundred people? You know, and I mean, it's, like, the same type of intelligence that can figure out both of those problems. 
And it's like, it might be a different type of person who cares to solve the problems, but, like, I think that, like, the second part of my answer to what you said is that I think we're sort of continually in the process of building out infrastructure, and I don't think you ever get out of that process. And, like, we're kind of focusing not on just building something and figuring out how to make money off of it and sort of, like, maximizing the value of our business in the short term, but instead sort of, like, always looking to maximize what the long-term value would be. And I think that in doing that, you kind of need to always just be building out your base and not at any time be worried about maximizing your money. AUDIENCE MEMBER 2 Can you go back to, like, when you first introduced this, like, you guys had a history of, like, the day after holidays and things like that, and everybody else was like, like, excuse me, all these different times, and, like, how many people? Our peaks are pretty strong. So, like, at 5 in the morning, there's, like, no matter how many users we have signed up, there's always, like, 5,000 people, and that's it. And then, like, if you get to, like, 9 p.m. Pacific, so, like, midnight here, this is, like, the peak across the country, it's, like, close to 400,000 people using it simultaneously. And, like, it's actually kind of interesting because we, like, monitor these graphs and we have this huge LCD in our office, and, like, whenever there's a blip in the traffic, we're like, oh, crap, what happened? And a lot of times it's, like, Laguna Beach. You know? So, but usually it doesn't swing that far the other way. Yeah? Right now we don't, but we may at some point in the future. Yeah. Well, I mean, I think that what makes Facebook fun and useful is that there's a lot of information about a lot of people that you can get. 
But what's more important is that the information is available to the people who that person wants that information to be available to, and the flip side of that is that the information is available to the people who want to have access to that information. So, I mean, one of the, kind of, the core decisions that we made was only to let people at the same school see each other's profiles. And, I mean, I guess the idea behind that was that if you're at Harvard, like, you probably wouldn't have that hard of a time just, like, letting someone else at Harvard see your information. But at the same time, it's, like, only people at Harvard who you're probably going to see on a day-to-day basis and maybe meet who are ever going to want to look you up. You know, it's not like some kid out at Stanford who you will never talk to is going to be interested in knowing what your cell phone number is, or, like, what you're interested in. So, by limiting the scope of the information to, like, sort of as narrow as makes sense, I think that we solve a lot of those issues. And then we also give people complete control over, like, how, like, what parts of their profile get shown. So, we don't force anyone to show anything, and we, like, I guess, give people granular control over some of the more sensitive stuff. So, like, right next to the cell phone field, there's, like, another field that's, like, who do you want to show this to? Just your friends, you know, just people at your school, what? So, I mean, like, we care about it, because if people stop, if people feel like their information isn't private, then that screws us in the long term, too. So, yeah. I just have something further on that. I guess, even though you control who sees information like your cell phone number, what's the recourse like? Say you have a photo, or somebody puts that photo on, like, a message board or something like that, like, how do you control what users do with the information that's being put on here? 
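The same-school rule and the per-field "who do you want to show this to?" setting described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Facebook's actual code: the `User` and `Audience` types, the `can_see_field` and `visible_profile` functions, and the choice to show fields without an explicit setting to the whole school are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Audience(Enum):
    """Hypothetical per-field audience settings."""
    FRIENDS = "friends"  # only confirmed friends
    SCHOOL = "school"    # anyone at the same school
    NOBODY = "nobody"    # field hidden entirely


@dataclass
class User:
    name: str
    school: str
    friends: set[str] = field(default_factory=set)
    profile: dict[str, str] = field(default_factory=dict)          # field name -> value
    visibility: dict[str, Audience] = field(default_factory=dict)  # field name -> audience


def can_see_field(viewer: User, owner: User, field_name: str) -> bool:
    """Return True if viewer may see one of owner's fields (hypothetical policy)."""
    if viewer.name == owner.name:
        return True
    # Rule from the talk: profiles are only visible to people at the same school.
    if viewer.school != owner.school:
        return False
    # Assumption: fields with no explicit setting are shown to the whole school.
    audience = owner.visibility.get(field_name, Audience.SCHOOL)
    if audience is Audience.NOBODY:
        return False
    if audience is Audience.FRIENDS:
        return viewer.name in owner.friends
    return True


def visible_profile(viewer: User, owner: User) -> dict[str, str]:
    """The subset of owner's profile that viewer is allowed to see."""
    return {k: v for k, v in owner.profile.items() if can_see_field(viewer, owner, k)}
```

With a cell number set to friends-only, a friend at the same school would see it, a non-friend classmate would see only the school-visible fields, and a student at another school would see nothing.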
It's very hard to control what people do with information that they have access to. I mean, there's, like, the best that we can do is give people control over their information and who can see it. And then once they let someone see it, it's sort of, like, out of anyone's control. So, I mean, I originally threw that together in, like, a half an hour. And I guess, like, it was pretty complicated because or it was more complicated than I thought it was going to be. And I think, like, part of the reason why we changed it was because it didn't work as well as we wanted it to. And the original goal was to sort of make it so that you can have this wiki-type thing on people's profiles that when you moused over something, it showed who kind of added that part of it. But, like, I guess there were a lot of cases that we missed or it just, like, wasn't well designed by me. And, like, I don't know if you guys remember, but, like, you used to, like, mouse over stuff and it just, like, wasn't as good. And, like, it might, like, tell you the wrong person or it might highlight, like, more than it was supposed to. So, I mean, so I kind of, like, coupled that with thinking, like, you know, this isn't even the best feature. You know, it's, like, it would be much more interesting if instead of having to mouse over stuff, people could just, like, see the picture and the name of the person who posted everything without having to, like, just go through the whole wall. So, over the summer, we just kind of went through and wrote a better parser for the walls and tried to decompose them. And then going forward, we made it so that you just added a post and it went to the top of the wall. Yeah. I just wanted to make something where people could type in someone's name and get information about a person. I thought that would be cool. Oh, yeah. So, I mean, the SMS gateways also have, like, an email counterpart. 
So it's, like, if your phone number is, like, X, then, and you have Cingular as your provider, then, like, you could email x at cingular.com or some variant of that, and the text message would go to your phone. And that's a free gateway. So, I mean, you know how, like, when you text message people, a lot of times, like, depending on what your cell phone plan is, it'll cost you money? If you do it through email, it actually doesn't cost any money. So that's how we chose to do it. We were doing a high volume of them, and we decided that it would just be, like, a better thing for us to do to actually, like, kind of do it the legit way and send a text message directly to the cell phone as opposed to going through the email gateways. So we're kind of in the process of getting that set up now. Yeah. Yeah. Um... I mean, I think that we're, like, always looking for more stuff to do. I don't think that we're competing with MySpace, and I think it's kind of a different type of application. Yeah. So, I mean, I did that so that people couldn't go through and scrape the pages. And we have a lot of stuff that we put in place to make sure that people don't aggregate information off of Facebook. And you obviously, like, you can't see profiles of people at other schools. But also, if you try to view a lot of profiles, it, like, picks up that you're just viewing an abnormal number of profiles. And we also sort of, like, just by analyzing user activity, we've built, like, Bayesian filters that, I guess, just, like, let us pick out abnormal activity, like, really quickly, and just kind of show very limited information to those users. But, like, one of the things that we wanted to do, we wanted to make it especially difficult for anyone to try to scrape email addresses because that's really annoying if people get spam. So we figured that by making it an image instead of plain text, that just added, like, an extra level of complexity in terms of scraping. Yeah. Mm-hmm. Um... 
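The idea of noticing when someone views an abnormal number of profiles can be illustrated with a much simpler technique than the Bayesian filters mentioned in the talk: a per-viewer sliding-window count with a fixed threshold. This is a sketch of that swapped-in technique only. The class name, the window length, and the view limit are hypothetical, and timestamps are assumed to arrive in nondecreasing order.

```python
from collections import defaultdict, deque


class ProfileViewMonitor:
    """Flags viewers who open too many profiles within a time window.

    A simple sliding-window threshold used for illustration; it is not the
    Bayesian filtering described in the talk. Window and limit are hypothetical.
    """

    def __init__(self, window_seconds: float = 3600, max_views: int = 200):
        self.window = window_seconds
        self.max_views = max_views
        self._views = defaultdict(deque)  # viewer id -> timestamps of recent views

    def record_view(self, viewer_id: str, timestamp: float) -> bool:
        """Record one profile view; return True if the viewer now looks abnormal."""
        views = self._views[viewer_id]
        views.append(timestamp)
        # Drop views that have fallen out of the window.
        while views and views[0] <= timestamp - self.window:
            views.popleft()
        return len(views) > self.max_views
```

A caller could use the returned flag to decide when to start showing a viewer only limited information, as the talk describes.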
Well, we can use it to target posters to you, for example. I don't know if any of you guys bought posters off of that, but, like, we sort of, like, I mean, we're trying to figure out what we could do with that, but we're obviously, like, really sensitive to people's privacy. I mean... What's that? Yeah, I think we're actually gonna be releasing something, like, in, like, late this week or next week that shows some aggregate statistics that we think are interesting. I don't know. I mean, the stuff is kind of cool, but it's not, like, the type of thing that you come back to every day. No CS questions? Yeah. I'm especially disappointed that Will Chen didn't ask me any questions. Cool. What's up? Of course. I mean... I think that there's value to, like, to what people do on the site. But, I mean... Yeah, of course. Well, I can tell you what we're gonna do in the next two weeks. So, I mean, there's the thing that I just kind of mentioned before, where we're aggregating a bunch of stats and just kind of show, like, what's hot and, like, what's changing. And also just, like, surprising statistics that we found, so, like, 2% of people at Harvard are libertarian, for example, or something like that. I think that another thing that we're... that we're gonna launch, hopefully, sometime either late this week or next week is something that allows people to clarify their relationships with other people. So, I mean, a lot of the problems that we kind of deal with at Facebook aren't always technical, but they're sometimes, like, they're social problems. And it's, like, one thing that I think is really interesting is, you know, if you have 100 or 150 friends, it's, like, how well do you know each of those people, and who are maybe, like, the five people who you actually care about, you know, like, a lot. And that's not something that you can really answer right now, because the connections are binary. You know, it's either, like, you are connected or you're not. 
So, I've been trying to think for a while about, like, how we could design something that would make it so that people could express how close they were to people in sort of an unbiased way. So, I mean, you can imagine, like, if you made a feature that was just, like, rate your friendship on a scale of 1 to 10, that would not work, right? Because, like, first of all, like, no one would want to do that, because that's, like, you're, like, insulting someone if you're, like, you're a 3, you know? But, um, like, it's also, like, kind of boring, you know? And so no one would want to do it because of that. And, like, it would just be skewed by social pressure in the same way that friends are, that, like, some people have a different sense of what a friend is to them than, like, another person would. You know? So, it's like, if someone has 30 friends and another person has 150 friends, it's like, does that person actually have more friends in real life? Like, maybe or maybe not. And maybe the person with 30 just has a higher threshold for making someone a friend on Facebook. So, I mean, I guess, like, the solution that we kind of came up with for this was to make, to, like, kind of judge relationships based on bi-directional factual statements. So, for example, I took CS50 with this person, or I lived in a house with this person. And, like, there's just kind of a bunch of different ways to do stuff like that, but, I mean, I figured that that would probably be a little more accurate, because, like, it's like, no one's gonna, like, there's no pressure to lie about something like that. It's not like, what are you talking about? I didn't take CS50 with you. You know? But, like, if someone aggregates, like, a lot of different connections, then that kind of means something. 
So, I mean, you take someone like Dustin, who's my roommate here, and it's like, OK, well, we lived together in Kirkland House, then we worked on Facebook, then we, like, moved out to Palo Alto, and now, like, we're still working on Facebook, then, like, maybe that's like a, that's, like, enough connections to say, like, OK, well, this person, like, clearly has a lot to do with this person, you know? Whereas if, um, if, like, the only category that you know someone through is, like, this person's my Facebook friend, then that also means something. You know? So, I don't know. We'll see how it works. Nothing's for sure. What's up? Um, it's a combination. So, I mean, I think that, like, another thing that's pretty important for each of these events is the date at which they occur. So, it's like, I mean, if you had, for example, like, a date on each person's friendship with each person, then that would give you a more accurate representation of, like, what that meant, right? Because right now you don't know what friend means to, like, to each of the people on the network. And because you don't know when that friendship was formed, you don't know, like, what has changed in their relationship since that friendship was formed. But, I mean, even if, like, friendship means very little to someone, if you know that, like, that that happened yesterday, that they became friends, then you still, like, know that there's some, that there's some strength. It's like a certainty thing. You know, it's like, it's a lower certainty that their relationship has diverged since that point if the date at which the action occurred was sooner. So, sorry, more recent. So I think that that's one of the things that we're kind of focusing on here, is, like, I took a course, I took CS50 with someone, this term is a lot different than saying, like, I'm a senior now and I took CS50 with this person when I was a freshman. 
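Judging relationships from bi-directional factual statements, each with a date, could be modeled roughly as below. This is a hypothetical sketch: `SharedFact`, `RelationshipRecord`, and the rule that a fact counts only once both people have asserted it are assumptions made for illustration. It deliberately computes no closeness score, since the talk says people, not Facebook, would interpret what the facts mean; it only lists the mutually confirmed facts, most recent first, so their dates are visible.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class SharedFact:
    description: str                                     # e.g. "took CS50 together"
    occurred: date                                       # when the shared context happened
    confirmed_by: set[str] = field(default_factory=set)  # who has asserted it so far


class RelationshipRecord:
    """Facts shared by two people (hypothetical model)."""

    def __init__(self, person_a: str, person_b: str):
        self.people = {person_a, person_b}
        self.facts: list[SharedFact] = []

    def assert_fact(self, by: str, description: str, occurred: date) -> SharedFact:
        """Record that `by` claims this fact; reuse the entry if the other person already claimed it."""
        if by not in self.people:
            raise ValueError(f"{by} is not part of this relationship")
        for fact in self.facts:
            if fact.description == description and fact.occurred == occurred:
                fact.confirmed_by.add(by)
                return fact
        fact = SharedFact(description, occurred, {by})
        self.facts.append(fact)
        return fact

    def confirmed_facts(self) -> list[SharedFact]:
        """Facts both people have asserted, most recent first."""
        mutual = [f for f in self.facts if f.confirmed_by == self.people]
        return sorted(mutual, key=lambda f: f.occurred, reverse=True)
```

A one-sided claim simply stays unconfirmed, which matches the point that there is no pressure to lie about facts like these: the other person has to agree before the fact shows up.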
And, I mean, a lot of these, like, the analysis of how, like, people look at this and see the relationships isn't necessarily, like, Facebook isn't going to rate the relationship. It's sort of, people have an implicit understanding of what the difference is between having taken CS50 with someone this term and having taken CS50 with them three years ago. So, I mean, I think that that'll kind of help out. What's that? Not too. Because I think that a lot of this stuff, we sort of have a very unique platform for building it. Like, I don't think that there's any other company or, like, group of people in the world who could develop this right now. I mean, even Google, where there are, like, 5,000 engineers, is not in a place to, like, make an application that sort of characterizes people's relationships like this. And it's, like, the same thing with the photo tagging. It's, like, we could do that because, I mean, photo tagging only works if everyone around you is on the site, right? Because, I mean, otherwise you're going to get a type of use where it's, like, you go and you upload a photo and you go to tag a bunch of people and they're not there and that sucks, right? So, like, even if 50% of the people at Harvard were on Facebook, then the tagging and the way that we set it up would still suck. So, like, it only works because 97% of the people at Harvard are on Facebook or whatever. So, like, because of that, it's, like, not that big of a concern, you know? Yeah? ... I think that a lot of people, like, I mean, the people who work at Facebook really like working at Facebook, I think, for the most part, and spend a lot of their time doing that. And, like, a lot of the time that they're spending, they spend, like, working on stuff that might be sort of, like, strategically important to, like, what we're trying to do at that point. 
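A rough way to see why tagging depends on near-complete coverage: assume, purely for illustration, that each of the k people in a photo is on the site independently with probability p. Then the chance that everyone in the photo can be tagged is p^k. With k = 4 (an arbitrary example), the two coverage levels mentioned above give:

```latex
P(\text{all } k \text{ people taggable}) = p^{k}, \qquad
0.50^{4} = 0.0625, \qquad
0.97^{4} \approx 0.885
```

So at 50% coverage only about one four-person photo in sixteen could be fully tagged, versus close to nine in ten at 97%.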
But also, like, a lot of people just mess around with the code base and, like, kind of, like, put if statements in there that's, like, if the user is me, then, like, put this in there, you know? And, like, and then, I mean, so, like, I walk around to different people's places during the day, or, like, people come and talk to me. I hold CEO office hours, as a joke, like, from 2 to 4 every day. Not today. But, um, and, like, people just come and, like, show me different stuff that they're doing. And, I mean, a lot of it is, like, relatively cool. And, I mean, stuff that I, like, wouldn't have necessarily thought of. So, I mean, I guess, like, you asked before if we were saving, if we were archiving old profile information. And one of the reasons why I said that we might start doing it is because one of the guys at the company came up with something where it's, like, so you go to your friends page, and it shows your recently updated friends. And then you click on that, and it shows their new profile, but there's no indication of what changed, you know? So, um, so one of the guys made something that keeps an old version of his profile and then makes it so that when you go to his profile, when he updates it, it highlights in yellow, like, the parts of it that were changed. And, like, I think that is pretty cool. You know, and it's not, like, a huge project. I mean, it actually kind of is if we have to start storing everyone's information. But, like, but, I mean, it's, it's somewhat cool. You know, it's not, like, the type of thing that, like, that you necessarily are bound to come up with. But I definitely think it's, like, a pretty big improvement over what we have now. You know, it's, like, now it's, like, really hard to, like, go to someone's profile and tell what changed. So. And that's just the most recent example that I have. Um. So. I don't want to do that. And the reason is because I think that Facebook is a directory and the primary purpose is to look up someone. 
Right? Like, type in their name and get some information about them. And, like, one of the things that's really useful is that everyone's page is structured in the same way. So if you want to see if someone's single, you don't have to, like, scan down the columns until you get to relationship status. You just know where that is. You know, and, like, so you, like, click, go to, like, your eyes just, like, go to that thing. But, like, if you had, like, different people changing their CSSs in different ways, then, like, that could become annoying. Especially if people were doing stuff like dark blue text on black backgrounds. And it just gets, like, kind of obnoxious. Yeah. So how successful or how effective was Facebook for you? Um, and, um, what did you feel was the main purpose of Facebook? Um, the purpose for me of the high school one was the same. I think that, like, I think that the application, I mean, this is going to probably sound pretty stupid, but, like, wanting to look people up, I think, is, like, kind of a core human desire. Right? It's, like, I think that, like, people just want to know stuff about other people. So I think that, um, providing an interface where people can just type in someone's name and, like, get some information about them is generally a pretty useful thing. So, I mean, growth has been pretty good. It was tough to figure out exactly how to gauge it because we, like, when we did college, we opened it up at Harvard. Then we opened it up at, like, a couple of colleges around Harvard. And the idea was always we're really short on money and equipment. So while getting as little equipment as possible, we want to maximize our growth. So we want to launch at the schools that we think are going to grow the quickest based on the fact that the people at those schools are going to have the largest number of friends at the schools that we're already at. 
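The launch strategy described above, opening next at the schools whose students have the most friends at schools already on the site, can be sketched as a greedy ordering. The talk does not say how those friendships were estimated before launch, so the `ties` input (estimated friendship counts between pairs of schools) is hypothetical, as are the function name and the `budget` parameter. Scores are recomputed after each pick, on the assumption that every newly opened school adds its own ties.

```python
def plan_launch_order(launched, candidates, ties, budget):
    """Greedy launch order (hypothetical sketch).

    launched:   schools already on the site
    candidates: schools not yet launched
    ties:       dict mapping frozenset({school_a, school_b}) -> estimated friendships
    budget:     how many schools we can afford to open
    """
    live = set(launched)
    remaining = set(candidates) - live
    order = []

    def score(school):
        # Total estimated friendships between this school and every live school.
        return sum(ties.get(frozenset((school, other)), 0) for other in live)

    while remaining and len(order) < budget:
        # sorted() makes equal scores break alphabetically, so output is deterministic.
        best = max(sorted(remaining), key=score)
        order.append(best)
        live.add(best)  # the new school's ties now count for later picks
        remaining.remove(best)
    return order
```

Recomputing after each launch means a school with few ties to the original campus can still move up once a neighboring school is live.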
We took a different approach for high school because we could just launch it everywhere at the same time. So we didn't really know how it was going to grow. I think it's growing at, like, more than 5,000 people a day, which is pretty good. Yeah. When you started Facebook, did you intend for it to become a full-fledged community? No. Well, how did you do it? I mean, I remember, like, thinking that it would be cool if you could have a directory of everyone. I remember, like, arguing with my parents about this because after I almost got kicked out of school for, like, this project that I did before Facebook, um, like, they were like, what good could possibly come of, like, doing something new? And I'm like, no, this is pretty cool. I'm like, just, like, imagine how cool it would be if, like, you could just, like, type in someone's name and get some information about them. And they were like, I just, I don't see it. And I'm like, well, like, we'll just do it at Harvard for now, but, like, imagine what happens if, like, one day you can just, like, type in anyone's name and get some information about them. And, like, that'd be kind of cool, right? So, um, they didn't buy it, but now they do. Um, yeah. Um, so I don't know. I guess, like, at each phase, we're just kind of looking at, like, a natural way to preserve the integrity of the network and also to make it so that it's more useful. I guess is, like, the answer to that question. Mm-hmm. Yeah. Um, I just suggest that you take the hardest courses that you can, because you learn the most when you challenge yourself, right? So, like, 161 just, like, ruined my life, and I learned so much from it. Um, 121 I also found pretty hard. Um, 124 kind of changed the way I thought about stuff. Like, what 124 taught me that I think was really useful was that there are, like, I mean, I think a lot of people focus on how to do stuff as well as possible, and, like, kind of, like, how to make, like, the most efficient algorithm. 
But, like, what has always gotten us by isn't doing it, isn't, like, doing stuff in the most efficient way, but laying the framework, and, like, in a pretty efficient way. So, I mean, it kind of teaches you both sides of the problem, like data structures and algorithms, and, like, how the setup is really important. And, I mean, that's definitely, like, saved our ass in scaling a lot of times. Um, I don't know. Work with smart people. Learn from people. Yeah. Um, people can make whatever they want, but that doesn't mean that they can put it on the site. So, I mean, like, um, I think that, like, before stuff goes on the site, a lot of people see it, and, like, I mean, I definitely, like, check off on it before it can go live. But, I mean, I think that people have a lot of creativity to do cool stuff, and a lot of the times, like, it's, like, you, someone can come up with a cool idea, but, like, that doesn't mean it's the final way that it would happen. You know, so, like, so, for example, people putting, like, highlighting in yellow, like, what the changes are in their profile. I think that just the concept of highlighting stuff that has changed is really good, but the interface that that guy used for it isn't what I think is the best one, and the way that he's storing the old profile information isn't optimal either, and, I mean, that kind of is cool, because he was just doing it for himself. But, like, but, I mean, if we were ever going to make something live out of that, which, I mean, I'd want to, we'd do it in a different way. So, and it's more just like a mock-up. So, like, do you have to do something to ground out the two guys, like, top-down or bottom-up? I mean, it goes both ways. I'm, like, I'm not completely unopinionated. So, going back to the medical part, is this a different platform? Yeah. 
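The profile-change highlighting mentioned above, keeping an old snapshot and marking in yellow the parts that changed, might look something like this. It is a hypothetical sketch, not the engineer's prototype or anything Facebook shipped: profiles are assumed to be flat dicts of field name to value, the function names are invented, and fields removed in the new version are simply not rendered. As the talk notes, doing this for everyone means storing a previous snapshot of every profile.

```python
import html


def changed_fields(old_profile: dict, new_profile: dict) -> set:
    """Field names that were added, removed, or edited between two snapshots."""
    keys = old_profile.keys() | new_profile.keys()
    return {k for k in keys if old_profile.get(k) != new_profile.get(k)}


def render_with_highlights(old_profile: dict, new_profile: dict) -> str:
    """Render the new profile as HTML, highlighting changed fields in yellow."""
    changed = changed_fields(old_profile, new_profile)
    rows = []
    for key, value in new_profile.items():
        # Escape user-supplied text before embedding it in HTML.
        text = f"{html.escape(key)}: {html.escape(str(value))}"
        if key in changed:
            text = f'<span style="background: yellow">{text}</span>'
        rows.append(f"<div>{text}</div>")
    return "\n".join(rows)
```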
So, college students are of age, and a lot of folks aren't sure who they want, and how they're going to do themselves, and that's not what we're talking about. I mean, they're not the same age, you know, younger, and stuff like that. But, could you sort of implicate their 15, 16, younger, and, I mean, what do you guys think? So, I mean, a lot of the solutions that we come up with aren't technical or organizational, but just applying social pressure in good ways. So, I mean, MySpace has almost a third of their staff monitoring the pictures that get uploaded for pornography. We hardly ever have any pornography uploaded. And, like, I think that a lot of the reason is that people use their real names on Facebook. And your real email address for school. And if you have that, then you're not going to upload pornography. And I think that's a really simple social solution to a possibly complex technical issue. So, I mean, that said, we changed some of the features around for high school. So, for example, we took parties out because we figured that, like, parents would get pissed off, or it would just break up all the keg parties really quickly, and that would suck for everyone. I don't know. We de-emphasize contact information in high school. Yeah. Well, we'll end it here. If you have other questions, feel free to come down. Thank you very much. Yeah.
