Should services charge “super users”?

Om Malik says that Twitter should charge super users like me and come up with a business model.

Dare Obasanjo, in a separate but similar post, comes to the conclusion that Twitter’s problems are due to super users like me.

Interesting that both of these guys are wrong.

First of all, Twitter doesn’t store my Tweets 25,000 times. It stores them once and then it remixes them. This is like saying that Exchange stores each email once for each user. That’s totally not true and shows a lack of understanding of how these things work internally.

Second of all, why can FriendFeed keep up with the ever-increasing load? I have 10,945 friends on FriendFeed (all added in the past three months, which is MUCH faster growth than Twitter had) and it’s staying up just fine.

But to the point: why not charge super users? I’d pay. But if Dare and Om are right, there’s no way I could pay enough to cover my real cost on the service.

Either way, Twitter’s woes were happening long before my account got super huge. Remember SXSW last year? I only had 500 followers and Leo Laporte had something like 800. The service still went down. If this were a straight “n-scale” problem the crashing problems wouldn’t have shown up so early.

Why not just limit account size, like Facebook did? Well, that’s one way to deal with the problem, but if you look at my usage of Facebook, it’s gone down to only a few minutes every month. I don’t even answer messages there anymore. Why? Because I get frustrated at getting messages from people who wonder why I won’t accept them as a friend. It’s no business “utility” if I can’t make infinitely large friend lists and use those lists the same way I use email (which Facebook also bans).

So, what do I do? I get excited by FriendFeed, which lets 11,000 people interact with me in a public way. I have a feeling that this rapid growth will continue unabated, and so far FriendFeed has stayed “Google fast.”

Nice try, though.

139 thoughts on “Should services charge “super users”?”

  1. Actually Dare’s post sums it up. It doesn’t mean that Twitter’s architecture is the one he suggests, but given their database problems it’s very likely.

    Every time you update, Twitter has to get a list of your 25k followers, sort out any @ replies, find out what their notification settings are, notify each and every one individually and add a message to their feed (even if it’s still the same one). All this while their feeds are being hit like crazy by desktop clients.

    So, Twitter is a notification system with multiple entry and exit points. FriendFeed is an aggregator. It doesn’t, as far as I know, notify anyone.

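    To make that concrete, here is a rough sketch of the write-time work being described (illustrative Python only – the names, fields and helper functions are invented, not Twitter’s actual code):

        from dataclasses import dataclass

        @dataclass
        class Follower:
            name: str
            wants_im: bool = False     # per-follower notification settings
            wants_sms: bool = False

        def deliver(author, text, followers):
            """Fan out one update: filter by settings, then notify each follower."""
            for f in followers:
                # simplified @-reply rule: skip replies not aimed at this follower
                if text.startswith("@") and not text.startswith("@" + f.name):
                    continue
                append_to_feed(f, author, text)    # add to the follower's feed
                if f.wants_im:
                    send_im(f, text)
                if f.wants_sms:
                    send_sms(f, text)

        # stand-ins for the real delivery back ends
        def append_to_feed(f, author, text): print(f"feed[{f.name}] += {author}: {text}")
        def send_im(f, text): print(f"IM -> {f.name}: {text}")
        def send_sms(f, text): print(f"SMS -> {f.name}: {text}")

        deliver("scobleizer", "hello world", [Follower("jane", wants_im=True)])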

  2. As far as I know this is exactly how Twitter does it. Once you post a Message it gets copied to the streams of all your followers. The problem is that building up the last messages of the people you follow based on their user_id is just not working fast enough. Having a copy of your message is easier and faster to load. So this is exactly how it works, but I am not sure why this means that you need to start paying 🙂

  3. Could you elaborate on “remixes them”? Because so far Dare Obasanjo’s thoughts sound much more plausible.

  4. Tiago: FriendFeed certainly notifies people. It even has an API where you can have messages sent in.

    As to architecture. OK, let’s have one object:

    Scoble’s Tweets.

    Then let’s have another object:

    Jane Smith’s Tweets.

    Now let’s have a third object:

    John Schmidt’s Tweet page that displays both Jane’s and Scoble’s Tweets.

    Sounds like Scoble’s and Jane’s Tweets are being copied, right?

    No.

    In fact, if John Schmidt never uses his account, nothing happens at all.

    But, let’s say that John Schmidt opened his Web browser and visited Twitter. Well, ONLY THEN does John Schmidt’s object (which knows which Tweets it should go look for) talk to the other two objects, and say “give me your Tweets.” Then John’s object mashes them together and displays them to John. It also, then, closes down and releases all memory and disk space until the next time John asks for something.

    This does not change if there are a million “objects” being mashed up. No copies are living permanently. Just the original objects.

    Got it yet? I’ll do a video, if you want to understand it more.

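    A toy version of what I’m describing – nothing gets copied until a reader actually shows up and asks (illustrative Python; the data structures are invented for the example, this is not Twitter’s code):

        # every tweet is stored exactly once, keyed by its author
        tweets = {
            "scobleizer": [(1, "testing"), (5, "at the Ritz")],
            "janesmith":  [(3, "coffee time")],
        }
        following = {"johnschmidt": ["scobleizer", "janesmith"]}

        def timeline(reader, limit=20):
            """Build the page on demand: nothing is copied until the reader asks."""
            merged = [(ts, author, text)
                      for author in following[reader]
                      for ts, text in tweets[author]]
            merged.sort(reverse=True)            # newest first
            return merged[:limit]

        print(timeline("johnschmidt"))
        # [(5, 'scobleizer', 'at the Ritz'), (3, 'janesmith', 'coffee time'), (1, 'scobleizer', 'testing')]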

  5. Robert, Duncan Riley referred me to a plugin for WordPress blogs that automatically adds FriendFeed comments to the originating post on your blog + lets people comment on FriendFeed items from the blog as well. You can see it in action on his Inquisitr.com site. Here’s the link to that plugin if you’re interested: http://tinyurl.com/2uqa6l

  6. Translation: the only scaling problem would be when I started up my Twitter and wanted to see all objects from everyone. Then my object would have to work harder than, say, your object because your object would only have to find a few Tweets. Mine has to find 23,000. OK, so they have to throw a little extra processor at my account, but only when I’m using the system. If, like right now, I’m not using the system, it puts absolutely no extra load on the system unless someone calls my object and makes it do work.

    How do I know this? Ask the Exchange team how it keeps stuff from duplicating all over the place and filling up server disks.

  7. >>Once you post a Message it gets copied to the streams of all your followers.

    Absolutely wrong.

    It only gets copied if a user instantiates his object and asks for those things. Even then, it’s not “copied” except to display it, and that copy is temporary and stored in your browser or in your Google Talk account.

  8. “If this were a straight “n-scale” problem the crashing problems wouldn’t have shown up so early.”

    Why not? As they scale up their system, the number of users is growing just as fast. If they scale just quickly enough to stay one step behind the problem, they will continue to have issues.

    I don’t blame them – it’s a difficult problem and not many sites have to cope with such massive growth so quickly.

  9. So it’s no secret that Twitter wasn’t created with scalability in mind – like 90% of all “2.0” projects. After all, Twitter was born and it stayed completely in the dark for over 8 months until it exploded at SXSW ’07. I don’t think it went down during those first 8 months (and if it did, not many people noticed anyway).

    And ever since the first time it went down, chances are they’ve been patching and optimizing things here and there, when perhaps what Twitter needs is a complete remake – which shouldn’t really be THAT hard considering Twitter is, above all, a very simple application – that thing doesn’t put a spacecraft on Mars – so the main focus should be scalability. Perhaps they’re doing that already. If not, they should.

    On the other hand, FF most likely has been created with scalability in mind, and so far, other than throwing hardware at it, as long as they’re somewhat ahead of the growth game, it doesn’t need anything to stay afloat as it grows. It’s not rocket science either – they simply didn’t (supposedly) ignore the possibility of growth when they started to write their software. Which is what everyone should do when starting a project, and there’s plenty of documentation out there and plenty of great engineers who know how to architect a simple (or complex) app so that it will scale if necessary.

    Leaving that aside, the business model is a very interesting and fair question. No, I don’t agree with Om. Not because I think super-users shouldn’t be charged, but because charging super-users doesn’t fix anything, scalability-wise. I also don’t think Om understands how Twitter works internally. OK, *I* don’t know how Twitter works, but if it works the way Om describes it, then the folks at Twitter absolutely, definitely need to rewrite the whole thing from scratch. Personally I didn’t like either Obasanjo’s or Om’s articles at all. You? Well, you’re talking about Twitter and FriendFeed, and a bit of Facebook. Thank god for that “This is why I love the tech industry” article, because it is for posts like that that I’m still reading you. (No offense, I just don’t use either Tw or FF, so this fun madness you guys have is completely off my radar…)

  10. Michael Foord: nah, the problems would become much, much worse as they scaled up. The architecture they chose isn’t too far off. It’s just that they never did engineer it properly. The fact that just this week they’ve gotten the ability to turn off features one by one shows me that they never were run professionally until recently. I bet that Twitter starts getting stable very quickly now. Remember, there are only a million or two users on Twitter. Facebook keeps up with 80 million. Hotmail, 200 million every 30 days. Facebook and Hotmail don’t go down, even though they are doing stuff more complex and at a larger scale than Twitter is.

  11. “This is like saying that Exchange stores each email once for each user. That’s totally not true”

    Sweet how you never had to work with an Exchange server which did exactly that, and then added ‘All’ as a recipient to the address book of every user.

    I’ll grant it doesn’t do it now. But it sure as hell used to.

  12. >I’ll grant it doesn’t do it now. But it sure as hell used to.

    I know it did, which is why some people still don’t understand the architecture that Exchange uses (and why I was “educated” on the issue).

    By the way, this caused a famous and massive problem inside Microsoft: the database server filled up when someone accidentally emailed something to “all.” Email went down for two days, the way I heard it.

  13. As far as I’m aware, Twitter is the only service that allows posting and receiving by SMS. The big problem with SMS is that it is an untimed service: when I text, there is no guarantee when, or if, it will be delivered. This must be a problem for them.

    Robert, if you remember, in the bad old days 🙂 when Blogger was crashing all the time, they offered a Pro service where you paid in the hope of some reliability – fortunately Google bought them and over a period of a year or two sorted out the problems. I hope that Google do the same with Twitter 🙂

  14. No, services shouldn’t charge “super users.” (I’d be surprised if “super users” don’t start receiving significant sponsorships to come and use a service).

    As far as the workflow for Twitter vs. the workflow of FriendFeed, it’s impossibly unfair to compare Twitter to FriendFeed (yet). Twitter is pushing updates the moment you send an update. FriendFeed isn’t doing instant updates via XMPP (Jabber) or SMS.

    Additionally, Twitter is at the “oh wow, if I follow 10,000 people I’ll probably have 1,000 follow me back and I can spam them” stage. This is making a large number of “super users”, not just you Robert 🙂 They’re getting hammered in traffic compared to FriendFeed.

    Let’s compare the numbers in terms of service reliability and overall load (rounded down)… You’ve got 10,000 followers on FriendFeed and 20,000 on Twitter. If this is a true representation of the population on each service (it’s not, but we’ll pretend), this means Twitter has double the traffic of users. Double the traffic, in a push based service, does not mean double the load… There are double the updates to double the followers.

    A semi-decent formula for load based on the above:
    Twitter != FriendFeed x 2
    Twitter = FriendFeed ^ 2

  15. Geoff: Google have bought Jaiku and so are unlikely to buy Twitter. 🙂

    I wonder how much the outages are driving people into Pownce and Jaiku. I know of at least one of my ‘Twitter friends’ who is going *back* to Jaiku because of the service problems.

  16. Hi Robert. Do you remember that Twitter was born for another goal? Do you remember the name Twttr? Originally it was “only” a group SMS app. The team is the same as Odeo’s. True?

    So, Twitter is a sort of messaging system, like IM but in a public way (but you can also set a protected status – why are you frustrated?), and as the team writes, “Twitter was not architected as a messaging system”:

    http://dev.twitter.com/2008/05/twittering-about-architecture.html

  17. There are two basic ways to build a Twitter-like solution. Either you have, (A) per tweet, a single write and, per user, huge joined reads; or, (B) per tweet, huge numbers of writes and, per user, a single cheap read.

    With Twitter, reading generally happens more often than writing, especially when you have desktop clients built around polling. That implies going with solution (B), which has some big problems – most databases aren’t set up to deal efficiently with lots of writes.

    So, you can try to work it with solution (A), but then you need lots of muscle for all these joined queries. If you’re using database sharding, you’ll probably need to issue queries to multiple databases running on multiple machines, and join all the results and sort them by time, per each user page refresh or desktop client poll. That’s a lot of work per user.

    It sounds pretty expensive – better cache it. Leads to a hybrid solution; single write, rare combination reads but not too often (i.e. not every poll or page refresh). Some risk of stale updates.

    No matter which way you look at it, though, the scaling isn’t quite linear, as some of the old folks will follow new folks as they get added. It should ultimately end up as linear, though with a high constant factor, that constant determined by the average “noise threshold” per user.

    Looking at the pure “unit of work”, lots of writes probably beats lots of reads, because the reading solution requires sorting and, with the addition of caching layers, has cache coherency problems. Writing can be based around appending to queues.

    Also, all the “extra” features that Twitter-folks (in their blogs at least) seem to think are so essential, are quite costly to implement.

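    A back-of-the-envelope way to compare the two approaches (illustrative Python; the per-day numbers are invented placeholders, not measurements):

        def cost_read_time_join(followees, reads_per_day, tweets_per_day):
            # (A) one write per tweet, but every read joins and sorts all followees
            writes = tweets_per_day
            reads = reads_per_day * followees      # rows touched per page build
            return writes, reads

        def cost_write_time_fanout(followers, reads_per_day, tweets_per_day):
            # (B) every tweet is appended to each follower's queue; reads are cheap
            writes = tweets_per_day * followers
            reads = reads_per_day                  # one queue scan per page
            return writes, reads

        # a "super user": ~25k followers/followees, 50 tweets and 200 page views a day
        print(cost_read_time_join(25_000, 200, 50))     # (50, 5000000)
        print(cost_write_time_fanout(25_000, 200, 50))  # (1250000, 200)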

  18. “First of all, Twitter doesn’t store my Tweets 25,000 times. It stores them once and then it remixes them. This is like saying that Exchange stores each email once for each user. That’s totally not true and shows a lack of understanding of how these things work internally.”

    Robert, as was already pointed out this was once true for Exchange, but regardless I fail to see how you can make this same assumption for Twitter.

    Regardless of how many times it’s stored, Twitter also has a tougher routing problem. With Exchange, the sender defines where the message will be received. Twitter is fundamentally different – the sender broadcasts the message, and then the system needs to figure out where to deliver it. That means delivering to some subset of your 25,000 followers – remember, it still has to figure out if I will receive the message based on whether it’s an @reply and what my settings are.

    Twitter also has to deliver it to the countless number of tracks. Let’s assume that the average word length for English is 5.10 (http://blogamundo.net/lab/wordlengths/). On twitter, it’s likely less given the 140 char limitation, we tend to use more abbreviations and generally shorter words. Taking out, let’s say, 30 chars for punctuation – that means there are 20 distinct words. Twitter in turn needs to figure out who is tracking what, and the track functionality supports tracking word1+word2+word3. Obviously there are a number of ways to implement this more efficiently, but in effect Twitter has to do a fair amount of processing to see if a given message should be delivered to a given person’s track queue.

    It’s clear that they have a bottleneck somewhere. Given the roots of the service, it’s pretty clear the architecture didn’t plan for this kind of use – and they admitted it in the link Dario posted. None of us really know what’s going on behind the scenes, but based on what little evidence we have Dare’s scenario seems plausible and perhaps likely.

    Ignoring some of the differences in how the service is used, the other thing that FriendFeed had was the luxury of architecting their system after they saw how Twitter was being used. Twitter likely would have done things differently with the benefit of hindsight, but it sounds like (from interviews with Blaine) that much of their time was spent fighting fires as opposed to re-engineering the system.

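    The core of that track check might look something like this (illustrative Python; I don’t know Twitter’s real implementation, and a proper inverted index would do this far more efficiently):

        import re

        # who is tracking which terms; "word1+word2" means all of the words must appear
        tracks = {
            "alice": ["iphone", "wwdc+keynote"],
            "bob":   ["friendfeed"],
        }

        def recipients_for(message):
            """Return (user, term) pairs whose tracked words all appear in the message."""
            words = set(re.findall(r"[a-z0-9']+", message.lower()))
            hits = []
            for user, terms in tracks.items():
                for term in terms:
                    if all(part in words for part in term.split("+")):
                        hits.append((user, term))
            return hits

        print(recipients_for("Watching the WWDC keynote on my iPhone"))
        # [('alice', 'iphone'), ('alice', 'wwdc+keynote')]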

  19. Robert: Um, I don’t think you understand what Dare was saying. You might wanna calm down a touch. It might be unfair to blame *you* for Twitter’s woes, but Dare’s analysis of the architecture is probably pretty accurate.

    Open up Twitter… now, did you wait several minutes for your page to appear? If not, then something’s being cached on the server side. It could be via memcached, it could be via “baking” your page instead of “frying” it, or whatever. But the data isn’t being collected on the fly as you seem to believe. It’s being pushed into the cache when you’re not around to ensure UI response times remain tolerable.

    Dare’s point was that Twitter was built as a micro-blogging system, and that’s how blogging systems work. You cache the hell outta everything, and you make a choice… make some users wait for extended page renders, or burn cycles in the background to ensure that everyone gets equal treatment.

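    In other words, something along these lines – “fry” the page once, then serve it from cache (illustrative Python, with a plain dict standing in for memcached):

        import time

        cache = {}                 # stand-in for memcached
        TTL = 60                   # seconds a rendered page stays fresh

        def render_timeline(user):
            time.sleep(0.1)        # pretend this is the expensive joins and sorts
            return f"<html>timeline for {user}</html>"

        def get_timeline(user):
            entry = cache.get(user)
            if entry and time.time() - entry[0] < TTL:
                return entry[1]                   # cache hit: cheap
            page = render_timeline(user)          # cache miss: "fry" the page now
            cache[user] = (time.time(), page)
            return page

        get_timeline("scobleizer")     # slow: miss, renders and caches
        get_timeline("scobleizer")     # fast: served straight from the cache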

  20. Robert (Scoble, not the other one) –
    Twitter does store multiple copies of each message; they’ve said so repeatedly in various presentations.

  21. I don’t know if Twitter is using a sharded database yet; at 350,000 users they still only had one database and a read slave:
    http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster
    Dare’s post would make sense if they have now moved to a sharded structure but my best guess is that they haven’t had a chance to do that yet.

    It seems there will be duplication at least in the caching layer (memcached): every time Scoble sends a message, 25,000 per-user caches get invalidated and will need repopulating by new SQL queries.

    Twitter is looking to get rid of the “with others” tab from a user’s page to avoid at least some of this very type of problem; see here:
    http://groups.google.com/group/twitter-development-talk/browse_thread/thread/89a7292e5a9eee6d

    I think charging heavy users is the wrong model.

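    The invalidation step would look roughly like this (illustrative Python; the dict stands in for memcached and the follower lookup is faked):

        cache = {"jane": "<cached timeline>", "bob": "<cached timeline>"}
        followers_of = {"scobleizer": ["jane", "bob"]}   # imagine 25,000 names here

        def store_tweet(author, text):
            print(f"stored once: {author}: {text}")

        def post(author, text):
            store_tweet(author, text)
            # every follower's cached timeline is now stale: drop them all,
            # so each next page view re-runs the SQL and repopulates the cache
            for follower in followers_of[author]:
                cache.pop(follower, None)

        post("scobleizer", "FriendFeed is fast")
        print(cache)    # {} – both followers rebuild on their next request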

  22. Charging is silly.

    – Money won’t help Twitter right now.
    – Charging won’t deter “superusers”.

    They shouldn’t charge; they should ban.

  23. What does Exchange have to do with it? Exchange doesn’t scale (yet) to hundreds of millions of users. Mail systems like Hotmail do in fact keep 25,000 copies of an email if that is how many Hotmail users receive the same message. Doing anything else would literally saturate and melt the network (if the architecture was built around a single instance store). There is a physical limit to how much network traffic you can have going between clusters (or scale units). Ideally you optimize your IO for the access patterns of your application (lots of writes and many more reads of unique data).

    BTW, the issue you are describing with Exchange failing is documented here:

    http://msexchangeteam.com/archive/2004/04/08/109626.aspx

    And it wasn’t a failure for the reasons you describe. It had to do with numerous issues and failures unrelated to db scale.

  24. >Open up Twitter… now, did you wait several minutes for your page to appear?

    Open up FriendFeed. Refresh many times. Did the page change? It certainly wasn’t pre-cached before I hit the servers.

    Computers now are fast, if you have the right architecture.

    How does Google work? It is always fast and doesn’t pre-cache all my pages. To do that it’d have to know what I’m thinking before I actually search for something.

    One thing you haven’t thought about is that even if everything was precached, only a small percentage of my 23,000 followers ever log into Twitter. So, if it’s building a page for each of the 23,000 followers, it’s totally wasting resources.

  25. Now who do I believe when it comes to authority on systems architecture?

    Robert Scoble or Dare Obasanjo?

    lol

  26. Robert,
    It isn’t clear to me why you are taking my post so personally. Regardless of how Twitter is implemented, allowing a user to have 25,000 followers and 25,000 people they are following will cause scale problems. There are different optimizations you could make (Single Instancing is not the panacea you claim; see my post at http://www.25hoursaday.com/weblog/2008/05/26/SomeThoughtsOnSingleInstanceStorageAndTwitter.aspx for more) but it doesn’t change the fact that Twitter has made some bad design and feature decisions.

    As to whether people who generate massive load on the system should be charged…isn’t that a fact of life everywhere else? Internet service providers like Comcast are known to fire customers who use too much bandwidth; in fact, your buddy Dave Winer just blogged about that happening to him. Flickr, Y! Mail and a bunch of other services also charge for “pro” features. Why would Twitter pursuing such a business model be so wrong? Would you prefer to have ads in your Twitter streams?

  27. Master: I wasn’t arguing with Dare so much about the architecture. But it’s totally ridiculous to say that my messages are copied 23,000 times. If that were true, then WordPress.com’s architecture would be going down left and right because this blog would be copied 400,000 times. It is, but in your Web browser, not on the server side.

  28. Omar: one problem – Twitter isn’t like email. First of all, only a small percentage of Twitter users ever sign into Twitter again. Let’s say it’s as high as 50% (I think it’s lower). That would mean 12,500 copies. Then, not every one of those users signs in every day. Let’s say only 50% sign in on a particular day. That’s 6,250 copies. But how does Twitter know which users will sign in? It doesn’t. It needs to create the pages on the fly, not copy everything to 23,000 (er, two million) separate tables in a big database. If it did that, it’d quickly die, and it would be extremely hard to maintain, too.

    Also, many users don’t even use the Web interface. Most of the time I’m looking at messages coming at me in Google Talk. Those are coming one at a time at me. Are you really seriously expecting me to believe that Twitter copies messages 23,000 times before sending them out to me via the XMPP database?

  29. Robert, I know Twitter isn’t email, so why did you bring up Exchange and Bedlam then?

    I’m not sure why you are being so assumptive about their architecture unless someone laid it out to you. Further, some of your statements in defense(?) of what they may or may not be doing don’t even make sense.

    I’m not being assumptive. I haven’t said one way or another what they are doing because I have no idea. I only know of the massive large-scale systems we have at Microsoft and the relative pros and cons of each. I also know each is designed to meet one general architectural need, and generally these things don’t translate well to serve different kinds of IO. So what you might find is that within any large system you have dozens or more subsystems, each specifically designed for one scale problem. Some of those will require creating duplicate copies of the data if read performance is required to make your application scale OR be responsive.

  30. I think Om’s point was more along the lines that instead of simply bleeding cash, they could make a little. I don’t think he was questioning the architecture.

  31. Scoble, I think you are fantastic and your enthusiasm for the industry is amazing, but you really should steer clear of the technical arguments.

    You are trying to argue that Twitter is using a ‘pivot table’ – so you have one table for users, one table for messages, and a third table that describes your friend relationships. When a query comes in to see a particular user’s stream, you think they ‘mix’ this up, so you do a many-to-many lookup: for every user (25k in your case) you then look in every one of those users’ message queues for the most recent messages, then mash them together.

    Now they may have started with an ‘obvious’ schema like this about a year ago, but I can assure you 1000% that this does NOT scale very far and certainly not up to the point they have got. The reason? Because many-to-many lookups in any RDBMS are extremely costly, and secondly it is very hard to scale across hardware when you build like this, because it is almost impossible to shard when the many-to-many means everyone can potentially be joined together.

    The second methodology described, which you laughed at, IS SCALABLE – because you can shard to as many machines as you like. For example, let’s say each shard owns 10,000 users – each message you send just has to send a tiny signal to each shard, and each shard then looks up within its own local database of 10,000 users to see if any of them are following you. It then adds your message to their queues.

    This is a classic normalization vs. de-normalization trade-off – you describe normalization in how you think it works; what I hope (and am sure) they are doing is a variant of de-normalization.

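    Roughly what that looks like in code (illustrative Python; the shard count and data layout are invented for the example):

        N_SHARDS = 4

        # each shard only knows about its own users and their follow lists
        shards = [{"follows": {}, "queues": {}} for _ in range(N_SHARDS)]

        def add_user(name, follows):
            shard = shards[hash(name) % N_SHARDS]
            shard["follows"][name] = set(follows)
            shard["queues"][name] = []

        def post(author, text):
            # one tiny signal to every shard; each shard does only local work
            for shard in shards:
                for user, follows in shard["follows"].items():
                    if author in follows:
                        shard["queues"][user].append((author, text))

        add_user("jane", follows=["scobleizer"])
        add_user("bob", follows=["scobleizer", "janesmith"])
        post("scobleizer", "hello from a shard-friendly world")
        print([s["queues"] for s in shards if s["queues"]])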

  32. I don’t believe this – you’re making statements that are 100% false. Yes, every time you tweet, it’s copied 25,000 times. It has to work that way or it wouldn’t have scaled as far as it has. You’re setting yourself up for massive humiliation when you’re definitively proven wrong.

  33. I agree with all the commenters saying that Twitter must be copying messages to each follower, but I really hope they’re just copying an ID 25k times and the actual text of the message maybe just a few tens of times (i.e. to multiple caches). There’s no way it would perform as well as it does (most of the time) if it were transferring 30 gig of data every time Scoble tweets.

  34. Scratch the 30 gig – I misread Om’s blog. Even so, it would be a huge amount of data being moved about Twitter’s internal network, given that Scoble is just one of quite a few “super users” on there.

  35. I think the charging issue is irrelevant because there are so few of you. If Twitter can’t find a better way to monetize than smacking the <500 Superusers with huge monthly fees, they are eventually going to be toast anyway. The Flickr model seems more realistic – charge heavy users a small annual fee and put them on a more robust platform. Heck, I’ll pay just to keep from having to hear everybody talk so much about Twitter, the challenges of which seem to have gripped the online community in a dangerously obsessive fashion.

  36. Trevor: well, if every Tweet is copied for every one of its potential readers, that totally explains why Twitter has some scale problems. Most Twitter users don’t use the service very often, if at all (I watch).

    In my scenario there ARE copies. Just not automatic ones. Also, Twitter only needs to keep the last 10 Tweets cached on each user’s page, to keep the home page fast. Other pages take forever to load, so I doubt those are cached. Even in the home page scenario my Tweets would only be copied to those users who haven’t had my Tweets replaced by other users (most of the time my Tweets would be pushed lower, so there wouldn’t be 23,000 copies, only maybe 1,000).

    Either way, if I’m to blame for Twitter going down, why isn’t FriendFeed going down? There’s a lot more activity on FriendFeed surrounding my messages (and they aren’t cached in any obvious way) and it’s been down about 1/100th as much as Twitter.

  37. Maybe you’re both right. Maybe it keeps a single copy of each tweet text and copies a tweet ID to each user’s queue. The heavy lifting of building the queue would be done at write time. To build a page it would look up each tweet ID in a user queue using a simple key-value map (which can easily be replicated and scaled).

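    That hybrid makes the read side very cheap – something like this (illustrative Python; the key-value store is just a dict here):

        # the tweet text lives in exactly one place, keyed by id
        tweet_by_id = {
            101: ("scobleizer", "testing"),
            102: ("janesmith", "coffee time"),
        }

        # each user's queue holds only ids, appended at write time (newest last)
        queue = {"johnschmidt": [101, 102]}

        def render(user, limit=20):
            """Cheap read: one queue scan plus key-value lookups, no joins."""
            ids = queue[user][-limit:]
            return [tweet_by_id[i] for i in reversed(ids)]   # newest first

        print(render("johnschmidt"))
        # [('janesmith', 'coffee time'), ('scobleizer', 'testing')]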

  38. First, as someone has already noted, any large e-mail system will indeed duplicate messages being sent to a large number of recipients — typically a single copy per server, with each recipient getting a Unix hard link (or equivalent) to that copy.

    There are several critical differences between Twitter and e-mail, however. The push or notification aspect is one, but message size is a big one. In particular, each of the hard links pointing to a single instance of an e-mail will be bigger than the entire body of a Tweet! Duplicating messages, even in pathological cases like Scoble’s, is trivial: 25,000 copies of a 140 byte message represents a mere 3.5 Mbytes, smaller than a single large e-mail body!

    Similarly, I think you’re overestimating the burden of keeping pre-calculated per-viewer data around: the default view has about 16 messages, each 140 bytes plus a bit of metadata (sender username/icon URL), total perhaps 3.2K. 10,000 users on the server? 32 MB! Trivial. Even ten *million* users on a single node would fit on a PC you can buy online from Dell!

    The best architecture is probably a hybrid: keep the recent message queue in RAM for active users (and update it in real time when those they follow post messages), and build the cache from disk when they log in. Even on a single host, with 15kRPM drives (4ms writes), that’s 100 spindle-seconds; a pair of Apple’s 16-drive arrays and you’re looking at three seconds to process a Scoble-tweet, ignoring both write merging and RAID overhead.

    In reality, of course, you can omit a lot of those write barriers and re-issue the writes from a redo log in the event of a crash, cutting the write load still further. Mirror the writes and distribute reads consistently, and you get failover and gain cache hits to boot (each server only sees half as many active users).

    Or you write it all in Ruby and SQL then throw a kajillion dollars worth of hardware at making it all sort of work most of the time through brute force. Even $15m can only buy you so much brute force, though…


  77. Everyone should read Nick Halstead’s comment.

    Robert, I’m sure from reading these comments that the people talking about the technical problems understand how a the normalized databases they teach you in Computer Science course work. It just that large system can’t use them (flickr for example doesn’t it’s sharded / de-normalized)

    I don’t think is twitter is sharded yet since they weren’t at 350,000 users (http://highscalability.com/scaling-twitter-making-twitter-10000-percent-faster)
    They certainly SHOULD be copying messages around if they are sharded.

    You would think if they could get to 350,000 users on one database they could get to 1 million users by adding some database read-only slave servers.
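
    Roughly what “read-only slaves” plus “copy messages around when sharded” looks like (hypothetical Python; the class, the shard count, and the connection objects are illustrative, not Twitter’s setup):

        import random

        class Cluster:
            """Send writes to the primary, spread reads across read-only replicas."""
            def __init__(self, primary, replicas):
                self.primary = primary
                self.replicas = replicas

            def write(self, sql, *args):
                return self.primary.execute(sql, *args)

            def read(self, sql, *args):
                # any replica can serve a timeline read
                return random.choice(self.replicas).execute(sql, *args)

        def shard_for(user_id, shards):
            # once you shard by user, a tweet has to be copied to every follower's shard,
            # because a single cross-shard join is no longer cheap
            return shards[user_id % len(shards)]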

    Scaling isn’t about saving disk space, CPU cycles, memory – that is being efficient it’s not the same thing. Microsoft might try that with Exchange to reduce their customers hardware costs (not that it works from what I hear)

    Scaling is knowing you can buy a rack of machines of servers and actually make them reduce your load.

    Like

  79. I just wanted to put in another “Everyone should read Nick Halstead’s comment” vote. Honestly, there are many things I admire about Scoble (getting the obligatory compliment before the insult out of the way), but this post is just ridiculous from a technical perspective.

    As far as “charging super users” goes, it isn’t really worth arguing about, because it’s going to be different for every service.

    This is why you need a business model: to determine which ways of making money will be most effective and execute on them. Charging super users will be right in some cases while being wrong in others (depending on how much value the company in question can put into the “charged” scenario).

    Like

  81. The whole debate is stupid! Why does no one seem to get it?

    Could a team of competent software engineers build a system which could handle this many users? Yes!
    Should twitter have a system which can handle this many users? Yes!

    I don’t understand why people are so keen to defend poor service. If it’s broke, then the twitter guys should fix it. That means better code, more servers, what ever it takes.

    If the problem is that they can’t find a way to monetize it, then that’s a different problem, but one where having lots of users should help, not hinder.

    Like

  83. We used to have this thing in the ’90s for people who wanted to broadcast to a large audience — it was called “television”. And we had this other thing you could use to interact with the broadcaster, called “the U.S. Postal Service”. The fun part was when a guy like Dave Letterman would read your letter right there on the air, when he was broadcasting.

    Like

  85. Om’s gotten it totally wrong. Twitter should charge your followers (past perhaps the first 100) for the privilege of getting access to your opinions faster. The website could remain free, but most twitterers I know use desktop clients to keep abreast, and so following someone is the only way to do that.

    Charging 10 cents per month to each follower after the first would reap a far greater income for them, and annoy interesting twitterers less!
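
    Putting rough numbers on that (the thresholds and fee are the suggestion above, taken as illustrative assumptions, not anything Twitter has priced):

        followers = 25_000          # roughly a Scoble-sized audience
        free_followers = 100        # the "first 100 free" threshold suggested above
        fee_per_follower = 0.10     # dollars per follower per month

        monthly = max(followers - free_followers, 0) * fee_per_follower
        print(f"${monthly:,.2f} per month")   # $2,490.00 from one popular account's audience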

    See my blog post for more:
    http://falkayn.blogspot.com/2008/05/oms-got-wrong-business-model-for.html

    Like

  86. Om’s gotten it totally wrong. Twitter should charge your followers (past perhaps the first 100) for the privilege of getting access to your opinions faster. The website could remain free, but most twitterers I know use desktop clients to keep abreast, and so following someone is the only way to do that.

    Charging 10 cents per month to each follower after the first would reap a far greater income for them, and annoy interesting twitterers less!

    See my blog post for more:
    http://falkayn.blogspot.com/2008/05/oms-got-wrong-business-model-for.html

    Like

  87. “Sweet how you never had to work with an Exchange server which did exactly that, and then added ‘All’ as a recipient to the address book of every user.”

    Exchange never ever stored a message per user. If all users are on the same Exchange server and receive a message from someone on the same Exchange server, it is only stored ONCE. That’s been the case since Exchange 4.0. Bedlam had more to do with people hitting Reply All to an alias that had users on different servers. It was the message queue that caused the primary problems during Bedlam.

    In Exchange 2007 there is a de-emphasis on SIS (single-instance storage) – it only applies to attachments. Not sure what the scaling problems are with Twitter, as I have no idea how the system is designed. But it would be safe to figure that whether or not they use SIS is not the source of their instability.

    Now, back to your regularly scheduled debate about a non-scalable, useless communication tool.

    Like

  89. Another aspect of the problem is that so many people use Twitter in near-constant streams throughout multi-hour events so as to give “live” coverage of an event or just their stream of thought (case in point: the owner of this blog). I’m certain that taking all those SMSes in, displaying them, and broadcasting them out to web pages, RSS feeds, and then more phones is taxing, to say the least. Look at the times it goes down and I’m sure you’ll see a major correlation with tech events. I understand that some have found this useful, but I really don’t, and it’s a surefire way to get me to unfollow you; I can guarantee it’s why Twitter’s user base hasn’t diversified beyond “social techies.”

    Solution? Create a new service on top of Twitter for twitter-streams, because obviously people don’t get the idea behind the 140-character limit (by the way, SMS has a 160-character limit), and hold their tweetstorms in a buffer to digest and spew out to followers when the server load can handle it.
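
    A minimal sketch of that buffering idea (illustrative Python; the load threshold and per-tick cap are my assumptions, and “server_load” stands in for whatever health signal the service actually has):

        from collections import deque

        class TweetstormBuffer:
            """Hold a prolific user's updates and release them only when load allows."""
            def __init__(self, max_per_tick=5):
                self.pending = deque()
                self.max_per_tick = max_per_tick

            def submit(self, tweet):
                self.pending.append(tweet)          # accept immediately, deliver later

            def drain(self, server_load):
                # hypothetical rule: only fan out when the servers have headroom
                released = []
                while self.pending and server_load < 0.8 and len(released) < self.max_per_tick:
                    released.append(self.pending.popleft())
                return released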

    Like

  91. With Nick’s explanation on why de-normalization is needed to scale, it is clear one of the complex issues Twitter has to solve. The other one mentioned in this post is the business model, or when to charge.

    Angus has the right idea – charging the followers – although I don’t agree on the analysis. Still using Robert as the super-user, he should not be charged because of his tweets, but for the number of people he is following. Each tweet sent by the friends Robert is following will be copied on his queue (well the tweet ID) and the size and freshness of this queue (visually the ‘With Others’) can be used as the factor to charge.

    Like

  92. With Nick’s explanation on why de-normalization is needed to scale, it is clear one of the complex issues Twitter has to solve. The other one mentioned in this post is the business model, or when to charge.

    Angus has the right idea – charging the followers – although I don’t agree on the analysis. Still using Robert as the super-user, he should not be charged because of his tweets, but for the number of people he is following. Each tweet sent by the friends Robert is following will be copied on his queue (well the tweet ID) and the size and freshness of this queue (visually the ‘With Others’) can be used as the factor to charge.

    Like

  93. Like they say, you get what you pay for. You want better SLAs and uptime? Crack open your wallet.

    Like

  95. Does anyone *actually* know how Twitter does what it does? Given the 140-byte limit, Twitter seems very “doable” with some basic design choices. This is why I think Ruby/Rails may really be a culprit here… it’s too high-level to support some of the things Twitter needs to do.

    Like

  97. Wow. Lots of interesting comments. Then again, lots of BLAH, BLAH, BLAH.

    Here’s a fact that people are overlooking: traffic brings revenue. Let’s say those 25,000 copies get counted. What does that really amount to?

    That means that 25,000 people are looking at what Robert Scoble is saying. If I were someone who wanted to get my product name out, I think I would pay Twitter to keep the service going. Better yet, I might ask and pay Robert to push my wares.

    About a month back I @-replied Robert on something. I believe it was during one of Twitter’s many “problems.” The exchange was short and sweet. However, when I looked at my followers an hour later, the count had jumped by 15 (which it doesn’t normally do).

    I tested the water by @-replying another person. The same thing happened. I gain more followers by replying to high-profile Twitterers.

    Now, apply all that to a marketing model. Communication can mean $$. I guess that’s why Twitter was able to raise 20 million on its own.

    The problem isn’t the Twitterflood. If that were the case, then sites like MySpace and Facebook would be going down on a daily basis. If it DOES work the way Om Malik suggests, then Twitter needs to look at its internal structure, not at Robert Scoble or Leo Laporte.

    Limits and subscription fees are a great way to kill the idea. Some will pay for it, while others will say “see ya.” Twitter will fall like a ball of flame into the Pacific Ocean.

    They keep the idea fresh. To most, Twitter is an “oh, I heard of that.” People might know about it but never signed up. Once more, Twitter can easily become a cash cow. The data that comes into Twitter is like when Daffy Duck found the Sultan’s cave.

    I’M RICH! I’M WEALTHY BEYOND MY WILDEST DREAMS!!!…

    Keep going, Scoble. I’m listening…

    Like

  99. I don’t see why not to charge the supertweeters, they pay whoever hosts their blogs, don’t they? They’re using Twitter as their personal blogging platform, no difference.

    Like

  100. I don’t see why not to charge the supertweeters, they pay whoever hosts their blogs, don’t they? They’re using Twitter as their personal blogging platform, no difference.

    Like

  101. Does it really cost that much to host someone’s Tweets, even if they are quite frequent? If so, that seems like an inefficient model. I feel like most sites practice some form of economies of scale whereby frequent users actually become more affordable (and therefore receive price breaks)… so, for example, Twitter might charge 2 cents for every 10 posts to relatively infrequent users like myself, and 2 cents for every 100 posts to frequent updaters. Keeping costs low for busy Twitterers would also increase business for them, since people are more likely to check your site frequently when it is frequently updated. If Twitter has difficulties understanding this, I can recommend a Basic Economics textbook (ebook version).
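
    Those rates as a tiny function (the 1,000-post cutoff for “frequent” is my own illustrative assumption, just to make the tiers concrete):

        def monthly_bill(posts, frequent_cutoff=1_000):
            # infrequent users: 2 cents per 10 posts; frequent users: 2 cents per 100 posts
            if posts < frequent_cutoff:
                return posts / 10 * 0.02
            return posts / 100 * 0.02

        print(monthly_bill(200))    # casual user: $0.40
        print(monthly_bill(5_000))  # super user:  $1.00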

    Like

  103. “First of all, Twitter doesn’t store my Tweets 25,000 times. It stores them once and then it remixes them.”

    I don’t know how Twitter is built. However, “back in the day,” I was the development manager for a real-time stock-quote delivery system, so I do have some experience with the architectural issues Twitter may be facing.

    Let’s look at the procedure Robert refers to as “remixes them.” In the simplest architecture, there would be a single list (database, flat file, etc.) of all the tweets created by everyone, stored in chronological order. You may, as a storage optimization, just store a user ID with the tweet string and a time stamp (aka a tweet).

    In this single-store architecture, a “remix” would require a query across all the tweets in a given period of time for all the people a user follows. This query would be fairly fast when the number of tweets in that period is fairly small and the number of users a person follows is fairly small. You can see that this type of query becomes more expensive as the number of users you follow increases and the overall number of tweets per period increases.
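
    A sketch of that single-list “remix” (illustrative Python; a SQL table scan would behave the same way):

        # one global, chronologically ordered list of (timestamp, user_id, text)
        all_tweets = []           # grows without bound
        following = {}            # user_id -> set of followed user_ids

        def remix(user_id, since):
            # scans every tweet in the window on every page view:
            # cost grows with total tweet volume and with how many people you follow
            return [t for t in all_tweets
                    if t[0] >= since and t[1] in following[user_id]]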

    So to speed up this query, you could build some kind of index based on users. But maintaining this index would become expensive, especially during high incoming tweet periods.

    So one might try to optimize this architecture by breaking up the universal store into a list of tweets per person. Now each incoming tweet can easily be added to that user’s tweet list.

    Then the “remix” of tweets from the people you follow would require a join across each of those lists, sorted in chronological order. This becomes increasingly expensive as a user increases the number of people they follow, and it would be particularly expensive for super users who follow lots of users.

    A reasonable compromise might be to keep a single universal stream of tweets in chronological order and two lists for each user: a list of pointers to all their tweets, and a list of pointers to all the tweets from the people they follow.

    Maintaining these three lists would look something like this: the sender publishes a tweet, it is added to the universal store, a pointer is added to the sender’s tweet list, and then the tweet is “pushed to followers” by walking the sender’s list of followers and adding a pointer to the tweet to each “follow” list.

    This approach scales fairly well. It allows the act of updating the follow lists to be partitioned across multiple servers. Each server can just take a tweet from the universal store (using shared queues) and “fan it out” to the appropriate followers. It also separates that work from the inbound tweet processing.
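
    Roughly, the maintenance step described above (illustrative Python; a real system would put these lists behind queues on separate servers, and the names are made up):

        import time
        from collections import defaultdict

        universal_store = []                    # every tweet, in arrival order
        my_tweets = defaultdict(list)           # user_id -> indexes into universal_store
        follow_feed = defaultdict(list)         # user_id -> indexes of tweets they follow
        followers = defaultdict(set)            # user_id -> follower ids

        def publish(sender_id, text):
            universal_store.append((time.time(), sender_id, text))
            idx = len(universal_store) - 1             # a pointer, not a copy of the text
            my_tweets[sender_id].append(idx)
            for f in followers[sender_id]:             # the "fan it out" walk
                follow_feed[f].append(idx)             # could run asynchronously elsewhere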

    To optimize the “fan it out” process, you could use a publish-and-subscribe messaging product like JMS or TIBCO Rendezvous to broadcast the tweets to the servers that manage follow lists. This would require the universal store process to publish all tweets, and a cloud of follow-list managers listening (aka subscribing) to the tweet broadcasts for each followed person.

    This approach also nicely addresses Twitter’s need to maintain separate outbound follower queues for users that have requested point-to-point delivery of messages via Instant Messaging and SMS.

    For further scaling optimization, you can have several tweet stores instead of one single universal store. You just need to ensure that all incoming tweets from a particular user are added to the same store, to maintain ordered delivery to followers.
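
    The “same store per sender” rule is easy to sketch (the store count and hash choice are arbitrary illustrations):

        import hashlib

        NUM_STORES = 4  # illustrative shard count

        def store_for(sender_id):
            # hash the sender so all of one user's tweets land in the same store,
            # preserving per-sender ordering for their followers
            digest = hashlib.md5(str(sender_id).encode()).hexdigest()
            return int(digest, 16) % NUM_STORES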

    So it is quite reasonable to copy (at least references to) each of Robert’s tweets 25,000 times; just do so in a scalable manner.

    Like

  105. Life is a very short journey, and we are not here to waste it using Twitter, Facebook, the iPhone, Flickr, etc., and then simply die. I just don’t get how we benefit the community and other people by using these things. What is the meaning of these things in life? People are dying all over the world, and we are using Twitter to stay in touch with friends.

    Like

  107. Mr. Scoble,

    I must say that I have not really cared enough to read your column in the past. After reading this post, however, I will make sure never to read anything else you choose to write. What I see is a person who is clearly ignorant about a complex set of topics related to application design and scalability, speaking sophomorically about them. Perhaps you should take some time (in your case, a great deal of it) and educate yourself about these matters before speaking great volumes of nonsense relating to the technical implementation of this or any other application.

    Start by reading the comments here and questioning some of the very smart people, who have graciously taken the time to try and educate you. Please, for the sake of the thousands of people that clearly believe you to be an authority on matters of technology, stop this idiocy.

    Like

Comments are closed.