The race to time-based and blog search

OK, MSN is late to the party again. Hey, Bill Gates and Steve Ballmer, how’s that acquisition I recommended coming? That’d get us into the blog search market right now. Among other things. But, since I haven’t been given a check to go company shopping yet, let’s talk about what the state of time-based search is.

In a phrase: it sucks.

No one is doing it well.

I can just hear everyone saying “huh? I thought Feedster, Technorati, IceRocket, Bloglines, and Pubsub, among others, are doing time-based search?”

Yes, but they all are unsatisfactory. Why? Well, for one, they’ll never have the traffic of MSN Search, Yahoo, or Google. Most of the “normal” people around me never will use a search engine other than these three. Heck, most of the people in the world have never even clicked on “advanced search” and you’re gonna try to get them to visit something like http://blogsearch.google.com ? Yeah, right.

But, before I dive into the state of time-based search today, let’s look at Yahoo, Google, and MSN first so you can see just how bad those three are if you want to find something that was added to the Web yesterday.

We have a great case study. Yesterday Microsoft and Real settled their anti-trust case and announced a new partnership. It was written about on hundreds of blogs and hundreds of “pro” news sources.

We also have today’s Apple announcements. So, let’s search on both of those.

Apple Video iPod on Google; on MSN; on Yahoo.
Microsoft Real Settlement 2005 on Google; on MSN; on Yahoo.

Now, Google does the best, but is far from satisfying (when I did the same search earlier today it had no results, so you can see that Google is reacting pretty quickly, and they have links to their news service at the top of the page). First of all, there’s a LOT of noise to dig through (most of the links, even on Google’s page, are old ones from previous announcements).

Anyway, let’s just say that the current “big” search engines don’t do a good job with things that just happened and they are totally missing any blog voices (even the big traffic folks like Engadget).

Quick, which gives you the better result? Google/MSN/Yahoo or Memeorandum’s cluster of results for Apple’s announcements? Memeorandum’s wins hands down. How did that happen? After all, Google/MSN/Yahoo index billions of pages and Memeorandum only indexes a few thousand blogs.

Hint: it’s not who indexes the most, but who indexes and brings back the best stuff that matters.

So, let’s go over to the blog searches and try these queries there and see what comes up, shall we?

Apple Video iPod on Google’s blog search, on Yahoo’s blog search, on Feedster, on Technorati, on IceRocket. Unfortunately I can’t link to NewsGator’s blog search (need to register), or Pubsub’s blog search (only exports to RSS feeds).

Now, what do you get on all of these blog search engines? LOTS of time-based lists of blogs. But, is this useful either? No. No. No.

Why? Cause I don’t know where to start. I don’t know the reputation of any of these results. I don’t know which ones are being linked to, which ones are being viewed, which ones write about Apple a lot, or which ones are people who never write about Apple except for today. There’s no context. No help for me to figure out what’s going on. At least Technorati shows me inbound links (but you don’t know what those inbound links were for). What if they were all for a Paris Hilton video that the blog had on it? Would that blogger be as interesting for you to read about Apple as, say, Engadget or an Apple-focused blog?

Not to me.

So, what are we left with? Two things: titles and names of blogs. My new favorite? IceRocket. Why? Cause it shows both, and how many links, but it also lets you exclude and refocus your search. And, it shows you how long the post is (longer posts probably are more interesting, unless they are too long, like this one. Heheh.).

Which brings me to another thing wrong with blog search.

The search for great blogs is way too hard, especially when using a blog-search engine.

Here’s a homework project. Search any of the engines for “scrapbooking blog” and find me the top five scrapbooking blogs. I went to Technorati and searched for “best scrapbooking blogs.” Technorati brought me tons of results. But, they aren’t the right ones. I’m thinking of something like Technorati’s Scrapbooking page, but this is just woefully incomplete and can be gamed very easily (I could go straight to the top of this page if I added the Scrapbooking tag to my blog on Technorati).

Anyway, I thought about going on. I tried various scrapbooking searches at Feedster, Bloglines, Technorati, IceRocket, NewsGator and they all left me wanting.

What do I want? An engine that mixes time-based searching with relevancy.

How can the engines do it? Well, they should study Mary Hodder’s “social gestures” writings very closely.

I wish Mary were in charge of building a new kind of search engine. I hope MSN listens to her; she’s done the best thinking on this topic I’ve seen (and her ideas come out of real user research).

Anyway, what would you like to see in blog search? Time-based search?

63 Replies to “The race to time-based and blog search”

  1. That’s so true. When you search for something on Technorati, it doesn’t reflect the latest thing. Technorati seems to be suffering from information overload and does not give very good search results. By the way, the new template is so much better.

  3. actually i liked your radio blog template better…it forced you to focus on the text and it made for an easy reading directly from the site…

    plus mudpit opened up in a new window…so i didn’t have to click once more to see all your entries…

    hmmm…anyway your choice…we’ll have to keep up 🙂

    btw Scoble…when are you putting up your Xbox 360 video online…the one that talks about its web-integration features…you still under NDA? I’m hoping there’s at least one more big announcement before the launch cause I hear that Sony’s going to debut the Playstation Online service to parry the Xbox 360 launch…

  5. a couple of more comments:

    – you’re dead on about the Apple thing…Steve Jobs has said periodically that Media Centers and video ipods make no sense…and well both have been set up by MS/partners before…

    – Xbox 360 is about to launch…I think Microsoft needs to partner with someone for a handheld gaming system…you guys have great IP in this area…BUT don’t go out to make your own handheld…partner with someone who’s already got something out there…i think it’s important cause that’s the only thing missing from the MS portfolio in the gaming market…

    and I wouldn’t have written this if I didn’t have a partner in mind 🙂

    N-Gage! it has not done so well (thanks to shoddy design) but the concept is just plain cool…a good integrated device and you could tout its connectivity features with Xbox 360…heck you can call it the Xage (pronounced: Zage)…

  7. “But, before I dive into the state of time-based search today, let’s look at Yahoo, Google, and MSN first so you can see just how bad those three are if you want to find something that was added to the Web yesterday.”

    Let’s qualify. You mean how bad they are if you only look at the web search results and ignore the onebox/shortcut displays they have.

    In other words, do [video ipod] on Google or Yahoo, and at the top of the pages, they show you plenty of news results. They aren’t behind in gathering fresh data. They’re simply segregating it into the news area and giving you a heads-up that it is there.

    You’re either missing it or ignoring it because those top of the page segments don’t feel “normal” to you. All I can say is that the search engines are aware of that issue.

    If you look at my article it talks about how at some point, the search engines need to automatically push the right button or tab or link for you, to give you 10 news results for queries that obviously are news related. Or you do a shopping search and you get all shopping results automatically.

    The problem is the search engines are frightened about making such a change. If they get it wrong, they may lose people. So they are slowly letting vertical listings creep in this way.

    Remember, web search is NOT a time based activity. Honestly. Think about it. The last time you did a web search for something new, you weren’t looking for the best overall site on the subject, were you? No, you wanted the latest, timely information. You wanted news. They give you excellent news through news search engines. And Yahoo, among the majors, as you know just started incorporating blogs as a news source, as well.

    Overall, Robert, I think the posts you are doing on search are great in raising the issues out there and helping push for further UI changes that need to happen. But I think it would also help to point out some of the features that do exactly what you want, when they exist. IE — everyone, you want timely info? news.google.com, news.yahoo.com are great places to go.

    As for your blog search problem, yeah, I know that well. It’s why I don’t depend on blog search much. I get timely, but I also get all the crud. PubSub tries to solve this by picking the most authoritative blogs, but I haven’t found that’s really solved the problem much.

    Ultimately, it will probably come down to blog search further refining this, letting you search by default against a set of hand selected or some other method filtered blogs, to cut out all the spam — and you can go further across all the blogs if you want. But when there are simply so many blogs out there, a good chunk of them splogs and so on, you’ve got to have some filtering. THAT’s why news search works so well, because the vertical sites allowed in there are reviewed.
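
The default-filtered search described in the comment above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `search_blogs`, the `include_all` switch, and the example hostnames are hypothetical, and a real engine would match text far more carefully than a substring test.

```python
from urllib.parse import urlparse


def search_blogs(posts, term, trusted_hosts, include_all=False):
    """Return posts containing `term`, limited to reviewed blogs by default.

    `trusted_hosts` stands in for a hand-selected (or otherwise filtered)
    set of blogs; `include_all=True` searches across everything.
    """
    term = term.lower()
    results = []
    for post in posts:
        host = urlparse(post["url"]).hostname or ""
        if not include_all and host not in trusted_hosts:
            continue  # skip unreviewed sources unless the user opts in
        if term in post["text"].lower():
            results.append(post)
    return results


# Hypothetical sample data, for illustration only.
posts = [
    {"url": "http://trusted.example/ipod", "text": "Hands-on with the video iPod"},
    {"url": "http://splog.example/ipod", "text": "ipod ipod ipod cheap pills"},
]
trusted = {"trusted.example"}

default_hits = search_blogs(posts, "ipod", trusted)
all_hits = search_blogs(posts, "ipod", trusted, include_all=True)
```

The default call returns only the reviewed blog, while `include_all=True` widens the search to every source, spam included.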

  9. Danny’s comment has an open element with no closing , it’s screwing up your comments.

    Time/Blog-based searching does currently suck, but it will get better. I agree that people need to get these results from the main search page, not from a separate page (and I assume once BlogSearch makes its way out of Beta that it will be integrated somehow into google’s main search). I think it would be good to see time-based entries in a separate column next to regular search entries. If you look at google, their regular search has the heading “Web” and their Blog Search has the heading “Blog Search”. They should combine these on one page (in two columns) but they would likely have to shrink their “sponsored links” div. Each column could be leafed through its pages without affecting the other column (ie use some Ajax).

    I think a relevancy figure should be determinable from a heuristic that combines text relevancy with recency with # of inbounds with # of comments/trackbacks (for that entry only), but of course that’s open to spam attacks.
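
A minimal sketch of the kind of heuristic suggested above, combining text relevancy, recency, inbound links, and comments/trackbacks into one figure. Everything specific here is an assumption made for illustration: the `Post` fields, the 24-hour half-life, the weights, and the saturation constant are not taken from any real engine, and nothing in it defends against the spam attacks the comment mentions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Post:
    title: str
    text_relevance: float  # 0..1, assumed to come from an ordinary text match
    published: datetime
    inbound_links: int     # links to this entry only
    comments: int          # comments plus trackbacks on this entry only


def saturate(count, k=10.0):
    """Map a raw count into 0..1 so a huge count cannot dominate the score."""
    return count / (count + k)


def score(post, now, half_life_hours=24.0, weights=(0.5, 0.3, 0.1, 0.1)):
    """Weighted mix of text relevance, recency, inbound links, and comments."""
    age_hours = max((now - post.published).total_seconds() / 3600.0, 0.0)
    recency = 0.5 ** (age_hours / half_life_hours)  # halves every half-life
    w_text, w_recency, w_links, w_comments = weights
    return (w_text * post.text_relevance
            + w_recency * recency
            + w_links * saturate(post.inbound_links)
            + w_comments * saturate(post.comments))


def rank(posts, now):
    """Sort posts from highest to lowest combined score."""
    return sorted(posts, key=lambda p: score(p, now), reverse=True)


# Hypothetical sample data, for illustration only.
now = datetime(2005, 10, 13, 12, 0)
fresh = Post("Video iPod first look", 0.8, now - timedelta(hours=2), 5, 3)
stale = Post("iPod rumors roundup", 0.8, now - timedelta(days=30), 500, 40)
ranked = rank([stale, fresh], now)
```

With these assumed weights the two-hour-old post ranks above the month-old one even though the older post has a hundred times as many inbound links. Changing the weights or the half-life shifts that balance, which is the tuning problem any such heuristic has to face.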

  11. Everyone knows the web 1.0 algorithmic search doesn’t work in the real time environment. Historical stuff – great and awesome but real time stuff I agree with you Scob 100%. It’s going to be an algorithm that no one sees coming and I think it will be social based …. keep your eye out for some new stuff…

  13. I still think we should not divide search engines by content type unless it is clearly divided. Yeah, blogs are a type of content, but there’s a fine line between what is a blog and what is a “site”. I think we should just search from the standard Google website, and let the search engine figure out what’s what. Don’t put that burden on the user. I want to be able to get, for example, the most relevant posts about the new Xbox… Are blogs more relevant? Or content websites? Are newer posts better (or more relevant) than older posts? Relevancy is very hard to measure, as you know.

    regards,
    Leo

  15. Part of the problem is that time-based searching is dependent upon the blog post being found in a “timely manner”. This means that web crawlers have to be on the alert 24/7 and happen to cross THAT post at THAT moment or soon after its release. Imagine the size and speed of that crawler!

    So that leaves us with two options. Either the burden is upon the user to update tagging and search services, or pings and trackbacks will have to merge and grow into a new form of “I got a secret!”.

    I see a form of ping and trackback services sending out an excerpt of the post to search engines and directories at the moment of posting. Immediately, time-based information is delivered, literally, to your door.
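
The push model described above, where the blog sends an excerpt at the moment of posting instead of waiting for a crawler, might look roughly like this. It is a toy, in-memory sketch: `Ping`, `make_ping`, and `PingIndex` are hypothetical names, no real ping or trackback protocol is implemented, and the "checks and balances" against abuse are left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Ping:
    url: str
    title: str
    excerpt: str
    published: datetime


def make_ping(url, title, body, published, max_chars=200):
    """Build the payload a blog tool might send when a post goes live."""
    if len(body) <= max_chars:
        excerpt = body
    else:
        excerpt = body[:max_chars].rstrip() + "..."
    return Ping(url, title, excerpt, published)


class PingIndex:
    """Toy receiver: stores pings and answers newest-first keyword queries."""

    def __init__(self):
        self._pings = {}

    def receive(self, ping):
        # A repeat ping for the same URL replaces the earlier one.
        self._pings[ping.url] = ping

    def search(self, term):
        term = term.lower()
        hits = [p for p in self._pings.values()
                if term in p.title.lower() or term in p.excerpt.lower()]
        return sorted(hits, key=lambda p: p.published, reverse=True)


# Hypothetical sample data, for illustration only.
index = PingIndex()
index.receive(make_ping("http://a.example/1", "Settlement news",
                        "Microsoft and Real settle.", datetime(2005, 10, 11, 9, 0)))
index.receive(make_ping("http://b.example/2", "More on the settlement",
                        "Reaction to the Real deal.", datetime(2005, 10, 12, 8, 0)))
latest = index.search("settlement")
```

Because the index learns about a post the instant the ping arrives, the search side never has to guess when to crawl. The cost is that it has to trust what the sender says, which is where the checks and balances would come in.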

    Google, Yahoo, and MSN are not the end-all, but they are the beginning. I see tagging as part of the baby steps of information gathering on the Internet.

    The first to come up with this new form of ping and trackback service, with checks and balances thrown in, will get all my attention, and it should get yours.

  17. thanks Robert, very kind of you to note my work. i wish someone would fix these problems now too. i think there is a real need to make blogs accessible to people who aren’t in the blog community, and who want some comprehensible way (no geekiness allowed) to understand them, and who is doing the writing.

    it would help people quite a bit.

    mary

  19. Jeremy: it’s just a subculture I’m not that familiar with that’s very large. Translation: it is a good one for geeks like me to study and see if we can do better in serving.

  21. Oh, you should have mentioned it in NYC – I love that community, and have worked with it. And, I think it makes too much money to be a subculture. 🙂

    And, you are right – there’s a lot we can learn from scrapbooking about grassroots efforts, word-of-mouth activities, and building lasting communities.

  23. “The best stuff” is very subjective, the world tis not all geek posers.

    PS – Gawd are these comments impossible to read. For all the blog blahger and overhype, you’d think someone could ever come up with a decent comment system, no blog engines ever seem to work.

  25. Blogniscient looks interesting.

    Sounds like you are looking for a tail -f INTERNET | grep 'October 16, 2005' command somewhere.

    Findforward has some interesting ways of using the Google api..

    http://www.findforward.com/?q=microsoft&t=chat

    My problem isn’t finding stuff on the internet, it’s reading it all. Robert, you have way too many blogs going on right now! 🙂 Way too much stuff happening with Microsoft. Also with China… (see here http://travelcostarica.blogspot.com/)

    The name is a bit misleading, but I use this blog for documenting places I am going to travel to. I am going to China this week to see if it works from over there…

    Try searching for China on Gadda.be to get some interesting results of what is happening in the world today…

  27. Pingback: Don Singleton

Comments are closed.