Hacker News

Google / Wolfram Alpha / Watson are great at answering broad questions.

1. "Who was the 12th president?" - Zachary Taylor

2. "What color wine is cabernet sauvignon?" - Red

3. "Is a ferret a rodent?" - The ferret is the domesticated member of the Order Carnivora, Family Mustelidae, and Genus Mustela. A common misconception is that ferrets are rodents.

The real challenge is answering niche questions:

1. What size are the OEM rear wheels of a Honda S2000?

2. How can I fix MySQL error 1064?

3. How do I remove wine from a macbook?

These types of questions aren't answerable by simply mining Wikipedia or encyclopedic knowledge. They represent niches within our society (S2000 owners, programmers, people who spilled wine on their macbooks). Google provides excellent links to pages that contain answers to these questions, but it cannot deduce a single answer or common response. This is why sites like Answers.com, Yahoo! Answers, StackExchange, etc. can flourish, but it's also why an NLP question-and-answer system is very difficult.
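(The MySQL 1064 example is a nice illustration of why there's no single mineable answer: 1064 is MySQL's generic syntax-error code, so the fix depends entirely on the query that triggered it. A minimal sketch of one very common cause, using Python's bundled sqlite3 as a stand-in for a real MySQL connection - the failure mode, a syntax error from naive string interpolation, is the same idea:)

```python
import sqlite3

# In-memory database standing in for MySQL; a 1064-style syntax error
# commonly comes from building SQL by string interpolation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

name = "O'Brien"  # the apostrophe breaks a hand-built query

try:
    # Interpolation produces: ... VALUES ('O'Brien') -- a syntax error
    conn.execute("INSERT INTO users (name) VALUES ('%s')" % name)
except sqlite3.OperationalError as e:
    print("syntax error:", e)

# Parameterized queries let the driver handle quoting, avoiding the error.
conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
print(conn.execute("SELECT name FROM users").fetchone()[0])  # O'Brien
```

(Point being: a QA system can't answer "how do I fix error 1064" without the asker's actual query, which is exactly why these questions end up on niche forums.)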

I've been working on a system to mine existing responses to questions - http://gotoanswer.stanford.edu - I only have a small subset of programming-related questions (~10M), but you can get an idea of what I'm trying to do by searching for "How do I remove wine from a macbook?" You'll see that there are results for removing wine the liquid and WINE the Windows non-emulator.
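(The wine/WINE collision is a nice concrete test case for that kind of system. A toy sketch of one naive way to split the two senses - keyword overlap against hand-picked context words; the cue lists here are invented for illustration, not anything gotoanswer actually uses:)

```python
# Toy word-sense disambiguation for "wine" in a question, based on
# hand-picked context keywords (invented for illustration only).
SOFTWARE_CUES = {"install", "windows", "exe", "ubuntu", "run", "emulator", "app"}
LIQUID_CUES = {"spilled", "stain", "keyboard", "red", "clean", "dry"}

def wine_sense(question: str) -> str:
    words = set(question.lower().replace("?", "").split())
    software = len(words & SOFTWARE_CUES)
    liquid = len(words & LIQUID_CUES)
    if software == liquid:
        return "ambiguous"
    return "software" if software > liquid else "liquid"

print(wine_sense("How do I remove a wine stain from a macbook keyboard?"))  # liquid
print(wine_sense("How do I run a windows exe with wine?"))                  # software
print(wine_sense("How do I remove wine from a macbook?"))                   # ambiguous
```

(Note the headline query really is ambiguous even to a human without more context, which is why returning both clusters of results, as the site does, seems like the right call.)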



You should put your contact details in your profile (in the "About" section - your email address isn't publicly visible).

Anyway - I'm really interested in this area. I have a Q/A system built that can answer (some of) the broad-type questions you mention.

I think grouping Google/Wolfram/Watson together misses that each has its own strengths and weaknesses, and that they take dramatically different approaches.

Google traditionally relies on ranking information it finds to answer questions (though the whole knowledge graph thing is moving it closer to what Watson does).

Wolfram relies on manual curation of facts, and probably has the best "calculation" engine of the three.

Watson relies on manual curation of sources, and automatic extraction of facts and ranking of them.

I think it's quite interesting that Google is moving to a model more similar to Watson.
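(To make the contrast concrete, here's a toy sketch of the Watson-style end of that spectrum: candidate answers automatically extracted from several sources, each with an evidence score, then aggregated and ranked - as opposed to a single curated-fact lookup. The data and scoring below are entirely invented:)

```python
from collections import defaultdict

# Hypothetical candidate answers for "Who was the 12th president?",
# as if extracted from several sources with per-source confidence.
candidates = [
    ("Zachary Taylor", 0.9),
    ("Zachary Taylor", 0.7),
    ("Millard Fillmore", 0.4),  # a plausible wrong extraction
    ("Zachary Taylor", 0.8),
]

def rank(cands):
    scores = defaultdict(float)
    for answer, confidence in cands:
        scores[answer] += confidence  # naive evidence accumulation
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank(candidates)[0][0])  # Zachary Taylor
```

(A curated system like Wolfram skips all of this: the fact is stored once, by hand, and the hard work goes into the calculation engine instead.)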

Anyway - I'd love to hear about your approach and what you are doing. My contact is in my profile.


You bring up a good point, but it seems as if Watson was designed with this in mind. If you look at the JSON response, you'll notice it lists this query as a factoid class.

It may handle queries with different attributes differently, such as focusing on certain portions of its corpus or changing which aspects of its search results are more heavily weighted.
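(I haven't seen Watson's schema beyond that one documented example either, so every field name below is a guess - but routing on a question class pulled from the JSON response might look roughly like this:)

```python
import json

# Hypothetical response shape. "FACTOID" comes from the documented example;
# the field names ("question", "qclasslist", "value") are assumptions.
raw = """
{
  "question": {
    "questionText": "Who was the 12th president?",
    "qclasslist": [{"value": "FACTOID"}]
  }
}
"""

def route(response: dict) -> str:
    classes = [c["value"] for c in response["question"].get("qclasslist", [])]
    if "FACTOID" in classes:
        return "fact-lookup pipeline"   # narrow search, prefer short answers
    return "general evidence-scoring pipeline"

print(route(json.loads(raw)))  # fact-lookup pipeline
```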

A query identified as a factoid might be researched and judged very differently from something more nebulous, such as a comparison, or something more specific, like the examples you listed.

Admittedly, I am basing quite a bit on one example response given in their documentation, but it is an intriguing clue as to how Watson decides which information to draw on.


Factoid is a word coined by Norman Mailer for "an item of unreliable information that is repeated so often that it becomes accepted as fact". http://en.wikipedia.org/wiki/Factoid


In the QA literature it is often used in the "unverified fact" sense (which is also mentioned in the Wikipedia article).

The Wikipedia article says it well:

A factoid is a questionable or spurious (unverified, false, or fabricated) statement presented as a fact, but without supporting evidence, although the term can have conflicting meanings.




