
As someone who has actively participated in DDH for a while now, here are my views:

- A non-trivial part of the current contributions consisted of "cheat sheets", which IMO required a lot of effort to ensure correctness and usability but don't do much to improve search results (I don't think I've used the feature more than 3-4 times in the past 1.5 years). So this should free up time for DDG staff to focus on the more important instant answers and features.

- The community has been getting smaller and contributing less for a while now, and the number of commits over time in the official repos backs this up.[1] After all, there are only a finite number of instant answers to write before they become redundant.

- The current model for triggers (which decide when an instant answer gets displayed) is quite restrictive: it's just regex-based. IMO, a lot more growth could be achieved by using ML models for triggering, A/B testing, etc.
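
To make the last point concrete, here is a minimal Python sketch of what purely regex-based triggering looks like. The patterns and answer names are made up for illustration; they are not taken from the DuckDuckHack repos, which are written in Perl.

```python
import re

# Hypothetical triggers: each pairs a regex with the name of an instant answer.
TRIGGERS = [
    (re.compile(r"^(?:vim|git|tmux) cheat ?sheet$", re.IGNORECASE), "cheat_sheet"),
    (re.compile(r"^\d+(?:\.\d+)?\s*(?:km|mi)\s+(?:in|to)\s+(?:km|mi)$", re.IGNORECASE),
     "unit_conversion"),
]

def match_trigger(query):
    """Return the name of the first instant answer whose regex matches, else None."""
    q = query.strip()
    for pattern, name in TRIGGERS:
        if pattern.search(q):
            return name
    return None
```

A query like "vim cheat sheet" fires, but "how do I exit vim" matches nothing even though the cheat sheet would probably help. That gap is the kind of thing a learned trigger might close.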

I'm still kind of disappointed with this. Perhaps unrelated, but does anyone have suggestions for people who want to work on similar open source projects?

[1]: https://github.com/duckduckgo/zeroclickinfo-spice/graphs/con... , https://github.com/duckduckgo/zeroclickinfo-goodies/graphs/c...



Kiwix - most people are too conditioned to think that search has to happen online and don't even realize what is possible offline.

Entire web archives, such as full dumps of Wikipedia and Stack Exchange (including media and search indexes), can be stored locally. The missing piece is Google-level search quality on the local machine. Brute-force substring search can already process gigabytes in seconds, and on enterprise-grade server hardware things are reaching 1000 GB/s. At this rate, there is no reason to think that, in a couple of years, local search of all known human knowledge can't happen on a local device at Google-level result quality.

For anyone interested in the search space: look into what's possible today in local offline search.
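
As a rough illustration of the brute-force substring search mentioned above, here is a minimal Python sketch that memory-maps a local file and counts matches. The function name is my own, and it says nothing about how Kiwix or any real search tool works internally; it only shows that a plain scan needs no index at all.

```python
import mmap

def count_occurrences(path, needle):
    """Count non-overlapping occurrences of needle (bytes) in the file at path."""
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    with open(path, "rb") as f:
        # mmap cannot map an empty file, so handle that case up front.
        f.seek(0, 2)
        if f.tell() == 0:
            return 0
        # Map the whole file read-only; the OS pages it in as the scan advances.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                count += 1
                pos = mm.find(needle, pos + len(needle))
    return count
```

Called as `count_occurrences("dump.txt", b"search term")`, where `dump.txt` is a placeholder for any uncompressed local file. A scan like this finds matches but does nothing about ranking, which is where the result-quality gap lies.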


This is a great observation & seems to dovetail with technologies like IPFS.[1]

[1]: https://ipfs.io


You might be right, but human knowledge is also expanding, of course. The question is: will it expand faster than hardware capabilities?

Anyway, I wish we'd see more search and NLP related posts here on HN. It deserves far more attention than it gets.


For the average person this rate does not matter. They don't need access to the cutting edge of quantum physics, astronomy, dance, art or javascript.

All you have to do is look at the rate at which new info is being added to Wikipedia and Stack Overflow: it is stabilizing, i.e., no longer growing as it once was. Basic/foundational knowledge is more or less all covered. https://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia%...

And that sum total comes to 50-60 GB compressed. Think about that number. It's not big.


The sum total of our collective intelligence is equal to an install of GTA V... Crazy.


Wikipedia is not the sum of our collective knowledge. It's little more than the preface.


We're talking about the "long tail" of information, which is huge also outside of science. Think popular culture.


It would be awesome if you could download dumps of Wikipedia filtered by category so you can get the size down. There's probably a lot of information in there that is useless to me.


Kiwix does this, at least to a certain degree: http://wiki.kiwix.org/wiki/Content


Listen to Wikipedia http://listen.hatnote.com


NLP is rightly ignored.

https://en.m.wikipedia.org/wiki/Neuro-linguistic_programming...

Edit: Fortunately I'm left feeling foolish, rather than horrified.



The average user's needs are so small.

You do not even need "Google level" for most of today's web users.

You can deliver what users need with respect to web search with much less than "Google level".

For example a simple "<title>" search. This is how Google started.

The entry point into the web should be search for domains. A "<title>" search can do that.
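
A minimal Python sketch of the "<title>" search idea, assuming you already have one HTML page per domain on disk or in memory. The `pages` mapping and the function names are hypothetical; this is only meant to show how little machinery the idea needs.

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

def search_titles(pages, query):
    """pages maps domain -> HTML; return domains whose title contains every query word."""
    words = query.lower().split()
    return sorted(
        domain for domain, html in pages.items()
        if all(w in extract_title(html).lower() for w in words)
    )
```

Note that body text is ignored on purpose, so a page that merely mentions a word does not match.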

Most users today do not do much searching within websites via Google. They search for websites using Google.

Anyway, you are right about storage space and offline search but obviously that truth misaligns with the "cloud" business narrative and coaxing users to store all their personal data in datacenters instead of on their desk or in their pocket.

Expect much opposition to this simple truth.


http://web.archive.org/ now provides full-text search, mostly of website titles.

Try it out. You'll find that it's... it feels like a trip back to 1998.


I'd say the average user especially profits from a search system that's somewhat clever and finds things even if they don't phrase the query exactly right.

And searching for domains is only a tiny part of it, especially now that a lot of information is stuck on general sites with a lot of content (wikis, Q&A sites, social media sites) and not on special-interest sites. And for many generic searches, the special-interest domains are various levels of spam/affiliate marketing.


PCIe 3 x16 devices have a 16GB/s theoretical max, so 1000GB/s is still out of reach for single machine I/O (though it's not as though search needs anywhere near these bandwidths anyway).


The Intel i9-7900X has 44 PCIe 3.0 lanes, and Wikipedia tells me each lane has a throughput of 984.6 MB/s, so that's ~40 GB/s. Maybe fast compression could make that a small integer multiple.

https://www.intel.com/content/www/us/en/products/processors/...
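
The lane arithmetic can be checked with a few lines of Python. The per-lane figure is the one quoted above; the helper name is mine, and this ignores protocol overhead and whatever the attached storage can actually deliver.

```python
LANE_MB_PER_S = 984.6  # per-lane PCIe 3.0 throughput, as quoted above

def aggregate_gb_per_s(lanes, lane_mb_per_s=LANE_MB_PER_S):
    """Theoretical aggregate throughput in GB/s for a given number of lanes."""
    return lanes * lane_mb_per_s / 1000

for lanes in (16, 44, 64):
    print(f"{lanes} lanes: {aggregate_gb_per_s(lanes):.1f} GB/s")
```

That gives roughly 15.8 GB/s for an x16 device, 43.3 GB/s for 44 lanes, and 63.0 GB/s for 64 lanes, all well short of 1000 GB/s on a single machine.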


AMD Threadripper has 64 in all available models: https://en.wikipedia.org/wiki/Zen_(microarchitecture)



That blog seems to imply you're using a distributed architecture, i.e., not a single machine.


I've been using Google and Wolfram Alpha for these things over the years, but it has always irked me that I'm sending this info to a third party, to run through services whose code I have no way to read or improve, and that these things are only available to me if I'm online. I was really happy when I found out the DuckDuckGo Instant Answers modules' source code is open.

It's been on my list of things that I will almost definitely never take the time to actually work on, but what I wish I had is (A) a browser extension or GNOME extension that incorporates an offline version of all the DuckDuckHack modules, and (B) the same thing in an open source mobile app. (This kind of thing could just as easily live in a command line app, though, and I'd be super happy if a project maintainer incorporated them into something like GNU Units.) I looked into it, especially for (B), but I realized that the DuckDuckHack code depends on Perl.


Well, about offline availability: a large number of instant answers (spices and fatheads, that is) use external APIs or indexed databases from websites, so they can't work offline.

DDG does have official (and unofficial) browser extensions and apps for iOS/Android.


> Well, about offline availability, a large number of instant answers [...] can't work offline

Sure, but there are a large number of instant answers that can and do work offline because they're simple, static tables, or are self-contained—existing only to apply transformations on the input (e.g., cheatsheets, natural language unit conversions, and calculations).
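
For a sense of how self-contained such an answer can be, here is a minimal Python sketch of an offline unit conversion. The conversion table and query grammar are my own tiny illustration, not the DuckDuckHack implementation.

```python
import re

# Conversion factors to metres; a tiny illustrative table, not a complete one.
TO_METRES = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}

# Matches queries of the form "<number> <unit> in|to <unit>".
QUERY = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s+(?:in|to)\s+([a-z]+)\s*$",
    re.IGNORECASE,
)

def convert(query):
    """Answer queries like '5 km in mi' offline; return None if not understood."""
    m = QUERY.match(query)
    if not m:
        return None
    value = float(m.group(1))
    src, dst = m.group(2).lower(), m.group(3).lower()
    if src not in TO_METRES or dst not in TO_METRES:
        return None
    return value * TO_METRES[src] / TO_METRES[dst]
```

Nothing here touches the network: the whole answer is a static table plus a transformation of the input.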

> DDG does have official (and unofficial) browser extensions and apps for iOS/Android

A browser extension that just sends the query the same as it would if you hit their homepage is in the "what's the point?" category, just like mobile sites that nag you to install their app when all it does is show you the same content that is (or could be) on the mobile site itself. The "is a browser extension" is not the interesting part. "Doesn't send data to a third party" and "can operate without being connected to the network" are.


Why can't we have an intermediary search service that grabs search results from Google and posts them on a search website anonymously?


Startpage [1] is what you're looking for.

[1] https://startpage.com


Right. StartPage.com delivers Google search results in privacy. Plus, it offers a free proxy with every search result so you can visit websites through StartPage anonymously, too.


In DuckDuckGo, !g more or less does this, in that it disables search bubbling, but I think Google can see your client IP when the results are served to your browser.


Banging into Google using !G is like searching Google directly. Banging from DDG doesn't confer any privacy protections. A lot of people don't know this.


Startpage does just that. Search for something on DDG and use !sp to search there.


Let me save you a lot of time for the future:

!s is enough to redirect to Startpage. :-)


searx proxies user requests to different search engines:

https://github.com/asciimoo/searx

There are a number of public instances: https://github.com/asciimoo/searx/wiki/Searx-instances



