Tech heads, how hard would it be to create a search engine!!!

Mask

"OneOfTheBest"
Platinum Member
That idea just popped into my mind, so I haven't researched anything yet. It would have to be on the same level as Bing, Google and Yahoo.
 

MrSid

International
International Member
To incorporate inside of an app...





Why do you say this?
Imagine your home laptop as your computing and storage device.
Now imagine a data centre with 10,000 computers.
Now multiply that data centre by 10.
Google's Borg cluster manager treats a whole data centre the way you would treat your home laptop.

Now throw Kubernetes on top of that. Google has redefined how we look at data centres.

Google is adding compute and storage to its data centres every day at a rate equal to all the compute and storage it had in 2005.

Until you have the same compute power as Google, building another Google search is next to impossible. That's why I say that.

If you want to create a search engine for use within your app, you can easily do that. That is a typical second-year software engineering assignment. However, creating one on the same scale as Google or Bing is where it becomes next to impossible.
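
To give an idea of how small the in-app version can be, here is a minimal sketch in Python. It assumes a handful of documents held in memory, and the names `tokenize`, `build_index` and `search` are made up for illustration, not taken from any library. It builds an inverted index (word -> set of document ids) and answers a query by intersecting the sets.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing ALL words of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data, only for illustration.
docs = {
    1: "How to build a search engine",
    2: "Search inside your app",
    3: "Data centers and compute",
}
index = build_index(docs)
print(search(index, "search engine"))
```

That works fine for thousands of documents on one machine. The hard part at Google scale is not this logic but spreading the index over many machines.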
 

Mask

"OneOfTheBest"
Platinum Member


Great insight. I'm forwarding this to my Goon (mad scientist in tech terms) to let him know why my idea would be impossible.
 

Mask

"OneOfTheBest"
Platinum Member




I've been looking at this article about the idea ...


How to Build a Search Engine Like Google




Have you ever thought about building a fully featured search engine that works like Google or Bing? Google became one of the biggest companies on the Internet within a very short span of time, and many internet entrepreneurs have been amazed by its success as a company. On the technology side: how does Google work so fast and so powerfully? How does Google manage fault tolerance? Where does Google store the data for billions of web pages? Can you create a search engine like Google? If so, how?

Well, to build a search engine like Google, you need to understand several aspects. First of all, a search engine like Google cannot be built overnight. It takes months or even years to crawl and store the data, rank the results, and cover almost the entire web. But you should usually be able to start producing search results within a couple of weeks.
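
To make the "rank the results" step concrete, here is a minimal sketch of link-based ranking in the style of PageRank. This is an assumption for illustration only: the article does not say which ranking method it has in mind, and the function name `pagerank`, the damping value of 0.85 and the tiny link graph are all hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Score pages by repeatedly passing rank along outgoing links.

    links: dict mapping a page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small base score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
print(best)
```

A real engine would combine a link score like this with text relevance and many other signals, and would compute it over a distributed store rather than one dictionary in memory.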

Where do you store the data? Where does Google store its data? Google has its own NoSQL database called Bigtable, where it stores its search data. Bigtable is a distributed system that runs on top of the Google File System (GFS), a reliable distributed file system; HDFS is the open source counterpart. A file system of this kind supports distributed computing across thousands of nodes attached to the network.

What Technology Should I Use?
You cannot run Google on MySQL. Period. Not even on Oracle, if you are aiming for a global-scale service. You need something similar to Bigtable that runs on a distributed file system like HDFS. But Bigtable is a Google-specific technology: it is not open source and is not available to the public, except as a hosted version that was recently made available on Google Cloud.


Hadoop: Hadoop is a collection of big data components and tools, including HDFS, which is widely regarded as the best distributed file system available now. Hadoop is open source and is continuously researched and developed under Apache. HDFS is a strong choice of file system for highly scalable, multi-machine applications such as search engines and analytics. Hadoop helps you connect thousands of nodes together to work as one expandable file system. http://hadoop.apache.org/


HBase: HBase is a NoSQL (Not Only SQL) database that runs on top of Hadoop and can store petabytes of data. Although it is Java-based, it is regarded as a reliable database. HBase is maintained by Apache. http://hbase.apache.org/


Hypertable: Hypertable is another NoSQL database that runs on Hadoop. It is written in C++, and the Hypertable company claims that it performs much faster than HBase. Hypertable's support is also very good, and it offers more flexibility in queries compared with HBase. http://hypertable.com/

So to run a Google clone, you should use either Hadoop + HBase or Hadoop + Hypertable.
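
Whichever store you pick, one design choice matters early: the row key. Bigtable-style stores such as HBase keep rows sorted by key, and the Bigtable design describes keying web pages by URL with the host name reversed, so that pages from the same domain end up next to each other. Below is a minimal sketch of that idea in plain Python. The helper name `row_key` is hypothetical, and no real HBase or Hypertable client call is shown.

```python
from urllib.parse import urlsplit


def row_key(url):
    """Build a sortable row key: reversed host name followed by the path."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    reversed_host = ".".join(reversed(host.split(".")))
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return reversed_host + path


# Hypothetical URLs, only for illustration.
urls = [
    "http://maps.google.com/index.html",
    "http://www.cnn.com/index.html",
    "http://www.google.com/",
    "http://edition.cnn.com/world",
]
keys = sorted(row_key(u) for u in urls)
for key in keys:
    print(key)
```

After sorting, both cnn.com rows sit together and both google.com rows sit together, which is what makes scanning a whole domain cheap in a sorted store.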

What Hardware Should I Use?
Of course, I understand that you don't want to start with your own datacenter. Google has its own ever-expanding datacenters around the world. The ideal way to start is to partner with a datacenter or hosting company that can provide a set of nodes (computers) on a single network. The key reason for wanting the nodes on a single network is that, as you add more nodes to a scalable distributed system in the future, nodes on the same physical network can significantly improve the performance of your search engine.

How Can I Code a Google Clone Application?
Here comes the trickiest and most interesting part of your journey to build a Google clone search engine. Even if you choose the right technology and the right infrastructure, your spider won't be effective if the code is not powerful and not designed for scalability. I am not able to cover all the components of the software logic or the algorithm needed to build a spider here. However, the diagram below, found on Inout Spider, will give you a really good idea of the major components required to build one. Inout Spider is a commercial application (widely regarded as a powerful search engine data spider and a standard Google clone script) that works on Hadoop and Hypertable. So if you cannot code it yourself, I recommend you consider Inout Spider.

[Diagram: major components of a spider, from Inout Spider]
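
For readers who want to try coding the spider themselves, here is a minimal single-machine sketch in Python. The components it uses (a frontier queue, a fetcher, a link parser, a "seen" set and a page store) are common crawler building blocks and are an assumption here, not a description of the Inout Spider diagram. To keep it runnable without network access, the fetcher reads from a hypothetical in-memory `FAKE_WEB` dictionary instead of making HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl starting at seed; fetch(url) returns HTML or None."""
    frontier = deque([seed])  # URLs waiting to be fetched
    seen = {seed}             # URLs already queued, to avoid repeats
    store = {}                # url -> raw HTML, stand-in for a real database
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:      # page missing or fetch failed
            continue
        store[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store


# Hypothetical three-page site used instead of real HTTP requests.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">x</a>',
}
pages = crawl("http://example.test/", FAKE_WEB.get)
print(list(pages))
```

A production spider would also need politeness rules (robots.txt, rate limits), retries, and a frontier shared across many nodes, which is where the distributed storage discussed above comes in.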

Summary

Building a search engine like Google is never an easy task, or else we would have seen many more Google clones online. But with the right technology, hardware and software (your own, or a commercial application like Inout Spider), your dream is achievable.



Disclaimer

By Google clone, I do not mean an exact Google clone. The term Google is used as a synonym for ‘search engine’. This article is intended to help you create a standard search engine like Google, Bing, Yahoo, Baidu, etc.
 