02-06-2007, 02:02 PM   #1
Darklord
HOW GOOGLE WORKS ??

Guys... This article describes how Google manages such a huge volume of data efficiently and still returns search results fast.

The key to the speed and reliability of Google search is cutting up data into chunks.

Google Machinery:

To deal with the more than 10 billion Web pages and tens of terabytes of information on Google's servers, the company combines cheap machines with plenty of redundancy. Its commodity servers cost around $1,000 apiece, and Google's architecture places them into interconnected nodes. All machines run a stripped-down Linux kernel. The distribution is Red Hat, but Google doesn't use much of it. Moreover, Google has created its own patches for things that haven't been fixed in the original kernel.

The downside to cheap machines is that they must be made to work together reliably. These things are cheap and easy to put together. The problem is, these things break. In fact, at Google, many will fail every day. So, Google has automated methods of dealing with machine failures, allowing it to build a fast, highly reliable service with cheap hardware.

The Search:

Google splits the Web pages it caches into pieces it calls "shards" and replicates them. The shards are small enough that several can fit on one machine, and each one is replicated on several machines, so that if one machine breaks, another can serve up the information. The master index is also split up among several servers, and that set also is replicated several times. These servers are called chunk servers.
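To make the shard idea concrete, here is a minimal sketch in Python. It is not Google's code: the shard count, the replication factor, the hash-based assignment and the machine names are all assumptions picked for illustration. It only shows the principle that a shard lives on several machines and a lookup falls through to the next copy when one machine is down.

```python
import hashlib

NUM_SHARDS = 4          # hypothetical; the article gives no shard count
REPLICAS_PER_SHARD = 3  # hypothetical replication factor


def shard_for(url: str) -> int:
    """Map a URL to a shard number with a stable hash (an assumed scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def place_replicas(machines: list) -> dict:
    """Assign each shard to REPLICAS_PER_SHARD distinct machines, round-robin.

    Needs at least REPLICAS_PER_SHARD machines for the copies to be distinct.
    """
    placement = {}
    for shard in range(NUM_SHARDS):
        placement[shard] = [
            machines[(shard + i) % len(machines)]
            for i in range(REPLICAS_PER_SHARD)
        ]
    return placement


def pick_machine(shard: int, placement: dict, down: set) -> str:
    """Return the first live machine that holds a copy of the shard."""
    for machine in placement[shard]:
        if machine not in down:
            return machine
    raise RuntimeError(f"all replicas of shard {shard} are down")
```

With six machines, marking the first holder of a shard as down simply makes `pick_machine` return the second holder, which is the "if one breaks, another can serve" behaviour described above.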

As a search query comes into the system, it hits a Web server and is then split into pieces that are sent to the index servers. One set of index servers holds one full copy of the index, so to answer a query Google has to use one complete set of servers. Because that set is replicated as a fail-safe, the replication also increases throughput: if one set is busy, a new query can be routed to the next set, which drives down search time per box.
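A rough sketch of that routing, again in Python and again only an illustration: the `IndexSet` class, the "fewest queries in flight" rule and the `(score, doc_id)` posting format are assumptions, since the article does not say how a set is chosen or how results are scored.

```python
import heapq


class IndexSet:
    """One complete copy of the index, split across several shard servers."""

    def __init__(self, name, shards):
        self.name = name
        self.shards = shards   # list of dicts: word -> list of (score, doc_id)
        self.in_flight = 0     # queries this set is currently serving

    def search(self, word, k=10):
        """Ask every shard in the set, then keep the k best-scoring hits."""
        hits = []
        for shard in self.shards:
            hits.extend(shard.get(word, []))
        return heapq.nlargest(k, hits)


def route(index_sets):
    """Send the query to the replica set with the fewest queries in flight."""
    return min(index_sets, key=lambda s: s.in_flight)
```

The point to notice is that a query needs every shard of one set, but any set will do, so adding sets adds both redundancy and capacity.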

In parallel, clusters of document servers contain copies of Web pages that Google has cached. The refresh rate is from one to seven days, with an average of two days. That's mostly dependent on the needs of the Web publishers.

Each set of document servers contains one copy of the Web. These machines are responsible for delivering the content snippets that show searchers relevant text from the page. When the top 10 results are available, they are sent to the document servers, which load the 10 result pages into memory. Then, these pages are parsed to find the best snippet that contains all the query words.
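The article does not say how the "best snippet" is picked. One simple stand-in, assumed here purely for illustration, is to take the shortest run of words on the page that contains every query word. A sketch in Python:

```python
from collections import Counter


def best_snippet(text: str, query: str) -> str:
    """Return the shortest run of words in text that contains every query word.

    Returns an empty string if some query word never appears.
    Shortest-window selection is an assumption, not Google's actual rule.
    """
    words = text.split()
    lowered = [w.lower().strip(".,;:!?\"'()") for w in words]
    needed = set(query.lower().split())
    if not needed:
        return ""
    have = Counter()
    covered = 0          # how many distinct query words are inside the window
    best = None          # (left, right) indices of the best window so far
    left = 0
    for right, word in enumerate(lowered):
        if word in needed:
            have[word] += 1
            if have[word] == 1:
                covered += 1
        # Shrink the window from the left while it still covers the query.
        while covered == len(needed):
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            dropped = lowered[left]
            if dropped in needed:
                have[dropped] -= 1
                if have[dropped] == 0:
                    covered -= 1
            left += 1
    return "" if best is None else " ".join(words[best[0]:best[1] + 1])
```

A real system would also weigh things like sentence boundaries and word proximity; this only shows the "contains all the query words" requirement.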

The Backbone of Google’s Architecture:

Google uses three software systems built in-house to route queries, balance server loads and make programming easier.

The Google File System was written specifically to deal with the cheap machines that will fail. All the files are broken into chunks and then distributed randomly across different machines in a way such that each chunk has at least two copies that are not physically adjacent, i.e., not on the same power line or connected to the same switch. Chunks typically are 64 megabytes and are replicated three times. All this replication makes it easier to make changes. Google simply takes one replica at a time offline, updates it, then plugs the machines back in.
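Here is a small Python sketch of the chunking and placement rule just described. The 64 MB chunk size and the three copies come from the article; the server names, the switch labels and the retry-until-valid placement loop are assumptions made for the example.

```python
import random

CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB, as stated in the article
REPLICAS = 3                    # three copies per chunk, as stated in the article


def chunk_count(file_size: int) -> int:
    """Number of 64 MB chunks needed for a file of file_size bytes."""
    return max(1, -(-file_size // CHUNK_SIZE))   # ceiling division


def place_chunk(servers: dict, rng: random.Random) -> list:
    """Pick REPLICAS servers at random so the copies span at least two switches.

    servers maps a (hypothetical) server name to the switch it is plugged into.
    """
    if len(servers) < REPLICAS or len(set(servers.values())) < 2:
        raise ValueError("need at least 3 servers on at least 2 switches")
    names = sorted(servers)
    while True:
        chosen = rng.sample(names, REPLICAS)
        if len({servers[name] for name in chosen}) >= 2:
            return chosen
```

For example, a 200 MB file needs four chunks, and with three servers on one switch and one server on another, every valid placement has to include the lone server.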

Because these chunks are randomly distributed all over, Google needs a master containing metadata to keep track of where the chunks are. When a request comes into the system, the file system master tells the client which chunk server has the data. From there on, the client talks to the chunk servers directly.

Client machines are responsible for dealing with fault tolerance. If a client requests a file from the specified chunk server and gets no response within the designated time period, it uses the metadata to locate another chunk server, while sending the file master a hint that the first chunk server might have died. If the master confirms that the chunk server went down, it will replicate the chunks that were on it to another server, making sure that the information is replicated at least the minimum number of times.
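The client-side failover can be sketched like this in Python. The `Master` class, the `ChunkServerDown` exception and the `fetch` callback are all made up for the example; a real client would use network calls with timeouts, which are simulated here by raising an exception.

```python
class ChunkServerDown(Exception):
    """Raised when a chunk server does not answer in time (simulated here)."""


class Master:
    """Toy metadata master: knows where chunks live, collects death hints."""

    def __init__(self, locations):
        self.locations = locations   # chunk_id -> list of server names
        self.suspects = []           # servers that clients reported as possibly dead

    def lookup(self, chunk_id):
        return list(self.locations[chunk_id])

    def hint_dead(self, server):
        self.suspects.append(server)


def read_chunk(chunk_id, master, fetch):
    """Try each replica in turn and report unresponsive servers to the master.

    fetch(server, chunk_id) is a caller-supplied function that returns the
    chunk bytes or raises ChunkServerDown after its own timeout.
    """
    for server in master.lookup(chunk_id):
        try:
            return fetch(server, chunk_id)
        except ChunkServerDown:
            master.hint_dead(server)   # only a hint; the master verifies it
    raise IOError(f"no live replica for chunk {chunk_id}")
```

The re-replication step that the master performs after confirming a failure is left out; the sketch covers only what the client does.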

To enable Google programmers to write applications to run in parallel on 1,000 machines, engineers created the Map/Reduce Framework in 2004. This framework provides automatic and efficient parallelization and distribution. It is fault tolerant and it does the I/O scheduling, being a little bit smart about where the data lives.

Programmers write two simple functions, map and reduce. The input is a long list of key/value pairs, and the mapping function produces other key/value pairs from it. For example, if an application is needed to count URLs on one host, the programmer would take the URL and the contents and map them into the pair consisting of the hostname and 1. This produces an intermediate set of key/value pairs with different values. Next, a reduction operation takes all the outputs that have the same key and combines them to produce a single output.
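The URL-counting example can be written out as a tiny single-process imitation in Python. This is not the real framework: `run_mapreduce` is a hypothetical stand-in that does the grouping in memory, and emitting a count of 1 per page is the usual way to count with map and reduce, assumed here.

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_fn(url, contents):
    """Emit (hostname, 1) for one crawled page; the contents are ignored."""
    yield urlparse(url).hostname, 1


def reduce_fn(hostname, counts):
    """Combine all values that share one key into a single output."""
    return hostname, sum(counts)


def run_mapreduce(records, map_fn, reduce_fn):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, v) for k, v in groups.items())
```

Usage, with made-up pages:

```python
pages = [
    ("http://a.example/x", "..."),
    ("http://a.example/y", "..."),
    ("http://b.example/", "..."),
]
print(run_mapreduce(pages, map_fn, reduce_fn))   # {'a.example': 2, 'b.example': 1}
```

The programmer writes only `map_fn` and `reduce_fn`; splitting the input across machines, grouping by key and handling failures is the framework's job.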

Map/Reduce is a very simple abstraction that makes it possible to write programs that run over these terabytes of data with little effort.

The third homegrown application is Google's Global Work Queue, which is for scheduling. Global Work Queue works like old-time batch processing. It schedules queries into batch jobs and places them on pools of machines. The setup is optimized for running random computations over tons of data.
Mostly, huge tasks are split into lots of small chunks, which provides even load balancing across machines. The idea is to have more tasks than machines so machines are never idle.
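Why "more tasks than machines" helps can be seen with a toy scheduler in Python. Nothing here comes from the actual Global Work Queue: the greedy "next free machine takes the next task" rule and the cost numbers are assumptions for illustration.

```python
import heapq


def schedule(task_costs, num_machines):
    """Greedy work queue: each task goes to the machine that frees up first.

    Returns the sorted finish times of the machines.
    """
    free_at = [(0, m) for m in range(num_machines)]   # (time machine is free, id)
    heapq.heapify(free_at)
    for cost in task_costs:
        time_free, machine = heapq.heappop(free_at)
        heapq.heappush(free_at, (time_free + cost, machine))
    return sorted(t for t, _ in free_at)


def split(total_work, num_tasks):
    """Cut one big job into num_tasks nearly equal pieces."""
    base, extra = divmod(total_work, num_tasks)
    return [base + 1 if i < extra else base for i in range(num_tasks)]
```

With four machines, four uneven tasks costing 70, 10, 10 and 10 leave three machines idle from time 10 while one works until 70. The same 100 units of work cut into 20 small tasks finish on all four machines at time 25.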

Google uses its massive architecture to learn from data. It analyzes the most common misspellings of queries, and uses that information to power the function that suggests alternate spellings for queries.
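The article gives no detail on how the spelling suggestions are computed. As a very rough illustration only, the Python sketch below assumes a table of how often each query was typed and suggests the most frequent query that looks similar to the one entered. The `suggest` function, the 0.8 similarity cutoff and the sample counts are all invented for the example.

```python
from collections import Counter
from difflib import get_close_matches


def suggest(query: str, query_counts: Counter):
    """Suggest the most frequently typed query that is similar to this one.

    Returns None when the query itself is already the most common variant.
    """
    # Candidates: logged queries whose similarity ratio is at least 0.8.
    candidates = get_close_matches(query, list(query_counts), n=5, cutoff=0.8)
    if not candidates:
        return None
    best = max(candidates, key=lambda q: query_counts[q])
    if best == query or query_counts[best] <= query_counts[query]:
        return None
    return best
```

The idea it demonstrates is simply that with enough logged queries, the correct spelling tends to be the popular neighbour of a misspelling.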

The company is also applying machine learning to its system to give better results. In theory, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cuisine" is a good match even though it contains none of the query words. To do this, the system tries to cluster concepts into "reasonably coherent" sub-clusters that seem related. These clusters, some tiny and some huge, are named automatically. Then, when a query comes in, the system produces a probability score for the various clusters. This kind of machine learning has had little success in academic trials, because researchers didn't have enough data. With enough data, reasonably good answers can be obtained.

Google's redundancy theory works on a meta level. One literal meltdown -- a fire at a data center in an undisclosed location -- brought out six fire trucks but didn't crash the system.

_______________________________________________________________________________

Source:
KERNEL (CSA Annual Magazine, BITS, Pilani)
Article by: D Sriram
Editor: Sridatta Chegu (Me)
02-06-2007, 07:28 PM   #2
Dark Star
Re: HOW GOOGLE WORKS ??

Awesome, never bothered to know that. BTW, thanks a lot, BITS gem.
02-06-2007, 11:57 PM   #3
ajaykumar.kataram
Re: HOW GOOGLE WORKS ??

see how it works
Attached Files
File Type: zip 42googleworks2ac.zip (554.3 KB, 18 views)
03-06-2007, 01:54 AM   #4
Darklord
Re: HOW GOOGLE WORKS ??

@Ajaykumar.. Thanks for the image..
03-06-2007, 01:49 PM   #5
Night_virus
Re: HOW GOOGLE WORKS ??

nice, explained in Image...
03-06-2007, 03:11 PM   #6
The Chosen One
Re: HOW GOOGLE WORKS ??

gr8,good job
05-06-2007, 06:11 PM   #7
sree
Re: HOW GOOGLE WORKS ??

good post!
05-06-2007, 09:32 PM   #8
Petrowhisky
Re: HOW GOOGLE WORKS ??

Thanx for the post dark lord and thanx ajaykumar for the image...