Open Source Clustering: Introduction to Apache Zookeeper

February 10, 2014

Open Source Clustering: Introduction to Apache Zookeeper

 _______  _______  _______  _______  __   __  _______ 
|   _   ||       ||   _   ||       ||  | |  ||       |
|  |_|  ||    _  ||  |_|  ||       ||  |_|  ||    ___|
|       ||   |_| ||       ||       ||       ||   |___ 
|       ||    ___||       ||      _||       ||    ___|
|   _   ||   |    |   _   ||     |_ |   _   ||   |___ 
|__| |__||___|    |__| |__||_______||__| |__||_______|

________    ______      ______    __   ___  _______   _______    _______    _______   _______   
("      "\  /    " \    /    " \  |/"| /  ")/"     "| /"     "|  |   __ "\  /"     "| /"      \  
 \___/   :)// ____  \  // ____  \ (: |/   /(: ______)(: ______)  (. |__) :)(: ______)|:        | 
   /  ___//  /    ) :)/  /    ) :)|    __/  \/    |   \/    |    |:  ____/  \/    |  |_____/   ) 
  //  \__(: (____/ //(: (____/ // (// _  \  // ___)_  // ___)_   (|  /      // ___)_  //      /  
 (:   / "\\        /  \        /  |: | \  \(:      "|(:      "| /|__/ \    (:      "||:  __   \  
  \_______)\"_____/    \"_____/   (__|  \__)\_______) \_______)(_______)    \_______)|__|  \___)

Introduction:

Apache Zookeeper was made for distributed systems and came from the open source community. The defacto standard when developing clusters in production. In a big data environment that uses Apache Hadoop you will definitely find Apache Zookeeper. Zookeeper can be embedded in Java and C. When you want to architect a distributed applications or multiple servers across an LAN or WAN network then Zookeeper is a good choose. Clients connect to Zookeeper server via TCP protocol. Once you get a connection you can do things like: send requests, get requests, get watch events, and send heartbeats.

If the connection drops then the client will connect to the next server. If you only run one zookeeper instance then your using singleton which can't do replication and if it crashs your stuck. One good feature of Zookeeper is that it sets the order by time stamping updates so if you make a series of sequential requests you can expect it to arrive and get processed in that order.

Note: The term "znodes" refers to a data node. Uses a shared hierarchical name space which contains znodes.

ZNode Types:

Persistent Nodes: is there until its deleted
Ephemeral Nodes: is there as long as the session is active. Can't have children
Sequence Nodes: (Unique naming) applies both persistent and ephemeral nodes

Znode Operation Types:

Write: Create, Delete, setData, setACL
Read: Exists, getChildren, getData, getACL, sync

ZNode Watchs:

NodeChildrenChanged
NodeCreated
NodeDataChanged
NodeDeleted

When you want to trigger something to happen when a znode changes then we set watches which are time triggers. Watches are ordered.

Tips to developers:

Since Zookeeper uses inmemory and it writes logs sequentially then you should set it up in its own server (box) because then it doesn't have to fight for resources. It is also recommended that you set the java heap to a reasonable size. Its also recommended that the amount you set on the heap isn't larger then the amount of memory available. To set it to 1gb to 2gb type:

JAVA_OPTS="-Xms1g -Xmx2g"

Clustering:

For Zookeeper to remain active in a cluster you need atleast 3 znodes (alive).

Good rule of thumb is to use odd number of servers. Remember this formula:

n-1 / 2

Benefits:

Coordination Service
Synchronization
Configuration Management
Grouping
Naming
High availability
Favors consistency over everything else
Order is preserved

Znodes are like files and directories. You can think of the shared hierarchical namespace like a linux file system. Top level always starts at root but that root directory can have directories and files. Znodes can also have data associated with it such as coordination data (status, configuration, location, etc.) . Between a byte to kilobytes approximately in size.

Common misconception:

You should not use Zookeeper as a data store it was not meant for that.

What's an Emsemble?

These are multiple Zookeeper replicated over multiple hosts.

Performance:

Apache Zookeeper stores its data inmemory so you can expect high throughput and low latency. Apache Zookeeper is faster on reads then on writes at a ratio of 10:1 . The reason why reads are faster is because read requests are processed locally at the Zookeeper server which the client is currently connected to. Write are forwarded to the leader and goes through majority consensus before the response is generated.

Remember: If the connection drops then the client will connect to the next server.

Consistency Guarantee:

Sequential Consistency: Ordered
Atomicity: All or nothing
Single System Image: all clients see the same thing
Reliability: data is persisted once applied
Timeliness: up to date

Problems solved by Apache Zookeeper:

Outages (if your doing replication your safe)
Coordination challenges
Operation Complexity. Have humans manually updating stuff on every server... Imagine 100,000 servers..
Data changing

What happens when I set up replication and the leader dies?

Zookeeper may be down for a second but if you configured it right then it goes through a voting algorithm to elect a new leader.

Other usages of Apache Zookeeper out in the wild:

Distributed Locking: (safe) You can implement barriers and latches
Discover Service: You can use DNS but Apache Zookeeper is much easier
Configuration: I think that dynamic configuration is a bit risky
Apache Hadoop: Uses it for service management

Drop some feedback in the comments below!
:>

Thanks,
Zak
Twitter: @Prospect1010
Linked: http://www.linkedin.com/in/zakeriahassan
Github: https://github.com/zmhassan

Search This Blog

Open Source Blog

Open Source Clustering: Introduction to Apache Zookeeper

Introduction:

Comments

Post a Comment

Popular Posts

How to use gettext() function in nunjucks for localization in Webmaker.org