“GlusterFS replication can happen on just 2 nodes as a minimum, as opposed to 3 with HDFS.”
So this little tidbit was tucked into the Gluster marketing material for 3.3.
Note that we use Gluster internally and it's been a pretty solid system. That said, they need to do a little more research before they post that blurb. First of all, three-way replication is the recommended setting in a Hadoop cluster, not a minimum. You can easily run with two or four replicas if you want to; it's all about how much redundancy you want. Second of all, Hadoop has you run your disks as JBOD, while Gluster wants you to RAID them, and RAID eats capacity of its own. So the space saving is going to be very small, similar to what I found in the exercise I went through comparing Greenplum to Hadoop disk usage, which was really not that much. As a selling point for using Gluster as a replacement for HDFS, this one just doesn't hold up.
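To put rough numbers on the disk-usage argument, here is a minimal Python sketch. The 12 disks per node and the RAID 6 layout (two parity disks) are my own assumptions for illustration, not figures from the Gluster material, and `usable_fraction` is a hypothetical helper, not part of either system.

```python
def usable_fraction(replicas, disks_per_node, parity_disks=0):
    """Fraction of raw disk capacity left for data after RAID parity and replication."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0 <= parity_disks < disks_per_node:
        raise ValueError("parity_disks must be between 0 and disks_per_node - 1")
    # Share of each node's raw capacity that survives RAID parity overhead.
    raid_efficiency = (disks_per_node - parity_disks) / disks_per_node
    # Every block is stored `replicas` times across the cluster.
    return raid_efficiency / replicas


# Assumed layouts: 12 disks per node in all cases.
layouts = {
    "HDFS, JBOD, 3 replicas": usable_fraction(replicas=3, disks_per_node=12),
    "Gluster, RAID 6 (assumed), 2 replicas": usable_fraction(
        replicas=2, disks_per_node=12, parity_disks=2
    ),
    "HDFS, JBOD, 2 replicas": usable_fraction(replicas=2, disks_per_node=12),
}

for name, fraction in layouts.items():
    print(f"{name}: {fraction:.1%} of raw capacity usable")
```

Under those assumptions the three layouts come out to roughly 33%, 42% and 50% usable. The exact gap depends heavily on the RAID level you pick, but the point stands: once you compare like for like at two replicas, the RAID overhead means Gluster is not ahead on space.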