
Roku Rocks

Posted by scottk on June 13, 2010 in Ramblings |

I picked up a Roku DVP box last week. It's a little box about the same height as, and a quarter the size of, your standard DVD/VCR player, and it handles both wired and wireless networking to reach the internet. What sparked this is that for a while I've been thinking we are paying way too much for a cable service we rarely watch. In fact, the only things that get watched are shows recorded from Sprout/Nick for the kids and Nebraska football games. Most of the content we watch day to day lives on a file share and gets circulated to the XBMC machines in the house.

Getting the Roku out of the box and hooked up took no time at all; it's dead simple if you know how to hook up standard TV components. Once it was in, I must say I was disappointed that there isn't an out-of-the-box solution for reading from our share on the Roku, but I did find a good tutorial on getting MyMedia up and running, which adds this functionality. About an hour of work (after finding the tutorial) installing Python and putting the box in developer mode, and now I have access to my shared structure. I am having to re-encode a lot of the previously ripped files from .avi to .mp4 (H.264), but I found a handy batch script that lets Handbrake iterate through a whole directory of files and fix them up.
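I don't have the exact script in front of me, but the loop boils down to something like this: a rough sketch assuming HandBrakeCLI is installed and the share is mounted at /share/video (the path and preset are placeholders):

# Re-encode every .avi on the share into an .mp4 sitting alongside it
for f in /share/video/*.avi; do
  HandBrakeCLI -i "$f" -o "${f%.avi}.mp4" --preset "Normal"
done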

The big wins on the Roku are the Netflix and Amazon Video on Demand services. With Netflix it will pick up whatever you've dropped into your instant queue and let you watch it. This makes Netflix a million times more valuable to us, since it's no longer tied to ordering DVDs or streaming them through a laptop, and the interface is simple enough that the kids could work it. The Amazon Video on Demand service is amazing as well. For under $20 I bought a full season of Super Why and did the same for a full season of Blue's Clues. In the future I can get at the Amazon service from any Flash-enabled, internet-connected device, so all of our laptops and, I'm thinking, my Android phone once it gets the FroYo update.

The only thing I still want from this device is some way to get Big Ten football games now that the Huskers have moved over. As far as I understand, the Big Ten Network streams its broadcasts, so I don't think it will take much for that piece to become a reality.

Roku = Win


How big is a Yottabyte

Posted by scottk on June 6, 2010 in Ramblings |

from here


re.last.fm

Posted by scottk on June 6, 2010 in Ramblings |

Giving Last.fm a try again. My last Last.fm usage was December 2008, according to my profile. I've been using Pandora for quite a long time and don't really have any issues with it; I just figure it's good to check back on the alternatives every once in a while and see where they are now.

I ran into a bit of an issue on my Ubuntu 10.04 laptop getting Banshee and Last.fm to play together nicely. It wouldn't recognize that my Last.fm account was authorized in the Banshee player. I did the following three things and the issue was resolved:

  1. rm ~/.cache/banshee-1/extensions/last.fm/audioscrobbler-queue.xml
  2. gconftool-2 --unset /apps/banshee-1/plugins/lastfm/session_key
  3. gconftool-2 --unset /apps/banshee-1/plugins/lastfm/username

Hints taken from here


What the smux

Posted by scottk on May 28, 2010 in Sysadmin |

We've been running into issues where Dell OpenManage doesn't want to allow SNMP into its service in order to gather information. This is normally done via SMUX, which is set up in snmpd.conf:

# Allow Systems Management Data Engine SNMP to connect to snmpd using SMUX
smuxpeer .1.3.6.1.4.1.674.10892.1

What I've come to find out is that on some of our servers the SMUX password has been set to something strange. In order to fix this you need to make a call to the Dell utility dcecfg. For our servers it was one of two calls:

/opt/dell/srvadmin/dataeng/bin/dcecfg32 command=setsmuxpassword password=
/opt/dell/srvadmin/sbin/dcecfg command=setsmuxpassword password=

After that you need to restart the snmp services:

srvadmin-services.sh restart
/etc/init.d/snmpd restart

And you should be golden. The best way to test is to make an SNMP call to the Dell MIBs:

snmpwalk -v2c -c <your community> -On <your server> .1.3.6.1.4.1.674.10892.1
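If the walk still comes back empty, one extra sanity check I like (my own habit, not something out of Dell's docs) is to confirm snmpd actually has the SMUX port open, since SMUX peers connect over TCP 199:

# snmpd should be listening on the SMUX port when smuxpeer is configured
netstat -lnt | grep ':199'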


Android 6 months later

Posted by scottk on May 24, 2010 in Ramblings |

It's six months into my Android time, so I thought I'd rattle off the apps I'm still using:

  • Twitter App has replaced Twidroid
  • Pandora
  • Default Browser
  • AppMind
  • ColorNote
  • Touchdown
  • Gowalla
  • AndChat
  • WaveSecure
  • Dropbox
  • ES File Explorer

I also make extensive use of the camera and its features for posting to Twitter, Facebook, Gmail, and Picasa.



Greenplum vs Hadoop Disk Space

Posted by scottk on May 24, 2010 in Ramblings |

I've been spending a whole lot of time calculating Greenplum vs Hadoop disk usage. So here is the general equation:

(MaxAllocFactor * DiskSize * ( #Disk - RaidDisks ) ) / ReplicationFactor

MaxAllocFactor = Max recommended allocation. 70% for Greenplum and 75% for Hadoop

DiskSize = Size of your drive

#Disk = Number of drives

RaidDisks = Number of disks eaten up by RAID; for Hadoop this is 0

ReplicationFactor = In Greenplum everything is mirrored, so the replication factor is 2. Hadoop recommends three copies of data, so it gets a replication factor of 3.

So let's look at a 24-drive attached storage array; we'll use 500GB drives.

(MaxAllocFactor * DiskSize * ( #Disk - RaidDisks ) ) / ReplicationFactor
Greenplum: ( .70 * 500GB * ( 24 - 4 ) ) / 2 = 3.5 TB effective space
Hadoop: ( .75 * 500GB * ( 24 - 0 ) ) / 3 = 3.0 TB effective space

Next we'll look at a single server, let's say a 1U with four 3.5″ 2TB drives:

Greenplum: ( .70 * 2TB * ( 4 - 1 ) ) / 2 = 2.1 TB effective space
Hadoop: ( .75 * 2TB * ( 4 - 0 ) ) / 3 = 2 TB effective space

How about a single 2U server with 12 1TB drives:

Greenplum: ( .70 * 1TB * ( 12 - 2 ) ) / 2 = 3.5 TB effective space
Hadoop: ( .75 * 1TB * ( 12 - 0 ) ) / 3 = 3 TB effective space
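For quick what-if math, the same formula drops neatly into a shell function. This is just a sketch (it leans on awk for the floating-point arithmetic); the two example calls reproduce the 24-drive 500GB case above:

# effective_space alloc_factor disk_size_tb num_disks raid_disks replication
effective_space() {
  awk -v a="$1" -v s="$2" -v n="$3" -v r="$4" -v rep="$5" \
    'BEGIN { printf "%.2f TB\n", a * s * (n - r) / rep }'
}
effective_space 0.70 0.5 24 4 2   # Greenplum, 24 x 500GB drives -> 3.50 TB
effective_space 0.75 0.5 24 0 3   # Hadoop, 24 x 500GB drives -> 3.00 TB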

So what does this mean? It means that you shouldn't run laughing to the bank on your backend savings by choosing Hadoop over Greenplum, given you plan to use the same storage architecture. Greenplum and Hadoop are two very different technologies, so comparing the two is kind of silly in the first place. They fall into the same category of processing large datasets in the same way that a Ford F-350 and a Mazda Miata are both vehicles. They will both get you down the road, but in an entirely different manner.

Don't talk to me about compression factors; everyone wants to tell you how their grandmother in Pensacola got 20x compression on system X. System X never happens to be my system, so I've stopped drinking the compression-factor Kool-Aid.


The Ops Data Choke

Posted by scottk on April 14, 2010 in Ramblings |

With all this chatter about big database warehouses and driving deep into business data to bring out insights, I think it's also good to take a look at one of the other areas that has experienced a huge choke on data and has been running into it for decades. There was a lot of talk these past couple of days about all the application data that is out there because new remote sensor devices (like smartphones) are creating new streams of data. Interestingly, if you look in your data center you already have platforms with the potential to generate volumes upon volumes of sensor data, a huge percentage of which has traditionally been thrown away. The concepts of RRD files and sar data collection exist on the premise that there is no freakin' way you are going to be able to collect all that data and store it long term.

We are now just getting to a point where turning on the pieces that generate system data isn't so much of a concern. Previously, had you been running a dual single-core server, the idea of setting sar or even SNMP data collection to every minute gave you great pause; percentage-wise that's a lot of cycles spent on system status instead of the application. With dual quad cores that kind of concern is starting to pass away, and quad will soon be a small number of cores to be running on one processor. Once you had the data, pushing it over a 10Mb line could take up a fairly decent swath of that bandwidth, but we're at 1Gb now and will most likely see 10Gb as the de facto standard in the near future, so our network layer becomes less of a concern. The last hurdle standing in the way is a decent way to collect and store this data. It's not only the storage size for all of the incoming data; the constant flow going into the system also needs to be handled well. Additionally, we need to be able to pull it back out quickly to be notified when change happens and trigger events. There isn't a clear-cut solution to this problem, though I think we are starting to see things that could take it on.

If only we could get down to the raw level of the stored data, so that when we go back to compare we can look at exact counters instead of min, max and average values over a fifteen-minute period. How awesome would it be to use things like K-means clustering to assign fingerprints to servers, which could let ops be more predictive about failure horizons based on events that look unrelated to the naked eye. To be able to look back with precision at how much traffic our network was pushing two years ago and look for spikes in sub-minute time frames within that period. To have fine-grained enough power usage data to determine what device is plugged into an outlet based on its fingerprint over time. To be able to see when the cost of purchasing new hardware will outweigh the cost of running legacy hardware, based upon very concrete numbers and a model around them. All these things are there and they are not very far away.


End of Days

Posted by scottk on April 13, 2010 in Ramblings |

Greenplum Days has come to a close, and the last two sessions showed they saved the best for last. The Expansion/Fault Tolerance talk was far and away the pinnacle learning session, highlighting how 4.0 fixes some of the pain points in the 3.3.x version, and you can see the foundational building blocks for some interesting things coming into place. The following talk was MAD Skills: Advanced Analytics – Driving the Future of Data Warehousing and Analytics. I had the notion I was biting off way more than I could chew by stepping into those sessions, since I approach things more from a systems side. I was surprised to find, though, that I followed right along with how scalable vectors and K-means clustering could be applied immediately, and in a variety of ways, to the business we do at Adknowledge.

I'm also on board with the tenets of the MAD (Magnetic, Agile, Deep) skills ideology, the Magnetic tenet being the one I identify with the most. This tenet really thrives on the idea of an approachable, omnivorous database that ingests all data thrown at it. Not because you bend and shape the data to shoehorn it into the database, but because the database is large and powerful enough to soak up all the data as is, internalize it, and allow people to manipulate it. As someone who really likes data, I'm always suspicious of data that I know has been massaged before it's put anywhere. I want my data in the rawest form so I can massage it and decide what is important and what isn't, not be held to what someone who wrote an ETL process six months ago thought was important. Once you have all of this raw data in a system, it is hard to resist getting in, screwing around with it, and trying to pick out insights. If the bar to access this data is low, it becomes a magnet for those of us who like to find insight in data. In order to build even deeper insight, these victims who have been sucked in want to add more data to enhance their new models. The new data, and the insights derived from it, add to the magnetism, and you fall into a nice feedback loop. Eventually nobody knows which came first, the people and their insights or the data. It doesn't really matter, because they keep piling on and feeding off each other. I've seen this work, and I've seen it happen on a much smaller scale. It's only now, as processing power, storage, bandwidth and great database implementations form a magical brew, that we can see this begin to take life on a much grander scale.

Data manipulation, storage and sharing are going to see some big changes over the next few years. Core beliefs in how we store things and what we do with them are going to be repeatedly challenged. It’s going to be a very exciting few years for data. I’m looking forward to it.


Greenplum Days – Day 2

Posted by scottk on April 13, 2010 in Ramblings |

At this point it's the afternoon of Day 2 of the two-day Greenplum Days conference. As a non-C-level person I have gotten very little exposure to the inner workings of Greenplum Chorus; I have a feeling that isn't an accident, as people at my level would nail Greenplum with a multitude of technical questions about a product that is in its early stages. What I have gathered from the bits and pieces I've come across is that Chorus is going to be a social networking tool wrapped around data intelligence and analytics. Holy hand grenades, does that sound extremely unexciting. Really, though, it is exciting, and it's what data has been screaming for. You see more and more companies opening up APIs to get at their data on the web, and data sharing is becoming more commonplace. Tim O'Reilly and Scott McNealy, while on totally separate ends of the political spectrum, both agree that Open Data is the next big thing and where there is going to be lots of energy in the coming years. In order to compete with data giants like Google or Microsoft, the little guys are going to need to share data. It's obvious; the crazy thing is that we're talking about company-to-company data sharing when, for most businesses out there, just sharing data and insights internally prods a sore spot.

Chorus looks to be a gateway to the data and a set of tools to chronicle the insights gained on those datasets, not only for your Greenplum cluster but aimed at pulling in any sort of data that is out there. Software engineers seem to have had all the fun for the last decade or so, and now it's the data engineers' turn to step up and play, because the hardware and tools are there to allow sculptures to be created out of the seemingly infinite sand. Right now Chorus seems to be more of an idea, and if the product representation of it works it will launch new data insights throughout an enterprise. If it works in that realm, the next challenge will be to see if it launches a revolution in cross-company business sharing. I don't know if Chorus will be the solution for that, or if some other representation of the idea will be the one that steps up to the plate and knocks it out of the park. It will happen though; the pitches are being thrown, batters are warming up, and the game is already in progress.


What’s that Greenplum thing

Posted by scottk on April 10, 2010 in Ramblings |

Tomorrow I catch my flight to Las Vegas so I can attend Greenplum Days on Monday and Tuesday. I'm pretty excited about this trip even though I will be in Vegas and essentially spending all of my time in a conference. You see, for the last month or so I've been working heavily with the Greenplum Database. As someone with a past in all aspects of database interaction, from DBA to programmer to sysadmin, I find the product to be a very powerful offering. It still has a little maturing to do, but it is currently dazzling me with some of the things we've been able to do.

Stepping back a little bit, I should explain how Greenplum Database is different from your traditional database offering. In many aspects a database is very similar to a book, and in this instance I'll compare it to an encyclopedia (those things where you found info before Google/Wikipedia). If you only have a small amount of information for your encyclopedia, the book form is fairly simple. Ideally this information is in alphabetical order, and finding an entry is as simple as opening up the book and finding the correct page between the two covers. If you are lucky there may even be a table of contents or index where you can look up a term and go directly to it. This is exactly what a database is doing in a digital setting. As a business that uses that data, you have the book as the central source of the information and one person who spends their day looking things up. As business ramps up, this one-person scenario might not be fast enough for the company, so you look to other solutions. The traditional database method was to hire someone who is faster at looking things up (a bigger server); you might even hire Rain Man because he can memorize the whole thing. Another approach that has become fairly common is going out and buying another copy of the encyclopedia and hiring someone to read it (replication). This works out great in that not only does it double the number of questions you can answer, but if person #1 calls in sick or Rain Man leaves to accompany me to Las Vegas, you still have someone there to answer those questions. It gets a little more complicated in that when the next version of the encyclopedia comes out you need to make sure they both get it at the same time so they don't give out conflicting information, but all in all this is a fairly good solution.

This method of handling databases has historically worked out well, as people accept the bounds of what they can do. More and more, though, companies don't want to know what's in the encyclopedia; they want to know what's in the whole library. They also don't want to know what the entry on Edgar Allan Poe says; instead they want any entry that might talk about Poe, or potentially any entry that might mention literary works from the early 19th century. Your traditional ways of looking for data break down very quickly here. Nobody looks through every book that comes into the library and notes every mention of such things, and one person searching through the entire library to answer that question is a monumental task that can't be done in a timely manner.

In the real world what you would do is hire a team of people and divide up the library so each person is in charge of a specific section; the more people you bring in, the faster this is going to go. This runs well if, of course, you have a good system and someone very competent overseeing the whole project. It is this same track that databases have started to take, commonly referred to as sharding, and this is what Greenplum does. It splits the data up among a set of servers, so when a question comes in it asks all of the workers to look at their smaller set of data and report back, and the manager consolidates those results and comes up with the answer. This works extremely well for massive sets of data compared to the old way of doing things. Greenplum is not the first to tackle the problem in this manner, but I do believe they are the first to abstract away most of the complication of setting up and maintaining such a system at a reasonable price.
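To make that concrete, when you create a table in Greenplum you tell it which column the rows should be split on, and the system spreads them across the segment servers for you. A minimal sketch, assuming a psql client on the master and made-up database and table names:

# Hypothetical example: rows get distributed to segments by the hash of view_id
psql demo -c "CREATE TABLE page_views (view_id bigint, url text, viewed_at timestamp) DISTRIBUTED BY (view_id);"

Any query against page_views is then farmed out to every segment, and the master stitches the answers back together.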

So tomorrow I head out to Vegas to be a part of Greenplum's announcement of their new software version as well as something called Greenplum Chorus. Greenplum 4.0 looks very promising and is going to make a huge impact on our usage. As for Chorus, they've been touting it as the next big thing in data warehousing, so it will be interesting to find out exactly what it is.

