[wplug] Thoughts on a Storage System

Duncan Hutty dhutty at ece.cmu.edu
Wed May 20 09:25:32 EDT 2009


Michael Semcheski wrote:
> Hey All,
> 
> I'm currently looking at some different options for providing lots of
> storage to a few applications.  The applications are in development
> and use in-house.  We analyze video data, and if you record a lot of
> video, you need a lot of disk space.  We have other applications that
> may be coming online in a few months that could use the space too.
> Typically though, we have no need to access the data via the file
> system - doing everything via API would be a-ok.
> 
> I've been spending some time here and there to see if there are any
> compelling open source projects that we should trial.  Hadoop is
> definitely on the radar, but it's not a perfect fit.  It seems designed
> for Java, and though it supports C++, I'm not sure if there's
> first class support.  Also, there's a fad for things that scale up to
> 1000's of nodes.  That has its place, but many of those systems don't
> scale down to three or four nodes as well.  CAStor, from Caringo, seems
> excellent, but it blows our budget and our cost per GB.
> 
> 
> Anyway, here are the requirements I've come up with so far:
> 
>    1. Able to put, get, and delete data.
>    2. Able to run efficiently on as few as three nodes.
>    3. If there are multiple nodes, be able to use capacity for
> redundancy / failover / balancing.
>    4. Each additional node adds to the total storage available.
>    5. Good linear read / write performance.
>    6. Must be able to recover from multi-node failure.
>    7. System stays online if 15% of the nodes are offline.
>    8. Data can be deleted - we do not have unlimited storage space.
>    9. Able to scale up to many TB of total space. (ie, currently we
> have about 10TB, but could easily get another 10TB in the next year.)
> 
> 
> And these are a few of the "wishlist" items I've come up with - not
> requirements but they would be neato:
> 
>    1. Support for different tiers of storage. (ie, most recent data
> goes to the faster tier, and is moved to the slower tier over time.)
>    2. Integration with a distributed job processing system.
>    3. Support for local clients. (ie, clients can cache a portion of
> the data for offline or disconnected operation.)
> 
> 
> Anybody have any thoughts?  Know of something that does what we want,
> or something similar?
> 

Storage is something that's been on my OneDay list for a while now and 
this is the project that I have at the back of my mind to look into:
http://www.gluster.org

I'm sure wplug would be grateful if you make a comparison of some of the 
FOSS options, their advantages and disadvantages:)
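
For anyone doing that comparison, it may help to have a toy model of
requirements 1-4 and 7 to hold candidates against. The sketch below is
only that: an in-memory illustration in Python, not how Gluster, Hadoop
or CAStor actually work. The names (TinyStore, Node, replicas) are made
up for the example, and the ring-style placement is just one assumed way
to spread copies across a handful of nodes.

```python
import hashlib


class Node:
    """One storage node; `online` is toggled by hand to simulate failure."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.blobs = {}


class TinyStore:
    """Hypothetical put/get/delete store that keeps `replicas` copies per key."""

    def __init__(self, node_names, replicas=2):
        self.nodes = [Node(n) for n in node_names]
        self.replicas = min(replicas, len(self.nodes))

    def _placement(self, key):
        # Hash the key to a starting node, then take the next nodes in ring order.
        start = int(hashlib.sha1(key.encode()).hexdigest(), 16) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, data):
        targets = [n for n in self._placement(key) if n.online]
        if not targets:
            raise IOError("no replica online for " + key)
        for n in targets:
            n.blobs[key] = data
        return len(targets)  # number of copies actually written

    def get(self, key):
        # Any online replica that holds the key can serve the read.
        for n in self._placement(key):
            if n.online and key in n.blobs:
                return n.blobs[key]
        raise KeyError(key)

    def delete(self, key):
        # Simplification: copies on offline nodes are left behind; a real
        # system would need to reconcile them when the node returns.
        for n in self._placement(key):
            if n.online:
                n.blobs.pop(key, None)


store = TinyStore(["n1", "n2", "n3"], replicas=2)
store.put("clip-001", b"frames")
store._placement("clip-001")[0].online = False   # lose one of three nodes
print(store.get("clip-001"))                     # still served by the other copy
```

The obvious trade-off is that two copies of everything halves usable
space, so 10TB of raw disk gives roughly 5TB of storage. That is worth
keeping in mind when comparing cost per GB between options.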
--
Duncan Hutty

