[wplug] Thoughts on a Storage System
Duncan Hutty
dhutty at ece.cmu.edu
Wed May 20 09:25:32 EDT 2009
Michael Semcheski wrote:
> Hey All,
>
> I'm currently looking at some different options for providing lots of
> storage to a few applications. The applications are in development
> and use in-house. We analyze video data, and if you record a lot of
> video, you need a lot of disk space. We have other applications that
> may be coming online in a few months that could use the space too.
> Typically though, we have no need to access the data via the file
> system - doing everything via API would be a-ok.
>
> I've been spending some time here and there to see if there are any
> compelling open source projects that we should trial. Hadoop is
> definitely on the radar, but it's not a perfect fit. It seems designed
> for Java, and though it supports C++, I'm not sure if there's
> first class support. Also, there's a fad for things that scale up to
> thousands of nodes. That has its place, but many of those systems don't
> scale down to three or four nodes as well. CAStor, from Caringo, seems
> excellent, but it blows our budget and our cost per GB.
>
>
> Anyway, here are the requirements I've come up with so far:
>
> 1. Able to put, get, and delete data.
> 2. Able to run efficiently on as few as three nodes.
> 3. If there are multiple nodes, be able to use capacity for
> redundancy / failover / balancing.
> 4. Each additional node adds to the total storage available.
> 5. Good linear read / write performance.
> 6. Must be able to recover from multi-node failure.
> 7. System stays online if 15% of the nodes are offline.
> 8. Data can be deleted - we do not have unlimited storage space.
> 9. Able to scale up to many TB of total space. (i.e., currently we
> have about 10TB, but could easily get another 10TB in the next year.)
>
>
> And these are a few of the "wishlist" items I've come up with - not
> requirements but they would be neato:
>
> 1. Support for different tiers of storage. (i.e., most recent data
> goes to the faster tier, and is moved to the slower tier over time.)
> 2. Integration with a distributed job processing system.
> 3. Support for local clients. (ie, clients can cache a portion of
> the data for offline or disconnected operation.)
>
>
> Anybody have any thoughts? Know of something that does what we want,
> or something similar?
>
Storage is something that's been on my OneDay list for a while now and
this is the project that I have at the back of my mind to look into:
http://www.gluster.org
I'm sure wplug would be grateful if you wrote up a comparison of some of
the FOSS options, with their advantages and disadvantages :)
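When comparing candidates, it may help to pin the quoted requirements
down as a tiny interface. What follows is only a sketch of requirements
1, 3 and 6 (put/get/delete, redundancy, surviving node failure). The
ReplicatedStore class, its method names, and the hash-ring placement are
all hypothetical and invented for illustration; none of this is the API
of GlusterFS, Hadoop, or CAStor.

```python
import hashlib


class ReplicatedStore:
    """Toy in-memory model: each object is copied to `replicas` nodes."""

    def __init__(self, node_names, replicas=2):
        if replicas > len(node_names):
            raise ValueError("replicas cannot exceed node count")
        self.nodes = {name: {} for name in node_names}  # node -> {key: data}
        self.order = sorted(node_names)
        self.replicas = replicas
        self.offline = set()  # names of nodes currently treated as down

    def _placement(self, key):
        # Hash the key to pick a starting node, then take the following
        # nodes in ring order so the replicas land on distinct nodes.
        start = int(hashlib.sha1(key.encode()).hexdigest(), 16) % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.replicas)]

    def put(self, key, data):
        targets = [n for n in self._placement(key) if n not in self.offline]
        if not targets:
            raise RuntimeError("no replica node online")
        for n in targets:
            self.nodes[n][key] = data

    def get(self, key):
        # Return the first copy found on a node that is still online.
        for n in self._placement(key):
            if n not in self.offline and key in self.nodes[n]:
                return self.nodes[n][key]
        raise KeyError(key)

    def delete(self, key):
        # Toy simplification: also clears copies held by offline nodes.
        for n in self._placement(key):
            self.nodes[n].pop(key, None)


# Three nodes, two copies of everything: one node can fail without data loss.
store = ReplicatedStore(["node1", "node2", "node3"], replicas=2)
store.put("clip-001", b"video bytes")
store.offline.add(store._placement("clip-001")[0])  # simulate a failure
recovered = store.get("clip-001")                    # served by the second copy
```

A real system also has to re-replicate data after a failure and rebalance
when nodes are added, which is where the candidates are likely to differ
most.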
--
Duncan Hutty