No one has said it yet, but this is awesome. As someone working through bunches of books at the moment, it's great to see some kind of structure to at least guide me while I go about my own things.
Even if I don't use the courses fully, I'll certainly take bits and pieces as I go forward.
Sorry about the confusion - 'Big Data' is something I haven't delved into at all yet - I'm following a discrete maths course along with JUST getting into Ruby/Rails & Android development. I'm new to CS, and only recently followed along with www.cs50.net last fall. I did, however, ask my brother-in-law, whose main focus is big data in the medical field, about it. I'll post his recommendations if any later on - so check this in a day or so if you can.
Thanks for the heads up, I downloaded a cache of courses that were recorded and made available online some time ago - one of those being Linear Algebra. I'll have to work through it soon!
My background's in CS and I've mostly been a web developer for several years. Only know the normal SQL stuff but wouldn't consider myself an expert. I'm also interested in OLAP but I can't seem to find anything on it other than what it is.
Just to clarify, you get a $25 credit for Amazon Web Services which is meant to offset the cost of the AWS you use during the course. It's not like a $25 Amazon gift card.
Cloudera was my number one source of Hadoop information while I was getting my first cluster running and useful. I would highly recommend them as a resource.
I haven't had time to watch the BDU courses, so I can't give an opinion on them, though I plan to watch them when I get some time tomorrow. From a quick glance, though, the BDU ones appear to have more hands-on material. The Cloudera videos are parts of their on-site course, which would have the hands-on stuff, but that has been trimmed from the videos.
I guess it depends on how you best learn. For me, the Cloudera videos gave me a good overview and understanding of the various components, which prepared me to dig in deeper. Combined with the Hadoop book, I was able to set up and run a small (12-node) cluster and use it for data storage and report generation.
Thanks for providing these amazing resources, and completely FREE at that. woot!
But the UX on the site is terrible.
If & when time permits, please take a look at the Coursera, Udacity, Codecademy, Udemy, and Lore (lore.com, formerly coursekit) sites.
User engagement is directly proportional to usability and a smooth, pleasing UX (user experience).
It seems that as the amount of data being produced continues to expand at an unprecedented rate, it will become essential to master Hadoop, which seems to be the gold standard for managing big data. Would anyone disagree with this?
Well, you will encounter data in many forms. Sometimes it will already have been "hadooped-down" by someone else, and you can analyze it on a single machine. Don't underestimate what a single machine can do these days if you have, say, 16 cores and 32 GB of RAM.
Or you can set up a system that will incrementally summarize the data, and then you could do smaller queries against those summaries. That is the goal of Storm AFAIK.
I think that is a better model for a lot of applications. The model of having your production systems save terabytes of raw data and then analyzing it in a big batch job leaves a lot to be desired. It works, but it's not very flexible and it has a latency problem: you can't query the results until the batch job finishes.
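To make the "incrementally summarize, then query the summaries" idea concrete, here is a rough plain-Python sketch. It is not Storm's API, just an illustration of the pattern; the `RunningSummary` class, the event names, and the latency numbers are all made up for the example.

```python
from collections import defaultdict


class RunningSummary:
    """Hypothetical helper: keeps per-key count, sum, and max,
    updated one event at a time instead of in a big batch job."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)
        self.maximum = {}

    def update(self, key, value):
        # Fold a single event into the summaries; the raw event can then be dropped.
        self.count[key] += 1
        self.total[key] += value
        if key not in self.maximum or value > self.maximum[key]:
            self.maximum[key] = value

    def mean(self, key):
        # Answer a query from the summaries alone, without rescanning raw data.
        seen = self.count.get(key, 0)
        if seen == 0:
            raise KeyError(key)
        return self.total[key] / seen


# Made-up stream of (page, latency in ms) events.
events = [("checkout", 120.0), ("search", 35.0), ("checkout", 80.0), ("search", 45.0)]

summary = RunningSummary()
for page, latency_ms in events:
    summary.update(page, latency_ms)

print(summary.mean("checkout"))
print(summary.maximum["search"])
```

The point is that queries hit a small, always-current summary rather than waiting on a job over the raw data. The tradeoff is that you have to decide up front which summaries to keep.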
Hadoop is good in that it's the only open source solution I know of that can churn through hundreds of terabytes of data. But I wouldn't say it's a complete solution for "managing big data". It's part of one.
I wouldn't be surprised if Hadoop (or another Map/Reduce implementation) becomes a sort of "assembly language" for big data, with higher-level abstractions built on top. You can already start to see this happen with Hive and Pig.
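As a rough illustration of the "assembly language" point, here is a toy single-process version in plain Python. It is not Hadoop's, Hive's, or Pig's API; `map_reduce`, `word_count`, and `count_by` are hypothetical names invented for this sketch. The low-level primitive takes a mapper and a reducer, and the higher-level helper hides both behind something closer to a GROUP BY with a count.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy in-memory Map/Reduce: map each record to (key, value) pairs,
    group the values by key (the "shuffle"), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count(lines):
    # The classic example written directly against the low-level primitive.
    return map_reduce(
        lines,
        mapper=lambda line: [(word.lower(), 1) for word in line.split()],
        reducer=lambda key, values: sum(values),
    )


def count_by(records, key_fn):
    # Hypothetical higher-level helper: the caller only names the grouping key,
    # and the mapper/reducer plumbing is generated for them.
    return map_reduce(
        records,
        mapper=lambda record: [(key_fn(record), 1)],
        reducer=lambda key, values: sum(values),
    )


counts = word_count(["big data is big", "data is data"])
print(counts)

lengths = count_by(["pig", "hive", "storm", "hdfs"], len)
print(lengths)
```

A real cluster distributes the map, shuffle, and reduce steps across machines, but the layering is the same: most people would rather write the `count_by` line than the mapper and reducer by hand.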
Has anyone taken the Cloudera courses who can compare these to them? My employer is going to spring $2800 for a Cloudera Hadoop training. I wonder if it's worth it.
To answer my own question, articles and downloads all point to the IBM website, and the first lesson teaches you to "Get started with Hadoop-based data analytics on IBM Cloud" so it's clearly IBM pushing for their BigData solution based on Hadoop.
Great stuff.