Basics of Machine Learning (ed.ac.uk)
253 points by luu on May 18, 2014 | hide | past | favorite | 32 comments


If you want to get into machine learning, it is actually pretty easy, provided you studied math, engineering, or science at university. The papers are hard at first, but they don't hide anything the way papers in some other fields do. Everything is laid out there for you to code and run with your own data. In a field like, say, structural engineering, the authors can make claims about the structural resilience of an ultra-high-performance concrete that they tested, without you ever being able to repeat the exact experiment. You may not even be able to get your hands on the proprietary compound they used.

In ML, you might not be able to use the same corpus / training set, but you're usually able to recode the actual algorithm and you're usually able to find a compatible type of data set to work off of.

Also, most ML people are lazy, so if they don't work for Google or Facebook they're usually using open datasets anyway, which are trivial to drop in and verify.


> most ML people are lazy

That's a great virtue for programmers in general.

"Laziness: The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it. Hence, the first great virtue of a programmer. Also hence, this book. See also impatience and hubris. (Larry Wall, Programming Perl, 1st edition)"

> they're usually using open data datasets anyway

They do that in order to be able to judge the performance of various algorithms - they need to have a common standard to test against.


I use lazy in the positive sense; I'd actually forgotten that it had negative connotations when I wrote the above.


Bill Gates once said: "I choose a lazy person to do a hard job, because a lazy person will find an easy way to do it."

source - http://en.wikiquote.org/wiki/Talk:Bill_Gates#Lazy_person_to_...


>Also, most ML people are lazy

As someone who has written a fair amount of machine learning code, I could not agree more. It is an inelegant way to solve a problem by brute force.

Which, of course, is occasionally very useful! But it is often lazy.

Though it sounds really cool.


I'm curious -- does one need to consult these papers when using algorithms from libraries such as scikit-learn (Python)?

I ask because the field seems so broad, and having completed Andrew Ng's original ML course, I'm not sure which way to go...

thanks.


Hi, machine learning instructor from Zipfian Academy here.

We teach scikit-learn at our bootcamp; the people who apply to our program typically have some sort of scientific background, have completed Ng's course, or are software engineers.

What we have found in the classroom is that if you understand the fundamentals of machine learning algorithms, you don't have to understand all the minutiae of an algorithm as a user.

The hardest problem to solve is figuring out how to encode data such that the algorithm learns properly, as well as all the preprocessing steps that typically go into machine learning.

The main things worth understanding for an algorithm are the hyperparameters associated with the respective model, such as the learning rate or regularization coefficients.

If you understand what those are useful for, you should be fine.
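For example, here's a minimal sketch in scikit-learn of tuning one such hyperparameter, the regularization strength. The dataset is synthetic and the parameter grid is made up purely for illustration:

```python
# Sketch: tuning a regularization hyperparameter in scikit-learn.
# The data here is synthetic; in practice you'd use your own features/labels.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C is the inverse regularization strength -- exactly the kind of knob
# worth understanding even if you never reimplement the algorithm itself.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```

You only need to know what C controls (stronger regularization for smaller C) to use the search sensibly; the optimizer's internals stay a black box.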


FWIW, I'm studying ML with Introduction to Statistical Learning, which was recommended to me by several people.

Personally I didn't enjoy Coursera's ML course (though I believe Ng is a great teacher) and dropped out, but I'm enjoying this book.

It's available as PDF for free:

http://www-bcf.usc.edu/~gareth/ISL/


I'm sure these lectures are great but my computer was none too happy about 100 embedded videos in a single Web page.


Yeah, my machine learned not to visit that page again.


And they don't have controls (full screen, volume, seek).


The main course website is here: http://www.inf.ed.ac.uk/teaching/courses/iaml/

There are more notes and so on there. It isn't an online course though, so don't expect too much.


We also have two more in-depth (and harder) courses on ML with publicly available slides:

* Machine Learning and Pattern Recognition: http://www.inf.ed.ac.uk/teaching/courses/mlpr/

* Probabilistic Modelling and Reasoning: http://www.inf.ed.ac.uk/teaching/courses/pmr/

Unfortunately, the video lectures are often only available internally.


I didn't take it myself, but my (now former) classmates who took PMR told me it was one of the hardest courses they ever did.


I also highly recommend these lectures from Cornell. The lecturer is well-known for his free SVM-light implementation. http://machine-learning-course.joachims.org


Best I can tell, it comes in a Champagne bottle with a cork held in by wire and a nice new label, but when you open the bottle and pour out the contents, what you get is 99 44/100% pure, old cookbook-style applied statistics: some curve fitting, some hypothesis testing, some statistical estimation, with the definitions, theorems, and proofs mostly filtered out. Also not much on experimental design and only a little on 'resampling' techniques.

Right: since we need a computer to do the data manipulations, the computer science people want to conclude that the statistics is also part of 'computer science'? Now bookkeeping, accounting, numerical solutions of differential equations, etc. are also part of 'computer science'? Or, what the heck ever happened to the fields of statistics, biostatistics, quality control, etc.?


What's your complaint exactly? There is a lot of overlap with statistics, sure. Especially in the simpler stuff. Machine learning is literally using machines to automatically create models. What did you expect?


> Machine learning is literally using machines to automatically create models.

Good, we got that clear. In statistics, the 'models' have hypotheses that should be satisfied. Checking the hypotheses, and also the model that gets built, is tricky work not much mentioned in the book and, really, super tough to program in general.

> What's your complaint exactly?

Old wine, filtered, in new bottles.

I believe that students would be better advised to go for the original stuff in courses/books on statistics, the field itself. Yes, then a computer can do the data manipulations.

There has long been a threat of 'cook-book' statistics where you get some software and a 'tool', apply it where some crucial hypothesis does not hold, and then get misleading results. Do this in medicine and you can kill people; indeed, that is why biostatistics is, instead, a relatively serious subject.

Well, it seems fairly clear that with 'machine learning' the threat of 'cook-book' statistics is much greater.

Or, there has long been threat enough with SPSS, SAS, R, Matlab, etc., but now the threat is greater.

> What did you expect?

Something new and good, not 'statistics done badly'.


I don't know enough about the field of statistics to argue with you. But if what you say is true, why aren't statisticians using traditional methods winning competitions and benchmarks, and taking over the field?

The "cook-book" problem is overfitting and it's pretty well known and avoidable.


> taking over the field

Sorry, but "field"? What "field"? 'Machine learning' in 'computer science'? That's not a 'field'; it's a really bad joke.

> The "cook-book" problem is overfitting and it's pretty well known and avoidable.

Gee, people have been criticizing 'cook-book' statistics for decades, and the only problem was "overfitting"? Amazing. Gee, we can clean out nearly all of the definitions, theorems, and proofs of statistics and just put in a simple solution for the "avoidable" problem of overfitting! Astounding.

Did I mention a bad joke?


>Sorry, but "field"? What "field"? 'Machine learning' in 'computer science'? That's not a 'field'; it's a really bad joke.

Ok then. I guess biology isn't a field either because I don't like their methods. They are too messy and impure, and really it's just applied physics. (Mildly relevant: http://xkcd.com/435/)

And sorry, I meant that overfitting is the equivalent of the "cook-book" problem in machine learning. Otherwise it's fairly easy to try a model against test data and see how well it performs. No need to guess whether the underlying assumptions of the model are true; just see if it works.


> No need to guess whether the underlying assumptions of the model are true, just see if it works.

No, there are assumptions also for this approach, and these assumptions also need to be checked.

Then in applying the model that has been checked, there are still more assumptions, that need to be checked.

And checking models is often essentially hypothesis testing, where you need more assumptions and, commonly, some experience to know what hypothesis tests to use. E.g., tests in regression can be F-ratios and/or t-tests, and you need to check some assumptions, or at least robustness, here.

E.g., in power spectral estimation, see, Blackman and Tukey, 'The Measurement of Power Spectra', there is a severe trade off between resolution and stability, another case of a trade off between bias and variance. Tough to 'automate' that consideration.

So, take a box of data. Divide it in half. Use the first half for the 'fitting' or 'training' data. Test on the second half. Not deep; I advised some people in finance to do that in about 1982; not nearly new; and I very much doubt I was the first; gotta believe that J. Tukey, L. Breiman, and others thought of this long before I did.
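The holdout scheme described above can be sketched in a few lines (synthetic data, made up for illustration; the fit here is ordinary least squares via NumPy):

```python
# Sketch of the holdout split: fit on one half, evaluate on the other.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)  # synthetic linear data

train_x, test_x = x[:100], x[100:]
train_y, test_y = y[:100], y[100:]

# Least-squares line fit on the training half only
slope, intercept = np.polyfit(train_x, train_y, 1)

# Evaluate on the held-out half
pred = slope * test_x + intercept
mse = float(np.mean((pred - test_y) ** 2))
print(slope, mse)
```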

And if the test fails, then try to fit again. Run all night and in the morning, presto, you have a model that does well on the test data. Now what have we? Not a trivial question to answer.

Why least squares instead of something else? A good answer is not easy.

I'm not criticizing biology, and it's done some excellent science, e.g., see the E. Lander lectures in his MIT course in biology.

Statistics is a good field with a lot of good work by a lot of bright people with excellent backgrounds in pure and applied math.

The first time I heard about 'machine learning' I guessed they meant filling in the optimal value function for stochastic dynamic programming. Okay, that is a kind of 'learning'. But, that's not what they had in mind.

Next, an example was the children's interactive computer game Animals, where a child thinks of an animal, answers some questions about the animal to chase down a binary tree, stored in the program, to a leaf, and has the computer guess the animal. If the computer is wrong, it asks the child, "What is true for your animal and false for my guess?", adds to the binary tree, and thus 'learns'. Call it a parlor trick.

A little like 'self driving cars': Really 'self-driving'? Nope, not close. Only on streets where everything has been mapped down to 1 cm or better including all the painted lines, all the curbs, all the traffic signs, and all the traffic lights, etc. In no way does the 'self driving car' actually look at a new street scene, make sense out of it, and use that 'learning' to drive. Instead, so far a self driving car is about as amazing as a train on a track.

We're talking hype, old wine, filtered, in new bottles, and not much that is new and good. Or, as the old academic joke goes, "the new is not good and the good, not new." The last time I looked at a leading 'machine learning' prof, I had to conclude that he needed to return to undergrad school, be a math major, and learn how to read/write math.

E.g., why maximum likelihood estimation? There are reasons, but I saw no hint of them in the Ng lectures at Stanford.

We've known a lot about pure and applied statistics and how to use them for a very long time. The good work in statistics is high quality pure/applied math, e.g., Billingsley, 'Convergence of Probability Measures', e.g. with Ulam's result Le Cam called 'tightness', Brillinger on time series, Serfling on limit theorems, Rao on linear methods, Breiman on CART, and much more. I'm not seeing comparable quality in 'machine learning'; indeed, from all I've seen only a tiny fraction of the computer science machine learning people have the math prerequisites for good research in statistics.

Yes, I've published peer-reviewed original research in statistics, indeed, applied to a problem in computer science.


For tradeoffs you design your error metric to be whatever best represents your problem.

>And if the test fails, then try to fit again. Run all night and in the morning, presto, have a model that does well on the test data. Now what have we? Not a trivial question to answer.

Minor quibble, but the proper way is to use validation data to find the best model and training parameters, the test data is only used once to test how well it actually works.
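That protocol can be sketched as follows (synthetic data and a polynomial-degree "model family" chosen purely for illustration): model selection happens on the validation split, and the test split is consulted exactly once at the end.

```python
# Sketch of the train/validation/test protocol:
# pick the model on validation data, touch the test set only once.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=300)
y = np.sin(3 * x) + rng.normal(scale=0.05, size=300)

train = (x[:150], y[:150])
val = (x[150:225], y[150:225])
test = (x[225:], y[225:])

def fit_and_score(degree):
    """Fit a polynomial on the training split, score it on validation."""
    coeffs = np.polyfit(train[0], train[1], degree)
    pred = np.polyval(coeffs, val[0])
    return float(np.mean((pred - val[1]) ** 2)), coeffs

# Model selection uses only the validation error...
best_deg, (best_err, best_coeffs) = min(
    ((d, fit_and_score(d)) for d in range(1, 8)), key=lambda t: t[1][0])

# ...and the test set is used exactly once, at the very end.
test_mse = float(np.mean((np.polyval(best_coeffs, test[0]) - test[1]) ** 2))
print(best_deg, test_mse)
```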

>A little like 'self driving cars': Really 'self-driving'? Nope, not close. Only on streets where everything has been mapped down to 1 cm or better including all the painted lines, all the curbs, all the traffic signs, and all the traffic lights, etc. In no way does the 'self driving car' actually look at a new street scene, make sense out of it, and use that 'learning' to drive. Instead, so far a self driving car is about as amazing as a train on a track.

That's not accurate. Self-driving cars do take input and can't rely on a static map, since the world changes and they have to detect obstacles and other cars. I'm not entirely certain, but I don't believe they are using very much machine learning in current-generation self-driving cars. However, there is a startup working to make one with pure machine vision, and it was done once in the '90s with neural nets (ALVINN).


> Minor quibble, but the proper way is to use validation data to find the best model and training parameters, the test data is only used once to test how well it actually works.

No: I was assuming that at your step, the test with the 'validation data' fails. So, we have to return and use the validation data on each of the fitting efforts. Then to my question: what do we have? Not so easy to answer. Net, the two-step approach of 'training data' and 'validation data' is not so good. Also, we need some assumptions about the two boxes of data and, then, about any data we plug into the resulting model.

For the self driving cars, once they see something not on their static map of the street, apparently they have to stop and wait, maybe drive around. The point is, those cars can only drive on streets very carefully mapped out, down to 1 cm.

Apparently the basic problem is, driving a car takes some basic intelligence, that is, available essentially only from humans. For anything much like real human intelligence, we don't know how to program it. In particular, machine learning doesn't know how to program it.


Software eating the world.



Any chance we can get the lectures before #5?


Here's a link to the course website: http://www.inf.ed.ac.uk/teaching/courses/iaml/ I took this course and Professor Lavrenko is one of the best lecturers at the uni.


I did the course several years ago when I studied there. He is excellent. I also recommend his text technologies course (lecture notes but no videos here): http://www.inf.ed.ac.uk/teaching/courses/tts/


Interesting to see that Edinburgh is using US-style position naming in addition to more traditional UK positions (e.g., "Lecturer" == "Assistant Professor", "Reader" == "Associate Professor", etc.)


It's probably because a few of our lecturers are actually from/did their PhDs in the US :)


I have 32GB of RAM and this nearly caused my computer to lock up.



