Note: in the interests of transparency, I no longer use a Kepler generation card and haven't for some years. That doesn't detract from the substance of this article, which is really about the craze for the best when the least will often do when starting out. And a warning: this article might ramble a bit as I diverge slightly off topic, but with relevance.
Enthusiastic about deep learning and looking for the latest and greatest GTX card ready to roll? My serious advice, given with all due respect, is to master the basics (which will take many months) with an el cheapo Kepler card (almost) from the junk yard. Forget all this talk about bus bandwidths, parallelism, CUDA benchmarks, TFLOPS and card aesthetics (yes, some find it important!) and just get stuck in at the deep end with something that will compute a dot product 100X faster than your lowly CPU, but for next to nothing.
Always bear in mind: the slowest Kepler card will be FAR faster than the fastest i7 CPU. That is, any card is better than no card.
Fiction: You need a GTX 1050 Ti as a minimum to be productive
Browse the web for an hour or so and you will likely find the consensus for entry-level DL exploration is something along the lines of an Nvidia GTX 1050 Ti (a 4 GB card, a few years old now), and to be fair that assumption is reasonable. But what if, say, you do not have a 1050 Ti handy, want to experiment with a lesser card you already own, or simply cannot afford one or do not wish to buy (these are upwards of $100/£75 GBP used as of January 2020) because you are unsure of your interest?
An excellent reference for GPU performance is Tim Dettmers' blog. This isn't a slight against that (some would say almost definitive) resource, which is superb for evaluating cards for cutting-edge, GPU-heavy applications (or Kaggle), but my post wanders in the opposite direction: what do you really need to start out, not what would you need to perform deepfake analysis on 0.5 TB of image data (a reference to the current 01/20 Kaggle competition).
So you need a card: one that works, one that is cheap, and one that can do a lot of deep learning experimentation. There are caveats to cheapo cards, though. You will still need a card that CUDA 9 can recognise and drivers that function (desktop drivers for Kepler are still supported), which invariably means a card with at least Compute Capability 3.0 (check here), or, to save some hassle, 3.5. I say 3.5 because the existing TensorFlow binaries currently support 3.5, whereas for CC 3.0 Kepler cards you will need to build or find a (Python) wheel yourself.
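Once TensorFlow is installed, you can confirm what Compute Capability your card reports without leaving Python. A minimal sketch (note: `get_device_details` is only available from TF 2.3 onwards, and still lives under `tf.config.experimental`):

```python
import tensorflow as tf

# List the GPUs TensorFlow can see and query each one's compute capability.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    details = tf.config.experimental.get_device_details(gpu)
    # e.g. (3, 5) on many Keplers, which the stock binaries still support
    print(gpu.name, "compute capability:", details.get("compute_capability"))
if not gpus:
    print("No CUDA-capable GPU visible to TensorFlow")
```

If your card reports (3, 0) and the stock wheel refuses to use it, that is your cue to build from source.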
Having built both TF 1.X and TF 2 from source with Bazel for earlier cards (an example is the 1.13 wheel I compiled earlier last year, stripped of optimizations and targeting CC 3.0), I can say that at an absolute minimum you can do GPU computation with a Kepler generation GTX 650 (cost: $20) by building for CC 3.0. Compilation time depends on your machine, of course, but even the slowest machine will compile a working TensorFlow 2 overnight.
So, equipped with an earlier generation card (basically anything GTX from 2012 onwards) you can train many straightforward architectures: linear, logistic, standard FFNNs and even convolutional networks with datasets such as MNIST.
Don’t forget… AlexNet (which used multiple convolutional and max pooling layers as well as final dense layers) was trained on a pair of pre-Kepler GTX 580’s, and today with transfer learning you can get a head start and use a faster (Kepler) card with more efficient architectures.
Now for the controversial part of this article!
How does the lowly Kepler perform today in 2020?
So, for those puzzled about Kepler: it is the generation of GPU cards (both consumer and professional) launched by Nvidia in 2012. The architecture is now eight years old, which in computing terms is positively ancient. You’d expect the card to run at a crawl, right?
Local GTX 660/4 vs GCP K80 for simple classification tasks
Implement a simple FFNN classification architecture (say, some dense layers and a sigmoid output, with adam) and then fit the model, paying careful attention to the epoch iteration times. A later/faster card, say a K80, will not process the mini-batches much faster than a cheapo GTX card (or better cards still; the mythical K80 is ageing, and not gracefully, but it serves as an example of a professional DL benchmarking GPU). This is obviously not true of more advanced architectures (with complex layers, convolutional or otherwise), but for simple non-image classification with dense (i.e. fully connected) ReLU layers and adam you aren’t going to see huge differences. Even more contentious: the bus did not seem to be the bottleneck on batch transfer (comparing a 2007 Dell Professional Workstation T5400 with a generic GCP instance; hardware varies, but Haswell in my GCP test), both running Ubuntu 18.04 LTS.
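If you want to reproduce this comparison yourself, the experiment above is only a few lines of Keras. A minimal sketch, with hypothetical synthetic tabular data standing in for a real non-image dataset (shapes and layer sizes are my own choices, not a benchmark spec):

```python
import time
import numpy as np
import tensorflow as tf

# Hypothetical tabular data: 4096 samples, 20 features, binary labels.
X = np.random.rand(4096, 20).astype("float32")
y = np.random.randint(0, 2, size=(4096,)).astype("float32")

# Dense-only FFNN with a sigmoid output, compiled with adam, as in the text.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Time the fit and report the mean epoch duration.
start = time.time()
history = model.fit(X, y, batch_size=128, epochs=2, verbose=0)
print(f"mean epoch time: {(time.time() - start) / 2:.2f}s")
```

Run the same script on the cheap card and on a cloud GPU and compare the printed epoch times; on a model this small the gap is modest.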
Incidentally, I also checked against a P4 and a T4: at 30s epoch times on the GTX, the heavy-duty GPUs were knocking off about 4s-6s an epoch.
Note, however, that you will not see this on convolutional or other architectures beyond the basics; still, starting out, these cards will be sufficient for basic convolutional classification (simple MNIST) with, say, a softmax classifier.
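The sort of basic convolutional classifier a Kepler handles comfortably looks something like this. A sketch only, with synthetic MNIST-shaped arrays so it runs anywhere; substitute `tf.keras.datasets.mnist.load_data()` for the real digits:

```python
import numpy as np
import tensorflow as tf

# Synthetic 28x28 single-channel images standing in for MNIST.
x = np.random.rand(512, 28, 28, 1).astype("float32")
y = np.random.randint(0, 10, size=(512,))

# One small conv block feeding a softmax classifier, as described above.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),  # softmax classifier
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, batch_size=64, epochs=1, verbose=0)
```

A 2 GB Kepler copes with this scale of network without trouble; it is the deeper, wider image models where it starts to struggle.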
I didn’t try any SVM metrics, but my understanding is they don’t use the GPU anyhow (well, not usually; my scikit-learn models didn’t a year or so ago), and let’s face it, SVMs are now the black sheep of the ML family.
Convolutional? Employ Transfer Learning!
Another factor is that by utilizing transfer learning for convolutional image processing, a lighter (i.e. Kepler) card can do decent CNN classification without grinding to a halt. In fact, if you are doing CNN work, definitely utilize transfer learning (such as the VGG structure/weights) as much as possible: training times are the bottleneck, so by using pre-trained weights and modifying only your later dense layers (for classification) you save a heck of a lot of costly GPU time.
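The frozen-base pattern is straightforward in Keras. A minimal sketch: in practice you would pass `weights="imagenet"` to reuse the pre-trained filters (`weights=None` here only keeps the sketch offline), and the head layer sizes and class count are my own placeholder choices:

```python
import tensorflow as tf

# VGG16 convolutional base without its classifier head.
# Use weights="imagenet" in real transfer learning work.
base = tf.keras.applications.VGG16(include_top=False, weights=None,
                                   input_shape=(128, 128, 3))
base.trainable = False  # freeze: no costly conv backprop on the old card

# Only this small new head is trained.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),  # your own class count
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

With the base frozen, each training step only back-propagates through the dense head, which is exactly why an old card stays usable.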
So, if you are starting out, possibly willing to compile your own TensorFlow wheel and want to experiment before committing much money, get hold of a cheap Kepler (GTX 6 series) card and play around. 2 GB to 4 GB is nice, but by playing with batch sizes you could, I guess, get away with a 1 GB card!
Big Models such as ResNeXt?
You could try a 4 GB card and employ transfer learning along with small batch sizes; this way you can even train quite sophisticated networks such as ResNeXt-50, though don’t expect fast progress. But it can be done. Your bottlenecks will be memory (even a 4 GB 980 will suffer from this issue) and computational throughput (slow, but working).
Does the brand matter?
Irrelevant, really, with usually a +10% performance difference at most from reference. I have used Nvidia reference cards, EVGA, Asus, Zotac and others. I can’t say which is best, but even the cheapest brand (often Zotac, also coincidentally the least sexy-looking) is just as reliable in reality. If you have a choice, I’d focus more on whether to use a blower (usually reference designs) or open-air cooler, which may be dictated by your PC’s casing and cooling requirements.
How about the GTX 690 aka “the heater”?
This being the (originally) $1000 GTX 690 (now <$100 on auction sites). I have little experience of it, but it is an interesting card (being 2 x GTX 680 2 GB in one package), and with CC 3.0 you could use it with TF 2, utilizing “/device:GPU:0” and “/device:GPU:1” to run parallel jobs. Actually, in testing this is the only Kepler that arguably can challenge a (non-Ti) GTX 980 or GTX 1060 computationally (the standard 980 being roughly equal to the 1060/2), but you have to factor in that at 300W peak this will not exactly be super cool. I am a bit ambivalent about recommending this card. If it is cheap and you can handle the power consumption, yes; however, at about the same used price as a GTX 1050 Ti, the later Pascal card (at 75W!), whilst undoubtedly less powerful, will have longer strategic viability for you, and it is also, dare I say, going to be more reliable.
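Pinning separate jobs to the 690's two halves is just explicit device placement. A sketch (the matmuls are hypothetical stand-ins for real jobs; with soft placement enabled it falls back to CPU on machines without the GPUs, so you can try it anywhere):

```python
import tensorflow as tf

tf.config.set_soft_device_placement(True)  # fall back to CPU if a GPU is absent

# On a GTX 690 the two halves appear as GPU:0 and GPU:1; pin one job to each.
with tf.device("/device:GPU:0"):
    a = tf.random.uniform((1024, 1024))
    x = tf.matmul(a, a)
with tf.device("/device:GPU:1"):
    b = tf.random.uniform((1024, 1024))
    y = tf.matmul(b, b)
print(x.shape, y.shape)
```

Note the two 2 GB halves do not pool their memory, so each job must fit in 2 GB on its own.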
An original Titan, found cheap, would be a fine deal: the best Kepler card, and one that you can use for DL today and going forward a while. But this isn’t a cheap card, even used.
Some other Keplers, such as the K20 compute card, would be useful as well, but these are usually quite a way out of the usual Kepler consumer card budget. And if you are thinking of a K20, then, well, why not a later K40… or K80? The budget, alas, slips…
Budget Recommendation: GTX 650/660/670/680 with 2-4 GB if possible.
Cheap Workstations (for a cheap card)
For something useful, say a Dell T5400 Workstation with 32 GB and a GTX 660/4: the lot could be found for under $200/£150 or so. Load up Ubuntu, CUDA, Python 3, TF2 and Xfce for a light desktop and you are in business (of sorts). This hardware is more than capable of classification (beyond MNIST), and can be used as a client for doing heavy-duty GCP work. Having a 950W industrial-grade PSU is also handy (it will just about power that Kepler Titan/GTX 690).
Ubuntu – the OS of choice
On a slightly tangential note about the cheap workstation (but fitting our general theme): save yourself huge trouble and use some Linux build on the machine (Ubuntu being, without a doubt, the least hassle to configure for ML and the choice of most in the field today). There are literally a hundred reasons for this off the top of my head, but in the end (and soon after the beginning) you’ll realize there is no other way to go in deep learning; in fact there are NO other paths, at least as a general-purpose DL OS. Forget any MS stuff and OSX.
Note here that pre-Kepler cards (such as the GTX 5 series, pre-2012) will not work, due to the Compute Capability requirements of the Nvidia driver stack needed to support TensorFlow (I believe PyTorch has similar, if more ambiguous, requirements). So you can go cheap, but not bargain basement.
However, if you are serious longer-term, I’d argue that to save stress you should either use GCP preemptibles (or AWS, although I am unfamiliar with their spot instances) with a K80/P4 (or better) for around $0.50 (£0.40 GBP) an hour, or buy something like a GTX 1070 as a minimum for convolutional work, which will be useful when you start building your own complex CNNs. At that point a Kepler will not make the cut; I should emphasise that a Kepler card is useful, but it isn’t going to help with serious contemporary architecture implementations.
A nice thing about working with the Nvidia platform is that the toolsets are consistent, so nvidia-smi, for example, will interrogate a 2012 Kepler card just as it will the current generation. Some stats will be unavailable (for example, power consumption in watts; power management seems to be a later, Maxwell-onwards thing).
How about post Kepler?
Well, obviously it will be better (at least in the power consumption department, and usually in performance). In the 7 series, the GTX 750 makes a fine introduction to ML (and can be had in 4 GB variants); it also, unusually, has a Maxwell architecture, so is very power efficient. Anything beyond this (9 series Maxwell, 10 series Pascal, etc.) isn’t really within the scope of this article, as it would exceed the budget.
What about AMD Cheapo Cards? (ROCm)
So, what if you have an AMD card? Well, you can try ROCm with TF, and your mileage will vary. I am not endeared to AMD, and neither, dare I say, is the ML community. The company are, however, desperately playing catch-up, and one hopes they will succeed, as Nvidia pulled a fast one in the data-centre fiasco by banning GTX cards, which they could only have got away with given their dominant position, and that utter domination is a real problem for AMD (and therefore for user adoption).
I shouldn’t be negative here, but in summary, to save heartache: even if you currently have an AMD card I would still throw in a cheap Kepler and use it, unless the AMD card is super powerful and supported by a TensorFlow 2 build.
Caveat: older cards suck power! Expect a Kepler to consume up to around 200W and to idle at around 20-30W. Some cards (the amazing GTX 690, or an original Titan, which can still give the 1060/2 some worry) can pull 300W through two 8-pin connectors and the bus! (They can also, I imagine, provide your room heating.) In comparison, many later cards draw all their power through the PCIe slot (hence <75W). Your choice of Kepler card may well depend on whether your PSU can handle it (500W as a rough guess, but a lot more for a 690 or Titan).