Solve the previous question with Keras
Sigiloso
Look for tutorials on the web. What you need is a categorical cross-entropy loss, a final layer with 4 nodes (one per label), and a batch size tuned so that training runs fast enough without exceeding the memory constraint.
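A minimal sketch of that setup might look like the following. The data here is random and only stands in for the dataset from the previous question, which is not shown in this post. The input width (20 features), the hidden layer size, the epoch count, and the starting batch size of 32 are all assumptions for illustration. It also assumes Keras is available through TensorFlow 2.x.

```python
import numpy as np
from tensorflow import keras

NUM_CLASSES = 4    # from the post: one output node per label
NUM_FEATURES = 20  # assumption: placeholder input width
BATCH_SIZE = 32    # assumption: raise for speed, lower if memory runs out

# Synthetic stand-in data (assumption: the real dataset is not shown here).
rng = np.random.default_rng(0)
x_train = rng.normal(size=(256, NUM_FEATURES)).astype("float32")
y_int = rng.integers(0, NUM_CLASSES, size=256)

# categorical_crossentropy expects one-hot labels, so convert the integer labels.
y_train = keras.utils.to_categorical(y_int, num_classes=NUM_CLASSES)

model = keras.Sequential([
    keras.Input(shape=(NUM_FEATURES,)),
    keras.layers.Dense(64, activation="relu"),              # assumption: hidden size
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # 4-node output layer
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# batch_size is the knob that trades speed against memory use.
history = model.fit(x_train, y_train, batch_size=BATCH_SIZE, epochs=2, verbose=0)

# Each row of probs holds 4 class probabilities.
probs = model.predict(x_train[:5], verbose=0)
```

If the labels are kept as plain integers instead of one-hot vectors, `sparse_categorical_crossentropy` can be used as the loss and the `to_categorical` step dropped. For the batch size, a common approach is to start small and double it until training becomes fast enough or an out-of-memory error appears, then step back.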