PDFs cannot be attached, so I can't show what must be the worst (vague, convoluted, terse and tricky) Spark SQL/Python JSON-in, Parquet-out tech assessment I've ever seen... and I've seen a lot of them.
Sigiloso
a Python file, since there was no need for a .whl that could be spark-submit'd (not that they'd know what to do with it). The solution worked: I tested it against my own test data, which I created from the problem description.