Explain a project you've worked on involving data. What tools and technologies did you use?
How would you handle missing or corrupted data in a dataset?
What is the difference between INNER JOIN and LEFT JOIN in SQL?
What is a DataFrame in PySpark? How is it different from a Pandas DataFrame?
How do you optimize a SQL query?
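
For the INNER JOIN versus LEFT JOIN question above, a small runnable sketch can make the difference concrete. The tables (`customers`, `orders`), their columns, and the helper `join_rows` are hypothetical names chosen only for illustration, and the example assumes Python's built-in `sqlite3` module with an in-memory database. One customer has an order and one does not, which is enough to show where the two joins diverge.

```python
import sqlite3


def join_rows(how):
    """Run the same query with the given join type ("INNER" or "LEFT")."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: Ana has one order, Ben has none.
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
        INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
        INSERT INTO orders VALUES (10, 1, 'book');
    """)
    query = f"""
        SELECT c.name, o.item
        FROM customers AS c
        {how} JOIN orders AS o ON o.customer_id = c.id
        ORDER BY c.id
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


# INNER JOIN keeps only customers that have a matching order.
inner_rows = join_rows("INNER")
# LEFT JOIN keeps every customer; unmatched order columns come back as NULL (None).
left_rows = join_rows("LEFT")

print(inner_rows)  # [('Ana', 'book')]
print(left_rows)   # [('Ana', 'book'), ('Ben', None)]
```

In this sketch Ben disappears from the INNER JOIN result because no row in `orders` matches him, while the LEFT JOIN result keeps him with `None` in place of the missing item.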