Interview question from Kyndryl

Can we use a CNN in a multi-modal architecture for image processing?