Different placement and collaboration policies for handling datasets and workloads across the cloud, the edge, and end-user devices can substantially affect the overall performance of a cloud-edge computing environment. However, common practice optimizes performance only at the edge layer and ignores the rest of the system. This paper calls for holistically optimizing the performance of AI-oriented workloads across all components of cloud-edge architectures. Our goal is to optimize the allocation of AI workloads among cloud clusters, edge servers, and end devices so as to minimize response time in latency-sensitive applications. We present new workload allocation methods for AI workloads in cloud-edge computing systems: two efficient allocation algorithms that reduce end-to-end response time in single-workload and multi-job scenarios, respectively. For our experiments, we use six edge AI workloads from Edge AIBench, a comprehensive edge computing benchmark, and run them in a real edge computing environment. The results demonstrate the efficiency and effectiveness of our algorithms on real-life applications and datasets: our multi-job allocation algorithm outperforms four baseline strategies in end-to-end response time by 33% to 63%.
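To make the allocation problem concrete, the following is a minimal illustrative sketch, not the paper's algorithms. It assumes a simple, hypothetical latency model in which a workload's end-to-end response time on a tier is network round-trip latency plus input transfer time plus compute time. A single workload goes to the tier with the lowest estimated response time. For multiple jobs, a greedy heuristic (assumed here, not taken from the paper) assigns each job to the tier where it would finish earliest, under the further assumption that each tier serves one job at a time. The names `Tier`, `response_time`, `allocate_single`, and `allocate_multi`, as well as all parameter values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Tier:
    """A hypothetical compute tier (end device, edge server, or cloud cluster)."""
    name: str
    uplink_mbps: float  # assumed bandwidth from the end device; inf means local execution
    rtt_s: float        # assumed network round-trip latency in seconds
    gflops: float       # assumed effective compute throughput in GFLOP/s


def response_time(tier: Tier, input_mb: float, work_gflop: float) -> float:
    """Estimated end-to-end time = RTT + input transfer + computation (assumed model)."""
    transfer = 0.0 if math.isinf(tier.uplink_mbps) else input_mb * 8 / tier.uplink_mbps
    return tier.rtt_s + transfer + work_gflop / tier.gflops


def allocate_single(tiers: list[Tier], input_mb: float, work_gflop: float) -> Tier:
    """Place one workload on the tier with the lowest estimated response time."""
    return min(tiers, key=lambda t: response_time(t, input_mb, work_gflop))


def allocate_multi(tiers: list[Tier], jobs: list[tuple[float, float]]) -> list[tuple[str, float]]:
    """Greedy heuristic: largest jobs first, each to the tier where it finishes earliest.

    Assumes each tier processes jobs sequentially; returns (tier name, finish time) per job.
    """
    free_at = {t.name: 0.0 for t in tiers}  # time at which each tier becomes idle
    plan = []
    for input_mb, work in sorted(jobs, key=lambda j: j[1], reverse=True):
        best = min(tiers, key=lambda t: free_at[t.name] + response_time(t, input_mb, work))
        free_at[best.name] += response_time(best, input_mb, work)
        plan.append((best.name, free_at[best.name]))
    return plan


if __name__ == "__main__":
    tiers = [
        Tier("device", math.inf, 0.0, 10.0),
        Tier("edge", 100.0, 0.01, 200.0),
        Tier("cloud", 20.0, 0.08, 2000.0),
    ]
    print(allocate_single(tiers, input_mb=2.0, work_gflop=50.0).name)
    print(allocate_multi(tiers, [(2.0, 50.0)] * 3))
```

With these made-up numbers, a heavy job (50 GFLOP, 2 MB input) lands on the edge server, while a light one (1 GFLOP) stays on the device because transfer costs dominate. In the multi-job case, the third job spills over to the cloud once the edge queue makes it slower. The sketch illustrates why allocation must consider every layer together rather than the edge alone.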