Benchmarking Arabic AI with Large Language Models