Benchmarking Intelligence: Testing the Cognitive Limits of Large Language Models