First consider the implementation of GMM by simulation (the so-called simulated method of moments, SMM).
GMM is widely advertised as potentially useful when a likelihood is unavailable. In other cases the likelihood may be "available" but very difficult to derive or evaluate. But model moments may also be seemingly unavailable (i.e., analytically intractable). SMM recognizes that model moments are effectively never intractable, because they can be calculated arbitrarily accurately from an arbitrarily long model simulation. That's really exciting, because simulation ability is a fine litmus test of model understanding. If you can't figure out how to simulate pseudo-data from a given probabilistic model, then you don't really understand the model (or the model is ill-posed). Assembling everything: if you understand a model you can simulate it, and if you can simulate it you can estimate it consistently by SMM, choosing parameters to minimize the divergence between data moments and (simulated) model moments. Eureka! No need to work out complex likelihoods, even if they are in principle "available," and in this age of Big Data, MLE efficiency lost may be a small price to pay for SMM tractability gained.
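To make the recipe concrete, here is a minimal sketch in Python. It assumes a toy AR(1) model with two matched moments (variance and first-order autocovariance); the model, the moment choices, the identity weighting matrix, and the sample sizes are all illustrative assumptions, not anything prescribed above.

```python
# A minimal SMM sketch, assuming a toy AR(1) model and two matched moments.
import numpy as np
from scipy.optimize import minimize

def simulate_ar1(phi, sigma, eps, burn=200):
    """Simulate y_t = phi*y_{t-1} + sigma*eps_t from pre-drawn shocks eps."""
    y = np.zeros(len(eps))
    for t in range(1, len(eps)):
        y[t] = phi * y[t - 1] + sigma * eps[t]
    return y[burn:]  # discard burn-in so the start-up value doesn't matter

def moments(y):
    """Data features to match: variance and first-order autocovariance."""
    yc = y - y.mean()
    return np.array([np.mean(yc ** 2), np.mean(yc[1:] * yc[:-1])])

def smm_objective(theta, data_moments, eps_sim):
    """Quadratic divergence between data moments and simulated model moments."""
    phi, sigma = theta
    diff = moments(simulate_ar1(phi, sigma, eps_sim)) - data_moments
    return diff @ diff  # identity weighting matrix, for simplicity

rng = np.random.default_rng(0)

# "Observed" data, generated here only so the example runs end to end.
data = simulate_ar1(0.6, 1.0, rng.standard_normal(1200))
m_hat = moments(data)

# Fix the simulation shocks once and reuse them across optimizer evaluations.
eps_sim = rng.standard_normal(20000)

# Choose parameters to minimize the divergence between data and model moments.
result = minimize(smm_objective, x0=np.array([0.3, 0.8]),
                  args=(m_hat, eps_sim), method="Nelder-Mead")
print(result.x)  # should land near the true (phi, sigma) = (0.6, 1.0)
```

One design note: the simulation shocks are drawn once and held fixed across optimizer evaluations (common random numbers), so the simulated moments, and hence the criterion, are smooth functions of the parameters.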
Now consider the properties of GMM/SMM under misspecification, which is what intrigues me the most.
All econometric models are approximations to a true but unknown data-generating process (DGP), and hence likely misspecified. GMM/SMM has special appeal from that perspective. Under correct specification any consistent estimator (e.g., MLE or GMM/SMM) unambiguously gets you to the right place asymptotically, and MLE has the extra benefit of efficiency, so it's preferable. But under misspecification, different consistent estimators generally converge to different approximations of the DGP, so the choice of estimator matters quite apart from the secondary issue of efficiency. In particular, under misspecification the best asymptotic DGP approximation for one purpose may be very different from the best for another. GMM/SMM is appealing in such situations, because it forces you to think about which features of the data (moments, M) you'd like to match, and then by construction it's consistent for the M-optimal approximation.
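In symbols (a standard formulation, with the notation introduced here rather than taken from the discussion above): with data moments $\hat{m}_T$, model moments $m(\theta)$, and weighting matrix $W$, the GMM/SMM estimator is

$$
\hat{\theta}_T = \arg\min_{\theta} \bigl(\hat{m}_T - m(\theta)\bigr)' W \bigl(\hat{m}_T - m(\theta)\bigr)
\;\xrightarrow{\,p\,}\;
\theta^{*}(M,W) = \arg\min_{\theta} \bigl(m_0 - m(\theta)\bigr)' W \bigl(m_0 - m(\theta)\bigr),
$$

where $m_0$ collects the population values of the chosen moments M under the true DGP. Under correct specification $\theta^{*}$ is the true parameter (given identification) no matter which moments you pick; under misspecification $\theta^{*}(M,W)$ moves with the choice of M and W, which is exactly the sense in which GMM/SMM is consistent for the M-optimal approximation.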
In contrast to GMM/SMM, pseudo-MLE ties your hands. Gaussian pseudo-MLE, for example, is consistent for the KLIC-optimal approximation, but KLIC optimality may not be of maximal relevance. From a predictive perspective, the KLIC-optimal approximation minimizes 1-step-ahead mean-squared prediction error, but 1-step quadratic loss may not be the relevant loss function. The bottom line: under misspecification MLE may not be consistent for what you want, whereas by construction GMM is consistent for what you want (once you decide what you want).
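For comparison, and again in notation introduced here rather than taken from the post, the pseudo-true value targeted by pseudo-MLE is the KLIC minimizer: with true density $f_0$ and model density $f(\cdot\,;\theta)$,

$$
\theta^{*}_{\mathrm{KLIC}} = \arg\min_{\theta} \, E_{f_0}\!\left[\ln \frac{f_0(y)}{f(y;\theta)}\right]
= \arg\max_{\theta} \, E_{f_0}\bigl[\ln f(y;\theta)\bigr],
$$

so pseudo-MLE always heads for the model closest to the truth in Kullback-Leibler divergence, whether or not that particular divergence reflects the loss you actually care about.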
So, at least in part, GMM/SMM continues to intrigue me. It's hard to believe that it's been three decades since Lars Hansen's classic GMM paper (1982, Econometrica), and two decades since the similarly classic indirect inference papers of Tony Smith (1990, Duke Ph.D. Dissertation, and 1993, J. Applied Econometrics) and Christian Gourieroux, Alain Monfort and Eric Renault (1993, J. Applied Econometrics). (SMM is a special case of indirect inference.) If by now the Hansen-Smith-Gourieroux-Monfort-Renault insights seem obvious, it's only because many good insights are obvious, ex post.