But PCR has some awkward aspects, well-known in some circles (see, e.g., Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, Chapter 3) but curiously little-known in others.
(1) First-step PC extraction is "unsupervised" (in machine-learning jargon): it uses only the x data and ignores y entirely. Hence the x-variable linear combinations given by the PCs may differ importantly from the best x-variable linear combinations for predictive purposes. This is unfortunate, because second-step PCR is typically used precisely for prediction!
(2) PCR shrinks in rather extreme and awkward ways. It shrinks the excluded PCs completely to 0 (by construction), while leaving the included PCs entirely unshrunk, regardless of the relative sizes of their associated eigenvalues.
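The all-or-nothing character of PCR's implicit shrinkage is easy to see directly. Here is a minimal numpy sketch (the simulated design and variable names are my own illustration): rotating the OLS fit into principal-component coordinates, PCR multiplies each PC coefficient by either 1 (if included) or 0 (if excluded), with no intermediate possibilities.

```python
# Sketch of PCR's all-or-nothing implicit shrinkage (illustrative example).
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 5, 2                     # n obs, p regressors, keep k PCs
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
X -= X.mean(axis=0)                     # center, as PCA assumes
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.standard_normal(n)

# Eigendecomposition of X'X gives PC directions V and eigenvalues.
eigvals, V = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]       # sort PCs by descending eigenvalue
eigvals, V = eigvals[order], V[:, order]
Z = X @ V                               # PC scores (orthogonal columns)

# OLS on the PCs; orthogonality makes these univariate regressions.
gamma_ols = (Z.T @ y) / eigvals

# PCR: keep the first k PC coefficients untouched, zero out the rest.
gamma_pcr = np.where(np.arange(p) < k, gamma_ols, 0.0)
beta_pcr = V @ gamma_pcr                # back to the original x-coordinates

# Implied shrinkage factors are exactly 1 or 0, nothing in between.
print("shrinkage factors:", gamma_pcr / gamma_ols)
```

The point of the last line: the included PCs' coefficients equal their OLS values exactly (factor 1), while the excluded PCs' coefficients are annihilated (factor 0), no matter how close together their eigenvalues might be.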
So, what to do?
(1) Wold's partial least squares (PLS) attempts to address issue (1). Recent interesting work, moreover, extends PLS in powerful ways, as with the Kelly-Pruitt three-pass regression filter and its amazing apparent success in predicting aggregate equity returns.
(2) Ridge regression (among others) addresses issue (2). It includes all PCs and shrinks each toward 0 smoothly, in proportion to its associated eigenvalue: low-eigenvalue PCs are shrunk most heavily, and none is discarded outright.
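The contrast with PCR's 0/1 factors can be verified numerically. In a short numpy sketch (simulated data of my own choosing), the ridge coefficient along the i-th PC equals the OLS coefficient times the smooth factor lam_i / (lam_i + alpha), where lam_i is the i-th eigenvalue of X'X and alpha is the ridge penalty.

```python
# Ridge regression shrinks each PC direction by lam_i / (lam_i + alpha).
import numpy as np

rng = np.random.default_rng(2)
n, p, alpha = 300, 4, 10.0
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))
X -= X.mean(axis=0)
y = X @ rng.standard_normal(p) + rng.standard_normal(n)

eigvals, V = np.linalg.eigh(X.T @ X)    # eigenvalues in ascending order

# Ridge solution in the original coordinates.
beta_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

# OLS solution, rotated into PC coordinates.
gamma_ols = V.T @ np.linalg.solve(X.T @ X, X.T @ y)

# In PC coordinates, ridge = OLS scaled by lam/(lam + alpha): every PC is
# retained, and low-eigenvalue PCs are shrunk the most.
shrink = eigvals / (eigvals + alpha)
print(np.allclose(V.T @ beta_ridge, shrink * gamma_ols))  # True
```

The shrinkage factors lie strictly between 0 and 1 and increase with the eigenvalue, so ridge interpolates smoothly where PCR jumps discretely from full inclusion to total exclusion.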