State-of-the-art cache-oblivious parallel algorithms for dynamic programming (DP) problems usually guarantee asymptotically optimal cache performance without any tuning of cache parameters, but they often fail to exploit the theoretically best parallelism at the same time. While these algorithms achieve cache-optimality through the use of a recursive divide-and-conquer (DAC) strategy, scheduling tasks at the granularity of the recursive subproblems introduces artificial dependencies in addition to those arising from the defining recurrence equations. We remove these artificial dependencies by scheduling each task for execution as soon as all of its real dependency constraints are satisfied, while preserving cache-optimality by retaining the DAC strategy. We apply our approach to a set of widely known DP problems, including Floyd-Warshall's all-pairs shortest paths, stencil computations, and longest common subsequence (LCS). Theoretical analyses show that our techniques improve the span of the 2-way DAC-based Floyd-Warshall algorithm on an n-node graph from Θ(n log^2 n) to Θ(n), of stencil computations on a d-dimensional hypercubic grid of width w for h time steps from Θ((d^2 h) w^(log(d+2)−1)) to Θ(h), and of LCS on two sequences of length n each from Θ(n^(log_2 3)) to Θ(n). In each case, the total work and cache complexity remain asymptotically optimal. Experimental measurements show a 3–5× improvement in absolute running time, a 10–20× improvement in burdened span as reported by Cilkview, and approximately the same number of L1/L2 cache misses as measured with PAPI.
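
To make the scheduling idea concrete, the following is a minimal, self-contained sketch (not the paper's implementation) of dependency-driven execution for a blocked LCS table: each tile carries an atomic counter of its unfinished real predecessors and is enqueued the moment that counter reaches zero, so no tile waits on an artificial sibling-level barrier. The tile width B, the four worker threads, and the mutex-protected ready queue are illustrative assumptions; a cache-oblivious implementation would instead derive the tiles from the recursive DAC decomposition.

```cpp
// Hypothetical sketch: wavefront scheduling of a blocked LCS DP table.
// Tile (ti, tj) has real predecessors (ti-1, tj) and (ti, tj-1) only
// (the diagonal tile is ordered before it transitively through those two).
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>
#include <algorithm>
#include <iostream>

int main() {
    const std::string X = "ABCBDAB", Y = "BDCABA";
    const int n = X.size(), m = Y.size();
    const int B = 2;                               // illustrative tile width
    const int TI = (n + B - 1) / B, TJ = (m + B - 1) / B;

    // DP table with a padded zero row and column.
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(m + 1, 0));

    // deps[t] counts the unfinished real predecessors of tile t.
    std::vector<std::atomic<int>> deps(TI * TJ);
    for (int i = 0; i < TI; ++i)
        for (int j = 0; j < TJ; ++j)
            deps[i * TJ + j] = (i > 0) + (j > 0);

    std::mutex mu;
    std::condition_variable cv;
    std::queue<int> ready;
    int done = 0;
    ready.push(0);                                 // tile (0, 0) has no deps

    auto worker = [&] {
        for (;;) {
            int t;
            {
                std::unique_lock<std::mutex> lk(mu);
                cv.wait(lk, [&] { return !ready.empty() || done == TI * TJ; });
                if (done == TI * TJ) return;       // all tiles finished
                t = ready.front(); ready.pop();
            }
            const int ti = t / TJ, tj = t % TJ;
            // Fill this tile serially with the standard LCS recurrence.
            for (int i = ti * B + 1; i <= std::min(n, (ti + 1) * B); ++i)
                for (int j = tj * B + 1; j <= std::min(m, (tj + 1) * B); ++j)
                    dp[i][j] = (X[i - 1] == Y[j - 1]) ? dp[i - 1][j - 1] + 1
                             : std::max(dp[i - 1][j], dp[i][j - 1]);
            // Release each successor whose last real dependency this was:
            // it becomes ready immediately, with no sibling-level barrier.
            std::lock_guard<std::mutex> lk(mu);
            ++done;
            if (ti + 1 < TI && --deps[(ti + 1) * TJ + tj] == 0)
                ready.push((ti + 1) * TJ + tj);
            if (tj + 1 < TJ && --deps[ti * TJ + tj + 1] == 0)
                ready.push(ti * TJ + tj + 1);
            cv.notify_all();
        }
    };

    std::vector<std::thread> pool;
    for (unsigned k = 0; k < 4; ++k) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    std::cout << "LCS length: " << dp[n][m] << '\n';  // expect 4 ("BCBA")
}
```

Note that the diagonal predecessor needs no explicit edge: tile (ti−1, tj−1) completes before tile (ti, tj) transitively through (ti−1, tj) and (ti, tj−1), which mirrors the point that only the dependencies arising from the recurrence itself need to be tracked.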