This question was motivated by a question asked on stackoverflow.
Suppose you are given a rooted tree $T$ (i.e. there is a root and nodes have children etc) on $n$ nodes (labelled $1, 2, \dots, n$).
Each vertex $i$ has a non-negative integer weight associated: $w_i$.
Additionally, you are given an integer $k$, such that $1 \le k \le n$.
The weight $W(S)$ of a set of nodes $S \subseteq \{1,2,\dots, n\}$ is the sum of weights of the nodes: $\sum_{s \in S} w_s$.
Given input $T$, $w_i$ and $k$,
The task is to find a minimum weight sub-forest* $S$, of $T$, such that $S$ has exactly $k$ nodes (i.e. $|S| = > k$).
In other words, for any subforest $S'$ of $T$, such that $|S'| = k$, we must have $W(S) \leq W(S')$.
If the number of children of each node were bounded (for instance binary trees), then there is a polynomial time algorithm using dynamic programming.
I have a feeling that this is NP-Hard for general trees, but I haven't been able to find any references/proof. I even looked here, but could not find something which might help. I have feeling that this will remain NP-Hard even if you restrict $w_i \in \{0,1\}$ (and this might be easier to prove).
This seems like it should be a well studied problem.
Does anyone know if this is an NP-Hard problem/there is a known P time algorithm?
*A sub-forest of $T$ is a subset $S$ of nodes of the tree $T$, such that if $x \in S$, then all the children of $x$ are in $S$ too. (i.e. it is a disjoint union of rooted sub-trees of $T$).
PS: Please pardon me if it turns out that I missed something obvious and the question is really off-topic.
Similar to the solution for a binary tree, you can solve it in polynomial time on a tree without degree restriction: First, generalize the problem such that every node also has a "count" $c_i\in\{0,1\}$, and the problem is to find a subforest $S$ of count $k=\sum_{i\in S} c_i$. Generalize the dynamic programming approach to this version (it still works with a table, given a fixed count $C$, what is the minimal weight subforest in the subtree having count precisely $C$)
Keep the original tree with nodes of count 1. Every node $v$ with degree greater than 2 is split into a binary tree with deg$(v)$ leaves (the shape does not matter). The new nodes have count and weight 0. Solve the problem on the new tree. When reading out the solution ignore any new node; this will still be a subforest of the same weight. Because any original subforest translates into a new subforest of the same weight, the found subforest is optimal.
External links referenced by this document:
Local articles referenced by this article: