Contrary to popular belief, we show that the optimal parameters for IBM Model 1 are not unique. We demonstrate that, for a large class of words, IBM Model 1 is indifferent among a continuum of ways to allocate probability mass to their translations. We study the magnitude of the variance in optimal model parameters using a linear programming approach as well as multiple random trials, and demonstrate that it results in variance in test set log-likelihood and alignment error rate.
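The non-uniqueness claim can be illustrated with a toy case that is not taken from the paper. Assume a corpus holding a single sentence pair, source "a b" and target "x y", with no NULL word and the length term dropped. Writing p = t(x|a) and q = t(x|b), the Model 1 likelihood is proportional to (p + q)(2 - p - q), which is maximized by every pair with p + q = 1. The function name and the corpus below are hypothetical and serve only as a minimal sketch.

```python
def model1_likelihood(p, q):
    """Likelihood of the pair (source 'a b', target 'x y') under a toy IBM Model 1.

    Assumptions (illustrative only): no NULL word, uniform alignment prior 1/l.
    p = t(x|a) and q = t(x|b), so t(y|a) = 1 - p and t(y|b) = 1 - q.
    """
    l = 2  # source sentence length
    prob_x = (p + q) / l               # sum over alignments of target word x
    prob_y = ((1 - p) + (1 - q)) / l   # sum over alignments of target word y
    return prob_x * prob_y


if __name__ == "__main__":
    # Every point on the segment p + q = 1 attains the same maximal likelihood.
    for p in (0.0, 0.2, 0.5, 0.9, 1.0):
        q = 1.0 - p
        print(p, q, model1_likelihood(p, q))
    # A point off the segment does strictly worse.
    print(0.9, 0.9, model1_likelihood(0.9, 0.9))
```

In this sketch each parameter setting on the segment p + q = 1 yields a likelihood of 0.25 up to floating-point rounding, while settings off the segment yield less, so the optimum is a continuum rather than a single point.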
In everyday life we are often forced to make decisions involving risks and
perceived opportunities. The consequences of our decisions are affected by the
outcomes of random variables that are to various degrees beyond our control. Such
decision problems arise, for instance, in financial and insurance markets.
Necessary and sufficient conditions for ε-optimal solutions of convex infinite programming problems are established. These Kuhn-Tucker type conditions are derived based on a new version of Farkas' lemma proposed recently. Conditions for ε-duality and ε-saddle points are also given. Keywords: ε-solution, ε-duality, ε-saddle point.
A collection of international scientific research reports in chemistry, offered as reference material for readers interested in chemistry. Topic: Research Article: Optimality Conditions and Duality for DC Programming in Locally Convex Spaces
In this paper, we present a supervised learning approach to training submodular scoring functions for extractive multidocument summarization. By taking a structured prediction approach, we provide a large-margin method that directly optimizes a convex relaxation of the desired performance measure.
When training the parameters for a natural language system, one would prefer to minimize 1-best loss (error) on an evaluation set. Since the error surface for many natural language problems is piecewise constant and riddled with local minima, many systems instead optimize log-likelihood, which is conveniently differentiable and convex. We propose training instead to minimize the expected loss, or risk. We define this expectation using a probability distribution over hypotheses that we gradually sharpen (anneal) to focus on the 1-best hypothesis. ...
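The idea of an expected loss under a gradually sharpened distribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the distribution over hypotheses is proportional to exp(gamma * score), with gamma as a hypothetical sharpening parameter, and the scores and losses are made-up values.

```python
import math


def expected_loss(scores, losses, gamma):
    """Expected loss (risk) under p(i) proportional to exp(gamma * scores[i]).

    gamma is an assumed sharpening parameter: gamma = 0 gives a uniform
    distribution, and large gamma concentrates mass on the top-scoring hypothesis.
    """
    top = max(scores)
    # Subtract the maximum score before exponentiating for numerical stability.
    weights = [math.exp(gamma * (s - top)) for s in scores]
    total = sum(weights)
    return sum(w / total * loss for w, loss in zip(weights, losses))


if __name__ == "__main__":
    scores = [1.0, 2.0, 0.5]   # hypothetical model scores for three hypotheses
    losses = [0.3, 0.6, 0.1]   # hypothetical per-hypothesis losses
    for gamma in (0.0, 1.0, 10.0, 100.0):
        print(gamma, expected_loss(scores, losses, gamma))
```

As gamma grows in this sketch, the risk moves from the plain average of the losses toward the loss of the highest-scoring hypothesis, which is the 1-best loss the abstract refers to, while remaining a smooth function of the scores at any finite gamma.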