Data Envelopment Analysis (DEA) in Massive Data Sets
José H. Dulá and Francisco J. López
Data Envelopment Analysis (DEA) is a clustering methodology for records in data sets corresponding to entities sharing a common list of attributes. Broadly defined, DEA partitions the records into two subsets: those that are "efficient" and those that are "inefficient." An efficient record is one that lies on the "frontier," a specific portion of the boundary of a finitely generated polyhedral set in a space of the same dimension as the attribute space; inefficient records are those located elsewhere. In traditional applications, DEA frontiers are nonparametric surrogates for unknown theoretical efficiency limits. More generally, however, frontiers are subsets of the boundary defined by extreme elements of the data set. This chapter deals with DEA under this broader definition as it applies to large-scale problems.
Keywords: Clustering, Data envelopment analysis, Nonparametric estimation, Linear programming, Convex polyhedral set, Envelopment forms, Decomposition algorithms.