.. _datadb_mnist: ********************************************************************* LeCun et al. (1999): The MNIST Dataset Of Handwritten Digits (Images) ********************************************************************* The MNIST_ dataset of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting. See http://yann.lecun.com/exdb/mnist for more information. .. note:: The version that is offered here is identical to the four files distributed there, but has been converted into a single HDF5 file than can easily be read by PyMVPA. Terms Of Use ============ `Yann LeCun`_ (Courant Institute, NYU) and `Corinna Cortes`_ (Google Labs, New York) hold the copyright of MNIST_ dataset, which is a derivative work from original NIST datasets. MNIST_ dataset is made available under the terms of the `Creative Commons Attribution-Share Alike 3.0`_ license. .. _MNIST: http://yann.lecun.com/exdb/mnist .. _Creative Commons Attribution-Share Alike 3.0: http://creativecommons.org/licenses/by-sa/3.0/ .. _Yann LeCun: http://yann.lecun.com/ .. _Corinna Cortes: http://web.me.com/corinnacortes/work/Home.html Download ======== A single hdf5 file containing entire MNIST_ dataset is available from http://data.pymvpa.org/datasets/mnist/ Requirements ============ * HDF5 access facility. * *PyMVPA 0.5* (or later) provides the `h5load()` function (utilizes H5PY_ package). .. _H5PY: http://h5py.alfven.org/ Instructions ============ >>> from mvpa2.suite import * >>> filepath = os.path.join(pymvpa_datadbroot, 'mnist', "mnist.hdf5") >>> datasets = h5load(filepath) >>> train = datasets['train'] >>> test = datasets['test'] >>> print train > >>> print test > >>> # assign a mapper able to recreate 28x28 pixel image arrays >>> test.a.mapper = FlattenMapper(shape=(28, 28)) >>> test.mapper.reverse(test).shape (10000, 28, 28) References ========== :ref:`LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998) `. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278--2324.