Universal Dependencies is a continuously growing multilingual collection of treebanks that contains annotated language data. I use this dataset to analyze various language parameters.
This project performs a thorough per-language statistical analysis of token and sentence lengths for the 91 languages present in version 2.6 of the dataset.
Visit the project page for a more complete description.
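
As a rough illustration of the kind of measurement involved, here is a minimal sketch that collects sentence and token lengths from text in CoNLL-U, the format in which Universal Dependencies treebanks are distributed. It is not the project's actual code: the function name `sentence_and_token_lengths` and the sample data are hypothetical, and it assumes that sentence length is counted in syntactic words (skipping multiword-token ranges and empty nodes) and token length in characters of the word form.

```python
from statistics import mean


def sentence_and_token_lengths(conllu_text):
    """Return (sentence_lengths, token_lengths) for a CoNLL-U string.

    Assumption: a sentence's length is its number of syntactic words,
    and a token's length is the number of characters in its FORM field.
    """
    sentence_lengths = []
    token_lengths = []
    current = 0  # words seen so far in the current sentence

    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if current:
                sentence_lengths.append(current)
                current = 0
            continue
        if line.startswith("#"):
            # Comment lines carry metadata such as "# text = ...".
            continue
        fields = line.split("\t")
        token_id, form = fields[0], fields[1]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if "-" in token_id or "." in token_id:
            continue
        current += 1
        token_lengths.append(len(form))

    # Handle input that does not end with a blank line.
    if current:
        sentence_lengths.append(current)
    return sentence_lengths, token_lengths


# Hypothetical two-sentence sample, not taken from any real treebank.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# text = Hi\n"
    "1\tHi\thi\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

sentence_lengths, token_lengths = sentence_and_token_lengths(SAMPLE)
mean_sentence_length = mean(sentence_lengths)
mean_token_length = mean(token_lengths)
```

Applying the same function to each language's treebank files would give per-language length distributions, from which means, medians and other statistics can be derived.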