words [options] files
|Report the number of occurrences of each word.|
|Report only the total number of words.|
|Convert input to lower case before detecting words.|
|Set the pattern defining the word separators.|
|Print version and exit.|
|Print short help and exit.|
|Print full documentation via less and exit.|
The word separators are set with the --pattern option. By default, any character other than an underscore or an alphabetic character (including accented characters) acts as a separator.
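The default separator rule can be approximated with standard tools. The following is only a sketch, not the way words itself is implemented: split_words is a hypothetical helper name, and the pipeline assumes ASCII input (accented letters would need a multibyte-aware tool and a UTF-8 locale).

```shell
# split_words: hypothetical helper approximating the default rule of words.
# Assumption: ASCII input only; this is not the implementation of words.
split_words() {
  # -c: take the complement of "letters and underscore" (i.e. the separators)
  # -s: squeeze each run of separators into a single newline
  LC_ALL=C tr -cs '[:alpha:]_' '\n' |
  sed '/^$/d'    # drop the empty line left when input starts with a separator
}

printf 'foo_bar costs 77.50 (net)\n' | split_words
```

The digits, the period and the parentheses all act as separators here, so only foo_bar, costs and net are printed, one per line.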
Without the --count option, the output is a single column of words, sorted in case-insensitive order. With the
--count option, two tab-separated columns appear, with the counts in column 1 and the words in column 2; rows are sorted in descending numerical order on column 1, with ties sorted in the normal order on column 2.
The --fold option converts all input to lowercase.
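The output format described above (fold, count, then sort by count and word) can be imitated with a pipeline of standard tools. This is a hedged sketch under stated assumptions: count_folded is a hypothetical helper name, the input is ASCII, and sorting uses the C locale, which may order ties differently from words itself.

```shell
# count_folded: hypothetical helper approximating `words --count --fold`.
# Assumptions: ASCII input, C-locale ordering; not the implementation of words.
count_folded() {
  LC_ALL=C tr '[:upper:]' '[:lower:]' |     # fold upper case to lower case
  LC_ALL=C tr -cs '[:alpha:]_' '\n' |       # one word per line
  sed '/^$/d' |                             # drop a possible leading empty line
  LC_ALL=C sort | uniq -c |                 # count identical words
  awk '{ printf "%s\t%s\n", $1, $2 }' |     # emit count<TAB>word
  LC_ALL=C sort -k1,1nr -k2,2               # count descending, then word
}

printf 'The cat and the hat\n' | count_folded
```

For this input the first row is the count 2 with the word the, followed by and, cat and hat with a count of 1 each.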
The Prêt-à-porter robe is priced at € 77.50, the shoes (ladies' only) at € 255.
To show the words in this sample text, stored in a file named test:
words test #=> à at is ladies only porter priced Prêt robe shoes the The
To count the words, after folding upper to lower case:
words --count --fold test #=> 2 at 2 the 1 à 1 is 1 ladies 1 only 1 porter 1 prêt 1 priced 1 robe 1 shoes
To allow - to be a possible word character, thus finding words like Prêt-à-porter:
words -p '[^[:alpha:]-]' test #=> at is ladies only priced Prêt-à-porter robe shoes the The
Note that the - must be the last character inside the brackets; otherwise it is interpreted as a range operator between its two neighbouring characters.
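The effect of the position of - can be checked with grep. This is an illustration only, assuming that words follows the usual POSIX bracket-expression rules (which its use of [:alpha:] suggests); grep -o is a common extension rather than part of words.

```shell
# '-' placed last inside the brackets is a literal hyphen,
# so hyphenated words stay in one piece:
printf 'pret-a-porter robe\n' | LC_ALL=C grep -oE '[[:alpha:]-]+'

# '-' between two characters denotes a range (here a, b and c);
# the hyphen itself no longer matches and splits the text:
printf 'a-c b\n' | LC_ALL=C grep -oE '[a-c]+'
```

The first command prints pret-a-porter and robe; the second prints a, c and b on separate lines.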
To count the number of backslashes in a TeX file:
words --pattern='[^\\]' -c test #=>
but, of course, this is a lot faster:
tr -dc '\\' <test |wc -c