Word Count Analyzer

Have you ever had different tools give you conflicting word counts? Although counting words seems like a mundane task, it is sometimes the cause of a lot of unnecessary stress in client-translator relationships. Your client's tool reports one word count, and your tool reports another. What is causing the difference? This tool will tell you. TM-Town's Word Count Analyzer searches your text for areas that are known to cause word count discrepancies across different tools and reports them to you. Try the live demo below!

Word Count Analyzer is an open source tool built by TM-Town. Currently this tool supports English; other language support is coming very soon. To see all advanced configuration options choose the "Advanced" radio button below.

TM-Town's recommended settings (aka the default settings) are highlighted in orange text

Word count totals for Microsoft Word, wc (Unix) and Pages are merely estimates. The algorithms behind those word counts are liable to change at any time.

Learn More

Common word count gray areas include:

  • Ellipses
  • Hyperlinks
  • Contractions
  • Hyphenated Words
  • Dates
  • Numbers
  • Numbered Lists
  • XML and HTML tags
  • Forward slashes and backslashes
  • Punctuation

Other gray areas not covered by this tool:

  • Headers
  • Footers
  • Hidden Text specific to Microsoft Word

Ellipsis

default = 'ignore'

  • 'ignore'
    Ignores all ellipses in the word count total.
  • 'no_special_treatment'
    Ellipses will not be searched for in the string.

Checks for any occurrences of ellipses in your text. Writers format ellipses in different ways, and although style guides exist, their rules are rarely followed.

Three Consecutive Periods ...

Tool Word Count
TM-Town 0
Microsoft Word / wc (Unix) 1
Pages 0

Four Consecutive Periods ....

Tool Word Count
TM-Town 0
Microsoft Word / wc (Unix) 1
Pages 0

Three Periods With Spaces . . .

Tool Word Count
TM-Town 0
Microsoft Word / wc (Unix) 3
Pages 0

Four Periods With Spaces . . . .

Tool Word Count
TM-Town 0
Microsoft Word / wc (Unix) 4
Pages 0

Horizontal Ellipsis …

Tool Word Count
TM-Town 0
Microsoft Word / wc (Unix) 1
Pages 0
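
The counts above can be reproduced with a minimal sketch, assuming a plain whitespace split approximates wc-style counting. The `ELLIPSIS` pattern and both function names are hypothetical illustrations, not TM-Town's implementation.

```python
import re

# Hypothetical pattern: spaced periods (". . ." or ". . . ."), three or more
# consecutive periods, or the single horizontal ellipsis character (U+2026).
ELLIPSIS = re.compile(r"(?:\.\s){2,}\.|\.{3,}|\u2026")

def whitespace_count(text: str) -> int:
    """Count whitespace-separated tokens, as wc-style tools do."""
    return len(text.split())

def count_ignoring_ellipses(text: str) -> int:
    """Blank out ellipses first, then count what is left ('ignore')."""
    return whitespace_count(ELLIPSIS.sub(" ", text))

print(whitespace_count(". . ."))         # each spaced period is its own token
print(count_ignoring_ellipses(". . ."))  # nothing left to count
```

Because a whitespace split treats every spaced period as a token, the spaced formats are where the two approaches diverge most.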

Hyperlink

default = 'count_as_one'

  • 'count_as_one'
    Counts a hyperlink as one word.
  • 'no_special_treatment'
    Hyperlinks will not be searched for in the string. Therefore, how a hyperlink is handled in the word count will depend on other settings (mainly slashes).
  • 'split_at_period'
    Splits a hyperlink at each period and counts each token as a separate word, as Pages does.

http://www.example.com

Tool Word Count
TM-Town 1
Microsoft Word / wc (Unix) 1
Pages 4

Contraction

default = 'count_as_one'

  • 'count_as_one'
    Counts a contraction as one word.
  • 'count_as_multiple'
    Splits a contraction into the words that make it up. Examples:
    • don't => do not (2 words)
    • o'clock => of the clock (3 words)

Most tools count contractions as one word. Some might argue a contraction is technically more than one word.

can't

Tool Word Count
TM-Town 1
Microsoft Word / wc (Unix) 1
Pages 1
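
As a rough sketch of the two settings, assuming a small lookup table of expansions: the `EXPANSIONS` table and the `count_contraction` function are hypothetical, and a real tool would need a far larger table.

```python
# Hypothetical expansion table holding only the two examples listed above.
EXPANSIONS = {
    "don't": "do not",
    "o'clock": "of the clock",
}

def count_contraction(token: str, contraction: str = "count_as_one") -> int:
    """Count one contraction token under the chosen setting."""
    if contraction == "count_as_one":
        return 1
    # Normalize case and the typographic apostrophe before the lookup.
    key = token.lower().replace("\u2019", "'")
    # Unknown contractions fall back to a single word in this sketch.
    return len(EXPANSIONS.get(key, key).split())

print(count_contraction("o'clock", "count_as_multiple"))
```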

Hyphenated word

default = 'count_as_one'

  • 'count_as_one'
    Counts a hyphenated word as one word.
  • 'count_as_multiple'
    Breaks a hyphenated word at each hyphen and counts each word separately. Example:
    • devil-may-care (3 words)

devil-may-care

Tool Word Count
TM-Town 1
Microsoft Word / wc (Unix) 1
Pages 3
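
A minimal sketch of the two settings; the function name `count_hyphenated` is hypothetical.

```python
def count_hyphenated(token: str, hyphenated_word: str = "count_as_one") -> int:
    """Count one hyphenated token under the chosen setting."""
    if hyphenated_word == "count_as_multiple":
        # Drop empty parts so a leading or trailing hyphen adds no word.
        return len([part for part in token.split("-") if part])
    return 1

print(count_hyphenated("devil-may-care"))
print(count_hyphenated("devil-may-care", "count_as_multiple"))
```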

Date

default = 'no_special_treatment'

  • 'no_special_treatment'
    Dates will not be searched for in the string. Therefore, how a date is handled in the word count will depend on other settings.
  • 'count_as_one'
    Counts a date as one word. This is more commonly seen in translation CAT tools where a date is thought of as a placeable that can usually be automatically translated. Examples:
    • Monday, April 4th, 2011 (1 word)
    • April 4th, 2011 (1 word)
    • 04/04/2011 (1 word)
    • 04.04.2011 (1 word)
    • 2011/04/04 (1 word)
    • 2011-04-04 (1 word)
    • 2003Nov9 (1 word)
    • 2003 November 9 (1 word)
    • 2003-Nov-9 (1 word)
    • and others...

Most word processing tools do not recognize dates, but translation CAT tools tend to treat a date as one word or placeable. TM-Town's tool checks for many date formats, including those that contain day or month abbreviations. A few examples are listed below (not an exhaustive list).

Monday, April 4th, 2011

Tool Word Count
TM-Town 4
Microsoft Word / wc (Unix) 4
Pages 4

04/04/2011

Tool Word Count
TM-Town 1
Microsoft Word / wc (Unix) 1
Pages 3

04.04.2011

Tool Word Count
TM-Town 1
Microsoft Word / wc (Unix) 1
Pages 1
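
One way to picture the 'count_as_one' setting is to replace every recognized date with a single placeholder before counting. The sketch below assumes a deliberately small set of patterns; `MONTHS`, `DATE`, and `count_with_dates_as_one` are hypothetical names, and weekday names and month abbreviations are not covered.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# Hypothetical patterns for a few of the formats listed above.
DATE = re.compile(
    r"\b(?:"
    r"\d{1,2}[/.-]\d{1,2}[/.-]\d{4}"                      # 04/04/2011, 04.04.2011
    r"|\d{4}[/.-]\d{1,2}[/.-]\d{1,2}"                     # 2011/04/04, 2011-04-04
    rf"|\d{{4}} (?:{MONTHS}) \d{{1,2}}"                   # 2003 November 9
    rf"|(?:{MONTHS}) \d{{1,2}}(?:st|nd|rd|th)?, \d{{4}}"  # April 4th, 2011
    r")\b"
)

def count_with_dates_as_one(text: str) -> int:
    """Replace each recognized date with one placeholder, then count tokens."""
    return len(DATE.sub("DATE", text).split())

print(len("April 4th, 2011".split()))              # plain whitespace split
print(count_with_dates_as_one("April 4th, 2011"))  # date collapsed to one word
```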

Number

default = 'count'

  • 'count'
    Counts a number as one word.
  • 'ignore'
    Ignores any numbers in the string (with the exception of dates and numbered_lists) and does not count them towards the word count.

Simple number 200

Tool Word Count
TM-Town 1
Microsoft Word / wc (Unix) 1
Pages 1

Number with preceding unit $200

Tool Word Count
TM-Town 1
Microsoft Word / wc (Unix) 1
Pages 1

Number with unit following 50%

Tool Word Count
TM-Town 1
Microsoft Word / wc (Unix) 1
Pages 1
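
A minimal sketch of the 'ignore' setting, assuming a number is a whole token made of digits with an optional currency symbol in front or a percent sign behind. The `NUMBER` pattern and `count_words` are hypothetical, and the exception for dates and numbered lists mentioned above is not modeled.

```python
import re

# Hypothetical definition of a number token: optional currency symbol, digits
# with optional "." or "," groups, optional trailing percent sign.
NUMBER = re.compile(r"[$€£¥]?\d+(?:[.,]\d+)*%?")

def count_words(text: str, number: str = "count") -> int:
    """Count whitespace-separated tokens, optionally skipping numbers."""
    tokens = text.split()
    if number == "ignore":
        tokens = [t for t in tokens if not NUMBER.fullmatch(t)]
    return len(tokens)

sample = "It costs $200 for 50% of 1,000 units"
print(count_words(sample), count_words(sample, number="ignore"))
```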

Numbered list

default = 'count'

  • 'count'
    Counts a number in a numbered list as one word.
  • 'ignore'
    Ignores any numbers that are part of a numbered list and does not count them towards the word count.

1. List item a
2. List item b
3. List item c

Tool Word Count
TM-Town 12
Microsoft Word / wc (Unix) 12
Pages 9
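
The difference between 12 and 9 in the table comes down to whether the three list markers are counted. A minimal sketch, assuming a marker is a run of digits followed by "." or ")" at the start of a line; `LIST_MARKER` and `count_words` are hypothetical names.

```python
import re

# Hypothetical marker pattern: digits plus "." or ")" at the start of a line.
LIST_MARKER = re.compile(r"^\s*\d+[.)]\s+", re.MULTILINE)

def count_words(text: str, numbered_list: str = "count") -> int:
    """Count whitespace-separated tokens, optionally dropping list markers."""
    if numbered_list == "ignore":
        text = LIST_MARKER.sub("", text)
    return len(text.split())

items = "1. List item a\n2. List item b\n3. List item c"
print(count_words(items), count_words(items, numbered_list="ignore"))
```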

XML and HTML

default = 'remove'

  • 'remove'
    Removes any XML or HTML opening and closing tags from the string.
  • 'keep'
    Leaves any XML or HTML in the string in place; tags receive no special treatment.

<span class="large-text">Hello world</span> <new-tag>Hello</new-tag>

Tool Word Count
TM-Town 3
Microsoft Word / wc (Unix) 4
Pages 12
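
A minimal sketch of the 'remove' setting, assuming tags can be matched with a simple regular expression (which is not a full XML or HTML parser). The `TAG` pattern and `count_words` are hypothetical; replacing each tag with a space instead of nothing is also an assumption, made so that text on either side of a tag is not glued together.

```python
import re

# Hypothetical tag pattern: "<" or "</", a letter, then anything up to ">".
TAG = re.compile(r"</?[A-Za-z][^<>]*>")

def count_words(text: str, xhtml: str = "remove") -> int:
    """Count whitespace-separated tokens, optionally stripping tags first."""
    if xhtml == "remove":
        text = TAG.sub(" ", text)
    return len(text.split())

markup = '<span class="large-text">Hello world</span> <new-tag>Hello</new-tag>'
print(count_words(markup), count_words(markup, xhtml="keep"))
```

With the tags kept, a whitespace split sees four tokens because the space inside the opening `span` tag breaks it in two.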

Forward slash

default = 'count_as_multiple_except_dates'

  • 'count_as_multiple_except_dates'
    Separates any tokens that include a forward slash (except dates) at each slash and counts each token individually. Example:
    • she/he/it 4/25/2014 (4 words)
  • 'count_as_multiple'
    Separates any tokens that include a forward slash at each slash and counts each token individually. Whether dates, hyperlinks, and XML/HTML are included depends on what is set for those options. Example:
    • she/he/it (3 words)
  • 'count_as_one'
    Counts any tokens that include a forward slash as one word. Example:
    • she/he/it (1 word)

she/he/it

Tool Word Count
TM-Town 3
Microsoft Word / wc (Unix) 1
Pages 3
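
The three settings can be sketched as follows, assuming only numeric slash dates need protecting. `SLASH_DATE`, `count_token`, and `count_words` are hypothetical names, and the interaction with the date, hyperlink, and XML/HTML settings is not modeled.

```python
import re

# Hypothetical date pattern limited to numeric dates written with slashes.
SLASH_DATE = re.compile(r"\d{1,2}/\d{1,2}/\d{4}|\d{4}/\d{1,2}/\d{1,2}")

def count_token(token: str, forward_slash: str) -> int:
    """Count a single whitespace-delimited token under the chosen setting."""
    if forward_slash == "count_as_one":
        return 1
    if (forward_slash == "count_as_multiple_except_dates"
            and SLASH_DATE.fullmatch(token)):
        return 1
    # Split at every slash and drop empty parts.
    return len([part for part in token.split("/") if part])

def count_words(text: str,
                forward_slash: str = "count_as_multiple_except_dates") -> int:
    return sum(count_token(t, forward_slash) for t in text.split())

print(count_words("she/he/it 4/25/2014"))
```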

Backslash

default = 'count_as_one'

  • 'count_as_one'
    Counts any tokens that include a backslash as one word. Example:
    • c:\Users\johndoe (1 word)
  • 'count_as_multiple'
    Separates any tokens that include a backslash at each backslash and counts each token individually. Example:
    • c:\Users\johndoe (3 words)

c:\Users\johndoe

Tool Word Count
TM-Town 1
Microsoft Word / wc (Unix) 1
Pages 3

Dotted line

default = 'ignore'

  • 'ignore'
    Ignores any dotted lines in the string and does not count them towards the word count.
  • 'count'
    Counts a dotted line as one word.

.........

Tool Word Count
TM-Town 0
Microsoft Word / wc (Unix) 1
Pages 0

………………………

Tool Word Count
TM-Town 0
Microsoft Word / wc (Unix) 1
Pages 0

Dashed line

default = 'ignore'

  • 'ignore'
    Ignores any dashed lines in the string and does not count them towards the word count.
  • 'count'
    Counts a dashed line as one word.

-----------

Tool Word Count
TM-Town 0
Microsoft Word / wc (Unix) 1
Pages 0

Underscore

default = 'ignore'

  • 'ignore'
    Ignores any series of underscores in the string and does not count them towards the word count.
  • 'count'
    Counts a series of underscores as one word.

____________

Tool Word Count
TM-Town 0
Microsoft Word / wc (Unix) 1
Pages 0
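
The dotted line, dashed line, and underscore settings all follow the same idea: a token made up entirely of one filler character is either skipped or counted as one word. A minimal sketch follows; the `FILLER` patterns, their minimum lengths, and `count_words` are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns. The minimum lengths are arbitrary assumptions,
# chosen so a three-period ellipsis is not mistaken for a dotted line.
FILLER = {
    "dotted_line": re.compile(r"\.{5,}|\u2026{2,}"),
    "dashed_line": re.compile(r"-{3,}"),
    "underscore": re.compile(r"_{3,}"),
}

def count_words(text: str, **settings: str) -> int:
    """Count tokens; each filler kind defaults to 'ignore'."""
    count = 0
    for token in text.split():
        kind = next((k for k, p in FILLER.items() if p.fullmatch(token)), None)
        if kind is not None and settings.get(kind, "ignore") == "ignore":
            continue  # skip filler lines that are set to 'ignore'
        count += 1
    return count

print(count_words("Name: ____________"))
print(count_words("Name: ____________", underscore="count"))
```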

Stray punctuation

default = 'ignore'

  • 'ignore'
    Ignores any punctuation marks surrounded on both sides by whitespace in the string and does not count them towards the word count.
  • 'count'
    Counts a punctuation mark surrounded on both sides by whitespace as one word.

?

Tool Word Count
TM-Town 0
Microsoft Word / wc (Unix) 1
Pages 0
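
A minimal sketch of the two settings, assuming that "stray" means a token consisting of exactly one ASCII punctuation character; `count_words` is a hypothetical name.

```python
import string

def count_words(text: str, stray_punctuation: str = "ignore") -> int:
    """Count tokens, optionally skipping lone punctuation marks."""
    tokens = text.split()
    if stray_punctuation == "ignore":
        tokens = [t for t in tokens
                  if not (len(t) == 1 and t in string.punctuation)]
    return len(tokens)

print(count_words("Is it ? Yes"))
print(count_words("Is it ? Yes", stray_punctuation="count"))
```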

Additional Resources