Incomplete Words

In the context of formal languages, any finite sequence of letters of a given alphabet $\Sigma$ is called a word over $\Sigma$. We call a word incomplete if it does not contain every letter of $\Sigma$.

For example, using the alphabet $\Sigma=\{ a, b, c\}$, ‘$ab$’, ‘$abab$’ and ‘$\,$’ (the empty word) are incomplete words over $\Sigma$, while ‘$abac$’ is a complete word over $\Sigma$.

Given an alphabet $\Sigma$ of $\alpha$ letters, we define $I(\alpha,n)$ to be the number of incomplete words over $\Sigma$ with a length not exceeding $n$.
For example, $I(3,0)=1$, $I(3,2)=13$ and $I(3,4)=79$.

Find $I(10^7,10^{12})$. Give your answer modulo $1\,000\,000\,007$.

This problem is related to combinatorics and the theory of finite automata in the field of formal languages.

In terms of the sequences or words that we have to count, it may initially seem that we simply have to count all words of length at most $n$ over an alphabet of $\alpha$ letters. However, we are asked to count incomplete words, meaning words that do not contain every letter of the alphabet.

To approach this, we need to use a technique called the Principle of Inclusion and Exclusion (PIE). This principle provides a method to compute the number of elements in the union of multiple sets by considering their individual sizes and the sizes of their intersections.

First, the total number of sequences of length at most $n$ using an alphabet of $\alpha$ letters, would be the sum of $\alpha^i$ for $i$ in the range $0$ through $n$. This represents the count of all possible combinations we can have, with all possible lengths from $0$ to $n$.

However, this count includes words that are complete (i.e. words that use every letter in the alphabet). To calculate the number of complete words, we use PIE.

Let’s look at the case for words of length exactly $n$. If we exclude $k$ letters, there will be $\alpha-k$ letters remaining, and the number of words of length $n$ over this smaller alphabet is $(\alpha – k)^n$. There are ${\alpha \choose k}$ ways to choose which $k$ letters to exclude.

However, if we just subtract all these counts, we will be excluding some sequences twice. PIE tells us that, to correct for this, we must add back in the counts for cases where we exclude $k+1$ letters. We continue alternating subtractions and additions until we have accounted for cases where we exclude all but one letter.

If we sum this up for all words of length at most $n$, we must also adjust the power in the term $(\alpha – k)^n$ to account for shorter words.

Finally, we subtract this count of complete words from the total count of all words. This will yield the count of incomplete words (I) of length at most $n$ over an alphabet of $\alpha$ letters.

However, considering the large numbers in the problem ($\alpha=10^7, n=10^{12}$), calculating this directly is not feasible. Therefore, for an exact solution, we would have to come up with a clever simplification or optimization of this formula, or use an algorithmic approach utilizing dynamic programming or generating functions, that takes into account the properties of large exponents or the modular arithmetic involved. This goes well beyond typical high school mathematics and into the realm of competitive programming and number theory.

More Answers:
Neighbourly Constraints
Divisible Palindromes
Palindromic Sequences

Error 403 The request cannot be completed because you have exceeded your quota. : quotaExceeded

Share:

Recent Posts

Mathematics in Cancer Treatment

How Mathematics is Transforming Cancer Treatment Mathematics plays an increasingly vital role in the fight against cancer mesothelioma. From optimizing drug delivery systems to personalizing

Read More »