Wordle opening strategy

    Wordle background

    Wordle is a popular free online word game where the goal is to guess a five-letter word. It's essentially the 1970s board game Mastermind, but with words.

    With each guess, you get feedback on which letters in your guess are in the answer in the same position, in the answer in a different position, or not in the answer at all. Using this feedback you can make better guesses, and you have six tries to reach the right answer.

    For example, if the answer was "cigar" and I guessed "climb", then I would get feedback that "c" is in the correct position, "i" is in the wrong position, and "l", "m", and "b" are not in the answer.
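
The feedback rule can be sketched in a few lines of Python. This is an illustration only, not the game's actual source: the function name wordle_feedback and the labels "correct", "present", and "absent" are made up for this example, and the handling of repeated letters follows the commonly described behaviour (a letter is only flagged as present as many times as it appears unmatched in the answer).

```python
from collections import Counter


def wordle_feedback(guess: str, answer: str) -> list[str]:
    """Label each guessed letter as 'correct', 'present', or 'absent'."""
    feedback = ["absent"] * len(guess)

    # Answer letters that were not matched in the exact position;
    # only these can still earn a 'present' label.
    remaining = Counter(a for g, a in zip(guess, answer) if g != a)

    # First pass: exact position matches.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "correct"

    # Second pass: right letter, wrong position, respecting letter counts.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g != a and remaining[g] > 0:
            feedback[i] = "present"
            remaining[g] -= 1

    return feedback


print(wordle_feedback("climb", "cigar"))
# ['correct', 'absent', 'present', 'absent', 'absent']
```

The two-pass structure is there so that exact matches are claimed first; otherwise a repeated letter in the guess could be reported as present when the only copy in the answer is already accounted for.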

    The goal of the opening couple of guesses is to get as much feedback as possible about which letters are in the solution in order to make more educated guesses with the remaining four tries.

    USA Today suggests using "adieu" as an opener, since it contains four vowels. The Sun meanwhile suggests a two-vowel opener followed by a word with popular consonants like "t" or "s". These are intuitively reasonable ideas but neither newspaper seems to have performed any analysis to come up with optimal words.

    Letter frequency of all words

    An important concept in determining the strategy for word games like this is letter frequency. That is, some letters crop up in English more often than others. For example, "e" occurs much more often than "z". We can get the counts of each letter by importing a list of English words. This analysis uses Mieliestronk's list of 58,000 words. It uses British spellings rather than American. My gut feeling is that this means slightly more "s"s and "u"s and slightly fewer "z"s, but it's representative enough for this analysis.

    The easiest way to retrieve this list of words is to use the read_csv() function from pandas. This gives us a one-column DataFrame, which we can convert into a list. Although I could pull the data directly from the URL, I've downloaded it locally to avoid repeatedly accessing the webpage.

    We need two arguments: header=None, so it doesn't treat the first word as a header line, and keep_default_na=False, so it doesn't treat the word "null" as a missing value.

    import pandas as pd
    words = pd.read_csv(
        "corncob_lowercase.txt", 
        header=None, 
        keep_default_na=False
    )[0].to_list()
    words[0:5]
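
To see why keep_default_na=False matters, here is a small sketch that reads the same three-word sample twice. The sample string is invented for the demonstration and is fed through io.StringIO instead of the real word file.

```python
import io

import pandas as pd

# A made-up three-line sample standing in for the real word list.
sample = "aardvark\nnull\nzebra\n"

# Default behaviour: "null" is on pandas' list of missing-value markers.
default = pd.read_csv(io.StringIO(sample), header=None)[0].to_list()

# With keep_default_na=False the word survives as an ordinary string.
kept = pd.read_csv(
    io.StringIO(sample),
    header=None,
    keep_default_na=False,
)[0].to_list()

print(default)
print(kept)
```

In the first result the middle entry comes back as a missing value rather than the string "null"; in the second all three words are preserved.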

    The next step is to convert each word into a list of letters. Here I've used a list comprehension.

    letters_by_word = [list(word) for word in words]
    letters_by_word[0:5]

    This list of lists isn't convenient to work with, so we need to flatten it into a single list.

    import itertools
    letters = list(itertools.chain(*letters_by_word))
    letters[0:10]
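
As an aside, the flattening and counting steps can also be done with the standard library alone. The sketch below is an alternative, not the approach this analysis uses: it runs on a made-up three-word sample, and sample_words and letter_tally are names chosen for the example. Because a string is already an iterable of characters, itertools.chain.from_iterable can flatten the words directly without first building the list of lists.

```python
from collections import Counter
from itertools import chain

# Tiny stand-in for the full word list.
sample_words = ["cigar", "climb", "adieu"]

# Flatten the words into one stream of letters and count each letter.
letter_tally = Counter(chain.from_iterable(sample_words))

print(letter_tally.most_common(1))
# [('i', 3)]
```

The pandas route is still handy for what follows, since a data frame of counts is easier to plot.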

    Next, we need to get counts of each letter, and visualize them. It's actually easier to do this by putting them back into a data frame. There's a bit of messing about in order to get the letters to be a categorical variable that will display in order of frequency from most common letter to least common.

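    # reset_index(name="n") names the count column "n" on any pandas version.
    # (From pandas 2.0 the column is called "count", so renaming column 0 has no effect.)
    letter_counts = (
        pd.DataFrame({"letter": letters})
        .value_counts()
        .reset_index(name="n")
    )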
    letter_counts["letter"] = pd.Categorical(
        letter_counts["letter"], 
        letter_counts["letter"][::-1]
    )
    letter_counts.head()
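
To make the ordering trick easier to follow, here is the same idea on a toy input. The six-letter input and the toy_ names are invented for this sketch, and reset_index(name="n") is just one way to name the count column. Reversing the letters before passing them as categories puts the most common letter last in the category order.

```python
import pandas as pd

# Toy input: "e" appears three times, "a" twice, "t" once.
toy_letters = list("eeeaat")

# Count each letter; value_counts() sorts from most to least common.
toy_counts = (
    pd.DataFrame({"letter": toy_letters})
    .value_counts()
    .reset_index(name="n")
)

# Use the reversed frequency order as the category order.
toy_counts["letter"] = pd.Categorical(
    toy_counts["letter"],
    categories=toy_counts["letter"][::-1],
)

print(toy_counts["letter"].cat.categories.to_list())
# ['t', 'a', 'e']
```

The rows themselves stay sorted from most to least common; only the category order, which plotting libraries typically use to lay out an axis, is reversed.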