Mastodon
  • What is Mastodon?
  • Using Mastodon
    • Signing up for an account
    • Setting up your profile
    • Posting toots
    • Using the network features
    • Dealing with unwanted content
    • Promoting yourself and others
    • Set your preferences
    • More settings
    • Using Mastodon externally
    • Moving or leaving accounts
    • Running your own server
  • Running Mastodon
    • Preparing your machine
    • Installing from source
    • Configuring your environment
    • Installing optional features
      • Full-text search
      • Hidden services
      • Single Sign On
    • Setting up your new instance
    • Using the admin CLI
    • Upgrading to a new release
    • Backing up your server
    • Migrating to a new machine
    • Scaling up your server
    • Moderation actions
    • Troubleshooting errors
      • Database index corruption
  • Developing Mastodon apps
    • Getting started with the API
    • Playing with public data
    • Obtaining client app access
    • Logging in with an account
    • Guidelines and best practices
    • Libraries and implementations
  • Contributing to Mastodon
    • Technical overview
    • Setting up a dev environment
    • Code structure
    • Routes
    • Bug bounties and responsible disclosure
  • Spec compliance
    • ActivityPub
    • WebFinger
    • Security
    • Microformats
    • OAuth
    • Bearcaps
  • REST API
    • OAuth Scopes
    • Rate limits
  • API Methods
    • apps
      • oauth
    • accounts
      • bookmarks
      • favourites
      • mutes
      • blocks
      • domain_blocks
      • filters
      • reports
      • follow_requests
      • endorsements
      • featured_tags
      • preferences
      • suggestions
    • statuses
      • media
      • polls
      • scheduled_statuses
    • timelines
      • conversations
      • lists
      • markers
      • streaming
    • notifications
      • push
    • search
    • instance
      • trends
      • directory
      • custom_emojis
    • admin
    • announcements
    • proofs
    • oembed
  • API Entities
    • Account
    • Activity
    • Admin::Account
    • Admin::Report
    • Announcement
    • AnnouncementReaction
    • Application
    • Attachment
    • Card
    • Context
    • Conversation
    • Emoji
    • Error
    • FeaturedTag
    • Field
    • Filter
    • History
    • IdentityProof
    • Instance
    • List
    • Marker
    • Mention
    • Notification
    • Poll
    • Preferences
    • PushSubscription
    • Relationship
    • Report
    • Results
    • ScheduledStatus
    • Source
    • Status
    • Tag
    • Token

Database index corruption

How to recover from database index corruption.

    • Locale data and collation
    • Am I affected by this issue?
    • Fixing the issue
    • Avoiding the issue

A somewhat common configuration issue can lead to index corruption throughout the database. This page attempts to explain why this may occur and how to fix it.

Locale data and collation

Textual values in the database, such as usernames, or toot identifiers, are compared using so-called collation rules defining how characters are ordered and how to change their case. When setting up a database, Mastodon will use the database server’s default locale settings, including the default collation rules, which often is defined by the operating system’s settings.

Unfortunately, in late 2018, a glibc update changed the collation rules for many locales, which means databases using an affected locale would now order textual values differently. Since the database indexes are algorithmic structures which rely on the ordering of the values they are indexing, some of them would become inconsistent.

More information: https://wiki.postgresql.org/wiki/Locale_data_changes https://postgresql.verite.pro/blog/2018/08/27/glibc-upgrade.html

Am I affected by this issue?

If your database is not using C or POSIX for its collation setting (which you can check with SELECT datcollate FROM pg_database WHERE datname = current_database();), your indexes might be inconsistent, if you ever ran with a version of glibc prior to 2.28 and did not immediately reindex your databases after updating to glibc 2.28 or newer.

You may have found this page because of PgHero warnings about “Duplicate Indexes”. While such warnings can sometimes be indicative of an issue in deploying or updating Mastodon, they are not related to database index corruption and not indicative of any functional issue with your database.

You can check whether your indexes are consistent using PostgreSQL’s amcheck module: as the database server’s super user, connect to your Mastodon database and issue the following (this may take a while):

CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_check(c.oid)
FROM pg_index i
JOIN pg_class c ON i.indexrelid = c.oid
WHERE c.relname IN ('index_account_domain_blocks_on_account_id_and_domain',
  'index_account_proofs_on_account_and_provider_and_username',
  'index_accounts_on_username_and_domain_lower', 'index_accounts_on_uri',
  'index_accounts_on_url', 'index_conversations_on_uri',
  'index_custom_emoji_categories_on_name',
  'index_custom_emojis_on_shortcode_and_domain',
  'index_devices_on_access_token_id', 'index_domain_allows_on_domain',
  'index_domain_blocks_on_domain', 'index_email_domain_blocks_on_domain',
  'index_invites_on_code', 'index_markers_on_user_id_and_timeline',
  'index_media_attachments_on_shortcode', 'index_preview_cards_on_url',
  'index_statuses_on_uri', 'index_tags_on_name_lower',
  'index_tombstones_on_uri', 'index_unavailable_domains_on_domain',
  'index_users_on_email', 'index_webauthn_credentials_on_external_id'
);

If this raises an error, your database is corrupted and needs fixing. If it does not, you may need to perform more involved checks to be sure. Unlike the previous checks, those more involved checks will lock tables when running, thus interfering with the availability of your instance.

CREATE EXTENSION IF NOT EXISTS amcheck;
SELECT bt_index_parent_check(c.oid)
FROM pg_index i
JOIN pg_class c ON i.indexrelid = c.oid
WHERE c.relname IN ('index_account_domain_blocks_on_account_id_and_domain',
  'index_account_proofs_on_account_and_provider_and_username',
  'index_accounts_on_username_and_domain_lower', 'index_accounts_on_uri',
  'index_accounts_on_url', 'index_conversations_on_uri',
  'index_custom_emoji_categories_on_name',
  'index_custom_emojis_on_shortcode_and_domain',
  'index_devices_on_access_token_id', 'index_domain_allows_on_domain',
  'index_domain_blocks_on_domain', 'index_email_domain_blocks_on_domain',
  'index_invites_on_code', 'index_markers_on_user_id_and_timeline',
  'index_media_attachments_on_shortcode', 'index_preview_cards_on_url',
  'index_statuses_on_uri', 'index_tags_on_name_lower',
  'index_tombstones_on_uri', 'index_unavailable_domains_on_domain',
  'index_users_on_email', 'index_webauthn_credentials_on_external_id'
);

If this succeeds, without returning an error, your database should be consistent, and you can safely disregard the warning Mastodon emits when running db:migrate.

Fixing the issue

Unless you take action, if you are affected, your database could get more and more inconsistent as the time pass. Therefore, it is important to fix it as soon as possible.

Mastodon 3.2.2 and later come with a semi-interactive script to fix those corruptions as best as possible. If you’re on an earlier version, update to 3.2.2 first. It is possible that running the database migrations to 3.2.2 will fail because of those very corruptions, but the database should then be brought to a state that the maintenance tool bundled with Mastodon 3.2.2 can then recover from.

Before attempting to fix your database, stop Mastodon and make a backup of your database. Then, with Mastodon still stopped, run the maintenance script:

RAILS_ENV=production tootctl maintenance fix-duplicates

The script will walk through the database to automatically find duplicates and fix them. In some cases, those operations are destructive. In the most destructive cases, you will be asked to choose which record to keep and which records to discard. In all cases, walking through the whole database in search of duplicates is an extremely long operation.

In some cases, duplicate records may have unreconcilable conflicts (such as two different local users sharing the same username). In these cases, the deduplication operation may be partially destructive and you will be asked which records to keep unchanged and which records will be changed. This script is therefore semi-interactive. In all cases, walking through the whole database in search of duplicates is an extremely long operation.
Because the maintenance script will temporarily remove indexes, Mastodon has to be completely stopped during the whole process to prevent additional duplicates from occurring.

Avoiding the issue

To avoid the issue, reindex your database immediately after any libc update. The SQL command REINDEX or the reindexdb command-line tool may be useful for this.

Last updated January 16, 2022 · Improve this page

Sponsored by

Dotcom-Monitor LoadView Stephen Tures Swayable

Join Mastodon · Blog · ·

View source · CC BY-SA 4.0 · Imprint