One of the domain controllers in the network was failing and reporting numerous errors with replication, Active Directory object updates, and several other areas. SYSVOL replication was encountering problems as well.
The event log for Active Directory Domain Services was loaded with errors. The DC was logging event IDs 467, 1173, 1084, 2108, 2042, 1925, 1645, and several others.
These events pointed to several distinct issues. Event ID 467 clearly showed that the NTDS database was corrupt.
Event ID 467:
NTDS (584) NTDSA: Database C:\Windows\NTDS\ntds.dit: Index DRA_USN_index of table datatable is corrupted (0).
Event ID 1645 indicated that the SPN for the DC in question was not registered on the Key Distribution Center.
Event ID 1645:
Active Directory Domain Services did not perform an authenticated remote procedure call (RPC) to another directory server because the desired service principal name (SPN) for the destination directory server is not registered on the Key Distribution Center (KDC) domain controller that resolves the SPN.
Destination directory server:
Verify that the names of the destination directory server and domain are correct. Also, verify that the SPN is registered on the KDC domain controller. If the destination directory server has been recently promoted, it will be necessary for the local directory server’s account data to replicate to the KDC before this directory server can be authenticated.
Event ID 1084 showed that the server was unable to replicate AD objects.
Event ID 1084:
Internal event: Active Directory Domain Services could not update the following object with changes received from the following source directory service. This is because an error occurred during the application of the changes to Active Directory Domain Services on the directory service.
Source directory service:
Synchronization of the directory service with the source directory service is blocked until this update problem is corrected.
This operation will be tried again at the next scheduled replication.
Restart the local computer if this condition appears to be related to low system resources (for example, low physical or virtual memory).
8451 The replication operation encountered a database error.
Attempting to replicate the server using repadmin failed as well.
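The post does not record the exact repadmin commands that were used. As a rough sketch only, a manual replication check usually looks something like the following; DC01 is a placeholder for the affected domain controller, not a name from this environment.

```bat
rem Summarize replication health across all domain controllers
repadmin /replsummary

rem Show the inbound replication status of the affected DC (DC01 is a placeholder)
repadmin /showrepl DC01

rem Ask DC01 to synchronize all of its partitions, across sites, reporting partners by DN
repadmin /syncall DC01 /A /e /d
```

On a DC in this state, the sync attempt would be expected to report the same database error (8451) seen in the event log.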
One lengthy logged event ultimately provided the solution: event ID 2108 lists repair procedures that can be attempted to resolve the issues at hand.
Event ID 2108:
This event contains REPAIR PROCEDURES for the 1084 event which has previously been logged. This message indicates a specific issue with the consistency of the Active Directory Domain Services database on this replication destination. A database error occurred while applying replicated changes to the following object. The database had unexpected contents, preventing the change from being made.
Source domain controller:
Please consult KB article 837932. A subset of its repair procedures are listed here.
1. Confirm that sufficient free disk space resides on the volumes hosting the Active Directory Domain Services database then retry the operation. Confirm that the physical drives hosting the NTDS.DIT and log files do not reside on drives where NTFS compression is enabled. Also check for anti-virus software accessing these volumes.
2. It may be of benefit to force the Security Descriptor Propagator to rebuild the object container ancestry in the database. This may be done by following the instructions in KB article 251343.
3. The problem may be related to the object’s parent on this domain controller. On the source domain controller, move the object to have a different parent.
4. If this machine is a global catalog and the error occurs in one of the read-only partitions, you should demote the machine as a global catalog using the Global Catalog checkbox in the Sites & Services user interface. If the error is occurring in an application partition, you can stop the application partition from being hosted on this replica. This may be changed using the ntdsutil.exe command.
5. Obtain the most recent ntdsutil.exe by installing the latest service pack for your operating system. Prior to booting into Directory Services Restore Mode (DSRM), verify that the DSRM password is known. Otherwise reset it prior to restarting the system.
6. In DSRM, run the NT CMD prompt, run “ntdsutil files integrity”. If corruption is found and other replicas exist, then demote replica and check your hardware. If no replicas are present, restore a system state backup and repeat this verification.
7. Perform an offline defragmentation using the “ntdsutil files compact” function.
8. The “ntdsutil semantic database analysis” should also be performed. If errors are found, they may be corrected using the “go fixup” function. Note that this should not be confused with the database maintenance function called “ESE repair”, which should not be used, since it causes data loss for Active Directory Domain Services Databases.
If none of these actions succeed and the replication error continues, you should demote this domain controller and promote it again.
Primary Error value:
8451 The replication operation encountered a database error.
Secondary Error value:
-1414 JET_errSecondaryIndexCorrupted, Secondary index is corrupt. The database must be defragmented
The final lines of this event showed the problem this DC was facing: the database had a corrupted secondary index and needed to be defragmented. Step 8 was attempted first to check for semantic errors, though none were found. The operation must be performed while the AD DS service is not running, so the first step is to stop the service. Doing so automatically stops its dependents, which are the Kerberos KDC, DFS Replication, DNS Server and Intersite Messaging services. The semantic database analysis was then run by starting ntdsutil, activating instance NTDS, entering semantic database analysis and issuing go.
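The sequence described above can be sketched from an elevated command prompt roughly as follows. This assumes a version of Windows Server where AD DS runs as a restartable service named NTDS, and it passes the ntdsutil commands as arguments rather than typing them at the interactive prompt.

```bat
rem Stop AD DS; /y also confirms stopping the dependent services
net stop ntds /y

rem Run the semantic database analysis in read-only mode ("go" reports, it does not fix)
ntdsutil "activate instance ntds" "semantic database analysis" "go" quit quit
```

Had the analysis reported errors, the event text above names "go fixup" as the corrective variant of the same command.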
With no errors reported by the analysis, the offline defragmentation was executed, which generated a new NTDS.dit file. A full backup was taken and the current NTDS database was replaced with the newly defragmented file.
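As a minimal sketch of the defragmentation and swap, run while the AD DS service is stopped: C:\Temp\ntds-compact and C:\Temp\ntds-original.dit are placeholder paths, and the transaction logs are assumed to live in the default C:\Windows\NTDS folder alongside the database. The extra file copy is only a convenience and is not a substitute for the full backup.

```bat
rem Create a scratch folder and compact the database into it (placeholder path)
mkdir C:\Temp\ntds-compact
ntdsutil "activate instance ntds" files "compact to C:\Temp\ntds-compact" quit quit

rem Keep a copy of the original database before overwriting it
copy C:\Windows\NTDS\ntds.dit C:\Temp\ntds-original.dit

rem Replace the database with the compacted copy and remove the old transaction logs
copy /Y C:\Temp\ntds-compact\ntds.dit C:\Windows\NTDS\ntds.dit
del C:\Windows\NTDS\*.log

rem Verify the integrity of the new file before starting AD DS again
ntdsutil "activate instance ntds" files integrity quit quit
```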
All the stopped services were restarted. Following the defragmentation, the DC recovered from all the errors and was able to resume normal operations.
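A sketch of the restart and a quick health check follows. The service short names are the standard ones for the services listed earlier; depending on the system, some dependents may already be running once NTDS starts, in which case net start simply reports that.

```bat
rem Start AD DS and the dependent services that were stopped with it
net start ntds
net start kdc
net start dfsr
net start dns
net start ismserv

rem Confirm that inbound replication is succeeding again
repadmin /showrepl
dcdiag /test:replications
```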
The SYSVOL DFSR replication was also in an error state, with event IDs 2212, 2213 and 6804 being logged. Event ID 2213 also provided its own solution: resume DFS replication by running the following command from an elevated prompt. Heed the notice in this Microsoft KB when resuming DFS replication.
wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig where volumeGuid="GUID" call ResumeReplication
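The GUID placeholder is normally filled in by the text of event ID 2213 itself. If it is not at hand, the volumes known to DFSR can probably be listed with a query along these lines; this is an assumption based on the properties of the same WMI class, not a step taken from the original troubleshooting.

```bat
rem List the GUID and path of each volume DFSR is configured on
wmic /namespace:\\root\microsoftdfs path dfsrVolumeConfig get volumeGuid,volumePath
```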
DFS replication resumed; however, additional errors were soon logged, most notably event ID 6016.
Event ID 6016:
The DFS Replication service failed to update configuration in Active Directory Domain Services. The service will retry this operation periodically.
Object Category: msDFSR-Subscription
Object DN: CN=56c779af-e088-4cdf-a87e-afaf34c8daa2,CN=0c3e30a1-22f5-4d82-b5f1-39a610bfef89,CN=DFSR-LocalSettings,CN=DC,OU=Domain Controllers,DC=domain
Error: 5 (Access is denied.)
Domain Controller: dc.domain
Polling Cycle: 60
The DC was unable to update its configuration in AD DS due to an access denied error. ADSIEdit was launched to check the permissions on this configuration entry, and the permissions for the computer object belonging to this domain controller turned out to be missing. Full control permissions were added back for the computer object.
The DFS Replication service was restarted and the server was able to successfully resume replication.
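For reference, restarting the service from an elevated prompt might look like the sketch below. The final command, which asks DFSR to poll Active Directory for its configuration immediately instead of waiting for the next polling cycle, is an optional extra and not a step mentioned above.

```bat
rem Restart the DFS Replication service
net stop dfsr
net start dfsr

rem Optionally force an immediate configuration poll against Active Directory
dfsrdiag pollad
```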
Isn’t it great when the logged events themselves provide the needed solutions!