Domain Controller loses major functionality randomly

I have a weird scenario: two domain controllers at two different sites communicate over a BOVPN. At random, the main server (named SERVER) stops resolving DNS, and even opening Active Directory fails with a message that it is unable to contact a DNS server.

Site1 = SERVER 
Site2 = FSSERVER 
Site3 = SERVERFS, but this has been decommissioned and removed from AD

The weird part is that I can still remote in from an external source, and from this server I can still ping site2 by IP.

The fix is to reboot the server, but that is not ideal. It is a Small Business Server 2008; here is the output from dcdiag:

Directory Server Diagnosis


Performing initial setup:

   Trying to find home server...

   Home Server = SERVER

   * Identified AD Forest. 
   Done gathering initial info.


Doing initial required tests


   Testing server: Downtown\SERVER

      Starting test: Connectivity

         ......................... SERVER passed test Connectivity



Doing primary tests


   Testing server: Downtown\SERVER

      Starting test: Advertising

         ......................... SERVER passed test Advertising

      Starting test: FrsEvent

         There are warning or error events within the last 24 hours after the

         SYSVOL has been shared.  Failing SYSVOL replication problems may cause

         Group Policy problems. 
         ......................... SERVER passed test FrsEvent

      Starting test: DFSREvent

         ......................... SERVER passed test DFSREvent

      Starting test: SysVolCheck

         ......................... SERVER passed test SysVolCheck

      Starting test: KccEvent

         ......................... SERVER passed test KccEvent

      Starting test: KnowsOfRoleHolders

         ......................... SERVER passed test KnowsOfRoleHolders

      Starting test: MachineAccount

         ......................... SERVER passed test MachineAccount

      Starting test: NCSecDesc

         ......................... SERVER passed test NCSecDesc

      Starting test: NetLogons

         ......................... SERVER passed test NetLogons

      Starting test: ObjectsReplicated

         ......................... SERVER passed test ObjectsReplicated

      Starting test: Replications

         [Replications Check,SERVER] A recent replication attempt failed:

            From FSSERVER to SERVER

            Naming Context: DC=sac,DC=local

            The replication generated an error (8524):

            The DSA operation is unable to proceed because of a DNS lookup failure.



            The failure occurred at 2015-03-18 08:49:00.

            The last success occurred at 2015-03-18 05:48:56.

            1 failures have occurred since the last success.

            The guid-based DNS name

            ea2273d9-dd9a-446d-9bc5-6e9507dbb114._msdcs.sac.local

            is not registered on one or more DNS servers.

         ......................... SERVER failed test Replications

      Starting test: RidManager

         ......................... SERVER passed test RidManager

      Starting test: Services

         ......................... SERVER passed test Services

      Starting test: SystemLog

         An Error Event occurred.  EventID: 0xC00A0032

            Time Generated: 03/18/2015   10:27:34

            Event String:

            The RDP protocol component X.224 detected an error in the protocol stream and has disconnected the client.

         An Warning Event occurred.  EventID: 0x00000450

            Time Generated: 03/18/2015   10:28:21

            Event String:

            Windows was unable to read the Windows Management Instrumentation (WMI) filter information associated with the Group Policy object CN={92F3F35E-4AD5-4F7B-A3E6-A7CE17DBB0C7},CN=POLICIES,CN=SYSTEM,DC=SAC,DC=LOCAL.This may be caused by a deleted WMI Filter defined in the domain that is still in use by Group Policy objects. Group Policy settings for this Group Policy object will not be enforced. Other Group Policy objects may still apply. Windows will attempt to retrieve this information at the next policy cycle. This speciffic problem may be resolved by identifying all GPOs that reference the WMI filter and removing the references. Contact an administrator if this event recurs for several hours.

         An Warning Event occurred.  EventID: 0x00000450

            Time Generated: 03/18/2015   10:33:26

            Event String:

            Windows was unable to read the Windows Management Instrumentation (WMI) filter information associated with the Group Policy object CN={92F3F35E-4AD5-4F7B-A3E6-A7CE17DBB0C7},CN=POLICIES,CN=SYSTEM,DC=SAC,DC=LOCAL.This may be caused by a deleted WMI Filter defined in the domain that is still in use by Group Policy objects. Group Policy settings for this Group Policy object will not be enforced. Other Group Policy objects may still apply. Windows will attempt to retrieve this information at the next policy cycle. This speciffic problem may be resolved by identifying all GPOs that reference the WMI filter and removing the references. Contact an administrator if this event recurs for several hours.

         An Warning Event occurred.  EventID: 0x00000450

            Time Generated: 03/18/2015   10:38:31

            Event String:

            Windows was unable to read the Windows Management Instrumentation (WMI) filter information associated with the Group Policy object CN={92F3F35E-4AD5-4F7B-A3E6-A7CE17DBB0C7},CN=POLICIES,CN=SYSTEM,DC=SAC,DC=LOCAL.This may be caused by a deleted WMI Filter defined in the domain that is still in use by Group Policy objects. Group Policy settings for this Group Policy object will not be enforced. Other Group Policy objects may still apply. Windows will attempt to retrieve this information at the next policy cycle. This speciffic problem may be resolved by identifying all GPOs that reference the WMI filter and removing the references. Contact an administrator if this event recurs for several hours.

         An Warning Event occurred.  EventID: 0x00000450

            Time Generated: 03/18/2015   10:43:36

            Event String:

            Windows was unable to read the Windows Management Instrumentation (WMI) filter information associated with the Group Policy object CN={92F3F35E-4AD5-4F7B-A3E6-A7CE17DBB0C7},CN=POLICIES,CN=SYSTEM,DC=SAC,DC=LOCAL.This may be caused by a deleted WMI Filter defined in the domain that is still in use by Group Policy objects. Group Policy settings for this Group Policy object will not be enforced. Other Group Policy objects may still apply. Windows will attempt to retrieve this information at the next policy cycle. This speciffic problem may be resolved by identifying all GPOs that reference the WMI filter and removing the references. Contact an administrator if this event recurs for several hours.

         An Warning Event occurred.  EventID: 0x00000450

            Time Generated: 03/18/2015   10:48:41

            Event String:

            Windows was unable to read the Windows Management Instrumentation (WMI) filter information associated with the Group Policy object CN={92F3F35E-4AD5-4F7B-A3E6-A7CE17DBB0C7},CN=POLICIES,CN=SYSTEM,DC=SAC,DC=LOCAL.This may be caused by a deleted WMI Filter defined in the domain that is still in use by Group Policy objects. Group Policy settings for this Group Policy object will not be enforced. Other Group Policy objects may still apply. Windows will attempt to retrieve this information at the next policy cycle. This speciffic problem may be resolved by identifying all GPOs that reference the WMI filter and removing the references. Contact an administrator if this event recurs for several hours.

         An Error Event occurred.  EventID: 0xC0001B70

            Time Generated: 03/18/2015   10:50:27

            Event String:

            The Microsoft Exchange Information Store service terminated with service-specific error 0 (0x0).

         An Error Event occurred.  EventID: 0xC000271A

            Time Generated: 03/18/2015   10:53:30

            Event String:

            The server {C1F1173B-21B1-11D2-849B-006008198DC0} did not register with DCOM within the required timeout.

         An Warning Event occurred.  EventID: 0x00000450

            Time Generated: 03/18/2015   10:53:46

            Event String:

            Windows was unable to read the Windows Management Instrumentation (WMI) filter information associated with the Group Policy object CN={92F3F35E-4AD5-4F7B-A3E6-A7CE17DBB0C7},CN=POLICIES,CN=SYSTEM,DC=SAC,DC=LOCAL.This may be caused by a deleted WMI Filter defined in the domain that is still in use by Group Policy objects. Group Policy settings for this Group Policy object will not be enforced. Other Group Policy objects may still apply. Windows will attempt to retrieve this information at the next policy cycle. This speciffic problem may be resolved by identifying all GPOs that reference the WMI filter and removing the references. Contact an administrator if this event recurs for several hours.

         An Warning Event occurred.  EventID: 0x00000450

            Time Generated: 03/18/2015   10:54:52

            Event String:

            Windows was unable to read the Windows Management Instrumentation (WMI) filter information associated with the Group Policy object CN={92F3F35E-4AD5-4F7B-A3E6-A7CE17DBB0C7},CN=POLICIES,CN=SYSTEM,DC=SAC,DC=LOCAL.This may be caused by a deleted WMI Filter defined in the domain that is still in use by Group Policy objects. Group Policy settings for this Group Policy object will not be enforced. Other Group Policy objects may still apply. Windows will attempt to retrieve this information at the next policy cycle. This speciffic problem may be resolved by identifying all GPOs that reference the WMI filter and removing the references. Contact an administrator if this event recurs for several hours.

         An Warning Event occurred.  EventID: 0x00000450

            Time Generated: 03/18/2015   10:54:52

            Event String:

            Windows was unable to read the Windows Management Instrumentation (WMI) filter information associated with the Group Policy object CN={0C900DC5-7BD9-48C0-B340-F3373D17ED05},CN=POLICIES,CN=SYSTEM,DC=SAC,DC=LOCAL.This may be caused by a deleted WMI Filter defined in the domain that is still in use by Group Policy objects. Group Policy settings for this Group Policy object will not be enforced. Other Group Policy objects may still apply. Windows will attempt to retrieve this information at the next policy cycle. This speciffic problem may be resolved by identifying all GPOs that reference the WMI filter and removing the references. Contact an administrator if this event recurs for several hours.

         An Warning Event occurred.  EventID: 0x800007DC

            Time Generated: 03/18/2015   10:56:07

            Event String:

            While transmitting or receiving data, the server encountered a network error. Occassional errors are expected, but large amounts of these indicate a possible error in your network configuration.  The error status code is contained within the returned data (formatted as Words) and may point you towards the problem.

         An Warning Event occurred.  EventID: 0x800007DC

            Time Generated: 03/18/2015   10:56:07

            Event String:

            While transmitting or receiving data, the server encountered a network error. Occassional errors are expected, but large amounts of these indicate a possible error in your network configuration.  The error status code is contained within the returned data (formatted as Words) and may point you towards the problem.

         An Warning Event occurred.  EventID: 0x800007DC

            Time Generated: 03/18/2015   10:56:07

            Event String:

            While transmitting or receiving data, the server encountered a network error. Occassional errors are expected, but large amounts of these indicate a possible error in your network configuration.  The error status code is contained within the returned data (formatted as Words) and may point you towards the problem.

         An Error Event occurred.  EventID: 0xC0040031

            Time Generated: 03/18/2015   11:01:13

            Event String:

            Configuring the Page file for crash dump failed. Make sure there is a page file on the boot partition and that is large enough to contain all physical memory.

         An Warning Event occurred.  EventID: 0x80050004

            Time Generated: 03/18/2015   11:01:19

            Event String:

            HP NC326i PCIe Dual Port Gigabit Server Adapter #2: The network link is down.  Check to make sure the network cable is properly connected.

         An Error Event occurred.  EventID: 0xC0040031

            Time Generated: 03/18/2015   11:01:29

            Event String:

            Configuring the Page file for crash dump failed. Make sure there is a page file on the boot partition and that is large enough to contain all physical memory.

         An Warning Event occurred.  EventID: 0x800009CF

            Time Generated: 03/18/2015   11:02:19

            Event String:

            The server service was unable to recreate the share ORM because the directory d:\Groups\New Folder no longer exists.  Please run "net share ORM /delete" to delete the share, or recreate the directory d:\Groups\New Folder.

         An Warning Event occurred.  EventID: 0x00000420

            Time Generated: 03/18/2015   11:02:34

            Event String:

            The DHCP service has detected that it is running on a DC and has no credentials configured for use with Dynamic DNS registrations initiated by the DHCP service.   This is not a recommended security configuration.  Credentials for Dynamic DNS registrations may be configured using the command line "netsh dhcp server set dnscredentials" or via the DHCP Administrative tool.

         An Error Event occurred.  EventID: 0x00000001

            Time Generated: 03/18/2015   11:02:34

            Event String:

            An uncorrected hardware error occurred. A record describing the condition is contained in the data section of this event.

         An Warning Event occurred.  EventID: 0x00001696

            Time Generated: 03/18/2015   11:02:38

            Event String:

            Dynamic registration or deregistration of one or more DNS records failed with the following error: 


         An Warning Event occurred.  EventID: 0x00002724

            Time Generated: 03/18/2015   11:02:42

            Event String:

            This computer has at least one dynamically assigned IPv6 address.For reliable DHCPv6 server operation, you should use only static IPv6 addresses.

         An Error Event occurred.  EventID: 0xC0001B70

            Time Generated: 03/18/2015   11:02:59

            Event String:

            The HP Insight Event Notifier service terminated with service-specific error 1 (0x1).

         An Error Event occurred.  EventID: 0xC435050B

            Time Generated: 03/18/2015   11:03:21

            Event String:

            NIC Agent: Connectivity has been lost for the NIC in slot 0, port 2. [SNMP TRAP: 18012 in CPQNIC.MIB]

         An Warning Event occurred.  EventID: 0x84350463

            Time Generated: 03/18/2015   11:03:23

            Event String:

            System Information Agent: Health: Post Errors were detected.  One or more Power-On-Self-Test errors were detected during server startup. Details of the POST error messages can be found in  Integrated Management Log. 


         An Error Event occurred.  EventID: 0xC0001B7A

            Time Generated: 03/18/2015   11:04:20

            Event String:

            The Windows Internal Database (MICROSOFT##SSEE) service terminated unexpectedly.  It has done this 1 time(s).

         An Error Event occurred.  EventID: 0xC00A0032

            Time Generated: 03/18/2015   11:06:11

            Event String:

            The RDP protocol component X.224 detected an error in the protocol stream and has disconnected the client.

         ......................... SERVER failed test SystemLog

      Starting test: VerifyReferences

         ......................... SERVER passed test VerifyReferences



   Running partition tests on : ForestDnsZones

      Starting test: CheckSDRefDom

         ......................... ForestDnsZones passed test CheckSDRefDom

      Starting test: CrossRefValidation

         ......................... ForestDnsZones passed test

         CrossRefValidation


   Running partition tests on : DomainDnsZones

      Starting test: CheckSDRefDom

         ......................... DomainDnsZones passed test CheckSDRefDom

      Starting test: CrossRefValidation

         ......................... DomainDnsZones passed test

         CrossRefValidation


   Running partition tests on : Schema

      Starting test: CheckSDRefDom

         ......................... Schema passed test CheckSDRefDom

      Starting test: CrossRefValidation

         ......................... Schema passed test CrossRefValidation


   Running partition tests on : Configuration

      Starting test: CheckSDRefDom

         ......................... Configuration passed test CheckSDRefDom

      Starting test: CrossRefValidation

         ......................... Configuration passed test CrossRefValidation


   Running partition tests on : sac

      Starting test: CheckSDRefDom

         ......................... sac passed test CheckSDRefDom

      Starting test: CrossRefValidation

         ......................... sac passed test CrossRefValidation


   Running enterprise tests on : sac.local

      Starting test: LocatorCheck

         ......................... sac.local passed test LocatorCheck

      Starting test: Intersite

         ......................... sac.local passed test Intersite
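
A long dcdiag run like the one above is easier to read once it is reduced to the failed tests and the distinct event IDs. The following is a minimal Python sketch, not part of the original post. It assumes the output has been saved as text, and the function names are hypothetical. dcdiag prints event IDs as 32-bit hex values whose upper bits carry severity and facility; the low 16 bits are the ID that Event Viewer shows (for example, 0xC0001B70 is event 7024).

```python
import re

# Matches dcdiag summary lines such as
# "......................... SERVER failed test Replications"
FAILED_TEST = re.compile(r"\.{5,}\s+(\S+)\s+failed test\s+(\S+)")
# Matches the hex code in lines such as "EventID: 0xC00A0032"
EVENT_ID = re.compile(r"EventID:\s+0x([0-9A-Fa-f]{8})")


def failed_tests(dcdiag_text):
    """Return (target, test name) pairs for every failed test."""
    return FAILED_TEST.findall(dcdiag_text)


def decode_event_id(raw):
    """Return the low 16 bits, which is the ID shown in Event Viewer."""
    return raw & 0xFFFF


def event_ids(dcdiag_text):
    """Return the distinct decoded event IDs in order of first appearance."""
    seen = []
    for match in EVENT_ID.finditer(dcdiag_text):
        code = decode_event_id(int(match.group(1), 16))
        if code not in seen:
            seen.append(code)
    return seen
```

Run against the output above, `failed_tests` would report the Replications and SystemLog tests for SERVER, which narrows attention to the 8524 DNS lookup failure and the system log entries.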

Here is the output from repadmin /showrepl:

C:\Users\Administrator>repadmin /showrepl

Repadmin: running command /showrepl against full DC localhost
Downtown\SERVER
DSA Options: IS_GC
Site Options: (none)
DSA object GUID: 8c15b912-0f0c-4ee7-9cd0-58176ba3d5ae
DSA invocationID: 8c15b912-0f0c-4ee7-9cd0-58176ba3d5ae

==== INBOUND NEIGHBORS ======================================

DC=sac,DC=local
    Northgate\FSSERVER via RPC
        DSA object GUID: ea2273d9-dd9a-446d-9bc5-6e9507dbb114
        Last attempt @ 2015-03-18 08:49:00 failed, result 8524 (0x214c):
            The DSA operation is unable to proceed because of a DNS lookup failure.
        1 consecutive failure(s).
        Last success @ 2015-03-18 05:48:56.

CN=Configuration,DC=sac,DC=local
    Northgate\FSSERVER via RPC
        DSA object GUID: ea2273d9-dd9a-446d-9bc5-6e9507dbb114
        Last attempt @ 2015-03-18 11:02:25 was successful.

CN=Schema,CN=Configuration,DC=sac,DC=local
    Northgate\FSSERVER via RPC
        DSA object GUID: ea2273d9-dd9a-446d-9bc5-6e9507dbb114
        Last attempt @ 2015-03-18 11:02:25 was successful.

DC=DomainDnsZones,DC=sac,DC=local
    Northgate\FSSERVER via RPC
        DSA object GUID: ea2273d9-dd9a-446d-9bc5-6e9507dbb114
        Last attempt @ 2015-03-18 11:02:26 was successful.

DC=ForestDnsZones,DC=sac,DC=local
    Northgate\FSSERVER via RPC
        DSA object GUID: ea2273d9-dd9a-446d-9bc5-6e9507dbb114
        Last attempt @ 2015-03-18 11:02:26 was successful.

Source: Northgate\FSSERVER
******* 1 CONSECUTIVE FAILURES since 2015-03-18 05:48:56
Last error: 8524 (0x214c):
            The DSA operation is unable to proceed because of a DNS lookup failure.
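
Error 8524 means the destination DC could not resolve its partner's GUID-based name, so one quick check while the problem is occurring is to look that name up directly from SERVER. The sketch below is an illustration only. The helper names are hypothetical, the GUID and domain are the ones shown in the output above, and `socket.gethostbyname` uses the operating system resolver, so the result reflects the DNS client settings of whichever machine runs it.

```python
import socket


def guid_dns_name(dsa_guid, forest_root):
    """Build the GUID-based name that replication partners look up."""
    return f"{dsa_guid}._msdcs.{forest_root}"


def resolves(name, resolver=socket.gethostbyname):
    """Return the resolved IPv4 address, or None if the lookup fails."""
    try:
        return resolver(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


if __name__ == "__main__":
    # GUID of FSSERVER and forest root taken from the repadmin output above.
    name = guid_dns_name("ea2273d9-dd9a-446d-9bc5-6e9507dbb114", "sac.local")
    print(name, "->", resolves(name))
```

If the lookup returns nothing during an outage but succeeds after a reboot, that points at name resolution on SERVER rather than at the record being missing from the zone.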

Per the logs, replication seems to be the issue, but I'm not sure why. At this point the server has been restarted; here are the stats from repadmin:

C:\Users\Administrator>REPADMIN /REPLSUM
Replication Summary Start Time: 2015-03-18 11:41:37

Beginning data collection for replication summary, this may take awhile:
  .....


Source DSA          largest delta    fails/total %%   error
 FSSERVER              05h:52m:41s    1 /   5   20  (8524) The DSA operation is unable to proceed because of a DNS lookup failure.
 SERVER                    03m:21s    0 /   5    0


Destination DSA     largest delta    fails/total %%   error
 FSSERVER                  03m:21s    0 /   5    0
 SERVER                05h:52m:41s    1 /   5   20  (8524) The DSA operation is unable to proceed because of a DNS lookup failure.

UPDATE NIC settings

Site1

Ethernet adapter Local Area Connection:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : HP NC326i PCIe Dual Port Gigabit Server Adapter
   Physical Address. . . . . . . . . : 00-24-81-FF-D0-9A
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::34da:c891:d8b0:443b%10(Preferred)
   IPv4 Address. . . . . . . . . . . : 192.168.23.5(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.23.1
   DNS Servers . . . . . . . . . . . : 192.168.13.6
                                       192.168.23.5
   Primary WINS Server . . . . . . . : 192.168.23.5
   NetBIOS over Tcpip. . . . . . . . : Disabled

Site2 (updated DNS servers to point to remote site as primary and local IP as secondary)

Ethernet adapter Local Area Connection:

   Connection-specific DNS Suffix  . :
   Description . . . . . . . . . . . : HP Ethernet 1Gb 4-port 331i Adapter
   Physical Address. . . . . . . . . : 9C-8E-99-50-10-82
   DHCP Enabled. . . . . . . . . . . : No
   Autoconfiguration Enabled . . . . : Yes
   Link-local IPv6 Address . . . . . : fe80::c599:fef1:ce10:24de%11(Preferred)
   IPv4 Address. . . . . . . . . . . : 192.168.13.6(Preferred)
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.13.1
   DHCPv6 IAID . . . . . . . . . . . : 245141145
   DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-18-88-C5-1D-9C-8E-99-50-10-82

   DNS Servers . . . . . . . . . . . : 192.168.23.5
                                       192.168.13.6
   Primary WINS Server . . . . . . . : 192.168.23.5
   NetBIOS over Tcpip. . . . . . . . : Enabled

UPDATE BPA results Site1

DNS Client not configured - The DNS client is not configured to point only to the internal IP address of the server. For information about how to fix network settings, see "Managing Your Windows Small Business Server 2008 network" at the Microsoft Web site (http://go.microsoft.com/fwlink/?LinkId=115881).

Internal network adapter is not configured to register IP address in DNS - Verify that the internal network adapter is configured to register in DNS. For information about how to fix network settings, see "Managing Your Windows Small Business Server 2008 Network" at the Microsoft Web site (http://go.microsoft.com/fwlink/?LinkId=115881).

Site2

DC BPA Title: All OUs in this domain should be protected from accidental deletion

Severity:
Warning

Date:
3/18/2015 12:25:41 PM

Category:
Configuration

Issue:
Some organizational units (OUs) in this domain are not protected from accidental deletion.

Impact:
If all OUs in your Active Directory domains are not protected from accidental deletion, your Active Directory environment can experience disruptions that might be caused by accidental bulk deletion of objects.

Resolution:
Make sure that all OUs in this domain are protected from accidental deletion.

More information about this best practice and detailed resolution procedures: http://go.microsoft.com/fwlink/?LinkId=142204

DNS BPA

Title:
DNS: The DNS server should have scavenging enabled.

Severity:
Warning

Date:
3/18/2015 12:28:54 PM

Category:
Configuration

Issue:
Scavenging is disabled on the DNS server.

Impact:
The size of the DNS database can become excessive if scavenging is not enabled.

Resolution:
Enable scavenging on the DNS Server.

More information about this best practice and detailed resolution procedures: http://go.microsoft.com/fwlink/?LinkId=188775

UPDATE QUESTION

Is it normal for the delta to keep increasing? Could this be an indicator of my issue?

Source DSA          largest delta    fails/total %%   error
 FSSERVER                  51m:34s    0 /   5    0
 SERVER                       :12s    0 /   5    0


Destination DSA     largest delta    fails/total %%   error
 FSSERVER                     :12s    0 /   5    0
 SERVER                    51m:34s    0 /   5    0
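
As far as I understand it, the "largest delta" column is the time since the last successful replication from that partner, so it is expected to grow between replication cycles and drop back after the next success. A delta only becomes a concern once it exceeds the replication interval configured on the site link. The Python sketch below converts the delta strings shown above into seconds and compares them against an interval. The 180-minute default is an assumption (the usual intersite default), not a value taken from this environment, and the function names are hypothetical.

```python
import re

# Seconds per unit letter used in repadmin delta strings.
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def delta_seconds(delta):
    """Convert a delta such as '05h:52m:41s', '51m:34s' or ':12s' to seconds."""
    total = 0
    for number, unit in re.findall(r"(\d+)([hms])", delta):
        total += int(number) * UNIT_SECONDS[unit]
    return total


def overdue(delta, interval_minutes=180):
    """Return True if the delta exceeds the assumed replication interval."""
    return delta_seconds(delta) > interval_minutes * 60
```

Under that assumption, the `51m:34s` value above is within a normal cycle, while the earlier `05h:52m:41s` value, which came with the 8524 failure, is not.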

UPDATE FSMO roles

C:\Users\Administrator>netdom query /domain:sac.local fsmo
Schema master               SERVER.sac.local
Domain naming master        SERVER.sac.local
PDC                         FSSERVER.sac.local
RID pool manager            FSSERVER.sac.local
Infrastructure master       FSSERVER.sac.local
The command completed successfully.

UPDATE 03/20/15

The issue is back:

C:\Users\Administrator>repadmin /showrepl server
Repadmin can't connect to a "home server", because of the following error.  Try specifying a different
home server with /homeserver:[dns name]
Error: An LDAP lookup operation failed with the following error:

    LDAP Error 90(0x5a): (null)
    Server Win32 Error 0(0x0): (null)
    Extended Information: (null)

C:\Users\Administrator>dcdiag /test:replications

Directory Server Diagnosis

Performing initial setup:
   Trying to find home server...
   Home Server = SERVER
   [SERVER] LDAP connection failed with error 0,
   The operation completed successfully..
   [SERVER] Unrecoverable LDAP Error 89:

UPDATE WORK AROUND

During this time, I restarted the Netlogon service, and the restart failed to bring the Microsoft Exchange Information Store and Transport services back to a started state. After I started these services manually, replication went back to a working state. WTF?! No unusual event log entries show up that I can correlate this to.

The dcdiag results can be found here: http://pastebin.com/gz0hV4MT

Here are the results for netdiag: http://pastebin.com/njNFhY6q I do see fatal errors regarding DNS, but C:\Windows\System32\config\netlogon.dns does exist and its permissions match those on the other DC.

CORRECTION TO NETDIAG OUTPUT

I was using the 32-bit version of netdiag, which is known to have issues reading the DNS file. Here are the results from the 64-bit version: http://pastebin.com/z2ZjepqR No failures are showing.

windows-server-2008
domain-name-system
active-directory
small-business
file-replication-services
asked on Server Fault Mar 18, 2015 by nGX • edited Mar 20, 2015 by nGX

1 Answer

Each server needs to have both AD DNS servers listed in its DNS client settings: the primary should be a remote AD DNS server's IP, and the secondary should be the local IP (not localhost). Also, make sure in your DNS server properties that the server is binding to all IP addresses. Do this for both AD servers.

More Info: https://abhijitw.wordpress.com/2012/03/03/best-practices-for-dns-client-settings-on-domain-controller/

Edit: I looked at your network config, and it looks good except for the DNS servers. Change the order of the DNS servers per the instructions above and see if that helps.
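
The ordering rule in this answer can be written down as a small check. The Python sketch below is only an illustration: the function name is hypothetical, the addresses in the usage note come from the ipconfig output in the question, and it encodes only the rule stated in this answer (remote DC first, local IP second, no loopback), not any other best-practice guidance.

```python
import ipaddress


def check_dns_order(local_ip, dns_servers):
    """Return a list of problems with a DC's DNS client list, per the rule above."""
    problems = []
    local = ipaddress.ip_address(local_ip)
    servers = [ipaddress.ip_address(s) for s in dns_servers]
    if any(s.is_loopback for s in servers):
        problems.append("loopback address listed")
    if len(servers) < 2:
        problems.append("fewer than two DNS servers listed")
    if servers and servers[0] == local:
        problems.append("local IP is listed first; expected a remote DC")
    if local not in servers:
        problems.append("local IP is not listed")
    return problems
```

For example, `check_dns_order("192.168.23.5", ["192.168.13.6", "192.168.23.5"])` returns an empty list, while listing `192.168.23.5` first for the same server reports that the local IP is listed first.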

answered on Server Fault Mar 18, 2015 by John Homer • edited Mar 18, 2015 by John Homer

User contributions licensed under CC BY-SA 3.0