I am upgrading my system from Windows Server 2012 R2 to 2016. In the process I found that my secondary domain controller does not come online when my PDC shuts down. I ran dcdiag and, after trying to fix the errors, decided it would probably be easier to simply start over with new domain controllers.
Not being very experienced with these matters, I want to ask what I need to look out for, and which steps I need to take to avoid problems that experienced operators would anticipate but I probably would not. I know the basics. I created the PDC and the secondary and connected them for replication. At one point I even had all the errors cleaned up. I just didn't keep up with fixing problems as they occurred.
The system is quite small: a TFS server, a build machine and several client PCs. I could easily write down all the computers, service accounts and users and re-enter them if I had to. But I want to make sure I don't miss a step that could cause problems that would take a lot of work to repair.
I found this link at Microsoft. Although it's older, I wondered if these steps would still be a good guide. This guide speaks to replacing a single domain controller, which is effectively my situation since my secondary DC is no longer functioning correctly anyway.
Added after the initial post, per request in the comments: output from dcdiag /q on the PDC. The BDC also reports a series of errors, and the first of them is significant: the BDC failed the Advertising test and I couldn't figure out how to repair that.
On the PDC, 1st error:
There are warning or error events within the last 24 hours after the SYSVOL has been shared. Failing SYSVOL replication problems may cause Group Policy problems.
......................... VSVR-WBC-DC01 failed test DFSREvent
An error event occurred. EventID: 0xC0000827
Time Generated: 08/02/2017 12:13:42
Event String: Active Directory Domain Services could not resolve the following DNS host name of the source domain controller to an IP address. This error prevents additions, deletions and changes in Active Directory Domain Services from replicating between one or more domain controllers in the forest. Security groups, group policy, users and computers and their passwords will be inconsistent between domain controllers until this error is resolved, potentially affecting logon authentication and access to network resources.
A warning event occurred. EventID: 0x8000051C
Time Generated: 08/02/2017 12:18:06
Event String: The Knowledge Consistency Checker (KCC) has detected that successive attempts to replicate with the following directory service has consistently failed.
......................... VSVR-WBC-DC01 failed test KccEvent
3rd error, and evidence that I failed to check in on the DC for a long time. I won't do that again!
[Replications Check,VSVR-WBC-DC01] A recent replication attempt failed:
From VSVR-WBC-DC02 to VSVR-WBC-DC01
Naming Context: DC=ForestDnsZones,DC=wbc,DC=local
The replication generated an error (8524):
The DSA operation is unable to proceed because of a DNS lookup failure.
The failure occurred at 2017-08-02 12:14:07.
The last success occurred at 2015-10-09 17:57:42.
11059 failures have occurred since the last success.
The guid-based DNS name 8e9f6660-68b8-4273-b79c-6be31f66cd9d._msdcs.wbc.local
is not registered on one or more DNS servers.
......................... VSVR-WBC-DC01 failed test Replications
The (8524) error above is repeated 4 more times but only shown once here for practicality.
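Error 8524 together with the "not registered" message points at name resolution, so one quick check is whether the GUID-based name resolves at all from the machine in question. The following is only a minimal sketch: the helper name `resolves` and its injectable `resolver` parameter are illustrative and not part of any Windows tooling, and it uses whatever DNS servers the local machine is configured with.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return the sorted IP addresses for hostname, or [] if the lookup fails.

    `resolver` is injectable (an assumption made for this sketch) so the
    function can be exercised without touching real DNS.
    """
    try:
        infos = resolver(hostname, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    return sorted({info[4][0] for info in infos})
```

Calling `resolves("8e9f6660-68b8-4273-b79c-6be31f66cd9d._msdcs.wbc.local")` on the PDC would be expected to return the address of VSVR-WBC-DC02 once the record is registered; an empty list is consistent with the dcdiag message above.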
The DS has corrupt data: rIDPreviousAllocationPool value is not valid
......................... VSVR-WBC-DC01 failed test RidManager
An error event occurred. EventID: 0x00000423
Time Generated: 08/02/2017 12:13:21
Event String: The DHCP service failed to see a directory server for authorization.
An error event occurred. EventID: 0x0000410B
Time Generated: 08/02/2017 12:13:35
Event String: The request for a new account-identifier pool failed. The operation will be retried until the request succeeds. The error is
......................... VSVR-WBC-DC01 failed test SystemLog
The EventID: 0x00000423 error above is repeated once but only shown once here for practicality.
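When searching for these events, it may help to know that dcdiag prints event IDs as 32-bit hexadecimal values, while Event Viewer and most documentation use the decimal ID held in the low 16 bits (the upper bits carry severity and similar flags). A small sketch of the conversion, using the four IDs from the output above; the function name `event_id` is just for illustration:

```python
def event_id(raw_hex):
    """Return the decimal event ID: the low 16 bits of the hex value dcdiag prints."""
    return int(raw_hex, 16) & 0xFFFF


for raw in ("0xC0000827", "0x8000051C", "0x00000423", "0x0000410B"):
    print(raw, "->", event_id(raw))
# 0xC0000827 -> 2087
# 0x8000051C -> 1308
# 0x00000423 -> 1059
# 0x0000410B -> 16651
```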
I appreciate you wanting to look at the dcdiag report. The error list was so long, and I was having so much trouble clearing it, that I decided to simply start over. Hence my question about what I have to watch out for, or do, to make recreating the DCs as painless as possible.
If replication were good and dcdiag showed no errors, I would suggest checking your DHCP configuration, because the PDC/BDC concept no longer exists and both DCs are supposed to be working at the same time.
A common error is to give the domain computers only one DNS server. Make sure the second DC is listed too and, as a best practice, nothing else: please don't configure your DHCP to hand out external DNS servers. That will prevent strange bugs as well.
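As a rough illustration of that rule, the check can be written down as two small comparisons between the DNS servers a client is handed and the addresses of the DCs. This is only a sketch: the function names and every IP address below are hypothetical, and the lists would have to be filled in by hand from the DHCP scope options.

```python
import ipaddress


def non_dc_dns_servers(offered_dns, dc_addresses):
    """Return the offered DNS servers that are not domain controllers."""
    dcs = {ipaddress.ip_address(a) for a in dc_addresses}
    return [s for s in offered_dns if ipaddress.ip_address(s) not in dcs]


def missing_dcs(offered_dns, dc_addresses):
    """Return the domain controllers that clients are not given as DNS servers."""
    offered = {ipaddress.ip_address(a) for a in offered_dns}
    return [d for d in dc_addresses if ipaddress.ip_address(d) not in offered]


# Hypothetical addresses, for illustration only.
dcs = ["192.168.1.10", "192.168.1.11"]
offered = ["192.168.1.10", "8.8.8.8"]

print(non_dc_dns_servers(offered, dcs))  # ['8.8.8.8']
print(missing_dcs(offered, dcs))         # ['192.168.1.11']
```

Both lists being empty corresponds to the setup described above: every DC is offered, and nothing else is.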
tl;dr - as yagmoth555 has pointed out, you need to use the DCs for DNS.
Then, make sure replication works and you've got the whole structure on another DC.
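If it does, move the RID master, PDC emulator and infrastructure master roles over to another/new DC, along with the forest-wide schema master and domain naming master roles if the old DC holds them.
Once they have been moved successfully, remove the DC role from the old DC.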
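Before that last step, it can be reassuring to confirm that a fresh dcdiag run on the remaining DC no longer reports failed tests. A minimal sketch that pulls the failed tests out of saved dcdiag text, assuming the "failed test" line format shown in the question; the function name and the sample text are illustrative only:

```python
import re

# Matches lines such as:
# ......................... VSVR-WBC-DC01 failed test KccEvent
FAILED = re.compile(r"\.{5,}\s+(\S+)\s+failed test\s+(\w+)")


def failed_tests(dcdiag_text):
    """Return (server, test) pairs for every 'failed test' result, in order."""
    return FAILED.findall(dcdiag_text)


sample = """\
......................... VSVR-WBC-DC01 failed test DFSREvent
......................... VSVR-WBC-DC01 failed test KccEvent
......................... VSVR-WBC-DC01 passed test NetLogons
"""

print(failed_tests(sample))
# [('VSVR-WBC-DC01', 'DFSREvent'), ('VSVR-WBC-DC01', 'KccEvent')]
```

An empty list is a reasonable sign that it is safe to continue, though it does not replace reading the full report.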