TechNet Radio: Patch Management at Microsoft
Published: June 23, 2004
On This Page
Introduction
Overview
Desktop Patching
Server Patching
Best Practices
Questions and Answers
Closing Comments
Introduction
Thomsen:
Welcome to the second installment of the popular Microsoft TechNet Radio broadcast. Today’s broadcast, Patch Management at Microsoft, will cover a topic of high interest to IT professionals around the world. After listening to the broadcast, please remember to go to http://www.microsoft.com/technet/community/tnradio/default.mspx. From this site, you can listen to other TechNet Radio broadcasts, such as the very popular IT Security at Microsoft broadcast. Now, let’s dive into patch management at Microsoft.
Overview
Thomsen:
This is Paul Thomsen. I’m an SMS technologist here in the Microsoft IT Department. I’m here with Brian.
Keogh:
Hi. I’m Brian Keogh. I’m Group Manager for Server Patch Management, with responsibility for running the SMS service in both the server and desktop environments.
Thomsen:
I was thinking that maybe we should start by having you, Brian, tell us how many client computers the Microsoft IT Patch Management people serve.
Keogh:
The best way to speak to this is to break it out into the server and client environments. In the server space, we have about 7,000 servers, and an additional 1,000 servers in lab environments. In the client space, we have about 150,000 desktop systems. One thing to note is that we classify a desktop system as a server or other system running a Windows OS. It could be on a varying range of hardware, from a server to a desktop to a laptop to a Tablet PC.
Thomsen:
We currently support about 50,000 employees. Just to give you an idea of the scope of the processes and technology that we run at Microsoft, roughly 6,000,000 e-mails are sent internally each day.
Keogh:
I might add that on the desktop side we have 160 sites around the world and 450 buildings. In addition to the 50,000 employees, we have about 20,000 contractors and so forth. Before long, we’re expecting to add devices to our mix of clients. That’ll be another 10,000 to 20,000 devices.
Thomsen:
When you talk about devices, can you clarify what you mean there?
Keogh:
That’d be like Smartphones and Pocket PCs and so forth.
Thomsen:
Okay. Why is that important?
Keogh:
It means extra workload, obviously, from my point of view. It is a concern from a security point of view in that they are a window into Microsoft just like any other company. There is the potential for sensitive data to be on there and so forth. It’s not clear at this point how much patch management we’ll be doing on them, but certainly security management.
Thomsen:
Thank you. I was thinking we should next talk about technologies that we use here to do patch management within Microsoft.
Keogh:
Sure. We, obviously, use SMS to a very large degree. I would say in the server environment, we certainly patch 95 percent of the 8,000 servers with SMS. Some of the systems are not able to be patched by SMS currently. For example, IA64 is a platform that doesn’t yet have an SMS agent. We’ve also got some servers that are on the outside of the network as such. These would be VPN servers, servers that are not contactable from the SMS infrastructure. For those, we would use manual patch remediation, or just a script, a general script to blast a patch out to them. In the client space, I believe SMS is one of the biggest deployment methods for deploying patches. Am I right in saying that, Paul?
Thomsen:
Certainly it’s the primary one from the Microsoft IT point of view, both in the sense of sending the patches out to the end users, but also in the sense of verifying that the patches are applied to all computers. One clarification I should make is that here at Microsoft we have a cooperative management model with our end-user community, which probably is rather unique, but then again, like any other customer, we do have special issues that we deal with. One of ours is that our users are naturally very technical. They’re programmers and testers and program managers and things like that. We allow them to have a high degree of management of the desktop computers. We institute corporate standards on top of that. As a result, that means that our end users are quite capable of patching their computers themselves. For instance, we leave Windows Update enabled for their convenience. That’s particularly helpful for those computers that happen to be out of the reach of the IT Department’s scope. Therefore, we can’t say that SMS patches 100 percent of desktop computers, or even more than 50 percent. It’s somewhere in that range. The users will patch a certain fraction of them. SMS will patch the remaining ones. If some slip through, then we have other tools. The Security Department has tools to find those computers and either patch them, or block them off the network essentially. We use a variety of solutions like that depending upon the threats and the circumstances. SMS is the primary one from the Microsoft IT point of view. It also allows us to verify compliance so that we can see how well everybody has done in terms of patching their computers.
Keogh:
It’s certainly worth noting that the centralized administration reporting makes the whole task a whole lot simpler. The security patch management feature, with the MBSA integration, certainly seems to make life a lot easier in making sure that your environment is secure. Whenever you move outside of that scope, that becomes a lot more complex.
Thomsen:
Certainly for an enterprise like ourselves, a large organization, we have to have a high degree of accountability and so forth. It’s expected by our management. That’s why we need reporting systems and systems that are quite real-time, like SMS is. It takes a little while for the data to flow up the hierarchy, but it’s happening all the time. We have about as good a view on our patch management situation as could be hoped for with any system; whereas, small organizations may not be able to afford that level of accountability, or simply may not be as big a threat. If you’ve only got 10 or 100 computers, then your patch management issues just are not as serious as they would be for an organization like ourselves.
Keogh:
What mechanisms would you normally recommend for a small or medium size shop to do patch deployments with?
Thomsen:
For the small ones, Windows Update, much like any consumer at home would be quite adequate. That’s on the scale of 10 or 50 clients, that sort of thing. For the larger ones, then SUS is the appropriate option. That might be in the, say, 100 to 500, maybe even 1,000 range, which provides a fairly automated solution, and fairly efficient on the network and so forth, but not with the full control and reporting that SMS is going to give you, as well as, of course, the other computer management facilities that SMS provides.
Keogh:
One of the questions that I’m asked often is with the special cases of the Microsoft, what secrets do we have? What insider knowledge do we work with? Do you want to speak to that a little bit?
Thomsen:
We’re very much like any other customer. We find out about the patches at pretty much the same time everybody else does. I can choose the day of the month and everything. Our Security Department has to go through the same evaluation and so forth. The one thing is that we do hear rumors occasionally that something’s coming out and so forth. Those are pretty much the same kind of rumors you’ll hear outside the company too. Sometimes the PSS guys will have an inside knowledge or various newsgroups or so forth. We don’t have any inside knowledge that allows us to start the preparations any earlier than any other customer. We have to have as responsive a system as everybody else does.
Keogh:
I’m sure you, as well as me and my team, keep our fingers crossed on the second Tuesday of every month hoping that no release will happen.
Thomsen:
Yes, definitely. We’ve got more than enough work already, so we don’t need the extra work of sending out the latest patches. We certainly recognize the reality. It’s one of those realities that we live with in the computer industry. If we happen to have a month with no patches, that’s certainly cool. Also, Brian, one thing that I often get when I’m talking with customers, which I have the pleasure of doing fairly frequently, is that they’ll point out that Microsoft IT is a special case, not a typical enterprise environment, in particular, in the sense that all our users are pretty technical. They’re running great new Windows operating systems. We don’t have any Windows 95, for example. We got hot new laptops and other fancy computers. We don’t have old first-generation Pentiums or anything that we have to deal with. Do you think that’s a fair assessment that we’re not a typical enterprise?
Keogh:
I think there are certainly similarities and differences. If you look at some of the similarities, security is mission critical. It’s our top priority. This is seen from the very high levels all the way down to the guys who work in the frontline trenches. I often speak to too much to do, too little time, and mostly reactive work. This covers every enterprise. I’m sure there are very few enterprise IT groups who have loads of time and loads of money and can sit back on a day-to-day basis. We do have a mix of operating systems and configurations.
We have some challenges that perhaps our customers may not experience. We are beginning to see Longhorn in our environments, which, while it’s a great thing, creates some complexities around patching. Also in the server space, we’re evaluating Service Pack 1 for Windows 2003. You mentioned users. Not all users are cooperative. When we talk about locking down desktops and such, there would be a lot of pushback from the user base. Of course, balancing security, cost, and efficiency is the bottom line. I think these are similarities that most other companies deal with. One of the differences to speak to is being Microsoft’s first and best customer. This is something we work very diligently on, and we are always looking to provide valuable feedback to the product groups. Also, as part of deploying Service Pack 1 for Windows 2003 or Service Pack 1 for SMS 2003, we will deploy these service packs multiple times. By the time SMS 2003 SP1 goes out the door to the public for full release, we will probably have deployed it two, maybe three times. I mentioned that not all users are cooperative.
We’re also fortunate in that the majority of users are technical and local administrators. This can help with troubleshooting issues, because we are speaking to people who know how to get around a Windows desktop. Microsoft has a big target painted on it. Obviously, we’re a prime target for security attacks. We need to be extremely diligent in every step we take to make Microsoft secure. We’re also very fortunate in that we’ve got state-of-the-art networks, and we are running the latest operating systems. As you mentioned, we’re not running Windows 95. Typically speaking, I think 99 percent of the operating systems we have out there are going to be Windows 2000, Windows XP, Windows 2003, and Longhorn, at a range of service pack versions.
Thomsen:
In a lot of other ways, we’re like other customers. We are different. But then, again, talking to anybody out there or any two IT organizations out there in the world, they’re going to have similarities between each other and differences as well.
Keogh:
I think the point here is that we don’t have a magic bullet. We are just another IT organization in a very large enterprise. It happens to be Microsoft enterprise. We don’t have any magic bullets to solve all the security issues and all the challenges that we face on a day-to-day basis.
Desktop Patching
Thomsen:
Paul, we’ll go over the desktop patching aspect of keeping Microsoft networks and systems secure, and some of the specifics of patch management at Microsoft in the desktop arena. To start with, what is the environment, and what SMS services are provided?
Keogh:
As mentioned earlier, we’ve got 128,000-odd devices at the moment. We’re at about 450 buildings, 160 sites, that sort of thing, so it’s a fairly substantial organization. In terms of services, we certainly do the patch management as we’re discussing today, but also software distribution. We do about 70 to 100 distributions per year. For inventory collection, both hardware and software, we collect quite a few details, and we have about 40 hardware inventory extensions in place to further enrich the data that SMS provides naturally to us. That allows us to do all kinds of reporting for various people that are interested in what the Microsoft community looks like from a desktop point of view in terms of what our hardware is, who’s got certain kinds of software, all those kinds of things. Of course, that kind of data can also be useful for software distribution targeting in that we can send Office updates to people who have Office and things of that sort. Those are our key things. We don’t use Remote Tools at this time at all. It just hasn’t been a big priority for us just yet, although we may in the future. We do have Software Metering enabled, but we don’t actually use it too much. At this point, we haven’t had too many user requests for the data that it provides. But we’ve got it enabled. It works nicely with SMS 2003 and throughout our whole hierarchy, which we’re very pleased with.
Thomsen:
Is one of the reasons we don’t use Software Metering that we don’t tend to worry about how many licenses we have of Office and Publisher and so on?
Keogh:
That’s certainly part of it: we don’t need that level of detail. Also, the people who request things from us haven’t gotten used to this idea of concurrent usage, that is, how many people actually use various applications as opposed to how many people have them installed. They’re rather used to phrasing their questions in terms of, Okay, who’s got this software installed? They don’t need the metering data per se. Obviously, once they get used to the availability of that data and so forth, they may become more interested.
Thomsen:
Looking at some of the challenges that you face on a day-to-day basis, you have so many clients installed. What kind of challenges do you experience in managing so many clients? How many clients did you say you had installed?
Keogh:
About 128,000 right now and growing all the time. We’re expecting pretty dramatic growth. We’re introducing some strategies to increase our client coverage and so forth. For instance, historically, we haven’t worried about the lab computers, which are the test machines that are used for testing new versions of Office and Windows and so forth in the labs, largely because those machines do require special configurations. We haven’t wanted to interfere with those machines, but security is, of course, a big concern these days. Even those machines do have to be patched. Rather than having all the labs have a hodgepodge of their own patching strategies, or just not patching them, we’ve started working with them so that we can take on management of those computers as well.
Between those kinds of efforts, we’re going to grow to quite probably 160,000, maybe 200,000, or even 250,000. Some significant growth ahead of us in the next number of months. That’s certainly one of the big challenges we face is that we obviously don’t want to do that in a haphazard manner. We want to have a client deployment strategy, so that we’re comfortable that we’re getting all the computers that we should. That’s important for SMS management generally, or computer management generally, is you want to maximize your coverage. It’s particularly critical for patch management, because it doesn’t take a lot of vulnerable computers to allow a denial of service attack, for instance, to take over your network and so forth. It’s something that to maximize patch management success, you have to have a client deployment strategy in place. That will vary from organization to organization depending upon various environmental issues and cultural issues and so forth as to how are you going to find the computers that are out there? How can you get the privilege on the computers in order to install the client? Are there any exceptions? Are there scenarios where you don’t want to install the client for some reason?
In our case, we like to use the logon script. We use that both to install the client in the first place, and then also to check on its health, which I’ll talk about in just a moment. That way we can be confident that any computer that joins the corporate domains is one that we’re aware of. That will include pretty much all the computers that are using services on our corporate network. Obviously, there will be some that are in work groups, or in private domains, or whatever, that don’t run our corporate logon script. Those computers at the moment are outside of our management zone. They would also be the first ones to be shut down in terms of network ports if a security issue occurred. That keeps their risk to a minimum from our point of view.
Thomsen:
You mentioned shutting off network ports? What’s the situation with that?
Keogh:
Right. That’s something that our Security Department does, and that we alluded to earlier when talking about the patch management technologies here. The idea is that the Security Department is the ultimate guarantor of security within Microsoft. When a patch gets to the point where it must be enforced, where it must be present on all computers, they’ll do their own scanning to find computers that are vulnerable. They’ll try to fix those computers where they can. If they can’t, they’ll just shut down that network port. That’s done with an internally developed application that is very particular to Microsoft, and right now it’s not available outside, unfortunately. There are thoughts about how we can productize it and so forth. Part of the difficulty may be that it has to be somewhat hardware-dependent. It depends upon which routers or switches you’re using and things like that. It’s a tricky thing to share, but it is a useful part of our patch management strategy. It’s obviously a big-hammer approach. Shutting down ports is very expensive and inconvenient to everybody. It’s that final wall, and you don’t want to get that far if you can avoid it. That’s why you should use SMS and other solutions first. But it’s nice to have that system that we can rely on.
Thomsen:
We’re fortunate in the server space in that SMS is deployed to such a large degree. It is in a very managed state that we have agreement with security that they will not use that port shutdown tool unless absolutely necessary. Even then there’ll be a lot of discussion around it first. We do manage to secure our environment in a pretty quick time. As you say, our area is a much more known and tangible environment rather than desktop space where we have so many managed and unmanaged systems.
Keogh:
Certainly.
Thomsen:
You mentioned some other new strategies that you’re beginning to leverage to get more clients deployed. Could you give us a quick rundown on what those are?
Keogh:
Various things. I should also mention that in the client deployment space, we also have a boundary management strategy. With SMS, you have to associate clients with sites. For example, for people in Hong Kong, their computers are managed by a Hong Kong SMS server. People here in Redmond are managed by Redmond servers, and so forth. That’s done using IP subnets. The trick there is to make sure you’re aware of all the IP subnets. In a small organization, that’s probably not too challenging. For a large one like ours, it is a big job. Therefore, we have to have a strategy around those kinds of issues. In our case, the new strategies are things like improving the logon script so that it’s more intelligent in terms of determining client health issues, which I’ll talk about in a moment. But also, for instance, rather than being a logon script, we’ll make it a computer start-up script so that it runs when the computer boots, as long as it’s a member of the domain. That will allow us to not worry about whether or not the user has administrator privileges, because now it’ll run within the local system context. That’ll give us that additional percentage of machines where the user is not an administrator.
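The association of clients with sites by IP subnet can be illustrated with a minimal Python sketch. This is not how SMS itself is implemented; the subnets and site codes below are invented for illustration and are not Microsoft's actual boundaries.

```python
import ipaddress

# Hypothetical boundaries: these subnets and site codes are made up.
SITE_BOUNDARIES = {
    "RED": ["10.1.0.0/16", "10.2.0.0/16"],   # stands in for Redmond
    "HKG": ["10.50.0.0/20"],                 # stands in for Hong Kong
}


def build_boundary_table(boundaries):
    """Flatten the site-to-subnets mapping into (network, site) pairs."""
    table = []
    for site, subnets in boundaries.items():
        for subnet in subnets:
            table.append((ipaddress.ip_network(subnet), site))
    # Check the most specific (longest-prefix) subnets first.
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def site_for_client(ip, table):
    """Return the site whose subnet contains the address, or None."""
    address = ipaddress.ip_address(ip)
    for network, site in table:
        if address in network:
            return site
    return None  # unknown subnet: the client is outside every boundary


BOUNDARY_TABLE = build_boundary_table(SITE_BOUNDARIES)
```

A client that returns None here corresponds to the case described above, where a subnet nobody knew about leaves computers unassigned to any site.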
The client health strategy is the other big thing that I want to emphasize. Computer management software is unique in that it’s pretty much a behind-the-scenes kind of software just running on the computer. The user doesn’t necessarily see the big benefit from it other than occasionally getting software that they might be interested in. Certainly, the users aren’t going to be checking to make sure that it’s functional and so forth, as they might do with an Outlook client, for example. If their Outlook client is not working, they’re going to phone the Help Desk and get it resolved one way or another. If the SMS client is broken, they probably won’t notice in the first place. Even if they do, they’re not going to care. We have to have our own proactive kind of strategy and fix those clients on a large scale. For us, that means that when the logon script runs, it will check various factors for the client. For instance, have the SMS client logs been updated lately? Is the SMS client service started and running? Things of that sort. If things look suspicious, then the logon script can try to fix them. If it can’t fix them, then it will de-install the client and re-install it again. One way or another, we’ll get that computer back on the network and functional again as an SMS client. That means, of course, that patch management can be successful on that computer. It’s quite important to do that even though it’s a small fraction of computers that fail in terms of SMS clients. It’s not a common issue. But when you’ve got 128,000 clients, it doesn’t take a high percentage for that to become an issue that we want to worry about. It’s something that everyone should have a strategy for. I’ve heard of other customers who have a strategy where they run a tool on their server side that goes looking for unhealthy clients, and then fixes them from a central location. There are alternate strategies that could be used.
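The check, repair, and re-install sequence described above can be sketched roughly as follows. This is a Python illustration of the decision logic only, not the actual logon script; the ClientState fields, the seven-day log threshold, and the try_repair callback are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

# Assumption: logs older than this count as "not updated lately".
MAX_LOG_AGE = timedelta(days=7)


@dataclass
class ClientState:
    """Hypothetical snapshot of the two factors mentioned in the broadcast."""
    service_running: bool
    last_log_update: datetime


def is_healthy(state: ClientState, now: datetime) -> bool:
    """Healthy means the service is running and the logs are recent."""
    logs_fresh = now - state.last_log_update <= MAX_LOG_AGE
    return state.service_running and logs_fresh


def check_client(state: ClientState, now: datetime,
                 try_repair: Callable[[ClientState], bool]) -> str:
    """Return the action taken: 'healthy', 'repaired', or 'reinstall'."""
    if is_healthy(state, now):
        return "healthy"
    if try_repair(state):      # first attempt an in-place fix
        return "repaired"
    return "reinstall"         # last resort: de-install and re-install
```

The repair step is passed in as a callback so that the same decision logic could sit in a logon script, a start-up script, or a server-side tool.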
Thomsen:
I think I’m right in saying that with the release of SP1, we will be looking at the availability of an SMS client health tool. Do you think that is going to be useful to you in the client space?
Keogh:
Definitely. It’s already something we’re looking at and finding useful. It’s not the full solution at this point. This particular release of that functionality mainly helps us to quantify the client health situation, which is important, of course. You want to know, for instance, in our case because of test computers and people going to conferences and on vacation and so forth, we know that at any given point in time we’re not going to have 100 percent of all computers on the network ready to receive patches. The question is what percentage is not on the network? For us, it tends to work out to over the course of two weeks, we’ll hear from 85 percent of our computers. In other words, 15 percent won’t come onto the network during that time. That’s normal. We know that from the client health tools. Out of that 15 percent that’s remaining, is that because their SMS client is broken? Or is it because they are on vacation and so forth? That’s where the SP1 tools come in is that they’ll do a variety of things, essentially like pinging all the computers and so forth that are offline trying to determine what fraction are online versus broken. It can then give us a good number so that we can break down that remaining 15 percent and have a good feel for what our true client health situation is. Obviously, if we see that it gets worse for some reason, if all of a sudden a lot of clients are online but broken, then we know that we have to start doing some kind of investigation, and fix whatever the problem is.
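The arithmetic behind breaking down the silent 15 percent can be sketched in a few lines of Python. This is an illustration of the calculation only, not the SP1 tool; the function name and the ping count used in the usage note are assumptions.

```python
def health_breakdown(total_clients, reported_recently, silent_but_pingable):
    """Split the client population into reporting, online-but-broken, offline.

    total_clients: all known clients
    reported_recently: clients heard from in the reporting period
    silent_but_pingable: silent clients that still answer on the network,
        which suggests the machine is online but its client is broken
    Returns percentages of total_clients, rounded to one decimal place.
    """
    if total_clients <= 0:
        raise ValueError("total_clients must be positive")
    silent = total_clients - reported_recently
    if not 0 <= silent_but_pingable <= silent:
        raise ValueError("pingable count must fit within the silent clients")
    offline = silent - silent_but_pingable

    def pct(count):
        return round(100 * count / total_clients, 1)

    return {
        "reporting": pct(reported_recently),
        "online_but_broken": pct(silent_but_pingable),
        "offline": pct(offline),
    }
```

For example, with 128,000 clients of which 108,800 reported in two weeks, and a purely hypothetical 1,920 silent machines answering a ping, the result is 85.0 percent reporting, 1.5 percent online but broken, and 13.5 percent offline.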
Thomsen:
You mentioned earlier on, work group desktops, or systems that are in an untrusted domain. I believe that SP1 has some functionality to be able to manage that or to manage untrusted systems. Is that something you see yourself leveraging in the future?
Keogh:
To some degree, yes. SP1 supports that scenario. It may well have worked with SMS 2003 RTM version anyway as an SMS client on a work group computer. But the product team hadn’t tested it and worked out any issues that were outstanding. With SP1, they’ve done all that. They’re happy to support that particular scenario. That does apply to us in some degree in the sense that some computers will join the domain, get the SMS client installed, and then be removed from the domain for some reason and stay in a work group. Those particular clients will continue to be perfectly functional with SMS, which is great. But the computers that have always been in a work group, those are the ones that are outside of our control. At the moment, we’re not going after them. We could in the sense of using SMS Network Discovery, for example. We could go out and find them and try to install the client there if we can get privileges on those computers. For some customers, that may be a perfectly valid strategy and so forth. For us, that’s another phase of the overall client coverage strategy. That may not happen for another six months or so for us.
Thomsen:
What kind of processes are used to manage patches in the desktop environment?
Keogh:
As we mentioned earlier, we hear about patches very much like everybody else on the second Tuesday of each month and so forth. It’s up to our Security Department to determine the actual urgency and so forth. The IT Department is the deliverer of the message, but we won’t make the message in the first place so to speak. They’ll evaluate whether or not the patch is applicable to us, whether we have clients that are vulnerable to it. They’ll also determine whether or not it’s worth deploying, whether there’s a significant enough risk. Partially that’ll be determined based on whether or not there’s actually exploits out there already or viruses or other kinds of software that’s going to try and use those holes to get privileges where they shouldn’t. That’s where the security group will make those kinds of determinations. If need be, they’ll ask us to go through the whole process in 48 hours, where we’ll prepare to deploy, do the actual deployment, and then at the end of the 48-hour period, start actually enforcing the installation of the software, the patch. Normally, we go for a two-week cycle. That gives the users lots of opportunity to see that a patch is available, and that it’s ready to be installed. The users can choose to install it at a convenient time for them. Then they get lots of nagging thanks to SMS 2003. It’s done in a friendly manner. It’s in little bubbles in the bottom right-hand corner and so forth.
Thomsen:
So they get the message.
Keogh:
Yes, exactly. They have no excuse when the time is finished for not having installed it themselves. If we then go ahead and install the software and reboot, it’s because they haven’t been paying attention or just haven’t been putting time into patch management the way they should. The process is quite easy for them. Just a few button clicks, some mouse clicks, and they’re done. That’s our normal process. With SMS 2003, there’s a little update available so that you can use MBSA 1.2 as your scanning engine. That finds a very high percentage of all the patches that are needed on any given computer. I don’t have specific numbers that I’ve heard, but I’m guessing something like 95 percent or so. There’s 1 in 20 patches maybe, somewhere around that number, that MBSA 1.2 can’t find for whatever reason. They apply to software that can’t be identified by our normal registry entries and things of that sort. What we have to do in that case is create a package of our own, which we happen to do with VBScript, which will check each of the computers as it runs on our SMS clients and determine whether or not that particular client needs the patch. If so, it installs the version that’s appropriate for that operating system, and then, of course, reports back status on how that went. That’s an additional patch management process that we have to do in some cases, but it’s a small fraction. Normally, we just use the built-in patch management features of SMS, and we don’t have to use that software distribution strategy. The final thing is I always like to emphasize that we try our best to use good, reproducible processes, as you would see with the Microsoft Operations Framework and the Microsoft Solutions for Management. We try to learn from each experience and make it reproducible so that the whole thing goes smoothly and efficiently each time, and we’re maximizing our success.
Thomsen:
With respect to MOF and MSM, are there any particular sites or locations for this information that might be useful?
Keogh:
Microsoft.com/management has links to it. I think at the end of our presentation too here, we’re going to talk about some resources that are available for that.
Thomsen:
Great. It’s probably valuable to note that these are frameworks that we’ve actually provided some input on. That input will continue as each solution comes out. What kind of challenges do you expect with respect to users and when to reboot their desktops and so on? You spoke to that a little bit. Can you delve a bit more into that subject?
Keogh:
It’s an important thing for any organization I think to work cooperatively with your users, even if they don’t have administrative rights, and even if they don’t normally install software on their computers. Nonetheless, if you work well with them, you communicate up-front, tell them what your patch management strategy is and why it’s important, then they’re going to be a lot more receptive to following the process when the time comes. That means that when they start getting reminded, and it’s convenient, then they will go ahead and click the buttons and allow the patches to install. If, by chance, they forget that and it reboots, then they’ll be aware of why this occurred, and they’ll understand that it was necessary. Those are key factors to success, and, of course, experience. Once you’ve been through a cycle or two of the patch management process, then most of the users will have seen it enough so that they become comfortable with the process. In our case, I can’t say that we have any significant feedback from the users. We don’t have a lot of complaints or anything like that. Things do go quite smoothly.
Server Patching
Thomsen:
That’s enough about desktop management for now. I don’t know about you, but I’m always intrigued by the server side of patch management, particularly servers within data centers. You must face a lot of risks of upsetting the service owners. For instance, if Web sites or payroll servers or Exchange Servers go down unexpectedly, that could be quite an issue. What kind of challenges do you face in patching servers?
Keogh:
You raise some interesting questions that we do face on a day-to-day basis, or a week-to-week basis. Part of the way that we manage unexpected reboots, or patching servers when they shouldn’t be patched, that kind of thing, is by working with the server owners to define maintenance windows. These are windows when the server owners have said, Hey, yes. You can apply a patch and reboot a server at this time during the week. Each server has a pre-defined maintenance window associated with it. That maintenance window is decided by the server owners. Basically, we’ve broken up our maintenance windows and spread them over four days. That’ll be Thursday, Friday, Saturday, and Sunday, with roughly six maintenance windows per day, giving us a total of 24 maintenance windows. Server owners do have an option to opt out of maintenance windows. That’s something that we do find acceptable, but, obviously, we’d prefer they didn’t.
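The maintenance window layout described above (four days, roughly six windows per day, 24 in total) can be enumerated with a short Python sketch. The broadcast does not say how long each window is or when it starts, so the equal four-hour, back-to-back slots below are an assumption made purely for illustration.

```python
from itertools import product

DAYS = ["Thursday", "Friday", "Saturday", "Sunday"]
WINDOWS_PER_DAY = 6
# Assumption: windows are equal, back-to-back slots covering the whole day.
HOURS_PER_WINDOW = 24 // WINDOWS_PER_DAY


def build_windows():
    """Return every (day, start_hour, end_hour) maintenance window."""
    windows = []
    for day, slot in product(DAYS, range(WINDOWS_PER_DAY)):
        start = slot * HOURS_PER_WINDOW
        windows.append((day, start, start + HOURS_PER_WINDOW))
    return windows


WINDOWS = build_windows()
```

Each server would then be associated with exactly one entry in this list, chosen by its owner.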
If they do decide to opt out, then it’s totally on their shoulders to patch their server manually. We don’t mind how they get the patch on it, whether they TS onto the box, or write a script, or whatever. That’s their problem. But they must be patched by the deadline. You mentioned earlier on the deadline that you must achieve as far as keeping the environment secure. We have the same deadline. A server owner must have their system patched within 24 hours for an emergency patch, and within 14 days for a critical patch. If they haven’t managed to patch their server within this time, then they will be forced-patched.
The challenge here for the server owners is that if they haven’t patched, we will force-patch. But we’ll force-patch whenever we feel like it. Whenever it suits us. Typically, that’s at 10:00 a.m. on a Tuesday morning, which happens to be 14 days after the patch is released, and we will reboot their server. You can imagine a SAP system, or an Exchange Server. I mentioned 6,000,000 e-mails a day. If one of those Exchange Servers goes offline, there’s impact felt. We put it back on the shoulders of the server owners. We say simply to them, Hey, you decided not to have this automated remediation. You also decided not to security patch your system. We are simply following up after they drop the ball, and we’re patching their system. I have received escalations about patches being applied after the deadline. There have been serious escalations.
Typically, what happens is it goes up their management chain, comes down my management chain, we explain to them what we did, it goes back up and the guy who didn’t patch his server is held accountable for the reboot because he didn’t keep his system secure. It’s a thing that we’ve worked out with our management. We’ve set very clear expectations with the server owners. There’s no exception that we will accept for a server owner going, Well, I didn’t know. There’s a wide scale communication. Not that we don’t choose the same multi-pronged approach as you do in the desktop space, there’s no excuse for not securing your server, considering the impact it could potentially have to Microsoft.
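As a rough illustration of the maintenance window scheme described above, the following Python sketch enumerates the 24 windows and checks whether a given time falls inside a server's window. The day names and the count of six windows per day come from the discussion; the equal 4-hour slots starting at midnight, and the names `all_windows` and `in_window`, are assumptions made for this example and do not describe the actual Microsoft IT schedule or tooling.

```python
from datetime import datetime

# From the discussion: windows fall on Thursday through Sunday, about six per day.
# Assumption: each day is cut into six equal 4-hour slots starting at midnight.
WINDOW_DAYS = {"Thursday": 3, "Friday": 4, "Saturday": 5, "Sunday": 6}  # datetime.weekday() values
SLOTS_PER_DAY = 6
SLOT_HOURS = 24 // SLOTS_PER_DAY


def all_windows():
    """List every (day, slot) pair: 4 days x 6 slots = 24 windows."""
    return [(day, slot) for day in WINDOW_DAYS for slot in range(SLOTS_PER_DAY)]


def in_window(when, window):
    """Return True if the datetime `when` falls inside the (day, slot) window."""
    day, slot = window
    return when.weekday() == WINDOW_DAYS[day] and when.hour // SLOT_HOURS == slot
```

Under these assumptions, a server assigned the window `("Saturday", 2)` could be patched and rebooted between 08:00 and 12:00 on Saturday, and a reboot at 10:00 a.m. on a Tuesday would fall outside every window.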
Thomsen:
You were mentioning that there are certain kinds of exceptions though, in that people don’t have to join your change control windows. Why would people choose to do that?
Keogh:
A good example of that would be if they need a full 14 days to test the patch. Some groups have a very thorough test and change control process for deploying patches. They may decide that they can’t do it on the time scale that SMS will provide, or server patch management will provide. Therefore, they will decide to run the patch through their own test and pre-production systems, make sure that the patch doesn’t have any problems and then will manually take care of it.
If they haven’t tested the patch by the 14th day, we’ll still deploy that patch. There is an exception process. Rather, not an exception process, a deferment process. They can go to security with a very strong business case and say, Hey, we can’t test this patch soon enough. Can we defer the forced remediation for another 7 days or so? Typically, a maximum of 7 days is given. We won’t defer a patch deployment any longer than that.
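The deadline and deferment rules described above can be sketched in a few lines of Python. The 24-hour, 14-day, and 7-day figures come from the discussion; the function name `force_patch_time`, the severity labels used as dictionary keys, and the choice to let a deferment apply to either severity are assumptions made for this example.

```python
from datetime import datetime, timedelta

# Deadlines quoted in the discussion.
DEADLINES = {"emergency": timedelta(hours=24), "critical": timedelta(days=14)}
# Typical maximum deferment granted through the security team.
MAX_DEFERMENT_DAYS = 7


def force_patch_time(released, severity, deferment_days=0):
    """Return the earliest time an unpatched, opted-out server may be force-patched."""
    if not 0 <= deferment_days <= MAX_DEFERMENT_DAYS:
        raise ValueError("deferment must be between 0 and 7 days")
    return released + DEADLINES[severity] + timedelta(days=deferment_days)
```

For example, a critical patch released at 10:00 a.m. on Tuesday, June 8, 2004 would reach its forced-patch time at 10:00 a.m. on Tuesday, June 22, or June 29 if the full 7-day deferment were granted.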
Thomsen:
You and I have worked together for a while now. One of the things I know about your patch management environment is that you actually use a separate SMS infrastructure from our desktops, separate servers, and so forth. Why is it that you do that?
Keogh:
There’s a few different aspects of this. One is that there’s a small amount of history involved in this. As you know, Paul, we have had SMS in a desktop environment for a significant period of time from SMS 2.0 and perhaps before that as well. In the server space, it’s a relatively new install of SMS 2003. We brought SMS 2003 into production in the data center only with beta versions of SMS 2003. The reason we have two different infrastructures is partly because of history and partly because of business requirements. The history is that the SMS infrastructure in the desktop space has been in existence longer than that in the server space. The second piece as well is we brought new infrastructure into the data center because of the SLAs associated with it. For example, you do a hardware inventory every seven days. In the data center space, we do that every night. For us to have data as fresh in the client space as is required in the server space would put a great deal of pressure on the resources and processing power of the SMS systems. Remember, we’re only managing roughly 8,000 servers compared to your 150,000 systems in the client space.
We also provide an additional managed service in the server space. I mentioned earlier on our maintenance windows. Part of delivering the maintenance windows is we only provide that service in the server space. It’s not something that we’ve even discussed deploying to the client space. In order to achieve that scale of service in the server space, we’ve found, certainly by our initial design, that a separate infrastructure is necessary. It’s worthwhile pointing out that we are always doing an ongoing evaluation of the two environments. Part of our current evaluation is to see, can we consolidate the two infrastructures? If we can, and it makes sense to do so, and we can still deliver upon the SLAs, then we will do that. If we can’t, then we’ll keep two separate infrastructures. Obviously, we’ll do the right thing for the business rather than consolidating for the sake of consolidating.
I mentioned some of the business drivers, some of the SLAs that we’ll be looking to achieve. One of the two key business metrics that we use to measure our success is the number of servers that are patched outside of their maintenance windows. For example, if we apply a patch outside of a server’s pre-defined maintenance window, that’s going to be a ding against our success. Another metric is going to be the number of servers patched by deadline. Yet, again, you have similar metrics, but measured in a slightly different way. We need to make sure that we continue to run our business in line with those two goals that I mentioned.
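The two metrics mentioned above can be computed from a simple per-server record of a deployment. The following Python sketch assumes a hypothetical `PatchRecord` structure with three boolean fields; the field names and the `summarize` function are illustrative only, and the real reports would be built from SMS data rather than from a list like this.

```python
from dataclasses import dataclass


@dataclass
class PatchRecord:
    """Outcome of one patch deployment on one server (hypothetical structure)."""
    server: str
    installed: bool     # the patch is on the server
    in_window: bool     # it was installed inside the server's maintenance window
    by_deadline: bool   # it was installed before the deadline


def summarize(records):
    """Count servers patched outside their window and servers patched by deadline."""
    outside = sum(1 for r in records if r.installed and not r.in_window)
    on_time = sum(1 for r in records if r.installed and r.by_deadline)
    return {"total": len(records), "outside_window": outside, "by_deadline": on_time}
```

A force-patched server would typically count against the first metric, since the forced install happens outside its agreed window.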
Thomsen:
Brian, earlier when I was talking about the desktop computer management side of things, one of the things we talked about was the strategy for deploying clients on the desktops. I imagine you have a similar strategy for deploying clients on the servers. Could you tell us about that?
Keogh:
We do have a similar strategy. I suppose in some ways, it’s more manual. You’re deploying to approximately 150,000 systems. We’re over 8,000 systems. The way that we manage that list of 8,000 systems is using a configuration management database, a tool that we call IT Config. IT Config is not rocket science. It’s a SQL Server with a bunch of details about each server in the data center. This is what we target our agent installs on. All our reports are based on the server list in IT Config. It’s nothing more than just looking at that list, looking at what’s in SMS, and then making sure that the delta is addressed on an ongoing basis. Like yourselves, we have some scripts. We don’t use logon scripts or startup scripts for deploying the agent, but we do have some custom scripts that we use to deploy the agent. Like yours, these custom scripts are a wrapper around the SMS agent-install process. They test for simple things like connectivity, and has an agent been installed previously? These are certain things that we’ve built up over time and shared with the development group to say, These are common issues that we have with deploying agents in the data center, looking to provide feedback to them so that they can help develop the next cycle of tools to be more useful in the enterprise environment.
Thomsen:
What your script does is it goes through the IT Config database, the list of all the servers in the data centers and determines which ones should have the SMS client installed, and then goes ahead and tries to install it.
Keogh:
Yes, pretty much. What the script does, as you say, is go in and check some basic criteria. If a server doesn’t have the SMS agent installed, it will install it, working around whatever issues we’ve experienced in the past. It’s pretty important to stress that in the data center, every server must have an SMS agent installed. This is basically a policy that we’ve set for all the managed locations or data centers. With that policy, we can basically install the agent without any approval or pre-approval, no matter what a server owner might get upset about. That policy makes our job an awful lot easier.
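The comparison described above, between the server list in IT Config and what SMS already knows about, amounts to a set difference. The following Python sketch shows the idea on two plain lists of server names; the function name `agent_delta`, the case-insensitive name matching, and the sample server names are assumptions for illustration, and the sketch does not query IT Config or SMS.

```python
def agent_delta(it_config_servers, sms_clients):
    """Compare the expected server list with the servers reporting to SMS.

    Names are compared case-insensitively (an assumption for this sketch).
    """
    expected = {name.strip().lower() for name in it_config_servers}
    reporting = {name.strip().lower() for name in sms_clients}
    return {
        # In the expected list but with no agent reporting: install targets.
        "missing_agent": sorted(expected - reporting),
        # Reporting to SMS but absent from the expected list: worth reviewing.
        "unknown_to_it_config": sorted(reporting - expected),
    }
```

For instance, `agent_delta(["WEB01", "sql02", "EXCH03"], ["web01", "exch03", "lab09"])` would list `sql02` as missing an agent and `lab09` as unknown to the expected list.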
Best Practices
Keogh:
Let’s talk about best practices that Microsoft IT has learned doing patch management. For example, what kind of staffing is needed to do SMS patch management properly on an enterprise of this scale?
Thomsen:
That’s always a tricky question to answer in terms of staffing. The most direct answer is simply that we have one patch management specialist on the server side and one on the desktop side. They’re quite specialized in their role, and get very good at it, and do it well and efficiently. Of course, they are supported by other people. They take advantage of the SMS infrastructure. The other SMS administrators also help in an indirect sense with the patch management. On the desktop side, we’ve got two other SMS administrators, and myself as an SMS technologist looking at the futures and so forth. On the server side, I know that you have two people who support the infrastructure that’s used by the desktop side, but I’m a little vaguer on the rest of the administrators on your server side.
Keogh:
Yes, as you mentioned, we’ve got one guy who does all the patching and software distribution throughout the data center. We’ve got an engineer who’s involved in architecting the futures of all the SMS infrastructure and dealing with some of the projects like SMS 2003 SP1. We’ve also got a senior technologist who is basically the highest level of the escalation. Before we go engage Dev, this guy will basically be engaged by the lower tiers. We then have two guys who are involved in what we term our tier 3 troubleshooting. These are the guys who deal with SMS on a day-to-day basis. They deal with all the change types and all the management of the SMS infrastructure. Then, of course, we’ve got our tier 2 operations groups. These guys will receive alerts from the MOM infrastructure. They’ll also work with your team for break-fix issues that are experienced that haven’t been picked up by monitoring, because it’s not something that’s typically monitored. All in all, we’ve got about five people supporting the infrastructure, and one guy who is the engineer for the infrastructure.
Thomsen:
And that supports pretty much all the SMS functionality on both sides between that combination of people. We do have management staff, of course. We have people who create our packages on the SMS side, the packaging specialists. In our case, we have four packagers, and a couple of testers, and then various program managers as in project managers, and traditional managers. It’s a little hard to give a specific number in terms of the number of people that assist or are critical to our SMS security function. It depends upon where you draw the line.
Keogh:
We speak about the infrastructure support. Maybe I can speak to what makes up our infrastructure. In the server environment, we’ve got about 17 primary SMS servers. That includes one central server where all our data center administration and reporting is done. Then in the client space, we’ve got one central server, yet again, where the administration is done, and this is the server that you would typically use on a day-to-day basis. We then have that distributed globally. We’ve got about 22 primary sites, and then about 139 secondary sites. Probably, it’s worthwhile noting that the secondary sites are typically located on shared platforms. They’re not on dedicated hardware. They are running other services, as well as SMS. They’re running file and print and other services as well. As far as SMS being a good patch management solution, could we use something else in our environment?
Thomsen:
That’s a good point. People are often going to wonder, Well, you’re Microsoft. Of course, you’re going to be using SMS. That’s Microsoft’s solution for patch management on an enterprise scale. There is a certain amount of truth to that. At the same time, security is critical to Microsoft. If we found that SMS wasn’t working, or wasn’t the best solution for the job, we would have to look elsewhere, regardless of what the product team thought. We do choose to use SMS because it’s the right solution for us. Part of the reason for that is certainly the deployment of patches, as we mentioned earlier, but also compliance checking. We can use SMS to verify that all our computers out there have got the patches that they should have, regardless of where they got them from.
A key point for us is that we have had SMS in our environment for a while now. We’ve been using it for software distribution, inventory collection, and so forth. We have people and processes in place, as well as the servers, for SMS. Adding patch management on top of that is relatively straightforward. There’s a little bit of extra training, but no additional servers and so forth. That allows us to implement a solution quickly and effectively at relatively little cost. That’s obviously going to be a key reason why we would want to use SMS. Just being an enterprise, we’re going to need a software deployment mechanism. We’re going to need a way to check our computers, and report on them, and things like that. SMS provides that for us. I think it’s fair to say that SMS is the right solution for us, and it works well for what we’re trying to accomplish.
Keogh:
Certainly in the server space, I mentioned IT Config as being our configuration management database, CMDB. We’re certainly looking at SMS to provide some of the base functionality for that. A lot of the dynamic fields that we can collect from systems will be tracked using SMS, which simply adds value to our infrastructure today. We obviously use SMS 2003. What are the benefits of it over SMS 2.0?
Thomsen:
Certainly, we do like SMS 2003 even more than we liked 2.0. I guess the key point when it comes to patch management is that some solution should be used, period. That’s the most important thing. Patch management can’t be ignored. To do it manually is just prohibitively difficult and error prone. SMS is certainly a good solution for a lot of people. 2.0 is quite a reasonable step along that line. It gives a lot of the functionality that’s needed for proper patch management. But 2003 builds on it. For example, SMS 2003 handles mobile clients much better than 2.0 did. 2.0 does have various strategies for mobile clients, such as laptops. In fact, there’s a technical paper at microsoft.com/SMS on that subject. The advanced client within SMS 2003 was engineered specifically for that kind of scenario. It’s bound to do a better job on those computers. That’s a substantial fraction of any computer environment these days. For us, we figure about 40 percent of our desktop computers are laptops. Obviously, that’s crucial. Also, I’d referred earlier to the nagging feature of the SMS client where-
Keogh:
Yes, I’ve been through an experience of that.
Thomsen:
Exactly. It’s a friendly little nag, but nonetheless, it’s an important nag, because it keeps the user aware that there are patches available that they should go ahead and install. We found that to be important. 2.0 didn’t have that feature. The user would get an initial notification, and then when the grace period was over, they’d be rebooted, or have the patch installed. In the meantime, it was up to the user to remember that they had to go ahead and install this patch. That just didn’t work as well as it should. It didn’t make for as friendly a user experience as it could. We liked that feature of 2003. In addition to that, we’ve got various minor improvements in terms of patch management reporting. Status messages are more rich, so to speak. Especially with Service Pack 1 of SMS 2003, there are new views and so forth available that make the whole reporting solution that much more complete and more detailed. It’s easier for us to give the reliable reports that we have to give to management and others that are interested in patch state.
Keogh:
Very cool, thanks.
Thomsen:
That’s some of the common best practices that we’ve seen with SMS. Do you have any further thoughts, Brian, on best practices for computer management with SMS?
Keogh:
Yeah, I have a few best practices that I like to speak to people about. Going down through the list, security is the number one priority. It’s certainly something that every level of our management agrees to. It’s key. We couldn’t be successful without executive support for security being the number one priority. Another piece is that process is just as critical as the implementation of the technology. If we didn’t have a strong process for our agent health, for deploying patches, all that kind of thing, then our security policies and all of our patch deployments wouldn’t be nearly as successful as they are. My final best practice to share would be that during a patch deployment, it’s important to understand exactly where you stand. Certainly, in the server space, we increase the frequency of our hardware and software inventory during a patch deployment. We want to understand as soon as possible about any failures, any issues that are experienced. Even a scan once a night is not going to achieve that. Typically, we increase the frequency to perhaps four times a day, and get the information at our fingertips refreshed quickly, especially during a critical or an emergency deployment, 48 hours. If you only understand where those patches are after a 24-hour period, you’re going to be behind the curve in addressing issues. The final point is increase your frequency of hardware and software inventory during a critical deployment. That’s all the best practices that I have to share.
Thomsen:
Those are good points.
Questions and Answers
Keogh:
We’ve heard lots of questions when talking with customers. This covers some of the common questions that we deal with on an ongoing basis. What is the main difference between SMS and SUS?
Thomsen:
SUS provides a decent security patching function for smaller organizations, medium-sized organizations. SMS provides the entire software, computer management suite, software distribution, and so forth. Because SUS is targeted for small organizations, it just doesn’t have the level of functionality that larger organizations are going to need in terms of things like status reporting. Why would you want to use SMS for patch management, and not SUS or Microsoft Update Services?
Keogh:
As we spoke about earlier on, SUS is a small to medium scale patch management solution, whereas SMS is a much larger, medium to large enterprise scale solution. The solution that you use will depend on the size of the enterprise or the business that you’re running. Is SMS 2003 Service Pack 1 available for test environments?
Thomsen:
It is available for early testing. Beta 1 is available via Beta Place. You should talk to your Microsoft representative, your technical account manager, or other contact. It will be available for a wider beta later on. Keep an eye on http://www.microsoft.com/SMS, and details for joining the beta program will be available there. I want to clarify whether SMS and WUS are going to integrate in the future. I’ve heard words to that effect for their patch management capabilities. Can you clarify that for me?
Keogh:
Sure, just worthwhile clarifying, WUS, or W-U-S, is Windows Update Service. There is work going on to integrate WUS with SMS that will increase the patching capabilities. Unfortunately, I don’t have any ETA for when this is going to happen. I am eager for it to happen, as it’ll make our lives a little bit easier. How often are status updates triggered in the desktop space? Is there a way to force an update?
Thomsen:
This depends upon how you’ve got your environment configured, in particular the security scanning advertisement that looks for the patches themselves. You do have the option to set it so that it will force an update of your inventory data that flows up through the SMS hierarchy right away. That’s a switch that you can set. In a large environment, you may hesitate to do that because, of course, that’s going to mean a lot of data flowing up to your central site, not quite all at once, but in a short period of time. That could put a significant load on your network and servers that you’re not used to. In smaller environments, or where you have spare capacity, it’s definitely worth considering. There’s also the element of status messages for the advertisements. Those will flow continuously even as a normal part of the process. If you’re relying on status data, then it’s already pretty much flowing as fast as it can. If it’s the inventory data that you need, then that’s where you might want to think about either the regular frequency – do you run it once a week or once a day – or do you force it to run right away?
Keogh:
Great.
Thomsen:
In SMS 2003, it’s difficult to handle patches for different types of machines, different operating systems and so forth, especially different languages. Some of the machines may require reboots, some don’t, and so forth. This means that your patch catalog needs to be created for each type of setting so that you have different configurations appropriate to the different client bases. One option might be to create multiple packages and ship out the binaries many times. Is there anything that can be done to make it easier for these kinds of different scenarios?
Keogh:
Yes. Here at Microsoft, we are combining multiple packages using dependencies. The patch installer runs without a reboot. Then another package runs and reboots if the machine is configured to reboot, or needs a reboot. Where does Microsoft run the SMS 2003 MBSA scan tool in the SMS hierarchy? Does the central site handle all inventory, all at once, for all 100,000 clients if it is set to return the inventory?
Thomsen:
The MBSA tool itself runs on the clients. It’s actually distributed as a SMS software distribution as part of the patch management feature set. The bits themselves are executed on the clients. All the analysis and so forth is done at that level. That level of work is highly distributed. The results themselves then flow up as part of the inventory collection. That goes back to the question just a minute ago about whether you want to do it right away, or once a week, or once a day. Our environment is closed with no access to the public Internet. Is it possible that we can manually download the XML files and the bits and then place them on the SMS server?
Keogh:
Yes. You can manually download the XML file. You need to look for it in the download program location to copy it. SMS Patch Management Distribution Software Update Wizard also allows an option to download it automatically, or download it manually.
Thomsen:
Very good. Thank you, Brian.
Keogh:
Thanks, Paul.
Closing Comments
Keogh:
I hope you found this discussion on patch management at Microsoft useful. Remember to go to http://www.microsoft.com/technet/community/tnradio/default.mspx to get more details on patch management and links to patch management case studies, and to learn about the MSN Solutions Accelerator, IT Showcase, and a whole host of other resources available on this link. Also on the Web site, you will be able to download this entire broadcast, listen to it again on your PC, or read the entire transcript. Thanks again for tuning into Microsoft TechNet Radio.