Writing about Windows API functions in PowerShell
Ok, before I plunge into studying the details of how PowerShell can be applied to operating the Windows API (or manipulating the Windows API, for that matter), let's figure out what I already know about it. Also, it would be useful to brush the dust off my ability to express my thoughts, since with this two-week pause it apparently isn't getting any better.
So, yes, just an outline. When I first heard about Windows PowerShell, it sounded to me like an attempt at making the system more similar to Linux, where it's possible to programmatically control most of the system's aspects and operations. Like, Linux had the shell, and with it I could do anything - I could write scripts automating most routine operations: archiving and deleting old and temporary files, making backups, sending emails, and so on. Also, through the shell I had access to the deepest depths of the system itself: I could control the processes executing at that moment and allocate memory, and I had precise and granular control over input/output and disk operations. If something wasn't within reach of the standard shell language, I could write that part in C and call it from the same scripts, and here my power became truly unlimited and boundless.
On the other hand, Windows always gave users quite limited control over the system. Everything was going on under the hood, and it was always tricky to get access to the stuff magically happening there. Like, Windows had the command prompt. Technically, someone with a relatively good knowledge of Windows system commands could use the command prompt for simple automation - like, once again, cleaning up old files. Some power users were even aware that the command prompt allowed loops and conditional branching (if-then constructions). Still, this tool didn't offer much in terms of accessing more intimate places - magical processes hidden deep inside, and so on. Actually, even before they came up with PowerShell, Windows had another, less often mentioned tool: Windows Management Instrumentation (WMI). It allowed small programs written in JScript or VBScript to programmatically control the operating system on a more advanced level. For example, I could get and monitor a list of currently executing processes, get information about installed apps and packages, the amount of free disk space and memory, the status of network activity, open sockets, and so on. There were many possibilities. Probably the most compelling aspect of this tool for any wannabe hacker was that it didn't confine one to his own computer - everything could be done on any computer accessible on the network as well. For example, I could set up a procedure for rebooting all the machines on the network on schedule and then remotely launch the applications I wanted on them. Similarly, I could remotely run and terminate applications; I had full access to disk operations - creating and deleting files, and so on. In fact, it was always an effective instrument that many network administrators neglected, preferring RDP and other tools with visual remote control. But, once again, such tools didn't allow automation.
Now, another thing I need to think about is the Windows API. Like, what is the Windows API, essentially? In a nutshell, it's a point of access to Windows functions. Like, when you write any kind of program, you more or less write a massive bundle of functions. They can be organized into objects with encapsulated data, inheritance, polymorphism, and so on. But essentially the process is the same. You write a bunch of functions and then call them in the main program flow or whatever. An operating system by definition is a platform that - apart from maintaining its own crucial processes - allows applications developed by other people to be executed on it. To do that, it also has to establish certain rules that application developers need to abide by when interacting with the operating system. For example, if everybody wrote software without regard to anything else happening in the system at the same time (the way it used to be in the times of DOS), such a system would crash pretty soon. That's because Windows is a multitasking system with many applications and processes executing in parallel, so if any process tried to grab a part of the memory currently occupied by a different application, the consequences would be somewhat disastrous. Similarly, an app could inadvertently interfere with the operating system's own background processes, and that wouldn't end well either.
So, essentially, to ensure that everyone plays nicely and by the rules, Windows established a set of entry points, or procedures, through which applications running on top of it can perform their operations without disrupting anything else. For example, if I want to allocate some memory for my application's data, I cannot just grab it directly; even the C malloc call ends up in a special Windows API procedure that, in turn, addresses the kernel. Then, probably, somewhere in the heart of the OS there happens some ritual of figuring out how to react to my request, but eventually, in most cases, the system satisfies my plea for additional memory. In the end, I get a pointer to the area of memory the system allocated for me, where I can safely place my data or do whatever. It's worth noting that I don't have any control over where this memory will be allocated physically, and when the operating system is running out of memory my request can simply fail. The point is, I cannot do anything directly; I always have to address the Windows API. As a matter of fact, all the application builders and other tools with a high level of abstraction in the end address the API to do whatever they are trying to do - creating fancy interfaces with endless buttons and diagrams, capturing the user's input, and so on.
So, yes, the topic is about using Windows PowerShell to operate the Windows API. Even without knowing much about the details and technicalities at this stage, it already sounds interesting. As I perceive PowerShell, it's something similar to opening a command prompt window and beginning to do some sort of magic. I also know that using the Windows API I can do anything in the system, due to the simple fact that everything in the system is done through the Windows API. So the combination of PowerShell and the Windows API is somewhat like unrestricted magic at my fingertips. Ok, to be continued.
Ok, another thing, before I get bogged down in the details. It's probably a good thing to remember about any kind of research. I mean, when you begin to loosely browse and read a multitude of materials without having a clear outline in your head - like, what you are going to write about and in what format - this process of browsing and reading turns into something akin to procrastination. There can be a lot of various information, but without a clear outline you don't know what you need to focus on or what has to be memorized, and you miss points important for your task. Therefore it's always sensible to establish the outline before plunging into that infinite treasure trove of information the Internet provides on any possible topic. As a matter of fact, I learned this by trial and error - through my own mistakes and failures, mostly failures to meet deadlines, which weren't fatal but nevertheless. So the point is, studying the materials is a dangerous process, capable of sucking you in for a long time if you haven't decided what exactly you want to know. On the other hand, if you have, the research hopefully gets more limited and more directed at a set of specific goals and doesn't turn into an endless process of browsing and reading.
So, literally, the description of the task at hand sounds like "To describe the process of interaction with the Windows API using PowerShell." The final document should be about two or three pages long. Three pages in my experience are about twenty-five hundred to three thousand words - a quota that can be more difficult to fulfill if it needs to be written entirely off the top of my head, like, giving my own ideas, opinions, and analysis. It's not that difficult if it's a retelling, in some form, of existing information. The problem is, it shouldn't be a retelling either. Also, the technical details related to this Windows API interaction using PowerShell are presumably innumerable, so I need to decide in advance what I consider important about this interaction. Also, I need some structure. As I mentioned above, PowerShell gives a user a sense of being empowered (hence the name) because it provides seemingly unlimited possibilities for controlling the system. And the Windows API naturally contains an exhaustive list of things that can be done to the system - literally, anything. So this is one point.
I think it will be sensible to briefly explain first what the Windows API actually is, as well as what PowerShell is and what additional abilities it gives in comparison to what was there before (like the command prompt and stuff). So we have a necessary introductory paragraph.
Then things get a little bit more tricky, although on the other hand, not so much. Let's dive into the mind of a programmer who wants to learn about this thing - interaction with the Windows API using PowerShell - as quickly as possible, without reading much. Also, this information has to be somewhat exhaustive, in the sense that, after reading it, the programmer will be ready to go - like, to do something useful, practical, and magical, using his/her newly acquired set of skills. Also, it's very likely that the programmer already knows a lot about the Windows API and all kinds of shells, so there's no point in dwelling on it indefinitely.
Here comes the question of the balance between explanation and examples. I can use examples, but considering that the Windows API has thousands of various functions, I need to choose wisely. In other words, I need to pick something that would be both somewhat practical and exciting at the same time. Because programmers are easily excitable. Like, I don't know, a way to use one simple command to delete all the files on a hard drive, remotely, say, at the Pentagon. Or how to operate graphics memory, although operating graphics memory is somewhat irrelevant here, considering that nobody's likely to create graphics applications via the command prompt. So there needs to be something practical.
Another important question is whether I need to explain how to set all this up - fortunately, here I actually won't need to explain anything, since PowerShell comes prepackaged with the operating system starting with Windows 7 or something. I will probably need to both mention this fact and briefly explain how to launch it. Plus, give some idea of why Windows developers came up with this tool in the first place. So this all goes in the introduction. Like, "PowerShell is a powerful tool designed to increase and amplify the power of power users." Something like that.
Then right to the meat of the topic. Maybe I can give one introductory example, like, a necessary "Hello World" style thing. Although it doesn't necessarily need to be so basic and primitive, considering that the material is aimed at more or less qualified and experienced programmers. So, what is the most ubiquitous thing qualified programmers do with the Windows API? Probably this would be the introductory example.
Starting from here I'll probably need to identify the key aspects of this application of Powershell - its ability to interact with Windows API. Because so far it exists in my head as one solid mass - like, using the command prompt window to access the inner depths of system's sacred innards. I bet there are various aspects of this process, I'm yet unaware of, that can be categorized and described separately.
Also, returning inside the head of the potential programmer reading this text. Ok, what would I consider important specifically for me? What would I like to know right from the start? Ok, there are some mundane and pragmatic matters I wonder about. Like, I never used PowerShell to operate the Windows API myself, but from what I know about the Windows API, each of its functions takes a long list of parameters, many of them optional and having a default value. So I wonder if it's possible in PowerShell to avoid enumerating all those endless parameters. That's the first thing. Another thing I wonder about is that many variables are passed to an API function by reference; in other words, after the function completes they can be assigned specific values, and, as far as I remember, this way of giving feedback is used much more often in the Windows API than the function's return value. So how does PowerShell deal with that?
Ok, probably it will be sensible to mention this nuance of Windows API based programming and find an example, showing how elegantly it can be done with Powershell. (If it indeed can be elegantly done with Powershell, I have no idea)
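A minimal sketch of how by-reference parameters can be handled, assuming the Add-Type approach and the kernel32 function GetDiskFreeSpaceEx, which returns its results through three output parameters. The names `Win32` and `Disk` are made up for this illustration, and the call is guarded because kernel32.dll exists only on Windows. PowerShell's `[ref]` is what passes a variable by reference:

```powershell
# The C# signature declares the three output values as 'out ulong'.
$signature = @'
[DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
public static extern bool GetDiskFreeSpaceEx(
    string lpDirectoryName,
    out ulong lpFreeBytesAvailable,
    out ulong lpTotalNumberOfBytes,
    out ulong lpTotalNumberOfFreeBytes);
'@

# 'Disk' and 'Win32' are hypothetical names chosen for this example.
$disk = Add-Type -MemberDefinition $signature -Name 'Disk' -Namespace 'Win32' -PassThru

# Variables that will receive the results.
$free = [uint64]0
$total = [uint64]0
$totalFree = [uint64]0

# kernel32.dll is only available on Windows, so guard the actual call.
$onWindows = ($PSVersionTable.PSEdition -eq 'Desktop') -or $IsWindows
if ($onWindows) {
    # [ref] hands each variable over by reference; the function fills them in.
    $ok = $disk::GetDiskFreeSpaceEx("$env:SystemDrive\", [ref]$free, [ref]$total, [ref]$totalFree)
    if ($ok) {
        'Free: {0:N0} of {1:N0} bytes' -f $free, $total
    }
}
```

The return value only reports success or failure; the interesting numbers come back through the `[ref]` variables, which matches the feedback style described above.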
Another thing I would be thinking about as a programmer: Apparently operating with Windows API using Powershell has its limitations. Like, as I noted before, it's not likely that somebody will try to create a graphic application this way, so all the functions dealing with graphics memory and drawing stuff are somewhat beyond this context. I mean, it's probably still possible to use them, but what would be the point? So another thing, I need to outline the sphere of meaningful application of this tool and technique - define the cases when it's sensible. Draw the boundaries so to speak. Yes, and also find a bunch of the most interesting and relevant examples.
Speaking of the examples, it will be sensible if each example will illustrate some separate aspect of Powershell application to Windows API. Also, covering the range of potential questions that might emerge among programmers dealing with this tool. Ok, to be continued.
Ok, I need to reflect a bit on the information I got so far and decide where to dig further. One of the key problems is that information existing on the Internet, regarding calling API functions from Powershell scripts is very technically specific and apparently aimed at the specialists who are already deep into this topic. The question is whether to leave it this way with all the technical specificity and jargon or to bring it to a more layman-friendly, understandable level.
The key general points that can be deduced from the information I got:
PowerShell is a tool built atop the .NET Framework, utilizing mostly its functions and capabilities.
Although the .NET Framework, in turn, is built on top of the Windows API (it always boils down to the Windows API), calling API functions in PowerShell scripts is considered somewhat of a quirk. It's not a particularly typical thing to see in a PowerShell script, in other words.
So the punchline here is that although PowerShell itself has a lot of useful and powerful commands and can easily tap into the power of the .NET Framework, neither PowerShell nor the .NET Framework has all the tools necessary to accomplish every possible task. In such cases, sometimes you have to plunge directly into the murky waters of Windows API functions.
There are several practical problems mentioned in relation to converting Windows API functions into the Powershell commands (or scriptlets, or whatever the fuck they are called, I need to check.)
First, the argument types in API functions are not the same as the C# types used in PowerShell. The former came from the times when Windows programs were written in pure C and C++. In any case, it's not a big deal, since for each native API type there's a compatible C# equivalent. This type compatibility can be verified on a specific site generally dedicated to translating Windows API functions into PowerShell commands (or scriptlets or whatever). (I need to check out this site, since there's also some information there that can clarify some things currently baffling me.)
Another thing mentioned in passing is that compiling C# code leaves some traces on the hard drive. It's considered bad in terms of security. It's not clear to me at this point, why is it so, or whether this information is particularly important in the context of my writing. Just a thing to keep in mind since it sometimes defines the method programmers use to convert Windows API calls into C# snippets.
Speaking of which, there are three methods. The first is through the Add-Type command. Why it's called "type" somewhat confuses me, as do other things, since I'm not very familiar with C# programming and its jargon. The point is, there's a special syntax for writing a sort of wrapper around a native Windows API function, which subsequently allows addressing it as a C# function or something. Not forgetting that it needs to be supplied with parameters whose types are compatible with the respective native Windows API types.
The second method reads literally like "Get a reference to a private type in the .NET framework that calls the method" and I'm yet to figure out what the fuck it's supposed to mean. Although it feeds my curiosity as something mysterious and barely comprehensible. As I said, there's a lot of C# specific jargon making the picture somewhat muddled, murky, and hazy. What I managed to figure out so far though is that by type they probably mean something that normal people normally refer to as the function or method, but anyway. Ok, I need to Google what "type" exactly means in the NET framework programmers' lexicon.
The third method is called "Using reflection to dynamically define a method that calls the Windows API function." It's almost as vague to me as the previous one, but in this case I can make an educated guess. Namely, the list of available functions can be requested somehow. In programming, it's called reflection. Like, I may write a program without actually knowing what commands and procedures are available to me, but I can include checks in it - for example, what methods of working with the disk are present in the system in its current configuration, and the number and types of parameters they require. The program then continues according to the received information. This all means that the program is somewhat fluid and dynamic - it gets formed on the go, which is somewhat fascinating.
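A small sketch of what reflection looks like from a PowerShell prompt. One caveat on the guess above: .NET reflection inspects .NET types, not native DLLs directly, so this example asks a regular .NET class (System.IO.File, chosen arbitrarily here) which `Copy` overloads it has and what parameters each one takes:

```powershell
# Ask the runtime which public static 'Copy' methods System.IO.File offers.
$flags = [System.Reflection.BindingFlags]'Public, Static'
$copyMethods = [System.IO.File].GetMethods($flags) | Where-Object { $_.Name -eq 'Copy' }

# Print each overload with the types and names of its parameters.
foreach ($method in $copyMethods) {
    $parameterList = $method.GetParameters() |
        ForEach-Object { '{0} {1}' -f $_.ParameterType.Name, $_.Name }
    '{0}({1})' -f $method.Name, ($parameterList -join ', ')
}
```

Nothing about the class is hard-coded beyond its name; the script discovers the overloads and their parameters at run time.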
Also, another side note: I see here somewhat of a potential structure for this article. There are three methods of invoking Windows API functions in Powershell scripts, and I can explain how each method works and in what situations it makes sense to use each of them. Preferably in humanly-understandable language.
Speaking of which, one serious problem I face here is the aforementioned jargon and terminology, which require some background in C# programming. Similarly, the syntax of the examples is somewhat confusing; if I were asked right away why there's a certain instruction or, say, a pair of parentheses, I wouldn't be able to answer with a hundred percent certainty that I'm not making things up. Ok, here's another nuance to zoom in on.
I think I will take the example with the Windows API CopyFile function because the material it comes from explains very well why it's sometimes necessary to invoke Windows API functions in the first place. Namely, because sometimes there's no other choice. So it's going to be the final part of the introductory paragraph, explaining what the API, C#, and PowerShell are all about, and the evolution of the stuff gradually accumulating on top of the pure native Windows API.
So what do I need now? Probably a clearer picture of Powershell and what it can do. Also, the relationship between Powershell and NET framework. In other words, is it possible to call NET framework functions directly in the Powershell command line or not? Also, I need to brush the dust off what I know about the NET framework itself. Like, what does it offer, in terms of an ability of programmers to do even cooler things? And how the process of development for the NET framework is fundamentally different from writing code in C++ using native Windows API. At least, it hopefully will add clarity to the terminology used.
Also, since it's clearly a topic aimed at people who are already deep into this sphere of knowledge there's no point in lengthy explanations about Powershell, NET framework, and especially, Windows API. Everybody knows pretty well what those things are.
So the possible structure looks like this:
A brief introduction, telling just a little bit about the Windows API, the .NET Framework, and PowerShell. Shedding some light on the evolution from the point when people wrote programs in C++ invoking native API functions to the point when people started typing commands into the PowerShell console window. Also, mentioning why it's actually so cool.
Then, why it's sometimes impossible to stick to PowerShell commands and C# functions, and why it's sometimes necessary to actually invoke native Windows API functions.
Then the meat of the discussion: three methods of invoking Windows API functions in Powershell scripts, upsides, downsides, what method is justified in which situation, what exactly is happening when I use a specific method, and so on.
A conclusion with some words about why it's actually so cool to invoke Windows native API functions in Powershell scripts.
Ok, to be continued.
Ok, there are several things to ponder. The first method of including a Windows API function in a PowerShell script has to do with a piece of C# code that gets compiled on the fly. It means that, in the course of its execution, such a program leaves traces in the form of chunks of compiled code. Which is considered not good from the point of view of security. Also, this API function needs to have a respective representation in the .NET Framework, which is not always the case. So this is when there's a reason to resort to the third method of invoking API functions, the one that has to do with reflection.
Also, another small but important nuance I need to figure out is the usage of namespaces and class names when adding new types via Add-Type. I learned that "type" in this context is just a fancy way to say "class". In fact, you add a class with the methods defined within it. For some reason, in the .NET Framework, it's called a type. Also, the instructions executed in the PowerShell window are called cmdlets. For some reason.
Ok, also, it's worth noting that, normally, methods of the class (type) can be static or instance (non-static) methods, and they have respectively different syntax when addressed in PowerShell after the class is added to the session. In the latter case, calling a method predictably requires creating an instance of the class first. And in the former case, the method is called using a special fancy syntax, [ClassName]::MethodName(). Since it's a static method and stuff.
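To make the two call syntaxes concrete, here is a small sketch. `Demo.Greeter` is a hypothetical class invented for this illustration (it has nothing to do with the Windows API); it is compiled with Add-Type and then called both ways:

```powershell
# 'Demo.Greeter' is a made-up class used only to show the two call syntaxes.
Add-Type -TypeDefinition @'
namespace Demo {
    public class Greeter {
        private readonly string name;
        public Greeter(string name) { this.name = name; }

        // Static method: belongs to the class itself.
        public static string Shout(string text) { return text.ToUpper() + "!"; }

        // Instance method: needs an object created from the class.
        public string Greet() { return "Hello, " + name; }
    }
}
'@

# Static call: type name in square brackets, then two colons.
[Demo.Greeter]::Shout('hello')      # HELLO!

# Instance call: create the object first, then use a dot.
$greeter = New-Object Demo.Greeter -ArgumentList 'Windows'
$greeter.Greet()                    # Hello, Windows
```

Functions imported from the Windows API are declared as static methods, so in practice it is the `[Namespace.ClassName]::MethodName()` form that shows up when calling them.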
Speaking of adding Windows API functions to PowerShell, it also has to do with a thing called P/Invoke (Platform Invoke). If I understand it correctly, this is the .NET mechanism that lets managed code call functions exported by native DLLs, while the compiling on the fly is done by Add-Type itself. There's also a special site, PInvoke dot net, containing .NET representations of various API functions (with native argument types already converted into the respective .NET types). Basically, those are chunks of code ready to be included in the Add-Type command.
The things requiring my narrower focus at this moment are the following:
The second, mysterious method of including Windows API functions, which has to do with some private dot net types that invoke the respective API functions. Or something like that.
I partly understood the third method of adding Windows API functions to the Powershell session via reflection, although I probably need to dwell a bit more on it.
I need to google what the fuck the term "assembly" means in the dot Net jargon. I mean, precisely.
Ok, let's roll, the key points.
PowerShell is something that Microsoft developers came up with when they saw how cool and easy everything is in Unix and Linux systems - like, everything is accessible and available via the command line. At that point, MS had the command prompt, which was pretty lame, and CScript boosted by WMI, allowing programmers and system administrators to penetrate deeper into the OS's sacred internals. Still, it wasn't as good as Linux's powerful command line, so one day somebody came up with the idea of creating something similar in Windows. It wouldn't really be an equivalent, since the systems' fundamental architectures are different - like, everything in Linux can be seen as a plain-text file, while everything in Windows, such as the input and output of functions, is highly structured data. Nonetheless, in the end, they invented a thing called PowerShell. In a nutshell, it's something between a command prompt and a programming interface: in order to do something in PowerShell I use so-called cmdlets. Cmdlets are in some ways similar to the MS-DOS commands that preceded them but, in fact, they rely on the underlying power of the .NET Framework. Those are like wrappers around respective .NET Framework functions or something. Speaking of which, if a programmer finds the base functionality provided by cmdlets insufficient, he can add any additional existing or custom class to the PowerShell session using Add-Type. In PowerShell terminology, a class is called a "type" for some reason. In any case, it allows adding more methods and functionality to the PowerShell session. In fact, there's somewhat of a deep hierarchy of nested concepts here. Everything starts with the AppDomain, which more or less represents the current session; within it there are assemblies - conglomerations of various classes and whatnot - and within an assembly sits the declaration of the class (type). Also, there are namespaces.
Ok, at least when some particular method existing within the scope of the PowerShell session is addressed, it's prefixed with the namespace and the name of the class (type). Also, class (type) methods can be static or instance methods, so to invoke the latter one needs to instantiate the class first, and in the former case to use a fancy syntax like [ClassName]::MethodName().
Ok, a key point of the narrative here would be that PowerShell heavily relies on the underlying .NET Framework infrastructure. And the .NET Framework, in turn, relies on native Windows API functions since, in the end, all that fancy stuff, including .NET Core, the .NET Framework, and whatever, is built on top of the native Windows API. So sometimes it even makes sense to call those functions from PowerShell. One nuance here is that in order to directly add a Windows API function to the PowerShell session, it needs to have a sort of representation in the .NET Framework. In fact, I invoke an equivalent .NET Framework function or something. Also, I need to ensure that the types of the arguments I pass to the function added this way are compatible with the respective types in the native Windows API function. Apparently, to make things more convenient and the lives of programmers easier, there exists a special site, PInvoke dot net, where for all (or at least the most frequently used) Windows API functions there's a chunk of code that can be fed directly to the Add-Type command. All the types in the function declaration are already translated there from native Windows API types (the ones used when Windows software was developed in pure unadulterated C and C++) to the equivalent C# types.
So, yes, Add-Type is a pretty simple way to add the raw Windows core functionality to the fancy way the things are done now in the Powershell. Although from the point of view of people concerned about security there are some downsides. Particularly, when the chunk of code is added via add-type the built-in compiler compiles it on the go, naturally leaving some undesirable traces on the hard drive. Pieces of object code or something like that.
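A minimal sketch of the Add-Type route, assuming the kernel32 function CopyFile as the example. The C# declaration follows the usual P/Invoke pattern; `Win32` and `Kernel32` are names picked for this illustration, the file names are throwaway, and the call is guarded because kernel32.dll exists only on Windows:

```powershell
# C# declaration of the native function. The attribute names the DLL that
# exports it; the parameter types are the C# counterparts of the native
# LPCWSTR, LPCWSTR and BOOL.
$signature = @'
[DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
public static extern bool CopyFile(
    string lpExistingFileName,
    string lpNewFileName,
    bool bFailIfExists);
'@

# Add-Type compiles the declaration into a class; -PassThru returns the new type.
# 'Kernel32' and 'Win32' are hypothetical names chosen for this example.
$kernel32 = Add-Type -MemberDefinition $signature -Name 'Kernel32' -Namespace 'Win32' -PassThru

# kernel32.dll is only available on Windows, so guard the actual call.
$onWindows = ($PSVersionTable.PSEdition -eq 'Desktop') -or $IsWindows
if ($onWindows) {
    $source = Join-Path $env:TEMP 'copyfile-demo-source.txt'
    $target = Join-Path $env:TEMP 'copyfile-demo-target.txt'
    Set-Content -Path $source -Value 'copied through the Windows API'

    # $false for bFailIfExists means an existing target is overwritten.
    $copied = $kernel32::CopyFile($source, $target, $false)
    "CopyFile returned: $copied"
}
```

After this runs, the function is also reachable as `[Win32.Kernel32]::CopyFile(...)` for the rest of the session.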
So regarding this situation, it's sometimes more sensible to use other methods of adding Windows API functions to the Powershell. Also, an additional reason to do that is that some functions might not have their respective C# representations within the scope of the current Powershell session, so it makes it impossible to add them using add-type. So, nonetheless, there are two other methods of adding Windows API functions. The first sounds like "using a private type that invokes the respective function." And the second method has to do with reflection. Reflection basically is when I can request what classes and functions are currently available in the system and act accordingly. Ok, to be continued.
Ok, speaking of differences between adding Windows API functions using the add-type method and, for example, using a private class, addressing respective API function. For one thing, according to the documentation, add-type method deals with NET Core classes. So the Windows API function, called this way, needs to have some sort of NET Core implementation or wrapper. In other words, if it's not registered as a valid member of some NET type (class) it's impossible to invoke it using add-type. Probably there exist some exotic API functions, not having any representation in the NET Core.
Here it's also worth mentioning that NET framework is organized in the following way:
There is the AppDomain - a bigger entity, representing both the current PowerShell session and the top level of the hierarchy, within which assemblies are located. Assemblies in .NET jargon reflect the underlying DLL libraries, containing classes (types) and functions. Each assembly contains modules - a module represents some specific DLL file - and within modules there are types (classes) with their static and instance methods, properties, and other things, which all together are called members.
So the second way of invoking Windows API functions uses the P/Invoke (Platform Invoke) principle. This principle and technique are related to the fact that, since the inception of the .NET Framework, Windows runs two kinds of code - so-called managed code and unmanaged code. Managed code is the code executed on the CLR virtual machine, which is designed to execute code written in many different high-level languages and compiled into a sort of intermediate language, subsequently converted into machine code by the virtual machine at runtime.
Unmanaged code can be anything from Windows API functions that haven't found their representation in .NET to third-party libraries, for example. What they have in common is that this code cannot be executed in the controlled environment of the CLR. Still, such functions can be invoked and used from managed code.
Particularly, the third method of using Windows API functions in PowerShell has to do with using the P/Invoke technique to bind to the functions exported by the respective DLLs. Also, a side note. To declare a Windows API function in the scope of the PowerShell session in such a way, one needs to build a dynamic assembly within the current AppDomain, create a dynamic module within this assembly, and only then, within this module, define a class with a method bound to the desired function in the DLL. After all these manipulations are successfully completed, the target method can be called in PowerShell (prefixed by the class name). As another side note, the Windows API isn't particularly organized in classes, so mostly all the functions of a specific library are put under a single class name, somewhat reflecting the library's general purpose.
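The steps above can be sketched with the System.Reflection.Emit classes. This is an illustration under assumptions rather than a definitive recipe: CopyFile is used as the target function, the assembly, module, and type names are invented, and the call is guarded because kernel32.dll exists only on Windows. Nothing here goes through the C# compiler:

```powershell
# 1. A dynamic assembly that lives only in memory ('Run' access).
#    'Win32Dynamic' and 'Win32.DynamicKernel32' are hypothetical names.
$assemblyName = New-Object System.Reflection.AssemblyName('Win32Dynamic')
$assemblyBuilder = [System.Reflection.Emit.AssemblyBuilder]::DefineDynamicAssembly(
    $assemblyName, [System.Reflection.Emit.AssemblyBuilderAccess]::Run)

# 2. A dynamic module inside that assembly.
$moduleBuilder = $assemblyBuilder.DefineDynamicModule('Win32Dynamic')

# 3. A type (class) inside the module to hold the imported function.
$typeBuilder = $moduleBuilder.DefineType(
    'Win32.DynamicKernel32', [System.Reflection.TypeAttributes]'Public, Class')

# 4. A static method bound to CopyFile in kernel32.dll via P/Invoke.
$methodBuilder = $typeBuilder.DefinePInvokeMethod(
    'CopyFile', 'kernel32.dll',
    [System.Reflection.MethodAttributes]'Public, Static',
    [System.Reflection.CallingConventions]::Standard,
    [bool],
    [Type[]]@([string], [string], [bool]),
    [System.Runtime.InteropServices.CallingConvention]::Winapi,
    [System.Runtime.InteropServices.CharSet]::Unicode)

# Keep the native return value as the method's return value.
$implFlags = $methodBuilder.GetMethodImplementationFlags() -bor
    [System.Reflection.MethodImplAttributes]::PreserveSig
$methodBuilder.SetImplementationFlags($implFlags)

# 5. Finish the type; the result can be used like any other .NET type.
$dynamicKernel32 = $typeBuilder.CreateType()

$onWindows = ($PSVersionTable.PSEdition -eq 'Desktop') -or $IsWindows
if ($onWindows) {
    $source = Join-Path $env:TEMP 'emit-demo-source.txt'
    $target = Join-Path $env:TEMP 'emit-demo-target.txt'
    Set-Content -Path $source -Value 'copied through a dynamically defined method'
    $copied = $dynamicKernel32::CopyFile($source, $target, $false)
    "CopyFile returned: $copied"
}
```

Compared with Add-Type, this is considerably more verbose, which is presumably the price for keeping everything in memory.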
Some notes.
I need to add a correction to my previous assumption that Add-Type always requires a C# representation of a native Windows API function. Here there's also the P/Invoke mechanism at play. Also, a side note: a description of the Win API function invoked this way is called a function signature.
Ok, a fish skeleton.
Since the inception of the .NET Framework (a technology aimed at giving users more freedom and flexibility in operating their computers), all the code executed on Windows systems can be divided into two categories: managed code and unmanaged code.
Managed code is code compiled from one of the many high-level languages supporting the CLI (Common Language Infrastructure) paradigm. In other words, it doesn't matter what language you use - eventually it gets compiled into intermediate code that is executed in the controlled environment of the CLR (Common Language Runtime) - a virtual machine that converts this intermediate language into machine code at runtime.
The benefits of executing code in a controlled environment are higher security, the ability of programmers to work at a higher level of abstraction, and, potentially, the ability to move code from one platform to another without hardware compatibility problems.
Unmanaged code is represented by third-party libraries, Windows API functions that have no counterpart in the .NET Framework, and other pieces of native code.
Technically, the whole architecture of the .NET Framework has the Windows API as an underlying layer. So calls to .NET functions that touch the system eventually lead to Windows API functions.
Most operations in PowerShell are performed using so-called cmdlets - command-line instructions that utilize the underlying .NET Framework functionality. If the capabilities provided by cmdlets are insufficient to solve a specific task, there's another option: using the Add-Type command to add a specific .NET type to the PowerShell session. A .NET type is similar in structure to a class in OOP - it contains methods, properties, events, constructors, and nested types; together these are called members.
Generally, the .NET Framework in the context of PowerShell is organized in the following way:
The uppermost level is the AppDomain, which roughly represents the current PowerShell session. The AppDomain serves as a container for assemblies - collections of modules and types stored in the respective DLLs. A module is typically associated with a specific DLL, and all the functionality stored in that DLL is represented by types (classes) and their members - methods, properties, etc.
Sometimes even the .NET functions aren't enough to solve the task, for example, when it's necessary to set system parameters or to copy files from special paths that the Copy-Item cmdlet cannot process. In such cases, it makes sense to tap directly into the mighty treasure trove of powerful Windows API functions - after all, everything is eventually built upon them.
There are three ways to do that. The first one is using the Add-Type cmdlet. Just as it can add a .NET type written in C# to the PowerShell session, Add-Type can add an unmanaged piece of code (an API function) from one of the core Windows libraries.
To do that, the system uses the P/Invoke technique. P/Invoke is the standard way of calling unmanaged code from programs designed to be executed in the managed environment (CLR).
Without getting into technical details, adding a Windows API function to the PowerShell environment requires calling the Add-Type cmdlet with the function's signature as a parameter. The function's signature is a description of the function, including its location (the respective DLL) and the list of its parameters. The parameter types should be C# equivalents of the respective native Windows types.
There's a site, Pinvoke dot net, that makes programmers' lives easier - it contains signatures for most Windows API functions. An important thing to remember is that argument types in native Windows functions and in C# are different, so it's important to declare compatible argument types in the function's signature.
After the function is added to the PowerShell session, it can be called as a method prefixed with the name of the respective class. It's worth noting that methods of a type added with Add-Type can in general be static or instance methods; to call the latter, it's necessary first to create an instance of the class (type) containing the method. A P/Invoke declaration itself, however, has to be static, so an imported API function is always called on the class.
The downside of adding Windows API functions through Add-Type is that the C# code passed to it is compiled at runtime, and the compiler leaves artifacts on the hard drive, which is not good in terms of security.
Also, if the function isn't within reach of the current PowerShell session (for example, the DLL containing it cannot be loaded), it's impossible to invoke the function this way.
The second way of adding a Windows API function to PowerShell is by referring to a private type that calls this function. Here we need to remember that most of the .NET Framework functionality is based on the underlying Windows API functionality. There exists a collection of private .NET methods that call the respective Windows API functions. The gist of the second method is to find and invoke such a private .NET method.
The third method uses the concept called Reflection. For example, I can use the Get-Member command in PowerShell to list the members (methods, properties, and so on) of any object or type currently available to me in the PowerShell session.
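To make that note concrete, here's a minimal sketch of what Get-Member shows. The string and the System.Math type are just samples I picked for illustration; any object or type would do.

```powershell
# Instance members: pipe an object, and Get-Member reflects over its type.
$methods = 'some text' | Get-Member -MemberType Method
$methods | Select-Object -First 5 -Property Name, MemberType

# Static members: pipe the type itself and add -Static.
$static = [System.Math] | Get-Member -Static -MemberType Method
$static | Select-Object -First 5 -Property Name, MemberType
```

Note that Get-Member describes one object or type at a time; it doesn't list everything loaded in the session.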
The topic of today's conversation is adding Windows API functionality to PowerShell. Let's discuss when and why it might be necessary and how to actually do it.
First things first: PowerShell is a command-line management tool based primarily on the .NET Framework. Most operations in PowerShell are performed using so-called cmdlets - command-line instructions that utilize the underlying .NET Framework functionality. If the capabilities provided by cmdlets are insufficient for solving a specific task, there's another option: using the Add-Type command to add .NET types to the PowerShell session. A .NET type is the equivalent of a class as defined in classic OOP - it contains methods, properties, events, constructors, and nested types; in .NET terminology these are all called type members.
All the functionality provided by the .NET Framework within a PowerShell session is organized as the following hierarchy:
The uppermost level of the hierarchy is the AppDomain, which roughly represents the context of the current PowerShell session. The AppDomain serves as a container for assemblies - collections of modules and types. A module is typically associated with a specific DLL. Modules, in turn, include .NET types (classes) and their members - methods, properties, etc.
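The hierarchy can be walked from any type upward. A small sketch, using System.Text.StringBuilder as an arbitrary sample type (the module and assembly names it prints depend on the .NET version PowerShell runs on, so I don't show an expected output):

```powershell
# Start from a type and climb: type -> module -> assembly -> AppDomain.
$type = [System.Text.StringBuilder]

$location = [pscustomobject]@{
    Type           = $type.FullName
    Module         = $type.Module.Name                 # the DLL that holds the type
    Assembly       = $type.Assembly.GetName().Name     # the assembly that holds the module
    # Is that assembly among those loaded in the current AppDomain?
    LoadedInDomain = [AppDomain]::CurrentDomain.GetAssemblies() -contains $type.Assembly
}

$location
```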
In theory, PowerShell was designed with the expectation that its cmdlets (or, in a pinch, .NET classes added via Add-Type) would be sufficient to cover all the necessary functions for managing the system, automation, etc. In other words, using native Windows API functions in it isn't a particularly common situation, and here's why:
Since the inception of the .NET framework, all the code executed on Windows systems can be divided into two categories: managed code and unmanaged code.
Managed code is code written in one of the high-level languages supporting the CLI (Common Language Infrastructure) paradigm. It gets compiled into intermediate code (CIL) that is executed in the controlled environment of the CLR (Common Language Runtime) - a virtual machine that converts this intermediate language into machine code during execution or, in other words, "Just in Time" (JIT).
The benefits of executing code in a controlled environment are higher security, more efficient memory management, automatic garbage collection, the ability of programmers to work at a higher level of abstraction, and, potentially, the ability to port code from one platform to another without hardware compatibility problems.
Unmanaged code is represented by libraries compiled to be executed on specific hardware, for example, third-party libraries. In our case, Windows API functions imported into the PowerShell environment can also be regarded as unmanaged code.
Technically, the whole architecture of the .NET Framework has the Windows API as an underlying layer. So calls to .NET functions that touch the system eventually lead to Windows API functions. And despite the wide range of possibilities provided by the .NET Framework, sometimes it's necessary to address lower-level Windows functionality, for example, using the NetSessionEnum API to remotely enumerate active sessions on machines in the local network.
So, when PowerShell cmdlets and .NET types don't provide the functionality necessary to solve the task at hand, it makes sense to tap directly into the mighty treasure trove of powerful Windows API functions - after all, everything is eventually built upon them.
There are three ways to do that. The first one is using the Add-Type cmdlet. Just as a .NET type can be added to the PowerShell session using Add-Type, the same cmdlet can also add an unmanaged piece of code (an API function) from one of the core Windows libraries.
To do that, the system uses P/Invoke functionality. P/Invoke is the standard way of calling unmanaged code from programs designed to be executed in the managed environment (CLR).
In the following example, we'll add the Windows API function ShowWindowAsync to the current PowerShell session.
Adding a Windows API function to the PowerShell environment requires calling the Add-Type cmdlet with the function's signature as a parameter. The function's signature is a description of the function that starts by specifying its location (for example, [DllImport("user32.dll")] says that the function is located in the user32.dll library), followed by the function's name and the list of its parameters. The parameter types are declared as C# equivalents of the respective native Windows types. An important thing to remember is that the C# argument types declared in the Windows API function's signature should be compatible with the respective C/C++ types.
In our example, the Add-Type cmdlet (called with the -PassThru parameter) returns a reference to the automatically created class that contains our Windows API function as a class method. If we assign this return value to a variable, we can then call our function as a method of the class represented by this variable. It's worth noting that methods of an added type can in general be static or instance methods, which determines how they can be invoked: to call an instance method, it's necessary first to create an instance of the class containing it. A P/Invoke declaration itself, however, has to be static, so an imported API function is called on the class.
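Here's a minimal sketch of the Add-Type approach described above. The class name Win32ShowWindowAsync and the namespace Win32Functions are names I chose for illustration. The actual call is guarded so it only runs on Windows, where user32.dll exists.

```powershell
# C# signature of the native function: where it lives, its name, its parameters.
$signature = @'
[DllImport("user32.dll")]
public static extern bool ShowWindowAsync(IntPtr hWnd, int nCmdShow);
'@

# -Name gives the generated class its name; -PassThru returns the type.
$win32 = Add-Type -MemberDefinition $signature -Name 'Win32ShowWindowAsync' `
    -Namespace 'Win32Functions' -PassThru

if ($env:OS -eq 'Windows_NT') {
    # Window handle of the current PowerShell process (zero if it has no window).
    $hwnd = (Get-Process -Id $PID).MainWindowHandle
    # 2 is SW_SHOWMINIMIZED: ask Windows to minimize the window.
    [void]$win32::ShowWindowAsync($hwnd, 2)
}
```

The double colon is PowerShell's syntax for static members, which fits here because a DllImport declaration is always static.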
The downside of adding Windows API functions through Add-Type is that dynamically added pieces of code are compiled at runtime. As a result, the compiler leaves artifacts on the hard drive, which in certain cases is considered unacceptable in terms of security. So, there are other ways.
The second way of adding a Windows API function to PowerShell is referring to a private type that calls this function. Here we need to remember that most of the .NET Framework functionality is based on the underlying Windows API functionality. So, naturally, there exist private .NET types that call the respective Windows API functions. The problem is that because these types are not publicly exposed, they cannot be called directly. The gist of the second method is in finding the private .NET type that calls the respective Windows API function. The private type that calls the necessary API function can be found using .NET reflection, as in the following example.
Here we traverse all the assemblies loaded within the AppDomain (in our context, the current PowerShell session), check all the modules in these assemblies, and find the private .NET type that calls the function we need, in our case MessageBox. In our example we use a System.Reflection.MethodInfo object as the output; after the method is found, we can call it simply by using the Invoke method of the MethodInfo object. The advantage is that, as opposed to using the Add-Type cmdlet, this method doesn't require compiling C# code on the fly, so the program doesn't leave any traces on disk.
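The search just described can be sketched roughly like this. Find-PInvokeMethod is a hypothetical helper name of mine, not a built-in command. Whether it finds anything for MessageBox depends on which assemblies happen to be loaded in the session, so the result may well be empty.

```powershell
function Find-PInvokeMethod {
    param([Parameter(Mandatory)][string]$Name)

    $flags = [Reflection.BindingFlags]'Public, NonPublic, Static'

    [AppDomain]::CurrentDomain.GetAssemblies() |
        ForEach-Object {
            # Some assemblies refuse to list all their types; skip those.
            try { $_.GetTypes() } catch { }
        } |
        ForEach-Object {
            try { $_.GetMethods($flags) } catch { }
        } |
        Where-Object {
            # Keep methods with the wanted name that are marked as P/Invoke imports.
            $_.Name -eq $Name -and
            ($_.Attributes -band [Reflection.MethodAttributes]::PinvokeImpl)
        }
}

$method = Find-PInvokeMethod -Name 'MessageBox' | Select-Object -First 1
if ($method) {
    '{0}::{1}' -f $method.DeclaringType.FullName, $method.Name
    # Inspect the wrapper's parameters before invoking; they may differ from the native ones.
    $method.GetParameters() | ForEach-Object { '{0} {1}' -f $_.ParameterType.Name, $_.Name }
    # A static method is then called with: $method.Invoke($null, @(...arguments...))
}
```

The scan walks every method of every loaded type, so it can take a few seconds.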
The third method uses the concept called Reflection. The key idea is that I can always check what classes and methods are available in the current environment and then, based on that information, generate code "on the fly." To add a Windows API function using this method, it's necessary to define a dynamic assembly within the current AppDomain (the PowerShell session), then define a dynamic module within the assembly, and finally, within that module, define a dynamic type containing a method whose implementation we import from the respective Windows API function. To do the latter we use the class Runtime.InteropServices.DllImportAttribute, specifying that the implementation of our method will be loaded from a certain core Windows DLL, such as "kernel32.dll", for example.
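A rough sketch of those steps, under a few assumptions of mine. The names DynamicWin32, DynamicWin32Module and Kernel32 are made up. GetCurrentProcessId is just a harmless kernel32.dll function with no parameters that I picked as the sample. I use the static AssemblyBuilder.DefineDynamicAssembly method instead of going through the AppDomain object because, as far as I know, it is available on both the .NET Framework and newer .NET runtimes.

```powershell
# 1. A dynamic assembly that lives only in memory.
$assemblyName = New-Object System.Reflection.AssemblyName('DynamicWin32')
$assemblyBuilder = [System.Reflection.Emit.AssemblyBuilder]::DefineDynamicAssembly(
    $assemblyName, [System.Reflection.Emit.AssemblyBuilderAccess]::Run)

# 2. A dynamic module inside the assembly.
$moduleBuilder = $assemblyBuilder.DefineDynamicModule('DynamicWin32Module')

# 3. A dynamic type inside the module.
$typeBuilder = $moduleBuilder.DefineType('Kernel32', 'Public, Class')

# 4. A static method with no body: uint GetCurrentProcessId().
$methodBuilder = $typeBuilder.DefineMethod(
    'GetCurrentProcessId',
    [System.Reflection.MethodAttributes]'Public, Static, PinvokeImpl',
    [uint32],
    [Type]::EmptyTypes)

# 5. Attach [DllImport("kernel32.dll")] so the body comes from the native DLL.
$ctor = [System.Runtime.InteropServices.DllImportAttribute].GetConstructor([Type[]]@([string]))
$attribute = New-Object System.Reflection.Emit.CustomAttributeBuilder($ctor, [object[]]@('kernel32.dll'))
$methodBuilder.SetCustomAttribute($attribute)

# 6. Bake the type; from here on it behaves like any other .NET class.
$kernel32 = $typeBuilder.CreateType()

if ($env:OS -eq 'Windows_NT') {
    # On Windows this should be the same number as $PID.
    $kernel32::GetCurrentProcessId()
}
```

No C# source is involved here, so nothing gets handed to a compiler; the type is built directly through Reflection.Emit.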
Final thoughts. Well, the Windows API is something that was there at the beginning, and many of us, now working with .NET, and C#, and whatnot, sometimes miss those unlimited options and possibilities that came with that low-level programming - using API functions, manually managing memory, creating unusual and unique program interfaces using GDI, and so on. So, from my point of view, sometimes it's just cool to have an option to tap into the raw power of core Windows functions, and the ability to use them in PowerShell is definitely a good thing.
OK, if I wrote an instruction on how to include Windows API functions in a PowerShell session - and make them operable, and callable, and all that - what would be the important points and nuances I would need to stress?
OK, step by step, starting with the first and simplest method, using the Add-Type cmdlet. The first thing to remember is that when I add something in the context of PowerShell, it has to be included in the .NET hierarchy - the one with the AppDomain on top, representing the current PowerShell session, inside which are assemblies containing modules; modules, in turn, include types and members. Types in .NET are the same thing as classes in OOP, and their members are methods, properties, fields, events, nested types - in other words, a combination of code and data, embodying the underlying theoretical principle of encapsulation.
Speaking of methods, they can be static or instance methods - a static method can be called directly without instantiating the class, as opposed to an instance method, which can only be called on a specific instance of the class - an object explicitly created, initialized, and placed somewhere in memory.
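A tiny sketch of that difference, using a throwaway C# class. DemoGreeter is a made-up name, and nothing here touches the Windows API.

```powershell
Add-Type -TypeDefinition @'
public class DemoGreeter
{
    private readonly string name;
    public DemoGreeter(string name) { this.name = name; }

    // Static: belongs to the class itself.
    public static string Hello() { return "Hello"; }

    // Instance: needs an object that holds the name.
    public string HelloTo() { return "Hello, " + name; }
}
'@

$staticResult = [DemoGreeter]::Hello()        # no instance needed

$greeter = New-Object DemoGreeter 'PowerShell'  # create an instance first
$instanceResult = $greeter.HelloTo()

$staticResult      # Hello
$instanceResult    # Hello, PowerShell
```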
Also, there are several nuances regarding Add-Type parameters. Some parameters, and their very presence, depend on how exactly the type is added. There are several modes. Some of them have to do with loading specific DLLs whose full path is provided in a respective parameter, and that's not what interests me here. Two modes deal with creating a new type and a new member respectively. What is remarkable is that when Add-Type is called in the mode of adding a new member (with the -MemberDefinition parameter), it can take an additional parameter specifying the namespace. This parameter is not used when the command is called to create a new type (using the -TypeDefinition parameter).
Probably, it has to do with the fact that whether you add a member or a type, this new unit has to be incorporated into the existing .NET object hierarchy. With a new member, an interesting question is what type it's going to be assigned to. (It's an especially relevant question, considering that it determines what data encapsulated in that type (class) will be accessible to this member (method).)
OK, several other points. When we deal with Windows API functions we use the -MemberDefinition parameter, i.e. we add this piece of code as a member of some class (type). The respective class (type) is dynamically generated, and the Add-Type command returns a reference to this new class, which allows us to call our added member. As a matter of fact, returning this reference to the created type is not the default behavior of the Add-Type command - we add the special -PassThru parameter to make it happen.
An interesting question here is what this automatically generated class is called. (At first I thought there was no parameter where the name could be specified, but I was mistaken: with -MemberDefinition, the -Name parameter is mandatory, and that's where the class name comes from.)
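A quick way to check the naming question is to look at the FullName of the returned type. In this sketch ThreadApiA, ThreadApiB and DemoWin32 are names I made up, and GetCurrentThreadId is just a sample kernel32.dll function that is declared but never called. As far as I can tell from the Add-Type documentation, when -Namespace is omitted the type lands in Microsoft.PowerShell.Commands.AddType.AutoGeneratedTypes.

```powershell
$signature = @'
[DllImport("kernel32.dll")]
public static extern uint GetCurrentThreadId();
'@

# Class name from -Name, namespace from -Namespace.
$withNamespace = Add-Type -MemberDefinition $signature -Name 'ThreadApiA' `
    -Namespace 'DemoWin32' -PassThru

# Class name from -Name, namespace left to the cmdlet's default.
$defaultNamespace = Add-Type -MemberDefinition $signature -Name 'ThreadApiB' -PassThru

$withNamespace.FullName       # DemoWin32.ThreadApiA
$defaultNamespace.FullName    # the default namespace followed by .ThreadApiB
```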
Another thing to keep in mind is that in most cases - or at least when we create new structures through Add-Type with the -TypeDefinition and -MemberDefinition parameters - we supply a piece of raw C# code (or, more generally, a piece of code written in the language defined by the -Language parameter; by default it's C#) that gets compiled on the fly. The result is a piece of code in the intermediate language (CIL) that is stored somewhere on the hard drive and then later compiled into actual machine code by the runtime environment (CLR).
Another point is that when we add Windows API functions this way we involve the so-called P/Invoke functionality - a set of mechanisms designed to organize the inclusion of pieces of unmanaged code in an otherwise managed C# program. The way I understand it, when we add an attribute like [DllImport("user32.dll")] we implicitly use the P/Invoke mechanism. One of the key problems it solves is marshaling the parameters. C# and C++ use fundamentally different approaches to reserving memory for parameters and variables, so when I need to pass parameters to an unmanaged Windows API function written in C/C++ I actually cannot pass them directly. Instead, I use some trickery called marshaling, which, as I understand it, bridges the gaps and inconsistencies between those different paradigms of memory management in managed and unmanaged (legacy, wild, unhinged, C++) code. It's even more complicated when I need to get some output, or I pass callback functions, or I pass some parameters by reference, hoping to receive the output from the function through them. But anyway, more on that later.
A couple of notes. When the piece of C# code I pass as the -MemberDefinition parameter is a declaration of a Windows API function stored in one of the core Windows DLLs, it's called a C# function signature. All the parameters in this function declaration are defined as C# types, so an important point here is to use C# types that are actually compatible with the respective C/C++ types used by the Windows API functions.
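To get a feel for marshaling, here's a small sketch with a function that writes text into a caller-supplied buffer. I picked GetSystemDirectory from kernel32.dll as the sample; SysDirApi and DemoWin32 are made-up names. On the C# side the buffer is declared as a StringBuilder, and the marshaler takes care of handing native code a character buffer and copying the result back.

```powershell
$signature = @'
[DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
public static extern uint GetSystemDirectory(System.Text.StringBuilder lpBuffer, uint uSize);
'@

$sysDirApi = Add-Type -MemberDefinition $signature -Name 'SysDirApi' `
    -Namespace 'DemoWin32' -PassThru

if ($env:OS -eq 'Windows_NT') {
    # Managed buffer with room for 260 characters.
    $buffer = New-Object System.Text.StringBuilder 260

    # The native function fills the buffer and returns the number of characters written.
    $length = $sysDirApi::GetSystemDirectory($buffer, [uint32]$buffer.Capacity)

    if ($length -gt 0) {
        $buffer.ToString()
    }
}
```

CharSet.Unicode tells the marshaler to use the wide-character version of the function and to treat the buffer as UTF-16 text.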
OK, the second method of adding Windows API functions to PowerShell is referring to the private .NET type that calls the respective function. As an example, here we have a piece of code that traverses all the assemblies loaded in the context of the current PowerShell session (AppDomain). Then it leafs through all the modules of each assembly, examines all the types in each module, and checks whether any of them has a method whose name matches the name of the Windows API function we want to use.
Several notes here. The loop is organized through the System.Reflection classes. Reflection is a collection of classes and methods in .NET that allow a programmer to check what is actually available within the AppDomain of the current application. (In the context of PowerShell it's what is loaded within the current PowerShell session - assemblies, modules, types, and their members.)
So Reflection can be used in two different ways. First, it allows us to examine the existing object structure, for example, to learn what methods (members) some class (type) has. Second, it can be used to actually create new elements of this structure - it allows us to create assemblies, modules, types, and members dynamically. This is what we actually do in the third example.
A couple of side notes. When in the second example we traverse the nested structures, we use pipes to channel the output of one ForEach-Object command into the input of the next command. Each following command deals with the elements nested within the elements of the current stream. For example, the list of assemblies returned by the first command is piped into the command extracting modules, and so on. Also, this code uses $_ as a reference to the current object, so when we apply ForEach-Object to a collection, $_ is automatically assigned each element of the collection in turn while we enumerate them.
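The piping pattern on its own, stripped down to two stages, looks something like this. It's only a sketch that lists a few module names; which names come out depends on what is loaded in the session.

```powershell
$moduleNames = [AppDomain]::CurrentDomain.GetAssemblies() |
    ForEach-Object {
        # In this stage $_ is one assembly; its modules go down the pipe.
        $_.GetModules()
    } |
    ForEach-Object {
        # In this stage $_ is one module; only its name goes further.
        $_.Name
    } |
    Select-Object -First 5

$moduleNames
```

The same $_ variable means a different thing in each stage, because each ForEach-Object receives whatever the previous stage emitted.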
Some things remain somewhat vague. For example, when we add a Windows API function as a new member (class method) via -MemberDefinition, the runtime environment automatically creates a type (class) and attaches this newly created member to it. But what about the module and the assembly? How do they get chosen, and is this process somehow controlled or not? Also, the namespaces in C# - how does this concept coexist with the whole structural hierarchy? In other words, what exactly are the namespaces here for? We have assemblies, and if there are functions with the same names in different assemblies they won't be mixed up, I guess. Also, the namespace parameter isn't present when I use the Add-Type command with the -TypeDefinition parameter, which makes the situation even more mysterious.