Użycie Pythona w Excelu - praktyczny przykład (film, 19 minut)
Leila Gharani w swoim najnowszym wideo omawia ekscytujące nowe możliwości Pythona w Excelu, które są dostępne w wersji Office 365. To przełomowe wprowadzenie z pewnością przyciągnie uwagę wielu użytkowników Excela, nawet tych, którzy nie są programistami. Leila podkreśla, że umiejętność korzystania z tego narzędzia może znacząco ułatwić codzienne zadania związane z analizą danych. W najnowszych aktualizacjach Excela pojawiła się nowa sekcja w zakładce formuły, w której użytkownicy mogą wstawiać własne formuły Python oraz eksplorować predefiniowane przykłady. Odpowiedzi na często zadawane pytania dostarczają konkretne przykłady, jak efektywnie pracować z danymi za pomocą tego języka programowania.
W pierwszym przykładzie Leila pokazuje, jak można załadować dane do Pythona, aby uzyskać zestawienie danych w formie tzw. DataFrame. W tym kontekście wyjaśnia, jak używać skrótów klawiszowych, takich jak Control+Enter, aby uruchomić skrypty. Dzięki tym nowym możliwościom, użytkownicy Excela mogą uzyskiwać wgląd w dane, korzystając z funkcji takich jak "describe", co daje szybki dostęp do statystyki danych. Zobaczymy, jak prosto można podsumowywać dane według kolumn wchodząc w głąb możliwości analizy danych z Pythona.
Kolejnym krokiem jest wyjaśnienie, jak grupować dane, by uzyskać sumy sprzedaży na podstawie daty. Leila pokazuje, jak zrealizować to zadanie z użyciem funkcji "groupby" oraz jak dynamicznie zmieniać zakresy obliczeń. Możliwość edytowania zaledwie jednego wpisu w tabeli sprawia, że cała analiza natychmiast się aktualizuje, co znacznie zwiększa efektywność pracy. Leila używa również wykresów, aby wizualizować dane bezpośrednio w Excelu, co przyciąga uwagę wszystkich, którzy pracują z wizualizacją danych.
W drugiej części wideo, Leila przedstawia, jak można korzystać z bibliotek Python, takich jak pandas, by zrealizować bardziej złożone operacje na danych, w tym wyciąganie URL z tekstu. Prezentuje to jako przykład, jak dynamicznie można przetwarzać dane, z wykorzystaniem wyrażeń regularnych. Tu Leila wskazuje na elastyczność Pythona i jego zdolność do integracji z Excellem, co poszerza jego możliwości jako narzędzia do analizy.
Na koniec wideo Leila podsumowuje wszystkie nowości i zachęca widzów do wyrażenia swojej opinii w komentarzach. Warto również zwrócić uwagę na statystyki tego wideo, które na dzień pisania artykułu osiągnęło ponad 1,8 miliona wyświetleń oraz prawie 38 tysięcy polubień, co świadczy o dużym zainteresowaniu i entuzjazmie społeczności wobec nowego narzędzia, jakim jest Python w Excelu. Jest to z pewnością początek nowej ery w analityce danych przy użyciu Excela i Pythona.
Toggle timeline summary
-
Wprowadzenie i ekscytacja związana z Pythonem dla Excela.
-
Wprowadzenie do integracji Pythona w Excelu.
-
Omówienie aktualizacji Excela, które poprawiają funkcjonalność.
-
Rozwiązywanie obaw dotyczących umiejętności programowania.
-
Zachęta do oglądania całej demonstracji.
-
Lokalizacja Pythona w interfejsie Excela.
-
Wchodzenie w tryb Pythona i tworzenie niestandardowych formuł.
-
Wysyłanie zakresów danych z Excela do Pythona.
-
Praca z ramkami danych w Pythonie.
-
Zrozumienie ramek danych i ich struktury.
-
Wprowadzenie do biblioteki pandas do analizy danych.
-
Używanie metod Pythona do uzyskiwania spostrzeżeń z danych.
-
Opisywanie szczegółów statystycznych danych dotyczących produktów.
-
Zgłębianie spostrzeżeń dotyczących konkretnych kolumn danych produktów.
-
Tworzenie nazwanych odniesień dla ramek danych.
-
Używanie pandas do podstawowych funkcji Excela takich jak suma.
-
Grupowanie i agregowanie danych sprzedaży według dat.
-
Zmiana częstotliwości agregacji danych z dziennej na miesięczną.
-
Dynamiczne aktualizowanie obliczeń w Excelu.
-
Tworzenie wizualizacji bezpośrednio z danych.
-
Omówienie logiki obliczeń w komórkach Pythona.
-
Reorganizacja zestawów danych za pomocą funkcji melt w pandas.
-
Używanie funkcji zapytania do skutecznego filtrowania danych.
-
Ekstrakcja adresów URL z tekstu komórek, zawierających różne formaty.
-
Łączenie Pythona z Zapytaniami Power dla dynamicznej analizy danych.
-
Podsumowanie i zaproszenie do opinii od widowni.
Transcription
Oh my God! Okay, it's happening. Everybody stay calm. Everybody stay calm. So Python for Excel is here and it's ready to test in the better channel of Office 365. Excel has been getting a lot of updates, but this, this is going to put it in a different league. So you might be thinking, that's not for me. I'm not a programmer. I'm not a geek. I just do my daily Excel stuff. I don't need this. Wait until you see what I'm going to show you. And I recommend that you watch the whole thing and take some time to digest it because I didn't want to give you a stroke. So I sprinkled all the cool stuff throughout the video. So sit back and let's start from scratch. Okay, so when you open up Excel and you have Python, you are going to find it in the formulas tab. There is a Python section here. You can get started by inserting your own custom Python formula or exploring Python samples and then start by taking this tour. But we are going to get started together right now and I'll show you how this works. The first thing we need to do is get into Python mode. One way we can get there is by going to insert Python and then inserting a custom Python formula. I keep your eye on the formula bar and see what happens when I select that. The box changes into a Python box. Now, another way you can get in Python mode is by just typing in equals py. This function creates Python formulas. The moment I press tab, I'm already in Python mode. What happens now? Well, let's say as our first example, we want to get some insights from this dataset. If you're working with data, the first step you need to do is to send your data to Python because it needs to be able to see it to work with it. The way we can do that here is by just referencing the cell. I'm just going to select it, use the keyboard shortcuts, Control Shift and the down arrow key to select the whole range. And notice what we get in the formula bar. We have our data range, headers equals true. That was automatically picked up and everything is wrapped inside an Excel function. Now, when I press enter, nothing happens. What enter does is it adds new lines to my Python script because Python scripts could get long. So it makes it easier to work with these scripts. So for me to be able to run this, instead of enter, I need to press Control Enter and I get what is called a data frame. If you know Python, you know what this is. If not, it's basically a condensed version of your table. It's a two-dimensional data structure that's all sitting in a cell. Now, take a look at this. Here we have a dropdown. What is being returned here is a Python object that's our data frame, but I can switch my view and return Excel values. When I do that, I get my data set back, but this takes up a lot of space. So I'm going to switch it to a Python object and get my data frame. Now you can think of this data frame as an Ikea table, right? When you buy a new table from Ikea, it's all condensed in a single package. If you want to get a glimpse of what's inside, you can click on this card and you get to see the beginning rows of your data set and the ending rows. This data frame is a fundamental structure of a specific Python library that's called pandas. The pandas Python library is great for data analysts and you need to know the name. I know it sounds weird, but you need to know it if you want to look up other functionalities in Python's library. But I'm going to explain more about libraries and pandas later. What I want to show you right now is some of the cool things you can do here. So for example, I'm going to go to another cell, go to Python mode by typing in equals py. I'm going to reference my data frame and then use the describe methods. I'm just going to type in dot describe. So I'm in Python mode now, not Excel mode. I'm going to open and close bracket, press Control Enter to run this, and I get a data frame back, but I want to see the Excel values. So I'm going to switch my view to Excel values and I get to see some information about my data set. So I have the quantity and sales columns here. This is the count of each column, the mean standard deviation, and so on. Now, if you take a look at our data set, we have date, product, quantity, and sales. Now let's say I wanted to describe the product column. I'm just going to go and adjust my function. So right after I reference the data frame, which is the data set here, I'm going to put in my column products. The way you reference columns is by opening up the square brackets, similar to referencing Excel tables, and then typing in, in quotation mark, the name of the column headers, in this case, product, and I'm going to close the square bracket, press Control Enter, and I have some information about products. So I have 96 products, 15 are unique. The top is laptop bag red because that occurs 15 times. It's a quick way of getting insights from your data. Now, if you're planning on using the data that you uploaded to Python and doing different types of calculations, it helps to give it a name. This way, you don't always have to reference the cell. The way you do that is by going to your first data frame and then just typing in any name, as long as it doesn't have spaces in it. So I'm going to call it df for data frame, equals this data set. So when I run this, right now, nothing changes. All I've done is just give this a name, but when it comes to do other types of calculations based on this data frame, I no longer need to reference it by referencing the cell. Instead, I can just type in df and that's it. Now talking about shortcuts, another shortcut that I like is to avoid these square brackets. So whenever I can avoid them, I will. And the great thing with the pandas library is that it transforms your headers into attributes. So as long as you don't have spaces in the headers, you can actually type them out like this. I can mention the data frame dot the header dot describe. And when I run this, I get the same thing. But if they happen to have spaces, you need to use the square brackets. Now there are lots of functionalities in the pandas library. So some simple ones that you're familiar with in Excel is for example, the sum or the average function. So let's say I wanted to use Python and sum the sales column. I'm going to go to Python mode, but this time using my shortcut keys, Control-Alt-Shift-P. Now this sounds like a complicated shortcut, but it's actually easy to reach for on the keyboard. Then I'm going to mention the data frames. I gave it a name, it's df dot. I want to sum the sales columns. I'm going to go with dot sales, doesn't have any spaces. I can reference it in this way, dot sum method, run this, and I get the total of my sales column. So let's just double check this, that's 17, four, five, 10. So if I sum this, 17, four, five, 10. Now, if I want to get the average, I'm going to use mean here and that's the average. So all of this you can do with simple Excel formulas, right? Why do we need Python? So let's take this a step further. Let's say I want to get my total sales values for each date. Notice my dates are repeated, I want to aggregate them for each date, similar to what you would do in a pivot table. I'm going to go to Python mode, then reference my data frame, which was df dot group by, I want to group everything by the date column, and I want to have the total values from sales. I'm going to go with dot sales, dot sum, and I'm going to press Control Enter, and I'm going to get my series returned as a Python object. But if I want everything as Excel values, I'm just going to spill them, and I have the total sales for each date. And if you don't believe me, let's just double check this. So this is for the 1st of June, and I have these four numbers here. That's 9680, that's 9680. Now, what if I changed my mind and I want everything on a monthly basis? Well, check this out. I'm going to go inside group by, and I'm going to use a class called pd dot grouper. My key equals the date column, and my frequency equals, I'm going to put m here and close the bracket, run this, and I get my sales on a monthly basis. So total sales for June should be 38060. Let's just double check this, and we have 38060. Okay, you might say, well, a pivot table can do this. The great thing about this is that these are formulas. So if I change this value to, let's say, 6000, so keep an eye on the June value here, and I press enter, all of this refresh immediately. I'm just going to press Control Z to go back. Something to keep in mind is that you always get the last expression returned. So you could be writing a lot of code, but whatever you end up seeing in the cell is based on the last expression. Let me show you something even cooler. I'm going to give this a name. I'm going to call this for chart, and we are going to plot a chart in a cell. So now I've called this series for chart, and I want to plot this series. I'm going to do for chart dot plot. I want to plot a line chart. My x-axis is going to have my dates, which are in the date column. My y-axis is going to have my sales values, and the kind of chart I want to plot is a line chart. I'm going to run this, and I get a tiny little line chart in a cell. You probably really can't see this. So let me go and merge the cells, and we get to see our monthly sales values. How cool is that? If I wanted an area chart, I'm going to just change this to area, and I have an area chart. If I wanted frequency to be on a weekly basis, I'm going to change this to W, and I get my weekly sales data shown as an area chart. I prefer a line chart, so I'm going to change that. So notice that the last expression is the one that plots the chart for me, and that's what's returned. Now, if you're wondering how I'm coming up with this syntax, how do I know these classes and the methods? Well, there is great documentation on this in the Pandas library. You have lots of information here on this webpage to help you get started, and lots of code to play around with. Plus, you've got AI. There is Bing Chat and ChatGBT that can help you when you get stuck. Just double-check your results to make sure it's what you want. Now, before I move on, and I show you some even more cooler stuff, there is some important thing to keep in mind, and that's the calculation logic of Python cells. Python cells calculate from left to right and top to bottom. So notice here. First, I've created the data frame, and here I'm referencing the data frame. I'm on the right side. Here, I'm below it. But if I copy and paste this here, I'm going to end up with hash value. I get diagnostics pop-up here, and it tells me namedf is not defined. That's because Python cells calculate from left to right and top to bottom. I would have to bring my data frame here for everything to calculate fine. So I'm just going to demonstrate that. I'm going to Control-X and Control-V this, and now everything calculates fine. Now, let me show you some cool examples from the default libraries that we have. So here, I'm using pd.melt function to restructure this dataset. So I have quantity and cells in two separate columns. When I use pd.melt, I can bring them in the same column. So this is similar to unpivot in Power Query. And look at the formula bar here. It's very simple to write this. I just need to mention my non-value columns, which are date and product, and my value columns, which are quantity and cells. And it does the whole restructuring for me. Now, another cool one is query. Take a look at this. I'm going to go in Python mode, reference my dataset, which is df, right? That's my data frame that I uploaded to Python, and I'm going to use query. Then I'm going to put this in. What this does is it takes a look at this sales column. It grabs everything that's above 2,000 and takes a look at my product column to see if anything contains the word black. So this is similar to the query function in Google Sheets, which is also similar to the syntax from SQL. When I run this, by default, I get a data frame back, but I'm going to show the Excel values, and I get everything that has the word black and sales is above 2,000. Now let's move on to something else. We can create charts like this. I have small multiples returned as a single object. Take a look at this one. This is a bit more complicated, but I also wanted to show you that you can write longer scripts and get Python to do what you want. So here I just have dates and students from 2018 to 2022, and what I wanted to do here was this. I wanted to return the last four years, and I wanted to arrange the charts beside one another. I also wanted to make sure that the y-axis is showing the same values, right? So that I can easily compare each of these with one another. Now I've even added labels to this, and you can do all of that in Python here. You can set the title, set the label, and so on. And yeah, I didn't do all of this on my own. I didn't know I had to flatten the axis, so I asked ChatGPT for help, and it helped me get what I wanted right here. Now this is my favorite part of the video. What we're going to do is this. We have an Excel table, and we have some feedback in each row. What we want to do is grab any URLs that we have inside each row. The tricky part is that they might be in the middle of the text, and the pattern isn't so obvious. So sometimes they start with www. Sometimes we don't even have that. They just end with .com. Sometimes we have HTTPS, and some might just have HTTP. Some might not even have a link. What we want to do is just grab any URLs in there. So here you can see we're grabbing both of these in the cell, and then we're omitting this one because it didn't have any links. Then we have this and this. How can we do this? The great news is you can import any Python libraries or modules that might not be there by default. So the ones we used are there by default, the pandas library for data analysis, matplotlib for data visualization, but Python has so many different libraries and modules. There are ones for AI, for web, for regular expressions. So this RE that I'm importing here works with regular expressions. All I have to do is go to Python mode and import the library or module, and I've returned RE loaded. That's just because I want to see something in the cell aside from none, which is what I would see. If I don't have this, everything still works, but I'm going to see none or zero in the cell, right? So I'm just going to press Control Z to go back. Now, after I imported the library, what I did is create a data frame based on the table. And the way you do that is similar to before, except you just have to make sure that you have the right referencing for the table, that you are including everything. So I have hash all to make sure that I'm also including the header. Now, when I hover over the data frame here and click on the card, I can see that the column header is a part of my data frame. Now, the next step is to write some Python code to get this done. What is done here is that I've combined the patterns using this. This is based on the patterns that I can see in my dataset. This makes sure that anything that ends with .com is included, that anything that has HTTPS, HTTP, or www is also included. Then there's a function here that grabs the URLs. And this function is run on each single cell. So on each single text that I have in my data frame. Now, you can also see the syntax here. Wherever you come across the hash sign, this means that this is a comment. This is a good way of also testing your code. You can comment things out to see what is the last expression that's returned, or just add in some documentation to your code. Now, at the end here, we're just filtering out any rows that don't have any URLs. Now, the great thing is because it's referencing the table, it is dynamic. So if I go in and add another URL, so let's say youtube.com, this is going to be automatically added to my URL list here. Now, if you're wondering whether you can connect Python to your Power Queries, yes, you can. You don't need to have the data physically loaded in your workbook. It can be there via connection only, and you can create a data frame if you want based on these and use Python code on them. Now, this video is getting super long, so I'm going to leave this topic for a future video. As you can see, I'm super excited about this. It just opens up a whole new dimension in Excel. What do you think about it? Let me know in the comments. Thank you for being here, thank you for watching, and I'll catch you in the next video.