VisiData - a tool for data visualization (spreadsheets, CSV files, and more) in... the terminal (movie, or even an entire playlist)
In this lightning talk, Saul Pwanson introduces VisiData, a tool he spent the past year developing during his sabbatical. It is written in Python 3, open source, and available on PyPI. Fed up with chaining grep, cut, awk, sort, and uniq over CSV files, he built a terminal tool that loads many file formats, including Excel, CSV, and JSON, and can be extended with Python 3. With asynchronous loading and built-in data analysis, he describes VisiData as a 'Swiss Army chainsaw for data'.
In the demonstration, Saul loads a dataset covering two months of 311 service complaints from New York. The rows load asynchronously, with progress visible in the corner of the screen. He hides columns with the minus key, splits the created-date column to keep only the date, and runs a frequency analysis on that date with Shift-F.
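The split-then-tally step can be approximated outside VisiData in plain Python. This is only a sketch of the idea, not how VisiData implements it: the sample rows, the "Created Date" and "Complaint Type" column names, the timestamp format, and the helpers date_part and frequency are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample rows; the real dataset has many more rows and columns.
rows = [
    {"Created Date": "03/01/2017 09:15:00 AM", "Complaint Type": "Rodent"},
    {"Created Date": "03/01/2017 11:40:12 PM", "Complaint Type": "Noise"},
    {"Created Date": "03/02/2017 07:05:45 AM", "Complaint Type": "Rodent"},
]


def date_part(stamp):
    """Keep only the date, dropping everything after the first space."""
    return stamp.split(" ", 1)[0]


def frequency(rows, key):
    """Count how many rows share each value produced by key(row)."""
    return Counter(key(row) for row in rows)


date_counts = frequency(rows, lambda row: date_part(row["Created Date"]))

# Print a crude text histogram, one line per date, sorted by date string.
for date, count in sorted(date_counts.items()):
    print(f"{date}  {count:3d}  {'*' * count}")
```

Sorting by the date string is good enough here only because all sample dates fall in the same month; a real version would parse the dates first.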
Next, Saul sorts the frequency table by date, and the simple histogram reveals a seven-day cycle. To look into it, he types the column as a date and uses it in a Python expression to add a weekday column. The cycle lines up with days 5 and 6, which are Saturday and Sunday. All of this happens in the terminal.
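The weekday numbering matches Python's own datetime convention, where Monday is 0 and Sunday is 6. A minimal sketch using only the standard library follows; the date format and the sample dates are assumptions, and weekday_of is a hypothetical helper, not a VisiData function.

```python
from collections import Counter
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # assumed format of the split-off date column

# Hypothetical sample dates.
dates = ["03/03/2017", "03/04/2017", "03/05/2017", "03/04/2017"]


def weekday_of(text, fmt=DATE_FORMAT):
    """Return the weekday number of a date string (Monday=0 ... Sunday=6)."""
    return datetime.strptime(text, fmt).weekday()


# Tally rows per weekday, the same idea as a frequency table on the new column.
weekday_counts = Counter(weekday_of(d) for d in dates)

for day in sorted(weekday_counts):
    print(day, weekday_counts[day])
```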
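Later in the presentation, Saul turns to visualization. He types the latitude and longitude columns as floats, treats them as x and y coordinates, marks the borough as a key column to get color, and plots the complaints, which draws a recognizable map of New York in the terminal. After a frequency analysis on the complaint type, he plots only the rodent complaints, toggles the rat and mouse sighting layers with the number keys, zooms in on Manhattan with the mouse, selects a region, and saves those rows to a CSV file. A command log (Shift-D) records every step and can be saved and replayed later.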
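Treating longitude as x and latitude as y is enough to draw a rough map on a character grid. The sketch below shows that idea only; it is not VisiData's plotting code. The coordinates are made up, and text_scatter is a hypothetical helper that uses the first letter of the borough as the marker in place of color.

```python
# Made-up (longitude, latitude, borough) points, for illustration only.
points = [
    (-74.00, 40.70, "MANHATTAN"),
    (-73.90, 40.80, "BRONX"),
    (-73.95, 40.75, "QUEENS"),
]


def text_scatter(points, width=20, height=10):
    """Render (x, y, label) points on a character grid, one marker per point."""
    xs = [x for x, _, _ in points]
    ys = [y for _, y, _ in points]
    x_min, x_span = min(xs), (max(xs) - min(xs)) or 1.0
    y_max, y_span = max(ys), (max(ys) - min(ys)) or 1.0
    grid = [[" "] * width for _ in range(height)]
    for x, y, label in points:
        col = int((x - x_min) / x_span * (width - 1))
        row = int((y_max - y) / y_span * (height - 1))  # larger latitude is higher up
        grid[row][col] = label[0]
    return "\n".join("".join(line) for line in grid)


chart = text_scatter(points)
print(chart)
```

If two points land in the same cell, the later one overwrites the earlier one, which is acceptable for a quick look at the overall shape.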
Finally, Saul announces that VisiData 1.0 is due for release the week after the conference, and he offers hands-on demos to anyone interested after the lightning talks. At the time of writing, the video has received 37,116 views and 845 likes. It's worth following this channel to stay updated with technological trends and tool updates.
Timeline summary
- Introduction and greeting.
- Inquiring about audience experience with tabular data.
- Asking if anyone prefers using the terminal.
- Sharing frustration with traditional command-line tools.
- Introducing VisiData, a new data tool.
- Describing VisiData's features and supported formats.
- Demonstrating loading a dataset of 311 service complaints.
- Explaining how to manipulate data, such as hiding columns.
- Performing frequency analysis on data.
- Interpreting results from histogram data.
- Using Python to derive the weekday from date.
- Visualizing data points on a map.
- Analyzing complaint types, particularly rodents.
- Showing how to zoom in and filter data on the map.
- Explaining the workflow of saving data as CSV.
- Discussing extensibility and custom functions in VisiData.
- Announcing the upcoming release of version 1.0.
- Inviting audience for further demonstrations and assistance.
- Closing the talk and thanking the audience.
Transcription
You're good to go. Hello. So has anyone ever had to use any kind of tabular data? By a show of hands. OK, a couple people. Does anyone prefer to use the terminal? And has anyone ever used grep, cut, awk, sort, uniq with a CSV file? OK, right, exactly. Well, I did that way too much, and I got kind of fed up with it. So I made a tool called VisiData, which I've been working on for the past year on my sabbatical. It's written in Python 3. It's open source. It's in PyPI now. We're working on getting it through an app. It supports a whole lot of formats, Excel files, and CSV files, and JSON. It's extensible via Python 3. And it's kind of like a Swiss Army chainsaw for data. And this is a lightning demo.
So I have a data set of two months of 311 service complaints from New York. And so it's loading it now. You can see it's loading it asynchronously down there in the corner. You can do all kinds of fun stuff. For instance, you can hide columns with a minus key. And if you wanted to, for instance, we have a created date. Let me split this based on, it's got a date and time. I just want to see the date. So I just split that. And I've got another column here for just the date. If I wanted to do a frequency analysis on that date, for instance, I would just press Shift-F, and here we go. And it tallies it asynchronously. And then if I hide these columns, you can actually make the sky a little bigger here. And you can see the stuff. If we sort by the date, then you can see basic patterns with this simple histogram. And so for some reason, there's a cycle of seven days. I don't know. Let's see here.
So if I type this date column as a date, I'm going to rename it to create date. And then I can use that as an expression, or in an expression, to make another column. So let's say I just wanted to get the weekday. This is just Python here. And so you can see the weekday. And that was just the date time we just converted to weekday there. And so you can see, OK, right.
It's day five and six, which happens to be Saturday and Sunday. That makes sense. So if I quit out of this then, and I want to do some simple visualizations of this, let's say, there's a bunch of different fields in here, like the latitude and longitude. So let me make these just floats here, because it turns out that latitude and longitude, even though they're on a sphere, you think of them as xy-coordinates. So I'm going to make the longitude be an x-coordinate. And actually, I'm going to go get the borough here. I'll make that be a key column, so you can get a little bit of color. And I'm just going to plot that with dot. This is all in the terminal still, by the way. And so I'm not sure if you guys have ever seen a map of New York. These are the complaints. Thank you. If we wanted to see all the things we had done to get to this point, we could press Shift-D to get to a command log. And this is all the things that we did in order. If we wanted to, I could actually save this off and replay it later. That took 20 seconds earlier, so I'm not going to waste that time right now. But if we wanted to do some more fun stuff with this, let's say this is a simple map of just the complaints. Let's do some more things here. I'm going to keep the borough out of there. And oh, here we go, complaint type. So I'll do a frequency analysis on that. And after that's done here, let's find something interesting. Oh, it's rodents. That's always fun. So I pressed Enter just to go to all the rodent complaint types. Let's see whether rat sightings or mouse sightings or whatever is the exact same plot here. And so here's the map of those. And if I want to, I can turn those layers on and off with the numbers here so you can see only the mouse sightings or only the rat sightings, et cetera. Bounce back and forth between them if you want to, et cetera. And just because I think it's fun, you can use the mouse to zoom in, for instance, on Manhattan. 
And then you can select a region and just press Enter to go to just those rows for that. And then, thank you. And then you can save those off as a CSV file. And that's the very simple workflow with VisiData. Like I said, it's extensible with Python. You can do all kinds of expressions. You can create a .visidatarc with custom functions in there. I have a gender function in there. So anybody's first name, I can just make a column with their supposed gender based on their first name. It's just a great tool if you're ever doing data and just want to do something really quickly before you even know what's even in there. So we're working on releasing 1.0, actually, right after this conference next week. And so if you're interested in this and using this, I would love to give you either a more hands-on demo or show how we could help you. So please hit me up after the lightning talks. Anyway, thank you very much. Thank you.
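Editor's sketch: the last steps of the demo, narrowing to one complaint type, selecting a region, and saving those rows as CSV, might look roughly like this in plain Python. It illustrates the workflow only and is not VisiData code. The column names, sample rows, bounding box, and the helpers in_box and to_csv are all hypothetical.

```python
import csv
import io

FIELDS = ["Complaint Type", "Descriptor", "Latitude", "Longitude", "Borough"]

# Hypothetical sample rows; values are strings, as they would be after reading a CSV.
rows = [
    dict(zip(FIELDS, ["Rodent", "Rat Sighting", "40.7580", "-73.9855", "MANHATTAN"])),
    dict(zip(FIELDS, ["Rodent", "Mouse Sighting", "40.6782", "-73.9442", "BROOKLYN"])),
    dict(zip(FIELDS, ["Noise", "Loud Music", "40.7128", "-74.0060", "MANHATTAN"])),
]


def in_box(row, lat_min, lat_max, lon_min, lon_max):
    """True if the row's coordinates fall inside the bounding box."""
    lat, lon = float(row["Latitude"]), float(row["Longitude"])
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


def to_csv(rows, fields=FIELDS):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Step 1: keep one complaint type. Step 2: keep rows inside a rough, made-up box.
rodents = [r for r in rows if r["Complaint Type"] == "Rodent"]
selected = [r for r in rodents if in_box(r, 40.70, 40.80, -74.02, -73.93)]

csv_text = to_csv(selected)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; passing an open file to csv.DictWriter instead would save the rows to disk.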