When you are dealing with giant datasets, Python is your best friend. I was tasked with finding the unique records in a column of a ten-million-record dataset. For smaller datasets, it's really easy to do this in Excel:
(1) Select the column, go to the Data tab, and click Remove Duplicates
It was an eventful day when I got a request to review a 10-million-row CSV. I tried to use Excel on my Mac to open this giant file, but Excel couldn't even open it completely:
So what do I do? Like every sane person would, I decided to use my Python skills to split the file into chunks and save the manual effort. Then it flashed across my mind: I am just lazy.
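One streaming sketch of the idea (the file name and column name here are my own placeholders, not from the original task): rather than loading ten million rows at once, read the CSV row by row and keep only the distinct values in memory.

```python
import csv

def unique_values(path, column):
    """Stream a huge CSV and collect the unique values in one column.

    Reads the file one row at a time, so memory only ever holds the
    set of distinct values, never the full multi-million-row dataset.
    """
    seen = set()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            seen.add(row[column])
    return seen

# Hypothetical usage:
# uniques = unique_values("big_dataset.csv", "email")
# print(len(uniques))
```

This is the same operation as Excel's Remove Duplicates, just without the row limit.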
As Python developers, we often deal with a lot of old Python 2.x code that we would like to convert to Python 3. Python provides a nice command-line utility for this out of the box, called 2to3. You can read more about it here.
But let us say we want PyCharm to do that for us. All we need is a little configuration using PyCharm’s External Tools.
Here are the steps:
(1) The first step is to find where the 2to3 utility is located on your system. To find that, we can use the ‘which’ command as follows:
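If you'd rather locate the tool from Python itself, the shell's `which` has a direct standard-library analogue (a small sketch; the path printed will differ per machine, and 2to3 may not be on your PATH at all):

```python
import shutil

# Equivalent of running `which 2to3` in a terminal: returns the full
# path of the first matching executable on PATH, or None if absent.
path = shutil.which("2to3")
print(path)
```

Whatever path this (or `which`) reports is what you paste into the Program field of the PyCharm External Tool.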
As a Python developer working on shared codebases, it's always beneficial to generate a requirements.txt file for your projects, so that the dependencies can be managed better.
How do we do this without spending too much time?
An easy way to generate requirements.txt is from pip itself, like this:
pip freeze > requirements.txt
pip3 freeze > requirements.txt
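If you want the same thing from inside Python (say, as part of a build script), you can invoke pip through the current interpreter. This is my own sketch, not from the original post, and note it overwrites any requirements.txt in the current directory:

```python
import subprocess
import sys

# Run pip's freeze through the interpreter that owns the environment;
# this avoids accidentally picking up a different pip from the PATH.
result = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True, check=True,
)

with open("requirements.txt", "w") as f:
    f.write(result.stdout)
```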
But is there an even easier way to do it and list the dependencies of each Python file? Pigar to the rescue.
We can install Pigar by running:
pip install pigar
There are times in your life when you think:
‘Things happen for no reason at all’
It all started on a precarious day when I wanted to get things done really quickly. Even with the entire country shut down amid a pandemic, there was optimism flowing through my veins to get my daily work done. I am not a religious person, but I do get a sense of déjà vu when the universe sends me a sign.
I turned on my laptop, authenticated my Mac with the tip of my finger — and then it began — when I opened…
As software engineers, we might run into the use case of running a batch job to fetch data from a resource via an API, with a bulk dataset, from time to time. The traditional approach would be to write a simple shell script or a quick Python requests script that reads data from a CSV, frames the API request, and calls the API repeatedly in a loop.
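That traditional loop can be sketched with nothing but the standard library. The CSV layout, endpoint URL, and payload fields below are all hypothetical, stand-ins for whatever your own API expects:

```python
import csv
import json
import urllib.request

API_URL = "https://api.example.com/users"  # hypothetical endpoint

def frame_request(row):
    """Build one API request body from one CSV row."""
    return {"name": row["name"], "email": row["email"]}

def run_batch(csv_path):
    """Read the CSV and POST one request per row: the loop approach."""
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            payload = json.dumps(frame_request(row)).encode()
            req = urllib.request.Request(
                API_URL,
                data=payload,
                headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                print(row["name"], resp.status)

# run_batch("users.csv")  # hypothetical input file
```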
Is there a way to simplify this even more? The answer is ‘Yes’.
Postman has this tool called Collection Runner where you can repeat a particular collection of requests ’n’ number…
A while ago, I used a fancy reporting plugin for my tests, and it looked great on my local machine. I was so happy seeing it that I executed my tests on a Jenkins machine. With full positivity, I clicked on HTML Reports to view the results on the machine, and was shell-shocked to see all that beautiful CSS stripped away:
So that happened, and I dug into the Jenkins documentation and found the culprit — the default Content Security Policy.
The default rule is set to:
sandbox; default-src 'none'; img-src 'self'; style-src 'self';
This ruleset results in…
A few days ago, I was trying to test the site speed of a certain page and needed an easier way to clear the cache and reload the page instead of making multiple clicks. It turns out there is a built-in option in Google Chrome to do this:
Distinguished Automation Test Engineer, Father and the Lazy