#import libraries
import numpy as np
import pandas as pd
import polars as pl
import duckdb as db
import datetime as datetime
import json
import plotly.express as px
import folium
from folium.plugins import Fullscreen
Issues and Figures, 2019-2024
Jesus LM
Jun, 2025
Crime in Mexico City presents a complex and evolving challenge. The city, a sprawling metropolis, grapples with a range of criminal activities, from petty theft and street-level drug offenses to organized crime and violent acts. Factors contributing to this multifaceted issue include socioeconomic disparities, corruption, and the influence of transnational criminal organizations.
It’s important to recognize that crime trends are dynamic and influenced by various factors. Continued efforts are necessary to improve public safety and address the root causes of crime in Mexico City.
While efforts have been made to improve security through increased policing and community-based initiatives, persistent issues such as impunity and a lack of trust in law enforcement continue to hinder progress.
Addressing crime in Mexico City requires a multifaceted approach that includes improving data collection and analysis, strengthening law enforcement, and addressing the underlying socioeconomic factors that contribute to crime.
Analyzing crime trends in Mexico City requires considering both official statistics and the lived experiences of residents, as well as the interplay of local, national, and international forces.
Ongoing research and policy development are crucial to addressing the root causes of crime and fostering a safer environment for all.
DuckDB is a powerful tool for data analysts and developers who need to run fast, efficient analytical queries on large datasets, especially in environments where simplicity and portability are crucial.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1415763 entries, 0 to 1415762
Data columns (total 22 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 anio_inicio 1415748 non-null float64
1 mes_inicio 1415748 non-null object
2 fecha_inicio 1415748 non-null object
3 hora_inicio 1415748 non-null object
4 anio_hecho 1415343 non-null float64
5 mes_hecho 1415343 non-null object
6 fecha_hecho 1415342 non-null object
7 hora_hecho 1415353 non-null object
8 delito 1415749 non-null object
9 categoria_delito 1415749 non-null object
10 sexo 1168059 non-null object
11 edad 931053 non-null float64
12 tipo_persona 1408196 non-null object
13 calidad_juridica 1415748 non-null object
14 competencia 1415749 non-null object
15 colonia_hecho 1340796 non-null object
16 colonia_catalogo 1323317 non-null object
17 alcaldia_hecho 1413284 non-null object
18 alcaldia_catalogo 1372214 non-null object
19 municipio_hecho 1413284 non-null object
20 latitud 1340994 non-null float64
21 longitud 1340994 non-null float64
dtypes: float64(5), object(17)
memory usage: 237.6+ MB
The original dataset contains 1,415,763 rows.
However, several fields have many null values, and some of the gaps are considerable (edad, for example, is populated in only 931,053 rows).
Because the dataset is large, we dropped every row with a null latitude, which leaves 1,340,994 rows.
That is about 94.72% of the original dataset.
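The retained share quoted above follows directly from the two row counts. A minimal sketch of the arithmetic, using only the figures reported in this section (the variable names are illustrative and not taken from the original notebook):

```python
# Row counts reported by df.info() and after dropping null latitudes
total_rows = 1_415_763
rows_with_latitude = 1_340_994

# Rows lost by requiring a latitude, and the share of rows that remain
dropped_rows = total_rows - rows_with_latitude
retained_share = rows_with_latitude / total_rows * 100

print(f"dropped rows: {dropped_rows:,}")
print(f"retained share: {retained_share:.2f}%")
```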
Polars is a modern DataFrame library that prioritizes performance and efficiency. Its Rust-based architecture, combined with features like lazy evaluation and parallel processing, makes it a powerful tool for data professionals.
Schema([('anio_inicio', Float64),
('mes_inicio', String),
('fecha_inicio', String),
('hora_inicio', String),
('anio_hecho', Float64),
('mes_hecho', String),
('fecha_hecho', String),
('hora_hecho', String),
('delito', String),
('categoria_delito', String),
('sexo', String),
('edad', Float64),
('tipo_persona', String),
('calidad_juridica', String),
('competencia', String),
('colonia_hecho', String),
('colonia_catalogo', String),
('alcaldia_hecho', String),
('alcaldia_catalogo', String),
('municipio_hecho', String),
('latitud', Float64),
('longitud', Float64)])
df = (
    df.filter(
        # drop null and 2018 years
        (pl.col('anio_inicio') != 2018)
    ).with_columns(
        # create datetime field
        (pl.col('fecha_inicio').cast(pl.String) + ' ' + pl.col('hora_inicio')
         .cast(pl.String)).alias('fecha_inicio')
    ).select(
        # exclude columns
        pl.exclude('anio_inicio', 'mes_inicio', 'hora_inicio',
                   'anio_hecho', 'mes_hecho', 'fecha_hecho', 'hora_hecho')
    ).with_columns(
        fecha_inicio=pl.col('fecha_inicio').str.to_datetime()
    ).drop_nulls(subset='fecha_inicio')
)
[{'fecha_inicio': 0,
'delito': 0,
'categoria_delito': 0,
'sexo': 237070,
'edad': 449780,
'tipo_persona': 6858,
'calidad_juridica': 1,
'competencia': 0,
'colonia_hecho': 382,
'colonia_catalogo': 17677,
'alcaldia_hecho': 3,
'alcaldia_catalogo': 953,
'municipio_hecho': 3,
'latitud': 0,
'longitud': 0}]
Even after dropping around 75,000 rows, several fields still contain null values, most notably sex and age, and to a lesser extent neighborhood, mayorship and municipality.
Age ranges from 0.0 up to 369.0 years old!
We cleaned the age values by setting ages below 18 to 18 and ages above 99 to 99.
| fecha_inicio | crimes |
|---|---|
| 2019 | 256,827 |
| 2020 | 204,659 |
| 2021 | 228,627 |
| 2022 | 237,659 |
| 2023 | 239,402 |
| 2024 | 173,803 |
fig = px.bar(years,
             x='fecha_inicio',
             y='crimes',
             orientation='v',
             hover_data=['fecha_inicio', 'crimes'],
             height=500,
             width=830,
             title='Crimes in Mexico City by Year',
             template='ggplot2')
fig.update_layout(
    xaxis=dict(title=dict(text='')),
    yaxis=dict(title=dict(text='Crimes')),
)
fig.update_traces(marker_color='#7f0000',
                  texttemplate="%{value:,.0f}")
fig.show()
| fecha_inicio | crimes |
|---|---|
| Dec, 2023 | 17,358 |
| Jan, 2024 | 18,354 |
| Feb, 2024 | 18,750 |
| Mar, 2024 | 19,797 |
| Apr, 2024 | 19,923 |
| May, 2024 | 20,879 |
| Jun, 2024 | 19,209 |
| Jul, 2024 | 19,429 |
| Aug, 2024 | 19,149 |
| Sep, 2024 | 18,313 |
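A quick consistency check between the two tables: the nine 2024 months listed above add up to the 2024 yearly total, which suggests that the 2024 figure covers January through September only. A minimal sketch using the values from the tables (the names are illustrative):

```python
# Monthly counts for 2024, copied from the monthly table
monthly_2024 = {
    "Jan": 18_354, "Feb": 18_750, "Mar": 19_797,
    "Apr": 19_923, "May": 20_879, "Jun": 19_209,
    "Jul": 19_429, "Aug": 19_149, "Sep": 18_313,
}
yearly_total_2024 = 173_803  # from the yearly table

# If the sums match, the yearly figure contains no months after September
months_sum = sum(monthly_2024.values())
print(months_sum, months_sum == yearly_total_2024)
```

If that reading is right, the lower 2024 bar in the yearly chart reflects a partial year rather than a full-year decline.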
fig = px.line(months,
              x='fecha_inicio',
              y='crimes',
              hover_data=['fecha_inicio', 'crimes'],
              height=500,
              width=830,
              title='Crimes in Mexico City 2019-2024',
              template='ggplot2')
fig.update_layout(
    xaxis=dict(title=dict(text='')),
    yaxis=dict(title=dict(text='Crimes')),
)
fig.update_traces(line_color='#7f0000',
                  line={'width': 3},
                  )
fig.show()
| sexo | crimes |
|---|---|
| NA | 237,070 |
| Femenino | 535,960 |
| Masculino | 567,947 |
fig = px.bar(df_sex.sort_values('crimes'),
             y='sexo',
             x='crimes',
             orientation='h',
             hover_data=['sexo', 'crimes'],
             height=500,
             width=830,
             title='Crimes in Mexico City by Sex',
             template='ggplot2',
             text='crimes',
             )
fig.update_layout(
    xaxis=dict(title=dict(text='')),
    yaxis=dict(title=dict(text='Sex')),
)
fig.update_traces(marker_color='#7f0000',
                  texttemplate="%{value:,.0f}")
fig.show()
| edad | crimes |
|---|---|
| 56 | 10,789 |
| 47 | 16,414 |
| 44 | 16,818 |
| 90 | 313 |
| 85 | 789 |
| 37 | 21,022 |
| 82 | 1,181 |
| 88 | 493 |
| 50 | 16,098 |
| 73 | 3,296 |
fig = px.bar(df_edad,
             x='edad',
             y='crimes',
             orientation='v',
             hover_data=['edad', 'crimes'],
             height=500,
             width=830,
             title='Crimes in Mexico City by Age',
             template='ggplot2')
fig.update_layout(
    # age is on the x axis, crime counts on the y axis
    xaxis=dict(title=dict(text='Age')),
    yaxis=dict(title=dict(text='Crimes')),
)
fig.update_traces(marker_color='#7f0000')
fig.show()
| colonia_hecho | crimes |
|---|---|
| PEDREGAL DE SANTO DOMINGO | 10,073 |
| JUÁREZ | 11,423 |
| BUENAVISTA | 11,749 |
| NARVARTE | 11,906 |
| AGRÍCOLA ORIENTAL | 12,024 |
| MORELOS | 12,518 |
| ROMA NORTE | 14,878 |
| DEL VALLE CENTRO | 17,178 |
| DOCTORES | 24,711 |
| CENTRO | 40,066 |
fig = px.bar(df_colonia,
             y='colonia_hecho',
             x='crimes',
             orientation='h',
             hover_data=['colonia_hecho', 'crimes'],
             height=500,
             width=830,
             title='Crimes in Mexico City - Top 10 Neighborhoods',
             template='ggplot2')
fig.update_layout(
    xaxis=dict(title=dict(text='')),
    # the y axis lists neighborhoods (colonias), not alcaldias
    yaxis=dict(title=dict(text='Neighborhood')),
)
fig.update_traces(marker_color='#7f0000',
                  texttemplate="%{value:,.0f}")
fig.show()
| alcaldia_hecho | crimes |
|---|---|
| MILPA ALTA | 13,184 |
| CUAJIMALPA DE MORELOS | 23,329 |
| LA MAGDALENA CONTRERAS | 27,285 |
| TLAHUAC | 41,547 |
| XOCHIMILCO | 45,924 |
| IZTACALCO | 60,427 |
| AZCAPOTZALCO | 64,561 |
| VENUSTIANO CARRANZA | 78,446 |
| MIGUEL HIDALGO | 84,242 |
| TLALPAN | 84,382 |
| COYOACAN | 93,953 |
| ALVARO OBREGON | 94,083 |
| BENITO JUAREZ | 101,759 |
| GUSTAVO A. MADERO | 139,294 |
| CUAUHTEMOC | 188,964 |
| IZTAPALAPA | 199,591 |
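To make the concentration visible in numbers, the borough counts above can be turned into shares of the total. A minimal sketch using the values from the table (the dictionary and variable names are illustrative):

```python
# Crimes per borough, copied from the table above
crimes_by_alcaldia = {
    "MILPA ALTA": 13_184, "CUAJIMALPA DE MORELOS": 23_329,
    "LA MAGDALENA CONTRERAS": 27_285, "TLAHUAC": 41_547,
    "XOCHIMILCO": 45_924, "IZTACALCO": 60_427,
    "AZCAPOTZALCO": 64_561, "VENUSTIANO CARRANZA": 78_446,
    "MIGUEL HIDALGO": 84_242, "TLALPAN": 84_382,
    "COYOACAN": 93_953, "ALVARO OBREGON": 94_083,
    "BENITO JUAREZ": 101_759, "GUSTAVO A. MADERO": 139_294,
    "CUAUHTEMOC": 188_964, "IZTAPALAPA": 199_591,
}

total = sum(crimes_by_alcaldia.values())

# Three boroughs with the most recorded crimes
top3 = sorted(crimes_by_alcaldia.items(), key=lambda kv: kv[1], reverse=True)[:3]
top3_share = sum(count for _, count in top3) / total * 100

for name, count in top3:
    print(f"{name}: {count / total * 100:.1f}%")
print(f"top 3 combined: {top3_share:.1f}%")
```

These are raw counts, not rates per inhabitant, so larger boroughs will tend to rank higher regardless of how safe they are.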
fig = px.bar(df_alcaldia,
             y='alcaldia_hecho',
             x='crimes',
             orientation='h',
             hover_data=['alcaldia_hecho', 'crimes'],
             height=500,
             width=830,
             title='Crimes in Mexico City by Mayorship',
             template='ggplot2')
fig.update_layout(
    xaxis=dict(title=dict(text='')),
    yaxis=dict(title=dict(text='Alcaldia')),
)
fig.update_traces(marker_color='#7f0000',
                  texttemplate="%{value:,.0f}")
fig.show()
df_map = (
    df.with_columns(
        (pl.col('colonia_hecho') + ', ' + pl.col('alcaldia_hecho')).alias('neighborhood')
    )
    .filter(pl.col('alcaldia_hecho') != 'FUERA DE CDMX')
    .group_by('neighborhood', maintain_order=True)
    .agg(latitude=pl.col('latitud').mean(),
         longitude=pl.col('longitud').mean(),
         crimes=pl.col('delito').len()
         )
)
| neighborhood | latitude | longitude | crimes |
|---|---|---|---|
| CENTRO, CUAUHTEMOC | 19.4327 | -99.1375 | 40,042 |
| DOCTORES, CUAUHTEMOC | 19.4200 | -99.1486 | 24,711 |
| DEL VALLE CENTRO, BENITO JUAREZ | 19.3831 | -99.1682 | 17,178 |
| ROMA NORTE, CUAUHTEMOC | 19.4184 | -99.1627 | 14,878 |
| AGRÍCOLA ORIENTAL, IZTACALCO | 19.3947 | -99.0708 | 12,008 |
| NARVARTE, BENITO JUAREZ | 19.3930 | -99.1542 | 11,906 |
| JUÁREZ, CUAUHTEMOC | 19.4268 | -99.1628 | 11,408 |
| PEDREGAL DE SANTO DOMINGO, COYOACAN | 19.3275 | -99.1677 | 10,073 |
| POLANCO, MIGUEL HIDALGO | 19.4335 | -99.1956 | 10,019 |
| AGRÍCOLA PANTITLAN, IZTACALCO | 19.4104 | -99.0649 | 9,662 |
# mexico city crime map
m = folium.Map(
    location=[19.35, -99.12],
    zoom_start=10,
    control_scale=False,
)
# Layers
Crime = folium.FeatureGroup(name='<u><b>Place</b></u>', show=True)
m.add_child(Crime)
# CSS class that draws the custom symbol inside each marker
my_symbol_css_class = """ <style>
.fa-mysymbol3:before {
    font-family: Gill Sans;
    font-weight: bold;
    font-size: 11px;
    color: white;
    background-color:'';
    border-radius: 10px;
    white-space: pre;
    content: 'P';
}
</style>
"""
# add the CSS class above to the folium root map
m.get_root().html.add_child(folium.Element(my_symbol_css_class))
# create one marker per neighborhood, using the CSS class as its icon
for _, row in heat_map.iterrows():
    html = f"""
    <p style="font-size: 14px;">{row['neighborhood']}</p>
    <p style="font-size: 14px;">Total crimes: {row['crimes']}</p>
    """
    iframe = folium.IFrame(html=html, width=220, height=90)
    popup = folium.Popup(iframe, max_width=250)
    folium.Marker(
        location=[row['latitude'], row['longitude']],
        icon=folium.Icon(color='darkred', prefix='fa', icon='fa-mysymbol3'),
        popup=popup,
        tooltip=row['neighborhood']
    ).add_to(Crime)
folium.plugins.Fullscreen().add_to(m)
m
Analyzing crime in Mexico City presents significant challenges, and the limitations of available data create substantial obstacles to drawing definitive conclusions.
Varied Crime Landscape:
Mexico City experiences a range of criminal activities, from petty theft to organized crime-related violence. This creates a multifaceted challenge for law enforcement. The distribution of crime is uneven, with certain areas experiencing higher rates of specific offenses.
Impact of Socioeconomic Factors:
Poverty, inequality, and lack of opportunity contribute to crime rates in certain neighborhoods. Addressing these underlying socioeconomic issues is crucial for long-term crime reduction.
Efforts in Crime Reduction:
Mexico City has implemented strategies aimed at reducing crime, including increased data transparency and targeted policing. Open data policies have proven to be a helpful tool in crime reduction.
Challenges Remain:
Despite progress, challenges persist, including issues related to organized crime, corruption, and the effectiveness of the criminal justice system. The influx of firearms from the United States is a major factor in violent crime.
The Importance of Data:
Using data to build crime maps and locate crime hot spots has become an important tool for law enforcement.
Because the available data is incomplete, it is essential to improve transparency and data sharing among all agencies involved in public security.
A significant portion of crimes goes unreported due to distrust in authorities, fear of retaliation, or the perception that reporting is futile.
This underreporting creates a distorted picture of the actual crime situation.
Jesus LM
Economist & Data Scientist